Installation

Before we begin, ensure you have Scrapy installed on your system. If you don't, you can easily install it using pip, the Python package installer:

pip3 install scrapy

Next, run the following commands in your terminal to download the custom Scrapy spider, ReconSpider, and extract it to the current working directory:

wget -O ReconSpider.zip https://academy.hackthebox.com/storage/modules/144/ReconSpider.v1.2.zip
unzip ReconSpider.zip 

Basic command

python3 ReconSpider.py http://DOMAIN.com

results.json

After running ReconSpider.py, the data will be saved in a JSON file, results.json. This file can be explored using any text editor. Below is the structure of the JSON file produced:

{
    "emails": [
        "lily.floid@inlanefreight.com",
        "cvs@inlanefreight.com",
        ...
    ],
    "links": [
        "https://www.themeansar.com",
        "https://www.inlanefreight.com/index.php/offices/",
        ...
    ],
    "external_files": [
        "https://www.inlanefreight.com/wp-content/uploads/2020/09/goals.pdf",
        ...
    ],
    "js_files": [
        "https://www.inlanefreight.com/wp-includes/js/jquery/jquery-migrate.min.js?ver=3.3.2",
        ...
    ],
    "form_fields": [],
    "images": [
        "https://www.inlanefreight.com/wp-content/uploads/2021/03/AboutUs_01-1024x810.png",
        ...
    ],
    "videos": [],
    "audio": [],
    "comments": [
        "<!-- #masthead -->",
        ...
    ]
}

Each key in the JSON file represents a different type of data extracted from the target website:

JSON Key          Description
emails            Lists email addresses found on the domain.
links             Lists URLs of links found within the domain.
external_files    Lists URLs of external files such as PDFs.
js_files          Lists URLs of JavaScript files used by the website.
form_fields       Lists form fields found on the domain (empty in this example).
images            Lists URLs of images found on the domain.
videos            Lists URLs of videos found on the domain (empty in this example).
audio             Lists URLs of audio files found on the domain (empty in this example).
comments          Lists HTML comments found in the source code.
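
To see at a glance which of these keys actually hold data, the file can be loaded with Python's standard json module. The following is a minimal sketch, assuming results.json sits in the current working directory and that every key maps to a list, as in the sample above. The helper name summarize_results is hypothetical and not part of ReconSpider.

```python
import json
from pathlib import Path


def summarize_results(data: dict) -> dict:
    """Return the number of entries stored under each list-valued key.

    Hypothetical helper for illustration; not part of ReconSpider.
    """
    return {
        key: len(values)
        for key, values in data.items()
        if isinstance(values, list)
    }


if __name__ == "__main__":
    path = Path("results.json")  # assumed output location
    if path.exists():
        data = json.loads(path.read_text(encoding="utf-8"))
        for key, count in summarize_results(data).items():
            print(f"{key:15} {count}")
    else:
        print("results.json not found; run ReconSpider.py first")
```

Keys with a count of zero, such as form_fields, videos, and audio in the example, can be skipped when reviewing the file by hand.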

By exploring this JSON structure, you can gain valuable insights into the web application's architecture, content, and potential points of interest for further investigation.
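
One possible way to surface points of interest is to group file URLs by extension and to separate links that stay on the target domain from those that leave it. The sketch below uses only the Python standard library. The function names group_by_extension and split_links, and the idea of passing the target host as a plain string, are assumptions made for illustration and are not something ReconSpider provides.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from urllib.parse import urlparse


def group_by_extension(urls):
    """Group URLs by the file extension of their path (hypothetical helper)."""
    groups = defaultdict(list)
    for url in urls:
        # Use only the path component so query strings like ?ver=3.3.2 are ignored.
        suffix = PurePosixPath(urlparse(url).path).suffix.lower() or "(none)"
        groups[suffix].append(url)
    return dict(groups)


def split_links(links, target_host):
    """Split links into (internal, external) relative to target_host (hypothetical helper)."""
    internal, external = [], []
    for link in links:
        host = urlparse(link).hostname or ""
        # Treat the host itself and any of its subdomains as internal.
        if host == target_host or host.endswith("." + target_host):
            internal.append(link)
        else:
            external.append(link)
    return internal, external
```

For example, calling split_links(data["links"], "inlanefreight.com") on the sample above would place https://www.themeansar.com in the external list, and group_by_extension(data["external_files"]) would collect goals.pdf under the ".pdf" key.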