Installation
Before we begin, ensure you have Scrapy installed on your system. If you donโt, you can easily install it using pip, the Python package installer:
pip3 install scrapy
First, run this command in your terminal to download the custom scrapy spider,ย ReconSpider
, and extract it to the current working directory.
wget -O ReconSpider.zip https://academy.hackthebox.com/storage/modules/144/ReconSpider.v1.2.zip
unzip ReconSpider.zip
Basic command
python3 ReconSpider.py http://DOMAIN.com
results.json
After runningย ReconSpider.py
, the data will be saved in a JSON file,ย results.json
. This file can be explored using any text editor. Below is the structure of the JSON file produced:
{
"emails": [
"lily.floid@inlanefreight.com",
"cvs@inlanefreight.com",
...
],
"links": [
"https://www.themeansar.com",
"https://www.inlanefreight.com/index.php/offices/",
...
],
"external_files": [
"https://www.inlanefreight.com/wp-content/uploads/2020/09/goals.pdf",
...
],
"js_files": [
"https://www.inlanefreight.com/wp-includes/js/jquery/jquery-migrate.min.js?ver=3.3.2",
...
],
"form_fields": [],
"images": [
"https://www.inlanefreight.com/wp-content/uploads/2021/03/AboutUs_01-1024x810.png",
...
],
"videos": [],
"audio": [],
"comments": [
"<!-- #masthead -->",
...
]
}
Each key in the JSON file represents a different type of data extracted from the target website:
JSON Key | Description |
---|---|
emails | Lists email addresses found on the domain. |
links | Lists URLs of links found within the domain. |
external_files | Lists URLs of external files such as PDFs. |
js_files | Lists URLs of JavaScript files used by the website. |
form_fields | Lists form fields found on the domain (empty in this example). |
images | Lists URLs of images found on the domain. |
videos | Lists URLs of videos found on the domain (empty in this example). |
audio | Lists URLs of audio files found on the domain (empty in this example). |
comments | Lists HTML comments found in the source code. |
By exploring this JSON structure, you can gain valuable insights into the web applicationโs architecture, content, and potential points of interest for further investigation.