WebCrawler

This Python script crawls a website and saves the text content of each page to a text file. It also extracts the hyperlinks from each page and follows those within the same domain to continue the crawl.
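The link-following step described above can be sketched with the standard library alone. This is a minimal illustration, not the script's actual code: the names `LinkCollector` and `same_domain_links` are hypothetical, and comparing the URL host to `domain` with an exact match is an assumption about how "same domain" is decided.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_domain_links(html, base_url, domain):
    """Return absolute, de-duplicated links in html whose host equals domain."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Resolve relative links against the page URL and drop any #fragment.
        absolute = urljoin(base_url, href).split("#")[0]
        # Assumption: "same domain" means an exact host match.
        if urlparse(absolute).netloc == domain and absolute not in links:
            links.append(absolute)
    return links
```

Called on a page fetched from `https://example.com/index.html` with `domain` set to `example.com`, the function would keep links such as `/about` and drop links to other hosts or `mailto:` addresses.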

Requirements

Python 3.x
Works on Linux, Windows, macOS, BSD

Install

Install dependencies:

pip install -r requirements.txt

Usage

To use the script, set the domain and full_url variables in the script to the domain and full URL of the website you want to crawl, then run the script in your Python environment.
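For illustration, the two variables might be set as follows. The values use the placeholder site `example.com`, and the consistency check at the end is an addition for this sketch, not something the script is known to do.

```python
from urllib.parse import urlparse

# Placeholder values; replace with the site you have permission to crawl.
domain = "example.com"
full_url = "https://example.com/"

# Optional sanity check (an assumption, not part of the original script):
# the host in full_url should match domain, or no link will count as same-domain.
if urlparse(full_url).netloc != domain:
    raise ValueError(f"full_url host does not match domain {domain!r}")
```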

The script creates a text directory next to the script. It contains a directory for the domain being crawled and a text file for each page crawled.
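One way such an output layout could be produced is sketched below. The layout `text/<domain>/<page>.txt`, the file naming rule, and the helper names `output_path` and `save_text` are all assumptions made for this example; the script's real naming scheme may differ.

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def output_path(url, domain, root="text"):
    """Map a page URL to a file path under root/domain (assumed layout)."""
    parsed = urlparse(url)
    # Use the URL path as the base name; the site root becomes "index".
    raw = parsed.path.strip("/") or "index"
    if parsed.query:
        raw += "_" + parsed.query
    # Replace characters that are unsafe in file names with underscores.
    name = re.sub(r"[^A-Za-z0-9._-]", "_", raw)
    return Path(root) / domain / (name + ".txt")


def save_text(url, domain, text, root="text"):
    """Write the page text to its file, creating directories as needed."""
    path = output_path(url, domain, root)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

Under these assumptions, the page `https://example.com/docs/intro` would be written to `text/example.com/docs_intro.txt`.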

Note: It is recommended to get permission from a website's owners before crawling it with this script.