This repository has been archived by the owner on May 9, 2023. It is now read-only.

ReticulatedSpline/web_crawler


About

This is another small web app based on the Utilities Balancer UI. Give it a root node, the number of pages to search, and some RegEx targets, and it will politely ask the Node.js backend to complete the search. The stack is split between browser and server as a workaround for the same-origin policy, which prevents browser code from reading pages on other origins.
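
As a rough illustration of the search described above, here is a minimal sketch of a quota-limited, breadth-first crawl. It is not the repository's code: `extractLinks`, `crawl`, and the injected `fetchPage` function are hypothetical names, and links are pulled out with a simple regular expression to keep the sketch dependency-free, whereas the project itself relies on Cheerio for DOM scraping.

```javascript
// Hypothetical sketch: none of these names come from the repository.

// Collect absolute http(s) links from raw HTML. A regex stands in for
// Cheerio here so the example runs without any installed packages.
function extractLinks(html, baseUrl) {
  const links = [];
  const hrefPattern = /href\s*=\s*["']([^"']+)["']/gi;
  for (const match of html.matchAll(hrefPattern)) {
    try {
      const url = new URL(match[1], baseUrl); // Resolves relative hrefs.
      url.hash = ''; // Treat /page and /page#section as the same page.
      if (url.protocol === 'http:' || url.protocol === 'https:') {
        links.push(url.href);
      }
    } catch (err) {
      // Ignore hrefs that cannot be resolved to a URL.
    }
  }
  return links;
}

// Breadth-first crawl starting at `root`, stopping after `quota` pages.
// `targets` is an array of RegExp objects.
// `fetchPage(url)` is assumed to return a Promise of the page's HTML.
async function crawl(root, quota, targets, fetchPage) {
  const queue = [new URL(root).href];
  const seen = new Set(queue);
  const hits = [];
  let processed = 0;

  while (queue.length > 0 && processed < quota) {
    const url = queue.shift();
    let html;
    try {
      html = await fetchPage(url);
    } catch (err) {
      continue; // Unreachable pages are skipped and do not use up quota.
    }
    processed += 1;

    for (const target of targets) {
      // matchAll requires the global flag, so add it if it is missing.
      const flags = target.flags.includes('g') ? target.flags : target.flags + 'g';
      for (const match of html.matchAll(new RegExp(target.source, flags))) {
        hits.push({ url, match: match[0] });
      }
    }

    // Queue every link that has not been seen before.
    for (const link of extractLinks(html, url)) {
      if (!seen.has(link)) {
        seen.add(link);
        queue.push(link);
      }
    }
  }
  return { processed, hits };
}
```

Passing `fetchPage` in as a parameter is a design choice for this sketch only: it lets the loop be exercised against an in-memory set of pages, and a real HTTP client can be swapped in on the server side, where the same-origin policy does not apply.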

Credits

This app uses HTML5 for DOM elements and pure CSS for styling. The business logic is implemented with Angular, a JavaScript framework. A single-purpose Node.js server serves the webpage and runs the search algorithm.

This would not have been possible without Stephen's tutorial on NetInstructions.

This project was generated with Angular CLI version 1.1.2.

UI elements provided by Angular Material.

NPM was used to manage Node packages, including Cheerio for DOM scraping, Express for serving the static transpiled build, and ngx-clipboard for clipboard functionality.

Launching

Run ng serve for a dev server, then navigate to http://localhost:4200/. The app will automatically reload if you change any of the source files. Run ng build to transpile to the /dist folder. The build can then be served with npm start on http://localhost:8080. The latest version of the origin branch is live here. It may take a few minutes to spool up from a cold boot.

Usage

The root site is where the search starts. Quota is the number of pages to process. You can search for email addresses and phone numbers, or add custom RegEx targets.
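
To make the target options concrete, the following sketch shows one way preset and custom targets could be compiled and applied to a page's text. The patterns and the names `PRESET_TARGETS`, `buildTargets`, and `findTargets` are assumptions made for illustration; the expressions the app actually uses may differ.

```javascript
// Hypothetical preset patterns, not taken from the repository.
const PRESET_TARGETS = {
  // Deliberately loose: adequate for scraping, not for validation.
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  // North American style numbers such as 555-123-4567 or (555) 123-4567.
  phone: /\(?\b\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b/g,
};

// Turn the user's choices into a list of named, compiled patterns.
// Custom sources that are not valid regular expressions are collected
// in `rejected` instead of aborting the whole search.
function buildTargets({ email = false, phone = false, custom = [] } = {}) {
  const targets = [];
  const rejected = [];
  if (email) targets.push({ name: 'email', pattern: PRESET_TARGETS.email });
  if (phone) targets.push({ name: 'phone', pattern: PRESET_TARGETS.phone });
  for (const source of custom) {
    try {
      targets.push({ name: source, pattern: new RegExp(source, 'g') });
    } catch (err) {
      rejected.push(source);
    }
  }
  return { targets, rejected };
}

// Return the unique matches for each target, keyed by target name.
function findTargets(text, targets) {
  const results = {};
  for (const { name, pattern } of targets) {
    const unique = new Set();
    for (const match of text.matchAll(pattern)) {
      unique.add(match[0]);
    }
    results[name] = [...unique];
  }
  return results;
}

// Usage: one preset of each kind plus one valid and one invalid custom target.
const { targets, rejected } = buildTargets({
  email: true,
  phone: true,
  custom: ['ID-\\d+', '('],
});
const sampleText = 'Contact sales@example.com or (555) 123-4567, ref ID-42 and ID-7.';
console.log(findTargets(sampleText, targets), rejected);
```

Custom targets are compiled with the global flag so that every occurrence on a page is reported rather than only the first.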

About

nodeJS web crawler based on regex
