A simple scraper that parses various objects, such as images or links, from a website.

Pritish-Sinha/WebpageScrapper

Webpage Scrapper

An interactive, easy, and unique way to scrape components from web pages.

Features:

  • Scrapes and parses the following objects:

    • Headings and subheadings
    • Links
    • Images
    • Tables (coming soon)
    • CSS-based class selector
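
The features above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names `scrape_components` and `select_by_css`, the `base_url` parameter, and the sample HTML are all assumptions made for this example.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def scrape_components(html, base_url=""):
    """Collect headings, links, and images from an HTML string (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # h1 to h6 cover headings and subheadings, returned in document order.
    headings = [
        (tag.name, tag.get_text(strip=True))
        for tag in soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"])
    ]
    # Resolve relative URLs against base_url; tags without href/src are skipped.
    links = [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]
    images = [urljoin(base_url, img["src"]) for img in soup.find_all("img", src=True)]
    return {"headings": headings, "links": links, "images": images}


def select_by_css(html, selector):
    """Return the text of every element matching a CSS selector such as '.note'."""
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.select(selector)]


# Sample input made up for this sketch.
SAMPLE_HTML = """
<html><body>
  <h1>Title</h1>
  <h2 class="note">Sub</h2>
  <a href="/about">About</a>
  <img src="logo.png">
</body></html>
"""

result = scrape_components(SAMPLE_HTML, base_url="https://example.com/")
print(result)
print(select_by_css(SAMPLE_HTML, ".note"))
```

In a real run, the HTML string would presumably come from Requests, for example the `text` attribute of the response returned by `requests.get(url)`, with the page URL passed as `base_url`.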

Libraries Used:

  • Requests (To fetch Webpage contents)
  • Inquirer (Interactive CLI UI)
  • BeautifulSoup4 (Parse the HTML)

How to run?

To run this, you need pipenv installed on your system. Install it with: pip install pipenv

Then run the module with: pipenv run start

TODO:

  • Work on Tables
  • Pretty print tables
  • Select the title of the page
  • Gather all the text of the page
  • Get all the <p> tag content
  • Get the whole source code of the page
  • Enable writing to file
  • Work on CSS-based selector
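
Several of the TODO items (page title, all text, `<p>` tag content, writing to a file) might be approached along these lines with BeautifulSoup4. This is only a sketch under assumptions: `extract_page_text` and `write_lines` are hypothetical names, and nothing here reflects how the project will actually implement these items.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def extract_page_text(html):
    """Pull the title, paragraph texts, and full text from an HTML string (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is None when the page has no <title> tag.
    title = soup.title.get_text(strip=True) if soup.title else None
    paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
    # One line per text node, with surrounding whitespace stripped.
    all_text = soup.get_text(separator="\n", strip=True)
    return {"title": title, "paragraphs": paragraphs, "text": all_text}


def write_lines(path, lines):
    """Write one item per line to a UTF-8 text file (hypothetical helper)."""
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Sample input made up for this sketch.
SAMPLE_PAGE = (
    "<html><head><title>Demo</title></head>"
    "<body><p>One</p><p>Two</p></body></html>"
)

page = extract_page_text(SAMPLE_PAGE)
print(page["title"], page["paragraphs"])
```

Calling `write_lines("paragraphs.txt", page["paragraphs"])` would then save the paragraph texts, one per line; the file name is just an example.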
