README

This is a very simple scraper that takes three parameters:

SCRAPE_URL: the scraping target
REGEX: a Python-flavored regular expression used to match the scraped content
OUTPUT_FILE: the filepath to which the result should be written
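
To illustrate how the three parameters fit together, here is a minimal sketch of a fetch, match, and write pipeline. It is not the code in scraper.py: the function names (fetch, extract_matches, write_results, scrape), the use of the standard-library urllib, the UTF-8 decoding, and the one-match-per-line output format are all assumptions made for this example.

```python
import re
import urllib.request
from typing import List


def fetch(url: str) -> str:
    """Download the page body and decode it as text (UTF-8 is assumed)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_matches(content: str, pattern: str) -> List[str]:
    """Return the full text of every non-overlapping match of the pattern."""
    return [m.group(0) for m in re.finditer(pattern, content)]


def write_results(matches: List[str], output_file: str) -> None:
    """Write one match per line to the output path (hypothetical format)."""
    with open(output_file, "w", encoding="utf-8") as handle:
        for match in matches:
            handle.write(match + "\n")


def scrape(url: str, pattern: str, output_file: str) -> None:
    """SCRAPE_URL -> REGEX -> OUTPUT_FILE, in that order."""
    write_results(extract_matches(fetch(url), pattern), output_file)
```

Keeping the download separate from the matching step means the regex logic can be exercised on a plain string without any network access.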

Setup

$ pip3 install -r requirements.txt

Build

$ docker build -t simple-scraper-regex .

Run

$ docker run --env-file ./sample.env -v /Users/billykong/workspace/github.com/billykong/scraper-regex/results:/results simple-scraper-regex:latest

With sample.env:

SCRAPE_URL=https://www.bing.com/covid/data
REGEX=\{[^\{]*hong[^\}]*\}
OUTPUT_FILE=/results/output-container.txt
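
Inside the container, the values from sample.env arrive as environment variables. The sketch below shows one way a script could read and validate them; the helper name load_config and the error message are hypothetical, and scraper.py may handle this differently.

```python
import os

# The three settings named in this README.
REQUIRED = ("SCRAPE_URL", "REGEX", "OUTPUT_FILE")


def load_config(environ=None):
    """Return the three settings as a dict, or raise if any is missing or empty."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED}
```

Passing a plain dict instead of os.environ makes the function easy to check without touching the real environment, for example load_config({"SCRAPE_URL": "...", "REGEX": "...", "OUTPUT_FILE": "..."}).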

You can also run it directly in a terminal:

$ python scraper.py 'https://www.bing.com/covid/data' '\{[^\{]*hong[^\}]*\}' ./output.txt
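
To see what the sample pattern selects, the snippet below applies it to a small made-up string. The sample data is invented for illustration and is not the real response from the target URL. The pattern matches an opening brace, any run of characters other than an opening brace, the literal text hong, any run of characters other than a closing brace, and then a closing brace, so it picks out the brace-delimited fragment that mentions hong.

```python
import re

# The same pattern used in sample.env and in the command-line example.
PATTERN = r"\{[^\{]*hong[^\}]*\}"

# Made-up sample data, not the real response from the target URL.
sample = '[{"id":"japan","total":1},{"id":"hongkong","total":2}]'

matches = re.findall(PATTERN, sample)
print(matches)  # ['{"id":"hongkong","total":2}']

# The match is case-sensitive: "HongKong" does not contain lowercase "hong".
print(re.findall(PATTERN, '{"id":"HongKong"}'))  # []
```

If the target page capitalizes the name, passing re.IGNORECASE to the matching call (or starting the pattern with (?i)) would be one way to handle it.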

TODO

Add unit tests
Set up a GitHub Action to push the Docker image to Docker Hub on every push to master
