This is a very simple scraper that takes three parameters:
- SCRAPE_URL: the URL to scrape
- REGEX: a Python-flavored regular expression for matching the scraped content
- OUTPUT_FILE: the file path to which the result should be written
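
To make the three parameters concrete, here is a minimal sketch of how a scraper of this kind could be put together. It is not the actual contents of scraper.py: the function names (`fetch`, `extract`, `save`, `scrape`) are hypothetical, the standard library's `urllib` is assumed in place of whatever requirements.txt installs, and writing one match per line is an assumed output format.

```python
import re
import sys
import urllib.request


def fetch(url):
    """Download the page at `url` and return its body as text."""
    with urllib.request.urlopen(url) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract(text, pattern):
    """Return the full text of every non-overlapping match of `pattern`."""
    # finditer + group(0) yields whole matches even if the pattern has groups.
    return [m.group(0) for m in re.finditer(pattern, text)]


def save(matches, output_file):
    """Write one match per line to `output_file` (assumed output format)."""
    with open(output_file, "w", encoding="utf-8") as f:
        for match in matches:
            f.write(match + "\n")


def scrape(url, pattern, output_file):
    """Fetch `url`, apply `pattern`, and store the matches in `output_file`."""
    matches = extract(fetch(url), pattern)
    save(matches, output_file)
    return matches


# Hypothetical entry point: only runs when exactly three arguments are given.
if __name__ == "__main__" and len(sys.argv) == 4:
    scrape(*sys.argv[1:4])
```

Keeping the download, the matching, and the file write in separate functions means the regex step can be tried on a local string without touching the network.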
$ pip3 install -r requirements.txt
$ docker build -t simple-scraper-regex .
$ docker run --env-file ./sample.env -v /Users/billykong/workspace/github.com/billykong/scraper-regex/results:/results simple-scraper-regex:latest
With sample.env:
SCRAPE_URL=https://www.bing.com/covid/data
REGEX=\{[^\{]*hong[^\}]*\}
OUTPUT_FILE=/results/output-container.txt
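
Inside the container the three parameters arrive as environment variables rather than as command-line arguments. The sketch below shows one way a script could accept both; `load_params` and `PARAM_NAMES` are hypothetical names, and giving positional arguments priority over the environment is an assumption, not necessarily how scraper.py behaves. Note that the REGEX value in sample.env is unquoted: as far as I know, Docker passes env-file values through as written, so quotes would become part of the pattern.

```python
import os
import sys

# Variable names as they appear in sample.env.
PARAM_NAMES = ("SCRAPE_URL", "REGEX", "OUTPUT_FILE")


def load_params(argv, environ):
    """Return (url, regex, output_file) from argv if given, else from environ."""
    # Assumption: three positional arguments take priority over the environment.
    if len(argv) == 3:
        return tuple(argv)
    missing = [name for name in PARAM_NAMES if not environ.get(name)]
    if missing:
        raise SystemExit("missing parameters: " + ", ".join(missing))
    return tuple(environ[name] for name in PARAM_NAMES)


# Typical call site (hypothetical):
#   url, regex, output_file = load_params(sys.argv[1:], os.environ)
```

Passing `argv` and `environ` in as arguments, instead of reading the globals inside the function, keeps the lookup easy to exercise with plain lists and dictionaries.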
You can also run it in a terminal:
$ python scraper.py 'https://www.bing.com/covid/data' '\{[^\{]*hong[^\}]*\}' ./output.txt
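
The sample pattern picks out a brace-delimited object that contains the lowercase text `hong`. Here is a small check of what it matches, run against a made-up string (the `SAMPLE` text is hypothetical and is not real data from the target URL):

```python
import re

# Pattern taken from the sample command above.
PATTERN = r'\{[^\{]*hong[^\}]*\}'

# Hypothetical sample text, invented for illustration only.
SAMPLE = (
    '{"id":"world","areas":['
    '{"id":"hongkong","totalConfirmed":100},'
    '{"id":"macau","totalConfirmed":5}]}'
)


def find_matches(text, pattern=PATTERN):
    """Return every non-overlapping match of `pattern` in `text`."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    for match in find_matches(SAMPLE):
        print(match)  # prints {"id":"hongkong","totalConfirmed":100}
```

The match is case-sensitive, so an object that only contains `Hong Kong` is skipped. Because `[^\}]*` stops at the first closing brace, an object with nested braces after `hong` would be cut off at the inner `}`.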
- Add unit tests
- Set up a GitHub Action to push the Docker image to Docker Hub on every push to master