baidu_index

Crawl Baidu Index without Selenium or PhantomJS.

requirements

flask
pillow
numpy
requests
lxml
docker

install

Start Docker

sudo docker pull scrapinghub/splash
sudo docker run -p 8050:8050 -p 5023:5023 scrapinghub/splash
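
Once the container is running, Splash serves its HTTP API on port 8050 (the first port mapped above). As a minimal sketch, written for Python 3, the helper below builds a request URL for Splash's `render.html` endpoint. The name `splash_render_url` and the default `wait` value are illustrative assumptions and are not part of this project.

```python
from urllib.parse import urlencode


def splash_render_url(target_url, splash_base="http://localhost:8050", wait=0.5):
    """Build a URL for Splash's render.html endpoint (hypothetical helper).

    target_url:  the page Splash should load and render
    splash_base: where the Splash container is listening (port 8050 above)
    wait:        seconds Splash waits after page load before returning HTML
    """
    query = urlencode({"url": target_url, "wait": wait})
    return "{}/render.html?{}".format(splash_base.rstrip("/"), query)


if __name__ == "__main__":
    # Opening the printed URL in a browser is a quick way to confirm
    # that the container is up and rendering pages.
    print(splash_render_url("http://example.com/"))
```
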

Clone the project

git clone https://github.com/Syhen/baidu-index.git

Set the baidu-index environment variable
Start the Flask microservice

cd baidu-index/baidu_index/backend
python index.py

Configure nginx: put nginx in front of the microservice, because Splash cannot resolve localhost.

Then change the domain in baidu_index.core.index.get_res2 to the domain you configured.
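
A likely reason for the localhost restriction is that Splash runs inside a Docker container, where localhost refers to the container itself and not to the machine running Flask. The sketch below, written for Python 3, checks whether a URL would hit that problem before you put it into `get_res2`. The helper name `is_loopback_url` and the example domains are hypothetical.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url):
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlsplit(url).hostname or ""
    if host.lower() == "localhost":
        return True
    try:
        # Covers 127.0.0.0/8 and the IPv6 loopback address ::1.
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so it is an ordinary hostname.
        return False


if __name__ == "__main__":
    # Placeholder URLs; substitute the domain configured in nginx.
    for candidate in ("http://localhost:5000/", "http://index.example.com/"):
        status = "not usable from Splash" if is_loopback_url(candidate) else "ok"
        print(candidate, status)
```
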

demo

from __future__ import unicode_literals

from requests.cookies import RequestsCookieJar

from baidu_index.core.index import BaiduIndexCrawler

cookies = RequestsCookieJar()
# populate the jar with cookies from a logged-in Baidu session
baidu_index_crawler = BaiduIndexCrawler('机器学习', cookies, start_date="2017-01-01", end_date="2017-01-31")
baidu_index_crawler.next()
# 936
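
The demo leaves the cookie jar empty. One way to fill it, sketched here for Python 3, is to copy the `Cookie` request header from a logged-in browser session and load it into the jar. The helper names, the `.baidu.com` domain, and the cookie names in the example are assumptions; which cookies the crawler actually needs is not stated here.

```python
def parse_cookie_header(header):
    """Parse a 'name=value; name2=value2' Cookie header string into a dict."""
    cookies = {}
    for part in header.split(";"):
        # Split on the first '=' only, since cookie values may contain '='.
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def fill_jar(jar, header, domain=".baidu.com"):
    """Copy every cookie from the header string into the given jar.

    `jar` is any object with a set(name, value, domain=..., path=...) method,
    such as requests.cookies.RequestsCookieJar. The domain is an assumption.
    """
    for name, value in parse_cookie_header(header).items():
        jar.set(name, value, domain=domain, path="/")
    return jar


if __name__ == "__main__":
    # Placeholder values; paste the header from your own logged-in session.
    print(parse_cookie_header("BDUSS=abc123; BAIDUID=xyz:FG=1"))
```

With `requests` installed, `cookies = fill_jar(RequestsCookieJar(), header_string)` yields a jar that can be passed to `BaiduIndexCrawler` as in the demo.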

warning!!

Commercial use is prohibited!
