Releases: juzraai/cordis-projects-crawler
Version 2.0.0
Version 1.3.0
Features:
- two download modes: download all, or download one by RCN
- can download the project data page and the publication list JSON string
- detects CORDIS internal error messages
- retries downloads with increasing sleep intervals
- downloads files into an output directory
- can skip already existing files
- filename templates can be configured
- can read RCNs from the names of already downloaded files instead of crawling the CORDIS list
- CLI for setting up and running the crawler
- file logging
- can parse all information from the project page (IMPROVED)
- can parse publication list JSON strings (IMPROVED)
- can export project data to a CSV file
- can export all data to a MySQL database (NEW)
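The CSV export listed above can be pictured with a minimal Python sketch. It assumes each parsed project is a dictionary and uses hypothetical column names (rcn, reference, title); the crawler's actual columns and implementation are not described in these release notes.

```python
import csv
import io

# Hypothetical column names; the crawler's real CSV columns are not listed in the notes.
FIELDS = ["rcn", "reference", "title"]

def projects_to_csv(projects, fields=FIELDS):
    """Serialize a list of project dicts into CSV text: a header row, then one row per project."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys that are not in `fields`;
    # missing keys are written as empty cells (DictWriter's default restval).
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for project in projects:
        writer.writerow(project)
    return buffer.getvalue()

if __name__ == "__main__":
    print(projects_to_csv([{"rcn": 1, "reference": "X", "title": "A, B"}]))
```

The csv module quotes fields that contain commas (such as project titles), which is the main reason to prefer it over joining strings by hand.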
Version 1.2.0
Features:
- two download modes: download all, or download one by RCN
- can download the project data page and the publication list JSON string
- detects CORDIS internal error messages (IMPROVED)
- retries downloads with increasing sleep intervals
- downloads files into an output directory
- can skip already existing files
- filename templates can be configured
- can read RCNs from the names of already downloaded files instead of crawling the CORDIS list (IMPROVED)
- CLI for setting up and running the crawler
- file logging
- can parse all information from the project page (NEW)
- can parse publication list JSON strings (NEW)
- can export project data to a CSV file (NEW)
Fixed:
- added server error messages to the error detector
- prevented duplicate RCNs in the readRCNFromDirectory method
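Reading RCNs from existing file names, while keeping each RCN only once, could look roughly like the sketch below. The naming scheme (`<rcn>.html` for project pages, `<rcn>-publications.json` for publication lists) is a hypothetical assumption, since the real templates are configurable. Under such a scheme a naive scan would report every RCN that has both files twice; collecting the numbers into a set avoids that.

```python
import re

# Hypothetical naming scheme: "<rcn>.html" for project pages and
# "<rcn>-publications.json" for publication lists. The crawler's real
# (configurable) templates are not specified in the notes.
RCN_PATTERN = re.compile(r"^(\d+)(?:\.html|-publications\.json)$")

def rcns_from_filenames(filenames):
    """Extract RCNs from file names, keeping each RCN once, in ascending order."""
    rcns = set()  # a set removes duplicates when several files share one RCN
    for name in filenames:
        match = RCN_PATTERN.match(name)
        if match:
            rcns.add(int(match.group(1)))
    return sorted(rcns)

if __name__ == "__main__":
    print(rcns_from_filenames(["123.html", "123-publications.json", "45.html", "notes.txt"]))
```

In practice the input would be something like `os.listdir(output_dir)`; files that do not match the pattern are simply ignored.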
Version 1.1.0
Features:
- two download modes: download all, or download one by RCN
- can download the project data page and the publication list JSON string (IMPROVED)
- detects CORDIS internal error messages (NEW)
- retries downloads with increasing sleep intervals
- downloads files into an output directory
- can skip already existing files
- filename templates can be configured
- can read RCNs from the names of already downloaded files instead of crawling the CORDIS list (IMPROVED)
- CLI for setting up and running the crawler
- file logging (NEW)
Fixed:
- project reference parsing
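A retry loop with increasing sleeps, combined with a check for error messages in the downloaded page, might be sketched as follows. The error marker strings and the linear delay schedule (base_delay, then 2 × base_delay, and so on) are assumptions; the notes only state that the sleeps increase between retries. `fetch` and `sleep` are passed in as parameters so the logic can be exercised without network access.

```python
import time

# Hypothetical marker strings; the actual CORDIS error messages the
# crawler detects are not listed in the notes.
ERROR_MARKERS = ("internal error", "server error")

def looks_like_error(body):
    """Return True if a downloaded page contains a known error message."""
    lowered = body.lower()
    return any(marker in lowered for marker in ERROR_MARKERS)

def download_with_retry(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns a page without an error message.

    Between attempts it sleeps base_delay, 2*base_delay, 3*base_delay, ...
    seconds (an assumed linear schedule). Raises RuntimeError when all
    attempts fail.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            body = fetch(url)
            if not looks_like_error(body):
                return body
        except OSError:
            pass  # treat I/O errors like error pages and retry
        if attempt < max_attempts:
            sleep(base_delay * attempt)  # the wait grows with each failed attempt
    raise RuntimeError(f"giving up on {url} after {max_attempts} attempts")
```

An exponential schedule (base_delay * 2 ** (attempt - 1)) would be an equally plausible reading of "increasing sleeps"; only the `sleep(...)` line would change.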
Version 1.0.0
Features:
- can download the project data page and the publication list JSON string
- two download modes: download all, or download one by RCN
- downloads files into an output directory
- can skip already existing files
- filename templates can be configured
- can download JSON files for existing project data pages
- CLI for setting up and running the crawler
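Configurable filename templates, together with skipping files that already exist, could be sketched like this. The template strings and the `{rcn}` placeholder are hypothetical; the crawler's real template syntax and defaults are not documented in these notes.

```python
import os

# Hypothetical default templates using a "{rcn}" placeholder; the crawler's
# real template syntax and defaults are not given in the notes.
DEFAULT_TEMPLATES = {
    "project": "{rcn}.html",
    "publications": "{rcn}-publications.json",
}

def output_path(output_dir, kind, rcn, templates=DEFAULT_TEMPLATES):
    """Build the output file path for one RCN from the template for `kind`."""
    filename = templates[kind].format(rcn=rcn)
    return os.path.join(output_dir, filename)

def should_download(path, skip_existing=True):
    """Return False when skipping is enabled and the target file already exists."""
    return not (skip_existing and os.path.exists(path))
```

Passing a different `templates` mapping, e.g. `{"project": "p_{rcn}.htm"}`, changes the output names without touching the download logic.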