Releases: juzraai/cordis-projects-crawler

Version 2.0.0

28 Apr 21:21

Version 1.3.0

09 Dec 18:09

Features:

  • two download modes: download all, or download one by RCN
  • can download project data page and publication list JSON string
  • detects CORDIS internal error messages
  • retries downloads with increasing sleeps
  • downloads files into output directory
  • can skip already existing files
  • filename templates can be configured
  • can read RCNs from already downloaded files' names instead of crawling CORDIS list
  • CLI for setting up and running the crawler
  • file logging
  • can parse all info from project page (IMPROVED)
  • can parse publication list JSON strings (IMPROVED)
  • can export projects' data to CSV file
  • can export all data to MySQL database (NEW)
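
The "retries downloads with increasing sleeps" item can be pictured with the minimal Python sketch below. It is an illustration only, not the crawler's own code: the function name, the attempt limit, and the linear sleep schedule are all assumptions.

```python
import time


def download_with_retries(fetch, max_attempts=4, base_sleep=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, sleeping longer after each failure.

    Every name and default here is hypothetical and chosen for illustration.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except IOError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # Sleep 1x, 2x, 3x ... the base interval before the next try.
            sleep(base_sleep * attempt)
```

The `sleep` parameter lets a caller pass a stub, so the schedule can be checked without waiting.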

Version 1.2.0

30 Nov 15:54

Features:

  • two download modes: download all, or download one by RCN
  • can download project data page and publication list JSON string
  • detects CORDIS internal error messages (IMPROVED)
  • retries downloads with increasing sleeps
  • downloads files into output directory
  • can skip already existing files
  • filename templates can be configured
  • can read RCNs from already downloaded files' names instead of crawling CORDIS list (IMPROVED)
  • CLI for setting up and running the crawler
  • file logging
  • can parse all info from project page (NEW)
  • can parse publication list JSON strings (NEW)
  • can export projects' data to CSV file (NEW)

Fixed:

  • added server error messages to detector
  • avoided RCN redundancy in readRCNFromDirectory method
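
As a rough illustration of reading RCNs from downloaded file names without duplicates, the Python sketch below collects them into a set. The file-name layout (`project-<RCN>.html`, `publications-<RCN>.json`) and the function name are assumptions. The crawler's actual `readRCNFromDirectory` method may work differently.

```python
import os
import re


def read_rcns_from_directory(directory, pattern=r"(\d+)"):
    """Return the sorted, de-duplicated RCNs found in the file names.

    Assumes (hypothetically) that each file name contains the RCN as its
    first run of digits, e.g. project-12345.html.
    """
    rcns = set()  # a set keeps each RCN once, even if two files share it
    for name in os.listdir(directory):
        match = re.search(pattern, name)
        if match:
            rcns.add(int(match.group(1)))
    return sorted(rcns)
```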

Version 1.1.0

26 Nov 11:21

Features:

  • two download modes: download all, or download one by RCN
  • can download project data page and publication list JSON string (IMPROVED)
  • detects CORDIS internal error messages (NEW)
  • retries downloads with increasing sleeps
  • downloads files into output directory
  • can skip already existing files
  • filename templates can be configured
  • can read RCNs from already downloaded files' names instead of crawling CORDIS list (IMPROVED)
  • CLI for setting up and running the crawler
  • file logging (NEW)

Fixed:

  • project reference parsing
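
Detecting an error page instead of real project data could look roughly like the Python sketch below. The marker strings are placeholders invented for illustration. The release notes do not say which CORDIS messages the crawler actually checks for.

```python
# Hypothetical marker strings; the real CORDIS messages are not listed here.
ERROR_MARKERS = ("internal error", "service unavailable")


def looks_like_error_page(body):
    """Return True if the downloaded text contains a known error marker."""
    text = body.lower()
    return any(marker in text for marker in ERROR_MARKERS)
```

A download flagged this way would be treated as failed and retried rather than saved.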

Version 1.0.0

25 Nov 10:02

Features:

  • can download project data page and publication list JSON string
  • two download modes: download all, or download one by RCN
  • downloads files into output directory
  • can skip already existing files
  • filename templates can be configured
  • can download JSONs for existing project data pages
  • CLI for setting up and running the crawler
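
Configurable filename templates and skipping existing files might combine as in the short Python sketch below. The `{rcn}` placeholder syntax and both function names are assumptions made for illustration. The crawler's real template format is not documented in these notes.

```python
import os


def target_path(output_dir, template, rcn):
    """Build the output path from a template such as 'project-{rcn}.html'.

    The {rcn} placeholder is a hypothetical syntax, not the crawler's own.
    """
    return os.path.join(output_dir, template.format(rcn=rcn))


def save_if_missing(path, fetch):
    """Write fetch() to path unless the file exists; return True if written."""
    if os.path.exists(path):
        return False  # already downloaded, skip the request entirely
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(fetch())
    return True
```

Passing the download as a callable means no request is made for files that are already present.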