Reads PDF transport invoices for shipments from Brazil's Federal Government to its states and extracts vaccine batch data.

mirianbr/vaccine-batches

Brazil vaccine-batches dataset

This project aims to create a public dataset of the COVID-19 vaccine batches sent by the Federal Government to each of Brazil's states between January and May 2021.

The vaccine batch information is made available by the Brazilian government agency SAGE in PDF files, one file for each vaccine distribution phase. The PDF files are transport (freight) invoices, and they contain information about each vaccine batch (including its number and expiration date). The picture below presents an example of such a document.

Figure: Transportation invoice example

The data extraction was originally done using Python, ImageMagick and pytesseract (see Jupyter Notebook). You can see the raw result here.
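As a rough illustration of how these three tools fit together, here is a minimal sketch. It is not the notebook's actual code. The function names, the 300 DPI setting, the `page-%03d.png` naming pattern and the `por` (Portuguese) Tesseract language pack are all assumptions made for this example. It also assumes the ImageMagick `convert` executable is on the PATH (newer ImageMagick releases expose it as `magick`).

```python
import subprocess
from pathlib import Path


def build_convert_command(pdf_path, png_pattern, dpi=300):
    """Build an ImageMagick command that rasterizes each PDF page to a PNG.

    Hypothetical helper: the DPI value is an assumption, not taken from the project.
    """
    # -density is placed before the input file so it applies while the PDF is read.
    return ["convert", "-density", str(dpi), str(pdf_path), str(png_pattern)]


def pdf_to_text(pdf_path, work_dir, lang="por"):
    """Rasterize a PDF with ImageMagick, then OCR every page with pytesseract.

    Returns a list with one OCR text string per page.
    """
    # Imported here so the module still loads when the OCR dependencies are absent.
    import pytesseract
    from PIL import Image

    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)

    # A %03d pattern makes ImageMagick write one zero-padded file per page.
    pattern = work_dir / "page-%03d.png"
    subprocess.run(build_convert_command(pdf_path, pattern), check=True)

    texts = []
    for page in sorted(work_dir.glob("page-*.png")):
        with Image.open(page) as image:
            texts.append(pytesseract.image_to_string(image, lang=lang))
    return texts
```

A call such as `pdf_to_text("invoice.pdf", "pages")` (both paths hypothetical) would return the raw OCR text, which still needs cleaning before batch numbers and expiration dates can be trusted.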

The following partially cleaned datasets are available:

Current status (as of Aug 6th, 2021): cleaning the dataset to make it public. Feel free to contact me if you have any thoughts, suggestions or questions.

Original data source: https://sage.saude.gov.br/sistemas/vacina/vacina_fases.php
