jhajagos/TransformDBtoHDF5ML

TransformDBtoHDF5ML

The library transforms rows from a relational database table into a nested document and then into a standard matrix file format. The document is built from nested dictionaries and serialized as human-readable JSON. The matrix is stored as HDF5, a self-describing format that can be read by a wide range of scientific programming environments, including MATLAB, Python scikits (via h5py), Mathematica, and R.
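The two stages described above can be illustrated with a minimal sketch. It is not the library's implementation: the `labs` table, its columns, the function names, the dotted column labels, and the choice of NaN for missing values are all assumptions made for this example, and only the Python standard library is used.

```python
import json
import sqlite3

# Hypothetical source table: one row per (encounter, lab test) pair.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE labs (encounter_id INTEGER, age INTEGER, test TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO labs VALUES (?, ?, ?, ?)",
    [
        (1, 67, "sodium", 139.0),
        (1, 67, "glucose", 105.0),
        (2, 54, "sodium", 142.0),
    ],
)


def rows_to_documents(connection):
    """Group flat rows into one nested dictionary per encounter."""
    documents = {}
    cursor = connection.execute(
        "SELECT encounter_id, age, test, value FROM labs ORDER BY encounter_id, test"
    )
    for encounter_id, age, test, value in cursor:
        # JSON object keys must be strings, so the row key is stringified.
        doc = documents.setdefault(str(encounter_id), {"age": age, "labs": {}})
        doc["labs"][test] = value
    return documents


def documents_to_matrix(documents):
    """Flatten nested documents into row ids, column labels, and a dense matrix."""
    tests = sorted({t for d in documents.values() for t in d["labs"]})
    columns = ["age"] + ["labs." + t for t in tests]
    row_ids = sorted(documents)
    matrix = []
    for row_id in row_ids:
        doc = documents[row_id]
        # Assumption for this sketch: a missing lab value is encoded as NaN.
        labs = [doc["labs"].get(t, float("nan")) for t in tests]
        matrix.append([float(doc["age"])] + labs)
    return row_ids, columns, matrix


documents = rows_to_documents(conn)
document_json = json.dumps(documents, indent=2, sort_keys=True)
row_ids, columns, matrix = documents_to_matrix(documents)
```

Here `document_json` holds the intermediate human-readable document, while `columns` and `row_ids` carry the labels that would make the stored matrix self-describing. Writing `matrix` and its labels to an HDF5 file would additionally need a library such as h5py, which is left out so the sketch runs without extra dependencies.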

This code started out as a mapper that converts relational data into a format that can easily be used to train machine learning algorithms for hospital readmission and quality work. The examples in the tests are built around this use case. The two programs, "build_document_mapping_from_db.py" and "build_hdf5_matrix_from_document.py", are not limited to the readmission use case and are designed to scale with data size.

Documentation for configuring the data maps and running the Python scripts is here: https://github.com/jhajagos/TransformDBtoHDF5ML/tree/master/documentation
