Collections as Data

Collections as Data files and documentation from Penn State University Libraries.

The University Libraries seek to make their digital collections available in computational formats in order to broaden access, to open the collections' contents to research and instruction in new forms, and to enhance the value of the collections to the Libraries and to our users. In doing so, the Libraries seek to follow the example and the principles of the Collections as Data: Part to Whole initiative.

Questions about the data contained in this repository may be directed to the Digital Collections Librarian.

About The Data

Currently, the Collections as Data program at Penn State publishes full-text transcriptions, in both plain-text and JSON formats, of those collections for which transcriptions are readily available (whether produced through optical character recognition or manual transcription). These corpora are suitable for text analysis.
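As a minimal sketch of the kind of text analysis these corpora support, the Python below counts word frequencies across the plain-text files in one collection folder. It assumes, hypothetically, that each transcription is a UTF-8 `.txt` file somewhere under the collection folder (for example `pst_ces/`). The actual file layout in this repository may differ, and the helper name `word_frequencies` is illustrative only.

```python
import re
from collections import Counter
from pathlib import Path

# Runs of letters only (digits and underscores excluded); accented letters such as "é" match.
WORD_RE = re.compile(r"[^\W\d_]+")


def word_frequencies(folder: str) -> Counter:
    """Count lowercased words across every .txt file under `folder` (hypothetical helper).

    Assumption: transcriptions are UTF-8 plain-text files with a .txt extension.
    """
    counts: Counter = Counter()
    for path in sorted(Path(folder).rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        counts.update(word.lower() for word in WORD_RE.findall(text))
    return counts
```

For example, `word_frequencies("pst_ces").most_common(10)` would list the ten most frequent words in the Cahiers Césairiens folder, assuming the code runs from the repository root and the folder holds `.txt` files.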

Collections are named according to the identifiers provided for them through the Penn State Digital Object Guidelines, as defined in the Digital Preservation Policy. A key for associating identifiers with their collections is provided below.

The "scripts" directory contains tools, mostly written in Python, for extracting data from CONTENTdm and re-encoding it in other formats suitable for computational use.
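The repository's own scripts are not reproduced here. As a hedged illustration of the re-encoding step only, the sketch below bundles a folder of plain-text transcriptions into a single JSON file. The schema (a list of objects with `id` and `text` keys, where `id` is the file name without its extension), the helper name `txt_folder_to_json`, and the folder layout are all assumptions, not the format the Libraries actually publish.

```python
import json
from pathlib import Path


def txt_folder_to_json(folder: str, out_file: str) -> int:
    """Write every .txt file under `folder` to one JSON array (hypothetical schema).

    Each record is {"id": <file stem>, "text": <file contents>}; returns the record count.
    """
    records = [
        {"id": path.stem, "text": path.read_text(encoding="utf-8")}
        for path in sorted(Path(folder).rglob("*.txt"))
    ]
    with open(out_file, "w", encoding="utf-8") as fh:
        # ensure_ascii=False keeps accented characters (e.g. "Césairiens") readable in the output.
        json.dump(records, fh, ensure_ascii=False, indent=2)
    return len(records)
```

Writing UTF-8 with `ensure_ascii=False` is a design choice for readability; the default `ensure_ascii=True` would produce equivalent JSON with `\u` escapes instead.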

Guide to Collection Folders

Folder Name       Collection Name
pst_0019379118    Transactions of the Northeast Section, The Wildlife Society
pst_ces           Cahiers Césairiens
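The folder key in the guide above can also be kept in code, for instance to label analysis results with a readable collection name. This is a minimal sketch: the dictionary simply transcribes the guide's two rows, and the `collection_name` helper is hypothetical.

```python
from pathlib import PurePath

# Folder identifiers and collection names, transcribed from the guide above.
COLLECTION_FOLDERS = {
    "pst_0019379118": "Transactions of the Northeast Section, The Wildlife Society",
    "pst_ces": "Cahiers Césairiens",
}


def collection_name(folder: str) -> str:
    """Return the collection name for a folder identifier or path (hypothetical helper).

    Only the last path component is looked up; unknown identifiers are returned unchanged.
    """
    key = PurePath(folder).name
    return COLLECTION_FOLDERS.get(key, folder)
```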