Collections as Data files and documentation from Penn State University Libraries.
The University Libraries seek to make their digital collections available in computational formats in order to broaden access, make their contents available for research and instruction in new forms, and enhance the value of the collections to the Libraries and to our users. In doing so, the Libraries seek to follow the example and principles of the Collections as Data: Part to Whole initiative.
Questions about the data contained in this repository may be directed to the Digital Collections Librarian.
Currently, the Collections as Data program at Penn State publishes full-text transcriptions of collections for which transcriptions are readily available (through optical character recognition or manual transcription), in both plain text and JSON formats. These corpora are suitable for text analysis.
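
As a starting point for text analysis, the plain-text files can be read into memory and tokenized with the Python standard library. The following is a minimal sketch, not part of the repository's own tooling. It assumes that a collection folder contains UTF-8 `.txt` files somewhere beneath it; the function names `load_corpus` and `word_counts` are hypothetical, and the layout of the JSON files is not assumed here.

```python
from collections import Counter
from pathlib import Path
import re


def load_corpus(folder):
    """Return {relative path: text} for every .txt file found under folder.

    Assumption: transcriptions are stored as UTF-8 plain-text files.
    Undecodable bytes are replaced rather than raising an error.
    """
    root = Path(folder)
    return {
        path.relative_to(root).as_posix(): path.read_text(
            encoding="utf-8", errors="replace"
        )
        for path in sorted(root.rglob("*.txt"))
    }


def word_counts(texts):
    """Count lowercase word tokens across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # \w+ is Unicode-aware in Python 3, so accented letters stay in the token.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts


# Example usage (the folder name is taken from the key table in this README):
#   corpus = load_corpus("pst_ces")
#   for word, n in word_counts(corpus.values()).most_common(10):
#       print(word, n)
```

OCR output typically contains recognition errors, so counts produced this way are best treated as approximate.
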
Collections are named according to the identifiers provided for them through the Penn State Digital Object Guidelines, as defined in the Digital Preservation Policy. A key for associating identifiers with their collections is provided below.
The "scripts" directory contains tools, mostly written in Python, for extracting data from CONTENTdm and re-encoding it in other formats suitable for computational use.
Folder Name | Collection Name |
---|---|
pst_0019379118 | Transactions of the Northeast Section, The Wildlife Society |
pst_ces | Cahiers Césairiens |
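
For scripted work, the key above can be turned into a lookup from folder name to collection name. The sketch below is illustrative only: the function name `parse_key_table` is hypothetical, and it assumes the table keeps its current two-column pipe-delimited form.

```python
KEY_TABLE = """\
Folder Name | Collection Name |
---|---|
pst_0019379118 | Transactions of the Northeast Section, The Wildlife Society |
pst_ces | Cahiers Césairiens |
"""


def parse_key_table(markdown):
    """Map folder identifiers to collection names from a two-column pipe table."""
    mapping = {}
    for line in markdown.splitlines():
        # Drop outer pipes, then split the row into its cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue  # not a two-column table row
        folder, name = cells
        # Skip the header row and the ---|--- separator row.
        if folder == "Folder Name" or set(folder) <= set("-: "):
            continue
        mapping[folder] = name
    return mapping


collections = parse_key_table(KEY_TABLE)
print(collections["pst_ces"])
```
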