Creating and manipulating various protein sequence-structure datasets using Python, Julia, and other tools.

dillondaudert/proteindatasets

Protein sequence and structure datasets

This repo contains scripts for creating various protein sequence and structure datasets, along with some guides on how to use them.

Contents

proteinfeatures

Protein amino acid features.

cpdb

Working with the cullPDB dataset created in Zhou & Troyanskaya, 2014.

cpdb2

Creating a new protein sequence-structure dataset following the methods used for the cullPDB dataset, referred to as cpdb2.

psiblast

Scripts for calling NCBI BLAST+ psiblast on large FASTA files from Biopython and handling the results using multiprocessing.
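
The general pattern behind the psiblast scripts can be sketched roughly as follows. This is a minimal illustration, not the code in this repo: it uses only the standard library (`subprocess` in place of a Biopython wrapper), and the helper names `read_fasta`, `psiblast_command`, `run_one`, and `run_all` are hypothetical. It assumes `psiblast` from NCBI BLAST+ is on the `PATH`, that a protein BLAST database already exists, and that record IDs are safe to use as file names.

```python
import subprocess
from multiprocessing import Pool
from pathlib import Path


def read_fasta(text):
    """Parse FASTA text into a list of (record_id, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            # Use the first whitespace-delimited token as the record ID.
            parts = line[1:].split()
            header, chunks = (parts[0] if parts else ""), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def psiblast_command(query_path, db, pssm_path, iterations=3):
    """Build the psiblast argument list for a single-sequence query."""
    return [
        "psiblast",
        "-query", str(query_path),
        "-db", str(db),
        "-num_iterations", str(iterations),
        "-out_ascii_pssm", str(pssm_path),
    ]


def run_one(job):
    """Write one sequence to its own FASTA file and run psiblast on it."""
    record_id, sequence, db, workdir = job
    workdir = Path(workdir)
    query_path = workdir / f"{record_id}.fasta"
    pssm_path = workdir / f"{record_id}.pssm"
    query_path.write_text(f">{record_id}\n{sequence}\n")
    result = subprocess.run(
        psiblast_command(query_path, db, pssm_path),
        capture_output=True,
        text=True,
    )
    return record_id, result.returncode


def run_all(fasta_path, db, workdir, processes=4):
    """Run psiblast over every record in a FASTA file using a process pool."""
    Path(workdir).mkdir(parents=True, exist_ok=True)
    records = read_fasta(Path(fasta_path).read_text())
    jobs = [(rid, seq, db, str(workdir)) for rid, seq in records]
    with Pool(processes) as pool:
        # Results arrive in completion order; collect them by record ID.
        return dict(pool.imap_unordered(run_one, jobs))
```

Calling `run_all` with a FASTA file, a database name, and an output directory returns a dict mapping each record ID to the psiblast exit code, so failed queries can be retried separately. Splitting the input into one query per process keeps each psiblast run independent, which is why a simple process pool is enough here.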