This repository has been archived by the owner on Aug 3, 2019. It is now read-only.

Tool for creating static mapping files from multiple data types to the reactions and pathways in the Reactome graph database.

PathwayAnalysisPlatform/Extractor

Extractor Module

This is a command-line program written in Java. It is a module of PathwayMatcher.

This module gathers reference biological data necessary to perform pathway search and analysis, and creates static mapping files that are loaded during execution of PathwayMatcher.

The extractor has two main components: one maps genetic variants to genes and proteins, and the other maps proteins and proteoforms to reactions and pathways.

Image of reference data extraction process

The necessary mappings for the pathway search are:

Image of static mappings for pathway search

  • SNP --> Gene name
  • SNP --> Protein (UniProt accession)
  • Protein --> Proteoforms
  • Protein --> Reactions
  • Proteoform --> Reactions
  • Reactions --> Pathways
  • Pathways --> Top Level Pathways
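The mappings above can be followed in sequence to go from a SNP to its top level pathways. The following is a minimal sketch of that chaining, assuming each mapping is held in memory as a `Map<String, Set<String>>`. The class name, method names, and identifiers (`rsX`, `P_A`, `R_1`, ...) are hypothetical placeholders and are not taken from the Extractor or PathwayMatcher code.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: chains the static mappings listed above.
class MappingChainSketch {

    // Helper to build a one-entry mapping for the example.
    static Map<String, Set<String>> mapping(String key, String... values) {
        Map<String, Set<String>> result = new HashMap<>();
        result.put(key, new TreeSet<>(Arrays.asList(values)));
        return result;
    }

    // One mapping step: the union of the values of all given keys.
    // Keys that are missing from the mapping contribute nothing.
    static Set<String> hop(Set<String> keys, Map<String, Set<String>> map) {
        Set<String> result = new TreeSet<>();
        for (String key : keys) {
            result.addAll(map.getOrDefault(key, Collections.<String>emptySet()));
        }
        return result;
    }

    // SNP --> Protein --> Reactions --> Pathways --> Top Level Pathways
    static Set<String> topLevelPathwaysForSnp(
            String snp,
            Map<String, Set<String>> snpToProteins,
            Map<String, Set<String>> proteinsToReactions,
            Map<String, Set<String>> reactionsToPathways,
            Map<String, Set<String>> pathwaysToTopLevel) {
        Set<String> proteins = hop(Collections.singleton(snp), snpToProteins);
        Set<String> reactions = hop(proteins, proteinsToReactions);
        Set<String> pathways = hop(reactions, reactionsToPathways);
        return hop(pathways, pathwaysToTopLevel);
    }

    public static void main(String[] args) {
        // Placeholder identifiers, for illustration only.
        Set<String> topLevel = topLevelPathwaysForSnp(
                "rsX",
                mapping("rsX", "P_A"),
                mapping("P_A", "R_1"),
                mapping("R_1", "PW_1"),
                mapping("PW_1", "TOP_1"));
        System.out.println(topLevel);
    }
}
```

Keeping every step as a plain lookup is what allows the mappings to be computed once and stored as static files.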

The necessary mappings for the interaction networks are:

Image of static mappings for protein interaction networks

Image of static mappings for proteoform interaction networks

Genetic variants

  • VepFolderProcessor: Creates table files with the mapping of genetic variants to gene names and protein UniProt [1] accessions using the Variant Effect Predictor [2].

Input:

No file is needed as input.

Output:

Tables with the mapping from genetic variants to gene names and SwissProt entries (UniProt). One table for each chromosome: 1.gz, 2.gz,...,22.gz
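As a rough illustration of working with the per-chromosome tables, the sketch below lists the expected file names (`1.gz` to `22.gz`) and reads gzip-compressed text line by line with the standard `java.util.zip` classes. The column layout of the tables is not described here, so the rows in the example are placeholders. The class and method names are hypothetical, and the sketch works on in-memory bytes rather than real files.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: naming and reading the per-chromosome tables.
class ChromosomeTablesSketch {

    // One table per autosome: 1.gz, 2.gz, ..., 22.gz
    static List<String> tableNames() {
        List<String> names = new ArrayList<>();
        for (int chromosome = 1; chromosome <= 22; chromosome++) {
            names.add(chromosome + ".gz");
        }
        return names;
    }

    // Example helper: gzip-compress text lines in memory.
    static byte[] gzip(List<String> lines) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(
                new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
            for (String line : lines) {
                writer.write(line);
                writer.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Read all lines of a gzip-compressed text table.
    static List<String> readLines(byte[] gzipped) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzipped)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(tableNames());
        // Placeholder rows; the real column layout is an unknown here.
        byte[] table = gzip(Arrays.asList("placeholder row 1", "placeholder row 2"));
        System.out.println(readLines(table));
    }
}
```

To read an actual table, the `ByteArrayInputStream` would be replaced by a `FileInputStream` opened on one of the listed files.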

Reactions and Pathways

  • Extractor: Creates the mapping files to go from gene names, proteins and proteoforms to reactions and pathways of Reactome [3].

Set up requirements:

Input:

Tables generated with VepFolderProcessor: 1.gz, 2.gz,...,22.gz

Output:

Serialized files ready to be used by PathwayMatcher:

  • Entity lists:

    • proteins.gz
    • reactions.gz
    • pathways.gz
  • Static mappings for pathway search:

    • Pairs of chromosome and base pair to protein UniProt accessions: chrBpToProteins1.gz,...,chrBpToProteins22.gz
    • SNP rsIds to protein UniProt accessions: rsIdsToProteins1.gz,..., rsIdsToProteins22.gz
    • Gene names to protein UniProt accessions: genesToProteins.gz
    • Ensembl protein identifiers to UniProt accessions: ensemblToProteins.gz
    • Protein UniProt accessions to proteoforms: proteinsToProteoforms.gz
    • Protein UniProt accessions to reactions: proteinsToReactions.gz
    • Proteoforms to reactions: proteoformsToReactions.gz
    • Pathways to top level pathways: pathwaysToTopLevelPathways.gz
  • Static mappings for interaction networks:

    • Protein UniProt accessions to the complexes they can form: proteinsToComplexes.gz
    • Protein UniProt accessions to entity sets: proteinsToSets.gz
    • Proteoforms to complexes: proteoformsToComplexes.gz
    • Proteoforms to entity sets: proteoformsToSets.gz
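The exact serialization format of these files is not specified in this README. As a minimal sketch, assuming standard Java object serialization wrapped in gzip compression, one mapping could be written and read back as shown below. The class name, the `HashMap<String, TreeSet<String>>` representation, and the identifiers are hypothetical; the real files may use a different structure.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.TreeSet;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: a mapping serialized with gzip compression.
class SerializedMappingSketch {

    // Serialize the mapping and compress it (in memory for the example).
    static byte[] write(HashMap<String, TreeSet<String>> mapping) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out =
                new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(mapping);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The streams are closed here, so the gzip trailer has been written.
        return bytes.toByteArray();
    }

    // Decompress and deserialize the mapping.
    @SuppressWarnings("unchecked")
    static HashMap<String, TreeSet<String>> read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return (HashMap<String, TreeSet<String>>) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder identifiers, for illustration only.
        HashMap<String, TreeSet<String>> proteinsToReactions = new HashMap<>();
        proteinsToReactions.put("PROTEIN_A",
                new TreeSet<>(Arrays.asList("REACTION_1", "REACTION_2")));
        byte[] stored = write(proteinsToReactions);
        System.out.println(read(stored));
    }
}
```

Swapping the byte array streams for `FileOutputStream` and `FileInputStream` would turn this into a file such as `proteinsToReactions.gz`, under the same assumptions.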

Peptides

  • ExtractorPeptides: Gathers the 'Proteotypic Peptide' set from ProteomeTools [4] into a single list file.

    This is an extra command line application that was used as support during the development process of PathwayMatcher. It is not needed for the main functionality.

Protein modifications

  • ExtractorPsiMod: HTTP client application that gathers the available modifications from the PSI-MOD [5] community standard for representation of protein modification data.

    This is also an extra command line application not needed for the main functionality, but useful in case a user wants to get the list of available modifications programmatically.
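To illustrate what listing the modifications programmatically could look like, here is a minimal sketch that extracts term identifiers and names from text in the OBO flat file format, in which the PSI-MOD ontology is distributed. It does not reproduce ExtractorPsiMod: the download step is left out, the class and method names are hypothetical, and the sample text in `main` is an illustrative excerpt and not the real file contents.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: collect id --> name for [Term] stanzas in OBO text.
class PsiModOboSketch {

    static Map<String, String> parseTerms(String obo) {
        Map<String, String> idToName = new LinkedHashMap<>();
        boolean inTerm = false;
        String currentId = null;
        for (String rawLine : obo.split("\\r?\\n")) {
            String line = rawLine.trim();
            if (line.startsWith("[")) {
                // A new stanza starts; only [Term] stanzas are of interest.
                inTerm = line.equals("[Term]");
                currentId = null;
            } else if (inTerm && line.startsWith("id:")) {
                currentId = line.substring("id:".length()).trim();
            } else if (inTerm && currentId != null && line.startsWith("name:")) {
                idToName.put(currentId, line.substring("name:".length()).trim());
            }
        }
        return idToName;
    }

    public static void main(String[] args) {
        // Illustrative excerpt in OBO syntax.
        String sample = String.join("\n",
                "format-version: 1.2",
                "",
                "[Term]",
                "id: MOD:00046",
                "name: O-phospho-L-serine",
                "",
                "[Typedef]",
                "id: derives_from",
                "name: derives from");
        for (Map.Entry<String, String> term : parseTerms(sample).entrySet()) {
            System.out.println(term.getKey() + "\t" + term.getValue());
        }
    }
}
```

The `[Typedef]` stanza in the sample is skipped, since only `[Term]` stanzas describe modifications.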

References

[1] UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Research 46, 2699, doi:10.1093/nar/gky092 (2018).
[2] McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biology 17, 122, doi:10.1186/s13059-016-0974-4 (2016).
[3] Fabregat, A. et al. The Reactome Pathway Knowledgebase. Nucleic Acids Research 46, D649-D655, doi:10.1093/nar/gkx1132 (2018).
[4] Desiere, F. et al. The PeptideAtlas project. Nucleic Acids Research 34, D655-D658 (2006).
[5] Montecchi-Palazzi, L. et al. The PSI-MOD community standard for representation of protein modification data. Nature Biotechnology 26, 864, doi:10.1038/nbt0808-864 (2008).
