Skip to content

What the Package Does (One Line, Title Case)

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md
Notifications You must be signed in to change notification settings

CCICB/TCGAexpression

Repository files navigation

TCGAexpression

⚠️ Warning: This package is in early development and not yet ready for use

Lifecycle: experimental CRAN status

TCGAexpression provides easy access to TCGA expression datasets in R. Optimised for easy visualisation and analysis with express

Installation

You can install the development version of TCGAexpression like so:

# install.packages("remotes")
remotes::install_github('CCICB/TCGAexpression')

Quick Start

High Memory Systems

If you have sufficient memory to read very large dataframes into memory:

library(TCGAexpression)

# See available datasets
tcga_expression_available()

# Load available datasets into memory
tcga_expression_load(cohorts="GBM")

# Load just specific genes memory
tcga_expression_load(cohorts="GBM", c('TP53', 'PTEN'))

Memory Efficient Database Approach

For all devices, but particularly low-memory devices, it will be faster to download this database as an sqlite db (consider Parquet files with arrow interface as this will be easier to store and download from github) - Bioconductor experimenthub might allow storage of these as indexed files as a github alternative

# Download 
tcga_expression_download(cohors = "GBM") #try cohorts = "Everything" to download the full SQLITE db

# Create a Connection 
tcga_expression_connect()

Finding Expression Datasets

Source Description
TCGA PanCanAtlas

Contains gene expression (and lots of other modalities) for 33 tumor types profiled by TCGA

File: EBPlusPlusAdjustPANCAN_IlluminaHiSeq_RNASeqV2.geneExp.tsv

cBioPortal Many datasets including TCGA cohorts
openPBTA

Open paediatric brain tumour atlas includes over 1,074 pediatric brain tumors and 22 patient-derived cell lines. This data can be downloaded from a public cavatica project. Note genomic and histological

File: pbta-gene-expression-rsem-tpm.polya.rds

OR if you want the stranded data

File: pbta-gene-expression-rsem-tpm.stranded.rds

See data-files-description.md for details

OpenPedCan

Children’s Hospital of Philadelphia is an open analysis effort that harmonizes pediatric cancer data from multiple sources:

  • TARGET

  • Kids First Neuroblastoma

  • OpenPBTA

  • GTEx RNA-Seq

  • TCGA RNA-Seq

  • CHOP DGD Targeted DNA and Fusion Panel Sequencing

UCSCtreehouse

Contains expression datasets describing both child and adult patient cancer tumors.

See the Newest Tumor Compendium: v11 (polyA expression data)

File: gene expression RNAseq - RSEM log2(TPM + 1) normalized

XENA datasets Contains hundreds of cohorts TCGA, PCAWG datasets including methylation, rnaseq, atac-seq, wgs data. (includes treehouse data)

About

What the Package Does (One Line, Title Case)

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages