Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FRASER Resources and Run Time #49

Open
lemdcock opened this issue Feb 3, 2023 · 0 comments
Open

FRASER Resources and Run Time #49

lemdcock opened this issue Feb 3, 2023 · 0 comments

Comments

@lemdcock
Copy link

lemdcock commented Feb 3, 2023

Dear C. Mertes,

At the moment I'm running FRASER on 61 untreated PBMC samples and 60 treated PBMC samples. However we are running into some issues with HPC infrastructure. The HPC system has limited the compute time for all analysis to 70h, based on the current output of the FRASER model fit step, I'm afraid these 70h will be insufficient time to finish:

Wed Feb  1 12:08:32 2023: Fit step for: 'psi5'.
Wed Feb  1 12:08:34 2023: Running fit with correction method: PCA
Wed Feb  1 12:08:36 2023: Computing PCA ...
Wed Feb  1 12:10:44 2023: Fitting rho ...
Thu Feb  2 06:22:23 2023: rho fit:
             Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
        0.0000000 0.0000001 0.0000001 0.0039224 0.0000001 0.6091761
Thu Feb  2 06:27:56 2023: Compute p values for: 'psi5'.
Fri Feb  3 00:16:14 2023: Adjust p values for: 'psi5'.
Fri Feb  3 00:17:03 2023: Compute Z scores for: 'psi5'.

Fri Feb  3 00:21:16 2023: Fit step for: 'psi3'.
Fri Feb  3 00:21:28 2023: Running fit with correction method: PCA
Fri Feb  3 00:21:30 2023: Computing PCA ...
Fri Feb  3 00:23:18 2023: Fitting rho ...

Furthermore I previously provided FRASER with 90GB RAM for the analysis, which was unfortunately insufficient and I restarted the analysis with 200GB of RAM. Is this normal behaviour for FRASER with these sample amounts? Or can we speed up the model fitting process? Or can I fit each parameter separately in different jobsubmissions?

The script I currently use is the following:

############ NVQ_Run093-NVQ_Run642 Untreated Samples FRASER Analysis: Fit Model and Call Outliers ############

#### Load Required Libraries ####
library(FRASER)
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
library(org.Hs.eg.db)

#### Define Global Parameters ####
WORK_DIR <- '/data/gent/shared/001/gvo00101/RNA-Seq_in_Diagnostics/Aberrant_Splicing/FRASER/work/'
OUT_DIR <- '/data/gent/shared/001/gvo00101/RNA-Seq_in_Diagnostics/Aberrant_Splicing/FRASER/output/'
register(MulticoreParam(workers = 10))

#### Read in FRASER Object ####
fds <- readRDS(paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Filtered_Optim_psi5_psi3_and_theta.rds', sep = ''))

#### Detection of Aberrant Splicing Events ####
# Fit the Splicing Model
fds <- FRASER(fds, q=c(psi5=5, psi3=5, theta=5))

# Annotate Introns
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
orgDb <- org.Hs.eg.db

fds <- annotateRangesWithTxDb(fds, txdb=txdb, orgDb=orgDb)

# Call Splicing Outliers
res <- results(fds, zScoreCutoff=NA, padjCutoff = 0.999999, deltaPsiCutoff=0.0000001)

#### Save Objects ####
# Processed FRASER Object
FDS_File <- paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Processed.rds', sep = '')

saveRDS(fds, FDS_File)

# Results Object
RES_File <- paste(OUT_DIR,'NVQ_Run093-NVQ_Run642_FRASER_Untreated_Results.rds', sep = '')

saveRDS(res, RES_File)

#### Generate Results File for Each Sample ####
Samples <- unique(res$sampleID)

for (s in Samples){
  Results_Samples <- res[res$sampleID == s]
  
  write.xlsx(Results_Samples, paste(OUT_DIR, 'Untreated/','NVQ_Run093-NVQ_Run642_FRASER_Untreated_Results_deltaPsi_0_padj_1_', s, '.xlsx', sep = ''))
} 

Kind regards,
Laurenz De Cock

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant