Aggregate new samples to previous analysis #89

Closed
geocarvalho opened this issue Aug 8, 2024 · 4 comments
Comments

@geocarvalho

Hi, thanks for making FRASER open-source.
I ran FRASER on 250 samples and want to add new samples to the analysis.
How do you recommend processing a new BAM file and aggregating its counts with the previous counts from the first project?

Sorry to ask in the issues section, but I couldn't find an e-mail address for that.
Thank you!

@AtaJadidAhari
Collaborator

Hi @geocarvalho,

When you add a new sample, the following steps happen if recount=True:

  1. FRASER reads the split_reads for already counted samples from cached data
  2. FRASER counts the split_reads for newly added samples
  3. FRASER merges the split_reads and creates an updated spliceSites object
  4. FRASER counts the non_split_reads for all samples using the new spliceSites

Since adding a new sample updates the spliceSites, we have to recount the non_split_reads for all samples.
We are thinking about optimizing this step to make it faster, but unfortunately there is currently no other way.
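
As a rough illustration of why step 4 touches every sample, here is a toy Python sketch. It is not FRASER code: the helper names (`splice_sites`, `count_non_split`), the coordinates, and the simplification of an unsplit read to a `(start, end)` span are all assumptions made for illustration. The `unsplit_reads` dictionary stands in for the BAM files.

```python
from typing import Dict, List, Set, Tuple

Junction = Tuple[int, int]  # (donor position, acceptor position)
ReadSpan = Tuple[int, int]  # (start, end) of an unsplit read


def splice_sites(junctions_by_sample: Dict[str, Set[Junction]]) -> Set[int]:
    """Union of all donor/acceptor positions seen in any sample."""
    sites: Set[int] = set()
    for junctions in junctions_by_sample.values():
        for donor, acceptor in junctions:
            sites.update((donor, acceptor))
    return sites


def count_non_split(reads: List[ReadSpan], sites: Set[int]) -> Dict[int, int]:
    """Count unsplit reads that span each splice site."""
    return {
        site: sum(start < site < end for start, end in reads)
        for site in sorted(sites)
    }


# Hypothetical toy data. The junctions of s1 and s2 come from the cache (step 1).
cached_junctions = {"s1": {(100, 200)}, "s2": {(100, 200)}}
# Stand-in for the BAM files of all samples, old and new.
unsplit_reads = {
    "s1": [(90, 110), (290, 310)],
    "s2": [(190, 210)],
    "s3": [(95, 105), (390, 410)],
}

# Step 2: count split reads for the new sample only.
new_junctions = {"s3": {(100, 200), (300, 400)}}

# Step 3: merge and rebuild the splice-site set.
merged = {**cached_junctions, **new_junctions}
old_sites = splice_sites(cached_junctions)
new_sites = splice_sites(merged)
added_sites = new_sites - old_sites  # sites no old sample was ever counted at

# Step 4: every sample, including s1 and s2, needs counts at the new site set,
# which is only possible if its reads are still available.
non_split = {
    sample: count_non_split(unsplit_reads[sample], new_sites)
    for sample in merged
}
```

In this toy case the new sample introduces the sites 300 and 400, and the old sample `s1` has an unsplit read covering 300 that was never counted before, so its reads have to be scanned again.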

Best,

Ata

@geocarvalho
Author

Hi @AtaJadidAhari, I have a question about whether I need to keep the BAM files for all the samples when I need to rerun the analysis with a new sample. In the documentation, there is a topic titled "Creating a FraserDataSet from existing count matrices", but I'm unsure how to create the junctionCts and spliceSiteCts, and whether it's necessary to have all the BAM files available.

@AtaJadidAhari
Collaborator

AtaJadidAhari commented Aug 28, 2024

Hi @geocarvalho ,
Unfortunately, you need the BAM files to count the non-split reads at the new junctions that will probably be added when aggregating new samples.
For future reference, running snakemake exportCounts --rerun-triggers mtime from DROP will create the required external files, as explained in the documentation.

@geocarvalho
Author

Thanks, @AtaJadidAhari. Keeping all the BAM files takes a lot of storage space in my case.
I'm unable to run DROP in my environment, so I was trying to use the FRASER script.
