v1.4.0 : no tmp .bout files #18

Open
Proginski opened this issue Sep 26, 2023 · 3 comments
Labels
bug Something isn't working

Comments


Proginski commented Sep 26, 2023

Dear genEra developers,

Describe the bug
The A. thaliana CDS I am using do not get dated.
I already succeeded in using genEra v1.4.0 with a subset of the H. sapiens CDS.
Now, using the enclosed FASTA, it does not work, even when providing 500 GB of RAM for 262 GB of results.
Note that I performed the same analysis (same command) with v1.2.0 and it went perfectly fine (except that it took longer, of course).
I would have bet that the problem is caused by the "|" character in the middle of the CDS names, but it worked with the previous version.
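
One way to test the "|" hypothesis, or to work around it, is to rewrite the FASTA headers before running genEra. The following is a minimal sketch, not part of genEra: the function name `sanitize_fasta_headers` and the choice of "_" as replacement are assumptions made for illustration, and the example sequence is a placeholder. It replaces "|" in header lines only and keeps a mapping back to the original IDs so the results can be relabelled afterwards.

```python
def sanitize_fasta_headers(lines, bad="|", replacement="_"):
    """Replace `bad` in FASTA header lines and return (new_lines, id_map).

    Hypothetical helper for illustration; genEra does not ship this function.
    id_map maps each sanitised sequence ID back to its original ID.
    """
    sanitized = []
    id_map = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            tokens = line[1:].split(None, 1)
            if not tokens:
                raise ValueError("FASTA header without a sequence ID")
            # The sequence ID is the first whitespace-delimited token.
            original_id = tokens[0]
            new_id = original_id.replace(bad, replacement)
            # Refuse to continue if two different IDs would become identical.
            if new_id in id_map and id_map[new_id] != original_id:
                raise ValueError(f"ID collision after sanitising: {new_id}")
            id_map[new_id] = original_id
            line = ">" + line[1:].replace(bad, replacement)
        # Sequence lines are passed through unchanged.
        sanitized.append(line)
    return sanitized, id_map


example = [
    ">lcl|NC_000932.1_cds_NP_051037.1_48181",
    "MTAILERRES",  # placeholder sequence, not the real protein
]
clean, id_map = sanitize_fasta_headers(example)
print(clean[0])
print(id_map)
```

If the sanitised file is dated correctly while the original is not, that would point to the "|" character as the cause.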

To Reproduce
Steps to reproduce the behaviour, e.g.

genEra \
-t 3702 \
-q CDS/cds_from_genomic.faa \
-b /diamonddb/NR_DB/nr \
-n 75 \
-r ncbi_lineages_2023-07-12.csv

Expected behaviour
The ages are not assigned:
#gene phylostratum rank taxonomic_representativeness
lcl|NC_000932.1_cds_NP_051037.1_48181 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051038.1_48226 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051039.1_48182 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051040.2_48183 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051041.1_48184 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051042.1_48185 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051043.1_48186 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051044.1_48187 Absent from the DIAMOND/MMseqs2 results NA NA
lcl|NC_000932.1_cds_NP_051045.1_48188 Absent from the DIAMOND/MMseqs2 results NA NA

Screenshots or code
Here are the last lines of the err file (16 MB of similar 'No such file or directory' lines):

awk: cannot open /store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003070.9_cds_NP_001321941.1_644.bout (No such file or directory)
rm: cannot remove '/store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003070.9_cds_NP_177334.1_10947.bout': No such file or directory
awk: cannot open /store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003070.9_cds_NP_565027.1_10948.bout (No such file or directory)
rm: cannot remove '/store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_lcl|NC_003071.7_cds_NP_001323584.1_19320.bout': No such file or directory
.................................................. 1M
.................................................. 2M
.................................................. 3M
.................................................. 4M
...
[mclIO] writing </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.mci>
.......................................
[mclIO] wrote native interchange 48227x48227 matrix with 4144755 entries to stream </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.mci>
[mclIO] wrote 48227 tab entries to stream </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.tab>
[mcxload] tab has 48227 entries
[mclIO] reading </store/EQUIPES/BIM/MEMBERS/paul.roginski/Eukaryotes/GENERA/ATHA/tmp_3702_11608/tmp_3702.mcl>
.......................................
[mclIO] read native interchange 48227x8569 matrix with 48227 entries

Session info:

Paul

Proginski added the bug label Sep 26, 2023
@josuebarrera
Owner

Dear Paul,

Thanks for reaching out! You are right, the new script for faster gene age assignment seems to mistake the "|" characters in the FASTA headers for column separators, leading to errors. We'll start working on a solution over the weekend, but I think it should be fairly easy to fix.
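
To illustrate the kind of clash described above, here is a minimal sketch in Python. The record layout (gene ID, taxid, e-value joined with "|") is hypothetical and is not the actual intermediate format of the script. The names `parse_naive` and `parse_safe` are assumptions for illustration, and splitting from the right is just one possible remedy, not necessarily the fix that will be applied.

```python
def parse_naive(record, sep="|"):
    """Split a record into fields, assuming `sep` never occurs inside a field."""
    return record.split(sep)


def parse_safe(record, n_fields, sep="|"):
    """Split from the right so that the first field may itself contain `sep`."""
    return record.rsplit(sep, n_fields - 1)


# Gene ID taken from the error log above; the other two fields are made up.
gene_id = "lcl|NC_003070.9_cds_NP_001321941.1_644"
record = "|".join([gene_id, "3702", "1e-50"])  # hypothetical 3-field record

# The naive split cuts the gene ID in two and shifts every later column.
print(parse_naive(record))

# Splitting from the right keeps the gene ID intact in the first field.
print(parse_safe(record, n_fields=3))
```

With the naive split, the first field is only "lcl", so any lookup keyed on the full gene ID would fail.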

Cheers,
Josué.

@josuebarrera
Owner

Dear Paul,

@RocesV just fixed the issue with FASTA headers containing "|" characters. Please download the newest version of FASTSTEP3R and let me know whether it fixes your problem.

Best,
Josué.

@Proginski
Author

Thanks a lot!

Paul
