OSError: [Errno 18] Invalid cross-device link when using spoligotype #248

Closed
micronorman opened this issue Oct 14, 2022 · 5 comments

Comments

@micronorman

I've been getting an error when running tb-profiler v4.3.0 with the --spoligotype parameter. It looks like fastq.py throws an error after attempting to rename the output of kmc when the tb-profiler prefix points to a different device (I am using a temporary folder located on a scratch partition, which is then copied back when tb-profiler completes).

Running command:
set -u pipefail; kmc  -sm -m2 -t1 -sf1 -sp1 -sr1 -k25 @75a89818-a176-42be-88e0-0a7bf16cb424.list 75a89818-a176-42be-88e0-0a7bf16cb424 75a89818-a176-42be-88e0-0a7bf16cb424

Running command:
set -u pipefail; kmc_dump 75a89818-a176-42be-88e0-0a7bf16cb424 75a89818-a176-42be-88e0-0a7bf16cb424.kmers.txt
Traceback (most recent call last):
  File "/srv/data/TB/Tools/micromamba/envs/tbmyk_tbprofiler/bin/tb-profiler", line 619, in <module>
    args.func(args)
  File "/srv/data/TB/Tools/micromamba/envs/tbmyk_tbprofiler/bin/tb-profiler", line 155, in main_profile
    results["spoligotype"] = tbp.spoligotype(args)
  File "/srv/data/TB/Tools/micromamba/envs/tbmyk_tbprofiler/lib/python3.9/site-packages/tbprofiler/spoligotyping.py", line 7, in spoligotype
    result = bam2spoligotype(args.bam_file,args.files_prefix,args.conf)
  File "/srv/data/TB/Tools/micromamba/envs/tbmyk_tbprofiler/lib/python3.9/site-packages/tbprofiler/spoligotyping.py", line 35, in bam2spoligotype
    results = fq2spoligotype(tmp_fq_file,files_prefix,conf)
  File "/srv/data/TB/Tools/micromamba/envs/tbmyk_tbprofiler/lib/python3.9/site-packages/tbprofiler/spoligotyping.py", line 26, in fq2spoligotype
    kmers = fastq.get_kmer_counts(files_prefix,klen=25)
  File "/srv/data/TB/Tools/micromamba/envs/tbmyk_tbprofiler/lib/python3.9/site-packages/pathogenprofiler/fastq.py", line 129, in get_kmer_counts
    os.rename(f"{tmp_prefix}.kmers.txt", f"{prefix}.kmers.txt")
OSError: [Errno 18] Invalid cross-device link: '75a89818-a176-42be-88e0-0a7bf16cb424.kmers.txt' -> '/scratch/ansm/tmp.tbprofiler_job-9WRIVeZ8/tbprofiler/239b9a52-414d-455e-b7c1-95fffc1e2b5b.kmers.txt'
Cleaning up after failed run

A temporary workaround that worked for me was to use shutil.move instead of os.rename. Would it be possible to get kmc (get_kmer_counts) to use the same directory as tb-profiler's storage directory (DIR) for temporary files?
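The workaround above can be sketched as follows. This is a minimal illustration, not the code in pathogenprofiler: `move_file` is a hypothetical helper name. `os.rename` fails with `errno.EXDEV` (Errno 18) when source and destination sit on different filesystems, whereas `shutil.move` falls back to copying the file and deleting the original.

```python
import errno
import os
import shutil


def move_file(src, dst):
    """Move src to dst; hypothetical helper, not part of pathogenprofiler."""
    try:
        # Fast path: works when both paths are on the same device.
        os.rename(src, dst)
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
        # Cross-device case (Errno 18): copy the file, then remove the source.
        shutil.move(src, dst)
```

Since `shutil.move` already tries a rename first, calling it directly would behave the same way; the explicit `try`/`except` is only there to show where Errno 18 comes from.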

@jodyphelan
Owner

jodyphelan commented Oct 26, 2022

Hi @micronorman,

Thanks for reporting this.

Are you running this in your scratch partition (/scratch/ansm/tmp.tbprofiler_job-9WRIVeZ8/tbprofiler/)? If so, I would expect the .kmers.txt file never to be moved over to your storage directory, i.e. it would simply be renamed from /scratch/ansm/tmp.tbprofiler_job-9WRIVeZ8/tbprofiler/75a89818-a176-42be-88e0-0a7bf16cb424.kmers.txt to /scratch/ansm/tmp.tbprofiler_job-9WRIVeZ8/tbprofiler/239b9a52-414d-455e-b7c1-95fffc1e2b5b.kmers.txt

In any case, the .kmers.txt file is only meant to be a temp file (renamed from one uuid to another uuid) and isn't meant to be stored long term, so would it be OK to follow your solution of using shutil.move()?

@micronorman
Author

Hi @jodyphelan

I am running tb-profiler on an HPC server where jobs are sent to a SLURM queue via an sbatch command. Each SLURM node has a dedicated, very fast local scratch partition. However, our TB analysis pipeline is normally executed from a large 200 TB network storage partition (/srv/data/), which I'm guessing is causing the cross-device confusion.

I noticed that most other sub-processes launched by tb-profiler were placing temporary files (the ones with long uuid prefixes) inside the directory specified by the DIR parameter. So I designed my pipeline module to create a temporary working directory on the local scratch space and point to that, to minimise disk activity on the shared storage partition (and to avoid clogging up our shared data directory with lots of .errlog.txt and uuid-prefixed files if jobs fail), and to move files back only if the job finished with exit status 0.

I could definitely just tell the script to cd into the scratch drive before launching tb-profiler, but I would prefer it if all files (temp or final) created by tb-profiler were kept below the path of an initially defined working directory. Alternatively, would you consider adding a --tmp_dir parameter to tb-profiler, or making use of the $TMPDIR environment variable?
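One way the request above could look in code is sketched below. Everything here is an assumption for illustration: `resolve_temp_dir` and `build_parser` are hypothetical names, and `--tmp_dir` is simply the option name proposed in this thread, not a confirmed tb-profiler flag. The idea is to prefer an explicit option, then `$TMPDIR`, then the current working directory.

```python
import argparse
import os


def resolve_temp_dir(cli_value=None, environ=None):
    """Pick a directory for intermediate files (hypothetical helper).

    Precedence: explicit option, then $TMPDIR, then the current directory.
    """
    environ = os.environ if environ is None else environ
    chosen = cli_value or environ.get("TMPDIR") or os.getcwd()
    chosen = os.path.abspath(chosen)
    # Create the directory up front so later writes do not fail.
    os.makedirs(chosen, exist_ok=True)
    return chosen


def build_parser():
    # "--tmp_dir" is the name suggested in this thread, used here as an assumption.
    parser = argparse.ArgumentParser(description="temp-dir resolution sketch")
    parser.add_argument("--tmp_dir", default=None,
                        help="directory for intermediate files")
    return parser
```

With this precedence, a SLURM job script could either pass the scratch path explicitly or export `TMPDIR` once for the whole job.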

Thank you for your time!

@jodyphelan
Owner

Ok that makes sense, I'll have a look at this in the coming week and let you know how it goes. The --tmp_dir idea sounds good.

jodyphelan added a commit to jodyphelan/pathogen-profiler that referenced this issue Jan 9, 2023
jodyphelan added a commit that referenced this issue Jan 9, 2023
jodyphelan added a commit to jodyphelan/pathogen-profiler that referenced this issue Jan 9, 2023
@jodyphelan
Owner

Happy new year, and sorry for the delay with this. I fixed this issue by removing the requirement for os.rename/shutil.move altogether and added a --temp setting, which makes sure all intermediate files are created in that directory. This will be added in the next bioconda update.
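The general idea behind that kind of fix can be sketched as below. This is not the actual commit: `kmer_dump_path` and `same_device` are hypothetical helpers. If the uuid-prefixed intermediate file is created inside the chosen temp directory from the start, it never has to be renamed across filesystems.

```python
import os
import uuid


def kmer_dump_path(temp_dir):
    """Build a uuid-prefixed .kmers.txt path inside temp_dir (hypothetical)."""
    prefix = os.path.join(os.path.abspath(temp_dir), str(uuid.uuid4()))
    return f"{prefix}.kmers.txt"


def same_device(path_a, path_b):
    """Return True if both existing paths live on the same filesystem."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

`same_device` is only a diagnostic: when it returns False for a source file and the directory it should end up in, `os.rename` between them would raise Errno 18.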

jodyphelan added a commit to jodyphelan/pathogen-profiler that referenced this issue Jan 9, 2023
@micronorman
Author

Thanks a bunch!
