Miscellaneous bioinformatics

Navigating common challenges in microbial ecology.

Multiplexing

Protocol. How to sort barcoded Illumina reads into individual FASTQ files, an easy way to taxonomically identify microbial isolates. Includes a program (demultiplexFASTQ.py) and a four-sample dataset.
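The idea behind demultiplexing can be sketched in a few lines of Python. This is only an illustration and not the code in demultiplexFASTQ.py: it assumes inline barcodes at the start of each read with exact matching, and the function names, barcodes, and sample names are all hypothetical.

```python
from collections import defaultdict

def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from an iterable of FASTQ lines."""
    it = iter(lines)
    for header in it:
        seq = next(it).rstrip("\n")
        next(it)  # skip the '+' separator line
        qual = next(it).rstrip("\n")
        yield header.rstrip("\n"), seq, qual

def demultiplex(records, barcodes):
    """Group reads by the barcode at the start of each read.

    barcodes maps barcode sequence -> sample name (hypothetical layout).
    The barcode is trimmed from both the sequence and the quality string.
    """
    bins = defaultdict(list)
    for header, seq, qual in records:
        for barcode, sample in barcodes.items():
            if seq.startswith(barcode):
                n = len(barcode)
                bins[sample].append((header, seq[n:], qual[n:]))
                break
        else:
            # No barcode matched this read
            bins["unassigned"].append((header, seq, qual))
    return dict(bins)

# Tiny made-up example
fastq = """@read1
ACGTTTGACC
+
IIIIIIIIII
@read2
TGCAGGATCC
+
IIIIIIIIII
@read3
GGGGAAAACC
+
IIIIIIIIII
""".splitlines()

bins = demultiplex(parse_fastq(fastq), {"ACGT": "sampleA", "TGCA": "sampleB"})
for sample, reads in bins.items():
    print(sample, len(reads))
```

In practice each bin would be written to its own FASTQ file, and barcodes may live in a separate index read rather than inline.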

exactMatching

Protocol. We commonly want to find exact matches between sequences in two FASTA files. When files are large, we don't always need or want the robust BLAST algorithm. This Perl program is fast, lightweight, and easy to use.
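For readers who want to see the logic, exact matching can be done with a hash lookup instead of an alignment. The sketch below is in Python rather than Perl and is not the repository's program; the function names and example sequences are hypothetical, and it assumes matching is case-insensitive over full-length sequences.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict of header -> sequence (uppercased)."""
    records = {}
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records

def exact_matches(query, reference):
    """Return query header -> list of reference headers with an identical sequence."""
    by_seq = {}
    for name, seq in reference.items():
        by_seq.setdefault(seq, []).append(name)
    return {name: by_seq[seq] for name, seq in query.items() if seq in by_seq}

# Made-up example files
query = read_fasta([">q1", "ACGTACGT", ">q2", "TTTTGGGG"])
reference = read_fasta([">r1", "acgt", "acgt", ">r2", "CCCCAAAA"])

hits = exact_matches(query, reference)
print(hits)
```

Building the lookup once makes each query a constant-time check, which is why this scales to large files where BLAST would be overkill.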

Cutadapt

Protocol. How to trim primer sequences from Illumina reads, with primers for several common targets:

microbial V3-V4 16S rRNA
microbial V4 16S rRNA
bacterial V4-V5 16S rRNA
microbial V1-V9 16S rRNA
microbial V1-ITS 16S rRNA
fungal 18S rRNA
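Primers for these targets usually contain degenerate (IUPAC) bases, which is part of why a dedicated trimmer is useful. The Python sketch below shows the matching idea only. It is not how Cutadapt works internally (Cutadapt aligns with an error tolerance), and the primer shown is a widely used V4 forward primer chosen for illustration, which may differ from the primers in this protocol. The function names and the example read are hypothetical.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def primer_regex(primer):
    """Compile a degenerate primer into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))

def trim_forward_primer(read, primer):
    """Remove the primer from the 5' end of the read.

    Returns the trimmed read, or None if the read does not start with the primer.
    Only exact (mismatch-free) matches are accepted in this sketch.
    """
    match = primer_regex(primer).match(read)
    return read[match.end():] if match else None

FWD = "GTGYCAGCMGCCGCGGTAA"            # illustrative V4 forward primer (assumption)
read = "GTGCCAGCAGCCGCGGTAATACGGAGGG"  # made-up read

print(trim_forward_primer(read, FWD))
print(trim_forward_primer("AAAA" + read, FWD))
```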

DADA2

filterAndTrim_bigData.R. At the filterAndTrim step, process samples in groups, one group at a time, instead of all samples simultaneously. Saves time and computing power, and avoids crashes and headaches.
merge_ASV_tables.R. Helpful when you have many ASV tables from DADA2 and want to merge them by unique FASTA sequences.
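Merging by unique sequence amounts to a keyed union of count tables. The sketch below illustrates that in Python rather than R and is not a translation of merge_ASV_tables.R; the table layout (ASV sequence mapped to per-sample counts) and the function name are assumptions made for the example.

```python
def merge_asv_tables(tables):
    """Merge ASV tables keyed by sequence.

    tables: list of dicts mapping ASV sequence -> {sample name: read count}.
    Counts for the same sequence and sample are summed; samples in which an
    ASV was never observed are filled with 0.
    """
    merged = {}
    samples = []
    for table in tables:
        for seq, counts in table.items():
            row = merged.setdefault(seq, {})
            for sample, n in counts.items():
                row[sample] = row.get(sample, 0) + n
                if sample not in samples:
                    samples.append(sample)
    return {seq: {s: row.get(s, 0) for s in samples} for seq, row in merged.items()}

# Made-up tables from two separate runs
run1 = {"ACGT": {"S1": 10, "S2": 0}, "TTGA": {"S1": 3, "S2": 7}}
run2 = {"ACGT": {"S3": 5}, "GGCC": {"S3": 2}}

merged = merge_asv_tables([run1, run2])
for seq, row in merged.items():
    print(seq, row)
```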

NCBI

removeLineBreakFASTA.sh. Contigs downloaded from NCBI have line breaks at 800bp. This script removes them.
downloadMultipleSRA_series.sh. Download multiple files from Sequence Read Archive. Use when you're interested in runs that are named as a series of numbers, which is typical for BioProjects (e.g., runs in project PRJNA597057 range from SRR10755563 to SRR10755886).
downloadMultipleSRA_text.sh. Download multiple files from Sequence Read Archive. Use when you're interested in runs that are not named in a series. Create a text file called "runs.txt" with all desired runs.
ncbiTaxDB_scrape.sh. Given a list of NCBI IDs, scrape the taxonomy database webpage associated with each one, keeping only taxonomy paths (Kingdom, Phylum, etc.) in the resulting file.
ncbiAssemblyDB_scrape.sh. Same idea, but here we scrape the NCBI assembly database for associated BioSamples.
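Two of the steps above are simple enough to sketch without touching the network: joining wrapped sequence lines, and expanding a run series into individual accessions. The Python below is an illustration only and not the repository's shell scripts; both function names are hypothetical, and the actual download step (for example with the SRA Toolkit) is left out.

```python
def unwrap_fasta(lines):
    """Join wrapped sequence lines so each record is one header line plus one sequence line."""
    out = []
    seq = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if seq:
                out.append("".join(seq))
                seq = []
            out.append(line)
        elif line:
            seq.append(line)
    if seq:
        out.append("".join(seq))
    return out

def sra_series(first, last):
    """List every run accession from first to last, inclusive.

    Assumes both accessions share the same prefix (e.g. 'SRR') and digit width.
    """
    prefix = first.rstrip("0123456789")
    width = len(first) - len(prefix)
    start = int(first[len(prefix):])
    end = int(last[len(prefix):])
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]

# Made-up wrapped FASTA
wrapped = [">contig1", "ACGT", "ACGT", "AC", ">contig2", "GGGG", "TT"]
print(unwrap_fasta(wrapped))

# The run range quoted above for PRJNA597057
runs = sra_series("SRR10755563", "SRR10755886")
print(len(runs), runs[0], runs[-1])
```

The resulting list could be written one accession per line to a file such as runs.txt for the text-based download script.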

navigateFASTQ-A

catFASTQ.sh. Concatenate FASTQ files with identical names. Its original purpose was to combine files from two sequencing runs (on full and nano Illumina flow cells) on the same samples.
calculateRPKM.py. Count the number of bases in each FASTA sequence and convert read counts to reads per kilobase per million mapped reads (RPKM), a metric used in metatranscriptomics.
subsetFASTQ.sh. Subset a large FASTQ into smaller ones. Was helpful when learning error rates on a large dataset in DADA2.
fastaToCSV.sh. Have a FASTA file? Want to work with it in Excel or R? Use this. The result is a spreadsheet with two columns, "Headers" and "FASTA."
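As a rough guide to what two of these utilities compute, the Python sketch below shows the standard RPKM formula and a FASTA-to-CSV conversion using the "Headers" and "FASTA" column names mentioned above. It is not the code in calculateRPKM.py or fastaToCSV.sh; the function names and example numbers are hypothetical.

```python
import csv
import io

def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads.

    RPKM = reads / (gene length in kb * total mapped reads in millions)
    """
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

def fasta_to_csv(lines, handle):
    """Write FASTA records to handle as CSV with columns 'Headers' and 'FASTA'."""
    writer = csv.writer(handle)
    writer.writerow(["Headers", "FASTA"])
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                writer.writerow([header, "".join(chunks)])
            header = line[1:]
            chunks = []
        elif line:
            chunks.append(line)
    if header is not None:
        writer.writerow([header, "".join(chunks)])

# Made-up numbers: 500 reads on a 2,000 bp gene, 10 million mapped reads in total
gene = "ACGT" * 500
value = rpkm(500, len(gene), 10_000_000)
print(value)

# Made-up FASTA written to an in-memory CSV
out = io.StringIO()
fasta_to_csv([">seq1 sample A", "ACGT", "TTAA", ">seq2", "GGCC"], out)
print(out.getvalue())
```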

toolsAndPipelines

rgiFASTA.sh. Mine antibiotic resistance genes (ARGs) from FASTA files in a directory with CARD's Resistance Gene Identifier (RGI).
deepARG_organize.R. Load and organize results from the deepARG online tool.
metaxa2_[fastq/fasta].sh. Assess taxonomy in assembled or unassembled metagenomes with Metaxa2.
integronFinder.sh. Mine integron sequences from contigs with Integron Finder.
mobileOG-db.sh. Mine mobile genetic elements from the mobileOG database.

NewtonLabUWM/Misc_Bioinformatics
