Tourmaline is an amplicon sequence processing workflow for Illumina sequence data that uses QIIME 2 and the software packages it wraps. Tourmaline manages commands, inputs, and outputs using the Snakemake workflow management system.
The current version of Tourmaline supports qiime2-2023.5. To use previous versions of QIIME 2, check out previous Tourmaline versions under Releases.
Tourmaline has several features that enhance usability and interoperability:
If you have used QIIME 2 before, you might be wondering which QIIME 2 commands Tourmaline uses and supports. All commands are specified as rules in Snakefile, and typical workflows without and with sequence filtering are shown as directed acyclic graphs in the folder dags. The main analysis features and options supported by Tourmaline and specified by the Snakefile are as follows:
Please cite our paper in GigaScience:
If this is your first time using Tourmaline or Snakemake, you may want to browse through the Wiki for a detailed walkthrough. If you want to get started right away, check out the Quick Start below and follow along with the video tutorial on YouTube.
Tourmaline provides Snakemake rules for DADA2 (single-end and paired-end) and Deblur (single-end). For each type of processing, there are four steps:
Steps 2–4 have unfiltered and filtered modes, the difference being that in the taxonomy step of filtered mode, undesired taxonomic groups or individual sequences from the representative sequences and feature table are removed. The diversity and report rules are the same for unfiltered and filtered modes, except the output goes into separate subdirectories.
Before you download the Tourmaline commands and directory structure from GitHub, you first need to install QIIME 2, Snakemake, and the other dependencies of Tourmaline. Two options are provided: a native installation on a Mac or Linux system and a Docker image/container. If you have an Apple Silicon chip (M1, M2 Macs), the instructions to install QIIME 2 vary slightly.
To run Tourmaline natively on a Mac (Intel) or Linux system, start with a Conda installation of Snakemake.
conda create -c conda-forge -c bioconda -n snakemake snakemake-minimal
Then install QIIME 2 with conda (for Linux, change “osx” to “linux”):
wget https://data.qiime2.org/distro/core/qiime2-2023.5-py38-osx-conda.yml
conda env create -n qiime2-2023.5 --file qiime2-2023.5-py38-osx-conda.yml
Activate the qiime2-2023.5 environment and install the other Conda- or pip-installable dependencies:
conda activate qiime2-2023.5
conda install -c conda-forge -c bioconda biopython muscle clustalo tabulate
conda install -c conda-forge deicode
pip install empress
qiime dev refresh-cache
conda install -c bioconda bioconductor-msa bioconductor-odseq
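The “osx”-to-“linux” substitution mentioned above can be scripted. The following is a minimal sketch, not part of Tourmaline: the function name qiime2_env_url is hypothetical, and it assumes that `uname -s` reports `Linux` on Linux systems and that every other system should get the macOS file.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tourmaline): print the QIIME 2 2023.5
# environment file URL for an OS name as reported by `uname -s`.
qiime2_env_url() {
    os_name="${1:-$(uname -s)}"
    url="https://data.qiime2.org/distro/core/qiime2-2023.5-py38-osx-conda.yml"
    case "$os_name" in
        # On Linux, swap "osx" for "linux" in the file name.
        Linux) printf '%s\n' "$url" | sed 's/-osx-/-linux-/' ;;
        # Assumption: anything else is treated as macOS.
        *)     printf '%s\n' "$url" ;;
    esac
}

# Show the URL for the current machine; download it with wget as shown above.
qiime2_env_url
```

Passing the OS name as an argument (for example, `qiime2_env_url Linux`) makes it possible to preview the URL for a different system than the one you are on.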
Follow these instructions for Macs with M1/M2 chips.
First, set your Terminal application to run in Rosetta mode.
wget https://data.qiime2.org/distro/core/qiime2-2023.5-py38-osx-conda.yml
CONDA_SUBDIR=osx-64 conda env create -n qiime2-2023.5 --file qiime2-2023.5-py38-osx-conda.yml
conda activate qiime2-2023.5
conda config --env --set subdir osx-64
Then continue to install the other Conda- or pip-installable dependencies.
To run Tourmaline inside a Docker container:
docker pull aomlomics/tourmaline
docker run -v $HOME:/data -it aomlomics/tourmaline
If installing on a Mac with an Apple M1 chip, run the Docker image with the --platform linux/amd64 option. It will take a few minutes for the image to load the first time it is run.
docker run --platform linux/amd64 -v $HOME:/data -it aomlomics/tourmaline
The -v (volume) flag above allows you to mount a local file system volume (in this case your home directory) to read/write from your container. Note that symbolic links in a mounted volume will not work.
Use mounted volumes to:
See the Install page for more details on installing and running Docker.
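The choice between the two `docker run` commands above can be made from the machine architecture. This is a rough sketch under stated assumptions: the function name tourmaline_docker_cmd is hypothetical, and it assumes that `uname -m` reports `arm64` on Apple Silicon Macs (when not running under Rosetta). It only prints the command so you can review it before running it.

```shell
#!/bin/sh
# Hypothetical helper: print the `docker run` command from the steps above,
# adding --platform linux/amd64 when the architecture is arm64.
tourmaline_docker_cmd() {
    arch="${1:-$(uname -m)}"
    platform_opt=""
    if [ "$arch" = "arm64" ]; then
        platform_opt="--platform linux/amd64 "
    fi
    # $HOME is printed literally so the shell expands it when the
    # command is actually run.
    printf 'docker run %s-v $HOME:/data -it aomlomics/tourmaline\n' "$platform_opt"
}

# Print the command suited to the current machine.
tourmaline_docker_cmd
```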
If this is your first time running Tourmaline, you’ll need to set up your directory. Simplified instructions are below, but see the Wiki’s Setup page for complete instructions.
Start by cloning the Tourmaline directory and files:
git clone https://github.com/aomlomics/tourmaline.git
If using the Docker container, it’s recommended you run the above command from inside /data.
The test data (16 samples of paired-end 16S rRNA data with 1000 sequences per sample) comes with your cloned copy of Tourmaline. It’s fast to run and will verify that you can run the workflow.
Download reference database sequence and taxonomy files, named refseqs.qza and reftax.qza (QIIME 2 archives), in 01-imported:
cd tourmaline/01-imported
wget https://data.qiime2.org/2023.5/common/silva-138-99-seqs-515-806.qza
wget https://data.qiime2.org/2023.5/common/silva-138-99-tax-515-806.qza
ln -s silva-138-99-seqs-515-806.qza refseqs.qza
ln -s silva-138-99-tax-515-806.qza reftax.qza
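Because symbolic links can break (for example, inside a mounted Docker volume), it may be worth confirming that both names resolve to real files. The helper below is a sketch, not a Tourmaline script; the function name check_ref_links is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether refseqs.qza and reftax.qza exist in
# the given directory. Returns non-zero if either is missing or broken.
check_ref_links() {
    dir="$1"
    status=0
    for name in refseqs.qza reftax.qza; do
        if [ -e "$dir/$name" ]; then
            # -e follows symlinks, so the target exists here.
            echo "ok: $name"
        elif [ -L "$dir/$name" ]; then
            # A link is present but its target is not.
            echo "broken link: $name"
            status=1
        else
            echo "missing: $name"
            status=1
        fi
    done
    return "$status"
}

# Usage, from the tourmaline directory:
#   check_ref_links 01-imported
```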
Edit FASTQ manifests manifest_se.csv and manifest_pe.csv in 00-data so file paths match the location of your tourmaline directory. In the command below, replace /path/to with the location of your tourmaline directory, or skip this step if you are using the Docker container and you cloned tourmaline into /data:
cd ../00-data
cat manifest_pe.csv | sed 's|/data/tourmaline|/path/to/tourmaline|' > temp && mv temp manifest_pe.csv
cat manifest_pe.csv | grep -v "reverse" > manifest_se.csv
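To see what these two commands do before touching the real manifests, they can be tried on a toy file in a scratch directory. In this sketch the sample name, the FASTQ file names, and the header columns other than “absolute-filepath” are illustrative assumptions, not the contents of the test manifest.

```shell
#!/bin/sh
# Work in a scratch directory so no real manifest is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Toy paired-end manifest (illustrative values only).
cat > manifest_pe.csv <<'EOF'
sample-id,absolute-filepath,direction
S1,/data/tourmaline/00-data/fastq/S1_R1.fastq.gz,forward
S1,/data/tourmaline/00-data/fastq/S1_R2.fastq.gz,reverse
EOF

# Same edits as above, without the extra `cat`: rewrite the path prefix,
# then derive the single-end manifest by dropping the reverse reads.
sed 's|/data/tourmaline|/path/to/tourmaline|' manifest_pe.csv > temp && mv temp manifest_pe.csv
grep -v "reverse" manifest_pe.csv > manifest_se.csv

# Show the resulting single-end manifest.
cat manifest_se.csv
```

The single-end manifest keeps the header line and the forward reads only, with the rewritten path prefix.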
Go to Run Snakemake.
Before setting up to run your own data, please note:
Now edit, replace, or store the required input files as described here:
00-data/metadata.tsv. The first column header should be “sample_name”, with sample names matching the FASTQ manifest(s), and additional columns containing any relevant metadata for your samples. You can use a spreadsheet editor like Microsoft Excel or LibreOffice, but make sure to export the output in tab-delimited text format.
00-data/manifest_pe.csv (paired-end) and/or 00-data/manifest_se.csv (single-end). Ensure that (1) file paths in the column “absolute-filepath” point to your .fastq.gz files (they can be anywhere on your computer) and (2) sample names match the metadata file. You can use a text editor such as Sublime Text, nano, vim, etc.
01-imported/fastq_pe.qza (paired-end) and/or 01-imported/fastq_se.qza (single-end).
00-data/refseqs.fna and 00-data/reftax.tsv.
01-imported/refseqs.qza and 01-imported/reftax.qza.
Edit config.yaml to set DADA2 and/or Deblur parameters (sequence truncation/trimming, sample pooling, chimera removal, etc.), rarefaction depth, taxonomic classification method, and other parameters. This YAML (yet another markup language) file is a regular text file that can be edited in Sublime Text, nano, vim, etc.
Tourmaline is now run within the snakemake conda environment, not the qiime2-2023.5 environment.
conda activate snakemake
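Before starting a run, the requirement that sample names in the metadata file match those in the FASTQ manifest can be checked with standard tools. This is a sketch under assumptions that should be verified against your own files: the sample name is the first column of both files, the metadata file is tab-delimited, the manifest is comma-delimited, and each file has exactly one header line. The function name compare_sample_names is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list sample names that appear in only one of the
# two files (metadata first, manifest second).
compare_sample_names() {
    metadata="$1"
    manifest="$2"
    tmpdir=$(mktemp -d)
    # Skip the header line, keep the first column, and de-duplicate.
    tail -n +2 "$metadata" | cut -f1 | sort -u > "$tmpdir/metadata_ids"
    tail -n +2 "$manifest" | cut -d, -f1 | sort -u > "$tmpdir/manifest_ids"
    # comm -3 hides names common to both files: unindented names are in the
    # metadata only, tab-indented names are in the manifest only.
    comm -3 "$tmpdir/metadata_ids" "$tmpdir/manifest_ids"
    rm -rf "$tmpdir"
}

# Usage, from the tourmaline directory (no output means the names agree):
#   compare_sample_names 00-data/metadata.tsv 00-data/manifest_pe.csv
```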
Shown here is the DADA2 paired-end workflow. See the Wiki’s Run page for complete instructions on all steps, denoising methods, and filtering modes.
Note that any of the commands below can be run with various options, including --printshellcmds to see the shell commands being executed and --dryrun to display which rules would be run but not execute them. To generate a graph of the rules that will be run from any Snakemake command, see the section “Directed acyclic graph (DAG)” on the Run page. Always include the --use-conda option.
From the tourmaline directory (which you may rename), run Snakemake with the denoise rule as the target, changing the number of cores to match your machine:
snakemake --use-conda dada2_pe_denoise --cores 4
Pausing after the denoise step allows you to make changes, such as adjusting parameters in config.yaml, before proceeding.
Continue the workflow without filtering (for now). If you are satisfied with your parameters and files, run the taxonomy rule (for unfiltered data):
snakemake --use-conda dada2_pe_taxonomy_unfiltered --cores 4
Next, run the diversity rule (for unfiltered data):
snakemake --use-conda dada2_pe_diversity_unfiltered --cores 4
Finally, run the report rule (for unfiltered data):
snakemake --use-conda dada2_pe_report_unfiltered --cores 4
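The three commands above follow one naming pattern, so they can be generated by a small wrapper. This is a sketch, not part of Tourmaline: the function name tourmaline_steps and the RUN variable are hypothetical, and the pattern `{method}_{step}_{mode}` is inferred from the dada2_pe commands shown here, so rule names for other methods should be checked against the Snakefile. By default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical wrapper: print (or, with RUN=1, execute) the taxonomy,
# diversity, and report commands for one method and filtering mode.
tourmaline_steps() {
    method="${1:-dada2_pe}"
    mode="${2:-unfiltered}"
    cores="${3:-4}"
    for step in taxonomy diversity report; do
        cmd="snakemake --use-conda ${method}_${step}_${mode} --cores ${cores}"
        if [ "${RUN:-0}" = "1" ]; then
            $cmd || return 1    # stop at the first failing step
        else
            echo "$cmd"
        fi
    done
}

# Print the three unfiltered commands without running them.
tourmaline_steps dada2_pe unfiltered 4
```

Running the steps one at a time, as shown above, keeps the opportunity to inspect results between steps; the wrapper trades that for convenience.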
After viewing the unfiltered results (the taxonomy summary and taxa barplot, the representative sequence summary plot and table, or the list of unassigned and potential outlier representative sequences), you may wish to filter (remove) certain taxonomic groups or representative sequences. If so, first check the following parameters and/or files:
Copy 02-output-dada2-pe-unfiltered/02-alignment-tree/repseqs_to_filter_outliers.tsv to 00-data/repseqs_to_filter_dada2-pe.tsv to filter outliers, or manually include feature IDs in 00-data/repseqs_to_filter_dada2-pe.tsv to filter those feature IDs (change “dada2-pe” to “dada2-se” or “deblur-se” as appropriate);
exclude_terms in config.yaml: add taxa to exclude from representative sequences, if desired;
repseq_min_length and repseq_max_length in config.yaml: set minimum and/or maximum lengths for filtering representative sequences, if desired;
repseq_min_abundance and repseq_min_prevalence in config.yaml: set minimum abundance and/or prevalence values for filtering representative sequences, if desired.
Now we are ready to filter the representative sequences and feature table, generate new summaries, and generate a new taxonomy bar plot, by running the taxonomy rule (for filtered data):
snakemake --use-conda dada2_pe_taxonomy_filtered --cores 4
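Before running the filtered taxonomy rule, the outlier list from the checklist can be copied into place with a small helper. This is a sketch under assumptions: the function name stage_outliers is hypothetical, the unfiltered output directory is assumed to be named 02-output-{method}-unfiltered, and keeping a .bak copy of an existing list is a design choice of this example rather than Tourmaline behavior.

```shell
#!/bin/sh
# Hypothetical helper: copy the outlier list from the unfiltered output
# into 00-data. `root` is the tourmaline directory; `method` is one of
# dada2-pe, dada2-se, or deblur-se.
stage_outliers() {
    root="$1"
    method="${2:-dada2-pe}"
    src="$root/02-output-${method}-unfiltered/02-alignment-tree/repseqs_to_filter_outliers.tsv"
    dest="$root/00-data/repseqs_to_filter_${method}.tsv"
    if [ ! -f "$src" ]; then
        echo "not found: $src" >&2
        return 1
    fi
    # Keep a backup of any existing list rather than overwriting it silently.
    if [ -f "$dest" ]; then
        cp "$dest" "$dest.bak"
    fi
    cp "$src" "$dest"
    echo "copied to $dest"
}

# Usage, from the tourmaline directory:
#   stage_outliers . dada2-pe
```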
Next, run the diversity rule (for filtered data):
snakemake --use-conda dada2_pe_diversity_filtered --cores 4
Finally, run the report rule (for filtered data):
snakemake --use-conda dada2_pe_report_filtered --cores 1
Open your HTML report (e.g., 03-reports/report_dada2-pe_unfiltered.html) in Chrome or Firefox. To view the linked files:
Some linked visualizations (e.g., rooted_tree.qzv) may take more than 10 minutes to load.
Downloaded files can be deleted after viewing because they are already stored in your Tourmaline directory.
The whole workflow can be run with a single command: snakemake dada2_pe_report_unfiltered (without filtering representative sequences) or snakemake dada2_pe_report_filtered (after filtering representative sequences), adding the --use-conda and --cores options as in the commands above. Warning: If your parameters are not optimized, the results will be suboptimal (garbage in, garbage out).
To make a fresh run, delete the output directories (e.g., 02-output-{method}-{filter} and 03-reports) generated in the previous run. If you want to save these outputs and rerun with different parameters, you can change the name of the output directories and report files to something informative and leave them in the Tourmaline directory.
To regenerate a particular file, run snakemake FILE and Snakemake will determine which rules (commands) need to be run to generate that file; or, run snakemake RULE where the rule generates the desired file as output.
If you’ve run Tourmaline on your dataset before, you can speed up the setup process and initialize a new Tourmaline directory (e.g., tourmaline-new) with some of the files and symlinks of the existing one (e.g., tourmaline-existing) using the command below:
cd /path/to/tourmaline-new
scripts/initialize_dir_from_existing_tourmaline_dir.sh /path/to/tourmaline-existing
You may get error messages if some files don’t exist, but the script should still copy the files that are there. The files that will be copied from the existing directory to the new directory are:
config.yaml
00-data/manifest_pe.csv
00-data/manifest_se.csv
00-data/metadata.tsv
00-data/repseqs_to_filter_dada2-pe.tsv
00-data/repseqs_to_filter_dada2-se.tsv
00-data/repseqs_to_filter_deblur-se.tsv
01-imported/refseqs.qza
01-imported/reftax.qza
01-imported/classifier.qza
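To know in advance which of these files are absent from the existing directory (and therefore which error messages to expect), the list can be checked beforehand. The function below is a sketch, not part of Tourmaline; the name list_missing_inputs is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-check: print one line per listed file that is absent
# from an existing Tourmaline directory.
list_missing_inputs() {
    existing="$1"
    for f in \
        config.yaml \
        00-data/manifest_pe.csv \
        00-data/manifest_se.csv \
        00-data/metadata.tsv \
        00-data/repseqs_to_filter_dada2-pe.tsv \
        00-data/repseqs_to_filter_dada2-se.tsv \
        00-data/repseqs_to_filter_deblur-se.tsv \
        01-imported/refseqs.qza \
        01-imported/reftax.qza \
        01-imported/classifier.qza
    do
        # -e follows symlinks, so a broken symlink is reported as missing.
        [ -e "$existing/$f" ] || echo "$f"
    done
}

# Usage:
#   list_missing_inputs /path/to/tourmaline-existing
```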
Ensure you make any changes to your configuration file and (if necessary) delete any files you want to be regenerated before you run Snakemake. If you manually copy over output files from a previous Tourmaline run that you do not want to be regenerated (e.g., 02-output-{method}-unfiltered), you should use the cp -p flag to preserve timestamps.
cp -rp tourmaline-old/02-output-dada2-pe-unfiltered/ tourmaline-new/
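Timestamps matter here because Snakemake takes file modification times into account when deciding whether an output is up to date. The effect of the -p flag can be seen on a scratch file; the sketch below uses made-up file names and relies on the -nt (“newer than”) test, which is available in bash, dash, and zsh.

```shell
#!/bin/sh
# Demonstrate on a scratch file that `cp -p` keeps the modification time
# while a plain `cp` stamps the copy with the current time.
scratch=$(mktemp -d)
touch -t 202001010000 "$scratch/original.txt"   # stand-in for old output

cp -p "$scratch/original.txt" "$scratch/preserved.txt"
cp "$scratch/original.txt" "$scratch/plain.txt"

# The plain copy gets a fresh timestamp, so it looks newer than its source.
if [ "$scratch/plain.txt" -nt "$scratch/original.txt" ]; then
    echo "plain copy is newer than the original"
fi
# The preserved copy keeps the source's timestamp.
if [ ! "$scratch/preserved.txt" -nt "$scratch/original.txt" ]; then
    echo "preserved copy is not newer than the original"
fi
```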
Some alternative pipelines for amplicon sequence analysis include the following: