Assessment of common and emerging bioinformatics pipelines for targeted metagenomics

Overview

A growing number of bioinformatics pipelines are available for metagenetics (i.e., targeted metagenomics) analyses. They often require advanced bioinformatics skills and substantial computing resources, and can discourage users who are unfamiliar with the diversity of existing analytical processes. Each pipeline comes with its own guidelines, including configuration profiles, reference databases and recommended analytical steps. Without a formal evaluation, choosing a pipeline, along with a set of parameters and algorithms, for a given application can quickly become a brainteaser.

Below, we introduce the first evaluation protocol designed to compare metagenetics pipelines in their entirety.

Protocol

This evaluation protocol combines universal comparison metrics, such as clustering and diversity indices, with both simulated and real datasets to objectively evaluate the performance of a pipeline. Using this protocol, we evaluated six pipelines: mothur, QIIME and BMP, which implement OTU clustering approaches; and kraken, CLARK and One Codex, emerging solutions based on discriminative k-mers that were developed for shotgun metagenomics and had not previously been evaluated on metagenetics datasets.
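To illustrate the kind of diversity index such a protocol relies on, the following minimal Python sketch computes the Shannon and Simpson indices from a table of per-taxon read counts. It is not the study's implementation: the choice of these two particular indices and the dictionary-based count table are assumptions made for this example.

```python
import math
from typing import Mapping


def shannon_index(counts: Mapping[str, int]) -> float:
    """Shannon diversity H' = -sum(p_i * ln p_i) over taxa with non-zero counts.

    Illustrative sketch only; `counts` maps a taxon label to its read count.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:  # taxa with zero reads contribute nothing (p * ln p -> 0)
            p = c / total
            h -= p * math.log(p)
    return h


def simpson_index(counts: Mapping[str, int]) -> float:
    """Simpson diversity 1 - sum(p_i^2): probability that two reads drawn
    with replacement belong to different taxa."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())
```

Applying the same functions to the expected composition of a simulated dataset and to a pipeline's output is one simple way to see whether the pipeline inflates or collapses diversity.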

This work was published in the following paper: Assessment of Common and Emerging Bioinformatics Pipelines for Targeted Metagenomics, Siegwald et al., PLOS ONE, 2017. It provides a unique resource to help users choose the most appropriate bioinformatics pipeline to analyse their own metagenetics datasets judiciously.

Evaluation protocol datasets

The datasets built for this evaluation protocol are openly available as compressed FASTQ files: 200 (V3), 400 (V4-V5) and real.
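As a quick sanity check after downloading, the number of reads in each compressed FASTQ file can be counted with a short Python sketch. It assumes the standard four-line FASTQ layout (header, sequence, separator, qualities, with no wrapped sequence lines); the function name count_fastq_reads is hypothetical and not part of the published material.

```python
import gzip


def count_fastq_reads(path: str) -> int:
    """Count records in a gzip-compressed FASTQ file.

    Hypothetical helper; assumes exactly 4 lines per record (no line wrapping).
    """
    with gzip.open(path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    # A truncated download or a wrapped FASTQ would break the 4-line assumption.
    if n_lines % 4 != 0:
        raise ValueError(f"{path}: {n_lines} lines is not a multiple of 4")
    return n_lines // 4
```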

For the simulated metagenomes, the number of sequences per amplicon in each dataset is given in the following Excel file (one spreadsheet per target region): datasets_descriptions.xls
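One way to compare a pipeline's output with the expected composition of a simulated dataset is a dissimilarity measure such as Bray-Curtis. The sketch below is an illustration only: it assumes that the amplicon names listed in datasets_descriptions.xls can be matched to the taxon labels reported by a pipeline, and the bray_curtis helper is hypothetical, not part of the published protocol.

```python
from typing import Mapping


def bray_curtis(expected: Mapping[str, float], observed: Mapping[str, float]) -> float:
    """Bray-Curtis dissimilarity between two abundance profiles.

    Returns 0.0 for identical profiles and 1.0 for profiles sharing no taxa.
    Hypothetical helper; assumes both profiles use the same taxon labels.
    """
    taxa = set(expected) | set(observed)
    # Sum of the smaller abundance for every taxon present in either profile.
    shared = sum(min(expected.get(t, 0), observed.get(t, 0)) for t in taxa)
    total = sum(expected.values()) + sum(observed.values())
    if total == 0:
        return 0.0
    return 1.0 - 2.0 * shared / total
```

Because this version works on raw counts, profiles with very different sequencing depths (for example after a pipeline's quality filtering) should first be converted to relative abundances.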

Pipelines execution modes

The following text files contain the guidelines used for each dataset (denoted by $dataset) and each reference database evaluated in our study:

All guidelines can also be downloaded as a single archive: guidelines.zip.
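If the guideline files are treated as templates in which $dataset stands for a dataset name, Python's string.Template can expand them, since it uses the same $name placeholder syntax. This is a hypothetical helper (expand_guideline), and the command line in the usage example is invented for illustration; it is not one of the study's guidelines.

```python
from string import Template


def expand_guideline(template_text: str, dataset: str) -> str:
    """Replace the $dataset placeholder in a guideline line with a dataset name.

    Hypothetical helper. safe_substitute leaves any other $-placeholders
    untouched instead of raising KeyError.
    """
    return Template(template_text).safe_substitute(dataset=dataset)


if __name__ == "__main__":
    # Invented command line, for illustration only.
    line = "classify --reads $dataset.fastq.gz --out ${dataset}_results.txt"
    print(expand_guideline(line, "200"))
```

The braces in ${dataset}_results are needed because an underscore is a valid identifier character, so $dataset_results would be read as a different placeholder; a following dot, as in $dataset.fastq.gz, ends the name on its own.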