Next-generation metagenetic sequencing (i.e. targeted metagenomics) allows biologists to reveal the whole microbial diversity of their samples without the limitations of culture-based and phenotypic identification techniques. Sequencing protocols are now standardized, and benchtop sequencers have made it possible to generate metagenetic data at low cost. However, many different analysis methodologies exist for interpreting these data, with neither standardization nor good-practice guidelines.

Bioanalysts may have trouble evaluating the fidelity of the analysis pipelines and configuration profiles they choose. Preconceptions about the composition of their samples are often the only basis for assessing the results critically, and there is no standard method for evaluating the reliability of the software. Moreover, it is very difficult to estimate the analysis biases introduced by parameter choices and by the implemented algorithms, or to justify the choice of a particular reference databank.

The main focus of this PhD is to address key issues in metagenetic analysis, in order to help bioanalysts get a better grasp of their results, from experimental design to biological interpretation.

Preliminary work has been presented at the ECCB’14 (European Conference on Computational Biology) poster session (figure on the right) and at the 15th André Verbert day at Université de Lille (figure on the left).

This PhD is an ongoing joint project between Genes Diffusion, the TAG platform of Institut Pasteur de Lille, and the Bonsai team from INRIA Lille and CRIStAL.