MICRA data

This page contains all the MICRA results presented in the related publication.

DH10B

Fully-controlled conditions: simulated reads and real genomes

Artificially mutated genome

The genome of Escherichia coli str. K12 substr. DH10B (NCBI:CP000948) was artificially mutated: 21,052 SNVs and 2,238 DIPs (available here) were randomly introduced using a script developed specifically to record the position and type of each mutation. Two Shiga toxin genes (Shiga toxin 2 subunits A and B) from Escherichia coli O157:H7 str. Sakai (NCBI:NC_002695) were inserted between positions 1,557,859 and 1,557,915, and finally the mdtL gene involved in chloramphenicol resistance (3,987,222..3,988,397) was deleted.
The sequence of the mutated genome is available for download in FASTA format.
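The mutation script itself is not shown on this page. As a minimal sketch of the general idea only (random substitutions with a log of position and type), the hypothetical function below introduces SNVs into a sequence string; the function name, the log layout and the seed handling are assumptions made for illustration, and DIPs are not covered.

```python
import random

def introduce_snvs(sequence, n_snvs, seed=0):
    """Return (mutated_sequence, log); log holds (1-based position, ref, alt)."""
    rng = random.Random(seed)
    bases = list(sequence)
    # Sample distinct positions so that no site is mutated twice.
    positions = sorted(rng.sample(range(len(bases)), n_snvs))
    log = []
    for pos in positions:
        ref = bases[pos]
        # Pick a substitute base that differs from the reference base.
        alt = rng.choice([b for b in "ACGT" if b != ref.upper()])
        bases[pos] = alt
        log.append((pos + 1, ref, alt))
    return "".join(bases), log

# Toy usage on a short sequence (not the DH10B genome).
mutated, snv_log = introduce_snvs("ACGTACGTACGTACGTACGT", n_snvs=3, seed=42)
```

Keeping the log next to the mutated sequence is what allows the variants reported by a pipeline to be checked against the known truth afterwards.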

Ion Torrent Simulated reads

From this artificially mutated genome, 1,000,000 reads were simulated with CuReSim 1.2 using the default parameters (read size of 200 bp with a standard deviation of 20, 1% deletions, 0.5% insertions and 0.5% substitutions), giving a theoretical sequencing depth of around 40X. To mimic the transfer of a plasmid, 50,000 reads from the pSFO157 plasmid (NCBI:AF401292), corresponding to a theoretical sequencing depth of 80X, were also simulated with CuReSim (default parameters) and added to the FASTQ file, which was finally shuffled.
The FASTQ containing the 1,050,000 reads is available for download.
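The theoretical depths quoted above follow from the usual relation depth = number of reads × read length / target length. The sketch below checks the arithmetic; the chromosome length of about 4.69 Mb is an approximate value assumed here for illustration and is not stated on this page, and the plasmid length is simply back-calculated from the quoted 80X.

```python
def theoretical_depth(n_reads, read_length, target_length):
    """Mean depth expected if reads cover the target uniformly."""
    return n_reads * read_length / target_length

# Assumption: approximate size of the DH10B chromosome (about 4.69 Mb).
APPROX_CHROMOSOME_LENGTH = 4_690_000

chromosome_depth = theoretical_depth(1_000_000, 200, APPROX_CHROMOSOME_LENGTH)

# Target length implied by 50,000 reads of 200 bp at a depth of 80X.
implied_plasmid_length = 50_000 * 200 / 80

print(f"chromosome depth: {chromosome_depth:.1f}X")
print(f"implied plasmid length: {implied_plasmid_length:.0f} bp")
```

Under this assumed chromosome size the result is slightly above 40X, consistent with the "around 40X" given in the text.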

MICRA parameters and results

Illumina paired-end Simulated reads

Paired-end Illumina reads were simulated with ART from the artificially mutated genome with the parameters -l 100 (read size), -m 200 (mean fragment size), -s 10 (standard deviation of the fragment size) and -f 40 (sequencing depth of 40X), and from the pSFO157 plasmid with the parameters -p -l 100 -f 80 -m 200 -s 10. The resulting reads were then merged into two FASTQ files, which were finally shuffled synchronously. The FASTQ files are available for download: file1 file2.
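Shuffling the two files synchronously means applying the same random permutation to both, so that the n-th record of file1 still pairs with the n-th record of file2. The tool actually used for this step is not named on this page; the following is a minimal sketch under the assumptions that each FASTQ record spans exactly four lines and that both files fit in memory. All function names are hypothetical.

```python
import random

def read_fastq(path):
    """Return the records of a FASTQ file as a list of 4-line tuples."""
    with open(path) as handle:
        lines = [line.rstrip("\n") for line in handle]
    return [tuple(lines[i:i + 4]) for i in range(0, len(lines), 4)]

def write_fastq(records, path):
    """Write 4-line records back to a FASTQ file."""
    with open(path, "w") as handle:
        for record in records:
            handle.write("\n".join(record) + "\n")

def shuffle_pairs(in1, in2, out1, out2, seed=0):
    """Shuffle two paired FASTQ files with one shared permutation."""
    mates1, mates2 = read_fastq(in1), read_fastq(in2)
    if len(mates1) != len(mates2):
        raise ValueError("the two FASTQ files must contain the same number of reads")
    order = list(range(len(mates1)))
    random.Random(seed).shuffle(order)
    # The same index order is applied to both files, which keeps mates aligned.
    write_fastq([mates1[i] for i in order], out1)
    write_fastq([mates2[i] for i in order], out2)
```

Shuffling an index list rather than each file separately is the design choice that guarantees the pairing survives.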

MICRA parameters and results

Fully-controlled conditions: real data and artificially mutated genome

Cross-validation

To avoid simulation biases and confirm the previous results, the same experiment was repeated with different initial conditions: one million real reads (to reach a mean depth of 40X) were used as input, and the artificially mutated genome sequence replaced the K12-DH10B genome among the reference genomes.
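This page does not say how the one million real reads were chosen. One possible approach, shown here purely as an assumption, is uniform random subsampling of the FASTQ file by reservoir sampling, which avoids loading the full read set in memory. The function name and the four-lines-per-record layout are hypothetical choices for this sketch.

```python
import random

def subsample_fastq(in_path, out_path, n_reads, seed=0):
    """Write a uniform random sample of n_reads records; return the number kept."""
    rng = random.Random(seed)
    reservoir = []
    with open(in_path) as handle:
        index = 0
        while True:
            record = [handle.readline() for _ in range(4)]
            if not record[0]:  # end of file
                break
            if index < n_reads:
                # Fill the reservoir with the first n_reads records.
                reservoir.append(record)
            else:
                # Replace a stored record with probability n_reads / (index + 1).
                j = rng.randint(0, index)
                if j < n_reads:
                    reservoir[j] = record
            index += 1
    with open(out_path, "w") as handle:
        for record in reservoir:
            handle.writelines(record)
    return len(reservoir)
```

A call such as subsample_fastq("all_reads.fastq", "subset.fastq", 1_000_000) would produce the reduced input, with both file names being placeholders.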

MICRA parameters and results

Real-life conditions and comparison with external tools

Data description

A set of 2,290,055 reads with a mean length of 200 bp and a mean quality of Q28 from an Ion Torrent 316 run was used in this part. Two distinct MICRA analyses were performed:

  1. MICRA was used with the entire set of reads and only the K12-DH10B genome (NCBI:CP000948.1) as reference sequence. This illustrates the study of a mutant with a view to rapidly identifying its variations relative to the related wild-type strain.
  2. MICRA was used in automatic mode.

MICRA parameters and results

  1. With the K12-DH10B genome (NCBI:CP000948.1)
    • MICRA parameters:
      • Reads: DH10B reads (Ion Torrent 316 run)
      • Sequencing technology: Ion Torrent
      • In "Selection of the reference sequences" in the part "Give your own reference sequences" load this reference ID file containing only the DH10B reference genome.
      • In "General options" check the box to keep the SAM files
      • Ask MICRA to run the Antibiotic module
    • MICRA results are available for download.
  2. Automatic analysis
    • Running only the preProcess module of MICRA:
      • In the part "Selection of the reference sequences" set the "Number of reference genomes to be selected" to 6 and check the box "Performs only the selection of the reference genomes?".
      • In the result directory, copy the 5 lines (excluding the first one, which corresponds to CP000948.1) from the genomeList.txt file and the 5 lines from the plasmidList.txt file into a new text file for the next step (the reference ID file).
    • MICRA parameters:
      • Reads: DH10B reads (Ion Torrent 316 run)
      • Sequencing technology: Ion Torrent
      • In "Selection of the reference sequences" in the part "Give your own reference sequences" load this reference ID file containing five reference genomes and five reference plasmids and change the field value "Percentage of covered sequence for a plasmid to be considered in the analysis" to 60%.
      • In "General options" check the box to keep the SAM files
      • Ask MICRA to run the Antibiotic module
    • MICRA results are available for download.
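The manual copy step of the automatic analysis (building the reference ID file from genomeList.txt and plasmidList.txt) can also be scripted. The sketch below assumes that both files contain one reference per line with no header and that the first line of genomeList.txt is the CP000948.1 entry to be skipped, as described above; the function name and the output file name are hypothetical.

```python
from pathlib import Path

def build_reference_id_file(result_dir, out_path, n_genomes=5, n_plasmids=5):
    """Combine selected genome and plasmid lines into one reference ID file."""
    result_dir = Path(result_dir)
    genome_lines = (result_dir / "genomeList.txt").read_text().splitlines()
    plasmid_lines = (result_dir / "plasmidList.txt").read_text().splitlines()
    # Skip the first genome line (CP000948.1 in this analysis), keep the next ones.
    selected = genome_lines[1:1 + n_genomes] + plasmid_lines[:n_plasmids]
    Path(out_path).write_text("\n".join(selected) + "\n")
    return selected
```

For example, build_reference_id_file("results", "referenceIDs.txt") would write the five genome lines and five plasmid lines to a new file that can then be loaded in "Give your own reference sequences".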

Fundamental research case: the P134 strain of Bordetella pertussis

Data description

The genomic characteristics of P134 have only been superficially investigated, and its genome contains a high number of inserted repeated sequences. We sequenced a lab-adapted P134 strain using a PGM Ion Torrent sequencer (314 chip; raw reads are available on NCBI SRA:SRR4019415) and used MICRA to characterize this strain.

MICRA parameters and results

Clinical case: the 2011 German outbreak caused by Escherichia coli O104:H4

MICRA used with the TY2482 sequences

Data description

MICRA was used with the TY2482 chromosome and the three plasmid sequences (downloaded here) to evaluate the potential power of the analysis, independently of the choice of the reference sequences given as input.

MICRA parameters and results

MICRA used without the TY2482 sequences

Data description

MICRA was run in automatic mode. To stay as close as possible to the original outbreak conditions and avoid biasing the results obtained with MICRA, sequences that were too similar and sequenced after the outbreak were discarded from a local version of the databases used for the selection of the reference sequences, leading to the selection of 5 genomes and 10 plasmids.

MICRA parameters and results

Study of the 2009-2050 and 2009-2071 strains (Illumina data)

Data description

Two Escherichia coli O104:H4 isolates closely related to TY2482, 2009EL-2050 and 2009EL-2071, were sequenced and studied in 2011, shortly after the German outbreak. We used MICRA with a subset of the single Illumina reads and the same reference genomes as those used for the TY2482 isolate.

MICRA parameters and results for the 2009-2050 strain

MICRA parameters and results for the 2009-2071 strain

Study of Staphylococcus aureus data

Illumina paired-end data

Data description

The Staphylococcus aureus sequence type 239 (TW20) strain is a recently emerged, multi-resistant and highly transmissible strain. It contains one chromosome (TW20 NCBI:NC_017331.1) and two plasmids: pTW20_1 (NCBI:NC_017352.1) and pTW20_2 (NCBI:NC_017332.1). We used this strain to evaluate the MICRA pipeline on Illumina paired-end data.

MICRA parameters and results for aureus with reference sequences

MICRA parameters and results for aureus without reference sequences

Ion Torrent data

Data description

The strain Staphylococcus aureus subsp. aureus ST398 contains one chromosome (NCBI:AM990992.1) and three plasmids (pS0385-1 NCBI:AM990993.1, pS0385-2 NCBI:AM990994.1 and pS0385-3 NCBI:AM990995.1).

MICRA parameters and results for aureus without reference sequences

Study of Clostridium autoethanogenum data

Data description

A draft genome for the Clostridium autoethanogenum DSM10061 strain, an industrially relevant bacterium, was published by Bruno-Barcena et al. in 2013 [3] from 454 GS FLX and Ion Torrent PGM data. Pacific Biosciences single-molecule DNA sequencing technology was used by Brown et al. in 2014 [4] to generate a finished genome sequence. Humphreys et al. in 2015 [5] inspected the closed genome sequence of Brown et al. in detail and identified some frame-shift mutations resulting in premature stop codons. They resequenced this strain using Illumina MiSeq technology, observed 243 single-nucleotide discrepancies relative to the previously published genome, which they attributed to sequencing errors, and finally completed a comprehensive manual annotation. The study of this strain represented a very good application case for MICRA.

Illumina paired-end data

MICRA parameters and results for clostridium with reference sequences

MICRA parameters and results for clostridium without reference sequences

Ion Torrent data

MICRA parameters and results for clostridium with reference sequences

MICRA parameters and results for clostridium without reference sequences