Sequence analysis through QIIME - The potential role of tap water bacteria in inflammatory bowel disease

A popular bioinformatics pipeline for sequence analysis is QIIME, an abbreviation for Quantitative Insights Into Microbial Ecology. A mapping file is normally required for the analysis, providing the program with necessary information about the samples, such as sample identifiers, barcode sequences and primer sequences. Navas-Molina et al. (2013) have proposed a rough division of the QIIME workflow into an “upstream” and a “downstream” analysis, each encompassing several steps carried out through a series of commands.
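As a rough illustration, the sketch below parses a mapping file in the tab-separated layout used by QIIME 1.x, where the header starts with `#SampleID`, followed by `BarcodeSequence` and `LinkerPrimerSequence`, with `Description` as the last column. The sample names, barcodes and the extra `Source` column are invented for this example, and `read_mapping` is a hypothetical helper, not part of QIIME.

```python
import io

# Invented example in the QIIME 1.x mapping-file layout (tab-separated).
MAPPING_TXT = (
    "#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tSource\tDescription\n"
    "Tap1\tAGCTAGCT\tGTGCCAGCMGCCGCGGTAA\ttap_water\tKitchen tap\n"
    "Tap2\tTCGATCGA\tGTGCCAGCMGCCGCGGTAA\ttap_water\tBathroom tap\n"
    "# comment lines starting with '#' are ignored\n"
)


def read_mapping(handle):
    """Return {sample_id: {column_name: value}} from a tab-separated mapping file."""
    rows = [line.rstrip("\n").split("\t") for line in handle if line.strip()]
    header = rows[0]
    if header[0] != "#SampleID":
        raise ValueError("the first header column must be #SampleID")
    samples = {}
    for fields in rows[1:]:
        if fields[0].startswith("#"):
            continue  # skip comment lines
        samples[fields[0]] = dict(zip(header[1:], fields[1:]))
    return samples


samples = read_mapping(io.StringIO(MAPPING_TXT))
# Barcode -> sample lookup, as needed later for demultiplexing.
barcode_to_sample = {meta["BarcodeSequence"]: sid for sid, meta in samples.items()}
print(barcode_to_sample)  # {'AGCTAGCT': 'Tap1', 'TCGATCGA': 'Tap2'}
```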

1.8.1 Upstream analysis

Pre-processing of input data

The first step of analysis with this open-source tool is a pre-processing step encompassing several operations that affect the downstream analysis. The first operation is the assignment of sequences to their respective samples based on the unique barcode attached to each read, a process known as demultiplexing. Barcodes and primers are then removed. This is followed by a quality-filtering step, in which sequences of low quality or with possible ambiguities are discarded according to a given set of parameters. These could include the minimum Q-score (q), the minimum length of consecutive high-quality base calls as a fraction of the read length (p), the maximum number of consecutive low-quality base calls (r) and the maximum number of ambiguous bases (n) (Navas-Molina et al. 2013). Quality filtering is often followed by subsampling of the sequences to a given threshold (cut-off value), which gives an even sequencing depth in all samples before downstream analysis. That is, the same number of sequences, equal to the cut-off value, is drawn at random from each sample (Kuczynski et al. 2011; Nelson et al. 2014).
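The demultiplexing and quality-filtering steps can be sketched in Python as follows. This is a simplified illustration, not QIIME's implementation: it assumes exact barcode matches at the start of each read, treats a base call as low quality when its Phred score is below q, truncates a read before the first run of more than r low-quality calls, and keeps it only if at least a fraction p of its length remains and it contains at most n ambiguous bases (N). The functions `demultiplex` and `quality_filter` and the default parameter values are hypothetical.

```python
def demultiplex(reads, barcode_to_sample, barcode_len, primer_len=0):
    """Group (read_id, seq, quals) records by sample using an exact leading barcode.

    The barcode and the following primer (primer_len bases) are trimmed off.
    Reads with an unknown barcode are discarded.
    """
    per_sample = {}
    start = barcode_len + primer_len
    for read_id, seq, quals in reads:
        sample = barcode_to_sample.get(seq[:barcode_len])
        if sample is None:
            continue
        per_sample.setdefault(sample, []).append((read_id, seq[start:], quals[start:]))
    return per_sample


def quality_filter(seq, quals, q=20, p=0.75, r=3, n=0):
    """Return the (possibly truncated) read, or None if it fails the filter."""
    cut = len(seq)
    run_start = None
    for i, qual in enumerate(quals):
        if qual < q:                      # low-quality base call
            if run_start is None:
                run_start = i
            if i - run_start + 1 > r:     # run of more than r low-quality calls
                cut = run_start           # truncate before the run
                break
        else:
            run_start = None
    trimmed = seq[:cut]
    if len(trimmed) < p * len(seq):
        return None                       # too little high-quality sequence left
    if trimmed.upper().count("N") > n:
        return None                       # too many ambiguous base calls
    return trimmed


barcode_to_sample = {"AGCT": "Tap1", "TCGA": "Tap2"}
reads = [
    ("r1", "AGCTACGTACGTAC", [30] * 14),
    ("r2", "TCGAACGTNNGTAC", [30] * 14),           # two ambiguous bases
    ("r3", "GGGGACGTACGTAC", [30] * 14),           # unknown barcode
    ("r4", "AGCTACGTACGTAC", [30] * 6 + [5] * 8),  # long low-quality tail
]
per_sample = demultiplex(reads, barcode_to_sample, barcode_len=4)
kept = {
    sample: [s for s in (quality_filter(seq, quals) for _, seq, quals in recs) if s]
    for sample, recs in per_sample.items()
}
print(kept)  # {'Tap1': ['ACGTACGTAC'], 'Tap2': []}
```

Real data would usually also require tolerance for barcode sequencing errors, which this sketch leaves out.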

OTU designation

A step that can have a large impact on downstream analysis is the assignment of sequences to OTUs (operational taxonomic units), which is normally performed at 97% sequence similarity.

QIIME offers three approaches for this purpose: de novo, open-reference and closed-reference clustering. In the de novo approach, sequences are grouped into OTUs based on their similarity to each other, without the use of known reference sequences. The reference-based approaches, on the other hand, cluster sequences against a set of reference sequences, which gives a predefined set of possible OTUs. The main difference between the two reference-based approaches is that the closed-reference approach excludes sequences that fail to cluster against the reference, whereas in the open-reference approach these sequences are clustered de novo. In all cases, each resulting OTU comprises a group of related sequences.
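The difference between the three clustering strategies can be illustrated with a greedy, centroid-based sketch at 97% similarity. For simplicity, it assumes equal-length, pre-aligned reads and computes identity as the fraction of matching positions; real clustering tools such as UCLUST rely on sequence alignment and further heuristics. All function names and toy sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions (assumes equal-length, pre-aligned reads)."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_de_novo(seqs, threshold=0.97):
    """Greedy clustering: a read joins the first matching centroid, else founds a new OTU."""
    otus = {}  # centroid sequence -> member reads
    for s in seqs:
        for centroid in otus:
            if identity(s, centroid) >= threshold:
                otus[centroid].append(s)
                break
        else:
            otus[s] = [s]
    return otus


def cluster_reference(seqs, references, threshold=0.97, open_reference=False):
    """Closed-reference clustering; optionally cluster the failures de novo (open-reference)."""
    otus = {ref_id: [] for ref_id in references}
    failures = []
    for s in seqs:
        best = max(references, key=lambda ref_id: identity(s, references[ref_id]))
        if identity(s, references[best]) >= threshold:
            otus[best].append(s)
        else:
            failures.append(s)  # dropped unless open_reference is True
    otus = {ref_id: members for ref_id, members in otus.items() if members}
    if open_reference:
        new_otus = cluster_de_novo(failures, threshold)
        for i, members in enumerate(new_otus.values()):
            otus[f"denovo{i}"] = members
    return otus


base = "ACGT" * 25                     # toy 100-nt reference sequence
variant = base[:50] + "T" + base[51:]  # one mismatch: 99% identical to base
other = "TGCA" * 25                    # unrelated read
reads = [base, variant, other, other]

print(len(cluster_de_novo(reads)))                                          # 2
print(list(cluster_reference(reads, {"ref1": base})))                       # ['ref1']
print(list(cluster_reference(reads, {"ref1": base}, open_reference=True)))  # ['ref1', 'denovo0']
```

In the closed-reference run the two copies of `other` are lost, while the open-reference run keeps them as a new de novo OTU.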

To simplify downstream computational analysis, one representative sequence is chosen for each OTU, and this sequence is subsequently assigned a taxonomic identity. The taxonomic level to which an OTU can be assigned, however, depends on the resolution of the representative sequence. If needed, the representative sequence can be searched against an appropriate database with a tool such as BLAST (Basic Local Alignment Search Tool) for further taxonomic identification (Kuczynski et al. 2011; qiime.org). Finally, the OTUs are used to build an OTU-table and a phylogenetic tree that visualizes the phylogenetic relationships between the identified OTUs. It has been argued that the creation of the OTU-table should be followed by a second quality-filtering step to remove spurious low-abundance OTUs (Navas-Molina et al. 2013), which are often the result of chimera formation, PCR errors or sequencing errors (Nelson et al. 2014).
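A minimal sketch of picking representative sequences, building an OTU-table and removing low-abundance OTUs is given below. Choosing the most abundant sequence as the representative is only one of several possible rules, and the cut-off `min_fraction` is an arbitrary illustrative value; the helper functions and data are hypothetical.

```python
from collections import Counter


def pick_representative(members):
    """Pick the most abundant unique sequence in an OTU (one of several possible rules)."""
    return Counter(members).most_common(1)[0][0]


def build_otu_table(assignments):
    """Build {otu_id: Counter({sample_id: count})} from (otu_id, sample_id) pairs, one per read."""
    table = {}
    for otu, sample in assignments:
        table.setdefault(otu, Counter())[sample] += 1
    return table


def filter_low_abundance(table, min_fraction):
    """Drop OTUs whose total count is below min_fraction of all reads in the table."""
    total = sum(sum(counts.values()) for counts in table.values())
    return {otu: counts for otu, counts in table.items()
            if sum(counts.values()) >= min_fraction * total}


rep = pick_representative(["ACGTT", "ACGTA", "ACGTT"])
assignments = ([("OTU1", "Tap1")] * 60 + [("OTU1", "Tap2")] * 38
               + [("OTU2", "Tap1")] * 2)
table = build_otu_table(assignments)
filtered = filter_low_abundance(table, min_fraction=0.05)
print(rep, table["OTU1"]["Tap2"], sorted(filtered))  # ACGTT 38 ['OTU1']
```

Here OTU2 accounts for only 2 of 100 reads and is removed as a likely spurious OTU.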

1.8.2 Downstream analysis

Using the constructed OTU-table and the phylogenetic tree, QIIME offers a number of options for downstream analysis, statistics and visualization. The relative abundance of taxa at different taxonomic levels, both within and between communities, can be visualized in charts, and a number of commands implement different metrics for estimating diversity (qiime.org). For simplicity, only a subset of these metrics and visualization options will be presented here.
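To show how relative abundances at a given taxonomic level can be derived from an OTU-table, the sketch below collapses OTUs by a Greengenes-style taxonomy string (ranks prefixed with `k__`, `p__`, `c__` and so on) and divides by each sample's total count. The function `summarize_taxa`, the taxonomy assignments and the counts are invented for illustration.

```python
def summarize_taxa(table, taxonomy, level):
    """Collapse an OTU-table to one taxonomic level and return relative abundances.

    table:    {otu_id: {sample_id: count}}
    taxonomy: {otu_id: "k__...; p__...; c__..."} (Greengenes-style strings)
    level:    number of ranks to keep (1 = kingdom, 2 = phylum, 3 = class, ...)
    """
    counts = {}  # sample -> taxon -> summed count
    for otu, per_sample in table.items():
        ranks = [rank.strip() for rank in taxonomy[otu].split(";")]
        taxon = ";".join(ranks[:level])
        for sample, n in per_sample.items():
            sample_counts = counts.setdefault(sample, {})
            sample_counts[taxon] = sample_counts.get(taxon, 0) + n
    return {sample: {taxon: n / sum(tc.values()) for taxon, n in tc.items()}
            for sample, tc in counts.items()}


table = {"OTU1": {"Tap1": 30, "Tap2": 10}, "OTU2": {"Tap1": 10}, "OTU3": {"Tap2": 30}}
taxonomy = {
    "OTU1": "k__Bacteria; p__Proteobacteria; c__Betaproteobacteria",
    "OTU2": "k__Bacteria; p__Proteobacteria; c__Alphaproteobacteria",
    "OTU3": "k__Bacteria; p__Actinobacteria; c__Actinobacteria",
}
rel = summarize_taxa(table, taxonomy, level=2)
print(rel["Tap2"])  # {'k__Bacteria;p__Proteobacteria': 0.25, 'k__Bacteria;p__Actinobacteria': 0.75}
```

The resulting per-sample fractions are what a stacked bar chart of phyla would display.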

Intragroup diversity analysis

Alpha diversity describes the diversity within a sample and is often presented as OTU richness, although several other alpha-diversity indices have been developed, such as the Chao1, Shannon and Simpson indices. While the Simpson index mainly reflects the relative abundances of the species in a sample, the Shannon index takes both the number of species and their relative abundances into account. Chao1, on the other hand, aims to estimate the number of species that would be found in a sample if it were sampled exhaustively. Regardless of the method used for alpha-diversity estimates, QIIME allows the results to be presented as a rarefaction plot, making it possible to assess whether the cut-off value gave satisfactory coverage of the species present. This is usually determined by evaluating the extent to which the curves level off towards an asymptote (Pepper et al. 2015; qiime.org).
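The alpha-diversity indices and the rarefaction curve described above can be sketched as follows, assuming each sample is given as a list of OTU counts. Shannon is computed with the natural logarithm and Simpson in the Gini-Simpson form 1 - sum(p_i^2); QIIME's own implementations may use another logarithm base or other variants of these indices. Chao1 uses S_obs + F1^2 / (2 F2), falling back on the bias-corrected form when no doubletons are present. The sample counts and function names are hypothetical.

```python
import math
import random


def observed_otus(counts):
    """OTU richness: number of OTUs with a non-zero count."""
    return sum(1 for c in counts if c > 0)


def shannon(counts):
    """Shannon index H = -sum(p_i * ln(p_i)), using the natural logarithm."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index 1 - sum(p_i^2): probability that two reads drawn
    with replacement belong to different OTUs."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Chao1 = S_obs + F1^2 / (2 * F2); bias-corrected form when F2 == 0."""
    s_obs = observed_otus(counts)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    if f2 > 0:
        return s_obs + f1 ** 2 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def subsample(counts, depth, rng):
    """Draw `depth` reads at random without replacement (requires depth <= sum(counts))."""
    pool = [otu for otu, c in enumerate(counts) for _ in range(c)]
    picked = [0] * len(counts)
    for otu in rng.sample(pool, depth):
        picked[otu] += 1
    return picked


def rarefaction_curve(counts, depths, iterations=10, seed=0):
    """Mean number of observed OTUs over repeated subsamples at each depth."""
    rng = random.Random(seed)
    return [(d, sum(observed_otus(subsample(counts, d, rng))
                    for _ in range(iterations)) / iterations)
            for d in depths]


sample = [10, 5, 2, 1, 1]  # hypothetical OTU counts for one sample (19 reads)
print(observed_otus(sample), chao1(sample))  # 5 7.0
print(rarefaction_curve(sample, [1, 19]))    # [(1, 1.0), (19, 5.0)]
```

Plotting the curve over many intermediate depths shows whether it flattens before the cut-off; the same `subsample` step corresponds to the even-depth subsampling done before downstream analysis.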

Intergroup diversity analysis

Beta-diversity metrics typically aim to quantify the degree of similarity in species composition and/or distribution between samples. Several beta-diversity indices exist, among which Jaccard, Bray-Curtis and UniFrac are possibly the most common. While Jaccard only considers the presence and absence of species, Bray-Curtis also takes their relative abundance into account (Pepper et al. 2015). UniFrac, however, measures the difference between microbial communities as their phylogenetic distance, based on the fraction of branch length in a phylogenetic tree that is unique to one community or the other (Lozupone & Knight 2005). Thus, the more of the tree the communities share, the more similar they are considered to be (Pepper et al. 2015).

UniFrac measurements can be unweighted or weighted. The weighted approach accounts for differences in the relative abundance of taxa in the compared communities and thus gives a quantitative measure of beta diversity (Lozupone et al. 2007). The unweighted approach, on the other hand, only considers the presence or absence of OTUs (Navas-Molina et al. 2013). QIIME allows beta diversity to be visualized through PCoA (Principal Coordinates Analysis) plots and hierarchical clustering (Kuczynski et al. 2011; qiime.org).
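The beta-diversity metrics above can be illustrated on a toy example. The sketch assumes that each community is a dictionary of OTU counts and that the phylogenetic tree is stored as a hypothetical `{node: (parent, branch_length)}` mapping. Unweighted UniFrac is computed as the branch length unique to one community divided by the total branch length observed in either community, and weighted UniFrac in its unnormalized form, the sum of b_i * |A_i/A_T - B_i/B_T| over all branches (a normalized variant also exists). This is a textbook illustration, not QIIME's implementation.

```python
def jaccard(a, b):
    """Jaccard distance on presence/absence: 1 - |shared OTUs| / |all OTUs|."""
    present_a = {otu for otu, n in a.items() if n > 0}
    present_b = {otu for otu, n in b.items() if n > 0}
    return 1 - len(present_a & present_b) / len(present_a | present_b)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    otus = set(a) | set(b)
    diff = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    return diff / sum(a.get(o, 0) + b.get(o, 0) for o in otus)


def branch_counts(tree, community):
    """Add each tip's count to every branch on its path to the root.

    tree maps node -> (parent, branch_length); the root has no entry.
    """
    totals = {node: 0 for node in tree}
    for tip, n in community.items():
        node = tip
        while node in tree:
            totals[node] += n
            node = tree[node][0]
    return totals


def unweighted_unifrac(tree, a, b):
    """Branch length unique to one community / branch length observed in either."""
    ca, cb = branch_counts(tree, a), branch_counts(tree, b)
    unique = observed = 0.0
    for node, (_, length) in tree.items():
        in_a, in_b = ca[node] > 0, cb[node] > 0
        if in_a or in_b:
            observed += length
            if in_a != in_b:
                unique += length
    return unique / observed


def weighted_unifrac(tree, a, b):
    """Unnormalized weighted UniFrac: sum of b_i * |A_i/A_T - B_i/B_T| over branches."""
    ca, cb = branch_counts(tree, a), branch_counts(tree, b)
    total_a, total_b = sum(a.values()), sum(b.values())
    return sum(length * abs(ca[node] / total_a - cb[node] / total_b)
               for node, (_, length) in tree.items())


# Toy tree in Newick notation: ((OTU1:0.2,OTU2:0.3)n1:0.5,OTU3:1.0)root;
tree = {"OTU1": ("n1", 0.2), "OTU2": ("n1", 0.3), "n1": ("root", 0.5), "OTU3": ("root", 1.0)}
a = {"OTU1": 8, "OTU2": 2}
b = {"OTU1": 2, "OTU2": 4, "OTU3": 4}
print(round(jaccard(a, b), 3), round(bray_curtis(a, b), 3))  # 0.333 0.6
print(unweighted_unifrac(tree, a, b))                        # 0.5
print(round(weighted_unifrac(tree, a, b), 3))                # 0.78
```

In a full analysis, such pairwise distances would be computed for every pair of samples, and the resulting distance matrix is what PCoA and hierarchical clustering operate on.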

In document The potential role of tap water bacteria in inflammatory bowel disease (pages 31-34)