1.2. Evolutionary mechanisms of divergence
1.2.4. Tools for analysis of genetic divergence: Molecular markers
The huge diversity exhibited by the different organisms that inhabit the world can be reflected not only in phenotypic characters but also at the molecular level. Phylogenetic relationships, genetic structure, and evolutionary history among organisms or genes are studied by comparing DNA or protein sequences.
Differences between the sequences evidence genetic divergence caused by evolutionary processes over time.
Despite the large number of species described, there are still millions of unclassified or unknown species.
The traditional system of organism classification is based on morphology and presents some limitations.
Molecular markers are also useful in the classification and identification of unknown organisms. Their use, despite presenting some weaknesses, can complement the traditional morphology-based method for ecological studies (Patwardhan et al., 2014).
Next-generation sequencing (NGS) technologies are revolutionising the field of evolutionary biology, providing new opportunities for genetic analysis at scales not previously possible. These new technologies make it possible to carry out research in population genetics (Hohenlohe et al., 2010), quantitative trait mapping (Baird et al., 2008), comparative genomics, and phylogeography (Emerson et al., 2010; Gompert et al., 2010) at a genome-wide level in model and non-model organisms (Mardis, 2008a, 2008b).
Despite considerable progress, these techniques have some limitations, mainly related to the need to develop robust analytical tools to carry out the bioinformatic analysis (Etter et al., 2012).
In general, molecular markers play a fundamental role in assessing genetic variation and biodiversity with precision and reliability. These markers can be broadly classified into two types: mitochondrial and nuclear markers.
1.2.4.1. Mitochondrial DNA
Until now, mtDNA has been the most widely used molecular marker; consequently, the associated techniques and methodology are well developed (Patwardhan et al., 2014). It has several features that make it well suited to this role, such as its molecular simplicity, high levels of variability, and almost neutral mode of evolution (Avise, 2004, 2009). Mitochondrial markers also have an effective population size (Ne) approximately one-quarter that of nuclear markers, which makes it possible to recover the pattern and timing of recent historical events. Moreover, because recombination in this genome is low, the whole molecule can be assumed to share the same genealogical history (Castro et al., 1998; Jiang et al., 2016). The variable substitution rates enable faster evolving regions of the mitochondrial genome to be used for intraspecific variation, and the slower evolving regions for interspecific or intra-genera variation (Gübitz et al., 2000, 2005; Brown et al., 2000, 2006, 2008; Amato et al., 2008; Terrasa et al., 2009a).
Even though mitochondrial DNA has proved extremely useful in describing population genetic structure, resolving species-level phylogenies, and phylogeographic analysis, it has limitations. It only provides information from the maternal lineage and does not recombine, so the resulting gene tree may not reflect the history that would be inferred from a genomic approach. In addition, technical issues arise from the presence of mtDNA sequences integrated into the nuclear genome, which can lead to errors in the analysis (Hurst & Jiggins, 2005).
The vertebrate mitochondrial genome is a circular molecule about 17 kb in length containing 37 genes (Wolstenholme, 1992). Among these, there are two ribosomal RNA (rRNA) genes: 12S and 16S. 12S rRNA is highly conserved and has been applied to understand the genetic diversity of higher categorical levels such as phyla. Meanwhile, 16S rRNA is often used for studies at family or genus levels (Gerber et al., 2001).
Mitochondrial coding genes are regarded as powerful markers for genetic diversity analysis at lower categorical levels, due to their faster evolutionary rates compared to rRNA genes. Animal mitochondria contain 13 protein-coding genes, of which three of the most extensively used are cytochrome b (CYTB), NADH dehydrogenase (NADH), and mitochondrial cytochrome oxidase I (COI). COI has recently gained more attention in developing DNA barcodes for species identification and biodiversity analysis (Janzen et al., 2005; Dawnay et al., 2007). Mitochondrial DNA also contains a non-coding region, called the control region (CR) because of its role in the replication and transcription of mtDNA. The CR fragment shows a higher level of variation than coding sequences due to reduced functional constraints and relaxed selection pressure (Onuma et al., 2006; Arif & Khan, 2009).
1.2.4.2. Nuclear DNA
The second type of marker, as essential as mitochondrial DNA, comprises nuclear loci, which are biparentally inherited (Jiang et al., 2016). The most used nuclear markers include random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), microsatellites, and single nucleotide polymorphisms (SNPs). These markers have several applications in biodiversity analysis and are very useful in determining genetic variability among individuals by comparing genotypes at a number of polymorphic loci (Avise, 2004).
Random amplified polymorphic DNA (RAPD) markers use PCR to amplify random segments of nuclear DNA. The technique uses short single primers that attach to both strands of DNA and low annealing temperatures with the aim of amplifying multiple regions (multilocus). Although RAPD is a simple and inexpensive technique, its major limitation is the inability to differentiate between homozygotes and heterozygotes.
Amplified fragment length polymorphism (AFLP) is a multilocus technique that includes restriction digestion and PCR amplification. The main advantages of AFLP are its high specificity and reproducibility since it allows for selective amplification due to the use of restriction digestion, specific adaptors, and high annealing temperatures.
Microsatellites are short tandem repeats present in multiple copies, located in both coding and non-coding regions and distributed throughout the genome. They are multiallelic markers whose alleles vary among populations.
The development of more advanced sequencing techniques has enabled the emergence of SNPs. These markers enable single base differences to be detected between several sequences of an individual region or of a whole genome within species. Depending on the level of DNA sequencing (genome-wide or specific region), SNPs can provide broad genome coverage, high levels of variability, and can be used for phylogenetic reconstruction (Arif & Khan, 2009).
1.2.4.2.1. Single Nucleotide Polymorphisms
Single nucleotide polymorphisms are single base changes in the nucleotide sequence of genomic DNA that may or may not have a phenotypic effect. SNPs are usually biallelic markers that present less variability than microsatellites, but they are important nuclear markers that have been widely used for population structure and genetic diversity studies due to the large increase in the number of loci available (Brumfield et al., 2003) and the high polymorphism among populations or individuals (Morin et al., 2004). Other advantages are their simpler mutational process, which results in a lower rate of homoplasy, and the capacity for rapid, large-scale, cost-effective genotyping (Syvänen, 2001; Vignal et al., 2002; Brumfield et al., 2003; Chen & Sullivan, 2003; Schlötterer, 2004). Several studies have examined the theoretical potential of SNPs for estimating parameters such as population history and inference of pairwise relationships (Kuhner et al., 2000; Glaubitz et al., 2003), and their application in ecological or conservation analysis (Morin et al., 2004; Kleinman-Ruiz et al., 2017; Rhode et al., 2017; Lemopoulos et al., 2019).
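As an illustration of what detecting single base differences between sequences involves, the following minimal Python sketch scans a set of already aligned sequences and reports the variable sites. The function name `find_snps` and the toy sequences are hypothetical and were chosen only for illustration; real SNP calling from sequencing reads also has to deal with sequencing error, coverage, and genotype uncertainty, none of which are modelled here.

```python
from collections import Counter


def find_snps(aligned_seqs):
    """Return (position, allele_counts) for every variable site.

    Assumes the sequences are already aligned and of equal length.
    Characters other than A, C, G, T (gaps, N) are ignored.
    """
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    # zip(*aligned_seqs) walks the alignment column by column
    for pos, column in enumerate(zip(*aligned_seqs)):
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) > 1:  # more than one allele observed at this site
            snps.append((pos, dict(counts)))
    return snps


# Toy alignment (hypothetical data): four individuals, eight sites
toy = ["ACGTACGT", "ACGTACGT", "ACCTACGT", "ACCTACGA"]
snps = find_snps(toy)
for pos, counts in snps:
    kind = "biallelic" if len(counts) == 2 else "multiallelic"
    print(pos, counts, kind)
```

In this toy alignment both variable sites carry two alleles, which is consistent with the description of SNPs as usually biallelic markers.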
1.2.4.2.2. Genome-wide sequencing: ddRAD sequencing
Next-generation sequencing (NGS) is a relatively recent method that generates short sequenced segments throughout the whole genome, which are subsequently analysed and assembled into genomes. NGS technologies have become an important tool for molecular ecologists interested in performing evolutionary, ecological, and conservation studies (Allan & Max, 2010), since they make it possible to carry out genetic analysis at a genomic scale and in non-model organisms, that is, species with few genomic resources (Etter et al., 2012). Studies based on large, genome-wide datasets have greater power than studies based on a small number of neutral loci to determine population structure, as well as to identify the small proportion of loci that are putatively under selection and consequently ecologically relevant in adaptation (Allendorf et al., 2010; Stapley et al., 2010; Narum et al., 2013). With these new genome-wide techniques, we can assess levels of population structure not detected with previous sequencing techniques. Recent episodes of divergence, high gene flow, or genetic drift can be observed (Benestan et al., 2015; Lal et al., 2016; Vendrami et al., 2017), and, even when there is negligible neutral differentiation (Jones et al., 2012; Pavey et al., 2015), adaptive divergence can also be detected, which is relevant in management decisions and delimitation of conservation units (Funk et al., 2012).
Restriction site associated DNA sequencing (RADseq) genotyping methods (Baird et al., 2008; Etter et al., 2012) combined with NGS technologies have become a powerful and widely used tool in ecological and evolutionary genomic studies (Davey & Blaxter, 2010; Andrews et al., 2016). They enable thousands of genome-wide polymorphic sites (SNPs) to be recovered cost-effectively in model and non-model organisms (Davey et al., 2011). The RADseq approach combines restriction enzyme (RE) digestion with genome-wide sequencing of the regions adjacent to restriction sites, enabling the exploration of homologous genomic regions for thousands of individuals and the identification of many genetic polymorphisms along the genome. Although a larger proportion of the genome could be examined with other NGS methods, they are more expensive and cannot be applied to as many individuals (Andrews et al., 2016).
A recent RADseq-based genotyping method is double digest RADseq (ddRAD) (Peterson et al., 2012). This technique involves the digestion of genomic DNA using both a common and a rare RE. After whole genome digestion, different molecular processes such as adaptor ligation and size selection transform the DNA fragments into a genomic library suitable for sequencing on an NGS platform. Either single-end or paired-end sequencing can then be used to generate the genomic data and markers (Figure 13).
This protocol differs from RADseq in two principal points: digestion with two REs, rather than random shearing, and the precise size-selection step. These characteristics enable the production of sequencing libraries consisting only of the subset of genomic restriction digest fragments generated by cuts with both REs and which fall within the size-selection window (Peterson et al., 2012).
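The fragment selection described above can be sketched in silico. The Python example below is a minimal illustration under stated assumptions, not the protocol of Peterson et al. (2012): the function names (`cut_sites`, `double_digest`), the toy sequence, and the size window are hypothetical, and the enzyme pair (EcoRI, recognition site GAATTC, as the rare cutter; MspI, recognition site CCGG, as the common cutter) is chosen only as an example. It keeps the fragments that are flanked by one cut from each enzyme and whose length falls within the size-selection window.

```python
import re


def cut_sites(seq, site, offset):
    """Cut positions of one enzyme: start of each recognition site plus offset."""
    # A lookahead is used so that overlapping recognition sites are also found
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def double_digest(seq, rare=("GAATTC", 1), common=("CCGG", 1),
                  size_window=(10, 30)):
    """Return (start, end, length) of fragments flanked by one rare and one
    common cut whose length lies inside size_window (inclusive)."""
    cuts = [(pos, "rare") for pos in cut_sites(seq, *rare)]
    cuts += [(pos, "common") for pos in cut_sites(seq, *common)]
    cuts.sort()
    low, high = size_window
    kept = []
    # Each pair of adjacent cuts delimits one fragment of the complete digest
    for (start, enzyme_a), (end, enzyme_b) in zip(cuts, cuts[1:]):
        length = end - start
        if enzyme_a != enzyme_b and low <= length <= high:
            kept.append((start, end, length))
    return kept


# Toy sequence (hypothetical): two EcoRI sites and three MspI sites
genome = ("A" * 5 + "GAATTC" + "T" * 12 + "CCGG" + "A" * 40 + "CCGG"
          + "T" * 8 + "CCGG" + "A" * 5 + "GAATTC" + "A" * 3)
fragments = double_digest(genome)
print(fragments)
```

In this toy sequence only one fragment is retained: the fragments flanked by two MspI cuts are discarded regardless of length, and the second fragment with mixed ends is shorter than the lower bound of the window.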
1.2.4.2.3. Outlier loci
Determining the genetic basis of adaptive characteristics in natural populations is key to understanding how populations adapt to variable environments (Nunes et al., 2011a). Diversity in environmental factors may result in phenotypic or physiological differences, leading to morphological and molecular adaptations. Genetic divergence between populations can result in allele frequency differences at loci related to local adaptation or natural selection, and consequently some regions undergo faster divergence than others (Wu, 2001; Nosil et al., 2009). These differences result in the existence of peaks of divergence where genetic differentiation accumulates, and other regions with little to no differentiation (Feder et al., 2013; Seehausen et al., 2014). The regions that exhibit greater differentiation than expected under neutrality are known as “genomic islands” (Figure 14), and are composed of selected, tightly linked loci identified as outliers (Nosil et al., 2009).
Figure 13. Graphic representation of the ddRAD sequencing process. Labelled elements: genomic fragment with common and rare restriction enzyme sites; restriction enzyme digestion; ligation of P1 and P2 adapters.
Neutral processes, such as genetic drift or gene flow, influence all genomic regions, whereas selection leaves a characteristic pattern of variability at specific loci, enabling the identification of these loci (Beaumont, 2005; Storz, 2005). Detection of these outlier loci is carried out by estimating population genetic differentiation (i.e., FST) (Beaumont & Balding, 2004; Storz, 2005; Bonin, 2008; Foll & Gaggiotti, 2008; Narum & Hess, 2011). Divergence of recently isolated populations may not be reflected at neutral loci and may therefore go undetected by traditional approaches.
Figure 14. Genomic islands of speciation. Schematic figure of patterns of differentiation along a chromosome. Excavations represent regions under balancing selection, sea floor represents neutrally evolving regions, and sea level represents a neutrality threshold. Islands are regions with greater differentiation than expected under neutrality. Source: Nosil et al. (2009).
In contrast, loci putatively under selection may offer valuable population markers for more recent ecological timescales (Russello et al., 2012). Genetic variation at neutral loci is shaped by mutation, recombination, gene flow, and genetic drift (Wright, 1931), processes that affect genome-wide variation within and between populations. Natural selection operates on population structure to cause adaptive divergence. Next-generation sequencing is currently the method of choice in population genetics, as it integrates information from neutral and adaptive loci to characterize population genetic structure and adaptive differentiation within populations (Funk et al., 2012).
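The idea of identifying outlier loci from estimates of FST can be illustrated with a deliberately simplified Python sketch. It assumes biallelic loci, two populations of equal sample size, and known allele frequencies, and it flags the loci in the upper tail of the empirical FST distribution. This cutoff is a stand-in chosen for illustration and is not one of the model-based methods cited above; the function names, the `top_fraction` parameter, and the allele frequencies are all hypothetical.

```python
def fst_biallelic(p1, p2):
    """Per-locus FST = (HT - HS) / HT for two equally sized populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population heterozygosity
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)                       # total expected heterozygosity
    return 0.0 if ht == 0 else (ht - hs) / ht


def flag_outliers(freqs, top_fraction=0.05):
    """Return per-locus FST values and the indices of the most differentiated loci."""
    fsts = [fst_biallelic(p1, p2) for p1, p2 in freqs]
    n_flag = max(1, round(len(fsts) * top_fraction))
    ranked = sorted(range(len(fsts)), key=lambda i: fsts[i], reverse=True)
    return fsts, sorted(ranked[:n_flag])


# Toy data (hypothetical): 19 weakly differentiated loci and one strongly
# differentiated locus at index 19
freqs = [(0.50, 0.52)] * 19 + [(0.05, 0.95)]
fsts, outliers = flag_outliers(freqs)
print(outliers, round(fsts[19], 2))
```

A fixed quantile always flags some loci, even when none are under selection, which is one reason why dedicated approaches compare observed values with a neutral expectation instead.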
1.3. Melanism
Melanism is a type of colour polymorphism in which individuals are completely black; it is a well-studied trait in different taxa and has been the focus of several studies on evolutionary adaptation (Norris & Lowe, 1964; Wiens, 1999; Cox & John-Alder, 2005; Janse van Rensburg et al., 2009; Alho et al., 2010). The role of melanism is very complex and has been related to a wide range of adaptive functions such as sexual selection and reproductive success (Wiernasz, 1989; Sinervo & Lively, 1996; Jawor & Breitwisch, 2003; Griffith et al., 2006; Ducrest et al., 2008; Fedorka et al., 2013).
Figure 15. Dermal chromatophore unit. Source: Vitt & Caldwell (2014).
Different physiological functions have also been associated with melanism: thermoregulation, whereby darker animals warm faster and maintain higher body temperature (Kettlewell, 1973; Kingsolver & Wiernasz, 1991; Vences et al., 2002; Clusella-Trullas et al., 2007; Reguera et al., 2014; Azócar et al., 2016); ultraviolet (UV) protection, whereby dark pigments screen out harmful radiation (Gunn, 1998; Hofer & Mokri, 2000; Callaghan et al., 2004; Calbó et al., 2005; Reguera et al., 2014); and immune response, taking advantage of the properties of melanin (Mackintosh, 2001; Wilson et al., 2001; Dubovskiy et al., 2013). Other functions, such as stress resistance, energy balance (Hoekstra, 2006; Ducrest et al., 2008), and crypsis in response to predation risk (Kettlewell, 1973; Endler, 1984; Vroonen et al., 2012; Fulgione et al., 2014; Reguera et al., 2014), have also been linked to melanism.
In colour-changing vertebrates (except birds and mammals), there are three types of chromatophores (pigment cells): melanophores, which impart brown, black, or red colouration; xanthophores, which contain yellow, red, and orange pigments; and iridophores, which produce blue, purple, green, and iridescent colours (Figure 15) (Hofreiter & Schöneberg, 2010; Kuriyama et al., 2013; Vitt & Caldwell, 2014).
The melanophore’s distribution depends on the organisation of melanosomes, which are organelles that exclusively contain melanin (Bagnara & Hadley, 1973), the name given to all members of the tyrosine-derived class of pigments found in melanophores (Lerner & Fitzpatrick, 1950; Nicolaus & Piattelli, 1962; Ito & Wakamatsu, 2003). Melanin can be classified into several types: neuromelanin, allomelanin, pheomelanin, and eumelanin (Fedorow et al., 2005). Neuromelanin is specifically expressed in the nervous systems of primates (Marsden, 1961; Fedorow et al., 2005), while allomelanin is found in fungi, plants, and bacteria (Fedorow et al., 2005). Pheomelanin ranges from yellow to red and has been found only in melanocytes of mammalian and avian species (Ito & Wakamatsu, 2003). Finally, there is the brown-black eumelanin, which is produced by all vertebrates with the physiological ability of colour change (Aspengren et al., 2009). In general, variation in colouration in vertebrates is mostly attributable to differential accumulation of reddish-brown pheomelanin and black-grey eumelanin pigments (Majerus, 1998) (Figure 16).
In reptiles, all three dermal pigment cell types are present in melanic individuals, but melanophores are more abundant (Kuriyama et al., 2016). Reptile melanophores were long thought to produce only eumelanin (Bagnara & Hadley, 1973), but pheomelanin has recently been discovered in the shell of Hermann’s tortoise (Roulin et al., 2013). Changes in the production and dispersion of melanin granules are ultimately responsible for changes in the dorsal colour of reptiles (Hadley, 1997).