
3.5. Statistical analyses

3.5.1. Intra-population variability

3.5.1.1. Allele frequencies

An allele frequency is the relative frequency of a particular allele at a specific locus in a given population. In population genetics, allele frequencies are used to describe the amount of variation at a locus in a population.

For Indel, Alu, and STR markers, allele frequencies were calculated using Arlequin v.3.5 software (Excoffier and Lischer, 2010). In X-chromosome markers, allele frequencies of males and females were calculated separately, and then total frequencies were estimated using the following formula for each allele in each marker:

$$p_i = \frac{(2 \times \text{female frequency}) + \text{male frequency}}{3}$$
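As an illustration only, the weighting above can be sketched in a few lines of Python. The function name `pooled_x_frequency` and the example frequencies are hypothetical and are not taken from the study data.

```python
def pooled_x_frequency(female_freq: float, male_freq: float) -> float:
    """Pool sex-specific frequencies of one X-chromosome allele.

    The female frequency is weighted by 2 and the male frequency by 1,
    and the sum is divided by 3, as in the formula above.
    """
    return (2.0 * female_freq + male_freq) / 3.0


# Hypothetical frequencies of one allele in females and in males.
pooled = pooled_x_frequency(0.30, 0.24)
print(round(pooled, 4))
```

When the two sexes carry the allele at the same frequency, the pooled value equals that frequency.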

3.5.1.2. Hardy-Weinberg Equilibrium (HWE)

Once the allele frequencies of a population are known, the Hardy-Weinberg principle predicts the genotype proportions expected in the next generation when gametes combine at random.

In our studies, the HWE and p-values were calculated using Arlequin v.3.5 software (Excoffier and Lischer, 2010). In X-chromosome markers the calculations were made by taking into account only the female data.

In statistical significance testing, the p-value is the probability, under the null hypothesis, of obtaining a result at least as extreme as the one observed. The null hypothesis is rejected when the p-value is below the significance level α, which in our case is 0.05. When the null hypothesis is rejected, the result is said to be statistically significant.
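To make the Hardy-Weinberg expectation concrete, the following sketch computes expected genotype counts for a biallelic locus and a chi-square goodness-of-fit statistic. This is a simpler, swapped-in technique shown only for illustration: the analyses described here were run in Arlequin, not with this code. The function name and the genotype counts are hypothetical, and the sketch assumes both alleles are present in the sample.

```python
def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Return (p, expected_counts, chi2) for a biallelic locus.

    p is the frequency of allele A; expected counts follow
    p^2, 2pq and q^2 times the number of individuals.
    Assumes both alleles occur, so no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, expected, chi2


# Hypothetical genotype counts with a deficit of heterozygotes.
p, expected, chi2 = hwe_chi_square(40, 20, 40)
# With 1 degree of freedom, the 0.05 critical value is 3.841.
print(p, expected, chi2, chi2 > 3.841)
```

A statistic above the critical value corresponds to a p-value below α = 0.05, so the null hypothesis of equilibrium would be rejected for these counts.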

3.5.1.3. Diversity parameters

3.5.1.3.1. Gene diversity (GD)

This is equivalent to the expected heterozygosity for diploid data (X-chromosome and autosomal markers). It is defined as the probability that two randomly chosen haplotypes are different in the population (Nei, 1987).

$$\text{Gene diversity} = 1 - \sum_{i=1}^{n} p_i^2$$

where p_i is the frequency of the i-th allele in the sample and n is the number of alleles.

This was performed using only female data in X-chromosome markers and all the samples for Autosomal Indels.

For multiallelic markers, diversity parameters were estimated using Arlequin v.3.5 (Excoffier and Lischer, 2010).
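A minimal sketch of the gene diversity formula as written above, assuming the allele frequencies at a locus are already available and sum to 1. The function name and the frequencies are hypothetical.

```python
def gene_diversity(allele_freqs):
    """Gene diversity: 1 minus the sum of squared allele frequencies."""
    if abs(sum(allele_freqs) - 1.0) > 1e-6:
        raise ValueError("allele frequencies should sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# Hypothetical biallelic Indel with equally frequent alleles.
print(gene_diversity([0.5, 0.5]))
```

For a biallelic marker the value peaks at 0.5 when both alleles are equally frequent, which is why balanced Indels are the most informative.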

3.5.1.3.2. Haplotype diversity (HD)

Haplotype diversity is a measure of the uniqueness of a particular haplotype in a given population. It is defined as the probability that two randomly chosen haplotypes are different in the population. This parameter is equivalent to gene diversity in haploid markers. It was calculated using Arlequin v.3.5 (Excoffier and Lischer, 2010).

Haplotype diversity is computed following Nei and Tajima (1981):

$$HD = \frac{N}{N - 1} \left( 1 - \sum_{i=1}^{n} x_i^2 \right)$$

Here, x_i is the (relative) frequency of the i-th haplotype in the sample, and N is the sample size.

This was calculated for male samples in X-chromosome markers. It was also calculated in the X-STRs for the four different linkage groups (LGs) included in the Investigator Argus X-12 kit, and for X-Indels for the markers that were shown to be linked in the linkage disequilibrium analysis. It was also performed for all the samples studied for Y-chromosome STRs, and mtDNA.
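The haplotype diversity formula can be sketched directly from a list of observed haplotypes. This is an illustration under stated assumptions, not the Arlequin implementation: haplotypes are represented as hypothetical string labels, and the function name is invented for the example.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """HD = N / (N - 1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample of four haplotypes, one of them observed twice.
sample = ["H1", "H1", "H2", "H3"]
print(haplotype_diversity(sample))
```

The N / (N - 1) factor corrects for sample size, so a sample in which every haplotype is unique gives a diversity of exactly 1.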

3.5.1.3.3. Mitochondrial DNA diversity

In mtDNA analyses additional parameters were calculated: a) K (number of different haplotypes); b) S (number of polymorphic sites); c) π (nucleotide diversity), which is the probability that two randomly chosen homologous (nucleotide or RFLP) sites are different, which is equivalent to gene diversity at nucleotide level for DNA data; and d) Theta (θ).

Theta is a fundamental parameter of molecular evolution that encapsulates the expected level of genetic diversity in a randomly mating, constant-sized population not subject to selection when an equilibrium is reached between genetic drift and mutation. It is defined as:

$$\theta = 2 n N_e \mu,$$

where n is the number of heritable copies of the locus per individual (0.5 in the case of mtDNA), Ne is the effective population size, and µ is the mutation rate per nucleotide (or per sequence) and per generation (Nei, 1987; Tajima, 1993).

There are different ways to estimate θ from sequence data; depending on the parameter used the estimator is called θs (using the number of polymorphic sites), θK (using the number of different haplotypes), etc. The Theta estimator based on the number of different lineages (θK) (Tamura and Nei, 1993), which is based on the relationship between sample size and the number of distinct lineages, is more sensitive to the effects of lineage sorting during recent demographic history.

3.5.1.4. Neutrality tests

There are many methods to detect selection, which typically calculate a statistic that compares a feature of the observed diversity to that expected under neutral evolution.

In this work, Tajima's D test of selective neutrality (Tajima, 1989; 1993), which compares the number of segregating sites per site with nucleotide diversity, was applied in the mtDNA analysis. The test compares two estimators of the population parameter θ. Under the infinite-site model, both estimators should estimate the same quantity, but differences can arise under selection, population non-stationarity, or heterogeneity of mutation rates among sites. The test statistic D is defined as:

$$D = \frac{\hat{\theta}_{\pi} - \hat{\theta}_{s}}{\sqrt{\mathrm{Var}(\hat{\theta}_{\pi} - \hat{\theta}_{s})}}$$
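The statistic can be sketched as follows, using the variance constants given by Tajima (1989). The sketch assumes aligned sequences of equal length without gaps, at least four sequences, and at least one segregating site. It works with per-sequence totals rather than per-site values, which leaves D unchanged. The function name and the toy alignment are hypothetical.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Return (mean pairwise differences, Watterson theta, Tajima's D)."""
    n = len(sequences)
    length = len(sequences[0])
    # S: number of alignment columns showing more than one state.
    s = sum(1 for i in range(length) if len({seq[i] for seq in sequences}) > 1)
    # Mean number of pairwise differences (estimator theta_pi).
    pairs = list(combinations(sequences, 2))
    k = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_s = s / a1  # estimator based on segregating sites
    d = (k - theta_s) / sqrt(e1 * s + e2 * s * (s - 1))
    return k, theta_s, d


# Toy alignment in which every variant is a singleton.
toy = ["AAAA", "AAAT", "AATA", "ATAA"]
print(tajimas_d(toy))
```

In this toy alignment all variants are singletons, so the mean pairwise difference falls below the segregating-site estimate and D is negative.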

3.5.1.5. Linkage disequilibrium (LD)

Linkage disequilibrium is an estimate of recombination at a population level. LD measures whether specific alleles at different loci are correlated with one another more or less often than would be expected by chance (Jobling et al., 2014). LD is influenced by many factors, including evolutionary forces and population characteristics such as mating system and population substructure. Thus, the pattern of LD is a powerful tool to understand past evolutionary and demographic events in human history or in the history of a particular population.

An exact test of LD was performed on haplotypic data (Y-chromosome markers, and X-chromosome markers in males) using Arlequin v.3.5 software (Excoffier and Lischer, 2010), with 1,000,000 Markov chain steps and 1,000 dememorization steps.
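The idea of alleles at two loci co-occurring more or less often than expected by chance can be illustrated with the classical coefficient D and the squared correlation r². This sketch is not the exact test run in Arlequin. It assumes phased two-locus haplotypes, such as those observed directly in males for X-chromosome markers, and all names and data are hypothetical.

```python
def pairwise_ld(haplotypes, allele_a, allele_b):
    """Return (D, r2) for allele_a at locus 1 and allele_b at locus 2.

    haplotypes is a list of (locus1_allele, locus2_allele) pairs.
    D is the observed haplotype frequency minus the product of the
    two allele frequencies expected under independence.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == allele_a for h in haplotypes) / n
    p_b = sum(h[1] == allele_b for h in haplotypes) / n
    p_ab = sum((h[0], h[1]) == (allele_a, allele_b) for h in haplotypes) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Hypothetical sample in which the two loci are completely associated.
linked = [("A", "B")] * 5 + [("a", "b")] * 5
print(pairwise_ld(linked, "A", "B"))
```

A value of D near zero indicates that the alleles combine as expected by chance, while r² close to 1 indicates near-complete association.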

3.5.1.6. Forensic parameters

To test how efficient the recombinant markers used in this work (STR, Alu and Indel) are for forensic purposes, a series of parameters was calculated.

Polymorphism information content (PIC) (Botstein et al., 1980) and expected heterozygosity (formula equivalent to gene diversity) are devised for more general purposes and are valid for both autosomal and X-chromosome markers.

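$$PIC = 1 - \left( \sum_{i=1}^{n} p_i^2 \right) - \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} 2 p_i^2 p_j^2$$

The mean exclusion chance MECKRÜ is not suitable for X-chromosome markers except for deficiency cases in which the paternal grandmother is investigated instead of the alleged father.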

$$MEC_{KR\ddot{U}} = \sum_{i} f_i^3 (1 - f_i)^2 + \sum_{i} f_i (1 - f_i)^3 + \sum_{i<j} f_i f_j (f_i + f_j)(1 - f_i - f_j)^2$$

Kishida et al. (1997) devised a mean exclusion chance (MECKIS) for X-chromosome markers that covers trios including a daughter. When MECKRÜ is compared with MECKIS, the latter is considerably larger. This highlights the fact that in trios involving a daughter, X-chromosome markers are more efficient than autosomal markers.

$$MEC_{KIS} = \sum_{i} f_i^3 (1 - f_i) + \sum_{i} f_i (1 - f_i)^2 + \sum_{i<j} f_i f_j (f_i + f_j)(1 - f_i - f_j)$$

Finally, Desmarais et al. (1998) introduced formulae for the mean exclusion chance of X-chromosome markers in trios involving daughters (MECDtrio) and in father/daughter duos lacking maternal genotype information (MECDduo). MECDtrio is equivalent to MECKIS, whilst MECDduo is also appropriate for maternity testing of mother/son duos.

$$MEC_{Dtrio} = 1 - \sum_{i} f_i^2 + \sum_{i} f_i^4 - \left( \sum_{i} f_i^2 \right)^2$$

$$MEC_{Dduo} = 1 - 2 \sum_{i} f_i^2 + \sum_{i} f_i^3$$

Power of discrimination in females (PDfemale) and power of discrimination in males (PDmale) are parameters suitable to assess the power of markers for forensic identification purposes in females and males, respectively.

$$PD_{female} = 1 - 2 \left( \sum_{i} f_i^2 \right)^2 + \sum_{i} f_i^4$$

$$PD_{male} = 1 - \sum_{i} f_i^2$$

Here, f_i (f_j) are the population frequencies of the i-th (j-th) marker alleles.
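The mean exclusion chance and power of discrimination formulae above can be evaluated together from a vector of allele frequencies. The following is a minimal sketch under the assumption that the frequencies of one marker sum to 1. The function name, dictionary keys and example frequencies are hypothetical.

```python
from itertools import combinations


def x_forensic_parameters(freqs):
    """Evaluate the MEC and PD formulae for one marker's allele frequencies."""
    s2 = sum(f ** 2 for f in freqs)
    s3 = sum(f ** 3 for f in freqs)
    s4 = sum(f ** 4 for f in freqs)
    pairs = list(combinations(freqs, 2))  # all allele pairs with i < j
    mec_kru = (
        sum(f ** 3 * (1 - f) ** 2 + f * (1 - f) ** 3 for f in freqs)
        + sum(fi * fj * (fi + fj) * (1 - fi - fj) ** 2 for fi, fj in pairs)
    )
    mec_kis = (
        sum(f ** 3 * (1 - f) + f * (1 - f) ** 2 for f in freqs)
        + sum(fi * fj * (fi + fj) * (1 - fi - fj) for fi, fj in pairs)
    )
    return {
        "MEC_KRU": mec_kru,
        "MEC_KIS": mec_kis,
        "MEC_Dtrio": 1 - s2 + s4 - s2 ** 2,
        "MEC_Dduo": 1 - 2 * s2 + s3,
        "PD_female": 1 - 2 * s2 ** 2 + s4,
        "PD_male": 1 - s2,
    }


# Hypothetical three-allele marker.
for name, value in x_forensic_parameters([0.2, 0.3, 0.5]).items():
    print(f"{name}: {value:.4f}")
```

For any frequency vector that sums to 1, the MEC_KIS and MEC_Dtrio entries coincide, which is a quick numerical check of the equivalence stated in the text, and MEC_KRU is the smaller of the trio values.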

For autosomal Indels, PowerStats formulae (Brenner and Morris, 1989; Jones, 1972) were used to calculate forensic parameters. For the match probability (MP) and power of discrimination, the formulae are the following:

$$\text{Match probability} = \sum_{i=1}^{n} G_i^2$$

$$\text{Power of discrimination} = 1 - \sum_{i=1}^{n} G_i^2$$

where G_i is the fraction of samples with genotype "i" and the sum runs over the n observed genotypes.

To determine the power of exclusion (PE) and typical paternity index (TPI or PI), the following formulae were used:

$$\text{Power of exclusion} = h^2 (1 - 2 h H^2)$$

$$\text{Typical paternity index} = \frac{H + h}{2H}$$

where h is the proportion of heterozygotes and H is the proportion of homozygotes. These formulae are valid for both autosomal and X-chromosome markers.
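A short sketch of how these four parameters could be obtained from a list of observed genotypes. It is not the PowerStats spreadsheet itself. It assumes that h and H are taken as proportions of heterozygous and homozygous samples, which the power of exclusion formula requires. Genotypes are represented as hypothetical pairs of allele labels.

```python
from collections import Counter


def powerstats(genotypes):
    """Return MP, PD, PE and TPI from a list of (allele1, allele2) genotypes."""
    n = len(genotypes)
    # Sort alleles so that ("I", "D") and ("D", "I") count as one genotype.
    counts = Counter(tuple(sorted(g)) for g in genotypes)
    mp = sum((c / n) ** 2 for c in counts.values())
    hom = sum(c for g, c in counts.items() if g[0] == g[1]) / n  # H
    het = 1.0 - hom  # h
    pe = het ** 2 * (1 - 2 * het * hom ** 2)
    tpi = (hom + het) / (2 * hom) if hom > 0 else float("inf")
    return {"MP": mp, "PD": 1 - mp, "PE": pe, "TPI": tpi}


# Hypothetical Indel genotypes (I = insertion, D = deletion).
sample = [("I", "I"), ("I", "D"), ("D", "I"), ("D", "D")]
print(powerstats(sample))
```

Because H + h = 1 when both are proportions, the typical paternity index reduces to 1 / (2H), so it grows as homozygotes become rarer.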

To test forensic efficiency in Y-STRs, discrimination capacity (DC) was calculated as the percentage of different haplotypes, and haplotype match probability (HMP) as 1 − haplotype diversity.
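Both Y-STR parameters follow directly from a list of haplotypes. The sketch below assumes that DC is the number of distinct haplotypes divided by the number of samples, expressed as a percentage, and that haplotype diversity is the sample-size-corrected Nei and Tajima (1981) form. Function and haplotype names are hypothetical.

```python
from collections import Counter


def y_str_efficiency(haplotypes):
    """Return (DC in percent, HMP) for a list of Y-STR haplotypes."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    counts = Counter(haplotypes)
    dc = 100.0 * len(counts) / n  # assumed: distinct haplotypes / samples
    hd = n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts.values()))
    return dc, 1.0 - hd  # HMP = 1 - haplotype diversity


# Hypothetical sample: three distinct haplotypes among four men.
dc, hmp = y_str_efficiency(["Y1", "Y1", "Y2", "Y3"])
print(dc, round(hmp, 4))
```

A sample in which every man carries a different haplotype gives DC = 100% and HMP = 0.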

3.5.2. Genetic structure and Inter-population variability