4.1.3 Power-law distributions are fitted to low or high frequencies of human TCR

[Figure 4.2 plot: boxplots of the estimated power-law α̂ for mouse BCR data. Axes: cell type (pre-B cell, naive B cell, bone marrow plasma cell, spleen plasma cell) and power-law α̂ estimate; grouped by isotype (IgG, IgM). Median values are shown above each boxplot: 4.1, 3.4, 1.6, 4.1, 1.5, 1.9.]

Figure 4.2: The estimated power-law exponents vary across cell types and isotypes.

For bone marrow plasma cells there is a clear separation between the estimated exponents for IgG and IgM. For spleen plasma cells the two groups largely overlap.

[Plot (a): naive B cell repertoire (donor 1, sample a); dataset D1-Na, DeWitt et al. (2016). CDF against read frequency on double-logarithmic axes, with fitted exponential, Poisson, log-normal and power-law distributions.]

(a) This repertoire does not show a straight line on the double-logarithmic plot. None of the distributions fits the complete repertoire well, although the log-normal distribution fits a small part of the largest values in the repertoire.

[Plot (b): naive B cell repertoire (donor 1, sample b); dataset D1-Nb, DeWitt et al. (2016). CDF against read frequency on double-logarithmic axes, with fitted exponential, Poisson, log-normal and power-law distributions.]

(b) This biological replicate seems to show the same ‘bumps’ as the repertoire in (a), but the estimated power-law parameter and xmin are very different. The fitted log-normal distribution is more stable.

[Plot (c): naive B cell repertoire (donor 2); dataset D2-N, DeWitt et al. (2016). CDF against read frequency on double-logarithmic axes, with fitted exponential, Poisson, log-normal and power-law distributions.]

(c) Due to the bump in the double-logarithmic plot of this repertoire, none of the fitted distributions seem to explain the data well.

[Plot (d): naive B cell repertoire (donor 3); dataset D3-N, DeWitt et al. (2016). CDF against read frequency on double-logarithmic axes, with fitted exponential, Poisson, log-normal and power-law distributions.]

(d) Of all human BCR repertoires, this repertoire seems most ‘straight’ on the double-logarithmic plot. The highest values in this repertoire may be explained through a power-law distribution.

Figure 4.3: The human naive B cell repertoires do not follow a straight line on the double-logarithmic plot. Log-normal distributions seem to be the best candidates to explain certain parts of the frequency distributions, although some of the ‘bumps’ in the repertoires cannot be explained by any of the fitted distributions. The α̂ estimates are given in Table 4.1.
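The α̂ and xmin values referred to in these captions result from fitting a power law to the upper tail of each repertoire. The fitting procedure itself is not described in this section. As a minimal sketch, assuming the common approach of estimating α̂ by maximum likelihood and choosing xmin by minimising the Kolmogorov-Smirnov distance, and assuming the continuous approximation rather than a discrete fit to read counts, the estimation could look as follows. The function names and the `min_tail` parameter are illustrative and are not taken from the analysis code.

```python
import math
import random


def tail_alpha(tail, xmin):
    """Continuous power-law MLE: alpha = 1 + n / sum(ln(x / xmin))."""
    log_sum = sum(math.log(v / xmin) for v in tail)
    return 1.0 + len(tail) / log_sum


def ks_distance(tail, alpha, xmin):
    """Largest gap between the empirical and fitted CDF of a sorted tail."""
    n = len(tail)
    worst = 0.0
    for i, v in enumerate(tail):
        model = 1.0 - (v / xmin) ** (1.0 - alpha)
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst


def fit_power_law(values, min_tail=50):
    """Return (alpha, xmin, ks) for the xmin with the smallest KS distance.

    Assumes strictly positive values. `min_tail` (an assumed safeguard)
    stops the scan once too few observations remain above xmin.
    """
    ordered = sorted(values)
    best = None
    for start, xmin in enumerate(ordered):
        if start > 0 and xmin == ordered[start - 1]:
            continue  # this candidate xmin was already tried
        tail = ordered[start:]
        if len(tail) < min_tail or tail[-1] == xmin:
            break  # tail too short, or no spread left to estimate alpha
        alpha = tail_alpha(tail, xmin)
        ks = ks_distance(tail, alpha, xmin)
        if best is None or ks < best[2]:
            best = (alpha, xmin, ks)
    return best


def sample_power_law(n, alpha, xmin, rng):
    """Draw n synthetic values from a continuous power law (inverse transform)."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]
```

Because xmin is chosen by scanning candidate cut-offs, a small change in the shape of the distribution (such as the ‘bumps’ in panels (a) and (b)) can move the selected xmin, and with it α̂, considerably. Calling `fit_power_law(sample_power_law(1500, 2.5, 1.0, random.Random(0)))` on synthetic data gives a quick check that the sketch recovers an exponent close to the one used for simulation.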

[Plot (a): memory B cell repertoire (donor 1); dataset D1-M, DeWitt et al. (2016). CDF against read frequency on double-logarithmic axes, with fitted exponential, Poisson, log-normal and power-law distributions.]

(a) This frequency distribution follows a wavy line on the double-logarithmic plot. As a result, none of the fitted distributions explain the read frequency distribution of this repertoire.

[Plot (b): memory B cell repertoire (donor 2); dataset D2-M, DeWitt et al. (2016). CDF against read frequency on double-logarithmic axes, with fitted exponential, Poisson, log-normal and power-law distributions.]

(b) As with the repertoire described in (a), the four distributions do not explain the frequency distribution of this repertoire. The ‘upward bend’ in this repertoire seems similar to the shape described for pre-B cells in the mouse BCR dataset.

[Plot (c): memory B cell repertoire (donor 3); dataset D3-M, DeWitt et al. (2016). CDF against read frequency on double-logarithmic axes, with fitted exponential, Poisson, log-normal and power-law distributions.]

(c) Of the three human memory B cell receptor repertoires, this repertoire appears most straight on the double-logarithmic plot. Therefore, the power-law distribution fits this repertoire best. However, some of the highest-frequency reads do diverge from the fitted power-law distribution.

Figure 4.4: The memory B cell repertoires are slightly less ‘bumpy’ on the double-logarithmic plots than the naive B cell repertoires shown in Figure 4.3. Still, none of the fitted distributions accurately describes the memory B cell receptor repertoires. Both log-normal and power-law distributions describe small parts of the clonal frequency distributions, but neither yields a satisfying fit. The α̂ estimates are given in Table 4.1.
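To make the comparison with the log-normal candidate concrete, the sketch below fits a log-normal distribution by maximum likelihood and measures its Kolmogorov-Smirnov distance to the data. It is a simplification under stated assumptions: all values are fitted with no lower cut-off, whereas the fits shown in the figures appear to use an estimated xmin, and the function names are illustrative only.

```python
import math
import random


def fit_lognormal(values):
    """Log-normal MLE: mean and standard deviation of log(values)."""
    logs = [math.log(v) for v in values]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs))
    return mu, sigma


def lognormal_cdf(x, mu, sigma):
    """Log-normal CDF, written with the error function."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def ks_lognormal(values, mu, sigma):
    """Largest gap between the empirical CDF and the fitted log-normal CDF."""
    ordered = sorted(values)
    n = len(ordered)
    worst = 0.0
    for i, v in enumerate(ordered):
        model = lognormal_cdf(v, mu, sigma)
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst
```

A small distance indicates that the log-normal describes the whole distribution; a repertoire where only part of the curve is matched, as in the panels above, would yield a larger distance even when the fit looks good locally.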

[Plot (a): T cell repertoire, good power-law fit (human); dataset HIP09041, Emerson et al. (2017). CDF against read frequency on double-logarithmic axes, with fitted exponential, Poisson, log-normal and power-law distributions.]

(a) The power-law distribution seems to fit the data in this repertoire. The log-normal distribution may also be a good candidate, but the estimated xmin is far lower for the power-law fit.

[Plot (b): T cell repertoire, poor power-law fit (human); dataset HIP11711, Emerson et al. (2017). CDF against read frequency on double-logarithmic axes, with fitted exponential, Poisson, log-normal and power-law distributions.]

(b) In this case, the power-law distribution fits the data poorly. Neither the low nor the high values are captured by the fitted power-law function. The log-normal distribution may describe some of the lower values in the data, but does not fit the higher values.

Figure 4.5: Some of the T cell repertoire frequency distributions show a straight line on the double-logarithmic plot, but not all. This figure shows two repertoires: one where the power-law distribution fits well (a) and one where it fits poorly (b).
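The ‘straight line’ criterion used in these captions can be illustrated numerically. The sketch below assumes that the plotted CDF is the complementary form P(X ≥ x), as is conventional for power-law plots, and fits a least-squares line to the points on double-logarithmic axes. The R² value is only a rough visual aid, not the fitting method used for α̂, and the function names are illustrative.

```python
import math
import random


def empirical_ccdf(values):
    """Return the unique sorted values and the fraction of data >= each value."""
    ordered = sorted(values)
    n = len(ordered)
    xs, ps = [], []
    for i, v in enumerate(ordered):
        if not xs or v != xs[-1]:
            xs.append(v)
            ps.append((n - i) / n)  # i is the first index at which v occurs
    return xs, ps


def loglog_line(values):
    """Least-squares line through (log10 x, log10 P(X >= x)).

    Returns (slope, r_squared). Assumes strictly positive values.
    """
    xs, ps = empirical_ccdf(values)
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(p) for p in ps]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    return sxy / sxx, (sxy * sxy) / (sxx * syy)
```

For data that follow a power law with exponent α, the slope of this line is approximately −(α − 1) and R² is close to 1. Repertoires with a bend or bump, as in panel (b), give a visibly lower R².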

[Plot (a): T cell repertoire, low-value power-law fit (human); dataset HIP01393, Emerson et al. (2017). CDF against read frequency on double-logarithmic axes, with fitted exponential, Poisson, log-normal and power-law distributions.]

(a) The fitted power-law distribution captures the low values in this dataset, but fails to describe the values above 10.

[Plot (b): T cell repertoire, high-value power-law fit (human); dataset HIP17793, Emerson et al. (2017). CDF against read frequency on double-logarithmic axes, with fitted exponential, Poisson, log-normal and power-law distributions.]

(b) In this case the fitted power-law function explains only the higher values of the dataset, not the lower values.

Figure 4.6: When TCR repertoires have an ‘upward bend’ on the double-logarithmic plot, the power-law distribution is fitted to either the low or the high frequencies in the clonal frequency distribution. In 440 of the cases, the power-law function is fitted to the lower values in the repertoire. In the remaining 169 cases it is fitted to the higher values.

[Plot (a): boxplots of the estimated power-law α̂ for human TCR data, by gender. Median values shown above the boxplots: Female 2.9, Male 2.9, Unknown 3.]

(a) There is no substantial difference between the α̂ for males and females.

[Plot (b): boxplots of the estimated power-law α̂ for human TCR data, by CMV seropositivity status. Median values shown above the boxplots: Positive 2.9, Negative 2.8, Unknown 3.]

(b) The CMV status of the individuals likewise does not appear to affect α̂.

[Plot (c): boxplots of the estimated power-law α̂ for human TCR data, by ethnicity. Groups in axis order: American Indian or Alaska Native; Asian; Black or African American; Native Hawaiian or other Pacific Islander; Unknown; Unknown, Hispanic or Latino; White. Median values shown above the boxplots, in the same order: 3.1, 2.8, 2.9, 2.3, 2.9, 2.8, 2.9.]

(c) There are slight differences between the α̂ estimates in ethnic groups, but these are most likely due to small sample sizes. In the groups with a larger representation (White, Unknown, and Hispanic or Latino), there are some high outliers.

Figure 4.7: There is little difference in the α̂ estimate of TCR repertoires across gender, CMV seropositivity status and ethnicity. The median α̂ estimate is slightly below 3 for most groups, although there are outliers with far higher α̂ estimates.