4.1 Fitting power-law distributions to experimental data

To test whether the clonal frequency distributions in HTS immune repertoire data follow a power law, the data from three previously published studies [15, 24, 25] were analyzed. As described in Section 1.4, alternative distributions must be considered when fitting a power-law distribution, because other distributions may exhibit similar behavior. Therefore, log-normal, exponential and Poisson distributions were fitted to the datasets in addition to power-law distributions. For each distribution, an optimal xmin threshold was estimated first, so that the distribution is fitted only to the frequency values above xmin. The remaining distribution parameters were estimated subsequently.
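The two-step procedure (first xmin, then the remaining parameters) can be illustrated for the power-law case. The following is a minimal sketch, not the implementation used for the analyses in this chapter. It assumes a continuous power law, the maximum-likelihood estimate of α on the tail, and selection of xmin by minimizing the Kolmogorov-Smirnov distance between the tail data and the fitted model. The function name `fit_power_law` and the `min_tail` parameter are hypothetical, and the alternative distributions are not covered.

```python
import numpy as np

def fit_power_law(x, min_tail=10):
    """Scan candidate xmin values; return (xmin, alpha, ks) for the best one.

    Assumes a continuous power law on x >= xmin. Returns None if no
    candidate leaves at least `min_tail` observations in the tail.
    """
    x = np.sort(np.asarray(x, dtype=float))
    best = None
    for xmin in np.unique(x):
        tail = x[x >= xmin]
        n = tail.size
        if n < min_tail:
            break  # candidates increase, so all later tails are smaller still
        log_sum = np.sum(np.log(tail / xmin))
        if log_sum <= 0.0:
            continue  # degenerate tail: every value equals xmin
        # Maximum-likelihood estimate of the exponent for a continuous power law
        alpha = 1.0 + n / log_sum
        # Fitted CDF evaluated at the sorted tail values
        model_cdf = 1.0 - (tail / xmin) ** (1.0 - alpha)
        # KS distance: compare the model with both sides of each empirical step
        upper = np.arange(1, n + 1) / n
        lower = np.arange(0, n) / n
        ks = max(np.max(np.abs(upper - model_cdf)),
                 np.max(np.abs(lower - model_cdf)))
        if best is None or ks < best[2]:
            best = (xmin, alpha, ks)
    return best

if __name__ == "__main__":
    # Synthetic example: inverse-transform samples from a power law with alpha = 2.5
    rng = np.random.default_rng(0)
    sample = (1.0 - rng.random(2000)) ** (-1.0 / 1.5)
    print(fit_power_law(sample))
```

Because only values at or above the selected xmin enter the likelihood, two datasets with similar overall shape can still end up with different fitted ranges if their best xmin differs.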

4.1.1 Pre- and naive mouse BCRs may obey a power law, but not plasma BCRs

The clonal frequency distributions for plasma cells do not follow a straight line on the double-logarithmic plot (Figure 4.1), which rules out a power-law distribution for these cell types. The other distributions did not yield a satisfactory fit either. Although pre-B cells produced a straighter line than plasma cells, the plots often showed an ‘upward bend’ that none of the fitted distributions could explain. It is therefore unlikely that the pre-B cell datasets are truly power-law distributed, but power-law distributions may fit the data well enough for a meaningful comparison of the α̂ parameters. Most of the naive B cell repertoires showed a straight line on the double-logarithmic plot. A power-law distribution therefore cannot be ruled out, although additional tests are needed to confirm it. Nevertheless, as with pre-B cells, the power-law distribution describes the naive B cell repertoires well enough for the α̂ estimate to be meaningful. The estimated α̂ exponents differ across cell types and isotypes, but are mostly consistent within groups (Figure 4.2).
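The straight-line criterion follows directly from the form of the distribution. A short derivation, assuming the continuous power-law form with lower bound x_min and assuming that the plotted quantity is the complementary CDF P(X ≥ x); the notation here is illustrative and may differ from that of Section 1.4:

```latex
\begin{align}
p(x) &= \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha},
        \qquad x \ge x_{\min},\ \alpha > 1 \\
P(X \ge x) &= \int_{x}^{\infty} p(t)\, dt
            = \left( \frac{x}{x_{\min}} \right)^{-(\alpha - 1)} \\
\log P(X \ge x) &= -(\alpha - 1) \log x + (\alpha - 1) \log x_{\min}
\end{align}
```

Under these assumptions the curve is linear in log x with slope −(α − 1), so a bend or bump on the double-logarithmic plot indicates a departure from a pure power law over that range.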

4.1.2 Naive and memory human BCR repertoires do not obey a power law

Although there were only seven samples of human BCR repertoire data, the samples themselves were very large (see Table 3.1). Power-law, log-normal, exponential and Poisson distributions were fitted to these repertoires, and the results are displayed in double-logarithmic plots in Figures 4.3 (naive B cell repertoires) and 4.4 (memory B cell repertoires). The α̂ estimates are given in Table 4.1. None of these repertoires produces a perfectly straight line on the double-logarithmic plots. Instead, all of the repertoires show a typical ‘curve’ among the lower values of the data, which was also seen in the BCR repertoire data for mice (Section 4.1.1). The repertoires in Figures 4.3a and 4.3b are biological replicates. Although they look very similar, the power-law distribution is fitted to entirely different parts of the repertoire. This shows that the power-law distribution cannot be fitted robustly to these datasets: different biological replicates can yield vastly different α̂ estimates (as shown in Table 4.1).

None of the samples shows a perfectly straight line on the double-logarithmic plots, which rules out the possibility that the data are truly power-law distributed. The naive B cell samples show a ‘bumpier’ line than the memory B cell samples. As a result, the power-law distribution cannot be fitted robustly, even across biological replicates (Figures 4.3a and 4.3b). This makes the α̂ estimate an unreliable descriptor for these datasets.
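One common way to quantify how sensitive xmin and α̂ are to the particular sample is a nonparametric bootstrap: resample the data with replacement, refit, and inspect the spread of the refitted values. The sketch below is illustrative only and was not part of the analysis reported here. It assumes a continuous power law, a coarse grid of xmin candidates, and KS-based selection of xmin. The names `fit_tail` and `bootstrap_fit` and their parameters are hypothetical.

```python
import numpy as np

def fit_tail(x, n_candidates=40, min_tail_fraction=0.1):
    """Return (xmin, alpha) minimizing the KS distance over a grid of xmin candidates."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = xs.size
    # Candidates are observed values that leave roughly min_tail_fraction of the data in the tail
    top = (1.0 - min_tail_fraction) * (n - 1)
    idx = np.unique(np.linspace(0, top, n_candidates).astype(int))
    best = None
    for xmin in np.unique(xs[idx]):
        tail = xs[xs >= xmin]
        log_sum = np.sum(np.log(tail / xmin))
        if log_sum <= 0.0:
            continue  # degenerate tail: every value equals xmin
        alpha = 1.0 + tail.size / log_sum  # continuous power-law MLE
        model = 1.0 - (tail / xmin) ** (1.0 - alpha)
        steps = np.arange(tail.size + 1) / tail.size
        ks = max(np.max(np.abs(steps[1:] - model)),
                 np.max(np.abs(steps[:-1] - model)))
        if best is None or ks < best[2]:
            best = (xmin, alpha, ks)
    if best is None:
        raise ValueError("no usable xmin candidate")
    return best[0], best[1]

def bootstrap_fit(x, n_boot=100, seed=0):
    """Refit on resampled data; return arrays of the refitted xmin and alpha values."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    fits = [fit_tail(rng.choice(x, size=x.size, replace=True))
            for _ in range(n_boot)]
    xmins, alphas = (np.array(v) for v in zip(*fits))
    return xmins, alphas
```

A wide spread in the refitted xmin values, or a standard deviation of the refitted α̂ values that is large relative to the differences between groups, would point to the same kind of instability as observed between the two biological replicates.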

Cell type        Donor and sample      Estimated α̂
Naive B cells    Donor 1 (sample a)    3
                 Donor 1 (sample b)    6.1
                 Donor 2               5.8
                 Donor 3               7
Memory B cells   Donor 1               4.8
                 Donor 2               3.6
                 Donor 3               4.6

Table 4.1: The estimated α̂ values for human BCR repertoires vary between samples. Generally, the naive B cells have a slightly higher α̂ estimate than the memory B cells, but naive B cell sample a from donor 1 is an exception. Figures 4.3 and 4.4 show the corresponding double-logarithmic plots of the frequency distributions.

[Figure 4.1, panels (a) to (d): double-logarithmic plots of the CDF against read frequency for one mouse repertoire per cell type, each with fitted exponential, Poisson, log-normal and power-law distributions. Datasets from Greiff et al. (2017).]

(a) Pre-B cell repertoire (mouse); dataset: ova_8.prebc_igm. Most pre-B cell repertoires follow a fairly straight line on the double-logarithmic plot, although some repertoires show an ‘upward bend’. Both the power-law and log-normal distributions seem to capture part of the data well, but neither explains this ‘upward bend’. The fitted power-law distribution captures the high-frequency clonotypes better, whereas the log-normal distribution has a lower xmin.

(b) Naive B cell repertoire (mouse); dataset: healthy_2.nfbc_igm. Of the four cell types, the naive B cell repertoires showed the straightest lines on the double-logarithmic plots. The power-law distribution fits the data better than any of the other distributions.

(c) Spleen plasma cell repertoire (mouse); dataset: hbsag_4.sppc_igm. Spleen plasma cell repertoires tend to follow a ‘wavy’ line with one or two shallow bumps on the double-logarithmic plot. None of the fitted distributions explains the data well.

(d) Bone marrow plasma cell repertoire (mouse); dataset: healthy_8.bmpc_igm. Bone marrow plasma cell repertoires show a large ‘bump’ on the double-logarithmic plots that descends steeply and is followed by a smaller ‘bump’. The log-normal distribution partly captures the first ‘bump’ but ignores the second. The other distributions are not able to explain the data.

Figure 4.1: Each cell type has its own characteristic shape when its repertoires are plotted on a double-logarithmic plot. The pre-B and naive B cells show a straighter line, whereas the plasma cells show a bumpy line on this type of plot.

[Figure 4.2: boxplots titled ‘Estimated power-law α̂ for mouse BCR data’. x-axis: cell type (pre-B cell, naive B cell, bone marrow plasma cell, spleen plasma cell); y-axis: power-law α̂ estimate; boxes are grouped by isotype (IgG, IgM). Median values are shown above each boxplot: 4.1, 3.4, 1.6, 4.1, 1.5, 1.9.]

Figure 4.2: The estimated power-law exponents vary across cell types and isotypes. For bone marrow plasma cells there is a clear separation between the estimated exponents for IgG and IgM. For spleen plasma cells the two groups mostly overlap.

4.1.3 Power-law distributions are fitted to low or high frequencies of human TCR