Next generation sequencing on soil samples from Etosha National Park: Diversity studies of an anthrax

(1)

Next generation sequencing on soil samples from Etosha National Park: Diversity studies of an anthrax

reservoir.

Karoline Valseth

Master of Science Thesis

Deptartment of Bioscience University of Oslo

Norway

November 1, 2015

(2)

©Karoline Valseth 2015

Next generation sequencing on soil samples from Etosha National Park: Diversity studies of an anthrax reservoir.

Karoline Valseth http://www.duo.uio.no

Trykk: Reprosentralen, Universitetet i Oslo

(3)

Summary

Nutrient availability is important for microbes in soil environments. When an animal dies and decomposes, nutrients are released into the soil. This may cause a shift in the diversity of microbes present, as well as lead to the introduction of new species into the already established microbial community.

In the present study shot-gun metagenomic sequencing was used in order to investigate microbial changes in the soil before and after an influx of nutrients from two anthrax carcasses in Etosha National Park, Namibia. In addition, the study investigated temporal fluctuations in the microbial community after the introduction of nutrients and Bacillus anthracis (B. anthracis), the causative agent of anthrax. The final aspect of the study locked at the phylogeny of the two B. anthracis isolates from the anthrax carcasses and how they compared to published B. anthracis data.

Soil samples were collected at six time-points during the first month after the death of two zebras, within the area of bloodspill from the carcasses. DNA was isolated and sequenced using Illumina MiSeq. The resulting metagenome sequences were cleaned, before taxonomical composition and abundance were determined using the programs metaxa2 and MetaAmp. A clear responsive community was found, were orders like Bacilliales, Fusobacteriales and Clostridiales increased after the zebras were introduced and then decreased during the sample period. The sequences that caused the increase of the Fu- sobacteriales order was from Fusobacteriales equinum, which is known to inhabit the mucosa of horses, indicating a species directly introduced by the zebra. Throughout the sampling period many orders were found to be non-responsive. These orders included the Frankiales, known to degrade organic matter, and Rhizobiales, known to fix nitrogen.

These non-responsive orders likely had a stable level of nutrients available from the soil, and thus were unaffected by the carcass.

From each carcass, a B. anthracis colony was grown, isolated and whole genome sequenced with Illumina MiSeq. The sequences were assembled into contigs using Spades and CLC. Various quality controls indicated that better assembly was achieved with the CLC assembly, and the resulting sequence strains were then sent for annotation to NCBI BioSample. Through phylogenetic analysis, the two CLC strains were found to belong to the A.Br.Aust94 branch of the B. anthracis phylogenetic tree. This is supported by several published studies where the A.Br.Aust94 branch was found to be the most prevalent in Namibia.

The present study found a fluctuating microbial community in response to nutrients released as the anthrax carcasses decomposed. The Bacilliales order were shown to partly dominate the community in three of the sample times. B. anthracis isolated from the anthrax carcasses were shown to belong to the A.Br.Aust94 branch of the B. anthracis phylogeny.

ii

(4)

Acknowledgements

Firstly I would like to thank my supervisors at Forsvarets Forskningsinstitutt (FFI), Jaran Strand Olsen and Mari Espelund, as well as my supervisors at the Centre for Ecological and Evolutionary Synthesis (CEES) at UiO, Nils Christian Stenseth and Thomas Hen- dricus Augustus Haverkamp. With a special thanks to Thomas for staying patient with me whilst teaching me how to do bioinformatics!

I will also thank all the wonderful people that helped me with my fieldwork when I was in Etosha National Park in Namibia. It was a truly rewarding experience, and something I will always be glad I got to do.

Then I would like to thank all the people at FFI who helped me whenever I got stuck or needed a coffee break.

Last, but not the least, I would like to thank all my friends and family, for their continuous support, and laughs and giggles throughout my master project.

Sincerely,

Karoline Valseth

November 1, 2015

(5)

1 Introduction

Soil is a diverse environment, which harbours a varied microbial community. This community is affected, by among others rainfall, temperature and availability of nutrients. In Etosha National Park (ENP) in Namibia, a savannah environment with little water and sparse vegetation, the influx of nutrients from a carcass can have a great influence on the soil’s microbial community. Every year in ENP there are outbreaks of anthrax, caused by Bacillus anthracis (B. anthracis), amongst the animals living there. When animals affected by B. anthracis die and decompose, both nutrients andB. anthracis are released into the soil environment. This may have an effect on the soil microbial community already present in the soil. Metagenomic analysis of the microbiome in the soil can shed some light on such changes.

1.1 Bacillus anthracis

B. anthracis, is a gram-positive, rod shaped bacterium who sporulate to survive outside the host (Sharp and Roberts, 2006). The bacterium is well researched, were many geno- types have been determined (Beyer et al., 2012; Van Ert et al., 2007) and the anthrax disease, mortality rates and survival inside the host are documented (Organization and Epizootics, 2008). The general consensus of the stages of the bacterium’s lifecycle in ENP and similar environments is that it endures as an inactive spore in the top layer of soil until it is eaten or inhaled by a host, upon which the vegetative cycle starts again (Hugh-Jones and Blackburn, 2009; Organization and Epizootics, 2008). Recent findings suggest that there is more to the soil stage of the bacterium’s lifecycle (figure (fig.) 1).

B. anthracis has been found to survive as a vegetative cell inside amoebas living in soil (Dey et al., 2012). In addition some bacteriophages have been found to interact with B.

anthracis enabling the bacterium to survive as a vegetative cell outside a host (Schuch and Fischetti, 2009), whilst other bacteriophages can induce lysis of the bacterium (Ganz et al., 2014a).

1.1.1 Bacillus cereus group

B. anthracis is almost indistinguishable from its close relatives Bacillus cereus sensu stricto (B. cereus s.s.) and Bacillus thuringiensis (B. thuringiensis), where only a few single nucleotide polymorphisms (SNPs) cause the variations seen between the strains (Olsen et al., 2007). The spread, diversity and likely geographic place of origin of B.

anthracis strains can be determined by looking at the evolutionary relationship of the SNP variations between the strains (Simonson et al., 2009; Van Ert et al., 2007). B.

cereus s.s. is a environmental bacterium, who is known to cause food-borne illnesses in humans and B. thuringiensis is a ubiquitous insect pathogen, which is often used as a

1

(8)

Figure 1: The lifecycle of B. anthracis and its possible interactions in soil. From Schuch and Fischetti (2009).

model organism when studying B. anthracis (Swiecicka, 2008). Whilst B. thuringiensis and B. cereus s.s. can be commonly found in a normal soil environment, B. anthracis is mainly restricted to areas where anthrax carcasses have decomposed (Lindeque and Turnbull, 1994).

1.1.2 Plasmids and identification of B.anthracis

Two very recognisable plasmids are found in most B. anthracis strains, distinguishing it from its close relatives. The first plasmid called pXO1, encodes the lethal toxin complex consisting of protective antigen (PA), lethal factor (LF) and oedema factor (EF), which are controlled by the pagA, lef and cya genes (Koehler, 2009; Mikesell et al., 1983) and regulated by the atxA gene (Uchida et al., 1993). The second plasmid, pXO2, harbours the capBCDAE operon, that encodes the poly-γ-D-glutamic acid capsule (Green et al., 1985), which protects the vegetative B. anthracis cells from phagocytosis. In order to develop anthrax disease like symptoms one or both of these plasmids needs to be present in the bacterium (Jensen et al., 2003). While pXO1 and pXO2 have been used to identify B. anthracis, it has been found B. cereus s.s. strains who contain a plasmid, which is 99.6% identical to the B. anthracis pXO1 plasmid (Hoffmaster et al., 2004). Due to this similarity in the plasmids, chromosomal markers such as PL3 (Wielinga et al., 2011), BA5357 (Létant et al., 2011) and BA5345 (Antwerpen et al., 2008) have further been used to distinguish B. anthracis from other B. cereus group members (Ågren et al., 2013). Chromosomal markers are considered to be more accurate as identifiers, since they are generally more stable than plasmid markers. There are also several known SNP markers that are specific to B. anthracis, which can be used for differentiation. One of these markers is the transcriptional mutation in the plcR gene that causes a nonsense mutation resulting in a inactive gene in B. anthracis. The inactive form of this gene

(9)

supports sporulation of B. anthracis (Easterday et al., 2005a).

All of these markers are based on PCR (polymerase chain reaction) detection. For detection purposes this system works well, as it provides rapid and accurate detection of known organisms, with already determined DNA sequences. A disadvantage of PCR is that it only can identify DNA regions for which the primers were designed. This has the potential to introduce errors, especially when using universal primers meant for recog- nition of unknown organisms. Both in the making of the primers and the denaturing, annealing and extension steps in the PCR reaction errors can easily be introduced. This could potentially present a problem when taking into consideration the similarities between the close relatives in the B. cereus group. Another limitation of conventional PCR is that it is not very well suited for looking at diversity in complex samples, such as soil samples.

1.2 Microbial diversity in soil

In environmental soil samples there are an immense variety of organisms present, ranging from bacteria, as probably the most abundant, to archaea and various eukaryotes (Ganz et al., 2012; Hartmann et al., 2014; Nguyen and Bruns, 2015). Although the microbial composition will vary depending on the area, there are a 9 bacterial phylums that seem to be always present and comprise the biggest part of the soil community (Janssen, 2006).

These phyla are Acidobacteria, Actinobacteria, Proteobacteria, Chloroflexi, Verrucomi- crobia,

Bacteriodetes, Planctomycetes, Firmicutes and Gemmatimonadetes. Even though these phyla seem to be present in most soils, the composition of the species present in the soil has the capability to change in response to availability of nutrients (Cobaughet al., 2015;

Hyde et al., 2015). As proved by Howardet al. (2010) who performed a study in which a swine carcass was allowed to decompose on the ground in a controlled environment and microbial changes in the soil were recorded over time. They found that the most prevalent bacteria species at day 0 was proteolytic bacteria, such as variousBacillus species, which was gradually replaced by lipolytic bacteria, such as Acinetobacter species around day 9 and onwards. The microbial changes corresponded well with the availability of nutrients released by the carcass as it decomposed (Howard et al., 2010).

1.2.1 Metagenomics to determine microbial diversity

A suitable approach to understand the composition and the complex relationships between microbial species in the environment, would be to use metagenomic studies. Metagenomic studies investigate all the genomic material in a sample, where the sample could be soil from the environment, waste water from a waterplant or stomach fluids. Newer sequencing platforms such as Illumina HiSeq and MiSeq are able to catch the diversity of such

3

(10)

metagenomic samples by using marker genes, such as 16S rRNA or rpoB, or full shot-gun sequencing of the metagenome (Kozich et al., 2013; Logares et al., 2013). Whilst these studies are computational expensive and creates a enormous amount of data, they could give an invaluable insight into microbial communities, by enabling determination of the diversity and abundance of species in the sample. To understand the complexity of the ecology in the area, it is possible with shot-gun metagenomic datasets, to obtain the metabolic potential within the sequences. There are many programs available that are able to transcribe and then translate the DNA sequences from the metagenome datasets into estimated protein sequences. This will give an indication of the metabolic potential within the environment from which the samples came (Neelakanta and Sultana, 2013; Tan et al., 2015). This was very difficult to do until metagenomic sequencing was a possibility (Fierer et al., 2007, 2012).

1.2.2 Challenges in metagenomics

One of the main challenges with shot-gun metagenomic samples from a soil environment is basically the huge diversity of microbes found in soil. This variety will generate an immense amount of sequences, many of which might be uncompleted sequences or sequences from previously unknown organisms. A issue will be to separate out full sequences and unknown sequences from the uncompleted ones. For groups of organisms that are very closely related, such as the B. cereus group, the difficulty lies in separating them taxo- nomically. The basis for many programs used to analyse metagenomic data and determine taxonomy, is to find clusters of similar sequences based on similar 16S/18S rRNA regions in the dataset, which is called an operational taxonomical unit (OTU). The use of OTUs have a potential to aid the discovery of new species as they cluster based on sequence similarity first, not already described taxonomy (He et al., 2015). The theory behind this clustering technique is the idea that species closely related to each other most likely have similar sequences, hence they should cluster together. Since metagenomic studies of environmental soils are still in an early stage, the potential for OTUs to find new species is important and will lead to a better understanding of microbial communities.

1.3 Anthrax disease

1.3.1 Anthrax in Etosha National Park

Acquisition ofB. anthracis in livestock and wild animals is dependent on the ecology and geography of the area they live in. ENP lies in the centre of northern Namibia and is situated around a 4760 km² saltpan, called the Etosha pan. It was founded as a game reserve in 1907 and received status as national park in 1967. Environmentally the park is an arid dry area with sparse water availability, however in the rainy season lush patches of vegetation and grass steppes appear. The anthrax disease is endemic and considered

(11)

to be a natural part of the ecosystem in the park. Caretakers of ENP only intervene to remove animals dead of anthrax from waterholes, due to the risk of water contamination and mass infection of animals flocking to the waterholes in the dry season.

Since the natural lifecycle ofB. anthracis has been preserved in the park, researchers have been able to study the bacterium here for more than 40 years. Research has previously mainly focused on the ecological aspect and the effect of anthrax on the carnivores, scavengers and herbivores in the park. Interestingly carnivores and scavengers seem to be less affected by anthrax than herbivores, even though they feed on animals that have died of the disease. It is hypothesised that the constant exposure to the bacterium results in a stable level of antibodies against the bacterium in the scavangers and carnivores. Lions in the park have been shown to have antibodies against B. anthracis, whereas lions outside the park in non-anthrax areas did not (Turnbull et al., 1992; Turnbull et al., 1989).

1.3.2 Disease vectors and affected species

Peaks of anthrax outbreaks varies depending on the animal species and the geography of the area. In ENP the animals that most often succumb to anthrax are the grazers, such as plains zebra (Eqqus quagga), blue wildebeest (Connochaetes taurinus) and springbok (Antidorcas marsupialis) (Beyer et al., 2012). A possible explanation may be that when nutrients from a decomposing anthrax carcass seep into the ground beneath it, hence stimulating the growth of grass and other plants in the localised area, which in return attracts the grazers (Havarua et al., 2014). When grazing animals visit anthrax positive carcass sites to feed, there is a potential to acquire the disease via the B. anthracis spores present in soil adhering to grass and roots (Turner et al., 2014). Plains zebra, the species who seems to be most affected by anthrax in ENP (Beyer et al., 2012; Lindeque and Turnbull, 1994), have a peak in death rates at the end of the rainy season. Whilst elephant (Loxodonta africana) have a similar peak at the end of the dry season (Beyer et al., 2012).

In Kruger National Park in South Africa it is mainly browsers, animals that feed on leaves and shrubs like the greater kudu (Tragelaphus strepsiceros), who die of anthrax (Braack and De Vos, 1990; Hugh-Jones and Vos, 2002). In this ecosystem it is thought that blowflies are the vector of the disease (Braack and De Vos, 1990). The blowflies ingest the bacterium and spores by taking a bloodmeal from the anthrax carcass. The fly then lands on nearby shrubs or trees where it vomits and defecate on the leaves thereby contaminating them. Browsers are then infected byB. anthracis when they eat the leaves contaminated by the blowfly. Although there are various routes of infection depending on the area, the common factors for initial spread of the disease are that the bacterium sporulates upon the opening of the carcass by scavengers. The spores can then be spread by wind, water and various vectors (Hugh-Jones and Vos, 2002). The animals themselves also have a potential for spreading B. anthracis around, after ingesting the spores it takes

5

(12)

a couple of days before the disease manifests and the animal dies. The animal would most likely have moved away from the place of ingestion in search for more food, thus B.

anthracis would be spread to a new localised area.

1.3.3 Route of transmission

There are three main routes of transmission of anthrax; by inhalation, ingestion or cutaneous (via the skin) infection. Of these three, inhaled and ingested anthrax are considered the most lethal. According to U.S. Food and Drug Administration, the mortality rates of anthrax varies depending on the exposure to the spores or bacterium. But generally, when left untreated, 20%, 25 -75% and ≥ 80% die of cutaneous, ingestional and inhalational anthrax respectively in humans (http://www.fda.gov/BiologicsBloodVaccines/Vaccines/

ucm061751.htm). The infection manifests differently depending on the route of transmission.

Cutaneous anthrax is often facilitated by contact between open wounds or cuts in the skin and contaminated animals, hides or soil. This type of anthrax can be easily treated and fatalities are rare if discovered early after infection. Injectional anthrax, occurring predominantly with heroin addicts, comes under cutaneous anthrax, but can be more severe since the needle reaches further into the skin and muscle.

Inhalation anthrax has flu-like symptoms, and is deadly within 2-3 days if not treated, however incubation time can vary depending on the number of inhaled spores (Dayet al., 2011). A study on rhesus monkeys (Macaca mulatta) showed that massive haemorrhages occurred in the mesenteric and tracheobronchial lymph nodes and in the lungs in general, as well as haemorrhages and acute inflammation in the spleen and lymph nodes (Fritz et al., 1995).

Ingested anthrax is mostly acquired through eating undercooked meat from animals infected with the bacterium among humans, and for animals through consumption of grass or leaves with adhering spores or anthrax infected carcasses. Symptoms include amongst others; nausea, abdominal pain and swelling of the neck. It leads to lesions, oedema, haemorrhages and tissue necrosis who can appear anywhere in the gastrointestinal tract and nearby tissues (Hugh-Jones and Vos, 2002; Organization and Epizootics, 2008).

1.4 B. anthracis survival

1.4.1 Survival within the host

Upon entering the host by any route of transmission, B. anthracis spores are engulfed by macrophages, which travel to the regional lymph nodes. Inside the lymph nodes B.

anthracis spores germinate and start to replicate (Irenge and Gala, 2012). During growth, B. anthracis produces highly lethal toxins, leading to local cell death. After cell lysis, B. anthracis may enter the blood stream and induce bacteraemia, which will damage

(13)

blood vessels and cause oedema and haemorrhages (Zakowska et al., 2012). The lethal toxin and oedema toxin produced by B. anthracis, act to hinder the immune system by obstructing the cell signalling pathways and interfering with phagocytosis. Phagocytosis is the process where the immune system engulfs a foreign bacterium in order to protect itself from infection. Disabling of the phagocytosis enables the bacterium to survive within the host and continue to release the B. anthracis specific toxins resulting in host death (Moayeri and Leppla, 2004, 2009).

1.4.2 Survival outside the host

VegetativeB. anthracis thrives in a low oxygen environment, such as the human body and is unable to survive for long as a vegetative cell outside the host (Driks, 2009). Survival of vegetative B. anthracis is also affected by access to amino acids. McKevitt et al. (2007) tested the effect of D-alanine on germination of B. anthracis and found that D-alanine inhibited germination in the unfavourable condition where B. anthracis interacts with macrophages, which enhances survival of the bacterium (McKevitt et al., 2007). In order to survive in soil, a high oxygen, low amino acid environment, B. anthracis sporulates.

The sporulation process is triggered by several factors, such as higher oxygen levels, lack of nutrients / amino acids, pH and changes in temperature (Organization and Epizootics, 2008).

1.5 B. anthracis in the microbial diversity

Whether B. anthracis affects the microbiome in any way when an animal dies of anthrax is not well studied. Studies in ENP (Bellan et al., 2013) and in Wood Buffalo National Park in Canada (Dragon et al., 2005) have shown that high densities of B. anthracis spores can be found in soil where blood from an infected animal has spilled into the ground, and that the density is lower in areas that are not affected by said blood-spill.

However, while these studies indicate that high densities of B.anthracis is present, they do not address a possible interaction or effects of the bacterium on the soil microbiome.

One study however, did note that B. anthracis can promote the growth of Enneapogon desvauxii, a preferred grazing grass by plains zebra (Ganzet al., 2014b). Thus, indicating that B. anthracis could play a bigger role in the soil ecology than presumed so far.

1.6 Aim

This study analysed isolated metagenomic DNA from a number of soil samples taken directly from beneath two zebras recently deceased from anthrax. The analysis was performed in order to assess the impact on the soil’s microbiome in the first month after B. anthracis was introduced by the decomposing zebras. Illumina MiSeq was used to

7

(14)

sequence the metagenome samples and for whole genome sequencing of B. anthracis isolated. The isolates were obtained from soil samples contaminated with zebra blood and were used to determine their relationship to the already known lineages of B. anthracis.

This analysis aims to increase our knowledge ofB. anthracis and the temporal microbiome dynamics when nutrients and B. anthracis are introduced into the soil community.

(15)

2 Materials and Methods

2.1 Sample collection

Two fresh zebra carcasses, hereafter called Ca1 and Ca2, were found on 03.03.2014 at coordinates S19.03133/E015.54865 (Ca1) and S19.03715/E015.55367 (Ca2) 835 m apart.

Ca1 was unopened by scavengers, resulting in a pressure build up and consequently a lot of blood had poured out of the nose and mouth. Some intestine of Ca2 had been dragged out from the anus by scavenging vultures, which resulted in lower blood leakage from the nose and mouth area. Hence the jugular vein was cut to allow more blood to seep into the ground. The sample area was defined as a 30 x 30 cm grid around the focal point of the blood-spill. This was around the nose area of Ca1 (fig. 2A) and the jugular vein area within the red square of Ca2 (fig. 2B). Samples were collected into a 50 ml tube from within the top 1 to 1.5 cm layer of soil. Each tube was filled with 50/50% soil and blood-covered soil ratio. The soil was collected randomly from within the whole sample area. This was done to ensure that as much of the microbial community present in the area was obtained. Samples were taken at several times (T) throughout the first month, starting at T = 0 (when the carcasses were found), T = 3 days, T = 7 days, T = 14 days, T = 21 days and T = 30 days. A control sample was also taken from the same grids, but without any blood present at day 0, T = Blank.

(A) Ca1 (B) Ca2

Figure 2: (A) Ca1, soil samples were taken from within the 30 x 30 cm grid, (B) Ca2, samples were taken from within the red square (resembling the 30 x 30 cm metal grid shown in A).

Nothing was done to prevent scavengers from removing or eating part of the carcasses, before or after the samples were taken. Samples were brought right back to the laboratory in Okaukuejo, ENP, Namibia, which was an approximately 50 minutes drive from the sites, and stored at - 20 ℃. Additional 10 cm deep soil core samples were taken at day 3, at 4 different points on a circle with a 10 m radius from the epicentre of each carcass. In order to determine the nutrient composition of the soil in the area, the 4 core samples collected from each carcass site, were mixed and analysed for pH, electrical conductivity

9

(16)

(ECw) (uS/cm), organic matter (OM)(%), phosphorous (P), potassium (K), calcium (Ca), magnesium (Mg) and sodium (Na) (all in parts per million (ppm)). The soil composition of sand, silt and clay composition were analysed at the Agricultural Laboratory in Windhoek, Namibia. Nitrogen (N) contents was determined using the Kjeldahl method at Analytical Laboratory Services in Windhoek, Namibia. Rainfall data was also measured during the 30 day period at Okaukuejo in millimetre (mm).

2.2 Sample preparation

2.2.1 DNA isolation

DNA was isolated using a FastDNA spin kit for soil (MP Biomedicals, Ohio, USA). The manufacturers protocol, found at http://www.mgp.cz/files/kity/FastDNA_Spin_Kit_for

%20soil.pdf, was followed with these adjustments at corresponding steps in the protocol:

• Step 1, between 250 and 300 mg of soil were added.

• Step 4, homogenised with a MiniBead Beater, model 607 (BioSpec Products, Okla- homa, USA) for 80 sec.

• Step 5, centrifuged for 10 min after adding MT buffer.

• Step 10, 650 µl was removed.

• Step 13-14, centrifuged for 3 min.

• Step 16, DES buffer was heated to 55 ℃ in Dri-Bath, type 17600 (Thermolyn, Iowa, USA) before addition. Then tubes where put on a heat block for 5 min at 55 ℃.

For each sample, 14 in total, DNA from 8 x 0.250 gram soil per carcass were isolated and divided into two pools before sterile filtration was undertaken as described below, meaning each sample had a pool 1 and a pool 2 of DNA. Hence, each carcass had two pools of DNA isolates for every time-point. The DNA extracts were sterile filtrated using an Ultrafree Durapore PVDF 0.1 µM spinfilter (Millipore, Darmstadt, Germany) and centrifuged using a Force Microcentrifuge, model FORCE 1418 (Select Bioproducts, New Jersey, USA) at 11.000 revolutions per minute (rpm) for 3 min.

2.2.2 Purification

To keep the DNA intact during transport from ENP to Forsvarets Forskningsinstitutt (FFI) at Kjeller, Norway, an ethanol precipitation was performed. To each sample, 3M CH2COON a (24.6 g CH2COON a in 70 µl dH2O, pH adjusted to 5.2 with 100%

CH3COOH) in the ratio 1/10 of the sample volume and 2.5 X the sample volume of 100% ethanol was added, tubes were mixed by inverting the tube 20 times. The samples

(17)

were then placed on ice for 5 min and centrifuged at 13000 rpm for 40 min. After cen- trifugation, the supernatant was removed from the samples and pellets were air-dried in a Bioflow laminar flow cabinet for 15 min.

After transporting the samples from Okaukuejo to FFI, an equal volume of DES buffer, as was removed during the ethanol precipitation in Okaukuejo, was added to the samples at FFI to ensure no loss of DNA due to changes in volume. For sequencing a certain concentration of DNA was needed, this meant that before purification pool 1 and pool 2 for each carcass and time-point had to be mixed together to ensure enough sample volume to reach the proper concentration for sequencing. Samples were then concentrated and purified using Agencourt AMPure XP beads (Beckman Coulter, Massachusetts, USA), and treated with a PowerClean Pro DNA clean-Up Kit, (MO BIO Laboratories, California, USA) to remove any remaining inhibitors. The Quant-IT PicoGreen dsDNA Assay kit (Invitrogen, Oregon, USA) was used as the manufacturer described and analysed with FluoStarOptima (BMG Labtechnologies, Offenburg, Germany) to determine the quantity of DNA. The DNA quality was determined with the use of Agilent 2100 Bioanalyser, model G2938C and the Agilent DNA 7500 Kit (Agilent Technologies, Waldbronn, Germany) following the protocol described by the manufacturer. A standard NanoDrop analysis with ND-1000 Spectrophometer (Thermo Fisher Scientific, Delaware, USA) was also used to determine the DNA quality.

2.2.3 Spiked soil sample

In order to have a comparison for amount of B. anthracis extracted in the other soil samples. Two 250 mg soil samples from the area around Ca1, but without blood, were spiked with Sterne34F2 B. anthracis equal to 3.4 x 10⁶ spores and DNA extracted as described above.

2.3 Metagenome samples from soil isolates

2.3.1 Sequencing and cleaning of raw reads

DNA metagenome samples were sequenced at the Norwegian Sequencing Centre (NSC) using Illumina MiSeq (Illumina Inc., San Diego, USA). A paired-end sequencing library was generated with a 400 bp insert length and 250 bp reads using a Regular TruSeq adapter ligation kit (Illumina Inc., San Diego, USA). The resulting metagenomic raw reads for each time-point went through a series of quality control steps before the taxonomical analysis. This included the removal of adapters using cutadapt 1.8.1 (Martin, 2011) with parameters set to -q 20, –minimum-length 50 and –max-n 1. Further controls included additional removal of duplicated regions, reverse complement, exact duplicates and low complexity reads using PRINSEQ 0.19.5 (Schmieder and Edwards, 2011) with parameters set to -min_qual_mean = 20, -trim_qual_left = 20, -trim_qual_right = 20, -ns_max_n = 1,

11

(18)

-min_len = 50, -exact_only, -derep 14, -derep_min 2, -lc-method entropy and –lc-threshold 70. Number of reads in each step was visualised in R-studio (Supplementary section 6.2.1 and Table S1 and S2).

2.3.2 Taxonomical determination of reads

For determining the taxonomy in the sequenced metagenome samples, extraction of metagenomic reads matching 16S/18S rRNA sequences was performed using the metaxa2 2.0.1 (Bengtsson-Palmeet al., 2015) program on the small subunit (SSU) with the quality filter set to 20 and using the metaxa2 2.0.1 NCBI BLAST database (metaxa2_db).

2.3.3 Quantitative PCR

A quantitative PCR (qPCR) using taqMAMA (Easterday et al., 2005b), a B. anthracis specific assay that detects the mutation in the plcR gene, was used to quantitate the amount of B. anthracis in each metagenome sample. The qPCR was run as described by Easterday et al. (2005b) on a LightCycler®96 (Roche, Mannheim, Germany) using the LightCycler®Roche 96 software 1.1. To each sample, in triplicate, 9 µl of mastermix were added with the following final concentrations: 1 x PCR MasterMix, 1 mg/ml BSA, 1.2 µM forward primer, 1.2 µM reverse primer, 0.25 µM taqMAMA, 1 x EXO IPC Mix (AB, California, USA) and 1 x EXO IPC DNA (AB, California, USA). The resulting concentrations for each soil sample were then adjusted to estimated number of genomes per gram soil (Table S3 and S4) and visualised using R-Studio (Supplementary section 6.2.2).

2.4 Whole genome sequencing of B. anthracis isolates from zebra carcasses

2.4.1 Isolation of B. anthracis from soil samples

Preparation of DNA from B. anthracis to be whole genome sequenced was obtained by following this protocol: 2.5 g of soil collected from each carcass site at day 3 was transferred into a 50 ml tube and 45 ml N a4O7P2 (0.9 g N a4O7P2 crystals into 900 ml dH2O, autoclaved at 125 °C for 20 min) was added, the tubes were vortexed in a Multitube vortexer, model 94505 (VWR Scientific Products, New Jersey, USA) at full speed for 10 min. The tubes were then centrifuged in a Eppendorf centrifuge, model 5702 (Eppendorf, Hamburg, Germany) for 3 min at 300 relative centrifugal force (rcf) and the supernatant was poured into a new 50 ml tube, the pellet in the old tube was discarded. The supernatant was centrifuged for 15 min at 3000 rcf. The supernatant was then discarded and the pellet re-suspended in 5 ml of dH₂O, in order to create the stock solution. Serial dilutions of the supernatant, ranging from 10⁻² to 10⁻⁶, were

(19)

plated onto polymyxin-lysozyme-EDTA-thallous acetate (PLET) medium (Organization and Epizootics, 2008). The plates were then incubated at 37 °C for 4 days. From each carcass one colony was chosen from the 10⁻³ dilution and re-grown on PLET for 3 days.

DNA was isolated and purified as described above, but instead of adding soil, all the available bacteria on the PLET dish were added to a 1.5 ml eppendorf tube filled with 300 ml phosphate buffered saline (PBS) buffer (1 tablet from Sigma-Aldrich, USA, dissolved in 200 ml dH₂O, autoclaved at 121 °C for 20 min) before isolation.

2.4.2 Sequencing and cleaning of isolates

The bacterial samples grown and isolated on PLET, from Ca1 and Ca2, were sent for whole genome sequencing at the NSC using the TruSeq Nano reagents (Illumina Inc., San Diego, USA) and sequenced with Illumina MiSeq and the following parameters; paired- end, insert size of 500 bp and a read length of 300 bp. Adapters were removed from raw reads by AdapterRemoval 1.5.4 (Lindgreen, 2012) with parameters set to –trimns, –maxns 0, –trimqualities, –minquality 20, –stats, –minalignment 50. The adapter removal was also run with the same settings, but with -maxns set to 1 to improve contig lengths.

The quality of the cleaned files was finally checked with SGA preqc (Simpson, 2014).

2.4.3 Genome assembly

For the sequenced strains to be sent for annotation, a genome assembly was necessary. A SPAdes 3.5.0 (Bankevich et al., 2012) assembly was performed using the singleton, pair1 and pair2 truncated files from the AdapterRemoval 1.5.4, with these settings -careful, -t 16, -k 21,31,41,51,61,71,81,91,101,111,121,127 and -m 62. This resulted in files containing contigs of the strains. Another assembly using CLC was also performed on the same set of files from the AdapterRemoval analysis, obtaining a fasta file with contigs for the sequenced isolates from Ca1 and Ca2, to allow for comparison. The resulting contig files were examined using the Mauve 2.3.1 aligner (Darlinget al., 2004) with standard settings.

In order to visualise the coverage of the contigs, reapr 1.0.17 (Hunt et al., 2013) were run with the options facheck, smaltmap and pipeline, on the contig files from the SPAdes and CLC assembly to generate the bam and sam files needed by IGV 2.3.40 (Robinson et al., 2011) to visualise the contigs. Reapr also evaluated the SPAdes and CLC assemblies. In both Mauve, Reapr and IGV the B. anthracis Vollum strain was used as reference. NCBI nucleotide blast (Johnsonet al., 2008) at http://blast.ncbi.nlm.nih.gov/Blast.cgi was used to determine if the contigs belonged toB. anthracis. This was in order to remove erroneous contigs, and determine the best assembly of the B. anthracis strains from Ca1 and Ca2 to send for annotation at NCBI BioSample. The sequenced B. anthracis strains from Ca1 and Ca2 that were sent for annotation and used further in this study are hereafter called K1 and K2 respectively.

13

(20)

To determine the coverage of our B. anthracis strains in the metagenome samples, the cleaned metagenome files were mapped against The K1 and K2 strains using bwa- index and bwa-mem (not yet published) from BWA 0.7.8 and index, flagstat, depth and milepileup options in SAMtools 1.1 (Li et al., 2009). Results (Table S5) was visualised using RStudio (Supplementary section 6.2.4).

2.4.4 Phylogenetic analysis

In order to compare the K1 and K2 B. nathracis isolates to a larger collection of already known whole genome sequencedB. anthracis strains, the available fasta files of theB. anthracis genomes on the NCBI genome browser were downloaded. There were 112 strains available as of 06.10.2015, 109 of these (Table S12) and the K1 and K2 strains were used in the phylogeny analysis. A concatenation of the fasta files were needed in order to create a pseudochromosome such that each strain was represented by one sequence, this was facilitated by the union command within the emboss 6.5.7 package (Riceet al., 2000). Then the fasta files, 111 in total, were uploaded to PanSeq (https://lfz.corefacility.ca/panseq/

analyses/) (Lainget al., 2010) and the analysis run with default settings with these excep- tions; percent sequence identity cut-off 95 and core genome threshold 111. The resulting snp.phylip file was then uploaded and analysed in SplitsTree 4.13.1 (Huson and Bryant, 2006) where a neighbour-net tree was created with uncorrected P-values and equal angles.

To show the relationship to theB. cereus group, the coding sequence (CDS) regions of K1 and K2 strains together with CDS-regions of two other B. anthracis, three B. cereus s.s. and two B. thuringiensis strains were compared against the B. anthracis Ohio ACB strain using Blast Ring Image Generator (BRIG) 0.95 (Alikhan et al., 2011) with the BLAST options set to -evalue 0.0001 and -max_target_seqs 1.

2.5 Statistical analysis

The taxonomical clustering between the metaxa 2 metagenome samples were visualised with a heatmap created in RStudio using the modules gplots, Heatplus (Ploner, 2015), vegan (Oksanenet al., 2015) and RcolorBrewer (Neuwirth, 2014) (Supplementary section 6.2.5). Bray-Curtis dissimilarity, average linkage and log normalisation were used to visualise species similarity, cluster the data and correlation between the to carcass sites.

The unknown section (present in all the taxonomical levels) was removed from all the datasets but at kingdom level to avoid skewness of the datasets.

To visualise the diversity of the metagenome samples, species richness and rarefaction were determined by processing the metaxa 2 extraction files through the MetaAmp 1.1 (http://ebg.ucalgary.ca/metaamp/) pipeline with these settings: metaamp.pl -map Mapping_Zebra.txt (Table S14) -an ZebraSSU_byZebra -seqtype single -seqformat fasta -minoverlen 50 -maxdiffs 0 -trunclen 250 -maxee 0.5 -oligos oligos.txt -pdiffs 0. Cor-

(21)

responding graphs were generated using the plot function in RStudio (Supplementary section 6.2.6) using the resulting tab-delimited files from MetaAmp (Table S6 and S7).

To further investigate the relationship between the diversity at each carcass site a heatmap was created with the 30 most abundant organisms at each time point at the order level within the metagenome datasets. This was enabled by first log5 scaling the data and then using decostand with the total option in the vegan package to obtain relative abundance. The heatmaps were created by the ggplot2 (Wickham, 2009) package in R-Studio (Supplementary section 6.2.7 and Table S10 and S11).

15

(22)

3 Results

3.1 Sample collection, DNA extraction and purification

There was very little rainfall during the sampling period, except the two days prior to day 21 (fig. S1). Concentration of the DNA extracted from soil at each carcass site varied between 23 and 63.07 ng/µl of DNA prior to purification and up-concentration steps.

The up-concentration increased the concentration of DNA to lie between 26.71 and 88.22 ng/µl (Table 1), and the purification steps successfully increased the A260/280 ratio to over 1.8 as stipulated by NSC in DNA samples sent for sequencing.

Table 1: Visualising DNA per gram soil, start and final DNA concentrations and DNA quality (where the A260/280 ratio refers to the DNA quality) in each metagenome sample.

Sample DNA [Start] [Final]

µg/g soil ng/µl A260/280 ng/µl A260/280

Ca1_Blank 16.4 52.3 1.85 65.6 1.84

Ca2_Blank 20.5 63.1 1.91 88.2 1.91

Ca1_0 7.5 24.1 1.90 26.7 1.85

Ca2_0 17.0 52.2 1.90 73.9 1.91

Ca1_3 11.7 17.4 1.62 41.2 1.84

Ca2_3 12.6 38.7 1.59 73.4 1.90

Ca1_7 7.4 22.6 1.51 38.4 1.92

Ca2_7 7.6 24.9 1.53 79.8 1.91

Ca1_14 10.3 29.4 1.54 43.5 1.94

Ca2_14 17.6 57.3 1.75 87.9 1.92

Ca1_21 8.9 27.3 1.56 57.0 1.89

Ca2_21 7.4 22.9 1.58 32.4 1.89

Ca1_30 9.2 29.9 1.62 36.4 1.85

Ca2_30 12.6 39.3 1.91 48.6 1.89

3.2 Sequence recovery after cleaning of sequenced metagenome samples

The sequencing of the metagenome samples resulted in an average of 7755062.6 R1 reads with a standard deviation (SD) of 971341.2 and 8379956.4 (SD=1233229.1) R2 for Ca1 (fig. 3A) and Ca2 (fig. 3B), respectively. After cleaning the datasets these were reduced to a final number of reads with an average of 5664129.3 (SD = 567519.7) for pair 1 and 5892444.1 (SD = 606504.5) for pair 2. This represents a loss of reads with an average of 28.8% (SD = 8.8%) for pair 1 and an average of 26.4% (SD = 7.2%) for pair 2.

(23)

Blank 0 3 7 14 21 30 Time (days)

Reads (n) 0 4000000 800000012000000

Raw R1 Raw R2 Cut adapt Pair 1 Cut adapt Pair 2 Prinseq Pair 1 Priseq Pair 2

A

Blank 0 3 7 14 21 30

Time (days) Reads (n) 0 4000000 800000012000000

Raw R1 Raw R2 Cut adapt Pair 1 Cut adapt Pair 2 Prinseq Pair 1 Priseq Pair 2

B

Figure 3: Number of reads for Ca1 (A) and Ca2 (B) showing raw reads, reads after cutadapt and after the prinseq analysis. The Blank sample is with no blood present.

17

(24)

Blank 0 3 7 14 21 30 Time (days)

Adjusted genomes per gram soil 0 200000 6000001000000

P1 P2 Spike

A

Blank 0 3 7 14 21 30

Time (days) Adjusted genomes per gram soil 0 200000 6000001000000

P1 P2 Spike

B

Figure 4: qPCR adjusted for genomes per gram soil for Ca1 (A) and Ca2 (B) pool 1 (P1) are marked in blue and pool 2 (P2) are marked in purple, whilst the spiked samples are shown in darkgreen. Each spiked sample is spiked with 3.4 x10⁶ Stern34F2B. anthracis spores into 0.25 g of soil free from blood from carcass site Ca1.

(25)

3.3 Quantitative determination of B. anthracis in metagenome soil samples

A qPCR was performed on DNA from pool 1 (P1) and pool 2 (P2) from the metagenome samples, as well as the spiked sample, in order to determine the isolation efficiency and the amount of B. anthracis in each metagenome sample. In the qPCR with the tacMAMA primers, the spiked samples had a estimated recovery of 347183 and 654419 genomes, representing 10.2% and 19.2% isolation efficiency respectively. Ca1 had a high peak of B. anthracis genomes at day 0, with a estimated recovery of 514656 and 349230 genomes in P1 and P2 respectively (fig. 4A), whilst the same peak occur at day 3 for Ca2 (fig.

4B) with the estimated recovery of 935506 (P1) and 806550 (P2) genomes. The peaks at day 3 in Ca2 are significantly higher than the spiked sample, however only P2 in Ca1 is significantly higher than the spiked sample in the peak at day 0. After this high peak in both carcasses, the number of genomes present from day 7 (from day 3 in Ca1) and onwards are below 20000 genomes. In Ca2 there are no B. anthracis genomes present in P2 on day 21 or P1 and P2 on day 30.

19

(26)

3.4 Taxonomical diversity of soil metagenomes

The taxonomical composition of the metagenome datasets was determined by metaxa2.

Although the number of sequences that could be determined and provide taxonomical information varies from 19661 sequences in Ca2_21 to 110996 in Ca1_7 (Table S8). The relative abundance of the different domains stays the same throughout the time-points (fig. 5). Bacteria are found to have a relative abundance of about 0.7 in all the samples, this drops to 0.17 for mitochondria and 0.11 for chloroplasts. Archaea and eukaryota both have a relative abundance around 0.01. The uncertain reads have a relative abundance lower than 0.0013.

Blank 0 3 7 14 21 30

Time (days) Realtive abundance (reads) 0.00.20.40.60.8

Archaea Ca1 Archaea Ca2 Bacteria Ca1 Bacteria Ca2 Chloroplast Ca1 Chloroplast Ca2 Eukaryota Ca1 Eukaryota Ca2 Mitochondria Ca1 Mitochondria Ca2 Uncertain Ca1 Uncertain Ca2

Figure 5: Relative abundance of 16S/18S sequences from metaxa2 analysis shows the metagenome composition. Archaea are almost undetectable, and the number of uncertain sequences are very low.

The MetaAmp analysis gave us 421 different OTUs at the genus level for all the samples combined. The two Blank samples had the fewest number of OTUs, 179 and 170 for Ca1 and Ca2 respectively, and the highest numbers of OTUs was found in Ca1_3. The rarefaction curve (fig. 6A) shows that the diversity of the microbial community has been captured, since the number of OTUs levels off after about 5400 sequences. This indicates that the diversity in the soil samples is about the same for Ca1 and Ca2 regardless of sample time-points, although the highest number of sequences per OTU varies from about 2000 to 200000 sequences. However, there is a shift in diversity between day 3 and day 7 for both carcasses. The number of OTUs as the sequence number increase is always lower in Ca2 than Ca1 per time-point (fig. S2A). This is confirmed in the rank abundance

(27)

plot (fig. 6B), day 3 and day 7 have the lowest evenness. Ca1 has a higher abundance of OTUs than Ca2. At the remaining time-points, the evenness increases, where in Ca2_21 and Ca2_30, almost all the OTUs have the same abundance.

21

(28)

Sequences (n) OTU (n) 0100200300400500

0 5000 10000 15000 20000 25000 30000

Blank 0 3 7 14 21 30

A

OTU rank (log) Abundance (OTU) 050010001500

1 2 5 10 20 50 100 200 500

Blank 0 3 7 14 21 30

B

Figure 6: Rarefaction curve (A) and rank abundance (B) showing all the different time-points per carcass in different colours, where the solid line is Ca1 and dotted line is Ca2.

(29)

3.4.1 Coverage of K1 and K2 strains in metagenome samples

The whole genome sequenced strains K1 and K2 were compared to the metagenome datasets in order to detect the coverage of each strain in the samples by using the mapping program bwa mem from the BWA-package. The K1 and K2 strains were so similar that the settings for the genome mapping did not detect any difference between them, hence the graphs for the two strains were next to identical and the coverage graph of K1 is shown here (fig. 7). The K1 strain was dominant in all samples harvested from Ca1 and Ca2 except in day 0. The coverage of K1 and K2 strains are higher than the Blank in all but day 0. On day 21 the coverage of K1 in Ca1 and Ca2 are only 0.85 and 1.7 higher than in the Blank sample.

Blank 0 3 7 14 21 30

Time (days) Sequence coverage 05101520

Ca1 Ca2

Figure 7: Coverage ofB. anthracis K1 strain in Ca1 and Ca2 as calculated from bwa mem. The coverage of K2 on Ca1 and Ca2 were close to identical to K1, hence the figure was not included.

23

(30)

3.4.2 Clustering of metagenome samples

The metaxa2 analysis divided the data into nine taxonomical levels, however the accuracy of the lower levels in the taxonomical determination was uncertain. Resulting in a focus on the order level. 289 different orders were detected, spanning the three domains Archaea, Bacteria and Eukaryota as well as mithocondria and chloroplasts, who are the artificial created groups by the NCBI taxonomy process. The 289 orders can be further subdivided into 595 families and 1516 genera. In addition about 50% of the 16S/18S rRNA sequences detected by metaxa2 were categorised as unknown.

Clustering of the taxonomical dataset was performed by first log 10 normalisation of the 50 most abundant orders within the metaxa2 datasets. Bray-Curtis dissimilarity was then calculated, showing how similar the samples were before hierarchical average linkage was performed to get the best match between the samples. This process revealed two main clusters within the 14 samples in the dataset (fig. 8), where both blank samples were closest together and linked with both of the day 0 samples. Also in the same cluster, but clearly distant from the other four samples in the same cluster is Ca2_3. The rest of the samples were placed in the other cluster with no clear correlation between carcass or sample time.

Ca2_30 Ca2_21 Ca1_21 Ca2_14 Ca1_7 Ca1_3 Ca2_7 Ca1_14 Ca1_30 Ca2_3 Ca1_0 Ca2_0 Ca1_Blank Ca2_Blank

Fusobacteriales Pasteurellales Craniata Mitochondria Basal fungi Erysipelotrichales Lactobacillales Cyanobacteria_SubsectionIII Flavobacteriales Enterobacteriales Bacteroidales Unclassified Mitochondria Pezizomycotina Unclassified Bacteria Xanthomonadales Unclassified Gammaproteobacteria Pseudomonadales Myxococcales Burkholderiales Unclassified Alphaproteobacteria Sphingomonadales Rhodospirillales Rhodobacterales Rhizobiales Caulobacterales Planctomycetales Gemmatimonadales Unclassified Firmicutes Clostridiales Unclassified Bacilli Bacillales Unclassified Cyanobacteria Unclassified_Chloroflexi Chloroflexales Sphingobacteriales Cytophagales Unclassified_Actinobacteria Unclassified Thermoleophilia Solirubrobacterales Rubrobacterales Unclassified Actinobacteria Streptomycetales Pseudonocardiales Propionibacteriales Micromonosporales Micrococcales Frankiales Corynebacteriales Unclassified Thaumarchaeota Acidimicrobiales

0 2 4

Value Color Key

Figure 8: Clustering of metagenomic samples based on 16S/18S sequences and clustered in regards to each sample, showing time-points and the 50 most abundant organisms at order level.

(31)

3.5 Microbial community fluctuation

Taking the 30 most abundant organisms at the taxonomical level order for each carcass and time-point, there is a clear responsive community, that increases and decreases over time, with some organisms disappearing completely. There is also a variety of organisms that seem to be unaffected by the influx of nutrients from the carcass (fig. 9). The Bacilliales group are present in both carcasses throughout the sampling period. They also seem to be the group that reaches the highest abundance in the dataset. This occurs at day 3, the abundance stays high until day 14, then gradually decreases to the same level as in the Blank sample at day 21, Ca2 then remains at that level whilst Ca1 has a increase again on day 30. Fusobacteriales are present in the first days in both Ca1 and Ca2, where they are most abundant on day 0 for both samples. They are completely absent on day 14 and 30 in Ca1 and from day 7 onwards in Ca2. The increase in abundance for this order is most likely due to introduction of Fusobacterium equinum. The Clostridiales order are found in low levels throughout the sampling period in both carcasses, but are highly abundant at day 0. Claustridium botulinum BKT015925 is one of the culprits in the increase in this order at day 0.

Ca1 has another order that is present from the Blank sample and until day 7, with a high abundance at day 0, this is the Pasteurellales order. The species that cause the increase on day 0 within this order isPasteurella caballi. The Pasteurellales order are not present within the 30 most abundant organisms of Ca2.

An order that is probably introduced by the decomposing zebra is the Lactobacilliales.

This order is absent in the Ca1_Blank, but present in the rest of the samples. While the Lactobacilliales is abundant at day 0, a gradual reduction in abundance throughout the sampling period is seen for both carcasses. Many of the orders in Ca1 have a drop in abundance on day 21 before increasing again till day 30, this is not seen in Ca2.

For both carcasses orders such as Frankiales, Rhizobiales and Micrococcales are present in relatively stable, high levels throughout the sampling period.

There are few eukaryotic orders among the 30 most abundant organisms, Pezizomy- cotina is present in both carcasses. However, the eukaryotic Craniata order present in the blank sample, and at day 0 and 3 in Ca2, is not represented at all in the 30 most abundant organisms of Ca1. Furthermore Basal fungi are present in low levels throughout the sampling period in Ca1.

25

(32)

Figure 9: These plots show the 30 most abundant organisms at the order level for each time- point on a log scale for both carcasses, Ca1 (A) and Ca2 (B). The dataset have been log 5 normalised and then scaled such that each column have a total of 1.

(33)

3.6 Whole genome sequences from K1 and K2 B. anthracis iso- lates

3.6.1 Genome assemblies

Four assemblies were run with two different assembly methods, SPAdes and CLC, and two sets of input files, allowing 0 and 1 unknown nucleotides (Ns). A reapr analysis of the assemblies found that the number of contigs when allowing 0 Ns were 51 and 64 for K1 and K2 respectively when using SPAdes and 57 (K1) and 68 (K2) when using CLC.

In the assemblies allowing 1 Ns, the number of contigs increased to 60 and 67 for K1 and K2 respectively in the SPAdes assembly, however in the CLC assembly the number of contigs dropped to 37 for both carcasses. The mean sequence length of the contigs when allowing for 0 Ns was found to be 107096.61 and 89714.47 for K1 and K2 respectively in the SPAdes assembly and 95538 (K1) and 80094.93 (K2) in the CLC assemblies. In the assemblies allowing 1 N, the mean sequence length decreased in the SPAdes assemblies to 91204.85 and 81693.75 in K1 and K2 respectively, whilst in the CLC assemblies, the mean sequences length increased to 147527.92 and 147599 in K1 and K2 respectively.

The CLC assemblies where 1Ns was allowed were sent for annotation. The number of contigs were increased in both strains to 38 as the 37 contigs were aligned such that the first contig started with the same nucleotide as otherB. anthracis genomes available in the NCBI genome browser. Thus, the contig matching the start and end of the chromosome were split into two. This resulted in annotatedB. anthracis genomes where the K1 strain has a sequence size of 5456561 bp, consisting of 38 contigs, has a GC content of 35.1%, 5837 genes and 5549 CDS. K2 has a sequence size of 5461141 bp, consisting of 38 contigs, a GC content of 35.1%, 5856 genes and 5549 CDS.

3.6.2 Phylogenetic analysis

The core genome of all theB. anthracis strains at NCBI genome browser were determined and all the SNPs extracted with PanSeq. The extracted SNPs were then used to build a neighbour-net tree (fig. 10) in the SplitsTree program. This generated a neighbour-net tree with 14 main clades, where the K1 and K2 strains are found in the clade together with K1285, 2000031027, Ohio ACB, close to the A.Br.Aust94 branch (Table S12). There is a clear separation of the A branch from the B and C branches. In addition, the A branches separate further into two rough groups, where the A.Br.001/002, A.Br.003/004, A.Br.Ames and A.Br.Aust94 are found in one group and A.BR.005/005, A.Br.008/011, A.Br.011/009, A.Br.Vollum, A.Br.WNA and H9401 are found in the other. The B and C branches are also separated from each other.

A comparison of the CDS-regions of fourB. anthracis strains as well as threeB. cereus.

s.s. and two B. thuringiensis strains were done to visualise the similarities between the

27

(34)

Figure 10: Neighbour-net of 111 B. anthracis strains, 109 strains from NCBI genome browser and K1 and K2 strain marked in light blue, are found within the A.Br.Aust94 branch.

strains (fig. 11). Ohio ACB was used as a reference strain to K1 and K2 since the phylogenetic analysis (fig. 10) indicated it as one of the closest relatives with a whole genome available at NCBI genome browser. The plasmid region of the B. anthracis strains can be located in position 5200 kilo base pairs (kbp) to 6000 kbp. All the B.

anthracis included in this comparison are close to identical, whilst the B. cereus and the B. thuringiensis do not have the plasmid region, part from the B. cereus biovar anthracis Cl strain. There are three other regions that are not present in the B. cereus s.s. and B.

thuringiensis, namely at 800 kbp, 1100 kbp and 3000 kbp.

(35)

Figure 11: Similarity of CDS-regions of K1 and K2 strains together with CDS-regions of three other B. anhtracis, B. cereus s.s. and two B. thuringiensis strains were compared against the B. anthracis Ohio ACB strain. In the legend Ba, Bc and Bt corresponds to B. anhtracis, B.

cereus s.s. and B. thuringiensis respectively. Ba_CA_bison is the Ba Canadian Bison strain, and Bc_bioval_Cl is the Bc biovar anthracis Cl strain. As indicated by the legend, the full colours represent 100% similarity, whilst the grey areas represent 95% similarity to the reference sequence.

29

(36)

4 Discussion

It would be expected that the microbial community adjusts in accordance with nutrient availability (Hartmann et al., 2014). Although the local heterogeneity of bacteria species in soil are variable, the diversity at phylum level is remarkably stable throughout soil types (Janssen, 2006; Yasiret al., 2015). Where desert soils are more comparable to each other than non-desert soils (Fierer et al., 2012).

When analysing metagenome datasets there are a number of aspects that are important to take into consideration, especially when determining the taxonomy of the datasets. In environmental samples it is likely that only a fraction of the obtained 16S/18S rRNA sequences have been described (Logares et al., 2012). This is a common problem in metagenomic analysis and is confirmed in this study when the metaxa2 program were only able to assign a phylogeny to 50% of the detected 16S/18S sequences from the raw reads obtained after sequencing. The lack of identities need to be considered when interpreting results from an analysis. The possibility that undetected microbes can have a drastic effect on the changes in the microbial community are there, and too much weight can be put onto the organisms that have been detected. On the other hand, it is important to consider that many of these undetermined sequences could be faulty 16S/18S rRNA sequences generated by the metaxa2 program.

Another note is that many metagenome analysis, such as rarefaction curves and rank abundance, are based on the binning of similar 16S/18S rRNA sequences into OTUs.

This is quite straight forward in for example 16S amplicon sequencing. Reason being the same sections of the 16S/18S DNA sequences in the samples are amplified (Kozich et al., 2013; Poretsky et al., 2014). This is not true for shot-gun sequencing, here all the DNA fragments in the sample are amplified. Consequently many different sections of the 16S/18S regions from all the DNA in the sample will be present. Subsequently the binning of sequences into similar OTUs are challenging (Di Bella et al., 2013). Unstable OTUs, where sequences could be assigned to one OTU in one subset, but to two OTUs in a different subset, present a problem during clustering of sequences (He et al., 2015). The huge difference in the number of sequences and OTUs between the metagenome datasets (fig. 6A) in our study, could be more likely due to unstable or erroneous OTUs, than faults in the sequencing itself. In addition the fact that the data in fig. 6A is not normalised will also play a part in the skewness of the sample.

Despite that, 421 different OTUs were discovered in the dataset, with the least number of separate OTUs were found in the Blank samples and the highest number of separate OTUs in the Ca1_3 sample (fig. 6A). This is consistent with the thought that when the source of nutrients is plentiful and varied, from the zebra and what’s found normally in the soil, the diversity will be greater. As the original soil microbial community will be there, together with opportunistic bacteria and some bacteria introduced from the zebra.

(37)

The increase seen in the number of OTUs from the Blank sample and peaking at day 3 is to be expected as many of the bacteria would have had time to switch to the new nutrient source and flourish.

Even though the soil community is diverse, most of the sequences in day 3 and 7 fall within a few number of OTUs (fig. 6B), indicating that in those days, the soil community although diverse, is dominated by a few species. Whilst the later time-points such as day 21 and day 30, especially for Ca2, there is a more even number of sequences within each OTU. Indicating that there is a high evenness within the species at these time-points.

This can be seen when comparing the clustering of our two carcass sites (fig. 8). The samples taken at day 0 are found in one cluster, where the blank samples and the samples with blood cluster separately within this cluster. After day 0, there is no correlation between the organisms in the soil at the various sample times or the two carcass sites.

This clustering indicates that the original microbial composition found in the soil at day 0 at the two carcass sites are more similar to each other. However, when nutrients are introduced, the composition changes in accordance with what is available at the time, which will vary depending on what is released by the zebras as they decompose (Hyde et al., 2015; Pechal et al., 2013).

The nine phyla discussed by Janssen (2006) to be present in most soils are, together with 39 other phylums, present in our metagenome datasets analysed with metaxa2.

Within those, 289 different orders are found, scrutiny of the taxa at order level in these datasets (fig. 9) reveals responsive as well as non-responsive taxa. Of the responsive taxa the order that reach the highest abundance is the Bacilliales group. Whilst being present at all time-points, a bloom of Bacilliales occur between day 0 and day 3, then it stays at a high level until day 14 before gradually decreasing in small intervals. The fact that Bacilliales abundance does not increase drastically from the Blank sample and day 0 indicating that most Bacilliales did not come from the zebra, but probably responded to the nutrients made available as it decomposed. As is supported by the knowledge that Bacilliales belong to the phylum Firmicutes, which are known to increase in the early days of carcass decomposition (Cobaugh et al., 2015; Howardet al., 2010).

One order that is present in both carcasses and only have an increased abundance in the samples from day 0, is the Clostridiales order. Even though Clostridiales are a known soil bacterium order, many of the sequences that cause the increase in day 0 belong to Claustridium botulinum BKT015925. This bacterium is known to cause animal botulism (Skarin et al., 2011), even though it could have come from the soil, the fact that the sequences from this organism is found mostly in day 0, indicates that it could also originate from the zebras.

An order that most likely is introduced by the zebra however, is the Fusobacteriales.

They are present in lower abundance in the blank samples, however they increase in the day 0 sample. Before it completely disappears in both carcasses, part from in day 21 in

31

(38)

Ca1. The disappearance makes sense, seeing that most of the increase in the Fusobac- teriales is caused by sequences matching those of the bacterial species Fusobacteriales equinum, who are found in mucosa of horses. It is likely to believe that similar bacteria as those found in the mucus of horses can be found in the mucus of zebras. It is also likely that this bacterial species is not able to survive in soil, when its normal environment is host associated mucus. Low abundances of this order is however seen in the Blank sample. Indicating that some bacteria species within this order are found in soil normally, but are outcompeted. This is also true for the Pasteurellales order found in Ca1, here the sequences causing the high abundance in day 0 is similar to the sequences representing Pasteurellales caballi, a bacterium normally found in the upper respiratory tract of horses.

Again the order disappears completely after day 7, probably due to loss of nutrient source or unfavourable conditions.

In Ca2 the eukaryotic Craniata order are present until day 3. When looking at the genus level this is from the Mammaila genus, hence would probably have been introduced by the zebra, there was no indication that other carnivores had reached the carcass site and potentially left DNA. However, it is not totally unsurprising that the Craniata order is also seen in the Blank sample, with 33 hits for the Craniata order in Ca2, simply because the zebra could have thrashed around before it died, this order is not found amongst the 30 most abundant orders in Ca1.

In addition to a responsive community, there are organisms that seem to be little affected by the influx of nutrients from the decomposing zebra. These are amongst others from the orders Frankiales, Rhizobiales, Micrococcales, Unclassified Actinomicrobiales and Acidimicrobiales. All these are found in the Actinobacteria phylum, except Rhizobiales who belong to the Proteobacteria phylum. Actinobacteria is a common soil bacteria phylum who utilises energy from organic matter and often dominate their environment (Delmont et al., 2012). Rhizobiales species are nitrogen fixing bacteria in soil. These type of bacteria have a specialised way of acquire nutrients, and as such would not be expected to change metabolic properties due to nutrient availability from the decomposing zebra.

High-throughput sequencing is well suited for investigating microbial diversity because it is able to generate a large number of sequences. Despite the use of high-throughput sequencing in soil microbiome studies, only a few have used shot-gun sequencing with Illumina MiSeq as the sequencing platform. The traditionally used approach is 16S amplification and 454 pyrosequencing (Frisli et al., 2013; Hyde et al., 2015; Pechal et al., 2013). Illumina is a good option because it is cheaper than the 454 sequencing and gen- erates much more sequences, with gives better sequence coverage whilst at the same time avoiding biases in both 16S amplification and 454 pyrosequencing (Kozich et al., 2013;

Logares et al., 2013).

To obtain good quality sequences from the sequencing platform, the isolated DNA needs to be of high quality. The samples in this study came from soil covered in blood,

Next generation sequencing on soil samples from Etosha National Park: Diversity studies of an anthrax