Complete mitochondrial genomes of eleven extinct or possibly extinct bird species
Jarl Andreas Anmarkrud* and Jan T. Lifjeld
Natural History Museum, University of Oslo, PO Box 1172 Blindern, 0318 Oslo, Norway
Corresponding author: Jarl Andreas Anmarkrud, Natural History Museum, University of Oslo, PO Box 1172 Blindern, 0318 Oslo, Norway, Fax: +47 22 85 18 35 Email: [email protected]
Running title: Mitogenomes of 11 extinct birds
Abstract
Natural history museum collections represent a vast source of ancient and historical DNA samples from extinct taxa that can be utilized by high-throughput sequencing tools to reveal novel genetic and phylogenetic information about them. Here we report the successful sequencing of complete mitochondrial genomes (mitogenomes) from eleven extinct bird species, using de novo assembly of short sequences derived from toepad samples of degraded DNA from museum specimens. For two species (the Passenger Pigeon Ectopistes migratorius and the South Island Piopio Turnagra capensis), whole mitogenomes were already available from recent studies, whereas for five others (the Great Auk Pinguinus impennis, the Imperial Woodpecker Campephilus imperialis, the Huia Heteralocha acutirostris, the Kauai Oo Moho braccatus and the South Island Kokako Callaeas cinereus) partial mitochondrial sequences were available for comparison. For all seven species we found sequence similarities of >98%. For the remaining four species (the Kamao Myadestes myadestinus, the Paradise Parrot Psephotellus pulcherrimus, the Ou Psittirostra psittacea, and the Lesser Akialoa Akialoa obscura) no sequence information was available for comparison, so we conducted BLAST searches and phylogenetic analyses to determine their phylogenetic positions and identify their closest extant relatives. These mitogenomes will be valuable for future analyses of avian phylogenetics and illustrate the importance of museum collections as repositories for genomic resources.
Introduction
Museum collections hold millions of biological specimens. These specimens function as a reservoir of genetic material distributed throughout the tree of life and collected worldwide (Buerki & Baker 2016). For extinct taxa, museum collections may be the only available source of genetic material. Museum specimens can also be a rich resource for genomic analyses of rare and remote species for which fresh samples can be hard to obtain (Besnard et al. 2016; Kollias et al. 2015). The value of museum specimens has increased in recent years, especially for studies of how species respond to environmental change, habitat loss and changes in population size (Buerki & Baker 2016).
During the last decade, new sequencing technologies have revolutionized the field of molecular genetics. High-throughput methods make it possible to reconstruct genetic regions, or genomes, from objects hundreds or even thousands of years old (Miller et al. 2009; Miller et al. 2008). These methods require relatively short DNA molecules compared to traditional Sanger sequencing, and are therefore highly suitable for sequencing degraded DNA, such as that from old museum specimens. The utilization of museum specimens for genomic analyses has given rise to the term ‘museomics’ (e.g. Guschanski et al. 2013; Zedane et al. 2016).
The mitochondrial genome (mitogenome) is, due to its maternal inheritance and mutation characteristics, an important marker for studies related to taxonomy, phylogenetics, biodiversity and evolution (Boore & Brown 1998; Gibb et al. 2007; Ingman et al. 2000; Sankoff et al. 1992). In vertebrates, the number of mitochondrial DNA copies may be several orders of magnitude higher than the number of nuclear DNA copies in most tissues (Shadel & Clayton 1997). This feature favours the use of mitochondrial over nuclear DNA when working with small quantities of template DNA. Complete mitogenomes, with decent read coverage, may be reconstructed by semi-high-throughput sequencing, even when the sequence data are insufficient to assemble nuclear genomes.
The number of mitogenomes reconstructed from extinct species is now rapidly increasing (Paijmans et al. 2013). For birds, several mitogenomes from extinct species are available. The mitogenomes of two species of moa, Anomalopteryx didiformis and Emeus crassus, were reconstructed using traditional Sanger sequencing (Haddrath & Baker 2001). Mitchell et al. (2014) used a targeted sequencing strategy to sequence the mitogenomes of the elephant birds Aepyornis hildebrandti and Mullerornis agilis, and Mitchell et al. (2016) used a similar approach to reconstruct the mitogenomes of the stout-legged wren Pachyplichas yaldwyni, Lyall’s wren Traversia lyalli and the bush wren Xenicus longipes. Shotgun sequencing approaches were recently used to recover the mitogenomes of the passenger pigeon Ectopistes migratorius and the South Island piopio Turnagra capensis (Gibb et al. 2015; Hung et al. 2013).
In the present study we reconstructed the complete mitogenomes of eleven extinct or probably extinct bird species from museum specimens in the Natural History Museum, University of Oslo. The mitogenomes were obtained by shotgun sequencing of genomic DNA extracted from toe pad samples and were assembled bioinformatically. We provide a straightforward pipeline for mitogenome reconstruction from old and degraded museum specimens and discuss the quality and scientific potential of this approach for the study of extinct taxa.
Materials and methods
Samples
Eleven museum specimens of extinct bird species were chosen for this study. The extinction status of the species and the collection details of the analyzed specimens are provided in Table 1. Images of each individual museum specimen are available in the online Supplementary Figure S1.
DNA extractions and sequencing
The extraction procedures were performed in UV PCR stations in a lab dedicated to historic material. Strict guidelines for sensitive museum samples were followed (available on request). DNA was extracted from toe pads using the Qiagen Blood and Tissue kit (Qiagen Inc.) following the manufacturer’s protocol. To optimize yield and concentration, the tissue was incubated with proteinase K overnight and DNA was eluted in 2 × 80 µl elution buffer. Two no-template controls were included in each step of the extraction procedure. DNA concentration and integrity were measured using a Qubit fluorometer (Thermo Fisher Scientific) and a Fragment Analyzer (Advanced Analytical) instrument with the High Sensitivity Genomic DNA kit (DNF-488, Advanced Analytical). DNA concentrations of the extracts are provided in Supplementary Table S1.
Library preparations were performed using either the TruSeq Nano (Illumina) or the MicroPlex (Diagenode) library preparation kit, depending on DNA concentrations (see Supplementary Table S1), and the libraries were sequenced on an Illumina NextSeq instrument. Due to the degraded nature of the extracts (see Supplementary Figure S2), no size selection was performed and the DNA extracts were sequenced using 75 bp single-read settings.
Bioinformatics and statistical analyses
Low-quality reads were trimmed using Trimmomatic v0.33 (Bolger et al. 2014) with the following settings: sliding window 4:25, minimum length 50 bp. Mitogenomes were reconstructed using a baiting and iterative mapping approach with the software MITObim 1.8 (Hahn et al. 2013). This approach has been shown to produce high-quality mitogenomes even from relatively low-coverage data (Machado et al. 2016). The mitochondrial reference data used in the initial MITObim mapping are provided in Table 2. Due to frequent duplications of the control region in avian mitogenomes (e.g. Abbott et al. 2005; Gibb et al. 2007), which may result in almost identical copies (Singh et al. 2008), correct assembly of avian mitogenomes may be challenging. Here we addressed this issue by inspecting coverage plots of the mapped reads, as previously suggested (Gibb et al. 2015). This was performed in the software Tablet v1.14.10.20 (Milne et al. 2013). If increased coverage was observed in the region where duplications are likely to occur (between the cytB gene and the 12S rRNA gene), we performed additional MITObim mapping using reference genomes from related taxa with known duplicated control regions. In addition to using complete mitochondrial genomes as references, we employed this iterative mapping strategy using the assembled cytB, NAD6 and 12S rRNA gene sequences separately as short sequence baits. With this strategy, we aimed to provide independent iteratively mapped assemblies from each side of the regions where duplications are likely to occur. Coverage plots from these assemblies were inspected and the contig sequences were manually aligned in MEGA6 (Tamura et al. 2013).
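The trimming settings above (sliding window 4:25, minimum length 50 bp) can be illustrated with a minimal Python sketch. This is a simplified approximation for illustration only, not Trimmomatic's own implementation, which differs in detail. The function names `sliding_window_trim` and `trim_read` are hypothetical, and qualities are assumed to be already decoded to integer Phred scores.

```python
def sliding_window_trim(qualities, window=4, min_mean_quality=25):
    """Return how many leading bases to keep.

    Windows are scanned from the 5' end; the read is cut at the start of
    the first window whose mean quality falls below the threshold.
    (Simplified: the real tool treats the cut point slightly differently.)
    """
    if not qualities:
        return 0
    if len(qualities) < window:
        # Assumption: a read shorter than one window is judged as a whole.
        mean = sum(qualities) / len(qualities)
        return len(qualities) if mean >= min_mean_quality else 0
    for start in range(len(qualities) - window + 1):
        window_quals = qualities[start:start + window]
        if sum(window_quals) / window < min_mean_quality:
            return start
    return len(qualities)


def trim_read(sequence, qualities, window=4, min_mean_quality=25, min_length=50):
    """Trim one read; return (sequence, qualities), or None if it is too short."""
    keep = sliding_window_trim(qualities, window, min_mean_quality)
    if keep < min_length:
        return None  # read discarded, as with a minimum-length filter
    return sequence[:keep], qualities[:keep]
```

With 75 bp reads, a read whose quality collapses after base 60 would be shortened but kept, whereas one that collapses after base 40 would fall below the 50 bp minimum and be dropped.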
We also performed initial mapping using a human mitogenome as reference. With this approach, we were not able to reconstruct the human mitogenome from the read pool, indicating that our consensus sequences were not affected by human contamination.
Gene annotations were performed automatically using the MITOS web server (Bernt et al. 2013) and manually inspected.
For the four species without any prior genetic information, we searched for their closest relatives using the cytochrome oxidase I (COI) gene nucleotide sequence as query in the BOLD database (Ratnasingham & Hebert 2007).
Results
On average, we obtained 20,194,345 (SE = 445,089) reads per specimen (see Table 2 for details). We were able to reconstruct the mitogenome of every specimen with our iterative mapping approach, yielding an average coverage ranging from 22.54X (Moho braccatus) to 160.67X (Campephilus imperialis); see Supplementary Table S1 for details. The number of reads mapped to each reference in the iterative mapping is provided in Table 2. The total size of the mitogenomes ranged between 16,664 bp (Myadestes myadestinus) and 17,063 bp (Psephotellus pulcherrimus). Each of the 11 mitogenomes consisted of 2 rRNA genes, 22 tRNA genes and 13 protein-coding genes.
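As a rough guide to how mapped read counts relate to the average coverage values reported above, mean coverage can be approximated as mapped bases divided by mitogenome length. The sketch below is an assumption-laden illustration, not the calculation used in the study: the function name `expected_mean_coverage` is hypothetical, and the read count in the usage note is invented for the example.

```python
def expected_mean_coverage(mapped_reads, mean_read_length, genome_length):
    """Approximate mean depth as (mapped reads x mean read length) / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return mapped_reads * mean_read_length / genome_length
```

For instance, a hypothetical 5,000 mapped reads of 75 bp on a 16,664 bp mitogenome would give a mean coverage of roughly 22.5X. Trimmed reads are shorter than 75 bp, so the real value would be somewhat lower.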
For reconstruction of the Pinguinus impennis mitogenome, we used available mitogenomes from the following species as reference sequences in the initial MITObim mapping: Synthliboramphus antiquus (GenBank acc. no. AP009042), Chroicocephalus brunnicephalus (GenBank acc. no. JX155863), Saundersilarus saundersi (GenBank acc. no. NC_017601) and Ichthyaetus relictus (GenBank acc. no. KC760146). A major gap in the NAD6 gene was revealed in all MITObim mapping approaches using these reference mitogenomes. Therefore we also used previously published sequences from Pinguinus impennis and the related Alca torda (GenBank acc. no. 21104303 and AJ242683) as sequence baits in the iterative mapping (Moum et al. 2002). When using these sequences as bait, the gap in the NAD6 gene was resolved.
When inspecting coverage plots of the initially mapped reads, two species (Psephotellus pulcherrimus and Moho braccatus) showed increased coverage between the cytB and 12S rRNA genes (Supplementary Figure S3), which may indicate a duplicated region. For these species we employed the iterative mapping strategy described above. For Psephotellus pulcherrimus we used the mitochondrial genomes from the following species as references, both previously shown to have duplicated control regions: Melopsittacus undulatus (GenBank acc. no. NC_009134) and Psittacus erithacus (GenBank acc. no. KM611474). When including these mitogenomes as references, together with the gene-specific baits, we uncovered an extra sequence motif of approximately 700 bp in the control region at nucleotide position 15,558. The gene arrangement was, however, identical to that of the initial mapping. For Moho braccatus we used the following mitogenomes with duplicated control regions as references: Petroica macrocephala (GenBank acc. no. KC545402) and Tachycineta albiventer (GenBank acc. no. JQ071620). Based on these references and the gene-specific baits, we again observed no change in the gene arrangement.
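The coverage plots in this study were inspected visually in Tablet. As a complementary illustration, a programmatic check for regions of elevated depth could look like the following sketch. It assumes a per-position depth list is already available; the function name `flag_elevated_windows`, the 100 bp window and the 1.5-fold threshold are all hypothetical choices, not values taken from the study.

```python
from statistics import median


def flag_elevated_windows(depths, window=100, fold=1.5):
    """Flag non-overlapping windows whose mean depth exceeds fold x the median depth.

    Returns a list of (start, end, mean_depth) tuples using 0-based,
    end-exclusive coordinates.
    """
    if not depths:
        return []
    baseline = median(depths)
    flagged = []
    for start in range(0, len(depths), window):
        chunk = depths[start:start + window]
        mean_depth = sum(chunk) / len(chunk)
        if mean_depth > fold * baseline:
            flagged.append((start, start + len(chunk), mean_depth))
    return flagged
```

Windows flagged between the cytB and 12S rRNA genes would then be candidates for the additional iterative mapping with duplicated-control-region references described above. The same function with an inverted comparison could flag unusually low coverage.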
One species, Callaeas cinereus, revealed a region of suspiciously low coverage within the control region at nucleotide position 15,580, even though coverage across the complete mitogenome was otherwise relatively uniform (Supplementary Figure S3). This drop in coverage was observed around a C(n)TAC(n) motif. For this species we followed the same strategy as described above, using the reference mitogenome from Philesturnus carunculatus (GenBank acc. no. KC545403; Gibb et al. 2015). The low-coverage region was not resolved with this strategy, leading to an unresolved motif in the final mitogenome sequence.
All mitogenomes, except for that of Campephilus imperialis, revealed a standard avian gene order (Desjardins & Morais 1990). The Campephilus imperialis mitogenome displayed a second non-coding region, 88 bp in length, downstream of trnE.
A frame shift mutation was observed in the NAD3 gene in Ectopistes migratorius, Pinguinus impennis, Psephotellus pulcherrimus and Campephilus imperialis. We also found incomplete stop codons in the COXIII gene in all specimens, and in the NAD4 gene in all specimens except Campephilus imperialis.
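To make the notion of an incomplete stop codon concrete, the sketch below classifies the 3' end of a protein-coding sequence under the vertebrate mitochondrial genetic code, in which TAA, TAG, AGA and AGG are stop codons and a terminal T or TA can be completed to TAA by polyadenylation of the transcript. The function name `classify_stop` is hypothetical, and the check is deliberately simplistic: annotation in this study was done with MITOS and manual inspection, which use more context than the final bases alone.

```python
# Stop codons of the vertebrate mitochondrial code (NCBI translation table 2).
COMPLETE_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def classify_stop(cds):
    """Classify the end of a coding sequence as 'complete', 'incomplete' or 'none'."""
    cds = cds.upper()
    remainder = len(cds) % 3
    if remainder == 0:
        # In-frame end: look for a full stop codon.
        return "complete" if cds[-3:] in COMPLETE_STOPS else "none"
    # Out-of-frame end: a trailing T or TA is a candidate incomplete stop.
    tail = cds[-remainder:]
    return "incomplete" if tail in ("T", "TA") else "none"
```

Note that a gene carrying the untranslated extra nucleotide in NAD3 would need its frame corrected before a check like this is meaningful.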
Seven of the eleven species have genetic information available in GenBank. A summary of the similarity between the sequence data obtained in our study and known mitochondrial sequence data from the respective species is provided in Table 3. Note that two of the extinct species, Ectopistes migratorius and Turnagra capensis, already have published mitogenomes (Gibb et al. 2015; Hung et al. 2013). For these two species, the similarities between the published mitogenomes and the mitogenomes from our study were 0.997 and 0.996, respectively.
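The text does not state exactly how the similarity values were computed. One common definition, assumed here purely for illustration, is the proportion of identical sites between two aligned sequences, ignoring positions where either sequence has a gap or an ambiguous base. The function name `pairwise_similarity` is hypothetical.

```python
def pairwise_similarity(aligned_a, aligned_b, gap="-", ambiguous="N"):
    """Proportion of identical sites between two pre-aligned sequences.

    Positions with a gap or an ambiguous base in either sequence are
    skipped (an assumption; other definitions count gaps as mismatches).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a in (gap, ambiguous) or b in (gap, ambiguous):
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return matches / compared
```

Under this definition, a value of 0.997 for a mitogenome of about 17,000 bp corresponds to roughly 50 differing sites.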
The following four species have, to our knowledge, no genetic information currently available in GenBank: Psephotellus pulcherrimus, Myadestes myadestinus, Psittirostra psittacea and Akialoa obscura. Based on species identification searches in the BOLD database (Ratnasingham & Hebert 2007), the most similar species for Psephotellus pulcherrimus was Psephotellus chrysopterygius (similarity: 0.942). For Myadestes myadestinus the most similar species was Myadestes obscurus (similarity: 0.949). Loxops mana and Loxops coccineus were most similar to Psittirostra psittacea (similarity: both 0.942). The most similar species to Akialoa obscura was Loxops mana (similarity: 0.930).
Because our approach also yielded sequence data from nuclear genomic DNA, we tested whether nuclear regions could be reconstructed from our sequencing data. Using available nuclear sequence data from GenBank as bait in the MITObim mapping, we tried to reconstruct nuclear genes. However, the amount of sequence data generated in our sequencing runs was insufficient to reconstruct nuclear sequences with high coverage. When using the nuclear contig sequences with coverage ≥2, the reconstructed sequences showed similarities of 0.991-1.000 with the species-specific sequence baits.
Discussion
Standard avian mitochondrial gene arrangements (Desjardins & Morais 1991) were observed for all the mitogenomes except that of Campephilus imperialis. For this individual a second non-coding element was uncovered between trnE and trnF, previously described as a ‘pseudo-control region’ (Haring et al. 1999). The pseudo-control region was of similar size to that in the related species Campephilus guatemalensis (Fuchs et al. 2015). Such pseudo-control regions have been described for several bird taxa (see e.g. Bensch & Härlid 2000; Gibb et al. 2007; Haring et al. 2001; Mindell et al. 1998b; Rêgo et al. 2010; Väli 2002).
Duplicated control regions have also been reported among parrots (Eberhard & Wright 2016; Schirtzinger et al. 2012). Duplications in avian mitogenomes affect the gene order and will create gene rearrangements. We did not uncover any such gene rearrangements within the Psephotellus pulcherrimus mitogenome. However, it may be difficult to assemble duplicated control regions from high-throughput sequence data when the copies show high similarity. Performing traditional Sanger sequencing (Sanger et al. 1977) on amplicons spanning the challenging regions may be a solution to this issue. Such an approach is, however, demanding when working with highly degraded DNA. Accordingly, we addressed this issue by inspecting coverage plots, particularly focusing on the region between the cytB gene and the 12S rRNA gene. We observed relatively uniform average coverage across all mitogenomes except those of Psephotellus pulcherrimus and Moho braccatus (Supplementary Figure S3). In these species we observed increased coverage in the region where duplications are likely to occur. Employing both reference genomes with duplicated control regions and assembled single-gene sequences as baits in the iterative mapping, we did not observe a different gene arrangement in the final contig sequence. We did, however, uncover a ~700 bp sequence motif in the Psephotellus pulcherrimus control region that was not detected in the initial mapping (Supplementary Figure S3). This highlights the importance of employing multiple references in an iterative mapping strategy if inadequate mapping is suspected. Since we observed increased coverage depth in the cytB – 12S rRNA region in the initially mapped assembly of the Moho braccatus mitogenome, we also suspected this species to have duplicated motifs in the control region. Yet, we were not able to detect any deviations from the standard avian mitochondrial gene arrangement when additional references and sequence baits were included in the iterative mapping. Nevertheless, considering the relatively high coverage between the cytB gene and the 12S rRNA gene, we are not fully confident about the true sequence in this region. We would therefore recommend caution when using this region of the mitogenome for future phylogenetic analyses in this or related species. Providing Sanger-based sequence data from the control region, using high-quality DNA from closely related taxa, can also be an approach to obtain robust contig sequences from this challenging region.
In order to reconstruct the Pinguinus impennis mitogenome we first used four different mitogenomes as references in the MITObim mapping. All these assemblies revealed a major gap in the NAD6 gene. Such a gap may indicate a duplication event. Hence, we used two previously published sequences as baits, one from Pinguinus impennis and one from the related Alca torda (Moum et al. 2002). These sequence baits cover the complete region between the cytB gene and the 12S rRNA gene and were obtained by traditional Sanger sequencing. When these sequence baits were included we managed to resolve the gap in the NAD6 gene, and no duplication events were uncovered.
Coverage plots from the assembled mitogenome of Callaeas cinereus revealed relatively uniform average coverage. Yet, a motif with suspiciously low coverage (Supplementary Figure S3) was revealed. This motif consists of C(n)TAC(n) and is located approximately 800 bp downstream from the NAD6 gene. We were not able to resolve this motif, and it was replaced by ‘N’s in the final contig sequence. A similar drop in coverage was observed in several of the samples in the same region, either at the identical sequence motif (Pinguinus impennis, Ectopistes migratorius, Psephotellus pulcherrimus) or at the similar motifs C(n)TTC(n) (Akialoa obscura, Psittirostra psittacea), C(n)TCAC(n) (Heteralocha acutirostris) or C(n)TTAC(n) (Turnagra capensis, Moho braccatus). Such regions with a drop in coverage should be flagged and carefully investigated, since they may indicate gene rearrangements. Nevertheless, systematic coverage bias in vertebrate mitochondrial data has previously been reported, including in a bird (Ekblom et al. 2014). These authors reported a negative association between local GC content (50 bp window) and coverage in assemblies of mitochondrial genomes obtained from Illumina data. The local GC content in the sequence motifs mentioned above ranged between 0.824 and 0.909. The observed drop in coverage may accordingly be explained by systematic coverage bias due to high local GC content.
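Local GC content in a fixed window, as referred to above, can be computed with a few lines of Python. The sketch below uses a 50 bp window to match the window size cited from Ekblom et al. (2014); the function names `gc_fraction` and `sliding_gc` are hypothetical, and the exact windowing scheme used in the cited study or in this one is not specified here.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def sliding_gc(sequence, window=50):
    """GC fraction for every window of the given size, stepping one base at a time.

    Element i of the result covers positions i .. i + window - 1 (0-based).
    Returns an empty list if the sequence is shorter than the window.
    """
    sequence = sequence.upper()
    return [
        gc_fraction(sequence[i:i + window])
        for i in range(len(sequence) - window + 1)
    ]
```

Plotting these values against per-position read depth would show whether low-coverage stretches coincide with GC-rich windows, which is the pattern the argument above relies on.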
We observed a frame shift mutation in the NAD3 gene, violating the genetic code, in Ectopistes migratorius, Pinguinus impennis, Psephotellus pulcherrimus and Campephilus imperialis. This mutation has frequently been observed among avian taxa (e.g. Mindell et al. 1998a). The incomplete stop codons in COXIII and NAD4 are also common features of avian mitogenomes (e.g. Gibb et al. 2015; Hung et al. 2013). The incomplete NAD4 stop codon was not observed in Campephilus imperialis. This is consistent with the same trait seen in the closely related Campephilus guatemalensis (Fuchs et al. 2015).
Except for the existing genetic information from Callaeas cinereus and the 12S gene from Moho braccatus (see GenBank acc. no. and references in Table 3), our sequence data were highly similar to available genetic sequence data. The Callaeas cinereus sequence data in GenBank, provided by Zuccon and Ericson (2012), were derived from a specimen from the extant population on the North Island (Zuccon and Ericson, pers. comm.). In the classification adopted by Zuccon and Ericson (2012), i.e. Howard and Moore 3rd edition (Dickinson 2003), the North Island kokako was considered a subspecies (ssp. wilsoni) of Callaeas cinereus. Today most classifications, including Howard and Moore 4th edition (Dickinson & Christidis 2014), recognize them as separate species: C. cinereus and C. wilsoni. In our alignment of the Moho braccatus mitogenome with the GenBank 12S gene sequence (287 bp) we observed six nucleotide substitutions and three gaps. Two of these gaps were filled in with an ‘N’ in the GenBank sequence. Our sequence data revealed a cytB gene identical to the available sequence data from the same species.
We employed the DNA barcode marker (COI gene) and the BOLD database to indicate the closest relative of each of the four species without any prior genetic information. The BOLD database may have incomplete taxonomic coverage, so our identification of the sister species should be treated with caution. The correct phylogenetic placement of these four species must await further phylogenetic studies.
Our results show that complete mitogenomes of high quality can be reconstructed from toepad samples of old museum skins. For all samples we had an average coverage of >20 reads across the entire mitogenome, which should be sufficient for reconstructing the true sequence. Our approach did not have enough sequence depth to recover nuclear regions, so reconstructing the nuclear genome would require sequencing coverage several orders of magnitude higher. Our study exemplifies the value of museum collections as repositories for genetic resources. Mitochondrial DNA markers will continue to play a vital role in evolutionary studies of birds and other taxa (Zink & Barrowclough 2008), and the use of full mitogenomes should increase the resolution and quality of such analyses. Importantly, it will be possible to make better assessments of the phylogenetic position of extinct taxa and probe deeper into their evolutionary history and patterns of genetic diversity (Paijmans et al. 2013).
Acknowledgements
The project was funded by an internal research grant at the Natural History Museum, University of Oslo. We are grateful to Lars Erik Johannessen for help with photographing the specimens and to the Norwegian Sequencing Center for expert technical assistance.
References
Abbott CL, Double MC, Trueman JWH, Robinson A, Cockburn A (2005) An unusual source of apparent mitochondrial heteroplasmy: duplicate mitochondrial control regions in Thalassarche albatrosses. Molecular Ecology 14, 3605-3613.
Bensch S, Härlid A (2000) Mitochondrial genomic rearrangements in songbirds. Molecular Biology and Evolution 17, 107-113.
Bernt M, Donath A, Jühling F, et al. (2013) MITOS: Improved de novo metazoan mitochondrial genome annotation. Molecular Phylogenetics and Evolution 69, 313-319.
Besnard G, Bertrand JAM, Delahaie B, et al. (2016) Valuing museum specimens: High-throughput DNA sequencing on historical collections of New Guinea crowned pigeons (Goura). Biological Journal of the Linnean Society 117, 71-82.
Bolger AM, Lohse M, Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114-2120.
Boore JL, Brown WM (1998) Big trees from little genomes: Mitochondrial gene order as a phylogenetic tool. Current Opinion in Genetics & Development 8, 668-674.
Buerki S, Baker WJ (2016) Collections-based research in the genomic era. Biological Journal of the Linnean Society 117, 5-10.
Desjardins P, Morais R (1990) Sequence and gene organization of the chicken mitochondrial genome: A novel gene order in higher vertebrates. Journal of Molecular Biology 212, 599-634.
Desjardins P, Morais R (1991) Nucleotide sequence and evolution of coding and noncoding regions of a quail mitochondrial genome. Journal of Molecular Evolution 32, 153-161.
Dickinson EC (2003) The Howard and Moore Complete Checklist of the Birds of the World, 3rd edn. Christopher Helm, London.
Dickinson EC, Christidis L (2014) The Howard and Moore Complete Checklist of the Birds of the World, 4th edn. Aves Press, Eastbourne, U.K.
Eberhard JR, Wright TF (2016) Rearrangement and evolution of mitochondrial genomes in parrots. Molecular Phylogenetics and Evolution 94, 34-46.
Ekblom R, Smeds L, Ellegren H (2014) Patterns of sequencing coverage bias revealed by ultra-deep sequencing of vertebrate mitochondria. BMC Genomics 15, 1-9.
Fleischer RC, James HF, Olson SL (2008) Convergent evolution of Hawaiian and Australo-Pacific Honeyeaters from distant songbird ancestors. Current Biology 18, 1927-1931.
Fleischer RC, Kirchman JJ, Dumbacher JP, et al. (2006) Mid-Pleistocene divergence of Cuban and North American ivory-billed woodpeckers. Biology Letters 2, 466-469.
Fuchs J, Pons J-M, Pasquet E, Bonillo C (2015) Complete mitochondrial genomes of the white-browed piculet (Sasia ochracea, Picidae) and pale-billed woodpecker (Campephilus guatemalensis, Picidae). Mitochondrial DNA Part A 27, 3640-3641.
Gibb GC, England R, Hartig G, et al. (2015) New Zealand passerines help clarify the diversification of major songbird lineages during the Oligocene. Genome Biology and Evolution.
Gibb GC, Kardailsky O, Kimball RT, Braun EL, Penny D (2007) Mitochondrial genomes and avian phylogeny: Complex characters and resolvability without explosive radiations. Molecular Biology and Evolution 24, 269-280.
Guschanski K, Krause J, Sawyer S, et al. (2013) Next-generation museomics disentangles one of the largest primate radiations. Systematic Biology.
Haddrath O, Baker AJ (2001) Complete mitochondrial DNA genome sequences of extinct birds: ratite phylogenetics and the vicariance biogeography hypothesis. Proceedings of the Royal Society B: Biological Sciences 268, 939-945.
Hahn C, Bachmann L, Chevreux B (2013) Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads: a baiting and iterative mapping approach. Nucleic Acids Research 41, e129.
Haring E, Kruckenhauser L, Gamauf A, Riesing MJ, Pinsker W (2001) The complete sequence of the mitochondrial genome of Buteo buteo (Aves, Accipitridae) indicates an early split in the phylogeny of raptors. Molecular Biology and Evolution 18, 1892-1904.
Haring E, Riesing MJ, Pinsker W, Gamauf A (1999) Evolution of a pseudo-control region in the mitochondrial genome of Palearctic buzzards (genus Buteo). Journal of Zoological Systematics and Evolutionary Research 37, 185-194.
Hung C-M, Lin R-C, Chu J-H, et al. (2013) The de novo assembly of mitochondrial genomes of the extinct Passenger Pigeon (Ectopistes migratorius) with next generation sequencing. PLoS ONE 8, e56301.
Ingman M, Kaessmann H, Pääbo S, Gyllensten U (2000) Mitochondrial genome variation and the origin of modern humans. Nature 408, 708-713.
Kollias S, Poortvliet M, Smolina I, Hoarau G (2015) Low cost sequencing of mitogenomes from museum samples using baits capture and Ion Torrent. Conservation Genetics Resources 7, 345-348.
Lambert DM, Shepherd LD, Huynen L, et al. (2009) The molecular ecology of the extinct New Zealand Huia. PLoS ONE 4, e8019.
Machado DJ, Lyra ML, Grant T (2016) Mitogenome assembly from genomic multiplex libraries: comparison of strategies and novel mitogenomes for five species of frogs. Molecular Ecology Resources 16, 686-693.
Miller W, Drautz DI, Janecka JE, et al. (2009) The mitochondrial genome sequence of the Tasmanian tiger (Thylacinus cynocephalus). Genome Research 19, 213-220.
Miller W, Drautz DI, Ratan A, et al. (2008) Sequencing the nuclear genome of the extinct woolly mammoth. Nature 456, 387-390.
Milne I, Stephen G, Bayer M, et al. (2013) Using Tablet for visual exploration of second-generation sequencing data. Briefings in Bioinformatics 14, 193-202.
Mindell DP, Sorenson MD, Dimcheff DE (1998a) An extra nucleotide is not translated in mitochondrial ND3 of some birds and turtles. Molecular Biology and Evolution 15, 1568-1571.
Mindell DP, Sorenson MD, Dimcheff DE (1998b) Multiple independent origins of mitochondrial gene 343 order in birds. Proceedings of the National Academy of Sciences 95, 10693-10697.
344 Mitchell KJ, Llamas B, Soubrier J, et al. (2014) Ancient DNA reveals elephant birds and kiwi are sister 345 taxa and clarifies ratite bird evolution. Science 344, 898-900.
346 Mitchell KJ, Wood JR, Llamas B, et al. (2016) Ancient mitochondrial genomes clarify the evolutionary 347 history of New Zealand’s enigmatic acanthisittid wrens. Molecular Phylogenetics and
348 Evolution 102, 295-304.
Moum T, Arnason U, Arnason E (2002) Mitochondrial DNA sequence evolution and phylogeny of the Atlantic Alcidae, including the extinct great auk (Pinguinus impennis). Molecular Biology and Evolution 19, 1434-1439.
Paijmans JLA, Gilbert MTP, Hofreiter M (2013) Mitogenomic analyses from ancient DNA. Molecular Phylogenetics and Evolution 69, 404-416.
Ratnasingham S, Hebert PDN (2007) BOLD: The Barcode of Life Data System (http://www.barcodinglife.org). Molecular Ecology Notes 7, 355-364.
Rêgo PS, Araripe J, Silva WAG, et al. (2010) Population genetic studies of mitochondrial pseudo-control region in the endangered araripe manakin (Antilophia bokermanni). The Auk 127, 335-342.
Sanger F, Nicklen S, Coulson AR (1977) DNA sequencing with chain-terminating inhibitors. Proceedings of the National Academy of Sciences of the United States of America 74, 5463-5467.
Sankoff D, Leduc G, Antoine N, et al. (1992) Gene order comparisons for phylogenetic inference: evolution of the mitochondrial genome. Proceedings of the National Academy of Sciences 89, 6575-6579.
Schirtzinger EE, Tavares ES, Gonzales LA, et al. (2012) Multiple independent origins of mitochondrial control region duplications in the order Psittaciformes. Molecular Phylogenetics and Evolution 64, 342-356.
Shadel GS, Clayton DA (1997) Mitochondrial DNA maintenance in vertebrates. Annual Review of Biochemistry 66, 409-435.
Shepherd LD, Lambert DM (2007) The relationships and origins of the New Zealand wattlebirds (Passeriformes, Callaeatidae) from DNA sequence analyses. Molecular Phylogenetics and Evolution 43, 480-492.
Singh TR, Shneor O, Huchon D (2008) Bird mitochondrial gene order: Insight from 3 warbler mitochondrial genomes. Molecular Biology and Evolution 25, 475-477.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S (2013) MEGA6: Molecular evolutionary genetics analysis version 6.0. Molecular Biology and Evolution.
Tebbutt S, Simons C (2002) Gene sequences from New Zealand's extinct Huia. Journal of the Royal Society of New Zealand 32.
Väli Ü (2002) Mitochondrial pseudo-control region in old world eagles (genus Aquila). Molecular Ecology 11, 2189-2194.
Zedane L, Hong-Wa C, Murienne J, et al. (2016) Museomics illuminate the history of an extinct, paleoendemic plant lineage (Hesperelaea, Oleaceae) known from an 1875 collection from Guadalupe Island, Mexico. Biological Journal of the Linnean Society 117, 44-57.
Zink RM, Barrowclough GF (2008) Mitochondrial DNA under siege in avian phylogeography. Molecular Ecology 17, 2107-2121.
Zuccon D, Ericson PGP (2012) Molecular and morphological evidences place the extinct New Zealand endemic Turnagra capensis in the Oriolidae. Molecular Phylogenetics and Evolution 62, 414-426.
Data accessibility
DNA sequences: GenBank accessions KU158188-KU158198.
Sequence data have been submitted to the NCBI Sequence Read Archive under BioProject ID: PRJNA312568.
Voucher specimen accession numbers: See Table 1. Each voucher accession number is searchable via the online Collection Explorer: http://nhmo-birds.collectionexplorer.org/accession.aspx
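For readers who want to retrieve the records programmatically, the following is a minimal sketch of how the GenBank accession range and the voucher links could be enumerated. The function names are hypothetical and chosen for illustration only. The URL pattern is taken from the specimen links listed with the supplementary material, and the sketch assumes that the voucher number is the part of the identifier after the last hyphen.

```python
import re


def expand_accession_range(first: str, last: str) -> list[str]:
    """Expand a contiguous accession range such as KU158188-KU158198."""
    m_first = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m_last = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not m_first or not m_last or m_first.group(1) != m_last.group(1):
        raise ValueError("accessions must share the same letter prefix")
    prefix = m_first.group(1)
    width = len(m_first.group(2))  # keep zero padding of the numeric part
    start, end = int(m_first.group(2)), int(m_last.group(2))
    if end < start:
        raise ValueError("range end precedes range start")
    return [f"{prefix}{n:0{width}d}" for n in range(start, end + 1)]


def voucher_url(voucher: str) -> str:
    """Build a Collection Explorer link from a voucher ID like NHMO-BI-77944.

    Assumption: the accession number is the part after the last hyphen.
    """
    number = voucher.rsplit("-", 1)[-1]
    return f"http://nhmo-birds.collectionexplorer.org/accession.aspx?acc={number}"


if __name__ == "__main__":
    for accession in expand_accession_range("KU158188", "KU158198"):
        print(accession)
    print(voucher_url("NHMO-BI-77944"))
```

The range expands to eleven accessions, one per study species.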
Table 1. List of taxa, their extinction status and voucher information.
Taxon¹ | Range | Extinct² | Voucher specimen | Sex | Locality | Collection date
Great Auk Pinguinus impennis (Linnaeus 1758) | North Atlantic Ocean | 1852 | NHMO-BI-77944 | Unknown | Eldey, Iceland | 1831
Passenger Pigeon Ectopistes migratorius (Linnaeus 1766) | North America | 1914 | NHMO-BI-77945 | Male | Niagara, USA | 1870
Imperial Woodpecker Campephilus imperialis (Gould 1832) | Mexico | 1956 | NHMO-BI-62037 | Male | Chihuahua, Mexico | 10 January 1891
Paradise Parrot Psephotellus pulcherrimus (Gould 1845) | Australia | 1928 | NHMO-BI-64259 | Male | Nogoa River, Queensland, Australia | 5 August 1881
South Island Kokako Callaeas cinereus (Gmelin 1788) | South Island, New Zealand | 1967 | NHMO-BI-63921 | Unknown | South Island, New Zealand | Unknown
Huia Heteralocha acutirostris (Gould 1837) | New Zealand | 1907 | NHMO-BI-63914 | Male | North Island, New Zealand | Unknown
South Island Piopio Turnagra capensis (Sparrman 1787) | South Island, New Zealand | 1905 | NHMO-BI-60577 | Male | South Island, New Zealand | 1891
Kauai Oo Moho braccatus (Cassin 1855) | Kauai, Hawaiian Islands, USA | 1987 | NHMO-BI-63104 | Unknown | Kauai, Hawaiian Islands, USA | 1893
Kamao Myadestes myadestinus (Stejneger 1887) | Kauai, Hawaiian Islands, USA | 1985 | NHMO-BI-60523 | Unknown | Kauai, Hawaiian Islands, USA | 1893
Ou Psittirostra psittacea (Gmelin 1789) | Hawaiian Islands, USA | 1987 | NHMO-BI-67819 | Unknown | Kauai, Hawaiian Islands, USA | Unknown
Lesser Akialoa Akialoa obscura (Gmelin 1788) | Hawaii, Hawaiian Islands, USA | 1940 | NHMO-BI-67809 | Female | Kauai, Hawaiian Islands, USA | Unknown
¹ IOC World Bird Names, version 5.4
² Last documented record according to IUCN 2015-3
Table 2. Number of reads, reference sequences used in MITObim mapping, reads mapped to the mitogenome and GenBank accession number for each study species.
Taxon name | Reads | Reference in initial mapping (GB acc. no.) | Reads mapped to mitogenome | GB acc. no.
Pinguinus impennisᵃ | 19,954,465 | See text | 15,350/13,088/13,107/13,111/6,504/4,063 | KU158188
Ectopistes migratorius | 19,562,210 | Ectopistes migratorius (JF312866) | 5,699 | KU158192
Campephilus imperialis | 19,809,348 | Campephilus guatemalensis (KT443920) | 35,166 | KU158198
Psephotellus pulcherrimus | 20,464,476 | Initial: Psittrichas fulgidus (KM611475); Final: See text | 16,592 | KU158195
Callaeas cinereus | 20,213,930 | Initial: Urocissa erythroryncha (JQ423932); Final: See text | 16,550 | KU158191
Heteralocha acutirostris | 21,596,754 | Poecile atricapillus (KJ909190) | 12,290 | KU158193
Turnagra capensis | 17,643,416 | Pyrrhocorax graculus (KJ598623) | 21,421 | KU158197
Moho braccatus | 20,788,183 | Initial: Bombycilla cedrorum (KJ909187); Final: See text | 4,330 | KU158189
Myadestes myadestinus | 22,496,771 | Luscinia cyanura (KF997864) | 8,866 | KU158194
Psittirostra psittacea | 21,631,741 | Initial: Serinus canaria (HF969008); Final: Loxops caeruleirostris (KM078776) | 20,476 | KU158196
Akialoa obscura | 17,976,501 | Pseudonestor xanthophrys (KM078809) | 11,373 | KU158190
ᵃ Six different reference sequences were utilized in the iterative mapping for this specimen. Numbers of mapped reads and average coverage follow the same respective order as the reference sequences given in the main text.
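The read counts in Table 2 imply that only a very small share of each library derives from the mitogenome. As a rough illustration, the proportion can be computed as below. The helper name is hypothetical, and the three rows are copied from Table 2 as examples only.

```python
def mapped_fraction(mapped_reads: int, total_reads: int) -> float:
    """Share of all sequenced reads that mapped to the mitogenome."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


# (total reads, reads mapped to mitogenome), values taken from Table 2
table2_examples = {
    "Ectopistes migratorius": (19_562_210, 5_699),
    "Campephilus imperialis": (19_809_348, 35_166),
    "Moho braccatus": (20_788_183, 4_330),
}

if __name__ == "__main__":
    for taxon, (total, mapped) in table2_examples.items():
        print(f"{taxon}: {100 * mapped_fraction(mapped, total):.3f}% of reads mapped")
```

For these examples the mapped share is well below 1% of the reads in each library.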
Table 3. Similarity between available sequence data and the sequence data produced in this study.
Scientific name | Gene/region | GB acc. no. | Reference | Identity
Pinguinus impennis | NAD5 | AJ242685 | Moum et al. (2002) | 1.00
Pinguinus impennis | cytb | AJ242685 | Moum et al. (2002) | 0.998
Pinguinus impennis | NAD6 | AJ242685 | Moum et al. (2002) | 0.998
Pinguinus impennis | 12S | AJ242685 | Moum et al. (2002) | 1.00
Pinguinus impennis | CR | AJ242685 | Moum et al. (2002) | 0.982
Heteralocha acutirostris | CR | GU176413 | Lambert et al. (2009) | 1.00
Heteralocha acutirostris | 12S | AF470618 | Tebbutt and Simons (2002) | 1.00
Heteralocha acutirostris | NAD2 | DQ469296 | Shepherd and Lambert (2007) | 1.00
Heteralocha acutirostris | cytb | DQ469300 | Shepherd and Lambert (2007) | 1.00
Campephilus imperialis | COI | DQ518894 | Fleischer et al. (2006) | 0.988
Campephilus imperialis | cytb | DQ521905 | Fleischer et al. (2006) | 0.998
Campephilus imperialis | ND2 | DQ521922 | Fleischer et al. (2006) | 1.00
Campephilus imperialis | ATP6 | DQ521930 | Fleischer et al. (2006) | 1.00
Campephilus imperialis | ATP8 | DQ521937 | Fleischer et al. (2006) | 1.00
Moho braccatus | 12S | FJ378060 | Fleischer et al. (2008) | 0.96
Moho braccatus | cytb | FJ383125 | Fleischer et al. (2008) | 1.00
Ectopistes migratorius | mitogenome | KC489473 | Hung et al. (2013) | 0.997
Ectopistes migratorius | mitogenome | KC489474 | Hung et al. (2013) | 0.993
Callaeas cinereus | CR | AF433181 | Unpublished | 0.995
Callaeas cinereus | 12S | DQ469308 | Shepherd and Lambert (2007) | 0.998
Callaeas cinereus | ND3 | JN614676 | Zuccon and Ericson (2012) | 0.945ᵃ
Callaeas cinereus | ND2 | JN614726 | Zuccon and Ericson (2012) | 0.945ᵃ
Callaeas cinereus | cytb | JN614896 | Zuccon and Ericson (2012) | 0.962ᵃ
Turnagra capensis | mitogenome | KT894672 | Gibb et al. (2015) | 0.996
ᵃ Recent taxonomic split (Dickinson & Christidis 2014). GenBank accessions are obtained from the sister species.
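The identity values in Table 3 are proportions of matching sites between two sequences. The sketch below shows one simple way such a proportion could be calculated from a pairwise alignment. It assumes that the two sequences are already aligned to equal length and that columns containing a gap are excluded. This is an illustrative convention and not necessarily the one used to produce the table.

```python
def pairwise_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Proportion of identical sites among gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for base_a, base_b in zip(aligned_a.upper(), aligned_b.upper()):
        if base_a == gap or base_b == gap:
            continue  # assumption: gapped columns are ignored
        compared += 1
        if base_a == base_b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return matches / compared


if __name__ == "__main__":
    # Toy sequences for illustration, not real data
    print(pairwise_identity("ACGTACGTAC", "ACGTACGTAT"))  # 9 of 10 sites match
```

Other conventions, such as counting gaps as mismatches, would give lower values for alignments that contain indels.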
Great Auk Pinguinus impennis
Link to the specimen accession in the online Collection Explorer, including high resolution images:
http://nhmo-birds.collectionexplorer.org/accession.aspx?acc=77944
Passenger Pigeon Ectopistes migratorius
Link to the specimen accession in the online Collection Explorer, including high resolution images:
http://nhmo-birds.collectionexplorer.org/accession.aspx?acc=77945
Imperial Woodpecker Campephilus imperialis
Link to the specimen accession in the online Collection Explorer, including high resolution images:
http://nhmo-birds.collectionexplorer.org/accession.aspx?acc=62037
Paradise Parrot Psephotellus pulcherrimus
Link to the specimen accession in the online Collection Explorer, including high resolution images:
http://nhmo-birds.collectionexplorer.org/accession.aspx?acc=64259
South Island Kokako Callaeas cinereus
Link to the specimen accession in the online Collection Explorer, including high resolution images:
http://nhmo-birds.collectionexplorer.org/accession.aspx?acc=63921
Huia Heteralocha acutirostris
Link to the specimen accession in the online Collection Explorer, including high resolution images:
http://nhmo-birds.collectionexplorer.org/accession.aspx?acc=63914
South Island Piopio Turnagra capensis
Link to the specimen accession in the online Collection Explorer, including high resolution images:
http://nhmo-birds.collectionexplorer.org/accession.aspx?acc=60577
Kauai Oo Moho braccatus
Link to the specimen accession in the online Collection Explorer, including high resolution images:
http://nhmo-birds.collectionexplorer.org/accession.aspx?acc=63104
Kamao Myadestes myadestinus
Link to the specimen accession in the online Collection Explorer, including high resolution images:
http://nhmo-birds.collectionexplorer.org/accession.aspx?acc=60523
Ou Psittirostra psittacea
Link to the specimen accession in the online Collection Explorer, including high resolution images:
http://nhmo-birds.collectionexplorer.org/accession.aspx?acc=67819
Lesser Akialoa Akialoa obscura
Link to the specimen accession in the online Collection Explorer, including high resolution images:
http://nhmo-birds.collectionexplorer.org/accession.aspx?acc=67809
Supplementary Figure S2. Fragment Analyzer trace of the DNA extract from each taxon sample. The lower marker (LM) peak is 1 bp and contains a constant concentration (0.027 ng/µl). The number above each peak represents the peak size (bp). The NTC is the two non-template controls combined.
[Figure panels: Callaeas cinereus; Moho braccatus; Psephotellus pulcherrimus – initial mapping; Psephotellus pulcherrimus – final assembly]
Supplementary Figure S3. Coverage plots of the complete mitogenome for Psephotellus pulcherrimus, Moho braccatus and Callaeas cinereus. Two plots for Psephotellus pulcherrimus are presented, the first from the initial mapping and the second from the final assembly (see main text for details). The trnF gene is the starting position in all plots. The dashed vertical line indicates the position of the cytb stop codon. Average coverage for ‘Psephotellus pulcherrimus – final assembly’ is 74.02X. Coverage information for the other plots is provided in Supplementary Table S1. The ~700 bp sequence motif revealed in the final assembly is flagged with a solid black line. The low-coverage unresolved sequence in Callaeas cinereus is flagged with a black arrow. The coverage plots were obtained from the software Tablet v1.14.10.20 (Milne et al. 2013).
Supplementary Table S1. DNA concentrations, peak size of DNA molecules in the DNA extracts, library preparation method and sequence depth (average coverage) for each study species.
Taxon name | Qubit (ng/µl) | Peak size (bp) | Prep | Average coverage
Ectopistes migratorius | 5.14 | 147 | TruSeq Nano | 28.41
Heteralocha acutirostris | 6.16 | 114 | TruSeq Nano | 58.69
Myadestes myadestinus | 7.78 | 152 | TruSeq Nano | 42.69
Pinguinus impennisᵃ | 3.23 | 116 | TruSeq Nano | 75.66/59.45/60.1/59.81/66.37/68.66
Psephotellus pulcherrimus | 6.03 | 114 | TruSeq Nano | 76.03
Moho braccatus | 8.31 | 154 | TruSeq Nano | 22.54
Callaeas cinereus | 5.94 | 112 | TruSeq Nano | 76.49
Psittirostra psittacea | 0.65 | 71 | MicroPlex | 92.67
Akialoa obscura | 1.12 | 74 | MicroPlex | 52.95
Turnagra capensis | 7.7 | 122 | TruSeq Nano | 98.64
Campephilus imperialis | 5.59 | 90 | TruSeq Nano | 160.67
ᵃ Six different reference sequences were utilized in the initial mapping for this specimen. Average coverage values follow the same respective order as the reference sequences given in the main text.
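Average coverage, as reported in the table above, is the mean number of reads overlapping each position of the assembled mitogenome. The following sketch illustrates that definition only. It assumes mapped reads are given as 0-based, half-open (start, end) intervals on a linear coordinate system, ignores reads that wrap around the circular genome, and uses a hypothetical function name. The values in the table were presumably taken from the assembly and visualization software rather than computed this way.

```python
def average_coverage(read_intervals: list[tuple[int, int]], genome_length: int) -> float:
    """Mean per-base depth: total mapped bases divided by genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    total_bases = 0
    for start, end in read_intervals:
        if not (0 <= start <= end <= genome_length):
            raise ValueError(f"interval ({start}, {end}) is outside the genome")
        total_bases += end - start  # half-open interval, so length is end - start
    return total_bases / genome_length


if __name__ == "__main__":
    # Toy example: three 100 bp reads on a 200 bp sequence
    reads = [(0, 100), (50, 150), (100, 200)]
    print(average_coverage(reads, 200))  # 300 mapped bases / 200 bp = 1.5
```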