The dental proteome of Homo antecessor

The phylogenetic relationships between hominins of the Early Pleistocene epoch in Eurasia, such as Homo antecessor, and hominins that appear later in the fossil record during the Middle Pleistocene epoch, such as Homo sapiens, are highly debated^1,2,3,4,5. For the oldest remains, the molecular study of these relationships is hindered by the degradation of ancient DNA. However, recent research has demonstrated that the analysis of ancient proteins can address this challenge^6,7,8. Here we present the dental enamel proteomes of H. antecessor from Atapuerca (Spain)^9,10 and Homo erectus from Dmanisi (Georgia)¹, two key fossil assemblages that have a central role in models of Pleistocene hominin morphology, dispersal and divergence. We provide evidence that H. antecessor is a close sister lineage to subsequent Middle and Late Pleistocene hominins, including modern humans, Neanderthals and Denisovans. This placement implies that the modern-like face of H. antecessor—that is, similar to that of modern humans—may have a considerably deep ancestry in the genus Homo, and that the cranial morphology of Neanderthals represents a derived form. By recovering AMELY-specific peptide sequences, we also conclude that the H. antecessor molar fragment from Atapuerca that we analysed belonged to a male individual. Finally, these H. antecessor and H. erectus fossils preserve evidence of enamel proteome phosphorylation and proteolytic digestion that occurred in vivo during tooth formation. Our results provide important insights into the evolutionary relationships between H. antecessor and other hominin groups, and pave the way for future studies using enamel proteomes to investigate hominin biology across the existence of the genus Homo.

Enamel proteome shows that Gigantopithecus was an early diverging pongine

Article 13 November 2019

Early Pleistocene enamel proteome from Dmanisi resolves Stephanorhinus phylogeny

Article 11 September 2019

Deep-time phylogenetic inference by paleoproteomic analysis of dental enamel

Article 26 April 2024

Main

Since 1994, over 170 human fossil remains have been recovered from level TD6 of the Gran Dolina site of the Sierra de Atapuerca¹⁰ (Burgos, Spain) (Extended Data Fig. 1, Supplementary Information). These fossils have been dated to the late Early Pleistocene epoch and exhibit a unique combination of cranial, mandibular and dental features^9,11. To accommodate the variation observed in the human fossils from TD6, a new species of the genus Homo—H. antecessor—was proposed in 1997⁹. The relationship of this species to earlier or later hominins in Eurasia—such as the H. erectus specimens from Dmanisi or Neanderthals, Denisovans and modern humans, respectively—have been the subject of considerable debate^3,4,12,13. These issues remain unresolved owing to the fragmentary nature of hominin fossils at other sites, and the failure to recover ancient DNA in Eurasia that dates to the Early, and most of the Middle, Pleistocene epoch.

By contrast, recent developments in the extraction and tandem mass-spectrometric analysis of ancient proteins have made it possible to retrieve phylogenetically informative protein sequences from Early Pleistocene contexts^6,8. We therefore applied ancient protein analysis to a H. antecessor molar from sublevel TD6.2 of the Gran Dolina site of the Sierra de Atapuerca (specimen ATD6-92) (Extended Data Fig. 2a). This specimen, identified as an enamel fragment of a permanent lower left first or second molar, has been directly dated to 772–949 thousand years ago (ka) using a combination of electron spin resonance and U-series dating¹¹. In addition, we sampled dentine and enamel from an isolated H. erectus upper first molar (specimen D4163) (Extended Data Fig. 2b) from Dmanisi (Georgia) that has been dated to 1.77 million years ago (Ma)^1,14,15, as amino acid racemization analysis of this specimen indicated the presence of an endogenous protein component in the intracrystalline enamel fraction of the tooth (Extended Data Fig. 3, Supplementary Information). On both specimens, we performed digestion-free peptide extraction optimized for the recovery of short, degraded protein remains⁶. Nanoscale liquid chromatography–tandem mass spectrometry (nanoLC–MS/MS) acquisition was replicated in two independent proteomic laboratories (Extended Data Table 1), implementing common precautions and analytical workflows to minimize protein contamination (Methods). We compared the proteomic datasets retrieved from the Pleistocene hominin tooth specimens with those generated from a positive control, a recent human premolar (Ø1952; which is from a male individual and is approximately three centuries old), as well as previously published Holocene teeth¹⁶ (Methods, Supplementary Information). Finally, to validate our enamel peptide spectrum matches, we performed machine-learning-based MS/MS spectrum intensity prediction using the wiNNer algorithm¹⁷. The results show that the wiNNer model retrained for randomly cleaved and heavily modified peptides provides a predictive performance similar to that of the wiNNer model trained on modern, trypsin-digested samples, assuring accurate sequence identification for the phylogenetically informative peptides (median Pearson correlation coefficients of ≥0.76) (Methods, Supplementary Fig. 6, Supplementary Information).

Protein recovery from the Dmanisi dentine sample was limited to sporadic collagen type I fragments, and therefore in-depth analysis of this material was not further pursued. By contrast, we recovered ancient proteomes from both hominin enamel samples. We found that the composition of these proteomes is similar to that of the recent human specimen that we processed as a positive control, as well as to previously published proteomes from ancient enamel^6,16,18,19 (Extended Data Table 2, Supplementary Table 6). The enamel-specific proteins include amelogenin (both AMELX and AMELY isoforms), enamelin (ENAM), ameloblastin (AMBN), amelotin (AMTN) and the enamel-specific protease matrix metalloproteinase 20 (MMP20). Serum albumin (ALB) and collagens (COL1α1, COL1α2 and COL17α1) are also present. For the enamel-specific proteins, the peptide sequences that we retrieved cover approximately the same protein regions in all of the specimens that we analysed (Extended Data Fig. 4). Although destructive, our sampling of Pleistocene hominin teeth resulted in higher protein sequence coverage than acid-etching of Holocene enamel surfaces^16,20 (Supplementary Fig. 7). The AMTN-specific peptides largely derive from a single sequence region involved in hydroxyapatite precipitation through the presence of phosphorylated serines²¹. Finally, the observation of the AMELY-specific peptides (which is coded on the non-recombinant portion of the Y chromosome) demonstrates that the H. antecessor molar that we studied belonged to a male individual¹⁶ (Extended Data Fig. 5).

Besides proteome composition and sequence coverage, several further lines of evidence independently support the endogenous origin of the hominin enamel proteomes. Unlike exogenous trypsin, keratins and other human-skin contaminants that we identified, the enamel proteins have high deamidation rates (Extended Data Fig. 6)—above the rate observed for the recent human specimens (Supplementary Fig. 8). Both Pleistocene hominins have average peptide lengths that are shorter than those observed for our recent human controls (Extended Data Fig. 6d). The average peptide length is shorter in the Dmanisi hominin, but longer in the younger Atapuerca hominin (Extended Data Fig. 6d). By contrast, we observe that the peptide lengths in enamel from the Dmanisi hominin are indistinguishable from those of the faunal remains from the same site. Together, our protein data are therefore consistent with theoretical and experimental^6,22 expectations for samples of their relative age.

In addition to diagenetic modifications, we observe two kinds of in vivo modification in our recent and ancient enamel proteomes. First, we detect serine (S) phosphorylation within the S-X-E motif (Fig. 1a, b). This motif, as well as the S-X-phosphorylated S motif, is recognized by the FAM20C secreted kinase, which is active in the phosphorylation of extracellular proteins^23,24. The presence of phosphoserine in fossil enamel and its location in the S-X-E and/or S-X-phosphorylated S motifs has also previously been observed in other Pleistocene enamel proteomes^6,25. Phosphorylation occupancy can be computed successfully for ancient and recent samples, and reveals differences in the ratios of phosphorylated peptides between samples (Fig. 1c, Supplementary Table 5). Second, the peptide populations that we retrieve primarily cover the ameloblastin, enamelin and amelogenin sequence regions, representing cleavage products deriving from in vivo activity of the proteases MMP20 and—subsequently—kallikrein 4 (KLK4) (Extended Data Fig. 4, Methods). The peptide populations are also enriched in N and C termini that correspond to known MMP20 and KLK4 cleavage sites (Extended Data Fig. 7, Supplementary Fig. 9). FAM20C phosphorylation and MMP20 and KLK4 proteolysis are the two main processes that occur in vivo during enamel biomineralization. Our observation of products deriving from both processes opens up the possibility of studying in vivo processes of hominin tooth formation across the Pleistocene epoch.

**Fig. 1: Phosphorylation of hominin enamel proteomes.**

Homo antecessor is known only from the Gran Dolina TD6 assemblage in Atapuerca⁹. Its relationship with other European Middle Pleistocene fossils is heavily debated^3,4,5,26,27. It remains contentious as to whether H. antecessor represents the last common ancestor of H. sapiens, Neanderthals and Denisovans⁹, or whether it represents a sister lineage to the last common ancestor of these species^28,29. We address this issue by conducting phylogenetic analyses on the basis of our ancient protein sequences from H. antecessor (ATD6-92), a panel of present-day great ape genomes and protein sequences translated from archaic hominin genomes (Methods).

We built several phylogenetic trees using maximum likelihood and Bayesian methods (Fig. 2a, Supplementary Figs. 13–16). In these trees, the H. antecessor sequence represents a sister taxon that is closely related to, but not part of, the group composed of Late Pleistocene hominins for which molecular data are available (Fig. 2a, Supplementary Figs. 13, 15, 16). The enamel protein sequences do not resolve the relationships between H. sapiens, Neanderthals and Denisovans owing to the low number of informative single amino acid polymorphisms. However, pairwise divergence of the amino acid sequences between H. antecessor and the clade containing H. sapiens, Neanderthals and the Denisovan is larger than the divergence between the members of this clade (Fig. 2b, Supplementary Fig. 12, Supplementary Information). The concatenated gene tree may be subjected to incomplete lineage sorting, and we have too little sequence data to discard this possibility at the moment. However, if we use the concatenation of available gene trees as a best guess for the population tree, and assume that such a population tree is a good descriptor of the relationships among ancient hominins, then our results support the placement of H. antecessor as a closely related sister taxon of the last common ancestor of H. sapiens, Neanderthals and Denisovans. The phylogenetic position of H. antecessor agrees with a divergence of the H. sapiens and Neanderthal + Denisovan lineages between 550 and 765 ka^30,31, as ATD6-92 has been dated to 772–949 ka¹¹. This is further supported by recent reconsiderations of the morphology of H. antecessor in relation to Middle and Late Pleistocene hominins²⁹.

**Fig. 2: Phylogenetic analysis of *H. antecessor* ATD6-92.**

**Fig. 3: Skeletal proteome preservation in the Middle and Early Pleistocene epoch (0.12–2.6 Ma).**

Homo antecessor has tentatively been proposed as the last common ancestor of Neanderthals and modern humans⁹. The similarities observed between the modern-like mid-facial topography of H. antecessor and H. sapiens—including a modern pattern of coronal orientation of the infraorbital surface, the sloping and directionality of this plane, as well as the anterior flexion of the maxillary surface and arching of the zygomatic-alveolar crest—were key in this proposal^9,32. Additional studies of the face of ATD6-69 have confirmed that H. antecessor exhibits the oldest known modern-like face in the fossil record^12,13. The phylogenetic placement of H. antecessor implies that this modern-like face—as represented by H. antecessor—must have a considerably deep ancestry in the genus Homo. Findings made between 2003 and 2005 have shown that the H. antecessor hypodigm includes some features that were previously considered Neanderthal autapomorphies²⁸. Our results suggest that these features appeared in Early Pleistocene hominins, and were retained by Neanderthals and lost by modern humans.

By contrast, the phylogenetic tree built with the H. erectus specimen from Dmanisi has only moderate resolution (Extended Data Fig. 8, Supplementary Fig. 11), despite deeper shotgun protein sequencing for this specimen (Extended Data Table 1). This partly inconclusive result might be due to the shorter average peptide lengths compared to the Atapuerca H. antecessor specimen (Extended Data Fig. 6d, Methods) and an absence of uniquely segregating single amino acid polymorphisms (Supplementary Table 9). Although our H. erectus data from Dmanisi demonstrate that ancient hominin proteins can be reliably obtained from the Early Pleistocene epoch, they also highlight the current limits of ancient protein analysis when applied to the phylogenetic placement of Early Pleistocene hominin remains.

Our dataset provides a unique molecular resource of hominin biomolecular sequences from Early and Middle Pleistocene hominins, and represents—to our knowledge—the oldest ancient hominin proteomes presented to date. Comparison of hominin and fauna proteomes from different skeletal tissues reveals that the dental enamel proteome outlasts dentine and bone proteome preservation (Fig. 3). Here the prolonged survival of hominin enamel proteomes is exploited to show that H. antecessor represents a hominin taxon closely related to the last common ancestor of H. sapiens, Neanderthals and Denisovans. In addition, our datasets demonstrate that in vivo proteome modifications, such as serine phosphorylation, survive over time scales of hundreds of thousands of years. Current research therefore suggests that dental enamel, the hardest tissue in the mammalian skeleton, is the material of choice for the analysis of hominin evolution in deep time.

Methods

No statistical methods were used to predetermine sample size. The experiments were not randomized and investigators were not blinded to allocation during experiments and outcome assessment.

Site location and specimen selection

Recent human control specimens

We analysed Ø1952, a human premolar recovered in an archaeological excavation in Copenhagen (Almindeligt Hospital Kirkegård, excavated in 1952, from kisse ‘2’). The tooth is approximately three centuries old, as the cemetery was in use from approximately ad 1600 to approximately ad 1800, and originates from a male individual. We also re-analysed previously published data¹⁶ related to specimens that are dated to between approximately 5,700 and 200 years ago; of these specimens, we took SK339 as a recent example in our comparative figures (a male individual from Fewston (UK) dated to the nineteenth century ad).

Atapuerca

One fragmentary permanent lower left first or second molar (ATD6-92; field number and museum accession number at CENIEH) was used for ancient protein analysis (Extended Data Fig. 2a, Supplementary Information). ATD6-92 originates from sublevel TD6.2 of the Gran Dolina cave site. Sublevel TD6.2 contains a large number of faunal remains, about 170 hominin fossils and about 830 archaeological artefacts. All hominin specimens from sublevel TD6.2, including ATD6-92, are attributed to H. antecessor⁹. ATD6-92 has recently been directly dated through electron spin resonance, laser-ablation inductively coupled plasma mass spectrometry U-series and bulk U-series dating¹¹. Together with previous chronological research at the site, these analyses constrain the age of ATD6-92 to 772–949 thousand years old¹¹.

Dmanisi

One fragmentary permanent upper first molar (D4163; field number and museum accession number at the Georgian National Museum) was used for ancient protein analysis (Extended Data Fig. 2b, Supplementary Information). D4163 derives from layer B1 in excavation block M6 (Dmanisi). Layer B1 at Dmanisi contains one of the richest palaeontological assemblages attributed to the Eurasian Early Pleistocene epoch, including several hominin crania. Below, we refer to these specimens as H. erectus (Dmanisi). They represent the earliest hominin fossils outside Africa, and are dated to 1.77 Ma¹⁴. Faunal material from the site previously demonstrated ancient protein survival for most specimens, but a total absence of ancient DNA⁶ (Fig. 3).

Amino acid racemization

Chiral amino acid analysis was undertaken on one Pleistocene sample from the hominin tooth (D4163) to test the endogeneity of the enamel protein through its degradation patterns. The tooth chip was separated into the enamel and dentine portions, and each was powdered with an agate pestle and mortar. All samples were prepared using previously published procedures³⁹, modified to be optimized for enamel, using a bleach time of 72 h to isolate the intracrystalline protein, demineralization in HCl, KOH neutralization and formation of a biphasic solution through centrifugation⁴⁰. Two subsamples were analysed from each portion: one fraction was directly demineralized and the free amino acids analysed, and the second was treated to release the peptide-bound amino acids, thus yielding the total hydrolysable amino acid fraction. Samples were analysed in duplicate by reversed-phase high-performance liquid chromatography, with standards and blanks analysed alongside samples. During preparative hydrolysis, both asparagine (Asn) and glutamine (Gln) undergo rapid irreversible deamidation to aspartic acid (Asp) and glutamic acid (Glu), respectively⁴¹. It is therefore not possible to distinguish between the acidic amino acids and their derivatives, and they are reported together as Asx and Glx, respectively. Additional descriptions of the methods, as well as additional results, are given in the Supplementary Information.

Proteomic extraction and nanoLC–MS/MS

Protein extraction

Protein extraction was conducted on enamel samples (from the Atapuerca H. antecessor, Dmanisi H. erectus and Ø1952) and a dentine sample (Dmanisi), using one of three protocols. In brief, the first extraction method used HCl for demineralization, but included no subsequent reduction, alkylation or digestion. The second extraction method used a more standard approach, in which the pellet left from the demineralization in extraction one was reduced, alkylated and digested with LysC and trypsin. The third extraction method used TFA for demineralization, and had no subsequent reduction, alkylation or digestion. The first and third extraction approaches provided more extensive peptide recovery in ancient enamel proteomes⁶ compared to the second extraction approach⁴². Further details can be found in the Supplementary Information and a previous publication⁶. Ø1952 was processed using extraction methods one and three. No proteinase and phosphatase inhibitors were used during extraction, as we assumed that catalytically active enzymes were not present in our specimens and the high acidic conditions during our extraction would have irreversibly denatured any proteases possibly present as contaminants in our reagents. Extended Data Table 1 provides a breakdown of the use of specific extraction methods, hominin samples and hominin tissues.

NanoLC–MS/MS analysis

Shotgun proteomic data were obtained on peptide extracts of both hominins at separate facilities at the Novo Nordisk Centre for Protein Research (University of Copenhagen) and the Proteomics Unit (Centre for Genomic Regulation, Barcelona Institute of Science and Technology). Full peptide elutions were injected, in some cases across replicate runs in both Copenhagen and Barcelona. In brief, samples processed in Copenhagen were suspended in 0.1% trifluoroacetic acid, 5% acetonitrile, and analysed on a Q-Exactive HF or HF-X mass spectrometer (Thermo Fisher Scientific) coupled to an EASY-nLC 1200 (Thermo Fisher Scientific). The HF or HF-X mass spectrometer was operated in positive ion mode with a nanospray voltage of 2 kV and a source temperature of 275 °C. Data-dependent acquisition mode was used for all mass spectrometric measurements. Full mass spectrometry scans were done at a resolution of 120,000 with a mass range of m/z 300–1,750 and 350–1,400 for the HF and HF-X mass spectrometers, respectively, with detection in the Orbitrap mass analyser. Fragment ion spectra were produced at a resolution of 60,000 via high-energy collision dissociation (HCD) at a normalized collision energy of 28% and acquired in the Orbitrap mass analyser. In addition, test runs for the Dmanisi sample were performed at a shorter gradient (Supplementary Information). In Barcelona, samples were dissolved in 0.1% formic acid and analysed on a LTQ-Orbitrap Fusion Lumos mass spectrometer (Thermo Fisher Scientific) coupled to an EASY-nLC 1000. The mass spectrometer was operated similarly to the parameters stated for the HF and HF-X mass spectrometers in Copenhagen, except the nanospray voltage was 2.4 kV and full mass spectrometry scans with 1 micro scan were used over a mass range of m/z 350–1,500. Further details of the LC–MS/MS analysis can be found in the Supplementary Information.

Proteomic data analysis

Protein sequence database construction

We constructed an initial Hominidae sequence database containing protein sequences of all major and minor enamel proteins derived from all extant great apes, a hylobatid (Nomascus leucogenys) and a macaque (Macaca mulatta). Additionally, we added protein sequences translated from extinct Late Pleistocene hominins^30,43, and sequences from Gorilla beringei, Pongo pygmaeus and Pongo tapanuliensis^44,45,46. For each protein, we reconstructed the protein sequence of ancestral nodes in the Hominidae family through PhyloBot⁴⁷ to minimize cross-species proteomic effects⁴⁸, and added missing isoform variation on the basis of the isoforms present for each protein in the human proteome as given by UniProt (Supplementary Information). Furthermore, we downloaded the entire human reference proteome from UniProt (4 September 2018) for a single separate search to allow matches to proteins previously not encountered in enamel proteomes. To each constructed database, we added a set of known or possible laboratory contaminants to allow for the identification of possible protein contaminants⁴⁹.

Proteomic software, settings and false-discovery rate

Raw mass spectrometry data were searched for each specimen and tissue separately in either PEAKS⁵⁰ (v.7.5) or MaxQuant⁵¹ (v.1.5.3.30). No fixed modifications were specified in any search. For PEAKS, variable post-translational modifications were set to include proline hydroxylation, glutamine and asparagine deamidation, oxidation (M), phosphorylation (STY), carbamidomethylation (C) and pyroglutamic acid (from Q and E). For MaxQuant, the following variable post-translation modifications were additionally included: ornithine formation (R), oxidation (W), dioxidation (MW), histidine to aspartic acid (H>D), and histidine to hydroxyglutamate. Searches were conducted with unspecific digestion. For PEAKS, precursor mass tolerance was set to 10 ppm and fragment mass tolerance to 0.05 Da, and the false-discovery rate of peptide spectrum matches was set to equal ≤1.0%. For MaxQuant, default settings of 20 ppm for the first search and 4.5 ppm for the final search were used, a fragment mass tolerance of 20 ppm, and peptide spectrum match (PSM) and protein false-discovery rate was set to 1.0%, with a minimum required Andromeda score of 40 for all peptides. Protein matches were accepted with a minimum of two unique peptide matches in either the PEAKS or MaxQuant search. Proteins that conform to these criteria are detailed in Extended Data Table 2. Example MS/MS spectra from the MaxQuant search and overlapping sites of phylogenetic interest (single amino acid polymorphisms) are included as Supplementary Data 1.

Data search iterations

For both the proteomes of Dmanisi and Atapuerca specimens, we conducted two separate initial searches. First, we conducted a search in PEAKS against the entire human proteome. Only standard enamel proteins were identified in these searches, allowing us to continue with more specific searches. For the Dmanisi dentine sample, this first search resulted in a small number of peptides matching to collagen type I only. On the basis of the limited amount of sequence data, no further analysis of the Dmanisi dentine data was therefore conducted. Second, for the enamel data, we conducted a search in PEAKS and MaxQuant against the entire enamel proteome database of all extant and extinct Hominidae. This search was used to observe single amino acid polymorphisms outside the known sequence variation in PEAKS and MaxQuant through the de novo, error-tolerant and/or dependent peptide approaches implemented in each of these search engines. These initial searches indicate overall good protein preservation in both samples and the presence of peptide matches to Pan- and Homo-derived proteins only.

On the basis of these two initial searches, a novel protein sequence database was used that only includes sequences from the genus Pan, the genus Homo, their predicted ancestral sequences and novel protein sequences observed for both the Dmanisi or Atapuerca samples. Final searches and subsequent data analysis were conducted against this database using the above search and post-translational modification settings. Positions supported by insufficient spectral data were replaced by ‘X’, in resulting peptide alignments before phylogenetic analysis.

Data analysis of Ø1952 and the previously published¹⁶ dataset was conducted only in MaxQuant against a database restricted to H. sapiens. All other search settings and database restrictions were similar between these two recent human controls and the ancient hominin proteomes.

Peptide sequence and single amino acid polymorphism validation

To validate the PSMs covering single amino acid polymorphisms of interest, we performed peptide spectrum intensity prediction and validation on our dataset using wiNNer¹⁷. Data from the ancient specimens (Dmanisi H. erectus and Atapuerca H. antecessor) were divided into a subset that contained phylogenetically informative peptide sequences and a larger subset that did not contained these peptides. A training dataset was prepared by taking a subset of the latter peptides, and adding a previously published dataset of enamel proteomes from Dmanisi fauna⁶. We built two models, one for HCD +2 spectra and one for HCD +3 spectra. We took into account the large number of variable modifications observed in our ancient enamel proteomes, and split the retained data for each model into subsets for training, validation and testing (80:10:10). We then obtained Pearson correlation coefficients for the predicted and true fragment intensities in the test dataset and the phylogenetically informative spectra. The architecture of wiNNer was built using Keras (version 2.0.8; https://keras.io) and Tensorflow (version 1.3.0). The wiNNer analysis indicated close correspondence between predicted and true fragment ion intensities (Pearson correlation coefficient medians between 0.85 and 0.76 for different subsets of the data), indicating adequate peptide sequence identification for all our peptides, including phylogenetically informative positions and the variable post-translational modifications. The wiNNer model can be accessed on GitHub (https://github.com/cox-labs/wiNNer.git). Additional methodological details of the wiNNer architecture are given in the Supplementary Information.

Protein damage analysis

Ancient proteins can be modified diagenetically in a variety of ways compared to their modern counterparts. We quantify glutamine and asparagine deamidation following a previously publication⁴² for MaxQuant output, based on MS1 spectral intensities and protein-based bootstrapping (1,000 bootstraps). Further details can be found in the previous publication⁴². We observe that both glutamines and asparagines are almost all deamidated to glutamic acid and aspartic acid, respectively (Extended Data Fig. 6a–c). In addition, peptide length distributions were obtained for datasets presented here and elsewhere^6,8, demonstrating a shortening of average peptide length and overall peptide length distributions for older samples (Extended Data Fig. 6d).

Protein in vivo modification analysis

The existing literature on enamel and enamel proteome biomineralization describes three processes that are key to the maturation of the enamel proteome: protein hydrolysis by MMP20 and KLK4^52,53,54,55, in vivo phosphorylation of serine residues^6,8,23 and expression of different isoforms of AMELX, AMBN and AMTN^52,55,56. We sought to explore the presence of both in vivo protein hydrolysis and serine phosphorylation modifications in our Pleistocene hominin proteomes.

For protein hydrolysis by MMP20 and KLK4, we made use of the Atapuerca digestion-free dataset and the described locations of AMBN, AMELX and AMELY, and ENAM cleavage by MMP20 and KLK4^52,53,54,55. We compared the experimentally observed cleavage sites to a random cleavage model of each protein separately and tested whether the cleavage sites are present in a larger portion of PSMs in the ancient sample. Here we can indeed show an increased presence of PSMs with termini at, or close to, known MMP20 and KLK4 cleavage locations (Extended Data Fig. 7). This corresponds with our observation that protein regions with continuous sequence coverage correspond to known proteolytic fragments after MMP20 and KLK4 activity (Extended Data Fig. 4).

Phosphorylation of serines (S), threonines (T) and tyrosines (Y) was assessed using Icelogo⁵⁷ sequence motif analysis. This analysis was based on the MaxQuant results, from which only identified phosphorylation sites with a localization probability of ≥0.95 were selected. STY sites with no phosphorylation or localization probabilities ≤ 0.95 were taken as the non-phosphorylated background, and a sequence motif window of 7 amino acids on either side of the STY was selected. Sequence motif analysis indicates a strong preference for the phosphorylation of S with a glutamic acid (E) on the +2 position (S-X-E motif) (Fig. 1a, b) in both hominin enamel proteomes. This substrate motif and the S-X-phosphorylated S motif are recognized by the kinase FAM20C, which is known to be active in vivo on extracellular proteins involved in biomineralization²³, and has previously been reported for ancient, non-hominin enamel proteomes as well^6,8.

To compare phosphorylation occupancy between the Dmanisi and Atapuerca enamel proteomes, we performed a separate MaxQuant database search (Supplementary Information) and restricted our analyses to amino acid positions covered by phosphorylated and non-phosphorylated peptides, observed in both hominins and quantified through label-free quantification.

Phylogenetic analysis

Comparison between the ancient protein sequences and modern reference proteins

We compared the reconstructed ancient protein sequences from the Dmanisi H. erectus and Atapuerca H. antecessor with protein sequences from great apes^44,46, three Neanderthals^31,43,58, a Denisovan⁵⁹ and a panel of present-day humans, including 256 samples from the Simons Genome Diversity Panel³³ and 41 high-coverage individuals from the 1000 Genomes Project³⁴. Altogether, our reference data represent worldwide human and great ape variation (Supplementary Tables 7, 8). Additionally, we included protein sequences from macaque (M. mulatta) and gibbon (N. leucogenys) to root phylogenetic trees. The protein sequences were retrieved from the UniProt database or reconstructed from the reference whole-genome sequences as described in Supplementary Methods.

The ancient and reference protein sequences were aligned using mafft⁶⁰. We aligned the sequences of each protein separately and obtained an alignment for each of the ancient individuals independently (Supplementary Table 9). The isobaric amino acids leucine (L) and isoleucine (I) cannot be distinguished with the experimental procedure used for this study. Therefore, we have to take the following precautions to avoid unintentional sequence differences. If either I or L was present at a specific amino acid position in the reference protein sequences, we replaced all corresponding amino acids in the ancient protein sequences with the amino acid that is present. Alternatively, if both amino acids are present in the reference protein sequence, we replace all I to L for all sequences. We used sequence information for seven proteins (ALB, AMBN, AMELX, AMELY, COL17α1, ENAM and MMP20) for the H. antecessor individual and six proteins for the H. erectus individual (ALB, AMBN, AMELX, COL17α1, ENAM and MMP20) with a total of 22.08% and 22.14% non-missing sites, respectively (Supplementary Table 9). We were able to recover a unique single amino acid polymorphism for H. antecessor; however, for H. erectus no unique single amino acid polymorphism was detected (Supplementary Tables 9–11, Supplementary Figs. 10–12).

Phylogenetic reconstruction

We built phylogenetic trees using our protein sequence alignments following three approaches: a maximum likelihood approach using PhyML v.3⁶¹, and two Bayesian approaches using mrBayes⁶² and BEAST⁶³.

For the maximum likelihood approach, we built maximum likelihood trees for each protein independently and for a concatenated alignment consisting of all of the available protein sequences for each of the ancient samples (Supplementary Figs. 13, 14). We used PhyML v.3 and the parameters described in the Supplementary Information section 2.3.5a to build and optimize the tree topologies, branch length and substitutions rates for each of the alignments. Support for each bipartition was obtained based on 100 non-parametric bootstrap replicates. We evaluated the effect of significant missingness in the ancient samples on the inferred topology. Finally, we looked at the effect of varying which of the subset of present-day human samples was included in the tree (Supplementary Information section 2.3.5b, c).

For the Bayesian approach using mrBayes, to assess the robustness of the maximum likelihood inference results, we performed Bayesian phylogenetic inference on the basis of the concatenated alignments using mrBayes 3.2 and the parameters described in Supplementary Information section 2.3.5d (Extended Data Fig. 8, Supplementary Fig. 16). Bayesian inference was performed using the CIPRES Science Gateway⁶⁴.

For the Bayesian approach using BEAST, we used BEAST 2.5 to obtain a time calibrated tree for the seven proteins used for H. antecessor. For this analysis, we used concatenated alignments including the Neanderthals, the Denisovan, seven randomly chosen H. sapiens individuals and a single individual per great ape species. The alignment was partitioned by gene and a coalescent constant population model was used for the tree prior. The dates of the ancient samples included in the analysis (Vindija Neanderthal, 52 ka⁵⁸; Altai Neanderthal, 112 ka³¹; Denisovan, 72 ka⁵⁹ and H. antecessor, 860.5 ka¹¹) were used as tip dates for calibration. For each partition, we used the Jones–Taylor–Thornton substitution model with four categories for the gamma parameter, for which we allowed the Markov chain Monte Carlo chain to sample the shape of the gamma distribution (with an exponentially distributed prior) and assigned independent clock models. Additionally, we set a prior for the divergence time of great apes to 23.85 ± 2.5 Ma (normally distributed)⁶⁵, and rooted the tree using the macaque (M. mulatta). The overall topology of the tree was estimated for the seven partitions jointly. The convergence of the algorithm was assessed using Tracer v.1.7.0⁶⁶. Finally, we repeated this analysis with 100 alignments, each of them consisting of 7 present-day humans chosen randomly. Although the topology within the clade consisting of present-day humans, Neanderthals and Denisovan was not consistent across the replicates, 99 of the replicates consistently place the H. antecessor sequence as an outgroup to this clade (Fig. 2a).

Further details on phylogenetic analysis and results can be found in the Supplementary Information. Example MS/MS spectra from the MaxQuant search and overlapping sites of phylogenetic interest (single amino acid polymorphisms) for both hominins are included as Supplementary Data 1.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.

Data availability

Mass spectrometry proteomics data have been deposited in the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD014342. Generated ancient protein consensus sequences used for phylogenetic analysis for H. antecessor (Atapuerca) and H. erectus (Dmanisi) hominins can be found in the Supplementary Data 2, which is formatted as a .fasta file. Full protein sequence alignments used during phylogenetic analysis can be accessed via Figshare (https://doi.org/10.6084/m9.figshare.9927074). Amino acid racemization data are available online through the NOAA database. The wiNNer model can be accessed on GitHub (https://github.com/cox-labs/wiNNer.git).

Change history

29 July 2020
A Correction to this paper has been published: https://doi.org/10.1038/s41586-020-2580-6

References

Gabunia, L. et al. Earliest Pleistocene hominid cranial remains from Dmanisi, Republic of Georgia: taxonomy, geological setting, and age. Science 288, 1019–1025 (2000).
Article ADS CAS PubMed Google Scholar
Zhu, Z. et al. Hominin occupation of the Chinese Loess Plateau since about 2.1 million years ago. Nature 559, 608–612 (2018).
Article ADS CAS PubMed Google Scholar
Stringer, C. The origin and evolution of Homo sapiens. Phil. Trans. R. Soc. Lond. B 371, 20150237 (2016).
Article Google Scholar
Hublin, J. J. The origin of Neandertals. Proc. Natl Acad. Sci. USA 106, 16022–16027 (2009).
Article ADS CAS PubMed PubMed Central Google Scholar
Rightmire, G. Human evolution in the Middle Pleistocene: the role of Homo heidelbergensis. Evol. Anthropol. 6, 218–227 (1998).
Article Google Scholar
Cappellini, E. et al. Early Pleistocene enamel proteome from Dmanisi resolves Stephanorhinus phylogeny. Nature 574, 103–107 (2019).
Article ADS CAS PubMed PubMed Central Google Scholar
Chen, F. et al. A late Middle Pleistocene Denisovan mandible from the Tibetan Plateau. Nature 569, 409–412 (2019).
Article ADS CAS PubMed Google Scholar
Welker, F. et al. Enamel proteome shows that Gigantopithecus was an early diverging pongine. Nature 576, 262–265 (2019).
Article ADS CAS PubMed PubMed Central Google Scholar
Bermúdez de Castro, J. M. et al. A hominid from the lower Pleistocene of Atapuerca, Spain: possible ancestor to Neandertals and modern humans. Science 276, 1392–1395 (1997).
Article PubMed Google Scholar
Carbonell, E. et al. Lower Pleistocene hominids and artifacts from Atapuerca-TD6 (Spain). Science 269, 826–830 (1995).
Article ADS CAS PubMed Google Scholar
Duval, M. et al. The first direct ESR dating of a hominin tooth from Atapuerca Gran Dolina TD-6 (Spain) supports the antiquity of Homo antecessor. Quat. Geochronol. 47, 120–137 (2018).
Article Google Scholar
Freidline, S. E., Gunz, P., Harvati, K. & Hublin, J.-J. Evaluating developmental shape changes in Homo antecessor subadult facial morphology. J. Hum. Evol. 65, 404–423 (2013).
Article PubMed Google Scholar
Lacruz, R. S. et al. Facial morphogenesis of the earliest Europeans. PLoS One 8, e65199 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar
Ferring, R. et al. Earliest human occupations at Dmanisi (Georgian Caucasus) dated to 1.85–1.78 Ma. Proc. Natl Acad. Sci. USA 108, 10432–10436 (2011).
Article ADS CAS PubMed PubMed Central Google Scholar
Lordkipanidze, D. et al. A complete skull from Dmanisi, Georgia, and the evolutionary biology of early Homo. Science 342, 326–331 (2013).
Article ADS CAS PubMed Google Scholar
Stewart, N. A., Gerlach, R. F., Gowland, R. L., Gron, K. J. & Montgomery, J. Sex determination of human remains from peptides in tooth enamel. Proc. Natl Acad. Sci. USA 114, 13649–13654 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Tiwary, S. et al. High-quality MS/MS spectrum prediction for data-dependent and data-independent acquisition data analysis. Nat. Methods 16, 519–525 (2019).
Article CAS PubMed Google Scholar
Castiblanco, G. A. et al. Identification of proteins from human permanent erupted enamel. Eur. J. Oral Sci. 123, 390–395 (2015).
Article CAS PubMed Google Scholar
Asaka, T. et al. Type XVII collagen is a key player in tooth enamel formation. Am. J. Pathol. 174, 91–100 (2009).
Article CAS PubMed PubMed Central Google Scholar
Porto, I. M., Laure, H. J., de Sousa, F. B., Rosa, J. C. & Gerlach, R. F. New techniques for the recovery of small amounts of mature enamel proteins. J. Archaeol. Sci. 38, 3596–3604 (2011).
Article Google Scholar
Gasse, B., Chiari, Y., Silvent, J., Davit-Béal, T. & Sire, J.-Y. Amelotin: an enamel matrix protein that experienced distinct evolutionary histories in amphibians, sauropsids and mammals. BMC Evol. Biol. 15, 47 (2015).
Article PubMed PubMed Central Google Scholar
Demarchi, B. et al. Protein sequences bound to mineral surfaces persist into deep time. eLife 5, e17092 (2016).
Article PubMed PubMed Central Google Scholar
Tagliabracci, V. S. et al. Secreted kinase phosphorylates extracellular proteins that regulate biomineralization. Science 336, 1150–1153 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Hu, J. C. C., Yamakoshi, Y., Yamakoshi, F., Krebsbach, P. H. & Simmer, J. P. Proteomics and genetics of dental enamel. Cells Tissues Organs 181, 219–231 (2005).
Article CAS PubMed Google Scholar
Glimcher, M. J., Cohen-Solal, L., Kossiva, D. & de Ricqles, A. Biochemical analyses of fossil enamel and dentin. Paleobiology 16, 219–232 (1990).
Article Google Scholar
Wagner, G. A. et al. Radiometric dating of the type-site for Homo heidelbergensis at Mauer, Germany. Proc. Natl Acad. Sci. USA 107, 19726–19730 (2010).
Article ADS CAS PubMed PubMed Central Google Scholar
Martinón-Torres, M. et al. Dental evidence on the hominin dispersals during the Pleistocene. Proc. Natl Acad. Sci. USA 104, 13279–13282 (2007).
Article ADS PubMed PubMed Central Google Scholar
Bermúdez de Castro, J. M., Martinón-Torres, M., Arsuaga, J. L. & Carbonell, E. Twentieth anniversary of Homo antecessor (1997–2017): a review. Evol. Anthropol. 26, 157–171 (2017).
Article PubMed Google Scholar
Gómez-Robles, A., Bermúdez de Castro, J. M., Arsuaga, J.-L., Carbonell, E. & Polly, P. D. No known hominin species matches the expected dental morphology of the last common ancestor of Neanderthals and modern humans. Proc. Natl Acad. Sci. USA 110, 18196–18201 (2013).
Article ADS PubMed PubMed Central Google Scholar
Meyer, M. et al. Nuclear DNA sequences from the Middle Pleistocene Sima de los Huesos hominins. Nature 531, 504–507 (2016).
Article ADS CAS PubMed Google Scholar
Prüfer, K. et al. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505, 43–49 (2014).
Article ADS PubMed Google Scholar
Lacruz, R. S. et al. The evolutionary history of the human face. Nat. Ecol. Evol. 3, 726–736 (2019).
Article PubMed Google Scholar
Mallick, S. et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538, 201–206 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Article Google Scholar
Welker, F. et al. Middle Pleistocene protein sequences from the rhinoceros genus Stephanorhinus and the phylogeny of extant and extinct Middle/Late Pleistocene Rhinocerotidae. PeerJ 5, e3033 (2017).
Article PubMed PubMed Central Google Scholar
Hill, R. C. et al. Preserved proteins from extinct Bison latifrons identified by tandem mass spectrometry; hydroxylysine glycosides are a common feature of ancient collagen. Mol. Cell. Proteomics 14, 1946–1958 (2015).
Article CAS PubMed PubMed Central Google Scholar
Wadsworth, C. & Buckley, M. Proteome degradation in fossils: investigating the longevity of protein survival in ancient bone. Rapid Commun. Mass Spectrom. 28, 605–615 (2014).
Article ADS CAS PubMed PubMed Central Google Scholar
Orlando, L. et al. Recalibrating Equus evolution using the genome sequence of an early Middle Pleistocene horse. Nature 499, 74–78 (2013).
Article ADS CAS PubMed Google Scholar
Penkman, K. E. H., Kaufman, D. S., Maddy, D. & Collins, M. J. Closed-system behaviour of the intra-crystalline fraction of amino acids in mollusc shells. Quat. Geochronol. 3, 2–25 (2008).
Article CAS PubMed PubMed Central Google Scholar
Dickinson, M., Lister, A. M. & Penkman, K. E. H. A new method for enamel amino acid racemization dating: a closed system approach. Quat. Geochronol. 50, 29–46 (2019).
Article Google Scholar
Hill, R. L. Hydrolysis of proteins. Adv. Protein Chem. 20, 37–107 (1965).
Article CAS PubMed Google Scholar
Mackie, M. et al. Palaeoproteomic profiling of conservation layers on a 14th century italian wall painting. Angew. Chem. Int. Ed. Engl. 57, 7369–7374 (2018).
Article CAS PubMed PubMed Central Google Scholar
Castellano, S. et al. Patterns of coding variation in the complete exomes of three Neandertals. Proc. Natl Acad. Sci. USA 111, 6666–6671 (2014).
Article ADS CAS PubMed PubMed Central Google Scholar
de Manuel, M. et al. Chimpanzee genomic diversity reveals ancient admixture with bonobos. Science 354, 477–481 (2016).
Article ADS PubMed PubMed Central Google Scholar
Nater, A. et al. Morphometric, behavioral, and genomic evidence for a new orangutan species. Curr. Biol. 27, 3487–3498.e10 (2017).
Article CAS PubMed Google Scholar
Prado-Martinez, J. et al. Great ape genetic diversity and population history. Nature 499, 471–475 (2013).
Article ADS CAS PubMed PubMed Central Google Scholar
Hanson-Smith, V. & Johnson, A. PhyloBot: a web portal for automated phylogenetics, ancestral sequence reconstruction, and exploration of mutational trajectories. PLOS Comput. Biol. 12, e1004976 (2016).
Article ADS PubMed PubMed Central Google Scholar
Welker, F. Elucidation of cross-species proteomic effects in human and hominin bone proteome identification through a bioinformatics experiment. BMC Evol. Biol. 18, 23 (2018).
Article CAS PubMed PubMed Central Google Scholar
Hendy, J. et al. A guide to ancient protein studies. Nat. Ecol. Evol. 2, 791–799 (2018).
Article PubMed Google Scholar
Zhang, J. et al. PEAKS DB: de novo sequencing assisted database search for sensitive and accurate peptide identification. Mol. Cell. Proteomics 11, M111.010587 (2012).
Article PubMed Google Scholar
Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 (2008).
Article CAS PubMed Google Scholar
Chun, Y. H. P. et al. Cleavage site specificity of MMP-20 for secretory-stage ameloblastin. J. Dent. Res. 89, 785–790 (2010).
Article CAS PubMed PubMed Central Google Scholar
Yamakoshi, Y., Hu, J. C. C., Fukae, M., Yamakoshi, F. & Simmer, J. P. How do enamelysin and kallikrein 4 process the 32-kDa enamelin? Eur. J. Oral Sci. 114 (Suppl 1), 45–51, 93–95, 379–380 (2006).
Google Scholar
Iwata, T. et al. Processing of ameloblastin by MMP-20. J. Dent. Res. 86, 153–157 (2007).
Article CAS PubMed Google Scholar
Nagano, T. et al. Mmp-20 and Klk4 cleavage site preferences for amelogenin sequences. J. Dent. Res. 88, 823–828 (2009).
Article CAS PubMed PubMed Central Google Scholar
Fukae, M. et al. Primary structure of the porcine 89-kDa enamelin. Adv. Dent. Res. 10, 111–118 (1996).
Article CAS PubMed Google Scholar
Colaert, N., Helsens, K., Martens, L., Vandekerckhove, J. & Gevaert, K. Improved visualization of protein consensus sequences by iceLogo. Nat. Methods 6, 786–787 (2009).
Article CAS PubMed Google Scholar
Prüfer, K. et al. A high-coverage Neandertal genome from Vindija Cave in Croatia. Science 358, 655–658 (2017).
Article ADS PubMed PubMed Central Google Scholar
Meyer, M. et al. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012).
Article ADS CAS PubMed PubMed Central Google Scholar
Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002).
Article CAS PubMed PubMed Central Google Scholar
Guindon, S. et al. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321 (2010).
Article CAS PubMed Google Scholar
Ronquist, F. et al. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539–542 (2012).
Article PubMed PubMed Central Google Scholar
Bouckaert, R. et al. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLOS Comput. Biol. 15, e1006650 (2019).
Article CAS PubMed PubMed Central Google Scholar
Miller, M. A., Pfeiffer, W. & Schwartz, T. in Gateway Computing Environments Workshop (GCE) 1–8 (New Orleans, 2010).
Besenbacher, S., Hvilsom, C., Marques-Bonet, T., Mailund, T. & Schierup, M. H. Direct estimation of mutations in great apes reconciles phylogenetic dating. Nat. Ecol. Evol. 3, 286–292 (2019).
Article PubMed Google Scholar
Rambaut, A., Drummond, A. J., Xie, D., Baele, G. & Suchard, M. A. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 67, 901–904 (2018).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

F.W. is supported by a Marie Skłodowska Curie Individual Fellowship (no. 795569). E. Cappellini was supported by VILLUM FONDEN (no. 17649). E.W. is supported by the Lundbeck Foundation, the Danish National Research Foundation, the Novo Nordisk Foundation, the Carlsberg Foundation, KU2016 and the Wellcome Trust. Without the effort of the members of the Atapuerca research team during fieldwork, this work would have not been possible; we make a special mention of J. Rosell, who supervises the excavation of the TD6 level. The research of the Atapuerca project has been supported by the Dirección General de Investigación of the Ministerio de Ciencia, Innovación y Universidades (grant numbers PGC2018-093925-B-C31, C32, and C33); field seasons are supported by the Consejería de Cultura y Turismo of the Junta de Castilla y León and the Fundación Atapuerca. We acknowledge The Leakey Foundation through the personal support of G. Getty (2013) and D. Crook (2014–2016, 2018, and 2019) to M.M.-T., as well as F.W. (2017). Restoration and conservation work on the material have been carried out by P. Fernández-Colón and E. Lacasa from the Conservation and Restoration Area of CENIEH-ICTS and L. López-Polín from IPHES. The picture of the specimen ATD6-92 was made by M. Modesto-Mata. E. Cappellini, J.C., J.V.O. and P. Gutenbrunner are supported by the Marie Skłodowska-Curie European Training Network (ETN) TEMPERA, a project funded by the European Union’s Framework Program for Research and Innovation Horizon 2020 (grant agreement no. 722606). Amino acid analyses were undertaken thanks to the Leverhulme Trust (PLP-2012-116) and NERC (NE/K500987/1). T.M.-B. is supported by BFU2017-86471-P (MINECO/FEDER, UE), U01 MH106874 grant, Howard Hughes International Early Career, Obra Social ‘La Caixa’ and Secretaria d’Universitats i Recerca and CERCA Programme del Departament d’Economia i Coneixement de la Generalitat de Catalunya (GRC 2017 SGR 880). C.L.-F. is supported by a FEDER-MINECO grant (PGC2018-095931-B-100). M.K. was supported by the Postdoctoral Junior Leader Fellowship Programme from ‘la Caixa’ Banking Foundation (LCF/BQ/PR19/11700002). M.M. is supported by the Danish National Research Foundation award PROTEIOS (DNRF128). Work at the Novo Nordisk Foundation Center for Protein Research is funded in part by a donation from the Novo Nordisk Foundation (grant number NNF14CC0001). The CRG/UPF Proteomics Unit is part of the Spanish Infrastructure for Omics Technologies (ICTS OmicsTech) and it is a member of the ProteoRed PRB3 consortium, which is supported by grant PT17/0019 of the PE I+D+i 2013-2016 from the Instituto de Salud Carlos III (ISCIII) and ERDF. We acknowledge support from the Spanish Ministry of Science, Innovation and Universities, ‘Centro de Excelencia Severo Ochoa 2013-2017’, SEV-2012-0208, and ‘Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement de la Generalitat de Catalunya’ (2017SGR595). D.L. and A.M. are supported by the John Templeton Foundation (no. 52935) and by the Shota Rustaveli National Science Foundation of Georgia (no. FR-18-27262). We thank M. L. Schjellerup Jørkov for providing specimen Ø1952.

Author information

These authors contributed equally: Frido Welker, Jazmín Ramos-Madrigal, Petra Gutenbrunner

Authors and Affiliations

Evolutionary Genomics Section, Globe Institute, University of Copenhagen, Copenhagen, Denmark
Frido Welker, Jazmín Ramos-Madrigal, Meaghan Mackie & Enrico Cappellini
Computational Systems Biochemistry, Max Planck Institute of Biochemistry, Martinsried, Germany
Petra Gutenbrunner, Shivani Tiwary & Jürgen Cox
The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark
Meaghan Mackie, Rosa Rakownikow Jersie-Christensen & Jesper V. Olsen
Center for Genomic Regulation (CNAG-CRG), Barcelona Institute of Science and Technology, Barcelona, Spain
Cristina Chiva, Tomas Marques-Bonet & Eduard Sabidó
Proteomics Unit, University Pompeu Fabra, Barcelona, Spain
Cristina Chiva & Eduard Sabidó
Department of Chemistry, University of York, York, UK
Marc R. Dickinson & Kirsty Penkman
Institute of Evolutionary Biology (UPF-CSIC), University Pompeu Fabra, Barcelona, Spain
Martin Kuhlwilm, Marc de Manuel, Pere Gelabert, Tomas Marques-Bonet & Carles Lalueza-Fox
Centro Nacional de Investigación sobre la Evolución Humana (CENIEH), Burgos, Spain
María Martinón-Torres & José María Bermúdez de Castro
Anthropology Department, University College London, London, UK
María Martinón-Torres & José María Bermúdez de Castro
Georgian National Museum, Tbilisi, Georgia
Ann Margvelashvili & David Lordkipanidze
Centro Mixto UCM-ISCIII de Evolución y Comportamiento Humanos, Madrid, Spain
Juan Luis Arsuaga
Departamento de Paleontología, Facultad de Ciencias Geológicas, Universidad Complutense de Madrid, Madrid, Spain
Juan Luis Arsuaga
Departamento d’Història i Història de l’Art, Universidad Rovira i Virgili, Tarragona, Spain
Eudald Carbonell
Institut Català de Paleoecologia Humana i Evolució Social (IPHES), Tarragona, Spain
Eudald Carbonell
Catalan Institution of Research and Advanced Studies (ICREA), Barcelona, Spain
Tomas Marques-Bonet
Institut Català de Paleontologia Miquel Crusafont, Universitat Autònoma de Barcelona, Barcelona, Spain
Tomas Marques-Bonet
Tbilisi State University, Tbilisi, Georgia
David Lordkipanidze
Lundbeck Foundation GeoGenetics Centre, Globe Institute, University of Copenhagen, Copenhagen, Denmark
Fernando Racimo & Eske Willerslev
Department of Zoology, University of Cambridge, Cambridge, UK
Eske Willerslev
Wellcome Sanger Institute, Hinxton, UK
Eske Willerslev
Danish Institute for Advanced Study, University of Southern Denmark, Odense, Denmark
Eske Willerslev

Authors

Frido Welker
View author publications
You can also search for this author in PubMed Google Scholar
Jazmín Ramos-Madrigal
View author publications
You can also search for this author in PubMed Google Scholar
Petra Gutenbrunner
View author publications
You can also search for this author in PubMed Google Scholar
Meaghan Mackie
View author publications
You can also search for this author in PubMed Google Scholar
Shivani Tiwary
View author publications
You can also search for this author in PubMed Google Scholar
Rosa Rakownikow Jersie-Christensen
View author publications
You can also search for this author in PubMed Google Scholar
Cristina Chiva
View author publications
You can also search for this author in PubMed Google Scholar
Marc R. Dickinson
View author publications
You can also search for this author in PubMed Google Scholar
Martin Kuhlwilm
View author publications
You can also search for this author in PubMed Google Scholar
Marc de Manuel
View author publications
You can also search for this author in PubMed Google Scholar
Pere Gelabert
View author publications
You can also search for this author in PubMed Google Scholar
María Martinón-Torres
View author publications
You can also search for this author in PubMed Google Scholar
Ann Margvelashvili
View author publications
You can also search for this author in PubMed Google Scholar
Juan Luis Arsuaga
View author publications
You can also search for this author in PubMed Google Scholar
Eudald Carbonell
View author publications
You can also search for this author in PubMed Google Scholar
Tomas Marques-Bonet
View author publications
You can also search for this author in PubMed Google Scholar
Kirsty Penkman
View author publications
You can also search for this author in PubMed Google Scholar
Eduard Sabidó
View author publications
You can also search for this author in PubMed Google Scholar
Jürgen Cox
View author publications
You can also search for this author in PubMed Google Scholar
Jesper V. Olsen
View author publications
You can also search for this author in PubMed Google Scholar
David Lordkipanidze
View author publications
You can also search for this author in PubMed Google Scholar
Fernando Racimo
View author publications
You can also search for this author in PubMed Google Scholar
Carles Lalueza-Fox
View author publications
You can also search for this author in PubMed Google Scholar
José María Bermúdez de Castro
View author publications
You can also search for this author in PubMed Google Scholar
Eske Willerslev
View author publications
You can also search for this author in PubMed Google Scholar
Enrico Cappellini
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

E. Cappellini, E.W., J.M.B.d.C., D.L., C.L.-F. and F.W. designed the study. E. Cappellini, M.M., F.W., J.R.-M., R.R.J.-C., M.R.D., C.C. and M.d.M. performed experiments. E. Cappellini, A.M., J.L.A., E. Carbonell, P. Gelabert, E.S., J.C., J.V.O., T.M.-B. and D.L. provided material, reagents or research infrastructure. F.W., J.R.-M., P. Gutenbrunner, S.T., E. Cappellini, F.R., M.M.-T., J.M.B.d.C., M.K., M.R.D., C.L.-F. and K.P. analysed data. F.W., E. Cappellini and J.M.B.d.C. wrote the manuscript with input from all other authors.

Corresponding authors

Correspondence to Frido Welker, José María Bermúdez de Castro, Eske Willerslev or Enrico Cappellini.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Location and stratigraphy of the hominin fossils studied.

a, Geographic location of Gran Dolina and Dmanisi. Base map was generated using public domain data from www.naturalearthdata.com. b, Summarized stratigraphic profile of Gran Dolina, including the location of hominin fossils in layers ‘Pep’ and ‘Jordi’ of sublevel TD6.2.

Extended Data Fig. 2 Hominin specimens studied.

a, ATD6-92 in buccal view. The fragment represents a portion of a permanent lower left first or second molar. b, D4163 in occlusal view. The specimen is a fragmented right upper first molar. Scale bar differs between a and b.

Extended Data Fig. 3 Amino acid racemization of D4163.

a, b, The extent of intracrystalline racemization in enamel for the free amino acid (FAA) (x axis) fraction and the total hydrolysable amino acids (THAA) (y axis) fraction for aspartic acid plus asparagine (here denoted Asx) (a), and glutamic acid plus glutamine (here denoted Glx) (b), demonstrates endogenous amino acids breaking down within a closed system. The hominin value is displayed in relation to values for enamel samples from other fauna from Dmanisi⁶ (blue squares) and a range of previously obtained Pleistocene and Pliocene Proboscidea from the UK⁴⁰ (grey diamonds). Fauna are shown for comparison, but different rates in their protein breakdown mean that they will show different extents of racemization. The x and y axis are on different scales.

Extended Data Fig. 4 Sequence coverage for five enamel-specific proteins across Pleistocene samples and recent human controls.

For each protein, the bars span protein positions covered, with positions remapped to the human reference proteome. The top row indicates the position of a selection of known MMP20 and KLK4 cleavage products of the enamel-specific proteins AMELX⁵⁵, AMBN⁵² and ENAM⁵⁶. Several in vivo proteolytic degradation fragments of ENAM share the same N terminus, but have unknown C termini⁵³. Dotted line for AMBN indicates a putative cleavage product based on known MMP20 (squares) and KLK4 (circles) in vivo cleavage positions. For AMTN, serines (S) at positions 115 and 116 (indicated by asterisks) are conserved among vertebrates and involved in mineral-binding²¹. Additional cleavage products as well as MMP20 and KLK4 cleavage sites are known in all enamel-specific proteins. SK339¹⁶ and Ø1952 are two recent human control samples (Methods). AA, amino acids; Steph., Stephanorhinus⁶; TRAP, tyrosine-rich amelogenin polypeptide.

Extended Data Fig. 5 Homo antecessor specimen ATD6-92 represents a male hominin.

a, Mass spectrum of an AMELY-specific peptide from the recent human control Ø1952. b, Mass spectrum of the same AMELY-specific peptide from H. antecessor. c, Alignment of a selection of AMELY- and AMELX-specific peptide fragment ion series deriving from H. antecessor. The alignment stretches along human AMELX isoform 1, positions 37 to 52 only (Uniprot accession numbers Q99217 (AMELX), Q99218 (AMELY)). See Supplementary Fig. 5 for another example of an AMELY-specific MS2 spectrum.

Extended Data Fig. 6 Enamel proteome damage.

a, b, Glutamine (Q) and asparagine (N) deamidation of enamel-specific proteins from H. antecessor (Atapuerca) (a), and H. erectus (Dmanisi) (b). Values are based on 1,000 bootstrap replications of protein deamidation. c, Relationship between mean asparagine (N) and glutamine (Q) deamidation for all proteins in both the Atapuerca and Dmanisi hominin datasets. Error bars represent 95% confidence interval window of 1,000 bootstrap replications of protein deamidation. Dashed line is x = y. d, Peptide length distribution of H. antecessor (Atapuerca), H. erectus (Dmanisi), four previously published enamel proteomes^6,8,16 and one additional human Medieval control sample (Ø1952). For a, b and d, the number of peptides (n) is given for each violin plot. The box plots within the violin plots define the range of the data (whiskers extend to 1.5× the interquartile range), outliers (black dots, beyond 1.5× the interquartile range), 25th and 75th percentiles (boxes), and medians (centre lines). P values of two-sided t-tests conducted between sample pairs are indicated. No independent replication of these experiments was performed.

Extended Data Fig. 7 Survival of in vivo MMP20 and KLK4 cleavage sites in the Atapuerca enamel proteome.

a, Experimentally observed cleavage matrices for ameloblastin (AMBN), enamelin (ENAM) and amelogenin (AMELX and AMELY) (Methods). Fold differences are colour-coded by comparing observed PSM cleavage frequencies to a random cleavage matrix for each protein separately⁷. b, Fold differences for all observed cleavage pairs per protein. Red filled circles represent MMP20, KLK4 and signal peptide cleavage sites mentioned in the literature^53,54,55,56. Red open circles indicate cleavage sites located up to two amino acid positions away from such sites. c, PSM coverage for each protein. The signal peptide (thick horizontal bar labelled ‘sig’), known MMP20 and KLK4 cleavage sites (vertical bars), and O- and N-linked glycosylation sites (asterisks) are also indicated. For AMELX, peptide positions for all three known isoforms were remapped to the coordinates of isoform 3, which represents the longest isoform (UniProt accession Q99217-3). The x and y axes differ between the three panels of c.

Extended Data Fig. 8 Phylogenetic position of D4163 through Bayesian analysis.

Nomascus leucogenys and M. mulatta were used as outgroups.

Extended Data Table 1 Extraction and mass spectrometry details of analyses conducted on both ancient hominin specimens

Full size table

Extended Data Table 2 Ancient hominin enamel proteome composition and coverage

Full size table

Supplementary information

Supplementary Information

Supplementary Information containing supplementary information on the archaeological sites of Gran Dolina, Atapuerca, and Dmanisi, amino acid racemization, proteomic data extraction, data generation, and data analysis, as well as phylogenetic analysis of recovered ancient hominin proteomes. The Supplementary Information contains 11 SI tables and 16 SI figures.

Reporting Summary

Supplementary Data

Supplementary information file containing annotated MS/MS spectra of phylogenetic interest.

Supplementary Data

File in .fasta format containing recovered Homo antecessor (Atapuerca) and Homo erectus (Dmanisi) protein sequences. Sequence coverage obtained after MaxQuant and PEAKS analysis is combined.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Welker, F., Ramos-Madrigal, J., Gutenbrunner, P. et al. The dental proteome of Homo antecessor. Nature 580, 235–238 (2020). https://doi.org/10.1038/s41586-020-2153-8

Download citation

Received: 04 July 2019
Accepted: 21 January 2020
Published: 01 April 2020
Issue Date: 09 April 2020
DOI: https://doi.org/10.1038/s41586-020-2153-8
Springer Nature Limited

This article is cited by

A label-free quantification method for assessing sex from modern and ancient bovine tooth enamel
- Paula Kotli
- David Morgenstern
- Elisabetta Boaretto
Scientific Reports (2024)
Deep-time phylogenetic inference by paleoproteomic analysis of dental enamel
- Alberto J. Taurozzi
- Patrick L. Rüther
- Enrico Cappellini
Nature Protocols (2024)
Oldest genetic data from a human relative found in 2-million-year-old teeth
- Ewen Callaway
Nature (2023)
Comparing extraction method efficiency for high-throughput palaeoproteomic bone species identification
- Dorothea Mylopotamitaki
- Florian S. Harking
- Frido Welker
Scientific Reports (2023)
Why the geosciences are becoming increasingly vital to the interpretation of the human evolutionary record
- Mike W. Morley
- Ian Moffat
- Kira Westaway
Nature Ecology & Evolution (2023)

The dental proteome of Homo antecessor

Abstract

Similar content being viewed by others

Main

Methods

Site location and specimen selection

Recent human control specimens

Atapuerca

Dmanisi

Amino acid racemization

Proteomic extraction and nanoLC–MS/MS

Protein extraction

NanoLC–MS/MS analysis

Proteomic data analysis

Protein sequence database construction

Proteomic software, settings and false-discovery rate

Data search iterations

Peptide sequence and single amino acid polymorphism validation

Protein damage analysis

Protein in vivo modification analysis

Phylogenetic analysis

Comparison between the ancient protein sequences and modern reference proteins

Phylogenetic reconstruction

Reporting summary

Data availability

Change history

29 July 2020

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Extended data figures and tables

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Navigation