Main

Megachirella preserves traits that indicate that it is a lepidosaurian reptile, such as the presence of a well-developed quadrate conch, an ectepicondylar foramen in the humerus and pleurodont dentition. Some of these features led previous authors to recognize the lepidosauromorph affinities of Megachirella, which was previously considered a non-squamate lepidosauromorph, although no definitive conclusions on its phylogenetic placement had ever been reached5,10,11. Yet, the unique condition of Megachirella as one of the very few articulated and well-preserved Triassic lepidosauromorphs hints at its potential to help resolve important aspects of lepidosaur evolution. Here we provide substantial new information on Megachirella, based on personal observations and high-resolution microfocus X-ray computed tomography (micro-CT) scans, which reveal several previously unnoticed features (Fig. 1, Extended Data Figs. 1, 2 and Supplementary Discussion). Results from the micro-CT scans include a combination of features that are found uniquely in squamates: a triradiate squamosal (not tetraradiate as in most other diapsids, including rhynchocephalians); a squamosal that lacks an anteriorly concave articulatory facet for the postorbital; a well-developed alar process of the prootic; a well-developed radial condyle on the humerus; an ulnar patella; a secondary curvature of the clavicles; and an expanded epiphysis of the first metacarpal along with the absence of the first distal carpal (suggesting its fusion with the first metacarpal, as observed in modern squamates12). 
Finally, the micro-CT scans indicate that Megachirella differs from all rhynchocephalians (the sister lineage to squamates), including the earliest forms such as Gephyrosaurus, in the following features: a splenial is present; the ectopterygoids are directed anteriorly (not laterally as in rhynchocephalians); the presacral pleurocentra lack a notochordal canal; and dorsal (coronoid) expansion of the surangular and dentary bones is absent. The new information presented here, along with our extensive revision of diapsid and early squamate phylogeny, unambiguously resolves the placement of Megachirella as the oldest known squamate. As expected for a squamate that is 85 million years (Myr) older than the oldest previously known articulated squamates for which the osteology is well known (Eichstaettisaurus and Ardeosaurus from the Late Jurassic of Germany8,13), Megachirella retains numerous plesiomorphic features. These features are observed in other diapsid reptiles, and some are retained in rhynchocephalians, but they are almost entirely lost in crown squamates. These include amphicoelic vertebrae (although present in geckoes and Huehuecuetzpalli), a small quadratojugal, gastralia and an entepicondylar foramen in the humerus.

Fig. 1: Holotype of M. wachtleri (PZO 628).

a, b, Whole skeleton in dorsal (a) and ventral (b) views. c, d, Skull in dorsal (c) and ventral (d) views. e, Palatal region in ventral view. f, Braincase in left lateral view. g, Dentary in cross-section. h, i, Right forelimb in dorsal (h) and ventral (i) views. Abbreviations: Al.Cr., prootic alar crest; A.Sc.C., anterior semicircular canal; Ax, axis; Boc, basioccipital; Bptg.Pr., basipterygoid process; Bsp, basisphenoid; C, coronoid; Cap, capitulum; Cb, ceratobranchial; Ce.R., cervical rib; Ce.V., cervical vertebrae; Cl, clavicle; Co, coracoid; C.P., cultriform process; Cr.D., crista dorsalis; D, dentary; Do.V., dorsal vertebrae; D.T., dentary teeth; Ect, ectopterygoid; Ect.Fr., ectepicondylar foramen; Ent.Fr., entepicondylar foramen; F, frontal; H, humerus; J, jugal; La.W., labial wall; Li.Cr., lingual crest; M, maxilla; M.C., medial centrale; McI, metacarpal I; P, parietal; Opi, opisthotics; Pal, palatine; POF, postorbitofrontal; POP, paraoccipital process; PrF, prefrontal; Pro, prootic; Ptg, pterygoid; Ptg.Q.Pr, pterygoid quadrate process; Ptg.T.R., pterygoid tooth rows; Ptg.Tr.Pr., pterygoid transverse process; Q, quadrate; Ra, radius; RAP, retroarticular process; Sca, scapula; Spl, splenial; Sq, squamosal; Ul, ulna; Ul.P., ulnar patella. Scale bars, 10 mm (a, b), 5 mm (c–f, h, i) and 1 mm (g).

Assessing the phylogenetic position of Megachirella and other lepidosauromorph reptiles is challenging because there has never been a phylogenetic dataset comprising a rich sampling of both non-lepidosaurian diapsid reptiles and squamates. Almost invariably, broad-scale reptile phylogenies have represented the nearly 10,000 extant species and the hundreds of fossil species of squamates as a single operational taxonomic unit14,15,16 (for more examples, see Supplementary Methods). This approach oversimplifies the enormous diversity of phenotypes and genotypes in squamates. Conversely, studies focused on squamate phylogeny never include more than a few taxa outside the Squamata to serve as outgroups9,17. Here we create the first morphological phylogenetic dataset comprising all the main branches of the diapsid tree of life, including extant taxa and fossils from all major lineages of rhynchocephalians (for example, tuataras) and squamates at the species level (Supplementary Data 1–5). We also focused on primary data collection, personally observing numerous specimens covering 100% of the taxa included in this dataset. We performed a meticulous revision of reptile and squamate phylogenetic characters (and created new characters) to avoid issues caused by logical or biological biases in morphological characters18. Owing to the rich sampling of extant squamate species, we also included molecular data from 16 loci (13 nuclear and 3 mitochondrial). The analyses performed include morphological and combined evidence (morphological and molecular data) analyses of diapsid and lepidosaurian relationships, carried out under multiple phylogenetic inference methods (see Methods).

Despite the difference in the datasets used (that is, morphology versus combined evidence) and phylogenetic optimality criteria, all results converge on Megachirella representing a stem squamate along with Marmoretta oxoniensis, from the Middle Jurassic of Britain, and Huehuecuetzpalli mixtecus, from the Early Cretaceous period of Mexico (Fig. 2 and Extended Data Figs. 3–8). This resolution is particularly well supported in the combined evidence analysis, in which Megachirella has a leaf stability above the overall mean (Extended Data Fig. 9). In analyses with maximum parsimony, Sophineta cracoviensis also falls within the Squamata stem, but this is not recovered in the remaining analyses. This indicates that some taxa previously proposed to be early-evolving lepidosauromorphs (for example, Megachirella and Marmoretta)5,10,11 actually represent the oldest known squamates, partially filling the supposed 70-Myr fossil gap in the early history of the clade. Other taxa also considered to be early lepidosauromorphs by previous studies (for example, kuehneosaurids and Saurosternon5) are consistently found in our results to be nested in other parts of the diapsid tree outside the Lepidosauromorpha. Additionally, all previous morphology- and molecular-based squamate phylogenies available in the literature disagree with each other concerning the earliest-evolving crown group squamates: iguanians for morphology-based analyses17,19, but dibamids and gekkotans for molecular analyses7,20,21 (see also Supplementary Methods). The results of the combined evidence analyses typically match those of the molecular data alone6,9; however, our results show unprecedented agreement between morphological and molecular data, in placing geckoes instead of iguanians among the earliest-evolving squamates. Iguanians are consistently found further crownward in the tree, nested either with anguimorphs and snakes (clade Toxicofera, Extended Data Figs. 3, 5–8), or with teiioids (Extended Data Fig. 4). This unprecedented agreement between molecular and morphological data with regard to the early evolution of squamates might be a consequence of our broad sampling of taxa outside squamates (thus affecting character polarity and branch length parameters) and strict criteria for morphological dataset construction.

Fig. 2: Combined evidence relaxed-clock Bayesian inference analysis with total evidence tip and node dating using the fossilized birth–death tree model.

Summary of the majority rule consensus tree depicting the median divergence time estimates for the major diapsid and squamate lineages against a geological time scale. Numbers at nodes indicate posterior probabilities and the orange dashed line represents the Permian/Triassic mass extinction event. For the full tree and 95% highest posterior density on divergence times see Extended Data Fig. 8. N, Neogene period.

Megachirella provides unique insights into the early acquisition of squamatan features, as it is the first unequivocal squamate from the Triassic. Megachirella, and also Huehuecuetzpalli22, show that features that are commonly attributed to squamates characterize crown squamates, but were not yet present in stem squamates. For instance, Megachirella and Huehuecuetzpalli still retain amphicoelic vertebrae and an entepicondylar foramen, and lack a ball-like distal epiphysis of the ulna. Megachirella further indicates that the loss of the quadratojugal and gastralia occurred within squamates, and not at the point of divergence from rhynchocephalians. The same pattern occurs in rhynchocephalians, for which Triassic and Early Jurassic fossils were previously known23, and which retain plesiomorphic features (such as the pleurodont dentition) that are absent in most of the later members of that group.

Previous molecular-clock estimates have placed the squamate crown divergence time between the Late Triassic and Early Jurassic6,7,24, and the origin of lepidosaurs at some point in the Triassic5,6 or the Middle Permian period7,25. Our time-calibrated Bayesian inference analyses combine information from both the molecular and morphological relaxed clocks on lepidosaurs and other diapsid lineages (Fig. 2 and Extended Data Fig. 8), providing a more holistic approach to the divergence time of squamates, lepidosaurs and other diapsids. Our estimates indicate that lepidosaurs originated 269 Myr ago (median estimate) in the Middle Permian, and crown squamates 206 Myr ago in the Late Triassic (thus agreeing with recent phylogenomic analyses7). Furthermore, our morphological sampling allows a more precise estimate of the origin of the squamate root by the inclusion of fossils now recognized as stem squamates, and thus the age of origin of all squamates can be set at 257 Myr ago, close to the Permian/Triassic mass extinction (PTME).

The oldest known fossils of certain diapsid lineages come from the earliest Triassic, including ichthyosaurs16, sauropterygians26 and archosaurs27, and more recent fossil evidence already suggests the presence of archosauriforms in the Late Permian28, strongly suggesting that their divergence preceded the PTME. In accordance, our divergence time estimates for almost all major diapsid lineages (such as lepidosaurs, archosauriforms and marine reptiles) are in the Permian (Fig. 2 and Extended Data Fig. 8) and not the Triassic (the period that has yielded their oldest known fossils). This corresponds to the general expectation that the oldest known fossil of a lineage is likely to be much younger than the actual divergence time for that same lineage29.

The origin of lepidosaurs and other major diapsid lineages before the PTME contradicts previous ideas suggesting that those groups originated in the aftermath of the greatest mass extinction in Earth’s history30. Instead, our results indicate those lineages already existed, but radiated in the Triassic. It is likely that the PTME opened new niches and opportunities to lineages previously restricted in diversity, thus enabling their radiation in the Triassic into numerous forms and sizes, occupying all major biomes on the planet.

Methods

Micro-CT

The holotype of Megachirella wachtleri was analysed by micro-CT at the Multidisciplinary Laboratory of the Abdus Salam International Centre for Theoretical Physics (Trieste, Italy), using a system specifically designed in collaboration with Elettra-Sincrotrone Trieste (Basovizza, Italy) for the study of palaeontological and archaeological materials31. The micro-CT acquisition of the complete specimen was carried out by using a sealed X-ray source (Hamamatsu L8121-03) at a voltage of 150 kV, a current of 100 μA and with a focal spot size of 20 μm. The X-ray beam was filtered by a 1.5-mm-thick aluminium absorber. A set of 2,400 projections of the sample were recorded over a total scan angle of 360° by a flat panel detector (Hamamatsu C7942SK-25) with an exposure time of 2.0 s. The resulting micro-CT slices were reconstructed in 16-bit format using the commercial software DigiXCT (DIGISENS) and an isotropic voxel size of 42.51 μm. Additionally, the proximal part of the sample was re-analysed (voltage 150 kV, current 100 μA, 1-mm copper filter, exposure time/projection 3.0 s and 1,800 projections over 360°) setting an effective pixel size of 18 μm and reconstructed using the same software to achieve a higher spatial resolution.

Morphological dataset construction

All taxa used in this study were personally observed by at least one of us, and more than half by two or more of the co-authors. The new dataset presented herein includes a large sample of species of squamates, as well as a broad variety of non-squamatan lepidosaurs and non-lepidosaurian diapsid species, representing all of the major clades of diapsid reptiles. Characters were assessed based on primary homology assessment and according to strict criteria for character construction, to avoid biases owing to logical or biological dependencies across characters, overweighting of any anatomical attributes and many other issues that may affect the morphological component of phylogenetic datasets18. We selected Protorothyris archeri as the outgroup to our analyses and all morphological characters were treated as unordered (see Supplementary Methods for additional details).

Molecular dataset alignment, model selection and partitions

The molecular dataset consists of 16 genetic markers (13 nuclear and 3 mitochondrial loci) for 38 extant taxa. A complete list of sampled loci and sequence lengths is provided in Supplementary Table 1. Sequence data for the selected coding regions were obtained from GenBank (Supplementary Data 2). For three ingroup taxa, Liolaemus signifer, Pristidactylus scapulatus and Stenocercus scapularis, for which molecular data were not available, we used sequences of the congeneric species, L. ornatus, P. torquatus and S. guentheri, respectively. Sequences were aligned in the MAFFT 7.245 (ref. 32) online server using the global alignment strategy with iterative refinement and consistency scores. For the protein-coding genes, alignments were verified by translating nucleotide sequences to amino acids. The final multiple sequence alignment was concatenated and visually examined in Mesquite 3.04 (ref. 33). Molecular sequences from all extant taxa were analysed for the best partitioning scheme and model of evolution using PartitionFinder2 (ref. 34) under the Akaike information criterion.

Equal weights maximum parsimony analysis

Analyses were conducted in TNT v.1.1 (ref. 35) using the new technology search (NTS) algorithms. This strategy enables the sampling of trees from a broader spectrum of local optima than is allowed by the heuristic search with ratchet runs in PAUP* v.4.0 beta 10, especially for large datasets35,36. Tree searches were conducted using 1,000 initial trees by random addition sequences with 100 iterations or rounds for each of the four NTS algorithms: sectorial search, ratchet, drift and tree fusing. The output trees were used as the starting trees for subsequent runs, using 1,000 iterations/rounds of each of the NTS algorithms. The latter step was repeated once, and the final output trees were filtered for all the most parsimonious trees (MPTs). A total of 621 MPTs were obtained with 2,268 steps each.

Implied weights maximum parsimony analysis

Analyses were also conducted in TNT, using the implied weighting algorithm37, with K = 12 and collapsing all branches with support = 0. Tree searches were conducted as for the equal weights parsimony analysis. K values larger than the default (3.0) have been reported to perform better for large datasets38. A total of five best fit trees were obtained (fit = 91.768892) and used to calculate the strict consensus tree.
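To illustrate why the choice of K matters, the following minimal sketch assumes the concave function usually given for implied weighting, in which a character with h extra steps (homoplasy) contributes h/(h + K) to the total distortion of a tree. The function name and the sample values of h are hypothetical and are not taken from the TNT analysis reported here.

```python
def distortion(extra_steps: int, k: float) -> float:
    """Distortion contributed by one character with `extra_steps` of homoplasy.

    Assumes the concave function h / (h + K); this is an illustrative
    sketch, not the TNT implementation.
    """
    return extra_steps / (extra_steps + k)


if __name__ == "__main__":
    # Compare the default concavity (K = 3) with the value used here (K = 12).
    for h in (0, 1, 2, 5, 10):
        print(h, round(distortion(h, 3.0), 3), round(distortion(h, 12.0), 3))
```

Under this assumption, a larger K penalizes each homoplastic character less strongly, so the analysis departs less from equal weighting.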

Bayesian inference analyses

Analyses were conducted in MrBayes v.3.2.6 (ref. 39) using the Cedar computer cluster made available through Compute Canada and the CIPRES Science Gateway v.3.3 (ref. 40). Molecular partitions were analysed using the models of evolution obtained from PartitionFinder2 (see dataset), and the morphological partition was analysed with the MkV model41.

The distribution for rate heterogeneity was tested for best fit to the data under both γ and log-normal distributions, as it was recently demonstrated that a log-normal distribution may better fit morphological data for a large variety of datasets42,43. Fit to the data was assessed using Bayes factors (B10)44,45 calculated with the marginal model likelihoods obtained from the stepping-stone sampling method46. The results of the model fit to the data were interpreted as previously described45: 2log_e(B10) > 2 indicates positive evidence against model M0; 2log_e(B10) > 6 indicates strong evidence against model M0; and 2log_e(B10) > 10 indicates very strong evidence against model M0. However, 2log_e(B10) was less than one between the γ and log-normal runs, indicating that there was no significant difference in fit to the morphological data between the two distributions. The morphological partition was thus analysed under the γ model for all subsequent analyses.
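The model comparison can be sketched as follows, assuming that the stepping-stone runs return marginal likelihoods on the natural-log scale. The function names and the two log-likelihood values in the usage example are hypothetical; only the thresholds (2, 6 and 10) come from the text.

```python
def two_ln_bayes_factor(ln_ml_m1: float, ln_ml_m0: float) -> float:
    """Return 2*ln(B10) from the log marginal likelihoods of M1 and M0."""
    return 2.0 * (ln_ml_m1 - ln_ml_m0)


def interpret(two_ln_b10: float) -> str:
    """Label the evidence against M0 using the thresholds quoted in the text."""
    if two_ln_b10 > 10:
        return "very strong"
    if two_ln_b10 > 6:
        return "strong"
    if two_ln_b10 > 2:
        return "positive"
    return "no meaningful difference"


if __name__ == "__main__":
    # Hypothetical log marginal likelihoods for two rate-heterogeneity models.
    value = two_ln_bayes_factor(-52000.3, -52000.7)
    print(round(value, 2), interpret(value))
```

A value below the first threshold, as reported for the γ and log-normal runs, gives no grounds for preferring either distribution.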

Time-calibrated relaxed-clock Bayesian inference analyses

We implemented ‘total-evidence-dating’ using the fossilized birth–death tree model with sampled ancestors, under a relaxed-clock model in MrBayes v.3.2.6 (refs. 47,48,49). The chosen relaxed-clock model is the independent γ rate relaxed-clock model50. This is a continuous uncorrelated relaxed-clock model that uses a γ distribution to assess clock rate variation across lineages. This model is compatible with the fossilized birth–death tree model, unlike the compound Poisson process relaxed-clock model48. The base clock rate was given an informative prior, which was derived from the non-clock Bayesian inference analysis: the median value for tree height in substitutions from the entire posterior trees sample divided by the age of the tree, which is based on the median of the distribution for the root prior: 25.1658/325.45 = 0.0773, in natural log scale = −2.560061. We chose to use the exponent of the mean to provide a broad standard deviation (e^0.0773 = 1.080366) as previously recommended6. The sampling strategy was set to diversity, which is more appropriate when extant taxa are sampled in a manner that maximizes diversity (as performed herein) and fossils are sampled randomly47,48. Diversity sampling is very common in higher-level phylogenies, and not accounting for it strongly affects tree inference, producing unreasonably old and more variable divergence times48,51. This is a considerable advantage of using MrBayes for divergence time estimates over current implementations available in the software package BEAST52.
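The arithmetic behind the clock rate prior can be reproduced with the short sketch below. The two input values are taken from the text; the variable names are illustrative, and how the resulting numbers are passed to the dating software is not shown because it is not specified here.

```python
import math

# Values quoted in the text.
tree_height = 25.1658  # median tree height (substitutions) from the non-clock analysis
root_age = 325.45      # median of the root age prior (Myr)

# Base clock rate, rounded to four decimals as reported.
rate = round(tree_height / root_age, 4)

# Mean of the prior on the natural-log scale.
ln_rate = math.log(rate)

# Exponent of the (untransformed) mean, used as a broad standard deviation.
sd = math.exp(rate)

if __name__ == "__main__":
    print(rate, round(ln_rate, 6), round(sd, 6))
```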

The wealth of fossil taxa in our dataset, including some of the oldest known taxa for many clades, provided numerous calibration points. Therefore, the vast majority of our calibrations were based on tip dating, which accounts for the uncertainty in the placement of fossil taxa and avoids the issue of bound estimates for node-based age calibrations47. The fossil ages used for tip dating correspond to the uniform prior distributions on the age range of the stratigraphic occurrence of the fossils (available in Supplementary Table 2). However, it has recently been demonstrated that using tip dates only can contribute to unrealistically old divergence time estimates for some clades53,54. Therefore, when we lacked the oldest known fossils for any of the clades in our analysis (namely, captorhinids, choristoderes, snakes and rhynchocephalians), we used node-age calibrations with a soft lower bound as long as the age of the oldest known fossil was well established and there was overwhelming support in the literature (and in all our other analyses) for their monophyly. Combined with the diversity sampling strategy, this dating protocol can ensure reliable divergence time estimates.

The age of the root was set with a soft lower bound, which gives a low (but non-zero) probability of the age being older than the lower bound value. Minimum and maximum root bounds were placed as follows. The minimum age was set at the oldest possible age for the oldest known reptile, Hylonomus (from the Joggins Formation in Nova Scotia, Canada), which comes from the late Bashkirian Stage (early Pennsylvanian, Late Carboniferous) and is between 318 and 315 Myr old55. Considering Petrolacosaurus may be as much as 307 Myr old, placing the minimum age at 318 Myr seems consistent, as the most recent common ancestor of diapsids and captorhinids must have been at least a few million years older than Petrolacosaurus. The maximum age was based on the maximum soft age for the reptile–synapsid split56, 332.9 Myr ago.

Convergence of independent runs was assessed using an average standard deviation of split frequencies of approximately 0.01, potential scale reduction factors of approximately 1 for all parameters57 and an effective sample size greater than 200 for each parameter.
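As an illustration of one of these diagnostics, the sketch below computes a potential scale reduction factor from several chains of samples of a single parameter, using the standard Gelman–Rubin formulation. This is an assumption made for illustration: the exact estimator used by the inference software may differ, and the function name and example chains are hypothetical.

```python
import math
from statistics import mean, variance


def psrf(chains):
    """Potential scale reduction factor for equal-length chains of one parameter.

    Standard Gelman-Rubin form: compares the pooled variance estimate with
    the mean within-chain variance. Values near 1 suggest convergence.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean(variance(c) for c in chains)   # mean within-chain variance (W)
    between = n * variance(chain_means)          # between-chain variance (B)
    pooled = (n - 1) / n * within + between / n  # pooled variance estimate
    return math.sqrt(pooled / within)


if __name__ == "__main__":
    # Two hypothetical chains sampling similar values.
    print(psrf([[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 2.0, 3.5]]))
```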

Leaf stability

Leaf stability was assessed using RogueNaRok58, which measures the difference between the highest and the second highest support values for alternative resolutions of each taxon quartet or triplet in the dataset (LSdif)59. We applied this method to the posterior trees from the Bayesian inference analysis including both the morphological and molecular data. Because of the large number of taxa and large number of trees, it was necessary to downsample the total number of posterior trees from each analysis (100,000 trees after discarding burn-in). The final sample consisted of 10,000 trees (selecting one in every 10 trees) using the Burntrees script for Perl (https://github.com/nylander/Burntrees). Taxon names and raw data relating to each number depicted in Extended Data Fig. 9 can be found in Supplementary Table 3.
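The thinning step can be sketched in a few lines. This is not the Burntrees script; it only illustrates selecting every tenth tree from a post-burn-in sample. The function name and the `start` offset are hypothetical, as the text does not state which tree the selection begins with.

```python
from itertools import islice


def thin(trees, step=10, start=0):
    """Keep every `step`-th item of an iterable of trees, beginning at `start`."""
    return list(islice(trees, start, None, step))


if __name__ == "__main__":
    # Stand-in for 100,000 post-burn-in trees (integers instead of tree strings).
    sample = thin(range(100_000), step=10)
    print(len(sample))
```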

Reporting summary

Further information on experimental design is available in the Nature Research Reporting Summary linked to this paper.

Data availability

The micro-CT scan data are available from the authors upon reasonable request. The morphological and molecular datasets for the phylogenetic analyses, including the MrBayes parameters block, are available as Supplementary Information.