Introduction

The study of human development is challenging, but the investigation of the extraembryonic tissue of the trophoblast is "…a riddle, wrapped in a mystery, inside an enigma" (Winston Churchill, 1939). Fetal development has benefited from human embryonic stem cells (hESCs) as a model [1]. These stem cell models greatly expanded our understanding of cell fate decisions and developmental lineages and led to innovations, such as regenerative medicine and cell-based therapies [2]. The other half of development is the placenta, specifically the trophoblast lineage's extraembryonic cells. Our knowledge of this organ and its specific cell types lags behind that of fetal or somatic tissues [3], and stem cell models may help accelerate research.

While mouse trophoblast stem cells (TSCs) were generated in the previous millennium [4], the derivation of human trophoblast stem cells (hTSCs) has been sought for several decades. Initial attempts were unsuccessful using conditions similar to the derivation of mouse TSC [5]. Recently, human trophoblast stem cells were derived from human blastocysts and first-trimester placenta cells [6]. This bona fide stem cell of the trophoblast, analogous to the trophoblast stem cells derived from mouse embryos, opens opportunities to study human reproduction. These cell lines are grown in 2D cultures and can be differentiated into syncytium and extravillous trophoblast. Two other groups recently published methods for deriving and maintaining trophoblast organoids derived from first-trimester trophoblast [7, 8]. These organoids can be grown in 3D culture and differentiated into syncytium and EVT-like cells.

Before the derivation of bona fide hTSC and organoids, methods utilizing BMP treatment to derive trophoblast were actively investigated by many research groups [9,10,11,12]. This method has met with controversy, with some groups proposing that the cells were not trophoblast but mesoderm or amnion-like cells [13,14,15]. The primary evidence for trophoblast presented by most publications was the presence of various trophoblast marker genes and phenotypes. mRNA and protein expression of genes such as GATA3, TFAP2C and KRT7, and the production of estradiol and bHCG (CGA/B) hormones, are considered definitive markers of trophoblast [16, 17]. Another hallmark was the ability of the cells to generate a syncytium [12, 17]. One line of reasoning against BMP4-mediated methods of trophoblast derivation is that human ESC were cultured as primed cells that are more representative of a post-implantation stage, long after trophoblast cell fate decisions were determined. hESC can be transitioned to an earlier developmental state called naïve that is closer to the preimplantation stage of the blastocyst inner cell mass [18]. Interestingly, methods to create human trophoblast from naïve stage embryonic stem cells were established using media conditions similar to those of hTSC derivation and maintenance [19,20,21].

Shockingly, human naïve embryonic stem cells are reported to generate blastocyst-like structures called blastoids [22,23,24,25]. By seeding a large number of cells (> 100) into suspension under specific conditions, the aggregates differentiate into primitive endoderm/hypoblast, trophoblast and epiblast while maintaining spatial organization and a cavitated blastocoel. Cell isolates from blastoids were also used to derive hESC, hTSC and hypoblast cell lines.

Many methods appear to enable hESC to access early developmental states, including the trophoblast. This review aims to use available transcriptomic data to establish the similarity of hESC and trophoblast-derived models to in vivo sources. We also investigate the blastoids as a model of embryo cell fate specification and determination, specifically if blastoids achieve trophoblast cell fates through embryo-like processes.

Results

hESC to trophoblast conversion

We searched the Gene Expression Omnibus for Homo sapiens data sets on hTSC and trophoblast derivation from hESC made available between July 1, 2018, and June 8, 2021. We identified 94 studies and selected the 20 data sets on trophoblast derivation or containing established trophoblast cell lines and differentiated cell types. We removed studies using only choriocarcinoma lines (BeWo, JEG3) or transfection-derived cell lines (HTR8/SVneo). Single-cell sequencing data sets on blastoids and cell models were assessed in a separate analysis. ArrayExpress was searched with the same criteria, yielding four more results with two meeting criteria for this study. In total, there were two microarray and 12 bulk RNA sequencing data sets. All data were processed from raw files using a consistent bioinformatics pipeline. Only one study was removed after failing to pass quality control. In the end, we proceeded with two microarray and 11 bulk RNA sequencing data sets (Supplemental Table 1) comprising 308 samples (Supplemental Table 2). For convenience, we refer to embryonic and pluripotent stem cells collectively as ESC.

Standard approaches of data integration and batch correction for bulk RNA sequencing did not work, as many studies continued to show batch separation from each other (data not shown). Given the number of different platforms (sequencing conditions and microarrays) and inconsistent cell types between data sets (culture conditions, starting cells and in vivo sources), we applied data integration methods from the R package Seurat, typically meant for single-cell sequencing. These methods utilize a system to identify stable genes for batch correction and normalization to avoid dependency on common cell types as batch correction landmarks. After data integration, we applied a Uniform Manifold Approximation and Projection (UMAP) of clustered data. We favour UMAP over other non-linear dimension reduction methods as it better preserves global relationships [26]. We observed a distribution and mixing of samples by study ID, indicating that experimental batch effects were likely removed or minimized (Fig. 1A). We observed four data groupings composed of seven statistical clusters (Fig. 1B). By convention, clusters are numbered starting at 0 in order of population size. We observed that clusters 0, 1, 4 and 6 belong to the largest block of samples and clusters 2, 3 and 5 show a high degree of separation from the large block. Clusters 2 and 3 are close together, while 5 is remote (Fig. 1B). The organization of the clusters suggests the large group is closely related (clusters 0, 1, 4 and 6), clusters 2 and 3 are similar and cluster 5 is unique (Fig. 1B).
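The mixing of samples by study ID is judged visually in the UMAP. As a minimal sketch of one way such mixing could be quantified, the following computes a normalised Shannon entropy of study membership per cluster. This is an illustration only and not part of the reported analysis; the function name `study_mixing` and the toy labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

def study_mixing(cluster_labels, study_ids):
    """Return, per cluster, the normalised Shannon entropy of study membership.

    0.0 means every sample in the cluster comes from a single study;
    1.0 means samples are spread evenly over all studies in the data.
    """
    n_studies = len(set(study_ids))
    per_cluster = defaultdict(Counter)
    for cluster, study in zip(cluster_labels, study_ids):
        per_cluster[cluster][study] += 1

    scores = {}
    for cluster, counts in per_cluster.items():
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
        # Divide by the maximum possible entropy so that scores are comparable.
        scores[cluster] = entropy / math.log(n_studies) if n_studies > 1 else 0.0
    return scores

# Hypothetical toy input: cluster 0 mixes two studies, cluster 1 holds one study.
toy_scores = study_mixing([0, 0, 1, 1], ["s1", "s2", "s1", "s1"])
print(toy_scores)
```

A low score for a cluster would not by itself indicate a residual batch effect, since some sample types are contributed by a single study.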

Fig. 1

UMAP projections of integrated data sets of ESC to TSC/trophoblast conversion models. A Samples coloured by data set origin. B Samples coloured by statistical cluster numbered from 0 to 6 in order of the number of samples per cluster. C Samples coloured by starting cell types or tissue indicate cluster 2 contains hTSC and 3 holds organoids and in vivo trophoblast isolates and whole villus samples. Cluster 4 is undifferentiated primed and naïve hESC and iPSC. This illustrates that a variety of starting cell types can be induced to trophoblast-like gene expression patterns. D Word cloud overlay of media used in cell culture/derivation for samples

Assessing which clusters contain in vivo samples showed that the placental samples (isolated trophoblasts and whole villus) are in cluster 3 along with the trophoblast organoids (Fig. 1C). Cluster 2 contains hTSC lines derived from the placental trophoblast (Fig. 1C). Associated with clusters 2 and 3 are samples derived from naïve human embryonic stem cells. In general, a variety of starting cell types can be induced into trophoblast-like states (Fig. 1C; Tables 1 and 2). This also indicates that hTSC and organoids are different, possibly due to differences in developmental states, media and culturing methods (2D adherent vs 3D matrix embedded).

Table 1 ESC-derived TSC samples clustered with organoids
Table 2 ESC-derived TSC clustered with trophoblast-derived hTSCs

Investigating media conditions reveals significant separation and correlation to cluster memberships (Fig. 1D). Of the clusters without in vivo trophoblast, the largest block of cells belonging to clusters 0, 1, 4 and 6 contained undifferentiated ESC and various differentiated derivatives using protocols to generate trophoblast, including the BAP- or BMP4-related protocols (Fig. 1D). The separation of these clusters from the trophoblast-containing clusters 2 and 3 indicates that their bulk transcriptional profile is not similar to in vivo trophoblast, organoids or hTSC directly derived from the trophoblast. Beyond the in vivo trophoblast samples and bona fide cell models, clusters 2 and 3 largely comprise naïve hESC or iPSCs cultured with trophoblast stem cell media conditions. Cluster 5 contains only a set of samples derived from ESC using trophoblast and ACE media.

Gene enrichments to clusters

We investigated the typical marker panel of genes used to evaluate trophoblast identity in cell-based differentiation experiments. Embryonic epiblast (ESC) markers, such as NANOG and POU5F1 (OCT3/4), strongly label the ESC cluster 4 (Fig. 2A). A typical trophoblast marker panel of KRT7, CDX2, HAND1, GCM1, TFAP2C, ERVW-1, PGF, GATA3, ITGA2 and PAPPA produces patterns with overlapping expression in clusters 0, 2, 3 and 5 (Fig. 2A, B). The lack of specificity to the trophoblast-containing clusters 2 and 3 calls into question the utility of these markers in the context of cell-based differentiation experiments.

Fig. 2

Expression patterns of trophoblast markers. A Expression of individual genes is colour scaled from low to high (yellow–orange–red). Markers of pluripotency (NANOG, POU5F1), which show specificity to clusters 4 and 1, are boxed in red. Markers of trophoblast fate (blue box) show inconsistent patterns that encompass clusters 0, 2, 3 and 5, where only clusters 2 and 3 contain in vivo and in vitro trophoblast samples. Genes calculated to have enriched expression in trophoblast clusters 2, 3 or both clusters are boxed in green shades. B Heat map organized by cluster (columns) shows trophoblast marker patterns in clusters 0, 2, 3 and 5

We tested for differential expression to identify candidate markers that can distinguish bona fide trophoblast cells and models from ESC derivatives. We used a one versus all comparison and set significance at an FDR < 0.05 and log2(fold change) > 0.5. We found hundreds of genes enriched in each cluster, with many genes displaying precise expression patterns in clusters 2, 3 or both clusters combined (Fig. 2A, B). While untested, these candidates or others may be helpful to assess the quality of novel cell models relative to in vivo and bona fide cell sources.
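The selection rule stated above (FDR < 0.05 and log2(fold change) > 0.5 in a one versus all comparison) can be sketched as follows. The text does not state which statistical test or multiple-testing correction was used, so the Benjamini-Hochberg adjustment here is an assumption, and the gene names and values are hypothetical placeholders.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

def enriched_genes(results, fdr_cutoff=0.05, lfc_cutoff=0.5):
    """Select genes with FDR < fdr_cutoff and log2 fold change > lfc_cutoff.

    `results` maps gene -> (raw p-value, log2 fold change) for one cluster
    compared against all remaining samples.
    """
    genes = list(results)
    fdr = benjamini_hochberg([results[g][0] for g in genes])
    return [g for g, q in zip(genes, fdr)
            if q < fdr_cutoff and results[g][1] > lfc_cutoff]

# Hypothetical one-versus-all results for a single cluster.
toy = {
    "GENE_A": (0.001, 2.0),   # significant and strongly increased
    "GENE_B": (0.01, 0.3),    # significant but below the fold-change cutoff
    "GENE_C": (0.04, 1.2),    # adjusted p-value rises above 0.05
    "GENE_D": (0.5, 3.0),     # not significant
}
print(enriched_genes(toy))
```

Requiring a positive fold change keeps only genes increased in the cluster of interest, which matches the use of these genes as candidate positive markers.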

To address the identities of the different clusters and their functional qualities, we applied gene set enrichment to the top significantly enriched genes in each cluster (Fig. 3). The significant gene ontologies were grouped by overlapping gene sets and a consolidated title was generated using the Cytoscape plug-in EnrichmentMap (Supplemental Table 3). A potential problem with the human Gene Ontology is its under-representation of embryonic and especially extraembryonic structures and pathways.
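As an illustration of the kind of calculation behind ontology enrichment and the grouping of overlapping gene sets, the sketch below computes an upper-tail hypergeometric p-value for one term and a Jaccard index between two gene sets. The exact statistics and similarity cutoff used in the analysis are not given in the text, so both choices are assumptions, and all numbers and gene names are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_term, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value, P(overlap >= n_overlap).

    n_universe: genes tested in total
    n_term:     genes annotated to the ontology term
    n_selected: genes in the enriched list for a cluster
    n_overlap:  enriched genes that carry the term
    """
    max_overlap = min(n_term, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total

def jaccard(set_a, set_b):
    """Share of genes common to two gene sets, used to group similar terms."""
    set_a, set_b = set(set_a), set(set_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

# Hypothetical example: 5 of 5 selected genes fall in a 5-gene term (10 genes tested).
print(hypergeom_enrichment_p(10, 5, 5, 5))
print(jaccard({"G1", "G2", "G3"}, {"G2", "G3", "G4"}))
```

Because the test depends on which genes are annotated at all, sparsely annotated structures such as extraembryonic tissues can go undetected regardless of the underlying biology.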

Fig. 3

Ontology identifiers from EnrichmentMap analysis of differentially expressed genes to each cluster. Genes identified with increased expression (FDR < 0.05 and a log2(fold change) > 0.5) to each cluster in a one versus all calculation were assessed against the human Gene Ontology Biological Process. EnrichmentMap organized significantly enriched ontologies (FDR < 0.05) to consolidate ontologies with significant overlap in gene membership. A complete list of ontologies is in Supplemental Table 3

Cluster 2, which contained the hTSC and ESC-derived cell lines, displayed enrichment in genes involved in blood vessel formation (angiogenesis), immune functions related to leukocytes and mast cells and processes for locomotion and migration. In contrast, cluster 3, which comprised organoids, villous and cytotrophoblast samples and ESC-derived cells, was enriched in terms related to organism growth and morphogenesis, and signalling related to steroids and stress protein kinase cascades. These ontological differences may relate to developmental stages represented by each model. The cluster 2 ontology enrichment contains early phenotypes of primary trophoblast post-implantation, such as the attraction of angiogenic processes and promotion of immune tolerance [27, 28]. Cluster 3 enriched ontologies likely represent processes of the established placental surface that transport nutrients and elaborate the villous structures [29,30,31]. No terms specific to trophoblast or placenta were enriched in either cluster.

Cluster 5 was enriched in very few human gene ontologies. Most were related to cell movement regulation, inflammatory response and muscle structure development. The isolated clustering away from bona fide trophoblast samples suggests that cluster 5 is not enriched for trophoblast cell types.

The remaining clusters (0, 1, 4, and 6) include the undifferentiated ESC samples (cluster 4) and differentiated samples from various methods. This sample cluster’s connectivity suggests progressive development in the epiblast lineage. Chronologically, cluster 4 represents the undifferentiated ESC, containing both primed and naïve. Next, cluster 1 is highly enriched in terms related to neurogenesis and neuron projections. Cluster 6 showed a continued and expanded enrichment of ontology terms related to neurogenesis, including synapse formation and patterning of the telencephalon and forebrain. Additionally, cluster 6 was enriched in ontology terms related to cardiac muscle development and calcium-mediated muscle action.

Cluster 0, the outermost group, is composed of endpoint samples from BMP4- or BAP-based differentiation protocols (days 5–10). This cluster contained the largest number of differentially expressed genes and enriched ontologies. One of the biggest groupings of ontologies, related to morphogenesis and branching in ureteric development, contained 117 different ontology terms. As expected, a grouping of ontology terms related to BMP signalling was present. Continued enrichment of neurogenesis and cardiac developmental terms was also present. Of significant interest is the presence of enriched gene ontologies related to syncytium and plasma membrane fusion. Within the enriched ontologies are cardiac muscle cell fusion pathways [32], which share gene expression with trophoblast fusion [33]. Importantly, placenta-related ontology terms were enriched, but these overlapped substantially with terms for maternal tissue, including the decidua. These samples do not cluster with the hTSC, organoids or primary cytotrophoblasts, so they are unlikely to be pure or enriched populations of trophoblast cells. However, we cannot exclude the possibility of trophoblast cell types being mixed with other mesodermal and neuronal cell types. Similarly, these may contain trophoblast cells that are not well represented in the organoid, hTSC or isolated cytotrophoblast. Single-cell sequencing could resolve this complex mixture, as it is important to determine if trophoblast cell fate can be reached from the naïve and primed ESC state.

Embryos, blastoids and stem cells, oh my!

The most recent development in the derivation of the trophoblast is the production of blastoids from ESC, which appear to generate the trophectoderm, hypoblast (primitive endoderm) and epiblast components in a cavitated structure [22,23,24,25]. A possible concern is that the TE structure is amnion, as there is a degree of gene expression overlap between these cell types and both grow as a monolayer tight epithelium. To determine if blastoids produce trophoblast similar to the blastocyst, we combined three published single-cell sequencing data sets on human blastoids [23,24,25]. We integrated these with a single-cell sequencing data set on human embryo development spanning totipotency to blastocyst formation [34]. Additionally, we integrated single-cell data of primed and naïve hESC, human dermal fibroblasts and hTSC derived from blastocysts (Fig. 4).

Fig. 4

UMAP projections of aggregated single-cell sequencing data of blastoids, human embryos and stem cell lines. Each point represents an individual cell. A Cells are coloured by cluster membership. Clusters 0–11 are numbered in order of largest to smallest. B Cells are coloured by sample origin and show some sample-unique or sample-enriched arrangements, such as embryo populations, primed hESC and hTSC. C Multi-panel display of UMAP projection coloured by cluster membership as in A and separated by sample origin as in B. Comparison of blastoid cells to embryo cells (top row) highlights contributions to similar clusters. Cell lines (bottom row) show tight aggregations typically to a single cluster. D Blastoid cells from Kagawa 2022 separated by culture time (24, 60 and 96 h) compared to human embryos separated by time show a similar arrangement of cluster memberships from days 5 through 7. E–G Cells are coloured by gene expression from low to high (grey–orange–red). E Pluripotency genes. F Hypoblast or primitive endoderm markers. G Trophoblast markers show different patterns. CDX2 and HAND1 are expressed in clusters that initiate on day 5 of embryo development and are reduced in other clusters. TFAP2C and GATA3 show general expression through all clusters. GCM1, CGA, PGF and ERVW-1 show expression beginning in later clusters associated with embryonic days 6 and 7

In total, 6160 cells passed quality control filters and were statistically organized into 12 clusters and displayed in a UMAP (Fig. 4A). The clusters showed good integration of independent data sets with no strong batch effects (Fig. 4B). Clusters were assigned a cell lineage based on the presence of annotated human embryo cells within the cluster. Human cell lineage annotation was defined in the deposited data set. Clusters 6 and 11 represent cells of embryonic days 3 and 4 of human development, respectively, and are considered as totipotent (Fig. 4C). Neither cluster contained significant numbers of cells from blastoids or cell lines (Fig. 4C), suggesting that likely none of the published blastoid conditions achieve a totipotency state. Clusters 1, 3 and 7 contained trophectoderm cells and the hTSC (cluster 7). Cluster 5 contained primitive endoderm (hypoblast cells). Clusters 0, 2 and 10 contained epiblast cells and cells from the naïve hESC. Cluster 4 was only cells from primed hESC and clusters 8 and 9 were only associated with the blastoid data. Cluster 9 also contained human dermal fibroblast cells.
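The assignment of a lineage to each cluster from the annotated embryo cells it contains can be sketched as a simple vote. The majority rule and the 0.5 threshold are assumptions for illustration (the text states only that lineage was assigned based on the presence of annotated embryo cells), and the tuple layout and toy cells are hypothetical.

```python
from collections import Counter, defaultdict

def annotate_clusters(cells, min_fraction=0.5):
    """Assign each cluster the majority lineage among its annotated embryo cells.

    `cells` is an iterable of (cluster, source, lineage) tuples; lineage is
    None for cells without a reference annotation (blastoids, cell lines).
    """
    votes = defaultdict(Counter)
    clusters = set()
    for cluster, source, lineage in cells:
        clusters.add(cluster)
        if source == "embryo" and lineage is not None:
            votes[cluster][lineage] += 1

    labels = {}
    for cluster in clusters:
        counts = votes.get(cluster)
        if not counts:
            # Clusters made up only of blastoid or cell-line cells.
            labels[cluster] = "no embryo cells"
            continue
        lineage, top = counts.most_common(1)[0]
        share = top / sum(counts.values())
        labels[cluster] = lineage if share >= min_fraction else "mixed"
    return labels

# Hypothetical toy cells.
toy_cells = [
    (3, "embryo", "trophectoderm"),
    (3, "embryo", "trophectoderm"),
    (3, "blastoid", None),
    (5, "embryo", "primitive endoderm"),
    (9, "blastoid", None),
]
print(annotate_clusters(toy_cells))
```

Clusters labelled "no embryo cells" by such a rule correspond to populations that have no counterpart in the embryo reference and therefore need separate interpretation.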

Day 5 human embryos and 24–60-h blastoids produced cells integrated to clusters that spanned epiblast, hypoblast (primitive endoderm) and trophoblast (Fig. 4C, D). Additionally, the three blastoid data sets produced clusters of cells (clusters 4, 8 and 9) not generated by human embryos (Fig. 4C).

As the focus of this review is the trophoblast, we will only briefly address the epiblast and hypoblast-associated clusters. Cells associated with clusters of embryonic epiblast lineage showed similar expression of epiblast markers POU5F1 and NANOG (Fig. 4D, E). Cluster 8 is composed of cells from naïve ESC and blastoids but does not contain embryo-derived epiblast (Fig. 4D). The primed ESCs in cluster 4 contain some blastoid cells but not naïve ESC or human embryo samples (Fig. 4D). This suggests that blastoids generate appropriate embryo cell types (clusters 0 and 2), although the blastoid epiblast appears dominated by cluster 0, while the embryo epiblast is dominated by cluster 2. The blastoid proportion is similar to that observed in the naïve cell culture sample.

Cluster 5 is the hypoblast and contains cells of the blastoids and human embryo but not naïve or primed hESC. Most cells in this cluster express the markers PDGFRA, SOX17, FOXA2 and FGFR2 (Fig. 4A, C, F). This suggests that blastoid differentiation creates a developmental state that can access hypoblast fate without using totipotency.

The trophoblast from the human embryo falls into multiple clusters and is well mixed with blastoid cells (1, 3 and 7, Fig. 4A, C, G). Cluster 3 trophoblast expressed marker genes, including CDX2, HAND1 and TFAP2C. Cluster 2 trophoblast expressed genes associated with later developmental phenotypes, including GCM1, KRT7, the hormones PGF and CGA and the syncytialization marker ERVW-1 (Syncytin). The human TSCs associate completely with cluster 7, along with cells from the blastoids of Yu and colleagues [24] and Yanagida and colleagues [23] (Fig. 4C). The blastoids of Kagawa and colleagues [25] did not create cells that belong to cluster 7. Chronologically, the human embryo first produced cells of cluster 3 followed by 2 and 7 (Fig. 4C). The blastoid time series data from Kagawa and colleagues [25] similarly produced cluster 3 cells followed by 2 but did not produce cells of cluster 7 as stated above (Fig. 4C). Time series data from Yanagida and colleagues [23] produced clusters 3 and 2 concurrently, followed by cluster 7 (data not shown). Our analysis indicates that blastoids produce a variety of trophoblast subtypes similar to those of the embryo but without access to totipotency. There is also a chronological similarity in trophoblast development between blastoids and embryos. There are differences in cellular proportions between the different blastoid methods, although these are subject to technical bias and error.

Associated with the trophoblast clusters is cluster 9, produced by blastoids from Yu and Kagawa [24, 25] and a small number of cells from naïve hESC (Fig. 4A, C, D). The absence of these cells from the human embryo suggests inappropriate development within the blastoids and a propensity of naïve hESC to spontaneously produce this cell type (Fig. 4C, D). Over-representation analysis of human ontology using genes significantly enriched in cluster 9 (one versus all, FDR < 0.05, log2(fold change) > 0.5) revealed biological processes involved in adhesion and cell spreading, actin and cytoskeleton and epithelium morphogenesis that fit with the formation of a monolayered tight epithelium. Also present are terms related to muscle development and smooth muscle proliferation. Curiously, this is similar to ontological enrichments observed in cluster 5 from the ESC to trophoblast conversion models (Fig. 1B).

There are no GO terms annotated to the amnion, so this structure could not be detected using standard ontology databases. Markers of human amnion are not well known, and the few available are also markers of trophoblast (e.g. CDX2 and GATA3). A study of induced amnion from primed hESC provided a list of differentially expressed genes between amnion-like epithelial cells and hESC [15]. In a different approach, we imported validated amnion gene markers from the mouse EMAPA ontology (http://www.obofoundry.org/ontology/emapa.html). Of the 150 genes annotated with expression to the amnion in mice, 41 showed significant differential expression between the clusters. In a heat map of these 41 markers and the data set of merged single cells organized by cluster, 21 genes were specifically increased in cluster 9 relative to other clusters and several were significant to cluster 9 and other clusters (Fig. 5). This is strong evidence of a propensity for naïve cells to make amnion under some blastoid conditions.
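The marker tally described above (how many annotated amnion genes are differentially expressed at all, and how many are increased in cluster 9) amounts to a few set operations. The sketch below assumes the mouse markers have already been mapped to human gene symbols, a step the text does not describe; the function name and toy gene sets are hypothetical.

```python
def marker_summary(markers, enriched_by_cluster, target):
    """Summarise how a marker list overlaps with per-cluster enriched genes.

    markers:             iterable of marker gene symbols
    enriched_by_cluster: dict of cluster -> iterable of genes increased there
    target:              the cluster of interest (for example 9)
    """
    markers = set(markers)
    # Markers that are differentially expressed in at least one cluster.
    any_cluster = set().union(*enriched_by_cluster.values()) & markers
    in_target = markers & set(enriched_by_cluster.get(target, ()))
    # Genes increased in any cluster other than the target.
    elsewhere = set()
    for cluster, genes in enriched_by_cluster.items():
        if cluster != target:
            elsewhere |= set(genes)
    return {
        "markers_total": len(markers),
        "differential_any_cluster": len(any_cluster),
        "enriched_in_target": len(in_target),
        "enriched_only_in_target": len(in_target - elsewhere),
    }

# Hypothetical toy input with placeholder gene names.
toy_enriched = {9: {"M1", "M2", "X"}, 1: {"M2", "Y"}, 3: {"M3"}}
print(marker_summary({"M1", "M2", "M3", "M4"}, toy_enriched, target=9))
```

Separating markers shared with other clusters from those unique to the target cluster matters here because several amnion markers are also expressed in trophoblast.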

Fig. 5

Heat map of mouse embryonic amnion markers in cells organized by cluster membership. Cluster 9 (boxed in purple) is only found in blastoid samples but associated with trophoblast cell clusters found in embryo and blastoid samples. Cluster 9 shows high enrichment of differentially expressed amnion genes

Discussion

The creation of cell models of the human trophoblast is fuelling a new interest in trophoblast tissue and the placenta from those traditionally not involved in placental and trophoblast research. This is an excellent opportunity to forge new collaborations between stem cell, tissue engineering and trophoblast/reproductive biology research groups. However, caution is needed in this burgeoning field, and a consensus is required on what constitutes trophoblast and how to assess cell lines and tissues derived from stem cells. Of concern is the propensity of cells in culture to find viable but not in vivo equivalent states [35]. Some recent articles have begun to address this [36, 37]. Our findings show that context is critical in interpreting and selecting markers. Classical trophoblast markers worked very well in blastoid models to distinguish trophoblast from epiblast, hypoblast and amnion, whereas in cell conversion models their ambiguous performance may be due to the presence of heterogeneous cell cultures and post-gastrulation lineages of mesoderm, ectoderm and endoderm.

Lessons can be learned from past debates after creating human ESC and induced pluripotent stem cells on how to establish what constituted a “true” stem cell [38, 39]. Human cell models are challenged in contrast to the mouse cell lines that can be transplanted and assessed in vivo for their developmental potential. However, even in a “gold standard” host embryo transplantation assay, cells can be misinterpreted based solely on their location within the embryo or placenta [37]. Single markers can be misleading in cell culture, and even tissue definitive markers need to be reassessed in different experimental conditions outside of their in vivo context.

Our analysis of the human ESC to TSC derivation methods indicates that only some protocols produce large numbers of trophoblast cells. This was not an obvious conclusion due to the expression of accepted markers and the presence of functional ontologies associated with the placenta. Our analysis benefits significantly from contrasting a variety of published data sets and methods with the recently created “gold standard” trophoblast stem cell lines and organoids derived directly from the trophoblast. While some methods produced cells with similar classical marker gene expression patterns, these same cells are revealed not to be transcriptionally similar to in vivo trophoblast when assessed by genome-wide methods. We present a consolidated data set of candidate markers to test for efficacy in cell type discrimination. However, including bona fide trophoblast cell models or data is likely a best practice to follow in cell culture experiments.

An alternate conclusion may be that human trophoblast arises from multiple routes that include both the totipotent state and the epiblast (naïve and primed), which in humans may be more plastic relative to mouse. Our ontology findings raise ambiguity, as many of the processes found in the placenta arose through evolutionary co-option of gene networks from other tissues [40]. The fusion process in the placenta uses genes found in other syncytial tissues, such as cardiac muscle [32, 33].

Blastoids are an exciting new cell culture model. As individually reported, our integrated analysis of three blastoid single-cell data sets indicated that each method produced cell types highly similar to the embryo [23, 25, 41]. Significantly, blastoids did not recapitulate early totipotent stages of embryogenesis yet somehow access trophoblast and hypoblast genes and developmental programmes. We also observed embryo-inappropriate and missing cell types within the epiblast and trophoblast lineages of blastoids. We found three time-dependent trophoblast populations produced by the embryo, but not every blastoid method produced them all. An open question is how these multiple trophoblast populations of the embryo and blastoids are spatially arranged, such as in polar regions analogous to the mouse or in other patterns. Do individual blastoids contain all of these cell types?

We observed that two blastoid models produced cells showing transcriptional profiles of amnion that group with the trophoblast clusters. Early trophoblast and amnion share common regulators, such as CDX2 and GATA3. Given the connection of the trophoblast and amnion clusters, it is tempting to speculate on a developmental relationship. However, it is possible that this is not a normal developmental trajectory. As blastoids do not appear to use totipotency to reach trophoblast and hypoblast, a potential explanation is that blastoids use amnion as a gateway. We observed that a few undifferentiated naïve hESC could produce rare cells clustered with blastoid amnion. The strong signalling environment of blastoid culture media may trans-differentiate amnion progenitors into trophoblast. However, one blastoid condition did not show the presence of amnion cells [23]. These blastoid culture conditions are possibly better at restricting amnion and channelling cells into the trophoblast. Alternatively, this data set contained fewer cells, and amnion may be missing due to undersampling. Recognition of these unintended cell types should help improve methods to better model in vivo development.

While we did not directly assess data on implantation models, care should be taken not to overinterpret decidualization. In the mouse, decidualization can be induced with lectin-coated beads and other growth factors [42, 43] in the complete absence of trophoblast. Better molecular characterization of implantation is needed to understand and differentiate generalized versus trophoblast-mediated decidual responses, such as models incorporating resident immune cell populations.

Important questions remain to be answered. We need a consensus on a panel of markers and functional assays to assess a trophoblast cell’s identity. We need to establish assays to determine a trophoblast stem cell’s developmental capacity and functional similarity to in vivo counterparts. As for the origins of the trophoblast, further investigation is needed to determine the mechanisms of primed and naïve hESC’s access to trophoblast gene networks. Lastly, how do epiblast cells access the extraembryonic lineages without using totipotency, or are we correctly defining totipotency?

Methods

Identifying data sets

NCBI’s GEO was searched using the terms "trophoblast" and "placenta organoid". Data sets were filtered for a deposit date between July 1, 2018, and June 8, 2021, Homo sapiens, "expression profiling by array" and "expression profiling by high throughput sequencing". Samples containing or consisting of choriocarcinoma lines (BeWo, JEG3) and transfection-derived cell lines (HTR8/SVneo) were excluded. ArrayExpress was searched with the same criteria.
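The inclusion and exclusion criteria above can be expressed as a small filter over data set metadata. This is a sketch under stated assumptions: the record fields (`deposited`, `organism`, `type`, `description`) are hypothetical and do not correspond to a real GEO export format, the date range is treated as inclusive, and keyword matching on a free-text description is a crude stand-in for manual curation.

```python
from datetime import date

# Cell lines named in the exclusion criteria, lower-cased for matching.
EXCLUDED_LINES = ("bewo", "jeg3", "htr8/svneo")
ALLOWED_TYPES = {
    "expression profiling by array",
    "expression profiling by high throughput sequencing",
}

def keep_record(record, start=date(2018, 7, 1), end=date(2021, 6, 8)):
    """Apply the stated criteria to one (hypothetical) metadata record."""
    if not (start <= record["deposited"] <= end):
        return False
    if record["organism"] != "Homo sapiens":
        return False
    if record["type"].lower() not in ALLOWED_TYPES:
        return False
    text = record["description"].lower()
    # Reject records that mention any excluded cell line.
    return not any(line in text for line in EXCLUDED_LINES)

# Hypothetical example record.
example = {
    "deposited": date(2019, 5, 1),
    "organism": "Homo sapiens",
    "type": "Expression profiling by array",
    "description": "hTSC derived from blastocysts",
}
print(keep_record(example))
```

A keyword filter of this kind would also drop studies that include an excluded line alongside eligible samples, so in practice flagged records would need manual review.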

Data set processing

To ensure uniform analysis, bulk RNA sequencing data were obtained as raw FASTQ files from the Sequence Read Archive using NCBI’s SRA toolkit. Count tables were generated by aligning FASTQ files to the GRCh38.p12 human genome build using kallisto pseudoalignment and importing the abundance files into R with Bioconductor's tximport by gene. Microarray raw data were directly downloaded from ArrayExpress or GEO using Bioconductor's query package.
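The gene-level import step can be illustrated by summing transcript-level estimated counts per gene. This sketch reads the standard kallisto `abundance.tsv` columns (`target_id`, `length`, `eff_length`, `est_counts`, `tpm`) but is only a simplified illustration of the idea: tximport additionally handles effective-length information and offers several scaling options. The transcript and gene identifiers and the `tx2gene` mapping below are hypothetical.

```python
import csv
import io
from collections import defaultdict

def gene_level_counts(abundance_tsv, tx2gene):
    """Sum kallisto transcript-level estimated counts to gene level.

    abundance_tsv: open text handle on a kallisto abundance.tsv file
    tx2gene:       dict mapping transcript ID -> gene ID
    """
    totals = defaultdict(float)
    reader = csv.DictReader(abundance_tsv, delimiter="\t")
    for row in reader:
        gene = tx2gene.get(row["target_id"])
        if gene is None:
            continue  # transcripts without a gene assignment are skipped
        totals[gene] += float(row["est_counts"])
    return dict(totals)

# Hypothetical abundance table held in memory instead of a file on disk.
example_tsv = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx1\t1000\t800\t10.0\t5.0\n"
    "tx2\t1200\t1000\t5.5\t2.0\n"
    "tx3\t900\t700\t3.0\t1.5\n"
)
mapping = {"tx1": "geneA", "tx2": "geneA", "tx3": "geneB"}
print(gene_level_counts(io.StringIO(example_tsv), mapping))
```

Summarising to genes before integration gives every data set the same feature space, which is what allows sequencing and microarray studies to be compared at all.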

Data set integration

Seurat was selected for downstream analysis for bulk RNA sequencing and microarray data due to its strength in integrating, clustering and visually representing data. While this software is typically applied to large numbers of cells with sparse data, this method performed well in our hands with fewer samples but with the near-complete overlapping gene coverage typical of microarray or bulk RNA sequencing. The relatively small sample size of bulk RNA sequencing and microarray data required manual integration of data sets before import and integration in Seurat. Bulk RNA sequencing data sets containing similar sample types were first filtered with edgeR’s filterByExpr and normalized with calcNormFactors. We then merged data sets containing similar samples using the ribiosNGS package mergeDGEList function. Expression data were then log-transformed using limma’s voom function. The variation introduced due to the inclusion of different experiments was accounted for using removeBatchEffect with groups of highly similar samples across experiments assigned to eliminate covariate batches. Once four larger data sets were created, they were large enough to integrate with Seurat FindIntegrationAnchors and IntegrateData. The remainder of the analysis in Seurat involved scaling data, finding clusters, differential expression and visualizing the clusters with UMAP.
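Two of the pre-integration steps described above, log transformation of counts and removal of between-experiment shifts, can be sketched in simplified form. The transform is a log2 counts-per-million with a small offset, and the batch step is plain per-batch mean centering. The latter is a deliberately reduced stand-in for a linear-model batch adjustment: it does not protect biological covariates and is not the authors' procedure. Function names and toy values are hypothetical.

```python
import math

def log_cpm(counts, prior=0.5):
    """Convert one sample's gene counts to log2 counts per million.

    A small prior count avoids taking the logarithm of zero.
    """
    library_size = sum(counts.values())
    return {
        gene: math.log2((c + prior) / (library_size + 1.0) * 1e6)
        for gene, c in counts.items()
    }

def centre_batches(values, batches):
    """Remove per-batch mean shifts for one gene, keeping the overall mean.

    values:  expression of a single gene across samples
    batches: batch (experiment) label of each sample, in the same order
    """
    grand_mean = sum(values) / len(values)
    by_batch = {}
    for v, b in zip(values, batches):
        by_batch.setdefault(b, []).append(v)
    batch_mean = {b: sum(vs) / len(vs) for b, vs in by_batch.items()}
    return [v - batch_mean[b] + grand_mean for v, b in zip(values, batches)]

# Hypothetical toy data: batch "b" is shifted upwards by 10 units.
print(log_cpm({"g1": 1, "g2": 2}))
print(centre_batches([1.0, 3.0, 11.0, 13.0], ["a", "a", "b", "b"]))
```

Mean centering is only safe when the batches contain comparable sample types, which is consistent with the restriction of this step to groups of highly similar samples.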

Single-cell sequencing

Count tables for single-cell sequencing were directly downloaded where available or generated from FASTQ files using HTSEQ2 aligned to the human genome. First, individual data sets were processed using Seurat package functions to clean and normalize data. Single-cell sequencing Seurat objects were integrated with Seurat FindIntegrationAnchors and IntegrateData. The combined object was used for the remainder of the analysis in Seurat, which involved scaling data, finding clusters, differential expression and visualizing the clusters with UMAP.