Introduction

Over the last two decades, transcriptional profiling methods have evolved from targeted quantification of one or a few transcripts (e.g., qRT-PCR) to unbiased, simultaneous profiling of thousands of transcripts (e.g., microarrays and RNA-seq). Targeted approaches such as qRT-PCR are still commonly used, as they are rapid, affordable, and do not require bioinformatics expertise. However, next-generation sequencing–based techniques, mainly RNA-seq, have quickly become a powerful alternative to the conventional techniques. RNA-seq allows profiling of the whole transcriptome and can therefore reveal alterations across entire signaling networks and potentially lead to the identification of novel genes of significance that could not be predicted a priori. With the availability of a large array of wet-lab and dry-lab tools, RNA-seq has become one of the standard techniques in molecular biology research.

Since the initial application of next-generation sequencing technology to complementary DNA (cDNA) [1,2,3], RNA-seq has been widely used in skeletal biology research. Today, RNA-seq continues to hold significant potential to answer some of the pressing questions regarding skeletal development and disease. The origin of skeletal tissues, such as bone, cartilage, ligament, and tendon, remains incompletely understood with respect to the mesenchymal cell populations involved in their development. At the same time, our understanding of prevalent skeletal disorders, such as osteoporosis and osteoarthritis, is limited in terms of the molecular events underlying disease initiation, progression, and treatment with existing therapies. RNA-seq has provided insight into some of the aforementioned issues in studies evaluating clinically available and experimental therapies in animal models, and will continue to do so with the emergence of advanced techniques such as single-cell RNA-seq (scRNA-seq) and spatial transcriptomics.

In this paper, I will review the typical RNA-seq workflow commonly used to evaluate messenger RNA (mRNA) expression, while highlighting the specific challenges associated with studying skeletal tissues and the solutions that have been proposed in the literature. I will provide a brief overview of recent studies on bone, cartilage, ligament, and tendon tissues that benefited from RNA-seq analyses. I will also review more recently developed technologies such as scRNA-seq, whose applications have so far been limited but which hold potential to provide deeper insight into the biology of the skeleton.

Generation and Analysis of RNA-seq Data

A typical RNA-seq experiment has two components: library preparation (wet-lab) and computational data analysis (dry-lab). For the preparation of high-quality RNA-seq libraries, high-quality RNA samples are required. The use of degraded RNA results in uneven coverage of transcript sequences with a bias towards the 3′ untranslated region (UTR), and the complexity of the transcriptome is reduced when the degradation is severe [4]. A standard method for determining RNA quality is to calculate the RNA integrity number (RIN, within a range of 0 to 10), based on the length distribution of RNA transcripts and the relative abundances of 18S and 28S ribosomal RNA in each specimen. However, isolation of RNA with sufficient quality (i.e., high RIN) and quantity can be challenging when dealing with matrix-rich skeletal tissues, especially those from small animal models. Post-mortem RNA preservation techniques (e.g., RNAlater) that can chemically stabilize highly cellular tissues such as brain, liver, or kidney are not necessarily compatible with skeletal tissues [5]; these solutions are not capable of fully penetrating the extracellular matrix to reach the cells, especially in the case of mineralized tissues. Alternatively, inactivation of RNA-degrading enzymes by rapidly cooling tissue specimens following harvest can be a viable approach [6, 7]. This problem has been recognized in the field of skeletal biology, and multiple tissue-specific RNA isolation protocols have been described for bone, cartilage, ligament, and tendon [8,9,10, 11•, 12, 13•, 14•,15,16,17]. Laser capture microdissection (LCM), which allows extraction of cells from precise anatomic locations, typically yields degraded RNA. However, specialized approaches for sequencing LCM-derived RNA have also been described [18,19,20].

A typical library preparation pipeline involves enrichment of mRNA (or conversely, depletion of ribosomal RNA), fragmentation, reverse transcription of mRNA, double-stranded cDNA generation, blunting of the 5′ and 3′ ends of the cDNA molecules, ligation of barcoded adapters (which allow multiple samples to be combined in downstream sequencing steps), amplification, and purification of the cDNA libraries (Fig. 1). Pooled libraries can then be sequenced with single- or paired-end settings; in other words, either one read or two paired reads, each typically 50–100 bp long, can be generated from each sequenced fragment.

Fig. 1

The typical RNA-seq workflow entails mRNA isolation from cells or tissues, fragmentation, reverse transcription, barcoding, and amplification of cDNA. The short reads generated using the cDNA libraries are mapped to a reference genome, counted, and normalized to gene-specific expression data. These data can then be utilized in differential expression analyses comparing two or more groups of libraries, or identification of novel genes or pathways of interest.

The appropriate number of reads per library depends on the desired downstream application; however, for standard mRNA expression analyses, 10–25 million reads per library have been reported to be sufficient to detect significant differences between treatment and control groups in skeletal biology studies on animal models [21••, 22]. Importantly, the number of genes whose expression can be reliably detected with RNA-seq depends on the transcriptional complexity of the source tissue. While evaluating multiple mouse tissues with RNA-seq, we have previously shown that fewer than 5000 transcripts can be detected above the detectability threshold (RPKM > 5) for skeletal muscle and whole blood, whereas > 8000 transcripts can be detected above the same threshold in bone marrow and > 9000 transcripts in long bone tissue [21••]. Furthermore, increasing the number of sequencing reads does not necessarily lead to a meaningful increase in the number of genes whose expression is detected. Conversely, when gene expression differences between two or more groups are sought, increasing the number of replicates in each group can substantially enhance the statistical power of the experiment, and might help detect subtler changes in transcriptional activity [23, 24]. So how many biologic replicates does one need in an experiment? The answer depends on the expected variability in each group (which can be lower when testing a cell line vs fresh frozen tissue) and the desired power (e.g., detecting threefold changes in expression vs 1.5-fold). Power analysis tools for RNA-seq experiments have been developed [25,26,27,28]. Furthermore, whole tissue RNA-seq experiments on long bone and ligament tissue have reported the detection of gene expression changes as low as twofold with n = 6–8 biologic replicates [21••, 22, 29]. With cultured cells, where the variability within each biologic group might be limited relative to fresh frozen tissue, it could be possible to attain a similar level of statistical power with fewer replicates.
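The dependence of replicate number on variability and target fold change can be illustrated with a rough calculation. The sketch below is not one of the dedicated RNA-seq power tools cited above; it assumes a simple two-group comparison of a single gene on the log2 scale with a normal approximation, and it ignores multiple-testing correction and count-based dispersion. The function name `replicates_per_group` and the standard deviation of 0.5 log2 units are hypothetical choices made purely for illustration.

```python
from math import ceil, log2
from statistics import NormalDist


def replicates_per_group(fold_change, sd_log2, alpha=0.05, power=0.8):
    """Approximate replicates per group for a two-group, single-gene comparison.

    Assumes log2 expression is roughly normal with the same standard
    deviation (sd_log2) in both groups. This is a simplification and does
    not account for multiple testing or count-based dispersion models.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    effect = abs(log2(fold_change))     # effect size in log2 units
    return ceil(2 * ((z_alpha + z_beta) * sd_log2 / effect) ** 2)


# Hypothetical within-group variability of 0.5 log2 units.
n_threefold = replicates_per_group(fold_change=3.0, sd_log2=0.5)
n_subtle = replicates_per_group(fold_change=1.5, sd_log2=0.5)
print(n_threefold, n_subtle)
```

Under these assumptions, the subtler 1.5-fold change calls for several times more replicates than the threefold change, and lowering `sd_log2` (as might be expected for cultured cells) reduces the required number in both cases.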

Following quality control of sequencing reads (FastQC is a commonly employed tool for this [30]), the first step of bioinformatics analysis is the alignment of the reads to a reference genome. Using paired-end and longer reads can increase the specificity of alignment, but even with 50 bp single-end reads, high-quality RNA-seq libraries can generate reads ~ 99% of which can be successfully aligned (~ 80–85% uniquely) to a position in the well-defined mouse or human genomes [19, 31, 32]. Then, the mapped reads are annotated; in other words, reads that correspond to the exonic sequences of each gene are identified and gene-specific read counts are calculated (Fig. 1). In experiments where unconventional model organisms are utilized, familiarity with the genetics of the model organism might be helpful, as available sequence and gene annotation data may be limited. This problem could be circumvented with de novo transcriptome assembly tools, which do not require the existence of a reference genome, but come at the cost of additional computational effort [33]. Once total read counts per gene are calculated, these values can be normalized for downstream analyses in a number of ways. In order to rank genes based on their expression levels, reads/fragments per kilobase of transcript per million mapped reads (RPKM/FPKM) values are calculated by normalizing the read counts with respect to (1) the total number of reads in each library and (2) the length of the exonic sequence in each gene.
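The two normalization steps behind RPKM can be written out in a few lines. The following is a minimal sketch, assuming that the library size is approximated by the sum of the gene-level counts (in practice, the total number of mapped reads is typically used); the gene names, counts, and exonic lengths are made up for illustration.

```python
def rpkm(counts, exon_lengths_bp):
    """Compute RPKM for every gene in one library.

    counts: dict mapping gene -> read count.
    exon_lengths_bp: dict mapping gene -> total exonic length in bp.
    Library size is approximated here by the sum of gene-level counts.
    """
    total_reads = sum(counts.values())
    return {
        # reads / (length in kb) / (library size in millions)
        gene: count * 1e9 / (total_reads * exon_lengths_bp[gene])
        for gene, count in counts.items()
    }


# Hypothetical toy library; real libraries contain millions of reads.
counts = {"GeneA": 500, "GeneB": 1500, "GeneC": 8000}
lengths = {"GeneA": 1000, "GeneB": 3000, "GeneC": 4000}
values = rpkm(counts, lengths)
for gene, value in values.items():
    print(gene, round(value, 1))
```

In this toy example, GeneB has three times as many reads as GeneA but is also three times as long, so the two genes receive the same RPKM value, which is the purpose of the length normalization.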

For statistical evaluations where two or more groups of libraries are compared, more sophisticated approaches are required, which take the transcriptional complexity of libraries into account. edgeR and DESeq are two very commonly used R packages that offer advanced normalization and differential expression algorithms [34, 35]. In addition to these, an important consideration in RNA-seq data analysis is the correction for multiple hypothesis testing. Performing thousands of simultaneous statistical tests on gene expression leads to an increased number of incorrect significance calls (i.e., type I error). In order to address this problem, the concept of false discovery rate (FDR) [36] is incorporated into differential expression analysis packages (including edgeR and DESeq) and can be controlled through conservative adjustment of p values, which results in a reduction in the aforementioned error. More user-friendly software with a graphical user interface, such as Galaxy, is also available [37]. Finally, gene set enrichment analysis software can assist in evaluating differences between multiple groups of samples, especially when a large number of genes are found to be differentially expressed.
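To illustrate how adjusting p values controls the FDR, the sketch below implements the Benjamini-Hochberg procedure, a widely used FDR-controlling method. It is shown as an illustrative assumption rather than as the exact routine used inside any particular package, and the p values are invented for the example.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that the
    # adjusted values never decrease with increasing rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p values for five genes.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = benjamini_hochberg(p_values)
significant = [q <= 0.05 for q in adjusted]
print(adjusted, significant)
```

Four of the five raw p values fall below 0.05, but only the first two remain below that threshold after adjustment, which shows how the correction trims borderline significance calls.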

Applications of RNA-seq on Skeletal Tissues

As an unbiased molecular profiling method, RNA-seq has been applied to a very diverse set of problems in the field of skeletal biology. The broad scope of the studies reviewed herein demonstrates the power and range of RNA-seq, and that it is widely utilized as a molecular analysis tool by skeletal researchers.

Recent studies utilizing RNA-seq have revealed specific transcriptional changes in osteoblasts during differentiation towards an osteocyte-like phenotype. These studies highlighted the role of epigenetic events in osteocytogenesis, such as alterations in vitamin D receptor binding [38•, 39•], and determined the stage-specific transcriptomes of differentiating calvarial osteoblasts in vitro [40]. Further work on the epigenome of differentiating osteoblasts through combined ChIP-seq and RNA-seq analyses identified Ezh2 as a negative regulator of osteoblast maturation and skeletal development [41, 42]. RNA-seq–based transcriptional profiling also led to the identification of novel molecules involved in PTH signaling such as Cdc73 [43], and verified the similarity of molecular changes induced by intermittent PTH treatment and salt-inducible kinase (SIK) inhibition in osteocyte-like Ocy454 cells [44]. Studies on mice with altered Lrp5-mediated Wnt signaling have also identified novel transcripts (in addition to Col1a1 and Bglap) whose abundance in long bone tissue correlates with Wnt signaling activity [45]. Altogether, these findings indicate that transcriptome profiling with RNA-seq has the power to identify novel molecules involved in bone cell differentiation, as well as anabolic bone formation induced by PTH and Wnt signaling.

One particular area of bone biology where RNA-seq has been used in pursuit of novel genes and mechanisms is bone mechanotransduction. That bone tissue remodels itself and positively responds to mechanical loading has been known for a long time [46], and canonical Wnt signaling has been identified as a key molecular event in bone tissue’s ability to respond to mechanical loads [47, 48]. With the ability to screen the entire transcriptome in an unbiased manner, a number of studies evaluated mechanically loaded long bones [49••, 50], osteocyte-like MLO-Y4 cells stimulated with fluid flow [51], and bone marrow stem cells subjected to microgravity [52], in order to determine transcriptome-wide changes in gene expression in response to alterations in the mechanical environment. These studies identified time- and compartment-dependent changes in tissue-level gene expression in bone, while verifying the involvement of Wnt signaling through changes in the expression of canonical and non-canonical Wnt ligands. However, novel genes and signaling events outside the Wnt pathway have yet to be definitively associated with mechanotransduction in bone cells. Further validation experiments will be necessary to demonstrate the mechano-responsiveness of any gene that is identified in future RNA-seq experiments, such as those performed on mice with Lrp5 mutations [53, 54] that demonstrate inhibited or enhanced bone formation following mechanical stimulation.

Unlike that of bone tissue, the cellular and genetic make-up of tendons and ligaments remains largely uncharacterized. Transcriptional profiling has therefore been utilized in order to better understand the development of connective tissues, as well as disease- or trauma-induced changes. Specifically, the involvement of mTORC1 signaling [55] and the transcription factor Foxf2 [56] in regulating mouse tendon development has been shown with RNA-seq. A large body of literature in ligament research focuses on anterior cruciate ligament (ACL) injuries, as they tend to trigger post-traumatic osteoarthritis (PTOA). RNA-seq experiments on torn ACL tissue from patients have revealed that the extent of injury to the joint, specifically the presence of meniscal tears, significantly influences the transcriptome of damaged ACL cells [57]. Furthermore, work on mouse models shows that genetic background has a significant effect on the severity of the PTOA phenotype, which appears to be correlated with inflammatory cytokine expression in damaged ACL tissue [58, 59]. Experiments on large animal models of ACL transection also show that dramatic transcriptional changes occur in several joint tissues, including the synovium, ACL, and articular cartilage, in a time-dependent manner [22, 29, 60]. Consistent with studies on mouse and human patient-derived tissues, the aforementioned experiments identified transcriptional changes in inflammation and cell cycle–related genes in multiple joint tissues.

RNA-seq has revealed novel aspects of transcriptional regulation of growth plate cartilage development as well. Specifically, in the growth plate, RNA-seq experiments suggested that PTPN11/SHP2 loss results in disorganization of the hierarchy of growth plate chondrocytes [61]. Disruption of epigenetic regulators alters the transcriptome of chondrocytes, leading to an osteogenic gene expression repertoire with the loss of Ezh2 [42] and reduced expression of chondrogenic genes (including Acan and Sox9) with the loss of Kdm6b [62]. Multiple studies identified novel targets of the transcription factors Sox9 and Pitx1 in chondrocytes [63,64,65]. RNA-seq has also identified disease-related changes in articular cartilage, specifically within the context of osteoarthritis. Comparative evaluation of damaged and intact cartilage specimens from patients undergoing total joint replacement surgery, as well as assessment of post-mortem tissue and primary chondrocytes, has revealed alterations in the cartilage transcriptome, specifically in the expression of cartilage-associated genes Sox9, Col11a2, Acan, and other novel transcription factors [66]. A recent study by Ji et al. utilized a scRNA-seq approach and identified multiple transcriptionally distinct chondrocyte populations in diseased cartilage tissue [67].

Regardless of the tissue of interest, a common challenge in most RNA-seq studies in skeletal biology is the inherent cellular heterogeneity. When a transcriptional change is identified between two biologic groups, is it due to changes in the relative quantity of a specific cell type, or changes in the transcriptomes of all cells? If specific markers for cells of interest are available, flow cytometry might offer a potential solution to this problem, with the caveat that cells must undergo additional processing such as enzymatic and/or mechanical dissociation from the source tissue. scRNA-seq, which allows transcriptional profiling of individual cells, will likely offer solutions to this persisting problem of cellular complexity. A better understanding of the cellular complexity of skeletal tissues will likely be possible in the near future, through the combined use of complementary methods such as scRNA-seq, flow cytometry, and immunohistochemistry-based lineage tracing.
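The ambiguity described above can be made concrete with a small worked example. The sketch below models the bulk expression of a single gene as the fraction-weighted average of its expression in each cell type; the cell type names, proportions, and per-cell expression values are entirely hypothetical and serve only to show that a shift in tissue composition alone can produce an apparent change in bulk expression.

```python
def bulk_expression(fractions, per_cell_expression):
    """Bulk expression of one gene as a fraction-weighted mean over cell types."""
    return sum(fractions[cell_type] * per_cell_expression[cell_type]
               for cell_type in fractions)


# Hypothetical per-cell expression of one gene; identical in both groups,
# i.e., no cell changes its own transcriptome.
per_cell = {"osteoblast": 100.0, "marrow_cell": 5.0}

# Only the relative abundance of the two cell types differs between groups.
control = bulk_expression({"osteoblast": 0.10, "marrow_cell": 0.90}, per_cell)
treated = bulk_expression({"osteoblast": 0.30, "marrow_cell": 0.70}, per_cell)
fold_change = treated / control
print(control, treated, round(fold_change, 2))
```

Here the bulk measurement more than doubles between groups even though per-cell expression is unchanged, which is why bulk RNA-seq alone cannot distinguish a change in cell composition from a change in transcriptional state.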

Single-Cell RNA-seq and New Possibilities in Skeletal Biology

scRNA-seq offers the ability to profile the transcriptomes of individual cells, and therefore makes it possible to evaluate the cellular diversity of complex tissues such as bone and bone marrow. The advances in the field of scRNA-seq have been accelerated by the advent of droplet-based single-cell capture technologies [68••], followed by the rapid emergence of several platforms that facilitate cell capture and single-cell mRNA library preparation. The commercially available technologies can be classified into two categories: droplet-based cell capture techniques (originally termed “Drop-seq” by the inventors of the technique [68••]) and plate-based capture and sequencing techniques [69, 70]. While the former technology allows processing of tens of thousands of cells from a single specimen, the number of transcripts detected per cell and the sequence coverage of each transcript (heavily enriched at the 3′ UTR) are limited. The plate-based techniques, on the other hand, rely on the separation of individual cells into wells of 96- or 384-well plates, and are therefore limited to processing a few hundred cells at a time. However, with plate-based techniques, it is possible to sequence the transcriptome of each cell in greater depth, and potentially evaluate the whole sequence of each transcript, rather than solely the 3′ UTR.

The applications of scRNA-seq have been relatively limited in the field of skeletal biology so far; however, this is likely to change thanks to the availability of the technique through various commercial platforms that utilize the aforementioned approaches. One of the biggest challenges associated with scRNA-seq remains its cost, which can easily surpass $1000 per specimen for cell capture and library preparation alone. However, with the advent of novel multiplexing technologies such as CITE-seq (which allows antibody-mediated labeling of cells, and therefore multiplexing of cells of distinct origin prior to processing [71••]), scRNA-seq will likely become affordable and accessible to a growing community of scientists.

The earliest studies in the field of scRNA-seq focused on the biology of developing organisms and tissues, in an effort to better characterize the biology of actively differentiating cell populations [72]. Starting with Drop-seq, it became apparent that novel cell types could be transcriptionally defined in tissues with high cellular diversity, such as the retina [68••, 73]. Debnath et al. have shown that multiple mesenchymal cell populations in the long bone periosteum are labeled with Ctsk expression, which until recently was thought to be unique to osteoclasts [74••]. scRNA-seq experiments have identified four distinct populations within the aforementioned cell pool, one of which exhibits characteristics of stem cells. Going forward, scRNA-seq will likely improve our understanding of the biology of the developing skeleton (as the mesenchymal origins of osteoblasts remain incompletely characterized, and the same is true for connective tissues such as tendons and ligaments, and their interfaces with bone and muscle), and also the changes that an injured tissue undergoes during repair (such as a bone fracture callus, a torn ligament, or articular cartilage at the onset of post-traumatic osteoarthritis). scRNA-seq could be utilized in multiple different ways to gain further insight into these problems. It remains to be seen if scRNA-seq can identify gene expression changes in individual skeletal cell populations following perturbation. The number of genes detected per cell is typically limited compared to conventional RNA-seq, which may limit the statistical power of the assay. However, scRNA-seq has been utilized in identifying novel cell types that only arise in a disease state (for example, during cystic fibrosis in the lung [75, 76]). A recent study by Ji et al. has reported the presence of multiple transcriptionally distinct chondrocyte populations in diseased human articular cartilage [67]; however, it is unclear if any of these populations are unique to the osteoarthritic phenotype. One potential approach to delineating the origin and fate of cell populations of interest could be to utilize classic lineage tracing approaches, in combination with flow cytometry and scRNA-seq. Specific cell populations could be marked in a mouse model using the inducible Cre-recombinase system, after which the labeled cells and their descendants are harvested, sorted, and finally evaluated with scRNA-seq. Takahashi et al. [77•] and Mizuhashi et al. [78•] have recently utilized this strategy to label and describe mesenchymal cell populations in the dental follicle and long bone growth plate, respectively. Whatever the approach might be, scRNA-seq will surely enhance the molecular biology toolset of skeletal researchers and help characterize the complexity of tissues at a resolution that was not possible until recently.

Conclusions

RNA-seq has facilitated new discoveries in skeletal biology and will likely lead to more in the future. Availability of single-cell RNA-seq will accelerate these discoveries, especially as it becomes more affordable and accessible. As transcriptome profiling methods rapidly advance, the remaining challenges lie in the interpretation, rather than the generation, of data. In bulk RNA-seq studies, wherein thousands of genes are simultaneously evaluated, multiple results may be generated that fit the predicted paradigm just by chance. For this reason, follow-up validation experiments that use orthogonal techniques are crucial. Rather than reaching definitive conclusions, the objective of a good RNA-seq experiment is to generate new, testable hypotheses. Therefore, one design criterion when planning a transcriptome profiling experiment might be to ensure the availability of in vivo or in vitro platforms, wherein genes and pathways could be conveniently manipulated within the context of the biologic question of interest. More advanced techniques, such as scRNA-seq, still pose a number of technical challenges, such as isolation of a sufficient number of viable cells that are accurately representative of the cellular diversity of tissues in vivo. Furthermore, while multiple data analysis tools are available, the analysis pipelines are yet to be standardized, especially towards identifying novel cell populations and cell type–specific gene expression changes. Nevertheless, the initial examples of scRNA-seq have shown that it can significantly contribute to exploratory efforts in skeletal development and disease. More studies will surely follow that will unravel the cellular complexity of the skeleton.