Introduction

Since the complete sequencing of mammalian genomes [19, 31, 62, 67] and the advent of the post-genomic era, one of the main objectives of biology has been to determine the function of all genes. The community of physiologists should play a leading role in this challenge, insofar as they develop and use tools adapted to large-scale analysis. This is now possible because the post-genomic era has brought not only the sequences of entire genomes but also sophisticated methods to study the expression of these genomes [49, 60]. The aim of this short review is to discuss the relevance, feasibility and impact of transcriptome analysis in renal physiology. We first make some general considerations about the nature of transcriptomes, their methods of analysis and the specific requirements of renal physiology and then, based on our own experience, consider some foreseeable contributions of transcriptome analysis to renal physiology and pathophysiology.

Rationale for transcriptome research in renal physiology

In contrast to cell biology, which addresses the mechanisms of universal cell functions that allow cell division, nutrition and death, physiology focuses mainly on the cell-specific functions that emerge with cell differentiation and organogenesis during the development of higher multicellular organisms. This cell specification results from the expression of specific sets of genes in different cell types and tissues. Thus, establishing the catalogue of the genes that are expressed (i.e. the transcriptome) in the different tissues or cell types of multicellular organisms appears to be a prerequisite for analysing the functional specificities of these cellular systems. Until recently, functional specificities of tissues were addressed through the study of individual genes, which often required, as a first step, the identification and cloning of these genes. Not only has this molecular identification step become unnecessary with the availability of complete genomes, but these protein-by-protein studies have also shown their limits; indeed, the functional properties of individual proteins may vary according to their environment, in particular the nature of the proteins with which they associate to build supramolecular complexes. This also underlines the value of cataloguing the whole protein equipment of individual cells.

Similarly, it is now well established that physiological regulations are pleiotropic processes that involve a large number of molecular partners. As a consequence, it is not only necessary to know the molecular equipment of specific cells, but also to get access to the variations of this equipment under different functional states.

Analysis of transcriptomes for physiological purposes may also have consequences for genome studies. Firstly, it may not only provide experimental confirmation of gene predictions but also lead to the discovery of new genes that remain undetected by genome analysis. For example, although the complete genome sequence of Saccharomyces cerevisiae was available [20], analysis of its transcriptome allowed the discovery of several hundred transcripts that were not predicted by computerised genome annotation [61]. Secondly, it may help to discover the function of orphan genes, as physiologists are well qualified to formulate hypotheses regarding the function of a gene in view of its tissue distribution and/or the regulation of its expression, and to verify these hypotheses experimentally.

Nature and properties of transcriptomes

Mammalian genomes are currently considered to consist of ≈30,000 genes [19, 31, 62, 67]. The exact number of genes is not known because, although complete genome sequences contain all the information about genes, the exact delineation of genes, i.e. the exon/intron limits and the gene boundaries, is not precisely known. These ≈30,000 genes can be transcribed into >100,000 mRNAs, as most mammalian genes consist of numerous exons that generate a great number of splice variants. Each cell likely contains 300,000 mRNA molecules, which correspond to ≈15,000 different molecular species [24]. Transcripts are 1.5 kb long on average, and therefore a cell transcriptome corresponds to ≈5×10⁸ bases. In addition, the abundance of each transcript varies from a few copies to >10,000 copies per cell [24]. Thus, methods for transcriptome analysis must combine the capacity to analyse a large number of different molecular species with a dynamic range greater than three orders of magnitude. Unlike the genome, which, except for acquired mutations, is identical in all somatic cells and stable throughout life, cell transcriptomes vary not only from cell to cell but also over time with functional adaptations. In other words, transcriptomes provide a quantitative and dynamic view of genome expression and are therefore related to the functional potential of the cell.

Proteomes are wider than transcriptomes, since each transcript may encode several proteins and each protein exists in several different functional states. From a physiological point of view, it is often argued that transcriptome analysis is not relevant, because only proteins underlie cell functions and transcripts may not be translated. Unfortunately, several factors still restrict proteome analysis. Although new methods such as Isotope Coded Affinity Tagging (ICAT) [22] or protein microarrays [52] have been developed recently, proteome analysis today remains based mainly on separation of proteins by 2D gel electrophoresis, followed by characterisation of the spots through proteolysis, sequencing and/or mass spectrometry. As yet, 2D gel electrophoresis suffers from several limitations: (1) it is not an exhaustive method, as only abundant proteins can be detected (a few hundred per sample); (2) membrane proteins are hardly detectable, a major limitation in renal physiology, since the main functions of renal cells rely on ion- and water-transporting proteins with multiple membrane-spanning domains; and (3) stringent precautions are required to achieve a sufficient degree of reproducibility for comparative studies.

Besides its wider spectrum and relative ease of use compared with proteome studies, transcriptome analysis presents an additional advantage: it is becoming quite clear that, beyond their role as vectors of information from genes to proteins, transcripts play important functions independently of their own translation. As a matter of fact, a growing body of work demonstrates regulation of transcript expression via endogenous antisense transcripts [32]. Thus, it seems legitimate to carry out studies on transcriptomes, especially as long as proteome analysis remains technically limited and demanding.

Prerequisites for transcriptome analysis in kidney

It has been recognised for decades that understanding kidney functions and dysfunctions requires circumventing the cellular heterogeneity of the organ [38]. Even when considering exclusively the glomerulo-tubular complexes of the kidney, this heterogeneity is high: each nephron consists of successive tubule segments with specific functions, and some segments consist of several intermingled cell types with distinct functions. Two strategies have been developed to overcome this heterogeneity: the physical separation of glomeruli and nephron subsegments by surgical or immunological dissection, and the establishment of cellular models such as primary cell cultures or immortalised cell lines. When physiological problems are addressed, only native tissue is relevant. Indeed, despite their advantages for addressing questions relevant to cell biology, primary cell cultures as well as established cell lines undergo an important loss of function (both qualitative and quantitative) and some gain of function through dedifferentiation/transdifferentiation, which rules out their use for physiological purposes. For example, comparison of the transcriptomes of native mouse collecting ducts [7] and immortalised collecting duct principal cells [45] reveals marked differences, although the same method was applied for analysis (Table 1).
As a matter of fact, (1) mRNAs for specific markers of principal cells such as aquaporin 2 (the vasopressin regulated water channel) [47] or type 2 11β-hydroxysteroid dehydrogenase (an enzyme that contributes to the aldosterone sensitivity of the collecting duct) [1] are 200- and 20-fold less abundant in immortalised principal cells than in collecting ducts; (2) conversely, mRNAs for pendrin, a marker of collecting duct β intercalated cells normally absent from principal cells [46], and for urokinase, which is preferentially expressed in thick ascending limbs and proximal tubules [65], are expressed at relatively high level in cultured cells; and (3) transcripts expressed at high level (≥1‰ of all mRNAs) include 40 transcripts for ribosomal proteins in immortalised cells versus two transcripts in native collecting ducts. This index of high protein metabolism may imply that immortalised cells are not fully quiescent and fully differentiated.

Table 1 Comparative expression of a selection of genes in mouse native collecting duct and immortalized collecting duct cells. This table compares the abundance of serial analysis of gene expression mRNA tags in two libraries constructed from microdissected mouse outer medullary collecting ducts (OMCD) and from a mouse kidney cortical collecting duct cell line (mpkCCD), after normalization to 20,000 tags in the two libraries. Three sets of transcripts were selected for comparison: (1) transcripts corresponding to functional cell markers of collecting duct principal cells, including aquaporins 2 and 3 (AQP2 and AQP3), 11β-hydroxysteroid dehydrogenase type 2 (11OHD2), the apical potassium secreting channel (ROMK), and two subunits of Na,K-ATPase and of the apical amiloride-sensitive sodium channel (ENaC); (2) transcripts corresponding to “housekeeping” functions equally expressed in OMCDs and mpkCCD cells; and (3) transcripts normally absent from collecting duct principal cells and/or expressed at higher level in mpkCCD cells than in OMCD. Data are from Cheval et al. [7] and Robert-Nicoud et al. [45]

This absolute necessity to analyse transcriptomes on native kidney cells generates a major constraint: the quantities of tissue obtainable by either surgical or immunological dissection are tiny relative to the amounts required by classical transcriptome analysis methods. As a matter of fact, a 1-mm long segment of nephron contains approximately 500 cells, i.e. 0.1 ng of poly(A) mRNA, whereas classical transcriptome methods such as DNA microarrays or chips and serial analysis of gene expression (SAGE) [60] require approximately 2–10 µg of poly(A) mRNA. However, it is now possible to overcome this problem in part by using down-scaled adaptations of SAGE. Two types of adaptations have been introduced that reduce the required amount by 250- to 1,000-fold: (1) addition of an early amplification step in the library generation procedure in SAR-SAGE [63], SAGE-Lite [42] and aRNA-longSAGE [25], and (2) improvement of the yield of mRNA extraction and cDNA synthesis in SAGE adaptation for downsized extracts (SADE) [64], miniSAGE [71] and microSAGE [9]. SADE was specifically developed for transcriptome analysis on ≈100 mm of nephron segments, an amount easily accessible through surgical microdissection, and all results presently available on the renal transcriptome at the level of individual structures were obtained with this method.
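The order of magnitude quoted for the poly(A) mRNA content of a nephron segment can be checked with a short calculation. The sketch below is illustrative only: it combines figures given in this review (≈500 cells per mm of tubule, ≈300,000 mRNA molecules per cell, 1.5 kb average transcript length) with an assumed average mass of ≈340 g/mol per RNA nucleotide, a textbook approximation that does not come from the source. The function name is hypothetical.

```python
AVOGADRO = 6.022e23               # molecules per mole
NUCLEOTIDE_MASS_G_PER_MOL = 340   # ASSUMPTION: average mass of one RNA nucleotide

def mrna_mass_ng(cells, mrna_per_cell=300_000, mean_length_nt=1_500):
    """Approximate poly(A) mRNA mass (in ng) contained in `cells` cells."""
    nucleotides = cells * mrna_per_cell * mean_length_nt
    grams = nucleotides * NUCLEOTIDE_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e9  # convert g to ng

per_mm = mrna_mass_ng(cells=500)  # 1 mm of nephron, about 500 cells
print(f"1 mm of nephron: ~{per_mm:.2f} ng of mRNA")
print(f"100 mm of nephron: ~{100 * per_mm:.0f} ng of mRNA")
```

Under these assumptions the estimate falls close to the ≈0.1 ng per mm quoted above, that is, more than four orders of magnitude below the microgram amounts needed by the classical methods.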

Principle, potentialities and limits of SAGE

SAGE consists in characterising each cDNA obtained by reverse transcription of tissue mRNAs by a short (10 bp) informative nucleotide sequence, or tag. Several transcript-specific tags are concatenated into long DNA molecules, which are cloned and sequenced. Computer-assisted analysis of the sequences makes it possible to extract, classify and count the different tags, and database queries allow their identification. The method is quantitative, as the relative abundance of the different tags in the library reflects the abundance of the cognate transcripts in the biological sample.

Without going into the details of SAGE procedure, it is worth considering some aspects which have implications for data analysis.

Depth of analysis

Assuming that a cell contains 300,000 mRNA molecules, sequencing 30,000 tags allows analysis of 10% of the transcriptome: under this condition, transcripts present at 3,000–30 copies per cell (300–3 tags expected) are detected with a probability ≥95%, whereas transcripts present at three copies per cell will most often not be detected (26% probability of being detected).
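These detection probabilities can be reproduced with a minimal sketch, assuming that each sequenced tag is an independent draw from the mRNA pool (simple binomial sampling). That sampling model is an assumption consistent with the figures quoted, not a procedure stated in the text, and the function name is hypothetical.

```python
def detection_probability(copies_per_cell, tags_sequenced=30_000,
                          mrna_per_cell=300_000):
    """Probability of observing at least one tag of a given transcript."""
    fraction = copies_per_cell / mrna_per_cell   # share of the mRNA pool
    return 1.0 - (1.0 - fraction) ** tags_sequenced

for copies in (3_000, 30, 3):
    expected_tags = 30_000 * copies / 300_000
    p = detection_probability(copies)
    print(f"{copies:>5} copies/cell: {expected_tags:g} tags expected, "
          f"P(detected) = {p:.2f}")
```

With the default values, a transcript at 30 copies per cell is detected with a probability of about 0.95 and one at three copies per cell with a probability of about 0.26.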

Using the equation of Clarke and Carbon [8], one calculates that detecting a transcript present at one copy per cell with a probability of 95% would require sequencing 900,000 tags. Besides the cost of this procedure, increasing the depth of analysis beyond ≈50,000 tags per library is not a satisfactory way of increasing the sensitivity of SAGE. Indeed, SAGE is based on single-pass, single-strand DNA sequencing, a procedure with an error rate of ≤10⁻² (≤one mismatch per 100 nucleotides), meaning that up to 10% of the 10-bp tags contain a sequence error. Thus, for a tag counted 100-fold in a library, there are about 10 erroneous tags that differ from it by a single base (30 possible variants). As sequencing errors are random, the probability of observing the same error at the same site in two copies of the same tag is low (1/30), and therefore the large majority of erroneous tags due to sequencing errors are present only once in the library. In practice, this means that one can get rid of most sequencing errors simply by excluding from data analysis the tags counted only once. This would no longer hold for a tag counted 1,000-fold (i.e. the same tag obtained at a depth of analysis of 500,000 instead of 50,000 tags). Given these limitations, one must keep in mind that SAGE will not reliably detect transcripts expressed at low levels.
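The arithmetic behind these figures can be sketched as follows. The first function uses the Clarke and Carbon relation in its usual form, N = ln(1 − P)/ln(1 − f), where f is the fraction of the mRNA pool represented by the transcript. The other two assume independent errors at each base of the tag. Function names and default values are illustrative assumptions.

```python
import math

def tags_required(copies_per_cell, probability=0.95, mrna_per_cell=300_000):
    """Clarke-Carbon estimate of the number of tags to sequence:
    N = ln(1 - P) / ln(1 - f)."""
    fraction = copies_per_cell / mrna_per_cell
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))

def erroneous_tag_fraction(error_rate=0.01, tag_length=10):
    """Fraction of tags carrying at least one sequencing error,
    assuming independent errors at each position."""
    return 1.0 - (1.0 - error_rate) ** tag_length

def single_base_variants(tag_length=10):
    """Number of distinct tags differing from a given tag by one base
    (three alternative bases at each position)."""
    return 3 * tag_length

print(tags_required(1))                    # tags needed for one copy per cell
print(f"{erroneous_tag_fraction():.3f}")   # share of tags with an error
print(single_base_variants())              # possible single-base variants
```

With the default values these functions give roughly 900,000 tags, an erroneous-tag fraction just under 10% and 30 single-base variants, in line with the figures above.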

Tag identification

The specific assignment of a tag to a transcript relies on (1) the nucleotide sequence of the tag, as a 10-bp oligonucleotide theoretically discriminates >10⁶ different molecular species, and (2) the location of the tag relative to the poly(A) tail of the transcript. Indeed (Fig. 1), owing to the construction procedure, the tag corresponds to the 10 bp located downstream of the 3′-most cleavage site of a restriction enzyme (anchoring enzyme) with a 4-bp recognition site [usually NlaIII (CATG recognition site) or, in our SADE adaptation, Sau3A (GATC recognition site)]. Note that cDNAs lacking a recognition site for the anchoring enzyme do not generate tags and are therefore artefactually excluded from the transcriptome analysis.
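The tag definition above can be illustrated in silico. The following is a minimal sketch, not the software used by the authors: it takes a cDNA written 5′ to 3′ on the sense strand, locates the 3′-most anchoring enzyme site and returns the 10 bases that follow it. The function names and the toy sequences are hypothetical.

```python
def extract_tag(cdna, anchor_site="CATG", tag_length=10):
    """Return the tag located downstream of the 3'-most anchoring site,
    or None if the cDNA has no site or too few bases after it."""
    seq = cdna.upper()
    site_start = seq.rfind(anchor_site)      # 3'-most occurrence
    if site_start == -1:
        return None                          # no site: transcript yields no tag
    tag_start = site_start + len(anchor_site)
    tag = seq[tag_start:tag_start + tag_length]
    return tag if len(tag) == tag_length else None

def distinct_tags(tag_length):
    """Number of different sequences of a given length (4 bases per position)."""
    return 4 ** tag_length

# Toy sequences, made up for illustration only
print(extract_tag("AAACATGTTTTCATGACGTACGTACGGGAAAAAA"))            # NlaIII (CATG)
print(extract_tag("TTGATCAAACCCGGGTTTAA", anchor_site="GATC"))      # Sau3A (GATC)
print(distinct_tags(10), distinct_tags(17))
```

A 10-bp tag gives 4¹⁰ = 1,048,576 possible sequences, which is the >10⁶ figure quoted above. A cDNA without the recognition site returns `None`, mirroring the artefactual exclusion mentioned in the text.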

Fig. 1

Schematic localisation of a serial analysis of gene expression (SAGE) tag along cDNA. The bold line corresponds to the coding region, the AE boxes to the anchoring enzyme recognition sites [GATC in case of Sau3A utilised in SAGE adaptation for downsized extracts (SADE)]. The tag (XX . . . XX box) corresponds to the ten base pairs located downstream of the 3′-most anchoring enzyme site. Although in most cases the tag is located in the 3′UTR region (as shown on the diagram), it can also be within the coding region

Thus, tag identification through sequence matching in databases may lead to different classes of annotation: (1) cDNA or EST annotations (with or without a corresponding protein) for tags matching a cDNA or an EST sequence at the right position [note that this requires that authors have deposited in databases the full-length cDNA sequence including the poly(A) tail]; (2) unreliable match annotations for tags matching a cDNA or EST sequence whose position relative to the poly(A) tail cannot be determined; and (3) no match annotations for tags without a matching sequence in GenBank. In addition, cDNA and EST annotations may be ambiguous, as two different transcripts may correspond to the same tag. The efficiency of tag identification can be improved by using the longSAGE adaptation of SAGE, which theoretically generates 17-bp tags and therefore permits better discrimination among transcripts [48]. Furthermore, it allows identification of no match tags through comparison with genomic sequences.

SAGE tag identification has been facilitated by the development of facilities freely accessible on the Web such as SAGEmap (http://www.ncbi.nlm.nih.gov/SAGE/) or SAGEgenie (http://cgap.nci.nih.gov/SAGE) (note that presently SAGEgenie is restricted to the analysis of SAGE tags generated using NlaIII as anchoring enzyme).

Gene discovery

As already mentioned, sequencing of whole genomes is not sufficient to identify all the genes of a species and, a fortiori, all its transcripts. Because SAGE is based on sequencing of cDNA fragments, it offers the possibility of identifying unknown transcripts (the no match tags mentioned above). Interestingly, although the sequence information contained in a tag is limited, it has proved sufficient to permit PCR cloning of the cognate full-length cDNA [58].

Our laboratory currently masters both large scale microdissection of nephron segments and the SADE technique, allowing us to routinely generate transcriptomes from 500–1,000 microdissected segments of nephron. In the following, we will discuss the relevance and outcomes of recent findings obtained using this approach. Two applications have been considered: (1) the axial comparison of transcriptomes along the nephron to define the molecular basis of renal functional heterogeneity and (2) the physiological regulation and pathophysiological dysregulation of gene expression in specific nephron segments.

Segmental analysis of gene expression along the human nephron

Analysis of approximately 50,000 tags in each of eight substructures of the human nephron [glomerulus (Glom), proximal convoluted tubule (PCT), proximal straight tubule (PST), medullary thick ascending limb of Henle’s loop (MTAL), cortical thick ascending limb of Henle’s loop (CTAL), distal convoluted tubule (DCT), cortical collecting duct (CCD) and outer medullary collecting duct (OMCD)] produced >92,000 different tags, including >32,000 tags counted at least twice and therefore unlikely to correspond to sequence errors (see above) [5]. The data confirm the large dynamic range of gene expression, with (1) the most abundant tag counted >4,000-fold and the least abundant ones only once, and (2) the 92 most abundant tags (1‰ of all molecular species) accounting for >22% of all tags. With such an abundant source of data, the question arises of how to mine the relevant information. Two different, non-exclusive strategies can be used, according to the questions that are addressed: global, non-selective analysis of the data and gene-targeted analysis. Some examples of these strategies, and of the information that can be derived from them, follow.
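The summary statistics quoted here (number of distinct tags, tags counted at least twice, share of the most abundant 1‰ of species) and the normalisation of libraries to a common size are simple to compute from a table of tag counts. The sketch below is a hypothetical illustration on made-up counts; it is not the pipeline used to produce the published data.

```python
from collections import Counter

def summarise_library(tag_counts):
    """Basic descriptive statistics for a dict {tag: count}."""
    counts = Counter(tag_counts)
    total = sum(counts.values())
    distinct = len(counts)
    # Tags seen at least twice are unlikely to be sequencing errors
    counted_twice = sum(1 for c in counts.values() if c >= 2)
    # Share of all tags contributed by the top 1 per mille of species
    top_n = max(1, round(distinct / 1000))
    top_share = sum(c for _, c in counts.most_common(top_n)) / total
    return {"total": total, "distinct": distinct,
            "counted_twice": counted_twice, "top_share": top_share}

def normalise(tag_counts, library_size=50_000):
    """Rescale counts so that libraries of different sizes are comparable."""
    total = sum(tag_counts.values())
    return {tag: count * library_size / total
            for tag, count in tag_counts.items()}

# Toy library, made up for illustration only
toy = {"AAAAAAAAAA": 6, "CCCCCCCCCC": 3, "GGGGGGGGGG": 1}
print(summarise_library(toy))
print(normalise(toy))
```

Discarding singletons before any comparison follows the reasoning on sequencing errors given earlier. The choice of 50,000 as the common library size simply mirrors the normalisation used for the figures in this review.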

Clustering analysis of human renal SAGE libraries

As the least a priori-based method for exploiting transcriptome data, one can perform general clustering analysis of SAGE data, using software freely available on the Web [4, 13]. Such an analysis of human kidney data reveals several specific clusters of genes that roughly correspond to genes differentially expressed in the different nephron portions. On that basis, it is possible to establish kinship links among renal structures. As shown by the hierarchical tree at the top of Fig. 2, clustering analysis reveals the distinct positions of the Glom and of the PCT and PST versus the other tubular segments. It also shows a close relationship between the PCT and PST and between the CCD and OMCD. Unexpectedly, it indicates a closer relationship between MTAL and DCT than between MTAL and CTAL.

Fig. 2

Hierarchical clustering analysis of SAGE data for eight human nephron segment libraries. The dendrogram at the top represents relationships among SAGE libraries corresponding to eight substructures of the human nephron: Glom glomerulus, PCT proximal convoluted tubule, PST proximal straight tubule, MTAL medullary thick ascending limb of Henle’s loop, CTAL cortical thick ascending limb of Henle’s loop, DCT distal convoluted tubule, CCD cortical collecting duct and OMCD outer medullary collecting duct. Only the proximal tubule-specific cluster with continuous transition to the less specific genes also expressed in Glom (left panel), and in MTAL and CTAL (right panel) is included in the figure. Each row represents a gene/SAGE tag, whereas each column corresponds to a SAGE library/nephron portion. The expression level in each library is represented by colour, ranging from green for minimum expression to red for maximum expression, with black corresponding to the median expression level and grey boxes to the absence of tag count in a given library. The colour intensity represents the magnitude of the deviation from the median. At the right of each panel are shown, in red, the names of SAGE tag identifications corresponding to solute carrier(s) or other gene(s) implicated in tissue transport and permeability (the gene symbols are from the Human Genome Organization nomenclature, with usual alternative names in brackets); in blue, the names of genes with a more general function implicated in cell proliferation and differentiation; and in black, the tag pairs corresponding to the same or similar genes, serving as an internal control of the clustering procedure. Only genes with total tag counts of at least 40 are included in the analysis. The uncentred Pearson correlation was used for distance calculation, and average-linkage clustering was performed on logarithmically transformed, median-centred SAGE data, using Michael Eisen's software program Cluster, version 2.12, followed by visualisation of tree diagrams with TreeView, version 1.47 [13]

As an example of segment-specific gene clusters, Fig. 2 shows the genes displaying maximum expression in the PCT and PST. Moreover, there is a continuous transition from strictly proximal tubule-specific genes to less specific genes also expressed in Glom (at the bottom of the left panel) or in the MTAL and CTAL (right panel). Identification of tags in these proximal tubule-specific gene clusters confirms the preferential expression of solute carriers [organic anion transporter (OAT1, SLC22A6) [30] and sodium-dependent dicarboxylate transporter (NADC3, SLC13A3) [66]] and water channels (AQP1) [72] known to be expressed preferentially or exclusively in proximal tubules. Unexpectedly, a rather large proportion of proximal tubule-specific tags correspond to genes involved in general cell functions (shown in blue), including an established or predicted role in the regulation of cell cycle, proliferation and differentiation (C6ORF108, KIA1007, ADO37, MAX, DLEC1, ACY1), apoptosis (PDCD8), and stress responses (GPX3, GADD45A). These findings demonstrate that the axial heterogeneity of the kidney tubule concerns not only the expression of genes underlying the specific physiological functions of the different nephron segments, but also genes involved in cell division and differentiation (see below). Finally, it is interesting to note the close clustering of two tags with related sequences (gtccGTGctg vs gtccTGTctg). The first one was identified as polycystic kidney disease type I protein (PKD1) [23], whereas the second did not match any GenBank sequence. This may suggest the existence of a frequent, but as yet unidentified, polymorphism in this part of the gene, and it would be interesting to confirm the existence of this polymorphism and to evaluate whether or not it is associated with polycystic disease.

Results of clustering analysis are quite dependent on the choice of parameters and algorithms. It is noteworthy, however, that such clustering provides relevant information inasmuch as (1) it reveals specific expression of physiologically relevant genes in a given tissue, and (2) tags corresponding to the same gene (owing to splice variants and/or polymorphism), such as ALDOB or LOC348158, or to functionally related genes (such as BHMT and BHMT2) are closely segregated in the same clusters.
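To make the parameter choices concrete, the following pure-Python sketch implements the kind of procedure named in the legend of Fig. 2: log transformation and median centring of tag counts, a distance based on the uncentred Pearson correlation, and average-linkage agglomeration. It is a didactic re-implementation under stated assumptions, not the Cluster program itself. In particular, the pseudocount used to handle zero counts is an assumption of this sketch, and all names and toy profiles are hypothetical.

```python
import math
from itertools import combinations
from statistics import median

def log_median_centre(counts, pseudocount=1.0):
    """Log2-transform a profile and subtract its median.
    ASSUMPTION: a pseudocount is added so that zero counts can be logged."""
    logs = [math.log2(c + pseudocount) for c in counts]
    centre = median(logs)
    return [value - centre for value in logs]

def uncentred_pearson(x, y):
    """Correlation computed without subtracting the means (cosine of the angle)."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = (math.sqrt(sum(a * a for a in x))
                   * math.sqrt(sum(b * b for b in y)))
    return numerator / denominator if denominator else 0.0

def average_linkage(profiles):
    """Agglomerate {name: vector} profiles; return merges as (a, b, distance)."""
    clusters = {name: [name] for name in profiles}
    merges = []
    while len(clusters) > 1:
        def linkage(pair):
            # Mean pairwise distance between the members of two clusters
            a, b = pair
            dists = [1.0 - uncentred_pearson(profiles[m], profiles[n])
                     for m in clusters[a] for n in clusters[b]]
            return sum(dists) / len(dists)
        a, b = min(combinations(sorted(clusters), 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
    return merges

# Toy profiles, made up for illustration only
toy = {"a": [1, 2, 3], "b": [2, 4, 6], "c": [3, -1, 0]}
for left, right, dist in average_linkage(toy):
    print(left, right, round(dist, 3))
```

Changing the distance (for example to a centred correlation) or the linkage rule can alter the resulting tree, which is one way in which clustering results depend on the choice of parameters and algorithms.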

Search for expression of specific genes along the nephron

Another way of mining transcriptome data is to localise the expression of specific genes of interest along the nephron. The relevance of this approach rests on the confirmation by SAGE data of the known segmental expression of genes. As a matter of fact, the two panels at the top of Fig. 3 illustrate that the abundance of tags in the Glom and the different nephron segments is well correlated with the known expression of the related proteins, whether the proteins are expressed along the whole nephron or are specific to a single segment or a subgroup of renal structures. For example, tags of the α1, β1 and γ subunits of Na,K-ATPase are found along the whole nephron, but their abundance peaks in the thick ascending limb and DCT, where the functional expression of Na,K-ATPase is highest [27]. Similarly, tags for specific markers such as podocalyxin (Glom) [28], aquaporin 1 (AQP1, proximal tubule) [72], furosemide-sensitive Na,K,2Cl cotransporter (NKCC2, TAL) [70], thiazide-sensitive Na,Cl cotransporter (NCCT, DCT) [36] or aquaporin 3 (AQP3, collecting duct) [12] are exclusively detected in the appropriate libraries. The predictive value of tag abundance along the nephron for determining the profile of protein expression holds even for tags found at low levels in the SAGE database. For example, at the time of the cloning of the non-erythroid rhesus-associated proteins Rhcg and Rhbg [34, 35], SAGE data available in our laboratory indicated a preferential abundance of the corresponding tags in the distal nephron, a finding that was later verified both at the transcript level by RT-PCR on microdissected nephron segments and at the protein level by immunolocalisation on kidney slices [14, 44].

Fig. 3

Patterns of distribution of SADE mRNA tags along the different portions of human nephron. Tag abundance indicates mRNA tag counts in libraries normalised to 50,000 tags. The two top panels show the distribution of the tags corresponding to the α1, β1 and γ subunits of Na,K-ATPase (left panel α, β and γ NKA) and to markers of the different renal structures (right panel): podocalyxin (Glom), aquaporin 1 (AQP1 proximal tubule), furosemide-sensitive Na,K, 2Cl cotransporter (NKCC2 TAL), thiazide-sensitive Na,Cl cotransporter (NCCT DCT) and aquaporin 3 (AQP3 collecting duct). The lower four panels show distribution of non-identified tags corresponding to hypothetical proteins (DKFZ, LOC, KIAA or FLJ HUGO nomenclature) or without GenBank identification (no match) that are preferentially expressed in one or a subgroup of renal structures

Because those segment-specific proteins are associated with important functions whose alterations are often associated with diseases, it will be important to characterise the function of genes of yet unknown function expressed in a segment-specific manner. Mining the human nephron segment database reveals a quite large number of tags without functional identification, including 12 no-match tags, 25 tags identified as cDNAs without known protein and 42 tags corresponding to hypothetical proteins. The four panels at the bottom of Fig. 3 illustrate the specific distribution of some of these tags.

Axial heterogeneity of expression of genes not involved in tubular transport properties

Under physiological conditions, the rate of cell division in adult kidney is low [2], since basal production of epithelial cells only compensates for the loss of the epithelial cells into the urine. This balance between mitosis and cell loss is tightly controlled to prevent both kidney size reduction and hyperplasia. In contrast, under many pathological conditions, a marked change in cell turnover rate occurs as a result of both induced cell death and regeneration processes. As a matter of fact, recent results demonstrate the role of cell cycle regulating proteins in diverse glomerular diseases, as well as apoptosis of damaged cells, dedifferentiation and proliferation of surviving tubular epithelial cells in proximal tubules during recovery from ischemia/reperfusion injury [2, 21, 37, 40]. Thus, identification of the genes which control these processes in the kidney presents an important pathophysiological relevance.

As mentioned above, besides genes involved in transport and permeability properties (i.e. solute carriers, water and ion channels, ion-transport ATPases), the analysis of differential gene expression along the human nephron has revealed a new, previously unrecognised specificity in the expression pattern, in the different nephron portions, of genes participating in the control of cell differentiation and proliferation. Figure 4 shows different classes of such genes in the different clusters of renal structures.

Fig. 4

Distribution of genes for cell differentiation and proliferation control differentially expressed along the human nephron. The left box shows the external and intrinsic stimuli inducing proliferative cellular responses, with examples of segment-specific genes coding for growth factors in different nephron portions. On the right, the genes are grouped according to their established or predicted role in the indicated cellular processes (vertical alignment). The genes in the overlap of the two oval boxes represent the functional class sharing both functions. The genes differentially expressed in different nephron segments, i.e. in GLOM and in PCT/PST segments, MTAL/CTAL and CCD/OMCD segments, are horizontally aligned

In adult human tissue, proliferative responses may be induced by various external and intrinsic stress signals, including DNA damage, toxic compounds, radiation or oxidative stress, induction of oncogenes and growth factors. Such responses may be observed in the kidney shortly after injury, e.g. acute renal failure, and result in both cell death and regenerative cell replication [43]. At this very initial step, there is already a cell-specific response with, for example, glomeruli expressing vascular endothelial growth factor and fibroblast growth factor, whereas the thick ascending limbs express epidermal growth factor, consistent with the specific functions assigned to these different growth factors [17, 37, 40]. The first step after sensing particular stress stimuli is the rapid induction of transcription of immediate early response genes in association with the entry of cells from quiescence into the cell cycle. Here again, the different nephron segments express specific genes belonging to this functional class, e.g. DNA-damage-inducible β and α protein genes (GADD45B and GADD45A) in the Glom and proximal tubules, respectively [26, 33, 57]. These two early induced genes participate in the regulation of apoptosis, a feature shared by immediate early response 3 [69] and clusterin [55], which are specifically expressed in collecting ducts. The next step in the proliferative response is under the control of cell cycle progression, and it appears that the different renal structures also express specific regulators acting at the G1/S and G2/M phase transition checkpoints [41, 53]. For example, Glom specifically express the cyclin-dependent kinase inhibitor CDKN1C, which controls the G1/S transition, whereas the collecting ducts express CDKN1A (p21) [68] and BTG2, which control the same step [11, 54].

The large number of genes involved in these processes and their high level of expression may appear paradoxical for normal kidney tissue. It should be stressed, however, that the SAGE libraries were constructed from nephron segments dissected from healthy fragments of kidneys surgically removed from donors with kidney tumours, so that the tissue may not have been totally normal. In addition, surgical constraints in humans impose periods of warm and cold hypoxia before tissue microdissection, which may act as a stress signal for proliferation. In any case, the main observation remains that the axial heterogeneity of nephron segments concerns not only genes involved in physiological functions but also genes involved in the control of cell division, differentiation and death. Because clustering of all tags, or of only the tags corresponding to genes participating in the control of cell differentiation and proliferation, reveals the same kinship among renal structures, one might suggest the existence of a causal relationship between the expression of specific cell differentiation/proliferation genes and the establishment of the final differentiation status of the different renal cell types.

Regulation of gene expression

One of the objectives of physiology is to decipher the mechanisms underlying functional regulations. In keeping with their key role in regulating the milieu intérieur, mammalian kidneys display a remarkable ability to adapt their transport functions in response to a wide variety of stresses, including food and water intake, drugs, hormonal environment, diseases and genetic alterations. Within the kidney, the distal tubule system plays a central role, as these nephron segments are the main targets for the regulation of water, sodium, potassium and acid/base balances, and are important contributors to calcium and magnesium balances.

Considering sodium balance alone, dysregulations of the distal tubule system are responsible for widespread pathologies including hypertension and oedema diseases. Indeed, all the gene products which have been shown to alter blood pressure in humans, or whose deletion or overexpression modifies blood pressure in mouse or rat, are directly related to the transport of sodium in the distal nephron or to its regulation [18]. When blood pressure is not altered, abnormal sodium retention by the distal nephron may promote oedema formation [59].

Although the cellular mechanisms of sodium transport in the distal nephron are well known, in particular with the molecular and functional characterisation of the apical sodium entry pathways (thiazide-sensitive NaCl cotransporter and epithelial sodium channel ENaC) and of basolateral Na,K-ATPase, the mechanisms of regulation and the basis of pathological dysfunctions remain incompletely understood. For example, aldosterone has been known for decades to be the main positive regulator of distal tubule sodium transport, but except for the early demonstration of its actions on sodium transporters [15, 29, 56], only a small number of aldosterone-induced genes have been identified [3, 6, 51]. The same holds for vasopressin, which controls not only water but also sodium balance. Similarly, for oedema-inducing diseases such as nephrotic syndrome, the only available information regarding the dysregulation of collecting duct sodium transport concerns its independence from aldosterone and vasopressin [10].

In conclusion, available data indicate that (1) the molecular mechanisms of the genomic effects of aldosterone and of vasopressin in the distal nephron remain mostly unknown, (2) the molecular disorders responsible for distal sodium retention in hypertensive patients are known only in the minute fraction of patients with a mendelian form of the disease, and (3) the independence of sodium retention from aldosterone and vasopressin in oedema disease demonstrates the existence of as yet unknown pathways that control distal tubule sodium transport. Thus, a promising application of renal transcriptome analysis will be to compare profiles of gene expression along the distal nephron under normal conditions and under conditions of altered sodium transport.

As a preliminary step, one had to question the ability of SAGE to identify transcripts differentially expressed in a given nephron segment microdissected from animals under different physiological conditions. Indeed, these adaptations are likely to induce changes in gene expression profile of lesser amplitude than the differences prevailing among distinct nephron segments. Comparison of SAGE libraries constructed from OMCD of normal mice and mice fed a potassium-depleted diet for only 3 days [7] demonstrated the feasibility of this strategy. Sequencing of ≈20,000 tags in each library revealed statistically different levels of expression for ≈200 tags. Interestingly, 7% of the differentially expressed tags have no match in GenBank, further demonstrating that transcripts of physiological interest remain to be discovered. Equally interesting, the functionally identified tags overexpressed during potassium depletion fall into different functional classes, including not only ion and water transporters, but also proteins involved in the control of cell cycle, apoptosis and differentiation. The kinetic correlation between the variations of expression of some of these genes and the development of cell hypertrophy and hyperplasia suggests their involvement in the control of the switch from hyperplasia to hypertrophy. In view of the above-mentioned objective to identify genes involved in the regulation of a given cell function, these results also highlight the difficulty of data mining, since it seems arbitrary to decide whether a gene whose expression is modulated, for example during potassium depletion, is correlated with potassium transport or with any secondary cell adaptation.
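The comparison of tag abundance between two libraries can be illustrated with a minimal sketch. The text does not state which statistical test was applied, so the two-proportion z-test used here is an assumption chosen for simplicity, and the function names, the significance threshold and the tag counts are all hypothetical. No correction for multiple testing is applied, although one would be advisable when thousands of tags are screened.

```python
import math


def tag_z_test(count_a, total_a, count_b, total_b):
    """Two-sided two-proportion z-test for one tag counted in two libraries.

    Assumption: this test is only one possible choice; the source does not
    specify the test actually used.
    Returns (z, p_value); z > 0 means the tag is more abundant in library B.
    """
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:
        # Tag absent from (or saturating) both libraries: no evidence of change.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def differential_tags(lib_a, lib_b, alpha=0.01):
    """Screen all tags of two libraries (dicts mapping tag -> count)."""
    total_a = sum(lib_a.values())
    total_b = sum(lib_b.values())
    hits = {}
    for tag in set(lib_a) | set(lib_b):
        z, p = tag_z_test(lib_a.get(tag, 0), total_a, lib_b.get(tag, 0), total_b)
        if p < alpha:
            hits[tag] = (z, p)
    return hits


# Hypothetical example: a tag counted 5 times among 20,000 tags in the control
# library and 25 times among 20,000 tags in the potassium-depleted library.
z, p = tag_z_test(5, 20000, 25, 20000)
```

With libraries of about 20,000 tags each, a tag of low abundance needs a several-fold change in count to reach significance, which is consistent with the concern that physiological adaptations produce smaller changes than the differences between nephron segments.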

One means to distinguish the genes specifically related to maintenance of the sodium balance from those related to other functions may be to compare transcriptional and functional adaptations to several “stresses” that, among various adaptive responses, all induce changes in sodium transport. As schematised in Fig. 5, analysis of transcriptomes of collecting ducts in response to adaptation to three different stresses allows characterisation of three subsets of transcripts whose expression is modified. Functional analysis of collecting ducts under the same three adaptations reveals various functional and morphological changes (with altered sodium transport as a feature common to all three adaptations). If the adaptations are chosen so that sodium transport is the only functional change common to all three conditions, the group of transcripts common to the three adaptations can be defined as underlying sodium transport. This type of analysis, which aims at characterising new physiological processes underlying regulation of sodium transport and at identifying putative targets involved in the alterations of sodium balance that induce hypertension or oedema, is currently being developed in our laboratory. Besides sodium transport, it will provide a profusion of data regarding other physiological processes originating in the collecting duct.
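The intersection logic of this strategy can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the model names and the tag identifiers are hypothetical, and each input set is assumed to already contain the transcripts found to be differentially expressed in one adaptation.

```python
def partition_changed_transcripts(changed_by_model):
    """Split changed transcripts into those common to all models and those
    specific to a single model.

    changed_by_model: dict mapping a model name to the collection of tags
    whose expression is modified in that model (hypothetical input format).
    """
    sets = {name: set(tags) for name, tags in changed_by_model.items()}
    if not sets:
        return set(), {}
    # Transcripts modified in every adaptation: candidates for the shared function.
    common = set.intersection(*sets.values())
    # Transcripts modified in one adaptation only.
    specific = {}
    for name, tags in sets.items():
        others = [s for other, s in sets.items() if other != name]
        specific[name] = tags - set().union(*others)
    return common, specific


# Hypothetical tag identifiers for three adaptations that all alter sodium transport.
changed = {
    "model_1": {"tag_a", "tag_b", "tag_c"},
    "model_2": {"tag_a", "tag_c", "tag_d"},
    "model_3": {"tag_a", "tag_e"},
}
common, specific = partition_changed_transcripts(changed)
```

In this toy example only `tag_a` is modified in all three models, whereas `tag_c`, shared by two models only, belongs neither to the common group nor to any model-specific group. The common group can be attributed to sodium transport only if that function is indeed the sole change shared by the three adaptations.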

Fig. 5

Strategy for characterising the sets of genes involved in a given cell function in distal nephron segments. The strategy is based on multiple comparisons of changes in molecular (transcriptome) and functional phenotypes of distal nephron segments in response to different adaptations (models 1–3). Adaptations are chosen to induce a common change of the function of interest (e.g. sodium transport) along with various alterations of the other functions of distal nephron segments (including transport functions and regulation of cell cycle and differentiation). Based on previous experience, sequencing of 50,000 tags in each condition should allow the characterisation of 30,000 different tags in each library, among which 1,000 should show a statistically significant difference in abundance when two libraries are compared.

Conclusion

Methods

Coupled to microdissection, SAGE appears as a reliable method to identify gene expression profiles along the nephron in humans as well as in laboratory animals and to study changes in expression during pathophysiological processes in well-defined nephron segments. Nonetheless, SAGE remains laborious, and its use is limited to too few laboratories for two reasons: firstly, it requires DNA sequencing capacities that are relatively large for a physiology laboratory, although this can be mitigated by resorting to large national sequencing facilities, and secondly, tag identification remains a painstaking step. However, this last limitation is easing with the increasing number of identified tags and the increasing information about genomes. Once SAGE has allowed the complete catalogue of transcripts expressed in the kidney to be drawn up, a foreseeable evolution will be to generate dedicated DNA arrays, provided that the sensitivity of array methodology has been increased enough to become compatible with microdissected tissue.

Microdissection of renal tissue does not totally overcome kidney heterogeneity, since glomeruli and distal nephron segments are made of several cell types. To circumvent this restriction, it is conceivable, however, to generate SAGE libraries from subpopulations of cells isolated from these structures by immunodissection [16]. Alternatively, the cellular localisation of transcripts identified by SAGE in these renal structures can be determined a posteriori by in situ hybridisation on kidney sections (when expression is high enough to be detectable) or by single-cell RT-PCR [39].

Transcriptome comparison between human and mouse [7] reveals quantitative differences that might be either of biological origin (which raises the problem of the generalised use of mouse models for human physiology and pathologies) or of methodological origin (inherent to the conditions of material collection in humans: ethical questions, pathology of donors, delay in the processing of samples). To make reliable studies on human kidney tissue possible, a solution might be to combine the use of biopsy samples, as managed for example by the European Renal cDNA Bank network [50], with laser microdissection to separate different structures or cells.

Fields of application of renal transcriptome analysis

Discovery of new genes and/or transcripts is the most obvious application of renal transcriptomes. Preliminary reports in human as well as in mouse revealed a large number of transcripts displaying a high specificity of expression along the nephron, and therefore of putative physiological relevance, but of unknown molecular or functional identity. In particular, special attention should be paid to the tags corresponding to putative antisense transcripts which display the same specificity of expression as their cognate transcript. The role of these naturally occurring antisense transcripts in the regulation of transcript expression is a new and open field of investigation of major interest in regulatory physiology.

One of the most original observations of human kidney transcriptome analysis was the axial specificity of expression of genes involved in cell proliferation/death/differentiation. It will be important to define the relevance of this specificity and in particular, its relationship with terminal cell differentiation.

Finally, one of the most exciting applications concerns the establishment of relationships between molecular and functional phenotypes, with the physiological objective of defining the whole network of molecules and interactions underlying specific functions. In this regard, the kidney appears as an outstanding system, owing to its high cell plasticity. SAGE is also a powerful approach in that field because, as it provides absolute tag counts, it allows in silico comparison with previously published transcriptomes.