Introduction

Kidney stone disease (KSD) is a common urological disease with a multifactorial etiology, including a polygenic milieu [1, 2]. Approximately 80% of stones are calcium-containing, composed mainly of calcium oxalate and calcium phosphate [3]. Other stone components include uric acid, cystine, and magnesium ammonium phosphate [4]. Calcium-containing kidney stone disease (CKSD) has a high recurrence rate, estimated at up to 50% within five years [5]. Aside from genetics, many systemic diseases are associated with CKSD, including gout, diabetes, obesity, metabolic syndrome, hyperparathyroidism, and hyperthyroidism [5, 6]. Environmental and lifestyle factors, such as geographical location, warm weather, and dietary habits, are also reported to be associated with CKSD [5]. Therefore, CKSD formation may reflect a polygenic predisposition linked to systemic diseases that interacts with environmental and other factors.

Some previous studies of single-nucleotide polymorphisms (SNPs) have attempted to assess the contribution of a single gene to the pathogenesis of KSD, such as the IL-18, TAP-2, urokinase, E-cadherin, and vitamin D receptor genes [7,8,9,10,11]. However, these studies have several limitations, including the small effect of a single allele and small study populations. In 1996, Risch et al. reported that the identification of the entire human genome and its genetic polymorphisms following the Human Genome Project led to new insights into the analysis of risk loci for common diseases [12]. Recently, genome-wide association studies (GWAS) have been conducted to elucidate the influence of multiple factors associated with KSD [13]. Several novel risk loci have been identified in the Japanese and British populations (12,123 cases and 417,378 controls), including DGKH, CYP24A1, BCR, WDR72, and GPIC1 [14]. The CLDN14 gene polymorphism was identified and studied in a GWAS of 3,773 cases and 42,510 controls from Iceland and the Netherlands [15]. ALPL, CaSR, SLC34A1, TRPV5, and other genes were also identified [16]. A GWAS of a large-scale Japanese population of 11,130 stone patients and 187,639 controls identified 14 significant loci, including nine novel loci [17]. These genetic factors, which are related to the regulation of metabolism and crystallization pathways, contributed to the development of KSD. However, GWAS specifically focusing on CKSD in other Asian populations, including the Taiwanese population, are lacking. The accuracy of case and control identification in self-reported databases is also a concern. In addition, the recruited patients were not accurately classified according to their stone compositions. Therefore, we aimed to conduct a GWAS focusing on CKSD in the Taiwanese population, using the database of a tertiary medical center and excluding other stone compositions, such as infection, uric acid, and cystine stones.

Materials and methods

Subjects and database information

This study utilized electronic medical records from China Medical University Hospital (CMUH). Individuals with urine pH < 5.5 or whose urine culture yielded Proteus mirabilis, Pseudomonas, Gardnerella, Lactobacillus, Enterobacteriaceae, or Klebsiella pneumoniae were excluded. Individuals whose stone analysis revealed cystine stones were also excluded. We used the International Classification of Diseases, 9th Edition (ICD-9) codes (‘592.0’, ‘592.1’, ‘592.9’, ‘592.90’) and 10th Edition (ICD-10) codes (‘N13.2’, ‘N20.0’, ‘N20.1’, ‘N20.2’, ‘N20.9’, ‘V13.01’, ‘Z87.442’) associated with urolithiasis to identify the case group, which was restricted to individuals aged above 30 years. The control group consisted of individuals with negative results for red blood cells (RBCs) in urine who were aged above 40 years. After matching for gender, we included 14,934 patients in the case group and 29,868 patients in the control group. This study was approved by the Ethics Review Board of China Medical University Hospital (CMUH111-REC1-026) and was part of the CMUH Precision Medicine Project that began in 2018, which was also approved by the ethics committees of CMUH (CMUH110-REC3-005 and CMUH111-REC1-176).

Genotyping

Genomic DNA was extracted from 200 µl peripheral blood samples of all study participants using the MagCore Genomic DNA Whole Blood Kit from RBC Bioscience, Taiwan, following the manufacturer’s prescribed protocol. To obtain genetic information from the Taiwanese population samples, we employed the Affymetrix Axiom genotyping platform, specifically utilizing the Axiom Taiwan Precision Medicine (TPM) customized SNP array (Thermo Fisher Scientific, Inc., Santa Clara, CA, USA), which covered 714,457 SNPs across the entire human genome.

For data analysis, we utilized PLINK 1.9 and excluded samples and SNPs with missing rates greater than 0.1 (--geno 0.1 for SNPs and --mind 0.1 for samples). Additionally, variants with a Hardy-Weinberg equilibrium P-value less than 1e-6 (--hwe 1e-6) and a minor allele frequency (MAF) less than 1e-4 (--maf 0.0001) were also excluded. To improve phasing accuracy, SHAPEIT4 was applied to phase TPM arrays. Subsequently, we employed Beagle 5.2 for imputation, which has demonstrated enhanced effectiveness and accuracy compared to other imputation tools. During the imputation process, data were filtered using an alternate allele dosage R-square of less than 0.3 and a genotype posterior probability of less than 0.9 as the exclusion criteria [18].
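The quality-control thresholds above can be illustrated with a minimal Python sketch. It is not the PLINK implementation: the genotype encoding (0, 1, or 2 alternate alleles, with None for a missing call), the function names, and the toy data are hypothetical, the Hardy-Weinberg check uses a 1-degree-of-freedom chi-square approximation rather than an exact test, and the order in which the filters are applied is an assumption.

```python
import math

MISSING = None  # hypothetical encoding for a missing genotype call


def missing_rate(calls):
    """Fraction of missing calls in a list of genotypes."""
    return sum(c is MISSING for c in calls) / len(calls)


def minor_allele_frequency(calls):
    """MAF from alternate-allele counts (0, 1, 2), ignoring missing calls."""
    observed = [c for c in calls if c is not MISSING]
    if not observed:
        return 0.0
    alt = sum(observed) / (2 * len(observed))
    return min(alt, 1 - alt)


def hwe_chi2_pvalue(calls):
    """Chi-square (1 df) approximation to a Hardy-Weinberg equilibrium test."""
    observed = [c for c in calls if c is not MISSING]
    n = len(observed)
    if n == 0:
        return 1.0
    p = sum(observed) / (2 * n)  # alternate allele frequency
    q = 1 - p
    if p in (0.0, 1.0):
        return 1.0  # monomorphic site: nothing to test
    expected = [n * q * q, 2 * n * p * q, n * p * p]
    counts = [observed.count(g) for g in (0, 1, 2)]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))


def qc_filter(genotypes, geno=0.1, mind=0.1, maf=1e-4, hwe=1e-6):
    """Return (kept sample indices, kept SNP indices) for a samples x SNPs list."""
    kept_samples = [i for i, row in enumerate(genotypes) if missing_rate(row) <= mind]
    kept_snps = []
    if not kept_samples:
        return kept_samples, kept_snps
    for j in range(len(genotypes[0])):
        column = [genotypes[i][j] for i in kept_samples]
        if missing_rate(column) > geno:
            continue
        if minor_allele_frequency(column) < maf:
            continue
        if hwe_chi2_pvalue(column) < hwe:
            continue
        kept_snps.append(j)
    return kept_samples, kept_snps


# Toy data: 5 samples x 3 SNPs; sample 3 has no calls, SNP 2 is monomorphic.
toy = [
    [0, 1, 0],
    [1, 1, 0],
    [2, 1, 0],
    [None, None, None],
    [1, 1, 0],
]
print(qc_filter(toy))
```

In this toy example the sample without calls fails the sample missingness filter and the monomorphic SNP fails the MAF filter, so four samples and two SNPs remain.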

Genome-wide association study (GWAS)

To identify associated variants, we employed PLINK 1.9 to generate summary statistics for individuals with urolithiasis and the control group. To account for familial relatedness up to second-degree relatives, we retained one individual from each familial group within the targeted group, while ensuring that the different phenotype groups remained represented. Family membership was established using PLINK 2.0 KINSHIP analysis. For the case-control-based genome-wide association study (GWAS), we adopted an additive genetic model. Logistic regression was used to analyze trait associations, adjusting for multiple covariates such as gender, age, and principal components (PCs). To avoid issues of collinearity among variants in linkage disequilibrium (LD) that could lead to overestimation, we selected the most significant variant among those in LD. Significant associations with urolithiasis were determined based on a threshold of P < 1e-05 or the more stringent threshold of P < 5e-08. To visualize the results, we created Manhattan plots and quantile-quantile plots using the “qqman” R package. Additionally, we presented region plots of the variants of interest using LocusZoom tools.
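As a rough illustration of the additive model, the Python sketch below fits a logistic regression of case status on allele dosage for a single variant and derives the per-allele odds ratio and a Wald p-value. It is a simplified stand-in for the PLINK analysis: covariates (gender, age, PCs) are omitted, the function names and the toy data are hypothetical, and the fit assumes that dosages vary across individuals.

```python
import math


def fit_additive_logistic(dosages, status, iterations=25):
    """Fit logit P(case) = b0 + b1 * dosage by Newton-Raphson.

    Returns (odds ratio per allele, two-sided Wald p-value).
    Covariates are omitted in this sketch.
    """
    b0 = b1 = 0.0
    for _ in range(iterations + 1):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(dosages, status):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient, intercept
            g1 += (y - p) * x    # gradient, dosage
            h00 += w             # information matrix entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) * gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    # Variance of b1 from the information matrix of the final pass.
    se_b1 = math.sqrt(h00 / det)
    z = b1 / se_b1
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return math.exp(b1), p_value


def significance_tier(p_value, lenient=1e-5, strict=5e-8):
    """Label a p-value by the two thresholds used in the text."""
    if p_value < strict:
        return "P < 5e-08"
    if p_value < lenient:
        return "P < 1e-05"
    return "not significant"


# Toy data: allele dosage (0, 1, or 2) and case status (1 = case, 0 = control).
dosages = [0, 0, 0, 1, 1, 1, 2, 2, 0, 1, 2, 0]
status = [0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0]
odds_ratio, p_value = fit_additive_logistic(dosages, status)
print(odds_ratio, p_value, significance_tier(p_value))
```

Coding the genotype as a dosage means each additional copy of the allele multiplies the odds by the same factor, which is what the additive model assumes.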

Polygenic risk score (PRS) analysis

To compute the polygenic risk score (PRS), we partitioned the cohort randomly into three datasets: the base, training, and testing groups (80%, 10%, and 10% of the total case and control patients, respectively). Initially, we investigated the association between the studied variants and urolithiasis in the base group using PLINK 1.9. Subsequently, we constructed a PRS in the training group using PRSice-2 while applying a minor allele frequency (MAF) filter of 0.01 [19]. As a reference panel, we utilized the 1000 Genomes Project Phase 3 data of the East Asian population. The PRS was normalized using z-scores. Subsequently, we validated the PRS models in the testing group. To assess the classification accuracy of the PRS, we employed receiver operating characteristic (ROC) curves and calculated the area under the curve (AUC). These statistical analyses were conducted using IBM SPSS Statistics (version 22).
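The scoring and evaluation steps can be sketched in a few lines of Python. This is an illustrative sketch only, not the PRSice-2 or SPSS procedure: the score is taken here as a plain weighted sum of allele dosages, the effect sizes and genotypes are hypothetical, and the AUC is computed by direct pairwise comparison of cases and controls.

```python
import statistics


def raw_prs(dosages, betas):
    """Weighted sum of allele dosages; betas are per-allele effect sizes (log odds ratios)."""
    return sum(d * b for d, b in zip(dosages, betas))


def z_normalize(scores):
    """Rescale scores to mean 0 and standard deviation 1."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    return [(s - mean) / sd for s in scores]


def auc(scores, labels):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    cases = [s for s, l in zip(scores, labels) if l == 1]
    controls = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Hypothetical effect sizes for three SNPs and dosages for four individuals.
betas = [0.5, 0.25, -0.25]
genotypes = [[2, 1, 0], [1, 1, 1], [0, 0, 2], [1, 0, 0]]
labels = [1, 1, 0, 0]  # 1 = case, 0 = control

scores = z_normalize([raw_prs(g, betas) for g in genotypes])
print(scores, auc(scores, labels))
```

Because z-score normalization preserves the ranking of individuals, it changes the scale of the PRS but not the AUC.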

Statistical analysis

Continuous and categorical variables at baseline in the different genotype groups were assessed using appropriate statistical tests, including Student’s t-test, Chi-square test, or Fisher’s exact test. To compare multiple groups, a one-way analysis of variance (ANOVA) with Tukey’s post hoc test was conducted. The significance level for all statistical tests was set at a two-sided level of 0.05. All statistical analyses were carried out using IBM SPSS Statistics (version 22) and R software (version 4.1.0).

Results

GWAS in patients with CKSD

There were 9,519 (63.7%) male and 5,415 (36.3%) female patients in the stone group (14,934 patients in total). The matched control group had twice the number of patients (29,868), comprising 19,038 (63.7%) male and 10,830 (36.3%) female patients. The clinical characteristics of patients with CKSD and control individuals are presented in Table 1. A flowchart of the study sample is presented in Fig. 1.

Table 1 Clinical characteristics of the control individuals and the stone patients
Fig. 1 Flow chart of the genome-wide association studies (GWAS) pipeline. A total of 14,934 patients with CKSD and 29,868 control patients were enrolled. The study population was randomly divided into three groups, namely base, training, and testing, for polygenic risk score (PRS) analysis. A GWAS was performed in the base group, and the results were used to train the PRS model in the training group. The PRS model was then validated in the testing group.

We randomly divided the patients with CKSD and controls into the base group (80% of the study cohort) and the replication group (20% of the study cohort). A GWAS was conducted in the base group, and 432 SNPs reached significance at a threshold of P < 1 × 10⁻⁵. A total of 132 SNPs reached the stricter significance threshold of P < 5 × 10⁻⁸ on chromosomes 4, 13, 16, 17, and 18. The GWAS results are shown as a Manhattan plot (Fig. 2A) and a quantile–quantile (QQ) plot (Fig. 2B). The QQ plot demonstrated that the observed P values were consistent with the expected values; the red line represents the expected values. The region plots of chromosomes 4, 13, 16, and 17 are presented in Supplementary Fig. 1. The results were uploaded to and analyzed using LocusZoom, an open tool for analyzing and visualizing GWAS results. The most significant loci according to LocusZoom are listed in Table 2.

Fig. 2 Manhattan plot and quantile–quantile plot of CKSD. (A) The Manhattan plot depicts the peaks of SNPs surpassing genome-wide significance levels between the patients with CKSD and control groups. The blue line represents the threshold of P < 1 × 10⁻⁵. The red line represents the threshold of P < 5 × 10⁻⁸. The most significant SNPs in the gene regions of chromosomes 2, 4, 13, 16, 17, and 18 were labeled with gene symbols. (B) Quantile–quantile plot demonstrating that the observed P values were consistent with the expected values. The red line represents the expected values.

Table 2 Top loci identified from database

PRS analysis in CKSD

Next, we established a PRS model to assess the cumulative effects of multiple genetic variants in CKSD according to the GWAS results. The patients with CKSD and controls in the replication group were randomly divided into the training group and the testing group (each 10% of the study cohort). We selected a set of 25 SNPs at a p-value threshold of 0.0001, which gave an R² of 0.00397, to establish the optimal PRS model for the training group (Fig. 3A). The SNPs are listed in Table 3. The PRSs were compared between the patients with CKSD and the controls, and a significant difference was observed in the training group (normalized mean PRS ± SD: -0.039 ± 1.006 and 0.076 ± 0.984 in the CKSD and control groups, respectively; P < 0.001; Fig. 3B). The PRS model was then applied to the testing group, where the difference was also significant (normalized mean PRS ± SD: -0.029 ± 1.005 in CKSD versus 0.057 ± 0.988 in controls; P = 0.007; Fig. 3C).

Fig. 3 Distribution of PRS in CKSD. (A) Bar plot of p value thresholds. The x-axis is the p value threshold and the y-axis is R square. (B, C) Distribution of PRSs in CKSD in the training group (B) and the testing group (C). Significance was calculated by the Mann-Whitney U test and P values were adjusted with FDR-BH. *** represents P values < 0.005. * represents P values < 0.05.

Table 3 Top loci identified from database

To assess the classification accuracy of the PRS model, we used the receiver operating characteristic (ROC) curve and found an area under the curve (AUC) of 0.535 (Fig. 4A), indicating that the PRS alone discriminated only modestly between the patients with CKSD and the controls. The AUC of the PRS combined with sex and age increased to 0.655 (Fig. 4A). We then compared the risk of CKSD in each quantile and found that patients in the top quantile of the PRS had a 1.39-fold risk of being classified into the CKSD group compared with patients in the bottom quantile (p < 0.001, odds ratio [OR] 1.39, 95% confidence interval [95% CI] 1.229–1.578; Fig. 4B). The OR of CKSD increased with the PRS quantiles, indicating that an elevated risk was associated with higher PRSs.
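For readers unfamiliar with quantile comparisons, the following Python sketch shows how individuals can be binned by PRS and how an odds ratio with a 95% confidence interval can be obtained for the top versus the bottom bin. It uses a Wald interval from a 2 × 2 table as a simple stand-in for the logistic regression used in the study; the function names and all counts are hypothetical and are not the study data.

```python
import math


def assign_quantiles(scores, k):
    """Assign each score to one of k rank-based bins (0 = lowest scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bins = [0] * len(scores)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(scores)
    return bins


def odds_ratio_ci(cases_top, controls_top, cases_bottom, controls_bottom, z=1.96):
    """Odds ratio (top vs. bottom bin) with a Wald 95% confidence interval."""
    odds_ratio = (cases_top * controls_bottom) / (controls_top * cases_bottom)
    # Standard error of the log odds ratio from the four cell counts.
    se = math.sqrt(
        1 / cases_top + 1 / controls_top + 1 / cases_bottom + 1 / controls_bottom
    )
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts of cases and controls in the top and bottom PRS bins.
print(odds_ratio_ci(cases_top=20, controls_top=10, cases_bottom=10, controls_bottom=20))
print(assign_quantiles([0.1, 0.9, 0.5, 0.3], k=2))
```

An interval that excludes 1, as reported above for the top versus the bottom quantile, indicates that the difference in odds between the two bins is unlikely to be due to chance alone.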

Fig. 4 Statistical results and quantile plots of PRS in CKSD. (A) ROC and AUC analyses for assessment of accuracy of the PRS model and the PRS model combined with age and sex for predicting CKSD. (B) Patients in the replication group were separated into quintiles on the basis of the PRS of CKSD and odds ratios for the likelihood of an association between each quintile of PRSs and CKSD were generated using logistic regression analysis. Values are the odds ratios with 95% confidence intervals.

PheWAS analysis in CKSD

To assess the association between genetic variants and various diseases, we used a phenome-wide association study (PheWAS) to investigate the diseases associated with the PRS. PheWAS is a genetic epidemiological method designed to comprehensively assess multiple phenotypic traits, such as diseases, and thereby contributes to our understanding of the role of genes in health and disease. We divided all individuals in the replication group into quartiles based on PRSs and conducted a PheWAS analysis. We found an association between the SNPs used to establish the PRS model and the calculi of the kidney and urinary tract (Fig. 5).

Fig. 5 PheWAS analysis of CKSD. The Manhattan plot of the PheWAS of patients with CKSD by PRS quartile. The x-axis shows the classifications of physiological function. The red line indicates a P value of 0.001, and the blue line indicates a P value of 0.05.

Discussion

Since the causes of stone disease may vary with stone composition, this study was the first survey to focus on calcium-containing stone disease. We identified 182 significant SNP loci in the GWAS at the stricter significance threshold. In the PRS analysis, after adjusting for age and sex, the area under the curve was 0.652. Our analysis differentiated the controls from the patients with CKSD. The risk of CKSD was significantly higher, by 1.39-fold, in the top quartile than in the bottom quartile of the PRS. The highly significant genetic loci included DGKH, PDILT, and BCAS3, which have been previously reported in Japan and Europe [14, 16, 17]. This finding confirmed significant SNPs associated with stone diseases. We also identified novel genetic loci in our enrolled patients, who were restricted to calcium-containing stone disease rather than all stone diseases; including uric acid, infection, or cystine stones might have confounding effects when performing GWAS and PRS analysis.

The nine lead loci were NFATC1, PCDH15, DGKH, ABCG2, PDILT, BCAS3, HDAC4, RN7SKP27, and AP003068.2. Of these, DGKH, PDILT, and BCAS3 have been reported in the European and Japanese populations. Calcium regulation-related genes include DGKH, NFATC1, and PCDH15. Both BCAS3 and HDAC4 are involved in histone acetyltransferase activity.

One of the top loci was rs682573, located in the DGKH gene, with an odds ratio of 1.15, indicating an increased risk of KSD. The association of DGKH with stone disease has been reported in several populations with different reference sequences. Howles et al. reported that the risk allele rs1170174 of the DGKH gene was not associated with stone disease [14]. This gene encodes a member of the diacylglycerol kinase (DGK) enzyme family, which regulates the intracellular concentrations of diacylglycerol and phosphatidic acid. However, an association with urinary calcium excretion in male stone formers among AA homozygotes was reported by Howles et al. in a study with a small sample size [14]. Further translational studies of DGKH may be required to confirm this association.

The nuclear factor of activated T cell cytoplasmic 1 (NFATc1) gene is a novel and significant genetic locus found in this GWAS that highlights the association between bone and vascular disease; it is reported to be involved in osteoclastic activity and atherosclerotic calcification, both of which are associated with calcium regulation [20]. It may therefore also be associated with calcium stone disease. PCDH15 is a member of the cadherin superfamily of membrane proteins and mediates calcium-dependent cell-cell adhesion. ABCG2 is a membrane protein belonging to the superfamily of ATP-binding cassette transporters. The ABCG2-encoded protein may play a major role in multidrug resistance in breast cancer [21]. Although there have been no previous reports of an association with stone disease, the ABCG2 rs2231142 variant regulates renal urate excretion [22]. However, the ABCG2 risk variant identified in our study was rs2199936, which is a different locus. Further studies on ABCG2 SNPs are required. Both BCAS3 and HDAC4 regulate histone acetylase activity [23]. Their functional role is related to carcinogenesis; however, no association with stone diseases has been reported.

Compared with self-reporting, clinical data from a single medical center have the advantage of verifiable clinical information: diagnoses can be made accurately using clinical laboratory data and the final diagnosis from chart reviews. For an accurate assessment of the disease, the database requires both width (small amounts of data with large numbers of participants) and breadth (large amounts of data with small numbers of participants) [24]. Questionnaires, self-reports, and databases from different sources may have some disadvantages. In addition, the database should include a follow-up period, so that diseases that occur over a time interval can be recorded and disease traits captured more accurately. Although our database has limited breadth, it has sufficient width because it is built on hospital-based medical records. Accordingly, a series of GWAS results for various traits, such as alopecia, hyperthyroidism, early menarche, and body height, have been published [25,26,27].

Stone disease can arise from many pathogenic mechanisms with different genetic backgrounds. Most data for previous GWAS were obtained from mixed large-scale biobanks, including those in the UK, Japan, and Taiwan [28]. The confirmation of stone patients was not uniform and included diagnoses by medical doctors, images, self-reports, and questionnaires from computer touchscreens. Such databases are subject to selection bias from patient recruitment, recall bias, and uncertain stone composition [29, 30]. Different stone compositions may be associated with different pathogenic mechanisms, including genetic differences [31]. Infection stones are a major problem that may be caused by a urea-splitting microorganism rather than a genetic background. Uric acid stones form mainly in highly acidic urine (pH < 5.5). Furthermore, cystine stones are caused by inborn genetic errors [32]. Therefore, patient recruitment is an important issue when focusing on calcium stone disease.

To obtain well-defined data, we recruited patients from a medical center. The diagnosis of stone disease was confirmed through evidence including the doctor’s diagnosis, stone analysis, image reports, and laboratory tests. Laboratory data were collected to rule out infection and uric acid stones. Moreover, patients with cystine stones were excluded. Although the number of patients was limited, our database focused on calcium-containing stones. The recruitment of the control group was also important. In our study, we excluded control patients with hematuria in routine urine reports because of possible occult stone disease. Therefore, the database was more reliable. This may explain why some of our SNP loci differed from those in the UK and Japan biobanks.

This study included a limited number of patients of a single ethnic origin from a single medical center. We confirmed the diagnosis of calcium-containing stone disease and used well-characterized control patients. Patients with other stone compositions that could alter the results of the genetic analysis were excluded. This decreased the chances of recall and selection bias. Some of the results were similar to the published genetic loci, and we also demonstrated novel loci that were previously neglected.

Conclusion

We reported the first GWAS survey of CKSD using a hospital-based database, identifying some novel genetic loci that could be further studied in relation to calcium stone formation. PRSs have emerged as accurate and efficient tools for predicting an individual’s health status and susceptibility to disease. By assessing an individual’s genetic composition, PRSs offer valuable insights into potential treatment effectiveness and the likelihood of a positive or negative prognosis, even before symptoms manifest. PRSs enable early identification of a condition before abnormal test results become apparent. The utilization of PRSs empowers healthcare practitioners to embrace precision medicine, enabling the prescription of personalized preventive strategies tailored to each patient’s unique genetic profile. In this study, we developed multiple PRSs to enhance their discriminatory power in predicting CKSD, with the aim of contributing to the early diagnosis and prediction of this condition.