Introduction

Obesity is a worldwide concern in medicine and public health. It can persist for an extended time and is an important risk factor for cardiovascular disease and some types of cancer [1,2,3,4].

Given the complex, multifactorial nature of obesity, many genetic variations, in addition to environmental factors, could affect the prevalence of obesity and its persistence over the lifetime. One of the most important gene families associated with obesity is the peroxisome proliferator-activated receptor (PPAR) family [5]. This family encodes a large group of nuclear hormone receptors and ligand-activated transcription factors, which contain two ligand-binding sites and occur in three isotypes (alpha, beta, and gamma) [6, 7]. PPAR alpha regulates energy homeostasis by reducing the triglyceride level, while PPAR beta and PPAR gamma increase fatty acid metabolism and insulin sensitivity, respectively [6, 7]. More precisely, PPAR gamma (PPARG) is expressed mainly in adipose tissue, with lower expression in other tissues, and regulates the expression of genes associated with fat metabolism [8, 9]. In humans, the PPARG gene is located on chromosome 3 (3p25.2), spans over 100 kilobases, and has nine exons that encode three subtypes (PPAR gamma 1, 2, and 3) [10, 11].

Many association studies have been conducted to discover the underlying relation between obesity-related risk factors and SNPs located at PPARG using single SNP analysis [12,13,14,15]. However, we found no study that accounts for the correlation between SNPs in the association analysis. Although single SNP analysis has been found useful for common variants, it has reduced power, especially for rare variants [16]. Moreover, the SNPs typed on a GWAS platform may not include the causal SNP. Single SNP analysis would therefore detect only weak to moderate effects of SNPs that are in linkage disequilibrium (LD) with the causal SNP [16]. One way to increase the power of association is to build SNP sets such that each set consists of nearby SNPs in high LD (and probably in LD with the causal SNP) and to conduct a region-based association analysis for each set [15, 16, 18, 19]. A SNP set can be analyzed through a kernel machine regression (KMR) model, which has the advantage of handling high-dimensional data and testing the joint effect of many SNPs, as well as their interactions, on the phenotype. Moreover, this regression-based method addresses the limited power of classical single-marker association analysis for rare variants, which poses a central challenge in association studies [16, 17]. One of the most powerful KMR models for testing the association between genetic variants (common and rare) in a region and a continuous or dichotomous trait is the sequence kernel association test (SKAT) [18, 19]. SKAT also quickly calculates calibrated P values for rare-variant association analysis in case–control studies [18, 19].

The main aim of this longitudinal study was to assess the association between variations in the PPARG gene and long-term persistent weight. To this end, we used the SKAT model to compare allele frequencies between individuals with long-term persistent normal weight and individuals with long-term persistent obesity. We also performed single SNP analysis for the SNPs located in any significantly associated SNP set.

The findings of this study could be used to identify specific variants of the PPARG gene that predict long-term persistent obesity in the Tehran Cardiometabolic Genetic Study (TCGS).

Materials and methods

Subjects and data

In this study, we used data from adults (age ≥ 18) who participated in three consecutive phases of the Tehran Cardiometabolic Genetic Study (TCGS) and had a persistent weight status. In brief, the TCGS is part of a longitudinal study, the Tehran Lipid and Glucose Study (TLGS), in which subjects were genotyped and have been followed for cardio-metabolic risk factors every three years since 1999. The TLGS has been conducted in six phases: phase I (1999–2001), phase II (2002–2005), phase III (2006–2008), phase IV (2009–2011), phase V (2012–2015), and phase VI (2016–2018). At each visit, subjects signed a consent form, were interviewed to obtain demographic data, and were then referred to trained physicians and laboratories for clinical examinations and blood sampling [20, 21]. The method has been described in detail in previous publications [20, 21].

In this study, the inclusion criteria were: (1) age ≥ 18 at entry, (2) BMI data in at least three consecutive phases of the study, and (3) genotype information at the PPARG locus. The exclusion criteria were: (1) 25 ≤ BMI < 30 or (2) BMI ≤ 20 or BMI ≥ 35 in at least three consecutive phases. Individuals with long-term persistent obesity, with BMI in the range of 30–35 (n = 1676), were assigned to the case group, and individuals with long-term persistent normal weight, with BMI in the range of 20–25 (n = 1547), were assigned to the control group. All other individuals were excluded from the study.

All procedures followed the ethical standards of the ethics committee on human subject research at the Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences.

Genotyping

Samples were washed with lysis buffer, and PBS and RBCs were separated. DNA was extracted from the white blood cells with an alkaline boiling method, and the extracts were stored at −20 °C. Quantitative and qualitative assessments of the extracted DNA were performed by electrophoresis and spectrophotometry. DNA samples of 13,399 TCGS participants were genotyped with the Illumina Human OmniExpress-24-v1-0 bead chip, containing 649,932 SNP loci, at the deCODE genetics company (Iceland) according to the manufacturer's specifications (Illumina Inc., San Diego, CA, USA) [21]. After quality control of markers and individuals, 134 SNPs in the PPARG gene were selected from the final dataset for further investigation.

Strategy for constructing SNP sets

We used the four-gamete rules procedure from Haploview software to construct SNP sets [22]. In brief, this procedure computes the population frequencies of the four possible two-marker haplotypes for each pair of markers [22]. Blocks are formed by consecutive markers for which only three gametes are observed, and recombination is deemed to have occurred when all four possible haplotypes are observed [22, 23].
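The four-gamete test described above can be illustrated with a minimal Python sketch. This is not Haploview's implementation: it assumes phased haplotypes are already available as strings of 0/1 alleles (Haploview estimates two-marker haplotype frequencies from genotype data), it uses a simple greedy pass over consecutive markers, and the function names and the `min_freq` threshold parameter are hypothetical.

```python
def four_gametes_observed(haplotypes, i, j, min_freq=0.0):
    """Return True if all four two-marker haplotypes are seen at markers i and j.

    haplotypes: list of strings such as "010", one character per marker
                (assumed phased and biallelic; an assumption of this sketch).
    min_freq:   hypothetical frequency threshold a haplotype must exceed
                to count as observed.
    """
    counts = {}
    for hap in haplotypes:
        pair = (hap[i], hap[j])
        counts[pair] = counts.get(pair, 0) + 1
    total = len(haplotypes)
    observed = [pair for pair, c in counts.items() if c / total > min_freq]
    return len(observed) >= 4


def build_blocks(haplotypes, min_freq=0.0):
    """Greedily group consecutive markers into blocks.

    A marker joins the current block only if it shows at most three gametes
    with every marker already in the block; otherwise recombination is
    assumed and a new block starts.
    """
    n_markers = len(haplotypes[0])
    blocks, current = [], [0]
    for m in range(1, n_markers):
        if any(four_gametes_observed(haplotypes, k, m, min_freq) for k in current):
            blocks.append(current)
            current = [m]
        else:
            current.append(m)
    blocks.append(current)
    return blocks


if __name__ == "__main__":
    # Toy data: markers 0 and 1 are perfectly correlated, marker 2 is not.
    toy = ["000", "110", "111", "001"]
    print(build_blocks(toy))
```

In this toy example the first two markers show only two gametes and stay together, while the third marker shows all four gametes with marker 0 and therefore opens a new block.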

Logistic kernel machine regression (LKMR)

After constructing SNP sets according to the four gamete rules procedure, each SNP set was analyzed through SKAT logistic regression in the SKAT package of R software [18, 19]. This method is described elsewhere [18, 19].

An additive genetic model was considered for the SNP set association analysis (i.e., we recoded the AA, Aa, and aa genotypes as 0, 1, and 2, respectively). Association analysis of the joint effect of all SNPs within a set was conducted by a logistic sequence kernel association test using the SKAT package in R software [18, 19]. We used nine kernel machine models, defined by the type of kernel and the covariates included in the model. All kernels were linear, with three different weighting schemes: (1) no weight, (2) each SNP weighted in proportion to its allele frequency, and (3) each SNP weighted inversely to its allele frequency, so that rare SNPs gain more weight than common SNPs. These three models were repeated under three covariate settings: (1) no covariates, (2) gender and initial age as covariates, and (3) gender, initial age, and the square of initial age as covariates.
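As a rough illustration of the additive coding and the three weighting schemes, the sketch below computes a weighted linear-kernel score statistic of the form Q = Σ_j w_j S_j², with S_j = Σ_i g_ij (y_i − μ), for the no-covariate model. It is a simplified stand-in written in plain Python, not the SKAT package: the exact weight functions (`w = MAF` and `w = 1/MAF`) are assumptions, the function names are hypothetical, and the P value is obtained by a simple permutation of case status rather than by the analytic null distribution that SKAT uses.

```python
import random


def recode_additive(genotype, minor="a"):
    """Additive coding: AA -> 0, Aa -> 1, aa -> 2 (copies of the minor allele)."""
    return genotype.count(minor)


def minor_allele_freq(column):
    """Allele frequency of the counted allele from additive codes (0, 1, 2)."""
    return sum(column) / (2 * len(column))


def make_weights(G, scheme="none"):
    """Per-SNP weights; 'proportional' and 'inverse' are assumed forms."""
    n_snps = len(G[0])
    mafs = [minor_allele_freq([row[j] for row in G]) for j in range(n_snps)]
    if scheme == "none":
        return [1.0] * n_snps
    if scheme == "proportional":
        return mafs
    if scheme == "inverse":
        return [1.0 / f for f in mafs]  # assumes no monomorphic SNP (MAF > 0)
    raise ValueError("unknown weighting scheme")


def score_statistic(G, y, weights):
    """Q = sum_j w_j * S_j^2 under an intercept-only null model (mu = mean of y)."""
    mu = sum(y) / len(y)
    q = 0.0
    for j, w in enumerate(weights):
        s_j = sum(G[i][j] * (y[i] - mu) for i in range(len(y)))
        q += w * s_j ** 2
    return q


def permutation_p_value(G, y, weights, n_perm=999, seed=0):
    """Permutation P value for Q (a substitute for SKAT's analytic P value)."""
    rng = random.Random(seed)
    observed = score_statistic(G, y, weights)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if score_statistic(G, y_perm, weights) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    raw = [["AA", "Aa"], ["Aa", "aa"], ["aa", "Aa"], ["AA", "AA"]]  # toy data
    G = [[recode_additive(g) for g in row] for row in raw]
    y = [0, 1, 1, 0]  # 1 = case, 0 = control
    for scheme in ("none", "proportional", "inverse"):
        w = make_weights(G, scheme)
        print(scheme, score_statistic(G, y, w), permutation_p_value(G, y, w))
```

Adjusting for gender and age, as in the covariate models above, would replace the constant μ with fitted probabilities from a logistic null model. The sketch omits that step.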

Single SNP analysis

Single SNP analysis was conducted for the SNPs in any SNP set that was significant at the 0.05 level in the SKAT model. For this analysis, we used the PLINK2 program [24] to fit a logistic single SNP regression for each SNP individually, with gender and initial age as covariates, and to run a MAX(T) resampling permutation test based on 1,000,000 iterations. In this step, we report parameter estimates and P values based on the logistic additive model and on the permutation test of the additive effect (with 1 million iterations) [25].

The risk allele for each associated marker was labeled, and a score of one was assigned for the presence of the risk allele. For all individuals who carried the risk allele for the selected markers, the mean BMI was calculated and compared with the sum of the risk allele scores.
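The scoring step can be sketched as follows, assuming one point per marker at which an individual carries the risk allele and a mean BMI taken over that individual's study phases. The data layout (a dictionary with `genotypes` and `bmi` keys), the function names, and the example alleles are hypothetical and serve only as an illustration.

```python
def risk_score(genotypes, risk_alleles):
    """Count the markers at which the individual carries the risk allele."""
    return sum(1 for g, r in zip(genotypes, risk_alleles) if r in g)


def mean_bmi_by_score(individuals, risk_alleles):
    """Mean of per-individual mean BMI, grouped by risk score."""
    groups = {}
    for person in individuals:
        score = risk_score(person["genotypes"], risk_alleles)
        mean_bmi = sum(person["bmi"]) / len(person["bmi"])
        groups.setdefault(score, []).append(mean_bmi)
    return {s: sum(v) / len(v) for s, v in sorted(groups.items())}


if __name__ == "__main__":
    # Hypothetical example: two markers with risk alleles C and A.
    people = [
        {"genotypes": ["CT", "GG"], "bmi": [30, 32, 34]},
        {"genotypes": ["TT", "GG"], "bmi": [22, 22, 22]},
        {"genotypes": ["CC", "AG"], "bmi": [31, 31, 31]},
    ]
    print(mean_bmi_by_score(people, ["C", "A"]))
```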

Results

Participants consisted of 1547 subjects with long-term persistent normal weight as controls and 1676 subjects with long-term persistent obesity as cases. The mean ± SE age of the control and case groups at the entry phase was 36 ± 0.4 and 42 ± 0.3, respectively. As expected, the case group was older and had higher mean values for all reported obesity-related characteristics and lipid profile measures except HDL. Moreover, while the mean age of males was approximately the same in both groups, the mean age of females was more than ten years lower in the control group. The differences between cases and controls were significant for all characteristics, as shown in Table 1 (P < 0.001).

Table 1 Summary of demographic and clinical characteristics of study individuals at three phases (mean ± SE)

Genotyped data cleaning and clustering procedure

After omitting SNPs that departed from Hardy–Weinberg equilibrium (exact test P < 0.01), 131 SNPs remained, with minimum and maximum minor allele frequencies (MAF) of 0.0005 and 0.5, respectively. These 131 SNPs were clustered into 15 sets according to the four gamete rules procedure of Haploview software, with minimum and maximum SNP set sizes of 1 and 43, respectively. The LD plot and haplotype plot under the four gamete rule, which show the connections between genotyped SNPs within the tested region of the PPARG gene according to the linkage disequilibrium (LD) between each pair of SNPs (|D′|), were drawn and are shown in supplementary materials Figs. 1 and 2.

Association analysis result

We used the nine logistic SNP set regression models described above. All nine models showed that one set (S2), which contains nine SNPs, has a significant association with persistent obesity. Figure 1 shows the LD plot for these nine SNPs and their correlation according to D′ values. To show how mean BMI changes as the number of risk alleles for these nine SNPs (the risk score) increases, we calculated the mean BMI over all phases for each individual and plotted the mean of these BMI means (MBM) against the risk score in Fig. 2. This figure shows that the MBM in individuals with no risk allele is about 27.5 and that this value increases in individuals with one or two risk alleles. Individuals with 3 and 4 risk alleles have a lower MBM than individuals with lower risk scores (0, 1, and 2), which might be the result of the small sample size in these two groups. Furthermore, if we take a risk score of 5 as the cut-off and categorize individuals into two groups (risk score less than and more than five), there is a significant association between persistent obesity and risk score (Chi-square P value = 0.01). That is, individuals with persistent obesity were mostly in the high-score group, and individuals with persistent normal weight were mostly in the low-score group.
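The two-group comparison can be illustrated with a Pearson chi-square test on a 2 × 2 table of risk-score group by case/control status. The sketch below uses only the Python standard library. It applies no continuity correction, which is an assumption because the variant of the test is not stated here, and the counts in the example are hypothetical rather than taken from the study.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity correction).

    Table layout (hypothetical):
                     cases   controls
      high score       a        b
      low score        c        d
    Returns the statistic and its P value.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    chi2, p = chi_square_2x2(10, 20, 20, 10)  # hypothetical counts
    print(round(chi2, 3), round(p, 4))
```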

Fig. 1

LD plot according to the four gamete rules procedure of Haploview software for 9 significant SNPs

Fig. 2

Mean of BMI means (MBM) against the number of risk alleles in 9 significant SNPs. The figure also shows the percentage of cases and controls in each bar. Drawn with R software

Moreover, as shown in Fig. 3, two SNPs (rs1899951 and rs4684848) are located in intron one, while the seven other SNPs are located upstream of the PPARG gene. Allele frequencies of these 9 SNPs were compared among populations from the five continents to assess population stratification (Fig. 4).

Fig. 3

Schematic picture for PPARG gene and location of 9 significant SNPs at this gene

Fig. 4

Comparison of minor allele frequency (MAF) in different populations, including Iran

Table 2 shows the results of the kernel machine association analysis and the single SNP association analysis under the additive model for the significant set. According to the single SNP analysis, all 9 SNPs in the significant set are associated with persistent obesity.

Table 2 The result of association analysis under additive linear LKMR and additive single SNP models

Discussion

Maintaining a normal weight over the long term is a widely held aspiration today. The most remarkable result to emerge from the data is that variations in the upstream region of the PPARG gene may play a role in persistent weight. The most surprising finding of our longitudinal cohort study was a cluster of nine SNPs whose joint effect differs between cases and controls. These markers were located in the upstream region and first intron of the PPARG gene and were correlated with one another, with high values of D′.

In the first intron, the T allele of rs17036328 has been associated with the Modified Stumvoll Insulin Sensitivity Index and fasting blood insulin. However, in our population, the T allele is the reference allele, and the presence of the C allele is associated with persistent obesity. Two other SNPs in the upstream region were previously associated with type 2 diabetes and obesity. The risk allele of rs4684848, which was associated with type 2 diabetes, is the G allele, but in our population, the minor allele is the A allele.

Many studies have addressed the role of PPARG in humans. This transcription factor plays a pivotal role in adipogenesis, inflammatory response, and cell differentiation [26,27,28,29,30].

PPARG is known as the most important key regulator of adipogenesis and regulates the transcription of many other genes involved in cellular differentiation and lipid accumulation [31]. This gene is also the only known factor that is strictly necessary for adipocyte differentiation to occur, and it could change metabolic parameters through different signaling pathways [32].

However, some case/control studies show that variants located in the PPARG gene are associated with insulin resistance and type 2 diabetes but not with obesity [33]. For instance, a family-based study of Mexican American individuals showed that rs1175541, located in PPARG, is associated with body fat percentage. This study also confirmed that 6 SNPs located in this gene (rs1175541, rs2972164, rs11128598, rs17793951, rs1151996, and rs3856806) have a significant effect on insulin resistance. However, the authors did not observe associations between any variants in PPARG and baseline levels or changes in measures of adiposity [34]. Although variants in PPARG are reported to be associated with other persistent obesity-related conditions, including chronic kidney disease [35], type 2 diabetes [36, 37], fasting insulin [37], lipid profile [37], risk of coronary heart disease [38], hypertension [39], and metabolic syndrome [40], more studies have to be conducted to assess their functionality or precise mechanism.

Our study has some advantages over previous studies. We compared the joint effect of correlated nearby SNPs between two groups, one with long-term persistent obesity and one with long-term persistent normal weight. The follow-up time was at least ten years for all individuals. Kernel machine regression could help to reduce false-positive results and to reveal signals from related markers that were not genotyped but were in LD with genotyped markers.

What is already known on this subject?

Previous studies showed that variants in PPARG are associated with adiposity-related diseases; however, we found no study on the association between these variants and long-term persistent obesity. In this study, we assessed the association between the joint effect of variants in PPARG and long-term persistent obesity.

What does this study add?

We show that variants in PPARG are associated with long-term persistent obesity. Their relation to the consequences of long-term persistent obesity may need further study.

Conclusion

To our knowledge, this is the first study of the association between PPARG variants and persistent obesity. Nine correlated nearby SNPs showed a significant joint effect on persistent obesity. Three of these markers were reported in previous GWAS to be associated with related diseases. For the studied markers in the PPARG gene, the Iranian allele frequencies were close to those of the American and European populations. The current availability of population-scale, disease-related genetic variants enables researchers to survey variant frequencies across different populations and to estimate the genetic burden of disease.