Introduction

Most smokers begin smoking during adolescence and early age of smoking initiation is related to more frequent current smoking, daily smoking and more dependent smoking (Everret et al., 1999; Lando et al., 1999). Smoking initiation (yes/no) is influenced by both genetic and environmental influences (Li et al., 2003a; Sullivan and Kendler, 1999). However, relatively little is known of the genetics of age at onset of smoking. A study by Heath et al. (1999) reported a heritability of 62% in a young cohort (age ≤30 years) and 51% in an older cohort (age >30 years). The remaining variance was attributed to unique environmental influences. There was no evidence for shared environmental influences on age at onset of smoking. In contrast, Stallings et al. (1999) reported that recollected age at first use and age at daily use was not influenced by genetic factors but only by shared environmental and unique environmental influences. The age of the participants in their study ranged from 50 to 96 years (Stallings et al., 1999). Both studies used a different analytical approach, a different definition of the variable and different age cohorts, which may account for the different results.

Several genome wide scans of smoking behavior have been published (Bergen et al., 1999; Bierut et al., 2004; Duggirala et al., 1999; Gelernter et al., 2004; Li et al., 2003b; Saccone et al., 2003; Straub et al., 1999; Sullivan et al., 2004; Vink et al., 2004; Wang et al., 2005) but none of these scans used the phenotype “age at first cigarette”. A recent association study in a Chinese population used the related phenotype “early smoking onset” and has indicated a possible association between this phenotype and polymorphisms of the dopamine transporter gene on chromosome 5p (Ling et al., 2004). The rs2072-A allele was significantly associated with smoking onset less than 18 years (OR = 4,0 in the total population of 668 smokers).

In summary, less is know about the genetic influences on age at first cigarette and results are contradictory. In the present paper we will first investigate whether age at smoking the first cigarette is a heritable trait in a Dutch twin-family sample. We also tested whether age at first cigarette is a unidimensional trait. Next we report the results from a complete genome scan of age at first cigarette in a genotyped subsample of dizygotic twins and sibling pairs.

Methods

Subjects

This study is part of an ongoing twin-family study on health, lifestyle and personality of the Netherlands Twin Register (Boomsma et al., 2002). For this study data from the surveys of 1993, 1995 and 1997 were used. In total, 7836 twins and siblings participated at least in one of those surveys. Participants aged 18 years and older when completing the survey were selected (n=6175). No data on age at first cigarette were available for 74 subjects. Half-siblings and adoption-siblings were excluded (n=47), leaving 6054 participants in the study. The mean age when completing the survey was 26.2 (SD 9.2) for males and 26.3 (SD 9.3) for females.

For the genetic model fitting analyses the twins and at most two brothers and two sisters were selected. This procedure excluded 18 brothers and 39 sisters. The remaining sample consisted of 3687 twins (587 MZM, 470 DZM, 1041 MZF, 670 DZF, 453 DOS females and 466 DOS males), 758 brothers and 894 sisters from 2453 families. DZ twin and non-twin sibling pairs were selected for the linkage scan. DNA marker data were available for 630 sibling pairs. Genotypic and phenotypic data were available for 422 DZ twins and siblings from 175 families, forming 368 sibling pairs. The sample consisted of 117 sister pairs, 95 brother pairs and 156 opposite sex pairs.

Phenotype

Twins and siblings were asked at what age they smoked their first cigarette. Response categories were: 11 years or younger, 12, 13, 14, 15, 16, 17, 18 years or older, and never. This question was included in the surveys of 1993, 1995 and 1997. Our analyses rely on the retrospective recall of age at smoking first cigarette. Because self-reported data on substance use can be unreliable (Johnson and Mott, 2001), we selected subjects on the following criteria: (1) not more than 1 year difference for age at first cigarette in the longitudinal data or (2) subjects who reported they never smoked in the first survey(s) and who reported an age at first cigarette in later survey(s). According to those criteria, 464 subjects (about 16% of all subjects with longitudinal data) were excluded.

Genotyping

Genotyping was performed by two facilities: the Mammalian genotyping service, Center for Medical Genetics, Marshfield, USA, and the Molecular Epidemiology Section, Leiden University Medical Centre, The Netherlands. Ninety four persons were included in both scans. A 379 autosomal marker genome scan (9.44 cM spacing) was done by the Mammalian Genotyping Service, using microsatellite screening set 10 (Yuan et al., 1997) with few alternative markers. Markers with excessive recombination rates (n=5) and inconsistently marked markers (n=5) were not included in the analysis. A 419 marker genome scan (8.34 cM spacing) was performed by the Molecular Epidemiology Section, Leiden University Medical Centre, The Netherlands (Beekman et al., 2003; Heijmans et al., 2005).

Mendelian errors were detected using PEDSTATS and unlikely double recombinants using MERLIN; both types of error were removed using PEDWIPE (Abecasis et al., 2002). Pedigree relationships in the entire sample were checked with the GRR program (Abecasis et al., 2001). The location of the markers was taken from an integrated genetic map with interpolated genetic map positions (http://www2.qimr.edu.au/davidD/). The position is in deCODE cM (Kong et al., 2002), estimated via locally weighted linear regression from the Build 34.3 (and 35.1) physical map positions and published deCODE and Marshfield genetic map positions. If available, parents were typed for between 344 and 375 markers (mean of 363±6). For offspring, the number of typed markers ranged from 344 to 678, with an average of 389 (±69) total markers.

Genetic Model Fitting

Structural equation modeling in Mx was used to fit models to the data (Neale et al., 1999). First, to explore whether the variable age at onset of the first cigarette (measured with the categories 11 years or younger, 12, 13, 14, 15, 16, 17, 18 years or older, never) is unidimensional we fitted three different models to the data (Heath and Martin, 1993; Koopmans et al., 1999; Vink et al., 2005):

  • Single liability model: postulates that the liability to age at onset of the first cigarette is unidimensional.

  • Independent liability model: assumes two independent liability dimensions for smoking initiation and the age at first cigarette.

  • Combined model: the combined model includes features of both the single liability model and the independent liability model. Like the independent model it postulates the existence of two separate dimensions but it allows the possibility that there are some genetic and environmental risk-factors which influence both dimensions.

In a saturated model we tested whether the means and variances of males and females differed from each other, and whether the covariances of twin–sib and sib–sib pairs were different. We also tested whether age at completing the survey influenced the individual differences in reported age of smoking the first cigarette by including “age at completing the survey” as a covariate in the model fitting procedure. The resulting most parsimonious model was the one which was used to test the relative contributions of genetic and environmental influences. Sources of variation that were considered in modeling were additive genetic variance ( \(\sigma^{2}_{\rm am}\) for males and \(\sigma^{2}_{\rm af}\) for females) shared environmental variance ( \(\sigma^{2}_{\rm cm}\) for males and \(\sigma^{2}_{\rm cf}\) for females), and variance due to non-shared environmental influences including error of measurement ( \(\sigma^{2}_{\rm em}\) for males and \(\sigma ^{2}_{\rm ef}\) for females) (Fulker and Cherny, 1996).

Under this model, the expected covariance between sibling pairs is given by

  • \(r_{\rm g}\sigma^{2}_{\rm am}+r_{\rm c}\sigma^{2}_{\rm cm}\) for brother pairs,

  • \(r_{\rm g}\sigma^{2}_{\rm af}+r_{\rm c}\sigma^{2}_{\rm cf}\) for sister pairs and,

  • \(r_{\rm g}\sigma_{\rm am}\sigma_{\rm af}+r_{\rm c}\sigma_{\rm cm}\sigma_{\rm cf}\) for brother–sister pairs.

where r g is 1 for MZ twin pairs, 0.5 for same-sex sibling pairs and is estimated between 0 and 0.5 for opposite sex sibling pairs (to test for qualitative sex differences), r c is 1 in MZ and DZ twin pairs and estimated between 0 and 1 in the non-twin sibling pairs (twin–sib or sib–sib), to test for a special shared twin environment.

The means were modeled according to the formula:

$$Y_{ij}=\mu +{\beta} \hbox{Age}_{ij}+e_{ij}$$

where y ij is the observed phenotype for sibling j in the ith family, \(\mu\) denotes the grand mean and was allowed to differ for males and females, β represents the regression coefficient for age, ageij represents the age of sib j from the ith family, and e ij represents the residual term that is not explained by the fixed effects of age and sex.

IBD Estimates

The probability of sharing 0, 1 or 2 alleles IBD were estimated for a 1 cM grid using the Lander–Green algorithm implemented in the program Merlin (Abecasis et al., 2002). The estimate of the proportion of alleles shared identical by descent ( \(\hat{\pi}\)) was obtained as: \(\hat{\pi}_{ijk} = 0.5 {p_{({\rm IBD}=1)}}_{ijk} + 1 {p_{({\rm IBD}=2)}}_{ijk}\) where \(\hat{\pi}_{ijk}\) is the estimated proportion of alleles shared IBD between sib j and k for the ith family, and \({p_{({\rm IBD}=1)}}_{ijk}\) and, \({p_{({\rm IBD}=2)}}_{ij}\) are the probabilities that sib j and k share 1 or 2 alleles, respectively, conditional on the marker information (Sham, 1998).

Linkage Analyses

Linkage analyses were performed in Mx (Neale et al., 1999). Variance components linkage analyses included the variation explained by a putative quantitative trait locus (Q) at or near a specific location on the genome. Estimates of this variance component were obtained using the \(\hat{\pi}\) approach, in which the covariance due to the marker or trait locus for a sib pair is modeled as a function of the IBD status of the sib pair. The variance in liability and the phenotypic variance of the residual term, respectively, were decomposed into additive genetic variance ( \(\sigma^{2}_{\rm a}\)), shared environmental variance ( \(\sigma^{2}_{\rm c}\)), variance due to non-shared environmental influences ( \(\sigma^{2}_{\rm e}\)) and variance due to the QTL ( \(\sigma^{2}_{\rm q}\)) (Fulker and Cherny, 1996).

\(r_{\rm g}\sigma^{2}_{\rm am}+r_{\rm c} \sigma^{2}_{\rm cm}+\hat{\pi}_{ijk} \sigma ^{2}_{\rm qm}\) for brother pairs,

\(r_{\rm g}\sigma^{2}_{\rm af}+r_{\rm c}\sigma^{2}_{\rm cf}+\hat{\pi}_{ijk} \sigma ^{2}_{\rm qf}\) for sister pairs and,

\(r_{\rm g}\sigma_{\rm am} \sigma_{\rm af}+r_{\rm c} \sigma_{\rm cm} \sigma_{\rm cf}+ \hat{\pi} _{ijk} \sigma_{\rm qm} \sigma_{\rm qf}\) for brother–sister pairs

where r g and r c and \(\hat{\pi}_{ijk}\) are defined as above. The analyses included also phenotypic data from the (non-genotyped) MZ twin pairs to allow the distinction between background additive genetic and other familial effects. For the MZ twins the covariances were modeled as

\(\sigma^{2}_{\rm am} + \sigma^{2}_{\rm cm} +\sigma^{2}_{\rm qm}\) for MZM and,

\(\sigma^{2}_{\rm af}+\sigma^{2}_{\rm cf} + \sigma^{2}_{\rm qf}\) for MZF

First, male and female QTL effects were estimated as separate parameters. Minus twice the log-likelihood of this model compared to a model where the male and female QTL components are constrained equal gives a test for sex-specific QTL effects. When both the male and female QTL effects are simultaneously dropped from the model, the difference in degrees of freedom is 2. The distribution of the test-statistic is in this instance: \({\scriptstyle 1/4} \chi ^{2} 0 :{\scriptstyle 1/2}\chi^{2} 1:{\scriptstyle 1/4}\chi^{2} 2.\) One alternative is to treat it as a standard two degrees of freedom test, which is more conservative. We have decided to use this latter, more conservative method. Finally, the QTL effect for males was equated to the QTL effect for females and linkage was tested by comparing the fit of this model with a model without a QTL (1 df). Significance of genetic variation due to the QTL was evaluated by the likelihood ratio test, from which the LOD score can be calculated by dividing the test statistic χ2 by 2 ln 10 (∼4.6) (Sham, 1998).

The three loci with LOD scores >2.0 were modeled simultaneously in a three-locus linkage analysis. This simultaneous analysis is more robust to the well-known overestimation of variance attributable to markers in a linkage analyses. In a single-locus analysis, the QTL effects may be overestimated and together sum to a percentage of explained variance >100% (Geus de et al., 2005; Goring et al., 2001). The three-locus analysis constrains the sum of the QTL variances to be less than 100% of the phenotypic variance. To calculate a LOD-score for the model without the three loci, the p-value of the χ2 difference with 3 degrees of freedom was used to estimate the corresponding χ2 with 1 degree of freedom. This χ2 was used to calculate the LODscore by dividing this test statistic by 2 ln 10.

For the chromosome with the most promising finding, the chromosome-wide empirical p-value was determined by simulation, using the 1000× permutation algorithm as described by Lystig (2003) and implemented (for genome-wide thresholds) by Posthuma et al. (2005). Permuted datasets were obtained as follows: each row in the observed dataset represents one family and contains phenotypic data for each individual within that family as well as the IBD probabilities across the chromosome for all pairs within that family. This file is split into a phenotypic file and an IBD file. Families in the experiment are labeled with unique numbers one through n. The phenotypic data are then shuffled by taking a random permutation of the indices 1,...,n and matching the ith phenotypic trait value to the family with index given by the ith element of permuted indices. This permuted vector of traits is matched with the original (non-permuted) genotypic information for all families. One thousand permuted datasets were generated. The significant linkage threshold was defined as the LOD-score occurring in 50 of the 1000 permutation corresponding to a probability of 0.05 in a genome scan (Lander and Kruglyak, 1995).

Results

Descriptives

The distribution of age at first cigarette in males and females is shown in Figure 1. The distribution is skewed due to the number of subjects that never smoked a cigarette (around 30% of the sample). The skewness statistic is −0.402 with SE 0.033. In general, slightly more males than females try their first cigarette on an early age. The median age is 15 years for males and 16 years for females.

Fig. 1
figure 1

Distribution of the age at first cigarette for males and females.

To test whether the variable was unidimensional three different models were fitted to the data: a single liability model, an independent liability model and a combined model (Heath and Martin, 1993; Vink et al., 2005). The single liability model, which postulates that the liability to age at onset of the first cigarette is unidimensional, was the most parsimonious model (Table I). Thus, subjects who never tried a cigarette could be included in the analyses.

Table I Goodness-of-fit of the Single Liability Dimension Model, the Independent Liability Model and the Combined Model to the Data on Age at First Cigarette

Twin and Sibling Correlation

The means for males and females were significantly different; males report an earlier age at onset than females (16.3 for males vs. 16.7 for females). The variances of age at first cigarette could be constrained to be equal for males and females. The covariances between same sex twin–sibling pairs and same-sex sibling–sibling pairs and the covariances between opposite-sex twin–sibling pairs and opposite-sex sibling–sibling pairs could also be constrained to be equal. Dropping the age-effect (age at completing the survey) significantly worsened the fit of the model. Table II shows the correlations and 95% CI for twin and sibling pairs for the age at smoking the first cigarette. For males the MZ correlations are twice as high as the DZ correlations, suggesting additive genetic influences. For females, the DZ correlation was greater than half the MZ correlation, indicating that common environmental effects may also contribute to the familial clustering of individual differences in age at smoking the first cigarette. The DZ twin correlation was greater than the sibling correlation suggesting a special twin environment.

Table II Twin- and Sibling-Correlations with 95%CI for Age at First Cigarette

Genetic Model Fitting

Model fitting results for the age at smoking the first cigarette are summarized in Table III. The models included sex-differences in the mean and in parameter estimates. Setting the shared environmental correlation in non-twin sibling pairs to 1 (model 2) significantly reduced the goodness of fit of the model indicating a special twin environment. Model 3 tested whether the genetic correlation (r g) in opposite sex siblings could be constrained to be 0.5, like the correlation in same sex sibling pairs. Constraining this correlation significantly worsened the fit of the model. For males the MZ correlation was (almost) twice as high as the DZ correlation, suggesting that shared environmental influences are absent or very low. Dropping the shared environment from the model (model 4) did not reduce the fit of the model in males, but resulted in a significant (p<0.001) loss of fit in females (model 5). In conclusion, the variation in the age at first cigarette was best described by an AE model for males (A=60%, 95%CI 51–68% and E=40%, 95%CI 34–47%) and an ACE model for females (A=39%, 95%CI 22–59%, C=30%, 95%CI 11–46% and E=31%, 95%CI 27–46%). The genetic correlation in the opposite sex siblings was estimated to be 0.40 (95%CI=0.27–0.58) in the AE–ACE model. The shared environmental correlation in sibling-pairs (non-twin siblings) was estimated to be 0.33 (95%CI=0.16–0.57). The effect of age was rather small (β=−0.017 in AE–ACE model) but significant (χ2 =12.7, df=1, p=0.0004).

Table III Model Fitting Results for Age of Smoking First Cigarette

Linkage Analyses

Because variation in age at first cigarette was heritable, we explored this phenotype in a linkage analysis. The linkage analyses were performed on a smaller sample. As summarized above, we found that the genetic correlation in opposite sex sibling pairs was somewhat lower than 0.5, therefore the male and female QTL effects were estimated as separate parameters. To test for sex-specific QTL effects, minus twice the log-likelihood of this model was subtracted from that obtained when the male and female QTL effects were equated. For most positions along the genome, constraining the QTL effects for males and females to be equal did not worsen the fit of the model. For only five regions, the model with sex-specific QTL effects provided a more parsimonious model than the model with a single QTL effect. For four of these regions a model without QTL effects did not fit worse, suggesting no significant linkage for these regions. However, for a region on chromosome 6 (166–182 cM), the model with sex-specific QTL effects was more parsimonious, and the LOD score in that region was higher (1.3) than for the model with 1 QTL (0.5). The grey line in Figure 2 (chromosome 6) represents the LOD-scores for a model with sex-specific QTL effects.

Fig. 2
figure 2

Linkage results for sex-specific linkage (grey line) and linkage results for analyses where QTL effects were equated for males and females (black line). For the region from 166 to 182 cM the sex-specific model were more parsimonious. LOD-scores of the sex-specific linkage model were higher in this region than LOD-scores of the model without sex-specific QTL effects. For the rest of the chromosome, the model where QTL effects were the same for males and females did not fit significantly worse.

Thus, the QTL effect for males was equated to the QTL effect for females and linkage was tested by comparing the fit of this model to a model without a QTL (Fig. 3). LOD scores greater than 2 were found for chromosomes 5, 14 and 22 (Fig. 4). Three other peaks were found with LOD scores between 1.5 and 2.0 on chromosomes 6, 7 and 10. The six regions with LOD score >1.5, the highest LOD score and the location of the highest LOD score are shown in Table IV.

Fig. 3
figure 3

LOD scores across the genome (chromosomes 1–22) for age at first cigarette. centiMorgans are shown on the x-axis and LOD-scores are shown on the y-axis.

Fig. 4
figure 4

Linkage results for age at first cigarette on chromosomes 5, 14 and 22. Distance in cM is shown along the x-axis and the LOD-scores along the y-axis.

Table IV Highest LOD Scores (≥1.5) and Locations

The three loci with LOD scores >2.0 were modeled simultaneously in a three-locus linkage analysis. For all three loci simultaneously, an adjusted LOD score of 1.8 was found (Table V). For chromosome 22, the evidence of linkage disappeared in the multilocus analysis. For chromosomes 5 and 14 the evidence weakened in comparison with the single-locus analyses. The peak on chromosome 5 explained the most of the variance in age at first cigarette.

Table V Simultaneous Linkage Analysis of Three Loci with a Single-Locus LOD Score >2.0 for Age at First Cigarette

The significant linkage threshold was defined for chromosome 5 by simulation, using a 1000× permutation algorithm. The LOD score occurring in 50 of the 1000 permutations (corresponding to a probability of 0.05) was 2.01. The empirical p-value for a LOD score of 2.09 was 0.04, indicating that the peak on chromosome 5 is significant.

Discussion

A quantitative trait representing the age at first cigarette was assessed in a genetic and linkage analyses. Correlations between sibling pairs were consistent with the existence of a genetic component, as were variance components estimates. The linkage analyses showed suggestive linkage on chromosomes 5, 14 and 22.

Variation in age at first cigarette was significantly influenced by genetic factors, heritability was estimated at 60% for males and 39% for females. Shared environmental influences were found only in females (30%).

Heath et al. (1999) explored a slightly different phenotype: onset of smoking, based on the questions “have you ever been a smoker” and if yes “at what age did you start smoking”. A one-dimensional model gave a poor fit to those data. In the study of Heath et al. (1999), the age at onset was largely influenced by genetic factors (62% in young cohort and 51% in older cohort) but not by shared environmental factors. In a study of Stallings et al. (1999) the variable “At what age did you smoke your first cigarette” was analyzed. The genetic analyses were performed on residual scores of the age at onset variable, partialing out effects of gender, age and interactions between these effects. Dropping the additive genetic influences from the model did not result in a significant decrement in model fit. The estimate of shared environmental influences was 42%. It should be noted that this sample was an older cohort (born between 1904 and 1937) and that the environmental culture regarding smoking has changed considerably since the time subjects in this sample initiated smoking. In addition, 75% of the sample was female and the genetic analyses did not allow for sex differences.

The genetic analyses in the present study were performed on a relatively young sample (mean age 26 years). The correlation between the genetic factors in opposite-sex siblings was estimated at 0.40, suggesting partly different genes are expressed in males and females. Therefore the linkage model allowed for sex-limitation. Sex-limited effects on the phenotypic variance, also known as genotype-by-sex interaction, has been detected in several physiological and psychological traits (Comuzzie et al., 1993; Lamb et al., 2005; Towne et al., 1993; Weiss et al., 2005). The implementation of models in which the effects of quantitative trait loci are allowed to differ between males and females has been described previously (Towne et al., 1997). Comparison of the model where QTL effects for males and females were allowed to differ and the model where QTL effects for males and females were the same showed that the latter was not fitting worse. An exception was a region on chromosome 6. For the region from 166 to 182 cM the model where the QTLs for males and females were allowed to differ was more parsimonious. This resulted in a higher LOD score (1.3 vs. 0.5) for that specific region.

Results of the model where the effects of the QTL on males and females were the same showed suggestive linkage peaks (with a LOD score higher than 2) on chromosomes 5, 14 and 22.

On chromosome 5, a peak was located between 196 and 209 cM. The simultaneous three-locus analyses confirmed the importance of this peak. Two other studies detected a peak in this region (Duggirala et al., 1999) detected a peak with a LOD score of 3.2 for “ever smoked” on 217 cM while (Saccone et al., 2003) detected a peak with LOD score of 1.10 for maximum number of cigarettes (non-smokers excluded) on 200 cM. Two other studies reported linkage on chromosome 5 in a non-overlapping region. A peak was found for nicotine dependence at 160 cM (Sullivan et al., 2004) and a peak was found for habitual smoking at 119 cM (Bierut et al., 2004). The peak found in the present study encloses a gene coding for the D1A dopamine receptor. Dopaminergic genes are likely candidates for heritable influences on cigarette smoking (Lerman et al., 1999; Sabol et al., 1999). The dopamine D1 receptor gene might play a role in addictive behaviors. In a group of controls and individuals attending a smoking cessation clinic, 39.3% of the controls vs. 66.1% of the smokers was homozygous for the 11 or 22 genotype of the dopamine receptor D1 gene (p=0.0002), (Comings et al., 1997). This region on chromosome 5 merits follow-up in future studies.

The peak found on chromosome 14 (120–128 cM) is located in the same region as a peak for nicotine dependence on 126 cM (Sullivan et al., 2004). Three other studies found peaks in the neighborhood of this region: on 111 cM for smoking rate (Li et al., 2003b), on 95–110 cM for smoking initiation (Bergen et al., 1999) and on 87 cM for maximum number of cigarettes (Saccone et al., 2003).

The highest LOD-score was found for chromosome 22. Remarkably, in the simultaneous linkage analyses of the loci on chromosomes 5, 14 and 22, the peak on chromosome 22 was not significant anymore. No other genome-scan for smoking behavior detected linkage on this chromosome. The highest LOD-score (2.5) was found at 20 cM but the peak covered most of the chromosome. The CYP2D6 gene (a cytochrome p450 enzyme) is located on chromosome 22, but is not located in the same region as our highest peak. The cytochrome p450 enzymes are involved in nicotine metabolism and neurotransmitter synthesis. Both CYP2A6 (on chromosome 19) and CYP2D6 (on chromosome 22) catalyze the formation of cotinine from nicotine. Taken together, the peak on chromosome 22 might be spurious.

Peaks with LOD-scores between 1.5 and 2.0 were detected on chromosomes 6, 7 and 10. Although those LOD-score were <2.0 and therefore linkage in these regions could be spurious, there is some overlap with other publications. The peaks on chromosomes 6 and 10 are (partly) overlapping with peaks for smoking initiation and for maximum number of cigarettes that we detected in a previous linkage analyses for in partly the same Dutch sample (Vink et al., 2004). For chromosome 6, the results are a replication of a peak for smoking initiation with LOD score of 1.10 reported in the same region (63 cM) (Duggirala et al., 1999). Although the regions on chromosomes 6 and 10 do not harbor obvious candidate genes for smoking, our results suggest that genes involved in smoking behavior may be located in this region.

Although age at onset of first cigarette is technically a continuous measure, in the present study it is collected as a quasi-continuous measure because answer categories were: 11 years or younger, 12, 13, 14, 15, 16, 17, 18 years or older and never tried. All subjects were aged 18 years or older. If the subjects who never tried smoking will start smoking in the future, they will change only one category (from “never” to “18 years or older”). In general it is accepted to analyze data with more than seven categories as continuous variable (Dolan, 1994). An alternative is to use a threshold model where the trait is considered to have an underlying, continuous liability. However, using a threshold model reduces power (Neale et al., 1994) and by combining categories information is lost.

It should be noted that the sample size in our study is relatively small and none of the linkage results met genome-wide statistical significance criteria. Due to the small sample size, there is a possibility for false-positive findings simply by change. However, we determined the empirical p-value for chromosome 5, using a 1000× permutation algorithm. The empirical p-value of 0.04 for a LOD-score of 2.09 indicated that the peak on chromosome 5 (harboring the D1A dopamine receptor) is significant.

In conclusion, we detected 3 genomic regions with LOD score >2 (on chromosomes 5, 14 and 22) that may harbor susceptibility loci for age at first cigarette. A simultaneous three-locus analysis confirmed the importance of the peak on chromosome 5. There is some overlap between our results and those of previous genome scans looking at smoking behavior. To our knowledge, this is the first genome scan for age at smoking the first cigarette. The other genome scans to smoking behavior used different phenotypes. Each phenotype is likely to be influenced by multiple genes, leading to a different picture of the genetic architecture of smoking behavior.