Introduction

The relationship between smoking and melanoma remains unclear. Surprisingly, several published studies that investigated the relationship between smoking and malignant melanoma found at least weak inverse associations [1–13]. The inverse associations persisted in both prospective and retrospective study designs, in men and women, with examination of incident as well as fatal cases of melanoma, using detailed smoking exposures, and after controlling for potential confounding variables (Table 1). In a recent study of this perplexing smoking-melanoma relationship, a cohort analysis was performed using the American Cancer Society’s Cancer Prevention Study II (CPSII) [14]. While the results of this investigation were far from conclusive, they supported a weak protective effect of smoking on the risk of malignant melanoma in a very large and well-characterized cohort of Americans.

Table 1 Findings of prior studies investigating the relationship between smoking and melanoma

Tobacco smoke is a group I carcinogen that is known to be causally related to at least 18 types of cancer, as well as to cardiovascular diseases and chronic pulmonary diseases. Squamous cell carcinoma, another type of skin cancer, has been shown in some studies to be positively associated with tobacco smoke. In a recent mortality study of cancer patients who were smokers at the time of diagnosis, both current and former smoking were found to be positively related to all-cause and disease-specific death in women with melanoma [15]. However, to date, the weak inverse association between smoking and risk of melanoma persists in the literature and remains unexplained biologically.

A competing risk is an event whose occurrence either precludes the occurrence of another event under examination or fundamentally alters the probability of occurrence of that event [16]. This may serve as a censoring mechanism by removing the subject from the at-risk population when the competing risk event cannot be observed. In the study of smoking-related diseases, competing risks are an important consideration because regular smoking increases the risk of death from a number of different outcomes. In 2009, in the US, the age-adjusted incidence rate for malignant melanoma was 22.58 per 100,000 and the median age at diagnosis was 61 years, with 63 % of all incident cases occurring above the age of 65 years (40 % of which occurred above age 75) [17]. In the same year, in the US, the age-adjusted mortality rate from lung cancer was 48.49 per 100,000, with a median age at death of 72 years, and 60 % of lung cancer-attributable deaths occurring before the age of 74 [17]. Also in 2009, in the US, the median age at first acute myocardial infarction was 65 in women and 73 in men, and 35 % of all deaths from CVD occurred before 75 years of age [18]. In 2005, in the US, the median age at death from COPD was 76 in men and 77 in women [19]. Since many smoking-related diseases have similar and overlapping age-at-mortality distributions, studying one of them in a population in which other smoking-related comorbidities are prevalent makes competing risk bias unavoidable, especially when trying to study a rare disease (e.g., melanoma) in a population at risk of more common smoking-related diseases such as smoking-related cancer, heart disease, and COPD. Even if the competing risk does not result in the death of a subject, it may preclude them from participation (either via refusal or exclusion) in studies of rare cancers such as melanoma.

In this study, we propose a plausible mechanism, competing risk bias, which might explain or contribute to the observed association between smoking and malignant melanoma in study samples after selection for survival from other smoking-related diseases. We simulate two scenarios, one in which smoking does not cause melanoma, and one in which smoking causes melanoma. In each scenario, we explore the conditions that would be required to produce a spurious inverse association between melanoma and smoking after selection for survival from three major smoking-related diseases: lung cancer, heart disease, and chronic obstructive pulmonary disease (COPD). The aim of this study is not to show whether methods for addressing competing risks will overturn the observed inverse relationship, but simply to demonstrate that the presence of unaddressed competing risks can be seen as one possible explanation for the published inverse relationship between smoking and melanoma.

Methods

Causal assumptions

We represent our causal assumptions about the structural relationship between smoking and melanoma using directed acyclic graphs (DAGs). The use of DAGs to express these causal relationships imposes a basic set of rules, which have been extensively described for use in observational research elsewhere [20–24]. Briefly, in a DAG, nodes are occupied by variables. An arrow originating from a node or variable X and pointing to another node Y indicates a direct causal relationship between X and Y. Y is also a child or consequence of X. If variables X and Y are caused by another variable Z, Z is a common cause of X and Y and thus a confounding variable for the relationship between X and Y. This common cause path starting from X through Z and on to Y is a “backdoor” path as it does not represent a (direct or indirect) causal effect of X on Y. Without controlling for Z, the path between X and Y is open and biasing. If two variables X and Y each have arrows pointing from them into a third variable S, then S is a “collider”. Since there is no flow of information from X via S to Y when S is a collider, this path is closed and can only be opened by conditioning or selecting on S. Conditioning on a collider such as S creates a biasing path between X and Y, the parents of S. The two major sources of biasing paths are uncontrolled confounding (Fig. 1a), which results from failing to control for a confounding variable (such as a common cause), and conditioning on a collider, which opens a previously blocked path (Fig. 1b). A well-known example of collider bias is selection bias resulting from study participants’ non-response or loss to follow-up.

Fig. 1

a X and Y are not causally related but there is an open “backdoor” path through the confounding variable Z. b X and Y are not causally related but there is an open path resulting from conditioning on collider variable S. c In the presence of, but not conditional on, the competing risk (CR), smoking (SM) and melanoma (MM) are unconditionally independent. d Smoking (SM) and melanoma (MM) are conditionally associated due to selecting within strata of a competing risk (CR)

The basic bias mechanism under investigation in this paper can be explained using the DAGs displayed in Fig. 1c, d. In both figures, smoking (SM) is an etiologic factor for a competing risk (CR) of malignant melanoma (MM). This competing risk precludes the occurrence of melanoma such that those who die from the smoking-related diseases are no longer at risk for melanoma. The competing risk and melanoma are related through an unknown variable (U), but there is no causal relationship between smoking and melanoma. In the unconditional analysis (Fig. 1c) there are no open paths from smoking to melanoma. However, when the analysis is conditioned on survival from the competing risk (Fig. 1d), there is an open (biasing) path between smoking and melanoma, which passes through the unknown confounding variable. Conditioning on survival is equivalent to censoring in a cohort study; in a case-control study it would be the result of the control sampling mechanism, e.g., recruiting age-matched controls from a population that over-represents smoking “survivors”. The bias introduced is considered to be “collider bias” because conditioning on the competing risk opens up a previously blocked path at a collider node. The conditioning step in this DAG is an example of competing risk bias because selecting on survival from a risk that competes with melanoma and is related to smoking opens a path between smoking and melanoma that would not have existed otherwise. This bias assumes the presence of an unknown variable that is related to the competing risk and to the disease of interest (melanoma), but which itself might not be related to the smoking exposure. This unknown variable could be thought of as a “fitness” quality, or some other unidentified characteristic that makes a person less likely to get cardiovascular disease or cancer, regardless of their smoking status.
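The mechanism in Fig. 1c, d can be illustrated with a minimal Python sketch (the study itself used SAS, so this is not the authors' code). All numeric values below are illustrative assumptions chosen to make the effect visible in a single run; in particular the melanoma risk is far higher than the 0.1 % used in the simulations, and the function name `simulate_collider_bias` is hypothetical.

```python
import random


def simulate_collider_bias(n=200_000, seed=1):
    """Simulate SM -> CR <- U -> MM with no SM -> MM effect.

    Returns (odds ratio in the full cohort, odds ratio among CR survivors).
    All probabilities are illustrative assumptions, not values from Table 2.
    """
    rng = random.Random(seed)
    # counts[smoker][melanoma]; booleans index the lists as 0/1
    full = [[0, 0], [0, 0]]
    survivors = [[0, 0], [0, 0]]
    for _ in range(n):
        sm = rng.random() < 0.30  # smoker
        u = rng.random() < 0.35   # unknown characteristic U
        # Competing-risk death depends on both smoking and U.
        cr = rng.random() < 0.05 * (5 if sm else 1) * (3 if u else 1)
        # Melanoma depends on U only: smoking has no causal effect here.
        mm = rng.random() < (0.06 if u else 0.02)
        full[sm][mm] += 1
        if not cr:  # conditioning on survival from the competing risk
            survivors[sm][mm] += 1

    def odds_ratio(t):
        # (exposed cases * unexposed non-cases) / (exposed non-cases * unexposed cases)
        return (t[1][1] * t[0][0]) / (t[1][0] * t[0][1])

    return odds_ratio(full), odds_ratio(survivors)
```

Under these assumed inputs the full-cohort odds ratio should sit near 1, while the odds ratio among survivors falls below 1, because surviving smokers are depleted of U and therefore of melanoma.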

Simulation studies and statistical analysis

For the purpose of the bias simulations, two scenarios were designed to be realistic regarding competing risk types and presence of other biasing variables. We selected three major competing risks of malignant melanoma (MM): death from lung cancer (CA), death from coronary heart disease (HD), and death from chronic obstructive pulmonary disease (COPD). In the first scenario (Fig. 2a), risk of death from lung cancer, heart disease, COPD and melanoma were related through three unmeasured variables (U1, U2, and U3), but there was no causal relationship between smoking and malignant melanoma. In the second scenario (Fig. 2c), we hypothesized that smoking caused melanoma. We selected only subjects who did not die from any of the three competing risks, thereby opening biasing paths between smoking and melanoma. In an alternative approach to both scenarios (Fig. 2b, d), we added an additional biasing path through a fourth unknown variable, U4. This variable represented known but unmeasured, or truly unknown, confounding variables that increase the risk of melanoma but are negatively related to smoking. The addition of U4 was intended to add a confounding path representing the type of biasing variable that is most elusive for adequate control in the smoking-melanoma literature (e.g., outdoor activities, sunscreen use, etc.).

Fig. 2

a Scenario 1 (no causal relationship between Smoking (SM) and Melanoma (MM)) but there are three open paths through the competing risks of death from lung cancer (CA), death from heart disease (HD), and death from COPD, which are connected by Unknown variables U1–U3 to MM. b Scenario 1, with additional confounding of the SM–MM relationship by U4. c Scenario 2 (introduces a causal relationship between SM and MM). d Scenario 2, with additional confounding of the SM–MM relationship by U4

Using Monte Carlo techniques, we simulated both scenarios with fixed priors for the population characteristics and the risk of death from smoking-related diseases, and 27 different input levels for the prevalence of the unknown variables and the relationship between the unknown variables and the known variables, for a total of 108 simulations. Each simulated cohort consisted of 100,000 persons who had binary variables generated representing gender, smoking status, melanoma status, death as a result of the competing risks of lung cancer, heart disease or COPD, and presence or absence of the unknown characteristics, U1 to U4. Gender and status of the unknown characteristics U1–U4 (that is, U_x where x is 1, 2, 3 or 4) were generated by random draws from independent Bernoulli distributions such that Female ~ B(1, 0.57) and U_x ~ B(1, P(U_x = 1)), where P(U_x = 1) varied between 0.2 and 0.5 depending on the simulation trial. Smoking status (current, former, never) was generated from two conditional Bernoulli distributions such that each smoking status was an exclusive categorization (i.e., no individual could belong to more than one category) with mean population distributions conditional on gender, as provided in Table 2. Additionally, in scenarios 1 and 2 with added confounding by U4, the probability of current and former smoking was modeled as a function of the U4 variable as in the following equations:

$$ Current\,Smoker \sim B\left( {1, \frac{1}{{1 + \exp \left( { - \left( {\log \left( {\frac{{P\left( {CS_0 = 1} \right)}}{{1 - P\left( {CS_0 = 1} \right)}}} \right) + \log \left( {OR_{CS - U4} } \right)U_{4} } \right)} \right)}}} \right) $$

and

$$ Former\,Smoker \sim B\left( {1, \frac{1}{{1 + \exp \left( { - \left( {\log \left( {\frac{{P\left( {FS_0 = 1} \right)}}{{1 - P\left( {FS_0 = 1} \right)}}} \right) + \log \left( {OR_{FS - U4} } \right)U_{4} } \right)} \right)}}} \right) $$

where P(CS_0 = 1) and P(FS_0 = 1) were the gender-specific current and former smoking rates as found in Table 2, respectively, and OR_{CS-U4} and OR_{FS-U4} were the odds ratios relating current and former smoking to the unknown characteristic U4.
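The two equations above share one form: a baseline probability is converted to log-odds, shifted by log odds ratios, and converted back to a probability. A minimal Python sketch of that step follows (not the authors' SAS code). The helper names are hypothetical, and the way the second draw is made conditional on the first is an assumption, since the text does not spell out how exclusivity was enforced.

```python
import math
import random


def shifted_probability(baseline, odds_ratios=(), indicators=()):
    """Return expit(logit(baseline) + sum(log(OR_i) * x_i))."""
    log_odds = math.log(baseline / (1 - baseline))
    for odds_ratio, x in zip(odds_ratios, indicators):
        log_odds += math.log(odds_ratio) * x
    return 1 / (1 + math.exp(-log_odds))


def draw_smoking_status(rng, p_current0, p_former0, or_cs_u4, or_fs_u4, u4):
    """Draw one exclusive smoking category given U4 (0 or 1)."""
    p_current = shifted_probability(p_current0, [or_cs_u4], [u4])
    p_former = shifted_probability(p_former0, [or_fs_u4], [u4])
    if rng.random() < p_current:
        return "current"
    # Assumption: the second draw is conditional on not being a current
    # smoker, rescaled so the marginal probability of "former" stays p_former.
    if rng.random() < min(1.0, p_former / (1 - p_current)):
        return "former"
    return "never"
```

For example, a baseline of 0.2 with an odds ratio of 0.7 and U4 = 1 gives odds of 0.25 × 0.7 = 0.175, that is, a probability of 0.175 / 1.175. The baseline rates passed in would come from Table 2; the values used in any call here are placeholders.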

Table 2 Priors for the simulation studies

Malignant melanoma was generated from random draws from a Bernoulli distribution, with an incident risk that was a function of the background risk of MM and unknown characteristics U1–U4, where P(MM_0 = 1) was the background risk of melanoma, and OR_{MM-U_x} was the odds ratio relating each of U1–U4 to melanoma:

$$ MM \sim B\left( {1, \frac{1}{{1 + \exp \left( { - \left( {\log \left( {\frac{{P\left( {MM_0 = 1} \right)}}{{1 - P\left( {MM_0 = 1} \right)}}} \right) + \mathop \sum \nolimits_{x} \left( {\log \left( {OR_{MM - U_{x}} } \right) U_{x}} \right)} \right)} \right)}}} \right) $$

In scenario 2, melanoma was additionally dependent on gender-specific rates for current and former smokers, where OR_{MM-CS} and OR_{MM-FS} were the odds ratios relating current and former smoking to melanoma, respectively:

$$ MM \sim B\left( {1, \frac{1}{{1 + \exp \left( - \left(\log \left( {\frac{{P\left( {MM_0 = 1} \right)}}{{1 - P\left( {MM_0 = 1} \right)}}} \right) + \log \left( {OR_{MM - CS} } \right)CS + \log \left( {OR_{MM - FS} } \right)FS + \mathop \sum \nolimits_{x} \left( {\log \left( {OR_{MM - U_{x}} } \right)U_{x} } \right)\right)\right)}}} \right) $$

“Death” from more than one competing risk per individual was not allowed, and the competing risk death variables were generated as a function of all related variables, over the life course: first lung cancer (CA), then heart disease (HD), then COPD, using the following equations:

$$ CA \sim B\left( {1, \frac{1}{{1 + \exp \left( { - \left( {\log \left( {\frac{{P\left( {CA_0 = 1} \right)}}{{1 - P\left( {CA_0 = 1} \right)}}} \right) + \log \left( {OR_{CA - CS} } \right) CS + \log \left( {OR_{CA - FS} } \right)FS + \mathop \sum \nolimits_{x} \left( {\log \left( {OR_{CA - U_{x}} } \right)U_{x}} \right)} \right)} \right)}}} \right) $$
$$ HD \sim B\left( {1, \frac{1}{{1 + \exp \left( { - \left( {\log \left( {\frac{{P\left( {HD_0 = 1} \right)}}{{1 - P\left( {HD_0 = 1} \right)}}} \right) + \log \left( {OR_{HD - CS} } \right) CS + \log \left( {OR_{HD - FS} } \right)FS + \mathop \sum \nolimits_{x} \left( {\log \left( {OR_{HD - U_{x}} } \right)U_{x}} \right)} \right)} \right)}}} \right)|CA = 0 $$
$$ COPD \sim B\left( {1, \frac{1}{{1 + \exp ( - (\log \left( {\frac{{P\left( {COPD_0 = 1} \right)}}{{1 - P\left( {COPD_0 = 1} \right)}}} \right) + \log \left( {OR_{COPD - CS} } \right) CS + \log \left( {OR_{COPD - FS} } \right)FS + \mathop \sum \nolimits_{x} \left( {\log \left( {OR_{COPD - U_{x}} } \right)U_{x}} \right)))}}} \right)| CA = 0, HD = 0 $$

For example, if a cohort member contributed to the deaths associated with lung cancer, they were not eligible to contribute to the deaths from heart disease or COPD.
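The sequential draw described above (lung cancer first, then heart disease conditional on CA = 0, then COPD conditional on CA = 0 and HD = 0) can be sketched in Python as follows. This is an illustration under assumptions, not the authors' SAS implementation: the function names and the `params` layout are hypothetical, and odds ratios of 1 stand in for unknown variables that are unrelated to a given cause.

```python
import math


def shifted_probability(baseline, odds_ratios, indicators):
    """Return expit(logit(baseline) + sum(log(OR_i) * x_i))."""
    log_odds = math.log(baseline / (1 - baseline))
    for odds_ratio, x in zip(odds_ratios, indicators):
        log_odds += math.log(odds_ratio) * x
    return 1 / (1 + math.exp(-log_odds))


def draw_competing_death(rng, cs, fs, u, params):
    """Return "CA", "HD", "COPD", or None for one simulated person.

    cs, fs: 0/1 indicators for current and former smoking.
    u: list of 0/1 indicators for the unknown variables.
    params: maps each cause to (baseline, or_cs, or_fs, or_u), where or_u is
    a list of odds ratios aligned with u (hypothetical layout).
    """
    for cause in ("CA", "HD", "COPD"):  # fixed order given in the text
        baseline, or_cs, or_fs, or_u = params[cause]
        p = shifted_probability(baseline, [or_cs, or_fs, *or_u], [cs, fs, *u])
        if rng.random() < p:
            return cause  # later causes are never drawn for this person
    return None
```

Returning at the first death is what enforces the conditions CA = 0 and CA = 0, HD = 0 in the second and third equations: a person who dies of lung cancer never reaches the heart disease or COPD draws.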

Published results from the Cancer Prevention Study II (CPSII) [25] cohort were used as an example for the realistic specification of inputs for the baseline variables gender and smoking status, the background risk of the smoking-related competing risks, and the relationship between smoking status (former or current vs. nonsmoker) and death from those competing risks. The background cumulative incidence of melanoma (0.1 %) is a conservative estimate based on the most recent CPSII study of smoking and melanoma [12]. In scenario 2, we added an assumed relationship between smoking and MM reflecting odds ratios of 1.5 and 1.2 for current and former smoking, respectively, which was based on the results found for the relationship between smoking and squamous cell carcinoma (SCC) in a large prospective study of nurses in the US [26]. The fixed inputs that were used in our simulation scenarios are provided in Table 2.

For each simulated dataset, we performed logistic regression of melanoma status on smoking status at the end of follow-up, restricting to subjects who did not “die” of any competing risk, stratified by gender. The analysis was repeated for each level of the different prior inputs. For each simulation, the distribution of the estimated ORs was summarized using the median (the 50th percentile) and 95 % simulation intervals (the 2.5th and 97.5th percentiles). All analyses were performed using SAS version 9.3 (SAS Institute, Cary, NC).
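The analysis step can be sketched in Python under two stated simplifications (the study used SAS logistic regression, not this code). First, with smoking status as the only predictor, the fitted logistic-regression odds ratio equals the cross-product odds ratio from the corresponding 2 × 2 table, so the sketch computes that directly. Second, stratification by gender is assumed to be done by calling the function once per gender. The function names and the record layout are hypothetical.

```python
import statistics


def survivor_odds_ratio(records, exposed_status, reference_status="never"):
    """Cross-product OR for melanoma among subjects with no competing-risk death.

    records: iterable of (smoking_status, melanoma, died_of_competing_risk).
    Raises ZeroDivisionError if a required cell of the 2 x 2 table is empty.
    """
    counts = {exposed_status: [0, 0], reference_status: [0, 0]}
    for status, melanoma, died in records:
        if died or status not in counts:
            continue  # restrict to survivors and to the two groups compared
        counts[status][int(melanoma)] += 1
    ref_no, ref_yes = counts[reference_status]
    exp_no, exp_yes = counts[exposed_status]
    return (exp_yes * ref_no) / (exp_no * ref_yes)


def simulation_interval(odds_ratios):
    """Return (median, 2.5th percentile, 97.5th percentile) of the estimates."""
    # n=40 yields cut points at 2.5 %, 5 %, ..., 97.5 %.
    cuts = statistics.quantiles(odds_ratios, n=40, method="inclusive")
    return statistics.median(odds_ratios), cuts[0], cuts[-1]
```

In use, `survivor_odds_ratio` would be applied to each simulated cohort and the resulting list of estimates passed to `simulation_interval`. The interpolation rule for percentiles may differ slightly from the SAS default.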

Results

In scenario 1 (Table 3), for which we assumed a true null association between smoking and melanoma, we observed an apparent protective effect of smoking on melanoma after conditioning on survival from the simulated competing risks (CR): lung cancer, heart disease, and COPD. The magnitude of this negative association was greater for current smokers and males than for former smokers and females. With no confounding by U4, the odds ratios detected at the strongest prior specifications and an assumed prevalence of 35 % for the unknown variables were 0.86 (0.65, 1.13) for male current smokers and 0.98 (0.78, 1.24) for female former smokers. Addition of confounding by U4 markedly increased the magnitude of the bias: at a strong relationship between U4 and SM and between U4 and MM (OR = 0.7 and 3.0, respectively), the corresponding results were 0.77 (0.61, 0.96) for male current smokers and 0.93 (0.77, 1.10) for female former smokers. It also appeared that the magnitude of bias was more sensitive to increases in the relationship between the unknown variables U1–U4 and the competing risk variables than it was to increases in the population prevalence of U1–U4.

Table 3 Simulation study results—Scenario 1 (Assumes ORMM-CS = 1 and ORMM-FS = 1)

In scenario 2 (Table 4) we assumed a weak causal relationship between smoking and melanoma with a dose response (OR = 1.5 and 1.2 comparing current and former smokers to non-smokers, respectively). The bias mechanism attenuated this causal relationship, but almost all of the resulting odds ratios (after selecting for survival from competing risks) showed a small positive association or were borderline null. Patterns of attenuation across gender and smoking status were the same as in scenario 1, with stronger positive associations found among males and current smokers and borderline null results found among females and former smokers. With no confounding by U4, the odds ratios detected at the strongest prior specifications (at the 35 % prevalence level) were 1.36 (1.07, 1.74) for male current smokers and 1.16 (0.90, 1.46) for female former smokers. Addition of U4 confounding at the highest level further attenuated these values to 1.22 (1.01, 1.48) and 1.09 (0.90, 1.31) for male current smokers and female former smokers, respectively, but none of the priors specified in scenario 2 resulted in a protective association between smoking and melanoma.

Table 4 Simulation study results—Scenario 2 (Assumes ORMM-CS = 1.5, ORMM-FS = 1.2)

Discussion

In this paper we hypothesized a bias mechanism that might be compatible with current published results on the relationship between smoking and melanoma, some of which suggest that smoking confers a small protective effect. We simulated a large realistic cohort and then varied the levels of our unknown or unmeasured variables in two scenarios: one in which the true relationship between smoking and melanoma was null, and a second in which there was a true causal relationship. In scenario 1, most of our priors yielded data that, after selection for competing risks in the simulated population, led to an apparent protective relationship between smoking and melanoma. The strength of this spurious inverse association increased with a stronger confounding effect of the unknown variables, but not markedly with a higher prevalence of the unknown variables. Increasing the prevalence of the unknown variables did not markedly increase the bias, but it did improve the precision of the detected inverse relationships. Almost all prior specifications in scenario 1 led to odds ratios that were compatible with the current literature results as presented in Table 1. Encouragingly and as expected, our simulation results in scenario 1 are similar to those obtained from different studies that used different methods. In scenario 2, however, the protective bias mechanism was unable to overcome the strength of the assumed causal relationship between smoking and melanoma. These results seem to indicate that, under our assumed causal structure and prior specifications, the published results presented in Table 1 could be compatible with a non-causal relationship between smoking and melanoma, and may instead be due to combined confounding and selection bias from death due to other competing effects of smoking. This exercise supports previous reports demonstrating that selective survival can result in attenuated associations among survivors at older ages, even though the true effects are not necessarily diminished [27–30].

Interpreting the results of this study requires some caution and context. The results are illustrative and only offer an alternative or additional explanation for interpreting the findings of smoking and melanoma studies given the working assumptions and models. We assume an underlying causal structure, and we simulate our population to behave exactly in the manner that our DAG describes. Additional assumptions are included in our fixed and variable inputs. We assume that there is no effect measure modification (other than by gender) in our populations, and no misclassification of the smoking exposure. We did not simulate any additional unmeasured or residual confounding other than that which was conveyed in the unknown variables U1–U4. Some consideration as to whether the magnitude of our simulated U1–U4 variables is realistic seems appropriate. For example, these variables were modeled with an OR of 5 between the unknown variable and its consequences in 36 of the 108 simulations performed, and it is unlikely that such confounding variables would not already have been discovered as independent predictors of disease if the true magnitude of their effect were that great. Perhaps a more likely scenario is that U1, U2, U3 and U4 individually represent a group of unknown or unmeasured characteristics which, acting together, are capable of the strength of confounding we have simulated. An example would be an unidentified gene-environment interaction, potentially composed of many interrelated (or unrelated) risk factors that could be acting in concert. Although we did base our simulations on published findings, further research is needed to determine the plausibility of this biasing mechanism in real data.

Cohort analysis using appropriate models for selection bias will account for known attrition due to competing risks in a manner that does not bias outcome measures [31], but these methods may not entirely rule out the possibility of this type of bias because smoking-related competing risks could increase the risk of loss to follow-up [32]. However, we can assume that a time-to-event analysis that accounts for competing risks would at least attenuate such a bias. Such a time-to-event analysis would not, however, add any validity in an uncontrolled confounding scenario such as was simulated by adding the variable U4. In fact, this type of analysis might contribute to the inverse association over time as confounding due to U4, if time-varying, would compound the resulting bias.

One distinction that is well characterized in this experiment is the sign of the bias that results from the two different types of biasing paths between smoking and melanoma in our causal structures. The collider bias that results from conditioning on a smoking-related competing risk of melanoma opens a path where it should remain closed, while the influence of the variable U4 represents failing to close a pathway where it should not be open. The sign of the bias due to uncontrolled confounding by U4 is net negative by being equal to the product of the negative association between U4 and smoking and the assumed positive association between U4 and melanoma [33]. In contrast, when we conditioned on survival from each competing risk (i.e. a collider), the sign of the resulting bias between smoking and melanoma was negative because: (1) the association between smoking and U1–U3 became negative conditional on their common effect i.e. the competing risk, and (2) this conditional negative association between smoking and each of U1–U3 induced by conditioning on the competing-risk combined with the positive association between each of U1–U3 and melanoma to induce a net negative association between smoking and melanoma [30, 34, 35]. These combined sources of bias produce a more powerful negative total bias than either source acting alone.

We aimed to demonstrate a mechanism of bias that might help to explain the yet-unexplained protective association between smoking and melanoma in the published literature. Strengths of this study include an attempt at simulating a realistic cohort, which was modeled after a real cohort (CPSII), and the exploration of a vast number of varying prior inputs for our unknown variables. Limitations of this study include an inherent limitation of all simulations—the inability to perfectly simulate the variation that exists in a real-life cohort, including measurement error, and complex effect measure modification mechanisms. While an explanation for the negative association between smoking and melanoma may still be forthcoming, exploration of the bias introduced by selecting on surviving the competing risks of smoking-related disease is an important consideration in studies of the smoking-melanoma relationship. In particular, future studies that uncover this type of association may benefit from an analysis strategy designed to address competing risk bias [36, 37].