Introduction

Digital health applications, or “apps,” have become a popular method of tracking important health indicators for clinical and personal use [1]. Apps can be useful tools for tracking and accomplishing health goals [1, 2]. A segment of these health apps is used to track menstrual cycles. Menstrual cycle tracking apps (MCTAs) assist users in observing their menstrual cycle and related signs and symptoms, as well as managing their fertility [3]. MCTAs give users more control over their own personal health [3]. MCTAs have features that can increase users’ knowledge about the menstrual cycle in general, and the act of tracking cycles can help users learn the patterns of their own bodies [4]. This makes a sample of women using an MCTA a useful source of data for menstrual cycle research.

MCTAs are a valuable potential tool for epidemiologic research [1]. With MCTAs, menstrual cycle study samples can expand from hundreds of participants to thousands or more. For some MCTAs, all users agree to share their data anonymously, which avoids the need for study “recruitment” and may decrease volunteer bias [5,6,7,8,9]. Some MCTAs facilitate tracking of ovulation, providing researchers with access to population-level data on ovulation timing, which is not easily accessible. The use of apps is increasing as mobile users switch from web browsing to app use and smartphone use continues to grow worldwide. Three quarters of smartphone subscription growth came from Africa and Asia in the first quarter of 2015 [10]. This suggests that apps are globally available, increasing the feasibility of including a diverse group of users in menstrual cycle research [10]. Use of data from MCTAs could improve our understanding of the menstrual cycle [11].

While the promise of MCTAs for epidemiologic research is exciting, there are potential limitations to these data. As yet, it is not clear how representative or accurate MCTA data are, or how susceptible they are to missing data and loss to follow-up. The purpose of this literature review was to synthesize published literature on MCTAs with respect to their utility for epidemiologic research. For this review, we examined all published studies that included MCTA-collected data and extracted from those studies information related to several primary areas of interest, chosen for their relevance for epidemiologic research. These areas were 1) selection: who uses MCTAs and why; 2) misclassification: are MCTA-collected data accurate; and 3) overall, what is the potential for using MCTAs in epidemiologic research.

Methods

PubMed, Web of Science, and Scopus were searched for literature on the topic of MCTAs. Search terms used were phrases about menstruation (menstrual cycle, menses, menstrual), contraception (natural family planning methods, fertilization, fertility, fertile, conception, conceiving, ovarian cycle, endometrial cycle, family planning, time-to-pregnancy), and applications (smartphone, mobile application, app(s), portable software, portable electronic, app-based). References from retrieved papers were also examined for additional literature. One additional article was included after reviewing these references. These searches yielded a total of 150 articles as of September 2020.

Articles were screened to identify those that address the epidemiologic characteristics of app data; in broad categories, these were 1) accuracy of identifying ovulation or measuring time to pregnancy; 2) demographic or behavioral characteristics of users of MCTAs and reported reasons for using apps, which may further describe population characteristics; and 3) degree of missingness and loss to follow-up in app data. Articles were initially screened on their title and abstract. Publications were excluded either if they did not contain original data or if the purpose of the described apps was solely clinical decision-making or patient education. The remaining 81 articles were subject to a full-text review. The same criteria were applied along with these additional exclusions: meeting or poster abstracts, articles not in English, and studies focusing only on a wearable device. This left 49 articles.

While we had established broad categories of interest prior to the full-text review, we modified these categories where, for example, no studies addressed them, or where we needed to accommodate categories we had not considered. After reviewing these full-text articles, the specific domains for this review were 1) characteristics of MCTA users in research on menstrual cycles, fertility, and contraception; 2) reasons users use or continue using MCTAs; 3) accuracy of identifying ovulation and utility for promoting and preventing pregnancy; and 4) previously published quality assessments of MCTAs. Table 1 summarizes the domains chosen for this review and the primary objectives of each of the papers included within that domain. We aim to review these specific domains in the published literature, not to make direct comparisons across apps. Our review synthesizes the literature within each of these domains independent of the research objective of each paper.

Table 1 Primary objective of each paper included in this review and the section(s) in which it is cited

Results

Characteristics of MCTA Users in Research

This section focuses on published scientific papers that used MCTA data, and as such, the samples of users described in our review are those who volunteered or provided enough data to be included in those publications. Some MCTAs obtain consent from women separately for research studies (a separate recruitment step), while others obtain consent from all users when the MCTA is first downloaded. Thus, each MCTA has its own population of users, and the published data from any one MCTA may not be representative of all users of that MCTA; however, we do not have any means of describing the characteristics of all users of a given MCTA. Recruitment techniques may help to diversify study participants recruited from an MCTA. For example, a US study of users of the app Dot (Dynamic Optimal Timing) found that self-guided enrollment was preferred, and that the percentage of Black and Hispanic participants increased when recruitment changed from enrollment with a study representative to a self-guided process [12•].

Understanding the demographic characteristics of MCTA users is important for several reasons. First, it may help researchers select which app they want to use for their research, for example, an app primarily used by teenagers, or an app primarily used by those trying to conceive. Second, describing MCTA users will help to determine the generalizability of any analytic results derived from their recorded data. Third, it would help to describe the potential for selection bias when addressing a specific research question. For instance, in a study looking at the behavior of users of two different apps, Kindara users mainly resided in the USA and used the app to promote pregnancy, while Sympto users were mainly European and used the app to prevent pregnancy [46•]. This section highlights the demographic data of papers from the literature review in which the primary research objective was menstrual cycles, fertility, or contraception. Table 2 summarizes MCTAs discussed in this paper.

Table 2 Summary of the most commonly mentioned MCTAs in this review

MCTA Users in Menstrual Cycle Research

We found ten studies that describe menstrual cycle research using MCTAs. Six were descriptive studies of menstrual cycle length or ovulation, and the remaining four will be described individually. One study addressed menstrual bleeding intensity and was focused on adolescents [13]. This study required a history of regular menstrual cycles (21 to 45 days long) and menses lasting less than 8 days [13]. Since teens are more likely to have irregular cycles, and the time since menarche influences their regularity, this study selected for girls who are further from menarche or achieved regularity quickly [51]. The study found that teenagers prefer an MCTA to paper cycle tracking, but this may only be applicable to teens with regular cycles [13]. One MCTA study reported an association between sexually transmitted infection and increased premenstrual symptoms (headache, cramps, and sadness) in younger users (median age: 26) [14]. Another study used an app to deliver an acupressure intervention to German users with dysmenorrhea aged 18–34 and reported a reduction in menstrual pain over six menstrual cycles [15]. Finally, in a randomized controlled trial of workers in Japan aged 20–45, use of an MCTA was associated with reduced depression and dysmenorrhea after 3 months of use [16].

The populations from the remaining six studies are from the USA [17, 18], the UK and the USA [19], Japan [20], or a combination of the USA, UK, and Sweden [21•]. In the sixth study, the full user base includes 150 countries, 5 continents, and 8 languages, yet most users reside in the USA and Europe [46•]. These app studies primarily represent the USA and Europe. In one of these studies, most participants reported their race as White (78%) [18]. However, four menstrual cycle studies do not present race/ethnicity data beyond country of residence [19, 20, 21•, 46•]. In total, it appears that most studies include populations of European descent or do not present race/ethnicity data at all.

Three of the menstrual cycle studies had a mean age of 30 or higher [20, 21•, 46•] and one study reported that 70% of users were aged 25–34 [18]. One study did not provide any demographic data [19]. While the literature is small, most MCTA menstrual cycle studies included older users with adolescents and young adults less well-represented. In one study, users tended to have a college degree or higher education (71%) [18]. Many studies did not report the education level of their participants [19, 20, 21•, 46•].

Body mass index was centered on the normal range for two of the studies (mean BMI = 23) [21•, 46•], while one included 31% overweight or obese users [18]. In the latter study, 30% of users were missing BMI information [18]. Some studies did not report the BMI distribution [19, 20]. In total, MCTA menstrual cycle research shows varying distributions of BMI although other MCTA studies did not report BMI data at all.

MCTA studies sometimes impose limits on the average cycle length or regularity of the participants in their analyses. For example, one study observed cycles of 23–67 days but excluded those that were 1.5 times longer or shorter than the user’s reported cycle length (approximately 70% of participants reported a cycle length of 25 to 30 days) [19]. Some studies have included a wide range of cycle lengths: 19 to 60 days [18], 10 to 90 days [21•], 20 to 45 days [20]. Another study based on fertility awareness methods did not have strict exclusion criteria based on length, but did require that cycles > 40 days did not have any mid-cycle bleeding and that the total cycle length was 4 days longer than the number of bleeding days reported [46•]. Overall, menstrual cycle research using MCTAs has incorporated a wide range of average cycle lengths.
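As an illustration of how such an exclusion rule might be applied to app data, the following is a minimal sketch in Python. It assumes one reading of the criterion in [19]: a cycle is dropped if it is more than 1.5 times longer, or more than 1.5 times shorter, than the user's self-reported cycle length. The function name, the `factor` parameter, and the example values are hypothetical and are not taken from any cited study.

```python
def keep_cycle(observed_days, reported_days, factor=1.5):
    """Return True if an observed cycle length is within the plausibility band.

    Assumed interpretation: keep cycles between reported/factor and
    reported*factor days (inclusive). The cited study may define the
    bounds differently.
    """
    if observed_days <= 0 or reported_days <= 0:
        raise ValueError("cycle lengths must be positive")
    lower = reported_days / factor
    upper = reported_days * factor
    return lower <= observed_days <= upper


# Hypothetical (observed, self-reported) cycle lengths in days.
cycles = [(28, 28), (45, 28), (18, 28), (40, 30)]
kept = [observed for observed, reported in cycles if keep_cycle(observed, reported)]
print(kept)
```

A rule of this kind removes implausible records (for example, two cycles logged as one), but it also removes genuinely long or short cycles, which is the trade-off between misclassification and generalizability.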

In conclusion, MCTA menstrual cycle studies focus on users of White race or European residence, although some studies did not describe their study sample. MCTA menstrual cycle studies tend to include older users, and researchers interested in younger populations may need to seek out an app that targets that population. MCTA-based studies include a wide range of BMI and cycle lengths. Exclusions of cycle lengths should be carefully considered with regard to how this may balance misclassification and generalizability. MCTA studies have an opportunity through their large user base to describe menstrual cycle characteristics across a diverse sample. Without the careful reporting of demographic characteristics, it is a challenge to ascertain if samples are representative of the MCTA user population, or of wider country or even global populations. We suggest that publications using MCTA data for menstrual cycle research thoroughly describe their sample’s demographic characteristics, and the peer-review process should request this. This is fundamental for understanding generalizability and potential for selection bias.

MCTA Users in Fertility Research

Two of the apps previously described as contributing to menstrual cycle research [18, 19] have also been used in research on the probability of conception; these are the Clearblue Connected Ovulation Test System (a combination of app and ovulation tests) [23] and Ovia Fertility [22]. One additional study used data from the Clue app to develop a model for conception; however, the demographics of the user base were not described [24]. The two studies with demographic information were based in either the UK [23] or the USA [22], and both samples had a mean age of 30 and a wide range of BMI (mean = 26 and SD = 5 [23] or 43% overweight or obese [22]). MCTAs have the potential to over-select for people who are subfertile; for example, women who know they are subfertile may be more likely to use an app to help them time their intercourse or track their cycles to facilitate conception. However, data to evaluate this issue are limited. Lower conception rates and a higher rate of endometriosis than those of the general population were found in one MCTA study [18]. In a cohort study of women attempting to conceive, the prevalence of subfertility was not higher in MCTA users [25]. However, users of “selected” MCTAs and “other” MCTAs were more likely to take folic acid than non-users (83.3% and 73.7% vs 66.8%) and were less likely to have recently used contraceptive hormones (37.4% and 34.6% vs 44.0%) [25]. The fertility profile of MCTA users should be further investigated, especially when the research objective is fertility related, because this is important for generalizing the results.

MCTA Users of Contraceptive Apps

MCTAs can be used to prevent pregnancy, and some of the previously described apps used for fertility research or menstrual cycle research can also be used by women who wish to avoid pregnancy. The characteristics of women who use an app for contraception may differ from those who choose to use the same app to aid conception, so we focus on describing users studied in MCTA contraception research in this section. The app with the most published research, the Natural Cycles contraceptive app, is FDA-approved as a contraceptive [50] and provided data in the previous section on menstrual cycle research. Users of the app were for the most part young (aged 20–35), of a healthy BMI, and from Sweden [8]. The remaining two studies that specifically address contraceptive research focused on low-income countries [26••, 27] and used the Cycle Beads contraceptive app. The Cycle Beads users were young (69.7% are 18–25), 45.4% were students, 27% were in a relationship but not married, and 49.7% attended/completed university/postgraduate school [26••]. One important observation from the Cycle Beads studies was that a third of the users were not previously using another form of contraceptive, suggesting that MCTAs could fill an unmet need for contraception [26••].

Reasons Users Choose or Continue Using a Particular MCTA

A person may select a specific MCTA for a variety of reasons, including accuracy, a referral from a friend, inclusiveness, and tracking features. MCTA users continue to engage with an app if they are satisfied with their experience. One important consideration for choosing and continuing an MCTA is its perceived accuracy. Those planning to use an app for contraception rated accurate ovulation prediction as highly important (90.7%). Users lose trust in an MCTA [28] and discontinue use when menstrual cycle milestones are miscalculated, which can be due to the method of calculation or to user characteristics, such as irregular cycles [27, 29, 30]. For contraceptive MCTAs, the more accurate methods include a wider fertile window [31], yet discontinuation is also more likely with a wide fertile window as it allows for fewer days with unprotected intercourse [8]. This highlights users’ desire for an MCTA that accurately identifies the exact fertile window which allows for more days of unprotected intercourse. Research using MCTAs should consider that the sample is selected for users who have been satisfied with the accuracy of the app.

Users refer their friends to a well-liked MCTA [27, 32, 33]. Fifty percent of Cycle Beads users were referred by a friend [27]. Furthermore, 68.4% of 1000 survey respondents reported that word of mouth was a somewhat or very important reason for choosing an app [33]. This suggests that the users of a given app may be clustered in ways that are meaningful for epidemiologic research.

Using MCTAs can be discouraging for users whose identities are not represented by the app. There is a tendency for MCTAs to focus on fertility, and users who are outside of the gender binary, are infertile, are new menstruators, are celibate, or are not heterosexual may feel excluded [29, 34]. My Period Tracker and Glow have received feedback for being gendered, and therefore are more likely to select for cisgender users [35••]. Users describe Clue as gender neutral and not focused on pregnancy, despite imagery suggesting a male partner [35••].

MCTAs are used to observe cycles independent of fertility. In two online surveys, most respondents used MCTAs to track their cycle [28, 30]. Likewise, in two qualitative studies, themes of cycle observation for health purposes emerged, regardless of fertility goals [29, 35••]. Another qualitative study found that people trying to conceive reported learning more about personal fertility patterns from using MCTAs [36]. This suggests that app users who track their cycles are interested in learning about their health, and apps that address this motivation may better retain users. Furthermore, capitalizing on this interest may lead to more complete data in MCTA datasets. Users with irregular cycles find it beneficial and report better control of their condition when using an MCTA [37]. Keeping track of symptoms could be useful for both providers and patients for managing menstrual disorders [38]. However, in a study with 72 participants who had dysmenorrhea and premenstrual syndrome (PMS), only 24% said it helped them understand PMS patterns [32]. This points to an area in which MCTAs can grow and potentially retain a group of women who otherwise may not continue with an app.

Accuracy of Ovulation and Fertile Window Prediction

Here, we review the published literature that has evaluated the accuracy of fertile window and ovulation prediction of MCTAs. Many MCTAs are based on fertility awareness methods and some are marketed for pregnancy prevention [39]. For some MCTAs, it is unclear whether health professionals or the published scientific literature have contributed to their development [40, 41]. A calendar method refers to tracking a menstrual cycle on a calendar, and different MCTAs have different variations of this method. Apps that used one of four calendar methods all had an ovulation day prediction accuracy of lower than 90%, ranging from 17 to 89% [31]. The wider the estimated fertile window, the more likely the app was to identify the true fertile days [31]. However, a wider fertile window has implications for user adherence, as fewer days with unprotected intercourse will be allowed. In a different study, the MCTA Dot was found to be more effective than other calendar-based MCTAs for users with non-average cycle lengths because of the app’s ability to change predictions based on an individual’s data [42]. Still, the authors note that Dot is most effective for people with regular cycles [42]. Similarly, the LunaLuna app was compared with existing calendar-based methods (Ogino and HCL) and was found to be more accurate at predicting ovulation, particularly as the number of menstrual cycles per user increased, and at the extremes of cycle length [20]. The standard day method, a type of calendar method, is traditionally used with a color-coded strand of beads that is representative of a menstrual cycle and identifies days 8 through 19 as fertile [27]. This method, which is used by the app Cycle Beads, was only effective for users with cycles between 26 and 32 days and does not adjust predictions over time [27, 42].
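The fixed rule behind the standard day method can be written down in a few lines, which makes clear why it cannot adapt to an individual user. The sketch below is illustrative only: it encodes the two figures given above (fertile days 8 through 19, and applicability to cycles of 26 to 32 days), while the function and constant names are hypothetical and do not reflect the Cycle Beads implementation.

```python
# Values taken from the description of the standard day method above.
FERTILE_START_DAY = 8
FERTILE_END_DAY = 19
MIN_CYCLE_DAYS = 26
MAX_CYCLE_DAYS = 32


def is_fertile_day(cycle_day):
    """Return True if the cycle day (day 1 = first day of menses) is flagged fertile."""
    return FERTILE_START_DAY <= cycle_day <= FERTILE_END_DAY


def method_applicable(recent_cycle_lengths):
    """Return True if every recorded cycle length falls within 26 to 32 days.

    Assumption for this sketch: a user with no recorded cycles is treated
    as not yet eligible.
    """
    lengths = list(recent_cycle_lengths)
    return bool(lengths) and all(
        MIN_CYCLE_DAYS <= n <= MAX_CYCLE_DAYS for n in lengths
    )


# The same twelve days are flagged in every cycle, whatever the user's history.
fertile_days = [day for day in range(1, 33) if is_fertile_day(day)]
print(fertile_days)
print(method_applicable([27, 30, 32]), method_applicable([25, 28]))
```

Because the flagged window never changes, prediction quality depends entirely on whether the user's cycles stay within the 26 to 32 day range, in contrast with apps that update predictions from an individual's data.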

Ovulation testing in combination with an MCTA may improve accurate identification of the fertile window, leading to increased conception rates. In a prospective cohort study of women trying to become pregnant (the Pregnancy Study Online or PRESTO), women who used an MCTA had a higher probability of conceiving and the associations were stronger when the MCTA was used with fertility indicators such as basal body temperature, ovulation tests, and cervical fluid [25]. Similarly, in another study, using ovulation tests in combination with a study app was associated with twice the odds of conception compared with only using the app [23]. On their own, basal body temperature methods are sensitive to misclassification due to multiple temperature peaks in a cycle (not necessarily due to multiple ovulations) and fever [43]. The Natural Cycles contraceptive app incorporates basal body temperature measurements into the algorithm that predicts the fertile window. In two studies of the accuracy of the Natural Cycles app (over 4000 users), the Pearl Index typical-use score (the number of contraceptive failures per 100 person-years of exposure) was 7.0 [8, 9, 52], which was a conservative estimate [7,8,9], and was still an improvement on other fertility awareness–based methods, which have a typical-use Pearl Index of 24 [8].
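For readers less familiar with the Pearl Index, the definition given above (contraceptive failures per 100 person-years of exposure) corresponds to the short calculation sketched below. The function name and the example counts are hypothetical and chosen only to illustrate the arithmetic; they are not the counts reported in the Natural Cycles studies.

```python
def pearl_index(failures, person_years):
    """Contraceptive failures per 100 person-years of exposure."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 100 * failures / person_years


# Hypothetical counts for illustration only (not from any cited study):
# 70 unintended pregnancies observed over 1000 person-years of app use.
example = pearl_index(failures=70, person_years=1000)
print(f"Pearl Index: {example:.1f}")
```

Because the denominator is exposure time rather than number of users, the index is sensitive to how discontinuation and incomplete follow-up are handled, which is relevant when app users stop logging data.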

The Ovia Fertility app allows users to input self-detected ovulation information, and the resulting data have been compared with published studies of ovulation timing to determine their consistency [18]. For example, the probability of pregnancy was highest on the 5 days before and on the day of ovulation, which is consistent with previously published biomarker-based studies of the fertile window [18]. Symptoms of ovulation such as cervical fluid changes also changed around the estimated ovulation day. Taken together, the authors suggest that the Ovia Fertility app contains accurate ovulation and fertile window data. While ovulation testing may improve the accuracy of fertile window identification, it does increase user burden and may decrease user engagement or lead to discontinuation.

The accuracy of MCTAs may also be improved by incorporating an educational or training component [44]. A study on the app CycleProGo found fewer missing data and increased long-term use when users took a natural family planning course [45]. Researchers should determine whether an app they are considering employing includes a training component, and the implications of that training for accuracy and user burden.

Missing Data and Accuracy

Missing data can affect MCTA prediction accuracy [6]. MCTA studies acknowledge that missing data are a problem [46•], but do not always fully describe the frequency of missingness or the implications of missing data for interpretation [24]. Of 1.4 million eligible menstrual cycles, ovulation could not be assigned in the Natural Cycles app in 665,603 cycles, which may have been largely due to missing temperatures: 75% of the cycles without a day of ovulation were missing temperatures for at least half of the cycle [21•]. Users who record more intercourse track more data in the Kindara and Sympto apps; these users are also trying to conceive (40% of cycles had recordings every single day when the user was trying to conceive) [46•]. This suggests that a goal of conception encourages users to record data. MCTAs are designed to reduce missingness by incorporating reminders, which users report as helpful [53], but also annoying [53] and patronizing [36]. Users have a clear preference for making reminders optional [30, 35••]. MCTAs with a low burden for users could promote data entry and improve app accuracy [47]. However, even with innovations designed to target simplicity, over half of 196 MCTA users in an online survey cited app complexity as a reason for switching, and 22% reported having switched apps before [28]. Future research and innovation will be necessary to address missingness in MCTA data. Moreover, innovations must balance the decrease in missingness with the increase in user annoyance that may lead to app switching. For epidemiologists, a longitudinal cohort might not be feasible if many users switch apps.
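To make the missingness figures reported for the Natural Cycles data [21•] more concrete, the sketch below shows one way a per-cycle missingness fraction could be computed and how the overall share of cycles without an assigned ovulation day follows from the two counts quoted above. The function names and the 0.5 threshold parameter are assumptions for illustration (the threshold mirrors the "at least half of the cycle" description), and the total of 1.4 million is a rounded figure, so the resulting share is approximate.

```python
def missing_fraction(cycle_length_days, days_with_temperature):
    """Fraction of cycle days (1..cycle_length_days) with no temperature entry."""
    if cycle_length_days <= 0:
        raise ValueError("cycle_length_days must be positive")
    recorded = {d for d in days_with_temperature if 1 <= d <= cycle_length_days}
    return 1 - len(recorded) / cycle_length_days


def mostly_missing(cycle_length_days, days_with_temperature, threshold=0.5):
    """True if temperatures are missing for at least `threshold` of the cycle."""
    return missing_fraction(cycle_length_days, days_with_temperature) >= threshold


# Hypothetical cycles: temperatures logged on days 1-14 of a 28-day cycle,
# and on days 1-24 of a 30-day cycle.
print(mostly_missing(28, range(1, 15)), mostly_missing(30, range(1, 25)))

# Approximate share of eligible cycles without an assigned ovulation day,
# using the counts quoted in the text (the total is rounded to 1.4 million).
share_unassigned = 665_603 / 1_400_000
print(f"{share_unassigned:.1%}")
```

Reporting both quantities (how many cycles are affected, and how much is missing within each) would let readers judge whether missingness is likely to bias ovulation estimates.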

Quality Assessments of MCTAs

Previous studies have evaluated MCTAs on disparate measures of “quality.” While accuracy of identifying the fertile window and scientific quality are typically included [40, 44, 48, 49•], other criteria include access to technical support [40, 44], password protection [40], privacy policy clarity [54], third party advertising [40], cost [44], and ease of use [44, 49•]. However, perception of quality may differ based on intended use of the app, and studies comparing quality have been inconsistent in their scoring criteria [44, 48]. For example, one study developed ten criteria applicable to apps in general (other than accuracy), and another developed eight criteria related to fertility specifically [44, 48]. Thus, quality assessments of apps are difficult to compare, having been based on different criteria. Efforts have been made to develop a standardized method of evaluating quality for MCTAs. Moglia et al. adapted the APPLICATIONS Scoring System, which is used to evaluate mobile apps in general, for use with MCTAs; one additional study followed suit [40, 49•]. None of the reviewed apps was considered perfect by the applied scoring system in either study. Although these two studies applied similar criteria, they did not review the same applications so direct comparisons cannot be made [40, 49•]. One of these two studies reported the highest score for Ovia Fertility Period Tracker with 13/15 points and the lowest to Pregnancy Tracker Baby Center with 9/15 points [49•], while in the second of these studies, Clue received the highest score with 13/15 points and Free Girl Cal received the lowest score with 6/15 points [40]. Reasons for lower scores included an unclear link to published scientific research, and a potential lack of health professional involvement in their design [49•]. Both Clue and Ovia received high scores for addressing all reviewed areas except “involvement of a health professional” [40, 49•]. 
Ovia scored particularly well in comprehensiveness, which was a measure of the diversity of tracking features and educational information [49•]. Clue included “other” features that set it apart including having a medical disclaimer and health education material, data security features like a backup and the ability to export data, availability in Spanish, custom reminders, the ability to track many menstrual characteristics, and alerts for the next menses and the fertile window [40]. These assessments show that “quality” can include many facets of apps beyond scientific accuracy and will therefore depend somewhat on the priorities of the user or the researcher. Furthermore, if a user shifts their goals, for example, from wanting to avoid pregnancy to trying to achieve it, the relative quality of the app they are using may change, which the adapted APPLICATIONS Scoring System does not account for [49•].

Discussion

The objective of this review was to provide an epidemiologic perspective on the current MCTA literature. We found that the existing literature fell into four relevant categories which are described in our “Results” section: characteristics of MCTA users, reasons women use or continue using MCTAs, accuracy of identifying ovulation or the fertile window, and comparisons of MCTAs across differing measures of “quality.” These four categories inform our understanding of the potential for selection bias and misclassification when using MCTA data independent of the research goals of the currently published literature. We included 48 studies in this review—a small but burgeoning literature. Regarding the first category, we found a tendency for published MCTA menstrual cycle studies to include users of White race or European residence, while most included a wide range of BMI and cycle lengths. Several studies did not report the demographic information of their users at all. MCTA-based studies of fertility should investigate the prevalence of subfertility in their user base to determine if MCTAs designed to aid conception are used predominantly by those with fertility concerns. Fewer studies exist describing characteristics of individuals who use MCTAs for contraception, despite there being some evidence that MCTAs can fill unmet contraceptive needs.

We found that the reasons that people use MCTAs vary and can change over time, which is consistent with one other review article [55•]. Users who feel an MCTA meets their needs are more likely to continue to use it and to recommend it to others. MCTAs that employ non-binary or non-gendered environments will appeal to users who are underrepresented in gendered, fertility-focused apps. Additionally, cycle tracking, independent of pregnancy planning, is an important motivation for using MCTAs. Features that appeal to MCTA users who are not planning a pregnancy, such as a focus on tracking symptoms rather than ovulation, may improve their satisfaction with the app, and therefore their data completeness.

The accuracy of apps depends upon the algorithm used, and apps may benefit from the inclusion of biological monitoring rather than just calendar methods. Missing data can contribute to inaccuracy of the MCTA’s prediction algorithm, as well as selection or generalizability issues; however, few MCTA-based studies have described the potential impact of missing data on their results, and this is likely to be an important area of future research.

Several reviews have already been conducted that have identified MCTAs that are of higher quality based on both scientific and non-scientific criteria. These assessments show that “quality” can include many facets of apps beyond scientific accuracy and will therefore depend somewhat on the priorities of the user or the researcher. Without a standardized method to evaluate quality, users, providers, and researchers will need to make their own decisions about an app on an individual basis.

The literature on incorporating MCTAs into research is currently small, but the availability of large datasets (hundreds of thousands of menstrual cycles) and rich data collection (daily diaries, questionnaires, geolocation data) will lead to increasing interest from researchers. Our recommendations for future research using MCTAs include describing the diversity of the user base and assessing how the MCTA addresses the needs of its users. Different strategies have been applied to promote user engagement, which is important for reducing missing data and maintaining longitudinal cohorts. Furthermore, MCTA studies should thoroughly describe their sample’s demographic and behavioral characteristics and previous fertility experience, and the peer-review process should request this. This is fundamental for understanding generalizability and the potential for selection bias. Finally, the choice of MCTA for research will depend on the research question and the ability of the MCTA to address the population, exposure, or health condition of interest.

Limitations

A limitation of this review is the inability to directly compare MCTAs because studies used different validation methods to evaluate accuracy. Furthermore, not all of the literature disclosed the name of the particular app being studied. Literature is still focusing on evaluating the algorithms and methods that an MCTA uses, and most apps do not share their method. This compounds the difficulty of direct app comparisons.

Conclusion

MCTAs are an important tool for the advancement of epidemiologic research on menstruation. MCTA studies should describe the demographic and behavioral characteristics of their user base and the patterns of missing data. Describing the motivation for using MCTAs over time and validating the data collected should be prioritized in future research. The ubiquity of MCTAs provides an opportunity for epidemiologic research that is demographically diverse. At the same time, MCTA data could also be leveraged to address an underrepresented health condition or exposure.