1 Introduction

Long-term research has shown that being overweight or obese is a significant risk factor for type 2 diabetes, cancer, cardiovascular disease, and all-cause mortality [1]. Conversely, reducing excess body fat through a healthy diet can decrease the risk of chronic disease. Accurate and reliable estimates of body composition (fat mass vs. fat-free mass) are necessary for health care professionals and patients to quantify health risk and the progress of weight loss interventions [2]. Hydrostatic weighing (HW) is a common body composition measurement technique in exercise physiology labs around the world and has been used for over 50 years. The current “gold standard” technique for measuring body composition combines HW with dual-energy x-ray absorptiometry (DEXA) and air displacement plethysmography [3]. However, these methods require more expensive equipment and trained technicians, making them less accessible to many labs. In comparison to DEXA, HW has been found to produce reliable measurements of body composition and is widely used due to its ease of use and availability [4]. Recently, there has been a rise in home health electronics that are widely available and cost-effective. Emerging and converging technologies in the health and fitness industry have enabled consumers to seamlessly track numerous physiological variables (e.g., sleep, heart rate, activity level, and body composition) using Bluetooth and Wi-Fi wearables and smart scales [5]. Specifically, bioelectrical impedance analysis (BIA) devices can estimate body composition changes over time and are meant to encourage individuals to become more active and to quantify their progress more easily [5].

Commercial BIA devices provide a relatively simple, inexpensive, and non-invasive technique to estimate body composition, but there is a lack of research on the accuracy and consistency of many of these products. These devices differ from one another in a variety of ways, including the body segments used for analysis, electrode contact points, and body fat prediction equations [4,5,6]. Numerous studies have found foot-to-foot BIA to be accurate [7,8,9,10], while others have reported poor agreement between BIA and criterion methods [11,12,13,14,15,16]. Most consumer devices apply proprietary equations rather than directly displaying measured impedance. Because different equations may be needed for specific ethnic groups or body sizes, body fat results from BIA scales may not be as accurate for certain populations [6]. The Fitbit Aria™ Wi-Fi smart scale is a popular product that can wirelessly sync with computer and phone applications to monitor changes in both weight and body composition over time [7].

Overweight or obese patients may have these devices in their home in an effort to improve their health. As these devices are already being purchased and utilized by patients, it is important for health professionals to be able to attest to their accuracy and consistency. This is the first study to evaluate the validity and reliability of the Aria™ scale compared to criterion methods. Our primary objectives were to 1) evaluate the validity of body fat measurements using the Fitbit Aria™ compared to the criterion of HW and 2) determine the between-day and between-week consistency of body fat measurements using the Fitbit Aria™.

2 Methods

2.1 Participants

Forty-three healthy male (n = 22) and female (n = 21) volunteers (mean ± SD; age = 27.9 ± 5.6 yr, height = 169.9 ± 8.6 cm, mass = 68.9 ± 13.8 kg, Body Mass Index [BMI] = 23.7 ± 3.3 kg/m², HW body fat = 21.7 ± 6.3%) with an age range of 21 to 43 years participated in this study. All participants reported no known chronic diseases or illnesses [8].

2.2 Compliance with ethical standards

The study was approved by the San Francisco State University Institutional Review Board (IRB) for Human Subjects. Informed consent was obtained from all individual participants included in the study. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

2.3 Experimental Design

Using a repeated measures design, all participants reported to the laboratory at the same time of day for three separate visits (visit two took place 24 h after visit one, and visit three took place one week after either the first or second visit). This protocol was selected to replicate real-world use, as many individuals track weight loss progress with daily or weekly weigh-ins. Female participants were scheduled to start the first day of testing between days five and eleven of their menstrual cycle to avoid potential confounding from water retention [9]. Each participant was asked to comply with the following pre-test instructions: no eating or drinking for three hours prior to testing, emptying bladder and bowels within 30 min of testing, no alcohol for 24 h prior to testing, and no strenuous exercise for at least 12 h prior to testing. Each participant wore the same style of clothing for all three visits and for all body composition testing (Females: swimsuits or spandex shorts and sports bras; Males: Speedo swimsuits, Lycra bike shorts [no padding], or boxer briefs).

During visit one, height was measured to the nearest centimeter on a wall-mounted stadiometer (Seca 216 Mechanical Stadiometer, Seca Inc., Chino, CA, USA). Each participant was weighed to the nearest 0.01 kg, wearing approved testing attire, on a calibrated digital scale (BOD POD: Life Measurement, Inc., Concord, CA, USA). Residual volume (RV) was determined with the participant in a seated position using the Wilmore oxygen dilution method via a metabolic cart (True One 2400®, Parvo-Medics, Inc., Provo, UT, USA). Participants completed a minimum of three trials, and the average of the closest two trials within 5% was used to represent RV. Total body water was measured with a 4-lead bioelectrical spectroscopy device (ImpediMed SFB7, ImpediMed Inc., Carlsbad, CA, USA) to estimate hydration status. Body composition was measured via BIA using the Fitbit Aria™ (see Fig. 1a) and then via HW (see Fig. 1b and c). For both the second and third visits, only weight, hydration status, and body composition were measured.
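The RV averaging rule described above can be sketched in Python. This is only an illustration under stated assumptions: the function name `residual_volume_l` is hypothetical, and "within 5%" is interpreted here as the difference between the two closest trials relative to their mean, which the text does not specify.

```python
from itertools import combinations


def residual_volume_l(trials_l, tolerance=0.05):
    """Average the two closest RV trials (litres) if they agree within `tolerance`.

    Assumption: agreement is judged as |a - b| / mean(a, b).
    """
    if len(trials_l) < 3:
        raise ValueError("at least three trials are required")
    # Pick the pair of trials with the smallest absolute difference.
    a, b = min(combinations(trials_l, 2), key=lambda pair: abs(pair[0] - pair[1]))
    pair_mean = (a + b) / 2
    if abs(a - b) / pair_mean > tolerance:
        raise ValueError("no two trials agree within tolerance; another trial is needed")
    return pair_mean


# Made-up trial values in litres, for illustration only (not study data).
print(residual_volume_l([1.30, 1.34, 1.50]))
```

Raising an error when no pair agrees mirrors the idea that further trials would be collected, although the text only states that a minimum of three trials was completed.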

Fig. 1 (a) Participant standing on the Fitbit Aria™ Wi-Fi smart scale with both bare feet completely on the scale surface and weight evenly distributed. (b) Participant preparing to submerge fully underwater in the hydrostatic weighing (HW) tank as a researcher delivers verbal instructions. (c) Participant submerging fully underwater in the HW tank as the researcher records measurements from the load cell.

2.4 Bioelectrical impedance analysis

The participant’s height in feet and inches, weight in pounds, birth date, and sex were entered into an online Fitbit account synced with the Aria™ scale (Fitbit Inc., San Francisco, CA, USA) via private Wi-Fi. For additional security, up to eight users may associate their Fitbit.com accounts with a single scale. If someone without an account uses the scale, only weight is displayed, and the data are not shared with anyone or attributed to any existing user account. Results from the scale can be found on the user’s Fitbit.com dashboard at http://www.fitbit.com/weight. The Aria™ connects directly to the user’s home wireless network and supports the 802.11b wireless standard, WEP/WPA/WPA2 personal security, and automatic (DHCP) IP network configuration. Following the Fitbit Aria™ Instruction Manual, the participant was asked to stand on the scale with both bare feet completely on the scale surface and weight evenly distributed (Fig. 1a). According to the Instruction Manual, the scale measures the weight of children and adults over 9 kg, up to a maximum of 158 kg. The participant was instructed not to move until the percent body fat was displayed on the scale. Four load cells measure weight, and indium tin oxide (ITO) electrodes, which form a pattern on the top of the scale, send a small, safe signal through the body to measure impedance. The Aria™ has both a ‘regular’ (AR) and a ‘lean’ (AL) mode. According to the Instruction Manual, AL is intended for professional or high-level athletes, such as marathon runners or body builders, or individuals with very low body fat relative to their muscle mass (Fitbit Inc., 2016). All participants were tested using both scale modes (regular first, followed by lean). For each mode, three body fat measurements were recorded and the average was used to represent scale body composition. The Aria™ uses proprietary equations to calculate percent body fat for each mode and does not display impedance values.

2.5 Hydrostatic weighing

Underwater weight (UWW) was measured in a custom-made HW tank in which a chair was suspended from a load cell (Omega IN-USBH, Omega Engineering Inc., Stamford, CT, USA) (Fig. 1b and c). A minimum of three trials were performed until an underwater weight plateau was observed. The same researcher performed all UWW measurements on all three days to minimize the risk of measurement bias. A damping technique, as previously described, was performed to reduce the size of mechanical scale arm oscillations [10]. The UWW was calculated as the average of the heaviest three trials measured. With both the chair and sinker attached, the load cell was calibrated to zero using a two-point calibration with a 4.5 kg weight. Load cell output was converted to kilograms using a predetermined calibration equation in the Omega transducer software installed on a laboratory computer. Three load cell data points from each trial were recorded and averaged. Water temperature was recorded to determine water density. Body volume was calculated as [11]:

$$ \mathrm{Body\ Volume} = \left(\frac{\mathrm{Dry\ weight} - \mathrm{UWW}}{\mathrm{Water\ density}}\right) - \left(\mathrm{RV} + \mathrm{V}_{\mathrm{G}}\right) $$

Gastrointestinal gas volume (VG) was assumed to be 100 ml. Body density was calculated by dividing dry mass by body volume. Body fat percentage was calculated from body density using two-component population-specific formulas [8]:

$$ \%\,\mathrm{Body\ Fat} = \left(\frac{495}{\mathrm{Body\ Density}}\right) - 450 $$
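The chain of calculations above (body volume, then body density, then percent body fat) can be sketched in Python. This is a minimal illustration, not the study's software: the function names are hypothetical, the units are assumed to be kilograms and litres (so VG = 0.1 L), and the example inputs are made-up values rather than participant data.

```python
def body_volume_l(dry_mass_kg, uww_kg, water_density_kg_per_l, rv_l, vg_l=0.1):
    """Body volume (L): (dry mass - UWW) / water density - (RV + VG)."""
    return (dry_mass_kg - uww_kg) / water_density_kg_per_l - (rv_l + vg_l)


def body_density_kg_per_l(dry_mass_kg, volume_l):
    """Body density: dry mass divided by body volume."""
    return dry_mass_kg / volume_l


def percent_body_fat(density_kg_per_l):
    """Percent body fat from body density using the 495 / density - 450 formula."""
    return 495.0 / density_kg_per_l - 450.0


# Hypothetical inputs for illustration only (not study data).
dry_mass = 70.0          # kg, weight in air
uww = 3.0                # kg, underwater weight
water_density = 0.9957   # kg/L, assumed value for warm water
rv = 1.3                 # L, residual volume

volume = body_volume_l(dry_mass, uww, water_density, rv)
density = body_density_kg_per_l(dry_mass, volume)
print(f"volume={volume:.2f} L, density={density:.4f} kg/L, "
      f"body fat={percent_body_fat(density):.1f}%")
```

Keeping each step as its own function makes it easy to check intermediate values (for example, that body density falls in a plausible range near 1.0 to 1.1 kg/L) before converting to percent body fat.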

2.6 Statistical analysis

Statistical analysis was conducted using IBM SPSS Statistics (version 24, SPSS Inc., Chicago, IL, USA). Mean and standard deviation (mean ± SD) values were calculated for participant characteristics using data from the first day of testing. The method of Bland and Altman [12] was used to identify the mean bias (mean difference) and 95% limits of agreement between the Aria™ (in both regular and lean modes) and HW. Mean difference was determined using a one-sample t test (Aria – HW). The 95% limits of agreement were calculated as ±1.96 SD of the mean difference between methods [12]. Bland-Altman plots were analyzed for scedasticity (pattern of mean biases’ random error across the range of body fat percentages) by examining the coefficient of determination (R²) between the mean values and absolute differences. Plots were determined homoscedastic with an R² value of 0.0–0.1, and heteroscedastic with R² value >0.1 [13]. Pearson’s correlation and standard error of estimate (SEE) were also determined for the Aria™ compared to HW. All validity statistics were performed using data from the first day of testing. Two-way mixed, average measures absolute agreement intraclass correlation coefficients (ICC) were used to compare measurement reliability between days (Day 2 to 1) and between weeks (Day 7 to 1). The standard error of the measurement (SEM), which is an absolute measure of reliability, was calculated according to Weir et al. 2005 [14].
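The Bland-Altman quantities described above (mean bias, 95% limits of agreement, and the R² check for scedasticity) can be sketched in Python using only the standard library. The analysis in this study was run in SPSS, so this is an illustrative reimplementation under assumptions: the function names are hypothetical, and the data are made-up values rather than study measurements.

```python
import statistics


def bland_altman(device, criterion):
    """Return (mean bias, lower limit, upper limit) for device - criterion."""
    diffs = [d - c for d, c in zip(device, criterion)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, so no linear relationship to report
    return sxy ** 2 / (sxx * syy)


def scedasticity(device, criterion, threshold=0.1):
    """Classify from R^2 between pairwise means and absolute differences."""
    means = [(d + c) / 2 for d, c in zip(device, criterion)]
    abs_diffs = [abs(d - c) for d, c in zip(device, criterion)]
    r2 = r_squared(means, abs_diffs)
    label = "heteroscedastic" if r2 > threshold else "homoscedastic"
    return r2, label


# Made-up body fat percentages, for illustration only (not study data).
aria = [20.0, 25.0, 30.0, 18.0]
hw = [22.0, 24.0, 33.0, 19.0]
bias, lower, upper = bland_altman(aria, hw)
r2, label = scedasticity(aria, hw)
print(f"bias={bias:.2f}, LoA=({lower:.2f}, {upper:.2f}), R2={r2:.2f} ({label})")
```

A negative bias in this sketch means the device reads lower than the criterion, matching the (Aria – HW) sign convention used in the text.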

For all tests, statistical significance was accepted at P < 0.05.
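As a rough illustration of the SEM named in the statistical analysis, one commonly used form is SEM = SD × √(1 − ICC). The text does not state which form was applied, so the formula choice, the function name `sem_from_icc`, and the example inputs below are all assumptions made for this sketch.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement, assuming the form SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


# Illustrative inputs only: an SD of 6.3 percentage points and an ICC of 0.96.
print(round(sem_from_icc(6.3, 0.96), 2))
```

Because the SEM is expressed in the same units as the measurement (here, percentage points of body fat), it is easier to interpret against a practical error threshold than the unitless ICC.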

3 Results

All participants had a BMI ranging from 16.7 to 33.1 (5.4–36.3% BF derived from HW) with BMI categorizations of 4.7% underweight (BMI below 18.5), 67.4% healthy weight (BMI 18.5–24.9), 23.3% overweight (BMI 25.0–29.9), and 2.3% obese (BMI > 30.0). Participants identified themselves as primarily Caucasian (76%), Singaporean (Chinese, Indian, Malay) (18%), and Hispanic (5%).
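The BMI categories listed above can be sketched as a small Python classifier. The function names are hypothetical, and one boundary choice is an assumption: values from 29.9 up to 30.0 are treated as overweight and a BMI of exactly 30.0 as obese, since the ranges in the text leave those cases unspecified.

```python
def bmi(mass_kg, height_cm):
    """Body Mass Index in kg/m^2 from mass (kg) and height (cm)."""
    height_m = height_cm / 100.0
    return mass_kg / height_m ** 2


def bmi_category(value):
    """Map a BMI value to the categories used in the text."""
    if value < 18.5:
        return "underweight"
    if value < 25.0:
        return "healthy weight"
    if value < 30.0:
        return "overweight"
    return "obese"  # assumption: 30.0 itself is counted as obese


# The lowest and highest BMI values reported for the sample.
for value in (16.7, 33.1):
    print(value, bmi_category(value))
```

Note that applying `bmi` to the group mean mass and mean height will not reproduce the reported mean BMI exactly, because the mean of individual BMI values is not the same as the BMI of the mean mass and height.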

Bland-Altman plots comparing the Fitbit Aria™, in both AR and AL modes, to the validity criterion (HW) for all participants, males, and females are presented in Fig. 2. Compared to HW, there were no significant differences between measurements of BF% using AR for all groups (all participants, females, and males) (Table 1 and Fig. 2). There was no significant difference in measurements of BF% using AL for females compared to HW (Table 1). AL significantly underestimated BF% for all participants and males (analyzed separately from females) compared to HW (P < 0.05) (Table 1). The SEE and Pearson’s correlation coefficient for BF% are presented for each comparison in Table 1. SEE values for both modes and all groups were above 3.5%. Though the scale consistently underestimated BF% for males in both modes, the SEE values were lower in both AR (3.7%) and AL (3.6%) and closer to an acceptable error range than for females. Estimations of BF% for males were strongly correlated for both AR (r = 0.83) and AL (r = 0.84) with the selected criterion method of HW. The lowest correlation was for females using AR (r = 0.55). Table 2 quantifies relationships represented in Fig. 2, indicating scedasticity. Male participants demonstrated heteroscedasticity, indicating unequal distribution of random error across the range of body fat percentages. Additionally, linear regressions revealed no significant Bland-Altman trends for any of the methods or groups (P > 0.05).

Fig. 2 Bland-Altman plots for all participants, women, and men showing mean biases (solid lines) and limits of agreement (dashed lines) for body fat (BF) percent estimates from each Fitbit Aria™ Wi-Fi smart scale mode (regular, AR; lean, AL) compared to hydrostatic weighing (HW). All comparisons are from the first day of testing.

Table 1 Validity of body composition measurements by the Fitbit Aria™ Wi-Fi smart scale using hydrostatic weighing as the criterion
Table 2 Bland-Altman plot correlations indicating scedasticity

Between-day and between-week reliability of body fat percentage (BF%) measurements for HW and the Aria™ in both regular (AR) and lean (AL) modes are presented in Table 3. Results showed similar day-to-day and week-to-week reliability values (mean difference, ICC, and SEM) for both Aria™ modes and all groups (all participants, men, and women). As a control, hydration status, as measured by total body water, did not differ significantly between days or between weeks.

Table 3 Day-to-day and week-to-week reliability of Fitbit Aria™ Wi-Fi smart scale and hydrostatic weighing (HW) to estimate body fat percentage (BF%)

4 Discussion

This is the first study to investigate the validity (i.e., accuracy) and reliability (i.e., consistency) of body fat measurements using the Fitbit Aria™ Wi-Fi smart scale compared to HW. In this group of healthy adults, measurements of body fat percentage from both scale modes (AR and AL) for all groups (all participants, males, and females) were highly reproducible with ICC values ≥0.96 for both day-to-day and week-to-week (Table 3).

However, the smart scale did not estimate BF% within ±3.5% SEE of HW for either mode. Although there were no significant differences between the Aria™ measurements of BF% compared to HW for all groups with AR and for females with AL, large individual error existed and the two methods did not agree sufficiently to be used interchangeably (Table 1).

Body fat percentage estimates from the Fitbit Aria™ scale, in both AR and AL modes, support previous research indicating that segmental BIA correlates well with a criterion method, but large limits of agreement prevent satisfactory assessment for individual subjects [9, 11,12,13,14,15,16]. Depending on the specific equations used by each BIA scale, body fat results may not be accurate for certain populations [6]. Additionally, individual and sex differences in fat distribution can contribute to inaccuracies in foot-to-foot BIA body fat measurements [15]. A criticism of foot-to-foot BIA is that the electrical current emitted from the device is focused in the lower portion of the body, hence detecting gynoid fat, but potentially missing android fat [15, 16]. For the Fitbit Aria™, body fat percent for male participants was more often significantly underestimated compared to HW in either mode. This discrepancy could be due to sex differences in fat distribution, as males typically store adipose tissue around the abdomen [15]. This population of males may have had predominantly android fat distribution, which the foot-to-foot BIA did not adequately register or which the equations used in the scale to calculate body fat did not accurately account for, ultimately leading to underestimation of body fat.

Additionally, male participants displayed heteroscedasticity (R² > 0.1), implying unequal distribution of random error across the range of body fat percentages (Fig. 2 and Table 2). This finding could be due to two outliers on either extreme of body fat percentage (one participant at 5.3% and one at 34.5% body fat). A larger sample of participants at these extremes is needed to determine whether the scale increasingly underestimates body fat in males with higher body fat percentages. Female participants displayed homoscedasticity (i.e., consistent pattern of random error across the range of body fat percentages). The Aria™ both under- and overestimated female body composition across the range of body fat percentages, and a linear regression revealed no significant Bland-Altman trends.

In the present study, AL significantly underestimated body fat percent compared to HW for all participants and males. Although no significant difference was found for AL measurements of BF% compared to HW for female participants, the mean bias indicated that BF% was underestimated by 1.9% on average. According to the Fitbit Aria™ Instruction Manual, ‘lean’ mode is intended for lean individuals or professional athletes, such as marathon runners and body builders [7]. A study by Swartz et al. [17] investigated the effect of different activity levels on two different modes (‘athlete’ and ‘adult’) for a foot-to-foot BIA device compared to HW in adult males. The study concluded that scale mode impacted validity of body fat percentage estimates depending on the activity level of the participant. There was no significant difference between BF% compared to HW when athlete mode was used for highly (>10 h aerobic exercise per week) and moderately active adults (2.5–10 h aerobic exercise per week), though the range of individual errors was high [17]. In the present study, the activity level for each participant was not quantified; however, this could be a contributing factor to help predict which individuals should utilize lean mode on the Aria™ smart scale. Further investigations quantifying participant activity level are necessary.

The Fitbit Aria™ reliably (consistently) measured BF% for both male and female participants in either AR or AL (Table 3). Similar within- and between-day reliability results have been found using foot-to-foot BIA scales [16, 18]. Though reliability of a device is essential, it does not imply the device is sensitive to changes in body composition over time, such as with fat loss or muscle mass gains. Studies have shown that similar foot-to-foot BIA devices were able to detect changes in fat-free mass and fat mass over time [6]. Utter et al. [19] found that a foot-to-foot BIA scale was able to reliably track body fat loss over time in obese individuals with the same accuracy as HW. Kavis et al. [20] found agreement between DEXA and foot-to-foot BIA in measuring the direction of change in body fat percent over time in obese children. The authors concluded that the BIA scale could be used to accurately measure the direction of change in body fat percent, but initial body fat measurements were not accurate compared to DEXA [6]. Future research investigating Aria™ sensitivity to detect changes in BF% over longer durations is needed.

Though sex, body mass, and body fat did not influence the variations in scale body fat measurements, the current results only report the accuracy and reliability for a group of predominantly Caucasian male and female participants from the San Francisco Bay Area. Further research investigating a more diverse population is needed to determine if the equations used in the scale are accurate for other ethnicities/populations. This study investigated a group of adults aged 21 to 43 years. Typically, fat-free body mass declines between the ages of 30 and 70 years, while fat mass increases with age [21]. Due to the shift in body composition seen with increasing age, future research validating the Aria™ for a population aged over 45 years is necessary. Additionally, future research should include participants with a wider range of BF% and fitness levels. To improve accuracy, we recommend that future designs of at-home BIA devices 1) include hand and foot sensors to account for both gynoid and android fat, 2) utilize (and potentially develop new) equations specifically taking into account age and ethnicity (as well as sex and ‘lean’ or ‘regular’ modes), and 3) also report the detected impedance values for those who might find them useful.

A body fat measurement device’s consistency over time is arguably the most important consideration when monitoring a patient’s progress during weight loss interventions. While the Fitbit Aria™ smart scale may not be valid for all individuals (as compared to “gold standard” lab methods), it is a reliable at-home method for measuring body composition from day-to-day and week-to-week. At an affordable price point of ≈$100 per unit, many patients will be able to access their body composition with the privacy and security they require. At-home Wi-Fi smart scales may empower patients and encourage them to track their own BF% changes over time. Health professionals may consider recommending this technology for patients who want to monitor or improve their body composition independently. In conclusion, the Fitbit Aria™ Wi-Fi smart scale can provide simple, rapid, and reproducible measurements of body fat percent over time.