Introduction

Panic disorder and agoraphobia (PD/A) are very common and tend to become chronic if untreated (Jacobi et al., 2014). These anxiety disorders often cause considerable impairments in patients’ social, work, and private life (Skapinakis et al., 2011). Research has shown that cognitive behavioral therapy (CBT) is effective in the treatment of PD/A (e.g. Bandelow et al., 2021; NICE, 2020). However, numerous patients remain untreated due to long waiting times for psychotherapy, fear of being stigmatized, and differences in geographical proximity to mental health care institutions (Neutens, 2015; Villatoro et al., 2022). Hence, there is a great need for alternative treatments that are easy to access. Recently, digital health applications (DHA) have gained popularity as an alternative approach to treating people with various mental disorders, including anxiety disorders. DHA are medical applications that can be prescribed by therapists or physicians and that are implemented by means of a computer program, a browser-based web program, or a smartphone-based app (Maaß et al., 2022). Meta-analyses have shown that DHA are well accepted and effective in the treatment of various mental problems, including anxiety disorders (Andersson et al., 2014; Andrews et al., 2010, 2018). A more detailed analysis of the specific effects of such interventions on panic and agoraphobic symptoms is provided in a more recent review and meta-analysis (Stech et al., 2020). The authors concluded that DHA outperformed waitlist control groups and psychoeducational control groups for both panic and agoraphobic symptoms, with higher effect sizes for panic symptoms (Stech et al., 2020). Results from three studies even suggested that DHA might be as effective as face-to-face treatment for PD/A, and that these effects are stable. Even more encouraging, Pauley et al. (2023) calculated a number needed to treat of 1.8 for DHA in PD/A, suggesting that roughly one in two persons treated with DHA will benefit from treatment.

These findings are somewhat limited by a high heterogeneity of study quality (Stech et al., 2020) and by collapsing different kinds of DHA and hybrid interventions combining digital and therapeutic contact in one analysis. It is, thus, still difficult to conclude which kind of DHA might work best for anxiety disorders. As these disorders are often linked to considerable situational avoidance, DHA using smartphone-based interventions might be especially promising as treatment often incorporates exercises outside, and people are used to taking their smartphones with them wherever they go (de Vries et al., 2021; Heron & Smyth, 2010).

To date, smartphone-based interventions for anxiety have yielded somewhat mixed results (see Firth et al., 2017). Positive effects were shown by Ivanova and colleagues (2016), who compared a waitlist control group with an Acceptance and Commitment Therapy (ACT) intervention for social anxiety and panic disorder, delivered by laptop and smartphone. ACT outperformed the control group with an effect size of d = 0.39 in the reduction of general anxiety symptoms. Interestingly, effects were comparable between participants who received additional therapeutic feedback and participants who used the digital intervention without further guidance, suggesting that the app was effective as a stand-alone treatment. However, it has to be noted that no significant group differences were found when PD/A symptoms were analyzed separately. Another recent study by Ebenfeld and colleagues (2021) found a desktop- and smartphone-based CBT self-management intervention for PD/A to be superior to a waitlist control group in reducing PD/A symptoms. Here, participants received standardized written feedback and reminders from coaches to increase motivation and adherence.

While there is some evidence that DHA might be effective in reducing symptoms of panic disorder and agoraphobia, the relevance of additional therapeutic contact is still a matter of debate. Some studies have indicated that guided DHA were significantly more effective in reducing symptom severity (Domhardt et al., 2019) and led to higher adherence rates (Linardon et al., 2019) than self-management interventions, whereas Pauley et al. (2023) and Ivanova et al. (2016) found no significant difference between guided and unguided digital interventions for anxiety. Given the limited resources in health care, evidence-based interventions that work without any therapist guidance might further help to improve health care for individuals suffering from PD/A who cannot be treated otherwise. In addition, relying on a smartphone app rather than on hybrid internet- and smartphone-interventions offers advantages: smartphones are used much more frequently and therefore allow immediate engagement in exercises and monitoring of symptoms (Linardon et al., 2020). These advantages may also keep users more engaged (Heron & Smyth, 2010).

However, specific studies examining smartphone-based DHA as a self-management treatment in panic disorder and agoraphobia are missing. Therefore, the app “Mindable” was developed to provide self-help for people suffering from PD/A. App development was informed by an evidence-based cognitive behavioral treatment manual for panic disorder and agoraphobia (Lang et al., 2012), which emphasizes exposure exercises as a core active ingredient of CBT. This treatment manual has demonstrated its effectiveness in a large multicenter RCT examining two variants of treatment delivery: exposure with and without therapeutic assistance (Gloster et al., 2011). Interestingly, both treatment variants led to stable reductions in anxiety symptoms, suggesting that an instructed exposure exercise without therapist assistance in the field might yield positive effects. This finding encourages the assumption that well-described self-help instructions for exposure treatment might hold potential as a low-threshold alternative when psychotherapy is not available.

The development of the app “Mindable” was guided by the treatment manual used in the study of Gloster and colleagues, thus, attempting to translate an evidence-based treatment to a digital intervention. As studies have shown that adding cognitive interventions to exposure does not lead to a significant improvement of the treatment, the app “Mindable” does not include cognitive therapy (Lang et al., 2009) but focuses on psychoeducation, self-monitoring, and exposure to internal and external cues.

The current study aimed to evaluate the effects of a smartphone-based self-management delivery of an exposure-based treatment manual for panic disorder and agoraphobia. Notably, the treatment was realized without additional therapeutic contacts as a stand-alone self-help intervention. Based on previous studies, it was assumed that the app “Mindable” would be effective in reducing symptoms associated with panic disorder and agoraphobia. More specifically, it was hypothesized that using the app would result in higher reductions in panic and agoraphobic symptoms as well as stronger improvements in anxiety-related control beliefs, quality of life, and functional impairment compared to a waitlist control group.

Method

Study Design

To evaluate the app’s efficacy, we conducted a prospective multicenter two-armed RCT. Participants diagnosed with panic disorder and/or agoraphobia were randomly assigned either to an intervention group (app group, AG), who received the app “Mindable” for eight weeks, or to a waitlist control group (CG), who received general information on self-help during the waiting time for psychotherapy (Hähnel et al., 2004). Ethical approval for the study was granted by the ethics committee of the German Society for Psychology (reference number: LangThomas2020-12-14VA). The study was preregistered in the German Clinical Trials Register (registration number: DRKS00029090) and conducted in agreement with CONSORT guidelines.

Randomization

Randomization was conducted by a staff member not involved in the diagnostic processes. The assignment was stratified for the presence or absence of agoraphobia as well as for the study center, which conducted the initial diagnostic processes. Block randomization with n = 20 participants per block was used to achieve a balanced distribution. The random numbers were determined by the program “research randomizer” (www.randomizer.org). The staff members involved in the diagnostic processes were blinded to the participants’ assignment, and baseline- and post-assessment were carried out by different staff members.
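The allocation procedure described above can be illustrated with a minimal sketch. It assumes that blocks of 20 were permuted separately within each stratum (study center by presence of agoraphobia) with a 1:1 allocation ratio, which the text does not state explicitly. The class and function names are hypothetical, and Python's `random` module stands in for the "research randomizer" tool that was actually used.

```python
import random


def block_assignments(block_size=20, arms=("AG", "CG"), rng=None):
    """Return one shuffled block with an equal number of slots per arm."""
    rng = rng or random.Random()
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be divisible by the number of arms")
    block = list(arms) * (block_size // len(arms))
    rng.shuffle(block)
    return block


class StratifiedBlockRandomizer:
    """Keeps a separate sequence of permuted blocks for every stratum."""

    def __init__(self, block_size=20, arms=("AG", "CG"), seed=None):
        self.block_size = block_size
        self.arms = arms
        self.rng = random.Random(seed)
        self.queues = {}  # stratum -> remaining assignments of the current block

    def assign(self, center, has_agoraphobia):
        """Return the next group label for a participant in the given stratum."""
        stratum = (center, has_agoraphobia)
        queue = self.queues.setdefault(stratum, [])
        if not queue:  # start a new block once the current one is used up
            queue.extend(block_assignments(self.block_size, self.arms, self.rng))
        return queue.pop()
```

Within any stratum, every completed block of 20 contains exactly ten assignments per group, which is what keeps group sizes balanced across centers and diagnoses.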

Participants

Participants were recruited from July 2020 until September 2022 via online advertisements, social media, press releases, and during initial presentations at the outpatient treatment centers involved in the study (Hamburg, Bremen, Münster). Because of the COVID-19 pandemic, recruitment as well as baseline- and post-assessments were switched from face-to-face to online in September 2021. From then on, participants could be recruited from all over Germany.

Participants had to be at least 18 years of age and meet the DSM-5 diagnostic criteria of panic disorder and/or agoraphobia, which were assessed using a structured clinical interview. Participants were excluded if they (a) were currently undergoing psychotherapy, (b) did not have a smartphone, (c) had had a change in medication in the last two months or were taking benzodiazepines, (d) had a comorbid substance use or psychotic disorder, (e) had comorbid chronic respiratory or cardiovascular diseases, (f) were suicidal, or (g) did not have sufficient German language abilities or were illiterate.

Treatment

Participants in the treatment condition received a standardized introduction to the app “Mindable” and were guided through the installation of the app on their individual smartphone. They received an activation code for the app directly from the researcher. Usually, the app “Mindable” has to be prescribed by physicians or psychotherapists who confirm that people meet the diagnostic criteria of PD/A. With the prescription, patients may ask their health insurance for an activation code that activates the app on their smartphone. For study purposes, the app was ready to use when installed.

The app includes a module “psychoeducation”, comprising nine multimedia information lessons on the etiology and maintenance of panic disorder and agoraphobia. A second module “symptom provocation” informs about aims and procedures of interoceptive exposure and provides instructions for interoceptive exercises, such as hyperventilation or spinning. The third module “exposure” provides information about exposure in vivo as well as suggestions and protocols for specific exposure exercises. In both the “symptom provocation” and the “exposure” module, participants are first instructed to select a specific exercise out of given examples (e.g. spinning or riding a bus). It is also possible to create an individual exposure hierarchy and derive exercises. Users might start an exercise within the app and can record the course of the exercise within specific protocols. The app also provides reminders for planned exercises and charts examining anxiety curves within and across exercises. The app also includes a daily “diary” concerning anticipatory anxiety, panic attacks and avoidance behavior. Once a week, users are invited to give information on the current state of their symptoms in the “checkup”.

For our study, participants were instructed to use the app on a daily basis over the period of eight weeks. Participants were informed that they could contact research staff in case of technical problems, but there was no further therapeutic contact. All app modules were accessible from the beginning, and participants were free to decide how they wanted to use the app. More detailed information on all app data points can be found in Appendix D.

The waitlist control group received a standardized leaflet with non-specific information on how to deal with waiting time for psychotherapy (recommendations of self-help literature and general recommendations, such as staying in contact with others, carrying out self-care activities, and exercising). This leaflet was developed for another study examining effects of a minimal intervention during the waiting time for psychotherapy (Helbig & Hoyer, 2007).

Assessments

The baseline-assessment (T1) comprised the informed consent procedure and, once written consent had been provided, a structured diagnostic interview as well as all outcome measures and additional symptom measures to further examine moderator and mediator variables. After four weeks, a between-assessment (T2) was conducted online in order to be able to analyze specific mediators or moderators in the future. Eight weeks after the baseline-assessment, the post-assessment (T3) took place, which comprised a structured diagnostic interview as well as the outcome measures and the additional symptom measures. After twelve weeks (T4), participants were asked to rate the outcome measures and the additional symptom measures again in an exploratory follow-up-assessment. The between-assessments are not included in the current analysis, as no moderator or mediator analyses are conducted. Follow-up data were only partially available, and most waitlist control group participants had started using the app “Mindable” by the time the follow-up-assessment took place, which compromised treatment integrity at follow-up. Therefore, these data were not used for further group comparisons. However, we provide descriptive follow-up data in Appendices E to G.

Treatment integrity was evaluated by means of an adherence score, which was defined a priori based on clinical judgements. To generate the adherence score, the recommended app use was defined for each app module; meeting or exceeding these recommendations resulted in an adherence score of 100%, whereas use below the recommendations was assigned lower adherence scores according to predefined categories. First, three adherence scores were calculated for (1) the module “psychoeducation”, (2) the “checkup” and the “diary”, and (3) the number of overall exercises in the modules “symptom provocation” and “exposure”. To reach a 100% adherence score, participants had to (1) complete nine out of nine psychoeducation lessons, (2) have at least eight entries in the “checkup” and “diary”, and (3) have conducted at least five exercises. Second, an overall adherence score was calculated as the average of the three adherence scores. Participants with an overall adherence score of ≥ 75% were considered adherent.
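The scoring logic can be sketched as follows. The thresholds for a 100% score (nine lessons, eight monitoring entries, five exercises), the averaging step, and the 75% cut-off are taken from the description above. How use below the recommendation maps onto lower scores is not specified in detail, so the proportional rule used here is an assumption, and all function names are hypothetical.

```python
def module_adherence(observed, recommended):
    """Percentage of the recommended use that was reached, capped at 100.

    Assumption: partial use is credited proportionally.
    """
    return min(observed / recommended, 1.0) * 100.0


def overall_adherence(lessons, monitoring_entries, exercises):
    """Average of the three module scores described in the text."""
    scores = (
        module_adherence(lessons, 9),             # psychoeducation lessons
        module_adherence(monitoring_entries, 8),  # "checkup" and "diary" entries
        module_adherence(exercises, 5),           # interoceptive + in vivo exercises
    )
    return sum(scores) / len(scores)


def is_adherent(score, cutoff=75.0):
    """Participants at or above the cut-off count as adherent."""
    return score >= cutoff
```

Under this rule, a participant who completes all lessons, makes four monitoring entries, and documents no exercise would receive an overall score of 50% and would not count as adherent.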

Diagnostic Status

The German version of the Diagnostic Interview for Mental Disorders (DIPS, Margraf et al., 2017) was used in the baseline-assessment to evaluate whether participants met the inclusion criteria or any of the exclusion criteria. The DIPS assesses all mental disorders according to the DSM-5. The interview was delivered in the centers or via certified video software (RED Medical, 2014) by trained members of staff who held a diploma or master’s degree in Psychology and were currently training to be psychotherapists.

Outcome Measures

Severity of Panic and Agoraphobic Symptoms

The primary outcome measure was the self-assessed severity of panic and agoraphobic symptoms by means of the German version of the Panic and Agoraphobia Scale (PAS, Bandelow, 2016). The PAS is useful in measuring treatment efficacy (Bandelow et al., 1998) and comprises five subscales (panic attacks, agoraphobic avoidance, anticipatory anxiety, impairment in social relationships and work, assumptions of somatic disease). The 13 items are rated on five-point Likert-scales. Total scores range between 0 and 52 with scores from 0 to 8 indicating no clinically relevant symptoms, scores from 9 to 28 indicating moderate symptoms and scores from 29 and above indicating severe levels of symptoms. Cronbach α for the total score was 0.84 in this study.

Quality of Life

Quality of life was assessed by means of the German version of the World Health Organization Quality of Life Scale (WHOQOL-BREF, Angermeyer et al., 2000). Its psychometric properties are highly satisfactory (Skevington et al., 2004). Participants are asked to rate 26 items which cover the four domains physical health, psychological health, social relationships and environment. For the current study, the domain psychological health was chosen as outcome-relevant (Cronbach α = 0.81 in this study). Scores can range from 0 to 100.

Anxiety-Related Control Beliefs

Anxiety-related control beliefs were assessed by means of the German version of the Anxiety Control Questionnaire (ACQ, Helbig-Lang et al., 2012b). Studies have shown that anxiety-related control beliefs serve as a mediator for symptom change in psychotherapy of anxiety disorders. The ACQ is highly reliable (Brown et al., 2004). The 30 items are rated on a scale from 0 (“strongly disagree”) to 5 (“strongly agree”) and total scores range from 0 to 150. In the current study, Cronbach α was 0.81.

Functional Impairment

Perceived functional impairment was assessed by means of the German version of the Sheehan Disability Scale (SDS, Leon et al., 1992). The SDS consists of three items. Participants are asked to rate their impairment in professional, social and family settings. Impairment scores range from 0 (“not at all”), 1–3 (“mildly”), 4–6 (“moderately”), 7–9 (“markedly”) to 10 (“extremely”). The SDS is a well evaluated measure (Hodgins, 2013). Cronbach α was 0.69 in the current study.

User Satisfaction

User satisfaction was assessed by means of the German version of the Client Satisfaction Questionnaire adapted to Internet-based interventions (CSQ-I; Boß et al., 2016). It consists of eight statements with response scales ranging from 1 (does not apply to me) to 4 (does totally apply to me) concerning participants’ satisfaction with the app. Its psychometric properties are of good quality (Boß et al., 2016).

Statistical Analyses

Sample Size

The program G*Power (Faul et al., 2007) was used to calculate the sample size needed in the study based on the assumption of finding moderate effects (d = 0.6) in the case of the primary outcome measure. As a power of 0.8, a significance level of α = 0.05 and a dropout-rate of 30% were assumed, N = 92 participants were the calculated sample size needed in the current study.

Test for Bias

The primary and secondary outcomes’ descriptive statistics and distributions were calculated to test whether they met the requirements for the statistical analyses. In addition, all baseline variables were tested for significant biases concerning the factors “group”, “study center” and “dropout”.

To account for possible selection effects, outcome analyses were conducted in two data sets: (a) the full analysis set (FAS) or intent-to-treat sample, and (b) the per protocol set (PPS) or completer sample, which consisted of all participants who took part in the baseline- and post-assessment. For exploratory analyses, (c) a quality set (QS) was analyzed, which consisted of participants from the PPS who had an adherence score of at least 75% in the use of the app “Mindable”.

Missing Values

As the questionnaires were presented electronically and participants had to answer every item in order to continue, there were no missing values within completed questionnaires. For the analyses in the FAS set, it was first confirmed that missing values due to dropout were missing completely at random. Missing values were then estimated by means of multiple imputation.

Data Analysis

App efficacy was analyzed by means of linear mixed models (LMM). The dependent variables were scores of primary and secondary outcome measures. The variable “participant” was entered as a random effect in the model in order to control for individual differences between participants. Furthermore, the variables “group” (app vs. waitlist control), “time” (baseline vs. post), the group x time interaction, the presence of agoraphobia, and the mode of recruitment (“online” vs. “face-to-face”) were entered as fixed effects. All analyses were two-tailed (α = 0.05). To account for multiple testing, secondary outcome analyses were additionally evaluated against a Bonferroni-corrected α of 0.017.

Analyses for all four hypotheses were performed using a “full-model” including the interaction, which was compared to a “null-model” that differed only by excluding the interaction. This made it possible to evaluate whether including the interaction significantly improved the fit of the model to the data. Following Nakagawa and Schielzeth (2013), the difference in variance explained by the two models (R²) was calculated in order to quantify the variance explained by the interaction between time and group.
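For orientation, the model comparison can be written out in generic notation. The symbols below are introduced here for illustration only and are not taken from the original analysis scripts. The sketch assumes a random intercept per participant, which is the simplest reading of “participant” as a random effect, and normally distributed random effects and residuals.

```latex
% Full model for outcome y of participant i at assessment t
y_{it} = \beta_0 + \beta_1\,\mathrm{group}_i + \beta_2\,\mathrm{time}_t
       + \beta_3\,(\mathrm{group}_i \times \mathrm{time}_t)
       + \beta_4\,\mathrm{agoraphobia}_i + \beta_5\,\mathrm{online}_i
       + u_i + \varepsilon_{it},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad
       \varepsilon_{it} \sim \mathcal{N}(0, \sigma_\varepsilon^2)

% Null model: identical, but with the interaction removed
\beta_3 = 0

% Variance attributable to the interaction
\Delta R^2 = R^2_{\mathrm{full}} - R^2_{\mathrm{null}}

% Bonferroni correction for the three secondary outcomes
\alpha_{\mathrm{corrected}} = 0.05 / 3 \approx 0.017
```

In this notation, the hypothesis that the app group improves more than the waitlist group corresponds to a test of the interaction coefficient β₃.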

Response was evaluated by calculating the Reliable Change Index (RCI, Christensen & Mendoza, 1986; Jacobson et al., 1984). Remission was defined by achieving reliable change and a PAS score below the cut-off of 8 points in the post-assessment.
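The response and remission classification can be sketched as follows, using the standard error of the difference score as in the cited RCI literature. The reliability of 0.78 and the resulting threshold of about 13.4 points are reported in the Results. The norm-sample standard deviation is not given in the text, so the value of 10.3 used in the test below is a hypothetical figure chosen only because it approximately reproduces that threshold. Treating a post score of exactly 8 as not remitted is likewise an assumption, and the function names are hypothetical.

```python
import math


def reliable_change_threshold(sd, reliability, z=1.96):
    """Smallest pre-post difference that counts as reliable change."""
    se_measurement = sd * math.sqrt(1.0 - reliability)
    se_difference = math.sqrt(2.0) * se_measurement
    return z * se_difference


def classify(pre, post, threshold, cutoff=8):
    """Label one participant; lower PAS scores mean fewer symptoms."""
    change = pre - post
    if change > threshold:
        # Remission requires reliable improvement AND a post score below the cut-off.
        return "remitted" if post < cutoff else "improved"
    if change < -threshold:
        return "deteriorated"
    return "unchanged"
```

Because the threshold depends only on the norm-sample standard deviation and the reliability, it is the same for every participant and can be computed once before classifying the sample.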

Data were prepared and analyzed using IBM SPSS 29.0 statistical software and the R packages lme4 for calculating the LMM (Bates et al., 2015), r2glmm for calculating R² (Nakagawa & Schielzeth, 2013) and mitml for the multiple imputation (Grund et al., 2023). In order to impute missing data, 20 estimations per missing value were made, using all variables relevant to the LMM as predictors. These 20 estimations were then pooled in accordance with Rubin’s rules (Rubin, 1987). The convergence of the multiple imputation was examined and found to be satisfactory: all R-hat values were < 1.05, and all trace plots varied around a constant level.
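The pooling step can be illustrated with a minimal sketch of Rubin's rules for a single coefficient. It is not the code used in the study, which relied on the mitml package in R. The function name and inputs are hypothetical: one point estimate and one squared standard error per imputed data set.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, squared_ses):
    """Pool one parameter across m imputed data sets with Rubin's rules.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(squared_ses)      # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)
```

With 20 imputations, the between-imputation variance is inflated by a factor of 1.05 before it is added to the within-imputation variance, so the pooled standard error reflects the uncertainty caused by the missing data.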

Results

Participants and Enrollment

After screening, a total of 122 participants took part in the baseline-assessment. Of these, 107 individuals (87.7%) met the inclusion criteria and were included in the study (see Fig. 1 for a study flow chart). Seventeen participants (15.9%) dropped out during the observation period or could no longer be contacted. Therefore, 90 participants completed the study. Dropout rates did not differ significantly between the two study groups (χ²(1) = 1.06, p = .30).

Fig. 1 Flow of participants

Baseline Characteristics

Table 1 shows the participants’ baseline characteristics. The majority of the participants were female, younger than 40 years of age and highly educated. Most participants met the criteria of both panic disorder and agoraphobia (n = 87), and a third of the participants had at least one comorbid disorder (mostly depressive disorders).

Table 1 Demographic sample characteristics at baseline

Bias Control

It was analyzed whether participants who dropped out of the study differed in baseline variables from participants who completed the study. No significant differences were found in either the demographic variables or the primary and secondary outcomes. Hence, the risk of selection bias was estimated to be low.

Additionally, it was analyzed whether baseline variables differed across study centers. Because of the change in recruitment from face-to-face to online, a new variable “recruitment” was created with the labels “Bremen” (n = 20), “Hamburg” (n = 17), “Münster” (n = 3) and “online” (n = 67). Significant differences between these recruitment “centers” were found concerning age (F(3) = 2.92; p = .038), the percentage of people suffering from both agoraphobia and panic disorder (χ²(3) = 9.25, p = .026) and baseline values on the PAS (F(3) = 5.27; p = .02). Post-hoc analyses showed that the “online” group had significantly higher levels of self-rated anxiety in comparison to the other recruitment groups. Because of the low number of participants in Münster, this group could not be considered in the post-hoc analyses. For further analyses, the dichotomous variable “online” vs. “face-to-face” was created and used as a fixed effect in the main outcome analyses.

Adverse Events

Two participants reported symptom deterioration during the observation period, both due to other reasons than study participation. One of them dropped out of the study. No other adverse events were reported.

App Usage

Two participants in the app group did not consent to the processing of their app data. Hence, app data were available for 55 participants. App usage differed widely, with the highest mean adherence score for the psychoeducation module (M = 85.86, SD = 33.16), where more than 85% of participants completed all of the psychoeducation contents. On average, participants completed 7.73 (SD = 2.98) out of nine psychoeducation lessons. The modules “symptom provocation” and “exposure” were used less frequently. Just over half of the participants (56.4%) completed five or more exercises, and a quarter (25%) did not document any exercise. The mean sum of interoceptive and in vivo exposure exercises was M = 9.15 (SD = 15.73). On average, participants completed 63% (SD = 42.23) of the recommended exercises in the exercise modules. Regarding the self-monitoring modules “daily diary” and “weekly checkup”, 70% of participants made at least eight entries, with an average of M = 15.20 (SD = 14.93) entries. On average, participants completed 78% (SD = 30.43) of the recommended checkup entries. The mean overall adherence score was 75.6% (SD = 27.72), and 34 out of 55 participants (61.8%) met the overall adherence score cut-off of 75%.

User Satisfaction

Satisfaction with the app was moderate to high. Mean agreement was highest for the items “I would recommend this training to a friend, if he or she were in need of similar help” (M = 3.39, SD = 0.71) and “The training I attended was of high quality” (M = 3.30, SD = 0.70). The item “The training has met my needs” received the lowest agreement (M = 2.67, SD = 0.94) in the study sample. The user satisfaction mean scores and standard deviations were also calculated for the participants who had an adherence score of ≥ 75% (QS, n = 29). These high app users did not differ in satisfaction from the whole app group.

Severity of Panic and Agoraphobic Symptoms

Table 2 shows the means and standard deviations of all outcomes at the different assessment time points in the FAS- and PPS-data sets. The mean PAS scores were higher than 8 and lower than 28 points, indicating moderate PD/A symptom severity. In order to analyze whether using the app would result in higher reductions in PD/A symptoms, the change in the total PAS scores over time was compared between the two groups by means of LMM. The results for the intent-to-treat (FAS) and the completer (PPS) data sets are depicted in Table 3. Exploratory analyses for all outcome measures in the quality (QS) data set can be found in Appendices A and B. Descriptive data on PAS subscales can be found in Appendix C.

Table 2 Descriptive data of the outcome variables at baseline- and post-assessment in the two groups and the FAS- and PPS-data sets

In all three data sets, the time effect and the group x time interaction effect were significant, whereas the overall group effect was not. Hence, the app group and the waitlist control group changed differently over time. In all three data sets, including the interaction significantly improved the model, but the variance in the data explained by the “full-model” was low in all three cases (FAS: F(1, 760.431) = 11.538, p < .001, R² = 0.20; PPS: χ²(1) = 9.205, p = .002, R² = 0.19; QS: χ²(1) = 9.067, p = .003, R² = 0.20). Adding the interaction to the model had a small effect in all three data sets (R²full − R²null: FAS = 0.019; PPS = 0.016; QS = 0.022).

Furthermore, the factor “site” was significant in all three data sets, indicating that participants differed depending on the way they were assessed (online vs. face-to-face). The factor “agoraphobia” was significant in the FAS-data set but not in the PPS- and QS-data sets. This suggests that suffering from agoraphobia influenced the response to the study.

Response and Remission

The Reliable Change Index as an indicator of response was calculated using the reliability in the PAS norm sample (rtt = 0.78). Accordingly, reliable change was defined as a difference of > 13.4 points from baseline- to post-assessment. Based on this definition, no participant displayed a reliable deterioration. In the app group, 13% of participants (n = 6) reported reliable improvements, compared with 9.1% (n = 4) in the control group (χ²(1) = 0.356, p = .551). In the app group, 4.3% of participants (n = 2) met the remission criteria of showing a reliable change of > 13.4 points and falling below the PAS cut-off score of 8 points, whereas none of the waitlist control group participants did (χ²(1) = 1.957, p = .162). At post-assessment, 86.67% of participants (AG = 84.78%, CG = 88.64%) fulfilled the DSM-5 diagnostic criteria of a panic disorder and 82.22% (AG = 80.43%, CG = 84.09%) those of an agoraphobia. Response and remission rates did not significantly differ between groups.
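The group comparisons of response and remission rates can be checked with a short sketch of the Pearson chi-square test for a 2 x 2 table without continuity correction. The group sizes of 46 (app group) and 44 (control group) are not stated directly; they are derived here from the reported counts and percentages. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the test statistic and the p-value for one degree of freedom.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with 1 df
    return chi2, p


# Rows: app group, control group; columns: criterion met, criterion not met.
response = chi_square_2x2(6, 40, 4, 40)
remission = chi_square_2x2(2, 44, 0, 44)
```

With these derived group sizes, the sketch gives χ²(1) = 0.356 (p = .551) for response and χ²(1) = 1.957 (p = .162) for remission after rounding, in line with the reported values.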

Secondary Outcomes

Quality of Life

Quality of life was evaluated to be moderate. Mean scores on the WHOQOL-BREF domain psychological health, which ranges from 0 to 100 points, varied around 54 to 55 points (see Table 2). Table 3 depicts the results of the LMM in the intent-to-treat and the completer data sets (see Appendix A and B for results in the quality data set). The group x time interaction did not gain significance in any of the data sets, indicating that the two groups did not differ significantly in their change in quality of life over time. Therefore, no further analyses concerning the importance of the interaction effect for perceived quality of life were calculated. The effect of time was significant in all data sets. In the FAS- and QS-data sets, however, the time effect was no longer significant at the Bonferroni-corrected α-level of 0.017.

Table 3 Results of the LMM in all outcome measures and the FAS- and PPS-data sets

Anxiety-Related Control Beliefs

Both groups had slightly lower mean scores on the ACQ at baseline than the cut-off of 75.5, indicating relevant impairments in anxiety-related control beliefs (see Table 2). Next, it was examined whether the app group and the waitlist control group differed significantly over time in their change in anxiety-related control beliefs. The results of the LMM in the intent-to-treat and the completer data sets are shown in Table 3 (see Appendix A and B for the results in the quality data set). The group effect was significant in the FAS analysis but not in the PPS or QS analysis, whereas the time effect was significant in all three data sets. The group x time interaction was significant in all three data sets, too. In all three data sets, including the interaction effect significantly improved the model, but the variance explained by the “full-model” was low in all three data sets (FAS: F(1, 2508.540) = 8.701, p = .003, R² = 0.125; PPS: χ²(1) = 6.770, p = .009, R² = 0.145; QS: χ²(1) = 5.908, p = .015, R² = 0.15). Adding the interaction effect to the model had a small effect in all three data sets (R²full − R²null: FAS = 0.011; PPS = 0.009; QS = 0.009).

Disability

The SDS scores varied around 4 points in both groups, which indicated that participants were “moderately” impaired in functioning. To examine whether the app group and the waitlist control group differed significantly in the change in functional impairment over time, LMM were calculated in the three data sets FAS, PPS and QS. The results of the FAS- and PPS-data sets are shown in Table 3 (see Appendix A and B for results of the QS-data set). The time factor gained significance in all three data sets, indicating that all participants’ functional impairment improved significantly over time. The factor “site” gained significance in the FAS-data set, but this effect was no longer significant at the Bonferroni-adjusted α-level. As the time x group interaction was not significant, no further analyses were conducted concerning the importance of the interaction for the model.

Discussion

Main Findings

Many people suffering from panic disorder and/or agoraphobia remain untreated due to limited access and/or availability of therapeutic treatment. Given the high chronicity of these conditions, improving mental health care for those people seems mandatory. The current study, thus, aimed to evaluate a self-management approach using a smartphone-based app that can easily be implemented.

Our results showed that the app “Mindable” led to a significant reduction of PD/A symptoms over time compared to a waitlist control group. This effect was found across the different data analysis sets, indicating stability of the findings. Reductions in panic and agoraphobic symptoms were smaller than in face-to-face CBT (see Gloster et al., 2011). However, as the extent of PD/A symptoms at baseline was comparable between the therapy study (PAS range of 27.1–28.4) and our study (mean baseline PAS of 26.7), and given that these reductions were achieved without any therapist assistance, we believe this to be a valuable contribution to supporting persons with serious anxiety problems. The significant reduction of PD/A symptoms found in the current study is also in line with recent meta-analyses and systematic reviews on DHA for PD/A (Andrews et al., 2018; Domhardt et al., 2020; Stech et al., 2020; Weisel et al., 2019). Concerning smartphone-based DHA, our findings are somewhat in contrast to findings from Ivanova and colleagues (2016), who could not show significant improvements for panic symptoms. This might be attributed to the different treatment approaches used: Ivanova et al. (2016) relied on an ACT protocol, while the app “Mindable” used an exposure rationale, which is the evidence-based treatment for PD/A. In line with this assumption, Ebenfeld and colleagues (2021), who also used exposure exercises within their app, showed significant improvements as well. It has to be noted that the reductions found in our study were achieved by pure self-help without further therapeutic contact, thus providing further evidence that non-guided DHA might be as effective as DHA incorporating further therapeutic contact (Pauley et al., 2023).

The app group showed a significant increase of anxiety-related control beliefs in comparison to the waitlist control group over time. This is a promising finding, as anxiety control is not only associated with avoidance behavior (Telch et al., 1989; White et al., 2006) but was also found to mediate the relationship between panic attacks and subsequent increases in anticipatory anxiety (Helbig-Lang et al., 2012a). Studies in therapeutic settings have already shown that an increase in anxiety-related control beliefs was accompanied by symptom reductions in patients with panic disorder and social anxiety disorder (Craske et al., 2014). It might be assumed that the increases in perceived anxiety control also mediated anxiety reductions in this study; however, this assumption needs further evaluation.

Participants in both groups showed a significant increase in quality of life and a significant reduction in functional impairment over time. This is in line with findings that passive psychoeducation strategies, such as receiving information or materials on a mental disorder or receiving feedback on one’s diagnosis after a structured clinical interview, lead to symptom reduction in depression and anxiety (Donker et al., 2009). However, the group x time effects on quality of life and functional impairment were not significant in the current study. The time period may have been too short to detect significant changes in perceived quality of life attributable to the app. Ebenfeld and colleagues (2021), for instance, found no significant changes in quality of life in people suffering from PD/A at the post-assessment or at a 3-month follow-up after using an app combined with online coaching, but at the 6-month follow-up participants showed significant increases in quality of life. An explanation for the lack of app-related change in functional impairment may be that the questionnaires used to assess functional impairment were too nonspecific, so that external confounding influences may have had a greater impact on the answers given in these questionnaires than on those given in symptom-specific questionnaires.

All in all, the results on the effect of smartphone-based DHA on the change in perceived quality of life and functional impairment in people suffering from PD/A are still inconsistent and further research is needed.

A strength of the study is that it was conducted under naturalistic conditions (few exclusion criteria; participants were free in the way they used the app). As the app did not require further contact with a therapist, this treatment approach meets the current challenges faced by practitioners and patients due to the limited access to psychotherapy. The switch to online baseline assessments allowed people from all over Germany to take part in the study and further strengthened external validity. Furthermore, the sample size was satisfactory in comparison to many other studies in the field. The drop-out rate of 15.9% was lower than the mean drop-out rate calculated in a recent meta-analysis on DHA (24%, Linardon & Fuller-Tyszkiewicz, 2020), possibly reflecting the rather high user satisfaction reported by participants.

Limitations

Findings have to be considered with caution due to some study limitations. First, the change from face-to-face to online assessments may have led to a distortion of the study results. Because of the COVID-19 pandemic, face-to-face assessments were not feasible for most of the participants. As a result, more severe cases of PD/A may have been able to participate because people did not have to leave their home to take part. Differences in anxiety symptoms at the baseline assessment between participants of the “face-to-face” and the “online” group support this assumption. On the one hand, variance in the baseline assessments and, therefore, the general representativeness of the sample may have been increased; on the other hand, the study may have included more severely anxious participants for whom the app “Mindable” may not be indicated. Nevertheless, exploratory analyses showed no significant differences between “face-to-face” and “online” participants in the way PD/A symptoms changed over time.

Second, suffering from agoraphobia had a significant impact on how participants responded to the study. It is possible that this effect was overestimated because only very few participants did not meet the diagnostic criteria of agoraphobia (n = 14) and, thus, variation in this fixed factor was low. However, other studies have found that agoraphobic avoidance at baseline leads to less improvement (e.g. Porter & Chambless, 2015). Future studies should further examine whether self-management apps are equally suitable for individuals with panic disorder, agoraphobia, and both disorders. In this regard, future studies may also profit from a clinical evaluation of panic and agoraphobia symptoms in addition to the self-report by participants.

Third, it needs to be noted that even though the study was conducted under naturalistic conditions, the findings may be limited in their real-life applicability. Participants in our study had contact with professionals at baseline and post-assessment. Research on DHA has shown that studies in which participants could take part without any contact with a researcher had higher attrition rates than studies in which participants at least had a telephone or an in-person interview with a researcher (Linardon et al., 2020). As the use of Mindable requires a medical prescription, this contact at least partially reflects naturalistic health care. Additionally, the app “Mindable” offers reminder messages because reminders have been shown to lead to lower attrition rates (Linardon et al., 2020).

A final but serious limitation of the study is the lack of conclusive follow-up data. As participants in the waitlist group gained access to the app during the follow-up period, group comparisons at follow-up could not meaningfully be used to analyze long-term effects. An exploratory analysis of follow-up data suggested that improvements in PD/A symptoms were not stable, as the PAS total score deteriorated from post to follow-up (Appendix E). As these analyses were based on a limited sample of n = 23 participants of the app group only, we cannot conclude whether this reflects an actual deterioration effect or an unspecific effect of time. These effects were also limited to the PAS; improvements in anxiety-related control beliefs remained rather stable, suggesting that participants had achieved confidence in coping with anxiety-related symptoms. However, there is a need for further studies specifically examining long-term effects of DHA, especially when using these strategies as part of a stepped care approach.

All in all, future studies should replicate the current study’s results and expand on them. For instance, follow-up analyses will be important for drawing conclusions about the long-term efficacy of using a self-management app for treating PD/A symptoms. Further, interindividual variables should be examined to allow more solid statements on the mechanisms underlying the change in PD/A symptoms over time. It is of great importance to further examine which app module impacts which symptom and which therapeutic mechanism lies behind the symptom improvements. Comparing the app effects to an active control group and examining whether the app might further enhance the effectiveness of face-to-face therapy will also foster health care for anxiety disorders in the future.

Conclusions

In conclusion, the self-management smartphone app “Mindable” has been shown to significantly reduce anxiety symptoms and to improve anxiety-related control beliefs in patients suffering from panic disorder and/or agoraphobia. It is well accepted and easy to use. As no symptom remission could be achieved and the exploratory follow-up analysis indicated low stability in PD/A symptom change, this approach may be helpful as part of a stepped care approach for anxiety disorders that takes limited treatment capacities into account.