Introduction

Posttraumatic stress disorder (PTSD) is a mental health condition that may follow exposure to a traumatic event (American Psychiatric Association 2013). Symptoms include reexperiencing the trauma, avoidance of reminders of the trauma, arousal, and negative cognitions. PTSD affects approximately 6% of the United States (US) population during their lifetime (Goldstein et al. 2016; Pietrzak et al. 2011). Rates are higher in combat or military-exposed populations such as veterans who use health services provided by the US Department of Veterans Affairs (VA; Holowka et al. 2014; Shiner et al. 2012a, b, c). The VA has implemented multiple effective treatments for PTSD, including two specific evidence-based psychotherapy (EBP) protocols (Karlin and Cross 2014): Cognitive Processing Therapy (CPT) and Prolonged Exposure (PE). CPT is comprised of twelve weekly 60-min sessions of cognitive therapy, during which veterans address maladaptive thoughts associated with their worst traumatic event (Resick et al. 2017). CPT can be administered either in an individual therapy format or a group format (Resick et al. 2015). PE consists of nine to twelve weekly 90-min sessions of trauma-associated imaginal and in-vivo exposures administered in an individual therapy format (Foa et al. 2007). Research trials of CPT and PE have resulted in statistically significant and clinically meaningful improvement in veterans’ PTSD symptoms (Haagen et al. 2015). The VHA Uniform Mental Health Services Package mandated the availability of these treatments in VHA clinics beginning in 2008 (Kussman 2008).

Measuring the implementation of EBPs for PTSD has been a challenge. Single-site studies have used labor-intensive chart review to identify psychotherapy notes indicating the provision of EBPs (Hundt et al. 2015; Kehle-Forbes et al. 2016; Lamp et al. 2014; Lu et al. 2016; Mott et al. 2014a, b, c; Shiner et al. 2012a). Studies attempting to measure implementation of EBPs for PTSD nationally have relied on psychotherapy procedural codes (Cully et al. 2008; Mott et al. 2014a, b, c), with some assumptions about how the number and timing of encounters indicate that an EBP could have been delivered (Seal et al. 2010; Spoont et al. 2010). For example, Spoont et al. (2010) measured whether patients had at least eight encounters associated with a psychotherapy procedural code over the course of 6 months, while Seal et al. (2010) determined whether those encounters occurred over the course of 15 weeks. However, assumptions about the use of psychotherapy procedural codes may be incorrect, as these codes are not protocol-specific.

We have performed three studies using automated natural language processing (NLP) of psychotherapy notes to bridge the gap between laborious chart review and efficient but potentially inaccurate use of psychotherapy procedural codes. NLP is a method to abstract information from large unstructured bodies of note text (Meystre et al. 2008). Our general approach has been to use machine learning to train a computer to mimic the judgments of expert clinicians in classifying clinical notes (Hripcsak and Wilcox 2002); in our case, practicing therapists classify whether a psychotherapy note describes the provision of an EBP for PTSD. In our initial (single-site) study, we found that in 43% of encounters with psychotherapy procedural codes, the associated notes described services other than psychotherapy, such as intakes, psychological testing, and case management services (Shiner et al. 2012a, b, c). This raised concerns about the accuracy of psychotherapy procedural codes. In our second (regional) study of 1924 patients enrolling in six specialized outpatient PTSD clinics, patients had a mean of 9.1 encounters with psychotherapy procedural codes over their initial six months of treatment, but only 0.4 of these were EBP sessions (Shiner et al. 2013). Importantly, 6.1% (n = 121) of patients received at least one EBP session. This showed both that having a given number of encounters was not a proxy for receiving EBP and that it is possible to measure EBP delivery with an automated NLP-based classifier. In our third (national) study of 255,933 Iraq and Afghanistan Veterans, we found that 20.2% (n = 51,852) received at least one EBP session over a median of 4.1 years of observation (Maguen et al. 2018). This showed we could efficiently apply an automated NLP-based classifier to a large national population. However, in focusing on Iraq and Afghanistan Veterans, this study examined only a subset of VA patients with PTSD. Additionally, this work did not examine the adequacy of treatment for patients who received EBP.

Donabedian (1997) proposed a framework for measuring healthcare quality that divides measures into domains of structure, process, and outcome. In Donabedian’s model, a quality measure assessing whether patients with PTSD received an EBP would fall under the process domain. Such process measures would allow healthcare teams to assess the effectiveness of their efforts to improve the quality of care that they deliver. For example, staff members at a VA mental health clinic trying to increase the number of patients who receive EBP for PTSD might use such a process measure to understand whether their improvement intervention has worked. However, this model is predicated upon the validity of the quality measure. Chassin et al. (2010) proposed that to be valid, a quality measure must capture whether an evidence-based care process has actually been provided. In the case of EBP for PTSD, the receipt of at least one session is an insufficient measure of quality because the studies establishing the efficacy of EBP for PTSD typically require multiple weekly sessions delivered by the same therapist over several months. Therefore, now that we can classify whether encounters associated with psychotherapy procedural codes include the provision of EBP, the next step is to examine the effect of increased measurement requirements designed to better approximate the evidence-based care process.

This study expands our work to all veterans who initiated PTSD care in VA from 2004 through 2013. This was a time of intense demographic change (Hermes et al. 2012; Rosenheck and Fontana 2007) and resource reallocation (Wagner et al. 2011) in VA, with a national focus on improving the capacity of the VA mental health treatment system to deliver evidence-based treatments (Karlin and Cross 2014; Rosen et al. 2016). Our objectives were to: (1) measure the delivery of EBPs for PTSD to a national cohort of Veterans from diverse service eras; (2) determine longitudinal trends in EBP for PTSD delivery according to potential quality measures; and (3) determine whether quality measures that more stringently reflect the evidence supporting EBPs are associated with superior outcomes. While the VA has operationalized an EBP reporting strategy that leverages therapist-completed medical record templates (Sripada et al. 2018a, b), our prior work has shown that uptake of the templates has lagged therapist-reported use of EBPs (Shiner et al. 2018a, b). As efforts to incentivize the use of standardized reporting tools such as templates are implemented (Sripada et al. 2018a, b), we feel that our work leveraging historical data will be informative to the VA and other large healthcare systems as they look to leverage these diverse data sources to develop valid quality measures to help drive improvement (Brown et al. 2014; Hepner et al. 2016).

Methods

Data Source

We used the VA corporate data warehouse (CDW) to identify patients with new PTSD treatment episodes from fiscal year 2004 (FY04) through FY13. We obtained patient demographic information as well as encounter, diagnostic, patient-reported outcome, and pharmacy data from the CDW. The Veterans Institutional Review Board of Northern New England and VA National Data Systems approved this study.

Patients

We included VA users who received a primary diagnosis of PTSD at two or more outpatient encounters, at least one of which occurred in a mental health setting, over the course of 90 days between October 1, 2003 and September 30, 2013 and had not met this criterion during the prior two years. We examined one year of treatment receipt following the first of the two qualifying diagnoses, which we called the “index PTSD diagnosis.” When patients met the cohort inclusion criteria multiple times over the 10-year period, only their first episode was included. This resulted in a cohort of 731,520 patients. This cohort has been described previously (Shiner et al. 2016, 2017a, b).
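As an illustration only, the core of the inclusion rule can be sketched in Python. This is not the study's code. The `Encounter` type and `index_ptsd_diagnosis` function are hypothetical names, the 90-day window is assumed to be inclusive, and the two-year washout and the study date bounds are omitted for brevity.

```python
from datetime import date
from typing import NamedTuple, Optional


class Encounter(NamedTuple):
    """One outpatient encounter with a primary PTSD diagnosis (hypothetical type)."""
    day: date
    mental_health: bool  # True if the encounter occurred in a mental health setting


def index_ptsd_diagnosis(encounters: list[Encounter],
                         window_days: int = 90) -> Optional[date]:
    """Return the date of the first encounter in the earliest qualifying pair.

    A pair qualifies when both encounters fall within `window_days` of each
    other (assumed inclusive) and at least one is in a mental health setting.
    Returns None when no pair qualifies.
    """
    ordered = sorted(encounters, key=lambda e: e.day)
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if (second.day - first.day).days > window_days:
                break  # later encounters are even further away
            if first.mental_health or second.mental_health:
                return first.day
    return None
```

In this sketch, the returned date corresponds to the index PTSD diagnosis, from which the one-year observation period would start.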

Evidence-Based Psychotherapy for PTSD Receipt

We identified all encounters associated with psychotherapy procedural codes for each patient during the one-year period of observation and linked these encounters to the related treatment notes. This resulted in a set of 18,185,216 documents. We used our previously-developed NLP-based classifier, which has an overall classification accuracy of 0.92 (Maguen et al. 2018), to determine whether each document described the provision of psychotherapy at all, whether psychotherapy documents described the provision of PE or CPT, and whether CPT was delivered in a group or an individual format (CPT-G, CPT-I). We found that 0.5% (n = 88,674) of documents described PE, 0.8% (n = 143,147) of documents described CPT-G, 1.2% (n = 217,250) of documents described CPT-I, 30.6% (n = 5,558,844) of documents described other group or individual psychotherapy, and 67.0% (n = 12,177,301) of documents did not describe psychotherapy at all.

Measures of Psychotherapy Quality

We followed a series of progressively restrictive steps in calculating our putative measures of psychotherapy quality. First, we used the NLP-based classifier results to determine whether each patient received any psychotherapy, any individual psychotherapy, any group psychotherapy, as well as each of the EBPs during their initial year of treatment based on their clinical notes. Second, we added a requirement that patients had an “adequate” number of psychotherapy sessions, defined here as eight or more sessions. Outcomes research in psychotherapy for anxiety and depressive disorders has indicated that half of patients achieve a clinically meaningful improvement after eight sessions (Howard et al. 1986). Similarly, most patients who respond to evidence-based psychotherapies for PTSD have achieved the bulk of their gains by session eight (Galovski et al. 2012; Tuerk et al. 2011). Third, we added a requirement that the eight sessions be delivered by the same therapist. Continuity of care is associated with improved health outcomes across disorders (van Walraven et al. 2010), and in mental health treatment in particular (Adair et al. 2005). For group therapy led by two-therapist teams, each therapist was considered separately for meeting this requirement. Fourth, we added a requirement that eight sessions be delivered during a 14-week period. Because both PE and CPT are designed for delivery in a weekly or twice-a-week format (Foa et al. 2005; Resick et al. 2002), this requirement ensures that the sessions are spaced in a similar manner to the efficacy trials supporting clinical practice, while allowing some flexibility for missed or rescheduled sessions. This treatment density standard has been used as part of VA psychotherapy performance measures (Trafton et al. 2013).
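The progressively restrictive standards can be illustrated with a short Python sketch. It is a simplified reading of the steps above, not the study's implementation. The function names are hypothetical. The 14-week requirement is assumed to mean that the first and eighth sessions of some run of eight are no more than 98 days apart. Each session is assumed to have a single therapist, so two-therapist groups are not handled.

```python
from collections import defaultdict
from datetime import date, timedelta


def meets_density(days: list[date], n_sessions: int = 8,
                  window_weeks: int = 14) -> bool:
    """True if some run of n_sessions sessions spans at most window_weeks."""
    ordered = sorted(days)
    window = timedelta(weeks=window_weeks)
    for i in range(len(ordered) - n_sessions + 1):
        if ordered[i + n_sessions - 1] - ordered[i] <= window:
            return True
    return False


def quality_standards(sessions: list[tuple[date, str]]) -> dict[str, bool]:
    """Evaluate each standard for one patient's first year of treatment.

    `sessions` holds (session date, therapist id) pairs that have already
    been classified as the psychotherapy type of interest.
    """
    by_therapist: dict[str, list[date]] = defaultdict(list)
    for day, therapist in sessions:
        by_therapist[therapist].append(day)
    return {
        "any_session": len(sessions) >= 1,
        "eight_sessions": len(sessions) >= 8,
        "eight_same_therapist": any(len(d) >= 8 for d in by_therapist.values()),
        "eight_same_therapist_14_weeks": any(
            meets_density(d) for d in by_therapist.values()
        ),
    }
```

For example, eight weekly sessions with one therapist satisfy every standard in this sketch, whereas eight monthly sessions with one therapist satisfy all but the last.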

Concurrent Evidence-Based Medication for PTSD Receipt

We determined whether patients also received adequate trials of evidence-based medications for PTSD. To do this, we examined all medications dispensed by VA pharmacies during the year following the index PTSD diagnosis. Antidepressant drug names were classified into categories for individual agents and an overall category. The antidepressant drug class label was used to confirm our coding. We determined whether patients received one of the four effective antidepressants for PTSD specifically recommended in the VA/Department of Defense Clinical Practice Guideline (VA/DoD CPG) in place during the time our cohort received treatment (Friedman et al. 2010). These included fluoxetine, paroxetine, sertraline, and venlafaxine. For patients who received one of the four effective antidepressants for PTSD, we determined whether they received an adequate treatment, which we defined as eight weeks of a daily dose at least as high as the dose used in the efficacy trials supporting the treatment recommendation (Jonas et al. 2013; Watts et al. 2013). While the length of efficacy trials of psychotropic medications for PTSD varies, the VA/DoD CPG recommended medication trials of at least eight weeks (Friedman et al. 2010). Therefore, participants receiving continuous treatment of one of the following medications daily for eight weeks or more were considered to have received an adequate medication trial (AMT): fluoxetine 20 mg or more daily, paroxetine 20 mg or more daily, sertraline 100 mg or more daily, and venlafaxine 150 mg or more daily.
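A minimal Python sketch of the adequate medication trial (AMT) rule follows. The dose thresholds and the eight-week duration come from the text above. The `Fill` record, the function name, and the handling of continuity (consecutive fills with no gap by default, via the hypothetical `max_gap_days` parameter) are assumptions made for illustration and may differ from the study's pharmacy data processing.

```python
from datetime import date, timedelta
from typing import NamedTuple

# Minimum daily doses (mg) stated in the text.
MIN_DAILY_DOSE_MG = {
    "fluoxetine": 20,
    "paroxetine": 20,
    "sertraline": 100,
    "venlafaxine": 150,
}


class Fill(NamedTuple):
    """One dispensed prescription (hypothetical record layout)."""
    drug: str
    start: date
    days_supply: int
    daily_dose_mg: float


def adequate_medication_trial(fills: list[Fill], min_days: int = 56,
                              max_gap_days: int = 0) -> bool:
    """True if any single agent is covered continuously for min_days at an adequate dose."""
    for drug, min_dose in MIN_DAILY_DOSE_MG.items():
        eligible = sorted(
            (f for f in fills if f.drug == drug and f.daily_dose_mg >= min_dose),
            key=lambda f: f.start,
        )
        run_start = run_end = None
        for f in eligible:
            fill_end = f.start + timedelta(days=f.days_supply)
            if run_end is None or (f.start - run_end).days > max_gap_days:
                run_start, run_end = f.start, fill_end  # start a new run
            else:
                run_end = max(run_end, fill_end)  # extend the current run
            if (run_end - run_start).days >= min_days:
                return True
    return False
```

Allowing a short grace period between fills (a larger `max_gap_days`) is a common choice in pharmacy claims analyses, but the source does not state which rule was applied.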

Covariates

We developed three groups of covariates. First, we examined patient characteristics, such as age, gender, race, military service era, rurality, military-related exposures (including combat and sexual trauma), and medical and psychiatric comorbidities. Second, we examined health service use characteristics including prior receipt of psychotherapy, outpatient visits, emergency department visits, and admissions. For prior receipt of psychotherapy, we assessed whether patients had an outpatient encounter associated with psychotherapy procedural codes in the two years prior to their index PTSD diagnosis. Outpatient visits included visits to specialized PTSD clinics, general mental health clinics, substance abuse clinics, and integrated primary care-mental health clinics. Emergency department visits included those for a psychiatric indication. Admissions included stays on an acute inpatient psychiatric clinic, a residential PTSD treatment program, or a residential substance abuse program. Third, we examined therapist characteristics. Patients were assigned a primary therapist based on the clinician who completed the plurality of their psychotherapy encounters. Primary therapists were characterized by age, gender, service section, and professional background. Service section included specialized PTSD, general mental health, substance abuse, and primary care-mental health integration clinics. Because individual therapists may work across multiple service sections, we calculated the percentage of time they spend seeing PTSD patients in various settings. This was based on our assumption that therapists who spend a higher percentage of their time in specialized PTSD settings may bring increased knowledge and experience in treating PTSD, even when seeing patients in non-specialized settings. Professional background included psychologist, social worker, nurse, and psychiatrist. To account for the possibility that some psychotherapy might be delivered briefly in the course of medication management, we assessed whether each provider had prescription privileges.

Patient-Reported Outcomes Assessment

Use of patient-reported outcome measurement using the PTSD Checklist (PCL; Weathers et al. 1993) as part of routine practice became more common beginning in FY08 (Shiner et al. 2018a, b). Therefore, we obtained available PCL data for the FY08-13 portion of the cohort. During these years, the VA used the version of the PCL corresponding to PTSD diagnostic criteria in the fourth version of the Diagnostic and Statistical Manual of Mental Disorders, or DSM-IV (American Psychiatric Association 2000; Wilkins et al. 2011). This version of the PCL was a 17-item measure with each item rated on a five-point Likert-type scale, resulting in total scores ranging from 17 through 85 (Weathers et al. 1993). Respondents were asked to rate how much they are bothered by each symptom over the last month. Symptom presence was determined by a response of “moderately” or greater (Weathers et al. 1993). Therefore, the tool could be used to determine whether patients meet minimal symptomatic criteria for PTSD according to DSM-IV, defined as one re-experiencing symptom, three avoidance and numbing symptoms, and two hyperarousal symptoms. Clinically meaningful improvement has been previously defined as a decrease of 10 points or more on the PCL (Monson et al. 2008). A clinically meaningful improvement in PTSD symptoms plus no longer meeting diagnostic criteria for PTSD has been shown to be an important marker of improved quality of life (Schnurr and Lunney 2016).
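The symptomatic criteria described above can be written as a short Python sketch. The cluster sizes (one, three, and two symptoms) and the “moderately” threshold come from the text. The mapping of items 1–5, 6–12, and 13–17 to the re-experiencing, avoidance and numbing, and hyperarousal clusters is an assumption based on the standard item order of the DSM-IV PCL, as is the coding of “moderately” as 3 on a 1–5 scale. Function names are hypothetical.

```python
# Assumed item-to-cluster mapping for the 17-item DSM-IV PCL (1-based item numbers).
RE_EXPERIENCING = range(1, 6)      # items 1-5
AVOIDANCE_NUMBING = range(6, 13)   # items 6-12
HYPERAROUSAL = range(13, 18)       # items 13-17
MODERATELY = 3                     # assumed coding of "moderately" on the 1-5 scale


def pcl_total(items: list[int]) -> int:
    """Total PCL score, ranging from 17 through 85."""
    _validate(items)
    return sum(items)


def meets_dsm_iv_symptom_criteria(items: list[int]) -> bool:
    """True if the responses meet minimal DSM-IV symptomatic criteria for PTSD."""
    _validate(items)

    def present(cluster: range) -> int:
        # Count symptoms rated "moderately" or greater within a cluster.
        return sum(items[i - 1] >= MODERATELY for i in cluster)

    return (present(RE_EXPERIENCING) >= 1
            and present(AVOIDANCE_NUMBING) >= 3
            and present(HYPERAROUSAL) >= 2)


def _validate(items: list[int]) -> None:
    if len(items) != 17 or any(not 1 <= x <= 5 for x in items):
        raise ValueError("expected 17 item responses, each between 1 and 5")
```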

Analysis

Our analysis plan was divided into descriptive and causal elements. For descriptive analyses using the entire FY04-13 cohort, we summarized cohort characteristics and compared patients who had at least one encounter that was administratively coded as psychotherapy with those who did not using t-test or χ2 analysis, as appropriate. We then described psychotherapy receipt as measured using both administrative coding and the NLP-based clinical note classification algorithm for the entire cohort during each fiscal year and for the overall 10-year period. We then focused on psychotherapy initiation by excluding patients who had encounters that were administratively coded as psychotherapy in the two years prior to their index PTSD diagnosis and recalculated initiation rates for each psychotherapy category for each individual fiscal year and for the overall 10-year period. We progressively added the measures of psychotherapy quality described above to this sub-cohort newly initiating psychotherapy, representing the cumulative number of patients who met each increasingly restrictive standard during their first year of PTSD treatment.

For causal analyses using patients from the FY08-13 portion of the cohort, we identified patients who initiated EBP at progressively higher levels of adherence to our “quality” measures (8 visits, 8 visits with the same therapist, 8 visits with the same therapist within 14 weeks) and had concurrent symptom measurement using the PCL, as defined below. We created orthogonal comparison groups by including patients only in the longitudinally earliest (first during treatment year) quality standard that they met. Patients who initiated care that met multiple quality standards on the same day were assigned to the strictest standard met on that day. From this group, we selected patients who had, at a minimum, a PCL score at or before the second session (baseline) but no more than 14 days prior to the first session, and a PCL score at or after the seventh session (follow-up) but no more than 14 days after the eighth session. To ensure patients had active PTSD symptoms at baseline, we required that they meet DSM-IV symptomatic criteria on their baseline PCL. When there were multiple PCL scores meeting our baseline criterion, we selected the measure closest to session 1. When there were multiple PCL scores meeting our follow-up criterion, we selected the measure closest to session 8. We calculated two change measures from baseline to follow-up: (1) mean PCL change, and (2) “loss of diagnosis,” which required both no longer meeting symptomatic criteria for PTSD and experiencing a meaningful decrease in symptoms of 10 points or more.
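The alignment of PCL measurements with sessions and the “loss of diagnosis” outcome can be sketched in Python as follows. This is an illustrative reading of the rules above, not the study's code. Function and parameter names are hypothetical, and ties between equally close measurements are broken by input order, which the source does not specify.

```python
from datetime import date, timedelta
from typing import Optional


def select_aligned_pcl(pcl_dates: list[date], session_dates: list[date],
                       max_offset_days: int = 14) -> Optional[tuple[date, date]]:
    """Return (baseline date, follow-up date) of the aligned PCLs, or None.

    Baseline: from 14 days before session 1 through session 2, closest to session 1.
    Follow-up: from session 7 through 14 days after session 8, closest to session 8.
    """
    sessions = sorted(session_dates)
    if len(sessions) < 8:
        return None
    s1, s2, s7, s8 = sessions[0], sessions[1], sessions[6], sessions[7]
    margin = timedelta(days=max_offset_days)
    baseline = [d for d in pcl_dates if s1 - margin <= d <= s2]
    follow_up = [d for d in pcl_dates if s7 <= d <= s8 + margin]
    if not baseline or not follow_up:
        return None
    return (min(baseline, key=lambda d: abs(d - s1)),
            min(follow_up, key=lambda d: abs(d - s8)))


def loss_of_diagnosis(baseline_total: int, follow_up_total: int,
                      follow_up_meets_criteria: bool,
                      min_decrease: int = 10) -> bool:
    """True if criteria are no longer met and the PCL fell by min_decrease or more."""
    decrease = baseline_total - follow_up_total
    return (not follow_up_meets_criteria) and decrease >= min_decrease
```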

Following a procedure developed in prior work (Shiner et al. 2018a, b), we examined both the raw change in PTSD symptoms among those with measurement and the patient characteristic-weighted mean change, as well as the percentage of patients achieving our reliable change and loss of diagnosis criteria. Given that we were comparing three conditions (8 visits, 8 visits with the same therapist, 8 visits with the same therapist within 14 weeks), we used a conservative Bonferroni-corrected alpha of p < 0.0167 for pre/post comparisons to avoid type I error. We balanced patient characteristics that have a plausible association with the outcome using inverse propensity of treatment weighting (IPTW; Stuart 2010). We estimated propensity scores with multinomial logistic regression using generalized boosted models (McCaffrey et al. 2013), in which the dependent variable is an indicator for the quality standard met and the independent variables are an antiparsimonious specification of variables that have a plausible correlation with the outcome. Using these propensity scores, we weighted participants in order to balance the pretreatment covariate distribution. Covariates in the IPTW model included baseline PCL score, number of days between the baseline PCL and session 1, number of days between follow-up PCL and session 8, and all covariates described in Table 1. In balancing almost 50 patient characteristics, a Bonferroni correction would indicate a corrected alpha of p < 0.001. However, we conservatively maintained an alpha threshold of p < 0.01 for significant differences to avoid type II error. Therefore, covariates that continued to differ at the p < 0.01 threshold after IPTW were included as covariates in models of change in PTSD symptoms. We assessed the potential contribution of unmeasured confounding on our results by calculating E-values, which indicate the minimum strength of association on the risk ratio scale that an unmeasured confounder would need to have with both the exposure and the outcome, conditional on the measured covariates, to fully explain away a specific exposure-outcome association (Haneuse et al. 2019; VanderWeele and Ding 2017).
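Two of the quantities in this paragraph follow from simple formulas, sketched here in Python for illustration. The Bonferroni thresholds are the nominal alpha divided by the number of comparisons (0.05 / 3 ≈ 0.0167 and 0.05 / 50 = 0.001, assuming a nominal alpha of 0.05). The E-value for a risk ratio RR ≥ 1 is RR + sqrt(RR × (RR − 1)), as given by VanderWeele and Ding (2017). The function names are hypothetical. The sketch covers only the risk-ratio case and does not reproduce the values reported in this study, which were derived from weighted models.

```python
import math


def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under a Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def e_value(risk_ratio: float) -> float:
    """E-value for a point estimate on the risk ratio scale.

    Risk ratios below 1 are inverted first, so protective and harmful
    associations of equal strength give the same E-value.
    """
    if risk_ratio <= 0:
        raise ValueError("risk_ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1))
```

A risk ratio of 1 yields an E-value of 1, meaning no unmeasured confounding is needed to explain a null association. Larger risk ratios yield larger E-values.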

Table 1 VA Users with new episodes of PTSD care from 2004 to 2013, by receipt of psychotherapy procedure code

In addition to our pre/post measures, we performed a repeated measures model that included all PCL measurements between baseline and follow-up. We used a generalized linear mixed model (GLMM) to account for both within-person and across-person variability. We compared changes in PTSD symptoms during the time treatment was delivered, including a time by treatment interaction to measure the change in slope over time among the three treatment groups. The model is weighted by the inverse of the propensity scores and adjusted for any unbalanced covariates (p < 0.01). We performed data management in SAS version 9.4 (SAS Institute), and developed causal models in R version 3.5.0 (R Core Team). This included IPTW models created using the R twang package (Ridgeway et al. 2017), and models to detect unmeasured confounding using the R evalue package (Mathur et al. 2018).
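To make the weighting step concrete, the following Python sketch shows how inverse propensity weights and a weighted mean change could be computed once propensity scores are available. It is a simplified illustration, not the study's R code. It assumes each patient's weight is the reciprocal of the estimated probability of the quality standard group that patient actually belonged to, which is one common choice. The source does not state the exact weight definition, and the function names are hypothetical.

```python
def iptw_weights(assigned: list[str],
                 propensities: list[dict[str, float]]) -> list[float]:
    """Weight each patient by 1 / P(observed group | covariates).

    `assigned[i]` is the group patient i belonged to, and `propensities[i]`
    maps each group label to that patient's estimated probability.
    """
    weights = []
    for group, probs in zip(assigned, propensities):
        p = probs[group]
        if not 0.0 < p <= 1.0:
            raise ValueError("propensity scores must lie in (0, 1]")
        weights.append(1.0 / p)
    return weights


def weighted_mean(values: list[float], weights: list[float]) -> float:
    """Weighted mean, for example of PCL change scores within one group."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

Patients who were unlikely to be in the group they ended up in receive larger weights, which is how the weighted groups come to resemble one another on the measured covariates.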

Results

Of the 731,520 patients in our cohort, 88.6% (n = 647,513) had at least one psychotherapy procedural code during their first year of PTSD treatment. Patients who did and did not receive a psychotherapy procedural code differed on almost all variables (Table 1). Most prominently, those who received a psychotherapy procedural code were more likely to be women, to have experienced sexual trauma while in the military, and to have comorbid psychiatric and substance abuse diagnoses. At the same time, they were less likely to be rural or to have been exposed to combat. They also received other VA health services at higher levels, and importantly, 47.1% (n = 305,132) also received a psychotherapy procedural code in the two years prior to their index PTSD diagnosis. Almost half of patients who received a psychotherapy procedural code saw a woman as their primary therapist, and patients most commonly saw a psychologist or social worker as their primary therapist. In over a third of cases, the primary therapist had prescription privileges, indicating that therapy could have been coded as part of medication management. Patients’ primary therapists generally spent most of their time in general mental health settings, followed by specialized PTSD settings.

In the overall cohort, use of any psychotherapy, whether classified using procedural codes or natural language processing, increased over the 10-year period of examination (Table 2). While the percentage of patients receiving at least one psychotherapy procedural code had little room for improvement, the difference between receipt of any psychotherapy as measured using procedural codes and as measured using NLP decreased from FY04-05 (86.0% versus 54.7%) to FY12-13 (90.2% versus 65.8%). At the same time, the mean number of psychotherapy encounters remained stable (9.3 versus 10.0). This indicates that despite persistence of procedural coding discrepancies, more patients with PTSD were actually receiving psychotherapy during administratively coded psychotherapy encounters by the end of the period of examination. Furthermore, there was a dramatic increase in the use of EBP for PTSD, from 0.7% in FY04-05 to 14.1% in FY12-13. The most common EBP modality was CPT-I, followed by CPT-G and PE.

Table 2 Psychotherapy receipt in the year following initial PTSD diagnosis

We then applied quality standards to psychotherapy receipt among the 54.1% (n = 396,032) of patients initiating psychotherapy after their index PTSD diagnosis. This resulted in a decrease in the percentage of patients who met those standards as the standards became more stringent (Table 3). For example, while 86.5% received at least one procedural code for psychotherapy in their first year of treatment, only 13.8% received eight or more sessions (as measured using procedural codes) over the course of any 14-week period. Similarly, if we use NLP rather than procedural codes to classify psychotherapy receipt, the figure drops from 13.8% to 11.4%. If we then require that NLP indicates the sessions are EBP, the figure drops from 11.4% to 2.0%. Therefore, estimates of psychotherapy receipt appear to be highly dependent on both the restrictiveness of the quality standards and the content of the psychotherapy notes. Despite these caveats, quality as determined by all standards we applied improved over time during the period of examination (Appendix 1).

Table 3 Psychotherapy initiation in the year following initial PTSD diagnosis among 396,032 patients with no psychotherapy encounters in the 2 years prior to PTSD diagnosis, Fiscal years 2004–2013; mean of 7.8 (SD = 10.8) psychotherapy encounters

A substantial number of patients from the FY08-13 cohort who met our increasingly restrictive quality standards had PCL measurement aligned with sessions 1 and 8 and were included in analyses comparing outcomes among patients who met increasingly strict quality standards. Among the 10,765 patients who had 8 or more sessions of EBP as measured using NLP, 19.1% (n = 2052) met our PCL-based inclusion criteria. Table 4 shows that there were few significant differences among patients who had 8 or more sessions of EBP with and without aligned PCL measurement. Furthermore, where differences were significant, the magnitude was small. After applying the IPTW procedure to balance covariates across quality standard groups, only one unbalanced variable remained (Appendix 2): days between baseline PCL and session 1. This unbalanced variable was used as a covariate in weighted analyses.

Table 4 VA users initiating evidence-based psychotherapy for PTSD and completing 8 or more sessions within a year, FY 2008–2013, by receipt of aligned PTSD Checklist measurement

In pre/post causal analysis (Table 5), the most stringent quality standard (8 EBP sessions with the same therapist within 14 weeks) was associated with significantly higher rates of loss of diagnosis (23.3% versus 13.8%; p = 0.0004, e = 2.78) and greater continuous improvement on the PCL (− 9.3 versus − 7.1; p = 0.0101, e = 1.60) than the least stringent standard (any 8 EBP sessions during the first year of treatment), but not the second most stringent quality standard (8 EBP sessions with the same therapist during the first year of treatment). However, the second most stringent quality standard was not significantly superior to the least stringent quality standard, indicating that across data sources, only the strictest definition of treatment adequacy was consistently associated with superior pre/post outcomes. The E-value findings indicate that it would take a very strong unmeasured confounder (relative risk of 2.78 or greater) to overturn the loss of diagnosis finding and a moderately strong unmeasured confounder (relative risk of 1.60 or greater) to overturn the continuous improvement on the PCL finding. Our GLMM approach supports this assessment (Fig. 1). The rate of improvement in PCL score was best when using the most stringent treatment adequacy standard. Thus, requiring a quality standard of 8 or more sessions with the same therapist within 14 weeks was associated with both the greatest amount of pre/post change and the fastest rate of change.

Table 5 Inverse propensity of treatment weighted comparison of PTSD symptomatic outcomes for patients completing 8 or more sessions of evidence-based psychotherapy for PTSD with aligned PTSD Checklist measurement, FY 2008–2013, by quality standard

Fig. 1 Repeated Measures Model of Change in Total PCL Score. Note. NLP = Natural Language Processing; EBP = Evidence-Based Psychotherapy for Posttraumatic Stress Disorder

Discussion

We found that psychotherapy for PTSD quality standards that more stringently reflect the underlying evidence were associated with superior outcomes in clinical practice. Thus, our work provides preliminary validity for an NLP-based quality measure comprising eight or more sessions of EBP, delivered by the same therapist, over the course of 14 weeks. The percentage of VA patients with new PTSD treatment episodes meeting this standard improved from 0.1 to 3.7% over a 10-year period marked by investment in mental health services from 2004 through 2013. This improvement is likely a reflection of the resources invested in the national implementation of EBP for PTSD. However, these findings highlight that while most patients initiating PTSD care in the VA did receive some psychotherapy in the initial year, the vast majority did not meet this quality standard. Thus, it is possible that many patients initiating care during this period would have benefited from more intensive treatment. This work shows that by examining the content of psychotherapy sessions, it is possible to avoid overestimating treatment quality, providing a more accurate baseline against which to measure the effect of improvement efforts. Regardless of how session content is measured in the future (e.g., NLP of note text versus the use of EBP-specific note templates), our work provides a basic framework for using the related data to develop an EBP for PTSD quality measure.

Our study addresses several gaps in the available research regarding quality measurement for PTSD treatment. First, few studies include clinical detail from chart notes, such as whether an EBP was delivered (Hepner et al. 2016). By using NLP, we were able to identify when an EBP was delivered for each person in the cohort and incorporate this information into our quality measures. Similarly, most measures of psychotherapy focus on access to care or quantifying the number of visits, and often this is due to limited data on diagnosis, severity of illness, treatment history, and the content and number of visits (Brown et al. 2014). Availability of these additional factors in our dataset allowed us to perform causal analyses in order to determine whether various definitions of quality were associated with improved PTSD outcomes.

There are several limitations to our study. First, we did not examine a range of cutoffs for the required number of sessions and for number of weeks over which those sessions should be delivered. Examining multiple cutoffs would have created an unmanageable number of comparisons, across which we would have had to balance our covariates to avoid bias in causal analyses. Thus, we used a single standard for number of sessions supported by prior research and a single standard for treatment density that has been used operationally in the VA. Future research should address the question of the minimal number of sessions for an adequate treatment and the maximum amount of time over which those sessions should be delivered. Second, we did not compare EBP to non-EBP. Extensive available research already demonstrates that trauma-focused evidence-based psychotherapy for PTSD is associated with superior outcomes to non-specific psychotherapy in the treatment of PTSD (The Management of Posttraumatic Stress Disorder Work Group 2017). While additional “real world” studies about the clinical effectiveness of EBPs for PTSD (compared to other treatments) may be warranted, our work is not designed to make those inferences. Third, there were several differences in potentially important patient and therapist characteristics among those meeting various quality standards. However, analyses controlled for key differences and our sensitivity analyses indicate that unmeasured confounding is unlikely to overturn our findings. Finally, even NLP of psychotherapy notes to detect EBP use is a proxy measure of EBP delivery. Without video, we cannot be sure what happened during psychotherapy sessions. However, we believe that our NLP method is the closest possible approximation for studying EBP implementation in the VA during the critical time period examined.

In summary, this research demonstrates that a theoretically-oriented approach to quality measurement can be used to create the basic structure of a psychotherapy for PTSD quality measure. While our work captures the receipt of effective and timely treatment, our measure of quality is incomplete. Health systems should also seek to provide PTSD care that is safe, patient-centered, equitable, and efficient (Pincus et al. 2007).