Preliminary Remarks

As defined by Good Practice in Secondary Data Analysis (GPS; see below), secondary data are “data which are provided for analysis over and above their original, primary purpose.” Their classification as secondary data critically rests upon differences between the primary purpose for which the data are collected and their subsequent utilization. It is irrelevant for classification whether the data owners themselves or third parties undertake the further utilization of the data. Therefore, routine data of health insurances (‘claims data’) are considered secondary data not only if they are used for scientific purposes but also if they are used by the health insurance for health services planning, for instance. GPS further specifies: “Secondary data analysis defines the utilization of secondary data. Secondary data analysis includes the survey and preparation steps of the secondary data body that are required for the analysis. By these preparation steps the data are accessible for scientific questions” [1].

In recent years, a variety of data sets have been made available for scientific utilization in the context of secondary data analyses. From the 1990s onward, the focus was initially almost exclusively on routine data (Footnote 1) from the statutory health insurances (SHIs), promoted by early extensive studies and an early memorandum on the analysis and utilization of health care and social data; in recent years, routine data from the German statutory pension system and the Federal Employment Agency have increasingly been made available to researchers as well [2–5].

This chapter focuses on the utilization of SHI-financed medical services, so we will not further discuss the latter data sets. For additional information, refer to the websites of the research data centers of the statutory pension fund (www.fdz-rv.de) and the Institute for Employment Research (IAB; http://fdz.iab.de). The health-related data in official statistics (e.g., hospital diagnosis statistics) can be utilized as well through the research data center of the German federal states’ and federal government’s statistical offices (www.forschungsdatenzentrum.de). Since these data are very similar in content to SHI data, these sources are not explicitly discussed here. However, the opportunities and limitations of the utilization of secondary data as discussed in this chapter generally apply to these data sets as well. Additional discussions of these data sets are found in [6, 7]. Kurth (2008) presents additional specific data sets that could be used for health-care utilization (HCU) research and health services monitoring (e.g., data from clinical registries, networks of expertise, or external quality assurance) [8].

This chapter focuses on discussing the scientific utilization of SHI routine data since these data provide an almost complete picture of the utilization of medical services, namely of all SHI-financed services. In addition, we are guided by the idea that “health services [and particularly utilization; E.S.] research requires data collected in routine care” [9]. Figure 5.1 shows that social data and particularly the discussed SHI data can provide information about the entire utilization process, particularly about the specific health services processes (throughput), the provided services (output), and the results of medical services (outcomes). This will be discussed in greater detail under ‘Examples of use’ [10].

Fig. 5.1 Social data and the health-care system. (Source: [10])

Data Structure

Discussing the methodological, technical, and data protection aspects of the scientific utilization of SHI routine data first requires an overview of the data stored by the SHIs. The German Social Code Book V (SGB V), Chap. 10, “Transmission of health services data” [Übermittlung von Leistungsdaten] (§ 295 ff.) lays down provisions regarding the standardized transmission of data about rendered services and associated costs by the service providers (physicians in private practice, hospitals, rehabilitation facilities, pharmacies, and nonphysician service providers such as occupational and physical therapists) to the insurances. With the exception of dental care (Footnote 2), every major health-care sector requires specific case-based documentation for every physician–patient contact (Table 5.1), and the specific content of the transmitted data is specified down to individual variables. The 2004 revision of § 295 included the outpatient sector in this documentation requirement.

Table 5.1 Provisions in the Social Code, Book V on the transmission of routine data from the statutory health insurance

The routine data are owned by the individual SHIs. The scientific utilization of these data within the context of health services research, therefore, requires contacting the insurances and discussing and contractually agreeing on the objectives of use under aspects of health services epidemiology. This aspect is discussed in greater detail below.

Methodological Aspects of Handling Routine Data

SHI routine data are primarily collected for the billing of medical services. The transmitted variables and the type of coding are specified in accordance with this primary purpose. This fact must always be borne in mind when discussing the advantages and disadvantages of utilizing routine data in the context of HCU research. It also affects validation and data processing as central steps before the actual data analysis can start. Before presenting the associated methodological problems and solution approaches, we list a series of distinctive characteristics that reveal the suitability of these data for research purposes:

  • Population-based: Epidemiological figures are typically expressed as rates, with the absolute number of target events in the numerator and a definable and quantifiable population in the denominator. Such rates are also relevant for HCU research, for instance, when expressing treatment frequencies (e.g., number of diabetics per 1,000 insured) or distinct events (e.g., number of hospital admissions with diabetes as the main diagnosis), which may be further differentiated by age, gender, and other socio-demographic characteristics. Typical epidemiological figures such as incidence (new cases or initial documentation of diagnosis) and prevalence (number of treated patients with a certain diagnosis) can be clearly determined on the basis of SHI routine data by using the number of members insured by an SHI at a specific date or during an observation period (the so-called population at risk). Since exact insurance coverage periods are available for each member, the denominator can list precise member-years. Other HCU-relevant data from physician practices, hospitals, or disease registers generally do not include this precise population-based information.

  • Individual-based: SHI routine data are highly significant for HCU research because the health services provided to individual insured members can be tracked over an extended time period, both retrospectively and prospectively. Through pseudonymized health insurance-member numbers, all of the member’s contacts in all sectors can be pooled, regardless of service provider and location of service provision [11]. This is not true for statistics that are merely case-based or sector-based (such as the German federal hospital diagnostics statistics). For instance, these statistics present transfers and readmissions as separate cases without a clear link to the insured individual. The official diagnostics statistics, therefore, do not reveal how many patients were involved in the roughly 18.5 million hospitalizations (2010; www.gbe-bund.de).

  • Residence-based and location-based: The postal code of the insured, with the fifth digit often removed for data protection purposes, allows the detailed resolution of epidemiological figures to the district and community level, irrespective of the difficulties associated with the clear matching of postal codes to communities and districts. No other health services data currently allow such small-scale regional representation of the health services situation with clear population-based information. Local health services analyses can be performed with regard to the service providers as well, in compliance with data protection regulations [1, 12].

  • Data quality: In the context of their primary purpose, routine data are checked for completeness and accuracy, for instance, regarding the consistency of reimbursement-relevant diagnostic and surgical data. However, variables that are not checked for quality by the data owner, such as information on specialty departments or causes of inpatient admission, must be separately validated by the secondary user. This also applies to specific content-related questions, such as the review of the documentation of diagnoses and services in the context of disease-specific and procedure-specific examinations [13, 14].

  • Completeness: Because they are relevant to reimbursement, most SHI data sets can be assumed to be nearly 100 % complete (inpatient cases, pharmaceutical prescriptions, non-pharmaceutical therapy, and technical aid prescriptions), and they are associated with an extremely low risk of selective reporting bias, which is often a suspected problem in external quality assurance. However, only information about SHI-financed services can be assumed to be complete. Privately financed services, such as the purchase of over-the-counter (OTC) pharmaceuticals, are by definition missing from SHI routine data. Private prescriptions and so-called IGeL services (Individuelle Gesundheitsleistungen, individual self-paid health services) are not systematically documented. Data of incapacity to work also underrepresent short-term sick leave lasting 3 days or less because many employers only require employees to submit an incapacity certificate starting on the fourth missed working day [6].

  • Low cost: Routine data are created through standardized pathways within the day-to-day operation of the health-care system. The only cost-relevant aspects for HCU researchers are expenses arising for data processing, supply, and transmission beyond the routine administration process. At the insurances, these expenses are comparatively low since only filtration is typically required, and the data are not assessed further for content. The data are typically available in a form that facilitates subsequent information technology (IT) processing. Nevertheless, the expenses and time required for supplying the SHI data can represent obstacles to scientific utilization in light of the routine tasks of the data owners.
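The population-based property described in the list above can be illustrated with a minimal sketch. It assumes that each member's insurance coverage is available as a pair of start and end dates; the function names, the 365.25-day year, and the example figures are illustrative assumptions and are not taken from any SHI data specification.

```python
from datetime import date


def member_years(start: date, end: date, obs_start: date, obs_end: date) -> float:
    """Insured time in years that falls inside the observation window.

    Both the coverage period and the window are treated as inclusive.
    """
    lo = max(start, obs_start)
    hi = min(end, obs_end)
    if hi < lo:
        return 0.0  # coverage lies entirely outside the window
    return ((hi - lo).days + 1) / 365.25


def rate_per_1000(events: int, coverage, obs_start: date, obs_end: date) -> float:
    """Events per 1,000 member-years, with exact person-time as denominator."""
    total = sum(member_years(s, e, obs_start, obs_end) for s, e in coverage)
    if total <= 0:
        raise ValueError("no insured time inside the observation window")
    return 1000.0 * events / total


if __name__ == "__main__":
    # Hypothetical coverage periods for two members (illustration only).
    coverage = [
        (date(2010, 1, 1), date(2010, 12, 31)),  # insured all year
        (date(2010, 7, 1), date(2011, 6, 30)),   # joined mid-year
    ]
    print(rate_per_1000(3, coverage, date(2010, 1, 1), date(2010, 12, 31)))
```

Because members who join or leave during the observation period contribute only their actual insured time, the denominator reflects the population at risk more closely than a head count at a single reference date would.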

For many years, the scientific utilization of routine data was deemed ‘second-class research’ because it used supposedly inferior (secondary) data and did not employ specific instruments. Its problems, methods, and results therefore did not receive appropriate recognition. This negative perception changed with the growing availability and scientific utilization of SHI routine data and the initial publication of a scientific standard for the proper use of these data (GPS), which drew on Good Epidemiological Practice (GEP) [17, 18]. GPS was the first established standard for conducting secondary data analyses, and it represented a basis for contracts between data owners and external secondary users as well. GPS has now been revised twice, most recently in 2012 (download available at www.dgepi.de, also in English translation).

The first revision structurally aligned GPS with GEP. GPS now also features 11 guidelines, each with explanations and recommendations. The guidelines primarily reflect the entire secondary data analysis process from data generation and verification to analysis and interpretation. Specifically, GPS provides recommendations on study design, data processing, data analysis, quality assurance, data privacy, and contractual frameworks as well as the scientific independence of secondary users. GPS particularly emphasizes the documentation and transparency of data processing. Herein lies the biggest difference to research with primary data since the subsequent secondary users cannot influence the generation and collection of routine data. GPS additionally stresses the need for contractually regulating the secondary utilization by defining the rights and duties of data owners and health services researchers. Finally, the utilization of SHI data within the context of health services research requires full compliance with the applicable data privacy provisions and the SGB [19–21].

The GPS’ target group is data owners, secondary users involved in social medicine and health services epidemiology, and those who use their research results. This includes not only members of universities but also all those who apply scientific methods to secondary data and their analyses from a scientific perspective. GPS is now an established standard. The improved reputation of secondary data analysis and recognition of its importance in health services research and hence HCU research are highlighted by the methodology memos issued by the German Network for Health Services Research, which explicitly discuss the utilization of secondary data [22, 23].

The next section covers the specific uses of SHI routine data and the associated methodological problems and solution approaches.

Examples of Use

SHI routine data supply information about all phases of the health services process that were outlined above (see Fig. 5.1), although to a varying extent. Input-oriented health services research includes needs research and the investigation of health services utilization. The analysis of routine data primarily aims to reveal utilization; on the basis of retrospective contact analyses, they can also provide approximate information about objective needs but not subjective needs. Throughput-oriented health services research deals with structures and institutions; here, routine data can supply suitable process and outcome parameters for comparisons between service providers. Routine data directly indicate the output of the health-care system through billed services. Increasingly, SHI routine data are also used to identify the outcomes of medical services. Table 5.2 presents the fields in which SHI routine data can be scientifically utilized [24].

Table 5.2 Selected applications for SHI data. (Source: [24], shortened)

A 2009 review found 70 studies on the use of pharmaceutical prescription data alone [25]. Newer studies and studies with a different focus add to this number. At this point, a complete overview of the use of SHI routine data in Germany therefore cannot be provided. Instead, a few brief examples of current questions that are investigated using SHI routine data will be presented below. These examples are intended to demonstrate the wide range of potential uses and to encourage more intensive utilization. Some of the examples come from analyses of the Sample Survey of Persons Insured in SHI Institutions in Hessen [Versichertenstichprobe AOK Hessen/KV Hessen] since the author considers this database the most extensive and longest used data source for secondary data analysis in Germany [26]. Numerous additional examples are found, for instance, in Swart and Heller (2007), Schubert et al. (2008), Swart and Ihle (2005), Grobe (2008), and the Health Care Report of the AOK Research Institute (WIdO), which has been published annually since 2011 [7, 24, 27–29].

Sector-based analyses

Pharmaceutical prescription data have been in routine use for many years. The review by Hoffmann (2009) has already been mentioned [25]. Nink et al. (2005) provide a detailed description of the opportunities presented by pharmaceutical data in health services research (information systems, research questions, and selected results) [30]. The Pharmaceutical Prescription Report [Arzneimittelverordnungsreport], which has been published annually since 1985, is one of the standard references that provide detailed information about utilization behavior in one of the most important health-care sectors. In the Pharmaceutical Prescription Report, the prescriptions written by SHI-accredited physicians are subjected to a systematic, descriptive analysis and evaluation to improve market and cost transparency. Regular contents of the Prescription Report include the general prescription and market development, the prescription frequency for new active substances, and topics related to indication groups. The 2012 Pharmaceutical Prescription Report, for instance, focuses on analyzing the initial results of the early benefit assessment of new active ingredients, where the additional benefit of new medicines is quantified as compared to established medicines, within the context of the Act on the Reorganization of the Pharmaceutical Market (AMNOG: Arzneimittelneuordnungsgesetz) [31].

This reference publication is now supplemented by specific health reports published by individual SHIs [32]. The additional reference to a defined population of insured members allows the calculation of epidemiological indices and the identification of foci of care. Targeted analyses seem useful when a high percentage (of costs) of prescribed pharmaceuticals benefit a small percentage of members. For instance, 50 % of all pharmaceutical expenditures were spent on fewer than 4 % of members of the GEK (Gmünder Ersatzkasse; now fused with BARMER to form BARMER-GEK) [32, Table 5.3]. This group primarily consisted of seriously ill patients with multiple morbidities, who are frequently treated with a complex, poorly coordinated medication regimen. Such constellations lend themselves to steering and coordinating measures in the context of disease management programs.

Table 5.3 Distribution of pharmaceutical costs among insured cases, BARMER GEK, 2010 (in %). (Source: [32])
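The concentration of expenditures on a small group of members can be quantified with a simple cumulative calculation. The following is a minimal sketch, assuming that annual pharmaceutical costs are available as one value per member; the function name and the sample values are hypothetical and do not reproduce the figures in Table 5.3.

```python
def share_of_members_for_cost_share(costs, target_share=0.5):
    """Smallest fraction of members, most expensive first, whose summed
    costs reach `target_share` of total expenditures."""
    if not costs:
        raise ValueError("costs must not be empty")
    total = sum(costs)
    if total <= 0:
        raise ValueError("total costs must be positive")
    running = 0.0
    # Walk through members from highest to lowest cost.
    for n, cost in enumerate(sorted(costs, reverse=True), start=1):
        running += cost
        if running >= target_share * total:
            return n / len(costs)
    return 1.0


if __name__ == "__main__":
    # Hypothetical annual costs in EUR for ten members (illustration only).
    sample = [1000, 500, 100, 100, 100, 100, 50, 25, 15, 10]
    print(share_of_members_for_cost_share(sample))  # fraction of members covering half of all costs
```

Applied to real member-level data, the same calculation yields the kind of statement made above, namely which percentage of members accounts for half of all pharmaceutical expenditures.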

Data of incapacity-to-work are among the most intensively used SHI data to date. Routine reports on the basis of incapacity-to-work reports for the subpopulation of working members have been part of the standard reporting system of all health insurers for many years [33–35]. The annual report on time missed from work published by the AOK Research Institute is merely one example of reports that provide detailed information about the development of sickness figures in the German economy and examine in detail the incapacity-to-work situation in the individual sectors of the economy, with varying focus areas [36]. We will not further discuss these data since the incapacity-to-work situation only indirectly supplies information about HCU. For additional methodological discussions regarding the utilization of data on incapacity to work, please refer to specific publications [16, 37].

The data from the inpatient sector have been used intensively for many years as well. Again, there is an established tradition of regularly published insurance-specific health reports, such as by BARMER GEK or BKK Bundesverband [National Association of Company Health Insurance Funds] (www.bkk.de) [38]. These analyses of the inpatient sector frequently discuss so-called high users of the health-care system, meaning members requiring frequent and/or expensive services, who are occasionally referred to as “revolving door” patients. Specific patient groups can be analyzed through specific diagnoses and thanks to the clear member-based information in SHI routine data. Alcoholic patients, for example, are very high users even when they present with other symptoms. After eliminating hospitalizations that are directly related to alcoholism (cases with F10 as main diagnosis), the hospitalization frequency within a 4-year period was still three times higher in alcoholics than in a comparison group of members without alcohol problems (Table 5.4) [39]. Such analyses are made possible by the clear reference to the individual member, which reveals multiple utilizations (in this case, hospitalizations) by the same patient; this reference is not available in other case-based data sources (such as the German federal hospital diagnosis statistics).

Table 5.4 Diagnosis-specific hospitalization frequency for insured members with and without alcohol problems, GEK 2000 through 2003. (cases per 1,000 insured years, age-adjusted). (Source: [39])

Until 2004, it was difficult to identify the health services situation in the outpatient sector because case-based information was not directly transmitted by the service-providing SHI-accredited physicians to the insurances. Until that year, primarily case-based analyses from the outpatient sector could only be created by the Central Research Institute of Ambulatory Health Care [40]. The revision of § 295 SGB V in 2004 enabled case-based or even individual-based analyses of the outpatient care situation using SHI data. The outpatient care sector generates enormously large data sets because about 90 % of all SHI members have at least one outpatient physician contact per year, and an average of 15–20 annual physician contacts per person have been documented through 2008 [28]. In addition, they include data on confirmed diagnoses, tentative diagnoses, and diagnoses by exclusion as well as services according to the EBM (Uniform Value Scale) catalog. While valid contact-based analyses can no longer be performed following the most recent EBM reform in 2008, these data still allow analyses by age, gender, and diagnosis. The available data on the 8.7 million members of BARMER-GEK alone include 74.5 treated cases per quarter, 296 million diagnosis codes, and 498 million billing codes [41].

Prescriptions of non-pharmaceutical therapies and technical aids account for less than 5 % of SHI expenses. However, this sector has been associated with a disproportionate increase in expenses, so that it is increasingly the focus of health services analyses. The National Associations of the SHIs, with support from the AOK Research Institute, have therefore created a non-pharmacological therapies information system (Heilmittel-Informations-System, HIS) that has been in operation for several years. Schröder et al. (2005) describe the HIS design and structure as well as analysis options [42]. On this basis, annual reports on non-pharmaceutical therapies have been published since 2004; they supply information about the frequency of prescriptions for nonphysician services (e.g., physical, occupational, and speech therapy) [43]. Other insurances now regularly publish reports on non-pharmaceutical therapies and technical aids as well [44].

Morbidity estimates and costs of specific diseases

The use of routine data plays a central role in incidence and prevalence estimates of acute and particularly chronic diseases. More precisely, the numbers indicate treatment incidence and treatment prevalence because secondary data reveal newly arisen or existing diseases only if they become treatment-relevant and are listed as inpatient or outpatient main or secondary diagnoses. SHI secondary data are ideally suited to depict the health services demand as reflected by utilization, show its course over time, and predict future utilization on the basis of population projections. Such uses have been common for several years, for instance for diabetes [45, 46].

Disease-based cross-sector analyses of SHI routine data also allow the calculation of direct disease-related health services costs in the context of health economic analyses. Such analyses can be conducted on a member level and extrapolated to all patients with a specific disease. The disease-related excess costs, when compared with costs for members without the particular disease, can be determined through a case–control approach with matched pairs. For instance, the KoDiM study calculated the total health-care costs for diabetes mellitus in Germany as EUR 45–50 billion per year (2009), which represents a 70 % increase since 2000. A 28 % increase remains, even after the data are standardized by age and gender and inflation-adjusted. This distinct rise results not so much from a growth in per capita costs (approx. EUR 6,000 per year) or excess costs (EUR 2,600 per year) but rather from the increase in diagnosed diabetics to nearly 10 % of all SHI members [47]. This study provides further evidence that controlled study designs can be used with and within SHI routine data.
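The matched-pairs logic behind such excess-cost estimates can be sketched as follows. This is a simplified illustration and not the KoDiM procedure: it assumes exact 1:1 matching on age group and sex only, takes controls in the order given, and uses hypothetical field names and cost values.

```python
from collections import defaultdict
from statistics import mean


def excess_costs(cases, controls):
    """Mean cost difference between cases and exactly matched controls.

    `cases` and `controls` are lists of dicts with the keys 'age_group',
    'sex', and 'cost'. Each control is used at most once; cases without
    a matching control are dropped.
    Returns (mean excess cost, number of matched pairs).
    """
    pool = defaultdict(list)
    for control in controls:
        pool[(control["age_group"], control["sex"])].append(control)

    differences = []
    for case in cases:
        candidates = pool[(case["age_group"], case["sex"])]
        if not candidates:
            continue  # no control left in this stratum
        control = candidates.pop(0)
        differences.append(case["cost"] - control["cost"])

    if not differences:
        raise ValueError("no matched pairs could be formed")
    return mean(differences), len(differences)


if __name__ == "__main__":
    # Hypothetical annual costs in EUR (illustration only).
    cases = [
        {"age_group": "60-69", "sex": "m", "cost": 5000.0},
        {"age_group": "70-79", "sex": "f", "cost": 4000.0},
        {"age_group": "80+", "sex": "f", "cost": 9000.0},
    ]
    controls = [
        {"age_group": "60-69", "sex": "m", "cost": 3000.0},
        {"age_group": "70-79", "sex": "f", "cost": 3000.0},
    ]
    print(excess_costs(cases, controls))
```

In practice, matching would typically draw on further characteristics, and the share of unmatched cases should be reported, since dropped cases can bias the estimate.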

In the same manner, incidences over time can be estimated in individuals with risk factors, provided the risk factors can be ICD (International Statistical Classification of Diseases and Related Health Problems) operationalized, or in individuals with similar or related diseases; examples include estimates for amputation in diabetes mellitus or endometriosis in patients with related symptoms [48, 49]. Finally, the frequencies of (nonspecific) symptoms and the relationship with potential diagnoses can be determined using routine data, for instance for back pain, an area that is highly relevant in health services [50, 51].

Analyses of specific groups of insured members

Beyond pure description, SHI routine data allow a qualitative assessment of the utilization behavior of specific groups of members. For example, recent publications address and quantify the prescription frequencies for pharmaceuticals that are potentially hazardous for older members and are on the so-called PRISCUS list, a listing of medications for which safer alternatives exist from a pharmacological perspective [52]. The work of Schubert et al. (2012) shows that around 1 % of retirement-age members receive immediate-release nifedipine, contrary to guidelines [53]. The procedure discussed in that paper can be analogously applied to other specific pharmaceuticals, for instance, to long-term prescriptions following acute myocardial infarction. A prospective analysis of secondary data over an average of 4.2 years showed continuous guideline adherence for 40 % of patients [54].

However, patient groups can be defined by more than age, gender, and diagnoses. Regardless of methodological difficulties, SHI routine data are also suitable for analyzing socially disadvantaged groups of members and their typically above-average utilization. This is because occupational biographies in the form of documented employment and unemployment periods are available, at least for working-age members [55]. For example, unemployed members exhibit a higher utilization of inpatient services than employed members in all diagnostic classes; this difference is particularly pronounced in mental illnesses and behavioral disorders, which also include addictions. Overall, unemployed men accrue about twice as many inpatient days as employed men, a difference that is not as pronounced in women [56].

Cross-sectional analyses of utilization behavior cannot establish causal relationships between unemployment and illness. However, longitudinal designs with the same data set can more specifically investigate potential cause–effect relationships. They show that unemployment is associated not only with poorer health and higher utilization but also with a higher mortality risk. This is found when classifying the insured by degree of unemployment and tracking deaths in each of the four unemployment classes for a 3-year follow-up period. The mortality risk rises with increasing unemployment experience and is three times as high in long-term unemployed members experiencing more than 2 years of unemployment than in those who were never unemployed (Table 5.5) [56]. Hence, SHI routine data can also be used to implement challenging epidemiological study designs.

Table 5.5 Mortality risk of unemployed persons, GEK, 2004. (Source: [55], slightly shortened)

Subgroup analysis can also focus on specific services that carry high priority in the German health-care system for curative or preventative purposes. This includes services within the German cancer screening program. The SHI catalog of services includes annual fecal occult blood tests (starting at 50 years of age) and colonoscopy (starting at 55 years of age, up to two screenings at least 10 years apart). Analyses of SHI routine data revealed utilization by fewer than 25 % of entitled members and higher utilization by women due to regular gynecologist visits [57]. In this area, routine data can certainly contribute to quantifying underuse, overuse, and misuse of health services [58].

Quality of care

Assessing the outcomes of various treatment alternatives is significant in clinical research. For service providers, patients, and health insurers, an empirically grounded assessment of the quality of care offered by individual service providers is just as important. Internationally, routine data have been commonly used in quality assessments for some time [59–62]. At the same time, external quality assurance measures that have been established for a longer time have proven to be time-consuming and error-prone. Against this background, the Federal Association of the AOK, the AOK Research Institute (WIdO), and Helios Kliniken developed the joint project “Quality assurance in inpatient care using routine data (QSR).” In the course of the project, a method was developed to enable the relatively inexpensive, longitudinal, risk-adjusted assessment of the quality of care with the aid of SHI routine data, namely using tracer-specific, outcome-related quality indicators [48, 63]. The validity of the collected data was found to be less affected by quality-relevant biases because the data are not generated in view of a quality assessment, the data are verified by the insurance, and key quality indicators (e.g., hospital readmissions) are documented independently from the institution to be assessed. Even more importantly, unlike most other quality assurance methods, QSR permits individual patient-based analysis in addition to case-based analysis. As a result, this procedure enables true longitudinal analyses and ultimately better indicators of the quality of outcomes.

The QSR procedure has been further advanced in the meantime [64]. The AOK hospital navigator now publishes the QSR results for the service areas ‘implantation of a hip endoprosthesis in patients with coxarthrosis’, ‘implantation of a hip endoprosthesis or osteosynthesis in patients with hip fracture’, ‘implantation of a total knee endoprosthesis’, and ‘cholecystectomy’ (gall bladder removal) (www.qualitaetssicherung-mit-routinedaten.de, as of: May 2012).

Building on the QSR project and in a manner similar to the prospective tracking of myocardial infarction patients, SHI secondary data can be used to present a differentiated picture of the health services situation and medium-term and long-term quality of care in other health-care areas as well, for instance, for osteoarthritis, revisions following endoprosthetic procedures, or initial care dependency [65]. However, this example also demonstrates the (current) limitations of SHI routine data: They reveal (contra)indications to surgical procedures only to a limited degree since they do not include patient-related information (e.g., pain and mobility limitations), and relevant diagnosis-related history information may be missing in case of short observation periods. Notwithstanding their limitations, routine data allow subgroup analyses of members with increased service utilization or of those who are more vulnerable.

Local health services research

Typically, most actors within the health-care system have a local perspective and act on the community and district levels (or postal code level). For instance, this applies to the choice of a suitable hospital (from the perspective of the patients and referring physicians), the analysis of hospital catchment areas (from the perspective of service providers), or structural planning (from the perspective of the planning authorities). SHI routine data allow such local health services analyses because they include a clear local reference with the member’s place of residence and the location of the service-providing institution (physician’s practice and hospital), which potentially permits the resolution of health services structures and processes down to district, community, or postal code.

Local HCU research is important because substantial differences in the utilization of medical services are the rule rather than the exception in Germany, a situation that has been known from other countries for years [1, 12, 66–70]. In the state of Sachsen-Anhalt, there are regions with above-average and below-average utilization persisting over time, even after adjusting for age and gender. When breaking down the total utilization of inpatient services by four-digit postal codes, utilization varies by some 20–30 % above and below the state average, independently of population density and SHI-accredited physician density. For certain (common) diagnoses and surgeries, differences by a factor of 2–5 are frequently seen [1] (Fig. 5.2).

Fig. 5.2 Age-adjusted hospitalization frequency by four-digit postal code areas for members of AOK Sachsen-Anhalt, 2006 (values for the 96 postal code areas range from 2,080 to 3,832; 10th percentile: 2,275; median: 2,626; 90th percentile: 3,033) (source: [1])
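The age adjustment behind such small-area comparisons can be illustrated with a minimal sketch. It assumes direct standardization, which the source does not specify; the age groups, the standard population weights, the reference size `per`, and the function name `age_adjusted_rate` are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical standard population weights by age group (must sum to 1).
STANDARD_WEIGHTS: Dict[str, float] = {"0-39": 0.45, "40-64": 0.35, "65+": 0.20}


def age_adjusted_rate(
    strata: Dict[str, Tuple[int, float]],
    weights: Dict[str, float] = STANDARD_WEIGHTS,
    per: int = 10_000,
) -> float:
    """Directly standardized rate per `per` insured persons.

    `strata` maps age group -> (hospital cases, insured person-years)
    for one postal code area.
    """
    rate = 0.0
    for group, weight in weights.items():
        cases, person_years = strata[group]
        if person_years <= 0:
            raise ValueError(f"no person-years in age group {group!r}")
        # Age-specific rate, weighted by the share of this group
        # in the (assumed) standard population.
        rate += weight * cases / person_years
    return rate * per


# Illustrative, made-up counts for one area with an old member profile.
area = {"0-39": (50, 1000.0), "40-64": (120, 800.0), "65+": (300, 600.0)}
crude = 10_000 * sum(c for c, _ in area.values()) / sum(p for _, p in area.values())
print(round(crude, 1), round(age_adjusted_rate(area), 1))
```

In this made-up area the crude rate exceeds the adjusted rate because older members are overrepresented relative to the assumed standard population; only the adjusted values are comparable across areas.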

Other interesting results of local analyses of the health services situation include: (a) the examination of underuse, overuse, and misuse, and their effects on planning; (b) the quantification of patient migration from and to other service areas, which can affect planning in border regions; (c) the analysis of hospital catchment areas and market shares, which can provide insights into future focus areas of service providers; and (d) the benchmarking of neighboring or structurally similar hospitals using valid process and outcome indicators [58].

Evaluation of new health services models

The discussed suitability of SHI routine data for complex, controlled epidemiological studies now enables the evaluation of complex new health-care models, such as disease management programs (DMPs) or integrated care (IC) models. The common self-selection bias generally complicates the evaluation of such programs by simple comparison of participants and nonparticipants. However, advanced methods of secondary data analysis, such as propensity score matching, allow generating control groups that no longer differ from participants in validly operationalizable utilization-influencing factors [71]. Using this approach, an analysis of member data from Techniker Krankenkasse [technicians’ health insurance] failed to provide clear proof of effectiveness for the diabetes mellitus DMP: Lower inpatient care costs and fewer emergency admissions, on the one hand, were offset by higher outpatient utilization and more prescriptions on the other [72]. However, such study approaches cannot ensure full structural equivalence because they include only utilization-influencing factors that are expressed by objective socio-demographic characteristics or by documented diagnoses and services.
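The matching step of propensity score matching can be sketched as follows. This is a minimal illustration and not the procedure used in the cited studies: it assumes that propensity scores have already been estimated elsewhere (e.g., by a regression on age, gender, diagnoses, and prior utilization), and the greedy 1:1 nearest-neighbor rule, the `caliper` value, and all names are hypothetical choices.

```python
from typing import Dict, List, Tuple


def match_nearest(
    participants: Dict[str, float],
    nonparticipants: Dict[str, float],
    caliper: float = 0.05,
) -> List[Tuple[str, str]]:
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    Both arguments map a member ID to an already estimated propensity
    score. Controls are used at most once (matching without replacement);
    participants without a control within the caliper stay unmatched.
    """
    available = dict(nonparticipants)
    pairs: List[Tuple[str, str]] = []
    # Fixed processing order keeps the result reproducible.
    for pid, score in sorted(participants.items(), key=lambda kv: kv[1]):
        if not available:
            break
        # Closest remaining control; ties are broken by member ID.
        cid = min(available, key=lambda c: (abs(available[c] - score), c))
        if abs(available[cid] - score) <= caliper:
            pairs.append((pid, cid))
            del available[cid]
    return pairs


# Made-up scores for illustration only.
dmp = {"p1": 0.30, "p2": 0.62, "p3": 0.95}
others = {"c1": 0.28, "c2": 0.60, "c3": 0.33, "c4": 0.10}
print(match_nearest(dmp, others))
```

Participants who remain unmatched (here the member with a very high score) would be excluded from the comparison, which is one reason why matched samples may no longer represent all participants.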

Some types of controlled studies can also be conducted on the basis of SHI routine data. For instance, an intention-to-treat approach was used in a study of overuse, underuse, and misuse conducted to evaluate the project “Gesundes Kinzigtal” of the AOK and LKK Baden-Württemberg (see Siegel et al. in this issue). The study compares the group of all potential participants in the IC project (of whom some 30 % had decided to participate by late 2011) with a control group that is representative of all other members and is additionally standardized to match the intervention group in all prognostically relevant variables. The intervention is then evaluated using the effect variable ‘quality of care’ (e.g., extent of guideline compliance) and costs of care (cost development and degree of cost coverage) by comparing the intervention and control groups [73].

Perspectives

The characteristics of SHI routine data and the presented brief examples indicate the wide range of potential uses of these data in HCU research. Twenty years ago, an investigation of surgical treatment quality already recognized this fact: “Insurance claims data are population based, covering all services provided to a defined population regardless of where the care is obtained…. Their low cost and routine availability facilitate their use for monitoring outcomes over long periods. They are free of the reporting bias and inadequate follow-up that afflict case series studies and avoid the high costs required when special registries are organized” [74].

Nevertheless, it is important to keep in mind the limitations of SHI routine data. Two issues are particularly relevant: the validity of diagnostic information and the transferability of the results of secondary data analyses to other populations. A patient-based comparison of diagnostic information in primary physician’s patient records with the diagnostic information in SHI billing data revealed considerable underreporting in SHI routine data (in 30 % of cases), particularly for common primary care diagnoses of lesser severity and chronic diseases treated without pharmaceuticals. Simultaneously, permanent diagnoses that were not currently treated were overreported (in 19 % of cases) [75]. A study using SHI data alone also revealed deficits in the continuous documentation of chronic diseases and inconsistencies between diagnosis coding and specific pharmaceutical prescriptions [76]. It should be noted, however, that these two studies were conducted before the direct transmission of billing information to the SHIs (which started in 2004) and the introduction of the morbidity-oriented risk structure compensation scheme (in 2009).

The validity of diagnosis-based incidence and prevalence estimates has been increased by the availability of outpatient billing data since 2004 and the binding coding of diagnostic confidence. Nevertheless, the diagnosis information in billing data must still be separately validated to supplement the SHI error checks. Depending on the symptoms, confirmed outpatient and inpatient diagnoses and any specific prescriptions are used for validation and identification of so-called epidemiologically confirmed cases. In chronic diseases, documenting a diagnosis alone is typically insufficient [14, 77]. These systematic internal validation approaches increase the quality of the diagnosis-based analysis of SHI routine data.
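An internal validation rule of this kind might be sketched as follows. The specific rule is an assumption chosen for illustration, not one given in the text: a member counts as an epidemiologically confirmed case after any inpatient diagnosis, after confirmed outpatient diagnoses in at least two different quarters, or after one confirmed outpatient diagnosis combined with a disease-specific prescription. The record layout and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set


@dataclass(frozen=True)
class Diagnosis:
    member_id: str
    quarter: str      # e.g. "2010Q1"
    sector: str       # "inpatient" or "outpatient"
    confirmed: bool   # outpatient diagnostic-confidence flag (hypothetical field)


def confirmed_cases(
    diagnoses: Iterable[Diagnosis],
    members_with_prescription: Set[str],
) -> Set[str]:
    """Return member IDs that satisfy the assumed validation rule."""
    inpatient: Set[str] = set()
    outpatient_quarters: Dict[str, Set[str]] = {}
    for d in diagnoses:
        if d.sector == "inpatient":
            inpatient.add(d.member_id)
        elif d.sector == "outpatient" and d.confirmed:
            outpatient_quarters.setdefault(d.member_id, set()).add(d.quarter)

    cases = set(inpatient)
    for member, quarters in outpatient_quarters.items():
        # Repeated confirmed diagnoses, or one diagnosis backed by a
        # disease-specific prescription, are accepted as validation.
        if len(quarters) >= 2 or member in members_with_prescription:
            cases.add(member)
    return cases


records = [
    Diagnosis("A", "2010Q1", "inpatient", True),
    Diagnosis("B", "2010Q1", "outpatient", True),
    Diagnosis("B", "2010Q3", "outpatient", True),
    Diagnosis("C", "2010Q2", "outpatient", True),
    Diagnosis("D", "2010Q2", "outpatient", True),
    Diagnosis("E", "2010Q1", "outpatient", False),
    Diagnosis("E", "2010Q2", "outpatient", False),
]
print(sorted(confirmed_cases(records, members_with_prescription={"D", "F"})))
```

Member C (a single confirmed outpatient diagnosis) and member F (a prescription without any diagnosis) are not counted under this rule, which reflects the point that a documented diagnosis alone is typically insufficient in chronic diseases.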

In case of short observation periods, however, differentiating between incident and prevalent cases of chronic diseases is still a problem because mild cases, such as diagnosed diabetes mellitus not yet requiring drug treatment, may not be regularly documented in billing data. In such cases, billing data should be available for an extended time period if possible, so that initially documented diagnoses as incident cases can be validated throughout several diagnosis-free quarters to avoid overestimating incidence [78]. Provided that incidence and prevalence are validly estimated, SHI routine data allow comprehensive health services monitoring for specific chronic diseases throughout all health-care sectors [79].
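The idea of validating incident cases through diagnosis-free quarters can be sketched as follows. This is a minimal illustration under stated assumptions: the lookback length of four quarters, the quarter label format, and the function names are hypothetical, and members are assumed to be continuously insured from the quarter given in `insured_since`.

```python
from typing import Dict, List


def quarter_index(label: str) -> int:
    """Convert a label such as '2010Q3' to a running quarter number."""
    year, quarter = label.split("Q")
    return int(year) * 4 + int(quarter) - 1


def incident_members(
    diagnosis_quarters: Dict[str, List[str]],
    insured_since: Dict[str, str],
    study_start: str,
    lookback: int = 4,
) -> List[str]:
    """Members whose first documented diagnosis falls in the study period
    and is preceded by at least `lookback` observed quarters without it."""
    start = quarter_index(study_start)
    incident = []
    for member, quarters in diagnosis_quarters.items():
        if not quarters:
            continue
        first = min(quarter_index(q) for q in quarters)
        observed_from = quarter_index(insured_since[member])
        # All observed quarters before `first` are diagnosis-free by
        # construction, so only their number has to be checked.
        if first >= start and first - observed_from >= lookback:
            incident.append(member)
    return sorted(incident)


diagnoses = {
    "A": ["2010Q2", "2010Q3"],  # first diagnosis after long diagnosis-free period
    "B": ["2009Q3", "2010Q1"],  # already diagnosed before the study period
    "C": ["2010Q1"],            # too little prior observation to validate
}
since = {"A": "2008Q1", "B": "2008Q1", "C": "2009Q4"}
print(incident_members(diagnoses, since, study_start="2010Q1"))
```

Member C is deliberately not counted: with only one observed quarter before the diagnosis, a prevalent case cannot be distinguished from an incident one, and counting it would risk overestimating incidence.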

The external validity of secondary data analyses generally requires separate examination. Since the employed data sets typically come from a single health insurer, the results are not automatically transferable to members of other SHIs. In particular, incidence and prevalence estimates depend on the health insurance’s member profile. A more recent study by Hoffmann and Icks (2011) showed significant differences in member profiles and morbidity structures, even when adjusting for age and gender: “Some morbidity differences remain even after adjusting for relevant health-related variables” [80].

External validity problems could be solved if the so-called data transparency provision act (§ 303a-e SGB V) is consistently implemented, which is expected in 2014. The SHI Care Structure Act that came into effect in 2012 now similarly provides that the data reported by the health insurers to the German Federal Insurance Office according to § 268 SGB V for risk structure compensation are combined and made available for further analysis. Although the specific details of the Act are still partly deficient from the perspective of HCU researchers, for the first time the new data pool allows determining members’ personal treatment prevalence and indirectly the incidence of treatment-requiring diseases. There have been calls to expand this process to enable more advanced analyses, for instance regarding treatment courses or for small area analyses [81].

In addition to the data pool to be created according to § 303 SGB V, data pools from the Central Research Institute of Ambulatory Health Care in Germany (ZI; www.zi-berlin.de) are already available for cross-insurance analysis. This includes data on billing diagnoses, service (‘EBM’) codes, and pharmaceutical prescriptions from all physicians in private practice in Germany. This database is primarily used for internal analyses by the Association of SHI Physicians, but it is increasingly made available to health services researchers, and a scientific use file will be set up. The most prominent example is the new health services mapping [Versorgungsatlas] project (www.versorgungsatlas.de), which focuses on local variations in utilization on the level of counties and independent towns; successive analyses have been published for various indications since 2011 (e.g., frequency of depression, influenza vaccination rates, and participation rates in bowel cancer screening).

Another cross-insurance data source that may be of interest to HCU researchers is routine data that are directly read out from doctor information systems and processed for secondary use in research. However, this source has only been used in the context of special studies and has not yet been implemented nationwide. Technical issues (such as error-free data transfer from the practice IT systems), logistical difficulties (continuous contact with thousands of physicians in private practice), and methodological challenges (such as cross-practice, consistent patient pseudonymization, or the handling of free text information) currently prevent their routine use; hence, we will not discuss these data in more detail [82].

Members of private health insurance (PHI) have not yet been discussed. The PHI data structure currently does not allow analyses of the described nature. Currently, the research institute of the PHIs largely limits its analyses to topics relevant to health economics and pharmaceutical prescriptions [83, 84]. Therefore, the results of secondary analyses of SHI data cannot be transferred to the privately insured population (which represents about 15 % of the total German population).

SHI Data and the Andersen Model

The utilization of routine data in the context of HCU research is not suitable for investigating individual risk factors that do not immediately require treatment or sociodemographic or socioeconomic factors influencing utilization that are insufficiently documented in routine data. Clinical disease-related factors with direct significance for the type and intensity of utilization (such as degree of severity) are also typically missing from SHI routine data. This limits the modeling of predisposing, enabling, and need factors in the context of the Andersen model of utilization of medical services [85, 86]. Table 5.6 shows which specific factors of the model can be included.

Table 5.6 Representation of components of the Andersen model in SHI routine data

Among the predisposing factors, routine data only include sociodemographic characteristics, which may be of limited use because of the questionable validity of information on education, training, and current occupation [54, 87]. The new version of the occupation code, which SHIs must report at least once yearly for their working members, took effect in December 2011 and could improve this situation [88]. However, the specific health behaviors and attitudes of individual SHI members are generally missing from SHI routine data.

Structural enabling factors, in contrast, are directly available if they are documented through the utilization of SHI-reimbursable medical services and the characterization of the service provider (e.g., primary care physician/specialist), or they can be retrieved via information on the member’s place of residence (postal code, community, or district) for hierarchical analyses. Physician directories of the Associations of SHI Physicians, general hospital plans of the federal states, and an extensive database of socioeconomic variables are available in regional resolution (INKAR, Indikatoren und Karten zur Raumentwicklung [indicators and maps on spatial and urban development in Germany and in Europe], available through the Federal Institute for Research on Building, Urban Affairs and Spatial Development; www.bbrs.de).

Some need factors can be identified in SHI routine data. Diseases, injuries, physical and/or mental disorders that lead to the utilization of medical services are represented by the ICD classification and potentially by specific pharmaceutical prescriptions. Symptoms and subjective illnesses that do not lead to contact with professional service providers and do not result in SHI-reimbursable services are problematic. Other methodological issues are the virtual absence of medical findings (e.g., degree of severity) unless they manifest in specific ICD codes as well as the general problem of how nonspecific causes for physician visits are coded in disease diagnoses in the context of ICD and the valid billing regulations.

The limitations of sector-based and particularly insurance-based secondary data, which often fail to fully depict utilization, represent a further problem. For instance, the costs of medical services are not borne by the SHIs alone but often also by other social insurance funds, for instance by the statutory pension insurance in case of medical rehabilitation of employees. Cross-insurance, member-specific analyses may be the primary solution to this problem.

Solution Approach: Individual Linkage

Linkage of primary and secondary data on the individual level could overcome the problem that information on individual health behaviors and specific risk factors, which is typically available in primary data, is missing in SHI routine data. This theoretically ideal solution for epidemiology, and hence HCU research, is fraught with a series of legal, technical, and organizational challenges, and data linkage has, therefore, only been tested in a few cases [76, 89]. Current studies indicate, however, that although often logistically complex, linkage with primary data is technically feasible, legally permissible, and promises novel insights [21, 90].

Specifically, individual data linkage requires obtaining approval for the scientific utilization of social data according to § 75 SGB X from the relevant federal or state supervisory authority, ensuring acceptance of this procedure through the informed consent of study participants, guaranteeing secure data transfer and administration that rule out the reidentification of participants, and collecting SHI data from numerous SHIs based on separate data utilization contracts [21]. On the basis of predominantly positive experiences, individual linkage of primary and secondary data (SHI billing data, cancer registry data, etc.) and the establishment of a secondary data competence center are envisioned for the National Cohort planned to start in 2014 [91]. The National Cohort is expected to provide strong impulses for future health services and HCU research in this respect.

Individual linkage of data also enables the cross-validation of self-reported health-care utilization, particularly in the outpatient sector. No reliable analyses are currently available on the validity of members’ self-reported frequency of outpatient physician contacts within a time period of 3 or 12 months (see Swart and Griehl in this volume). These reports can now be compared to health insurance data. Similarly, primary data can be used to estimate the extent to which the introduction of case-based flat rates has led to an underestimated frequency of individual physician contacts in SHI data. Current and future studies will provide important methodological insights in this regard.
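Such a comparison of linked data might be sketched as follows. The sketch only describes differences between the two sources and does not treat either one as the reference; the input layout (member ID mapped to the number of outpatient physician contacts in the same period) and all names are hypothetical.

```python
from statistics import mean
from typing import Dict, Union


def compare_contacts(
    self_reported: Dict[str, int],
    claims: Dict[str, int],
) -> Dict[str, Union[int, float]]:
    """Describe agreement between self-reported and claims-based
    contact counts for members present in both linked data sets."""
    common = sorted(set(self_reported) & set(claims))
    if not common:
        raise ValueError("no members could be linked")
    # Positive difference: the survey reports more contacts than the claims.
    diffs = [self_reported[m] - claims[m] for m in common]
    n = len(diffs)
    return {
        "n": n,
        "mean_difference": mean(diffs),
        "share_equal": sum(d == 0 for d in diffs) / n,
        "share_self_higher": sum(d > 0 for d in diffs) / n,
        "share_claims_higher": sum(d < 0 for d in diffs) / n,
    }


# Made-up counts; members D and E cannot be linked and are ignored.
survey = {"A": 4, "B": 2, "C": 6, "D": 1}
billing = {"A": 3, "B": 2, "C": 9, "E": 5}
print(compare_contacts(survey, billing))
```

A systematic excess of self-reported over billed contacts could, for example, point to contacts that are no longer documented individually under case-based flat rates, whereas the opposite pattern could indicate recall problems.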

The same consideration applies to reported pharmaceutical use. Cross-validation of primary and secondary data can assess the accuracy of self-reported information on prescription-only medications and on the intensity of use of nonprescription medications and hence on the estimate of “utilization bias” [92]. For outpatient physician contacts and prescriptions or drug intake, neither of the two data sources can be considered the gold standard. The situation differs for inpatient stays, where SHI routine data can be assumed to provide complete and accurate documentation (except for inpatient rehabilitation, which is frequently financed through the statutory pension insurance).