Introduction

Electronic health records have rarely been adopted in U.S. hospitals owing to a lack of interoperability [1]. In China, the lack of interoperability in the health domain has likewise hindered health informatization [2]. Many standards have been created to promote interoperability. Health Level 7 Clinical Document Architecture Release 2 (HL7 CDA R2) [3, 4] is currently one of the most widely adopted clinical document architecture models. The Continuity of Care Document (CCD) attempted to harmonize the Continuity of Care Record (CCR) [5, 6] with the CDA. Integrating the Healthcare Enterprise (IHE) suggested combining existing standards to improve interoperability [7]. The Taiwan electronic Medical record Template (TMT) was designed to achieve semantic interoperability in EHR exchanges nationally [8].

The Chinese Ministry of Health has proposed an overall approach to healthcare informatization for the Chinese mainland to support the national strategy of providing basic medical insurance to all Chinese residents. The approach was described as laying three foundations, building three-level platforms, and improving professional application systems [9]. The three foundations are the nationally standardized Electronic Health Record (EHR) for all residents, the standardized Electronic Medical Record (EMR) structure for all medical institutions, and the national health information data dictionary. This research focused on the second foundation and was supported by a grant from the Ministry of Health. Because the goal of the second foundation is to build the basic EMR structure and its data element standard, this research proposes a basic EMR content structure to serve as a guideline for the EMR systems of all healthcare providers in China.

According to a survey on urban resident health conditions conducted in 2009, only 20% of Chinese citizens had regular physical examinations [10]. This means that most people learn about their health conditions only when visiting a doctor after becoming ill, and only then do medical institutes obtain their health information. Therefore, in most cases, medical institutes are the parties positioned to collect and report the medical information of residents, and they become the best source of residents' personal and health information. Statistical data from the Ministry of Health show that in 2007, most national- and provincial-level hospitals owned a local Hospital Information System (HIS), and about 38% of county-level hospitals had an HIS at some level.

The factor that most influences the variety in HIS levels is the absence of standards that could be adopted in the HIS software. The Basic Function Specification of HIS issued by the Ministry of Health in 2001 [11] is the only functional guideline for hospital data collection, and The Common Data Elements of Health Records issued in January 2009 by the Ministry of Health is not sufficient to support the requirements of defining the basic EMR structure and content.

As a component of the HIS, an EMR system (EMRS) is designed to obtain the original medical data from the HIS database, create a medical summary of a patient's stay in a hospital, and respond to information requests from EHR platforms at different levels. The EMRS could also report the most recently updated information to the EHR platforms. Therefore, the EMR is the information source for the EHR of regional healthcare information platforms.

Our ultimate aim was to propose a basic EMR content structure to clarify the types and formats of data needed by the regional EHR platforms. These data are to be obtained, managed, and organized by the EMRS of healthcare providers. This content structure includes data groups and is independent of any specific EMRS. It is intended to become the basis for all institutional EMR systems in China.

The HL7 V3 Clinical Document Architecture (CDA) provides an abstract and holistic information model that covers all forms of health information. However, because the CDA is holistic and abstract, it is difficult to apply directly as a specific EMR guideline in China. We therefore adopted the CDA model as the theoretical foundation of this research.

Methodology

This study’s method comprises four steps: (1) proposing a theoretical EMR content structure; (2) collecting original medical record sheets from hospitals and integrating these sheets into a basic set of medical sheets; (3) comparing the proposed basic EMR content structure with the items of the basic set of medical sheets and creating basic data groups for the EMR structure; and (4) listing standard data elements for the data groups and building the content structure for the EMR. Figure 1 shows a flowchart outlining this research.

Fig. 1 Research steps

The basic content structure of the EMR

We regarded every medical sheet as a complete clinical medical document. An EMR is the entire collection of medical documents for a patient. A basic EMR content structure was proposed to outline the information to be included in the EMR and the manner in which it should be included. We used the HL7 CDA R2 model and medical diagnostics theory [12, 13]. The HL7 CDA R2 model offered the basic document structure, which categorized information into the header and the body, while medical diagnostics theory provided the logic for organizing medical information.

To build an EMR content structure based on medical knowledge and the HL7 CDA R2 model, we invited medical experts to participate in relevant discussions and revising sessions. The components of the basic structure were described using a list of data groups matching the header and body as defined in the HL7 CDA R2 model.

A data group was defined as a composite data structure that aggregates relevant information elements. Occasionally, a data group could be considered as the aggregation of smaller data groups and data elements. Generally, the main data elements composing data groups were essential for semantic interoperability in a specific context. Data groups organized the relevant data elements, and the values of these groups were assigned through the data elements. Examples of data groups include symptom, medication, surgery, and document ID.

A data element was the basic unit of data. Data elements were specified as atomic units of data that have precise meaning or semantics. The list of data groups in this step did not define in detail the data elements included within the groups. This was done later, when the groups were compared with the set of medical record sheets. The listed groups were used to outline the content of the EMR.
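
The relationship between data groups and data elements described above can be illustrated with a minimal sketch. It assumes a simple in-memory representation; the class names `DataElement` and `DataGroup`, the `treatment` group, and the element names are hypothetical and are not taken from the standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataElement:
    """Atomic unit of data with a precise meaning."""
    name: str


@dataclass
class DataGroup:
    """Composite structure aggregating data elements and smaller data groups."""
    name: str
    elements: List[DataElement] = field(default_factory=list)
    subgroups: List["DataGroup"] = field(default_factory=list)

    def all_elements(self) -> List[DataElement]:
        """Collect the elements of this group and of every nested subgroup."""
        collected = list(self.elements)
        for sub in self.subgroups:
            collected.extend(sub.all_elements())
        return collected


# Hypothetical example: group and element names are illustrative only.
medication = DataGroup(
    name="medication",
    elements=[DataElement("drug name"), DataElement("dose")],
)
treatment = DataGroup(
    name="treatment",
    elements=[DataElement("treatment date")],
    subgroups=[medication],
)
element_names = [e.name for e in treatment.all_elements()]
```

In this sketch a group's value is obtained only through its elements, which mirrors the statement that the values of data groups were assigned through the data elements.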

The framework of diagnostics was helpful in creating the groups and in defining the relationships between them. The experts who participated in the discussions included senior physicians, surgeons, general practitioners, and specialists. It was particularly important to enlist senior engineers who were familiar with HIS design in these discussions. Two sessions were held to discuss the content structure. Expert opinions were used to improve the structure of the data groups and to identify their relationships. After this, the data groups were compared with the data elements included in the basic set of medical sheets.

Integrating the information collected from medical sheets used in hospitals

The collection of medical sheets from hospitals

Expert opinions and hospital medical sheets were collected simultaneously. The collection was conducted from January to March 2009. Seventeen hospitals in China were selected for the collection. The hospitals selected had the following characteristics: (1) more than 50% were large-scale hospitals (large: more than 1,000 beds, medium: more than 500 beds, small: fewer than 500 beds); (2) all selected hospitals were 3A level; (3) at least two Chinese medicine hospitals of medium scale were included; and (4) at least one specialized hospital was included.

Merging and creating a preliminary set of medical sheets from hospitals

All collected medical sheets were categorized according to the medical activities recorded in them. In categorizing, we created a list of medical activities and a list of the medical professional fields covering those activities. Joint consideration of these fields and the proposed content structure was helpful in creating the basic EMR content structure in step 2.4. For original sheets categorized into the same activity, we merged their items into a single sheet. Then, based on the original sheets collected, we grouped the data on each merged sheet. After cataloging and merging, a preliminary set of medical sheets was obtained.
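
The merging step can be sketched as follows. This is a simplified illustration only: it assumes each collected sheet is represented as an activity label with a list of item names, and it treats two items as redundant only when their names match exactly. The function `merge_sheets` and the sample items are hypothetical; recognizing synonymous items would in practice need manual review.

```python
from typing import Dict, Iterable, List, Tuple


def merge_sheets(sheets: Iterable[Tuple[str, List[str]]]) -> Dict[str, List[str]]:
    """Merge the items of all sheets that share a medical activity.

    Each input is (activity, items). Items are kept in first-seen order
    and exact duplicates within an activity are dropped.
    """
    merged: Dict[str, List[str]] = {}
    for activity, items in sheets:
        bucket = merged.setdefault(activity, [])
        for item in items:
            if item not in bucket:
                bucket.append(item)
    return merged


# Hypothetical sheets from two hospitals; activity and item names are illustrative.
collected = [
    ("admission record", ["patient name", "chief complaint"]),
    ("admission record", ["patient name", "admission date"]),
    ("nursing record", ["body temperature"]),
]
preliminary = merge_sheets(collected)
```

Here the two admission sheets collapse into one merged sheet with three items, which is the kind of reduction the cataloging and merging step produces.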

Checking and integrating data elements in the medical sheets

Data elements in the preliminary set were combined from sheets serving the same purpose in different hospitals, and were not comprehensive. The sheets collected from specific hospitals did not necessarily contain information for default elements, such as the names and codes of the hospitals, hospital legal counsel, and the codes, authors, and the names of the data entry operators.

We regarded each medical sheet as a complete clinical medical document. Based on the HL7 CDA R2 model, each document was composed of a header and body. Elements that were essential to form the header of a complete EHR according to the EHR Basic Architecture and Data Standard issued by the Chinese Ministry of Health in May 2009 were defined as default elements. The data types of the elements defined in the EHR Basic Architecture and Data Standard were HL7 V3 data types.

First, each preliminary sheet was reviewed, and default data elements were added if necessary without changing the names and data types of these standard elements. Personal information items were added according to the regulations of the Basic Dataset of Electronic Health Record issued by the Ministry of Health in 2007 [14]. At this point, with default data elements added and redundant data elements deleted, the preliminary set of medical sheets became the basic set.

Substitution of items in the basic set of medical sheets with standard data groups and data elements according to the content structure constructed using comparison and reconstruction

The data groups of the proposed EMR content structure formed in step 2.1 were compared with the basic medical sheets from step 2.2. The comparison had two aims. First, we intended to guarantee that the lists of data groups were exhaustive. Second, we intended to categorize the data elements of these sheets into groups to create the EMR basic data groups and EMR basic data elements.

After revising the basic medical sheets based on expert opinions, the data groups had the properties of the basic set of medical information. By assigning an ID code to each data group, the basic EMR data groups could be listed without including data elements. After careful cataloging, the integrated data elements of the basic medical sheets created in step 2.2.3 became the source of the EMR basic data elements.

The value ranges of data elements were set according to current national coding standards. For example, all national institutions in China were assigned a standard name and code in GB/T 17538-1998, and the education level of patients was coded according to GB/T 4658-1984. After the data elements were coded, the groups became real data groups that contained data elements.

Building basic EMR content templates and structures

Building basic EMR content templates

The basic medical sheets represented different medical activities, and records of similar activities were attributed to the same data group. After the elements of the data groups were revised based on expert opinion, we used the data groups to rebuild the basic medical sheets and created EMR templates. The templates were composed of the basic data groups that contained data elements. The data groups included in each template were decided based on the medical activities recorded. The cardinality values in the templates were defined in the process of rebuilding; that is, the cardinality value defined how many times a data group could be repeated in a template. The templates were called EMR templates, and they could be used to specify what data must be included in an EMRS. Together, all templates formed the basic EMR templates.

Building basic EMR content structure

The basic templates, the basic data elements, and the basic data groups were organized together after thorough consideration of the fields of medical profession and the EMR data groups. The relationships between them were proposed.

Results

EMR content structure

The basic content structure of the EMR was based on diagnostics theory and the levels of the HL7 CDA R2 model. The model of the structure is shown in Fig. 2. The clinical document rectangle represents an EMR, which is the core of the figure. The content of the EMR was composed of data from 25 groups, and some groups might include several levels of subgroups.

Fig. 2 Information levels of the EMR structure

Most of the groups in the body came from medical knowledge, such as the main complaints of the patient, their present health problems, history of illness, family genetics, personal life and marital history, and surgical history. The groups were at different levels in the EMR structure. For instance, personal history is part of ‘past history’, and the groups could be extended with subgroups when necessary.

According to the HL7 CDA R2 model, a clinical document consists of two main parts: a header and a body. The body of the EMR was composed of diagnostic groups, while the header of the EMR was composed of additional groups, such as demographic data and kin information.

To form the basic data groups, the levels had to be deep enough to reach the leaf level of the structure, that is, subgroups composed only of data elements. The specific levels were confirmed after categorizing the medical sheets into these groups.

An EMR content structure with 25 data groups and 45 subgroups was proposed after the discussion sessions with medical experts. Parts of the groups are shown in Fig. 2.

Creating a basic set of medical sheets and basic data elements based on the collection of medical profession sheets of hospitals

All sheets used in hospitals were regarded as medical record documents that contain medical profession information. We collected and merged the sheets from selected hospitals. After merging, the list of medical fields and profession activities was generated and a basic set of hospital medical sheets was created. The basic data elements were acquired after merging different elements and eliminating redundant elements in the basic set of medical sheets.

Collecting medical record sheets

Hospitals in China were classified into three grades and four levels using the Hospital Grade Management Script issued by the Ministry of Health in 1989. Grade 3A is the top level of the highest grade. The evaluation criteria were based on hospital practice variety, mission, facilities, service quality, and management.

Adoption of a hospital information system was not listed in the criteria because the research was focused on medical profession content that was independent of any specific system, although most selected hospitals had HISs of different scales and scopes.

The collection of sheets in hospitals was conducted thoroughly, meaning that all sheets used in diagnosis and treatment were captured and collected. Sheets on patient insurance and identity information were also collected. For hospitals with computerized sheets, the sheets were printed and collected.

Seventeen 3A hospitals were selected for the collection of medical profession sheets. The selected hospitals were of different scales, and three Chinese medicine hospitals and one specialized hospital were included. Table 1 shows the characteristics of the hospitals selected.

Table 1 The accommodation capacity and hospital types of selected hospitals

The three Chinese medicine hospitals were also categorized as hospitals with fewer than 1,000 beds. Of the 1,747 sheets collected, 260 were from the three Chinese medicine hospitals and 110 were from the specialized hospital.

Merging and creating a preliminary set of medical record sheets in hospitals

One hundred and forty-five extracted sheets were obtained after the original 1,747 sheets were merged and redundant elements were eliminated. These 145 sheets were taken as the basic set of medical sheets. The dramatic reduction in the number of medical sheets was due to hospitals across China following largely similar procedures. Hospitals in China adopted the Basic Regulation of Patient Record issued by the Ministry of Health as the rules for writing medical records. The regulation stipulated the scope and format of medical records. The scope was divided into many medical activities, each with a list of items. After merging and sorting the sheets, we categorized them by activity, listed the medical fields, and added the healthcare organization to integrate the information.

We generated 7 medical fields that contained 18 classes of first-level medical activity records and 62 second-level medical activity records that were the subgroups of the 18 first-level medical activity records. Table 2 shows the seven medical fields and the categorization of the 18 first level medical activity records.

Table 2 Medical activity records in different medical fields

A medical sheet represented a medical activity, and the categorization of the medical activity records depended on medical fields. Medical activity records were categorized to represent all medical sheets and were coded as EMRxxxx. By adding two digits at the end of the code, we could specify medical activity at the second level. Similar activity records that occurred on different occasions, such as outpatient or inpatient treatment, were categorized into one record. For example, nursing records for outpatient and inpatient treatment were assigned the same record code, EMR0601.

All 145 sheets were categorized into the above 62 categories of second-level medical activity records according to the similarity of the data elements included. According to the first-level category to which they belonged, their corresponding data groups would form the EMR basic template after comparison with the proposed EMR content structure.

Checking on the integrated data elements in the medical record sheets

The default elements that belonged to the header groups were obtained by referring to the Basic Dataset of Electronic Health Record. The Basic Dataset of Electronic Health Record was created to support the EHR information requirement. It was a component of the EHR Basic Architecture and Data Standard. Its data elements are consistent with the HL7 Data Type model. In addition, specific elements used exclusively in China were included in the basic dataset. Some of the elements had coding systems in China. For example, the coding of patient education was taken from the standard GB/T 4658-1984, the coding of patient occupation was taken from the standard GB/T 6565-1999, and years of working was categorized under Date type.

We used 46 data elements and 13 value domains to assign values to elements in the Basic Dataset of Personal Information. The 145 sheets were reviewed and their elements were compared with the dataset. The elements of the basic dataset were added into the sheets after a careful selection that filtered out default elements. The selection process was necessary because converting the sheets into a single complete document only required some elements in the basic dataset.

Substitute items in the basic set of medical sheets with standard data elements defined in the national EHR according to the proposed content structure

After comparison, some data elements were independent of the data groups. For example, ABO and Rh blood types are two inherent biological components that will not change in the life span of an individual. We classified blood type in group H.02.001, representing personal biological identifiers, as a subgroup of H.02, representing identifiers of target service. As a result, the data group H.02 was composed of data elements and sub-data groups.

Each data group had an ID code. The ID codes followed the pattern H.00.001, where H indicated that the group was a component of the header of a clinical document and the two-digit number after the first separator dot indicated the first-level sequence. If there was a subgroup, another three-digit serial number was added after the second separator dot. The levels of the sequence indicated the affiliation between data groups. For example, H.02.001 denoted the data group of personal biological identifiers, which was one of the components of the target service data groups (H.02). The initial letter could be H or S, where H represents “header” and S represents “section of the clinical document body”.
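
The ID code structure lends itself to a small parsing sketch. The regular expression below encodes only what is stated here (an initial H or S, a two-digit first-level number, and an optional three-digit subgroup number); the names `GroupId`, `parse_group_id`, and `parent_id` are hypothetical helpers introduced for illustration.

```python
import re
from typing import NamedTuple, Optional

# Letter H or S, a two-digit first-level number, optional three-digit subgroup.
_ID_PATTERN = re.compile(r"(H|S)\.(\d{2})(?:\.(\d{3}))?")


class GroupId(NamedTuple):
    part: str                 # "header" or "section"
    level1: str               # two-digit first-level sequence
    subgroup: Optional[str]   # three-digit serial number, if any


def parse_group_id(code: str) -> GroupId:
    """Split a data group ID such as 'H.02.001' into its components."""
    match = _ID_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid data group ID: {code!r}")
    letter, level1, subgroup = match.groups()
    part = "header" if letter == "H" else "section"
    return GroupId(part, level1, subgroup)


def parent_id(code: str) -> Optional[str]:
    """Return the ID of the enclosing first-level group, or None for a top-level group."""
    if parse_group_id(code).subgroup is None:
        return None
    return code.rsplit(".", 1)[0]


blood_type_group = parse_group_id("H.02.001")
blood_type_parent = parent_id("H.02.001")
```

Because the affiliation is carried by the code itself, the parent of a subgroup can be recovered without a separate lookup table.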

Comparison with the proposed EMR structure led us to add 5 subgroups. The structure of the group codes made it easy to add more subgroups by incrementing the final three-digit serial number.

Seventy-five data groups were created according to the data elements in the sheet sets. Table 3 shows that subgroups could be added easily during medical development.

Table 3 Seventy-five data groups

Data elements were defined in detail after being categorized into data groups. The cardinality of each element showed how many times it could be repeatedly used in one instance and the cardinality value was assigned according to the HL7 model. Meanwhile, we clarified their definitions and specified the data types and the formats of the data types. The values of the elements were set by coding where possible.

Finally, we obtained 451 data elements that served as the components of the data groups. Table 4 shows a part of the table of groups with the data elements, where group H.02 had two subgroups as shown in the table.

Table 4 List of the selected data groups and elements

Some elements defined in the EHR Basic Architecture and Data Standard were defined using codes. Their codes were defined in the Code Value table with ‘CV’ followed by a series of numbers. For example, the element “ID-type code” was defined in the table as CV0100.03, in which the type of the ID code was coded and listed (see Appendix I).

The structure of the data groups could be graphically represented as shown in Fig. 3. In Fig. 3, the data groups had two levels, and one data group could be included in another data group. These were called sub-data groups. Data elements made up the two levels of data groups and were constrained by value domains.

Fig. 3 The relationship of data group, data element and data value

Sub-data groups were composed of data elements alone, whereas data groups were composed of sub-data groups and data elements. The sub-data groups were used to assemble groups of relevant data that were used together.
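
The constraint of a data element by its value domain can be sketched as follows. This is a minimal illustration under stated assumptions: the classes `ValueDomain` and `CodedElement` are hypothetical, and the code values shown are placeholders rather than the codes defined in any national standard.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ValueDomain:
    """Set of codes that an element may take."""
    name: str
    allowed: FrozenSet[str]


@dataclass(frozen=True)
class CodedElement:
    """Data element whose values are restricted to one value domain."""
    name: str
    domain: ValueDomain

    def validate(self, value: str) -> bool:
        """Return True if the value belongs to the element's value domain."""
        return value in self.domain.allowed


# Placeholder codes, not taken from any national coding standard.
abo_domain = ValueDomain("ABO blood type", frozenset({"A", "B", "AB", "O"}))
abo_type = CodedElement("ABO blood type", abo_domain)

accepted = abo_type.validate("AB")
rejected = abo_type.validate("C")
```

Keeping the value domain separate from the element allows one domain to be shared by several elements, which fits the fact that fewer value domains than data elements were needed.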

Building the EMR basic content structure and the EMR basic content templates

Basic EMR content templates

Based on the categorization of medical activity records, seventeen templates were drafted to cover the medical sheets used in most hospitals. The relationship between the templates and medical activity records is shown in Table 5.

Table 5 EMR templates

The templates consisted of data groups that could be used to express a specific sheet by assigning values to data elements. Table 6 is a part of the table showing templates with groups. Ten data groups were used to compose template MT01, Personal Health Summary, in Table 6, and nineteen data groups were used to compose template MT12, Inpatient Medical Process Records. The templates represented different medical activities, and their elements should be included in specific EMRS.

Table 6 Partial templates and relevant data groups

The template model and its implementation

Each template was composed of data groups, and the cardinality of each group was defined in the template model (see Fig. 4). According to its cardinality value, data group H.01 could be repeated multiple times in one personal health summary; similarly, each data element could be repeated within its data group according to its own cardinality definition.

Fig. 4 The model of medical template MT01

When applying a template to represent a specific medical sheet, data groups could be reused in this instance and data elements could be reused in this data group as well.
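
How cardinality constrains the reuse of data groups in a template instance can be sketched as a simple check. The function `check_template` and the cardinality values below are assumptions made for illustration; the actual cardinalities of template MT01 are those given in its model.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Cardinality as (minimum, maximum); a maximum of None means unbounded.
Cardinality = Tuple[int, Optional[int]]


def check_template(template: Dict[str, Cardinality], instance: List[str]) -> List[str]:
    """Return the cardinality violations for the group IDs used in one sheet instance."""
    counts = Counter(instance)
    problems = []
    for group_id, (low, high) in template.items():
        n = counts.get(group_id, 0)
        if n < low:
            problems.append(f"{group_id}: expected at least {low}, found {n}")
        elif high is not None and n > high:
            problems.append(f"{group_id}: expected at most {high}, found {n}")
    for group_id in counts:
        if group_id not in template:
            problems.append(f"{group_id}: not part of the template")
    return problems


# Hypothetical cardinalities: H.01 repeatable, H.02 exactly once.
mt01_sketch = {"H.01": (1, None), "H.02": (1, 1)}
ok = check_template(mt01_sketch, ["H.01", "H.01", "H.02"])
bad = check_template(mt01_sketch, ["H.02", "H.02"])
```

The same check could be applied one level down, to the data elements inside each data group, since elements carry their own cardinality definitions.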

The basic EMR content structure

The basic data elements and data groups were the basic components of the EMR content structure. Figure 5 shows the relationship between the EMR, medical field, and data groups.

Fig. 5 The relationships among the EMR, medical field, and data groups

Discussion

The basic proposed content structure

Apart from the basic header-body structure defined in the CDA model, the content system of diagnostics provided a basic framework for organizing clinical information. We proposed that the content structure be regarded as a guideline for the collection of medical sheets in hospitals. The structure outlined the basic content of the EMR and directed the categorization of medical information. The discussion sessions with medical experts contributed to the reliability and rationality of the proposed structure. Theoretical bases other than diagnostics could also be adopted, but choosing diagnostics allowed many established descriptions of medical data to be borrowed to organize clinical documents systematically. In addition, diagnostics is easier to combine with the CDA model.

Obtaining information from the medical sheets used in hospital

Collecting medical sheets from hospitals

China has more than 18,000 hospitals above the county-level classification, among which are over 800 3A hospitals. The hospitals involved in this research were suitably representative of all hospitals in China. First, the geographical locations of these hospitals were distributed among nine provinces in northern and southern China, and four of these hospitals were very large-scale hospitals with over 2,000 beds. Second, China has a unitary medical system, and all hospitals operate with similar management concepts with little variety found among them. Third, the selected 17 hospitals are digital hospitals as certified by the Ministry of Health in December 2008, and the type of medical information that their medical sheets represented might be better presented in hospitals that use more information technology.

Merging and creating a preliminary set of medical sheets in hospitals

Our collection was one of several surveys in Chinese hospitals collecting medical profession sheets. We collected as many sheets as possible through the survey and found three characteristics of the medical profession in hospitals: (1) the medical profession bears great similarities in most Chinese hospitals, (2) Chinese medicine hospitals have adopted Western medicine to a great extent, and (3) most hospitals did not consider information exchange among different healthcare providers.

The set of medical sheets resulting from this research reflected information cataloging in hospitals in general. The set that we collected will also provide a useful reference for future EMR research and development.

Checking on the integration of data elements

Because most hospitals did not consider information exchange among different healthcare centers, almost all sheets lacked unified identification. The ID numbers of different sheets in one hospital might be generated by different systems, or in ways that differed from the methods used in another hospital. The process of supplementing elements in medical sheets was constrained by the national data element standard, the Basic Dataset of Personal Information, assuring the interoperability of information among local EMRS and regional EHR platforms [15].

The regional EHR platform was part of a national EHR architecture composed of three levels: national, provincial, and regional or city levels. The building of regional EHR platforms has begun in several regions with the goal of sharing personal health information regionally and (ultimately) nationally. The EMRs were data sources for the regional EHR platforms; therefore, consistency with EHR standards was a basic requirement for EMR standards.

Comparison of proposed EMR content structure with the basic medical sheets

The proposed EMR content structure guided the categorization of items on medical sheets. This process verified the capacity of the content structure, that is, the extent to which it could cover all items from the original data sheets from the hospitals. The matching process might lead to the addition of data groups or data elements. Eventually, this research sought to build a universal structure composed of data groups, to which data elements could be appended when necessary. When data groups became the basic information units of the EMR content structure, their values were assigned through their data elements.

The relationship between EMR templates and data groups

Data groups were used to rebuild the real medical record sheets and form the templates of the activity sheet. The advantage of the templates was their practicality. By assigning values to data elements and data groups, the abstract EMR structure became concrete and could be used as a guideline in building EMRS.

Future research will focus on building standards and rules for EMRS in hospitals. The standards will be applied to the EHR architecture at national, provincial, and regional levels. This will also facilitate cross-institutional sharing of medical information.

Conclusion

We proposed an EMR basic content structure with a basic set of medical record sheets, a basic set of data groups with data elements, and finally a basic set of templates. The proposed structure covers most information that hospitals collect regarding their patients, and the universality of structure and concise table format allow for its implementation in specific EMRS. The EMR basic content structure will help to promote the standardization of the EMR system and aid in building local EHR platforms in China.