Introduction

In 1968, Donald C. Fyler, through the Department of Cardiology at Boston Children’s Hospital, organized the New England Regional Infant Cardiac Program (NERICP), a clinical care program designed to enhance the delivery of care to infants born in the New England area with life-threatening congenital heart disease [1]. In addition to establishing a cooperative referral network, an important element of this program was the creation of a centralized regional registry of incident cases of congenital heart disease. This database provided epidemiologic and descriptive data concerning the frequency of various forms of heart disease [2], and original insights into factors associated with the presence of congenital heart disease such as birth order, population density, and season [3, 4]. Key to this effort was a system of disease categorization. A frequently quoted but still germane observation by William Farr in 1839, while working as an epidemiologist for the city of London, is that “nomenclature is of as much importance in this department of inquiry, as weights and measures in the physical sciences, and should be settled without delay” [5]. Dr. Fyler fully understood the need for a systematized nomenclature of congenital heart disease for the NERICP but could not identify an adequate existing resource. He therefore initiated the construction of a coding system to fill this void, a nomenclature that in recognition of his contribution is generally referred to as the Fyler Coding System (FCS). The FCS has succeeded from both the clinical and research perspectives. It is still the primary system for disease and intervention classification at the Boston Children’s Hospital Department of Cardiology and is also used at many other pediatric congenital heart programs both nationally and internationally. However, with regard to the purposes of this book, the shortcomings of the FCS are of more importance than the successes. Dr. Fyler worked to improve the FCS for more than 30 years during the remainder of his career, but remained invariably dissatisfied with his efforts. The reasons for this dissatisfaction can be readily described and are by no means unique in the field of medical nomenclatures. In fact, many new nomenclature efforts continue to make the same mistakes in the design of their systems. This chapter recounts in brief the Fyler experience and the lessons that can be learned from it.

Medical Record Keeping in the Nascent Electronic Era

At the time of initiation of the NERICP, affordable minicomputers were just making their way into the commercial market, and from the beginning the NERICP data storage was computer based. The FCS was therefore created to optimize the efficiency of digital storage and retrieval, particularly during the early experience when digital computing was billed based on microseconds of CPU use and digital storage was both slow and expensive. Although the NERICP was initiated during the punch card era and initially relied on this technology, the transition to terminal-based data entry enabled wider access to computer resources. Dr. Fyler responded to this technical advance by quickly introducing this nomenclature into the clinical care arena at Boston Children’s Hospital, constituting one of the earliest forays into the realm of electronic medical records. Surgical and catheterization data that preceded the availability of this database were retrospectively entered (back to 1950!) and the diagnoses, findings, and procedures were coded using the Fyler Codes. Prospective data collection was expanded to include outpatient clinic visits and electrocardiograms. Eventually other diagnostic and therapeutic modalities, including echocardiography, electrophysiology, cardiac magnetic resonance, and cardiac computed tomography, were similarly captured and coded using the same systematized nomenclature. During the subsequent 45-year period of continuous use and modification of the Fyler Codes in response to clinical use and feedback, much has been learned. To date, over 3.3 million Fyler codes have been entered into clinical echocardiographic reports issued by the Department of Cardiology at Boston Children’s Hospital alone.

Numeric Versus Text Codes

The Fyler Codes were originally based on a list of numeric codes, each of which represented a specific textual description. More recently, this system has been modified to an alphanumeric system, but conceptually the approach is the same insofar as the code represents a specific concept that may be a diagnosis, finding, risk factor, intervention, etc. In retrospect, this decision proved to be an important success of the FCS. The alternative approach, as has been implemented in some coding systems, is to rely on the textual description alone. There are some important advantages to the use of an abstract alphanumeric representation. The FCS stores only the numeric code in the database, resulting in very compact data storage. This was a significant issue in the original design of the FCS, where the average reduction in data storage requirements was more than 10-fold. This advantage is considerably less important in the current era of abundant storage capacity and high capacity networks. The primary disadvantage of coding systems based on numeric codes is the need for translation between text and numbers during data storage and retrieval. Again, this was primarily an issue when printed manuals were the primary resource for this translation, whereas now digital resources can make this process completely transparent, effectively insulating the user from the abstract numeric codes. In contrast, the disadvantages of text-based codes cannot be as easily overcome. Nomenclature evolves over time, and changing the text of a specific code disrupts the validity of the previously stored data, an issue that does not arise as long as the data are stored using an abstract representation. 
Perhaps the most important advantage to the use of an abstract alphanumeric representation is that it enables synonyms and multilingual translations, such that the same alphanumeric code can be presented to the user as “atrioventricular canal”, “atrioventricular septal defect”, “canale atrioventricolare completo” or other locally specified designations or languages, thereby greatly facilitating collaborative work. The use of abstract concept representations is widely viewed as essential within the medical informatics community and must be viewed as one of the FCS successes.
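The separation between a stored code and its displayed text can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the identifier "C0001", the table layout, and the function names are hypothetical and are not taken from the FCS.

```python
# Hypothetical concept identifier and label table (not actual Fyler codes).
# The first entry in each language list is treated as the preferred term.
LABELS = {
    "C0001": {
        "en": ["atrioventricular canal", "atrioventricular septal defect"],
        "it": ["canale atrioventricolare completo"],
    },
}

def display(code, lang="en"):
    """Return the preferred label shown to the user for a stored code."""
    return LABELS[code][lang][0]

def lookup(term):
    """Resolve any synonym, in any language, to its abstract code."""
    needle = term.strip().lower()
    for code, labels_by_lang in LABELS.items():
        for terms in labels_by_lang.values():
            if needle in (t.lower() for t in terms):
                return code
    return None

# The stored record holds only the code, so labels can be renamed,
# extended, or translated without touching previously stored data.
record = {"patient_id": 1, "diagnosis": lookup("Atrioventricular septal defect")}
print(record["diagnosis"], "->", display(record["diagnosis"], "it"))
```

Because the record stores only the identifier, changing or adding a label is an edit to the label table alone.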

Intrinsic Hierarchy Based on Numeric Order

The FCS was predicated on a hierarchy based on severity of disease. Major group categories (for example single ventricle, transposition of the great arteries (TGA), double outlet right ventricle, and tetralogy of Fallot) were assigned a numeric value with an ordered sequence such that those with the greatest clinical and anatomic abnormality were assigned the lowest numbers. The sequence was based primarily on probability of early survival, such that hypoplastic left heart syndrome was one of the lowest number groups whereas atrial septal defect was much higher in the sequence. In patients with multiple lesions, the lowest number code was typically the “primary”, or most important diagnosis. Conceptually, this approach enabled very simple assignment of risk hierarchy based on rank order of numeric code in addition to enabling very simple statistical description of population disease severity. This approach proved moderately successful for comparison of complex versus simpler forms of congenital heart disease (such as hypoplastic left heart versus coarctation of the aorta) but fails for comparison within these groups (such as atrial septal defect versus pulmonary stenosis), where the modifying influence of severity of pulmonary stenosis or size of the atrial septal defect is a more important determinant of outcome than the category of that anatomic abnormality. For patients with multiple diagnoses, the numeric sequence of the codes therefore often fails to designate which of the several codes represents the patient’s “primary” or most important diagnosis. This shortcoming led Dr. Fyler to ultimately consider the FCS to have been a failure, and is one of the reasons he never chose to publish this system. 
This experience provided the important lesson that the numeric (or alphanumeric) representation should be abstract and code properties such as severity or relative importance of disease should be maintained as a separate property either of the code or a property determined by the code in conjunction with one or more modifiers.
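A minimal Python sketch of this lesson follows. It assumes opaque identifiers, an invented severity scale, and invented modifier adjustments; none of these values come from the FCS and they carry no clinical meaning. Severity is held as a property of the code and adjusted by a modifier, so the primary diagnosis is never inferred from the code value.

```python
from dataclasses import dataclass

# Hypothetical opaque codes; the identifier itself carries no ordering.
LABEL = {
    "X17": "atrial septal defect",
    "Q42": "pulmonary stenosis",
    "M03": "hypoplastic left heart syndrome",
}
# Assumed baseline severity per code (higher means more severe).
BASE_SEVERITY = {"X17": 2, "Q42": 2, "M03": 9}
# Assumed adjustment contributed by a modifier.
MODIFIER_ADJUST = {"small": -1, "mild": -1, "moderate": 0, "large": 2, "severe": 3}

@dataclass(frozen=True)
class Finding:
    code: str
    modifier: str = "moderate"

    def severity(self):
        """Severity is derived from properties, not from the code value."""
        return BASE_SEVERITY[self.code] + MODIFIER_ADJUST[self.modifier]

def primary(findings):
    """Pick the most severe finding as the primary diagnosis."""
    return max(findings, key=Finding.severity)

patient = [Finding("X17", "small"), Finding("Q42", "severe")]
print(LABEL[primary(patient).code])
```

With the same two codes but different modifiers (a large atrial septal defect and mild pulmonary stenosis), the sketch selects the other finding as primary, which a fixed numeric rank order cannot do.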

Code Organization Based on Alphanumeric Representation

Most nomenclatures require an organizational structure for their efficient use. For example, in the FCS diagnoses were grouped according to the nature of the anatomic abnormality, such that all forms of d-loop TGA were positioned numerically between 0700 and 0799 and entities with l-loop TGA were numbered as 0800–0899. From a database and software design perspective this approach provided very efficient search routines, and by grouping related terms together in printed manuals it also facilitated manual code searches. This is the approach taken in the International Classification of Diseases (ICD) coding system, now in its 10th revision (ICD-10). The ICD-10 has 16 sections in the procedure coding system with all members of a particular section having an alphanumeric code that begins with the same character (http://www.cms.gov/Medicare/Coding/ICD10/Downloads/pcs_final_report2013.pdf). If the groupings are truly mutually exclusive, as is the case with the biological classification into domains, kingdoms, phyla, classes, orders, families, genera, and species, such an approach can be useful and can assist in code identification, hierarchical designation and relationships between categories. However, congenital and acquired heart disease nomenclatures are based on combinations of anatomic and physiologic relationships that do not stratify into mutually exclusive categories. For congenital heart disease, individual diseases often consist of “complexes” comprising multiple anatomic abnormalities, leading to ambiguity as to where in the hierarchy the code should be positioned. For example, a heart with the combination of l-loop TGA, tricuspid atresia, and ventricular septal defect could justifiably be positioned amongst the disorders of the ventriculoarterial junction, ventricles, or atrioventricular junction. 
Electronic representations can easily accommodate this ambiguity by displaying the same code on multiple branches of the tree, enabling users to find the correct code regardless of which of these categories they prefer to search. However, support for multiple hierarchies requires maintenance by some other method than the alphanumeric code, which must be unique and therefore cannot be used to designate multiple locations in the tree. This is another instance where the design of the FCS failed by attempting to attach meaning to the alphanumeric code beyond simply designating the text it represents. The FCS shares this failure with the ICD and other systems. In general, the network of relationships between concepts and terms must be encoded separately from the encoding of the concepts and terms themselves.
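One way to encode relationships separately from the codes is to store the tree as explicit parent links. The Python sketch below assumes hypothetical identifiers ("K900" for the complex described above, and three group nodes); it is an illustration and does not describe how the FCS or the ICD is implemented.

```python
# Hypothetical identifiers. Tree placement is stored as explicit edges,
# so nothing about a code's position is inferred from its characters.
PARENTS = {
    "K900": {"G_VA_JUNCTION", "G_VENTRICLES", "G_AV_JUNCTION"},
    "G_VA_JUNCTION": {"ROOT"},
    "G_VENTRICLES": {"ROOT"},
    "G_AV_JUNCTION": {"ROOT"},
}

def children(node):
    """List every code filed directly under a node of the tree."""
    return sorted(code for code, parents in PARENTS.items() if node in parents)

def ancestors(node):
    """Collect all nodes above a code, across every branch it sits on."""
    seen = set()
    stack = [node]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# The same unique code is reachable from three different branches.
for group in ("G_VA_JUNCTION", "G_VENTRICLES", "G_AV_JUNCTION"):
    print(group, children(group))
```

Adding a code to another branch is then a change to the edge table only; the identifier and all stored data remain untouched.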

Atomic Versus Molecular Design

As is the case with many disease entities, patients with congenital heart disease manifest one or more specific anatomic or physiologic abnormalities in a variety of combinations. For example, a ventricular septal defect can be observed as an isolated finding but is often found in association with other structural anomalies such as aortic coarctation, and in the case of more complex abnormalities such as double outlet right ventricle the ventricular septal defect is generally integral to the disease. Coding systems have generally taken two different approaches to classifying these hearts. The “atomic” approach to coding identifies each of these lesions as an independent finding and the codes for each component are captured independently. The “molecular” approach involves assigning individual codes to both the individual components and to each of the valid combinations, which constitute the “molecular” codes, also referred to as composite codes. For example, the molecular approach permits the combination of ventricular septal defect and coarctation to be captured as a single code whereas the atomic approach would require each to be entered separately. The Fyler system and the European Pediatric Cardiology Codes (EPCC) [6] have taken a more atomic approach, whereas the Society of Thoracic Surgeons (STS) [7] coding system has taken a molecular approach. The Fyler system is not purely “atomic”. For example, the FCS has “atomic” codes for d-loop TGA, ventricular septal defect, and intact ventricular septum, but also has molecular codes for d-loop TGA with ventricular septal defect and for d-loop TGA with intact ventricular septum. In contrast, the STS system is highly skewed towards the molecular approach by including combination codes such as “Hypoplastic left heart syndrome (HLHS), Without intrinsic valvar stenosis, Hypoplastic aortic valve + mitral valve + left ventricle (Hypoplastic left heart complex = HLHC), VSD, Nonrestrictive VSD”. 
Is there in fact a reason to prefer one approach over the other? There are advantages and disadvantages to each approach and these differences are worth discussing.

Insofar as the structure of the nomenclature should be optimized for the desired functionality, functional specifications can be generally grouped into three categories: (1) maintenance of the coding system, (2) code capture (code finding in conjunction with data entry), and (3) code retrieval (code finding in conjunction with database searches). The first issue (code maintenance) is the simplest to discuss. The atomic approach results in a relatively short list of codes whereas the molecular approach, in which all of the atoms are designated in addition to all valid combinations of the individual atoms, results in a set of codes that may be orders of magnitude larger. In addition, insofar as the molecular system attempts to represent the universe of possible combinations, maintaining a complete “molecular” system is far more subject to error. In general, the cost and complexity of code system maintenance increase in proportion to the number of terms in the system, and therefore an atomic system is far more economical to maintain.
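The difference in scale can be illustrated with a simple count. The Python sketch below assumes, purely for illustration, that every combination of up to a given number of atoms is valid; real systems admit only a subset, so the result is an upper bound and not a description of any actual nomenclature.

```python
from math import comb

def molecular_upper_bound(n_atoms, max_size):
    """Count atoms plus all combinations of 2..max_size atoms.

    Assumes every combination is valid, which overstates real systems.
    """
    return sum(comb(n_atoms, k) for k in range(1, max_size + 1))

# With 100 atoms, an atomic list has 100 entries. Adding every pair,
# and then every triple, grows the list by orders of magnitude.
for size in (1, 2, 3):
    print(size, molecular_upper_bound(100, size))
```

Under this assumption, 100 atoms yield 5,050 codes once pairs are included and 166,750 once triples are included.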

Code finding in conjunction with data entry involves searching a repository of codes and selecting those appropriate to a particular patient or patient-related event. The shorter list of codes in an atomic system makes it easier to locate individual codes at the time of code entry, but complex cases with multiple anatomic abnormalities require repeated searches, one for each component. The primary advantage of the molecular code schema is that a much smaller set of codes needs to be selected, but it is often necessary to search through a very long list of codes to find the best match. Even with a feature-rich search facility this can be very time consuming, and the net tradeoff between these costs and benefits is unclear. There is a theoretical advantage to the molecular approach because the universe of valid code combinations is pre-specified. Because the user of an atomic system has the freedom to choose any combination of terms, invalid combinations (such as mitral regurgitation plus mitral atresia) can be selected for the same patient. However, because the molecular system must by necessity include all the individual atoms, such an error can be made in this system as well.
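In either design, invalid combinations can be caught at entry time if exclusion rules are stored alongside the codes. The Python sketch below is a hypothetical illustration: the rule table and function name are assumptions, and readable labels stand in for what would normally be abstract codes.

```python
from itertools import combinations

# Hypothetical rule table. Each entry is an unordered pair of findings
# that cannot coexist in one patient (example pair taken from the text).
MUTUALLY_EXCLUSIVE = {
    frozenset({"mitral regurgitation", "mitral atresia"}),
}

def conflicts(selected):
    """Return every excluded pair present in a set of selected findings."""
    unique = sorted(set(selected))
    return [
        (a, b)
        for a, b in combinations(unique, 2)
        if frozenset({a, b}) in MUTUALLY_EXCLUSIVE
    ]

entry = ["mitral regurgitation", "d-loop TGA", "mitral atresia"]
print(conflicts(entry))
```

A data entry screen could refuse to save, or simply warn, whenever the returned list is not empty.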

Code finding in conjunction with database searches has a different trade-off. Since the same atomic entry will appear in many molecular terms, searches for a specific atom (such as ventricular septal defect) are more complex in a molecular system because of the need to specify a large number of molecular entries to capture all instances of that atom. Searches for composite diagnoses are also more complex in a molecular system, since the diagnosis may have been coded using the composite code (“TGA with ventricular septal defect”) or by selecting the individual components separately (“TGA” and “ventricular septal defect” as atomic codes). Again, the consensus in the health informatics community is that concepts should be represented as their constituent components, as is done in an atomic system.
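The retrieval problem can be made concrete with a small Python sketch. The codes "M1" and "M2", the patient records, and the function names are hypothetical. The sketch assumes that each composite code has a recorded decomposition into atoms, which is exactly the extra information a molecular system must maintain for searches to be complete.

```python
# Hypothetical composite codes and their atomic decomposition.
COMPOSITES = {
    "M1": {"TGA", "VSD"},  # "TGA with ventricular septal defect"
    "M2": {"VSD", "COA"},  # "ventricular septal defect with coarctation"
}

# Hypothetical records: the same anatomy coded in different ways.
PATIENTS = {
    1: ["M1"],          # composite code
    2: ["TGA", "VSD"],  # components entered separately
    3: ["COA"],
    4: ["M2"],
}

def atoms_of(codes):
    """Expand any composite codes into their constituent atoms."""
    atoms = set()
    for code in codes:
        atoms |= COMPOSITES.get(code, {code})
    return atoms

def find_all(required):
    """Find patients whose expanded codes include every required atom."""
    wanted = set(required)
    return sorted(pid for pid, codes in PATIENTS.items() if wanted <= atoms_of(codes))

naive = sorted(pid for pid, codes in PATIENTS.items() if "VSD" in codes)
print(naive)                      # misses patients coded with composites
print(find_all({"VSD"}))          # complete after expansion
print(find_all({"TGA", "VSD"}))   # finds both coding styles
```

In a purely atomic system the expansion step disappears, because every record already lists its atoms.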

Provisions for Expansion and Modification of the Coding System

The pace of change in medicine is fairly remarkable and as new knowledge is gained the need for new terminology and codes is relentless. Many of the standard coding systems, such as the ICD system, operate on revisions that are issued in cycles measured over years, which is completely inadequate from a clinical perspective. New procedures, for example, need to be captured in near-real time because back-coding is a notorious source of data loss. From the beginning, the FCS was, and remains, a dynamic system with expansion of the codes to accommodate missing elements, newly recognized diseases, and most importantly the continuous evolution of new therapeutic options. When the first arterial switch operation was performed at Boston Children’s Hospital, the appropriate code was added the next day. This responsiveness considerably enhances the value of the coding system, but also creates a higher risk of incorporating terminology that is rapidly outdated or duplicates existing codes (for example separate codes may be created for “LEOPARD syndrome” and “Noonan syndrome with multiple lentigines” because of a failure to recognize these as synonyms). Management of the coding system therefore requires a system of governance, generally by individuals with considerable interest in maintaining the integrity of the system, who have sufficient knowledge and are willing to put in the effort to confirm accuracy and need for new entries.

Coding Data Capture Workflow

Early on in the FCS experience it became apparent that disease classification was most accurately performed by the physician at the time of care delivery rather than by later translation of free text reports based on chart review, a function that is often performed by personnel not directly involved in care delivery. The cardiology information system at Boston Children’s Hospital was therefore constructed with this goal in mind, implementing capture of these data as a function of the clinical workflow. It was relatively straightforward to introduce this model in the diagnostic testing environment, as the physicians performing and interpreting the echocardiograms, cardiac catheterizations, cardiac magnetic resonance imaging, exercise stress testing, and electrocardiograms entered these codes as part of the electronic documentation process. The coding process was used to facilitate the reporting process because the Fyler code translations were included in the final reports, thereby supporting clinical care delivery as well as billing requirements. Positioning this process as a critical step in care delivery ensures a higher degree of accuracy than relegating code capture to an administrative role where mistakes and inaccuracies have less opportunity for critical review and error recognition by the entire care delivery team. Capturing these data in other environments such as the inpatient service and intensive care unit has been more challenging, and developing better systems for information capture in these settings remains an important goal.

Lessons Learned

A significant effort has been devoted to the theoretical specification of the features that a terminology system needs in order to serve its multiple intended purposes. A position paper from the American National Standards Institute (ANSI) regarding informatics standards for health terminology [8] documented that many of the same problems we experienced with the FCS are present in most of the other health care terminology systems (such as the ICD and CPT systems). These other systems also tend to be constructed as a list of terms with numeric designations that are intended to have meaning beyond an arbitrary alphanumeric code, such as group membership or hierarchical relationships, and are rarely atomic. The ANSI recommendations align with the changes needed to overcome the shortcomings we have experienced with the original design of the Fyler codes, such as the need for context-free identifiers and support for synonyms, properties (also known as attributes), and multiple hierarchies. The ANSI report also documented the need for mapping between systems, language independence, and other features that are time consuming and expensive to implement but are ultimately vital to the utility of these classification systems for robust data analysis.

The ANSI working group identified explicit definitions as one of the primary requirements for a health system terminology, noting that the definitions are inconsistent in medical dictionaries and are subject to even greater variance between practitioners. This is unquestionably one of the most glaring shortcomings of the various lexicons that are currently in use for congenital heart disease, including the Fyler system. The controversies concerning the proper use and understanding of the congenital heart disease lexicon have been well documented for many years. The work in progress to correct this situation by the International Working Group for Defining the Diagnostic and Procedural Terms for Pediatric and Congenital Heart Disease, a subcommittee of the International Society for Nomenclature of Pediatric and Congenital Heart Disease (ISNPCHD), was presented in February 2013 at The Sixth World Congress of Pediatric Cardiology and Cardiac Surgery in Cape Town, South Africa. The goal of this group is to create concise definitions for the diagnostic and procedural terms in the International Pediatric and Congenital Cardiac Code (IPCCC), and the committee has been working towards this goal since its formation in 2007. The FCS will benefit from this work through a process of cross-mapping and transparent incorporation of these definitions.