Background

The National Lung Screening Trial (NLST) aims to compare the effectiveness of two screening tests, low-dose spiral CT scan and chest x-ray (CXR), on net lung cancer-specific mortality in persons who are at high risk for developing lung cancer. The trial is sponsored by the National Cancer Institute (NCI) and conducted under a harmonized protocol within two separate administrative organizations: the Lung Screening Study (LSS) and the American College of Radiology Imaging Network (ACRIN). Accrual through 10 LSS screening centers (SCs) is complete with 34,614 participants enrolled from September 2002 to April 2004. One SC has a satellite medical center that functions operationally independently of its parent, and many SCs enroll participants through multiple medical centers. The LSS SCs operate within the screening centers of the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial Network.1 NCI has contracted Westat (Rockville, MD, USA), an independent research corporation, to provide coordinating and statistical services for the LSS network.

Participants are randomized to CT and CXR groups. Screens (studies) are obtained at baseline, then annually for 2 years, for three studies per participant (screening years T0, T1, and T2); the final screen will be performed in mid 2006. Each SC provides diagnostic interpretation of local imaging studies. Westat maintains diagnostic summaries as well as other clinical and demographic data. The management tool and database used to collect and maintain these data is called the Interactive Data Entry and Administrative System (IDEAS). Westat provisioned each SC with an IDEAS workstation for local keying, data verification, and error checking of demographic and clinical information. IDEAS allows local storage of participants' privacy data to facilitate correspondence with participants and their physicians, and to request medical records. It also permits the encrypted transmission of demographic and clinical data to the central repository at Westat. Reports and files may likewise be downloaded from Westat to SC IDEAS workstations. Radiologic images are neither transmitted to nor stored in IDEAS.

The Mallinckrodt Institute of Radiology (MIR), Washington University School of Medicine, manages the imaging Quality Assurance Coordinating Center for the LSS network of the NLST2. Following MIR's early experience in the management of quality assurance image studies from multiple sites, NCI contracted with MIR to assemble and administer the LSS-NLST CT Image Library (CTIL) to consist of digital copies of all LSS CT studies. This central image repository will serve as a large collection of serial CT screens from a well-defined population for use by imaging researchers. In this work, we describe the methods for the delivery of studies to the library, the hardware and software involved, and the check-in management of arriving studies. Confidentiality, security, quality assurance, and accessibility issues are also addressed.

All participants enrolling in the NLST signed an informed consent developed and approved by the SCs' institutional review boards (IRBs), the NCI IRB, and the Westat IRB before randomization. At the initiation of the CTIL, the SCs worked with MIR to establish data use agreements for the image sets to be transferred to the CTIL; these agreements are now in place for every SC. After April 2003, newly recruited and returning participants signed the necessary Health Insurance Portability and Accountability Act (HIPAA) agreements as directed by their institutions and approved by NCI to allow inclusion of deidentified CT examinations in the CTIL. As soon as the CTIL is operational and image sets are prepared for “check-out,” NCI will ask investigators to sign a separate materials and data use agreement for each research initiative.

The LSS utilizes a Manual of Operations and Procedures (MOOP) that details the development, implementation, and evaluation of the LSS-NLST protocol. To ensure SC compliance with this protocol, including informed consent issues, Westat monitors SC activities on an ongoing basis and reports regularly to NCI and the SCs. NCI and Westat also conduct annual site visits at each SC to audit and monitor screening activities and to evaluate the SC's adherence to LSS quality assurance and quality control procedures.

Methods

Study Collection

Table 1 lists the SCs and the number and percentage of CT participants at each. SCs adhere to a strict LSS-NLST CT acquisition protocol, though CT scanners vary by vendor (GE, Philips, Siemens, Toshiba) and model across the SCs. An LSS medical physicist coordinates scanner-QA testing with medical physicists associated with each SC, and three LSS radiologists monitor image-acquisition quality on a monthly basis.

Table 1 LSS Screening Centers and Satellite (Parent) with Number and Percentage of CT Participants and Scanner Vendor Codes

Screening centers vary in their storage of NLST CT studies. Some use their medical centers' picture archive and communications system (PACS), whereas others use a research PACS or even an NLST-specific archive. Figure 1 illustrates the generalized harvesting of CT studies from varied archives in preparation for delivery to the CTIL. Although the SCs use CT scanners from various vendors, all utilize a standard DICOM (digital imaging and communications in medicine) format. For each study, an SC has a three-step task: (1) collect the study from the local archive, (2) deidentify the study to remove protected health information (PHI), and (3) deliver the study to the CTIL. To help the SCs with these tasks, MIR provisioned each SC with custom software and a laptop computer (Dell Inspiron 1150; Dell, Round Rock, TX, USA) with a DVD writer and a 37-GB hard drive. In addition, each SC received a 250-GB universal serial bus (USB) external hard drive (XHD). The laptop is used to obtain the CT studies from the local PACS and prepare the studies for delivery to the CTIL. The custom software provides a simple user interface to facilitate PHI removal. Those SCs with separate research PACS or NLST archives may have already partially deidentified their studies.

Fig 1. Screening center (SC) preparation of CT studies for delivery to the CT image library (CTIL). All studies are sent to an SC laptop. The laptop's DICOM utilities software allows studies to be (a) “pushed” from the PACS (or NLST archive), (b) “retrieved” from the PACS, or (c) loaded from an external hard drive (XHD). NLST-affiliated regional medical centers may transmit their studies to the SC's PACS or provide studies on CDs or DVDs. Once they are on the laptop, studies are deidentified of participant information except for an NLST-assigned, SC-specific participant identifier (PID). Deidentified studies are then shipped (via DVD or XHD) or Internet-transmitted through a Virtual Private Network to the CTIL.

Any SC may choose whatever collection mechanisms best suit local workflow efficiencies. Multiple mechanisms may be used at any time, and the mix may change over time. For example, one SC may push studies from its PACS to the laptop during low network-volume evening hours, whereas another may write studies to an XHD to avoid a congested network. An SC may submit studies to the CTIL in any order that suits its workflow. For example, an SC may submit all its T0 screens before any others. Other SCs may submit all three screens (T0, T1, and T2) for participants who have completed them ahead of those who have not. Other SCs may choose to submit more recent studies that are readily available on “near-term” disk storage before those on archival tapes.

A user invokes a laptop application “Clinical Studies Workstation” (CSW) user interface3 to view studies that are currently on the laptop and to select studies for deidentification and delivery to the CTIL. When launched, the CSW application sweeps through the laptop-resident studies and builds a worklist from information in the DICOM headers of these studies. Worklist columns show participant name, local ID and accession number, study date, number of series, and total number of images. Studies are selected from the worklist, one at a time, for export to the CTIL; however, any selected study must first be certified for export.
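The worklist construction described above can be illustrated with a minimal sketch. It is not the CSW implementation: it assumes that each image's DICOM header has already been read into a plain dictionary keyed by standard DICOM attribute keywords, and the names `WorklistRow` and `build_worklist` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorklistRow:
    participant_name: str
    local_id: str
    accession_number: str
    study_date: str
    n_series: int
    n_images: int


def build_worklist(image_headers):
    """Collapse per-image header dicts into one worklist row per study."""
    studies = {}
    for header in image_headers:
        uid = header["StudyInstanceUID"]
        entry = studies.setdefault(
            uid, {"first": header, "series": set(), "images": 0}
        )
        entry["series"].add(header["SeriesInstanceUID"])
        entry["images"] += 1

    rows = []
    for entry in studies.values():
        first = entry["first"]  # study-level fields are taken from the first image seen
        rows.append(
            WorklistRow(
                participant_name=first.get("PatientName", ""),
                local_id=first.get("PatientID", ""),
                accession_number=first.get("AccessionNumber", ""),
                study_date=first.get("StudyDate", ""),
                n_series=len(entry["series"]),
                n_images=entry["images"],
            )
        )
    return sorted(rows, key=lambda row: (row.study_date, row.accession_number))
```

Grouping by study instance UID is an assumption of this sketch; the text states only that the worklist is built from header information.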

Study Certification

The SC has two concerns in preparing a study for delivery to the CTIL: (1) the study selected for delivery must be verified as the T0, T1, or T2 study of an NLST participant; and (2) the study must be deidentified by removing all PHI.

Study Verification

Monthly, Westat provides to each SC, via IDEAS, a list of known CT studies from information provided to Westat by the SCs at the times that the screens are performed. The SC transfers this file to the laptop. Each line of this list represents a unique study and contains an NLST participant identifier (PID), study date, screen year (T0, T1, or T2), visit number, date of birth (DOB), and gender. Visit number is “1” for the first visit, but may be higher if the participant returns for a repeat screen, likely attributable to the prior visit's screen being of inadequate diagnostic quality. When a study is selected in the CSW worklist, the custom software extracts the values of three parameters from its DICOM header: study date, DOB, gender. If these same values can be found in one line of the Westat list, the study is deemed verified as belonging in the CTIL. By default, studies must match all three criteria. However, some SCs with their own research PACS may already have inserted the NLST PID as the DICOM patient ID and may already have eliminated DOB and/or gender from DICOM headers. For these studies, the matching rules are based on study date and NLST PID.
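The matching rules above can be sketched as follows. This is an illustration under stated assumptions, not the custom software itself: headers are represented as plain dictionaries keyed by DICOM keywords, the `KnownStudy` record and the criteria labels are hypothetical, and treating a study that matches more than one list entry as unverified is an assumption of this sketch.

```python
from typing import NamedTuple


class KnownStudy(NamedTuple):
    """One line of the monthly list of known CT studies (field names are illustrative)."""
    pid: str
    study_date: str
    screen_year: str
    visit: int
    dob: str
    gender: str


def verify_study(header, known_studies):
    """Return (matching entry, criteria label), or (None, None) if unverified."""
    date = header.get("StudyDate")
    dob = header.get("PatientBirthDate")
    sex = header.get("PatientSex")

    if dob and sex:
        # Default rule: study date, date of birth, and gender must all match.
        hits = [k for k in known_studies
                if (k.study_date, k.dob, k.gender) == (date, dob, sex)]
        criteria = "DATE+DOB+SEX"
    else:
        # Fallback for partially deidentified studies: study date and NLST PID.
        pid = header.get("PatientID")
        hits = [k for k in known_studies
                if (k.study_date, k.pid) == (date, pid)]
        criteria = "DATE+PID"

    # Assumption: an ambiguous match (several entries) is left unverified.
    if len(hits) == 1:
        return hits[0], criteria
    return None, None
```

The returned entry carries the screen year and visit number, which are the values later written into the deidentified header.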

Study Deidentification

DICOM fields containing, or likely to contain, PHI are blanked or given fixed-phrase fillers. For example, the DICOM Patient Name is coded as PATIENT^NAME and Accession Number as ACC. The NLST PID is stored in the DICOM Patient ID field. The DICOM Study Date is replaced with 19990102, a date earlier than the first participant's T0 screen. To distinguish studies with the same NLST PID, the NLST CT screening year (T0, T1, or T2) is inserted in a DICOM Comment field, together with visit number and the criteria used to match against the Westat list. A comprehensive accounting of DICOM header changes, both at the time of laptop deidentification at an SC and at the time of library check-in (below) may be found in Appendix A.
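A minimal sketch of these header changes is given below. It operates on a plain dictionary keyed by DICOM keywords rather than on real DICOM files. The list of blanked fields is illustrative only (Appendix A is the authoritative accounting), and both the use of the Study Comments attribute and the comment layout are assumptions, because the text does not name the specific comment field or its format.

```python
# Fixed-phrase fillers named in the text.
FIXED_FILLERS = {
    "PatientName": "PATIENT^NAME",
    "AccessionNumber": "ACC",
    "StudyDate": "19990102",
}

# Illustrative subset of fields to blank; not the Appendix A list.
BLANKED_FIELDS = ("PatientBirthDate", "ReferringPhysicianName", "InstitutionName")


def deidentify(header, nlst_pid, screen_year, visit, criteria):
    """Return a deidentified copy of a header dict; the input is left unchanged."""
    out = dict(header)
    for field in BLANKED_FIELDS:
        if field in out:
            out[field] = ""
    out.update(FIXED_FILLERS)
    out["PatientID"] = nlst_pid
    # Hypothetical comment layout: screening year, visit number, match criteria.
    out["StudyComments"] = f"{screen_year} VISIT={visit} MATCH={criteria}"
    return out
```

Because every study receives the same filler date, the screening year and visit number in the comment field are what distinguish studies that share an NLST PID.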

Study Delivery to the CTIL

An SC may deliver deidentified studies to the CTIL via the Internet or by shipping a DVD or an XHD. If Internet transmission is chosen, the SC first makes a password-protected virtual private network (VPN) connection to the CTIL DICOM receiver at MIR. Not all SCs have opted for this route because of local firewall/network issues and policies. If an XHD is shipped, the CTIL delivers a replacement the day after its receipt. Timing, workflow, and shipping charges dictate the choice of method. Because the CTIL initiative began well after recruitment started, a backlog of more than 50% of all CT studies existed at the time that study delivery to the CTIL began. The XHD option, which allows storage of more than 1,000 CT studies of sizes typical for NLST, was offered to provide a more efficient means of study delivery. Once the backlog has been whittled down, the DVD and/or network transmission options may prove preferable.

Study Check-in at the CTIL

Figure 2 provides an overview of CTIL storage and management. CTIL management developed and maintains its own database (CTIL-DB), a PostgreSQL (Wolfville, Nova Scotia, Canada) database, for tracking the study check-in process; it is independent of Westat's IDEAS. An arriving study is checked for proper identifiers and screened for PHI. Problematic studies require dialog between the submitting SC and a CTIL image librarian. For a compliant study, the CTIL-DB is updated to reflect study date, NLST PID, arrival date, scanner acquisition and reconstruction parameters, and number of images. The NLST PID, identifying the SC origin of the study, is removed from the DICOM headers and replaced with a CTIL PID that is unique to the participant but lacks any SC identifiers. The link between the NLST and CTIL PIDs is known only to Westat and CTIL management. A unique six-digit CTIL accession number is then assigned to each study to distinguish studies obtained in different screening years {T0, T1, T2} or on different visits in the same screening year.
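The identifier replacement at check-in can be sketched as follows. The text does not describe how CTIL PIDs or accession numbers are generated, so the random PID and the sequential, zero-padded accession number below are assumptions, as is the class name `CheckInRegistry`. In practice the link table would be held in the CTIL-DB rather than in memory.

```python
import secrets


class CheckInRegistry:
    """In-memory stand-in for the PID link and accession bookkeeping."""

    def __init__(self):
        self._pid_link = {}    # NLST PID -> CTIL PID (the confidential link)
        self._accessions = {}  # (NLST PID, screen year, visit) -> accession number

    def ctil_pid(self, nlst_pid):
        """Return a stable CTIL PID that carries no SC information."""
        if nlst_pid not in self._pid_link:
            used = set(self._pid_link.values())
            candidate = secrets.token_hex(4)
            while candidate in used:
                candidate = secrets.token_hex(4)
            self._pid_link[nlst_pid] = candidate
        return self._pid_link[nlst_pid]

    def accession(self, nlst_pid, screen_year, visit):
        """Return a unique six-digit accession number for one study."""
        key = (nlst_pid, screen_year, visit)
        if key not in self._accessions:
            number = len(self._accessions) + 1
            if number > 999_999:
                raise OverflowError("six-digit accession numbers exhausted")
            self._accessions[key] = f"{number:06d}"
        return self._accessions[key]

    def check_in(self, header, screen_year, visit):
        """Return a copy of the header with CTIL identifiers substituted."""
        nlst_pid = header["PatientID"]
        out = dict(header)
        out["PatientID"] = self.ctil_pid(nlst_pid)
        out["AccessionNumber"] = self.accession(nlst_pid, screen_year, visit)
        return out
```

Under this scheme, all studies of one participant share a CTIL PID, whereas each screening year or repeat visit receives its own accession number.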

Fig 2. CT image library storage and management. Studies arrive at the CTIL via DVD, XHD, or Internet VPN to be “checked-in” and examined for proper content (quality assurance). Acceptable studies are stored in a Merge-eFilm FUSION Server PACS with attached mirrored EMC Centera units of 8 TB each (separated by two city blocks). Stored studies are then visually inspected using eFilm client viewers. CTIL-DB, a PostgreSQL database with a custom user interface, facilitates CTIL management. Studies arrive deidentified except for a Screening Center (SC)-specific NLST participant identifier (PID). Studies are stored, void of any SC-specific information, with CTIL-assigned IDs and accession numbers. The entire CTIL and associated machinery reside on a private network.

The check-in process is performed on a Sun Microsystems (Santa Clara, CA, USA) SunFire V120 (SunOS 5.9), home to the CTIL-DB and to a password-protected web server providing a user interface with management tools to query the CTIL-DB. A provisionally accepted study is then moved to a Merge-eFilm (Milwaukee, WI, USA) FUSION Server, a commercially available PACS system. The archival storage for this FUSION Server consists of mirrored 8-TB content-addressable, network-attached EMC (Hopkinton, MA, USA) Centera units. Using a Merge-eFilm desktop client image viewer, an image librarian visually inspects the study for complete lung coverage and adequate image quality. Questionable studies are detained from library commitment, pending radiologist evaluation.

For each study, an SC and the CTIL must agree upon the number of images sent and received. This agreement may be reached in one of two ways. The SC may submit its image count to the CTIL prior to or concurrent with study delivery, typically in a spreadsheet containing counts for many or even all of its studies; a CTIL librarian then verifies each count against the study itself. Agreed-upon numbers are noted in the CTIL-DB; both SC and CTIL management have access to this information via a website with tools to check study status (see below). Alternatively, the SC may wait until the received counts are available through this website and then confirm agreement with the CTIL. Conflicting numbers must be resolved in dialog between the SC and a librarian.
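The comparison of image counts can be illustrated with a short sketch. The function name and the status labels are hypothetical, and each argument is assumed to be a dictionary mapping a study identifier to an image count.

```python
def reconcile_counts(sc_reported, ctil_received):
    """Compare per-study image counts reported by an SC with those received."""
    status = {}
    for study in sorted(set(sc_reported) | set(ctil_received)):
        sent = sc_reported.get(study)
        received = ctil_received.get(study)
        if sent is None:
            status[study] = "awaiting SC count"   # arrived, SC has not yet confirmed
        elif received is None:
            status[study] = "not yet received"    # reported by SC, not yet checked in
        elif sent == received:
            status[study] = "agreed"
        else:
            status[study] = "conflict"            # needs SC-librarian dialog
    return status
```

Only studies marked as conflicts would require follow-up between the SC and a librarian.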

Scanner Parameters

Scanner parameters transmitted with each study series and recorded in the CTIL-DB are listed in Table 2. Checks are made to ensure that these parameters are within NLST protocol image-acquisition specifications; studies with measurements falling outside these limits are flagged within the CTIL-DB, but are otherwise included in the library. At least one series of any study must indicate a protocol-allowed reconstruction filter, an image-reconstruction thickness of no more than 2.5 mm, and a slice-reconstruction interval equal to or less than the slice thickness.
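The series-level requirement can be expressed as a small check. This sketch reads the text as flagging a study when none of its series satisfies all three conditions. The filter names are placeholders, because the protocol-allowed reconstruction filters are not listed here, and the `Series` record is hypothetical.

```python
from dataclasses import dataclass

MAX_THICKNESS_MM = 2.5
# Placeholder names; the actual allowed filters are defined by the NLST protocol.
ALLOWED_FILTERS = frozenset({"FILTER_A", "FILTER_B"})


@dataclass(frozen=True)
class Series:
    recon_filter: str
    thickness_mm: float
    interval_mm: float


def series_is_compliant(series, allowed_filters=ALLOWED_FILTERS):
    """True if the series meets all three reconstruction requirements."""
    return (
        series.recon_filter in allowed_filters
        and series.thickness_mm <= MAX_THICKNESS_MM
        and series.interval_mm <= series.thickness_mm
    )


def study_is_flagged(series_list, allowed_filters=ALLOWED_FILTERS):
    """True if no series in the study is compliant; the study is still stored."""
    return not any(series_is_compliant(s, allowed_filters) for s in series_list)
```

A flagged study remains in the library; the flag only records the deviation in the database.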

Table 2 Series Level CTIL-DB Entries for Scanner Parameters

CTIL-DB Access

Figure 3 illustrates management database access. CTIL management monitors in-house activity via user interface tools on a private network web server. These tools help track the various stages of library check-in and thus facilitate recognition of discrepancies and provide the means to resolve them. Screening centers are granted similar access, provided they make a VPN connection. For SCs without VPN access and for Westat and NCI, a static copy of the CTIL-DB is downloaded weekly to a machine with a publicly accessible web server. Access to current reports enables sponsorship (NCI) and LSS project management (Westat) to assess the status of the collection effort. The SCs are granted access to reports only of their own studies, primarily to double check that the studies they believe to have been sent to the CTIL have, indeed, arrived and contain the same number of images believed to have been sent. Logon to both the private and public servers is password-protected.

Fig 3. Accessing the CT image library management database (CTIL-DB). The CTIL-DB is on a private network. Management tools, as real-time queries to CTIL-DB, are available via a user interface on a web server, also on the private network. A screening center (SC) with virtual private network (VPN) privileges may use similar tools for identifying which studies, including the number of images in each, have been committed to the CTIL. Other SCs, as well as NLST personnel at Westat and the National Cancer Institute, have access to similar tools on a public network web server querying a static copy (updated weekly) of the CTIL-DB. A password is required to access either web server user interface. Any SC is restricted to data relevant to its own site.

Library Size

If each of the 17,308 CT participants receives three screens, the CTIL would comprise 51,924 studies, although some participant dropout is to be expected. Average study size varies among SCs from 150 to 450 slices, depending in part on the number of series reconstructed. All reconstructed series are transmitted to the CTIL, with studies averaging about 300 slices. The CTIL is expected to occupy 6–8 TB of storage. As of May 2006, about 31,000 studies (∼60%) have been received.
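These figures can be cross-checked with simple arithmetic. The participant, screen, slice, and received-study counts come from the text. The per-slice size is an assumption of this sketch (an uncompressed 512 × 512 image at 2 bytes per pixel, ignoring header overhead), as is the use of decimal units (1 TB = 10^12 bytes).

```python
PARTICIPANTS = 17_308
SCREENS_PER_PARTICIPANT = 3
RECEIVED_STUDIES = 31_000
AVERAGE_SLICES_PER_STUDY = 300

# Assumption: uncompressed 512 x 512 slices stored at 2 bytes per pixel.
BYTES_PER_SLICE = 512 * 512 * 2


def max_studies():
    """Upper bound on library size if no participant drops out."""
    return PARTICIPANTS * SCREENS_PER_PARTICIPANT  # 51,924


def fraction_received():
    return RECEIVED_STUDIES / max_studies()


def bytes_per_study():
    return AVERAGE_SLICES_PER_STUDY * BYTES_PER_SLICE


def library_size_tb():
    """Estimated pixel-data volume of the full library, in decimal terabytes."""
    return max_studies() * bytes_per_study() / 1e12


def studies_per_drive(drive_bytes=250e9):
    """Number of average-sized studies that fit on one external hard drive."""
    return int(drive_bytes // bytes_per_study())
```

Under these assumptions the estimate comes to roughly 8 TB, near the upper end of the stated 6–8 TB range before any dropout is taken into account, and a 250-GB drive holds well over 1,000 average-sized studies.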

CTIL Personnel

Except for the principal investigator, MIR personnel affiliated with the CTIL all work for and are located in MIR's Electronic Radiology Laboratory (ERL). Most have other duties within ERL, and some are called upon only as needed. A project manager oversees general software development and database management, a full-time programmer monitors the CTIL-DB and manages the websites, a data manager oversees two image librarians and the image study check-in activities, a system administrator ensures hardware is operational and arranges for upgrades and backups, and a network specialist monitors the private network and VPN access.

Discussion

Collection Flexibility

The collection-process design applies a rigorous structure by which each study must match an entry in a Westat list of known studies by birth date, gender, and study date. In addition, all studies arrive at the library with their DICOM headers voided of PHI in the same manner. Rigor aside, the SCs have wide latitude in local workflow implementation. No particular order is imposed on the selection of studies for submission, nor is any particular vehicle required for delivery to the library. Although most SCs prefer to accumulate hundreds of studies on an XHD, others prefer to work in smaller sets using DVDs. Creation of an LSS-NLST CT image library requires coordinating the delivery and posting of a large backlog of existing cases, followed by the ongoing delivery and posting of a smaller number of cases as they are acquired. As active participants in the QA portion of NLST, screening centers are familiar with moving studies from their PACS to a Clinical Studies Workstation and transmitting studies with new identifiers and scrubbed DICOM headers. Use of XHDs and/or DVDs offers additional flexibility at very little learning cost. Ongoing relationships among screening center coordinators and library management, established through the NLST QA work, facilitate the resolution of unforeseen problems.

DICOM Storage

All studies are stored in the library in a standard DICOM format. This allows library management to use a commercially available PACS for the storage and retrieval of CTIL studies. Proven Merge-eFilm PACS and EMC mirrored storage, both with ongoing maintenance agreements, offer reassurance of a reliable library. Furthermore, retrieved studies destined for general research consumption are already in a format readable by a wide variety of public domain DICOM software. All clinical PACS and many commercial image processing packages, e.g., Analyze and Matlab, support DICOM as an input format. Most privately developed image processing research tools also support DICOM, the standard output format for clinical modalities.

Significance of the CTIL

To stimulate computer-aided diagnostic (CAD) research in lung nodule detection and classification, the NCI launched the Lung Image Database Consortium (LIDC)4 to form an image database of retrospective and prospective studies with 3–30 mm nodules, contributed by five institutions and documented with interinstitution expert interpretation of image, clinical, and laboratory data. Eligible studies include both diagnostic and screening types and, for a given patient, may or may not include serial events. The image database and attending documentation are to be made publicly available, in their entirety or in part, on DVD, for the modest cost of reproduction. There are no plans to maintain separate CAD development and evaluation subsets. Unlike the LIDC, the CTIL is an offshoot effort of a larger trial, conceived well after that trial began. Little will be known of the nodule content of the CTIL until NLST screening is complete in mid 2006. But the CTIL is attractive because of its well-defined patient population, scanned using a common protocol, and its collection of evenly spaced studies. It is anticipated that the large size of the CTIL database will support its organization into separate development and evaluation sets. This is a challenging issue that may come to the forefront as better CAD nodule detectors and classifiers appear in the market. As consumers, diagnostic medical centers will have no way to objectively compare vendor claims unless competing products have been certified against the same evaluation image set. In addition to investigations related to CAD programs for nodule detection, classification, and growth, the CTIL will be a rich source of image data for other research, such as studies related to emphysema or other pulmonary or nonpulmonary conditions, reader comparison studies, and technical CT scanner issues.

Accessing the CTIL

The NCI, although acutely interested in the widespread use of the library by both NLST clinical researchers and others, has not yet completed the formulation of a strategy for making CTIL content available to investigators. However, the process likely will involve submission of proposals by investigators for approval by NCI or its designees and the NLST Data and Safety Monitoring Board. Approved proposals would then be serviced by Westat, keepers of the demographic and clinical data corresponding to CTIL image sets. An investigator would first query (or request Westat to query) the IDEAS database for cases that meet the requirements of his/her research. Examples of features for which IDEAS might be searched would include age, gender, smoking history, recorded nodule sizes, screening result, and proven malignancies (lung or other). The investigator may need to refine the query if the number of cases exceeds or falls short of the range of cases sought. Westat would then provide ID numbers to the CTIL for the library retrieval and delivery of image studies to the investigator by DVD, XHD, or transmitted over a secure network. These measures will be necessary to avoid release of information that may affect the integrity of the NLST and to discourage frivolous requests. Until a formal mechanism is established, inquiries regarding CTIL usage may be made to the individuals named in Appendix B.

Conclusion

The CTIL infrastructure is now in place and collecting all LSS-NLST CT studies with anticipated completion in late 2006. Its size and content of evenly spaced serial screens, from a specific group of participants with histories of heavy smoking, make the CTIL attractive to lung nodule CAD developers and clinical researchers of lung disease.