The CT Image Library (CTIL) of the Lung Screening Study (LSS) network of the National Lung Screening Trial (NLST) consists of up to three annual screens using CT imaging from each of 17,308 participants with a significant history of smoking but no evidence of cancer at trial enrollment (Fall 2002–Spring 2004). Screens performed at numerous medical centers associated with 10 LSS-NLST screening centers are deidentified of protected health information and delivered to the CTIL via DVD, external hard disk, or Internet/Virtual Private Network transmission. The collection will be completed in late 2006. The CTIL is of potential interest to clinical researchers and software developers of nodule detection algorithms. Its attractiveness lies in its very specific, well-defined patient population, scanned via a common CT protocol, and in its collection of evenly spaced serial screens. In this work, we describe the technical details of the CTIL collection process from screening center retrieval through library storage.
Background
The National Lung Screening Trial (NLST) aims to compare the effectiveness of two screening tests, low-dose spiral CT scan and chest x-ray (CXR), on net lung cancer-specific mortality in persons who are at high risk for developing lung cancer. The trial is sponsored by the National Cancer Institute (NCI) and conducted under a harmonized protocol within two separate administrative organizations: the Lung Screening Study (LSS) and the American College of Radiology Imaging Network (ACRIN). Accrual through 10 LSS screening centers (SCs) is complete with 34,614 participants enrolled from September 2002 to April 2004. One SC has a satellite medical center that functions operationally independently of its parent, and many SCs enroll participants through multiple medical centers. The LSS SCs operate within the screening centers of the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial Network [1]. NCI has contracted Westat (Rockville, MD, USA), an independent research corporation, to provide coordinating and statistical services for the LSS network.
Participants are randomized to CT and CXR groups. Screens (studies; see Notes) are obtained at baseline, then annually for 2 years, for a total of three studies per participant (screening years T0, T1, and T2); the final screen will be performed in mid-2006. Each SC provides diagnostic interpretation of local imaging studies. Westat maintains diagnostic summaries as well as other clinical and demographic data. The management tool and database used to collect and maintain these data is called the Interactive Data Entry and Administrative System (IDEAS). Westat provisioned each SC with an IDEAS workstation for local keying, data verification, and error checking of demographic and clinical information. IDEAS allows local storage of participants' privacy data to facilitate correspondence with participants and their physicians, and to request medical records. It also permits the encrypted transmission of demographic and clinical data to the central repository at Westat. Reports and files may likewise be downloaded from Westat to SC IDEAS workstations. Radiologic images are neither transmitted to nor stored in IDEAS.
The Mallinckrodt Institute of Radiology (MIR), Washington University School of Medicine, manages the imaging Quality Assurance Coordinating Center for the LSS network of the NLST [2]. Following MIR's early experience in the management of quality assurance image studies from multiple sites, NCI contracted with MIR to assemble and administer the LSS-NLST CT Image Library (CTIL) to consist of digital copies of all LSS CT studies. This central image repository will serve as a large collection of serial CT screens from a well-defined population for use by imaging researchers. In this work, we describe the methods for the delivery of studies to the library, the hardware and software involved, and the check-in management of arriving studies. Confidentiality, security, quality assurance, and accessibility issues are also addressed.
All participants enrolling in the NLST signed an informed consent form developed and approved by the SCs' institutional review boards (IRBs), the NCI IRB, and the Westat IRB before randomization. At the initiation of the CTIL, the SCs worked with MIR to establish data use agreements for the image sets to be transferred to the CTIL; these agreements are in place for every SC. After April 2003, newly recruited and returning participants signed the necessary Health Insurance Portability and Accountability Act (HIPAA) agreements as directed by their institutions and approved by NCI to allow inclusion of deidentified CT examinations in the CTIL. As soon as the CTIL is operational and image sets are prepared for “check-out,” NCI will ask investigators to sign a separate materials and data use agreement for each research initiative.
The LSS utilizes a Manual of Operations and Procedures (MOOP) that details the development, implementation, and evaluation of LSS-NLST protocol. To ensure SC compliance with this protocol, including informed consent issues, Westat monitors SC activities on an ongoing basis and reports regularly to NCI and the SCs. NCI and Westat also conduct annual site visits at each SC to audit and monitor screening activities, in addition to evaluating the SC's adherence to LSS quality assurance and quality control procedures.
Methods
Study Collection
Table 1 lists the SCs along with the number and percentage of CT participants. SCs adhere to a strict LSS-NLST CT acquisition protocol, though CT scanners vary by vendor (GE, Philips, Siemens, Toshiba) and model across the SCs. An LSS medical physicist coordinates scanner-QA testing with medical physicists associated with each SC, and three LSS radiologists monitor image-acquisition quality on a monthly basis.
Screening centers vary in their storage of NLST CT studies. Some use their medical centers' picture archive and communications system (PACS), whereas others use a research PACS or even an NLST-specific archive. Figure 1 illustrates the generalized harvesting of CT studies from varied archives in preparation for delivery to the CTIL. Although the SCs use CT scanners from various vendors, all utilize a standard DICOM (digital imaging and communications in medicine) format. For each study, an SC has a three-step task: (1) collect each study from the local archive, (2) deidentify the study to remove protected health information (PHI), (3) deliver the study to the CTIL. To help the SCs with these tasks, MIR provisioned each SC with custom software and a laptop computer (Dell Inspiron 1150; Dell, Round Rock, TX, USA) with DVD writer and 37 GB hard drive. In addition, each SC received a 250-GB universal serial bus (USB) external hard drive (XHD). The laptop is used to obtain the CT studies from the local PACS and prepare the studies for delivery to the CTIL. The custom software provides a simple user interface to facilitate PHI removal. Those SCs with separate research PACS or NLST archives may have already partially deidentified their studies.
Any SC may choose whatever collection mechanisms best suit local workflow efficiencies. Multiple mechanisms may be used at any time, and the mix may change over time. For example, one SC may push studies from its PACS to the laptop during low network-volume evening hours, whereas another may write studies to an XHD to avoid a congested network. An SC may submit studies to the CTIL in any order that suits its workflow. For example, an SC may submit all its T0 screens before any others. Other SCs may submit all three screens (T0, T1, and T2) for participants who have completed them ahead of those who have not. Other SCs may choose to submit more recent studies that are readily available on “near-term” disk storage before those on archival tapes.
A user invokes the user interface of a laptop application, the “Clinical Studies Workstation” (CSW) [3], to view studies that are currently on the laptop and to select studies for deidentification and delivery to the CTIL. When launched, the CSW application sweeps through the laptop-resident studies and builds a worklist from information in the DICOM headers of these studies. Worklist columns show participant name, local ID and accession number, study date, number of series, and total number of images. Studies are selected from the worklist, one at a time, for export to the CTIL; however, any selected study must first be certified for export.
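The worklist sweep described above can be pictured with a short Python sketch. It is illustrative only and is not the CSW code: the function name `build_worklist` is hypothetical, and each image header is assumed to have already been read into a plain dictionary keyed by DICOM-style attribute names.

```python
def build_worklist(headers):
    """Group per-image header dicts into one worklist row per study.

    `headers` is an iterable of dicts, one per image (an assumed layout).
    """
    studies = {}
    for h in headers:
        # One row per study, keyed by the study's unique identifier.
        row = studies.setdefault(h["StudyInstanceUID"], {
            "name": h.get("PatientName", ""),
            "local_id": h.get("PatientID", ""),
            "accession": h.get("AccessionNumber", ""),
            "study_date": h.get("StudyDate", ""),
            "series": set(),
            "images": 0,
        })
        row["series"].add(h["SeriesInstanceUID"])  # distinct series seen
        row["images"] += 1                          # total images in study
    # Replace the working set of series UIDs with a simple count.
    return [
        {**{k: v for k, v in row.items() if k != "series"},
         "n_series": len(row["series"])}
        for row in studies.values()
    ]
```

Each returned row carries the columns named in the text: participant name, local ID, accession number, study date, number of series, and total number of images.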
Study Certification
The SC has two concerns in preparing a study for delivery to the CTIL: (1) the study selected for delivery must be verified as the T0, T1, or T2 study of an NLST participant; and (2) the study must be deidentified by removing all PHI.
Study Verification
Monthly, Westat provides to each SC, via IDEAS, a list of known CT studies from information provided to Westat by the SCs at the times that the screens are performed. The SC transfers this file to the laptop. Each line of this list represents a unique study and contains an NLST participant identifier (PID), study date, screen year (T0, T1, or T2), visit number, date of birth (DOB), and gender. Visit number is “1” for the first visit, but may be higher if the participant returns for a repeat screen, likely attributable to the prior visit's screen being of inadequate diagnostic quality. When a study is selected in the CSW worklist, the custom software extracts the values of three parameters from its DICOM header: study date, DOB, gender. If these same values can be found in one line of the Westat list, the study is deemed verified as belonging in the CTIL. By default, studies must match all three criteria. However, some SCs with their own research PACS may already have inserted the NLST PID as the DICOM patient ID and may already have eliminated DOB and/or gender from DICOM headers. For these studies, the matching rules are based on study date and NLST PID.
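The matching rules above can be sketched in Python as follows. This is a minimal illustration, not the custom software itself: the record layout, the field names, the criteria labels, and the function `verify_study` are all hypothetical. The sketch also assumes that a match must be unique to count as verification, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WestatEntry:
    """One line of the monthly Westat list (field names are hypothetical)."""
    pid: str          # NLST participant identifier
    study_date: str   # YYYYMMDD
    screen_year: str  # "T0", "T1", or "T2"
    visit: int        # 1 for the first visit, higher for repeat screens
    dob: str          # YYYYMMDD
    gender: str


def verify_study(header: dict, westat: list) -> Optional[tuple]:
    """Return (entry, criteria) if exactly one list entry matches, else None."""
    date = header.get("StudyDate")
    dob = header.get("PatientBirthDate")
    sex = header.get("PatientSex")
    if dob and sex:
        # Default rule: study date, DOB, and gender must all match.
        criteria = "DATE+DOB+SEX"
        hits = [e for e in westat
                if (e.study_date, e.dob, e.gender) == (date, dob, sex)]
    else:
        # Fallback for studies already partially deidentified at the SC:
        # match on study date and the NLST PID stored as DICOM Patient ID.
        criteria = "DATE+PID"
        pid = header.get("PatientID")
        hits = [e for e in westat if (e.study_date, e.pid) == (date, pid)]
    # Assumption: an ambiguous (multiple) match is treated as unverified.
    return (hits[0], criteria) if len(hits) == 1 else None
```

Returning the criteria label alongside the matched entry reflects the later deidentification step, in which the criteria used for matching are recorded in a DICOM comment field.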
Study Deidentification
DICOM fields containing, or likely to contain, PHI are blanked or given fixed-phrase fillers. For example, the DICOM Patient Name is coded as PATIENT^NAME and Accession Number as ACC. The NLST PID is stored in the DICOM Patient ID field. The DICOM Study Date is replaced with 19990102, a date earlier than the first participant's T0 screen. To distinguish studies with the same NLST PID, the NLST CT screening year (T0, T1, or T2) is inserted in a DICOM Comment field, together with visit number and the criteria used to match against the Westat list. A comprehensive accounting of DICOM header changes, both at the time of laptop deidentification at an SC and at the time of library check-in (below) may be found in Appendix A.
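A minimal Python sketch of this header scrubbing follows. It assumes the header has been read into a plain dictionary keyed by DICOM-style attribute names; the function `deidentify` and the layout of the comment string are hypothetical, while the filler values are those given in the text and Appendix A. The appendix's "Image Date" and "Image Time" appear here under their current DICOM names, Content Date and Content Time.

```python
FIXED_DATE = "19990102"  # earlier than the first participant's T0 screen
FIXED_TIME = "1200"

DATE_FIELDS = ("InstanceCreationDate", "StudyDate", "SeriesDate",
               "AcquisitionDate", "ContentDate")
TIME_FIELDS = ("InstanceCreationTime", "StudyTime", "SeriesTime",
               "AcquisitionTime", "ContentTime")

# Fixed-phrase fillers and blanks, per Appendix A.
FILLERS = {
    "AccessionNumber": "ACC",
    "ReferringPhysicianName": "REFERRING^PHYSICIAN",
    "StudyDescription": "",
    "PerformingPhysicianName": "PERFORMING^PHYSICIAN",
    "NameOfPhysiciansReadingStudy": "READING^PHYSICIAN",
    "AdmittingDiagnosesDescription": "ADMITTING DIAGNOSES",
    "PatientName": "PATIENT^NAME",
    "PatientBirthDate": "",
    "PatientSex": "",
    "PatientAddress": "ADDR",
}


def deidentify(header, nlst_pid, screen_year, visit, criteria):
    """Return a scrubbed copy of `header`; the input dict is not modified."""
    out = dict(header)
    for key in DATE_FIELDS:
        if key in out:
            out[key] = FIXED_DATE
    for key in TIME_FIELDS:
        if key in out:
            out[key] = FIXED_TIME
    for key, filler in FILLERS.items():
        if key in out:
            out[key] = filler
    out["PatientID"] = nlst_pid
    # Hypothetical comment layout: screening year, visit, match criteria.
    out["ImageComments"] = f"{screen_year} VISIT {visit} MATCH {criteria}"
    return out
```

Non-identifying attributes, such as image dimensions, pass through unchanged in this sketch; the handling of private vendor attributes described in Appendix A is not modeled.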
Study Delivery to the CTIL
An SC may deliver deidentified studies to the CTIL via the Internet or by shipping a DVD or an XHD. If Internet transmission is chosen, the SC first makes a password-protected virtual private network (VPN) connection to the CTIL DICOM receiver at MIR. Not all SCs have opted for this route because of local firewall/network issues and policies. If an XHD is shipped, the CTIL delivers a replacement the day after its receipt. Timing, workflow, and shipping charges dictate the choice of method. Because the CTIL initiative began well after recruitment started, a backlog of more than 50% of all CT studies existed at the time that study delivery to the CTIL began. The XHD option, which allows storage of more than 1,000 CT studies of sizes typical for NLST, was offered to provide a more efficient means of study delivery. Once the backlog has been reduced, the DVD and/or network transmission options may prove preferable.
Study Check-in at the CTIL
Figure 2 provides an overview of CTIL storage and management. CTIL management developed and maintains its own database (CTIL-DB), a PostgreSQL (Wolfville, Nova Scotia, Canada) database, for tracking the study check-in process; it is independent of Westat's IDEAS. An arriving study is checked for proper identifiers and screened for PHI. Problematic studies require dialog between the submitting SC and a CTIL image librarian. For a compliant study, the CTIL-DB is updated to reflect study date, NLST PID, arrival date, scanner acquisition and reconstruction parameters, and number of images. The NLST PID, identifying the SC origin of the study, is removed from the DICOM headers and replaced with a CTIL PID that is unique to the participant but lacks any SC identifiers. The link between the NLST and CTIL PIDs is known only to Westat and CTIL management. A unique six-digit CTIL accession number is then assigned to each study to distinguish studies obtained in different screening years {T0, T1, T2} or on different visits in the same screening year.
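The identifier replacement at check-in can be sketched as follows. This is an assumption-laden illustration, not the CTIL-DB logic: the class `CheckIn`, the `CTIL`-prefixed identifier format, the starting accession value, and the use of simple counters are all hypothetical. The sketch preserves only the properties stated in the text, namely one CTIL PID per participant and a unique six-digit accession number per study.

```python
import itertools


class CheckIn:
    """Assigns CTIL identifiers to arriving studies (illustrative only)."""

    def __init__(self, first_accession=100000):
        self._pid_map = {}  # NLST PID -> CTIL PID; kept private to the library
        self._next_pid = itertools.count(1)
        self._next_acc = itertools.count(first_accession)

    def ctil_pid(self, nlst_pid):
        """Return the participant's CTIL PID, creating one on first sight."""
        if nlst_pid not in self._pid_map:
            self._pid_map[nlst_pid] = f"CTIL{next(self._next_pid):06d}"
        return self._pid_map[nlst_pid]

    def check_in(self, header):
        """Return a copy of `header` carrying CTIL identifiers only."""
        out = dict(header)
        out["PatientID"] = self.ctil_pid(header["PatientID"])
        # A fresh accession number distinguishes screening years and visits.
        out["AccessionNumber"] = f"{next(self._next_acc):06d}"
        return out
```

Two studies from the same participant receive the same CTIL PID but different accession numbers, which is what allows serial screens to be linked without exposing the SC-specific NLST PID.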
The check-in process is performed on a Sun Microsystems (Santa Clara, CA, USA) SunFire V120 (SunOS 5.9), home to the CTIL-DB and to a password-protected web server providing a user interface with management tools to query the CTIL-DB. A provisionally accepted study is then moved to a Merge-eFilm (Milwaukee, WI, USA) FUSION Server, a commercially available PACS system. The archival storage for this FUSION Server consists of mirrored 8-TB content-addressable, network-attached EMC (Hopkinton, MA, USA) Centera units. Using a Merge-eFilm desktop client image viewer, an image librarian visually inspects the study for complete lung coverage and adequate image quality. Questionable studies are detained from library commitment, pending radiologist evaluation.
An SC and the CTIL must agree, for each study, on the number of images sent and received. Agreement may be reached in two ways. The SC may submit this number to the CTIL prior to or concurrent with image study delivery, typically in a spreadsheet containing numbers for many studies or even all studies. A CTIL librarian then verifies the submitted number against the study itself. Agreed-upon numbers are noted in the CTIL-DB; both SC and CTIL management have access to this information via a website with tools to check study status (see below). Alternatively, the SC may wait until the numbers are available through this website and then confirm agreement with the CTIL. Conflicting numbers must be resolved in dialog between the SC and a librarian.
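The image-count comparison amounts to a simple reconciliation, sketched below in Python. The function `reconcile`, the status strings, and the use of a study identifier as dictionary key are hypothetical; the sketch is not drawn from the CTIL-DB.

```python
def reconcile(reported, received):
    """Compare SC-reported image counts with counts of received images.

    Both arguments map a study identifier to an image count. A study
    missing from `reported` means the SC has not yet supplied a number.
    """
    status = {}
    for study in sorted(set(reported) | set(received)):
        sent = reported.get(study)
        got = received.get(study)
        if got is None:
            status[study] = "not yet received"
        elif sent is None:
            status[study] = "awaiting SC count"
        elif sent == got:
            status[study] = "agreed"
        else:
            # Requires dialog between the SC and a librarian.
            status[study] = f"conflict: SC sent {sent}, CTIL received {got}"
    return status
```

Only studies marked as agreed would be considered settled; every other status corresponds to one of the follow-up paths described in the text.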
Scanner Parameters
Scanner parameters transmitted with each study series and recorded in the CTIL-DB are listed in Table 2. Checks are made to ensure that these parameters are within NLST protocol image-acquisition specifications; studies with measurements falling outside these limits are flagged within the CTIL-DB, but are otherwise included in the library. At least one series of any study must indicate a protocol-allowed reconstruction filter, an image-reconstruction thickness of no more than 2.5 mm, and a slice-reconstruction interval equal to or less than the slice thickness.
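The series-level check described above might look like the following Python sketch. The names `Series`, `series_compliant`, and `study_compliant` are hypothetical, and the set of protocol-allowed reconstruction filters is passed in as a parameter because the text does not list them. The sketch also assumes that a single series must satisfy all three conditions together, which is one reading of the rule.

```python
from dataclasses import dataclass

MAX_THICKNESS_MM = 2.5  # upper limit on reconstructed slice thickness


@dataclass
class Series:
    recon_filter: str    # reconstruction filter (kernel) name
    thickness_mm: float  # reconstructed slice thickness
    interval_mm: float   # slice-reconstruction interval


def series_compliant(s, allowed_filters):
    """True if one series meets all three reconstruction conditions."""
    return (s.recon_filter in allowed_filters
            and s.thickness_mm <= MAX_THICKNESS_MM
            and s.interval_mm <= s.thickness_mm)


def study_compliant(series, allowed_filters):
    """True if at least one series of the study is compliant."""
    return any(series_compliant(s, allowed_filters) for s in series)
```

A study failing this check would be flagged in the database rather than rejected, since the text states that out-of-limit studies are otherwise included in the library.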
CTIL-DB Access
Figure 3 illustrates management database access. CTIL management monitors in-house activity via user interface tools on a private network web server. These tools help track the various stages of library check-in and thus facilitate recognition of discrepancies and provide the means to resolve them. Screening centers are granted similar access, provided they make a VPN connection. For SCs without VPN access and for Westat and NCI, a static copy of the CTIL-DB is downloaded weekly to a machine with a publicly accessible web server. Access to current reports enables sponsorship (NCI) and LSS project management (Westat) to assess the status of the collection effort. The SCs are granted access to reports only of their own studies, primarily to double check that the studies they believe to have been sent to the CTIL have, indeed, arrived and contain the same number of images believed to have been sent. Logon to both the private and public servers is password-protected.
Library Size
If each of the 17,308 CT participants receives three screens, the CTIL would comprise 51,924 studies, although some participant dropout is to be expected. Average study size varies among SCs from 150 to 450 slices, depending in part on the number of series reconstructed. All reconstructed series are transmitted to the CTIL, with studies averaging about 300 slices. In terms of storage, the CTIL will occupy 6–8 TB. As of May 2006, about 31,000 studies (∼60%) have been received.
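The arithmetic behind these figures can be checked with a few lines of Python. The per-slice size is an assumption not stated in the text: an uncompressed 512 × 512 image at 2 bytes per pixel, with header overhead ignored.

```python
PARTICIPANTS = 17_308
SCREENS_PER_PARTICIPANT = 3
AVG_SLICES_PER_STUDY = 300
BYTES_PER_SLICE = 512 * 512 * 2  # assumed: 512x512 matrix, 16-bit pixels

# Upper bound on study count if no participant drops out.
max_studies = PARTICIPANTS * SCREENS_PER_PARTICIPANT

# Upper bound on storage, in decimal terabytes.
max_bytes = max_studies * AVG_SLICES_PER_STUDY * BYTES_PER_SLICE
max_tb = max_bytes / 1e12

# Fraction of the maximum collection received as of May 2006.
received_fraction = 31_000 / max_studies

print(max_studies)
print(round(max_tb, 1))
print(round(received_fraction, 2))
```

Under these assumptions the no-dropout upper bound comes to roughly 8 TB, which is consistent with the 6–8 TB estimate once participant dropout is taken into account.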
CTIL Personnel
Except for the principal investigator, MIR personnel affiliated with the CTIL all work for and are located in MIR's Electronic Radiology Laboratory (ERL). Most have other duties within ERL, and some are called upon only as needed. A project manager oversees general software development and database management, a full-time programmer monitors the CTIL-DB and manages the websites, a data manager oversees two image librarians and the image study check-in activities, a system administrator ensures hardware is operational and arranges for upgrades and backups, and a network specialist monitors the private network and VPN access.
Discussion
Collection Flexibility
The collection-process design applies a rigorous structure by which each study must match an entry in a Westat list of known studies by birth date, gender, and study date. And all studies arrive at the library with their DICOM headers voided of PHI in the same manner. Rigor aside, the SCs have wide latitude in local workflow implementation. There is no particular order in which studies are selected for submission nor in the vehicle chosen for delivery to the library. Although most SCs prefer to accumulate hundreds of studies on an XHD, others prefer to work in smaller sets using DVDs. Creation of an LSS-NLST CT image library requires coordinating the delivery and posting of a large backlog of existing cases, followed by the ongoing delivery and posting of a smaller number of cases as they are acquired. As active participants in the QA portion of NLST, screening centers are familiar with moving studies from their PACS to a Clinical Studies Workstation and transmitting studies with new identifiers and scrubbed DICOM headers. Use of XHDs and/or DVDs offers additional flexibility at very little learning cost. Ongoing relationships among screening center coordinators and library management, established through the NLST QA work, facilitate the resolution of unforeseen problems.
DICOM Storage
All studies are stored in the library in a standard DICOM format. This allows library management to use a commercially available PACS for the storage and retrieval of CTIL studies. Proven Merge-eFilm PACS and EMC mirrored storage, both with ongoing maintenance agreements, offer reassurance of a reliable library. Furthermore, retrieved studies destined for general research consumption are already in a format readable by a wide variety of public domain DICOM software. All clinical PACS and many commercial image processing packages, e.g., Analyze and Matlab, support DICOM as an input format. Most privately developed image processing research tools also support DICOM, the standard output format for clinical modalities.
Significance of the CTIL
To stimulate computer-aided diagnostic (CAD) research in lung nodule detection and classification, the NCI launched the Lung Image Database Consortium (LIDC) [4] to form an image database of retrospective and prospective studies with 3–30 mm nodules, contributed by five institutions and documented with interinstitution expert interpretation of image, clinical, and laboratory data. Eligible studies include both diagnostic and screening types and, for a given patient, may or may not include serial events. The image database and attending documentation are to be made publicly available, in their entirety or in part, on DVD, for the modest cost of reproduction. There are no plans to maintain separate CAD development and evaluation subsets. Unlike the LIDC, the CTIL is an offshoot effort of a larger trial, conceived well after that trial began. Little will be known of the nodule content of the CTIL until NLST screening is complete in mid-2006. But the CTIL is attractive because of its well-defined patient population, scanned using a common protocol, and its collection of evenly spaced studies. It is anticipated that the large size of the CTIL database will support its organization into separate development and evaluation sets. This is a challenging issue that may come to the forefront as better CAD nodule detectors and classifiers appear in the market. As consumers, diagnostic medical centers will have no way to objectively compare vendor claims unless competing products have been certified against the same evaluation image set. In addition to investigations related to CAD programs for nodule detection, classification, and growth, the CTIL will be a rich source of image data for other research, such as studies related to emphysema or other pulmonary or nonpulmonary conditions, reader comparison studies, and technical CT scanner issues.
Accessing the CTIL
The NCI, although acutely interested in the widespread use of the library by both NLST clinical researchers and others, has not yet completed the formulation of a strategy for making CTIL content available to investigators. However, the process likely will involve submission of proposals by investigators for approval by NCI or its designees and the NLST Data and Safety Monitoring Board. Approved proposals would then be serviced by Westat, keepers of the demographic and clinical data corresponding to CTIL image sets. An investigator would first query (or request Westat to query) the IDEAS database for cases that meet the requirements of his/her research. Examples of features for which IDEAS might be searched would include age, gender, smoking history, recorded nodule sizes, screening result, and proven malignancies (lung or other). The investigator may need to refine the query if the number of cases exceeds or falls short of the range of cases sought. Westat would then provide ID numbers to the CTIL for the library retrieval and delivery of image studies to the investigator by DVD, XHD, or transmitted over a secure network. These measures will be necessary to avoid release of information that may affect the integrity of the NLST and to discourage frivolous requests. Until a formal mechanism is established, inquiries regarding CTIL usage may be made to the individuals named in Appendix B.
Conclusion
The CTIL infrastructure is now in place and collecting all LSS-NLST CT studies with anticipated completion in late 2006. Its size and content of evenly spaced serial screens, from a specific group of participants with histories of heavy smoking, make the CTIL attractive to lung nodule CAD developers and clinical researchers of lung disease.
Notes
In the context of medical imaging, “study” implies a set of at least one series each containing at least one image and should be distinguished here from an investigative study (e.g., LSS). In the context of NLST, CT studies are called screens; in this article, study and screen are considered equivalent.
References
1. Gohagan JK, Prorok PC, Hayes RB, Kramer BS, and the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial Project Team: The Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial of the National Cancer Institute: history, organization and status. Control Clin Trials 21 (6 Suppl):251S–272S, December 2000
2. Moore SM, Gierada DS, Clark KW, Blaine GJ, The PLCO-NLST Quality Assurance Working Group: Image quality assurance in the Prostate, Lung, Colon, and Ovarian (PLCO) cancer screening trial network of the national lung screening trial. SCAR-2004, Poster Session. J Digit Imaging 18(3):242–250, 2005. PMID 15924251. doi:10.1007/s10278-005-5153-1
3. Moore SM, Maffitt DR, Blaine GJ, Bae KT: A workstation acquisition node for multi-center imaging studies. Proceedings of SPIE, Medical Imaging 2001, PACS and Integrated Medical Information Systems: Design and Evaluation 4323:271–277, 2001
4. Armato SG, McLennan G, McNitt-Gray MF, et al: Lung image database consortium: developing a resource for the medical imaging research community. Radiology 232(3):739–748, 2004. PMID 15333795
Acknowledgments
Funding for this project was provided in part by the National Cancer Institute Contract N01-CN-25516 (National Lung Screening Trial). The CTIL gratefully acknowledges Merge-eFilm's generous contribution of the FUSION Server and its continued support under their research agreement with the Mallinckrodt Institute of Radiology. The CTIL likewise thanks the Westat management team that has readily offered assistance and encouragement at every juncture. CTIL management is indebted to all LSS screening center coordinators whose cooperative spirits have eased the monumental chore of image study collection. The authors thank our image librarians, Joan Moulton and Mary Wolfsberger, who cheerfully manage the day-to-day CTIL check-in process.
Appendices
Appendix A
DICOM Header Changes
The table below identifies the changes made to DICOM headers at the time of deidentification on the laptops at the screening centers and at the time studies are committed to the CT Image Library. Quoted entries are literals.
Not listed here are the so-called “private” or vendor scanner-specific DICOM groups and elements (attributes) that are only retained by laptop deidentification software if used for computing CT pitch. In the Library, all private attributes are removed.
Changes to DICOM Groups and Elements
DICOM Group and Element | Description | Laptop Deidentification | Library Deidentification |
0008 0012 | Instance Creation Date | “19990102” | “19990102” |
0008 0020 | Study Date | “19990102” | “19990102” |
0008 0021 | Series Date | “19990102” | “19990102” |
0008 0022 | Acquisition Date | “19990102” | “19990102” |
0008 0023 | Image Date | “19990102” | “19990102” |
0008 0013 | Instance Creation Time | “1200” | “1200” |
0008 0030 | Study Time | “1200” | “1200” |
0008 0031 | Series Time | “1200” | “1200” |
0008 0032 | Acquisition Time | “1200” | “1200” |
0008 0033 | Image Time | “1200” | “1200” |
0008 0050 | Accession Number | “ACC” | CTIL Accession Number (note 1) |
0008 0090 | Referring Physician | “REFERRING^PHYSICIAN” | “REFERRING^PHYSICIAN” |
0008 1030 | Study Description | <blank> | <blank> |
0008 1050 | Performing Physician | “PERFORMING^PHYSICIAN” | “PERFORMING PHYSICIAN” |
0008 1060 | Reading Physician | “READING^PHYSICIAN” | “READING^PHYSICIAN” |
0008 1080 | Admitting Diagnoses | “ADMITTING DIAGNOSES” | “ADMITTING DIAGNOSES” |
0010 0010 | Patient Name | “PATIENT^NAME” | “PATIENT^NAME” |
0010 0020 | Patient ID | NLST PID (note 2) | CTIL ID (note 3) |
0010 0030 | Patient Birth Date | <blank> | <blank> |
0010 0040 | Patient Sex | <blank> | <blank> |
0010 1040 | Patient Address | “ADDR” | “ADDR” |
0020 4000 | Image Comments | (note 4) | (note 4) |
Appendix B
CTIL Inquiries
Specific policies for CTIL use are not yet well defined. Until they are, inquiries should be directed to either:
David Gierada, MD | Guillermo Marquez |
Principal Investigator | Early Detection Research Group |
LSS-NLST CT Image Library | Division of Cancer Prevention |
Mallinckrodt Institute of Radiology | National Cancer Institute |
Washington Univ. School of Medicine | Executive Plaza North |
510 South Kingshighway | Suite 3066, MSC 7346 |
St. Louis, MO 63110 | 6130 Executive Blvd. |
| Rockville, MD 20852-7346 |
Cite this article
Clark, K.W., Gierada, D.S., Moore, S.M. et al. Creation of a CT Image Library for the Lung Screening Study of the National Lung Screening Trial. J Digit Imaging 20, 23–31 (2007). https://doi.org/10.1007/s10278-006-0589-5
DOI: https://doi.org/10.1007/s10278-006-0589-5