INTRODUCTION

Multi-institutional collaborations in biomedical research and clinical studies require access to data and applications outside local institutional boundaries. This necessitates development of tools and software infrastructures that support sharing of remote data and analytical resources across geographically distributed institutions. In radiology, it is still a challenging task to share data across institutions despite the widespread adoption of the Digital Imaging and Communications in Medicine (DICOM) standard1, which enabled digital storage technologies such as Picture Archiving and Communication System (PACS). All too often, the community relies on simple yet inefficient means of sharing data such as burning it on CDs and mailing them. This is largely due to the lack of fast and secure mechanisms that support interactive access to data resources from outside institutional firewalls. As a remedy to this problem, this paper proposes a set of mechanisms for standardized, efficient, and secure access to geographically distributed sources of imaging data potentially represented and stored in diverse formats. For this purpose, a toolkit, known as VirtualPACS, has been designed and developed.

VirtualPACS can behave as a virtualized DICOM PACS server that allows local DICOM data producers and consumers to interact with multiple image data sources over the Internet. These data sources can be native DICOM sources (such as DICOM PACS servers) or non-native DICOM sources. Non-native DICOM sources are sources of DICOM objects that do not support DICOM messaging and may store images on disk or in databases. Image archives such as the National Cancer Imaging Archive (NCIA), the American College of Radiology Imaging Network (ACRIN), and the Quality Assurance Review Center (QARC) are all examples of non-native DICOM sources. In other words, VirtualPACS federates multiple remote data sources, including those that do not support DICOM messaging, and presents them to a DICOM client as a single virtual resource, i.e., a virtual PACS. The consistent and unifying interface provided by VirtualPACS can be used, for instance, by a DICOM review workstation to communicate with a federation of such data sources (see Fig. 1). Participants in clinical trials or cooperative oncology groups can also benefit from this toolkit by interacting with their imaging archives directly from review workstations or other local DICOM data producers and consumers.

Fig 1

VirtualPACS creating a virtualized server for a local workstation from a collection of native and non-native, remote DICOM data sources. VirtualPACS uses the caBIG™ caGrid infrastructure 2,3 to federate remote image data sources. Each data source is exposed to the environment through caGrid data service interfaces.

The framework employed in VirtualPACS provides a set of high-level mechanisms to federate the various remote image data sources, and provide role-based, secure access to them via DICOM requests. This framework consists of three layers: presentation, middleware, and mediation layers. The presentation layer provides a gateway between local DICOM clients and remote image data resources. It converts DICOM requests into appropriate grid requests that are handled by the middleware layer. These requests are executed on the remote resources, generating appropriate messages and data for conversion by the presentation layer back to DICOM messages, which are then consumed by the local clients. The middleware layer provides the tools, runtime support, and services for secure access to a federation of image data resources. It makes use of the caGrid infrastructure 2,3 of the NCI cancer Biomedical Informatics Grid (caBIG™) program. The mediation layer implements the tools for mapping various backend image databases to a common set of interfaces and data models so that image data sources can be federated and accessed remotely within the VirtualPACS framework. This is accomplished by wrapping native DICOM and non-native DICOM image databases as caGrid data services. This layer builds on the In Vivo Imaging Middleware (IVIM) system, which also uses the caGrid infrastructure as its core technology.

The rest of this paper is organized as follows. In the RELATED WORK section, we present a brief survey of the related work. In the SYSTEM DESIGN AND ARCHITECTURE section, we describe the design and implementation of VirtualPACS and its three-layer framework. In the PERFORMANCE RESULTS AND CONCLUSIONS section, we present an experimental evaluation of the network performance of VirtualPACS for retrieving remote image data. A final discussion and some of the planned extensions are also presented in the PERFORMANCE RESULTS AND CONCLUSIONS section.

RELATED WORK

The challenges associated with sharing data and tools and with creating a federation of resources in multi-institutional settings are not specific to radiology; they also arise in many other application domains in engineering, science, and biomedical research. Grid computing has emerged as a viable solution to address these challenges.4–6 Using grid-enabled software built on open protocols and standard frameworks, users and institutions can share resources, including software, applications, data, and compute and storage platforms. VirtualPACS leverages Grid technologies to provide a platform for federated access to distributed image databases. In this section, we review Grid-enabled systems that have been developed to address the challenges associated with sharing, transferring, and processing images.7–9

Espert et al. create virtual repositories of distributed storage sites and provide support for global querying, auto-encryption, pseudo-anonymization, and progressive download using JPEG2000.10 The MammoGrid project is developing a European-wide database of mammograms using grid technologies.7 In the GRID Platform for Computer Assisted Library for Mammography (GPCALMA) project, a CAD system is developed for detection of micro-calcifications and masses and made available to hospitals via the Grid.11 In the Medicus project12, which is used by the Children’s Oncology Group, a computer with the Medicus software stack is deployed at each participating site. DICOM images are uploaded to this client, and the metadata is stored in an SQL database. Remote clients can send global queries and securely retrieve data from the remote sites leveraging standard technologies like GridFTP and the Grid Replication Service.

In addition to the use of grid technologies to provide federation, interoperability, and secure remote access, medical informatics and radiology have their own efforts to address these issues. The most notable among these is the IHE Cross-enterprise Document Sharing for Imaging Profile (XDS-I).13 IHE XDS-I, however, relies heavily on DICOM messaging for data communication and consequently cannot provide access to non-native DICOM resources. RSNA's MIRC initiative is another effort in the area of federation.14 However, it is not a lightweight solution: it requires the use of MIRC field centers, which can lead to data replication at the hosting site. To provide secure remote access, institutions often use Virtual Private Networks (VPN). However, as we describe in the security section of this paper, this approach is not scalable and is not always secure in collaborative projects where the users are from different institutions.

While all of these efforts address the issue of remote access to image data, they are designed as solutions for specific use-cases such as support for only an SQL backend in the Medicus project or tools and interfaces designed for mammograms only as in the MammoGrid project. Furthermore, the absence of a standardized data model makes these solutions hard to use in workflows and correlative studies involving nonimaging modalities. For example, a research workflow might involve correlation of in vivo imaging to pathology, proteomics, and genomics data. Unlike the other efforts, VirtualPACS is designed and developed as a generic, extensible platform with an architecture that places emphasis on standards and interoperability. The owners of the data control the data access, there is no replication, and data can be made available to the broader community as soon as it is made available at the remote DICOM storage site. Consumers of this data do not have to do anything special or change their existing operational workflow to get this federated access.

SYSTEM DESIGN AND ARCHITECTURE

Federation of remote image data sources requires: (1) a common data representation across data sources; (2) tools that can wrap heterogeneous data sources so that image data stored in a data source is exposed using the common data representation and accessed through well-defined interfaces; (3) a mechanism to discover data sources in the environment; (4) a mechanism to interact with secure and unsecure data sources and move results of a request efficiently from multiple data sources to data requester/consumer; and (5) a mechanism to translate a DICOM request to a request that can be passed to the data sources for execution, and the results of the request back to DICOM objects. To meet these requirements, the VirtualPACS platform is realized in a layered architecture, consisting of three layers (see Fig. 2). The requirements in items 1 and 2 are addressed by the mediation layer. The middleware layer implements the support to meet the requirements in items 3 and 4. The requirement in item 5 is supported by the presentation layer. In this section, we present these three layers, the interaction between the layers, and their implementation in the current release of VirtualPACS.

Fig 2

The layered design of the VirtualPACS and the functions of each of its three layers.

Presentation Layer

The presentation layer of VirtualPACS is a client-side application that provides DICOM-aware virtualization, i.e., a gateway to other image data resources that are exposed on the Grid through the middleware and mediation layers. Clients interact with the presentation layer using a set of DICOM requests. The presentation layer is designed to accept DICOM requests and objects using the DICOM Part 8 network communication protocols. Existing client applications do not need to be modified to access the federated resources on the Grid using VirtualPACS. This is one of the key distinguishing features of our solution when compared to other Grid-enabled solutions.7–9 We have tested VirtualPACS with a commercial DICOM workstation, Merge Healthcare eFilm, and with open source DICOM workstations such as Osirix15, K-PACS16, and PixelMed.17 All these workstations could successfully query, retrieve, and submit data from and to remote image data sources through a VirtualPACS instance.

When a client sends a DICOM request to the presentation layer, it is translated into a Grid request, which is handled and sent to the remote resources by the middleware layer. For example, if researchers in a cooperative group wish to run queries on data sources within their cooperative group, they send a C_FIND query to the VirtualPACS. The presentation layer converts this request into a query expressed in the caGrid Query Language (CQL) conforming to the NCIA DICOM Data Model and passes it to the middleware layer. The middleware layer sends the query to the participating image sources. These image sources are wrapped as caGrid data services by the mediation layer, which parses the CQL query and translates it into a query for the backend image source. As each site returns its results, they are transferred to the presentation layer through the middleware layer. The results are translated into a set of DICOM attributes and streamed, as they arrive, to the DICOM client that initiated the query. In this transaction, the workstation has no knowledge of the Grid and is configured only to speak to another DICOM PACS, i.e., VirtualPACS. This use-case can similarly be extended to data retrieval and submission.
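The fan-out and streaming behavior described above can be illustrated with a minimal Python sketch. It is not the VirtualPACS implementation: the function names, the in-memory record lists, and the simulated delays are all hypothetical stand-ins for remote caGrid data services. The sketch only shows the general pattern of sending one query to several sources and forwarding each source's results as soon as that source responds.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time


def query_source(name, delay, records, criteria):
    """Hypothetical stand-in for one remote data service.

    Sleeps to simulate network latency, then returns the records whose
    attributes equal every key/value pair in `criteria`.
    """
    time.sleep(delay)
    return [dict(r, source=name) for r in records
            if all(r.get(k) == v for k, v in criteria.items())]


def federated_find(sources, criteria):
    """Send one query to every source; yield matches as each source replies.

    `sources` maps a source name to a (delay_seconds, records) tuple.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = [pool.submit(query_source, name, delay, records, criteria)
                   for name, (delay, records) in sources.items()]
        # as_completed hands back futures in completion order, so a fast
        # site is not held up by a slow one.
        for future in as_completed(futures):
            yield from future.result()


if __name__ == "__main__":
    demo_sources = {
        "slow_site": (0.3, [{"patientName": "DOE^JOHN", "modality": "CT"}]),
        "fast_site": (0.0, [{"patientName": "DOE^JOHN", "modality": "MR"}]),
    }
    for match in federated_find(demo_sources, {"patientName": "DOE^JOHN"}):
        print(match)
```

In this sketch the client-facing side would convert each yielded record into DICOM attributes before forwarding it; that conversion step is omitted here.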

Middleware Layer

The middleware layer of VirtualPACS addresses the requirements outlined earlier in items 3 and 4. It leverages the caGrid infrastructure of the caBIG™ program for this purpose.2,3 caGrid provides the technological framework for common data representation, advertisement, discovery, and invocation of data and analytic resources. In caGrid, all resources are exposed as Grid services with well-defined interfaces and published data models to ensure semantic and syntactic interoperability. Services and clients interact with each other by communicating over the Grid using standard service invocation protocols.

The caGrid infrastructure consists of coordination services that are required by clients and services for Grid-wide functions. Coordination services include services for metadata management, service advertisement and discovery, and security.3 Using the metadata, advertisement, and discovery services of caGrid, VirtualPACS can discover image data sources in the environment and make them available to DICOM clients as a federated collection.

Mediation Layer

This layer provides support for data source owners to expose their image data resources through a well-defined interface and common image data model so that they can be federated in the VirtualPACS platform. This support is implemented using the In Vivo Imaging Middleware (IVIM) that is layered on top of the caGrid infrastructure. The image data is represented using a data model, produced by the NCIA, based on the DICOM standard. This model, referred to here as the NCIA DICOM Data Model, includes about 90 commonly used DICOM attributes. A small subset of this model illustrating some well-known attributes is shown in Figure 3. caGrid employs the NCI cancer Data Standards Repository (caDSR), the Enterprise Vocabulary System (EVS), and the Mobius Global Model Exchange (GME)18,19 to manage published data and information models. The NCIA DICOM data model is registered in the caDSR/EVS system and the corresponding XML schema in the GME. This makes it possible to share the model among image data providers in the community and develop image data services that expose heterogeneous collections of backend databases (e.g., image data in a relational database, images stored in files, or PACS) through a common data model.

Fig 3

A subset of the NCIA DICOM Data Model used in VirtualPACS and IVIM.

An image data source interfaces with the environment as a DICOM Data Service (DDS), provided in IVIM. This enables the presentation layer of the VirtualPACS to interact with the data sources and exchange data and requests using standard protocols. The presentation layer uses the client-side API of DDS to query, retrieve, and submit image data objects. DDSs use GridFTP20,21, a high-performance data exchange protocol with an associated implementation, for data transfer between the service and the clients. IVIM provides a default DDS implementation for PACS, which translates Grid requests into DICOM requests to the backend PACS. A PACS administrator can use this implementation to expose their PACS on the Grid. Providers of non-DICOM image sources can also expose their systems using DDS interfaces, but they are required to implement mechanisms to translate queries submitted to DDS into requests for their backend systems.
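To make the idea of a common service interface over heterogeneous backends concrete, the following Python sketch shows one possible shape for it. The class and method names (`ImageDataService`, `FileArchiveService`, `query`, `retrieve`) are hypothetical and are not the IVIM or DDS API; the host name in the usage note is likewise a placeholder. The sketch assumes a file-based archive with a simple in-memory index, and it returns transfer URLs rather than image bytes.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ImageDataService(ABC):
    """Hypothetical common interface that every wrapped source implements."""

    @abstractmethod
    def query(self, criteria: Dict[str, str]) -> List[Dict[str, str]]:
        """Return metadata records matching all criteria."""

    @abstractmethod
    def retrieve(self, criteria: Dict[str, str]) -> List[str]:
        """Return transfer URLs for the matching image objects."""


class FileArchiveService(ImageDataService):
    """Illustrative wrapper for a non-DICOM-messaging source: files plus an index."""

    def __init__(self, index: Dict[str, Dict[str, str]], transfer_base: str):
        self._index = index                      # file path -> metadata
        self._base = transfer_base.rstrip("/")   # base URL of the transfer server

    @staticmethod
    def _matches(meta: Dict[str, str], criteria: Dict[str, str]) -> bool:
        return all(meta.get(key) == value for key, value in criteria.items())

    def query(self, criteria):
        # Backend-specific translation: here, a scan over the in-memory index.
        return [dict(meta) for meta in self._index.values()
                if self._matches(meta, criteria)]

    def retrieve(self, criteria):
        # Pointers to the data are returned instead of the data itself.
        return [f"{self._base}/{path}" for path, meta in self._index.items()
                if self._matches(meta, criteria)]
```

A caller would construct, for example, `FileArchiveService(index, "gsiftp://archive.example.org")` and use it exactly as it would use a wrapper around a PACS, which is the point of the shared interface.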

A query to a DDS instance is represented in the caGrid Query Language (CQL), which is an XML-based, object-oriented query language.22 Each query specifies a target object (the result type) and a list of object properties (the name-predicate-value pairing of queried attributes). Figure 4 shows an example CQL query for a DICOM request to find patient data given the name of the patient and a study instance UID. When the DDS implementation provided by the IVIM receives a CQL query, its query processor validates and translates the CQL query into a DICOM C_FIND request. The results of C_FIND are converted into the NCIA DICOM data model, transmitted as SOAP messages and returned as objects nested in a CQL Query Result Set. Retrieval and submission of data use a slightly different mechanism. While the retrieval and submission requests are sent as CQL query objects and are translated and processed as appropriate DICOM requests (C_GET/C_MOVE or C_STORE), the service does not return the data as SOAP messages. Instead, it moves the data to its GridFTP server and returns the list of GridFTP URLs that are used as pointers to the locations that the presentation layer can use to download/upload data.
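As a rough illustration of the query structure described above (a target object plus name-predicate-value attribute constraints), the following Python sketch assembles a CQL-like XML document. The element and attribute names are modeled loosely on that description and should be treated as assumptions; the authoritative structure is defined by the caGrid CQL schema. The class names (`Patient`, `Study`), the attribute names, and the UID value in the usage note are hypothetical placeholders, not values from the NCIA DICOM Data Model.

```python
import xml.etree.ElementTree as ET


def build_cql_like_query(target, attributes, association=None):
    """Build a simplified, CQL-like query document as a string.

    target      -- name of the object type to return (the result type)
    attributes  -- dict of attribute name -> required value on the target
    association -- optional (name, dict) pair constraining a related object
    """
    root = ET.Element("CQLQuery")
    target_el = ET.SubElement(root, "Target", name=target)
    # All constraints must hold, so they are grouped with a logical AND.
    group = ET.SubElement(target_el, "Group", logicRelation="AND")
    for name, value in attributes.items():
        ET.SubElement(group, "Attribute",
                      name=name, predicate="EQUAL_TO", value=value)
    if association is not None:
        assoc_name, assoc_attributes = association
        assoc_el = ET.SubElement(group, "Association", name=assoc_name)
        for name, value in assoc_attributes.items():
            ET.SubElement(assoc_el, "Attribute",
                          name=name, predicate="EQUAL_TO", value=value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cql_like_query(
        "Patient",
        {"patientName": "DOE^JOHN"},
        association=("Study", {"studyInstanceUID": "1.2.3.4"}),  # placeholder UID
    ))
```

A service-side query processor would perform the reverse mapping, reading each constraint back out of the document and setting the corresponding matching key in the backend request.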

Fig 4

Simple DICOM query to retrieve Patients with name John Doe and Study Instance UID = 1.3...1832. Also shown is the corresponding CQL statement for this query and the subset of the NCIA DICOM Data model used in constructing the CQL statement.

Security in VirtualPACS

Secure and authorized access to data is of critical importance, especially when the data contains clinical information. As the presentation layer acts as a gateway between a DICOM network and the Grid, security at both its DICOM interface and Grid interface must be considered (see Fig. 5). A common security policy employed by existing DICOM entities is to secure access from within an institution through a trusted network of preauthorized devices. Devices whose IP addresses and AETitles are not in the preauthorized device list are denied access. Access from outside an institutional firewall is usually provided through VPN tunnels. These user authentication strategies can become problematic in use-cases involving remote review and collaborative environments, where the authorized remote user of data is not a member of the institution that is hosting the data. As the number of such remote users increases, it can become impractical to issue credentials and VPN access, as such privileges make them trusted users of the entire institution, allowing access to all DICOM devices and all data on these devices. Furthermore, this strategy does not always address data-level authorization, which is critical when privacy and IRB requirements are taken into consideration for research and cross-institutional collaborations.

Fig 5

Schematic showing the stages of VirtualPACS security when DICOM request is issued.

To secure request/response transactions across the middleware layer and authorize them at the mediation layer, VirtualPACS relies on the caGrid security infrastructure, called GAARDS.23 The GAARDS infrastructure enables federation of user credentials, distributed role-based access control, secure and encrypted data transfer, and management of a trust fabric. It provides single sign-on and can use an individual’s institutional identity provider or a grid-based virtual organization identity to generate a grid credential that is used for authenticating and authorizing the individual across the grid. Collaborative projects can make use of this credential to authenticate users from collaborating institutions to distributed resources and allow resource providers to implement and enforce their own access policies on their grid resources.

The VirtualPACS user interface has been designed with the necessary plug-ins for a user to login and obtain credentials (see Fig. 5) through grid identity providers such as the GAARDS Dorian service24 or their local institutional identity provider. Once the user has logged in, a proxy certificate is generated and stored in the local VirtualPACS instance, which is then transmitted to remote grid services during the query, retrieve, and submit process, and used for authentication, authorization, and securing the transfer of DICOM data to and from the VirtualPACS.

We anticipate that the presentation layer will run behind institutional firewalls and can leverage existing IT security infrastructures for all activities on its DICOM interface; i.e., for all local transactions, it will use the local security infrastructure and policies. On its caGrid interface, all traffic is initiated by the presentation layer and therefore does not require any inbound firewall exceptions. There is, however, a potential loophole caused by differences in security capabilities between DICOM and the middleware, as the following scenario illustrates. Assume user A starts a VirtualPACS instance with their grid credentials. At this time, user B, who is also authorized to use the same VirtualPACS, can send DICOM requests and access data that only user A is authorized to access. To prevent this unauthorized access by user B, VirtualPACS provides the option of generating a unique session-based AETitle instead of using a predefined AETitle. When it is configured to use unique session-based AETitles, VirtualPACS generates a new AETitle each time a user logs in. The user is then required to configure their workstation to communicate with the VirtualPACS using this new AETitle.
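The session-based AETitle option can be sketched as follows. This is an illustrative Python example under stated assumptions, not the VirtualPACS code: the `VPACS` prefix, the `SessionRegistry` class, and its method names are hypothetical. The only external constraint relied on is that a DICOM Application Entity title is at most 16 characters long.

```python
import secrets
import string

MAX_AE_LENGTH = 16  # maximum length of a DICOM Application Entity title
_ALPHABET = string.ascii_uppercase + string.digits


def new_session_aetitle(prefix="VPACS"):
    """Return a hard-to-guess AETitle of the form PREFIX_XXXXXXXXXX."""
    suffix_length = MAX_AE_LENGTH - len(prefix) - 1
    if suffix_length < 4:
        raise ValueError("prefix leaves too little room for a random suffix")
    suffix = "".join(secrets.choice(_ALPHABET) for _ in range(suffix_length))
    return f"{prefix}_{suffix}"


class SessionRegistry:
    """Hypothetical map from active session AETitles to logged-in users."""

    def __init__(self):
        self._active = {}

    def login(self, user):
        """Create a fresh AETitle for this login and remember its owner."""
        title = new_session_aetitle()
        self._active[title] = user
        return title

    def user_for(self, called_aetitle):
        """Return the session owner, or None if the title is not active."""
        return self._active.get(called_aetitle)

    def logout(self, title):
        """Invalidate the session so the title can no longer be used."""
        self._active.pop(title, None)
```

In this sketch, an incoming DICOM association whose called AETitle is unknown to the registry would be rejected, so user B cannot reach user A's session without knowing the title issued at login.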

PERFORMANCE RESULTS AND CONCLUSIONS

We have evaluated the data transfer rates of VirtualPACS for accessing images from a remote site. In this experiment, DICOM data of multiple patients were ingested into an open-source PACS server built using the open-source DICOM toolkit known as PixelMed.17 This PACS server was running on one of three nodes (Node 1) of a 64-node Linux cluster located at the Department of Biomedical Informatics at The Ohio State University. A second node (Node 2) hosted a DDS providing access to the DICOM PACS server running on Node 1. A third node (Node 3) acted as the local workstation retrieving the data. The three nodes are equipped with dual 2.4 GHz Opteron 250 processors, 8 GB of RAM, and a 35 TB RAID5 disk array. The nodes are interconnected with dedicated gigabit Ethernet.

The DICOM image datasets used in these experiments consisted of thoracic CT images of multiple patients. Sets of queries were created to extract a subset of patient series. These queries were constructed so that their responses would return between 40 MB and 4 GB of data. These queries were then run by Node 3 using the VirtualPACS infrastructure, and the retrieval times were recorded. The same queries were also run by a DICOM client (built using the PixelMed toolkit) to retrieve the data from the PACS server on Node 1.

As shown in Figure 6, the system achieved high rates in transferring data between a remote PACS and a local workstation. Line A shows the time it takes to retrieve the series when the data is queried and retrieved by Node 3 in a single batch transaction. Line B shows the same transaction where a single query was executed by Node 3 and the data was downloaded progressively (i.e., the data appears continuously on Node 3). Line C shows the same set of DICOM queries when run by a DICOM client (also built using the PixelMed toolkit) to retrieve the data from the PACS server on Node 1. Using the retrieval times by DICOM messaging (Line C) as a reference, we can see that our approach is not significantly worse than DICOM messaging, while at the same time providing a DICOM client with secure access to a federation of DICOM data sources that may or may not support DICOM messaging.

Fig 6

Performance of VirtualPACS in retrieving remote DICOM data of different sizes using two different retrieval techniques. A base comparison line (line C) shows the time it takes to retrieve the data directly from the PACS using DICOM protocols. The DICOM retrieval used an open source PACS server and DICOM client, both of which are built using the PixelMed toolkit.

In this paper, we presented a framework and a toolkit for secure, efficient, and standards-based access to remote imaging data. This work has the potential to play a key role in enhancing multi-institutional collaborations in biomedical imaging studies. We are currently working on extensions to this toolkit that target support for other DICOM objects such as structured reports and work lists. We are also developing security plug-ins that can leverage the DICOM security standard for transactions between a DICOM client and the presentation layer of VirtualPACS. In addition, we are working with NCIA and ACRIN to develop grid interfaces for their respective repositories, which will provide authorized users access to these large image archives using their own DICOM workstations.

VirtualPACS and all its software components are publicly available, and instructions on how to download and use them can be found on its website: http://www.virtualpacs.org.