Abstract
Medical image classification (for example, classifying lesions on MRI scans) is challenging because of the complicated relationships between lesion sub-types and the high cost of collecting high-quality labelled training datasets. Graph models have been used successfully to model these complicated relationships for medical image classification in many previous works. However, most existing graph-based models assume the graph structure is known or pre-defined, and classification performance depends heavily on that pre-defined structure. To address these problems, we propose to jointly learn the graph structure and use it for the classification task in one framework. Besides imaging features, we also use disease semantic features (learned from clinical reports) and a predefined lymph node ontology graph to construct the graph structure. We evaluated our model on a T2 MRI dataset with 821 samples and 14 types of lymph nodes. Although this dataset is highly unbalanced across lymph node types, our model shows promising classification results on this challenging dataset compared to several state-of-the-art methods.
1 Introduction
Accurate classification of lymph nodes has important clinical value for the diagnosis and prognosis of numerous diseases such as metastatic cancer, and can assist radiologists in locating abnormalities and diagnosing different diseases. Developing machine learning methods to automatically identify different lymph nodes from MRI scans is challenging because of the similar morphological structures of lymph nodes, the complicated relationships between different lymph node types, and the expensive cost of collecting large-scale labelled image datasets. Labelling lymph node types from MRI scans is time consuming and expensive, since it requires a well-trained radiologist to inspect each MRI slice; it is therefore not realistic to collect a large labelled dataset to train a deep learning system.
In this work, instead of manually labelling images, we extracted lymph node key words from the clinical notes that experienced radiologists associated with the MRI scans, and used the extracted key words as classification labels for the corresponding MRI images. We extracted 14 different types of lymph nodes from the clinical reports (as shown in Fig. 3) for 821 T2 MRI key slices. It is worth noting that our dataset is highly unbalanced across lymph node types: some types have more than 80 training images while others have fewer than 10. Given this small, unbalanced training dataset, it is very challenging to train a classification model with high accuracy and generalizability.
Motivated by recent successes in using semantic information for image classification and image captioning [10, 11], we propose to leverage semantic features learned from clinical notes, along with a lymph node ontology graph predefined by radiologists (as shown in Fig. 2), for lymph node classification. We learn a semantic feature embedding from the clinical reports and use it to capture the semantic relationships between different lymph node types. We then combine this semantic embedding space with the ontology graph (shown in Fig. 1(b)) to construct a knowledge graph that guides our classification task. Moreover, most existing graph models assume that the graph structure is known or predefined; when the structure is unknown, they typically apply K-nearest neighbors to image features to construct a graph for the downstream classification task [2, 8]. Defining the graph structure is critical for graph node classification, and recent work shows that learning the structure can significantly boost the performance of graph-based classification [12]. In this work, we therefore propose a joint graph structure learning and classification framework with prior information from semantic features (learned from clinical reports) and a radiologist-defined lymph node ontology graph. We evaluated the proposed model on lymph node classification in T2 MRI scans and show consistent improvement over several state-of-the-art methods given a small and imbalanced training dataset.
2 Method
Graph Learning for Classification. We define a graph \({G}= (\mathbf {V}, E)\), where \(\mathbf {V}\) is the set of vertices or nodes (we use "nodes" throughout the paper) and \(E\) is the set of edges. Let \(\mathbf {v}_i\in \mathbf {V}\) denote a node and \(e_{ij} = (\mathbf {v}_i,\mathbf {v}_j)\in E\) denote an edge pointing from \(\mathbf {v}_j\) to \(\mathbf {v}_i\). The adjacency matrix \(\mathbf {\Lambda }\) is an \(n \times n\) matrix with \(\mathbf {\Lambda }_{ij} = 1\) if \(e_{ij} \in E\) and \(\mathbf {\Lambda }_{ij} = 0\) if \(e_{ij} \notin E\). We only consider undirected graphs in this work, so \(\mathbf {\Lambda }\) is symmetric; it is straightforward to extend our framework to directed graphs with different graph structure regularization terms. Each node has a feature vector \(x_i \in \mathbb {R}^{d\times 1}\), and \(\mathbf {X} =[x_1,\cdots ,x_n] \in \mathbb {R}^{n\times d}\) is the feature matrix of all nodes in the graph. Let \(\mathbf {y}_i\) denote the class label of node \(x_i\) and \(\mathbf {Y}= [\mathbf {y}_1,\cdots ,\mathbf {y}_n]\) denote the labels of all nodes on the graph. Conventional graph learning methods define a function \(f(\mathbf {X},\mathbf {\Lambda })=\mathbf {Y}\) that takes the node feature matrix \(\mathbf {X}\) and the graph adjacency matrix \(\mathbf {\Lambda }\) as input and estimates the class label of each node; f can be a simple linear model or a graph neural network. Classic graph learning solves the following problem to learn f,
\( \min _{f} \; \lambda _0 \Vert f(\mathbf {X},\mathbf {\Lambda }) - \mathbf {Y}\Vert _p + \lambda _1\, \mathrm {tr}\left( f(\mathbf {X},\mathbf {\Lambda })^\top \mathbf {L}\, f(\mathbf {X},\mathbf {\Lambda })\right) , \)
where \(\Vert \cdot \Vert _p\) denotes the \(L_p\) norm; the first term is the label prediction term and the second is the Laplacian regularization term, with \(\mathbf {L} = \mathbf {I} - \mathbf {D}^{-1/2}\mathbf {\Lambda }\mathbf {D}^{-1/2}\) the normalized graph Laplacian, where \(\mathbf {I}\) is the identity matrix and \(\mathbf {D}\) is the diagonal degree matrix. The same objective applies when f is a multi-layer deep graph neural network, in which case it is optimized over the network weights.
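As a concrete illustration, the normalized Laplacian and the two-term objective above can be sketched in NumPy with a simple linear model \(f(\mathbf {X},\mathbf {\Lambda }) = \mathbf {X}\mathbf {W}\); all function and variable names here are illustrative, not from the paper's implementation.

```python
import numpy as np

def normalized_laplacian(adj):
    """L = I - D^{-1/2} * Adj * D^{-1/2} for a symmetric adjacency matrix."""
    deg = adj.sum(axis=1).astype(float)
    d_inv_sqrt = np.zeros_like(deg)
    nz = deg > 0                                   # avoid dividing by zero degree
    d_inv_sqrt[nz] = deg[nz] ** -0.5
    return np.eye(len(adj)) - adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_objective(W, X, adj, Y, lam0=1.0, lam1=0.1):
    """Label-prediction term plus Laplacian smoothness term for f(X) = X @ W."""
    L = normalized_laplacian(adj)
    F = X @ W                                      # node-wise predictions
    fit = np.linalg.norm(F - Y) ** 2               # label prediction term
    smooth = np.trace(F.T @ L @ F)                 # penalizes disagreeing neighbors
    return lam0 * fit + lam1 * smooth
```

The trace term is small when connected nodes receive similar predictions, which is exactly the smoothness prior the Laplacian regularizer encodes.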
These graph learning methods assume that the graph structure is predefined (the adjacency matrix \(\mathbf {\Lambda }\) is given), which is not the case in many applications. For example, the connection or similarity between different patients or different types of lymph nodes is hard to define. In practice, many previous works simply use K-nearest neighbors to build the graph adjacency matrix and then use it for label propagation on the graph. Recent works have shown that defining an optimal graph structure is crucial for label classification on the graph, and that jointly optimizing label propagation and graph structure learning can significantly improve the performance and generalizability of graph models [2].
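The K-nearest-neighbor construction mentioned above can be sketched as follows (a minimal NumPy sketch; the function name is illustrative):

```python
import numpy as np

def knn_adjacency(X, k=3):
    """Symmetric k-nearest-neighbor adjacency built from node features X (n x d)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                          # exclude self-loops
    nbrs = np.argsort(d2, axis=1)[:, :k]                  # k closest nodes per row
    A = np.zeros_like(d2)
    A[np.arange(len(X))[:, None], nbrs] = 1.0
    return np.maximum(A, A.T)                             # symmetrize (undirected)
```

The symmetrization step matters: raw kNN edges are directed (i may be among j's neighbors without the reverse holding), while the paper's formulation assumes an undirected graph.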
Jointly Learning Graph Structure and Classification. We propose to learn the graph structure \(\mathbf {\Lambda }\) jointly with graph label propagation. Let \(\mathbf {\Lambda }= g(\mathbf {X}) \) be the function that learns the graph structure; we define the new loss function as,
\( \min _{f, g} \; \lambda _0 \Vert f(\mathbf {X}, g(\mathbf {X})) - \mathbf {Y}\Vert _p + \lambda _1\, \mathrm {tr}\left( f^\top \mathbf {L}\, f \right) . \)
There are many ways to construct the structure learning function g; for example, previous work [2] used Bernoulli sampling to learn the graph structure over discrete nodes, and others used sparse and low-rank subspace learning. We follow [1, 6, 13] and use sparse and low-rank subspace learning in this work, although it is straightforward to plug in a different graph structure learning method. We reformulate our objective function as,
\( \min _{f, \mathbf {\Lambda }} \; \lambda _0 \Vert f(\mathbf {X},\mathbf {\Lambda }) - \mathbf {Y}\Vert _p + \lambda _1\, \mathrm {tr}\left( f^\top \mathbf {L}\, f \right) + \lambda _2 \Vert \mathbf {\Lambda }\Vert _* + \lambda _3 \Vert \mathbf {\Lambda }\Vert _1 , \)
where \(\mathbf {\Lambda }\) is encouraged to be low rank via the nuclear norm \(\Vert \cdot \Vert _*\) and sparse via the \(L_1\) norm.
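Nuclear- and \(L_1\)-norm penalties of this kind are commonly handled with their proximal operators; a minimal sketch under that standard approach (the paper does not spell out its solver, so the function names are illustrative):

```python
import numpy as np

def prox_l1(M, t):
    """Soft-thresholding: proximal operator of t * ||M||_1 (promotes sparsity)."""
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)

def prox_nuclear(M, t):
    """Singular-value soft-thresholding: prox of t * ||M||_* (promotes low rank)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt
```

Applying `prox_l1` zeroes out small entries of the adjacency matrix (sparse edges), while `prox_nuclear` shrinks small singular values to zero (low-rank structure).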
Predefined Knowledge Graph. Many previous works have shown that expert knowledge is tremendously helpful for medical data analysis, especially when the labelled training set is small. In this work we have only 821 labelled MRI slices as training data, which is very small for training a graph neural network. To further improve our model, we extracted a knowledge graph from the radiologist-labelled lymph node ontology shown in Fig. 2: for all labelled training images/nodes, we construct undirected edges between them if they are connected in the predefined ontology graph.
Besides the ontology graph, since each MRI slice in our dataset has an associated clinical report, we also use the MRI reports (shown in Fig. 3) to train a report classification model based on pre-trained BERT [4] and apply attention pooling to extract a semantic vector for each lymph node type. Based on the similarity between these semantic features and the ontology graph, we construct a knowledge graph for node label classification. Denoting the adjacency matrix of the knowledge graph as \(\mathbf {A}_{kg}\), our problem is further formulated as,
\( \min _{f, \mathbf {\Lambda }} \; \lambda _0 \Vert f(\mathbf {X},\mathbf {\Lambda }) - \mathbf {Y}\Vert _p + \lambda _1\, \mathrm {tr}\left( f^\top \mathbf {L}\, f \right) + \beta \Vert \mathbf {\Lambda }- \mathbf {A}_{kg}\Vert _F^2 , \quad \text {s.t. } \Vert \mathbf {\Lambda }\Vert _* \le c_1, \; \Vert \mathbf {\Lambda }\Vert _1 \le c_2 , \)
where \(\beta \) is a hyper-parameter that weights the knowledge-graph prior. To solve Eq. 6, we use Lagrange multipliers to move the constraints into the objective function,
\( \min _{f, \mathbf {\Lambda }} \; \lambda _0 \Vert f(\mathbf {X},\mathbf {\Lambda }) - \mathbf {Y}\Vert _p + \lambda _1\, \mathrm {tr}\left( f^\top \mathbf {L}\, f \right) + \lambda _2 \Vert \mathbf {\Lambda }\Vert _* + \lambda _3 \Vert \mathbf {\Lambda }\Vert _1 + \beta \Vert \mathbf {\Lambda }- \mathbf {A}_{kg}\Vert _F^2 . \)
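One plausible way to assemble \(\mathbf {A}_{kg}\) from the two knowledge sources described above is to connect class pairs that are either linked in the ontology or close in the report-derived semantic embedding space; a hedged sketch (the threshold `tau` and all names are assumptions, not the paper's exact recipe):

```python
import numpy as np

def knowledge_graph(sem_emb, onto_edges, tau=0.9):
    """Knowledge-graph adjacency over lymph-node classes: connect two classes if
    their report-derived semantic embeddings have cosine similarity above tau,
    or if they are linked in the predefined ontology graph."""
    E = sem_emb / np.linalg.norm(sem_emb, axis=1, keepdims=True)  # unit-normalize
    A = (E @ E.T > tau).astype(float)                             # semantic edges
    for i, j in onto_edges:                                       # ontology edges
        A[i, j] = A[j, i] = 1.0
    np.fill_diagonal(A, 0.0)                                      # no self-loops
    return A
```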
Graph Convolutional Neural Networks. Graph convolutional neural networks have shown great success in many applications [3, 14], and our framework combines easily with them. Let \(\mathbf {H}_l\) denote the hidden state of layer l of a graph neural network and \(\mathbf {W}_l\) the weights of that layer; a standard graph convolutional network is formulated as \( \mathbf {H}_{l+1} =f(\mathbf {W}_l,\mathbf {\Lambda },\mathbf {H}_{l}) \), where the input is the image feature matrix \(\mathbf {X}\) and the output is the predicted label matrix; thus \(\mathbf {H}_{0} = \mathbf {X}\), \(\mathbf {H}_{last}\) estimates \(\mathbf {Y}\), and \(\mathbf {H}=[\mathbf {H}_0,\cdots ,\mathbf {H}_{last}]\). We can rewrite Eq. 7 with a graph convolutional network as,
\( \min _{\mathbf {W}, \mathbf {\Lambda }} \; \lambda _0 \Vert \mathbf {H}_{last} - \mathbf {Y}\Vert _p + \lambda _1\, \mathrm {tr}\left( \mathbf {H}_{last}^\top \mathbf {L}\, \mathbf {H}_{last} \right) + \lambda _2 \Vert \mathbf {\Lambda }\Vert _* + \lambda _3 \Vert \mathbf {\Lambda }\Vert _1 + \beta \Vert \mathbf {\Lambda }- \mathbf {A}_{kg}\Vert _F^2 . \)
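A single layer of the standard graph convolution \(\mathbf {H}_{l+1} = f(\mathbf {W}_l, \mathbf {\Lambda }, \mathbf {H}_l)\) can be sketched with the usual renormalized propagation rule (a generic GCN sketch, not the paper's exact architecture):

```python
import numpy as np

def gcn_layer(H, A, W):
    """One graph-convolution layer: ReLU(D^{-1/2} (A + I) D^{-1/2} @ H @ W)."""
    A_hat = A + np.eye(len(A))                 # add self-loops
    d = A_hat.sum(axis=1) ** -0.5              # degrees >= 1, so no zero division
    A_norm = A_hat * d[:, None] * d[None, :]   # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)     # propagate, transform, ReLU
```

Stacking such layers, with \(\mathbf {H}_0 = \mathbf {X}\) and a softmax on the last layer, yields the \(f(\mathbf {W}, \mathbf {\Lambda }, \cdot )\) used in the objective above.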
Optimization Method. It is non-trivial to optimize both the graph structure and the node classifier on the same dataset. To solve Eq. 7, we propose a bi-level optimization method that learns the graph structure \(\mathbf {\Lambda }\), the GNN weights \(\mathbf {W}\), and the labels \(\mathbf {Y}\) jointly: the graph structure adjacency matrix \(\mathbf {\Lambda }\) is learned on the validation dataset, while the GNN weights \(\mathbf {W}\) are learned on the training dataset. The hyperparameters \(\lambda _0,\lambda _1,\lambda _2,\lambda _3,\beta \) are likewise tuned on the validation dataset. The detailed optimization algorithm is shown in Algorithm 1.
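The alternating bi-level scheme can be illustrated with a deliberately tiny toy problem: a linear "one-layer GNN" \(f = \mathbf {\Lambda }\mathbf {X}\mathbf {W}\) under a squared loss, with the inner loop fitting \(\mathbf {W}\) on the training split and the outer loop updating \(\mathbf {\Lambda }\) on the validation split. All sizes, learning rates, and names are illustrative assumptions, not the paper's Algorithm 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, c = 12, 5, 3
X = rng.normal(size=(n, d))
Y = np.eye(c)[rng.integers(0, c, size=n)]        # one-hot node labels
train, val = np.arange(8), np.arange(8, 12)      # train / validation splits

def grads(W, Lam, idx):
    """Gradients of the mean squared loss on split idx w.r.t. W and Lam."""
    P = Lam @ X @ W
    R = np.zeros_like(P)
    R[idx] = 2.0 * (P[idx] - Y[idx]) / idx.size
    return (Lam @ X).T @ R, R @ (X @ W).T

W, Lam = np.zeros((d, c)), np.eye(n)
for _ in range(50):                              # outer loop: structure on val
    for _ in range(20):                          # inner loop: weights on train
        gW, _ = grads(W, Lam, train)
        W -= 0.05 * gW
    _, gL = grads(W, Lam, val)
    Lam -= 0.05 * gL
    Lam = 0.5 * (Lam + Lam.T)                    # keep the graph undirected
```

Separating the two splits is the point of the bi-level design: it prevents the learned structure from simply memorizing the training labels.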
3 Experiments
Dataset. For model development and validation, we collected large-scale MRI studies from **, performed between Jan 2015 and Sept 2019, along with their associated radiology reports. The majority (63%) of the MRI studies were from the oncology department. The dataset consists of a total of 821 T2-weighted MRI axial slices from 584 unique patients. The lymph node labels were extracted by a radiologist with 8 years of post-graduate experience. This retrospective study was approved by the Institutional Review Board with a waiver of informed consent. The dataset comprised the reference (gold) standard for our evaluation and comparative analysis.
Benchmark Methods. We implemented several benchmark methods in our experiments. 1) Support Vector Machine (SVM) [7]: classical SVM applied to the extracted multi-scale bounding box features. 2) Structured SVM: an SVM constrained to output structural labels consistent with the knowledge graph in Fig. 2. 3) Standard Simple Graph Model (SG). 4) SG with Graph Structure Learning (SG+SL). 5) SG with SL and Predefined Knowledge Graph (SG+SL+KG). 6) Deep Neural Graph (GCN). 7) GCN with Graph Structure Learning (GCN+SL). 8) GCN with SL and Predefined Knowledge Graph (GCN+SL+KG). 9) Deep Neural Hyper-Graph (HGCN) [9]. 10) HGCN with Graph Structure Learning (HGCN+SL). 11) HGCN with SL and Predefined Knowledge Graph (HGCN+SL+KG). We use the same lymph node image feature embedding framework for all competing methods, so that performance differences reflect the classification models rather than the features.
Experiment Setting and Data Processing. We divided the dataset into 10 folds, using one fold for validation, two folds for testing, and the remaining seven folds for training. We ran the cross-validation 10 times and report the averaged top-k (k = 1, 2, 3) classification accuracy over the different lymph node types, along with the F1-score and AUC of binary classification performance. In our dataset, we have access to the clinical report of each MRI scan. The radiologist describes the lymph node information, including labels, size measurements, and slice numbers, in a sentence with a hyperlink (called a bookmark) referring to the related MRI slices; a bookmark connects the annotation in the image to its description in the report. One experienced radiologist extracted the lymph node labels from the bookmark-linked sentences, and we use them as ground-truth labels for the lymph nodes in the connected MRI slices.
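The top-k accuracy used here counts a sample as correct when its true label appears among the k highest-scoring classes; a minimal sketch (function name illustrative):

```python
import numpy as np

def topk_accuracy(scores, labels, k=3):
    """Fraction of samples whose true label is among the k highest class scores."""
    topk = np.argsort(scores, axis=1)[:, -k:]          # k best classes per sample
    hits = [labels[i] in topk[i] for i in range(len(labels))]
    return float(np.mean(hits))
```

With 14 classes and fewer than 10 training images for some of them, top-2/top-3 accuracy gives a fairer picture of performance than strict top-1.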
The size of each lymph node is measured with four points, or with two points, at its maximum dimension. Based on these key points, we extracted multi-scale bounding boxes around the lymph nodes and computed features in these boxes with a CNN pretrained on MRI slices. We further used graph-cut to extract fine contours of the lymph nodes and embedded the segmented lymph nodes with the same pretrained CNN. We concatenated all multi-scale bounding box features and lymph node features into the graph node feature representation; the concatenated feature vector has length 25088. We used pre-trained BioBERT to train the clinical note classification model and label attention to extract the semantic label embeddings. We used more than 28000 sentences of de-identified clinical reports from ** hospital to embed the distances between the different lymph node names in our dataset. Based on these semantic distances, we constructed a semantic embedding graph and combined it with the predefined ontology graph in Fig. 2 to refine the final label predictions.
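The multi-scale cropping step can be sketched as follows: square boxes of several scales are cut around a lymph-node key point, with edge padding so crops near the border keep their size (scales and names are illustrative; the paper does not list its exact crop sizes):

```python
import numpy as np

def multiscale_crops(img, center, scales=(32, 64, 128)):
    """Square crops of several scales centered on a lymph-node key point,
    edge-padded so every crop has its full nominal size."""
    cy, cx = center
    crops = []
    for s in scales:
        half = s // 2
        pad = np.pad(img, half, mode="edge")   # shift coords by `half` per side
        crops.append(pad[cy:cy + s, cx:cx + s])
    return crops
```

Each crop would then be fed to the pretrained CNN, and the resulting embeddings concatenated into the node feature vector described above.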
Quantitative Results. We compared our proposed model to several benchmarks; Table 1 shows the top-k mean accuracy and F1 score of the 14-class classification under 10-fold cross validation. Top-k accuracy has been broadly used to measure multi-class classification performance in previous work [5]. The simple graph model generally outperforms the SVM methods, and the structured SVM improves on the classical SVM by more than 0.03 in both accuracy and F1 score by adding structured constraints on the classes (extracted from the pre-defined ontology graph). Learning the graph structure improves the top-k accuracy of the simple graph model by more than 0.03 and the F1 score by more than 0.02, and adding the knowledge graph improves the simple graph model by a further 0.03 in both metrics. The graph convolutional model also consistently improves top-k accuracy and F1 score over the simple graph model. With both graph structure learning and the knowledge graph under the graph convolutional framework, our proposed model achieves the best performance, reaching about 0.91 top-3 accuracy and 0.90 top-3 F1 score. Combining graph structure learning with the convolutional hyper-graph model improves accuracy and F1-score by more than 2% over the convolutional graph model; the best top-3 accuracy and F1 score achieved by the hyper-graph model are 93% and 0.92, respectively.
References
Chen, J., Yang, J.: Robust subspace segmentation via low-rank representation. IEEE Trans. Cybern. 44(8), 1432–1445 (2014)
Franceschi, L., Niepert, M., Pontil, M., He, X.: Learning discrete structures for graph neural networks. In: Proceedings of the 36th International Conference on Machine Learning (2019)
Hamilton, W.L., Ying, R., Leskovec, J.: Representation learning on graphs: Methods and applications. CoRR abs/1709.05584 (2017)
Lee, J., et al.: BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36(4), 1234–1240 (2019)
Lu, J., Xu, C., Zhang, W., Duan, L.Y., Mei, T.: Sampling wisely: deep image embedding by top-k precision optimization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), October 2019
Sui, Y., Wang, G., Zhang, L.: Sparse subspace clustering via low-rank structure propagation. Pattern Recogn. 95, 261–271 (2019)
Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y.: Large margin methods for structured and interdependent output variables. J. Mach. Learn. Res. 6, 1453–1484 (2005)
Wu, Z., Pan, S., Chen, F., Long, G., Zhang, C., Yu, P.S.: A comprehensive survey on graph neural networks. CoRR abs/1901.00596 (2019)
Yadati, N., Nimishakavi, M., Yadav, P., Louis, A., Talukdar, P.P.: HyperGCN: hypergraph convolutional networks for semi-supervised classification. CoRR abs/1809.02589 (2018)
Zareian, A., Karaman, S., Chang, S.: Bridging knowledge graphs to generate scene graphs. In: CVPR (2020)
Zhang, D., et al.: Knowledge graph-based image classification refinement. IEEE Access 7, 57678–57690 (2019)
Zhou, Y., Sun, Y., Honavar, V.G.: Improving image captioning by leveraging knowledge graphs. CoRR abs/1901.08942 (2019)
Zhu, X., Zhang, S., Li, Y., Zhang, J., Yang, L., Fang, Y.: Low-rank sparse subspace for spectral clustering. IEEE Trans. Knowl. Data Eng. 31(8), 1532–1543 (2019)
Zhu, Y., Zhu, X., Kim, M., Yan, J., Kaufer, D., Wu, G.: Dynamic hyper-graph inference framework for computer-assisted diagnosis of neurodegenerative diseases. IEEE Trans. Med. Imaging 38(2), 608–616 (2019)
Acknowledgment
This research was supported in part by the Intramural Research Program of the National Institutes of Health, Clinical Center and National Library of Medicine.
Cite this paper
Zhu, Y. et al. (2021). Learning Structure from Visual Semantic Features and Radiology Ontology for Lymph Node Classification on MRI. In: Lian, C., Cao, X., Rekik, I., Xu, X., Yan, P. (eds) Machine Learning in Medical Imaging. MLMI 2021. Lecture Notes in Computer Science(), vol 12966. Springer, Cham. https://doi.org/10.1007/978-3-030-87589-3_11