1 Introduction

The bag-of-words model is widely used in many natural language processing tasks, and mature techniques exist for generating a bag of words (BoW) [4]. However, a BoW is merely a collection of scattered words and is difficult for machines or humans to understand without explicit semantic analysis. Conceptualization-based methods, i.e., conceptual labeling (CL), aim to generate conceptual labels for a BoW to explicitly represent its semantics. In [3, 5, 6], a BoW is first divided into multiple groups according to semantic relevance, and each group is then labeled with a concept that specifically summarizes its explicit semantics.

In this paper we propose the task of hierarchical conceptual labeling (HCL), which represents the semantics of a BoW by hierarchical conceptual labels, i.e., label sets of different granularities. For example, given the BoW {China, Japan, France, Germany, Russia}, the hierarchical conceptual labels can be {Asian country, EU state} and {country}. In general, hierarchical labels carry more information, which allows applications to select labels of the appropriate abstractness for their requirements.

We adopt the hierarchical clustering algorithm Bayesian rose tree (BRT) [1] as our framework for generating hierarchical conceptual labels, where the candidate concepts are drawn from the knowledge base Microsoft Concept Graph (MCG) [8]. We also propose a simple but effective method to remove noise words before the conceptualization step.

2 Framework

We first present how to filter out the noise words in a BoW. Then we elaborate on the generation process of hierarchical conceptual labels.

2.1 Filtering Out Noise

The basic idea is as follows: if a word in a BoW cannot be semantically clustered with any other word, i.e., cannot be tagged with the same conceptual label as any other word, then we treat it as noise and remove it from the BoW.

Specifically, let \(\mathcal {D}\) be the input BoW, and \(d_i\) (\(d_j\)) be the i-th (j-th) instance in \(\mathcal {D}\). We take \(p(c|{d_i},{d_j})\) to measure how well a concept c conceptualizes the joint semantics of two instances \(d_i, d_j\). We compute \(p(c|{d_i},{d_j})\) by Bayes' rule, assuming \(d_i\) and \(d_j\) are conditionally independent given c:

$$\begin{aligned} p(c|d_i, d_j) = \frac{p(d_i,d_j|c)p(c)}{p(d_i,d_j)} = \frac{p(d_i|c)p(d_j|c)p(c)}{p(d_i)p(d_j)} \end{aligned}$$
(1)

Assuming a uniform prior \(\tilde{p}\) over instances, i.e., \(p(d_i) = p(d_j) = \tilde{p}\), we have \(p(c|d_i, d_j) = \frac{1}{\tilde{p}^2}p(d_i|c)p(d_j|c)p(c)\). The prior probability p(c) measures the popularity of c. Intuitively, a larger \(p(c|d_i, d_j)\) indicates that c summarizes \(d_i\) and \(d_j\) well, i.e., \(d_i\) and \(d_j\) have strong semantic relevance. \(p(d_k|c)\) and p(c) are estimated from the knowledge in MCG [8]. Let \(\mathcal {C}_{i}\) and \(\mathcal {C}_{j}\) be the concept sets of \(d_i\) and \(d_j\) in MCG, respectively, and let \(\mathcal {C}_{i,j} = \mathcal {C}_{i}\cap \mathcal {C}_{j}\) denote their shared concept set. The denoising algorithm proceeds as follows.

Consider a word \({d_i\in \mathcal {D}}\). If, for every other word \(d_j\in \mathcal {D}\) (\(d_j\ne d_i\)), we cannot find an appropriate concept in \(\mathcal {C}_{i,j}\) to conceptualize \(d_i\) and \(d_j\) together, i.e.,

$$\begin{aligned} \max \limits _{d_j \in \mathcal {D},\, c\in \mathcal {C}_{i,j}} p(c|d_i, d_j) < {\delta } \end{aligned}$$
(2)

then \(d_i\) is treated as noise, where \(\delta \) is a threshold hyperparameter.
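To make the denoising step concrete, here is a minimal Python sketch of Eqs. (1) and (2). All names and numbers in it are invented for illustration: the toy tables CONCEPTS, P_D_GIVEN_C, and P_C stand in for the \(p(d|c)\) and \(p(c)\) statistics that the paper estimates from MCG, and p_tilde plays the role of the uniform instance prior \(\tilde{p}\).

```python
# Toy data standing in for MCG statistics (invented for illustration).
CONCEPTS = {  # instance -> candidate concept set
    "china": {"country", "asian country"},
    "japan": {"country", "asian country"},
    "france": {"country", "eu state"},
    "tuesday": {"weekday"},
}
P_D_GIVEN_C = {  # p(d | c)
    ("china", "country"): 0.1, ("china", "asian country"): 0.3,
    ("japan", "country"): 0.1, ("japan", "asian country"): 0.3,
    ("france", "country"): 0.1, ("france", "eu state"): 0.3,
    ("tuesday", "weekday"): 0.5,
}
P_C = {"country": 0.4, "asian country": 0.1, "eu state": 0.1, "weekday": 0.2}

def pair_score(d_i, d_j, p_tilde=0.1):
    """max_c p(c | d_i, d_j) over the shared concepts C_{i,j} (Eq. 1)."""
    shared = CONCEPTS[d_i] & CONCEPTS[d_j]
    return max((P_D_GIVEN_C[d_i, c] * P_D_GIVEN_C[d_j, c] * P_C[c] / p_tilde ** 2
                for c in shared), default=0.0)

def denoise(bow, delta):
    """Drop every word whose best pairwise score falls below delta (Eq. 2)."""
    return [d_i for d_i in bow
            if max((pair_score(d_i, d_j) for d_j in bow if d_j != d_i),
                   default=0.0) >= delta]

print(denoise(["china", "japan", "france", "tuesday"], delta=0.05))
# -> ['china', 'japan', 'france']; "tuesday" shares no concept with the rest.
```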

2.2 Hierarchical Conceptual Labeling

Next, we describe how to generate hierarchical conceptual labels for a BoW. The basic idea is to cluster a BoW \(\mathcal {D}\) hierarchically based on BRT [1] and to generate an appropriate conceptual label for each cluster \(\mathcal {D}_m\). We present the pseudo code in Algorithm 1.

[Algorithm 1: pseudo code of BRT-based hierarchical conceptual labeling]

Estimation of \(f(\mathcal {D}_m)\) and \(p(\mathcal {D}_m|T_m)\). \(f(\mathcal {D}_m)\) quantifies the probability that all the words in \({\mathcal {D}_m}\) belong to the same cluster, and it further helps us estimate \(p(\mathcal {D}_m|T_m)\). Similar to [7], we assume that a \({\mathcal {D}_m}\) with more shared concepts in MCG is more likely to form a single cluster. For each \(c \in \mathcal {C}_m\) (the set of concepts shared by all words in \({\mathcal {D}_m}\)), the probability that \({\mathcal {D}_m}\) forms one cluster is computed as

$$\begin{aligned} p({\mathcal {D}_m}|c) = \prod \limits _{{d_i} \in {\mathcal {D}_m}} p({d_i}|c) \end{aligned}$$
(3)

When considering all the concepts in \(\mathcal {C}_m\), \(f(\mathcal {D}_m)\) is computed by \(f(\mathcal {D}_m) = \sum \nolimits _{c \in \mathcal {C}_m} p(c)\,p(\mathcal {D}_m|c)\). Based on \(f(\mathcal {D}_m)\), the probability \(p(\mathcal {D}_m|T_m)\) can be recursively calculated by \(p(\mathcal {D}_m|T_m) = {\pi _m}f(\mathcal {D}_m) + (1 - {\pi _m}) \prod \nolimits _{T_k \in \mathrm {ch}(T_m)} p(\mathcal {D}_k|T_k)\), where \(\mathrm {ch}(T_m)\) denotes the children of \(T_m\).
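The recursion above can be sketched as follows, reusing the toy tables CONCEPTS, P_D_GIVEN_C, and P_C from the denoising sketch; the Tree class is a hypothetical node representation introduced only for this illustration.

```python
class Tree:
    """A hypothetical rose-tree node; `words` is D_m, the leaves below it."""
    def __init__(self, words, children=(), pi=0.5):  # pi_m = 0.5 as in the paper
        self.words = list(words)
        self.children = list(children)
        self.pi = pi

def f(words):
    """f(D_m) = sum_{c in C_m} p(c) * prod_{d in D_m} p(d | c), via Eq. (3)."""
    shared = set.intersection(*(CONCEPTS[d] for d in words))  # C_m
    total = 0.0
    for c in shared:
        p = P_C[c]
        for d in words:
            p *= P_D_GIVEN_C[d, c]
        total += p
    return total

def likelihood(node):
    """p(D_m | T_m): recursive mixture of 'one cluster' vs. 'keep children'."""
    if not node.children:
        return f(node.words)
    prod = 1.0
    for child in node.children:
        prod *= likelihood(child)
    return node.pi * f(node.words) + (1 - node.pi) * prod
```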

Estimation of \(\pi _m\). \(\pi _m\) is a hyperparameter denoting the prior probability that the leaves under \(T_m\) are kept in one cluster rather than subdivided by the recursive partitioning process. We simply set \(\pi _m = 0.5\) in this paper.

Label Generation. To generate hierarchical conceptual labels for a BoW, we need to select a conceptual label that conceptualizes each cluster \(\mathcal {D}_m\) well. The following criterion is used to select the most appropriate conceptual label:

$$\begin{aligned} c^*_m = \arg \max \limits _{c \in {\mathcal {C}_m}} p(c|{\mathcal {D}_m}) = \arg \max \limits _{c \in {\mathcal {C}_m}} p(\mathcal {D}_m|c)\,p(c) \end{aligned}$$
(4)
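Since the denominator \(p(\mathcal {D}_m)\) is constant in c, Eq. (4) reduces to scoring each shared concept by \(p(\mathcal {D}_m|c)\,p(c)\). A small sketch on the same toy tables:

```python
def best_label(words):
    """Eq. (4): c* = argmax_{c in C_m} p(D_m | c) * p(c)."""
    shared = set.intersection(*(CONCEPTS[d] for d in words))  # C_m
    def score(c):
        p = P_C[c]
        for d in words:
            p *= P_D_GIVEN_C[d, c]
        return p
    return max(shared, key=score, default=None)

print(best_label(["china", "japan"]))  # -> 'asian country' on the toy tables
```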

Likelihood Ratio \(\gamma \). In most cases, a BoW cannot be semantically merged into a single cluster, so the clustering should stop when no appropriate label exists. Following BRT [1], we use the likelihood ratio \(L(T_m)\), i.e., the ratio between the likelihood of keeping \(\mathcal {D}_m\) as one tree \(T_m\) and the product of its children's likelihoods, and stop clustering when \(L(T_m) < \gamma \).
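Putting the pieces together, the sketch below shows one way such a merge loop with the stopping criterion could look. It only tries binary joins for brevity, whereas full BRT [1] also considers multi-branch absorb and collapse operations, so this is an illustrative simplification rather than the paper's Algorithm 1.

```python
def likelihood_ratio(node):
    """L(T_m): likelihood of the merged tree vs. product of its children's."""
    prod = 1.0
    for child in node.children:
        prod *= likelihood(child)
    return likelihood(node) / prod

def cluster(bow, gamma=0.8):
    """Greedily apply the best merge until no candidate merge reaches gamma."""
    forest = [Tree([d]) for d in bow]
    while len(forest) > 1:
        candidates = [Tree(a.words + b.words, children=[a, b])
                      for i, a in enumerate(forest) for b in forest[i + 1:]]
        best = max(candidates, key=likelihood_ratio)
        if likelihood_ratio(best) < gamma:
            break  # no appropriate label for any merge: stop clustering
        forest = [t for t in forest if t not in best.children] + [best]
    return forest  # each tree is a cluster; label its nodes via best_label
```

On the toy tables, cluster(["china", "japan", "france"]) first joins china and japan (labeled "asian country" by best_label) and then attaches france under a root labeled "country", yielding a small hierarchy.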

3 Experiments

We evaluate the generated hierarchical conceptual labels. In all experiments, \(\delta = 5 \times {10^{ - 8}}\) and \(\gamma = 0.8\) are used.

Dataset. The dataset in [7] is used, which contains two subsets: Flickr and Wikipedia. We sample \(b = 500\) BoWs from each dataset for evaluation.

Baselines. To the best of our knowledge, no prior work addresses the task of HCL, so we construct two strong baselines ourselves. (1) Bayesian hierarchical clustering-based model (BHC). We first cluster a BoW using Bayesian hierarchical clustering [2]. Each node in the hierarchy is then equipped with a concept that conceptualizes the corresponding cluster, where the candidate concepts also come from MCG. (2) Maximal clique segmentation-based model (MCS). We first construct a semantic graph for a BoW, where each vertex corresponds to a word. We then apply maximal clique segmentation [5] to split the graph into several subgraphs given a similarity threshold. Finally, we select one conceptual label for each subgraph, yielding a flat conceptual label set for the BoW. By using multiple similarity thresholds, we obtain multiple label sets of different granularities for a BoW.

Metric. We evaluate the models in two settings: with and without the denoising algorithm. We recruit \(v=5\) volunteers to score the labeling results (\(0\le score \le 3\)), where the scoring criteria follow [7]. The average score is computed as \(\frac{1}{bv}\sum _{i=1}^{v}\sum _{j=1}^{b}{s_{i,j}}\), where \(s_{i,j}\) is the score of the j-th BoW given by volunteer i, b is the number of BoWs in each dataset, and v is the number of volunteers.
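As a toy check of the averaging formula (invented scores, with \(v=2\) volunteers and \(b=3\) BoWs instead of the real \(v=5\) and \(b=500\)):

```python
scores = [[3, 2, 2],   # volunteer 1's scores s_{1,j} for b = 3 BoWs
          [2, 2, 1]]   # volunteer 2's scores s_{2,j}
v, b = len(scores), len(scores[0])
average = sum(s for row in scores for s in row) / (b * v)
print(average)  # -> 2.0
```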

Table 1. Average scores on Flickr and Wikipedia data.

Results and Analysis. The results are presented in Table 1. We observe that (1) for all models, the scores with the denoising algorithm are higher than those without it, which demonstrates the effectiveness of the denoising method; (2) the proposed model outperforms both baselines. In particular, BHC only builds binary branching structures and cannot generate the multi-branching structures that frequently appear in BoW clustering, while MCS only produces multiple flat label sets at different granularities rather than a true hierarchy.

4 Conclusion

This paper proposes the task of HCL, which aims to generate conceptual labels of different granularities for BoWs. To this end, we propose a BRT-based approach that achieves strong performance. We also propose a denoising algorithm that effectively filters out noise words in advance.