
1 Introduction

When a natural language processing task is performed, the training and test data usually come from the same domain. However, they sometimes come from different domains. Recently, studies of domain adaptation have adapted a classifier trained on the data of one domain (the source domain) to the test data of another domain (the target domain) [5, 7, 11].

When domain adaptation is problematic due to a lack of labels in the target domain, active learning [8, 10] and semi-supervised learning [1] are effective. In this paper, we use active learning for domain adaptation for Word Sense Disambiguation (WSD).

Generally, active learning is an approach that gradually increases the precision of a classifier by selecting data with a high learning effect from an unlabeled data set, labeling the data, and adding it to the training data, thereby increasing the amount of training data monotonically. However, in domain adaptation, some data in the source domain training data have a negative influence on classification in the target domain. Here we refer to such data as “misleading data” [3]. In this paper, we detect such data in the source domain training data and delete it during active learning to construct training data suitable for the target domain.

In the experiment, we use three domains from the Balanced Corpus of Contemporary Written Japanese (BCCWJ [4]): Yahoo! Answers (OC), books (PB), and newspapers (PN). The data set, which was provided by the Japanese WSD task of SemEval-2 [6], has word sense tags attached to parts of these corpora. There are 16 multi-sense words with a certain frequency across all domains, and six patterns of domain adaptation (OC→PB, PB→PN, PN→OC, OC→PN, PN→PB, and PB→OC). We investigate domain adaptation for WSD using the proposed active learning method for the resulting \( 16 \times 6 = 96 \) patterns and show the effectiveness of the proposed method.

2 Active Learning with Deleted Misleading Data

2.1 Active Learning

Active learning is an approach that reduces the amount of manual labeling required to build effective training data. Using a classifier trained on the current training data, we select data with as high a learning effect as possible from an unlabeled data set. We then manually assign correct labels to the selected data and add them to the training data. Consequently, the amount of labeled data increases and the classifier improves.

The key question in active learning is how to choose data with a high learning effect. There are many active learning methods [10]; however, one particularly effective and widely used method selects the data with the lowest classification reliability, as determined by a powerful classifier such as a support vector machine (SVM) [9].
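As an illustration, the following is a minimal sketch of this selection criterion, assuming scikit-learn's SVC (which wraps libsvm) as the classifier; all variable and function names are illustrative, not taken from the original experiments.

```python
import numpy as np
from sklearn.svm import SVC

def select_least_confident(X_labeled, y_labeled, X_pool):
    """Return the index of the pool instance with the lowest
    classification reliability under the current classifier."""
    clf = SVC(probability=True)          # enable probability estimates
    clf.fit(X_labeled, y_labeled)
    proba = clf.predict_proba(X_pool)    # per-class probabilities
    reliability = proba.max(axis=1)      # reliability = top-class probability
    return int(np.argmin(reliability))   # least reliable = highest learning effect
```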

2.2 Detecting and Deleting Misleading Data

The initial labeled data in general active learning is fixed. This is not problematic because all labeled data is useful. However, the initial pool of labeled data for domain adaptation, i.e., the labeled data in the source domain, can include harmful data. Here we refer to such data as “misleading data.” When general active learning is applied to domain adaptation, misleading data in the source domain prevents active learning from improving the classifier. Therefore, as we add labeled data to the training data, we also detect misleading data and delete it from the labeled training data in the source domain.

Fig. 1. Our proposed active learning

Figure 1 shows the algorithm of our method. The initial labeled data in the source domain is denoted \( D_0 \), and the labeled data added to the training data during the active learning process is denoted \( A \), where \( A \) is initially empty. \( D_1 \) is the union of \( D_0 \) and \( A \), and \( h_1 \) is the classifier learned from \( D_1 \). Using \( h_1 \), we classify \( D_0 \); the classification result is denoted \( L_1 \). As in general active learning, we classify the unlabeled data set \( U \) in the target domain using \( h_1 \), identify the data \( {\varvec{b}} \) with the lowest classification reliability, and manually assign it a correct label. Data \( {\varvec{b}} \) is then added to \( A \). \( D_2 \) is the union of \( D_0 \) and \( A \), and \( h_2 \) is the classifier learned from \( D_2 \). We use \( h_2 \) to classify \( D_0 \) and denote the classification result as \( L_2 \). We then detect misleading data \( {\varvec{z}} \) from \( L_1 \) and \( L_2 \) according to the following three cases. (a) There are false classifications in \( L_2 \): we identify the data with the highest classification reliability among the false classifications. (b) There are no false classifications: comparing \( L_1 \) with \( L_2 \), we identify the data with the greatest decrease in reliability from \( L_1 \) to \( L_2 \). (c) There are no false classifications and no data with decreased reliability: no misleading data is identified. The detected data \( {\varvec{z}} \) is deleted from \( D_0 \). As shown in Fig. 1, this procedure is repeated 10 times.
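The following Python sketch mirrors the procedure in Fig. 1 under the same assumptions as above; `oracle` stands in for the human annotator and is hypothetical, as is every other name.

```python
import numpy as np
from sklearn.svm import SVC

def fit_svm(X, y):
    return SVC(probability=True).fit(X, y)

def proposed_active_learning(X0, y0, X_pool, oracle, rounds=10):
    """Each round adds one labeled target instance to A and deletes
    at most one misleading instance from the source data D0."""
    keep = np.ones(len(X0), dtype=bool)         # surviving part of D0
    XA, yA = [], []                             # A: data added from the pool
    in_pool = np.ones(len(X_pool), dtype=bool)  # U: unlabeled target data

    def training_data():                        # D = (kept part of D0) union A
        if XA:
            return (np.vstack([X0[keep], np.asarray(XA)]),
                    np.concatenate([y0[keep], yA]))
        return X0[keep], y0[keep]

    for _ in range(rounds):
        h1 = fit_svm(*training_data())              # h1 learned from D1
        rel1 = h1.predict_proba(X0).max(axis=1)     # L1: reliability on D0

        cand = np.where(in_pool)[0]                 # select b: least reliable in U
        conf = h1.predict_proba(X_pool[cand]).max(axis=1)
        b = cand[conf.argmin()]
        XA.append(X_pool[b]); yA.append(oracle(b)); in_pool[b] = False

        h2 = fit_svm(*training_data())              # h2 learned from D2
        pred2 = h2.predict(X0)
        rel2 = h2.predict_proba(X0).max(axis=1)     # L2: reliability on D0

        wrong = keep & (pred2 != y0)                # case (a): false classifications
        if wrong.any():
            z = np.where(wrong)[0][rel2[wrong].argmax()]
            keep[z] = False
        else:
            worse = np.where(keep & (rel2 < rel1))[0]   # case (b): reliability fell
            if len(worse):                              # case (c): nothing to delete
                z = worse[(rel1 - rel2)[worse].argmax()]
                keep[z] = False

    return fit_svm(*training_data())
```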

In this study, active learning terminates when 10 data have been added to the labeled training data set. The only difference between general active learning and active learning for domain adaptation is the distribution of the initial labeled data set; as more labeled data is added through active learning, this difference shrinks. Therefore, we evaluate the proposed method with 10 repetitions of active learning.

Table 1. Target words of experiment

3 Experiment

In the experiment, we use three domains from the Balanced Corpus of Contemporary Written Japanese (BCCWJ [4]): OC, PB, and PN. As mentioned previously, the data set, which was provided by the Japanese WSD task of SemEval-2 [6], has word sense tags attached to parts of these corpora. There are 16 multi-sense words with a certain frequency across all domains. These 16 target words are shown in Table 1.Footnote 1 There are six directions of domain adaptation (OC→PB, PB→PN, PN→OC, OC→PN, PN→PB, and PB→OC). Consequently, \( 16 \times 6 = 96 \) types of domain adaptation for WSD are used in the experiment.

In each direction of domain adaptation (e.g., OC→PB), we conducted active learning for the 16 target words. We evaluated each active learning method for domain adaptation using the average of these 16 precision values.

We compared three methods. The first is active learning that selects the added data at random (Random), the second is standard active learning (AL), and the third is our proposed active learning (Our AL). For all methods, the classifier is an SVM. We use the SVM tool ‘libsvm’ Footnote 2 to train the classifier. Using the -b option, we can obtain the reliability of each classification.
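For concreteness, a minimal sketch using libsvm's bundled Python interface (the svmutil module shipped with the libsvm distribution); the data variables are placeholders, and '-b 1' corresponds to the -b option mentioned above.

```python
from svmutil import svm_train, svm_predict

# x_* are lists of {feature_index: value} dicts (libsvm sparse format),
# y_* are the corresponding label lists; all are placeholders here.
model = svm_train(y_train, x_train, '-b 1')          # -b 1: probability estimates
labels, acc, probs = svm_predict(y_test, x_test, model, '-b 1')
# probs[i] holds the per-class probabilities of the i-th instance;
# max(probs[i]) is the classification reliability used for data selection.
```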

Table 2. Average precision of the final classifier (%)
Fig. 2. Comparison of average precisions

We show the results of the experiment in Figs. 3, 4, 5, 6, 7 and 8; each figure shows the result of one direction of domain adaptation. In this experiment, active learning stops after 10 repetitions, and the precision of the classifier at that point is presented in Table 2 and Fig. 2. Our proposed active learning method outperforms standard active learning in every domain adaptation type.

Fig. 3. Active learning for “OC→PB”

Fig. 4. Active learning for “PB→PN”

Fig. 5. Active learning for “PN→OC”

Fig. 6. Active learning for “OC→PN”

Fig. 7. Active learning for “PN→PB”

Fig. 8. Active learning for “PB→OC”

4 Discussion

4.1 Existence and Detection of Misleading Data

We do not know whether the data detected as misleading data in the experiment are actually misleading. Here, we use the data labels to determine whether the detected data are in fact misleading data, and thereby examine whether the method for detecting misleading data is effective.

First, we identify the misleading data individually following a previously proposed method [13]. Consider domain adaptation from source domain \( S \) to target domain \( T \), and let \( D \) be the labeled data of target word \( w \) in \( S \). We measure the correct answer rate \( p_0 \) on \( T \) of the classifier learned from \( D \), then delete a data instance \( x \) from \( D \) and measure the correct answer rate \( p_1 \) on \( T \) of the classifier learned from \( D - \{x\} \). When \( p_1 > p_0 \), we consider data \( x \) to be misleading data. We perform this procedure for all data in \( D \) and find the misleading data of target word \( w \). Table 3 shows the amount of misleading data found by this process. The numerical values in parentheses are the total amounts of data.
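A minimal sketch of this leave-one-out test, again assuming scikit-learn's SVC; X_tgt and y_tgt stand for the labeled evaluation data in the target domain, and all names are illustrative.

```python
import numpy as np
from sklearn.svm import SVC

def true_misleading_data(X_src, y_src, X_tgt, y_tgt):
    """x is misleading if deleting it from D raises the correct
    answer rate on the target domain, i.e., p1 > p0."""
    def rate(mask):
        clf = SVC().fit(X_src[mask], y_src[mask])
        return clf.score(X_tgt, y_tgt)    # correct answer rate on T

    all_data = np.ones(len(X_src), dtype=bool)
    p0 = rate(all_data)                   # baseline with the full D
    misleading = []
    for i in range(len(X_src)):
        mask = all_data.copy()
        mask[i] = False                   # D - {x}
        if rate(mask) > p0:               # p1 > p0
            misleading.append(i)
    return misleading
```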

Using the data presented in Table 3, we investigate whether the misleading data detected by the experimental procedure are true misleading data. The results are shown in Table 4. The numerical values in parentheses are the amounts of detected data, and the numerical values next to the parentheses are the amounts of true misleading data. From Table 4, it is evident that 959 data were detected, of which 121 were true misleading data, giving a precision of \( 121/959 = 0.1262 \). This value is low. However, deleting falsely detected data does not necessarily reduce precision; therefore, we believe that the falsely detected data were largely unrelated to classification.

4.2 Instance Weight

In domain adaptation tasks, labeled data in the target domain are more important than labeled data in the source domain. Therefore, instance weight learning is effective in domain adaptation [3]. Generally, the weight of an instance is defined by the probability density ratio [12]. Here, we investigate weighting the target domain data selected during active learning. We simply weight these data by doubling their frequency in the training data. Table 5 shows the average precision of the final classifier obtained by this active learning.
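A minimal sketch of this doubling scheme, assuming scikit-learn's SVC; passing sample_weight=2 is equivalent to duplicating those instances in the training data, and all variable names are placeholders.

```python
import numpy as np
from sklearn.svm import SVC

# Source data gets weight 1; the target data added by active learning
# gets weight 2, which is equivalent to doubling its frequency.
X_train = np.vstack([X_source, X_added])
y_train = np.concatenate([y_source, y_added])
weights = np.concatenate([np.ones(len(X_source)),
                          np.full(len(X_added), 2.0)])
clf = SVC(probability=True).fit(X_train, y_train, sample_weight=weights)
```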

Table 3. Misleading data
Table 4. Correct answer rates of detection of misleading data
Table 5. Active learning with instance weight (%)
Table 6. Use of Daumé’s method in active learning (%)

From Table 5, we can confirm the effect of weighting the target domain labeled data. In this experiment, we simply doubled the weight; we intend to investigate better weighting schemes in future work.

4.3 Feature Weight

Because labeled data in the target domain are added by active learning, we can also apply a supervised domain adaptation method.

Here, we combine Daumé’s method [2] with active learning. Using Daumé’s method, we convert vector \( {\varvec{x}_s} \) of the source domain into the triple-length vector \( ({\varvec{x}_s},{\varvec{x}_s},{\varvec{0}}) \), and vector \( {\varvec{x}_t} \) of the target domain into the triple-length vector \( ({\varvec{0}},{\varvec{x}_t},{\varvec{x}_t}) \). We then classify the target domain data with a standard classifier using the augmented vectors. This method gives extra weight to the common (overlapping) features of the source domain and the target domain.

When Daumé’s method is combined with active learning, we only have to convert source domain data \( {\varvec{x}_s} \) into \( ({\varvec{x}_s},{\varvec{x}_s},{\varvec{0}}) \), and target domain data \( {\varvec{x}_t} \) into \( ({\varvec{0}},{\varvec{x}_t},{\varvec{x}_t}) \). The results for ten repetitions are shown in Table 6.
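A minimal sketch of this conversion, following the vector layout given above, in which the middle block is the one shared by both domains.

```python
import numpy as np

def augment_source(x_s):
    """Source instance: (x_s, x_s, 0) -- source-specific copy,
    shared copy, zeroed target-specific block."""
    return np.concatenate([x_s, x_s, np.zeros_like(x_s)])

def augment_target(x_t):
    """Target instance: (0, x_t, x_t) -- zeroed source-specific block,
    shared copy, target-specific copy."""
    return np.concatenate([np.zeros_like(x_t), x_t, x_t])

# Features that behave identically in both domains can rely on the shared
# (middle) block, so the classifier effectively upweights overlapping features.
```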

From Table 6, it is evident that combining the proposed method with Daumé’s method is not effective; however, standard active learning combined with Daumé’s method is effective. We conjecture that Daumé’s method reduces the influence of misleading data, so that deleting such data brings little additional benefit; consequently, the proposed method combined with Daumé’s method was not effective. We intend to investigate this possibility in future work.

5 Conclusion

In this paper, we proposed a new active learning method for domain adaptation for WSD. In standard active learning, labeled training data increases monotonically. However, in domain adaptation, some data in the source domain can deteriorate classification precision in the target domain; we call such data misleading data. Our proposed method detects and deletes misleading data in the source domain during the standard active learning process. Through an experiment using three domains (OC, PB, and PN) in the BCCWJ and 16 common target words, we showed that the proposed method outperforms standard active learning. In future work, we intend to investigate methods to detect misleading data more accurately and to assign proper weights to instances and features during the active learning process.