
1 Introduction

Natural language call routing is an important problem in the design of modern automatic call services, and solving it can substantially improve the quality of such services. In general, natural language call routing can be considered as two different problems: the first is speech recognition of the calls, and the second is call categorization for further routing. This paper focuses on text categorization methods applied to call routing.

Text classification can be considered a part of natural language understanding, where there is a set of predefined categories and the task is to automatically assign new documents to one of these categories. The method of text preprocessing and text representation influences the results obtained even with the same classification algorithms.

In the vector space model [18] text categorization is considered as a machine learning problem. The complexity of text categorization with a vector space model is compounded by the need to extract numerical data from the text before applying machine learning methods. Therefore, text categorization consists of two parts: text preprocessing and classification using the obtained numerical data.

All text preprocessing methods are based on the idea that the category of a document depends on the words or phrases it contains. The simplest approach is to take each word of the document as a binary coordinate, so that the dimensionality of the feature space equals the number of words in the dictionary (the “bag of words” method [10]). There exist more advanced preprocessing approaches, such as term weighting methods, that address the limitations of this simple representation. There exist both unsupervised and supervised term weighting methods. The most well-known unsupervised term weighting method is TF-IDF [17]. The following supervised term weighting methods are also considered in this paper: Gain Ratio [4], Confident Weights [21], TM2 [23], Relevance Frequency [13], Term Relevance Ratio [11] and Novel Term Weighting [5]; these methods involve information about the classes of the utterances. It is important to note that we use no morphological or stop-word filtering before text preprocessing. This means that the text preprocessing can be performed without expert or linguistic knowledge and is language-independent.

After text preprocessing we obtain a vector of numerical variables for each document, and the dimensionality of the feature space is the number of words in the dictionary. In this case, direct application of machine learning algorithms is time-consuming. Many researchers have used a variety of unsupervised techniques in order to improve classification quality or to decrease the dimensionality of the feature space. One common approach is to induce word features with unsupervised methods (for example, clustering, which was used in [8, 12, 14, 16], or neural language models, which have been proposed in [1, 3, 15, 20]) and then to apply a supervised algorithm. In our work we propose a novel feature extraction method, based on the assignment of terms to classes, as a dimensionality reduction method. The method reduces the dimensionality radically: the resulting dimensionality equals the number of classes. After dimensionality reduction we apply the k-nearest neighbours algorithm (k-NN) for classification.

In this paper we propose collectives of different term weighting methods, based on the idea that different term weighting methods produce classification errors on different utterances; therefore, the simultaneous use of several term weighting methods may increase the effectiveness of text classification. We propose meta-classification with rule induction as the collective classification approach.

This paper is organized as follows: In Sect. 2, we describe the problem and the database. Section 3 describes the considered term weighting methods. The feature extraction method and the classification algorithms are presented in Sect. 4. Section 5 reports on the experimental results. Finally, we provide concluding remarks and directions for future investigations in Sect. 6.

Table 1 The distribution of the classes

2 Corpora Description

The data for testing and evaluation consists of 292,156 user utterances recorded in English from caller interactions with commercial automated agents. The database contains calls in textual format after speech recognition and was provided by the company Speech Cycle (New York, USA). Utterances from this database were manually labelled by experts and divided into 20 classes (such as appointments, operator, bill, internet, phone and technical support). One of them is a special class, TE-NOMATCH, which includes utterances that cannot be put into any other class or can be put into more than one class. The classes and their distribution are presented in Table 1.

The database contains 45 unclassified calls, which were removed. It also contains 23,561 empty calls without any words; these calls were automatically placed in the class TE-NOMATCH and were also removed from the database. As a rule, the calls in the database are short; many of them contain only one or two words. Consequently, there are many duplicated utterances, and the duplicates were removed. After these steps the database contains 24,458 unique non-empty classified calls.

We randomly split the database into training and test samples 20 times. The training samples contain 90 % of the calls and the test samples contain 10 % of the calls. For each training sample we designed a dictionary of the unique words which appear in that sample. The size of the dictionary varies from 3275 to 3329 words across the different splits.

3 Term Weighting Methods

After generating the training and test samples we performed term weighting. As a rule, a term weight is the product of two parts: one based on the term frequency in a document (TF) and one based on the term frequency in the whole database. The TF part is the same for all considered term weighting methods and is calculated as follows:

$$\begin{aligned} {TF}_{ij}=\log \left( {tf}_{ij}+1\right) ; \quad {tf}_{ij}= \frac{{n}_{ij}}{{N}_{j}}, \end{aligned}$$

where \(n_{ij}\) is the number of times the ith word occurs in the jth document and \(N_{j}\) is the document size (the number of words in the document).
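
For illustration, a minimal Python sketch of the TF part follows (the whitespace tokenization and the function name tf_part are our assumptions, not part of the original method):

```python
import math
from collections import Counter

def tf_part(document_tokens):
    """TF_ij = log(tf_ij + 1) with tf_ij = n_ij / N_j, computed for one document."""
    counts = Counter(document_tokens)     # n_ij: occurrences of each word in the document
    doc_size = len(document_tokens)       # N_j: number of words in the document
    return {word: math.log(count / doc_size + 1) for word, count in counts.items()}

print(tf_part("my internet is down down".split()))
```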

The second part of the term weighting is calculated once for each word from the dictionary and does not depend on the utterance being classified. We consider seven different methods for calculating this second part.

3.1 Inverse Document Frequency (IDF)

IDF is a well-known unsupervised term weighting method which was proposed in [17]. There are some modifications of IDF and we use the most popular one:

$$\begin{aligned} {idf}_{i}=\log \frac{\left| D \right| }{{n}_{i}}, \end{aligned}$$

where \(\left| D \right| \) is the number of documents in the training set and \(n_i\) is the number of documents that have the ith word.
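
A minimal sketch of the IDF computation, assuming documents are given as lists of tokens (the toy corpus is purely illustrative):

```python
import math

def idf_weights(documents):
    """idf_i = log(|D| / n_i), where n_i is the number of documents containing the ith word."""
    num_docs = len(documents)
    vocabulary = set(word for doc in documents for word in doc)
    return {word: math.log(num_docs / sum(1 for doc in documents if word in doc))
            for word in vocabulary}

docs = [["pay", "my", "bill"], ["internet", "is", "down"], ["bill", "question"]]
print(idf_weights(docs))
```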

3.2 Gain Ratio (GR)

Gain Ratio (GR) is mainly used in term selection [24], but in [4] it was shown that it could also be used for weighting terms, since its value reflects the importance of a term. The definition of GR is as follows:

$$\begin{aligned} GR\left( {t}_{i},{c}_{j} \right) =\frac{\sum _{c\in \{{c}_{j}, {\bar{c}}_{j} \}}\sum _{t\in \{{t}_{i}, {\bar{t}}_{i} \}}P\left( t, c \right) \cdot \log \frac{P\left( t,c \right) }{P\left( t \right) \cdot P\left( c \right) }}{-\sum _{c\in \{{c}_{j}, {\bar{c}}_{j} \}}P\left( c \right) \cdot \log P\left( c \right) }, \end{aligned}$$

where P(t, c) is the relative frequency with which a document contains the term t and belongs to the category c; P(t) is the relative frequency with which a document contains the term t; and P(c) is the relative frequency with which a document belongs to the category c.

Then, the weight of the term \(t_i\) is the maximum value over all categories:

$$\begin{aligned} GR\left( {t}_{i} \right) =\max _{{c}_{j}\in C}GR\left( {t}_{i},{c}_{j} \right) , \end{aligned}$$

where C is a set of all classes.
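
The following sketch computes GR from document-level presence/absence statistics; the representation of utterances as token sets and the toy corpus are our simplifying assumptions:

```python
import math

def gain_ratio(term, labels, docs):
    """GR(t) = max_c GR(t, c), using document-level presence/absence probabilities.
    docs: list of token sets; labels: list of class labels of the same length."""
    n = len(docs)
    has_t = [term in d for d in docs]
    p_t = sum(has_t) / n
    best = 0.0
    for cj in set(labels):
        in_c = [y == cj for y in labels]
        p_c = sum(in_c) / n
        numerator = 0.0
        for t_val, p_term in ((True, p_t), (False, 1 - p_t)):
            for c_val, p_class in ((True, p_c), (False, 1 - p_c)):
                p_tc = sum(1 for h, c in zip(has_t, in_c)
                           if h == t_val and c == c_val) / n
                if p_tc > 0 and p_term > 0 and p_class > 0:
                    numerator += p_tc * math.log(p_tc / (p_term * p_class))
        denominator = -sum(p * math.log(p) for p in (p_c, 1 - p_c) if p > 0)
        if denominator > 0:
            best = max(best, numerator / denominator)
    return best

docs = [{"pay", "bill"}, {"internet", "down"}, {"bill", "high"}, {"no", "internet"}]
print(gain_ratio("bill", ["billing", "tech", "billing", "tech"], docs))
```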

3.3 Confident Weights (CW)

The Confident Weights method (ConfWeight) is a supervised term weighting method that involves information about the classes of the documents. This approach was proposed in [21]. The main idea of the method is that the term t has a non-zero weight in the class c only if the frequency of the term t in the documents of class c is greater than its frequency in all other classes.

Firstly, the proportion of documents containing the term t is defined as the Wilson proportion estimate p(x, n) [22] by the following equation:

$$\begin{aligned} p\left( x,n \right) =\frac{x+0.5{z}^{2}_{\alpha /2}}{n+{z}^{2}_{\alpha /2}}, \end{aligned}$$

where x is the number of documents containing the term t in the given corpus, n is the number of documents in the corpus, and \(\varPhi \left( {z}_{\alpha /2} \right) =\alpha /2\), where \(\varPhi \) is the t-distribution (Student's law) when \(n<30\) and the normal distribution when \(n\ge 30\).

In this work \(\alpha = 0.95\) and \({z}_{\alpha /2}=1.96\) (as recommended by the authors of ConfWeight). For each term t and each class c two functions \(p_{pos} (x, n)\) and \(p_{neg} (x, n)\) are calculated. For \(p_{pos} (x, n)\), x is the number of documents which belong to the class c and contain the term t, and n is the number of documents which belong to the class c. For \(p_{neg} (x, n)\), x is the number of documents which contain the term t but do not belong to the class c, and n is the number of documents which do not belong to the class c.

The confidence interval \((p^-,p^+)\) at 95 % is calculated using the following equations:

$$\begin{aligned} p^-=p-{z}_{\alpha /2}\sqrt{\frac{p(1-p)}{n+{z}^{2}_{\alpha /2}}}; \quad p^+=p+{z}_{\alpha /2}\sqrt{\frac{p(1-p)}{n+{z}^{2}_{\alpha /2}}}. \end{aligned}$$

The strength of the term t in the category c is defined as follows:

$$\begin{aligned} str(t,c)={\left\{ \begin{array}{ll} \log _2\frac{2p^-_{pos}}{p^-_{pos}+p^+_{neg}}, &{} \mathrm {if} {\ p^-_{pos}>p^+_{neg}},\\ 0, &{} \mathrm {otherwise}. \end{array}\right. } \end{aligned}$$

The maximum strength (Maxstr) of the term \(t_i\) is calculated as follows:

$$\begin{aligned} Maxstr({t}_{i})=\max _{{c}_{j}\in C}str\left( {t}_{i},{c}_{j} \right) ^{2}. \end{aligned}$$

The ConfWeight method uses Maxstr as an analogue of IDF.
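
A sketch of the Wilson estimate, term strength and Maxstr follows; the function names, the toy corpus and the fixed value z = 1.96 for the 95 % level are illustrative assumptions:

```python
import math

Z = 1.96  # z_{alpha/2} for the 95% confidence level

def wilson_interval(x, n):
    """Wilson proportion estimate p(x, n) and its 95% confidence interval (p-, p+)."""
    p = (x + 0.5 * Z * Z) / (n + Z * Z)
    delta = Z * math.sqrt(p * (1 - p) / (n + Z * Z))
    return p - delta, p + delta

def max_strength(term, labels, docs):
    """Maxstr(t) = max_c str(t, c)^2, used by ConfWeight in place of IDF."""
    best = 0.0
    for cj in set(labels):
        pos = [d for d, y in zip(docs, labels) if y == cj]
        neg = [d for d, y in zip(docs, labels) if y != cj]
        p_pos_minus, _ = wilson_interval(sum(term in d for d in pos), len(pos))
        _, p_neg_plus = wilson_interval(sum(term in d for d in neg), len(neg))
        if p_pos_minus > p_neg_plus:
            strength = math.log2(2 * p_pos_minus / (p_pos_minus + p_neg_plus))
            best = max(best, strength ** 2)
    return best

docs = [{"pay", "bill"}, {"bill", "high"}, {"internet", "down"}, {"no", "internet"}] * 10
labels = ["billing", "billing", "tech", "tech"] * 10
print(max_strength("bill", labels, docs))
```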

The numerical experiments conducted in [21] have shown that the ConfWeight method outperforms Gain Ratio and TF-IDF with SVM and k-NN as classification methods on three benchmark corpora.

3.4 TM2 (Second Moment of a Term)

This supervised term weighting method was proposed in [23].

Let \(P(c_j|t)\) be the probability that a document belongs to the category \(c_j\) given that it contains the term t, and let \(P(c_j)\) be the probability that a document belongs to the category \(c_j\) without any conditions. The idea is the following: the more \(P(c_j|t)\) differs from \(P(c_j)\), the more important the term t is. Therefore, the term weight can be calculated as follows:

$$\begin{aligned} TM2(t_i)= \sum _{j=1}^{\left| C \right| }\left( P(c_j|t)-P(c_j) \right) ^2. \end{aligned}$$

where C is a set of all classes.
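
A possible implementation of TM2 under the same toy representation (utterances as token sets, labels as strings) is:

```python
def tm2_weight(term, labels, docs):
    """TM2(t) = sum over classes of (P(c_j | t) - P(c_j))^2."""
    n = len(docs)
    labels_with_term = [y for y, d in zip(labels, docs) if term in d]
    total = 0.0
    for cj in set(labels):
        p_cj = sum(1 for y in labels if y == cj) / n
        p_cj_given_t = (sum(1 for y in labels_with_term if y == cj) / len(labels_with_term)
                        if labels_with_term else 0.0)
        total += (p_cj_given_t - p_cj) ** 2
    return total

docs = [{"pay", "bill"}, {"bill", "high"}, {"internet", "down"}, {"no", "internet"}]
print(tm2_weight("bill", ["billing", "billing", "tech", "tech"], docs))
```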

3.5 Relevance Frequency (RF)

The RF term weighting method was proposed in [13] and is calculated as follows:

$$\begin{aligned} rf(t_i, c_j)=\log _2 \left( 2+\frac{a_j}{\max \{1, \bar{a_j}\}} \right) ;rf(t_i)=\max _{{c}_{j}\in C}rf\left( {t}_{i},{c}_{j} \right) , \end{aligned}$$

where \(a_j\) is the number of documents of the category \(c_j\) which contain the term \(t_i\) and \(\bar{a_j}\) is the number of documents of all the other categories which also contain this term.
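
A short sketch of RF under the same assumptions (utterances as token sets, a small illustrative corpus):

```python
import math

def relevance_frequency(term, labels, docs):
    """rf(t) = max_c log2(2 + a_c / max(1, a_c_bar))."""
    best = 0.0
    for cj in set(labels):
        a = sum(1 for d, y in zip(docs, labels) if y == cj and term in d)       # a_j
        a_bar = sum(1 for d, y in zip(docs, labels) if y != cj and term in d)   # a_j bar
        best = max(best, math.log2(2 + a / max(1, a_bar)))
    return best

docs = [{"pay", "bill"}, {"bill", "high"}, {"internet", "down"}, {"no", "internet"}]
print(relevance_frequency("bill", ["billing", "billing", "tech", "tech"], docs))
```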

3.6 Term Relevance Ratio (TRR)

The TRR method [11] uses tf weights and is calculated as follows:

$$\begin{aligned} TRR(t_i, c_j)=\log _2 \left( 2+\frac{P(t_i|c_j)}{P(t_i|\bar{c_j})} \right) ; \quad TRR(t_i)=\max _{c_{j}\in C}TRR\left( {t}_{i},{c}_{j} \right) , \end{aligned}$$
$$\begin{aligned} P(t_i|c)=\frac{\sum _{k=1}^{\left| T_c \right| }tf_{ik}}{\sum _{l=1}^{\left| V \right| }\sum _{k=1}^{\left| T_c \right| }tf_{lk}}, \end{aligned}$$

where \(c_j\) is a class of the document, \(\bar{c_j}\) denotes all classes other than \(c_j\), V is the vocabulary of the training data and \(T_c\) is the set of documents of the class c.
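
A sketch of TRR follows; utterances are represented as token lists, and the eps guard against division by zero when a term never occurs outside a class is our addition, not part of the original definition:

```python
import math
from collections import Counter

def term_class_prob(term, class_docs):
    """P(t | c): tf mass of the term within class c relative to the total tf mass of the class."""
    term_mass, total_mass = 0.0, 0.0
    for doc in class_docs:
        counts, size = Counter(doc), len(doc)
        for word, cnt in counts.items():
            tf = cnt / size
            total_mass += tf
            if word == term:
                term_mass += tf
    return term_mass / total_mass if total_mass else 0.0

def trr_weight(term, labels, docs, eps=1e-9):
    """TRR(t) = max_c log2(2 + P(t|c) / P(t|not c))."""
    best = 0.0
    for cj in set(labels):
        in_class = [d for d, y in zip(docs, labels) if y == cj]
        out_class = [d for d, y in zip(docs, labels) if y != cj]
        ratio = term_class_prob(term, in_class) / max(term_class_prob(term, out_class), eps)
        best = max(best, math.log2(2 + ratio))
    return best

docs = [["pay", "my", "bill"], ["bill", "too", "high"], ["internet", "down"], ["no", "internet"]]
print(trr_weight("bill", ["billing", "billing", "tech", "tech"], docs))
```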

3.7 Novel Term Weighting (NTW)

This method was proposed in [5, 6]. For each term we assign a real number (term relevance) that depends on the term frequency in the utterances. The term weight is calculated using a modified formula of fuzzy rule relevance estimation for fuzzy classifiers [9], where the membership function has been replaced by the word frequency in the current class.

The details of the procedure are the following. Let L be the number of classes; \(n_i\) is the number of documents which belong to the ith class; \(N_{ij}\) is the number of occurrences of the jth word in all documents from the ith class; \(T_{ij} = N_{ij} / n_i\) is the relative frequency of occurrences of the jth word in the ith class; \(R_j=\max _i T_{ij}\); and \(S_j=\arg \max _i T_{ij}\) is the class which we assign to the jth word. The term relevance \(C_j\) is calculated as follows:

$$\begin{aligned} C_j=\frac{1}{\sum _{i=1}^{L}T_{ij}}\cdot \left( R_j-\frac{1}{L-1}\cdot \sum _{i=1, i\ne S_j}^{L}T_{ij} \right) . \end{aligned}$$
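
A sketch of the NTW relevance for a single term; utterances are represented as token lists so that multiple occurrences are counted, and the toy data are illustrative:

```python
def novel_term_weight(term, labels, docs):
    """C_j = (R_j - sum_{i != S_j} T_ij / (L - 1)) / sum_i T_ij, with T_ij = N_ij / n_i."""
    classes = sorted(set(labels))
    L = len(classes)
    T = []
    for cj in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == cj]
        occurrences = sum(d.count(term) for d in class_docs)      # N_ij
        T.append(occurrences / len(class_docs))                   # T_ij = N_ij / n_i
    total = sum(T)
    if total == 0:
        return 0.0
    r = max(T)                                                    # R_j
    s = T.index(r)                                                # index of the assigned class S_j
    rest = sum(t for i, t in enumerate(T) if i != s)
    return (r - rest / (L - 1)) / total

docs = [["pay", "my", "bill"], ["bill", "too", "high"], ["internet", "down"], ["no", "internet"]]
print(novel_term_weight("bill", ["billing", "billing", "tech", "tech"], docs))
```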

4 Feature Extraction and Classification Algorithms

Dimensionality reduction decreases the processing time of a machine learning algorithm while keeping acceptable classification results. For some machine learning algorithms dimensionality reduction is critically important because they cannot be applied to problems with a dimensionality as large as the size of the dictionary of a text classification problem. There are two different approaches to dimensionality reduction: feature extraction and feature selection. With the first approach we generate a small number of new features from the original ones, e.g. by clustering terms [7]. With the second one we remove useless and non-informative features.

We propose a novel feature extraction method based on the assignment of terms to classes. The idea is to assign each term from the dictionary to the most appropriate class. Such an assignment is performed during the calculation of GR, CW, RF, TRR, and NTW. With TF-IDF and TM2 we can also assign one class to each term using the relative frequency of the word in the classes:

$$\begin{aligned} S_j=\arg \max _{c\in C} \frac{n_{jc}}{N_c}, \end{aligned}$$

where \(S_j\) is the most appropriate class for the jth term, c is an index of a class, C is a set of all classes, \(n_{jc}\) is the number of documents of the cth class which contain the jth term, \(N_c\) is the number of all documents of the cth class.

After assigning each word to one class and performing term weighting, we can calculate, for each class, the sum of the term weights in a document. These sums become the new features of the text classification problem. Therefore, the method reduces the dimensionality radically: the dimensionality of the classification problem equals the number of classes.
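
A sketch of this feature extraction step; the dictionaries term_class (the class assigned to each word) and term_weight (the second part of the term weighting) are assumed to come from one of the methods of Sect. 3:

```python
import math
from collections import Counter

def extract_features(doc_tokens, term_class, term_weight, classes):
    """Map a document to |C| features: sums of TF * term weight, grouped by the term's class."""
    features = {c: 0.0 for c in classes}
    counts = Counter(doc_tokens)
    doc_size = len(doc_tokens)
    for word, cnt in counts.items():
        if word in term_class:                         # words outside the dictionary are ignored
            tf = math.log(cnt / doc_size + 1)          # TF part from Sect. 3
            features[term_class[word]] += tf * term_weight[word]
    return [features[c] for c in classes]

# Illustrative call with hypothetical assignments and weights.
print(extract_features("my bill is too high".split(),
                       {"bill": "billing", "high": "billing", "internet": "tech"},
                       {"bill": 2.0, "high": 0.5, "internet": 1.5},
                       ["billing", "tech"]))
```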

We used k-NN with distance weighting as the classification algorithm with all seven term weighting methods. We varied k from 1 to 15.
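
A sketch of this classification step using scikit-learn's distance-weighted k-NN; the random arrays stand in for the class-sum features and labels and are purely illustrative (the paper itself uses RapidMiner, see below):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import f1_score

# Toy placeholders: 20-dimensional class-sum features for a 20-class problem.
rng = np.random.default_rng(0)
X_train, y_train = rng.random((200, 20)), rng.integers(0, 20, 200)
X_test, y_test = rng.random((50, 20)), rng.integers(0, 20, 50)

for k in range(1, 16):
    knn = KNeighborsClassifier(n_neighbors=k, weights="distance")
    knn.fit(X_train, y_train)
    macro_f1 = f1_score(y_test, knn.predict(X_test), average="macro", zero_division=0)
    print(k, round(macro_f1, 3))
```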

After classification was performed with all term weighting methods, we used the classification results on the training sample (the class predicted by each term weighting method) as categorical features for the collective meta-classifier. We used rule induction [2] as the meta-classifier for the collectives of term weighting methods.
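
The paper uses RapidMiner's rule induction as the meta-classifier; as a rough stand-in, the sketch below trains a decision tree on one-hot encoded base predictions (the random predictions and labels are placeholders, and the decision tree is our substitute, not the rule induction operator of the paper):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Toy placeholders: rows are utterances, columns are the classes predicted by each of the
# seven term weighting pipelines on the training sample.
rng = np.random.default_rng(0)
base_predictions = rng.integers(0, 20, (200, 7))
y_true = rng.integers(0, 20, 200)

encoder = OneHotEncoder(handle_unknown="ignore")
X_meta = encoder.fit_transform(base_predictions)          # categorical -> one-hot features
meta_classifier = DecisionTreeClassifier(max_depth=5).fit(X_meta, y_true)
print(meta_classifier.predict(encoder.transform(rng.integers(0, 20, (5, 7)))))
```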

RapidMiner was used as the software for the k-NN and rule induction applications [19]. The classification criterion is the macro F-score.

5 Results of Numerical Experiments

First, we tested the different term weighting methods with different values of k on the test samples of the 20 separations of the database. The numerical experiments provide a ranking of the methods supported by statistical analysis (t-test). The ranking is presented in Table 2, which contains the results averaged over the 20 separations of the database.

Table 2 Ranking of the term weighting methods
Table 3 The results with collectives of the term weighting methods
Table 4 The comparison of the collectives with TRR
Fig. 1 The average F-score of the individual methods and the collectives depending on k

For the rule induction application we designed collectives with different numbers of included methods, from 7 down to 2, successively excluding the worst methods. For the collective with 3 methods we include RF despite the fact that there is no statistically significant difference between RF and TM2. Therefore, the collective 7 contains all 7 term weighting methods; the collective 6 contains TRR, CW, RF, TM2, NTW, and GR; the collective 5 contains TRR, CW, RF, TM2, and NTW; the collective 4 contains TRR, CW, RF, and TM2; the collective 3 contains TRR, CW, and RF; and the collective 2 contains TRR and CW. The results of the numerical experiments with the collectives of term weighting methods are presented in Table 3.

Table 4 contains a comparison of the collectives with the best term weighting method, TRR. The 2nd column contains the average increment of the max F-score, the 3rd column contains the number of database separations (out of 20) with an increment of the max F-score, and the 4th column contains the values of the t-test in comparison with TRR.

Figure 1 shows the average F-score of all of the individual methods and all of the collectives depending on k.

6 Conclusions and Future Directions

The investigations have shown that the collective of the two best term weighting methods (TRR and CW) provides a statistically significant increase of classification effectiveness in comparison with the best individual term weighting method (TRR) for natural language call routing. The collectives with more methods do not differ significantly from the best individual method (TRR). We can also conclude that the collectives are more effective than individual term weighting methods for higher values of k in k-NN.

For the next investigations we propose the following:

  • The application of different classification algorithms for the individual term weighting methods: for example, SVM or artificial neural networks.

  • The separation of the database into three samples. The first one will be used as a training sample for the learning of the individual term weighting methods, the second one will be used for the learning of the collectives, and the third one will be used as a test sample. In the current work we use the same training sample for the individual methods and for the collectives. The use of different samples for learning may increase the effectiveness of the collectives, especially for the k-NN algorithm with low values of k.

In future work, we are planning the following:

  • The application of different feature selection and feature extraction methods.

  • The application of different approaches for meta-classification.

  • The application of collectives of term weighting methods for different text classification problems and for different databases.