
1 Introduction

Short texts are convenient in human communication and have become prevalent on social networks. Short text classification is challenging due to the natural sparsity of short texts, noisy words, irregular syntactic structure and colloquial terminology [1]. These problems have attracted considerable research attention in the field of short text expansion and classification.

Due to the limited number of words and the low frequency of terms in short texts, the bag-of-words (BOW) representation has limits in analyzing short texts [2]. One possible solution to sparsity is to expand short texts by appending new features based on semantic information extracted from Web search results, lexical databases, or machine translation [3]; these are called external resource-based approaches. Web search-based feature extension [4] needs to interact frequently with search engines, resulting in high communication overhead and low efficiency for data analysis. Knowledge bases or lexical databases, such as Wikipedia and HowNet for concept taxonomies [5,6,7] or topic models [8, 9], are used to enrich short text representations. However, these feature extension methods depend heavily on the integrity of external resources and are often time consuming. Moreover, the predefined topics and categories are domain-specialized or language-specific.

Using rules or statistical information hidden in the context of short texts is another kind of approach to extending features, called self-contained resource approaches [10, 11, 22,23,24, 27]. Mining the hidden information in short texts plays a key role in feature extension. A self-aggregation-based topic model (SATM) [22] was reported recently; it assumes that short texts are sampled from long pseudo-documents, and topic modeling is then conducted by finding the “document-ship” of each short text. U. K. Sikdar et al. [10] described a deep learning approach to recognize Amharic named entities from a large dataset annotated with six different classes, trained on various language-independent features together with word vectors, i.e., the semantic information obtained by the unsupervised learning algorithm word2vec. The word vectors were merged with a set of specifically developed language-independent features and fed together into a neural network model to predict the classes of the words. Zhang et al. [11] proposed a character-level convolutional network model for short text classification without any knowledge of the syntactic or semantic structures of a language. Nevertheless, these works ignore the relevance between the words in short texts. When words are limited, the associations between words can serve as additional information and an important basis for feature expansion, alleviating the problem of sparse features in short texts.

This paper considers two forms of information: the inter-type relationships between words and short texts, and the intra-type relationships among words and among short texts. Based on these two kinds of data relations, the feature space is obtained by dimension reduction of the word clustering indicator matrix, which is computed by non-negative matrix tri-factorization [12]. Then, according to the correlation between words, closely related features in the feature space are selected to expand the text feature vector, which effectively alleviates the problem of feature sparseness.

2 Related Works

Feature expansion is essential for classifying short texts, and work has so far focused mainly on two kinds of approaches: the Latent Dirichlet Allocation (LDA) topic model [40, 42, 43] and word embedding [29,30,31, 35,36,37,38, 42]. Y. Xu used LDA to cluster words or documents into “topics” and, based on a “topic-word” probability distribution model, found and selected closely related words to expand the feature space [42]. W. Xia et al. chose the liveness of each user as a feature and modelled it as a weight for the user; they improved the precision of topic detection and tracking by including the user feature in the LDA model to expand the features of short texts [40]. Yu et al. [43] used the Dirichlet Multinomial Mixture (DMM) model as the main framework and extended short texts with latent feature vector representations of words by combining the user-LDA topic model, achieving good performance as an external extension of short texts. However, the complexity of probabilistic graphical models hampers the development of LDA, and the computational cost of LDA imposes a penalty that outweighs the improvement the algorithm brings.

On the other hand, word embedding presents another kind of word representation, mapping each word into a continuous vector space of reduced dimensionality [32, 33]. Semantic expansion of words is then obtained by clustering the vectors. Recently, deep learning-based approaches have been widely employed for word embedding models. Google developed the Word2Vec tool, based on Bengio's neural language model, for word embedding [24]. Word2Vec predicts words from their context using one of two distinct neural models: CBOW [33, 35, 38, 39] and Skip-Gram [10, 29, 31, 34, 36, 37, 40].

P. Wang et al. proposed a framework to expand short texts based on the skip-gram model, learning word embeddings from large-scale unstructured text data. By applying additive composition over word embeddings from contexts with variable window width, they computed the representations of multi-scale semantic units in short texts [37]. In [36], distributed word embeddings were learned by the skip-gram algorithm through a neural network architecture and then combined into a sentence representation to predict the semantic relations between short texts. W. X. Liang et al. proposed a global and local word embedding-based topic model (GLTM) for short texts [34]. They trained global word embeddings from a large external corpus and employed the continuous skip-gram model with negative sampling (SGNS) to obtain local word embeddings. Utilizing both the global and local word embeddings, their method distills semantically related information between words, which is further leveraged by the Gibbs sampler in the inference process to strengthen the semantic coherence of topics.

G. X. Xun et al. used Continuous Bag of Words (CBOW) to provide additional semantics for a short text corpus and incorporated it into the model of each short document to establish Gaussian topics in the vector space [39]. In addition, a discrete background model over word types was added to complement the continuous Gaussian topic model. In [38], using word embedding features, L. Sang et al. expanded and enriched the word density in short texts, and semantic similarities of short texts were calculated for effective learning; this method combined external sources of word semantic information with the short text structure information. A. J. Pascual et al. presented a Contextual Specificity Similarity (CSS) algorithm [33] for document similarity measurement, in which documents are represented as arrays of their word vectors and the Inverse Document Frequency (IDF) of the words is incorporated to define the closeness between documents.

Although Word2Vec performs outstandingly in analyzing synonymous words, it still relies heavily on local context and lacks global statistical information about short texts. Accordingly, in 2014, Jeffrey Pennington et al. presented a new model, using the words ice and steam to illustrate how meaning can be generated from word co-occurrence and how a global word vector representing that meaning can be produced [23]. They named it GloVe; its training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations show interesting linear substructures of the word vector space [37]. A comparative study [41] showed its effectiveness for Arabic language processing and pointed out that the appropriate starting point for word vector learning might indeed be ratios of co-occurrence probabilities rather than the probabilities themselves. The shortcoming of GloVe was also mentioned in [25]: it demands a large-scale corpus and sufficiently large storage resources.

Both kinds of approaches mentioned above cannot work without the support of a huge corpus. In contrast to these large-scale learning algorithms, this paper studies feature expansion using the short texts themselves. Three kinds of relations are taken into consideration, namely word-to-word, word-to-text and text-to-text, to make use of more relatedness information within the short texts. We use this method as an alternative to the aforementioned approaches in cases where only limited amounts of training data are available.

3 Algorithm Framework

Given a short text set T = {t1,…, tm} and a word set W = {w1, …, wn}, the goal is to group the texts {t1, …, tm} into k clusters while also grouping the words {w1, …, wn} into k clusters. The relationship matrix R describes the inter-type relationships between texts and words. The correlation matrices At and Aw represent the intra-type relationships of texts and words, respectively. The clustering indicator matrix F represents the clustering result of the words; its element Fij represents the possibility that wi belongs to cluster kj. Similarly, the clustering indicator matrix G represents the clustering result of the short texts. Since the category labels of the training set are known, the matrix G can be obtained directly. In this way, feature expansion for short texts is transformed into the joint clustering of texts and words.

The overall framework of our algorithm is based on non-negative matrix factorization, including four steps: feature space establishment, feature expansion, feature space updating and short text classification, as shown in Fig. 1.

Fig. 1. Framework of the proposed algorithm

The feature space of the short text collection itself describes the possibility of each word belonging to each category. Based on the training texts, we construct a relationship matrix to describe the word-to-text membership, and two correlation matrices to describe the intra-type text-to-text and word-to-word relations, respectively. Under manifold regularization, the non-negative matrix factorization algorithm is used to build the word clustering indicator matrix. After removing some evenly distributed features from the indicator matrix, a dimension-reduced feature space is constructed. The features of a short text are extended according to the correlation between the features in the feature space and the text's own features. Feature space updating predicts the clustering indicator value of an unknown feature as the average clustering indicator value of the known features in the same text, and then adds the new feature to the feature space. The classifier divides the testing samples into different categories using an SVM algorithm.

4 Feature Space Construction Based on DNMTF

4.1 Non-negative Matrix Tri-Factorization

The feature space is constructed by factorizing the relationship matrix. Firstly, according to the label data of the short text training set, the clustering indicator matrix G can be obtained directly; it serves as one factor of the relationship matrix R in the non-negative matrix tri-factorization [13]. Then, with a manifold regularization constraint added, the word clustering indicator matrix F is obtained by decomposition.

The relation matrix R is decomposed into three matrices F, S and G, written as R ≈ FSG^T. Matrices F and G are the clustering indicator matrices of the two types of entities, respectively, and matrix S is a scaling (equilibrium) matrix that provides extra degrees of freedom to guarantee the accuracy of the low-dimensional representation.

4.2 Construction of Relationship and Correlation Matrix

The construction of the relationship matrix R follows the natural relationship between text and word. If the word wi appears in the text tj, then Rij = 1, otherwise Rij = 0.
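As a concrete illustration, R can be built in a few lines. The following sketch assumes the texts are already tokenized; the function and variable names are illustrative and not part of the original implementation.

```python
import numpy as np

def build_relationship_matrix(tokenized_texts, vocabulary):
    """Binary word-by-text incidence matrix R (n words x m texts): R[i, j] = 1
    if word w_i appears in text t_j, and 0 otherwise."""
    word_index = {w: i for i, w in enumerate(vocabulary)}
    R = np.zeros((len(vocabulary), len(tokenized_texts)))
    for j, text in enumerate(tokenized_texts):
        for w in set(text):                 # presence only, not frequency
            if w in word_index:
                R[word_index[w], j] = 1.0
    return R
```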

The construction of the correlation matrices At and Aw is based on statistical information about texts and words. The correlation strength between two samples xi and xj is calculated as shown in Eq. (1).

$$ A_{ij} = \frac{{B\left( {x_{i} ,x_{j} } \right)}}{{\mathop \sum \nolimits_{{x_{a} ,x_{b} \in T\left( W \right)}} B\left( {x_{a} ,x_{b} } \right)}} $$
(1)

Where \( B\left( {x_{i} ,x_{j} } \right) \) is the co-occurrence count of samples xi and xj: for two texts in T it is the number of words they share, and for two words in W it is the number of texts in which they co-occur.
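Continuing the sketch above, one plausible reading of B(xi, xj) is to count co-occurrences directly from R: two words co-occur in the texts that contain them both, and two texts share the words that appear in both. Whether self co-occurrence is included in the normalization is not specified in the paper, so excluding it below is an assumption.

```python
import numpy as np

def correlation_matrix(B):
    """Normalize pairwise co-occurrence counts B into a correlation matrix A (Eq. 1)."""
    B = B.astype(float).copy()
    np.fill_diagonal(B, 0.0)        # assumption: self co-occurrence is not counted
    total = B.sum()
    return B / total if total > 0 else B

# Using R from the previous sketch (n words x m texts):
# A_w = correlation_matrix(R @ R.T)   # word-word: number of texts containing both words
# A_t = correlation_matrix(R.T @ R)   # text-text: number of words shared by both texts
```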

4.3 Relationship Matrix Factorization with Manifold Regularization

According to the manifold hypothesis [14], if two samples xi and xj are similar in geometric structure, then their practical significance is also similar, which is reflected in their clustering labels. Therefore, we propose a novel algorithm based on the dual regularization non-negative matrix tri-factorization (DNMTF) algorithm [15] to capture the intra-type and inter-type relationships among entities. The relationship matrix factorization with manifold regularization is shown in Eq. (2).

$$ J_{1} = \left\| {R - FSG^{T} } \right\|^{2} + \mu tr\left( {F^{T} L_{w} F} \right) + \phi tr\left( {G^{T} L_{t} G} \right)\,\,\,\,\,s.t. F,S,G \ge 0 $$
(2)

Where μ, ϕ > 0 are regularization parameters used to balance the reconstruction error of DNMTF in the first term against the graph regularization in the second and third terms of Eq. (2). \( L_{w} = D_{w} - A_{w} \) is the graph Laplacian of the data graph, which reflects the label smoothness of the data points, and \( L_{t} = D_{t} - A_{t} \) is the graph Laplacian of the feature graph, which reflects the label smoothness of the features. Dw and Dt are diagonal matrices whose entries are the column sums of Aw and At, i.e., \( D_{ii}^{w} = \sum_{j} A_{ij}^{w} \) and \( D_{ii}^{t} = \sum_{j} A_{ij}^{t} \), respectively.
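For reference, the graph Laplacians used in Eq. (2) can be formed directly from the correlation matrices; this is a minimal numpy sketch under the same assumptions as above.

```python
import numpy as np

def graph_laplacian(A):
    """Graph Laplacian L = D - A, with D diagonal holding the column sums of A."""
    D = np.diag(A.sum(axis=0))
    return D - A

# L_w = graph_laplacian(A_w)
# L_t = graph_laplacian(A_t)
```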

Since the labels of the training set are already known, the clustering indicator matrix G can be obtained directly as a fixed input to J1. The objective function in Eq. (2) can then be rewritten as Eq. (3).

$$ J_{1} = tr\left( {\left( {R - FSG^{T} } \right)\left( {R - FSG^{T} } \right)^{T} } \right) + \mu tr\left( {F^{T} L_{w} F} \right) + \phi tr\left( {G^{T} L_{t} G} \right) $$
$$ = tr\left( {RR^{T} } \right) - 2tr\left( {RGS^{T} F^{T} } \right) + tr\left( {FSG^{T} GS^{T} F^{T} } \right) + \mu tr\left( {F^{T} L_{w} F} \right) + \phi tr\left( {G^{T} L_{t} G} \right) $$
(3)

We introduce Lagrange multipliers αn × k, βm × k and γk × k for the constraints F ≥ 0, G ≥ 0 and S ≥ 0, respectively. The Lagrangian function is then given in Eq. (4).

$$\begin{aligned} L & = tr\left( {RR^{T} } \right) - 2tr\left( {RGS^{T} F^{T} } \right) + tr\left( {FSG^{T} GS^{T} F^{T} } \right) + \mu tr\left( {F^{T} L_{w} F} \right) \\ & + \,\phi tr\left( {G^{T} L_{t} G} \right) + tr\left( {\alpha F^{T} } \right) + tr\left( {\beta G^{T} } \right) + tr\left( {\gamma S^{T} } \right) \\ \end{aligned} $$
(4)

To solve for the matrix S, we take the matrices F and G as given and set the partial derivative \( \frac{\partial L}{{\partial S}} = 0 \), which yields Eq. (5).

$$ \gamma = 2F^{T} RG - 2F^{T} FSG^{T} G $$
(5)

Using the KKT condition [16] \( \gamma_{ij} S_{ij} = 0 \), we obtain Eq. (6).

$$ [F^{T} RG - F^{T} FSG^{T} G]_{ij} S_{ij} = 0 $$
(6)

According to Eq. (6), matrix S is updated by the rule in Eq. (7).

$$ S_{ij} \leftarrow S_{ij} \frac{{[F^{T} RG]_{ij} }}{{[F^{T} FSG^{T} G]_{ij} }} $$
(7)

To solve for the matrix F, we take the matrices S and G as given and set the partial derivative \( \frac{\partial L}{{\partial F}} = 0 \), which yields Eq. (8).

$$ \alpha = 2RGS^{T} - 2FSG^{T} GS^{T} - 2\mu L_{w} F $$
(8)

Substituting \( L_{w} = D_{w} - A_{w} \) into Eq. (8) and using the KKT condition [16] \( \alpha_{ij} F_{ij} = 0 \), we obtain Eq. (9).

$$ [RGS^{T} - FSG^{T} GS^{T} - \mu D_{w} F + \mu A_{w} F]_{ij} F_{ij} = 0 $$
(9)

According to Eq. (9), matrix F is updated by the rule in Eq. (10).

$$ F_{ij} \leftarrow F_{ij} \frac{{[RGS^{T} + \mu A_{w} F]_{ij} }}{{[FSG^{T} GS^{T} + \mu D_{w} F]_{ij} }} $$
(10)
Algorithm 1. Feature space construction based on DNMTF (pseudocode figure, not reproduced here)
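The alternating multiplicative updates in Eqs. (7) and (10), with G held fixed because the training labels are known, can be sketched as follows. Random initialization, the iteration count, and the small epsilon added for numerical stability are illustrative assumptions, not details from the paper.

```python
import numpy as np

def dnmtf_factorize(R, G, A_w, mu=0.6, n_iter=100, eps=1e-9, seed=0):
    """Approximate R (n x m) as F S G^T with G (m x k) fixed, returning the word
    clustering indicator F (n x k) and the scaling matrix S (k x k)."""
    rng = np.random.default_rng(seed)
    n = R.shape[0]
    k = G.shape[1]
    D_w = np.diag(A_w.sum(axis=0))          # degree matrix of the word graph
    F = rng.random((n, k))
    S = rng.random((k, k))
    for _ in range(n_iter):
        # Eq. (7): S_ij <- S_ij * [F^T R G]_ij / [F^T F S G^T G]_ij
        S *= (F.T @ R @ G) / (F.T @ F @ S @ (G.T @ G) + eps)
        # Eq. (10): F_ij <- F_ij * [R G S^T + mu A_w F]_ij / [F S G^T G S^T + mu D_w F]_ij
        F *= (R @ G @ S.T + mu * (A_w @ F)) / (F @ S @ (G.T @ G) @ S.T + mu * (D_w @ F) + eps)
    return F, S
```

Since G is fixed here, the ϕ tr(G^T L_t G) term in Eq. (2) is constant and therefore does not appear in the updates, which is why only L_w (through A_w and D_w) enters Eq. (10).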

5 Feature Extension Based on Self-resources

5.1 Feature Expansion

Suppose there are p feature words in the feature space Hp×k, which is the output of Algorithm 1. From the space H, q features fi (i = 1, …, q), with p ≫ q, are chosen to form a subset of the feature space H, denoted H*q×k, which contains exactly those q features. Multiplying H* by the feature space H then gives the matrix Eq×p, as shown in Eq. (11).

$$ E = H^{*} \cdot H^{T} $$
(11)

Where the matrix E describes the correlation of each fi (i = 1, …, q) with all features in the space H.

To select features for expansion conveniently, the matrix E is compressed: the values in each column are summed and averaged to obtain the p-dimensional vector e, as shown in Eq. (12).

$$ e\left( j \right) = \frac{{\mathop \sum \nolimits_{i = 1}^{q} E_{ij} }}{q},\quad j = 1 \cdots p $$
(12)

Vector e describes the relevance between each feature word in the feature space H and the feature representations fi (i = 1, …, q) in the subspace H*. The K features with the highest relevance in e are then selected to expand the short text, in addition to its existing features.
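Equations (11) and (12) and the subsequent top-K selection can be sketched as follows, assuming, as Sect. 5.2 suggests, that the q chosen features are the ones appearing in the text being expanded; excluding features already present in the text is one plausible reading of "in addition to the existing text features".

```python
import numpy as np

def expand_features(H, text_feature_idx, K):
    """Pick the K features in the feature space H (p x k) most relevant to the
    q features already present in a short text (Eqs. (11)-(12))."""
    H_star = H[list(text_feature_idx)]   # H*: q x k rows for the text's own features
    E = H_star @ H.T                     # Eq. (11): q x p relevance matrix
    e = E.mean(axis=0)                   # Eq. (12): mean relevance to each of the p features
    ranked = np.argsort(-e)              # most relevant first
    present = set(text_feature_idx)
    return [j for j in ranked if j not in present][:K]
```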

5.2 Feature Space Update

In the process of extending the features of a short text, it is possible that some features extracted from the short text are not included in the feature space H; in this case the feature space cannot provide sufficient feature expansion. Therefore, before the feature expansion of a short text, its features should first be checked to determine whether the space H needs to be updated to cover all new text features. A new feature needs to be added to the feature space when:

(1) the feature does not exist in the feature space H;

(2) the feature is not one that was deleted during dimension reduction of the clustering indicator matrix.

Suppose there are a features that need to be added, and their corresponding clustering indicator matrix is H**. Due to the correlation between input data, H** can be calculated from H*, as shown in Eq. (13).

$$ H_{i}^{**} \left( j \right) = \frac{{\mathop \sum \nolimits_{g = 1}^{q} \varvec{H}_{gj}^{*} }}{q},j = 1 \cdots k,i = 1 \cdots a $$
(13)

Finally, H** is incorporated into H to obtain an enlarged feature space, based on which feature expansion is carried out. Here, H* is a subset of the feature space H.
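Equation (13) assigns every new feature the same indicator row, namely the column-wise mean of H*; appending those rows to H is then the space update. A minimal sketch, with names chosen for illustration:

```python
import numpy as np

def update_feature_space(H, H_star, n_new):
    """Eq. (13): each of the n_new unseen features receives the column-wise mean
    of H* as its clustering indicator row, and the rows are appended to H."""
    row = H_star.mean(axis=0)              # 1 x k mean indicator over the q known features
    H_new = np.tile(row, (n_new, 1))       # H**: a identical rows (a = n_new)
    return np.vstack([H, H_new])           # enlarged feature space
```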

5.3 Algorithm Description

Algorithm 2. Feature expansion based on self-contained resources (pseudocode figure, not reproduced here)

6 Experiments and Discussion

6.1 Experimental Datasets

This paper verifies the effectiveness of the proposed method on three datasets. In the experiments, the open source tool libsvm is used as the text classifier. The first dataset, Web snippets, collected from Web search results by Phan et al. [17], is a commonly used short text classification benchmark. It contains 8 categories, with 10,060 training texts and 2,280 test texts and an average text length of 17.93 words. Specific information is listed in Table 1.

Table 1. Web snippets dataset

The second dataset is Twitter 100k, published by Hu et al. [18]. The texts are written by users in informal language and are limited in length. Since this dataset has no class labels, only sports-related data were selected and manually tagged for sport-item classification, resulting in 6 sport items with 3,000 training texts and 630 test texts and an average text length of 12.95 words. The specific information is listed in Table 2.

Table 2. Twitter sports dataset

The third dataset is the AGnews data collected by Zhang et al. [19]; the 4 classes with the largest amount of data are selected to construct the dataset, including 120,000 training texts and 7,600 test texts, with an average text length of 38.82 words. The specific information is listed in Table 3.

Table 3. AGnews dataset

6.2 Parameters Selection

In Eq. (2), the regularization parameters μ and ϕ are selected according to three evaluation indexes: Purity [20], Normalized Mutual Information (NMI) [21] and Adjusted Rand Index (ARI) [26]. Purity is the proportion of correctly clustered documents among all documents. NMI measures the degree of similarity between two clustering results, and ARI measures the agreement between the clustering result and the ground truth. In the relationship matrix factorization, the regularization parameters are set to μ = ϕ. For each value of μ, the DNMTF method with random initialization is run 50 times, and the comparison results are shown in Fig. 2.
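The three indexes can be computed from a run's predicted cluster labels and the ground-truth labels. The sketch below assumes scikit-learn for NMI and ARI and integer-coded labels; both are illustrative assumptions rather than details of the original experimental setup.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def purity(true_labels, cluster_labels):
    """Fraction of samples assigned to the majority ground-truth class of their cluster."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    correct = sum(np.bincount(true_labels[cluster_labels == c]).max()
                  for c in np.unique(cluster_labels))
    return correct / len(true_labels)

def evaluate_clustering(F, true_labels):
    """Score one DNMTF run: the cluster of each sample is the argmax of its indicator row."""
    pred = np.asarray(F).argmax(axis=1)
    return {"Purity": purity(true_labels, pred),
            "NMI": normalized_mutual_info_score(true_labels, pred),
            "ARI": adjusted_rand_score(true_labels, pred)}
```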

Fig. 2. Effect of the regularization parameter μ

From Fig. 2, we can see that the clustering accuracy reaches its highest value at μ = 0.6 under each of the three evaluation indexes. Accordingly, in the following matrix factorization experiments, we set the regularization parameter to μ = 0.6.

The Web snippets dataset has 4,775 features, the Twitter sports dataset has 1,248 features, and the AGnews dataset has 6,582 features. The number K of extended features directly affects the classification results. Therefore, different values of K are compared on the three datasets, and the results are shown in Fig. 3(a)–(c), respectively. We can see that on every dataset, even if only one feature is added, the classification accuracy increases rapidly to a value close to its optimum. The reason is that the feature with the strongest relevance to the short text, found in the feature space according to Eq. (12), is necessarily the most indicative feature of a certain category. Expanding with this feature allows other short texts of the same category to enlarge their feature representation if they did not contain it before. The similarity between the sparse feature vectors of the same category is thus greatly improved, which has a positive impact on the classification results.

Fig. 3. Results of parameter K on three datasets

As the number of extended features gradually increases, the classification accuracy increases relatively steadily until it reaches the peak for each dataset, and then begins to decline slightly, as shown in Fig. 3(a)–(c).

6.3 Compared Algorithms

To verify the effect of the NMFFE algorithm, we compare it with BOW and Char-CNN, i.e., the bag-of-words method and the character-level convolutional neural network method, neither of which considers semantic information. The results are shown in Table 4, with the best result for each dataset in bold. In study [11], the accuracies of the BOW and Char-CNN algorithms on the AGnews dataset were 88.81% and 87.18%, respectively. Owing to differences in our experimental environment and data processing, the results shown in Table 4 differ slightly from those reported in [11].

Table 4. Comparison results of classification accuracy on 3 datasets

From Table 4, we can see that, with respect to dataset size, the Char-CNN algorithm performs well on large datasets but worse on small ones, where the limited training data cannot cover the overall data distribution and leads to over-fitting of the convolutional neural network.

With respect to data integrity, the texts of the AGnews dataset are relatively long, and its sufficient corpus makes all three algorithms perform well in text classification; their classification accuracies differ only slightly. The similarity between the test and training sets of Web snippets (in terms of keyword co-occurrence) is not as high as that of the other two datasets, which makes the BOW algorithm, based on word frequency statistics, less effective on this dataset.

Overall, the proposed NMFFE algorithm achieves better classification results than the other two algorithms, and its robustness across datasets of different sizes is also better. The BOW and Char-CNN algorithms are more suitable for large-scale datasets.

The running times of the three algorithms are compared on the three datasets, and the results are shown in Fig. 4. The execution time of the BOW algorithm is shorter than that of the other two, especially on large datasets, mainly because its model is relatively simple. The NMFFE algorithm spends the longest time in the feature expansion process because it involves many matrix operations; as the number of extended features K increases, the running time also increases. The Char-CNN model consists of 6 convolutional layers and 3 fully connected layers.

Fig. 4. Comparison of running time

7 Conclusions

Different from vector-based feature expansion methods for short texts, we proposed a method that uses K relevant features as a self-contained subset to extend the feature space of short texts. Without relying on external resources, the word clustering indicator matrix is obtained from the text dataset itself through graph dual regularization non-negative matrix tri-factorization (DNMTF). After dimension reduction, the feature space is obtained as the basis for feature expansion, and the most relevant features extracted within the dataset itself are then selected to enlarge the feature space of short texts. Experimental results showed that the NMFFE algorithm outperformed the BOW and Char-CNN algorithms in classification accuracy. However, the datasets used in this paper are all open datasets that have already been pre-processed, while the main challenge of short-text feature expansion and classification is online and real-time data processing. We will therefore adapt our method to real-time online environments in future work.