1 Introduction

For many years now, pattern recognition and machine learning have devoted major efforts to improving classification accuracy, and have allegedly cast aside a number of challenges arising from real-life problems [7]. One of the standard assumptions in classical pattern recognition is that the data comes as an independent and identically distributed (i.i.d.) sequence of instances. Here we abandon this assumption and consider dependent data where a set of instances has to be classified together, knowing that the set contains at most one instance from each class (or exactly one instance from each class, if the cardinality of the set equals the number of classes). Consider an automatic system that uses face recognition to record students’ attendance at a lesson against a predefined list of identities. Without enforcing the one-to-one correspondence, one identity may be assigned to two or more students. If the individual face classifier is reasonably correct, then some mistakes can be remedied. In this context, a classifier is informally described as “reasonably correct” if the true class is ranked high among all classes, even when the top-ranked class is incorrect.

An example of non-i.i.d. classification, called the multiple-instance problem, arises in complex machine learning applications where the information about the instances is incomplete or ambiguous [4, 11, 17], e.g., in drug activity prediction [4]. The training examples come in “bags” labelled either positive or negative. For a positive bag, it is known that at least one instance in the bag has a true positive label. For a bag labelled negative, all instances are known to be negative. The problem is to design a classifier that can label an unseen bag of instances as accurately as possible.

Set classification is considered by Ning and Karypis [13], where all the instances in the set to be classified are known to have come from the same class. This problem may arise in face recognition, where multiple images of the same person’s face are submitted as a set. Collective recognition is another scenario where a set of instances is labelled together [12, 16]. The crucial assumption there is that the instances within the set are related, so that the dependencies can be used to improve the classification accuracy. For example, in classifying web pages into topic categories, hyperlinked web pages are more likely to share common class labels than non-linked pages [16].

Simultaneous classification of a set of instances has also been used in tracking. For example, a moving object can be regarded as a patchwork of parts [1] or a set of tracklets [8], which are matched from one image frame to the next. Although often mentioned en passant, this fits within the ‘full-class’ set classification considered here, because each part/tracklet on the object can be regarded as a class label, and the segmented pieces in the image have to be distributed among the class labels. However, the instances within the set are not i.i.d., as the parts are spatially linked within the object and also follow physical laws of motion.

Our full-class set classification problem is, in a way, the opposite of multiple-instance classification and of Ning and Karypis’ set-classification. We assume that a classifier is available, trained following a standard training procedure. A set of instances needs to be classified, knowing that there is at most one instance from each class; we term this the “full-class classification” problem. Intuitively, this set-up seems favourable compared to the classical i.i.d. set-up, as it contains information about the dependency between the data to be classified. To the best of the author’s knowledge, this problem has not been treated systematically before. Here we view it from a Bayesian perspective. To find a guaranteed optimal solution, we cast the full-class classification as a linear programming (LP) problem and apply the Hungarian assignment algorithm [9].

The rest of the paper is organised as follows. Section 2 introduces the details of the set classification problem and explains the proposed method. Experimental results are shown in Sect. 3, and conclusions are given in Sect. 4.

2 Full-class set classification

Consider a classification problem where an instance \({\mathbf{x}}\) may come from one of the c classes in the set \(\Omega = \{\omega_1,\ldots, \omega_c\}\). Let \(X = \{{\mathbf{x}}_1,\ldots, {\mathbf{x}}_c\}\) be a set containing exactly one instance from each class. A set-classifier, \(D_{\rm set}\), will label any set X with a permutation of the class indices.

The accuracy of \(D_{\rm set}\) is the probability that the whole set is labelled correctly. \(D_{\rm set}\) can be constructed using the output of a base individual classifier \(D_{\rm ind}\). We assume that \(D_{\rm ind}\) outputs estimates of \(\hat{P}(\omega_i|{\mathbf{x}}_j)\), the posterior probability that instance \({\mathbf{x}}_j\) belongs to class \(\omega_i\). Then \(D_{\rm set}\) is defined as

$$ D_{{\rm set}}:{{\mathcal{M}}}\to {{\mathcal{I}}}, $$
(1)

where \({{\mathcal{I}}}\) is the set of all permutations of the class indices, and \({{\mathcal{M}}}\) is the set of all square matrices of size (c × c) with entries \(m_{i,j} \in [0, 1]\), such that \(\sum\nolimits_{i=1}^c m_{i,j}=1\) for every j.

Let p be the probability that \(D_{\rm ind}\) will correctly label a randomly chosen instance \({\mathbf{x}}\). Suppose that \(D_{\rm set}\) takes the labels suggested by \(D_{\rm ind}\) without any modification. Assuming that all instances are labelled independently, the accuracy of \(D_{\rm set}\) will be \(p_{\rm ind} = p^c\), which decays quickly with c (for example, p = 0.9 and c = 10 give \(p_{\rm ind} \approx 0.35\)). There is room to improve upon this rate if \(D_{\rm set}\) takes into account the fact that all class labels must be present. To illustrate this, consider the example below.

2.1 An example

Let c = 2, \(X = \{{\mathbf{x}}_1, {\mathbf{x}}_2\}\), and let the true posterior probabilities be \(P(\omega_1|{\mathbf{x}}_1) > 0.5\) and \(P(\omega_2|{\mathbf{x}}_2) > 0.5\). Ideally, \(D_{\rm ind}\) will label \({\mathbf{x}}_1\) in class \(\omega_1\), and \({\mathbf{x}}_2\) in class \(\omega_2\). However, the estimates of the posterior probabilities obtained from \(D_{\rm ind}\) may deviate from the true values and corrupt the labelling. Denote these estimates by \(P_1 = \hat{P}(\omega_1|{\bf x}_1)\) and \(P_2 = \hat{P}(\omega_2|{\bf x}_2).\) Suppose that, due to the imperfection of the classifier, it labels \({\mathbf{x}}_2\) incorrectly, i.e. \(P_2 < 0.5\). Then both points are labelled in \(\omega_1\), and, if these labels are taken forward, \(D_{\rm set}\) will label the set incorrectly. However, if \(D_{\rm set}\) is constructed so that it maximises the sum of the logarithms of the posterior probabilities, subject to the constraint that there should be one label from each class, the mistake can be remedied. The sums of the estimates for the two possible labellings are \(\log(P_1) + \log(P_2)\) for permutation \(\langle\omega_1, \omega_2\rangle\), and \(\log(1 - P_1) + \log(1 - P_2)\) for permutation \(\langle\omega_2, \omega_1\rangle\). Thus if

$$ P_1P_2 > (1-P_1)(1-P_2),\; \hbox{or equivalently},\; P_1+P_2 > 1, $$
(2)

then \(D_{\rm set}\) will assign the correct labels to X. Equation 2 will hold not only when both \(P_1 > 0.5\) and \(P_2 > 0.5\), but also when one of the two is below 0.5 as long as \(P_1 + P_2 > 1\). As a numerical example consider \(P_1 = 0.8\), \(P_2 = 0.4\). The score for labelling \(\langle\omega_1, \omega_2\rangle\) is log(0.8) + log(0.4) ≈ −1.1, and for \(\langle\omega_2, \omega_1\rangle\), log(0.2) + log(0.6) ≈ −2.1. Then \(D_{\rm set}\) will label both objects correctly while \(D_{\rm ind}\) will fail on \({\mathbf{x}}_2\).
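This arithmetic is easily reproduced. Below is a minimal Python sketch of the numerical example; the only inputs are the two posterior estimates above:

```python
import math

# Posterior estimates from the example: P1 for (w1|x1), P2 for (w2|x2).
P1, P2 = 0.8, 0.4

# Score each permutation by the sum of log-posteriors.
score_12 = math.log(P1) + math.log(P2)          # labelling <w1, w2>
score_21 = math.log(1 - P1) + math.log(1 - P2)  # labelling <w2, w1>

print(f"<w1, w2>: {score_12:.2f}")  # ~ -1.1
print(f"<w2, w1>: {score_21:.2f}")  # ~ -2.1
# <w1, w2> wins, so the set classifier corrects the individual error on x2.
```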

2.2 The Bayes full-class set classifier

Let \(X = \{{\mathbf{x}}_1,\ldots, {\mathbf{x}}_c\}\) be the set of c instances. The super-label that is assigned to X is taken from the set \({{\mathcal{C}}}\) of all possible sequences of c label outputs. Figure 1 shows the relationship between \({{\mathcal{C}}}\) and \({{\mathcal{I}}}\) (the set of all label permutations). It is convenient to call the elements of \({{\mathcal{C}}}\) ‘super-labels’.

Fig. 1 Diagrammatic representation of the set \({{\mathcal{C}}}\) of all possible sequences of length c and the set \({{\mathcal{I}}}\) of all permutations of the c class labels

We propose to construct \(D_{\rm set}\) as follows. Find a super-label \(S^{*}\in {{\mathcal{I}}}\) such that

$$ S^{*} = \hbox{arg}\;\max_{S}\; \sum\limits_{i=1}^c \log \hat{P}(\omega_{s_i}|{\bf x}_i),\quad S=\langle s_1,s_2,\ldots, s_c\rangle,\;S\in {{\mathcal{I}}}. $$
(3)

Below we demonstrate the Bayes-optimality of this criterion. Define a prior distribution over \({{\mathcal{C}}}\):

$$ P(S)=\left\{ \begin{array}{ll} {\frac{1}{c!}},& \hbox{if}\;S\in{{\mathcal{I}}}\\ 0,& \hbox{otherwise,} \end{array}\right. $$
(4)

The posterior probability of super-label \(S\in {{\mathcal{C}}},\) given the set X, is

$$ P(S|X) = {\frac{P(S)}{P(X)}} P(X|S) = {\frac{1}{c!P(X)}} P(X|S). $$
(5)

We assume that the elements of X are conditionally independent. In other words, the super-label is fixed, and the elements of X are drawn independently from the respective classes. Then for \(S = \langle s_1,\ldots, s_c\rangle\),

$$ P(X|S) = \prod\limits_{i=1}^c\; p({{\mathbf{x}}}_i|s_i). $$
(6)

For any super-label \(S\in {{\mathcal{I}}},\) the product \(\prod_{i=1}^c P(s_i)\) is a constant because it does not depend on the order of the labels in the specific permutation. Also, \(\prod_{i=1}^c p({{\mathbf{x}}}_i)\) is a constant with respect to the super-label S. Define a constant \(\beta = \prod_{i=1}^c P(s_i) / \prod_{i=1}^c p({\mathbf{x}}_i).\) Assuming non-zero priors for all classes and non-vanishing probability density functions, multiply and divide (5) by β:

$$ P(S|X) = {\frac{1}{c!P(X)\beta}} \prod\limits_{i=1}^c\; p({{\mathbf{x}}}_i|s_i) \beta = {\frac{1} {c!P(X)\beta}}\prod\limits_{i=1}^c\;{\frac{P(s_i)p({{\mathbf{x}}}_i|s_i)} {p({{\mathbf{x}}}_i)}}. $$
(7)

Noticing that the terms of the product are the posterior probabilities \(P(s_i|{\mathbf{x}}_i)\), and absorbing the constant in front of the product into a single constant α, we obtain

$$ P(S|X) = \alpha \prod\limits_{i=1}^c\; P(s_i|{{\mathbf{x}}}_i). $$
(8)

Equation (8) shows the relationship between the posterior probability of the super-label and the posterior probabilities of the individual labels assigned to the elements of X. Taking the logarithm and maximising the posterior probability over \(S\in{{\mathcal{C}}},\) the optimal S* is derived as

$$ S^{*} = \hbox{arg} \max_S \;\log P(S|X) = \hbox{arg} \max_S \; \sum\limits_{i=1}^c \log P(s_i|{{\mathbf{x}}}_i). $$
(9)

The term log(α) is a constant for any S, and hence has been dropped from the criterion. Note that maximising the sum of the logarithms is not equivalent to maximising the sum of the posterior probabilities. The equivalence holds only for two classes, as shown by the example above. For c > 2 classes it is easy to construct an example where the set labelling chosen by the sum of the posteriors differs from the one chosen by the sum of the logarithms; one such example is sketched below.
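The following Python sketch demonstrates the non-equivalence on a hypothetical 3 × 3 posterior matrix (the matrix is illustrative and not taken from any experiment in the paper); brute force over all permutations shows that the two criteria pick different labellings:

```python
import itertools
import numpy as np

# Hypothetical posteriors: entry M[i, j] estimates P(w_i | x_j);
# each column (one instance) sums to 1.
M = np.array([[0.88, 0.60, 0.05],
              [0.10, 0.05, 0.05],
              [0.02, 0.35, 0.90]])

perms = list(itertools.permutations(range(3)))
by_sum = max(perms, key=lambda s: sum(M[s[j], j] for j in range(3)))
by_log = max(perms, key=lambda s: sum(np.log(M[s[j], j]) for j in range(3)))

print("sum of posteriors picks:", by_sum)  # (0, 1, 2)
print("sum of logarithms picks:", by_log)  # (1, 0, 2)
```

The sum criterion keeps the confident first instance on class \(\omega_1\) even at the price of a near-zero second factor, whereas the log criterion, the Bayes-optimal one, avoids the very small posterior.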

Constructing \(D_{\rm set}\) amounts to finding the optimal label permutation. This problem is not trivial. To look for a solution, we cast it as an LP problem. Let \(M\in {{\mathcal{M}}}\) be a reward matrix with entries \(m_{i,j}=\hat{P}(\omega_i|{{\mathbf{x}}}_j)\) (the probability estimate for class \(\omega_i\) that \(D_{\rm ind}\) outputs for instance \({\mathbf{x}}_j\)). Introducing the unknowns \(r_{i,j} \in \{0, 1\}\), \(i, j = 1,\ldots, c\), the LP is

$$ \hbox{max}\;\sum\limits_{i=1}^c\sum\limits_{j=1}^cr_{i,j} \;\log (m_{i,j}), $$

subject to

$$ \begin{aligned} \sum\limits_{i=1}^{c}r_{i,j}=1,\quad j = 1,\ldots,c,\\ \sum\limits_{j=1}^cr_{i,j}=1,\quad i = 1,\ldots,c. \\ \end{aligned} $$

The Hungarian assignment algorithm [9] has been used in image processing to resolve the correspondence between tracklets [8] or shape parts [3]. We propose to use it here for constructing \(D_{\rm set}\). The input to the algorithm is M and the output is the optimal permutation S*. The set classification proceeds by first passing all instances through \(D_{\rm ind}\) to obtain M, and then deriving the class labels through the Hungarian algorithm.
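In practice, off-the-shelf assignment solvers can play the role of the Hungarian algorithm. The sketch below uses SciPy’s linear_sum_assignment, which solves the same assignment problem (recent SciPy versions implement a Jonker-Volgenant-type method rather than the classical Hungarian algorithm, and the maximize flag requires a reasonably recent SciPy):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def set_classify(M, eps=1e-12):
    """Full-class set labelling from a (c x c) matrix of posterior estimates,
    M[i, j] ~ P(w_i | x_j). Maximises the sum of log-posteriors under the
    one-instance-per-class constraint; eps guards against log(0)."""
    class_idx, inst_idx = linear_sum_assignment(np.log(M + eps), maximize=True)
    labels = np.empty(M.shape[0], dtype=int)
    labels[inst_idx] = class_idx  # labels[j] = class index assigned to x_j
    return labels

# The two-class example of Sect. 2.1: columns are the instances x1, x2.
M = np.array([[0.8, 0.6],
              [0.2, 0.4]])
print(set_classify(M))  # [0 1]: both instances labelled correctly
```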

The set classifier will never be worse than the c consecutive labellings by \(D_{\rm ind}\). If all instances are correctly labelled by \(D_{\rm ind}\), then the sum of the estimates of the posterior probabilities will maximise criterion (3), and \(D_{\rm set}\) will also assign the correct labels. In addition, correct labelling may be achieved when \(D_{\rm ind}\) is wrong, and the mistakes are remedied in the process of constructing \(D_{\rm set}\). Therefore

$$ P_{{\rm set}} = \hbox{Pr}(\hbox{all correct}) + \hbox{Pr}(\hbox{some incorrect})\, \hbox{Pr}(\hbox{mistake corrected}) \geq p^c. $$
(10)

2.3 A guaranteed improvement for two classes

Consider a two-class problem. The ROC curve of a classifier is constructed by nominating one of the two classes as the ‘positive’ class and the other as the ‘negative’ class. The area under the ROC curve, AUC, gives the probability that the classifier will rank a randomly chosen positive instance higher than a randomly chosen negative instance [6]. Phrased differently, this is the probability that the classifier makes its errors with less certainty than it assigns correct labels. Formally, using again the notation \(P_1 = \hat{P}(\omega_1|{{\mathbf{x}}}_1)\) and \(P_2 = \hat{P}(\omega_2|{{\mathbf{x}}}_2),\)

$$ \hbox{AUC}=\hbox{Pr} (P_1>1-P_2\;|\;\omega_1 \;\hbox{correct}) = \hbox{Pr}(P_2>1-P_1\;|\;\omega_2 \;\hbox{correct}). $$

Definition 1

We will call a two-class classifier reasonable if its AUC is strictly greater than 0.5.

Proposition 1

For c = 2 classes and a reasonable base classifier \(D_{\rm ind}\) with accuracy p, the accuracy of \(D_{\rm set}\) is strictly greater than p.

Proof

Without loss of generality, assume that the correct super-label is \(\langle\omega_1, \omega_2\rangle\). The set-classifier \(D_{\rm set}\) will label the set correctly in the following three mutually exclusive cases: (1) both labels are correct; (2) \({\mathbf{x}}_1\) is labelled correctly (\(P_1 > 0.5\)), \({\mathbf{x}}_2\) is labelled incorrectly (\(P_2 < 0.5\)), and \(P_1 > 1 - P_2\); (3) \({\mathbf{x}}_1\) is labelled incorrectly (\(P_1 < 0.5\)), \({\mathbf{x}}_2\) is labelled correctly (\(P_2 > 0.5\)), and \(P_2 > 1 - P_1\). Then the probability that \(D_{\rm set}\) labels the set correctly is

$$ \begin{aligned} P_{{\rm set}} &= \hbox{Pr}(\hbox{both correct}) + \hbox{Pr}(\hbox{correct/wrong}) \hbox{Pr}(P_1>1-P_2|\;\omega_1 \;\hbox{correct})\\ &\quad+\hbox{Pr}(\hbox{wrong/correct})\hbox{Pr}(P_2>1-P_1|\;\omega_2 \;\hbox{correct}) \\ &= p^2 + p(1-p)\hbox{AUC} + p(1-p)\hbox{AUC} \\ \end{aligned} $$
(11)
$$ P_{{\rm set}} = p^2 + 2p(1-p)\hbox{AUC}. $$
(12)

Since AUC > 0.5,

$$ P_{{\rm set}} = p^2 + 2p(1-p)\hbox{AUC} > p^2 + 2p(1-p)\times 0.5 =p. $$
(13)

\(\square\)
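The decomposition behind (12) can be checked numerically. The sketch below assumes a toy model that is not part of the paper: the two estimates are drawn i.i.d. from a Beta(2, 1) distribution, and the conditional probability playing the role of the AUC is estimated directly from its definition in the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Assumed toy model: P1 and P2 are i.i.d. Beta(2, 1) posterior estimates.
P1 = rng.beta(2, 1, n)
P2 = rng.beta(2, 1, n)

p = np.mean(P1 > 0.5)            # individual accuracy (0.75 for Beta(2, 1))
p_set = np.mean(P1 + P2 > 1)     # D_set is correct iff P1 + P2 > 1
# Conditional probability from the proof, in the role of the AUC:
wrong2 = (P1 > 0.5) & (P2 < 0.5)
auc = np.mean((P1 + P2 > 1)[wrong2])

print(f"p_set            = {p_set:.4f}")
print(f"p^2 + 2p(1-p)AUC = {p**2 + 2 * p * (1 - p) * auc:.4f}")  # matches
```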

2.4 Simulation results

Simulations were carried out with number of classes c ∈ {2, 4, 8, 10, 20, 35, 50} and individual classification accuracy p ∈ {0.70, 0.75, 0.80, 0.90, 0.95}. For each pair (c, p), 1,000 random matrices M of size c × c were sampled. To ensure that each row of M summed up to one, we selected c − 1 random numbers to split the [0, 1] interval into c parts and arranged them in ascending order, \(n_1 \leq n_2 \leq \cdots \leq n_{c-1}\). The posterior probabilities forming a row of M were calculated as the differences \(P_i = n_i - n_{i-1}\), where \(n_0 = 0\), \(n_c = 1\) and \(i = 1,\ldots, c\). With probability p, the entry (i, i) was swapped with the largest entry in the row, making the true label the most probable one. With probability 1 − p, entry (i, i) would contain a random value, smaller than the highest in the row (not necessarily the second highest). Thus we are subjecting the set classifier to an unfavourable scenario, where the true class has the highest rank with probability p but is ranked randomly among the other classes with probability 1 − p.
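This sampling scheme is compact to implement. The Python sketch below follows the procedure above up to one simplification (in the branch taken with probability 1 − p it leaves the row untouched, so the true class may occasionally still sit on top) and estimates \(p_{\rm set}\) with the assignment solver sketched in Sect. 2.2:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(1)

def random_posterior_matrix(c, p):
    """One simulated matrix: row i holds a random distribution over the c
    classes for instance i (true class i); with probability p the diagonal
    entry is swapped with the row maximum."""
    cuts = np.sort(rng.random((c, c - 1)), axis=1)
    bounds = np.concatenate([np.zeros((c, 1)), cuts, np.ones((c, 1))], axis=1)
    M = np.diff(bounds, axis=1)          # rows sum to one
    for i in range(c):
        if rng.random() < p:
            j = np.argmax(M[i])
            M[i, i], M[i, j] = M[i, j], M[i, i]
    return M

def estimate_p_set(c, p, trials=1000):
    wins = 0
    for _ in range(trials):
        M = random_posterior_matrix(c, p)
        # Rows are instances here, so one class is assigned per row.
        _, cols = linear_sum_assignment(np.log(M + 1e-12), maximize=True)
        wins += np.array_equal(cols, np.arange(c))  # correct iff identity
    return wins / trials

print(estimate_p_set(c=4, p=0.9))
```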

The surface of the set accuracy \(p_{\rm set}\) as a function of p and c is shown in Fig. 2. As expected, the set accuracy for this scenario drops quite abruptly with increasing number of classes c. For less accurate individual classifiers \(D_{\rm ind}\) (smaller p) the drop is steeper than for accurate classifiers. The surface also confirms that higher p leads to higher set accuracy. Recall the notation \(p_{\rm ind}\), the accuracy of labelling the whole set when \(D_{\rm ind}\) is used c times. The surface for \(p_{\rm ind}\) runs just underneath that for \(p_{\rm set}\). Note that the set accuracy drops to zero for even a moderate number of classes c, and even for quite accurate classifiers. This is a consequence of our choice of an unfavourable scenario, in which the true class does not necessarily rank high when \(D_{\rm ind}\) labels the object incorrectly. In real problems, it may be expected that when \(D_{\rm ind}\) is wrong, the true class will be ranked second or third. This will increase the chance of repairing the wrong classification and will lead to a better set accuracy than in the simulation scenario.

Fig. 2 Surface plot of the set accuracy \(p_{\rm set}\) as a function of the individual accuracy p of the base classifier and the number of classes c

To compare \(p_{\rm ind}\) and \(p_{\rm set}\), Fig. 3 plots the two accuracies as functions of c and p, where one of the parameters is fixed. Both plots indicate, as expected, that \(p_{\rm set} > p_{\rm ind}\).

Fig. 3 Plots of the set accuracy \(p_{\rm set}\) and the individual-set accuracy \(p_{\rm ind}\) as functions of the individual accuracy p of the base classifier and the number of classes c

3 Experiments with real data

3.1 Benchmark data

Experiments were carried out with thirty data sets from UCI [2] and from a private repository. The data sets are listed in alphabetical order in Table 1, with the source indicated alongside. The entries in the table are the averages of the testing accuracies of a stratified tenfold cross-validation. For each fold, a linear discriminant classifier (LDC) [5] was trained on the training data. The testing data was constructed by sampling 500 sets from the testing part of the fold. Every set contained one randomly chosen instance from each class. Five accuracies were estimated:

  1. The individual accuracy of \(D_{\rm ind}\) (p).

  2. The accuracy of \(D_{\rm ind}\) applied c times to label the set (\(p_{\rm ind}\)).

  3. The set-accuracy (\(p_{\rm set}\)) of the following Greedy procedure (sketched in code after this list). The largest posterior probability in the matrix is identified and the respective class label is assigned to the object. The object and the class label are eliminated, and the next largest posterior probability is identified and used to label the next object. The result is a set-labelling with a permutation of the class labels.

  4. The set-accuracy (\(p_{\rm set}\)) of the following Sampled procedure (also sketched after this list). A class label is sampled for the first object in the set using the distribution defined by the posterior probabilities for this object. The class is recorded and dropped from further consideration. For the next object, a class label is sampled from the posterior probability distribution restricted to the remaining class labels. Again, the result is a set-labelling with a permutation of the class labels.

  5. The set-accuracy (\(p_{\rm set}\)) of the proposed set-classification solution using the Hungarian algorithm (shown in the table as H).
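A minimal sketch of the two baseline procedures, assuming the same (c × c) posterior matrix M (classes by instances) used throughout Sect. 2:

```python
import numpy as np

def greedy_labels(M):
    """Greedy set-labelling: repeatedly take the largest remaining posterior
    in M and fix that class/instance pair."""
    M = M.astype(float).copy()
    c = M.shape[0]
    labels = np.full(c, -1)
    for _ in range(c):
        i, j = np.unravel_index(np.argmax(M), M.shape)
        labels[j] = i
        M[i, :] = -np.inf    # class i is used up
        M[:, j] = -np.inf    # instance j is labelled
    return labels

def sampled_labels(M, rng):
    """Sampled set-labelling: draw a class for each instance in turn from
    its posterior distribution restricted to the classes still unused."""
    c = M.shape[0]
    labels = np.full(c, -1)
    free = list(range(c))    # classes not yet assigned
    for j in range(c):
        probs = M[free, j] / M[free, j].sum()
        k = rng.choice(len(free), p=probs)
        labels[j] = free.pop(k)
    return labels
```

Both return a permutation of the class indices and are therefore directly comparable with the Hungarian-based \(D_{\rm set}\).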

Table 1 Individual accuracy (p) and set accuracies for the four set methods

The accuracy for a set was recorded as 1 if all elements of the set were correctly classified, and 0 otherwise. Superscript ‘●’ in Table 1 indicates that H is significantly better, and superscript ‘○’ that H is significantly worse, than the respective method (paired t-test, α = 0.05).

As expected, the set-classification approach was never worse than taking the c individual decisions of \(D_{\rm ind}\). The improvement was statistically significant on 26 out of the 30 data sets (α = 0.05). Better still, the set accuracy was significantly higher than the individual accuracy p on 14 of the 30 data sets. These are mainly two-class data sets (see Proposition 1), the exceptions being the ‘iris’ and ‘wine’ data sets, both with three classes. Note that Proposition 1 is valid for any classifier, not just the LDC. The Greedy algorithm looks like a good proxy for the optimal assignment. Greedy and H are identical for two classes (15 data sets). For the remaining 15 data sets, H was better than Greedy on 8 data sets, and in 8 of the cases the difference was significant at 0.05. The difference was in favour of Greedy only for the ‘ecoli’ data set. The Sampled algorithm was generally worse than H.

To gain further insight into the success of \(D_{\rm set}\) for two classes, we plotted \(P_1\) versus \(P_2\) for all 15 data sets with c = 2 classes. For each data set and for each cross-validation fold we sampled 100 sets of 2 instances, the first from class \(\omega_1\) and the second from \(\omega_2\). Figure 4 shows a scatterplot of the 15,000 points \((P_1, P_2)\). As argued in the proof of Proposition 1, \(D_{\rm set}\) will assign correct labels to both instances in the set if \(P_1 + P_2 > 1\). The area corresponding to this case is shaded in light grey. The dark grey area is where \(P_1 > 0.5\) and \(P_2 > 0.5\), i.e., where \(D_{\rm ind}\) will be correct for both instances, and hence for the set. The improvement upon \(p^2\) is due to the points that lie in the light shaded area, which, for our example, is a substantial amount: 59% of the points lie in the dark grey area and an additional 27% lie in the light grey area.

Fig. 4 Scatterplot of 15,000 points for the 15 two-class data sets. Each point corresponds to a pair of instances with true labels \(\{\omega_1, \omega_2\}\). The coordinates of the point are \(P_1 = \hat{P}(\omega_1|{{\mathbf{x}}}_1)\) and \(P_2 = \hat{P}(\omega_2|{{\mathbf{x}}}_2).\) The dark grey region is where both labels are correct and the light grey region is where \(D_{\rm set}\) rectifies one wrong label

There is another interesting phenomenon in Fig. 4. The density of points increases toward the right vertical edge of the unit square, where \(P_1\) values are high regardless of the value of \(P_2\). A detailed check revealed that this tendency was common to many of the data sets. One possible reason is that the class usually labelled \(\omega_1\) is the more “standard” of the two classes, in that it is more easily recognised. Many of the points from this class will then have high \(P_1\). On the other hand, class \(\omega_2\) is usually the class of interest, and its objects are not as clearly distinguishable from the other class, hence the smaller values of \(P_2\).

Figure 5 shows the accuracy gain over \(p_{\rm ind}\) achieved by \(D_{\rm set}\). The points in the figure represent the 30 data sets. The size of the blob corresponds to the number of classes: the larger the blob, the larger c. All points are above the diagonal line, indicating \(p_{\rm set} \geq p_{\rm ind}\). The largest improvement is seen for medium values of \(p_{\rm ind}\) and small numbers of classes. The figure also shows that for larger c, both \(p_{\rm ind}\) and \(p_{\rm set}\) are smaller, which was also observed in the simulation experiment.

Fig. 5 Accuracy gain due to \(D_{\rm set}\). The points represent the 30 data sets. The size of the blob corresponds to the number of classes: the larger the blob, the larger c

3.2 Illustration

Consider a real example where the task is to recognise six pens in an image, as shown in Fig. 6. To demonstrate the benefit of set classification, we deliberately used a crude segmentation method and a primitive classifier. After segmentation, three features were extracted from each segmented rectangle: the mean red, mean green and mean blue values. The nearest mean classifier was chosen for the task. A single manually cropped example of each of the six pens provided the “means” for the classes. Six pieces were automatically segmented from the images as shown in the figure, and were labelled into the six classes. The posterior probabilities for an object x were calculated using the soft-max equation [5]

$$ \hat{P}(\omega_i|{{\mathbf{x}}})={\frac{\exp\{g_i({{\mathbf{x}}})\}} {\sum\nolimits_{j=1}^c\;\exp\{g_j({{\mathbf{x}}})\}}}, $$

where \(g_i({\mathbf{x}})\) is the discriminant function for class \(\omega_i\). For the nearest mean classifier used here, \(g_i({\mathbf{x}})\) is the negative Euclidean distance from \({\mathbf{x}}\) to the centre of class \(\omega_i\) in the space of the three features.
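A minimal sketch of this pipeline, with hypothetical RGB class means standing in for the values measured from the cropped pen examples:

```python
import numpy as np

def softmax_posteriors(x, class_means):
    """Posterior estimates for the nearest mean classifier: g_i(x) is the
    negative Euclidean distance from x to the mean of class w_i, passed
    through the soft-max."""
    g = -np.linalg.norm(class_means - x, axis=1)
    e = np.exp(g - g.max())      # subtract the max for numerical stability
    return e / e.sum()

# Hypothetical (R, G, B) means, one per class; the experiment used six pens.
class_means = np.array([[200.0, 40.0, 40.0],    # red pen
                        [40.0, 180.0, 60.0],    # green pen
                        [50.0, 60.0, 190.0]])   # blue pen
x = np.array([190.0, 50.0, 45.0])               # features of a segmented piece
print(softmax_posteriors(x, class_means))       # highest posterior: red pen

# Stacking one such column per segmented piece yields the matrix M that is
# fed to the Hungarian step of Sect. 2.2.
```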

Fig. 6 Segmentation of six objects in a scene for subsequent classification by the nearest mean classifier

The six segmented pens are shown next to the images in the order in which they were classified by \(D_{\rm set}\). The only incorrect classification occurred in the image in subplot (d), where the first two classes were swapped. All the other sets were labelled correctly. On the other hand, applying \(D_{\rm ind}\) six times and taking the labels forward, none of the four sets was labelled correctly. What is more, in each of the four cases \(D_{\rm ind}\) labelled only one of the six pens correctly, while \(D_{\rm set}\) managed to correct all five errors in cases (a), (b) and (c), and four errors in case (d).

4 Conclusions

This study considers simultaneous classification of a set of c objects in c classes, where the set contains exactly one object from each class, called full-class set classification. The proposed Bayes-optimal solution is based upon the Hungarian assignment algorithm. It maximises the sum of the logarithms of the posterior probability estimates for the chosen classes under the one-to-one constraint. Simulation results and experiments with benchmark data support, from a practical perspective, the intuition and the theory about the set-classification method. In the illustration given at the end, we deliberately chose imperfect automatic segmentation and a practically untrained classifier in order to showcase the set-classification approach.

What makes this approach appealing is that the recognition is not tied to the previous stages of the processing pipeline. For example, no information about the previous positions of tracked objects is used in the classification. This makes the approach suitable when the objects move quickly and erratically, or when the recognition times are spaced far apart. An example of the latter is wildlife observation and the day-to-day monitoring of a group of animals.

The set-classification approach is at its strongest when the classes are difficult to distinguish with high certainty, or when there is no time to apply a sophisticated and accurate classifier. The pen-image illustration shows that high accuracy can be achieved even with a very primitive classifier and noisy input data.

One difficulty in applying the set-classification approach is that posterior probabilities are not readily produced by all classifier models. Probabilistic calibration of the outputs of the most popular and accurate classifiers, such as decision trees, SVM and the nearest neighbour, is not straightforward [14, 18, 19]. The classifier that lends itself naturally to this task is the Naïve Bayes classifier, but it may not be very accurate when the features are dependent.

This paper only lays the ground of the full-class set classification scenario. There are several interesting further directions.

  • Sensitivity to noise. A theoretical evaluation of the sensitivity of the approach to noise in the estimates of the posterior probabilities can be carried out, similar to the one in [10], but within the current context.

  • Investigating the accuracy when the number of objects to classify is strictly smaller than the number of classes. This situation will arise more often than the full-class set, for example in automatic systems for classroom attendance, where some students will be absent.

  • Identifying impostors. It is interesting to investigate set-classification when some of the objects are “impostors”, i.e., they do not belong to any of the classes which the base classifier is trained to recognise. The task would be to gauge the recognition success as a function of the proportion of objects from unknown classes.

  • Estimating the correct labelling within the set. Here we assumed that if at least two labels are swapped, the set is completely misclassified. For practical applications this may be too harsh, especially when there is a way to remedy a small number of mistakes to achieve correct set-classification. For example, in a face recognition system for classroom attendance, the lecturer may be able to correct swapped identities while still having the rest of the class successfully registered. Thus it will be useful to measure the correct classification within the set as well as the total set-accuracy.

  • The most important, and at the same time the most challenging, future direction is taking into account possible relationships within the set to be classified. For example, when a lecturer learns the names of the students in the class, strong reference points are where the students sit in the lecture theatre and which students sit in a group close to one another. In labelling moving objects, full-class set classification may be facilitated by modelling the laws of physics. Tracking the parts of a composite object will be aided by incorporating knowledge of the spatial structure of the object. It will be interesting to look for a principled approach starting from the problem set-up proposed in this paper.