Wuhan Univ. J. Nat. Sci.
Volume 29, Number 1, February 2024
Page(s) 51 - 58
DOI https://doi.org/10.1051/wujns/2024291051
Published online 15 March 2024

© Wuhan University 2023


0 Introduction

The multi-label classification problem is an emerging machine learning paradigm [1]. A single sample can be related to several classes simultaneously, and the class labels are no longer mutually exclusive. It arises in many real-world applications, such as text categorization, automatic scene classification, gene function prediction, and music emotion annotation [2]. In recent years, with in-depth research on multi-label classification, many multi-label classification methods have been proposed. Most authors agree on a taxonomy that distinguishes two main approaches to multi-label classification: problem transformation methods and algorithm adaptation methods [2]. Despite being studied extensively, most existing work on multi-label classification is restricted to batch learning, which requires that the whole training dataset be read into memory and processed at once to build the decision model. These batch learning methods, especially when classifying large-scale datasets, consume a significant amount of time and space.

Online learning algorithms represent a family of fast and simple machine learning techniques, which generally make few statistical assumptions about the data and are often used for classification problems. As discussed in this paper, an online learner updates its model on each round without retraining from scratch. In general, online learning aims to incrementally learn a prediction model that makes correct predictions on a stream of examples arriving sequentially. Online learning is advantageous for its high efficiency and scalability in large-scale applications, and it has been applied to online classification tasks in various real-world data mining applications. In the machine learning community, online learning has been actively studied, and various online learning algorithms have been proposed, including first-order and second-order algorithms [3]. For example, classical and popular first-order algorithms include the perceptron [4] and the passive-aggressive (PA) algorithm [5]; examples of second-order online learning algorithms include adaptive regularization of weights (AROW) [6] and soft confidence-weighted (SCW) learning [7]. Most of the aforementioned online algorithms, however, are single-label algorithms. To address this gap, we apply online learning to large-scale multi-label classification tasks.

In traditional online classification, a learner is trained sequentially to predict the class labels of a stream of samples as precisely as possible [8-10]. In this paper, we propose a multi-label online learning algorithm that maximizes the margin between the relevant and irrelevant labels of multi-label samples: the learning-label-correlations multi-label online passive-aggressive classification algorithm (MLRPA). MLRPA builds an error set from the predicted pairs of labels and then updates its classifier model according to the size of this error set. We compare MLRPA with four other algorithms on four multi-label datasets according to four popular and indicative performance measures; the experimental results show that MLRPA performs well.

1 Related Work

1.1 Online Learning

Online learning reduces computation costs by incrementally updating the classifier as new training samples arrive, instead of retraining the classifier from scratch on the accumulated training set. Online learning has been actively studied in the machine learning community, and various algorithms have been presented, including several first-order and second-order methods. One of the best-known first-order approaches is the perceptron algorithm, which updates the current weight vector by adding each misclassified example to it. More recently, a number of online learning algorithms have been developed based on the maximum-margin criterion. One representative example is the PA algorithm, which updates the classification function whenever a new sample is misclassified or its classification score does not exceed a predefined margin. Empirical studies have shown that maximum-margin online learning algorithms are generally more effective than the perceptron. Recent years have also seen a surge of studies on second-order online learning algorithms, which show that parameter confidence information can be exploited to guide and improve online learning performance [3].
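To make the contrast concrete, the following is a minimal sketch (in Python with NumPy; the function names are ours, not from any particular library) of the two first-order update rules discussed above. The perceptron takes a fixed-size step only on a misclassification, while PA-I takes a loss-proportional, clipped step whenever the hinge margin falls below 1:

```python
import numpy as np

def perceptron_update(w, x, y):
    """Perceptron: update only on a misclassification, by a fixed step.
    w: weight vector, x: feature vector, y: label in {-1, +1}."""
    if y * np.dot(w, x) <= 0:
        w = w + y * x
    return w

def pa_update(w, x, y, C=1.0):
    """PA-I: update whenever the margin y<w,x> falls below 1 (hinge loss > 0);
    the step size tau is the loss scaled by ||x||^2 and clipped at C."""
    loss = max(0.0, 1.0 - y * np.dot(w, x))
    if loss > 0:
        tau = min(C, loss / np.dot(x, x))  # PA-I clipped step size
        w = w + tau * y * x
    return w
```

Both updates process one example at a time and never revisit past data, which is what makes these methods attractive for large-scale streams.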

The multi-label classification problem is an emerging machine learning paradigm. Driven by many real-world applications, a few online multi-label classification methods have recently been studied [11]. For example, Ref. [12] proposed an extreme learning machine-based online universal classifier that is independent of classification type and can perform binary, multi-class, and multi-label classification. Cost-sensitive dynamic principal projection (CS-DPP) [13] resolves three important real-world issues: online updating, label space dimension reduction (LSDR), and cost sensitivity. Based on binary relevance, Refs. [14,15] presented online multi-label algorithms in which the dataset is divided into many single-label datasets to solve the multi-label classification problem. The online multi-label semi-supervised method (OMLSS) [16] introduced non-local labeling functions that take the topology of the network into account when predicting a label; in the multi-label case, it strengthens the influence of the labeling strategy on the network topology by using the labels to improve the synaptic links.

1.2 Label Correlations

To cope with the challenge of an exponential-sized output space, it is essential to facilitate the learning process by exploiting correlations among labels [17]. Effectively exploiting label correlation information is therefore deemed crucial for the success of multi-label learning techniques [18]. According to the way and degree to which label correlation is used, existing multi-label classification methods can be roughly divided into three categories. First-order strategy: multi-label learning is tackled label by label, ignoring the co-existence of the other labels. The prominent merit of the first-order approach lies in its conceptual simplicity and high efficiency; typical algorithms include Binary Relevance (BR) [19] and multi-label k-nearest neighbor (ML-KNN) [20]. However, the resulting approaches may generalize poorly because label correlations are ignored. Second-order strategy: multi-label learning is tackled by considering pairwise relations between labels. As label correlations are exploited to some extent, the resulting methods can obtain good generalization performance; the most representative algorithm is RankSVM (Rank Support Vector Machine) [21]. High-order strategy: multi-label learning is tackled by considering high-order relations among labels, such as imposing all other labels' influences on each label [22]; however, the high computational complexity makes large-scale learning problems difficult [23]. Traditional multi-label online methods often need to request all class labels of each incoming sample, and correlations among labels are not considered in the multi-label online classification problem. We propose a multi-label online classification algorithm based on label correlations to compensate for this shortcoming.
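As a concrete illustration of the first-order (Binary Relevance) strategy, the sketch below trains one independent online perceptron per label; the class name and interface are ours, and the point is precisely that no label correlation is modeled — each label is learned in isolation:

```python
import numpy as np

class BRPerceptron:
    """Binary Relevance: one independent online perceptron per label.
    A minimal sketch; label correlations are ignored by construction."""

    def __init__(self, n_features, n_labels):
        self.W = np.zeros((n_labels, n_features))

    def partial_fit(self, x, y):
        """x: feature vector; y: {0,1} indicator vector over the label set."""
        scores = self.W @ x
        for k in range(len(y)):
            sign = 1.0 if y[k] == 1 else -1.0
            if sign * scores[k] <= 0:      # per-label perceptron mistake
                self.W[k] += sign * x      # per-label perceptron step

    def predict(self, x):
        """Predict the indicator vector of relevant labels."""
        return (self.W @ x > 0).astype(int)
```

BR-PE in the experiments of Section 3 follows this decomposition idea; BR-PA is analogous with the PA update in place of the perceptron step.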

2 Proposed Method

In the multi-label classification problem, a single instance can be related to several classes simultaneously, and the class labels are no longer mutually exclusive. For concreteness, we assume that there are q different possible labels and denote the set of all possible labels by Q = {1, 2, …, q}. At each step t, the learner receives an instance x_t ∈ R^d whose set of relevant labels Y_t is a subset of Q: a label y is relevant to the sample x_t if y ∈ Y_t; otherwise, y belongs to the set of irrelevant labels Q∖Y_t. This setting is often discussed in text categorization applications. The learning paradigm and the analysis used in this paper belong to the mistake-bound model of online learning. Writing W = (w_1, …, w_q) for the weight vectors that score the labels, the margin between the relevant and irrelevant labels in the multi-label ranking is defined as follows:

\gamma(W; (x_t, Y_t)) = \min_{r \in Y_t} \langle w_r, x_t \rangle - \max_{s \notin Y_t} \langle w_s, x_t \rangle \qquad (1)

On round t, an online learning algorithm receives an instance x_t. Given x_t, the learning algorithm outputs a label ranking induced by the scores ⟨w_y, x_t⟩, y ∈ Q. The algorithm then receives the (correct) set of relevant labels Y_t. Given this feedback and the predicted ranking, the algorithm computes the associated loss ℓ_t. If ℓ_t is zero, the algorithm does not modify the model; otherwise, it updates its label-ranking rule by modifying the set of weight vectors w_1, …, w_q. The goal of the online label-ranking algorithm is to suffer a cumulative loss that is competitive with the smallest cumulative loss attainable in hindsight. The MLRPA algorithm we describe employs a refined notion of a mistake obtained by examining all pairs of labels: the margin of an instance should be positive, meaning that all relevant labels are ranked higher than the irrelevant labels. We therefore define the multi-label loss function:

\ell(W; (x_t, Y_t)) = \sum_{r \in Y_t} \sum_{s \notin Y_t} \max\bigl(0,\, 1 - (\langle w_r, x_t \rangle - \langle w_s, x_t \rangle)\bigr) \qquad (2)

where W is the weight matrix whose rows w_y score each label, and r and s index the relevant and irrelevant labels of the instance x_t, respectively. The pairs contributing a positive term form the error set E_t = {(r, s) ∈ Y_t × (Q∖Y_t) : ⟨w_r, x_t⟩ − ⟨w_s, x_t⟩ < 1}, i.e., the cases in which an irrelevant label is ranked above, or within the unit margin of, a relevant label. By summing over all such label pairs, the loss takes the correlation between relevant and irrelevant labels into account, which can improve the classification performance of the model.
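This pairwise loss can be computed directly. The following is a minimal sketch assuming a unit margin and a weight matrix W whose rows score the labels (function and variable names are ours, for illustration only):

```python
import numpy as np

def pairwise_hinge_loss(W, x, relevant):
    """Sum of unit-margin hinge terms over all (relevant, irrelevant)
    label pairs -- a sketch of the multi-label loss in Eq. (2).
    W: (q, d) weight matrix; x: (d,) instance; relevant: set of label indices."""
    scores = W @ x
    irrelevant = [s for s in range(W.shape[0]) if s not in relevant]
    loss = 0.0
    for r in relevant:
        for s in irrelevant:
            # penalize pairs where the irrelevant label is within the margin
            loss += max(0.0, 1.0 - (scores[r] - scores[s]))
    return loss
```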

To improve the classification performance of the algorithm, we add a slack variable to the objective function. At each round, the MLRPA algorithm solves the following optimization model:

W_{t+1} = \arg\min_{W} \; \frac{1}{2}\|W - W_t\|_F^2 + C\xi \quad \text{s.t.} \quad \ell(W; (x_t, Y_t)) \le \xi, \; \xi \ge 0 \qquad (3)

where ξ is the slack variable and C > 0 is the aggressiveness parameter. When the loss ℓ(W_t; (x_t, Y_t)) = 0, the algorithm does not update the weights. When ℓ(W_t; (x_t, Y_t)) > 0, similar to the multi-class PA algorithm, the constrained problem is handled through its Lagrange function, written first in compact form:

L(W, \xi, \tau, \lambda) = \frac{1}{2}\|W - W_t\|_F^2 + C\xi + \tau\bigl(\ell(W; (x_t, Y_t)) - \xi\bigr) - \lambda\xi \qquad (4)

and, expanding the loss over the pairs of the error set E_t,

L(W, \xi, \tau, \lambda) = \frac{1}{2}\sum_{y \in Q}\|w_y - w_y^t\|^2 + C\xi + \tau\Bigl(\sum_{(r,s) \in E_t}\bigl(1 - \langle w_r - w_s, x_t \rangle\bigr) - \xi\Bigr) - \lambda\xi \qquad (5)

where τ ≥ 0 and λ ≥ 0 are Lagrange multipliers and ξ is the slack variable. At time t, taking the derivatives of formulas (4) and (5) with respect to w_r and w_s and setting them to zero, we get

w_r^{t+1} = w_r^t + \tau_t\, n_r\, x_t, \quad \forall\, r \in Y_t \qquad (6)

and

w_s^{t+1} = w_s^t - \tau_t\, n_s\, x_t, \quad \forall\, s \notin Y_t \qquad (7)

where n_r = |{s ∉ Y_t : (r, s) ∈ E_t}| counts the irrelevant labels that violate the margin against the relevant label r, and n_s = |{r ∈ Y_t : (r, s) ∈ E_t}| counts the relevant labels against which the irrelevant label s violates the margin. The sum of squares of these counts,

N_t = \sum_{r \in Y_t} n_r^2 + \sum_{s \notin Y_t} n_s^2 \qquad (8)

indicates how many pairs of labels are predicted out of order, and appears in the step size derived below.

Taking the derivative of L(·) with respect to ξ and setting it to zero, we get

C - \tau - \lambda = 0 \qquad (9)

The Karush-Kuhn-Tucker (KKT) conditions confine λ to be non-negative, so we conclude that

\tau = C - \lambda \le C \qquad (10)

Finally, plugging Eqs. (6), (7), and (9) back into Eq. (4) reduces the Lagrangian to a function of τ alone:

L(\tau) = -\frac{1}{2}\,\tau^2\,\|x_t\|^2 N_t + \tau\,\ell(W_t; (x_t, Y_t)) \qquad (11)

Taking the derivative of L(τ) with respect to τ and setting it to zero, we get

\tau = \frac{\ell(W_t; (x_t, Y_t))}{\|x_t\|^2 N_t} \qquad (12)

Combining this unconstrained optimum with the bound τ ≤ C from Eq. (10), the step size used at round t is

\tau_t = \min\Bigl\{C,\; \frac{\ell(W_t; (x_t, Y_t))}{\|x_t\|^2 N_t}\Bigr\} \qquad (13)

Finally, we summarize the proposed method in Algorithm 1. The classifier weight matrix and parameters are first initialized. At time t, the classifier receives a sample and predicts the relevant and irrelevant labels of that sample. The MLRPA algorithm then calculates the loss function from the true and predicted labels. Finally, it updates the classifier weights whenever the loss is greater than zero.

Algorithm 1 MLRPA learning algorithm
Input: aggressiveness parameter C (we set C = 1)
1: Initialize weight matrix W_1
2: for t = 1, …, T do
3:   Receive an incoming sample x_t
4:   Compute the prediction scores W_t · x_t
5:   Get the real relevant labels Y_t and irrelevant labels Q∖Y_t
6:   Calculate the loss ℓ_t according to Eq. (2)
7:   if ℓ_t > 0 then
8:     for each pair (r, s) in the error set E_t do
9:       Update the weight vectors w_r and w_s according to Eq. (6) and Eq. (7)
10:    end for
11:   end if
12: end for
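One round of the loop above can be sketched in Python as follows. This is a hedged reconstruction that assumes the pair-count-weighted, clipped PA-style step described in this section; names such as `mlrpa_step` are ours, and the code illustrates the update scheme rather than reproducing the authors' implementation:

```python
import numpy as np

def mlrpa_step(W, x, relevant, C=1.0):
    """One online MLRPA-style update (a sketch of Algorithm 1).
    W: (q, d) weight matrix; x: (d,) instance; relevant: set of label indices."""
    q = W.shape[0]
    scores = W @ x
    irrelevant = [s for s in range(q) if s not in relevant]
    # Error set: pairs where an irrelevant label comes within the unit margin
    E = [(r, s) for r in relevant for s in irrelevant
         if scores[r] - scores[s] < 1.0]
    loss = sum(1.0 - (scores[r] - scores[s]) for r, s in E)
    if loss > 0:
        n = np.zeros(q)               # signed per-label violation counts
        for r, s in E:
            n[r] += 1                 # relevant label pushed up
            n[s] -= 1                 # irrelevant label pushed down
        denom = np.dot(x, x) * np.sum(n ** 2)
        tau = min(C, loss / denom)    # clipped PA-style step size
        W = W + tau * np.outer(n, x)  # rank-one update of all affected rows
    return W
```

A single step already pushes every relevant label above every irrelevant one for the current instance, up to the unit margin, which is the per-round analogue of maximizing the margin in Eq. (1).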

3 Experiments

3.1 Four Datasets and Four Existing Methods

We use four benchmark datasets: Corel5K (image), Corel16k001 (image), Delicious (text), and Tmc2007 (text) to validate our proposed method. Some useful statistics of these datasets are provided in Table 1. In this study, we select three multi-label online classification algorithms: the perceptron algorithm based on multi-class and multi-label feedback (MMP) [24], BR-PA, and BR-perceptron (BR-PE), and one multi-label offline classification algorithm: ML-KNN. MMP is a perceptron-based multi-label online classification algorithm that considers label correlations. BR-PA and BR-PE follow the problem-decomposition strategy for multi-label online classification: BR-PA transforms the multi-label classification problem into a set of binary online PA problems, and BR-PE analogously transforms it into a set of binary online perceptron problems. ML-KNN is derived from the traditional k-nearest neighbor (KNN) algorithm: for each unseen instance, its k nearest neighbors in the training set are first identified, and the label set of the instance is then predicted from statistical information gained from the label sets of these neighboring instances.

In BR-PA and our MLRPA, the parameter is set as C = 1. For the ML-KNN algorithm, the smoothing factor is s = 1 and the number of nearest neighbors is k = 10. To avoid ordering effects, we randomly permute each dataset. In the experiments, the models learn online on the training set and are then evaluated offline on the test set. All experiments were executed with 10 runs over different random permutations of each dataset, and all results are reported as averages over these 10 runs.
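For reference, two of the evaluation measures used below can be computed per sample as follows; this sketch uses the standard multi-label definitions of one-error and average precision (the function names are ours):

```python
import numpy as np

def one_error(scores, relevant):
    """1 if the top-ranked label is not relevant, else 0 (averaged over samples)."""
    return 0.0 if int(np.argmax(scores)) in relevant else 1.0

def average_precision(scores, relevant):
    """Mean, over relevant labels, of the fraction of labels ranked at or
    above them that are also relevant (the usual multi-label definition)."""
    order = np.argsort(-scores)                       # labels from best to worst
    rank = {int(l): i + 1 for i, l in enumerate(order)}
    ap = 0.0
    for y in relevant:
        above = sum(1 for z in relevant if rank[z] <= rank[y])
        ap += above / rank[y]
    return ap / len(relevant)
```

Lower one-error and higher average precision indicate a better label ranking; both are reported as percentages in the tables below.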

Table 1

Statistics for four benchmark multi-label datasets

3.2 Performance Comparison

First, we compare the four online algorithms on the four large-scale multi-label training sets; Fig. 1 and Fig. 2 show the average precision and one-error values. Figure 1 shows the average precision of the four online learning algorithms. We observe that as the training set grows, the average precision of all four online classification algorithms increases, and the MLRPA algorithm proposed in this paper attains better average precision than the other algorithms. In addition, on the image dataset (Corel16k001), the online algorithms that consider label correlation perform better than those based on the decomposition strategy. Figure 2 summarizes the one-error performance of the four algorithms. As the training set grows, the proposed MLRPA algorithm makes fewer errors than the other algorithms, and its advantage is especially prominent on the image dataset (Corel16k001). These results show that, for multi-label online classification, the label-correlation-based MLRPA algorithm significantly outperforms the online algorithms based on problem decomposition, and that MLRPA is well suited to large-scale classification problems.

Fig. 1

The average precision analysis of four algorithms on four datasets

Fig. 2

The one error analysis of four algorithms on four datasets

To further evaluate the online MLRPA algorithm, we compare it with both online and offline algorithms on the test sets. Tables 2-5 report the results of the five methods (MLRPA, MMP, BR-PA, BR-PE, and ML-KNN) on the four test datasets, evaluated in an offline fashion, according to the four criteria mentioned in this paper; the optimal value for each dataset is highlighted in boldface. To compare these methods, we rank the five methods from 1 to 5 on each metric and compute the average rank of each method over the four datasets, as listed in the last rows of Tables 2-5.

The results in Tables 2-5 show that our proposed MLRPA algorithm achieves good classification performance on all datasets under the four evaluation criteria. Table 2 reports the one-error values: MLRPA performs best on all four datasets. For the ranking loss in Table 3, MLRPA again performs best on all four datasets and achieves the first average rank. In Table 4, the four best average precision values all come from MLRPA, which also obtains the best rank. According to Table 5, MLRPA achieves the best coverage on all datasets and receives the best rank. In addition, we observe that MLRPA outperforms ML-KNN and MMP on all datasets. These results further indicate that the label-correlation-based MLRPA algorithm is well suited to large-scale multi-label classification problems.

Table 2

Performance comparisons on one error (unit:%)

Table 3

Performance comparisons on ranking loss (unit:%)

Table 4

Performance comparisons on average precision (unit:%)

Table 5

Performance comparisons on coverage (unit:%)

4 Conclusion

This paper investigated multi-label online learning techniques for machine learning and mining data streams. By maximizing the margin between the relevant and irrelevant labels in multi-label samples, we proposed a new algorithm for multi-label classification that takes label correlations into account. The experimental results on four datasets illustrate that our proposed method outperforms the existing methods compared here, including the MMP ranking algorithm, two decomposition-based online approaches, and the offline ML-KNN, according to four widely used evaluation metrics. In future work, we will validate the effectiveness of our method on more datasets and take online second-order algorithms into account in multi-label classification problems.

References

  1. Hoi S C H, Wang J L, Zhao P L. Libol: A library for online learning algorithms[J]. Journal of Machine Learning Research, 2014, 15: 495-499. [Google Scholar]
  2. Crammer K, Kulesza A, Dredze M. Adaptive regularization of weight vectors[J]. Machine Learning, 2013, 91(2): 155-187. [CrossRef] [MathSciNet] [Google Scholar]
  3. Wang J L, Zhao P L, Hoi S C H. Exact soft confidence-weighted learning[C]// Proceedings of the 29th International Conference on Machine Learning (ICML12). New York: ACM, 2012: 121-128. [Google Scholar]
  4. Siblini W, Kuntz P, Meyer F. A review on dimensionality reduction for multi-label classification[J]. IEEE Trans Knowl Data Eng, 2021, 33: 839-857. [Google Scholar]
  5. Gibaja E, Ventura S. A tutorial on multilabel learning[J]. ACM Computing Surveys, 2015, 47(3): 1-38. [CrossRef] [Google Scholar]
  6. Li P Y, Wang H L, Böhm C, et al. Online semi-supervised multi-label classification with label compression and local smooth regression[C]//Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2020, 20: 1359-1365. [Google Scholar]
  7. Liang S P, Liu Z, You D L, et al. Online multi-label stream feature selection based on neighborhood rough set with missing labels[J]. Pattern Analysis and Applications, 2022, 25(4): 1025-1039. [CrossRef] [Google Scholar]
  8. Gong K L, Zhai T T. An online active multi-label classification algorithm based on a hybrid label query strategy[C]//2021 3rd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI). New York: IEEE, 2021: 463-468. [Google Scholar]
  9. Rosenblatt F. The perceptron: A probabilistic model for information storage and organization in the brain[J]. Psychological Review, 1958, 65(6): 386-408. [CrossRef] [PubMed] [Google Scholar]
  10. Crammer K, Dekel O, Keshet J, et al. Online passive-aggressive algorithms[J]. Journal of Machine Learning Research, 2006, 7: 551-585. [MathSciNet] [Google Scholar]
  11. Liu J H, Lin Y J, Li Y W, et al. Online multi-label streaming feature selection based on neighborhood rough set[J]. Pattern Recognition, 2018, 84: 273-287. [NASA ADS] [CrossRef] [Google Scholar]
  12. Er M J, Venkatesan R, Wang N. An online universal classifier for binary, multi-class and multi-label classification[C]//2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC). New York: IEEE, 2016: 3701-3706. [Google Scholar]
  13. Chu H M, Huang K H, Lin H T. Dynamic principal projection for cost-sensitive online multi-label classification[J]. Machine Learning, 2019, 108(8/9): 1193-1230. [CrossRef] [MathSciNet] [Google Scholar]
  14. Guo X Z, Zhang Y W, Xu J H. Online multi-label passive aggressive active learning algorithm based on binary relevance[C]//Neural Information Processing. Cham: Springer-Verlag, 2017, 10: 256-266. [Google Scholar]
  15. Liu J, Guo Z W, Sun Z W, et al. Online multi-label feature selection on imbalanced data sets[C]//Communications in Computer and Information Science. Singapore: Springer-Verlag, 2018, 812: 165-174. [Google Scholar]
  16. Boulbazine S, Cabanes G, Matei B, et al. Online semi-supervised growing neural gas for multi-label data classification[C]//2018 International Joint Conference on Neural Networks (IJCNN). New York: IEEE, 2018: 1-8. [Google Scholar]
  17. Huang S J, Zhou Z H. Multi-label learning by exploiting label correlations locally[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 26(1): 949-955. [CrossRef] [Google Scholar]
  18. Zhang M L, Zhang K. Multi-label learning by exploiting label dependency[C]//Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2010: 999-1008. [Google Scholar]
  19. Zhang M L, Zhou Z H. A review on multi-label learning algorithms[J]. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(8): 1819-1837. [CrossRef] [Google Scholar]
  20. Zhang M L, Zhou Z H. ML-KNN: A lazy learning approach to multi-label learning[J]. Pattern Recognition, 2007, 40(7): 2038-2048. [CrossRef] [Google Scholar]
  21. Ju X C, Tian Y J, Liu D L, et al. Nonparallel hyperplanes support vector machine for multi-class classification[J]. Procedia Computer Science, 2015, 51(1): 1574-1582. [CrossRef] [Google Scholar]
  22. Jia X Y, Li Z C, Zheng X A, et al. Label distribution learning with label correlations on local samples[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, 33(4): 1619-1631. [CrossRef] [Google Scholar]
  23. Li R X, Du J X, Ding J M, et al. Semi-supervised multi-label dimensionality reduction learning by instance and label correlations[J]. Mathematics, 2023, 11(3): 782. [CrossRef] [Google Scholar]
  24. Crammer K, Singer Y. A family of additive online algorithms for category ranking[J]. Journal of Machine Learning Research, 2003, 3: 1025-1058. [MathSciNet] [Google Scholar]

