Wuhan Univ. J. Nat. Sci.
Volume 29, Number 1, February 2024



Page(s): 51-58
DOI: https://doi.org/10.1051/wujns/2024291051
Published online: 15 March 2024
Computer Science
CLC number: TP391
Learning Label Correlations for Multi-Label Online Passive-Aggressive Classification Algorithm
Nanjing NARI Intelligent Transport Technology Co., Ltd., Nanjing 210061, Jiangsu, China
Received: 25 July 2023
Label correlations are an essential technique in data mining that addresses the possible correlations between different labels in multi-label classification. Although this technique is widely used in multi-label classification problems, most existing work handles it with batch learning, which consumes considerable time and space resources. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale datasets. However, existing online learning research has done little to consider correlations between labels. Building on existing research, this paper proposes a multi-label online learning algorithm based on label correlations that maximizes the margin between the relevant and irrelevant labels of multi-label samples. We evaluate the performance of the proposed algorithm on several public datasets, and the experiments demonstrate its effectiveness.
Key words: label correlations / passive-aggressive / multi-label classification / online learning
Cite this article: ZHANG Yongwei. Learning Label Correlations for Multi-Label Online Passive-Aggressive Classification Algorithm[J]. Wuhan Univ J of Nat Sci, 2024, 29(1): 51-58.
Biography: ZHANG Yongwei, male, Master, Engineer, research directions: machine learning and artificial intelligence technology. E-mail: zywei_1988@163.com
Foundation item: Supported by the State Grid Technology Item (52460D230002)
© Wuhan University 2023
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
0 Introduction
The multi-label classification problem is an emerging machine learning paradigm^{[1]}. A single sample can be related to several classes simultaneously, and the class labels are no longer mutually exclusive. There are now many real-world applications, such as text categorization, automatic scene classification, gene function prediction, and music emotion annotation^{[2]}. In recent years, with in-depth research on multi-label classification problems, many multi-label classification methods have been proposed. Most authors agree with the taxonomy that differentiates between two main approaches to solving multi-label classification problems: problem transformation methods and algorithm adaptation methods^{[2]}. Despite being studied extensively, most existing studies of multi-label classification are restricted to batch learning, which requires the whole training dataset to be read into memory and processed for the decision model at once. These batch learning methods, especially when classifying large-scale datasets, consume a significant amount of time and space resources.
Online learning algorithms represent a family of quick and simple machine learning techniques, which generally make few statistical assumptions about the data and are often used for classification problems. The key idea, as used in this paper, is to update the model on each round without retraining from scratch. In general, online learning aims to incrementally learn a prediction model that makes correct predictions on a stream of examples arriving sequentially. Online learning is advantageous for its high efficiency and scalability in large-scale applications, and it has been applied to online classification tasks in various real-world data mining applications. In the machine learning community, online learning has been actively studied, and various online learning algorithms have been proposed, including first-order and second-order online learning algorithms^{[3]}. For example, the classical and popular first-order algorithms include the perceptron^{[4]} and passive-aggressive (PA)^{[5]} algorithms; examples of second-order online learning algorithms include adaptive regularization of weights (AROW)^{[6]} and soft confidence-weighted (SCW)^{[7]} learning. Most of the aforementioned online algorithms, however, are single-label algorithms. Addressing these issues, we apply online learning to large-scale learning tasks in multi-label classification problems.
In traditional online learning classification, a learner is trained sequentially to predict the class labels of a sequence of samples as precisely as possible^{[8-10]}. In this paper, we propose a multi-label online learning algorithm that maximizes the margin between the relevant and irrelevant labels of multi-label samples: the learning-label-correlations multi-label online passive-aggressive classification algorithm (MLRPA). It builds a sorted error set by predicting pairs of labels and then updates its classifier model according to the size of the error set. Our experiments compare the MLRPA algorithm with four other algorithms on four multi-label datasets, using four popular and indicative performance measures. The experimental results show that the MLRPA algorithm performs well.
1 Related Work
1.1 Online Learning
Online learning reduces computation costs by incrementally updating classifiers as new training samples arrive, instead of retraining the classifier from scratch on the combined training samples. Various online learning algorithms have been presented, including several first-order and second-order algorithms. One of the best-known first-order approaches is the perceptron algorithm, which updates the learning function by adding the misclassified example, scaled by a weight, to the current weight vector. More recently, a number of online learning algorithms have been developed based on the maximum-margin criterion. One representative example is the PA algorithm, which updates the classification function when a new sample is misclassified or its classification score does not exceed the predefined margin. Empirical studies showed that maximum-margin-based online learning algorithms are generally more effective than the perceptron algorithm. Recent years have seen a surge of studies on second-order online learning algorithms, which have shown that parameter confidence information can be exploited to guide and improve online learning performance^{[3]}.
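For intuition, the binary PA-I update just described (be passive when the margin is satisfied, otherwise take a corrective step whose size is capped by the aggressiveness parameter C) can be sketched as follows; this is an illustrative sketch of the classical PA-I rule, not code from the paper:

```python
import numpy as np

def pa_update(w, x, y, C=1.0):
    """One PA-I step: suffer the hinge loss, then project back toward margin >= 1.

    w: weight vector; x: feature vector; y: label in {-1, +1}; C: aggressiveness cap.
    """
    loss = max(0.0, 1.0 - y * np.dot(w, x))   # hinge loss on this round
    if loss > 0.0:                            # aggressive step only on a margin violation
        tau = min(C, loss / np.dot(x, x))     # PA-I step size, capped by C
        w = w + tau * y * x
    return w

# Starting from zero weights, one update on a positive example
# moves the example exactly onto the margin (when tau is not capped).
x = np.array([1.0, 0.0, 1.0])
w = pa_update(np.zeros(3), x, +1)
```

After this step the example sits on the margin, so a second presentation of the same example triggers no update (the "passive" case).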
The multi-label classification problem has long been an emerging machine learning paradigm. Because of its many real-world applications, a few online multi-label classification methods have recently been studied^{[11]}. For example, Ref. [12] proposed an extreme-learning-machine-based online universal classifier that is independent of classification type and can perform binary, multi-class, and multi-label classification. Cost-sensitive dynamic principal projection (CSDPP)^{[13]} resolves three important real-world issues: online updating, label space dimension reduction (LSDR), and cost sensitivity. Based on binary relevance, Refs. [14,15] presented online multi-label algorithms in which the dataset is divided into many single-label datasets to solve the multi-label classification problem. The online multi-label semi-supervised (OMLSS) method^{[16]} introduces non-local labeling functions that account for the network topology when predicting a label, using the labels to improve the synaptic links and thereby strengthening the influence of the labeling strategy on the network topology in the multi-label case.
1.2 Label Correlations
To cope with the challenge of an exponential-sized output space, it is essential to facilitate the learning process by exploiting correlations among labels^{[17]}. Effectively exploiting label correlation information is therefore deemed crucial to the success of multi-label learning techniques^{[18]}. According to the way and degree to which label correlations are used, existing multi-label classification methods can be roughly divided into three categories. First-order strategy: the task of multi-label learning is tackled in a label-by-label style, ignoring the coexistence of the other labels. The prominent merit of the first-order approach lies in its conceptual simplicity and high efficiency; typical first-order algorithms include Binary Relevance (BR)^{[19]} and the multi-label k-nearest neighbor (ML-KNN)^{[20]}. However, the resulting approaches may generalize poorly because label correlations are ignored. Second-order strategy: the task of multi-label learning is tackled by considering pairwise relations between labels. As label correlations are exploited to some extent, the resulting methods can obtain good generalization performance; the most representative algorithm is Rank-SVM (Rank Support Vector Machine)^{[21]}. High-order strategy: the task of multi-label learning is tackled by considering high-order relations among labels, such as imposing all other labels' influences on each label^{[22]}. However, these methods have difficulty with large-scale learning problems because of their high computational complexity^{[23]}. Traditional multi-label online methods often need to request all class labels of each incoming sample, and correlations among labels are not considered in the multi-label online classification problem. We propose a multi-label online classification algorithm based on label correlations to compensate for this shortcoming.
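As a minimal illustration of the first-order strategy, Binary Relevance scores each label with an independent classifier, which is exactly why label correlations are lost. The sketch below is illustrative only; `W` is a hypothetical per-label weight matrix for linear scorers, not part of any cited method's code:

```python
import numpy as np

def br_predict(W, x):
    """Binary-relevance prediction: one independent linear scorer per label.

    W: (q, n) weight matrix, one row per label; x: (n,) feature vector.
    Returns the set of labels whose score is positive. Each row is scored
    in isolation, so co-occurrence patterns between labels cannot help.
    """
    scores = W @ x
    return {j for j, s in enumerate(scores) if s > 0.0}

W = np.array([[1.0, -1.0],
              [-1.0, 1.0],
              [0.5, 0.5]])
x = np.array([2.0, 1.0])
# scores: [1.0, -1.0, 1.5] -> labels {0, 2} predicted relevant
```

Because each scorer ignores the others, a strong correlation such as "label 0 and label 1 never co-occur" can never be exploited, which is the weakness the second- and high-order strategies address.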
2 Proposed Method
In the multi-label classification problem, a single instance could be related to several classes simultaneously, and the class labels are no longer mutually exclusive. For concreteness, we assume that there are q different possible labels and denote the set of all possible labels by Q = {1, 2, …, q}. At each step t, the set of relevant labels Y_{t} of instance x_{t} is therefore a subset of Q. We say that a label y is relevant to the sample x_{t} if y ∈ Y_{t}; otherwise, y belongs to the set of irrelevant labels Q∖Y_{t}. This setting is often discussed in text categorization applications. The learning paradigm and the analysis that we use in this paper belong to the mistake-bound model for online learning. The interval (margin) between sample labels in multi-label ranking is defined as follows:
margin(W; (x_{t}, Y_{t})) = min_{j∈Y_{t}, k∉Y_{t}} (w_{j}·x_{t} − w_{k}·x_{t})    (1)
On round t, an online learning algorithm receives an instance x_{t}. Given the instance x_{t}, the learning algorithm outputs a ranking over the labels in Q, induced by the scores W_{t}·x_{t}. The algorithm then receives the (correct) set of relevant labels Y_{t}. Given the feedback Y_{t} and the predicted label ranking, the algorithm computes the associated loss ℓ_{t}. If ℓ_{t} is zero, the algorithm does not modify the model. Otherwise, it updates its label-ranking rule by modifying the set of weight vectors W_{t}. The goal of the online label-ranking algorithm is to suffer a cumulative loss that is competitive with that of the best fixed ranking rule in hindsight. The MLRPA algorithm we describe employs a refined notion of a mistake by examining all pairs of labels. The goal of this method is that the margin of an instance is positive, which means that all the relevant labels are ranked higher than the irrelevant labels. Therefore, we define the multi-label loss function:
ℓ(W; (x_{t}, Y_{t})) = ∑_{j∈Y_{t}} ∑_{k∉Y_{t}} max(0, 1 − (w_{j}·x_{t} − w_{k}·x_{t}))    (2)
where w represents the per-label weight vectors, and r and s denote the numbers of prediction errors among the relevant and irrelevant labels of instance x, respectively. The operator max(0, ·) represents the interval function between all relevant and irrelevant labels. As can be seen from the above loss, a term is accumulated for each pair of labels in which an irrelevant label is ranked higher than a relevant label. The classification performance of the algorithm model can be improved by fully considering the correlations among the labels.
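As an illustration (not the paper's code), assuming linear per-label scorers w_{j}·x, the pairwise loss described above, which sums a hinge term over every (relevant, irrelevant) pair ranked too close or out of order, can be computed as:

```python
import numpy as np

def pairwise_ranking_loss(W, x, relevant, margin=1.0):
    """Sum of hinge terms over (relevant, irrelevant) label pairs.

    W: (q, n) per-label weight matrix; x: (n,) instance; relevant: set of
    relevant label indices. A pair (r, s) contributes when the relevant
    label r is not scored at least `margin` above the irrelevant label s.
    """
    q = W.shape[0]
    scores = W @ x
    irrelevant = [s for s in range(q) if s not in relevant]
    loss = 0.0
    for r in relevant:
        for s in irrelevant:
            loss += max(0.0, margin - (scores[r] - scores[s]))
    return loss
```

With q labels, the double sum visits |relevant| × |irrelevant| pairs, which is the price paid for modeling pairwise (second-order) label relations rather than treating labels independently.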
To improve the classification performance of the algorithm, we add slack variables to the objective function. The MLRPA algorithm optimization model is as follows:
W_{t+1} = argmin_{W, ξ} (1/2)‖W − W_{t}‖² + Cξ,  s.t. ℓ(W; (x_{t}, Y_{t})) ≤ ξ, ξ ≥ 0    (3)
where ξ is the slack variable and C > 0 is the aggressiveness parameter. When the loss function is zero, the algorithm does not update the weights. When the loss is positive, similar to the multi-class PA algorithm, the above optimization model is divided into two parts that satisfy the Lagrange functions:
and
where the Lagrange multipliers are constrained to be non-negative and ξ is the slack variable. At time t, taking the derivatives of formulas (4) and (5) with respect to the weight vectors, we get
and
where the denominator term indicates the sum of squares of the number of label pairs predicted out of order.
Taking the derivative of L(·) with respect to ξ and setting it to zero, we get
The Karush-Kuhn-Tucker (KKT) conditions confine λ to be non-negative, so we conclude that
Finally, taking the derivative of L(·) with respect to the remaining Lagrange multiplier and setting it to zero, we get
Plugging Eq. (9) back into Eq. (4).
Simplifying the above formula, we get
Finally, we summarize the proposed method in Algorithm 1. In Algorithm 1, the classifier weight matrix and parameters are initialized. At time t, the classifier receives a sample and predicts its relevant and irrelevant labels. The MLRPA algorithm then calculates the loss function based on the obtained real and predicted labels. Finally, the MLRPA algorithm updates the classifier weights according to whether the loss function is greater than zero.
Algorithm 1 MLRPA learning algorithm

Input: Set the key parameter: C = 1
1: Initialize weight matrix W
2: for t = 1, …, T do
3:   Receive an incoming sample x_{t}
4:   Calculate the learner scores W_{t}·x_{t}
5:   Get the real relevant labels and irrelevant labels
6:   Calculate the loss according to Eq. (2)
7:   if the loss > 0 then
8:     for each label pair in the error set do
9:       Update the weight vectors according to Eq. (6) and Eq. (7)
10:    end for
11:  end if
12: end for
3 Experiments
3.1 Four Datasets and Four Existing Methods
We use four benchmark datasets: Corel5K (image), Corel16k001 (image), Delicious (text), and Tmc2007 (text) to validate the proposed method. Some useful statistics of these datasets are provided in Table 1. In this study, we select three different multi-label online classification algorithms: the perceptron algorithm based on multi-class and multi-label feedback (MMP)^{[24]}, BRPA, and BR-perceptron (BRPE), together with a multi-label offline classification algorithm, ML-KNN. The MMP algorithm is a perceptron-based multi-label online classification algorithm that considers label correlations. The BRPA and BRPE algorithms are multi-label online classification algorithms based on the problem decomposition strategy: BRPA transforms the multi-label classification problem into a set of binary online PA problems, and BRPE similarly transforms it into binary online perceptron problems. ML-KNN is derived from the traditional k-nearest neighbor (KNN) algorithm: for each unseen instance, its k nearest neighbors in the training set are first identified, and the instance's label set is then determined from statistical information gained from the label sets of these neighboring instances.
In BRPA and our MLRPA, the parameter is set to C = 1. For the ML-KNN algorithm, the smoothing factor is s = 1 and the number of nearest neighbors is k = 10. To keep the datasets realistic, we randomly permute each dataset. In the experiments, we use online learning on the training dataset and offline evaluation on the test dataset. All experiments were executed with 10 runs of different random permutations for each dataset, and all results are reported as averages over these 10 runs.
Table 1  Statistics for four benchmark multi-label datasets
3.2 Performance Comparison
First, we evaluated the four online algorithms on the training sets of the four large-scale multi-label datasets; Fig. 1 and Fig. 2 show the average precision and one-error values. Figure 1 shows the average precision of the four different multi-label online learning algorithms. We observe that as the training sample set grows, the average precision of all four online classification algorithms also increases, and the MLRPA algorithm proposed in this paper achieves a better average precision than the other algorithms. In addition, on the image dataset (Corel16k001), the online algorithm that considers label correlations outperforms the algorithms based on the decomposition strategy. Figure 2 summarizes the one-error performance of the four algorithms. As the training sample set grows, the proposed MLRPA algorithm makes fewer errors than the other algorithms; the advantage is especially prominent on the image dataset (Corel16k001). These results show that the label-correlation-based MLRPA algorithm significantly outperforms the online algorithms based on problem decomposition in multi-label online classification, and that MLRPA is well suited to large-scale dataset classification problems.
Fig. 1 The average precision analysis of four algorithms on four datasets 
Fig. 2 The one error analysis of four algorithms on four datasets 
To further evaluate the online MLRPA algorithm, we compare it with the online and offline algorithms on the test datasets. Tables 2-5 show detailed metrics of the different algorithms on the four test datasets, evaluated in an offline manner, according to the four criteria mentioned in this paper; the optimal value for each dataset is highlighted in boldface. To compare these methods, we rank them on each metric and calculate the average rank of each method over all four datasets, as listed in the last rows of Tables 2-5.
The results in Tables 2 to 5 show that the proposed MLRPA algorithm achieves good classification performance on all datasets under the four evaluation criteria. Table 2 shows the one-error values on the four datasets: MLRPA works best on all four. For the ranking loss in Table 3, MLRPA performs best on all four datasets and achieves the first rank. In Table 4, for average precision, the four best values all come from MLRPA, which also obtains the best rank. According to Table 5, MLRPA works best on all datasets and receives the best rank. In addition, we observe that MLRPA outperforms ML-KNN and MMP on all datasets. These results further indicate that the label-correlation-based MLRPA algorithm is well suited to large-scale multi-label classification problems.
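For reference, per-instance versions of two of the four criteria, one-error and ranking loss, can be sketched as follows (standard definitions, not code from the paper); `scores` are the classifier outputs for one instance and `relevant` its true label set:

```python
import numpy as np

def one_error(scores, relevant):
    """1 if the top-ranked label is not a relevant label, else 0."""
    return 0.0 if int(np.argmax(scores)) in relevant else 1.0

def ranking_loss(scores, relevant):
    """Fraction of (relevant, irrelevant) label pairs ordered wrongly."""
    q = len(scores)
    irrelevant = [s for s in range(q) if s not in relevant]
    if not relevant or not irrelevant:
        return 0.0                      # degenerate instance: no pairs to compare
    bad = sum(1 for r in relevant for s in irrelevant
              if scores[r] <= scores[s])
    return bad / (len(relevant) * len(irrelevant))
```

Dataset-level values, as reported in Tables 2 and 3, are obtained by averaging these per-instance quantities over the test set; lower is better for both.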
Table 2  Performance comparisons on one-error (unit: %)
Table 3  Performance comparisons on ranking loss (unit: %)
Table 4  Performance comparisons on average precision (unit: %)
Table 5  Performance comparisons on coverage (unit: %)
4 Conclusion
This paper investigated multi-label online learning techniques for machine learning and mining data streams. By maximizing the margin between the relevant and irrelevant labels of multi-label samples, we proposed a new algorithm for multi-label classification that considers label correlations. The experimental results on four datasets illustrate that the proposed method works better than the existing methods compared, including the MMP ranking algorithm and two multi-label online approaches, according to four widely used evaluation metrics. In future work, we will validate the effectiveness of our method on more datasets and take online second-order algorithms into account in multi-label classification problems.
References
[1] Hoi S C H, Wang J L, Zhao P L. Libol: A library for online learning algorithms[J]. Journal of Machine Learning Research, 2014, 15: 495-499.
[2] Crammer K, Kulesza A, Dredze M. Adaptive regularization of weight vectors[J]. Machine Learning, 2013, 91(2): 155-187.
[3] Wang J L, Zhao P L, Hoi S C H. Exact soft confidence-weighted learning[C]// Proceedings of the 29th International Conference on Machine Learning (ICML-12). New York: ACM, 2012: 121-128.
[4] Siblini W, Kuntz P, Meyer F. A review on dimensionality reduction for multi-label classification[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, 33: 839-857.
[5] Gibaja E, Ventura S. A tutorial on multilabel learning[J]. ACM Computing Surveys, 2015, 47(3): 1-38.
[6] Li P Y, Wang H L, Böhm C, et al. Online semi-supervised multi-label classification with label compression and local smooth regression[C]// Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2020: 1359-1365.
[7] Liang S P, Liu Z, You D L, et al. Online multi-label stream feature selection based on neighborhood rough set with missing labels[J]. Pattern Analysis and Applications, 2022, 25(4): 1025-1039.
[8] Gong K L, Zhai T T. An online active multi-label classification algorithm based on a hybrid label query strategy[C]// 2021 3rd International Conference on Machine Learning, Big Data and Business Intelligence (MLBDBI). New York: IEEE, 2021: 463-468.
[9] Rosenblatt F. The perceptron: A probabilistic model for information storage and organization in the brain[J]. Psychological Review, 1958, 65(6): 386-408.
[10] Crammer K, Dekel O, Keshet J, et al. Online passive-aggressive algorithms[J]. Journal of Machine Learning Research, 2006, 7: 551-585.
[11] Liu J H, Lin Y J, Li Y W, et al. Online multi-label streaming feature selection based on neighborhood rough set[J]. Pattern Recognition, 2018, 84: 273-287.
[12] Er M J, Venkatesan R, Wang N. An online universal classifier for binary, multi-class and multi-label classification[C]// 2016 IEEE International Conference on Systems, Man, and Cybernetics (SMC). New York: IEEE, 2016: 3701-3706.
[13] Chu H M, Huang K H, Lin H T. Dynamic principal projection for cost-sensitive online multi-label classification[J]. Machine Learning, 2019, 108(8/9): 1193-1230.
[14] Guo X Z, Zhang Y W, Xu J H. Online multi-label passive aggressive active learning algorithm based on binary relevance[C]// Neural Information Processing. Cham: Springer-Verlag, 2017: 256-266.
[15] Liu J, Guo Z W, Sun Z W, et al. Online multi-label feature selection on imbalanced data sets[C]// Communications in Computer and Information Science. Singapore: Springer-Verlag, 2018, 812: 165-174.
[16] Boulbazine S, Cabanes G, Matei B, et al. Online semi-supervised growing neural gas for multi-label data classification[C]// 2018 International Joint Conference on Neural Networks (IJCNN). New York: IEEE, 2018: 1-8.
[17] Huang S J, Zhou Z H. Multi-label learning by exploiting label correlations locally[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 26(1): 949-955.
[18] Zhang M L, Zhang K. Multi-label learning by exploiting label dependency[C]// Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2010: 999-1008.
[19] Zhang M L, Zhou Z H. A review on multi-label learning algorithms[J]. IEEE Transactions on Knowledge and Data Engineering, 2014, 26(8): 1819-1837.
[20] Zhang M L, Zhou Z H. ML-KNN: A lazy learning approach to multi-label learning[J]. Pattern Recognition, 2007, 40(7): 2038-2048.
[21] Ju X C, Tian Y J, Liu D L, et al. Nonparallel hyperplanes support vector machine for multiclass classification[J]. Procedia Computer Science, 2015, 51(1): 1574-1582.
[22] Jia X Y, Li Z C, Zheng X A, et al. Label distribution learning with label correlations on local samples[J]. IEEE Transactions on Knowledge and Data Engineering, 2021, 33(4): 1619-1631.
[23] Li R X, Du J X, Ding J M, et al. Semi-supervised multi-label dimensionality reduction learning by instance and label correlations[J]. Mathematics, 2023, 11(3): 782.
[24] Crammer K, Singer Y. A family of additive online algorithms for category ranking[J]. Journal of Machine Learning Research, 2003, 3: 1025-1058.