Wuhan Univ. J. Nat. Sci.
Volume 28, Number 2, April 2023
Page(s) 169 - 176
DOI https://doi.org/10.1051/wujns/2023282169
Published online 23 May 2023

© Wuhan University 2023

Licence: Creative Commons. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

0 Introduction

Against the background of a global energy shortage and the shared goal of carbon neutrality, the development and utilization of new energy has become an important way for many countries to address the energy shortfall[1]. Wind energy, as a green energy source exploited on a large commercial scale, has become an important part of the transformation of the energy structure. When wind energy serves as a supplementary or base energy source, the randomness of wind direction and wind speed causes large uncertain fluctuations in wind power, which seriously restricts the large-scale development of wind power[2]. To ensure the safe and stable operation of the power grid and to improve wind power consumption capacity, accurate wind power forecasting methods are urgently needed.

Researchers have carried out extensive and in-depth work on short-term wind power forecasting[3,4]. Existing forecasting methods can be divided into two categories: statistical methods[5-7] and deep learning methods[8-10]. Statistical methods usually use historical time-series data to predict wind power values; commonly used statistical methods include regression methods[5] and time-series methods[6,7]. Statistical methods handle the nonlinear relationships in the data poorly when the sample size is large, so their prediction accuracy is limited. Deep learning methods train deep neural networks on large amounts of data so that the model learns the relationships among the data and predicts the wind power. Commonly used methods of this kind include neural networks[8], random forests[9], and extreme learning machines[10].

With the rapid development of deep learning in recent years, researchers have studied short-term wind power prediction with these methods extensively and made great progress. Cherkassky et al[11] established a support vector machine (SVM) method with improved parameter selection, which further increased prediction accuracy. However, gradient explosion and vanishing in traditional neural networks remain important factors limiting their development[12]. To address this problem, Shahid et al[13] proposed building wind power models with long short-term memory (LSTM) networks, which effectively alleviates the gradient problems of traditional neural networks. However, the prediction process requires many weight parameters, training is difficult, and the prediction latency is long. To address this pain point, the extreme gradient boosting algorithm[14] has attracted much interest among researchers, because it achieves better accuracy than traditional methods and reduces training time through parallel computation. This approach has already been applied to photovoltaic power generation prediction[15] and wind power prediction[16]. For multi-dimensional modeling, Zhang et al[17] determined that wind power data have chaotic characteristics and established a corresponding chaotic model.

Ding et al[18] decomposed the wind power series with improved complementary ensemble empirical mode decomposition and then predicted the wind power with an echo state network. The prediction method in Ref.[18] considers only the turbine power itself and ignores other meteorological characteristics. In practice, different wind farms are suited to different meteorological feature inputs[19], so an appropriate algorithm is required to select suitable meteorological features as the input variables of the model. The CatBoost algorithm[20] reduces the need for extensive hyperparameter tuning and the risk of overfitting, making the model more versatile[21]. Furthermore, CatBoost effectively mitigates the impact of uneven data distribution on the model and offers good performance, high precision, and strong generalization ability[22].

In this paper, a short-term wind power forecasting method, the CatBoost short-term wind energy forecast (CBSWF), based on the combination of meteorological characteristics and CatBoost, is proposed. First, the meteorological characteristics affecting wind power are defined. Then the categorical features are counted, feature frequencies are calculated to generate new index features, and the feature dimension is increased through category combination. Next, the feature algorithm is used for parameter optimization. Finally, the CBSWF method is tested against other methods. The experimental results show that the method effectively reduces training time and improves the prediction accuracy of wind power.

1 Meteorological Features

In short-term wind power prediction, wind speed is the main factor determining the output power of a wind farm. Zhu et al[23] found that wind speed at different heights has a certain impact on the atmospheric characteristics around the wind farm. Ma et al[24] further pointed out that the wake effect among turbines is also an important factor affecting wind power, and that wind direction strongly influences wind power, so meteorological characteristics must be selected and modeled. The meteorological characteristics commonly used in short-term wind power forecasting are shown in Table 1.

When dealing with the categorical features among the meteorological features, the CBSWF method replaces each categorical feature value with the average value of the labels corresponding to that category. In the decision tree, this label average is used as the criterion for node splitting. This method is called Greedy Target-based Statistics[25], or Greedy TS for short, as shown in formula (1):

\hat{x}_k^i = \frac{\sum_{j=1}^{n} [x_{j,k} = x_{i,k}]\, Y_j}{\sum_{j=1}^{n} [x_{j,k} = x_{i,k}]}    (1)

where x denotes the feature value, i the category, Y the target variable, j the current sample, and k the training sample index. The label-averaging approach incorporates more information, but if the label average is forced to represent the feature, a conditional shift problem arises when the data structure or the distributions of the training and test sets differ. The CBSWF method improves Greedy TS by adding a prior distribution term, reducing the impact of noise and low-frequency categories on the data distribution:

\hat{x}_k^i = \frac{\sum_{j=1}^{l-1} [x_{\sigma_j,k} = x_{\sigma_l,k}]\, Y_{\sigma_j} + a\, l}{\sum_{j=1}^{l-1} [x_{\sigma_j,k} = x_{\sigma_l,k}] + a}    (2)

where l is the added prior term, σ denotes a random permutation order, and a is a weight coefficient usually greater than 0. For features with a small number of categories, this reduces the influence of noisy data. For regression problems, the prior term can generally take the mean value of the dataset labels; for binary classification, the prior term is the prior probability of a positive example. Using multiple permutations of the dataset is also valid, but computing them directly may lead to overfitting. CBSWF avoids this overfitting in the permutation computation through its symmetric tree structure.
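As an illustration, the following minimal Python sketch computes the ordered target statistic of formula (2) for a single categorical column. It assumes pandas/numpy inputs, takes the dataset label mean as the prior term (the regression case described above) with weight a, and uses illustrative names; it is not the CatBoost implementation itself.

import numpy as np
import pandas as pd

def ordered_target_statistic(cat: pd.Series, y: pd.Series, a: float = 1.0,
                             seed: int = 0) -> pd.Series:
    """Encode a categorical feature with an ordered target statistic (formula (2)).

    For each sample, only samples that precede it in a random permutation
    contribute to the statistic, and a prior (the global label mean) with
    weight `a` smooths low-frequency categories."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(cat))          # sigma: random processing order
    prior = y.mean()                           # prior term for regression
    sums, counts = {}, {}                      # running per-category label sums/counts
    encoded = np.empty(len(cat))
    for pos in order:
        c = cat.iloc[pos]
        s, n = sums.get(c, 0.0), counts.get(c, 0)
        encoded[pos] = (s + a * prior) / (n + a)   # statistic before seeing this sample
        sums[c] = s + y.iloc[pos]                  # then update the running statistics
        counts[c] = n + 1
    return pd.Series(encoded, index=cat.index)

# Example: encode a wind-direction sector feature against measured power
df = pd.DataFrame({"dir_sector": ["N", "N", "E", "S", "E", "N"],
                   "power": [1.2, 0.8, 2.5, 0.4, 2.1, 1.0]})
df["dir_sector_ts"] = ordered_target_statistic(df["dir_sector"], df["power"])
print(df)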

Table 1 Common meteorological features for short-term wind power prediction

2 Feature Screening Method

Feature screening is a fast and effective dimensionality-reduction approach for high-dimensional data. In this paper, a feature screening method MS-SIS based on sure independence screening (SIS)[26] is designed, which can directly handle high-dimensional screening problems with log p = O(n^α) without special treatment of multi-class discrete variables and covariables. For the covariables B_d (d = 1, ..., p) used in the CBSWF method, a feature screening index ω_d is constructed to measure the strength of the association between the covariable B_d and the response variable C.

\omega_d = \max_{1 \le r \le R} \sup_{b} \left| P(B_d \le b \mid C = c_r) - P(B_d \le b) \right|    (3)

where P(·) denotes probability, b ranges over the values of the discrete variable B_d, and c_r (r = 1, ..., R) are the R categories of the response variable C.

Based on the screening index ω_d, let i = 1, ..., n, let A denote the set of important variables and I the set of unimportant variables, and let the random samples (B_i, C_i) be drawn from {B, C}; then

\hat{P}(B_d \le b \mid C = c_r) = \frac{\sum_{i=1}^{n} I\{B_{id} \le b,\, C_i = c_r\}}{\sum_{i=1}^{n} I\{C_i = c_r\}}    (4)

\hat{P}(B_d \le b) = \frac{1}{n} \sum_{i=1}^{n} I\{B_{id} \le b\}    (5)

The empirical screening index is then constructed as

\hat{\omega}_d = \max_{1 \le r \le R} \sup_{b} \left| \hat{P}(B_d \le b \mid C = c_r) - \hat{P}(B_d \le b) \right|    (6)

There exist non-negative constants τ and ξ with 0 ≤ τ < 1/2 and 0 ≤ ξ < 1/2 − τ, and a constant c > 0, such that R = O(n^ξ) and \min_{d \in A} \omega_d \ge 2c n^{-\tau}. The screened subset \hat{A} is then obtained from \hat{\omega}_d:

\hat{A} = \{ d : \hat{\omega}_d \ge c n^{-\tau},\ 1 \le d \le p \}    (7)

There exists a positive number λ such that

\delta = \min_{d \in A} \hat{\omega}_d - \min_{d \in I} \omega_d > 0

which separates the variables selected into \hat{A} from the unimportant ones in order of importance:

P\left( \min_{d \in A} \hat{\omega}_d \ge \min_{d \in I} \hat{\omega}_d \right) \ge 1 - O\left( p \exp\left\{ -\lambda \delta^2 n^{1 - 2\xi} + (1 + \xi) \log n \right\} \right)    (8)
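For reference, here is a minimal numpy sketch of the empirical screening index of formulas (4)-(6), assuming a covariable matrix B and a discrete response C; the function name, toy data, and the final ranking step are illustrative only.

import numpy as np

def ms_sis_index(B: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Empirical screening index omega_hat for each covariable (formula (6)).

    For every covariable B[:, d], compute the largest gap between the
    conditional empirical CDF given each response category and the marginal
    empirical CDF, then take the maximum over categories."""
    n, p = B.shape
    categories = np.unique(C)
    omega = np.zeros(p)
    for d in range(p):
        col = B[:, d]
        cuts = np.unique(col)                                     # candidate thresholds b
        marginal = np.array([(col <= b).mean() for b in cuts])    # P_hat(B_d <= b)
        gap = 0.0
        for c_r in categories:
            mask = (C == c_r)
            conditional = np.array([(col[mask] <= b).mean() for b in cuts])
            gap = max(gap, np.abs(conditional - marginal).max())
        omega[d] = gap
    return omega

# Example: rank candidate meteorological covariables by importance
rng = np.random.default_rng(0)
B = rng.normal(size=(200, 5))                                     # 5 candidate features
C = (B[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)        # response driven by feature 0
omega_hat = ms_sis_index(B, C)
print(omega_hat.round(3), np.argsort(omega_hat)[::-1])            # most important first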

3 Gradient Deviation and Iteration

CBSWF uses Algorithm 1, shown below, to overcome the prediction shift problem.

Algorithm 1: Ordered boosting
Input: training samples {(k_i, Y_i)}_{i=1}^{n}; number of trees A
σ ← random permutation of [1, n];
M_i ← 0 for i = 1, ..., n;
for t ← 1 to A do
  for i ← 1 to n do
    r_i ← y_i − M_{σ(i)−1}(k_i);
  for i ← 1 to n do
    ΔM ← LearnModel((k_j, r_j) : σ(j) ≤ i);
    M_i ← M_i + ΔM;
return M_n

In Algorithm 1, to obtain unbiased gradient estimates, CBSWF trains a separate model M_i for each sample k_i, where M_i is trained on a set that does not contain k_i. The gradient estimate for each sample is then obtained from M_i, and these gradients are used to train the base learner and obtain the final model.
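The sketch below illustrates this ordered residual idea (Algorithm 1) using shallow scikit-learn regression trees as base learners and a block-wise approximation of the per-sample prefix models to keep it short; it is a conceptual sketch, not the CatBoost implementation.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def ordered_boosting(X, y, n_trees=10, lr=0.1, n_blocks=8, seed=0):
    """Sketch of ordered boosting: residuals for each sample come from trees
    fitted only on samples that precede it in a random permutation, so a
    sample's own label never leaks into its own gradient estimate."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    n_blocks = max(1, min(n_blocks, n))          # guard for very small samples
    perm = rng.permutation(n)                    # sigma: random processing order
    X, y = X[perm], y[perm]
    prefix_pred = np.zeros(n)                    # approximates M_{sigma(i)-1}(x_i)
    bounds = np.linspace(0, n, n_blocks + 1).astype(int)
    trees = []
    for _ in range(n_trees):
        residual = y - prefix_pred               # r_i = y_i - M_{sigma(i)-1}(x_i)
        trees.append(DecisionTreeRegressor(max_depth=3).fit(X, residual))
        # refresh the leave-ahead predictions block by block: each block is
        # predicted by a tree trained only on the samples before it
        for k in range(1, n_blocks):
            prefix_tree = DecisionTreeRegressor(max_depth=3).fit(
                X[:bounds[k]], residual[:bounds[k]])
            blk = slice(bounds[k], bounds[k + 1])
            prefix_pred[blk] += lr * prefix_tree.predict(X[blk])
    return trees

def boosted_predict(trees, X, lr=0.1):
    return lr * sum(tree.predict(np.asarray(X, dtype=float)) for tree in trees)

The returned trees form the analogue of the final model M_n; the prefix models are used only to produce the leave-ahead residuals.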

Let F_i denote the model after building i trees, and g_i(X_k, Y_k) the gradient value of training sample k after building i trees. The model F_i is trained so that g_i(X_k, Y_k) is unbiased with respect to F_i. For each X_k, a separate model M_k is trained, and M_k is never updated with a gradient estimate based on that sample. M_k is then used to estimate the gradient of X_k, and this estimate is used to score the resulting tree, as shown in Algorithm 2, where Loss(u_i, v) is the loss function to be optimized, u is the label value, and v is the model output.

Algorithm 2: Updating the models and calculating model values for gradient estimation
Input: training samples {(X_i, Y_i)}_{i=1}^{n} ordered according to σ; number of trees I
M_i ← 0 for i = 1, ..., n;
for iter ← 1 to I do
  for i ← 1 to n do
    for j ← 1 to i − 1 do
      g_j ← (d/dv) Loss(u_j, v) |_{v = M_i(X_j)};
    M ← LearnTree((X_j, g_j) for j = 1, ..., i − 1);
    M_i ← M_i + M;
return M_1, ..., M_n; M_1(X_1), ..., M_n(X_n)

The CBSWF algorithm obtains a strong learner through the serial iteration of a group of weak learners, so as to achieve higher-precision prediction. It uses the forward stagewise additive algorithm, and the weak learner is the classification and regression tree (CART).

Let t be the iteration number, let the strong learner obtained in the previous iteration be F_{t-1}(x), and let the loss function be L(y, F_{t-1}(x)). The goal of the current iteration is to find the regression-tree weak learner h_t that minimizes this round's loss. Formula (9) gives the h_t of the current iteration.

h_t = \underset{h \in H}{\arg\min}\, L\left( y, F_{t-1}(x) + h(x) \right)    (9)

The negative gradient of the loss function is used to fit an approximation of the loss in each round; g_t(x, y) denotes this gradient in formula (10).

g_t(x, y) = \left. \frac{\partial L(y, s)}{\partial s} \right|_{s = F_{t-1}(x)}    (10)

Formula (11) is usually used to approximate h_t.

h_t = \underset{h \in H}{\arg\min}\, \mathbb{E}\left( -g_t(x, y) - h(x) \right)^2    (11)

Finally, as shown in formula (12), we get

F_t(x) = F_{t-1}(x) + h_t(x)    (12)

The conditional distribution g_t(X_k, y_k) | X_k computed from the training samples {X_k} is shifted relative to the distribution g_t(X, y) | X on the test set, so the h_t defined by formula (11) deviates from that of formula (9), which ultimately degrades the generalization ability of the model F_t.
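For concreteness, here is a minimal sketch of one forward stagewise round following formulas (10)-(12), with squared loss (whose negative gradient is simply the residual) and a scikit-learn CART as the weak learner; a small shrinkage factor lr, which formula (12) omits, is added for stability, and all names and hyperparameters are illustrative.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosting_round(X, y, F_prev, lr=0.1):
    """One iteration of formulas (10)-(12) with squared loss L(y, s) = (y - s)^2 / 2.

    F_prev holds the current predictions F_{t-1}(x) for the training samples."""
    g_t = F_prev - y                               # formula (10): dL/ds at s = F_{t-1}(x)
    h_t = DecisionTreeRegressor(max_depth=3)       # CART weak learner
    h_t.fit(X, -g_t)                               # formula (11): fit h to the negative gradient
    F_t = F_prev + lr * h_t.predict(X)             # formula (12): additive update
    return h_t, F_t

# Example: run a few rounds on toy data
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)
F = np.zeros_like(y)                               # F_0 = 0
learners = []
for _ in range(20):
    h, F = boosting_round(X, y, F)
    learners.append(h)
print("training MSE:", float(np.mean((y - F) ** 2)))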

4 Construction of the Combination Method

By using CatBoost together with the MS-SIS method to construct a combination model[27], the CBSWF method can effectively integrate information from multiple sub-models, reduce the prediction risk of any single model, and improve the overall prediction accuracy of the algorithm. Therefore, the MS-SIS feature selection algorithm is combined with CatBoost as the underlying algorithm to construct a combination forecasting method. The corresponding training and forecasting process is shown in Fig. 1.

Fig. 1 Forecast training process

1) MS-SIS-based feature selection is performed on the original feature set.

2) According to the MS-SIS ranking, the data set is divided into n training subsets {Â_1, ..., Â_n}.

3) The n training subsets are each trained with CatBoost to generate n sub-models {X_1, ..., X_n}. The prediction error e_i of each sub-model is evaluated on the test set with the mean square error, calculated as follows:

e_i = \frac{1}{N} \sum_{d=1}^{N} \left( z_d - \hat{z}_d \right)^2    (13)

where z_d is the measured power value, \hat{z}_d is the corresponding predicted power value, and N is the number of samples.

4) Data preprocessing including feature selection is carried out on the prediction set, and the processed prediction set is divided into n prediction subsets according to the wind speed at different heights.

5) The sub-models {X_1, ..., X_n} generated in training are paired with their prediction errors {e_1, ..., e_n}, and the test samples are then evaluated to obtain the final prediction results, as sketched below.
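The text does not spell out how the sub-model errors {e_1, ..., e_n} enter the final combination; the sketch below therefore assumes a simple inverse-error weighting of the sub-model outputs and uses hypothetical CatBoost parameters, purely to illustrate steps 2)-5).

import numpy as np
from catboost import CatBoostRegressor

def train_sub_models(subsets, X_test, y_test):
    """Steps 2)-3): one CatBoost sub-model per training subset, each scored
    with the mean square error of formula (13)."""
    models, errors = [], []
    for X_sub, y_sub in subsets:
        m = CatBoostRegressor(iterations=500, depth=6, learning_rate=0.1,
                              loss_function="RMSE", verbose=False)
        m.fit(X_sub, y_sub)
        models.append(m)
        errors.append(float(np.mean((y_test - m.predict(X_test)) ** 2)))
    return models, np.array(errors)

def combine_predict(models, errors, X_new):
    """Steps 4)-5): combine sub-model outputs; inverse-error weights are an
    assumption, since the weighting is not specified in the text."""
    weights = 1.0 / (errors + 1e-12)
    weights /= weights.sum()
    predictions = np.column_stack([m.predict(X_new) for m in models])
    return predictions @ weights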

5 Test and Analysis

Tests are performed on the Windows 10 operating system with Python 3.7 and TensorFlow 2.0, on an Intel Core i7-12800HX CPU with an RTX 3070 Ti graphics card.

The sample data of wind farm A selected in this paper come from a public measured dataset of a wind turbine cluster in the United States in 2012, provided by the National Renewable Energy Laboratory (NREL). The installed capacity of the cluster is 18 MW, and the data comprise six time series: wind direction, wind speed, air temperature, air pressure, atmospheric density, and wind power. The sampling interval is 5 min with no missing values, giving 105 120 time sections in total. To fully demonstrate the performance of the proposed method, further experiments are carried out on the measured data and historical numerical weather prediction data of a wind farm B in Northeast China. The installed capacity of that farm is 80 MW, the hub height is 70 m, and the sampling interval is 15 min, giving 70 000 time sections in total. Bilinear interpolation and Kriging interpolation are used to fill missing values for the different meteorological elements.

Considering that the sampling intervals of the original data are 5 and 15 min, and in order to extract hourly, daily, and monthly features as fully as possible, the number of time steps is set to 12, the network uses mini-batch input, and the number of samples per batch is 72. The method is trained for 50 epochs, early stopping is used to reduce overfitting and training time, and the mean square error (MSE) is used as the loss function during training. Wind farm A uses the first 80 000 records as the training set and the remaining 25 120 records as the test set; wind farm B uses the first 55 000 records as the training set and the remaining 15 000 records as the test set. The grid search optimization provided with Keras is used to ensure sound parameter settings. The selected parameters are shown in Table 2.
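As an illustration of this setup, the following sketch builds 12-step input windows, applies the 80 000 / 25 120 split for wind farm A with a held-out validation slice for early stopping, and trains a CatBoost regressor with an RMSE objective; the placeholder data, file layout, and parameter values are illustrative and do not reproduce Table 2.

import numpy as np
from catboost import CatBoostRegressor

def make_windows(features: np.ndarray, target: np.ndarray, steps: int = 12):
    """Build (X, y) pairs in which each sample carries a 12-step history."""
    X = np.stack([features[i:i + steps].ravel()
                  for i in range(len(features) - steps)])
    y = target[steps:]
    return X, y

# Placeholder for the farm A series (wind speed, direction, temperature,
# pressure, density, power); in practice the NREL dataset would be loaded here.
data = np.random.default_rng(0).normal(size=(105_120, 6))
X, y = make_windows(data[:, :5], data[:, 5], steps=12)

split = 80_000                                   # first 80 000 records for training
X_train, y_train = X[:70_000], y[:70_000]        # keep a tail slice for validation
X_val, y_val = X[70_000:split], y[70_000:split]
X_test, y_test = X[split:], y[split:]

model = CatBoostRegressor(iterations=500, depth=6, learning_rate=0.05,
                          loss_function="RMSE", verbose=False)
# early stopping plays the role of the "early stop" measure described above
model.fit(X_train, y_train, eval_set=(X_val, y_val), early_stopping_rounds=50)
rmse = float(np.sqrt(np.mean((y_test - model.predict(X_test)) ** 2)))
print("test RMSE:", rmse)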

Root mean square error (RMSE)[28] and mean absolute error (MAE)[29] are used to evaluate the prediction results. RMSE mainly reflects how well the method controls large absolute errors, while MAE reflects the typical magnitude of the prediction error. They are calculated as follows:

\varsigma_{\mathrm{RMSE}} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left[ \mu(i) - \hat{\mu}(i) \right]^2 }    (14)

\varsigma_{\mathrm{MAE}} = \frac{1}{n} \sum_{i=1}^{n} \left| \mu(i) - \hat{\mu}(i) \right|    (15)

where μ(i) and μ^(i) are the actual and predicted values of wind power, n is the number of prediction and verification data, and i is the sequence number of prediction points. In addition, all the assessment indicators provided in this paper are normalized by installed capacity.
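A small helper implementing formulas (14) and (15) with the normalization by installed capacity described above; the function name and example values are illustrative.

import numpy as np

def normalized_errors(actual: np.ndarray, predicted: np.ndarray,
                      installed_capacity_mw: float) -> dict:
    """RMSE (formula (14)) and MAE (formula (15)), normalized by installed capacity."""
    err = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    return {"RMSE": rmse / installed_capacity_mw,
            "MAE": mae / installed_capacity_mw}

# Example for wind farm A (installed capacity 18 MW)
actual = np.array([10.2, 9.8, 11.5, 12.0])
predicted = np.array([10.0, 10.1, 11.0, 12.4])
print(normalized_errors(actual, predicted, installed_capacity_mw=18.0))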

The prediction ability of the CBSWF method is compared with two baseline algorithms, the MLSTM method[16] and the CEEMD-FA-ESN method[17]. The parameters of each method are tuned according to the actual prediction situation to achieve its best prediction performance for comparison. To better match the actual situation, predicted values exceeding the maximum generation power are set to the maximum generation power, and negative predicted power is set to 0. Figure 2 shows the time cost of the three prediction methods over the iterative process, which indicates that the CBSWF method has the highest time efficiency.

Fig. 2 Iterations of the experimental model

The prediction results of the three methods for wind farm A and wind farm B are compared in Fig. 3. All three methods capture the overall trend of wind power for both farms, but the accuracy of the CBSWF method is clearly improved and its fit is better than that of the other methods.

Fig. 3 Comparative analysis of test data for wind farm A ((a), (b)) and wind farm B ((c), (d))

Comparing the prediction results in Fig. 3(b) and 3(d) with the error comparison in Table 3, it is evident that the prediction accuracy of the CBSWF method is higher than that of the MLSTM and CEEMD-FA-ESN methods: the average RMSE of the proposed method is 40.97% and 27.24% lower than that of the MLSTM and CEEMD-FA-ESN methods, respectively, and the corresponding average MAE decreases by 44.42% and 35.2%, respectively. For sharp rises and falls in the original signal, the fit of the proposed method remains satisfactory, whereas the errors of the MLSTM and CEEMD-FA-ESN methods are relatively large in the face of such sharp fluctuations. This shows that the MS-SIS method can effectively extract the sequence details of the meteorological data and achieve higher prediction accuracy, and further demonstrates that the CBSWF method has clear advantages in wind power prediction.

Table 2 Parameter selection of method

Table 3 Prediction errors of models

6 Summary

In view of the weak correlation among meteorological data and the limited prediction accuracy in current short-term wind power forecasting, a forecasting method, CBSWF, which combines a meteorological feature selection method with CatBoost, is proposed in this paper. The meteorological data are processed with MS-SIS, the meteorological features that strongly influence wind power are obtained by the feature selection algorithm, and the coupling among variables is removed. To balance accuracy and efficiency and reduce the prediction risk of a single model, the method uses CatBoost as the underlying algorithm to construct the combination forecasting method CBSWF. Prediction accuracy is improved by the ordering algorithm, and the gradient-deviation algorithm is designed to increase the stability of the method, shorten the training time, and reduce the amount of computation. The experimental results show that the CBSWF method performs well in predicting both the overall trend of wind power and its local details. In future work, we will consider further improving wind power prediction accuracy by downscaling the data in space and time through numerical weather prediction.

References

1. Li H H, Tan Z F, Chen H T, et al. Integrated heat and power dispatch model for wind-CHP system with solid heat storage device based on robust stochastic theory [J]. Wuhan University Journal of Natural Sciences, 2018, 23(1): 31-42.
2. Hu Y, Li Q, Fang F, et al. Dynamic interval modeling of ultra-short-term output of wind farm based on finite difference operating domains [J]. Power System Technology, 2022, 46(4): 1346-1357(Ch).
3. Tang X Z, Gu N W, Huang X Q, et al. Progress on short term wind power forecasting technology [J]. Journal of Mechanical Engineering, 2022, 58(12): 213-236(Ch).
4. Feng S L, Wang W S, Liu C, et al. Study on the physical approach to wind power prediction [J]. Proceedings of the CSEE, 2010, 30(2): 1-6(Ch).
5. Lu P, Ye L, Pei M, et al. Coordinated control strategy for active power of wind power cluster based on model predictive control [J]. Proceedings of the CSEE, 2021, 41(17): 5887-5900(Ch).
6. Sun Y, Li Z Y, Yu X N, et al. Research on ultra-short-term wind power prediction considering source relevance [J]. IEEE Access, 2020, 8: 147703-147710.
7. Lu P, Ye L, Zhong W Z, et al. A novel spatio-temporal wind power forecasting framework based on multi-output support vector machine and optimization strategy [J]. Journal of Cleaner Production, 2020, 254: 119993.
8. Liu T H, Wei H K, Zhang K J. Wind power prediction with missing data using Gaussian process regression and multiple imputation [J]. Applied Soft Computing, 2018, 71: 905-916.
9. Liu X, Zhou J, Qian H M. Short-term wind power forecasting by stacked recurrent neural networks with parametric sine activation function [J]. Electric Power Systems Research, 2021, 192: 107011.
10. Zhao Z, Wang X S. Ultra-short-term multi-step wind power prediction based on CEEMD and improved time series model [J]. Acta Energiae Solaris Sinica, 2020, 41(7): 352-358(Ch).
11. Cherkassky V, Ma Y Q. Practical selection of SVM parameters and noise estimation for SVM regression [J]. Neural Networks: The Official Journal of the International Neural Network Society, 2004, 17(1): 113-126.
12. Zhou D W, Zhao L J, Duan R, et al. Image super-resolution based on recursive residual networks [J]. Acta Automatica Sinica, 2019, 45(6): 1157-1165(Ch).
13. Shahid F, Zameer A, Muneeb M. A novel genetic LSTM model for wind power forecast [J]. Energy, 2021, 223: 120069.
14. Meng Y, Chen S L, Wu Z H, et al. A DC arc fault detection method based on CatBoost algorithm for different electrode materials [J]. Journal of Xi'an Jiaotong University, 2022, 56(3): 124-134(Ch).
15. Munawar U, Wang Z L. A framework of using machine learning approaches for short-term solar power forecasting [J]. Journal of Electrical Engineering & Technology, 2020, 15(2): 561-569.
16. Liu K W, Pu T J, Zhou H M, et al. A short-term wind power forecasting model based on combination algorithms [J]. Proceedings of the CSEE, 2013, 33(34): 130-135(Ch).
17. Zhang Q, Tang Z H, Wang G, et al. Ultra-short-term wind power prediction model based on long and short term memory network [J]. Acta Energiae Solaris Sinica, 2021, 42(10): 275-281(Ch).
18. Ding J L, Chen G C, Yuan K. Short-term wind power prediction based on improved firefly algorithm [J]. Journal of System Simulation, 2019, 31(11): 2509-2516(Ch).
19. Qian Z, Pei Y, Cao L X, et al. Review of wind power forecasting method [J]. High Voltage Engineering, 2016, 42(4): 1047-1060(Ch).
20. Dong L M, Zeng W Z, Lei G Q. Coupling CatBoost model with bat algorithm to simulate the pan evaporation in northwest China [J]. Water Saving Irrigation, 2021(2): 63-69(Ch).
21. Miao F S, Li Y, Gao C, et al. Diabetes prediction method based on CatBoost algorithm [J]. Computer Systems & Applications, 2019, 28(9): 215-218(Ch).
22. Yao F Q, Sun J W, Dong J H. Estimating daily dew point temperature based on local and cross-station meteorological data using CatBoost algorithm [J]. Computer Modeling in Engineering & Sciences, 2022, 130(2): 671-700.
23. Zhu R, Xu H, Gong Q, et al. Wind environmental regionalization for development and utilization of wind energy in China [J]. Acta Energiae Solaris Sinica, 2022, 8: 1-14(Ch).
24. Ma X P, He S E, Yao Y, et al. Virtual inertia estimation of wind farm zones with wind speed uncertainty and correlation [J]. Power System Protection and Control, 2022, 50(10): 123-131(Ch).
25. Kundu S M, Pal S K. Deprecation based greedy strategy for target set selection in large scale social networks [J]. Information Sciences: An International Journal, 2015, 316: 107-122.
26. Fan J Q, Lv J C. Sure independence screening for ultrahigh dimensional feature space [J]. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2008, 70(5): 849-911.
27. Zhou Q, Ren H J, Li J, et al. Variable weight combination method for mid-long term power load forecasting based on hierarchical structure [J]. Proceedings of the CSEE, 2010, 30(16): 47-52(Ch).
28. Faber N M. Estimating the uncertainty in estimates of root mean square error of prediction: Application to determining the size of an adequate test set in multivariate calibration [J]. Chemometrics and Intelligent Laboratory Systems, 1999, 49(1): 79-89.
29. Coyle E J, Lin J H. Stack filters and the mean absolute error criterion [J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1988, 36(8): 1244-1254.


