Short-Term Wind Power Prediction Method Based on Combination of Meteorological Features and CatBoost

Abstract: As one of the hot topics in the field of new energy, short-term wind power prediction research should pay attention to the impact of meteorological characteristics on wind power while improving the prediction accuracy. Therefore, a short-term wind power prediction method based on the combination of meteorological features and CatBoost is presented. Firstly, the morgan-stone algebras and sure independence screening (MS-SIS) method is designed to filter the meteorological features and to explore their influence on wind power. Then, a sort enhancement algorithm is designed to increase the accuracy and computational efficiency of the method and to reduce the prediction risk of a single element. Finally, a prediction method based on the CatBoost network is constructed to realize short-term wind power prediction. The National Renewable Energy Laboratory (NREL) dataset is used for experimental analysis. The results show that the proposed method not only improves the prediction accuracy of short-term wind power but also has higher computational efficiency.


Introduction
Under the global consensus on the energy shortage and the goal of achieving carbon neutrality, the development and utilization of new energy has become an important way for many countries to address the energy shortage [1]. Wind energy, as a large-scale commercial green energy, has become an important part of the transformation of the energy structure. On wind power forecasting, researchers have carried out extensive and in-depth research [3,4]. The existing forecasting methods can be divided into two categories: statistical methods [5][6][7] and deep learning methods [8][9][10]. Statistical methods usually use historical time-series data to predict wind power values; the commonly used statistical methods are the regression method [5] and the time-series method [6,7]. Statistical methods handle the nonlinear relationships in the data poorly when the sample is large, so their prediction accuracy is not high. Deep learning methods train deep neural networks on large amounts of data so that the method can learn the relationships in the data and predict wind power. The methods commonly used in forecasting are neural networks [8], random forests [9], extreme learning machines [10], and so on.
With the rapid development of deep learning in recent years, researchers have carried out more extensive and in-depth research on short-term wind power prediction using this approach and have made great progress. Cherkassky et al. [11] established a multi-support vector machine (SVM) method and enhanced its parameter optimization, which further improved the prediction accuracy. However, gradient explosion and vanishing in traditional neural networks are important factors limiting their development at present [12]. To solve this problem, Shahid et al. [13] proposed using a long short-term memory (LSTM) method to model wind power, which effectively solves the gradient problem of traditional neural networks. However, the prediction process requires many weight parameters, the method is difficult to train, and the prediction waiting time is long. To address this pain point, the extreme gradient boosting algorithm [14] has aroused much interest among researchers because of its better accuracy than traditional methods and its lower time cost in model training through parallel operation. At present, this approach has been applied in photovoltaic power generation prediction [15] and wind power prediction [16]. For multi-dimensional modeling, Zhang et al. [17] determined that wind power data have chaotic characteristics and established a corresponding chaotic model.
Ding et al. [18] decomposed the wind power series with an improved complementary ensemble empirical mode decomposition and then predicted the wind power data using an echo state network. The prediction method in Ref. [18] only considers the wind turbine power and ignores other meteorological characteristics. In reality, different wind farms are suited to different meteorological feature inputs [19], so corresponding algorithms are required to select appropriate meteorological features as the input variables of the model. The CatBoost algorithm [20] reduces the need for extensive hyperparameter tuning and the chance of overfitting, making the model more versatile [21]. Furthermore, CatBoost can effectively avoid the impact of uneven data distribution on the model, with good effect, high precision, and strong generalization ability [22].
In this paper, a short-term wind power forecasting method, CatBoost short-term wind energy forecast (CBSWF), based on the combination of meteorological characteristics and CatBoost is proposed. First, the meteorological characteristics affecting wind power are defined; then the categorical features are counted, feature frequencies are calculated to generate new exponential features, and the feature dimension is increased by combining categories; then the feature algorithm is used for parameter optimization; finally, the prediction method based on CBSWF is tested by comparison with other methods. The experimental results show that this method can effectively reduce the training time and improve the prediction accuracy of wind power.

Meteorological Features
In short-term wind power prediction, wind speed is the main factor that determines the output power of a wind farm. Zhu et al. [23] found that the wind speed at different heights has a certain impact on the atmospheric characteristics around the wind farm. Ma et al. [24] then pointed out that the wake effect within the turbine array is also one of the important factors affecting wind power, and that wind direction has a great influence on wind power, so it is necessary to select and model the meteorological characteristics. The commonly used meteorological characteristics in short-term wind power forecasting are shown in Table 1.
When dealing with the categorical features among the meteorological features, the CBSWF method replaces each category with the average value of the label corresponding to that category, and in the decision tree this label average is used as the criterion for node splitting. This method is called Greedy Target-based Statistics [25], or Greedy TS for short, as shown in formula (1):

x̂_k^i = Σ_{j=1}^{n} 1{x_j^i = x_k^i} Y_j / Σ_{j=1}^{n} 1{x_j^i = x_k^i}    (1)

where x represents the numerical variable, i the category, Y the target variable, j the current sample, and k the training sample. The label-averaging method is used to accommodate more information. However, if the label average is forced to represent the feature, a conditional shift problem occurs when the data structure or the distributions of the training and test data sets differ. The CBSWF method therefore improves Greedy TS by adding a prior term, reducing the impact of noise and low-frequency categorical data on the data distribution:

x̂_k^i = ( Σ_{σ(j)<σ(k)} 1{x_j^i = x_k^i} Y_j + a·l ) / ( Σ_{σ(j)<σ(k)} 1{x_j^i = x_k^i} + a )    (2)

where l is the added prior term, σ represents the random arrangement order, and a is a weight coefficient greater than 0. For features with a small number of categories, the noise in the data can thus be reduced. For regression problems, the prior term can in general take the mean value of the labels in the data set; for binary classification, the prior term is the prior probability of a positive example. Using multiple random permutations of the data set is also valid, but computing them directly may lead to overfitting; CBSWF avoids this overfitting problem through its permutation scheme and the use of symmetric trees.
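As an illustration, the two target statistics can be sketched in a few lines of Python. This is a minimal toy version, not the CatBoost library implementation; the function names and the example prior are our own.

```python
import random

def greedy_ts(categories, labels, prior, a=1.0):
    # Greedy TS of formula (1), with the prior term of formula (2) added:
    # (sum of labels in the category + a * prior) / (category count + a)
    sums, counts = {}, {}
    for c, y in zip(categories, labels):
        sums[c] = sums.get(c, 0.0) + y
        counts[c] = counts.get(c, 0) + 1
    return [(sums[c] + a * prior) / (counts[c] + a) for c in categories]

def ordered_ts(categories, labels, prior, a=1.0, seed=0):
    # Ordered variant: each sample's statistic uses only the samples that
    # precede it in a random permutation sigma, reducing target leakage.
    order = list(range(len(categories)))
    random.Random(seed).shuffle(order)
    sums, counts = {}, {}
    out = [0.0] * len(categories)
    for k in order:
        c = categories[k]
        out[k] = (sums.get(c, 0.0) + a * prior) / (counts.get(c, 0) + a)
        sums[c] = sums.get(c, 0.0) + labels[k]
        counts[c] = counts.get(c, 0) + 1
    return out
```

For example, for a toy direction feature, `greedy_ts(["N", "N", "S"], [1.0, 0.0, 1.0], prior=0.5)` yields `[0.5, 0.5, 0.75]`.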

Feature Screening Method
Feature screening is a fast and effective dimensionality-reduction method for dealing with high-dimensional data. In this paper, a feature screening method, MS-SIS, based on sure independence screening (SIS) [26] is designed, which can directly handle the high-dimensional feature screening problem of log p = O(n^α) without dealing with multi-class discrete variables and covariates. There are d ∈ A (d = 1, ..., n) covariates in the CBSWF method, which are used to construct a feature screening index ω_d that measures the importance of the correlation between the covariate B_d and the response variable C.
where P is the number of dimensions, b represents the random discrete variable, and c is the covariate of b, of which there are r in total.
To build the screening process, assume there exist a non-negative number τ with 0 ≤ τ < 1/2 and a positive number λ such that δ = min_{d∈A} ω_d ≥ λ n^{-τ}; this threshold distinguishes the variables in Â in order of importance.
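As a rough illustration of the screening step, the sketch below ranks covariates by a marginal index and keeps the top ones. Plain absolute Pearson correlation stands in for the paper's ω_d index, which is an assumption on our part; the function names are ours.

```python
import math

def marginal_index(x, y):
    # Absolute Pearson correlation, used here as a simple stand-in
    # for the screening index omega_d (an assumption on our part).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = math.sqrt(sum((a - mx) ** 2 for a in x))
    vy = math.sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (vx * vy)) if vx and vy else 0.0

def sis_screen(features, y, keep):
    # Rank every candidate covariate by its marginal index and keep
    # the top `keep` -- the sure-independence-screening step.
    ranked = sorted(features, key=lambda name: -marginal_index(features[name], y))
    return ranked[:keep]
```

A covariate perfectly correlated with the response is always retained ahead of a weakly correlated one, mirroring the ordering by importance described above.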

Gradient Deviation and Iteration
CBSWF uses Algorithm 1 to overcome the prediction shift problem, as shown below.
In Algorithm 1, to obtain an unbiased gradient estimate, CBSWF trains a separate method M_i for each sample x_i, where M_i is trained on a training set that does not contain x_i.
The gradient estimate for the sample is obtained from M_i, and this gradient is used to train the base learner and obtain the final method.
Let F_i be the method after building i trees, and let g_i(X_k, Y_k) be the gradient value of training sample k after building i trees. The method F_i is trained so that g_i(X_k, Y_k) is unbiased with respect to F_i. For each X_k, a separate method M_k is trained, and M_k is never updated with a gradient estimate based on that sample. M_k is used to estimate the gradient of X_k, and the estimate is used to score the resulting tree. As shown in Algorithm 2, Loss(u, v) is the loss function to be optimized, u is the label value, and v is the value calculated by the formula.
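The idea behind the unbiased gradient estimation can be illustrated with squared loss, where the gradient at a sample is simply prediction minus label. In the toy sketch below, a running mean fitted only on the preceding samples plays the role of the per-sample method M_k; this is a stand-in of our own, not the actual tree learner.

```python
def ordered_gradients(labels):
    # For squared loss L(y, s) = (s - y)^2 / 2, the gradient at sample k
    # is F(x_k) - y_k.  A running mean fitted ONLY on the samples that
    # precede k acts as the per-sample method M_k, so no sample's own
    # label leaks into its gradient estimate.
    grads, total = [], 0.0
    for k, y in enumerate(labels):
        pred = total / k if k else 0.0  # "model" trained on samples 0..k-1
        grads.append(pred - y)
        total += y
    return grads
```

Because each prediction excludes the sample's own label, the resulting gradient estimates avoid the prediction shift that Algorithm 1 targets.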
The CBSWF algorithm obtains a strong learner through the serial iteration of a group of classifiers, so as to classify with higher precision. It uses the forward stagewise algorithm, and the weak learners are classification and regression trees (CART).
Let t be the number of iterations, let F_{t-1}(x) be the strong learner obtained in the previous iteration, and let L(y, F_{t-1}(x)) be the loss function. The purpose of this iteration is to find the weak learner h_t, a regression tree, that minimizes the loss of this round. Formula (9) gives the h_t of the current iteration:

h_t = arg min_{h∈H} E[L(y, F_{t-1}(x) + h(x))]    (9)
The negative gradient of the loss function is used to fit an approximation of the loss in each round; g_t(x, y) in formula (10) denotes this gradient:

g_t(x, y) = ∂L(y, s)/∂s |_{s=F_{t-1}(x)}    (10)
Formula (11) is usually used to approximate h_t:

h_t = arg min_{h∈H} E[(−g_t(x, y) − h(x))^2]    (11)
Finally, as shown in formula (12), the expectation in formula (11) is approximated by its average over the training set:

h_t = arg min_{h∈H} (1/n) Σ_{k=1}^{n} (−g_t(X_k, y_k) − h(X_k))^2    (12)

The conditional distribution g_t(X_k, y_k)|X_k computed from the random training sample {X_k} is shifted from the distribution g_t(X, y)|X of the test set, so the h_t obtained from formula (12) deviates from the h_t defined by formulas (11) and (9), which finally affects the generalization ability of the method F_t.
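A minimal sketch of the iteration in formulas (9)-(12), using a constant (depth-0) weak learner under squared loss; the function name and learning rate are illustrative only, not part of the paper's method.

```python
def boost(labels, rounds=50, lr=0.5):
    # Toy gradient boosting under squared loss: each round fits the weak
    # learner to the negative gradients (here a single constant, i.e. a
    # depth-0 "tree") and adds it to the strong learner F_t.
    f = [0.0] * len(labels)
    for _ in range(rounds):
        neg_grad = [y - p for y, p in zip(labels, f)]  # -g_t(x, y)
        h = sum(neg_grad) / len(neg_grad)              # least-squares constant fit
        f = [p + lr * h for p in f]                    # F_t = F_{t-1} + lr * h_t
    return f
```

With a constant learner the predictions converge to the label mean, which makes the fit-to-negative-gradient step of formula (12) easy to verify by hand.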

Construction of Composition Method
By using CatBoost fused with the MS-SIS method to construct a combination method [27], the CBSWF method can effectively integrate information from multiple methods, reduce the prediction risk of a single method, and improve the overall prediction accuracy of the algorithm. Therefore, the CBSWF method combines the MS-SIS feature selection algorithm with CatBoost as the underlying algorithm to construct a combination forecasting method. The corresponding training and forecasting process is shown in Fig. 1.
1) MS-SIS-based feature selection is performed on the original feature set.
2) According to the MS-SIS ordering, the data set is divided into n training subsets {Â_1, ..., Â_n}.
3) The n training subsets are trained by CatBoost to generate n sub-methods {X_1, ..., X_n}. The prediction error e_i of each sub-method is evaluated on the test set using the mean square error, calculated as follows:

e_i = (1/N) Σ_{d=1}^{N} (z_d − ẑ_d)^2

where z_d is the measured power value of sample d, ẑ_d is the corresponding predicted power value, and N is the number of samples.
4) Data preprocessing, including feature selection, is carried out on the prediction set, and the processed prediction set is divided into n prediction subsets according to the wind speed at different heights.
5) The sub-methods {X_1, ..., X_n} generated by the training process correspond to the prediction errors {e_1, ..., e_n}; the test samples are then evaluated to obtain the final prediction results.
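Steps 1)-5) end by merging the n sub-method outputs using their errors. The sketch below weights each sub-method inversely by its mean-square error e_i; this weighting rule is our assumption, since the text only states that the errors {e_1, ..., e_n} enter the final calculation.

```python
def combine(predictions, errors):
    # Merge n sub-method predictions into one forecast.  Each sub-method's
    # weight is inversely proportional to its mean-square error e_i
    # (assumed rule), so more accurate sub-methods contribute more.
    weights = [1.0 / e for e in errors]
    total = sum(weights)
    weights = [w / total for w in weights]
    n_points = len(predictions[0])
    return [sum(w * p[k] for w, p in zip(weights, predictions))
            for k in range(n_points)]
```

For instance, a sub-method with one third the error of another receives three times its weight in the combined forecast.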

Test and Analysis
Tests are performed on the Windows 10 operating system with Python 3.7 and TensorFlow 2.0 (CPU: Intel Core i7-12800HX; graphics card: RTX 3070 Ti).
The wind farm A sample data selected in this paper are the measured public data set of a wind turbine group in the United States in 2012, provided by the National Renewable Energy Laboratory (NREL). The installed capacity of the cluster is 18 MW, and the data include six different time series: wind direction, wind speed, air temperature, air pressure, atmospheric density, and wind power. The sampling interval is 5 min with no missing values, for a total of 105 120 time-section records. To fully demonstrate the performance of the proposed method, related experiments are also carried out on the measured data and historical numerical weather prediction data of a wind farm B in Northeast China. The installed capacity of that farm is 80 MW, the hub height is 70 m, and the sampling interval is 15 min, for a total of 70 000 time sections. Bilinear interpolation and Kriging interpolation are used to fill missing values for the different meteorological elements.
Considering that the sampling intervals of the original data are 5 and 15 min, and to maximize the extraction of hourly, daily, and monthly features, the number of time steps is set to 12; the network uses mini-batch input with 72 samples per batch. The method is trained for 50 epochs, early stopping is applied to reduce over-fitting and training time, and the mean square error (MSE) is used as the loss function during training. Wind farm A uses the first 80 000 records as the training set and the remaining 25 120 records as the test set; wind farm B uses the first 55 000 records as the training set and the remaining 15 000 records as the test set. The grid search optimization provided by Keras is used to ensure the accuracy of parameter settings. The selected parameter structure is shown in Table 2.
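The early-stopping measure mentioned above can be sketched as a simple patience rule over the validation-loss history; the patience value and function name here are illustrative, as the paper does not state them.

```python
def best_epoch_with_early_stop(val_losses, patience=3):
    # Stop when the validation loss fails to improve for `patience`
    # consecutive epochs; return the index of the best epoch seen.
    best, best_epoch, wait = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_epoch
```

This cuts training short well before the full 50 epochs whenever the validation loss plateaus, which is how the over-fitting and training-time reduction is achieved.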
Root mean square error (RMSE) [28] and mean absolute error (MAE) [29] are used to evaluate the prediction effect. RMSE mainly reflects the ability of the method to control the absolute error, while MAE mainly reflects the actual magnitude of the prediction error. The specific calculations are as follows:

RMSE = √( (1/n) Σ_{i=1}^{n} (μ(i) − μ̂(i))^2 )

MAE = (1/n) Σ_{i=1}^{n} |μ(i) − μ̂(i)|

where μ(i) and μ̂(i) are the actual and predicted values of wind power, n is the number of prediction and verification data points, and i is the sequence number of the prediction point. In addition, all assessment indicators in this paper are normalized by installed capacity. The prediction ability of the CBSWF method is compared with two prediction algorithms: the MLSTM method [16] and the CEEMD-FA-ESN method [17].
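The two indicators, normalized by installed capacity as stated, can be computed as in this short sketch (the function name is ours):

```python
import math

def rmse_mae(actual, predicted, capacity):
    # RMSE and MAE per the definitions above, both divided by the
    # installed capacity as stated for all reported indicators.
    n = len(actual)
    rmse = math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)
    mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n
    return rmse / capacity, mae / capacity
```

Because RMSE squares the residuals before averaging, it penalizes the sharp-fluctuation errors discussed below more heavily than MAE does.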
The parameters of each method are adjusted according to the actual prediction situation to achieve the corresponding optimal prediction effect for comparison. To better match the actual situation, predicted values that exceed the maximum generation power are recorded as the maximum generation power, and negative predicted power is recorded as 0. Figure 2 shows the time loss of the three prediction methods during iteration; the CBSWF method has the highest time efficiency.
The prediction results of the three methods for wind farm A and wind farm B are compared in Fig. 3. All three methods predict the trend of wind power well, but the accuracy of the CBSWF method is obviously improved, and its degree of fit is better than that of the other methods.
From the comparison of the prediction results in Fig. 3(b) and 3(d), combined with the error comparison of the methods given in Table 3, it is clear that the prediction accuracy of the CBSWF method is higher than that of the MLSTM and CEEMD-FA-ESN methods: its average RMSE is 40.97% and 27.24% lower than those of the MLSTM and CEEMD-FA-ESN methods, respectively, and the corresponding average MAE decreases by 44.42% and 35.2%, respectively. For sharp rises and falls in the original signal, the fitting effect of the proposed method remains relatively ideal, whereas the errors of the MLSTM and CEEMD-FA-ESN methods are relatively large in the face of these sharp fluctuations. This shows that the MS-SIS method can effectively extract the sequence details of meteorological data and achieve higher prediction accuracy, and further demonstrates that the CBSWF method has obvious advantages in wind power prediction.

Summary
In view of the weak correlation of meteorological data and the limited accuracy of current short-term wind power forecasting, this paper proposes CBSWF, a forecasting method that combines a feature selection method for meteorological data with CatBoost. The meteorological data are processed based on MS-SIS, the meteorological features with great influence on wind power are obtained by the feature selection algorithm, and the coupling between variables is eliminated. To balance the accuracy and efficiency of the algorithm and reduce the prediction risk of a single method, CatBoost is used as the underlying algorithm to construct the combination forecasting method CBSWF. The prediction accuracy is improved by the sorting algorithm, and the gradient-deviation algorithm is designed to increase the stability of the method, reduce the training time, and reduce the amount of calculation. The experimental results show that the CBSWF method performs well in predicting both the overall trend of wind power and its local details. In future work, further improving the prediction accuracy by downscaling the data in space and time through numerical weather prediction will be considered.