Wuhan Univ. J. Nat. Sci., Volume 28, Number 2, April 2023, Pages 169-176
DOI: https://doi.org/10.1051/wujns/2023282169
Published online: 23 May 2023
Computer Science
CLC number: TP 305
Short-Term Wind Power Prediction Method Based on Combination of Meteorological Features and CatBoost
Jilin Province Meteorological Information Network Center, Changchun 130062, Jilin, China
^{†} To whom correspondence should be addressed. Email: 1647003470@qq.com
Received: 23 August 2022
As one of the hot topics in the field of new energy, short-term wind power prediction research should pay attention to the impact of meteorological characteristics on wind power while improving prediction accuracy. Therefore, a short-term wind power prediction method based on the combination of meteorological features and CatBoost is presented. Firstly, a Morgan-Stone algebras and sure independence screening (MSSIS) method is designed to filter the meteorological features, and the influence of the meteorological features on the wind power is explored. Then, a sort enhancement algorithm is designed to increase the accuracy and calculation efficiency of the method and reduce the prediction risk of a single element. Finally, a prediction method based on the CatBoost network is constructed to further realize short-term wind power prediction. The National Renewable Energy Laboratory (NREL) dataset is used for experimental analysis. The results show that the short-term wind power prediction method based on the combination of meteorological features and CatBoost not only improves the prediction accuracy of short-term wind power, but also has higher calculation efficiency.
Key words: meteorological features / short-term power load forecasting / CatBoost / wind power
Biography: MOU Xingyu, male, Master, Assistant engineer, research direction: wind power. Email: 974805007@qq.com
Foundation item: Supported by the National Science and Technology Basic Work Project of China Meteorological Administration (2005DKA3170006), Innovation Fund of Public Meteorological Service Center of China Meteorological Administration (M2020013)
© Wuhan University 2023
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
0 Introduction
Under the global consensus on the energy shortage and the goal of carbon neutrality, the development and utilization of new energy has become an important way for many countries to address the energy shortage^{[1]}. Wind energy, as a large-scale commercial green energy, has become an important part of the transformation of the energy structure. When wind energy is used as a supplement to alternative and basic energy, the randomness of wind direction and wind speed causes large-scale uncertain fluctuations of wind power, which seriously restricts the large-scale development of wind power^{[2]}. To ensure the safe and stable operation of the power grid and improve the wind power consumption capacity, there is an urgent need for accurate wind power forecasting methods.
Around accurate short-term wind power forecasting, researchers have carried out extensive and in-depth research^{[3,4]}. The existing forecasting methods can be divided into two categories: statistical methods^{[5-7]} and deep learning methods^{[8-10]}. Statistical methods usually use historical time-series data to predict wind power values; the commonly used ones are the regression method^{[5]}, the time series method^{[6,7]}, and so on. Statistical methods handle the nonlinear relationships in the data poorly when the load sample is large, so their prediction accuracy is limited. Deep learning methods train deep neural networks on large amounts of data so that the method can learn the relationships in the data and predict the wind power. The methods commonly used in load forecasting include neural networks^{[8]}, random forests^{[9]}, extreme learning machines^{[10]}, and so on.
With the rapid development of deep learning in recent years, researchers have carried out extensive and in-depth research on short-term wind power prediction using this approach and have made great progress. Cherkassky et al^{[11]} established a support vector machine (SVM) based method with enhanced parameter optimization, which further improved the prediction accuracy. However, gradient explosion and vanishing in traditional neural networks are important factors that currently limit their development^{[12]}. To solve this problem, Shahid et al^{[13]} proposed using a long short-term memory (LSTM) method to model wind power, which effectively solves the gradient problem of traditional neural networks. However, the prediction process requires many weight parameters, training is difficult, and the prediction waiting time is long. To address this pain point, the extreme gradient boosting algorithm^{[14]} has aroused much interest among researchers because of its better accuracy than traditional methods and lower time cost in model training through parallel operation. At present, this method has been applied in photovoltaic power generation prediction^{[15]} and wind power prediction^{[16]}. For the multidimensional model, Zhang et al^{[17]} determined that wind power data have chaotic characteristics and established a corresponding chaotic model.
Ding et al^{[18]} decomposed the wind power series with improved complementary ensemble empirical mode decomposition, and then predicted the wind power data using an echo state network. The prediction method in Ref.[18] only considers the wind turbine power, but ignores other meteorological characteristics. In reality, different wind farms suit different meteorological feature inputs^{[19]}, so an algorithm is required to select appropriate meteorological features as the input variables of the model. The CatBoost algorithm^{[20]} reduces the need for extensive hyperparameter tuning and the chance of overfitting, making the model more versatile^{[21]}. Furthermore, CatBoost can effectively avoid the impact of uneven data distribution on the model, with good effect, high precision, and strong generalization ability^{[22]}.
In this paper, a short-term wind power forecasting method, CatBoost short-term wind energy forecast (CBSWF), based on the combination of meteorological characteristics and CatBoost is proposed. First, the meteorological characteristics affecting wind power are defined; then the categorical features are counted, their frequencies are calculated to generate new exponential features, and the feature dimension is increased by combining categories; then the feature algorithm is used for parameter optimization; finally, the CBSWF method is tested by comparison with other methods. The experimental results show that this method can effectively reduce the training time and improve the prediction accuracy of wind power.
1 Meteorological Features
In short-term wind power prediction, wind speed is the main factor that determines the output power of a wind farm. Zhu et al^{[23]} found that the wind speed at different heights has a certain impact on the atmospheric characteristics around the wind farm. Ma et al^{[24]} further pointed out that the wake effect between wind turbines is also one of the important factors affecting wind power, and that the wind direction has a great influence on wind power, so it is necessary to select and model the meteorological characteristics. The meteorological characteristics commonly used in short-term wind power forecasting are shown in Table 1.
When dealing with the categorical features among the meteorological features, the CBSWF method replaces each categorical feature with the average value of the labels corresponding to that category. In the decision tree, this label average is used as the criterion for node splitting. This method is called Greedy Target-based Statistics^{[25]}, or Greedy TS for short, as shown in formula (1):
${\widehat{x}}_{k}^{i}=\frac{{\displaystyle \sum _{j=1}^{n}}[{x}_{j,k}={x}_{i,k}]\,{Y}_{j}}{{\displaystyle \sum _{j=1}^{n}}[{x}_{j,k}={x}_{i,k}]}$(1)
where x represents the numerical variable, i the category, Y the target variable, j the sample index, and k the feature index. The label averaging method accommodates more information, but if the label average is forced to represent the feature, a conditional shift problem occurs when the data structure or the distributions of the training set and the test set differ. The CBSWF method improves Greedy TS by adding a prior distribution term, which reduces the impact of noise and low-frequency categorical data on the data distribution:
${\widehat{x}}_{k}^{i}=\frac{{\displaystyle \sum _{j=1}^{l-1}}[{x}_{{\sigma}_{j},k}={x}_{{\sigma}_{l},k}]\,{Y}_{{\sigma}_{j}}+a\,l}{{\displaystyle \sum _{j=1}^{l-1}}[{x}_{{\sigma}_{j},k}={x}_{{\sigma}_{l},k}]+a}$(2)
where l is the added prior term, $\sigma$ represents the random permutation order, and $a$ is a weight coefficient, usually greater than 0. For features with a small number of categories, this reduces the noise in the data. For regression problems, the prior term can generally take the mean value of the dataset labels; for binary classification, it is the prior probability of a positive example. Using multiple random permutations is also valid, but computing them directly may lead to overfitting. CBSWF avoids this overfitting through its use of permutations and symmetric trees.
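As a concrete illustration, the ordered target-statistic encoding of formula (2) can be sketched as follows. This is a minimal sketch: the helper name, the use of a single permutation, and the toy data are our assumptions, not CatBoost's actual implementation.

```python
import numpy as np

def ordered_target_stats(cats, y, prior, a=1.0, seed=0):
    """Encode a categorical column with ordered target statistics.

    For each sample l in a random permutation sigma, only the samples that
    precede it contribute to its encoding, which avoids target leakage.
    `prior` is the prior term and `a` its weight (cf. formula (2))."""
    rng = np.random.default_rng(seed)
    sigma = rng.permutation(len(cats))
    sums = {}    # running sum of labels per category
    counts = {}  # running count per category
    encoded = np.empty(len(cats))
    for idx in sigma:
        c = cats[idx]
        s = sums.get(c, 0.0)
        n = counts.get(c, 0)
        encoded[idx] = (s + a * prior) / (n + a)
        # only now does sample idx start contributing to later samples
        sums[c] = s + y[idx]
        counts[c] = n + 1
    return encoded
```

The first sample of each category in the permutation receives exactly the prior value, since no preceding samples of that category exist yet.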
Table 1 Common meteorological features for short-term wind power prediction
2 Feature Screening Method
Feature screening is a fast and effective dimensionality-reduction method for high-dimensional data. In this paper, a feature screening method MSSIS based on sure independence screening (SIS)^{[26]} is designed, which can directly deal with the high-dimensional feature screening problem of $\mathrm{log}\,p=O({n}^{\alpha})$ without special handling of multi-class discrete variables and covariables. There are $d\in A\ (d=1,\dots ,n)$ covariables in the CBSWF method, which are used to construct a feature screening index ${\omega}_{d}$ that measures the strength of the correlation between the covariable ${B}_{d}$ and the response variable C.
${\omega}_{d}=\underset{1\le r\le {R}_{b}}{\mathrm{max}}\ \underset{b}{\mathrm{sup}}\left|P({B}_{d}\le b\mid C={c}_{r})-P({B}_{d}\le b)\right|$(3)
where $P(\cdot )$ denotes probability, b is a value of the discrete random covariable ${B}_{d}$, and ${c}_{r}$ is the r-th class of the response variable C, of which there are ${R}_{b}$ in total.
Based on the feature index ${\omega}_{d}$, let $i=1,\dots ,n$, let A be the set of important variables and I the set of unimportant variables, and let the random sample $({B}_{i},{C}_{i})$ be taken from $\{B,C\}$; then
$\widehat{P}({B}_{d}\le b\mid C={c}_{r})=\frac{{\displaystyle \sum _{i=1}^{n}}I\{{B}_{id}\le b,{C}_{i}={c}_{r}\}}{{\displaystyle \sum _{i=1}^{n}}I\{{C}_{i}={c}_{r}\}}$(4)
$\widehat{P}({B}_{d}\le b)=\frac{\mathrm{1}}{n}{\displaystyle \sum _{i=\mathrm{1}}^{n}}I\{{B}_{id}\le b\}$(5)
The empirical screening statistic is then
${\widehat{\omega}}_{d}=\underset{1\le r\le {R}_{b}}{\mathrm{max}}\ \underset{b}{\mathrm{sup}}\left|\widehat{P}({B}_{d}\le b\mid C={c}_{r})-\widehat{P}({B}_{d}\le b)\right|$(6)
There exist nonnegative numbers $0\le \tau \le \frac{1}{2}$ and $0\le \xi \le \frac{1}{2}-\tau$ such that, for some $c>0$ and ${R}_{b}=O({n}^{\xi})$, $\underset{d\in A}{\mathrm{min}}\,{\omega}_{d}\ge 2c{n}^{-\tau}$. The screened subset $\widehat{A}$ is obtained by thresholding ${\widehat{\omega}}_{d}$:
$\widehat{A}=\{d:{\widehat{\omega}}_{d}\ge c{n}^{-\tau},1\le d\le p\}$(7)
There is a positive number $\lambda$ such that
$\delta =\underset{d\in A}{\mathrm{min}}\,{\omega}_{d}-\underset{d\in I}{\mathrm{max}}\,{\omega}_{d}>0$
which separates the variables in $\widehat{A}$ by importance:
$P\left(\underset{d\in A}{\mathrm{min}}\,{\widehat{\omega}}_{d}\ge \underset{d\in I}{\mathrm{max}}\,{\widehat{\omega}}_{d}\right)\ge 1-O(p\,\mathrm{exp}\{-\lambda {\delta}^{2}{n}^{1-2\xi}+(1+\xi )\mathrm{log}\,n\})$(8)
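The screening index of formulas (4)-(7) amounts to a Kolmogorov-Smirnov-style comparison between the conditional and marginal empirical distribution functions of each covariable. A minimal sketch (the function names, toy data, and threshold value are ours):

```python
import numpy as np

def screening_stat(Bd, C):
    """Empirical screening statistic: the largest gap between the
    conditional CDF of covariable Bd within each response class and its
    marginal CDF, evaluated at the observed values (cf. formula (6))."""
    grid = np.sort(np.unique(Bd))
    marginal = np.array([np.mean(Bd <= b) for b in grid])
    omega = 0.0
    for cr in np.unique(C):
        mask = C == cr
        cond = np.array([np.mean(Bd[mask] <= b) for b in grid])
        omega = max(omega, float(np.max(np.abs(cond - marginal))))
    return omega

def screen_features(B, C, threshold):
    """Keep the indices of features whose statistic clears the threshold,
    i.e. the screened subset of formula (7)."""
    return [d for d in range(B.shape[1]) if screening_stat(B[:, d], C) >= threshold]
```

A feature whose distribution shifts strongly across response classes yields a large statistic and survives the screening; an irrelevant feature's conditional CDFs stay close to the marginal.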
3 Gradient Deviation and Iteration
CBSWF uses Algorithm 1 to overcome the prediction shift problem, as shown below.
Algorithm 1: Ordered boosting

input: $\{[{K}_{i},{Y}_{i}]\}_{i=1}^{n}$, the number of iterations $A$;
$\sigma \leftarrow$ random permutation of $[1,n]$;
${M}_{i}\leftarrow 0$ for $i=1,\dots ,n$;
for $t\leftarrow 1$ to $A$ do
  for $i\leftarrow 1$ to $n$ do
    ${r}_{i}\leftarrow {y}_{i}-{M}_{\sigma (i)-1}({k}_{i})$;
  for $i\leftarrow 1$ to $n$ do
    $\mathrm{\Delta}M\leftarrow \mathrm{LearnModel}(({k}_{j},{r}_{j}):\sigma (j)\le i)$;
    ${M}_{i}\leftarrow {M}_{i}+\mathrm{\Delta}M$;
return ${M}_{n}$
In Algorithm 1, in order to obtain an unbiased gradient estimate, CBSWF trains a separate model ${M}_{i}$ for each sample ${k}_{i}$, where ${M}_{i}$ is trained on a training set that does not contain ${k}_{i}$.
The gradient estimation of the sample is obtained by ${M}_{i}$, and the gradient is used to train the base learner and get the final method.
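The core idea, that each sample's residual is estimated by a model that never saw that sample's label, can be illustrated with a toy one-step version in which a running mean stands in for the learned model ${M}_{i}$. This is only a sketch of the unbiasedness idea, not Algorithm 1 itself; the function name and data are ours.

```python
import numpy as np

def ordered_residuals(y, seed=0):
    """One ordered-boosting-style pass: each sample's baseline prediction
    is the mean label of the samples that PRECEDE it in a random
    permutation, so its residual is computed without using its own label."""
    rng = np.random.default_rng(seed)
    sigma = rng.permutation(len(y))
    pred = np.zeros(len(y))
    running_sum, count = 0.0, 0
    for i in sigma:
        pred[i] = running_sum / count if count else 0.0
        running_sum += y[i]   # sample i contributes only to later samples
        count += 1
    return y - pred
```

With a constant label vector, every sample except the first in the permutation is predicted exactly, so only one residual is nonzero, regardless of the permutation drawn.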
Let ${F}_{i}$ be the model after building $i$ trees, and ${g}^{i}({X}_{k},{Y}_{k})$ the gradient value of training sample $k$ after building $i$ trees. The model ${F}_{i}$ is trained so that ${g}^{i}({X}_{k},{Y}_{k})$ is unbiased with respect to ${F}_{i}$. For each ${X}_{k}$, a separate model ${M}_{k}$ is trained, and ${M}_{k}$ is never updated with a gradient estimate based on that sample. ${M}_{k}$ is used to estimate the gradient of ${X}_{k}$, and the estimate is used to score the resulting tree, as shown in Algorithm 2, where $\mathrm{Loss}({u}_{i},v)$ is the loss function to be optimized, $u$ is the label value, and $v$ is the value calculated by the formula.
Algorithm 2: Updating the models and calculating model values for gradient estimation

input: $\{[{X}_{i},{Y}_{i}]\}_{i=1}^{n}$ ordered according to $\sigma$; the number of trees $I$;
${M}_{i}\leftarrow 0$ for $i=1,\dots ,n$;
for iter $\leftarrow 1$ to $I$ do
  for $i\leftarrow 1$ to $n$ do
    for $j\leftarrow 1$ to $i-1$ do
      ${g}_{j}\leftarrow \frac{\mathrm{d}}{\mathrm{d}v}\mathrm{Loss}({u}_{j},v){|}_{v={M}_{i}({X}_{j})}$;
    $M\leftarrow \mathrm{LearnTree}(({X}_{j},{g}_{j})\ \mathrm{for}\ j=1,\dots ,i-1)$;
    ${M}_{i}\leftarrow {M}_{i}+M$;
return ${M}_{1},\dots ,{M}_{n}$; ${M}_{1}({X}_{1}),\dots ,{M}_{n}({X}_{n})$
The CBSWF algorithm obtains a strong learner through the serial iteration of a group of classifiers, so as to classify with higher precision. It uses the forward stagewise algorithm, and the weak learner is the classification and regression tree (CART).
Let $t$ be the number of iterations, let ${F}^{t-1}(x)$ be the strong learner obtained in the previous iteration, and let $L(y,{F}^{t-1}(x))$ be the loss function. The purpose of this iteration is to find the regression-tree weak learner ${h}^{t}$ that minimizes the loss function of this round. Formula (9) gives the ${h}^{t}$ of the current iteration.
${h}^{t}=\underset{h\in H}{\mathrm{argmin}}\,L(y,{F}^{t-1}(x)+h(x))$(9)
The negative gradient of the loss function is used to fit the approximate loss of each round; ${g}^{t}(x,y)$ in formula (10) denotes this negative gradient.
${g}^{t}(x,y)=-\frac{\partial L(y,s)}{\partial s}{\Big |}_{s={F}^{t-1}(x)}$(10)
Formula (11) is usually used to approximate ${h}^{t}$:
${h}^{t}=\underset{h\in H}{\mathrm{argmin}}\,E({g}^{t}(x,y)-h(x){)}^{2}$(11)
Finally, as shown in formula (12), we get
${F}^{t}(x)={F}^{t-1}(x)+{h}^{t}(x)$(12)
The conditional distribution ${g}^{t}({X}_{k},{y}_{k})\mid {X}_{k}$ computed on the randomly drawn training samples $\{{X}_{k}\}$ is shifted relative to the distribution ${g}^{t}(X,y)\mid X$ on the test set, so the ${h}^{t}$ defined by formula (11) deviates from that of formula (9), which ultimately affects the generalization ability of the model ${F}^{t}$.
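Formulas (9)-(12) describe one standard gradient-boosting iteration. For squared loss the negative gradient is simply the residual, so each round fits a weak learner to the current residuals. A minimal sketch with a depth-1 regression tree (stump) as the weak learner; the function names, learning rate, and toy data are ours:

```python
import numpy as np

def fit_stump(x, target):
    """Least-squares depth-1 regression tree: the weak learner h."""
    best = (np.inf, None, 0.0, 0.0)
    for thr in np.unique(x)[:-1]:
        left, right = target[x <= thr], target[x > thr]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    _, thr, lv, rv = best
    return lambda q: np.where(q <= thr, lv, rv)

def boost(x, y, rounds=20, lr=0.3):
    """Gradient boosting for squared loss: the negative gradient is the
    residual y - F(x) (formula (10)); each round fits h^t to it
    (formula (11)) and updates F^t = F^{t-1} + lr * h^t (formula (12))."""
    F = np.zeros_like(y, dtype=float)
    for _ in range(rounds):
        g = y - F                  # negative gradient of 0.5 * (y - F)^2
        h = fit_stump(x, g)        # weak learner fitted to the gradient
        F += lr * h(x)             # forward stagewise update
    return F
```

On a step-shaped target the stump finds the step exactly, so the residual shrinks by a factor of (1 - lr) each round and the fit converges geometrically.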
4 Construction of Composition Method
By using CatBoost fused with the MSSIS method to construct a combination method^{[27]}, the CBSWF method can effectively integrate the information from multiple methods, reduce the prediction risk of a single method, and improve the overall prediction accuracy of the algorithm. Therefore, the CBSWF method combines the MSSIS feature selection algorithm with CatBoost as the underlying algorithm to construct a combination forecasting method. The corresponding training and forecasting process is shown in Fig. 1.
Fig. 1 Forecast training process 
1) MSSIS based feature selection operation is performed on the original feature set.
2) According to the order of MSSIS, the data set is divided into n training subsets $\{{\widehat{A}}_{\mathrm{1}},...,{\widehat{A}}_{n}\}$.
3) The $n$ training subsets are trained by CatBoost to generate n sub-models $\{{X}_{1},...,{X}_{n}\}$. The prediction error ${e}_{i}$ of each sub-model is evaluated on the test set with the mean square error, calculated as follows:
${e}_{i}=\frac{1}{N}{\displaystyle \sum _{d=1}^{N}}({z}_{d}-{\widehat{z}}_{d}{)}^{2}$(13)
where ${z}_{d}$ is the measured power value of ${X}_{d}$, ${\widehat{z}}_{d}$ is the predicted power value of ${X}_{d}$, and N is the number of samples.
4) Data preprocessing including feature selection is carried out on the prediction set, and the processed prediction set is divided into n prediction subsets according to the wind speed at different heights.
5) The sub-models $\{{X}_{1},...,{X}_{n}\}$ generated by the training process correspond to the prediction errors $\{{e}_{1},...,{e}_{n}\}$; the test samples are then calculated accordingly to obtain the final prediction results.
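Step 5) does not spell out how the errors $\{{e}_{1},...,{e}_{n}\}$ weight the sub-models; one plausible reading is an inverse-error weighted average of the sub-model predictions, sketched below. The weighting rule and function name are our assumptions, not the paper's stated formula.

```python
import numpy as np

def combine_predictions(preds, errors):
    """Combine sub-model predictions: sub-models with a smaller test MSE
    e_i receive a larger weight (inverse-error weighting, normalized to 1).

    preds:  array of shape (n_models, n_samples)
    errors: the per-model MSEs e_i from formula (13)."""
    w = 1.0 / np.asarray(errors, dtype=float)
    w /= w.sum()
    return w @ np.asarray(preds, dtype=float)
```

With equal errors this reduces to a plain average; a sub-model with a larger error contributes proportionally less to the final forecast.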
5 Test and Analysis
The tests are performed on the Windows 10 operating system with Python 3.7 and TensorFlow 2.0, an i7-12800HX CPU, and an RTX 3070 Ti graphics card.
The sample data of wind farm A selected in this paper is the measured public dataset of a wind turbine group in the United States in 2012 provided by the National Renewable Energy Laboratory (NREL). The installed capacity of the cluster is 18 MW, and the data comprise six different time series, namely wind direction, wind speed, air temperature, air pressure, atmospheric density, and wind power. The sampling interval is 5 min, without missing values; a total of 105 120 time-section records are obtained. To fully demonstrate the performance of the proposed method, related experiments are also carried out on the measured data and historical numerical weather forecast data of a wind farm B in Northeast China. The installed capacity of the farm is 80 MW, the hub height is 70 m, and the sampling interval is 15 min; the data have a total of 70 000 time sections. Bilinear interpolation and Kriging interpolation are used to fill in the missing values of the different meteorological elements.
Considering that the sampling intervals of the original data are 5 and 15 min, in order to extract hourly, daily, and monthly features as fully as possible, the number of time steps is 12, the network uses mini-batch input, and the number of samples per input is 72. The method uses 50 epochs of iterative training, early stopping is applied to reduce overfitting and training time, and the mean square error (MSE) is used as the loss function during training. Wind farm A uses the first 80 000 records as the training set and the remaining 25 120 records as the test set; wind farm B uses the first 55 000 records as the training set and the remaining 15 000 records as the test set. The grid search optimization provided by Keras is used to ensure the accuracy of the parameter settings. The selected parameters are shown in Table 2.
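The 12-step input windows and the chronological train/test split described above can be sketched as follows. The window sizes and split points come from the text; the synthetic stand-in series and function name are ours (the real experiments use the NREL data).

```python
import numpy as np

def make_windows(series, n_steps=12):
    """Build supervised samples: each input is the previous n_steps values
    of the series, and the target is the value that follows the window."""
    X = np.array([series[i:i + n_steps] for i in range(len(series) - n_steps)])
    y = series[n_steps:]
    return X, y

# Wind farm A: 105 120 records; first 80 000 windows for training.
series = np.sin(np.linspace(0.0, 300.0, 105_120))  # stand-in for the NREL series
X, y = make_windows(series)
X_train, y_train = X[:80_000], y[:80_000]
X_test, y_test = X[80_000:], y[80_000:]
```

Splitting chronologically (rather than shuffling) preserves the time-series structure, so the test set genuinely lies in the model's future.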
Root mean square error (RMSE)^{[28]} and mean absolute error (MAE)^{[29]} are used to evaluate the prediction effect. RMSE mainly reflects how well the method controls the absolute error, while MAE mainly reflects the actual magnitude of the prediction error. The specific calculations are as follows:
${\varsigma}_{\mathrm{RMSE}}=\sqrt{\frac{1}{n}{\displaystyle \sum _{i=1}^{n}}[\mu (i)-\widehat{\mu}(i){]}^{2}}$(14)
${\varsigma}_{\mathrm{MAE}}=\frac{1}{n}{\displaystyle \sum _{i=1}^{n}}|\mu (i)-\widehat{\mu}(i)|$(15)
where $\mu (i)$ and $\widehat{\mu}(i)$ are the actual and predicted values of wind power, n is the number of prediction and verification data, and i is the sequence number of prediction points. In addition, all the assessment indicators provided in this paper are normalized by installed capacity.
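Formulas (14) and (15) translate directly into code. The normalization by installed capacity mentioned above would be an additional division of both metrics by the capacity and is omitted here; the function names are ours.

```python
import numpy as np

def rmse(actual, pred):
    """Root mean square error, formula (14)."""
    return float(np.sqrt(np.mean((actual - pred) ** 2)))

def mae(actual, pred):
    """Mean absolute error, formula (15)."""
    return float(np.mean(np.abs(actual - pred)))
```

RMSE penalizes large individual errors more heavily than MAE because the deviations are squared before averaging, which is why the paper uses both.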
The prediction ability of the CBSWF method is compared with two groups of prediction algorithms, the MLSTM^{[16]} method and the CEEMD-FA-ESN^{[17]} method. The parameters of each method are tuned according to the actual prediction situation to achieve its optimal prediction effect for comparison. To better match the actual situation, any value in the prediction results that exceeds the maximum generation power is recorded as the maximum generation power, and predicted negative power is recorded as 0. Figure 2 shows the time loss of the three prediction methods in the iterative process, which shows that the CBSWF method has the highest time efficiency.
Fig. 2 Iterations of the experimental model 
The comparison of the prediction results of the three methods for wind farm A and wind farm B is shown in Fig. 3. All three methods capture the trend of wind power for both farms, but the accuracy of the CBSWF method is obviously better and its fit is closer than that of the other methods.
Fig. 3 Comparative analysis of test data for wind farm A ((a), (b)) and wind farm B ((c), (d)) 
From the comparison of the prediction results in Fig. 3(b) and 3(d), combined with the error comparison of the methods given in Table 3, it is obvious that the prediction accuracy of the CBSWF method is higher than that of the MLSTM and CEEMD-FA-ESN methods: its average RMSE is 40.97% and 27.24% lower, respectively, and its average MAE is 44.42% and 35.2% lower, respectively. For the sharp rises and falls in the original signal, the fitting effect of the method is still relatively ideal, while the errors of the MLSTM and CEEMD-FA-ESN methods are relatively large in the face of such sharp fluctuations. This shows that the MSSIS method can effectively extract the sequence details of the meteorological data and achieve higher prediction accuracy, and further demonstrates that the CBSWF method has obvious advantages in wind power prediction.
Table 2 Parameter selection of the method
Table 3 Prediction errors of the models
6 Summary
In view of the weak correlation of meteorological data and the limited prediction accuracy in current short-term wind power forecasting, a forecasting method, CBSWF, which combines a feature selection method for meteorological data with CatBoost, is proposed in this paper. The meteorological data are processed based on MSSIS, the meteorological features that strongly influence wind power are obtained by the feature selection algorithm, and the coupling relationship between variables is eliminated. To balance the accuracy and efficiency of the algorithm and reduce the prediction risk of a single method, CatBoost is used as the underlying algorithm to construct the combination forecasting method CBSWF. The prediction accuracy is improved by the ordering algorithm, and the gradient deviation algorithm is designed to increase the stability of the method, reduce the training time, and reduce the amount of calculation. The experimental results show that the CBSWF method performs well in predicting both the overall trend of wind power and its local details. In future work, we will consider further improving the prediction accuracy of wind power by downscaling meteorological numerical prediction data in space and time.
References
[1] Li H H, Tan Z F, Chen H T, et al. Integrated heat and power dispatch model for wind-CHP system with solid heat storage device based on robust stochastic theory [J]. Wuhan University Journal of Natural Sciences, 2018, 23(1): 31-42.
[2] Hu Y, Li Q, Fang F, et al. Dynamic interval modeling of ultra-short-term output of wind farm based on finite difference operating domains [J]. Power System Technology, 2022, 46(4): 1346-1357 (Ch).
[3] Tang X Z, Gu N W, Huang X Q, et al. Progress on short-term wind power forecasting technology [J]. Journal of Mechanical Engineering, 2022, 58(12): 213-236 (Ch).
[4] Feng S L, Wang W S, Liu C, et al. Study on the physical approach to wind power prediction [J]. Proceedings of the CSEE, 2010, 30(2): 1-6 (Ch).
[5] Lu P, Ye L, Pei M, et al. Coordinated control strategy for active power of wind power cluster based on model predictive control [J]. Proceedings of the CSEE, 2021, 41(17): 5887-5900 (Ch).
[6] Sun Y, Li Z Y, Yu X N, et al. Research on ultra-short-term wind power prediction considering source relevance [J]. IEEE Access, 2020, 8: 147703-147710.
[7] Lu P, Ye L, Zhong W Z, et al. A novel spatiotemporal wind power forecasting framework based on multi-output support vector machine and optimization strategy [J]. Journal of Cleaner Production, 2020, 254: 119993.
[8] Liu T H, Wei H K, Zhang K J. Wind power prediction with missing data using Gaussian process regression and multiple imputation [J]. Applied Soft Computing, 2018, 71: 905-916.
[9] Liu X, Zhou J, Qian H M. Short-term wind power forecasting by stacked recurrent neural networks with parametric sine activation function [J]. Electric Power Systems Research, 2021, 192: 107011.
[10] Zhao Z, Wang X S. Ultra-short-term multi-step wind power prediction based on CEEMD and improved time series model [J]. Acta Energiae Solaris Sinica, 2020, 41(7): 352-358 (Ch).
[11] Cherkassky V, Ma Y Q. Practical selection of SVM parameters and noise estimation for SVM regression [J]. Neural Networks, 2004, 17(1): 113-126.
[12] Zhou D W, Zhao L J, Duan R, et al. Image super-resolution based on recursive residual networks [J]. Acta Automatica Sinica, 2019, 45(6): 1157-1165 (Ch).
[13] Shahid F, Zameer A, Muneeb M. A novel genetic LSTM model for wind power forecast [J]. Energy, 2021, 223: 120069.
[14] Meng Y, Chen S L, Wu Z H, et al. A DC arc fault detection method based on CatBoost algorithm for different electrode materials [J]. Journal of Xi'an Jiaotong University, 2022, 56(3): 124-134 (Ch).
[15] Munawar U, Wang Z L. A framework of using machine learning approaches for short-term solar power forecasting [J]. Journal of Electrical Engineering & Technology, 2020, 15(2): 561-569.
[16] Liu K W, Pu T J, Zhou H M, et al. A short-term wind power forecasting model based on combination algorithms [J]. Proceedings of the CSEE, 2013, 33(34): 130-135 (Ch).
[17] Zhang Q, Tang Z H, Wang G, et al. Ultra-short-term wind power prediction model based on long and short term memory network [J]. Acta Energiae Solaris Sinica, 2021, 42(10): 275-281 (Ch).
[18] Ding J L, Chen G C, Yuan K. Short-term wind power prediction based on improved firefly algorithm [J]. Journal of System Simulation, 2019, 31(11): 2509-2516 (Ch).
[19] Qian Z, Pei Y, Cao L X, et al. Review of wind power forecasting method [J]. High Voltage Engineering, 2016, 42(4): 1047-1060 (Ch).
[20] Dong L M, Zeng W Z, Lei G Q. Coupling CatBoost model with bat algorithm to simulate the pan evaporation in northwest China [J]. Water Saving Irrigation, 2021(2): 63-69 (Ch).
[21] Miao F S, Li Y, Gao C, et al. Diabetes prediction method based on CatBoost algorithm [J]. Computer Systems & Applications, 2019, 28(9): 215-218 (Ch).
[22] Yao F Q, Sun J W, Dong J H. Estimating daily dew point temperature based on local and cross-station meteorological data using CatBoost algorithm [J]. Computer Modeling in Engineering & Sciences, 2022, 130(2): 671-700.
[23] Zhu R, Xu H, Gong Q, et al. Wind environmental regionalization for development and utilization of wind energy in China [J]. Acta Energiae Solaris Sinica, 2022, 8: 1-14 (Ch).
[24] Ma X P, He S E, Yao Y, et al. Virtual inertia estimation of wind farm zones with wind speed uncertainty and correlation [J]. Power System Protection and Control, 2022, 50(10): 123-131 (Ch).
[25] Kundu S M, Pal S K. Deprecation based greedy strategy for target set selection in large scale social networks [J]. Information Sciences, 2015, 316: 107-122.
[26] Fan J Q, Lv J C. Sure independence screening for ultrahigh dimensional feature space [J]. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 2008, 70(5): 849-911.
[27] Zhou Q, Ren H J, Li J, et al. Variable weight combination method for mid-long term power load forecasting based on hierarchical structure [J]. Proceedings of the CSEE, 2010, 30(16): 47-52 (Ch).
[28] Faber N M. Estimating the uncertainty in estimates of root mean square error of prediction: Application to determining the size of an adequate test set in multivariate calibration [J]. Chemometrics and Intelligent Laboratory Systems, 1999, 49(1): 79-89.
[29] Coyle E J, Lin J H. Stack filters and the mean absolute error criterion [J]. IEEE Transactions on Acoustics, Speech, and Signal Processing, 1988, 36(8): 1244-1254.