Study on the Predictive of Dynamic Milling Force of Milling Process Based on Data Mining

Abstract: To improve the surface accuracy of the workpiece and obtain potentially valuable information, a dynamic milling force prediction model based on data mining is proposed. Dynamic milling force is currently obtained through finite element simulation or analytical calculation; however, a finite element model inevitably differs from the actual working conditions, and the analytical calculation is cumbersome and complex. The proposed model was established using a combination of regression analysis and a Radial Basis Function (RBF) neural network. Using data mining as a means, the internal relationship between milling force, cutting parameters, temperature, vibration and surface quality is analyzed in depth, and the influence of dynamic milling force changes in different situations is extracted and summarized by cluster analysis and correlation analysis. The results show that the proposed dynamic milling force model has a good prediction effect, ensures the production quality, reduces the occurrence of chatter, improves the surface accuracy of the workpiece, and provides a more accurate basis for the selection of process parameters.


Introduction
In recent years, the deep integration of information technology and the manufacturing industry has become one of the ultimate goals of the manufacturing industry in promoting "smart manufacturing" [1]. Data-driven predictive and proactive manufacturing is beginning to permeate all aspects of the machining process [2].
Judging and predicting workpiece surface quality by monitoring signals related to vibration, cutting forces, temperature, etc. is an important aspect of intelligent cutting [3]. The quality, efficiency and competitiveness of milling can be greatly improved by converting the large amount of relevant data contained in the milling process, such as vibration, cutting forces and temperature, into useful information, exploring its intrinsic relevance to actual machining problems, and applying methods such as cluster analysis, correlation analysis, regression analysis and artificial neural networks to generate new and unusual information from the available data sets to guide production [4].
As early as 1985, Wang et al [5] used fuzzy cluster analysis to identify the cutting state, and this method has been widely used in data mining. Cai et al [6] used cluster analysis to classify and identify the energy efficiency status by applying image processing, and invented a method to locate and trace the source of low energy efficiency problems occurring in cutting and machining systems, providing a theoretical basis and a potential technical solution for online monitoring of the energy efficiency status of such systems. Bracke et al [7] outlined an integrated approach to detect grinding parameter settings that ensures repeatability on the one hand and economy on the other. Cluster analysis was used to identify groups of grinding parameters and grinding results with similar characteristics.
Based on the connection function, Pei et al [8] analyzed the correlation of the force, vibration and roughness signals in the milling process and obtained good results. Rodrigues et al [9] applied correlation analysis to explore the degree of correlation between roughness, cutting speed and feed rate when turning 45-gauge steel in order to find the optimal configuration of the parameters. Thasthakeer et al [10] investigated the correlation between the variation of different parameters and the rate of change of cutting forces and surface roughness after machining. They found that this correlation was higher under wet machining conditions. Based on multiple regression analysis, Lin et al [11] established a surface roughness prediction model for face milling from the cutting parameters and the machining vibration of the machined parts, and obtained good prediction results. In Ref. [12], to explore the relationship between machining parameters and cutting temperature, analysis of variance was used to test the significance of the parameters, and a regression equation was developed; the accuracy of the predictive model was demonstrated by comparing predicted and experimental values of cutting temperature. Rajesh et al [13] used multivariate regression analysis, including an analysis of variance, to establish a predictive model for the relative effects of process parameters on incision width and surface roughness. Yeganefar et al [14] used a support vector machine, a neural network and regression analysis to predict and optimize the surface roughness and cutting forces during milling, and compared the methods with each other in order to obtain a self-use and optimal selection scheme. Maher et al [15] used an adaptive neuro-fuzzy inference system model to study the degree of correlation between cutting parameters, cutting forces and surface roughness. Basak et al [16] applied a Radial Basis Function (RBF) neural network to milling and fitted the RBF neural network model with 27 kinds of experimental data under different cutting conditions, which showed that this method can be applied to milling.
The advantage of data mining lies in cluster analysis, correlation analysis and regression analysis, which are efficient, convenient and accurate in solving practical machining problems. Starting from the actual milling process, this paper takes cluster analysis, correlation analysis, regression analysis and the RBF neural network as the main means to build a dynamic milling force prediction model. The results demonstrate that the proposed dynamic milling force model has a good prediction effect, which ensures production quality, reduces the occurrence of chatter and provides a more accurate basis for process parameter selection.

Milling Data Gain with Multiple Sensors
In order to obtain the necessary experimental data, milling experiments are designed in this paper that comprise surface quality detection before milling, the milling process, milling temperature detection, milling vibration detection and dynamic milling force detection. The experimental scheme is shown in Fig. 1.

Design of the Milling Experimental Programme
In this paper, 12CrNi2 was selected as the experimental material (as shown in Fig. 2). This material is an alloy steel that contains nickel and has certain sticky characteristics. Therefore, in order to give the cut surface a high surface finish, a four-flute tungsten steel monolithic alloy end mill with the size of D12 mm×75 mm×4F is selected, as shown in Fig. 3.
Considering the influence of chatter on the actual cutting process, the cutting stability region is found in advance through milling stability analysis before the process parameters are set (Fig. 4). The range of process parameters is then determined with reference to the process manual.
The technological parameters are as follows: cutting width $a_e$ = 0.5-6 mm, cutting depth $a_p$ = 0.1-2 mm, cutting speed $v_c$ = 72-140 m/min, feed per tooth $f_z$ = 0.025-0.063 mm, rotational speed $n$ = 1 000-4 000 r/min, and $f$ is the feed speed. The response surface method is then used to design the concrete parameter scheme. The specific cutting process parameters are shown in Table 1.

Experimental Data Acquisition
Before milling, one end of each of the three acceleration sensors is fixed at the spindle tool holder and the fixture, respectively. The other end is connected to an LMS tester connected to a laptop computer, so as to collect the vibration of the spindle tool holder and the vibration of the workpiece itself during the milling process in preparation for subsequent data processing. In order to measure the change of cutting temperature with time during the milling process, the infrared thermal imager is focused on the tool tip, a moderately sized temperature collection frame is set, and the real-time highest temperature within the collection frame is taken as the milling temperature at the tool tip, which realizes real-time monitoring of the milling temperature. The milling method is down milling. To avoid the influence of cutting fluids on cutting temperature variations, dry cutting is chosen. During cutting, the pre-treated workpiece is tightly fixed to the fixture of the pressure-type three-way force test sensor, the surface of the workpiece is gently tapped with a copper rod to keep it level, and the real-time variation of the cutting force is recorded by this sensor. The experimental conditions are shown in Fig. 5.
When the experiments were completed, roughly $3.5\times10^{8}$ experimental data points had been obtained. The resulting data were collated to obtain the experimental results. Taking the first set of experiments as an example, the data images are shown in Figs. 6-9.
Figure 6 shows the first set of roughness curves for the first workpiece. According to the distribution of cutting traces, the surface of each workpiece is divided

Experimental Data Preprocessing
Before data mining can be performed on a sample of data, the data usually need to be pre-processed, i.e. cleaned, integrated, transformed and normalized.
As the samples used for data mining in this paper are derived from experiments in which the data acquisition sources are the respective test equipment, the acquired roughness, temperature, cutting force and vibration data were already classified and stored separately at the time of acquisition. These data are imported into separate Excel tables and then integrated. As the data samples were collected under specific conditions and the collection process was supervised by personnel, there are no missing, mis-stored or duplicated data. However, the data signals themselves contain trend terms and noise, so the data with numerical anomalies need to be eliminated from the samples through the data screening function of Excel before analysis to obtain a more reasonable database, after which the trend terms are removed and the noise is reduced. In order to complete the subsequent data mining work efficiently, workstations were used for data processing; the data volume was within the computer's capacity and, therefore, no data imputation was required.
As the data samples have various characteristics and their ranges differ from each other, the data need to be standardized. There are many types of experimental data in this paper, such as temperature data, vibration data, surface roughness data and cutting force data. The data have a large numerical span, so Z-score standardization is used. After Z-score standardization, the mean value of the data becomes 0 and the standard deviation becomes 1. The calculation is as follows:

$$x^{*} = \frac{x - \bar{x}}{\sigma} \tag{1}$$

where $x$ is the original data of the sample, $\bar{x}$ is the mean of the original data, $\sigma$ is the standard deviation of the original data, and $x^{*}$ is the new sample data after standardization. Taking the cutting force data of the first group of experiments as an example, the standardized cutting force data are shown in Fig. 11.
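The Z-score step described above can be sketched in a few lines of Python. This is only an illustration: the function name `z_score` and the sample force values are hypothetical, and the population standard deviation (division by the sample count) is an assumption, since the paper does not state which form was used.

```python
import math

def z_score(values):
    """Standardize a sequence so that it has mean 0 and standard deviation 1."""
    n = len(values)
    mean = sum(values) / n
    # Assumption: population standard deviation (divide by n, not n - 1).
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        raise ValueError("a constant signal cannot be standardized")
    return [(v - mean) / std for v in values]

# Hypothetical cutting-force samples in newtons (illustrative values only).
force = [112.0, 98.5, 130.2, 101.7, 120.4]
standardized = z_score(force)
```

Applying the same function separately to each signal (force, vibration, temperature, roughness) puts all of them on a comparable, unit-free scale before clustering.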
Considering that data preprocessing will make some features of the original data impossible to observe, the original data is qualitatively and quantitatively analyzed.
1) It is known from the temperature data that the influence of cutting width on cutting temperature in the cutting process parameters is more significant, and the cutting instability, severe roughness and machining surface hardening will make the overall temperature of the cutting process rise significantly.
2) By analyzing vibration data, it was found that higher amplitudes often mean higher temperatures. When the spindle speed reaches a certain value, a further increase in cutting depth will lead to increased vibration.
3) From the surface roughness data, it can be seen that if the spindle speed and feed speed can be reasonably increased, and the cutting depth can be reduced, it will effectively improve the quality of the processing surface and improve the processing efficiency, otherwise it will be counterproductive.

Cluster Analysis
The cutting force data have three characteristics: the X, Y and Z direction components. In this paper, the pre-processed dynamic milling force data are used as the basis for classification, and the k-means algorithm is used as the core to filter and cluster the dynamic milling force and its corresponding cutting process parameters and time according to the principle of "the highest degree of similarity".
Taking the first group of experiments as an example, after the sample data are preprocessed and normalized, their distribution is shown in Fig. 12. When performing the cluster analysis, the best number of classes for the group of samples was first determined by drawing a line graph, and then the number of iterations was set. The value of k is usually selected through a large number of trials, and d is the sum of the distances obtained under each k value in these trials. The appropriate k value can be determined by drawing a line chart with d as the ordinate and k as the abscissa. The k value at the inflection point is the best choice. Here, the number of clustering centers is k=4 and the number of iterations is 100. After 100 iterations, the clustering result corresponding to the smallest d is shown in Fig. 13, where different colors represent different categories.
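The procedure of choosing k from a d-versus-k line chart can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the three-component force points are synthetic, the helper names (`k_means`, `best_of`) are hypothetical, and d is taken here as the sum of squared point-to-center distances, which is an assumption about how the paper measures "the sum of the distances".

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centers):
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda j: squared_distance(point, centers[j]))

def k_means(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels, d)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:  # converged before the iteration limit
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    d = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, d

def best_of(points, k, restarts=10):
    """Run several random starts and keep the result with the smallest d."""
    return min((k_means(points, k, seed=s) for s in range(restarts)), key=lambda r: r[2])

# Synthetic standardized (Fx, Fy, Fz) samples around four hypothetical centers.
rng = random.Random(42)
true_centers = [(0, 0, 0), (10, 0, 0), (0, 10, 0), (0, 0, 10)]
points = [tuple(c + rng.gauss(0, 1) for c in center)
          for center in true_centers for _ in range(50)]

# d for each candidate k; plotting d against k shows the inflection point.
d_by_k = {k: best_of(points, k)[2] for k in range(1, 7)}
centers4, labels4, d4 = best_of(points, 4)
```

Plotting `d_by_k` with k on the abscissa and d on the ordinate gives the line chart described in the text; the k at which the curve stops dropping steeply is taken as the number of clusters.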
The remaining 45 groups of milling force data are clustered using the same method. It was found that the majority of the experimental groups (21 groups in total) had milling force data that could be classified into 3 categories; 9 experimental groups had data that could be classified into 4 categories; 6 experimental groups had data that could be classified into 2 categories; a few (2 groups in total) had data that could be classified into 5 categories; and the rest were unclassifiable data, each group of which formed a single category. Cluster analysis divides a sample containing a large amount of data into different sub-categories according to certain rules, so as to avoid problems such as large errors and insufficient accuracy when different types of data are analyzed together at a later stage.

Correlation Analysis
The clustering method described above classified the available 46 sets of milling force data into 126 categories, which shows that the main influencing factors for the variation of the milling forces still differ, even when the forces are generated under the same cutting process parameters. However, the clustering alone is of little significance for dynamic stiffness calculations at a later stage. In order to clarify the differences and characteristics of each type of data, further categorize them according to their main influencing factors, and improve the efficiency of data processing, correlation analysis was chosen to process each type of data and find their respective "labels". From the above analysis it was found that temperature, roughness and vibration were the main influencing factors on dynamic milling forces, from which a preliminary judgement can be made that each group of experimental milling force values can be subdivided into multiple categories and that their division may be related to these influencing factors. However, dynamic milling forces are influenced by more factors than these. Therefore, in order to find a corresponding "label" for each group of data, each category of data was correlated with the temperature, roughness and vibration data and a two-tailed test for significance was carried out, while data that were not significantly correlated with any of these three factors were grouped into a new category, "Other".
Before conducting the correlation analysis, a scatter plot was drawn to make a preliminary judgement on the linearity or non-linearity of the relationship between the data, with a view to determining the type of correlation coefficient to use. Taking the first six sets of data as an example, Fig. 14 shows the scatter plots of correlations corresponding to the first six sets of experiments.
As shown in Fig. 14, scatter plots were drawn of the correlation between the different classes of the first set of experimental three-way milling forces and the vibration data in the Z1 direction, the vibration data in the Y2 direction, the roughness data and the temperature data, respectively. It was found that none of the relationships was linear and that the data did not obey a normal distribution, so the Spearman coefficient was chosen for further analysis.
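As a rough illustration of why the Spearman coefficient suits monotonic but non-linear relationships, the sketch below computes it as the Pearson correlation of average ranks. The function names and sample data are hypothetical, and the two-tailed p-value uses the large-sample normal approximation (rho·sqrt(n−1) treated as standard normal), which is an assumption and not necessarily the test used in the paper.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def approx_two_tailed_p(rho, n):
    """Large-sample approximation: rho * sqrt(n - 1) is treated as standard normal."""
    z = abs(rho) * math.sqrt(n - 1)
    return math.erfc(z / math.sqrt(2))

# Hypothetical monotonic but non-linear relationship between two signals.
vibration = [1.0, 2.0, 3.0, 4.0, 5.0]
force = [v ** 2 for v in vibration]
rho = spearman(vibration, force)
p_value = approx_two_tailed_p(rho, len(vibration))
```

Because only the ordering of the values matters, a perfectly monotonic relationship yields a coefficient of 1 (or -1) even when a straight line would fit it poorly.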
Table 2 is an excerpt from the results of the correlation analysis performed on the different classes of the 46 groups of experimental milling forces (since the original data were divided into as many as 126 classes, they are not all listed here).
It is important to note that correlations grouped in the same category may differ in direction; e.g., two categories may both be significantly correlated with vibration data, while the first is positively correlated and the second is negatively correlated.
The correlation analysis revealed that 40 classes were significantly correlated with vibration data, 72 classes were significantly correlated with temperature data, 67 classes were significantly correlated with roughness data, and only 16 classes were classified as correlated with other factors; some classes were significantly correlated with two or three of the vibration, temperature and roughness data at the same time. Therefore, the milling force data obtained from all experimental groups can be classified into eight major classes as a whole: 1) vibration as the main factor; 2) temperature as the main factor; 3) roughness as the main factor; 4) vibration and temperature as the main factors; 5) vibration and roughness as the main factors; 6) temperature and roughness as the main factors; 7) vibration, temperature and roughness all as the main factors; 8) vibration, temperature and roughness are not major factors. The subsequent data modeling should match these classes and handle each of them separately.
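The eight classes are simply the 2³ possible combinations of "significant" or "not significant" for the three factors. The sketch below shows one way to assign such a label to a cluster from its p-values; the function name, the dictionary layout and the 0.05 significance level are all assumptions made for illustration, since the paper does not state the threshold it used.

```python
from itertools import product

FACTORS = ("vibration", "temperature", "roughness")

def label_cluster(p_values, alpha=0.05):
    """Label a cluster by the factors it is significantly correlated with.

    p_values maps each factor name to the two-tailed p-value of its
    correlation with the cluster's milling force data.
    Assumption: alpha = 0.05; the paper does not give the level it used.
    """
    significant = [f for f in FACTORS if p_values[f] < alpha]
    return " + ".join(significant) if significant else "other"

# Hypothetical p-values for one cluster.
example = {"vibration": 0.01, "temperature": 0.20, "roughness": 0.03}
example_label = label_cluster(example)

# Enumerate every significant / not-significant combination.
all_labels = {
    label_cluster(dict(zip(FACTORS, combo)))
    for combo in product((0.01, 0.5), repeat=len(FACTORS))
}
```

Enumerating all combinations confirms that exactly eight distinct labels can occur, matching the eight major classes listed above.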

Data Modeling
As mentioned above, the dynamic milling force data are divided into eight categories. Because the main influencing factors differ between categories, the relationship between the dynamic milling force and the corresponding independent variables also differs. According to the characteristics of each type of data, dynamic milling force prediction models with high fitting accuracy are obtained by modeling the classified data with multiple regression analysis and an RBF neural network, respectively.

Regression Analysis of Dynamic Cutting Force
Regression analysis is an important analysis tool for data mining, which has been widely used for modeling in various fields because it is efficient, intuitive and accurate. In practical machining, the change of a given index is caused by the coupling of several parameter variables, so multiple regression analysis is the most commonly used regression method in engineering applications.
The current empirical formula for milling force is usually expressed in exponential form and consists of a correction factor, cutting process parameters and exponents [17]. The available process parameters are cutting width, cutting depth, cutting speed, feed per tooth, feed speed, spindle speed, milling cutter diameter and number of teeth. However, there is an exact dependence between the feed speed, the feed per tooth, the number of teeth and the spindle speed, and between the cutting speed, the spindle speed and the diameter of the milling cutter.
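For reference, the dependencies mentioned above correspond to the standard kinematic relations of milling. They are not written out in the source; the symbols used here are $D$ for the cutter diameter in mm, $n$ for the spindle speed in r/min and $f$ for the feed speed in mm/min, with $v_c$ in m/min, following common usage.

```latex
f = f_z \, z \, n, \qquad v_c = \frac{\pi D n}{1000}
```

Since a single cutter (D = 12 mm, four flutes) is used throughout the experiments, $f$ and $n$ add no information beyond $f_z$ and $v_c$, which is consistent with leaving them out of the regression model.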
Considering the close correlation between dynamic cutting forces and time, the time parameter is also introduced into the model. The regression model for the dynamic cutting forces was initially determined to contain the following six parameters: depth of cut $a_p$, feed per tooth $f_z$, cutting width $a_e$, cutting speed $v_c$, number of teeth of the milling tool $z$ and time $t$. The model expression is:

$$F = A z\, a_p^{b_1} f_z^{b_2} a_e^{b_3} t^{b_4} v_c^{b_5} \tag{2}$$

where $A$ is the correction factor and $b_1$ to $b_5$ are the exponents to be fitted. The general multiple linear regression model is a polynomial sum, whereas this model is a product of powers, so the model needs to be transformed. Taking the logarithm of both sides gives:

$$\lg F = \lg (Az) + b_1 \lg a_p + b_2 \lg f_z + b_3 \lg a_e + b_4 \lg t + b_5 \lg v_c \tag{3}$$

Let $y = \lg F$, $b = \lg (Az)$, $x_1 = \lg a_p$, $x_2 = \lg f_z$, $x_3 = \lg a_e$, $x_4 = \lg t$, $x_5 = \lg v_c$; then the regression model is transformed into:

$$y = b + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5 \tag{4}$$

The independent and dependent variable data were processed using the least squares method. To find the constant term $b$, a column of ones is added to the independent variable data, and the first fitted coefficient is then the constant term.
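The log-linear least squares fit can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the data are synthetic and noise-free, generated from a hypothetical power law with made-up coefficients, and the normal equations are solved with a small hand-written Gaussian elimination so that the example needs only the standard library.

```python
import math
import random

def solve_linear_system(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def least_squares(rows, y):
    """Ordinary least squares; a column of ones is prepended for the constant term."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)

# Hypothetical "true" model used only to generate test data.
A, z = 50.0, 4
true_exponents = [0.8, 0.7, 0.9, -0.02, -0.1]  # for a_p, f_z, a_e, t, v_c

rng = random.Random(1)
rows, y = [], []
for _ in range(200):
    a_p = rng.uniform(0.1, 2.0)       # cutting depth, mm
    f_z = rng.uniform(0.025, 0.063)   # feed per tooth, mm
    a_e = rng.uniform(0.5, 6.0)       # cutting width, mm
    t = rng.uniform(0.1, 10.0)        # time, s (hypothetical range)
    v_c = rng.uniform(72.0, 140.0)    # cutting speed, m/min
    variables = (a_p, f_z, a_e, t, v_c)
    force = A * z * math.prod(v ** e for v, e in zip(variables, true_exponents))
    rows.append([math.log10(v) for v in variables])
    y.append(math.log10(force))

coefficients = least_squares(rows, y)
b, exponents = coefficients[0], coefficients[1:]
A_times_z = 10 ** b  # constant term transformed back from the log domain
```

With real, noisy data the recovered exponents would of course not match any "true" values exactly; the point is only that the first coefficient is the constant term and the remaining five are the exponents of the power law.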
Taking the data of "vibration as the main factor" identified by the correlation analysis as an example, the logarithm of the data sample is taken and the regression analysis is then carried out. The results show that the evaluation parameters of the regression model are not ideal, and the residual analysis graph shows that the confidence intervals of many residuals do not contain zero. The model is therefore refined as follows:
(i) When the first regression analysis of the data is not satisfactory, a reasonable expected goodness of fit $R^2$ is set first, and the regression is then iterated. In each iteration, the original data points whose residual confidence intervals do not contain zero are eliminated, and the remaining data are retained to form new samples of independent and dependent variables. The regression analysis is then carried out again, until the expected goodness of fit is met.
(ii) At this point, according to the latest residual analysis graph, the expected goodness of fit is adjusted appropriately, and the iterative process in step (i) is repeated until the evaluation indices indicate that the model is sufficiently accurate.
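The iterate, trim and refit loop of steps (i) and (ii) can be sketched as below. Several simplifications are assumed: a single predictor is used instead of five, the data are synthetic, and a point is treated as abnormal when its residual exceeds two residual standard deviations, which only approximates the rule "the residual confidence interval does not contain zero" because the paper does not specify how those intervals are computed.

```python
import math

def fit_line(x, y):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

def r_squared(x, y, intercept, slope):
    my = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot

def trim_and_refit(x, y, target_r2, max_rounds=20):
    """Refit repeatedly, dropping abnormal points until R^2 reaches the target."""
    x, y = list(x), list(y)
    for _ in range(max_rounds):
        intercept, slope = fit_line(x, y)
        r2 = r_squared(x, y, intercept, slope)
        if r2 >= target_r2:
            break
        residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
        sigma = math.sqrt(sum(r * r for r in residuals) / (len(x) - 2))
        # Assumed rejection rule: |residual| > 2 sigma marks an abnormal point.
        keep = [i for i, r in enumerate(residuals) if abs(r) <= 2 * sigma]
        if len(keep) == len(x) or len(keep) < 3:
            break  # nothing left to remove, or too few points to fit
        x = [x[i] for i in keep]
        y = [y[i] for i in keep]
    else:
        # Round limit reached right after a trim: refit on the final sample.
        intercept, slope = fit_line(x, y)
        r2 = r_squared(x, y, intercept, slope)
    return intercept, slope, r2, len(x)

# Synthetic data: y = 1 + 2x with small alternating noise and two gross outliers.
xs = [float(i) for i in range(30)]
ys = [1.0 + 2.0 * v + 0.1 * (-1) ** i for i, v in enumerate(xs)]
ys[5] += 40.0
ys[20] += 40.0

intercept, slope, r2, kept = trim_and_refit(xs, ys, target_r2=0.99)
```

The target goodness of fit plays the role of the "reasonable expectation" in step (i); lowering or raising it, as in step (ii), changes how many rounds of trimming are performed.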
Figure 15 shows the residual analysis of the first major category of data after the abnormal data are eliminated. The red lines in the two yellow circles in the graph indicate abnormal data, while the remaining green lines indicate normal data, whose residual confidence intervals contain zero. The small circle in the centre of each line indicates the value of the residual, while the whole line indicates the confidence interval of the residual.
As shown in Fig. 15, after the abnormal data are culled, only 2 abnormal data points are left, which is negligible compared with the nearly 6 000 data samples. The analysis of the other evaluation criteria showed that the goodness of fit $R^2$ was 0.776 6, the F-value was 3 989.293, and the probability of error P was 0. Therefore, the constructed regression model is considered successful. The final regression analysis results for the eight categories of data are shown in Table 3.
After judging and analyzing the final results of the above eight multiple linear regression models, it is found that five dynamic cutting force multiple linear regression models fit better, namely "vibration is the main factor", "vibration and roughness are the main factors", "vibration and temperature are the main factors", "vibration, roughness and temperature are the main factors", and "vibration, temperature and roughness are not the main factors". Their goodness of fit is all at about 0.6 or above, the F-values are large, the probability of error is zero, and the number of residual confidence intervals that do not contain zero is less than 10, so they can be applied directly. Their fitted equations are:

1) Vibration is the main factor affecting the dynamic milling force variation:

$$F = 1.1594\times10^{23}\, a_p^{0.5397} f_z^{-4.9559} a_e^{-0.1141} t^{-0.0118} v_c^{2.9676} z \tag{5}$$

2) Vibration and roughness are the main factors affecting the dynamic milling force variation:

$$F = 0.0003\, a_p^{-0.2574} f_z^{-1.1646} a_e^{-0.0949} t^{-0.0146} v_c^{-0.0751} z \tag{6}$$

3) Vibration and temperature are the main factors affecting the dynamic milling force variation:

$$F = 0.2500\, a_p^{-0.2328} f_z^{-0.4130} a_e^{-0.0923} t^{-0.0199} v_c^{-0.1460} z \tag{7}$$

4) Vibration, roughness and temperature are the main factors affecting the dynamic milling force variation:

$$F = 3.6833\times10^{6}\, a_p^{-0.1234} f_z^{0.5339} a_e^{0.1387} t^{-0.0663} v_c^{-1.2183} z \tag{8}$$

5) Vibration, temperature and roughness are not the main factors affecting the dynamic milling force variation:

$$F = \ldots 2952\; a_e^{0.5863}\, t^{0.0119}\, v_c^{0.9340}\, z \tag{9}$$

Multiple linear regression models were also constructed for the data with roughness, temperature, and temperature and roughness as the main factors. Although the F-values, P-values and residual analysis were good, the goodness of fit was low at 0.266 5, 0.184 1 and 0.272 1, respectively, indicating that the three models were poorly fitted and not very accurate. Therefore, there was no significant linear or fitted linear relationship between the logarithms of the independent and dependent variables for the latter three types of data. By combining their scatter plots with the number of independent variables, it was found that the scatter plots did not fit the common non-linear regression models either. This result supports the conjecture that there may be a non-linear relationship between the independent and dependent variables of these three types of data that cannot be expressed directly, or that there is no direct correlation between them. Therefore, the latter three types of data were further analyzed using RBF neural network modeling.

RBF Neural Network Modeling of Dynamic Cutting Force
The RBF neural network has an intuitive structure consisting of an input layer, a hidden layer and an output layer. The input layer and the output layer hold the data samples of the independent variables and the dependent variable, respectively, and the number of input neurons equals the number of independent variables. The hidden layer can be seen as a "bridge". Building the network involves two transformations: a nonlinear transformation from the input layer to the hidden layer, and a linear transformation from the hidden layer to the output layer. The hidden layer therefore plays a connecting role in the construction of the RBF neural network. When the vectors lie in a low-dimensional space they may be linearly inseparable; the hidden layer maps them from the low-dimensional space to a high-dimensional space, where they become linearly separable.
Assuming that there is a non-linear relationship between the independent and dependent variables of the three types of data mentioned above, and that the form of the relationship is complex so that a deterministic relational equation cannot be obtained, RBF neural networks are used for prediction and error analysis modeling. To build the model, a total of six neurons were set up in the input layer, and $\lg a_p$, $\lg f_z$, $\lg a_e$, $\lg v_c$, $\lg z$ and $\lg t$ were fed into the respective neurons. The output layer has only one neuron, which holds the milling force data. Each of the three categories of data was divided into two parts: the last 200 data sets of each category were used as test samples, and the rest of the data were used as training samples for the construction of the neural network.
To improve the accuracy of the neural network, relatively reasonable expansion coefficients were set for each of the three types of data. The schematic diagram of the neural network model is shown in Fig. 16. Finally, RBF neural network models were constructed for the three main factors of roughness (expansion coefficient … Fig. 17 indicates the fitted value of the neural network, and the blue curve indicates the experimental value; both have similar trends and relatively small differences. Combined with the error plot, it can be tentatively judged that the constructed model fits well and that the error is relatively small and within the acceptable range. The MSE of the neural network model is 0.045 9 with temperature and roughness as influencing factors, 0.040 3 with temperature as the influencing factor, and 0.040 4 with roughness as the influencing factor. The MSEs of the three models are very small, so the RBF neural network models have good prediction accuracy and can be applied in practice.
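The two transformations of an RBF network (nonlinear into the hidden layer, linear out of it) and the MSE evaluation on held-out samples can be sketched as follows. This is a toy illustration under several assumptions: a one-dimensional sine curve stands in for the milling force data, Gaussian basis functions are used, the centers are a subset of the training inputs, and the `spread` parameter plays the role of the expansion coefficient; none of these choices is confirmed by the source.

```python
import math

def gaussian(x, center, spread):
    """Hidden-layer activation: Gaussian of the distance to the center."""
    dist2 = sum((a - b) ** 2 for a, b in zip(x, center))
    return math.exp(-dist2 / (2.0 * spread ** 2))

def solve_linear_system(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def train_rbf(inputs, targets, centers, spread, ridge=1e-8):
    """Fit the linear output weights by (slightly regularized) least squares."""
    phi = [[gaussian(x, c, spread) for c in centers] for x in inputs]
    p = len(centers)
    gram = [[sum(row[i] * row[j] for row in phi) + (ridge if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * t for row, t in zip(phi, targets)) for i in range(p)]
    return solve_linear_system(gram, rhs)

def predict_rbf(x, centers, spread, weights):
    return sum(w * gaussian(x, c, spread) for w, c in zip(weights, centers))

# Toy data: training samples on a grid, test samples at the midpoints.
step = 2.0 * math.pi / 39
train_x = [(i * step,) for i in range(40)]
train_y = [math.sin(x[0]) for x in train_x]
test_x = [((i + 0.5) * step,) for i in range(39)]
test_y = [math.sin(x[0]) for x in test_x]

centers = train_x[::3]   # every third training input serves as a center
spread = 0.5             # stands in for the "expansion coefficient"
weights = train_rbf(train_x, train_y, centers, spread)

predictions = [predict_rbf(x, centers, spread, weights) for x in test_x]
mse = sum((p - t) ** 2 for p, t in zip(predictions, test_y)) / len(test_y)
```

For the milling data, each input would instead be the six-element vector of logarithms described above, and the spread would be tuned per data category, as the text indicates.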
Comparing the modeling of the above regression analysis, it can be found that not all the relationships between dynamic milling force data and cutting process parameters can be obtained by multiple linear regression analysis, especially when considering the dynamic milling force variation at different times.
For the dynamic milling force data, where the vibration factor is not one of the main influencing factors, it is more meaningful to construct a model using RBF neural network.At present, the constructed model can directly predict the dynamic milling force by cutting process parameters and time, and then combine with the dynamic displacement change to obtain the dynamic stiffness of the milling tool system, but the specific expression of the dynamic stiffness of the milling tool system cannot be obtained.

Conclusion
In this paper, data mining is carried out by combining cluster analysis with correlation analysis. The results show that the dynamic cutting force data can be divided into eight main categories based on correlation: "vibration is the main factor", "vibration and temperature are the main factors", "vibration and roughness are the main factors", "vibration, temperature and roughness are the main factors", "vibration, temperature and roughness are not the main factors", "temperature is the main factor", "roughness is the main factor", and "both temperature and roughness are the main factors".
Multi-linear regression analysis was applied to model the data of "vibration as the main factor", "vibration and temperature as the main factor", "vibration and roughness as the main factor", "vibration, temperature and roughness as the main factor", "vibration, temperature and roughness are not the main factors", and fit the expressions to the five types of data.The three types of data, "temperature is the main factor", "roughness is the main factor", and "temperature and roughness are the main factors", were modeled by RBF neural network, and achieved high prediction accuracy.
Finally, with two prediction models, better predictions were obtained, providing a more accurate basis for selecting process parameters, ensuring production quality and reducing the occurrence of chatter.

Fig. 14 Scatter plots of the correlation of the first set of experimental

Fig. 15 Residual analysis graph

Fig. 16 Schematic diagram of RBF neural network model
Fig. 17 Fitting curve of RBF neural network