A Power Load Prediction by LSTM Model Based on the Double Attention Mechanism for Hospital Building

Abstract: This work proposes an LSTM (long short-term memory) model based on a double attention mechanism for power load prediction, to further improve the energy-saving potential and accurately control the distribution of power load to each department of a hospital. First, the key influencing factors of the power loads are screened by grey relational degree analysis. Second, because the power loads are affected by multiple factors and by time-series changes, a feature attention mechanism and a sequential attention mechanism are introduced on the basis of the LSTM network. The former autonomously analyzes the relationship between the historical information and the input variables to extract important features, and the latter selects the historical information at critical moments of the LSTM network to improve the stability of long-term prediction. Finally, experimental results on the power loads of Shanxi Eye Hospital show that the LSTM model based on the double attention mechanism achieves higher forecasting accuracy and stability than the conventional LSTM, CNN-LSTM, and attention-LSTM models.


Introduction
In recent years, with the rapid development of urban construction in China, the problem of building energy consumption has become increasingly prominent. At present, building energy consumption accounts for about 30% of the total energy consumption of society, and the building energy consumption per unit area is 2-3 times that of developed countries, which has become one of the main obstacles limit-[...] sequence of the occurrence time. It is a typical time series with uncertain, nonlinear, and discrete characteristics. Accordingly, methods for forecasting air-conditioner power loads have developed gradually from traditional statistical regression models to nonlinear and discrete models based on shallow and deep neural networks.
The artificial neural network (ANN) is one of the most widely used shallow network models in time series prediction. As prediction methods have developed from shallow to deep neural networks, the recurrent neural network (RNN) has received extensive attention, and the long short-term memory (LSTM) network [2] has promoted the development of the RNN. Because LSTM can identify the nonlinearity and complexity of data structures and patterns in time series forecasting [3], it has been widely used in power load forecasting [4,5]. However, although LSTM has many advantages in dealing with complex nonlinear data, it also has limitations. For example, the LSTM method can be too complex to train, and its performance can be lower than that of the simple ARIMA (autoregressive integrated moving average) method [6]. Zhang et al [7] proposed a hybrid model for power load forecasting based on improved empirical mode decomposition (IEMD), ARIMA, and a wavelet neural network (WNN), and found that its accuracy was better than that of LSTM. Kim et al [8] proposed a recurrent inception convolution neural network (RICNN) to enhance power load forecasting at a specific time, and found that its accuracy was equivalent to that of the LSTM model. Because of these shortcomings, more and more researchers in recent years have improved LSTM prediction models by combining them with traditional methods or other machine learning methods. For example, Laib et al [9] combined the multi-layered perceptron (MLP) neural network with the LSTM model to improve the prediction accuracy significantly.
In particular, Shao et al [10] recently proposed a building energy consumption prediction model that integrates the attention mechanism with LSTM. On the basis of the LSTM model, the attention mechanism is added to highlight the impact of the building energy consumption characteristics at key times on the prediction results. Subsequently, to address the instability of the influence of each input characteristic on the cooling loads in shopping malls, Zhao et al [11] proposed a short-term zoned cooling load prediction model based on the double attention mechanism and LSTM, in which the key influencing factors of the cooling loads in different areas of the shopping malls were screened by grey relational degree analysis. On the basis of the LSTM model, the feature attention was used to autonomously analyze the relationship between the historical information and the input variables to extract important features, and the sequential attention was used to select the historical information at critical moments of the LSTM network to improve the stability of long-term prediction. They found that, compared with the LSTM model, the error indexes were significantly reduced for the CNN-LSTM and attention-LSTM models, indicating good generalization ability and strong stability.
In China, the average energy consumption level of hospital buildings is second only to that of shopping malls. A hospital is characterized by a year-round work cycle, complex equipment composition, a large flow of people, and obvious seasonality [12,13]. Therefore, as a typical public building, the hospital building has a high energy consumption level, and its power load is a typical time series with uncertain, nonlinear, and discrete characteristics. Systematic investigations into the energy consumption of hospital buildings are of great significance for energy conservation, for the regulation and supervision of energy consumption, and for energy-saving reconstruction.
In this work, in order to realize and improve the energy-saving potential of the air-conditioner transmission and distribution system in the hospital, and to accurately control the distribution of the power load to each department as required, the load characteristics and key influencing factors of the loads in the different departments of the hospital were analyzed, and a power load forecasting method oriented to the functional partitions of the hospital was proposed. First, the key influencing factors of the power loads in the different zones, such as the building space structure, business type, and personnel density, were screened by grey relational degree analysis. Then, because the hospital power loads are affected by multiple factors and by time-series changes, the feature attention mechanism and the sequential attention mechanism were introduced on the basis of the LSTM network. The former was used to autonomously analyze the relationship between the historical information and the input variables to extract important features, and the latter was used to select the historical information at critical moments of the LSTM network to improve the stability of long-term prediction. Taking the power loads of Shanxi Eye Hospital as an example, the importance of key historical moments for predicting the loads was analyzed, the output of the model was optimized, and the accuracy of the model was improved.

Functional Partitions and Loads Characteristics
A comprehensive Grade III-A hospital is mainly composed of four departments: the outpatient department, inpatient department, management department, and logistics department. The outpatient department and inpatient department are the main energy-consuming departments in the hospital [13]. In these two departments, the flow of people is diverse in nature, long-lasting, and unevenly distributed over time. Meanwhile, the energy-consuming medical facilities are centralized, the equipment performance differs (e.g., the rated voltage and loads differ greatly among the different energy-consuming equipment), and the areas are specialized (surgical operation ward, medical ward, intensive care unit, etc.). Therefore, only the outpatient department and inpatient department are considered.
Furthermore, because the diagnosis and treatment equipment consumes a large amount of energy, and because scientific and effective energy-saving measures must be taken on the basis of the safe use of the equipment, the instrument diagnosis and treatment departments are separated from the outpatient department and inpatient department in this work. Considering the hospital's year-round work cycle, complex equipment composition, and large and obviously seasonal flow of people, Shanxi Eye Hospital (a Grade III-A hospital) was taken as the research object. The main hospital building was divided into the outpatient department, the inpatient department, and the instrument diagnosis and treatment department, and the power loads were discussed according to their different characteristics in the different seasons (winter and summer), in order to predict the uncertain and non-uniform actual load demands more accurately and avoid energy waste.
Due to the uncertainty of the activities of the hospital personnel, the average number of people staying in a certain area during a certain period of time is taken as the statistical value of the number of personnel. For example, the average number of people between 7:30 a.m. and 8:30 a.m. is considered as the number of people at 8:00 a.m. The hourly personnel densities in the different departments are obtained by fitting with multiple linear regression, and the changes of the personnel density in each department are shown in Fig. 1. In terms of time, the personnel density in January is higher than that in July, the density on Monday morning is higher than that on the mornings from Tuesday to Sunday, and the density is higher in the morning than in the afternoon. On Saturday and Sunday, the personnel densities decrease significantly in comparison with those from Monday to Friday. The personnel density is closely related to the rhythm of daily life. In terms of space, the density of the outpatient department is significantly higher than those of the inpatient department and the instrument diagnosis and treatment department, which is related to the department functions.
Many investigations have found that the daily power load within a week and the hourly power load within the same department change periodically and similarly, and that the changes on workdays differ from those on Saturday or Sunday. In this work, the changes of the hourly power load of each functional partition on a Tuesday and a Sunday are shown in Fig. 2.
On the whole, the power load of each department follows a double-peak shape, and the double peaks in the instrument diagnosis and treatment department and the outpatient department are more obvious than those in the inpatient department. Furthermore, the power load demand varies greatly among the functional partitions. The power load demand in the inpatient department is the largest, followed by the instrument diagnosis and treatment department, and the outpatient department has the smallest power load demand. Moreover, for the instrument diagnosis and treatment department and the outpatient department, the power load demand on Tuesday is significantly greater than that on Sunday, and the power load demand during the normal working period is greater than that during the non-working period. For the inpatient department, the power load demand on Tuesday is larger than that on Sunday, and the demand in the normal working period is larger than that during the non-working period, but both differences are smaller than those in the instrument diagnosis and treatment department or the outpatient department. For the inpatient department and the instrument diagnosis and treatment department, the power load demand in July is greater than that in January, while for the outpatient department the trend is the opposite.

Analysis of Influencing Factors of Power Load for Functional Partition
The power load of a Grade III-A hospital mainly consists of the personnel power load, diagnosis and treatment power load (mainly the electrical equipment power load), lighting power load, enclosure power load, and fresh air power load. Grey relation analysis (GRA) can be used for the statistical analysis of the nonlinear relationship between the power load and multivariable factors [14]. The steps are as follows:
Step 1 Determine the reference sequence and comparison sequences: the air-conditioning power load ($Y_c$) is taken as the reference sequence, and the personnel density ($X_p$), lighting power density ($X_l$), electrical power density ($X_e$), dry bulb temperature ($X_t$), relative humidity ($X_h$), solar radiation ($X_r$), wind speed ($X_w$), power load at time $T-1$ ($X_{C(T-1)}$), and power load at time $T-2$ ($X_{C(T-2)}$) are set as the comparison sequences.
Step 2 Calculate the grey correlation coefficient $\xi_i(k)$ between the reference sequence and each comparison sequence:
$$\xi_i(k) = \frac{\min_i \min_k |y(k) - x_i(k)| + \rho \max_i \max_k |y(k) - x_i(k)|}{|y(k) - x_i(k)| + \rho \max_i \max_k |y(k) - x_i(k)|} \quad (1)$$
where $y(k)$ is the normalized power load sequence, $x_i(k)$ is the normalized sequence of the $i$-th influencing factor, and the resolution coefficient is $\rho = 0.5$.
Step 3 Calculate the grey correlation degree $r_i$:
$$r_i = \frac{1}{n} \sum_{k=1}^{n} \xi_i(k) \quad (2)$$
where $n$ is the length of the sequences. The grey correlation degrees between the different input variables and the air-conditioning power loads in the dynamic prediction model of each functional area are shown in Table 1.
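As an illustration of Steps 1-3, the following minimal Python sketch computes the grey correlation degree of each candidate factor against a load sequence. It uses the standard form of the grey correlation coefficient with $\rho = 0.5$. The min-max normalization, the function names, and the sample numbers are assumptions made for this example only and are not taken from the hospital data.

```python
def normalize(seq):
    """Min-max scale a sequence to [0, 1]; a constant sequence maps to zeros."""
    lo, hi = min(seq), max(seq)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in seq]


def grey_relational_degrees(reference, comparisons, rho=0.5):
    """Return {name: r_i} for each comparison sequence against the reference."""
    y = normalize(reference)
    # Absolute differences |y(k) - x_i(k)| for every factor i and sample k.
    deltas = {
        name: [abs(a - b) for a, b in zip(y, normalize(seq))]
        for name, seq in comparisons.items()
    }
    all_deltas = [d for row in deltas.values() for d in row]
    d_min, d_max = min(all_deltas), max(all_deltas)
    if d_max == 0.0:
        # Every factor coincides with the reference after normalization.
        return {name: 1.0 for name in deltas}
    degrees = {}
    for name, row in deltas.items():
        # Grey correlation coefficient xi_i(k), then its mean over k.
        xi = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees[name] = sum(xi) / len(xi)
    return degrees


# Illustrative, made-up numbers (not data from the paper).
load = [120.0, 150.0, 180.0, 160.0, 130.0]
factors = {
    "personnel_density": [0.20, 0.35, 0.50, 0.40, 0.25],
    "wind_speed": [3.0, 1.0, 2.5, 0.5, 2.0],
}
degrees = grey_relational_degrees(load, factors)
```

In this toy example the personnel density follows the load exactly after normalization, so its degree is close to 1, while the unrelated wind speed series receives a clearly lower degree. Factors would then be ranked by degree and the highest-ranked ones kept as model inputs.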
As in large shopping malls, the grey correlation degree between the air-conditioning power load and the wind speed is small in every functional partition of the hospital. This is because the hospital building adopts a closed enclosure structure, which does not form natural ventilation with the outside. For the inpatient department and outpatient department, direct heat transfer with the outdoor temperature and solar radiation occurs because of their spatial location, so they are greatly affected by the external meteorological parameters. The area with high-power diagnosis and treatment equipment dissipates a large amount of heat, which directly affects the summer (cooling) power load, so it has a strong correlation with the air-conditioning cooling power load. Due to the differences in space layout and function, the influencing factors of the air-conditioning power loads vary greatly among the functional zones. For example, in both summer and winter, the power load in the instrument diagnosis and treatment department is significantly affected by the power of the diagnosis and treatment equipment, whereas the power loads in the outpatient and inpatient departments are greatly affected by the personnel density and lighting power density. In addition, for both the cooling load and the heating load, $X_{C(T-2)}$ is far less correlated with the current load than $X_{C(T-1)}$. Therefore, in order to reduce the coupling effect of the historical power load on the current power load forecasting, only the key influencing factors of each functional area and the regional historical power load data at $T-1$ are used to form the input data set in the following forecasting models.

LSTM Neural Network
According to the analysis of the power load data from the different departments of the hospital, the power load changes with an obvious periodic pattern, so it is necessary to fully consider the time series. Compared with the traditional neural network, the LSTM network model can effectively extract the implicit information in the time sequence [15]. In power load forecasting with the LSTM network, the historical power load information of the previous time can be fully used to process the sequence data, and the processed sequence data can be used to adjust the output of the power load at the next time.
The LSTM repeated module is composed of three gates with activation functions (i.e., forget gate $\sigma_1$, input gate $\sigma_2$, and output gate $\sigma_3$) and two tanh activation functions ($\phi_1$ and $\phi_2$) regarding the output, as shown in Fig. 3. The symbol (•) represents the concatenation operation, and the π and Σ symbols represent element-wise multiplication and addition, respectively. The fundamental component of LSTM is the cell state, a line that runs from the previous block memory to the current block memory and allows information to flow straight along it. In other words, the forget gate determines which part of the cell-state memory is discarded, based on the input at the current time and the memory and intermediate output of the cell state at the previous time. The data to be added to the cell state is adjusted by the sigmoid function, and the intermediate output is determined by the updated cell state and the output gate. The input variable $X$ can be represented by Eq. (3).
Table 1 Grey correlation degree between the influencing variables and loads for each functional area (outpatient department (OD), inpatient department (ID), and instrument diagnosis and treatment department (IDTD)) in January (Jan) and July (Ju)
The output of the forget gate, the input of the input gate, and the information of the output gate are given by Eqs. (4)-(6), respectively, where $f_T$ and $i_T$ are the information discarded and updated by the LSTM network; $C'_T$, $C_T$, $O_T$, $h_T$, $W$, and $b$ are the intermediate variable storing the current cell information, the new cell state, the information of the output gate at time $T$, the output value of the hidden layer node, the weight matrix of each gate, and the bias, respectively; and $h_T = O_T \tanh(C_T)$.
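For reference, the standard LSTM gate equations can be written with the symbols defined above as follows. This is the textbook formulation and is assumed, not confirmed, to correspond to Eqs. (4)-(6). The per-gate subscripts on $W$ and $b$ ($f$, $i$, $c$, $o$) and the symbol $\odot$ for element-wise multiplication are notational assumptions introduced here.

```latex
\begin{aligned}
f_T  &= \sigma_1\left(W_f\,[h_{T-1}, X_T] + b_f\right) \\
i_T  &= \sigma_2\left(W_i\,[h_{T-1}, X_T] + b_i\right) \\
C'_T &= \phi_1\left(W_c\,[h_{T-1}, X_T] + b_c\right) \\
C_T  &= f_T \odot C_{T-1} + i_T \odot C'_T \\
O_T  &= \sigma_3\left(W_o\,[h_{T-1}, X_T] + b_o\right) \\
h_T  &= O_T \odot \phi_2(C_T)
\end{aligned}
```

Here $[h_{T-1}, X_T]$ denotes the concatenation of the previous hidden state and the current input, and $\phi_1$ and $\phi_2$ are both tanh functions, so the last line agrees with $h_T = O_T \tanh(C_T)$.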

Attention Mechanism
The attention mechanism in deep learning simulates the way the human brain allocates attention resources. At a specific moment, attention is focused on the area that needs it, so as to obtain more details about that area while suppressing other useless information [16]. Based on this principle, the feature attention mechanism and the sequential attention mechanism are designed to analyze the influence of the real-time changes of the meteorological factors, the subjective behavior of the personnel, and the power of the diagnosis and treatment equipment on the hospital power load. The former is used to analyze the importance of the different input variables to the power load and to explore the correlation between the variables and the power load. The latter is used to analyze the importance of the power load at different historical times to the power load at the prediction time, and to select the data at critical times to improve the accuracy of the prediction model.

Feature attention mechanism
The air-conditioning power load is influenced by many factors. According to the grey relational degree analysis above, the key variables that affect the air-conditioning power load differ among the departments, and so do their relational degrees. In order to explore the relational degree of the key factors to the air-conditioning power load of the different departments of the hospital, and to further learn the information of the input features, the feature attention mechanism is introduced, as shown in Fig. 4. A multilayer perceptron is used to quantify the weight of each feature.
According to the algorithm of the feature attention mechanism, the hidden state $h_{T-1}$ of the LSTM network at the previous time and the input feature $X_T^N$ at the current time are taken as the inputs of the feature attention mechanism. The attention weight of each variable at the current time is calculated through Eq. (7) and then normalized using Eq. (8). Finally, the feature attention output $X'$ is obtained by multiplying the weight at the current moment by the corresponding feature variable, and the influence of each relevant feature is obtained through adaptive optimization.
In Eq. (7), $V_e \in \mathbb{R}^{N \times 1}$, $W_e \in \mathbb{R}^{N \times q}$, $U_e \in \mathbb{R}^{N \times N}$, and $b_e \in \mathbb{R}^{N \times N}$ are the neuron weights and the bias parameter to be learnt in the multilayer perceptron, and $q$ is the number of neurons in the last hidden layer of the LSTM network.
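The computation described above can be sketched for a single time step as follows. This is a minimal illustration, not the authors' implementation: the score is assumed to take the common additive form $V_e \tanh(W_e h_{T-1} + U_e X_T + b_e)$ applied per feature, the bias is treated as a length-$N$ vector, the normalization is assumed to be a softmax, and all names and numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def feature_attention(h_prev, x_t, W_e, U_e, b_e, V_e):
    """Return (weights, reweighted input) for one time step.

    h_prev: previous hidden state, length q
    x_t:    current input features, length N
    W_e:    N x q, U_e: N x N, b_e and V_e: length N (assumed shapes)
    """
    wh = matvec(W_e, h_prev)
    ux = matvec(U_e, x_t)
    # Unnormalized score of each feature from a one-layer perceptron.
    scores = [v * math.tanh(a + b + c) for v, a, b, c in zip(V_e, wh, ux, b_e)]
    alpha = softmax(scores)
    # Scale every feature by its attention weight to obtain X'.
    x_weighted = [a * x for a, x in zip(alpha, x_t)]
    return alpha, x_weighted


# Illustrative shapes only: N = 3 features, q = 2 hidden units.
h_prev = [0.1, -0.2]
x_t = [0.5, 0.8, 0.1]
W_e = [[0.0, 0.0] for _ in range(3)]
U_e = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b_e = [0.0, 0.0, 0.0]
V_e = [1.0, 1.0, 1.0]
alpha, x_weighted = feature_attention(h_prev, x_t, W_e, U_e, b_e, V_e)
```

With these placeholder parameters the weights simply follow the magnitude of each input feature. In a trained model, $W_e$, $U_e$, $b_e$, and $V_e$ would be learnt jointly with the LSTM weights.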

Sequential attention mechanism
The power load at the current moment is greatly affected by the historical power load, and the influence differs from one historical time to another. In order to clarify the impact of the information at each historical time on the current prediction results, the sequential attention mechanism is introduced at the output of the LSTM network. By means of a probability distribution, the importance of the LSTM information, and hence of the power load, at each historical time to the power load at the prediction time is analyzed, and the prediction accuracy is thus improved.
The structure of the sequential attention mechanism is shown in Fig. 5. $X_T$ and $h_T$ ($T \in [1, n]$) represent the input and the hidden-layer output of the LSTM model, $\alpha'_T$ ($T \in [1, n]$) is the probability distribution of the attention mechanism over the hidden-layer outputs of the LSTM, and $y$ is the output value of the LSTM network with the sequential attention mechanism. In the expressions for the attention weight matrix $\alpha'$ and the feature vector $v$, $e'_T$ is the normalized weight matrix, and $w_s$, $b_s$, and $u_s$ are the weight matrix, bias, and time series matrix of the randomly initialized attention mechanism, respectively.
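A minimal sketch of this mechanism is given below. It assumes the widely used additive form, in which each hidden state is scored as $u_s \cdot \tanh(w_s h_T + b_s)$, the scores are turned into the distribution $\alpha'$ by a softmax over time, and $v$ is the weighted sum of the hidden states. The exact expressions used in the paper may differ, and the shapes and sample values are hypothetical.

```python
import math


def sequential_attention(hidden_states, w_s, b_s, u_s):
    """Weight LSTM hidden states over time and return (alpha, context).

    hidden_states: n vectors of length q (one per time step)
    w_s: q x q matrix, b_s and u_s: vectors of length q (assumed shapes)
    """
    scores = []
    for h in hidden_states:
        projected = [
            math.tanh(sum(w * x for w, x in zip(row, h)) + b)
            for row, b in zip(w_s, b_s)
        ]
        # Scalar relevance score of this time step.
        scores.append(sum(u * p for u, p in zip(u_s, projected)))
    # Softmax over time steps gives the probability distribution alpha'.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    # Context vector v: attention-weighted sum of the hidden states.
    q = len(hidden_states[0])
    context = [sum(a * h[j] for a, h in zip(alpha, hidden_states)) for j in range(q)]
    return alpha, context


# Illustrative values: n = 3 time steps, q = 2 hidden units.
states = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
w_s = [[1.0, 0.0], [0.0, 1.0]]
b_s = [0.0, 0.0]
u_s = [1.0, 0.0]
alpha_seq, context = sequential_attention(states, w_s, b_s, u_s)
```

The context vector would then be passed to the fully connected output layer, so that time steps with larger weights contribute more to the predicted load.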

LSTM Model for Power Load Forecasting Based on Double Attention Mechanism
The LSTM prediction model with the double attention mechanism is shown in Fig. 6. This model is mainly composed of the input vector, feature attention layer, LSTM network layer, sequential attention mechanism layer, and fully connected output layer. The input vector is combined with the hidden state $h_{T-1}$ of the LSTM network at the previous time, and the weight of each feature variable at the current time is calculated through the feature attention mechanism layer. Then, the feature attention input $X'$ is calculated by multiplying each weight by the corresponding feature variable. After the LSTM network learns from $X'$, the sequential attention mechanism layer calculates the influence weight of the power load information output at each historical time, and the improved hidden-layer state at the current time is obtained. Finally, this state is fed to the fully connected layer, and the final prediction result is obtained.

Evaluation Index
Three evaluation indexes (coefficient of determination R², mean absolute percentage error (MAPE), and root mean square error (RMSE)) were used to evaluate the accuracy of the prediction model. The fitting accuracy of the power load forecasting model in a certain period of time is quantified by the R² value: the greater the value of R², the more accurate the model. MAPE quantifies the average relative deviation of the predicted values from the real values, and RMSE evaluates the error between the predicted value and the real value. In these indexes, $n$, $j$, $C_{aj}$, $C_{pj}$, and $C_{amean}$ are the total number of power load samples, the time index of the data, the actual power load, the predicted power load, and the average value of the actual power load, respectively.
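The three indexes can be computed as in the following sketch, which uses their usual definitions in terms of the actual loads $C_{aj}$, the predicted loads $C_{pj}$, and the mean actual load $C_{amean}$. The function names and the sample numbers are illustrative assumptions.

```python
import math


def r_squared(actual, predicted):
    """R^2 = 1 - sum((C_aj - C_pj)^2) / sum((C_aj - C_amean)^2)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual, predicted):
    """MAPE in percent; assumes no actual value is zero."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


def rmse(actual, predicted):
    """Root mean square error, in the units of the load (e.g., kW)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up loads in kW, for illustration only.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
scores = (r_squared(actual, predicted), mape(actual, predicted), rmse(actual, predicted))
```

Note that MAPE weights an error of 10 kW more heavily at low loads than at high loads, whereas RMSE treats it the same everywhere, which is one reason to report both.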

Data Set
In order to further validate the above model, the actual data from Shanxi Eye Hospital during the periods 2021-05-01 to 2021-07-31 and 2021-11-01 to 2022-01-28 were used to train and test the prediction model. For the cooling load, the data from 2021-05-01 to 2021-06-30 are used as the training set, those from 2021-07-01 to 2021-07-15 as the validation set, and those from 2021-07-15 to 2021-07-31 as the test set. For the heating load, the data from 2021-11-01 to 2021-12-31 are used as the training set, those from 2022-01-01 to 2022-01-14 as the validation set, and those from 2022-01-15 to 2022-01-28 as the test set. The original data set, covering 8:00-22:00 each day, includes the power load (kW), dry bulb temperature (℃), relative humidity (%), solar radiation (kW/m²), personnel density (N/m²), lighting power density (W/m²), and equipment power density (W/m²).
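A chronological split of this kind can be sketched as follows, using the boundaries stated for the winter period (2021-11-01 to 2022-01-28). The record layout of one (day, value) pair per day and the function name are assumptions made for illustration; the real data are hourly.

```python
from datetime import date, timedelta


def split_by_date(records, train_end, val_end):
    """Split (day, value) records chronologically into train/validation/test.

    Days up to and including train_end go to training, later days up to and
    including val_end go to validation, and the remaining days go to test.
    """
    train, val, test = [], [], []
    for day, value in records:
        if day <= train_end:
            train.append((day, value))
        elif day <= val_end:
            val.append((day, value))
        else:
            test.append((day, value))
    return train, val, test


# Placeholder daily records for the winter period; values are dummies.
start = date(2021, 11, 1)
n_days = (date(2022, 1, 28) - start).days + 1
records = [(start + timedelta(days=i), 0.0) for i in range(n_days)]
train, val, test = split_by_date(records, date(2021, 12, 31), date(2022, 1, 14))
```

Splitting by date rather than at random keeps every validation and test sample later than the training data, which avoids leaking future information into a time series model.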

Experimental Setup
Considering the differences in the power load between weekdays and weekends and between summer and winter, the hourly personnel density in the different weeks and seasons is fitted and used as an input of the power load prediction, so as to account for the impact of the time of day and the day of the week. Keras is used for the data processing. The input time step is set to 6 h, and the data of the previous 6 h are used to predict the power load at the next time by sliding the time window. The LSTM network parameters are adjusted by grid search and cross validation. According to Refs. [11,17-19], the parameters of the final LSTM prediction model are determined through multiple adjustments of the relevant parameters (i.e., number of LSTM network layers: 2; number of LSTM neurons in the first and second layers: 128 and 32; initial learning rate: 0.001; activation function: ReLU; dropout: 0.2; gradient descent method; training epochs: 200).
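The sliding-window construction with a 6 h input step can be sketched as below. The row layout (power load in the first column) and the function name are assumptions for illustration. The sketch also ignores the fact that the recorded data cover only 8:00-22:00, so in practice windows spanning two different days may need to be dropped.

```python
def make_windows(rows, target_index=0, steps=6):
    """Build (inputs, targets) pairs with a sliding window.

    rows: chronologically ordered feature vectors, one per hour
    target_index: position of the power load within each row (assumed 0)
    steps: number of past hours fed to the model
    """
    inputs, targets = [], []
    for end in range(steps, len(rows)):
        inputs.append(rows[end - steps:end])      # previous `steps` hours
        targets.append(rows[end][target_index])   # load at the next hour
    return inputs, targets


# Ten hourly rows of [load, temperature]; values are placeholders.
rows = [[float(h), 20.0 + h] for h in range(10)]
inputs, targets = make_windows(rows)
```

Each element of `inputs` has shape (6, number of features), which matches the (time steps, features) layout that recurrent layers in libraries such as Keras typically expect once the samples are stacked into an array.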

Influence of variable selection on experimental results
The key influencing factors of the power load differ among the functional partitions. In order to explore the influence of the input variables on the prediction model, two versions of the data set are compared: one containing only the feature variables selected by the grey relational degree analysis, and one containing all original variables without selection. Both are used for prediction with the attention-LSTM model. The prediction accuracy is shown in Table 2, and the results are shown in Fig. 7.
It can be seen from Fig. 7 that the predicted curve fits the real curve best after the characteristic variables are selected. From Table 2, after the input variables are selected, MAPE decreases to less than 5.5% in most cases, R² exceeds 0.95, and RMSE also decreases significantly. For the same data set, the error with variable selection is lower than that without variable selection. Therefore, selecting the input characteristic variables can eliminate redundant and invalid variables, enhance the generalization ability of the model, and effectively improve the prediction speed and prediction results of the model.

Influence of prediction model on results
In order to verify the effectiveness and prediction performance of the LSTM model based on the double attention mechanism, the differences of the power load demand between working days and weekends and between summer and winter are considered. Tuesday, July 24, 2021, is selected as the typical summer working day, and Sunday, July 29, 2021, as the typical summer weekend day. January 22, 2022 (Tuesday) is selected as the typical winter working day, and January 27, 2022 (Sunday) as the typical winter weekend day. From Figs. 8-13, the LSTM model based on the double attention mechanism gives the best fit between the prediction curve and the real value in the power load prediction of each functional area on working days. It can be seen from Table 3 that the MAPE values of the functional areas are 3.45%, 3.28%, 3.68%, 3.25%, 3.60%, and 3.19%, respectively, lower than those of the other three models. In most cases, the values of R² are greater than 0.98, higher than those of the other models, and the values of RMSE are smaller than those of the other models. These results indicate that the LSTM model based on the double attention mechanism is more accurate in predicting the power load in the hospital.
For the prediction of the power load in each functional area on weekends, the LSTM model based on the double attention mechanism also gives the best fit between the prediction curve and the real value. From Table 4, it can be seen that the MAPE values are 3.36%, 3.52%, 2.73%, 3.61%, 3.53%, and 2.62%, respectively, lower than those of the other three models. In most cases, the values of R² are greater than 0.98, higher than those of the other models, and the values of RMSE are smaller than those of the other models. Therefore, the LSTM model based on the double attention mechanism is also more accurate in predicting the weekend power load in the hospital.
In conclusion, based on the above results of the power load forecasting on weekdays and weekends, the LSTM model based on the double attention mechanism has better forecasting performance, higher forecasting accuracy, and better stability than the traditional LSTM, CNN-LSTM, and attention-LSTM models. This is due to the addition of the two-layer attention mechanism to the LSTM network model, which not only analyzes the importance of the relevant features and optimizes the multivariable input through the feature attention mechanism, but also optimizes the output of the model through the sequential attention mechanism combined with the historical power load information.

Conclusion
In order to further improve the energy-saving potential and accurately control the distribution of the power load to each department of the hospital, the characteristics and key influencing factors of the loads in the outpatient, inpatient, and instrument diagnosis and treatment departments were analyzed, and a power load prediction method using an LSTM model based on the double attention mechanism was proposed in this paper. The experimental results show that the LSTM model based on the double attention mechanism has high forecasting accuracy and stability. The conclusions of this article are as follows:
1) Grey correlation analysis shows that the key factors affecting the power load differ among the functional areas of the hospital. Determining the key factors can improve the accuracy of the power load prediction and the generalization ability of the prediction model.
2) The LSTM model based on the double attention mechanism has better forecasting performance, higher forecasting accuracy, and better stability than the traditional LSTM, CNN-LSTM, and attention-LSTM models. This is due to the addition of the two-layer attention mechanism to the LSTM network model, which not only analyzes the importance of the relevant features and optimizes the multivariable input through the feature attention mechanism, but also optimizes the output of the model through the sequential attention mechanism combined with the historical power load information.