Prediction model for forebay water level of pumping stations with different time scales based on BP neural networks
XUE Ping1,ZHANG Zhao2,LEI Xiaohui2,LU Longbin1,YAN Peiru3,LI Yueqiang4
(1.School of Water Conservancy and Environment,University of Jinan,Jinan 250022,China;2.Institute of Water Resources,China Institute of Water Resources and Hydropower Research,Beijing 100038,China;3.School of Civil Engineering,Tianjin University,Tianjin 300072,China;4.College of Water Conservancy and Hydropower Engineering,Hohai University,Nanjing 210098,China)
Abstract: Because water levels are difficult to predict under the control of hydraulic structures, a water level prediction model for the forebay of a pumping station was built on back-propagation (BP) neural networks, and the influence of the time series and impact factors on prediction accuracy was analyzed at different time scales. The model was applied to the Dongsong Pumping Station of the Jiaodong Water Transfer Project. The results show that: with the total amount of data fixed, a 7∶3 ratio of the training period to the prediction period gives good predictions; the larger the amount of data, the more positively correlated impact factors are required to reach a given prediction accuracy; over short periods, prediction is better when the prediction time interval equals the time interval of the data. The model can meet the demand for dynamic prediction of the forebay water level in open channel water transfer projects, giving accurate predictions 2 h ahead and reasonably accurate predictions 4 h ahead, and it can be applied to other similar open channel water transfer projects.
Keywords:forebay of pump station;water level prediction;BP neural network;time series;proportion
Fund:National Natural Science Foundation of China(51779268)
Author's brief: XUE Ping (1998-), female, Weifang, Shandong Province, mainly engaged in research on hydrology and water resources. E-mail: 2857487127@qq.com
Corresponding author: LEI Xiaohui (1974-), male, Weinan, Shaanxi Province, Ph.D., professor-level senior engineer, mainly engaged in research on hydrology and water resources, reservoir dispatching, and hydraulic control. E-mail: lxh@iwhr.com
For long-distance water dispatching in an open channel water transfer project, hydraulic structures such as pumping stations, control gates, and inverted siphons are generally built along the channel to overcome the constraints that terrain imposes on water transfer. Monitoring equipment such as water level meters and flow meters is installed in front of these structures to obtain water information and monitor water safety. Compared with real-time water level monitoring, high-precision water level prediction gives dispatchers more scientific guidance during water dispatching. This holds especially for the water level in the forebay of pumping stations, which is of great significance to pumping station regulation, water dispatching, and channel safety. Affected by factors such as climate, temperature, and human activities, the hydrological sequences collected by monitoring equipment are often nonlinear and uncertain, and it is difficult to analyze their laws and predict their trends with conventional methods. Scholars[1-3] have built hydraulic models to simulate the changing process of channel flow, but such modeling requires complete and accurate topographic data, engineering parameters, and measured data; moreover, calibrating the roughness coefficient is repetitive and cumbersome[4], so the approach has considerable limitations. With the continuous progress of artificial intelligence and machine learning, data-driven prediction methods can avoid many of the requirements and limitations of hydraulic modeling and directly explore the inherent laws in the data[5].
Up to now, most scholars[6-9] have built neural network models for water level prediction, such as the optimized RBF neural network, the LSTM neural network model, and the wavelet neural network applied to groundwater level prediction, and have reported high prediction accuracy. Although the relevance vector machine (RVM) prediction model[10], Mike model[11], similarity model[12], statistical model[13], and Bayesian model[14] can also be constructed for water level prediction, their applications are limited to a certain extent, and hence they are not widely used in water level prediction for water transfer projects. As neural networks have become common in water level prediction and their development has gradually matured, many scholars[15-26] have made water level predictions by combining neural network models with other algorithms or by improving the algorithms. For instance, Wu et al.[27] combined KNN, GA, and BP to predict the flood level of the Qinhuai River; compared with the neural network model without combination, the combined method has higher prediction accuracy but is slightly more complicated. Uncombined neural network models, by comparison, are simple and practical. For example, Gao et al.[28] used the BP neural network to predict the water level in front of a pumping station and found that it has great advantages in solving nonlinear problems and significant potential in intelligent prediction. The commonly used evaluation indicators include the root mean square error (ERMS) and the determination coefficient (R2)[29].
In summary, constructing a neural network is a feasible approach to water level prediction. However, intelligent algorithms such as artificial neural networks have their own applicability conditions in hydrological prediction. An ANN has strong nonlinear ability, but because of its simple structure it cannot store previous information or learn from time series data. An RNN can retain the water level prediction of the previous moment and can process sequence data effectively, but it suffers from defects in gradient transfer. LSTM has long and short-term memory and can mitigate gradient vanishing and gradient explosion to a certain extent, but problems remain for long sequences, and it cannot be parallelized. Restricted by the one-way flow of information, the classical BP neural network considers a limited amount of historical information and is suitable only for short-term prediction; however, it has a stable structure, is versatile and simple, and has a strong nonlinear mapping ability that lets it handle nonlinear problems flexibly with high prediction accuracy. Because the hydrological sequence in hydrological forecasting is strongly affected by human factors and is markedly nonlinear, the BP neural network is well suited to hydrological forecasting. Since Rumelhart et al.[30] proposed it in 1986, it has been widely used in hydrological prediction research. In this paper, a BP neural network was established. We used historical data to predict the water level in the forebay of the pumping station and analyzed the influence of the time series ratio and impact factors on the prediction. The results can provide a new method for water level prediction and reference data for the changing trend of the water level in the forebay of the pumping station.
The water level in the forebay of the pumping station is selected as the research object. The impact factors are determined by correlation analysis and used as the input of the BP neural network model, and the prediction results are then judged by the values of the evaluation indicators.
Under the influence of various hydraulic factors (section area, hydraulic gradient, roughness, etc.), the section flow and the water level in the channel are related to each other. As one of the monitoring sections, the forebay of the pumping station may be hydraulically connected with the water level of the adjacent section, the flow of the pumping station, the upstream flow, and the flow difference. Taking these water levels and flows as variables, we conduct a correlation analysis between each variable and the predictor and identify the impact factors that show a certain degree of correlation.
The impact factor identification methods adopted include Pearson's correlation coefficient, Kendall's correlation coefficient, Spearman's rank correlation coefficient, and grey relational analysis (GRA). Pearson's correlation coefficient measures the degree of linear correlation between two variables and is defined as their covariance divided by the product of their standard deviations. Kendall's correlation coefficient represents the degree of correlation between rank variables. If n statistical objects are sorted by one attribute, the other attribute is usually out of order; Kendall's coefficient is the difference between the number of same-order (concordant) pairs and the number of out-of-order (discordant) pairs, divided by the total number of pairs, n(n-1)/2. Spearman's rank correlation coefficient studies the correlation between two variables from their rank data: it is calculated from the rank differences of the paired observations and assesses how well a monotonic function describes the relationship between the two variables. The coefficients of these three methods range from -1 to 1: the closer the absolute value is to 1, the higher the correlation; a value of zero indicates no correlation. GRA is a quantitative method for analyzing the degree of correlation between the factors in a system, which it measures by the similarity or dissimilarity of the development trends of the variable sequences. When the grey relational degree is less than 0.6, the factors are considered uncorrelated; the closer it is to 1, the higher the correlation.
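The four measures described above can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, Kendall's coefficient is computed in its tau-a form (no tie correction), and the grey relational degree uses min-max normalization with a resolution coefficient of 0.5 and a single comparison sequence, all of which are assumptions the source does not state.

```python
import math

def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # undefined (division by zero) for a constant series

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def spearman(x, y):
    """Pearson's coefficient applied to the ranks of the two series."""
    return pearson(ranks(x), ranks(y))

def kendall(x, y):
    """Tau-a: (concordant - discordant) / (n(n-1)/2); ties count as zero."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            score += (product > 0) - (product < 0)
    return score / (n * (n - 1) / 2)

def grey_relational_degree(reference, comparison, rho=0.5):
    """Grey relational degree of one comparison sequence (assumed settings)."""
    def normalize(seq):
        lo, hi = min(seq), max(seq)
        return [(v - lo) / (hi - lo) for v in seq]
    ref, cmp_ = normalize(reference), normalize(comparison)
    delta = [abs(a - b) for a, b in zip(ref, cmp_)]
    d_min, d_max = min(delta), max(delta)
    if d_max == 0:  # identical normalized sequences
        return 1.0
    coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in delta]
    return sum(coeffs) / len(coeffs)
```

In use, each candidate series (adjacent water level, station flow, flow difference, and so on) would be passed together with the forebay water level series, and the resulting values compared across factors.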
A BP neural network is a multilayer feedforward neural network trained by an error back-propagation algorithm, generally including the input layer, hidden layer, and output layer. The input layer receives the signals and passes the information to the hidden layer; the number of neurons in the input layer is the number n of input impact factors. The hidden layer is responsible for information processing and transformation; its number of neurons is m, which is less than N-1 (N is the number of training samples) and whose value is determined by testing in MATLAB. The information is then transmitted from the hidden layer to the output layer, which outputs the results. The typical structure of a three-layer network is shown in Fig.1.
Fig.1 BP neural network model structure
The neural network training parameters are set as follows: maximum number of training iterations = 100; required training accuracy = 1×10⁻⁸; learning rate = 0.01. With these settings, the network automatically adjusts its weights and thresholds by back-propagating the errors, which drives the function represented by the BP neural network toward the optimal solution. Finally, it outputs the prediction results and the values of the evaluation indicators.
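As a rough illustration of the training loop described above, the following sketch trains a three-layer network with the stated settings (at most 100 passes, an error goal of 1×10⁻⁸, a learning rate of 0.01). It is not the authors' MATLAB model: the tanh hidden activation, the linear output, per-sample gradient descent, the weight initialization range, and the names `train_bp`, `n_hidden`, and `seed` are all assumptions made for the example. With plain gradient descent, 100 passes at this learning rate will usually stop at the iteration limit rather than reach the error goal.

```python
import math
import random

def train_bp(samples, targets, n_hidden=4, lr=0.01,
             max_epochs=100, goal=1e-8, seed=0):
    """Train a 1-hidden-layer network; return (predict, mse_history)."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    # Weights and thresholds (biases) of the hidden and output layers.
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0

    def forward(x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(w1, b1)]
        return hidden, sum(w * h for w, h in zip(w2, hidden)) + b2

    history = []
    for _ in range(max_epochs):
        squared_error = 0.0
        for x, t in zip(samples, targets):
            hidden, y = forward(x)
            err = y - t
            squared_error += err * err
            # Back-propagate the output error to both layers.
            for j in range(n_hidden):
                grad_hidden = err * w2[j] * (1.0 - hidden[j] ** 2)
                w2[j] -= lr * err * hidden[j]
                for i in range(n_in):
                    w1[j][i] -= lr * grad_hidden * x[i]
                b1[j] -= lr * grad_hidden
            b2 -= lr * err
        mse = squared_error / len(samples)
        history.append(mse)
        if mse <= goal:  # required training accuracy reached
            break

    def predict(x):
        return forward(x)[1]

    return predict, history
```

Each row of `samples` would hold the n impact factors at the current time, and the matching entry of `targets` the forebay water level at the prediction horizon; inputs are best scaled to a comparable range before training.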
R2, ERMS, and the mean absolute error (EMA) are used as the evaluation criteria to judge the quality of the prediction results. The closer R2 is to 1 and the closer ERMS and EMA are to zero, the higher the prediction accuracy.
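The three indicators can be computed as in the sketch below. The source does not give their formulas, so the usual definitions are assumed here: R2 as one minus the ratio of the residual sum of squares to the total sum of squares, ERMS as the square root of the mean squared error, and EMA as the mean of the absolute errors. The function names are hypothetical.

```python
import math

def r_squared(observed, predicted):
    """Determination coefficient: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def rmse(observed, predicted):
    """Root mean square error (ERMS)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def mae(observed, predicted):
    """Mean absolute error (EMA)."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n
```

A perfect prediction gives R2 = 1 and ERMS = EMA = 0, while a prediction that always returns the mean of the observations gives R2 = 0.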
The Jiaodong Water Transfer Project is an important part of water conservancy construction in Shandong Province and includes two water transmission lines: the Yellow River Water Transfer Project and the Water Transfer Project from the Yellow River to Qingdao. Construction of the latter started on April 15, 1986, and it was officially put into operation on November 25, 1989. Construction of the Yellow River Water Transfer Project started on December 19, 2003; the whole line was completed in July 2013, and the main project was put into operation in December 2013. The Yellow River Water Transfer Project includes two parts: the open channel section and the pipeline section. The open channel section starts at the Songzhuang Transfer Gate and terminates at the Huangshuihe Pumping Station, passing through three pumping stations (Huibu, Dongsong, and Xinzhuang) and several inverted siphons, aqueducts, and other water transfer structures, with a total length of about 160 km. The study area selected in this paper is the open channel section of the Yellow River Water Transfer Project. Specifically, the study reach lies upstream and downstream of the Dongsong Pumping Station, with the Huibu Pumping Station as the upstream control node and the Bushang Control Gate as the downstream control node. The canal section and the distribution of the structures along it are shown in Fig.2.
Fig.2 Study canal section and distribution of structures along the line
The relationship between water level and flow rate and the influence of human factors were considered when predicting the future water level in the forebay of the Dongsong Pumping Station. In addition to the water level of the adjacent section, the flow of the Dongsong Pumping Station, the flow of the Huibu Pumping Station, and the flow difference between the two pumping stations were selected as impact factors for prediction. All impact factors take their values at the current time. Tab.1 shows the results of the correlation analysis between each factor and the forebay water level under the different methods.
Tab.1 Correlation analysis of impact factors
It can be seen from Tab.1 that the impact factors, in order of correlation from high to low, are the water level in front of the Dongsong Pumping Station, the flow difference of the two pumping stations, downstream water level of the Haizheng River inverted siphon, upstream water level of the Haizheng River inverted siphon, the flow of the pumping station, flow of the Dongsong Pumping Station, and upstream flow of the pumping station. The coefficients of the first four impact factors all lie between 0.8 and 0.9, so these factors are identified as highly correlated and are given priority in modeling. For the last three impact factors, only GRA indicates a high degree of correlation, so they are identified as weakly correlated; they can be included in modeling but are given lower priority.
The BP neural network model was used to predict the water level in the forebay of pumping stations,and the prediction results were analyzed from the aspects of time series and impact factors.
Data at different time scales were trained and verified at different ratios of the training period to the prediction period, and the training duration and prediction accuracy were compared. The results indicate that the optimal ratio of the training period to the prediction period is 7∶3. Reducing the ratio lowers the prediction accuracy, while increasing it hardly changes the prediction accuracy but significantly raises the required data volume.
We used 3 600 records to predict the water level change in the next 2 h; at the ratio of 7∶3, R2, ERMS, and EMA were maintained at about 0.95, 0.04, and 0.03, respectively. When the ratio was increased, each indicator improved slightly, but the difference was small; above 5∶1, the prediction accuracy essentially stopped improving. The specific comparison is shown in Fig.3 and Fig.4.
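The split of a series into a training period and a prediction period at a given ratio can be sketched as below. The function name and the choice to round the cut point down are assumptions; the essential point is that the split is chronological, so the prediction period always follows the training period.

```python
def split_by_ratio(series, train_part=7, predict_part=3):
    """Split a time-ordered series into training and prediction periods."""
    cut = len(series) * train_part // (train_part + predict_part)
    return series[:cut], series[cut:]

# 3 600 time-ordered records (placeholder values, not measured data).
records = list(range(3600))

train_73, predict_73 = split_by_ratio(records, 7, 3)
train_51, predict_51 = split_by_ratio(records, 5, 1)
```

For 3 600 records, a 7∶3 ratio yields 2 520 training records and 1 080 prediction records, while 5∶1 yields 3 000 and 600.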
Fig.3 The result of water level forecast in the next 2 h(7∶3)
Fig.4 The result of water level forecast in the next 2 h(5∶1)
Three groups of three-month data were trained and validated at a ratio of 7∶3. R2 was maintained at 0.93-0.98, ERMS at 0.02-0.05, and EMA at 0.02-0.04. The prediction results are shown in Fig.5.
Fig.5 The result of water level forecast in the next 2 h(7∶3)
One month of data was also trained and verified at a ratio of 7∶3, and the results indicated that the ratio still applies to this amount of data, as shown in Fig.6.
Fig.6 The result of water level forecast in the next 2 h(7∶3)
It can be seen from the above that the optimal ratio is suitable for data of different time scales. Determining the optimal ratio not only shortens the learning time of the neural network but also improves the prediction accuracy, so it has a considerable influence on model performance.
The number of impact factors.When there is a high correlation between impact factors and predictors,a higher number of impact factors leads to more accurate prediction results.However,the increase in the number of impact factors can elevate the data demand during the training period.Therefore,to reduce the data demand and ensure prediction accuracy,we employed different numbers of impact factors for training and verification.The verification results revealed that at least three to five impact factors should be selected for training in the short term(one to three months),and at least five to seven impact factors with the greatest correlation were required for the data volume of three months to a year.
Types of impact factors. Studies have shown that higher prediction accuracy can be achieved when the most relevant impact factors are selected for modeling. According to the correlation analysis, the three impact factors with the highest correlation are the water level of the pumping station at the current moment, the water level of the upstream adjacent nodes, and the flow difference. When three impact factors were used to train on and predict one month of data, these three gave the best prediction, as shown in Fig.7.
Fig.7 Water level prediction result with three impact factors(7∶3)
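The selection rule discussed above (keep the factors with the highest correlation, and use more factors for longer records) can be sketched as follows. The function names are hypothetical, the scores in the example are placeholder numbers rather than values from Tab.1, and treating exactly three months as "short term" is an assumption, since the text gives the ranges as one to three months and three months to a year.

```python
def select_factors(scores, k):
    """Return the k factor names with the largest absolute correlation."""
    ranked = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return ranked[:k]

def recommended_factor_range(months):
    """Minimum factor counts reported in the text, by length of record."""
    if months <= 3:       # short term: one to three months (boundary assumed)
        return (3, 5)
    if months <= 12:      # three months to one year
        return (5, 7)
    raise ValueError("record lengths over one year are not covered by the text")

# Placeholder scores for illustration only (NOT the values in Tab.1).
example_scores = {
    "forebay_level": 0.90,
    "flow_difference": 0.85,
    "upstream_level": 0.80,
    "station_flow": 0.30,
}
low, high = recommended_factor_range(1)
chosen = select_factors(example_scores, low)
```

Here `chosen` holds the three highest-scoring names, matching the three-factor, one-month case described in the text.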
The time interval of the impact factors. With a data interval of 2 h, the future water level of the Dongsong Pumping Station was predicted at three horizons. The predictions for the next 2 h were relatively stable, with R2 above 0.9 and small ERMS and EMA. The predictions for the next 4 h were moderate, with R2 of 0.8-0.9 and ERMS and EMA slightly larger than for 2 h. The predictions for the next 6 h were poor: R2 was unstable and varied over a wide range, reaching only about 0.7 even in the better cases, while ERMS and EMA were large, at about 0.11 and 0.09, respectively. In other words, when the training data remain unchanged, the longer the prediction horizon, the lower the prediction accuracy. The three-month data were then screened to change the interval from 2 h to 4 h, and the water level of the Dongsong Pumping Station in the next 4 h was predicted. The prediction results are shown in Fig.8.
Fig.8 The result of water level change in the next 4 h(7∶3)
The ten-month data were screened at an interval of 4 h,and the water level in the next 4 h was predicted.The prediction results are shown in Fig.9.
Fig.9 The result of water level change in the next 4 h(7∶3)
The results demonstrate that, compared with direct prediction from data at an interval of 2 h, prediction from data at an interval of 4 h is more accurate for the water level of the pumping station in the next 4 h, with R2, ERMS, and EMA in the ranges of 0.82-0.93, 0.05-0.06, and 0.04-0.05, respectively.
The one-year data were screened to convert the interval from 2 h to 6 h, and the water level of the Dongsong Pumping Station in the next 6 h was predicted. In this case, prediction after screening was worse than direct prediction from the data at an interval of 2 h. The analysis suggests that the 6 h interval is too long to fully reflect the changing laws of each factor.
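The two strategies compared above differ only in how the input and target pairs are built, which the following sketch makes explicit. It assumes a uniformly sampled 2 h series with no gaps; the function names and the single-variable pairs are simplifications for illustration, since the actual model takes several impact factors as input.

```python
def screen_interval(records, step):
    """Keep every step-th record, e.g. step=2 turns 2 h data into 4 h data."""
    return records[::step]

def make_pairs(levels, horizon_steps):
    """Pair each value with the value horizon_steps records later."""
    return [(levels[i], levels[i + horizon_steps])
            for i in range(len(levels) - horizon_steps)]

# Placeholder 2 h series: the value is the record index, not a real level.
levels_2h = list(range(10))

# Direct prediction: keep the 2 h data and look two records (4 h) ahead.
direct_pairs = make_pairs(levels_2h, 2)

# Screened prediction: thin to 4 h data and look one record (4 h) ahead.
screened_pairs = make_pairs(screen_interval(levels_2h, 2), 1)
```

Both variants predict 4 h ahead, but screening halves the number of training pairs, and a step of 3 (6 h data) leaves only a third of them, which is consistent with the observation that the 6 h interval is too coarse to reflect how the factors change.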
The influence of the time series ratio on the water level prediction results:The optimal ratio of the training period to the prediction period is 7∶3,and the increase in the ratio cannot significantly change the prediction accuracy,while the decrease in the ratio can lead to a worse prediction effect.
The effect of impact factors on the prediction results:The amount of data corresponds to the number of impact factors.The data volume of three months requires three to five impact factors for training,and the data volume of three months to a year requires five to seven impact factors to ensure the same prediction effect.
The influence of the data interval on the prediction results: In general, when the data interval remains unchanged, the prediction accuracy gradually decreases as the prediction time increases. However, provided the data can still reflect the changing laws of each factor, the prediction is better when the data interval equals the prediction time.