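XUE Ping, ZHANG Zhao, LEI Xiaohui, LU Longbin, YAN Peiru, LI Yueqiang
(1. School of Water Conservancy and Environment, University of Jinan, Jinan 250022; 2. Institute of Water Resources, China Institute of Water Resources and Hydropower Research, Beijing 100038; 3. School of Civil Engineering, Tianjin University, Tianjin 300072; 4. College of Water Conservancy and Hydropower Engineering, Hohai University, Nanjing 210098)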
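In long-distance water conveyance by open-channel water transfer projects, hydraulic structures such as pumping stations, control gates, and inverted siphons are usually built along the channel to overcome the constraints that terrain places on conveyance, and monitoring equipment such as water level gauges and flow meters is installed in front of these structures to collect hydrological information and monitor conveyance safety. Compared with real-time water level monitoring, high-accuracy water level prediction gives dispatchers better scientific guidance during water dispatching; prediction of the forebay water level of pumping stations in particular is important for pump regulation, water dispatching, and channel safety. Because of climate, temperature, human activity, and other factors, the hydrological series collected by monitoring equipment tend to be nonlinear and uncertain, which makes it difficult for conventional methods to analyze their patterns and predict their trends. Scholars[1-3] have built hydraulic models to simulate changes in channel flow, but such modeling requires complete and accurate topographic data, engineering parameters, and measured data, and calibrating the roughness coefficient is repetitive and tedious[4], so the approach has considerable limitations. With continuing advances in artificial intelligence and machine learning, data-driven prediction avoids the many requirements and restrictions of hydraulic modeling and explores the inherent patterns in the data directly[5].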
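To date, most scholars[6-9] have predicted water levels by building neural network models, for example applying optimized RBF neural networks, LSTM neural networks, and wavelet neural networks to groundwater level prediction, with high accuracy and good results. Relevance vector machine (RVM) models[10], Mike models[11], similarity models[12], statistical models[13], and Bayesian models[14] can also be built for water level prediction, but their use is subject to certain conditions, so they are not widely applied to water level prediction in water transfer projects. Because neural networks are widely used for water level prediction and are maturing, many scholars[15-26] have predicted water levels by combining neural network models with other algorithms or by improving the algorithms. Wu Meiling et al.[27], for example, combined KNN, GA, and BP to predict the flood level of the Qinhuai River; the combined model was more accurate than the uncombined neural network model but somewhat more complex. Uncombined neural network models remain simple and practical: Gao Xueping et al.[28] used a BP neural network to predict the water level in front of a pumping station and found that BP neural networks have great advantages in solving nonlinear problems and great potential for intelligent prediction. Commonly used evaluation indicators include the root mean square error (E_RMS) and the coefficient of determination (R²)[29].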
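In summary, building a neural network is a practical approach to water level prediction. Intelligent algorithms such as artificial neural networks each have their own conditions of applicability in hydrological prediction. An ANN has strong nonlinear capability, but its simple structure cannot retain information from earlier time steps, so it cannot learn time series data. An RNN retains the water level prediction from the previous time step and handles sequence data effectively, but its gradient propagation is flawed. An LSTM has long short-term memory and alleviates vanishing and exploding gradients to some extent, but it still has problems with long sequences and cannot be parallelized. Because information flows in one direction only, the classical BP neural network considers a limited amount of historical information and is suitable only for short-term prediction; however, it is structurally stable, versatile, and simple, handles nonlinear problems flexibly with high prediction accuracy, and has very strong nonlinear mapping ability. Hydrological series are strongly affected by human factors and are markedly nonlinear, so the BP neural network is suitable for hydrological prediction. Since Rumelhart et al.[30] proposed it in 1986, the BP neural network has been widely used in hydrological prediction research. In this paper, a BP neural network is built to predict the future forebay water level of a pumping station from historical data, and the effects of the time series ratio and the impact factors on water level prediction are analyzed. The results offer a prediction method for the forebay water level of pumping stations and reference data on its trend.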
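The forebay water level of the pumping station is taken as the research object. Impact factors are identified by correlation analysis and used as inputs to build the BP neural network model, and the prediction results are assessed with several indicators.
Under the influence of hydraulic factors (cross-sectional area, hydraulic gradient, roughness, etc.), the flow and the water level at a channel cross-section are related. As one of the monitored sections, the pumping station forebay may be hydraulically connected with the water levels of adjacent sections, the pumping station flow, the upstream flow, the flow difference, and so on. These water levels and flows are taken as variables, and correlation analysis between each variable and the predicted quantity is used to identify the impact factors with a meaningful degree of association.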
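The impact factors are identified with Pearson's correlation coefficient, Kendall's correlation coefficient, Spearman's rank correlation coefficient, and grey relational analysis. Pearson's correlation coefficient measures the degree of correlation between two variables and is defined as their covariance divided by the product of their standard deviations. Kendall's correlation coefficient expresses the degree of correlation between ranked variables: when n statistical objects of the same kind are sorted by one attribute, the other attributes are usually out of order, and the coefficient is defined as the difference between the numbers of concordant and discordant pairs divided by the total number of pairs, n(n-1)/2. Spearman's rank correlation coefficient studies the correlation between two variables from rank data; it is calculated from the differences between the paired ranks of the two series and evaluates how well a monotonic function describes the relationship between the two variables. The coefficients of these three methods range from -1 to 1: the closer the absolute value is to 1, the stronger the correlation, and a value of 0 indicates no correlation. Grey relational analysis quantifies the degree of association among the factors of a system by measuring how similar or dissimilar the development trends of different variable series are. A grey relational degree below 0.6 is taken to indicate no correlation; the closer it is to 1, the stronger the correlation.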
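A BP neural network is a multilayer feedforward neural network trained with the error back-propagation algorithm, and generally consists of an input layer, a hidden layer, and an output layer. The input layer receives the signals and passes the information on to the hidden layer; its number of neurons equals the number n of input impact factors. The hidden layer processes and transforms the information; its number of neurons m is less than N-1 (N is the number of training samples) and is chosen by testing in MATLAB. The information then passes from the hidden layer to the output layer, which outputs the result. A typical three-layer network structure is shown in Fig. 1.
Fig. 1 Structure of the BP neural network model
The network parameters are set as follows: maximum number of training iterations = 100, training accuracy goal = 1×10^-8, and learning rate = 0.01. Once the parameters are set, the network automatically adjusts its weights and thresholds by back-propagating the error, driving the function represented by the BP neural network toward the optimal solution, and finally outputs the prediction results and the values of the indicators used to assess them.
The prediction results are assessed with R² (coefficient of determination), E_RMS (root mean square error), and E_MA (mean absolute error): the closer R² is to 1 and the closer E_RMS and E_MA are to 0, the higher the prediction accuracy.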
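The Jiaodong Water Transfer Project is an important part of water conservancy construction in Shandong Province and comprises two conveyance lines, the Yellow River Water Transfer Project and the Yellow River to Qingdao Water Transfer Project. Construction of the Yellow River to Qingdao Water Transfer Project began on April 15, 1986, and it officially began delivering water on November 25, 1989. Construction of the Yellow River Water Transfer Project began on December 19, 2003; the whole line was connected in July 2013, and the main works were completed and began delivering water in December 2013. The Yellow River Water Transfer Project consists of an open channel section and a pipeline section. The open channel section starts at the Songzhuang Transfer Gate and ends at the Huangshuihe Pumping Station, passing through the Huibu, Dongsong, and Xinzhuang pumping stations and a number of inverted siphons, aqueducts, and other conveyance structures, with a total length of about 160 km. The study area of this paper is the open channel section of the Yellow River Water Transfer Project, specifically the reach around the Dongsong Pumping Station, whose upstream control node is the Huibu Pumping Station and whose downstream control node is the Bushang Control Gate. The reach and the structures along it are shown in Fig. 2.
Fig. 2 Study reach and distribution of structures along it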
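In studying the future forebay water level of the Dongsong Pumping Station, the relationship between water level and flow and the influence of human factors were taken into account. In addition to the water levels of adjacent sections, the flow of the Dongsong Pumping Station, the flow of the Huibu Pumping Station, and the flow difference between the two cascaded pumping stations (Huibu and Dongsong) were selected as impact factors, all taken at the current time. Tab. 1 lists the correlation between each factor and the forebay water level under the different methods.
Tab. 1 Correlation analysis results for the impact factors
As Tab. 1 shows, the impact factors rank from most to least correlated as follows: the water level in front of the Dongsong Pumping Station, the flow difference between the two pumping stations, the water level downstream of the Haizheng River inverted siphon, the water level upstream of the Haizheng River inverted siphon, the pumping station flow, the flow of the Dongsong Pumping Station, and the flow of the upstream pumping station. All coefficients of the first four factors lie between 0.8 and 0.9, so these are identified as highly correlated impact factors and given priority in modeling. For the last three factors, only the grey relational degree indicates a high correlation, so they are identified as weakly correlated impact factors that may be included in modeling but are not the main focus.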
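The BP neural network model was used to predict the forebay water level of the pumping station, and the results were analyzed in terms of the time series and the impact factors.
3.2.1 Time series
Data at different time scales were divided into training and validation sets in given ratios, and the training time and prediction accuracy were compared. The results show that the optimal ratio of the training period to the prediction period is 7∶3: lowering the ratio reduces prediction accuracy, whereas raising it leaves accuracy almost unchanged while greatly increasing the amount of data required.
Using 3 600 data points to predict the water level change over the next 2 h, R², E_RMS, and E_MA at the 7∶3 ratio stayed at about 0.95, 0.04, and 0.03, respectively. Raising the ratio improved the indicators slightly, but the difference was small, and above 5∶1 the prediction accuracy essentially stopped improving. The comparison is shown in Fig. 3 and Fig. 4.
Fig. 3 Water level prediction results for the next 2 h (7∶3)
Fig. 4 Water level prediction results for the next 2 h (5∶1)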
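Three groups of three-month data were each trained and validated at the 7∶3 ratio; R² stayed at 0.93-0.98, E_RMS at 0.02-0.05, and E_MA at 0.02-0.04. The prediction results are shown in Fig. 5.
Fig. 5 Water level prediction results for the next 2 h (7∶3)
One month of data was also validated at the 7∶3 ratio, and the results show that the ratio still applies to a one-month data volume, as shown in Fig. 6.
Fig. 6 Water level prediction results for the next 2 h (7∶3)
These results show that the optimal ratio holds for data at different time scales, and that fixing it both saves neural network training time and improves prediction accuracy, so it has a considerable influence on the model.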
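3.2.2 Impact factors
Number of impact factors. When all impact factors are highly correlated with the predicted quantity, more impact factors give more accurate predictions. However, the number of impact factors also increases the amount of data needed in the training period. To reduce the data requirement while preserving accuracy, different numbers of impact factors were used for training and validation. The results show that at least 3-5 impact factors should be selected for training in the short term (1-3 months), while data volumes of 3 months to 1 year need at least the 5-7 most highly correlated impact factors.
Types of impact factors. Studies show that building the model with the most highly correlated impact factors gives higher prediction accuracy. According to the correlation analysis, the three most highly correlated impact factors are the current water level at the pumping station, the water level at the adjacent upstream node, and the flow difference. When three impact factors were used to train on and predict one month of data, these three gave the best prediction, as shown in Fig. 7.
Fig. 7 Water level prediction results with three factors (7∶3)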
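Time interval of the impact factors. With a data interval of 2 h throughout, the future water level of the Dongsong Pumping Station was predicted. Predictions for the next 2 h were fairly stable, with R² above 0.9 in all cases and small E_RMS and E_MA. Predictions for the next 4 h were moderate, with R² of 0.8-0.9 and E_RMS and E_MA slightly larger than for the 2 h prediction. Predictions for the next 6 h were poor: R² was unstable and varied over a wide range, reaching only about 0.7 even in the better cases, while E_RMS and E_MA were large, at about 0.11 and 0.09, respectively. In other words, when the training data are unchanged, the longer the prediction lead time, the lower the prediction accuracy. The three-month data were then filtered from a 2 h interval to a 4 h interval and used to predict the water level of the Dongsong Pumping Station 4 h ahead; the results are shown in Fig. 8.
Fig. 8 Water level change results for the next 4 h (7∶3)
Ten months of data were filtered to a 4 h interval and used to predict the water level 4 h ahead; the results are shown in Fig. 9.
Fig. 9 Water level change results for the next 4 h (7∶3)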
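The results show that predicting the water level of the pumping station 4 h ahead from data at a 4 h interval is more accurate than predicting it directly from data at a 2 h interval: R² stays at roughly 0.82-0.93, and E_RMS and E_MA stay at 0.05-0.06 and 0.04-0.05, respectively.
One year of data was filtered from a 2 h interval to a 6 h interval and used to predict the water level of the Dongsong Pumping Station 6 h ahead. The prediction after filtering was worse than the direct prediction from the 2 h data. Analysis suggests that a 6 h interval is too long to capture fully how each factor varies, which is why filtering degrades the result in this case.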
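Effect of the time series ratio on water level prediction: the optimal ratio of the training period to the prediction period is 7∶3; raising the ratio does not change prediction accuracy noticeably, while lowering it makes the prediction worse.
Effect of the impact factors on prediction: the amount of data and the number of impact factors go together; three months of data need 3-5 impact factors for training, while data volumes of 3 months to 1 year need 5-7 impact factors to achieve the same prediction quality.
Effect of the data time interval on prediction: in general, with a fixed data interval, prediction accuracy falls as the prediction lead time increases; however, as long as the data still capture how each factor varies, prediction is better when the data interval equals the prediction lead time.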
Prediction model for forebay water level of pumping stations with different time scales based on BP neural networks
XUE Ping¹, ZHANG Zhao², LEI Xiaohui², LU Longbin¹, YAN Peiru³, LI Yueqiang⁴
(1. School of Water Conservancy and Environment, University of Jinan, Jinan 250022, China; 2. Institute of Water Resources, China Institute of Water Resources and Hydropower Research, Beijing 100038, China; 3. School of Civil Engineering, Tianjin University, Tianjin 300072, China; 4. College of Water Conservancy and Hydropower Engineering, Hohai University, Nanjing 210098, China)
Abstract: Considering the difficulty of predicting water levels under the control of hydraulic structures, a water level prediction model for the forebay of a pumping station was built on the basis of back-propagation (BP) neural networks, and the influence of the time series ratio and the impact factors on the accuracy of water level prediction was analyzed at different time scales. The model was applied to the Dongsong Pumping Station of the Jiaodong Water Transfer Project. The results revealed the following: when the total amount of data was fixed, a 7∶3 ratio of the training period to the prediction period gave good predictions; the larger the amount of data, the more positively correlated impact factors were needed to reach a given prediction accuracy; and, in the short term, prediction was better when the prediction lead time equaled the time interval of the data. The model can meet the demand for dynamic prediction of the forebay water level in open-channel water transfer projects, giving accurate predictions of the forebay water level of the pumping station 2 h ahead and reasonably accurate predictions 4 h ahead. It can also be applied to other similar open-channel water transfer projects.
Keywords: forebay of pump station; water level prediction; BP neural network; time series; proportion
Received: 2021-07-04  Revised: 2021-09-30  Online publishing: 2021-10-11
Online publishing address: https://kns.cnki.net/kcms/detail/13.1430.TV.20211009.1638.002.html
Fund: National Natural Science Foundation of China (51779268)
Author's brief: XUE Ping (1998-), female, Weifang, Shandong Province, mainly engaged in research on hydrology and water resources. E-mail: 2857487127@qq.com
Corresponding author: LEI Xiaohui (1974-), male, Weinan, Shaanxi Province, Ph.D., professor-level senior engineer, mainly engaged in research on hydrology and water resources, reservoir dispatching, and hydraulic control. E-mail: lxh@iwhr.com
DOI:10.13476/j.cnki.nsbdqk.2022.0040
For long-distance water dispatching in an open-channel water transfer project, hydraulic structures such as pumping stations, control gates, and inverted siphons are generally built along the channel to overcome the restrictions that terrain places on water transfer. Meanwhile, monitoring equipment such as water level gauges and flow meters is installed in front of these structures to obtain hydrological information and monitor conveyance safety. Compared with real-time water level monitoring, high-precision water level prediction can provide more scientific guidance for dispatchers during water dispatching; water level prediction in the forebay of pumping stations in particular is of great significance to the regulation of pumping stations, water dispatching, and channel safety. Affected by various factors such as climate, temperature, and human activities, the hydrological series collected by monitoring equipment are often nonlinear and uncertain, and it is difficult to analyze their patterns and predict their trends by conventional methods. Scholars[1-3] have built hydraulic models to simulate the changing process of channel flow, but the modeling requires complete and accurate topographic data, engineering parameters, and measured data; moreover, calibrating the roughness coefficient is repetitive and cumbersome[4], so the approach has considerable limitations. With the continuous progress of artificial intelligence and machine learning, data-driven prediction can avoid many requirements and limitations of hydraulic modeling and directly explore the inherent patterns in the data[5].
Up to now, most scholars[6-9] have built neural network models for water level prediction, such as the optimized RBF neural network, the LSTM neural network, and the wavelet neural network applied to groundwater level prediction, with high prediction accuracy and good results. Although the relevance vector machine (RVM) prediction model[10], Mike model[11], similarity model[12], statistical model[13], and Bayesian model[14] can be constructed for water level prediction, their applications are limited to a certain extent, and hence they are not widely used in water level prediction for water transfer projects. As neural networks have been commonly used in water level prediction and their development has gradually matured, many scholars[15-26] have made water level predictions by combining neural network models and algorithms or by improving algorithms. For instance, Wu et al.[27] combined KNN, GA, and BP to predict the flood level of the Qinhuai River; compared with the uncombined neural network model, the combined method has higher prediction accuracy but is slightly more complicated. Uncombined neural network models, by contrast, are simple and practical. For example, Gao et al.[28] used the BP neural network to predict the water level in front of a pumping station and found that the BP neural network has great advantages in solving nonlinear problems and significant potential in intelligent prediction. In addition, the commonly used evaluation indicators include the root mean square error (E_RMS) and the coefficient of determination (R²)[29].
In summary, constructing a neural network is a feasible method for water level prediction. Intelligent algorithms such as artificial neural networks have certain applicability conditions in hydrological prediction. For example, ANN has a strong nonlinear ability, but because of its simple structure it cannot retain previous information and therefore cannot learn time series data. RNN can retain the water level prediction at the previous moment and can effectively process sequence data, but there are defects in its gradient propagation. LSTM has long short-term memory and can alleviate vanishing and exploding gradients to a certain extent, but it still has problems with long sequences and cannot be parallelized. Restricted by the one-way flow of information, the classical BP neural network considers a limited amount of historical information and is only suitable for short-term prediction, but it has a stable structure and features versatility and simplicity; it can flexibly deal with nonlinear problems, achieve high prediction accuracy, and has strong nonlinear mapping ability. As the hydrological series in hydrological forecasting are greatly affected by human factors and are prominently nonlinear, the BP neural network is suitable for hydrological forecasting. Since the BP neural network was proposed by Rumelhart et al.[30] in 1986, it has been widely used in research on hydrological prediction. In this paper, a BP neural network was established. We used historical data to predict the future water level in the forebay of the pumping station and analyzed the influence of the time series ratio and the impact factors on the water level prediction. The results provide a method for water level prediction and reference data for the changing trend of the water level in the forebay of the pumping station.
The water level in the forebay of the pumping station is selected as the research object. The impact factors are determined by correlation analysis and are used as the input to construct the BP neural network model, and the prediction results are then assessed with several indicators.
Under the influence of various hydraulic factors (cross-sectional area, hydraulic gradient, roughness, etc.), there is a corresponding relationship between the flow and the water level at a channel section. As one of the monitored sections, the forebay of the pumping station may have a hydraulic connection with the water level of the adjacent sections, the flow of the pumping station, the upstream flow, and the flow difference. Taking these water levels and flows as variables, we conduct a correlation analysis between each variable and the predicted quantity and identify the impact factors with a meaningful degree of correlation.
The impact factor identification methods adopted include Pearson's correlation coefficient, Kendall's correlation coefficient, Spearman's rank correlation coefficient, and grey relational analysis (GRA). Pearson's correlation coefficient measures the degree of correlation between two variables and is defined as the covariance of the two variables divided by the product of their standard deviations. Kendall's correlation coefficient represents the degree of correlation between ranked variables. If n statistical objects of the same kind are sorted by a specific attribute, the other attributes are usually out of order, and the ratio of the difference between the numbers of concordant and discordant pairs to the total number of pairs, n(n-1)/2, is defined as Kendall's coefficient. Spearman's rank correlation coefficient studies the correlation between two variables from rank data; it is calculated from the differences between the paired ranks of the two series and evaluates how well a monotonic function describes the relationship between the two variables. The coefficients of these three methods range from -1 to 1: the closer the absolute value is to 1, the higher the correlation; when it equals zero, there is no correlation. GRA is a quantitative method for analyzing the degree of association among the factors of a system, which measures the association between factors according to how similar or dissimilar the development trends of different variable series are. A grey relational degree below 0.6 is taken to indicate no correlation, and the closer it is to 1, the higher the correlation.
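The three rank and linear coefficients defined above can be written out directly. The following is a minimal sketch in plain Python, not the authors' implementation: the function names are illustrative, Kendall's coefficient is the tau-a variant (tied pairs count as neither concordant nor discordant), and Spearman's coefficient is computed as Pearson's coefficient on average ranks.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's coefficient: Pearson's coefficient of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_a(x, y):
    """(concordant - discordant) / [n(n-1)/2]."""
    n = len(x)
    score = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        score += (s > 0) - (s < 0)
    return score / (n * (n - 1) / 2)


# Made-up series for illustration only (not data from the study).
forebay_level = [3.10, 3.12, 3.15, 3.11, 3.18, 3.20]
flow_difference = [0.5, 0.7, 1.1, 0.6, 1.4, 1.3]
print(pearson(forebay_level, flow_difference),
      spearman(forebay_level, flow_difference),
      kendall_tau_a(forebay_level, flow_difference))
```

Each candidate factor would be passed through these functions against the forebay water level, and the factors ranked by the absolute value of the coefficients.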
A BP neural network is a multilayer feedforward neural network trained by the error back-propagation algorithm, generally consisting of an input layer, a hidden layer, and an output layer. The input layer receives the signals and then transmits the information to the hidden layer; the number of neurons in the input layer equals the number n of input impact factors. The hidden layer is responsible for processing and transforming the information; its number of neurons m is less than N-1 (N is the number of training samples), and its value is chosen by testing in MATLAB. The information is then transmitted from the hidden layer to the output layer, which outputs the results. The typical structure of a three-layer network is shown in Fig. 1.
Fig. 1 BP neural network model structure
The neural network parameters are set as follows: maximum number of training iterations = 100; training accuracy goal = 1×10^-8; learning rate = 0.01. Once the parameters are set, the network automatically adjusts its weights and thresholds by back-propagating the error, which drives the function represented by the BP neural network toward the optimal solution, and finally it outputs the prediction results and the values of the evaluation indicators.
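To make the training loop concrete, the sketch below trains a three-layer (n-m-1) network by error back-propagation with the three settings quoted above. It is an illustration under stated assumptions, not the MATLAB model used in the study: plain per-sample gradient descent, a tanh hidden layer, a linear output neuron, and the default of four hidden neurons are all choices made here, and `train_bp` is a hypothetical name.

```python
import math
import random


def train_bp(samples, targets, n_hidden=4, max_epochs=100, goal=1e-8,
             lr=0.01, seed=0):
    """Train an n-m-1 network by back-propagation; returns (predict, history)."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0

    def forward(x):
        # Hidden layer: tanh of the weighted inputs; output layer: linear.
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(w1, b1)]
        return h, sum(w * hj for w, hj in zip(w2, h)) + b2

    history = []  # mean squared error accumulated during each epoch
    for _ in range(max_epochs):
        squared_error = 0.0
        for x, t in zip(samples, targets):
            h, y = forward(x)
            err = y - t
            squared_error += err * err
            for j in range(n_hidden):
                # Error propagated back to hidden neuron j (uses the old w2[j]).
                delta_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                b1[j] -= lr * delta_h
                for i in range(n_in):
                    w1[j][i] -= lr * delta_h * x[i]
            b2 -= lr * err
        history.append(squared_error / len(samples))
        if history[-1] <= goal:  # stop early once the accuracy goal is met
            break

    def predict(x):
        return forward(x)[1]

    return predict, history


# Toy data for illustration only: two scaled factors and a synthetic level.
rng = random.Random(1)
factors = [[rng.random(), rng.random()] for _ in range(20)]
levels = [5.0 + 0.5 * a - 0.3 * b for a, b in factors]
predict, history = train_bp(factors, levels)
print(len(history), history[0], history[-1])
```

With a learning rate of 0.01 and at most 100 passes, plain gradient descent will usually stop on the iteration limit rather than on the 1×10^-8 goal, so the per-epoch error history is the more informative output of this sketch.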
R², E_RMS, and the mean absolute error (E_MA) are used as the evaluation criteria for the prediction results. The closer R² is to 1 and the closer E_RMS and E_MA are to zero, the higher the prediction accuracy.
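The three indicators can be computed as follows. This is a minimal sketch: the paper does not state its formula for R², so the common definition 1 - SS_res/SS_tot is assumed here, and the function names are illustrative.

```python
from math import sqrt


def r_squared(observed, predicted):
    """Coefficient of determination, assumed to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error (E_RMS)."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error (E_MA)."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


# Made-up values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(r_squared(observed, predicted), rmse(observed, predicted),
      mae(observed, predicted))
```

E_RMS and E_MA carry the units of the water level, whereas R² is dimensionless, which is why the text judges the former against zero and the latter against 1.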
The Jiaodong Water Transfer Project is an important part of water conservancy construction in Shandong Province and includes two water conveyance lines: the Yellow River Water Transfer Project and the Water Transfer Project from the Yellow River to Qingdao. Construction of the latter started on April 15, 1986, and it was officially put into operation on November 25, 1989; construction of the Yellow River Water Transfer Project started on December 19, 2003, the whole line was connected in July 2013, and the main project was put into operation in December 2013. The Yellow River Water Transfer Project includes two parts: the open channel section and the pipeline section. The open channel section starts from the Songzhuang Transfer Gate and terminates at the Huangshuihe Pumping Station, passing through the Huibu, Dongsong, and Xinzhuang pumping stations and several inverted siphons, aqueducts, and other water transfer structures, with a total length of about 160 km. The study area selected in this paper is the open channel section of the Yellow River Water Transfer Project. Specifically, the study area is the reach around the Dongsong Pumping Station, with the Huibu Pumping Station as the upstream control node and the Bushang Control Gate as the downstream control node. The reach and the distribution of structures along it are shown in Fig. 2.
Fig. 2 Study reach and distribution of structures along it
The relationship between water level and flow and the influence of human factors were considered when studying the future water level in the forebay of the Dongsong Pumping Station. In addition to the water levels of the adjacent sections, the flow of the Dongsong Pumping Station, the flow of the Huibu Pumping Station, and the flow difference between the two pumping stations were selected as impact factors for prediction. All impact factors are taken at the current time. Tab. 1 shows the correlation analysis results between each factor and the forebay water level under the different methods.
Tab. 1 Correlation analysis of impact factors
It can be seen from Tab. 1 that the impact factors rank from high to low correlation as follows: the water level in front of the Dongsong Pumping Station, the flow difference between the two pumping stations, the downstream water level of the Haizheng River inverted siphon, the upstream water level of the Haizheng River inverted siphon, the flow of the pumping station, the flow of the Dongsong Pumping Station, and the flow of the upstream pumping station. The coefficients of the first four impact factors are all between 0.8 and 0.9, so they are identified as impact factors with a high correlation and are given priority in modeling. For the last three impact factors, only the grey relational degree indicates a high correlation, so they are identified as impact factors with a low correlation, which can be included in modeling but are not the main focus.
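For reference, a grey relational degree of the kind discussed above can be sketched as follows. The paper does not give its GRA formulation, so this assumes Deng's grey relational coefficient with distinguishing coefficient ρ = 0.5 and mean-value normalization of each series; the function name and the sample values are illustrative.

```python
def grey_relational_degrees(reference, candidates, rho=0.5):
    """Grey relational degree of each candidate series to the reference.

    Assumes Deng's coefficient (d_min + rho*d_max) / (d + rho*d_max) and
    mean-value normalization, so every series needs a non-zero mean.
    """
    def scale(series):
        mean = sum(series) / len(series)
        return [value / mean for value in series]

    ref = scale(reference)
    deltas = [[abs(r - c) for r, c in zip(ref, scale(series))]
              for series in candidates]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:  # every candidate coincides with the reference
        return [1.0] * len(candidates)
    return [sum((d_min + rho * d_max) / (d + rho * d_max) for d in row) / len(row)
            for row in deltas]


# Made-up series for illustration only (not data from the study).
forebay_level = [3.10, 3.12, 3.15, 3.11, 3.18]
upstream_level = [3.40, 3.43, 3.45, 3.41, 3.49]
station_flow = [0.5, 0.7, 1.1, 0.6, 1.4]
print(grey_relational_degrees(forebay_level, [upstream_level, station_flow]))
```

Under this formulation each coefficient is bounded below by ρ/(1+ρ), about 0.33 for ρ = 0.5, so grey relational degrees occupy a narrower and higher range than the other three coefficients. That may be one reason a factor can look strongly related under GRA alone, although the paper itself does not discuss this.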
The BP neural network model was used to predict the water level in the forebay of the pumping station, and the prediction results were analyzed in terms of the time series and the impact factors.
3.2.1 Time series
Data at different time scales were divided into training and validation sets in given ratios, and the training duration and prediction accuracy were compared. The results indicate that the optimal ratio of the training period to the prediction period is 7∶3. Reducing the ratio lowers the prediction accuracy, while increasing it leaves the prediction accuracy almost unchanged and significantly raises the amount of data required.
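The split into a training period and a prediction period can be sketched as below. The paper does not say how the records were partitioned, so a chronological split without shuffling is assumed (earliest records for training, latest for validation); the function name is illustrative.

```python
def chronological_split(records, train_parts=7, test_parts=3):
    """Split time-ordered records into (training, prediction) periods."""
    cut = len(records) * train_parts // (train_parts + test_parts)
    return records[:cut], records[cut:]


# 3 600 time-ordered records, indexed 0..3599 for illustration.
records = list(range(3600))
train, test = chronological_split(records)            # 7:3
train_51, test_51 = chronological_split(records, 5, 1)  # 5:1
print(len(train), len(test), len(train_51), len(test_51))
```

Keeping the validation block strictly after the training block means the network is always evaluated on water levels it has not seen, which matches the forecasting use case.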
We used 3 600 data points to predict the water level change in the next 2 h, and R², E_RMS, and E_MA at the ratio of 7∶3 stayed at about 0.95, 0.04, and 0.03, respectively. When the ratio was increased, the indicators improved slightly, but the difference was small; above 5∶1, the prediction accuracy essentially stopped improving. The comparison is shown in Fig. 3 and Fig. 4.
Fig. 3 Water level prediction results for the next 2 h (7∶3)
Fig. 4 Water level prediction results for the next 2 h (5∶1)
Three groups of three-month data were trained and validated at a ratio of 7∶3. R² stayed at 0.93-0.98, E_RMS at 0.02-0.05, and E_MA at 0.02-0.04. The prediction results are shown in Fig. 5.
Fig. 5 Water level prediction results for the next 2 h (7∶3)
One month of data was validated at a ratio of 7∶3, and the results indicated that the ratio still applies to a one-month data volume, as shown in Fig. 6.
Fig. 6 Water level prediction results for the next 2 h (7∶3)
It can be seen from the above that the optimal ratio is suitable for data at different time scales, and that determining it both saves neural network training time and improves prediction accuracy, so it has a considerable influence on the model.
3.2.2 Impact factors
The number of impact factors. When all impact factors are highly correlated with the predicted quantity, a larger number of impact factors leads to more accurate predictions. However, increasing the number of impact factors also raises the data demand during the training period. Therefore, to reduce the data demand while ensuring prediction accuracy, we used different numbers of impact factors for training and validation. The results revealed that at least three to five impact factors should be selected for training in the short term (one to three months), and at least the five to seven most highly correlated impact factors are required for data volumes of three months to a year.
Types of impact factors. Studies have shown that higher prediction accuracy can be achieved when the most highly correlated impact factors are selected for modeling. According to the correlation analysis, the three impact factors with the highest correlation are the water level of the pumping station at the current moment, the water level of the adjacent upstream node, and the flow difference. When three impact factors were used to train on and predict one month of data, these three gave the best prediction, as shown in Fig. 7.
Fig. 7 Water level prediction results with three factors (7∶3)
The time interval of the impact factors. With a data interval of 2 h, the future water level of the Dongsong Pumping Station was predicted. The predictions for the next 2 h were relatively stable, with R² greater than 0.9 and small E_RMS and E_MA. The predictions for the next 4 h were moderate, with R² of 0.8-0.9 and E_RMS and E_MA slightly larger than those for 2 h. The predictions for the next 6 h were poor: R² was unstable and varied over a wide range, reaching only about 0.7 even when the results were good, while E_RMS and E_MA were large, at about 0.11 and 0.09, respectively. In other words, when the training data remain unchanged, a longer prediction lead time is accompanied by lower prediction accuracy. The three-month data were then filtered to change the interval from 2 h to 4 h, and the water level of the Dongsong Pumping Station in the next 4 h was predicted. The prediction results are shown in Fig. 8.
Fig. 8 Water level change results for the next 4 h (7∶3)
The ten-month data were filtered to an interval of 4 h, and the water level in the next 4 h was predicted. The prediction results are shown in Fig. 9.
Fig. 9 Water level change results for the next 4 h (7∶3)
The results demonstrate that, compared with direct prediction from data at an interval of 2 h, prediction from data at an interval of 4 h gives higher accuracy for the water level of the pumping station in the next 4 h, with R², E_RMS, and E_MA in the ranges of 0.82-0.93, 0.05-0.06, and 0.04-0.05, respectively.
The one-year data were filtered to change the interval from 2 h to 6 h, and the water level of the Dongsong Pumping Station in the next 6 h was predicted. The prediction after filtering was worse than the direct prediction from data at an interval of 2 h. Analysis suggests that an interval of 6 h is too long to fully reflect how each factor varies, which is why filtering degrades the result in this case.
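The two ways of predicting several hours ahead compared in this subsection can be sketched as follows: keep the 2 h records and look more than one step ahead, or first thin the records to the target interval and look one step ahead. This is a minimal sketch under assumptions: "filtering" is taken to mean keeping every k-th record, and the function names and row layout are illustrative.

```python
def thin_series(rows, step):
    """Keep every step-th row, e.g. step=2 turns 2 h records into 4 h records."""
    return rows[::step]


def make_supervised(rows, target_index, lead=1):
    """Pair the factors at time t with the target column lead steps later."""
    if lead < 1:
        raise ValueError("lead must be at least 1")
    inputs = rows[:len(rows) - lead]
    targets = [row[target_index] for row in rows[lead:]]
    return inputs, targets


# Illustrative rows of [hour, forebay level] at a 2 h interval.
rows_2h = [[hour, 10.0 + hour] for hour in range(0, 24, 2)]

# Direct: 2 h records, target two steps (4 h) ahead.
x_direct, y_direct = make_supervised(rows_2h, target_index=1, lead=2)

# Filtered: thin to 4 h records, target one step (4 h) ahead.
rows_4h = thin_series(rows_2h, 2)
x_thin, y_thin = make_supervised(rows_4h, target_index=1, lead=1)
print(len(x_direct), len(x_thin))
```

Thinning halves (or, for 6 h, cuts to a third) the number of training samples, which is consistent with the observation that the 6 h interval discards too much of the variation in each factor.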
The influence of the time series ratio on the water level prediction results: the optimal ratio of the training period to the prediction period is 7∶3; increasing the ratio does not significantly change the prediction accuracy, while decreasing it leads to worse predictions.
The effect of impact factors on the prediction results: the amount of data and the number of impact factors go together. Three months of data require three to five impact factors for training, and data volumes of three months to a year require five to seven impact factors to ensure the same prediction quality.
The influence of the data interval on the prediction results: in general, when the data interval remains unchanged, the prediction accuracy gradually decreases as the prediction lead time increases; however, as long as the data still reflect how each factor varies, prediction is better when the data interval equals the prediction lead time.