Yang Yu, Zhenyu Lei, Yirui Wang, Tengfei Zhang, Senior Member, IEEE, Chen Peng, and Shangce Gao, Senior Member, IEEE
Abstract—Recent research reports that a dendritic neuron model (DNM) can achieve better performance than traditional artificial neural networks (ANNs) on classification, prediction, and other problems when its parameters are well tuned by a learning algorithm. However, the back-propagation (BP) algorithm, the most commonly used learning algorithm, intrinsically suffers from slow convergence and a tendency to become trapped in local minima. Therefore, more and more research adopts non-BP learning algorithms to train ANNs. In this paper, a dynamic scale-free network-based differential evolution (DSNDE) is developed by considering the demands of convergence speed and the ability to jump out of local minima. The performance of a DSNDE-trained DNM is tested on 14 benchmark datasets and a photovoltaic power forecasting problem. Nine meta-heuristic algorithms are used for comparison, including the champion of the 2017 IEEE Congress on Evolutionary Computation (CEC2017) benchmark competition, the effective butterfly optimizer with covariance matrix adapted retreat phase (EBOwithCMAR). The experimental results reveal that DSNDE achieves better performance than its peers.
I. INTRODUCTION

NOWADAYS, artificial neural networks (ANNs) are applied to more and more fields, such as image processing, character recognition, and financial forecasting [1]–[3]. These successful applications benefit from their distinct structures. A typical ANN can be viewed as a directed graph with processing elements as nodes, interconnected by weighted directed links. The first computational model of a neuron was proposed by McCulloch and Pitts in 1943 [4]. Based on it, the multi-layer perceptron (MLP) was constructed and has become a classical model in the ANN community. An MLP is composed of three kinds of layers: an input layer, one or more hidden layers, and an output layer. Information is transmitted between layers with probability-weighted associations, which are stored within the data structure of the network. Each layer has multiple neurons, and each neuron is assigned a threshold that decides whether the processed data are transferred to the next layer. The output layer acts as a multiplicative function for the data received from the former layer. Finally, an activation function is applied to calculate the ultimate output. Common activation functions include the sigmoid function, the rectified linear unit function, and the exponential linear unit function [5]–[8].
With the development and application of ANNs in various fields, many other models have been derived from them. The convolutional neural network (CNN) is a very effective one, which was proposed for analyzing visual imagery. It consists of an input layer, a convolution layer, a pooling layer, and a fully connected layer [9]. A convolution layer is also called a weighted filter, as its size is smaller than that of the input data. An inner product is calculated by sliding the weighted filter over the input. CNNs take advantage of hierarchical patterns in data and assemble more complex patterns from smaller and simpler ones. Therefore, CNNs are efficient in terms of connectedness and complexity.
The recurrent neural network (RNN) is derived from feedforward neural networks and can use its internal memory to process variable-length input sequences [10]. This property makes it suitable for tasks such as natural language processing and speech recognition. A basic RNN is a network of neuron-like nodes organized into sequential layers. Each node has a time-varying real-valued activation, and each connection has a modifiable real-valued weight. All nodes are either input nodes, hidden nodes, or output nodes, and they are connected successively. An RNN takes sequences of real values as input and is recursive along the direction of the sequence.
Although the above neural models have succeeded in much research and many applications, they still have drawbacks, such as slow convergence and high computational cost [11]. Recently, some research has revealed that dendrites play a pivotal role in the nervous system [12], [13]. A neural network equipped with functional dendrites shows the potential for a substantial overall performance improvement [13], [14]. This research draws our attention to the study of the dendritic neuron model (DNM). The DNM is developed by taking inspiration from the nonlinearity of synapses, and its dendrite layer can process input data independently. The characteristics of the DNM can be summarized following the description in [15]: 1) The structure of the DNM is multilayered, and signals are transmitted between layers in a feedforward manner. Hence, the applied functions of these models can be reciprocated. 2) Multiplication is both the simplest and one of the most widespread nonlinear operations in the nervous system [16]. It contributes a lot to information processing in neurons and to the computation in synapses; in the DNM, the latter is innovatively modeled by sigmoid functions. 3) The output of a synapse has four states: excitatory, inhibitory, constant 1, and constant 0. These states can beneficially identify the morphology of a neuron, and which state is presented primarily depends on the values of the parameters in the synapse [17], [18]. Consequently, the training of these parameters crucially influences the performance of a DNM.
Generally speaking, most ANN models use the back-propagation (BP) algorithm, a gradient-based method, as the learning method to find the best combination of network weights and thresholds. However, BP intrinsically suffers from slow convergence and easily drops into local minima, giving it poor training efficiency [15]. Therefore, in recent studies, adopting non-BP learning algorithms for ANNs has gradually become a trend [19]–[26].
In view of the limitations of previous work, a wavelet transform algorithm is used as a learning algorithm for the DNM in forecasting photovoltaic power [18], one of the important research issues within the smart grid. The wavelet transform was originally developed in the field of signal processing and has been shown to offer advantages over the Fourier transform when processing non-stationary signals. It has been widely used in time series forecasting due to its capability in dealing with discrete signals. The proposed forecasting model achieves high computational efficiency and prediction accuracy using actual training and test data sampled at 15-minute intervals.
In [20], a hybrid algorithm that combines a genetic algorithm with a local search method is deployed to enhance the learning capability of a supervised adaptive resonance theory-based neural network by searching and adapting network weights. Owing to the effectiveness of the genetic algorithm in optimizing parameters, the proposed model can easily achieve high accuracy for samples from different classes in an imbalanced data environment.
Specifically, meta-heuristic algorithms have proven effective in training ANNs. In [27], biogeography-based optimization (BBO) is used as a trainer for the MLP. It is compared with BP and five other meta-heuristic algorithms on eleven benchmark datasets. The statistical results reveal that using meta-heuristic algorithms to train MLPs is very promising. Moreover, BBO is much more effective than BP regarding classification rate and test error.
Similarly, Gao et al. [15] comprehensively investigate the performance of six meta-heuristic algorithms as learning methods. Taguchi's experimental design method is used to systematically find the best combination of user-defined parameter sets. Benchmark experiments, involving five classification, six approximation, and three prediction problems, are conducted using an MLP and a DNM, and twelve combinations are investigated. According to the experimental results, the combination of BBO and the DNM is the most effective among its peers.
The above-mentioned research reveals the flexibility and effectiveness of using meta-heuristics as learning algorithms for ANNs. It also motivates us to propose better algorithms with more powerful search ability. Differential evolution (DE) is arguably one of the most efficient meta-heuristic algorithms in current use [28]. Its simplicity and strong robustness have enabled successful applications to various real-world optimization problems, where finding an approximate solution within a reasonable amount of computational time is heavily weighted [29]. Meanwhile, the scale-free network is a very common structure in nature. One of its characteristics is preferential linking, which means the probability that an edge links to a vertex is proportional to the degree of that vertex. This provides a great benefit to information exchange in DE's population: nodes with better fitness can have a greater influence on inferior nodes, while nodes with worse fitness have a lower chance to participate in the solution generation process. Hence, to further enhance DE's robustness and stability when the population size and problem scale change, a dynamic scale-free network-based differential evolution (DSNDE) is developed. DSNDE combines a scale-free network structure with DE and dynamically adjusts its parameters, which endows DE with the benefit of utilizing the neighborhood information provided by a scale-free network while its parameters are tuned during optimization. A mutation operator called DE/old-centers/1 is carefully designed to adequately exploit the advantages of the scale-free network-based DE. In this way, DSNDE can concurrently avoid premature convergence and enhance its global optimality.
This paper contributes to the ANN and evolutionary computation communities in the following aspects: 1) An effective DNM is trained by a novel learning algorithm, DSNDE, to improve its performance. For a given task, it can effectively enhance the training results of the DNM, whether the task is a prediction problem, a classification problem, or a function approximation problem. 2) A photovoltaic power forecasting problem, whose actual training and test data are collected from the natural environment, is used to examine the application value of the proposed training model. 3) Comparisons with nine state-of-the-art meta-heuristic algorithms, including the champion of the 2017 IEEE Congress on Evolutionary Computation (CEC2017) benchmark competition, EBOwithCMAR [30], reveal that DSNDE has superiority in improving the computational efficiency and prediction accuracy of the DNM for various training tasks.
The next section gives a brief introduction to a canonical DNM. A novel learning algorithm DSNDE is proposed in Section III. Sections IV and V present the experimental results of DSNDE and nine contrast learning algorithms for training DNM on 14 benchmark datasets and a photovoltaic power forecasting problem, respectively. Section VI concludes this paper.
II. DENDRITIC NEURON MODEL

The DNM is composed of four layers [18]: a synaptic layer, a dendrite layer, a membrane layer, and a soma layer. The functions and details of each layer are described as follows.
A synaptic layer refers to a structure that transmits impulses from one dendrite to another dendrite or to a neural cell. The information is transferred in a feedforward manner. Equation (1) describes the connection from the ith (i = 1, 2, ..., N) synaptic input to the jth (j = 1, 2, ..., M) dendrite layer.
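In the canonical DNM formulation [15], [18], this connection is modeled by a sigmoid of the weighted input; under that assumption, (1) takes the form

$$Y_{i,j} = \frac{1}{1 + e^{-k\,(\omega_{i,j}\, x_i - \theta_{i,j})}}$$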
where Yi,j is the output from the ith synapse to the jth dendrite layer, xi is the input signal normalized into [0, 1], and k is a user-defined positive constant whose value is problem-related. ωi,j and θi,j are the corresponding weight and threshold, respectively; they are the targets to be optimized by the learning algorithms. The population to be trained is formulated as follows:
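Each individual concatenates all synaptic weights and thresholds; a plausible encoding consistent with this description is

$$X_i = \left[\omega_{1,1}, \ldots, \omega_{N,M},\; \theta_{1,1}, \ldots, \theta_{N,M}\right],\quad i = 1, 2, \ldots, N_p,$$

so that the dimension of the search space is D = 2NM.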
where Xi (i = 1, 2, ..., Np) denotes the ith individual in the population and Np is the population size.
The main function of a dendrite layer is to apply a multiplicative operation to the outputs of the synaptic layers. When the information transfers from the synaptic layer to a dendrite layer, the connection can be in one of four states depending on the values of ωi,j and θi,j. These states can be used to infer the morphology of a neuron by specifying the positions and synapse types of the dendrites [15], [31].
Case 1: A direct or excitatory connection (when 0 ≤ θi,j ≤ ωi,j). In this state, the output is proportional to the input as the input varies from 0 to 1.

Case 2: An inverse or inhibitory connection (when ωi,j ≤ θi,j ≤ 0). In contrast to the previous state, the output is inversely proportional to the input as the input varies from 0 to 1.

Case 3: A constant 1 connection (when θi,j ≤ ωi,j ≤ 0 or θi,j ≤ 0 ≤ ωi,j). The output is approximately 1 regardless of how the input varies from 0 to 1.

Case 4: A constant 0 connection (when ωi,j ≤ 0 ≤ θi,j or 0 ≤ ωi,j ≤ θi,j). The output is approximately 0 regardless of how the input varies from 0 to 1.
Since the values of the inputs and outputs of the dendrites correspond to 1 or 0, the multiplicative operation is equivalent to the logic AND operation. The symbol π in Fig. 1 represents the multiplicative operator, which is formulated in (3).
Fig. 1. Illustration of the DNM. It consists of a synaptic layer, a dendrite layer, a membrane layer, and a soma layer.
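Assuming the product form implied by this logic-AND interpretation, (3) reads

$$Z_j = \prod_{i=1}^{N} Y_{i,j}$$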
where Zj is the output function for the jth dendrite branch.
A membrane layer is used to aggregate and process the information from all dendrite layers by a summation function, closely resembling a logic OR operation. It is worth noting that the inputs and outputs are either 1 or 0. Thus, the DNM is only suitable for two-class datasets and cannot be applied to multi-class classification problems under its current structure. As the threshold of the soma layer is set to 0.5, the soma body will be activated if any of the inputs is non-zero. The function is formulated as follows:
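Given the OR-like summation just described, the membrane output is presumably

$$V = \sum_{j=1}^{M} Z_j$$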
where V is the summation output of all dendrite layers.
A soma layer represents the soma cell body. When the threshold is exceeded, the neuron is activated, and the ultimate output of the entire model is calculated by a sigmoid function, shown as follows:
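Following the same sigmoid form as the synaptic layer, this output can be written as

$$O = \frac{1}{1 + e^{-k_s\,(V - \theta_s)}}$$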
where ks is a positive constant and θs represents the threshold of the soma body. A sigmoid function outputs values ranging from 0 to 1; therefore, it is often used in ANNs that require an output value in the interval from 0 to 1 [32].
Fig. 1 illustrates the structure of the DNM. x1, x2, ..., xN are the inputs of each dendrite layer. They are transformed into signals according to the four connection states. Then, a multiplicative operation multiplies all the outputs from the synaptic layer. Next, these multiplied outputs are summed in the membrane layer. Finally, the obtained result is used as the input of the soma layer to generate the ultimate training output of the DNM.
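As a concrete illustration, the following minimal sketch implements the forward pass just described, assuming the sigmoid-product-sum-sigmoid formulation given above (the function and variable names are ours, not the authors').

```python
import numpy as np

def dnm_forward(x, w, theta, k=5.0, ks=5.0, theta_s=0.5):
    """Forward pass of a dendritic neuron model (DNM).

    x       : (N,)   input features, normalized into [0, 1]
    w       : (N, M) synaptic weights  (omega_{i,j})
    theta   : (N, M) synaptic thresholds (theta_{i,j})
    k, ks   : positive user-defined constants
    theta_s : soma threshold
    """
    # Synaptic layer: sigmoid of the weighted input for every synapse/branch pair.
    Y = 1.0 / (1.0 + np.exp(-k * (w * x[:, None] - theta)))   # (N, M)
    # Dendrite layer: multiplicative (AND-like) aggregation along each branch.
    Z = np.prod(Y, axis=0)                                    # (M,)
    # Membrane layer: summation (OR-like) over all branches.
    V = np.sum(Z)
    # Soma layer: sigmoid activation of the aggregated signal.
    return 1.0 / (1.0 + np.exp(-ks * (V - theta_s)))

# Example: a DNM with N = 4 inputs and M = 10 dendrite branches.
rng = np.random.default_rng(0)
out = dnm_forward(rng.random(4),
                  rng.uniform(-1, 1, (4, 10)),
                  rng.uniform(-1, 1, (4, 10)))
print(out)  # a scalar in (0, 1)
```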
III. DYNAMIC SCALE-FREE NETWORK-BASED DIFFERENTIAL EVOLUTION

Scale-free networks commonly exist in nature. Many social and transportation systems exhibit scale-free characteristics, such as the world wide web and protein-protein interaction networks, and a number of studies have tried to reveal their properties [33]–[35]. Among them, the Barabási-Albert (BA) model is the first model that generates random scale-free networks with a preferential attachment mechanism [36], and it is the most widely used scale-free model in swarm and evolutionary algorithms. When building a scale-free network, m nodes are firstly initialized, and the network is constructed by connecting other nodes to the existing ones. A network usually has two parameters: the degree k, which is the number of connections a node possesses to other nodes, and the average degree, which is k averaged over all nodes in the network. In the BA model, the probability that a new node connects to an existing node is proportional to the degree of that existing node. The degree distribution of a BA model follows a power law of the form:
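that is, the probability that a node has degree k scales as

$$P(k) \propto k^{-\gamma}$$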
where γ is three for the BA model [37]. Fig. 2 illustrates the degree distribution of a BA model when γ = 3. The power-law distribution allows the existence of a few nodes with numerous links, reflected in the long tail, whereas the majority of nodes have only a few links. This phenomenon indicates a strong heterogeneity in the network topology, which helps ensure a stable diversity of the population. That is the reason we introduce a scale-free network to enhance the information exchange in DE.
Fig. 2. Degree distribution of a BA model when γ = 3. The long tail represents the existence of a few nodes with a very large number of connections.
In this part, DSNDE is introduced in detail. Firstly, a scale-free network is constructed based on the top m individuals with the best fitness in the population, which are called the centers. As introduced above, the BA model is the most widely used scale-free model; hence, in DSNDE, it is used to construct a network initialized by the centers. With m initial interconnected nodes, the remaining individuals are successively added to the network according to their fitness values, from the best to the worst, which gives the better individuals a higher chance to link to the centers. Pi is the probability that a new individual builds a connection with the existing node i, and it is proportional to the degree ki, as follows:
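Under the standard BA preferential attachment rule, this probability is

$$P_i = \frac{k_i}{\sum_{j} k_j}$$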
where j runs over all existing nodes. When an iteration starts, the centers are the first connection choices for the other nodes, so their degrees increase quickly. This increase in degree makes other individuals more inclined to establish connections with them. Consequently, the centers dominate the solution generation process and transmit more genetic information to the others.
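A minimal sketch of this construction, assuming fitness-ranked insertion and at least two links per newcomer (names and implementation details are ours), is as follows.

```python
import numpy as np

def build_scale_free_network(fitness, m, links_per_node=2, rng=None):
    """Build a BA-style scale-free network over a population ranked by fitness.

    fitness        : (Np,) fitness values (smaller is better, minimization)
    m              : number of centers (the top-m individuals, fully interconnected)
    links_per_node : minimum number of links a newly added node establishes
    Returns (neighbors, centers), where neighbors[i] is the set of nodes linked to i.
    """
    rng = rng or np.random.default_rng()
    Np = len(fitness)
    order = np.argsort(fitness)                        # best individuals first
    centers = [int(i) for i in order[:m]]
    neighbors = {i: set() for i in range(Np)}
    for a in centers:                                  # interconnect the centers
        neighbors[a].update(b for b in centers if b != a)

    for node in order[m:]:                             # add the rest, best to worst
        node = int(node)
        degree = np.array([len(neighbors[i]) for i in range(Np)], dtype=float)
        existing = np.flatnonzero(degree > 0)
        p = degree[existing] / degree[existing].sum()  # preferential attachment
        targets = rng.choice(existing, size=min(links_per_node, existing.size),
                             replace=False, p=p)
        for t in targets:
            neighbors[node].add(int(t))
            neighbors[int(t)].add(node)
    return neighbors, centers
```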
Traditional DE has four main operators: initialization, mutation, crossover, and selection. In the mutation operator, DE usually applies one of four common strategies, namely DE/best/1, DE/rand/1, DE/best/2, and DE/rand-to-best/1, to generate mutant vectors. In DSNDE, because of the implementation of a scale-free network, a new mutation operator called DE/old-centers/1 is proposed to fully use the neighborhood information, which is formulated as follows:
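Consistent with the mutation step of Algorithm 1, this operator can be written as

$$V_i(t) = X_i(t) + F_i \cdot \left(X_j(t) - X_i^{old} + X_{neighbor1}(t) - X_{neighbor2}(t)\right)$$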
where Xi and Vi are the ith individual and its mutant vector, respectively. Xneighbor1 and Xneighbor2 are selected from the neighborhood of Xi by a roulette wheel selection method based on their fitness. To avoid the situation where an individual has only one link, the minimum number of links a node can maintain is set to two, which means a newly connected node links to at least two existing network nodes. The roulette wheel selection method is formulated as follows:
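A rank-based form consistent with the description below would assign the ith neighbor the selection probability

$$p_i = \frac{R_i}{\sum_{j} R_j}$$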
where Ri represents the fitness rank of the corresponding individual in the neighborhood of the individual to be generated. It is inversely proportional to the fitness value in a minimization problem, which means that an individual with better fitness has a higher chance of being selected for the mutation operation. Since a better individual carries more useful information, the roulette wheel selection method can share this information with the population more effectively. Xiold is the corresponding vector of the former generation; it provides search history to the current population. Xj is randomly selected from the centers, which shares the information of the better individuals.
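Consistent with Algorithm 1, the control parameters of each individual are resampled around the mean parameter values of the centers:

$$F_P = \bar{S}_F,\quad Cr_P = \bar{S}_{Cr},\quad F_i = N(F_P, 0.1),\quad Cr_i = N(Cr_P, 0.1)$$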
where Fi and Cri refer to the parameter set of the ith individual, and N(·, 0.1) denotes a normal distribution with standard deviation 0.1 and mean values FP and CrP for F and Cr, respectively. SF and SCr denote the sets of the F and Cr values of the centers, and S̄ is the arithmetic mean of the elements of S.
Moreover, a dynamic mechanism is developed for adjusting the size of the centers in DSNDE. If m is preset to a constant value, it cannot maintain a stable optimization performance when the population size and problem scale vary. Therefore, in DSNDE, the size of the centers varies dynamically depending on whether the best solution improves. If the best optimum of the population gets improved Lm times in total and m is greater than 4, the size of the centers decreases. Otherwise, if no improvement occurs in the best optimum over a total of Lm times, the size of the centers increases until it reaches a maximum value of 0.2×Np. In the former case, successive improvements indicate that the whole population is in an exploration phase; thus, decreasing m can strengthen the influence of the best individuals and accelerate convergence. In the latter case, stagnation means the generation process is stuck at a local optimum; adding more nodes to the centers increases the diversity of the exchanged information and the probability that the population jumps out of the local optimum. The flowchart of DSNDE is presented in Algorithm 1, where Best refers to the best individual found so far. Following this flowchart, the time complexity of DSNDE can be estimated as follows, where D represents the dimension of the problem. Initializing the population costs O(Np×D). Evaluating the individuals costs O(Np)+O(D). Additional cost is incurred by constructing the scale-free network, selecting the nodes that participate in mutation, and generating the mutant vectors, while the crossover and selection operators cost O(Np). Overall, DSNDE remains computationally efficient.
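A minimal sketch of this adaptation rule, assuming counters that track improvement and stagnation streaks (the variable names are ours), is given below.

```python
def update_center_size(m, improved, counters, Lm, Np):
    """Dynamically adjust the number of centers m.

    m        : current number of centers
    improved : True if the best-so-far solution improved this iteration
    counters : dict with running counts {'improve': int, 'stagnate': int}
    Lm       : number of iterations that triggers an adjustment
    Np       : population size (m is capped at 0.2 * Np)
    """
    if improved:
        counters['improve'] += 1
        counters['stagnate'] = 0
    else:
        counters['stagnate'] += 1
        counters['improve'] = 0

    if counters['improve'] >= Lm and m > 4:
        m -= 1                 # exploration phase: focus on the best individuals
        counters['improve'] = 0
    elif counters['stagnate'] >= Lm and m < 0.2 * Np:
        m += 1                 # stagnation: add centers to diversify information
        counters['stagnate'] = 0
    return m
```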
Algorithm 1 DSNDE
1: Initialize a population of Np individuals randomly;
2: Calculate the fitness of each individual;
3: Set generation t = 1;
4: Initialize F0 = 0.5, Cr0 = 0.9;
5: while the maximum number of iterations is not reached do
6:   Use a BA model to build a scale-free network based on the centers;
7:   for i = 1 : Np do
8:     Decide Xneighbor1(t) and Xneighbor2(t) by a roulette wheel selection method from the neighborhood of Xi(t);
9:     Randomly choose Xj(t) from the centers;
10:    Generate the mutant vector via the DE/old-centers/1 mutation operator: Vi(t) = Xi(t) + Fi·(Xj(t) − Xiold + Xneighbor1(t) − Xneighbor2(t));
11:    Perform crossover with rate Cri between Vi(t) and Xi(t) to obtain the trial vector Ui(t);
12:    Xi(t+1) = Ui(t) if f(Ui(t)) < f(Xi(t)), otherwise Xi(t);
13:  end for
14:  Store the current population: Xold = X(t);
15:  Calculate FP = mean(SF) and CrP = mean(SCr);
16:  Update Fi and Cri: Fi = N(FP, 0.1), Cri = N(CrP, 0.1);
17:  if the Best gets improved in a total of Lm iterations and m > 4 then
18:    m = m − 1;
19:  else if the Best does not get improved in a total of Lm iterations and m < 0.2×Np then
20:    m = m + 1;
21:  end if
22:  t = t + 1;
23: end while

IV. EXPERIMENTS ON BENCHMARK DATASETS

In this section, five classification, six function approximation, and three prediction benchmark datasets are utilized to verify the performance of the DNM trained by DSNDE and by nine meta-heuristic algorithms [38]. The comparison algorithms are listed as follows:
1) BBO [39]: Biogeography-based optimization;
2) DE [28]: Differential evolution;
3) DEGoS [40]: DE with a global optimum-based search strategy;
4) JADE [41]: Adaptive DE with an optional external archive;
5) SHADE [42]: Success-history based adaptive DE;
6) CJADE [43]: Chaotic local search-based JADE;
7) NDi-DE [44]: Neighborhood- and direction-information-based DE;
8) EBLSHADE [45]: Success-history based adaptive DE with linear population size reduction (LSHADE) with a novel mutation strategy;
9) EBOwithCMAR [30]: Effective butterfly optimizer with covariance matrix adapted retreat phase.

Tables I–III list the details of these benchmark datasets and their abbreviations. The datasets are named CF1–CF5 (classification functions), AF1–AF6 (approximation functions), and PF1–PF3 (prediction functions) for convenience. The classification datasets are acquired from the University of California at Irvine Machine Learning Repository [46]. Table I summarizes their numbers of attributes, training samples, test samples, and classes.

TABLE I: DETAILS OF THE CLASSIFICATION DATASETS
TABLE II: DETAILS OF THE FUNCTION APPROXIMATION DATASETS
TABLE III: DETAILS OF THE PREDICTION DATASETS

Table II lists the function expressions of the 1-D sigmoid, 1-D cosine with one peak, 1-D sine with four peaks, 2-D sphere, 5-D Rosenbrock, and 2-D Griewank functions, as well as the number and value range of the training and test samples. The details of the three prediction datasets, including the numbers of training and test samples, are given in Table III. The Mackey-Glass time series is derived from a nonlinear time-delay differential equation whose parameters α, β, τ, and n are real numbers, and xτ denotes the value of the variable x at time t − τ. The Box-Jenkins time series data and the EEG data are acquired from [47] and [48], respectively.

The population size Np and the maximum number of iterations for all contrast learning algorithms are set to 50 and 250, respectively, and Lm is set to 5. The corresponding parameter settings for each algorithm are listed in Table IV; they are set according to the related references to ensure the best performance of each algorithm.
Each benchmark dataset is run 51 times to reduce random errors. All experiments are implemented on a PC with Windows 10, a 3.60 GHz AMD Ryzen 5 3600 6-core CPU, and 16 GB of RAM, using MATLAB R2018a. Table V summarizes acceptable user-defined parameter settings, which can be found in [15]: M is the number of dendrite layers, k and ks are predefined parameters, and θs is the threshold value. The experimental results are presented in Tables VI and VII, in which the mean squared error (MSE), formulated in (13), is used to measure the output error of the DNM for a given solution Xi, where T is the total number of training samples, and yt and ŷt are the target and actual output vectors of the tth sample, respectively.

TABLE IV: PARAMETER SETTINGS OF THE ALGORITHMS
TABLE V: USER-DEFINED PARAMETER SETTINGS OF THE DNM

To precisely detect significant differences between any two algorithms, a non-parametric statistical analysis, the Wilcoxon rank-sum test, is applied [49], [50]. A significance level of 5% is used, which means that if the p-value is less than 0.05, the two compared algorithms are considered significantly different and the former outperforms the latter. Tables VIII and IX list the p-values between DSNDE and the corresponding learning algorithms. For a given problem, the MSE of DSNDE is highlighted when it significantly outperforms all other contrast algorithms; otherwise, the corresponding algorithms are highlighted. The symbols +/≈/− summarize the statistical results of DSNDE versus its peers, indicating that DSNDE performs significantly better (+), significantly worse (−), or neither significantly better nor worse (≈) than the corresponding algorithm. According to these statistical results, the numbers of datasets (out of 14) on which DSNDE wins are 11 (BBO), 12 (DE), 11 (DEGoS), 9 (JADE), 10 (SHADE), 10 (CJADE), 12 (NDi-DE), 10 (EBLSHADE), and 10 (EBOwithCMAR). DSNDE is significantly better than all other comparison algorithms on eight datasets. The proposed learning algorithm DSNDE thus shows clear advantages over all contrast algorithms, including the champion of the CEC2017 benchmark competition, EBOwithCMAR [30]. However, it should be noted that DSNDE does not obtain the best performance on a few datasets, namely CF3, CF5, AF2, AF3, and AF4. On CF3, CF5, AF2, and AF4, the statistical tests show that all competitors achieve similar performance. The performance of DSNDE is not satisfactory on AF3, where seven competitors significantly outperform it. AF3 is an approximation dataset of the sine function and is not complex. According to the no free lunch theorem, no algorithm can perform best on all problems [51]. The reason for DSNDE's underperformance may be the special structure of the dynamic scale-free network: since we want to reduce the impact of poor individuals on the whole population, the information exchange in DSNDE is directed and limited. However, for some simple problems, all individuals can find high-quality solutions and deliver correct search information, in which case the search efficiency of DSNDE may not be as good as that of its peers. Nevertheless, its prominent performance on the other datasets demonstrates the success of the proposed model.
TABLE VI: EXPERIMENTAL RESULTS OBTAINED BY DSNDE, BBO, DE, DEGOS, AND JADE ON 14 DATASETS
TABLE VII: EXPERIMENTAL RESULTS OBTAINED BY SHADE, CJADE, NDI-DE, EBLSHADE, AND EBOWITHCMAR ON 14 DATASETS
TABLE VIII: WILCOXON RANK-SUM TEST RESULTS (P-VALUES) OBTAINED BY DSNDE, BBO, DE, DEGOS, AND JADE ON 14 DATASETS
TABLE IX: WILCOXON RANK-SUM TEST RESULTS (P-VALUES) OBTAINED BY SHADE, CJADE, NDI-DE, EBLSHADE, AND EBOWITHCMAR ON 14 DATASETS

Matrix diagrams in Fig. 3 directly display the changes of the weights ω and thresholds θ from initialization to the end of training by DSNDE. For the heart dataset, the DNM has only 200 parameters (100 weight values and 100 threshold values) to be trained, which indicates that the computing resources required by the DNM are far less than those of a conventional ANN. Figs. 4 and 5 show the classification accuracy, the error values, and the receiver operating characteristic (ROC) curves for two classification datasets. The ROC curve is the average of the sensitivity over all possible specificity values [52]. The area under the ROC curve (AUC) effectively summarizes the accuracy of the classification; it takes a value from 0.5 to 1 (0.5 represents random classification), and a value closer to 1 means more accurate classification. It can be observed that DSNDE obtains the best accuracy and error values. Especially on the heart dataset, DSNDE clearly outperforms its peers. The AUC of DSNDE is 0.985 on the cancer dataset and 0.842 on the heart dataset, both higher than the AUCs of the other algorithms. These results demonstrate the remarkable effectiveness and efficiency of DSNDE.

Fig. 3. Changes of the weights ω and thresholds θ on the heart dataset.
Fig. 4. Analysis of the classification dataset: Cancer.
Fig. 5. Analysis of the classification dataset: Heart.

V. EXPERIMENTS ON PHOTOVOLTAIC POWER FORECASTING

The performance of DSNDE on benchmark datasets directly exhibits its pros and cons compared with its peers, but the practical value of DSNDE still needs to be validated on real-world challenges. Thus, in this section, the DSNDE-trained DNM is applied to a photovoltaic power forecasting problem, one of the most important research issues within the smart grid. By building a forecasting model upon the DNM with the aid of DSNDE, the accuracy of the forecasting results is greatly improved. The actual training and test data are taken from a photovoltaic power plant located in Gansu Province, China, with a sample size of 8000 and a sampling interval of 15 minutes [18]. To comprehensively estimate the forecasting errors obtained by each learning algorithm, the dataset is evenly divided into 10 sets of 800 samples each for cross-validation [53]. Nine groups of contrast experiments are conducted using training sets with 800, 1600, ..., 7200 samples, respectively; in each group, the subsequent 800 samples are used as the test set. Each group is repeatedly run six times to ensure independence and effectiveness, so each algorithm is run 54 times in total. To statistically measure the performance of the tested learning methods and facilitate the comparison of approaches, the mean absolute percentage error (MAPE) and the root mean square error (RMSE) are introduced, where the meaning of each variable is the same as in (13). Table X gives the comprehensive comparison results of the nine groups and the average RMSE and MAPE values.
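Under their usual definitions (an assumption on our part, not a reproduction of the paper's exact expressions), these two error measures can be computed as in the following sketch.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error over T test samples."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred, eps=1e-12):
    """Mean absolute percentage error (in %); eps guards against zero targets."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), eps))

# Example on a toy forecast
print(rmse([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
print(mape([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
```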
It can be observed that forecasting accuracy decreases when the size of the training set increases from 800 to 3200. Most algorithms obtain their worst and best MAPE at training set sizes of 4800 and 6400, respectively. This result suggests that the most suitable ratio of test set to training set for the photovoltaic power forecasting problem is 1:8, whereas a ratio of 1:6 is not suitable. According to the average values, DSNDE obtains the best performance on both RMSE and MAPE, which fully illustrates its practical value.

TABLE X: COMPREHENSIVE COMPARISON RESULTS OF RMSE AND MAPE (%)

VI. CONCLUSIONS

In this paper, we propose a dynamic scale-free network-based differential evolution to train the parameters of the DNM. The scale-free network structure helps DE enhance the information exchange among individuals and improves its overall performance. Experiments on 14 benchmark datasets and a photovoltaic power forecasting problem are conducted to verify its effectiveness in training the parameters of the DNM. DSNDE is compared with nine powerful meta-heuristic algorithms, including the champion of the CEC2017 benchmark competition, EBOwithCMAR. The statistical results show that DSNDE outperforms its peers on most benchmark datasets and achieves the highest accuracy on the photovoltaic power forecasting problem. In future research, we plan to propose a population adaptation approach for DSNDE, which has the potential to further improve the training efficiency of the DNM. Moreover, the proposed algorithm can be applied to semi-supervised classification problems [38].