LI Tong-yan(李彤巖)
Meteorological Information and Signal Processing Key Laboratory of Sichuan Higher Education Institutes,Chengdu University of Information Technology,Chengdu 610225,China
Recent global expansion in the demand for telecommunication services has resulted in a considerable growth of networks in terms of size,complexity and bandwidth.Networks often consist of hundreds or even thousands of interconnected nodes from different manufacturers using various transport mediums and systems.As a result,when a network problem or failure occurs,it is possible that a very large volume of alarms are generated.These alarms describe lots of detailed but very fragmented information about the problems.Typically,alarm flow is useful to find and isolate faults.However,it is also very difficult to determine the root cause of the faults.As we know,alarm correlation is used to help the faults diagnosis and localization[1-3].In the past,the knowledge of alarm correlation was mainly obtained by network experts.With the development of telecommunication networks,it is now much more difficult for the experts to keep up with the rapid change of networks and discover the real useful knowledge from the alarms.Therefore,researchers have adopted many advanced methods including association rules mining to analyze the alarm correlation in order to resolve this problem.Data mining is a science of extracting implicit,previously unknown,and potentially useful information from large data sets or databases,and is also known as knowledge discovery in databases(KDD).The problem of discovering association rules was introduced in Refs.[4-6].Nowadays,alarm correlation analysis based on association rules mining is playing an important part in current research and drawing more and more attentions.
Alarm association rules mining usually includes two steps:frequent patterns mining and association rules generation.The first step is how to find weighted frequent patterns(WFPs)from the alarm data,and then the final realization is how to generate frequent patterns based on these WFPs.Rule itself reflects the correlation between alarm messages.Mining useful rules accurately are important for network ex-pert to find the root cause and locate the fault quickly.
Association rules are generated according to the two criteria of support threshold and confidence threshold[7-9].A classical algorithm of rules generation is a recursive depth first search (DFS)method proposed in Refs.[10-11].However,efficiency of this method is very low,for it must test all frequent patterns as well as their parameters one by one to generate association rules.Meanwhile,the number of frequent patterns is too large to mine efficiently,resulting in that there are too many redundant rules to accurately reflect the relationship between the rules of alarms.For instance,some rules have redundant information with the same meaning in different rules;some rules which meet both support threshold and confidence threshold cannot fully represent implication relations,sometimes they only represent a complicated relationship.In this case,it is not enough to generate association rules with those two criteria in some specific applications.
In this paper,a weighted frequent pattern based rules generation(WFP-RG)method is proposed for the condition of unequal telecommunication alarm transactions.And then a confidence covered value based rules generation method will be proposed,and finally the tests give the realization of association rule mining system and the results.
The rest of this paper is organized as follows:section 1 shows the method of rules generation,section 2 gives the treatment of the rules including how to generate and reduce the rules,section 3shows a series of experiments and their simulation results,and finally a conclusion is made in section 4.
Traditional rules generation algorithms are on the condition of equal itemsets.However,communication alarm itemsets are usually unequal;the rules generated objects are all the WFPs.For some WFPs,their sub-pattern may not be weighted frequently.Obviously,the non-weighted frequent itemsets can not appear in the rules,and we must delete all the non-weighted frequent itemsets from the subsets of WFPs.In this paper,a rules generation based on WFP method is proposed(shown in Algorithm 1).The depth-first algorithm is based on the improved recursive algorithm,making that the rules generation algorithm can not only maintain high efficiency,but also be adapt to the specific requirements of alarm data from communication networks.
Algorithm 1 WFP-RG algorithm
Input:all the weighted frequent itemsets WL,the maximum frequent itemsets MWL, confidence threshold minconf;
Output:strong association rules.
WFP-RG algorithm generates association rules from the maximum weighted frequent itemsets,which is different from traditional methods.In addition,WFP-RG algorithm increases the judgment whether the subsets of the maximum weighted frequent itemsets are frequent or not.Idea of this method is this:if the left side of the rule which shows that some subsets of the maximum weighted frequent itemsets is infrequent,then this subset will be deleted directly,without generating association rules;if the right side of the rule which shows that some subsets of the maximum weighted frequent itemsets is infrequent,then this subset will be deleted directly,without generating association rules;only when both the left and right sides of the rule are frequent,confidence threshold may be considered to judge whether the rule is strong;if the right of rule is a subset of some strong association rules,then this rule will be deleted as redundant rules.After judging with above conditions,strong rules that meet the condition of weighted frequent itemsets will be generated,without producing a large number of redundant rules.This method can reduce the number of rules and recursion,which is more suitable for generating rules with larger number and longer pattern.It proves that this method has more advantages in time complexity and space complexity.
WFP-RG algorithm can effectively find the rules which meet the support threshold and confidence threshold in an alarm transaction database.Meanwhile a large number of redundant rules may be removed during the process of rule generation.However,there are also some redundant rules,for these different rules express the same meaning.Additionally,although some rules meet the requirements of weighted support and confidence,they can not really express the relationship between faults;sometimes they only show a complicated relationship between alarms.Such a rule is meaningless for finding the root cause of fault.Reference[12]described an example of concurrency alarm situation in a communication network(shown in Fig.1).In Fig.1,NMS means network management system,MSC means mobile switching center,and BSC means base system controller.And we can see that a series of alarm are concurrent alarms,for they are triggered by a root alarm.Rules generated by these concurrent alarms are redundant for the root fault.
Fig.1 An example of communication network
The purpose of rule treatment is to improve the efficiency and provide useful knowledge for finding the root cause of fault by reducing the meaningless and redundant rules.For communication network alarms,this paper proposes a rule producing method based on the confidence covered value.The confidence covered value is defined as follows.
Definition 1Confidence covered value:suppose R1is A?C,then the confidence isα(0≤α≤1);rule R2is B?C,confidence isβ(0≤β≤1);if A ?Band(β-α)≤θmin,then θminis called the confidence covered value,which is a positive value of 0and 1.
Confidence covered value is a basis of judging whether a rule is redundant or not.If the left of a rule adds an antecedent constraint,but it can not guarantee a certain degree of confidence increased by a certain covered value,then this rule is considered as a redundant rule to be removed.
After the rule is generated,if using the confidence covered value to judge at first,a large number of redundant rules can be deleted,so as to preserve the more valuable rule information.Secondly,the rules will be classified in order to discover the root causes of failure.Regular basis of classification is based on the topology correlation clustering into the network element with a group.If the right of a rule is from the same network group,they will be linked by a chain.Obviously,the rules with the same right part will be linked by a chain.The way of linking rules is shown in Fig.2.In Fig.2,M and AB describe the right side of a rule;meanwhile,N,A &B &AB,and X &Y present the left side of the rule.
Fig.2 The way of linking rules
Based on the linked method,we proposed a new rule produced algorithm as follows.
Algorithm 2The rule produced algorithm
Input:the original rules sets R1,the confidence covered valueθmin;
Output:the produced rule sets R2.
According to the confidence covered value,the rules may be generated.Algorithm 2 gives the method that how to produce rule sets.
A series of experiments have been done to show the performance of our algorithms.We collect original alarm data from telecommunication network in a city.Parameters of alarms are shown in Fig.3.
Fig.3 Original alarm data
We test three groups of original alarm data using the mining system,and the result is shown in Table 1.As can be seen,the generating number of rules is influenced by the factors of the alarm number,the alarm type,the minimum support and the minimum confidence.For example,we test the third group of data,which includes 7 190alarm transactions and 126different types,and the number of WFPs is 165.The wminsupis set to 0.025%and minconf is set to 60%.Running with the algorithm WFP-RG,48rules are generated,in which only 2rules meet 100%confidence,for example,one rule is 0x01 0xff 0xff 0xff 0xff &0x0b0xff 0xff 0xff 0xff=>0x07 0xff 0xff 0xff 0xff &0x06 0xff 0xff 0xff 0xff &0x05 0xff 0xff 0xff0xff.According to the network topology,we can easily find where may have failures.Similarly,it is possible to find rules that meet a high degree of confidence,and experts may find some useful information from these rules.
Table 1 Results of association rules mining
Then we test the efficiency of our algorithm WFP-RG and traditional rules generation method depth first searchrules generation(DFS-RG).Figures 4and 5show the flex performances of the two algorithms in running time and memory occupied rate.The minimum support is fixed up to 0.25.From the figures,although running time and memory occupied rate of the two algorithms increase with the number of alarms linearly,WFP-RG increases more slowly than DFS-RG,and it has better flex performance than DFS-RG.
Fig.4 Running time changes with the number of alarms
Fig.5 Memory occupied changes with the number of alarms
The most important step of association rules mining is how to generate and produce rules efficiently.This paper presents a rules generation algorithm WFP-RG based on the characteristic of alarm data.For there are a lot of redundant rules and non-covered rules,we propose a novel rule processing method to resolve these problems.Finally,the mining system is used to test and verify the efficiency of WFPRG algorithm.
[1]Koca F,S?zer H,Abreu R.Spectrum-Based Fault Localization for Diagnosing Concurrency Faults[J].Testing Software and Systems,2013,8254:239-254.
[2]Estima J O,Marques Cardoso A J.A New Approach for Real-Time Multiple Open-Circuit Fault Diagnosis in Voltage-Source Inverters[J].IEEE Transactions on Industry Applications,2011,47(6):2487-2494.
[3]Tang Y N,Al-Shaer E,Boutaba R.Efficient Fault Diagnosis Using Incremental Alarm Correlation and Active Investigation for Internet and Overlay Networks[J].IEEE Transactions on Network and Service Management,2008,5(1):36-49.
[4]Agrawal R,Srikant R.Fast Algorithm for Mining Association Rules[C].Proceedings of the 20th VLDB Conference,Santiago,Chile,1994:487-499.
[5]Brauckhoff D,Dimitropoulos X,Wagner A,et al.Anomaly Extraction in Backbone Networks Using Association Rules[J].IEEE/ACM Transactions on Networking (TON),2012,20(6):1788-1799.
[6]Li T Y,Li X M.IULFP:an Efficient Incremental Updating Algorithm Based on LFP-tree for Mining Association Rules[C].Proceedings of the International Conference ICCASM 2010,Taiyuan,China,2010:426-430.
[7]Bayaedo R J,Agrawal R.Mining the Most Interesting Rulesz[C].Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining,San Diego,1999:145-154.
[8]Huang Z,Lu X,Duan H.Mining Association Rules to Support Resource Allocation in Business Process Management[J].Expert Systems with Applications,2011,38(8):9483-9490.
[9]Li T Y,Li X M.Novel Alarm Correlation Analysis System Based on Association Rules Mining in Telecommunication Networks[J].Information Sciences,2010,180:2960-2978.
[10]Li A,Pu J Y.Optimization of Ship Field Repair Scheduling Based on Depth First Search Method[J].Applied Mechanics and Materials,2013,321/322/323/324:2152-2156.
[11]Duhamel C,Lacomme P,Prodhon C.AHybrid Evolutionary Local Search with Depth First Search Split Procedure for the Heterogeneous Vehicle Routing Problems[J].Engineering Applications of Artificial Intelligence,2012,25(2):345-358.
[12]Xu Q F.Alarm Correlation Analysis Based on Data Mining[M].Beijing:Beijing University of Post and Telecommunication,2007.(in Chinese)
Journal of Donghua University(English Edition)2015年2期