CLC number: TN919-34    Document code: A    Article ID: 1004-373X(2020)14-0072-04
PSO-based big data clustering algorithm in cloud environment
HU Yi, ZHU Zijiang
(South China Business College, Guangdong University of Foreign Studies, Guangzhou 410545, China)
Abstract: Since the quantum evolution method used for big data clustering in the traditional cloud environment has relatively low clustering accuracy, a PSO-based big data clustering algorithm for the cloud environment is proposed to reduce storage overhead and improve data management and scheduling capabilities. The principle of big data clustering in the cloud environment is analyzed. Taking traditional fuzzy C-means clustering as the basis, the big data clustering algorithm is improved with the particle swarm clustering algorithm to achieve spatial segmentation and obtain fuzzy clustering of massive data in the cloud storage system. The particle swarm clustering method is used to distribute the discrete cost of the clustered data and obtain the information concentration of data clustering, which is then combined with the clustering constraints of particle swarm optimization to obtain the optimal big data clustering centers in the cloud environment. Simulation results show that the algorithm achieves high clustering accuracy and good convergence performance.
Keywords: big data clustering; cloud environment; particle swarm optimization; space division; fuzzy clustering; simulation testing
0 Introduction
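The concept of cloud computing was proposed by IBM in 2007. Cloud computing is the latest computing paradigm, developed after parallel processing, distributed computing and grid computing. It integrates interconnected computing, data, storage and application resources to achieve multi-level virtualization and abstraction, so that users need only a network connection to draw on the powerful computing and storage capabilities of the cloud. In a cloud computing setting, big data processing can perform data clustering, and the feature parameters of big data can be used to analyze the data. Data clustering supports the construction of big data sets, on which pattern recognition and diagnosis can then be used for service analysis.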
1 Design of big data storage in the cloud environment
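Cloud computing refers to the dynamic expansion of structural models and storage space over the modern Internet. To carry out classification mining and big data storage in a cloud computing setting, a big data storage architecture must first be established. In the cloud environment, big data storage relies on virtualized storage to deploy cloud computing on a computer cluster. It consists of a USB disk layer, a structure layer, computers and other components; enterprises can access it from terminals, and computation is carried out by distributed computers.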
Figure 1 shows the big data storage structure in the cloud environment.
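Using the structure shown in Figure 1, physical allocation is applied to the cloud computing virtual machines. The clustering algorithm is optimized via Eq. (1) and Eq. (2), and the resulting optimal solution is used for the clustering-based physical allocation of big data features in the cloud computing environment: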
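\[ x_1 = \frac{1}{2\mu}\left(1+\mu+\sqrt{(\mu+1)(\mu-3)}\right) \tag{1} \]
\[ x_2 = \frac{1}{2\mu}\left(1+\mu-\sqrt{(\mu+1)(\mu-3)}\right) \tag{2} \]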
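Read with the square root and the ± sign that the extraction appears to have dropped, Eqs. (1) and (2) have the form of the two period-2 points of the logistic map \( x_{k+1} = \mu x_k (1 - x_k) \), which are real only for \( \mu \ge 3 \). The source does not name the map, so this interpretation is an assumption. The minimal sketch below (function names `logistic` and `period2_points` are hypothetical) evaluates both points and checks that one map step sends each point to the other.

```python
import math


def logistic(x: float, mu: float) -> float:
    """One step of the logistic map x -> mu * x * (1 - x)."""
    return mu * x * (1.0 - x)


def period2_points(mu: float) -> tuple[float, float]:
    """Closed-form period-2 points, assumed to be Eqs. (1) and (2); real only for mu >= 3."""
    if mu < 3.0:
        raise ValueError("a period-2 orbit requires mu >= 3")
    root = math.sqrt((mu + 1.0) * (mu - 3.0))
    return (1.0 + mu + root) / (2.0 * mu), (1.0 + mu - root) / (2.0 * mu)


if __name__ == "__main__":
    mu = 4.0  # illustrative value; the source does not give mu
    x1, x2 = period2_points(mu)
    # On a period-2 orbit, f(x1) == x2 and f(x2) == x1.
    print(f"x1={x1:.4f}, x2={x2:.4f}")
    print(f"f(x1)={logistic(x1, mu):.4f}, f(x2)={logistic(x2, mu):.4f}")
```

For \( \mu = 4 \) this prints x1=0.9045, x2=0.3455 and then f(x1)=0.3455, f(x2)=0.9045, confirming the two values swap under one iteration. At \( \mu = 3 \) the square root vanishes and both expressions collapse to the fixed point \( 2/3 \).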
To prevent the particles from falling into local optima, the feature vector \( X_i \) of the big data information is archived, computed as:
\[ l_i(k) = (1-\rho)\, l_i(k-1) + \gamma f\big(x_i(k)\big) \]
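The update for \( l_i(k) \) blends each particle's previous concentration with its current fitness. The excerpt does not define \( \rho \), \( \gamma \) or \( f \); the sketch below assumes \( \rho \in (0,1) \) is a decay rate, \( \gamma > 0 \) a gain, and uses a hypothetical fitness `center_fitness` (inverse of one plus the mean squared distance from the data to a candidate center). The parameter values, the toy data and both function names are illustrative assumptions, not taken from the paper.

```python
def update_concentration(prev: list[float], fitness: list[float],
                         rho: float = 0.4, gamma: float = 0.6) -> list[float]:
    """Return l_i(k) = (1 - rho) * l_i(k-1) + gamma * f(x_i(k)) for every particle i."""
    if len(prev) != len(fitness):
        raise ValueError("prev and fitness need one entry per particle")
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie in (0, 1)")
    return [(1.0 - rho) * l + gamma * f for l, f in zip(prev, fitness)]


def center_fitness(center: float, data: list[float]) -> float:
    """Hypothetical fitness: 1 / (1 + mean squared distance to the candidate center)."""
    mse = sum((x - center) ** 2 for x in data) / len(data)
    return 1.0 / (1.0 + mse)


if __name__ == "__main__":
    data = [1.0, 1.2, 0.8, 1.1]      # toy 1-D data set
    centers = [0.0, 1.0, 3.0]        # one candidate center per particle
    levels = [5.0] * len(centers)    # arbitrary initial concentration
    for k in range(1, 4):
        f = [center_fitness(c, data) for c in centers]
        levels = update_concentration(levels, f)
        print(k, [round(l, 3) for l in levels])
```

If a particle's fitness stays constant at \( f \), the recursion converges to \( l^{*} = \gamma f / \rho \), so \( \rho \) sets how quickly old concentration is forgotten, and the particle whose center lies closest to the data (here 1.0) ends up with the highest concentration.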
Set the clustering threshold to \( N_{th} \); when \( N_{eff} \)