唐文明 劉桂雄
(華南理工大學(xué) 機(jī)械與汽車(chē)工程學(xué)院,廣東 廣州 510640)
?
指數(shù)函數(shù)CORDIC算法的FPGA定點(diǎn)化技術(shù)*
唐文明劉桂雄
(華南理工大學(xué) 機(jī)械與汽車(chē)工程學(xué)院,廣東 廣州 510640)
CORDIC算法廣泛應(yīng)用于多種超越函數(shù)求值,但其通用迭代算法難以用現(xiàn)場(chǎng)可編程門(mén)陣列(FPGA)計(jì)算寬范圍定義域指數(shù)函數(shù)求解.為此,文中提出一種FPGA定點(diǎn)化技術(shù),通過(guò)收斂域擴(kuò)張與迭代結(jié)構(gòu)優(yōu)化實(shí)現(xiàn)CORDIC算法的指數(shù)函數(shù)求值器.首先,應(yīng)用區(qū)間壓縮方法實(shí)現(xiàn)指數(shù)函數(shù)CORDIC算法的收斂域擴(kuò)張;其次,對(duì)CORDIC算法的迭代結(jié)構(gòu)進(jìn)行優(yōu)化;最后,通過(guò)對(duì)指數(shù)函數(shù)求值器的仿真分析與FPGA實(shí)現(xiàn),采用15級(jí)流水線結(jié)構(gòu),用雙曲系統(tǒng)CORDIC算法求解指數(shù)函數(shù),實(shí)現(xiàn)指數(shù)函數(shù)CORDIC算法的收斂域擴(kuò)張.仿真與實(shí)驗(yàn)表明:相比于通用CORDIC算法,所提算法的迭代模式節(jié)省約1/3硬件資源,少至2個(gè)乘法單元,使收斂域由[-1.118 2,1.118 2] 擴(kuò)張到[-6,6],運(yùn)算結(jié)果相對(duì)誤差達(dá)10-3.
CORDIC算法;指數(shù)函數(shù);區(qū)間壓縮;收斂域;迭代結(jié)構(gòu)優(yōu)化;現(xiàn)場(chǎng)可編程門(mén)陣列
指數(shù)函數(shù)是一種常見(jiàn)超越函數(shù),廣泛應(yīng)用于各種數(shù)值計(jì)算中.目前指數(shù)求值方法主要有查表法、泰勒展開(kāi)式法、多項(xiàng)式逼近法等[1-5];其中查表法隨結(jié)果精度提高或輸入值范圍增大,需大量存儲(chǔ)單元;泰勒展開(kāi)式法需要復(fù)雜求導(dǎo)運(yùn)算以及大量的乘、除法器,速度慢、效率低;多項(xiàng)式逼近法算法非常復(fù)雜,且精度非常有限.基于坐標(biāo)旋轉(zhuǎn)數(shù)字計(jì)算 (CORDIC) 方法的基本思想是通過(guò)一系列與運(yùn)算基數(shù)相關(guān)的固定角度不斷旋轉(zhuǎn)來(lái)逼近所需的旋轉(zhuǎn)角度,其算法結(jié)構(gòu)主要涉及移位、加法運(yùn)算,便于硬件實(shí)現(xiàn),但求解指數(shù)函數(shù)時(shí)因收斂域窄而導(dǎo)致定點(diǎn)化實(shí)現(xiàn)精度低[6-7].鑒于現(xiàn)場(chǎng)可編程門(mén)陣列(FPGA)靈活的時(shí)序、組合電路運(yùn)行模式,可快速并行定點(diǎn)化信號(hào)處理等特點(diǎn)[8-9],文中研究了一種定點(diǎn)化指數(shù)函數(shù)雙曲系統(tǒng)CORDIC算法的FPGA實(shí)現(xiàn)技術(shù),并對(duì)輸入值進(jìn)行區(qū)間壓縮以實(shí)現(xiàn)指數(shù)函數(shù)CORDIC算法的收斂域擴(kuò)張[10],使CORDIC算法可以指數(shù)函數(shù)運(yùn)算;并將該算法應(yīng)用于超聲相控陣的時(shí)間校準(zhǔn)增益(Time Corrected Gain,TCG)功能[11-12].
CORDIC算法作為一種通用迭代算法,通過(guò)控制向量可在線性坐標(biāo)系、圓坐標(biāo)系和雙曲坐標(biāo)系下旋轉(zhuǎn)和定向操作,但指數(shù)計(jì)算通常在雙曲坐標(biāo)系下完成[13-15].圖1為雙曲函數(shù)坐標(biāo)旋轉(zhuǎn)模型圖,圖中,θ為射線OVn、雙曲線和x軸圍成面積的2倍(弧度值),φ為射線OV1、雙曲線和x軸圍成面積的2倍,分別對(duì)應(yīng)陰影部分面積的2倍,雙曲函數(shù)x2-y2=C2上點(diǎn)V1(x1,y1)沿著上半軸曲線移動(dòng)到點(diǎn)Vn(xn,yn),可表示為
(1)
圖1 雙曲函數(shù)坐標(biāo)旋轉(zhuǎn)模型
將θ分解成一系列θi(i=1,2,…,n)累加和形式,即
令θi滿足條件tanhθi=2-i,式(1)可表示為
(2)
其中旋轉(zhuǎn)方向因子為di,校模因子K為
當(dāng)C=0時(shí),雙曲線x2-y2=C2最終退化為直線,暫不考慮K,則迭代遞推關(guān)系為
(3)
式中:i=1,2,3,…,n.
為滿足迭代序列收斂,迭代序列i的取值從第4項(xiàng)開(kāi)始,當(dāng)i=kn(kn=3kn-1+1,k1=4,n∈Z+)時(shí),重復(fù)此次迭代,即i=1,2,3,4,4,…,13,13,…,40,40….
經(jīng)過(guò)n次旋轉(zhuǎn)后,有
(4)
若取初值x1=y1=1/K,z1=θ,則有
xn+1=yn+1=coshz1+sinhz1=eθ.這就是雙曲坐標(biāo)指數(shù)函數(shù)CORDIC算法的求解機(jī)理.
但根據(jù)式(3)可得雙曲線坐標(biāo)CORDIC算法的收斂范圍為
其中,θmax為所有旋轉(zhuǎn)面積(弧度)和,即該算法收斂的最大弧度值.
由于θ的收斂范圍較小,因而實(shí)際應(yīng)用意義受到限制,必須對(duì)收斂域進(jìn)行擴(kuò)張?zhí)幚?
為解決雙曲線坐標(biāo)CORDIC算法收斂范圍狹小的問(wèn)題,需通過(guò)對(duì)收斂域進(jìn)行擴(kuò)展以滿足指數(shù)函數(shù)求解算法的通用性.
2.1區(qū)間壓縮方法實(shí)現(xiàn)指數(shù)CORDIC算法收斂域擴(kuò)張
eθ=eQln 2+γ=2Qeγ,
2.2整數(shù)Q值和γ值定點(diǎn)化值處理
若Q表示θ與ln 2相除得到商的整數(shù)部分,γ表示θ與ln 2相除得到的余數(shù)部分,則小數(shù)部分R表示為γ與ln 2相除得到的商,即:Q=[θ/ln 2],γ=θMOD ln 2,R=γ/ln 2,則
FPGA通常實(shí)現(xiàn)定點(diǎn)數(shù)運(yùn)算,故在求解Q、γ過(guò)程中必須進(jìn)行定點(diǎn)化處理,擴(kuò)大65 536 (216)倍實(shí)現(xiàn)定點(diǎn)化,把除法運(yùn)算轉(zhuǎn)換成乘法運(yùn)算,有:
(5)
式中,Q∈Z.
2.3Q值與定點(diǎn)化γ值的FPGA求解
Q=H10
(6)
即H10為整數(shù)部分Q值.
圖2 [θ·2216·(216/ln 2)]對(duì)應(yīng)二進(jìn)制數(shù)位結(jié)構(gòu)
216·216·γ=(ln 2·216)·LH16≈45 426·LH16
(7)
則[45 426·LH16]位寬為32 bit.圖3給出對(duì)應(yīng)的二進(jìn)制數(shù)位結(jié)構(gòu),用H16表示bit[31:16]共16 bit,如是有:
216·γ=H16
(8)
即H16為余數(shù)部分γ的定點(diǎn)化值(擴(kuò)大216倍).
圖3 [45 426·LH16]對(duì)應(yīng)二進(jìn)制數(shù)位結(jié)構(gòu)
根據(jù)式(5)-(8),基于FPGA定點(diǎn)化技術(shù)實(shí)現(xiàn)該算法,整數(shù)Q、γ定點(diǎn)化值求解原理框圖如圖4所示.由圖4可見(jiàn),只用到兩個(gè)乘法器,即可實(shí)現(xiàn)整數(shù)Q、定點(diǎn)化余數(shù)γ值求解,實(shí)現(xiàn)定點(diǎn)化區(qū)間壓縮,達(dá)到收斂域擴(kuò)張的效果.
圖4 整數(shù)Q和定點(diǎn)化余數(shù)γ值求解原理框圖
Fig.4The principle diagram of calculating solution for integerQand fixed-point remainderγ
3.1指數(shù)CORDIC算法的FPGA實(shí)現(xiàn)
圖5為指數(shù)CORDIC算法求解指數(shù)函數(shù)的流水線結(jié)構(gòu)圖,由于坐標(biāo)xn、yn的初值與迭代模式相同,可以合成為一個(gè)迭代通道,省掉坐標(biāo)yn的迭代過(guò)程,這時(shí)式(3)轉(zhuǎn)化為
(9)
式中:i=1,2,3,…,n.
式(9)中xi、zi迭代運(yùn)算物理意義:算法只需用到兩條迭代數(shù)據(jù)鏈,可省去一條迭代數(shù)據(jù)鏈,即僅對(duì)角度zi與坐標(biāo)xi進(jìn)行迭代,這種方式在硬件實(shí)現(xiàn)上可節(jié)省約1/3硬件資源[16],極大提高了FPGA處理算法的實(shí)時(shí)性.
圖5 指數(shù)CORDIC算法求解指數(shù)函數(shù)的流水線結(jié)構(gòu)圖
Fig.5Pipeline structure diagram of exponential CORDIC algorithm solving the exponential function
圖6所示為指數(shù)函數(shù)CORDIC算法的Modelsim(FPGA編程語(yǔ)言行為級(jí)仿真軟件)仿真結(jié)果.采用15級(jí)流水線CORDIC算法結(jié)構(gòu),輸入定點(diǎn)化參數(shù)[θ·216],其中θ∈[-6,6].圖中變量theta對(duì)應(yīng)θ值、exp對(duì)應(yīng)eθ指數(shù)函數(shù)值,橫坐標(biāo)對(duì)應(yīng)仿真時(shí)間(10 ns仿真時(shí)鐘周期),縱坐標(biāo)為幅度,比如:此時(shí)垂直光標(biāo)線位置對(duì)應(yīng)的輸入值θ=1,輸出值為e1≈ 2.718 23,與理論值2.718 28相比,相對(duì)誤差為
圖6 指數(shù)CORDIC算法的Modelsim仿真結(jié)果
Fig.6The results of exponential CORDIC algorithm by Modelsim simulation
收斂域θ∈[-6,6]時(shí)各輸入值的FPGA運(yùn)算結(jié)果如表1所示.
表1收斂域θ∈[-6,6]時(shí)各輸入值的 FPGA運(yùn)算結(jié)果
Table 1Results of FPGA operation when input partial values of the convergence regionθ∈[-6,6]
θeθ·65536eθ實(shí)際值eθ理論值相對(duì)誤差/%-61620.002470.00248-0.275-54410.006730.00674-0.131-412000.018310.01832-0.028-332620.049770.04979-0.026-288690.135330.13534-0.004-1241090.367870.36788-0.001-0655361.000001.000000.000-11781422.718232.71828-0.002-24842487.389047.389060.000-3131617620.0832520.08554-0.011-4357788854.5942454.59815-0.007-59726464148.41406148.413160.001-626436608403.39063403.42879-0.009
可以看出,相對(duì)誤差最大為-0.275%(擴(kuò)大小數(shù)位),其相對(duì)誤差達(dá)到10-3,可滿足很多實(shí)際工程上精度的要求,在實(shí)際應(yīng)用中可進(jìn)一步擴(kuò)大定點(diǎn)化倍數(shù)以減小相對(duì)誤差.
與工程上FPGA實(shí)現(xiàn)指數(shù)函數(shù)算法的經(jīng)典查表法性能對(duì)比,可通過(guò)算法精度與所耗內(nèi)存資源量進(jìn)行分析.若精度提高L位(擴(kuò)大10L倍),應(yīng)用查表法,對(duì)應(yīng)內(nèi)存資源(單位:bit)為
LRAM=10L
(10)
而使用文中指數(shù)CORDIC算法求解器時(shí),增加的二進(jìn)制數(shù)位寬(單位:bit)為
LB=L·log210≈3.321 9L
(11)
因文中為15級(jí)流水線兩個(gè)迭代通道,故增加的存儲(chǔ)單元bit數(shù)約為
(12)
由式(10)、(12)可得,基于FPGA指數(shù)求值器算法精度與消耗內(nèi)存量增加關(guān)系,文中CORDIC算法與經(jīng)典查表法的性能對(duì)比關(guān)系如表2所示.
表2CORDIC、查表法指數(shù)求值器(L=1,2,…,5)內(nèi)存增加量
Table 2The number of increasing memory of exponential eva-luator through CORDIC and look-up table (L=1,2,…,5)
精度提高位數(shù)L文中CORDIC算法增加內(nèi)存數(shù)/bit經(jīng)典查表法增加內(nèi)存數(shù)/bit110010220010032991000439910000054991000000
由表2可以看出,隨著精度增加,指數(shù)求解器所耗內(nèi)存資源文中CORDIC算法成線性增加,而經(jīng)典查表法成指數(shù)增加,體現(xiàn)了文中CORDIC算法的優(yōu)越性.
3.2指數(shù)CORDIC算法在TCG技術(shù)中的應(yīng)用
超聲相控陣儀器為了對(duì)不同探測(cè)深度(時(shí)刻)的缺陷回波有統(tǒng)一的評(píng)判當(dāng)量,使得相同尺寸缺陷回波幅度與其在材料中的深度無(wú)關(guān),對(duì)不同深度的反射回波幅度進(jìn)行增益dB補(bǔ)償,將所有的深度補(bǔ)償值連成一條曲線,即TCG曲線.算法上是通過(guò)增益控制器實(shí)現(xiàn)dB到放大倍數(shù)A的轉(zhuǎn)換:
dB=20lgA?A=e(dB·ln10)/20
(13)
使用廣州多浦樂(lè)電子科技有限公司生產(chǎn)的超聲相控陣儀器(PA2000)對(duì)B型相控陣標(biāo)準(zhǔn)測(cè)試模塊中深度5、10 mm的φ1 mm平底孔進(jìn)行檢測(cè)實(shí)驗(yàn),通過(guò)式(13)做出一系列增益補(bǔ)償曲線,其TCG技術(shù)增益補(bǔ)償效果如圖7所示.圖中橫坐標(biāo):左半部分A掃圖表示回波幅度相對(duì)百分比 (單位:%)、右半部分B掃圖表示水平掃查位移(單位:mm),縱坐標(biāo)表示垂直掃查深度(單位:mm),圖中B掃光標(biāo)位置對(duì)應(yīng)A掃圖,曲線列舉了5個(gè)點(diǎn)的增益補(bǔ)償連線(先用標(biāo)準(zhǔn)工件檢測(cè),把不同深度相同直徑缺陷回波增益調(diào)到相同的回波高度,再記錄下每個(gè)深度缺陷所額外增加的增益(ΔdB)值,用相應(yīng)ΔdB值對(duì)不同位置回波進(jìn)行增益補(bǔ)償形成TCG曲線,對(duì)缺陷進(jìn)行評(píng)判),可以看出經(jīng)TCG曲線補(bǔ)償后不同深度平底孔幾乎相同(圖中標(biāo)簽①、②所示),為缺陷評(píng)判提供了有力保證.
圖7 TCG技術(shù)增益補(bǔ)償效果
在收斂域擴(kuò)張方法上,根據(jù)輸入值θ與ln2相除的結(jié)果,通過(guò)分析商、整數(shù)、小數(shù)、余數(shù)部分的關(guān)系,最終根據(jù)關(guān)系式θ=Q·ln2+γ,把eθ的計(jì)算等效轉(zhuǎn)化為eγ(|γ| 算法只需用到兩條迭代數(shù)據(jù)鏈,可省去一條迭代數(shù)據(jù)鏈,即僅對(duì)角度zi與坐標(biāo)xi進(jìn)行迭代,這種方式在硬件實(shí)現(xiàn)上可以節(jié)省約1/3硬件資源,大幅度提高FPGA處理算法實(shí)時(shí)性. 通過(guò)15級(jí)的迭代流水線,對(duì)CORDIC算法指數(shù)求解器進(jìn)行FPGA實(shí)現(xiàn)與ModelSim仿真,求解值相對(duì)誤差達(dá)到了10-3,能滿足很多實(shí)際工程要求,在超聲相控的TCG功能中具有重要實(shí)用價(jià)值. [1] LAKSHMI B,DHAR A S.CORDIC architectures:a survey [J].VLSI Design,2010,2010:1-19. [2]SIDAHOAO N,CONSTANTINIDES G A,CHEUNG P Y.Architectures for function evaluation on FPGAs [C]∥Proceedings of IEEE International Symposium on Circuits and Systems.Bangkok:IEEE,2003:804-807. [3]YLOSTALO J.Function approximation using polynomials [J].IEEE Signal Processing Magazine,2006,23(5):99-102. [4]MULLER J M.A few results on table-based methods [J].Reliable Computing,1999,5(3):279-288. [5]NAGAYAMA S,SASAO T.Programmable numerical function generators based on quadratic approximation [C]∥Proceedings of Asia South Pacific Design Automation Conference.Yokohama:IEEE,2006:378-383. [6]LAKSHMI B,DHAR A S.VLSI architecture for parallel radix-4 CORDIC [J].Microprocessors and Microsystems,2013,37(1):79-86. [7]沃焱,徐角.精確的快速極坐標(biāo)諧波變換 [J].華南理工大學(xué)學(xué)報(bào)(自然科學(xué)版),2012,40(4):23-29. WO Yan,XU Jiao.Accurate and fast harmonic transform of polar coordinates [J].Journal of South China University of Technology(Natural Science Edition),2012,40(4):23-29. [8]牟勝梅,李兆剛.一種面向FPGA的指/對(duì)數(shù)函數(shù)求值算法 [J].計(jì)算機(jī)工程與應(yīng)用,2011,47(33):59-61. MOU Sheng-mei,LI Zhao-gang.FPGA-oriented evaluation algorithm for exponential and logarithm functions [J].Compu-ter Engineering and Applications,2011,47(33):59-61. [9]牟勝梅,楊曉東.eθ的CORDIC 迭代初值選取策略及其硬件實(shí)現(xiàn) [J].計(jì)算機(jī)工程與應(yīng)用,2007,43(6):79-80. MOU Sheng-mei,YANG Xiao-dong.Policy of choosing initial values for CORDIC iteration and its effect on hardware implementation of exponential function [J].Compu-ter Engineering and Applications,2007,43(6):79-80. [10]何曉華,謝建精.基于擴(kuò)張收斂域CORDIC的指數(shù)變換器設(shè)計(jì) [J].計(jì)算機(jī)仿真,2010,27(7):365-368. HE Xiao-hua,XIE Jian-jing.Design of exponential function generator based on CORDIC algorithm with expanded range of convergence [J].Computer Simulation,2010,27(7):365-368. [11]VASJANOV A,BARZDENAS V.Design of a time-gain-compensation amplifier for ultrasonic echo signal processing [C]∥Electrical,Electronic and Information Sciences.Vilnius:IEEE,2015:1-6. [12]龐林花,文桂林.報(bào)廢回收汽車(chē)零部件再利用的無(wú)損檢測(cè)技術(shù) [J].華南理工大學(xué)學(xué)報(bào)(自然科學(xué)版),2014,42 (11):55-62. PANG Lin-hua,WEN Gui-lin.A study on nondestructive testing technology for recycling of re-usable automotive parts [J].Journal of South China University of Techno-logy (Natural Science Edition),2014,42(11):55-62. [13]楊宇,毛志剛,來(lái)逢昌.一種改進(jìn)的流水線CORDIC算法結(jié)構(gòu) [J].微處理機(jī),2006(4):10-14. YANG Yu,MAO Zhi-gang,LAI Feng-chang.An improved pipeline structure for CORDIC [J].Icroprocessors,2006(4):10-14. [14]NASCIMENTO I,JARDIM R,MORGADO-Dias F.A new solution to the hyperbolic tangent implementation in hardware:polynomial modeling of the fractional exponential part [J].Neural Computing & Applications,2013,23(23):363-369. [15]蘇誠(chéng),韓俊剛.一種對(duì)數(shù)求值器的硬件實(shí)現(xiàn) [J].協(xié)議·算法及仿真,2013,26(10):7-10. SU Cheng,HAN Jun-gang.A hardware implement of the logarithmic evaluator [J].Electronic Sci & Tech,2013,26(10):7-10. [16]劉美娟,許建華,張超.基于CORDIC算法的對(duì)數(shù)放大器的FPGA實(shí)現(xiàn) [J ].儀器儀表學(xué),2008,29(4):328-331.LIU Mei-juan,XU Jian-hua,ZHANG Chao.Implementation of logarithmic amplifier in FPGA based on CORDIC algorithm [J].Chinese Journal of Scientific Instrument,2008,29(4):328-331. Supported by the National Key Foundation for Exploring Scientific Instrument(2013YQ230575) FPGA Fixed-Point Technology of Exponential Function Achieved by CORDIC Algorithm TANGWen-mingLIUGui-xiong (School of Mechanical and Automotive Engineering,South China University of Technology,Guangzhou 510640,Guangdong,China) Although CORDIC algorithm has been widely used in various transcendental functions,its general iterative algorithm is inefficient in using FPGA (Field Programmable Gate Array) to solve the exponential function in a wide-range domain. In order to solve this problem,an FPGA fixed-point technology,which expands the convergence region and optimizes the iteration structure to implement CORDIC algorithm solver,is designed. In the investigation,firstly,range compression method is employed to realize the convergence domain expansion of exponential function achieved by CORDIC algorithm. Secondly,the iteration structure of CORDIC algorithm is optimized. Then,the exponential function achieved by CORDIC algorithm is analyzed in a simulative way and implemented in FPGA. Finally,a 15-grade pipeline structure as well as a hyperbolic method is used to implement the expansion in convergence domain of CORDIC algorithm. Simulated and experimental results show that,in comparison with the general CORDIC algorithm,the proposed algorithm saves about 1/3 hardware resources,uses only two DSP multiplexer units,expands the convergence domain from [-1.118 2,1.118 2] to [-6,6],and achieves a relative error low to 10-3. CORDIC algorithm; exponential function; range compression; convergence region; iteration structure optimization; field programmable gate array 1000-565X(2016)07-0009-06 2015-11-23 國(guó)家重大科學(xué)儀器設(shè)備開(kāi)發(fā)專(zhuān)項(xiàng)(2013YQ230575);廣州市科技計(jì)劃項(xiàng)目(201509010008) 唐文明(1983-),男,博士生,主要從事無(wú)損檢測(cè)與數(shù)字信號(hào)處理技術(shù)研究.E-mail:twm316@163.com TH 878.2doi: 10.3969/j.issn.1000-565X.2016.07.002