XU Kun, FAN Guotian, ZHOU Yi, ZHAN Haisheng, and GUO Zongyi
(1.ZTE Corporation,Shenzhen 518057,China;
2.Xidian University,Xi'an 710000,China)
Abstract Antenna mechanical pose measurement has always been a crucial issue for radio frequency (RF) engineers, owing to the need for mechanical pose adjustment to adapt to changing surroundings. Traditionally, the pose is estimated in a contact way with the help of various kinds of measuring equipment, but the measurement accuracy cannot be well assured in this way. We propose a non-contact measuring system based on Structure from Motion (SfM), a technique from the field of photogrammetry. An accurate pose is estimated by only taking several images of the antenna and performing some simple interaction on a smartphone. Extensive experiments show that the error ranges of the antenna's downtilt and heading are within 2 degrees and 5 degrees respectively, at shooting distances of up to 25 m. The GPS error is also under 5 meters within this shooting distance. We develop measuring applications for both PCs and Android smartphones, and the results can be computed within 3 minutes on both platforms. The proposed system is safe, convenient and efficient for engineers to use in their daily work. To the best of our knowledge, this is the first pipeline that solves the antenna pose measuring problem with a photogrammetry method on a mobile platform.
Key words antenna mechanical pose measurement; SfM; photogrammetry; smartphone
Due to the increase in mobile phone users, more and more GSM antennas need to be set up in populous regions. At the same time, the maintenance of a large number of antennas has become a difficult issue for radio frequency (RF) engineers.
Mechanical pose measurement is crucial for antenna management because even a minor mechanical pose adjustment may cause big changes in antenna radiation patterns. Specifically, the mechanical pose of an antenna includes the heading, downtilt, Global Positioning System (GPS) location, and altitude. The heading is the horizontal angle of the antenna relative to true north, and the downtilt is the antenna's downward angle from the radial that is vertical to the ground in 3D space. RF engineers need to adjust these two angles according to surrounding changes; therefore, the two angles are the most vital parameters of the pose measurement. Fig. 1 shows the mechanical parameters of an ordinary GSM sector antenna.
In traditional ways, RF engineers usually need to climb up towers to measure antenna poses with the help of various measuring tools like [1]. This contact-type measuring approach, which requires engineers to get close enough to the antenna to obtain the parameters, has several drawbacks. The biggest one is the safety of engineers: even when wearing safety equipment, it is still dangerous for engineers to climb up tall towers with various structures. Measuring error is another problem, because the installation of measuring tools is easily affected by human factors; a minor mounting displacement of the tools may lead to different measuring results. Moreover, equipment expenses, tools and personnel placement also make the measurement a costly procedure. Owing to all these disadvantages, it is not very practical for engineers to measure antenna poses in a contact-type way.
We want to solve the measuring problem in a photogrammetric way. That is, engineers only need to use mobile phones to take several images of the antenna, and they can estimate the antenna pose well through simple interaction with the related applications. It is quite different from the traditional ways because engineers no longer need to get close to the antenna, and remote measurement becomes available.
We propose a measuring system based on Structure from Motion (SfM). SfM can estimate 3D structures from 2D image sequences. At the same time, the intrinsic and extrinsic parameters of the camera corresponding to each photo are calculated, so we can reconstruct the object of interest and obtain the antenna pose.
In the proposed approach, engineers first take 5 to 10 images of an antenna and store the pose of the smartphone at the moment each image is taken. This pose is relative to the geodetic coordinate system. The quality of the images should be guaranteed, which means there should not be too much noise or motion blur in them. Second, SfM is performed on these images, which yields the necessary information of every camera. Third, we calculate the rotation, scale and translation transformation parameters from the SfM outputs and the stored smartphone poses. These transformation parameters transform the structures from the SfM coordinate system to the geocentric coordinate system. Fourth, engineers are guided to draw the bounding box of the antenna in each image, and a line extraction algorithm is performed to show the ID of each line in the small selected image. Finally, by choosing the corresponding lines of the antenna in each image as inputs, the triangulation of the corresponding lines and points provides the final pose of the antenna.
▲Figure 1. The downtilt, heading, location and height of an antenna.
In Section 2, we give an overview of SfM and the characteristics of various SfM algorithms. In Section 3, the whole photogrammetry-based measuring system is illustrated in detail. Section 4 validates our algorithms on indoor and outdoor datasets; we perform comprehensive experiments in different environments and analyze the experiment results. We conclude the paper in Section 5.
In multi-view geometry, if we want to reconstruct the 3D shape of an object from 2D images, we have to know the intrinsic and extrinsic parameters of each camera. The phone we use to capture the images provides some intrinsic parameters such as the focal length and the pixel coordinates of the principal point (which can be computed from the image size). The extrinsic parameters, including the 3D position and the rotation matrix relative to the geodetic coordinate frame, can also be obtained from the built-in sensors of the phone. However, the accuracy of these parameters cannot satisfy the needs of target reconstruction. For example, the best GPS location accuracy of a smartphone is no better than 3 meters, even when corrected by over ten GPS satellites.
SfM can solve this problem because only images are required for the calculation of the parameters of each camera. We can use these parameters to triangulate the object we want and then transform the pose of the object from the SfM coordinate system to the geodetic coordinate system.
SfM for computer vision has received tremendous attention in the last decade. The proposed methods can be divided into two classes: sequential methods and global methods.
Sequential methods start from the reconstruction of two or three views and then incrementally add new views into a merged representation. Bundler [2] is one of the most widely used sequential pipelines. However, sequential methods have several drawbacks. The quality of reconstruction is heavily affected by the choice of the initial images and the order of the subsequent image additions. Another disadvantage is that sequential methods tend to suffer from drift due to the accumulation of errors, and cycle closures of the camera trajectory are hard to handle. Sequential methods are also slow, especially when dealing with large image datasets.
Global methods have better performance than sequential ones. The classical pipeline of global methods can be summarized as the following procedures.
1) Feature detection and matching
To find the correspondences between images, local corner features are detected and described. The Scale-Invariant Feature Transform (SIFT) [3] is one of the most widely used feature detectors. These features are usually described as high-dimensional vectors and can be matched by their differences. However, some of the matched features are incorrect. These mismatches are called outliers and need to be filtered out. For example, Random Sample Consensus (RANSAC) [4] is often used to efficiently remove these outliers and keep the inliers with a certain probability.
2) Relative pose estimation
Given 2D-2D point correspondences between two images, we can recover the relative position and orientation of the cameras, as well as the positions of the points (up to an unknown global scale factor), by two-view geometry theory. Specifically, the essential matrix relating a pair of calibrated views can be estimated from eight or more point correspondences by solving a linear equation, and the essential matrix can then be decomposed into the relative camera orientation and position. This issue is well illustrated by Hartley et al. [5].
3) Absolute pose estimation
This procedure aims to robustly recover the absolute global pose of each camera from relative camera motions. Because relative rotation can be estimated much more precisely than relative translation, even for small baselines, global rotation averaging can be performed beforehand; translation averaging then computes the absolute translations with the orientations fixed. Essential matrices only determine camera positions in a parallel rigid graph, so essential-matrix-based methods [6], [7] are usually ill-posed under collinear camera motion. In contrast, trifocal-tensor-based methods [8], [9] are robust to collinear motion because the relative scales of translations are encoded in a trifocal tensor.
4) Bundle adjustment
SfM gives an initial estimation of each camera's projection matrix and of the 3D points obtained from image features. However, it is still necessary to refine this estimation using iterative non-linear optimization. Bundle adjustment is defined as the problem of refining the 3D points of the scene and the intrinsic and extrinsic parameters of each camera, according to an optimality criterion involving the corresponding image projections of all points. The Levenberg-Marquardt (LM) based algorithm [10] is the most popular method for solving non-linear least squares problems and the usual choice for bundle adjustment.
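A toy version of this refinement, shrunk to a single camera pose with the 3D points held fixed, can be written with SciPy's LM solver; the intrinsics and poses below are made-up values for illustration, not the paper's data.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

K = np.array([[800., 0., 320.], [0., 800., 240.], [0., 0., 1.]])  # assumed intrinsics

def project(params, X):
    """Project fixed points X through a camera with pose params = (rotvec, t)."""
    R = Rotation.from_rotvec(params[:3]).as_matrix()
    x = (X @ R.T + params[3:]) @ K.T
    return x[:, :2] / x[:, 2:]

# Synthetic data: a known pose generates the "measured" projections.
rng = np.random.default_rng(2)
X = rng.uniform([-1., -1., 4.], [1., 1., 8.], size=(30, 3))
true_pose = np.array([0.10, -0.05, 0.02, 0.30, -0.20, 0.50])
observed = project(true_pose, X)

# Start from a perturbed pose and minimize reprojection error with LM.
initial = true_pose + 0.05
fit = least_squares(lambda p: (project(p, X) - observed).ravel(),
                    initial, method="lm")
```

Full bundle adjustment optimizes all camera poses and all 3D points jointly over the same kind of reprojection residual, exploiting the sparsity of the Jacobian; the single-camera case above only illustrates the LM mechanics.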
Our photography-based measurement system can be implemented in the following steps: 1) taking 5 to 10 sequential images of an antenna and storing the poses of the phone relative to the geodetic coordinate system; 2) performing SfM on the images taken of the antenna; 3) estimating the rotation, scale and translation transformation parameters that convert the structures from SfM space to geodetic space; 4) selecting the small image of the antenna from each image, performing line extraction and choosing the corresponding lines of the antenna; 5) triangulating the line correspondences and estimating the downtilt and heading; 6) triangulating the point correspondences and calculating the GPS location and height. Fig. 2 shows the whole pipeline of the measurement system.
Users need to take sequential images that cover n different views of the antenna with the smartphone. Meanwhile, a corresponding text file is created for each image, which stores the camera pose, including the rotation matrix, GPS location, and height. We can get the intrinsic and extrinsic parameters of each camera by SfM. Fig. 3 shows the output of the SfM procedure. The points in white show the outline of the scene and the points in green are the cameras' positions.
▲Figure 2. The pipeline of the photography measurement system.
▲Figure 3. The output point clouds of Structure from Motion.
3.2.1 SfM and Geodetic Coordinate System
Because the cameras' parameters are estimated in the so-called SfM space, all the extrinsic parameters are relative to the SfM coordinate system. On the other hand, at the moment we take the images of the target, we can store the camera's parameters, including the rotation matrix, GPS location and height, which are relative to the geodetic coordinate system. Fig. 4a shows the geodetic coordinate system and Fig. 4b shows the device coordinate system. Android Application Program Interfaces (APIs) provide access to the camera pose of the device relative to the geodetic coordinate system.
In order to transform the structures from the SfM coordinate system to the geodetic coordinate system as precisely as possible, we estimate the rotation, scale and translation transformations separately.
3.2.2 Rotation, Scale and Translation Transformation
For each pair of cameras in SfM space and geodetic space, we can directly estimate the rotation transformation:

R_trans = R_geo · R_sfM^T,
▲Figure 4. Transformation between two coordinate systems.
where R_geo and R_sfM are the rotation matrices relative to the geodetic coordinate system and the SfM coordinate system respectively. Because we take several images of the target, we can calculate as many R_trans candidates as there are images. We decompose each R_trans into three angles in the form R_trans = R_z · R_y · R_x (each rotation matrix can be decomposed into three angles in any combination of R_x, R_y and R_z). Then all the R_x, R_y and R_z angles are averaged respectively and the final R_trans is reconstructed.
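A sketch of this alignment-and-averaging step using SciPy's rotation utilities; the "zyx" decomposition mirrors the Rz·Ry·Rx factorization described above. Note that naively averaging Euler angles is only safe when the candidate angles cluster away from the ±180° wrap-around, which holds here because all candidates estimate the same transformation.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def average_rotation_transform(R_geo_list, R_sfm_list):
    """Estimate R_trans from per-image camera rotations in both frames."""
    eulers = []
    for R_geo, R_sfm in zip(R_geo_list, R_sfm_list):
        R_trans = R_geo @ R_sfm.T              # one candidate per image pair
        eulers.append(Rotation.from_matrix(R_trans).as_euler("zyx"))
    # Average the z, y, x angles separately, then rebuild the rotation matrix.
    return Rotation.from_euler("zyx", np.mean(eulers, axis=0)).as_matrix()
```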
GPS is a global navigation satellite system that provides geolocation in the form of degrees. Because both the scale and translation transformations should be calculated in Euclidean space, we need to convert each camera's GPS location into (X_i, Y_i) in a 2D Cartesian coordinate system in the form of meters. Compared to latitude and longitude, X_i and Y_i are a horizontal position representation measured in meters. Leaving aside the Z coordinate, the transformation equations are given by:

S · x_i + T_x = X_i,  S · y_i + T_y = Y_i,  i = 1, ..., n,
where x_i and y_i are the X and Y coordinates of each camera in SfM space, which have already been rotated by the rotation transformation matrix, and n is the number of images. S stands for the scale coefficient. T_x and T_y are the translation parameters. QR decomposition (a decomposition of a matrix A into a product A = QR of an orthogonal matrix Q and an upper triangular matrix R) or Singular Value Decomposition (SVD) can easily solve this linear system for the scale and translation parameters.
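This linear system stacks two equations per camera, S·x_i + T_x = X_i and S·y_i + T_y = Y_i, into a tall matrix; a sketch with NumPy, where the least-squares solver plays the role of the QR/SVD step:

```python
import numpy as np

def fit_scale_translation(xy_sfm, XY_geo):
    """Solve S*xi + Tx = Xi and S*yi + Ty = Yi for (S, Tx, Ty)."""
    n = len(xy_sfm)
    A = np.zeros((2 * n, 3))
    b = np.empty(2 * n)
    A[0::2, 0] = xy_sfm[:, 0]; A[0::2, 1] = 1.0   # rows for S*xi + Tx = Xi
    A[1::2, 0] = xy_sfm[:, 1]; A[1::2, 2] = 1.0   # rows for S*yi + Ty = Yi
    b[0::2] = XY_geo[:, 0]
    b[1::2] = XY_geo[:, 1]
    (S, Tx, Ty), *_ = np.linalg.lstsq(A, b, rcond=None)
    return S, Tx, Ty
```

With more than two cameras the system is overdetermined, so GPS noise in individual camera positions is averaged out by the least-squares fit.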
In order to get the correspondences of the lines that stand for the same contour line in each image, we have to extract the lines from the antenna images. There are many lines in the whole image, but we only need the parameters of several contour lines of the antenna. Therefore, it is reasonable for the users to select the bounding box of the antenna and perform line extraction on these small pictures.
Line segment detection in images has been extensively studied in computer vision. Traditional methods like the Hough transform [11] or its variants [12], [13] cannot satisfy the robustness requirement under different circumstances. We use a recent algorithm named the Line Segment Detector (LSD) [14], [15], which is a linear-time segment detector giving subpixel results without parameter tuning.
The LSD algorithm extracts line segments in three steps: 1) partitioning the images into line-support regions by grouping connected pixels that share the same gradient angle up to a certain tolerance; 2) finding the line segment that best approximates each line-support region; 3) validating or rejecting each line segment based on the information in its line-support region. We extract and show the ten longest lines of each bounding box and manually input the ID of the corresponding contour line in the image. Finally, the corresponding line data set is denoted as {l0, l1, ..., ln-1}.
We suppose a set of n corresponding lines are all visible in n perspective images. Our goal is to recover the 3D pose of the antenna from known camera parameters and these line correspondences. We take three views for explanation. As shown in Fig. 5, the planes back-projected from the lines in each view must all meet in a single line L in space; conversely, the 3D line projects to the corresponding lines l0, l1 and l2 in these three images. This geometric property can be translated to an algebraic constraint, namely the trifocal tensor [16].
We use the method in [17] to perform line triangulation. The trifocal tensor matrix W is given by:

W = [l0^T P0; l1^T P1; l2^T P2],
▲Figure 5. The line L in 3D space is triangulated from the corresponding triplet l0 ↔ l1 ↔ l2 in three views indicated by their camera centers {C0, C1, C2} and image planes.
where P0, P1 and P2 are the projection matrices of these three images. Let Xa = v(:,3) and Xb = v(:,4), where [u, s, v] = SVD(W). Xa and Xb can be regarded as two 3D points. After transforming the two points by the rotation transformation matrix, the downtilt and heading can be calculated. What's more, there are C(n,3) angle estimates if we choose three arbitrary views out of all the images. The final output can be averaged over these angles.
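A minimal NumPy sketch of this triangulation: each row of W is a back-projected plane li^T·Pi, and the two right singular vectors belonging to the smallest singular values play the role of Xa = v(:,3) and Xb = v(:,4).

```python
import numpy as np

def triangulate_line(lines, projections):
    """lines: homogeneous 2D lines li; projections: 3x4 camera matrices Pi."""
    # Each row li^T @ Pi is the plane back-projected from line li.
    W = np.vstack([l @ P for l, P in zip(lines, projections)])
    _, _, vt = np.linalg.svd(W)
    # The null space of W is spanned by two homogeneous 3D points on L.
    return vt[-2], vt[-1]
```

Each returned 4-vector is a homogeneous point; dehomogenizing (dividing by the last coordinate when it is nonzero) gives two points whose connecting line is the triangulated contour, from which the downtilt and heading follow.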
Line triangulation cannot give the 3D coordinates of the antenna, so we triangulate point correspondences to estimate the GPS location and height. As shown in Fig. 6, the triangulation of points requires computing the intersection of two known rays in space, and the point correspondence x ↔ x' defines the rays. Fig. 6a shows the theory of the epipolar constraint. If the projection point x is known, then the epipolar line l' is known, and the point X projects into the right image on a point x' which must lie on this particular epipolar line. This can be formulated by the equation x'ᵀFx = 0, where F is the fundamental matrix given by the two cameras' parameters.
To get point correspondences, we randomly choose two images to get a pair of known matching line segments l ↔ l' and the projection matrices of the two views. For the end point a of segment l in Fig. 7a, we can calculate the corresponding epipolar line in Fig. 7b. This epipolar line intersects l' at the point a'. In this way, we can get two point correspondences a ↔ a' and b ↔ b', as shown in Fig. 7.
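In homogeneous coordinates this transfer is two products: the epipolar line of a is F·a, and the intersection of two 2D lines is their cross product. A sketch, where F would come from the two cameras' parameters as described above:

```python
import numpy as np

def transfer_endpoint(a, line2, F):
    """Map end point a (image 1, homogeneous) to a' on the matched line l' in image 2."""
    epi = F @ a                    # epipolar line of a in the second image
    a2 = np.cross(epi, line2)      # two 2D lines intersect at their cross product
    return a2 / a2[2]              # dehomogenize back to pixel coordinates
```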
▲Figure 6. a) A ray in 3D space is defined by the first camera center C and x. This ray is imaged as an epipolar line l' in the second view. The point X in 3D space which projects to x must lie on this ray, so the corresponding point x' must lie on l'. b) Triangulation.
▲Figure 7. The method of computing point correspondence.
Fig. 6b shows the basic principle of point triangulation. There are many algorithms we can adopt for triangulation. We adopt the iterative linear method in [18], which is efficient and accurate enough. The 3D coordinate of the antenna is defined as the middle point of the points A and B triangulated in 3D space. We use the scale and translation transformation parameters to transform the 3D coordinate, and the GPS location can be calculated by re-projecting the values in meters back to degrees. As for height, we assume all the pictures are taken at the same altitude, so the height of the antenna can be directly given by the translation transformation.
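The linear core of this triangulation can be sketched as a standard DLT: each image measurement contributes two rows to a homogeneous system whose smallest right singular vector is the 3D point; the iterative method in [18] then reweights these rows by the recovered projective depths, which the sketch below omits.

```python
import numpy as np

def triangulate_point(x1, x2, P1, P2):
    """Linear (DLT) triangulation of one point from two views.

    x1, x2: pixel coordinates (2,); P1, P2: 3x4 projection matrices.
    """
    # Each view contributes two rows of the form x*p3 - p1 and y*p3 - p2.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                      # homogeneous 3D point
    return X[:3] / X[3]
```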
We implement the system on a PC and a smartphone. The PC has an Intel(R) Core(TM) i5-4590 3.30 GHz CPU with dual-core processors and 8 GB memory. The smartphone is a ZTE A2017, which has a Qualcomm Snapdragon 820 2.2 GHz CPU with quad-core processors and 3 GB RAM.
Two representative datasets are used for the experiments: an indoor antenna dataset and an outdoor antenna dataset. For each dataset, we take 6 images of the antenna target. All the images have a resolution of 4160 × 3120 pixels. We develop multi-threaded programs to speed up the feature extraction procedures, so that the running time is within 2 minutes on the PC and 3 minutes on the smartphone, including the time used for interaction.
We set up an antenna for the experiments and put it in an indoor environment. In order to test the antenna with different poses, we take several images of the antenna with various downtilts from 3° to 12°, as shown in Fig. 8.
Only the downtilt and heading of the antenna can be estimated because there is no GPS signal in the indoor environment. We use different photography methods to take images of the antenna at a distance of 4 m. The pose of the smartphone relative to the geodetic coordinate system can be decomposed into three angles. In the proposed photography methods, 3d means that we fix these three angles of the smartphone to (110°, 0°, 90°) with the help of a tripod; 2d means that only the latter two angles are fixed; 0d means that all three angles can be different. Tables 1 and 2 show the experiment results based on the indoor datasets, where T and H stand for downtilt and heading respectively.
▲Figure 8. Indoor antenna images.
▼Table 1. Experiment results based on the indoor antenna dataset: the values of downtilt
▼Table 2. Experiment results based on the indoor antenna dataset: the values of heading
The experiment results based on the indoor antenna dataset show that different photography methods do not have an evident influence on the estimation accuracy. Downtilt errors are within 1° across the different results, which is fairly accurate. However, heading errors are larger than the downtilt ones, with the absolute error within 5°. Fig. 9 shows how the antenna's downtilt affects the average measuring results, especially the heading errors. We can clearly see that as the downtilt of the antenna increases, the accuracy of the heading also increases. This is because some estimation error of the heading is inevitable: when the target is almost vertical to the ground, a minor displacement of the estimated pose leads to a big error in the heading. In the extreme case, when the target is exactly vertical to the ground, its heading is an almost random value.
We also perform experiments in outdoor environments to verify the valid photographic distances and the stability in different environments. We take one of the environments as an example. The spot for photography is the rooftop of one of the ZTE buildings (Fig. 10). The red box in the figure marks the target antenna.
In this dataset, the photographic distance ranges from 4 m to 30 m. Because it is an outdoor environment, the GPS location and height of the smartphone can be stored, so we can analyze the accuracy of these parameters. The antenna's downtilt and heading are 11° and 180° respectively. The true value of the longitude is 108.827717° and that of the latitude is 34.098142°. The altitude is 415 m. Table 3 shows the measuring results.
According to the results in Table 3 and Fig. 11, the system always gives an accurate downtilt at any distance within 30 m. Within this shooting distance, the heading error is below 5°. However, as the distance grows, the heading accuracy gets lower and more unstable. This is because when the shooting distance is too far, the contour lines of the antenna become smaller in the image and the error of the lines' parameters becomes bigger. The GPS error is also within 5 m in this photography distance range. However, the accuracy of the height is rather low because of the inaccuracy of the built-in sensors of the smartphone. For example, when we take 5 images of the antenna target on the platform at the same altitude, we find that the 5 altitude values stored by the phone vary a lot and are not consistent with the ground truth. This inaccuracy of the raw data leads to the error of the final altitude of the antenna.
▲Figure 9. The relationship between the downtilt of the antenna and the average measuring error.
▲Figure 10. The rooftop of one ZTE building.
▲Figure 11. The relationship between photographic distance and angle measuring error.
▼Table 3. Results of the outdoor antenna dataset
We suggest that the photography distance be in the range of 3 m to 25 m and that the number of images be more than 5 and less than 10, considering both accuracy and efficiency. The image quality should be good; in particular, the contour lines of the antenna should be distinct and easy to extract. The moving distance between two shooting spots should be 0.3 m to 1 m, because too small or too large a moving distance increases the failure risk of Structure from Motion.
We propose a photogrammetry-based antenna pose measuring system, which only requires antenna engineers to take several images of the antenna and perform some simple interaction with the application on the smartphone. The experiment results show that within a distance of less than 30 m, the downtilt error is within 2°. Owing to the physical properties of the heading, within a distance of 25 m its error is within 5°, bigger than the downtilt error. The GPS error is within 5 m when the GPS information is well corrected by satellites after several minutes. The altitude results can only be regarded as a reference because the altitudes captured by the phone are too noisy.
The proposed system presents many advantages. With it, engineers do not have to wear safety equipment, climb up towers or contact the antenna to measure its pose. All they need to do is take photos, touch the screen and wait for about 3 minutes, and then fairly accurate results will be estimated. It is safe, easy, economical and efficient. We believe that the proposed system can work in many measuring circumstances and help save many resources for the industry.
Biographies
XU Kun (xu.kun7@zte.com.cn) received his master's degree from Xidian University, China. He is now the director of Big Data Department 4, ZTE Wireless Research Institute. From 2009 to 2012, he was the project manager of the UniPOSNetMAX product group at ZTE Corporation. He served as the product director and then the head of the Wireless Network Optimization Tool Department, ZTE Corporation from 2012 to 2014. He has rich experience in wireless network optimization.
FAN Guotian (fan.guotian@zte.com.cn) received his master's degree in engineering in 2008. He is now a senior product manager at the Wireless Big Data Center of ZTE Corporation. His research interests include big data mining, wireless network planning/optimization, and data GIS positioning.
ZHOU Yi (zhou.yi5@zte.com.cn) received his master's degree in computer system architecture from Xidian University, China in 2008. He is now a system engineer and project manager at the Wireless Big Data Center of ZTE Corporation. His research interests include big data and machine learning.
ZHAN Haisheng (zhan_haisheng@vip.163.com) received his doctoral degree in computer application technology from Xidian University, China in 2007. He is now an associate professor in the School of Network and Continuing Education, Xidian University. His main research interests include image processing and Chinese semantic processing.
GUO Zongyi (guozongyi75@126.com) received his bachelor's degree in computer science and technology from the Department of Computer Science, Xidian University, China in 2014. He is currently pursuing a master's degree in computer technology at the Multimedia Technology Institute, Department of Computer Science, Xidian University. His research interests include image processing, machine vision, photogrammetry, and machine learning.