Zhenjiang Dong, Hui Ye, Yan Wu, Shaoyin Cheng, and Fan Jiang
(1. ZTE Corporation, Nanjing 210012, China;
2. Information Technology Security Evaluation Center, University of Science and Technology of China, Hefei 230027, China)
Abstract Android has a strict permission management mechanism. Any application that runs on the Android system must first obtain the permissions it needs. In this paper, we propose an efficient method of detecting malicious applications in the Android system. First, hundreds of permissions are classified into different groups. The application programming interfaces (APIs) associated with permissions that can interact with the outside environment are called sink functions. The APIs associated with other permissions are called taint functions. We construct association tables for block variables and function variables of each application. Malicious applications can then be detected by using the static taint-propagation method to analyze these tables.
Keywords malware; software analysis; static analysis; Android
Smartphones have become more complex in terms of functions and third-party applications, and this makes them a fertile environment for malware. People store private information such as accounts and passwords on their smartphones, the loss of which could have serious consequences.
Malware that runs on smartphones has the same characteristics as malware that runs on desktops. Thus, traditional malware analysis methods for desktops can also be used for Android. Software analysis techniques have developed rapidly; static-analysis methods are particularly efficient and easy to use because an application can be analyzed without having to be run [1]. General static-analysis methods include data-flow analysis, control-flow analysis, and type analysis. Symbolic execution is another commonly used static-analysis method [2], [3]. It is used for intelligent path scheduling and constraint resolution.
Android uses a strict permission management mechanism to restrict the behavior of applications. If a program needs to write files to the storage card, the WRITE_EXTERNAL_STORAGE permission must be granted to the program. In other words, all the required permissions need to be granted to the application before it can run on the system.
Because Android is an open-source system, researchers are always interested in investigating its security mechanism. Android is studied using various kinds of methods that can be classified as either dynamic monitoring or static analysis.
Dynamic monitoring often requires system modification so that an application can be monitored as it runs in the Dalvik virtual machine (DVM) or native environment. Some methods require stricter permission management for applications. Kirin checks whether an installed application violates the permission requirement policy, which is expressed as a set of unsafe permission combinations [4]. Any application that requests an unsafe combination of permissions is barred from running. Saint uses a similar method to Kirin but goes further: developers can define how permissions are assigned on installation and how they are used at runtime. TaintDroid adds middleware to the system and uses a dynamic, lightweight taint-propagation engine to detect privacy leakage in applications [5]. Apex modifies the Android framework to restrict permission-granting [6]. With this method, the system can grant only part of the requested permissions and can even withdraw some permissions at runtime. MockDroid modifies the Android system to hide user resources from running processes [7]. TISSA is a privacy protection model for preventing untrusted applications from accessing private information [8].
Static analysis methods differ from each other in their approach. ScanDroid analyzes the source code and AndroidManifest.xml file of an application and generates a certificate that describes the use of permissions [9]. PiOS uses program profiling to detect privacy leakage from applications on an iOS system [10]. In [11], a decompiling tool called ded is used to decompile Dalvik bytecode to Java source code so that the application can be analyzed using current Java source code analysis tools. In our static method, permissions are first divided into groups. Then, the association tables of block variables (ATBVs) and function variables (ATFVs) are constructed from the Dalvik bytecode. Finally, we apply our static taint-propagation method to these two types of association tables.
The Android smartphone operating system can be divided into application, application framework, libraries and Android Runtime, and Linux kernel layers (Fig. 1). The bottom layer of the software stack is based on Linux kernel 2.6. Basic device driving, memory management, process management, and network management are implemented in this kernel. Above the kernel layer is the libraries and Android Runtime layer. Android Runtime is a VM for applications, and each application runs on a separate VM. Libraries such as Surface Manager, Media Framework, WebKit, and SQLite are indirectly provided to developers. The second layer contains the application framework, which comprises all kinds of APIs that allow developers to reuse system components and services. In the top (applications) layer, Google has pre-installed basic applications such as contacts. Users can install any third-party applications in this layer. The Android security mechanism comprises user ID (UID), permission, and signature.
In Android, each application has its own UID that is assigned by the system when the application is installed on a device. The UID does not change afterwards. Security limitation is implemented at the process level. By default, applications cannot execute operations that hurt other applications or the system. For an application to run, the system allocates a separate DVM according to the application's UID. The DVM, working as a sandbox, separates applications from each other so that they do not interfere with each other. Directly accessing data of another DVM is forbidden by the system. However, if one application obtains another application's shared UID, it can access data of the other application.
Permissions describe the rights of an application to execute some operations. Permissions are complex; there are 115 items in Android 2.3.3. An application must register all required permissions at installation rather than at runtime. If it does not, reinstallation is necessary.
There are three pieces of information associated with a permission: the permission name, the group that the permission belongs to, and the protection level. Permission groups are classified according to function. For example, the permission group PHONE_CALLS includes the permissions READ_PHONE_STATE, PROCESS_OUTGOING_CALLS, and other permissions related to phone calls. The protection level identifies how the permission is protected. There are four protection levels: normal, dangerous, signature, and signature/system. Normal and dangerous permissions are only granted when they are requested; however, unless the application has the same digital certificate as the system, it cannot get permissions at the signature or signature/system level.
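The three attributes of a permission and the granting rule described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class and function names are hypothetical, the protection level shown for READ_PHONE_STATE is an assumption made for the example, and EXAMPLE_SIGNATURE_PERMISSION is an invented placeholder rather than a real Android permission.

```python
from dataclasses import dataclass
from enum import Enum


class ProtectionLevel(Enum):
    """The four protection levels named in the text."""
    NORMAL = "normal"
    DANGEROUS = "dangerous"
    SIGNATURE = "signature"
    SIGNATURE_OR_SYSTEM = "signature/system"


@dataclass(frozen=True)
class Permission:
    """A permission record: name, group, and protection level."""
    name: str
    group: str
    level: ProtectionLevel


def can_be_granted(perm: Permission, same_certificate_as_system: bool) -> bool:
    """Simplified granting rule taken from the paragraph above.

    Normal and dangerous permissions are granted when requested;
    signature and signature/system permissions additionally require
    a matching digital certificate.
    """
    if perm.level in (ProtectionLevel.NORMAL, ProtectionLevel.DANGEROUS):
        return True
    return same_certificate_as_system


# Example records (the protection levels here are assumptions for illustration).
read_phone_state = Permission(
    "READ_PHONE_STATE", "PHONE_CALLS", ProtectionLevel.DANGEROUS
)
example_signature = Permission(
    "EXAMPLE_SIGNATURE_PERMISSION", "EXAMPLE_GROUP", ProtectionLevel.SIGNATURE
)
```

In this sketch, a third-party application could be granted read_phone_state on request, whereas example_signature would be refused unless the certificate matches.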
Each application needs a signature in order to establish a trusted relationship between developers and the application. A signature-level permission can only be granted to applications that share the same signature. The digital signatures in Android can be created by application developers themselves and do not need to be authenticated by a digital certificate agency. Each signature has an expiration date that is checked during installation. The system will not check the expiration date after the application has been installed. Even if the signature of an application expires after installation, the application can still run normally. The signature is also used to update the application. However, in this case, if the signature has expired, the application cannot be updated.
▲Figure 1. Android system structure.
Malware has evolved with the development of the software industry; however, the purpose of malware has never changed. It is software installed on a computer or other device without the user's authorization. It collects sensitive information from the system or does other harmful things. In general, malware can be classified according to whether it
· tries to get remote control of the target system. This category includes bug-exploiting programs, Trojan horses, worms, bots, and viruses.
· tries to maintain remote control. This category includes backdoors and rootkits.
· tries to accomplish specific tasks. This category includes spyware, spamming, adware, phishing, and other similar software.
These classifications were initially designed for malware targeting PCs. Even though smartphone malware might be slightly different, we can still learn a lot from PC malware. Currently, smartphone malware is mainly classified according to malicious behavior, that is, malicious charging, expense consumption, backdoor operation, privacy violation, and other malicious behavior [12].
A total of 3523 types of malware were detected in the first quarter of 2012, and nearly 4.12 million phones were infected [12]. There is a clear increase in the types of malware on Android systems. Malicious behavior, including privacy violation, remote controlling, and malicious charging, accounted for 60% of malware behavior (Fig. 2).
There are 130 permissions in the latest 4.1 version of Android, and it is difficult to classify them appropriately [13]. Motivated by the permission group design in the Android development document [14] and the malware classifications in section 3, we classify these permissions as interacting, controlling and system resource, privacy, and fee (Table 1). Each category is assigned a risk level. Malware in the interacting category poses the highest risk; malware in the controlling and system resource category poses a high risk; malware in the privacy category poses a medium risk; and malware in the fee category poses a low risk. Permissions in the interacting category interact with websites and other outside devices. We assign the interacting category the highest risk level because without this kind of permission, the phone would not be able to interact with outside devices, and permissions in the other categories would then pose no threat to the phone. We assign the controlling and system resource category a high risk level because with the power of control, permissions in the privacy and fee categories would be easy to obtain. Because private information is usually more valuable than money, the privacy category is assigned a medium risk level, and the fee category is assigned a low risk level.
All permissions that an application requests are declared in the application's manifest file and are determined on installation. Permissions are used to restrict the operations of a program, so in the program's source code, there should be functions that use the corresponding permissions. The foundation of static analysis based on a permission-classification method is the construction of a map from functions to their corresponding permissions. Functions that request permissions from the controlling and system resource, privacy, and fee categories are called taint functions.
▲Figure 2. Android malware classification in the first quarter of 2012.
▼Table 1. Permission classifications
In section 4.1, we divided permissions into four groups and assigned risk levels to each of these groups. Functions in the interacting group are the preconditions that allow malware in the other three groups to hurt the system. Therefore, functions belonging to the interacting group are called sink functions. They are the terminating functions of static analysis.
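The mapping from functions to permissions, and from permissions to categories, can be sketched as two lookup tables. The sketch below is illustrative only: the contents of Table 1 are not reproduced here, so the assignment of each permission to a category and of each API to a permission are assumptions chosen for the example, and the names PERMISSION_CATEGORY, API_PERMISSION, and classify_function are hypothetical.

```python
from enum import Enum


class Category(Enum):
    """The four permission categories described in the text."""
    INTERACTING = "interacting"
    CONTROLLING_AND_SYSTEM_RESOURCE = "controlling and system resource"
    PRIVACY = "privacy"
    FEE = "fee"


# Risk level assigned to each category, as stated in the text.
RISK_LEVEL = {
    Category.INTERACTING: "highest",
    Category.CONTROLLING_AND_SYSTEM_RESOURCE: "high",
    Category.PRIVACY: "medium",
    Category.FEE: "low",
}

# Assumed excerpt of a permission-to-category table (not taken from Table 1).
PERMISSION_CATEGORY = {
    "INTERNET": Category.INTERACTING,
    "REBOOT": Category.CONTROLLING_AND_SYSTEM_RESOURCE,
    "READ_PHONE_STATE": Category.PRIVACY,
    "SEND_SMS": Category.FEE,
}

# Assumed excerpt of a function-to-permission map.
API_PERMISSION = {
    "java.net.URL.openConnection": "INTERNET",
    "android.telephony.TelephonyManager.getDeviceId": "READ_PHONE_STATE",
    "android.telephony.SmsManager.sendTextMessage": "SEND_SMS",
}


def classify_function(api_name: str) -> str:
    """Return 'sink', 'taint', or 'other' for an API name.

    Functions tied to the interacting category are sinks; functions tied
    to any of the other three categories are taint functions.
    """
    permission = API_PERMISSION.get(api_name)
    category = PERMISSION_CATEGORY.get(permission)
    if category is None:
        return "other"
    if category is Category.INTERACTING:
        return "sink"
    return "taint"
```

With these assumed tables, a network call is treated as a sink, while a call that reads the device identifier is treated as a taint function.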
The static analysis algorithm comprises the ATBV&ATFV engine as well as the static taint-propagation engine.
The ATBV is the association table of block variables. Generally, one program may contain hundreds of functions, and one function may contain several basic blocks. In these blocks, there are some variables that are associated through assignment or other operations. For example, int v1 = v2 means variable v1 is associated with v2 by assignment. Therefore, we analyze all the variables in one block and construct an association table from them. Similarly, the ATFV is the association table of function variables. It is constructed at the scale of functions. Because one function often contains several blocks, the ATFV is constructed from the ATBVs.
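A minimal sketch of an association table for one basic block follows. It assumes that a block has already been reduced to a list of (destination, source) assignment pairs and that association is symmetric and transitive within the block; both are simplifying assumptions, and the function name build_atbv is hypothetical. The sketch also ignores the clearing of an association when a register is overwritten.

```python
def build_atbv(assignments):
    """Build an association table for one basic block.

    assignments: iterable of (dest, src) variable-name pairs, one per
    assignment such as v1 = v2.
    Returns a dict mapping each variable to the set of variables it is
    associated with (including itself).
    """
    table = {}
    for dest, src in assignments:
        # Merge the groups that dest and src already belong to.
        group = table.get(dest, {dest}) | table.get(src, {src})
        # Every member of the merged group points at the same set.
        for var in group:
            table[var] = group
    return table


# v1 = v2; v3 = v1; v4 = v5
example_block = [("v1", "v2"), ("v3", "v1"), ("v4", "v5")]
example_atbv = build_atbv(example_block)
```

Here v1, v2, and v3 end up in one group and v4 and v5 in another, so tainting any variable in a group can later be propagated to the rest of that group.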
4.2.1 ATBV&ATFV Engine
The ATBV&ATFV engine scans the Dalvik bytecode of an application and applies the following steps to each function:
1) Divide the function into several basic blocks using the basic block algorithm mentioned in [15].
2) Calculate the ATBV. In the Dalvik VM, operands of an instruction are stored in registers, which are reused throughout a program, so registers should not occur in the items of the association table.
When calculating the variables, the engine reaches instructions such as aput and aput-object that write to a register. The engine first clears the variable associated with the register and then associates the register with the new variable. Similarly, the engine reaches instructions such as aget and aget-object that read a register. The engine adds the variable associated with that register to the association table entry that holds the associated variables of the destination register. When the association table of variables is calculated, the variables are initially untainted. Meanwhile, the engine adds all the functions that are called by the current function to the function-call list.
3) Calculate the ATFV by merging each ATBV into the ATFV. For block-crossing variables, redundant table items should be deleted during merging.
4) Calculate the entry function list based on all the function-call lists. This is accomplished by calculating the number of calls for each function and adding functions with a zero call number to the entry function list.
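The last two steps can be sketched as follows, assuming that each ATBV is a dict from a variable to its set of associated variables and that the function-call lists are given as a dict from a function name to the names it calls. These representations and the names merge_atbvs and entry_functions are assumptions made for illustration, not the data structures of the prototype.

```python
def merge_atbvs(atbvs):
    """Merge per-block association tables into one per-function table.

    Groups that share a block-crossing variable are coalesced, so no
    duplicate or partial groups remain after merging.
    """
    atfv = {}
    for atbv in atbvs:
        for var, group in atbv.items():
            merged = set(group) | {var}
            # Pull in any group already recorded for these variables.
            for member in list(merged):
                merged |= atfv.get(member, set())
            for member in merged:
                atfv[member] = merged
    return atfv


def entry_functions(call_lists):
    """Return the functions that are never called by another function.

    call_lists: dict mapping each function name to the list of function
    names it calls. Callees that are not analyzed functions (for example,
    framework APIs) are ignored when counting.
    """
    call_counts = {name: 0 for name in call_lists}
    for callees in call_lists.values():
        for callee in callees:
            if callee in call_counts:
                call_counts[callee] += 1
    return [name for name, count in call_counts.items() if count == 0]


# Two blocks that share the block-crossing variable v2.
block_a = {"v1": {"v1", "v2"}, "v2": {"v1", "v2"}}
block_b = {"v2": {"v2", "v3"}, "v3": {"v2", "v3"}}
example_atfv = merge_atbvs([block_a, block_b])

example_calls = {"onCreate": ["load", "send"], "load": ["send"], "send": []}
example_entries = entry_functions(example_calls)
```

In the example, v1, v2, and v3 form a single group in the merged table, and onCreate is the only function with a call count of zero.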
4.2.2 Static Taint-Propagation Engine
There are fifteen different taint states: NONE, MESSAGE, CONTACT, MAIL, CALLS, CALL_RECORD, LOCATION, LOCAL_DATABASE, LOCAL_LIB, FILE, CAMERA, MICROPHONE, OTHER_DEVICE, OTHER_CONTENT, and WEB_DATA. The taint state of most variables is NONE. However, when a variable is related to a taint function, its state may change. For example, if the value of a variable comes from message-sending APIs, the taint state will be MESSAGE. When handling a variable, we check whether it is tainted or not rather than determine its specific taint state. The engine takes the entry function list output by the first engine as input and applies the following steps to each function:
1) Start a depth-first traversal from the entry function. When a taint function is encountered, the taint state of its return value is set to tainted. Then, all the associated variables in the ATFV are set to tainted. The taint states of function-crossing variables are propagated through the formal parameters. Therefore, all the variables associated with the formal parameters in the ATFV are set to tainted.
2) During the depth-first traversal, a path is recorded only when a taint function and a sink function appear in the same path. This strategy can reduce false positives because no taint function in the path means that there is no resource or private information in the variables, so the path should be safe.
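The two steps above can be sketched as a recursive traversal. The sketch makes several simplifying assumptions that are not taken from the prototype: each function is modeled as a list of call instructions of the form (destination variable, callee, argument variables), taint is tracked as a Boolean per variable, return values of analyzed functions are not propagated back to the caller, and recursion is cut off when a function is already on the call stack. All names (find_tainted_sink_calls, program, atfv) are hypothetical.

```python
def find_tainted_sink_calls(entry, program, atfv, taint_functions, sink_functions):
    """Depth-first traversal that records sink calls reached by tainted data.

    program: dict mapping a function name to
             {"params": [formal names], "calls": [(dest, callee, args), ...]}
    atfv:    dict mapping a function name to its association table
             (variable -> set of associated variables)
    Returns a list of (function, sink) pairs.
    """
    findings = []

    def run(func, tainted_params, stack):
        table = atfv.get(func, {})
        tainted = set()

        def mark(var):
            # Taint the variable and everything associated with it.
            tainted.update(table.get(var, {var}))

        for param in tainted_params:
            mark(param)

        for dest, callee, args in program[func]["calls"]:
            if callee in taint_functions:
                if dest is not None:
                    mark(dest)
            elif callee in sink_functions:
                if any(arg in tainted for arg in args):
                    findings.append((func, callee))
            elif callee in program and callee not in stack:
                formals = program[callee]["params"]
                passed = [f for f, a in zip(formals, args) if a in tainted]
                run(callee, passed, stack | {callee})

    run(entry, [], frozenset({entry}))
    return findings


# Hypothetical program: main reads an identifier, copies it, and uploads it.
example_program = {
    "main": {
        "params": [],
        "calls": [
            ("id", "getDeviceId", []),
            (None, "upload", ["copy"]),
            (None, "upload", ["other"]),
        ],
    },
    "upload": {"params": ["data"], "calls": [(None, "httpPost", ["data"])]},
}
example_tables = {"main": {"id": {"id", "copy"}, "copy": {"id", "copy"}}}
example_findings = find_tainted_sink_calls(
    "main", example_program, example_tables, {"getDeviceId"}, {"httpPost"}
)
```

Only the first call to upload is reported, because only its argument is associated with the return value of the taint function; the second call reaches the same sink with untainted data and is ignored.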
This method reduces false positives, which are the main shortcoming of other static analysis methods. It concentrates on variables and on simulated execution of the target program over these variables. Current static methods often track both data flows and control flows, which makes them time-consuming and memory-consuming.
We have developed a prototype system based on the method described above. The system framework is shown in Fig. 3. The system analyzes the bytecode of an application without accessing the source code.
The system comprises pretreatment, program-recovery, and malware-detection modules. The pretreatment module unpacks the classes.dex file from an Android apk. Then, this file is disassembled to bytecode using disassembling tools, and the output bytecode is imported into a database. After pretreatment, the program-recovery module reads the bytecode from the database and constructs the ATBV and ATFV. Finally, the malware-detection module analyzes each execution path using the taint-propagation algorithm and outputs the results.
▲Figure 3. Prototype system framework.
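The unpacking step of the pretreatment module can be sketched with the Python standard library, because an apk file is a ZIP archive. The function names below are hypothetical, and the disassembly and database import steps are not shown because the text does not name the tools used.

```python
import io
import zipfile


def extract_classes_dex(apk_file):
    """Return the raw bytes of classes.dex from an apk.

    apk_file may be a file path or a binary file-like object.
    Raises KeyError if the archive has no classes.dex entry.
    """
    with zipfile.ZipFile(apk_file) as apk:
        return apk.read("classes.dex")


def extract_classes_dex_from_bytes(apk_bytes):
    """Same as extract_classes_dex, for an apk already held in memory."""
    return extract_classes_dex(io.BytesIO(apk_bytes))


def looks_like_dex(data):
    """Check the leading magic bytes of a DEX file ('dex' plus newline)."""
    return data[:4] == b"dex\n"
```

A later stage would pass the returned bytes to a disassembler; checking the magic bytes first is a cheap way to reject archives whose classes.dex entry is not a DEX file.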
We used this system to analyze 7806 Android applications from an online application market. The results are shown in Table 2. A total of 2629 (33.68%) of the applications were potentially malicious, and the remaining 5177 (66.32%) of the applications were normal.
Of the 2629 malicious apps, 609 demonstrated high-risk malicious behavior (Table 3). A total of 18,811 malicious behaviors were detected in all the malicious apps (Fig. 4). We detected 50 types of malicious charging behavior in 31 apps, 1344 types of privacy violation behavior in 614 apps, 44 types of malicious propagation behavior in 40 apps, 1043 types of expense-consuming behavior in 350 apps, 7057 types of native code execution behavior in 1729 apps, and 9416 types of unauthorized network connection behavior in 1612 apps. One execution path or one application can exhibit multiple types of malicious behavior. Because apps developed in Java are easy to disassemble, many developers use native code to enhance copyright protection. This method is also used by hackers to hide malicious code. The main profit model of Android apps is to deliver advertising. Michael C. Grace analyzed some advertising packages and found many problems [16]. Privacy violation and expense-consuming behavior are also common. Some malicious apps set out to obtain a user's private information, such as bank accounts and passwords. This results in monetary loss. Connecting to the network or sending messages in the background is expense-consuming, and these types of behavior are common in today's apps. However, malicious charging and malicious propagation are relatively uncommon.
▼Table 2. Results of detecting applications in an online application market.
▼Table 3. The risk level of potentially malicious applications
▲Figure 4. Malicious behavior distribution.
In this paper, we have proposed a static analysis method based on permission classification. This analysis system comprises the ATBV&ATFV engine, which is used to construct variable tables, and the static taint-propagation engine, which is used to analyze the program. We used this system to analyze 7806 apps from an online market. The experimental results show that our method is not only feasible but also effective in detecting malicious behavior in Android apps.
Acknowledgment
This research was supported in part by the Fundamental Research Funds for the Central Universities of China (Grant No. WK0110000007), the Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20113402120026), the Natural Science Foundation of Anhui Province, China (Grant No. 1208085QF112), the Foundation for Young Talents in College of Anhui Province, China (Grant No. 2012SQRL001ZD), and the Research Fund of ZTE Corporation.