Keywords

1 Introduction

The current android operating system allows users to run many applications developed by third-party independent developers, which are available in android app markets. In addition, multiple applications can communicate and exchange data by inter-component communication (ICC). ICC is the key mechanism of communication between applications in android, which enriches the functions of android applications, such as WeChat, which can access health data for ranking. Unfortunately, while ICC enhances user functionality, it can be exploited by malicious software to threaten user privacy. Indeed, researchers have shown that android apps frequently collect and use users’ private data without their prior consent [17].

When applications communicate with each other, it is more prone to data leakage [1, 3]. Existing information flow analysis methods mainly include static analysis, dynamic analysis and machine learning analysis. In addition, there are some methods that combine the previous methods, such as hybrid (combining dynamic and static) analysis methods.

The static analysis method decompile the .APK file of each app, and then perform static taint analysis on the decompiled code to find out the data leakage path, such as flowDroid [4], IIFDroid [8], DroidSate [10], DroidGuard [14] and so on. They can analyze all the application’s resources or codes to achieve high coverage on code. However, they lack an actual execution path and face critical challenges in the presence of code obfuscation [22], loading dynamic code [15], reflection calls [23], native code [24], and multithreading issues. The dynamic analysis method detects the privacy leakage within an application by executing the application in a real or virtual device, such as Mobile-sandbox [16], TaintDroid [6] and ScanDroid [12]. They can observe actual execution trace and tackle code obfuscation and dynamic code loading. However, code coverage is limited by dynamic analysis methods because it cannot execute all possible traces in one time. As a result, the private data leakage vulnerabilities which exist in the uncovered codes will be missed. Moreover, current malware can recognize dynamic monitors as the analyzed app executes, causing the app to pose as a benign program in these situations [25]. In order to solve the challenges of static analysis and dynamic analysis, some hybrid solutions are proposed. For example, HybriDroid [9] present a novel hybrid approach aims to automatically find privacy leakages in a given app set.

In addition, in recent years, with the development of artificial intelligence algorithms, it has become a trend to combine information flow analysis with artificial intelligence. Machine learning is a branch of artificial intelligence mainly treating information flow analysis as a classification problem. By analyzing the differences in features between benign applications and malicious applications, the features with statistical differences are selected, and then trained to classify [26,27,28,29,30,31]. In the feature extraction stage, static analysis method is generally used to extract features. [32] makes use of the similarity analysis of android application features of multiple dimensions to obtain the relevant rules of multiple dimensions of android application. [33] automatically learns security/privacy-related behaviors by analyzing user comments based on machine learning. In order to extract API data dependencies, [34] conducts context-sensitive, flow-sensitive and inter-process data flow analysis. [35] proposes a semantic-based feature extraction and detection method for malicious code, which extracts the key behaviors of malicious code and the dependencies between behaviors. The advantages of machine learning analysis are low implementation overhead and simple operation. The disadvantage is that it is influenced by the difference of training applications and the selection of characteristics. Besides, the current machine learning approach does not support analysis of inter-apps.

Most above analysis methods can be used for the analysis on the information flow within a single component or application. In addition, many researches are proposed for information flow analysis on the inter-communication between different components [2, 7, 11, 18, 19]. IccTA [2] combines multiple applications into one and performs intra-app analysis on the combined one. Covert [7] is a tool for analyzing vulnerabilities across applications that allows incremental analysis of applications while they are being installed, updated, or deleted. MR-Droid [19] empirically assesses ICC risks and tests for high-risk pairs. However, most of these approaches works in a global way, in which they analyze the flows across multiple components by modeling them as one combined entity. In other words, the whole model must be remodeled even if there is a little change on one component. And it would cost many efforts but with little benefits. Besides, flows in the unchanged component or application will be reverified during the analysis, which causes the additional verification load on mobile devices.

In order to reduce the overhead of verifying compositional information flow security, this paper presents a compositional approach to automatically verify security of information flow across multiple components or applications. This paper presents the following original contributions:

  1. (1)

    We define a formal model on individual application and sequential composite applications combined by inter-component communication for information flow analysis.

  2. (2)

    We make the formal security constraints on information flow for each participant across multiple applications.

  3. (3)

    We propose a compositional information flow verification approaches for secure inter-app communication among applications in android.

The rest of the paper is structured as follows. Section 2 presents the motivation examples for this study. Sections 3 and 4 defines the formal models and propose a security theorem for compositional information flow which verify with the verification framework in Sect. 5. Section 6 evaluates our methodology and Sect. 7 is conclusion.

2 Motivating Example

To illustrate our approach, we provide a concrete example of information transmission among different android’s apps through Inter-Component Communication (ICC). Android provides a flexible application level message known as Intent for communications between components. The example includes three apps. \(App_{1}\) contains GetDataActivity which obtains the user’s sensitive data such as phone number, e-health record and so on, which is shown in LIST1. \(App_{2}\) contains ForwardActivity, which receives sensitive data (user’s movement steps) from intent message MSteps and forwarding it to \(App_{3}\) by intent message Fsteps1. \(App_{2}\) is shown in LIST2. \(App_{3}\) contains ReceiveDataActivity which is responsible for receiving intent message and sends the data to a remote server which can exploit it at will.

figure a
figure b
figure c

More specifically, from line 4 to line 9 in LIST 1, GetDataActivity edits a e-health record and stores it in SharedPreferences which is a lightweight storage class on the Android platform. From line 10 to line 15, GetDataActivity sets the action of Intent, then gets the stored e-health data and subsequently sends it to ForwardActivity through an Intent message. The Intent filter which is defined in the Manifest file of App2 is responsible for receiving this Intent message and handle it. Likewise, ForwardActivity gets an Intent and receives users movement step data of the message from GetDataActivity (On line 4 in App2). From line 9 to 16, ForwardActivity sets the type of Intent and sends an Intent message to ReceiveDataActivity. In ReceiveDataActivity, it gets the Intent message and receives the data.

In this example, if \(App_{3}\) is malicious software or is monitored by an attacker, the sensitive data in \(App_{1}\) may be leaked to the attacker even though the information flow is secure in \(App_{1}\) and \(App_{2}\).

3 Android Application Model

In Android system, an application is composed by components which are described in a special file called Manifest. There are four kinds of components, i.e., activity, service, content provider and broadcast receiver. Activities construct user interface of an app. Each app may have multiple activities representing different screens of the application to the user. Services do not have user interface but perform time-consuming tasks in the background. Content providers act analogous to a database and provide access to a constructed set of data. Broadcast receivers listen to global events.

Referring to the android application model described in [7], an app model can be formally defined as follows.

Definition 1

A model for an android app is a tuple \(A_{i}={<}C_{i},I_{i},IF_{i}, Sec_{i}{>} \ 1 \le i \le n\), where

\(C_{i}\) is a set of components represent as \(C_{i}=\{\ c_{1},c_{2},\ldots , c_{m}\}\), and each \(c_{j} \ (j<m)\) is a component of \(A_{i}\). Each component contains a series of methods for executing the required functions. We use \(M_{i}\) to represent the set of methods that used in application \(A_{i}\).

\(I_{i}\) is a set of event messages called intents that can be used for both intra-and inter-app communications. Here we use \(In_{i,j}\) to represent an intent message set from Application \(A_{i}\) to Application \(A_{j}\) where \(In_{i,j} \subset I_{i}\).

\(IF_{i}\) is a set of Intent filters. Each intent filter is attached to a component and responsible for filtering implicit intents.

\(Sec_{i}\) is the set of security properties of all methods in \(A_{i}\).

Different applications can cooperate with each other to fulfill different user’s requirements through ICC. This paper studies a simplified type of inter-application communications, i.e., sequential inter-application communication. And we call these applications as sequential composite applications. Sequential composite applications is composed by a set of applications \(A_{1},A_{2},\ldots , A_{n}\) which communicates with each other in a sequential way. According to the characteristics of sequential composite application, its model can be defined as follow.

Definition 2

The composite apps \(A_{c}\) can be represented as a tuple \(A_{c}={<}AC,CI,CIF{>}\). where

  • AC is a sequential group of apps in which each app has only one predecessor and one successor.

  • CI is a collection of all intents using for communication between \(A_{i}\) and \(A_{i+1}\).

  • CIF is the set of all Intent filters from all the applications used as the entry point for the adjacent application communications.

Based on the above definition of the composite application model, the model of the example in Sect. 2 can be extracted as \(A_{c}\). \(AC=\{App_{1},App_{2},App_{3}\}\), where, \(App_{1}={<}\{GetDataActivity\},\{Msteps\},\{Interfilters\}{>}\). Each intent is shown in bold in the code in Sect. 2. Interfilters are defined in the corresponding application Manifest file.

4 Secure Information Model for Composite Application in Android

4.1 Security Label Model

As a device for storing and processing data, android phone contains a lot of sensitive information, such as e-health, contacts, and so on. According to the different sensitivity of information, we use multi-level security model to describe the security properties of data.

By referring to [20], security label model can be defined as a lattice \((SL,\ge )\), where SL is a finite set of security levels that is orderly by \(\ge \).

We define a function \(g:M_{i}\rightarrow sl\) to represent the security level of each method in application \(A_{i}\). Based on the security label model, security property in \(A_{i}\) can be represented by the security level on the methods.

4.2 Information Flow in Intra-app

For intra-app information flows, we use static analysis technique [4] to analyze them. In one application’s component, there are source methods that are responsible for accessing sensitive data such as phone numbers and sink methods that are responsible for outputting data [5]. During the execution of the application, data are received by different sources, processed by methods in component, and finally outputted by different sinks, which constructs different data prorogation paths. And we can define the information flow within an intra-app as follows.

Definition 3

The information flow of android’s app \(A_{i}\) can be represented as a tuple \(flow={<}source,sink{>}\) where, \(source,sink\in M_{i}\).

Based on the above description, we define \(Flow_{i}=\{{<}source,sink{>}|source,sink\in M_{i},i\in N\}\) as the set of all flows in \(A_{i}\).

Combining with the multi-level security model, sl(source) and sl(sink) are used to represent the security levels of sources and sinks. According to the definition of non-interference [13], the security of information flow in a intra-app can be formally defined as follows.

Definition 4

The information flows in application \(A_{i}\) are secure if it satisfies that for \(\forall source,sink\in M_{i}\), \(sl(source)>sl(sink)\), there is no existence of information flow from source to sink, namely \({<}source,sink{>} \notin Flow_{i}\).

4.3 Secure Information Flow in Composite Applications

Considering sequential group apps \(A_{1},A_{2},\ldots ,A_{n}\), when applications communicates with each other by intents, data are passed from the sink method of one component in \(A_{i}\) to the source method of component in \(A_{i+1}\), which forms an inter-app information flow across multiple applications. And the following apps \(A_{j}(j>i)\) may leak data despite the information flow in \(A_{i}\) is secure. The Fig. 1 shows the intra and inter flows in our example in Sect. 2.

Fig. 1.
figure 1

Composite information flow

For sequential composite application \(A_{C}\), the definitions of inter-app information flow can be given as follows.

Definition 5

For \(\forall source_{i}\in M_{i}\), \(\forall sink_{j}\in M_{j}\), there is a inter flow \(flow_{i,j}={<}source_{i},sink_{j}{>}\) from \(A_{i}\) to \(A_{j}\), if they satisfy one of the following conditions.

  1. (1)

    For \(i=j-1\), there \(\exists sink_{i}\in M_{i}\), \(source_{j}\in M_{j}\) that satisfy \({<} sink_{i},source_{j} {>} \in In_{i,j}\).

  2. (2)

    For \(i\ne j-1\), \(\exists A_{k}, 1<k<j; \exists source_{k}\in M_{k}, \exists sink_{k+1}\in M_{k+1}\), and they satisfy that \(\exists flow_{i,k}={<}source_{i},sink_{k}{>}, \exists flow_{k+1,j}={<}source_{k+1},sink_{j}{>}\) and \(\exists {<}sink_{k},source_{k+1}{>}\in In_{k,k+1}\).

According to the above description of the inter-app information flow, we use \(Flow_{inter}\) to represent the set of all inter-app information flows in the sequential composite applications where \(Flow_{inter}=\bigcup _{0 \le i, j \le n} flow_{i,j}\).

Then we can obtain the following definition on secure information flow in composite application.

Definition 6

The information flows in sequential composite application \(A_{C}\) are considered secure iff it satisfies that for each \(A_{i}\in A_{C}\), for \(\forall source_{i}\in M_{i}, \forall sink_{j}\in M_{j} (i\le j)\), there is no existence of information flow \(source_{i}\) to \(sink_{j}\), namely \({<}source_{i}, sink_{j}{>} \notin Flow_{i} \cup Flow_{inter}\).

Based on the composite information flow security definition above, we can obtain the following theorem.

Theorem 1

In the security sequential combinatorial application \(A_{C}\)s, the information flow must satisfies following conditions.

  1. (1)

    For \(\forall {<}source,sink{>}\in Flow_{i}\) in \(A_{i}\), there is \(sl(source)\le sl(sink)\).

  2. (2)

    For \(\forall {<}sink_{i},source_{i+1}{>}\in In_{i,j}\), there is \(sl(sink_{i})\le sl(source_{i+1})\).

We can use the mathematical induction to prove the theorem by referring to [20].

5 Compositional Information Flow Verification for Composite Application Android System

Android inter-app communication is a basic behavior that usually occurs during system running. For example, wechat accesses health data and makes statistical ranking. The famous social software Weibo adds friends through visiting contacts. In order to ensure the data security across multiple applications, we propose an compositional information flow security Theorem 1. According to the Theorem 1, we can infer that for a sequential composite application, the information flow security verification procedure includes two different phases, i.e., the intra-flow verification and inter-flow verification.

5.1 Intra Flow Verification in Single Application

In the process of intra-flow verification, we first use the flowdroid tool [4] to obtain the application’s sensitive information flow. Then each flow is verified according to condition (1) in Theorem 1. If the flows in the application is valid, the certificate is generated which can be used for the inter flow verification to avoid the repeated verification. The certificate includes all essential information for inter flow verification, e.g., the security level on each source and sink method and so on. The procedure for flow validation and certificate generation is shown in Algorithm 1.

figure d

After successful verification, the generated certificate will be stored in the database. This procedure can be executed during the application is going to be installed on the system at the first time. And only secure ones can obtain the certificates while the others are not allowed to be installed. The counterexample will also be return to users.

5.2 Compositional Flow Verification for Inter-application Communications

According to the condition (2) in Theorem 1, the security on the inter-app flows can be ensured by verification on the inter flows between the adjacent applications. The compositional verification algorithm is as follows.

figure e

In the compositional verification process, the certificate is obtained first and then the adjacent application’s inter-flows are verified. If the validation is successful, the result is returned to the user. Otherwise, return counterexample of an insecure flow.

6 Implementation and Evaluation

This paper mainly studies the compositional application’s information flow security verification approach. In this section, we experimentally compare the verification time overhead of our approach with the global verification approach. Our approach has been described in Sect. 5. The global approach first uses ApkCombiner [21] to combine multiple applications into one, and then use flowdroid [4] for information flow analysis and verification. The basic experiment configuration is shown in Table 1 and verification results are shown in Figs. 2 and 3. In Table 1, the Applications number refers to the number of applications tested, and Combined applications number refers to the number of compositional applications formed.

Table 1. Basic configuration
Fig. 2.
figure 2

The verification time increases with the inter-app communication number

Fig. 3.
figure 3

Verification time increases with the number of composite applications

Figures 2 and 3 show that the verification time of the global method is much higher than that of our approach. The reason is that when the communication among different applications changes, the global approach needs to reverify all the flows in the whole composite application. On the contrary, our approach only needs to verify the relevant inter flows between the applications according to certificate, which saves lots of costs on the reverification on intra flows in the same application. Besides, our compositional verification algorithm is easy to extend. With the increasing number of steps n, we only need to verify the additional flows between \(A_{n}\) and \(A_{n+1}\) to ensure the security of the composite application.

7 Conclusion

ICC is used to communicate among multiple applications, which may cause leakage on users’ sensitive data. In this paper, we design the formal model of android application and sequential composite applications by ICC. Then, through the analysis of the intra and inter information flows, we get the security theorem on sequential composite application. Based on the theorem, we design a compositional information flow security verification algorithm. The process of information flow verification in intra and inter app is separated to avoid repeated verification in application when communication changes. Finally, we compared our approach with the global verification through the experimental evaluation. And the results show that our approach can reduce the overhead of information flow verification effectively. Since our approach relies on the accuracy of the information flow recognized by flowdroid [4], in the future, we are going to improve the precision of information flow validation by using machine learning to identify source and sink more accurately.