1 Introduction

As wireless and mobile networks flourish, the flexibility, convenience and low cost of VoIP applications have led to their wide use in the enterprise and consumer markets. End users have become accustomed to making phone calls over the public internet rather than via the public switched telephone network (PSTN). In addition to voice, video and multimedia transmission also benefit from this technology. The current mainstream VoIP signaling protocol is the Session Initiation Protocol (SIP) [1], which uses text-based signals to establish, modify and terminate media transmission sessions. Media streams, such as voice and video over IP networks, are transmitted using the Real-Time Transport Protocol (RTP) [2]. The maturity of VoIP standards and of quality of service (QoS) on IP networks has opened up many services, such as the IP Multimedia Subsystem [3], online video conferencing, and video on demand (VoD).

In view of the numerous advantages of SIP applications, security becomes a prerequisite. In 2013, Edward Snowden, an American computer professional, former CIA employee, and former National Security Agency (NSA) contractor, revealed numerous global surveillance programs run by the NSA and the Five Eyes with the cooperation of telecommunication companies and European governments. This news led governments and enterprises to focus on the security of media transmitted over IP networks. Before that, many security threats had been studied [4, 5], such as Denial of Service (DoS) [6], SIP malformed message attacks [7] and attacks that abuse SIP authentication [8]. In response, many security frameworks and schemes [7, 9] were proposed. Most of the proposed schemes emphasize authentication in the registration phase or the protection of SIP signals. However, no concrete, complete system has been implemented.

In 2012, Shen et al. [10] presented the impact of Transport Layer Security (TLS) on SIP servers. They used TLS to establish a secure channel before sending SIP signals and showed that TLS reduces performance compared with the typical case of SIP over UDP. The cost of the RSA operations used for session negotiation is the primary factor. Their experiment ran on an Intel-based server, not on a mobile device.

In [11], Ashok et al. proposed a mechanism for enhancing the privacy of voice calls by using ECDH. The ECC key agreement is implemented in an Asterisk Gateway Interface (AGI) [12] server located between the Asterisk servers. The mobile phones transmit voice packets to the Asterisk servers, and the voice data are then encrypted and decrypted between the Asterisk servers using the ECDH key. This mechanism does not provide end-to-end privacy. Moreover, the unauthenticated Diffie–Hellman key exchange is vulnerable to a man-in-the-middle (MitM) attack, in which an attacker intercepts the public values and sets up a different session key with each of the two communicating parties. The attacker can thus eavesdrop on the call. Elliptic Curve Menezes–Qu–Vanstone (ECMQV) is an authenticated key agreement protocol that provides protection against MitM attacks.

Securing SIP signals and media packets is one of the most important tasks in a SIP-based VoIP environment. With this in mind, we implement a VoIP system that offers a secure communication environment.

In this paper, we make the following contributions.

  • We present a feasible and secure SIP-based VoIP system. The system uses the Java security package and the OpenSSL library [13] to implement TLS, which protects the SIP signals. Furthermore, VoIP calls perform key agreement by integrating ECDH and ECMQV, and the agreed key is used to secure voice packets, which realizes the encryption portion of SRTP [14]. To handle the problem of firewalls blocking RTP packets, we follow Ref. [15] to implement NAT traversal.

  • We use Android-based smartphones to run key agreements with different algorithms (ECDH or ECMQV) on elliptic curves recommended by the National Institute of Standards and Technology (NIST) [16]. We report the performance of up to 20 combinations, namely two algorithms with ten sets of elliptic curve parameters. The results are a useful reference for users who want to implement elliptic curve cryptosystems (ECC) [17] on mobile devices.

  • Only legitimate users should be able to access our VoIP resources, so we also propose an efficient and secure authentication mechanism for the SIP registration process. We use the unique International Mobile Equipment Identity (IMEI) of the mobile device and the unique International Mobile Subscriber Identity (IMSI) of the subscriber identification module (SIM) card to generate the SIP REGISTER signal.

In past years, various security schemes have been proposed, such as Password Authenticated Key Exchange (PAKE) based schemes [18, 19], hash and symmetric encryption based schemes [20, 21], Public Key Cryptography (PKC) based schemes [22, 23] and so forth. However, without implementations, these schemes cannot provide concrete references to users. The proposed system integrates the key parameters into the Session Description Protocol (SDP) [24] to achieve key agreement, and the experimental data make users aware of the overhead.

The remainder of this paper is organized as follows. The next section provides a brief background of SIP and TLS. Section 3 gives a brief overview of the cryptosystems used, namely ECC, ECDH and ECMQV. In Sect. 4, we describe our secure VoIP system. Section 5 evaluates the performance of different combinations of elliptic curves and algorithms and presents the experimental results. Finally, we conclude this paper in Sect. 6.

2 Background

2.1 SIP overview

SIP is an application-layer signaling and control protocol that is commonly used for VoIP communication. SIP defines two essential types of entities: user agents (UAs) and SIP servers. SIP servers comprise registrar servers and proxy servers. Registrar servers are responsible for location management and proxy servers for message forwarding. SIP is based on request/response transactions, in a similar manner to the Hypertext Transfer Protocol (HTTP). Proxying, which means SIP message forwarding, is a critical function in the SIP infrastructure.

The standard application functionalities, such as authentication, authorization and media session setup, all require the proxy server to keep session state information.

Figure 1 shows a typical message flow of SIP proxying. The User Agent Client (UAC) and User Agent Server (UAS) represent the caller and callee of a media session. First, the UAC and the UAS send REGISTER messages to the SIP proxy server. The REGISTER message contains the credentials of the UAC or UAS that verify its claimed identity (e.g., generally based on the MD5 digest algorithm [25]). After successful authentication, the SIP proxy server responds with 200 OK messages to the UAC and the UAS respectively. The authentication information is optional; however, it is commonly deployed between UAs and their first-hop SIP server so that only legitimate UAs can access resources.

Fig. 1 SIP register and call setup flow

When the UAC wants to establish a session with the UAS, it first sends an INVITE message to the proxy server. The proxy server responds to the UAC with a 100 Trying message to indicate that the INVITE has been received. The proxy server then checks the contact address for the SIP URI and forwards the message to the UAS. After receiving the INVITE message, the UAS acknowledges receipt with a 180 Ringing message and the callee’s phone rings. When the callee picks up the phone, the UAS sends a 200 OK message. Both the 180 Ringing and 200 OK messages are routed back to the UAC through the proxy server. On receiving the 200 OK message, the UAC generates an ACK message in response. The media session is then established, and both endpoints use a media protocol, such as RTP, to communicate directly. When the conversation is over, the UAC hangs up and sends the UAS a BYE message, which is forwarded by the proxy server. The UAS then sends a 200 OK message in response. Figure 1 presents a basic flow, but in real networks it is common to have multiple proxy servers between UAs.

2.2 TLS overview

This section gives a brief description of the TLS protocol. For more detail, see [26, 28].

The TLS protocol has three subprotocols that are used to control the session connection [29]: the handshake, change cipher spec, and alert protocols. The handshake protocol is used to negotiate the session parameters. The alert protocol is used to notify the other party of an error condition. The change cipher spec protocol is used to change the cryptographic parameters of a session. In this paper, we focus on the handshake protocol. The handshake protocol consists of a series of message exchanges between the client and the server. It allows the participants to negotiate a specific cipher suite, which includes key establishment, digital signature, confidentiality and integrity algorithms. For example, TLS_RSA_WITH_AES_256_CBC_SHA is a cipher suite indicating that the RSA public key algorithm is used for shared secret key exchange and authentication; 256-bit AES in Cipher Block Chaining mode is used for bulk data encryption; and SHA-1 [30] is used as the message digest algorithm to compute the Message Authentication Code.
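The structure of the cipher suite name in the example above can be illustrated with a short sketch. The function name `parse_cipher_suite` and the returned field names are hypothetical, and the parsing only handles names of the form TLS_<key exchange>_WITH_<cipher>_<MAC>.

```python
def parse_cipher_suite(name):
    """Split a suite name of the form TLS_<kx>_WITH_<cipher>_<mac> into parts."""
    protocol, _, rest = name.partition("_")               # "TLS", "RSA_WITH_..."
    key_exchange, _, cipher_and_mac = rest.partition("_WITH_")
    *cipher_parts, mac = cipher_and_mac.split("_")        # last token names the MAC hash
    return {
        "protocol": protocol,
        "key_exchange": key_exchange,        # also used for authentication here
        "cipher": "_".join(cipher_parts),    # bulk encryption algorithm and mode
        "mac": mac,                          # message digest used for the MAC
    }

suite = parse_cipher_suite("TLS_RSA_WITH_AES_256_CBC_SHA")
print(suite)
```

For the suite used in this paper, the sketch separates RSA (key exchange and authentication), AES_256_CBC (bulk encryption) and SHA (MAC digest).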

The TLS handshake protocol has three variants: the normal TLS handshake, the mutual TLS handshake and the resumed TLS handshake. The normal TLS handshake is a reduced version of the mutual TLS handshake that does not request authentication of the client. Figure 2 presents the process of the mutual TLS handshake.

Fig. 2 Mutual TLS handshake process

First, the client launches the handshake with a ClientHello message, which contains the protocol version, the list of cipher suites and the compression algorithms that the client supports. To prevent replay attacks, a random number and a timestamp are included in the message. After receiving the ClientHello message, the server fixes the protocol version and chooses the cipher suite and the compression algorithm from among those proposed by the client. The server then sends back a ServerHello message indicating which cipher suite it accepts. The ServerHello message also contains a timestamp, a random number that is part of the key material, and an optional session_id that the client can use later to resume the session. Next, the server sends the Certificate message, which carries the server’s X.509 certificate containing its public key. The server then transmits a CertificateRequest message to request the client’s certificate. Finally, the server sends a ServerHelloDone message to indicate that all of its messages in this phase have been sent. On receiving the server’s CertificateRequest message, the client responds with a Certificate message containing the client’s certificate with its public key. To authenticate the server, the client verifies the server’s certificate using the public key of the Certificate Authority (CA). After this verification, the client obtains the server’s public key from the certificate. The client then generates a pre_master_secret and encrypts it with the server’s public key. Next, the client sends the server a ClientKeyExchange message with the encrypted pre_master_secret and a CertificateVerify message containing a digital signature made with the client’s private key. The server can authenticate the client using the client’s public key and decrypt the encrypted pre_master_secret with its own private key.

Based on the same pre_master_secret, the server and the client can both compute a common master_secret, which is used to generate shared symmetric keys for message authentication and bulk data encryption. The ChangeCipherSpec message, which both the server and the client send, indicates that the sender has switched to the newly negotiated algorithms. Lastly, each side transmits a Finished message to the other party to ensure the integrity of the handshake. The Finished message contains a digest computed with the negotiated master_secret over the preceding handshake messages.
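As a minimal sketch of how a server can be configured to require the mutual handshake described above, the following uses Python's standard `ssl` module. It is not the implementation of this paper, which relies on the Java security package and OpenSSL; the function name and the optional file paths are hypothetical.

```python
import ssl

def make_mutual_tls_server_context(cert_file=None, key_file=None, ca_file=None):
    """Build a server-side context that demands a client certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED makes the server ask for and verify the client's
    # certificate, i.e. the mutual rather than the normal handshake.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # Hypothetical paths to the server certificate and private key.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if ca_file:
        # Hypothetical path to the CA certificate used to verify clients.
        ctx.load_verify_locations(cafile=ca_file)
    return ctx

ctx = make_mutual_tls_server_context()
```

With `verify_mode` left at its server-side default, the same context would perform the normal handshake without client authentication.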

Within a configured interval, the resumed TLS handshake allows the server and the client to restore the session information, including the chosen algorithms and the master_secret. The resumption mode avoids the cost of negotiating a new pre_master_secret.

3 Cryptosystem

3.1 Elliptic curve cryptosystem

In 1985, Koblitz [17] and Miller [31] proposed public key cryptosystems using the group of points on an elliptic curve. The primary advantage of elliptic curve systems is that one can use an elliptic curve group that is smaller in size while maintaining the same level of security. The result is smaller key sizes, bandwidth savings, and faster implementations, features that are especially attractive for security applications where computational power and integrated circuit space are limited, such as smart cards and mobile devices. Table 1 [32] compares the cipher strength of Rivest–Shamir–Adleman (RSA) and ECC.

Table 1 Comparison of security strength

Table 2 [33] compares the computing time of ECC and RSA. The computation time needed to break an ECC system based on the ECDLP with a key length of 160 bits is comparable to that needed to break RSA with a key length of 1024 bits. Today, in practice, elliptic curve groups over the finite fields \(F_{p}\) and \(F_{2}^{m}\) are used. Over \(F_{p}\), an elliptic curve is defined by an equation of the form \(y^{2} = x^{3} + ax + b\), where a and b are constants satisfying \(4a^{3} + 27b^{2} \ne 0\). Over \(F_{2}^{m}\), an elliptic curve is defined by an equation of the form \(y^{2} + xy = x^{3} + ax^{2} + b\), where \(b \ne 0\). To form an abelian group, the set of points on an elliptic curve is extended with O, the point at infinity, which serves as the identity element of the group operation. The operations include the addition of two points and the doubling of a point. The rules are given in [17].

Table 2 Comparison of computation time

The total number of points on a curve, written as #E(\(F_{p}\)) or #E(\(F_{2}^{m}\)), is referred to as the order of the curve. The ECDLP is defined as follows: given P \(\in\) E(\(F_{p}\)) and Q = [a]P, find a.
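The curve equation over \(F_{p}\) and the notion of curve order can be checked on a toy example. The sketch below assumes the small textbook curve \(y^{2} = x^{3} + 2x + 2\) over \(F_{17}\), which is far too small for real use and is not one of the NIST curves evaluated in this paper; the function names are hypothetical.

```python
P, A, B = 17, 2, 2  # toy curve y^2 = x^3 + 2x + 2 over F_17 (illustration only)

def is_nonsingular(a, b, p):
    """Check the condition 4a^3 + 27b^2 != 0 (mod p)."""
    return (4 * a**3 + 27 * b**2) % p != 0

def affine_points(a, b, p):
    """Enumerate all (x, y) in F_p x F_p that satisfy the curve equation."""
    return [(x, y) for x in range(p) for y in range(p)
            if (y * y - (x**3 + a * x + b)) % p == 0]

points = affine_points(A, B, P)
order = len(points) + 1  # add one for the point at infinity O
print(is_nonsingular(A, B, P), order)
```

Brute-force enumeration like this is only feasible because the field is tiny; for curves of cryptographic size the order is a published domain parameter.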

3.2 Elliptic curve Diffie–Hellman key agreement protocol

The D–H key agreement protocol is one of the earliest practical methods of exchanging keys over an insecure channel. The original D–H protocol is based on the discrete logarithm problem. In this protocol, if Alice and Bob want to set up a random secret (session) key for their private key system, they first publicly agree on a cyclic group, G, of order n and a generator, g, of the group. Then, Alice randomly chooses an integer, a \(\in\) [1, n − 1], and sends \(g^{a}\) to Bob. Likewise, Bob computes \(g^{b}\) for a random integer, b \(\in\) [1, n − 1], and sends it to Alice. The secret key, \(g^{ab}\), is then set up: Alice computes it as \((g^{b})^{a}\) and Bob computes it as \((g^{a})^{b}\).
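The exchange above can be traced with small numbers. This sketch assumes the multiplicative group modulo the prime 23 with generator 5 and fixed secrets 6 and 15; these values are chosen only for illustration, and real deployments use random secrets in much larger groups.

```python
p, g = 23, 5          # public parameters: prime modulus and generator (toy values)
a, b = 6, 15          # private values of Alice and Bob (fixed here, random in practice)

A_pub = pow(g, a, p)  # Alice sends g^a to Bob
B_pub = pow(g, b, p)  # Bob sends g^b to Alice

alice_key = pow(B_pub, a, p)  # (g^b)^a
bob_key = pow(A_pub, b, p)    # (g^a)^b
print(alice_key, bob_key)
```

Both sides arrive at the same value without ever transmitting a or b, but nothing in the exchange authenticates who sent \(g^{a}\) or \(g^{b}\).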

The ECDH key agreement protocol applies the D–H key agreement protocol to an elliptic curve group, relying on the ECDLP, to compute the session key [ab]P. Table 3 defines the domain parameters for the ECC schemes [36].

Table 3 Elliptic curve cryptography domain parameters

The process of the elliptic curve Diffie–Hellman key exchange protocol is described in [37].

3.3 Elliptic curve Menezes–Qu–Vanstone key agreement protocol

In the ECMQV protocol [38], both parties are assumed to have long-term public and private key pairs. For example, Alice has a static key pair with [a]G as the public key and a as the private key. Bob likewise has the static key pair [b]G and b. To agree on a shared secret, Alice and Bob each generate a transient key pair, ([c]G, c) for Alice and ([d]G, d) for Bob. They then exchange the public keys of these transient key pairs as in the standard ECDH protocol shown in Fig. 3.

Fig. 3 Elliptic curve Diffie–Hellman key agreement protocol

After exchanging the public keys, Alice knows

$$a,c, \, \left[ a \right]G, \, \left[ c \right]G, \, \left[ b \right]G\;{\text{and}}\; \, \left[ d \right]G$$

and Bob knows

$$b,d, \, \left[ b \right]G, \, \left[ d \right]G, \, \left[ a \right]G\;{\text{and}}\; \, \left[ c \right]G$$

The shared secret is then computed by Alice according to the following algorithm:

ECMQV key derivation

INPUT: A set of domain parameters (#E(\(F_{p}\)), q, h, G) and a, c, [a]G, [c]G, [b]G and [d]G

OUTPUT: A shared secret Q

1. \(n \leftarrow \left\lceil \log_{2} (\# E(F_{p})) \right\rceil / 2\)

2. \(u \leftarrow (x([c]G) \bmod 2^{n}) + 2^{n}\), where x([c]G) is the x-coordinate of the public key [c]G converted to an integer

3. \(s \leftarrow c + ua \pmod q\)

4. \(v \leftarrow (x([d]G) \bmod 2^{n}) + 2^{n}\), where x([d]G) is the x-coordinate of the public key [d]G converted to an integer

5. \(Q \leftarrow [s]([d]G + [v]([b]G))\)

6. If Q is the point at infinity, go to step 1.

7. Output Q.

Bob computes the same point Q by replacing the parameters (a, c, [a]G, [c]G, [b]G and [d]G) in the above algorithm with (b, d, [b]G, [d]G, [a]G and [c]G). The shared secret Q is then agreed.
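To illustrate that both parties reach the same point Q, the derivation above can be sketched on a toy curve. The sketch assumes the textbook curve \(y^{2} = x^{3} + 2x + 2\) over \(F_{17}\) with G = (5, 1) of prime order 19, rounds n up to an integer, and omits the retry of step 6. All names and key values are hypothetical and unrelated to the NIST curves used in the experiments.

```python
import math

# Toy curve y^2 = x^3 + 2x + 2 over F_17; G = (5, 1) generates a group of
# prime order 19. Far too small for real use, chosen only for illustration.
P_MOD, A_COEF = 17, 2
G, ORDER = (5, 1), 19
INF = None  # the point at infinity O

def add(p1, p2):
    """Add two points on the toy curve."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF  # P + (-P) = O
    if p1 == p2:
        lam = (3 * x1 * x1 + A_COEF) * pow(2 * y1, -1, P_MOD)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def mul(k, pt):
    """Scalar multiplication [k]pt by double-and-add."""
    result = INF
    while k > 0:
        if k & 1:
            result = add(result, pt)
        pt = add(pt, pt)
        k >>= 1
    return result

# Assumption: n is rounded up to an integer so that 2**n is well defined.
N_BITS = math.ceil(math.ceil(math.log2(ORDER)) / 2)

def avf(pt):
    """Steps 2 and 4: (x mod 2^n) + 2^n for the x-coordinate of pt."""
    return (pt[0] % 2**N_BITS) + 2**N_BITS

def ecmqv(static_priv, eph_priv, eph_pub, peer_static_pub, peer_eph_pub):
    s = (eph_priv + avf(eph_pub) * static_priv) % ORDER        # step 3
    v = avf(peer_eph_pub)                                      # step 4
    return mul(s, add(peer_eph_pub, mul(v, peer_static_pub)))  # step 5

a, c = 3, 7  # Alice: static and transient private keys (toy values)
b, d = 5, 4  # Bob: static and transient private keys (toy values)
pub_a, pub_c, pub_b, pub_d = mul(a, G), mul(c, G), mul(b, G), mul(d, G)

q_alice = ecmqv(a, c, pub_c, pub_b, pub_d)
q_bob = ecmqv(b, d, pub_d, pub_a, pub_c)
print(q_alice == q_bob)
```

Because the shared point depends on both static private keys, an attacker who only substitutes transient public keys cannot compute it, which is the property that distinguishes ECMQV from plain ECDH.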

4 The proposed VoIP system

The proposed VoIP system is presented in Fig. 4. User equipment (UE) such as mobile devices can access the VoIP services over the internet through 3G/4G base stations or WiFi access points. SIP proxy servers and RTP relay servers are deployed behind a firewall that defends against malicious internet attacks. The certificate authority (CA) and the database (DB) server, which form the kernel of the system, are placed behind a second firewall and an intrusion prevention system.

Fig. 4 The architecture of the proposed VoIP system

In this architecture, we assume that the CA and the DB are well protected (in reality, a successful intrusion would require the in-use certificates to be revoked and new ones to be issued). Clients are equipped with the requisite certificates containing the public keys of the server and of the callees.

First, the SIP client must register and be authenticated by the SIP server using the proposed authentication mechanism. Then, when two SIP clients want to establish a media session, they use SIP messages that integrate the ECMQV protocol to achieve key agreement and use the agreed session key to encrypt the RTP packets with AES-256. The SIP messages are protected by a secure communication channel provided by Transport Layer Security. The authentication mechanism and the integration of SIP with the ECMQV key agreement protocol are described in the following.

4.1 Authentication mechanism

A client with our SIP application installed first sets up a TLS-secured channel with the SIP server. The UE then sends its IMSI and IMEI to the server via the secure communication channel. After receiving the IMSI and IMEI, the server encrypts this information and stores it with the corresponding SIP account in the database server. When the client wants to register with the SIP server, it starts by computing the authentication code as follows:

  • SHA256(IMSI|CSeq) | SHA256(IMEI|CSeq) | SHA256(CSeq) | SIP_ACCOUNT

Then the client appends the authentication code to the SIP REGISTER message and sends it to the SIP server. The SIP REGISTER message, captured with the Wireshark packet analyzer, is illustrated in Fig. 5. The first picture in Fig. 5 presents the encrypted application data in TLS, which cannot be read. In order to explain the modified REGISTER message, we temporarily disabled the TLS protocol to show the content.

Fig. 5 SIP register message

The registration mechanism incorporates the IMSI and IMEI to bind the application to the user’s mobile device, in order to prevent an attacker from installing the application on another device to impersonate the user. The widely used authentication based on SMS may be broken by redirecting the SMS authentication code. New accounts can register with the system using new IMSIs and IMEIs, and the original users can request a change to their existing IMSIs and IMEIs from the settings.

At the first registration, the SIP server stores the CSeq in the database and extracts the IMSI and IMEI from the database according to the SIP account. Next, the SIP server computes the authentication code in the same way and compares it with the client’s. If the two match, the SIP server responds to the client with a 200 OK message; otherwise, it sends a 403 FORBIDDEN message.

After the first registration, the subsequent register messages will adhere to the following process:

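if (exists(SIP_ACCOUNT) == true) {
  // accept only a non-zero CSeq below the stored value plus the tolerance of 250
  if (CSeq != 0 && CSeq < (CSeq[DB] + 250)) {
    // recompute the authentication code from the stored IMSI and IMEI
    AUTH_server = (SHA256(IMSI[DB]|CSeq) | SHA256(IMEI[DB]|CSeq) |
                   SHA256(CSeq) | SIP_ACCOUNT);
    if (!strcmp(AUTH_client, AUTH_server))  // strcmp returns 0 when the codes match
      send(200_OK);
    else
      send(403_FORBIDDEN);
  } else
    send(403_FORBIDDEN);
} else
  send(403_FORBIDDEN);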

 

where CSeq is received from the client, and CSeq[DB], IMSI[DB] and IMEI[DB] are extracted from the database. Considering that packets may be lost over the internet, the SIP server tolerates a CSeq difference of up to 250.
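The registration check can be sketched in runnable form as follows. The sketch assumes that "|" denotes string concatenation, that digests are hex-encoded, and that the database is a simple dictionary; the function names, the sample IMSI, IMEI and account, and the constant-time comparison are illustrative choices rather than details of our implementation.

```python
import hashlib
import hmac

CSEQ_TOLERANCE = 250  # accepted CSeq window, as in the process above

def sha256_hex(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def auth_code(imsi, imei, cseq, account):
    """SHA256(IMSI|CSeq) | SHA256(IMEI|CSeq) | SHA256(CSeq) | SIP_ACCOUNT."""
    cseq = str(cseq)
    return (sha256_hex(imsi + cseq) + sha256_hex(imei + cseq)
            + sha256_hex(cseq) + account)

def check_register(db, account, cseq, client_code):
    """Return the SIP status code the server would send for a REGISTER."""
    record = db.get(account)
    if record is None:
        return 403  # unknown SIP account
    if cseq == 0 or cseq >= record["cseq"] + CSEQ_TOLERANCE:
        return 403  # CSeq outside the accepted window
    server_code = auth_code(record["imsi"], record["imei"], cseq, account)
    # compare_digest avoids leaking the mismatch position through timing
    match = hmac.compare_digest(client_code.encode(), server_code.encode())
    return 200 if match else 403

# Hypothetical stored record and client request
db = {"alice@example.org": {"imsi": "466920123456789",
                            "imei": "356938035643809", "cseq": 1}}
code = auth_code("466920123456789", "356938035643809", 5, "alice@example.org")
print(check_register(db, "alice@example.org", 5, code))
```

A request from another device fails the comparison because its IMSI and IMEI produce different digests, even if the account name and CSeq are known.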

4.2 Integration of SIP and ECMQV

When two SIP clients want to establish a media session, the caller (UAC) first generates the ephemeral key pair ([c]G, c). The caller then attaches the public key ([c]G) to the Session Description Protocol (SDP) [39] information in the SIP INVITE message and sends the message to the SIP server. After receiving the SIP INVITE message, the SIP server, following [40, 41], queries the RTP server to obtain two audio communication ports that are allowed through the firewall and modifies the SDP as follows:

Original SDP:

  • c = IN IP4 10.197.134.175

  • m = audio 49170 RTP/AVP 18

  • a = rtpmap:18 G729/8000/1

  • k = PK:ce15ad9708e3c406255afc01784e480681d22a6b225a8148465d4b6118e047a43f3777c44752bbb61ae5f264deab64c9916b9890d17179abbd606b92bf52480830de6ea686ad1e2592f32235426446d1246f7410c962179ae14b77d62cec81fb56570877b397f03045c6c432a22616b3d31d033f3cedf9ee9c72f157fe99580cc03d

Modified SDP:

  • c = IN IP4 140.118.122.145

  • m = audio 20000 RTP/AVP 18

  • a = rtpmap:18 G729/8000/1

  • k = PK:ce15ad9708e3c406255afc01784e480681d22a6b225a8148465d4b6118e047a43f3777c44752bbb61ae5f264deab64c9916b9890d17179abbd606b92bf52480830de6ea686ad1e2592f32235426446d1246f7410c962179ae14b77d62cec81fb56570877b397f03045c6c432a22616b3d31d033f3cedf9ee9c72f157fe99580cc03d
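The SDP modification shown above can be sketched as a small rewriting function. The sketch assumes standard SDP lines of the form type=value without surrounding spaces, uses a shortened hypothetical key value in place of the full public key, and the function name `rewrite_sdp` is not part of our implementation.

```python
def rewrite_sdp(sdp, relay_ip, relay_port):
    """Point the connection address and audio port at the RTP relay."""
    out = []
    for line in sdp.splitlines():
        if line.startswith("c="):
            # c=<nettype> <addrtype> <connection-address>
            nettype, addrtype, _ = line[2:].split(" ", 2)
            line = f"c={nettype} {addrtype} {relay_ip}"
        elif line.startswith("m=audio "):
            # m=<media> <port> <proto> <fmt ...>
            media, _, rest = line[2:].split(" ", 2)
            line = f"m={media} {relay_port} {rest}"
        out.append(line)  # a= and k= lines pass through unchanged
    return "\r\n".join(out) + "\r\n"

original = ("c=IN IP4 10.197.134.175\r\n"
            "m=audio 49170 RTP/AVP 18\r\n"
            "a=rtpmap:18 G729/8000/1\r\n"
            "k=PK:0123abcd\r\n")  # hypothetical shortened key
modified = rewrite_sdp(original, "140.118.122.145", 20000)
print(modified)
```

Only the address and port change, so the public key carried in the k= line reaches the callee exactly as the caller sent it.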

Next, the SIP server forwards the SIP INVITE message to the callee (UAS). The callee also generates an ephemeral key pair ([d]G, d) and attaches the public key ([d]G) to the SDP as the caller does. Furthermore, the callee computes the shared secret point Q and calculates H(Q) = (k, k′), where H is a hash function. The callee uses k′ to generate a Message Authentication Code (MAC) and appends the MAC to the SDP as well.

$$M = MAC_{k^{\prime}} \left( {2,{\text{ Callee}},{\text{ Caller}}, \, \left[ d \right]G, \, \left[ c \right]G} \right)$$

The callee then transmits the SIP 200 OK message to the SIP server. The SIP server modifies the communication IP address and the port number in the SDP and conveys the SIP 200 OK message to the caller.

After receiving the callee’s public key, the caller also computes the shared secret point Q and calculates H(Q) = (k, k′). The caller then uses k′ to verify the MAC contained in the SDP. If the value does not match, the caller terminates the session. Otherwise, the caller generates \(M^{\prime} = MAC_{k^{\prime}}(3, \text{Caller}, \text{Callee}, [c]G, [d]G)\) and attaches it to the SDP of the SIP ACK message. The caller then sends the SIP ACK message to the SIP server, which forwards it to the callee.

Finally, the callee uses k′ to verify M′ in the SDP. If the verification fails, the call is not set up. Otherwise, the session is established and both clients use k to encrypt the subsequent RTP packets. The whole process is presented in Fig. 7; the TLS handshake phase follows Fig. 2. Before the media session, the SIP messages are protected by the TLS-secured channel, in which SIP payloads are encrypted with the key agreed between the server and the client. After the media session is established, the clients use the agreed session key to encrypt the RTP payloads, so that even the SIP server or the RTP server cannot eavesdrop.
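The key confirmation exchange can be sketched with standard primitives. The sketch assumes that H is SHA-512 split into two 32-byte halves, that the MAC is HMAC-SHA256, and that points and identities are serialized as simple strings; none of these choices is fixed by the description above, and the point values are toy numbers.

```python
import hashlib
import hmac

def derive_keys(q_point):
    """H(Q) = (k, k'): split one SHA-512 digest into two 32-byte keys."""
    x, y = q_point
    digest = hashlib.sha512(f"{x},{y}".encode()).digest()
    return digest[:32], digest[32:]

def confirmation_tag(k_prime, step, sender, receiver, sender_pub, receiver_pub):
    """MAC_{k'}(step, sender, receiver, sender public key, receiver public key)."""
    message = "|".join([str(step), sender, receiver,
                        repr(sender_pub), repr(receiver_pub)])
    return hmac.new(k_prime, message.encode(), hashlib.sha256).digest()

# Toy values: shared point Q and the transient public keys [c]G and [d]G
Q, pub_c, pub_d = (9, 16), (0, 6), (3, 1)
k, k_prime = derive_keys(Q)

# Callee -> caller in the 200 OK, then caller -> callee in the ACK
M = confirmation_tag(k_prime, 2, "Callee", "Caller", pub_d, pub_c)
caller_accepts = hmac.compare_digest(
    M, confirmation_tag(k_prime, 2, "Callee", "Caller", pub_d, pub_c))
M_prime = confirmation_tag(k_prime, 3, "Caller", "Callee", pub_c, pub_d)
print(caller_accepts)
```

Keeping k′ for confirmation and k for RTP encryption means the tags exchanged in the SDP reveal nothing about the media key.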

5 Performance analysis

In this section, we evaluate the time consumed by establishing a TLS-secured channel, elliptic curve point addition and multiplication, ECDH and ECMQV. The experimental devices are hTC Butterfly smartphones, each equipped with a quad-core Snapdragon APQ8064 CPU at up to 1.5 GHz per core and 2 GB of RAM and running the Android 4.2.2 platform. The servers have a 1.9 GHz Intel i3-3227U CPU and run the Windows 8.1 operating system.

First, we use OpenSSL and the Java keytool to generate the required certificates and deploy them to the servers and the clients. Figure 6 illustrates a sample certificate, which contains a 4096-bit RSA public key and is used to carry out the TLS handshake. We evaluate TLS handshakes with different RSA key lengths in a pure WiFi environment. The results are shown in Table 4. The cipher suite is set to TLS_RSA_WITH_AES_256_CBC_SHA.

Fig. 6 The client certificate containing a 4096-bit RSA public key

Table 4 TLS 1.2 handshake

From Table 4, we see that the time cost of establishing a TLS channel on mobile devices is high. However, the TLS channel can be established at application startup, so the clients will not notice this cost when making a VoIP call. In addition, Table 4 shows that the time differs little across key lengths.

Next, we present the time consumed by the ECDH and ECMQV key agreements in Table 5. The elliptic curves follow the recommendations of NIST.

Table 5 ECC time cost (s)

From Table 5, it is clear that the time cost of ECC is much lower than that of Integer Factorization Cryptography (IFC). NIST’s recommendations include two further curves, sect409r1 and sect409k1; however, compared with the others, the performance of these two curves is relatively poor. The results may be influenced by the domain parameters, which we do not discuss in this paper. We therefore omit these two curves from Table 5. We also see that the time cost of ECDH and ECMQV is not a simple multiple of the costs of addition and multiplication, because the time consumed by function calls must also be taken into consideration. This is a critical point when implementing a cryptographic system.

Finally, we show the voice quality of our system compared with popular VoIP applications such as Skype and Line (Fig. 7). We utilize Spirent Communications to measure the performance and present the results in Table 6.

Fig. 7 The process of SIP integrating ECMQV

Table 6 Voice quality comparison

The Spirent software records the output audio and compares it with the original audio file to compute the PESQ (MOS-LQO) score. Perceptual Evaluation of Speech Quality (PESQ) [42] is a widely applied industry standard for objective voice quality testing, and the Mean Opinion Score-Listening Quality Objective (MOS-LQO) scale ranges from 1 to 5. Our application adopts the G.729 audio compression standard together with the proposed security mechanism. Skype uses its own SILK codec, RSA for key negotiation and the Advanced Encryption Standard to encrypt conversations. Line does not publish information about its codec, and its audio packets are not encrypted. Our experimental environment is full of WiFi access points that generate considerable interference, so the packet loss rate and latency over WiFi are high. In contrast, the 3G signal strength is much better. Thus, we found that the voice quality over 3G is superior to that over WiFi. The outcome shows that it is better to use 3G rather than WiFi in a congested WiFi environment (our laboratory does not have instruments to measure the signal strength). Table 6 shows that our application has the best performance in 3G mode. In WiFi mode, the voice quality of our application is slightly worse than that of Skype. However, the security of our system is much stronger than that of the other two. Moreover, the VoIP system is under our own control rather than under the control of others.

Among other applications, MicroSIP, an open source portable SIP softphone based on the PJSIP stack, can only be installed on Windows. Xlite is a desktop application that runs on Windows or MacOSX. Zoiper is a free SIP client that supports SIP calls over both 3G and WiFi connections. However, most applications adopt the Diffie–Hellman key exchange, which lacks authentication and is vulnerable to the Man in the Middle (MitM) attack, especially when the servers are not under the user’s control. An attacker can impersonate a server to eavesdrop on communication channels. The cryptographic algorithms these applications use are opaque. In the proposed system, certificates are signed and issued by ourselves. We adopt relatively secure and efficient elliptic curves and algorithms to implement the cryptography. On top of that, from cryptographic security to information security, the whole system is under our management.

Figure 8 illustrates our experimental environment, which is crowded with WiFi APs. However, the lab is located within the signal coverage of a single base station. In 3G mode, the signal strength of the mobile phone is about −85 to −90 dBm. (When the signal strength exceeds −70 dBm, the quality of the mobile network is excellent. When the signal strength is between −70 and −102 dBm, the mobile network service is good. When the signal strength is lower than −102 dBm, the performance of the mobile network is bad.) Thus we obtain a good QoS. In WiFi mode, however, the many WiFi APs in the experimental environment interfere with each other. This reduces the signal strength to below −102 dBm and makes the connection unstable, which results in bad performance. We conclude that, in an environment full of WiFi APs, users should choose 3G/4G mode rather than WiFi mode.

Fig. 8 The illustration of our experiment environment
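The signal strength ranges quoted in this section can be written as a small classification function. This is only a sketch of the stated thresholds; the function name is hypothetical and the handling of the exact boundary values is an assumption.

```python
def classify_signal(dbm):
    """Map a signal strength in dBm to the quality ranges given in the text."""
    if dbm > -70:
        return "excellent"
    if dbm >= -102:      # between -70 and -102 dBm
        return "good"
    return "bad"         # lower than -102 dBm

for reading in (-60, -85, -90, -110):
    print(reading, classify_signal(reading))
```

The 3G readings of about −85 to −90 dBm observed in our laboratory fall in the "good" range under this mapping.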

6 Conclusion

As incidents of surveillance and eavesdropping have been disclosed, more and more governments and enterprises focus on the privacy of data transmitted over the internet. Nowadays, many free VoIP applications can be downloaded from the internet; however, the security of these applications is not guaranteed. Many studies have proposed ways to secure VoIP, but concrete implementations are lacking. In addition, the voice quality after encryption should be taken into consideration. In this paper, we present a complete cryptographic VoIP system and show that its voice quality is superior. In future research, we will study how QoS is affected by various factors, especially packet loss, latency and jitter.