1 Introduction

In IP networks, voice communication is carried in IP packets over a data connection. The methods used for this were designed primarily for web-based data transmission and use IP packets that also carry voice traffic. Achieving the required QoS is of the utmost importance, because adequate bandwidth must be available to send and receive both data and speech packets. Since 2010, VoIP applications such as Skype, WhatsApp and Google Talk have used the internet over 2G and 3G networks. The traffic of such over-the-top (OTT) applications is not distinguished from other IP data traffic, so the QoS of voice calls can be severely affected [1]. When creating the LTE network, 3GPP worked mainly on achieving high data throughput with reduced latency. LTE is an all-IP network and cannot carry conventional circuit-switched voice, so a separate approach was needed to carry voice calls over LTE. This approach, generally known as voice over LTE (VoLTE), transmits speech over IP on LTE networks: VoLTE converts speech into a data stream that is sent over the data link. For a good user experience, it is crucial that voice services share the data channel with other services such as video streaming, web browsing and social media. Whereas VoIP is a best-effort service, the fundamental difference of VoLTE is that it can guarantee QoS from the outset [2]. Deploying the IP Multimedia Subsystem (IMS) framework in the LTE network is a reliable way to achieve end-to-end QoS. IMS supports various access technologies and multimedia services, and operates over the Evolved Packet Core (EPC) architecture [3,4,5].

The GSMA IR.92 IMS Profile [6] and GSMA IR.94 IMS Profile [7] build on 3GPP specifications to provide high-quality IMS-based telephony over the LTE mobile access network. They define the optimal selection of existing 3GPP features so that network infrastructure manufacturers, service providers and smartphone developers can deliver optimized mobile voice and multimedia solutions.

VoLTE offers mobile operators several benefits, which matters because speech remains a primary source of revenue. Compared with current CS networks, VoLTE improves bandwidth efficiency and reduces network costs. VoLTE is therefore an effective option for voice in 4G networks [8, 9].

VoLTE improves end-user satisfaction through a superior quality of experience (QoE). The VoLTE platform supports multiple speech codecs: adaptive multi-rate narrowband (AMR-NB), AMR wideband (AMR-WB) and Enhanced Voice Services (EVS). EVS, in turn, offers an audio bandwidth wider than that of AMR-WB.

3GPP Release 12 introduced the EVS codec, which supports an audio bandwidth of up to 20 kHz [10]. Super-wideband (SWB) EVS at 13.2 kbps offers voice quality comparable to or better than AMR and AMR-WB at a similar bit rate, even under frame errors and packet loss.
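The bit rates quoted above translate directly into speech-frame sizes. The following Python sketch assumes the 20 ms frame duration used by AMR, AMR-WB and EVS and ignores RTP and payload-format headers; it shows that an EVS SWB frame at 13.2 kbps is similar in size to AMR-WB at 12.65 kbps and AMR-NB at 12.2 kbps. The helper name `bits_per_frame` is illustrative only.

```python
FRAME_DURATION_MS = 20  # AMR, AMR-WB and EVS all code speech in 20 ms frames


def bits_per_frame(bitrate_bps: int, frame_ms: int = FRAME_DURATION_MS) -> int:
    """Codec payload bits produced per speech frame (integer arithmetic, no headers)."""
    return bitrate_bps * frame_ms // 1000


codecs = [
    ("EVS SWB 13.2 kbps", 13200),
    ("AMR-WB 12.65 kbps", 12650),
    ("AMR-NB 12.2 kbps", 12200),
]
for name, bps in codecs:
    bits = bits_per_frame(bps)
    print(f"{name}: {bits} bits per frame ({bits / 8:.1f} bytes)")
```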

VoLTE provides end-to-end (E2E) high-definition (HD) voice communication services for a better user experience. 3GPP has standardized a set of QoS Class Identifiers (QCIs), which network entities use to provide the minimum QoS required by bearers with particular priorities and characteristics. Different services are offered distinct QoS levels, and VoLTE is assured of its level. Establishing E2E signalling and bearer streams with E2E QoS requires the cooperation of user terminals, access networks and core networks. This strengthens the QoS commitment for voice services and enhances the customer experience. In practice, VoLTE QoS is evaluated through the mean opinion score (MOS) to determine voice quality [11].

KPI-based QoS analysis provides essential information for vendors and device manufacturers. The main purpose of this paper is to evaluate different display formats and packetization modes to improve video quality, and to examine statistical data to improve audio quality. Such data will help suppliers of video quality control in the IMS stack, and support product identification based on MOS performance.

This article discusses the technical aspects of implementing VoLTE on existing circuit-switched (CS) and packet-switched (PS) networks. In the VoLTE system, the speech function is implemented via the IMS platform in the LTE network. Of the approaches to carrying voice over LTE, IMS-based speech is the one generally made available.

This paper also provides a performance analysis based on video evaluation data and VoLTE call quality. The KPIs are assessed for the QCIF, QVGA and VGA display formats and for the fragmentation unit (FU) and single NAL unit (SNU) packetization modes. The video quality test cases help us analyze how different parameters affect the key KPIs. The session score is calculated as the arithmetic mean over all segments of a session. The weighted session score and the video quality are determined for all cases at the appropriate frame rate under varying RF conditions. The distribution of video scores is evaluated graphically for all configurations and video quality test cases. VoLTE call audio quality was tested in many scenarios; this paper presents two sample cases.
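SNU and FU correspond to the H.264 RTP packetization modes of RFC 6184: in single NAL unit mode (packetization-mode 0) each RTP payload carries exactly one NAL unit, whereas non-interleaved mode (packetization-mode 1) also allows a large NAL unit to be split into FU-A fragmentation units. The Python sketch below illustrates that difference only; it is not the packetizer used in the tested devices, `packetize_nal` and `max_payload` are hypothetical names, and aggregation packets (STAP-A) and RTP headers are omitted.

```python
FU_A_TYPE = 28  # NAL unit type value used for FU-A fragments in RFC 6184


def packetize_nal(nal: bytes, max_payload: int, allow_fragmentation: bool) -> list[bytes]:
    """Split one H.264 NAL unit into RTP payloads (simplified RFC 6184 sketch).

    allow_fragmentation=False models SNU mode (packetization-mode 0);
    allow_fragmentation=True models FU mode (packetization-mode 1, FU-A).
    """
    if len(nal) <= max_payload:
        return [nal]  # single NAL unit packet: the payload is the NAL unit itself
    if not allow_fragmentation:
        raise ValueError("SNU mode: NAL unit too large; the encoder must emit smaller slices")
    if max_payload <= 2:
        raise ValueError("max_payload must leave room for the FU indicator and FU header")
    nal_header = nal[0]
    fu_indicator = (nal_header & 0xE0) | FU_A_TYPE  # keep F and NRI bits, type = 28
    nal_type = nal_header & 0x1F                    # original type is carried in the FU header
    body = nal[1:]                                  # original NAL header byte is not repeated
    chunk = max_payload - 2                         # 2 bytes: FU indicator + FU header
    payloads = []
    for offset in range(0, len(body), chunk):
        start_bit = 0x80 if offset == 0 else 0x00
        end_bit = 0x40 if offset + chunk >= len(body) else 0x00
        fu_header = start_bit | end_bit | nal_type
        payloads.append(bytes([fu_indicator, fu_header]) + body[offset:offset + chunk])
    return payloads


idr = bytes([0x65]) + bytes(range(10))  # toy 11-byte IDR slice (NAL type 5)
for payload in packetize_nal(idr, max_payload=6, allow_fragmentation=True):
    print(payload.hex())
```

In SNU mode the same oversized NAL unit is rejected, which is why encoders in that mode must keep each slice within the path MTU.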

As future work, this paper proposes testing audio-video synchronization (AV-sync) KPIs in all formats and analyzing their impact on VoLTE call quality.

The paper is organized as follows. Section 2 reviews existing research articles on VoLTE QoS. Section 3 briefly explains the different mechanisms for carrying voice traffic over an LTE network. Section 4 outlines the call procedure in VoLTE. Section 5 describes the main video and audio key performance indicators (KPIs). Section 6 presents and discusses the findings, and Sect. 7 concludes the paper.

2 Literature survey

This section reviews essential research on VoLTE and QoS and gives a short overview of improvements to current methods.

Yunhan Jack Jia et al. [1] characterized the quality of commercially deployed VoLTE and compared it with standard 3G calls and over-the-top (OTT) VoIP calls. VoLTE excels in the speech quality of test clips but lags behind conventional 3G calls in efficiency. Wasi Ahmad DDG [12] published a paper explaining the VoLTE architecture, the scenarios for transmitting voice over an LTE network, the VoLTE call flow and the challenges associated with VoLTE, and analyzed VoLTE performance in different radio environments and its voice quality in terms of MOS. Ayman Elnashar et al. [13] examined the use of a VoLTE identification system to capture voice quality in VoLTE.

Mohamed EL Wakiel et al. [14] measured QoS KPIs by evaluating traces of actual VoLTE calls that they recorded, and verified the results by comparing them with simulated values.

Guan-Hua Tuy et al. [15] analyzed VoLTE with respect to cost and operational complexity, noting that VoLTE requires higher-priority handling in mobile networks to ensure reliability. M also described the different conditions for delivering voice call and retention services over LTE-based networks, and gave possible guidance for mobile operators on maintaining calls over LTE. M performed practical tests of VoLTE and IMS over LTE mobile reference networks. Tabany et al. [16] analyzed and verified VoLTE QoS performance using OPNET; their findings were in accordance with the ITU-R and 3GPP specifications for VoLTE.

3 Voice traffic transmission methods in LTE networks

Two interim methods for voice communication in this case are Simultaneous Voice and LTE (SVLTE) and Circuit-Switched Fallback (CSFB) [17]. SVLTE enables a handset to use the voice and data networks concurrently. These methods apply when the network is LTE-capable but has no facilities for voice calls over LTE; voice calls are then carried by legacy networks such as CDMA or UMTS (2G/3G). SVLTE uses two separate radios: one connects to the legacy CDMA network for circuit-switched services such as voice calls, and the other connects to the LTE network for packet-switched (PS) data [18]. CSFB is another interim solution for voice in LTE, in which only the legacy network carries the voice call. The device falls back to the 3G or 2G network whenever it makes a voice call or sends a text message, and returns to the LTE network once the call has ended [19, 20]. With VoLTE deployed, voice calls are handled by the LTE network itself; where there is no LTE coverage, the legacy networks handle the calls. Single Radio Voice Call Continuity (SRVCC) is used to maintain voice calls in areas without LTE coverage.
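The choice among these voice paths can be summarized as a small Python sketch. It is a deliberately simplified, hypothetical model: real devices decide based on network registration, capability exchange and operator policy, none of which is modelled here, and `VoicePath`, `select_voice_path` and `on_leaving_lte_coverage` are illustrative names.

```python
from enum import Enum


class VoicePath(Enum):
    VOLTE = "VoLTE (IMS voice over LTE)"
    SVLTE = "SVLTE (second radio on the legacy CS network)"
    CSFB = "CSFB (fall back to 2G/3G for the call)"
    LEGACY_CS = "Legacy 2G/3G circuit-switched call"


def select_voice_path(in_lte_coverage: bool, network_supports_volte: bool,
                      dual_radio_device: bool) -> VoicePath:
    """Hypothetical rule set for how a new voice call is carried."""
    if not in_lte_coverage:
        return VoicePath.LEGACY_CS      # no LTE: the legacy network handles the call
    if network_supports_volte:
        return VoicePath.VOLTE          # voice stays on LTE via IMS
    if dual_radio_device:
        return VoicePath.SVLTE          # voice on the CDMA radio, data on the LTE radio
    return VoicePath.CSFB               # single radio: leave LTE for the duration of the call


def on_leaving_lte_coverage(current: VoicePath) -> str:
    """An ongoing VoLTE call is handed over with SRVCC instead of being dropped."""
    if current is VoicePath.VOLTE:
        return "SRVCC handover to 2G/3G CS"
    return "no change"


path = select_voice_path(in_lte_coverage=True, network_supports_volte=False,
                         dual_radio_device=False)
print(path.value)
```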

IMS brings together voice-related functions such as encryption, database activation, call management, filtering, interworking with the PSTN and billing. In VoLTE, the transport network is therefore implemented by the E-UTRAN and EPC, while the voice service resides in IMS.

3.1 Mean opinion score (MOS)

The mean opinion score (MOS) is a measure of the overall quality of an event or experience; in telecommunications, it rates the quality of a system or service. It is the arithmetic mean of the individual ratings that subjects give, on a predefined scale, for their opinion of the quality of a stimulus [14], although MOS values can also be calculated algorithmically. Such scores are usually obtained in subjective quality evaluation tests.

MOS is commonly used for, but not limited to, video, audio and audiovisual quality assessment; Table 1 gives the corresponding satisfaction levels for voice calls. ITU-T Recommendation P.800.1 defines several ways of obtaining the score from audiovisual, conversational, audio, speech or video quality tests.

Table 1 Satisfaction level of MOS score for voice call

3.2 Mathematical rating scale

The MOS is expressed as a single rational number, typically in the range 1–5, where 1 is the lowest and 5 the highest perceived quality. Other MOS ranges are also possible, depending on the rating scale used in the underlying test. Very commonly used is the Absolute Category Rating (ACR) scale shown in Table 2, which maps ratings from bad to excellent onto the numbers 1 to 5.

Table 2 Mathematical ratings between bad and excellent

ITU-T recommendations (such as P.800 or P.910) contain other standardized quality scales. For instance, a continuous scale from 1 to 100 could be used. The choice of scale depends on the objective of the test. In some contexts, ratings of the same stimuli obtained with different scales show no statistically significant differences [21]. Whatever the scale, the MOS is the arithmetic mean of the individual ratings:

$${\text{MOS}} = \frac{1}{N}\sum_{n=1}^{N} R_{n}$$
(1)

where R_n is the n-th individual rating and N is the total number of ratings.
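Equation (1) and the ACR scale of Table 2 can be illustrated with a short Python sketch. It assumes integer ratings on the 1–5 ACR scale with the usual ITU-T P.800 category labels; the function names are hypothetical.

```python
from statistics import mean

# Absolute Category Rating (ACR) labels of ITU-T P.800 (cf. Table 2)
ACR_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mos(ratings: list[int]) -> float:
    """Eq. (1): arithmetic mean of the N individual ratings R_n."""
    if not ratings:
        raise ValueError("at least one rating is needed")
    if any(r not in ACR_LABELS for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    return mean(ratings)


def nearest_label(score: float) -> str:
    """Map a MOS back to the closest ACR category (halves round up)."""
    return ACR_LABELS[min(5, max(1, int(score + 0.5)))]


score = mos([5, 4, 4, 3, 5])
print(score)                 # 4.2
print(nearest_label(score))  # Good
```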

4 Call procedure in VoLTE

The VoLTE architecture is based on two separately standardized 3GPP architectures, IMS and LTE. Within it, IMS acts as the VoIP provider for the LTE network, since specific network arrangements are needed to support voice communication. IMS was originally introduced in 3GPP Release 5 for UMTS and was later combined with LTE to deliver voice over shared packet data. A bearer is a network path with a specified QoS. LTE provides greater spectral efficiency, higher capacity, lower latency and the required service reliability (QoS), and guarantees seamless IP connectivity. In addition to the default bearer established when the UE attaches to the LTE network, the network configures and activates further bearers on request. These additional bearers are called dedicated bearers.

In a VoLTE call, two key bearers are used: one carries the SIP signalling for transactions between the UE and the IMS servers, and the other carries the voice traffic.

The LTE network allocates a QoS Class Identifier (QCI) to each bearer. Each QCI has a resource type, either guaranteed bit rate (GBR) or non-GBR. Together with the radio resources and packet data flows, the QCI governs the treatment of the bearer from the UE to the PDN. The standardized QCIs are specified in 3GPP TS 23.203.
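For reference, the following Python sketch encodes a few of the standardized QCI characteristics from 3GPP TS 23.203 (resource type, priority level, packet delay budget and packet error loss rate). Only a subset of QCIs is included, and the mapping of SIP signalling, voice and video to QCI 5, 1 and 2 reflects common VoLTE/ViLTE practice rather than a detail reported in this paper; `volte_bearers` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QciProfile:
    resource_type: str            # "GBR" (guaranteed bit rate) or "Non-GBR"
    priority: int                 # lower value means higher priority
    packet_delay_budget_ms: int
    packet_error_loss_rate: float
    example_service: str


# Subset of the standardized QCI characteristics (3GPP TS 23.203)
QCI_TABLE: dict[int, QciProfile] = {
    1: QciProfile("GBR", 2, 100, 1e-2, "Conversational voice"),
    2: QciProfile("GBR", 4, 150, 1e-3, "Conversational video (live streaming)"),
    5: QciProfile("Non-GBR", 1, 100, 1e-6, "IMS signalling"),
    9: QciProfile("Non-GBR", 9, 300, 1e-6, "Best-effort data (default bearer)"),
}


def volte_bearers() -> dict[str, int]:
    """Typical QCI per VoLTE/ViLTE flow (assumption based on common deployments)."""
    return {"sip_signalling": 5, "voice_media": 1, "video_media": 2}


for flow, qci in volte_bearers().items():
    p = QCI_TABLE[qci]
    print(f"{flow}: QCI {qci}, {p.resource_type}, delay budget {p.packet_delay_budget_ms} ms")
```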

In the VoLTE system, the IMS, driven by SIP signalling, instructs the LTE network to set up a dedicated voice bearer whose QCI provides the appropriate QoS. At the end of the call, the IMS instructs the LTE network to release the voice bearer. For each type of traffic, the QCI sets the appropriate latency.

As shown in Fig. 1, each VoLTE call consists of two communication sessions, one on the control plane and one on the data plane. The control-plane session exchanges call signalling using the well-known SIP. The data-plane session transmits voice packets using RTP and is set up on demand by the control-plane session. To maintain call quality similar to that of standard CS calls, LTE provides several bearer types (for example, guaranteed bit rate and multiple priorities). Both VoLTE voice and its signalling are carried on LTE bearers that transport packets like standard data bearers but have higher priority than data services [15]. The VoLTE service relies on two subsystems within LTE networks. The first is the IMS core, which is designed to support IP multimedia services. Its gateway was built to deliver real-time multimedia traffic (for instance, voice) to both VoLTE users and traditional mobile users. The VoLTE client provides the call control session features between the network, the IMS gateway and the 4G gateway. The second is the existing packet distribution system, whose main component is the 4G gateway. Its main function is to provide the mobile system with PS connectivity. The 4G gateway supports VoLTE by forwarding control-plane and data-plane packets between the network and the IMS core. It also provides control functions such as IP address allocation, packet filtering, and QoS and load management. Figure 2 shows the functional blocks for UE call control in the ITRI VoLTE system.
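Because the data-plane session sends each speech frame in its own RTP/UDP/IP packet, per-packet headers add considerable overhead to a low-rate voice stream. The Python sketch below shows that arithmetic under stated assumptions: one 20 ms frame per packet, uncompressed IPv4 or IPv6 headers, and no RTP payload-format header or robust header compression (RoHC, which LTE can apply to reduce this overhead). The function name `ip_bitrate_bps` is illustrative.

```python
FRAME_MS = 20                             # one speech frame per RTP packet (assumption)
RTP_UDP_IPV4_HEADER_BYTES = 12 + 8 + 20   # RTP + UDP + IPv4, uncompressed
RTP_UDP_IPV6_HEADER_BYTES = 12 + 8 + 40   # RTP + UDP + IPv6, uncompressed


def ip_bitrate_bps(codec_bps: int, header_bytes: int, frame_ms: int = FRAME_MS) -> int:
    """IP-level bit rate of a voice stream, ignoring payload-format headers and RoHC."""
    payload_bits = codec_bps * frame_ms // 1000
    payload_bytes = (payload_bits + 7) // 8    # round up to whole octets
    packets_per_second = 1000 // frame_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_second


print(ip_bitrate_bps(12650, RTP_UDP_IPV4_HEADER_BYTES))  # 28800
print(ip_bitrate_bps(12650, RTP_UDP_IPV6_HEADER_BYTES))  # 36800
```

For AMR-WB at 12.65 kbps, the headers alone more than double the bit rate over IPv4, which is why header handling matters for VoLTE capacity.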

Fig. 1

LTE network architecture with and without VoLTE [1]

Fig. 2

Functional blocks for UE call control in ITRI VoLTE [22]

The codec specified for VoLTE is the 3GPP Adaptive Multi-Rate (AMR) codec. Using the AMR codec for VoLTE provides interoperability with legacy networks.
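AMR is "adaptive" because it offers a set of codec modes with different bit rates. The Python sketch below lists the modes of AMR-NB and AMR-WB as given in the 3GPP AMR and AMR-WB specifications, together with a hypothetical, simplified rate-selection rule (`highest_mode_within`); real AMR rate adaptation uses codec mode requests signalled in-band, which this sketch does not model.

```python
# Codec mode bit rates in kbit/s, indexed by mode number
AMR_NB_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]               # modes 0-7
AMR_WB_MODES_KBPS = [6.60, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85]  # modes 0-8


def highest_mode_within(modes_kbps: list[float], limit_kbps: float) -> int:
    """Index of the highest-rate mode not exceeding limit_kbps (hypothetical rule)."""
    allowed = [mode for mode, rate in enumerate(modes_kbps) if rate <= limit_kbps]
    if not allowed:
        raise ValueError("no codec mode fits within the limit")
    return max(allowed)


print(highest_mode_within(AMR_WB_MODES_KBPS, 13.0))  # 2 (12.65 kbit/s)
```

Mode 7 of AMR-NB (12.2 kbit/s) and mode 8 of AMR-WB (23.85 kbit/s) are the highest modes of each codec.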

5 Video and audio key performance indicators (KPIs)

Video quality was tested in the FU and SNU packetization modes for the QCIF, QVGA and VGA display formats. The KPIs describe the performance of the captured video and are defined as follows. Observed frame rate: the average of the frame rates recorded in all segments of a video session. Maximum segment frame rate: the highest frame rate of any segment in a session. Minimum segment frame rate: the lowest frame rate of any segment in a session. Frame rate standard deviation: the spread of the frame rates across the segments. Relative standard deviation: the frame rate standard deviation relative to the observed frame rate.
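The frame rate KPIs defined above can be computed from per-segment frame rates as in the Python sketch below. It assumes the capture tool already reports one frame rate per segment and uses the population standard deviation; the paper does not state which variant its tool uses, and `frame_rate_kpis` is a hypothetical name.

```python
import statistics


def frame_rate_kpis(segment_fps: list[float]) -> dict[str, float]:
    """Frame rate KPIs of one session from its per-segment frame rates."""
    if not segment_fps:
        raise ValueError("a session needs at least one segment")
    observed = statistics.mean(segment_fps)       # observed (average) frame rate
    stdev = statistics.pstdev(segment_fps)        # spread across segments
    return {
        "observed_fps": observed,
        "max_segment_fps": max(segment_fps),
        "min_segment_fps": min(segment_fps),
        "stdev_fps": stdev,
        "relative_stdev": stdev / observed if observed else 0.0,
    }


print(frame_rate_kpis([14.0, 15.0, 16.0]))
```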

Frozen time: when two consecutive frames are identical, the video is said to be frozen. The frozen time ratio represents the share of the session during which freezes occur. Percentage of time impaired: an impaired (broken) picture is a frame that shows visible anomalies; this KPI is the percentage of session time during which pictures are impaired. Session score: derived from the observed frame rate, the frozen time percentage and the impaired time percentage. The overall session score is the arithmetic mean over all segments of a session. Audio quality is assessed with a MOS-style score: each session is rated on a scale up to 5, where 5 is excellent, 4 good and 3 fair.
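The frozen-frame rule and the session score defined above can be sketched in Python as follows. Frames are represented by hypothetical content digests (equal digests mean identical frames), all frames are assumed to have equal duration, and, because the paper does not specify how each segment score is derived from frame rate, frozen time and impaired time, the sketch takes segment scores as input. All names are illustrative.

```python
from statistics import mean
from typing import Sequence


def frozen_ratio(frame_digests: Sequence[str]) -> float:
    """Share of frames identical to their predecessor (equal frame durations assumed)."""
    if not frame_digests:
        return 0.0
    frozen = sum(1 for prev, cur in zip(frame_digests, frame_digests[1:]) if prev == cur)
    return frozen / len(frame_digests)


def session_score(segment_scores: Sequence[float]) -> float:
    """Session score as the arithmetic mean of the per-segment scores."""
    return mean(segment_scores)


print(frozen_ratio(["a", "a", "b", "b", "b", "c"]))  # 0.5
```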

The KPI used to assess audio call quality is the mean opinion score, whose satisfaction levels are shown in Table 1.

6 Results and discussions

The quality of video calls in VoLTE is determined from the KPIs for the different display formats and packetization modes.

6.1 KPIs for Video quality with distinct video formats

Table 3 gives the statistically analysed video quality KPIs for packetization mode SNU (0). As shown in Table 3, the observed (average) frame rate is highest for the QCIF display format.

Table 3 Video quality KPIs for packetisation mode SNU (0)

Table 4 gives the quantitatively analysed video quality KPIs for packetization mode FU (1). As shown in Table 4, the QVGA display format has the lowest observed (average) frame rate. Table 4 also gives the maximum observed frame rate.

Table 4 Video quality KPIs for packetisation mode FU (1)

Table 5 summarizes the KPIs assessed for the QCIF, QVGA and VGA display formats in the FU and SNU packetization modes.

Table 5 Video KPIs for different video formats

The session score (average score) is calculated from the measurements as the arithmetic mean of the scores of all segments in a session. As shown in Table 5, the weighted session score is an excellent 4.97 at an observed frame rate of 15 fps.

6.2 Video quality investigation

For the video quality tests, the Reference Signal Received Power (RSRP) was about −85 dBm and the Signal to Interference plus Noise Ratio (SINR) about 16 dB. Calls using the H.264 video codec were tested at a frame rate of 15 fps in SNU and FU modes for the different video formats. Table 6 shows the frame rate and session score for all test cases.

Table 6 Video quality test case results

With the H.264 video codec, QCIF in SNU mode scores 4.3. QVGA in FU mode achieves a high score of 4.45 at a frame rate of 14.19 fps. For VGA in FU mode, the H.264 session achieves its best score at a frame rate of 14.09 fps. The H.264 video codec was evaluated under default radio settings (RSRP greater than −110 dBm and less than −85 dBm) for the QCIF, QVGA and VGA formats in SNU and FU modes.

The graphical analysis is based on score distributions for all combinations. Figures 3 and 4 show the video score distributions for QCIF with FU and SNU packetization, respectively. For QCIF with FU mode, 17% of the scores fall in the range 4.1–4.2. For QCIF with SNU mode, 25% of the scores fall in the range 4.5–4.6. For the H.264 codec, QCIF in SNU mode therefore yields better video scores than FU mode.
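Distributions such as those in Figs. 3–8 group per-segment scores into 0.1-wide ranges and report the percentage of scores in each range. A minimal Python sketch of that binning follows; the bin width of 0.1 matches the ranges quoted in the text, while the function name and input values are illustrative.

```python
import math
from collections import Counter

BIN_WIDTH = 0.1  # score ranges such as 4.5-4.6


def score_distribution(scores: list[float]) -> dict[tuple[float, float], float]:
    """Percentage of scores falling in each 0.1-wide range [low, high)."""
    if not scores:
        raise ValueError("no scores to bin")
    # the small epsilon guards against floating-point error at bin edges (e.g. 4.2 / 0.1)
    counts = Counter(math.floor(s / BIN_WIDTH + 1e-9) for s in scores)
    total = len(scores)
    return {
        (round(k * BIN_WIDTH, 1), round((k + 1) * BIN_WIDTH, 1)): 100.0 * n / total
        for k, n in sorted(counts.items())
    }


print(score_distribution([4.12, 4.15, 4.55, 4.58]))
```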

Fig. 3

Display format: QCIF, packetization mode: FU

Fig. 4

Display format: QCIF, packetization mode: SNU

Figures 5 and 6 show the video score distributions for the QVGA display format with FU and SNU packetization, respectively. For QVGA with FU mode, 25% of the scores fall in the range 4.7–4.8. For QVGA with SNU mode, 17% of the scores fall in the ranges 4.6–4.7 and 3.9–4.0. For the H.264 video codec, QVGA in FU mode therefore yields higher video scores than in SNU mode.

Fig. 5

Display format: QVGA, packetization mode: FU

Fig. 6

Display format: QVGA, packetization mode: SNU

Figures 7 and 8 show the video score distributions for VGA with FU and SNU packetization, respectively. For VGA with FU mode, 25% of the scores fall in the range 4.7–4.8 (Fig. 7). For VGA with SNU mode, 29% of the scores fall in the range 4.7–4.8 (Fig. 8). With the H.264 video codec, VGA in SNU mode therefore provides higher video scores than in FU mode.

Fig. 7

Display format: VGA, packetization mode: FU

Fig. 8

Display format: VGA, packetization mode: SNU

For VGA, both the SNU and FU packetization modes give H.264 video scores in the range 4.7–4.8. The test results in Table 6 confirm this: at an observed frame rate of 14.09 fps, the session score is 4.46. For H.264 with QVGA in FU mode, video scores also lie in the range 4.7–4.8; as Table 6 confirms, the session score is 4.45 at an observed frame rate of 14.19 fps.

For QCIF with SNU packetization, the H.264 video codec delivers high-quality output with scores of 4.5–4.6. Table 5 confirms the outcomes of these test cases: at an observed frame rate of 14.07 fps, the session score is 4.3. Table 7 compares video call performance over 2G/3G, CSFB and ViLTE; ViLTE achieves the best session score. For reference, Table 8 gives the quality of the ViLTE service in multimode and multiband interworking scenarios.

Table 7 Performance of Video call in 2G/3G, CSFB, ViLTE [22]
Table 8 Quality of ViLTE service in multimode and multiband interworking scenarios [22]

According to IR.94, the H.264 video codec is expected to run at 15 frames per second (fps). The model under test defaults to QCIF at 15 fps. During the study, significant video failures occurred at 30 fps, so the frame rate was kept at 15 fps for the test cycle. QCIF corresponds to 176 × 144 pixels (horizontal × vertical). Such details help device vendors improve video quality in the IMS stack.
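One way to see why frame rate and display format matter is to compare uncompressed video rates, which the encoder must reduce to the available bit rate. The Python sketch below assumes 8-bit YUV 4:2:0 sampling (12 bits per pixel) and the standard picture sizes of QCIF, QVGA and VGA; the encoded bit rate passed to `compression_ratio` is an illustrative input, not a value measured in this paper.

```python
# Picture sizes (width, height) of the display formats used in the tests
FORMATS = {"QCIF": (176, 144), "QVGA": (320, 240), "VGA": (640, 480)}
BITS_PER_PIXEL_YUV420 = 12  # 8-bit luma plus two 2x2-subsampled chroma planes


def raw_bitrate_bps(fmt: str, fps: int) -> int:
    """Uncompressed video bit rate for a display format and frame rate."""
    width, height = FORMATS[fmt]
    return width * height * BITS_PER_PIXEL_YUV420 * fps


def compression_ratio(fmt: str, fps: int, encoded_bps: int) -> float:
    """How strongly the encoder must compress to reach encoded_bps."""
    return raw_bitrate_bps(fmt, fps) / encoded_bps


for fmt in FORMATS:
    print(fmt, raw_bitrate_bps(fmt, 15), "bit/s uncompressed at 15 fps")
```

Doubling the frame rate from 15 to 30 fps doubles the raw rate, so at a fixed encoded bit rate the compression ratio must also double.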

6.3 Analysis of voice quality

The HD test apparatus was used to compute MOS for a range of applications; it provides an objective MOS that characterizes one-way audio quality [23]. The apparatus feeds audio into the device through the microphone input and captures the audio output from the headphone jack for comparison, and determines the MOS from different measures, including one-way delay and lost frames.

Test calls are made at each RSRP level for every case, and the MOS is measured every 10 s over a period of 5 min. Table 1 shows the satisfaction levels corresponding to MOS values for calls.

For each device under test (DUT), uplink and downlink MOS values are recorded. The MOS data are presented as distributions of the measured values. Table 9 summarizes the conditions of the test environments used to measure VoLTE service quality with POLQA [24, 25] at different signal strengths, expressed as reference signal received power (RSRP).

Table 9 Test comparison in VOLTE [23]

Case 1: The downlink and uplink MOS values for DUT A are shown in Table 10.

Table 10 Downlink and uplink MOS for DUT A

The distributions are shown graphically in Figs. 9 and 10. The downlink MOS distribution places 11%, 56% and 22% of the values in the ranges 3.1–3.2, 3.2–3.3 and 3.3–3.4, respectively. This MOS range indicates that users find the call quality acceptable. For DUT A, 77.78% of the downlink MOS values are above 3.2.
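Threshold statements such as "77.78% of the values are above 3.2" or "90% are below 3.0" are simple shares of the MOS samples. A minimal Python sketch follows; treating values equal to the threshold as neither above nor below is an assumption, and the sample lists are illustrative, not measured data.

```python
def share_above(mos_samples: list[float], threshold: float) -> float:
    """Percentage of MOS samples strictly above the threshold."""
    if not mos_samples:
        raise ValueError("no MOS samples")
    return 100.0 * sum(1 for m in mos_samples if m > threshold) / len(mos_samples)


def share_below(mos_samples: list[float], threshold: float) -> float:
    """Percentage of MOS samples strictly below the threshold."""
    if not mos_samples:
        raise ValueError("no MOS samples")
    return 100.0 * sum(1 for m in mos_samples if m < threshold) / len(mos_samples)


print(share_above([3.15, 3.25, 3.35, 3.05], 3.2))  # 50.0
```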

Fig. 9

Downlink MOS distribution for DUT A

Fig. 10

Uplink MOS distribution for DUT A

Figure 10 shows the uplink MOS distribution: 10% of the values lie in the range 3.0–3.1, 20% in 2.9–3.0 and 40% in 2.7–2.8. The uplink MOS is therefore weak, with 90.00% of DUT A's uplink MOS values below 3.0. All distributions show that DUT A falls in the poor MOS range.

Case 2: The downlink and uplink MOS values for DUT B are shown in Table 11.

Table 11 Downlink and Uplink MOS for DUT B

The distributions are shown graphically in Figs. 11 and 12. The downlink MOS distribution places 20%, 60% and 10% of the values in the ranges 3.3–3.4, 3.2–3.3 and 3.1–3.2, respectively. This MOS range suggests that users are not satisfied with the voice call quality. For DUT B, 80% of the downlink MOS values are above 3.2.

Fig. 11

Downlink MOS distribution for DUT B

Fig. 12

Uplink MOS distribution for DUT B

The uplink MOS distribution places the values in the ranges 2.9–3.0, 2.8–2.9 and 2.7–2.8, with shares between 9% and 36%. The uplink MOS is therefore low: 100% of DUT B's uplink MOS values are below 3.0. All distributions show DUT B in the low MOS range.

The results are compared with the performance indicators for various scenarios given in Table 12 [22].

Table 12 Performance indicators of various scenarios [23]

7 Conclusion

The main points are summarized as follows. With AMR-NB mode 7, voice quality is better when the call parties use UEs of different brands than when they use UEs of the same brand. The output of AMR-WB mode 8 is influenced by the UE brand, so call measurements pairing UEs from different brands are recommended. VoLTE call quality is assessed using POLQA-NB and POLQA-WB to provide more comprehensive results.

VoLTE implements voice in the LTE network using the IMS. In the current scenario, IMS-based voice is widely considered the better approach.

Video KPIs describe the performance of the captured video for the various display formats and packetization modes studied for video quality in VoLTE. Graphical analysis of the video score distributions validates the calculated QoS KPIs for video calls. The analysis concludes by verifying the choice of video codec for VoLTE across display formats and packetization modes.

For VGA with both SNU and FU packetization, the H.264 video codec achieves video scores in the 4.7–4.8 range. With QVGA in FU packetization, H.264 also scores in the 4.7–4.8 range. Session scores of 4.46 (at an observed frame rate of 14.09 fps) and 4.45 (at 14.19 fps) were measured. The video test case results were checked for video codec quality.

The quantitative MOS scale was used to assess voice quality in the voice call analyses. The MOS of each DUT was presented through graphical downlink and uplink MOS distributions. The DUT case studies classified the tested devices as potentially weak in MOS.