1 Introduction

Due to the rapid advancement of digital systems and the widespread use of software to operate them, software reliability must be assessed accurately and thoroughly in order to evaluate system reliability [1]. The software industry faces a major challenge in producing highly reliable software while meeting strict schedules and budgets [2]. As testing progresses, faults are detected and fixed, and the reliability of the software system is assumed to improve [3]. A fundamental goal of software reliability modeling is to determine how long the system must be tested and debugged before it reaches the required degree of reliability [4]. Mathematical models that describe this behavior of the testing process in order to predict software reliability are known as software reliability growth models (SRGMs). These models have broad implications for software development, maintenance, and management: they enable organizations to improve software quality, reliability, and performance while minimizing risks and using resources efficiently [5, 6]. Whenever a model is used to abstract reality, some assumptions are necessary [6, 7]. These assumptions determine the model's form and the meaning of its parameters. Identifying realistic assumptions and formulating them within an appropriate framework are therefore important phases in constructing such models. Non-homogeneous Poisson process (NHPP) models provide an analytical framework for the software fault removal phenomenon [8]. In an NHPP model, the pattern of fault detection is represented by the mean value function m(t), the expected cumulative number of faults detected by time t [9]. A large body of software reliability research has been built on SRGMs [5,6,7,8,9,10,11,12]. Researchers have developed new SRGMs by considering different aspects of the testing process, e.g., testing coverage, testing effort, testing skill, and fault types. Their aim is to meet the growing need for accurate models with minimal prediction error.
A good software reliability model should be simple, applicable to a wide range of software failure datasets, and based on realistic assumptions. Notable models in software reliability engineering include the Jelinski–Moranda (JM) model [13], the Musa–Okumoto (MO) model [14], the Goel–Okumoto model [15], the Delayed S-shaped model [16], and the Inflection S-shaped model [17]. The JM model [13], considered the first SRGM, assumes a constant per-fault detection rate; this makes it simple but not always suitable for modern, complex software development practices. The MO model [14] extends the JM model by introducing a time-dependent hazard function, which allows the fault detection rate to decrease as testing progresses. The Goel–Okumoto model [15] assumes that the software failure rate decreases exponentially over time as defects are discovered and fixed. The Delayed S-shaped model [16] produces an S-shaped growth curve: reliability improves slowly at first (the initial "delay") and then more rapidly as testing continues. The Inflection S-shaped model [17] likewise starts with a slow rate of improvement, followed by an inflection point after which the rate of improvement increases significantly. As software systems evolve, ongoing research is needed to refine and extend SRGMs to address emerging challenges. Recent developments in software reliability growth modeling have shown the importance of adapting models to contemporary software development practices, and recent models incorporate previously unexplored facets of the software testing process.
Some employ machine learning algorithms [18,19,20], some examine uncertainty in the testing or operational environment [21,22,23,24], and some combine several techniques into hybrid models [25, 26].
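For reference, the NHPP models named above have well-known closed-form mean value functions. The Python sketch below writes them in their commonly cited textbook forms; the function and parameter names are our own, and Table 1 may parameterize the models differently. The JM model is omitted because it is a time-between-failures model rather than an NHPP model.

```python
import math

# Textbook NHPP mean value functions m(t); parameter names are illustrative.
# a: expected total faults, b: fault detection rate.


def goel_okumoto(t: float, a: float, b: float) -> float:
    """Goel-Okumoto: exponential (concave) growth toward a."""
    return a * (1.0 - math.exp(-b * t))


def delayed_s_shaped(t: float, a: float, b: float) -> float:
    """Yamada delayed S-shaped model: slow start, then faster growth."""
    return a * (1.0 - (1.0 + b * t) * math.exp(-b * t))


def inflection_s_shaped(t: float, a: float, b: float, beta: float) -> float:
    """Ohba inflection S-shaped model; beta = 0 reduces it to Goel-Okumoto."""
    e = math.exp(-b * t)
    return a * (1.0 - e) / (1.0 + beta * e)


def musa_okumoto(t: float, lam0: float, theta: float) -> float:
    """Musa-Okumoto logarithmic Poisson model (no finite upper bound)."""
    return math.log(1.0 + lam0 * theta * t) / theta
```

All four start at m(0) = 0; the first three approach a as t grows, whereas the Musa–Okumoto curve keeps growing logarithmically.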

In this paper, we present a model that uses the logistic growth function and incorporates the log–log distribution to predict software reliability. P. F. Verhulst introduced the logistic function in the mid-nineteenth century [27]. He developed it as a population growth model by modifying the exponential growth model, and it is now known as the logistic growth model. The word “logistic” has no special significance here beyond its established usage. Apart from biology, the logistic model is now widely used in chemistry, geoscience, political science, and statistics. The model assumes that each member of a population has equal access to resources and a similar probability of survival, and that the per capita growth rate decreases over time because resources are limited. Yamada and Osaki [28] were the first to apply this model to software reliability growth. Many researchers subsequently used it to evaluate software reliability under the assumptions that each fault has an equal chance of being detected and that the per-fault detection rate decreases as testing progresses [29,30,31]. The model provides a realistic representation of reliability improvement over time and can accommodate various project-specific factors. Several studies have reported promising results with a fault detection rate function based on the log–log distribution [32,33,34]. The purpose of this paper is to propose a new SRGM that takes advantage of both the logistic model and the log–log fault detection rate function. The log–log distribution considered here is characterized by its hazard (failure rate) function \(\beta \ln(\alpha)\, t^{\beta - 1} \alpha^{t^{\beta}}\) with \(\alpha > 1\) and \(\beta > 0\), the form adopted for b(t) in Eq. (4). For \(0 < \beta < 1\), this hazard first decreases and then increases, giving a Vtub-shaped curve instead of the typical bathtub-shaped curve.
Although the Weibull distribution is widely used in statistical modeling, its hazard rate is monotone. The log–log hazard, by contrast, can describe a monotone (increasing, for \(\beta \ge 1\)) failure rate as well as the decreasing-then-increasing behavior characteristic of bathtub-shaped curves. Other advantages of the log–log distribution in software reliability modeling include:

  1. Ability to detect early failure trends: The log–log distribution is sensitive to changes in the failure rate, especially at the beginning of the testing process. This makes it easier to detect early failure trends and take corrective action before the software is released.

  2. Non-linear behavior: The log–log distribution captures the non-linear behavior of failure rates in software testing, which represents real-world failure patterns more accurately. This can help developers understand how their software behaves under different conditions and make better-informed decisions about improving its reliability.

  3. Model fitting: The log–log distribution can be fitted to data with standard regression techniques. This makes it a practical choice for software reliability growth modeling, where large amounts of data need to be analyzed.
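The Vtub behavior of the log–log hazard can be made concrete with a short calculation (our own illustration, not taken from the cited works). Take \(h(t) = \beta \ln(\alpha)\, t^{\beta - 1} \alpha^{t^{\beta}}\) with \(\alpha > 1\) and \(\beta > 0\), the form used for b(t) in Eq. (4), and differentiate its logarithm:

$$ \frac{{\text{d}}}{{{\text{d}}t}}\ln h\left( t \right) = \frac{\beta - 1}{t} + \beta \ln \left( \alpha \right)t^{\beta - 1} = \frac{1}{t}\left[ {\beta - 1 + \beta \ln \left( \alpha \right)t^{\beta } } \right] $$

For \(\beta \ge 1\) the bracket is positive for all t > 0, so h(t) is increasing. For \(0 < \beta < 1\) the bracket is negative for small t and positive for large t, so h(t) first decreases and then increases, with its minimum at

$$ t^{*} = \left( {\frac{1 - \beta }{\beta \ln \left( \alpha \right)}} \right)^{1/\beta } $$

For example, with the hypothetical values \(\alpha = 1.3\) and \(\beta = 0.5\), \(t^{*} = (1/\ln 1.3)^{2} \approx 14.5\) time units.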

The rest of the paper is structured as follows. Section 2 derives the proposed SRGM. Section 3 describes the steps involved in analyzing the model. Section 4 compares the model's performance with that of several well-known models. Finally, Sect. 5 concludes the paper by outlining the findings, strengths, and limitations of the study.

2 Proposed SRGM

The Verhulst logistic equation was designed to model population growth. It is expressed as:

$$ \frac{{{\text{d}}p}}{{{\text{d}}t}} = r \times p\left( {1 - \frac{p}{k}} \right) $$
(1)

where p and r denote the current population and the growth rate, respectively, and k is the carrying capacity, i.e., the maximum population that can survive in a specific environment with the available resources. In the software testing process, it is observed that the fault detection rate is very high at the beginning and gradually drops as testing continues. To capture this behavior, the above model can be adapted as follows [29, 30]:

$$ \frac{{{\text{d}}m\left( t \right)}}{{{\text{d}}t}} = b\left( t \right)m\left( t \right)\left( {1 - \frac{m\left( t \right)}{N}} \right) $$
(2)

where m(t) is the cumulative number of faults detected by time t, b(t) is the time-dependent fault detection rate function, and N is the total number of faults in the software before testing. The function m(t) is obtained by solving Eq. (2).
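Because Eq. (2) is separable, the step to Eq. (3) can be spelled out. Splitting the left-hand side into partial fractions and integrating from 0 to t gives

$$ \left( {\frac{1}{m} + \frac{1}{N - m}} \right){\text{d}}m = b\left( t \right){\text{d}}t \quad \Rightarrow \quad \ln \frac{m\left( t \right)}{N - m\left( t \right)} = \int_{0}^{t} b\left( s \right){\text{d}}s - c $$

where \(c = \ln \left( (N - m(0))/m(0) \right)\). Exponentiating and solving for m(t),

$$ \frac{N}{m\left( t \right)} - 1 = e^{c} e^{ - \int_{0}^{t} b\left( s \right){\text{d}}s} $$

which rearranges to Eq. (3). The derivation assumes \(0 < m(0) < N\); with m(0) = 0, Eq. (2) keeps m(t) = 0 for all t.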

$$ m\left( t \right) = \frac{N}{{1 + e^{c} e^{{ - \mathop \smallint \nolimits_{0}^{t} b\left( s \right){\text{d}}s}} }} $$
(3)

Here, c is an integration constant. We assume that b(t) follows the Vtub-shaped hazard function of the log–log distribution, which has been used in a few models suggested by Pham [32,33,34]. It is expressed as:

$$ b\left( t \right) = \beta \ln \left( \alpha \right)t^{\beta - 1} \alpha^{{t^{\beta } }} $$
(4)

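where \(\alpha\) (\(\alpha > 1\)) is the scale parameter and \(\beta\) (\(\beta > 0\)) is the shape parameter of the log–log distribution. Substituting b(t) from Eq. (4) into Eq. (3), using \(\int_{0}^{t} b(s)\,{\text{d}}s = \alpha^{t^{\beta}} - 1\) (obtained with the substitution \(u = s^{\beta}\)), and defining the constant \(\gamma = c + 1\), we obtain: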

$$ m\left( t \right) = \frac{N}{{1 + e^{{\left( {\gamma - \alpha^{{t^{\beta } }} } \right)}} }} $$
(5)

Equation (5) is the mean value function of the proposed model within the NHPP framework; NHPP models characterize failure data through their mean value function m(t). Table 1 lists the proposed model and several well-known models together with their mean value functions.
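As a numerical sanity check of the derivation, the sketch below implements b(t) from Eq. (4) and m(t) from Eq. (5) with NumPy and SciPy, then confirms that (i) the integral of b over [0, t] equals \(\alpha^{t^{\beta}} - 1\) and (ii) integrating Eq. (2) numerically reproduces Eq. (5). The parameter values (N = 200, α = 1.3, β = 0.9, γ = 5) are hypothetical and chosen only for illustration; they are not the estimates in Table 3. The ODE is started at t = 0.5 rather than 0 because b(t) diverges at t = 0 when β < 1.

```python
import numpy as np
from scipy.integrate import quad, solve_ivp

# Hypothetical parameter values for illustration only (not from Table 3).
N, ALPHA, BETA, GAMMA = 200.0, 1.3, 0.9, 5.0


def b(t, alpha=ALPHA, beta=BETA):
    """Log-log (Vtub) fault detection rate, Eq. (4); needs alpha > 1, beta > 0."""
    return beta * np.log(alpha) * t ** (beta - 1.0) * alpha ** (t ** beta)


def m(t, n=N, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Closed-form mean value function of the proposed model, Eq. (5)."""
    return n / (1.0 + np.exp(gamma - alpha ** (t ** beta)))


def check_integral(t):
    """Return (numerical integral of b over [0, t], closed form alpha**(t**beta) - 1)."""
    numeric, _ = quad(b, 0.0, t)  # quad never evaluates the singular endpoint t = 0
    return numeric, ALPHA ** (t ** BETA) - 1.0


def check_ode(t0=0.5, t1=20.0):
    """Integrate Eq. (2) from m(t0) and return the largest gap to Eq. (5)."""
    t_eval = np.linspace(t0, t1, 40)
    sol = solve_ivp(
        lambda t, y: b(t) * y * (1.0 - y / N),  # right-hand side of Eq. (2)
        (t0, t1), [m(t0)], t_eval=t_eval, rtol=1e-10, atol=1e-10,
    )
    return np.max(np.abs(sol.y[0] - m(t_eval)))
```

Both checks should agree to within the solver tolerances; with these parameters m(t) rises from about 3.6 at t = 0 toward N in an S-shaped curve.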

Table 1 A list of SRGMs

3 Model analysis and comparison

The suitability of an SRGM is assessed in two steps: first, the model parameters are estimated; second, the goodness of fit is evaluated using several comparison criteria. In this section, we validate the proposed model and compare its results with those of the models listed in Table 1.

3.1 Evaluation criteria

SRGMs are not precise representations of reality; their assumptions and many environmental factors affect the validity and accuracy of their predictions. It is therefore essential to evaluate a model's appropriateness, which means identifying its strengths and weaknesses and the degree to which its results can be trusted. An SRGM can be evaluated by its ability to reproduce observed software behavior and to predict future behavior from observed failure data. A variety of comparison measures, known as goodness-of-fit (GoF) criteria, are used to assess how well a model fits a given dataset. In this study, we use four criteria, namely the mean square error (MSE), the predictive-ratio risk (PRR), the coefficient of determination (R²), and the Akaike information criterion (AIC), defined as follows [36,37,38]:

  • MSE: It measures the average squared deviation between the observed data and the model's estimates, adjusted for the number of model parameters.

$$ \quad\quad{\text{MSE}} = \frac{{\mathop \sum \nolimits_{i = 1}^{k} \left( {m_{i} - m\left( {t_{i} } \right)} \right)^{2} }}{k - p} $$

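Here, k denotes the number of observations in the dataset, \(m_{i}\) is the observed cumulative number of faults at time \(t_{i}\), \(m(t_{i})\) is the corresponding estimate from the model, and p is the number of model parameters.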

  • PRR: It measures the deviation of the model estimates from the observed data, relative to the estimates.

$$\quad\quad {\text{PRR}} = \mathop \sum \limits_{i = 1}^{k} \left( {\frac{{\left( {m\left( {t_{i} } \right) - m_{i} } \right)}}{{m\left( {t_{i} } \right)}}} \right)^{2} $$

  • R²: It indicates how much of the variation in the observed data is explained by the fitted model.

    $$ R^{2} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{k} \left( {m_{i} - m\left( {t_{i} } \right)} \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{k} \left( {m_{i} - \mathop \sum \nolimits_{j = 1}^{k} \frac{{m_{j} }}{k}} \right)^{2} }} $$

  • AIC: It is a single-number score used to choose, among several candidate models, the one most likely to be best for the dataset; it rewards goodness of fit and penalizes the number of parameters.

    $$ {\text{AIC}} = k \times \ln \left( {{\text{MSE}}} \right) + 2p $$

A better SRGM has lower MSE, PRR, and AIC values and a higher R² value [31, 36].
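Once model estimates are available, the four criteria are straightforward to compute. The following NumPy sketch implements them exactly as defined above; the function name `gof_criteria` and its arguments are our own, and the example values below are made up for illustration.

```python
import numpy as np


def gof_criteria(observed, estimated, n_params):
    """MSE, PRR, R^2 and AIC as defined in Sect. 3.1.

    observed  -- m_i, observed cumulative faults at t_1, ..., t_k
    estimated -- m(t_i), model estimates at the same times
    n_params  -- p, number of model parameters
    """
    y = np.asarray(observed, dtype=float)
    yhat = np.asarray(estimated, dtype=float)
    k = y.size
    sse = np.sum((y - yhat) ** 2)
    mse = sse / (k - n_params)                    # divisor k - p
    prr = np.sum(((yhat - y) / yhat) ** 2)        # relative to the estimate m(t_i)
    r2 = 1.0 - sse / np.sum((y - y.mean()) ** 2)  # 1 - SSE / total sum of squares
    aic = k * np.log(mse) + 2 * n_params
    return {"MSE": mse, "PRR": prr, "R2": r2, "AIC": aic}
```

For example, `gof_criteria([10, 20, 30, 40], [12, 18, 30, 40], n_params=2)` gives MSE = 8/2 = 4 and R² = 1 − 8/500 = 0.984. Because PRR divides by m(t_i), it is undefined whenever a model estimate is zero.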

3.2 Dataset used

Reference [39] reports failure data for three releases of a large medical record software (MRS) system. We used the data of the first two releases (MRS-1 and MRS-2) in our analysis. The software has 188 modules in total: 173 modules in the first release and 15 modules added in later releases. In MRS-1, 176 errors were collected during 18 weeks of testing; in MRS-2, 204 errors were found during 17 weeks of testing. Table 2 shows both the MRS-1 and MRS-2 datasets.

Table 2 Failure data of MRS

3.3 Parameter estimation

A software reliability model is a function of several parameters, so the first step in applying an SRGM is to estimate these parameters from a sample dataset. Parameters can be estimated in several ways; for small datasets, least squares estimation (LSE) is considered an efficient method [37]. We used the CurveExpert Professional tool for the experiments; it computes LSE estimates by fitting the failure data to a non-linear model equation. Table 3 shows the estimated parameters of all seven models on both datasets.
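The experiments here used CurveExpert Professional; as an alternative illustration only, the same kind of non-linear least squares fit of Eq. (5) can be sketched with SciPy's `scipy.optimize.curve_fit`. Because Table 2 is not reproduced here, the sketch fits synthetic weekly data generated from hypothetical parameters (N = 180, α = 1.3, β = 0.9, γ = 5) plus Gaussian noise, so the series need not be monotone like real cumulative counts. The starting values and bounds are also assumptions; the bounds keep α > 1 and β > 0 so that b(t) stays positive.

```python
import numpy as np
from scipy.optimize import curve_fit


def m(t, N, alpha, beta, gamma):
    """Mean value function of the proposed model, Eq. (5)."""
    return N / (1.0 + np.exp(gamma - alpha ** (t ** beta)))


# Synthetic data from hypothetical parameters plus noise (NOT the MRS data).
rng = np.random.default_rng(seed=1)
weeks = np.arange(1, 19, dtype=float)
true_params = (180.0, 1.3, 0.9, 5.0)
observed = m(weeks, *true_params) + rng.normal(0.0, 2.0, weeks.size)

# Starting values and bounds (assumptions); bounds switch curve_fit to the
# 'trf' least squares method. N must be at least the largest observed count.
p0 = (1.1 * observed.max(), 1.2, 1.0, 4.0)
lower = (observed.max(), 1.0 + 1e-6, 1e-3, -50.0)
upper = (10.0 * observed.max(), 10.0, 5.0, 50.0)
params, _ = curve_fit(m, weeks, observed, p0=p0,
                      bounds=(lower, upper), max_nfev=10000)
fitted = m(weeks, *params)  # LSE estimates of the cumulative faults per week
```

With real data, `weeks` and `observed` would be replaced by the cumulative fault counts of Table 2, and the fitted values could then be used to compute the GoF criteria of Sect. 3.1. A different optimizer or starting point may give estimates that differ slightly from Table 3, because non-linear least squares can converge to different local minima.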

Table 3 Parameter estimation using LSE

4 Results and comparison

Using the estimated parameters, we computed the GoF criteria (MSE, PRR, AIC, and R²) from their definitions. No statistical model is strictly right or wrong; models differ only in how well they describe the data. Tables 4 and 5 compare the accuracy of the proposed model with that of the other SRGMs; the proposed model gives the best fit on both datasets. The results for the new model can be summarized as follows:

  • MSE: The new model gives the smallest MSE on both datasets, 52.91 (MRS-1) and 26.855 (MRS-2). The second-smallest MSEs are 116.3 for the Inflection S-shaped (ISS) model on MRS-1 and 53.531 for the Loglog model on MRS-2, so the MSE of the new model is roughly 45% and 50%, respectively, of that of the second-best model.

  • PRR: The PRR values of the new model are 0.477 (MRS-1) and 0.009 (MRS-2), the smallest among all models; in each dataset, the second-smallest PRR is more than four times larger.

  • R²: R² is the proportion of variance in the observed data that is explained by the model; a higher value indicates that the model explains more of the variability. The R² values of the new model are 0.987 (MRS-1) and 0.982 (MRS-2), the largest among all models in the respective datasets.

  • AIC: The AIC values of the new model are 71.153 (MRS-1) and 54.377 (MRS-2), the lowest among all models. Because AIC penalizes the number of parameters, these values indicate that the better fit of the proposed model outweighs the penalty for its four parameters.

Table 4 Performance analysis of the models on MRS-1
Table 5 Performance analysis of the models on MRS-2

The graphs in Figs. 1 and 2 show the model’s predictive capability on each dataset, comparing the estimated faults to the observed faults.

Fig. 1 Predictive capability of the model for MRS-1

Fig. 2 Predictive capability of the model for MRS-2

5 Conclusion

While many software reliability growth models have been proposed in the literature, real-world validation of their effectiveness and applicability in practical settings is often lacking. Investigating different approaches is necessary to understand software behavior and to analyze failure data efficiently. This paper proposes a software reliability model based on the logistic growth model, in which the fault detection rate follows a log–log distribution. The main advantage of using the log–log distribution in software reliability growth modeling is its suitability for capturing long-tailed data. In many software systems, the fault detection rate is very high at the beginning of testing and gradually drops as testing continues; we applied the logistic growth model to describe this behavior and derived a mathematical model for the cumulative number of faults detected over time. The performance of the proposed model was then evaluated on two failure datasets using four standard GoF criteria and compared with that of several existing models. The results show that the proposed model performs better than the other models on all four criteria. The key strengths of the proposed model are its high predictive capability and its flexibility. However, the model has some limitations, such as dependence on the dataset, complexity, and its inability to address uncertainty. In future work, we will conduct further case studies with different sets of failure data and evaluation criteria to establish the superiority of the model more firmly.