1 Introduction

Full-stack development has seen tremendous growth recently due to the increasing demand for web development as the internet and e-commerce continue to expand. Both mobile and web applications use RESTful Web APIs for authentication, data access, file management, and other resources. RESTful APIs are REST-based APIs that use resource identifiers to represent the specific resources intended for interaction between components. The current state of a resource is referred to as a resource representation, which consists of data, metadata describing the data, and hypermedia links that allow for changing the state of the resource [1]. The RESTful architectural design is a specific method for implementing APIs, introduced in 2000 by Roy Fielding. This design involves a set of constraints to improve an API’s reliability, scalability, and performance [2]. APIs generally serve as interfaces with a set of functions, protocols, and tools to integrate software applications and services. Web APIs, in particular, can be accessed over the web through the HTTP/HTTPS protocols, allowing requesting systems to access and manipulate web resources using standard, predefined, uniform rules. REST-based systems interact through the Internet’s Hypertext Transfer Protocol (HTTP) [3]. A Web API enables the front-end, or multiple front-ends for different devices, to communicate with the back-end by sending requests to specific endpoints and receiving data in response, as shown in Fig. 1. According to a 2020 Developer Nation survey, a staggering 90% of developers utilize APIs, solid evidence that the proliferation of APIs has played a crucial role in the growth of the developer ecosystem in recent years [4]. With an increasing number of programming languages, many with similar components and coding styles, performance should play a role in choosing a language/framework. A sound way to perform this evaluation is to develop two Web APIs with different technologies that use the same database and return the same output.

Fig. 1. Web API

Analyzing the performance of web applications is a common practice, with most studies focusing solely on testing the application as a whole. However, it is essential to assess the entire solution, including testing the Web API in isolation. This approach can effectively identify any potential issues specific to the Web API. This paper introduces a suggested suite of tests designed to evaluate how Web APIs behave across the various CRUD (create, read, update, and delete) operations. These tests facilitate the examination of the application’s performance under diverse circumstances, encompassing both typical and exceptionally high request rates, as well as prolonged and resource-intensive durations. Furthermore, we provide a collection of tools for assembling the test suite and for visualizing and interpreting the outcomes it generates.

The article is structured into seven sections, starting with this Introduction, in which readers gain an understanding of the article’s objectives and the reasoning behind them. The second section reviews related work. The third section describes the functionality of RESTful Web API technology, including its norms and practical applications, and the key features that distinguish RESTful APIs from other APIs. The fourth section introduces the set of tools used to construct and test the Web API and to visualize the test results. The fifth section focuses on performance testing: it presents the test battery and explains the reasoning behind each test’s application. The sixth section provides insights into the possible outcomes of the performance testing and how to visualize the test results. Finally, the seventh section presents the Conclusions, which summarize the importance of running the different tests on each CRUD operation.

2 Related Work

A previous study concentrated on assessing the latency and performance of Web APIs but encountered the challenge of defining a standardized set of tests that could be universally applicable across different technological contexts [5]. Similarly, various research efforts have attempted to compare performance across diverse technologies, yet they, too, have been limited by the lack of a comprehensive test suite adaptable to various scenarios or technological environments [1, 6, 7].

In a separate line of investigation, some studies have compared performance between two prominent architectural styles for web service development: REST and SOAP/WSDL. However, these studies typically did not employ multiple tests with varying loads, which limited the breadth of their performance evaluations. Conversely, numerous other research endeavors have honed in on assessing the performance of Web APIs in the context of microservices-based web applications [8,9,10].

Earlier studies have delved into Web API performance and benchmarking analysis, with certain ones outlining the methodologies employed to yield their results. However, these studies often grappled with the challenge of creating a standardized testing framework that could be universally applied across diverse technological contexts.

In contrast, the present research addresses these limitations by undertaking a comprehensive examination. This examination encompasses all CRUD (Create, Read, Update, Delete) operations within Web APIs and is intentionally designed to be platform and technology-agnostic.

3 RESTful Web API

A well-designed Web API can expose the functionality of the back-end to other applications and services, allowing for the reuse of existing code and easy integration of new services [11]. Web APIs have the advantage of allowing different teams and developers to work together more efficiently and build more powerful and flexible web applications [12]. For example, one team can focus on front-end development, another on back-end development, and a third on infrastructure or DevOps. This clear boundary allows each team to have a deeper understanding and expertise in its area of focus, which can result in better quality and more efficient development. Splitting the work across teams can also make it easier to manage and scale larger projects [13]. The Web API can be developed using different technologies, from Java to .NET or JavaScript, using web-development frameworks. A Web API is a set of rules and protocols that allows different software applications to communicate with each other. APIs provide a way for different programs to interact with one another without requiring direct access to the underlying code. APIs are often used to access web-based software, such as social media sites, weather services, and online databases. For example, when a client uses a mobile app to check the weather, the app is likely using an API to retrieve the data from a weather service’s servers [14]. Some web services provide APIs for clients to access their functionality and data. In such a scenario, the API is a set of functions, methods, and protocols that provides access to the functionality and data of a service, such as a database or a web application [15]. Web APIs that conform to the REST architectural principles are characterized by their relative simplicity and their natural suitability for the web, relying almost entirely on URIs for resource identification and interaction and on HTTP for message transmission [16].

3.1 Representational State Transfer (REST)

Web APIs can be implemented following different protocols and styles, such as SOAP and RESTful web services. The development of mobile applications was the initial driving force for REST, adopted over other protocols due to its simplicity of use [17]. There are clear advantages to the use of REST: it is typically faster and uses less bandwidth because of its smaller message format, and it supports many different data formats, such as JSON, XML, CSV, and plain text, whereas SOAP supports only XML [18]. Representational State Transfer (REST) is a software architectural style that defines the rules for creating web services. RESTful Web APIs are based on the principles of the REST architecture, first described by Roy Fielding in his doctoral dissertation [2], which defines a set of architectural constraints that a web service must adhere to in order to be considered RESTful. A RESTful Web API must follow these six architectural constraints [19]:

  • Client-Server: Separate client and server concerns for independent evolution.

  • Stateless: No client state retention, all needed data in requests.

  • Cacheable: Clients can cache responses for improved performance.

  • Layered System: Clients access API functionality consistently regardless of infrastructure.

  • Code on Demand (Optional): Allows downloading executable code for client extension.

  • Uniform Interface: Ensures an easy-to-learn and consistent client interaction.

Web APIs facilitate the seamless communication and collaboration of various software systems, which may have been developed using diverse technologies and programming languages, and they foster interoperability across a broad spectrum of platforms and devices [20]. By leveraging APIs, developers can deconstruct intricate systems into more manageable, bite-sized components. This modular approach streamlines the processes of development, upkeep, and software updates. APIs also empower applications to incorporate external services and data sources, expanding their capabilities and granting access to a broader array of services [21].

3.2 API HTTP Verbs

The API interface should be simple, consistent, and self-describing, and it should support the most common standard HTTP methods (HTTP verbs): GET, POST, PUT, and DELETE. These verbs indicate the intended action to be performed on the requested resource and usually translate into the CRUD operations: Create, Read, Update, and Delete [22].
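As a minimal illustration, the sketch below, written in the K6 scripting syntax presented in Sect. 4, exercises one resource with each verb; the resource URL and payload are hypothetical:

```javascript
import http from 'k6/http';

const BASE = 'http://example.com/api/products'; // hypothetical resource URL
const HEADERS = { headers: { 'Content-Type': 'application/json' } };

export default function () {
  http.post(BASE, JSON.stringify({ name: 'widget' }), HEADERS);       // POST   -> Create
  http.get(BASE);                                                     // GET    -> Read
  http.put(`${BASE}/1`, JSON.stringify({ name: 'gadget' }), HEADERS); // PUT    -> Update
  http.del(`${BASE}/1`);                                              // DELETE -> Delete
}
```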

4 Tools

A combination of Prometheus, Fluentd, and Grafana was utilized to facilitate monitoring in this work. These tools were employed to collect statistics and create informative dashboards, providing insight into the performance and behavior of the system.

Prometheus is an open-source system monitoring and alerting toolkit. It provides real-time monitoring of and alerting on the performance of microservices-based applications in cloud-native environments. Prometheus uses a powerful query language and a flexible data model that make it easy to collect and store metrics from various systems and applications. The tool also includes built-in alerting and visualization capabilities [23].

Fluentd is an open-source data collection and logging tool. It can collect data from a wide variety of sources using input plugins and store the data in various destinations using output plugins. In this work, it is used to read the NGINX access logs, which Prometheus cannot process on its own to verify the accesses for each API, and to parse them into specific fields for Prometheus, such as the IP address, URL, and HTTP status code. Prometheus then uses Fluentd as a data source. One of the fields holds the URL path, which Grafana uses in its queries to Prometheus to generate specific charts for each API.

Grafana is a popular open-source time-series data query, visualization, and alerting tool, developed by Torkel Ödegaard in 2014. It has a highly pluggable data-source model that supports multiple time-series-based data sources, such as Prometheus and Fluentd, as well as SQL databases like MySQL and Postgres [24]. In this work, the data sources are Prometheus, which provides the virtual machines’ CPU and RAM metrics, and Fluentd, which provides the NGINX reverse-proxy logs.

For testing and performance measurement, the tools used were cURL, Hey, and K6. cURL stands for “Client for URLs” and is a command-line tool for transferring data using various protocols; it is commonly used to send HTTP and HTTPS requests. Hey is an open-source load-testing tool for web servers developed by Jaana B. Dogan. It allows users to generate many HTTP requests to a specified endpoint to measure the performance of the endpoint and the server it runs on. Hey can be used to simulate different types of traffic, such as concurrent users, and it provides metrics such as request rate, latency, and error rate [25]. K6 is an open-source load-testing tool that allows developers to test the performance and scalability of web applications and APIs. It is written in Go, like Hey, and uses JavaScript as its scripting language for test scenarios. It allows traffic simulation against a website or an API [26]. cURL and Hey were used for initial testing and to verify the testing environment, while K6 is the tool used in the examples in this work, with different setups for each test.
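For example, a single-request K6 smoke test, roughly equivalent to a cURL call against the same endpoint, can be used to verify that the environment responds correctly before heavier tests are run (the URL is hypothetical):

```javascript
import http from 'k6/http';
import { check } from 'k6';

export const options = { vus: 1, iterations: 1 }; // one virtual user, one request

export default function () {
  // rough equivalent of: curl -i http://example.com/api/products
  const res = http.get('http://example.com/api/products'); // hypothetical endpoint
  check(res, { 'status is 200': (r) => r.status === 200 });
}
```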

4.1 Test Scenario

The test scenario involved the setup of multiple virtual machines running Linux. Nginx was installed on the head node as a reverse proxy solution. The same VM hosted Prometheus, Grafana, and Fluentd to collect statistics and generate charts. Another VM housed a Java API connected to a third VM running a database engine, as depicted in Fig. 2.

Fig. 2. Test scenario

5 WEB API Performance Testing

Performance testing is a task performed to determine how a system behaves in terms of responsiveness and stability under a particular workload. It can also investigate, measure, validate, or verify other system quality attributes, such as scalability, reliability, and resource usage [27]. It is also an important way to identify bottlenecks and to ensure that the software can handle the expected usage and demand. Several tests used to measure website performance may also be applied to Web APIs. Each test uses the tools presented in Sect. 4, following the three phases of traditional software testing: test design, test execution, and test analysis [28]. These steps should start by designing realistic loads for each type of test, simulating the workload that may occur in the field, or by designing fault-inducing loads, which are likely to expose load-related problems. Once again, the tools from Sect. 4 will process the logs, generating charts and tables with statistics [29].

5.1 The \(99^{th}\), \(95^{th}\) and \(90^{th}\) Percentiles

The \(99^{th}\) percentile is often used as a benchmark for performance testing because it represents a high level of performance. It measures how well a system performs compared to others and helps identify outliers or issues that need to be addressed. Additionally, using the \(99^{th}\) percentile instead of the average (mean) or median can provide a more accurate representation of system performance, as it limits the impact of a small number of extreme results. The \(95^{th}\) percentile is also a commonly used benchmark because it represents a level of performance that is considered good but not necessarily the best; it can provide a more realistic measure of performance, as it allows for some variability in results. Similarly, the \(90^{th}\) percentile can help identify whether the system is failing to meet the desired performance level and reveal issues or bottlenecks that must be addressed. The mean and these percentiles will be used in Grafana to create charts from the tests defined in K6.
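In K6 itself, these statistics can be requested in the end-of-test summary and even enforced as pass/fail criteria through thresholds. The snippet below is a sketch; the latency targets are illustrative assumptions, not values from this study:

```javascript
import http from 'k6/http';

export const options = {
  // statistics reported for trend metrics in the end-of-test summary
  summaryTrendStats: ['avg', 'med', 'p(90)', 'p(95)', 'p(99)'],
  thresholds: {
    // fail the run if request latency exceeds these illustrative targets
    http_req_duration: ['p(99)<1000', 'p(95)<500', 'p(90)<300'],
  },
};

export default function () {
  http.get('http://example.com/api/products'); // hypothetical endpoint
}
```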

5.2 Number of Virtual Users

The initial step in creating testing scenarios is often determining the appropriate number of concurrent users to simulate, which lays the foundation for the performance objectives. Although estimating the maximum number of simultaneous users for a new website can be difficult, for an existing website numerous data sources, such as Google Analytics, can be used to establish performance targets and provide valuable information about the number of concurrent users likely to be required. To ensure the correctness of the testing environment and to determine a consistent number of virtual users (VUs) for all future tests, multiple pilot tests using different numbers of virtual users should be conducted for new applications (Fig. 3). Analyzing hardware requirements across various VU numbers is crucial for achieving optimal performance, CPU utilization, memory usage, and latency response. Conducting tests with different VU counts can reveal different CPU and memory requirement behaviors, as demonstrated in Figs. 3 and 4. Notably, in this example, the CPU requirements increase with the number of users, while the memory requirements remain similar or even decrease; this contradicts the initial impression and highlights the importance of analyzing the various hardware components.

Fig. 3. Pilot tests - CPU requirements

Fig. 4. Pilot tests - Memory requirements

A web API can be made available to clients through a web server or reverse proxy, which acts as a gateway to route incoming requests to the appropriate API endpoints and return responses to clients. These solutions can provide additional functionality like load balancing, caching, and security features that enhance API performance and security. Among the most popular web server and reverse proxy solutions are NGINX and Apache Web Server. The maximum number of concurrent connections for Apache2 is determined by the “MaxRequestWorkers” directive in its configuration file. The default value is 256 [30], but it can be adjusted according to specific requirements. On the other hand, the maximum number of concurrent connections for NGINX is set by the “worker_connections” directive in its configuration file. By default, NGINX can handle up to 512 connections per worker process, and this value can be increased to a maximum of 1024 connections per worker process [31]. Assuming that at least two workers are used, the number of allowed connections can be up to 2048.

5.3 Load Testing

Load testing primarily focuses on evaluating a system’s current performance in terms of the number of concurrent users or requests per second. It is used to determine whether a system is meeting its performance goals. By conducting a load test, you can evaluate the system’s performance under normal load conditions, ensure that performance standards are being met as changes are made, and simulate a typical day in the business [32]. These tests are done using tools that use VUs to simulate the requests, as shown in Fig. 5. The configuration file in Fig. 6 specifies a maximum of 100 VUs for the test. As mentioned in Sect. 5.2, this value should be customized based on the expected traffic for an existing Web API or estimated for new applications. Figure 6 also shows that multiple endpoints can be tested simultaneously on the same instance using the same HTTP method.
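A load-test script along the lines of Fig. 6 might look like the sketch below, ramping up to 100 VUs, holding that level, and ramping down again; the endpoints and stage durations are assumptions, since the paper’s exact script is the one shown in Fig. 6:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 }, // ramp up to the 100-VU maximum
    { duration: '5m', target: 100 }, // hold normal load
    { duration: '2m', target: 0 },   // ramp down
  ],
};

export default function () {
  // multiple endpoints tested in the same instance with the same HTTP method
  http.get('http://example.com/api/products');  // hypothetical endpoint
  http.get('http://example.com/api/customers'); // hypothetical endpoint
  sleep(1);
}
```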

Fig. 5. Load testing - VUs progress over time

5.4 Stress Testing

Stress testing is a form of load testing used to identify a system’s limits. This test aims to assess the system’s stability and dependability under high-stress conditions. By conducting a stress test, one can determine how the system will perform under extreme conditions, the maximum capacity of the system in terms of users or throughput, the point at which the system will break, how it will fail, and whether it will recover automatically after the stress test is complete without manual intervention, as shown in Fig. 7. The configuration file shown in Fig. 8 specifies the VUs for different time intervals during the stress test, as indicated by the chart in Fig. 7.
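As a sketch of such a configuration (the actual intervals and VU counts are those shown in Fig. 8; the numbers below are illustrative assumptions), a K6 stress test steps the load well past the expected capacity before ramping down:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 }, // below normal load
    { duration: '5m', target: 200 }, // around normal load
    { duration: '5m', target: 300 }, // approaching the breaking point
    { duration: '5m', target: 400 }, // beyond the expected capacity
    { duration: '5m', target: 0 },   // ramp down and observe recovery
  ],
};

export default function () {
  http.get('http://example.com/api/products'); // hypothetical endpoint
  sleep(1);
}
```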

Fig. 6. K6 Load testing

Fig. 7. Stress testing - VUs progress over time

Fig. 8. K6 Stress testing

5.5 Spike Test

A spike test is a variation of a stress test that involves subjecting a system to extreme load levels in a very short period. The main objective of a spike test is to determine how the system will handle a sudden increase in traffic and to identify any bottlenecks or performance issues that may arise. This type of test can help identify potential problems before they occur in a production environment and ensure that the system can handle the expected levels of traffic [32,33,34]. By conducting a spike test, you can determine how the system will perform under a sudden surge of traffic, such as a Denial-of-Service (DoS) attack, and whether it can recover once the traffic has subsided. The success of a spike test can be evaluated against expectations, and systems generally react in one of four ways: excellent, good, poor, or bad.

  • “Excellent” performance is when the system’s performance is not degraded during the surge of traffic, and the response time is similar during low and high traffic;

  • “Good” performance is when response time is slower, but the system does not produce errors, and all requests are handled;

  • “Poor” performance is when the system produces errors during the surge of traffic but recovers to normal after traffic subsides;

  • “Bad” performance is when the system crashes and does not recover after the traffic has subsided, as depicted in Fig. 9.

Fig. 9. Spike testing - VUs progress over time

Figure 10 shows the configuration file, which defines that the VUs will peak at 1500 and sustain that level for 3 min.
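A sketch of such a spike configuration is shown below; the 1500-VU peak and the 3-minute hold come from the text, while the remaining stage durations are illustrative assumptions:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 100 },   // warm up at a baseline load
    { duration: '10s', target: 1500 }, // sudden spike to 1500 VUs
    { duration: '3m', target: 1500 },  // sustain the peak for 3 minutes
    { duration: '10s', target: 100 },  // drop back to the baseline
    { duration: '3m', target: 100 },   // observe recovery
    { duration: '10s', target: 0 },    // ramp down
  ],
};

export default function () {
  http.get('http://example.com/api/products'); // hypothetical endpoint
  sleep(1);
}
```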

Fig. 10. K6 Spike testing

5.6 Soak Testing

Soak testing is used to evaluate the reliability of a system over an extended period. By conducting a soak test, you can determine whether the system is prone to bugs or memory leaks that may cause it to crash or restart, ensure that expected application restarts do not result in lost requests, identify bugs related to race conditions that occur sporadically, confirm that the database does not exhaust its allocated storage space or stop working, verify that logs do not deplete the allotted disk storage, and ensure that external services the system depends on do not stop working after a certain number of requests [34]. To run a soak test, you should determine the maximum capacity that the system can handle, set the number of VUs to 75–80% of that value, and run the test in three stages: ramping up the VUs, maintaining that level for 4–12 h, and ramping down to 0, as shown in Fig. 11. A capacity limit of 75% to 80% may place too much strain on the database, potentially causing the test to fail. To prevent this, the test should be conducted with a lower number, for example, using the default capacity limit for the Apache web server, which amounts to 400 connections when set to 80% capacity. The configuration file in Fig. 12 shows the VUs reaching 400 and staying at that level for approximately 4 h.
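A sketch matching that description (400 VUs held for roughly 4 h, with the ramp durations assumed) could be configured as:

```javascript
import http from 'k6/http';
import { sleep } from 'k6';

export const options = {
  stages: [
    { duration: '5m', target: 400 }, // ramp up to 400 VUs
    { duration: '4h', target: 400 }, // soak at that level for about 4 hours
    { duration: '5m', target: 0 },   // ramp down to 0
  ],
};

export default function () {
  http.get('http://example.com/api/products'); // hypothetical endpoint
  sleep(1);
}
```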

Fig. 11. Soak testing - VUs progress over time

Fig. 12. K6 Soak testing

5.7 Tests to All CRUD Operations

The tests outlined in Sect. 5 so far relate exclusively to the GET method. When evaluating an API’s performance, it is essential to also test the other HTTP verbs used for the CRUD operations, such as the POST method for creating new records or the PUT method for updating existing ones. In these cases, a valid JSON payload must be built and sent to the server, as demonstrated in the example in Fig. 13. This example also showcases the use of environment variables and of the virtual user and iteration numbers. The PUT method needs an existing identifier to update a record successfully. Testing deletions is more challenging, as most Web APIs include the identifier of the record to be deleted in the URL. This necessitates a distinct strategy for deleting a valid existing record, and various methods exist. For testing purposes, you can control the inserted identifiers, using only identifiers that combine the iteration and virtual user numbers. Alternatively, you can eliminate the identifier parameter and erase the record with the highest identifier in the database.
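A sketch of this identifier strategy is shown below; the endpoint, payload fields, and environment variable are hypothetical, while K6’s built-in __VU and __ITER variables supply the virtual user and iteration numbers:

```javascript
import http from 'k6/http';

const BASE = __ENV.BASE_URL || 'http://example.com/api'; // hypothetical base URL

export default function () {
  // build a unique identifier from the virtual user and iteration numbers
  const id = `${__VU}-${__ITER}`;
  const payload = JSON.stringify({ id: id, name: `product-${id}` });
  const params = { headers: { 'Content-Type': 'application/json' } };

  http.post(`${BASE}/products`, payload, params);      // Create the record
  http.put(`${BASE}/products/${id}`, payload, params); // Update it (identifier exists)
  http.del(`${BASE}/products/${id}`);                  // Delete the same record
}
```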

Fig. 13. K6 POST load test

Fig. 14. K6 GET Spike test

Fig. 15. K6 PUT Spike test

Fig. 16. Grafana - Latency p99 and p90

6 Results

The tests, run on a Linux console using K6, provide initial results via the command shell, as shown in Figs. 14 and 15. The results provide valuable performance information, particularly the HTTP request duration (median and \(99^{th}\) and \(90^{th}\) percentiles), the number of test iterations and iterations per second, and the number and percentage of failed HTTP requests. This information reveals the latency of requests at each percentile and exposes the performance and potential bottlenecks of the Web API. The GET method in Fig. 14 ran without any issues, while with the PUT method (Fig. 15), over 10% of the requests failed due to 40x HTTP errors. Running the test only on the GET method could therefore be deceiving, which shows the importance of testing all methods and helps identify potential technology or code problems. Using Grafana helps to gain a deeper understanding of the performance data. Grafana provides charts and visualizations and can be used to cross-reference information such as VU numbers with the number of failed requests. It can also be used to visualize key metrics such as the latency of HTTP requests at the \(99^{th}\) and \(90^{th}\) percentiles, as demonstrated in Fig. 16. In addition, it provides insight into CPU and memory utilization, as shown in Fig. 17. This complete picture of the Web API’s performance allows for quick and easy identification of any potential bottlenecks or areas for improvement. Using multiple sources in Grafana, it is also possible to compare different APIs on the same chart, allowing direct comparison, as shown in Fig. 18.

Fig. 17. Grafana - CPU and memory

Fig. 18. Soak test .NET vs Java Spring

7 Conclusions

Web API performance is essential because it directly affects the user experience and the application’s overall success. Poor API performance can result in slow response times, error messages, and frustrated users; it can decrease user engagement and, for e-commerce websites, reduce revenue. On the other hand, fast and reliable API performance can provide a better user experience, increase customer satisfaction, and drive business growth. In addition, efficient Web API performance is crucial for scalability and sustainability. As the number of users and API requests increases, the API must be able to handle the increased load without slowing down or crashing. A well-optimized API can handle significant traffic and requests, allowing smooth and seamless growth. Therefore, monitoring and improving Web API performance should be a priority for any organization that relies on APIs to power its applications and services.

One of the critical aspects of performance testing is defining the number of virtual users that will be used to simulate real user traffic. The number of virtual users required will depend on several factors, including the system’s nature, the expected user load, and the testing goals. Whatever these factors are, conducting a pilot test with a few virtual users ensures that the testing environment is set up correctly and establishes a baseline for the system’s performance. From that point, gradually increase the number of virtual users, monitoring the system’s performance at each stage, until the system reaches its maximum capacity or the testing goals have been achieved.

The tests outlined in this work aim to thoroughly evaluate a Web API under varying workloads. The tests should cover the primary HTTP verbs: GET, POST, PUT, and DELETE. The GET test should address the most demanding scenario: retrieving all entities from a single endpoint. The POST test focuses on creating new records in the database, the PUT test on updating existing resources, and the DELETE test on removing resources. The comprehensive test suite must be run on the four CRUD (Create, Read, Update, and Delete) operations to identify and eliminate potential performance problems. By thoroughly testing each of the CRUD operations, you can gain confidence in the reliability and scalability of the system and prevent unexpected issues from arising during production use. Testing all HTTP methods may also help to determine the appropriate number of virtual users for the tests.

The tools presented in this work are a valid method for obtaining real-world results and testing the response limits of the application. The results may be visualized through charts generated by these tools, clearly representing any issues detected during the testing process. Running multiple queries on a single chart makes it possible to run the same test against several Web APIs and visualize the results on a single graph, providing a tool for direct comparison.