1 Introduction

According to [1], an estimated 85% of e-commerce businesses with an investment of less than 10,000 euros fail to surpass two years of activity. E-commerce expenditure increased by 44% in 2020, with a similar trend in 2021, as reported by Forbes in www.elmundo.es [2]. However, in such a competitive context, 90% of the new companies entering the market fail within the first 4 months. To reduce the risk of losing users and to strengthen their trust in the products on an e-commerce platform, customer analysis should be conducted using techniques such as clustering, which help to understand users and their needs and to explore potential customer groups [3].

In [3], the authors suggest that e-commerce platforms have the capacity to collect a large amount of data daily, which should be utilized for analyzing customer behavior and creating products for different groups of potential users. However, [4] mentions that for data to become an important resource for a company, it must be effectively utilized; otherwise, it can become a burden. This involves the utilization of technologies such as artificial intelligence, statistics, and databases, among others [4]. Consequently, smaller e-commerce companies find themselves in a disadvantaged position as they lack specialized technology and data analysis teams to compete with larger companies.

Therefore, in line with [5], this study proposes a system that generates sales campaign strategies for each customer segment of an e-commerce platform. To achieve this, the more accurate of two clustering algorithms, k-means and hierarchical clustering, is selected based on the collected company data and used to segment retail e-commerce customers. The resulting segments are then fed into a decision tree algorithm to generate personalized sales strategies that the company can apply in its marketing plan, with the aim of improving its performance in the market [6].

2 Related Work

In [7], a class model was developed to segment customers into groups and classify them based on their income. This proposal provided access to key information for maintaining good customer relationships and evaluating them in the long term [7]. The segmentation was based on identifying customer groups using characteristics of their own behavior. It was discussed how the obtained results could be relevant for implementing a loyalty program and improving future studies that also address customer quality and lifestyle as an additional variable, combining survey data and purchase history.

In [8], research was conducted to find patterns of customer behavior using data mining techniques and the support of the K-Means clustering algorithm to identify the best-selling products and the payment method used. RapidMiner was used to facilitate the use of necessary tools for analysis, and the Davies-Bouldin index (DBI) was used to evaluate the quality of the segmentation algorithm.

In [9], data mining was employed to discover patterns in customer behavior. It was found that the best-selling products belonged to category 503–505 and that the most used payment method was credit. It is important to adapt to changes in the virtual market and personalize strategies to achieve better results in audience relationships.

In [10], the authors propose a comprehensive and simplified system, from data preprocessing to visualization, suitable for small businesses. This system identifies the popularity of each product over a period and targets potential customers based on that information. The purpose of customer segmentation is to divide the user base into smaller groups that can be targeted with specialized content and offers, which allows businesses to address each specific group of customers efficiently. The study implements customer segmentation using a hierarchical clustering algorithm with a small dataset. Additionally, a credit card dataset is utilized. The agglomerative hierarchical clustering method is performed using the hclust function from the Cluster package in R. The study concludes with the authors’ perspective. However, it is noted that this method can be slow and hardware-dependent.

3 Method

3.1 Algorithm Comparison

To compare the K-means and HC algorithms, we conducted a benchmarking process. We used two tables (Table 1 and Table 2) to evaluate the algorithms in terms of scalability and five other dimensions based on e-commerce sales datasets, both public and private. Then, we compared the results in Table 3 to determine the accuracy of each algorithm in customer segmentation. We used this information to develop an effective customer segmentation system.

Table 1. Algorithm Benchmarking.
Table 2. Algorithm Benchmarking.

In Table 1, we evaluated the algorithms across five different dimensions. In Table 2, we assigned weights to each dimension to identify which one had the greatest impact on customer segmentation. Based on the results obtained in these two tables, we determined that K-means was the best algorithm for our project.
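The weighting step described above can be illustrated with a minimal sketch. The dimension names, the 1 to 5 scores, and the weights below are hypothetical placeholders chosen for illustration; they are not the values reported in Table 1 or Table 2.

```python
# Hypothetical scores (1 = poor, 5 = excellent); NOT the values from Table 1.
SCORES = {
    "k-means": {"scalability": 5, "interpretability": 4, "speed": 5,
                "robustness": 3, "ease_of_use": 4},
    "hierarchical": {"scalability": 2, "interpretability": 5, "speed": 2,
                     "robustness": 4, "ease_of_use": 4},
}

# Hypothetical weights that sum to 1; NOT the values from Table 2.
WEIGHTS = {"scalability": 0.30, "interpretability": 0.20, "speed": 0.20,
           "robustness": 0.15, "ease_of_use": 0.15}


def weighted_score(scores, weights):
    """Return the weighted sum of one algorithm's per-dimension scores."""
    return sum(scores[dim] * weight for dim, weight in weights.items())


def rank_algorithms(all_scores, weights):
    """Return (algorithm, total) pairs sorted from best to worst total."""
    totals = {name: weighted_score(s, weights) for name, s in all_scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_algorithms(SCORES, WEIGHTS)
```

With these placeholder numbers the first entry of `ranking` is the algorithm with the highest weighted total; changing the weights shows how strongly each dimension drives the choice.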

Table 3. Comparative Table of Cluster Accuracy for K-means and HC.

Finally, in Table 3, we display the results of the accuracy comparison between K-means and HC across five different companies. The results indicate that K-means achieved higher accuracy than HC in the division of 4 customer clusters. This information was crucial for selecting the appropriate algorithm for our customer segmentation application.
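A comparison of this kind could be sketched with scikit-learn as follows. The text does not state how cluster accuracy was computed, so this sketch substitutes the silhouette score as a stand-in quality measure, and it uses synthetic data from `make_blobs` rather than the companies' datasets. The function name `compare_clusterers` is an illustrative assumption.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def compare_clusterers(X, n_clusters=4, seed=0):
    """Fit K-means and agglomerative HC; return a silhouette score per model."""
    # Scale features so that no single column dominates the distances.
    X_scaled = StandardScaler().fit_transform(X)
    models = {
        "k-means": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        "hierarchical": AgglomerativeClustering(n_clusters=n_clusters,
                                                linkage="ward"),
    }
    return {
        name: float(silhouette_score(X_scaled, model.fit_predict(X_scaled)))
        for name, model in models.items()
    }


# Synthetic stand-in for a customer table (300 customers, 3 features).
X_demo, _ = make_blobs(n_samples=300, centers=4, n_features=3, random_state=0)
scores = compare_clusterers(X_demo)
```

A silhouette score closer to 1 indicates tighter, better separated clusters; on real sales data the two algorithms may rank differently from one company to another.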

3.2 Decision-Making Algorithm

In the second stage of executing our solution, a decision tree algorithm was used to establish the best sales strategy for the e-commerce platform for each of the customer clusters generated as the output of the K-means segmentation algorithm. The algorithm implementation was based on creating a supervised learning model, which requires a labeled dataset from which the created model can make the correct decision based on what it has learned.

In our project, a column called “Strategy” was created and used as the label that the model learns to predict. The clusters created in the previous step, each described by the average characteristics of its customers, serve as the input to the decision tree algorithm, which then determines the most suitable strategy to apply. To accomplish this, it was necessary to gather a dataset of 200 clusters to train the decision tree, with a specific strategy labeled in the designated column based on the characteristics of each cluster. A marketing specialist provided professional support in labeling the best strategy for each cluster.

In this manner, the intelligent system can provide the company with precise and personalized sales strategies, which contributes to improving its performance in the market and increasing profitability [4].
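The supervised step of this section can be sketched as below, assuming scikit-learn's `DecisionTreeClassifier`. The three cluster features, the six training rows, and the strategy names are hypothetical; the actual training set consisted of 200 clusters labeled with the help of a marketing specialist, and its columns are not listed here.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical cluster profiles (one row per cluster):
# [average order value, orders per year, days since last purchase]
profiles = np.array([
    [120.0, 14, 10],
    [110.0, 12, 15],
    [35.0, 2, 200],
    [40.0, 1, 250],
    [60.0, 6, 45],
    [65.0, 5, 60],
])

# Hypothetical contents of the "Strategy" label column.
strategies = [
    "loyalty_rewards", "loyalty_rewards",
    "win_back_discount", "win_back_discount",
    "cross_sell_bundle", "cross_sell_bundle",
]

# A shallow tree keeps the learned rules easy to read and explain.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(profiles, strategies)


def recommend_strategy(cluster_profile):
    """Predict a strategy label for one cluster's average characteristics."""
    row = np.asarray(cluster_profile, dtype=float).reshape(1, -1)
    return str(tree.predict(row)[0])
```

For example, `recommend_strategy([38.0, 1, 220])` returns the label learned for low-value, inactive clusters in this toy dataset.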

3.3 Tool Implementation

Once the K-means algorithm had been selected as the preferred approach for segmenting customers of small e-commerce businesses, an intelligent system was developed to generate personalized sales campaign strategies for each customer segment identified in the e-commerce platform, using the data collected by the company. The K-means clustering algorithm is employed to classify retail customers. The resulting clusters are then fed into a decision tree algorithm to generate customized sales strategies that the company can apply in its marketing plan, with the aim of improving its performance in the market.

The logical structure of the system consists of various layers: client, access, presentation, business, and data. First, the client layer is established, where the web browser plays a fundamental role as a means of communication with the user. To achieve successful connection, it is essential to have an internet gateway located in the access layer. The presentation and business layers are adjacent to the access layer and are encapsulated in the service provided by Azure Web App, where our solution will be deployed. The presentation layer focuses on the visual aspect and user interface of our platform, being the visible point of contact for the user. On the other hand, the business layer hosts all the necessary logic to carry out the functionalities of our solution. As for data storage, we propose directing the data generated by our intelligent system to an external cloud database using the Azure Database PostgreSQL service, which is in a separate layer called the Data Layer.

Fig. 1. Logical Architecture

The proposed physical architecture outlines how the connection will be established between the user, in this case, a marketing employee of an e-commerce business, and our solution. It also details the necessary technologies for the proper development of the intelligent system.

The user’s connection will be made through a laptop or desktop computer, either via a Wi-Fi or direct Ethernet connection. Once the user has internet access, they can access our platform deployed on the Azure Web App service.

The technology distribution is divided into two sections: the front end and the back end. The front end of our platform is where direct visual interaction with the user is established. For this, we will use HTML, CSS, and JavaScript, and enhance the user experience using Angular, taking advantage of its facilities for implementing modules and views.

In the back-end part, the main language will be Python, along with the Django framework, to carry out the necessary logical functionalities in the development of our algorithms and data storage. For data storage and connection with the external Azure Database PostgreSQL service, we will utilize the SQLAlchemy library, which will facilitate the creation of the data model and transactional rules to be applied in our solution.

For the creation and development of the clustering algorithms, we will employ the scikit-learn library, which offers various functions related to major clustering algorithms and simplifies their implementation.
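As a rough illustration of how scikit-learn might be used for the segmentation step, the sketch below clusters a customer table and returns the average characteristics of each cluster, which is the form of input the decision tree stage expects. The function name `segment_customers`, the two features, and the synthetic data are assumptions made for this example, not part of the deployed system.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def segment_customers(features, n_clusters=4, seed=0):
    """Cluster customers and return (labels, per-cluster mean features)."""
    X = np.asarray(features, dtype=float)
    # Cluster on standardized features, but report profiles in original units.
    X_scaled = StandardScaler().fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X_scaled)
    profiles = {int(c): X[labels == c].mean(axis=0) for c in np.unique(labels)}
    return labels, profiles


# Synthetic customers: [average order value, orders per month] (hypothetical).
rng = np.random.default_rng(0)
centers = ([20, 1], [60, 5], [120, 12], [200, 3])
demo_customers = np.vstack(
    [rng.normal(loc=c, scale=0.5, size=(25, 2)) for c in centers]
)
labels, profiles = segment_customers(demo_customers)
```

Each entry of `profiles` is a vector of feature means for one cluster, so it can be passed directly to a classifier trained on cluster-level rows.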

Fig. 2. Physical Architecture

The intelligent system under development offers a set of functionalities that improve its usefulness and efficiency. These features include: system login for secure access; registration of new users to create individual accounts; the ability to modify the user’s profile to update personal information; access to different views of the system through an intuitive navigation menu; system dataset management, including loading, modifying, and deleting datasets; running the analysis tool, which allows the user to select a specific dataset for analysis; and finally, the execution of the analysis itself, using a segmentation model to obtain relevant and significant results. These combined functionalities offer users a complete and efficient experience when using the intelligent system. The implemented functionalities can be seen in the screenshots of the tool in Figs. 3, 4 and 5.

Fig. 3. Login

Fig. 4. Dataset Loading

Fig. 5. Dataset Analysis and Cluster Results

4 Validation

The selected e-commerce company carried out a comprehensive evaluation of our platform and its key features over an approximate period of 1 month. The objective of this process was to obtain quantifiable results that can be compared with historical data, in order to validate any observed variations after using our tool [11]. In this way, the aim is to determine how our solution can support the growth of an e-commerce business by effectively leveraging customer data and providing effective support in that regard. The success or failure in e-commerce will depend on the positive response to the established indicators [12].

4.1 Evaluation Indicators

The validation of the results obtained after the company has used our service will serve as support to evaluate four relevant indicators that contribute to improving business growth and increasing productivity. This validation will also help prevent the inclusion of the company in the alarming failure rate that prevails among small e-commerce businesses (Table 4).

Table 4. Evaluation Indicators.

The proposed indicators encompass several key aspects to evaluate the performance and growth of an e-commerce business. These indicators include Web Traffic, Click-through Rate, Customer Acquisition, and Sales Revenue. Both the visit and page click indicators are useful for analyzing the progress of interaction and behavior of customers and potential customers on the website. These indicators allow us to determine if there has been growth after implementing the strategies defined by our solution.

Additionally, it is important to consider that increased interaction can lead to the conversion of potential customers into actual customers. This aspect will also be evaluated to measure growth in the buyer base. Lastly, the primary indicator to observe will be the increase in sales through the platform, as this will provide a substantial outcome on the growth of the e-commerce business.

In summary, the selected indicators such as page visits, new customers, page clicks, and sales are fundamental elements to evaluate the growth and success of the e-commerce business. These indicators will provide us with a comprehensive view of performance and allow us to measure the impact of the implemented strategies on e-commerce growth.

4.2 Pre-implementation State

To validate the obtained results, access to the e-commerce business's historical information for our indicators is needed, so that a comparison can determine whether there has been improvement and growth with the use of our tool. The selected company uses Shopify as the platform for website management and organization, along with Google Analytics for page interaction ratios; the dashboards of both tools facilitated the extraction of various metrics related to its customers. These metrics have been aligned with our indicators: Web Traffic, Click-through Rate, Customer Acquisition, and Sales Revenue. The comparison uses data from the six months prior to the implementation of our solution, as shown in Table 5.

Table 5. Six-Month Metrics per Indicator.

Then, Formula 1 will be used to assess the variation of these indicators over the 6-month study period.

(1)

In formula (1), “a” is the index of the month being evaluated and “a − 1” is the index of the previous month.
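A month-over-month variation of this kind can be sketched as follows, assuming that Formula 1 is the usual percentage change, that is, the value of month a minus the value of month a − 1, divided by the value of month a − 1 and multiplied by 100. That form is an assumption based on the index definitions above, and the web traffic figures are hypothetical rather than taken from Table 5.

```python
def monthly_variation(values):
    """Return the percentage change of each month relative to the previous one.

    The result has one entry fewer than `values`, because the first month
    has no predecessor to compare against.
    """
    variations = []
    for a in range(1, len(values)):
        previous, current = values[a - 1], values[a]
        if previous == 0:
            raise ValueError("variation is undefined when the previous month is 0")
        variations.append((current - previous) / previous * 100.0)
    return variations


# Hypothetical monthly web traffic for a six-month period.
web_traffic = [1000, 1100, 990, 1188, 1188, 1300]
traffic_variation = monthly_variation(web_traffic)
```

A positive entry indicates growth over the previous month and a negative entry indicates a decline.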

4.3 Post-implementation State

To evaluate the indicators, a period of approximately 1 month (4 weeks) was established during which a weekly marketing plan was implemented based on the recommendations provided by our platform. This plan sought to obtain results that reflected the impact and benefits of our solution on the potential development of the e-commerce business, and to compare the growth percentages of the previous months, without the tool, with the growth of the latest month. Once the 1-month period was over, we evaluated the results of the four key indicators (WT, CTR, CA and SR), comparing the values prior to the implementation of the solution with the values after applying the strategies recommended by our system. For this comparison, we considered the results of the month prior to the use of the tool against the results obtained using our tool.

Next, we define the formula used to determine the variation in the indicators, which gives an objective answer as to whether or not the indicators grew:

(2)

5 Results

The metrics from the e-commerce Shopify platform were collected, and the results of the 4 indicators were analyzed considering the periods mentioned in Table 6 (Table 7).

Table 6. Sampling Periods.
Table 7. Two-Month Metrics per Indicator.

We applied Formula 1 and Formula 2 to compare both scenarios for each of the indicators.

Table 8. Indicators Variation.

Table 8 displays the variation of the 4 indicators (WT, CTR, CA, and SR), comparing the average growth before the use of the tool with the growth of the metrics while using our solution. This allows for a more objective and effective comparison of the percentages.
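The comparison summarized in Table 8 could be computed along the following lines. This is a sketch under assumptions: it treats variation as a percentage change between consecutive months, averages those changes over the months before the tool was used, and contrasts the average with the change observed in the month the tool was used. The function names and the sales figures are hypothetical.

```python
def pct_change(previous, current):
    """Percentage change from `previous` to `current`."""
    return (current - previous) / previous * 100.0


def compare_growth(pre_months, tool_month):
    """Contrast average pre-tool monthly growth with growth in the tool month.

    `pre_months` holds one indicator's monthly values before the tool was
    used (oldest first); `tool_month` is its value for the month with the tool.
    """
    pre_changes = [pct_change(p, c) for p, c in zip(pre_months, pre_months[1:])]
    avg_pre = sum(pre_changes) / len(pre_changes)
    tool_growth = pct_change(pre_months[-1], tool_month)
    return {
        "avg_pre_growth": avg_pre,
        "tool_growth": tool_growth,
        "difference": tool_growth - avg_pre,
    }


# Hypothetical sales revenue: three months without the tool, one month with it.
summary = compare_growth([100.0, 110.0, 121.0], 145.2)
```

A positive `difference` means the indicator grew faster during the tool month than it did on average beforehand; seasonal effects are not separated out by this calculation.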

6 Conclusions

The analysis of the results concludes that there is a significant increase in web traffic, click-through rate, and total sales compared to the previous scenario where the tool was not used and marketing strategies were not planned based on data. It is important to consider that December, January, and February are the months with the highest sales in the year. Despite these factors, it is evident that the use of the proposed solution allows for the growth of the indicators. It is also important to highlight that there is a notable increase in sales in monetary terms, despite the implementation of discounts and promotions on the products.

In conclusion, the sales strategies recommended by the tool have positively influenced the company both in monetary terms and in engagement.