Introduction

Holiday sales account for a significant portion of annual revenue for many retail businesses. Understanding browsing and purchasing patterns during such yearly shopping festivals creates opportunities for better interface design and a richer user experience. E-commerce on mobile devices is a rapidly growing part of the retail sector, driven by an increasing number of smartphone owners who are becoming familiar with mobile purchases. Mobile has already overtaken desktop computers as the primary platform through which online visitors access shopping sites. According to ComScore, 63% of online shoppers in 2015 came from mobile devices rather than desktops, and mobile purchases are expected to grow rapidly. Despite the great potential of mobile shopping, which enables anytime-anywhere purchases, little is known about mobile shopping behaviors because the data are proprietary.

With e-commerce now pervasive, competition among mobile e-commerce platforms is intense. Thanks to the grand shopping festival on November 11 (also called Double 11), which Alibaba created in 2009, Alibaba has gradually come to stand out among Business-to-Client (B2C) e-commerce websites. Since then, other e-commerce websites such as JD, YiHaoDian, Suning, Gome and Amazon have also offered big discounts and promotions in China on November 11, following in Alibaba’s footsteps. Double 11 has become the day on which people in China celebrate the biggest shopping carnival on the Internet, similar to Black Friday in America. Online sales on November 11 rose from 50 million RMB in 2009 to 180 billion RMB in 2016, and are very likely to climb even higher because enterprises such as Alibaba and JingDong are actively working to internationalize the Double 11 shopping festival.

Single-day sales records are broken every year by this shopping carnival. However, the higher the sales, the harder it is for merchants to prepare appropriate stock, for the e-commerce platform to stay available under heavy bursts of traffic, and for express companies to arrange effective deliveries. Analyzing consumers’ historical shopping behaviors before, on and after November 11 is therefore essential for understanding how people shop during this carnival, and it helps increase revenue and reputation for merchants, express companies and e-commerce platforms such as Alibaba and JD. Nevertheless, users’ online shopping behaviors are manifold. A large proportion of users spend plenty of time browsing but never pay for any item; some users first add items to the cart and pay for them after a longer or shorter period of consideration; and others order and pay without hesitation. To recommend items precisely to their latent buyers (cf. Kim et al. 2016), we should extract users’ preferences for items and the temporal characteristics of their online shopping behaviors from the logs generated when users surf e-commerce websites or use shopping apps.

Based on an anonymized log dataset covering November 10 to 12, with over 47 k users and 236 k items, we study users’ online shopping behaviors. The logs comprise the user identifier, IP address, base station identifier, browsed URL, and the timestamp of every action. We reconstructed the logs at the level of JD.com’s main page, coupon pages, product pages, cart, and order actions to reveal how people access mobile shopping websites during an annual sale event. Because the data log the actions of both purchasers and non-purchasers, they provide a unique opportunity to mine common shopping behaviors related to predicting purchases during an annual sale event.

The challenge of such a data-mining task lies in the complex reverse-engineering effort needed to understand clickstream logs and to handle noise in the data without distorting any crucial patterns. In particular, clickstream logs carry no user feedback, so labels (e.g., a visitor has purchase intention) must be created in an unsupervised manner. Even how long a session lasts must be defined somewhat arbitrarily, as individuals engage for varying durations during the sale season (from a few seconds to several hours). We adopt varying definitions of sessions to be robust to the specific choice. Furthermore, people engaged in numerous actions on the shopping site, from browsing products or main pages to ordering as well as editing profiles. To focus on predicting purchases, we identified the major actions based on their frequency and modeled shopper behaviors. However, no information about users (e.g., gender, age) or product details could be revealed from the data. Product category information such as electronics or clothing was the only interpretable shopping context from the logs, which is a limitation of this study.

In this research, we conduct extensive analysis and model the mobile shopping patterns of tens of thousands of online visitors during such an annual sale event. First, we characterize online shopping users by dissecting their different online shopping steps, their hesitation time for items, the specific times of day at which they browse and pay, and so on. In addition, the popularity of an item can be detected. Second, based on these observations, we extract features to conduct item recommendation based on a collaborative filtering method. The hit rate of the item recommendations produced by the proposed collaborative filtering based approach is evaluated with 5-fold cross validation. Finally, we identify the critical shopping behaviors that are precursors of purchases. This paper’s strength lies in testing the efficacy of several feasible precursors of purchasing actions (e.g., the effect of total browsing time, the number of clicks, product categories, and time of day on future purchases). We also examine whether visiting the shopping site prior to the sale event or browsing a coupon page is indicative of future purchases. Our important findings are summarized as follows:

  1. Our study provides a first-of-a-kind view on mobile purchase patterns over a shopping event.

  2. We show that an ISP is able to parse specific human actions from requested URLs and use them to study user behavior. This study is therefore based on clickstreams, which include various browsing details.

  3. This study is a multi-platform study. The choice of platform may lead to quite different behaviors. For example, although only about 39% of users chose the native app, they accounted for 56% of all purchasers.

  4. We find that item recommendation based on collaborative filtering is efficient, and that the identified characteristics of shopping behaviours can predict purchases with high accuracy.

The rest of this article is organized as follows. In the “Motivation and dataset description” section, we describe the dataset we utilize in this article. The “Characteristics of online shopping behaviours” section gives some statistical analysis from the perspective of users and items, respectively. In the “Collaborative filtering based recommendation” section, we propose a collaborative filtering based approach to recommend items to consumers and propose a purchase prediction method. After presenting related work in e-commerce and recommendation systems in the “Related works” section, we discuss our findings in the “Discussion” section and conclude in the “Conclusions and future work” section.

Motivation and dataset description

Motivation

For online retailers, understanding users’ purchasing behaviors is a classic problem. Researchers have been studying the purchasing behaviors of online shoppers since 1999. Some of this research is based on Pinterest data. These studies cover both long-term purchasing behavior (over days) and short-term behavior (over minutes or hours). A major aim of this work is to find factors that influence purchasing behavior. It turns out that demographic factors, product categories, viewing product information and other browsing behaviors, interest in ads, the share of each action type, product price, time information, and many other factors are helpful for purchase prediction.

In 2017, at least three purchase prediction competitions were launched in China with extremely high rewards. All of these competitions focus on predicting purchases in particular categories during an upcoming week, which is a long-term purchase prediction problem. Participants in such competitions usually do a lot of exploratory data analysis to discover strong predictors, and combine these predictors using algorithms based on gradient boosting decision trees. Ensemble methods are also popular for making final predictions.

We collected data specifically during a shopping festival, when people have much less time to make decisions. Our data were also collected by an ISP. Compared with data collected by a website owner, we have the complete data from a cellphone (no login is needed), but we have no data from PCs, so the data are not well suited to studying long-term human behavior. In this paper, we therefore examine people’s short-term behavior. We also compare user behaviors before and after the shopping festival. These differences allow us to make interesting recommendations and to predict purchase actions. Specifically, we model short-term purchase prediction as a simple classification problem. We label sessions of user activity as purchase or non-purchase sessions and remove all ordering actions from them. From the behavior sequence in each session we can extract many features that are related to purchase behavior.

Dataset

Our dataset contains anonymized online shopping logs of 47,906 users involving 236,809 items. Its 581,430 entries record users’ online shopping behaviours through JD websites or apps on November 10, 11 and 12, 2016. Each entry consists of an anonymized user id, a timestamp, an action type and an item id. For example, one entry in the dataset is “460030089072533 20161111005624 3 10632983079”. Here “460030089072533” is the anonymized user id; “20161111005624” is the timestamp at which the recorded action happened, expressed in GMT + 8; “3” is the action type for browsing, while other action types such as adding to cart and ordering are represented by “4” and “5”, respectively; and “10632983079” is the anonymized item id. Other entries are in the same format.
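To make the entry format concrete, the following is a minimal parsing sketch for one log line. The class name `LogEntry`, the function `parse_entry` and the label strings are our own illustrative choices, not part of the dataset; any action code other than 3, 4 or 5 is mapped to a placeholder label, since the text above does not enumerate the remaining codes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Action codes named in the text: 3 = browse, 4 = add to cart, 5 = order.
# The label strings are illustrative, not dataset fields.
ACTION_NAMES = {3: "browse", 4: "add_to_cart", 5: "order"}
GMT8 = timezone(timedelta(hours=8))  # timestamps are expressed in GMT + 8


@dataclass(frozen=True)
class LogEntry:
    user_id: str
    timestamp: datetime
    action: str
    item_id: str


def parse_entry(line: str) -> LogEntry:
    """Parse one whitespace-separated log line into a LogEntry."""
    user_id, ts, action_code, item_id = line.split()
    timestamp = datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=GMT8)
    # Codes not described in the text fall back to a placeholder label.
    action = ACTION_NAMES.get(int(action_code), "other")
    return LogEntry(user_id, timestamp, action, item_id)


entry = parse_entry("460030089072533 20161111005624 3 10632983079")
print(entry.action, entry.timestamp.isoformat())
```

Identifiers are kept as strings so that leading digits are never altered by numeric conversion.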

We obtained this dataset by cleaning flow record data that one of the main network operators in China collected using deep packet inspection (DPI) technology. We chose to study all traffic flows to and from www.JD.com, one of the largest e-commerce retailers in China. DPI technology can resolve the contents of traffic flows from packet headers. Extractable information typically includes the requested web link (URL) and timestamp. The URLs toward the JD.com site were structured such that we could identify meaningful information from the URL itself, such as product IDs, product category IDs, and user action types. While this study is limited to understanding patterns occurring on a single website, we expect the key shopping behaviors observed in the data to be similar to those seen on other shopping websites during the shopping festival.

The traffic flows indicated that people accessed the JD.com shopping website through different platforms. The most prominent kinds were third-party apps like WeChat, native JD app, and mobile web browsers as listed below:

  • WeChat is the most popular social network in China and also accounted for 51% of all visitors (cf. Yin 2016). WeChat offers pages for online stores through which people can browse items and order them conveniently without having to download native shopping apps.

  • The native JD app was the next most popular, accounting for 39% of all visitors. Native app visitors generated the most traffic (accounting for 57% of all flows), indicating that they are heavy users of the JD shopping site.

  • Mobile web browsers were another way to access the site, although they accounted for the smallest fraction of visitors (15%).

While the webpage designs may appear similar across these platforms, the choice of platform leads to an entirely different user experience. For example, payment takes fewer clicks and hence is easier on apps than on mobile browsers. The remainder of this paper presents the characteristics of shopping behaviors seen across all platforms unless otherwise specified.

Usually, users go through three steps to complete an online shopping transaction. First, they browse the website, searching for and finding what interests them. Then, they add the items they might buy to the cart. Finally, they decide what to buy, submit payment requests and complete the payments. Table 1 shows that 98.4% of users browse items, 20.9% of users add items to the cart, and 17.9% of users order. On the item side, 95.6% of items are browsed, 7.39% are added to the cart, and 4.56% are ordered.

However, not all consumers follow all three steps (browsing, adding to cart, and ordering) to make a deal. As shown in Fig. 1, only 9.80% of users browse, add items to the cart and then make orders. 75.2% of users only browse and 11.4% browse and add to cart but do not buy anything, which implies that 86.6% of users are window shopping. A further 8.52% of users browse and make orders without adding items to the cart. Likewise, not all items are browsed and added to the cart before they are ordered, as shown in Fig. 2: 91.3% of items are only browsed, 3.23% are browsed and added to the cart, and 0.168% are browsed and added to the cart before being ordered.

Table 1 Basic information about action types of users and on items

Fig. 1 Distribution of users of different online shopping behaviours

Fig. 2 Distribution of items browsed, added to cart and ordered
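The user groups discussed above (browse only, browse and add to cart, and so on) can be obtained by collecting the set of action types each user performed. The sketch below is one possible way to compute the share of each group; the function names and the toy records are made up for illustration and do not reproduce the percentages reported in this article.

```python
from collections import Counter, defaultdict


def user_patterns(entries):
    """Return, for each user, the sorted tuple of action types they performed.

    `entries` is an iterable of (user_id, action) pairs; the action names
    "browse", "cart" and "order" are illustrative labels.
    """
    actions = defaultdict(set)
    for user_id, action in entries:
        actions[user_id].add(action)
    return {user: tuple(sorted(acts)) for user, acts in actions.items()}


def pattern_shares(entries):
    """Fraction of users falling into each combination of action types."""
    patterns = user_patterns(entries)
    counts = Counter(patterns.values())
    total = len(patterns)
    return {pattern: n / total for pattern, n in counts.items()}


# Toy records (hypothetical): four users, one per behaviour pattern.
toy = [
    ("u1", "browse"), ("u1", "browse"),
    ("u2", "browse"), ("u2", "cart"),
    ("u3", "browse"), ("u3", "cart"), ("u3", "order"),
    ("u4", "browse"), ("u4", "order"),
]
shares = pattern_shares(toy)
window_shoppers = shares[("browse",)] + shares[("browse", "cart")]
print(shares, window_shoppers)
```

The same routine applies to items by swapping the user id for the item id in each pair.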

Characteristics of online shopping behaviours

Since many shops on the e-commerce platform, and even the platform itself, offer big discounts and promotions on November 11, in this section we use our dataset to answer the following questions: (1) What is the impact of discounts and promotions on online sales? (2) How do discounts and promotions influence the shopping behaviour patterns of users? (3) How do discounts and promotions affect the popularity of items? Specifically, we examine the sales variation, the shopping behaviour patterns of users, and the popularity of items before, on, and after the Singles’ Day.

Sales variation

To reveal the impact of discounts and promotions on online sales, we investigate the number of users browsing, adding to cart and ordering, and the number of items browsed, added to cart and ordered, before, on, and after November 11. As shown in Table 2, the number of users on the online shopping website or app is largest on November 11 and drops sharply on November 12. This implies that discounts and promotions play an important role in attracting users. On every day, however, most users browse. The number of users ordering on November 11 is over twice that on November 10, and over four times that on November 12. The same holds for the number of items ordered on November 11 compared with November 10 and 12. Sales increase rapidly due to discounts and promotions. Furthermore, the numbers of ordering, browsing, and adding-to-cart actions vary with the time of day, and the variation curves on November 10, 11, and 12 are totally different.

Table 2 Number of users and items involved before, on, and after November 11

Figure 3 shows the number of orders per half hour on November 10, 11, and 12, respectively. The peak ordering period on November 11 is between 00:30 and 00:59, with 1306 orders, compared with 145 in the same period on November 10 and 228 on November 12. The peak ordering period on November 10 is between 23:30 and 23:59, with 557 orders, which is less than 50% of the peak on November 11. The peak on November 12 is lower still: 228 orders between 00:30 and 00:59. The influence of the online shopping carnival thus extends from half an hour before November 11 until half an hour into November 12. However, browsing and adding to cart vary differently from ordering on November 10, 11, and 12. As shown in Figs. 4 and 5, the numbers of browsing and adding-to-cart actions peak between 23:30 and 00:00 on November 10, before the biggest discounts and promotions start. Ordering, browsing, and adding to cart all show a small peak at around 10:30 am, because some merchants do not start their discounts and promotions until 10:00 am, rather than at 00:00, on November 11.

Fig. 3 Number of ordering per half an hour on 10th to 12th November

Fig. 4 Number of browsing per half an hour on 10th to 12th November

Fig. 5 Number of adding to cart per half an hour on 10th to 12th November
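The half-hourly counts behind such curves can be obtained by truncating each timestamp to the start of its 30-minute slot. The following is a small sketch of that binning, assuming timestamps in the `YYYYMMDDHHMMSS` format of the dataset; the function names and the sample timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime


def half_hour_bin(ts: str) -> str:
    """Map a YYYYMMDDHHMMSS timestamp to the start of its half-hour slot."""
    t = datetime.strptime(ts, "%Y%m%d%H%M%S")
    minute = 0 if t.minute < 30 else 30
    return t.strftime("%Y-%m-%d %H:") + f"{minute:02d}"


def count_per_half_hour(timestamps):
    """Count actions per half-hour slot."""
    return Counter(half_hour_bin(ts) for ts in timestamps)


# Hypothetical order timestamps, for illustration only.
orders = [
    "20161110233512", "20161111002959", "20161111003000",
    "20161111004501", "20161111005624",
]
counts = count_per_half_hour(orders)
peak_slot, peak_count = counts.most_common(1)[0]
print(peak_slot, peak_count)
```

Running the same function separately on browsing, adding-to-cart and ordering timestamps gives one curve per action type.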

Shopping behaviour patterns of users

To answer how discounts and promotions influence the shopping behaviour patterns of users, we analyse the number of items each user orders and the average time that users spend on each action when shopping online on November 11, compared with November 10 and 12. Figure 6 shows the Cumulative Distribution Function (CDF) curves of the number of items that one user orders. 80% of users order no more than 3 items on November 10 and 12, while 80% of users order no more than 4 items on November 11. Moreover, the heaviest buyers on November 11 buy 35 items, while the heaviest buyers on November 10 and 12 buy 27 and 29 items, respectively. Big discounts and promotions on November 11 stimulate the purchasing desire of some users.

Figure 7 shows the CDF curves of the average number of browsing actions before a user adds an item to the cart and of the average browsing time before a user makes an order. Figure 7a shows that 80% of users browse no more than 7 times before they add an item to the cart, whether on November 10, 11, or 12. The maximum average number of browsing actions before adding an item to the cart is 218.5 on November 11 and 225 on November 10. On November 12 the maximum is 131, which may be because fewer items remain attractive once most users have bought most of what they wanted on November 11. Figure 7b shows the CDF curves of the average browsing time before a user makes an order. 90% of users spend no more than 22.15 min on average browsing before they make orders on November 11, compared with no more than 12.75 min on November 10 and 14.45 min on November 12. A possible explanation is that different merchants offer different discounts and promotions on the same items, so consumers are prone to shop around before making their final decisions.

Fig. 6 CDF curves of the number of items that one user orders on 10th to 12th November

Fig. 7 CDF curves of the average browsing times before one user adds an item to cart and makes an order on 10th to 12th November
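Statements of the form "80% of users order no more than 3 items" are read off an empirical CDF. The sketch below shows one way to compute such a curve and the smallest value that covers a given fraction of users; the function names and the sample counts are invented for illustration and are not taken from the dataset.

```python
from collections import Counter


def ecdf(values):
    """Empirical CDF as a list of (value, fraction of observations <= value)."""
    n = len(values)
    counts = Counter(values)
    cumulative = 0
    points = []
    for v in sorted(counts):
        cumulative += counts[v]
        points.append((v, cumulative / n))
    return points


def value_at_fraction(values, q):
    """Smallest value v such that at least a fraction q of observations are <= v."""
    for v, fraction in ecdf(values):
        if fraction >= q:
            return v
    return None  # only reached for an empty input


# Hypothetical numbers of items ordered by ten users.
items_per_user = [1, 1, 2, 3, 3, 3, 4, 5, 7, 35]
print(ecdf(items_per_user))
print(value_at_fraction(items_per_user, 0.8), max(items_per_user))
```

The maximum of the input corresponds to the heaviest buyer, i.e. the point where the curve reaches 1.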

Popularity of items

Figure 8 displays the number of concurrent shoppers on JD.com binned by the hour, which shows a peak starting shortly before the sale event (i.e., 11 PM on November 10th to 2 AM on November 11th). The temporal pattern demonstrates that mobile shoppers rushed to the site anticipating substantial sales on the Singles’ Day. Sales marked a record high for JD.com in 2016, when nearly one out of every 10 visitors purchased at least one item during the three-day period.

Fig. 8 Hourly clicks prior to and during the shopping festival

To investigate how discounts and promotions affect the popularity of items, we analyse the number of times that each item is ordered on November 10, 11 and 12, respectively. As shown in Fig. 9, 80% of items are ordered no more than 3 times on each of the three days. Nevertheless, the most popular item is ordered 5204 times on November 11, while no item is ordered more than 2061 times on November 10 or 1091 times on November 12. Discounts and promotions thus raise the sales of items substantially.

Fig. 9 CDF curve of the times that one item is ordered on 10th to 12th November

Collaborative filtering based recommendation

In this section, we aim to recommend items to users based on their historical ordering records and the characteristics of online shopping behaviors analyzed in the “Characteristics of online shopping behaviours” section. We pre-process the dataset to generate the purchasing matrix, which contains 3821 users as columns and 5564 items as rows. Each element of this matrix is either 0 or 1, where 0 means the user did not buy the item and 1 means the user bought the item. The matrix has only 6166 non-zero elements, so the data sparsity problem is very severe in this task. We therefore borrow the idea of transfer learning: the records for browsing and adding to cart, as well as ordering, are used to predict consumers’ future shopping behaviors.

We use K-fold Cross Validation (K-CV) to evaluate the performance of the recommendation method applied in this article, with K set to 5. The purchasing matrix is randomly divided into 5 parts; each part is used as the testing set once, with the other four parts used as the training set. This gives us more confidence in our results. We use MyMediaLite version 3.11 (cf. Gantner et al. 2011) to perform item prediction from positive-only implicit feedback, applying the matrix factorization method WR-MF (cf. Hu et al. 2009). Our dataset has no explicit feedback showing whether a customer likes or dislikes a product; we only know whether a customer browses, adds to cart or makes an order. WR-MF is a good choice for this case. We set the WR-MF parameter numFactors to 10 and regularization to 0.0015, values that gave the best performance over multiple trials. We evaluate the performance of the collaborative filtering based approach in terms of Precision@k and Recall@k, which are defined as follows (cf. Li et al. 2016):

$$ \mathrm{Precision}@k=\frac{1}{N}\sum \limits_{i=1}^N\frac{\mid {S}_i(k)\cap {T}_i\mid }{k}, $$
$$ \mathrm{Recall}@k=\frac{1}{N}\sum \limits_{i=1}^N\frac{\mid {S}_i(k)\cap {T}_i\mid }{\mid {T}_i\mid }, $$

where N is the number of users being evaluated, S_i(k) represents the set of top k items recommended to user i, T_i represents the set of items bought by user i in the testing dataset, and ∣·∣ denotes the number of elements in a set.
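As an illustration of these two metrics, the sketch below computes Precision@k and Recall@k from ranked recommendation lists and held-out purchases. It is a plain re-implementation of the definitions, not the evaluation code of MyMediaLite; the function name, the input layout and the toy data are assumptions, and users with an empty test set are skipped to avoid dividing by zero.

```python
def precision_recall_at_k(recommended, purchased, k):
    """Average Precision@k and Recall@k over users.

    recommended: dict mapping user -> ranked list of recommended items
    purchased:   dict mapping user -> set of items bought in the test set
    Users with no test purchases are skipped (an assumption of this sketch).
    """
    users = [u for u, items in purchased.items() if items]
    if not users:
        return 0.0, 0.0
    precision = recall = 0.0
    for u in users:
        top_k = set(recommended.get(u, [])[:k])
        hits = len(top_k & purchased[u])  # |S_i(k) ∩ T_i|
        precision += hits / k
        recall += hits / len(purchased[u])
    n = len(users)
    return precision / n, recall / n


# Toy example (hypothetical users and items).
recommended = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
purchased = {"u1": {"a", "x"}, "u2": {"e"}}
p, r = precision_recall_at_k(recommended, purchased, k=2)
print(p, r)
```

In the toy example each user has one hit in the top 2, so Precision@2 is 0.5 for both users, while Recall@2 is 0.5 for u1 and 1.0 for u2.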

The performance of the collaborative filtering based approach is shown in Table 3. The results show that it is still hard to recommend any single item. We therefore need to find features that indicate whether a user will buy, which can then be used in purchase prediction.

During a short-term shopping festival, most people have to decide quickly whether to purchase something. We therefore focus mainly on short-term human behaviour in this paper. When studying Internet browsing behaviour, it is common to divide a user’s behaviour into sessions, where each session represents one period of activity from login to logout. Many studies show that a user’s browsing activity has most probably finished if the user has not sent any more requests for 20 min. We also assume that once people have placed an order, they no longer pay attention to the ordered products during that activity. Based on the above, we cut each user’s whole click stream into sessions using both the 20 min interval threshold and each ordering action. Figure 10 shows the distribution of session length. In fact, 35% of sessions consist of only one action. The session lengths of purchasers and non-purchasers also have very different distributions. This is easy to understand: when a user intends to buy something, he or she spends more time and more clicks looking into a series of products. We thus evaluate the occurrence of purchasing behaviour using the following factors.

Table 3 The performance of the collaborative filtering based approach

Fig. 10 Distribution of session length. Purchaser means the session contains at least one ordering action

  • Cart. The shopping cart is designed directly for the convenience of ordering, so an add-to-cart action is strongly related to purchase intent. Quantitatively, our data show that 23% of purchasing sessions contain cart actions. Although this ratio is not very high, it is still considerably higher than the 7% of non-purchase sessions.

  • Session lengths. Intuitively, people spend more time and effort if they really intend to purchase something on the website, so in general we believe a longer session implies a higher probability of ordering. We show that the session length distributions differ considerably between purchasing and non-purchasing sessions.

  • Event page browsing. Since most consumers are attracted to the website by sales events, we would like to discover whether browsing event pages is indicative of people’s purchase decisions.

  • Number of browsed products. When people really want to purchase something, they usually compare several products and choose one. We therefore suppose that the more browsing actions a session has, the more likely it is to end with an ordering action.

  • Platform. We suppose people have different degrees of purchase intent on JD.com, and that this difference is reflected in the platform they choose. In general, the purchase rate on the native JD app is more than twice that on other platforms. Because some people just browse several event pages and leave soon, we suppose that browsing specific product pages means the visitor is seriously looking for something to buy. So we compare the number of buyers with the number of visitors who have browsed at least one product page. The resulting ratios for the JD app and third-party apps are nearly equal, and are twice that of mobile web browsers.

  • Date. In the “Characteristics of online shopping behaviours” section, we have already introduced how the sales events on each of the 3 days were held. We can simply treat the three days as one day before the festival, the festival day, and one day after. We have also shown that the amount of traffic on these days is quite different (see Fig. 10). In general, there is more activity on the 11th, followed by the 10th. Although the 12th is a Saturday, it still has the least traffic. In contrast to the traffic, the rate of purchasers on the 12th is higher than on the 10th. We can explain this in two ways. On the 10th, people were more likely to prepare for the upcoming shopping festival by looking for things to buy, and would wait until the festival began. This also explains why the session lengths on the 10th are the longest. Furthermore, some of JD’s coupons were still valid on the 12th, which was their deadline, so some people purchased things using these coupons for discounts.

  • Time of day. Two factors may cause the time of day to influence people’s ordering behaviour. One is that some sales events last for only several hours during the shopping festival. The other is that the Singles’ Day was a Friday, so people had to go to work on that day. We therefore also want to know how the time of day is related to the probability of purchase.
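The session segmentation and several of the factors listed above can be sketched as follows. This is a minimal illustration under our own assumptions: a session is cut when the gap to the next action exceeds 20 min or immediately after an ordering action, ordering actions are dropped before features are computed, and the action labels, function names and feature names are hypothetical rather than taken from the original pipeline.

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=20)  # inactivity threshold described in the text


def split_sessions(events, gap=GAP):
    """Split one user's time-sorted (timestamp, action) events into sessions.

    A session ends when the gap to the next event exceeds `gap`,
    or right after an "order" action.
    """
    sessions, current = [], []
    for ts, action in events:
        if current and ts - current[-1][0] > gap:
            sessions.append(current)
            current = []
        current.append((ts, action))
        if action == "order":
            sessions.append(current)
            current = []
    if current:
        sessions.append(current)
    return sessions


def session_features(session):
    """Label a session and compute features with ordering actions removed."""
    label = int(any(a == "order" for _, a in session))
    body = [(ts, a) for ts, a in session if a != "order"]
    duration = (body[-1][0] - body[0][0]).total_seconds() if body else 0.0
    return {
        "purchase": label,
        "n_actions": len(body),
        "n_browse": sum(a == "browse" for _, a in body),
        "has_cart": int(any(a == "cart" for _, a in body)),
        "duration_s": duration,
        "start_hour": session[0][0].hour,
    }


# Hypothetical click stream of a single user.
t0 = datetime(2016, 11, 11, 0, 0)
events = [
    (t0, "browse"),
    (t0 + timedelta(minutes=5), "cart"),
    (t0 + timedelta(minutes=6), "order"),
    (t0 + timedelta(minutes=10), "browse"),
    (t0 + timedelta(minutes=40), "browse"),
]
sessions = split_sessions(events)
features = [session_features(s) for s in sessions]
print(len(sessions), features[0])
```

Features such as platform, date or event page browsing would be added to the same dictionary when the corresponding fields are available in the logs.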

Now, with the given session statistics (number of clicks, users, duration per product category), we design machine learning methods to carry out the prediction. We down-sample our data so that there are equal numbers of purchasing and non-purchasing sessions. Because there are many different kinds of features, we choose a logistic regression classifier and apply 5-fold cross validation. Logistic regression is a regression model in which the Dependent Variable (DV) is categorical. It covers the case of a binary dependent variable, that is, one that can take only two values, “0” and “1”, which represent outcomes such as pass/fail, win/lose, alive/dead or healthy/sick. Cases where the dependent variable has more than two outcome categories may be analyzed with multinomial logistic regression or, if the categories are ordered, with ordinal logistic regression. The performance of our prediction is measured by the AUC (cf. Fawcett 2006), which is defined as follows.

  • AUC: the area under the ROC curve (mathematically, a definite integral) is equal to the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming “positive” ranks higher than “negative”). In practice, the AUC can be calculated by averaging a number of trapezoidal approximations.
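The two views of the AUC described above can be checked against each other on a small example. The sketch below computes the AUC once as the fraction of positive-negative pairs ranked correctly (ties count one half) and once as the trapezoidal area under the ROC curve; the function names, labels and scores are made up for illustration.

```python
def auc_pairwise(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_trapezoid(labels, scores):
    """Area under the ROC curve using the trapezoidal rule."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # (false positive rate, true positive rate)
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical session labels (1 = purchase) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc_pairwise(labels, scores), auc_trapezoid(labels, scores))
```

Here 8 of the 9 positive-negative pairs are ordered correctly, so both functions give 8/9.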

Based on the metric defined above, we show the prediction results for specific categories in Fig. 11. The results show that the AUC is above 75% in all cases, reaching about 84% when the category is phone. In general, there is little difference between types of shopping items. The results also demonstrate that session-level indicators are sufficient to predict the purchase action. We thus conclude that the proposed method can predict the purchase action with high accuracy.

Fig. 11 The AUC of ROC for purchase prediction in a specific category

Related works

Research on consumer behaviour modelling in e-commerce dates back to the earliest e-commerce websites. Consumer behaviour modelling was then conducted to predict the acceptance of e-commerce (cf. Pavlou 2003) and to stock goods appropriately (cf. Hristoski and Mitrevski 2007). More recently, the browsing (cf. He et al. 2015), ordering (cf. Wu et al. 2015) and repeat ordering behaviours (cf. Liu et al. 2016) of consumers have been investigated to predict sales and recommend items to their latent buyers. Unlike the existing literature, our work investigates the average browsing time before a user adds an item to the cart and makes an order before, on and after the big shopping festival.

Another line of related work focuses on recommendation systems, which are widely used in e-commerce to improve cross-selling (cf. Kamakura 2014), increase customers’ loyalty (cf. Liu et al. 2016) and realize deep personalization (cf. Zou et al. 2017). Studies have investigated users’ consumption intention from various aspects such as social media (cf. Ding et al. 2015), cursor movement (cf. Huang et al. 2011) and advertising target (cf. Farahat and Bailey 2012). This work differs from previous work in that we investigate the impact of the big sales and promotions offered by almost all e-commerce websites on the same day. We study their influence on sales variation, users’ consumption patterns and the popularity of items.

The third line of related work focuses on the machine learning techniques applied in recommendation systems. The main categories of techniques are collaborative filtering (cf. Schafer et al. 2007), content based (cf. Pazzani and Billsus 2007), knowledge based (cf. Nguyen et al. 2014) and hybrid (cf. Zhang et al. 2016). Collaborative filtering based approaches are further classified into memory based and model based. Memory based collaborative filtering can be realized with a user-based algorithm (cf. Zhu et al. 2009) or an item-based algorithm (cf. Sarwar et al. 2001), while model based collaborative filtering approaches are realized by matrix factorization (cf. Ma 2013) such as Singular Value Decomposition (SVD) (cf. Koren et al. 2009). In this work, since our dataset only includes implicit feedback, we apply a collaborative filtering based method to recommend items to consumers, and use 5-fold cross validation to evaluate its performance.

Discussion

The findings in this article offer some insights into recommending products to their latent buyers and predicting purchase actions. For example, since people spend more time and effort when they really intend to purchase something on a website, an e-commerce website could recommend products again to users who have browsed them for a long time but hesitate to buy. Since people like to compare prices across different e-commerce websites, it can be beneficial for an e-commerce company, when designing product pages for shopping festivals, to show the higher prices of rival websites for the same product on its own page. Since peak traffic occurs half an hour after the big discounts start, setting multiple discount times during a day would help alleviate the burden on logistics and website servers. These findings could offer constructive suggestions for merchants, logistics companies and e-commerce companies to increase their income and working efficiency.

Conclusions and future work

In this article, we investigate how the big discounts and promotions offered on November 11 influence the sales of e-commerce websites, consumers’ online shopping behaviours and the popularity of items, based on cleaned logs from the DPI dataset. The sales of e-commerce websites are stimulated sharply by discounts and promotions. We find that the sales per half an hour reach their peaks at 00:30 and 10:30 am on November 11, which are half an hour later than the times when big discounts and promotions start. Customers are more likely to shop around on the website before they make orders. Finally, the sales of the most popular items can increase several-fold on November 11. We also apply a collaborative filtering based approach to recommend items to users, and five-fold cross validation is conducted to evaluate the effectiveness of the proposed method. Moreover, we test the efficacy of several feasible precursors of purchasing actions (e.g., the effect of total browsing time, the number of clicks, product categories, and time of day on future purchases) by examining whether visiting the shopping site prior to the sale event or browsing a coupon page is indicative of future purchases.
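The half-hourly sales series behind the peak-time finding can be computed along the following lines. This is only an illustrative sketch: the record layout (a timestamp string plus an order amount) and the function names are assumptions, and each bin is labelled here by its start time, which may differ from the labelling convention used in our analysis.

```python
from collections import Counter
from datetime import datetime


def half_hour_sales(orders):
    """Sum order amounts into half-hour bins keyed by each bin's start time."""
    bins = Counter()
    for timestamp, amount in orders:
        t = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
        # Round the timestamp down to the nearest half hour.
        start = t.replace(minute=0 if t.minute < 30 else 30, second=0)
        bins[start] += amount
    return bins


def peak_bins(bins, top=2):
    """Return the start times of the `top` bins with the largest sales."""
    return [start for start, _ in bins.most_common(top)]
```

Given a list of `(timestamp, amount)` pairs, `peak_bins(half_hour_sales(orders))` returns the two half-hour windows with the highest sales, which can then be compared against the times at which discounts start.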

The effect of holidays and shopping seasons on retail is critical. The holiday period is by far the busiest shopping season of the year, and it can determine the difference between profit and loss for the year for many retailers. Mobile is clearly the big story. Despite accounting for a smaller overall percentage of spending, mobile devices have an outsized impact on retail sales growth. In terms of traffic, mobile outpaced desktop retail traffic by a factor of 2, and it was also higher on Cyber Monday, when online retailers promote exceptional bargains immediately following the Thanksgiving holiday weekend in the US.

In our future work, since we do not consider information diffusion here, it would be valuable to integrate shopping data from within the WeChat app. WeChat is a social network where information is shared among social ties (cf. Zhang et al. 2017) that agree to be mutual friends. People can forward shopping links to their social contacts, upon which a diffusion of information can occur. In addition, generalizing the analysis and the proposed algorithm to datasets from different years and different websites would make the results more convincing; this is also left as future work until we obtain such datasets. We wonder whether the key shopping behaviors observed in this dataset would be similar to those seen on other shopping websites during the shopping festival. Last but not least, we did not look into such diffusion cases or the proximity of information search behavior, where two connected individuals may search for similar items on the JD.com shopping site. To further improve the accuracy of recommendation and the prediction of purchase actions, advanced techniques such as deep learning and transfer learning can be applied to these tasks. We leave these issues as future work.