1 Introduction

Mobile apps continue to grow in popularity. Mobile app stores such as the Google Play store contain millions of apps, and a number of these apps have billions of downloads and active users. The apps belong to different categories, including games, communication, books, business, news, sports, and many others, and each category contains a huge number of apps. In the Google Play store, each app is usually identified by its name and its rating. The app rating is the average of all the ratings given by users. People usually prefer to download apps with high ratings because highly rated apps usually have higher quality than the rest. In this regard, Hsu and Lin (2015) performed a detailed analysis of the intentions of mobile phone users. Their study shows that app ratings have a considerable effect on the user’s intention to download and use an app. Therefore, companies use different techniques to increase the ratings of their apps on the Google Play store. Some companies even resort to fraudulent and deceptive activities to gain more ratings, so that their apps become more visible in the Play store (Zhu et al. 2015). However, to the best of our knowledge, there is no scientific approach for finding a relationship between ratings and other factors of the app.

To fill this research gap, it is important to analyze the various factors of Play store apps and find out whether there is any real connection between these factors and the app rating. For this purpose, it is worthwhile to test these factors by means of Machine Learning and data analysis.

In this paper, our sole objective is to identify the influence of various factors of Google Play store apps on the app rating. Although some researchers have worked on different app variables and their relationship with the app rating, none of them focused on the variables proposed in this research. In this research, different linear and non-linear regression models are used along with standard techniques for performance evaluation. Moreover, this research includes a detailed keyword analysis that finds the keywords associated with better ratings and those associated with lower ratings. For this purpose, we take real-world data of Google Play store apps and use various Machine Learning models to find a real relationship between these variables and app ratings. We also perform a detailed statistical analysis of app categories, app names, keywords, app size, number of installs, app types, content ratings, etc. against the overall app rating to find out how these factors correlate with app ratings. Specifically, we use Random Forest variable importance to find the importance of each variable, and we use the Linear Regression model and the Support Vector Regression model to find the importance of different factors. To evaluate the work, we use Mean Square Error and other performance evaluation techniques.

The rest of the paper is organized as follows. Section 2 discusses the literature review, Section 3 presents the research questions, and Section 4 explains the details of our research methodology. In Section 5 we discuss the results before concluding the paper in Section 6.

2 Literature review

As data availability, completeness and accuracy are a big issue in the mobile app market, the work in this area is still very limited. The Google Play store ranks each app that is published in the store. The overall app ranking system of Google (Fernandez 2013) uses a very complex infrastructure to rank apps. This is the reason the Google app ranking system outranks other app stores and is very successful. According to researchers, Google Play is more of a superstar market due to its popular products and apps, and it has a clean adoption system that ranks apps (Zhong and Michahelles 2013). Although these ranking systems are very complex and rank the apps well, there are many cases where users scroll to the bottom of the page to find the apps they like or an app with a better rating. Since the introduction of mobile app stores, only a few researchers have worked on mobile app rating prediction and variable importance, which help in finding a correlation between app ratings and other factors. Researchers have worked in different domains to find ways to get better ratings for apps. Some focused on user reviews, some targeted app attributes and features, and some worked on better software engineering practices. All of these domains are important in their own right, but app attributes are the ones usually analyzed to find a relationship between an app’s rating and its attributes. In this section, we discuss some of the contemporary research works in this area. Tian et al. (2015) performed a case study using statistical analysis to rank the different factors of apps that affect the app ratings; they found that the size of an app, promotional images and the target SDK of an app are the most influential factors of high-rated apps. Similarly, Finkelstein et al. (2017) investigated the relationship between price, rating and popularity in the BlackBerry app store, and their findings show that there is a strong correlation between customer ratings and popularity. Ali et al. (2017) performed a detailed analysis of apps from the Android and Apple app stores, including a quantitative analysis of app attributes and their effects on the apps in the different app stores. Moreover, Liang et al. (2017) used feature-oriented matrix factorization to predict mobile application ratings. Picoto et al. (2019) uncovered factors that influence the app rankings in the Apple app store and proposed a model that predicts the ratings of different apps. They considered a number of variables in their model, including package size, app release date, category popularity, etc., to find the importance of these factors. Khalid et al. (2016) analyzed the relationship between app ratings and static-analysis warnings. According to their findings, developers can use static analysis tools to identify bugs.

Similarly, researchers have used mobile app ratings in app recommender systems, expert systems and knowledge-based systems for different domains. Researchers have also proposed models to rank the risks of Android apps using different machine learning models (Peng et al. 2012). However, work on app variable importance is very limited and there is a gap in this field. On the one hand, developers and companies try their best to make apps that gain better rankings. On the other hand, researchers have identified ranking fraud in the mobile app market carried out by companies and developers to gain better rankings (Zhu et al. 2015).

Some researchers suggest that the app rating is not always important or that it is variable. Liu et al. (2014) performed a detailed analysis of the Google Play store; according to their findings, review ratings have a lower impact in the case of free apps. In another study, Martin et al. (2016) performed a detailed analysis of app releases by developers. According to their findings, 33% of such releases caused a significant change in user ratings. Karagkiozidou et al. (2019) conducted a study that helps developers use proper keywords and other primary elements to gain better app rankings through App Store Optimization (ASO). Similarly, Mcllroy et al. (2017) analyzed Google Play app ratings when the company responds to those ratings. The results show that users change their ratings 38% of the time following a response. Nevertheless, whether for the user’s own satisfaction or to meet the requirements and thresholds of recommender systems and expert systems, it is beneficial for an app to have high ratings. Such apps usually gain more downloads and attract more users.

While different researchers have focused on a number of attributes that influence app ratings, there are still some simple but important factors that have not yet been analyzed. Moreover, most researchers target only a few attributes and find the importance of those attributes for predicting the app rating. Usually, researchers take the default attributes of the apps and perform their analysis on those attributes. However, there are many more attributes that can be computed for each app and whose effects on the rating of that app can be analyzed. Therefore, to fill the research gap, we conduct a detailed study in which we take a large app store dataset, use a number of default app features, compute a number of additional attributes for each app, and find the importance of these attributes for app ratings.

3 Research questions

This study addresses the following research questions:

  • Which types of factors effectively determine the rating of apps in the Google Play store?

  • Does any set of factors exist that is more influential and has a strong relationship with the ratings of apps in the Google Play store?

  • Are there any keywords that promise better ratings and any keywords that result in low ratings?

In order to carry out the research, we perform ML analysis along with statistical analysis to rank the different factors of apps.

4 Research methodology

This study treats the identification of important variables for app rating as a regression problem. The research model used in this study is given in Fig. 1. The model is divided into two parts. In the first part, different variables are identified and computed. These variables are then tested on different regression and correlation models. The performance of these models is evaluated using different performance evaluation techniques, and the most effective and influential variables are identified. This part covers the first and second research questions. In the second part, keywords from app titles are processed and different rankings are computed on the basis of ratings and frequency.

Fig. 1 Proposed Research Model

4.1 Dataset collection

The Google Play store apps dataset was collected from Kaggle (https://www.kaggle.com/lava18/google-play-store-apps). The dataset contains ranking and review data of 10,840 apps. A number of variables are available in the dataset, including app id, app name, category, rating, reviews, size, installs, type, price, content rating, genres, last updated, current version, and android version. The dataset was collected, preprocessed and stored in a database for further processing.

4.2 Variable importance

This part of research methodology is further divided into four main parts. Each part covers one aspect of data analysis.

4.2.1 Variable identification and Computation

In this part, a number of variables are identified and computed. Some of the variables are present in the dataset while we proposed other variables and computed their values. All these variables are computed and stored so further processing can be applied. The details of the variables are given as follows.

  1. category_name: Each app that is uploaded to the Google Play store has a category associated with it.

  2. no_of_reviews: The number of reviews is also associated with each app.

  3. app_size: App size, as suggested by its name, is the size of the app mentioned in the app description.

  4. no_of_installs: The number of installs is given for each app in a categorical manner, e.g., 1000+, 10,000+, etc.

  5. type_of_app: Apps are of two types, i.e., free or paid.

  6. price_of_app: The price is associated with paid apps, while free apps have a cost of 0.

  7. content_rating: A content rating is also associated with each app, e.g., 1+, 13+, etc.

  8. Genres: Along with the category, genres are also associated with the apps. The genre is usually a little different from the category.

  9. android_version: The minimum Android version required to install the app.

  10. word_count_in_name: The total number of words in the title of an app is computed for each app.

  11. character_count_in_name: The total number of characters in the title of an app is computed for each app.

  12. symbol_count_in_name: The total number of symbols in the title of an app is computed for each app.

  13. category_related: A Boolean variable is computed for each app that matches the words used in the app title against the words used in the category. If any of the words match, the value is true, else false.

  14. free_in_title: A Boolean variable is computed for each app that indicates whether the word free is used in the app title.

  15. genre_related: A Boolean variable is computed for each app that matches the words used in the app title against the words used in the genre. If any of the words match, the value is true, else false.

  16. digits_in_title: A Boolean variable that indicates whether any numeric value is present in the app title.

  17. year_in_title: A Boolean variable that indicates whether any year value (from 2000 to 2020) is used in the app title.
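
As an illustration, the title-based variables listed above could be computed roughly as follows. This is a minimal sketch and not the implementation used in this study: the helper name title_features, the ASCII tokenization rule, and the definition of a symbol as any character that is neither alphanumeric nor whitespace are all assumptions made for the example.

```python
import re


def title_features(title: str, category: str) -> dict:
    """Compute title-based variables for one app (hypothetical helper)."""
    # Assumption: a word is a run of ASCII letters or digits.
    words = re.findall(r"[a-z0-9]+", title.lower())
    category_words = set(re.findall(r"[a-z0-9]+", category.lower()))
    numbers = re.findall(r"\d+", title)
    return {
        "word_count_in_name": len(words),
        "character_count_in_name": len(title),
        # Assumption: a symbol is neither alphanumeric nor whitespace.
        "symbol_count_in_name": sum(
            1 for ch in title if not ch.isalnum() and not ch.isspace()
        ),
        # True if any title word also appears in the category name.
        "category_related": any(w in category_words for w in words),
        "free_in_title": "free" in words,
        "digits_in_title": bool(numbers),
        # A four-digit number between 2000 and 2020 counts as a year.
        "year_in_title": any(
            len(n) == 4 and 2000 <= int(n) <= 2020 for n in numbers
        ),
    }


features = title_features("Photo Editor 2019 - Free!", "PHOTOGRAPHY")
```

Passing the genre string instead of the category string to the same helper would give a genre_related value under the same matching rule.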

4.2.2 Regression models

A number of regression and correlation models are used. The details of these models are given as follows.

  • Random forest: Random forest regression is applied to all the variables. Its results determine the importance of all the variables and their influence on the rating, and they are evaluated using Mean Square Error. Random forest regression is the first model applied to the dataset.

  • Support Vector Regression: As Support Vector Regression (SVR) is a promising regression model for continuous variables, it is used to find the importance of the numeric variables with respect to the rating. Because SVR is usually used for continuous numeric data, this model is applied only to the numeric variables.

  • Linear Regression: The Linear Regression model is also used to find the importance of different variables for the rating. Although linear regression is a simple model, it sometimes produces better results than more complex models. When this model is applied to the dataset, only the numeric variables are considered.

  • Pearson Correlation: The Pearson correlation is used to compute the correlation of the binary variables with the rating. Although these variables are not expected to show high correlations, even a small influence can be significant.

After applying the different models, the performance of each model is computed. As each model is usually evaluated with a different kind of evaluation method, different performance evaluation techniques are used.
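
To make the correlation step concrete, the following sketch computes the Pearson correlation between a binary variable and the rating in plain Python. It is an illustration under stated assumptions, not the code used in this study: the function names pearson and t_statistic and the sample values are hypothetical. For a 0/1 variable, the Pearson coefficient coincides with the point-biserial correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for testing r = 0 with n - 2 degrees of freedom."""
    if abs(r) >= 1:
        raise ValueError("t statistic is undefined for |r| = 1")
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical sample: 1 if the word "free" is in the title, else 0.
free_in_title = [1, 0, 1, 0, 0, 1]
rating = [4.1, 4.5, 3.9, 4.4, 4.6, 4.0]
r = pearson(free_in_title, rating)
t = t_statistic(r, len(rating))
```

Converting the t statistic into a p value requires the Student t distribution, which the standard library does not provide; a statistics package such as SciPy (scipy.stats.pearsonr) returns the coefficient together with its p value.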

4.2.3 Performance evaluation

The performance of the different models is evaluated using different performance evaluation techniques. The details are given as follows.

  • %IncMSE: %IncMSE measures the increase in Mean Square Error when the values of a variable are randomly permuted, and it is a robust and informative measure of variable importance. A higher %IncMSE shows that the variable is more important, while a lower value shows that the variable is less important.

  • RMSE: Root Mean Square Error (RMSE) is one of the most widely used statistics. It measures the difference between the values predicted by the model and the observed values.

  • MAE: Mean Absolute Error (MAE) is a measure of the difference between two continuous variables. It is the average distance between each point and the identity line.

  • p value: During a hypothesis test, a p value is used to determine the significance of the results. A small p value (less than 0.05) indicates strong evidence against the null hypothesis.
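
The error measures above can be sketched in a few lines of Python. The block below is illustrative only. In particular, pct_inc_mse is a simplified, hypothetical stand-in for %IncMSE: it shuffles one column and reports the percentage increase in MSE for an arbitrary prediction function, whereas random forest implementations typically compute this measure on out-of-bag samples per tree and may scale it differently. The source does not state which implementation was used.

```python
import math
import random


def mse(y_true, y_pred):
    """Mean Square Error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pct_inc_mse(predict, rows, y_true, column, seed=0):
    """Percentage increase in MSE after shuffling one column of rows.

    Simplified permutation importance (assumption, see lead-in).
    `predict` maps one row (a list of values) to a predicted rating.
    """
    baseline = mse(y_true, [predict(r) for r in rows])
    if baseline == 0:
        raise ValueError("baseline MSE is zero; percentage is undefined")
    shuffled = [r[column] for r in rows]
    random.Random(seed).shuffle(shuffled)
    permuted_rows = []
    for row, value in zip(rows, shuffled):
        new_row = list(row)  # copy so the input rows stay untouched
        new_row[column] = value
        permuted_rows.append(new_row)
    permuted = mse(y_true, [predict(r) for r in permuted_rows])
    return 100.0 * (permuted - baseline) / baseline
```

A variable that the model ignores yields 0 under this definition, because shuffling it leaves every prediction unchanged.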

4.2.4 Identification of most important variables

On the basis of the results of the different models, we identify the most important variables that influence the overall rating of an app. As the app rating ranges from 0 to 5 with a granularity of 0.01, even a small change in the rating matters. Moreover, as a huge number of apps are available in the Play store, a significant difference in the ranking of an app changes its position in the Play store by a big margin.

4.3 Keyword processing

In the second part of the data analysis, keywords are processed. This part analyzes the most important keywords used in app titles that impact the app rating positively or negatively. Moreover, the most frequently used keywords in app titles are also computed. In this phase, stop-words are removed from app titles, and POS tagging is applied to identify the important nouns and remove unimportant words. Then, for every keyword, its frequency, mean rating, max rating and min rating are computed. This analysis is used to identify the keywords in app titles that result in higher ratings, the keywords that result in lower ratings, and the most frequently used keywords.
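
The keyword statistics described above can be sketched as follows. This is a simplified illustration under several assumptions: the stop-word list is a tiny hypothetical one, POS tagging is omitted entirely, each keyword is counted once per title, and the helper names keyword_stats and rank_by_mean_rating are invented for the example. The default threshold mirrors the frequency threshold N = 20 reported in the results; whether the study applies it inclusively is not stated.

```python
import re
from collections import defaultdict

# Assumption: a small illustrative stop-word list, not the one used in the study.
STOP_WORDS = {"a", "an", "and", "the", "for", "of", "to", "in", "on", "with", "by"}


def keyword_stats(apps, min_frequency=20):
    """Aggregate ratings per title keyword.

    apps: iterable of (title, rating) pairs.
    Returns {keyword: {"frequency", "mean_rating", "max_rating", "min_rating"}}
    for keywords that occur in at least min_frequency app titles.
    """
    ratings_by_keyword = defaultdict(list)
    for title, rating in apps:
        # Count each keyword once per title; POS tagging is skipped here.
        tokens = set(re.findall(r"[a-z]+", title.lower())) - STOP_WORDS
        for token in tokens:
            ratings_by_keyword[token].append(rating)
    stats = {}
    for keyword, ratings in ratings_by_keyword.items():
        if len(ratings) >= min_frequency:
            stats[keyword] = {
                "frequency": len(ratings),
                "mean_rating": sum(ratings) / len(ratings),
                "max_rating": max(ratings),
                "min_rating": min(ratings),
            }
    return stats


def rank_by_mean_rating(stats, descending=True):
    """Keywords sorted by mean rating, highest first by default."""
    return sorted(stats, key=lambda k: stats[k]["mean_rating"], reverse=descending)
```

Calling rank_by_mean_rating with descending=False gives the lowest rated keywords first, which corresponds to the second ranking discussed in this section.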

5 Results and discussions

After computing all the variables and applying the models, results are obtained from the different models. Each model has its own significance. Random forest regression with variable importance is one of the most commonly used approaches for finding the important variables in a dataset. The results of random forest regression are shown in Table 1. The results show that no_of_reviews, genre, character_count_in_name and app_size are the most influential variables and have a high impact on the ratings of the apps. In contrast, year_in_title, free_in_title and digits_in_title have little importance in terms of predicting the rating of an app.

Table 1 Random Forest Regression Results

In order to analyze the importance of the number of reviews for the app rating, a scatter plot of the number of reviews against the app rating is shown in Fig. 2. The graph shows that apps with a higher number of reviews usually have higher ratings. These are usually apps that have a very high number of installs and are owned by bigger companies that keep improving their apps.

Fig. 2 Scatter chart of Apps rating for different number of reviews of apps

Similarly, some genres are ranked higher and are highly liked by users. Therefore, apps in those genres are highly rated. On the other hand, some genres are highly criticized by users, either because users expect much more from the apps or because the users are highly diverse and come from different backgrounds, which leads to mixed or lower ratings. Another important attribute is character_count_in_name, which shows that the character count of the app title matters. To better illustrate the relationship between character count and mean app rating, a scatter graph is shown in Fig. 3. The figure shows that, most of the time, when the character count in the app title is low, the ratings are low, and when the character count is higher, the ratings are higher. Although the correlation between these two attributes is not perfect, it is present.

Fig. 3 Scatter chart of Apps rating for different Character counts in App Title

After the random forest regression, the Linear regression model is applied. For this model, only numeric values are used because a linear regression model works best on continuous numeric values. The results of the Linear regression model are shown in Table 2. The results show that the correlation between the app rating and the variables is very low. However, there are still some important variables, such as symbol_count_in_name and type_of_app. In this kind of analysis, a linear regression model does not fit well, as there usually is no direct linear relation between a predictor variable and the rating. Nevertheless, the results show that the p values are very low in most cases.

Table 2 Linear Regression Results

For a more advanced correlation measure through regression, SVR is used. SVR works similarly to a support vector machine but is used for continuous values. The results of SVR are shown in Table 3. According to the results of SVR, word_count_in_name and content_rating are the most promising variables.

Table 3 Support Vector Regression Results

Just as the number of characters in the app title is an important variable, the number of words in the app title also turns out to be an important variable, as shown by SVR. Fig. 4 shows the average rating against the number of words in the app title. The results show that the higher the number of words, the higher the rating of the app. This correlation is similar to that of the character count in the app title.

Fig. 4 Scatter chart of Apps rating for different Words counts in App Title

According to the results of the random forest regression model, no_of_reviews, genre, character_count_in_name and app_size are the most important variables. As random forest regression uses ensemble learning and aggregates many decision trees, its results carry more weight than those of the other regression models. According to the linear regression model, symbol_count_in_name and type_of_app are important variables. However, the significance of these two variables is very low, and the coefficient values of the other variables are also very low. This means that the app rating does not have a linear relationship with the other variables. Therefore, we can say that none of the variables has a simple relationship with the rating and that the linear model is not able to find the real importance of any variable. Similarly, according to the SVR results, word_count_in_name and content_rating are important variables. Although each model has its own merits and determines importance in its own way, if we take a look at the top variables of each regression model, i.e., number of reviews in Fig. 2, character count in app title in Fig. 3 and word count in app title in Fig. 4, the results suggest that these variables have a high impact on the app ratings.

Fig. 5 Word cloud of most frequently used keywords in App title

As SVR and linear regression are more suited to numeric values, the Pearson correlation is computed for the binary variables. The results of the Pearson correlation are shown in Table 4. The results show that although the correlation is low for each variable, the significance of the variables is high. symbol_count_in_name appears to be the best variable among them, while the other variables have very low or no significance for the app rating.

Table 4 Pearson Correlation Results

After computing the importance of the variables, we discuss the experimental results of the keyword analysis. First, we discuss the most frequently used keywords in app titles. The top 150 keywords are chosen, and the word cloud of the most frequent keywords is shown in Fig. 5. According to the results, words like free, app, pro, camera, mobile, live, video, etc. are the most frequently used keywords in app titles.

Moreover, keywords present in app titles that have high ratings are shown in Table 5. The frequency threshold “N” is set to 20, so that keywords that appear in fewer apps are excluded. The results show that the most important keyword is ‘workout’, which is used in 30 apps, and the average rating of the apps that contain the word workout is 4.6. This shows that users mostly like apps whose title contains the word workout, or in other words that people like exercise apps. There are many other similar keywords with high ratings that give an idea of what kinds of words or areas are mostly liked by users.

Table 5 Keywords with High Rating

Similarly, the keywords with lower ratings are also computed. These results, shown in Table 6, are the most important ones, as they give an idea of which words or areas are mostly disliked by users. The frequency threshold “N” in this case is also 20. The results show that fk, cd, cf, ah, fn, etc. are the lowest rated keywords. Some of these keywords are used as abbreviations, while some are used as brand names by app developers. These results are interesting, as the table shows that most of these keywords are not dictionary words and do not mean anything. This shows that such words have a negative impact on users in terms of rating, and users usually rate such apps low.

Table 6 Keywords with low ratings

Although this is a vast area that requires a detailed analysis in terms of data, completeness and evaluation, the unavailability of a detailed dataset is one of the main reasons why research has not produced much work in this area. According to the results of random forest, no_of_reviews, genre, character_count_in_name and app_size are the most important features, while the simpler linear regression model shows that symbol_count_in_name and type_of_app are the important variables. Similarly, the Pearson correlation shows that symbol_count_in_name has higher significance. Therefore, the positive use of these variables can improve the overall rating of an app in the Play store. There are some places where the models do not fit well, and it is difficult to find more stable and better variables from a very small dataset. Due to the unavailability of a larger dataset, a more in-depth analysis is not possible. Still, the results of the current analysis reveal that some of our proposed variables have significance and can be influential.

6 Conclusion

In this paper, a detailed analysis is carried out on a number of variables that influence the ratings of Google Play store apps. Different variables are proposed and computed for this purpose, and the results for each variable are discussed. The results show that there are some significant variables that are able to influence the rating of apps. As app ratings have a very small scale, even a minor change in the rating can lead to a better outcome, more downloads and more visibility. Therefore, even variables of lower significance are important in this domain. The performance evaluation results show that some of the proposed variables have high significance and can be used in a positive way to increase app ratings. We also performed a detailed keyword analysis, which revealed a number of important points. The results show that some keywords promise higher ratings while others usually mean lower ratings. These keywords also reveal which categories of apps users consider important and unimportant. In the future, we aim to use a much larger database and incorporate app variables as well as app reviews and reviewer variables in order to predict the ratings of an app. This multi-dimensional analysis would give a much better picture of variable importance and help us find the factors that contribute to higher app ratings.