Development of stock market trend prediction system using multiple regression

Asghar, Muhammad Zubair; Rahman, Fazal; Kundi, Fazal Masud; Ahmad, Shakeel

doi:10.1007/s10588-019-09292-7

S.I.: CMKBO
Published: 14 February 2019

Volume 25, pages 271–301, (2019)

Computational and Mathematical Organization Theory

Muhammad Zubair Asghar ORCID: orcid.org/0000-0003-3320-2074¹,
Fazal Rahman¹,
Fazal Masud Kundi¹ &
Shakeel Ahmad²

Abstract

Stock market trend prediction helps investors, public companies, and governments decide where to invest by weighing profit against risk. Existing studies on stock prediction systems rely on data acquired from social media sources (sentiment-based) and secondary data sources (financial sites). However, data acquired from such sources is usually sparse. Moreover, the selection of predictor variables is often poor, which ultimately degrades the performance of the prediction model. These problems can be overcome by a prediction model with improved input data quality and an enhanced selection of predictor variables. This work presents the results of stock prediction using a multiple regression model implemented in R. The proposed system achieved a prediction accuracy of 95% on the KSE 100-index dataset, 89% on the Lucky Cement dataset, and 97% on the Abbot Company dataset. Furthermore, a user-friendly interface is provided to help individuals and companies decide whether to invest in a specific stock.

1 Introduction

Predicting events such as floods, weather, earthquakes, elections, and stock movements is a challenging task with a major impact on human life. Stock price prediction is a complicated and dynamic process because of rapid price movements and market behavior. Prediction techniques can play a crucial role in bringing new and existing investors together in the stock market (Zhang et al. 2013). Stock market forecasting and analysis have attracted data mining researchers, who aim to develop applications that help individuals and business organizations decide whether to invest in a specific stock by taking profit and risk into account.

The major objective of stock market prediction is to design software applications that can (i) analyze stock-related data from different financial resources, (ii) process that data using a prediction model, and (iii) present information to users. The design of such applications receives considerable attention from financial experts, economists, businesspeople, and others. Many individuals, businesses, and organizations consider stock trading a profitable activity and wish to take part in it. However, before buying or selling a stock, they have to estimate its profit and demand using online financial data and trends. Stock prediction is therefore a challenging task because of factors such as the retrieval and analysis of relevant stock data, the fitting of data to an appropriate model, and the interpretation of results (Ladan et al. 2014).

The existing studies (Ladan et al. 2014; Kamley et al. 2013; Javaid 2010; Reddy 2010; Yuan and Luo 2014; Park et al. 2010; Ariyo et al. 2014; Devi et al. 2013) on stock prediction rely either on sentiment data acquired from social sites or on secondary data acquired from financial sources. Sentiment-based approaches mainly rely on unstructured data extracted from social media sites. Their major limitation is the low accuracy of the prediction model caused by noisy input data. The other category of techniques is based on analytical models using secondary data. Analytical approaches provide more accurate predictions than sentiment-based techniques because their input data is well structured. However, the data they rely on is usually sparse. Moreover, the selection of predictor variables is often poor, which ultimately degrades the performance of the prediction model. These problems can be overcome by a prediction model with improved input data quality and a revised selection of predictor variables.

The proposed method is based on multiple regression analysis for stock trend prediction, supported by a revised set of stock indicators. Its key addition to the state-of-the-art methods (Ladan et al. 2014; Kamley et al. 2013; Javaid 2010) lies in the way it preprocesses the input data with this revised set of indicators. Our system takes a dataset as input, applies preprocessing steps, performs prediction, and gives the user a recommendation about the stock investment on the basis of that prediction. Proper selection of stock indicators, combined with the preprocessing steps, allows the proposed system to predict the stock trend more accurately than the state-of-the-art methods.

The main contribution of this work is the development of a multiple regression-based stock market trend prediction system using a revised set of predictors (Appendix A of Supplementary Material). The contributions are summarized as follows.

Applying different pre-processing steps for dimensionality reduction, such as data cleaning by filling in missing values and data reduction by volume compaction.
Developing a multiple regression-based prediction model for predicting the stock trend using a revised set of predictors.
Providing a user-friendly decision support interface that helps individuals and companies decide whether to invest in the stock market.
Evaluating the performance of the proposed system with respect to baseline methods.
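
The two pre-processing steps named in the first contribution can be illustrated with a minimal sketch. The text does not state how missing values are filled or how volume compaction is computed, so mean imputation and fixed-size block averaging are assumptions here, and the function names and sample prices are hypothetical.

```python
from statistics import mean

def fill_missing(values):
    """Replace None entries with the mean of the observed values.

    Mean imputation is an assumed strategy, not one confirmed by the paper.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def compact_by_average(values, window):
    """Reduce data volume by replacing each block of `window` rows with its average."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [mean(values[i:i + window]) for i in range(0, len(values), window)]

# Hypothetical daily closing prices with two gaps
closes = [100.0, None, 104.0, 102.0, None, 98.0]
cleaned = fill_missing(closes)                    # gaps replaced by the observed mean
compacted = compact_by_average(cleaned, window=3) # six rows reduced to two averages
```

Averaging over blocks trades temporal resolution for a smaller dataset, so the window size would need to match the prediction horizon of interest.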

To achieve the aforementioned objectives, we develop a stock trend prediction system based on a multiple regression model. The intention is to contribute knowledge that benefits stock market investors by assisting individuals and companies in deciding whether to invest in the stock market.

The rest of the paper is structured as follows. Section 2 presents the literature review. In Sect. 3, we describe the proposed method. The experiment design is presented in Sect. 4. The final section concludes the work with a discussion of how it can be extended in the future.

2 Related work

This section discusses relevant studies on stock market trend prediction. The prior works are classified into two major categories, namely (i) analytical approaches and (ii) sentiment-based approaches. Figure 1 shows the proposed classification scheme of the literature review.

The first level of the scheme shows the two major approaches to stock market prediction, namely (i) analytical and (ii) sentiment-based. The second level shows the related data sources for each approach. The last level depicts the models used by each approach. The rest of the section follows this classification scheme.

2.1 Analytical approaches

These approaches deal with the acquisition of secondary and historical stock data from different financial sources, such as Yahoo Finance, Google Finance, and Pakistan Stock Exchange. Analytical approaches are further classified into regression-driven and Autoregressive Integrated Moving Average (ARIMA) techniques.

The regression-driven approaches are based on different models, such as linear regression, multiple regression, logistic regression, and genetic algorithm. The collected data is passed through a specialized pre-processing module to prepare it for further processing. Finally, the pre-processed data is passed as input to the prediction module.

In data mining and statistics, multiple regression is a technique for estimating the relationship between a dependent variable and one or more independent variables (Zhang et al. 2013). It models the relationship between two or more explanatory variables and a response variable by fitting a linear equation to the observed data. The model associates each value of the independent variables x with a value of the dependent variable y. The following are some related works based on the regression paradigm.
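
As a minimal sketch of the technique described above, the following fits y = b0 + b1*x1 + ... + bp*xp by ordinary least squares using the normal equations. It is an illustration only, not the R implementation used in this work; the function names and the toy data (two made-up predictors) are hypothetical.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_multiple_regression(X, y):
    """Return least-squares coefficients [b0, b1, ..., bp] via the normal equations."""
    design = [[1.0] + list(row) for row in X]  # prepend the intercept column
    p = len(design[0])
    XtX = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(p)]
    return solve_linear_system(XtX, Xty)

def predict(coef, row):
    """Evaluate the fitted linear equation for one row of predictor values."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], row))

# Hypothetical toy data: two predictors and a response generated from a known line
X = [[10.0, 1.0], [11.0, 1.5], [12.0, 1.2], [13.0, 2.0], [14.0, 1.8]]
y = [2.0 + 0.5 * a + 3.0 * b for a, b in X]
coef = fit_multiple_regression(X, y)   # recovers roughly [2.0, 0.5, 3.0]
forecast = predict(coef, [15.0, 2.2])
```

The normal equations are used here only to keep the sketch dependency-free; a numerically sturdier solver (for example a QR-based one) would be preferable on real, correlated stock indicators.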

Ladan et al. (2014) introduced a multiple regression model to establish the relationship between stock prices and three macroeconomic variables, namely inflation rate, exchange rate, and interest rate. They compiled the dataset from the CBN Statistical Bulletin 2010, available at www.cenbank.org. They reported that the exchange rate is significant whereas the interest rate is not, and that there is a high correlation between the macroeconomic variables, such as inflation rate and exchange rate, and stock market returns. Moreover, 54.2% of the total variation in the All Index can be explained by the macroeconomic variables. However, they used only a limited number of macroeconomic variables; including more could produce better results.

Kamley et al. (2013) applied the multiple regression approach to predict the stock market price from stock market data (Yahoo Finance) using three variables, namely open, high, and close. Different pre-processing steps are applied for the cleansing, integration, and transformation of the data. They achieved a prediction accuracy of 89%. However, the major limitation of their work is insufficient training of the model due to poor selection of stock indicators.

Javaid (2010) used a multiple regression model to predict the market shares of companies from different parameters, namely KIBOR, dividend, earnings per share, gross domestic product, and inflation. They used the Karachi Stock Exchange (KSE) 30-Index dataset and achieved up to 62% prediction accuracy. However, more enhanced variables could be used to predict the stock trend.

Reddy (2010) developed a system based on a statistical analysis tool (SAS) to forecast stock market data, collecting secondary data over different periods from the National Stock Exchange (Nifty). They identified the seasonal differences and reported that the resulting data series is likely to be more accurate for building models such as autoregression (AR) and moving average (MA). Finally, the market index is forecast and the change in market indices is presented graphically. A limitation of the AR model is the large difference between predicted and real-time values caused by external factors, which often leads to inaccurate prediction.

Yuan and Luo (2014) analyzed the factors that affect price movements: price, trading volume, and position. The required data is acquired from the “wind information terminal”, including the index of opening price, closing price, highest and lowest price, volume, and total holdings. A decision tree model is applied to predict price movement from the selected variables, achieving a prediction accuracy of about 70%. The moving average price is more significant than volume and interest. They reported that changes in volume and position play a limited role in price forecasting.

Park et al. (2010) describe a procedure for analyzing financial data using a multiple regression model. They used the MRDDV model, which merges quantitative and qualitative variables to estimate the behavior of the data. The S&P closing price and the XHB daily closing price are used as quantitative variables, and consumer sentiment and housing construction data are used as qualitative variables. The model indicates both the highs and the lows of y in the stock data. However, a limitation of their study is that it includes only one quantitative and one qualitative variable instead of several quantitative variables for more effective analysis.

Unlike regression models, Autoregressive Integrated Moving Average (ARIMA) techniques, introduced by Box and Jenkins in 1970, are considered effective for short-term prediction, such as financial time series forecasting. The following are some selected studies.

Ariyo et al. (2014) introduced a short-term stock prediction system based on the ARIMA model, using stock-related data from the New York Stock Exchange (NYSE) and the Nigeria Stock Exchange (NSE). The model achieved competitive results with respect to the compared methods.

Devi et al. (2013) developed a stock trend forecasting system using the ARIMA model, highlighting seasonal trend and flow. Compared with baseline studies, the model produces effective time series analysis for stock trend prediction.

In their work on Indian stock prediction, Mondal et al. (2014) performed experiments on 56 Indian stocks using the ARIMA model. The proposed model is evaluated using the Akaike information criterion (AIC). Experimental results show the effectiveness of the model in terms of improved prediction accuracy over different previous periods of data.
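
For reference, the ARIMA(p, d, q) model and the AIC mentioned in these studies can be written in their standard textbook forms. The notation below is conventional and is an assumption here rather than taken from the cited papers: B is the backshift operator (B y_t = y_{t-1}), d is the order of differencing, c is a constant, \varepsilon_t is white noise, k is the number of estimated parameters, and \hat{L} is the maximized likelihood.

```latex
\phi(B)\,(1 - B)^{d}\, y_t = c + \theta(B)\,\varepsilon_t,
\qquad
\phi(B) = 1 - \sum_{i=1}^{p} \phi_i B^{i},
\quad
\theta(B) = 1 + \sum_{j=1}^{q} \theta_j B^{j}

\mathrm{AIC} = 2k - 2\ln \hat{L}
```

Among candidate orders (p, d, q) fitted to the same series, the one with the lowest AIC is usually preferred, since the 2k term penalizes extra parameters.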

2.2 Sentiment-based approaches

In these approaches, stock prediction is performed on the basis of user-generated reviews about a particular stock, posted on different online forums and review sites. User reviews about a particular stock are extracted, pre-processed, and classified using supervised and unsupervised learning approaches. The following paragraphs present some of the prior works on sentiment-based approaches.

Rao and Srivastava (2012) analyzed the relationship between tweet features, such as bullishness, volume, and agreement, and financial market indicators, such as volatility, trading, and stock prices, and found a stable correlation. On a daily basis, the numbers of positive and negative tweets are computed using a Naïve Bayesian classifier. For forecasting, they used an Expert Model Mining System (EMMS), assessed with R-square and mean absolute percentage error. The prediction accuracy increases as the time window increases, and vice versa. However, they conducted the experiments by capturing the sentiments for a particular index or company rather than for multi-company indices.
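
The two measures named above, R-square and mean absolute percentage error (MAPE), can be computed as in the following minimal sketch. The standard definitions are used; the function names and the sample values are hypothetical and do not come from any of the cited studies.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-square is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot

def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical actual and forecast prices
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 123.0, 127.0]
r2 = r_squared(actual, predicted)   # share of variance explained by the forecast
error_pct = mape(actual, predicted) # average relative error in percent
```

R-square rewards forecasts that track the variation in the series, while MAPE reports the average relative miss, so the two are often read together.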

Qasem et al. (2015) used machine learning techniques to classify Twitter sentiments. Stock sentiments are collected mainly from Twitter, Google, Facebook, and Tesla. After pre-processing, neural network and logistic regression models are trained for further analysis. Results show that unigram-based TF-IDF outperforms bigram-based TF-IDF, with an accuracy of 58%. A limitation of their work is that clustering techniques were not used.
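
To make the unigram versus bigram distinction concrete, the sketch below builds TF-IDF weights over n-gram terms. The weighting used (relative term frequency times the natural log of N/df) is one common variant and is an assumption; the cited study may have used a different formulation. The function names and sample tweets are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def tfidf(documents, n):
    """Return one {term: weight} dict per document, using n-gram terms."""
    term_lists = [ngrams(doc.lower().split(), n) for doc in documents]
    # Document frequency: number of documents containing each term
    doc_freq = Counter(term for terms in term_lists for term in set(terms))
    total_docs = len(documents)
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        vectors.append({
            term: (count / len(terms)) * math.log(total_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical tweets
tweets = ["stock price is rising", "stock price is falling", "market looks calm"]
unigram_vectors = tfidf(tweets, 1)  # terms such as "rising"
bigram_vectors = tfidf(tweets, 2)   # terms such as "is rising"
```

Bigrams capture short phrases but produce sparser vectors on short texts such as tweets, which is one plausible reason for a unigram representation to perform better.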

Nasseri et al. (2015) proposed an intelligent trading support system using text mining techniques and reported that stock-related microblogs affect stock prices. The stock-related sentiments are collected from the “StockTwits” micro-blogging site. After feature selection extracts the relevant terms in a tweet, a decision tree model is applied to identify the decision trend of important terms. Results show that users’ sentiments act as a valuable resource for stock trading decisions. A limitation of their work is the limited length of the N-grams used for extracting tweets.

Enke et al. (2011) proposed a three-layer stock market prediction system based on regression analysis. The first layer chooses only those variables that have a positive and significant relationship with the target. After the appropriate parameters are selected, the next stage applies a type-2 fuzzy clustering technique to these parameters to construct clusters of related data and to extract fuzzy rules from such clusters. Finally, the extracted fuzzy rules are passed through a neural network for effective prediction of stock market behavior.

In the classification of Indian stock market data, Soni and Shrivastava (2010) applied three supervised machine learning algorithms, namely classification and regression tree (CART), LDA, and QDA, which provide a comprehensive analysis of stock market data in the form of a binary tree, a linear surface, and a quadratic surface, respectively. The performance of the three algorithms is evaluated in terms of misclassification and correct classification rates. The misclassification rate of the CART algorithm is 56.11%, which is smaller than that of the other two algorithms, indicating better classification performance. Table 1 gives an overview of the selected studies conducted on stock prediction. The proposed system is shown in Fig. 2.

Table 1 Overview of selected studies
