1 Introduction

Selecting risky assets for a portfolio that optimizes an investor's return and risk is a herculean task in the current scenario of political instability, economic upheaval, increased terrorism, and supply chain disruption caused by natural calamities and epidemics throughout the world. With economic conditions growing more uncertain, companies facing recession and laying off staff to stay afloat, and the stock market becoming unpredictable, the formation of an optimal portfolio of risky assets has become difficult. In this context, the Modern Portfolio Theory developed by Harry Markowitz (1952) remains valuable: it provides a solid foundation for investors not only during normal times but also during times of uncertainty and crisis.

In the mean–variance portfolio framework of Markowitz, diversified portfolio development secures the highest possible expected return for a given degree of risk tolerance (Cheong et al. 2017). By clustering the assets based on certain characteristics and selecting assets from different clusters, one can ensure that the portfolio includes a well-diversified mix of assets. Moreover, the experimental results have shown that the use of clustering algorithms can improve the reliability of the portfolio (Long et al. 2014). Chen and Huang (2009) and Nanda et al. (2010) applied clustering in their work to mitigate the complexity of diversification. Since then, various authors have employed clustering techniques to address the diversification of stocks in portfolio selection problems (Ashfaq et al. 2021; Long et al. 2014).

Forecasting the returns of individual stocks is a crucial step in the portfolio construction process. Theoretical studies have shown that mean–variance portfolio selection problems are very sensitive to small forecast errors in the means and covariances (Du 2022; Chopra et al. 1993; Goldfarb and Iyengar 2003). In recent times, researchers have shown growing interest in extending the mean–variance model by improving the accuracy of the expected returns using various forecasting methods (Ashrafzadeh et al. 2023; Du 2022; Wu et al. 2021; Gu et al. 2020). Conventional forecasting techniques are not suitable for time series data exhibiting non-linearity and non-stationarity; in such situations, deep learning algorithms outperform conventional forecasting methods (Du 2022).

Selecting the right forecasting technique is crucial, but improving model performance also depends on various other aspects, such as determining an initial approximate solution, tuning the model's hyperparameters, and the training approach. To achieve the best results, researchers have introduced several metaheuristic methods designed to search for the optimal set of hyperparameters. Metaheuristic techniques also enable researchers to approximate optimal solutions to the portfolio optimization problem in an efficient manner (Erwin and Engelbrecht 2023). Some of these algorithms include the genetic algorithm (Gupta 2022; Cheong et al. 2017; Chang et al. 2009), the firefly algorithm (Wang and Liu 2019), and particle swarm optimization (Wang and Liu 2019; Song et al. 2023). Unlike exact methods, which are suitable for solving simpler optimization problems under strict assumptions, metaheuristic methods are applicable to a broad range of more complex problems.

In this paper, we propose a novel approach that integrates a clustering technique, deep learning, and a metaheuristic algorithm to enhance the process of asset selection and allocation. First, data are extracted from the ProwessIQ database for S&P BSE 500 index companies. We apply the Expectation–Maximization (EM) clustering technique to categorize the S&P BSE 500 companies into groups based on similar financial performance indicators. Subsequently, we concentrate on predicting the returns of the assets chosen through clustering. To do this, a deep neural-network based learning method called Neural Basis Expansion Analysis for Interpretable Time Series (N-BEATS) is employed. The portfolio optimization problem considered in this paper takes numerous objectives into account, namely variance, skewness, kurtosis, and entropy. Since, in practice, any individual investor or corporation has a limited investment budget and a target minimum return on investment, we also incorporate constraints relating to mean return, capital allocation, and budget limits to improve the practicality of the problem. Finally, the multi-verse optimization (MVO) approach is utilized to solve the portfolio problem, which can help the company or investor decide how to distribute wealth optimally among different assets at minimum risk.

The contribution of the present work is significant for more than one reason. Firstly, to the best of our knowledge, no prior research has utilized an MVO approach for solving a multi-objective portfolio optimization problem. Secondly, although EM clustering and the N-BEATS forecasting technique are valuable methods, few authors have paid attention to these approaches. Furthermore, the combination of MVO, N-BEATS forecasting, and EM clustering has not been explored previously. Consequently, the present study offers substantial value to researchers, practitioners, and investors by addressing these research gaps.

The rest of the paper is organized as follows. Section 2 presents an in-depth analysis of the literature. Section 3 provides the theoretical background by explaining the methods used in this paper, namely EM clustering, N-BEATS, and MVO. Section 4 explains the proposed problem formulation, notation, and assumptions. Section 5 describes the data used for the problem, the analysis of the data, and the obtained results. Section 6 concludes the paper and outlines the limitations of the work and future research opportunities. Finally, Sect. 7 presents a discussion of the findings.

2 Literature review

Harry Markowitz (1952) introduced the formula for calculating the risk of a portfolio by including the covariance between the returns of risky assets and emphasized diversification of the portfolio to reduce overall risk by including least-correlated, zero-correlated, or even negatively correlated assets. Diversification is a portfolio allocation technique that seeks to reduce idiosyncratic risk; a perfect positive correlation between assets in a portfolio raises the portfolio's standard deviation, or risk. Portfolios can be diversified in numerous ways, such as across industries, asset classes, and markets (i.e., countries).

The unsupervised machine learning technique called "clustering" also helps in diversifying the portfolio. Cluster analysis is a tool for grouping objects that share common features; investors use it to create a subsystem trading strategy that assists them in building a diverse portfolio by picking stocks from different clusters. If implemented correctly, the individual clusters will have little association with one another. Investors then obtain all the benefits of diversification: decreased downside losses, preservation of capital, and the opportunity to make riskier transactions without increasing overall risk. Diversification is a key tenet of investment, and clustering is simply one method of attaining it. In recent times, authors and researchers have shown growing interest in using clustering in their studies (Long et al. 2014; Cheong et al. 2017; Rezani et al. 2020; Sehgal and Jagadesh 2023; Sass and Thös 2021; Wang and Aste 2023; Menvouta et al. 2023). K-means is one of the most widely used clustering techniques: it is easy to understand, computationally efficient, and works well when clusters are spherical and of similar size. Recent studies employing K-means clustering include Aithal et al. (2023), Navarro et al. (2023), Wu et al. (2022), Cheong et al. (2017), and Nanda et al. (2010). However, K-means is sensitive to the initial placement of centroids and might converge to suboptimal solutions in some cases. EM clustering, an extension of K-means, is a more general framework that works well with data distributions that are not necessarily spherical or of equal size and is often used when the clusters overlap or have complex shapes. EM clustering employs a probabilistic approach based on the expectation–maximization algorithm; it is more flexible than K-means, as it allows for more complex cluster shapes and sizes, and it is more robust to the choice of initial parameters due to its probabilistic nature. However, it can be computationally more intensive and might require careful initialization of parameters to converge to a good solution. A comparison of the two techniques is given by Jung et al. (2014) and Moghadaszadeh and Shokrzadeh (2018). Ng and Chin Khor (2014) built a plantation stock portfolio for the Bursa Malaysia index using the EM clustering technique.

To address the univariate time series forecasting problem using deep learning, Oreshkin et al. (2019) introduced a deep neural architecture, N-BEATS, which incorporates backward and forward residual connections as well as a very deep stack of fully connected layers. Two N-BEATS configurations demonstrated state-of-the-art performance on the M3, M4, and TOURISM competition datasets. A few of the studies available on N-BEATS are Oreshkin et al. (2021), Sbrana and Lima De Castro (2023), Ma et al. (2023), and Kaja et al. (2021). The only available research in which N-BEATS forecasting is used in the field of finance is by Singhal et al. (2022), who describe a technique for improving stock market index forecasting that blends wavelet processing with the deep learning architecture N-BEATS. This leaves a research gap in the field.

Mirjalili et al. (2016) proposed a nature-inspired algorithm called the multi-verse optimizer (MVO). The main inspirations for this algorithm come from three cosmological concepts: white holes, black holes, and wormholes, which are modelled mathematically to perform exploration, exploitation, and local search, respectively. The multi-objective multi-verse optimizer (MOMVO) is a multi-objective variant of MVO suggested by Mirjalili et al. (2017). Benmessahel et al. (2020) introduced the competitive multiverse optimizer (CMVO), a population-based optimization approach that, although framed differently, is fundamentally based on MVO. Abualigah (2020) reviewed the existing literature on MVO and presented a comprehensive survey of the work. None of these studies employed MVO to solve the portfolio optimization problem.

The mean–variance portfolio optimization model of Markowitz assumes that asset returns follow a normal distribution, which might not hold, as shown by many researchers (Malek et al. 2009; Z. Zhu et al. 2020; Saranya and Prasanna 2014). When the normality assumption is violated, the first two moments, mean and variance, are inadequate for finding an optimal portfolio. Higher-order moments like skewness and kurtosis also need proper attention in selecting an optimal portfolio of assets, as has been shown in many studies (Nguyen 2016; Abdelaziz and Chibane 2023; Sihem and Slaheddine 2014; Mirlohi et al. 2021). The introduction of higher-order moments can help in identifying assets with low correlation to the traditional risk factors, thus enhancing the diversification benefits of the portfolio (Barkhagen et al. 2023; Naqvi et al. 2017; Khan et al. 2020). Entropy is another measure that ensures portfolio diversification; it helps diversify the portfolio, and hence improve its performance, and has been used in many studies along with higher moments (Gupta et al. 2019; Gonçalves et al. 2022; Nabizadeh and Behzad 2018; Batra and Taneja 2022; Pourrafiee et al. 2020; Ji et al. 2017). Zhou et al. (2013) examined the concepts and principles of entropy as well as their applications in finance, particularly portfolio selection and asset pricing.

By incorporating variance, skewness, kurtosis, and entropy into the objective function, the portfolio optimization model becomes a multi-objective portfolio selection model with the conflicting objectives of maximizing skewness and entropy while minimizing variance and kurtosis. Many authors have used goal programming to solve related problems (Ashfaq et al. 2021; Siew et al. 2021; Aksaraylı and Pala 2018), while others have used metaheuristics (Li et al. 2023; Chen and Zhou 2018). Milhomem and Dantas (2020) conducted a thorough examination of the exact and heuristic approaches, software and programming languages, restrictions, and forms of analysis (technical and fundamental) employed in solving the portfolio optimization problem.

2.1 Motivation and contribution

An in-depth exploration of the literature helped us identify the research gaps, and the present study was undertaken to address them. The main contribution of the present work lies in employing MVO for the first time in the present context. Furthermore, the integration of the EM clustering technique and the N-BEATS forecasting method with the multi-verse portfolio optimization problem presents a novel approach. Variance, skewness, kurtosis, and Gini–Simpson entropy are incorporated into the objective function for the analysis. The multi-objective problem encompasses a combination of objectives and constraints that has not been previously addressed, making it useful for complex portfolio allocation situations. Additionally, the utilization of the clustering and forecasting techniques remains relatively unexplored by most researchers in this field. Our findings provide researchers and practitioners with valuable insights into how different combinations of objectives can impact portfolio performance (Table 1).

Table 1 Feature comparison of present study with existing studies in the literature

3 Methods description

3.1 Expectation maximization (EM) clustering method

EM is a general iterative optimization algorithm used to estimate the parameters of statistical models, particularly in situations involving missing or hidden data, as explained by Do and Batzoglou (2008). When applied to Gaussian mixture models (GMMs), EM yields a probabilistic approach to clustering.

The EM algorithm is often associated with Sir Ronald A. Fisher and was developed further by other statisticians and researchers. Its application to clustering and Gaussian mixture models can be attributed to many contributors, including Dempster et al. (1977), who formalized the algorithm and its application to statistical modelling in their paper "Maximum Likelihood from Incomplete Data via the EM Algorithm."

In the context of clustering, the EM algorithm for Gaussian mixture models iteratively updates the estimates of the mixture model's parameters by alternating between two steps:

1. Expectation step (E-step): for each data point, the algorithm calculates the probabilities of belonging to each cluster based on the current estimates of the cluster parameters. These probabilities represent the "expectation" of the hidden or missing cluster assignments.

2. Maximization step (M-step): the algorithm updates the parameters (means, variances, and mixing proportions) of the Gaussian distributions so as to maximize the likelihood of the observed data given the current cluster assignments.

By iteratively repeating these steps, the algorithm aims to find a set of parameters that maximize the likelihood of the observed data. This process helps in estimating the underlying cluster structure of the data.
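
As a concrete illustration of these two steps, the following is a minimal NumPy sketch of EM for a Gaussian mixture with diagonal covariances; the initialization scheme, fixed iteration count, and variable names are illustrative and are not the implementation used later in this study.

```python
import numpy as np

def em_gmm(X, k, n_iter=100, seed=0):
    """Minimal EM for a Gaussian mixture with diagonal covariances (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, k, replace=False)]       # initialize means at random data points
    variances = np.full((k, d), X.var(axis=0))       # initial per-dimension variances
    weights = np.full(k, 1.0 / k)                    # mixing proportions

    for _ in range(n_iter):
        # E-step: responsibility of each cluster for each point (log-space for stability)
        log_p = np.stack([
            np.log(weights[j])
            - 0.5 * np.sum((X - means[j]) ** 2 / variances[j]
                           + np.log(2 * np.pi * variances[j]), axis=1)
            for j in range(k)
        ], axis=1)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: update means, variances, and mixing proportions
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        variances = np.stack([
            (resp[:, j, None] * (X - means[j]) ** 2).sum(axis=0) / nk[j]
            for j in range(k)
        ]) + 1e-6                                    # guard against zero variance

    return resp.argmax(axis=1), means, variances, weights
```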

In summary, while the EM algorithm itself is not attributed to a single individual, its application to clustering, particularly Gaussian mixture models, has been developed by a combination of researchers in the fields of statistics and machine learning.

3.2 N-BEATS (neural basis expansion analysis for interpretable time series)

Oreshkin et al. (2019) proposed N-BEATS, which is a deep neural network based on backward and forward residual links as well as a very deep stack of fully connected layers. It is a univariate model. The architecture (shown in Fig. 1) of the model is founded on a few fundamental ideas.

  • The foundation framework should be simple, general, and descriptive (deep).

  • The design should not rely on feature engineering or input scaling that is time series-specific (like trend and seasonality).

  • For investigating interpretability, the architecture should be extendable so that its outputs can be easily interpreted by a human.

Fig. 1 N-BEATS architecture (adapted from Oreshkin et al. 2019)

The data is fed into the model as a lookback period. The lookback period is the backcast horizon, which is used to make predictions on the forecast horizon. If the length of the forecast horizon is H, then the length of the backcast horizon typically ranges from 2H to 7H.

The model is divided into a collection of blocks and stacks.

Block A block is simply four fully connected (FC) layers that give rise to two forks: the first attempts to reconstruct the backcast (lookback) input, whereas the second attempts to forecast the horizon. The FC layers produce the coefficients \({\theta }^{b}\) and \({\theta }^{f}\), which are expansion coefficients; \({g}^{b}\) and \({g}^{f}\) are the basis vectors. A linear combination of the coefficients and basis vectors then suffices to generate a prediction.

Stack A stack is made up of multiple basic blocks organized following the double residual stacking principle: each block's backcast is subtracted from its input and its partial forecast is added to the running forecast, hence the phrase double residual stacking.

Multiple linked blocks yield better results; the subsequent blocks attempt to forecast the missing part of their predecessors, and the outputs are finally summed. These blocks form a stack, and the sum of numerous stacks yields the final output.

At this point, the model is in its generic form. The basis expansion function, represented as \(g\) in the illustration, is trainable, and the neural network learns a problem-specific function to achieve the best outcomes.

To make the model interpretable, the authors incorporated trend and seasonality into the model in the form of polynomial and Fourier bases, respectively. As a result, in the interpretable version of the architecture, the model contains only two stacks: one for predicting a trend component and the other for forecasting a seasonal component. The predictions are then pooled to generate the final output.

The first block receives the actual input, a lookback window of length \(n*H\). The subsequent blocks receive the backcast residual of the prior block as their input.

For brevity, the mathematics of the model is described for the \({k}^{th}\) block.

Suppose the \({k}^{th}\) block receives input \({x}_{k}\) and produces two outputs, \({\widehat{x}}_{k}\) and \({\widehat{y}}_{k}\), which are the backcast and forecast of the \({k}^{th}\) block. The subsequent \({(k+1)}^{th}\) block receives the backcast residual of block \(k\) as its input, i.e., \({x}_{k+1} = {x}_{k} - {\widehat{x}}_{k}\).

The size of the input matrix is determined by the batch size and the backcast horizon. The input of each block passes through a stack of four fully connected layers with ReLU (rectified linear unit) activations, which produces the backward and forward expansion coefficients \({\theta }_{k}^{b}\) and \({\theta }_{k}^{f}\), respectively. These coefficients are then combined linearly with the basis functions \({g}_{k}^{b}\) and \({g}_{k}^{f}\), respectively, to produce the backcast and forecast. A single block is enough to make a prediction, but subsequent blocks are added to improve the result.

The operation of first part of \({k}^{th}\) block is described below:

$${h}_{k,1}=\mathrm{FC}_{k,1}({x}_{k}),\quad {h}_{k,2}=\mathrm{FC}_{k,2}({h}_{k,1}),\quad {h}_{k,3}=\mathrm{FC}_{k,3}({h}_{k,2}),\quad {h}_{k,4}=\mathrm{FC}_{k,4}({h}_{k,3})$$

$${\theta }_{k}^{b}=\mathrm{LINEAR}_{k}^{b}({h}_{k,4}),\quad {\theta }_{k}^{f}=\mathrm{LINEAR}_{k}^{f}({h}_{k,4})$$

The second part of the \({k}^{th}\) block projects the expansion coefficients \({\theta }_{k}^{b}\) and \({\theta }_{k}^{f}\) onto the basis vectors, giving \({\widehat{x}}_{k} = {g}_{k}^{b}({\theta }_{k}^{b})\) and \({\widehat{y}}_{k} = {g}_{k}^{f}({\theta }_{k}^{f})\). This operation can generally be described as follows:

$${\widehat{x}}_{k} = \sum_{i=1}^{\dim({\theta }_{k}^{b})}{\theta }_{k,i}^{b}{v}_{i}^{b}, \qquad {\widehat{y}}_{k} = \sum_{i=1}^{\dim({\theta }_{k}^{f})}{\theta }_{k,i}^{f}{v}_{i}^{f}$$

where \({v}_{i}^{b}\) and \({v}_{i}^{f}\) are the backcast and forecast basis vectors.

As mentioned above, the N-BEATS model has two configurations: one generic and one interpretable. The generic architecture does not encode any time series-specific structure; in it, \({g}_{k}^{b}\) and \({g}_{k}^{f}\) are linear projections of the output of the preceding layer. To make the model more interpretable, trend and seasonality components are introduced.

Stack-level indexing, which was omitted in the generic model, is introduced in the interpretable one. For example, \({\widehat{y}}_{s,k}\) denotes the partial forecast of the \({k}^{th}\) block in stack \(s\).

Trend model: A common feature of a trend is that it is almost always a monotonic function, or at least a slowly changing one. To replicate this behavior, \({g}_{s,k}^{b}\) and \({g}_{s,k}^{f}\) are constrained to be polynomials of small degree \(p\), i.e., functions that vary slowly over the prediction window:

$${\widehat{y}}_{s,k}=\sum_{i=1}^{p}{\theta }_{s,k,i}^{f}{t}^{i}$$

Here, \(t={\left[0,1,2,\dots ,H-2,H-1\right]}^{T}/H\) is the time vector.
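
To make the block equations above concrete, a minimal PyTorch sketch of a single generic N-BEATS block is given below. The layer width, the dimension of the expansion coefficients, and the use of learned linear bases are illustrative assumptions, not the configuration used in this study or in the original paper.

```python
import torch
import torch.nn as nn

class GenericNBeatsBlock(nn.Module):
    """One generic N-BEATS block: four FC layers -> theta_b, theta_f -> backcast and forecast."""

    def __init__(self, backcast_len, forecast_len, hidden=256, theta_dim=32):
        super().__init__()
        self.fc_stack = nn.Sequential(
            nn.Linear(backcast_len, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.theta_b = nn.Linear(hidden, theta_dim)    # backward expansion coefficients
        self.theta_f = nn.Linear(hidden, theta_dim)    # forward expansion coefficients
        self.g_b = nn.Linear(theta_dim, backcast_len)  # backcast basis (generic: learned linear map)
        self.g_f = nn.Linear(theta_dim, forecast_len)  # forecast basis (generic: learned linear map)

    def forward(self, x):
        h = self.fc_stack(x)                           # h_{k,1..4}
        backcast = self.g_b(self.theta_b(h))           # x_hat_k
        forecast = self.g_f(self.theta_f(h))           # y_hat_k
        return backcast, forecast

# Double residual stacking across blocks (sketch):
# residual, total_forecast = x, 0
# for block in blocks:
#     backcast, forecast = block(residual)
#     residual = residual - backcast                   # input of the next block
#     total_forecast = total_forecast + forecast       # partial forecasts are summed
```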

3.3 Multi-verse optimizer

MVO is a population-based, nature-inspired metaheuristic algorithm inspired by the multiverse theory. Mirjalili et al. (2016) proposed this method for solving numerical optimization problems. The MVO algorithm is based on principles of physics. In the multiverse theory, multiple universes interact and may even collide, and each universe, according to MVO, has its own set of physical principles. The three fundamental constituents of multiverse theory are white holes, black holes, and wormholes. The big bang could be regarded as a white hole and possibly the key component in the development of a universe. Black holes attract everything, including light beams, due to their immense gravitational attraction. Wormholes are holes in the cosmos that connect different portions of it. They serve as time and space travel tunnels in the multiverse approach, allowing objects to travel between any two corners of a universe, or even from one universe to another, in an instant.

Key principles of the MVO optimization process:

  • As the inflation rate (fitness value) rises, so does the likelihood of having a white hole, whereas the likelihood of having a black hole decreases.

  • Objects are more likely to pass through white holes in universes with a higher inflation rate than through black holes with a lower inflation rate.

  • Regardless of the inflation rate, objects in all universes may transfer at random to the best universe via wormholes.

Each solution is a universe, and each variable in that universe is an object. Furthermore, an inflation rate is applied to each solution that is proportional to the fitness function value associated with the solution.

To describe the mathematical model of the white- and black-hole tunnels as well as the transportation of objects across universes, a roulette wheel mechanism is used. The roulette wheel process selects, from among all universes, the universe that acts as the white hole. At each iteration, the universes are ranked by their fitness values and one is chosen via roulette wheel selection.

According to multiverse theory, there are several universes:

\(U = \left[\begin{array}{cccc}{y}_{1}^{1}& {y}_{1}^{2}& \cdots & {y}_{1}^{m}\\ {y}_{2}^{1}& {y}_{2}^{2}& \cdots & {y}_{2}^{m}\\ \vdots & \vdots & \ddots & \vdots \\ {y}_{n}^{1}& {y}_{n}^{2}& \cdots & {y}_{n}^{m}\end{array}\right]\) where \(m\) is the number of objects and \(n\) is the number of universes.

Mathematical model for the selection of universe using roulette wheel selection process:

$${y}_{i}^{j} = \left\{\begin{array}{c}{y}_{k}^{j}, {R}_{1}< NI({U}_{i})\\ {y}_{i}^{j}, {R}_{1}\ge NI({U}_{i})\end{array}\right.$$

where \({y}_{k}^{j}\) is the \({j}^{th}\) parameter of the \({k}^{th}\) universe selected by the roulette wheel selection process, \({y}_{i}^{j}\) is the \({j}^{th}\) parameter of the \({i}^{th}\) universe, \({R}_{1}\in \left[0,1\right]\) is a random value, \({U}_{i}\) is the \({i}^{th}\) universe, and \(NI({U}_{i})\) is the normalized inflation rate of the \({i}^{th}\) universe.

Assume that each universe has wormholes to ensure the random interchange of objects via space. They shift objects at random without regard for their inflation rates. Assume that wormhole tunnels are always formed between a universe and the optimal universe (to provide local changes to each universe). Below is the mathematical formulation of this mechanism:

$${y}_{i}^{j}=\left\{\begin{array}{ll}\left\{\begin{array}{ll}{Y}_{j}+TDR\times \left(\left(U{B}_{j}-L{B}_{j}\right)\times {R}_{4}+L{B}_{j}\right), & {R}_{3}<0.5\\ {Y}_{j}-TDR\times \left(\left(U{B}_{j}-L{B}_{j}\right)\times {R}_{4}+L{B}_{j}\right), & {R}_{3}\ge 0.5\end{array}\right. & {R}_{2}<WEP\\ {y}_{i}^{j}, & {R}_{2}\ge WEP\end{array}\right.$$

where \({Y}_{j}\) is the \({j}^{th}\) parameter of the best universe obtained so far, \({y}_{i}^{j}\) is the \({j}^{th}\) parameter of the \({i}^{th}\) universe, \(U{B}_{j}\) and \(L{B}_{j}\) are the upper and lower bounds of the \({j}^{th}\) variable, and \({R}_{2}, {R}_{3}, {R}_{4}\in \left[0,1\right]\) are random values.

WEP (wormhole existence probability) and TDR (travelling distance rate) are coefficients. WEP defines the likelihood of the existence of wormholes in the universes and increases linearly over the iterations. TDR determines the distance rate (variation) by which an object can be transferred by a wormhole around the best universe obtained thus far. Unlike WEP, TDR decreases over the iterations to allow for more precise exploitation and local search around the best-obtained universe (Fig. 2). They are formulated below:

Fig. 2 Pseudocode for the multi-verse optimizer (adapted from Mirjalili et al. 2016)

$$WEP=\min+e\times \left(\frac{\max -\min}{E}\right),\quad e=\text{current iteration},\ E=\text{maximum number of iterations}$$
$$TDR = 1 - \frac{{e}^{1/p}}{{E}^{1/p}},\quad p=\text{exploitation accuracy}$$
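
The mechanics described above (the roulette-wheel white/black-hole exchange and the wormhole move governed by WEP and TDR) can be sketched in a few lines of NumPy. The following is a minimal minimization-oriented sketch with scalar bounds and illustrative parameter values, not the exact implementation used in this study.

```python
import numpy as np

def mvo(objective, n_universes, dim, lb, ub, max_iter=200,
        wep_min=0.2, wep_max=1.0, p=6, seed=0):
    """Multi-verse optimizer for minimization (sketch; lb and ub are scalar bounds)."""
    rng = np.random.default_rng(seed)
    U = rng.uniform(lb, ub, size=(n_universes, dim))      # each row is a universe (solution)
    best_x, best_f = None, np.inf

    for e in range(1, max_iter + 1):
        fitness = np.array([objective(u) for u in U])
        i_best = fitness.argmin()
        if fitness[i_best] < best_f:
            best_f, best_x = fitness[i_best], U[i_best].copy()

        # Normalized inflation rate: 1 for the best universe, 0 for the worst
        ni = (fitness.max() - fitness) / (np.ptp(fitness) + 1e-12)
        probs = ni / ni.sum() if ni.sum() > 0 else np.full(n_universes, 1 / n_universes)

        wep = wep_min + e * (wep_max - wep_min) / max_iter  # wormhole existence probability
        tdr = 1 - (e ** (1 / p)) / (max_iter ** (1 / p))    # travelling distance rate

        for i in range(n_universes):
            for j in range(dim):
                # White/black-hole exchange: copy the value from a roulette-selected universe
                if rng.random() < ni[i]:
                    k = rng.choice(n_universes, p=probs)
                    U[i, j] = U[k, j]
                # Wormhole: random local move around the best universe found so far
                if rng.random() < wep:
                    step = tdr * ((ub - lb) * rng.random() + lb)
                    U[i, j] = best_x[j] + step if rng.random() < 0.5 else best_x[j] - step
        U = np.clip(U, lb, ub)

    return best_x, best_f
```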

3.4 Higher moments

Portfolio optimization using higher moments refers to the process of constructing a portfolio by considering not just the expected return and risk (the first and second moments of the return distribution), but also higher-order moments such as skewness and kurtosis.

The inclusion of higher moments in portfolio optimization can lead to more diversified portfolios that recognize the dangers of asymmetric returns and fat-tail risk.

Let \({R}_{p}\) be a random variable representing the portfolio return and \(R=\left({R}_{1},{R}_{2},\dots ,{R}_{n}\right)\) be the return vector of \(n\) assets, where \({R}_{i}\) is the rate of return of the \({i}^{th}\) asset.

Further, let \(X=\left({x}_{1},{x}_{2},\dots ,{x}_{n}\right)\) be the weight vector, where \({x}_{i}\) represents the proportion of investment in the \({i}^{th}\) asset.

Then the first four moments (Kemalbay et al. 2011; Aksaraylı and Pala 2018) of the portfolio return \({R}_{p}\) can be calculated as follows:

$$Mean=E\left({R}_{p}\right)=E\left[{X}^{T}R\right]=\sum_{i=1}^{n}{x}_{i}{\mu }_{i}= {X}^{T}\mu = {X}^{T}{M}_{1}$$
$$Variance=V\left({R}_{p}\right)=E{\left[{X}^{T}R - E\left[{X}^{T}R\right]\right]}^{2}=\sum_{i=1}^{n}\sum_{j=1}^{n}{x}_{i}{x}_{j}{\sigma }_{ij}= {X}^{T}VX = {X}^{T}{M}_{2}X$$
$$Skewness S\left({R}_{p}\right)=E{\left[{X}^{T}R - E\left[{X}^{T}R\right]\right]}^{3}=\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}{x}_{i}{x}_{j}{x}_{k}{s}_{ijk}= {X}^{T}S(X\otimes X) = {X}^{T}{M}_{3}$$
$$Kurtosis K\left({R}_{p}\right)= E{\left[{X}^{T}R - E\left[{X}^{T}R\right]\right]}^{4}= \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{l=1}^{n}{x}_{i}{x}_{j}{x}_{k}{x}_{l}{k}_{ijkl}{=X}^{T}K\left(X\otimes X\otimes X\right)={X}^{T}{M}_{4}$$

Here, \(\mu = E[R] = \left({\mu }_{1},{\mu }_{2},\dots ,{\mu }_{n}\right)\) is the vector of mean returns of the assets,

\(V=E{\left[R-E\left[R\right]\right]}^{2}\) is the \(n\times n\) variance–covariance matrix with elements \({\sigma }_{ij}\), \(\forall (i,j)\in \{1,\dots ,n\}\), where \({\sigma }_{ij}=E\left[\left({R}_{i}-E\left[{R}_{i}\right]\right)\left({R}_{j}-E\left[{R}_{j}\right]\right)\right]\),

\(S=E{\left[R-E\left[R\right]\right]}^{3}\) is the \(n\times {n}^{2}\) skewness–coskewness matrix with elements \({s}_{ijk}\), \(\forall (i,j,k)\in \{1,\dots ,n\}\), where \({s}_{ijk}=E\left[\left({R}_{i}-E\left[{R}_{i}\right]\right)\left({R}_{j}-E\left[{R}_{j}\right]\right)\left({R}_{k}-E\left[{R}_{k}\right]\right)\right]\),

\(K=E{\left[R-E\left[R\right]\right]}^{4}\) is the \(n\times {n}^{3}\) kurtosis–cokurtosis matrix with elements \({k}_{ijkl}\), \(\forall (i,j,k,l)\in \{1,\dots ,n\}\), where \({k}_{ijkl}=E\left[\left({R}_{i}-E\left[{R}_{i}\right]\right)\left({R}_{j}-E\left[{R}_{j}\right]\right)\left({R}_{k}-E\left[{R}_{k}\right]\right)\left({R}_{l}-E\left[{R}_{l}\right]\right)\right]\).

\({M}_{1},{M}_{2},{M}_{3}\), and \({M}_{4}\) denote these four moment matrices, and \(\otimes \) denotes the Kronecker product.

3.5 Gini-Simpson (GS) entropy

The following expression describes GS entropy, as used in Aksaraylı and Pala (2018):

$$GS\ \text{entropy} = 1 - \sum_{i=1}^{n}{x}_{i}^{2} = 1 - {X}^{T}X$$
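
The following NumPy sketch shows one way to estimate the co-moment matrices of Sect. 3.4 from a sample of returns and to evaluate the four portfolio moments (via Kronecker products) together with the GS entropy of the weights. It uses plain sample-moment estimators, and the function name is illustrative.

```python
import numpy as np

def portfolio_moments(returns, x):
    """returns: (T, n) matrix of asset returns over T periods; x: (n,) weight vector."""
    T, n = returns.shape
    mu = returns.mean(axis=0)                     # M1: mean return vector
    D = returns - mu                              # deviations from the mean
    V = D.T @ D / T                               # M2: n x n variance-covariance matrix
    # M3: n x n^2 coskewness matrix, M4: n x n^3 cokurtosis matrix (sample estimators)
    S = sum(np.outer(d, np.kron(d, d)) for d in D) / T
    K = sum(np.outer(d, np.kron(d, np.kron(d, d))) for d in D) / T

    mean = x @ mu                                 # X^T M1
    variance = x @ V @ x                          # X^T M2 X
    skewness = x @ S @ np.kron(x, x)              # X^T M3 (X kron X)
    kurtosis = x @ K @ np.kron(x, np.kron(x, x))  # X^T M4 (X kron X kron X)
    gs_entropy = 1 - x @ x                        # Gini-Simpson entropy of the weights
    return mean, variance, skewness, kurtosis, gs_entropy
```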

4 Proposed (Variance–Skewness–Kurtosis–Entropy) VSKE optimization model

This section discusses the optimization model used to determine the optimal proportion of investment in each asset of the portfolio, together with the assumptions and notation employed in the problem.

4.1 Assumptions and notations

It is assumed that the investor is risk averse and will be interested in investing in the efficient-frontier portfolio of assets, which minimizes risk at a given level of return. The rate of return of each asset follows a probability distribution, and the investor seeks to maximize the utility of wealth. Further, no taxes, commissions, or transaction fees are involved.

The following notation is used in the subsequent analysis:

\({w}_{i}\): weights assigned to the different objectives in the objective function,

\({x}_{i}\): proportion of investment in the \({i}^{th}\) asset,

\({k}_{ijkl}\) and \({s}_{ijk}\): elements of the cokurtosis and coskewness matrices, \(i, j, k, l = 1, 2,\dots ,n\) (\(n\) is the number of assets),

\(MinRet:\) minimum value of return aspired by investor,

\(LB \& UB :\) lower and upper bounds on the investment proportion of assets.

The following assumptions have been made in the proposed optimization model:

i. An investor allocates his/her wealth among n assets offering random rates of return.

ii. The minimum target return for the investment is set at 5%.

iii. The available capital should be completely invested.

iv. The capital invested in each asset is assumed to be bounded between a lower and an upper bound.

v. Predicted returns are not normally distributed; therefore, skewness and kurtosis are utilized in the analysis.

4.2 Constraints of the model

4.2.1 Constraint on return: the portfolio return should be no less than a specified minimum, i.e.,

$$\sum_{i=1}^{n}{x}_{i}{\mu }_{i} \ge MinRet$$

4.2.2 Capital budget constraint: the capital should be completely invested, i.e.,

$$\sum_{i=1}^{n}{x}_{i}=1$$

4.2.3 Bound constraint: bounds on the capital invested in each asset, i.e.,

$$LB \le x_{i} \le UB$$

where LB and UB are the minimum and maximum proportions of investment in each asset.

4.3 Problem formulation

The multi-objective nonlinear optimization problem VSKE is formulated as follows:

$$Min\; {w}_{1}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}\sum_{l=1}^{n}{x}_{i}{x}_{j}{x}_{k}{x}_{l}{k}_{ijkl}-{w}_{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n}{x}_{i}{x}_{j}{x}_{k}{s}_{ijk}-{w}_{3}\left(1 - \sum_{i=1}^{n}{x}_{i}^{2}\right) +{w}_{4}\sum_{i=1}^{n}\sum_{j=1}^{n}{x}_{i}{x}_{j}{\sigma }_{ij}$$
$$s.t.$$
$$\sum_{i=1}^{n}{x}_{i}{\mu }_{i} \ge MinRet$$
$$\sum_{i=1}^{n}{x}_{i}=1$$
$$LB\le {x}_{i}\le UB$$

The proposed optimization problem for optimal asset allocation is a multi-objective nonlinear polynomial programming problem, and obtaining an exact optimal solution is a difficult task. A metaheuristic method, MVO, is therefore utilized to solve it. The reason for applying MVO is twofold: the method has not been applied previously in the present context, and it can cater to a wide range of complex problems.
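
Assuming a penalty-based treatment of the constraints (a common way to hand a constrained problem to a metaheuristic; the authors' exact constraint-handling scheme is not stated here), the VSKE objective can be wrapped into a single fitness function and minimized with an MVO routine such as the sketch in Sect. 3.3. The entropy term is subtracted so that it is maximized, in line with the stated objectives; all parameter values below are illustrative.

```python
import numpy as np

def vske_fitness(x, mu, V, S, K, w=(0.25, 0.25, 0.25, 0.25),
                 min_ret=0.05, lb=0.0, ub=1.0, penalty=1e3):
    """Weighted VSKE objective plus penalties for the return, budget, and bound constraints."""
    x = np.asarray(x)
    variance = x @ V @ x
    skewness = x @ S @ np.kron(x, x)
    kurtosis = x @ K @ np.kron(x, np.kron(x, x))
    entropy = 1 - x @ x                           # Gini-Simpson entropy

    w1, w2, w3, w4 = w
    # Entropy (like skewness) is maximized, hence subtracted in the minimization objective
    objective = w1 * kurtosis - w2 * skewness - w3 * entropy + w4 * variance

    # Constraint violations: return floor, full investment, and box bounds
    viol = max(0.0, min_ret - x @ mu)
    viol += abs(x.sum() - 1.0)
    viol += np.maximum(lb - x, 0).sum() + np.maximum(x - ub, 0).sum()
    return objective + penalty * viol

# Example call with the MVO sketch of Sect. 3.3 (mu, V, S, K estimated as in Sects. 3.4-3.5):
# best_x, best_f = mvo(lambda x: vske_fitness(x, mu, V, S, K),
#                      n_universes=50, dim=len(mu), lb=0.0, ub=1.0)
```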

5 Data analysis and results

In this study, firstly, 10 years (January 2011–January 2022) of data on fundamental indicators such as adjusted opening price, adjusted high price, adjusted low price, adjusted closing price, market capitalization, total returns, earnings per share (EPS), price-to-earnings (P/E) ratio, price-to-book (P/B) ratio, book value per share (BVPS), and turnover of S&P BSE 500 index companies were collected from the ProwessIQ financial database. These features cover a variety of financial indicators, providing a comprehensive understanding of the data's characteristics. After normalising the data, we performed principal component analysis (PCA), which reduced the data to 7 components with a cumulative explained variance of 0.99678. This step is particularly beneficial for mitigating the curse of dimensionality and improving the stability of clustering algorithms. The expectation–maximization technique was then applied to cluster the data, grouping the 500 companies according to the similarity and dissimilarity of the 7 components obtained through PCA. A total of 10 clusters were formed, with 70 companies in the first cluster, 32 in the second, and 88, 30, 80, 33, 46, 35, 52, and 34 in the remaining clusters. We assessed the quality of our clustering solution using the silhouette score, which measures the separation between clusters and their compactness. A score of 0.243 for 10 clusters indicates a reasonable level of separation and compactness among the clusters. The silhouette scores for different numbers of clusters are shown below (Fig. 3).

Fig. 3 Silhouette score

We have effectively managed the sensitivity of EM clustering to initialization by implementing controlled initialization strategies, applying dimensionality reduction through PCA, and evaluating clustering quality using the silhouette score. This methodology lends robustness and reliability to our clustering results, enhancing their practical applicability and interpretation.
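
A minimal scikit-learn sketch of this preprocessing and clustering pipeline (standardization, PCA, EM clustering via a Gaussian mixture, and silhouette evaluation) is shown below; the file name, column layout, number of restarts, and random seed are illustrative assumptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

# One row per company, columns = fundamental indicators (hypothetical file name and layout)
features = pd.read_csv("bse500_fundamentals.csv", index_col="company")

X = StandardScaler().fit_transform(features)            # normalize the indicators
X_pca = PCA(n_components=7).fit_transform(X)            # 7 components retained in the study

gmm = GaussianMixture(n_components=10, covariance_type="full",
                      n_init=5, random_state=42)        # EM clustering with controlled restarts
labels = gmm.fit_predict(X_pca)

print("cluster sizes:", pd.Series(labels).value_counts().sort_index().tolist())
print("silhouette score:", round(silhouette_score(X_pca, labels), 3))
```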

The composite performance measure, the Sharpe Ratio, is used to select the best-performing asset from each cluster. To further increase the diversification, companies are selected from each cluster in such a way that all selected assets belong to different sectors such as consumer durables, infrastructure, telecom, finance, real estate, health care, power, oil and gas, industrial, auto, commodities, consumption, and so on (Table 2).

Table 2 Sharpe Ratio and Sector of selected companies

A portfolio of the risky assets of 10 companies was formed, and their five-year (October 2017–October 2022) daily returns were calculated as \((New ACP - Old ACP)/Old ACP\), where the Adjusted Closing Price (ACP) data were collected from the Yahoo Finance database. The N-BEATS method was then applied to predict their returns for the next 60 periods, using 7*1 as the lookback period and 1 as the horizon, while the remaining hyperparameters were taken as in Oreshkin et al. (2019). The mean absolute error and root mean square error obtained when testing the model are 0.02569 and 0.03119, respectively, which signifies good performance of the model (Fig. 4).

Fig. 4 Distribution of predicted return values
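
The return computation and the error metrics reported above can be reproduced along the following lines; the file and column names are illustrative, and the fitted forecaster (N-BEATS in this study) is assumed to supply the predicted values.

```python
import numpy as np
import pandas as pd

# Daily Adjusted Closing Prices for one selected stock (hypothetical file and column names)
acp = pd.read_csv("adjusted_close.csv", index_col="date", parse_dates=True)["acp"]

# Daily return: (New ACP - Old ACP) / Old ACP
returns = acp.pct_change().dropna()

def mae(actual, predicted):
    """Mean absolute error between held-out and predicted returns."""
    return np.mean(np.abs(np.asarray(actual) - np.asarray(predicted)))

def rmse(actual, predicted):
    """Root mean square error between held-out and predicted returns."""
    return np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2))
```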

As asset returns do not always follow a normal distribution, as assumed in the Markowitz mean–variance optimization model, the normality of the predicted returns was checked by applying the Shapiro–Wilk test. The normality assumption was found to be violated by the predicted return data. Therefore, the higher-order moments skewness and kurtosis, in addition to mean and variance, along with Gini–Simpson entropy, were considered for optimization.
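
The descriptive statistics and the normality test reported in Table 3 can be computed per stock along the following lines (a sketch; note that scipy's kurtosis returns excess kurtosis by default, and the 5% significance level is an assumption).

```python
import numpy as np
from scipy.stats import shapiro, skew, kurtosis

def describe_returns(returns, alpha=0.05):
    """Descriptive statistics and Shapiro-Wilk normality test for one return series."""
    stat, p_value = shapiro(returns)
    return {
        "mean": float(np.mean(returns)),
        "variance": float(np.var(returns, ddof=1)),
        "skewness": float(skew(returns)),
        "kurtosis": float(kurtosis(returns)),   # excess kurtosis (Fisher definition)
        "shapiro_W": float(stat),
        "p_value": float(p_value),
        "normal_at_5pct": bool(p_value >= alpha),
    }
```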

The following table presents the descriptive statistics for the 10 stocks selected for analysis.

DLF Limited has the highest mean. Bajaj Holdings and Investment Limited has the lowest variance. Asahi India Glass Limited has the highest skewness, and Hindustan Aeronautics Limited has the lowest kurtosis. The Shapiro–Wilk test results are also provided in the last column. From Table 3, it can be seen that the returns of five of the companies are normally distributed and the other five are not (bold p-values), which justifies including skewness and kurtosis among the objectives to obtain better results.

Table 3 Mean, Variance, Skewness, Kurtosis, Shapiro–Wilk Statistics, and p-value of selected companies

The MVO is now applied to solve the multi-objective optimization problem by setting a minimum return of 5% and varying the weights assigned to the different objectives. The results are tabulated in Table 4.

Table 4 Distribution of weights

Table 4 displays the percentage of investment in the ten selected stocks along with the risk and return of the portfolio. To begin with, we allocated equal weights to all objectives, yielding a 5.21 percent return with a 47.82 percent portfolio risk. We then considered only three objectives at a time by setting the weight of the remaining one to zero, yielding returns of 5.06 percent, 5.14 percent, and 5.04 percent. This shows that when all objectives are taken together, the portfolio achieves the best performance in terms of risk and return.

6 Conclusion

The present work addresses the portfolio allocation problem in the Indian context, and the findings of the study are encouraging. We applied EM clustering to create a well-diversified portfolio and N-BEATS to estimate future returns for further analysis. A multi-objective portfolio optimization problem involving variance, skewness, kurtosis, and GS entropy as objectives, with a minimum mean return of 5% as an additional constraint, was considered. This problem was solved using the MVO metaheuristic technique. The maximum return was obtained when all objectives (kurtosis, skewness, entropy, and variance) were considered and given equal weight, while the worst outcome was obtained when kurtosis was neglected as an objective. The inclusion of higher moments improves the overall quality of the result. With the help of the present study, portfolio managers can fine-tune their analysis of where, and in what proportion, wealth should be invested, and thereby improve their decisions; it provides a systematic quantitative approach. A future study might incorporate additional constraints into the proposed portfolio optimization model by relaxing some of its assumptions. It would also be interesting to experiment with other metaheuristics and variants of MVO. The variability of the results of the forecasting technique is a significant limitation of the work, and the results of the study rely primarily on the availability and accuracy of past data as well as on the parameter tuning of the applied techniques.

7 Discussion

The comprehensive study of a ten-year dataset containing crucial financial indicators forms the basis of our research. PCA has been useful in enhancing the stability and interpretability of the clustering methods by reducing the data dimensionality. Our use of the EM approach yielded 10 well-defined clusters, laying the groundwork for asset selection and diversification. The Sharpe Ratio is used to find the best-performing assets within each cluster, which improves the portfolio's risk–return profile. The N-BEATS approach for return prediction, which is based on deep learning, exhibited solid performance, providing useful insights for investment decisions. Furthermore, incorporating higher-order moments into the multi-objective optimization model acknowledges the non-normal distribution of asset returns, resulting in a more thorough risk assessment. This study provides portfolio managers and investors with a realistic and methodical strategy for managing the difficulties of current financial markets.