1 Introduction

Most real-world problems involve optimizing multiple conflicting objectives. Because the objectives conflict, a single solution that simultaneously attains the least value of every objective generally does not exist, so a compromise has to be made. Since the objective function of such a formulation is a vector, the problem is treated as a vector optimization or multi-objective optimization problem (MOOP) [1]. A MOOP with multiple, conflicting objectives may be combined into a single scalar objective function; this approach is known as the weighted-sum method. It is an a priori method based on the principle of "linear aggregation of functions".

The method is also referred to as a Single Objective Evolutionary Algorithm (SOEA). By definition, the weighted-sum method minimizes a positively weighted convex combination of the objectives:

$$\min \sum_{i=1}^{n} w_{i} f_{i}(x), \quad \text{where } \sum_{i=1}^{n} w_{i} = 1;\ w_{i} > 0\ \forall i$$
(1)
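As a minimal sketch of Equation (1), the Python snippet below builds the scalarized function from a list of objective functions and a weight vector, checking that the weights are strictly positive and sum to one. The two quadratic objectives and the crude grid search are illustrative assumptions, not part of the original formulation.

```python
from typing import Callable, Sequence

Objective = Callable[[float], float]

def weighted_sum(objectives: Sequence[Objective], weights: Sequence[float]) -> Objective:
    """Return x -> sum_i w_i * f_i(x), the scalarized objective of Eq. (1)."""
    if len(objectives) != len(weights):
        raise ValueError("one weight per objective is required")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return lambda x: sum(w * f(x) for w, f in zip(weights, objectives))

def f1(x: float) -> float:
    return x ** 2            # minimized at x = 0

def f2(x: float) -> float:
    return (x - 2.0) ** 2    # minimized at x = 2

F = weighted_sum([f1, f2], [0.25, 0.75])
grid = [i / 1000 for i in range(2001)]   # x in [0, 2] with step 0.001
x_best = min(grid, key=F)                # crude grid search for the minimizer
```

With weights (0.25, 0.75) the scalarized minimizer is x = 1.5, a compromise between the individual minimizers x = 0 (of f1) and x = 2 (of f2); moving weight towards f1 moves the compromise towards 0.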

Minimizing this single-objective function yields an efficient (Pareto-optimal) solution of the original multi-objective problem when all the weights are strictly positive. The process thus scalarizes the conflicting objectives into a single objective function, and various scalarization techniques have been proposed in the past.

Accounting for uncertainty is important when limitations in the data can lead to inaccurate inferences about choices, sensitivities, and other behavioral characteristics. Bayesian analysis, grounded in Bayes' theorem, is a tool for this purpose. Although conceptually clear, it was long hard to apply to real-world problems, mainly because the required posterior quantities are difficult to compute. This difficulty has largely been overcome by iterative simulation techniques, chiefly Markov chain Monte Carlo (MCMC) methods. Freed from these computational constraints, researchers can build more realistic models of user behavior and decision making by combining a hierarchical model with Bayesian estimation. The sub-models are integrated hierarchically through Bayes' theorem, which handles the uncertainty at each level; hence the name Bayesian hierarchical model.

Bayesian learning rests on the simple idea that better decisions can be made by combining new observations with beliefs formed from previous knowledge and experience. It is also useful where frequentist methods cannot be applied. A further feature is that the posterior can be updated iteratively as evidence accumulates, for example when estimating the parameters of a machine learning model. The technique uses Bayes' theorem to obtain the conditional probability of a hypothesis given prior knowledge and observed data. Since most everyday problems involve uncertainty and incrementally acquired knowledge, Bayesian learning is well suited to them: the approach starts from a prior belief and gradually refines it as fresh evidence arrives.
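The iterative updating described above can be illustrated with the simplest conjugate model: a Beta prior on a single success probability combined with binomial data. This model is chosen only for brevity; the model used in this work is introduced in Section 3. Each batch's posterior serves as the prior for the next batch, and the final result coincides with processing all the data at once. The batch counts are hypothetical.

```python
from fractions import Fraction

def update_beta(a: int, b: int, successes: int, failures: int) -> tuple:
    """Conjugate update: Beta(a, b) prior + binomial data -> Beta(a + successes, b + failures)."""
    return a + successes, b + failures

prior = (1, 1)                          # uniform Beta(1, 1) prior (assumed)
batches = [(3, 1), (0, 2), (4, 4)]      # hypothetical (successes, failures) per batch

post = prior
for s, f in batches:
    post = update_beta(*post, s, f)     # the current posterior becomes the next prior

# Processing all the data in one step gives the same posterior.
batch_post = update_beta(*prior, sum(s for s, _ in batches), sum(f for _, f in batches))
posterior_mean = Fraction(post[0], post[0] + post[1])
```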

Bayesian data analysis models data mathematically and assigns credibility to parameter values that are consistent with both the data and previous experience. This incorporation of prior belief and experience gives Bayesian learning an edge over frequentist statistics. The Bayesian technique offers greater flexibility in modeling a system from the available data. It also represents parameter uncertainty explicitly, and there is no need to construct sampling distributions under auxiliary null hypotheses. Frequentist approaches to quantifying uncertainty can be inconsistent and difficult to apply, whereas Bayesian approaches are designed to express uncertainty directly; building confidence intervals for parameters, in particular, is comparatively cumbersome with frequentist techniques. Despite these advantages, the Bayesian approach inherently requires either a sufficiently large set of trials or the assignment of a confidence (a prior) to the hypothesis being established.

Bayesian reasoning reallocates credibility across possibilities as new data are incorporated. The main objective of Bayesian estimation is to obtain the most credible parameter values for the model, and the estimate is an entire distribution of credibility over the space of parameter values, not a single “best” value. The crux of Bayesian estimation is to describe precisely how uncertainty changes when new data are taken into account. Parameters often depend on one another, and an ordering of such dependencies among parameters defines a hierarchical model: it specifies dependencies among parameters in an ordered manner based on their meaning. A salient application is data from individuals nested within groups. A hierarchical model can have parameters for each individual that describe that individual's characteristics, while the distribution of the individual parameters within a group is described by a higher-level distribution with its own parameters, which describe the tendency of the group. The individual-level and group-level parameters are estimated jointly. The hierarchical approach is beneficial because it does not pool all individuals' data together, which would dilute the trends within individuals. In summary, hierarchical models have parameters that describe the data at several levels and link information within and across levels.

The main objective of this work is to provide a flexible Bayesian nonparametric approach that incorporates various levels of uncertainty in determining the weights, whereas existing approaches have focused on improving the solution set with little attention to the appropriate choice of weights. The novelty of the proposed work lies in its coherent treatment of all levels of uncertainty when determining the weights, and thereby in improving the solution set. A further novelty is the technique for generating the weights with the proposed Bayesian methodology, which yields a bona fide Bayesian posterior distribution for the optima and thus quantifies the uncertainty about them. Being formulated in a Bayesian framework, the work incorporates prior knowledge (through a prior distribution) of the relative importance of the conflicting objectives when generating the weights. Unlike existing techniques, the method can readily handle any number of objectives. Because it yields a posterior probability distribution over the weights, stochastically generated weight vectors can be used to obtain points on the Pareto front with little additional computation. Yet another novelty of this work is its ability to work with small samples, since large datasets are often infeasible to collect because of time and cost.

2 Literature survey

Zadeh popularized the weighted-sum technique as a classical approach for solving such problems [2]. As the name suggests, the method scalarizes a set of conflicting objective functions by multiplying each objective function by a predefined weight and summing the results. The ε-constraint method, introduced by Haimes et al. [3], minimizes the most significant objective function \(f_{s}(x)\) while the remaining objectives are constrained not to exceed user-specified bounds (ε). Another popular scalarizing technique is goal attainment [4], in which a goal is stated for each objective function and the process aims to reduce the overall deviation from the goals. The hierarchical approach [5] and the weighted metrics technique [6] are two further techniques for solving such problems. Among these, the weighted-sum method has gained the most popularity because of its simplicity. Although much research has been devoted to algorithms that improve the solution set of multi-objective optimization problems solved with the weighted-sum technique, a comprehensive model that generates the weights from the various sources of uncertainty still seems to be lacking. The weighted-sum (WS) technique, a commonly used scalarizing technique in multi-objective algorithms, has the distinctive advantages of high search efficiency and computational simplicity. Nevertheless, it is frequently criticized for lacking a principled rationale for the selection of weights and for its inability to handle nonconvex problems.
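The weakness on nonconvex problems can be seen with three mutually non-dominated points in a bi-objective minimization problem. The middle point (0.6, 0.6) lies above the straight line joining (0, 1) and (1, 0), so it sits on a nonconvex part of the front; because \(\min(w_1, 1 - w_1) \le 0.5 < 0.6\), no choice of weights makes it the weighted-sum optimum. The points are illustrative values, not data from this work.

```python
import numpy as np

# Three mutually non-dominated points (f1, f2), both objectives minimized (illustrative).
points = np.array([[0.0, 1.0],
                   [0.6, 0.6],   # on a nonconvex part of the front
                   [1.0, 0.0]])

def weighted_sum_choice(w1: float) -> int:
    """Index of the point minimizing w1 * f1 + (1 - w1) * f2."""
    scores = points @ np.array([w1, 1.0 - w1])
    return int(np.argmin(scores))

chosen = {weighted_sum_choice(w1) for w1 in np.linspace(0.01, 0.99, 99)}
```

Only the two extreme points are ever selected, even though (0.6, 0.6) is Pareto optimal.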

Steuer [7] suggested that the weights should systematically reflect the decision-maker's preference for each objective. Das and Dennis [8] offered a graphical explanation of the technique to elucidate some of its drawbacks. The mismatch between the theoretical and the practical interpretation of the weights for the conflicting objectives made weight selection quite inefficient. Various approaches have been suggested for weight selection. Yoon and Hwang [9] suggested a ranking technique in which the objectives are ranked by significance: the most important objective receives the largest weight, and the weights decrease gradually for less important objectives. This is quite similar to the categorization technique, in which the conflicting objectives are grouped according to their degree of importance. Saaty [10] proposed an eigenvalue method for obtaining weights, in which n(n−1)/2 pairwise comparisons between the objective functions are used to build a comparison matrix; the normalized principal eigenvector of this matrix yields the weights. Wierzbicki [11] proposed a method, based on the utopia and aspiration points, for generating weights when the relative importance of the objective functions is vague. Rao and Roy [12] proposed another method for weight determination based on fuzzy set theory. Although various techniques exist for weight determination, the selection of weights alone does not guarantee an acceptable solution; new weights may have to be chosen and the process repeated. Messac [13] therefore suggested that, to model a task precisely, the weights should be functions of the objectives rather than simple constants. According to him, the weights must address both the scaling and the relative preference of the objective functions to reflect preferences appropriately.
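Saaty's eigenvalue approach can be sketched with NumPy as follows. For a perfectly consistent comparison matrix, with entries \(a_{ij} = w_i / w_j\), the principal eigenvalue equals the number of objectives and its normalized eigenvector recovers the weights exactly; the weights 0.5, 0.3 and 0.2 are hypothetical. Real judgement matrices are rarely perfectly consistent, in which case the principal eigenvalue exceeds n.

```python
from typing import Tuple

import numpy as np

def ahp_weights(A: np.ndarray) -> Tuple[float, np.ndarray]:
    """Principal eigenvalue and normalized principal eigenvector of a pairwise comparison matrix."""
    eigvals, eigvecs = np.linalg.eig(A)
    k = int(np.argmax(eigvals.real))        # index of the principal (Perron) eigenvalue
    v = np.abs(eigvecs[:, k].real)           # its eigenvector has entries of one sign
    return float(eigvals[k].real), v / v.sum()

# Hypothetical judgements for three objectives: a_ij = importance of objective i over j.
true_w = np.array([0.5, 0.3, 0.2])
A = true_w[:, None] / true_w[None, :]        # consistent, reciprocal matrix (a_ji = 1 / a_ij)
lam_max, w = ahp_weights(A)
```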

An appropriate choice of weights improves an algorithm's performance. Timothy Ward Athan [14] proposed a quasi-random weighted criteria method that generates weights covering the Pareto set evenly; the method is based on random probability distributions and involves a large number of computations. Gennert and Yuille [15] proposed a nonlinear weight determination algorithm that yields an optimal point away from the extreme points. Although much work in the literature addresses the systematic selection of weights for multi-objective optimization problems, a comprehensive data-driven technique for determining weights that reflect the relative importance of the conflicting objectives is still lacking. Several authors, including Das and Dennis [8], have shown that choosing weights uniformly over (0,1) does not guarantee an even spread of points on the Pareto front; in many cases, the points obtained from uniformly generated weights cluster in certain regions of the Pareto set. In subsequent work, Das and Dennis [16] proposed the Normal-Boundary Intersection technique for obtaining an even spread of Pareto points. Like many others, their method gives priority to the solution set when deciding on the choice of weights. J. G. Lin [17] pointed to the small number of Pareto-optimal solutions obtained by existing methods, some of which coincide with the extreme points. Lin proposed solving multi-objective optimization problems by transforming them into single-objective problems, converting all but one of the objectives into proper equality constraints and using Lagrange multipliers.
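The clustering effect can be reproduced with a small bi-objective test problem of our own choosing (not taken from [8]): minimize \(f_1(x) = x\) and \(f_2(x) = (1 - x)^4\) over \(x \in [0, 1]\), whose Pareto front is convex. Setting the derivative of \(w x + (1 - w)(1 - x)^4\) to zero shows that every \(w \ge 0.8\) yields the same extreme point x = 0, so evenly spaced weights put several solutions on one point while other parts of the front are covered sparsely. The sketch checks this numerically by grid search.

```python
import numpy as np

# Illustrative test problem: minimize f1(x) = x and f2(x) = (1 - x)**4 on [0, 1].
xs = np.linspace(0.0, 1.0, 10001)
f1, f2 = xs, (1.0 - xs) ** 4

weights = np.arange(1, 20) / 20        # 19 evenly spaced values of w in (0, 1)
# For each w, minimize w * f1 + (1 - w) * f2 by grid search.
pareto_x = np.array([xs[np.argmin(w * f1 + (1.0 - w) * f2)] for w in weights])

# Number of weights that return the same extreme point x = 0, i.e. (f1, f2) = (0, 1).
n_at_extreme = int(np.sum(pareto_x == 0.0))
```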

Marler and Arora [18] explained that the weighted-sum method is a simple method that provides a linear approximation of the preference function, which need not reflect one's original preferences. It is essentially incapable of capturing complex preferences, and even when satisfactory weights are determined a priori, the final solution need not accurately reflect the original preferences. Rui Wang et al. [19] proposed a decomposition-based multi-objective evolutionary algorithm with a localized weighted-sum technique (MOEA/D-LWS), in which the optimal solution for each search direction is selected from among its neighboring solutions. Experimental results showed that MOEA/D-LWS outperformed the competing algorithms in most cases. Zhang proposed a dynamic weighted-sum (DWS) technique [20] that systematically varies the weights of the individual objectives when solving multi-objective optimization (MOO) problems, and studied the search behavior of different dynamic weighted aggregations, namely bang-bang, linear, sinusoidal, and random weighted aggregation.

Jaini and Utyuzhnikov [21] proposed a trade-off ranking method within a fuzzy multi-criteria decision-making framework, in which fuzzy quantities represent the imprecise weights of the conflicting objectives. The authors designed a fuzzy trade-off ranking technique that ranks the alternatives and selects the one with the smallest compromise measure as the best choice. Most work in the available literature has focused on fixing the weights on the basis of prior beliefs or information. Existing methods concentrate on refining the distribution of the Pareto solutions produced by the WS technique [22], with less emphasis on the stability and appropriateness of the weights for representing the conflicting objectives precisely.

Whereas existing work on weight determination aimed at choosing the set of weights that stabilizes the solution set [23,24,25], this work proposes a model that determines a more stable set of weights than those obtained deterministically. The criticisms of existing methodologies for weight determination motivated this work and the proposed Bayesian model, which combines a multinomial likelihood with a Dirichlet prior. To the best of the authors' knowledge, this work is the first of its kind, since no earlier work sought stability in the weights themselves. Unlike the frequentist approach, Bayesian modeling treats the uncertainty in the parameters probabilistically. Frequentist methods, which do not use prior probabilities, produce estimates based mostly on maximum likelihood or confidence intervals, whereas Bayesian methods yield a complete posterior distribution over the possible parameter values. This makes it possible to account for the uncertainty in the estimate by integrating over the entire distribution rather than using only the most likely value.

This work is based on the Bayesian framework because it allows any prior knowledge (expressed through the prior distribution) about the relative importance of the conflicting objectives to be incorporated coherently when generating the weights. This prior distribution is then updated with sample data through the Bayesian paradigm; the data obtained from the pilot survey concentrate the distribution around the parameters' true but unknown values. The hierarchical Bayesian model is constructed so that the weights, estimated stochastically from the pilot-survey data, reflect the relative importance of the conflicting objectives.

3 Statistical prerequisite

The proposed hierarchical Bayesian methodology for generating weights is based on a multinomial likelihood with a Dirichlet prior, the Dirichlet distribution being conjugate to the multinomial. The statistical prerequisites are briefly discussed in this section.

3.1 Bayesian approach

Unlike the frequentist approach, which does not quantify uncertainty about the fixed but unknown parameter values, the Bayesian approach defines probability distributions over the possible values of a parameter. Let x denote the data and \(\theta\) the unknown parameter of interest, and let \(\Theta\) be the parameter space, so that \(\theta \in \Theta\). In the Bayesian approach, prior belief about \(\theta\) is quantified by a prior probability distribution over \(\Theta\), the set of possible values of \(\theta\). Newly collected data typically make the distribution over \(\Theta\) narrower: the prior distribution of \(\theta\) is updated to the posterior distribution using Bayes' theorem, which states that

$$P(\theta \mid \text{Data}) = \frac{P(\text{Data} \mid \theta)\, P(\theta)}{P(\text{Data})}$$
(2)

where \(P(\text{Data} \mid \theta)\) is the likelihood, \(P(\theta)\) is the prior distribution, \(P(\theta \mid \text{Data})\) is the posterior distribution of \(\theta\), and the normalizing constant \(P(\text{Data})\) is the marginal likelihood of the data.
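Equation (2) can be applied numerically by discretizing the parameter space. The sketch below assumes a binomial likelihood for hypothetical data (7 successes in 10 trials) and a flat prior on a grid of 101 values of \(\theta\); the evidence \(P(\text{Data})\) is the prior-weighted sum of the likelihood, so the posterior sums to one.

```python
from math import comb

import numpy as np

theta = np.linspace(0.0, 1.0, 101)                 # grid over the parameter space
prior = np.full(theta.size, 1.0 / theta.size)      # flat prior P(theta)
k, n = 7, 10                                       # hypothetical data: 7 successes in 10 trials
likelihood = comb(n, k) * theta**k * (1.0 - theta)**(n - k)   # P(Data | theta)
evidence = np.sum(likelihood * prior)              # P(Data), the normalizing constant
posterior = likelihood * prior / evidence          # Bayes' theorem, Eq. (2)
theta_map = float(theta[np.argmax(posterior)])     # posterior mode on the grid
```

With a flat prior the posterior mode coincides with the maximum-likelihood value 0.7; an informative prior would pull it towards the region where the prior places its mass.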

3.2 Hierarchical Bayesian model

A hierarchical Bayesian model is one in which the prior distribution of some model parameters depends on further unknown parameters, which are themselves modelled as random variables with their own distribution. The number of levels in the hierarchy depends on the context and complexity of the problem. Given observed data x, suppose x follows \(f(\cdot \mid \theta)\), with \(\theta\) distributed according to a prior \(\pi(\cdot \mid \varphi)\). If the hyperparameters \(\varphi\) are further assumed to follow a distribution \(\zeta(\cdot \mid \upsilon)\), the resulting hierarchical model gives a more flexible account of the data.
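A three-level hierarchy of this kind can be simulated by ancestral sampling, drawing each level given the one above it. The sketch anticipates the model of Section 3.6, with votes as x, weights as \(\theta\) and Dirichlet concentration parameters as \(\varphi\). The Gamma hyperprior for \(\varphi\) (playing the role of \(\zeta\)), its parameters and the sample size of 47 are assumptions for illustration; Section 3.6 itself estimates the concentration parameters from the data instead of assigning them a hyperprior.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
n_respondents = 47                                   # assumed survey size

# Top level (hyperprior zeta, assumed): alpha_i ~ Gamma(shape=2, scale=1).
alpha = rng.gamma(shape=2.0, scale=1.0, size=3)
# Middle level (prior pi): weights w | alpha ~ Dirichlet(alpha).
w = rng.dirichlet(alpha)
# Bottom level (likelihood f): votes | w ~ Multinomial(n, w).
votes = rng.multinomial(n_respondents, w)
```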

3.3 Multinomial distribution

The multinomial distribution is a multivariate generalization of the binomial distribution. Suppose an experiment is conducted in which each trial has k (finite and fixed) mutually exclusive and exhaustive possible outcomes with probabilities \({p}_{1}, {p}_{2},\dots ,{p}_{k}\) such that \({p}_{i}\ge 0\ \forall i=1(1)k\) and \(\sum_{i=1}^{k}{p}_{i}=1\). If \({X}_{i}\) is the random variable counting the number of times category i is observed over n independent trials of the experiment, then the vector \(\underset{\_}{X}=\left({X}_{1}, {X}_{2},\dots , {X}_{k}\right)\) follows a multinomial distribution with parameters n and \({p}_{1}, {p}_{2},\dots ,{p}_{k}\). The probability mass function of the multinomial distribution is:

$$f\left(X_{1} = x_{1}, X_{2} = x_{2}, \ldots, X_{k} = x_{k}\right) = \frac{n!}{x_{1}! \cdots x_{k}!}\, p_{1}^{x_{1}} \cdots p_{k-1}^{x_{k-1}} \left(1 - p_{1} - \cdots - p_{k-1}\right)^{n - x_{1} - \cdots - x_{k-1}}$$
(3)
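Equation (3) translates directly into code. The sketch below uses the equivalent symmetric form \(\frac{n!}{x_1! \cdots x_k!} \prod_i p_i^{x_i}\), which is identical to (3) once \(p_k = 1 - p_1 - \cdots - p_{k-1}\) and \(x_k = n - x_1 - \cdots - x_{k-1}\) are substituted, and checks on a small example that the probabilities of all possible count vectors sum to one. The probabilities (0.5, 0.3, 0.2) are arbitrary.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Eq. (3): probability of the count vector `counts` in n = sum(counts) trials."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    coef = factorial(n) // prod(factorial(x) for x in counts)   # multinomial coefficient
    return coef * prod(p ** x for p, x in zip(probs, counts))

probs, n = (0.5, 0.3, 0.2), 4
# Sum over every (x1, x2, x3) with x1 + x2 + x3 = n.
total = sum(multinomial_pmf((a, b, n - a - b), probs)
            for a in range(n + 1) for b in range(n + 1 - a))
```

For k = 2 the function reduces to the binomial probability, e.g. `multinomial_pmf((3, 2), (0.4, 0.6))` equals \(\binom{5}{3} 0.4^3 0.6^2\).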

3.4 Dirichlet distribution

The Dirichlet distribution is a multivariate generalization of the Beta distribution. The Dirichlet distribution of order \(k \ge 2\) with parameters \({\alpha }_{1},\dots ,{\alpha }_{k}>0\) has the following probability density function:

$${\text{g}}(x_{1} ,{ }x_{2} , \ldots ,{ }x_{k} ) = \frac{{{\Gamma }(\mathop \sum \nolimits_{i = 1}^{k} \alpha_{i} )}}{{\mathop \prod \nolimits_{i = 1}^{k} {\Gamma }(\alpha_{i} )}}{ }\mathop \prod \limits_{i = 1}^{k} x_{i}^{{\alpha_{i} - 1}}$$
(4)

Here the \(X_i\) are continuous random variables with \({x}_{i}\ge 0\ \forall i\) and \(\sum {x}_{i}=1\); that is, the support of the Dirichlet distribution is the set of k-dimensional vectors whose entries lie in (0,1) and add up to one. The parameter vector \(p = ({p}_{1}, {p}_{2},\dots ,{p}_{k})\) of the multinomial distribution has the same properties, since \(0 \le p_i \le 1\) for \(i \in [1,k]\) and \(\sum p_i = 1\), and hence it can be modelled by an appropriate Dirichlet distribution. The Dirichlet distribution is thus a family of continuous probability distributions over discrete probability distributions with k categories. Its usefulness can be illustrated with a realistic example. Consider a company that produces six-faced dice. Although manufacturing processes are precise nowadays, they are not perfect: because of slight manufacturing defects, a randomly selected die will not show each face with a relative frequency of exactly one sixth. Each die therefore has its own probability distribution over the outcomes 1, 2, 3, 4, 5 and 6, and the variation of these distributions across dice can be modelled with a Dirichlet distribution.
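The dice example can be simulated with NumPy's Dirichlet sampler. Each row drawn below is the vector of face probabilities of one die. The common concentration parameter of 500 per face is a hypothetical value that makes the dice nearly fair, with die-to-die variation whose size matches the Dirichlet marginal variance \(\frac{(\alpha_i/\alpha_0)(1 - \alpha_i/\alpha_0)}{\alpha_0 + 1}\), where \(\alpha_0 = \sum_i \alpha_i\).

```python
import numpy as np

rng = np.random.default_rng(seed=0)
alpha = np.full(6, 500.0)                  # hypothetical: equal, large concentrations
dice = rng.dirichlet(alpha, size=10_000)   # row j: face probabilities of die j

mean_face_prob = dice.mean(axis=0)         # close to alpha / alpha.sum() = 1/6
a0 = alpha.sum()
theory_var = (alpha / a0) * (1 - alpha / a0) / (a0 + 1)   # Dirichlet marginal variance
sample_var = dice.var(axis=0)
```

Larger concentration parameters correspond to more uniform manufacturing; smaller ones would produce visibly loaded dice.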

3.5 Conjugate prior

In Bayesian probability theory, if the posterior and prior distributions of the parameter θ belong to the same family of probability distributions, the prior is called a conjugate prior. In other words, if P(θ) and P(θ|Data) in Equation (2) belong to the same family of distributions, they are called conjugate distributions. It can be shown that the Dirichlet distribution acts as a conjugate prior for the multinomial distribution.
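The conjugacy can be checked in one step by multiplying the kernel of the multinomial likelihood (3) by that of the Dirichlet prior (4), writing \(\mathbf{p} = (p_1, \ldots, p_k)\) for the Dirichlet variable and dropping all factors that do not depend on \(\mathbf{p}\):

$$P(\mathbf{p} \mid x_{1}, \ldots, x_{k}) \propto \prod_{i=1}^{k} p_{i}^{x_{i}} \cdot \prod_{i=1}^{k} p_{i}^{\alpha_{i} - 1} = \prod_{i=1}^{k} p_{i}^{\alpha_{i} + x_{i} - 1}$$

This is the kernel of a Dirichlet\((\alpha_1 + x_1, \ldots, \alpha_k + x_k)\) density, so the posterior is again Dirichlet, with each concentration parameter increased by the corresponding count.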

3.6 Proposed methodology

As generation of the weights corresponding to different conflicting objectives in a weighted sum problem is the primary interest, the authors have considered the weights wi in (1) as the unknown parameters.

Let M be a set of conflicting objectives in the objective space defined as follows:

$$M = \left\{ f_{1}(x), f_{2}(x), \ldots, f_{l}(x);\ h_{i}(x) \le 0 \text{ or } h_{i}(x) \ge 0,\ i = 1, \ldots, p \right\}$$
(5)

where \(f_{1}(x), f_{2}(x), \dots, f_{l}(x)\) are the conflicting objective functions and \(h_{i}(x)\), \(i = 1, \dots, p\), are the p constraints. The weighted-sum method scalarizes the vector of objective functions \(\underline{f} = (f_{1}(x), f_{2}(x), \dots, f_{l}(x)) \in \mathcal{R}^{l}\), where \(\mathcal{R}^{l}\) is the l-dimensional Euclidean space, using an appropriately selected weight vector \(\underline{w} = (w_{1}, \dots, w_{l}) \in \mathcal{R}^{l}\) such that \(w_{i} > 0\) and \(\sum_{i=1}^{l} w_{i} = 1\):

$$\theta = \underline {w}^{^{\prime}} \underline {f} = w_{1} f_{1} + \cdots + w_{l} f_{l}$$
(6)

Note that \(\theta \in \mathcal{R}\) is a scalar. Without loss of generality, the objective functions \(f_{i}(x)\), \(\forall\, i = 1(1)l\), can be assumed to be normalized.
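The text does not fix a particular normalization. One common choice, assumed in the sketch below, is min-max scaling of each objective over a set of candidate solutions, so that the best observed value maps to 0 and the worst to 1; the candidate values are hypothetical.

```python
import numpy as np

def normalize_objectives(F: np.ndarray) -> np.ndarray:
    """Min-max scale each column of F (rows: candidate solutions, columns: objectives to minimize)."""
    f_best = F.min(axis=0)                                      # best observed value per objective
    f_worst = F.max(axis=0)                                     # worst observed value per objective
    span = np.where(f_worst > f_best, f_worst - f_best, 1.0)    # guard against constant columns
    return (F - f_best) / span

F = np.array([[2.0, 300.0],
              [4.0, 100.0],
              [3.0, 200.0]])
Fn = normalize_objectives(F)   # each column now lies in [0, 1]
```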

To determine the weights, suppose data on the preferences of n individuals regarding the choice of different categories (representing the different conflicting objectives) are obtained through a planned pilot survey. Individuals may be asked to vote for the single most important category out of a finite, mutually exclusive and exhaustive set of choices. Let \(n_i\) denote the number of individuals who voted for category i (i = 1, 2, ..., l, representing the ith objective function) in the pilot survey. Since the individuals vote independently for exactly one of the l categories, the multinomial distribution is used to model the counts in the different categories. Then \((n_1, n_2, \ldots, n_l) \sim \text{Multinomial}(n; w_1, w_2, \ldots, w_l)\), where \(w_i\) is the population proportion of individuals who would vote for category i, i.e., the probability that a randomly selected individual votes for the ith category. The probability mass function of the multinomial distribution is given by

$$f\left(n_{1}, n_{2}, \ldots, n_{l} \mid w_{1}, w_{2}, \ldots, w_{l}\right) = \frac{n!}{n_{1}!\, n_{2}! \cdots n_{l}!}\, w_{1}^{n_{1}} w_{2}^{n_{2}} \cdots w_{l-1}^{n_{l-1}} \left(1 - w_{1} - \cdots - w_{l-1}\right)^{n - n_{1} - \cdots - n_{l-1}}$$
(7)

As the \(w_i\) are continuous random variables with \(w_{i} \ge 0\ \forall i\) and \(\sum w_{i} = 1\), it can further be assumed that \((w_1, w_2, \ldots, w_l) \sim \text{Dirichlet}(\alpha_1, \alpha_2, \ldots, \alpha_l)\), with the following density:

$$g\left(w_{1}, w_{2}, \ldots, w_{l} \mid \alpha_{1}, \ldots, \alpha_{l}\right) = \frac{\Gamma\left(\sum_{i=1}^{l} \alpha_{i}\right)}{\prod_{i=1}^{l} \Gamma(\alpha_{i})} \prod_{i=1}^{l} w_{i}^{\alpha_{i} - 1}$$
(8)

Here, the Dirichlet distribution, being a distribution over the probability simplex, is the most appropriate choice for modelling \((w_1, w_2, \ldots, w_l)\). As a multivariate generalization of the Beta distribution, it acts as a conjugate prior for the multinomial; \(\alpha_1, \alpha_2, \ldots, \alpha_l\) are the concentration parameters, with \(\alpha_i > 0\ \forall i = 1(1)l\).

The marginal likelihood function is given by,

$$\begin{aligned} h\left(n_{1}, n_{2}, \ldots, n_{l} \mid \alpha_{1}, \alpha_{2}, \ldots, \alpha_{l}\right) &= \int_{w_{i} \ge 0,\ \sum_{i} w_{i} = 1} f\left(n_{1}, n_{2}, \ldots, n_{l} \mid w_{1}, w_{2}, \ldots, w_{l}\right) g\left(w_{1}, w_{2}, \ldots, w_{l}\right) \, dw_{1}\, dw_{2} \cdots dw_{l} \\ &= \frac{\Gamma\left(\sum_{j=1}^{l} \alpha_{j}\right)}{\prod_{j=1}^{l} \Gamma(\alpha_{j})} \cdot \frac{n!}{\prod_{j=1}^{l} n_{j}!} \cdot \frac{\prod_{j=1}^{l} \Gamma(n_{j} + \alpha_{j})}{\Gamma\left(\sum_{j=1}^{l} \alpha_{j} + n\right)} \end{aligned}$$
(9)

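Equation (9) gives the probability of observing the data given \(\alpha_1, \alpha_2, \ldots, \alpha_l\). The values of \(\alpha_1, \alpha_2, \ldots, \alpha_l\) that maximize (9) are taken as their estimates (a type-II maximum likelihood, or empirical Bayes, estimate). By the conjugacy of the Dirichlet prior and the multinomial likelihood, \((w_1, w_2, \ldots, w_l) \mid (n_1, n_2, \ldots, n_l) \sim \text{Dirichlet}(\alpha_1 + n_1, \alpha_2 + n_2, \ldots, \alpha_l + n_l)\); that is, the posterior distribution of the weights given the data is a Dirichlet distribution with concentration parameters \((\alpha_1 + n_1, \alpha_2 + n_2, \ldots, \alpha_l + n_l)\).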
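Equation (9) is the Dirichlet-multinomial probability mass function. Evaluating it on the log scale with `math.lgamma` avoids overflow of the factorials and Gamma functions. The sketch checks two properties on small inputs: for fixed n and \(\alpha\), h sums to one over all count vectors, and with \(\alpha = (1, 1, 1)\) it is uniform over the \(\binom{n+2}{2}\) possible count vectors (1/21 each for n = 5). The concentration parameters used are arbitrary.

```python
from math import exp, lgamma

def log_marginal_likelihood(counts, alpha):
    """log h(n_1, ..., n_l | alpha_1, ..., alpha_l) from Eq. (9)."""
    n, a0 = sum(counts), sum(alpha)
    return (lgamma(a0) - sum(lgamma(a) for a in alpha)            # Dirichlet normalizing constant
            + lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)  # multinomial coefficient
            + sum(lgamma(c + a) for c, a in zip(counts, alpha))
            - lgamma(a0 + n))

def posterior_concentration(counts, alpha):
    """Parameters (alpha_1 + n_1, ..., alpha_l + n_l) of the Dirichlet posterior."""
    return [a + c for a, c in zip(alpha, counts)]

alpha, n = (2.0, 1.0, 0.5), 5
total = sum(exp(log_marginal_likelihood((i, j, n - i - j), alpha))
            for i in range(n + 1) for j in range(n + 1 - i))
```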

For the three-objective case (l = 3) considered in this work, the posterior expectations of the weights are given by

$$W_{i}^{*} = E\left(w_{i} \mid n_{1}, n_{2}, n_{3}\right) = \frac{\alpha_{i} + n_{i}}{\sum_{j=1}^{3} (\alpha_{j} + n_{j})}, \quad i = 1, 2, 3$$
(10)

Hence, estimates of weights can be taken as

$$\hat{W}_{i}^{*} = \frac{\hat{\alpha}_{i} + n_{i}}{\sum_{j=1}^{3} (\hat{\alpha}_{j} + n_{j})}$$
(11)

where \(\hat{\alpha }_{1}\), \(\hat{\alpha }_{2}\) and \(\hat{\alpha }_{3}\) are the values maximizing (9).

Hence, the objective function gets modified as follows:

$${\text{Minimization of f}} = {\hat{\text{W}}}_{{1}} *{\text{f}}_{{1}} + {\hat{\text{W}}}_{{2}} *{\text{f}}_{{2}} + {\hat{\text{W}}}_{{3}} *{\text{f}}_{{3}}$$
(12)

The above modeling technique incorporates the uncertainty in the determination of the weights through a Bayesian hierarchical model based on a multinomial likelihood with a Dirichlet prior. In the existing literature, by contrast, the \(w_i\) have been estimated simply by the proportions of preferences in the respective categories:

$${\hat{\text{W}}}_{{\text{i}}} \, = \,{\text{p}}_{{\text{i}}} \, = \,{\text{n}}_{{\text{i}}} /{\text{ n}};{\text{ i }} = {1},{ 2}, \ldots ,{\text{l}};{\text{ n }} = \mathop \sum \limits_{i = 1}^{l} n_{i}$$
(13)

The proposed algorithm for weight determination is as follows:

Step 1: Read \(n_1, \ldots, n_l\) (the votes for the individual categories).

Step 2: Calculate the sum \(n = n_1 + \cdots + n_l\).

Step 3: Calculate the probability mass function \(f(n_1, n_2, \ldots, n_l \mid w_1, w_2, \ldots, w_l)\), where \(w_i\) is the (unknown) population proportion of individuals who would vote for category i.

Step 4: Calculate the probability density function \(g(w_1, w_2, \ldots, w_l \mid \alpha_1, \alpha_2, \ldots, \alpha_l)\), where the \(w_i\) are continuous random variables and \(\alpha_1, \alpha_2, \ldots, \alpha_l\) are the concentration parameters.

Step 5: Calculate the marginal likelihood function \(h(n_1, n_2, \ldots, n_l \mid \alpha_1, \alpha_2, \ldots, \alpha_l)\); the values of \(\alpha_1, \alpha_2, \ldots, \alpha_l\) that maximize this expression are taken as the estimates.

For three objectives (l = 3), h is

$$h = \frac{\Gamma(\alpha_{1} + \alpha_{2} + \alpha_{3})}{\Gamma(\alpha_{1})\,\Gamma(\alpha_{2})\,\Gamma(\alpha_{3})} \times \frac{n!}{n_{1}!\, n_{2}!\, n_{3}!} \times \frac{\Gamma(\alpha_{1} + n_{1})\,\Gamma(\alpha_{2} + n_{2})\,\Gamma(\alpha_{3} + n_{3})}{\Gamma(\alpha_{1} + \alpha_{2} + \alpha_{3} + n)}$$

Step 6: Calculate the posterior expectations of the weights, \(W_i^*\), using Equation (10).

Step 7: Calculate the estimates of the weights, \(\hat{W}_i^*\), using Equation (11).

Step 8: Exit

The corresponding MATLAB pseudo-code for obtaining the weights is as follows:

Figure a MATLAB pseudo-code for obtaining the weights
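As a Python counterpart to Steps 1 to 7 (a sketch under stated assumptions, not a transcription of the authors' MATLAB pseudo-code), the function below maximizes the log of h over a grid of concentration parameters and then applies Equations (10) and (11). Steps 3 and 4 enter only through h, which already integrates their product over the weights, so they are not computed separately. The grid 0.5, 1.0, ..., 10.0 for each \(\alpha_i\) is a hypothetical choice; the counts are the pilot-survey results reported in Section 4.

```python
from itertools import product
from math import lgamma, log

def log_h(counts, alpha):
    """Log of the marginal likelihood h in Eq. (9)."""
    n, a0 = sum(counts), sum(alpha)
    return (lgamma(a0) - sum(lgamma(a) for a in alpha)
            + lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
            + sum(lgamma(c + a) for c, a in zip(counts, alpha))
            - lgamma(a0 + n))

def log_multinomial_max(counts):
    """Log of Eq. (7) at w_i = n_i / n; no choice of alpha can make log h exceed it."""
    n = sum(counts)
    return (lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
            + sum(c * log(c / n) for c in counts if c > 0))

def bayesian_weights(counts, grid):
    """Steps 1-7: grid-search alpha maximizing Eq. (9), then posterior means, Eq. (11)."""
    # Steps 1-2: the votes are passed in; n = sum(counts) is computed inside log_h.
    # Step 5: type-II maximum likelihood over the (assumed) bounded grid.
    alpha_hat = max(product(grid, repeat=len(counts)), key=lambda a: log_h(counts, a))
    # Steps 6-7: posterior Dirichlet(alpha_hat + n) and its mean.
    post = [a + c for a, c in zip(alpha_hat, counts)]
    total = sum(post)
    return alpha_hat, [p / total for p in post]

counts = (24, 11, 12)                       # pilot-survey votes from Section 4
grid = [0.5 * k for k in range(1, 21)]      # hypothetical search range 0.5, 1.0, ..., 10.0
alpha_hat, w_bayes = bayesian_weights(counts, grid)
w_freq = [c / sum(counts) for c in counts]  # proportions of Eq. (13), for comparison
```

The bounded grid is a deliberate design choice. Because h is an average of multinomial probabilities over the Dirichlet prior, it can never exceed the multinomial probability evaluated at \(w_i = n_i/n\) (computed by `log_multinomial_max`). With a single vector of counts this bound is approached as the \(\alpha_i\) grow in proportion to the \(n_i\), so the maximizer of (9) tends to lie on the edge of the search range, and the chosen bounds influence \(\hat{\alpha}\).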

4 Results and discussion

The estimates of the weights under the frequentist and Bayesian models are compared below. The pilot survey yielded the following counts:

n1 = 24, n2 = 11, n3 = 12, n = 47.

Under the frequentist setup, the estimated weights are

$${\hat{\text{W}}}_{{\text{i}}} \, = \,{\text{n}}_{{\text{i}}} /{\text{ n}}$$
(14)

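As \(n_i \sim \text{Binomial}(n, p_i)\) for i = 1, 2, 3, the error variance of \(\hat{W}_i\) is \(p_i(1 - p_i)/n\), which is estimated by replacing \(p_i\) with \(\hat{W}_i\):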

$${\text{V}}\left( {{\hat{\text{W}}}_{{\text{i}}} } \right)\, = \,{\text{Var}}\left( {{\text{n}}_{{\text{i}}} /{\text{ n}}} \right)\, = \,{\hat{\text{W}}}_{{\text{i}}} \left( {{1} - {\hat{\text{W}}}_{{\text{i}}} } \right)/{\text{n}}$$
(15)

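Under the Bayesian setup, the estimates of the weights are given by Equation (11). Treating the \(\hat{\alpha}_i\) as fixed and \(n_i \sim \text{Binomial}(n, p_i)\), the variance of \(\hat{W}_i^*\) is \(n p_i (1 - p_i)/(\sum_{j} \hat{\alpha}_j + n)^2\), which is estimated by replacing \(p_i\) with \(\hat{W}_i^*\):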

$$V(\hat{W}_{i}^{*}) = \operatorname{Var}\left(\frac{\hat{\alpha}_{i} + n_{i}}{\sum_{j=1}^{3} (\hat{\alpha}_{j} + n_{j})}\right) = \frac{n\, \hat{W}_{i}^{*} (1 - \hat{W}_{i}^{*})}{\left\{\sum_{j=1}^{3} (\hat{\alpha}_{j} + n_{j})\right\}^{2}}$$
(16)
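Equations (15) and (16) can be evaluated side by side for the pilot-survey counts. The sketch reads the denominator of (16) as \((\sum_j \hat{\alpha}_j + n)^2\), consistent with the denominator of the Bayesian estimate itself, and uses \(\hat{\alpha} = (1, 1, 1)\) purely as a hypothetical value; it is not the estimate behind Table 1.

```python
def frequentist_weights_and_variance(counts):
    """Eqs. (14)-(15): W_i = n_i / n and estimated variance W_i (1 - W_i) / n."""
    n = sum(counts)
    w = [c / n for c in counts]
    return w, [wi * (1 - wi) / n for wi in w]

def bayesian_weights_and_variance(counts, alpha_hat):
    """Eqs. (11) and (16): W_i* = (a_i + n_i) / (a_0 + n), variance n W_i* (1 - W_i*) / (a_0 + n)^2."""
    n = sum(counts)
    denom = sum(alpha_hat) + n                  # a_0 + n = sum_j (alpha_j + n_j)
    w = [(a + c) / denom for a, c in zip(alpha_hat, counts)]
    return w, [n * wi * (1 - wi) / denom ** 2 for wi in w]

counts = (24, 11, 12)            # pilot survey, Section 4
alpha_hat = (1.0, 1.0, 1.0)      # hypothetical value, not the estimate behind Table 1
w_f, v_f = frequentist_weights_and_variance(counts)
w_b, v_b = bayesian_weights_and_variance(counts, alpha_hat)
```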

To compare the performance of the proposed Bayesian model with the existing frequentist method, samples of varying sizes are considered. The results are shown in Table 1.

Table 1 Comparison of the error variance under the Frequentist and Bayesian techniques for different sample sizes

Although the two sets of weights are close, the results show that the proposed model yields more stable (lower-variance) estimates than the frequentist one for small sample sizes.

For two estimators T1 and T2 of the same parameter, the efficiency of T2 with respect to T1 is given by

$$E = \frac{{V\left( {T_{1} } \right)}}{{V\left( {T_{2} } \right)}}$$
(17)

Figure 1 depicts the relative gain in efficiency of the proposed estimator over the existing one for varying sample sizes. The relative gain in efficiency of T2 with respect to T1 is given by

$${\text{G}} = \frac{{\left( {{\text{V}}\left( {{\text{T}}_1} \right) - {\text{V}}\left( {{\text{T}}_2} \right)} \right)}}{{V\left( {{\text{T}}_1} \right)}}$$
(18)
Fig. 1 Relative gain in efficiency vs sample size

A value G ≈ 0 indicates that the two estimators are equally efficient, while G > 0, i.e., V(T2) < V(T1), indicates that T2 is more efficient than T1. Computing the relative gain in efficiency from Table 1 shows that the gain of the proposed method over the existing one is quite high for small sample sizes.
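Equations (17) and (18) are related by \(G = 1 - 1/E\), so G > 0 exactly when E > 1. A two-function sketch with hypothetical variances:

```python
def efficiency(v_t1: float, v_t2: float) -> float:
    """Eq. (17): efficiency of T2 with respect to T1."""
    return v_t1 / v_t2

def relative_gain(v_t1: float, v_t2: float) -> float:
    """Eq. (18): relative gain in efficiency of T2 with respect to T1."""
    return (v_t1 - v_t2) / v_t1

v_t1, v_t2 = 0.0050, 0.0040        # hypothetical variances of T1 and T2
E = efficiency(v_t1, v_t2)          # > 1, so T2 is more efficient
G = relative_gain(v_t1, v_t2)       # equals 1 - 1/E
```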

As the sample size increases, the gain in efficiency decreases steadily, indicating that, for the given data, the two estimators become equally efficient for large samples. In practice, however, large samples are often difficult to obtain, which makes the proposed method particularly useful. Bayesian determination of weights is therefore recommended where a large-scale survey would be time-consuming, difficult to implement, or expensive.
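A rough worked version of this trend follows from Equations (15) to (18). As a simplifying assumption (not made in the original analysis), take $\hat{W}_i^{*} \approx \hat{W}_i$ and write $A = \sum_{j}\hat{\alpha}_j$, so that $\sum_{j}(\hat{\alpha}_j + n_j) = A + n$. Then

$$E = \frac{V(\hat{W}_i)}{V(\hat{W}_i^{*})} \approx \frac{\hat{W}_i(1-\hat{W}_i)/n}{n\,\hat{W}_i(1-\hat{W}_i)/(A+n)^{2}} = \left(\frac{A+n}{n}\right)^{2}, \qquad G = 1 - \frac{1}{E} \approx 1 - \left(\frac{n}{A+n}\right)^{2}$$

For a fixed prior mass A > 0, G is positive and tends to 0 as n grows. With a hypothetical A = 5, for instance, G ≈ 0.56 for n = 10 but only about 0.01 for n = 1000. This is only a heuristic under the stated approximation; the values reported in Table 1 are computed from the data.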

5 Application in parking route problem

The proposed methodology has been applied in the field of Intelligent Transport Systems (ITS). Smart transportation is essential for sustainable development in a growing economy, and, supported by a strong communication network and based on sound statistical techniques, it is key to tomorrow's smart cities. The route optimization tool promotes environmental conservation and sustainable development by providing the optimal route to a parking lot, thereby saving time, energy, and fuel. Finding the optimal parking lot is a serious problem in cities, and it tends to worsen during peak hours and at congested places. Route selection depends on multiple conflicting objectives: minimizing the distance to the parking lot, maximizing the speed of the car, and maximizing parking availability at the lot. The detailed problem definition, formulation, design methodology, and implementation are available in [26]. A pilot survey was conducted among 50 drivers. The Bayesian and frequentist weights were calculated using Equations (11) and (14), respectively, and are summarized in Table 2.
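To illustrate how the three objectives could be combined by the weighted-sum method of Equation (1), the following Python sketch scores a candidate route. The `Route` attributes, the scaling by maximum distance and speed, and the conversion of the two maximization objectives into minimization terms via 1 minus the scaled value are all assumptions made for illustration; the actual formulation is the one described in [26].

```python
from dataclasses import dataclass

@dataclass
class Route:
    # Hypothetical attributes of a candidate route to a parking lot.
    distance_km: float     # to be minimized
    avg_speed_kmph: float  # to be maximized
    availability: float    # fraction of free slots at the lot, in [0, 1]; to be maximized

def weighted_sum_fitness(route, weights, max_distance, max_speed):
    """Weighted-sum scalarization in the spirit of Equation (1); lower is better.

    Each objective is scaled to [0, 1]; the two maximization objectives are
    turned into minimization terms via 1 - (scaled value). This normalization
    is an illustrative assumption, not the scheme used in [26].
    """
    if min(weights) <= 0 or abs(sum(weights) - 1) > 1e-9:
        raise ValueError("weights must be positive and sum to 1")
    w_dist, w_speed, w_avail = weights
    f_dist = route.distance_km / max_distance
    f_speed = 1 - route.avg_speed_kmph / max_speed
    f_avail = 1 - route.availability
    return w_dist * f_dist + w_speed * f_speed + w_avail * f_avail
```

With hypothetical weights (0.5, 0.3, 0.2), a shorter but slower route `Route(2.0, 30.0, 0.5)` scores about 0.533 and a longer, faster route `Route(3.0, 45.0, 0.8)` scores 0.54, so the first would be preferred; a different weight vector, such as the Bayesian one, can change that ranking.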

Table 2 Weights: Frequentist and Bayesian

A genetic algorithm was used to solve the multi-objective optimization problem with each set of weights. The algorithm was run for 30 generations, since the fitness values had stabilized by then in most cases. The fitness values obtained across generations are listed in Table 3 and plotted in Figure 2.
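The generational loop can be sketched as an elitist real-coded genetic algorithm in Python. The three toy objectives, the chromosome encoding, and all operator choices (binary tournament selection, blend crossover, Gaussian mutation) are hypothetical stand-ins, not the route encoding or operators of [26]; the sketch only shows how a weighted-sum fitness is tracked over 30 generations.

```python
import random

def toy_objectives(x):
    # Hypothetical stand-ins for the three route objectives (conflicting targets 0, 1, 0.5).
    d = len(x)
    f1 = sum(v * v for v in x) / d
    f2 = sum((v - 1) ** 2 for v in x) / d
    f3 = sum((v - 0.5) ** 2 for v in x) / d
    return f1, f2, f3

def fitness(x, weights):
    # Weighted-sum scalarization (Equation (1)); lower is better.
    return sum(w * f for w, f in zip(weights, toy_objectives(x)))

def run_ga(weights, dim=5, pop_size=40, generations=30, mutation_sd=0.1, seed=None):
    """Elitist real-coded GA; returns the best fitness found in each generation."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scored = sorted(pop, key=lambda ind: fitness(ind, weights))
        history.append(fitness(scored[0], weights))
        next_pop = [scored[0]]  # elitism: the current best survives unchanged
        while len(next_pop) < pop_size:
            # Binary tournament selection of two parents.
            p1 = min(rng.sample(pop, 2), key=lambda ind: fitness(ind, weights))
            p2 = min(rng.sample(pop, 2), key=lambda ind: fitness(ind, weights))
            # Blend crossover plus Gaussian mutation, clipped to [0, 1].
            a = rng.random()
            child = [min(1.0, max(0.0, a * u + (1 - a) * v + rng.gauss(0, mutation_sd)))
                     for u, v in zip(p1, p2)]
            next_pop.append(child)
        pop = next_pop
    return history
```

Because of elitism, the recorded best fitness never increases from one generation to the next, which mirrors the decreasing-then-stabilizing curves described for Fig. 2. Repeating `run_ga(weights, seed=s)[-1]` for 30 seeds gives one final fitness value per execution, analogous to the runs summarized in Table 4.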

Table 3 Fitness values across Generations for Frequentist and Bayesian Weights for two time slots 12AM–4AM and 12 Noon–4PM
Fig. 2 Plot of fitness vs. generations for the two time slots 12AM–4AM and 12 Noon–4PM, showing how fitness values change across 30 generations using frequentist (green) and Bayesian (red) weights (Color figure online)

As the number of generations increases, the value of the fitness function decreases until it stabilizes at a (near-)optimal value. In both time slots, the Bayesian weights consistently produced lower fitness values. The process was then repeated for 30 independent executions, and the fitness values obtained with the frequentist and Bayesian weights are listed in Table 4 and plotted in Figure 3.

Table 4 Fitness values across 30 executions for Frequentist and Bayesian Weights for time slot 12AM–4AM
Fig. 3 Fitness over 30 different runs using frequentist (green) and Bayesian (red) weights for time slot 12AM–4AM (Color figure online)

Table 3 and Fig. 2 show that the fitness value decreases consistently as the number of generations increases, across all time slots. For a typical time slot, here 12AM–4AM, Table 5 further shows that both the average and the best fitness values over the 30 executions were lower with the Bayesian weights than with the frequentist weights. Moreover, the selected route and parking lot vary with the time slot. This mirrors a real-life scenario in which parking lots and routes change as the underlying factors change: although distances remain constant, the average speed and the availability of parking lots vary over time, and these changes are reflected in the fitness function.

Table 5 Descriptive Statistics for comparing the Fitness values across 30 executions for Frequentist and Bayesian Weights for time slot 12AM–4AM
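Descriptive statistics of the kind reported in Table 5 can be computed with Python's standard `statistics` module. This is a minimal sketch: the function name and the dictionary keys are illustrative, and since lower fitness is better, the "best" value is taken to be the minimum over the executions.

```python
import statistics

def describe(fitness_values):
    """Summary of final fitness values over repeated GA executions (lower is better)."""
    return {
        "executions": len(fitness_values),
        "mean": statistics.mean(fitness_values),
        "best": min(fitness_values),
        "worst": max(fitness_values),
        "std_dev": statistics.stdev(fitness_values),  # sample standard deviation
    }
```

Calling `describe` separately on the 30 final fitness values obtained with each weight set, and comparing the `mean` and `best` entries, reproduces the type of comparison drawn from Table 5.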

6 Conclusion

The Bayesian hierarchical model provides a posterior distribution on the weights and is suitable for generating weights with which to examine the nature of the solution set. Moreover, weights generated by the proposed Bayesian methodology can be used to develop a bona fide Bayesian posterior distribution for the optima, thereby quantifying the uncertainty about the optima properly and coherently. It has been shown that the proposed estimator is more efficient than the existing frequentist estimator for small sample sizes. Since large samples are difficult to obtain in practice, the proposed method is particularly valuable where a large-scale survey would be time-consuming, difficult to implement, or expensive. The model is designed to extract adequate information from the collected data, yielding highly efficient estimators for small samples, and its error variances have been analyzed to quantify the reliability of the estimates.

When applied to route optimization for finding the most suitable parking lot, the proposed methodology produced results that closely resemble the phenomena observed in real-life situations. Rather than improving the solution set directly, this work used sound statistical techniques to improve the weights that represent the relative importance of the possibly conflicting objective functions of the route optimization process. On average, the weights generated by the proposed methodology also yielded lower (better) fitness values than those obtained with the frequentist approach. If implemented in practice, this could save time, energy, and fuel, contributing to a greener world.