1 Introduction

Driven by emerging economies, intercity travel demand in several Asian countries such as Bangladesh is so high that transport planners cannot expect highways to serve all of it. In this context, the traffic load on highways can be reduced if a significant number of travelers switch to an alternative mode of transportation. A good intercity train service can play an important role in addressing this issue. In particular, trains can be an efficient and economical mode for meeting long-distance travel demand. Rail is also a demand-responsive mode: the capacity of a train can be changed according to demand. By changing the number and type of cars (e.g. single-decker or double-decker), it is possible to meet travel demand to a great extent. However, as with any public transport mode, the ridership of intercity trains depends largely on passengers’ satisfaction with the service. Hence, train service quality (SQ) is an issue of major concern.

SQ is a general group of performance measurement tools in traffic and transportation engineering operations. Specifically, this study defines SQ as a measure of user perceptions gained from the overall public transport service, e.g. performance, comfort etc. In order to retain attractiveness of public transit among travelers and boost ridership, operators need to continuously monitor various SQ metrics. Several studies (Hu and Jen 2006; Pham and Simpson 2006; Pérez et al. 2007) have highlighted the importance of the SQ of public transport. Measurements of SQ enable the transit operator to decide upon the organizational goals and make crucial decisions regarding future investments. Models of SQ attributes of transport services provide the opportunity to gain insight into the attributes related to SQ and thus provide a guideline for amelioration. However, modeling service quality has posed a considerable challenge to researchers due to the complexity of the concept, uncertainties regarding the attributes to be used, perception heterogeneity of passengers, imprecise and subjective nature of the survey data. Thus, researchers have resorted to a wide variety of tools for modelling service quality.

Passengers’ opinions about the SQ of any public transport mode on regular days differ significantly from those on special days. Related studies are very hard to find in the literature, especially for intercity trains. In general, SQ data for special days are more heterogeneous. Knowing how passengers perceive intercity train service can support transit planners and managers in defining the most suitable strategies to satisfy existing passengers and attract new ones. Accordingly, this study explores the relationship between intercity train SQ attributes and passenger satisfaction based on perceptions gained from experience. To this end, a Likert scale questionnaire survey was designed to collect data on user perceptions of intercity train SQ under different scenarios. An Adaptive Neuro-Fuzzy Inference System (ANFIS) framework was calibrated using those data to see how well the models fit the intercity train SQ data collected from the field. ANFIS is a Fuzzy-Logic Inference System (FIS)-based Artificial Neural Network (ANN) tool that is robust and well suited to storing empirical knowledge. It is fast and accurate in predicting SQ from heterogeneous data. ANFIS has been successfully implemented in different complex transportation problems, such as mode-choice modeling, accident prediction, real-time traffic state estimation, and travel behavior modeling. However, few studies have applied ANFIS to train SQ. Moreover, to the authors’ best knowledge, no previous study has investigated the influence of membership functions (MFs) and epochs on the calibration of ANFIS using an intercity train SQ dataset. Using a questionnaire dataset collected on regular days and special days, this study calibrated the ANFIS framework to forecast intercity train SQ. The calibrated models are also used to rank the SQ attributes, and the results are compared with empirical observations (public opinion) to assess the accuracy of the ANFIS models.

The rest of this paper is structured as follows: Sect. 2 discusses the previous studies on train SQ evaluations. Section 3 describes the structure of the ANFIS framework and Sect. 4 discusses the study location, sample size, and selected SQ attributes. Model findings and discussions are depicted in Sect. 5. Section 6 presents the implications and limitations of this research work. The last section illustrates the major findings and provides recommendations for future study.

2 Literature review

Service quality (SQ) is vital in attracting and retaining new users. Therefore, researchers and policy makers are concerned about its influence on different public transport modes. Early research work by Parasuraman et al. (1985, 1988) developed a SERVQUAL SQ model, where they identified five dimensions—reliability, assurance, tangibles, empathy and responsiveness to measure SQ through questionnaires. Several studies such as—Gronroos (1984), Brady and Cronin (2001), Karatepe et al. (2005) performed extensive SQ assessment in the context of user’s perception. Later, the techniques of SQ were widely implemented in several fields of market research including performance evaluation of public transportation including bus, train and so on. Several authors introduced different quality measuring methods and parameters to assess railway passenger service quality. Wardman (2004) used demand elasticity to estimate the influence of travel time, fuel cost, car ownership, the population on railway SQ. Cavana et al. (2007) presented the SERVQUAL method using regression analysis to evaluate passenger rail SQ by incorporating comfort, connection, and convenience. They revealed that reliability, responsiveness and empathy had significant effects on overall SQ. Nathanail (2008) developed a framework for assessing the SQ of railway from a questionnaire survey of passengers and grouped attributes into six criteria. Those were—itinerary accuracy, system safety, cleanness, passenger comfort, servicing and passenger information among which itinerary accuracy and system safety were found most important attributes. Cantwell et al. (2009) developed a multinomial logit model and revealed that improvement in service, reliability and a reduction of crowding were related to passengers’ satisfaction. Brons et al. (2009) assessed the importance of access to the station with overall satisfaction of railway journey through a regression analysis. 
Geetika (2010) evaluated the factors that determine user satisfaction with the SQ provided on railway platforms. Determinants found from factor analysis were availability, safety and security, quality of refreshments, effectiveness of information systems, behavior of railway staff and basic amenities provided on platforms. Refreshments and behavioral factors were considered the most significant by passengers.

Prasad and Shekhar (2010) identified the quality of a rail service by the SERVQUAL model incorporating assurance, empathy, reliability, responsiveness, tangibles, service product, social responsibility and service delivery. Among them, service delivery was found as the most important factor. Agunloye and Oduwaye (2011) investigated the relationships among arrival time of the train, smoothness of ride, and cleanliness of the coaches. The research proposed that the arrival time of trains at stations and trip frequency have a significant relationship with SQ of a train. Chou et al. (2011) incorporated the quality satisfaction-loyalty relationship into a passenger satisfaction index (PSI) calculation to evaluate the SQ of high speed rail. The study concluded that the level of access to a station and the personal spaces on the train were the most important quality indicators to improve customer satisfaction. Irfan et al. (2012) evaluated passengers’ perceptions about the SQ of a rail system through a modified SERVQUAL and service qualities were—empathy, assurance, tangibles, timeliness, responsiveness, information system, food, safety and security. The study showed that there is a positive relationship among the SQ attributes. The most significant correlation was found among tangibles and empathy. De Oña et al. (2014) focused on the factors affecting the SQ of the railway in Northern Italy using the decision tree approach. The research found that courtesy and competence/capacity in the station, workability of windows and doors, regularity of train frequency were the major factors for SQ of the railway. Aydin et al. (2015) proposed a combined fuzzy hierarchy process to assess customer satisfaction levels of rail transit. The study provided operational deficiencies related to rail transit through customer satisfaction surveys.

Conventional models have underlying assumptions and predetermined fundamental relationships. Modeling non-linear relationships between user satisfaction and attributes was widely adopted over the last few decades. In these methods, SQ attributes were considered as independent variables and user satisfaction was taken as dependent variables. The coefficients were estimated by relating SQ attributes with user satisfaction. For instance, Chou et al. (2014) used a structural equation model (SEM) to test relationship among SQ, customer satisfaction and customer loyalty on high-speed rail service in Taiwan. The study showed that most significant attributes were cleanliness of train, attitude and appearance of the employee, the comfort of air condition, the on-time performance of the train. De Oña et al. (2016) modeled the behavioral intentions of passengers of using a light rail transit of Seville (Spain) through SEMs. The authors found that behavioral intentions are mostly affected by passenger satisfaction with a light rail transit service and judgements about its SQ. The study findings also demonstrated that the light rail transit SQ is mostly influenced by aspects concerning comfort, accessibility, and timeliness.

In analyzing large datasets, non-parametric models of Artificial Intelligence (AI) such as ANFIS can provide advantages over other statistical regression models like linear regression, logistic regression, ridge regression, lasso regression etc. They can model non-linear relationships with a progressive capability and have scope for model validation as well. Recently, these models were applied widely in the field of science, engineering and market research. Non-linear relationships in many transportation problems can be solved accurately by using a Fuzzy logic-based approach (Teodorovic and Vukadinovic, 2012). Yen and Langari (1998), Passino et al. (1998), and Lewis (1997) performed extensive studies on fuzzy logic and FIS. Later, neural network (NN)-based learning was incorporated into FIS to solve many transportation problems. Teodorovic and Vukadinovic (2012) presented potential applications of fuzzy logic and NNs in solving transportation problems. Park (2002) forecasted freeway traffic volume by a hybrid network-based fuzzy system. Pribyl and Goulias (2003) developed ANFIS for a travel behavior study. Andrade et al. (2006) utilized an adaptive network-based fuzzy logic for transport choice modeling.

Mucsi et al. (2011) utilized the ANFIS technique to estimate queue length precisely and applied it for queue management at signalized intersections. They adopted a trapezoidal MF for computational simplicity and to capture the similarity with the inputs of a simulation model. Islam et al. (2016a) adopted Probabilistic Neural Network (PNN), Generalized Regression Neural Network (GRNN) and Pattern Recognition Neural Network (PRNN) to assess the significant attributes that influence the SQ for Dhaka city bus transit. In another study, Islam et al. (2016b) used PNN and ANFIS to construct and compare the prediction models for bus SQ of Dhaka city. They ranked the SQ attributes according to their effect and identified the significant attributes. The study revealed that ANFIS performed better than PNN for the evaluation of bus transit SQ. The SQ assessment studies conducted by Islam et al. (2016a, b) differ from the current study with respect to several aspects: (1) types of public transport modes considered; (2) coverage of the modes considered; (3) days considered (regular, special or both); (4) empirical models used; (5) specific SQ attributes introduced into the models; and (6) assessing the impact of Membership Functions (MFs) and Epochs in the model performance. To simplify and provide a good visualization of the differences and progress of the authors’ research works, a table is provided in Appendix A.

Inspired by the most recent studies, this paper depicts the application of the ANFIS in the development of a new fuzzy logic-based approach for exploring the relationship among attributes of intercity train’s SQ with passengers’ satisfaction level. Particularly, previous studies showed that the ANFIS-based SQ assessment seems to be a feasible approach for any mode of public transport. A dataset is trained and tested using Artificial NN to check the fitness of the calibrated FIS for estimating the parameters of observed attributes or variables of railway SQ under different scenarios, i.e., regular days and special days. Despite uncertainties and nonlinearities, ANFIS represents a mathematical framework that can model the relationship among observed variables, hidden layers and output variables quite remarkably.

3 ANFIS framework

In SQ evaluation, the complex relationships among all observed attributes create an environment that is difficult to define intuitively. A neuro-fuzzy approach, which can learn from the training data, was therefore used. An Artificial Neural Network (ANN) is a computational structure that functions in a manner loosely analogous to the human brain. It interconnects neurons, or nodes. These nodes use input data and Membership Functions (MFs) to process and transmit outputs. Jang (1993) adapted the network learning algorithm to FIS and named the structure ANFIS. Like a human brain, ANFIS can process training data to gain experience and create a fuzzy-logic-based network, working stepwise. The FIS is used to model a non-linear relationship and the NN is used to calibrate the parameters of the input and output MFs. ANFIS combines the advantages of FIS and ANN into a single implementation unit by designing a feed-forward ANN that can perform the operations of FIS. The ANN training method has also been improved in ANFIS by a hybrid learning scheme. ANFIS can therefore be understood as an updated and automated combination of FIS and feed-forward ANN. Since ANN is a class of deep learning process (Schmidhuber 2015), ANFIS also belongs to that class. Specifically, the ANFIS tool in MATLAB Release (2014) is used in this research to predict intercity train SQ. This tool offers both a Sugeno-type and a Mamdani-type inference system. As the Sugeno-type is a more compact and computationally efficient representation than a Mamdani system, it is better suited to adaptive techniques for constructing fuzzy models (see Sugeno 1985). Hence, the current study chooses the Sugeno-type model in FIS to capture the non-linear relationship between intercity train SQ and its attributes. The objective function of ANN training optimizes the weights for a specific set of inputs and outputs. The output calculated by the network is compared with the corresponding training data. Then the randomly set parameters are adjusted in order to reduce the errors of the output. The optimization process ends when the error reaches an expected value. To calibrate the parameters, a combination of least-squares estimation and back-propagation gradient descent is used in the ANFIS structure.

The calibration of the MFs was performed using input data (i.e. observed attributes) and output data (i.e. overall passenger satisfaction) through the learning process. The process consists of two main steps: (i) the collection of learning data; and (ii) FIS generation. Data collection consists of gathering sufficient relevant training data describing the relationship between the observed variables or attributes and the corresponding overall passenger satisfaction level by using numerical ranking. For training, responses from 830 respondents for regular days and 443 respondents for special days were drawn from a Likert scale questionnaire survey. The structure of the FIS was constrained to allow 181 MFs for input attributes and the shapes of the MFs were optimized for both regular days and special days. The number of rules depends on the number of MFs and input variables. However, rule outputs have different levels of strength. The rule outputs were combined to provide a single output. The questionnaire survey data were used to perform the calculations required by the ANFIS. During the learning process, ANFIS modifies the input and output MF parameters with the objective of minimizing the error, defined as the sum of the squared differences between the observed and modelled values. The structure of ANFIS comprises five layers, as shown in Fig. 1a: (i) fuzzification; (ii) fuzzy AND; (iii) normalization; (iv) defuzzification; and (v) output layer. These layers are connected through direct links and nodes. Nodes are processing units which consist of adaptive and fixed parameters. By setting learning rules, the adaptive parameters can be altered and the MFs reshaped.

Fig. 1 ANFIS structures: (a) two input variables used for mathematical illustration; and (b) MATLAB graphical interface with 18 intercity train SQ attributes

In Fig. 1a, two input variables (x, y) and one output (f) are merely considered for mathematical illustration of ANFIS modeling. Considering these, for the first-order Sugeno fuzzy model and typical if–then fuzzy rules, Eqs. (1) and (2) can be written. In these equations, a and b are the coefficients of the input variables and c is the constant term. These parameters are referred to as consequent parameters and are optimized through the least squares method. Figure 1b shows the MATLAB graphical interface of the ANFIS model with 18 intercity train SQ attributes considered in this study.

Rule 1: If x is P1 and y is Q1, then

$$f_{1} = a_{1} x + b_{1} y + c_{1}$$
(1)
Rule 2: If x is P2 and y is Q2, then

$$f_{2} = a_{2} x + b_{2} y + c_{2}$$
(2)

All nodes in the first layer, i.e. fuzzy layer, are adaptive. It is also known as input layer. The relationship between the output and input MFs of this layer is as follows:

$$O_{m}^{1} = \mu_{{P_{m} }} (x);\;m = 1,2$$
(3)
$$O_{n}^{1} = \mu_{{Q_{n} }} (y);\;n = 1,2$$
(4)

Here, x and y are the inputs of nodes \(P_{m}\) and \(Q_{n}\), respectively. \(P_{m}\) and \(Q_{n}\) are the linguistic labels used in fuzzy theory for dividing the MFs. \(\mu_{P_{m}}\) and \(\mu_{Q_{n}}\) can adopt any MF. If a bell-shaped MF (Jang 1993) is employed, \(\mu_{P_{m}}\) is given by Eq. (5),

$$\mu_{P_{m}}(x) = \frac{1}{1 + \left[ \left( \frac{x - u_{m}}{v_{m}} \right)^{2} \right]^{z_{m}}}, \quad m = 1,2$$
(5)

where \(u_{m}\), \(v_{m}\) and \(z_{m}\) are the parameters of the bell-shaped MF.

Alternatively, the Gaussian MF is represented by Eq. (6),

$$\mu_{{P_{m} }} (x) = \exp \left[ { - \left( {\frac{{x - u_{m} }}{{v_{m} }}} \right)^{2} } \right], \quad m = 1,2$$
(6)

where, \(u_{m}\) and \(v_{m}\) are the parameters of the Gaussian MF. The parameters in this layer are usually referred to as premise parameters.
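The two membership functions can be sketched in a few lines of Python. This is only an illustration, not the MATLAB implementation used in the study: the bell-shaped MF of Eq. (5) is read with \(z_{m}\) as the exponent of the bracketed term, as in the generalized bell function of Jang (1993), and the parameter values (centre 3, width 1, shape 2) are hypothetical choices made so that the function peaks in the middle of the 1–5 rating scale.

```python
import math

def bell_mf(x, u, v, z):
    """Bell-shaped MF, Eq. (5): 1 / (1 + [((x - u) / v)^2]^z)."""
    return 1.0 / (1.0 + (((x - u) / v) ** 2) ** z)

def gaussian_mf(x, u, v):
    """Gaussian MF, Eq. (6): exp(-((x - u) / v)^2)."""
    return math.exp(-(((x - u) / v) ** 2))

# Hypothetical premise parameters: centre u = 3, width v = 1, shape z = 2.
for rating in range(1, 6):
    print(rating,
          round(bell_mf(rating, 3.0, 1.0, 2.0), 4),
          round(gaussian_mf(rating, 3.0, 1.0), 4))
```

Both functions equal 1 at the centre \(u_{m}\) and decay symmetrically on either side; \(v_{m}\) controls the width, and \(z_{m}\) controls how flat the top of the bell is.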

The second layer is labeled as M and is also known as input MF. All nodes in this layer are fixed and act as simple multipliers. The outputs of this layer are the firing strengths, represented as:

$$O_{m}^{2} = w_{m} = \mu_{{P_{m} }} (x)\mu_{{Q_{m} }} (y); \quad m = 1,2$$
(7)

The third layer is labeled as N. This layer performs as a normalizer to the firing strengths obtained from the previous layer. The outputs of this layer are normalized firing strengths and given by:

$$O_{m}^{3} = w_{avg,m} = \frac{{w_{m} }}{{w_{1} + w_{2} }}; \quad m = 1,2$$
(8)

All nodes are adaptive in the fourth layer. The layer is also known as output MF. The output of each node in this layer is the product of the normalized firing strength and a first-order polynomial. The outputs of the layer are as follows:

$$O_{m}^{4} = w_{avg,m} f_{m} = w_{avg,m} (a_{m} x + b_{m} y + c_{m} ); \quad m = 1,2$$
(9)

A single fixed node, labeled Σ, performs the summation of all incoming signals in the fifth layer. Therefore, the overall output of the model in the fifth layer is represented as:

$$O^{5} = \sum\limits_{m = 1,2} w_{avg,m} f_{m} = \frac{\sum\nolimits_{m = 1,2} w_{m} f_{m}}{\sum\nolimits_{m = 1,2} w_{m}}$$
(10)
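The five layers of Eqs. (3)–(10) can be traced with a short forward-pass sketch for the two-input, two-rule illustration. It assumes Gaussian MFs (Eq. 6) in the first layer; the function name `anfis_forward` and all premise and consequent values are hypothetical and serve only to show how the layers chain together.

```python
import math

def gaussian_mf(x, u, v):
    """Gaussian MF, Eq. (6)."""
    return math.exp(-(((x - u) / v) ** 2))

def anfis_forward(x, y, premise, consequent):
    """Forward pass of a first-order Sugeno ANFIS with one rule per (P_m, Q_m) pair.

    premise:    list of ((u_P, v_P), (u_Q, v_Q)) per rule (premise parameters)
    consequent: list of (a, b, c) per rule (consequent parameters)
    """
    # Layer 1: membership grades of x in P_m and y in Q_m, Eqs. (3)-(4)
    grades = [(gaussian_mf(x, *p), gaussian_mf(y, *q)) for p, q in premise]
    # Layer 2: firing strengths w_m as products of the grades, Eq. (7)
    w = [mu_p * mu_q for mu_p, mu_q in grades]
    # Layer 3: normalized firing strengths, Eq. (8)
    total = sum(w)
    w_avg = [w_m / total for w_m in w]
    # Layer 4: normalized strength times the first-order polynomial, Eq. (9)
    weighted = [wa * (a * x + b * y + c)
                for wa, (a, b, c) in zip(w_avg, consequent)]
    # Layer 5: summation of all incoming signals, Eq. (10)
    return sum(weighted)

# Hypothetical parameters: rule 1 centred on low ratings, rule 2 on high ratings.
premise = [((2.0, 1.0), (2.0, 1.0)), ((4.0, 1.0), (4.0, 1.0))]
consequent = [(0.0, 0.0, 1.0), (0.0, 0.0, 5.0)]
print(anfis_forward(3.0, 3.0, premise, consequent))
```

With these values both rules fire equally at x = y = 3, so the output is the midpoint of the two constant consequents. The sketch does not guard against all firing strengths underflowing to zero, which a production implementation would need to handle.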

4 Survey design, data collection and response characteristics

Firstly, SQ attributes were selected to design a Likert scale questionnaire survey. A total of 18 train SQ attributes were used for building ANFIS models. A list of the selected 18 attributes is given in Table 1. The attributes were selected from the extensive literature review on train SQ analysis, train passengers’ interview and analysis of the opinion of expert transportation practitioners and policy makers. The questionnaire survey form provided to the respondents is attached in Appendix B.

Table 1 Explanation of attributes used in ANFIS models

The stated preference (SP) survey was carried out through face-to-face on-board interviews with passengers of 30 intercity express trains at Kamalapur railway station in Dhaka city. The surveys were conducted on selective key days throughout the month of July 2016 considering regular days and special occasions (Eid festival). The SP questionnaire is structured into four sections. The first section gathers personal and socioeconomic information: age, gender, occupation of passengers and the reason for traveling by train. The second section focuses on the attributes which are responsible for the assessment of intercity train SQ. These attributes are in a close-ended layout with relevant multiple choices. The respondents are asked to mark the checkboxes by assessing the present situation of the service attributes according to their preferences. The third section of the questionnaire is designed to collect benchmark points on the overall SQ of the intercity train. In both the preceding sections, a quantitative scale of 1–5 is used to rate the service attributes and overall SQ. In the given scale, 1 corresponds to ‘very poor’ and 5 indicates ‘excellent quality’. The fourth section records the respondents’ responsiveness on the significant attributes considered for rating the train SQ. Particularly, the users were asked to select at least 10 attributes upon which they have decided the SQ rating of an intercity train service in the third section.

The strategy followed in this survey is to assess the opinion of different age groups, gender, and occupation type. A total of 1037 and 553 questionnaire data samples were collected during regular days and special days, respectively. The demographic data of the respondents are shown in Table 2. Based on this table, the following statistical findings have been obtained where the values in brackets represent special days; otherwise, they represent regular days. Among the samples, 77.6% (73.7%) and 22.4% (26.3%) were male and female, respectively. In the context of age, 1.5% (21.9%), 53.7% (52.8%), 20.4% (16.6%), and 24.4% (8.7%) of the respondents fall in the category of 11–20 years, 21–30 years, 31–40 years, and more than 40 years of age, respectively. Moreover, in the context of occupation, 30.5% (28.6%), 20.3% (20.7%), 27.3% (25.9%), 17.5% (17.6%), and 4.4% (7.2%) of the respondents were service holder, businessman, student, worker and housewife, respectively. These statistics in the context of different demographic factors do not show a large variation between regular and special days, except in the age category. These unbiased compositions of demographic factors help in accurately identifying the relative effects of intercity train attributes on its perceived SQ for regular and special days.

Table 2 Demographics of passengers participating in the survey on regular days and special days

In the first section of the questionnaire, respondents were also asked to mark the most important reason behind choosing intercity train as their travelling mode. From Fig. 2, the following statistics can be obtained. Specifically, 52.3% (57.0%), 29.4% (27.8%), 14.7% (10.9%), and 3.6% (4.3%) of the respondents choose intercity train as it is safer compared to other modes, its service is comfortable, it is cheaper than other modes, and train is the only mode to reach destination (i.e. captive rider), respectively.

Fig. 2 Reasons for choosing intercity trains

According to survey results obtained from the third section of the questionnaire, 2.9% (3.7%), 30.4% (28.7%), 59.4% (63.9%), and 7.3% (3.7%) respondents rated train SQ as excellent, very good, satisfactory, poor, respectively, and no respondents indicated SQ as very poor for regular and special days.

To determine the relative importance of the attributes, all respondents were asked to select at least ten of the 18 attributes that most affect intercity train SQ; their selections were recorded in the fourth section of the questionnaire. Figure 3 shows the percentage of respondents selecting each attribute. According to 76.2% of the respondents, ‘Waiting place condition’ is one of the most important attributes for regular days. Approximately 35–65% opined that ‘Toilet cleanness’, ‘Fitness of car’, ‘Air ventilation system’, and ‘Convenience of online ticketing system’ are also important and have a significant impact on SQ for regular days. For special days, 78.9% of respondents opined that ‘Waiting place condition’ is the most important intercity train SQ attribute, while approximately 35–75% opined that ‘Travel cost’, ‘Air ventilation system’, ‘Convenience of online ticketing system’, and ‘Travel delay’ are also important and have significant impacts on SQ. These percentages are used to rank the attributes under the columns ‘Public opinions for regular days’ and ‘Public opinions for special days’ in Table 5.

Fig. 3 Importance of different SQ attributes according to users’ opinion

5 Model development and evaluation

An out-of-sample forecasting technique is applied to examine the prediction performance of ANFIS using a dataset other than the one used to train the models. The fitness and accuracy of each prediction model were tested by comparing the predicted outputs with the corresponding observed values. For regular days, the training and forecasting datasets contained 830 (80% of the whole sample) and 207 (20% of the whole sample) observations, respectively. For special days, they contained 443 (80% of the whole sample) and 110 (20% of the whole sample) observations, respectively. The ANFIS tool designed in MATLAB Release (2014) was used for the development of the models. Parameters such as learning rate, neuron number, smoothing factor, momentum factor, and transfer functions of the ANFIS models were altered by trial and error in order to achieve the best performance. Two main criteria were imposed while altering the ANFIS parameters: (1) avoiding model overfitting; and (2) improving model accuracy relative to CPU time, i.e. the time required for a computer to analyze the survey data. A Pavilion dv6 laptop with a Core i5 2.53 GHz processor and 1 GB of RAM was dedicated to this study. In both ANFIS models, the number of input variables, the number of layers, the number of MFs, the number of epochs (training cycles) and the training goal were 18, 5, 181, 10, and 0.01, respectively. The transfer function of the hidden layer, the scaling method, the transfer function of the output layer, and the training algorithm were tan-sigmoid, normalization, linear, and back propagation, respectively. This section illustrates the sensitivity of the ANFIS-based intercity train SQ models to MFs and epochs.
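The out-of-sample split can be sketched as follows. The text does not state how respondents were assigned to the two subsets, so the random shuffle, the fixed seed, and the rounding-up of the 80% share are assumptions; rounding up is chosen because it reproduces the reported subset sizes. The function name `split_80_20` is hypothetical.

```python
import random

def split_80_20(samples, seed=0):
    """Shuffle the samples and return (training, forecasting) subsets.

    Assumption: the training share is ceil(0.8 * n), computed with
    integer arithmetic to avoid floating-point rounding issues.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = -(-4 * len(shuffled) // 5)  # ceil(4n / 5)
    return shuffled[:n_train], shuffled[n_train:]

# Respondent indices stand in for the questionnaire records.
regular_train, regular_test = split_80_20(range(1037))
special_train, special_test = split_80_20(range(553))
print(len(regular_train), len(regular_test))
print(len(special_train), len(special_test))
```

Keeping the forecasting subset entirely out of training is what allows the later accuracy and RMSE figures to be read as out-of-sample performance.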

5.1 Influence of MF types and epochs

5.1.1 Effect of membership function types

MFs determine the shape of the projection of input–output mapping and the parameters of these shape functions are adjusted through the fuzzy system to capture the trend of the empirical data. In this study, five types of MFs are considered (Table 3): (1) Triangular; (2) Trapezoidal; (3) Gaussian; (4) Sigmoidal; and (5) P-sigmoidal to assess the ANFIS-based SQ estimation accuracy.

Table 3 Effect of type of MFs on the fitness of ANFIS-models for different scenarios (Bold values in this and subsequent tables are used to draw the attention on values that are most important to notice.)

The results for different MFs are summarized in Table 3 for the two scenarios. In both scenarios, Gaussian-type MFs achieved the best fitness value (RMSE). The other MFs are less able to capture the distribution of the SQ dataset. This was cross-checked against the survey data, which showed that the distributions of all the SQ attributes follow the Gaussian distribution. Moreover, the CPU time required for ANFIS training with the same size of SQ data does not differ considerably across MFs.

Root-mean-square error (RMSE) is expressed as:

$$RMSE = \sqrt{\frac{\sum\nolimits_{i = 1}^{N} (SQ_{obs,i} - SQ_{model,i})^{2}}{N}}$$
(11)

where \(SQ_{obs,i}\) is the observed SQ value and \(SQ_{model,i}\) is the modelled SQ value for the ith data point, and \(N\) is the total number of SQ data used in the model evaluation.
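Eq. (11) translates directly into code. The sketch below is a plain-Python rendering of the formula; the four observed and modelled ratings are made-up values used only to exercise the function.

```python
import math

def rmse(observed, modelled):
    """Root-mean-square error between observed and modelled SQ values, Eq. (11)."""
    if len(observed) != len(modelled) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - m) ** 2 for o, m in zip(observed, modelled))
    return math.sqrt(squared / len(observed))

# Hypothetical observed ratings (1-5 scale) and modelled values.
observed = [3, 4, 2, 5]
modelled = [3.0, 3.5, 2.5, 5.0]
print(rmse(observed, modelled))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.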

5.1.2 Effect of number of epochs

The epoch number controls the number of times the combined execution of back propagation and least squares estimation (Back-propagation learning) occurs for the training dataset. The training process terminates once the designated epoch number or the training goal is reached. However, the number of epochs should be chosen such that overfitting does not occur. In this research, ANFIS training experiments were conducted with varying epoch numbers (i.e. 5, 10, 20, and 50) using a Gaussian MF, as shown in Table 4. The table shows that increasing the number of epochs (training cycles) produces no notable variation in model accuracy, and beyond 20 epochs the improvement in model accuracy vanishes for both scenarios. In contrast, a significant change in computation time can be observed. For regular days, increasing the epochs from 10 to 20 increases CPU time by 9.7% while improving model accuracy by only 0.11%. Likewise, for special days, CPU time increases by 10.1% while model accuracy improves by only 0.14%. Consequently, a fixed epoch number of 10 was chosen for the analysis throughout this research.
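The termination rule described above (stop at the designated epoch number or when the training goal is reached, whichever comes first) can be sketched as a simple loop. The callback `run_epoch` is a hypothetical stand-in for one full training pass that returns the current training error; the error sequence below is invented for illustration and is not taken from Table 4.

```python
def train(run_epoch, max_epochs=10, goal=0.01):
    """Run training passes until the epoch limit or the error goal is reached.

    run_epoch: hypothetical callable performing one training pass and
               returning the training error after that pass.
    Returns the list of errors, one per completed epoch.
    """
    history = []
    for _ in range(max_epochs):
        error = run_epoch()
        history.append(error)
        if error <= goal:  # training goal reached, stop early
            break
    return history

# Invented error sequence standing in for successive training passes.
errors = iter([0.5, 0.2, 0.05, 0.008, 0.001])
history = train(errors.__next__, max_epochs=10, goal=0.01)
print(len(history), history[-1])
```

In this toy run the goal of 0.01 is met on the fourth pass, so training stops before the epoch limit; with a smaller limit the loop would stop on the limit instead.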

Table 4 Effect of epochs on fitness of ANFIS-models for different scenarios

5.2 Model evaluation

The trained networks were then evaluated on their performance in predicting intercity train SQ for regular and special days. Specifically, a confusion matrix and RMSE were used as the criteria for performance evaluation of the ANFIS models. The confusion matrix shows the one-to-one matching between output classes (1–5) and target classes (1–5). The diagonal light grey boxes in Fig. 4 show the counts and percentages of cases in which the output class is identical to the target class. The red boxes show the counts of misclassifications. The bottom-right box shows the total correct classifications (green) and misclassifications (red) in percentages. ANFIS has 83.1% accuracy in training and 54.1% accuracy in forecasting SQ for regular days, whereas those values are 93.3% and 60.2%, respectively, for special days. This signifies that during ANFIS training, 689 out of 830 and 413 out of 443 SQ ratings are matched exactly for regular days and special days, respectively. Similarly, during ANFIS prediction, 112 out of 207 and 66 out of 110 SQ ratings are matched exactly.

Fig. 4 Confusion matrices of: (a) trained model for regular days; (b) tested model for regular days; (c) trained model for special days; and (d) tested model for special days
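
As a minimal sketch of how an overall accuracy can be read from a confusion matrix such as those in Fig. 4, the snippet below builds a 5×5 matrix from target and output classes and divides the diagonal sum by the total count. The helper names and the small sample of ratings are hypothetical and are not taken from the survey data. It is also assumed here that the model outputs have already been converted to integer classes 1–5, and that rows hold output classes while columns hold target classes.

```python
# Illustrative sketch: overall accuracy from a confusion matrix.
# NOTE: the ratings below are hypothetical, not the survey data of this study.

def build_confusion(targets, outputs, n_classes=5):
    """Count (output, target) pairs; rows = output class, columns = target class."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, o in zip(targets, outputs):
        matrix[o - 1][t - 1] += 1   # classes are 1-based, list indices are 0-based
    return matrix


def accuracy_from_confusion(matrix):
    """Overall accuracy = sum of diagonal counts / sum of all counts."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


targets = [1, 2, 3, 4, 5, 3, 4, 2]   # hypothetical observed SQ classes
outputs = [1, 2, 3, 4, 5, 4, 4, 3]   # hypothetical predicted SQ classes

confusion = build_confusion(targets, outputs)
print(accuracy_from_confusion(confusion))
```

Six of the eight hypothetical ratings fall on the diagonal, so the sketch reports an accuracy of 0.75; the off-diagonal cells correspond to the misclassification boxes of the figure.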

The RMSE values between the predicted and actual SQ of the forecasting sample provide useful information regarding model fitness. The RMSE values of the ANFIS predictions were 0.0857 and 0.0671 for regular days and special days, respectively. These low RMSE values indicate that the proposed ANFIS models predict intercity train SQ accurately under both scenarios.
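
For reference, the RMSE between actual and predicted values is the square root of the mean squared difference. The sketch below is a generic implementation with hypothetical ratings; it does not reproduce the scaling or the data behind the RMSE values reported above.

```python
import math

# Illustrative sketch: root mean square error between actual and predicted SQ.
# NOTE: the example values are hypothetical, not the forecasting sample of this study.

def rmse(actual, predicted):
    """Return sqrt(mean((actual - predicted)^2)) for two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


actual_sq = [3, 4, 5, 2]       # hypothetical observed ratings
predicted_sq = [3, 4, 4, 2]    # hypothetical model outputs

print(rmse(actual_sq, predicted_sq))
```

In this hypothetical case one of four predictions is off by one unit, giving an RMSE of 0.5.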

5.3 Attributes ranking

In this study, 18 attributes are used to calibrate the ANFIS models and estimate the intercity train SQ. Since the relationship between the input attributes and the overall SQ assessment is indistinct, an analytical method is needed to rank these SQ attributes. Cross-correlation, principal component analysis (PCA), the stepwise approach, and connection weights are among the analytical techniques used for attribute ranking. In this study, the stepwise approach is followed to rank the train SQ attributes. This approach is a simple, standardized and popular method for evaluating and validating ANN-type models. In this approach, each attribute is considered separately, one at a time, to develop an ANFIS network. For each case, the isolated network is trained with the training sample, and the forecasting sample is used to estimate the SQ. The performance of each model is then evaluated by measuring the difference between actual and predicted SQ using the RMSE; a lower RMSE indicates a more significant attribute.
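
The stepwise procedure described above can be sketched as a loop over attributes. The snippet below is an illustration under stated assumptions: a one-variable least-squares line is used as a stand-in for the single-input ANFIS network (it is not the model used in this study), and the attribute names, the function names and the data are hypothetical.

```python
import math

# Illustrative sketch of stepwise attribute ranking.
# ASSUMPTION: a simple least-squares line stands in for the single-input ANFIS network.
# NOTE: attribute names and ratings are hypothetical, not the survey data.

def fit_line(x, y):
    """Fit y = a + b*x by ordinary least squares and return a predictor function."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0
    a = mean_y - b * mean_x
    return lambda value: a + b * value


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def rank_attributes(train, test, target="SQ"):
    """Train one model per attribute, score it on the test data, sort by RMSE."""
    scores = {}
    for name in train:
        if name == target:
            continue
        model = fit_line(train[name], train[target])     # train on one attribute only
        predictions = [model(v) for v in test[name]]     # forecast the held-out sample
        scores[name] = rmse(test[target], predictions)
    return sorted(scores.items(), key=lambda item: item[1])  # lowest RMSE first


train = {"waiting_place": [1, 2, 3, 4, 5], "meal_service": [3, 1, 4, 2, 5], "SQ": [1, 2, 3, 4, 5]}
test = {"waiting_place": [2, 4], "meal_service": [5, 1], "SQ": [2, 4]}

ranking = rank_attributes(train, test)
for position, (name, error) in enumerate(ranking, start=1):
    print(position, name, round(error, 4))
```

In this hypothetical data the first attribute predicts the overall rating far better than the second, so it is ranked first. Replacing `fit_line` with a routine that trains a single-input ANFIS would leave the ranking loop unchanged.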

The order of relative significance of the SQ attributes found in the models is essential for assessing the SQ of trains. Moreover, the variations in the relative significance of those attributes under different scenarios need to be identified, since they indicate the perceptions, needs and demands of intercity train passengers under each scenario. Table 5 shows the ranking of the attributes based on their effect on the SQ prediction. From the table, it can be seen that waiting place condition is the most significant factor for both regular days and special days. This is logical since train schedules at the survey location are not maintained properly. During the field survey, a discussion with the rail administration revealed that the permanent ways, branch lines, bridges, signals and other ancillary facilities of the railway have lacked proper maintenance for several decades. Aging rolling stock, such as engines and coaches, together with a lack of timely maintenance, decreases the availability of rolling stock on line on any given day. As a consequence, cancellation or rescheduling of train services is quite usual, and passengers need to wait for hours to board a train. Waiting at a station for such a long duration can strongly influence a passenger’s perceived SQ. However, the RMSE value for special days is smaller than that for regular days, as can be seen in Table 5. Specifically, the weight of this attribute is 42% higher for special days. A previous study by Geetika (2010) also emphasized platform conditions as an important determinant of Indian train SQ. Besides, toilet cleanliness, fitness of the car and the air ventilation system are also important attributes for regular days. In contrast, travel cost, the air ventilation system and convenience of the online ticketing system are found to be the important factors for special days. This indicates that attributes related to the physical conditions of a train are important determinants of perceived SQ for regular days, whereas attributes related to service features are important determinants for special days.

Table 5 Attributes ranking based on the ANFIS model and public opinion under different scenarios

The second most significant attribute for special days is found to be travel cost. This is because a large number of city dwellers leave the city to enjoy the ‘Eid Festival’ with their relatives, which creates a temporary gap between the demand for and supply of public transport. Due to the limited number of train tickets, their price is often hiked unofficially, and people are frequently compelled to buy tickets at a higher price because of ticket black-marketing, even at the railway counter. In contrast, since the ticket price remains the same, travel cost is ranked as less significant during regular days. The air ventilation system is ranked as the third and fourth most significant attribute for special days and regular days, respectively. During special days, train cars become overcrowded and a large number of standing tickets are sold officially. This causes the same ventilation system to be perceived differently by passengers. Hence, people feel that the air ventilation system is one of the important factors for train SQ, and this attribute moves up one rank on special days compared to regular days. The convenience of the online ticketing system is one of the important attributes for both regular days and special days. However, the RMSE related to this attribute is much lower for special days; thus, it is ranked higher for special days than for regular days. People wait in long queues to purchase tickets from railway counters during special occasions. Many of them want to avoid this hassle and emphasize the convenience of an online ticketing system. Besides, the ANFIS models for both regular days and special days show that car cleanliness, on-time performance and meal services are less important, as indicated by their correspondingly high RMSE values. This finding differs from the earlier research conducted by Nathanail (2008) and Chou et al. (2014), which is logical given the differences in socioeconomic structures, road network conditions, and public transport availability between those study locations and developing economies.

As mentioned earlier, in the fourth section of the questionnaire, the respondents were asked to choose the ten most important of the 18 attributes. The attributes were then ranked in descending order of the percentage of respondents who chose them, for both regular days and special days. The analysis in Table 5 shows that the ANFIS model outputs and public opinions match directly for the top six attributes of regular days and the top four attributes of special days. From the 7th attribute onward for regular days and the 5th attribute onward for special days, the orders do not match exactly; however, the differences between the rankings obtained from public opinions and from ANFIS are not significant. These findings match the real-world scenario closely, which indicates the accuracy of the proposed ANFIS models. Note that the public opinion data (obtained from the fourth section of the questionnaire) are independent of the data used for developing the ANFIS models and finding the attribute rankings (obtained from the second section of the questionnaire).
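
The comparison described above, counting how many leading positions of the two rankings agree, can be expressed in a few lines. This is a minimal sketch: the function name `leading_matches` and both example lists are hypothetical illustrations and do not reproduce the contents of Table 5.

```python
# Illustrative sketch: count how many top positions of two rankings agree exactly.
# NOTE: both lists are hypothetical examples, not the rankings of Table 5.

def leading_matches(rank_a, rank_b):
    """Return the number of consecutive positions, from the top, that are identical."""
    count = 0
    for a, b in zip(rank_a, rank_b):
        if a != b:
            break
        count += 1
    return count


model_rank = ["waiting place", "travel cost", "air ventilation",
              "online ticketing", "security", "seat comfort"]
opinion_rank = ["waiting place", "travel cost", "air ventilation",
                "online ticketing", "seat comfort", "security"]

print(leading_matches(model_rank, opinion_rank))
```

Here the first four positions agree and the remaining two are the same attributes in a different order, which mirrors the kind of partial agreement described in the text.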

6 Managerial implications

Commuter preferences are hardly taken into consideration by decision makers in most developing countries when introducing new policies or adding infrastructure to existing transportation systems. There are ways, however, for decision makers to get closer to popular views. The proposed ANFIS model in this study has shown its potential in capturing users’ preferences. It is robust in SQ estimation, which can help transport planners improve the SQ of intercity trains through proper planning and policymaking. A prior study on SQ estimation for city buses in the context of Bangladesh by Islam et al. (2016b) found 84% accuracy for ANFIS, which was better than that of other models such as the Probabilistic Neural Network (PNN). However, most users of city buses are regular passengers who travel within the city, and they are more homogeneous than the long-distance travelers of the intercity trains considered in the current study. In fact, the passengers of intercity trains are more diverse, heterogeneous and irregular, with wide variation in income levels, age groups and travel purposes. In spite of such heterogeneity, the ANFIS model provides 54.1% (regular days) and 60.2% (special days) accuracy in estimating the SQ of intercity trains, which is notable given this heterogeneity. Thus, the researchers consider that the proposed ANFIS model is able to capture the opinions of a highly heterogeneous user population. Particularly, this study analyzes the users’ perceptions of the existing intercity train service and compares the service and preferences based on their responses. The concerned authority can plan its SQ improvement policies for intercity trains and set priorities according to the attribute ranks provided by the ANFIS models.

SQ experiments help identify the actual determinants of user satisfaction with intercity trains for regular days and special days. In this respect, this paper suggests certain policy implications for intercity trains. The output of such research will aid transit planners and managers in defining the most suitable strategies to satisfy existing passengers and attract new ones. For instance, the findings on waiting place condition call for adequate and better seating facilities, good and affordable refreshments, real-time information on train schedules, improved security, adjacent toilets and other amenities, such as television or music, newspapers and magazines. Similarly, based on the finding on air ventilation systems, the railway administration can introduce special train services, add extra coaches or change train frequency to manage the temporary high demand during special days. However, the limitations of this study should be addressed in future work; they include a lack of information on the income of the respondents, and a lack of adequate information on infrastructure parameters such as track quality, signaling system and safety features, as well as on the correlation among these variables.

7 Conclusions and future works

Considering the complexity of the human decision-making process, ANN can be a suitable tool to model the SQ of intercity trains. ANFIS, an advanced and popular technique of this genre, was implemented to predict train SQ based on 18 attributes. This research focused on two main objectives: (1) the calibration of the FIS using different MFs during the ANFIS training cycle; and (2) the evaluation of train SQ attributes according to their importance for different days. Results showed that the ANFIS models have 83.1% and 93.3% accuracy in training on the SQ dataset for regular days and special days, respectively. In contrast, the ANFIS models showed 54.1% and 60.2% accuracy in forecasting SQ for those days, respectively. This demonstrates that the calibrated models, which embed an NN learning algorithm in an FIS, are fast and accurate in predicting heterogeneous SQ data for special days. This study revealed that Gaussian MFs used in ANFIS predict the intercity train SQ better than other MFs. With this MF, the predicted SQ showed reasonable agreement with the observed SQ. This was also cross-checked against the survey data, which showed that the distribution of all the SQ attributes follows the Gaussian distribution. Investigating different numbers of training cycles (epochs), it was found that the prediction accuracy increases with the number of epochs up to 20. However, training an ANFIS model with a higher number of epochs is time-consuming, and the improvement in SQ estimation accuracy at higher epochs is insignificant.

Moreover, a stepwise approach was followed to rank the 18 SQ attributes incorporated within the ANFIS models. Considering the RMSE values, it was revealed that passengers are highly concerned about the waiting place condition for both regular days and special days. Besides, travel cost, the air ventilation system, and convenience of the online ticketing system are influential factors in passengers’ decision-making process for special days. In contrast, they give priority to toilet cleanliness, fitness of the car and the air ventilation system during regular days. This indicates that attributes related to the physical conditions of trains are important determinants of perceived SQ for regular days, whereas attributes related to service features are important determinants for special days. The questionnaire survey data collected during special days are much more heterogeneous than those collected during regular days. Hence, the significance and rank of the attributes differ between these two scenarios.

This study provides a clear understanding of how the overall SQ of intercity trains can be assessed and improved. It relates users’ demands to the overall train SQ. Although the sample size used in this study is adequate, further research can be performed with a larger dataset to better represent the approximately eight thousand daily intercity train users of the survey location. Moreover, investigation of different user groups, such as students, women, senior citizens, and low-income people, could lead to other significant findings which will provide valuable information on SQ. Particularly, due to the heterogeneity among users, SQ attributes are not equally valued by them, meaning that an improvement made to any of these attributes would not have the same level of acceptance among individual users (Bordagaray et al. 2014). We also plan to apply ANFIS to find the attributes that influence intercity train SQ using a combined dataset of both users and non-users. The identified important attributes will be compared with those of the current study. This comparison will help to attract new users. Besides, the prediction capabilities of ANFIS and the calibration technique used in this research will help transportation officials across the world apply this tool to prioritize the SQ of other public transportation systems, e.g., high-speed trains and intercity buses.