1 Introduction

The Rasch model is an appropriate tool for measuring latent variables and has been increasingly used in scale development studies across various branches of the social sciences (DeVellis 2012). For instance, Bechtel (1985) applied it to a consumer rating scale, Albano (2009) to the measurement of individual happiness, and Salzberger and Koller (2013) to a marketing study. To evaluate the quality of measurement items, the Rasch model offers several analyses, among them item polarity analysis, fit statistics, and local item dependency.

Latent variables can be measured under two common theories: classical test theory and item response theory. The Rasch measurement model belongs to the family of item response theory, which explains the measurement of latent variables (DeVellis 2012). It provides methods for linking observed elements to latent constructs and for analyzing them to obtain meaningful results.

The Rasch model was applied to the Islamic Quality Management Scale (IQMS), which was developed systematically through a design and development approach. The IQMS is intended to serve as a standard measurement scale for assessing the application of Islamic values in organizations. A review of the quality management literature found no standard measurement scale for assessing the application of values in organizations. Moreover, there are no empirical data on the influence of Islamic values on quality management practice, as the existing scales scarcely refer to Islamic values (Ishak 2016). Therefore, a list of psychometric properties was developed and validated via the Rasch model. This article reports the validation results based on three measures of item quality, their interpretation, and the decisions the researcher took in response. It also evaluates the effectiveness of those decisions.

2 The Methodology

Rasch is a probabilistic model that uses the logit as its unit of measurement (Bond and Fox 2007). The data can therefore be mapped onto a linear scale, which facilitates reporting. Factor analysis requires interval data, yet existing studies have mainly applied it to ordinal data (Simblett and Bateman 2011; Hobart and Cano 2009; Muller and Roddy 2009). Rasch offers a solution because it transforms ordinal responses into an interval scale using probabilistic functions (Linacre 2002; Bond and Fox 2007; Azrilah et al. 2013). Scholars including Bond and Fox (2007), Tennant and Conaghan (2007), and Azrilah et al. (2013) agree that the Rasch model provides sufficient parameters for good measurement, with advantages that include providing a linear scale, transforming ordinal data into interval data, suggesting values for missing data, and assessing item quality. It is therefore well suited to validating a newly developed measurement scale.
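For orientation, the simplest (dichotomous) form of the Rasch model can be written as follows. The symbols are introduced here for illustration only and do not appear in the original text: \theta_n denotes the measure of respondent n and \delta_i the difficulty of item i, both in logits. Rating-scale data such as questionnaire responses are usually analyzed with a polytomous extension of this form.

```latex
P(X_{ni} = 1 \mid \theta_n, \delta_i)
  = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)},
\qquad
\ln\!\left(\frac{P(X_{ni} = 1)}{1 - P(X_{ni} = 1)}\right) = \theta_n - \delta_i
```

The second expression shows why the logit scale is linear: the log-odds of endorsing an item equal the simple difference between the person measure and the item difficulty.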

The Islamic Quality Management Scale (IQMS), tested in this article, is an instrument for measuring the application of Islamic values in the quality management context. Based on an extensive review of previous research (Ishak and Osman 2016; Ishak and Osman 2019), this article reports on the validity of 60 items under 8 dimensions. The IQMS is proposed as a new measurement scale because the existing tools refer only marginally to Islamic substance and are confined to the physical or "hard" aspects of quality management. Moreover, the existing empirical studies have analyzed values narrowly, based on the general frameworks of Hofstede's cultural dimensions (Baird et al. 2011), the Competing Values Framework (CVF) of Quinn and Rohrbaugh (Prajogo and McDermott 2011; Prajogo and McDermott 2005; Gambi et al. 2013), the organizational culture profile (OCP) of O'Reilly (Denison and Spritzer 1991), and Detert's framework (Detert et al. 2000; Detert et al. 2003).

However, around the advent of the twenty-first century, there has been a consistent inclination among contemporary scholars to elaborate quality management from Islamic perspectives, as seen in the writings of Khaliq (1996), Abulhasan and Khaliq (1996), Al-Buraey (2005), Sany et al. (2011), Siti and Azmi (2011), and Ishak and Osman (2016). The major similarity among these works is their conceptual elaboration of a list of values embedded in the practice of quality management. Few empirical data on the matter have been reported, and no appropriate instrument could be located; this is the gap that the IQMS is intended to fill.

This study used the organization as the unit of analysis. In tandem with the use of the Rasch model as the method of analysis, the adequate sample size was determined following the recommendation of Linacre (1994). Table 1 gives the minimum and the best sample sizes.

Table 1 Item point-measure correlation for pilot test

For each organization, the targeted respondents were individuals who understand and have knowledge of quality management implementation in their organizations. To reach well-targeted respondents, this study administered the IQMS to participants of an ISO 9001 training session organized by a national-level certification body. The session was attended by individuals directly involved in quality management, such as management representatives, document controllers, and quality department staff, who represented their organizations. Of the 200 questionnaires distributed, 59 were returned, a response rate of 29.5%. This number of respondents is considered sufficient for Rasch model analysis at the 99% confidence level (Linacre 1994).

3 Analyses and Discussion

3.1 Item Polarity Analysis

In the Rasch model, information on response validity is provided by the analysis of item polarity, which is assessed from the value of the item point-measure correlation. Table 1 presents the point-measure correlation values for 61 items. Positive values show that the responses were valid. Any negative value is undesirable because it signals that the item is measuring in the wrong direction. It is thus a sign of distortion and requires further scrutiny to decide whether to remove or retain the item. Based on Table 1, all responses were valid, with a maximum point-measure correlation of 0.74 and a minimum of 0.33.
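The polarity check can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular Rasch software: it treats the point-measure correlation as the Pearson correlation between the responses to one item and the respondents' measures in logits, and all data and function names below are hypothetical.

```python
from math import sqrt


def point_measure_correlation(responses, measures):
    """Pearson correlation between one item's responses and person measures."""
    n = len(responses)
    if n != len(measures) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 entries")
    mean_r = sum(responses) / n
    mean_m = sum(measures) / n
    cov = sum((r - mean_r) * (m - mean_m) for r, m in zip(responses, measures))
    var_r = sum((r - mean_r) ** 2 for r in responses)
    var_m = sum((m - mean_m) ** 2 for m in measures)
    if var_r == 0 or var_m == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_r * var_m)


def has_valid_polarity(responses, measures):
    """An item points in the intended direction when the correlation is positive."""
    return point_measure_correlation(responses, measures) > 0


# Hypothetical data: five respondents' measures (logits) and two items.
person_measures = [-1.2, -0.4, 0.1, 0.9, 1.8]
item_a = [1, 2, 3, 4, 5]  # ratings rise with the person measure
item_b = [5, 4, 4, 2, 1]  # ratings fall with the person measure

print(has_valid_polarity(item_a, person_measures))
print(has_valid_polarity(item_b, person_measures))
```

An item like the hypothetical item_b, whose ratings fall as the person measure rises, would be the kind flagged for scrutiny.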

3.2 Item Fit Analysis

An item is considered a misfit if it simultaneously falls outside the acceptable ranges of mean square or MNSQ (0.5 < x < 1.5), z-standard (−2.0 < x < 2.0), and point-measure correlation or PMC (0.4 < x < 0.8). Based on the analysis, only two items (Com05 and Com06) did not fit the model. This shows that these items do not measure the same trait as, or are not unidimensional with, the other items (Bond and Fox 2007). In such a case, Rasch suggests removing the items. Table 2 reports on these items. Their MNSQ and z-standard values are far beyond the acceptable ranges, while their PMC values are slightly lower than the desired value.
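The simultaneous-violation rule above can be expressed as a short Python check. The ranges are those stated in the text; the item names and statistics are hypothetical and are not taken from Table 2.

```python
# Acceptable (open) ranges as stated in the text.
MNSQ_RANGE = (0.5, 1.5)
ZSTD_RANGE = (-2.0, 2.0)
PMC_RANGE = (0.4, 0.8)


def outside(value, bounds):
    """True when value is not strictly inside the open interval bounds."""
    low, high = bounds
    return not (low < value < high)


def is_misfit(mnsq, zstd, pmc):
    """Misfit only if all three statistics are outside their ranges at once."""
    return (
        outside(mnsq, MNSQ_RANGE)
        and outside(zstd, ZSTD_RANGE)
        and outside(pmc, PMC_RANGE)
    )


# Hypothetical item statistics: (MNSQ, z-standard, PMC).
items = {
    "ItemA": (1.02, 0.1, 0.55),  # inside every range
    "ItemB": (1.90, 3.4, 0.35),  # outside every range
    "ItemC": (1.60, 1.2, 0.50),  # outside the MNSQ range only
}

misfits = [name for name, stats in items.items() if is_misfit(*stats)]
print(misfits)
```

Because the rule requires all three criteria to fail together, the hypothetical ItemC, which violates only the mean square range, is not flagged.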

Table 2 Summary of misfit items for pilot test

According to Bond and Fox (2007), Rasch provides several rationales for problematic items and prompts the researcher to scrutinize these items prior to removal. As postulated by Azrilah et al. (2013), the effectiveness of item removal can be detected from five indicators: item reliability, item separation, person separation, item infit MNSQ SD, and person infit MNSQ SD. An effective item removal should be reflected in increased item reliability, item separation, and person separation, but decreased item and person infit MNSQ SD. In the current study, these indicators are presented in Table 3. The removal of Com05 and Com06 was found effective because it satisfies the effectiveness indicators: it produced a slight increase in item and person reliability and a slight decrease in item infit MNSQ SD. In Rasch, the fit statistics indicate how well the data fit the model.
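The before-and-after comparison of the five indicators can be sketched as follows. The indicator names follow the text, but the dictionary keys, the function names, and all numbers are hypothetical and are not the values of Table 3.

```python
# Indicators that should rise and fall after an effective removal.
SHOULD_INCREASE = ("item_reliability", "item_separation", "person_separation")
SHOULD_DECREASE = ("item_infit_mnsq_sd", "person_infit_mnsq_sd")


def removal_report(before, after):
    """Map each indicator to True when it moved in the desired direction."""
    report = {}
    for key in SHOULD_INCREASE:
        report[key] = after[key] > before[key]
    for key in SHOULD_DECREASE:
        report[key] = after[key] < before[key]
    return report


def removal_is_effective(before, after):
    """Strict reading: every indicator must improve."""
    return all(removal_report(before, after).values())


# Hypothetical summary statistics before and after removing items.
before = {
    "item_reliability": 0.80,
    "item_separation": 2.00,
    "person_separation": 3.10,
    "item_infit_mnsq_sd": 0.40,
    "person_infit_mnsq_sd": 0.60,
}
after = {
    "item_reliability": 0.83,
    "item_separation": 2.21,
    "person_separation": 3.25,
    "item_infit_mnsq_sd": 0.31,
    "person_infit_mnsq_sd": 0.55,
}

print(removal_report(before, after))
print(removal_is_effective(before, after))
```

The strict inequalities are a design choice of this sketch. If an unchanged indicator is acceptable, the comparisons can be relaxed to `>=` and `<=`.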

Table 3 Analysis on the effectiveness of misfit items’ removal

3.3 Local Item Dependency Analysis

Local item dependency can be detected from the value of the largest residual correlation, with a cut-off value of 0.7 (Yen 1993; Linacre 2011). Above that value, the paired items may duplicate features of each other or may be redundant. It also indicates that respondents may perceive the items as similar, so the items should be considered for combining, rephrasing, or removal to improve respondents' understanding.

Based on the results, six pairs of items correlate above 0.7, as reported in Table 4. For each pair, one item should be retained and the other removed. The retained item should be selected based on its mean square and z-standard values: items whose mean square and z-standard values approach the ideal values of 1 and 0, respectively, should be retained (Azrilah et al. 2013; Bond and Fox 2007). Therefore, items Con06 and DIL12 were retained, whereas Con07, Con08, Con09, and DIL11 were removed. Items Con06 and Con08 each showed correlations above 0.7 three times, and the results consistently pointed to retaining Con06 and removing Con08. As for DIL02 and Con06, the correlation is 0.68, very close to the cut-off value of 0.7. This was taken as a sign that more than 50% of the respondents answered both questions similarly. However, both items were retained because they measure different dimensions. Table 4 summarizes the results.
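The pairwise decision described above can be sketched in Python. The text does not say how closeness to the ideal mean square of 1 and z-standard of 0 is combined into one comparison; this sketch assumes the sum of the two absolute deviations, which is purely an illustrative choice. The item names and numbers are hypothetical.

```python
CUTOFF = 0.7  # cut-off for the residual correlation, as stated in the text


def fit_distance(mnsq, zstd):
    """Assumed closeness score: distance from the ideal MNSQ = 1, ZSTD = 0."""
    return abs(mnsq - 1.0) + abs(zstd)


def resolve_pair(pair, correlation, fit_stats, cutoff=CUTOFF):
    """Return (retain, remove) for a dependent pair, or None below the cut-off."""
    if correlation <= cutoff:
        return None
    ordered = sorted(pair, key=lambda name: fit_distance(*fit_stats[name]))
    return ordered[0], ordered[1]


# Hypothetical fit statistics: item -> (MNSQ, z-standard).
fit_stats = {
    "ItemX": (1.05, 0.3),
    "ItemY": (1.40, 2.1),
}

print(resolve_pair(("ItemX", "ItemY"), 0.82, fit_stats))
print(resolve_pair(("ItemX", "ItemY"), 0.68, fit_stats))
```

A pair below the cut-off returns None, so both items stay in the scale. Whether a pair spanning different dimensions should be exempt from removal remains a judgment for the researcher and is not encoded here.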

Table 4 Summary of items with largest residual correlation in pilot test

Another indicator of local item dependency is the logit value. Items with similar logit values are measuring a similar level of ability. In the Rasch model, each item should ideally measure a different level of ability, so that the scale does not become unnecessarily long (Azrilah et al. 2013). However, if the items originate from different dimensions, they should be retained. If, on the other hand, the items originate from the same dimension, the selection should be based on the mean square and z-standard values: the item closest to the expected values should be retained (Linacre 2002). Table 5 shows the items with similar logit values and the decisions taken.

Table 5 Summary of items with similar logit values in pilot test

Based on Table 5, two pairs of items had logit values of 0.0: BoT02 and BoT05, and Con07 and Con08. BoT05 was suggested for removal because its mean square and z-standard values were far from their ideal values of 1 and 0, respectively. On the same basis, Con07 was suggested for retention over Con08. However, Con07 had previously been suggested for removal because of its high residual correlation, above the cut-off value of 0.7 (Table 4). Thus, both Con07 and Con08 were suggested for removal. Items DIL04 and DIL06 shared a logit value of −0.10; DIL04 was retained because its mean square and z-standard values were closer to the ideal values of 1 and 0. Three items, DIL03, DIL07, and DIL12, shared a logit value of −0.7. Based on the comparison of mean squares and z-standards, DIL03 and DIL07 were removed and only DIL12 was retained. Finally, Coo03 was suggested for retention over Coo01, as both shared a logit value of −0.17.
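The grouping step behind these decisions can be sketched as follows: items are grouped by dimension and logit value, and within each group only the best-fitting item is kept, so items from different dimensions are never compared. As before, combining the two fit statistics by summing absolute deviations from 1 and 0 is an assumption of this sketch, and the items and values are hypothetical.

```python
from collections import defaultdict


def select_by_logit(items):
    """items maps name -> (dimension, logit, MNSQ, z-standard).

    Returns (retained, removed) as sorted lists of item names.
    """
    groups = defaultdict(list)
    for name, (dimension, logit, _mnsq, _zstd) in items.items():
        # Items compete only within the same dimension and logit value.
        groups[(dimension, round(logit, 2))].append(name)

    def distance(name):
        _dimension, _logit, mnsq, zstd = items[name]
        return abs(mnsq - 1.0) + abs(zstd)  # assumed closeness score

    retained, removed = [], []
    for names in groups.values():
        best = min(names, key=distance)
        for name in names:
            (retained if name == best else removed).append(name)
    return sorted(retained), sorted(removed)


# Hypothetical items: name -> (dimension, logit, MNSQ, z-standard).
items = {
    "A1": ("DimA", 0.00, 1.03, 0.2),
    "A2": ("DimA", 0.00, 1.45, 1.9),
    "B1": ("DimB", 0.00, 0.98, -0.1),
    "A3": ("DimA", -0.70, 0.90, -0.5),
}

retained, removed = select_by_logit(items)
print(retained)
print(removed)
```

The sketch does not cross-check against other analyses, so a case like Con07, which was retained on fit but removed because of residual correlation, would still need to be handled separately.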

Rasch prompts the researcher to scrutinize problematic items prior to removal (Bond and Fox 2007). As postulated by Azrilah et al. (2013), the effectiveness of item removal can be detected from five indicators: item reliability, item separation, person separation, item infit MNSQ SD, and person infit MNSQ SD. An effective item removal should be reflected in increased item reliability and person separation, but decreased item and person infit MNSQ SD. In the current study, because several indicators remained constant, Table 6 reports only the item separation and item reliability. Based on the table, the removal of every item increased item reliability, which reflects the effectiveness of the removal.

Table 6 Analysis on the effectiveness of removing dependent items

4 Results and Discussion

Based on the results, there are 11 problematic items. Two items, Com05 and Com06, were considered misfits because they fall outside the acceptable ranges of point-measure correlation, mean square, and z-standard. Four further items, Con07, Con08, Con09, and DIL11, were suggested for removal because their high residual correlations indicate item redundancy. In addition, seven items were suggested for removal because of similar logit values, which indicate that the items measure a similar level of ability and lengthen the scale unnecessarily. In the Rasch model, each item should ideally measure a different level of ability (Azrilah et al. 2013). Two items, however, were suggested for removal by both the correlation analysis and the logit analysis, so a total of 11 items were suggested for removal. Rasch is also capable of indicating the effectiveness of removal. As postulated by Azrilah et al. (2013), the effectiveness of removal is reflected in the increase in item separation and item reliability, as shown in Table 6.
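The total of 11 can be checked with simple set arithmetic in Python, using the item labels given in Sections 3.2 and 3.3. Two labels are inferred rather than stated: DIL06 and Coo01 are taken to be the removed members of their pairs because the text says that DIL04 and Coo03 were retained.

```python
# Items suggested for removal, by analysis (labels from Sections 3.2 and 3.3).
misfit = {"Com05", "Com06"}
high_residual_correlation = {"Con07", "Con08", "Con09", "DIL11"}
# DIL06 and Coo01 are inferred: their partners DIL04 and Coo03 were retained.
similar_logit = {"BoT05", "Con07", "Con08", "DIL03", "DIL06", "DIL07", "Coo01"}

# Items flagged by both local-dependency checks are counted only once.
flagged_twice = high_residual_correlation & similar_logit
to_remove = misfit | high_residual_correlation | similar_logit

print(sorted(flagged_twice))  # ['Con07', 'Con08']
print(len(to_remove))         # 11
```

The count follows from 2 + 4 + 7 − 2 = 11, where the subtraction accounts for the two items flagged by both checks.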

Based on this study, the IQMS is shown to be a reliable measurement scale for assessing the application of Islamic values in the context of quality management. Organizations can use the scale to diagnose their level of value application, which can contribute to empirical evidence of the standard's effectiveness. Furthermore, policy makers can use the results to design or formulate appropriate reforms that nurture the inculcation of values. The management of organizations, in turn, can improve the application of values or plan toward it, as good values have been described as a push factor for economic reconstruction. This is evidenced by the Japanese recovery after World War II, which can be traced to innate positive cultural values combined with the quality management knowledge transferred from Western quality gurus during the 1950s and 1960s (Ishikawa 1985; Kehoe 1996). During the early 1980s, several books on management and organizations became best sellers and disseminated the idea that cultural values matter. Among them were The Art of Japanese Management by Pascale and Athos in 1981, Corporate Cultures by Deal and Kennedy in 1982, and In Search of Excellence by Peters and Waterman in 1982. These books concluded that cultural values are important in determining the success of organizations, a conclusion borne out by the Japanese success (Naceur 2005).

5 Conclusion

This article demonstrated an analysis of the quality of the proposed psychometric properties using the indicators of item fit offered by the Rasch model. The psychometric properties were developed specifically to measure the application of Islamic values in organizations. The Rasch model provides empirical evidence of item quality through three measures: item polarity analysis, fit statistics, and local item dependency, as discussed earlier in the article. Based on the item polarity analysis, all items are valid. Based on the fit statistics, however, two items were considered misfits. Four further items were considered redundant because their correlations exceeded the maximum threshold, and seven items were considered to measure similar ability. Two items were found problematic in both the local item dependency analysis and the analysis of similar logit values. Based on these interpretations, 11 items were found problematic and suggested for removal, as summarized in the Appendix. The removal was found effective for all items, as it increased the item reliability and item separation.