1 Introduction

Indigenous peoples in Canada, Australia, and Aotearoa/New Zealand are profoundly disadvantaged on virtually all markers of health and social status. Despite ongoing efforts to improve Indigenous health, disparities between Indigenous and non-Indigenous populations in these and other western nations continue to grow. Indigenous populations in Canada, Australia, and Aotearoa/New Zealand face significantly higher mortality rates, higher rates of infectious and chronic diseases, poorer overall health status, and life expectancies at birth between 8 and 20 years lower than their non-Indigenous counterparts (Australian Bureau of Statistics & Australian Institute of Health and Welfare 2003; Ellison-Loschmann and Pearce 2006; Health Canada First Nations and Inuit Health Branch 2003; Martens et al. 2005). In addition to these persistent health disparities, Indigenous populations also face an increasing burden from sexually transmitted infections (Australian Bureau of Statistics & Australian Institute of Health and Welfare 2003; Health Canada First Nations and Inuit Health Branch 2003), tuberculosis (Das et al. 2006; Dyck et al. 2007; Health Canada First Nations and Inuit Health Branch 2003), epidemics of HIV/AIDS (Guthrie et al. 2000; Health Canada First Nations and Inuit Health Branch 2003) and lifestyle-related diseases, particularly cardiometabolic diseases (Australian Bureau of Statistics & Australian Institute of Health and Welfare 2003; Riddell 2005; Young et al. 2000).

To more sensitively and effectively respond to Indigenous health needs, there has been a growing shift to the provision of Indigenous community-directed and controlled health and social services, particularly for preventing disease and promoting health. These programmes attempt to account for the cultural, social, economic and historical contexts of Indigenous communities, and to tailor programmes to local cultural and community contexts to facilitate better penetration and uptake of prevention initiatives. Such programmes require data on health and social indicators concerning contextual aspects of community-level environs that influence lifestyle, health risks, and disease outcomes. Selecting indicators that will adequately and appropriately measure these characteristics is challenging, as most such indicators are created by non-Indigenous bodies and therefore may not capture information relevant and meaningful to community prevention efforts.

Many researchers and community stakeholders criticise the application of conventional health and social indicators and highlight the importance of indicators that reflect community goals and social contexts in planning and evaluating community-based prevention efforts (Hancock et al. 1999; Pearce 1996; Walker et al. 2003). Issues of adequacy of existing indicators are especially pertinent when considering their application for research, surveillance, and monitoring prevention programmes in Indigenous communities. Few indicators incorporate traditional knowledge, cultural and historical issues, or a holistic approach to health—considerations that are essential to health and community development efforts for many Indigenous peoples (Durie 1994; Giles and Findlay 2004; Thompson and Gifford 2000; Wilson and Rosenberg 2002).

The lack of culturally relevant indicators stems from researcher-driven efforts which, albeit well-intended, have not provided adequate opportunities for Indigenous stakeholders to contribute to the research process. Non-participatory approaches to Indigenous community health research are no longer tenable, owing to a legacy of research that has been neither socially nor culturally relevant nor of benefit to Indigenous communities. The importance of community engagement is reflected in ethical guidelines and codes of research ethics for conducting Indigenous health research in Canada (Schnarch 2004), Australia (VicHealth Koori Health Research and Community Development Unit 2000) and New Zealand (Health Research Council of New Zealand 1998). Participatory research has the added benefit of balancing cultural and social relevance with scientific rigour (Fisher and Ball 2005; Daniel et al. 1999); demonstrated benefits to research include enhanced participant recruitment and retention, enhanced cultural validity of measures, reduced reporting bias, enriched interpretation of research findings, and increased translation of findings into action (Cargo and Mercer 2008).

Given the need for culturally relevant indicators, community-based endeavours to create and monitor indicators that better address local concerns and contexts have burgeoned since the early 1990s (Besleme et al. 1999; Norris and Pittman 2000). Many examples of Indigenous (Giles and Findlay 2004; Karjala et al. 2004; Steering Committee for the Review of Government Service Provision 2003) and non-Indigenous (Ontario Healthy Communities Coalition 1999; Popovich 1996; Stein 1996; Waddell 1995) community indicator projects demonstrate increased attention to community engagement in multi-stakeholder processes for indicator development.

Several sets of criteria have been developed to assist policymakers, researchers, and communities to assess indicator validity, reliability, and applicability for different needs. The U.S. Institute of Medicine (IOM) synthesised the selection criteria literature for health quality and population health measures. This work notes that health indicator selection guidelines most often include consideration of indicator relevance, meaningfulness or interpretability, scientific evidence, reliability or reproducibility, feasibility, and health importance (Institute of Medicine, Committee on the National Quality Report on Health Care Delivery 2001).

Given concerns about applying mainstream indicators to the specific contexts of diverse communities including Indigenous populations, alternate criteria have been proposed to inform the planning and evaluation of community-based prevention programmes. Criteria include assessing indicators according to characteristics such as relevance to and accessibility by the community, changeability of conditions by direct citizen or indirect policy change, sensitivity to change over time, and ability to disaggregate data (Black and Hughes 2001; Hancock et al. 1999; Institute of Medicine, Committee on the National Quality Report on Health Care Delivery 2001; Sawicki and Flynn 1996; Waddell 1995). Beyond evaluation criteria, the process by which indicators are evaluated and chosen is vital. In accordance with participatory research principles (Cargo and Mercer 2008), many have called for the inclusion of community representatives in the selection of indicators representing community-level characteristics (Hancock et al. 1999; Karjala et al. 2004; Raphael et al. 1999; Waddell 1995). Several guidelines and toolkits have been created to assist communities in this work (Kingsley 1999; Ontario Healthy Communities Coalition 1999; Redefining Progress & Earthday Network 2002; Tyler Norris Associates, Redefining Progress, & Sustainable Seattle 1997; U.S. Environmental Protection Agency).

No criteria, guidelines, or tools exist for use in and by Indigenous communities to evaluate and select indicators. This project therefore aimed to develop selection criteria and a rating tool that could be applied by Indigenous community stakeholders, policymakers, health professionals and researchers to assess the utility and appropriateness of existing health and social indicators for use in Indigenous community-based health research and prevention programme planning and evaluation.

2 Methods

2.1 Tool Development

Based on a review of existing criteria, we culturally adapted the indicator criteria proposed in the IOM study to be relevant to Indigenous communities. Methodological and quality criteria from other indicator projects and from medical, health promotion, and environmental assessment research were also consulted, including considerations of validity, timeliness, sensitivity, comparability and flexibility (Kramers 2003).

The rating tool was developed and refined through a collaborative and iterative process (Fig. 1) that engaged Indigenous and academic stakeholders from Canada, Australia, and Aotearoa/New Zealand.

Fig. 1 Community indicator rating tool development process

In its original form the indicator rating tool comprised a statement of the tool’s intended purpose and 18 questions within three criteria: Importance, Soundness, and Viability. A four-point rating scale and a “Not Applicable” option were included on the tool.

To establish the face and content validity of the rating tool, feedback was obtained through:

  • Three teleconferences with academic and community partners in the three countries.

  • Three discussion groups and four interviews with Indigenous stakeholders (n = 14).

  • An international meeting with 14 Indigenous and academic stakeholders from the three countries.

In response to this feedback, a slightly revised tool was submitted for pre-testing by Indigenous partners in each country (3–4 persons per country). Pre-testers without English literacy limitations indicated that completing the form was straightforward, and they believed the tool would be useful for communities interested in selecting indicators for programme planning and evaluation. Pre-testers with lesser English literacy had some difficulty completing the form.

Qualitative content analysis indicated five categories of problems encountered by pre-testers:

  1. Jargon and specific terms that were difficult to understand (e.g., “upstream” influences on health) or that might be understood differently by different individuals (e.g., “at the community level”).

  2. Difficulty understanding the intention of some questions within the cultural validity and scientific validity criteria.

  3. Some questions were not applicable for rating certain indicators, and participants were unsure how to respond. Pre-testers recommended that users be reminded of the “Not Applicable” response option.

  4. Two questions incorporated more than one concept; pre-testers suggested that these ideas be separated into distinct questions.

  5. Concern among participants about the cultural appropriateness of the wording of two questions in the soundness criterion.

Meeting participants commented on the specialised knowledge required to answer questions in the scientific validity sub-domain of the soundness criterion, noting that those without community health research or other professional public health experience would be unable to complete this section. It also emerged that use of the indicator rating tool was predicated on a willingness to accommodate both scientific and Indigenous perspectives, and that prospective users should be made aware of this before applying the tool.

A consensus-based process to revise questions for each criterion was conducted with academic and community-based partners; results are shown in Table 1. Collaborators suggested adding a glossary of key terms. These definitions were developed and included. In addition, the four-point response scale was replaced by a six-point response scale. Further, to maintain a commitment to balancing Indigenous and scientific perspectives, questions concerning scientific validity were separated from other questions into a distinct “Part B” of the form, designated for completion by those with training in scientific methodology.

Table 1 Criteria, sub-domains, and questions contained in indicator rating tool during pilot testing

2.2 Pilot Testing

Pilot testing of the revised criteria and rating tool was conducted with Indigenous community partners in each country. Each pilot testing session was led by a facilitator, a programme officer familiar with the tool and the aims of the project. From a sample of indicators used by government statistical agencies in the three countries, six indicators were selected for pilot testing, chosen to represent a variety of domains of community attributes; each session rated the same six indicators. These domains were based on a conceptual framework and indicator classification system developed as part of the project (Marks et al. 2007).

2.3 Data Analysis and Tool Revision

As response scales for all items were identical, we computed standardised alphas (representing consistency among items) and item-total correlations (the relation of question scores to total scale scores) to assess the properties of questions in the tool for each indicator rated.
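To make these computations concrete, the following is a minimal sketch in Python with NumPy; the raters-by-items matrix layout and function names are our own illustrative assumptions, not the study’s analysis code, and complete data are assumed:

```python
import numpy as np

def standardised_alpha(scores: np.ndarray) -> float:
    """Standardised Cronbach's alpha, n*r_bar / (1 + (n - 1)*r_bar),
    where r_bar is the mean inter-item correlation.

    scores: 2-D array with one row per rater and one column per item.
    """
    corr = np.corrcoef(scores, rowvar=False)      # item-by-item correlations
    n = corr.shape[0]
    r_bar = corr[~np.eye(n, dtype=bool)].mean()   # mean off-diagonal correlation
    return n * r_bar / (1 + (n - 1) * r_bar)

def item_total_correlations(scores: np.ndarray) -> np.ndarray:
    """Correlation of each question's scores with the total scale score
    (a 'corrected' variant would exclude each item from its own total)."""
    totals = scores.sum(axis=1)
    return np.array([np.corrcoef(scores[:, j], totals)[0, 1]
                     for j in range(scores.shape[1])])
```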

Based on these analyses and a review of comments from facilitators about questions in the tool that presented difficulty to raters, revisions were made. To identify and rectify items that were inconsistent, each domain was examined and problematic items were removed or reworded.

In addition, for each indicator pilot tested, agreement between raters was assessed by country using the intraclass correlation coefficient (ICC). Here, the ICC reflects the proportion of the total variance due to the “true” variance among items falling under a given indicator for a given country (a low ICC indicates poor agreement between rater scores for a given indicator). The tool was designed to assess aggregated scores in group rating exercises, rather than individual ratings of indicators. As random errors tied to individual ratings average out in aggregated scores, we adapted our reliability coefficient to reflect the superior reliability of average scores compared to individual ratings by controlling for the number of individual ratings (i.e., number of raters) on which the average was based (Streiner and Norman 2003). The ICC was calculated as σ²ᵢ/(σ²ᵢ + (σ²ᵣ + σ²ₑ)/k), where σ²ᵢ is the variance component for items, σ²ᵣ is the variance component for raters, σ²ₑ is the variance component for residual error, and k is the number of raters. Guidelines for evaluating ICCs consider values below 0.40 as poor, from 0.40 to 0.59 as fair, from 0.60 to 0.74 as good, and from 0.75 to 1.0 as excellent (Cicchetti 1994).
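As a sketch of how this coefficient might be computed (our own construction, assuming a complete items-by-raters matrix, with variance components estimated from the mean squares of a two-way ANOVA without replication; this form is equivalent to the Shrout and Fleiss ICC(2,k)):

```python
import numpy as np

def icc_average_raters(ratings: np.ndarray) -> float:
    """ICC for the average of k raters, following the paper's formula
    ICC = var_items / (var_items + (var_raters + var_error) / k).

    ratings: 2-D array with one row per item and one column per rater.
    """
    n_items, k = ratings.shape
    grand_mean = ratings.mean()
    item_means = ratings.mean(axis=1)    # mean score for each item
    rater_means = ratings.mean(axis=0)   # mean score given by each rater

    # Sums of squares and mean squares (two-way ANOVA, no replication).
    ss_items = k * np.sum((item_means - grand_mean) ** 2)
    ss_raters = n_items * np.sum((rater_means - grand_mean) ** 2)
    ss_error = np.sum((ratings - grand_mean) ** 2) - ss_items - ss_raters
    ms_items = ss_items / (n_items - 1)
    ms_raters = ss_raters / (k - 1)
    ms_error = ss_error / ((n_items - 1) * (k - 1))

    # Variance components for items, raters, and residual error.
    var_items = (ms_items - ms_error) / k
    var_raters = (ms_raters - ms_error) / n_items
    var_error = ms_error

    return var_items / (var_items + (var_raters + var_error) / k)
```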

3 Results

3.1 Pilot Testing

Seventeen Indigenous raters, primarily involved in community prevention service planning or provision, pilot tested the indicator rating tool. For each indicator, 14–17 participants (6–7 in Aotearoa/New Zealand, 2–5 in Australia, and 5 in Canada) applied the rating tool. Participants were encouraged to remain for the entire rating session, but this was not always possible. In Australia and Aotearoa/New Zealand the number of raters varied by indicator. Given the flexibility required for participatory research, raters’ responses were included in analyses even if they had not rated all indicators.

Seven participants returned forms with missing data; less than 2.5% of rating responses were missing. Thirty-six of 45 missing responses were for questions within Part B of the form (scientific validity section). Missing values for a given question and indicator were replaced with the mean response to that question for that indicator among raters from the same country. There were no “Not Applicable” responses to any questions for any indicators.
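As an illustration of this substitution rule, a brief sketch using pandas (the data layout and column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per completed form, with item
# responses in columns q1..q3 (the actual tool had more items).
df = pd.DataFrame({
    "country":   ["Canada", "Canada", "Australia", "Australia"],
    "indicator": ["housing", "housing", "housing", "housing"],
    "q1": [2, np.nan, 5, 4],
    "q2": [3, 4, np.nan, 6],
    "q3": [1, 2, 2, np.nan],
})

item_cols = ["q1", "q2", "q3"]
# Replace each missing response with the mean response to that question,
# for that indicator, among raters from the same country.
df[item_cols] = (
    df.groupby(["country", "indicator"])[item_cols]
      .transform(lambda col: col.fillna(col.mean()))
)
print(df)
```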

Twelve items in the tool had an item-total correlation <0.2 in analysis of the full scale for at least one indicator (Table 2). Of these, five items had item-total correlations <0.2 for the full scale in ≥50% of the indicators rated. Three of these items (3b, 6a, 6b) were removed; for the other two (4a and 4c), the wording was revised. Responses to items 4a and 4c, which asked about “any concern”, were almost uniformly high. Item 3b was removed because it likely reflected raters’ political or personal opinions rather than any true perspective on the indicator in question. Item 6a was removed because the tool’s purpose had broadened to allow raters to answer questions from the perspective of their community. Item 6b was removed because participants found the question, and the concept it addressed, confusing.

Table 2 Items with low item-total correlations in indicator rating tool pilot testing (across all participants) (n = 6 indicators)

For the 14 questions retained unedited in the tool, individual participants’ ratings of indicators ranged from 14 to 62, within a possible range of 14 (all questions answered most positively) to 84 (all questions answered most negatively). Mean summary scores for rated indicators ranged from 25.9 (SD = 2.8) to 52.6 (SD = 10.8) for Australian participants; 20.0 (SD = 3.5) to 24.6 (SD = 7.2) for Canadian raters; and 22.7 (SD = 4.1) to 27.7 (SD = 8.2) for Aotearoa/New Zealand pilot testers. Across all pilot testing sessions, the mean rating for all six indicators was 27.3 (SD = 9.7).
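For clarity, summary scoring amounts to summing the 14 retained six-point items per completed form and aggregating by country and indicator; a short sketch with simulated data (all names and values hypothetical):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
item_cols = [f"q{i}" for i in range(1, 15)]

# Simulated wide-format ratings: one row per completed form, 14 items
# each scored 1 (most positive) to 6 (most negative).
forms = pd.DataFrame(rng.integers(1, 7, size=(12, 14)), columns=item_cols)
forms["country"] = ["Australia", "Canada", "Aotearoa/NZ"] * 4
forms["indicator"] = ["employment", "housing"] * 6

# Summary score per form falls in the range 14 (best) to 84 (worst).
forms["summary"] = forms[item_cols].sum(axis=1)

# Mean and SD of summary scores by country and indicator, as reported.
print(forms.groupby(["country", "indicator"])["summary"].agg(["mean", "std"]))
```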

Table 3 shows the internal consistency for these 14 unedited items, ranging from 0.65 to 0.95 across the six indicators rated. Alpha values of 0.70 or greater are widely interpreted as acceptable (Streiner and Norman 2003). Average item-total correlations ranged from 0.24 to 0.75. Ratings of the “removed or separated from family during childhood” indicator yielded the lowest internal consistency on all measures. Supplementary analyses of data on this indicator excluding raters from Aotearoa/New Zealand, where forced removal of children to church- or state-run institutions or placement with white families was not practised to the same extent as in Australia and Canada (Armitage 1995), yielded a standardised alpha of 0.76 and an average item-total correlation of 0.33 (Table 3).

Table 3 Internal consistency for indicator rating tool for all items retained in the final instrument, across all raters in all countries, by sample indicator rated

Inter-rater reliability calculations for all items in the pilot-test version of the tool demonstrated excellent agreement for all indicators for raters in Aotearoa/New Zealand (ICC range: 0.83–0.93) and Canada (ICC range: 0.82–0.96) (Table 3). Inter-rater reliability of Australian participants was excellent for two indicators, good for one indicator, fair for one indicator, and poor for two indicators. ICC values ranged from 0.00 to 0.79 for the Australian raters. Across the three countries, ICC values showed excellent agreement, ranging from 0.88 to 0.95.

3.2 Final Instrument

Based on the pilot testing results, the rating tool was revised. Appendix 1 shows the final, formatted instrument. Items 4a and 4c were re-worded in more neutral language to reduce skewness of responses. The “Not Applicable” response option was retained, even though no pilot testing raters had selected it. The final 16-item instrument comprised 14 unedited items and two re-worded items.

4 Discussion

Overall, the indicator rating tool pilot tested in this study demonstrated good reliability, except when applied to one indicator of little relevance to participants in Aotearoa/New Zealand. The iterative development process resulted in an instrument with 16 questions within three domains and six sub-domains. Pilot test ratings of indicators were positive overall, though mean summary scores for Australian participants were higher (more negative) for all indicators. Questions retained in the tool demonstrated good internal consistency. Inter-rater reliability was excellent for all indicators in two of the three pilot testing groups; agreement among raters in Australia was less consistent. This discrepancy may reflect the smaller number of raters and the greater heterogeneity of the Australian group, whose participants represented a broad range of sectors (health, sport, education, language, and government) as well as both urban and rural areas.

Face validity (where the instrument appears to measure what it purports to measure) and content validity (where the items in the instrument are sufficiently representative of the content domains being measured) were established by consultation with stakeholders and collaborators in teleconferences and meetings. Participants in the iterative development process (Fig. 1) and the consensus-based tool revision process explicitly addressed content and face validity in their meetings.

This tool enables systematic evaluation of the utility of existing indicators. In so doing, it may become apparent that existing indicators do not extend to areas of concern to Indigenous people, such as strength of cultural identity; loss of land or of traditional ties to land or one’s people; or systemic racism and other aspects of relations with mainstream cultures; or that they fail to capture the circumstances of different settings (e.g., urban vs. remote). Such determinations may indicate a need to develop new indicators, in which case the tool could again be applied to evaluate the newly developed indicators against criteria of scientific validity and cultural relevance.

Our tool is not intended to replace procedures for validating newly developed indicators, nor to identify constructs or content areas for which new indicators are needed. Its purpose is to guide assessment of the appropriateness of proposed indicators selected by a range of user groups, including policy-makers, programme managers, and prevention researchers. For this purpose we contend, on the basis of our reliability and validity testing results, that the tool is of sound construction and relevant to academic-scientific prevention partnerships with Indigenous communities, where a balance of scientific and cultural utility is desirable (Cargo et al. 2007).

This study has several limitations. Given participants’ time constraints, the tool was pilot tested with a limited pool of indicators, and we could not include a selection of “good” and “bad” indicators to assess the discriminative validity of the tool, as is done when testing a clinical diagnostic instrument. Also, as the tool was designed as part of an international collaborative health research programme involving participants in Australia, Canada, and Aotearoa/New Zealand, the generalisability of the instrument to other populations, or for use in other types of projects, is unknown. Further, we assessed the reliability of the instrument for a mix of people from Indigenous bodies and governmental organisations representing a range of backgrounds and prior experiences. Future studies may wish to examine how the psychometric properties reported here vary according to the characteristics of raters (e.g., different ratings by Indigenous people living in urban vs. non-urban or remote environments).

Future needs include retesting reliability and further validity evaluations after further testing of the final revised version of the instrument. As the pre-testing and pilot testing utilised indicators from a range of conceptual domains, additional testing should also assess the ability of the tool to discriminate between indicators that represent the same or a similar construct. Feedback from communities applying the tool will provide essential information to guide its dissemination and any necessary subsequent revisions.

The rating tool responds to the increasing trend of collecting data on health and social indicators for public health surveillance, health needs assessment, programme evaluation, and health research activities. But for indicators to be useful for prevention research, they must not only be valid and reliable measures, but also be relevant and appropriate to the specific contexts in which they are being applied. This is particularly true for the use of indicators in programmes with Indigenous communities, where many have noted the challenges to application of existing indicators designed for use in other contexts (Giles and Findlay 2004; Karjala et al. 2004; Ten Fingers 2005; Walker et al. 2002).

Ethics guidelines on conducting research with and in Indigenous communities emphasise the importance of participation of Indigenous community members at all stages of the research process, including determination of the methodology to be employed (Maori Health Committee of the Health Research Council of New Zealand 1998; Medical Research Council of Canada, Natural Sciences and Engineering Research Council of Canada, & Social Sciences and Humanities Research Council of Canada 2003; National Health & Medical Research Council 2003). Furthermore, involving community stakeholders in indicator selection processes aligns with the principles of participatory research (Cargo and Mercer 2008).

In other fields such as natural resource management, environmental impact assessment, and sustainable development, indicators based on traditional ecological knowledge, and community-based participatory processes for selecting indicators, have been advocated and increasingly employed (Karjala et al. 2004; Natcher and Hickey 2002; Reed and Dougill 2002). Researchers in these fields have noted the importance of community involvement that goes beyond mere approval of a list of indicators to include participation in identifying indicators (Cunningham and Beneforti 2005; Karjala and Dewhurst 2003) and in defining appropriate categorisations of indicators (Andersen and Poppel 2002; Natcher and Hickey 2002). Moreover, utilising more meaningful measures and methodologies is essential to addressing concerns that health research is not amply meeting the needs of Indigenous communities (Baum et al. 2006; Smith 1999).

This tool is unique in providing a guided process that balances scientific and cultural concerns whereby health researchers, community members, and public health funding agencies can identify the most relevant indicators to evaluate the effectiveness of Indigenous community-based prevention efforts. Extensive consultation and field testing results support its applicability for health research with Indigenous communities.