Introduction

Humans have been in close relation to crops for about 13,000 years, and a large share of the world population maintains this interdependent relationship in small-scale farming systems (Diamond 2002; Gepts 2004). In these systems, morphological diversity within crop species is remarkable, as stressed early by Darwin in On the Origin of Species (1859). This diversity is generated and shaped by various evolutionary mechanisms, which are strongly influenced by human management practices (Hodgkin et al. 2007). These practices act especially on selection (Boster 1985; Louette and Smale 2000), as well as on genetic drift. They also have an impact on both seed- and pollen-mediated gene flows (Barnaud et al. 2007; McGuire 2008; vom Brocke et al. 2003).

Landraces selected and identified using vernacular names by farmers are assumed to be distinct management units, organizing the continuum of phenotypic and genetic diversity within crop species (Badstue et al. 2007; Bellon and Brush 1994; Harlan et al. 1976). In small-scale farming systems, knowledge of landrace diversity is of utmost importance for the survival of societies as part of their adaptive strategy. Humans have defined discrete categories within the crop diversity continuum just as they did for the rest of their environment, and have been using them as frames for reasoning and management (Atran and Medin 2008). These folk taxonomies involve identification, naming, and classification processes that are interrelated. According to Friedberg (1991), identification is a perceptual process through which farmers assign a plant to a class based on its perceived characteristics. Naming is the process through which these classes are labeled, mainly for communication purposes, which supposes that individuals exchanging information and planting material share a common nomenclature. Last, classification refers to the multi-level organization of classes.

There is considerable debate over whether farmers’ knowledge of landrace identification is homogeneous in farming societies, i.e., whether farmers agree on the identification, naming, and classification of crop diversity. Indeed, the impressive number of landrace names inventoried on-farm in most studies has called into question whether a collective consensus exists among farmers for landrace identification and naming (Jarvis et al. 2008; Sadiki et al. 2007). Furthermore, studies conducted on various species in different countries reported a poor match between farmers’ landrace nomenclature and the structure of phenotypic and genetic diversity (e.g., Barnaud et al. 2007; Quiros et al. 1990; Salick et al. 1997). These studies highlighted the lack of scientific knowledge concerning farmers’ landrace taxonomy. They raised debate concerning the existence of a common reference frame for landrace identification and naming shared by farmers on the local scale. They also raised questions concerning factors involved in inter-individual variations of taxonomy on the local scale.

This issue was first tackled by Boster in his work on cassava landrace identification among Aguaruna farmers (Boster 1986), based on the cultural consensus theory (Romney et al. 1986). Assuming that landraces are categories that constitute “a frame for storing and conveying experience and information” (Bulmer 1974 in Friedberg 1991), he hypothesized that farmers should exchange information and share a common experience of the landraces in order to agree on their identification and naming based on their phenotypic characteristics. He showed that differences of knowledge of landrace identification among farmers were associated with cultural differences, knowledge heterogeneity being higher between kinship groups than within. This indicates that the level of knowledge homogeneity among farmers for landraces’ identification is proportional to their collective experience of crop characteristics and to the intensity of information exchange among them, and thus to the strength of their social ties and cultural proximity.

Since then, cultural consensus theory has been applied to a variety of local ecological knowledge domains, showing that the ways people think of and classify the natural world differ across cultures. Atran and Medin (2008) notably showed that different cultural groups living in the same agro-ecological zones present noteworthy differences of knowledge concerning their environment. These studies revealed that culture influences the way human societies classify the continuum of biological variability surrounding them, but this issue was rarely investigated with crop landraces. Most work on landrace identification by farmers was conducted in the frame of crop diversity studies (Nuijten and Almekinders 2008; Sadiki et al. 2007), and inter-individual variations of knowledge of landrace identification were not documented. Indeed, most studies document the landrace name given by one farmer to identify a plant (Soler et al. 2013), or rely on a focus group to document what is the consensual name for it within the community (Mucioki et al. 2014). Our understanding of the identification, naming, and classification of crop intraspecific diversity by rural societies hence remains limited despite its importance regarding crop genetic resources conservation and property rights issues (Lapeña and Halewood 2016).

The way farmers identify and classify landraces is expected to influence their management of crop infra-specific diversity, and differences of management should therefore exist between cultural groups. The work of Perales et al. (2005) brought a key contribution to this issue, indicating that two geographically close ethnolinguistic groups in Mexico have divergent seed selection practices as they maintain morphologically distinct maize populations despite gene flows between them. Their results suggest that differences in knowledge of crop landrace management exist between ethnolinguistic groups despite their geographical proximity. Our study aims to test this hypothesis by characterizing differences in knowledge concerning sorghum (Sorghum bicolor [L.] Moench–Poaceae) landrace identification between three ethnolinguistic groups in a locality of the Mount Kenya region. We tested three main hypotheses following Boster’s work on farmers’ knowledge concerning landrace identification. First, farmers belonging to the same ethnolinguistic group display more similar knowledge than those belonging to different groups because they exchange more information. Second, ethnolinguistic groups display different levels of knowledge heterogeneity for the identification of the different landraces because they differ in their experience of their characteristics. Third, ethnolinguistic groups differ in the landrace names they associate with the different morphotypes because information exchange is limited between them.

This work builds on a previous study conducted in the same area, in which samples from 14 sorghum landraces named by farmers in the three ethnolinguistic groups were collected (Labeyrie et al. 2014). Genetic analysis showed that the plants collected belong to four main genetic clusters corresponding to differences in phenology, as well as in the origins and history of the landraces. These four clusters respectively correspond to (i) short-cycle local landraces, (ii) long-cycle landraces, (iii) an introduced variety released by research several decades ago (Kaguru), and (iv) another introduced variety that has just diffused in the area (Gadam). The first two clusters included several morphotypes associated with different landrace names, the occurrence frequency of which differed significantly among ethnolinguistic groups. Based on these results, our article addresses two main questions: do the three ethnolinguistic groups differ in their knowledge concerning sorghum landrace identification, and does their respective knowledge vary according to sorghum genetic and morphological characteristics?

Material and Methods

Study Site

The study site was located in the Eastern Province of Kenya at the boundary between Tharaka-Nithi and Embu counties (0° 24′ S, 37° 46′ E). We focused on a contact zone between Chuka, Tharaka, and Mbeere ethnolinguistic groups (Fig. 1). The study site presents uniform agro-ecological conditions, at an altitude of about 900 meters (m) above sea level, and with mean monthly temperature ranging between 21.7 and 23.9 degrees Celsius (°C) (Jaetzold et al. 2007). The mean rainfall is about 700–800 millimeters (mm) per year, distributed across two rainy seasons with the long rains occurring from March to May and the short rains from October to December.

Fig. 1

Study site location. a Location of the study site in Kenya. b Geographic distribution of ethnolinguistic groups in the Eastern Mount Kenya region. c Linguistic classification of Central Kenya languages according to Hammarström et al. (2015).

According to oral history, people started to migrate to the study area by the end of the nineteenth century. The Chuka would have been the first to settle in this area about one century ago, while the Tharaka and the Mbeere probably settled more recently, but information is lacking concerning their history. The Chuka, Tharaka, and Mbeere groups present cultural and linguistic differences (Hammarström et al. 2015). They speak distinct languages of the central Bantu cluster, which are, however, largely mutually intelligible. Members of each group believe in a common ancestry, on which their distinct ethnolinguistic identity is based (Heine and Möhlig 1980; Middleton 1953). Tharaka and Chuka were allied in the past and consider themselves to be blood brothers, or gishiaro, in Kimeru language (Fadiman 1993), and they have limited relations with the Mbeere (Glazier 1970; Mwaniki 1973). Intermarriage is frequent between the Tharaka and the Chuka, while it is very uncommon between both groups and the Mbeere (Labeyrie et al. 2016a). This relationship system is reflected by the geographical organization of the three groups, the Tharaka and the Chuka being spatially mixed and settled in the northern part of the study site, whereas the Mbeere are located separately from the two other groups in the southern part of the area. The maintenance of this geographical partition among ethnolinguistic groups results mainly from the combination of ethnolinguistic endogamy and patrilocal residence, implying that most married men settle near the compound of their father (Middleton 1953, pers. obs.).

Ethics Statement

This work was conducted in collaboration with the KALRO National Genebank of Kenya that has the national mandate for the collection of plant genetic resources and the documentation of accompanying information. Institutional and administrative procedures were carefully followed prior to undertaking the study, and dedicated committees in KALRO granted approval for our research activities. We followed recommendations of the ISE Code of Ethics, and the involvement of team members native to the study region helped ensure that local procedures, rules, and customs were respected, and that authorizations were granted from legitimate authorities. First, government administrative and local community representatives were informed and kept updated of the activities, and their consent was sought before conducting the research. Then, the study objective and the future utilization of data were explained to farmers and their prior informed consent was obtained verbally before undertaking interviews and crop collection. Activities were not conducted where such consent was not granted.

Data Collection

Sorghum panicles were sampled on-farm in January and July 2011 in the three ethnolinguistic groups following the strategy described previously (Labeyrie et al. 2014), which aimed at representing the diversity of the sorghum landraces named by farmers. Seeds from the collected panicles were sown in October 2011 in an experimental field under controlled and uniform growing conditions. A total of 293 descendants were sampled to maximize the range of morphological variability and presented to a panel of farmers for identification. Out of this set, 287 panicles were scored for morphological descriptors as six panicles were too degraded at the end of the survey to be scored. One-hundred seventy plants in this subset were scored for neutral genetic microsatellite SSR markers, selected to represent the diversity of sorghum landraces named by farmers in the three ethnolinguistic groups (Labeyrie et al. 2014).

Landrace Identification Experiment

The set of 293 panicles harvested in the experimental field was presented to a panel of informants from the three ethnolinguistic groups. Thirty-two female informants were randomly chosen in each group, and their ethnolinguistic group was recorded. Only women were interviewed because they are in charge of sorghum seed selection, sowing, harvesting, and trading according to the local gendered division of labor (pers. obs.). Following the procedure used by Boster (1986), each informant was independently asked to identify each of the 293 panicles that were successively presented to her. A field assistant recorded the name used by each informant to identify each panicle. Spelling standardization was later done to ensure that differences were not due to variation in pronunciation among informants.

Morphological and Genetic Characterization

Out of the 293 panicles harvested, 287 were scored for 16 qualitative morphological descriptors at the Kenya Agricultural and Livestock Research Organization, Genetic Resources Research Institute, Muguga. The study was limited to the characteristics of the panicles because the selection of seeds by farmers is done on the panicle only, at home before threshing, and thus without considering the characteristics of the whole plant. Only qualitative descriptors were scored because they are the main criteria on which farmers base their perceptual distinctiveness (Gibson 2009). The 16 descriptors were selected for their polymorphism in the sorghum population studied and their ease of scoring. They included the main criteria that farmers reported using for identifying their sorghum landraces, according to information collected during semi-directive interviews.

Traits scored (Electronic Supplementary Material [ESM] Appendix 1) concerned the whole panicle shape, seed characteristics (color, lateral shape, shattering, endosperm texture, sub-coat presence, and pericarp thickness) and glume characteristics (color, opening, adherence, covering, awning, hairiness, texture, presence of a transversal wrinkle, and pedicellate spikelet). Some of these descriptors were selected among those recommended by the IPGRI (1993), and more precise descriptors of seed and spikelets were added among some of those used by Snowden (1936). Double characterization of a randomly selected set of panicles made it possible to check the consistency of operators in scoring morphological traits, and double data entry was performed to limit typing errors.

DNA extraction, amplification, migration, and allele size scoring for 18 microsatellite SSR loci were done on 170 plants out of the 287 for which panicles were morphologically scored. The genetic diversity of the sorghum population, including these 170 individuals, was analyzed in a previous study, and full methodological details are provided in Labeyrie et al. (2014).

Statistical Analyses

Statistical analyses were conducted to describe, on the one hand, the patterns of sorghum genetic and morphological diversity and, on the other hand, the patterns of knowledge heterogeneity among informants.

Analysis of Sorghum Genetic and Morphological Diversity

First, the structure of sorghum genetic diversity was described. A discriminant analysis of principal components (DAPC, Jombart et al. 2010) was used to identify and then describe clusters from the genetic diversity of the 170 sorghum panicles collected. The K-means method was performed prior to running the discriminant analysis using the algorithm included in the DAPC function, and the optimal number of clusters to describe the diversity was determined based on the Bayesian information criterion (BIC) curve. Analyses were performed using the R package adegenet, version 2.0.1 (R Core Team 2016; Jombart 2008).

Second, the morphological diversity of the 287 panicles was described by performing a principal coordinates analysis (PCoA) on the morphological dissimilarity matrix. Dissimilarity between panicle pairs was derived from the simple matching index, i.e., the number of traits for which both panicles share the same modality divided by the total number of traits, with dissimilarity taken as one minus this proportion. Correspondence between the structure of sorghum morphological and genetic diversity was assessed by displaying genetic clusters in colors on the PCoA scatterplot. Analyses were performed using the R package ade4 version 1.7–6 (Dray and Dufour 2007).
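As an illustration only, the simple matching computation can be sketched in a few lines of Python (the analyses themselves were run in R with ade4). The sketch assumes that dissimilarity is taken as one minus the proportion of matching traits; the function name and the two example panicles are hypothetical and do not come from the dataset.

```python
def simple_matching_dissimilarity(panicle_a, panicle_b):
    """Return 1 minus the share of traits with identical modalities."""
    if len(panicle_a) != len(panicle_b):
        raise ValueError("both panicles must be scored for the same traits")
    matches = sum(1 for x, y in zip(panicle_a, panicle_b) if x == y)
    return 1 - matches / len(panicle_a)


# Hypothetical panicles scored for four qualitative traits (not real data)
panicle_a = ["white", "loose", "awned", "thin"]
panicle_b = ["white", "compact", "awned", "thick"]

# Two of the four traits match, so half of the traits differ
d_ab = simple_matching_dissimilarity(panicle_a, panicle_b)
print(d_ab)
```

Applying this function to every pair of panicles would give a square dissimilarity matrix of the kind used as input for a PCoA.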

Measurement of Informants’ Consistency in Naming Panicles

We computed the number of informants in each ethnolinguistic group who cited the same name to identify each sorghum panicle. A cross-table was built for each group, crossing the list of the 293 panicles with the list of the names given for identification, to calculate the number of informants for each panicle × name combination. We considered that a panicle was named consistently in a given group, i.e., knowledge heterogeneity was low within it, when more than 60% of informants used the same name to identify this panicle. Conversely, when less than 60% of informants used the same name, we considered that the panicle was named inconsistently, i.e., no consensus exists among farmers for its identification.
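The consistency rule can be illustrated with a short Python sketch. It is not the code used for the study: the function name, the default threshold argument, and the example answers are hypothetical, and the sketch simply assumes that a panicle counts as consistently named when strictly more than 60% of a group's informants give the same name.

```python
from collections import Counter


def naming_consistency(names, threshold=0.6):
    """Summarize the names given to one panicle by one group's informants.

    Returns the most frequent name, the share of informants using it,
    and whether that share is strictly above the threshold.
    """
    counts = Counter(names)
    top_name, top_count = counts.most_common(1)[0]
    share = top_count / len(names)
    return top_name, share, share > threshold


# Hypothetical answers from five informants for two panicles (not real data)
answers_p1 = ["Kaguru", "Kaguru", "Kaguru", "Kaguru", "Muruge"]
answers_p2 = ["Kaguru", "Muruge", "Mugeta", "Kaguru", "Serendo"]

result_p1 = naming_consistency(answers_p1)  # 4 of 5 informants agree
result_p2 = naming_consistency(answers_p2)  # only 2 of 5 informants agree
print(result_p1, result_p2)
```

Running the function once per panicle and per group would reproduce the per-group counts of consistently and inconsistently named panicles.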

Distance-Based Analysis of Knowledge on Panicle Identification

In this paper, differences of knowledge between ethnolinguistic groups were measured by comparing, on the one hand, their level of knowledge heterogeneity, i.e., inter-individual variations in the identification of panicles, and on the other hand, their landrace identification, i.e., the name they associated with each panicle. First, we analyzed patterns of informants’ knowledge for the whole panicle set (n = 287), and then separately for subsets of panicles corresponding to each of the genetic clusters.

The heterogeneity of knowledge within the group for the identification of the panicle set was measured as the dispersion of informants’ answers. This heterogeneity degree was compared between the Chuka, Tharaka, and Mbeere groups. For this purpose, a similarity index was computed for each pair of informants as the proportion of panicles they named identically, and a distance index was then constructed by subtracting the similarity index from one. Average within-group dispersion, i.e., the average distance of individuals to group centroid in the space of the simple matching distance index, was used as a measure of knowledge heterogeneity within groups. We conducted analyses to assess farmers’ knowledge heterogeneity for the identification of the whole panicle set on the one hand, and of each set corresponding to each of the four genetic groups on the other hand. Knowledge heterogeneity degree was compared among ethnolinguistic groups by testing whether the average within-group dispersion was equivalent among groups through an analysis of multivariate homogeneity of groups’ dispersions (PERMDISP2), which was performed on the distance matrix (Anderson 2006). Further, pairwise Tukey’s honest significant difference (HSD) tests were performed to test for the significance of pairwise differences in the average dispersion between ethnolinguistic groups. PERMDISP2 and Tukey’s HSD analyses were run globally on the whole panicle set.
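The distance index and the within-group dispersion can be sketched as follows. This is a simplified illustration, not the PERMDISP2 procedure itself: it assumes that the distances within a group can be represented in a Euclidean space, in which case the squared distance of informant i to the group centroid equals the mean of the squared distances d(i, j) over j, minus the sum of all squared pairwise distances divided by 2n². The function names and the three example informants are hypothetical.

```python
from math import sqrt


def naming_distance(answers_a, answers_b):
    """One minus the proportion of panicles two informants named identically."""
    same = sum(1 for x, y in zip(answers_a, answers_b) if x == y)
    return 1 - same / len(answers_a)


def distances_to_centroid(dist):
    """Distance of each informant to the group centroid.

    dist is a square symmetric distance matrix for ONE group. The result is
    exact only if the distances are Euclidean-embeddable (assumption).
    """
    n = len(dist)
    total_sq = sum(dist[j][k] ** 2 for j in range(n) for k in range(n))
    result = []
    for i in range(n):
        sq = sum(dist[i][j] ** 2 for j in range(n)) / n - total_sq / (2 * n * n)
        result.append(sqrt(max(sq, 0.0)))  # clamp tiny negative rounding errors
    return result


# Hypothetical answers of three informants of one group for four panicles
informants = [
    ["Kaguru", "Kaguru", "Muruge", "Gadam"],
    ["Kaguru", "Kaguru", "Muruge", "Serendo"],
    ["Kaguru", "Muruge", "Muruge", "Serendo"],
]
dist = [[naming_distance(a, b) for b in informants] for a in informants]
to_centroid = distances_to_centroid(dist)
dispersion = sum(to_centroid) / len(to_centroid)  # average within-group dispersion
print(dist, to_centroid, dispersion)
```

In this toy example the second informant sits at the centroid, and the average dispersion is the quantity that would then be compared across ethnolinguistic groups.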

Then, we tested if significant differences in landrace identification exist between ethnolinguistic groups, i.e., if they used different names to identify panicles in each set. This was done by testing if the centroids were equivalent for all groups, using a non-parametric multivariate analysis of variance (ADONIS, Anderson 2001). This test is a multivariate equivalent of ANOVA, based on the comparison of within- and between-group average dispersion. ADONIS results can be confidently interpreted only if within-group dispersion is equivalent among ethnolinguistic groups, which is tested by PERMDISP2. A principal coordinates analysis (PCoA) was conducted on the knowledge distance matrix to visualize knowledge patterns within and between groups. All distance-based analyses were performed using the R package vegan version 2.4–0 (Oksanen et al. 2012).
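The logic of the centroid test can be sketched with a pseudo-F statistic computed directly from a distance matrix and assessed by permuting group labels, following the general approach of Anderson (2001). This is a minimal illustration under stated assumptions, not the vegan implementation: the function names, the number of permutations, the seed, and the six example informants (placed on a line so that distances are easy to check) are all hypothetical.

```python
import random


def pseudo_f(dist, groups):
    """Pseudo-F: among-group vs. within-group sums of squared distances."""
    n = len(dist)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        pairs = sum(dist[i][j] ** 2 for p, i in enumerate(idx) for j in idx[p + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(labels) - 1)) / (ss_within / (n - len(labels)))


def permutation_test(dist, groups, n_perm=999, seed=0):
    """Observed pseudo-F and its permutation p value (labels shuffled)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical informants placed on a line; two well-separated groups
positions = [0.0, 0.1, 0.2, 0.8, 0.9, 1.0]
groups = ["X", "X", "X", "Y", "Y", "Y"]
dist = [[abs(a - b) for b in positions] for a in positions]
f_obs, p_value = permutation_test(dist, groups)
print(f_obs, p_value)
```

Because the statistic mixes location and spread, a large pseudo-F can arise from unequal dispersions alone, which is why the dispersion test has to be checked before interpreting the result.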

Both the analysis of multivariate homogeneity of groups’ dispersions (PERMDISP2) and the non-parametric multivariate analysis of variance (ADONIS) were run separately for each genetic cluster after performing them on the whole panicle set (n = 287). The correspondence between the structure of sorghum morphological diversity and knowledge patterns was assessed by displaying the “consistent” names (i.e., used by more than 60% of informants) using different colors on the PCoA scatterplot of morphological distances for each ethnolinguistic group.

Results

Knowledge Heterogeneity Within and Between Ethnolinguistic Groups

A large number of landrace names were cited during the identification experiment (Chuka, 30 names; Tharaka, 36; Mbeere, 39), but only a few names were used consistently at least one time, i.e., by more than 60% of the informants, to identify the same panicle (Chuka, 7 names; Tharaka, 6; Mbeere, 5). Most landrace names used consistently were common to the three ethnolinguistic groups, but some were peculiar to one or two groups. In most cases, each consistent name was used to identify several panicles, but some were used to identify only one or two panicles.

Significant differences of knowledge heterogeneity level were observed between ethnolinguistic groups for the identification of the whole panicle set (n = 293). Indeed, within-group dispersion (distance to a group’s centroid) differed significantly between groups, as indicated by PERMDISP2 results (F = 8.55, p value < 0.001). A Tukey HSD test further showed that the mean within-group dispersion was significantly lower in the Chuka group (0.27) than in the Tharaka (0.39) and Mbeere (0.38) groups, indicating a higher knowledge homogeneity in the former group. Differences of knowledge between the Mbeere and the two other groups are displayed along the second PCoA axis (Fig. 2), but the significance of these differences cannot be assessed based on ADONIS results because of differences in within-group dispersion. Furthermore, two different knowledge subgroups are distinguished within the Mbeere group along the first PCoA axis, and part of the Tharaka appears close to the Chuka, while the rest presents differences in knowledge.

Fig. 2

Knowledge similarity among individuals according to their ethnolinguistic membership. Plot of the first two axes of the PCoA based on the knowledge distance matrix between informants (n = 96 informants; the first component expresses 23% of the total variation, and the second one expresses 17%). Ethnolinguistic groups are displayed in colors (red, Chuka; green, Tharaka; blue, Mbeere).

Strong differences in consistency were observed among panicles. The proportion of informants who used the same landrace name varied among panicles, ranging from a minimum of 19% in the Chuka and Mbeere groups and 25% in the Tharaka group, to a maximum of 100% in the three groups (ESM Fig. 1). Overall, the proportion of panicles identified consistently varied strongly among ethnolinguistic groups, with 80% in the Chuka group, 40% in Tharaka, and 32% in Mbeere (n = 293 panicles).

Farmers’ knowledge was homogeneous within and between ethnolinguistic groups for some panicles, which were identified highly consistently in all groups. This was especially noticed for a set of panicles named Kaguru by a large majority of informants (Fig. 3). In other cases, the level of knowledge heterogeneity differed between groups as some panicles were identified highly consistently in one group and not in the others. This was especially striking for a set of panicles named Gadam by most Chuka informants, but identified inconsistently in the two other groups. A similar situation was observed for panicles consistently named Muruge and Mugeta by the Chuka and Tharaka, and for panicles consistently named Ngirigacha by the Mbeere, which were identified inconsistently in the other groups. Last, a high degree of knowledge heterogeneity was observed both within and between ethnolinguistic groups for some panicles that were named inconsistently in all groups.

Fig. 3

Comparison of panicle identification between ethnolinguistic groups. Boxes represent the proportion of panicles (y-axis) identified consistently (in colors) or not (in gray) in each ethnolinguistic group (x-axis). Flows among boxes in the different ethnolinguistic groups represent the share of panicles that were identified similarly or differently between groups. For instance, the width of the flow between box A in group X and box B in group Y represents the proportion of panicles identified as landrace A in group X that was identified as landrace B in group Y.

Differences in Knowledge According to Panicles’ Characteristics

An adequate number of classes to describe the genetic diversity in our dataset was K = 4 according to the BIC criterion in K-means algorithm. Genetic clusters matched partially with the structure of morphological diversity, some genetic clusters being morphologically distinct while others overlapped (Fig. 4). G2 was highly homogeneous and distinct morphologically, while G1, G3, and G4 were heterogeneous and overlapped. Overlap was especially high between G3 and G4.

Fig. 4

Match between genetic and morphological diversity. Plot of the first two axes of the PCoA based on panicles’ morphological traits (n = 287 panicles, 16 traits; the first component expresses 26% of the total variation, and the second one expresses 14%). Genetic clusters are displayed in colors (n = 170, panicles morphotyped but not genotyped are displayed in gray).

Knowledge heterogeneity was similar among ethnolinguistic groups for G1, G3, and G4 clusters, as no significant differences of answer dispersion were detected by PERMDISP2 (1.41 < F < 1.93, p value > 0.1). Differences of knowledge heterogeneity between ethnolinguistic groups were significant only for the genetic cluster G2 (F = 11.04, p value < 0.001; ESM Appendix 2). Indeed, the large majority of Chuka informants agree on the identification of panicles assigned to the G2 genetic group that they named Gadam, while Mbeere and Tharaka informants presented a high level of knowledge heterogeneity.

Differences of knowledge on panicle identification were observed between ethnolinguistic groups for some genetic clusters (G3 and G4) while not for others (G1). For G1, no significant differences of identification were observed (ADONIS: F = 1.71, p value = 0.044; ESM Fig. 2) as knowledge homogeneity was very high both within and between groups. Indeed, this cluster was mainly composed of panicles consistently named Kaguru by informants in all groups (Fig. 5). The ADONIS test on centroid difference between groups was significant for three clusters (G2 to G4). Significant differences of groups’ centroids for G3 and G4 reflect differences of identification among groups because within-group dispersion was similar for these clusters. However, such an interpretation cannot be applied to G2 because significant differences of within-group dispersion were detected by PERMDISP2.

Fig. 5

Match between genetic clusters and landrace identification. Bars represent the number of panicles identified consistently (> 60% of informants, in colors) and inconsistently (< 60%, in black) in each genetic cluster.

Ethnolinguistic groups differed significantly in their identification of panicles in G3 (ADONIS: n = 97, F = 7.89, p value < 0.001). This genetic cluster comprised three morphological sub-clusters corresponding to different landrace names that were identified with very different consistency levels by the different groups (Fig. 6). A first sub-cluster was composed of a large number of panicles consistently named Muruge by the Chuka, while the Tharaka and Mbeere identified consistently only a small part of them. A second sub-cluster included panicles consistently named Mugeta by the Tharaka and to a lesser extent by the Chuka, while the Mbeere were inconsistent in their identification. Last, a third sub-cluster included a few panicles consistently named Ngirigacha by the Mbeere only. Ethnolinguistic groups also differed significantly in their identification of panicles in G4 (ADONIS: n = 17, F = 3.62, p value < 0.001). This cluster mainly included panicles identified consistently as Muruge, and, to a lesser extent, Serendo and Kaguru in proportions differing among groups. In addition, G3 and G4 also included a large share of panicles identified inconsistently in proportions that varied strongly between groups. On the PCoA of morphological distance, these panicles were located between the morphological groups corresponding to panicles identified consistently, for which knowledge homogeneity was high.

Fig. 6

Correspondence between morphological diversity and landrace identification for panicles assigned to G3 (dots) and G4 (stars) genetic clusters (n = 114 panicles, 16 traits). Plot of the first two axes of the PCoA based on panicle morphological traits (variability expressed: 1st Co = 23%, 2nd Co = 15%). Colors correspond to landrace names identified consistently in each ethnolinguistic group, and panicles identified inconsistently are displayed in gray.

G3 and G4 genetic clusters displayed a large morphological heterogeneity and overlapped. Part of the plants in these clusters hence presented morphological similarities despite their genetic differences. Interestingly, some panicles presenting genetic differences were named similarly because of their morphological similarity. In particular, panicles named Muruge in all ethnolinguistic groups presented morphological similarity although they belong to both G3 and G4 genetic clusters. Similarly, some panicles assigned to G4 but morphologically similar to G1 were named Kaguru.

Discussion

In this paper, we analyzed farmers’ knowledge of sorghum landrace identification in an ethnolinguistic contact zone. Our results show that knowledge varies according to both individuals’ cultural identity and panicle characteristics. We first assessed whether farmers within ethnolinguistic groups shared knowledge concerning landrace identification by measuring within-group knowledge heterogeneity, and then tested whether the names used to identify panicles differed between groups. Results showed that groups differed significantly in their level of knowledge heterogeneity, the Chuka displaying the highest homogeneity for the identification of the whole panicle set. Furthermore, within-group knowledge heterogeneity varied strongly among panicles on one hand, and differed between groups for some panicles on the other hand. This indicates that knowledge heterogeneity within and between ethnolinguistic groups is related to panicle characteristics.

We further conducted analyses to test for the effect of panicle genetic and morphological characteristics on knowledge heterogeneity within and between ethnolinguistic groups. Results first showed that within-group knowledge heterogeneity differed between groups for only one genetic cluster (G2), indicating that Chuka shared common knowledge on the identification of panicles in this cluster, while Tharaka and Mbeere do not. Secondly, we found that knowledge was highly similar and homogeneous among ethnolinguistic groups for the identification of one cluster (G1), while significant differences were observed among them for the identification of two clusters (G3 and G4). Furthermore, ethnolinguistic groups differ in their level of knowledge heterogeneity for the identification of the different morphotypes within G3 and G4.

The knowledge patterns we described offer insights into farmers’ experience of the different landraces, and into the diffusion of knowledge within and between ethnolinguistic groups. Indeed, according to Boster (1986), the identification, naming, and classification of landraces according to their morphological characteristics are socially learned and further constructed by individuals through their direct experience with the plants. He identified three major processes involved in inter-individual differences of knowledge concerning landrace identification: (i) differences among individuals in their learning sources and pathways; (ii) differences in their experience of the landrace and its morphological characteristics; and (iii) differences in the time or the willingness individuals have for acquiring experience in this domain. As there is little evidence that the time and willingness individuals devote to learning differ significantly between ethnolinguistic groups, the knowledge patterns we observed reflect, on the one hand, modalities of knowledge transmission within and between groups and, on the other hand, differences between groups in their level of experience concerning landrace identification. This leads us to discuss knowledge patterns in relation to the history and characteristics of panicles and to knowledge transmission modalities within and between groups (Cavalli-Sforza and Feldman 1981; Leclerc and Coppens d’Eeckenbrugge 2012; Reyes-García et al. 2009).

Panicle Characteristics and History in Relation to Ethnolinguistic Groups

Panicles presented to informants displayed different morphological characteristics, and had different histories and origins. These differences help explain the strong variations of within-group knowledge heterogeneity between panicles and its variations between ethnolinguistic groups for the same panicle. Indeed, farmers share knowledge on panicles presenting morphological characteristics with which they are collectively familiar. Several characteristics of landraces can contribute to building collective knowledge, such as how long they have been cultivated, their popularity, and whether they can be easily identified and distinguished based on their morphological characteristics.

A previous genetic study, which included the 170 individuals we analyzed here, showed that sorghum landraces in our study area present different histories and agronomic characteristics (Labeyrie et al. 2014). It identified four genetic clusters matching those we identified, and showed that clusters G1 and G2 were introduced varieties released by the formal breeding system, whereas G3 and G4 were local landraces with different agro-morphological characteristics.

Knowledge was highly homogeneous within and between groups for the identification of G1 panicles, which were named Kaguru by most informants. By contrast, differences were observed among groups for G2, as knowledge was highly homogeneous in the Chuka group, where it was named Gadam, while it was heterogeneous in the Mbeere and Tharaka groups. Differences in the dates of dissemination of these two improved varieties likely explain why G1 was identified consistently in all groups whereas G2 was identified consistently only in the Chuka group. Indeed, Kaguru was introduced several decades ago and has been widely cultivated and sold in markets in the area, while Gadam was released in the area only two years before our study. Our results further suggest that G2 introduction started in the Chuka group, who display a strong collective experience of its identification. Such high knowledge uniformity may result from massive dissemination of G2 under the well-defined “Gadam” name by Kenyan agricultural extension services in the Chuka group. By contrast, knowledge heterogeneity for G2 identification in the Tharaka and Mbeere groups indicates that they are not yet familiar with its characteristics and name.

Clusters G3 and G4 included panicles identified consistently with various landrace names as well as panicles identified inconsistently, in proportions that varied strongly between ethnolinguistic groups. This variety of landrace names was associated with differences in panicles’ morphological characteristics within these clusters, the different landrace names corresponding to different morphotypes. Knowledge heterogeneity was similar among groups for these genetic clusters, but it varied between morphotypes within genetic clusters, especially G3. Each ethnolinguistic group appeared to be familiar with the identification of different morphotypes, some being identified highly consistently in one group but not in the others, and vice versa. The Chuka in particular consistently identified a set of morphologically similar panicles as Muruge, while knowledge within the two other groups was more heterogeneous. A similar situation was observed for a set of panicles identified as Mugeta by most Tharaka informants. This probably results from differences in the experience that ethnolinguistic groups developed over time concerning these different landraces. This hypothesis is supported by previous results showing that the Muruge and Mugeta landraces were introduced in the study area by the Chuka and the Tharaka, respectively, which explains the higher knowledge uniformity of each group for the identification of the corresponding morphotypes (Labeyrie et al. 2014, 2016b).

Last, our study showed that knowledge heterogeneity was very high in all ethnolinguistic groups for some panicles. Our hypothesis is that these panicles present a combination of traits that does not correspond to the traits on which the landrace classification system used by most informants is based. These panicles may either result from crosses and combine morphological characteristics of several landraces, which is confusing for informants, or belong to recently introduced or rare landraces whose characteristics are not yet familiar to most informants.

The existence of a collective coherence among farmers for landrace identification has been debated in several studies (Sadiki et al. 2007). Some studies reported high knowledge heterogeneity among farmers, such as Salick et al. (1997), on cassava nomenclature among the Amuesha in the Peruvian Amazon. Others found higher knowledge homogeneity within villages than between villages, and postulated that the geographic distance that limits seed circulation also limits knowledge transmission (Nuijten and Almekinders 2008). However, the knowledge of different farmers for the identification of the same plant has not been measured since Boster’s (1986) work, which was based on the cultural consensus framework. Our study shows that farmers’ consistency in identifying crops varies strongly depending on plants’ characteristics and history, but also varies according to the cultural background of farmers. Hence, farmers living in the same location can present strong differences of knowledge because of their cultural differences. Our results are thus in line with those of Perales et al. (2005) on maize in Mexico, and contribute to explaining the differences in selection practices they observed between adjacent ethnolinguistic groups.

Knowledge Transmission Modalities

Knowledge heterogeneity within and between ethnolinguistic groups indicates that communication and landrace circulation are limited between them. Indeed, we would expect high knowledge homogeneity both within and between groups if circulation were not limited (Romney et al. 1986). Sharing a common nomenclature depends on the pathways for learning and knowledge transmission. As landrace names are used for communicating, learning the culturally appropriate name for a given morphotype is essential for farmers. Socially related farmers exchange more information concerning landraces than farmers who are not related, and they are hence more likely to share a common taxonomy (Boster 1986). First, the overall higher knowledge homogeneity in the Chuka group indicates more intense knowledge transmission between individuals within this group than in the two others. Interestingly, this can be related to results from a previous study in the same area (Labeyrie et al. 2016a), which showed that the seed circulation network was denser and more cohesive in the Chuka group than in the two others, possibly because it was the first group to settle in the study area. It is likely that these properties of seed circulation networks enhance knowledge homogenization between individuals concerning landraces.

Second, our results indicate that different modalities of knowledge diffusion exist for the different genetic clusters and morphotypes. Knowledge for G1 identification was highly homogeneous, indicating intense knowledge diffusion within and between ethnolinguistic groups. These results are in line with those of previous studies showing that G1 corresponds to the Kaguru improved variety, which is cultivated by most farmers and sold on the market (Labeyrie et al. 2014). Next, the major differences of knowledge heterogeneity for G2 between the Chuka and the other groups suggest that sources of knowledge differed between them, as diffusion of the Gadam improved variety appears to have been restricted to the former group. Furthermore, knowledge diffusion appeared limited between groups for this variety, which is not surprising as very little time had elapsed since its introduction when the study was conducted. Knowledge patterns concerning identification of the different morphotypes within G3 indicate a higher knowledge similarity between the Chuka and Tharaka than between these groups and the Mbeere. Higher knowledge similarity was especially observed between the two former groups for the identification of the Muruge and Mugeta landraces. This is likely related to the intensity of interpersonal seed exchanges between the Chuka and Tharaka, which is the major seed circulation modality for these varieties, while exchanges with the Mbeere were very rare (Labeyrie et al. 2016a). These seed circulation patterns were linked to a strong alliance relationship between the Chuka and Tharaka groups.

Several studies reported a coincidence between seed circulation networks and knowledge patterns. For instance, cassava circulation was found to be more intense among kin, who also display more similar knowledge on landrace identification, among the Aguaruna in the Amazon (Boster 1986), and similar results were observed among members of the same village for rice in Gambia (Nuijten and Almekinders 2008). Other studies showed that people exchanging more seeds were also the most knowledgeable, for instance among home garden keepers in the Catalan Pyrenees (Calvet-Mir et al. 2012), or among caboclo cassava farmers in Brazilian Amazonia (Kawa et al. 2013). Our results are partly in line with these studies, as we found a correspondence between the structure of seed circulation networks and knowledge patterns for some landraces but, interestingly, not for others. For instance, knowledge for Gadam identification was strongly shared among the Chuka but not with the Tharaka, even though our previous study did not detect any limitation to seed circulation between these two groups. In addition, knowledge for Kaguru identification was shared by the three groups, even though the same study showed that seed circulation was limited between the Mbeere and the two other groups. An explanation for such a discrepancy between the structure of seed circulation networks among farmers and their knowledge patterns could be that farmers obtain a large share of their seed on the local market as well as through extension services and NGOs. That is especially the case for introduced varieties released by research (unpublished results). Furthermore, these results suggest that farmers learn about landrace identification through different pathways depending on whether the landrace is local or introduced, and that seed and knowledge of landraces do not necessarily circulate through the same channels.

Conclusion

This study shows that knowledge concerning sorghum landrace identification differs between adjacent ethnolinguistic groups and that it varies according to landraces’ characteristics. First, our results indicate that farmers in each ethnolinguistic group present a high level of collective consistency in naming what they consider their own sorghum landraces and varieties introduced long ago. This suggests that the level of consensus on the identification of a landrace reflects how long it has been cultivated by the human group. Second, our results suggest that social pathways for learning could play a major role in shaping knowledge, as geographically close cultural groups present major differences in knowledge on landraces.

These results open perspectives to understand farmers’ seed selection practices, a major driver of crop evolution and adaptation in situ. Indeed, the way individuals perceive, represent, and classify their environment affects their management practices (Atran and Medin 2008). The effect of inter-cultural differences was especially observed in maize seed selection practices in Mexico, where populations of this crop present divergent morphological characteristics between adjacent villages (Pressoir and Berthaud 2004) and ethnolinguistic groups (Perales et al. 2005). Our study suggests that such divergence in crop selection practices could result from differences in the identification and classification of landraces by the different human groups. It advocates for further integration of anthropology in crop diversity research as crops not only are biological objects but also bear the imprint of the societies in which they are grown, exchanged, and selected (Harlan 1975).