Introduction

Few would disagree that scientific research is a laborious process. Observation, hypothesizing, data collection, writing, and knowledge dissemination are all time-consuming and complex tasks, even when performed in one’s native tongue. Now let us imagine for a moment having to conduct these tasks in a second, third, or even fourth language without any previous training in that language. Let us imagine the cost, both temporal and financial, imposed upon those who find themselves in this situation regularly. Imagine, still, that in some arenas, the ability to communicate in this other language is not merely desirable or “nice to have,” but an (at best) explicit and (at worst) implicit expectation for climbing academic and professional ranks. As Montgomery remarks, approximately “15 million people [in 2009] work with scientific information on a regular basis, two thirds of them in countries where English is not the first language,” yet “over 80% of scientific publication takes place in English” (2009, p. 7). In more recent data, Bowker and Ciro (2019, p. 1) state: “English has emerged as the international language of scholarly communication—particularly in the domains of science and technology despite the fact that only roughly 6% of the world’s population speaks English as a native language.” In a similar vein, Mahony (2018, p. 374) underscores that because English is the predominant language in the tech industry (Web; digital publishing), there is “an additional incentive to learn English […] whether or not [people] study or train in an English-speaking environment.” Mahony also remarks that this does not “incentivize English speakers to develop other language skills” (ibid.).

Those of us on the front lines of work that requires translation and intercultural communication are familiar with such scenarios and with the social, cultural, and economic asymmetries such contexts engender; yet these questions are not always raised, problematized, or even considered in predominantly Anglophone spheres, including in the contexts of international academic and scientific production (it is worth noting, however, that Bowker and Ciro’s recent Machine Translation and Global Research runs counter to this trend). In an article addressing cultural diversity and the evolution of the Digital Humanities, Mahony (2018) lists a number of instances of Anglocentrism, which also apply in other academic fields:

Historically, DH [Digital Humanities; but also the wider Humanities] has developed in a very anglophone environment as English became the language of the Internet (with ICANN) and the lingua franca of the Web (with the W3C Consortium), along with the domination of the ASCII code. ICANN is extending things now with the New Generic Top-Level Domains to include non-Latin characters, although only those that are included in Anglo/US-centric Unicode. There have been recent studies on the metrics of publication and how that along with citation counts has a clear Anglo-bias, resulting in incentives for advancement, promotion and funding to favour publication in the English language for the Arts and Humanities. (p. 372)

Even in contexts that purport to be epistemologically, linguistically, and scientifically inclusive (and the Digital Humanities is a compelling example; cf. Galina Russell 2013), there seems to be a disconnect between what is championed in discourse and what practice, output, and infrastructure data show. This chapter presents the initial stages of a research project focused on a specific scientific context, one that involves not only academe but also the larger “crowd”: citizen science. Specifically, quantitative and qualitative data collected from online citizen science projects (2018–2019) suggest limited linguistic diversity and generally Anglocentric modes of knowledge creation and dissemination. One of the project’s goals is to identify some of the more implicit structures and practices that serve to reinforce Anglocentrism in online citizen science so that asymmetries and inequities can be addressed.

Literature Review

In Translation Studies (TS), more specifically, scholars have, in various capacities, addressed some of the ethical, professional, and cultural implications of scenarios such as the one roughly described above. For instance, some TS researchers have pointed to the “epistemicide” (Bennett 2007; Bennett and Queiroz de Barros 2017) caused by the (mandatory) use of and recourse to “academic” English, while others have referenced the phenomenon of “linguistic imperialism” (Montgomery 2009). Others, still, have addressed the linguistic and translational barriers that both international students and international scholars face, using specific case studies to illustrate the point (Fan 2017). In recent work linking Affective Science and TS, Hubscher-Davidson (2018) discusses how translators (and, by extension, people whose first language is not English) take on varying levels of emotional labor when faced with recurrent and challenging translation/interpretation work. Though she focuses more specifically on translators, it is reasonable to hypothesize that individuals who must constantly work in a second language, produce self-translations, or outsource translations in addition to their actual professional tasks likely shoulder additional emotional labor or, at the very least, perform additional labor, often without additional financial recompense. Elsewhere, research on translation flows (i.e., the direction in which translated content/knowledge circulates) clearly points to asymmetries in the exchange of scientific knowledge and cultural capitals, both offline and online (Brisset 2008; Buzelin 2014; McDonough Dolmaya 2017). These translation flow analyses also reveal that proficiency in English generally facilitates increased access to scientific literature and to a wider range of analytical and technological tools (textbooks; e-books; software; apps). A more recent research arena centers more specifically on how these linguistic and translational asymmetries play out in online and/or digital ecosystems, for instance, on knowledge-related platforms such as Wikipedia (McDonough Dolmaya 2017; Jones 2018) and on popular science “channels,” as with TEDTalks (Olohan 2014a). Ultimately, it would be difficult to deny that English proficiency confers an advantage in scientific and research circles. In a similar vein, it would be difficult to dispute the disproportionate amount of scientific/academic content available/produced in English compared to other languages, particularly those that remain peripheral (cf. Calvet’s “gravitational model” 1999).

These TS contributions provide a clearer understanding of the production and dissemination of scientific discourse and discovery more broadly, as well as the role translation plays (or doesn’t play) in these contexts (cf. Olohan and Salama-Carr 2011; Olohan 2014b). Interestingly, Olohan and Salama-Carr (2011) remarked that the study of scientific discourse and its translation remained largely peripheral in TS, for reasons that include institutional and disciplinary factors and data accessibility/management. Yet, as science becomes even more democratized and as technological advances continue to radically change what is possible in terms of democratic knowledge creation and transfer, newer arenas of inquiry connecting scientific transmission/discovery and TS are constantly emerging. The translation flows and scientific discourses that once seemed impossible or more difficult to study are now much more readily accessible/analyzable thanks to the Web, big data,Footnote 1 digital technologies, social platforms, and other tools. St. André (2018, p. 2) remarks that many early projects merging TS, digital data, and data visualization were “labor-intensive, slow, and expensive”; however, due to the “exponential growth of computing power and the concomitant decrease in price,” research situated at the crossroads of the Digital Humanities and other fields is now gaining greater momentum. TS researchers can now leverage programming, big data, and visualization methodologies in unprecedented ways, allowing them to address multi-pronged quantitative and qualitative research questions. For instance, thanks to new technology and methodologies, my research team was able to pose some of the following questions in relation to newer datasets: What happens when the plurivocal (cf. Nappi 2013) crowd is solicited to participate actively in scientific discovery, research, and dissemination? What impact does crowd solicitation have for scientific translation and multilingual communication? What happens when this solicitation occurs online, on social platforms or in social media contexts, wherein users not only contribute (e.g., data production; data collection) but also exchange (e.g., discussion threads; tweets)? These are some of the questions this case study will examine, using Zooniverse—“the world’s largest and most popular platform for people-powered research” (Zooniverse 2019Footnote 2)—as its point of departure.

Given the impact of digital technologies, it is all the more relevant to examine how science and research have been affected. Indeed, in the last ten years, the proliferation of mobile technologies, the popularity of participatory culture and social media, and the uptick in crowdsourced models for conducting a variety of large-scale tasks have significantly and broadly impacted the academic and scientific landscapes (cf. Howe 2006; Anderson 2008; Young 2010; Boschma 2016; Sturm et al. 2018). More specifically, many of the disciplines in the fields of Science, Technology, Engineering, and Mathematics (STEM) have capitalized on the crowd’s interest and willingness to contribute and to participate in large-scale research projects, often to good effect. This phenomenon is generally known as “citizen science.” Cohn (2008) defines citizen science as “a form of research collaboration involving members of the public in scientific research problems to address real-world problems.” He further explains that working with citizens to produce scientific knowledge is not an entirely new phenomenon: the practice dates back to initiatives that started in the early twentieth century (e.g., the National Audubon Society), though the term “citizen science” (CS) didn’t have currency until the 1990s (ibid.). However, what constitutes a newer development (now known as “citizen science 2.0”) is the growing “number of studies that use citizen scientists, the number of volunteers enlisted in the studies, and the scope of data they are asked to collect […] [and the] use of sophisticated equipment and techniques” (ibid., p. 193). In 2008, Cornell scientists estimated the existence of “thousands” of citizen science projects (ibid.), evidence that the phenomenon was gaining traction worldwide. According to Silvertown (2009), online crowdsourced initiatives, in particular, have garnered increased attention and risen in popularity, for reasons not dissimilar to those behind the rise of research in the Digital Humanities listed above (lower cost; better technology; etc.). Since 2008, researchers in various disciplines (usually within STEM) have classified CS projects (Wiggins and Crowston 2011, 2012), observed and analyzed the reception and perception of CS within academe (Riesch and Potter 2014), and noted outcomes and impacts of CS (Constant and Roberts 2017). Franzoni and Sauermann (2014) note that citizen science projects that have leveraged “the crowd” have led to scientific findings published in reputable journals such as Nature, Proceedings of the National Academy of Sciences, and Nature Biotechnology. Further, in the literature pertaining to CS typologies, researchers have also indicated that action-oriented projects and projects aimed at conservation efforts engage citizens on the levels of both scientific contribution and civic duty (Wiggins and Crowston 2011). However, despite diverse analyses of CS projects and initiatives, very little is said about the role that linguistic diversity or translation might play in these contexts. In fact, it would appear that in extant English-language literature,Footnote 3 only Michalak (2015) has discussed a connection between CS and translation, and only in relation to the educational potential for language acquisition, not the role or effects of translation or non-translation within CS spaces. English-language CS literature thus appears to assume English proficiency within the crowd implicitly and rarely problematizes this assumption explicitly, in turn echoing some of the remarks made about implicit and explicit Anglocentrism in the introductory section.

Moreover, despite the quasi-utopian promises of digital communication and technologies amplifying marginalized voices, the lack of diversity, and more specifically the lack of linguistic diversity, appears to remain the status quo in STEM and in online CS spaces. For example, Brinkworth et al. (2016) express concern about the lack of diversity in relation to the American Astronomical Society (AAS). Though their report presents American data, similar sentiments are shared by the international STEM community, whether these sentiments are focused on English as the predominant scientific language to the exclusion of other languages (Meneghini and Packer 2007), on issues related to gender parity in STEM (Devillard et al. 2017; Gaviola 2017), or centered on diversity across STEM in general (Ouimet 2015).

Proponents of online and digital technologies regularly tout increased and more diverse connectivity, and while the amorphous citizen science “crowd” should, one would presume, be a collection of individuals with different linguistic, ethnic, racial, scientific, and cultural profiles, science (even science produced by this diverse crowd of citizen scientists) remains by and large framed by the epistemologies of Anglo-Saxon traditions and underpinned by Anglocentric computer programming (cf. Mahony 2018). Even if and when translation is present, it serves largely to feed into existing English-language scholarship, rarely the other way around (the translation flow can be illustrated as follows: peripheralFootnote 4 language → English) (Brisset 2008; UNESCO 2009; Buzelin 2014). It is also worth underscoring that while research exists on what motivates volunteer translators to translate citizen science projects and findings (Olohan 2014a), and while research on crowdsourced translation quality is gaining momentum (Jiménez-Crespo 2018), current literature features few case studies examining and leveraging translation flows to confirm (or disprove) Anglocentrism in these citizen science spaces (McDonough Dolmaya [2017, section 2.2] does, however, reference the concept of translation flows in her discussion of language policies on Wikipedia, a crowdsourced resource).

Furthermore, the online and digital data surrounding citizen science and crowdsourced translation (i.e., “conversations”/“user engagement”) have been largely neglected, obscuring critical insights that would, I posit, reinforce motivational profiling and supplement flow analyses. With the exception of Olohan’s (2014a) analysis of TEDTalk blog posts about translator motivations, few studies have integrated online social media data to flesh out the who, what, when, where, why, and how of scientific translation and citizen science translationFootnote 5 in online conversations. For the most part, when researchers profile citizen scientist motivations, they use participant-based methodologies (interviews; surveys) rather than observational and visualization methodologies. While interviews and surveys are valid approaches, each methodological framework provides differently curated and motivated insights: a survey or interview imposes a frame upon the participant from the outset, whereas a voluntary social media post belongs to another type of communicational context altogether. Olohan (ibid., p. 23) and Watson (2009) further clarify this important distinction: blog posts, and by extension social media posts (e.g., tweets, stories, or status updates), are

representations of the motives [or interests] that translators wish to communicate publicly. This constitutes an important distinction vis-à-vis research which analyses questionnaire data. In both cases, respondents make decisions about the information they wish to impart to putative readers. However, the text of a blog entry [or post] published online is not proffered for research purposes but is a form of crafted self-presentation. (Watson 2009)

Brooker et al. (2016, p. 1) echo this position: “Social media provides a form of user-generated data which may be unsolicited and unscripted, and which is often expressed multi-modally.”

It is also curious that crowdsourced translation and localization are assumed to be the de facto and exclusive translation “models” in the dissemination of online citizen science: in the limited literature on citizen science and translation, few other conceptualizations of translation are discussed.Footnote 6 Because the crowd has been solicited to translate (i.e., crowdsourced translation) popular CS sites such as Zooniverse,Footnote 7 and because Web localization is often the term used by industry to describe the translation and local adaptation of websites (cf. Jiménez-Crespo 2013), it makes sense that other forms of translation practice or phenomena would not readily come to mind (e.g., self-translation; embedded automatic machine translation; online and offline collaborative translation; computer-assisted translation). There is also a “widespread (but mistaken) impression” that multilayered translation ecosystems are “too complex” to research (Shuttleworth 2017, p. 311). This conceptualization of CS translation activity (i.e., crowdsourced/collaborative and/or localization) is also reflected in the discourse on translation within CS ecosystems. For example, in Zooniverse’s own ecosystem, more specifically the “Talk” section (a message/chat boardFootnote 8 feature on the site), a member of the Zooniverse team replies to another user, who was requesting localization features, by stating “We’re planning to add translations/localisation in the future,”Footnote 9 a reply that either conflates or distinguishes the two (the lack of additional contextual features makes it difficult to discern which) and that reveals some of the shorthands online users employ to discuss complex translation activities. In addition, most of the other discussion activity on the subject of translation in other Talk threads pertains primarily to making more Zooniverse features multilingual (i.e., user requests for increased multilingualism/translation more broadly and for additional translation features within “project builders”) rather than to the Zooniverse team attempting to create a holistic, site-wide translation policy (the issue of guidelines and policies will be discussed further in the section Translation Flow Analysis). This limited framing of translation, i.e., as a distinct and subsequent act that follows the conceptualization of a CS project, site, or platform, serves to reinforce the platform’s inherent Anglocentrism (e.g., site design, project building features, and other aspects related to programming). Programming languages and algorithms, for instance, are often connected to the languages in which they are produced, replicating epistemological biases of all kinds, as previously stated. Thus, translation features designed using specific programming languages or algorithms can also pose limitations for diversity and pluralism (and some of the archived Zooniverse Talk threads are suggestive of this). Additionally, on a micro-level, the terms “crowd translation”/“crowdsourced translation” and “localization” do not necessarily address the blurred lines of online interlingual communication or the complex interplay of human-computer interaction. In related literature, Jones (2018) notes the collaborative and “e-volving” co-construction of Wikipedia articles as a “muddy mix of translation, collating, summarizing and synthesizing,” while Desjardins (2017, 2019) has advocated for a layered understanding of translation in social media contexts.

These two more nuanced conceptualizations of online translation are relevant here because Zooniverse also hinges on complex and layered translation phenomena. Other conceptualizations of layered multilingual communication that have applicability in this case study are Li’s concept of “translanguaging” and Androutsopoulos’s discussion of “online multilingualism.” Li (2011, p. 1223) has proposed the psycholinguistic concept of “translanguaging” to refer to “going between different linguistic structures and systems, including different modalities (speaking, writing, signing, listening, reading, remembering) and going beyond them.” Androutsopoulos (2015, p. 187) builds on Li’s conceptualization to address online multilingualism: he explains that web content is often cast in different languages, at different times, i.e., a “configuration of ‘modules’ that co-exist in screen space.” In a sense, Androutsopoulos’s understanding of “networked multilingualism” intersects with the position taken here in relation to translation: translation activities enable, enact, and ensure networked multilingualism.

To state the interconnectedness of these terms more plainly: if a translation is produced by the crowd (i.e., crowdsourced translation) on a localized website (localization), this does not necessarily mean other forms of translation are not concurrently taking place; the digital realm necessarily multiplies the forms translation activity can take. The problem is that previous analyses tend to focus on one type of translation activity to the exclusion of others.Footnote 10 In this study, a concerted effort has been made to avoid translation binaries (e.g., source/target; translator/author) and to address the layered “materiality” (cf. Littau 2015) of online digital translation, thereby acknowledging concomitant variants of translation and multilingual activity. This has the benefit of approaching citizen science translation from a more holistic perspective.

By analyzing CS platforms and projects, as well as social conversations related to these initiatives, and by using a TS lens to do so, my team was able to investigate knowledge creation (scientific discovery), knowledge dissemination, and translation flows as knowledge in the Humanities, Social Sciences, and STEM fields was being produced (rather than after the fact), not only within academe, but within and by the general population. This in turn stands to fill some of the gaps identified in the reviewed literature.

Theoretical Framework and Methodological Modeling

The theoretical and methodological frameworks amalgamate insights from Citizen Science, Translation Studies, and Social Media Studies, thus squarely positioning the overarching research design within the scope of the transdisciplinary Digital Humanities. The research worldview (Creswell and Creswell 2018) is twofold. On the one hand, I have elected to implement a transformative worldview (ibid.), because the project’s mandate is, in part, to inform and guide more equitable exchange and dissemination of citizen science capitals. Once points of non-translation or asymmetrical translation flows in this newer arena are indicated, it becomes difficult to refute the lack of concerted strategies to ensure linguistic justice,Footnote 11 an issue which has been raised by the CS community itself (cf. Sturm et al. 2018). If the general CS community is informed of these linguistic and translation lacunae and/or asymmetries, the argument is that more linguistic diversity can then be encouraged or required by policy or best practice guidance (i.e., transformation of existing practices or paradigms). On the other hand, the case study’s research design is also modeled by a pragmatic worldview (Creswell and Creswell 2018), meaning that the principal investigator (PI) and research team did not set out with a predetermined set of methodological approaches to analyze the data. Rather, following in-depth literature reviews in the fields of Translation Studies, Citizen Science, and Social Media Studies, the team, under my supervision, adapted the theoretical and methodological modeling to allow a mixed-methods and pluralistic approach to evolve (iterative research design). This project leverages both quantitative data (e.g., number of translated projects; number of platforms; statistical analyses) and qualitative data (e.g., social conversations; network visualizations; hashtag indexing; sentiment analysis); according to Creswell and Creswell (2018), this is considered a convergent mixed-methods approach. To the extent that individual translators and citizen scientists have not been interviewed, controlled, or tested, and that only public and anonymized data are analyzed, the research falls within observational work that presents minimal risk.Footnote 12 Given the relatively unprecedented nature of the project, particularly within TS, it is also possible to classify this research as exploratory and experimental.

Within the purview of TS, this study falls under the umbrella of product-oriented and context-oriented research (Saldanha and O’Brien 2013). An argument could also be made that the project falls within the classification of process-oriented analysis (ibid.), given that consideration was given to any data evocative of translation processes, translation workflows, and best practices.

In their article titled “Translating Science,” Olohan and Salama-Carr (2011) argued that to fully comprehend how scientific knowledge is disseminated and circulated, the import of translation (and associated practices, such as interpretation and localization) could not be ignored. Their call focused primarily on STEM fields, though the position taken here also takes into consideration scientific research in the Humanities and Social Sciences. Too often these fields are compartmentalized, which in turn reinforces disciplinary silos to the detriment of novel ways of thinking and conducting research. Historically, CS was assumed to be a practice generally associated with STEM, given its connection to institutions or organizations like the National Audubon Society. However, what platforms like Zooniverse indicate is a growing trend within the Social Sciences and Humanities (Arts) to also involve citizens in the research process. In fact, Zooniverse actively blurs epistemological lines by cross-classifying projects across the disciplinary spectrum (an illustration of multi-, inter-, trans-, and pluri-disciplinarity if there ever was one). For instance, a project titled “SONYC: Sounds of New York City”Footnote 13 is cross-classified under “Social Sciences” and “Physics,” suggesting that the project has resonance for both fields. This type of classification—one that perhaps does away with former or more traditional classification systems—also means that a project listed under two categories rather than one is likely to elicit engagement from a wider pool of citizens and researchers.
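
To make this cross-classification concrete, the following is a minimal Python sketch (a hypothetical data-structure model, not Zooniverse’s actual implementation) of projects tagged under multiple disciplines; only “SONYC” and its two tags come from the chapter, and the other entries are invented:

```python
# Cross-classification modeled as a many-to-many mapping from project
# to discipline tags. All entries except "SONYC" are invented.
projects = {
    "SONYC: Sounds of New York City": {"Social Sciences", "Physics"},
    "Invented Project A": {"Biology"},
    "Invented Project B": {"Arts", "History"},
}

# A project filed under two or more categories is discoverable from
# more than one disciplinary "entry point" in the catalog.
for name, tags in projects.items():
    if len(tags) >= 2:
        print(f"{name} appears under: {', '.join(sorted(tags))}")
```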

Given the transdisciplinary framework (Fig. 1), the case study’s observations and data have import for the Humanities and Social Sciences as well as the STEM fields. Moreover, because analytical consideration is given to citizen engagement beyond CS platforms, extending, for instance, to other social platforms like Facebook and Twitter, the case study is also informed by theories emanating from Social Media Studies. Specifically, two ways of analyzing social media data are used:

1. social media analytics/analysis, i.e., tracking online conversations on social media (this framework is inspired by Part III of The SAGE Handbook of Social Media Research Methods [Sloan and Quan-Haase 2017], which focuses primarily on qualitative approaches to social data); and

2. social network analysis (SNA) through network visualizations. SNA can be conducted in offline and online settings (Marin and Wellman 2014). Here, SNA is used only to analyze online social networks, specifically those that relate to the CS platforms under study.

Fig. 1 The project’s theoretical scaffolding

Case Study: Platforms and Analyzers

As the literature on CS indicates, the proliferation of CS platforms and projects has been on the rise. It would be impossible to analyze all CS platforms and projects within the scope of a single study, so primacy was given to Zooniverse because of its mainstream popularity and the standard it has set for other CS platforms. Zooniverse was chosen because it includes CS projects from across the disciplinary spectrum, which enables the comparison of social data and translation flow data between disciplines, as well as within disciplines. Comparatively, many other platforms focus exclusively on the natural sciences, conservation projects, or other similar thematic initiatives. Zooniverse also has a site dedicated to the crowdsourcing of project translation, Zooniverse Translations,Footnote 14 which initially suggested readily available translation data. Unfortunately, access to Zooniverse Translations has posed a problem: a number of requests for access to the data and applications to participate in project translation were made, but no reply had been received by the time of publication. It is hoped that this data will become retrievable in future research; in the meantime, the Zooniverse platform still comprises other relevant translation-related data, which is presented in the two “Initial Findings” sections that follow.

Another noteworthy Zooniverse feature is the option to embed social media buttons linking to external social channels within CS project builders, including Facebook and Twitter. This facilitates the observation and analysis of language use/translation activity on Zooniverse and social engagement (e.g., on Facebook and/or Twitter and/or YouTube). These supplementary off-site social conversations provide clues related to translation flows or translation agents, which can then be integrated into SNA analyses and other descriptive analyses.

While a number of analyzers are available, the choice of network analyzer and social analytic tools was determined by several factors and constraints. One factor was familiarity: network analyzers are not yet commonly used tools in TS (though they are gaining popularity) and, for this reason, it seemed judicious to select a tool with a proven track record in other academic arenas and a user-friendly site for getting started. Following an introductory presentation on the social analyzer Netlytic at the 2017 “Social Media + Society” conference held in Toronto, I decided it would be opportune to use Netlytic (Gruzd 2016) as the primary analyzer. This cloud-based social network analyzer is economical (entry-level data tiers are free and project-specific requests can be accommodated); it has a proven track record in a variety of case studies;Footnote 15 and it is the product of a Canadian-led research team (Ryerson University), which met the criterion of supporting research collaboration among Canadian scholars and research institutions. The argument has been made that there is sometimes a lack of transparency regarding the choice of analyzers/tools (Raghavan 2014; Brooker et al. 2016). In this case, although the rationale for using Netlytic could be flagged for some shortcomings (in hindsight, other tools might have been more intuitive or afforded more scalability or other relevant features), full transparency about the selection criteria has been provided.

Translation Flow Analysis: Initial Findings

My team’s first objective was to chart translation activity across the Zooniverse platform, including projects from across the disciplinary spectrum, using translation flow analysis. The first phaseFootnote 16 of the analysis took place from September 2018 to May 2019, during which new, paused, and completed Zooniverse projects were tracked. Although the data collected during this first phase cannot illustrate diachronic trends conclusively, they nonetheless provide a snapshot of a nearly year-long cycle and could be used in future comparative work to establish longer-term trends.Footnote 17 Overall, 132 Zooniverse projects in the Social Sciences, Arts, and SciencesFootnote 18 were individually examined for translation activity (e.g., evidence of different language versions; localization; crowdsourcing; self-translation) or translation features (e.g., evidence of volunteer translator forms; translation buttons; or explicit translator profiles). In total, nine projects had either been translated (project site available in two or more languages) or had prominent translation features, only two of which were still active at the time of writing (July 2019).Footnote 19 All nine projects fell under the overarching category of STEM disciplines. On the one hand, this is not necessarily surprising, in that STEM citizen science has a longer history/tradition than citizen science in the Humanities and Social Sciences. In addition, many larger-scale STEM CS initiatives have interinstitutional or collaborative teams at their helm, which would likely involve the use of English as the lingua academica alongside other languages needed to carry out local/geographically specific tasks. For instance, observational astronomy requires the coordination of telescopes or observatories in different geographical spaces, which would more easily explain the presence of multilingual teams and the need for translation compared to a project situated within a single institution in a specific city where only one language is used.Footnote 20 It is also worth noting that these data indicate translation phenomena are more prevalent in the sub-category “Physics and Astronomy (Space)” than in any other STEM discipline: all of the projects with more than three (>3) translated project sites (i.e., at least four different languages within one, larger project ecosystem) belonged to this sub-category. In total, these nine projects comprise 15 different languages. Table 1 presents these initial findings.

Table 1 Translated Zooniverse projects, status, language combinations (Sept. 2018–May 2019)
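
For readers who wish to reproduce this kind of tally, the following is a minimal sketch, not the team’s actual pipeline, assuming a hand-compiled CSV with one row per project and a semicolon-separated "languages" column (the file name and column names are hypothetical):

```python
# Tally translated projects and distinct languages from a hypothetical
# CSV export of the project catalog (columns: name, discipline, languages).
import csv
from collections import Counter

total = 0
translated = []          # projects whose site exists in 2+ languages
language_counts = Counter()

with open("zooniverse_projects_2018-2019.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        total += 1
        languages = [l.strip() for l in row["languages"].split(";") if l.strip()]
        if len(languages) >= 2:
            translated.append((row["name"], row["discipline"], languages))
            language_counts.update(languages)

share = len(translated) / total
print(f"Translated projects: {len(translated)}/{total} ({share:.0%})")
print(f"Distinct languages across translated projects: {len(language_counts)}")
```

Run against a catalog like the one compiled for this phase, such a script would yield the 9/132 share and the 15-language count discussed below.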

Although it is encouraging from the perspective of linguistic diversity and linguistic justice to see a relatively diverse range of represented languages (i.e., languages from different language trees or in different “positions” [cf. Calvet 1999]), English is unquestionably the platform’s lingua franca and “point-of-entry,” meaning that English proficiency is, in many regards, assumed by the site’s creators/developers, despite early claims of inclusivity (as indicated by the seemingly inactive Zooniverse Translations page). This diversity is also precarious: on a micro-scale, 15 languages may seem to suggest linguistic diversity, but only about 7% of Zooniverse’s entire project catalog (9/132) between Sept. 2018 and May 2019 was available, to varying degrees, in translation or in languages other than English. Unilingual citizen scientists who are not proficient in English, and who have either the expertise or interest (or both) to collaborate, are thus implicitly excluded from the outset: after all, if online content (in this case Zooniverse projects) is not readily available in one’s language, could this not deter potential citizen scientists (or users, more broadly) from participating in and contributing to Zooniverse projects?

In a recent CS workshop initiativeFootnote 21 that led to the publication of a series of “defining principles” (Sturm et al. 2018) for more inclusive/effective citizen science, participants and researchers noted: “we are not aware of a collection of recommendations specific for citizen science that provides support and advice for planning, design and data management and platforms that will assist learning from best practice and successful implementation” (p. 1). Specifically, researchers and stakeholders who attended the workshops addressed some of the challenges and barriers to conducting citizen science. The barriers/challenges were listed under six major areas:

1. interoperability and data standardization;

2. user interface and experience design;

3. outreach, learning, education, and other rewards of participation;

4. re-use;

5. sharing of learning; and, finally,

6. tracking participants’ contribution across different projects. (ibid., p. 5)

Of specific relevance for TS, the focus group tackling the area of “Outreach” addressed the “socio-technical” nature of CS projects and evoked the importance of cultural sensitivity, yet no overt mention was made of linguistic or translation-related barriers, despite the connection between culture, epistemology, and language. For instance, the Anglocentric biases related to programming languages, or the way Zooniverse project builders tend to be modeled according to Eurocentric or predominantly North American paradigms, could have been problematized within the discussion on linguistic, geographic, and cultural representation, but they were not. This is, in my estimation, not only an important oversight in an attempt to create overarching best practices in online CS, but also endemic among initiatives that try to promote greater linguistic diversity. If translation and multilingualism are an afterthought or entirely omitted from the scientific discussion, then so too is linguistic equity in scientific inquiry.

Zooniverse does have a page titled “Best Practices For Engagement and Success,”Footnote 22 indicating some consideration for equity and accessibility, but the issue of translation is never addressed explicitly in the three sections of the “Best Practices.”Footnote 23 No consideration or guidance is given, for example, about how to create engagement beyond English or how multilingual “Talk” features or social media outreach in languages other than English might elicit more responses or citizen scientist engagement from beyond the “usual” Zooniverse volunteer pool. Nothing is said about programming language bias or project builder bias and how either would impact user contributions and/or engagement. Further, the lack of overarching translation guidance likely explains some of the inconsistencies noted among the different projects in the translation flow analysis. Of course, Zooniverse projects might also be bound by the constraints/modalities of grant or funding agencies, and this could certainly impact how a project is conceptualized, built, and shared. For instance, a funding agency requiring multilingual knowledge dissemination might require translation (or multilingual/translingual options), while projects without this requirement may prioritize other project features. In a parallel comparative analysis of Zooniverse projects and the Government of Canada’s Citizen Science PortalFootnote 24 projects, the latter having a translation/linguistic funding requirement, it is interesting to note how a systematic translation policy impacts project building and citizen engagement. Although this part of the research is still in progress (part of phase 2), preliminary analysis shows the Canadian Citizen Science Portal has a concerted bilingual social media engagement policy, which means research communities and citizen communities fluent in either official Canadian language are solicited. Said differently, translation and multilingual engagement/dissemination are not afterthoughts on the Canadian Citizen Science Portal; rather, translation is integral to the conceptualization and user experience (which intersects with some of the best practices evoked in Sturm et al.’s previously referenced report). However, an explicit translation policy does not necessarily mean that the underpinning motive is that of greater linguistic justice and language representation. In the case of the Canadian Citizen Science Portal, the representation of the two official languages speaks directly to national language policy, which frames all publicly funded initiatives to varying degrees. If linguistic justice and greater equity were the guiding principles, one could make the argument for linguistic representation beyond the official languages (English; French) to include peripheral or endangered languages (Indigenous languages being a particularly probing example). Nonetheless, national language policies, such as the Canadian Official Languages Act, do seem to encourage project conceptualization and knowledge dissemination in languages other than English.

Social Media Analysis and Social Network Analysis: Initial Findings

The rationale for extending the analysis beyond Zooniverse project ecosystems was that doing so would facilitate a better understanding of how more mainstream social platforms and apps are mobilized in CS engagement and outreach. The goal here was to determine whether “active,” “still available,” or “paused” translated Zooniverse projects used multilingual communication or translation in their social media engagement strategy (i.e., did they use platforms such as Facebook or Twitter to create engagement or share content in languages other than English?). The three projects that fit under this category were “Condor Watch,” “Cyclone Center,” and “Snapshot Hoge Veluwe.” Condor Watch and Cyclone Center both mobilized social engagement beyond their immediate project ecosystems; however, activity on these external social platforms was conveyed primarily in English. For example, analysis of Condor Watch’s Twitter account (@condorwatch) and feed revealed activity exclusively in English, despite the Zooniverse project having French and Polish versions. Figure 2 shows the descriptive text for the @condorwatch account, which makes no explicit reference to the French or Polish versions of the project or to the fact that prospective citizen scientists could engage in either of these languages. In another example, the Cyclone Center team uploaded three YouTube tutorials in 2015, all of which are available in English only.Footnote 25 In the case of Snapshot Hoge Veluwe, the project ecosystem is available in both English and Dutch, and it lists four social channels external to Zooniverse: Facebook, Twitter, Instagram, and YouTube. However, upon closer inspection, these social accounts are in fact the De Hoge Veluwe national park channels, not those of the Snapshot Hoge Veluwe research project team. Therefore, the user engagement is not directly related to the Zooniverse project.

Fig. 2 The description box on the @condorwatch Twitter profile page, as of June 1, 2019

Although the data are quite limited, these examples constitute missed opportunities for wider linguistic and cultural engagement. For instance, the Condor Watch team could easily inform its Twitter followers that they can contribute to the project in three languages. Similarly, Cyclone Center’s YouTube tutorials could be dubbed or subtitled to ensure Chinese and Italian speakers have access to relevant support and training materials in the same way English speakers do (it is worth recalling that Cyclone Center’s Zooniverse ecosystem is available in English, Chinese, and Italian; see Table 1). In line with the recommendations explored during the aforementioned European workshops, the argument is that by signaling, at the very least, the option or availability of translated or multilingual resources/content, research teams leveraging citizen scientists would be addressing the principles of greater outreach, re-use, and sharing.

What the use of social platforms like YouTube, Facebook, and Twitter also shows is that citizen science escapes the “usual” spaces of citizen science portals or platforms, situating conversations in and around citizen science within larger social debates and networks. Netlytic allows researchers to investigate and visualize these conversations. My research team thus extended our analysis to these other platforms in an attempt to map social networks and to see what social conversations were configured around citizen science/translation.

As part of an iterative querying strategy, Netlytic was used to parse Twitter to find conversations and content centered on “translation” and “citizen science.” The idea was to see if these two terms “collocated” and if so, what users were tweeting about more specifically. From there, the hypothesis was that we might be able to identify any conversations that focused more specifically on Zooniverse. Netlytic can create two types of overarching data visualizations: one that is related to text (word clouds) and one that is related to networks (which can be generated in three different layouts: Fruchterman-Reingold, DrL, and Lgl). Using a list of key terms related to the project and Boolean operators, Twitter was queried on a few occasions to extract preliminary data. In Figs. 3 and 4, examples of word clouds connected to “#citizenscience” are presented.

Fig. 3 Example of a word cloud generated on May 27, 2019, using search query “#citizenscience”

Fig. 4 Example of a word cloud generated on May 31, 2019, using search query “#citizenscience”

The word clouds illustrate the other terms, accounts, or words that tend to “cluster” or “collocate” with “#citizenscience.” Although Figs. 3 and 4 present specific examples (specific dates), querying the same key term in several different tests revealed that most of the words appearing alongside “#citizenscience” were in English, and “translation” (as a key term, including derivatives queried using Boolean operators, such as “translat*”) did not appear once. Evidently, these initial queries were part of the research team’s iterative testing and querying, so these data are preliminary and synchronic in nature. In Phase 2, regular and consistent querying will be conducted to establish diachronic trends.
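
To make the collocation step concrete, here is a minimal Python sketch approximating (not reproducing) what a word-cloud report does with a batch of collected tweets; the sample tweets and stopword list are invented for illustration:

```python
# Count the terms that co-occur with "#citizenscience" in a batch of
# tweets; the most frequent collocates are what a word cloud renders largest.
import re
from collections import Counter

tweets = [  # invented examples
    "Join our #citizenscience project and help classify galaxies!",
    "New #citizenscience data on local bird populations #birding",
    "Volunteers needed for a #citizenscience water-quality survey",
]

STOPWORDS = {"a", "and", "for", "our", "on", "the", "to", "help", "new"}
TARGET = "#citizenscience"

collocates = Counter()
for tweet in tweets:
    tokens = re.findall(r"#?\w[\w-]*", tweet.lower())
    if TARGET in tokens:
        collocates.update(t for t in tokens if t != TARGET and t not in STOPWORDS)

print(collocates.most_common(10))
# Mirrors the chapter's "translat*" check: did any translation-related
# term appear alongside the hashtag?
print("translation-related terms present:",
      any(t.startswith("translat") for t in collocates))
```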

Network visualizations provide a visual representation of social conversations in and around the queried search terms, illustrating how online users are related to one another within a thematic social conversation. Network visualizations and plots created by Netlytic allow researchers to examine (1) centralization (the degree to which online conversations on a given platform are centralized or decentralized); (2) density (how closely connected participants are, which can help assess the speed of information flow); (3) reciprocity (the proportion of reciprocal ties or, more simply stated, “back-and-forth” conversation); (4) modularity (higher levels of modularity indicate that clusters in the network are distinct rather than overlapping); and (5) diameter (a network’s size, i.e., how many nodes it takes to get from one side to the other) (Gruzd 2016). As an example, Fig. 5 presents a network visualization using the search term “translation+studies,” showing the different “constellations” (or clusters) of users discussing “translation+studies” in their tweets. This visualization is meant to give readers an idea of what network visualizations look like; it does not present project-specific data.

Fig. 5 A test example of a network visualization (“translation + studies”) using Netlytic (July 17, 2019)
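
For readers who wish to compute the five measures listed above outside Netlytic, here is a minimal sketch using the Python library networkx; the reply network is invented, and a real edge list would be built from who-replies-to-whom data:

```python
import networkx as nx
from networkx.algorithms import community

# Directed "name network": an edge u -> v means user u replied to or
# mentioned user v. The edges below are invented for illustration.
G = nx.DiGraph([("ana", "ben"), ("ben", "ana"), ("cai", "ana"),
                ("dee", "cai"), ("eli", "ana"), ("fay", "eli")])

U = G.to_undirected()
largest = U.subgraph(max(nx.connected_components(U), key=len))

print("density:    ", nx.density(G))        # ties present / ties possible
print("reciprocity:", nx.reciprocity(G))    # share of mutual ("back-and-forth") ties
print("diameter:   ", nx.diameter(largest)) # longest shortest path in the main cluster

# Degree centralization (Freeman): how much the network hinges on one hub.
degrees = [d for _, d in U.degree()]
n, dmax = len(degrees), max(degrees)
print("centralization:", sum(dmax - d for d in degrees) / ((n - 1) * (n - 2)))

# Modularity: how cleanly the network splits into distinct clusters.
parts = community.greedy_modularity_communities(U)
print("modularity: ", community.modularity(U, parts))
```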

Although network visualization has been used in other disciplines such as Social Media Studies, recourse to this type of data visualization is still relatively new in TS.Footnote 26 Given the “sociological turn” in TS (cf. Angelelli 2014), in which focus shifted from microtextual analysis (e.g., earlier case studies inspired by corpus linguistics) to larger sociological aspects of translation and interpretation (i.e., the agents involved in the translation process; the sociocultural factors that impact translation phenomena), I argue that network visualization constitutes a relevant analytical tool in contemporary TS. Mapping and analyzing networks provides clues as to how some social configurations are formed and how they evolve (in some cases, in real time) in relation to specific thematic content, ostensibly answering the question of who is talking about translation, when, where, why, and how. In an ideal scenario, of course, the hope is that a query would generate a large-scale network with numerous “active agents,” which would suggest that a given key term or search query is generating significant, or at least active, engagement. However, when the same key terms were paired (“citizen science” + variants and “translation” + variantsFootnote 27) and run through Netlytic over the course of different tests, the team obtained zero hits (n = 0). This might initially suggest that examining conversations on the subject of citizen science and translation was a futile exercise. However, the non-representation of translation in online social conversations on the topic of citizen science is equally telling: if translation is not being discussed, or is discussed only minimally, could it mean that the dominance of English as the lingua franca is relatively unquestioned as well? Or does it mean (and likely so) that discussions on linguistic justice and scientific knowledge use different key terms, perhaps in different languages? If so, what are these terms and what does this signify for how we define translation and online multilingualism? These questions will guide the following phases of the project. It is also worth noting that Netlytic parses the 1000 most recent tweets, meaning that older tweets, or tweets falling beyond this number, do not appear in the visualization report. This underscores the importance of establishing a consistent querying strategy, particularly for diachronic analysis. As the project continues into its second analytical phase, the network analysis will be extended to capture conversations about citizen science and translation (and related derivatives/search terms) not only in English but in other languages as well.
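
Because each query returns only a bounded snapshot, one plausible querying strategy for diachronic analysis (a sketch under stated assumptions, not the team’s documented workflow) is to run the same query on a fixed schedule and merge successive snapshots into a deduplicated archive keyed by tweet ID; the file name and fetch step below are hypothetical:

```python
# Merge successive bounded snapshots (e.g., 1000-tweet exports) into one
# deduplicated JSON Lines archive for longer-term, diachronic analysis.
import json
from pathlib import Path

ARCHIVE = Path("citizenscience_tweets.jsonl")  # hypothetical archive file

def merge_snapshot(snapshot: list[dict]) -> int:
    """Append only tweets whose IDs are not yet archived; return the count added."""
    seen = set()
    if ARCHIVE.exists():
        with ARCHIVE.open(encoding="utf-8") as f:
            seen = {json.loads(line)["id"] for line in f}
    new = [t for t in snapshot if t["id"] not in seen]
    with ARCHIVE.open("a", encoding="utf-8") as f:
        for t in new:
            f.write(json.dumps(t) + "\n")
    return len(new)

# Usage: run on a fixed schedule (e.g., weekly) so the archive covers
# periods longer than any single snapshot.
added = merge_snapshot([{"id": "123", "text": "example #citizenscience post"}])
print(f"{added} new tweets archived")
```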

Conclusion

This chapter sought to present an overview of extant literature that justifies examining translation, as a sociological object of study, on online social platforms beyond some of the more recurrent examples in TS (i.e., Facebook, Twitter, Instagram, YouTube, and Wikipedia). The literature review also indicates the relevance of problematizing the dominance of English as the language of production and dissemination of scientific capitals. By examining the (non)presence of translation (and multilingual communication) on citizen science platforms, my team and I were able to present initial findings that support the hypothesis of Anglocentric bias in citizen science.

While CS purports to be inclusive on a macro-level, the Zooniverse data presented in this study show that, on a micro-level, citizen science remains, to a degree anyway, exclusive in the way that some of its tools, paradigms, and online interactions are construed. Moreover, the virtually nonexistentFootnote 28 social conversations (in English) on Twitter on the topic of citizen science and translation suggest that multilingual citizen science is not a forefront issue, even though the call for greater epistemological, linguistic, gender, and cultural diversity has been made in the STEM disciplines as well as in the Social Sciences and Humanities.

This project is still in its early analytical stages; however, the transdisciplinary theoretical and methodological framework merging insights from TS, CS, and Social Media Studies has applicability for future case studies in TS, even beyond citizen science. For instance, in a parallel project, a similar framework was used, to good effect, to analyze multilingual and translation phenomena on Netflix, another online social platform that warrants further study in TS.Footnote 29 In that case, the network visualizations illustrated a number of active clusters on Twitter discussing translation, interpretation, dubbing, and subtitling in relation to popular Netflix shows. As research progresses, the end goal will be to present a diachronic report of translated and non-translated CS projects on Zooniverse, with supplementary insights from related social media analysis and network visualizations.