Text and Data Mining Exceptions in Latin America

Schirru, Luca; Rocha de Souza, Allan; Valente, Mariana G.; de Perdigão Lana, Alice

doi:10.1007/s40319-024-01511-2

Abstract

Text and data mining (TDM) is a powerful tool in the knowledge discovery process and an essential step in the process of training Artificial Intelligence (AI) systems. Whether forms of use needed for TDM conflict with copyright rules is still a matter for debate within the specialized literature and when designing new legislation across the globe. Despite the borderless nature of research and the fact that the interplay between TDM and copyright is a matter of interest to all regions, most of the focus in the existing literature is on countries or examples from the Global North. This study contributes to filling this gap by providing additional information on recent developments across Latin America regarding the need for copyright legislation to adapt to data-intensive research practices and uses. It also provides a set of practical examples and issues specific to that region. It is hoped that these will contribute, at least partially, to a more universal approach to the issue around the globe.

1 Introduction

It is not uncommon for data-intensive research to use works that may be protected by copyright (papers, news articles, images, etc.), especially in the training phase.^{Footnote 1} Text and data mining (TDM) may be a powerful tool in the knowledge discovery process,^{Footnote 2} and an essential step in the process of training Artificial Intelligence (AI) systems. Whether forms of use needed for TDM conflict with copyright rules is still a matter for debate within the specialized literature and when designing new legislation. Legal certainty about the status of TDM-related practices, and, in general, about uses for research purposes, may be decisive for technological, scientific, and economic development. Previous studies have shown that Latin America brings together in one area a substantial number of countries in which either legislation on the matter is restrictive (in the case of uses for research purposes in general) or the subject is unregulated, as is the case for most copyright laws in the region when it comes to TDM.^{Footnote 3}

There is already an extensive body of literature analyzing TDM exceptions. However, even though the borderless nature of research makes the interplay between TDM and copyright a matter of interest to all regions, most of the focus in the existing literature is on countries or examples in the Global North.^{Footnote 4} In the EU, such issues as the potentially negative impact of existing copyright rules on TDM activities,^{Footnote 5} harmonization throughout the Union,^{Footnote 6} realization of the existing TDM exceptions on research and innovation,^{Footnote 7} and the narrow scope of the TDM exceptions^{Footnote 8} are just some of the topics addressed in the copyright literature.^{Footnote 9} In the United States, given its common law system, the literature analyzes the lawfulness of the acts carried out within the context of TDM research under the fair use doctrine,^{Footnote 10} often commenting on landmark cases.^{Footnote 11} The Japanese exception is also often referenced due to its particular wording and permissive scope.^{Footnote 12}

While the debate in Latin America may benefit from literature on international copyright law^{Footnote 13} and existing studies addressing the interplay between TDM and copyright on a global level, there are normative, socioeconomic and cultural characteristics unique to either the region or countries within it that must be considered in any analysis and design of a legal framework for research and innovation. This study aims to provide an analysis of the current debate and regulation on the topic in Latin America, the role of Latin American TDM research in the global research community, and examples of TDM practices that are key to research practices and uses.

Part 2 will outline the definition of TDM adopted herein and some of the main forms of use within TDM research, highlighting potential copyright issues in such practices already documented in the dominant literature. Part 3 will provide a broad overview of the international debate on the regulation of research practices and, more specifically, forms of use needed for TDM under copyright law, as well as the potential relationship between the openness of copyright law and research. Part 4 will present the current state of the copyright laws in the Latin American region when it comes to TDM and research exceptions, exemplify some TDM practices that are key to research, and analyze the role of Latin American TDM research in the global research community.^{Footnote 14} Finally, it will discuss the need for a well-tailored and balanced legal framework in the region.

2 Text and Data Mining and Copyright

Even though the training of AI systems cannot be reduced to TDM, and the two are not synonymous, there is a clear relationship between them, with the latter being an important step for the former. TDM practices often involve the automated analysis of works that could be protected by copyright (papers, news articles, images, etc.). Whether the use that needs to be made of these for TDM may trigger copyright rules is still a matter for debate within the specialized literature. While part of the literature argues that there must be an exception for TDM, other parts disagree and understand that such an exception would reinforce the misconception that TDM involves uses that, by law, fall under copyright holders’ exclusive rights. This section aims at elucidating some key applications within TDM research, and highlighting potential copyright concerns connected therewith, including nuances of the Latin American context concerning the intersection of TDM and copyright.

2.1 What is Text and Data Mining?

When it comes to existing laws for regulating TDM practices, one of the most cited is the European Union’s Directive 2019/790 on copyright and related rights in the Digital Single Market (CDSM). Article 2(2) thereof defines TDM as “any automated analytical technique aimed at analysing text and data in digital form in order to generate information which includes but is not limited to patterns, trends and correlations”.^{Footnote 15} An example from Latin America is Brazil’s bill on AI, which describes TDM as “the process of extracting and analyzing large amounts of data or partial or full excerpts of textual content, from which patterns and correlations are extracted that will generate relevant information for the development or use of artificial intelligence systems”.^{Footnote 16}

In the literature, data mining is defined by Han, Kamber and Pei^{Footnote 17} as “the process of discovering interesting patterns and knowledge from large amounts of data”. Izquierdo^{Footnote 18} defines it as “AI’s ability to interpret large quantities of raw data, [...] [and] by identifying patterns, to process them”.^{Footnote 19} Similarly, Ruiz Lobaina and Romero Suárez^{Footnote 20} propose the following definition: “Data Mining is the process [...] that deals with the non-trivial extraction of useful, hidden patterns that are inherent in data, and also the fastest way to study large volumes of information”.^{Footnote 21}

Although it could be challenging to give details of each and every phase of research employing TDM, given the multiple research fields,^{Footnote 22} projects and techniques,^{Footnote 23} the description provided in Carroll is helpful for illustrating some of the main steps in the process:

a multi-step process involving first the compilation of a dataset of text-based and related works into a format amenable to software-based statistical and related forms of pattern analysis. Researchers make multiple copies of the data during the TDM process. They make copies when they: (1) collect and compile the data; (2) format the data for computational processing; (3) process the data in a computer’s active memory; and (4) store or archive the data to enable reanalysis or to enable validation through reproducing the analysis.^{Footnote 24}

As seen in many of the available definitions on TDM, including those listed above, one of the common aspects is that the primary output of the employment of TDM techniques is the extraction of knowledge, patterns and correlations from a large amount of data. The output of TDM research is not supposed to reproduce any of the works used in the mining process in the final result and, practically, “little or none of the text, images or other forms of expression in the data appear in the TDM results”.^{Footnote 25}
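By way of illustration only, the four copying steps described by Carroll can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of any actual research project: the two-sentence corpus, the function names and the choice of word-pair (bigram) counting as the "pattern" are all hypothetical. It shows where copies arise along the way and that the mining result consists of aggregate counts rather than the text of the works themselves.

```python
import re
from collections import Counter

# Hypothetical miniature corpus standing in for the works a researcher compiles.
CORPUS = {
    "doc1": "Copyright law regulates text mining. Text mining finds patterns.",
    "doc2": "Researchers use text mining to find patterns in data.",
}


def collect(sources):
    """Step 1: collect and compile the data (a first copy of each work)."""
    return dict(sources)


def normalize(raw_docs):
    """Step 2: format the data for computational processing (a second copy)."""
    return {
        name: re.findall(r"[a-z]+", text.lower())
        for name, text in raw_docs.items()
    }


def mine(tokenized_docs):
    """Step 3: process the data in memory and return only aggregate bigram counts."""
    counts = Counter()
    for tokens in tokenized_docs.values():
        # Pair each word with its successor and count the pairs across the corpus.
        counts.update(zip(tokens, tokens[1:]))
    return counts


def archive(tokenized_docs):
    """Step 4: keep a copy so the analysis can be re-run or validated later."""
    return {name: list(tokens) for name, tokens in tokenized_docs.items()}


def run_pipeline(sources):
    raw = collect(sources)
    tokens = normalize(raw)
    patterns = mine(tokens)
    stored = archive(tokens)
    return patterns, stored


patterns, stored = run_pipeline(CORPUS)
print(patterns.most_common(1))
```

In this toy example the reported result is a frequency table of word pairs. The copies made in steps 1, 2 and 4 never appear in that result, which is the distinction the definitions above draw between the mining process and its output.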

Yet, whether the uses made of copyrighted works within TDM require prior authorization from copyright holders remains the subject of ongoing and growing debate within the legal literature, as will be further discussed below.

2.2 Is TDM a Copyright Issue?

As stated by Carroll, “researchers make multiple copies of the data during the TDM process”.^{Footnote 26} As will be further discussed in this section, there is a debate in the literature about whether these copies and, in general, the use of protected works for TDM may constitute copyright infringement. On the one hand, given that some parts of the process may involve copies or other uses of copyrighted material that may require the prior authorization of the rightholder,^{Footnote 27} researchers engaging in TDM practices may be liable for copyright infringement.^{Footnote 28} On the other hand, part of the legal literature considers that some uses made for TDM projects do not trigger copyright rules for multiple reasons.

Building upon debates on this matter in the dominant literature, this section will focus on the discussion about the lawfulness under copyright law of the acts carried out (e.g. reproductions made) during the TDM process. It is important to mention that this is just one facet of a more general discussion concerning the lawfulness of acts carried out within a project employing TDM. This broader discussion is further developed in the literature and may also raise related issues, such as cross-border uses; contract prohibitions on TDM practices; the lawful access requirement; and the legal, social and economic differences between countries in the Global South and Global North.^{Footnote 29}

One of the main theories discussed in the literature about the lawfulness of TDM practices when copyright rules are considered is the one on expressive and non-expressive uses. A non-expressive use, according to Sag, “refers to any act of reproduction that is not intended to enable human enjoyment, appreciation, or comprehension of the copied expression as expression”, while the expressive use of a work would be one connected to “human appreciation of the expressive qualities of that work”, as would be the case were one to “download a film to watch it, or photocopy a magazine article to read it”.^{Footnote 30} Sag further expands on non-expressive uses and the relationship with TDM by referring to the traditional idea-expression dichotomy:

The idea-expression dichotomy limits the rights of the copyright owner to the expressive elements of the author’s work. In a world of analog works printed on paper or etched in vinyl, this is achieved by simply holding that the copying of facts and ideas alone does not infringe. Preserving the idea-expression dichotomy in the digital world means recognizing that copying a work for purely non-expressive purposes also does not infringe.^{Footnote 31}

Under this proposition, if the necessary acts carried out (e.g. reproductions made) within a research project employing TDM are considered non-expressive, there would not be any copyright infringement.^{Footnote 32} Therefore, proposing and building TDM exceptions could reinforce the argument that, if there is no exception for this kind of practice, the uses made therein could be considered infringing.^{Footnote 33} While analyzing Arts. 3 and 4 of the CDSM, Margoni and Kretschmer provide an illustration of the argument:

Nevertheless, the effect of the dispositions contained in Arts. 3 and 4 CDSM is to formalise an interpretation that significantly reduces the ambit of application of the idea/fact/expression doctrines. This is achieved through the affirmation that non-protected mere facts and data when contained in protected works receive some sort of derivative or reflected form of protection since their (non-protected) reuse requires the making of some sort of transient or temporary copy of the (protected) containing work. In other words, the content is not protected in its own right, the container is. But because there is no viable form of using the content without also using the container, the protection of the latter extends to the former.^{Footnote 34}

Given that the major economies in Latin America are parties to key international treaties on copyright, such as the Berne Convention and TRIPS Agreement, as well as human rights treaties, the fundamental issues concerning TDM and the reach of copyright may, to some extent, be similar to those raised in this section. However, potential similarities do not extend much beyond this point, as will be further demonstrated in Part 4 of this article.

2.3 What Changes with Generative AI?

Recently, the popularization of generative AI^{Footnote 35} systems has highlighted certain copyright-related – and many other^{Footnote 36} – issues relating to their training and use. When it comes to copyright, the literature has discussed how far copyright rules extend to AI-generated output, and the use of copyrighted works to train these systems. Considering the commercial purpose behind some of the popular generative AI systems and their outputs’ potential substitutive effect, the use of copyrighted works for training them may require a different treatment by copyright law than uses for research and public-interest-related purposes. Although the use of AI to develop products that are equivalent in ‘artistic’ or literary terms is no longer new, the speed and intensity with which technology has evolved in the last couple of years have brought urgency to the legislative discussion both in Latin America and around the globe.^{Footnote 37} Projects such as the well-known Next Rembrandt,^{Footnote 38} Portrait of Edmond Belamy,^{Footnote 39} or Sunspring^{Footnote 40} are still very popular examples for sparking discussion on the interplay between AI and copyright. However, more recently, the focus appears to have shifted towards generative AI systems like those offered by OpenAI.^{Footnote 41}

Although some of the technical aspects of the use of copyrighted works to train these generative AI systems may not differ substantially from those for the general use of TDM in research,^{Footnote 42} other aspects may need to be regulated differently.^{Footnote 43} One of the differences in these recent generative AI systems, when compared with both the use of TDM for research and the projects popularized up to 2020 (e.g. Next Rembrandt and Sunspring), is that, unlike their predecessors, they are typically accessible to the public, and often provided as a service with both free and premium/pro (paid) versions. Another difference lies in their expected output: while the results expected from TDM practices in research, for example, are generally patterns and correlations (elements not protected by copyright),^{Footnote 44} the output expected from a generative AI system consists, as a rule, of products (e.g. illustrations and texts) that are objectively indistinguishable from human creations,^{Footnote 45} and may directly compete with works used in their training.^{Footnote 46} That may affect markets once exclusively populated by humans (e.g. translators, dubbing actors, designers, illustrators). Moreover, the fact that most of these systems are freely available to anyone interested in operating them allows the simultaneous creation of countless potentially competing products at little or no cost. On the other hand, some applications employing generative AI systems may assist human creators in the creative process.

Questions around fairness in the use of copyrighted works for training generative AI systems have been the object of many lawsuits.^{Footnote 47} However important, the analysis of generative AI transcends the purpose of this article. For now, it is fair to say that there are solid arguments for advocating that these activities be regulated differently than the forms of use needed for TDM for research and/or by public interest institutions when carrying out their activities.

3 TDM Exceptions Worldwide

In the past decade, there have been developments both in the field of legislation relating to TDM practices and in the related copyright literature. While these developments may have different dynamics and results according to local practices and their jurisdiction, they mostly share at least one common concern: how should copyright law confront and regulate the forms of use needed in the context of TDM? This section will provide an overview of some of the current research on TDM exceptions worldwide.

3.1 Research (and TDM) Exceptions Worldwide

TDM is a powerful tool for processing and analyzing large amounts of data within the scope of research activities and is key for contemporary computational research. When applied within the scope of research uses, TDM may be enabled, or even incentivized, by a permissive and general provision addressing such research uses in copyright law, or, alternatively, may be constricted or even made legally impossible. This section of our study will address some recent studies that have mapped and analyzed the text of copyright laws around the world, focusing on the available research exceptions and others from which TDM could benefit.

Flynn, Schirru, Palmedo and Izquierdo categorize “the world’s copyright laws according to the degree to which they provide exceptions to copyright exclusivity for research uses”.^{Footnote 48} One of the typologies adopted in the article considered six different categories, classifying research exceptions from the most open (green) to the most restrictive (red). For the purposes of this study, it is enough to note that, as seen from the map below (Fig. 1), most of the countries colored red are concentrated in Latin America. These countries, as described in the study, “specifically limit research exceptions to uses of excerpts of works, which is the minimum standard for limitations and exceptions required by Article 10(1) of the Berne Convention”. This could be inadequate for TDM purposes.^{Footnote 49}

Palmedo et al. (2023) expanded on the analysis in Flynn, Schirru, Palmedo and Izquierdo^{Footnote 50} by analyzing “a new dataset of copyright exceptions for researchers in 165 countries over 21 years”, with the goal of tracing changes in said laws.^{Footnote 51} The color scheme adopted in the previous study was adapted into a coded scoring system (switching from colors to numbers), and one of the findings is that “[w]ealthier countries, on average, are more likely to have copyright exceptions allowing greater unauthorized uses for research purposes than other countries”.^{Footnote 52} When it comes to variations over time, Table 2 of that study shows that Latin American countries appear mostly in the list of “Countries with Decreasing Scores” (a movement towards more restrictive legislation), while the list of “Countries with Increasing Scores” contains more countries located in the Global North (Fig. 2).

Using the dataset in Palmedo et al.,^{Footnote 53} and carrying out a similar analysis focused only on the Latin American countries’ scores, it can be seen that, apart from Ecuador, the trend in Latin America has been for legislation to become more restrictive over the selected time period. Antigua and Barbuda (5 to 1), Brazil (4 to 0), Dominica (5 to 1), and Grenada (5 to 1) each decreased by 4 points, and Panama decreased by 3 points (3 to 0). On the other hand, Ecuador had an increase of 2 points (Fig. 3).
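The arithmetic behind these changes can be reproduced with a short Python sketch. It uses only the start and end scores stated in the paragraph above; the variable and function names are illustrative, and Ecuador is left out because the text reports only its net change, not its start and end scores.

```python
# (start, end) scores for the selected period, as reported in the text above.
SCORES = {
    "Antigua and Barbuda": (5, 1),
    "Brazil": (4, 0),
    "Dominica": (5, 1),
    "Grenada": (5, 1),
    "Panama": (3, 0),
}


def score_changes(scores):
    """Return end-minus-start for each country; negative means more restrictive."""
    return {country: end - start for country, (start, end) in scores.items()}


changes = score_changes(SCORES)

# List countries from the largest decrease to the smallest.
for country, delta in sorted(changes.items(), key=lambda item: item[1]):
    print(f"{country}: {delta:+d}")
```

A negative value corresponds to a move towards more restrictive legislation under the coded scoring system described above.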

When considering the available research exceptions, it becomes evident that not only does the present legal framework fall short of being optimal for research purposes, but it also reflects a broader trend towards increased restrictiveness in the region. This trend contradicts the movement observed in developed countries and, as will be further developed in Section 3.2, may negatively impact research and innovation in the region.

When it comes to specific TDM exceptions, and building on the color-coding scheme previously mentioned, Flynn, Schirru, Palmedo and Izquierdo^{Footnote 54} also provide an overview of existing provisions. As shown in the figure below, most of the existing exceptions were concentrated in Global North countries. In general, existing TDM exceptions enable TDM practices for research purposes, even where there are restrictions on the used works or permitted forms of use. Additional requirements seen in some jurisdictions, for example the need for lawful access and protection against contractual overridability, are also mapped.^{Footnote 55} As analyzed by the authors, the only TDM exception found in Latin America was in Ecuador, which was flagged red owing to potential restrictions when it comes to the forms of use needed for conducting TDM research (Fig. 4).^{Footnote 56}

Latin American countries also make up a significant proportion of those countries in which copyright law restricts research uses and is mostly silent on TDM practices. Next, we will analyze the potential negative consequences of this legal landscape on research in the region.

3.2 The Relationship between Balanced Copyright Regimes and TDM Research

Although the link between stronger and more restrictive intellectual property rights and higher innovation remains unclear,^{Footnote 57} recent empirical evidence in the literature shows a negative association between restrictive copyright rules and innovation,^{Footnote 58} as well as a positive relationship between more open and permissive copyright systems and research.^{Footnote 59}

By building a “User Rights Database”^{Footnote 60} and applying econometric tests to the data available therein, Flynn and Palmedo analyzed 21 countries’ copyright laws between 1970 and 2016.^{Footnote 61} The study found that “researchers in countries with more open user rights environments produce more scholarly output and more high-quality output”.^{Footnote 62} When it comes to the impact of more open copyright laws on copyright-intensive industries, the findings were either neutral^{Footnote 63} or positive. The study found that “more open user rights environments are associated with higher firm revenues” for information industries, and that “more open user rights environments are not associated with harm to industries […] such as publishing and entertainment”.^{Footnote 64}

When comparing developed and developing countries, the study illustrates the gap between the level of openness of the legislation in both categories, concluding that: “developing countries in our sample are now at the level of openness that existed in the wealthy countries about thirty years ago” (Fig. 5).^{Footnote 65}

Handke, Guibault and Vallbé^{Footnote 66} found “strong evidence for stricter copyright hindering the wide adoption of novel ways to build on copyright works and generate derivative works”. The study analyzed “bibliometric data to establish how various copyright policies affect the application of DM [data mining] in academic research”,^{Footnote 67} covering data available in the Web of Science between 1992 and 2014. Amongst the findings of the study are the following:

countries in which academic researchers must acquire the express consent of rights holders [which includes all researched Latin American countries] to conduct lawful DM exhibit a lower share of DM research output in their total research output […] This implies that an application of copyright exceptions or limitations that establish the right to mine for academic researchers – if they have lawful access to input works and irrespective of explicit rights holder consent – boosts DM research.^{Footnote 68}

Recently, some of the authors involved in the previous study presented a related study involving a larger research team and additional data. The research worked with a dataset of 1.5 million TDM-related articles,^{Footnote 69} and shared similar objectives to the research from 2021, i.e. to understand how much TDM research was published per country and per year,^{Footnote 70} and – as there may be additional factors influencing the outcomes – to analyze the potential correlation between the amount of TDM research being produced and the degree of “openness” of the copyright law of these countries (Fig. 6).

The conclusions aligned with each other: they found that countries with more restrictive laws tended to be less productive when it came to TDM-related research than countries with more open copyright laws.

By comparing the table above, which gives updated numbers of TDM research articles, with the results presented in Flynn and Palmedo, Handke, Guibault and Vallbé, Flynn, Schirru, Palmedo, and Izquierdo, and Palmedo et al.,^{Footnote 71} it can be seen that countries commonly referred to as having more “open” and “permissive” copyright law exceptions (e.g. USA, Germany, Japan, Australia, and Canada) are usually also amongst the countries with more published research on TDM.

4 Text and Data Mining in Latin America

Legal clarity, indeed certainty, on research uses and, more specifically, TDM-related practices under copyright law may be decisive for technological, scientific, and economic development. The current legal framework in Latin America, in general, is not well equipped to deal with TDM practices. This may significantly affect research and the development of the region’s AI industry, leaving countries and researchers with fewer options, which are themselves far from optimal, such as the adoption of pre-trained models and datasets prone to bias and local inadequacy. This section of our analysis focuses on the current legal status of TDM and research exceptions in Latin America and gives examples of TDM practices that are key for research. Finally, this section will address some issues arising from qualitative interviews with practitioners from different fields that may contribute to the design of a well-tailored legal framework in the region.

4.1 TDM and Research Exceptions in Latin America

Previous studies on research exceptions in copyright law illustrate the alarming scenario for researchers working with TDM in Latin America, as most of the laws in the region are restrictive and insufficient for fostering innovation,^{Footnote 72} and/or do not provide a general exception for research/TDM.^{Footnote 73} Moreover, apart from recent and localized developments towards a more extensive approach (based on fundamental rights),^{Footnote 74} limitations and exceptions (L&Es) are largely interpreted restrictively in Latin American countries.^{Footnote 75} This creates substantial legal obstacles for different common practices, especially for libraries, archives, museums, and research and education institutions, and their agents.^{Footnote 76}

Regarding TDM specifically, the situation is no different. By analyzing the copyright laws of the five largest economies of South America at the time (Argentina, Brazil, Chile, Colombia and Peru), Bertón (2021) concluded that the copyright systems in all five countries were “not prepared for digital research techniques such as text and data mining” and imposed “limits that do not cover the needs of TDM researchers and put the region at a competitive disadvantage for keeping up with the latest developments in AI.”^{Footnote 77}

A recent study^{Footnote 78} presented in interactive maps^{Footnote 79} divides the copyright exceptions of 19 countries^{Footnote 80} into three different categories: (i) educational purposes; (ii) libraries and archives; and (iii) research and new technologies. When compared with the results obtained by Flynn, Schirru, Palmedo and Izquierdo (2022), it is clear that, by 2022, the only TDM provision in Latin America was the one enacted by Ecuador, which was further analyzed in the latter study.

However, this does not mean that TDM policies have not been addressed in Latin America. Recently, there have been multiple initiatives in the region concerning the creation of a TDM provision or amendments in the copyright law that may positively impact research practices. In July 2020, Senator Antares Guadalupe proposed including a TDM limitation in Mexican copyright law. The provision would allow reproductions and extractions for TDM purposes, conditional upon lawful access.^{Footnote 81} A Uruguayan proposal first dated 2020 was to change the copyright law to allow works to be reproduced for computational analysis within the scope of non-commercial research.^{Footnote 82} In Brazil, a TDM limitation is currently being discussed in the Senate as part of a bill on AI.^{Footnote 83} Article 42 of that bill (Bill 2338/2023) authorizes “automated use of works, such as extraction, reproduction, storage, and transformation, in data and text mining processes in artificial intelligence systems”, limited to “activities carried out by research and journalism organizations and institutions, and by museums, archives, and libraries”.^{Footnote 84}

Despite important initiatives at national level to foster the debate on TDM and copyright, its importance to research, and the need for such exceptions in copyright laws,^{Footnote 85} the actual revision of the relevant legal text still has to be addressed. Legal harmonization, and not only at regional level, may also be of crucial importance for narrowing the gap that exists between developing and developed countries.^{Footnote 86} The current legal landscape and the fact that developing countries are currently not updating their legislation to properly regulate data-driven research and the use of TDM-related tools create unjustifiable legal obstacles for researchers and research institutions,^{Footnote 87} both at national and regional level and in cross-border collaboration.^{Footnote 88}

4.2 TDM Practices and Research in Latin America

Data-driven research has many applications in a wide variety of areas, and this is no different in Latin America. Examples of data-intensive research can be found in health,^{Footnote 89} including, very specifically, regarding neglected diseases common in the region;^{Footnote 90} in information management by librarians;^{Footnote 91} in comparing the quality of external communication by universities;^{Footnote 92} and in avoiding the spread of misinformation on the web.^{Footnote 93} This section will provide further details on examples in the area of health, focusing on the impact of the recent COVID-19 pandemic and the struggle with neglected diseases. In addition, we will report on some of the main results of an empirical study carried out in Latin America in 2021.

4.2.1 The Role of TDM in Health Research

Brazil has been hit hard by the COVID-19 pandemic in terms of number of cases and deaths. By 4 February 2024, Brazil had the sixth highest number of confirmed cases (37.5 million) and the second highest number of confirmed deaths (702,100).^{Footnote 94} These numbers may be even higher owing to possible underreporting.

Recent years have seen research goals all over the world redirected in order to fight the SARS-CoV-2 virus.^{Footnote 95} By 4 February 2024, there were 28,592 articles addressing matters related to COVID-19 in the medRxiv and bioRxiv repositories. Between 1 March 2020 and 1 March 2021 alone, a search for the term “coronavirus” in the bioRxiv and medRxiv repositories brought up 13,038 results.^{Footnote 96} The articles address issues related to the treatment, diagnostics and other medical procedures for the new coronavirus.^{Footnote 97} These numbers show that in these two repositories alone, a significant amount of data was already available to deal with the new SARS-CoV-2 virus. Together with all the other content made available in other repositories, it would be humanly impossible for health professionals and scientists to read and extract all the information available on the internet.

Here, automatic and computational text and data analysis for uncovering patterns and correlations makes a difference in the speed and scope of findings.^{Footnote 98} However, while medRxiv expressly allows text and data mining in its database,^{Footnote 99} and there were several initiatives towards openness and access for research purposes during the COVID-19 pandemic,^{Footnote 100} this study provides evidence that there are not enough legal possibilities in the copyright laws of Latin American countries for carrying out the acts of use needed for research purposes. That is likely to affect the response to future challenges. Taking the COVID-19 pandemic as an example, there was no literature before 2020 on how to deal with this specific SARS-CoV virus or its effects on the human body. The potential emergence of new diseases could pose similar challenges, and any restrictions on accessing and mining data and information from these studies may hinder the development of critical knowledge essential for crafting an effective response to disease and ultimately saving lives.

On the other hand, owing to access to genetic data^{Footnote 101} free of charge,^{Footnote 102} scientists involved in a study in Brazil could “in just 24 hours […] conduct a sequencing of the samples collected and discern the regions of origins of the virus”, allowing them to reach an important finding in the midst of the health emergency caused by COVID-19: “successfully performing these sequencing methods constituted a crucial step towards understanding the main characteristics of the pathogen and how much it mutated”.^{Footnote 103} TDM research in the region is also important for neglected tropical diseases,^{Footnote 104} which affect millions of inhabitants of certain areas.^{Footnote 105} By way of illustration, TDM was used in a study addressing schistosomiasis, with the main objective of “creating a knowledge base about schistosomiasis and classifying information using text mining techniques”.^{Footnote 106} The study also relied on a database whose data is available under the open access regime.^{Footnote 107}

4.3 Evidence on the Interplay between Copyright and (TDM) Research

In a parallel study,^{Footnote 108} also conducted by the authors of this article,^{Footnote 109} we delved into some of the practices and perceptions surrounding copyright exceptions for research activities, particularly concerning TDM uses. We aimed to understand how stakeholders who need to perform such activities perceive the lack of express limitations and exceptions for research, particularly for TDM. We gathered anecdotal evidence rather than comprehensive statistical data. Between August and December 2021 we conducted 53 interviews with stakeholders from research entities,^{Footnote 110} government bodies,^{Footnote 111} and third-sector organizations,^{Footnote 112} across six Latin American countries.^{Footnote 113} The main findings reveal a significant disparity in knowledge about copyright laws among the three sectors, ranging from a lack of awareness of the connection between copyright and research to explicit demands for copyright reform.

It is notable that there is limited awareness of existing copyright protection for databases, particularly among researchers regarding access to scientific articles.^{Footnote 114} None of the interviewees made a distinction between copyright protection of the contents (e.g. scientific articles, videos, photos) and of the database itself. Some of the researchers and other stakeholders interviewed showed little comprehension of how current copyright law works as a whole.^{Footnote 115}

On the other hand, stakeholders who were more informed about copyright issues advocated for necessary reforms. Argentine researchers in particular explicitly called for national and international changes in copyright law to address perceived obstacles to research activities.^{Footnote 116} Concerning the right to research in particular, 9 out of 53 interviewees considered that there was a need for clear limitations and exceptions allowing for the use of copyrighted materials for research purposes^{Footnote 117} or that the law should clearly set out what was and what was not allowed for research activities, as a way of protecting researchers and institutions.^{Footnote 118} Across many countries, private-sector stakeholders exhibited a deeper understanding, possibly owing to the heightened risks associated with their activities.

Lateral findings highlight stakeholders’ support for public access to databases. Some defended the need for publicly funded data to remain open despite challenges in academic publishing dynamics. Interviews also uncovered explicit references to “piracy” and alternative practices like accessing shadow libraries – such as Sci-Hub and LibGen – which interviewees relied on in order to conduct their research and develop the knowledge that they needed. While researchers acknowledged that these platforms were probably illegal, they still viewed them positively.

Even if sometimes expressed indirectly, this concern reflects the limited access that researchers in this region have to information they need for their work. Their preference for open databases, along with their criticism of knowledge privatization models, and concerns about contractual challenges in licensing private databases (particularly within universities and libraries), highlights significant barriers to democratizing research and knowledge. Researchers commonly refer to shadow libraries as crucial sources, emphasizing how the enforcement of copyright rules is perceived in the region. This, in practical terms, is viewed as a “balance” to the licensing issues faced by universities and libraries.

Another relevant finding was that several stakeholders in all three sectors spontaneously expressed concern about personal data protection. Given that some countries in the Latin American region have recently approved data protection laws, this seems to indicate that there has been some success in creating awareness in the field.

To sum up, the research underscores the need for awareness raising and community action among researchers and institutions, as well as legislative reform to enhance legal possibilities and security for research activities across Latin America. It has also highlighted the fact that, when interviewees were more informed about the impact of copyright on research, they were usually more in favor of legislative reform. The findings also highlight the nuanced views on openness and accessibility, the prevalence of alternative practices based on necessity, and the intertwining of copyright with broader regulatory issues like AI and data protection.

5 Conclusion: The (Urgent) Need for a Proper Legal Framework for Research

Previous studies illustrated the alarming scenario in Latin America regarding the legal landscape applicable to research using copyrighted works. The region brings together in one area a substantial number of countries in which legislation on the matter is either restrictive (in the case of uses for research purposes in general) or non-existent in terms of a specific TDM exception. This is not exclusively a new development: as shown above,^{Footnote 119} there has been a trend towards increasingly restrictive regimes in Latin America over recent decades.

As empirical evidence suggests, there is a negative relationship between restrictive copyright regimes, on the one hand, and research (including research on TDM) and innovation, on the other. While the restrictiveness of copyright laws is just one of several factors affecting innovation in the Global South, ensuring the legal possibility of and certainty for uses for research purposes, and maintaining a well-balanced copyright regime may be important steps in fostering technological, scientific, and economic development.

Most Latin American countries that would highly benefit from a TDM exception or from a legal framework conducive to innovation have not achieved either yet. And there may be several reasons for that, including the fact that these countries may find their position in international trade affected if they do not comply with, or agree to, standards often proposed or advocated by countries in the Global North.^{Footnote 120} In addition, and given the cross-border nature of many contemporary research practices, the inadequacy of the current legal framework also stems from the absence of international legal standards on limitations and exceptions, both within the region and globally.

The legal uncertainty around TDM research may make it more costly, both economically and otherwise, for researchers and organizations (e.g. owing to the need to negotiate with and pay for licenses from each owner for fear of legal action). This may ultimately compel them to abandon their research, or conduct it abroad,^{Footnote 121} or may affect collaboration with regional and global research partners. In addition, by not clearly allowing, or by substantially restricting, the training of and research on AI systems in Latin America, countries in this region may have to rely on pre-trained models, whose opacity raises concerns – and risks – of bias and other potentially harmful consequences, owing to the lack of control over the materials used in the training dataset.^{Footnote 122} On a cultural and social level, countries will lose an important opportunity to train models with materials that reflect their own characteristics, languages, demands and desires in different fields, including but not limited to health (e.g. neglected diseases) and culture (e.g. linguistic peculiarities).

Copyright today can be seen as much more than the set of rules for protecting and using original expressions of the human spirit in the arts, literature and science. By restricting the scope of what can be used to train AI systems and under which circumstances, copyright rules go beyond protecting expressions and may even hinder the development of a country’s AI industry and, more broadly, its economic and technological development.^{Footnote 123} Meanwhile, countries like the U.S. have a general clause of “Fair Use” and have already discussed TDM-related issues in their case law,^{Footnote 124} while jurisdictions like the European Union, Japan, and Singapore already have a specific and express TDM exception.^{Footnote 125} At the same time, the existing legal framework in Latin America is not suited to regulating and allowing for TDM practices for research purposes.^{Footnote 126}

Therefore, in order to provide the necessary incentive and legal assurance for researchers and research institutions, it is crucial that copyright laws are balanced and updated to take into consideration and regulate data-intensive and computational research and other public-interest activities in a way that actually promotes development and innovation. We argue that TDM exceptions, adequately designed, have the potential to play a significant role in this.

On a more abstract level, new TDM provisions should consider the economic, cultural and technological context and aim to promote national systems of innovation, but also be tailored so as not to isolate any country or region, given the borderless nature of contemporary research. In order for legal reform to happen, more of the stakeholders involved must become aware of the dynamics of the impact of copyright on research. The legal text itself must be sufficiently clear about permitted uses, users, purposes and the relationship with existing related provisions in copyright law (e.g. by clarifying that technological protection measures cannot be imposed to restrict or impede the enjoyment of the authorized uses), as well as with wider legislation outside the copyright system (e.g. laws regulating the use of personal and non-personal data). It is important that provisions allow for all uses necessary in the context of research activities or activities carried out by certain categories of users (e.g. public-interest-oriented institutions),^{Footnote 127} and that they cannot be overridden by private agreements. These recommendations represent no more than a few suggestions that could potentially help craft the research and TDM provisions required in Latin America. In any event, they must take into account the rich and diverse cultures and legal systems coexisting in that region.

Notes

Flynn et al. (2020), para 3: “Many of the most useful TDM and AI projects involve the use of copyright protected works”.
See Han et al. (2011).
See Flynn et al. (2021): “Fewer than a quarter of the countries in our study have research exceptions that are open to reproduction and sharing of any type of work by any user”.
See Bertón (2021).
See e.g., Rosati (2020).
See e.g., Sganga (2024).
See e.g., Geiger (2021), Geiger et al. (2019).
See e.g., Margoni and Kretschmer (2022).
See e.g., Handke et al. (2015), Ducato and Strowel (2021), Dusollier (2020).
See e.g., Carroll (2019), Sag (2019, 2023).
See e.g., Authors Guild v. Google, Inc., 804 F.3d 202, 215 (2d Cir. 2015); Authors Guild, Inc. v. HathiTrust 755 F.3d 87, 105 (2d Cir. 2014).
See e.g., Ueno (2021), Dermawan (2023).
See e.g., Senftleben (2022).
Throughout the article, we cite articles and research that either refer to one country in Latin America and/or multiple countries, but without exhausting all the jurisdictions that are part of Latin America. While we acknowledge that one example or even several may not fully represent the entire region, we understand that there is value in the insights derived from analyzing patterns observed in some of the key countries in the region.
Directive 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC, OJ L 130/92, Art. 2(2).
Translated by the authors. The original text is available in the final report by CJUSBIA, a committee of legal experts tasked with helping to draft a new bill on AI in Brazil. The report can be downloaded here: https://legis.senado.leg.br/sdleg-getter/documento/download/bdaad0dc-5c0a-4217-a6d0-aefb0d8ec8d4. For an analysis of the legislative process relating to the development of the TDM provision within the Brazilian AI Bill, see Schirru et al. (2024).
Han et al. (2011), p. 8.
Izquierdo (2021), p. 327.
Izquierdo (2021). Translated by the authors. Original text: “la capacidad de la IA para interpretar grandes cantidades de datos sin procesar, para, precisamente, ser procesados mediante la identificación de patrones”.
Ruiz Lobaina and Romero Suárez (2018), p. 2.
Translated by the authors. Original text: “La minería de datos es el proceso […] que se encarga de la extracción no trivial de patrones ocultos, útiles y que residen de forma implícita en los datos y también la forma más rápida de estudiar grandes volúmenes de información”.
See Hugenholtz (2019): “In the industrial and commercial realm TDM has become even more pervasive. Text and data mining is nowadays standard practice in pharmaceutical research, journalism, information retrieval, search, and consumer information – to name just a few areas”. See also Ducato and Strowel (2018): “Several of these studies highlight that the beneficial uses of TDM are not limited to scientific research, but take place in other contexts, including consumer information and protection”. Carroll (2019), p. 903: “Text and data mining tools are used by scholars in all fields. While most attention has been focused on the promise of using TDM in the biological sciences to discover new lines of research that ultimately improve human health, other fields of inquiry, including the emergence of digital humanities as a distinct field, also rely on text and data mining”.
Rosati (2020): “TDM activities can take place through different procedures and with different goals, the only common element being that of analysing and extracting associations between concepts to identify new patterns and relations”.
Carroll (2019), p. 895.
Carroll (2019), p. 895.
Carroll (2019), p. 895.
See Geiger et al. (2019), p. 6: “It works by (1) identifying input materials to be analysed, such as works, or data individually collected or organised in a pre-existing database; (2) copying substantial quantities of materials – which encompasses (a) pre-processing materials by turning them into a machine-readable format compatible with the technology to be deployed for the TDM so that structured data can be extracted and (b) possibly, but not necessarily, uploading the preprocessed materials on a platform, depending on the TDM technique to be deployed; (3) extracting the data; and (4) recombining it to identify patterns into the final output. Obviously, there is a tension between intellectual property protection and TDM techniques”.
See Geiger et al. (2019), p. 7: “However, at some point, during the chain of activities enabling TDM research, technically some IPR relevant actions are necessary so that in the absence of a specific permission within the legal framework, TDM can lead to an infringement. […] Basically, IPRs can be affected whenever mining involves IP protected subject matters”.
See e.g., Sag (2019), Schirru and Margoni (2023), Margoni (2023).
Sag (2019), p. 6.
Sag (2019), p. 9.
See e.g., Sag (2019), p. 9: “Somewhat surprisingly, the distinction between ideas and their expression is only implicit in the Berne Convention for the Protection of Literary and Artistic Works (“Berne Convention”), but what Berne left unstated, the TRIPs Agreement made explicit […] What is the significance of the idea-expression dichotomy for TDM? The idea-expression dichotomy limits the rights of the copyright owner to the expressive elements of the author’s work. In a world of analog works printed on paper or etched in vinyl, this is achieved by simply holding that the copying of facts and ideas alone does not infringe. Preserving the idea-expression dichotomy in the digital world means recognizing that copying a work for purely non-expressive purposes also does not infringe”; Flynn et al. (2020), p. 4: “Although the enormous scientific and cultural progress that TDM can enable may require merely technical reproductions of copyright-protected works, TDM need not come at the expense of rights holders. These reproductions do not compromise the core interests of exclusive rights, which is to prohibit unauthorized reproductions that can substitute for the work of the author. It could even be argued that these incidental reproductions are outside of the scope of exclusive rights. Also, as has been underlined by several scholars, mere reading does (and should) not involve a copyright relevant action, and neither should “the act of reading a work into a computer's random access memory”.
Rosati (2020), p. 215: “In addition, by framing TDM within the scope of copyright protection, EU legislature has clarified once and for all that the undertaking of TDM activities outside the scope of a licencing agreement or qualification for the protection offered under available exceptions would expose one to potential liability for copyright infringement, irrespective of whether the process of copying (if any) is intermediate and finalized at extracting what copyright law does not protect, that is facts, data, and information. In practice, this might have a negative impact on the (unlicensed) development of AI creativity”.
Margoni and Kretschmer (2022), p. 690.
Examples of generative AI systems are: ChatGPT (https://chat.openai.com/); Dall-E2 (https://openai.com/dall-e-2); Midjourney (https://www.midjourney.com/); Stability AI (https://stability.ai/); Adobe Firefly (https://firefly.adobe.com/); Dreamstudio (https://dreamstudio.ai/); Canva AI Image Generator (https://www.canva.com/ai-image-generator/); Getty Images (https://www.gettyimages.com.br/ia/gerador/sobre).
As an example, Friedmann (2024), p. 1, highlights the potential impact on cultural and social aspects: “Historically, human culture has been imperative to human flourishing, and has become arguably more sublime over time […]. After social media seduced humanity into sharing what makes it tick […], gAI poses the threat of replacing human culture with increasingly diluted versions of that culture”. On some of the copyright issues, see Samuelson (2023).
Examples of laws discussing matters within the interplay between (generative) AI and Copyright can be found, for example, in Bills 4025/2023 and 1473/2023 before Brazil’s Chamber of Deputies, and Bill 2338/2023 before Brazil’s Senate; in the Artificial Intelligence Act in the European Union; and in recent Law proposal No. 1630 in France.
Next Rembrandt, https://youtu.be/IuygOYZ1Ngo.
Christie’s (2018).
Ars Technica (2016).
A brief check on the Scopus database on 24 January 2024 showed that, of the 162 results from Social Sciences articles that mentioned “artificial intelligence” and “copyright” in their abstract, the first one addressing copyright issues to mention the word “generative” (in any part of the article) was dated 2020. There was significant growth in articles mentioning “generative” between 2022 and 2024: of the 57 documents mentioning “artificial intelligence” and “copyright” in their abstract, 20 also contained the word “generative”.
Sag (2023), p. 309: “For the most part, the copyright implications of the new wave of LLMs are no different from earlier applications of text data mining”. See also, Dermawan (2023): “In its operation process, both DALL‐E 2 and Stability AI use the TDM technique to obtain realistic images and art from a text description”.
These differences and other arguments have been addressed in the literature in support of, for example, an authors’ remuneration system. Senftleben (2023), p. 1537: “Other justifications – ranging from the parasitic use of human literary and artistic works and central functions of human art in society to broader socio-political objectives and arguments for improving AI – strongly support the introduction of a system for remunerating human authors”.
See e.g., the definitions provided in Part I.
See e.g., Schirru (2023).
See e.g., Lucchi (2023), p. 1: “Current legal frameworks, such as fair use in the USA and the TDM exemption in the EU, provide some guidance on the use of copyrighted material to train AI models. However, these frameworks may not fully address the complexities inherent in generative AI systems, which can directly compete with and even dwarf original works”. Senftleben (2023), p. 1556: “Generative AI systems are only capable of mimicking human creativity because human works have served as training material […] Generative AI systems are likely to replace human creations and usurp the market for human literary and artistic works”.
See e.g., the recent case involving The New York Times and OpenAI: The New York Times Company v. Microsoft Corporation (1:23-cv-11195) District Court, S.D. New York. See also Authors Guild v. OpenAI Inc. (1:23-cv-08292-SHS) District Court, S.D. New York; Getty Images (US), Inc. v. Stability AI Ltd., High Court of Justice in London, No. IL-2023-000007.
Flynn et al. (2022a).
Flynn et al. (2022a), p. 21.
Flynn et al. (2022a).
Palmedo et al. (2023), p. 3.
Palmedo et al. (2023), p. 8.
Palmedo et al. (2023).
Flynn et al. (2022a).
Flynn et al. (2022a), p. 35, clarify that these two factors are not considered in the typology.
For a complete analysis of the provision, see Flynn et al. (2022a), pp. 33–35.
Dosi and Stiglitz (2013): “Among the central theses of this introduction, and the papers in this book, are the following: […] (iii) Most broadly, the link between stronger IPR and innovation is ambiguous at best. […] So far we have primarily discussed the relations between the regimes of IPR protection and rates of innovations, basically concluding that either the relation is not there, or if it is there that it might be a perverse one, with strong IPR enforcement actually deterring innovative efforts”.
Handke et al. (2021).
See e.g., Handke et al. (2021), Flynn and Palmedo (2019a).
Flynn and Palmedo (2019a), p. 2, refer to “user rights” as “rights to use copyrighted material without the permission of owners to facilitate a range of modern activities from social media to Internet search”.
The Latin American countries whose laws were considered in the study were: Argentina, Brazil, Chile, Colombia, and Peru.
Flynn and Palmedo (2019a), p. 3.
Flynn and Palmedo (2019a), p. 19: “We next sought out to test whether the gains to technology firms come at a cost to traditional copyright intensive industries – such as book publishers, music publishers, and motion picture and video producers. We find no evidence of such a cost”.
Flynn and Palmedo (2019a), p. 2.
Flynn and Palmedo (2019a), p. 3.
Handke et al. (2021), p. 2012.
Handke et al. (2021), p. 1999.
Handke et al. (2021), p. 2008.
Vallbé (2023).
The presenter clarified that in situations of co-authorship, the country that counted was that with which the first author was affiliated.
Flynn and Palmedo (2019a), Handke et al. (2021), Flynn et al. (2022a), Palmedo et al. (2023).
The inadequacy of the current set of limitations and exceptions also formed part of the conclusion reached by Pirela (2023), p. 28: “Los catálogos de limitaciones y excepciones al derecho de autor característicos del sistema continental pueden ser insuficientes como formas de garantizar un acceso equilibrado y transparente a diversos productos protegidos por el régimen jurídico especial de propiedad intelectual necesarios para generar mejores desarrollos” [The catalogs of limitations and exceptions to copyright characteristic of the continental system may be insufficient as ways of ensuring balanced and transparent access to various products protected by the special intellectual property legal regime necessary for generating greater development].
See e.g., Flynn et al. (2022a), Palmedo et al. (2023).
For example, in Brazil, the Supreme Court of Justice (2011) decided that “The effective scope of protection of copyright (Art. 5, XXVII, of the Federal Constitution) only emerges after considering the restrictions and limitations imposed on it, being considered as such those resulting from the list of examples drawn from the statements contained in Arts. 46, 47 and 48 of Law 9.610/98, which must be interpreted and applied in accordance with fundamental rights” (translated by the authors). S.T.J. Recurso Especial No. 964.404 ES (2007/0144450-5). Rapporteur: Ministro Paulo de Tarso Sanseverino, 15.03.2011, Diario da Justica Eletronico [D.J.e.], 23.05.2011 (Braz.). More recently, Brazil’s Federal Justice Council approved Enunciado [Interpretative Statement] 115, providing that: “The copyright limitations established in Arts. 46, 47 and 48 of the Copyright Law must be interpreted and applied in accordance with fundamental rights and the social function of property established in Art. 5, XXIII, of CF/88” (translated by the authors). Conselho da Justiça Federal, III Jornada De Direito Comercial, Enunciado 115, https://www.cjf.jus.br/enunciados/enunciado/1310. Federal Justice Council (2019).
Diaz Charquero (2022), p. 58: “la doctrina mayoritaria entiende que las flexibilidades presentes en cada ley deben interpretarse de forma restrictive […], por lo que es poco probable que los jueces, al aplicar el derecho, generen nuevas excepciones por analogía o apelando a los principios generales del derecho” [the majority of legal experts understand that the flexibilities in each law must be interpreted restrictively […], so it is unlikely that the courts, when applying the law, will generate new exceptions by analogy or by appealing to the general principles of law].
After analyzing the database provided in Diaz Charquero (2021), which is built on the legislation of ten Latin American countries (Argentina, Brazil, Chile, Colombia, Ecuador, Mexico, Panama, Paraguay, Peru and Uruguay), the author (2022), p. 74, concludes: “es altamente probable que las bibliotecas, los archivos y las instituciones educativas y de investigación, así como las personas que forman parte de estas instituciones, enfrenten reiteradas situaciones de inseguridad jurídica” [it is highly likely that libraries, archives, educational and research institutions, as well as the people who are part of these institutions, repeatedly face situations of legal insecurity].
Bertón (2021), p. 1156.
Latin American Civil Society Alliance for Fair Access to Knowledge, “A Review on the State of Copyright Flexibilities in Latin American Countries” (2022), https://datysoc.org/wp-content/uploads/2022/05/Copyright-Flexibilities-LAC-Ginebra-1.pdf.
See DatySoc (2023), Flexibilidades al derecho de autor en América Latina, https://flexibilidades.datysoc.org/mapa.
Argentina, Bolivia, Brazil, Chile, Colombia, Costa Rica, Cuba, Dominican Republic, Ecuador, El Salvador, Guatemala, Honduras, Mexico, Nicaragua, Panama, Paraguay, Peru, Uruguay and Venezuela.
Gaceta Parlamentaria (2020), Miércoles 08 de julio de 2020 / LXIV/2SPR-29-2661/109633FN, https://www.senado.gob.mx/65/gaceta_del_senado/documento/109633.
Parlamento del Uruguay (2020), Derechos Autor, Excepciones y Limitaciones a Bibliotecas, Archivos y Plataformas Virtuales Academicas. https://parlamento.gub.uy/documentosyleyes/ficha-asunto/149302/ficha_completa.
Further analysis of the legislative debate can be found in Schirru et al. (2024).
Translated by the authors. The full original text is available in the final report by CJUSBIA, which can be downloaded here: https://legis.senado.leg.br/sdleg-getter/documento/download/bdaad0dc-5c0a-4217-a6d0-aefb0d8ec8d4.
At national level, and for illustration purposes, we refer to some of the recent articles and scientific publications on TDM and copyright in Brazil. Alvarenga (2019) was one of the first known theses in Brazil addressing the copyright issues in text and data mining. It analyzed the Brazilian copyright L&Es and their role in innovation in Brazil, and in particular in data-driven technologies, and concluded that there was a need for a TDM exception in the copyright legal framework, as well as for public policies to be designed to enable further accessibility for databases. Souza et al. (2020, 2022), still focusing on the Brazilian legal framework and using examples connected to research and fake news in the country (Aos Fatos (2024a, b)), addressed the importance of TDM techniques in research on the pandemic and in combating disinformation, highlighting the obstacles imposed by multi-layered protection to databases and stressing “the role of copyright limitations and exceptions in TDM and their implementation in developed countries, […] highlighting the need for developing countries to respond to current technological demands and bridge the divide in worldwide technological innovation” (2022, p. 2). More recently, Schirru et al. (2024) provided an analysis of the legislative process behind the proposal for a TDM exception within the work carried out by a special committee convened by the Brazilian Senate to work on a substitute draft for the AI Bill (Federal Senate of Brazil, 2023). As will be further developed in Part III, Brazil’s TDM provision focuses on uses by public-interest-oriented organizations, and currently forms part of Bill 2338/2023, which is being discussed in the Senate.
Izquierdo (2021), p. 337.
As pointed out by Izquierdo (2021), p. 337, research must rely on the existing exceptions in national laws, which may be restrictive in their scope (e.g. Peru and Nicaragua) or in their interpretation (e.g. Argentina).
See e.g., Bertón (2021), p. 1156: “If developing countries want to truly embrace the opportunities of digital technologies, especially AI, they have to proactively look for better solutions that encourage research in their territories. South America can move in this direction by introducing TDM-specific exceptions or adapting some of the more general exceptions to suit the needs of digital research. This would reduce the barriers to access content that can be used for data training and allow local scientists to rely on national copyright regimes without moving to foreign jurisdictions”.
See e.g., Goes de Jesus et al. (2020), Souza et al. (2022).
Araújo et al. (2016), Ferreira and Correa (2020).
See e.g., Botta-Ferret and Cabrera-Gato (2007), Ruiz Lobaina and Romero Suárez (2018).
See Nunez et al. (2021).
TDM is also a key ally in the fight against disinformation, especially at election times. As reported by Souza et al. (2020), p. 3, “Aos Fatos” is an important example “which has a Radar based on algorithms constantly curated by linguists. The software collects publications and posts on several media such as WhatsApp, Facebook, YouTube, and others, looking for keywords that match content which is typically associated with false information on several topics, including those related to the COVID-19 pandemic.[…] The agency also has a bot on Twitter, called Fátima, dedicated to debunking false information on the platform that the agency has already checked”.
World Health Organization (2023). WHO COVID-19 dashboard, https://covid19.who.int/ (last accessed, 4 February 2024).
OECD (n.d.) “The pandemic has triggered an unprecedented mobilisation of the scientific community”, https://www.oecd.org/sti/science-technology-innovation-outlook/crisis-and-opportunity/thepandemichastriggeredanunprecedentedmobilisationofthescientificcommunity.htm.
MedRxiv. https://www.medrxiv.org/search/coronavirus%20jcode%3Amedrxiv%7C%7Cbiorxiv%20limit_from%3A2020-03-01%20limit_to%3A2021-03-01%20numresults%3A10%20sort%3Arelevance-rank%20format_result%3Astandard.
MedRxiv. https://connect.medrxiv.org/relate/content/181.
See Caspers and Guibault (2016), p. 2: “The amount of published knowledge has increased exponentially over the last decades. With the help of TDM techniques, this has opened up opportunities to mine large collections of works to find certain patterns and it has enabled academics to keep up with an overload of publications in some fields through machine assisted literature review”. Carroll (2019): “For researchers in many fields, computational research of the published literature holds out great promise. Scholars in most fields, and particularly in the sciences, suffer from information overload. Too many potentially relevant journal articles are published each day for a scholar to find, read, and analyze. Computational analysis can help the scholar sort through this information to identify those articles most relevant to the scholar. More importantly, a computer can independently process (read) all of these data to mine for patterns, concordances, and other relationships in the data that are, or potentially are, relevant to the scholar’s field of inquiry”.
MedRxiv, Text & Data Mining, https://www.medrxiv.org/content/about-medrxiv: “TEXT & DATA MINING. medRxiv provides free and unrestricted access to all articles posted on the server. We believe this should apply not only to human readers but also to machine analysis of the content”.
See e.g., Wellcome (2020). Press Release, Publishers make coronavirus (COVID-19) content freely available and reusable, WELLCOME (15 March 2020), https://wellcome.org/press-release/publishers-make-coronavirus-covid-19-content-freely-available-and-reusable. On the IP-related barriers to the prevention and containment of COVID-19, see also Flynn et al. (2022b).
GISAID, https://www.gisaid.org/.
Further information on the research can be found in Goes de Jesus et al. (2020), Souza et al. (2022).
Souza et al. (2022), pp. 2–3.
According to the WHO Q&A: “Neglected tropical diseases (NTDs) are a diverse group of conditions that are mainly prevalent in tropical areas, where they thrive among people living in impoverished communities. They are caused by a variety of pathogens including viruses, bacteria, parasites, fungi and toxins, and are responsible for devastating health, social and economic consequences”. Available at: World Health Organization, “Questions and Answers”, “Neglected tropical diseases”, https://www.who.int/news-room/questions-and-answers/item/neglected-tropical-diseases#:~:text=NTDs%20are%20diseases%20of%20neglected,hard%2Dto%20reach%2Dregions.
Araújo et al. (2016), p. 174.
Translated by the authors. Original text in Portuguese is available in Araújo et al. (2016), p. 174.
FIOCRUZ (2024), “Memórias do Instituto Oswaldo Cruz”, https://memorias.ioc.fiocruz.br/.
The data and insights provided in this section will be published in more detail in the forthcoming article: Mariana G. Valente, Alice de Perdigão Lana, André Parente Houang, Copyright and Research in Latin America: law, courts, and perceptions (forthcoming 2024).
The study was led in Latin America by InternetLab (Brazil) within the activities of the global network and research project “Right to Research in International Copyright Law”, which is chaired by the Program on Information Justice and Intellectual Property at the American University Washington College of Law (PIJIP) and supported by the Arcadia Foundation. It was made possible by the efforts of Latin American partner organizations Fundación Vía Libre in Argentina, Nurep in Brazil, Derechos Digitales in Chile, Karisma Foundation in Colombia, Hiperderecho in Peru, and Datysoc in Uruguay.
This category consists of (i) private or public research entities in humanities and economics, (ii) private or public research entities in exact or biological sciences, (iii) public or private universities with relevant participation in national research, (iv) small companies/startups doing research with data, and (v) public or private libraries, archives, and museums.
This category consists of (i) the Ministry of Foreign Affairs, (ii) the Ministry or Secretariat participating in the coordination, regulation, and/or supervision of copyright, (iii) the Ministry or Secretariat participating in the coordination, regulation, and/or supervision of research in the country, and (iv) the public research promotion agency.
This category consists of (i) associations representing private research interests, (ii) associations representing public research interests, (iii) experts on the topic in the country, and (iv) civil society research organizations.
Argentina, Brazil, Chile, Colombia, Peru and Uruguay. The Latin American countries studied for the purposes of this project were chosen based on the presence of civil society organizations that have already established relationships and worked together in the past, and that could assist with data collection and analysis related to their own jurisdictions. With this choice, we do not intend to represent the region as a whole, nor to encompass all its complexity. However, there is still value in looking for patterns in some of the key countries in the region, and studying the specificities of their models and realities.
An interviewee from a research institute in Colombia reports a case in which “it happened that the whole project and data collection had already been done, and when it was published, the journal asked for the owner of the form. We didn’t know that it had an owner”.
This type of response was seen among interviewees from different areas and countries. For example, when asked whether copyright law should be changed, an interviewee from the “Centro Interdisciplinario de Ciencia de Datos y Aprendizaje Automático” of the University of the Republic in Montevideo, Uruguay, a university research center, answered: “It is hard for me to give an opinion because I don’t really know the law”. In response to the same question, interviewees from the Colombian library “Biblioteca Luis Ángel Arango”, answered: “I am not sufficiently informed to answer this question” and “I am not a specialist in this subject and cannot go into it in depth”.
An interviewee from “Agencia ciencia y técnica”, an Argentinian association representing public interests in research, said that “Argentina's copyright law is disastrous and should be amended, as should international treaties”. Another interviewee from a private research organization in Argentina said: “Although there are several exceptions within the intellectual property law, it should be more express to protect researchers, so that they have access, to reinforce the exceptions and limitations that already exist”.
An interviewee from a research institute in Colombia said: “Research most of the time has no commercial interest, the research results are not expected to generate profits; the work is produced to generate knowledge and information that can be used in decision making. My goal is not profit, which is why I believe there must be a change in this part of the use and access to information”. Another eight interviewees made specific mention of the need to create new exceptions and limitations or to broaden the scope of existing ones, as seen below.
The co-coordinator of the “Núcleo REAA”, an interdisciplinary initiative for the development of open and accessible educational resources at Uruguay’s University of the Republic, Grade 5 Professor at the School of Engineering and researcher in the field of semantic information systems said: “There should be clear exceptions; the whole process is very obscure”.
Palmedo et al. (2023).
See e.g., Valente (2013), Drahos (1995).
See e.g., Margoni and Kretschmer (2022), p. 687, arguing that the high cost of authorizations for training AI in a region (in that case, the EU) may lead to scenarios where those who are not able or willing to absorb those costs may have to rely on lower quality or less curated data, and/or pre-trained models. The latter may “propagate biased, opaque and unaccountable AI given the fact that there will be little or no transparency of the underlying data used for training”. Bertón (2021), p. 1156: “If developing countries want to truly embrace the opportunities of digital technologies, especially AI, they have to proactively look for better solutions that encourage research in their territories. South America can move in this direction by introducing TDM-specific exceptions or adapting some of the more general exceptions to suit the needs of digital research. This would reduce the barriers to access content that can be used for data training and allow local scientists to rely on national copyright regimes without moving to foreign jurisdictions”.
See e.g., Margoni and Kretschmer (2022), p. 687.
See e.g., Margoni and Kretschmer (2022), p. 686: “[…] by devising the rules that regulate access to a certain technology and by allocating ownership in the elements necessary to develop it, we are shaping that technology and its impact on society for years to come”.
On U.S. case law, see Carroll (2019) and Sag (2019, 2023).
See Copyright Act 2021 (Revised Edition 2020, Act No. 22 of 2021) (Singapore). Copyright Act, 1970 (Act No. 48 of 6 May 1970, as amended up to Act No. 72 of 13 July 2018) (Japan). Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market, OJ L130, Art. 3.
By way of illustration, while in some countries, temporary copies could be deemed legal, either by force of a specific exception, like in the EU, or even excluded from the scope of copyright law when some requirements are met, copyright law in Latin America is not harmonized so as to exclude or exempt these temporary reproductions. For an analysis of the legal debates in the U.S., see Carroll (2019) and Sag (2019, 2023).
It is crucial that the legal text further clarifies the scope of the L&Es in terms of their beneficiaries by, for example, proposing definitions for “text and data mining”, “research organizations” and “public-interest-oriented institutions”, where applicable.

References

Aos Fatos (2024a) Fátima. https://twitter.com/fatimabot. Accessed 12 Feb 2024
Aos Fatos (2024b) Radar. https://www.aosfatos.org/radar. Accessed 12 Feb 2024
Araújo DAO, David LRS, Rios RSH, Veloso RR (2016) Descoberta de Conhecimentos sobre a esquistossomose a partir de Documentos Científicos Utilizando Técnicas de Mineração de Textos. Pesq. Bras. em Ci. da Inf. e Bi., João Pessoa, v.11, n.2, pp. 173–186. www.periodicos.ufpb.br/ojs2/index.php/pbcib/article/view/31846. Accessed 6 Feb 2024
Ars Technica (2016) Sunspring | a sci-fi short film starring Thomas Middleditch. https://www.youtube.com/watch?v=LY7x2Ihqjmc. Accessed 14 Feb 2024
Bertón MJ (2021) Text and data mining exception in South America: a way to foster AI development in the region. GRUR Int 70(12):1145–1157. https://doi.org/10.1093/grurint/ikab081
Botta-Ferret E, Cabrera-Gato J E (2007) Minería de textos: una herramienta útil para mejorar la gestión del bibliotecario en el entorno digital. ACIMED, 16(04), http://scielo.sld.cu/scielo.php?script=sci_arttext&pid=S1024-94352007001000005&lng=es&nrm=iso. Accessed 6 Feb 2024
Carroll MW (2019) Copyright and the progress of science: why text and data mining is lawful. U.C. Davis L. Rev 53:893–901, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3531231. Accessed 6 Feb 2024
Caspers M, Guibault L (2016) A right to ‘read’ for machines: assessing a black-box analysis exception for data mining. Comput Sci 53(1):1–5. https://doi.org/10.5555/3017447.3017464
Christie’s (2018) Is artificial intelligence set to become art’s next medium? https://www.christies.com/features/A-collaborationbetween-two-artists-one-human-one-a-machine-9332-1.aspx Accessed 6 Jan 2024
DatySoc (2023) Flexibilidades al derecho de autor en América Latina. https://flexibilidades.datysoc.org/mapa. Accessed 5 Feb 2024
Dermawan A (2023) Text and data mining exceptions in the development of generative AI models: what the EU Member States could learn from the Japanese “nonenjoyment” purposes? JWIP 5:1–25. https://doi.org/10.1111/jwip.12285
Diaz Charquero P (2021) Flexibilidades al derecho de autor en América Latina. https://repositorio.cfe.edu.uy/bitstream/handle/123456789/1458/Diaz%2c%20P.%2cFlexibilidades.pdf?sequence=2&isAllowed=y Accessed 5 Feb 2024
Diaz Charquero P (2022) Derecho de autor y acceso al conocimiento en América Latina. Base de datos sobre excepciones al derecho de autor y escenarios que evidencian el atraso normativo. Infor Montevideo 27(1):55–76. https://doi.org/10.35643/info.27.1.11
Dosi G, Stiglitz JE (2013) The role of intellectual property rights in the development process, with some lessons from developed countries: an introduction. LEM Working Paper Series 23:3–22 https://www.econstor.eu/bitstream/10419/89516/1/771928769.pdf. Accessed 6 Feb 2024
Drahos P (1995) Global property rights in information: the story of TRIPS at the GATT. Prometheus 13(1):6–19
Ducato R, Strowel A (2018) Limitations to text and data mining and consumer empowerment: making the case for a right to “machine legibility”. CRIDES Working Paper Series https://doi.org/10.13140/RG.2.2.15392.84482
Ducato R, Strowel A (2021) Ensuring text and data mining: remaining issues with the EU copyright exceptions and possible ways out. CRIDES Working Paper Series No. 1/2021; forthcoming in EIPR 2021 43(5):322
Dusollier S (2020) The 2019 Directive on Copyright in the Digital Single Market: some progress, a few bad choices, and an overall failed ambition. Common Market Law Rev 57(4):979–1030 https://ssrn.com/abstract=3695839. Accessed 6 Feb 2024
Federal Justice Council (2019) III Jornada De Direito Comercial, Enunciado 115. https://www.cjf.jus.br/enunciados/enunciado/1310. Accessed 7 Jan 2024
Federal Senate of Brazil (2023) Commission of jurists responsible for subsidizing the elaboration of a substitutive bill on AI in Brazil: final report. https://legis.senado.leg.br/sdleg-getter/documento/download/bdaad0dc-5c0a-4217-a6d0-aefb0d8ec8d4 Accessed 6 Jan 2024
Ferreira MHW, Correa RF (2020) Mineração de textos científicos: análise de artigos de periódicos científicos brasileiros da área de Ciência da Informação. Em Questão 27(1):237–262. https://doi.org/10.19132/1808-5245271.237-262
Fiocruz (2024) Memórias do Instituto Oswaldo Cruz. https://memorias.ioc.fiocruz.br/ Accessed 7 Feb 2024
Flynn S, Palmedo M (2019a) The user rights database: measuring the impact of copyright balance. PIJIP/TLS Research Paper Series No. 42
Flynn S, Palmedo M (2019b) The impact of copyright exceptions for researchers on scholarly output. Efil J Econ Res 2(6):114–139
Flynn S, Geiger C, Quintais JP, Margoni T, Sag M, Guibault L, Carroll MW (2020) Implementing user rights for research in the field of artificial intelligence: a call for international action. Eur Intellect Prop Rev 42(7):393–398. https://doi.org/10.2139/ssrn.3578819
Flynn S, Palmedo M, Izquierdo A (2021) Research exceptions in comparative copyright law. 26 PIJIP/TLS Research Paper Series 72(2)
Flynn S, Schirru L, Palmedo M, Izquierdo A (2022a) Research exceptions in comparative copyright. 1 PIJIP/TLS Research Paper Series No. 75. https://digitalcommons.wcl.american.edu/research/75. Accessed 6 Feb 2024
Flynn S, Nkrumah E, Schirru L (2022b) International copyright flexibilities for prevention, treatment and containment of COVID-19. Afr J Inform Commun (AJIC) 29:1–19. https://doi.org/10.23962/ajic.i29.13985
Friedmann D (2024) Copyright as affirmative action for human authors until the singularity. (Editorial) GRUR Int 73(1):1–2
Gaceta Parlamentaria (2020) Miércoles 08 de julio de 2020 / LXIV/2SPR-29-2661/109633FN. https://www.senado.gob.mx/65/gaceta_del_senado/documento/109633 Accessed 16 Jan 2024
Geiger C (2021) The missing goal-scorers in the artificial intelligence team: of big data, the fundamental right to research and the failed text and data mining limitations in the CSDM Directive. PIJIP/TLS Research Paper Series No. 66. https://digitalcommons.wcl.american.edu/research/66. Accessed 6 Feb 2024
Geiger C, Frosio G, Bulayenko O (2019) Text and data mining: Articles 3 and 4 of the Directive 2019/790/EU. Propiedad intelectual y mercado único digital europeo. In: Saiz Garcia C, Evangelio Llorca R (eds) Tirant lo blanch, 27, Centre for International Intellectual Property Studies (CEIPI) Research Paper No. 2019-08
Goes de Jesus J et al (2020) Importation and early local transmission of COVID-19 in Brazil. J São Paulo Inst Trop Med. https://doi.org/10.1590/S1678-9946202062030
Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques. Elsevier. https://doi.org/10.1016/C2009-0-61819-5
Handke C, Guibault L, Vallbé J (2015) Is Europe falling behind in data mining? Copyright's impact on data mining in academic research (7 June 2015). https://ssrn.com/abstract=2608513 or https://doi.org/10.2139/ssrn.2608513. Accessed 6 Feb 2024
Handke C, Guibault L, Vallbé J (2021) Copyright’s impact on data mining in academic research. Manag Decis Econ 42(8):1999–2016
Hugenholtz PB (2019) The new copyright directive: text and data mining (Articles 3 and 4). Kluwer Copyright Blog, Wolters Kluwer (24 Jul 2019) http://copyrightblog.kluweriplaw.com/2019/07/24/the-new-copyright-directive-text-and-data-mining-articles-3-and-4/ Accessed 14 Jan 2024.
Izquierdo HA (2021) Minería de textos y datos e Inteligencia Artificial: nuevas excepciones al derecho de autor. THEMIS Revista De Derecho 79:323–343. https://doi.org/10.18800/themis.202101.018
Latin American Civil Society Alliance for Fair Access to Knowledge (2022) A review on the state of copyright flexibilities in Latin American countries. https://datysoc.org/wp-content/uploads/2022/05/Copyright-Flexibilities-LAC-Ginebra-1.pdf Accessed 2 Feb 2024
Lucchi N (2023) ChatGPT: a case study on copyright challenges for generative artificial intelligence systems. Eur J Risk Regul 1:1–23
Margoni T (2023) Saving research: lawful access to unlawful sources under Art. 3 CDSM Directive? (Kluwer Copyright Blog, 22 Dec 2023) https://copyrightblog.kluweriplaw.com/2023/12/22/saving-research-lawful-access-to-unlawful-sources-under-art-3-cdsm-directive/ Accessed 16 Jan 2024
Margoni T, Kretschmer M (2022) A deeper look into the EU text and data mining exceptions: harmonisation, data ownership, and the future of technology. GRUR Int 71(8):685–701. https://doi.org/10.1093/grurint/ikac054
Nunez NA, Crisostomo RA, Sanchez SA (2021) Uso de minería de textos para comparar los contenidos relacionados a calidad y acreditación generados en redes sociales por universidades de Perú y Chile. Form. Univ., La Serena, 14(1):111–120. http://www.scielo.cl/scielo.php?script=sci_arttext&pid=S0718-50062021000100111&lng=es&nrm=iso. Accessed 6 Feb 2024
OECD (n.d.) The pandemic has triggered an unprecedented mobilisation of the scientific community. https://www.oecd.org/sti/science-technology-innovation-outlook/crisis-and-opportunity/thepandemichastriggeredanunprecedentedmobilisationofthescientificcommunity.htm
Palmedo M, Alvarenga M, Imran M, Le D, Schirru L (2023) Measuring change in copyright exceptions for text and data mining. PIJIP/TLS Research Paper Series No. 98. https://digitalcommons.wcl.american.edu/research/98. Accessed 6 Feb 2024
Parlamento del Uruguay (2020) Derechos Autor, Excepciones y Limitaciones a Bibliotecas, Archivos y Plataformas Virtuales Academicas. https://parlamento.gub.uy/documentosyleyes/ficha-asunto/149302/ficha_completa. Accessed 7 Feb 2024
Pirela M (2023) Propiedad intelectual como herramienta para promover la transparencia y prevenir la discriminación algorítmica. Revista Chilena de Derecho y tecnologia 12. https://rchdt.uchile.cl/index.php/RCHDT/article/view/70131. Accessed 6 Feb 2024
Rosati E (2020) Copyright as an obstacle or an enabler? A European perspective on text and data mining and its role in the development of AI creativity. Asia Pac Law Rev 27(2):198–217. https://ssrn.com/abstract=3452376. Accessed 6 Feb 2024
Ruiz Lobaina EM, Romero Suárez CP (2018) Resultados Obtenidos En Un Proceso De Minería De Datos Aplicado A Una Base De Datos Que Contiene Información Bibliográfica Referida A Cuatro Segmentos De La Ciencia. Journal of Information Systems and Technology Management-Jistem USP 15:e201815003. https://doi.org/10.4301/S1807-1775201815003
Sag M (2019) The new legal landscape for text mining and machine learning. J. Copyright Soc’y of the USA 66:291. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3331606. Accessed 6 Feb 2024
Sag M (2023) Copyright safety for generative AI. Forthcoming in the Houston Law Review, Houston Law Review, 61(2). https://ssrn.com/abstract=4438593 or https://doi.org/10.2139/ssrn.4438593. Accessed 6 Feb 2024
Samuelson P (2023) Generative AI meets copyright. Science 381(6654):158–161. https://doi.org/10.1126/science.adi0656
Schirru L (2023) Direito Autoral e Inteligência Artificial: Autoria e Titularidade nos produtos da IA, 1st edn. Dialética, São Paulo
Schirru L, Margoni T (2023) Arts 3 and 4 of the CDSM Directive as regulatory interfaces: shaping contractual practices in the commercial scientific publishing and stock images sectors. (Kluwer Copyright Blog, 22 Aug 2023) https://copyrightblog.kluweriplaw.com/2023/08/22/arts-3-and-4-of-the-cdsm-directive-as-regulatory-interfaces-shaping-contractual-practices-in-the-commercial-scientific-publishing-and-stock-images-sectors/ Accessed 7 Feb 2024
Schirru L, Souza AR, Chamas C (2024) Building a text and data mining limitation: the Brazilian case. GRUR Int. https://doi.org/10.1093/grurint/ikad136
Senftleben M (2022) Compliance of national TDM rules with international copyright law – an overrated nonissue? (12 Apr 2022). Int Rev Intellect Prop Compet Law 53:1477–1505. https://doi.org/10.1007/s40319-022-01266-8
Senftleben M (2023) Generative AI and author remuneration. Int Rev Intellect Prop Compet Law (IIC) 54:1535–1560. https://doi.org/10.1007/s40319-023-01399-4
Sganga C (2024) The past, present and future of EU copyright flexibilities. IIC 55:5–36. https://doi.org/10.1007/s40319-023-01413-9
Souza AR, Schirru L, Alvarenga M (2020) Copyright and data and text mining the fight against COVID-19 in Brazil. LIINC 16(2):1–15. http://revista.ibict.br/liinc/article/view/5536/5133. Accessed 6 Feb 2024
Souza AR, Schirru L, Alvarenga M (2022) COVID-19, text and data mining and copyright: the Brazilian case. In: WIPO-WTO Colloquium Papers, vol 11. WIPO-WTO
Supreme Court of Justice (2011) Case No. 964.404 ES 2007/0144450-5 (Braz.)
Ueno T (2021) The flexible copyright exception for ‘non-enjoyment’ purposes – recent amendment in Japan and its implication. GRUR Int 70(2):145–152
Valente MG (2013) Direitos autorais como comércio internacional: desafios políticos. In: Nalini JR (ed) Propriedade Intelectual em Foco, 1ed, vol 1. Revista dos Tribunais, São Paulo, p 120
Valente MG, Lana AP, Houang AP (2024) Copyright and Research in Latin America: law, courts, and perceptions (forthcoming)
Vallbé J (2023) Impact of copyright regulation on DM research. Presentation given in the User Rights Network Symposium: Protecting Copyright User rights from Contractual Override, sharing some of the results of a research project involving the following authors: Flynn S, Palmedo M, Alvarenga M, Handke C, Coma B, Guibault L, Vallbé JJ available at https://www.youtube.com/watch?v=2bs_e7QRDHo. Accessed 6 Feb 2024
Wellcome (2020) Publishers make coronavirus (COVID-19) content freely available and reusable. Press Release, WELLCOME (15 Mar 2020) https://wellcome.org/press-release/publishers-make-coronavirus-covid-19-content-freely-available-and-reusable Accessed 12 Feb 2024
World Health Organization (2023) WHO Coronavirus (COVID-19) Dashboard. https://covid19.who.int/ Accessed 4 Feb 2024

Funding

Open access funding provided by University of St.Gallen.

Author information

Authors and Affiliations

Executive Director and Researcher, Brazilian Copyright Institute, Rio de Janeiro, Brazil
Luca Schirru
Copyright Professor, Graduation Program on Public Policy, Strategies and Development at the Federal University of Rio de Janeiro (PPED/UFRJ) and the Civil Law and Humanities Department of the Instituto Três Rios of the Federal University of Rio de Janeiro (DDHL/ITR/UFRRJ), Rio de Janeiro, Brazil
Allan Rocha de Souza
Assistant Professor, University of St. Gallen, St. Gallen, Switzerland
Mariana G. Valente
PhD Candidate, Public Policy, Strategies, and Development at the Federal University of Rio de Janeiro (PPED/UFRJ), Rio de Janeiro, Brazil
Alice de Perdigão Lana

Corresponding author

Correspondence to Mariana G. Valente.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Luca Schirru is Executive Director and Researcher at the Brazilian Copyright Institute. He is a Research Fellow at the Centre for IT & IP Law (CiTiP-KU Leuven), Belgium. He was awarded a full scholarship for the LLM program in Intellectual Property and Technology at American University Washington College of Law (2021/2022) and the Arcadia Fellowship in International Copyright. He is Copyright Professor at the Specialization Program on Intellectual Property Law at the Pontifícia Universidade Católica of Rio de Janeiro (PUC-RJ).

Allan Rocha de Souza is Copyright Professor at the Graduation Program on Public Policy, Strategies and Development at the Federal University of Rio de Janeiro (PPED/UFRJ) and in the Civil Law and Humanities Department of the Instituto Três Rios of the Federal University of Rio de Janeiro (DDHL/ITR/UFRRJ), Brazil. He also teaches Copyright on the IP Specialization Course at Pontifícia Universidade Católica (PUC-RJ). He is Scientific Director of the Brazilian Copyright Institute (IBDautoral), a copyright consultant at Fundação Oswaldo Cruz (FIOCRUZ), and a lawyer.

Mariana G. Valente is a tenure-track Assistant Professor at the University of St. Gallen, Switzerland, and Associate Director and Board Member of InternetLab (law and policy research center, Brazil). She is a Scientific Council member of the Brazilian Copyright Institute (IBDAutoral) and a member of the Legal Affairs Committee of the International Council of Museums.

Alice de Perdigão Lana is a PhD candidate in Public Policy, Strategies, and Development at the Federal University of Rio de Janeiro (PPED/UFRJ), Brazil. She holds bachelor's and master’s degrees from the Federal University of Paraná School of Law (PPGD/UFPR). She is a member of the board of the Copyright Observatory Institute (IODA) and the board of Creative Commons Brazil.

This article was written within the scope of the Project on the Right to Research in International Copyright, funded by the Arcadia Foundation.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Schirru, L., Rocha de Souza, A., Valente, M.G. et al. Text and Data Mining Exceptions in Latin America. IIC (2024). https://doi.org/10.1007/s40319-024-01511-2

Accepted: 15 July 2024
Published: 19 September 2024
DOI: https://doi.org/10.1007/s40319-024-01511-2
