
1 Introduction

Artificial Intelligence (AI) is increasingly touching and structuring our lives. AI helps enhance our ordinary experiences with its tailored news, real-time traffic updates, more accurate weather forecasts, better personal time management, delivery of online meetings, global email communications, and cost-efficient healthcare diagnosis. While the moral nature of the personal use of AI in these examples is largely beneficial, implicit, and nominal, its impact becomes more direct and ethically ambiguous when AI is employed essentially to judge people. Today AI is being used to predict one's ethnicity [29], credit-worthiness for a loan or mortgage [7, 39], academic grade [38], or political leanings [54]. More recently, the literal judgement in court sentencing is increasingly influenced by "risk assessment" AI, with potentially dire consequences [8, 71, 72]. Although seemingly innocuous, the application of algorithms to the micro-evaluation of a text likewise demands moral explication. The use of Natural Language Processing (NLP) algorithms ranges from judging the veracity of a text's authorship to assessing a written work's sub-text, such as the writer's sentiment in the piece [20, 60]. Because automated sentiment analysis and similar textual processing become more efficient as the data available to the AI increases, it seems unlikely that this practice will cease; indeed, it may be the only way to handle the exponential volume of automatically generated text flooding online media channels. The interconnectedness of the data sets used by the AI means there are no neutral or bias-free domains of knowledge. The drive to automatically identify bad actors posting to online news [2, 49] or social media [58, 75] can wrongfully limit an individual's freedom of speech, or be gamed effectively by deliberate bad actors or states. These situations contextualize the ethical and professional domain of the hypothetical cyber-archivist: the AI librarian or scholarly assistant who processes written data and annotates it for further analysis or classification.

AI's usefulness for all such cyber-archivist tasks is undeniable, given its ability to quickly sift through massive datasets and to detect and trace patterns that would be impossible for a human to process with any efficiency [26, 57]. For example, given human limitations and financial considerations, combing through online media posts to detect trends in public sentiment, or to detect spam in individual post comments, would require more personnel hours than could reasonably be brought to bear by any individual party or organization. As ever more data about our world becomes available, and the computing power to process it grows in step, this apparent usefulness can only increase. But whether such usefulness is truly beneficial, or merely an invitation to hand over human judgment to fallible algorithms, given the potential for bias and error, is a topic of intense debate [33, 36, 41, 47, 67]. And when AI is used to process and pass judgment upon large data sets, attempts to improve the quality of an AI solution may be hindered by the very nature of the data that leads us to embrace such solutions: specifically, its vastness. For example, if an AI model that has processed vast volumes of data is found to be flawed, then correcting that flaw and embracing a new model may be impossible without entirely reprocessing the vast datasets involved. Opportunities to embrace new, more trustworthy AI models (or simply to tweak existing models to correct a minor flaw) would thus be lost to us unless we preserve more information than simply the results of prior processing.

Debating the merits and perils of applying such technologies is a necessary and ongoing ethical conversation, but to do so without considering how the underlying technological infrastructure could be changed to promote or discourage risks is a blinkered view, and one that could lead to inaccurate and potentially harmful AI models.

Given the fundamental nature of this problem for all AI models, we consider the role of automated algorithms in rendering judgment without reference to a specific domain, that is, in its most general form as a processor of data that mimics human judgment. More specifically, we look at how artificial automation is analogous to an archivist or librarian citing, archiving, and critiquing data in a scholarly manner. We are, therefore, dealing with the question of whether or not a cyber-archivist can be both useful and safely trusted. In deciding whether or not to place AI technology in a position of trust, the question is not merely whether the AI can be trusted to offer good judgments; equally critical is how that technology, and the judgements it makes, is integrated into the broader system. The question of whether or not an AI's judgment can be trusted is therefore not our focus; rather, we look at the manner in which it is best applied. We illustrate how easily this can be overlooked by showing how the underlying infrastructure can affect the amount of trust placed in AI, and we do this by describing our system, MultiVerse (Footnote 1), which supports the coexistence and processing of multiple (competing, and potentially conflicting) decisions within the same archive. In other words, we argue that the ethical dilemma posed by whether or not AI can be trusted in roles of judgment can be mitigated by building better technological infrastructure underlying such AIs, and by shaping how AI and humans interact and collaborate. Specifically, we use the analogy of a flawed cyber-archivist who can be trusted thanks to the construction of a suitably resilient library, rather than attempting to create a flawless AI to serve as a trustworthy cyber-archivist.

The rest of the paper is organized as follows: Sect. 2 discusses related work covering efforts to tackle the trustworthiness of automated systems and the importance of human-computer interaction. Section 3 develops the ethical discussion of AI/ML as understood by the proponents and opponents of cyber-archivists. Section 4 briefly describes our project, MultiVerse, as an illustrative example with which to discuss the importance of the underlying data storage infrastructure of an automated system, and the broader ethical concerns. In particular, our focus in this paper is on the broader, conflicting ethical implications that can be impacted by such a focus on systems infrastructure (e.g., data privacy versus veracity, accuracy versus authenticity, efficiency versus transparency, and the ongoing need for more explainable AI).

2 Related Work: The Problem of Flawed Librarians

Given our use of a library analogy, with its focus on text analysis and annotation, it is necessary to acknowledge the efforts that led us to this work. In particular, there is a large body of work on automating the processing of textual data, and considerable recent effort in tackling the trustworthiness of such automated systems. One particularly promising approach has been to consider how humans and AI can most beneficially interact. Our proposal, to focus more on the underlying storage infrastructure as a means of mitigating potential problems, builds upon our ongoing work, and a considerable body of prior research, in the domain of data provenance.

Tools and techniques for automating data science, also known as AutoML/AutoAI, are the subject of research in many companies and open source communities [22, 46]. Given the speed and cost-effectiveness of AI for such tasks, there is optimism in industry that AI/ML systems can eventually replace the thousands of human workers currently involved in making decisions, for example, in automated comment moderation on social media [44]. Other examples of automated ML and NLP techniques for semantic sentiment analysis include: financial microblogs and news [21], Twitter [52, 62, 63], big social data [25], clinical analytics [59], specific language-based literature [3, 48, 50], and publishing domains [9, 12, 73]. These systems have the potential to perform moderation much faster than human moderators, which is attractive for more than simple performance/cost reasons, since removing harmful content quickly can reduce the harm it causes. Automating humanly laborious tasks not only facilitates scalability, it is also promoted for its potential to introduce consistency in performing allocated tasks/decisions. But this is not necessarily a good thing if an error or a bias is consistently and reliably propagated across vast volumes of data and large numbers of people.

Despite the many benefits of automated ML and NLP techniques, their use introduces new challenges. In an AI-automated system, identifying the tasks that should be automated and configuring tools to perform those tasks is crucial. Perhaps the biggest hurdle to accepting AI-generated models is the lack of trust and transparency, given the potential for large-scale harm due to errors [46]. Attempting to understand an intelligent agent's intent, performance, future plans, and reasoning process is a challenge, and building accurate automated systems is not an easy task. These challenges place a greater emphasis on how AI and humans interact, and prior research on this point, in Computer Supported Cooperative Work (CSCW), has established that a fundamental socio-technical gap exists between how individuals manage information in everyday social situations and how this is done explicitly through the use of technology [5, 30]. Often, technical systems fail to capture the flexibility or ambiguity that is inherent in normal social conditions [1]. Concurrently, research findings reveal the deficiencies of AI in making decisions that require it to be attuned to sensitivities of cultural context or to differences in linguistic cues [1, 65, 73]. These failures to detect individual differences of context and content can have serious consequences, for example, when automated news feeds fail to distinguish hate speech and misinformation from newsworthy content. In fact, such failures to address context and misinformation in automated Facebook and WhatsApp content regulation arguably contributed to violence in Myanmar [66]. Overcoming these obstacles requires human ingenuity and the moral judgment to engage artificially intelligent systems.

Overcoming these challenges, and giving users the confidence to act upon an artificially intelligent system's outputs, requires human intervention. The human-in-the-loop approach, or human-guided machine learning [30], taps the speed and processing power of machines along with human intuition and morality. Hybrid AI-human systems forge a strong collaboration between artificial and organic systems, opening a way to solve difficult tasks that were once thought intractable. To be ethical, this man-computer symbiosis must be characterised by the cooperation of machines with humans. Machine and AI systems should not be designed to replace the natural skills and abilities of humans, but rather to co-exist with and assist humans in making their work and lives more efficient and effective. Fortunately, some progress towards this goal has been made. Works that combine human-in-the-loop collaboration with AI for solving difficult problems include, but are not limited to: image classification [70], object annotation [61, 69], protein folding [56, 68], disaster relief distribution [28], galaxy discovery [43], and online content regulation [35].

Human-Computer Interaction (HCI) and, in particular, Computer-Supported Cooperative Work (CSCW) are not radically new concepts, in spite of their current urgency. The concept of symbiotic computing has been around since J. C. R. Licklider's early-1960s work on "Man-Computer Symbiosis" [42]. Licklider envisioned computers serving as partners, intelligent agents whose interactive design would allow them to collaborate with human beings to solve interesting and worthy problems in computing and society. His view can be applied universally to any technology that extends or enhances humans' abilities to interact with their environments, and it can therefore be considered a persistent question surrounding our interaction with AI.

More generally, as long as human operators and new automated systems adapt simultaneously, they will co-evolve. However, it remains important to remember that the socio-technical gaps that CSCW problems generalize are never completely resolved, and continued efforts to "round off the edges" [1] of such co-evolution are necessary. Given the shortcomings of automated tools and the careful human administration these tools require, we propose that instead of developing fully automated systems that demand perfection for complete autonomy, researchers and designers should work to improve the current state of mixed-initiative regulation systems in which humans work alongside AI systems.

Since automated tools are likely to perform worse than humans on cases where understanding nuance and context is crucial, perhaps the most significant consideration is determining when automated tools should perform certain tasks by themselves and when results of such tasks need to be reviewed by human actors. We echo calls by previous studies for building systems that ensure that the interactions between automation and human activities foster robust communities that function well at scale [65].

MultiVerse looks at how an improved infrastructure for the preservation of both source data and its annotations (including AI-generated annotations) can grant greater resilience to the decision-making capacities of AI-human systems. Our approach simplifies these decisions, and also allows for their safe reversal or for delaying their implementation. In this way, it preserves explanatory data in support of these decisions, which is of critical importance in the creation of accessible AI that also complies with legislative demands for transparency such as the EU's General Data Protection Regulation (GDPR) [32, 34, 64]. It does so by preserving more data regarding decisions/outcomes (i.e., automated results and judgments) and their annotations as they are produced. In its support for the preservation of multiple versions of data, our approach is commensurate with both explainable AI (XAI) algorithms applied to black-box neural networks and transparent-box presentations of data such as decision trees. These features combine to grant greater flexibility in how humans verify results, describe their data sources, or respond when results require explanation. To offer such a richer storage infrastructure, we leverage a novel architecture built upon our own extensions of data provenance research. Data provenance research is focused on the preservation and presentation of the origins and transformations of stored data, and has typically been narrowly employed for the management of project data such as scientific workflows or code management [4, 10, 16, 27, 53].

3 The Proponents and Opponents of Cyber-Archivists

Opposing Camps of AI: While AI systems present enormous potential benefits, they are not without problems. As a result, there are opposing camps arguing extreme views on the acceptance or rejection of AI. The optimists of AI, like Ray Kurzweil, an inventor and futurist [40], and other AI enthusiasts [45], predict a utopian future of immortality, immense wealth, and all-engaging robotic assistants to humans, ushered in with the help of the AI singularity. These techno-optimists believe that Genetics, Nanotechnology and Robotics (GNR) with 'strong AI' will revolutionize everything, "allowing humans to harness speed, memory capacities and knowledge sharing ability of computers and our brain being directly connected to the cloud" [40]. On the other hand, there are those who warn of AI's risks and its potential dystopian consequences. The critics of strong AI include the likes of Bill Joy, a computer engineer, co-founder of Sun Microsystems, and venture capitalist [37], Stephen Hawking, a theoretical physicist [14], and Nick Bostrom, a philosopher at the University of Oxford [11]. They believe that AI is "threatening to make humans an endangered species and second rate status" [45]. But there are others, like Sam Altman, an entrepreneur and CEO of "OpenAI", and Michio Kaku, a theoretical physicist and futurist, who believe that AI could be controlled through "openAI" and effective regulation [55]. They believe that humans could learn to exploit the power of computers to augment their own skills and always stay a step ahead of AI, or at least not be at a disadvantage. The spectrum of opinion is expansive, ranging between the extremes of reactive fear and complete embrace of AI. Both extremes fail to make a rational and ethical assessment of AI. The practical debate, the real question, is not whether AI technologies should be adopted, but how they can be most beneficially, and most safely, adopted.

Algorithmic Transparency: How algorithmic decisions are embedded in a larger AI system is a difficult and specialized area of study. When an AI system produces outputs that can lead to harm, the difficulty of realizing that, let alone remedying it, can often be blamed on a lack of transparency regarding how the outcomes were reached. This has led to increasing demands for algorithmic transparency. But the immediate claim that these problems can be remedied by greater algorithmic transparency offers little more than the self-evident: any process or technology that does not offer perspective on its manner of operation is inherently suspect, and unlikely to be trusted. There is, of course, a place to discuss the philosophical notion of transparency as an ideal. Indeed, it can be argued that the genealogy of any practical instantiation of the transparent is ultimately found in epistemological speculation concerning the nature of truth.

Recently, transparency has once again taken a prominent place in public governance systems, where social activists strive for greater government accountability. In AI, as with these practices, transparency is touted as a way to disclose the inherent truth of a system. In the context of AI, it is understood as taking a peek inside the black box of algorithms that enables its automated operations. However, we view transparency for AI systems more broadly: not merely as seeing phenomena inside a system, but rather across the system, as argued by Ananny and Crawford, and by Crawford [6, 15]. That is, not merely as code and data in a specific algorithm, but rather to see "transparency as socio-technical systems that do not contain complexity, but enact complexity by connecting to and intertwining with assemblages of humans and non-humans" [6]. In other words, it is better to take account of the more complete model of AI, and this includes a comprehensive view of how humans and algorithms mutually intersect within the system [15]. Without a sound understanding of the nature of algorithmic transparency and decision making, a false conflation of the "algorithmic operation" and human policy failings is possible. This is especially troubling when inherent bias in an AI model is applied to the judicial system, as evident in the scandalous revelations about the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) algorithm [8, 71, 72].

Accountability Beyond Algorithmic Transparency: In the ideal, algorithms are transparent and predictable, and enable benefits because they are fundamentally neutral and unbiased. In reality, as stated previously, it is logically possible for deterministic, flawed, or discriminatory algorithms to produce equitable outcomes on occasion; an AI system must therefore be continuously evaluated [23]. On this point, Dwork and Mulligan observe that with AI "the reality is a far messier mix of technical and human curating" [23]. AI has moral implications, but never in isolation from the context in which it is applied. When AI has a negative impact, the assumption of fault and responsibility differs based on one's perspective and role.

If algorithms are presented as an open book, then the developers of algorithms have less responsibility when they are misapplied. On the other hand, if algorithms are constructed as a black-box, or an autonomous agent operating with an opaque logic, then the users are denied accountability for how algorithms make decisions that affect them. In essence, the developers of such systems are asking that their judgment be trusted blindly, and would therefore be expected to shoulder more responsibility for any future problems.

There are also different default assumptions depending on the role one plays. Generally speaking, the present legal system does not hold firms responsible for the misuse of algorithms they develop [46, 72], but they can be held responsible for systems they sell. From the perspective of software developers, their algorithms are neutral, and so a failure is more likely to be attributed to users thrusting algorithms into fallible contexts of biased data and improper use. At the users' end, algorithms are difficult to identify and comprehend, and therefore users are not typically held accountable for the ethical implications of their use [46]. [18] and [74] suggest that, as algorithms seem to be unpredictable and inscrutable, assigning responsibility to developers or users is ineffective and even impossible, but that firms could be better held responsible for the ethical implications of their products' use. [46] conceptualizes algorithms as value-laden in that algorithms create moral consequences, reinforce or undercut ethical principles, and enable or diminish stakeholder rights and dignity. In other words, ascribing responsibility for algorithms that result in harm is very tricky. This lack of clarity is a hurdle to the responsible and ethical adoption of algorithms in critical roles, e.g., when they are placed in roles that require them to pass judgment. But it is insufficient to say that these risks call only for greater transparency of the algorithm, for the algorithm alone is never responsible for the outcome, and transparency needs to expose more than the workings of an individual algorithm to offer the most resilience and trust possible. Moreover, an algorithm's transparency, and one's faith in it, involve the quality of the data it processes, the structure of the AI within which it operates, and the larger socio-cultural considerations introduced with human involvement.

Without striving for transparency beyond the specific algorithm, i.e., striving for a broader, more holistic view of the system, we may miss opportunities to build better and more resilient AI-enhanced systems. Returning to our analogy of a cyber-archivist, we would argue that simply offering a view of the workings of a particular instance of such an AI is to pass up the opportunity to really understand the overall system, and to lessen later opportunities to harden it against failures. Specifically, imagine that one particular algorithm for processing a large dataset was deemed the best and was employed for several years with acceptable performance (including full transparency regarding its implementation), but that its outputs were then discovered to be flawed for certain edge cases that a superior algorithm could have caught. The only way to remedy this would seem to be to reprocess the entire dataset (assuming it is still available) and to compare the outputs of the algorithms. But if the data storage infrastructure had the facility to support the operation of both algorithms, and the maintenance of the provenance of their outputs, then this comparison would be feasible without reprocessing the potentially vast datasets. It is exactly this kind of increased accountability and accounting that becomes possible if we aim for transparency that goes beyond the algorithm alone, enabled by infrastructure that can support such a goal. Our MultiVerse system is an example of such an infrastructure.
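As a minimal sketch of the kind of comparison such an infrastructure makes possible, consider an archive that retains every algorithm's annotations alongside each item, so that a newer model can be audited against an older one using only the preserved outputs. The class and field names below are purely illustrative assumptions, not MultiVerse's actual API.

```python
from collections import defaultdict

class AnnotationArchive:
    """Hypothetical append-only store of per-item annotations, keyed by the
    algorithm (and version) that produced them."""

    def __init__(self):
        # item_id -> algorithm_id -> list of annotations (never overwritten)
        self._records = defaultdict(lambda: defaultdict(list))

    def record(self, item_id, algorithm_id, annotation):
        """Append an annotation; earlier results are preserved, not replaced."""
        self._records[item_id][algorithm_id].append(annotation)

    def disagreements(self, algo_a, algo_b):
        """Yield items where the two algorithms' latest judgments differ,
        using only preserved outputs, i.e., no reprocessing of source data."""
        for item_id, by_algo in self._records.items():
            if algo_a in by_algo and algo_b in by_algo:
                if by_algo[algo_a][-1] != by_algo[algo_b][-1]:
                    yield item_id, by_algo[algo_a][-1], by_algo[algo_b][-1]

# Example: audit an older sentiment model against a newer one.
archive = AnnotationArchive()
archive.record("post-17", "sentiment-v1", "negative")
archive.record("post-17", "sentiment-v2", "neutral")
for item, old, new in archive.disagreements("sentiment-v1", "sentiment-v2"):
    print(f"{item}: v1={old}, v2={new}")
```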

4 Trusting the Cyber-Archivist – MultiVerse

MultiVerse is designed as a digital data infrastructure that preserves multiple perspectives, and thereby allows better support for multicultural digital content. We contend that such an infrastructure is needed in order to better support transparency, intercultural ethics, and more ethical digital media curation across cultures. So, what is MultiVerse? MultiVerse is a digital data representation infrastructure intended to track the provenance of multi-varied translations of scholarly texts and their derivatives. Provenance can be defined as the recording of the history of user activities that create and transform data. The MultiVerse infrastructure allows users to remix/combine existing translations and/or add their own personal translations at will, and to annotate them. Annotations can be made regarding the scope, context, or other relevant metadata. MultiVerse is primarily concerned with the metadata needed to store such provenance alongside the data to which it refers. In this project, provenance tracking is done by capturing all translations (users' activities) at the time of their composition, without any preferences, prejudices, or prizes (value judgements of correctness).

To realize this concept, we have used the well-known 13th-century Italian poet Dante Alighieri's Divine Comedy and some of its many English translations [17]. We have combined these into a single repository that allows the remixing and composition of new translations, while offering detailed tracking of the origins and transformations of such texts. A user has the option of either collating different versions of verses from this repository or adding his/her own versions of verses to it, in order to compose his/her unique translation of the Divine Comedy. Moreover, the user can attach richer semantic metadata, like context, intent, scope, or tone/sentiment, to his/her composition. Multiple versions of the Divine Comedy are thereby stored in a single repository with rich version histories. A high-level architectural overview and the user API of MultiVerse are depicted in Fig. 1.
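To make this concrete, the sketch below shows one plausible shape for such a composition: verses collated from an existing translation, a verse of the user's own, and the user-supplied semantic metadata. The class and field names (and the third verse's wording) are our own illustrative assumptions, not MultiVerse's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class Verse:
    canto: int
    line: int
    text: str
    source: str          # which translation (or user) the verse was taken from

@dataclass
class Composition:
    """A user's remixed translation: collated verses plus semantic metadata."""
    author: str
    verses: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)   # context, intent, scope, tone
    created: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

# Collate two verses from Longfellow's translation and add the user's own rendering.
remix = Composition(author="reader42",
                    metadata={"context": "classroom", "intent": "study aid",
                              "tone": "plain English"})
remix.verses.append(Verse(1, 1, "Midway upon the journey of our life", "Longfellow"))
remix.verses.append(Verse(1, 2, "I found myself within a forest dark", "Longfellow"))
remix.verses.append(Verse(1, 3, "having strayed from the straight road", "reader42"))
```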

Fig. 1. MultiVerse's architecture overview

The primary purpose of this project is to demonstrate the importance of a robust data storage technology, in the context of a human-in-the-loop system, that captures and represents pluralistic views cutting across individuals' cultural, ethnic, religious, gender, social, political, economic, emotional, and other stances/viewpoints. A key design principle of MultiVerse is to build this capacity to represent the pluralistic, multicultural perspectives of all users into the technology from the very beginning, rather than after the fact. This is achieved by designing MultiVerse so that users can record not only their views, irrespective of their correctness, but also their contexts and intents.

We might ask, "what are the benefits of this technology design principle in the first place?" It can be stated without argument that all voices (decisions/judgements) are preserved. Single versions can be presented on demand. The history and identity of those who selected the individual versions, and the provenance of the documents, can be permanently stored using blockchain technology [51] and cannot be tampered with in any way. By virtue of its immutability, MultiVerse becomes a means to establish the source of any loss of nuance and, by allowing future archaeology on such repositories, makes arguments about the correct form moot. More precisely, while it does not eliminate contention over the ideal translation, it does not force that debate to be fought over a single preserved version. There need be no permanent winner, and past mistakes can be corrected in future revisions. But this leads us to consider the broader ethical implications of such multicultural, pluralistic digital infrastructures.

In the context of AI, MultiVerse helps to record the decisions of users and machines, and to preserve them for as long as they might be needed. Such logs are useful should we need to revisit them, whether to better understand past behavior or to enhance future decisions. As the underlying data storage repository in MultiVerse preserves all versions of decisions in an immutable manner, any additions, deletions, and modifications may be made as annotations without corrupting the original logs or conflicting with their subsequent versions. It thereby protects their lineage/provenance.

4.1 Using MultiVerse

A user creates a resource (multi-varied translated texts, in our initial MultiVerse example) either by copying an existing resource or by newly translating a resource. For this new resource, or for an existing resource we wish to add to the system, MultiVerse generates a few standard properties, also known as the structural metadata for that resource. This metadata describes the resource, such as its type, size, creation date, and creator, and is added to a provenance log. The Semantic Metadata Module extends this mechanism to allow the generation of user-specified descriptive properties such as context, scope, and sentiments. These additional properties are a concrete example of what we mean by "richer semantic metadata." They can be based on uploaded data as well as newly derived sources. Consequently, it is possible to register new translations for existing resources and/or to generate a new resource. This new resource can be described as a source with its own location, i.e., context (which would, for example, be specified through a URL). It could, for example, be generated from an existing resource via a copy operation (where that existing resource would be the source for this copy). To help track a copied resource's origin, the Semantic Metadata Module adds a source property, which becomes part of the provenance of the resource. This source property is added to the new copy, linking it to the original URL.
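The following sketch illustrates the idea of structural metadata and the source property added on copy. The function names, field names, and URL are our own illustrative assumptions, not the actual interface of the Semantic Metadata Module.

```python
import uuid
from datetime import datetime, timezone

def structural_metadata(resource_type, size_bytes, creator):
    """Standard properties generated for a new resource (type, size, date, creator)."""
    return {
        "id": str(uuid.uuid4()),
        "type": resource_type,
        "size": size_bytes,
        "created": datetime.now(timezone.utc).isoformat(),
        "creator": creator,
    }

def copy_resource(original_url, creator, descriptive=None):
    """Create a copy of an existing resource; a 'source' property links the copy
    back to the original URL and becomes part of its provenance."""
    record = structural_metadata("translation", size_bytes=0, creator=creator)
    record["source"] = original_url                  # provenance link to the origin
    record.update(descriptive or {})                 # e.g., context, scope, sentiment
    return record

copy = copy_resource("https://example.org/divine-comedy/longfellow#canto1",
                     creator="reader42",
                     descriptive={"context": "teaching", "scope": "Canto I"})
```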

Once a user integrates a translated version of the data into his/her workspace, the user can proceed to the next task in the plan. In that next task, if the user chooses to make his/her own translation, the Semantic Metadata Module generates a hasTranslation property and enables the user to tag information about the user-as-translator, along with the translation's creation time, context, and scope. Using the provenance log, the Semantic Annotation-Provenance Module then compiles the data's provenance into annotated provenance documents that contain both structural and user-specified descriptive metadata.
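Continuing in the same illustrative style (again, the names and document shape are our own assumptions), the translation step and the assembly of an annotated provenance document might look as follows.

```python
from datetime import datetime, timezone

def register_translation(resource, translator, context, scope):
    """Tag a user's own translation with a hasTranslation property carrying
    the translator, creation time, context, and scope."""
    resource["hasTranslation"] = {
        "translator": translator,
        "created": datetime.now(timezone.utc).isoformat(),
        "context": context,
        "scope": scope,
    }
    return resource

def provenance_document(resource):
    """Assemble an annotated provenance document combining the structural
    metadata with the user-specified descriptive metadata."""
    structural_keys = ("id", "type", "size", "created", "creator")
    structural = {k: resource[k] for k in structural_keys if k in resource}
    descriptive = {k: v for k, v in resource.items() if k not in structural_keys}
    return {"structural": structural, "descriptive": descriptive}

# Example: a copied resource is re-translated and its provenance documented.
resource = {"id": "r-42", "type": "translation", "size": 0,
            "created": "2023-01-01T00:00:00+00:00", "creator": "reader42",
            "source": "https://example.org/divine-comedy/longfellow#canto1"}
doc = provenance_document(
    register_translation(resource, translator="reader42",
                         context="teaching", scope="Canto I"))
```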

Given the final derived product's URL, anyone granted access to MultiVerse can trace backward, following the links in the hasSource and hasTranslation properties, to discover the input data and the relevant user-specified metadata entries. This kind of query would not be possible without the added metadata (i.e., the semantically-enrichable provenance framework we have proposed in MultiVerse). Adding this metadata increases the storage demands of the system as a whole, but these are increases in capacity demands (simply the volume of data stored, as opposed to the needed storage system performance), which is arguably a cheaper resource than the time, energy, and temporary storage demands of having to reconstitute such information at a later point in time. In other words, assuming it is even possible to reconstruct the varied versions of our data at a later date (which is not guaranteed), there is a tradeoff between efficient space utilization today and the cost of future computation and data retrieval tomorrow. The decision to store such metadata thereby carries efficiency considerations, in addition to the added transparency it could provide.
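A minimal sketch of such a backward trace, assuming each provenance record is a mapping that may carry hasSource and hasTranslation links (the record layout and link names follow the illustrative sketches above, not MultiVerse's actual data model):

```python
def trace_back(url, provenance_log):
    """Walk hasSource/hasTranslation links from a derived product back to its
    inputs, collecting each provenance record (with its user-specified metadata)
    along the way. `provenance_log` maps a resource URL to its provenance record."""
    lineage = []
    to_visit = [url]
    seen = set()
    while to_visit:
        current = to_visit.pop()
        if current in seen or current not in provenance_log:
            continue
        seen.add(current)
        record = provenance_log[current]
        lineage.append(record)
        for link in ("hasSource", "hasTranslation"):
            target = record.get(link)
            if isinstance(target, str):        # follow only URL-valued links
                to_visit.append(target)
    return lineage
```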

MultiVerse is not just a repository of multi-varied data, but a means of ensuring the preservation of those versions against malicious action attempting to rewrite history; hence the immutability requirement is incorporated into MultiVerse. To keep such a repository consistent, it is structured as an immutable data store, allowing the addition of new content and amendments, but disallowing any modification or deletion of data that has been committed to the store. The immutable aspect of MultiVerse is achieved by adapting a basic model of blockchain technology [51]. The technical details of blockchain technology are beyond the scope of this paper. The interactive aspects of MultiVerse are enabled by offering a user application programming interface (API) to annotate semantic analysis decisions and to allow access to the repository in a secure manner. We discuss the ethical implications of the MultiVerse framework in the next subsection.
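The immutability property can be understood through the following simplified hash-chain sketch, a toy stand-in for the basic blockchain model that MultiVerse adapts [51], not its actual implementation: each committed entry carries the hash of its predecessor, so any tampering with past content breaks the chain.

```python
import hashlib
import json

class ImmutableLog:
    """Toy append-only store: entries may be added, never modified or deleted."""

    def __init__(self):
        self._chain = []

    def append(self, payload: dict) -> str:
        """Commit a new entry chained to the hash of the previous entry."""
        prev_hash = self._chain[-1]["hash"] if self._chain else "genesis"
        body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
        entry_hash = hashlib.sha256(body.encode()).hexdigest()
        self._chain.append({"prev": prev_hash, "payload": payload, "hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute every hash; returns False if any committed entry was altered."""
        prev = "genesis"
        for entry in self._chain:
            body = json.dumps({"prev": prev, "payload": entry["payload"]},
                              sort_keys=True)
            if hashlib.sha256(body.encode()).hexdigest() != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```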

4.2 Ethical Considerations

A moral question that arises with MultiVerse is: how does MultiVerse change the ethical debate around allowing an algorithm to judge/annotate and provide an actionable opinion? Our approach, illustrated through the MultiVerse example, shows that it is possible to construct systems whose impacts are more easily reviewed and evaluated against each other (since multiple versions are readily accessible for comparison), or that allow decisions taken by an automated algorithm to be less permanent in their effect (since alternative results that have been preserved can be retroactively embraced). In other words, it allows for one of three outcomes:

1. The decision can be undone: the results of the prior decision are preserved and superseded by adopting an alternate decision at a later date (see the sketch after this list);

2. If undoing is not possible, then perhaps the decision can be deferred: by delaying the aggregation or selection amongst alternative annotations (judgements) until the latest possible point in time, we guarantee the adoption of the best and fairest technology available for that decision; or finally,

3. Assuming that decisions can neither be undone nor delayed, it is still beneficial to have on hand the results of competing models, if only to aid the more rapid analysis and evaluation of new and improved models, and to improve and accelerate our understanding of where and how defunct models may have failed.
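As a hedged sketch of the first outcome, a decision can be "undone" by appending a record that supersedes it, rather than by mutating or deleting the original; the record fields and helper names below are our own illustrative assumptions.

```python
from datetime import datetime, timezone

def supersede(decision_log, item_id, old_decision_id, new_decision, new_id):
    """Adopt an alternate decision without erasing the prior one: the old record
    stays in the append-only log; the new record declares what it supersedes."""
    decision_log.append({
        "id": new_id,
        "item": item_id,
        "decision": new_decision,
        "supersedes": old_decision_id,          # provenance of the reversal
        "recorded": datetime.now(timezone.utc).isoformat(),
    })

def current_decision(decision_log, item_id):
    """The latest decision for an item that no later record has superseded."""
    superseded = {rec["supersedes"] for rec in decision_log if rec.get("supersedes")}
    latest = None
    for rec in decision_log:
        if rec["item"] == item_id and rec["id"] not in superseded:
            latest = rec["decision"]
    return latest

# Example: a flawed automated judgment is later superseded by a reviewed one.
log = [{"id": "d1", "item": "post-17", "decision": "spam",
        "supersedes": None, "recorded": "2023-01-01T00:00:00+00:00"}]
supersede(log, "post-17", "d1", "not-spam", new_id="d2")
assert current_decision(log, "post-17") == "not-spam"
```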

Conversely, leaning too heavily on the ability to defer or delay decisions, or on a false sense of immunity to bad decisions, can lead to more reckless human adoption of algorithmic decision-making technologies.

However, one view of what distinguishes human intelligence from AI in decision-making is our ability to make connections in ways that are not formalizable (through unconscious processes, for example, or by involving emotions). Seen from that perspective, an AI algorithm would be a tool enacting what is ultimately a human will. That human will may be inexplicable, but the algorithms can and should be transparent and open to revision, making them easier to adopt in an informed manner. The use of an infrastructure like MultiVerse may aid in documenting such open algorithms, or may host the results of more opaque algorithms. It does not dictate taking one approach or the other.

The moral considerations of MultiVerse are slightly different from the moral considerations of using AIs for sentiment analysis. Harm is potentially mitigated by ensuring that no decision is necessarily permanent, and that bad decisions can be attributed to specific sources (allowing for greater accountability), but this still leaves concerns. It is possible to confuse the mitigation of harm with the elimination of the possibility of harm, which is of course not the case here. A decision can be revised if enough provenance data is available to retroactively consider alternatives, but the effects of decisions might not be reversible (e.g., we can learn to improve a sentencing algorithm, but cannot expect any data storage system to restore a single day of unjustly lost freedom; while it may be possible to retroactively determine what a sentencing algorithm could have recommended, it is definitely not possible to undo a sentence that has already been served). A further potential harm arises if users of MultiVerse are lulled into complacency, such that human errors resulting in poor decisions might be made more often. MultiVerse provides the ability to mitigate harms and add greater accountability, but it is still up to individual deployments to actually monitor the performance of their "cyber-librarians" and to temper their decisions when there is doubt about the quality of their outputs.

A significant portion of the potential harm of automated systems can arise as a result of those systems shifting the focus of responsibility away from humans. In other words, when we lose accountability, harm caused by acting on AI-provided data would not necessarily be blamed on those who should have maintained human oversight of how we got there. A mechanism that improves the accountability of such systems, by tracing problems back to failures of algorithm selection or oversight, would therefore have the potential to encourage both system builders and system adopters to be more conscientious and ethical (thanks to an awareness of provenance tracking); they may also be helped in their oversight tasks by the long-term evaluation and auditing of the performance of different algorithms. The choices regarding whether, when, and how often we defer to the algorithms deployed for a specific problem are questions of best practice around auditing and system improvement.

Finally, one might perceive MultiVerse as a system that is deliberately designed to record too much metadata, thereby creating an unnecessary information overload; or as a scientific apparatus to dissect the intellectual work of others; or as a blockchain mechanism that prevents the ability to edit what is stored. This leads to the issue of (data) privacy in the context of the immutability of stored information about persons interacting with MultiVerse. To prevent these undesired consequences, users have the choice, by design, either to opt out of recording all their creative activities, to opt in and reveal only as much as is needed, or to choose to document the synthesizing process of a digital product. Such decisions regarding opting in or out affect what data is recorded by the system, but it is important to recognize that when it comes to an individual's right to be forgotten, the question is not simply decided by the presence or absence of data, but by the retrievability of such data. A data store can be immutable, holding data that is never completely removed, and yet still honor an individual's right to be forgotten, for example, by adapting users' data access and retrieval rights and policies as appropriate.
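As a brief sketch of that last point (the access-policy structure here is our own illustration, not a description of MultiVerse's actual controls), retrievability can be revoked while the underlying immutable store remains untouched:

```python
class RetrievalGate:
    """Wraps an immutable store: records are never deleted, but retrieval of
    records tied to a user who has exercised the right to be forgotten is refused."""

    def __init__(self, store):
        self._store = store              # any mapping of record_id -> record
        self._forgotten_users = set()

    def forget(self, user_id):
        """Honor a forget request by revoking retrievability, not by deletion."""
        self._forgotten_users.add(user_id)

    def retrieve(self, record_id):
        record = self._store.get(record_id)
        if record is None or record.get("creator") in self._forgotten_users:
            return None                  # may exist in the store, but is not retrievable
        return record

gate = RetrievalGate({"r1": {"creator": "reader42", "text": "my translation"}})
gate.forget("reader42")
assert gate.retrieve("r1") is None
```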

To return to our library analogy: we go beyond prior efforts by focusing less on making librarians less flawed, and more on highlighting how an improved library could lessen the risk of harm posed by less-than-perfect librarians, and help all who support and benefit from librarians to better support and improve the library.

5 Concluding Remarks and Further Applications

To demonstrate how rethinking the underlying technical infrastructure can reshape the questions we face with AI, we have illustrated one such "rethought" realization of a data storage system. By combining elements of version control systems, trusted immutable stores, and provenance technologies, MultiVerse shows that decisions arising from human and automated analysis can be deferred and revised.

Such an infrastructure serves as an example of how to critically rethink the either/or decision regarding the applicability of AI. In fact, this infrastructure is useful for any AI domain that involves NLP and the processing or classification of texts. While we have used the analogy of a librarian to emphasize that our focus is on systems that automate the processing and tagging of textual information, our arguments should hold for any data processing task that could involve AI. It would therefore have applications beyond scholarly articles and references, in domains such as managing fake news, social media, synthetically generated media, legal and governmental processes, and materials in the broader arts and sciences (beyond simple workflow management), and it can encompass more than purely textual media and materials.