1 Introduction

Libraries are constantly adopting new technology and services to better their offerings and suit their patrons’ changing information demands. Discovery services, which provide a unified search interface for accessing diverse library materials, have grown in popularity in academic libraries around the world. These allow users to search numerous databases and employ advanced search capabilities such as relevance ranking, filtering, and saving searches. However, in order to verify that these systems are effective in satisfying the demands of users and to improve the search experience, it is imperative to evaluate and assess this type of service using data-driven methodologies. With a growing emphasis on a service-oriented approach, librarians are urged to critically analyze their work and services. Libraries are increasingly required to plan, provide, and evaluate their services using data and evidence, a requirement that has drawn growing attention in the academic literature and in industry best practices [1].

Transaction log analysis (TLA) is a method that has been widely employed over the years to understand user behavior and enhance system architecture. It is a technique for gathering information and a research technique for analyzing user and system behavior. An electronic record called a “transaction log” keeps track of user and system interactions. These log files can originate from a variety of sources including websites, online catalogs (OPAC), user computers, blogs, listservs, online newspapers, and other programs that can record user-system information interaction [2]. Transaction logs keep track of all system activity, such as searches, clicks on search results, and access to certain resources. These logs can be studied to learn more about how users interact with the system.

Transaction logs are valuable for librarians because they allow for the analysis of user activity, which helps library management make operational decisions [3]. By examining transaction logs, librarians can uncover common problems and improve the system to better meet user needs [3]. For instance, librarians can use transaction log analysis to determine which resources are being used the most and which ones are not, then decide which resources to promote or drop [1].

The De La Salle University (DLSU) Libraries, which serve almost 25,000 members of the academic community, is composed of a main library and five satellite libraries spread over four campuses. It has a large collection of print and digital resources, including books, periodicals, and multimedia products. Satellite libraries are strategically located on different campuses to meet the needs of specific academic programs. The DLSU Libraries examines and analyzes its services, resources, and systems on a regular basis in order to find areas for improvement and implement creative solutions. This ongoing effort is an attempt to guarantee that the library remains a valuable resource for study and research.

In 2019, Fresnido and Barsaga examined search trends and identified the causes of failure rates by conducting a transaction log analysis of the DLSU Libraries’ previous online public access catalog (OPAC). The study's findings revealed that the OPAC then in use did not match the standards of a next-generation catalog, as evidenced by the issues discovered throughout the log analysis process [4]. Recognizing the need to update its discovery service, the DLSU Libraries launched AnimoSearch in September 2020, a next-generation discovery service powered by Ex Libris Primo that provides a unified search interface for accessing diverse library contents. It is a discovery platform that harvests and indexes local library collections, such as bibliographic records, full-text articles, and digital objects, making them easily discoverable to users. It has advanced harvesting and normalization capabilities that increase productivity with end-to-end library workflows. It also allows for customization of the discovery service to influence how collections are explored and displayed [5, 6].

The purpose of this study is to investigate the information behavior of AnimoSearch users in order to better understand their information demands and actions during the information-seeking process. The study employs transaction log analysis to evaluate the behavior of AnimoSearch users during the academic year 2021–2022, with an emphasis on detecting trends in user behavior. The study specifically seeks to address the following research questions: (1) What are the most often used search phrases in a discovery service? (2) What are the most frequently used facets, and how do they affect the success of a user's search? (3) How do user search behavior and resource access patterns differ by user segment? The findings of this study will provide insights for improving research training delivery, assisting librarians and developers in better understanding how users interact with the system, and identifying areas for improvement. Furthermore, the study adds to the expanding body of research on TLA as a method for researching user behavior and preferences, emphasizing the relevance of context in the design and evaluation of information systems.

2 Literature Review

TLA has been used to generate and analyze transaction logs since the 1960s. Peters examined the growth of TLA, its application, and its particular application in assessing online catalog systems in 1993. His research revealed that the first phase of TLA was originally focused on evaluating system performance rather than user behavior, such as the study of Meister and Sullivan in 1967 and Lucas in 1971. But it was during the second phase, from the 1970s until the mid-1980s, when TLA was first applied to the study of online catalog systems. Researchers were interested in studying both how the system was used, such as the selection and order of search choices, and user searching behavior, including session time and error patterns [7,8,9].

Jansen described a three-stage process for performing web search transaction log analysis, which includes data collection, preparation, and analysis. The article defined key topics in TLA and explored its advantages, such as low-cost and unobtrusive data collection. However, limitations are noted, such as a lack of information on search reasons and searcher motivations. To improve analysis robustness, the author advised integrating TLA with additional approaches [10]. Jones et al. did a transaction log study on the New Zealand Digital Library's Computer Science Technical Reports Collection to better understand user behavior. The research looked at user demographics, search behaviors, query structure, common errors, and interface design concerns. The findings gave important information for enhancing digital library systems [11].

In a discovery system, Behnert and Lewandowski (2017) examined known-item search queries that resulted in zero hits. Based on item availability and query validity, they divided queries into four categories. The study identified acquisition and erroneous searches as the primary causes of zero hits and proposed methods to improve user experience, such as automatic spelling correction. Similarly, Ciota used Primo Analytics to understand patron search strategies at Grinnell College by categorizing the search types into known and exploratory categories. Results showed that known item searches were slightly more frequent than exploratory search queries [12].

Schlembach et al. investigated user search habits by analyzing transaction logs from a federated search system. The research uncovered information about search characteristics, search help usage, and clickthrough actions. The discoveries aided in the optimization of search and discovery services in dispersed retrieval systems [13]. On the other hand, Agosti et al. gave an in-depth examination of log analysis research in web search engines and digital libraries. They stressed the significance of merging log data with data from other sources in order to gain a comprehensive understanding of user behavior. The report also highlighted research obstacles and new trends in log analysis [14].

Fischer et al. developed a way for analyzing transaction log data from EBSCO Discovery Service (EDS) queries captured in Google Analytics. The authors described how to export data, analyze it, and recreate a search. The study demonstrated the potential of transaction log analysis to enhance system effectiveness, resource utilization, and user instruction in libraries [15]. Meadow and Meadow examined web-scale discovery tools for the quality of their search queries. The study consistently identified high-quality search queries but also areas for improvement, such as user education on effective search strategies and interaction between libraries and vendors [16].

Transaction log analysis has proven to be an effective study tool for gaining a better knowledge of user behavior, system performance, and search tactics in various information retrieval systems. The reviewed research showed TLA's strengths and shortcomings, suggested improvements, and provided insights.

3 Methodology

This study employs a combination of quantitative and qualitative methodologies to uncover user behavior and identify information needs through TLA. TLA is a research method that analyzes logs to gain insights into how actual users use a system. Transaction logs are files that record every communication or transaction between a system and its users and can be used to collect significant amounts of data on system usage. TLA is considered an unobtrusive method of collecting substantial amounts of usage data on a considerable number of users [10, 17].

The data used in this study is sourced from Primo Analytics, a tool developed by Ex Libris that provides analytics for libraries using the Primo discovery system. In the context of Primo, transaction logs refer to a record of user interactions with the system. This includes search queries, resource views, and other actions taken by users. Primo Analytics provides various reports that allow libraries to analyze user behavior. We analyzed transaction data from 2021–2022, extracted in CSV format from the Primo Action Usage, Primo Facet Usage, and Primo Popular Searches reports.
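
As a rough illustration of how an exported report of this kind might be tallied, the following Python sketch reads CSV text and sums search counts per query. The column names `Search String` and `Searches`, the helper `tally_searches`, and the sample rows are assumptions made for illustration only; the actual layout of a Primo Analytics export may differ, and this is not the procedure used in the study.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for an exported report;
# real column names and values may differ.
SAMPLE_CSV = """Search String,Searches
covid-19,120
JSTOR,95
Covid-19,30
development,40
"""

def tally_searches(csv_text, query_col="Search String", count_col="Searches"):
    """Sum search counts per query, ignoring case and surrounding spaces."""
    totals = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        query = row[query_col].strip().lower()
        totals[query] += int(row[count_col])
    return totals

totals = tally_searches(SAMPLE_CSV)
top_queries = totals.most_common(3)  # highest-count queries first
print(top_queries)
```

Normalizing case before counting merges variants such as "covid-19" and "Covid-19"; whether such merging is appropriate depends on how the report itself groups queries.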

The qualitative aspect of this study focused on categorizing the popular searches identified in the Primo Popular Searches report. For the analysis, we utilized the top fifty rankings from the report for the years 2021 and 2022, resulting in a total of 805 search queries. The classification scheme used was adopted from Rebecca Ciota's report entitled “Using Primo Analytics to Understand Patron Search Strategies.” This involved known-item popular searches, such as specific book titles or authors, and exploratory popular searches that may reveal user interests and behaviors [12].

4 Results and Discussion

4.1 Most Commonly Used Search Terms in AnimoSearch

Search Queries. The results of this study revealed a diverse range of topics that are of interest to AnimoSearch users, indicating the usefulness and applicability of the search engine in various contexts. Table 1 displays the top fifty AnimoSearch search queries and the number of times each query was searched.

The predominance of COVID-19-related search queries demonstrates the pandemic's persistent influence on society and the need for reliable information and tools. One notable finding was the prevalence of search queries for popular database titles such as Euromonitor, JSTOR, Scopus, and Science Direct. This behavior indicated that users were using AnimoSearch as a means to access scholarly databases. However, it is essential to note that users may not have been aware that a better alternative to accessing the database directly is using the A-Z directory on a different webpage. They may perceive that AnimoSearch functions the same way that search engines, like Google, work. Moreover, the broad spectrum of topics searched in AnimoSearch in the areas of management, business, economics, philosophy, psychology, politics, and science highlights the utility of AnimoSearch as an academic research platform. It also indicates the significance of keeping a database of scholarly literature that covers multiple disciplines.

The types of search queries also provided helpful insight into how patrons use these searches to locate information. Many of the search terms used, such as “development,” “science,” or “test,” are too broad or vague, which can lead to a large number of irrelevant results that may not be useful for their research. Furthermore, users may be unaware of advanced search tools and strategies that can help them refine their queries and locate more relevant content. Users can find more relevant information by using methods such as Boolean operators, enclosing search phrases in quotation marks, or utilizing filters.

These findings highlighted the importance of teaching patrons how to refine their search terms and use more specific keywords directly related to their research topic. This can help them find the most relevant and valuable information in a shorter amount of time. Additionally, patrons may also be trained to use advanced search techniques such as Boolean operators or limiters to further refine their search results. By understanding how patrons used these searches and providing guidance on how to refine them, librarians can help ensure that patrons are able to find the information they need efficiently and effectively.

Table 1. Top 50 search queries used by AnimoSearch users

Known-Item and Exploratory Searches.

The sample queries were categorized into known item and exploratory search types. “Known item searches” refer to the activities carried out by searchers who have a particular item in mind. The searcher comes to the database with the knowledge that the search target exists and with specific information about it, such as the author, title, or subject. The goal of a “known item” search is to locate a specific information source, such as a book, journal, article, video, or website, that the searcher already knows something about [18].

On the other hand, exploratory search in library catalogs refers to the process of searching for information when the searcher is unfamiliar with the domain of their search goal and unsure about the ways to achieve their objective. It is a type of information exploration that involves preliminary, initial, and novice searching. Exploratory searching is used to identify relevant resources and to develop a better understanding of the topic being researched [19].

Table 2 provides an overview of the distribution of search types among the sampled queries. Out of 805, 188 queries (23.35%) were classified as known items, indicating that researchers were looking for specific information. The bulk of searches, 613 (76.15%), were exploratory in nature, as researchers sought general information or explored themes of interest. Only four queries (0.50%) were recognized as having inadequate information, indicating challenges owing to a lack of data. These findings are in contrast with the studies conducted by Schlembach et al. and Mischo et al., which reported a higher proportion of known-item than exploratory searches. In their papers, results indicated that known item searches constituted around 63.00% and 55.00% of search sessions, respectively [13, 20].
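
The percentages above follow directly from the reported counts. A minimal Python sketch of the arithmetic, using only the counts stated in the text (the helper name `share` is ours, introduced for illustration):

```python
def share(count, total):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)

# Counts as reported in the text for Table 2.
counts = {"known item": 188, "exploratory": 613, "inadequate information": 4}
total = sum(counts.values())

percentages = {label: share(n, total) for label, n in counts.items()}
print(total, percentages)
```

The three categories sum to the 805 sampled queries, so the shares account for the whole sample.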

Table 2. Search types and strategies employed by AnimoSearch users

Out of 188 known item searches, the most common approach was searching by title (131 searches, or 69.68%). This suggested that researchers mainly used the title to find specific items. The second most common strategy was combining the author's name with the title (21 searches, or 11.17%), indicating that researchers used both criteria to refine their search. Searching by author alone was used in 16 searches (8.51%), showing that researchers were interested in finding items by specific authors.

Other search strategies included using call numbers (2.66%), DOIs (2.66%), and website links (4.79%). These methods were used to locate known items through library classification systems, unique digital identifiers, and direct online sources, respectively. Only one search (0.53%) used the International Standard Book Number (ISBN) to find a specific book. Data showed that researchers mainly relied on titles and a combination of author and title searches to find known items.
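
Some of these strategies leave recognizable patterns in the query string, so a script could pre-screen queries before manual coding. The Python sketch below is a hypothetical aid, not the classification procedure used in this study: the regular expressions are simplified assumptions that will miss some valid identifiers and may flag unrelated number strings, and call numbers are omitted because their format depends on the classification scheme in use.

```python
import re

# Illustrative patterns only; real identifiers have more variants.
# Order matters: a DOI written as a URL is reported as a website link.
PATTERNS = [
    ("website link", re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)),
    ("DOI", re.compile(r"^(doi:\s*)?10\.\d{4,9}/\S+$", re.IGNORECASE)),
    ("ISBN", re.compile(r"^(97[89][- ]?)?(\d[- ]?){9}[\dXx]$")),
]

def identifier_type(query):
    """Return the identifier type a query resembles, or None."""
    text = query.strip()
    for label, pattern in PATTERNS:
        if pattern.match(text):
            return label
    return None

# Made-up example queries, not taken from the study's data.
for q in ["10.1000/xyz123", "www.example.org/page", "9780306406157", "science"]:
    print(q, "->", identifier_type(q))
```

Queries that return `None` would still need human judgment to separate title, author, and exploratory searches.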

What known items are users looking for? Results reveal that the most common types of item searched for were “Book and Book Chapter” and “Article,” constituting 41.49% (n = 78) and 36.70% (n = 69) of the total searches, respectively. Additionally, “Thesis and Dissertation” accounted for 7.45% (n = 14) of the total searches. These findings suggested the significance of scholarly and academic literature in the users’ information-seeking process. Other item types such as “Database” (12 or 6.38%), “Media” (6 or 3.19%), “Journal Title” (5 or 2.66%), “Webpage” (3 or 1.60%), and “Newspaper article” (1 or 0.53%) accounted for a smaller proportion of the total searches. The results highlighted that, while these item types represent a smaller percentage, they still played a valuable role in complementing the users’ search strategies, as users drew on various resources to support their studies.

Number of Words in Search Queries.

Findings revealed that the average number of words per query was 4.30, although one- and two-word queries were the most common lengths. The average closely matches that of Schlembach et al., who reported that users of their library catalog system entered an average of 4.33 words for each search request [13]. One-word queries occurred 134 times (16.65%), while two-word queries were somewhat more frequent at 203 (25.22%). This suggested that many people prefer to enter brief searches with few words.

As the number of terms in a query exceeded two, the frequency of occurrence gradually decreased. Three-word queries had a frequency of 122, followed by four-word queries at 93. The downward trend continued for five-word queries (60), six-word queries (57), and seven-word queries (32). Data showed that the frequency rose again for queries with more than ten words, which occurred 57 times. This implied that a subset of users entered lengthier and more specific queries, indicating a need for extensive and comprehensive search results. Results illustrated that shorter queries were more popular, while lengthier queries had a smaller but noticeable presence. Understanding the distribution of query lengths can be useful for search engine optimization and for creating successful information retrieval systems.
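
A query-length distribution of this kind can be computed by counting whitespace-separated words in each query. The Python sketch below assumes that simple definition of a "word" (the text does not state how words were counted) and uses made-up sample queries rather than the study's data.

```python
from collections import Counter

def word_count(query):
    """Count whitespace-separated words in a query."""
    return len(query.split())

def length_distribution(queries):
    """Return (mean words per query, Counter mapping length to frequency)."""
    lengths = [word_count(q) for q in queries]
    mean = sum(lengths) / len(lengths)
    return mean, Counter(lengths)

# Hypothetical queries for illustration; not taken from the study's data.
sample = ["jstor", "covid-19 vaccine", "science", "history of philippine education"]
mean, dist = length_distribution(sample)
print(round(mean, 2), dict(dist))
```

Because a few long queries pull the mean upward, the mean can exceed the most common length, which is consistent with an average of 4.30 words alongside a peak at one and two words.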

4.2 Most Frequently Used Facets and Their Impact on Search Success

This analysis examined the impact of different facets on the success of users’ searches in AnimoSearch and their significance for improving search functionality and user experience. Table 3 shows that the “Top Level” facet was the most frequently used, with an average of 146,362 selections, indicating its importance in providing users with a starting point for their searches. This was followed by “Resource Type,” which was selected an average of 95,590 times. This facet provided users with the option to further filter results by item types such as articles, books, theses and dissertations, and many others. The “Date Slider” facet was selected an average of 67,808 times, suggesting that users value specifying a date range to access relevant and up-to-date information. Other facets such as “Topic,” “Collection,” and “Library” were used less frequently, while the “Author” facet had the lowest usage. These findings can inform improvements to AnimoSearch by enhancing the presentation and usability of frequently used facets and ensuring the accuracy and comprehensiveness of search options.

Understanding how these factors affected search success helps enhance search functionality and user experience. Users can navigate the search interface better by improving the “Top Level” facet, which is utilized most often. Accurate resource types, date range options, and topic categories can also improve search results. These insights can inform the design and development of AnimoSearch to increase usability, search success rates, and user experience.

Table 3. Frequency of facet selection in AnimoSearch

The top-level facet provides users with options to refine their searches and access relevant resources based on specific criteria. It is important to note that the availability and specific facets used in Primo may vary depending on the configuration and customization of individual Primo implementations. In AnimoSearch, users can further refine their results in terms of availability and type of access, namely: “Peer-reviewed Journals,” “Full-text Online,” “Open Access,” and “Available in the Library.” Filtering results by “Peer-reviewed Journals” topped the selections, followed by the “Full-text Online” facet, while “Available in the Library” was the least used.

Looking at user preferences for resource types, Table 4 shows that “Articles” was the most popular resource type with 43,945 selections. Users favored articles as a primary source of information, and these may have been sought after for their depth and scholarship. “Books” was selected 11,075 times, implying that users valued books as a resource for in-depth research. “Dissertations” ranked third in user picks with 8,995 instances. This suggested that users were interested in academic dissertations, which present original research and analysis. Users seeking in-depth and specialized information can benefit from dissertations.

With 5,405 selections, “Newspaper Articles” was also popular. Newspapers provide timely coverage of events, opinions, and other significant issues, suggesting that users value current news and information. With 4,720 and 4,671 selections, respectively, “Book Chapters” and “Reviews” were relatively popular. This suggested that users valued reading book chapters and reviews to learn about certain topics or evaluate resources. “Journals” had 3,091 selections, demonstrating that users actively sought specific journals to read the articles and intellectual content published in them. Other resource types, such as “Text Resources,” “Newsletter Articles,” “Conference Proceedings,” and “Market Research,” also received varying numbers of selections, suggesting distinct study interests and needs.

Table 4. User preferences for resource types

These data indicated that users preferred peer-reviewed material. Users also showed interest in books, dissertations, and newspaper articles, emphasizing the need for broad and updated knowledge sources. Users’ concentration on specific areas or resource quality was shown by their choice of resource types, such as book chapters and reviews. These findings indicated a varied user base with diverse needs, as users sought a wide range of resources to support their research and information needs. Understanding these patterns of resource type selection helps inform the creation and customization of the AnimoSearch system, ensuring that users can simply access and navigate the most desired resource types for a better user experience.

4.3 Variation in Search Behavior by User Segment

Through Primo Analytics, various actions conducted by AnimoSearch users while interacting with the service were recorded and summarized. This allowed for analyzing the difference in behavior and preferences per user segment.

The top ten user actions in AnimoSearch shed light on users’ preferences and behavior during interactions with the system (see Table 5). Of the 141 action types, the most popular activity was the use of basic search, which had 973,985 occurrences, demonstrating that users regularly used the search functionality to find specific objects or information. Displaying the full record of an item followed closely with 922,116 instances, indicating great interest in getting thorough details or metadata. The third most popular activity was clicking on an item's title, which occurred 337,905 times, showing users’ willingness to explore further information or inspect the item itself. Navigating to the next page of search results or item lists was also popular, with 249,283 instances of the “Next page” action. Users checked availability statements frequently, executing the “Click on availability statement” action 218,364 times.

On the other hand, users used the advanced search tool 200,212 times, showing a need for more refined or particular searches. Facet filtering was also popular, with 130,410 instances reflecting users’ desire to limit search results based on specified criteria. Creating or accessing citations for items was another typical action, occurring 71,974 times. Users clicked on “Main Menu Link 1” 68,885 times, indicating its significance as a navigation feature. Finally, the “AZ list” action occurred 33,744 times, suggesting users’ tendency to browse or locate certain things alphabetically.

The results highlight the most popular activities of the users as they engaged with the discovery service. This information helps prioritize features and functionalities that align with user preferences, as evidenced by the high frequency of certain actions. On the other hand, this could also help in modifying the interface if there are certain actions that the library wants users to perform more often. For example, making the Advanced Search more visible may increase the use of this functionality.

Table 5. Top 10 user actions in AnimoSearch

When performing a basic search in AnimoSearch, users have the option to select the search scope. Search scope defines where the system should perform the search. There are three predefined search scopes: Everything (default), Online Resources, and Physical Resources. A user who chooses to search in Everything will get the available materials provided by the DLSU Libraries regardless of format.

The frequency with which the various search scopes are selected in AnimoSearch by different user segments was also examined. It showed that the Guest user category had the highest frequency in all search scopes. Ranking their preferences in terms of search scope selection, Graduate, Undergraduate, and Senior High School (SHS) students exhibited a similar pattern where they mostly used Physical Resources, followed by Everything and, lastly, Online Resources. For Faculty, Physical Resources topped the rank as well, but their second most used scope was Online Resources. With only 161 (Everything) and 2,370 (Physical Resources) occurrences, the Staff user group had the lowest frequency of selecting search scopes. Overall, data showed that when logged in, users mostly searched Physical Resources first, while guests (not logged in) employed the default search scope option, which is Everything.

Table 6 presents the breakdown of facet usage by different user groups in AnimoSearch. Findings revealed that facet usage varied across user segments. Faculty members mostly used the Resource Type facet, followed by Topic and Date Slider, while Graduates mostly preferred to use Collection, Topic, and Resource Type. For the Undergraduates, the top three facets were Resource Type, Top Level, and Date Slider, which indicated that they are more particular in selecting the format or type of information resource to streamline their research process and retrieve more targeted results. Senior High School students mirrored this behavior but at a much lower frequency. This is understandable, as the number of SHS students was much lower than that of undergraduates. Staff had the fewest interactions with the system and mostly used the Resource Type, Date Slider, and Topic facets. Guests, on the other hand, were AnimoSearch users who were not signed in and who may belong to any of the abovementioned user segments. Overall, Guest users most frequently used the Top Level facet as well as the Resource Type and Date Slider facets. The table further shows that overall, Top Level was the most used facet in AnimoSearch, while Author was the least. These findings provided valuable evidence that each user segment varies in terms of actions and preferences.
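
A per-group breakdown of this kind can be derived from raw selection counts. The Python sketch below assumes that each percentage is a facet's share of all facet selections within one user group, which the text does not state explicitly; the counts shown are hypothetical placeholders, not values from Table 6.

```python
def facet_shares(counts_by_group):
    """Convert raw facet counts per user group into within-group percentages."""
    shares = {}
    for group, facets in counts_by_group.items():
        total = sum(facets.values())
        shares[group] = {f: round(100 * n / total, 1) for f, n in facets.items()}
    return shares

# Hypothetical counts for illustration; not the study's data.
usage = {
    "Faculty": {"Resource Type": 50, "Topic": 30, "Date Slider": 20},
    "Undergraduate": {"Resource Type": 400, "Top Level": 350, "Date Slider": 250},
}

shares = facet_shares(usage)
# Most-used facet per group, by share.
top_facet = {group: max(s, key=s.get) for group, s in shares.items()}
print(shares)
print(top_facet)
```

Expressing usage as within-group shares makes small segments such as Staff comparable with large ones such as Guests, whose raw counts would otherwise dominate.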

Table 6. Percentage of facet usage by user groups in AnimoSearch

5 Conclusion

This study provides valuable insights into the search behavior and preferences of AnimoSearch users. It revealed a diverse range of popular search terms used by AnimoSearch users, including COVID-19-related queries and popular database titles such as Euromonitor, JSTOR, and Scopus. The search queries examined revealed that more than 75.00% of the searches were exploratory. For known item searches, title was the most common search strategy, and books and book chapters were the most common item type searched, followed by articles. On average, users used 4.3 terms per search string. Findings also revealed that users tended to search using broad terms and rarely used techniques that may help them refine their search.

In terms of facet selection, the most frequently used facets in AnimoSearch were “Top Level,” “Resource Type,” and “Date Slider,” followed by “Topic,” “Collection,” and “Library.” The analysis of user actions in AnimoSearch revealed little variation in search behavior among different user segments when it came to facet selection and search scope selection. Basic search was the most popular activity across user segments, with articles being the most frequently selected resource type, followed by books and dissertations.

In conclusion, AnimoSearch proves to be a well-utilized tool for discovering a diverse range of academic resources. However, students would benefit from improving their understanding of advanced search tools and techniques. While the current research instruction program at DLSU Libraries includes teaching this skill, it would be beneficial if this area were given more emphasis through additional tutorials and guides. Additionally, although advanced search is clearly visible in the interface, it is still less frequently used than basic search. It is necessary to find a better strategy to encourage users to use this feature more. Finally, it is important to note that TLA, such as the one conducted in this study, does not provide context for user behavior. Therefore, it is highly recommended that further studies be undertaken to explore the motivations and contextual factors behind user behavior when using AnimoSearch. This will provide a more comprehensive understanding of how users interact with the platform and will help guide future improvements.