
1 Introduction

In the contemporary digital age, humans are influenced by external digital stimuli, particularly through recommendation algorithms that impact emotions, thoughts, and actions [1, 2]. On YouTube, thumbnails are powerful visual attractors that significantly affect a user’s decision to view a video, making them crucial for capturing interest and guiding subsequent actions. The trend towards shorter video formats, such as YouTube Shorts, has reshaped content consumption, catering to fast-paced lifestyles seeking brief, engaging content.

This research explores biases in recommendation algorithms as they pertain to YouTube Shorts’ thumbnails. By examining how these visual elements are recommended and disseminated, the study aims to uncover patterns of bias. Specifically, it addresses the following research questions:

  • RQ1: How does the topical content of YouTube Shorts’ thumbnails change over time through recommendations?

  • RQ2: What types of topics are more and less frequently recommended for YouTube Shorts after multiple recommendation cycles?

  • RQ3: How does the content depicted in these thumbnails, as recommended by YouTube’s algorithm, perpetuate biases on the platform?

To answer these questions, we applied current topic modeling and caption generation techniques. This research aims to enhance the understanding of how recommendation algorithms affect thumbnail content and what this implies for content diversity and user engagement.

This paper is structured as follows: Sect. 2 reviews key concepts and relevant literature. Section 3 details our data collection and analytical methods. Section 4 presents findings on biases in YouTube Shorts’ recommendations with graphical analyses. Finally, Sect. 5 summarizes key insights and discusses implications for future research and practical applications.

2 Literature Review

This literature review provides an overview of key studies relevant to various aspects of our research.

2.1 South China Sea Dispute

The South China Sea (SCS) is a critical geopolitical region with significant attention due to overlapping territorial claims and its strategic importance for maritime routes and natural resources.

The research by [3] highlights the dual factors of natural resources and freedom of navigation, emphasizing the SCS’s abundant resources like oil, natural gas, and rich fishing grounds. They also stress the SCS’s importance as a vital trade route. The work by [4] examines China’s strategic approach, detailing efforts to consolidate claims and expand influence through diplomatic, economic, and military measures. The study by [5] analyzes China’s assertiveness from 1970 to 2015, identifying key turning points and highlighting the cumulative effects of China’s actions.

These studies underscore the SCS’s impact on regional stability, international maritime law, and the balance of power in the Asia-Pacific region.

2.2 YouTube Shorts and Thumbnails

YouTube Shorts, introduced to meet the demand for short-form content, have quickly become a dominant format, particularly in entertainment categories [6]. These videos attract higher engagement metrics compared to regular videos (RVs) but pose monetization challenges due to fewer advertisement opportunities, requiring new revenue strategies for creators [7]. The popularity of Shorts reflects changes in viewer behavior, aligning with shorter attention spans [8].

Thumbnails on YouTube play a crucial role in attracting viewers and influencing video selection, directly impacting click-through rates (CTR) and engagement metrics. Videos with visually appealing thumbnails and high view counts are more likely to be selected [9]. However, clickbait thumbnails can lead to viewer dissatisfaction when the content does not meet expectations [10, 11]. Technological advancements like Optical Character Recognition (OCR) help detect and avoid clickbait, ensuring accurate representation of video content [12]. Thumbnails also influence algorithmic recommendations, as higher CTRs lead to more favorable placements in user feeds.

2.3 Recommendation Bias

Recommender systems significantly influence content consumption, often embedding biases and creating filter bubbles. Bias can arise from user preferences, algorithmic design, and training data. The study by [13] highlights how recommendation algorithms shape public discourse by promoting emotionally charged content [14]. Another work by [15] discusses content drift towards homogeneous and radical themes, emphasizing the importance of monitoring these shifts. Audits by [16,17,18] reveal the promotion of biased content, underlining the need for interventions. The work by [19] found that YouTube’s algorithm promotes content with specific emotional tones, affecting user engagement. Cross-topical analysis by [20] reveals biases in diverse contexts and highlights content drift risks. Lastly, the research by [21] found that YouTube’s algorithm can lead users, especially those with right-leaning ideologies, towards radical content [22]. Thus, addressing recommender bias and drift is crucial for a well-informed public.

To the best of our knowledge, no previous studies have specifically investigated biases in YouTube Shorts and their thumbnails. Therefore, our research offers a unique and novel contribution to the understanding of algorithmic biases in digital media platforms.

3 Methodology

This section details the methods used for data collection, topic modeling, and analysis of YouTube Shorts’ thumbnails to investigate recommendation biases.

3.1 Data Collection

To initiate data collection, we held workshops with experts to generate relevant keywords for our search, targeting YouTube Shorts videos.

Due to the YouTube Data API’s limitations with Shorts, we used APIFY’s YouTube Scraper [23] to collect 1,210 unique video IDs. Finding this insufficient, we employed a snowball method to generate additional keywords, using the YouTube Data API and transcriptions from previous research [24, 25].
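
For illustration, the following is a minimal sketch of how seed video IDs can be gathered with APIFY's Python client. The actor slug and the input field names are placeholders, not the exact configuration used in this study; the actual input schema is documented for the YouTube Scraper actor [23].

```python
# Hedged sketch: collecting seed video IDs via the Apify Python client.
# The actor slug and run_input fields below are placeholders.
from apify_client import ApifyClient

client = ApifyClient("<APIFY_API_TOKEN>")

run = client.actor("<youtube-scraper-actor-slug>").call(run_input={
    "searchQueries": ["South China Sea dispute shorts"],   # placeholder field name
    "maxResults": 500,                                      # placeholder field name
})

video_ids = set()
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if "id" in item:                                        # output field depends on the actor
        video_ids.add(item["id"])
```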

The keywords for the South China Sea Dispute covered legal rulings, geopolitical tensions, and economic interests, enabling us to collect 2,094 unique video IDs for a detailed analysis of the conflict.

To measure bias in YouTube Shorts recommendations, we developed a custom scraping method.

Using the collected video IDs as seed videos, we ensured a neutral user profile by launching a fresh WebDriver instance for each seed. Automated with Selenium, our script scrolled through the recommendation feed to a depth of 50 videos, then started a new session for the next seed video.
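
A condensed sketch of this crawling loop is shown below, assuming Selenium with Chrome. The keyboard-based navigation, the incognito profile, and the timing values are illustrative assumptions rather than the exact script used.

```python
# Minimal sketch of the depth-50 Shorts crawl; selectors and navigation are assumptions,
# since YouTube's markup changes frequently.
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

MAX_DEPTH = 50  # number of recommended Shorts to follow per seed video

def crawl_seed(seed_video_id: str) -> list[str]:
    """Open a seed Short in a fresh browser instance and record recommended video IDs."""
    options = webdriver.ChromeOptions()
    options.add_argument("--incognito")            # fresh, neutral profile per seed
    driver = webdriver.Chrome(options=options)
    visited = []
    try:
        driver.get(f"https://www.youtube.com/shorts/{seed_video_id}")
        time.sleep(3)
        for _ in range(MAX_DEPTH):
            # Advance to the next recommended Short (assumed to respond to ARROW_DOWN).
            driver.find_element(By.TAG_NAME, "body").send_keys(Keys.ARROW_DOWN)
            time.sleep(2)
            visited.append(driver.current_url.rsplit("/", 1)[-1])
    finally:
        driver.quit()                              # new session for the next seed video
    return visited
```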

We collected 104,700 videos, and after filtering out unavailable ones, our final dataset included 100,300 videos with their thumbnail images, obtained via the YouTube Data API.
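
The thumbnail retrieval step can be sketched as follows with the official YouTube Data API v3 Python client; the batch size of 50 is the API's documented limit, while the choice of the "high" resolution is an assumption.

```python
# Illustrative sketch (not the authors' exact script) of retrieving thumbnail URLs
# for the collected video IDs with the YouTube Data API v3 client.
from googleapiclient.discovery import build

youtube = build("youtube", "v3", developerKey="YOUR_API_KEY")

def fetch_thumbnails(video_ids):
    """Return {video_id: thumbnail_url}, querying in batches of 50 IDs."""
    thumbs = {}
    for i in range(0, len(video_ids), 50):
        batch = video_ids[i:i + 50]
        response = youtube.videos().list(part="snippet", id=",".join(batch)).execute()
        for item in response.get("items", []):     # unavailable videos are simply absent
            # "high" resolution chosen here; "default" and "medium" also exist.
            thumbs[item["id"]] = item["snippet"]["thumbnails"]["high"]["url"]
    return thumbs
```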

3.2 Caption Generation

To investigate thumbnail images in detail and understand their context, we generated captions that describe the contents and events depicted in the images.

For this, we used GPT-4 Turbo, an enhanced version of OpenAI’s GPT-4 language model. This model is optimized for speed and cost-effectiveness while maintaining similar capabilities and performance to the standard GPT-4, making it suitable for applications requiring quick responses and scalability. GPT-4 Turbo excels in natural language understanding and generation, supporting tasks from text completion and translation to content creation and conversational AI [26].
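
A minimal sketch of this captioning step is shown below, assuming the OpenAI Python client (openai>=1.0) and thumbnails accessible by URL; the prompt wording and token limit are illustrative.

```python
# Hedged sketch of generating a descriptive caption for a thumbnail image with GPT-4 Turbo.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def caption_thumbnail(thumbnail_url: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Describe the contents and events depicted in this YouTube thumbnail."},
                {"type": "image_url", "image_url": {"url": thumbnail_url}},
            ],
        }],
        max_tokens=120,
    )
    return response.choices[0].message.content
```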

3.3 Topic Modelling

To understand the thumbnails’ captions, we classified them into topics to track their evolution through recommendations.

We used GPT-4o, a refined and efficient version of GPT-4 designed for faster performance and lower costs while maintaining advanced natural language capabilities [26]. This model generated two types of topics: general topics and categorized topics subject to specific constraints.
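
A hedged sketch of the constrained (categorized) labelling step is shown below. The category list is abbreviated to a few of the categories named in Sect. 4.2, and the prompt wording is an assumption.

```python
# Sketch of assigning each thumbnail caption to one of the 20 predefined categories with GPT-4o.
from openai import OpenAI

client = OpenAI()
CATEGORIES = ["News", "Politics", "History", "Lifestyle", "Entertainment"]  # abbreviated; 20 in the study

def categorize_caption(caption: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system",
             "content": "Assign the caption to exactly one of these categories: "
                        + ", ".join(CATEGORIES)
                        + ". Reply with the category name only."},
            {"role": "user", "content": caption},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()
```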

We also used BERTopic [27], a technique that leverages BERT embeddings to capture semantic similarities, resulting in coherent and interpretable topics. A version pre-trained on approximately 1,000,000 Wikipedia pages [28] was used, identifying 2,377 distinct topics. This framework effectively analyzed the thematic content of the thumbnail captions.
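
The BERTopic step can be sketched as follows; the checkpoint identifier is an assumption based on the Huggingface model cited as [28], and the example captions are illustrative.

```python
# Sketch: applying a pre-trained Wikipedia BERTopic model to thumbnail captions.
from bertopic import BERTopic

# Checkpoint name assumed from the Huggingface reference [28].
topic_model = BERTopic.load("MaartenGr/BERTopic_Wikipedia")

captions = ["A naval ship patrolling near disputed islands",
            "A map highlighting the South China Sea"]
topics, probs = topic_model.transform(captions)

# Inspect the representative terms of each assigned topic.
for topic_id in topics:
    print(topic_id, topic_model.get_topic(topic_id))
```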

3.4 Clustering Topics

To analyze the topics discussed in Sect. 3.3, we clustered them due to their large number.

We filtered out non-informative topics like ‘Photograph(s)’, ‘Thumbnail(s)’, ‘Image(s)’, and ‘Video(s)’.

BERT embeddings [29] were generated to capture semantic meaning through dense vector representations, considering both preceding and following contexts for accuracy.

These embeddings were reduced to two dimensions using t-SNE for visualization [30], which reveals intricate local patterns effectively.

The reduced features were clustered using the OPTICS algorithm [31], which handles clusters of varying density and does not require the number of clusters to be specified in advance.
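
The clustering pipeline of this subsection can be summarized in the following sketch; the BERT checkpoint, the mean-pooling strategy, and the t-SNE/OPTICS parameters are assumptions, scaled down for a toy input.

```python
# Condensed sketch of Sect. 3.4: filter non-informative labels, embed with BERT,
# project to 2D with t-SNE, cluster with OPTICS.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.manifold import TSNE
from sklearn.cluster import OPTICS

# Placeholder labels standing in for the ~2,300 topic labels produced in Sect. 3.3.
raw_topics = ["Geopolitics", "Military", "Naval ships", "Fishing grounds", "Gaming",
              "Cuisine", "Dance", "Memes", "Robotics", "Thumbnail", "Image"]
STOP_TOPICS = {"photograph", "photographs", "thumbnail", "thumbnails",
               "image", "images", "video", "videos"}
topics = [t for t in raw_topics if t.lower() not in STOP_TOPICS]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

with torch.no_grad():
    enc = tokenizer(topics, padding=True, truncation=True, return_tensors="pt")
    embeddings = model(**enc).last_hidden_state.mean(dim=1)   # mean-pooled contextual embeddings

# Small perplexity / min_samples because of the toy input; a full run uses larger values.
coords = TSNE(n_components=2, perplexity=3, random_state=42).fit_transform(embeddings.numpy())
labels = OPTICS(min_samples=2).fit_predict(coords)            # label -1 marks noise points
print(list(zip(topics, labels)))
```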

4 Results

This section presents our findings on biases in YouTube Shorts’ recommendations, supported by graphical analyses of topic shifts and distributions.

4.1 Clustered General Topics with GPT

The GPT model generated 2,314 unique general topics. To visualize the topic clusters, we plotted the 50 most frequent topics in a 2D space at various depths, using t-SNE components for dimensionality reduction. The legend distinguishes clusters by color and shows noise points in gray.
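
Plots such as Figs. 1 and 2 can be produced along the following lines; the function signature, variable names, and styling choices are assumptions.

```python
# Illustrative sketch of the cluster scatter plots: keep the 50 most frequent topics,
# color points by cluster label, and draw noise (label -1) in gray.
from collections import Counter
import matplotlib.pyplot as plt

def plot_depth(topic_labels, coords, cluster_labels, depth, top_k=50):
    top_topics = {t for t, _ in Counter(topic_labels).most_common(top_k)}
    keep = [i for i, t in enumerate(topic_labels) if t in top_topics]
    for i in keep:
        is_noise = cluster_labels[i] == -1
        plt.scatter(coords[i, 0], coords[i, 1],
                    color="gray" if is_noise else f"C{cluster_labels[i] % 10}",
                    alpha=0.7)
    plt.title(f"General Topic Clustering for Depth {depth}")
    plt.xlabel("t-SNE 1")
    plt.ylabel("t-SNE 2")
    plt.show()
```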

As shown in Fig. 1, representing depth 0, topics are clustered together. Cluster 0 includes terms like politics, history, diplomacy, war, and military, indicating political themes. Clusters 3, 4, and 5 contain terms like ships, aircraft, and fishing, relating to aircraft carriers and economic perspectives at sea. Cluster 7 includes terms like geopolitics, map, and Philippines, highlighting the geographic perspective of the topic. Other clusters depict activities such as broadcasts, meetings, and presentations, with some, like Cluster 1, showing animated explanations of the topic. Initial depth videos were highly relevant to the investigated topic.

At depth 5, as displayed in Fig. 2, the topics are vastly different from the initial ones. Cluster 0 shows crafting topics, Cluster 1 covers machines and robotics, Cluster 3 is mostly about gaming, Cluster 4 includes dance, gym, and martial arts, Cluster 7 features child and dog-related terms, and Cluster 8 contains memes. The original topics have almost completely faded, with many new topics emerging.

We do not include every depth here: the topics drifted early, and space constraints limit our illustrations to these two depths. Further depths are shown in the following result subsections.

Fig. 1. General Topic Clustering for Depth 0

Fig. 2. General Topic Clustering for Depth 5

4.2 Categorized Topics with GPT

In this section, we investigate the categorized topics introduced in Sect. 3.3. Using the GPT model, we generated 20 categorized topics. These categories were determined by researching online and examining YouTube video categories, supplemented by additional categories from various news websites to ensure comprehensive coverage. We thus settled on 20 categories to encompass a broad range of topics, as shown in Fig. 3. We used a lollipop chart for clarity, with the Y-axis representing the topics and the X-axis showing the topic ratios. The legend indicates depth ranges with corresponding colors. Topic counts are accumulated within each of five depth ranges for convenience and clarity, normalized, and entries with a ratio below 0.01 are filtered out.
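
The depth-range distributions behind Fig. 3 can be computed as in the following sketch; the exact bin boundaries, column names, and example rows are assumptions.

```python
# Sketch: group depths into five ranges, accumulate category counts, normalize per range,
# and drop entries whose ratio falls below 0.01.
import pandas as pd

# One row per recommended video, with its crawl depth and GPT-assigned category (toy data).
df = pd.DataFrame({"depth": [0, 0, 3, 12, 25, 48],
                   "category": ["News", "Politics", "News", "Entertainment", "Gaming", "Lifestyle"]})

df["depth_range"] = pd.cut(df["depth"], bins=[-1, 9, 19, 29, 39, 50],
                           labels=["0-9", "10-19", "20-29", "30-39", "40-50"])

counts = df.groupby(["depth_range", "category"], observed=True).size()
ratios = counts.groupby(level=0).transform(lambda c: c / c.sum())   # normalize within each range
ratios = ratios[ratios >= 0.01]                                      # filter ratios below 0.01
print(ratios)
```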

At the initial depth (depth 0), news dominates nearly 40% of the topics, followed by politics at around 15%, with other topics like history and lifestyle also present. As the recommendation algorithm suggested new videos, the newly introduced topics were largely neither news nor politics. News topics declined dramatically at deeper levels, and political terms disappeared entirely. New topics, primarily entertainment-related, increased with each depth. Lifestyle topics remained relatively stable across depths due to their broad and encompassing nature. The graph clearly shows the topic shift occurring in the recommendation algorithm.

Fig. 3. The distribution of Categorized Topics Across Depths

4.3 Topic Distribution with BERTopic

In addition to generative AI, we utilized BERTopic for topic modeling, as detailed in Sect. 3.3. For illustration, we used radar charts to visualize the topics at the initial, middle, and final depths, as shown in Fig. 4. The topic IDs and names are shown around the circle; detailed topic definitions are available on Huggingface [28]. We focused on the three most prevalent topics at each depth level to effectively track topic transitions.

At depth 0, the most prominent topics included flag, geography, and uniform. For example, topic 1650 (uniforms, uniformed, berets, beret) related primarily to military and soldiers, topic 935 (geography, geographic, geographical, geographer) concerned geopolitical regions around the South China Sea, and topic 111 (flags, flag, flagpole, commonwealth) referenced different nations. These topics clearly relate to the South China Sea Dispute.

As we moved to deeper levels, we observed the emergence of unrelated topics, with the initial topics fading away. At later depths, we observed topics like 706 (artistic, art, artwork, paintings), 5 (cuisine, cuisines, foods, culinary), and 1879 (lighting, lights, fluorescent, light). This shift indicates that the recommendation algorithm steers away from the original subject matter towards more general and unrelated topics.

Although BERTopic may not capture topics as effectively as GPT, it vividly demonstrates the topic drift across depths. The transition from focused, relevant topics to broader, unrelated ones highlights the algorithm’s influence on content distribution, illustrating how quickly the focus can shift away from the initial subject matter.
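
Radar charts such as those in Fig. 4 can be drawn along the following lines; the topic names follow the depth-0 examples above, while the topic shares shown are hypothetical placeholders.

```python
# Minimal sketch of a radar (polar) chart for the three most prevalent topics at one depth.
import numpy as np
import matplotlib.pyplot as plt

topic_names = ["1650 uniforms", "935 geography", "111 flags"]   # example: depth 0
shares = [0.12, 0.09, 0.07]                                     # hypothetical topic shares

angles = np.linspace(0, 2 * np.pi, len(topic_names), endpoint=False).tolist()
values = shares + shares[:1]        # close the polygon
angles = angles + angles[:1]

ax = plt.subplot(polar=True)
ax.plot(angles, values, marker="o")
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(topic_names)
ax.set_title("Topic Distribution at Depth 0 (BERTopic)")
plt.show()
```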

Fig. 4. Topic Distribution Using BERTopic Across Depths

5 Conclusion and Discussion

In this study, we investigated algorithmic bias in YouTube Shorts’ video recommendations, focusing on thumbnail captions within the context of the South China Sea Dispute.

Our findings indicate a clear topic shift or drift in YouTube Shorts recommendations. After the initial videos, broader and less relevant topics are suggested due to YouTube’s recommender system favoring high-engagement entertainment videos. This popularity bias results in the neglect of minority and serious issues, creating an algorithmic bias on YouTube Shorts. Consequently, these more popular but less serious videos are recommended more frequently.

For future work, we will address study limitations by comparing results with engagement scores to validate our assumptions. We will investigate various narratives, comparing well-known topics with niche subjects like the South China Sea Dispute, to understand recommendation levels. Additionally, we will incorporate interactive data collection (liking and commenting) to observe how user interactions affect recommendations and analyze text attributes such as titles, descriptions, and transcriptions.

This research highlights biases in YouTube’s algorithmic recommendations, focusing on thumbnails. Understanding these biases is essential for fair representation of diverse topics, especially serious and minority issues. Thumbnails influence user engagement, and our study shows how algorithmic preferences can skew topic visibility. By exposing these biases, we contribute to the discourse on digital media ethics and the need for transparency in recommendation systems.