Background

According to the GLOBOCAN 2020 estimates, female breast cancer is the most common type of cancer, with an estimated 2.3 million new cases, and is the fifth leading cause of cancer-related deaths, accounting for 685,000 deaths [1]. The disease burden of breast cancer varies significantly among countries, with China topping the list of the number of cases and death rate [2]. With the development of medical technology and popularization of breast cancer screening programs, early diagnosis and treatment of breast cancer have progressed significantly, reducing mortality [3, 4]. Patients with breast cancer often experience multiple concurrent and interacting symptoms throughout their cancer journey, which require obtaining relevant health information to cope with the disease [5]. Patients and their families often have diverse needs at different treatment stages [6]. Among these, information and emotional needs are the most common. In a cross-sectional study, patients with breast cancer in China scored low on instrumental, emotional, and informational support [7]. Several family members of patients with breast cancer reportedly scored moderately high on information needs and low on emotional function and mental health [8, 9]. However, the existing research is mainly cross-sectional surveys and semi-structured interviews, which are often limited to samples from specific geographic areas, thus restricting the widespread applicability of the findings. In addition, an in-depth exploration of the needs of patients and their families during the various treatment stages is lacking.

With the rapid development of internet technology and widespread use of online social platforms, an increasing number of patients and their families seek online support [10]. Patients with breast cancer and their families reportedly often search for cancer-related issues on internet platforms to increase their understanding of the disease and achieve personal education goals. Additionally, they quickly obtain and share disease-related information through online message platforms and support groups, providing emotional support to each other, reducing anxiety and loneliness, and promoting positive disease coping mechanisms [11]. Therefore, the analysis and mining of data from social platforms are necessary to understand the health information needs of patients with breast cancer.

Approximately 420,000 new cases of breast cancer are diagnosed in China annually. Recently, the incidence of breast cancer has increased by 3–4% annually. The peak age of onset of female breast cancer is 45–55 years in China, which is earlier than 55–59 years in Western countries. The early onset implies a heavier disease burden on the patients, their families, and society [12]. In China, breast cancer is a high-frequency search topic in online health communities (OHCs). Patients with breast cancer and their families often discuss disease-related issues, share medical experiences, exchange information, and express fear and frustration with others by posting messages on OHCs [13]. These posts are genuine logs of self-reported needs, feelings, and knowledge shared by patients with breast cancer and their families from different regions of the country during various treatment phases that are regularly updated [13]. The evolving and multifaceted needs of patients with breast cancer and their families throughout their treatment journey can be understood by text-mining these narratives. In traditional Chinese culture, breast cancer is considered an emotional issue or the result of karmic retribution, leading to feelings of stigma in patients and even vicarious stigma of their caregivers, causing them to be reluctant to seek help from their families and professionals [7]. Although cross-sectional and longitudinal analyses of topics discussed on Chinese breast cancer forums have been performed, in-depth exploration of the needs of patients with breast cancer and their families is lacking, and few studies have explored the needs of patients with breast cancer by text-mining Chinese social media posts.

OHC-related data are enormous, with complex structures and diverse content, making manual analysis and processing inefficient and ineffective. Extracting valuable information therefore requires automated text analysis based on natural language processing and machine learning. To determine the most suitable topic modeling method for this study, we evaluated several mainstream topic modeling techniques at the initial stage of the research: BERTopic, which is based on Bidirectional Encoder Representations from Transformers (BERT), traditional Latent Semantic Indexing (LSI), and Latent Dirichlet Allocation (LDA) [14,15,16]. Through comparative analysis, we noted that the unsupervised LDA model is particularly effective in mining deep themes in texts manifested through lexical co-occurrence. Compared with BERTopic and LSI, LDA showed clear advantages in topic consistency and in performance stability under different parameter configurations. Moreover, an increasing number of researchers are using the LDA model to analyze the health information needs of internet users, suggesting that this method is well suited to text mining in web-based question-and-answer communities [17, 18]. Hence, this study used the LDA model to assess the needs of patients with breast cancer and their families in OHCs to provide a scientific basis for formulating targeted nursing measures. Because the comparison is complex, a comprehensive explanation is provided in the Electronic Supplementary Material.

Methods

The study followed a five-step process: data source determination, data collection, data cleaning and coding, data preprocessing, and LDA topic modeling. The overall process is illustrated in Fig. 1.

Fig. 1 Flowchart of data processing and modeling

Data source determination

Initially, health-related websites were searched on the Baidu search engine with “breast cancer” as the main keyword. A detailed examination of the websites showed that breast cancer-related posts appeared mainly in two types of OHCs: patient-to-patient and patient-to-doctor communities. Patient-to-patient communities are open platforms where patients with breast cancer and their families provide peer support and exchange tips and information on how to cope with cancer. “Breast Bar,” a sub-forum of Baidu Post Bar, is the largest (approximately 237,181 members) and most active (approximately 1,573,096 posts) patient-to-patient community for breast cancer in China. Patient-to-doctor communities are internet-based platforms that enable patients to seek medical consultations from doctors. “Xunyiwenyao,” “Haodaifu,” and “360Liangyi” are large, highly influential physician–patient interaction platforms in China. These platforms have > 900,000 doctors from > 100,000 legitimate hospitals nationwide. Therefore, we selected “Breast Bar,” “Xunyiwenyao,” “Haodaifu,” and “360Liangyi” as data sources.

Data collection

Using the keyword “breast cancer,” we used Python (Python Software Foundation, CWI, Netherlands) to collect posts published in “Breast Bar,” “Xunyiwenyao,” “Haodaifu,” and “360Liangyi” from January 1, 2017, to April 25, 2023. Data scraping and storage relied on Python libraries such as Requests and lxml. The core scraping process used the Requests library to make GET requests, after which the returned HTML content was parsed to acquire the data. For web pages with complex structures, advanced parsing tools were employed or user actions were simulated to extract the needed information precisely. Additionally, scraping speed was improved through multithreading and asynchronous programming techniques. Throughout the development of the web scraper, we adhered to the websites’ privacy policies, collected only the data necessary for the research, paid attention to personal data protection, and avoided unnecessary server load by setting appropriate request headers, avoiding peak hours, and implementing suitable delays. The etree module of lxml and the XPath language were used to filter out advertising posts during scraping of the aforementioned forum and websites. Available post-related information, including user ID, post title and body content, timestamp, number of likes, number of favorites, and number of comments, was stored in a database.
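The request pacing described above can be illustrated with a minimal sketch. It is not the authors' scraper: it uses only the Python standard library (urllib) instead of Requests and lxml, and the header value, the 2-second delay, and the function names `http_get` and `crawl` are assumptions made for illustration.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple
from urllib.request import Request, urlopen

# Hypothetical header value; the paper does not state which headers were sent.
HEADERS = {"User-Agent": "research-crawler (contact: researcher@example.org)"}


def http_get(url: str, timeout: float = 10.0) -> str:
    """Fetch one page with a plain GET request and return its decoded HTML."""
    with urlopen(Request(url, headers=HEADERS), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(urls: Iterable[str],
          fetch: Callable[[str], str] = http_get,
          delay_s: float = 2.0,
          sleep: Callable[[float], None] = time.sleep) -> Iterator[Tuple[str, str]]:
    """Yield (url, html) pairs, pausing between requests to limit server load."""
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_s)  # fixed pause before every request except the first
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to load (URLError subclasses OSError)
        yield url, html
```

Passing `fetch` and `sleep` as parameters keeps the pacing logic testable without network access; a real crawler would also need the parsing and advertisement-filtering steps, which are omitted here.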

Data cleaning and coding

To improve the veracity of the dataset and resolve any disagreements, the first author and a research assistant independently evaluated, filtered, and coded the text data. Inclusion criteria included (1) posts on breast cancer-related needs, (2) posts from which the current treatment stage of a patient can be inferred, and (3) posts published by the patients or their families. Exclusion criteria included (1) posts with repeated or duplicated questions and (2) posts that were irrelevant to the research question. The posts were classified based on the treatment phase, including initial diagnosis, perioperative, non-operative, relapse and metastasis, and rehabilitation treatment phases.
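Once the two coders have labeled each post, the inclusion and exclusion criteria can be applied mechanically. The following sketch assumes a hypothetical record type `CodedPost` whose fields hold the coders' agreed judgments; the field names and the whitespace-insensitive duplicate check are illustrative choices, not details from the study.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

PHASES = {
    "initial diagnosis", "perioperative", "non-operative",
    "relapse and metastasis", "rehabilitation",
}


@dataclass
class CodedPost:
    text: str
    relevant: bool          # coders judged the post to express a breast cancer-related need
    phase: Optional[str]    # treatment phase inferred by the coders, or None if not inferable
    poster: Optional[str]   # "patient", "family", or None if the poster is unclear


def apply_criteria(posts: Iterable[CodedPost]) -> List[CodedPost]:
    """Keep relevant, phase-coded posts by patients or families; drop duplicates."""
    seen = set()
    kept = []
    for post in posts:
        key = "".join(post.text.split())  # ignore whitespace when detecting duplicates
        if key in seen:
            continue  # exclusion criterion 1: repeated or duplicated post
        seen.add(key)
        if post.relevant and post.phase in PHASES and post.poster in ("patient", "family"):
            kept.append(post)
    return kept
```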

Data preprocessing

We preprocessed the extracted posts using Python. Because online communication is informal and colloquial, the same meaning is often expressed with different wording, which can impede natural language processing. We therefore first standardized words and phrases that convey identical meanings but vary in wording. For instance, “单子” (examination form) and “报告单” (report sheet) were unified as “报告单” (report sheet). Next, texts were segmented into words using “Jieba” (结巴), a widely used Chinese word segmentation library. After segmentation, the Ha Gong Da (哈工大, Harbin Institute of Technology) stop word list and a customized stop word list were used to eliminate punctuation marks, conjunctions, numbers, English letters, and a set of frequently used words with minimal meaningful content. Following this preprocessing step, meaningful noun phrases were extracted.
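The preprocessing steps can be sketched as a small pipeline. This is an illustration under stated assumptions: the synonym table contains only the one mapping given in the text, the stop word set is a tiny placeholder for the real lists, and the segmenter is passed in as a function so that the sketch runs without third-party packages (in practice, `jieba.lcut` could be supplied). Noun phrase extraction is not shown.

```python
import re
from typing import Callable, Dict, List, Set

# Only mapping stated in the text; the full table is not published here.
SYNONYMS: Dict[str, str] = {"单子": "报告单"}
# Placeholder entries standing in for the combined stop word lists.
STOP_WORDS: Set[str] = {"的", "了", "我"}

# Tokens made only of CJK characters; this drops punctuation, numbers, and English letters.
_CJK_ONLY = re.compile(r"^[\u4e00-\u9fff]+$")


def normalize(text: str, synonyms: Dict[str, str]) -> str:
    """Replace each wording variant with its canonical form."""
    for variant, canonical in synonyms.items():
        text = text.replace(variant, canonical)
    return text


def preprocess(text: str,
               segment: Callable[[str], List[str]],
               synonyms: Dict[str, str] = SYNONYMS,
               stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Normalize wording, segment into tokens, then filter noise and stop words."""
    tokens = segment(normalize(text, synonyms))
    return [t for t in tokens if _CJK_ONLY.match(t) and t not in stop_words]
```

With a whitespace splitter as a stand-in segmenter, `preprocess("我 的 单子 出来 了 2023 MRI ，", str.split)` keeps only the normalized content words.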

LDA model

During the LDA topic modeling phase, the Gensim library was utilized, which is a Python library specifically designed for topic modeling, document indexing, and similarity retrieval of large corpora [19]. Using the LDA model, topics were extracted from collected posts, revealing latent topics hidden within the texts and differentiating the various meanings of words. The optimal number of topics was determined by adopting the perplexity calculation method, where a lower perplexity value indicates a better generalization capability of the model [17, 18]. After extracting the topics, each was characterized by a set of words with varying strengths, indicating the degree of association with the topic. Words with higher strength values are more representative of their respective topics. By analyzing the semantics of these topic words, we summarized the core content of each topic and assigned descriptive names to enhance understanding of their meanings.
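For reference, perplexity is conventionally defined as the exponential of the negative average log-likelihood per token, so a model that assigns higher probability to the observed words scores lower. The sketch below illustrates that definition only; the function name and the per-token log-probability input are assumptions, and the paper does not state which routine produced its perplexity values.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(-mean natural-log probability per token)."""
    if not token_log_probs:
        raise ValueError("at least one token log-probability is required")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model that spreads probability uniformly over 4 words is as uncertain as a
# fair 4-way choice, whereas one that assigns 0.5 to each observed word is less so.
uniform_over_four = perplexity([math.log(0.25)] * 8)
sharper_model = perplexity([math.log(0.5)] * 8)
```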

When posting inquiries, posters often reveal their own or the patient’s identity information as well as the patient’s age. In this study, “poster identity” denotes clear information identifying the poster as either the patient or a relative. Posts that did not clearly identify the poster in this way were excluded in line with our inclusion criteria. Age was not required for inclusion; posts without age information were retained and classified as “unknown” for age. For example, a post might state: “My mother, 56 years old, was recently diagnosed with triple-negative breast cancer. How long can someone with triple-negative breast cancer live?” Such descriptions enabled us to determine whether the poster was the patient or a relative of the patient and to ascertain the patient’s age. Ultimately, for each treatment phase, we compiled the number of posts made by patients themselves, by relatives of patients, by patients under the age of 40, by patients aged 40 and above, and by posters from different regions. The region of the patients was inferred from the IP address captured during data collection.
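The identity and age rules above can be approximated with simple keyword and pattern matching. The sketch below is hypothetical: the paper does not say that this step was automated, the Chinese keyword lists are illustrative rather than taken from the study, and a matched age may belong to the poster rather than the patient, so manual review would still be needed.

```python
import re
from typing import Optional

# Illustrative keyword lists (assumptions, not from the study).
RELATIVE_TERMS = ("母亲", "妈妈", "我妈", "妻子", "老婆", "姐姐", "妹妹", "女儿")
SELF_TERMS = ("本人", "我今年", "我确诊", "我被诊断")
AGE_PATTERN = re.compile(r"(\d{2})\s*岁")  # two-digit age followed by "岁" (years old)


def poster_identity(text: str) -> Optional[str]:
    """Return "relative", "patient", or None when the poster cannot be identified."""
    if any(term in text for term in RELATIVE_TERMS):
        return "relative"
    if any(term in text for term in SELF_TERMS):
        return "patient"
    return None  # such posts would be excluded under the inclusion criteria


def age_group(text: str) -> str:
    """Return "<40", ">=40", or "unknown" when no age is mentioned."""
    match = AGE_PATTERN.search(text)
    if match is None:
        return "unknown"  # the post is kept; only the age field is unknown
    return "<40" if int(match.group(1)) < 40 else ">=40"
```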

Results

Basic information about the posts

We collected 84,043 posts, of which 9504 were included after data cleaning: 7786 (81.92%) from patient-to-patient communities and 1718 (18.08%) from patient-to-doctor communities. Specific information collected from each community is presented in Table 1. After coding, 4902 posts represented the initial diagnosis phase, ranking first among the five treatment phases and accounting for 51.58% of the total posts. Conversely, the rehabilitation phase had the fewest posts (588, 6.19%). The perioperative, non-operative, and relapse and metastasis treatment phases had 1134, 1992, and 888 posts, respectively. Posts in the non-operative treatment phase had the highest average number of likes (10.92 per post). Posts in the relapse and metastasis treatment phase had the highest average numbers of favorites (8.60 per post) and comments (68.22 per post). Detailed information on the average engagement metrics across the treatment phases is shown in Fig. 2.
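The percentage shares follow directly from the reported counts. The short sketch below recomputes them as a consistency check; the dictionary names are illustrative, and the counts are those stated in the paragraph above.

```python
# Post counts per treatment phase and per community type, as reported in the text.
phase_counts = {
    "initial diagnosis": 4902,
    "perioperative": 1134,
    "non-operative": 1992,
    "relapse and metastasis": 888,
    "rehabilitation": 588,
}
community_counts = {"patient-to-patient": 7786, "patient-to-doctor": 1718}

total = sum(phase_counts.values())  # should equal the number of included posts


def shares(counts: dict, denominator: int) -> dict:
    """Percentage share of each category, rounded to two decimals."""
    return {name: round(100 * n / denominator, 2) for name, n in counts.items()}


phase_shares = shares(phase_counts, total)
community_shares = shares(community_counts, total)
```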

Table 1 Basic information on post collection

Fig. 2 Average engagement metrics across treatment phases

Post-related demographic information

Only 23.60% of the posts contained information about the patient’s age. During the perioperative phase, posts from younger patients with breast cancer outnumbered those from older patients. However, in the other three treatment phases, posts from younger patients were fewer than those from older patients. In each treatment phase, the number of posts published by patients exceeded the number published by their family members. User location information was available for only 38.06% of posts. In every treatment phase, East China generated the highest number of posts and Northwest China the lowest. Specific post-related demographic information is presented in Table 2.

Table 2 Post-related demographic information (n, %)

Analysis of need-related topics among patients with breast cancer and their family members

To determine the optimal number of topics, we calculated the perplexity for each candidate number of topics. Figure 3 shows a line graph of the perplexity for each treatment phase, in which the horizontal and vertical axes represent the candidate number of topics and the perplexity value, respectively. We considered topic numbers N in the interval [1, 10] to ensure that valuable topics would be discovered and each topic would be well explained. The perplexity values for each treatment phase first decreased rapidly as the number of topics increased, reaching their lowest point at three topics for the initial diagnosis phase and two topics each for the perioperative, non-operative, relapse and metastasis, and rehabilitation phases. Subsequently, as the number of topics increased, the perplexity values gradually increased. Therefore, with three topics for the initial diagnosis phase and two for each of the other phases, the model was the most stable, and the intelligibility of the topic words was the best. Table 3 shows the 11 final topics with their respective topic words and strengths. These topics enabled us to explain and summarize the discussion text for each topic.
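The selection rule, choosing the number of topics with the lowest perplexity for each phase, can be sketched as follows. The perplexity curve in the example is invented for illustration and is not read from Fig. 3; only the chosen topic numbers per phase come from the text.

```python
from typing import Mapping


def best_num_topics(perplexity_by_n: Mapping[int, float]) -> int:
    """Return the topic number with the lowest perplexity (smallest N on ties)."""
    return min(sorted(perplexity_by_n), key=lambda n: perplexity_by_n[n])


# Hypothetical curve: falls quickly, bottoms out at N = 3, then rises again.
example_curve = {1: 310.0, 2: 250.0, 3: 231.0, 4: 240.0, 5: 262.0}
example_choice = best_num_topics(example_curve)

# Topic numbers reported in the text for each treatment phase.
chosen = {
    "initial diagnosis": 3,
    "perioperative": 2,
    "non-operative": 2,
    "relapse and metastasis": 2,
    "rehabilitation": 2,
}
total_topics = sum(chosen.values())  # matches the 11 final topics in Table 3
```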

Fig. 3 Perplexity change graph. Initial diagnosis phase (A), perioperative treatment phase (B), non-operative treatment phase (C), relapse and metastasis treatment phase (D), and rehabilitation phase (E)

Table 3 Distribution and strength of topics and keywords among patients with breast cancer and families across treatment phases

Initial diagnosis phase

Three topics were identified in the initial diagnosis phase: (1) disease outcomes, (2) diagnostic analysis, and (3) treatment information and emotional support.

Disease outcomes (Topic 1)

Keywords such as “cure rate, life expectancy, prognosis, recurrence, and survival period” suggest that the primary concerns of individuals newly diagnosed with breast cancer and their families are the prospects of curing the disease and improving their overall survival. For example, similar to some other posts, one patient asked, “I am 32 years old, and today I received a pathological diagnosis of invasive breast cancer. How long can I live?”.

Diagnosis analysis (Topic 2)

Keywords related to auxiliary examinations, such as “immunohistochemistry, puncture, ultrasound (B-mode), biopsy, color Doppler ultrasound, magnetic resonance imaging (MRI), and report sheet,” indicated that patients and their families sought help from doctors or fellow patients on online platforms to interpret medical reports and obtain comprehensive diagnostic information. A patient asked, “I underwent a color ultrasound and needle biopsy at the Beijing Cancer University Hospital 8 days ago, and the diagnosis was breast cancer. Can anyone please help me review my report? Is this type relatively less malignant?”

Treatment information and emotional support (Topic 3)

Keywords such as “treatment method, chemoradiotherapy, surgery, breast-conservation, targeted therapy, costs, and hospitals” showed that patients with breast cancer and their families sought information about treatment plans, costs, and hospitals. For instance, in one post, an individual asked for help because the doctor mentioned three treatment plans for a mother’s invasive breast cancer diagnosis. Keywords such as “breakdown, fear, and positive energy” indicated that patients and families often experience complex emotions, including “emotional turmoil, desperation, sadness, and helplessness,” following a breast cancer diagnosis. They expressed their feelings online and sought support and confidence by sharing their experiences. For example, a newly diagnosed patient with breast cancer expressed feelings of fear and helplessness, asking for support and sharing experiences from others, to which other users responded.

Perioperative treatment phase

The perioperative treatment phase included two topics: (1) surgical options and outcomes and (2) postoperative care and treatment planning.

Surgical options and outcomes (Topic 4)

Keywords such as “recurrence, reconstruction, prognosis, artificial breast, implantation, modified radical surgery, breast-conserving surgery, total resection, infection, and general anesthesia” indicated that patients and their families were concerned about breast cancer surgery and its outcomes. Surgery is the primary treatment for early- and mid-stage breast cancer. Surgical options include modified radical mastectomy, breast-conserving surgery, and breast reconstruction after mastectomy. Patients can choose the most suitable option based on their situation. However, navigating these choices can be daunting; therefore, patients often seek help from healthcare professionals or peers for more information to aid their decision-making. Common questions included “We need to choose a surgical plan. Has any patient had their whole breast removed or received implants? If so, can you share what it was like?”

Postoperative care and treatment planning (Topic 5)

The keywords in this topic were related to postoperative care and follow-up treatment. Keywords such as “diet, recovery period, functional exercise, and weight” indicated that patients and their families wanted information on diet, exercises, and activities after breast cancer surgery. Keywords such as “drainage, pain, and swelling” reflected their interest in drainage tube maintenance and the management of common postoperative symptoms. For example, one user asked, “Has anyone in this group undergone breast reconstruction surgery? If so, I am curious to know how long the drainage tube needs to be kept in place.” Additionally, “chemotherapy and radiotherapy” indicated concerns about future treatment plans after breast cancer surgery. For instance, similar to some other posts, one user asked, “My wife underwent breast resection for ductal carcinoma in situ approximately ten days ago. Does the patient require chemotherapy?”

Non-operative treatment phase

The non-operative treatment phase included two topics: (1) treatment options and costs and (2) management of chemoradiotherapy side effects and disease prognosis assessment.

Treatment options and costs (Topic 6)

Keywords including “targeted therapy, course of treatment, paclitaxel, doxorubicin, medical insurance, Adriamycin and Cyclophosphamide Regimen, Taxotere (docetaxel) and Cyclophosphamide Regimen, drug change, treatment plan, and expense” indicated that during the non-operative treatment phase, patients and their families often have questions about treatment plans, including the effectiveness and cost of chemotherapy drugs. Common questions included, “The doctor at the city hospital suggested the Adriamycin and Cyclophosphamide Regimen, while the doctor at the provincial hospital recommended the Taxotere (docetaxel) and Cyclophosphamide Regimen. Which one is better?” “The doctor recommended changing the medication to either docetaxel or albumin-bound paclitaxel. Which should we choose?” “Is doxorubicin covered by insurance, and how much would an additional eight chemotherapy treatments cost?”.

Side effects management and disease prognosis assessment (Topic 7)

This topic focused on the side effects of chemoradiotherapy and disease prognosis. Discussions centered on the effective management of common side effects, such as “endometrial thickening, albumin decline, nausea, skin ulcers, and hair loss,” and on disease prognosis after non-surgical treatment. Common questions included, “I have been on targeted therapy for almost 2 months, and my white blood cell count is low. What should I eat to help raise it?” “Has anyone experienced tamoxifen-induced endometrial thickening, and how did you deal with it?” “I have completed chemoradiotherapy for breast cancer, and the doctor said that it went well. Is there a high chance of recurrence?” Additionally, as patients often need peripherally inserted central catheters (PICCs) for multiple chemotherapy sessions, “PICC” appeared among the keywords of this topic. Patients and their families sought information on the maintenance of PICCs. For example, one patient asked, “My arm, where the PICC was, has been swollen and painful for three days. What should I do?”

Relapse and metastasis treatment phase

The relapse and metastasis treatment phase included two topics: (1) diagnosis and treatment options and (2) disease prognosis and emotional support.

Diagnosis and treatment options (Topic 8)

This topic focused on the diagnosis and treatment of breast cancer recurrence and metastasis with keywords including “nodules, computed tomography, hydrops, B-ultrasound, report sheet, bone scan, and liver.” Patients and their families often share diagnostic reports on the platform to determine whether cancer has recurred or spread. A common question was, “How can I tell if there is bone metastasis?” When facing recurrence or metastasis, they sought advice on treatment options, such as asking, “Can radiotherapy control brain metastases from breast cancer in my mother?”.

Disease prognosis and emotional support (Topic 9)

This topic included keywords such as “survival time, stress, life span, targeted therapy, narrowing, sorrow, psychological preparation, communication, despair, and encouragement,” mainly addressing disease prognosis and emotional support. Patients and their families often share feelings of pessimism and despair after learning about cancer recurrence and metastasis. They sought to understand their expected survival time, emotional support, and encouragement from online platforms. For example, “My breast cancer has recurred! It is overwhelming, and I do not know how long I have, but I am not giving up. I need encouragement!” Many patients shared their feelings and received supportive responses.

Rehabilitation treatment phase

The rehabilitation treatment phase included two topics: (1) follow-up and recurrence concerns and (2) physical symptoms and lifestyle adjustments.

Follow-up and recurrence concerns (Topic 10)

Keywords such as “lymph, blood flow, scanning, computed tomography, B-ultrasound, cancer antigen 15-3, color ultrasound, centimeter, and armpit” reflected posts about follow-up examination reports, as patients with breast cancer need regular check-ups. They often shared their results on online platforms and asked for feedback. For example, “My mother had breast cancer surgery 4 years ago; is this review report good?” The keyword “recurrence” showed that patients and their families remained concerned about cancer recurrence, even in the rehabilitation treatment phase, as seen in questions like “I have had breast cancer for 7 years; could it come back?”

Physical symptoms and lifestyle adjustments (Topic 11)

Keywords such as “pain and edema” showed that patients with breast cancer and their families desire nursing knowledge regarding common physical symptoms during recovery. Patients often struggle with limb pain and swelling after breast cancer treatment. For example, “I finished my breast cancer treatment 1 year ago, but my right arm still hurts and swells. What can I do?” Keywords such as “diet, moxibustion, weight-bearing, traditional Chinese medicine, exercise, and sexual life” involved questions on diet, exercise, sexual activity, and using traditional Chinese medicine in recovery. People often asked “Is it safe to have sex 2 years after breast cancer surgery on my right side? Can it cause a recurrence?” or “I am 30 and finished treatment 6 months ago. What should be eaten to prevent a recurrence? Are there Chinese medicinal treatments to help?”.

Discussion

To the best of our knowledge, this is the first study to use the LDA model to analyze posts in two types of OHCs in China to evaluate the needs of patients with breast cancer and their families at different treatment stages. The findings of this study indicated that the number of posts tended to decrease as therapy progressed, with the highest and lowest numbers of posts observed in the initial diagnosis and rehabilitation treatment phases, respectively, which was consistent with the results of Mikal et al. [20]. In the initial diagnosis phase, patients and their families typically possessed limited knowledge about the disease and were eager to obtain information [21]. However, as treatment progressed, patients developed a deeper understanding of the disease, and the sociocultural and psychological barriers they faced gradually diminished, which can partly explain the declining trend in the number of online posts throughout therapy [22]. Furthermore, the mismatch between the support available online and the needs of patients may have contributed to fewer posts. Mikal et al. studied online social support among patients with breast cancer and found that patients and their families primarily posted on Facebook to seek resource support; however, they often received emotional support instead because it is less costly to provide, which reduced participation by patients and their families on social media [20].

Regarding post interaction, Mikal et al. revealed that post-engagement metrics showed a sharp increase after cancer diagnosis, followed by a steady tapering that continued throughout the transition to cancer therapy [20]. Our data showed a similar trend, with likes peaking during the non-operative treatment phase, whereas favorites and comments peaked during the relapse and metastasis treatment phase. This indicates that posts during the non-operative and relapse and metastasis treatment phases generated higher engagement and resonance among other users, resulting in more interaction. In the context of the unequal distribution of medical resources in China, patients and their families are likely to encounter common problems, such as the side effects of radiotherapy, chemotherapy, and neoadjuvant therapy during non-operative treatment [23]. They often lack offline support for these common issues and are more inclined to seek information online, where they can find resonance with others. During the relapse and metastasis treatment phase, patients and their families often experience significant psychological pressure. However, Chinese people tend to express emotions in an indirect and restrained manner; therefore, they are reluctant to share their emotions with those close to them [24]. Instead, they prefer to use the Internet to express their concerns and seek relevant information. The posts, which document the emotional journeys of patients with cancer and their families, including fear, hope, resilience, and love, had a profound impact on readers, easily evoking deep emotional connections. Moreover, these posts offer valuable information and emotional support to individuals facing similar circumstances because stories of personal struggle and resilience can inspire other patients, building a virtual support system [25].

Notably, our findings revealed that the majority of posts were from patients with breast cancer, which aligned with the relatively young age profile of Chinese patients with breast cancer. Breast cancer in several Asian countries primarily occurs at the ages of 45–69 years, which is > 10 years earlier than that in Europe and the USA [26]. These patients generally possessed proficient internet skills and the ability to independently seek breast cancer-related information online. Notably, East China contributed the highest number of posts, potentially because of its superior economic development and greater accessibility to Internet resources [27].

Regarding the post contents, patients with breast cancer and their family members predominantly sought information and emotional support through online media rather than other sources. This finding was consistent with the results of Nadine et al., who showed that emotional and information support were among the most frequently provided or received types of support in interpersonal interactions as well as computer-mediated environments for individuals with cancer or other serious illnesses [28]. The results highlighted a greater emphasis on treatment-related information within the post content, such as attending hospitals and chemotherapy regimens, compared with nursing-related information, which was consistent with the findings of the study by Bei on the information needs of Hong Kong Chinese patients with breast cancer [29]. This could be attributed to social media users being mostly patients or family members who possess general knowledge about treatment-related aspects but might lack expertise in highly specialized disease nursing information [30]. Moreover, this finding corresponds with the results of Cho, who reported that visitation rates for treatment-related issues in breast cancer surpassed those for other categories [31].

Our study found that patients with breast cancer and their family members consistently expressed ongoing concerns about cancer recurrence, metastasis, and overall survival duration throughout the entire disease trajectory, which is consistent with the results of the study by Shi [32]. This emotional state, characterized by “fear, worry, or concern about cancer returning or progressing,” is commonly known as fear of cancer recurrence (FCR) [33]. Herschbach et al. confirmed that FCR might appear at any stage of the illness and persist for many years even after treatment or remission of the illness, which is consistent with our research findings [34]. This is primarily because cancer progression is uncertain, and patients and family members must constantly seek information to manage changes in symptoms, drug resistance, and complications and to understand how to cope with the negative emotions that accompany uncertainty in cancer progression, helping them have a sense of control over their lives [35,36,37].

Miller found that cancer-related demands varied across a patient's cancer trajectory [38]. Similarly, we discovered that patients and their families had different needs during the different stages of treatment. During the initial diagnostic stage, patients with breast cancer and their family members were more concerned about disease outcomes, diagnostic analysis, and treatment information. They experienced strong anxiety regarding the types of hospital examinations and diagnostic results, as well as treatment methods and their effectiveness. During the perioperative period, patients and their families focused on surgical choices, prognosis, postoperative care, and follow-up treatments. Information related to surgical procedures was particularly prominent: keywords related to surgical methods, such as “reconstruction, implantation, modified radical mastectomy, breast conservation, and total mastectomy,” appeared repeatedly and with a high degree of strength. Breast cancer surgery, especially mastectomy, has a profound impact on the appearance of patients and significantly affects their emotional well-being and social functioning [39, 40]. Therefore, patients and their families frequently aim to alleviate the impact on their bodies through breast conservation or reconstruction while also harboring concerns about the potential adverse consequences of these surgeries, which prompts them to actively seek relevant information online [41]. During the non-operative treatment period, treatment options and costs, management of chemoradiotherapy side effects, and disease prognosis assessment were the main concerns of patients and their family members. This might be because patients at this stage carried a heavy economic burden after a period of treatment, and some side effects of radiotherapy and chemotherapy emerged and progressed as treatment continued.
Patients with recurrent or metastatic breast cancer and their family members paid more attention to auxiliary diagnosis, treatment, and prognosis. Recurrence and metastasis are the main causes of death in patients with breast cancer and represent a negative event for them [42]. Consequently, in the relapse and metastasis treatment phase, patients might seek a second medical opinion to ensure the accuracy of the diagnosis and re-evaluate treatment options. During the rehabilitation treatment phase, patients and their families showed heightened interest in understanding follow-up reports, receiving detailed explanations of prognosis, acquiring knowledge about managing physical symptoms effectively, and maintaining healthy lifestyles. After treatment is completed, patients enter a long-term monitoring and follow-up phase. During this phase, breast cancer survivors commonly experience treatment-related lymphedema, pain, and fatigue [43, 44]. We found that Chinese breast cancer survivors turned to traditional Chinese medicine, such as moxibustion and herbal medicine, to alleviate specific post-treatment discomfort and enhance their overall quality of life. Similarly, an exploratory study of the main French forums and discussion groups indicated that herbal medicines are frequently cited for breast cancer [45]. Breast cancer survivors also expressed a strong desire for knowledge related to their sex lives. Sex was not a priority during treatment compared to securing survival; however, as time progressed, the significance of sex and intimacy in their lives became more evident, playing a substantial role in overall quality of life [46]. Notably, some patients with breast cancer held the misconception that engaging in sexual activity might increase the risk of breast cancer recurrence. Therefore, healthcare providers should offer comprehensive sex education to breast cancer survivors.
Overall, patients and their families showed strong initiative in seeking information at each stage. One possible reason why patients and their families resort to online queries is the increasing complexity of treatments involving multiple healthcare providers in different locations, which leads to a lack of treatment continuity [47]. Additionally, some patients and their families might not have received sufficient information and emotional support from healthcare professionals, causing them to question the treatment and care measures implemented by medical personnel and motivating them to actively seek evidence to establish trust with the healthcare providers involved in their treatment [48].

Our research findings revealed that although the emotional support needs of patients and their families were particularly pronounced during the early diagnosis and recurrent and metastatic treatment phases, the keywords associated with “emotional support needs” were not entirely consistent between these two phases, suggesting variations in the emotional support required by patients and their families during these different stages. During the early diagnosis phase, the keywords related to emotions were “breakdown” and “fear.” A cancer diagnosis is often considered an extremely stressful and potentially traumatic experience [49]. Sussman et al. studied specialized oncological care following surgery for patients newly diagnosed with breast and colorectal cancer and revealed that the transition from cancer diagnosis to the formal cancer phase was accompanied by a surge in emotional distress, such as mental stress and anxiety, which correspondingly heightened the demand for information support and emotional care [50]. This is consistent with our findings. Therefore, when facing a breast cancer diagnosis, the primary feelings of patients and their families are extreme helplessness, anxiety, and worry. In the recurrent and metastatic treatment phase, the keywords related to emotions were “pain,” “psychological preparation,” and “breakdown.” Recurrence and metastasis are the primary causes of breast cancer-related deaths, representing a negative event for both patients and their families [42]. Consequently, patients and their families may be more prone to feelings of sadness and despair, even after being emotionally prepared for the patient's impending death. A qualitative interview study conducted by Jassim et al. on 12 patients with breast cancer indicated that women diagnosed with late-stage disease were more likely to experience negative emotions [51].
However, both the early diagnosis and recurrent and metastatic treatment phases exhibited keywords related to positive emotions, such as “positivity” and “encouragement.” This observation implies that despite the daunting challenges they faced, some patients and their families managed to sustain a sense of optimism, confronting the recurrence or progression of the disease proactively and with resilience. Nevertheless, the frequency and prominence of the keywords showed that patients and their families predominantly conveyed negative emotions on social platforms. Freedman et al. analyzed 1,024,041 posts related to breast cancer treatment and found that 57% of the posts expressed negative emotions, which is consistent with the findings of our study [22]. Positive psychological coping contributes to the emotional well-being of patients [22, 52]. Consequently, healthcare providers should place strong emphasis on providing psychological support to patients during both the initial diagnosis and recurrent and metastatic treatment phases.

Study limitations

This study has some limitations. Although this study focused on the information needs of patients with breast cancer and their families at different stages of treatment, we did not specifically distinguish between the needs of the patients and those of their families. Furthermore, we did not conduct an in-depth comparison of the content of the posts from the two online health communities, even though the needs of patients and their families may vary depending on whether they seek advice from doctors or fellow patients. Additionally, the information on the websites included in our study may have primarily reflected the perspectives of younger patients, potentially overlooking the needs of older patients.

Conclusions

Our findings indicate that the information and emotional support needs of patients with breast cancer and their families differ across various phases of cancer therapy, and the need for emotional support intensifies in the early diagnostic and recurrence or metastasis phases. Concurrently, concerns regarding cancer recurrence, spread, and survival duration remain consistent throughout the therapeutic journey. Consequently, offering specific information or emotional assistance tailored to each phase of treatment is crucial for meeting the unique needs of patients and their families.