Introduction

In 2020, the number of new breast cancer (BC) cases reached 2.26 million, making BC the cancer with the highest incidence [1]. Mammography and ultrasonography are currently the main screening methods for BC. Although mammography is highly effective in reducing BC mortality through early diagnosis and treatment, it has limitations, including lower sensitivity in patients with dense breasts [2], frequent false-positive results [3], and exposure to ionizing radiation. Ultrasonography, by contrast, uses no ionizing radiation, is therefore safe for pregnant women, and performs better than mammography in dense breasts. However, it is insensitive to microcalcifications, and its detection rate depends on the experience of the examining physician. A rapid, non-invasive, convenient, and sensitive method for BC detection is therefore urgently needed in clinical practice.

Raman spectroscopy (RS) is a powerful spectroscopic technique that has been widely used in biological detection [4, 5]. It provides molecular information about chemical bonds through their vibrational and rotational modes, which makes it possible to detect subtle structural changes in biomolecules such as lipids, proteins, and nucleic acids during the development of cancer [6,7,8]. In preliminary studies, RS showed high accuracy in diagnosing bladder [9], kidney [10], and skin cancer [11], with both sensitivity and specificity exceeding 0.9. In addition, RS places few constraints on the form of the sample: solid tissue strips, pathological tissue sections, and even liquid samples can all be used for Raman analysis [12, 13].

Surface-enhanced Raman spectroscopy (SERS) enhances the Raman signal of biomolecules by using noble metal nanoparticles (gold, silver, or copper) as substrates. Two mechanisms underlie the enhancement: the electromagnetic mechanism (EM) and the chemical, or charge-transfer, mechanism (CT) [14]. The EM arises primarily from coupling of the incident electromagnetic field in the gaps between nanoscale metallic structures. It enhances both the incident field and the Stokes-scattered field re-emitted by molecules at the point on the surface where the sample is located. The CT mechanism arises primarily from charge transfer between the nanoscale metal structure and the analyte molecules, which occurs through the formation of new analyte-metal surface complexes. The two processes take place simultaneously and together increase the intensity of the Raman spectrum [15]. Because SERS can generate strongly enhanced Raman signals, in some cases even at the single-molecule level [19], it is now widely used to analyze biological fluids such as serum [16], tears [17], and urine [18].

Given these benefits, several studies have reported the use of RS in the diagnosis of BC [20, 21]. However, their outcomes differ, which may be explained by differences in sample size and diagnostic algorithm. We therefore conducted this meta-analysis to comprehensively evaluate the diagnostic performance and clinical value of RS in BC.

Methods

Literature search

We searched PubMed, Embase, Web of Science, and the Cochrane Library for relevant articles published from database inception to May 20, 2022. The search terms were as follows: [(“Breast cancer” OR “Breast tumor” OR “Breast neoplasm” OR “Mammary cancer”) AND (“Raman spectroscopy” OR “RS” OR “efficacy” OR “sensitivity” OR “specificity”)]. No language or study-type restrictions were applied in the initial literature search.

Inclusion criteria

Studies meeting all of the following criteria were included: (1) The studies applied RS to two groups of samples: normal breast tissue and BC. (2) The BC samples were derived from patients with pathologically confirmed BC or from purchased standard BC cell lines. (3) The studies provided true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), directly or indirectly, so that a 2 × 2 table could be constructed. (4) The studies were reported as original articles.

Exclusion criteria

The following exclusion criteria were applied: (1) Research involving non-human subjects. (2) Other publication types: review articles, letters, case reports, and comments. (3) Studies with overlapping patients or data.

Data extraction

Two investigators independently extracted the data, and disagreements were resolved by consensus. Six parameters related to diagnostic performance were extracted: sensitivity, specificity, TP, TN, FP, and FN. In addition, methodological and technical data describing the baseline characteristics of each study were extracted, including first author, publication year, geographic location, number of patients, number of spectra, sample type, diagnostic algorithm, and laser wavelength.
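When a study reports only sensitivity, specificity, and the numbers of cancer and normal spectra, the 2 × 2 table can be recovered approximately. The following is a minimal sketch of one way to do this. The function name, the rounding rule, and the example counts are hypothetical illustrations and are not taken from any included study.

```python
def reconstruct_2x2(sensitivity, specificity, n_cancer, n_normal):
    """Recover TP, FN, FP, TN from reported accuracy and group sizes.

    Rounding to whole spectra is an assumption; the result is only as
    precise as the reported sensitivity and specificity.
    """
    tp = round(sensitivity * n_cancer)
    fn = n_cancer - tp
    tn = round(specificity * n_normal)
    fp = n_normal - tn
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Hypothetical study: 120 cancer spectra and 80 normal spectra.
table = reconstruct_2x2(0.95, 0.90, n_cancer=120, n_normal=80)
print(table)
```

Because reported sensitivity and specificity are usually rounded, counts recovered this way may differ from the true table by one or two spectra.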

Literature quality assessment

The quality of each study was assessed with the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool [22]. It consists of four domains: patient selection, index test, reference standard, and flow and timing. Risk of bias and applicability concerns were each rated as low, high, or unclear. The QUADAS-2 assessment was performed in Review Manager 5.3.

Statistical analysis

The accuracy of RS in diagnosing BC was assessed by pooling TP, TN, FP, and FN data to calculate sensitivity, specificity, positive likelihood ratio (PLR), negative likelihood ratio (NLR), and diagnostic odds ratio (DOR), each with its 95% confidence interval (CI). Summary receiver operating characteristic (SROC) curves [23] were generated to assess the relationship between sensitivity and specificity, and the area under the curve (AUC) was calculated to determine the overall performance of RS. A diagnostic tool was considered excellent when the AUC was greater than 0.8. Heterogeneity was assessed with the inconsistency index (I2) and the chi-square test, and subgroup analyses were performed to explore its potential sources [24]. I2 > 50% and a P value < 0.05 were considered to indicate significant heterogeneity, in which case a random-effects model was applied. Deeks’ funnel plot asymmetry test was used to investigate publication bias [25]. All statistical analyses were performed using Stata 16.0.
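To make these quantities concrete, the sketch below computes the per-study accuracy measures from a 2 × 2 table and then Cochran's Q and I2 for a set of log DORs. It is a simplified illustration in Python, not the Stata procedure used in this analysis. The function names, the 0.5 continuity correction, and the example tables are assumptions.

```python
import math


def diagnostic_metrics(tp, fp, fn, tn):
    """Per-study accuracy measures from a 2 x 2 table.

    A 0.5 continuity correction is added to every cell when any cell is
    zero (a common convention, assumed here) so the ratios stay finite.
    """
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    plr = sens / (1 - spec)
    nlr = (1 - sens) / spec
    dor = (tp * tn) / (fp * fn)
    # Standard error of ln(DOR), used for the confidence interval and weights.
    se_log_dor = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    ci = (math.exp(math.log(dor) - 1.96 * se_log_dor),
          math.exp(math.log(dor) + 1.96 * se_log_dor))
    return {"sens": sens, "spec": spec, "PLR": plr, "NLR": nlr,
            "DOR": dor, "se_log_DOR": se_log_dor, "DOR_95CI": ci}


def i_squared(effects, std_errors):
    """Cochran's Q and I^2 (%) for study effects on a common scale."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Hypothetical 2 x 2 tables (TP, FP, FN, TN); not data from the included studies.
studies = [(90, 5, 10, 95), (45, 2, 5, 48), (70, 20, 30, 80)]
results = [diagnostic_metrics(*s) for s in studies]
q, i2 = i_squared([math.log(r["DOR"]) for r in results],
                  [r["se_log_DOR"] for r in results])
print(f"Q = {q:.2f}, I^2 = {i2:.1f}%")
```

Working on the log scale keeps the DOR approximately normally distributed, which is why the weights and the confidence interval are built from the standard error of ln(DOR).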

Results

Study selection

The study screening procedure is presented in a PRISMA flowchart (Fig. 1). The initial literature search identified 2798 articles, of which 712 remained after removal of duplicates. Manual screening of titles and abstracts excluded 656 articles, and full-text reading and data review excluded a further 40. Finally, 16 articles [26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41] were included in this meta-analysis according to the inclusion and exclusion criteria. Two studies conducted by the same author [31, 32] were both included because their sample populations were distinct.
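The screening counts above can be checked with simple arithmetic. The short Python snippet below uses only the numbers reported in this section; the variable names are illustrative.

```python
identified = 2798
after_duplicates = 712
excluded_title_abstract = 656
excluded_full_text = 40

duplicates_removed = identified - after_duplicates
full_text_assessed = after_duplicates - excluded_title_abstract
included = full_text_assessed - excluded_full_text

print(duplicates_removed, full_text_assessed, included)  # 2086 56 16
```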

Fig. 1 Flow diagram of the study selection process

Study characteristics

A total of 58,144 spectra from 16 articles were included in this meta-analysis. Most studies were conducted in China (n = 7), Pakistan (n = 3), and the USA (n = 3). The others were conducted in South Korea (n = 1), Japan (n = 1), and the UK (n = 1). Sample types were breast tissue (n = 9), standard cell lines (n = 1), serum (n = 4), whole blood (n = 1), and saliva (n = 1). Of the 16 included studies, 11 used RS, 4 used SERS, and 1 used Raman confocal spectroscopy (RCS). All studies were performed in vitro with pathological diagnosis as the gold standard. Detailed information on each included study is shown in Table 1.

Table 1 Characteristics of the 16 studies included in the meta-analysis

Assessment of study quality and publication bias

The risk of bias and the applicability concerns of the included studies, assessed with the QUADAS-2 tool [42], are shown in Fig. 2. Deeks’ test for publication bias yielded a p value of 0.53, indicating no significant publication bias in the pooled analysis of the included studies (Fig. 3).

Fig. 2 The graphical display of the evaluation of the risk of bias and concerns regarding the applicability of the selected studies

Fig. 3 Deeks’ funnel plot asymmetry test of RS in the diagnosis of breast cancer
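The regression behind Deeks’ funnel plot asymmetry test can be sketched as follows: ln(DOR) is regressed on the inverse square root of the effective sample size (ESS), with each study weighted by its ESS, and a slope that differs significantly from zero suggests asymmetry. The code is a simplified, dependency-free illustration and is not the Stata routine used in this analysis. The function name, the continuity correction, and the example tables are assumptions, and the p value (which requires the t distribution) is deliberately not computed.

```python
import math


def deeks_regression(tables):
    """Regress ln(DOR) on 1/sqrt(ESS), weighting each study by its ESS.

    tables: list of (TP, FP, FN, TN) tuples, at least three studies.
    Returns (slope, t_statistic); the t statistic has len(tables) - 2
    degrees of freedom.
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        if 0 in (tp, fp, fn, tn):
            # 0.5 continuity correction (an assumed convention).
            tp, fp, fn, tn = (v + 0.5 for v in (tp, fp, fn, tn))
        n_dis, n_non = tp + fn, fp + tn
        ess = 4 * n_dis * n_non / (n_dis + n_non)  # effective sample size
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))
        ws.append(ess)

    # Weighted least squares for a single predictor.
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(w * (y - intercept - slope * x) ** 2
              for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / (len(xs) - 2) / sxx)
    return slope, slope / se_slope


# Hypothetical 2 x 2 tables (TP, FP, FN, TN); not data from the included studies.
tables = [(90, 5, 10, 95), (45, 2, 5, 48), (70, 20, 30, 80), (28, 3, 2, 27)]
slope, t_stat = deeks_regression(tables)
print(f"slope = {slope:.2f}, t = {t_stat:.2f} (df = {len(tables) - 2})")
```

The t statistic would then be compared with a t distribution on the stated degrees of freedom to obtain a p value such as the one reported above.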

Overall analysis

We measured overall diagnostic accuracy by calculating sensitivity, specificity, PLR, NLR, and DOR. The pooled sensitivity and specificity of RS were 0.97 (95% CI, 0.92–0.99) and 0.96 (95% CI, 0.91–0.98), respectively (Fig. 4). The pooled PLR and NLR were 21.98 (95% CI, 10.08–47.96) and 0.03 (95% CI, 0.01–0.09), respectively (Fig. 5). The pooled DOR was 721 (95% CI, 136–3829), indicating high accuracy. The AUC of the SROC curve was 0.99 (95% CI, 0.98–1.00) (Fig. 6). Heterogeneity was significant across all pooled studies (I2 > 50%, p < 0.05).
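The likelihood ratios and the DOR are linked by simple identities, which allow a rough consistency check on the pooled values. The worked example below uses the rounded estimates reported above. Pooled estimates from a random-effects model need not satisfy these identities exactly, and the NLR is rounded to two decimals, so only approximate agreement is expected.

```latex
\[
\mathrm{PLR} = \frac{\text{sensitivity}}{1 - \text{specificity}}, \qquad
\mathrm{NLR} = \frac{1 - \text{sensitivity}}{\text{specificity}}, \qquad
\mathrm{DOR} = \frac{\mathrm{PLR}}{\mathrm{NLR}}
\]
\[
\frac{\mathrm{PLR}}{\mathrm{NLR}} = \frac{21.98}{0.03} \approx 733
\quad \text{(reported pooled DOR: 721)}
\]
```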

Fig. 4 Forest plot of the pooled sensitivity and specificity of RS for breast cancer

Fig. 5 Forest plot of the pooled positive likelihood ratio (PLR) and negative likelihood ratio (NLR)

Fig. 6 The area under the curve (AUC) of SROC (summary receiver operating characteristic) curves

Subgroup analysis

To investigate the effects of race, sample type, instrument type, number of spectra, diode laser wavelength, and diagnostic algorithm on the accuracy of RS in distinguishing BC, we performed subgroup analyses, the results of which are shown in Table 2. Extremely high DORs were observed in the subgroups of Asian samples (1093.98; 95% CI, 133.31–8977.81), serum samples (4247.05; 95% CI, 236.48–76,273.26), the principal component analysis (PCA) algorithm (281,444.30; 95% CI, 293.11–2,700,000), more than 200 spectra (2027.29; 95% CI, 196.92–20,871.04), a 785 nm laser (779.76; 95% CI, 102.65–5923.19), and RS instruments (841.31; 95% CI, 89.20–7934.57). Moreover, the subgroup using a PCA-based diagnostic algorithm outperformed the other subgroups on several measures, with a sensitivity of 1.00 (95% CI, 0.93–1.00), a specificity of 0.98 (95% CI, 0.93–1.00), a PLR of 61.60 (95% CI, 14.14–268.46), and an NLR of 0.00 (95% CI, 0.00–0.08).

Table 2 The results of subgroup analysis of all studies in our meta-analysis

Discussion

RS has been extensively researched and has become widely used in the biomedical field in recent years [42,43,44]. By pooling 58,144 spectra from 16 studies, we systematically evaluated the diagnostic performance of RS for BC and confirmed its high diagnostic efficiency.

The pooled diagnostic sensitivity and specificity of RS for BC were 0.97 (95% CI, 0.92–0.99) and 0.96 (95% CI, 0.91–0.98), respectively. Both values exceeded 0.9, indicating that RS identifies BC samples and distinguishes them from normal samples with a low rate of missed diagnoses. Furthermore, the random-effects model yielded a pooled DOR of 720.89 (95% CI, 135.73–3828.88). Because a DOR greater than 1 indicates that a test discriminates between diseased and non-diseased samples, and discrimination improves as the DOR increases, this value indicates that RS is a reliable diagnostic tool for BC. In the SROC curve analysis, the AUC was 0.99 (95% CI, 0.98–1.00), which exceeds the 0.8 threshold and indicates excellent performance of RS in detecting BC samples.

A series of subgroup analyses was performed to further clarify the optimal conditions for the diagnosis of BC by RS. All subgroups performed well, with sensitivity and specificity greater than 0.9. Although the non-Asian group had a lower DOR than the Asian group, the PLR and NLR were comparable between the two groups, indicating that RS is capable of screening for BC in all races. Furthermore, RS performed better in serum samples than in breast tissue samples, possibly because of interference from normal breast tissue in BC biopsy samples. In the included studies that used breast tissue, operators often collected dozens of spectra from a single sample that was assumed to be homogeneous, so numerous normal spectra could still be collected from a single malignant specimen. These misclassified spectra are probably the primary cause of the lower diagnostic performance in breast tissue samples. At the same time, the extremely high DOR of the serum sample group suggests that serum samples may one day be used for BC screening by RS. With respect to sample size, the pooled DOR was markedly lower in the subgroup of studies with fewer than 200 spectra. A possible explanation is that, with a small number of samples, the training set of the multivariate analysis algorithm was constructed with errors, resulting in poor performance in distinguishing BC from normal tissue. In addition, as Raman spectroscopy for tumor diagnosis continues to develop, technologies such as machine learning-based identification and diagnosis of Raman spectra are emerging. Our subgroup analysis revealed that RS combined with the PCA algorithm had an excellent diagnostic effect, with a DOR of 28,144.30. Because data were insufficient for the other diagnostic algorithms, we were unable to analyze them.
In addition, a subgroup analysis of RS and SERS instruments was performed to clarify the impact of instrument type on the results. Both types of instrument achieved good sensitivity and specificity. The DOR was higher in the RS group than in the SERS group (841.31 vs 495.98), but this may reflect the significantly smaller number of spectra in the SERS group and cannot be taken as evidence that RS is better than SERS. Notably, the adsorption of molecules on metal colloidal particles and rough metal surfaces in SERS can increase the spectral intensity of the sample by a factor of 10^4–10^6, making SERS well suited to detecting BC in serum samples. Raman scattering efficiency is inversely proportional to the fourth power of the excitation wavelength. It is therefore crucial to select an excitation wavelength that maximizes scattering efficiency while minimizing fluorescence interference. Our findings indicate that the DOR at 785 nm is comparable to the overall DOR, but we were unable to investigate wavelengths above and below 785 nm because of a lack of data. Overall, based on our subgroup analysis, we have reason to believe that RS combined with the PCA algorithm and serum samples will become an effective method for BC screening in the future.
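The inverse fourth-power dependence can be illustrated numerically. The sketch below compares scattering efficiency at a few excitation wavelengths relative to 785 nm, considering only the wavelength term. The 532 nm and 1064 nm values are illustrative choices of other laser lines and are not drawn from the included studies. The function name is hypothetical.

```python
def relative_raman_intensity(wavelength_nm, reference_nm=785.0):
    """Scattering efficiency relative to the reference wavelength.

    Assumes efficiency is proportional to 1 / wavelength**4 and ignores
    every other factor (detector response, fluorescence, absorption).
    """
    return (reference_nm / wavelength_nm) ** 4


for wl in (532.0, 785.0, 1064.0):
    ratio = relative_raman_intensity(wl)
    print(f"{wl:.0f} nm: {ratio:.2f}x the efficiency at 785 nm")
```

Under this simplification, a shorter wavelength scatters several times more efficiently than 785 nm, while a longer wavelength scatters less efficiently. This is the trade-off against fluorescence interference described above.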

Raman scattering arises from the inelastic scattering of photons by molecules, and the scattered spectrum reflects the energy exchanged between the photons and the sample molecules. The biochemical composition of a sample can be determined from the relative position and intensity of each characteristic peak in the Raman spectrum [45]. Compared with normal breast tissue, BC tissue shows reduced lipid and carotenoid content and considerably increased protein content as the cancer develops. Accordingly, the peak at 853 cm−1 (protein) was greatly enhanced in the Raman spectrum, whereas the peaks at 719 cm−1 (lipid) and 1159 cm−1 (carotenoid) were significantly reduced [46]. RS distinguishes BC from normal breast tissue on this basis.
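A Raman shift in cm−1 is the difference between the wavenumbers of the excitation and scattered light, so each peak position corresponds to a specific scattered wavelength for a given laser. The sketch below converts the three peak positions mentioned above into Stokes-scattered wavelengths. The 785 nm excitation is an assumption made for illustration and is not necessarily the wavelength used in the cited study. The function name is hypothetical.

```python
def scattered_wavelength_nm(excitation_nm, shift_cm1):
    """Stokes-scattered wavelength (nm) for a given Raman shift (cm^-1)."""
    excitation_wavenumber = 1e7 / excitation_nm  # nm -> cm^-1
    scattered_wavenumber = excitation_wavenumber - shift_cm1
    return 1e7 / scattered_wavenumber


# Peak assignments as given in the text; 785 nm excitation is assumed.
peaks = {"lipid": 719, "protein": 853, "carotenoid": 1159}
for label, shift in peaks.items():
    wavelength = scattered_wavelength_nm(785.0, shift)
    print(f"{label}: {shift} cm^-1 -> {wavelength:.1f} nm")
```

Because the shift is measured relative to the excitation line, the same peak positions in cm−1 apply regardless of the laser wavelength. Only the absolute scattered wavelengths change.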

In addition to serving as a screening tool, RS has the potential to distinguish different stages of BC development. Han et al. [27] used RS to accurately distinguish four types of breast tissue: normal breast tissue, atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS), and invasive ductal carcinoma (IDC). The spectrum of the ADH samples showed a significant decrease in the characteristic lipid peaks (1300 cm−1, CH3 deformation in lipids; 1656 cm−1, C–C stretch of phospholipids) and a significant increase in the characteristic DNA peak (1096 cm−1) and protein peak (1267 cm−1, C–N stretching mode of protein). Although the overall accuracy could be improved, RS is still clinically useful for the diagnosis of ADH and DCIS. According to the Wellings and Jensen model of breast cancer development [47], normal cells in the terminal ductal lobular unit first develop into atypical hyperplasia (AH), then DCIS, and finally IDC. Early detection and intervention at the AH stage can reduce the occurrence of cancer [48]. Detecting breast AH by RS is therefore important for protecting women’s health. By quantifying changes in the chemical structure and content of substances in breast tissue, RS enables cancer cells to be identified at an early stage, which is currently unattainable with other cancer screening methods. We hope that the diagnostic performance of RS will be improved in the future, making it the primary method of early screening for BC.

Studies on the in vivo application of RS were excluded because they did not report the TP, TN, FP, and FN values required to calculate sensitivity, specificity, and related measures. Nevertheless, the role of RS in the diagnosis and treatment of BC during surgery should not be underestimated. Lizio et al. [49] developed a novel technique for measuring BC specimens on an intraoperative timescale (20 min), allowing rapid assessment of BC during surgery. Wen et al. [50] combined an Au nanostar-based photoacoustic, surface-enhanced Raman spectroscopy, and thermosurgical probe into a “three-in-one” therapeutic nanoprobe for residual microtumors in orthotopic BC. Mouse experiments confirmed that tumors did not recur after residual microtumors were eradicated with this strategy. These results suggest that RS can be employed in clinical settings, where its speed and non-destructive nature can be used to identify tumor margins and even to remove residual cancer after surgery.

The present meta-analysis has several limitations. First, the main limitation of our study is significant heterogeneity, which could not be eliminated despite subgroup analysis. Second, the number of patients in each study was small, and the number of spectra varied greatly among the included studies, which may have influenced the results. Third, because of a lack of available data, we were unable to conduct a meta-analysis by BC pathological subtype, which may have affected the accuracy of our findings. Because BC types are complex and diverse, we cannot demonstrate that our results accurately separate BC of different subtypes. Fourth, standard procedures and protocols for RS diagnosis have not been established, which makes RS-based diagnosis difficult to standardize.

Conclusion

As an emerging optical diagnostic technique, RS has great potential for detecting malignant breast lesions. It is also non-invasive, capable of real-time measurement, and easy to use. However, before real-time clinical use can be considered, studies with larger sample sizes are required to determine whether RS can distinguish between different BC subtypes. In addition, the performance of RS must be further examined and standardized.