1 Introduction

Surface plasmon resonance (SPR) sensors are extensively used for biochemical sensing owing to their favorable characteristics, such as label-free detection, real-time results, increased sensitivity, and reliability [1, 2]. The fundamental sensing concept is that when light passes through the prism and strikes the metal surface, collective oscillations of free electrons, i.e., a surface plasmon wave (SPW), are excited at the dielectric–metal interface. When the phase-matching requirement is satisfied, this collective oscillation travels along the dielectric–metal interface [2]. Measuring the changes in the refractive index (RI) or concentration caused by adsorption of the analyte in the sensing medium enables detection of the analyte. In addition, the SPW’s electric field is very sensitive to variations in the RI or concentration of the sensing medium within the penetration depth of the evanescent field [3]. Owing to these properties, SPR sensors are used in a wide range of applications, including food safety, environmental monitoring, and clinical diagnostics [4,5,6].

Traditionally, the optimal sensor structure for fabrication is designed using a brute-force trial-and-error process in which multiple simulation settings are examined and the best-performing parameter set is chosen. This procedure is tedious, computationally costly, and time-consuming. Additionally, many parameter sets are never examined, as only a small subset of the parameter space can be tested manually, and the entire process has to be repeated for even a minute change in the desired sensor performance [7]. Moreover, the design of these sensors relies on time-consuming simulation techniques, such as finite difference time domain (FDTD), finite element method (FEM), or rigorous coupled wave analysis (RCWA), which are employed iteratively until the desired, optimized results are attained. This optimization procedure is inefficient, since it often takes a long time to complete and the majority of simulation results are underutilized. In addition to the time-consuming simulation challenge, SPR sensors also suffer from low stability and background noise [8,9,10]. The commercialization of most contemporary SPR sensors is hampered by limitations in their precision and reliability. Therefore, to avoid the time-consuming simulation and to enable the direct prediction of SPR sensing characteristics, researchers are looking to construct a model that captures the precise connection between the geometrical parameters of the SPR sensor structure and the associated performance parameters, such as sensitivity and figure of merit (FoM).

To develop an intelligent SPR sensor from a traditional one, researchers have used machine learning (ML) techniques and evaluated the sensors' efficiency and accuracy [7, 10,11,12,13,14]. ML techniques have gained prominence due to the rise in computational power, and they can be used to emulate the simulation of the intended sensor design. ML techniques are based on decision systems and might be an appropriate alternative to overcome the problems of SPR sensors by automatically predicting the RI or concentration change of an analyte in the sensing medium. ML involves training a mathematical model consisting of undetermined parameters on a large and varied dataset. Its parameters are randomly initialized and are tuned as training continues. This enables the model to learn the impact of each input feature on the output. As a result, the model can discover relationships and patterns in a huge dataset and can emulate the simulation. ML has been used in designing biosensors with negative metamaterials [11], predicting nitrate soil nitrogen content [15], brain tumour diagnosis [16], and mode classification of PCF SPR sensors [17]. The ML model serves as a substitute for the computationally intensive simulation, but by itself it cannot yield the optimum parameter set. Recently, many researchers have utilized particle swarm optimization (PSO) to scan multi-dimensional parameter spaces and speed up the computational process [18, 19]. Using PSO together with ML therefore offers an efficient solution.

This paper aims to use the ML technique in conjunction with PSO for the efficient design of a graphene-assisted SPR sensor and attempts to develop a highly tunable photonic device. Concerning the tunability property, it is believed that the Fermi energy of graphene can be modified by chemical doping, which results in variable inter-band transitions and a change in the RI of graphene, according to the Pauli blocking concept [20]. In a broader sense, the objective of this paper is two-fold: (i) to explore and establish a relationship between chemical doping of graphene and the performance of the sensor; and (ii) to use ML algorithms to optimize the performance of graphene-based SPR sensor designs, enabling an efficient and robust determination of graphene-based sensor structures without extensive experimental and theoretical study.

2 Concept and methodology

2.1 Design of sensor structure and its feasibility

The proposed Kretschmann sensor structure comprises four layers, as shown in Fig. 1a: an SF10 prism, a gold (Au) layer, a graphene layer, and a sensing medium containing biomolecules. For the numerical analysis, the RI values for the SF10 prism and the Au layer, and the chemical potential (CP) dependent RI of graphene, are taken from the existing literature [2, 20, 21]. The CP may alter the surface conductivity and RI of graphene. The surface conductivity of graphene is defined by the Kubo formula in terms of intra-band and inter-band transitions [20]. Graphene’s tunability is obtained by changing its CP, which can be done through chemical doping. To attain the desired sensor performance, the thickness of the Au layer (d2), the thickness of graphene (d3), and the number of graphene layers (L) are tuned. The operating wavelength of the polarized laser source for the sensing investigations is 633 nm. The sensing medium is considered as the last layer, an aqueous solution with an RI of 1.33 + \(\Delta {n}_{\mathrm{s}}\), where \(\Delta {n}_{\mathrm{s}}\) = 0.005 reflects the sensing medium’s RI shift due to analyte adsorption on graphene [20]. This change in the RI of the sensing medium is due to the considerable adsorption of biomolecules on the graphene surface.

Fig. 1 a Schematic of the proposed sensor structure, b feasibility of the sensor structure, c workflow of the preprocessing calculation of the reflectance curve using TMM, followed by machine learning for analysis of the reflectance curve, and d workflow consisting of ML models and PSO for structural parameter set selection

Regarding the possibility of implementing the proposed structure in practice, the vapor deposition technique can be utilized to deposit a thin Au layer on the SF10 prism [21]. Using the chemical vapor deposition (CVD) process, graphene may be synthesized and transferred onto an Au layer [22]. Realization of an experimental SPR-based sensor is possible by fabricating a graphene-based SPR chip as shown in Fig. 1a, b [21, 22].

2.2 Method and performance parameters

This section discusses the sensor’s performance parameters. The performance characteristics of the proposed sensor are computed from the reflectance curve, which is obtained using Fresnel’s equations and the N-layer transfer matrix method (TMM), as shown in the block diagram in Fig. 1c. A thorough discussion of the TMM modeling utilized for the proposed sensor has already been presented in the literature [1, 4]. The sensor performance characteristics are calculated following references [2, 4, 20]. To assess the performance of the proposed SPR sensor, the SPR reflectance curves are used to calculate the following parameters:

Sensitivity (S) is defined as the ratio of shift in resonance angle (\(\Delta {\theta }_{\mathrm{res}}={\theta }_{2}-{\theta }_{1}\)) to shift in sensing layer RI in the range of 1.330–1.335 (\(\Delta {n}_{\mathrm{s}}=0.005\)) recorded in SPR reflectance curves due to adsorption of analyte. Sensitivity represents the detecting capabilities of the sensor [1, 7]:

$$S= \frac{\Delta {\theta }_{\mathrm{res}}}{\Delta {n}_{\mathrm{s}}}\left(^{\circ} /\text{RIU}\right).$$
(1)

Full width at half maximum (FWHM) is the difference between the two angles at which the reflectance equals half of the maximum reflection intensity; it represents the angular width of the SPR reflectance curve. Detection accuracy (DA) is defined as the inverse of the FWHM [1, 4]:

$$\text{DA}= \frac{1}{\mathrm{FWHM}}\left(1/^{\circ} \right).$$
(2)

Figure of merit (FoM) is the product of sensitivity and detection accuracy. FoM should be as high as possible [1, 4]:

$$\text{FoM}=S\times \text{DA} \left(1/\text{RIU}\right).$$
(3)
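As a minimal sketch of how Eqs. (1)–(3) might be evaluated from sampled reflectance curves, the Python snippet below uses a synthetic Lorentzian dip in place of a TMM curve. The function names, the dip shape, and the choices to measure the FWHM at the R = 0.5 level on the shifted curve and to return zero when the dip never reaches that level are all illustrative assumptions, not details confirmed by the text.

```python
import numpy as np

def resonance_angle(theta, R):
    """Angle (deg) at which the sampled reflectance is minimal."""
    return float(theta[int(np.argmin(R))])

def fwhm(theta, R, level=0.5):
    """Angular width (deg) of the dip at R = level; 0.0 if the dip never reaches it."""
    i_min = int(np.argmin(R))
    if R[i_min] >= level:
        return 0.0
    left, right = i_min, i_min
    while left > 0 and R[left] < level:
        left -= 1
    while right < len(R) - 1 and R[right] < level:
        right += 1
    if R[left] < level or R[right] < level:
        return 0.0  # dip is not closed inside the scanned angular range

    def crossing(i0, i1):
        # Linear interpolation of the angle where R crosses `level`.
        return theta[i0] + (level - R[i0]) * (theta[i1] - theta[i0]) / (R[i1] - R[i0])

    return float(crossing(right - 1, right) - crossing(left, left + 1))

def performance(theta, R_ref, R_shifted, delta_n):
    """Sensitivity, FWHM, DA and FoM following Eqs. (1)-(3)."""
    s = (resonance_angle(theta, R_shifted) - resonance_angle(theta, R_ref)) / delta_n
    width = fwhm(theta, R_shifted)
    da = 1.0 / width if width > 0 else 0.0
    return {"S": s, "FWHM": width, "DA": da, "FoM": s * da}

def lorentzian_dip(theta, theta0, gamma):
    """Synthetic stand-in for an SPR curve (hypothetical, not TMM output)."""
    return 1.0 - gamma**2 / ((theta - theta0) ** 2 + gamma**2)

theta = np.arange(50.0, 60.0, 0.001)
R1 = lorentzian_dip(theta, 54.00, 0.5)   # reference medium
R2 = lorentzian_dip(theta, 54.35, 0.5)   # medium after an RI shift of 0.005
result = performance(theta, R1, R2, 0.005)
print(result)
```

In practice the two synthetic curves would be replaced by reflectance curves computed for \({n}_{\mathrm{s}}\) = 1.330 and 1.335.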

2.3 Machine learning and particle swarm optimization

MATLAB® is used to simulate the reflectance curve for different structural parameters of the sensor, such as the thickness of the Au layer (d2) and the graphene layer (d3), together with the CP of graphene, at various RI values of the sensing medium (\({n}_{\mathrm{s}}\)). Figure 1c, d depicts the process of calculating the reflectance curve using TMM, followed by ML and PSO for analyzing the performance of the proposed SPR sensor. In addition, gradient boosted regression trees (GBRT) are incorporated into the model. A GBRT is a collection of decision trees that uses many decision branches to produce more accurate predictions. Each tree attempts to learn from the errors of its predecessor to improve the outcome. In essence, several shallow trees are generated, each of which makes accurate predictions on a portion of the data, and their combination results in a robust model.
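The reflectance simulation referred to above can be sketched in Python instead of MATLAB®. The snippet implements the standard N-layer characteristic-matrix formulation for p-polarized light; it is not the authors' code. The optical constants (SF10, Au, and a fixed graphene RI at 633 nm) are commonly quoted literature values chosen here for illustration only, and they do not reproduce the CP-dependent graphene model used in this work, so the resulting angles need not match the reported ones.

```python
import numpy as np

def reflectance_tmm(theta_deg, n_stack, d_stack, wavelength):
    """p-polarized reflectance of a layer stack (first = prism, last = sensing medium).

    n_stack: complex refractive indices; d_stack: thicknesses in metres
    (the entries for the first and last layers are ignored).
    """
    theta = np.deg2rad(np.asarray(theta_deg, dtype=float))
    n = np.asarray(n_stack, dtype=complex)
    eps = n**2
    kx2 = (n[0] * np.sin(theta)) ** 2              # conserved in-plane term n1^2 sin^2(theta)
    root = np.sqrt(eps[:, None] - kx2[None, :])    # principal branch: evanescent waves decay
    q = root / eps[:, None]
    m11 = np.ones(theta.shape, dtype=complex)
    m12 = np.zeros(theta.shape, dtype=complex)
    m21 = np.zeros(theta.shape, dtype=complex)
    m22 = np.ones(theta.shape, dtype=complex)
    for k in range(1, len(n) - 1):                 # multiply the matrices of the inner layers
        beta = 2.0 * np.pi * d_stack[k] / wavelength * root[k]
        c, s = np.cos(beta), np.sin(beta)
        a11, a12, a21, a22 = c, -1j * s / q[k], -1j * q[k] * s, c
        m11, m12, m21, m22 = (m11 * a11 + m12 * a21, m11 * a12 + m12 * a22,
                              m21 * a11 + m22 * a21, m21 * a12 + m22 * a22)
    num = (m11 + m12 * q[-1]) * q[0] - (m21 + m22 * q[-1])
    den = (m11 + m12 * q[-1]) * q[0] + (m21 + m22 * q[-1])
    return np.abs(num / den) ** 2

# Illustrative stack: SF10 / Au (52 nm) / 3 graphene layers / aqueous medium.
# All refractive indices below are assumed values, not taken from this paper.
angles = np.arange(40.0, 80.0, 0.01)
n_stack = [1.723, 0.1726 + 3.4218j, 3.0 + 1.1487j, 1.33]
d_stack = [0.0, 52e-9, 3 * 0.34e-9, 0.0]
R = reflectance_tmm(angles, n_stack, d_stack, 633e-9)
theta_res = angles[np.argmin(R)]
print(f"resonance angle: {theta_res:.2f} deg, minimum reflectance: {R.min():.3f}")
```

Sweeping `d_stack`, the graphene index, and the last entry of `n_stack` over the parameter grid would yield the kind of dataset described in the next subsection.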

2.3.1 Dataset of ML model

A large and varied dataset is necessary for effective training of an ML model. The dataset was compiled by simulating the proposed sensor in MATLAB® and iterating over all parameter combinations in the required data range, which took approximately 76 min. The calculations were done with the TMM owing to its high accuracy [1]. The ranges of the input variables in Fig. 1d are as follows: (i) gold thickness (d2): 35 nm to 55 nm with an interval of 1 nm; (ii) graphene thickness (d3), calculated as L × 0.34 nm, where L is the number of graphene layers and 1 to 7 layers are considered here; (iii) CP of graphene: 0 to 1 with an interval of 0.1; and (iv) sensing layer RI (\({n}_{\mathrm{s}}\)): 1.33 to 1.40 with an interval of 0.005. A total of 24,255 data points were collected, of which 22,805 had a non-zero FWHM. The data was rescaled using MinMaxScaler for faster and more efficient training. For the classifier, the data was split into 10% test data (2426) and 90% training data (21,829). The data with a non-zero FWHM was then further split into 10% test data (2281) and 90% training data (20,524). The training data is used by the model for parameter tuning and to learn the relationship between the inputs and the output. The test data is the portion of the data to which the model is not exposed during training; it stands in for the unseen inputs on which the model would operate and is used to evaluate its performance and usefulness. As seen in Fig. 2a, b, we used the Spearman and Pearson correlation matrices to determine the monotonic and linear correlations, respectively, between the input and output parameters and to enhance the ML model’s performance. A larger absolute coefficient value implies a stronger association. The Pearson correlation coefficient (\(r\)) [23] between two variables (x and y) is calculated from \(m\) data points as follows:

Fig. 2 a Heat map of the Spearman correlation coefficient matrix. b Heat map of the Pearson correlation coefficient matrix

$$r=\frac{m\left(\sum xy\right)-\left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[m\sum {x}^{2}-{(\sum x)}^{2}\right]\left[m\sum {y}^{2}-{(\sum y)}^{2}\right]}}.$$
(4)

Similarly, the Spearman correlation coefficient (\(\rho\)) between two variables x and y is defined [24] as:

$$\rho =\frac{\frac{1}{m}\sum_{j=1}^{m}\left(R\left({x}_{j}\right)-\overline{R\left(x\right)}\right)\left(R\left({y}_{j}\right)-\overline{R\left(y\right)}\right)}{\sqrt{\left(\frac{1}{m}\sum_{j=1}^{m}{\left(R\left({x}_{j}\right)-\overline{R\left(x\right)}\right)}^{2}\right)\left(\frac{1}{m}\sum_{j=1}^{m}{\left(R\left({y}_{j}\right)-\overline{R\left(y\right)}\right)}^{2}\right)}}$$
(5)

where, R(x) and R(y) are the ranks of x and y, respectively. \(\overline{R\left(x\right)}\) and \(\overline{R\left(y\right)}\) are the mean ranks of x and y, respectively.
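To illustrate the difference between the two coefficients, the sketch below evaluates the Pearson coefficient of Eq. (4) and the standard rank-based Spearman coefficient on a hypothetical monotonic but nonlinear relationship. The helper names and the sample data are assumptions made for this example; they are not drawn from the paper's dataset.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson coefficient written in the summation form of Eq. (4)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    m = x.size
    num = m * np.sum(x * y) - np.sum(x) * np.sum(y)
    den = np.sqrt((m * np.sum(x**2) - np.sum(x) ** 2) *
                  (m * np.sum(y**2) - np.sum(y) ** 2))
    return num / den

def ranks(v):
    """Ranks 1..m; tied values receive the average of their ranks."""
    v = np.asarray(v, dtype=float)
    order = np.argsort(v, kind="mergesort")
    r = np.empty(v.size)
    r[order] = np.arange(1, v.size + 1)
    for value in np.unique(v):
        tie = v == value
        r[tie] = r[tie].mean()
    return r

def spearman_rho(x, y):
    """Spearman coefficient: covariance of the ranks over the product of their spreads."""
    rx, ry = ranks(x), ranks(y)
    dx, dy = rx - rx.mean(), ry - ry.mean()
    return np.mean(dx * dy) / np.sqrt(np.mean(dx**2) * np.mean(dy**2))

# Hypothetical data: y grows monotonically but not linearly with x.
x = np.linspace(1.33, 1.40, 15)
y = np.exp(40.0 * (x - 1.33))
r, rho = pearson_r(x, y), spearman_rho(x, y)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

A Spearman value of one combined with a smaller Pearson value signals a relationship that is monotonic without being strictly linear, which is why both matrices are inspected.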

In Fig. 2a, the Spearman correlation coefficient between \({n}_{\mathrm{s}}\) and the minimum of θres is 1, which indicates a perfect positive monotonic relationship. The Spearman correlation coefficients between the minimum of θres and the structural parameters of the sensor, d2, d3 and CP, are − 0.029, − 0.038 and − 0.0038, respectively, which indicate a much weaker dependence than on \({n}_{\mathrm{s}}\). The Spearman correlation between d3 and FWHM is 0.32. The Spearman correlation coefficients between FWHM and d2, CP and \({n}_{\mathrm{s}}\) are − 0.8, − 0.08 and 0.21, respectively. These results indicate that input features with weaker correlations are also relevant and cannot be neglected during model training. Likewise, in Fig. 2b, the Pearson correlation coefficients with the largest absolute values are 1, between \({n}_{\mathrm{s}}\) and the minimum of θres, and − 0.77, between d2 and FWHM. The Pearson matrix provides additional support for using all parameters in model training. Figure 3a depicts the distribution of the training data set for \({\theta }_{\mathrm{res}}\), and the histogram in Fig. 3b depicts the distribution of the training data set for FWHM. Further, minimizing the FWHM is essential for improving the performance of the sensor. In some circumstances, the FWHM cannot be determined because the reflectance dip does not go below 0.5. Since these situations are particularly undesirable due to their low resolution, it is vital to create a classifier to detect and exclude them. Regression models (RM) were developed to predict the resonance angle (θres) and the FWHM. Resonance angles are selected for prediction rather than sensitivity or FoM directly. Hence, our study is extendable, and the models do not need retraining even if the reference RI is changed. Further, the FoM can be computed from these predictions and used by PSO while selecting the parameters.
In summary, we generate a dataset, train a classifier to weed out zero-FWHM data points, train ML models for θres and FWHM prediction using the scikit-learn framework, and then use these models in the optimization algorithm.

Fig. 3 Histograms of the training and test datasets for a resonance angle (θres), b FWHM
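The parameter grid, rescaling, and 90/10 split described in this subsection can be sketched as follows. The snippet only builds the input side of the dataset; the outputs (θres and FWHM) would come from the TMM simulation. Variable names and the random seed are assumptions made for illustration, and fitting the scaler before splitting simply mirrors the order in which the steps are described.

```python
import itertools

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Input ranges as listed in the text.
d2 = np.arange(35, 56, 1)                        # Au thickness in nm: 35..55 (21 values)
layers = np.arange(1, 8)                         # number of graphene layers L: 1..7
cp = np.round(0.1 * np.arange(11), 1)            # chemical potential: 0..1 in steps of 0.1
ns = np.round(1.33 + 0.005 * np.arange(15), 3)   # sensing-medium RI: 1.33..1.40

# Every combination of the four inputs.
grid = np.array(list(itertools.product(d2, layers, cp, ns)), dtype=float)

# Feature matrix [d2, d3, CP, ns], with d3 = L x 0.34 nm.
X = np.column_stack([grid[:, 0], grid[:, 1] * 0.34, grid[:, 2], grid[:, 3]])

# Rescale each feature to [0, 1], then hold out 10% of the points for testing.
X_scaled = MinMaxScaler().fit_transform(X)
X_train, X_test = train_test_split(X_scaled, test_size=0.1, random_state=0)

print(grid.shape[0], X_train.shape[0], X_test.shape[0])
```

The grid size is 21 × 7 × 11 × 15 = 24,255 points, which agrees with the total quoted above.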

2.3.2 Multi-layer perceptron classifier

The multi-layer perceptron (MLP) classifier is used with five hidden layers containing 150, 100, 100, 50, and 50 nodes, respectively, and the zero-FWHM case is taken as the positive class. The maximum number of training cycles (epochs) was set to 100. The loss curve is shown in Fig. 4a. The loss value falls significantly over the first iterations, and training stops after 30 epochs, once the loss value has stabilized at zero, to save computation.

Fig. 4 a Loss value of the model versus number of epochs. b Confusion matrix for the classifier

The confusion matrix for the classifier is shown in Fig. 4b. The test accuracy was 99.75%, while the training accuracy was 99.84%. Accuracy alone is not a sufficient criterion, since it does not indicate the model’s ability to exclude zero-FWHM cases. Therefore, we assessed the model’s ability to exclude positive (zero-FWHM) instances using the recall value. Recall indicates the fraction of real positive instances that the model correctly predicted. It is calculated as the ratio of true positives to the sum of true positives and false negatives. True positives are predictions where the predicted value matches the real value and the real value is positive; false negatives are predictions where the predicted value does not match the real value and the real value is positive. For the proposed model, the true positive and false negative counts are 157 and 0, respectively, which gives a recall of 1.0 at a prediction accuracy of 99.75%. The classifier thus accomplishes its objective, as no test data point with an FWHM of zero was incorrectly categorized.
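A classifier with the layer sizes given above might be set up in scikit-learn as sketched below, with recall taken from the confusion matrix. The inputs and the labeling rule are synthetic placeholders (the real labels come from the TMM curves), and the seeds and sample size are arbitrary assumptions, so the printed scores say nothing about the reported results.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((1500, 4))            # placeholder for scaled [d2, d3, CP, ns]
y = (X[:, 0] < 0.15).astype(int)     # hypothetical rule: 1 = zero FWHM (positive class)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=0, stratify=y)

# Five hidden layers (150, 100, 100, 50, 50) and at most 100 epochs, as in the text.
clf = MLPClassifier(hidden_layer_sizes=(150, 100, 100, 50, 50),
                    max_iter=100, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
recall = tp / (tp + fn)              # true positives over all real positives
accuracy = accuracy_score(y_test, y_pred)

print(f"epochs run: {len(clf.loss_curve_)}, accuracy: {accuracy:.4f}, recall: {recall:.4f}")
```

By default `MLPClassifier` stops before `max_iter` when the training loss stops improving, which is one way a run can end well short of the epoch limit; `clf.loss_curve_` holds the per-epoch loss values that a plot such as Fig. 4a would show.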

2.3.3 Regressor for resonance angle

A gradient boosted regression tree (GBRT) model with one hundred estimators, a maximum depth of three, and a learning rate of 0.1 was employed. Mean absolute error (MAE) was selected as the performance metric to determine the model’s effectiveness [25]. The MAE was 0.0187° for the test data and 0.0185° for the training data. Figure 5a shows a scatter plot of both the true (blue dots) and predicted (red stars) θres values. The predicted and true values are clustered tightly together, indicating close agreement. Moreover, Fig. 5b depicts the histogram of the error in the predicted values of θres. The majority of the errors lie within 0.05°. The prediction error for each θres was less than 0.003%. All of these criteria, together with the low MAE on test data, imply that the model can provide reasonably accurate predictions to be used by optimization algorithms.
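The regressor configuration stated above maps directly onto scikit-learn's `GradientBoostingRegressor`, as the following sketch shows. The target used here is a made-up smooth function of the scaled inputs, standing in for TMM resonance angles, so the resulting errors are not comparable with the reported MAE values.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((3000, 4))            # placeholder for scaled [d2, d3, CP, ns]

# Hypothetical target in degrees, dominated by the sensing-medium RI (column 3).
theta_res = 54.0 + 5.0 * X[:, 3] + 0.3 * X[:, 1] - 0.2 * X[:, 0] ** 2

X_tr, X_te, y_tr, y_te = train_test_split(X, theta_res, test_size=0.1, random_state=0)

# 100 estimators, maximum depth 3, learning rate 0.1, as stated in the text.
model = GradientBoostingRegressor(n_estimators=100, max_depth=3,
                                  learning_rate=0.1, random_state=0)
model.fit(X_tr, y_tr)

mae_train = mean_absolute_error(y_tr, model.predict(X_tr))
mae_test = mean_absolute_error(y_te, model.predict(X_te))
print(f"MAE train: {mae_train:.4f} deg, MAE test: {mae_test:.4f} deg")
```

Comparing the training and test MAE, as done in the text, is a quick check for overfitting: similar values suggest the trees generalize to unseen parameter sets.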

Fig. 5 a Relationship between the predicted (red stars) and true (blue dots) values of resonance angle. b Histogram of error in predicted values of resonance angle (θres)

2.3.4 Regressor for FWHM

A GBRT with seven thousand estimators, a maximum depth of four, and a learning rate of 0.361 was used. Additionally, the loss attribute was set to ‘squared error’ for training. The resultant model had an MAE of 0.198° for the test data and 0.147° for the training data. Figure 6a shows scatter plots of both the true (blue dots) and predicted (red dots) values of the FWHM. As with θres, the predicted and actual values lie close together, indicating close agreement. Furthermore, Fig. 6b depicts the histogram of the error in the predicted FWHM values. Most of the errors lie within 0.075°. All these factors, along with the low MAE on test data, indicate that the model can provide reasonably accurate predictions to be used by optimization algorithms.

Fig. 6 a Relationship between the predicted (red dots) and true (blue dots) values of FWHM. b Histogram of error in predicted values of FWHM

2.3.5 Particle swarm optimization (PSO) algorithm

It would be computationally costly and impractical to iterate exhaustively through the 4-dimensional parameter space and choose the parameter sets with the highest FoM. An optimization technique such as Kennedy and Eberhart’s [26] particle swarm optimization (PSO) provides a viable solution in this study [18]. This algorithm is inspired by the behaviour of a school of fish or a flock of birds seeking food or refuge, in which the discovery made by each individual helps the group as a whole to achieve the optimal outcome. Since the aim of the proposed study is to maximize the FoM, its inverse must be minimized. Therefore, the objective function (F) to be minimized is defined as \({\mathrm{FoM}}^{-1}.\)

We declared N (= 20) particles, which search the 4-dimensional parameter space and select the values that minimize the objective function, thereby maximizing the FoM. Much like in a swarm, each particle’s search benefits from the results obtained by the other members, which helps the swarm to attain an optimum result efficiently. The parameter set of the ith particle {d2i, d3i, CPi, \({n}_{\mathrm{s}i}\)} at the jth iteration is defined as its position (\({X}_{i}^{j}\)). The position of the particle changes according to another variable called its velocity (\({V}_{i}^{j}\)). Initially, the position and velocity of every particle are assigned random values within the limits. During the search, the algorithm stores the best previous position \({X}_{i}^{\mathrm{best}}\) (the parameter set with the minimum F value) of each particle and the best position obtained globally, \({X}_{\mathrm{global}}^{\mathrm{best}}\). These values are updated as better values are encountered during execution. The position of each particle is then updated as per the equation given below [26]:

$${X}_{i}^{j+1}={X}_{i}^{j}+{V}_{i}^{j+1}.$$
(6)

The velocity of the particle is updated as per the following equation:

$${V}_{i}^{j+1}=w\times {V}_{i}^{j}+{C}_{1}\times \text{Rm}_{1}\times ({X}_{i}^{\mathrm{best}}-{X}_{i}^{j})+{C}_{2}\times \text{Rm}_{2}\times ({X}_{\mathrm{global}}^{\mathrm{best}}-{X}_{i}^{j})$$
(7)

where, \(w\) is the inertia weight constant, which determines the influence of the particle’s previous velocity on its current velocity. \({C}_{1}\) is defined as the cognitive coefficient constant and determines the weightage a particle assigns to its previous search results. \({C}_{2}\) is defined as the social coefficient constant, which determines the weightage a particle assigns to the group’s search results. \(\text{Rm}_{1}\) and \(\text{Rm}_{2}\) are random numbers between 0 and 1.

The MAE values for sensitivity and FoM, calculated from the predicted values, are 0.82 and 0.40, respectively, which is adequate for the optimization algorithm. A PSO algorithm with a maximum of 50 iterations was applied. The inertia weight (\(w\)) was set to 0.5. The cognitive coefficient and social coefficient were 5 and 2.5, respectively. The selection ranges were d2 (35–55 nm with an interval of 1 nm), L (1–7 with an interval of 1), CP (0–1 with an interval of 0.01) and \({n}_{\mathrm{s}}\) (1.33–1.335). The parameters selected by PSO were d2 = 52 nm, L = 3, CP = 0 and \({n}_{\mathrm{s}}\) = 1.335.
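The update rules of Eqs. (6) and (7), with the settings quoted above (20 particles, 50 iterations, w = 0.5, C1 = 5, C2 = 2.5), can be sketched as below. This is a generic PSO written for illustration, not the authors' implementation: the toy objective stands in for the ML-predicted \({\mathrm{FoM}}^{-1}\), clipping to the search bounds and drawing \(\text{Rm}_{1}\), \(\text{Rm}_{2}\) independently per dimension are assumptions, and the discrete step sizes of the parameters are ignored.

```python
import numpy as np

def pso_minimize(objective, lower, upper, n_particles=20, n_iter=50,
                 w=0.5, c1=5.0, c2=2.5, seed=0):
    """Minimize `objective` inside the box [lower, upper] with a basic PSO."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    span = upper - lower
    x = lower + rng.random((n_particles, lower.size)) * span   # random initial positions
    v = (rng.random(x.shape) - 0.5) * span                     # random initial velocities
    p_best = x.copy()                                          # best position per particle
    p_val = np.array([objective(p) for p in x])
    g = int(np.argmin(p_val))
    g_best, g_val = p_best[g].copy(), p_val[g]                 # best position of the swarm
    history = [g_val]
    for _ in range(n_iter):
        rm1, rm2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * rm1 * (p_best - x) + c2 * rm2 * (g_best - x)   # Eq. (7)
        x = np.clip(x + v, lower, upper)                                # Eq. (6), kept in bounds
        val = np.array([objective(p) for p in x])
        improved = val < p_val
        p_best[improved], p_val[improved] = x[improved], val[improved]
        g = int(np.argmin(p_val))
        if p_val[g] < g_val:
            g_best, g_val = p_best[g].copy(), p_val[g]
        history.append(g_val)
    return g_best, g_val, history

# Search box for [d2 (nm), L, CP, ns], taken from the selection ranges in the text.
lower = np.array([35.0, 1.0, 0.0, 1.330])
upper = np.array([55.0, 7.0, 1.0, 1.335])

def toy_objective(p):
    """Hypothetical stand-in for the predicted FoM^-1: a bowl centred in the box."""
    z = (p - lower) / (upper - lower) - 0.5
    return 0.01 + float(np.sum(z**2))

best_x, best_f, history = pso_minimize(toy_objective, lower, upper)
print(best_x, best_f)
```

In the workflow described here, `toy_objective` would be replaced by a function that calls the classifier and the two regressors and returns the predicted \({\mathrm{FoM}}^{-1}\); `history` then gives the kind of curve plotted as objective function versus epoch.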

3 Results and discussion

As seen in Fig. 7, \({\mathrm{FoM}}^{-1}\) decreases for 10 epochs before stabilizing at 0.009795. The program runs for the maximum number of epochs to guarantee that \({\mathrm{FoM}}^{-1}\) reaches a stable value. The PSO selected the parameter set from the specified optimization space after 28 s. As the RI of the sensing medium changes from 1.33 to 1.335, the resonance angle changes from 53.9764° to 54.32017°. Using Eqs. (1)–(3), the sensitivity (68.754 °/RIU) and FoM (100 \({\mathrm{RIU}}^{-1}\)) of the proposed structure are estimated at the optimal sensor structural parameters.
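As a worked check of Eq. (1) with the resonance angles quoted above, and, taking the rounded FoM of 100 \({\mathrm{RIU}}^{-1}\) as given, the FWHM implied by Eqs. (2) and (3) (an inferred value, not one reported in the text):

$$S=\frac{54.32017^{\circ}-53.9764^{\circ}}{0.005\ \text{RIU}}=\frac{0.34377^{\circ}}{0.005\ \text{RIU}}\approx 68.754\ ^{\circ}/\text{RIU},\qquad \text{FWHM}=\frac{S}{\text{FoM}}\approx \frac{68.754}{100}\approx 0.69^{\circ}.$$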

Fig. 7 Objective function versus number of epochs

Finally, we used the PSO algorithm for designing the sensor and selecting the optimum structural parameters, instead of training an ML model to predict the structural parameters (d2, d3, CP, \({{n}}_{\mathrm{s}}\)) for a given FoM, because we intended our sensor to have the maximum possible FoM rather than a user-specified FoM [19]. We opted for the maximum FoM because the existence of a sensor for every possible user-desired FoM is not guaranteed, and such an approach would give no consideration to the ease of sensor fabrication. With an optimization algorithm, we can impose bounds on the designed sensor such that it can be easily fabricated and is cost-effective. We can also fix any subset of the parameters and vary the rest as the need arises. Unlike Yan et al. [27], we have used the FoM, which accounts for both sensitivity and FWHM, as the optimization target instead of sensitivity only, as there are many parameter sets with similar sensitivity but high FWHM and consequently poor FoM, as shown in Table 1.

Table 1 Variation in FWHM and FoM despite having similar sensitivity at \({n}_{\mathrm{s}}\)=1.335

Hence, it can be concluded that the FWHM cannot be disregarded while determining an optimum parameter selection. Additionally, the FoM of the selected parameters is in the 99.7th percentile, which indicates the effectiveness of our algorithm.

4 Conclusion

We have utilized ML and a PSO algorithm to find the optimal geometric parameters for the proposed SPR sensor design. Our solution eliminated the requirement for brute-force searching over the full parameter space. On test data, the ML classifier distinguished between cases in which the reflectance dip falls below 0.5 and cases in which it does not with an accuracy of 99.75% and a recall of one. The ML models predicted the angle of resonance and the FWHM with mean absolute errors of 0.0187° and 0.198°, respectively. This implies that an SPR sensor may be simulated by an ML model with sufficient accuracy, and that computationally expensive sensor simulation is not needed for parameter selection. The parameter set chosen by the optimization method has a FoM of 100 and is in the 99.7th percentile. Using this process, the PSO was able to identify an optimal parameter set in 28 s, while the simulation required 76 min to find the resonance angle and FWHM for our data. The proposed combination of an ML model with PSO may also be applied to the design of other opto-electronic devices. Consequently, our strategy simplifies and facilitates the optimal design of SPR sensors.