1 Introduction

Quantum mechanics predicts the existence of quantum states of composite systems that cannot be written as products of states of their individual components [1]. These are the so-called entangled states. Today, these states play a central role in quantum information theory [2, 3] and in many applications, such as quantum cryptography [4], quantum teleportation [5, 6], frequency standards improvement [7,8,9], one-way quantum computing [10], clock synchronization [11], and entanglement-assisted orientation in space [12], among many others. Interestingly, entangled states play a key role in the argument put forward by Einstein, Podolsky, and Rosen [13]. This argument was aimed at ascribing objective values to measurable quantities, that is, values that exist prior to and independently of measurements. Bell’s inequality [14] shows that it is precisely the existence of entangled states that precludes such a conception of reality.

In view of the foundational significance of entangled states and their many applications, theoretical and experimental characterization and detection of entangled states are important research subjects. One of the first criteria employed to study the entanglement of quantum states is the violation of the Clauser–Horne–Shimony–Holt inequality [15, 16] (CHSH), which is the generalization of Bell’s inequality to two observers each having the choice of two measurement settings with two outcomes. In this scenario, the violation of the CHSH inequality indicates the presence of entanglement. This approach has also been studied in the context of the theory of entanglement witnesses [17, 18]. These are observables with positive expectation values with respect to the complete set of separable states that for at least one entangled state provide a negative expectation value. Thus, a negative expectation value signals the presence of entanglement. It has been shown that the CHSH inequality can be related to an entanglement witness [18, 19].

Here, we study the detection of entanglement of unknown states via the violation of the CHSH inequality. Since the majority of the entanglement measures and entanglement detectors are based on the knowledge of the quantum state, the unknown character of the state increases the difficulty of the problem. The presence of unknown quantum states is common in quantum communication [20,21,22] and quantum computing [23,24,25], where a target entangled state is prepared, but it is modified by the action of the environment. Furthermore, experimental setups to study the CHSH inequality generally aim to generate maximally entangled pure states. However, technical limitations, experimental inaccuracies, and noise can lead to partially entangled mixed states that are difficult to characterize. In particular, decoherence generated by an unwanted interaction between a bipartite quantum system and an environment leads to the loss of quantum coherence, entanglement, and Bell nonlocality [26]. The latter has been called Bell nonlocality sudden death [27, 28], and its consequence is that states affected by this phenomenon cannot be purified. Entanglement detection of unknown quantum states has been previously studied from the point of view of quantum tomography [29, 30], by means of an adaptive scheme [31, 32], employing a succession of measurements of witness operators [33, 34], via the measurement of the energy observable [35], via local parity measurements on twofold copies of the unknown state [36], via series of local random measurements from which entanglement witnesses are constructed [37], and via variational determination of geometrical entanglement [38], among many others. We follow a different approach. For a given known state, the maximal violation of the CHSH inequality is obtained by maximizing the CHSH function over the set of 4-tuples of dichotomic observables. This procedure is typically carried out by means of semidefinite programming (SDP) techniques.
If the state is unknown, then the function to be optimized, that is, the target function, contains unknown fixed parameters and SDP cannot be employed to find the measurements leading to the maximal violation. Analogously, the use of an entanglement witness also requires knowledge of the state. To overcome this problem, we employ a recently developed optimization algorithm [39], the complex simultaneous perturbation stochastic approximation (CSPSA), which can handle functions with unknown parameters. CSPSA works natively within the field of the complex numbers. Thereby, no parameterization of the complex arguments onto the real numbers is necessary. Also, this algorithm has exhibited an improved convergence rate in certain applications such as the estimation of unknown quantum pure states [40]. CSPSA uses a stochastic approximation of the complex Wirtinger gradient of the target function, which requires the value of the target function at two different points in the optimization space. In the case at hand, these two values can be obtained experimentally even though the state remains unknown. CSPSA iteratively generates a sequence of sets of four local measurement settings with increasing values of the CHSH function until reaching the highest possible violation of the inequality.

We first study via numerical simulations the performance of the method here proposed when applied to unknown pure 2-qubit states. In this case, the maximal value achieved by the CHSH function depends on the Schmidt coefficient of the state. Thereby, the performance of the method can be compared with an analytical bound. We show that for the set formed by states that have the same set of local Schmidt bases, the method leads in tens of iterations to a value close to the maximum of the CHSH function for each value of the Schmidt coefficient. We also consider sets of states that have the same concurrence value but different local Schmidt bases. In this case, the method also approaches the corresponding maximum value of the CHSH inequality in tens of iterations. However, the higher the concurrence value, the fewer iterations are required for a violation of the CHSH inequality. Also, all states with the same concurrence value exhibit a very similar behavior of the CHSH function as a function of the number of iterations, that is, CSPSA produces results that are nearly independent of the particular set of local Schmidt bases. We also consider the average behavior of the method on the Hilbert space of two qubits. In this case, the method reaches a CHSH function value greater than 2 after 17 iterations for an ensemble size of \(10^2\). After 25 iterations, the interquartile range is also above 2, which indicates that for 75% of the simulated states the method reached a violation of the CHSH inequality. A further increase in the ensemble size leads to a reduction in the number of iterations required to achieve a violation of the CHSH inequality. In order to study the accuracy achieved by our method, we employ the squared error. We show that the mean and median squared error on the 2-qubit Hilbert space are nearly indistinguishable. After 25 iterations, the mean square error achieves a value in the order of \(10^{-1}\) for an ensemble size of \(10^2\). 
A further increase in the ensemble size to \(10^3\) leads to a decrease in the mean squared error of about half an order of magnitude. Thereafter, we study the case of two-qubit mixed states. Unlike the case of pure states, there is no known analytical formula for the maximum value of the CHSH function for an arbitrary mixed state. However, in the particular case of Werner states, that is, a maximally entangled state affected by white noise, it is possible to obtain the maximum value of the CHSH function in terms of the mixing parameter. We show that CSPSA is capable of achieving a value close to the maximum violation of the CHSH inequality for all Werner states. As the ensemble size increases, the value of the function provided by CSPSA becomes closer to the maximal violation. Finally, we analyze the results achieved by CSPSA for unknown mixed states. For these states, there is no analytical expression for the maximal violation, so we calculate this value via SDP. After generating \(10^6\) density matrices, a subset of \(8\times 10^3\) density matrices that violate the CHSH inequality is identified. These states have a small value of the negativity, a well-known entanglement measure. Within this subset, the mean and median values of the CHSH function provided by CSPSA achieve a value close to the theoretical maximal violation after approximately 75 iterations.

Our results show that the maximization of the CHSH function via the CSPSA method allows detecting the entanglement of unknown states, pure or mixed, with a high degree of accuracy. Furthermore, the highest value of the CHSH function can also be achieved. Our approach requires the ability to adapt local measurements, which are carried out on single copies of the unknown state. This can be implemented in various experimental platforms [16, 41,42,43,44,45]. We stress the fact that no a priori information about the unknown state, such as purity, Schmidt coefficient, or Schmidt bases, has been employed to optimize the performance of CSPSA.

2 CHSH inequality and CSPSA optimization algorithm

The target function to be optimized is the Clauser–Horne–Shimony–Holt function S defined by the expression [15]

$$\begin{aligned} S({{\varvec{z}}},{{\varvec{z}}}^*)= & {} E({{\varvec{z}}}_a,{{\varvec{z}}}_b)+E({{\varvec{z}}}_a,{{\varvec{z}}}'_b)+E({{\varvec{z}}}'_a, {{\varvec{z}}}_b) \nonumber \\{} & {} - E({{\varvec{z}}}'_a,{{\varvec{z}}}'_b), \end{aligned}$$
(1)

where the expectation value \(E({{\varvec{z}}}_a,{{\varvec{z}}}_b)\) is given by the average of the products of the outcomes of two locally performed dichotomic measurements \(A({{\varvec{z}}}_a)\) and \(B({{\varvec{z}}}_b)\) defined by the settings \({{\varvec{z}}}_a\) and \({{\varvec{z}}}_b\), respectively. The vector \({{\varvec{z}}}\) contains the settings of the four local measurements, that is, \({\varvec{z}}=({{\varvec{z}}}_a, {{\varvec{z}}}'_a, {{\varvec{z}}}_b, {{\varvec{z}}}'_b)\). The CHSH inequality adopts the form \(|S|\le 2\).

A quantum mechanical dichotomic observable \(A({{\varvec{z}}}_a)\) is defined as the one having \(\pm 1\) eigenvalues, that is, an observable with the spectral decomposition

$$\begin{aligned} A({{\varvec{z}}}_a)=|\psi ({{\varvec{z}}}_a)\rangle \langle \psi ({{\varvec{z}}}_a)|-|\psi ^\perp ({{\varvec{z}}}_a)\rangle \langle \psi ^\perp ({{\varvec{z}}}_a)|, \end{aligned}$$
(2)

where \(|\psi ({{\varvec{z}}}_a)\rangle \) is an arbitrary two-dimensional quantum state

$$\begin{aligned} |\psi ({{\varvec{z}}}_a)\rangle =\frac{z_{a,1}|0\rangle +z_{a,2}|1\rangle }{\sqrt{|z_{a,1}|^2+|z_{a,2}|^2}}. \end{aligned}$$
(3)

The state \(|\psi ^\perp ({{\varvec{z}}}_a)\rangle \) is orthogonal to \(|\psi ({{\varvec{z}}}_a)\rangle \), and the components \(z_{a,1}\) and \(z_{a,2}\) of the vector \({{\varvec{z}}}_a\) are complex numbers. Thereby, the expectation value \(E({{\varvec{z}}}_a,{{\varvec{z}}}_b)\) is given by the expression

$$\begin{aligned} E({{\varvec{z}}}_a,{{\varvec{z}}}_b)= & {} Tr(\rho |\psi ({{\varvec{z}}}_a)\rangle \langle \psi ({{\varvec{z}}}_a)|\otimes |\psi ({{\varvec{z}}}_b)\rangle \langle \psi ({{\varvec{z}}}_b)|) \nonumber \\{} & {} + Tr(\rho |\psi ^\perp ({{\varvec{z}}}_a)\rangle \langle \psi ^\perp ({{\varvec{z}}}_a)|\otimes |\psi ^\perp ({{\varvec{z}}}_b)\rangle \langle \psi ^\perp ({{\varvec{z}}}_b)|) \nonumber \\{} & {} - Tr(\rho |\psi ({{\varvec{z}}}_a)\rangle \langle \psi ({{\varvec{z}}}_a)|\otimes |\psi ^\perp ({{\varvec{z}}}_b)\rangle \langle \psi ^\perp ({{\varvec{z}}}_b)|) \nonumber \\{} & {} -Tr(\rho |\psi ^\perp ({{\varvec{z}}}_a)\rangle \langle \psi ^\perp ({{\varvec{z}}}_a)|\otimes |\psi ({{\varvec{z}}}_b)\rangle \langle \psi ({{\varvec{z}}}_b)|), \end{aligned}$$
(4)

where \(\rho \) is a fixed known two-qubit state.
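As a reading aid, Eqs. (1) to (4) can be evaluated numerically for a known state. The following is a minimal sketch, assuming NumPy; the function names (`projector`, `observable`, `correlation`, `chsh`) and the example settings are illustrative choices and do not come from the original work. It uses the fact that for a qubit the observable of Eq. (2) equals \(2|\psi \rangle \langle \psi |-I\), so that Eq. (4) reduces to \(Tr(\rho \, A\otimes B)\).

```python
import numpy as np

def projector(z):
    """Projector onto the normalized state |psi(z)> of Eq. (3)."""
    psi = np.asarray(z, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    return np.outer(psi, psi.conj())

def observable(z):
    """Dichotomic observable of Eq. (2): P - P_perp = 2P - I for a qubit."""
    return 2.0 * projector(z) - np.eye(2)

def correlation(rho, z_a, z_b):
    """E(z_a, z_b) of Eq. (4), written compactly as Tr[rho (A kron B)]."""
    joint = np.kron(observable(z_a), observable(z_b))
    return float(np.real(np.trace(rho @ joint)))

def chsh(rho, z):
    """S of Eq. (1) for z = (z_a, z_a', z_b, z_b')."""
    z_a, z_ap, z_b, z_bp = z
    return (correlation(rho, z_a, z_b) + correlation(rho, z_a, z_bp)
            + correlation(rho, z_ap, z_b) - correlation(rho, z_ap, z_bp))

# Example (illustrative): the Bell state (|00> + |11>)/sqrt(2) with settings
# A = Z, A' = X, B = (Z + X)/sqrt(2), B' = (Z - X)/sqrt(2).
phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho_bell = np.outer(phi_plus, phi_plus.conj())
c, s = np.cos(np.pi / 8), np.sin(np.pi / 8)
z_opt = ([1, 0], [1, 1], [c, s], [c, -s])
print(chsh(rho_bell, z_opt))  # approximately 2.8284, i.e. 2*sqrt(2)
```

With these settings the four correlations are \(1/\sqrt{2}\), \(1/\sqrt{2}\), \(1/\sqrt{2}\), and \(-1/\sqrt{2}\), which gives \(S=2\sqrt{2}\).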

The problem of violating the CHSH inequality consists in finding, for a given known state \(\rho \), a complex vector \({{\varvec{z}}}\) that leads to a maximal value of \(|S({{\varvec{z}}},{{\varvec{z}}}^*)|\) larger than the classical bound of 2. This optimization problem can be solved by means of semidefinite programming or other numerical optimization techniques. However, when the state \(\rho \) entering the function S is unknown, the standard approaches to the problem cannot be employed. The reason for this is that the function S and its derivatives cannot be computed without knowledge of \(\rho \).

Algorithm 1 Pseudocode for the optimization of the CHSH function via CSPSA

In order to overcome this problem, we resort to the recently introduced CSPSA [39] optimization algorithm for real-valued functions of complex arguments. This algorithm works natively on the field of the complex numbers, which makes real parameterizations of the complex arguments unnecessary. For a target function \(f({{\varvec{z}}},{{\varvec{z}}}^*):{\mathbb {C}}^n\times {\mathbb {C}}^n\rightarrow {\mathbb {R}}\), CSPSA is defined by the iterative rule

$$\begin{aligned} \hat{{\varvec{z}}}_{k+1}=\hat{{\varvec{z}}}_k+a_k\hat{{\varvec{g}}}_k(\hat{{\varvec{z}}}_k,\hat{{\varvec{z}}}_k^*), \end{aligned}$$
(5)

where \(a_k\) is a positive gain coefficient and \(\hat{{\varvec{z}}}_k\) is the estimate of the maximizer \(\tilde{{\varvec{z}}}\) of \(f({{\varvec{z}}},{{\varvec{z}}}^*)\) at the k-th iteration. The iteration starts from an initial guess \(\hat{{\varvec{z}}}_0\), which is randomly chosen. The function \(\hat{{\varvec{g}}}_k(\hat{{\varvec{z}}}_k,\hat{{\varvec{z}}}_k^*)\) is an estimator for the Wirtinger gradient [46] of \(f({{\varvec{z}}},{{\varvec{z}}}^*)\) whose components are defined by

$$\begin{aligned} {\hat{g}}_{k,i}=\frac{f(\hat{{\varvec{z}}}_{k+},\hat{{\varvec{z}}}_{k+}^*)+\epsilon _{k,+}-(f(\hat{{\varvec{z}}}_{k-},\hat{{\varvec{z}}}_{k-}^*)+\epsilon _{k,-})}{2c_k{\Delta }_{k,i}^*}, \end{aligned}$$
(6)

with

$$\begin{aligned} \hat{{\varvec{z}}}_{k\pm }=\hat{{\varvec{z}}}_k\pm c_k{{\varvec{\Delta }}}_k, \end{aligned}$$
(7)

where \(c_k\) is a positive gain coefficient and \(\epsilon _{k,\pm }\) describes the presence of noise in the values of \(f(\hat{{\varvec{z}}}_{k\pm },\hat{{\varvec{z}}}_{k\pm }^*)\). The components of the vector \({{\varvec{\Delta }}}_k\in {\mathbb {C}}^n\) are identically and independently distributed random variables in the set \(\{\pm 1,\pm i\}\). The gain coefficients \(a_k\) and \(c_k\) control the convergence of CSPSA and are chosen as

$$\begin{aligned} a_k=\frac{a}{(k+1+A)^s},~~c_k=\frac{b}{(k+1)^r}. \end{aligned}$$
(8)

The values of \(a\), \(A\), \(s\), \(b\), and \(r\) are adjusted to optimize the rate of convergence depending on the target function. We use the values: \(a = 1.0\), \(b = 0.25\), \(s = 1.0\), \(r = 1/6\), and \(A = 0\).
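To make the decay of the two gain sequences concrete, Eq. (8) can be tabulated directly. The sketch below uses only the Python standard library; the function name `gains` is an illustrative choice, and the default arguments are the values quoted above.

```python
def gains(k, a=1.0, b=0.25, s=1.0, r=1.0 / 6.0, A=0.0):
    """Gain coefficients a_k and c_k of Eq. (8) at iteration k = 0, 1, 2, ..."""
    a_k = a / (k + 1 + A) ** s
    c_k = b / (k + 1) ** r
    return a_k, c_k

# The step size a_k decays as 1/(k+1), whereas the perturbation size c_k
# decays only as (k+1)^(-1/6), so it stays comparatively large.
for k in (0, 9, 63, 199):
    a_k, c_k = gains(k)
    print(f"k = {k:3d}   a_k = {a_k:.4f}   c_k = {c_k:.4f}")
```

With these values the first update uses \(a_0=1\) and \(c_0=0.25\), while at \(k=63\) the step size has dropped to \(1/64\) and the perturbation size only to \(0.125\).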

Two main properties of CSPSA are: (i) it has a mathematical proof of asymptotic convergence in mean to the maximizer \(\tilde{{\varvec{z}}}\) of \(f({{\varvec{z}}},{{\varvec{z}}}^*)\) and (ii) \(\hat{{\varvec{g}}}_k\) is an asymptotically unbiased estimator of the Wirtinger gradient. Under proper conditions, these properties are maintained even in the presence of the noise terms \(\epsilon _{k,\pm }\) entering Eq. (6). CSPSA is the generalization of the simultaneous perturbation stochastic approximation (SPSA) [47, 48] from the field of real numbers to the field of complex numbers. SPSA has been applied to the problem of estimating pure states [49, 50] and experimentally realized [51].

Thus, the application of CSPSA to the maximization of the CHSH function proceeds as follows: an initial guess \(\hat{{\varvec{z}}}_0\) for the vector containing the measurement settings and a vector \({{\varvec{\Delta }}}_0\) are randomly generated. These two vectors are employed to calculate the vectors \(\hat{{\varvec{z}}}_{0\pm }\) according to Eq. (7). Thereafter, the values \(S(\hat{{\varvec{z}}}_{0\pm },\hat{{\varvec{z}}}^*_{0\pm })\) of the CHSH function are obtained, which involves the realization of measurements on a finite ensemble of N copies of the unknown state \(\rho \). The values \(S(\hat{{\varvec{z}}}_{0\pm },\hat{{\varvec{z}}}^*_{0\pm })\) are then employed to calculate the estimator for the Wirtinger gradient \(\hat{{\varvec{g}}}_0(\hat{{\varvec{z}}}_0,\hat{{\varvec{z}}}_0^*)\) using Eq. (6). Finally, a new estimate \(\hat{{\varvec{z}}}_1\) for the vector of settings is obtained by means of Eq. (5). This process is iterated until achieving a violation of the CHSH inequality or until reaching a predefined number of iterations. Algorithm 1 shows a pseudocode for the optimization of the CHSH function via CSPSA.
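The procedure just described can be sketched in a few lines. The following is a minimal illustration, assuming NumPy, and not the authors' implementation: the names `chsh` and `cspsa_maximize` are hypothetical, the target function is treated as a black box `f`, and in the usage example it is replaced by the exact, noise-free CHSH value of a Bell state as a stand-in for experimentally measured values. Renormalizing each setting after every update is an assumption of this sketch; it leaves S unchanged because Eq. (3) normalizes the state anyway.

```python
import numpy as np

def chsh(rho, z):
    """Exact CHSH value for settings z of shape (4, 2): rows z_a, z_a', z_b, z_b'."""
    obs = []
    for row in z:
        psi = row / np.linalg.norm(row)
        obs.append(2.0 * np.outer(psi, psi.conj()) - np.eye(2))
    A, Ap, B, Bp = obs
    bell_operator = np.kron(A, B + Bp) + np.kron(Ap, B - Bp)
    return float(np.real(np.trace(rho @ bell_operator)))

def cspsa_maximize(f, z0, iterations, rng,
                   a=1.0, b=0.25, s=1.0, r=1.0 / 6.0, A=0.0):
    """Iterate Eq. (5) with the gradient estimator of Eqs. (6) and (7)."""
    z = np.array(z0, dtype=complex)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    for k in range(iterations):
        a_k = a / (k + 1 + A) ** s          # Eq. (8)
        c_k = b / (k + 1) ** r
        # Components of Delta_k drawn independently from {+1, -1, +i, -i}.
        delta = rng.choice(np.array([1, -1, 1j, -1j]), size=z.shape)
        f_plus = f(z + c_k * delta)         # two evaluations per iteration
        f_minus = f(z - c_k * delta)
        g = (f_plus - f_minus) / (2.0 * c_k * delta.conj())   # Eq. (6)
        z = z + a_k * g                     # Eq. (5), ascent step
        # Assumption of this sketch: keep every setting normalized.
        z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return z

# Usage with a stand-in target: exact S for the state (|00> + |11>)/sqrt(2).
rng = np.random.default_rng(1)
phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(phi_plus, phi_plus.conj())
z_start = rng.normal(size=(4, 2)) + 1j * rng.normal(size=(4, 2))
z_final = cspsa_maximize(lambda v: chsh(rho, v), z_start, 200, rng)
print("S before:", chsh(rho, z_start), " S after:", chsh(rho, z_final))
```

Only the two function values per iteration enter the update, so replacing the lambda by a routine that returns measured values of S would leave the loop unchanged.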

According to Eq. (6), the use of CSPSA to maximize the CHSH function \(S({{\varvec{z}}},{{\varvec{z}}}^*)\) requires the capability of obtaining the values \(S(\hat{{\varvec{z}}}_{k\pm },\hat{{\varvec{z}}}_{k\pm }^*)\) at each iteration, which in turn requires an experimental platform capable of measuring the CHSH function at any value of the setting vector \({{\varvec{z}}}\). In photonic platforms where a qubit is encoded in the polarization degree of freedom of a single photon, the local measurements on a qubit are carried out by the interaction of the photon with a sequence of half- and quarter-wave plates followed by a polarizing beam splitter and single-photon detectors. In this case, a setting vector is given by the rotation angles of the wave plates. Thereby, it is possible to implement any local measurement up to the angular resolution of the wave plates. It is possible to achieve a high degree of control in other experimental platforms, for instance, in time-bin or energy–time encoded qubits, where local measurements can be implemented introducing electronically controlled phase shifts. Thus, we will assume that the CHSH function can be measured for any value of the setting vector \({{\varvec{z}}}\).

3 Results

A single run of CSPSA starts with the choice of an initial guess \({{\varvec{z}}}_0\) of the four local measurement bases and proceeds through the choice of the vector \({{\varvec{\Delta }}}_k\) at every iteration. Since there is no a priori information about the initial state, the initial guess for each of the local measurements, which are defined by Eqs. (2) and (3), is randomly chosen according to a Haar uniform distribution. The choice of \({{\varvec{\Delta }}}_k\) is equally random. Thereby, CSPSA is an intrinsically stochastic optimization algorithm. A third source of randomness is the value of the CHSH function. This is obtained by means of probabilities that are inferred from local measurements made on a set of equally prepared copies of the unknown state. Since the size N of the ensemble is finite, the inferred probabilities are affected by finite-statistics noise. Thereby, CSPSA exhibits three different sources of randomness and, consequently, each run of CSPSA will follow a different trajectory in the optimization space, that is, the space of the four setting vectors. Here, we report the results of numerical experiments for the cases of pure and mixed states considering the sources of randomness affecting the performance of the proposed method.
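Two of these sources of randomness can be simulated directly. The sketch below, assuming NumPy, draws a Haar-distributed qubit setting as a normalized complex Gaussian vector and estimates a correlation \(E({{\varvec{z}}}_a,{{\varvec{z}}}_b)\) from N simulated copies by multinomial sampling of the four joint outcomes of Eq. (4). The function names and the multinomial model of the detection statistics are assumptions made for illustration; the original simulations may differ in detail.

```python
import numpy as np

def haar_setting(rng):
    """Haar-distributed qubit state: a normalized complex Gaussian vector."""
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    return v / np.linalg.norm(v)

def projectors(z):
    """Projectors onto |psi(z)> and its orthogonal complement."""
    psi = np.asarray(z, dtype=complex)
    psi = psi / np.linalg.norm(psi)
    P = np.outer(psi, psi.conj())
    return P, np.eye(2) - P

def sampled_correlation(rho, z_a, z_b, N, rng):
    """Estimate E(z_a, z_b) from N copies of rho (finite-statistics noise)."""
    Pa, Pa_perp = projectors(z_a)
    Pb, Pb_perp = projectors(z_b)
    # Same ordering as the four terms of Eq. (4): ++, --, +-, -+.
    pairs = [(Pa, Pb), (Pa_perp, Pb_perp), (Pa, Pb_perp), (Pa_perp, Pb)]
    probs = np.array([np.real(np.trace(rho @ np.kron(X, Y))) for X, Y in pairs])
    probs = np.clip(probs, 0.0, None)   # guard against tiny negative round-off
    probs = probs / probs.sum()
    counts = rng.multinomial(N, probs)
    return (counts[0] + counts[1] - counts[2] - counts[3]) / N

rng = np.random.default_rng(0)
phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
rho = np.outer(phi_plus, phi_plus.conj())
z_a, z_b = haar_setting(rng), haar_setting(rng)
for N in (10**2, 10**4):
    print(N, sampled_correlation(rho, z_a, z_b, N, rng))
```

The statistical spread of each estimate scales roughly as \(1/\sqrt{N}\), which is why larger ensembles give smoother trajectories.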

To study the violation of the CHSH inequality with an unknown state \(\rho \), pure or mixed, we compute the expected value \({\bar{S}}(\rho )\) by sampling a sufficiently large number of independent trajectories, each obtained through the optimization of S by CSPSA for \(\rho \), as

$$\begin{aligned} {\bar{S}}(\rho )=\frac{1}{K}\sum _{{{\varvec{z}}}_0,\{ {{\varvec{\Delta }}}_1,\dots ,{{\varvec{\Delta }}}_k\}}S(\rho ,{{\varvec{z}}}_0,\{ {{\varvec{\Delta }}}_1,\dots ,{{\varvec{\Delta }}}_k\}), \end{aligned}$$
(9)

where \(S(\rho ,{{\varvec{z}}}_0,\{ {{\varvec{\Delta }}}_1,\dots ,{{\varvec{\Delta }}}_k\})\) is the value of the CHSH function evaluated on a particular trajectory generated by a single run of CSPSA and K is the total number of simulated trajectories. \(S(\rho ,{{\varvec{z}}}_0,\{ {{\varvec{\Delta }}}_1,\dots ,{{\varvec{\Delta }}}_k\})\) depends on the unknown state \(\rho \), the set \({{\varvec{z}}}_0\) of complex numbers that defines the initial guess for the four local measurements, and the particular sequence of choices \(\{ {{\varvec{\Delta }}}_1,\dots ,{{\varvec{\Delta }}}_k\}\). The mean \({\bar{S}}(\rho )\) will be studied as a function of the number k of iterations for a fixed ensemble size N.

Since we are interested in the overall behavior of the algorithm for unknown states, we calculate the mean \({\bar{S}}_C\) of \({\bar{S}}(\rho )\) in a set \(\Omega _C\), that is,

$$\begin{aligned} {\bar{S}}_C=\frac{1}{M}\sum _{\rho \in \Omega _C} {\bar{S}}(\rho ), \end{aligned}$$
(10)

where M is the number of states in \(\Omega _C\) and C is a parameter that characterizes the states in the set. Alternatively, we calculate the median \({\tilde{S}}_C\) of \({\bar{S}}(\rho )\) in the set \(\Omega _C\) and the interquartile range. This is done to determine whether the distribution of \({\bar{S}}(\rho )\) in \(\Omega _C\) is symmetric and whether outliers are present.
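These summary statistics are simple to compute once the values \({\bar{S}}(\rho )\) are stored per state and per iteration. The sketch below assumes NumPy and a hypothetical array layout with one row per state in \(\Omega _C\) and one column per iteration; the function name `summarize` and the toy data are illustrative.

```python
import numpy as np

def summarize(values):
    """Mean, median, and quartiles over states (axis 0) for every iteration.

    values: array of shape (M, iterations) holding S-bar(rho) for each state.
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=0)                      # Eq. (10)
    q1, median, q3 = np.percentile(values, [25, 50, 75], axis=0)
    return mean, median, q1, q3

# Toy data: 5 states, 2 iterations.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0], [5.0, 10.0]]
mean, median, q1, q3 = summarize(toy)
print(mean, median, q3 - q1)   # the last entry is the interquartile range
```

A mean and a median that nearly coincide, together with a narrow interquartile range, indicate a symmetric distribution without strong outliers.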

Fig. 1

CHSH function \(S(|\psi _\lambda \rangle )\) as a function of the Schmidt coefficient \(\lambda \) for two-qubit states with fixed local Schmidt bases. Continuous green line represents the theoretical prediction given by Eq. (12). Solid red circles (blue x’s) represent the mean \({\bar{S}}(|\psi _\lambda \rangle )\) (median \({\tilde{S}}(|\psi _\lambda \rangle )\)) of \(S(|\psi _\lambda \rangle )\) obtained via CSPSA considering \(10^4\) initial guesses for each state \(|\psi _\lambda \rangle \) after 200 iterations and an ensemble size \(N=10^2\)

Fig. 2

CHSH function \(S(|\psi _\lambda \rangle )\) as a function of the Schmidt coefficient \(\lambda \) for two-qubit states with fixed local Schmidt bases. Continuous green line represents the theoretical prediction given by Eq. (12). Solid red circles (blue x’s) represent the mean \({\bar{S}}(|\psi _\lambda \rangle )\) (median \({\tilde{S}}(|\psi _\lambda \rangle )\)) of \(S(|\psi _\lambda \rangle )\) obtained via CSPSA considering \(10^4\) initial guesses for each state \(|\psi _\lambda \rangle \), 200 iterations, and an ensemble size \(N=10^4\)

3.1 Unknown pure states

We start our analysis of the proposed algorithm by considering the violation of the CHSH inequality for the set \(\Omega _{\lambda }\) of two-qubit pure states defined by the Schmidt decomposition

$$\begin{aligned} |\psi (\lambda )\rangle =\sqrt{\lambda }|0\rangle _1|0\rangle _2+\sqrt{1-\lambda }|1\rangle _1|1\rangle _2, \end{aligned}$$
(11)

where \(\lambda \in [0,1/2]\) is the Schmidt coefficient and \(\{|0\rangle _1,|1\rangle _1\}\) and \(\{|0\rangle _2,|1\rangle _2\}\) are fixed local Schmidt bases of each qubit. States in \(\Omega _\lambda \) lead to a value of the function S given by

$$\begin{aligned} S(\lambda )=2\sqrt{1+4\lambda (1-\lambda )}. \end{aligned}$$
(12)

In Fig. 1, we show \({\bar{S}}(\rho _{\lambda })\) for \(\rho _{\lambda }=|\psi _\lambda \rangle \langle \psi _\lambda |\) as a function of \(\lambda \) for \(N=10^2\) after 200 iterations and \(K=10^4\). Initial guesses for the set of four local observables are randomly chosen. In particular, information about the fixed bases in \(|\psi _\lambda \rangle \) has not been used to improve the performance of CSPSA. As is apparent from Fig. 1, CSPSA provides mean and median of \(S(|\psi _\lambda \rangle )\) that closely resemble the theoretical prediction of Eq. (12) for any value of \(\lambda \). A much better agreement can be obtained by increasing the ensemble from \(N=10^2\) to \(N=10^4\), which is illustrated in Fig. 2.
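For orientation, the theoretical curve of Eq. (12) against which the simulated values are compared can be tabulated as follows. This is a small standard-library sketch; the function name `chsh_max_pure` is an illustrative choice.

```python
import math

def chsh_max_pure(lam):
    """Eq. (12): CHSH value for a pure state with Schmidt coefficient lam."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("the Schmidt coefficient must lie in [0, 1]")
    return 2.0 * math.sqrt(1.0 + 4.0 * lam * (1.0 - lam))

for lam in (0.0, 0.1, 0.25, 0.4, 0.5):
    print(f"lambda = {lam:.2f}   S = {chsh_max_pure(lam):.4f}")
```

The curve starts at the classical bound 2 for the product state (\(\lambda =0\)) and increases monotonically to \(2\sqrt{2}\) for the maximally entangled state (\(\lambda =1/2\)).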

Table 1 Comparison between the mean value of the CHSH function \(S_{mean}\) provided by the training of the measurement for pure states and the maximal theoretical value \(S_{th}\) for a given value of the Schmidt coefficient \(\lambda \) with the corresponding relative error for ensemble size \(N=10^2\) and \(N = 10^4\)

Table 1 provides a summary of simulations conducted for states expressed in Schmidt decomposition with parameter \(\lambda \), for ensemble sizes of \(N = 10^2\) and \(N = 10^4\), with 75 total iterations for each ensemble size. The calculation of the relative error indicates that when a larger ensemble size is used, the method’s accuracy improves by up to 2.78 times (\(\lambda = 0.40\)). However, as the \(\lambda \) parameter decreases, the method’s accuracy decreases. This is because the method requires more iterations to achieve an accuracy close to \(0.1\%\).

Fig. 3

Mean \({\bar{S}}_C\) of \({\bar{S}}(\rho )\) in \(\Omega _C\) as a function of the number of iterations for several values of the concurrence C in the interval [0.1, 1.0], from bottom to top. The mean \(\bar{S}(\rho )\) is calculated with \(10^4\) independent trajectories, and each local measurement is simulated with an ensemble size \(N=10^2\). Upper and lower straight lines represent the values 2\(\sqrt{2}\) and 2, correspondingly

Fig. 4

Median of \({\bar{S}}(\rho )\) in \(\Omega _C\) as a function of the number of iterations for several values of the concurrence C in the interval [0.1, 1.0], from bottom to top. The mean \({\bar{S}}(\rho )\) is calculated with \(10^4\) independent trajectories and each local measurement is simulated with an ensemble size \(N=10^2\). Upper and lower straight lines represent the values \(2\sqrt{2}\) and 2, correspondingly

Next we analyze the case of pure states with a known value of the concurrence C, which is given by the expression

$$\begin{aligned} C(\lambda )=2\sqrt{\lambda }\sqrt{1-\lambda }. \end{aligned}$$
(13)

The local Schmidt bases of the state are unknown. In the simulations, we choose a fixed value C of the concurrence, which in turn fixes the value of the Schmidt coefficient. The local Schmidt bases are randomly chosen. As in the previous simulations, the knowledge about the value of the concurrence is not employed to improve the performance of CSPSA. Figure 3 shows the behavior of \({\bar{S}}_C\), which is the mean of \({\bar{S}}(\rho )\) calculated on a set \(\Omega _C\) of pure states with a fixed value C of the concurrence, as a function of the number k of iterations for several values of C. Each set \(\Omega _C\) contains 100 states chosen according to a Haar uniform distribution and \({\bar{S}}(\rho )\) is calculated with \(10^4\) trajectories. Each one of the four local measurements is simulated considering an ensemble size of \(N=10^2\). According to Fig. 3, the quantity \({\bar{S}}_C\) exhibits a fast increase in the value of the CHSH function within the first tens of iterations followed by a linear behavior, which asymptotically approaches the maximal value of the function S for the value C of the concurrence. The overall behavior of \({\bar{S}}_C\) does not depend on the value of C.
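For reference, the maximal value associated with each value of the concurrence follows from combining Eqs. (12) and (13); this short derivation is added as a reading aid and uses only those two expressions. Since \(C^2=4\lambda (1-\lambda )\),

$$\begin{aligned} S(C)=2\sqrt{1+4\lambda (1-\lambda )}=2\sqrt{1+C^{2}}, \end{aligned}$$

so that \(S(0.1)=2\sqrt{1.01}\approx 2.010\) and \(S(1)=2\sqrt{2}\approx 2.828\). Every pure state with \(C>0\) therefore admits a violation, but for weakly entangled states the margin above the bound of 2 is very small.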

Figure 4 displays the median \({\tilde{S}}_C\) of \({\bar{S}}(\rho )\) in \(\Omega _C\) as a function of the number of iterations for several values of the concurrence C. Shaded areas represent the interquartile range. Monte Carlo experiments are carried out as in Fig. 3. As is apparent from this figure, the median exhibits the same overall behavior as the mean \({\bar{S}}_C\). After a few tens of iterations, mean and median reach values that are nearly indistinguishable and contained within the interquartile range. This indicates that the stochasticity of CSPSA does not lead to outliers in the histogram of \({\bar{S}}(\rho )\) for all simulated sets \(\Omega _C\). The interquartile range, which is a quartile-based measure of variability, decreases rapidly with the number of iterations and becomes a very narrow fringe. This is an indication that the histogram of \({\bar{S}}(\rho )\) for a particular \(\Omega _C\) after a few tens of iterations is highly concentrated around the mean.

Table 2 Comparison between the mean value of the CHSH function \(S_{mean}\) provided by the training of the measurement for Werner states and the maximal theoretical value \(S_{th}\) for a given value of the Schmidt coefficient \(\lambda \) with the corresponding relative error for ensemble size \(N=10^2\) and \(N = 10^4\)

Table 2 provides a summary of simulations conducted for Werner states with parameter \(\lambda \), for ensemble sizes of \(N = 10^2\) and \(N = 10^4\), with 75 total iterations for each ensemble size. This table shows that for small values of \(\lambda \), the relative error is very high. This is because for weakly entangled states, the method requires a large ensemble size and a high number of iterations to achieve accurate results. However, the same table shows that for \(\lambda \) values close to 0.4, the error with an ensemble size of \(10^4\) particles is less than \(1\%\). This demonstrates that with equal resources, very high precision can be achieved.

Thus, Figs. 3 and 4, and Table 2 clearly indicate that CSPSA can be employed to iteratively increase the value of the CHSH function for unknown pure states and detect entanglement. The greater the entanglement of the unknown state, the fewer iterations will be required to obtain a violation of the CHSH inequality. Furthermore, approximately 70 iterations are necessary to reach a value of the CHSH function close to the maximal violation allowed by quantum mechanics.

Fig. 5

Mean \({\bar{S}}_C\) of \({\bar{S}}(\rho )\) in \(\Omega _C\) as a function of the number of iterations for several values of the concurrence C in the interval [0.1, 1.0], from bottom to top. The mean \(\bar{S}(\rho )\) is calculated with \(10^4\) independent trajectories and each local measurement is simulated with an ensemble size \(N=10^4\). Upper and lower straight lines represent the values \(2\sqrt{2}\) and 2, correspondingly

Fig. 6

Median of \({\bar{S}}(\rho )\) in \(\Omega _C\) as a function of the number of iterations for several values of the concurrence C in the interval [0.1, 1.0], from bottom to top. The mean \({\bar{S}}(\rho )\) is calculated with \(10^4\) independent trajectories, and each local measurement is simulated with an ensemble size \(N=10^4\). Upper and lower straight lines represent the values \(2\sqrt{2}\) and 2, correspondingly

Figures 5 and 6 depict the mean \({\bar{S}}_C\) and the median \({\tilde{S}}_C\) of \({\bar{S}}(\rho )\) in \(\Omega _C\), respectively. In this case, local measurements are simulated with an ensemble size of \(N=10^4\), that is, the square of the ensemble size used in the previous simulations. As is apparent from Figs. 5 and 6, the overall behavior remains unchanged with respect to Figs. 3 and 4. In particular, the two ensemble sizes, \(N=10^2\) and \(N=10^4\), show small differences in the asymptotic linear regime. For instance, for weakly entangled states, that is, \(C=0.1\), the value reached by CSPSA after the total number of iterations is close to but below 2 in Fig. 4, whereas it is slightly above 2 in Fig. 6. Similar differences can be observed for other values of C. Furthermore, a small reduction in the number of iterations required to violate the CHSH inequality can be observed. This reduction depends on the initial amount of entanglement of the unknown state. Also, the increase in N leads to narrower interquartile ranges.

This is more clearly illustrated in Fig. 7, which shows the median \({\tilde{S}}_C\) of \({\bar{S}}(\rho )\) in \(\Omega _C\) for \(C=0.5\) and \(C=0.9\) for three values of the ensemble size, \(N=10^2, 10^3, 10^4\). The interquartile range is also depicted. As is apparent from Fig. 7, CSPSA provides very similar values of \({\tilde{S}}_C\) almost independently of the size of the ensemble employed. However, in the regime of a few tens of iterations, \(N=10^2\) leads to lower values of \({\tilde{S}}_C\), while \(N=10^3\) and \(10^4\) lead to very similar values of \({\tilde{S}}_C\), which are higher than in the case \(N=10^2\). Consequently, higher values of N decrease the number of iterations required to observe a violation of the CHSH inequality, but this improvement saturates for a sufficiently large ensemble size.

Fig. 7

Median \({\tilde{S}}_C\) of \({\bar{S}}(\rho )\) in \(\Omega _C\) as a function of the number of iterations for \(C=0.5\) and \(C=0.9\). Each local measurement is simulated with an ensemble size \(N=10^4, 10^3, 10^2\). The median \({\tilde{S}}(\rho )\) for each value of C is calculated with \(10^4\) independent trajectories. Upper and lower straight lines represent the values \(2\sqrt{2}\) and 2, correspondingly

This latter effect is analyzed with the help of Fig. 8, which displays the number of iterations \(k_{S>2}\) required to obtain a violation of the inequality with \(75\%\) of the states generated for a given value of C and with \(N=10^2, 10^3, 10^4\). Here, we observe that \(N=10^4\) and \(N=10^3\) lead to very similar behavior, while \(N=10^2\) requires the largest number of iterations to reach a violation of the CHSH inequality. Also, the lower the concurrence value, the greater the number of iterations required for the violation. In fact, Fig. 8 suggests that \(k_{S>2}\) decreases exponentially with C. This figure also illustrates the interplay between \(k_{S>2}\) and the total ensemble size \(N_{S>2}\) required for violating the CHSH inequality. For example, in the case of \(C=0.1\) and \(N=10^2\), we have approximately \(k_{S>2}=100\), which leads to \(N_{S>2}=8\times 10^4\). For \(N=10^4\), we have approximately \(k_{S>2}=35\), and thus, \(N_{S>2}=280\times 10^4\). Clearly, the reduction in the value of \(k_{S>2}\) comes at the expense of using a much larger total ensemble \(N_{S>2}\). For states with a high value of concurrence C, the reduction in the value of \(k_{S>2}\) obtained by increasing the value of N is marginal.
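The totals quoted above are consistent with each iteration consuming eight ensembles of size N. This factor is not stated in this section. It is inferred here from the quoted figures, and it matches two evaluations of the CHSH function per iteration, each built from four correlators. A minimal sketch of the bookkeeping, with a hypothetical function name:

```python
def total_ensemble_size(iterations, ensemble_size, settings_per_iteration=8):
    """Total number of copies consumed after `iterations` CSPSA steps.

    settings_per_iteration=8 is an assumption inferred from the numbers in
    the text (2 evaluations of S per iteration x 4 correlators each).
    """
    return settings_per_iteration * iterations * ensemble_size


# Values of k_{S>2} for C = 0.1 as quoted in the text (read off Fig. 8).
low_n = total_ensemble_size(iterations=100, ensemble_size=10**2)
high_n = total_ensemble_size(iterations=35, ensemble_size=10**4)

print(low_n, high_n, high_n / low_n)
```

Under this assumption, cutting the number of iterations from about 100 to about 35 multiplies the total number of copies by 35.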

Fig. 8

Number of iterations \(k_{S>2}\) such that the interquartile range is above \(S=2\) as a function of the concurrence C for \(N=10^2,10^3\), and \(10^4\), from top to bottom

Fig. 9

Mean \({\bar{S}}_{{\mathcal {H}}}\) and median \({\tilde{S}}_{{\mathcal {H}}}\) of \(\bar{S}(|\psi \rangle \langle \psi |)\) with \(|\psi \rangle \in {\mathcal {H}}\) and interquartile range for \(N=10^2\)

Fig. 10

Mean \({\bar{S}}_{{\mathcal {H}}}\) and median \({\tilde{S}}_{{\mathcal {H}}}\) of \(\bar{S}(|\psi \rangle \langle \psi |)\) with \(|\psi \rangle \in {\mathcal {H}}\) and interquartile range for \(N=10^4\)

So far, our study of the violation of the CHSH inequality through CSPSA has assumed that the initial amount of entanglement is known. This was done to show that CSPSA drives the value of the CHSH function S close to the maximum value regardless of the amount of entanglement. We now lift this assumption and consider unknown pure states. To do this, we generate a set \(\Omega _{{\mathcal {H}}}\) with 100 pure states in the Hilbert space \({{\mathcal {H}}}={{\mathcal {H}}}_1\otimes {{\mathcal {H}}}_2\) of two qubits according to a Haar uniform distribution and calculate the mean \({\bar{S}}_{{\mathcal {H}}}\) and the median \({\tilde{S}}_{{\mathcal {H}}}\) of \({\bar{S}}(|\psi \rangle \langle \psi |)\) in \(\Omega _{{\mathcal {H}}}\), together with the corresponding interquartile range. These quantities are depicted in Fig. 9 as a function of the number of iterations. The behavior exhibited by the mean and median is very similar and is characterized by a fast increase within the first tens of iterations followed by an asymptotic linear regime. Figure 9 also shows the mean and median of the maximal theoretical values of S for each state in \(\Omega _{{\mathcal {H}}}\), which are indicated as two superposed straight lines. As can be seen from Fig. 9, CSPSA produces a mean and a median that are very close to the theoretical values. Also, the expected number of iterations \(k_{S>2}\) such that 75% of the simulated states violate the CHSH inequality is about 25. Figure 10 shows the same information as Fig. 9 but with \(N=10^4\). In this case, we see that the quadratic increase in the ensemble size allows CSPSA to reach mean and median values that are even closer to the theoretical values. Furthermore, there is a small reduction, from 25 to 20, in the number of iterations required to obtain a value of S greater than two.
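The sampling step and the theoretical reference lines can be illustrated with a short sketch. It assumes NumPy, draws Haar-distributed states by normalizing complex Gaussian vectors, and uses the standard closed form \(S_{max}=2\sqrt{1+C^2}\) for pure two-qubit states, which this section does not restate. The function names and the seed are illustrative choices, not taken from the simulations reported here.

```python
import numpy as np


def haar_random_two_qubit_state(rng):
    """Haar-distributed pure state: a normalized complex Gaussian vector in C^4."""
    v = rng.normal(size=4) + 1j * rng.normal(size=4)
    return v / np.linalg.norm(v)


def concurrence(psi):
    """Concurrence of a pure two-qubit state with amplitudes (a00, a01, a10, a11)."""
    a00, a01, a10, a11 = psi
    return 2.0 * abs(a00 * a11 - a01 * a10)


def max_chsh_pure(psi):
    """Maximal CHSH value of a pure two-qubit state, 2*sqrt(1 + C^2) (assumed formula)."""
    return 2.0 * np.sqrt(1.0 + concurrence(psi) ** 2)


rng = np.random.default_rng(seed=0)  # seed fixed only for reproducibility
states = [haar_random_two_qubit_state(rng) for _ in range(100)]
s_max = np.array([max_chsh_pure(psi) for psi in states])

# Mean and median of the theoretical maxima over the sampled set.
print(s_max.mean(), np.median(s_max))
```

Every sampled value lies between 2 (product states) and \(2\sqrt{2}\) (maximally entangled states), so the mean and median of the theoretical maxima fall strictly inside that interval.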

Our previous simulations seem to indicate that the optimization of the CHSH function for an unknown state through the CSPSA method provides maximum values of the CHSH functional close to the theoretical maximum values. In order to analyze this, we employ the mean square error. For a given state \(\rho =|\psi \rangle \langle \psi |\) and a single realization of CSPSA, we calculate the square error \(SE(\rho )\) as

$$\begin{aligned} SE(\rho )=|S(\rho ,{{\varvec{z}}}_0,\{ {{\varvec{\Delta }}}_1,\dots ,{{\varvec{\Delta }}}_k\})-S_{max}(\rho )|^2. \end{aligned}$$
(14)

The mean square error \(MSE(\rho )\) for a fixed unknown state \(\rho \) with respect to a large set of realizations is given by

$$\begin{aligned} MSE(\rho )=\frac{1}{K}\sum _{{{\varvec{z}}}_0,\{ {{\varvec{\Delta }}}_1,\dots ,{{\varvec{\Delta }}}_k\}}SE(\rho ), \end{aligned}$$
(15)

which corresponds to an estimation accuracy metric. This is then used to calculate the average of the mean square error \(\overline{MSE}\) on the total Hilbert space \({\mathcal {H}}\) as

$$\begin{aligned} \overline{MSE}=\frac{1}{M}\sum _{\rho \in \Omega _{{\mathcal {H}}}} MSE(\rho ). \end{aligned}$$
(16)
Fig. 11

Mean square error \(\overline{MSE}\) as a function of the number k of iterations for \(N=10^2, 10^3\), and \(10^4\), from top to bottom. Shaded areas represent interquartile range

Figure 11 shows the mean \(\overline{MSE}\) of the square error on the Hilbert space as a function of the number of iterations for \(N=10^2, 10^3, 10^4\). For each value of ensemble size, \(\overline{MSE}\) displays a fast decrease followed by an approximately linear asymptotic behavior. The ensemble sizes \(N=10^3\) and \(N=10^4\) produce very similar values of the mean square error, while an ensemble size of \(N=10^2\) produces a value that is almost half an order of magnitude higher. After 25 iterations, the difference between the maximal theoretical value and the value achieved by CSPSA is between \(10^{-1}\) and \(10^{-2}\). After 50 more iterations, this difference is approximately between \(10^{-2}\) and \(10^{-3}\). Let us recall that after 75 iterations the lower bound of the interquartile range of \({\bar{S}}(|\psi \rangle \langle \psi |)\) has an approximate value of 2.12, so that for 75% of the states in the bipartite Hilbert space we can ascertain their entangled nature and assign an accurate value of the CHSH function. A further improvement in the accuracy achieved by CSPSA can be obtained at the expense of a large increase in the number of iterations: after adding 150 iterations we obtain a further decrease by one order of magnitude, that is, the mean \(\overline{MSE}\) of the square error on the Hilbert space lies approximately between \(10^{-3}\) and \(10^{-4}\).
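The three averages of Eqs. (14) to (16) can be sketched as follows. The function names and the numerical values are made up for illustration; only the structure of the averages follows the equations.

```python
import statistics


def squared_error(s_achieved, s_max):
    """Eq. (14): squared deviation of one CSPSA realization from the maximum."""
    return abs(s_achieved - s_max) ** 2


def mse(s_realizations, s_max):
    """Eq. (15): average of the squared error over K realizations for one state."""
    return statistics.fmean(squared_error(s, s_max) for s in s_realizations)


def average_mse(realizations_per_state, s_max_per_state):
    """Eq. (16): average of MSE(rho) over the M states of the sample."""
    return statistics.fmean(
        mse(runs, s_max)
        for runs, s_max in zip(realizations_per_state, s_max_per_state)
    )


# Made-up final values of S for M = 2 states with K = 3 realizations each.
runs = [[2.70, 2.75, 2.80], [2.30, 2.35, 2.40]]
s_max = [2.80, 2.40]

print(average_mse(runs, s_max))
```

Each realization corresponds to one choice of the initial guess \({{\varvec{z}}}_0\) and of the perturbation sequence \(\{ {{\varvec{\Delta }}}_1,\dots ,{{\varvec{\Delta }}}_k\}\), so K is the number of independent CSPSA runs per state.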

3.2 Unknown mixed states

In the previous section, we have studied the violation of the CHSH inequality for unknown pure states by means of a CSPSA-driven sequence of local measurements. Here, we study the case of mixed bipartite states.

We start by reproducing the value of the CHSH function on the set of the Werner states, which are given by the expression

$$\begin{aligned} \rho _\lambda =\lambda |\psi _s\rangle \langle \psi _s|+\frac{1-\lambda }{d} I , \end{aligned}$$
(17)

where \(|\psi _s\rangle \) is the maximally entangled singlet state defined as

$$\begin{aligned} |\psi _s\rangle =\frac{1}{\sqrt{2}}(|0\rangle |1\rangle -|1\rangle |0\rangle ) \end{aligned}$$
(18)

and \(I \) is the identity operator on the two-qubit Hilbert space of dimension \(d=4\). This mixture of the singlet state with white noise is separable if and only if \(\lambda \le 1/3\) and violates the CHSH inequality if and only if \(\lambda >1/\sqrt{2}\). The maximal value of the CHSH function for a Werner state \(\rho _\lambda \) is given by

$$\begin{aligned} S(\rho _\lambda )=2\sqrt{2}\lambda . \end{aligned}$$
(19)
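Equation (19) can be checked numerically with a short sketch, assuming NumPy. The sign convention \(S=\langle AB\rangle +\langle AB'\rangle +\langle A'B\rangle -\langle A'B'\rangle \) and the particular measurement settings below are assumptions made for this illustration; they are one optimal choice for the singlet, not settings taken from the text.

```python
import numpy as np

# Pauli matrices used to build the local observables.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def werner_state(lam):
    """Eq. (17) with d = 4: singlet state mixed with white noise."""
    psi = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)  # Eq. (18)
    return lam * np.outer(psi, psi.conj()) + (1 - lam) / 4 * np.eye(4)


def chsh_value(rho, A, A2, B, B2):
    """S = <A B> + <A B'> + <A' B> - <A' B'> (assumed sign convention)."""
    op = np.kron(A, B) + np.kron(A, B2) + np.kron(A2, B) - np.kron(A2, B2)
    return np.trace(rho @ op).real


# One optimal choice of settings for the singlet (assumption of this sketch).
A, A2 = Z, X
B, B2 = -(Z + X) / np.sqrt(2), -(Z - X) / np.sqrt(2)

for lam in (1 / 3, 1 / np.sqrt(2), 0.9, 1.0):
    print(lam, chsh_value(werner_state(lam), A, A2, B, B2), 2 * np.sqrt(2) * lam)
```

The white-noise term contributes nothing because every tensor product of Pauli operators is traceless, which is why S scales linearly with \(\lambda \).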
Fig. 12

Mean \({\bar{S}}(\rho _\lambda )\) (solid red dots) and median \(\tilde{S}(\rho _\lambda )\) (blue x’s) as a function of \(\lambda \) for Werner states. Continuous black line depicts the maximal value of the CHSH function of Eq. (19). Local measurements are simulated with an ensemble size \(N=10^2\) and 75 iterations are realized

Fig. 13

Mean \({\bar{S}}(\rho _\lambda )\) (solid red dots) and median \(\tilde{S}(\rho _\lambda )\) (blue x’s) as a function of \(\lambda \) for Werner states. Continuous black line depicts the maximal value of the CHSH function of Eq. (19). Local measurements are simulated with an ensemble size \(N=10^4\) and 75 iterations are realized

Figure 12 displays the mean \({\bar{S}}(\rho _\lambda )\) and median \({\tilde{S}}(\rho _\lambda )\) as a function of \(\lambda \) obtained via CSPSA for an ensemble size \(N=10^2\) after 75 iterations. With the exception of the first 5 points, Fig. 12 shows a very good agreement between the maximal value of the CHSH function of Eq. (19) and the value achieved with the help of CSPSA. Furthermore, the mean and median are also very close to each other, and the interquartile range (not depicted) is very narrow. Thus, within the family of Werner states CSPSA drives the sequence of local measurement bases very close to the optimal set. An increase in the ensemble size leads to even better results. This is illustrated in Fig. 13, where local measurements are simulated with an ensemble size \(N=10^4\). In this case, all points are closer to the maximal value of the CHSH function.

Next, we proceed with the case of unknown mixed states. We randomly generated a set of \(10^6\) two-qubit mixed states. To determine whether a mixed state violates the CHSH inequality, we employ the M quantity criterion [52]. A mixed state \(\rho \) acting on a Hilbert space \(\mathcal {H}=\mathcal {H}_2\otimes \mathcal {H}_2\) can be represented in the form

$$\begin{aligned} \rho= & {} \frac{1}{4}\left( I\otimes I +\sum _{i=1}^3r_i\sigma _i\otimes I + I\otimes \sum _{i=1}^3s_i\sigma _i\right. \nonumber \\{} & {} + \left. \sum _{n,m=1}^3 t_{nm}\sigma _n\otimes \sigma _m \right) , \end{aligned}$$
(20)

where I represents the 2-dimensional identity operator, \(\{\sigma _n\}_{n=1}^3\) are the standard Pauli matrices, and the real coefficients \(r_i\), \(s_i\), and \(t_{nm}\) define the mixed state. The quantity M is defined by \(M(\rho )=u+\tilde{u}\), where u and \(\tilde{u}\) denote the two largest eigenvalues of the matrix \(U_{\rho }:=T_{\rho }^T T_{\rho }\), and the coefficients of the matrix \(T_{\rho }\) are given by \(t_{nm}=\text{ Tr }(\rho \sigma _n\otimes \sigma _m)\). A state \(\rho \) violates the CHSH inequality if and only if \(M(\rho )>1\) [52]. Employing this criterion, the initial set of \(10^6\) mixed states was reduced to a set \(\Omega \) containing \(8\times 10^3\) mixed states that violate the CHSH inequality, that is, with \(M(\rho )>1\).
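The criterion can be sketched in a few lines, assuming NumPy. The function names are illustrative. The relation \(S_{max}=2\sqrt{M(\rho )}\) used in `max_chsh` is the standard companion of this criterion and is assumed here, since this section only states the threshold \(M(\rho )>1\).

```python
import numpy as np

PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),     # sigma_1
    np.array([[0, -1j], [1j, 0]], dtype=complex),  # sigma_2
    np.array([[1, 0], [0, -1]], dtype=complex),    # sigma_3
]


def correlation_matrix(rho):
    """Matrix T with entries t_nm = Tr(rho sigma_n (x) sigma_m)."""
    return np.array(
        [[np.trace(rho @ np.kron(sn, sm)).real for sm in PAULIS] for sn in PAULIS]
    )


def m_quantity(rho):
    """M(rho): sum of the two largest eigenvalues of U = T^T T."""
    T = correlation_matrix(rho)
    eigenvalues = np.linalg.eigvalsh(T.T @ T)  # returned in ascending order
    return eigenvalues[-1] + eigenvalues[-2]


def violates_chsh(rho):
    """True if the state can violate the CHSH inequality, i.e. M(rho) > 1."""
    return m_quantity(rho) > 1.0


def max_chsh(rho):
    """Maximal CHSH value 2*sqrt(M(rho)) (assumed relation, see lead-in)."""
    return 2.0 * np.sqrt(m_quantity(rho))


def werner_state(lam):
    """Werner state of Eq. (17) with d = 4, used here only as a test case."""
    psi = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
    return lam * np.outer(psi, psi.conj()) + (1 - lam) / 4 * np.eye(4)


for lam in (0.6, 0.8):
    rho = werner_state(lam)
    print(lam, m_quantity(rho), violates_chsh(rho), max_chsh(rho))
```

For a Werner state the correlation matrix is \(-\lambda \) times the identity, so \(M(\rho _\lambda )=2\lambda ^2\) and the threshold \(M>1\) reproduces the condition \(\lambda >1/\sqrt{2}\).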

To assess the values of the CHSH function obtained through CSPSA, we compare them with those obtained through SDP. In the SDP case, we need to fix the state that is used in the maximization. However, let us recall that even when the state is fixed, the maximization of S remains a nonlinear problem. Therefore, to find the maximum value of S for each state in \(\Omega \), we use the see-saw method [53, 54] to iterate an SDP test [55, 56] in which the observables of one party, A or B, remain fixed while those of the other party are optimized. The SDP that we solve is the following

$$\begin{aligned} \text{ given }~{} & {} \rho _\Omega , A(z_a), A(z_a'), \end{aligned}$$
(21)
$$\begin{aligned} \underset{B(z_b), B(z_b')}{\text{ max }}{} & {} S(\rho _\Omega ,A(z_a), A(z_a'),B(z_b), B(z_b')), \end{aligned}$$
(22)

with the conditions

$$\begin{aligned} |\Psi (z_b)\rangle \langle \Psi (z_b)|,|\Psi ^{\perp }(z_b)\rangle \langle \Psi ^{\perp }(z_b)|\ge 0 \quad \forall \ z_b,z_b', \end{aligned}$$
(23)
$$\begin{aligned} |\Psi (z_b)\rangle \langle \Psi (z_b)| + |\Psi ^{\perp }(z_b)\rangle \langle \Psi ^{\perp }(z_b)| = I \quad \forall \ z_b,z_b'. \end{aligned}$$
(24)

Notice that this SDP takes Alice’s observables \(A(z_a)\) and \(A(z_a')\) as inputs and, for a given mixed state from the set \(\Omega \), finds Bob’s observables \(B(z_b)\) and \(B(z_b')\) that maximize S. Then, we take the observables B returned by this SDP as inputs in a new iteration to obtain optimal observables A. This procedure is iterated until some suitable convergence condition is satisfied. We performed this optimization for every mixed bipartite state in the set \(\Omega \), which allows us to find better lower bounds on S, together with the optimal observables A and B.
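The alternating structure of the see-saw can be illustrated with a simplified sketch, assuming NumPy. It does not solve the SDP of Eqs. (21) to (24). Instead, as a simplifying assumption, every observable is taken to be a projective qubit observable \(A={\varvec{a}}\cdot {\varvec{\sigma }}\) with unit Bloch vector \({\varvec{a}}\), so that \(S={\varvec{a}}\cdot T({\varvec{b}}+{\varvec{b}}')+{\varvec{a}}'\cdot T({\varvec{b}}-{\varvec{b}}')\) and each half-step has a closed-form solution. All names, the fixed number of sweeps, and the seed are illustrative choices.

```python
import numpy as np

PAULIS = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]


def correlation_matrix(rho):
    """Matrix T with entries t_nm = Tr(rho sigma_n (x) sigma_m)."""
    return np.array(
        [[np.trace(rho @ np.kron(sn, sm)).real for sm in PAULIS] for sn in PAULIS]
    )


def unit(v, fallback):
    """Normalize v; keep `fallback` if v is numerically zero."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 1e-12 else fallback


def see_saw_chsh(rho, sweeps=50, seed=0):
    """Alternate closed-form updates of Bob's and Alice's Bloch vectors."""
    T = correlation_matrix(rho)
    rng = np.random.default_rng(seed)
    z_axis = np.array([0.0, 0.0, 1.0])
    a = unit(rng.normal(size=3), z_axis)   # random initial settings for Alice
    a2 = unit(rng.normal(size=3), z_axis)
    b, b2 = z_axis, z_axis
    for _ in range(sweeps):
        # Bob's step with Alice fixed: S = (T^T(a+a')).b + (T^T(a-a')).b'
        b = unit(T.T @ (a + a2), b)
        b2 = unit(T.T @ (a - a2), b2)
        # Alice's step with Bob fixed: S = a.T(b+b') + a'.T(b-b')
        a = unit(T @ (b + b2), a)
        a2 = unit(T @ (b - b2), a2)
    return a @ T @ (b + b2) + a2 @ T @ (b - b2)


def werner_state(lam):
    """Werner state of Eq. (17) with d = 4, used here only as a test case."""
    psi = np.array([0, 1, -1, 0], dtype=complex) / np.sqrt(2)
    return lam * np.outer(psi, psi.conj()) + (1 - lam) / 4 * np.eye(4)


print(see_saw_chsh(werner_state(0.8)), 2 * np.sqrt(2) * 0.8)
```

Each half-step cannot decrease S, so the sequence of values is monotone. As with any see-saw, the result is in general a lower bound on the maximum, and a practical implementation would stop on a convergence condition and may restart from several initial settings.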

Fig. 14

Mean \({\bar{S}}_{\Omega }\) (red solid line) and median \(\tilde{S}_{\Omega }\) (blue solid line) obtained via CSPSA on the set \(\Omega \) of randomly generated mixed entangled states as a function of the number k of iterations. Mean \({\bar{S}}_{\Omega }\) (yellow solid line) and median \({\tilde{S}}_{\Omega }\) (green solid line) obtained via SDP on the set \(\Omega \) of randomly generated mixed entangled states as a function of the number k of iterations. Shaded areas correspond to interquartile range. CSPSA simulations consider ensemble size \(N=10^4\)

Figure 14 displays the behavior of the mean \({\bar{S}}_{\Omega }\), median \({\tilde{S}}_{\Omega }\), and interquartile range as functions of the number of iterations. This figure also displays the values of these quantities obtained via SDP. As is apparent from this figure, the values of the mean and median provided via CSPSA are very close and tend to agree with the values delivered by SDP after tens of iterations. Also, the interquartile ranges tend to overlap. However, in the case of mixed states the number of iterations needed to obtain a violation of the CHSH inequality is much greater than in the case of pure states. This is due to the fact that the mixed states in \(\Omega \) typically have small values of the negativity, a well-known measure of entanglement, and thus, as in the case of weakly entangled pure states, need more iterations to reach a violation of the CHSH inequality.

4 Conclusions

We have studied the problem of detecting the entanglement of unknown two-qubit states, mixed or pure, by violating the Clauser–Horne–Shimony–Holt inequality. Our approach to this problem is based on the maximization of the CHSH function by means of a stochastic optimization method, the complex simultaneous perturbation stochastic approximation (CSPSA). This method allows optimizing functions with unknown parameters, which in our case correspond to the unknown quantum state. CSPSA employs an iterative rule that requires at each iteration the value of the target function, that is, the CHSH function, at two different points in the optimization space. This space is formed by vectors over the field of complex numbers that contain the measurement settings of four observables. The values of the CHSH function can be experimentally obtained even if the two-qubit state remains unknown. Thereby, CSPSA generates a sequence of measurement settings that on average lead to increasing values of the CHSH function.

To analyze the characteristics of the proposed method, we carried out several numerical experiments. In particular, due to the stochastic nature of CSPSA, we employ random sampling to obtain estimates of the mean, median, and interquartile range of the quantities of interest. We first note that for a fixed unknown state, CSPSA provides very similar values of the mean and median of the CHSH function and a very narrow interquartile range. This indicates that CSPSA does not generate outliers, that is, for a given unknown state different realizations of our method provide very close results. This feature has been observed for each state in a universe of \(5\times 10^4\) randomly generated pure two-qubit states.

The typical behavior of the mean of the CHSH function, as a function of the number of iterations, corresponds to a rapid increase followed by an approximately linear asymptotic behavior, which approaches the maximal value of the CHSH function. Unknown states characterized by the same concurrence value exhibit a very similar behavior of the CHSH function. However, the rate of convergence toward the maximum depends on the initial value of the concurrence. The higher the concurrence value, the fewer iterations are required to obtain a violation of the CHSH inequality and, consequently, detect entanglement. For example, states with maximum concurrence need 13 iterations, while states with a concurrence of 0.1 need approximately 75 iterations to reach a violation. The number of iterations required to detect entanglement can be decreased by increasing the size of the ensemble of identically prepared copies that is employed to estimate the expectation values entering the CHSH function. In our simulations, however, the effect of increasing the ensemble size is more noticeable in the case of weakly entangled states. We have also studied the mean of the CHSH function on the two-qubit Hilbert space. In this case, for an ensemble size of \(10^2\) the entanglement of the randomly generated states is detected on average by violating the CHSH inequality after 17 iterations, while after 25 iterations 75% of the randomly generated states violate the CHSH inequality. These figures can be reduced by increasing the ensemble size. We have also studied the accuracy provided by our method in the estimation of the maximum value of the CHSH function. As an accuracy metric, we have used the mean squared error, which shows that after 25 iterations the difference between the maximal theoretical value and the value achieved by CSPSA is between \(10^{-1}\) and \(10^{-2}\). After 75 iterations, the accuracy is approximately between \(10^{-2}\) and \(10^{-3}\).

We have also considered the case of mixed states. The proposed method is capable of reproducing the maximal value of the CHSH function for Werner states and for randomly chosen mixed states.

Therefore, the numerical simulations indicate that the maximization of the CHSH function through CSPSA leads to the detection of the entanglement of unknown states, pure or mixed. For randomly generated pure states, 25 iterations suffice to detect the entanglement of 75% of the generated states. Also, it is possible to reach an accurate value of the maximal violation.

There are some variations of the method proposed here that could reduce the number of iterations used to detect entanglement. We implemented CSPSA with the standard choice for the gain coefficients. However, these can be optimized, which is in general a difficult problem. Nevertheless, some simple heuristic prescriptions have been discussed in the study of various proposals of variational quantum eigensolvers [57]. These are based on SPSA, a version of CSPSA that works on the field of the real numbers. It seems possible that the SPSA performance-enhancing prescriptions could also be used to improve the CSPSA convergence rate, which would reduce the number of iterations required to detect entanglement. The use of second-order methods or of the quantum natural gradient could also speed up the protocol [58,59,60,61,62]. These employ additional measurements of the objective function to estimate its Hessian matrix, or of the fidelity to estimate the metric tensor. Thereafter, these matrices are used to precondition the gradient in order to improve the convergence rate, avoiding the need to tune some of the gain coefficients. Another possibility arises when considering the large amount of information generated by our method. At each iteration, 4 local observables are measured, which after several iterations provide a considerable amount of information about the unknown state. Thus, we can obtain an estimate of the unknown state by means of maximum likelihood [40]. This, together with the estimate of the optimal measurement settings provided by CSPSA, can be used as initial guesses in an SDP problem to optimize the CHSH function. The solution of this problem can be used as the initial guess of the optimal measurement settings in the next iteration of CSPSA. This procedure does not increase the number of measurements to be carried out, only the computational cost. Besides, the use of a priori information, which restricts the dimension of the parameter space, can be employed to further increase the CSPSA convergence rate and achieve entanglement detection with a reduced number of iterations.

We would like to remark that our approach to the violation of the CHSH inequality with unknown states can be employed in other interesting problems where some properties of the optimization algorithm are advantageous. The construction of entanglement witnesses is a demanding computational task [33, 34], especially if the state is unknown, but it could be done efficiently with our method. In this case, the optimization is also performed in the space of observables. The search for the optimal measurement settings beyond the bipartite case is also possible, for instance, for the violation of a multiqubit Bell inequality [63,64,65]. This is a challenging subject because the dimension of the total Hilbert space scales exponentially with the number of qubits, so finding the optimal measurement settings by means of quantum tomography and SDP is unfeasible. Our approach could provide an advantage in this problem, since the required resources scale with the number of iterations and not with the number of qubits. In addition, in the multipartite scenario it seems feasible to study the analogues of Bell nonlocality sudden death through unknown states [66, 67].