1 Introduction

The seamless integration of chalcogenide phase change materials (PCM) with silicon photonic components is opening new avenues for light-speed computation and novel datacenter applications with all the advantages of nonvolatile memory on a CMOS technology compatible optical platform. Inventive ways of leveraging the exciting properties of PCM are being explored on the optical platform which is already superior over the electrical domain owing to the multiplexing capability, low residual crosstalk and dispersion, not to mention the ultrawide bandwidth. Due to the dramatic refractive index contrast of Ge2Sb2Te5 (GST-225) over the vis and near-IR spectral range, long retention time (for at least 10 years) (Wong et al. 2010; Wuttig and Yamada 2007), and its nanosecond scale switching rate with high reproducibility and durability obtained over 1012 switching cycles (Burr et al. 2010), this alloy has been used as a suitable candidate for an integrated PCM photonic non-volatile memory. Recently, GST has been integrated directly on silicon and silicon nitride waveguides to implement an all-photonic 5-bit memory cell (Li et al. 2019), a 512-bit photonic random-access memory (Feldmann et al. 2020), a photonic tensor core for convolution processing (Feldmann et al. 2021), an in-memory computing platform (Ríos et al. 2019), mimic synapses and neurons in spiking neural networks (Chakraborty et al. 2018; Cheng et al. 2017; Feldmann et al. 2019), and emulate an abacus calculator (Feldmann et al. 2017).

The fully CMOS-process compatible silicon photonics platform can enable a potential paradigm shift in design of in-memory computing systems due to the inherently parallel nature of light-based calculations. Content addressable memories (CAMs) are a special form of in-memory computing circuit widely used in high-speed searching applications as they can compare an input query against a list of stored data in parallel, returning the address of matching data in the case of a hit. CAMs can also play useful roles in specialized computations, such as hardware compression/ decompression and implementing reverse tagged directories for large-scale multiprocessor coherence (Beamer and Akgul 2009). Though they have also found new utility in the emerging in-memory computing paradigm and deep learning applications (Karam et al. 2015; Ni et al. 2019), commercial success of CAM implementations has mostly been seen in datacenter technologies for network routing and real-time network traffic monitoring (Meiners et al. 2010; Pagiamtzis and Sheikholeslami 2006; Xu et al. 2016). One such technology is the implementation of Address Look-Up (AL) which continues to be traditionally bottlenecked by the speed limitation of SRAM technology operating at a few GHz (Agarwal et al. 2011; Pagiamtzis and Sheikholeslami 2006) as well as by impediments to scaling due to increasing leakage currents. Designs of emerging alternative electronic memory technologies based on phase change material (GST) (Junsangsri et al. 2017), ferroelectric field-effect-transistors (FeFET) (George et al. 2020; Yin et al. 2020; Yin et al. 2019a, b), nanomagnetic logic (Matsunaga et al. 2012), memristors (Grossi et al. 2018; Li et al. 2020; Yang et al. 2019), hold the promise for a better trade-off between low-power consumption and chip area occupancy but without raising any expectations for increased data rates. Crossing into the optical domain in search of higher rates, a photonic integrated CAM capable of 10 Gbps operation was recently introduced (Maniotis and Pleros 2017; Mourgias-Alexandris et al. 2018) whose design produces an optical equivalent of an SRAM-based CAM cell with flip-flops (FF). At each cell, the FF, responsible for storing the bit, requires two coupled semiconductor optical amplifier-Mach Zehnder interferometer (SOA-MZI) switches in a master–slave configuration. In addition, the output is connected to the lower branch of an MZI which constitutes an XOR gate, comprising two SOA. However, this approach requires significant hardware overhead (cell area ~ 6 × 2 mm2) (Pitris et al. 2015), therefore achieving low bit density for memory applications. A more recent proof-of-concept for an optoelectronic CAM (Alkabani et al. 2020) achieved much higher compactness and non-requirement of internal light sources as compared to the SRAM-emulating design, by utilizing foundry available single bus microring modulators (MRM) based on pn junction and carrier depletion as the basic elements in a WDM scheme. They proposed the use of electrical writes to store values in an optical search engine by voltage biasing each MRM j to resonance wavelength \({\lambda }_{0j}\)(store bit 0) or \({\lambda }_{1j}\)(store bit 1) in a row of n MRMs. The micro-ring resonators are coupled in series to a waveguide constituting the CAM’s entry (row), where each entry is terminated by a photodetector with output connected to a priority encoder. However, they found that it is required to compromise the switch modulation range (~ 3 dB) for storing bits of the CAM words to achieve at least 25 GHz modulation speed. Although the bias power for MRM modulation is notionally negligible in high-rate bit-by-bit switching, it adds to significant energy consumption overhead in routing schemes where switching occurs at much lower frequencies than the actual data rate. Furthermore towards denser photonic integration, scaling down the radii of Si resonators from tens of micrometers to wavelength of operation-comparable sizes has been experimentally demonstrated (Liu et al. 2019; Xu et al. 2008), but at the critical expense of ring quality factors (Q-factor) due to higher bending losses. Operating these MRRs as modulators demands great increases in the required switching power or else deteriorates the modulation depth (Bogaerts et al. 2012). Therefore, a compact photonic switch with a (1) self-holding capability, (2) moderate extinction ratio, and (3) operating on the basis of optical attenuation and not in spite of it- would be naturally suited for low-frequency switching applications like content addressing in routing AL; in addition to possessing the vital features of (4) compactness and CMOS-compatibility for easy on-chip integration and scaling alongside fully electronic systems.

Here, we combine the concept of reconfigurable optical phase change memory with silicon microring resonators (MRRs) to propose a photonic CAM architecture benefitting from the above-mentioned features of nonvolatile switches. By setting up the devices to behave as self-holding 1 × 2 switches, we have emulated the digital XOR (mismatch) operation in a novel all-optical manner and combined the cells in a wavelength division multiplexing (WDM) read scheme to achieve parallelism. The system’s read-response is limited only by on-chip modulator and photodetector bandwidths since the time-of-flight of a photon through the low-Q MRRs is negligible (~ few ps), signifying orders of improvement over conventional electronic CAM counterparts. Keeping in mind the low energy and area footprints that will be required to enable denser scaling of photonic integrated circuits (PIC) into the coming decade, we evaluate the performance of the system using ultracompact MRRs (radius ≈ 1.5 \(\mu m\)). We also briefly explore reconfigurability of the designed switches in light of recent works on PCM based photonic systems and adopt a low-footprint electro-optic write mechanism that consumes < 200 pJ per element for updating the stored words in the CAM. Finally, through system level simulations, we evaluate the uniformity and reliability of the read operations with different bit permutations in an 8-bit search in the presence of interchannel crosstalk, channel noise, and source-receiver nonidealities in the telecommunication S and C bands (1525–1575 nm).

2 Optical XOR operation

Our simple architecture for the proposed CAM array is based on an SOI add-drop ring resonator/filter with a thin patch of PCM-GST integrated onto the bent waveguide (Fig. 1a). Similar resonator devices have been very recently fabricated on silicon and silicon nitride platforms for both device (Rudé et al. 2013; Stegmaier et al. 2017; Wu et al. 2019; Zhang et al. 2019, 2018; Zheng et al. 2018) and circuit-level evaluation as neuromorphic/in-memory computing systems (Feldmann et al. 2019) and non-volatile memory circuits (Feldmann et al. 2020). However, here we leverage the differential transmission characteristics of the THROUGH port with respect to the IN and ADD ports of the filter at resonance to emulate an XOR-like operation which facilitates the bit search scheme.

Fig. 1
figure 1

a A perspective view of an add-drop microring resonator with a small patch of GST on top showing its ports, coupling coefficients and materials and b Field profile of fundamental TE mode at 1550 nm for (i) amorphous and (ii) crystalline states of GST in a waveguide bend of 1.55 \(\mu m\) radius. Inset shows cross-section of GST integrated on top of a 0.45 × 0.25 Si strip waveguide on SiO2 substrate

Taking into account the refractive indices of GST, Si, and SiO2, we simulated mode profiles via the Finite-Difference Eigenmode solver in Lumerical MODE (which calculates the spatial profile and frequency dependence of modes by solving Maxwell's equations on a cross-sectional mesh) for the bent waveguide with a 10-nm-thick GST layer topped by a protective indium tin oxide (ITO) layer. The bent waveguide aspect ratio (t/w) of 250/450 nm was chosen to ensure single fundamental TE mode transmission in the 1500–1600 nm (telecom S and C band) range of wavelengths. Figure 1b shows the field profile (\(|E|\)) at the cross-section for this mode at 1550 nm, visually demonstrating the difference in optical absorption (represented by \(Imag({n}_{eff})\)) in the GST’s (i) amorphous and (ii) crystalline state. As the light couples evanescently to the PCM, the interaction with the material on top of the waveguide decreases exponentially with the distance so that the influence of the protective layer on the device performance can be neglected compared to the PCM.

For our design, we set the minimum radius of the MRRs to be R = 1.515 μm to achieve denser integration with the added advantage of miniaturization being that the free spectral range (FSR) of the devices becomes larger, supporting denser WDM channel counts in spite of the increased full width half maximum (FWHM) in absorption-modulated devices. Rings of similar size have been demonstrated previously (Prabhu et al. 2010; Xu et al. 2008) with a more recent experiment (Liu et al. 2019) achieving FSR as large as 93 nm using R = 0.8 μm, with a moderate FWHM of ~ 0.8 nm. The modifications involved in the fabrication of the ultracompact MRRs in the mentioned works included reducing the width of the bus waveguide, which increases the spatial period of the propagating mode due to the lower effective refractive index. This results in a better phase match with the mode in the tightly curved ring waveguide and therefore better coupling. The use of strip waveguides and air cladding instead of ridge waveguides and oxide cladding respectively minimizes the bending loss due to a larger refractive index contrast better confining the mode (Bogaerts et al. 2012).

The microring is designed to be near-critically coupled (parameters in Table 1) when the GST is in the amorphous phase with low absorption such that the resonance of the microring assumes an extinction ratio as high as possible. We used a finite-difference time-domain method based solver (FDTD Solutions from Lumerical Inc.) to solve the Maxwell equations on a discrete spatiotemporal grid and obtain the optical response (Fig. 2) upon switching into the crystalline state: the round-trip loss inside the ring resonator is strongly increased due to the enhanced material absorption in the GST. Because the round-trip losses have increased, the existing coupling gap leads to an undercoupled ring resonator, thus increasing the transmission from IN to THROUGH ports and decreasing the transmission from the ADD to THROUGH ports at resonance.

Table 1 MRR and PCM parameters for FDTD simulation
Fig. 2
figure 2

a In to Through port transmission increases (from 0.027 to 0.602) upon crystallization of GST. b Add to Through port transmission increases (0.038 to 0.578) upon amorphization of GST. Readouts are performed at a fixed wavelength, ideally at the resonance wavelength of the MRR

The following expression for the THROUGH port transmission of an MRR with respect to the IN port at resonance

$${\mathrm{T}}_{\mathrm{through}}=\frac{{\mathrm{I}}_{\text{THROUGH }}}{{\mathrm{I}}_{\text{IN }}}=\frac{{\mathrm{r}}_{2}^{2}{\mathrm{a}}^{2}-2{\mathrm{r}}_{1}{\mathrm{r}}_{2}\mathrm{a}+{\mathrm{r}}_{1}^{2}}{1-2{\mathrm{r}}_{1}{\mathrm{r}}_{2}\mathrm{a}+{\left({\mathrm{r}}_{1}{\mathrm{r}}_{2}\mathrm{a}\right)}^{2}}$$
(1)

exhibits this strong dependence on the single pass (round trip) amplitude transmission \(a\approx exp\left(-\frac{2\pi }{{\lambda }_{res}}{\kappa }_{eff,GST}{L}_{GST}\right)\) which decreases strongly with increasing \({\kappa }_{eff,GST}\), which is the imaginary part of the effective refractive index of the waveguide material with GST. r1, r2 are the self-coupling coefficients, λres is the resonance wavelength and LGST is the length of the GST element. The same can be seen for the THROUGH port transmission with respect to the ADD port transmission which is found by interchanging the terms \({r}_{1}\) and \({r}_{2}\) in the regular drop port expression for a double bus MRR:

$${\mathrm{T}}_{\mathrm{drop}}=\frac{{\mathrm{I}}_{\text{THROUGH }}}{{\mathrm{I}}_{\text{ADD }}}=\frac{{k}_{1}^{2}{k}_{2}^{2}\mathrm{a}}{1-2{\mathrm{r}}_{1}{\mathrm{r}}_{2}\mathrm{a}+{\left({\mathrm{r}}_{1}{\mathrm{r}}_{2}\mathrm{a}\right)}^{2}}$$
(2)

where k1, k2 are the cross-coupling coefficients. The device also experiences a small resonance shift upon switching (Stegmaier et al. 2017) quantified by \(\frac{\Delta {\lambda }_{\mathrm{r}}}{{\lambda }_{\mathrm{r},0}}\approx \frac{\Delta {n}_{\mathrm{eff},\mathrm{GST}}}{{n}_{\mathrm{g},\mathrm{eff}}}\cdot \frac{{L}_{\mathrm{GST}}}{2\pi R}\) which amounts to approximately 0.6 nm for our case. Here, \({\lambda }_{r,0}\) denotes the resonance wavelength before the switching, \(\Delta {n}_{eff,GST}\) the respective change upon switching, \({n}_{g,eff}\) the effective group index of the bare waveguide, and \(R\) the radius of the ring. Since the read is performed at a fixed wavelength, this red shift upon crystallization actually adds to the switching contrast for the rings at the expense of a slightly higher interference with the response of the adjacent larger ring.

Now it is straightforward to demonstrate how the described structure performs the all-optical XOR operation for detecting single bit mismatches between a search query and a stored word/entry in the CAM as shown in Fig. 3. The stored bit B is encoded as the crystal state of the integrated PCM which can be either fully crystalline (B = 0) or amorphous (B = 1), while the search bit A is represented as the presence of the corresponding resonance frequency \({\lambda }_{i}\) at the IN port (A = 1) or ADD port (A = 0). Similar to CAMs in the electronic domain (Pagiamtzis and Sheikholeslami 2006; Yin et al. 2019b), the operation requires complementary read signals which can be generated by simply using the opposite modulation scheme of the one used to generate the uncomplemented input pulses. If the stored bit B is 1 (amorphous GST), a HIGH search bit (A = 1) means that the pulse from the input port is dropped and does not make it through to the output since the MRR is at a critically coupled state. However, a LOW search bit (\(\overline{\mathrm{A} }\)=1) would result in the ADD signal being dropped to the THROUGH port after being attenuated to around half its amplitude (T = 0.578 according to Fig. 2b) and evaluated at the detector as a mismatch with that particular stored bit B. Vice-versa for a stored bit B = 0 (crystalline GST), a HIGH search bit would result in partial transmission from the IN to the THROUGH port (T = 0.602) and evaluated at the detector as a mismatch while a LOW search bit would result in the ADD port signal being sent off-chip via the DROP port, owing to the undercoupled condition of the MRR. We can compensate for the lower transmission from the ADD port compared to that from the IN port by adjusting the corresponding laser power or by simply placing a broadband attenuator at the mouth of the IN bus, in order to ensure uniform response among mismatch permutations. The minimal transmission in the matched bit cases (0.027 and 0.038 from Fig. 2) may be further reduced by precisely engineering the coupling gaps to achieve better critical coupling (Feldmann et al. 2020).

Fig. 3
figure 3

Single bit mismatches are detected at the Through port as an XOR-like operation between a stored nonvolatile bit encoded in the crystal state of a PCM, and a search bit represented by the presence of an optical pulse at the resonant frequency \({\lambda }_{i}\) of an MRR

In contrast to a fully optical read scheme, the writing scheme of our photonic CAM is chosen to work via electrical heaters which instantiate the phase change of the GST to change the stored bit B, the details of which have been presented in a later section.

3 CAM array design

We propose the architecture for the CAM array in which the described unit cells are parallelly cascaded after appropriate optimization in the frequency domain. During the content addressing operation, a fast parallel XOR-based comparison operation is performed across the stored data of the table which for example in the case of AL, is to look for a fast match with the packet destination address of the incoming packet of the router. Here we leverage the WDM technique to compute the degree of mismatch (if any) between the bits of an input query and the stored words (entries) in the CAM array. To represent multiple wavelengths, we use multiple ring resonators of increasing ring radii (1.515 to 1.59 μm) to represent different bits in a row as shown in Fig. 4. A search query and its complement encoded as WDM signals enters the straight waveguide through the IN and ADD ports respectively, and the coupling condition determined by the PCM state on each ring resonator results in a bitwise XOR operation being performed at each read wavelength within the FSR, as described in the previous section. Thus, at the output port we obtain a multiwavelength signal which in the case of a full match will ideally consist of no light, but due to non-zero transmission values (< 5%) in the matched bit states, possesses weak (few μW) components from all the channels. Wavelengths corresponding to mismatched bits will be present at the Through port and the power detected at the receiver (corresponding to each row/word) will result in evaluation of the mismatch as per the decision levels.

Fig. 4
figure 4

Full match case output, with a don’t care bit (x) included in the search query. In the shown case, the first two bits (10) of the 3-bit search word matches with the corresponding bits in the stored word (10) and the 3rd bit is ignored by eliminating light of the corresponding frequency in both the IN and ADD buses. Ideally no light at any wavelength makes it to the THROUGH port/bus (XOR output = 000)

Although we have designed a ‘binary’ CAM which does not offer the facility of storing don’t care (X) states in the table words, the system can be used to eliminate the effect of a particular bit on the word match by sending light with the corresponding frequency in neither of the input buses. In this way a don’t care (X) may be searched at multiple specified positions in the query.

Full FDTD simulations were performed for only the smallest and largest rings in a row of n (= CAM entry length) MRRs varying the state of an embedded GST layer between fully amorphous and crystalline. Owing to the large computational expense of resonant devices in FDTD methods, the results for the port transmissions were used to calibrate an adaptive analytical model which takes as input the material and bending losses, coupling coefficients (r, k) and GST lengths as parameters. The material loss due to PCM absorption, quantified by effective indices, was obtained from mode simulations (shown earlier in Fig. 2b) for our chosen geometry and PCM material parameters (Pernice and Bhaskaran 2012). Frequency sweeps were conducted in Lumerical MODE tracking the fundamental TE mode for each GST state. Bending loss and coupling coefficients vs. frequency were calculated in separate FDTD simulations for the designed bend radii and coupling gap sizes. After obtaining all the required parameters, we calculated the through/drop frequency response (from Eqs. 1 and 2) for a row of PCM-MRRs in parallel using the model Eqs. (1) and (2). In this way the full FDTD results were emulated by breaking up the simulation into segments thus reducing the lengthy computation time required for a row of resonators each with two possible PCM states. Comparing the adapted model with full FDTD results for the smallest and largest MRR (Fig. 5a), we find reasonable match for the positions and heights of the transmission peaks/troughs in light of our application of the photonic devices as two-level switches. We noted that our fitted model slightly overestimates the FWHM and underestimates the drop port transmission peaks for the smaller rings, translating to higher crosstalk and thus a miniscule performance drop in circuit operation compared to what would be obtained if we used the results from full FDTD simulations. Figure 5b illustrates the superimposed normalized transmissions for each resonator in a row of 8 MRRs with increasing radii (1.515–1.59 μm). The PCM patches were designed to be longer for the larger ring (400 nm vs. 270 nm for the smaller ring) in order to compensate for the variation of coupling efficiency with wavelength.

Fig. 5
figure 5

a Calibration of an analytical model by cross-matching the calculated frequency responses with those obtained from full FDTD simulations for (i) 1.515 μm MRR and (ii) 1.59 μm MRR integrated with PCM-GST (amorphous and crystalline states). b Superimposed frequency responses for two degrees of crystallization: GSTa (0%) and GSTc (100%) of 8 PCM integrated MRRs with increasing radii designed to resonate at equally spaced wavelengths (1532 – 1572 nm) within a single FSR (~ 56 nm)

The maximum word size supported by our CAM array design will depend on the number of channels that can be fit into a single FSR while maintaining an acceptable level of crosstalk and is quantified by the finesse of the resonator, defined as the ratio of the FSR and linewidth (aka FWHM or 3 dB bandwidth). The overall effect of interchannel crosstalk in our system is to limit the transmission to the output in the mismatched state hence reducing the detected power contrast between match and (single bit) mismatched cases. However, the crosstalk penalty is dependent primarily on the states of the individual PCMs and can be easily analyzed and accounted for in the design process in light of the proposed application. Since the FWHM is larger for lossy rings, MRRs storing the 0-bit value (corresponding to fully crystalline state of the GST) will have a larger overlap on adjacent MRR’s frequency responses. We simulate the frequency response of a single ring array (16-bit word) in the industry-grade PIC solver ‘Lumerical INTERCONNECT’ to visually illustrate the overall through port transmission when a particular word row stores all 1 s (all amorphous) and all 0 s (all crystalline) in Fig. 6. For this purpose, the device parameters (loss, coupling, effective index) extracted from the calibrated analytical model (including the frequency dependency) have been applied to INTERCONNECT’s built-in double-bus MRR element model. The simulated resonance peaks are spaced 2.779 nm apart within an FSR of ~ 56 nm for the largest ring (having a radius of 1.59 \(\mu m\)). Selecting a GST length of 350 nm, the FWHM of the largest ring in the (more lossy) crystalline state is 3.43 nm- giving a finesse value of ~ 16 for our design. Lengths of PCM patches are designed to be longer for larger rings in order to compensate for the larger coupling coefficients at longer wavelengths and keep the extinction ratios across a row as uniform as possible. The resonances of the largest rings in an array are for this reason wider than those of the smallest rings resulting in a slightly lower extinction ratio between the GSTa and GSTc states. We have in a later section shown the photodetector output for all single bit mismatch permutations in a CAM with 8-bit entries to quantitatively evaluate the effect of this deterministic crosstalk on the uniformity and integrity of the search operation.

Fig. 6
figure 6

Through port frequency response considering inter-ring crosstalk when the value of a 16-bit word is hex 0 (all GSTc) or hex FFFF (all GSTa). Read wavelengths are ideally chosen to match the resonance (minimum transmission) in the GSTa states of the MRRs

4 Optical search engine layout and cascadability

To query the match or presence of a search query in a generic CAM table, it is converted to a binary form and bit-encoded as light pulses using a suitable modulation scheme before being input to the search mechanism or engine. The search engine outputs a match line (ML) per table entry that signifies a match (ML = 0) or mismatch (ML = 1). The match lines are input to a priority encoder that generates the address (or first address in case of multiple matches) at which the match occurred, in which case a Hit output is also asserted. Our designed CAM cells are at the heart of this address-returning system that is capable of simultaneously storing data in a nonvolatile manner and performing the mismatch operation via all-optical XORs. Each n-bit entry in the CAM table is terminated by a photodetector whose photocurrent can be read out to evaluate the match/mismatch of the particular entry with the query. The block diagram of the overall system for returning the address of a matching entry is shown in Fig. 7, including a scheme for cascading word-partitions. By partitioning a longer m*n bit word into m n-bit words and feeding each word-part into an identical CAM array, the search operation can support m-times longer words without requiring increased resolution capability at the detector (otherwise deteriorated by increased crosstalk in the alternative case of using a denser WDM scheme). Electronically joining the match lines corresponding to m different partitions in an m*n bit word comes at the expense of read latency increased by an amount equal to the capacitive delay of an m-input OR gate.

Fig. 7
figure 7

Block diagram of the cascaded system to return the address of the searched word in the case of a match in the CAM table. By cascading smaller arrays with digital OR gates, we can increase the word length supported by the search

5 Reconfiguration of CAM entries

Write operations in CAMs for routing applications are performed during network-topology changes, when the stored routing paths need to be dynamically reconfigured and new data have to be written in the memory. In most CAM applications, including those for memory augmented neural networks, writes are less frequent than searches (Alkabani et al. 2020; Ni et al. 2019) and so the read speed rather than write speed is the main bottleneck for performance. In our non-volatile architecture, writing or reconfiguration translates to amorphization or recrystallization of the thin GST segments integrated on each MRR in a row. Various reconfigurability schemes have been implemented on phase change-based photonic platforms (Kato et al. 2017; Rios et al. 2018; Wu et al. 2019; Xiong et al. 2011; Zhang et al. 2020; Zheng et al. 2020b, 2020a, 2018) to explore the tradeoff among power consumption, speed, insertion loss, footprint and complexity of the routing arrangement. Since our proposed architecture is based on small MRRs to exploit their large free spectral range, we suitably choose an ultracompact integrated heater scheme for electrical writes proposed recently for GST-based switches on an SOI platform (Sb et al. 2018).

Finite-element based simulations were done by Sb et. al. (2018) on instantiation of phase change via voltage pulses applied to a 10 nm copper layer deposited on top of a GST patch with similar dimensions as ours (500 nm) in a 400X250 nm SOI platform (Fig. 8). Pulses with an amplitude of 5 V were applied for 4 ns to the Cu electrode to heat the 30 nm thick cGST layer above its melting temperature and allowed to quench for < 1 ns to achieve full amorphization (SET). For full recrystallization (RESET), a 1 V pulse was applied to hold the GST’s temperature above its glass temperature for 100 ns. Energy consumption for set and reset per bit in our case can be roughly estimated to be half of the reported 194 pJ and 163 pJ values respectively due to shorter (270–400 nm) and thinner (10 nm) GST segments. Our WDM-exploiting layout also supports sub-nanosecond optical writes for the amorphization operation (Feldmann et al. 2020; Li et al. 2019; Stegmaier et al. 2017) by utilizing the input buses to send multiplexed write pulses at the resonant frequencies. However, for this scheme, all the devices in a row have to be reset first before setting the individual bits, and since the ring is undercoupled in the RESET state, write power would be partly wasted. We note that replacing the S-band transparent ITO material with copper would incur some insertion losses leading to decreased contrast between the PCM states, which can be recovered by increasing the length or thickness of the GST patch (Rios et al. 2018), or by sub-wavelength patterning of the GST patches rather than simply deposition along the entire width of the ring waveguide (Wu et al. 2019). Considering the overall device footprint and energy available with current technologies, we propose the Cu-heater approach for facilitating the CAM write operations. Electrical reconfiguration of optical PCM switches is currently an area of concentrated research in the search for enabling large-scale optical FPGAs, and silicon PIN heaters as well as ITO and graphene heaters (Kato et al. 2017; Zheng et al. 2020b, 2020a) are emerging as efficient solutions.

Fig. 8
figure 8

Proposed write scheme for the individual MRRs utilizing an ultracompact electrothermal heater

6 System level evaluation

We evaluate the read performance of our CAM architecture as part of an optoelectronic system consisting of state-of-the-art laser sources, modulators and photodetectors to study the reliability of the word search operations for various permutations of match and mismatch under channel noise, crosstalk and source/receiver nonidealities. For this purpose, we utilize Lumerical INTERCONNECT as our simulation platform into which the parameters of the MRR-based devices and optoelectronic system are imported. The implementation of the CAM table is shown in Fig. 9 and system parameters are listed in Table 2. We illustrate our design of a four-entry table with 8-bit words to avoid lengthy simulation times. The 8 MRR resonances are equally spaced by \(\Delta \lambda\) in the range 1532–1572 nm so that \(8\Delta \lambda\) < FSR (= 56 nm). The extracted port transmissions from FDTD simulations for a single ring are used to calibrate an analytical model (discussed earlier in Sec. 3) to estimate the frequency response for all 8 rings (as shown in Fig. 5).

Fig. 9
figure 9

Partial schematic for the integration of the CAM table implementation in a photonic circuit simulation framework

Table 2 Optical source/receiver and noise parameters

Since our main purpose is to characterize the bit integrity and uniformity between various permutation states for the read operations under noise and crosstalk, choice of modulator and photodetector may be arbitrarily selected. However, in the interest of benchmarking the system under familiar conditions, parameters including the photodetector noise (thermal and shot) characteristics are taken from literature on high-speed (> 50 GHz) integrated Ge p-i-n devices (Chen et al. 2016; DeRose et al. 2011). Static digital queries and their bitwise complements are generated via modulation of CWL lasers by NRZ coded signals. Laser output power is arbitrarily chosen as 1 mW for each probe wavelength (aligned to resonance wavelengths in the device’s amorphous state). Additionally, a noise spectral density (10–18 W/Hz) is added to the laser sources. In a practical system the NEP of the detector and level of two photon absorption in the SOI platform would determine the source powers.

6.1 Photonic match/mismatch evaluation

Read outputs for full match of search query with the four stored words set to simulate full match, single- and double-bit mismatches (Fig. 10) demonstrate the reliability of the proposed photonic match scheme. Owing to non-zero transmission to the Through port in the case of a bit match when individual MRRs are not fully critically coupled, full match signals possess a non-zero amplitude (~ 0.1 mA in the shown case). The two different bit permutations for the single-bit mismatch are split due to 1) the dependence of inter-ring crosstalk on the bit states in a single table entry and 2) small differences in transmission seen by input signals from IN and ADD ports, as qualitatively discussed earlier. We further conduct a full sweep over the 2n by n input space to simulate the n (= 8) possible permutations for a single bit mismatch with each of the 2n (= 256) possible queries to estimate the appropriate decision levels at the receiver (Fig. 11). The yellow surface represents the minimum level of the noisy photodetector output current for a mismatch and the blue surface represents the maximum level of the same for a full match. A minimum extinction ratio of 2.054 dB is exhibited between the two cases. Properly engineering the device parameters to achieve better critical coupling, besides material level optimizations of the phase change element are key to further increasing this contrast (Rios et al. 2018; Wu et al. 2019) towards enabling reliable evaluation of the “degree of mismatch” (important for attentional memories (Ni et al. 2019) in a manner similar to a PAM scheme, rather than just simply detecting the occurrence of a mismatch. In this aspect, future work may also explore utilization of intermediate crystallization states of the GST (Chakraborty et al. 2019; Cheng et al. 2018; Li et al. 2019) to assign different weights to mismatch conditions in different bit positions.

Fig. 10
figure 10

Output of the match operation represented as photodetector current, with a search query of binary 10101110 with stored words 10111110 (1-bit mismatch), 00111110 and 10101010 (2-bit mismatches). Time interval for simulation was chosen arbitrarily to demonstrate the levels of noisy output

Fig. 11
figure 11

Input space sweep for all 8 possible permutations of a query word for a single-bit mismatch with 256 possible stored words. Maximum level of full match case outputs under noise for 256 words in an 8-bit CAM is colored blue. A worst-case extinction ratio of 2.054 dB is exhibited between the two surfaces

6.2 Read latency

The latency for read operations is not limited by the circuit-level implementation of the CAM table itself but rather by the photonic and electro-optic components used to implement the search system. Read latency through the parallel MRR rows is estimated. The time of flight of the photon in an MRR defined as \({T}_{photon}=\frac{Q}{ \pi *{f}_{res}}\) amounts to ~ 2.74 ps using the maximum simulated Q-factor (~ 1700) for the 1.515 μm MRRs in amorphous (low-loss) state. Optical splitter length of approximately 500 μm (Alkabani et al. 2020) contributes another 1.6 ps to the total time of flight. Thus, the upper bound for the parallel XOR operation speed can theoretically exceed 200 GHz though it is currently limited by current photodetector technology (Chen et al. 2016) response times to around 50 GHz (tread = 0.02 ns).

6.3 On-chip area estimate

Compact on-chip footprint is also an advantage in our photonic CAM owing to the simplistic design of the unit cells. Area requirement for storing the table entries scales with the number and bit-length of words as the product of n*M*AreaMRR where n is the word length in bits, M is the number of entries and each MRR occupies an average area of ~ 7.55 μm2, providing a memory density that would be quite difficult to achieve using bulkier components like Mach–Zehnder Modulators or architectures requiring internal light sources and amplifiers. The MRR ‘pitch’ requirement (~ 100 μm in switching applications) due to thermal restrictions (Cheng et al. 2020) is also a constraining factor for the footprint when thermo-optic modulation schemes are used. Since our design is based on self-holding states, the pitch requirement is greatly relaxed.

Table 3 summarizes the characteristics of our CAM cell and compares them with previous works utilizing optoelectronics (Alkabani et al. 2020; Mourgias-Alexandris et al. 2018) and PCM in electronic domain (Junsangsri et al. 2017). As the title of this work suggests, the ultracompact nature of our designed photonic architecture allows us to achieve a bit density of more than 2 orders of magnitude larger than previous photonics-based designs. The nonvolatile capability of the cells to store bits removes the requirement of voltage biasing of each individual cell (which is an advantage especially compared to mW-level biasing power of MZI-based approaches) and provides a platform for in-memory computing rather than storing the CAM entries in a separate memory. With further optimization of switching techniques or by use of alternative PCMs with lower switching power, further energy benefits can also be aimed for to achieve energy consumption comparable with that of other technologies in the electrical domain.

Table 3 Characteristics of photonic PCM-CAM and Comparison with Literature

7 Conclusion

We have proposed a photonic content addressable memory scheme based on a compact nonvolatile switch capable of high-bandwidth search operations. By leveraging the self-holding capability of the phase change material GST integrated with WDM-compatible Si microring add-drop filters, we designed a photonic XOR-operation structure for parallel read-out. We evaluated the reliability of mismatch detection under inter-ring crosstalk and channel/detector noise and engineer the uniformity of the system’s read response to various mismatch permutations by appropriate design analyzing the effect of crosstalk. We have also proposed a suitable write scheme for CAM entry reconfiguration through considering a balanced tradeoff between energy consumption, speed, insertion loss and chip footprint. The proposed architecture can be readily cascaded to support longer word searches and integrated with large scale electronics to implement ultrafast searches in hardware. Our work highlights yet another novel application of nonvolatile integrated photonics capable of not only filling a void in high-end datacenter applications but also providing photonic acceleration in search-based operations for specialized heterogeneous computing platforms.