1 Introduction

A memory unit is essential for storing and accessing data in a processor. Every instruction in a program is stored in a memory location, and the control unit executes the instructions one by one. Although speed has increased in contemporary high-performance processors, improvements to memory units also target low power consumption, higher data storage density in a smaller footprint, and lower latency without adversely affecting CPU performance. A Content-Addressable Memory (CAM) is designed specifically for high-performance systems that require fast search. A CAM compares the input data against all of the tag data stored in its memory cells and returns the addresses of the matching entries. In network devices, CAM enables software- and hardware-based search engines by offering single-clock-cycle throughput, and it finds applications in high-speed search scenarios. An SRAM cell is a fundamental memory unit that stores data as binary values, 1s and 0s. The number of CMOS transistors within an SRAM cell varies with the specific type of SRAM. Traditionally, the 6 T SRAM configuration has been widely used for data storage. However, conventional 6 T SRAM tends to consume a significant amount of power, leading to a degradation in system performance [5]. A variety of techniques must therefore be applied to CAM cells to reduce power consumption and produce low-power memory cells. Because CAM cells retrieve data by content rather than by address, they drastically decrease the time needed to find stored data in memory. This attribute makes CAM a valuable component in high-speed search applications [17]. Compared with the alternative memory search algorithms currently in use, CAMs provide a noticeable performance advantage: they shorten access times by concurrently comparing the requested data with the complete list of pre-stored data items. A CAM combines SRAM storage with comparison circuitry, which allows an entire search operation to complete in a single clock cycle.
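As a behavioural illustration of content-based lookup (not of the circuit itself), the following Python sketch compares a search word against every stored word and returns the matching addresses. The function name `cam_search` and the stored values are illustrative assumptions; in hardware the comparisons happen concurrently within one clock cycle, whereas this software model only reproduces the result.

```python
from typing import List


def cam_search(stored_words: List[int], search_word: int) -> List[int]:
    """Return the addresses whose stored word equals the search word."""
    # A hardware CAM compares every row at the same time; the loop here
    # models only the outcome, not the single-cycle timing.
    return [addr for addr, word in enumerate(stored_words) if word == search_word]


# Hypothetical 4-entry CAM holding 8-bit words.
table = [0b10110010, 0b00001111, 0b11110000, 0b00001111]
matches = cam_search(table, 0b00001111)
print(matches)  # -> [1, 3]
```

A conventional RAM would answer the opposite question (address in, data out), which is why a software search over RAM needs several accesses where a CAM needs one.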

Common applications of CAMs involve search-intensive tasks, especially data packet forwarding, filtering, splicing, data compression, and data packet classification within network routers. As the range of CAM applications expands, the need for larger CAMs has become more noticeable. However, this growth in size also exacerbates the power consumption challenge. Reducing power consumption while maintaining speed and area efficiency has therefore become a central focus of recent research on large-capacity CAMs [15]. In this paper, we first analyse methods that enhance speed while reducing power consumption through different architectures. We then compare the performance of the existing and proposed architectures.

2 Related work and background

Low-power CAM architectures that use pass-transistor-logic comparison and weak-gate evaluation, at the cost of high delay, have been discussed in [2, 4, 9, 11, 14, 17, 20, 21, 22, 23, 24, 25]. High-speed designs have used low-power methods based on NOR-type CAM cells [7, 8, 9, 10, 13, 18, 19]. In these cells, the dynamic or transmission-gate comparison unit and the evaluation phase are controlled by rail-to-rail supply voltages, and the matching lines (MLs) settle more quickly in this configuration [1, 3, 8, 14]. To balance power consumption and delay, the self-power-off approach integrates two supply voltages within the Gated-Power CAM (GP-CAM) [18]. In GP-CAM cells, the MLs are pre-charged to a voltage level lower than the supply voltage, which reduces Matchline power consumption. This approach combines a selective pre-discharge/pre-charge technique with charge sharing between master and slave Matchlines [5, 9, 14] and among ML segments [4, 12, 13], with the goal of further reducing ML power consumption. In precomputation-based architectures, an initial comparison step is performed by pre-processing a group of search bits [15, 16], which reduces search power during the evaluation stage of the search cycle. Segmentation schemes eliminate the majority of mismatched MLs early, which reduces the number of needless ML discharges during evaluation. To facilitate parallel search and lower the power consumed within each segment, multiple NAND ML segments also operate independently [16]. A shared Matchline scheme is used to reduce power in NOR-type ML CAMs [23].

To enhance efficiency, a charge-sharing technique is combined with a segmentation scheme. This strategy aims to decrease Matchline swing and reduce charge loss during the search process [17, 22]. Clamping and current-limiting techniques in the NOR-Matchline segments reduce the overall power consumption [18]. To lower the ML switching capacitance, the Matchlines are divided into clusters of local NOR CAM cells and global NAND CAM cells. The shorter NAND Matchlines act as a filter for mismatches and thereby lower the search power used in the dominant NOR Matchlines [20, 21]. The complementary charge behaviour of the nMOS and pMOS transistors in the comparison/evaluation logic makes pre-charge and pre-discharge possible before each evaluation phase, which also helps to reduce Matchline swing in the differential Matchline segments [7, 25]. Mismatching Matchlines are predicted early by using dynamically varying signals, which limits Matchline swing to the lower pre-charge phase [19]. The majority of these methods involve gating and charging or discharging all of the cells [6, 7, 9, 19] or selected groups of cells [10, 13, 14, 25, 26] in a Matchline row using extra ML control or sensing circuits. As a result, these methods reduce search speed and add area overhead. According to the literature, the Matchline, comparison unit, encoder, decoder and sense amplifier are the main power-consuming parts of a CAM architecture. Among these, unnecessary precharge and discharge of the Matchline is the dominant contributor, and it is therefore the focus of the proposed CAM architecture.

3 Existing CAM architectures

The traditional Content Addressable Memory architecture stores data in memory and uses a comparison circuit with a search line to enter the search data. Using the search tag lines, the comparison operation compares the input data lines with the corresponding stored data lines [7]. When comparing the NAND and NOR CAM architectures, it is important to note that NAND CAMs tend to consume less power but search more slowly, whereas NOR CAMs search faster but consume more power.

Figures 1(a) and 1(b) depict the NAND and NOR architectures, respectively.
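The difference between the two match-line styles can be sketched behaviourally. In a NOR match line the cells sit in parallel, so a single mismatching cell is enough to discharge the line; in a NAND match line the cells form a series chain that conducts only when every cell matches. The Python model below is a logic-level sketch under those assumptions; the function names are hypothetical, and it captures neither timing nor power.

```python
from typing import List, Sequence


def mismatch_flags(stored: Sequence[int], search: Sequence[int]) -> List[int]:
    """Per-cell mismatch flags (1 = stored bit differs from search bit)."""
    if len(stored) != len(search):
        raise ValueError("stored and search words must have the same length")
    return [s ^ q for s, q in zip(stored, search)]


def nor_match(stored: Sequence[int], search: Sequence[int]) -> bool:
    """NOR-type ML: any mismatching cell opens a parallel pull-down path."""
    return not any(mismatch_flags(stored, search))


def nand_match(stored: Sequence[int], search: Sequence[int]) -> bool:
    """NAND-type ML: the series chain conducts only if every cell matches."""
    return all(flag == 0 for flag in mismatch_flags(stored, search))


word = [1, 0, 1, 1]
print(nor_match(word, [1, 0, 1, 1]), nand_match(word, [1, 0, 1, 1]))
print(nor_match(word, [1, 1, 1, 1]), nand_match(word, [1, 1, 1, 1]))
```

Both functions return the same logical answer; the speed and power trade-off described above comes from how the line is evaluated electrically (parallel pull-down versus serial chain), which a logic-level model cannot show.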

Fig. 1

a NAND CAM architecture, b NOR CAM architecture

Beyond the NAND and NOR architectures of Fig. 1, the XOR CAM cell architecture is shown in Fig. 2. The CAM cell is accessed through access transistors [9]. The comparison circuit contains two PMOS transistors, and four further transistors form the cross-coupled inverters. Bitline, Bitlinebar, WriteLine, Searchline, and SearchlineBar are inputs, and Data, Databar and Matchline are outputs. During a write, WriteLine is activated, the data to be stored is supplied on Bitline, and its complement is supplied on Bitlinebar. The data is then saved in the CAM cell. During comparison, the stored data is compared with the search data.

Fig. 2

XOR CAM cell architecture

Fig. 3

XNOR CAM Cell architecture

The XNOR CAM cell uses eight MOSFETs and is made up of a comparison circuit and a standard 6 T SRAM cell. For the cell access depicted in Fig. 3, two NMOS transistors act as access transistors. The comparison circuit attached to the cross-coupled inverters is composed of two PMOS transistors. Each inverter is built from one NMOS and one PMOS transistor, and the cross-coupled inverters form the CAM cell’s storage unit. In the XNOR CAM cell, the inputs are Bitline, Bitlinebar, Writeline, Searchline, and Searchlinebar, and the outputs are Data, Databar, and Matchline. The cell operates without requiring an initial Matchline charge.

Figure 4 shows the Precharge-Free (PF) CAM architecture. Its inputs are Writeline, Bitline, Bitlinebar, Searchline, and Searchlinebar. Most importantly, it works without requiring the Matchline to be precharged beforehand. When writing, WL is activated, and the data and its complement are supplied on Bitline and Bitlinebar, respectively. During a search, the data stored in the CAM cell is compared with the data being sought. The comparison has two possible outcomes: match and mismatch. When the searched data matches the stored data, the Matchline is driven to the high level. However, the Matchline does not go high when the searched and stored data differ.

Fig. 4

PF CAM Cell architecture

The Self-Controlled Pre-charge Free (SCPF) CAM cell combines a standard 6 T SRAM cell with a unique comparison circuit, as shown in Fig. 5. Ten MOSFETs are used, including two NMOS access transistors for cell access and two NMOS transistors in the comparison circuit. Additionally, the charge control circuit, which consists of an NMOS transistor and a PMOS transistor, operates under both match and mismatch conditions. The Matchline output is activated in accordance with the charge on node S [14]. Fig. 5 shows the SCPF CAM cell circuit with inputs Bitline, Bitlinebar, Writeline, Searchline, and Searchlinebar and outputs Data, Databar, and Matchline.

Fig. 5

SCPF CAM Cell architecture

The cell functions correctly without pre-charging the Matchline. During a search operation, the input data and the stored data of the CAM cell are compared. On a mismatch the Matchline stays low, and on a match it goes high. WL is turned off during the comparison. In a matched scenario, node S is high and the Matchline goes high through the NMOS transistor. During a mismatch, node S is low, which causes the Matchline to discharge through the PMOS transistor [14].
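The SCPF search behaviour described above reduces to a small truth table. The Python sketch below is a logic-level reading of that description only; the function name `scpf_cell` is hypothetical, and transistor-level effects (drive strength, timing, leakage) are outside its scope.

```python
from typing import Tuple


def scpf_cell(stored_bit: int, search_bit: int) -> Tuple[int, int]:
    """Return (node_S, matchline) for one SCPF cell during a search."""
    # Node S is high on a match and low on a mismatch.
    node_s = 1 if stored_bit == search_bit else 0
    # S high: the Matchline is driven high through the NMOS transistor.
    # S low: the Matchline discharges through the PMOS transistor.
    matchline = node_s
    return node_s, matchline


# Enumerate all four stored/search combinations.
for stored in (0, 1):
    for search in (0, 1):
        print(stored, search, scpf_cell(stored, search))
```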

The Modified Pre-charge Free (MPF) CAM architecture comprises a basic 6 T SRAM cell and a comparison circuit, using ten MOSFETs in total. Two NMOS transistors act as access transistors to allow cell access, and the comparison circuit contains two NMOS transistors. Furthermore, in both match and mismatch scenarios, the charge control circuit, which consists of a PMOS transistor and an NMOS transistor, drives the Matchline output in response to the charge on node S.

Figure 6 depicts the circuit of the MPF CAM cell, with inputs WriteLine, Searchline, Searchlinebar, Bitline, and Bitlinebar and outputs Data, Databar, and ML. Notably, it functions without requiring ML to be pre-charged. When writing, WL is turned on, and the data and its complement are supplied on BL and BLB, respectively. During a search, the CAM’s stored data is compared with the input data supplied on Searchline and Searchlinebar. When a match occurs the Matchline rises, and when a mismatch occurs it stays low. WL is turned off while the comparison is being done [25]. If a match is detected, node S stays at a low level and the NMOS transistor allows the Matchline to go high. Under mismatched conditions, node S takes a high value, which causes the Matchline to discharge to low through the PMOS transistor.

Fig. 6

MPF CAM Cell architecture

The SMS architecture of a CAM cell is shown in Fig. 7. NAND-cell Matchlines suffer from residual charge caused by the cascading of intermediate cells and from long Matchline delay caused by the serial discharge of transistors. These effects limit the word length that can be used in CAMs with NAND-cell Matchlines. In the SMS cell, if the stored data bits and the searched data bits match, the evaluation nodes in both cells go to ‘0’, and the final evaluation node consequently takes a LOW logic state. The Matchline is designed to discharge to ground (GND) through a transistor, so that in a 2-bit comparison-evaluation each mismatch is handled by a single Matchline discharge path [23].

Fig. 7

SMS architecture of CAM cell [23]

4 Proposed CAM architecture

High-speed searching is accomplished with NOR-type MLs, albeit at the expense of significant power consumption. When a search matches a single unique word in an N × M CAM, N − 1 Matchlines discharge. Furthermore, different MLs in the CAM array have different numbers of mismatched cells: some MLs discharge through a greater number of mismatched cells and others through fewer. Therefore, during the transition from the precharge phase to the evaluation phase, high discharge rates cause large Matchline power consumption and strong Matchline switching activity.
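The switching activity described here can be counted with a short behavioural model. The Python sketch below assumes a NOR-type array in which a row's match line discharges whenever the row contains at least one mismatching cell; the array contents and the name `matchline_activity` are illustrative assumptions, not data from the simulations.

```python
from typing import List, Sequence, Tuple


def matchline_activity(
    array: Sequence[Sequence[int]], search_word: Sequence[int]
) -> Tuple[int, List[int]]:
    """Return (number of discharging rows, mismatched-cell count per row)."""
    mismatches_per_row = [
        sum(1 for s, q in zip(row, search_word) if s != q) for row in array
    ]
    # A NOR match line discharges as soon as its row has one mismatch.
    discharging_rows = sum(1 for count in mismatches_per_row if count > 0)
    return discharging_rows, mismatches_per_row


# Hypothetical 4 x 4 array (N = 4 words of M = 4 bits), all words unique.
cam_array = [
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 1, 0, 0],
]
rows, per_row = matchline_activity(cam_array, [1, 0, 1, 0])
print(rows, per_row)  # -> 3 [2, 0, 2, 3]
```

With one matching word, three of the four rows (N − 1) discharge, and the per-row counts show that the rows discharge through different numbers of mismatched cells.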

The suggested Common Matchline Scheme (CMS) with a PUPD network is depicted in Fig. 8. It aims to enhance the power efficiency of a NOR-Matchline CAM during the search operation. It comprises the high-speed 10 T NOR-type cell shown in Fig. 8(c) and 8(d). The enhancement is accomplished by applying a Match-Line Control Unit (MLCU) to the segmented parts of the NOR-ML, which allows selective evaluation for every search. This results in a more straightforward precharge between searches. Fig. 8(a) displays a block diagram of the proposed CMS-CAM with PUC/PDC networks, and Fig. 8(b) illustrates the suggested Matchline approach with an N-bit word structure that is divided into N- and K-bit Matchline partitions. In addition to the Matchline sections (ML1 and ML2), the output Matchline (MLN) indicates the match result of the search.

Fig. 8

Proposed architecture of CAM using PUPD scheme

During the search process, the majority of the words in a Content Addressable Memory array are mismatched. In conventional and contemporary Matchline techniques, every one-bit cell contains its own compare-evaluation logic unit. Every mismatching cell in the CAM causes a Matchline state change (from precharge to discharge). If all bits in a Matchline row mismatch, the worst-case discharge occurs through ‘N’ discharge paths. The suggested common Matchline scheme (CMS), illustrated in Fig. 8, seeks to decrease the number of discharge pathways. Fig. 8(b) illustrates a 2-bit CAM cell in which the left- and right-side stored bits are compared in a common evaluation block. Each search is preceded by a default (precharge) phase using the sense amplifier’s precharge transistor, as illustrated in Fig. 8(c). If the stored bits match the related search bits, the evaluation nodes of the two cells attain ‘0’, and LOW logic is transmitted to the final evaluation node. As a result, the common Matchline node (ML1/2) keeps its precharged state, indicating a final match. If either or both bits are mismatched, E1/2 is driven with HIGH logic, and ML1/2 discharges to ground via transistor ME. As a result, each mismatch generates one ML discharge path for the 2-bit comparison-evaluation. Mismatching Matchline rows undergo state changes during evaluation, but matching rows retain their initial precharged states.
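The 2-bit compare-evaluation behaviour just described can be written as a small logic model. The Python sketch below follows the text only (per-cell evaluation nodes at ‘0’ on a match, E1/2 HIGH on any mismatch, ML1/2 discharged through ME); the function name `two_bit_ce` is hypothetical, and the enumeration is an illustration rather than a reproduction of Table 1.

```python
from typing import Tuple


def two_bit_ce(stored_pair: Tuple[int, int], search_pair: Tuple[int, int]) -> Tuple[int, int]:
    """Return (E, ML) for one 2-bit compare-evaluation block."""
    # Each cell's evaluation node is '0' on a match and '1' on a mismatch.
    e_left = stored_pair[0] ^ search_pair[0]
    e_right = stored_pair[1] ^ search_pair[1]
    # The shared node E1/2 goes HIGH if either or both bits mismatch.
    e_shared = e_left | e_right
    # ML1/2 keeps its precharged HIGH state on a match;
    # otherwise it discharges to ground through transistor ME.
    ml = 0 if e_shared else 1
    return e_shared, ml


# Four searches against a hypothetical stored pair (1, 0).
for search in [(1, 0), (1, 1), (0, 0), (0, 1)]:
    print(search, two_bit_ce((1, 0), search))
```

Only the first search leaves ML1/2 precharged; the other three discharge it through the same single path, whether one bit or both bits mismatch.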

In conventional schemes, each ML has as many discharge–charge pathways as it has mismatching cells [1, 3, 15]. In contrast, the suggested CAM can reduce the number of paths by up to half, because two mismatching cells in a pair contribute just one path for Matchline discharge. In the worst-case scenario of an N-bit mismatch, there are only ‘N/2’ Matchline discharge pathways, which is half the number in traditional and current techniques. As a consequence, the amount of Matchline switching in every row is minimized, which allows the CMS-CAM to save dynamic power during search. The suggested approach may, however, compromise ML latency because of the two-step evaluation per pair of cells. The proposed CAM design outperforms the existing architectures at the cost of some additional area overhead. In the proposed CAM, the 2-bit evaluation comparison unit using PUPD shares a common match line, which also reduces overall power consumption compared with the existing CAM architectures.
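The path-count argument can be checked with a short calculation. The Python sketch below assumes an even word length with adjacent cells paired (bits 0 and 1, bits 2 and 3, and so on); that pairing and the function names are assumptions made for illustration.

```python
from typing import Sequence


def conventional_paths(stored: Sequence[int], search: Sequence[int]) -> int:
    """One discharge path per mismatching cell."""
    return sum(1 for s, q in zip(stored, search) if s != q)


def cms_paths(stored: Sequence[int], search: Sequence[int]) -> int:
    """One discharge path per 2-bit pair that contains any mismatch."""
    if len(stored) != len(search) or len(stored) % 2 != 0:
        raise ValueError("words must have equal, even length")
    mismatch = [s != q for s, q in zip(stored, search)]
    return sum(1 for i in range(0, len(mismatch), 2) if mismatch[i] or mismatch[i + 1])


stored = [1] * 8
print(conventional_paths(stored, [0] * 8), cms_paths(stored, [0] * 8))  # -> 8 4
print(conventional_paths([1, 0, 1, 0], [0, 0, 1, 1]),
      cms_paths([1, 0, 1, 0], [0, 0, 1, 1]))  # -> 2 2
```

The first case is the worst-case N-bit mismatch, where the pairing halves the path count from N to N/2. The second case shows that the saving depends on where the mismatches fall: two mismatches in different pairs still need two paths.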

5 Simulation and results

The Cadence Virtuoso tool is used to design the various CAM architectures in 45 nm, 20 nm, 16 nm, and 7 nm technologies. All simulations are conducted at 27 degrees Celsius. The test environment uses supply voltages ranging from 0.7 to 1.2 V and applies identical input patterns to both the proposed and conventional CAM designs. Table 1 describes the activity of the CAM cell for four different searches. Tables 2, 3 and 4 show the average power consumption, delay and power delay product, respectively, of the existing CAM architectures and the proposed CAM design using the common match line scheme with PUPD for 8 × 8 bits at various voltages (from 1.2 to 0.7 V). Tables 5, 6 and 7 present the noise, energy and area, respectively, of the existing and proposed CAM designs at different voltages. These results indicate that the proposed CAM design performs better than the existing CAM architectures.

Table 1 Operation of the 2Bit-CE CAM cell
Table 2 Average power consumption in µW for a CAM cell with 8 × 8 bits at various voltages
Table 3 Delay in nanoseconds for 8 × 8 bits CAM cells at various voltages
Table 4 Power Delay Product (Energy in fJ) for CAM cells with 8 × 8 bits at various voltages
Table 5 Noise in dB for 8 × 8 bits CAM cells
Table 6 Energy for 8 × 8 bits CAM cells (fJ/bit/search)
Table 7 Area (μm²) for 8 × 8 bits CAM cells
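The units used in Tables 2, 3 and 4 are mutually consistent: a power in µW multiplied by a delay in ns gives an energy in fJ with no extra scaling factor. The Python sketch below shows this relationship; the function name and the sample values are hypothetical placeholders, not measurements from the tables.

```python
def power_delay_product_fj(power_uw: float, delay_ns: float) -> float:
    """Power delay product in femtojoules.

    1 uW x 1 ns = 1e-6 W x 1e-9 s = 1e-15 J = 1 fJ, so the product of the
    two table units is already expressed in fJ.
    """
    if power_uw < 0 or delay_ns < 0:
        raise ValueError("power and delay must be non-negative")
    return power_uw * delay_ns


# Placeholder values chosen only to illustrate the arithmetic.
print(power_delay_product_fj(2.0, 0.5))  # -> 1.0
```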

Fig. 9(a), Fig. 9(b), Fig. 9(c) and Fig. 9(d) compare the average power consumption, delay, noise and energy, respectively, of the existing SMS CAM [23] and the proposed CAM for 8 × 8 bits in 45 nm, 20 nm, 16 nm and 7 nm technologies. This analysis indicates that the proposed CAM architecture is technology independent and performs better than the existing SMS CAM architecture in terms of average power consumption (16%), delay (11%), noise (5%) and energy (10%). Fig. 10(a), Fig. 10(b) and Fig. 10(c) compare the average power consumption of the existing SMS CAM [23] and the proposed CAM for 8 × 8 bits, 16 × 16 bits and 32 × 32 bits, respectively, in 45 nm technology. The delay analysis for the existing SMS CAM [23] and the proposed CAM is presented in Fig. 11(a), Fig. 11(b) and Fig. 11(c) for 8 × 8 bits, 16 × 16 bits and 32 × 32 bits, respectively, in 45 nm technology. Based on these graphs, the new CMS architecture yields good power consumption and delay results compared with the existing techniques even as the number of bits in the CAM increases.
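Improvement percentages of this kind are commonly computed as a relative reduction with respect to the existing design. The text does not state the exact formula used, so the Python sketch below is an assumption, and its input values are placeholders rather than simulation results.

```python
def percent_reduction(existing: float, proposed: float) -> float:
    """Relative reduction of a metric, in percent of the existing value."""
    if existing <= 0:
        raise ValueError("existing value must be positive")
    return 100.0 * (existing - proposed) / existing


# Placeholder values: a metric that drops from 50.0 to 40.0 units.
print(percent_reduction(50.0, 40.0))  # -> 20.0
```

A positive result means the proposed design is lower (better) on a metric such as power, delay or energy; a negative result would indicate a regression.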

Fig. 9

a Power consumption of Existing SMS and proposed CAM Schemes for 8 × 8 array at different nm Technologies, b Delay analysis of Existing SMS and proposed CAM Schemes for 8 × 8 array at different nm Technologies, c Noise analysis of Existing SMS and proposed CAM Schemes for 8 × 8 array at different nm Technologies, d Energy of Existing SMS and proposed CAM Schemes for 8 × 8 array at different nm Technologies

Fig. 10

a Power consumption of Existing and proposed Schemes for 8 × 8 array, b power consumption of Existing and proposed Schemes for 16 × 16 array, c power consumption of Existing and proposed Schemes for 32 × 32 array

Fig. 11

a Delay results of Existing and proposed Schemes for 8 × 8 array, b Delay results of Existing and proposed Schemes for 16 × 16 array, c Delay results of Existing and proposed Schemes for 32 × 32 array

Figure 12 shows the layout of the proposed CAM. To sum up, the data suggest that employing a PUPD network in the CMS CAM leads to 13%–60% lower power consumption compared with conventional memory architectures. This could prove advantageous where power efficiency is a priority, such as in mobile devices or energy-saving systems. The proposed CMS CAM also outperformed the existing architectures in terms of noise, delay and energy.

Fig. 12

Layout of the proposed CAM cell using 45 nm

6 Conclusion

To reduce the number of match line discharge pathways during search, a common match line scheme (CMS) is introduced. The novel CAM design performs better than conventional designs in terms of noise, power consumption, and latency. The proposed modified CMS-CAM architecture utilizing a PUPD network demonstrates a 13%–60% reduction in power consumption and a 3–16% decrease in delay compared with conventional architectures. These results were obtained in the Cadence tool using 45 nm, 20 nm, 16 nm and 7 nm technologies across array sizes of 8 × 8 bits, 16 × 16 bits, and 32 × 32 bits. By integrating a PUPD network, the proposed CMS CAM design achieves a balance between power efficiency and delay optimization, offering a promising solution without any significant impact on performance. The suggested scheme’s search performance is also shown to be consistent under a range of operating conditions and efficient for larger macro-sized designs. The suggested CMS-CAM may prove helpful to HSEs in implementing lookup-table management and high-performance systems, for data processing in cache tags and data compressors, and for information interchange in network routers.