1 Introduction

Digital receivers have revolutionized electronic systems, enabling innumerable applications and functionalities. Communications, data transmission, signal processing, and many other areas have benefited from these devices. The use of reprogrammable radio architectures leads to the concept of Software-Defined Radio (SDR) [1]. The main feature of SDR systems is to get the code as close to the antenna as possible, which turns radio hardware problems into software problems. This means that the embedded software running in the communication terminal modulates and demodulates the transmitted and received waveforms over a wide range of frequencies, and deals with most standards that may evolve or appear during the hardware lifecycle. This concept is well depicted in [2] and is shown in Fig. 1.

Fig. 1 “Typical” software radio block diagram [2]

This idealized figure shows that the design of SDR architectures involves many challenges. For our purposes, considering the receive path, the most significant bottleneck today comes down to two crucial problems: implementing a front end that can process all types of signals in very distinct bands (any modulation, channel bandwidth, or carrier frequency) [3], and doing so in such a way that the signal delivered to the Analog-to-Digital Converter (ADC) lies at a frequency the ADC can handle, that is, within its Nyquist band [4].

However, the design flow and circuit techniques of contemporary transceivers for Multi-GHz mobile RF wireless applications are typically analog intensive, and utilize process technologies that are more complex than or incompatible with standard digital CMOS processes [5].

Thus, Field Programmable Gate Arrays (FPGAs) are an attractive option to perform many of these SDR functions for reasons of cost, performance, and programmability [6, 7]. Furthermore, since SDR functions are commonly based on hardware reconfiguration, VHDL can be used as the “(re)programming language” in the development and implementation of high speed digital signal processing algorithms within the FPGAs [8].

Nevertheless, some authors argue that the flexibility required by modern SDRs is best achieved by combining a DSP processor (reconfigurable software) with an FPGA (reconfigurable hardware) [9, 10]. That is, while the DSP processor handles system control and configuration functions, the FPGA implements the computationally intensive signal processing data path and control, minimizing the system latency. Hence, to switch between standards, the processor can switch dynamically between major sections of software while the FPGA can be completely reconfigured, as necessary, to implement the data path for the particular standard.

In the long term, several trends may open the way to disruptive innovations in SDR applications. On the analog side, Sampled Analog Signal Processors (SASP) [11] are one solution. On the digital side, heterogeneous multi-processor system-on-chip (MPSoC) platforms [12] are another.

In this work, reconfigurable logic is the implementation path chosen to start tackling the first part of the problem: the demodulator circuit, in the digital domain. The circuit has an All-Digital Phase-Locked Loop (ADPLL), described in VHDL, as its main IP (intellectual property) block. In Sect. 2, the FM receiver is reviewed. In Sect. 3, the digital FM demodulator and each of its blocks are described and explained. In Sect. 4, a detailed system model is presented. Synthesis results and simulations are summarized in Sects. 5 and 6, respectively. Finally, conclusions are drawn in Sect. 7.

2 FM receiver

Frequency modulation has been in ubiquitous use for communication media such as FM radio broadcast, TV audio, VHS HiFi, laser disc, and even digital wireless in the form of frequency shift keying (FSK). Many efforts have been made to integrate an FM receiver on a single chip using various architectures, but the performance has been limited by the analog signal-processing accuracy [13–15]. The main issue in integrating an FM demodulator on a chip is how to accurately discriminate a small frequency deviation of the FM signal from its center frequency. Most FM demodulators use either the Foster–Seeley method or phase-locked loops (PLL). The PLL behaves as a narrow-band tracking filter, with its loop-filter output exhibiting a frequency-discriminating characteristic. It is readily implementable in integrated form, but the linearity of the voltage-controlled oscillator (VCO) affects the overall linearity [16]. Digital PLLs can overcome some of these weaknesses of analog PLLs. In addition, the digital tangent method can compute frequency from the ratio of in-phase and quadrature (I–Q) signals [17, 18]. These digital methods for wide dynamic range IF processing and accurate frequency discrimination require either extensive numerical processing or large read-only memory (ROM) lookup tables. In this work, a highly linear digital FM demodulator is proposed.

3 The digital FM demodulator

3.1 The all-digital phase locked-loop (ADPLL)

The phase-locked loop is a useful element in many types of communication systems. It is used in two fundamentally different ways: (1) as a demodulator, where it is implemented to follow phase or frequency modulation [19, 20], or (2) to track a carrier or synchronizing signal which may vary in frequency with time, that is, as a local oscillator (LO) or as a clock generator [21–23]. Its operation as a demodulator was studied in the early 1970s [24, 25], followed by its performance as a near-optimum FM demodulator [26].

3.2 Basic operation

The complete diagram of the Digital FM receiver circuit is shown in Fig. 2. With mathematical equations, we can explain the basic operation of the circuit. Thus, since the input signal is Frequency Modulated, in(t) can be expressed as follows,

$$ in(t) = \sin (\omega _{0} t + \theta _{i} (t)) $$
(1)

Fig. 2 The digital FM receiver

As shown in Fig. 2, the ADPLL is built around a feedback loop in which the Numerically-Controlled Oscillator (NCO) tracks the frequency of the input FM signal in(t). The frequency of the NCO output sinusoid is controlled by a digital input value. The NCO output ref(t) is expressed as follows,

$$ ref(t) = \cos {\left( {\omega _{0} t + \theta _{0} {\left( t \right)}} \right)} $$
(2)

Multiplying in(t) and ref(t) in the Phase Detector gives the signal c(t) as follows,

$$ c(t) = in(t)ref(t) = \sin (\omega _{0} t + \theta _{i} (t))\cos (\omega _{0} t + \theta _{0} (t)) $$
(3)
$$ \begin{aligned} c(t) & = \frac{1} {2}\sin {\left( {\theta _{i} (t) - \theta _{0} (t)} \right)} \\ & \quad + \frac{1} {2}\sin {\left( {2\omega _{0} t + \theta _{i} (t) + \theta _{0} (t)} \right)} \end{aligned} $$
(4)

The first term of the resulting equation above corresponds to the phase difference between in(t) and ref(t). The second term corresponds to the high-frequency component. By removing the second term through loop filtering, the phase difference can be obtained.

$$ c {^{\prime}} (t) = \frac{1} {2}\sin {\left( {\theta _{i} (t) - \theta _{0} (t)} \right)} $$
(5)

As described in [27] the phase difference between the modulated signal and the carrier \( {\left( {\theta _{i} (t) - \theta _{0} (t)} \right)} \) is the desired original signal.
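
As a quick numerical check of Eqs. (3)–(5), the following minimal sketch multiplies the two signals sample by sample and compares the product with the sum of the two terms of Eq. (4). The phase values and the number of samples are arbitrary illustrative assumptions, not values taken from the design; the 1 MHz carrier and 16 MHz sampling rate mirror the example used later in the text.

```python
import math

def phase_detector_product(w0, theta_i, theta_o, t):
    """Eq. (3): c(t) = in(t) * ref(t)."""
    return math.sin(w0 * t + theta_i) * math.cos(w0 * t + theta_o)

def phase_detector_terms(w0, theta_i, theta_o, t):
    """Eq. (4): the low-frequency term and the double-frequency term."""
    low = 0.5 * math.sin(theta_i - theta_o)
    high = 0.5 * math.sin(2.0 * w0 * t + theta_i + theta_o)
    return low, high

# Illustrative values (assumptions): 1 MHz carrier, small constant phases.
W0 = 2.0 * math.pi * 1.0e6
F_S = 16.0e6
THETA_I, THETA_O = 0.30, 0.10
N = 64  # four carrier periods at 16 samples per period

# Largest mismatch between the product form and the two-term form.
max_err = 0.0
for n in range(N):
    t = n / F_S
    low, high = phase_detector_terms(W0, THETA_I, THETA_O, t)
    c = phase_detector_product(W0, THETA_I, THETA_O, t)
    max_err = max(max_err, abs(c - (low + high)))

# Averaging over whole carrier periods cancels the 2*w0 term,
# leaving only the phase-difference term of Eq. (5).
avg = sum(phase_detector_product(W0, THETA_I, THETA_O, n / F_S) for n in range(N)) / N
```

The plain average used here only stands in for the loop filter; it is enough to show that the low-frequency term is what survives once the double-frequency component is removed.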

Since in our case the entire system is described in VHDL, the All-Digital Phase Locked Loop system is broken down into three basic parts: (3.2.1) Phase Detector, (3.2.2) Loop filter, and (3.2.3) Numerically-Controlled Oscillator (NCO). It also includes a (3.3) Low Pass Filter at the output to perform signal shaping. The following subsections explain each block of the receiver.

3.2.1 Phase detector

The Phase Detector detects the phase error between the input signal and the output signal of the NCO. This operation employs a multiplier module and a register, as depicted in Fig. 3. In the VHDL model, we used Booth’s multiplication algorithm instead of a simple signed arithmetic multiplier. Booth’s algorithm achieves a smaller area than a simple array multiplier [28]. Nonetheless, if the circuit must operate at high frequencies, a high-speed multiplier architecture is needed; for example, a Wallace-tree multiplier [29] may be applied for this purpose.

Fig. 3 Phase detector
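
To illustrate the multiplication scheme named above, here is a minimal software sketch of radix-2 Booth recoding for signed operands. It is an arithmetic model only, not the VHDL used in this work; the function name `booth_multiply` and the default 8-bit width are assumptions made for the example.

```python
def booth_multiply(multiplicand, multiplier, bits=8):
    """Radix-2 Booth multiplication of two signed `bits`-wide integers."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if not (lo <= multiplicand <= hi and lo <= multiplier <= hi):
        raise ValueError("operand does not fit in the given width")

    q = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    product = 0
    prev = 0  # implicit bit to the right of the least significant bit
    for i in range(bits):
        cur = (q >> i) & 1
        if cur == 1 and prev == 0:
            # Bit pair "10": a run of ones starts here, subtract.
            product -= multiplicand << i
        elif cur == 0 and prev == 1:
            # Bit pair "01": a run of ones ended, add.
            product += multiplicand << i
        # Pairs "00" and "11" need no addition or subtraction.
        prev = cur
    return product
```

For example, `booth_multiply(-7, 13)` scans the bit pattern of 13 and performs one add or subtract per change between adjacent bits, which is why long runs of identical bits cost nothing.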

3.2.2 Loop filter

As shown in Fig. 4, the loop filter consists of a small feedback branch. It adds the output signal of the phase detector, c(t), to the D flip-flop output multiplied by the coefficient α = (1 − 1/16) = 15/16 = 0.9375 [30]. The loop filter output can be expressed as follows:

$$ loop\_out = c{\left( t \right)} + \alpha c{\left( {t - 1} \right)} + \alpha ^{2} c{\left( {t - 2} \right)} +\cdots $$
(6)

Fig. 4 Loop filter

This kind of averaging, with smaller weights for older values, gives the filter its low-pass characteristic. The loop filter removes the high-frequency components left after the multiplication performed by the Phase Detector (PD); that is, the loop filter output exhibits frequency-discriminating behavior. Hence, the first-order loop filter is a low pass filter with the following transfer function:

$$ H{\left( z \right)} \equiv \frac{{Y{\left( z \right)}}} {{X{\left( z \right)}}} = \frac{1} {{z - 0.9375}} $$
(7)

This transfer function has a pole on the real axis at z = 0.9375. From the stability property of discrete-time filters, we know that H(z) is stable since its pole is located within the unit circle [31].
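
The weighted sum of Eq. (6) can be generated recursively as y[n] = c[n] + α·y[n−1]. The sketch below, a software model under stated assumptions rather than the VHDL of this work, shows that recursion in floating point and in an integer form where the product α·y is computed as y − (y >> 4). The shift-and-subtract form is an assumption about how 15/16 might be realized without a multiplier; the names `loop_filter` and `loop_filter_int` are hypothetical.

```python
ALPHA_SHIFT = 4  # alpha = 1 - 2**-4 = 15/16 = 0.9375

def loop_filter(samples):
    """Floating-point recursion y[n] = c[n] + alpha * y[n-1]."""
    alpha = 1.0 - 2.0 ** -ALPHA_SHIFT
    y_prev = 0.0
    out = []
    for c in samples:
        y = c + alpha * y_prev
        out.append(y)
        y_prev = y
    return out

def loop_filter_int(samples):
    """Integer recursion; alpha * y is formed as y - (y >> 4)."""
    y_prev = 0
    out = []
    for c in samples:
        y = c + (y_prev - (y_prev >> ALPHA_SHIFT))
        out.append(y)
        y_prev = y
    return out

# Impulse response: 1, alpha, alpha**2, ... (the weights in Eq. (6)).
impulse_response = loop_filter([1.0] + [0.0] * 7)

# Step response: converges to 1 / (1 - alpha) = 16 because |alpha| < 1.
step_response = loop_filter([1.0] * 400)
```

The geometric decay of the impulse response is the time-domain counterpart of the pole lying inside the unit circle: each past sample contributes 15/16 of what it contributed one sample earlier.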

3.2.3 Numerically-controlled oscillator

Unlike the Voltage-Controlled Oscillator (VCO), whose output frequency depends on a DC input voltage, the Numerically-Controlled Oscillator (NCO) module generates a digital cosine wave whose frequency is determined by a digital input value (8 bits in our case). The output can either be used directly, for example by a digital multiplier, or be passed to a Digital-to-Analog Converter (DAC) for use in the analog domain. As shown in Fig. 5, our circuit is based on the common look-up table (LUT)/accumulator NCO architecture. The output of the accumulator is scaled down to match the address size of the look-up table. The input to the accumulator is the sum of an offset corresponding to the nominal lock frequency and Ac′(t), which is determined by the output of the loop filter. We chose this architecture because it saves area while also being well suited to high-frequency operation [32].

To understand our target system, let us assume that the system clock frequency is 16 MHz and the center NCO operating frequency is 1 MHz. Then, as shown in Fig. 6, there are 16 sampling points in one cycle of the 1 MHz sinusoidal wave. When the input value Ac′(t) = 0, the NCO generates exactly one cycle of the sinusoidal wave every 16 clock cycles: the offset value is 1/16, the D flip-flop accumulates it every clock cycle, and so in 16 cycles the accumulated value increases by 1.0. The accumulator output, multiplied by 2π, gives the phase used to address the look-up table and extract the cosine value from the ROM. When the input value is greater than 0, the accumulation speeds up, so the accumulator increases by 1.0 in fewer than 16 cycles, which corresponds to a frequency higher than 1 MHz. Conversely, when the input value is less than 0, a frequency lower than 1 MHz is generated. Consequently, the NCO operating frequency is controlled by the input value, with a center frequency of 1.0 MHz.

In principle, all 1024 values of one cosine cycle are needed. However, since one cycle can be divided into four quarters, only the first quarter, with 257 values, needs to be defined. The remaining quarters are derived from the first quarter, with the opposite sign applied to the second and third quarters. This is illustrated in Fig. 7.

Fig. 5 Numerically-controlled oscillator

Fig. 6 Operating signal wave at the NCO

Fig. 7 Data values in one cycle cosine ROM
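
The accumulator and quarter-wave ROM described above can be sketched in software as follows. This is an illustrative model, not the VHDL of this work: it assumes the accumulator counts directly in units of 1/1024 of a cycle (so the 1/16 offset becomes 64), stores the ROM as floating-point values, and uses the hypothetical names `cos_rom` and `NCO`.

```python
import math

ROM_SIZE = 1024
# Only the first quarter (addresses 0..256, i.e. 257 values) is stored.
ROM_QUARTER = [math.cos(2.0 * math.pi * k / ROM_SIZE) for k in range(257)]

def cos_rom(addr):
    """Return cos(2*pi*addr/1024) using only the first-quarter table."""
    addr %= ROM_SIZE
    if addr <= 256:                    # first quarter: direct lookup
        return ROM_QUARTER[addr]
    if addr <= 512:                    # second quarter: mirrored, negated
        return -ROM_QUARTER[512 - addr]
    if addr <= 768:                    # third quarter: negated
        return -ROM_QUARTER[addr - 512]
    return ROM_QUARTER[ROM_SIZE - addr]  # fourth quarter: mirrored

class NCO:
    """Phase accumulator that addresses the cosine ROM."""

    def __init__(self, offset=ROM_SIZE // 16):
        self.offset = offset  # 1/16 cycle per clock: f_clk / 16 at zero input
        self.phase = 0

    def step(self, control=0):
        """Output one sample, then advance the phase by offset + control."""
        out = cos_rom(self.phase)
        self.phase = (self.phase + self.offset + control) % ROM_SIZE
        return out

# With zero control input, one full cosine cycle takes 16 clock cycles.
nco = NCO()
one_cycle = [nco.step() for _ in range(17)]
```

A positive `control` value makes the phase wrap in fewer than 16 steps (a higher output frequency) and a negative value makes it wrap in more, which is the behavior described for Ac′(t) in the text.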

3.3 Low pass output filter

This is a simple moving-average FIR filter [33]. As shown in Fig. 8, it uses a 16-tap Finite Impulse Response (FIR) filter to perform digital low pass filtering. It is essentially an averaging filter, since its output equals the average of its input over the last n samples, where n is the number of taps [27]. This configuration needs 16 coefficients, but it is simplified by making all coefficients equal to 1/16. In hardware, the multiplication by 1/16 is a trivial and fast 4-bit right shift, so no hardware multiplier is required.

Fig. 8 FIR Filter at the system output
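
A minimal integer sketch of such a 16-tap averaging filter is given below. It keeps a running sum of the last 16 samples and divides by 16 with a right shift by 4 bits. The running-sum formulation and the name `moving_average_16` are assumptions chosen for brevity; a direct sum over 16 taps gives the same output.

```python
from collections import deque

TAPS = 16
SHIFT = 4  # dividing by 16 is a right shift by 4 bits

def moving_average_16(samples):
    """16-tap moving average over integer samples, with no multiplier."""
    window = deque([0] * TAPS, maxlen=TAPS)  # delay line, oldest sample first
    total = 0
    out = []
    for x in samples:
        total += x - window[0]  # add the newest sample, drop the oldest
        window.append(x)        # maxlen discards the oldest entry
        out.append(total >> SHIFT)
    return out

# A constant input of 32 ramps up over 16 samples and then settles at 32.
step_output = moving_average_16([32] * 20)
```

Because the shift discards the four least significant bits of the sum, the output is rounded toward minus infinity, as an arithmetic right shift would do in hardware.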

4 Digital FM demodulator modeling

The complete schematic diagram modeled in the z domain is shown in Fig. 9. The single most important point to realize when designing with the PLL is that it is a feedback system and, hence, is characterized mathematically by the same equations that apply to other, more conventional feedback control systems [34, 35]. A mathematical model of the all-digital PLL system can be derived by analyzing the transient and steady-state response. The block diagram of the all-digital PLL system in the z domain (discrete time) and its transformation to the s domain (continuous time) are shown in Fig. 10. The transfer function of the system is

$$ \frac{{Y{\left( s \right)}}} {{X{\left( s \right)}}} = \frac{{ - s^{2} + s}} {{1.9375s^{2} + 0.06161s + 0.00089}} $$
(8)

Fig. 9 Z-domain complete schematic diagram

Fig. 10 S-domain and Z-domain system transfer function

Hence, the PLL system is a second-order system. To test stability, we subjected the system to a test signal representing a unit step in frequency at constant phase; this test signal corresponds to the actual input signal, which is an FM-modulated signal [36].
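
As a complementary check, the pole locations implied by the denominator of Eq. (8) can be computed directly. The sketch below takes the three denominator coefficients from Eq. (8) and derives the poles, the natural frequency, and the damping ratio using the standard second-order form s² + 2ζωₙs + ωₙ². The variable names are assumptions for the example, and the frequencies are in whatever normalized units Eq. (8) uses.

```python
import cmath
import math

# Denominator of Eq. (8): A2*s**2 + A1*s + A0
A2, A1, A0 = 1.9375, 0.06161, 0.00089

# Roots of the quadratic (the closed-loop poles).
disc = A1 * A1 - 4.0 * A2 * A0
root = cmath.sqrt(disc)
poles = [(-A1 + root) / (2.0 * A2), (-A1 - root) / (2.0 * A2)]

# Standard second-order parameters after dividing through by A2.
omega_n = math.sqrt(A0 / A2)            # natural frequency
zeta = A1 / (2.0 * math.sqrt(A2 * A0))  # damping ratio

# A continuous-time system is stable when all poles have negative real part.
stable = all(p.real < 0.0 for p in poles)
```

With these coefficients the discriminant is negative, so the poles form a complex-conjugate pair in the left half-plane, and the damping ratio comes out at roughly 0.74. An underdamped response of this kind settles with some overshoot.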

5 Synthesis results

LeonardoSpectrum Level 3 version 2004a.63 from Mentor Graphics was used to synthesize the circuit. The complete system, with each VHDL code and its hierarchy, is shown in Fig. 11. Since the hardware compiler allows us to optimize for speed and/or area, optimization goals were set globally across the hierarchy. Speed optimization minimizes delay by synthesizing circuits with the fewest levels of combinational logic or with larger fan-out cells, at the cost of increased design area; this setting maximizes operating frequency and minimizes combinational path delays. Area optimization minimizes the combinational logic resources used, sometimes at the cost of reduced speed. Tables 1 and 2 summarize the timing and area values obtained using the TSMC 0.35 μm CMOS technology, in its “fast” version, as the target.

Fig. 11 All programmable logic blocks, coded in VHDL

Table 1 Timing values
Table 2 Area values

6 Simulations

Simulations were done using the netlist generated by the synthesis; that is, gate models with area and delay constraints were included. For a system clock frequency of fclk = 50 MHz, the reset-to-output time (trst = 120 ns) and the setup time (tsu = 400 ps) are shown in Figs. 12 and 13, respectively. For the same clock frequency, the settling time (Ts = 3 μs) is shown in Fig. 14.

Fig. 12 Reset-to-output delay trst = 120 ns

Fig. 13 Setup time delay tsu = 400 ps

Fig. 14 Settling time Ts = 3 μs

For the simulation of the entire system, however, we set both the clock frequency and the sampling frequency to 16 MHz. The FM deviation is ±10 kHz at a 1 MHz center frequency, that is, ±1.0% of the carrier frequency. Figure 15 shows the simulation waveform for the all-digital FM receiver circuit subjected to square-wave modulated data, while Fig. 16 shows the waveform for triangular-wave modulated data. In both figures, the first row shows the FM-modulated waveform corresponding to the data sent (in(t), the input waveform). The second row is the NCO output (ref(t)) and the third row is the phase detector (multiplier) output (c(t)). The fourth and fifth rows are the accumulator output and the demodulated output, respectively. In the initial phase of the simulation, the demodulated output overshoots because the phase synchronization is still converging. From Figs. 15 and 16, we conclude that the designed FM receiver circuit successfully demodulates the input signal back to the original signal.

Fig. 15 Simulation waveform of the circuit, subjected to square wave modulated input signal; Plot from ModelSim SE 6 “analog view”

Fig. 16 Simulation waveform of the circuit, subjected to triangular wave modulated input signal; Plot from ModelSim SE 6 “analog view”
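
For readers who wish to reproduce a comparable stimulus, the sketch below generates a square-wave FM test signal with the parameters stated above (16 MHz sampling, 1 MHz center frequency, ±10 kHz deviation). It is not the test bench used for Figs. 15 and 16; the function name `fm_square_stimulus` and the `half_period` parameter (the number of samples between data transitions, which the text does not specify) are assumptions.

```python
import math

F_CLK = 16.0e6    # clock and sampling frequency
F_CENTER = 1.0e6  # carrier (center) frequency
F_DEV = 10.0e3    # peak frequency deviation

def fm_square_stimulus(n_samples, half_period):
    """Return (samples, freqs) for FM driven by a +/-1 square wave.

    The instantaneous frequency toggles between F_CENTER + F_DEV and
    F_CENTER - F_DEV every `half_period` samples.
    """
    phase = 0.0  # accumulated phase, in cycles
    samples, freqs = [], []
    for n in range(n_samples):
        data = 1.0 if (n // half_period) % 2 == 0 else -1.0
        f_inst = F_CENTER + data * F_DEV
        samples.append(math.sin(2.0 * math.pi * phase))
        freqs.append(f_inst)
        # Integrating frequency keeps the phase continuous across transitions.
        phase = (phase + f_inst / F_CLK) % 1.0
    return samples, freqs

# Relative deviation of the carrier: 10 kHz / 1 MHz = 1 %.
relative_deviation = F_DEV / F_CENTER
```

Accumulating the phase, rather than recomputing it from the instantaneous frequency at each sample, avoids phase jumps at the data transitions, which a real FM transmitter would not produce.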

7 Conclusions

A digital FM demodulator based on a second-order all-digital phase-locked loop has been described and implemented digitally in a simple way. The architecture is built from the ground up using digital techniques that exploit the high speed, high density, and high flexibility of Field Programmable Gate Arrays (FPGAs). In our case, demodulation requires only about 15 K logic gates operating at 150 MHz. Although our straightforward circuit is restricted to one such application, the fine tuning of each component demonstrates that reprogrammability allows parameters to be modified without direct hardware interaction. Thus, the circuit can be reconfigured to demodulate other transmission standards at completely different frequencies, and to do so in parallel. This flexibility addresses what is undoubtedly the main constraint in today’s SDR systems.