# A Low-Cost Implementation of High-Order Square M-QAM Detection/Demodulation in a FPGA Device

Gléverson Fabner Condé Lemos, Marcos Vinícius Silva Oliveira, Fabrício P. V. de Campos

Luciano Manhães de Andrade Filho and Moisés Vidal Ribeiro

Signal Processing and Telecommunication Laboratory

Federal University of Juiz de Fora

Juiz de Fora, Minas Gerais, Brazil

Email: gleversonlemos@ieee.org, marcos.oliveira@engenharia.ufjf.br, fcampos@lps.ufrj.br

luciano.andrade@engenharia.ufjf.br, mribeiro@ieee.org

Abstract— This paper outlines a low-cost technique for detection/demodulation of high-order square *M*-ary quadrature amplitude modulation (M-QAM), M = 4, 16, 64, 256, 1024, and 4096, in a field programmable gate array (FPGA) device, which makes it attractive for advancing high-speed power line communication technology. The proposed technique is based on a so-called heuristic decision region approach combined with a nonlinear function that simplifies the hardware implementation of detection and demodulation of high-order square M-QAM symbols. The performance of the proposed technique is evaluated for channels with additive white Gaussian noise (AWGN) and additive impulsive Gaussian noise (AIGN). The attained results, in terms of bit error rate vs. energy per bit to noise power spectral density ratio $(E_b/N_0)$, indicate that the proposed technique incurs only small performance losses. Also, among the evaluated detectors, its implementation in an FPGA device demands the fewest hardware resources and the lowest latency.

### I. INTRODUCTION

Currently, the development of spectrally efficient *power line communication* (PLC) systems for broadband applications in low- and medium-voltage electric power grids demands the use of high-order digital modulation techniques, such as *M*-ary *quadrature amplitude modulation* (M-QAM), capable of transmitting a large number of bits per symbol. Consequently, the detection process in the demodulator, which carries a computational cost, must be carefully considered when designing such systems.

Although the *maximum likelihood* (ML) criterion is an optimum detection technique for *additive white Gaussian noise* (AWGN) channels with equiprobable messages, it requires a large computational effort to calculate the Euclidean distance between the received symbol and every point of the QAM constellation [1].

One of the main challenges in modern digital communication systems is to reduce the computational complexity. In this regard, this paper outlines a low-cost detection/demodulation technique for high-order square M-QAM constellations.

The proposed technique is based on the so-called *heuristic decision region* (HDR) approach, which allows a considerable reduction of hardware resources in an FPGA device [2]. Additionally, it makes use of a nonlinear function that simplifies the implementation of the HDR approach with finite precision (i.e., unsigned fixed-point).

The performance of the proposed technique is analyzed when the signal is corrupted by the presence of background AWGN and *additive impulsive Gaussian noise* (AIGN) [3]. The numerical results indicate that reduced performance losses are noted for M-QAM, M = 4, 16, 64, 256, 1024, and 4096. The FPGA implementation of the proposed technique is accomplished with *hardware description language* (HDL). The analysis of hardware resource demand reveals that the proposed technique is the one with the lowest hardware resource demand and latency.

This paper is organized as follows. Section II formulates the problem. The HDR approach is presented in Section III. Section IV describes the proposed detection strategies for high-order M-QAM used in the FPGA implementation. The simulation results and the computational burden analysis for the FPGA-based implementation of the proposed strategies are shown in Section V. Finally, Section VI presents the conclusions of this paper.

### II. PROBLEM FORMULATION

A modulation and demodulation scheme for digital communication is depicted in Fig. 1. In this figure, functions f(.) and g(.) implement the modulation and detection/demodulation techniques, respectively. The channel output, which is corrupted by additive noise, can be expressed by

$$\mathbf{y} = \mathbf{x}_i + \mathbf{v},\tag{1}$$

where **y** is the channel output vector; $\mathbf{x}_i$, i = 0, 1, ..., M - 1, is the *i*-th constellation point associated with the binary message $m_i$, whose length is $\log_2(M)$ bits; and **v** is the additive noise vector.

For this contribution, we assume that $\mathbf{x}_i$ can be a point in a square *M*-QAM constellation, such that $M = 2^n$, n = 2, 4, 6, 8, 10, 12. Additionally, a Gray code similar to the one employed in the IEEE 802.16 standard [4] is considered. Fig. 2 presents the Gray-mapped constellation for 16-QAM, in which the least significant bit is $b_0$ and the most significant bit is $b_3$. The remaining constellations can be generated using the same rule.



Fig. 2: 16-QAM constellation map [4].

In the system model, $g(.) = [f(.)]^{-1}$ if $\mathbf{v} = 0$. However, if $\mathbf{v}$ is a random variable, the channel is AWGN, $E\{\mathbf{x}_i\} = 0$, and $\mathbf{x}_i$ is *i.i.d.* (independent and identically distributed), then the detector that provides an optimal solution in the ML sense is expressed by

$$\hat{m}_i \Rightarrow m_i \text{ if } \|\mathbf{y} - \mathbf{x}_i\|^2 \le \|\mathbf{y} - \mathbf{x}_j\|^2, \forall i, j = 0, 1, ..., M - 1 \tag{2}$$

The ML detector for AWGN channels has an intuitive physical interpretation: the decision $\hat{m}_i$ corresponds to the symbol $\mathbf{x}_i$ that is nearest to the channel output vector in terms of Euclidean distance. If the constellation size is small, the evaluation of (2) presents very low computational complexity. The same cannot be said when the constellation is large, since (2) requires one distance computation per constellation point for every received symbol; e.g., a new generation of PLC modems will make use of a $2^{15}$-QAM constellation [5]. As a result, the low-cost implementation of high-order QAM modulation/demodulation needs to be addressed in order to make high-speed PLC modems commercially feasible.
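As a point of reference, the exhaustive search implied by (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation evaluated in this paper: the helper names `square_qam` and `ml_detect` are hypothetical, the grid spacing is set to d = 2, and no bit mapping is applied.

```python
import math

def square_qam(M, d=2.0):
    """Points of a square M-QAM grid with spacing d, centered at the origin."""
    side = math.isqrt(M)
    if side * side != M:
        raise ValueError("M must be a perfect square")
    levels = [(k - (side - 1) / 2) * d for k in range(side)]
    return [complex(re, im) for im in levels for re in levels]

def ml_detect(y, points):
    """Index i minimizing ||y - x_i||^2, i.e. the rule in (2)."""
    return min(range(len(points)), key=lambda i: abs(y - points[i]) ** 2)

points = square_qam(16)               # levels -3, -1, 1, 3 on each axis
nearest = points[ml_detect(0.8 + 2.7j, points)]
print(nearest)                        # the grid point closest to 0.8 + 2.7j
```

Each call to `ml_detect` evaluates M squared distances, which is the cost that the remainder of the paper seeks to avoid.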

Usually, modulation and demodulation techniques reduce to mapping and detection/de-mapping procedures, respectively. Then, the introduction of a low-cost detection technique that works together with the demodulation is a timely and important issue to be considered. In this context, the following questions can arise:

i) How to implement the detection technique in (2) with low computational complexity in an FPGA device?

ii) What kind of performance losses can be noted if the low-cost implementation of (2) in an FPGA device is applied to demodulate M-QAM symbols corrupted by AWGN or AIGN?

The analysis of AIGN is part of question (ii) because the additive noise in the PLC channel can be modeled as [6]

$$\mathbf{v} = \mathbf{v}_{bkgr} + \mathbf{v}_{nb} + \mathbf{v}_{pa} + \mathbf{v}_{ps} + \mathbf{v}_{imp}, \tag{3}$$

in which  $\mathbf{v}_{bkgr}$  is the background noise,  $\mathbf{v}_{nb}$  is a narrow band noise,  $\mathbf{v}_{pa}$  is a periodical impulsive noise asynchronous to the fundamental component of power system,  $\mathbf{v}_{ps}$  is a periodic impulsive noise synchronous to the fundamental component of power system, and, finally,  $\mathbf{v}_{imp}$  is an asynchronous impulsive noise, which is the hardest one due to its unpredictability and high power.

In this contribution, it is assumed that

$$\mathbf{v} = \mathbf{v}_{bkgr} + \mathbf{v}_{ps} + \mathbf{v}_{imp},\tag{4}$$

because $\mathbf{v}_{nb}$ can be mitigated and $\mathbf{v}_{pa} + \mathbf{v}_{ps}$ can be considered as one component. In this model, for AIGN, we assume that $\mathbf{v}_{bkgr} \sim \mathcal{N}(0, \sigma^2)$, a Gaussian random variable with zero mean and variance $\sigma^2$, represents the colored background noise; $\mathbf{v}_{ps} \sim \mathcal{N}(0, K_1 \sigma^2)$ denotes a periodic component with arrival interval $t_{arr,ps} = 1/(2f_0)$ s, in which $f_0$ is the fundamental frequency of the power system, and duration $t_{w,ps} = 100 \ \mu s$, located at $n/(2f_0)$, $n = 0, 1, 2, \dots$; and $\mathbf{v}_{imp} \sim \mathcal{N}(0, K_2 \sigma^2)$ represents an impulsive component with interarrival time $t_{arr,imp}$ modeled as an exponential random variable with mean equal to 100 ms and duration $t_{w,imp} = 100 \ \mu s$. The constants $K_1$ and $K_2$ are specified in order to generate noises with different levels of severity. This model is capable of representing PLC scenarios with high severity, since the impulsive noise is modeled as white and Gaussian, which makes it possible to emulate the worst-case scenario. In all the simulation results shown in this work, $K_1 = K_2 = 20$ dB.
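A sample-level generator for the noise in (4) could look like the sketch below. Several choices are assumptions made only for this example and are not stated in the paper: the function name `aign_samples`, the sampling rate `fs`, a mains frequency of 60 Hz, real-valued samples, and interarrival times measured from the start of one asynchronous burst to the start of the next.

```python
import math
import random

def aign_samples(n, fs, sigma=1.0, k1_db=20.0, k2_db=20.0, f0=60.0,
                 t_w=100e-6, mean_arr_imp=100e-3, seed=0):
    """n real-valued noise samples at rate fs (Hz) following the model in (4)."""
    rng = random.Random(seed)
    s_ps = sigma * math.sqrt(10 ** (k1_db / 10))     # std of v_ps, variance K1*sigma^2
    s_imp = sigma * math.sqrt(10 ** (k2_db / 10))    # std of v_imp, variance K2*sigma^2
    t_arr_ps = 1.0 / (2.0 * f0)                      # one burst per half mains cycle
    next_imp = rng.expovariate(1.0 / mean_arr_imp)   # start of next asynchronous burst
    out = []
    for k in range(n):
        t = k / fs
        v = rng.gauss(0.0, sigma)                    # background noise, always present
        if (t % t_arr_ps) < t_w:                     # inside a periodic burst
            v += rng.gauss(0.0, s_ps)
        while t >= next_imp + t_w:                   # burst is over: draw the next start
            next_imp += rng.expovariate(1.0 / mean_arr_imp)
        if t >= next_imp:                            # inside an asynchronous burst
            v += rng.gauss(0.0, s_imp)
        out.append(v)
    return out

noise = aign_samples(20000, fs=1e6)   # 20 ms of noise sampled at 1 MHz
```

With $K_1 = K_2 = 20$ dB, the burst variance is 100 times the background variance, so the samples inside a 100 $\mu s$ window dominate the noise power.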

Section III details an efficient and low-cost implementation of (2) in an FPGA device, which offers directions for answering the first question. Directions for answering the second question are presented in Section V.

### III. HEURISTIC DECISION REGIONS

According to the theory, the decision region using a ML detector for each message  $m_i$ , i = 0, 1, 2, ..., M - 1, for a square M-QAM modulator is defined by

$$D_{i} \stackrel{\Delta}{=} \left\{ \mathbf{y} | p_{\mathbf{y} | \mathbf{x}} \left( \mathbf{y} | i \right) \ge p_{\mathbf{y} | \mathbf{x}} \left( \mathbf{y} | j \right), \forall j \neq i \right\}, \tag{5}$$

where $p_{\mathbf{y}|\mathbf{x}}(\mathbf{y}|i)$ is the conditional probability density of the channel output. The decision rule is provided by (2) and demands a huge computational effort for high-order *M*-QAM constellations. Another decision region is defined as follows:

Definition: A heuristic decision region (HDR) for the detection of each message $m_i$, i = 0, 1, 2, ..., M - 1, for a square M-QAM modulation is defined by

$$D_{i,j} \triangleq \begin{cases} D_{i,j}^* & \text{if } i, j < 0\\ D_{i+1,j}^* & \text{if } i \ge 0, j < 0\\ D_{i,j+1}^* & \text{if } i < 0, j \ge 0\\ D_{i+1,j+1}^* & \text{if } i \ge 0, j \ge 0 \end{cases}, \tag{6}$$

where

$$D_{i,j}^{*} = \begin{cases}
\{\mathbf{y} \mid id \leq \Re(\mathbf{y}) < (i+1)d \text{ and } jd \leq \Im(\mathbf{y}) < (j+1)d\}, & -\frac{\sqrt{M}}{2} < i, j < \frac{\sqrt{M}}{2} - 1 \\
\{\mathbf{y} \mid id \leq \Re(\mathbf{y}) \text{ and } jd \leq \Im(\mathbf{y}) < (j+1)d\}, & i = \frac{\sqrt{M}}{2} - 1,\ -\frac{\sqrt{M}}{2} < j < \frac{\sqrt{M}}{2} - 1 \\
\{\mathbf{y} \mid id \leq \Re(\mathbf{y}) < (i+1)d \text{ and } jd \leq \Im(\mathbf{y})\}, & -\frac{\sqrt{M}}{2} < i < \frac{\sqrt{M}}{2} - 1,\ j = \frac{\sqrt{M}}{2} - 1 \\
\{\mathbf{y} \mid \Re(\mathbf{y}) < (i+1)d \text{ and } jd \leq \Im(\mathbf{y}) < (j+1)d\}, & i = -\frac{\sqrt{M}}{2},\ -\frac{\sqrt{M}}{2} < j < \frac{\sqrt{M}}{2} - 1 \\
\{\mathbf{y} \mid id \leq \Re(\mathbf{y}) < (i+1)d \text{ and } \Im(\mathbf{y}) < (j+1)d\}, & -\frac{\sqrt{M}}{2} < i < \frac{\sqrt{M}}{2} - 1,\ j = -\frac{\sqrt{M}}{2} \\
\{\mathbf{y} \mid id \leq \Re(\mathbf{y}) \text{ and } jd \leq \Im(\mathbf{y})\}, & i = j = \frac{\sqrt{M}}{2} - 1 \\
\{\mathbf{y} \mid \Re(\mathbf{y}) < (i+1)d \text{ and } \Im(\mathbf{y}) < (j+1)d\}, & i = j = -\frac{\sqrt{M}}{2} \\
\{\mathbf{y} \mid id \leq \Re(\mathbf{y}) \text{ and } \Im(\mathbf{y}) < (j+1)d\}, & i = \frac{\sqrt{M}}{2} - 1,\ j = -\frac{\sqrt{M}}{2} \\
\{\mathbf{y} \mid \Re(\mathbf{y}) < (i+1)d \text{ and } jd \leq \Im(\mathbf{y})\}, & i = -\frac{\sqrt{M}}{2},\ j = \frac{\sqrt{M}}{2} - 1
\end{cases} \tag{7}$$

$\Re(.)$ and $\Im(.)$ denote the real and imaginary components of **y**, respectively, and *d* is the minimum distance between adjacent constellation points, as defined in digital communication theory. Fig. 3 illustrates the decision regions provided by (6) for the 16-QAM constellation.



Fig. 3: Heuristic decision regions for 16-QAM.

Despite the complexity of its formulation, the HDR approach has a considerable impact on the implementation of M-QAM in an FPGA device. Essentially, addition, subtraction and multiplication operations are replaced by comparison operations, which demand lower hardware complexity than the former operations.
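The replacement of arithmetic by comparisons can be illustrated with a short Python sketch. It is a simplified model under stated assumptions rather than the HDL used in this work: the names `hdr_axis_index` and `hdr_detect` are hypothetical, the spacing is d = 2, and a zero-based tile index (0 to $\sqrt{M} - 1$ per axis) is used instead of the signed indices of the definition. The thresholds are constants, so in hardware they would be fixed at design time and only the comparisons would remain.

```python
import math

def hdr_axis_index(x, side, d=2.0):
    """Zero-based tile index along one axis, obtained with comparisons only."""
    idx = 0
    for k in range(1, side):
        threshold = (k - side / 2) * d    # constant boundary between tiles k-1 and k
        if x >= threshold:
            idx = k
    return idx

def hdr_detect(y, M, d=2.0):
    """Tile indices (i, j) for the real and imaginary parts of y."""
    side = math.isqrt(M)
    return hdr_axis_index(y.real, side, d), hdr_axis_index(y.imag, side, d)

print(hdr_detect(0.8 + 2.7j, 16))     # (2, 3)
```

Values beyond the outermost thresholds fall into the edge tiles, which corresponds to the unbounded border regions of the definition.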

Additionally, variations of the HDR can be devised by taking the ML detector into account. For instance, a mixed ML + HDR detector can be designed if 2d replaces d and the ML criterion is applied in each four-point constellation defined by the HDR.
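One possible reading of this mixed detector is sketched below in Python. The function name `hdr4_ml_detect`, the zero-based grid indices, and the spacing d = 2 are assumptions made for the example; the sketch first selects a 2d-wide tile by comparisons and then applies the ML rule to the four points inside that tile.

```python
import math

def hdr4_ml_detect(y, M, d=2.0):
    """Coarse HDR with 2d-wide tiles, then ML among the four points of the tile."""
    side = math.isqrt(M)
    half = side // 2                              # coarse tiles per axis

    def coarse(x):
        idx = 0
        for k in range(1, half):
            if x >= (k - half / 2) * 2 * d:       # boundaries spaced 2d apart
                idx = k
        return idx

    def level(k):
        return (k - (side - 1) / 2) * d           # amplitude of grid index k

    ci, cj = coarse(y.real), coarse(y.imag)
    candidates = [(2 * ci + a, 2 * cj + b) for a in (0, 1) for b in (0, 1)]
    return min(candidates,
               key=lambda p: abs(y - complex(level(p[0]), level(p[1]))) ** 2)

print(hdr4_ml_detect(0.8 + 2.7j, 16))   # grid indices of the detected point
```

Only four distances are evaluated per symbol, regardless of M, at the price of the comparators needed for the coarse tile search.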

In the hardware description for an FPGA implementation of HDR, the detector is defined by comparators that evaluate the input vector in order to assign it to one of the predetermined detection regions; each detected point is then associated with an output through a *look-up table* (LUT). The detection and demodulation functions can be combined into a single block, since their functionalities can be merged into a single LUT, which minimizes resource usage.
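The idea of merging detection and demapping into one table can be sketched as follows. The function `build_lut` is hypothetical, and the bit assignment is a plain binary-reflected Gray code applied per axis, which is an assumption for illustration and not necessarily the IEEE 802.16 mapping of Fig. 2.

```python
import math

def build_lut(M):
    """Map tile indices (i, j) to a log2(M)-bit word: I bits high, Q bits low."""
    side = math.isqrt(M)
    bits_axis = side.bit_length() - 1            # log2(side) for power-of-two sides

    def gray(k):
        return k ^ (k >> 1)                      # binary-reflected Gray code

    return {(i, j): (gray(i) << bits_axis) | gray(j)
            for i in range(side) for j in range(side)}

lut = build_lut(16)
print(format(lut[(2, 3)], "04b"))                # bits assigned to tile (2, 3)
```

Once the comparators have produced the tile indices, a single table access returns the demodulated bits, so no separate demapping stage is required.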

Even for the same detection technique, different implementations are possible, and they require different amounts of hardware resources. Thus, the HDR technique was also implemented using state machines in order to take advantage of the optimizations that modern synthesis tools apply to this kind of hardware description for FPGA devices. Coding the *hardware description language* (HDL) properly, so that the synthesis tool recognizes a piece of code as a state machine, allows the tool to recode the state variables to improve the quality of results and to use the known properties of state machines to optimize other parts of the design. When a synthesis tool recognizes a state machine, it is often able to improve the design area and performance [7].

State machines offer an interesting and powerful alternative for implementing the HDR approach. Since the region scanning is implemented sequentially, each quadrant check can be defined as a different state. Fig. 4 illustrates the state diagram of the in-phase (I) and quadrature (Q) components implemented for detecting 16-QAM. In this figure, the distance d between consecutive points is chosen to be equal to 2, the states are represented by circles, and the transitions between states represent the symbol detection. For instance, if a received symbol has a quadrature component greater than unity, the machine moves to the $x > 0$ state.



Fig. 4: State machine for 16-QAM detection.

The system starts in an idle state and waits for valid data. Once the input is valid, the machine changes its current state according to the input value until it reaches one of the four final states. Each of these states generates a different output value and returns the machine to the idle state in order to perform a new iteration.
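The behavior described above can be mimicked in software for one component (I or Q) of 16-QAM. The state names and the function `fsm_detect_axis` are hypothetical and chosen only for this sketch; the actual states of Fig. 4 may be labeled and arranged differently.

```python
# Output amplitude of each final state, in units of d (hypothetical state names).
FINAL = {"OUTER_NEG": -1.5, "INNER_NEG": -0.5, "INNER_POS": 0.5, "OUTER_POS": 1.5}

def fsm_detect_axis(x, d=2.0):
    """Walk one 16-QAM component through idle -> sign check -> final state."""
    state = "IDLE"
    trace = [state]
    while state not in FINAL:
        if state == "IDLE":
            state = "POS" if x >= 0 else "NEG"               # first check: sign
        elif state == "POS":
            state = "OUTER_POS" if x >= d else "INNER_POS"   # second check: magnitude
        else:  # state == "NEG"
            state = "OUTER_NEG" if x < -d else "INNER_NEG"
        trace.append(state)
    return FINAL[state] * d, trace

level, trace = fsm_detect_axis(2.7)
print(level, trace)
```

Each transition uses a single comparison, so a decision for one component is reached after two transitions from the idle state.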

### IV. THE PROPOSED TECHNIQUE

The proposed modulation and demodulation/detection technique for M-QAM modulation is drawn in Fig. 5.

$$m_i \rightarrow f(.) \rightarrow \mathbf{x}_i \rightarrow \text{channel} \rightarrow \mathbf{y} \rightarrow \mathfrak{F}(.) \rightarrow \mathbf{y}_q \rightarrow g(.) \rightarrow \hat{m}_i$$

Fig. 5: System model for the proposed technique for M-QAM detection/demodulation.

In the proposed technique, the block $\mathfrak{F}(.)$ performs the clipping, offset and quantization operations, and the block g(.) implements the HDR approach to detect *M*-QAM symbols.

The clipping operation is applied to restrict $\mathbf{y}$ to one of the square tiles defined by the HDR approach. The offset/translation operation is needed because, in this context, it is more efficient to work with unsigned integers than with signed ones, since one bit can be saved. The quantization of the real and imaginary components of $\mathbf{y}$ allows the FPGA to reduce the computational complexity of the detection process. In fact, the comparators of the HDR implementation in an FPGA device work with the number of bits used in the quantization process. The output of the $\mathfrak{F}(.)$ block is expressed by

$$\mathbf{y}_d = \mathfrak{F}(\Re(\mathbf{y}), \Delta_s, K_q) + j\mathfrak{F}(\Im(\mathbf{y}), \Delta_s, K_q), \qquad (8)$$

where  $\Delta_s$  denotes the step-size for amplitude quantization,  $K_q$  the value for offset/translation, and  $\mathfrak{F}(.)$  is the function that implements the clipping, offset and quantization operations.

We assume that

$$K_q = \frac{2^N - 1}{2}, \tag{9}$$

in which N is the number of bits applied to quantize  $\Re(\mathbf{y})$  or  $\Im(\mathbf{y})$ . Also, the step-size is given by

$$\Delta_s = \frac{\left(\sqrt{M} - 1\right)}{2^N - 1}d,\tag{10}$$

where d is the minimum distance (in one direction) between two closest points of the M-QAM constellation and it is expressed by

$$d = \sqrt{\frac{6E_{\mathbf{x}}}{M-1}},\tag{11}$$

in which  $E_{\mathbf{x}}$  is the *M*-QAM symbol energy.
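Expression (11) can be checked numerically by averaging the energy of all points of a square grid with spacing d. The sketch below assumes that $E_{\mathbf{x}}$ denotes the average symbol energy and that the points are equiprobable; the function names are hypothetical.

```python
import math

def min_distance(M, Ex=1.0):
    """Minimum distance d from (11) for average symbol energy Ex."""
    return math.sqrt(6.0 * Ex / (M - 1))

def average_energy(M, d):
    """Average energy of a square M-QAM grid with spacing d, centered at 0."""
    side = math.isqrt(M)
    levels = [(k - (side - 1) / 2) * d for k in range(side)]
    return sum(a * a + b * b for a in levels for b in levels) / M

for M in (4, 16, 64, 256, 1024, 4096):
    print(M, average_energy(M, min_distance(M)))   # close to 1.0 for every M
```

For example, for 16-QAM the per-axis levels are $\pm d/2$ and $\pm 3d/2$, so the average energy is $2.5d^2$, which equals $E_{\mathbf{x}}$ when $d^2 = 6E_{\mathbf{x}}/15$.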

Now  $\mathfrak{F}(\mathbf{z}, \Delta_s, K_q)$  can be expressed by

$$\mathfrak{F}(\mathbf{z}, \Delta_s, K_q) = \begin{cases} 0, & \frac{\mathbf{z}}{\Delta_s} < -K_q \\ \left\lfloor \frac{\mathbf{z}}{\Delta_s} + K_q + \frac{1}{2} \right\rfloor, & \frac{|\mathbf{z}|}{\Delta_s} \leq K_q \\ 2K_q, & \frac{\mathbf{z}}{\Delta_s} > K_q \end{cases} \tag{12}$$

where $\lfloor . \rfloor$ is the floor function, which returns the greatest integer smaller than or equal to its argument, and $\mathbf{z} \in \mathbb{R}$ is the input variable.
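A direct software transcription of (9), (10) and (12) is given below as a minimal sketch. The function names are hypothetical, and the example values (16-QAM, N = 4 bits, d = 2) are chosen only for illustration.

```python
import math

def quantizer_params(M, N, d):
    """Offset K_q from (9) and step size Delta_s from (10)."""
    k_q = (2 ** N - 1) / 2
    delta_s = (math.isqrt(M) - 1) * d / (2 ** N - 1)
    return k_q, delta_s

def clip_offset_quantize(z, delta_s, k_q):
    """Clipping, offset and quantization of one real component, as in (12)."""
    u = z / delta_s
    if u < -k_q:
        return 0                        # clipped at the lower end
    if u > k_q:
        return int(2 * k_q)             # clipped at the upper end, 2^N - 1
    return math.floor(u + k_q + 0.5)    # offset, then round to the nearest level

k_q, delta_s = quantizer_params(M=16, N=4, d=2.0)
codes = [clip_offset_quantize(z, delta_s, k_q) for z in (-100, -3, -1, 0, 1, 3, 100)]
print(codes)                            # [0, 0, 5, 8, 10, 15, 15]
```

With these values, $K_q = 7.5$ and $\Delta_s = 0.4$, so the outermost constellation levels $\pm 3$ map to the ends of the unsigned 4-bit range and any larger amplitude is clipped to the same codes.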

In the implemented system, the signal entering the receiver is represented in 32-bit floating point in order to approximate data with infinite precision. The detector receives the data as 32-bit unsigned fixed-point integers defined by the quantization process.

### V. NUMERICAL RESULTS

In the simulations carried out to verify the performance of the proposed technique, the energy per bit to noise power spectral density ratio $(E_b/\mathcal{N}_0)$ shown in the following figures represents the ratio between the energy of the transmitted bit and the PSD of the background noise, $\mathbf{v}_{bkgr}$, under both the AWGN and AIGN assumptions; $E_b = 0$ dB and $\mathcal{N}_0 = \sigma_{bkgr}^2$. $E_b/\mathcal{N}_0$ values range from 0 dB to the point where the bit error rate (BER) reaches $10^{-6}$. The $2^n$-QAM constellations, n = 2, 4, 6, 8, 10, 12, are taken into account. For each point of the BER $\times E_b/\mathcal{N}_0$ curves, the simulation stops once at least 150 errors have been counted. Table I lists the different configurations of the proposed techniques that are analyzed in this section.
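The stopping rule of at least 150 errors per point can be illustrated with a small Monte Carlo sketch for the AWGN case. Everything below is an assumption made for the example and not the simulation code used for the figures: the function name `simulate_ber`, unit average symbol energy, a per-dimension noise variance of $\mathcal{N}_0/2$, a binary-reflected Gray code per axis, and a tile decision computed with a floor operation, which yields the same decision as comparing against the tile boundaries.

```python
import math
import random

def simulate_ber(M, ebn0_db, min_errors=150, seed=1):
    """Estimate the BER of square M-QAM in AWGN, stopping after min_errors errors."""
    rng = random.Random(seed)
    side = math.isqrt(M)
    k_axis = side.bit_length() - 1                     # bits carried by each axis
    d = math.sqrt(6.0 / (M - 1))                       # (11) with unit symbol energy
    n0 = 1.0 / (2 * k_axis * 10 ** (ebn0_db / 10))     # Eb = 1 / log2(M)
    sigma = math.sqrt(n0 / 2)                          # noise std per dimension

    def gray(k):
        return k ^ (k >> 1)

    errors = bits = 0
    while errors < min_errors:
        for _ in range(2):                             # I and Q are treated alike
            tx = rng.randrange(side)
            x = (tx - (side - 1) / 2) * d + rng.gauss(0.0, sigma)
            rx = min(side - 1, max(0, math.floor(x / d + side / 2)))
            errors += bin(gray(tx) ^ gray(rx)).count("1")
            bits += k_axis
    return errors / bits

print(simulate_ber(4, 4.0))    # 4-QAM at Eb/N0 = 4 dB
```

Stopping on a fixed error count keeps the relative accuracy of the estimate roughly constant along the curve, at the cost of longer runs at high $E_b/\mathcal{N}_0$.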

TABLE I: The description of the proposed detection techniques for M-QAM symbols.

| Technique   | Technique Description                                           |
|-------------|-----------------------------------------------------------------|
| HDR         | HDR detection                                                   |
| ML          | ML detection                                                    |
| HDR/4-ML    | HDR detection followed by ML detection in 4-point regions.      |
| HDR/16-ML   | HDR detection followed by ML detection in 16-point regions.     |
| HDR/64-ML   | HDR detection followed by ML detection in 64-point regions.     |
| HDR/256-ML  | HDR detection followed by ML detection in 256-point regions.    |
| HDR/1024-ML | HDR detection followed by ML detection in 1024-point regions.   |

The performance comparison between these configurations of the proposed technique and the traditional ML-based detector is illustrated in Figs. 6-11, in which the performance of the system in the presence of AWGN is always better than in the presence of AIGN. For the sake of comparison, these figures also include the theoretical curves obtained for M-QAM when the channel is AWGN. Based on these simulation results, one can note that the proposed techniques incur small performance losses whether the channel is AWGN or AIGN. The largest performance losses in comparison with the theoretical curve for AWGN, which do not exceed 0.15 dB, are presented in Table II. In terms of hardware implementation, it can be concluded that none of the configurations of the proposed technique yields results that undermine the communication system performance.
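For readers who wish to reproduce a reference curve, a commonly used nearest-neighbor approximation for the BER of Gray-mapped square M-QAM in AWGN is sketched below. The paper does not state which expression was used for its theoretical curves, so this formula and the function names are assumptions for illustration; the approximation is exact for 4-QAM and tightens as $E_b/N_0$ grows for larger M.

```python
import math

def qfunc(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_square_qam_awgn(M, ebn0_db):
    """Approximate BER: (4/k)(1 - 1/sqrt(M)) Q(sqrt(3k/(M-1) * Eb/N0)), k = log2(M)."""
    k = math.log2(M)
    g = 10 ** (ebn0_db / 10)
    return (4.0 / k) * (1.0 - 1.0 / math.sqrt(M)) * qfunc(math.sqrt(3.0 * k * g / (M - 1)))

for M in (4, 16, 64, 256, 1024, 4096):
    print(M, ber_square_qam_awgn(M, 20.0))
```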

TABLE II: Performance losses due to the use of the proposed detection strategies for the implementation of M-QAM. The losses are related to the theoretical curves for AWGN.

| Modulation | Largest Variation | Detection Technique |
|------------|-------------------|---------------------|
| 4-QAM      | 0.03 dB           | HDR                 |
| 16-QAM     | 0.03 dB           | HDR and HDR/4-ML    |
| 64-QAM     | 0.08 dB           | HDR                 |
| 256-QAM    | 0.15 dB           | HDR                 |
| 1024-QAM   | 0.03 dB           | HDR/256-ML          |
| 4096-QAM   | 0.11 dB           | HDR/4-ML            |

Table III presents a comparison of the number of logic cells (LC) and DSP blocks (DSP), the memory used (Mem.), the maximum restricted operating frequency ($F_{max}$) and the system latency for the 16-QAM implementation. The data were collected for the worst case on an Altera Stratix III EP3SL150F1152C2 FPGA device [8].

TABLE III: Attained results with the implementation of the proposed strategies for 16-QAM detection in the Altera Stratix III EP3SL150F1152C2 device.

| Technique     | LC  | DSP | Mem. | $F_{max}$  | Latency        |
|---------------|-----|-----|------|------------|----------------|
| HDR           | 56  | 0   | 0    | 500 MHz    | 1 clock cycle  |
| State Machine | 64  | 0   | 0    | 500 MHz    | 3 clock cycles |
| ML            | 409 | 32  | 0    | 226.35 MHz | 4 clock cycles |
| HDR/4-ML      | 331 | 64  | 0    | 204.04 MHz | 4 clock cycles |



Fig. 6: Performance comparison for the proposed models - 4-QAM constellation.

Fig. 7: Performance comparison for the proposed models - 16-QAM constellation.

Fig. 8: Performance comparison for the proposed models - 64-QAM constellation.

Based on the values presented in Table III, it can be noted that the HDR configuration of the proposed technique demands the lowest hardware resources and latency when implemented in an FPGA device. The HDR configuration was also implemented in the FPGA for the other M-QAM constellations. The hardware resources and the latency of these implementations are presented in Table IV. One can note that the HDR configuration demands only logic cells of the FPGA, and that the number of logic cells per constellation point goes from 1 for 4-QAM to nearly 1/5 for 4096-QAM.
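The ratio of logic cells to constellation points follows directly from the LC column of Table IV, as the short computation below shows. The variable names are chosen only for this example.

```python
# Logic cells (LC) reported in Table IV for the HDR configuration.
logic_cells = {4: 4, 16: 56, 64: 148, 256: 315, 1024: 493, 4096: 847}

per_point = {M: lc / M for M, lc in logic_cells.items()}
for M, ratio in per_point.items():
    print(f"{M:>4}-QAM: {ratio:.3f} logic cells per constellation point")
```

The ratio equals 1 for 4-QAM, peaks at 3.5 for 16-QAM, and then decreases steadily to about 0.207 for 4096-QAM, which is the value of nearly 1/5 quoted above.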

### VI. CONCLUSION

This paper discussed the digital design, simulation, and FPGA implementation of a low-cost high-order square M-QAM detection/demodulation technique, using Altera design tools and simulation software.

Fig. 9: Performance comparison for the proposed models - 256-QAM constellation.

Fig. 10: Performance comparison for the proposed models - 1024-QAM constellation.

TABLE IV: Attained results with the implementation of the HDR configuration of proposed technique for M-QAM detection in the Altera Stratix III EP3SL150F1152C2 device.

| Modulation | LC  | Mem. | $F_{max}$ | Latency       |
|------------|-----|------|-----------|---------------|
| 4-QAM      | 4   | 0    | 500 MHz   | 1 clock cycle |
| 16-QAM     | 56  | 0    | 500 MHz   | 1 clock cycle |
| 64-QAM     | 148 | 0    | 500 MHz   | 1 clock cycle |
| 256-QAM    | 315 | 0    | 500 MHz   | 1 clock cycle |
| 1024-QAM   | 493 | 0    | 500 MHz   | 1 clock cycle |
| 4096-QAM   | 847 | 0    | 500 MHz   | 1 clock cycle |

Fig. 11: Performance comparison for the proposed models - 4096-QAM constellation.

A comparative analysis of symbol detection with the ML criterion and with HDR was provided. Strategies for the joint use of both methods were also proposed in order to reduce the computational complexity of this process. The analysis of the resources demanded by these schemes showed that there are alternative ways to achieve satisfactory levels of performance with reduced computational burden.

In addition, the computer simulations with different symbol detection strategies showed only minimal performance degradation with respect to the theoretical curves of each modulation scheme considered. Since the largest difference between the performance curves, in terms of BER $\times E_b/N_0$, is relatively small, any of the detection alternatives presented would lead to satisfactory demodulation results. Thus, the best results can be attributed to the strategy with the lowest resource demand, that is, the HDR approach.

### ACKNOWLEDGMENT

The authors would like to thank FINEP, INERGE, CNPq, FAPEMIG, and CAPES for their financial support.

### REFERENCES

- [1] J. G. Proakis, *Digital Communications*, 3rd ed. McGraw-Hill International Editions, 1995.
- [2] U. Meyer-Bäse, Digital Signal Processing with Field Programmable Gate Arrays, 3rd ed. Springer, 2007.
- [3] F. P. V. de Campos and M. V. Ribeiro, "Performance analysis of clustered-OFDM system with bitloading algorithm for broadband PLC," in *Proc. IEEE International Symposium on Power Line Communications and Its Applications*, Apr. 2008, pp. 345–350.
- [4] IEEE, "Standards for local and metropolitan areas networks part 16: Air interface for fixed broadband wireless access systems," IEEE Standard, 802.16, 2004.
- [5] A. Z. A. M. Tonello, P. Siohan and X. Mongaboure, "Challenges for 1 Gbps Power Line Communications in Home Networks," in *Proc. IEEE Personal Indoor Mobile Radio Communications Symposium*, Sep. 2008, pp. 1–6.
- [6] K. Dostert, Power Line Communications. Prentice Hall, 2001.
- [7] S. Brown and Z. Vranesic, *Fundamentals of Digital Logic with VHDL Design*, 1st ed. McGraw-Hill, 2000.
- [8] Stratix III Device Handbook, www.altera.com, Altera Corporation, March 2010.