# Efficient Hardware Design of Single Carrier GMSK Modulator and Demodulator for Next Generation Communication using Flexible and Optimal Sub-Modules

RENUKA R KAJUR

Department of Electronics and Communication Engineering, PESIT-BSC, BANGALORE, KARNATAKA

TEJAS SP

Department of Electronics and Communication Engineering, PESIT-BSC, BANGALORE, KARNATAKA

Dr. K. V. Prasad

Department of Electronics and Communication Engineering, Bangalore Institute of Technology, BANGALORE, KARNATAKA

Abstract- We are now in the era of modern mobile networks, which must address the issues of full mobility and connectivity. Gaussian minimum shift keying (GMSK) is a modulation scheme used in a variety of communication systems. In terms of spectrum utilization, GMSK modulation plays an important role in next-generation communication systems. The main problems in the hardware design of next-generation communication systems are system capacity, the trade-off between mobile data volume and area, high data rates, end-to-end latency, and power consumption. Many researchers have proposed hardware designs of modulation schemes that concentrate on a single metric among power consumption, area and maximum operating frequency. In this paper, we propose an efficient hardware design of a single-carrier GMSK (HSC/GMSK) modulator/demodulator that maximizes the operating speed and minimizes the hardware cost and power consumption. Generally, a GMSK modulator/demodulator consists of sub-modules for addition, multiplication, integration, differentiation, trigonometric functions, Gaussian filtering and low-pass filtering. Among these modules, the multiplier and the filters most affect the characteristics of the GMSK modulator/demodulator. The HSC/GMSK modulator/demodulator replaces the conventional multiplier and filters with a flexible multiplier and an optimal Gaussian filter, which reduces area and power consumption. Moreover, a multi-purpose module is proposed for the trigonometric functions cosine, sine and arctan. Xilinx simulation results show that the proposed design is more efficient than previous schemes in terms of hardware utilization, speed and power consumption.

Keywords – single carrier, next generation, Gaussian minimum shift keying, modulator, demodulator, multi-purpose trigonometric module

#### I. INTRODUCTION

One of the main issues in applying digital modulation to radio communications is the choice of multiplexing/multiple-access technique. Debates on the benefits of single-carrier (SC) transmission over multicarrier (MC) transmission have led to the choice of SC systems in several applications. On the other hand, there has also been renewed interest in multicarrier modulation (MCM) [1]-[3]. MCM outperforms single-carrier modulation (SCM) [4]-[6] in flat fast-fading channels, since the fading signal is integrated over a longer symbol interval. The use of multiple antennas at the transmitter and receiver sides can significantly enhance the capacity and reliability of wireless links [7]. However, multi-antenna operation faces significant challenges in hardware complexity and cost, owing to the requirement of inter-antenna synchronization and the maintenance of multiple antenna systems; approaches that address these issues are discussed in [9]. High-data-rate communication system designs necessitate larger bandwidths, which in turn result in frequency selectivity of the wireless channel. In frequency-selective channels, the presence of multipath components causes inter-symbol interference (ISI) [10]. To mitigate ISI, multicarrier techniques such as orthogonal frequency division multiplexing (OFDM) are popular because of their simple equalization and low receiver complexity [11].

Different hardware designs are used in mobile communication systems to provide ubiquitous connectivity and seamless service delivery in all circumstances [12]. The large expected number of devices and the coexistence of human-centric and machine-type applications will lead to a large diversity of communication scenarios and characteristics, and many advanced communication techniques are under investigation. A fully digital adaptive equalizer chip for a QAM digital radio modem [13] contains 107,936 transistors on a silicon area of 94.6 mm²; its inputs and outputs are ECL compatible, and a control unit compensates for the influence of transistor parameter variations. A differential 16-QAM modulator [14] has been implemented and its operation verified in the time domain. The salient features of this digital realization approach are simplicity, flexibility and precision: degradations resulting from classical I/Q modulator implementations are bypassed, and a large number of modulation schemes may reside in a single unit and be selected via a manual or electronic switch. The IC in [15] integrates a VSB/QAM demodulator for in-band broadcasting with a QPSK modem for bidirectional out-of-band cable signaling; it features an all-digital synchronization scheme and a robust channel equalization technique applied to the common VSB/QAM demodulation. 4LINE-PAM4 and 3LINE-PAM4 [16] reduce the transmitted power of multilevel signaling, with roughly 3-5 dB coding gain over uncoded 4-PAM. An adaptive modulation scheme for link adaptation (LA) using digital phase-modulation techniques [17] abruptly shifts the phase of the carrier waveform through a momentary increase in the voltage-controlled oscillator (VCO) frequency, using narrow control pulses derived from the baseband data stream by a simple supporting circuit. A QR decomposition for 4×4 MIMO-OFDM systems [18] was developed by cascading one complex-valued and one real-valued Givens rotation stage.
The requirement of skewed inputs in the conventional QR-decomposition systolic array is eliminated, and 36% of the delay elements are removed. The real-valued Givens rotation stage is also constructed as a stacked triangular systolic array to match the throughput of the complex-valued stage. A configurable tree-searching approach [19] combines classical depth-first and breadth-first features to reduce complexity while providing both hard and soft outputs; a single programmable parameter allows the user to trade off throughput against BER performance.

Our contributions. A hardware design of a single-carrier GMSK (HSC/GMSK) modulator/demodulator is proposed for next-generation communication systems using flexible and optimal design modules. The main objective is to design a hardware-efficient architecture that maximizes speed while minimizing hardware cost and power consumption.

The rest of the paper is organized as follows. Recent works related to our contributions are surveyed in Section II. Section III presents the problem methodology and the system model of the proposed design, and Section IV gives its detailed description. Simulation results and analysis are presented in Section V, and Section VI concludes the paper.

# II. RELATED WORKS

Gutierrez et al. [20] have proposed a filter bank multi-carrier (FBMC) modulation scheme for digital communication architectures. They developed a unified framework to characterize any possible multi-carrier modulation, including those relying on band-limited shaping pulses, based on a general signal model and a set of four signal parameters whose values characterize the transmitted signal. These parameter combinations are used systematically to avoid the extensive use of the poly-phase decomposition of the prototype filter and of standard multi-rate techniques.

Nadal et al. [21] have proposed an advanced communication system based on FBMC/OQAM modulation and prototyped it in hardware; this new waveform is considered a key enabler for the future flexible 5G air interface. They report the design and prototyping experience related to this new technical component, including algorithm simplification and optimization, architecture exploration, hardware implementation, on-board validation and demonstration. The work serves as a proof of concept of the waveform and allows rapid architecture exploration, performance evaluation and comparison with state-of-the-art OFDM-based systems.

Lin et al. [22] have proposed an MCM scheme, referred to as state-of-the-art (SoTA), that uses a version of the classical OQAM with the circular convolution concept. OFDM is the simplest MCM system and is widely adopted in many applications. Unlike SCM, in which the transmitted data are spread over a wide bandwidth, OFDM modulates the data onto a set of narrow subcarriers whose bandwidth is much smaller than the channel coherence bandwidth, leading to quasi-flat fading on each subcarrier.

Yang et al. [23] have proposed a fractional Fourier transform (FrFT) based multicarrier order-division multiple-access communication system. Each user is uniquely identified by an FrFT order, with wide-band baseband waveforms in all FrFT domains corresponding to the different users' FrFT orders. Multiple independent data streams are transmitted by FrFT-OFDM at the same time in different FrFT domains. The new kind of carriers, with different modulation rates, are only approximately mutually orthogonal. As a result, energy leaks between the multiple chirp carriers, and the resulting mutual interference among them can degrade the FrFT-OFDM demodulation performance.

Han et al. [24] have designed Alamouti-like space-time block coded (STBC) and space-frequency block coded (SFBC) vector OFDM systems. With two transmit antennas and one receive antenna, the diversity order of the zero-forcing (ZF) receiver is fixed at 2 over frequency-selective fading channels, while that of the minimum mean square error (MMSE) receiver depends on the channel memory length, the vector block (VB) size and the spectral efficiency. These schemes operate on a VB basis, aided by vector-specific phase rotation. Each of them yields a generalized framework that incorporates the existing Alamouti-like OFDM and SC-FDE systems as special cases.

Liu et al. [25] have proposed a memory-access-reordering poly-phase network (PPN) for an FBMC offset-QAM (OQAM) system in the 60 GHz band. The PPN architecture has lower complexity than state-of-the-art designs. To evaluate the system performance and hardware complexity of the baseband receiver in the millimeter-wave band, the PPN was integrated into the authors' 8×-parallel 60 GHz baseband receiver. In fixed-point simulation, the out-of-band (OOB) radiation improves by 25 dB compared with OFDM, and the transmission efficiency of the FBMC-OQAM baseband receiver improves by 52% owing to the use of more data subcarriers and the removal of the cyclic prefix (CP).

Yang et al. [26] have presented single-carrier spatial modulation (SC-SM) transmission techniques, covering the associated transceiver design with its benefits and potential trade-offs, LSA-aided multiuser transmission developments, the relevant open research issues and potential solutions for this appealing transmission technique. The scheme is capable of adopting low-complexity single-stream detection while relying on a single RF chain. Moreover, it can be designed to strike a flexible trade-off among potentially conflicting system requirements, such as effective throughput, diversity gain and hardware cost, while facilitating communications over dispersive channels.

Xu et al. [27] have proposed single-carrier frequency-division multiple-access (SC-FDMA) receivers for the Long Term Evolution uplink. Their log-likelihood ratio (LLR) computation algorithm uses a signal model that accounts for residual phase noise. Based on this model, they derive a closed-form expression for the likelihood function of the received symbol and calculate more accurate LLR information, which increases the accuracy of the decoder and improves the performance of the SC-FDMA system.

#### III. PROBLEM METHODOLOGY AND SYSTEM MODEL

#### A. Problem methodology

Nadal et al. [28] have proposed a low-complexity pipelined implementation of filter-bank multi-carrier with offset quadrature amplitude modulation (FBMC/OQAM) using a pruned IFFT algorithm. The FBMC/OQAM and OFDM transmitters were developed with similar architectural choices for the common blocks. Analytical and post-synthesis FPGA results for the transmitter demonstrate a significant complexity reduction. For a short prototype filter such as TFL1, complexity scales down to 40 to 50% of a typical FBMC/OQAM design and becomes comparable to OFDM. For longer prototype filters such as PHYDYAS, the PPN unit is the main bottleneck in terms of hardware complexity. The FBMC/OQAM design was described in VHDL and synthesized targeting the Xilinx Zynq XC7z020-1 SoC device. The FBMC/OQAM TFL1 implementation consumes 2828 registers, 4011 LUTs, 1068 RAM, 20 DSP multipliers and 139 mW of power, with a latency of 802 clock cycles. The FBMC/OQAM PHYDYAS implementation consumes 3788 registers, 5585 LUTs, 3180 RAM, 32 DSP multipliers and 252 mW of power, with a latency of 805 clock cycles. Most of the previous works [20]-[28] are simulation-based analyses in which implementation aspects and crucial design metrics such as power consumption, chip area and data rate are not considered. Hence there is a need to focus on optimization techniques for the equalizer and on reducing the digital circuitry or components, so as to reduce chip area while providing a high sampling rate. Solutions for the key components must be realized so that they provide the flexibility to handle the required modulation types and coding schemes. Costly resources should be shared to keep the silicon area low, and architectural optimizations must be carried out to achieve very low power consumption, because high-capacity digital radio systems are being forced into the upper microwave and millimeter-wave bands, where existing transceiver design technology is less effective.
An important characteristic of GMSK modulation is its suitability for direct digital modulation at the transmit frequency, which is an attractive option for reducing the cost and complexity of the transmitter because it removes the need for IF and up-conversion circuitry. Hence, high-performance, cost-effective, highly efficient and high-data-rate pulse-shaping filters for GMSK modulators operating in the microwave and millimeter-wave frequency ranges are essential in the design of future systems.

The GMSK signal is considered a partial-response continuous-phase modulation signal with a modulation index of ½. The receiver consists of an ideal multiplier that multiplies the received signal by a locally generated carrier, followed by LPFs that generate the real and imaginary parts of the complex envelope of the received signal. A phase generator then builds all the possible phase transitions, and finally the bits are reconstructed. The GMSK modulation/demodulation in the discrete-time domain is written as follows:

$$M(t) = I(t)\cos(2\pi f_c t) + Q(t)\sin(2\pi f_c t) \tag{1}$$

$$DM(t) = I(t)\cos(2\pi f_c t + \theta_d(t)) + Q(t)\sin(2\pi f_c t + \theta_d(t)) \tag{2}$$
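To make Eq. (1) concrete, the sketch below builds a GMSK baseband signal in the usual textbook way (NRZ bits, Gaussian pulse shaping, phase integration with modulation index ½, then I = cos φ and Q = sin φ) and mixes it to passband with the sign convention of Eq. (1). It is a floating-point behavioural model under assumed parameters (BT = 0.3, 8 samples per bit, a 4-bit filter span, normalized frequencies); these values and all function names are illustrative assumptions, not the parameters of the HSC/GMSK hardware.

```python
import math


def gaussian_taps(bt=0.3, sps=8, span=4):
    """Gaussian pulse-shaping taps, truncated to `span` bit periods.

    sigma (in bit periods) follows from the bandwidth-time product BT:
    sigma = sqrt(ln 2) / (2 * pi * BT). Taps are normalised to unit sum so that
    each bit ultimately contributes exactly +/- pi/2 of phase (h = 1/2).
    """
    sigma = math.sqrt(math.log(2.0)) / (2.0 * math.pi * bt)
    half = span * sps // 2
    taps = [math.exp(-((k / sps) ** 2) / (2.0 * sigma ** 2)) for k in range(-half, half + 1)]
    total = sum(taps)
    return [v / total for v in taps]


def gmsk_iq(bits, bt=0.3, sps=8, span=4):
    """Return baseband I and Q samples and the final phase for a list of 0/1 bits."""
    nrz = [2 * b - 1 for b in bits for _ in range(sps)]   # 0/1 -> -1/+1, sps samples per bit
    taps = gaussian_taps(bt, sps, span)
    filtered = [0.0] * (len(nrz) + len(taps) - 1)          # full linear convolution
    for i, x in enumerate(nrz):
        for j, h in enumerate(taps):
            filtered[i + j] += x * h
    phase, i_sig, q_sig = 0.0, [], []
    for f in filtered:
        phase += (math.pi / 2.0) * f / sps                 # integrator: pi/2 phase per bit
        i_sig.append(math.cos(phase))
        q_sig.append(math.sin(phase))
    return i_sig, q_sig, phase


def modulate(i_sig, q_sig, fc, fs):
    """Passband samples following Eq. (1): I cos(2 pi fc t) + Q sin(2 pi fc t)."""
    return [i * math.cos(2 * math.pi * fc * k / fs) + q * math.sin(2 * math.pi * fc * k / fs)
            for k, (i, q) in enumerate(zip(i_sig, q_sig))]


i_sig, q_sig, final_phase = gmsk_iq([1, 1, 0, 1])
m = modulate(i_sig, q_sig, fc=2.0, fs=8.0)   # normalized: 8 samples per bit period
```

Because the taps sum to one, the accumulated phase after the filter tail is exactly π/2 times the sum of the NRZ symbols, so the bits 1, 1, 0, 1 end at a phase of π, and I² + Q² = 1 at every sample (the constant envelope that makes GMSK attractive for efficient transmitters).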



Fig. 1 Basic structure of GMSK (a) Modulator (b) Demodulator

#### B. System model of proposed HSC/GMSK modulation/demodulation

The basic GMSK modulator/demodulator shown in Fig. 1 consists of several sub-modules: addition, multiplication, integration, differentiation, Gaussian filter, low pass filter (LPF) and trigonometric functions. The proposed HSC/GMSK modulator/demodulator replaces the conventional multiplier and filters with a flexible multiplier, an optimal Gaussian filter and a real-valued low pass filter, and a multi-purpose trigonometric module is used for the trigonometric functions. The system model of the proposed architecture, with our contributions, is shown in Fig. 2. The main objectives of the proposed design are as follows:

1. In HSC/GMSK modulation/demodulation, the proposed flexible multiplier increases the flexibility of data handling in the communication environment and reduces the hardware cost.

2. An optimal Gaussian filter is used to filter the modulating signal without a memory shortage problem, which arises because the impulse response of the Gaussian filter is not limited to a single bit interval but extends over at least two more bits. In this optimal filter, the current and neighbouring bits compose the address of a memory (LUT), where the appropriate middle part of the Gaussian-filtered pulse shape is pre-calculated and stored as a set of different digital words.

3. Additionally, the multi-purpose trigonometric module provides the cosine, sine and arctan functions without reconfiguring the architecture.



Fig. 2 System model of proposed HSC/GMSK modulator/demodulator

# IV. HARDWARE DESIGN OF HSC/GMSK MODULATOR/DEMODULATOR

This section discusses the detailed hardware architecture of the proposed modules: the flexible multiplier, the optimal Gaussian filter and the multi-purpose trigonometric module.

# A. Flexible multiplier module

Flexibility is an important property that the hardware industry lacks and is trying to establish as far as possible. To survive in this technological world, new designs should be adjustable, i.e., they should possess the flexible property. Here, we consider the conversion of a conventional MSB-first multiplier into a flexible multiplier with a maximum bit length m that can reconfigure itself to perform a multiplication of any bit length l < m, where l is the bit length of the required multiplication. The conventional MSB-first multiplication procedure is given in Algorithm 1, where the notation $C^i$ denotes the value of C at iteration i. The main difference between MSB-first and LSB-first multiplication is that the former uses the bits of the register from MSB to LSB, whereas the latter uses them from LSB to MSB. In each iteration, $C^{i+1}$, $c_m P$ and $b_i A$ are added (modulo-2 addition, i.e., XOR) and shifted to form $C^i$, where $c_m$ is the MSB of $C^{i+1}$. Compared with the MSB-first multiplier, the LSB-first multiplier has a shorter critical path delay, which improves the total processing speed of the design, but it requires an extra register to temporarily store data.

A modified technique is used to overcome the problem of selecting the suitable feedback path based on the value of m, so that the conventional multiplier can be redesigned into a multiplier with the flexible property. In this paper, the AND-gate array connected to the multiplicand register A and the irreducible polynomial register P for enabling the multiplier bits is replaced with suitable tri-state buffer logic, as shown in Fig. 3.

Algorithm 1: MSB-first multiplier

| Step   | Operation                                                     |
|--------|---------------------------------------------------------------|
| Input  | A, B and P                                                    |
| Output | $R = AB \bmod P$                                              |
| 1.     | $C^m \leftarrow 0$                                            |
| 2.     | for $i = m-1$ down to 0 do                                    |
| 3.     | $C^i \leftarrow x\,(C^{i+1} + c_m P + b_i A)$                 |
| 4.     | end for                                                       |
| 5.     | Return: $R \leftarrow C^0/x$                                  |



Fig. 3 Proposed flexible multiplier
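Algorithm 1 is a polynomial-basis multiplier over GF(2^m), since P is an irreducible polynomial and all additions are XORs. The sketch below follows its steps literally in software, with polynomials held as integers (bit k is the coefficient of x^k). The `flexible_mul` wrapper is a hypothetical software analogue of the flexible multiplier: one m_max-bit datapath reused for a smaller field size l by taking the feedback bit at position l; it is not a model of the tri-state buffer circuit itself.

```python
def msb_first_mul(a, b, p, m):
    """MSB-first GF(2^m) multiplication following Algorithm 1.

    a, b: field elements of degree < m; p: degree-m irreducible polynomial.
    The register holds C^i = x * (partial product), hence the final C^0 / x.
    """
    c = 0                                        # step 1: C^m <- 0
    for i in range(m - 1, -1, -1):               # step 2: i = m-1 down to 0
        c_m = (c >> m) & 1                       # coefficient of x^m in C^{i+1}
        b_i = (b >> i) & 1
        # step 3: C^i <- x (C^{i+1} + c_m P + b_i A), with + being XOR
        c = (c ^ (p if c_m else 0) ^ (a if b_i else 0)) << 1
    return c >> 1                                # step 5: R <- C^0 / x


def flexible_mul(a, b, p, l, m_max):
    """Hypothetical flexible use of an m_max-bit multiplier for an l-bit field.

    p must be an irreducible polynomial of degree l (l <= m_max).
    """
    if not 1 <= l <= m_max:
        raise ValueError("field size l must satisfy 1 <= l <= m_max")
    mask = (1 << l) - 1                          # ignore register bits above l
    return msb_first_mul(a & mask, b & mask, p, l)


# GF(2^8) with P = x^8 + x^4 + x^3 + x + 1 (0x11B): {57} * {83} = {C1}
r8 = msb_first_mul(0x57, 0x83, 0x11B, 8)
# The same routine reused for GF(2^4) with P = x^4 + x + 1 (0b10011): x^3 * x = x + 1
r4 = flexible_mul(0b1000, 0b0010, 0b10011, 4, 8)
```

The GF(2^8) case reproduces the well-known AES example {57}·{83} = {C1}; the GF(2^4) case shows the same datapath reduced by a degree-4 polynomial, which is the kind of reuse the flexible multiplier targets in hardware.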

Fig. 4 shows how the AND gate and the tri-state buffer logic are mapped onto the target device, which determines their area utilization. Comparing the resources used by the two, each AND gate uses a two-input LUT within a slice, two input buffers (IBs) and a single output buffer (OB), whereas the tri-state buffer uses a single inverter to invert the signal arriving from one input buffer, while the other input acts as the enable of the tri-state buffer, thereby controlling the data flow from the IB to the OB according to the other IB signal.



Fig. 4 Targeted AND, tri-state buffer logic

To reduce unnecessary bit transitions in the registers, which have a large effect on the total power consumption, we have redesigned the architecture with a clock gating technique. Clock gating is derived from the RTL functionality: it stops the clocks of individual blocks when those blocks are inactive, effectively disabling all of their functionality. Because large blocks of logic then do not switch for many cycles, substantial dynamic power is saved. The simplest and most common form of clock gating uses a logical AND function to selectively disable the clock of individual blocks with a control signal. This approach reduces the unnecessary transitions on the output lines of the R register; the unnecessary transitions of the R register within the feedback loop are not addressed. The processing inputs and their corresponding outputs are given in Table 1.

Table 1 Processing inputs and corresponding outputs

| Q | D | Q | D+1 |
|---|---|---|-----|
| 0 | 0 | 0 | 0   |
| 0 | 1 | 1 | 0   |
| 1 | 1 | 0 | 1   |
| 1 | 0 | 1 | 1   |
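The effect of AND-based clock gating described above can be illustrated with a cycle-level toy model (hypothetical names, not the authors' RTL): a register with a load-enable either receives a clock edge only when the enable is high, or is clocked every cycle and holds its value through a feedback multiplexer. Both give the same output trace; the gated version delivers far fewer clock edges, which is used here only as a rough proxy for clock-related dynamic power.

```python
def run_register(data, enable, gated):
    """Toy model of one register with a load-enable.

    gated=True : the clock is ANDed with `enable`, so the register only
                 receives a clock edge in cycles where enable = 1.
    gated=False: the register is clocked every cycle and a feedback
                 multiplexer recirculates the old value when enable = 0.
    Returns (output trace, number of clock edges delivered to the register).
    """
    q, edges, trace = 0, 0, []
    for d, en in zip(data, enable):
        if gated:
            if en:                  # gated clock = clk AND en
                edges += 1
                q = d
        else:
            edges += 1              # free-running clock
            q = d if en else q      # hold via feedback mux
        trace.append(q)
    return trace, edges


data = [1, 0, 1, 0, 0, 1, 0, 1]
enable = [1, 0, 0, 1, 0, 0, 1, 0]
ungated_trace, ungated_edges = run_register(data, enable, gated=False)
gated_trace, gated_edges = run_register(data, enable, gated=True)
```

In practice, synthesis flows usually realize this AND with an integrated clock-gating cell (a latch followed by an AND gate), so that an enable change while the clock is high cannot produce a glitch; the toy model ignores such timing effects.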

#### B. Optimal Gaussian filter

Filtering is an operation that selectively removes frequency components from the input data. The selection criterion depends on the filter type, e.g., low pass or high pass. The effect of filtering can be seen in the Fourier spectrum, where the components corresponding to the filtered frequencies are 'blacked out'. Gaussian filters are a convenient starting point for filtering because their design can be controlled by a single parameter, the variance. The Gaussian filter function is defined as follows:

$$\phi(t) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{\mathrm{In}(t)^2 + R(t)^2}{2\sigma^2}} \tag{3}$$

where $\sigma$ is the standard deviation and $\sigma^2$ the variance (smaller values of $\sigma$ mean that more frequencies are suppressed), In(t) is the input received from the integrator module and R(t) is the reference signal of the filter. An LPF always lets the low frequencies pass, i.e., they are not suppressed by the filter. The size of the filter determines the extent of filtering; by virtue of its shape, the Gaussian filter always lets the region around the centre pass. Since low-frequency components correspond to the slowly changing elements of an input and high-frequency components to its edges, applying a Gaussian filter smooths an input by removing its sharp edges. In this paper, the hardware design of the optimal Gaussian filter is presented in Fig. 5; it consists of simple sub-modules: a counter, a shift register, an address decoder, a read-only memory (ROM), and XOR and NOT functions. The address decoder is used to access the memory while taking into account the symmetries of the trajectories. The Gaussian ROM contains $2^{j}$ samples of these trajectories in two memory pages, properly transformed into channel frequency setting words (FSWs). The address decoder consists of XOR chains and a NOT gate. When the tri-bit word corresponds to the S1–S4 portions (e.g. 111, 110, 011, 010), the word passes through the XOR1 chain, since the current (middle) bit is 1. Conversely, when the tri-bit word corresponds to the S5–S8 portions (e.g. 000, 001, 100, 101), the word is inverted. As a result, symmetric trajectories access the same ROM page.

As Fig. 5 shows, only one bit is used as the MSB of the ROM address, since two pages are stored in ROM for the S2 and S4 portions, while the LSBs are produced by a MOD-$2^{j}$ counter. The bit-to-XOR signal is used as the input of the XOR2 chain. When bit-to-XOR is 0, the counter counts up from 0 to $2^{j}-1$ to derive the samples of the S2 and S4 portions. Conversely, when bit-to-XOR is 1, the counter counts down from $2^{j}-1$ to 0 for the samples of the S3 portion. In effect, for the S3 portion the counter accesses the samples of S2 in reverse order, since S3 is the time-reversed mirror of S2. The address decoder and counter access the ROM and derive the S2, S3 and S4 portions, while the S1 portion is constant and is generated differently. The S5, S6 and S7 portions use the XOR3 chain to provide the resultant output. This chain inverts the output bits of the memory when $In_n$ is 0 because, as noted, S1–S8, S2–S6, S3–S7 and S4–S5 are symmetric about the horizontal axis.

If all of S1–S8 were stored in ROM, the required memory would be $8 \times 2^{j} \times k$ bits, where k is the number of output bits of the ROM, whereas in this case only $2 \times 2^{j} \times k$ bits are required. The number of samples stored in this memory depends on the required accuracy of the implemented response. Owing to the characteristics of GMSK modulation/demodulation, the instantaneous frequency varies between an upper and a lower bound:
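The two symmetries the address decoder exploits can be checked numerically. The sketch below, which assumes BT = 0.3, a bit period T = 1, 16 samples per bit and truncation of the Gaussian-filtered frequency pulse to the previous, current and next bits (all assumptions, with hypothetical function names), computes the frequency trajectory over the current bit for each tri-bit word. Inverting a word negates its trajectory (symmetry about the horizontal axis, handled by the NOT gate and the output XOR chain), and swapping the outer bits time-reverses it (handled by the down-counting path). It works in floating point rather than with the FSWs stored in the actual ROM.

```python
import math


def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def freq_pulse(t, bt=0.3):
    """Gaussian-filtered rectangular frequency pulse g(t); t in bit periods (T = 1)."""
    a = 2.0 * math.pi * bt / math.sqrt(math.log(2.0))
    return 0.5 * (q_func(a * (t - 0.5)) - q_func(a * (t + 0.5)))


def trajectory(word, n_samples=16, bt=0.3):
    """Frequency trajectory over the current bit for a tri-bit word
    (b_prev, b_cur, b_next). Midpoint sampling makes time reversal map
    sample j exactly onto sample n_samples - 1 - j."""
    amps = [2 * b - 1 for b in word]
    samples = []
    for j in range(n_samples):
        t = (j + 0.5) / n_samples - 0.5
        samples.append(sum(a * freq_pulse(t - k, bt) for a, k in zip(amps, (-1, 0, 1))))
    return samples


def fold(word):
    """Address-decoder sketch: a word whose current bit is 0 is inverted (NOT gate)
    and the stored samples must then be negated on the way out (output XOR chain)."""
    if word[1] == 0:
        return tuple(1 - b for b in word), True
    return tuple(word), False


def read_folded(word, n_samples=16, bt=0.3):
    """Read any trajectory using only words whose current bit is 1."""
    base, negate = fold(word)
    samples = trajectory(base, n_samples, bt)
    return [-s for s in samples] if negate else samples


words = [(p, c, n) for p in (0, 1) for c in (0, 1) for n in (0, 1)]
stored = sorted({fold(w)[0] for w in words})   # words that still need ROM data
```

Folding leaves four words with current bit 1. They reduce further: (1, 1, 0) is (0, 1, 1) read backwards, and (1, 1, 1) gives a nearly constant trajectory, so only two distinct pages need storing, in line with the text. As a size example, with j = 5 (32 samples per portion) and k = 8 output bits, storing S1–S8 would take 8 × 32 × 8 = 2048 bits, against 2 × 32 × 8 = 512 bits with the folded ROM.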

$$f^{+} = f_c + \Delta_f \tag{4}$$

$$f^{-} = f_c - \Delta_f \tag{5}$$

where $f_c$ is the carrier frequency and $\Delta_f$ is the frequency deviation, defined as

$$\Delta_f = \frac{B_r}{4} \tag{6}$$

where $B_r$ is the bit rate.
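As a worked illustration of Eqs. (4)-(6), take a bit rate of $B_r \approx 270.833$ kbit/s, the GSM gross bit rate, chosen here only as an example value (the bit rate of the proposed design is not stated in this section):

$$\Delta_f = \frac{B_r}{4} \approx \frac{270.833 \times 10^{3}}{4}\ \text{Hz} \approx 67.708\ \text{kHz}, \qquad f^{\pm} = f_c \pm 67.708\ \text{kHz},$$

so the two extreme frequencies are $2\Delta_f = B_r/2 \approx 135.42$ kHz apart, which is the minimum tone spacing that keeps the two tones orthogonal over a bit interval (modulation index ½).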



Fig. 5 Optimal Gaussian filter module

#### C. Multi-purpose trigonometric module

The unit circle is used to define the trigonometric functions for oblique triangles. Some oblique triangles are obtuse, so the sine and cosine of obtuse angles are needed. The trigonometric functions are also defined for angles beyond 180° and for negative angles. The ancient Greek geometers only considered angles between 0° and 180°, and considered neither the straight angle of $180^{\circ}$ nor the degenerate angle of $0^{\circ}$ to be angles. It is useful not only to consider those special cases to be angles, but also to include angles between 180° and 360°, sometimes called "reflex angles." With the application of trigonometry to calculus and differential equations, angles beyond $360^{\circ}$ and negative angles became accepted as well. Consider the unit circle in Fig. 6. Denote its center (0,0) as O and the point (1,0) on it as A. As a moving point B travels counterclockwise around the unit circle starting at A, the angle AOB starts as a 0° angle and increases. When B has gone all the way around the circle and back to A, angle AOB is a 360° angle. As B continues around the circle a second time, we get angles ranging from 360° to 720°. These are the same angles as in the first revolution, but with different names; for instance, a right angle can be named either 90° or 450°. Each revolution gives another name for the angle, so 90°, 450°, 810° and 1170° all name the same angle. If B starts at A and travels clockwise, we get negative angles, or more precisely, names in negative degrees for the same angles. For instance, a quarter of a circle in the clockwise direction gives the angle AOB named $-90^{\circ}$, which is the same as a 270° angle. Let the angle be placed so that its vertex is at the center of the unit circle O = (0, 0) and its first side lies along the x-axis.
Let the second side of the angle intersect the unit circle at B, as shown in Fig. 7. Then the angle equals the angle AOB, where A is (1,0). The coordinates of B define the cosine and the sine of the angle: the x-coordinate of B is the cosine of the angle, and the y-coordinate of B is the sine of the angle.



Fig. 6 Unit circle with angles



Fig. 7 Sine and Cosine of arbitrary angles

The proposed multi-purpose architecture, shown in Fig. 8, computes the trigonometric functions and the magnitude of a vector without explicit multiplication or division, operating in rotation mode (cosine, sine) and vectoring mode (arctan, magnitude). The generalized iterations for the cosine, sine and arctan functions are:

$$x_{i+1} = x_i - m\,\sigma_i\, y_i\, \rho^{-s_{m,i}} \tag{7}$$

$$y_{i+1} = y_i + \sigma_i\, x_i\, \rho^{-s_{m,i}} \tag{8}$$

$$z_{i+1} = z_i - \sigma_i\, \alpha_{m,i} \tag{9}$$

where $\sigma_i \in \{-1, +1\}$ represents the choice of direction of rotation in each iteration, $\rho$ represents the radix of the number system, $m$ steers the choice of linear ($m = 0$), circular ($m = 1$) and hyperbolic ($m = -1$) coordinate systems, $s_{m,i}$ is the non-decreasing integer shift sequence, and the rotation angle is

$$\alpha_{m,i} = \frac{1}{\sqrt{m}} \tan^{-1} \left( \sqrt{m}\, \rho^{-s_{m,i}} \right) \tag{10}$$

with the limit $\alpha_{0,i} = \rho^{-s_{0,i}}$ taken for $m = 0$.



Fig. 8 Multi-purpose trigonometric function (a) Cosine and Sine function (b) Arc tan function
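The iterations of Eqs. (7)-(10) are the generalized (unified) CORDIC scheme. Below is a minimal floating-point sketch of the circular case only (m = 1), assuming radix ρ = 2, shift sequence s_{1,i} = i, 40 iterations and the usual pre-scaling by the constant gain K; the update used is the standard circular one, x ← x − σ y 2^(−i) and y ← y + σ x 2^(−i). In hardware, x, y and z would be fixed-point registers and the multiplication by 2^(−i) a wired shift. All names are hypothetical, not the authors' module interfaces. Angles are first folded into [−π/2, π/2] using the unit-circle identities above, since circular CORDIC only converges for |θ| up to about 1.74 rad.

```python
import math

N_ITER = 40                                            # assumed iteration count
ALPHA = [math.atan(2.0 ** -i) for i in range(N_ITER)]  # Eq. (10) with m = 1, rho = 2, s = i
K = 1.0
for n in range(N_ITER):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * n))              # inverse of the accumulated CORDIC gain


def cordic(x, y, z, rotation):
    """Circular CORDIC iterations in rotation (z -> 0) or vectoring (y -> 0) mode."""
    for i in range(N_ITER):
        if rotation:
            sigma = 1 if z >= 0 else -1                # rotate towards zero residual angle
        else:
            sigma = 1 if y < 0 else -1                 # rotate the vector onto the x-axis
        shift = 2.0 ** -i                              # rho^{-s_{1,i}}: a right shift in hardware
        x, y = x - sigma * y * shift, y + sigma * x * shift
        z = z - sigma * ALPHA[i]                       # Eq. (9)
    return x, y, z


def cordic_cos_sin(theta):
    """cos and sin of any angle (radians) in rotation mode."""
    theta = math.remainder(theta, 2.0 * math.pi)       # same angle, named in [-pi, pi]
    sign = 1.0
    if theta > math.pi / 2:
        theta, sign = theta - math.pi, -1.0            # cos(t) = -cos(t - pi), same for sin
    elif theta < -math.pi / 2:
        theta, sign = theta + math.pi, -1.0
    x, y, _ = cordic(K, 0.0, theta, rotation=True)     # start pre-scaled by K
    return sign * x, sign * y


def cordic_atan_mag(x, y):
    """arctan(y / x) and sqrt(x^2 + y^2) for x > 0 in vectoring mode."""
    if x <= 0:
        raise ValueError("this sketch requires x > 0")
    xn, _, z = cordic(x, y, 0.0, rotation=False)
    return z, xn * K                                    # remove the CORDIC gain from x
```

For example, `cordic_cos_sin(math.radians(450))` returns values close to (0, 1), the same as for 90°, and `cordic_atan_mag(3.0, 4.0)` returns approximately (0.9273, 5.0). The same shift-and-add datapath serves both modes; only the rule for choosing σ changes, which is why one module can provide cosine, sine and arctan without reconfiguration.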

## V. RESULT AND DISCUSSION

This section begins with a brief review of the FPGA implementation of the proposed hardware design of the single-carrier GMSK (HSC/GMSK) modulator/demodulator. At the transmitter side, the HSC/GMSK modulator produces sampled versions of the modulated signal using a sampling rate $f$ greater than $2f_0$, where $f_0$ is a low carrier frequency. Taking advantage of the replicas of the digital signal, the real carrier frequency is chosen at $n \cdot f + f_0$, where n is a positive integer. Similarly, at the receiver side a convenient ratio between the real carrier frequency and the sampling frequency has to be respected. The HSC/GMSK modulator/demodulator is implemented and synthesized on a Xilinx Zynq FPGA (XC7z020-1 target device). The built-in ISim simulator is used to verify the designed architecture. The experiments are carried out on a personal computer running Windows 7 with 4 GB of RAM and an Intel Core i3 processor. The RTL technical schematic of the proposed HSC/GMSK modulator/demodulator is shown in Fig. 9.
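The choice of the real carrier at $n \cdot f + f_0$ relies on the fact that, at sampling rate $f$, a tone at $n f + f_0$ produces exactly the same samples as a tone at $f_0$, because $\cos(2\pi(nf + f_0)k/f) = \cos(2\pi nk + 2\pi f_0 k/f)$. The short check below uses arbitrary normalized values ($f = 8$, $f_0 = 1$, $n = 3$), which are illustrative assumptions rather than the design's actual frequencies; the function name is hypothetical.

```python
import math


def sampled_carrier(freq, fs, n_samples):
    """Samples cos(2*pi*freq*k/fs) for k = 0 .. n_samples - 1."""
    return [math.cos(2.0 * math.pi * freq * k / fs) for k in range(n_samples)]


fs, f0, n = 8.0, 1.0, 3                        # illustrative values with fs > 2*f0
low = sampled_carrier(f0, fs, 32)              # digital carrier at f0
high = sampled_carrier(n * fs + f0, fs, 32)    # "real" carrier at n*fs + f0
```

The two sample sequences coincide up to rounding, so the same digital samples can serve a carrier at $f_0$ or at any replica $n f + f_0$; in an analog front end, a band-pass reconstruction filter around the desired replica then selects which one is actually transmitted.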

Hardware utilization, maximum frequency and power consumption are the performance metrics used to analyze the proposed HSC/GMSK modulator/demodulator. A screenshot of the device utilization summary is shown in Fig. 10, and Table 2 summarizes the detailed information regarding the proposed design.



Fig. 9 RTL schematic screenshot of the proposed HSC/GMSK modulator/demodulator

Table 2 Device utilization summary of the proposed HSC/GMSK modulator/demodulator

| Performance metrics             | Values         |
|---------------------------------|----------------|
| Target device                   | Zynq XC7z020-1 |
| Number of slice registers       | 867            |
| Number of look-up tables (LUTs) | 847            |
| Number of RAM LUTs              | 168            |
| Number of bonded IOBs           | 1058           |
| Number of DSP multipliers       | 17             |

| Project Status   |                           |                       |                        |
|------------------|---------------------------|-----------------------|------------------------|
| Project File:    | work1.xise                | Parser Errors:        | No Errors              |
| Module Name:     | HSCGMSK                   | Implementation State: | Synthesized            |
| Target Device:   | XC7z020-1                 | Errors:               | No Errors              |
| Product Version: | ISE 14.5                  | Warnings:             | 159 Warnings (159 new) |
| Design Goal:     | Balanced                  | Routing Results:      |                        |
| Design Strategy: | Xilinx Default (unlocked) | Timing Constraints:   |                        |
| Environment:     | System Settings           | Final Timing Score:   |                        |

Device Utilization Summary (estimated values)

| Logic Utilization                 | Used | Available | Utilization |
|-----------------------------------|------|-----------|-------------|
| Number of Slice Registers         | 867  | 106400    | 0%          |
| Number of Slice LUTs              | 847  | 53200     | 1%          |
| Number of fully used LUT-FF pairs | 168  | 1546      | 10%         |
| Number of bonded IOBs             | 1058 | 125       | 846%        |
| Number of BUFG/BUFGCTRL/BUFHCEs   | 2    | 104       | 1%          |
| Number of DSP48E1s                | 17   | 220       | 7%          |

Fig. 10 Screenshot of device utilization summary

As Table 2 shows, on the Zynq (XC7z020-1) target device the design consumes 867 slice registers, 847 slice LUTs, 168 RAM LUTs, 1058 bonded IOBs, and 17 DSP multipliers. Beyond these primary resources, the synthesis macro statistics report 4 multipliers, 27 adders/subtractors, 2 counters, 11 MAC units, and 1 memory block (RAM).
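The utilization percentages in Fig. 10 are the used counts divided by the available counts. The sketch below assumes that ISE reports an integer percentage truncated toward zero; under that assumption it reproduces every entry in the figure. It also makes visible that the bonded-IOB count (1058) is larger than the 125 IOBs listed as available. `FIG10_ROWS` and `utilization_percent` are hypothetical names.

```python
# (resource, used, available) as listed in the Fig. 10 device utilization summary.
FIG10_ROWS = [
    ("Slice Registers", 867, 106400),
    ("Slice LUTs", 847, 53200),
    ("Fully used LUT-FF pairs", 168, 1546),
    ("Bonded IOBs", 1058, 125),
    ("BUFG/BUFGCTRL/BUFHCEs", 2, 104),
    ("DSP48E1s", 17, 220),
]


def utilization_percent(used, available):
    """Integer percentage truncated toward zero (assumed to match the ISE report)."""
    return used * 100 // available


for name, used, available in FIG10_ROWS:
    pct = utilization_percent(used, available)
    flag = "  <- more than available" if used > available else ""
    print(f"{name:25s} {used:6d} / {available:6d} = {pct:4d}%{flag}")
```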

### A. Hardware utilization comparison

The performance of the proposed HSC/GSMK modulator/demodulator is compared with existing modulation designs: OFDM, FBMC/OQAM TFL1 (typical), FBMC/OQAM TFL1 (proposed in [28]), FBMC/OQAM PHYDYAS (typical), and FBMC/OQAM PHYDYAS (proposed in [28]). Table 3 summarizes the hardware utilization of each design in terms of registers, LUTs, RAM LUTs, and DSP multipliers.

| Design techniques           | Registers | LUTs | RAM (LUTs) | DSP multipliers |
|-----------------------------|-----------|------|------------|-----------------|
| OFDM                        | 3066      | 3599 | 912        | 16              |
| FBMC/OQAM TFL1 (typical)    | 3687      | 7385 | 1632       | 40              |
| FBMC/OQAM TFL1 [28]         | 2828      | 4011 | 1068       | 20              |
| FBMC/OQAM PHYDYAS (typical) | 6641      | 8952 | 3744       | 52              |
| FBMC/OQAM PHYDYAS [28]      | 3788      | 5585 | 3180       | 32              |
| HSC/GSMK                    | 867       | 847  | 168        | 17              |

Table 3 Comparison of device utilization

The basic OFDM design consumes 3066 slice registers, 3599 slice LUTs, 912 RAM LUTs, and 16 DSP multipliers. The FBMC/OQAM TFL1 (typical) design consumes 3687 slice registers, 7385 slice LUTs, 1632 RAM LUTs, and 40 DSP multipliers. The FBMC/OQAM TFL1 [28] design consumes 2828 slice registers, 4011 slice LUTs, 1068 RAM LUTs, and 20 DSP multipliers. The FBMC/OQAM PHYDYAS (typical) design consumes 6641 slice registers, 8952 slice LUTs, 3744 RAM LUTs, and 52 DSP multipliers. The FBMC/OQAM PHYDYAS [28] design consumes 3788 slice registers, 5585 slice LUTs, 3180 RAM LUTs, and 32 DSP multipliers. Thus the FBMC/OQAM TFL1 and FBMC/OQAM PHYDYAS designs of [28] are considerably more efficient than the corresponding typical designs, although the limitations of [28] have already been noted. The proposed HSC/GSMK design consumes 867 slice registers, 847 slice LUTs, 168 RAM LUTs, and 17 DSP multipliers. Relative to the FBMC/OQAM TFL1 [28] design, it reduces slice registers by 69.34%, slice LUTs by 78.88%, RAM LUTs by 84.27%, and DSP multipliers by 15%; relative to the FBMC/OQAM PHYDYAS [28] design, it reduces slice registers by 77.11%, slice LUTs by 84.83%, RAM LUTs by 94.72%, and DSP multipliers by 46.88%.
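Each reduction percentage is the relative saving $100 \cdot (B - P)/B$, where $B$ is the resource count of the baseline design and $P$ that of the proposed design. The sketch below, using the Table 3 values, reproduces the figures quoted for the two designs of [28]; `TABLE3` and `reduction_percent` are hypothetical names.

```python
# Resource counts from Table 3: (registers, LUTs, RAM LUTs, DSP multipliers).
TABLE3 = {
    "FBMC/OQAM TFL1 [28]": (2828, 4011, 1068, 20),
    "FBMC/OQAM PHYDYAS [28]": (3788, 5585, 3180, 32),
    "HSC/GSMK": (867, 847, 168, 17),
}
METRICS = ("registers", "LUTs", "RAM LUTs", "DSP multipliers")


def reduction_percent(baseline, proposed):
    """Relative saving of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


proposed = TABLE3["HSC/GSMK"]
for design in ("FBMC/OQAM TFL1 [28]", "FBMC/OQAM PHYDYAS [28]"):
    savings = [reduction_percent(b, p) for b, p in zip(TABLE3[design], proposed)]
    summary = ", ".join(f"{m} {s:.2f}%" for m, s in zip(METRICS, savings))
    print(f"vs {design}: {summary}")
```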

### B. Power and speed analysis

Table 4 summarizes the power consumption and speed of the proposed and existing designs. The proposed design has both the lowest power consumption and the lowest latency (highest speed) of all the compared designs.

From Table 4, the basic OFDM design consumes 109 mW with a latency of 1064 clock cycles; FBMC/OQAM TFL1 (typical) consumes 246 mW with 1073 cycles; FBMC/OQAM TFL1 [28] consumes 139 mW with 802 cycles; FBMC/OQAM PHYDYAS (typical) consumes 359 mW with 1076 cycles; FBMC/OQAM PHYDYAS [28] consumes 252 mW with 805 cycles; and the proposed HSC/GSMK design consumes 102 mW with 801 cycles. The proposed HSC/GSMK design therefore reduces power consumption by 26.62% and 59.52% compared with the FBMC/OQAM TFL1 [28] and FBMC/OQAM PHYDYAS [28] designs, respectively. Its latency reduction relative to the same two designs is small: 0.12% and 0.50% (1 and 4 clock cycles), respectively.
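Applying the same relative-saving formula to every row of Table 4 reproduces the power and latency reductions quoted for the designs of [28] and also gives the savings against OFDM and the typical FBMC/OQAM designs. This is a sketch over the published numbers only; `TABLE4` and `reduction_percent` are hypothetical names.

```python
# (power in mW, latency in clock cycles) from Table 4.
TABLE4 = {
    "OFDM": (109, 1064),
    "FBMC/OQAM TFL1 (typical)": (246, 1073),
    "FBMC/OQAM TFL1 [28]": (139, 802),
    "FBMC/OQAM PHYDYAS (typical)": (359, 1076),
    "FBMC/OQAM PHYDYAS [28]": (252, 805),
    "HSC/GSMK": (102, 801),
}


def reduction_percent(baseline, proposed):
    """Relative saving of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


power_hsc, latency_hsc = TABLE4["HSC/GSMK"]
for design, (power, latency) in TABLE4.items():
    if design == "HSC/GSMK":
        continue  # skip the proposed design itself
    print(f"{design:28s} power -{reduction_percent(power, power_hsc):6.2f}%  "
          f"latency -{reduction_percent(latency, latency_hsc):5.2f}%")
```

Because latency is given in clock cycles, converting it to time would additionally require the operating clock frequency of each design.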

| Design techniques           | Power consumption (mW) | Latency (clock cycles) |
|-----------------------------|------------------------|------------------------|
| OFDM                        | 109                    | 1064                   |
| FBMC/OQAM TFL1 (typical)    | 246                    | 1073                   |
| FBMC/OQAM TFL1 [28]         | 139                    | 802                    |
| FBMC/OQAM PHYDYAS (typical) | 359                    | 1076                   |
| FBMC/OQAM PHYDYAS [28]      | 252                    | 805                    |
| HSC/GSMK                    | 102                    | 801                    |

Table 4 Comparison of power consumption and speed

## VI. CONCLUSION

In this work, we have presented an efficient hardware design of a single carrier GSMK (HSC/GSMK) modulator/demodulator for next generation communication systems. A general GSMK module consists of several sub-modules, among which the multiplier and the filters most strongly affect performance. In this paper, a flexible multiplier and an optimal Gaussian filter replace the conventional modules in order to reduce area and power consumption. The proposed design was implemented and synthesized using Xilinx ISE 14.5 targeting the Zynq XC7z020-1 FPGA. The simulation results and comparisons show that the proposed HSC/GSMK modulator/demodulator is more efficient than existing designs in terms of device utilization, speed, and power consumption.

## REFERENCES

- [1] H. Lin and P. Siohan, "Modulation Flexibility in PLC: A Unified MCM Transceiver Design and Implementation", IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 57, no. 10, pp. 2762-2775, 2010.
- [2] R. Aquilue, I. Gutierrez, J. Pijoan and G. Sanchez, "High-Voltage Multicarrier Spread-Spectrum System Field Test", IEEE Transactions on Power Delivery, vol. 24, no. 3, pp. 1112-1121, 2009.
- [3] N. Prasad, V. Shameem, U. Desai and S. Merchant, "Improvement in target detection performance of pulse coded Doppler radar based on multicarrier modulation with fast Fourier transform (FFT)", IEE Proceedings - Radar, Sonar and Navigation, vol. 151, no. 1, p. 11, 2004.
- [4] L. He, J. Wang and J. Song, "A Priori Information Aided Iterative Equalization: A Novel Approach for Single-Carrier Spatial Modulation in Dispersive Channels", IEEE Transactions on Vehicular Technology, pp. 1-1, 2016.
- [5] S. Daoud and A. Ghrayeb, "Using Resampling to Combat Doppler Scaling in UWA Channels With Single-Carrier Modulation and Frequency-Domain Equalization", IEEE Transactions on Vehicular Technology, vol. 65, no. 3, pp. 1261-1270, 2016.
- [6] S. Huang, J. Wang, J. Wang, C. Zhang and J. Song, "Convergence of Frequency-Domain Iterative MF-DFE for Single-Carrier Modulation", IEEE Transactions on Communications, vol. 63, no. 11, pp. 4150-4158, 2015.
- [7] G. Wang, L. Zhuang and K. Shao, "Time-varying multicarrier and single-carrier modulation systems", IET Signal Processing, vol. 7, no. 1, pp. 81-92, 2013.
- [8] X. Zhou, L. Yang, C. Wang and D. Yuan, "SCM-SM: Superposition Coded Modulation-Aided Spatial Modulation With a Low-Complexity Detector", IEEE Transactions on Vehicular Technology, vol. 63, no. 5, pp. 2488-2493, 2014.
- [9] N. Serafimovski, M. Di Renzo, S. Sinanovic, R. Mesleh and H. Haas, "Fractional bit encoded spatial modulation (FBE-SM)", IEEE Communications Letters, vol. 14, no. 5, pp. 429-431, 2010.
- [10] Z. Yigit and E. Basar, "Quadrature spatial modulation for large scale MIMO systems", 2017 25th Signal Processing and Communications Applications Conference (SIU), 2017.
- [11] I. Al-Nahhal, O. Dobre and S. Ikki, "Quadrature Spatial Modulation Decoding Complexity: Study and Reduction", IEEE Wireless Communications Letters, vol. 6, no. 3, pp. 378-381, 2017.
- [12] G. Liu, X. Hou, J. Jin, F. Wang, Q. Wang, Y. Hao, Y. Huang, X. Wang, X. Xiao and A. Deng, "3-D-MIMO With Massive Antennas Paves the Way to 5G Enhanced Mobile Broadband: From System Design to Field Trials", IEEE Journal on Selected Areas in Communications, vol. 35, no. 6, pp. 1222-1233, 2017.
- [13] S. Meier, E. De Man, T. Noll, U. Loibl and H. Klar, "A 2-μm CMOS digital adaptive equalizer chip for QAM digital radio modems", IEEE Journal of Solid-State Circuits, vol. 23, no. 5, pp. 1212-1217, 1988.
- [14] C. Koukourlis, P. Houlis and J. Sahalos, "A general purpose differential digital modulator implementation incorporating a direct digital synthesis method", IEEE Transactions on Broadcasting, vol. 39, no. 4, pp. 383-389, 1993.
- [15] Jisung Oh, Yongduk Chang, Kyeongbong Ha, Sukjin Jung and Jaewoo Kim, "A single VSB/QAM/QPSK IC for ATSC and OpenCable™ digital terminals", ICCE. International Conference on Consumer Electronics (IEEE Cat. No.01CH37182).
- [16] K. Farzan and D. Johns, "A Robust 4-PAM Signaling Scheme for Inter-Chip Links Using Coding in Space", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 16, no. 11, pp. 1535-1544, 2008.
- [17] R. Mahapatra, A. Sundar Dhar and D. Datta, "Adaptive digital phase modulation schemes using transition-initiated phase acceleration", AEU - International Journal of Electronics and Communications, vol. 62, no. 10, pp. 740-753, 2008.
- [18] Z. Huang and P. Tsai, "Efficient Implementation of QR Decomposition for Gigabit MIMO-OFDM Systems", IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 58, no. 10, pp. 2531-2542, 2011.
- [19] Chung-An Shen, A. Eltawil, K. Salama and S. Mondal, "A Best-First Soft/Hard Decision Tree Searching MIMO Decoder for a 4×4 64-QAM System", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 20, no. 8, pp. 1537-1541, 2012.
- [20] E. Gutiérrez, J. López-Salcedo and G. Seco-Granados, "Systematic design of transmitter and receiver architectures for flexible filter bank multi-carrier signals", EURASIP Journal on Advances in Signal Processing, vol. 2014, no. 1, 2014.
- [21] J. Li, K. Nasartschuk and K. Kent, "System-on-chip processor using different FPGA architectures in the VTR CAD flow", 2014 25th IEEE International Symposium on Rapid System Prototyping, 2014.
- [22] H. Lin and P. Siohan, "Multi-carrier modulation analysis and WCP-COQAM proposal", EURASIP Journal on Advances in Signal Processing, vol. 2014, no. 1, 2014.
- [23] Z. Yang, R. Tao, Y. Wang and T. Wang, "A Novel Multi-carrier Order Division Multi-access Communication System Based on TDCS with Fractional Fourier Transform Scheme", Wireless Personal Communications, vol. 79, no. 2, pp. 1301-1320, 2014.
- [24] J. Han and G. Leus, "Space-Time and Space-Frequency Block Coded Vector OFDM Modulation", IEEE Communications Letters, vol. 21, no. 1, pp. 204-207, 2017.
- [25] C. Liu, M. Sie, E. Leong, Y. Yao, H. Lopez, C. Jen, W. Liu and S. Jou, "An 8X-Parallelism Memory Access Reordering Polyphase Network for 60 GHz FBMC-OQAM Baseband Receiver", IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 63, no. 12, pp. 2347-2356, 2016.
- [26] P. Yang, Y. Xiao, Y. Guan, K. Hari, A. Chockalingam, S. Sugiura, H. Haas, M. Di Renzo, C. Masouros, Z. Liu, L. Xiao, S. Li and L. Hanzo, "Single-Carrier SM-MIMO: A Promising Design for Broadband Large-Scale Antenna Systems", IEEE Communications Surveys & Tutorials, vol. 18, no. 3, pp. 1687-1716, 2016.
- [27] Z. Xu and G. Ren, "Phase Noise Suppression Algorithm Based on Modified LLR Metric in SC-FDMA System", Journal of Electrical and Computer Engineering, vol. 2017, pp. 1-5, 2017.
- [28] J. Nadal, C. Abdel Nour and A. Baghdadi, "Low-Complexity Pipelined Architecture for FBMC/OQAM Transmitter", IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 63, no. 1, pp. 19-23, 2016.