Lecture Notes in Electrical Engineering 573

# Sergio Saponara · Alessandro De Gloria, *Editors*

# Applications in Electronics Pervading Industry, Environment and Society

APPLEPIES 2018



## Lecture Notes in Electrical Engineering

### Volume 573

#### Series Editors

Leopoldo Angrisani, Department of Electrical and Information Technologies Engineering, University of Napoli Federico II, Napoli, Italy

Marco Arteaga, Departament de Control y Robótica, Universidad Nacional Autónoma de México, Coyoacán, Mexico

Bijaya Ketan Panigrahi, Electrical Engineering, Indian Institute of Technology Delhi, New Delhi, Delhi, India

Samarjit Chakraborty, Fakultät für Elektrotechnik und Informationstechnik, TU München, München, Germany

Jiming Chen, Zhejiang University, Hangzhou, Zhejiang, China

Shanben Chen, Materials Science & Engineering, Shanghai Jiao Tong University, Shanghai, China

Tan Kay Chen, Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore

Rüdiger Dillmann, Humanoids and Intelligent Systems Lab, Karlsruhe Institute of Technology, Karlsruhe, Baden-Württemberg, Germany

Haibin Duan, Beijing University of Aeronautics and Astronautics, Beijing, China

Gianluigi Ferrari, Università di Parma, Parma, Italy

Manuel Ferre, Centre for Automation and Robotics CAR (UPM-CSIC), Universidad Politécnica de Madrid, Madrid, Madrid, Spain

Sandra Hirche, Department of Electrical Engineering and Information Science, Technische Universität München, München, Germany

Faryar Jabbari, Department of Mechanical and Aerospace Engineering, University of California, Irvine, CA, USA

Limin Jia, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China

Janusz Kacprzyk, Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland

Alaa Khamis, German University in Egypt El Tagamoa El Khames, New Cairo City, Egypt

Torsten Kroeger, Stanford University, Stanford, CA, USA

Qilian Liang, Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX, USA

Ferran Martin, Departament d'Enginyeria Electrònica, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain

Tan Cher Ming, College of Engineering, Nanyang Technological University, Singapore, Singapore

Wolfgang Minker, Institute of Information Technology, University of Ulm, Ulm, Germany

Pradeep Misra, Department of Electrical Engineering, Wright State University, Dayton, OH, USA

Sebastian Möller, Quality and Usability Lab, TU Berlin, Berlin, Germany

Subhas Mukhopadhyay, School of Engineering & Advanced Technology, Massey University, Palmerston North, Manawatu-Wanganui, New Zealand

Cun-Zheng Ning, Electrical Engineering, Arizona State University, Tempe, AZ, USA

Toyoaki Nishida, Graduate School of Informatics, Kyoto University, Kyoto, Kyoto, Japan

Federica Pascucci, Dipartimento di Ingegneria, Università degli Studi "Roma Tre", Rome, Italy

Yong Qin, State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China

Gan Woon Seng, School of Electrical & Electronic Engineering, Nanyang Technological University, Singapore, Singapore

Joachim Speidel, Institute of Telecommunications, Universität Stuttgart, Stuttgart, Baden-Württemberg, Germany

Germano Veiga, Campus da FEUP, INESC Porto, Porto, Portugal

Haitao Wu, Academy of Opto-electronics, Chinese Academy of Sciences, Beijing, China

Junjie James Zhang, Charlotte, NC, USA

The book series *Lecture Notes in Electrical Engineering* (LNEE) publishes the latest developments in Electrical Engineering: quickly, informally and in high quality. While original research reported in proceedings and monographs has traditionally formed the core of LNEE, we also encourage authors to submit books devoted to supporting student education and professional training in the various fields and application areas of electrical engineering. The series covers classical and emerging topics concerning:

- Communication Engineering, Information Theory and Networks
- Electronics Engineering and Microelectronics
- Signal, Image and Speech Processing
- Wireless and Mobile Communication
- Circuits and Systems
- Energy Systems, Power Electronics and Electrical Machines
- Electro-optical Engineering
- Instrumentation Engineering
- Avionics Engineering
- Control Systems
- Internet-of-Things and Cybersecurity
- Biomedical Devices, MEMS and NEMS

For general information about this book series, comments or suggestions, please contact leontina.dicecco@springer.com.

To submit a proposal or request further information, please contact the Publishing Editor in your country:

#### China

Jasmine Dou, Associate Editor (jasmine.dou@springer.com)

#### India

Swati Meherishi, Executive Editor (swati.meherishi@springer.com)

Aninda Bose, Senior Editor (aninda.bose@springer.com)

#### Japan

Takeyuki Yonezawa, Editorial Director (takeyuki.yonezawa@springer.com)

#### South Korea

Smith (Ahram) Chae, Editor (smith.chae@springer.com)

#### Southeast Asia

Ramesh Nath Premnath, Editor (ramesh.premnath@springer.com)

#### USA, Canada

Michael Luby, Senior Editor (michael.luby@springer.com)

#### All other countries

Leontina Di Cecco, Senior Editor (leontina.dicecco@springer.com)

Christoph Baumann, Executive Editor (christoph.baumann@springer.com)

**Indexing:** The books of this series are submitted to ISI Proceedings, EI-Compendex, SCOPUS, MetaPress, Web of Science and SpringerLink.

More information about this series at http://www.springer.com/series/7818




*Editors*

Sergio Saponara, University of Pisa, Pisa, Italy

Alessandro De Gloria, University of Genoa, Genoa, Italy

ISSN 1876-1100 · ISSN 1876-1119 (electronic)

Lecture Notes in Electrical Engineering

ISBN 978-3-030-11972-0 · ISBN 978-3-030-11973-7 (eBook)

https://doi.org/10.1007/978-3-030-11973-7

Library of Congress Control Number: 2019931542

© Springer Nature Switzerland AG 2019, corrected publication 2019

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

## Preface

The 2018 edition of the Conference on "Applications in Electronics Pervading Industry, Environment and Society" was held in Pisa, Italy, on September 26–27, 2018, at the Congress Center "Le Benedettine" of the University of Pisa.

The conference had the technical and/or financial support of the University of Pisa, the University of Genoa, SIE (Italian Association for Electronics), and Intel.

The conference, active since 2002, offers an overview of electronic applications in several domains, showing how electronics has become pervasive and increasingly embedded in everyday objects and processes.

In this edition, about 60 papers were accepted after a review process with three independent reviews per paper, and were organized into 8 oral sessions and 2 poster sessions. The oral sessions covered *Automotive Electronics*, chaired by Prof. M. Grammatikakis; *Healthcare & Bio-electronic Systems*, chaired by Prof. A. Solanas; *Technology and Testing*, chaired by Dr. T. Erlbacher; *Sensors & Transducers*, chaired by Prof. R. Berta; *Signal Processing Systems*, chaired by Prof. A. Mansour; *Wireless Circuits and Systems*, chaired by Prof. E. Ragonese; *Power and Thermal Electronics*, chaired by Prof. M. Conti; and *Digital Circuits and Systems*, chaired by Prof. M. Martina. Prof. F. Bellotti and M. Ruo Roch chaired the poster sessions.

The conference also hosted three special events, introduced by Prof. S. Saponara:

- Keynote lecture *From Silicon Gate to Microprocessors to AI to Consciousness*, given by Federico Faggin, the father of the microprocessor.
- Roundtable *Legacy in Applied Electronics & Systems*, with contributions from multinational electronics companies (Intel, Sitael, Hanking Electronics, and AMS).
- Roundtable *Perspectives in Embedded and High Performance Computing*, with contributions from STMicroelectronics, Barcelona Supercomputing, E4, the Faggin Foundation, the European Processor Initiative, and a representative of the European Commission.

The papers collected in this book, together with the talks and roundtables of the special events, show that the computing, storage, and networking capabilities of today's electronic systems allow their applications to meet humankind's needs in terms of mobility, health, connectivity, energy management, smart production, ambient intelligence, and smart living.

To exploit such capabilities, multidisciplinary knowledge and expertise are needed to support a virtuous iterative cycle that goes from user needs to the design, prototyping, and testing of new products and services, which increasingly have a digital core.

The design and testing cycles span the whole systems engineering process, which includes analysis of users' needs, specification definition, verification plan definition, software and hardware co-design, lab and user testing and verification, maintenance management, and life-cycle management of electronic applications. The design of electronics-enabled systems should provide key features such as innovation, high performance, and real-time operation, with low-cost implementations and reduced size, weight, and power budgets. To succeed in this, one of the most important factors is the adoption of a suitable design flow and related CAD (Computer-Aided Design) tools. Platform-based design and a meet-in-the-middle approach between top-down and bottom-up design flows are needed to meet the time and cost challenges of today's markets.

All these challenges underline the importance of academia as a place where new generations of designers can learn and practice with cutting-edge technological tools, and are stimulated to devise solutions for challenges coming from a variety of application domains, such as health care, transportation, education, tourism, entertainment, cultural heritage, and energy.

The APPLEPIES 2018 conference aims to report and discuss several design examples and to become a reference point in the field of electronic systems design and applications, helping to fill, at the scientific and technological R&D level, a gap that the most farsighted industries have already identified and are striving to close.

Sergio Saponara, General Chair (Pisa, Italy)

Alessandro De Gloria, Honorary Chair (Genoa, Italy)

The original version of the book was revised: the volume number has been updated. The correction to the book is available at https://doi.org/10.1007/978-3-030-11973-7\_60

# Contents

#### Part I Automotive Electronics

| Raspberry Pi 3 Performance Characterization in an Artificial Vision<br>Automotive Application                                                                                                                                                               | 3  |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----|
| Ahmad Kobeissi, Francesco Bellotti, Riccardo Berta<br>and Alessandro De Gloria                                                                                                                                                                              | 2  |
| Analysis of Cybersecurity Weakness in Automotive In-Vehicle<br>Networking and Hardware Accelerators for Real-Time                                                                                                                                           | 11 |
| Luca Baldanzi, Luca Crocetti, Matteo Bertolucci, Luca Fanucci<br>and Sergio Saponara                                                                                                                                                                        | 11 |
| <b>Testing Facility for the Characterization of the Integration</b><br><b>of E-Vehicles into Smart Grid in Presence of Renewable Energy</b><br>Paolo Ferrari, Alessandra Flammini, Marco Pasetti, Stefano Rinaldi,<br>Flavio Simoncini and Emiliano Sisinni | 19 |
| <b>On-the-Fly Secure Group Communication on CAN Bus</b>                                                                                                                                                                                                     | 27 |
| Part II Healthcare and Bio-electronic Systems                                                                                                                                                                                                               |    |
| Neuromuscular Disorders Assessment by FPGA-Based SVMClassification of Synchronized EEG/EMGDaniela De Venuto and Giovanni Mezzina                                                                                                                            | 37 |
| Functional Near Infrared Spectroscopy System Validationfor Simultaneous EEG-FNIRS MeasurementsG. C. Giaconia, G. Greco, L. Mistretta, R. Rizzo, A. Merla,A. M. Chiarelli, F. Zappasodi and G. Edlinger                                                      | 45 |

| Electro-Photonic Chip-Scale Microsystem for Label-Free Single                                                                                                                               |     |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| Bacteria Monitoring           Francesco Dell'Olio, Donato Conteduca, Michele Cito, Giuseppe Brunetti,           Caterina Ciminelli, Thomas F. Krauss and Mario N. Armenise                  | 53  |
| HGIS: A Healthcare-Oriented Approach to Geographic Information<br>Systems<br>Edgar Batista, Antoni Martínez-Ballesté, Marta Peña, Xavier Singla<br>and Agusti Solanas                       | 59  |
| Part III Technology and Testing Issues                                                                                                                                                      |     |
| Main Parasitic Effects in Contactless Wafer Testing<br>Alessandro Finocchiaro, Giovanni Girlando, Alessandro Motta,<br>Alberto Pagani and Giuseppe Palmisano                                | 69  |
| Study of Low-Dose Long-Exposure Gamma Radiation Effects on InPDBR Cavity Lasers from Generic Integration TechnologyF. Gambini, N. Andriolli, V. Nurra, M. Chiesa, F. Petroni and S. Faralli | 77  |
| Technological Advances Towards 4H-SiC JBS Diodes for Wind PowerApplications                                                                                                                 | 83  |
| Part IV Sensors and Transducers                                                                                                                                                             |     |
| A Scalable 2D, Low Power Airflow Probe for Unmanned Vehicle<br>and WSN Applications                                                                                                         | 93  |
| Electronics System for Velocity Profile Emulation<br>Dario Russo, Valentino Meacci and Stefano Ricci                                                                                        | 101 |
| An Ultra-Low Cost Triboelectric Flowmeter                                                                                                                                                   | 109 |
| Electro-Thermal Characterization and Modeling of a 4-Wire<br>Microheater for Lab-on-Chip Systems<br>Andrea Scorzoni, Pisana Placidi, Paolo Valigi and Nicola Lovecchio                      | 117 |
| Part V Signal Processing Systems                                                                                                                                                            |     |
| Toward the Real Time Implementation of the 2-D Frequency-Domain         Vector Doppler Method         Stefano Rossi Matteo Lenge Alessandro Dallai Alessandro Ramalli                       | 129 |
| and Enrico Boni                                                                                                                                                                             |     |

| <ul> <li>A Field Experiment of Rainfall Intensity Estimation Based</li> <li>on the Analysis of Satellite-to-Earth Microwave Link Attenuation</li> <li>M. Colli, M. Stagnaro, A. Caridi, L. G. Lanza, A. Randazzo, M. Pastorino,</li> <li>D. D. Caviglia and A. Delucchi</li> </ul> | 137 |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| A Face Recognition System Using Off-the-Shelf Feature Extractors<br>and an Ad-Hoc Classifier                                                                                                                                                                                       | 145 |
| Twiddle Factor Generation Using Chebyshev Polynomials and HDL<br>for Frequency Domain Beamforming                                                                                                                                                                                  | 153 |
| Part VI Wireless Circuits and Systems                                                                                                                                                                                                                                              |     |
| A LoRaWAN Wireless Sensor Network for Data Center<br>Temperature Monitoring<br>Tommaso Polonelli, Davide Brunelli, Andrea Bartolini and Luca Benini                                                                                                                                | 169 |
| Wireless Low Energy System Architecture for Event-Driven SurfaceElectromyographyFabio Rossi, Paolo Motto Ros, Stefano Sapienza, Paolo Bonato,Emilio Bizzi and Danilo Demarchi                                                                                                      | 179 |
| Activity Monitoring and Phase Detection Using a Portable<br>EMG/ECG System<br>Wulhelm Daniel Scherz, Ralf Seepold, Natividad Martínez Madrid,<br>Paolo Crippa, Giorgio Biagetti, Laura Falaschetti and Claudio Turchetti                                                           | 187 |
| Transformer Design for 77-GHz Down-Converter in 28-nmFD-SOI CMOS TechnologyAndrea Cavarra, Claudio Nocera, Giuseppe Papotto, Egidio Ragoneseand Giuseppe Palmisano                                                                                                                 | 195 |
| Part VII Power and Thermal Electronics                                                                                                                                                                                                                                             |     |
| Investigating an Active Cooling System Powered by a Thermoelectric<br>Generator                                                                                                                                                                                                    | 205 |
| A Smart Torque Control for a High Efficiency 4WD Electric                                                                                                                                                                                                                          |     |
| Vehicle Antonio Cordopatri and Giuseppe Cocorullo                                                                                                                                                                                                                                  | 213 |

| Experimental Analysis of Battery Management System Algorithmsof Li-ion BatteriesFederico Garbuglia, Matteo Unterhorst, Luca Buccolini, Simone Orcioniand Massimo Conti                                                                    | 221 |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| Part VIII Digital Circuits and Systems                                                                                                                                                                                                    |     |
| Design of Low-Power Approximate LMS Filters<br>with Precision-Scalability<br>Darjn Esposito, Gennaro Di Meo, Davide De Caro, Antonio G. M. Strollo<br>and Ettore Napoli                                                                   | 237 |
| An Optimized Partial-Distortion-Elimination Based Sum-of-Absolute-<br>Differences Architecture for High-Efficiency-Video-Coding<br>Paolo Selvo, Maurizio Masera, Riccardo Peloso, Guido Masera,<br>Muhammad Shafique and Maurizio Martina | 245 |
| Efficient Ensemble Machine Learning Implementation on FPGAUsing Partial ReconfigurationGian Carlo Cardarilli, Luca Di Nunzio, Rocco Fazzolari, Daniele Giardino,Marco Matta, Marco Re, Francesca Silvestri and Sergio Spanò               | 253 |
| Synthesis Time Reconfigurable Floating Point Unit for Transprecision<br>Computing<br>Giulia Stazi, Federica Silvestri, Antonio Mastrandrea, Mauro Olivieri<br>and Francesco Menichelli                                                    | 261 |
| Radiation Hardness by Design Techniques for 1 Grad TID Rad-HardSystems in 65 nm Standard CMOS TechnologiesGabriele Ciarpi, Sergio Saponara, Guido Magazzù and Fabrizio Palla                                                              | 269 |
| Part IX Poster Session                                                                                                                                                                                                                    |     |
| Context-Aware Environments in Passenger Train Transportation<br>Systems: Ideas, Feasibility and Risks<br>Francisco Falcone, Costas Patsakis and Agusti Solanas                                                                            | 279 |
| Automatic Perishable Goods Shelf Life Optimization<br>in No-Refrigerated Warehouses by Using a WSN-Based<br>Architecture                                                                                                                  | 287 |
| FPGA-Based Multi Cycle Parallel Architecture for Real-Time                                                                                                                                                                                |     |
| Processing in Ultrasound Applications<br>Valentino Meacci, Enrico Boni, Alessandro Dallai, Alessandro Ramalli,<br>Monica Scaringella, Francesco Guidi, Dario Russo and Stefano Ricci                                                      | 295 |

Contents

| Adaptive Tuning System and Parameter Estimation of a Digitally         Controlled Boost Converter with STM32         Gianpaolo Vitale, Antonino Pagano, Leonardo Mistretta         and Giuseppe Costantino Giaconia                                                                   | 303 |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| <b>Developing a Machine Learning Library for Microcontrollers</b><br>Andrea Parodi, Francesco Bellotti, Riccardo Berta<br>and Alessandro De Gloria                                                                                                                                    | 313 |
| The Case for RISC-V in Space<br>Stefano Di Mascio, Alessandra Menicucci, Gianluca Furano,<br>Claudio Monteleone and Marco Ottavi                                                                                                                                                      | 319 |
| Encoder-Motor Misalignment Compensation for Closed-Loop Hybrid<br>Stepper Motor Control                                                                                                                                                                                               | 327 |
| Fully Integrated Galvanically Isolated DC-DC ConvertersBased on Inductive CouplingEgidio Ragonese, Alessandro Parisi, Nunzio Spinaand Giuseppe Palmisano                                                                                                                              | 335 |
| A Distributed Condition Monitoring System for the Non-invasive<br>Temperature Measurement of Heat Fluids Circulating<br>in Turbomachinery Pipes Based on Self-Powered Sensing Nodes<br>Tommaso Addabbo, Elia Landi, Riccardo Moretti, Marco Mugnaini,<br>Lorenzo Parri and Marco Tani | 343 |
| Towards Subsea Non-ohmic Power Transfer via a Capacitor-LikeStructureAnwar Mohamed, Valentina Palazzi, Sunny Kumar, Federico Alimenti,Paolo Mezzanotte and Luca Roselli                                                                                                               | 349 |
| A Robust Sensing Node for Wireless Monitoring of Drinking<br>Water Quality<br>Lorenzo Mezzera, Michele Di Mauro, Marco Tizzoni, Andrea Turolla,<br>Manuela Antonelli and Marco Carminati                                                                                              | 359 |
| <b>Doubly-Balanced Gilbert Cell Down-Conversion Mixer in AMS</b><br><b>0.35 μm SiGe CMOS for Mode-1 MB-OFDM UWB Receivers</b><br>S. Cammarata, G. Fieramosca, B. Neri, F. Baronti and S. Saponara                                                                                     | 367 |
| Efficient Implementation of Recurrent Neural Network<br>Accelerators<br>Vida Abdolzadeh and Nicola Petra                                                                                                                                                                              | 375 |
| Ultrasound Measurement of the Peak Blood Flow Based<br>on a Doppler Spectrum Model<br>Riccardo Matera, David Vilkomerson and Stefano Ricci                                                                                                                                            | 383 |

| Embedded System to Recognize Movement and Breathing in Assisted         Living Environments       3         Eva Rodríguez de Trujillo, Ralf Seepold, Maksym Gaiduk,       3         Natividad Martínez Madrid, Simone Orcioni and Massimo Conti       3 | 91  |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| Energy Harvesting with Current Sensors to Sustain Embedded<br>IoT Platforms                                                                                                                                                                             | 99  |
| A PXI Based Implementation of a TLK2711 Equivalent Interface 4<br>Pietro Nannipieri, Luca Dello Sterpaio, Antonino Marino<br>and Luca Fanucci                                                                                                           | .07 |
| An FPGA-Based Real-Time Acquisition System for a Distributed         Acoustic Sensor Based on Φ-OTDR       4         Francesco Martina, Yonas Muanenda, Stefano Faralli       4         and Fabrizio Di Pasquale       4                                | 15  |
| Sound-Based Detection and Ranging System as Example Applicationof a Rapid Prototyping and Low-Cost Technology for Board-LevelElectronic Systems EducationStefano Di Pascoli, Gabriele Ciarpi and Sergio Saponara                                        | -21 |
| Approximate Memory Support for Linux Early Allocatorsin ARM Architectures4Giulia Stazi, Antonio Mastrandrea, Mauro Olivieriand Francesco Menichelli                                                                                                     | .29 |
| Fully Digital Low-Power Implementation of an Audio Front-Endfor Portable Applications4Gabriele Meoni, Luca Pilato, Gabriele Ciarpi, Alessandro Pallaand Luca Fanucci                                                                                    | .37 |
| Comparison and Implementation of Variable Fractional Delay Filtersfor Wideband Digital Beamforming4Gian Carlo Cardarilli, Daniele Giardino, Marco Matta, Marco Re,Francesca Silvestri, Lorenzo Simone and Sergio Spanò                                  | .45 |
| Autonomous Sail Surface Boats, Design and Testing Resultsof the MOUNTAINS Prototype4Enrico Boni, Marco Montagni and Luca Pugi                                                                                                                           | .53 |
| A Low Cost ALS and VLC Circuit for Solid State Lighting 4<br>Massimo Ruo Roch and Maurizio Martina                                                                                                                                                      | 61  |
| Chamberlin State-Variable Filter Structure in FPGA for Musical         Applications       4         Adriana Ricci, Mattia Silvestrini, Massimo Conti, Marco Caldari       4         and Franco Ripa       4                                             | .69 |

Contents

| Brake Blending and Optimal Torque Allocation Strategies<br>for Innovative Electric Powertrains<br>Luca Pugi, Tommaso Favilli, Lorenzo Berzi, Edoardo Locorotondo<br>and Marco Pierini | 477        |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|
| <b>Smart Coaster: An Example of IoT Design and Implementation</b><br>Maurizio Rossi, Matteo Nardello and Davide Brunelli                                                              | 485        |
| IP Generator Tool for Efficient Hardware Acceleration<br>of Self-organizing Maps<br>Daniele Giardino, Marco Matta, Marco Re, Francesca Silvestri<br>and Sergio Spanò                  | 493        |
| Correction to: Applications in Electronics Pervading Industry,<br>Environment and Society | C1 |
| Author Index                                                                                                                                                                          | 501        |

# Part I Automotive Electronics

## **Raspberry Pi 3 Performance Characterization in an Artificial Vision Automotive Application**



Ahmad Kobeissi, Francesco Bellotti, Riccardo Berta and Alessandro De Gloria

**Abstract** Artificial vision is a key factor for new-generation automotive systems. This paper focuses on a module that helps maximize the energy flow between the transmitting and receiving grids in the context of dynamic wireless charging of electric vehicles. The module's output helps the driver keep the vehicle precisely aligned with the charging grids embedded in the road. The module was developed using low-cost, open hardware and software components. This paper characterizes the embedded system from a performance point of view, considering parameters such as CPU load, memory footprint, and energy consumption, in order to assess the Raspberry Pi as a platform for rapid prototyping and embedded computing in the automotive environment.

#### 1 Introduction

Thanks in part to the power of deep learning technologies, artificial vision is a key factor in enabling new-generation automotive systems to reach higher automation levels [1]. This paper considers a particular vision application: dynamic wireless charging of electric vehicles [2, 3]. In this context of Inductive Power Transfer (IPT), the receiving coil, placed under the vehicle, must be precisely aligned with the transmitting coils, which are buried in the asphalt of a specific road lane [4]. The goal is to maximize the energy flow between the coils by keeping the lateral displacement within $\pm 20$ cm (with respect to a transmitter width of 50 cm), as larger misalignments typically cause drops in energy transfer [5].

A. Kobeissi (✉) · F. Bellotti · R. Berta · A. De Gloria

DITEN, Università Degli Studi Di Genova, Via Opera Pia 11/a, 16145 Genoa, Italy e-mail: ahmad.kobeissi@elios.unige.it

F. Bellotti e-mail: franz@elios.unige.it

R. Berta e-mail: berta@elios.unige.it

A. De Gloria e-mail: ADG@elios.unige.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_1

In the context of the FABRIC project [6], we have developed a subsystem that helps the driver keep the vehicle precisely aligned with the charging grids embedded in the road.

The subsystem was developed using low-cost, open hardware and software components, and it gave us an opportunity to assess the Raspberry Pi as a platform for embedded rapid prototyping. This paper presents our development experience and provides a performance characterization of the Raspberry Pi 3 single-board PC [7]. It considers parameters such as CPU load, memory footprint, instructions per cycle, and energy consumption.

#### 2 Related Work

Car manufacturers are striving to build smart vehicles by exploiting massive amounts of data from a wide variety of hardware components, including sensors, on-board cameras, and other external sources [8]. These data are processed either in the cloud or on board. In line with the emerging edge-computing paradigm [9, 10], a growing share of the computation is likely to be performed on board. This work would run on low-cost, energy-efficient microcomputers such as the Raspberry Pi [11], which have recently spread to several application domains (e.g., high-precision agriculture [12]). Hassan [13] stresses that "*The dramatic drop in price of computing hardware, coupled with the recent breakthroughs in embedded systems design that enabled the integration of high-level software and low-level electronics [...] has led to the development of different varieties of user-friendly Internet of Things (IoT) hardware development platforms for IoT prototyping*".

He et al. [14] investigated the graphics performance of the Raspberry Pi 2 B+ in terms of electrical power and energy. Using different benchmarks, they measured the difference in power consumption between GPU (hardware) rendering and software rendering. Their results showed that hardware rendering dramatically increases the number of frames rendered per second, and that it also increases electrical power. They also found that, thanks to the speed of hardware rendering, the total energy consumed per rendered frame was lower despite the higher electrical power. Nunes et al. [15] analyzed the execution behavior and power consumption of web services on a Raspberry Pi B. In this paper, we focus on the newer Raspberry Pi 3 board. We characterize its behavior while it runs an artificial vision program, in order to quantitatively assess this open platform in an automotive embedded environment.
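He et al.'s observation follows from a simple identity: the energy per frame equals the average electrical power $P$ (W) divided by the frame rate $f$ (frames/s). The numbers in the example after the block are purely illustrative and are not taken from [14].

$$
E_{\text{frame}} = \frac{P}{f}
\quad\Longrightarrow\quad
\frac{E_{\text{frame}}^{\text{HW}}}{E_{\text{frame}}^{\text{SW}}}
= \frac{P_{\text{HW}} / P_{\text{SW}}}{f_{\text{HW}} / f_{\text{SW}}}
$$

Energy per frame therefore drops whenever the frame rate grows by a larger factor than the power. For instance, if hardware rendering raised power by a factor of 1.5 but the frame rate by a factor of 4, the energy per frame would fall to $1.5/4 = 0.375$ of its software-rendering value.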

#### 3 System Architecture

From the hardware viewpoint, our grid alignment system uses a Logitech C920 high-definition (1920 × 1080) webcam that captures the road ahead at 30 frames per second, as shown in Fig. 1. The camera is mounted on the top middle of the windshield with a suction cup. It is connected to the server node, either a Raspberry Pi 3 B [7] or a regular PC, running Python under Linux or Windows. Working with the Pi 3 required a monitor, mouse, and keyboard for monitoring, tracing, and editing the code, so we used an ordinary laptop for convenience during the development phase [18]. For the final system release, the laptop was seamlessly replaced with a Pi 3 board. Our laptop was a Samsung Chronos 770Z5E with a 3rd-generation Intel Core i7, 8 GB of RAM, and an AMD Radeon HD 8870M graphics card with 2 GB of dedicated RAM.





The core functions of the system run on the server node and are written in Python. We used Jupyter Notebook [16] as the execution environment and the OpenCV library for image processing [17].

Each frame is processed in three steps: object spotting, clustering and line detection, and middle-point estimation. The objects to recognize are the grid and the lane. The grid has priority; if it cannot be recognized, the system switches to recognizing the lane borders. It is not known beforehand whether a grid will be recognized, so we implemented parallel image processing pipelines on two copies of each frame: one for grid recognition and one for lane recognition.

For each object, we define specific colour masks and region masks. Two colour masks, or filters, are defined for the grids, one light grey and one dark grey, corresponding to the grids' possible colours. For the lane lines, we consider the two internationally standardized lane-marking colours: white and yellow.
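The colour-masking step can be sketched in plain Python as shown below. A frame is represented as a list of rows of (R, G, B) tuples, so the example runs without OpenCV; in OpenCV, the per-class test corresponds to `cv2.inRange`. The threshold values in `COLOUR_RANGES` are hypothetical placeholders, because the paper does not report the thresholds it used. A real pipeline might also work in HSV rather than RGB.

```python
# Hypothetical (lower, upper) RGB bounds per colour class; placeholders only,
# to be tuned on real footage.
COLOUR_RANGES = {
    "grid_light_grey": ((150, 150, 150), (200, 200, 200)),
    "grid_dark_grey": ((60, 60, 60), (120, 120, 120)),
    "lane_white": ((200, 200, 200), (255, 255, 255)),
    "lane_yellow": ((180, 150, 0), (255, 230, 120)),
}

def in_range(pixel, lower, upper):
    """True if every channel lies in [lower, upper] (per-pixel cv2.inRange semantics)."""
    return all(lo <= c <= hi for c, lo, hi in zip(pixel, lower, upper))

def colour_mask(image, class_names):
    """Binary mask (255 or 0) of pixels matching any of the given colour classes."""
    ranges = [COLOUR_RANGES[name] for name in class_names]
    return [
        [255 if any(in_range(px, lo, hi) for lo, hi in ranges) else 0 for px in row]
        for row in image
    ]
```

The grid pipeline would call `colour_mask(frame, ["grid_light_grey", "grid_dark_grey"])`, and the lane pipeline would use the white and yellow classes on its own copy of the frame.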

Next, the algorithm runs a parallel processing pipeline in which low-pass filters are applied to the two copies of the original frame. A median-blur filter is used for grid detection, and a Gaussian filter is used for lane-line detection.
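Below is a minimal pure-Python sketch of the two low-pass filters, applied to a greyscale frame stored as a list of rows. Borders are handled by replicating the edge pixels; this is a simplification, since OpenCV's default border handling differs between functions. When no sigma is given, the Gaussian sigma follows the formula that OpenCV documents for `getGaussianKernel`. The kernel size `k` is the parameter varied in Sect. 4.

```python
import math
from statistics import median

def _window(img, y, x, k):
    """Values of the k x k neighbourhood around (y, x), replicating border pixels."""
    r = k // 2
    h, w = len(img), len(img[0])
    return [
        img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
        for dy in range(-r, r + 1)
        for dx in range(-r, r + 1)
    ]

def median_blur(img, k):
    """Median filter with an odd k x k kernel (grid pipeline)."""
    return [[median(_window(img, y, x, k)) for x in range(len(img[0]))]
            for y in range(len(img))]

def gaussian_kernel(k, sigma=None):
    """Normalised k x k Gaussian weights, flattened row by row."""
    if sigma is None:
        sigma = 0.3 * ((k - 1) * 0.5 - 1) + 0.8  # OpenCV's default for a given ksize
    r = k // 2
    weights = [math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
               for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    total = sum(weights)
    return [v / total for v in weights]

def gaussian_blur(img, k):
    """Gaussian low-pass filter with an odd k x k kernel (lane pipeline)."""
    weights = gaussian_kernel(k)
    return [[sum(wt * v for wt, v in zip(weights, _window(img, y, x, k)))
             for x in range(len(img[0]))] for y in range(len(img))]
```

The median filter removes an isolated bright pixel entirely, whereas the Gaussian filter spreads it over its neighbours. This difference is one plausible reason for using different filters on the grid and lane copies.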

Both copies then undergo Canny edge detection, which extracts the contours of the grid and of the lane lines. Region masks are applied next. The grid region mask is centred, while the lane-line mask consists of two separate regions on the left and right sides of the frame.
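Region masking can be sketched as follows. The rectangle coordinates, given as fractions of frame width and height, are hypothetical; the paper states only that the grid mask is centred and that the lane mask has a left region and a right region. Canny edge detection itself is not reproduced here. `edges` stands for its binary output, which in OpenCV would come from `cv2.Canny`.

```python
def region_mask(width, height, regions):
    """Binary mask (1 inside any region) built from rectangles given as frame fractions."""
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in regions:
        for y in range(int(y0 * height), int(y1 * height)):
            for x in range(int(x0 * width), int(x1 * width)):
                mask[y][x] = 1
    return mask

# Hypothetical layouts: one centred window for the grid, two side windows for lanes.
GRID_REGIONS = [(0.35, 0.6, 0.65, 1.0)]
LANE_REGIONS = [(0.0, 0.6, 0.3, 1.0), (0.7, 0.6, 1.0, 1.0)]

def apply_mask(edges, mask):
    """Keep edge pixels only where the mask is set (like cv2.bitwise_and with a mask)."""
    return [[e if m else 0 for e, m in zip(erow, mrow)]
            for erow, mrow in zip(edges, mask)]
```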

Once cropped by the region masks, the frame copies are ready for Hough line estimation. A transform produces every possible straight line within specified slope and bias (intercept) ranges, and these ranges are tuned to the perspective of the road dimensions. A concentration of Hough lines forms in each region where a grid or lane border is present. A clustering algorithm identifies each concentration, and a median line is computed for each cluster. Finally, each pair of near-symmetric lines is associated as the representation of an object (grid or lane).
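The paper does not name its clustering algorithm, so the sketch below substitutes a simple one. Line segments, such as those returned by OpenCV's `cv2.HoughLinesP`, are converted to slope/bias form and filtered by slope range. They are then grouped by sorting on slope and splitting wherever consecutive slopes differ by more than a gap. Each group is summarized by its median slope and median bias, and finally lines of opposite slope sign and similar slope magnitude are paired. All thresholds are hypothetical. Image coordinates are assumed (y grows downwards), so a left border typically has a negative slope and a right border a positive one.

```python
from statistics import median

def to_slope_bias(seg):
    """Convert a segment (x1, y1, x2, y2) into (slope, bias) of y = slope * x + bias."""
    x1, y1, x2, y2 = seg
    if x2 == x1:
        return None  # vertical line: skipped in this sketch
    m = (y2 - y1) / (x2 - x1)
    return m, y1 - m * x1

def cluster_lines(segments, min_abs_slope=0.3, max_abs_slope=3.0, gap=0.2):
    """Keep lines within the slope range, then group runs of similar slope."""
    lines = sorted(
        sb for sb in map(to_slope_bias, segments)
        if sb is not None and min_abs_slope <= abs(sb[0]) <= max_abs_slope
    )
    clusters = []
    for line in lines:
        if clusters and line[0] - clusters[-1][-1][0] <= gap:
            clusters[-1].append(line)
        else:
            clusters.append([line])
    return clusters

def median_line(cluster):
    """Representative line of a cluster: median slope and median bias."""
    return median(m for m, _ in cluster), median(b for _, b in cluster)

def pair_symmetric(lines, tol=0.3):
    """Pair each negative-slope line with the positive-slope line of closest |slope|."""
    left = [l for l in lines if l[0] < 0]
    right = [l for l in lines if l[0] > 0]
    pairs = []
    for lm, lb in left:
        best = min(right, key=lambda r: abs(abs(r[0]) - abs(lm)), default=None)
        if best is not None and abs(abs(best[0]) - abs(lm)) <= tol:
            pairs.append(((lm, lb), best))
    return pairs
```

Taking the median rather than the mean makes each representative line robust to a few spurious Hough lines inside a cluster.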

Lane recognition sometimes fails when the vehicle is very close to a gap in the dashed centre line. To handle this situation, we developed a function based on the recognition of a single line. The function implements a simple online estimation algorithm that estimates the distance from each lane border to the provisional grid centre. It uses the previous 10 frames in which both border lines were correctly recognized.
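The paper describes the single-line fallback only at a high level. The sketch below is one plausible reading, under the hypothetical assumption that the grid centre coincides with the lane centre. From the last 10 frames in which both borders were recognized, it keeps the distance from each border to the centre. When only one border is visible, it offsets that border by the mean stored distance. Positions are x coordinates in pixels at a fixed reference row.

```python
from collections import deque

class LaneCentreEstimator:
    """Keeps border-to-centre distances from the last N frames with both borders seen."""

    def __init__(self, history=10):
        self.left_dist = deque(maxlen=history)   # centre - left border
        self.right_dist = deque(maxlen=history)  # right border - centre

    def update(self, left_x=None, right_x=None):
        """Return the estimated centre x (pixels), or None if it cannot be estimated."""
        if left_x is not None and right_x is not None:
            centre = (left_x + right_x) / 2
            self.left_dist.append(centre - left_x)
            self.right_dist.append(right_x - centre)
            return centre
        if left_x is not None and self.left_dist:
            return left_x + sum(self.left_dist) / len(self.left_dist)
        if right_x is not None and self.right_dist:
            return right_x - sum(self.right_dist) / len(self.right_dist)
        return None
```

Because `deque(maxlen=10)` discards the oldest entry automatically, the estimate adapts as the apparent lane width changes over time.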

As mentioned before, grid detection has priority. If a grid is recognized in its region, the algorithm selects the two lines representing the left and right sides of the grid through a validity check on line slope and bias. A middle line is then generated, and its centre is our estimate of the grid centre coordinates. The offset is computed as the difference between the estimated grid centre and the middle of the frame. In the lane pipeline, the lane centre is used instead to compute the estimated grid centre. The offset computed at this stage by either pipeline is expressed in pixels and must be converted to centimeters.
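The offset computation can be sketched as follows, reusing the (slope, bias) line representation. The pixel-to-centimeter conversion is our assumption, because the paper does not say how its conversion was calibrated. Since the transmitter is 50 cm wide (Sect. 1), the grid's apparent width in pixels at a chosen reference row `y_ref` gives a scale factor at that row. A positive offset means that the grid centre lies to the right of the frame middle.

```python
GRID_WIDTH_CM = 50.0  # transmitter width stated in the introduction

def grid_centre_x(left_line, right_line, y_ref):
    """Centre x and apparent width (pixels) of the grid at image row y_ref.

    Lines are (slope, bias) pairs of y = slope * x + bias, so x = (y - bias) / slope.
    """
    xl = (y_ref - left_line[1]) / left_line[0]
    xr = (y_ref - right_line[1]) / right_line[0]
    return (xl + xr) / 2, xr - xl

def offset_cm(left_line, right_line, y_ref, frame_width):
    """Signed offset between grid centre and frame middle, converted to centimeters."""
    centre_x, width_px = grid_centre_x(left_line, right_line, y_ref)
    offset_px = centre_x - frame_width / 2
    return offset_px * GRID_WIDTH_CM / width_px  # hypothetical per-row scale factor
```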

Once computed, the offset is sent over Wi-Fi to a tablet placed on the vehicle dashboard, where an Android app displays it as a pointer on a linear gauge. An HTTP POST request is also sent to a private online server. Its payload contains the grid misalignment estimate, its direction, and the vehicle charging state.
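Below is a minimal sketch of the HTTP POST report, using only Python's standard library (`json`, `urllib.request`). The endpoint URL, the JSON field names, and the left/right sign convention are hypothetical; the paper states only that the misalignment, its direction, and the charging state are sent as payload. The helper builds the request without sending it; `urllib.request.urlopen(req, timeout=2)` would send it.

```python
import json
import urllib.request

SERVER_URL = "https://example.org/api/gaas"  # hypothetical endpoint

def build_payload(offset_cm, charging):
    """Misalignment magnitude, direction and charging state as a JSON-ready dict."""
    if offset_cm < 0:
        direction = "left"
    elif offset_cm > 0:
        direction = "right"
    else:
        direction = "centred"
    return {
        "misalignment_cm": round(abs(offset_cm), 1),
        "direction": direction,
        "charging": bool(charging),
    }

def make_request(payload, url=SERVER_URL):
    """Prepare, but do not send, the POST request carrying the payload as JSON."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
```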

#### 4 Experiment

We used the developed software as a benchmark to assess the performance of the RPi 3 system. We performed several measurements, whose results are reported in Table 2. Details about the target system are given in Table 1.

As monitoring tools, we used htop, a USB power gauge, and Conky. For simplicity, the same RPi board acted as both the measuring and the measured system, so we expect the actual absolute performance to be slightly better than reported.

We tested our grid alignment assistance system (GAAS) in four configurations, varying the size of the convolution kernel used to filter each frame with the Gaussian and median-blur filters. The contribution of the online estimation algorithm was negligible. Owing to how the GAAS program works, the results are the same at each frame. The $5 \times 5$ configuration achieves the same high accuracy as the $7 \times 7$ kernel: the prediction error is within 10 cm approximately 80% of the time and within 20 cm 100% of the time. Its resource utilization, however, is much lower, leaving room for other applications. Accuracy was measured in road tests performed in Susa (TO), Italy, at the MotorOasi safe-drive track, with a dynamic wireless charging experimental van set up by the FABRIC project.

| Feature     | Value              | Notes                            |
|-------------|--------------------|----------------------------------|
| Processor   | ARMv7 rev 4        | @ 1.20 GHz                       |
| Cores       | 4                  |                                  |
| Graphics    | LLVMpipe           |                                  |
| OpenGL      | 3.3 Mesa 13.0.3    | Gallium 0.4 (LLVM 3.9, 128 bits) |
| Screen      | 1824 × 984 HDMI    | Samsung S22C300H                 |
| Motherboard | BCM2835 Pi 3       | Model B Rev 1.2                  |
| Memory      | 860 MB             |                                  |
| Disk        | 32 GB SD           |                                  |
| File system | ext4               |                                  |
| OS          | Raspbian 9.4       |                                  |
| Kernel      | 4.14.52-v7+ armv7l |                                  |

**Table 1** Raspberry Pi 3 system information

| Program | CPU (htop) | Freq. (MHz, htop) | Threads (htop) | Memory (htop) | Voltage (V, USB power gauge) | Current (A, USB power gauge) | Power (W, USB power gauge) | FPS (OpenGL) | Network up (Conky) | Network down (Conky) | Accuracy < 10 cm | Accuracy < 20 cm |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GAAS (3 × 3) | 22% (python 20%) | 1200 | 1 | 46% (python core 14%, chromium/Jupyter 20%) | 4.93 | 0.36 | 1.77 | 28 | 3.0 KB/s | | 5% | 20% |
| GAAS (5 × 5) | 35% (python 34%) | 1200 | 5 | 449% (python core 16%, chromium/Jupyter 21%) | 4.97 | 0.40 | 1.99 | 24 | 2.9 KB/s | | 80% | 100% |
| GAAS (7 × 7) | 75% (python 71%) | 1200 | 3 | 50% (python core 18%, chromium/Jupyter 20%) | 5 | 0.46 | 2.30 | 10 | 2.4 KB/s | | 80% | 100% |
| GAAS (9 × 9) | 100%, then crashed | 1200 | 4 | 50% (python core 19%, chromium/Jupyter 19%) | 4.99 | 0.50 | 2.50 | n/a | 1.5 KB/s | | 100% | 100% |
| Idle | 3% | 600 | 1 | 12% | 5 | 0.19 | 0.95 | 30 | | | | |
| YouTube (1080p) | 33% (chromium 28%) | 1200 | 2 | 43% (chromium 30%) | 4.98 | 0.39 | 1.94 | 30 | | 10 MB/s | | |
| Minecraft Pi | 29% (minecraft 23%) | 600 | 2 | 17% (minecraft 5%) | 5 | 0.36 | 1.80 | 30 | | | | |
| Wolfram Mathematica (AutocorrTest, "up to lag 100") | 51% (test 45%) | 1200 | n/a | 58% (wolfram 38%) | 4.94 | 0.39 | 1.93 | 30 | | | | |

**Table 2** Performance metrics for GAAS and other benchmarks

The 3 × 3 version almost keeps up with the camera source rate (30 FPS), but its accuracy is unacceptable. The 7 × 7 version shows a non-negligible FPS drop, unlike the 5 × 5 kernel. The 9 × 9 kernel has the highest accuracy and reaches an even finer resolution (3 cm, on a laptop), but it drives the CPU to full utilization and eventually crashes the RPi. We believe that using the GPU may improve performance. Still, the results achieved with the 5 × 5 version, which we used in the road tests, are already within the set specifications [5].

We also considered other benchmarks: video playback (YouTube), gaming (Minecraft), and intensive computation (Wolfram Mathematica). In our case the CPU is the bottleneck, whereas in the other cases CPU utilization is quite limited. This further indicates that a custom use of the GPU by GAAS should lead to better results.
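The snippet below offers a rough intuition for why CPU load grows so quickly with kernel size. It counts pixel reads for a naive, non-separable $k \times k$ filter applied to full 1920 × 1080 frames at 30 FPS. Several assumptions limit what it shows. OpenCV's Gaussian filter is separable and its median filter uses faster algorithms. The program processes two frame copies, and the Canny and Hough stages add further cost. The figures therefore illustrate only the $k^2$ scaling, not the measured CPU loads in Table 2.

```python
FRAME_W, FRAME_H, FPS = 1920, 1080, 30  # camera settings from Sect. 3

def naive_reads_per_second(k, fps=FPS):
    """Pixel reads per second for a naive k x k filter over full frames."""
    return k * k * FRAME_W * FRAME_H * fps

for k in (3, 5, 7, 9):
    rate = naive_reads_per_second(k)
    ratio = rate / naive_reads_per_second(3)
    print(f"{k}x{k}: {rate / 1e9:.2f} G reads/s ({ratio:.1f}x the 3x3 cost)")
```

The loop prints 0.56, 1.56, 3.05, and 5.04 G reads/s. The 9 × 9 kernel is thus nine times as expensive as the 3 × 3 kernel under this naive model.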

#### 5 Conclusions

This paper has investigated the behavior of a Raspberry Pi 3 single-board PC running an artificial vision program, with a view to quantitatively assessing this open platform in an automotive embedded environment. The benchmark program performs grid alignment for the dynamic wireless charging of electric vehicles. It involves various signal processing functions and a simple online estimation algorithm. To the best of our knowledge, this is the first performance characterization of a Raspberry Pi 3 in an automotive environment.

The results have shown that the CPU is the bottleneck of the system. A custom exploitation of the GPU should lead to better results. Even so, with a proper filter kernel size, the system already reaches the target accuracy [5] while leaving room for other concurrent applications, with energy demands similar to those of state-of-the-art programs.

From the development viewpoint, choosing Python proved very effective, as it allowed a seamless transition between the desktop and embedded environments.

As future work, the results suggest implementing a GPU-targeted version of the system. Moreover, further analyses with other benchmarks and platforms are needed to better assess the potential of open hardware systems for embedded automotive applications.

**Acknowledgements** We would like to thank the FABRIC coordinator, Prof. Angelos Amditis, and all the colleagues who contributed to the successful execution of the project.

This work was supported in part by the EU, under the Feasibility Analysis and Development of on-road charging solutions for future electric vehicles (FABRIC) integrated project (FP7-SST-2013-RTD-1 605405).

#### References

- 1. Falcini, F., Lami, G., Costanza, A.M.: Deep learning in automotive software. IEEE Softw. 34(3), 56–63 (2017)
- 2. Ruffo, R., Cirimele, V., Diana, M., Khalilian, M., La Ganga, A., Guglielmi, P.: Sensorless control of the charging process of a dynamic inductive power transfer system with an interleaved nine-phase boost converter. IEEE Trans. Industr. Electron. 65(10), 7630–7639 (2018)
- 3. Tavakoli, R., Pantic, Z.: Analysis, design and demonstration of a 25-kW dynamic wireless charging system for roadway electric vehicles. IEEE J. Emerg. Sel. Topics Power Electron. https://doi.org/10.1109/jestpe.2017.2761763
- 4. Hwang, K., Park, J., Kim, D., Park, H.H., Kwon, J.H., Kwak, S.I., Ahn, S.: Autonomous coil alignment system using fuzzy steering control for electric vehicles with dynamic wireless charging. Math. Probl. Eng. Article ID 205285, 14 p (2015). https://doi.org/10.1155/2015/205285
- 5. Cirimele, V., Smiai, O., Guglielmi, P., Bellotti, F., Berta, R., De Gloria, A.: Maximizing power transfer for dynamic wireless charging electric vehicles. In: International Conference on Applications in Electronics Pervading Industry, Environment and Society, APPLEPIES 2016, Rome. Lecture Notes in Electrical Engineering, vol. 429, pp. 59–65 (2017). https://doi.org/10.1007/978-3-319-55071-8\_8
- 6. Amditis, A., Karaseitanidis, G., Damousis, I., Guglielmi, P., Cirimele, V.: Dynamic wireless charging for more efficient FEVs: the FABRIC project concept. MedPower 2014, Athens, pp. 1–6 (2014)
- 7. Raspberry Pi 3 Model B. https://www.raspberrypi.org/products/raspberry-pi-3-model-b/
- 8. Marosi, A.C., Lovas, R., Kisari, Á., Simonyi, E.: A novel IoT platform for the era of connected cars. In: 2018 IEEE International Conference on Future IoT Technologies (Future IoT), Eger, pp. 1–11 (2018)
- 9. Shi, W., Cao, J., Zhang, Q., Li, Y., Xu, L.: Edge computing: vision and challenges. IEEE Internet Things J. 3(5), 637–646 (2016)
- 10. Fan, Q., Ansari, N.: Application aware workload allocation for edge computing-based IoT. IEEE Internet Things J. 5(3), 2146–2153 (2018)
- 11. Hajdarevic, K., Konjicija, S., Subasi, A.: A low energy APRS-IS client-server infrastructure implementation using Raspberry Pi. In: 2014 22nd Telecommunications Forum Telfor (TELFOR), Belgrade, pp. 296–299 (2014)
- 12. Cimino, D., Ferrero, A., Queirolo, L., Bellotti, F., Berta, R., De Gloria, A.: A low-cost, open-source cyber physical system for automated, remotely controlled precision agriculture. In: Proceedings of Applications in Electronics Pervading Industry, Environment and Society (APPLEPIES), Lecture Notes in Electrical Engineering, Rome, Sept. 215. Springer, Cham
- 13. Hassan, Q.F.: A tutorial introduction to IoT design and prototyping with examples. In: Internet of Things A to Z: Technologies and Applications, vol. 1. Wiley-IEEE Press (2018)
- 14. He, Q., Segee, B., Weaver, V.: Raspberry Pi 2 B+ GPU power, performance, and energy implications. In: 2016 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, pp. 163–167 (2016)
- 15. Nunes, L.H., et al.: Performance and energy evaluation of RESTful web services in Raspberry Pi. In: 2014 IEEE 33rd International Performance Computing and Communications Conference (IPCCC), Austin, TX, pp. 1–9 (2014)
- 16. Jupyter Notebook. https://jupyter.org/
- 17. Beyeler, M.: OpenCV with Python Blueprints. Packt (2015)
- 18. Kobeissi, A., Bellotti, F., Berta, R., De Gloria, A.: IoT grid alignment assistant system for dynamic wireless charging of electric vehicles. In: 5th International Workshop on Intelligent Transportation and Connected Vehicles Technologies (ITCVT 2018), Valencia, Spain (2018)

# Analysis of Cybersecurity Weakness in Automotive In-Vehicle Networking and Hardware Accelerators for Real-Time Cryptography



Luca Baldanzi, Luca Crocetti, Matteo Bertolucci, Luca Fanucci and Sergio Saponara

**Abstract** This work analyses the cybersecurity weaknesses of state-of-the-art automotive in-vehicle networks and discusses possible countermeasures at the architecture level. Hardware accelerators are needed because of the stringent real-time constraints (throughput and latency) of fail-safe automotive applications. The paper also presents a hardware accelerator design for AES (Advanced Encryption Standard)-128/256 calculation, the latter being already considered post-quantum resistant, together with implementation results in FPGA and 45 nm CMOS technology.

**Keywords** HW accelerators • Automotive cybersecurity • AES (Advanced Encryption Standard)

#### 1 Introduction

Modern automotive systems feature several networked architectures and on-board communication buses. Without proper security mechanisms, the volume of exchanged data and in-vehicle network traffic offers a wide range of attack surfaces. This makes automotive systems vulnerable to many typical cybersecurity threats and attacks, such as data sniffing, data tampering, and unauthorized access. Due to the stringent real-time (throughput and latency) constraints of fail-safe automotive applications, hardware accelerators are needed. The work presented in this paper addresses these issues and is organized as follows. Section 2 analyses the cybersecurity threats in state-of-the-art automotive in-vehicle networks, and Sect. 3 discusses possible countermeasures at the architecture level. Section 4 presents the design of a hardware accelerator for AES (Advanced Encryption Standard)-128/256 calculation, the latter being already considered post-quantum resistant. Implementation results for the AES accelerator in FPGA and 45 nm CMOS technology are discussed in Sect. 5. Conclusions are drawn in Sect. 6.

L. Baldanzi (✉) · L. Crocetti · M. Bertolucci · L. Fanucci · S. Saponara Department of Information Engineering (DII), University of Pisa, via G. Caruso 16, 56122 Pisa, Italy

e-mail: luca.baldanzi@ing.unipi.it

L. Crocetti e-mail: luca.crocetti@for.unipi.it

M. Bertolucci e-mail: bertolucci.matteo@gmail.com

L. Fanucci e-mail: luca.fanucci@unipi.it

S. Saponara e-mail: sergio.saponara@unipi.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_2

#### 2 Cybersecurity Weakness and Countermeasures for On-Board Networking Automotive Systems

This paper addresses on-board embedded and networked automotive systems, so this section focuses on cybersecurity issues and countermeasures for in-vehicle networks. Other cybersecurity aspects are out of the scope of this work, such as those related to Vehicle-to-Everything (V2X) connectivity and cloud-based traffic and fleet management. Secure-by-design in-vehicle networking should ensure several properties, such as data integrity, confidentiality, authentication, and availability. However, current in-vehicle networking technologies suffer from several security vulnerabilities [1-8]. They use the Controller Area Network (CAN) and/or CAN-FD as a backbone, plus a plethora of other interconnecting technologies for specific subsystems:

- LIN (Local Interconnect Network) for low data-rate nodes;
- MOST (Media Oriented Systems Transport) for infotainment with USB and Bluetooth user interfaces;
- FlexRay for latency-critical functions.

Network-spanning data exchange via various gateway devices potentially allows access to any vehicular bus from every other bus system. Indeed, each LIN, CAN or MOST controller is potentially able to send messages to any other car controller [9, 10]. Without specific preventive measures, a single compromised bus system endangers the whole vehicle communication network. Attacks on LIN or multimedia networks may cause the failure of power windows or navigation software. Successful attacks on CAN or FlexRay networks, by contrast, may cause important driver-assistance functions to malfunction, seriously impairing driving safety. The Cyclic Redundancy Check (CRC) protects data integrity against transmission errors. However, the broadcast nature of CAN/CAN-FD and FlexRay is a confidentiality risk, as a compromised Electronic Control Unit (ECU) can monitor all data passing on the bus.

New ECUs can be added in a plug-and-play way (by assigning them a new identifier) without modifying the already installed ECUs. Moreover, the data link layer provides no mechanism for signing or encrypting transmissions. Together, these create a high risk of authentication vulnerabilities and of unauthorized access to the CAN bus and its traffic. Data sniffing and manipulation, jamming, and Denial-of-Service (DoS) are only a few of the possible attacks on the backbone bus or on a local bus, which could lead to system failure or to functional failure, respectively. Furthermore, malicious CAN frames can exploit the CAN mechanism for automatic fault confinement: by posting several well-directed error flags, they can force the disconnection of any single controller. Similarly, the bus guardian in FlexRay can be exploited to deactivate any controller through appropriately faked error messages. Attacks on the common time base, which would make the FlexRay network completely inoperative, are also feasible by posting malicious SYNC messages on the bus. Finally, injecting well-directed sleep frames deactivates the corresponding power-saving-capable FlexRay controllers.
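The error-flag attack relies on CAN fault confinement, which the sketch below models in deliberately simplified form. Under the ISO 11898-1 counter rules, a transmitter's transmit error counter (TEC) rises by 8 on each transmit error and falls by 1 on each successful transmission. The node becomes error-passive above 127 and goes bus-off above 255. The model omits receive-side increments, the standard's exceptions, and bus-off recovery. Under these assumptions, 32 consecutive forced errors are enough to disconnect a victim ECU.

```python
from dataclasses import dataclass
from typing import Optional

ERROR_PASSIVE_LIMIT = 127  # TEC or REC above this: error-passive
BUS_OFF_LIMIT = 255        # TEC above this: bus-off (node leaves the bus)

@dataclass
class CanNode:
    tec: int = 0  # transmit error counter
    rec: int = 0  # receive error counter (not driven in this simplified model)

    @property
    def state(self) -> str:
        if self.tec > BUS_OFF_LIMIT:
            return "bus-off"
        if self.tec > ERROR_PASSIVE_LIMIT or self.rec > ERROR_PASSIVE_LIMIT:
            return "error-passive"
        return "error-active"

    def transmit_error(self) -> None:
        self.tec += 8

    def transmit_success(self) -> None:
        self.tec = max(0, self.tec - 1)

def forced_errors_until_bus_off(node: CanNode, successes_between: int = 0) -> Optional[int]:
    """Attacker-forced transmit errors needed to push `node` into bus-off.

    `successes_between` successful frames are assumed after each attack. If the
    first error does not cause bus-off and 8 or more successes follow each error,
    the counter never exceeds its first peak, so None is returned.
    """
    errors = 0
    while True:
        node.transmit_error()
        errors += 1
        if node.state == "bus-off":
            return errors
        if successes_between >= 8:
            return None  # net TEC change per cycle <= 0
        for _ in range(successes_between):
            node.transmit_success()
```

Even with one successful frame between attacks, the victim still goes bus-off after 37 forced errors, because each attack/success cycle raises the TEC by a net 7.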

#### 3 Countermeasures for On-Board Networking Automotive Systems

As possible countermeasures, the following techniques are foreseen and are likely to appear in the new generation of car connectivity devices:

- To cluster the subnetworks and related subsystems into security islands, separated by gateways with embedded cybersecurity functionalities. An attack on a non-safety-related bus, such as LIN or MOST, then cannot propagate to the safety-related functions connected to FlexRay or CAN [3]. This approach will also be applied to future architectures based on Automotive Ethernet [11].
- To embed cybersecurity hardware accelerators in new automotive computing units to sustain real-time message encryption. This is why, in Sects. 4 and 5, we propose a new digital macrocell that implements real-time security techniques such as the symmetric-key algorithm AES. More complex algorithms, such as the Elliptic Curve Digital Signature Algorithm (ECDSA) for public-key cryptography, are also under development [12, 13]. HW-based co-processors are needed to meet stringent latency and energy-efficiency requirements that software-based implementations cannot achieve.
- To embed signature mechanisms for controller authentication in new automotive computing units. All senders must be authenticated to ensure that only valid controllers can communicate on automotive bus systems [3, 12, 14, 15]. All unauthorized messages may then be processed separately or discarded immediately. Every controller therefore needs a certificate to authenticate itself to the gateway as a valid sender. For example, as proposed in [3], a certificate may consist of the controller identifier (ID), its public key, and the authorizations of the respective controller. The gateway, in turn, should securely hold a list of the public keys of all accredited OEMs for the considered vehicle. Each controller certificate is digitally signed by the OEM with the corresponding secret key, and the gateway uses the OEM's public key to verify the validity of the certificate. If the authentication succeeds, the controller is added to the gateway's list of valid controllers.

- To cluster the ECUs into different trust classes depending on how easily they can be attacked. For example, [16] proposes VeCure, a security framework for vehicular systems that can fundamentally solve the message authentication issue of the CAN bus. Each node that sends a CAN packet must also send a message authentication code packet (8 bytes). The ECUs are split into two categories, the Low-trust group and the High-trust group. ECUs with external interfaces, e.g., OBD-II or telematics, are placed in the Low-trust group. The High-trust group ECUs share a secret symmetric key to authenticate each incoming and outgoing message.
- To implement intrusion detection mechanisms based on physical-layer or packet-layer features. For example, [5] proposes a clock-based intrusion detection system at the physical layer. Similarly, [17] proposes an in-vehicle network traffic monitoring technique that detects the increased transmission rates of manipulated message streams.
- To implement gateway firewalls. For example, as proposed in [3], if the vehicular controllers can implement digital signatures, the firewall rules are based on the authorizations given in each controller's certificate. Hence, only authorized controllers can send valid messages to the highly safety-critical in-vehicle bus systems. If the vehicular controllers cannot use digital signatures, the firewall can only be based on the authorizations of each subnet. In any case, controllers of less restricted networks, such as LIN or MOST, should generally be prevented from sending messages to highly safety-relevant bus systems such as CAN or FlexRay. Simplified firewall-like functionalities can also be implemented in each end node, not only in the gateways, with the so-called Data Diode [18].
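The toy gateway sketched below combines three of the countermeasures above: a subnet-level firewall, symmetric-key message authentication with an 8-byte MAC in the spirit of VeCure, and rate-based intrusion detection. It is not the mechanism of [3], [16] or [17]. The route table, HMAC-SHA256 truncated to 8 bytes, the freshness counter against replay, and the "half the nominal period" rate threshold are all our own hypothetical choices.

```python
import hashlib
import hmac

# Hypothetical subnet-level policy as (source bus, destination bus) pairs.
# LIN and MOST can never reach the safety-relevant CAN/FlexRay buses.
ALLOWED_ROUTES = {
    ("CAN", "CAN"), ("CAN", "FlexRay"), ("FlexRay", "CAN"), ("FlexRay", "FlexRay"),
    ("CAN", "LIN"), ("CAN", "MOST"), ("LIN", "LIN"), ("MOST", "MOST"),
}
MAC_LEN = 8  # bytes, matching the 8-byte authentication packet described for VeCure

def compute_mac(key: bytes, can_id: int, payload: bytes, counter: int) -> bytes:
    """HMAC-SHA256 over CAN ID, freshness counter and payload, truncated to 8 bytes."""
    msg = can_id.to_bytes(4, "big") + counter.to_bytes(8, "big") + payload
    return hmac.new(key, msg, hashlib.sha256).digest()[:MAC_LEN]

class ToyGateway:
    def __init__(self, high_trust_key: bytes, expected_period_s: dict):
        self.key = high_trust_key
        self.expected_period_s = expected_period_s  # nominal period per CAN ID
        self.last_seen = {}                         # CAN ID -> last authentic arrival
        self.last_counter = {}                      # CAN ID -> last accepted counter

    def accept(self, src_bus, dst_bus, can_id, payload, counter, mac, t):
        """Return (accepted, reason) for one frame arriving at time t (seconds)."""
        # 1) Firewall: only routes listed in the policy are forwarded.
        if (src_bus, dst_bus) not in ALLOWED_ROUTES:
            return False, "firewall"
        # 2) Authentication: constant-time MAC comparison, then replay check.
        if not hmac.compare_digest(compute_mac(self.key, can_id, payload, counter), mac):
            return False, "bad-mac"
        if counter <= self.last_counter.get(can_id, 0):
            return False, "replay"
        # 3) Rate-based IDS: authentic frames arriving much faster than the
        #    nominal period are flagged as a possible injection.
        previous = self.last_seen.get(can_id)
        self.last_seen[can_id] = t
        period = self.expected_period_s.get(can_id)
        if previous is not None and period is not None and t - previous < 0.5 * period:
            return False, "rate-anomaly"
        self.last_counter[can_id] = counter
        return True, "ok"
```

The checks are ordered from cheapest to most stateful. A frame on a forbidden route is dropped before any cryptographic work, which matters on a gateway that must forward traffic in real time.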

#### 4 AES-128/256 HW Accelerator Design for Real-Time Embedded Cryptography

The AES is a symmetric-key block cipher that processes 128-bit data blocks with three possible key sizes: 128 bits (AES-128), 192 bits (AES-192) and 256 bits (AES-256) [19]. The AES is an iterative algorithm, and each iteration is called a "round". The number of rounds depends on the key size: 10 rounds for AES-128, 12 for AES-192 and 14 for AES-256. Each round consists of four steps:

- a non-linear substitution (*SubBytes*);
- a permutation (*ShiftRows*);
- a linear transformation that mixes the data (*MixColumns*);
- the combination of the data with the round key (*AddRoundKey*).

An initial *AddRoundKey* precedes the first round, and the final round omits *MixColumns*. Each of these transformations is invertible, so the AES transformations can be reverted to decrypt the data.
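The round structure described above, as specified in the AES standard (FIPS-197), can be written out explicitly. The short sketch below lists which transformations run in each round and checks the rule $N_r = N_k + 6$, where $N_k$ is the key length in 32-bit words. It only enumerates the steps and does not implement the transformations themselves.

```python
ROUNDS = {128: 10, 192: 12, 256: 14}  # number of rounds Nr per key size in bits

def round_schedule(key_bits):
    """Ordered (round, transformations) list for encrypting one 128-bit block."""
    if key_bits not in ROUNDS:
        raise ValueError("AES key size must be 128, 192 or 256 bits")
    nr = ROUNDS[key_bits]
    assert nr == key_bits // 32 + 6  # FIPS-197: Nr = Nk + 6
    steps = [(0, ["AddRoundKey"])]   # initial key addition before round 1
    for r in range(1, nr):
        steps.append((r, ["SubBytes", "ShiftRows", "MixColumns", "AddRoundKey"]))
    steps.append((nr, ["SubBytes", "ShiftRows", "AddRoundKey"]))  # no MixColumns
    return steps
```

For AES-128, this yields 11 entries (the initial key addition plus 10 rounds), with *MixColumns* appearing in 9 of them.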

An additional algorithm, called Key Expansion, derives the round keys from the initial Cipher Key. Figure 1 gives a graphical representation of the AES algorithm, its rounds and their transformations.



**Fig. 1** AES encryption algorithm, the number of rounds (*N*) depends on the Cipher Key size: N = 10 for 128-bit Cipher Key (AES-128), N = 12 for 192-bit Cipher Key (AES-192) and N = 14 for 256-bit Cipher Key (AES-256)
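A software cross-check of the round structure in Fig. 1 can be useful. The sketch below is a minimal pure-Python AES encryption for 128-, 192- and 256-bit keys, written from the public specification [19]. It is not derived from the authors' hardware design.

Points to note:
- The function names are our own.
- The S-box is computed from its GF(2^8) definition rather than tabulated.
- The code is neither fast nor constant-time. It is meant only as a golden reference model, for example to generate expected outputs for an HDL testbench.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    p = 0
    while b:
        if b & 1:
            p ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return p


def _rotl8(x: int, n: int) -> int:
    return ((x << n) | (x >> (8 - n))) & 0xFF


def _build_sbox() -> list[int]:
    """SubBytes table: affine transform of the multiplicative inverse."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):            # x^254 == x^-1 in GF(2^8)
                inv = gf_mul(inv, x)
        sbox.append(inv ^ _rotl8(inv, 1) ^ _rotl8(inv, 2)
                    ^ _rotl8(inv, 3) ^ _rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = _build_sbox()


def key_expansion(key: bytes) -> list[list[int]]:
    """Derive Nr + 1 round keys (16 bytes each) from a 16/24/32-byte key."""
    if len(key) not in (16, 24, 32):
        raise ValueError("AES keys are 128, 192 or 256 bits long")
    nk = len(key) // 4                      # key length in 32-bit words
    nr = nk + 6                             # 10, 12 or 14 rounds
    w = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 0x01
    for i in range(nk, 4 * (nr + 1)):
        temp = list(w[i - 1])
        if i % nk == 0:
            temp = temp[1:] + temp[:1]      # RotWord
            temp = [SBOX[b] for b in temp]  # SubWord
            temp[0] ^= rcon
            rcon = gf_mul(rcon, 0x02)       # next round constant
        elif nk > 6 and i % nk == 4:
            temp = [SBOX[b] for b in temp]  # extra SubWord (AES-256 only)
        w.append([a ^ b for a, b in zip(w[i - nk], temp)])
    return [sum(w[4 * r:4 * r + 4], []) for r in range(nr + 1)]


def _add_round_key(state: list[int], rk: list[int]) -> list[int]:
    return [s ^ k for s, k in zip(state, rk)]


def _shift_rows(state: list[int]) -> list[int]:
    # state[r + 4*c] is row r, column c; row r is rotated left by r bytes
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def _mix_columns(state: list[int]) -> list[int]:
    out = []
    for c in range(4):
        a0, a1, a2, a3 = state[4 * c:4 * c + 4]
        out += [gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3,
                a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3,
                a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3),
                gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2)]
    return out


def encrypt_block(block: bytes, key: bytes) -> bytes:
    """Encrypt one 16-byte block, i.e. the AES-ECB primitive."""
    if len(block) != 16:
        raise ValueError("AES processes 128-bit blocks")
    round_keys = key_expansion(key)
    nr = len(round_keys) - 1
    state = _add_round_key(list(block), round_keys[0])  # initial AddRoundKey
    for rnd in range(1, nr + 1):
        state = [SBOX[b] for b in state]                # SubBytes
        state = _shift_rows(state)                      # ShiftRows
        if rnd < nr:                                    # final round skips MixColumns
            state = _mix_columns(state)
        state = _add_round_key(state, round_keys[rnd])  # AddRoundKey
    return bytes(state)
```

Decryption, which the accelerator also supports, applies the inverse transformations in reverse order. It is omitted here for brevity.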

The inter-round pipelined architecture is one of the most widespread approaches for a co-processor implementing the AES algorithm, because it achieves a very high efficiency in terms of the area/latency trade-off. This architecture implements a single round, with all its internal transformations, and uses it iteratively, storing the intermediate results in a buffer. Figure 2 illustrates the inter-round pipelined architecture for the AES co-processor.

The Key Expansion algorithm is also iterative. It derives the round keys round by round, with transformations similar to those performed by the AES algorithm, except for the linear data mixing transformation, which is not executed. Thus, an architecture similar to the one illustrated in Fig. 2 is also well suited to the module implementing the Key Expansion algorithm.



**Fig. 2** Inter-round pipelined architecture for an AES hardware accelerator

**Table 1** AES HW accelerator synthesis results. LE stands for Logic Element. For the Stratix IV FPGA, an LE can be either an adaptive LUT (ALUT), i.e. a combinational logic resource, or a 1-bit register. kGE stands for kilo Gate Equivalent, where 1 Gate Equivalent corresponds to a 4-transistor gate. In the 'Latency' and 'Throughput' rows, values separated by the '/' character refer, respectively, to AES-128 and AES-256 execution

| Feature              | FPGA technology    | 45 nm CMOS technology |
|----------------------|--------------------|-----------------------|
| Logic resources/area | 8063 LEs (4.4%)    | 19 kGE                |
| Maximum frequency    | 145 MHz            | 460 MHz               |
| Latency              | 11/15 clock cycles | 11/15 clock cycles    |
| Throughput           | 1.69/1.24 Gbps     | 5.35/3.93 Gbps        |
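The latency and throughput figures in Table 1 follow from the cycle counts and clock frequencies, as do the 23.9 ns and 32.6 ns quoted in the Conclusions. The small Python check below assumes that one block is processed at a time, so that throughput = 128 bits × f / cycles. It also compares the results with the bus rates quoted in the Conclusions.

Note that 11 and 15 cycles equal N + 1 for N = 10 and N = 14 rounds. This is consistent with one clock cycle per round plus one extra cycle, but that breakdown is our inference and is not stated in the text.

```python
BLOCK_BITS = 128  # all AES variants use 128-bit data blocks


def latency_ns(cycles: int, f_mhz: float) -> float:
    """Time to process one block: cycles / f, with f in MHz (result in ns)."""
    return cycles / f_mhz * 1e3


def throughput_gbps(cycles: int, f_mhz: float) -> float:
    """One 128-bit block completes every `cycles` clock cycles (no overlap)."""
    return BLOCK_BITS * f_mhz / cycles / 1e3


def headroom(gbps: float, bus_mbps: float) -> float:
    """How many times the accelerator throughput exceeds a bus data rate."""
    return gbps * 1e3 / bus_mbps


CYCLES = {"AES-128": 11, "AES-256": 15}            # Table 1, 'Latency' row
CLOCK_MHZ = {"FPGA": 145.0, "45 nm CMOS": 460.0}   # Table 1, 'Maximum frequency'
BUS_MBPS = {"CAN": 1.0, "CAN-FD": 12.0, "Fast Ethernet": 100.0,
            "Gigabit Ethernet": 1000.0}            # rates quoted in Sect. 6

for tech, f in CLOCK_MHZ.items():
    for alg, n in CYCLES.items():
        gbps = throughput_gbps(n, f)
        print(f"{tech} {alg}: {latency_ns(n, f):.1f} ns, {gbps:.2f} Gbps, "
              f"{headroom(gbps, BUS_MBPS['Gigabit Ethernet']):.2f}x Gigabit Ethernet")
```

Under this model, even the slowest configuration (AES-256 on the FPGA, about 1.24 Gbps) stays above the Gigabit Ethernet line rate. The margin is narrow, however.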

#### 5 AES-128/256 HW Accelerator Implementation Results in FPGA and 45 nm CMOS Technology

Supporting AES-192 would require additional logic resources, which is not justified by the security level that algorithm offers. Therefore, a single AES hardware accelerator has been implemented that supports both AES-128 and AES-256, for both encryption and decryption. It offers a very high security level by means of AES-256, which has been indicated as resistant to post-quantum cryptanalysis [20].

The AES hardware accelerator has been synthesized on a Stratix IV FPGA (EP4SGX230KF40C2) and on the 45 nm CMOS technology provided by the NanGate FreePDK45 Open Cell Library. Table 1 summarizes the main results.

As reported in Table 1, when implemented on the Stratix IV, the AES hardware accelerator occupies 8063 LEs, corresponding to 4.4% of the total logic resources available on the FPGA. Each LE of the Stratix IV FPGA can be a combinational logic element (ALUT, Adaptive LUT), a 1-bit register, or a combination of ALUT and register. Of the 8063 LEs:
- 3656 are used as pure combinational logic elements;
- 262 are used as pure registers;
- 202 are used as a combination of combinational and sequential logic;
- 3943 are used for routing and interconnection.

#### 6 Conclusions

The implemented AES hardware accelerator can be used as a co-processor to secure automotive in-vehicle networks. It ensures data confidentiality with a high security level (AES-256) and meets the stringent real-time requirements of the automotive domain. It offers low latency (23.9 ns for AES-128 and 32.6 ns for AES-256 in the 45 nm CMOS implementation, i.e. 11 and 15 clock cycles at 460 MHz) and high throughput (see Table 1). It can therefore readily support many of the most widespread and safety-critical in-vehicle networks, such as CAN and CAN-FD, which feature maximum data rates of 1 Mbps and 12 Mbps respectively. It can also support future Ethernet-based automotive architectures employing Gigabit Ethernet (1 Gbps) or Fast Ethernet (100 Mbps). Furthermore, the presented AES hardware accelerator already supports the AES-ECB mode of operation. It can be embedded in more sophisticated co-processors implementing other AES modes of operation, such as AES-CTR, AES-CMAC, AES-CCM and/or AES-GCM. Among these, AES-CMAC, AES-CCM and AES-GCM also provide data integrity and data authentication. Together with anti-replay mechanisms, they are widely employed within many secure communication protocols, such as MACsec (IEEE 802.1AE) for data protection on Ethernet LANs.
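Wrapping an ECB core into CTR mode needs only a counter and an XOR. This is why an ECB accelerator is a natural building block for the modes listed above. The minimal sketch below rests on the following assumptions:
- a 96-bit nonce with a 32-bit block counter starting at 0 (real modes and protocols fix their own counter layout and initial value);
- `toy_block_fn`, a hypothetical stand-in PRF built from HMAC-SHA256. It is NOT AES and would be replaced by the accelerator's AES-ECB encryption.

```python
import hashlib
import hmac
from typing import Callable

BlockFn = Callable[[bytes], bytes]  # 16-byte input -> 16-byte output (the ECB core)


def toy_block_fn(key: bytes) -> BlockFn:
    """Hypothetical stand-in for AES-ECB encryption; NOT AES, illustration only."""
    def f(block: bytes) -> bytes:
        return hmac.new(key, block, hashlib.sha256).digest()[:16]
    return f


def ctr_xcrypt(block_fn: BlockFn, nonce: bytes, data: bytes) -> bytes:
    """CTR mode: XOR the data with E(nonce || counter); encrypts and decrypts."""
    if len(nonce) != 12:
        raise ValueError("this sketch assumes a 96-bit nonce and a 32-bit counter")
    out = bytearray()
    for i in range(0, len(data), 16):
        counter_block = nonce + (i // 16).to_bytes(4, "big")
        keystream = block_fn(counter_block)
        chunk = data[i:i + 16]            # the last chunk may be shorter than 16 bytes
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)
```

Because CTR mode only ever calls the forward block function, the same path decrypts. A CTR wrapper around the accelerator would therefore not need the inverse cipher. Authenticated modes such as GCM or CCM add a MAC computation on top of this keystream.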

Acknowledgements This work has been partially supported by PRA2017 and EPI H2020 projects.

#### References

- 1. Nilsson, D.K., Larson, U.E., Picasso, F., Jonsson, E.: A first simulation of attacks in the automotive network communications protocol FlexRay. In: International Workshop on Computational Intelligence in Security for Information Systems, CISIS 2008, pp. 84–91. Springer, Heidelberg (2009)
- 2. Lin, C.W., Sangiovanni-Vincentelli, A.: Cyber-security for the controller area network (CAN) communication protocol. In: International Conference on Cyber Security, p. 17 (2012)
- 3. Wolf, M., Weimerskirch, A., Paar, C.: Secure In-Vehicle Communication, pp. 95–109. Springer, Heidelberg (2006)
- 4. Avatefipour, O., Malik, H.: State-of-the-art survey on in-vehicle network communication CAN-Bus security and vulnerabilities. Int. J. Comput. Sci. Netw. 6(6), 720–727 (2017)
- 5. Cho, K.-T., Shin, K.G.: Fingerprinting electronic control units for vehicle intrusion detection. In: 25th USENIX Security Symposium, Austin, TX, pp. 911–927 (2016)
- 6. dos Santos, E., Simpson, A., Schoop, D.: A formal model to facilitate security testing in modern automotive systems. In: Joint Workshop on Handling IMPlicit and EXplicit Knowledge in Formal System Development (IMPEX) and Formal and Model-Driven Techniques for Developing Trustworthy Systems, pp. 95–104 (2017)
- 7. Hoppe, T., Kiltz, S., Dittmann, J.: Security threats to automotive CAN networks-practical examples and selected short-term countermeasures. Reliab. Eng. Syst. Saf. 96(1), 11–25 (2011). Special Issue on Safecomp 2008
- 8. Lukasiewycz, M., Mundhenk, P., Steinhorst, S.: Security-aware obfuscated priority assignment for automotive CAN platforms. ACM Trans. Des. Autom. Electron. Syst. 21(2) (2016)
- 9. Eisenbarth, T., Kasper, T., Moradi, A., Paar, C., Salmasizadeh, M., Shalmani, M.T.M.: On the power of power analysis in the real world: a complete break of the KeeLoq code hopping scheme. In: Wagner, D. (ed.) Advances in Cryptology, CRYPTO 2008, pp. 203–220. Springer, Heidelberg (2008)
- 10. Koscher, K., Czeskis, A., Roesner, F., Patel, S., Kohno, T., Checkoway, S., McCoy, D., Kantor, B., Anderson, D., Shacham, H., Savage, S.: Experimental security analysis of a modern automobile. In: IEEE Symposium on Security and Privacy, pp. 447–462 (2010)
- 11. Shreejith, S., Mundhenk, P., Ettner, A., Fahmy, S.A., Steinhorst, S., Lukasiewycz, M., Chakraborty, S.: Vega: a high performance vehicular ethernet gateway on hybrid FPGA. IEEE Trans. Comput. 66(10), 1790–1803 (2017)
- 12. Patsakis, C., Dellios, K., Bouroche, M.: Towards a distributed secure in-vehicle communication architecture for modern vehicles. Comput. Secur. 40, 60–74 (2014)
- 13. Sghaier, A., Zeghid, M., Machhout, M.: Fast hardware implementation of ECDSA signature scheme. In: 2016 International Symposium on Signal, Image, Video and Communications, pp. 343–348 (2016)
- 14. Ueda, H., Kurachi, R., Takada, H., Mizutani, T., Inoue, M., Horihata, S.: Security Authentication System for In-Vehicle Network. SEI Tech. Rev. 81 (2015)
- 15. Mundhenk, P., Paverd, A., Mrowca, A., Steinhorst, S., Lukasiewycz, M., Fahmy, S.A., Chakraborty, S.: Security in automotive networks: lightweight authentication and authorization. ACM Trans. Des. Autom. Electron. Syst. 22(2), 25:1–25:27 (2017)
- 16. Wang, Q., Sawhney, S.: VeCure: a practical security framework to protect the CAN bus of vehicles. In: 2014 International Conference on the Internet of Things (IOT), pp. 13–18 (2014)
- 17. Waszecki, P., Mundhenk, P., Steinhorst, S., Lukasiewycz, M., Karri, R., Chakraborty, S.: Automotive electrical and electronic architecture security via distributed in-vehicle traffic monitoring. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 36(11), 1790–1803 (2017)
- 18. Okhravi, H., Sheldon, F.T., Haines, J.: Data Diodes in Support of Trustworthy Cyber Infrastructure and Net-centric Cyber Decision Support, pp. 203–216. Springer (2013)
- 19. National Institute of Standards and Technology (NIST): Advanced Encryption Standard (AES), FIPS PUB 197, 26 Nov 2001
- 20. Moody, D.: National Institute of Standards and Technology (NIST), Update on the NIST post-quantum cryptography project. https://csrc.nist.gov/CSRC/media/Presentations/Update-onthe-NIST-Post-Quantum-Cryptography-Proje/images-media/2\_post-quantum\_dmoody.pdf

# Testing Facility for the Characterization of the Integration of E-Vehicles into Smart Grid in Presence of Renewable Energy



#### Paolo Ferrari, Alessandra Flammini, Marco Pasetti, Stefano Rinaldi, Flavio Simoncini and Emiliano Sisinni

**Abstract** In recent years, increased environmental awareness has been calling for the transition from vehicles powered by Internal Combustion Engines (ICEs) toward Electric Vehicles (EVs). Nevertheless, the wide penetration of such technologies is limited by the impact that EV Charging Stations (EVCSs) have on the distribution grid. The management of EVCSs could benefit from the use of energy produced by renewable resources, appropriately coupled with storage systems, through the infrastructures offered by Smart Grids (SGs). These architectures can be validated in simulation or emulation environments. While such approaches are useful for evaluating the sensitivity of different architectures to parameter changes, over-simplified models can sometimes lead to unreliable results. For this reason, a testing facility for the characterization of the integration of EVCSs in SGs has been designed and deployed at the eLUX lab of the University of Brescia, Italy. The testing facility includes:
- an EVCS (22 kW);
- an EV (Renault Zoe);
- a controllable photovoltaic (PV) field (10 kWp);
- a Battery Energy Storage System (BESS) (20 kWp, 23.5 kWh).

The possibility of integrating a real-time emulator (OPAL-RT) for Hardware-In-the-Loop (HIL) emulation makes it easy to expand the capabilities of the testing facility.

P. Ferrari · A. Flammini · M. Pasetti · S. Rinaldi · F. Simoncini · E. Sisinni (🖂)

Department of Information Engineering, University of Brescia, Via Branze 38, 25123 Brescia, Italy

e-mail: emiliano.sisinni@unibs.it

P. Ferrari e-mail: paolo.ferrari@unibs.it

A. Flammini e-mail: alessandra.flammini@unibs.it

M. Pasetti e-mail: marco.pasetti@unibs.it

S. Rinaldi e-mail: stefano.rinaldi@unibs.it

F. Simoncini e-mail: flavio.simoncini@unibs.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_3

#### 1 Introduction

Electric Vehicles (EVs), and in particular Battery EVs (BEVs), are progressively replacing conventional Internal Combustion Engine (ICE) vehicles, because of regulatory restrictions that are stopping the circulation of the most polluting diesel vehicles. In addition, recent research works have demonstrated that BEVs are less carbon-intensive even with respect to Plug-in Hybrid EVs (PHEVs) and Hybrid EVs (HEVs) [1]. They have also shown that a large penetration of EVs could significantly reduce air pollution in urban contexts [2, 3]. One of the main limits of BEVs is still the low energy density of batteries [4], but recent advances in battery technology seem promising for reducing costs and improving capacity [5].

Nevertheless, several obstacles remain, mainly concerning the spatial and temporal stochasticity of the power demand of EVs at Electric Vehicle Charging Stations (EVCSs). In fact, the current power grid is not ready for a large penetration of EVs, as demonstrated in [6]. Several research works propose advanced optimization strategies for the integration of EVs in Smart Grids [7]. Some of them [8] exploit the communication infrastructures [9] deployed on the power grid for control [10], smart metering [11] and measurement applications [12, 13]. Such integration is required for the adoption of proper Demand-Side Management (DSM) schemes [14], which are crucial for the successful integration of EVs in energy systems [15].

Generally, such integration schemes are evaluated through numerical simulations. This approach is suitable for identifying the sensitivity of the system to parameter variations. Nevertheless, the reliability of the results depends on the accuracy of the models. The integration of external devices requires the use of emulation environments and test benches. EVs and their components, such as electric drives and Electronic Control Units (ECUs), are validated using either of two approaches, depending on whether only models or also physical subparts of the EV components are available:
- Model In the Loop (MIL) [16];
- Hardware In the Loop (HIL) [17].

The Software In the Loop (SIL) [18] approach is typically used to validate ECU firmware.

The target of this research is to design and deploy a testing facility for evaluating different integration schemes in an operating environment. The facility should be flexible enough to easily test different approaches, to validate existing simulation models and to enable the integration of a real-time emulator for HIL validation strategies. In particular, the testing facility should include not only EVCSs and EVs, but also Distributed Energy Resources (DERs) and Distributed Energy Storage Systems (DESSs), in order to validate the integration of renewable resources. The testing facility has been deployed in the energy Laboratory as University eXpo (eLUX) of the University of Brescia, Italy.

The paper is organized as follows. Section 2 describes the architecture of the testing facility and its deployment in the Engineering Campus of the University of Brescia. Section 3 shows how the testing facility is used to monitor the charging phase of two different EVs. Finally, the results are summarized in the conclusions.

#### 2 The Testing Facility

A large penetration of EVCSs on the distribution grid requires the definition of proper architectures for their integration. Several research works focus on the integration of EVCSs with DERs, properly coupled with DESSs, to mitigate the impact of charging EVs on the power grid. These approaches are particularly effective for private charging applications [19]. In such applications, the power provided by the Distribution System Operator (DSO) is limited, which increases charging times.

The architecture of the testing facility, designed and deployed at the eLUX lab of the University of Brescia, is shown in Fig. 1. The testing facility has been designed to be flexible and easily reconfigurable, so that different scenarios can be evaluated. These include the support of DSM programs, by receiving power schedules from a local Energy Management System (EMS) and by accepting end-users' requests such as fast charge or low-cost recharge.

The testing facility includes a Renewable Energy Source (RES) plant, which can be used to feed the EVCS and reduce the load on the power grid, and a BESS used for testing load-shifting strategies. The equipment of the testing facility is supervised and controlled by the Distributed Energy Resource Automatic Controller (DER-AC). The DER-AC monitors all the systems installed in the testing facility. It sends commands to the local system controllers to regulate the power flows according to the rules defined by the validation scenario.

The testing facility is completed by a LabVIEW Graphical User Interface (GUI). The operator uses it to monitor the state of the equipment and to set up the validation scenario (i.e., the energy flows among the components of the testing facility). The LabVIEW GUI is connected to the DER-AC through a RESTful Web Service (WS) interface.
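The DER-AC's rule-based regulation of power flows can be illustrated with a hypothetical scenario rule. Under this rule, the BESS discharges to keep the building's grid import below a DSO-style limit and stores any PV surplus.

The class and function names, SOC limits and sign conventions are our assumptions. The 20 kW default only mirrors the 20 kWp BESS rating quoted in the abstract. This is not the DER-AC's actual control logic, which is not described in the text.

```python
from dataclasses import dataclass


@dataclass
class Measurements:
    """One snapshot of the power meters (kW) and of the BESS state (hypothetical)."""
    pv_kw: float      # PV production (>= 0)
    load_kw: float    # other building loads, EVCS excluded
    evcs_kw: float    # EV charging power
    bess_soc: float   # BESS state of charge, 0..1


def bess_setpoint_kw(m: Measurements, grid_limit_kw: float,
                     bess_max_kw: float = 20.0,
                     soc_min: float = 0.1, soc_max: float = 0.9) -> float:
    """Hypothetical peak-shaving rule: > 0 discharges the BESS, < 0 charges it."""
    net_kw = m.load_kw + m.evcs_kw - m.pv_kw   # grid import if the BESS is idle
    if net_kw > grid_limit_kw and m.bess_soc > soc_min:
        return min(net_kw - grid_limit_kw, bess_max_kw)   # cap the grid import
    if net_kw < 0 and m.bess_soc < soc_max:
        return max(net_kw, -bess_max_kw)                  # store the PV surplus
    return 0.0
```

A validation scenario in this style would simply swap the rule function. For example, the limit could follow a power schedule received from the EMS instead of a fixed value.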

The testing facility deployed in the eLUX Lab of the University of Brescia consists of a set of devices that are monitored and controlled by the LabVIEW GUI through the DER-AC. The equipment is located in the "Modulo Didattico" building, one of the buildings of the Pilot Project Smart Campus and the headquarters of the eLUX lab of the University of Brescia. The building is equipped with:
- a 10 kWp photovoltaic (PV) plant;
- a 20 kWp molten salt (Na–NiCl<sub>2</sub>) Battery Energy Storage System (BESS) with a gross capacity of 23.5 kWh;
- a fast EVCS (equipped with 2 AC charging points, 22 kWp each) produced by Ducati Energia.

The PV plant, the BESS and the EVCS are connected to the main grid through the MV/LV substation of the building. The DER-AC is installed in the electric panel of the BESS. The DER-AC prototype has been deployed on a PC Engines APU2C4 board with:
- an AMD Embedded G series GX-412TC CPU (1 GHz, quad core, 64 bit);
- 4 GB of DDR3-1333 RAM;
- three Intel i210AT NICs;
- one Compex WLE200NX 802.11a/b/g/n miniPCI Express wireless card.

The board runs the Arch Linux x86-64 distribution, kernel version 4.7.2. The DER-AC monitors the power flows of each of the plants under its control (i.e., the PV plant, the BESS, the EVCS and the grid consumption of the building). It does so by means of UPM209 three-phase power meters produced by Algodue Elettronica s.r.l., which use Rogowski coils as current transducers. Their active power measurement accuracy is 0.1% of Full Scale (FS) at Power Factor (PF) = 1.

#### 3 Application of the Testing Facility to a Real Case

The testing facility described in the previous section can be used to characterize several scenarios of integration between the EVCS and the distribution grid. The system can also be used to validate the models normally adopted in numerical simulations, improving the reliability of simulation/emulation. One of the most important models is the charging profile of EVs, which is typically assumed to be constant. The testing facility can be used to easily estimate the charging profile of different EVs.

In particular, two full-electric EV models are considered in this paper: an entry-level EV (Renault Zoe R240) and a high-performance EV (Tesla Model S 90D).
- The Renault Zoe R240 is powered by a 68 kWp motor and is equipped with a Lithium-Ion battery with a net (i.e., usable) capacity of about 22 kWh. It supports only the fast AC charging mode, with a maximum charging power of 22 kW.
- The Tesla Model S 90D is powered by a 193 kWp motor and is equipped with a Lithium-Ion battery with a usable capacity of 90 kWh. It supports both fast AC and ultrafast DC charging modes, with a maximum charging power of 16.5 kW in AC mode and 120 kW in DC mode.

The charging profile of the Tesla Model S is shown in Fig. 2a (recorded on 8th June 2018, initial State Of Charge (SOC) = 20%, sampling time = 2 s). The charging profile of the Renault Zoe is shown in Fig. 2b (recorded on 6th June 2018, initial SOC = 38%, sampling time = 2 s). The different initial SOC affects both the duration of the Li-Ion battery recharge and the algorithm used to charge the battery. When the SOC is below 95%, the Li-Ion battery is charged at constant current; above this threshold, it is charged at constant voltage.
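A quick sanity check on the two charging sessions is the ideal lower bound on charging time: the energy to be added divided by the maximum charging power. The minimal sketch below ignores charger losses and the slower constant-voltage phase, so real sessions take longer. The function name is ours.

```python
def min_charge_hours(usable_kwh: float, soc_start: float, p_max_kw: float,
                     soc_end: float = 1.0) -> float:
    """Lower bound on charging time: energy to add divided by maximum power.

    Ignores charger losses and the slower constant-voltage phase.
    """
    if not 0.0 <= soc_start <= soc_end <= 1.0:
        raise ValueError("require 0 <= soc_start <= soc_end <= 1")
    return (soc_end - soc_start) * usable_kwh / p_max_kw


zoe_h = min_charge_hours(22.0, 0.38, 22.0)      # Renault Zoe, AC, initial SOC 38%
tesla_h = min_charge_hours(90.0, 0.20, 16.5)    # Tesla Model S, AC, initial SOC 20%
print(f"Zoe >= {zoe_h:.2f} h, Model S (AC) >= {tesla_h:.2f} h")
```

With the figures above, the Zoe needs at least about 0.62 h (about 37 min). The Model S on AC needs at least about 4.4 h. Any interruption, such as the over-temperature disconnections discussed next, adds to these bounds.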

Note that the charging profile of the Tesla Model S can be considered quite close to the ideal one. By contrast, the charging profile of the Renault Zoe is affected by several disconnections during charging, due to over-temperature of the battery charger.



Fig. 2 Active power recharge profile of a Tesla Model S (a) and Renault Zoe (b), connected to the testing facility. Data are sampled every 2 s



**Fig. 3** The power flows monitored by the testing facility during the recharge of the Renault Zoe. Data are sampled every 2 s. Observation interval of 24 h, 8th of June 2018

Such behavior depends on the cooling circuit of the EV. The main solution is to shade the charging station, to avoid charging the battery in direct sunlight.

As an example of the capabilities of the system, the testing facility is configured to monitor in real time the power flows among the power plants (PV plant, EVCS and other electric loads) during the charging process of the Renault Zoe (initial SOC = 38%). The power flows are shown in Fig. 3. They were sampled every 2 s over a 24 h observation interval on the 8th of June 2018. As clearly highlighted in Fig. 3, EV charging severely affects the total grid consumption. The availability of a PV plant is typically not enough to limit the impact of the EV on the power grid, for two reasons. First, the EV charging phase may not match the PV production peak. Second, the installed PV power is not enough to compensate the peak of EV consumption.
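Analyses like the one in Fig. 3 reduce to simple operations on the 2 s samples. The minimal sketch below computes the per-sample grid power as loads plus EVCS minus PV, and the energy by trapezoidal integration. It assumes aligned sample lists from the meters and an idle BESS; the sign convention and function names are ours.

```python
SAMPLE_S = 2.0  # sampling interval of the power meters in Figs. 2 and 3


def grid_import_kw(load_kw, evcs_kw, pv_kw):
    """Per-sample grid power (> 0 import, < 0 export), assuming an idle BESS."""
    return [load + ev - pv for load, ev, pv in zip(load_kw, evcs_kw, pv_kw)]


def energy_kwh(power_kw, dt_s=SAMPLE_S):
    """Trapezoidal integration of uniformly sampled power (kW) into kWh."""
    if len(power_kw) < 2:
        return 0.0
    kw_s = sum((a + b) / 2.0 * dt_s for a, b in zip(power_kw, power_kw[1:]))
    return kw_s / 3600.0
```

For example, integrating the EVCS channel alone gives the energy delivered to the EV. Integrating the positive part of `grid_import_kw` gives the energy actually drawn from the grid during the session.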
#### 4 Conclusion

The increasing environmental awareness is causing the progressive transition from traditional ICE vehicles toward EVs. The wide penetration of EVs is limited not only by the higher cost of the vehicles compared with traditional ones, but also by the impact that the charging system could have on the distribution grid. Each EV represents an electric load equivalent to tens of traditional houses, and a mobile one. Several possible solutions have been proposed in the literature and evaluated by means of numerical simulations. One of the most interesting, in particular for private charging, exploits DERs coupled with DESSs to limit the power drawn from the distribution grid.

The research work described in this paper aimed at designing and deploying a testing facility for validating such solutions in a working environment, to overcome the issue of over-simplified simulation models. The testing facility deployed at the eLUX lab of the University of Brescia consists of a PV plant, a BESS and a fast EVCS, supervised by a LabVIEW GUI. As a first result, the testing facility highlighted that the charging profile of different EV models can differ from the ideal one generally considered in simulation environments. In particular, the charging profile of the Renault Zoe is strongly affected by the environmental conditions (over-temperature), causing disturbances on the power line.

Acknowledgements The Authors would like to thank Mr. Mirko Magri for his valuable contribution during the deployment of the testing facility.

The research work has been partially funded by University of Brescia as part of the research activities of the laboratory "energy Laboratory as University eXpo–eLUX".

## References

- 1. Onat, N.C., Kucukvar, M., Tatari, O.: Conventional, hybrid, plug-in hybrid or electric vehicles? State-based comparative carbon and energy footprint analysis in the United States. Appl. Energy 150, 36–49 (2015)
- 2. Ferrero, E., Alessandrini, S., Balanzino, A.: Impact of the electric vehicles on the air pollution from a highway. Appl. Energy 169, 450–459 (2016)
- 3. Buekers, J., et al.: Health and environmental benefits related to electric vehicle introduction in EU countries. Transp. Res. Part D Transp. Env. 33, 26–38 (2014)
- 4. Shareef, H., et al.: A review of the stage-of-the-art charging technologies, placement methodologies, and impacts of electric vehicles. Renew. Sustain. Energy Rev. 64, 403–420 (2016)
- 5. IEA: Global EV Outlook 2017: Two million and counting (2017)
- 6. Mahmud, K., et al.: Integration of electric vehicles and management in the internet of energy. Renew. Sustain. Energy Rev. 82, 4179–4203 (2018)
- 7. Liu, L., et al.: A review on electric vehicles interacting with renewable energy in smart grid. Renew. Sustain. Energy Rev. 51, 648–661 (2015)
- 8. Rinaldi, S., Pasetti, M., Sisinni, E., Bonafini, F., Ferrari, P., Rizzi, M., Flammini, A.: On the mobile communication requirements for the demand-side management of electric vehicles. Energies 11 (2018)
- 9. Rinaldi, S., Della Giustina, D., Ferrari, P., Flammini, A., Sisinni, E.: Time synchronization over heterogeneous network for smart grid application: design and characterization of a real case. Ad Hoc Netw. 50, 41–57 (2016)
- 10. Artale, G., Cataliotti, A., Cosentino, V., Di Cara, D., Guaiana, S., Nuccio, S., Panzavecchia, N., Tine, G.: Smart interface devices for distributed generation in smart grids: the case of islanding. IEEE Sensors J. 17(23), 7803–7811 (2017)
- 11. Capriglione, D., Ferrigno, L., Paciello, V., Pietrosanto, A., Vaccaro, A.: On the performance of consensus protocols for decentralized smart grid metering in presence of measurement uncertainty. In: Proceedings of I2MTC 2013, pp. 1176–1181 (2013)
- 12. Barchi, G., Macii, D., Belega, D., Petri, D.: Performance of synchrophasor estimators in transient conditions: a comparative analysis. IEEE Trans. Instrum. Meas. 62(9), 2410–2418 (2013)
- 13. Castello, P., Ferrari, P., Flammini, A., Muscas, C., Pegoraro, P.A., Rinaldi, S.: A distributed PMU for electrical substations with wireless redundant process bus. IEEE Trans. Instrum. Meas. 64(5), 1149–1157 (2015)
- 14. Pasetti, M., Rinaldi, S., Manerba, D.: A virtual power plant architecture for the demand-side management of smart prosumers. Appl. Sci. 8(3), 432 (2018)
- 15. Hu, J., et al.: Electric vehicle fleet management in smart grids: a review of services, optimization and control aspects. Renew. Sustain. Energy Rev. 56, 1207–1226 (2016)
- 16. Scherler, S., et al.: Holistic design of an electric vehicle with range extender in connected traffic systems. In: Proceedings of EVER, pp. 1–8 (2018)
- 17. Abdelrahman, et al.: A novel platform for powertrain modeling of electric cars with experimental validation using real-time hardware in the loop (HIL): a case study of GM second generation Chevrolet Volt. IEEE Trans. Power Electron. 33(11), 9762–9771 (2018)
- 18. Niegl, M., et al.: Model-based steering ECU application using offline simulation (Software in the loop). In: Proceedings of AVC, pp. 269–274 (2017)
- 19. Rinaldi, S., Pasetti, M., Vivacqua, G., Trioni, M.: On the integration of E-Vehicle data for advanced management of private electrical charging systems. In: 2017 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), pp. 600–605. Turin, Italy (2017)

# **On-the-Fly Secure Group Communication on CAN Bus**



M. D. Grammatikakis, N. Mouzakitis, E. Ntallaris, V. Piperaki, K. Patelis and G. Vougioukalos

**Abstract** vatiCAN is a data link protocol that supports authentication and integrity for critical messages, thwarting masquerade and replay attacks on in-vehicle networks such as the CAN bus. Our extension to vatiCAN (called vatiCAN-G) supports on-the-fly secure group communications. It improves security through a separate 32-bit authentication for the group mask and a 64-bit authentication for the data. Experimental results from running vatiCAN-G on small CAN networks with Atmel AVR-based microprocessors indicate a limited overhead compared to vatiCAN, in the ms range.

# 1 Introduction

CAN (Controller Area Network) is a serial communication technology that simplifies installation, reduces wiring, and enables very reliable, real-time data exchange among sensors, actuators, and electronic control units (ECUs), providing standardization of the ECU infrastructure and network.

The CAN protocol (ISO 11898) defines an asynchronous, event-driven, prioritized communication protocol based on two OSI layers: the Physical Layer, specifying data rates from 125 kbit/s to 1 Mbit/s, and the Data Link Layer [1]. Higher data rates are useful for safety-critical applications in the powertrain, multimedia, and vehicle chassis domains. CAN uses a CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) policy. Any node has the right to access the bus, but at the end of the arbitration phase only the highest-priority CAN node (the one with the lowest ID) is authorized by its interface (CAN controller and transceiver) to broadcast its message on the bus. Thus, under high bus loads, the CAN protocol can cause increased delay for less critical, lower-priority CAN messages. Upon message transmission, CAN nodes with lower-priority messages switch to the receiving state to listen to the broadcast message.
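The priority rule above follows from CAN's bitwise arbitration. The bus acts as a wired-AND in which a dominant 0 overrides a recessive 1. A node that transmits a recessive bit but reads a dominant one stops transmitting. The minimal Python simulation below uses 11-bit standard identifiers and shows why the lowest ID always wins. It is an idealized model that ignores bit timing, bit stuffing and the RTR/IDE bits, and the function name is ours.

```python
def arbitrate(ids: list[int], id_bits: int = 11) -> int:
    """Return the identifier that wins (idealized) CAN bitwise arbitration."""
    if not ids:
        raise ValueError("no contending nodes")
    if len(set(ids)) != len(ids):
        raise ValueError("identifiers must be unique on the bus")
    if any(not 0 <= i < (1 << id_bits) for i in ids):
        raise ValueError("identifier out of range")
    contenders = list(ids)
    for bit in reversed(range(id_bits)):               # MSB is sent first
        sent = {i: (i >> bit) & 1 for i in contenders}
        bus = min(sent.values())                       # wired-AND: 0 (dominant) wins
        # a node that sent recessive (1) but reads dominant (0) backs off
        contenders = [i for i in contenders if sent[i] == bus]
    return contenders[0]
```

Because the losing nodes only listen, arbitration costs the winner no retransmission. This is the non-destructive behavior that makes the lowest-ID-first priority deterministic.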

M. D. Grammatikakis (🖂) · N. Mouzakitis · E. Ntallaris · V. Piperaki · K. Patelis · G. Vougioukalos

TEI of Crete, 71410 Heraklion, Greece e-mail: mdgramma@cs.teicrete.gr

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_4

In this paper, we focus on the design, implementation, and evaluation of a state-of-the-art protocol for *on-the-fly secure group communications on in-vehicle bus-based networking systems*, thwarting spoofing and replay attacks. Our schemes have been implemented on the commonly available CAN bus, but they can be extended to other in-vehicle networks, such as:
- LIN, used for low-cost communication among sensors/actuators and ECUs;
- MOST, used for multimedia;
- FlexRay, proposed (but not frequently used) for high-priority subsystems such as adaptive cruise control, lane departure warning, engine control, gear, and anti-lock braking system (ABS).

In our threat model, we consider two common types of in-vehicle network attacks, in which an adversary may tamper with a CAN node to perform either:
- a **replay attack**, in which the perpetrator retransmits earlier data in the system; or
- a **masquerade (or spoofing) attack**, in which the attacker sends a fraudulent CAN message that misleads other nodes about its identity.

To counter these threats, our proposed group security protocol separately establishes authentication and integrity of the group communicator and of the payload of CAN messages. In other words, no data can be changed by unauthorized nodes, and no node can impersonate another ECU. In addition, the protocol supports data origin authentication, i.e. a receiver is sure about the identity of the sender.

In this context, TESLA has been proposed for broadcast authentication in wireless networks (IETF RFC 4082) [2]. Unlike the other protocols discussed here, TESLA does not rely on shared keys. Instead, a key is disclosed in each round (along with data) to authenticate a message sent by the same ID in the previous round.
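The delayed key disclosure can be sketched with a one-way hash chain. The sender first publishes K_0 as a commitment. It then MACs each round's message with the next chain key and discloses that key one round later. The receiver authenticates the disclosed key by hashing it back to a key it already trusts.

This is a simplified Python sketch. SHA-256 and HMAC-SHA256 truncated to 64 bits are our choices, and the function names are ours. RFC 4082 additionally derives a separate MAC key from each chain key. It also requires loose time synchronization, so that a receiver can check that a packet arrived before its key was disclosed; both details are omitted here.

```python
import hashlib
import hmac


def make_key_chain(seed: bytes, n: int) -> list[bytes]:
    """Keys K_0..K_n with K_i = H(K_{i+1}); K_0 is published as the commitment."""
    chain = [seed]
    for _ in range(n):
        chain.append(hashlib.sha256(chain[-1]).digest())
    chain.reverse()                       # chain[i] = H^(n-i)(seed)
    return chain


def mac(key: bytes, msg: bytes) -> bytes:
    """Truncated 64-bit tag (simplification: TESLA derives a separate MAC key)."""
    return hmac.new(key, msg, hashlib.sha256).digest()[:8]


def key_is_authentic(disclosed: bytes, index: int,
                     trusted: bytes, trusted_index: int) -> bool:
    """Hash a disclosed K_index forward until it reaches a trusted earlier key."""
    if index <= trusted_index:
        return False
    k = disclosed
    for _ in range(index - trusted_index):
        k = hashlib.sha256(k).digest()
    return hmac.compare_digest(k, trusted)
```

In round i, the receiver buffers `(msg, tag)` and cannot yet verify it. In round i + 1, it checks the disclosed K_i with `key_is_authentic` and only then recomputes `mac(K_i, msg)`. This one-round verification delay is what the text refers to.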

Szilagyi and Koopman use CAN nodes to vote on the authenticity of messages [3]. The protocol increases security, but introduces additional delays. For time-triggered networks, they also propose One MAC Per Receiver (OMPR) authentication to protect against masquerade and replay attacks. Each pair of CAN nodes shares a secret key (exchanged using one-way hash functions). This key is used together with the message data and a global clock to calculate a MAC, which is appended to the end of several CAN messages [4]. For a given number of authentication bits per packet, voting and TESLA are more appropriate for high-assurance systems with a large number of receivers. OMPR and voting schemes are better for low-assurance systems with a small number of receivers. OMPR introduces additional delays that are unsuitable for many real-time automotive applications, except where high security is needed.

For networks without a global clock, Lin and Sangiovanni-Vincentelli propose using message counters. They reduce overhead by transmitting only the least significant counter bits [5]. Simulations show that this mechanism achieves sufficient security, without introducing high communication overhead (bus load and message latency).
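The counter-truncation idea can be illustrated with a small sketch. It shows a generic reconstruction rule, not necessarily the exact scheme of [5]: the receiver keeps the full counter, the sender transmits only its k least significant bits, and the receiver picks the smallest counter value above the last accepted one whose low bits match. The function name and the choice of k are illustrative assumptions.

```python
def reconstruct_counter(last: int, lsbs: int, k: int) -> int:
    """Rebuild a full message counter from its k transmitted LSBs.

    Hypothetical helper: returns the smallest value greater than `last`
    (the last accepted counter) whose k least significant bits equal `lsbs`.
    The result is correct as long as fewer than 2**k consecutive messages are lost.
    """
    modulus = 1 << k
    if not 0 <= lsbs < modulus:
        raise ValueError("lsbs must fit in k bits")
    candidate = (last & ~(modulus - 1)) | lsbs  # keep the high bits, splice in the LSBs
    if candidate <= last:
        candidate += modulus                    # the low bits wrapped around
    return candidate


# Example with a 4-bit truncated counter and last accepted value 499 (0x1F3)
print(reconstruct_counter(499, 0x4, 4))  # 500: the next message
print(reconstruct_counter(499, 0x2, 4))  # 514: the low bits wrapped
```

Because the full (reconstructed) counter enters the MAC computation, a replayed frame maps to a counter value the sender never used, so its MAC fails to verify even though only a few counter bits travel on the bus.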

Nürnberger and Rossow developed Vetted Authenticated CAN (vatiCAN) [6], the first open-source software package for authentication and ID/data integrity protection. Their solution establishes secure messaging among critical nodes, while non-protected components continue to communicate using legacy CAN messages, thus providing backward compatibility. The authentication scheme is similar to the one proposed by Lin and Sangiovanni-Vincentelli, i.e. the MAC is calculated as a function of the packet body, a pair-wise shared key, and a counter. However, vatiCAN uses a global counter specific to each sender (called GRC). Furthermore, instead of distributing the MAC over multiple messages, vatiCAN transmits it in a second message that validates the previously transmitted data, so that receivers can authenticate the source of the message in a subsequent step (similar to how TESLA treats keys). Emulating traffic diagnostics from a commercial vehicle (VW Passat B6) on a vatiCAN prototype consisting of real and simulated nodes reveals a CAN message latency of less than 4 ms, with strong security guarantees.

In all previous work, **security groups are defined statically**; e.g. in vatiCAN, secure nodes are fixed and frozen during the setup phase. Static groups do not allow members to be added to or deleted from a (pre-defined) group later. Moreover, a CAN network design based on static groups would require an uncomfortably high degree of trust among many CAN nodes, which must share all keys among themselves.

Our proposed extension (called vatiCAN-G) supports dynamic security groups (called Cliques), whose participating nodes are not fixed during setup. With this protocol, authenticated broadcasting occurs in sequence within each clique. Therefore, node keys need not be released to nodes that do not share a security group. This can further support higher-level security services, including event logging and related intrusion detection.

In Sect. 2, we explain the vatiCAN-G protocol. Section 3 compares its overhead to vatiCAN for small to medium-size CAN networks based on Atmel AVR microcontrollers. Section 4 closes our presentation with a summary.

### 2 vatiCAN-G: On-the-Fly Security Cliques

The vatiCAN-G protocol, shown in Fig. 1, provides a secure way to broadcast a CAN message within a group that can be defined on-the-fly by the creator of the message. Unlike vatiCAN, which uses two message exchange phases (message send and authentication), vatiCAN-G consists of **three control flow phases** (group definition, message send, and authentication). A ticketing scheme ensures that all phases in a clique are executed atomically and cannot be interrupted by another clique. Like vatiCAN, our extension supports non-secure, legacy messages; therefore, non-secure ECUs can be used without modifications.

For secure vatiCAN-G messages, a group mask is used to define a vatiCAN-G Clique (function **initGroupMask**) as a group of CAN nodes that intend to **participate next in a single round of secure communications**. Participants in a vatiCAN-G Clique instance, uniquely identified by this group mask, share a session key, computed (in **calcSessionKey**) by hashing the secret keys of all group nodes.
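The two setup functions can be sketched as follows, assuming one pre-shared secret key per node index (0–31). `init_group_mask` and `calc_session_key` are hypothetical Python analogues of **initGroupMask** and **calcSessionKey**, not the authors' AVR code; SHA3-256 from Python's `hashlib` stands in for the Keccak implementation, and the 16-byte session-key length is an assumption.

```python
import hashlib

MAX_NODES = 32  # the group mask is 32 bits wide


def init_group_mask(node_ids):
    """Hypothetical analogue of initGroupMask: bit k is set iff node k is in the clique."""
    mask = 0
    for k in node_ids:
        if not 0 <= k < MAX_NODES:
            raise ValueError(f"node id {k} outside 0..{MAX_NODES - 1}")
        mask |= 1 << k
    return mask


def calc_session_key(mask, secret_keys):
    """Hypothetical analogue of calcSessionKey.

    Hashes the secret keys of all nodes selected by `mask` in ascending node
    index order, so every participant derives the same key regardless of the
    order in which members were listed. SHA3-256 stands in for Keccak; the
    16-byte output length is an assumption.
    """
    h = hashlib.sha3_256()
    for k in range(MAX_NODES):
        if (mask >> k) & 1:
            h.update(secret_keys[k])
    return h.digest()[:16]
```

Iterating over bit positions rather than over the caller's list makes the key a function of the mask alone, which is what lets a receiver recompute it from the mask carried in the group start message.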

**Phase I**: Send: a sender announces a Group Start message (via **Send**) that contains:

- an *ID* from the clique sender;
- a *32-bit group mask*: the first four payload bytes define the Clique participants (up to 32 CAN nodes);
- a *32-bit* message authentication code ($MAC_1$, computed via **MessageAuthentication**): the last four payload bytes ensure that an authorized sender has transmitted the group start message.

Fig. 1 vatiCAN-G protocol with three send/receive control flow phases

**Phase I**: Receive: Each CAN node receiving a group start message (via **MsgAvailable**) authenticates the group mask by comparing $MAC_1$ to a hash based on the group mask, the session key composed from the secret keys of the Clique participants (indexed by the mask), and a global counter (GRC). If the group mask is authenticated, the node then waits for secure data transmission (Phase II).
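Phase I can be sketched on both sides as below. This is an illustration under stated assumptions, not the vatiCAN-G code: HMAC-SHA3-256 truncated to 4 bytes stands in for **MessageAuthentication**, the GRC is encoded on 8 bytes, and `session_key_for` is a hypothetical callback that derives the clique session key from the mask.

```python
import hashlib
import hmac


def mac32(session_key: bytes, mask_bytes: bytes, grc: int) -> bytes:
    """32-bit MAC_1 over (group mask, GRC), keyed with the clique session key.

    HMAC-SHA3-256 truncated to 4 bytes is an assumed stand-in for the
    protocol's Keccak-based MAC; the GRC is encoded on 8 bytes.
    """
    msg = mask_bytes + grc.to_bytes(8, "big")
    return hmac.new(session_key, msg, hashlib.sha3_256).digest()[:4]


def build_group_start(mask: int, session_key: bytes, grc: int) -> bytes:
    """Phase I send: 8-byte payload = 4-byte group mask || 4-byte MAC_1."""
    mask_bytes = mask.to_bytes(4, "big")
    return mask_bytes + mac32(session_key, mask_bytes, grc)


def accept_group_start(payload: bytes, my_id: int, session_key_for, grc: int):
    """Phase I receive: return the mask if authenticated and addressed to us, else None."""
    if len(payload) != 8:
        return None
    mask_bytes, tag = payload[:4], payload[4:]
    mask = int.from_bytes(mask_bytes, "big")
    if not (mask >> my_id) & 1:
        return None  # this node is not a clique member: ignore the round
    expected = mac32(session_key_for(mask), mask_bytes, grc)
    if not hmac.compare_digest(tag, expected):
        return None  # forged mask, wrong key, or stale GRC (replay)
    return mask      # authenticated: wait for the Phase II data frame
```

A replayed group start frame carries a MAC computed under an older GRC value, so `accept_group_start` rejects it once the receivers' GRC has advanced; `hmac.compare_digest` is used to avoid timing leaks in the comparison.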

**Phase II**: Send: The sender sends 64-bit data (with the same sender ID).

**Phase II**: Receive: Upon message receipt, an authentication code is computed at each intended receiver (via **MessageAuthentication**). This code is computed as a hash based on the *data*, *GRC*, and *session key* of all Clique participants.

**Phase III**: Send: The final step transmits the sender *ID* with $MAC_2$ (as payload) for authentication. $MAC_2$ is computed using the *data*, *GRC*, and *session key*.

**Phase III**: Receive: The receiver accepts the message only if the $MAC_2$ it computed previously matches the transmitted MAC code. Otherwise, the receiver rejects the data.
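The receiving side of Phases II and III can be sketched as a two-state handler: the expected $MAC_2$ is computed as soon as the data frame arrives and is held until the MAC frame arrives. As in the Phase I sketch, HMAC-SHA3-256 (here truncated to 8 bytes, matching the 64-bit data MAC) is an assumed stand-in, and `CliqueReceiver` is a hypothetical name.

```python
import hashlib
import hmac


def mac64(session_key: bytes, data: bytes, grc: int) -> bytes:
    """64-bit MAC_2 over (data, GRC); HMAC-SHA3-256 truncated to 8 bytes is an assumed stand-in."""
    msg = data + grc.to_bytes(8, "big")
    return hmac.new(session_key, msg, hashlib.sha3_256).digest()[:8]


class CliqueReceiver:
    """Phases II and III on the receiving side of an authenticated clique round."""

    def __init__(self, session_key: bytes, grc: int):
        self.session_key = session_key
        self.grc = grc
        self.pending = None  # (data, expected MAC_2) held between Phase II and Phase III

    def on_data(self, data: bytes) -> None:
        """Phase II receive: buffer the data and precompute the expected MAC_2."""
        self.pending = (data, mac64(self.session_key, data, self.grc))

    def on_mac(self, tag: bytes):
        """Phase III receive: release the data only if the transmitted MAC_2 matches."""
        if self.pending is None:
            return None  # MAC frame without a preceding data frame
        data, expected = self.pending
        self.pending = None
        return data if hmac.compare_digest(tag, expected) else None


key = b"\x02" * 16
rx = CliqueReceiver(key, grc=42)
payload = bytes(range(8))
rx.on_data(payload)                               # Phase II
accepted = rx.on_mac(mac64(key, payload, 42))     # Phase III: accepted
```

Precomputing the MAC in Phase II lets the hash run while the bus carries the next frame, so Phase III reduces to a comparison.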

Our protocol enhances security by using a 32-bit MAC for the group mask and a 64-bit MAC for the data, compared to only a 64-bit MAC for the data in vatiCAN. As an informal argument, an adversary listening to the CAN bus cannot simply pick up an *ID* and announce a spoofed group start message with that *ID*, since the *group mask* is authenticated with a 32-bit $MAC_1$ (Phase I). The attacker cannot replay a group start message in Phase I (i.e. resend a previously captured message), since the *group mask* is protected by incorporating the *GRC* in the $MAC_1$ computation. Similarly, authenticity and integrity in Phases II and III are ensured via $MAC_2$ authentication, with the GRC included in the $MAC_2$ computation.

Like vatiCAN, but unlike schemes with pair-wise keys, vatiCAN-G memory requirements increase linearly with the number of secure CAN nodes (32 in our case). vatiCAN-G also uses lightweight cryptography: Keccak/Salsa implementations written in AVR assembly.

Finally, in addition to the round-trip time (RTT) measured in vatiCAN, vatiCAN-G supports one-way delay (OWD), an important metric for critical frames that flow unidirectionally, e.g. from a wireless sensor to an ECU and finally to the ABS subsystem. Our OWD computation extends Choi and Yoo's algorithm [7] to compute OWD at runtime (in parallel with RTT) by rewriting/simplifying network calculus equations to avoid recomputation (details omitted due to space restrictions).

### 3 Experimental Framework, Testbenches and Results

We interconnected Atmel AVR microcontrollers via DFRobot's CAN bus shield, which combines the standard MCP2515 CAN controller and MCP2551 CAN transceiver. We configure small to medium CAN networks with up to 6 nodes communicating at 500 kb/s using either Legacy and vatiCAN messages, or Legacy and vatiCAN-G messages. By writing a unique number (e.g. 0–15) into each microcontroller's EEPROM, we provide unique node identification and unify programming across CAN nodes, improving code reuse and maintainability.

In Fig. 2, we examine the average delay for the forward + reverse path between 2 CAN nodes (500 experiments). An injection rate of 100 vatiCAN-G packets/s corresponds to 3 phases of message exchanges within a period of 10 ms. Results indicate that, for all protocols, the delay increases linearly with the number of nodes and the injection rate, with a slightly higher slope for vatiCAN-G. Note that 2-phase vatiCAN is 2.03 times slower than Legacy, while 3-phase vatiCAN-G is 1.52 times slower than vatiCAN; for example, this extends the braking distance to 1.37 m at a motorway speed of 100 km/h, compared to 0.9 m for vatiCAN. In addition, Legacy, vatiCAN, and vatiCAN-G messages show a min/max deviation from the average delay of [-23%, 24%], [-29%, 28%], and [-36%, 44%], respectively. The larger spread for vatiCAN-G is due to increased contention for CAN access (via CSMA/CR arbitration), caused by the higher number of messages that must be processed in a given time interval.
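The braking-distance figures can be cross-checked with a back-of-the-envelope calculation. The sketch assumes that the reported distance is the ground covered at constant speed while the message is in flight (distance = speed × delay); the text does not state the model explicitly, so this is an interpretation. At 100 km/h (about 27.8 m/s), 0.9 m corresponds to roughly 32.4 ms, and scaling by the reported 1.52 slowdown gives about 1.37 m.

```python
def braking_offset(speed_kmh: float, distance_ref_m: float, slowdown: float):
    """Return (implied reference delay in s, scaled distance in m).

    Assumed model: distance = speed * communication delay, at constant speed.
    """
    speed_ms = speed_kmh / 3.6                # km/h -> m/s
    delay_s = distance_ref_m / speed_ms       # delay implied by the vatiCAN distance
    return delay_s, speed_ms * delay_s * slowdown


delay, dist = braking_offset(100.0, 0.9, 1.52)
print(f"implied vatiCAN delay ~{delay * 1e3:.1f} ms, vatiCAN-G distance ~{dist:.2f} m")
# implied vatiCAN delay ~32.4 ms, vatiCAN-G distance ~1.37 m
```

Under this model the 1.37 m figure is simply 0.9 m × 1.52, which is consistent with the reported slowdown.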

Figure 3a shows a secure, periodically triggered synchronous GRC broadcast for improved safety. A sender $P_0$ sends the message *ID* get at T0, initiating the protocol, and subsequently the message want at T1. Next, all group participants ($P_1$, $P_2$, …, $P_N$) send an authenticated want message synchronously to each other (in tandem) and eventually to the GRC server. Upon receiving all expected want messages, the GRC server broadcasts a new GRC value, which is received by all secure nodes. Then, at T2, all group participants start to send an authenticated ok message synchronously to each other (in tandem) and eventually to the GRC server, which broadcasts a final go message that arrives at $P_0$ at T3. Figure 3b shows the average delay of the synchronous GRC protocol versus the number of receiving nodes. The delay (T1–T0) is constant, as expected.



**Fig. 2** Average protocol communication delay (forward + reverse path) vs. injection rate in a 6-node CAN bus; two nodes are used for measuring the one-way delay, while either M = 3 or M = 4 nodes simultaneously inject protocol messages (Legacy, vatiCAN, or vatiCAN-G)



Fig. 3 Synchronous GRC protocol: (a) control flow and (b) average delay for 2–5 nodes

The delays (T2–T1) and (T3–T2), related to the want and ok messages, increase linearly with the number of nodes due to sequential broadcasts, cf. the red arrows in Fig. 3a. The GRC server is also used for ticketing, with an average delay of 4.2 ms.
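The linear growth can be made concrete by listing the broadcasts of one synchronous GRC round per interval. This is a hypothetical model of Fig. 3a, assuming every entry is a separate broadcast that occupies the bus on its own, so that an interval's delay grows with its message count; `grc_round` is an illustrative name.

```python
from collections import Counter


def grc_round(n_participants: int):
    """Ordered (interval, sender, message) list for one synchronous GRC round.

    Hypothetical model of the control flow in Fig. 3a: each entry is one
    broadcast that occupies the bus on its own.
    """
    n = n_participants
    seq = [("T0-T1", "P0", "get")]
    seq.append(("T1-T2", "P0", "want"))
    seq += [("T1-T2", f"P{i}", "want") for i in range(1, n + 1)]
    seq.append(("T1-T2", "GRC server", "new GRC"))
    seq += [("T2-T3", f"P{i}", "ok") for i in range(1, n + 1)]
    seq.append(("T2-T3", "GRC server", "go"))
    return seq


for n in (2, 5):
    print(n, dict(Counter(interval for interval, _, _ in grc_round(n))))
# 2 {'T0-T1': 1, 'T1-T2': 4, 'T2-T3': 3}
# 5 {'T0-T1': 1, 'T1-T2': 7, 'T2-T3': 6}
```

In this model, (T1–T0) holds a single message regardless of N, while (T2–T1) and (T3–T2) carry N + 2 and N + 1 broadcasts, matching the constant and linearly growing delays reported in Fig. 3b.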

### 4 Conclusion

We have developed the vatiCAN-G protocol for group-based CAN authentication, which protects against masquerade and replay attacks, and compared its performance to the vatiCAN protocol. In the future, we hope to experiment with large CAN networks and to extend vatiCAN-G with event logging and intrusion detection policies for protecting against Denial-of-Service attacks on in-vehicle networks.

Acknowledgements This work was partially funded by National Matching Funds 2017–2018 of the Greek Govt (GSRT) related to "FP7-DREAMS" (GA No 610540).

### References

1. Lima, Rocha, F., Völp, M., et al.: Towards safe and secure autonomous and cooperative vehicle ecosystems. In: Proceedings of Workshop on Cyber-Physical Systems Security and Privacy, pp. 59–70 (2016)
2. Tesla remote attack: https://www.youtube.com/watch?v=c1XyhReNcHY. Accessed 9 Jun 2018
3. Szilagyi, C., Koopman, P.: A flexible approach to embedded network multicast authentication. In: Proceedings of 2nd Workshop on Embedded Systems Security (2008)
4. Szilagyi, C.: Low cost multicast network authentication for embedded control systems. Ph.D. dissertation, ECE, CMU. www.ece.cmu.edu/~koopman/thesis/szilagyi.pdf (2012)
5. Lin, C.-W., Sangiovanni-Vincentelli, A.: Security-aware design for cyber-physical systems: a platform-based approach. Springer (2017). ISBN 978-3-319-51327-0
6. Nürnberger, S., Rossow, C.: vatiCAN – Vetted, Authenticated CAN Bus. In: Proceedings of Conference on Cryptographic Hardware and Embedded Systems, pp. 106–124. Springer, LNCS 9813 (2016)
7. Choi, J.-H., Yoo, C.: One-way delay estimation and its application. Comput. Commun. 28, 819–828 (2005)

# Part II Healthcare and Bio-electronic Systems

# Neuromuscular Disorders Assessment by FPGA-Based SVM Classification of Synchronized EEG/EMG



Daniela De Venuto and Giovanni Mezzina

Abstract Exploiting the synchronized assessment of neuromuscular implications, this paper proposes an embedded digital architecture for assessing movement automatism and the reduction of pre-motor function capability. The study can enable early recognition of the progression stages of Parkinson's disease (PD), which are characterized by muscular disorders. The architecture, implemented on an Altera Cyclone V FPGA, classifies these physiological disorders in real time during walking. The system operates on 8 surface EMG channels (limbs) and 7 EEG channels (motor cortex). The signals, synchronously acquired and processed, undergo feature extraction (FE) in the time-frequency domain. The features are processed time-continuously (in chronological order) by an innovative on-going Support Vector Machine (SVM) classifier, which identifies and categorizes the severity of the patient's pathology. Experimental results from 4 subjects affected by mild (n = 2) and heavy PD (n = 2) show an accuracy of 93.97% $\pm 2.1\%$ in PD stage recognition.

## 1 Introduction

Among the main symptoms of the motor deficit in Parkinson's disease (PD), gait disorders are the most characteristic and debilitating [1]. New assessment strategies and data post-processing algorithms have been introduced to evaluate gait and balance characteristics at an earlier stage of PD [1].

At present, diagnosis relies only on the expertise and careful observation/evaluation of the physician, while the subject is involved in outpatient clinical protocols. However, continuously measuring the disease-related implications at home could improve the reliability of the diagnosis once the clinical visit occurs. In this context, several wearable solutions have been proposed to help clinicians perform early diagnosis, differential analyses, and objective quantification of PD symptoms. Table 1 reports some noteworthy works from a recent and accurate review of the state of the art in PD-related wearable technologies [2]. In Table 1, the selected systems are compared in terms of relevant features, such as the sensing and computing technologies, the monitored PD features, the platform applicability, the adopted classifier, and the corresponding accuracy. Most state-of-the-art solutions rely on inertial sensors [3–6] and only rarely on force sensors and electromyography [5, 7]. Most systems use a PC as the computing unit: they have a wearable sensing interface but not a wearable computing core, which makes the overall solution unsuitable for fully portable use. In addition, all the works in [2] monitor the progression of specific symptoms, so they can only distinguish PD patients from healthy controls (HC), except for [5], whose authors claim to distinguish essential tremor (ET) from similar PD symptoms. Nevertheless, differentiating PD stages can be extremely useful for treatment management and personalized medicine [2]. Indeed, PD involves both motor and cerebral activity, reducing pre-motor function capabilities [8, 9]. Based on these concepts, this work introduces a technological tool to study and differentiate, with adequate accuracy, PD progression starting from gait disorders.

D. De Venuto (🖂) · G. Mezzina

Department of Electrical and Information Engineering, Politecnico Di Bari, 70125 Bari, Italy e-mail: daniela.devenuto@poliba.it

G. Mezzina e-mail: giovanni.mezzina@poliba.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_5

| System features | Kostikis et al. [3] | Braybrook et al. [4] | Ruonala et al. [5] | Salarian et al. [6] | Perumal et al. [7] | Our work |
|---|---|---|---|---|---|---|
| Technology | Smartphone ACC: glove | ACC: wrist | EMG: brachii<br>ACC: brachii | GYRO: hand | Force sensors: toe | EEG/EMG |
| PD features | Freq. (PSD) | Tremor time between 9.00 and 18.00 | ACC: RMS, peak freq.<br>EMG: single motor unit potential | Rotation of hand and PSD | Gait step length, stride time | Cortico-muscular assessment: MRPs and contraction |
| Applicability | Differentiate PD and HC | Differentiate PD and HC | Differentiate PD, HC and ET | Tremor detection | Differentiate PD and HC | Differentiate PD stages, PD and HC |
| Classifier | Pearson coefficient | Threshold | PCA, LDA | Ad hoc algorithm | LDA | SSVM |
| Accuracy | 90% | 90.25% | 81% | 94.2% | 91.58% | 93.97% |

Table 1 Implemented system comparison with the state of the art

ACC: accelerometer, GYRO: gyroscope, PSD: power spectral density, PCA: Principal Component Analysis, LDA: Linear Discriminant Analysis, SSVM: Serial Support Vector Machine

The platform implements a multichannel sensing system (8 EMG and 7 EEG electrodes) that synchronously digitizes and analyzes the biosignals in real time. The analysis consists of a clinical-evidence-based feature extraction (FE) operating in the time-frequency domain. The FE stage defines a set of indexes that feed a time-continuous Support Vector Machine (SVM) classifier, which discriminates on-line between two clinically proven PD stages: stages 3 and 4 of the Hoehn & Yahr (H&Y) scale [10]. Finally, the architecture is validated on an FPGA, with a view to a future ASIC implementation for wearable applications.

## 2 The Overall System Architecture

The proposed architecture can be divided into two parts: the EEG computing, which assesses the brain implications, and the EMG computing, which assesses the muscular repercussions of PD. These blocks perform a clinical-evidence-based FE in both the time and frequency domains, and the resulting features are the inputs of an on-going SVM classifier. All units run on a single central unit, an FPGA, which processes the signals from both types of sensors and classifies the gait disorders as related to mild or severe PD (i.e., the 3rd and 4th H&Y stages, respectively).

**Analyzed Signals**. PD affects the motor control system, limiting movement control capability and leading to a lack of motor automatisms [8–10]. These abnormalities in movement control can be evaluated by studying brain potentials named Movement Related Potentials (MRPs). The proposed system extracts three MRPs: the $\mu$ and $\beta$ rhythms and the Bereitschaftspotential (BP). The $\mu$ rhythm lies in the 7.5–12.5 Hz band, the $\beta$ rhythm in the 12.5–30 Hz band, and the BP in the 2–5 Hz band. All MRPs are detectable up to 500 ms before the stimulus in the hemisphere contralateral to the limb involved in the movement [8, 9].

### 2.1 The Sensing Platform

The multichannel sensing system wirelessly acquires data from 7 EEG channels over the motor area: T3, T4, C3, C4, Cz, P3, P4. The AFz electrode is used as GND for a monopolar reading, and the reference electrode (REF) is on the right ear lobe. EEG samples are recorded over an analog input range of $\pm$375 mV with 24-bit resolution at a 500 Hz sampling rate [11]. Synchronously, 8 EMG nodes placed on the Gastrocnemius, Tibialis, Rectus Femoris and Biceps Femoris (on both legs) acquire signals with 16-bit resolution at 2 kHz. The EEG signals are band-pass filtered between 0.5 and 30 Hz with an 8th-order Butterworth filter, while the EMG signals are first downsampled to 500 Sa/s and then digitally filtered between 10 and 200 Hz.

### 2.2 The Muscular Implication Computing

The EMG signals are sent to the FPGA, where they are used to generate the Trigger signal. This block derives a 1-bit signal from the 16-bit EMG samples: the trigger switches from '0' to '1' when the muscle activation condition is satisfied. The activation condition is implemented by a moving-average-based algorithm detailed in [12, 13]. Briefly, a moving average of the squared EMG signal is computed cyclically over 1 s of acquisition, and this value is the threshold for the activation condition. The last 250 ms of the acquisition define the instantaneous muscular magnitude. If the latter exceeds the threshold, the activation condition is satisfied and the trigger signal goes to '1'. The present system uses the information linked to parallel activations of agonist-antagonist muscle pairs (co-contractions) [13].
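The activation rule can be sketched for a single channel as below. This is a simplified reading of the description, not the algorithm of [12, 13]: it assumes the EMG has already been downsampled to 500 Sa/s, takes the threshold as the mean of the squared signal over the last 1 s and the instantaneous magnitude as the mean over the last 250 ms, and models a co-contraction as the logical AND of two triggers. `EmgTrigger` and `co_contraction` are hypothetical names.

```python
from collections import deque

FS = 500              # EMG sample rate after downsampling (Sa/s)
WIN_THR = FS          # 1 s window -> activation threshold
WIN_INST = FS // 4    # 250 ms window -> instantaneous magnitude


class EmgTrigger:
    """Hypothetical moving-average activation detector for one EMG channel."""

    def __init__(self):
        self.sq = deque(maxlen=WIN_THR)  # squared samples of the last 1 s

    def update(self, sample: int) -> int:
        """Push one 16-bit EMG sample and return the 1-bit trigger."""
        self.sq.append(sample * sample)
        if len(self.sq) < WIN_THR:
            return 0  # not enough history for a threshold yet
        recent = list(self.sq)[-WIN_INST:]
        # mean(last 250 ms) > mean(last 1 s), cross-multiplied to stay in integers
        return int(sum(recent) * WIN_THR > sum(self.sq) * WIN_INST)


def co_contraction(agonist: int, antagonist: int) -> int:
    """Co-contraction flag: both muscles of the pair are active at the same time."""
    return agonist & antagonist
```

Comparing the two means by cross-multiplication avoids divisions, which mirrors the kind of simplification an FPGA datapath would favor.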

### 2.3 The Brain Implication Computing

When the Gastrocnemius is activated (trigger = '1'), the EEG raw data from the dedicated motor cortex channels are processed. The time-frequency analysis adopted here is the Short Time Fourier Transform (STFT), which provides an FFT frequency resolution of 2 Hz [8]. Once the power spectrum is computed, the MRPs are extracted as the peaks of the power spectrum in the BP, $\mu$ and $\beta$ bands. As shown in Fig. 1, the system implements on the FPGA a single FFT processor (256 points, 24-bit resolution), which is sequentially fed with the data from the 7 EEG branches. In particular, 256 samples (about 500 ms of acquisition for each EEG channel) are cyclically stored in 256-word, 24-bit RAMs, waiting for the proper trigger. The FFT Controller FSM provides the data to be analyzed (D), the control signal for the processor (FFT\_sink) and the processor reset flag (RST). The FFT Controller FSM uses a 500 Hz clock for RAM management and a 4 MHz clock for the FFT processor (Clk\_FFT). On the positive edge of the trigger, the FFT Controller of the opposite branch (T4) starts sending data to the FFT processor via the Data Mux. When the FFT is completed, the processor sends the data to the MRPs Extraction block, resets itself, and sets the SRC\_ready flag to '1' (waiting for new data). SRC\_ready enables a 3-bit address counter, which drives the MUX.

Then, the *FFT Controller P4* starts to send data to the processor, and the procedure repeats cyclically. The FSM for the MRPs calculation extracts BP, $\mu$ and $\beta$ sequentially, producing a "time-continuous" signal, as shown in Fig. 1.
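The spectral feature extraction can be sketched in software as follows. A direct DFT in pure Python replaces the 256-point, 24-bit FFT core, the band limits are those given for the MRPs in Sect. 2, and each MRP is taken as the maximum-power bin within its band; the function names are hypothetical. With 256 samples at 500 Hz, the bin width is 500/256 ≈ 1.95 Hz, consistent with the 2 Hz resolution quoted above.

```python
import cmath
import math

FS = 500      # EEG sample rate (Hz)
N_FFT = 256   # samples per channel window; bin width FS / N_FFT ~ 1.95 Hz
BANDS = {"BP": (2.0, 5.0), "mu": (7.5, 12.5), "beta": (12.5, 30.0)}


def power_spectrum(window):
    """Power of bins 0..N/2 via a direct DFT (stand-in for the FFT core)."""
    n = len(window)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window))) ** 2
        for k in range(n // 2 + 1)
    ]


def mrp_peaks(window):
    """Return {band: (peak frequency in Hz, peak power)} for BP, mu and beta."""
    spec = power_spectrum(window)
    out = {}
    for name, (lo, hi) in BANDS.items():
        bins = [k for k in range(len(spec)) if lo <= k * FS / len(window) <= hi]
        k_max = max(bins, key=lambda k: spec[k])
        out[name] = (k_max * FS / len(window), spec[k_max])
    return out


def process_channels(windows):
    """Serve the stored channel windows one after another through a single
    spectral engine, as the 3-bit counter steps the data multiplexer."""
    return [mrp_peaks(w) for w in windows]
```

Sharing one spectral engine across channels trades latency (the 7 windows are processed in turn) for area, which is the same trade-off the single FFT processor makes on the FPGA.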



Fig. 1 Schematic of the brain implication computing branches

### 2.4 The Time Continuous SVM

The features generated by the EEG and EMG computing branches (Sects. 2.2–2.3) feed an SVM classifier. This classification structure was selected because it can learn from a small set of observations (brief machine learning protocols) without requiring a highly specific and accurate FE stage. Considering a single reference leg movement, the classifier block organizes the MRPs and the co-contractions in a time-continuous signal structure: the *Unlabelled Feature Vector*, $UFV \in \Re^{B,N_f}$, where $B$ is the number of bits per sample ($B = 12$ bits) and $N_f$ is the number of features ($N_f = 14$). In its basic form [14], the SVM is a binary discriminator that classifies data by finding the best hyperplane separating the features of the first class from those of the second class. The SVM requires as input a set of labelled feature vectors $\{\mathbf{F}_i \in \Re^{N_f}, Y_i\}$, with $i = 1, \dots, N_{obs}$, where $N_{obs}$ is the number of observations (*train dataset*) and $Y_i \in \{-1, 1\}$ is the label of the $i$-th observation. In this application, $Y_i = -1$ corresponds to the 4th H&Y stage of PD, while $Y_i = 1$ corresponds to the 3rd H&Y stage. The SVM derives the hyperplane from the support vectors $SV \in \Re^{N_s,N_f}$, where $N_s$ is the number of support vectors ($N_s \ll N_{obs}$). Each support vector is associated with a dedicated label, $Y_{sv}$, and a Lagrange multiplier, $\alpha_{sv}$.

In this work, we treat the feature vector and the support vectors as time-continuous signals, updated at each computing time $t_i$. The prediction equation is:

$$f(\mathbf{x}) = \sum_{sv=1}^{N_s} \alpha_{sv} Y_{sv} \cdot \sum_{i=1}^{N_f} x(t_i) \cdot SV_{sv}(t_i) + b \tag{1}$$



Fig. 2 SVM classifier schematic

where $\mathbf{x}$ is the (unlabelled) feature vector to be classified, $b$ is the hyperplane bias term, $\mathbf{x}^T SV$ is the *linear SVM kernel* function, and $t_i$ is formally the arrival time of the $i$-th feature. Figure 2 shows the implemented SVM structure, expanding only a single main block (*Weighted Sum#1*), which is replicated $N_s = 15$ times. The *Weighted Sum* blocks differ only in the parameters $\{\alpha_{sv}, SV_{sv}, y_{sv}\}$ that characterize them. These parameters are stored in a dedicated 45-word ROM (*SVM Config. ROM*).

The system synchronizes $SV_{sv}$ as a time-continuous signal by matching it with the UFV stream. $SV_{sv}(t_i)$ and $UFV(t_i)$ are multiplied, generating the signal $UFV(t_i) \cdot SV_{sv}(t_i)$, which is integrated by a DFF-based cumulative sum. The resulting signal, $\sigma_{sv}$, multiplied by the corresponding Lagrange multiplier $\alpha_{sv}$ and the dedicated support vector label $y_{sv}$, provides a weighted $\sigma_{sv}$ ($W\sigma_{sv}$). A final sum of the $W\sigma_{sv}$ signals from all blocks ($sv = 1, \dots, 15$) and the SVM bias term $b$ yields the prediction $f(t_i)$.

Then, a zero-threshold comparator (*Sign Check*) assigns the label used for patient classification. After $N_f = 18$ cycles, the final prediction $f$ is reached.
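The streaming evaluation of Eq. (1) can be sketched in software: one cumulative-sum register per *Weighted Sum* block accumulates $x(t_i) \cdot SV_{sv}(t_i)$ as each feature arrives, and the running prediction is the $\alpha_{sv} y_{sv}$-weighted sum of the registers plus $b$. The class name and all parameter values below are hypothetical, and mapping $f = 0$ to the 3rd stage is an assumption, since the text does not specify the tie case.

```python
class StreamingLinearSVM:
    """Sketch of Eq. (1) evaluated one feature per clock cycle.

    sv[s][i] is the i-th component of support vector s; alpha and y hold the
    Lagrange multipliers and labels; b is the bias. All values are hypothetical.
    """

    def __init__(self, sv, alpha, y, b):
        self.sv, self.alpha, self.y, self.b = sv, alpha, y, b
        self.sigma = [0.0] * len(sv)  # one cumulative-sum register per Weighted Sum block
        self.i = 0                    # index of the next feature t_i

    def push(self, x_i: float) -> float:
        """Consume feature x(t_i) and return the running prediction f(t_i)."""
        for s, vec in enumerate(self.sv):
            self.sigma[s] += x_i * vec[self.i]  # UFV(t_i) * SV_s(t_i), accumulated
        self.i += 1
        return sum(a * lab * sg for a, lab, sg in zip(self.alpha, self.y, self.sigma)) + self.b

    @staticmethod
    def label(f: float) -> int:
        """Sign Check: +1 -> 3rd H&Y stage, -1 -> 4th H&Y stage (f = 0 mapped to +1 by assumption)."""
        return 1 if f >= 0 else -1


svm = StreamingLinearSVM(sv=[[1, 0, 2], [0, 1, 1]], alpha=[0.5, 1.0], y=[1, -1], b=0.1)
for x in [2, 3, 1]:
    f = svm.push(x)
print(round(f, 6), svm.label(f))  # -1.9 -1
```

After the last feature, the running value equals the batch result of Eq. (1), because each register then holds the full dot product $\mathbf{x}^T SV_{sv}$; intermediate values are partial predictions.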

## 3 Results

The validity of the proposed platform in detecting healthy subjects within a dataset of mixed PD and control patients has been addressed in [8, 15, 16]. Based on the most characteristic neuromuscular features, this section outlines the experimental validation of the system, obtained through in vivo measurements on a dataset that includes EEG/EMG recordings from n = 2 subjects affected by a clinically proven mild form of PD (3rd H&Y stage) and n = 2 subjects in an advanced stage of the pathology (4th H&Y stage). Under the supervision of specialized staff, the subjects performed the same protocolled 10 m walking task [10], collecting 2700 steps per patient (~2500 of which are dedicated to the ML training set).

**Machine Learning Performance**. The supervised ML stage operates on a training set of 2500 steps per patient, about 10000 steps in total. The linear SVM, compatible with Eq. (1), was realized following Gunn's approach [14] with a One-vs-One multiclass method. The classifier training time on a laptop (AMD A10-9600P processor) is 4.86 s $\pm$ 1.6 s.

**Real-time Classification**. The real-time accuracy of the classifier was evaluated on a *test set* of 242 observations: 141 from the mild PD subjects (3rd H&Y) and 101 from the heavy PD subjects (4th H&Y). The trained classifier reached an accuracy of 97.21% in a supervised cross-validation test [14].

In the real-time context, the implemented classifier shows an accuracy of 93.97%, correctly classifying 92.9% (131/141) of the 3rd-stage observations and 95.04% (96/101) of the 4th-stage ones.

## 4 Conclusions

This paper detailed the design and FPGA implementation of a time-continuous SVM-based classifier for recognizing Parkinson's disease stages. The system demonstrated its ability to distinguish two clinically tested PD stages, the 3rd and 4th of the H&Y rating scale, with a misclassification rate of about 6% in a real-time context. The classification is based on an innovative FE stage that operates during gait. The proposed architecture analyzes data from wearable, wireless EEG and EMG equipment and classifies the patient in less time than a standard linear SVM. The implemented SVM is highly modular, so it can accommodate more complex classification problems (15 DSP blocks remain available on the Altera Cyclone V FPGA). In vivo tests (n = 4 subjects affected by PD) showed an accuracy of 94% on a single walking step.

### References

1. De Venuto, D., Annese, V.F., Mezzina, G.: Remote neuro-cognitive impairment sensing based on P300 spatio-temporal monitoring. IEEE Sensors J. 16(23), 8348–8356 (2016). https://doi.org/10.1109/jsen.2016.2606553
2. Rovini, E., et al.: How wearable sensors can support Parkinson's disease diagnosis and treatment: a systematic review. Front. Neurosci. 11, 555 (2017)
3. Kostikis, N., et al.: A smartphone-based tool for assessing Parkinsonian hand tremor. IEEE J. Biomed. Health Inform. 19, 1835–1842 (2015)
4. Braybrook, M., et al.: An ambulatory tremor score for Parkinson's disease. J. Parkinsons. Dis. 6, 723–731 (2016)
5. Ruonala, V., et al.: EMG signal morphology and kinematic parameters in essential tremor and Parkinson's disease patients. J. Electromyogr. Kinesiol. 24, 300–306 (2014)
6. Salarian, A., et al.: iTUG, a sensitive and reliable measure of mobility. IEEE Trans. Neural Syst. Rehabil. Eng. 18, 303–310 (2010)
7. Perumal, S.V., Sankar, R.: Gait and tremor assessment for patients with Parkinson's disease using wearable sensors. ICT Expr. 2, 168–174 (2016)
8. De Venuto, D., Annese, V.F., Mezzina, G., Defazio, G.: FPGA-based embedded cyber-physical platform to assess gait and postural stability in Parkinson's disease. IEEE Trans. Compon. Packag. Manuf. Technol. https://doi.org/10.1109/tcpmt.2018.2810103
9. De Tommaso, M., Vecchio, E., Ricci, K., Montemurno, A., De Venuto, D., Annese, V.F.: Combined EEG/EMG evaluation during a novel dual task paradigm for gait analysis. In: Proceedings–2015 6th IEEE International Workshop on Advances in Sensors and Interfaces, IWASI 2015, art. no. 7184949, pp. 181–186 (2015). https://doi.org/10.1109/iwasi.2015.7184949
10. Hoehn, M.M., Yahr, M.D.: Parkinsonism: onset, progression, and mortality. Neurology 50(2), 318–318 (1998)
11. De Venuto, D., Stikvoort, E., Tio Castro, D., Ponomarev, Y.: Ultra low-power 12-bit SAR ADC for RFID applications. In: 2010 Design, Automation & Test in Europe Conference & Exhibition, pp. 1071–1075. Dresden (2010). https://doi.org/10.1109/date.2010.5456968
12. De Venuto, D., Annese, V.F., Ruta, M., Di Sciascio, E., Sangiovanni Vincentelli, A.L.: Designing a cyber-physical system for fall prevention by cortico-muscular coupling detection. IEEE Design and Test 33(3), 66–76, art. no. 7273831 (2016). https://doi.org/10.1109/mdat.2015.2480707
13. Annese, V.F., Crepaldi, M., Demarchi, D., De Venuto, D.: A digital processor architecture for combined EEG/EMG falling risk prediction. In: 2016 Design, Automation & Test in Europe Conference & Exhibition, pp. 714–719. Dresden (2016)
14. Gunn, S.R.: Support vector machines for classification and regression. ISIS Tech. Rep. 14(1), 5–16 (1998)
15. De Venuto, D., Torre, M.D., Boero, C., Carrara, S., De Micheli, G.: A novel multi-working electrode potentiostat for electrochemical detection of metabolites. In: 2010 IEEE Sensors, pp. 1572–1577. Kona, HI (2010). https://doi.org/10.1109/icsens.2010.5690297
16. Carrara, S., Torre, M.D., Cavallini, A., De Venuto, D., De Micheli, G.: Multiplexing pH and temperature in a molecular biosensor. In: 2010 Biomedical Circuits and Systems Conference (BioCAS), pp. 146–149. Paphos (2010). https://doi.org/10.1109/biocas.2010.5709592

# Functional Near Infrared Spectroscopy System Validation for Simultaneous EEG-FNIRS Measurements



G. C. Giaconia, G. Greco, L. Mistretta, R. Rizzo, A. Merla, A. M. Chiarelli, F. Zappasodi and G. Edlinger

**Abstract** Functional near-infrared spectroscopy (fNIRS) applied to brain monitoring has gained increasing relevance in recent years due to its non-invasive nature and its capability to work in combination with other well-known techniques such as EEG. Possible use cases range from neurorehabilitation to the early diagnosis of some neural diseases. In this work, a wired FPGA-based fNIRS system that uses SiPM sensors and dual-wavelength LED sources has been designed and validated to work with a commercial EEG machine without reciprocal interference.

G. C. Giaconia (⊠) · G. Greco · L. Mistretta · R. Rizzo DEIM, University of Palermo, Palermo, Italy e-mail: costantino.giaconia@unipa.it

G. Greco e-mail: giuseppe.greco17@unipa.it

L. Mistretta e-mail: leonardo.mistretta@unipa.it

R. Rizzo e-mail: raimondo.rizzo@unipa.it

A. Merla · A. M. Chiarelli · F. Zappasodi Dip. Di Neuroscienze, Imaging E Scienze Cliniche, Università "G. D'Annunzio" Chieti–Pescara, Pescara, Italy e-mail: arcangelo.merla@unich.it

A. M. Chiarelli e-mail: antonio.chiarelli@unich.it

F. Zappasodi e-mail: filippo.zappasodi@unich.it

G. Edlinger Guger Technologies OG (G.Tec), Herbersteinstrasse 60, 8020 Graz, Austria e-mail: edlinger@gtec.at

© Springer Nature Switzerland AG 2019 S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_6

### 1 Introduction

Simultaneous measurements of brain oxygenation and EEG are gaining relevance in the early diagnosis of various diseases and in the monitoring of brain rehabilitation therapy. Adequate scalp coverage requires a large number of fNIRS sensors and sources [1]. This increases the possible interference with the EEG electrodes. Because of their high impedance, EEG electrodes behave like antennas. They collect the surrounding electrical noise generated by various EMI sources, but mostly by the fNIRS probe wires [2].

In this work, the developed wired fNIRS system has been tested while working simultaneously with a commercial EEG machine. The fNIRS system relies on three elements:

- a Zynq FPGA hosting all the required data processing;
- Silicon Photomultipliers (SiPMs) as highly sensitive yet fast light sensors;
- infrared dual-wavelength LEDs as optical sources.

The system is a revised and expanded version of the solution proposed in [3]. It embeds an adjustable current drive for each LED source and a precise, configurable SiPM sensor bias.

The human scalp strongly attenuates light in the infrared region. The amount of impinging light also cannot be increased arbitrarily, for both safety reasons and EMI minimization. The choice of very sensitive detectors is therefore of great importance. SiPMs can reach single-photon counting performance and can be biased at a relatively low voltage of about 30 V [4].

Despite this high sensitivity, the useful signals from the sensors are deeply buried in optical noise due to ambient light or lamps. A strong signal recovery chain is therefore needed to extract useful information. For this reason, the FPGA implements a fully digital lock-in amplification technique. It separates the light that has passed through the gray matter from ambient light and from light travelling along other uncorrelated paths. This implementation maintains a high rejection ratio against other light sources even when the fNIRS system is used without any head covering [5].

To validate the proposed solution, experimental tests have been carried out on healthy volunteers. These experiments record the brain's hemodynamic response and EEG signals from a test subject while the subject is exposed to a stimulus or performs a task. In particular, the test used in this work is a repeated finger-tapping task under several ambient light conditions and fNIRS parameter settings. A first measurement was acquired with optimal ambient light and sensor bias. The subsequent ones were carried out under extreme conditions of high and low ambient light. Meanwhile, a commercial EEG machine provided by g.Tec acquired data using electrodes placed near the fNIRS components.

### 2 System Architecture

To obtain good signal quality without unwanted noise, a digital lock-in technique has been implemented through proper modulation of the LED light and a dedicated signal processing chain. The implemented architecture is the well-known dual-phase lock-in amplifier (LIA) [5, 6]. It takes the input signal, modulated at a predefined, fixed frequency, and multiplies it by internally generated sine and cosine reference signals at the same frequency. The two products are then low-pass filtered with a properly designed digital filter to reject noise and unwanted frequency components.
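The dual-phase lock-in principle can be illustrated with a minimal Python sketch. This is not the DigiLock implementation, which runs in the FPGA fabric. Here the low-pass filter is replaced by a plain average over a record spanning an integer number of periods. The sampling rate, modulation frequency and interfering components are hypothetical values chosen for illustration.

```python
import math


def dual_phase_lock_in(samples, fs, f_ref):
    """Estimate amplitude and phase of the component of `samples` at f_ref.

    Each sample is multiplied by sine and cosine references at f_ref
    (in-phase X and quadrature Y). A plain mean over the record stands in
    for the low-pass filter: it removes DC and other tones exactly only when
    the record spans an integer number of their periods.
    """
    n = len(samples)
    x = y = 0.0
    for k, s in enumerate(samples):
        phase_ref = 2.0 * math.pi * f_ref * k / fs
        x += s * math.sin(phase_ref)  # in-phase product
        y += s * math.cos(phase_ref)  # quadrature product
    x /= n
    y /= n
    # For s = A*sin(w*t + phi): X = (A/2)cos(phi), Y = (A/2)sin(phi)
    amplitude = 2.0 * math.hypot(x, y)
    phase = math.atan2(y, x)
    return amplitude, phase


if __name__ == "__main__":
    fs, f_ref = 10_000.0, 1_000.0  # hypothetical sampling and LED modulation rates
    # Weak modulated light + strong DC ambient light + 100 Hz lamp flicker
    record = [
        0.01 * math.sin(2 * math.pi * f_ref * k / fs + 0.3)
        + 0.5
        + 0.2 * math.sin(2 * math.pi * 100.0 * k / fs)
        for k in range(10_000)
    ]
    amp, ph = dual_phase_lock_in(record, fs, f_ref)
    print(f"amplitude={amp:.6f}, phase={ph:.6f} rad")
```

Using both references makes the recovered amplitude independent of the unknown phase between the LED modulation and the detected light. A single-phase lock-in would only return A/2·cos(φ).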

As depicted in Fig. 1, the designed system, named DigiLock, consists of three boards:

- an ADC board with two high-resolution 24-bit TI ADS1298 converters and passive networks for signal filtering;
- an LED board with a single adjustable current source and 16 multiplexed outputs;
- a SiPM board that hosts the DC/DC converter generating the high-voltage bias rail.

The SiPM bias voltage is also adjustable and is shared by all the sensors. The FPGA is located on a development board equipped with a Xilinx Zynq 7Z020 FPGA, which combines a programmable logic (PL) section with a dual-core ARM processor. The processor is mainly used for data post-processing and communication with a host PC. All the processing steps required by the lock-in technique are implemented as hardware entities in the PL.
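The ADC board's raw samples must be converted to volts, which the Python sketch below illustrates. It assumes the 24-bit two's-complement output format and ±VREF/gain full-scale range described in the TI ADS1298 datasheet. It also assumes a 24-bit status word followed by eight 24-bit channel words per read-out frame. The VREF and gain defaults are hypothetical. The actual DigiLock ADC driver runs in the programmable logic, not in Python.

```python
STATUS_BYTES = 3      # assumed 24-bit status word at the start of each frame
CHANNELS = 8          # ADS1298 input channels
BYTES_PER_SAMPLE = 3  # 24-bit samples


def code_to_volts(raw: bytes, vref: float = 2.4, gain: int = 1) -> float:
    """Convert one 24-bit two's-complement code to volts.

    Assumes 1 LSB = (2 * VREF / gain) / 2**24, i.e. full scale is +/- VREF/gain.
    """
    code = int.from_bytes(raw, byteorder="big", signed=True)
    lsb = (2.0 * vref / gain) / (1 << 24)
    return code * lsb


def parse_frame(frame: bytes, vref: float = 2.4, gain: int = 1) -> list:
    """Split one read-out frame into eight channel voltages (status word skipped)."""
    expected = STATUS_BYTES + CHANNELS * BYTES_PER_SAMPLE  # 27 bytes
    if len(frame) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(frame)}")
    payload = frame[STATUS_BYTES:]
    return [
        code_to_volts(payload[i:i + BYTES_PER_SAMPLE], vref, gain)
        for i in range(0, len(payload), BYTES_PER_SAMPLE)
    ]
```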

The programmable logic performs three tasks:

- collecting data from the ADC converters through a dedicated ADC driver;
- managing the time-sharing of the optical channels and sending the ADC samples to DigiLock;
- performing all lock-in computations.

When DigiLock has processed a new data set, it sends it to the ARM processor through a Block RAM implemented in hardware. The collected data are then sent over a high-speed Ethernet connection to the host PC. There, a software interface stores the measurement in an HDF file. More details on the DigiLock design process and architecture can be found in [5].
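The time-sharing step can be pictured with a small, hypothetical example. The paper does not describe the exact scheduling. The Python sketch therefore assumes round-robin LED slots of fixed length and discards a configurable number of samples at the start of each slot, for example while the LED current settles. `slot_len` and `settle` are illustrative parameters, not DigiLock settings.

```python
def demux_time_slots(samples, n_sources, slot_len, settle=0):
    """Assign consecutive ADC samples to LED sources in round-robin slots.

    Hypothetical TDM scheme: slot i lasts `slot_len` samples and belongs to
    source i % n_sources; the first `settle` samples of every slot are dropped.
    """
    if not 0 <= settle < slot_len:
        raise ValueError("settle must be in [0, slot_len)")
    per_source = {s: [] for s in range(n_sources)}
    for i, x in enumerate(samples):
        slot, pos = divmod(i, slot_len)
        if pos < settle:
            continue  # transient samples at the slot boundary
        per_source[slot % n_sources].append(x)
    return per_source
```

Each per-source stream would then be processed by its own lock-in channel.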

Figure 2 shows the experimental setup that has been used to validate the developed prototype working simultaneously with a commercial EEG machine, provided by g.Tec.

### 3 Experimental Results

The tests aimed to assess the performance of the fNIRS system under different ambient light conditions. They also checked whether any interference appeared on the EEG signals while the fNIRS hardware was running. One volunteer, a healthy subject with brown hair, was selected for the measurement session. The additional light attenuation introduced by hair provides a suitable test bench for the effectiveness of the signal recovery chain and the sensitivity of the SiPMs.



Fig. 1 Image of the developed prototype boards

The selected task was repeated finger tapping with a defined timeline: 30 s of initial rest, followed by ten repetitions of 10 s of right-hand tapping and 10 s of rest. The task was performed four times. It ran once with only the EEG system running, and three times with both machines running under three different ambient light conditions. Post-processing of the fNIRS signals involved artifact removal and estimation of optical densities. Hemoglobin concentration changes in each channel were computed by means of the modified Beer–Lambert law. They were then averaged time-locked to the stimulation in a 20 s window, from 5 s before the stimulus onset up to 5 s after its end.
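The two processing steps named above can be sketched in Python as follows. The modified Beer–Lambert law is written in its usual two-wavelength form and solved as a 2×2 linear system per sample:

ΔOD(λ) = [ε_HbO(λ)·ΔHbO + ε_HbR(λ)·ΔHbR]·d·DPF(λ)

Artifact removal is not reproduced. The extinction coefficients, source–detector distance and differential pathlength factors (DPF) in the test are placeholders, not tabulated values. The averaging window defaults to −5 s to +15 s around each onset, which matches the 20 s window used for 10 s tapping blocks.

```python
import math


def delta_od(intensity, baseline):
    """Optical-density change relative to a baseline intensity."""
    return [-math.log10(i / baseline) for i in intensity]


def mbll_two_wavelengths(dod_1, dod_2, eps, distance, dpf):
    """Solve the modified Beer-Lambert law for (dHbO, dHbR) sample by sample.

    eps[w] = (eps_HbO, eps_HbR) at wavelength w and dpf[w] is the differential
    pathlength factor; results are in whatever units eps and distance imply.
    """
    (a, b), (c, d) = eps
    l1, l2 = distance * dpf[0], distance * dpf[1]
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("extinction matrix is singular")
    hbo, hbr = [], []
    for o1, o2 in zip(dod_1, dod_2):
        y1, y2 = o1 / l1, o2 / l2
        hbo.append((d * y1 - b * y2) / det)  # Cramer's rule
        hbr.append((a * y2 - c * y1) / det)
    return hbo, hbr


def epoch_average(signal, fs, onsets_s, pre_s=5.0, post_s=15.0):
    """Average `signal` time-locked to each onset; return (mean, standard error)."""
    n_pre, n_post = round(pre_s * fs), round(post_s * fs)
    epochs = []
    for t in onsets_s:
        k = round(t * fs)
        if k - n_pre >= 0 and k + n_post <= len(signal):  # skip truncated epochs
            epochs.append(signal[k - n_pre:k + n_post])
    n = len(epochs)
    if n < 2:
        raise ValueError("need at least two complete epochs")
    columns = list(zip(*epochs))
    mean = [sum(col) / n for col in columns]
    sem = [
        math.sqrt(sum((x - m) ** 2 for x in col) / (n - 1) / n)
        for col, m in zip(columns, mean)
    ]
    return mean, sem
```

For the protocol described here, the tapping onsets would be `[30 + 20 * k for k in range(10)]` seconds.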

Figures 3, 4 and 5 show the average hemoglobin responses, with their standard errors, for each significant channel under the different experimental conditions. A classical hemodynamic response is visible in many channels, with noise levels depending on the experimental condition. EEG signals were acquired with and without concurrent fNIRS acquisition to test for possible electromagnetic interference of the switching LEDs on the EEG. EEG signals were band-pass filtered (0.5–100 Hz Butterworth digital filter). Power spectral densities were then computed over 2 s windows (0.5 Hz resolution).
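The spectral analysis step can be outlined with NumPy. The record is cut into non-overlapping 2 s segments, each segment is Hann-windowed, and the periodograms are averaged. The bin spacing is therefore 1/(2 s) = 0.5 Hz.

The Hann window, the absence of overlap and the 256 Hz sampling rate in the test are assumptions, since the paper does not state them. The 0.5–100 Hz Butterworth band-pass stage is left out. SciPy's `scipy.signal.butter` with `output="sos"` followed by `sosfiltfilt` is one way to add it.

```python
import numpy as np


def segment_psd(x, fs, win_s=2.0):
    """One-sided PSD averaged over non-overlapping Hann-windowed segments.

    The frequency resolution is 1 / win_s (0.5 Hz for 2 s windows).
    """
    x = np.asarray(x, dtype=float)
    n = int(round(win_s * fs))  # samples per window
    n_seg = len(x) // n
    if n_seg == 0:
        raise ValueError("record is shorter than one window")
    w = np.hanning(n)
    scale = fs * np.sum(w ** 2)  # density normalisation (units^2 / Hz)
    acc = np.zeros(n // 2 + 1)
    for i in range(n_seg):
        seg = x[i * n:(i + 1) * n]
        seg = seg - seg.mean()  # remove the segment offset
        acc += np.abs(np.fft.rfft(seg * w)) ** 2
    psd = acc / (n_seg * scale)
    # Fold the negative-frequency half: double every bin except DC (and Nyquist for even n)
    if n % 2 == 0:
        psd[1:-1] *= 2.0
    else:
        psd[1:] *= 2.0
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, psd
```

Comparing the spectra of the same channel with the fNIRS system on and off then reduces to comparing two such PSD arrays bin by bin.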



Fig. 2 Block diagram of the experimental setup



Fig. 3 Oxygenated (red) and de-oxygenated (blue) hemoglobin measurements in optimal light conditions



Fig. 4 Oxygenated (red) and de-oxygenated (blue) hemoglobin measurements in low ambient light conditions and with low level of LEDs driven current



Fig. 5 Oxygenated (red) and de-oxygenated (blue) hemoglobin measurements with direct sunlight conditions and with high level of LEDs driven current



Fig. 6 Spectra of different EEG channels with (right image) and without (left image) the fNIRS system activated. No visible change in their behavior is apparent

Figure 6 shows EEG spectra with and without concurrent fNIRS acquisition. No interference due to LED switching is visible in the EEG frequency band of interest (0.5–100 Hz).

### 4 Conclusions

The developed system has shown a high level of precision and stability.

No significant interference due to the fNIRS system was detected in the EEG spectra. A good fNIRS signal with clear evidence of the motor task was recovered in all experiments. The experimental results also showed good immunity to ambient light.

These encouraging results motivate further investigation.

Acknowledgements This document has been created in the context of the EC-H2020 co-funded ASTON!SH project (ECSEL-RIA proposal No. 692470-2). No guarantee is given that the information is fit for any purpose. The user, therefore, uses the information at their sole risk and liability. The ECSEL has no liability with respect to this document, which merely represents the authors' view.

### References

- Agrò, D., et al.: Design and implementation of a portable fNIRS embedded system. In: Applications in Electronics Pervading Industry, Environment and Society. Lecture Notes in Electrical Engineering, 351, pp. 43–50. Springer, Berlin, Germany (2016)
- Von Lühmann, A., et al.: M3BA: a mobile, modular, multimodal biosignal acquisition architecture for miniaturized EEG-NIRS based hybrid BCI and monitoring. IEEE Trans. Biomed. Eng. 1 (2016)
- Giaconia, G.C., Greco, G., Mistretta, L., Rizzo, R.: FPGA based digital lock-in amplifier for fNIRS systems. ApplePies (2017)
- Adamo, G., Parisi, A., Stivala, S., Tomasino, A., Agro, D., Curcio, L., Giaconia, C.G., Busacca, A.C., Fallica, P.G.: Silicon photomultipliers signal-to-noise ratio in the continuous wave regime. IEEE J. Sel. Top. Quantum Electron. 20(6), 284–290. IEEE (2014). https://doi.org/10.1109/jstqe.2014.2346489
- Giaconia, G.C., et al.: Exploring FPGA-based lock-in techniques for brain monitoring applications. Electronics 6(1), 18 (2017)
- Macias-Bobadilla, G., et al.: Dual-phase lock-in amplifier based on FPGA for low-frequencies experiments. Sensors 16, 379 (2016)

# Electro-Photonic Chip-Scale Microsystem for Label-Free Single Bacteria Monitoring



Francesco Dell'Olio, Donato Conteduca, Michele Cito, Giuseppe Brunetti, Caterina Ciminelli, Thomas F. Krauss and Mario N. Armenise

**Abstract** Monitoring bacterial metabolism and viability at the single-cell level during antibiotic action is a crucial functionality for systems that support the development of new drugs. Such drugs must kill bacteria resistant to all or nearly all currently available antibiotics. In this paper, we report on an electro-photonic chip-scale microsystem including an array of photonic nanocavities, each able to trap a single bacterium. Monitoring the spectral response of the nanophotonic cavities and the electrical impedance across the trapping sites yields detailed knowledge of the metabolic state of the trapped bacteria. Three-dimensional simulations based on the finite element method predict a high electrical detection resolution, with a current ratio of about 12 between dead and live bacteria.

F. Dell'Olio · D. Conteduca · M. Cito · G. Brunetti · C. Ciminelli (⊠) · M. N. Armenise Optoelectronics Laboratory, Politecnico Di Bari, Bari, Italy e-mail: caterina.ciminelli@poliba.it

D. Conteduca · T. F. Krauss Photonics Group, Department of Physics, University of York, York, UK

© Springer Nature Switzerland AG 2019 S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_7

### 1 Introduction

Optoelectronic and photonic integrated devices and microsystems for healthcare are currently the subject of intense research, and several highly innovative application domains are emerging for them [1–6]. One of the most intriguing applications is in systems for mitigating antimicrobial resistance (AMR) [7]. AMR is the ability of bacteria to render antimicrobials, such as antibiotics, ineffective.

As pointed out in many official documents of the World Health Organization [8], AMR is growing very fast and is expected to become the leading cause of death in the coming decades. Consequently, there is an urgent need to improve the efficiency of antibiotics and to develop new drugs.

Nowadays, most of the techniques proposed to counter AMR are not sensitive enough to detect bacteria at the early stages of infection [9]. This is very risky for several deadly diseases, such as sepsis, flesh-eating bacterial infections, acute tuberculous meningitis and carcinogenesis. If these pathologies are not treated promptly, they can progress and spread quickly [10].

Fig. 1 Schematic illustration of the chip configuration. The chip includes an array of photonic nanocavities acting as trapping sites and two electrodes for impedance measurement

Currently, the plate-count method [11] is the most common approach for the analysis of bacterial infections. However, it needs a few days to obtain accurate results because cell culture is required. It is also laborious, because expert users are required for sample preparation and final analysis. Novel techniques have therefore been proposed to overcome these limitations, and photonic and electrochemical approaches have shown remarkable advantages [12–14].

In this paper, we report on the design of an electro-photonic chip-scale microsystem that combines optical and electrical techniques to monitor the state of single bacteria, with a fast analysis time of tens of minutes. The chip [15] (see Fig. 1) includes a number of photonic nanotweezers based on photonic crystal (PhC) cavities for trapping single bacteria. After a trapping event, electrical impedance measurements are carried out on each trapping site to determine whether the bacteria are alive or dead. The measurements use a pair of electrodes whose geometry is designed to let current flow through the trapping region.

### 2 Electro-Photonic Traps

Each nanotrap is a PhC cavity in silicon-on-insulator (SOI) technology. It has the well-known L3-type configuration [16]: a two-dimensional PhC slab in which a line of three holes is missing from a triangular lattice (see Fig. 2).

Fig. 2 L3-type PhC cavity



The triangular lattice is etched in a 220 nm thick silicon slab (n = 3.478 at  $\lambda = 1550$  nm) on glass (n = 1.444 at  $\lambda = 1550$  nm). The hole radius is 106 nm and the period along the cavity length is 400 nm. The PhC design uses a heterostructure configuration, in which the lattice pitch is increased to 420 nm in the L3 cavity to improve mode confinement [17]. These parameters give the best compromise between a high Q-factor and a large extinction ratio, together with a resonance close to 1.55 µm.

The hole configuration allows out-of-plane excitation of the nanocavity by illuminating the chip from the top [18]. To enable this out-of-plane coupling, the cavity design includes a superimposed lattice of larger holes, with a radius of 130 nm and a period of 800 nm. The nanocavity has been further optimized with a tapered section of smaller holes on both sides of the L3 cavity, to reduce optical losses and increase the Q-factor. These holes have a radius of 85 nm and are shifted by 88 nm with respect to their positions in the unit cell of the lattice.

The selected excitation mechanism allows all trapping sites to be activated simultaneously.

When a single bacterium is trapped after the chip is illuminated with a light beam at  $1.55 \mu m$ , the effective index of the cavity mode increases. This causes a red-shift of the cavity resonance. In this work, E. coli has been assumed as the target bacterium. However, the magnitude of this resonance shift is also expected to identify whether a bacterium is Gram-negative or Gram-positive, owing to their different optical density [19].
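To first order, and neglecting dispersion, a resonance wavelength that scales with the effective index shifts as Δλ/λ ≈ Δn_eff/n_eff. An index increase therefore produces a red-shift. The Python sketch below shows one simple way to quantify such a shift from two transmission spectra. It locates the transmission minimum and refines it with three-point parabolic interpolation. The Lorentzian test spectra, the 0.2 nm shift and the grid step are hypothetical values, not simulation results from this work.

```python
def resonance_wavelength(wavelengths, transmission):
    """Wavelength of the deepest transmission dip on a uniform grid.

    The minimum sample is refined by fitting a parabola through it and
    its two neighbours (sub-grid estimate of the dip centre).
    """
    i = min(range(len(transmission)), key=transmission.__getitem__)
    if i == 0 or i == len(transmission) - 1:
        return wavelengths[i]  # dip at the edge: no refinement possible
    y0, y1, y2 = transmission[i - 1], transmission[i], transmission[i + 1]
    step = wavelengths[i + 1] - wavelengths[i]
    denom = y0 - 2.0 * y1 + y2
    offset = 0.5 * (y0 - y2) / denom if denom != 0 else 0.0  # in grid steps
    return wavelengths[i] + offset * step


def lorentzian_dip(wl, centre, fwhm, depth):
    """Normalised transmission with a Lorentzian dip (illustrative model)."""
    return 1.0 - depth / (1.0 + ((wl - centre) / (fwhm / 2.0)) ** 2)
```

The shift is the difference between the resonance wavelengths estimated after and before trapping.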

The equivalent electrical circuit shown in Fig. 3a describes the electrochemical behavior of each trapping site, whose cross-section is schematically shown in Fig. 3b. The elements of the electric circuit are the resistance of the doped lateral sections of the silicon slab  $R_{slab,l}$ , the resistance of the central undoped section of the silicon slab  $R_{slab,c}$ , the double layer capacitance ( $C_{dl}$ ), which represents the electrical properties of the interface between the electrode and the bacterium, and the surrounding medium resistance ( $R_{sol}$ ).
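The text lists the circuit elements but not how Fig. 3a connects them. The Python sketch below therefore assumes one plausible topology, purely for illustration: the lateral slab resistance in series with R_slab,c, which is in parallel with a series C_dl–R_sol branch. All component values in the test are hypothetical. Under this assumption, a larger C_dl lowers |Z_TS| and raises the current at the 100 Hz, 50 mV excitation used in the simulations.

```python
import math


def trap_impedance(f_hz, r_slab_l, r_slab_c, c_dl, r_sol):
    """Complex impedance of a trapping site under an ASSUMED topology.

    Hypothetical: r_slab_l (total lateral doped-slab resistance) in series
    with r_slab_c in parallel with the series branch r_sol + 1/(j*w*c_dl).
    """
    omega = 2.0 * math.pi * f_hz
    z_branch = r_sol + 1.0 / (1j * omega * c_dl)
    z_parallel = r_slab_c * z_branch / (r_slab_c + z_branch)
    return r_slab_l + z_parallel


def current_amplitude(v_amp, z):
    """Current amplitude for a sinusoidal voltage amplitude across z."""
    return v_amp / abs(z)
```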



A narrow silicon region at the center of the L3 cavity is assumed to be undoped in order to strongly increase $R_{slab,c}$. This is a necessary condition for reaching a resolution suitable for monitoring the metabolic state of a single bacterium. The undoped region in the L3 cavity also reduces the optical loss.

In the p-doped regions of the silicon slab, we have assumed an exponential vertical decay of the dopant concentration, from $N = 10^{21}\ \text{cm}^{-3}$ to $10^{17}\ \text{cm}^{-3}$ within only 20 nm.
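Assuming the profile is a pure exponential starting at the slab surface (z = 0), the stated four-decade drop over 20 nm fixes the characteristic decay length:

$$
N(z) = N_0\, e^{-z/\lambda_d}, \qquad \frac{N(20\ \text{nm})}{N_0} = \frac{10^{17}}{10^{21}} = 10^{-4}
\;\Rightarrow\; \lambda_d = \frac{20\ \text{nm}}{\ln 10^{4}} \approx \frac{20\ \text{nm}}{9.21} \approx 2.17\ \text{nm}
$$

The concentration thus falls by one decade every $\lambda_d \ln 10 = 5$ nm.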

When no bacteria are trapped, the impedance of the trapping site $Z_{TS}$ is very high. This is due to the high value of $R_{slab,c}$ and the low electrical conductivity of the surrounding medium (i.e. deionized water, $\sigma \sim 10^{-4}$ S/m). A similar behavior is expected when a single bacterium is trapped and no effective antibiotic interacts with it, because its outer membrane is insulating. Conversely, effective antibiotics introduced into the solution kill the bacteria, and a fast efflux of ions then occurs through the disrupted outer membrane [20]. This causes a marked change of $C_{dl}$ and, hence, of the impedance $Z_{TS}$.

### 3 Selected Numerical Results

We have simulated the electro-photonic traps with the three-dimensional finite element method. For the optical simulations, the bacterium is represented as a cylinder 2  $\mu$m long, with a diameter of 500 nm and a refractive index of 1.388 [21]. For the electrical simulations, we have used the three-shell spheroidal model, which represents the cytoplasm, the inner membrane, the periplasm and the outer membrane [22]. In this model, a dead bacterium has a disrupted outer wall. It is modelled with a larger periplasm region (15 nm) with an average electrical conductivity of 1 S/m, to simulate the release of ions and organelles into the environment surrounding the bacterium.

The Q-factor of the L3-type PhC cavity is  $2.3 \times 10^3$, with a transmission dip of 29% on resonance. Our results in [23] show that this Q-factor is suitable for exerting optical forces strong enough to trap living matter and dielectric objects at the micro- and sub-microscale with long trapping times.
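The Q-factor relates the resonance wavelength to the linewidth, Q = λ₀/Δλ_FWHM. Q = 2.3 × 10³ at 1.55 µm therefore corresponds to a linewidth of about 1550 nm / 2300 ≈ 0.67 nm. The minimal Python sketch below extracts Q from a normalised transmission spectrum. It assumes the off-resonance level is 1 and finds the half-depth crossings by linear interpolation. The synthetic Lorentzian in the test, with dip depth 0.29, is illustrative only.

```python
def estimate_q(wavelengths, transmission, baseline=1.0):
    """Q = lambda0 / FWHM of the deepest dip in a normalised spectrum.

    Assumes an increasing wavelength grid and an off-resonance
    transmission level equal to `baseline`.
    """
    i0 = min(range(len(transmission)), key=transmission.__getitem__)
    if transmission[i0] >= baseline:
        raise ValueError("no dip below the baseline")
    half = 0.5 * (baseline + transmission[i0])  # half-depth level

    def crossing(i, j):
        # Linearly interpolated wavelength where T crosses `half` between i and j
        t_i, t_j = transmission[i], transmission[j]
        return wavelengths[i] + (half - t_i) * (wavelengths[j] - wavelengths[i]) / (t_j - t_i)

    left, right = i0, i0
    while left > 0 and transmission[left] < half:
        left -= 1
    while right < len(transmission) - 1 and transmission[right] < half:
        right += 1
    if transmission[left] < half or transmission[right] < half:
        raise ValueError("dip is not fully contained in the spectrum")
    fwhm = crossing(right - 1, right) - crossing(left, left + 1)
    return wavelengths[i0] / fwhm
```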

In the electrical simulations, an AC signal with frequency f = 100 Hz and low amplitude V = 50 mV is applied to one of the electrodes, with the other electrode connected to ground. The low amplitude avoids undesired Joule heating.

The electrical behavior of the nanotraps has been simulated in three different conditions:

(i) No bacteria trapped. A low output current is obtained ( $I_{out} = 360 \text{ pA}$ ), because the narrow region of undoped silicon in the L3 cavity prevents current flow.

(ii) A single live bacterium trapped before the administration of antibiotics. The current is still low ( $I_{out} = 350 \text{ pA}$ ), because the intact membrane is strongly insulating.

(iii) A dead trapped bacterium. The conductivity at the interface between the bacterium and the silicon is higher, with a consequently higher value of  $C_{dl}$ . In this condition,  $I_{out} = 4.10 \text{ nA}$ .

The resulting enhancement factor between dead and live bacteria in the optical trap is  $I_{out,dead}/I_{out,live} \sim 12$ . This makes it easy to monitor the metabolic state of the bacterium.
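Reading the simulated currents as amplitudes of the 50 mV, 100 Hz excitation gives the corresponding impedance magnitudes and the live/dead ratio directly from Ohm's law:

$$
|Z_{TS}| = \frac{V}{I_{out}}:\quad
\frac{50\ \text{mV}}{360\ \text{pA}} \approx 139\ \text{M}\Omega\ \text{(empty)},\quad
\frac{50\ \text{mV}}{350\ \text{pA}} \approx 143\ \text{M}\Omega\ \text{(live)},\quad
\frac{50\ \text{mV}}{4.10\ \text{nA}} \approx 12.2\ \text{M}\Omega\ \text{(dead)}
$$

$$
\frac{I_{out,dead}}{I_{out,live}} = \frac{4.10\ \text{nA}}{0.350\ \text{nA}} \approx 11.7 \approx 12
$$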

### 4 Conclusions

An electro-photonic chip that allows monitoring of the response of bacteria to antibiotic challenge has been reported. The chip includes an array of nanophotonic traps based on photonic crystal cavities in silicon-on-insulator technology. The silicon is partially doped to enable impedance measurements. The main advantages of the proposed device are its high efficiency and high resolution, which enable the detection of single bacteria. It also offers a real-time, reusable, label-free and non-destructive analysis with a very compact footprint. These features make the optoelectronic system suitable for on-chip integration within point-of-care medical instruments.

### References

- Estevez, M., Alvarez, M., Lechuga, L.: Integrated optical devices for lab-on-a-chip biosensing applications. Laser Photon. Rev. 6, 463–487 (2012)
- Ciminelli, C., Campanella, C.M., Dell'Olio, F., Campanella, C.E., Armenise, M.N.: Label-free optical resonant sensors for biochemical applications. Prog. Quantum Electron. 37, 51–107 (2013)

- Fernández Gavela, A., Grajales García, D., Ramirez, J.C., Lechuga, L.M.: Last advances in silicon-based optical biosensors. Sensors 16, 285 (2016)
- Ciminelli, C., Dell'Olio, F., Conteduca, D., Campanella, C.M., Armenise, M.N.: High performance SOI microring resonator for biochemical sensing. Opt. Laser Technol. 59, 60–67 (2014)
- Dell'Olio, F., Conteduca, D., Ciminelli, C., Armenise, M.N.: New ultrasensitive resonant photonic platform for label-free biosensing. Opt. Express 23, 28593–28604 (2015)
- Dell'Olio, F., Conteduca, D., De Palo, M., Ciminelli, C.: Design of a new ultracompact resonant plasmonic multi-analyte label-free biosensing platform. Sensors 17, 1810 (2017)
- Prestinaci, F., Pezzotti, P., Pantosti, A.: Antimicrobial resistance: a global multifaceted phenomenon. Pathog. Glob. Health 109, 309–318 (2015)
- World Health Organization: http://www.who.int/antimicrobial-resistance/en/
- Jasovsky, D., Littmann, J., Zorzet, A., Cars, O.: Antimicrobial resistance a threat to the world's sustainable development. Upsala J. Med. Sci. 121, 159–164 (2016)
- Dickson, R.P., Singer, B.H., Newstead, M.W., Falkowski, N.R., Erb-Downward, J.R., Standiford, T.J., Huffnagle, G.B.: Enrichment of the lung microbiome with gut bacteria in sepsis and the acute respiratory distress syndrome. Nat. Microbiol. 1, 16113 (2016)
- Khan, M.M.T., Pyle, B.H., Camper, A.K.: Specific and rapid enumeration of viable but nonculturable and viable-culturable gram-negative bacteria by using flow cytometry. Appl. Environ. Microbiol. 76, 5088–5096 (2010)
- Zhou, H., Yang, D., Ivleva, N.P., Mircescu, N.E., Schubert, S., Niessner, R., Wieser, A., Haisch, C.: Label-free in situ discrimination of live and dead bacteria by surface enhanced Raman scattering. Anal. Chem. 87, 6553–6561 (2015)
- Yang, L., Li, Y., Griffis, C.L., Johnson, M.G.: Interdigitated microelectrode (IME) impedance sensor for the detection of viable Salmonella typhimurium. Biosens. Bioelectron. 19, 1139–1147 (2004)
- Safavieh, M., Pandya, H.J., Venkataraman, M., Thirumalaraju, P., Kanakasabapathy, M.K., Singh, A., Prabhakar, D., Chug, M.K., Shafiee, H.: Rapid real-time antimicrobial susceptibility testing with electrical sensing on plastic microphotonics with printed electrodes. ACS Appl. Mater. Interfaces 9, 12832–12840 (2017)
- Conteduca, D., Dell'Olio, F., Brunetti, G., Krauss, T.F., Ciminelli, C., Armenise, M.N.: High-efficiency optoelectronic system for monitoring of antimicrobial resistance (AMR) in bacteria. In: 20th Italian National Conference on Photonic Technologies (Fotonica 2018), Lecce, Italy (2018)
- Akahane, Y., Asano, T., Song, B.S., Noda, S.: High-Q photonic nanocavity in a two-dimensional photonic crystal. Nature 425, 944–947 (2003)
- Portalupi, S.L., Galli, M., Reardon, C., Krauss, T.F., O'Faolain, L., Andreani, L.C., Gerace, D.: Planar photonic crystal cavities with far-field optimization for high coupling efficiency and quality factor. Opt. Express 18, 16064–16073 (2010)
- Galli, M., Portalupi, S.L., Belotti, M., Andreani, L.C., O'Faolain, L., Krauss, T.F.: Light scattering and fano resonances in high-Q photonic crystal nanocavities. Appl. Phys. Lett. 94, 071101 (2009)
- Terisod, R., Tardif, M., Marcoux, P.R., Picard, E., Hadji, E., Peyrade, D., Houdré, R.: Optical trapping of living bacteria in 2D hollow photonic crystal cavities. In: Conference on Laser and Electro-optics (CLEO 2018), San Jose, California, USA (2018)
- Delcour, A.H.: Outer membrane permeability and antibiotic resistance. Biochim. Biophys. Acta 1794, 808–816 (2009)
- Liu, P.Y., Chin, K., Ser, W., Ayi, T.C., Yap, P.H., Bourouina, T., Leprince-Wang, Y.: Real-time measurement of single bacterium's refractive index using optofluidic immersion refractometry. Procedia Eng. 87, 356–359 (2014)
- Bai, W., Zhao, K.S., Asami, K.: Dielectric properties of E.coli cell as simulated by the three shell spheroidal model. Biophys. Chem. 122, 136–142 (2006)
- Conteduca, D., Reardon, C., Scullion, M.G., Dell'Olio, F., Armenise, M.N., Krauss, T.F., Ciminelli, C.: Ultra-high Q/V hybrid cavity for strong light-matter interaction. APL Photonics 2, 086101 (2017)

# HGIS: A Healthcare-Oriented Approach to Geographic Information Systems



Edgar Batista, Antoni Martínez-Ballesté, Marta Peña, Xavier Singla and Agusti Solanas

**Abstract** The healthcare sector is rapidly changing to incorporate new technologies that enable the collection of unprecedented amounts of data. New paradigms such as mobile and smart healthcare are founded on these technologies, such as the IoT. The data they collect are no longer purely medical: they also include contextual information with spatio-temporal features. Moreover, new data processing techniques, such as process mining, are gaining attention and reshaping the landscape of healthcare information systems. In this article, we concentrate on Geographic Information Systems and propose HGIS, a comprehensive, healthcare-oriented GIS architecture that supports the integration of heterogeneous data with spatio-temporal features. We also comment on initial results from a proof-of-concept implementation developed by *Xarxa Sanitària i Social de Santa Tecla*, a large healthcare provider in the south of Catalonia.

**Keywords** GIS · Electronic healthcare records · Context-awareness · Smart healthcare · Healthcare management · Process mining

E. Batista

SIMPPLE, S.L., Tarragona, Catalonia, Spain e-mail: edgar.batista@simpple.com

A. Martínez-Ballesté · A. Solanas (⊠) Smart Health Research Group, Universitat Rovira i Virgili, Catalonia, Spain e-mail: agusti.solanas@urv.cat

A. Martínez-Ballesté e-mail: antoni.martinez@urv.cat

M. Peña · X. Singla Xarxa Sanitària i Social Santa Tecla, Tarragona, Catalonia, Spain e-mail: mpena@xarxatecla.cat

X. Singla e-mail: xsingla@xarxatecla.cat

© Springer Nature Switzerland AG 2019 S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_8

### 1 Introduction

Organisations and companies rely on information systems that continuously collect large amounts of heterogeneous data from a variety of sources. These data are used not only for operational purposes but also for strategic decision support. Processed data from operational databases are stored in data warehouses, which business intelligence solutions use to extract knowledge and visualise strategic information. The large amount of stored data, together with its complexity, has fostered the emergence of advanced data analysis and visualisation techniques, in particular those related to spatio-temporal dimensions.

Geographic Information Systems (GIS) provide a set of methods and tools to ease the storage, analysis, transformation and visualisation of geolocated data. The paradigm of computerised GIS<sup>1</sup> enabled the rapid development and commercialisation of solutions for territorial planning, agriculture, banking, telecommunications, transportation and public health [8]. In particular, GIS technologies are well suited to play a key role in healthcare, since they contribute to the analysis and improvement of its sustainability and efficiency. More importantly, GIS could be coupled with healthcare records, processes and information systems to enhance quality of service. This would allow the integration of contextual information, following the paradigm of Smart Healthcare [6, 7].

Because context matters, health conditions and spatio-temporal features are intertwined. The propagation of microbes, socioeconomic data, demographic status and the distribution of healthcare services are a few examples of geographical, spatio-temporal features that affect health outcomes in certain areas at particular times [2]. Despite the multiple healthcare applications of GIS [3], context features are rarely considered, and potentially beneficial knowledge is neglected.

In addition to the historical medical information stored in the data warehouses of medical institutions, GIS technologies could, for instance, integrate contextual information from patients' wearable and IoT devices. These devices integrate many built-in sensors (e.g., environmental, physiological, positioning). Despite their risks [4], they can help obtain important knowledge when the data they collect are correlated with the health conditions of individuals and populations. The analysis of these data can go further by considering spatio-temporal dimensions through process mining, which aims to discover and visualise medical processes in their context [1].

#### 1.1 Contribution and Plan of the Article

This article explores the challenges and opportunities of GIS technologies. It is motivated by their current lack of contextualisation and by the need for more sustainable healthcare models. We present our Healthcare Geographic Information System (HGIS), a solution currently under development by Xarxa Sanitària i Social de Santa Tecla.<sup>2</sup> By considering contextual information, HGIS aims to assist healthcare managers in making better decisions on the planning and delivery of healthcare services.

<sup>1</sup>Roger Tomlinson, 1967.

The rest of the article presents the main architecture of HGIS (Sect. 2), a proof-of-concept implementation and some initial results (Sect. 3), and conclusions (Sect. 4).

## 2 HGIS: A GIS Oriented to the Healthcare Sector

This section presents the architecture of HGIS, a GIS specifically designed for the healthcare domain that aims to ease the analysis of healthcare information by combining several types of data and providing strategic visualisations.

Figure 1 shows the architecture of our HGIS solution, which considers three tiers: (i) a **data tier** (with medical, geographical and contextual data), (ii) a **processing tier** with several analysis methods aimed at obtaining knowledge, and (iii) a **visualisation tier** based on maps. We analyse them next:

**Data tier**: In the first tier, we distinguish three main elements:

- **Data warehouses**, also known as strategic databases, integrate data from multiple operational information systems to support decision making and strategic alignment. In HGIS, data warehouses are persistent data repositories with medical-related data (e.g., electronic health records, patient profiles, clinical guidelines) imported from the operational information systems through ETL<sup>3</sup> procedures.
- The **Locations Look Up Table (LLUT)** is a tree-based data structure specially tailored to store geographical constructs and associate them with their corresponding geographical shapes, defined by sets of latitude-longitude pairs (e.g., encoded in GeoJSON). Designed to be fully adaptable to the needs of healthcare organisations worldwide, the LLUT enables the definition of geographical constructs (e.g., provinces, cities) by means of geographical shapes. Hence, the LLUT acts as an abstraction layer for the geographical information used in HGIS, identifying each construct with a unique code in a hierarchical tree-based structure. The visualisation tier uses the LLUT to translate these codes into shapes with geographical meaning. The hierarchy of the LLUT also allows the dynamic definition of levels (e.g., hospital → city → region → province). For instance, hospitals and healthcare facilities are defined at the lowest level (i.e., the leaves), cities one level above, and so on.
- **Contextual information** can augment medical data with cause-effect explanations. Analysing environmental/contextual data, such as temperature or PM10 concentration, in conjunction with the biophysical parameters of patients, enables personalised analyses and helps to understand the relation between environmental factors and diseases [5]. To fuel this HGIS component, a variety of IoT devices (e.g., smartphones and wearable devices) could be used. Moreover, cognitive environments such as smart hospitals and smart cities are gaining importance as sources of contextual data for healthcare.

<sup>2</sup>A large healthcare provider in the south of Catalonia, comprising several hospitals and points of care.

<sup>3</sup>Extraction, transformation and loading.

Fig. 1 Our three-tier HGIS architecture: data, processing and visualisation
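To make the LLUT idea more concrete, the following Python sketch models it as a tree of coded nodes. Each node can carry a GeoJSON geometry, and leaf-level values can be rolled up to any named level. The class and method names (`Llut`, `add`, `aggregate`), the dotted code scheme, the coordinates and the three-level example are all hypothetical. This is not the HGIS implementation, which according to Sect. 3 is built on PostGIS.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class LlutNode:
    """One geographical construct in a hypothetical LLUT tree."""
    code: str                        # unique hierarchical code, e.g. "T.1.a"
    level: str                       # level name, e.g. "region" or "centre"
    name: str
    geometry: Optional[dict] = None  # GeoJSON geometry; coordinates are [lon, lat]
    parent: Optional[LlutNode] = field(default=None, repr=False, compare=False)
    children: List[LlutNode] = field(default_factory=list, repr=False, compare=False)


class Llut:
    """Minimal look-up table: code -> node, with parent/child links."""

    def __init__(self) -> None:
        self._nodes: Dict[str, LlutNode] = {}

    def add(self, code: str, level: str, name: str,
            parent_code: Optional[str] = None,
            geometry: Optional[dict] = None) -> LlutNode:
        parent = self._nodes[parent_code] if parent_code is not None else None
        node = LlutNode(code, level, name, geometry, parent)
        if parent is not None:
            parent.children.append(node)
        self._nodes[code] = node
        return node

    def geometry(self, code: str) -> Optional[dict]:
        """What a visualisation tier would look up to turn a code into a shape."""
        return self._nodes[code].geometry

    def ancestors(self, code: str) -> Iterator[LlutNode]:
        node = self._nodes[code].parent
        while node is not None:
            yield node
            node = node.parent

    def aggregate(self, leaf_values: Dict[str, float], level: str) -> Dict[str, float]:
        """Sum leaf-level values (e.g. visits per centre) into the given level."""
        totals: Dict[str, float] = {}
        for code, value in leaf_values.items():
            for anc in self.ancestors(code):
                if anc.level == level:
                    totals[anc.code] = totals.get(anc.code, 0.0) + value
        return totals


# Hypothetical three-level example (the HGIS prototype uses five levels).
llut = Llut()
llut.add("T", "province", "Province")
llut.add("T.1", "region", "Region 1", "T")
llut.add("T.2", "region", "Region 2", "T")
llut.add("T.1.a", "centre", "Centre a", "T.1",
         {"type": "Point", "coordinates": [1.25, 41.1]})
llut.add("T.1.b", "centre", "Centre b", "T.1")
llut.add("T.2.a", "centre", "Centre c", "T.2")
print(llut.aggregate({"T.1.a": 10, "T.1.b": 5, "T.2.a": 7}, "region"))
# {'T.1': 15.0, 'T.2': 7.0}
```

Keeping codes separate from geometries lets the levels be redefined without touching the stored medical data. This matches the abstraction-layer role described above.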

**Processing tier**: This tier groups the analysis techniques applied to the data collected in the data tier. Initially, fairly simple computations are used to obtain healthcare indicators, i.e., numerical and categorical values resulting from the joint evaluation of several medical-related facts, such as the number of patients' relapses and their prescribed medication. These indicators are then analysed statistically, and data mining techniques are used to compute correlations, identify clusters, classify observations and detect outliers. Moreover, thanks to the addition of timestamped data, process mining (PM) techniques are used to discover processes, to check process adherence and to optimise processes. For instance, PM techniques can discover patient flows and analyse them in terms of efficiency and performance from a spatio-temporal point of view.
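As an illustration of the first steps of this tier, the sketch below derives a simple indicator and then flags outliers with a z-score rule. The indicator is the share of each healthcare centre's patients with at least one relapse. The record layout, the indicator definition and the threshold of 2 are hypothetical choices; the source does not specify which indicators or outlier tests HGIS uses.

```python
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Set, Tuple

Record = Tuple[str, str, bool]  # (centre_code, patient_id, relapsed) - hypothetical layout


def relapse_share(records: Iterable[Record]) -> Dict[str, float]:
    """Fraction of each centre's patients with at least one relapse."""
    patients: Dict[str, Set[str]] = {}
    relapsed: Dict[str, Set[str]] = {}
    for centre, patient, has_relapse in records:
        patients.setdefault(centre, set()).add(patient)
        if has_relapse:
            relapsed.setdefault(centre, set()).add(patient)
    return {c: len(relapsed.get(c, set())) / len(p) for c, p in patients.items()}


def zscore_outliers(indicator: Dict[str, float], threshold: float = 2.0) -> List[str]:
    """Keys whose value lies more than `threshold` population SDs from the mean."""
    values = list(indicator.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [k for k, v in indicator.items() if abs(v - mu) / sigma > threshold]


records = [("C1", "p1", False), ("C1", "p1", True), ("C1", "p2", False), ("C2", "p3", False)]
print(relapse_share(records))  # {'C1': 0.5, 'C2': 0.0}
print(zscore_outliers({"C1": 0.10, "C2": 0.12, "C3": 0.11, "C4": 0.09, "C5": 0.10, "C6": 0.50}))
```

With only a handful of centres, a z-score rule is weak. A robust alternative, such as the median absolute deviation, may be preferable in practice.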

**Visualisation tier**: HGIS is a strategic organisational tool and, as such, data must be properly visualised to help decision makers gather knowledge and make better decisions. With this aim, HGIS plots indicators using multiple visualisation layers, each following a specific visualisation model. For instance, numerical indicators are shown with markers and choropleth maps, densities are represented with heat maps, flows are described with migration maps, and statistical data are shown with pie and bar charts. The bottom visualisation layer is a base map obtained from online service providers, and the other visualisations are built on top of it. Each visualisation layer includes legends, layer switching controls, zooming and filtering options. HGIS is extensible and allows data scientists to create and use new visualisation layers.

## 3 Proof-of-Concept Implementation and Initial Results

To obtain an initial evaluation of HGIS, a web-based proof-of-concept prototype has been developed. The HGIS data tier has been built using PostGIS, the geographical extension for PostgreSQL. In the processing tier, we use server-side implementations in Java and Python. Finally, for the visualisation tier, we have used the JavaScript library Leaflet,<sup>4</sup> OpenStreetMap and Stamen.

For each tier, we have paid special attention to the components that are specifically designed for HGIS, and we briefly summarise them next:

In the data tier, to protect patients' privacy, the data warehouse has been populated with synthetic data mimicking the real data from the operational systems of the *Xarxa Sanitària i Social de Santa Tecla*. The LLUT has been fully implemented with a five-level hierarchy that follows the geographic organisation of Catalonia (i.e., province, region, basic healthcare area, municipality and healthcare centre). Currently, our LLUT considers one province, two regions, six basic healthcare areas, 23 municipalities and 31 healthcare centres.

In the processing tier, data are analysed in terms of medical indicators and basic statistics (e.g., aggregations, averages). HGIS users can select existing data analyses and create new ones according to their needs. It is worth emphasising that a process mining module has been developed to discover flows in medical processes from a spatio-temporal perspective, e.g., the mobility of patients among healthcare facilities. To the best of our knowledge, this is the first time that PM techniques and GIS have been combined to study healthcare processes.
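A common starting point for this kind of flow discovery is the directly-follows relation from process mining, with the facility used as the activity label. The sketch below orders each patient's visits by time and counts transitions between consecutive distinct facilities. This is an assumed simplification, not the algorithm of the HGIS PM module. The event layout, dates and facility codes are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, datetime, str]  # (patient_id, timestamp, facility_code) - hypothetical


def facility_flows(events: Iterable[Event]) -> Counter:
    """Count directly-follows transitions between facilities across patients."""
    visits: Dict[str, List[Tuple[datetime, str]]] = defaultdict(list)
    for patient, ts, facility in events:
        visits[patient].append((ts, facility))
    flows: Counter = Counter()
    for trace in visits.values():
        trace.sort(key=lambda v: v[0])              # order each patient's visits in time
        for (_, src), (_, dst) in zip(trace, trace[1:]):
            if src != dst:                          # ignore repeated visits to the same site
                flows[(src, dst)] += 1
    return flows


t = datetime.fromisoformat
events = [
    ("p1", t("2018-03-01T09:00"), "H1"),
    ("p1", t("2018-03-05T10:30"), "H2"),
    ("p1", t("2018-03-20T08:15"), "H1"),
    ("p2", t("2018-03-02T11:00"), "H1"),
    ("p2", t("2018-03-09T12:00"), "H2"),
]
print(facility_flows(events))
```

Each (source, destination) count could then be drawn as a line between the two facilities' LLUT coordinates, weighted by the count. This is essentially what a flow or traffic map layer displays.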

The visualisation tier uses markers and choropleth visualisations to represent indicators. The choice depends on the hierarchy level of the data as defined in the LLUT. For instance, markers are used to visualise data at the lowest level (i.e., healthcare centres) because these correspond to locations rather than regions. Conversely, when data refer to higher levels of the LLUT, which represent areas, choropleth visualisations are used (cf. Fig. 2a). In addition, PM results are represented using traffic map visualisations. These can depict medical processes from a spatio-temporal perspective, enabling the identification of bottlenecks or hidden patterns (cf. Fig. 2b).
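The level-dependent rule described above can be sketched as follows. The sketch also includes quantile classes, one common way of colouring a choropleth. The function names, the numeric level convention (1 = province, ..., 5 = healthcare centre) and the choice of quantile classification are assumptions; the source does not state how HGIS classifies choropleth values.

```python
from statistics import quantiles
from typing import List, Sequence


def layer_for(level: int, leaf_level: int) -> str:
    """Markers for LLUT leaves (point locations), choropleth for areas."""
    return "markers" if level == leaf_level else "choropleth"


def quantile_classes(values: Sequence[float], n_classes: int = 4) -> List[int]:
    """Class index 0..n_classes-1 for each value, using quantile cut points."""
    cuts = quantiles(values, n=n_classes)  # n_classes - 1 cut points
    return [sum(v > c for c in cuts) for v in values]


# Hypothetical indicator values for six basic healthcare areas (level 3 of 5).
values = [12, 30, 7, 22, 15, 40]
print(layer_for(level=3, leaf_level=5), quantile_classes(values))
```

Quantile classes put roughly the same number of areas in each colour. This keeps maps readable when indicator values are skewed, at the cost of hiding absolute differences.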

<sup>4</sup>https://leafletjs.com/.



Fig. 2 HGIS visualisation layers. a Visualisation using markers and choropleth layers, b Flow visualisation from a spatial perspective using traffic maps

The initial testing of the proof-of-concept implementation shows that the platform is fully functional and helps to represent information that, until now, was not accessible to decision makers. Thanks to the use of open and scalable technologies, HGIS has proven to be an extensible, cost-efficient and practical tool for managers and data scientists in healthcare organisations.

## 4 Conclusions

The need for sustainable healthcare models requires new forms of data analysis for decision making. Adding contextual information to medical data gives analysts more possibilities to gather added-value knowledge on a variety of topics, from medical procedures and treatments to economic and patient flows. Thus, we argue that the fusion of context-aware analyses with GIS technologies will reshape the understanding of business intelligence tools in the healthcare sector. However, the idiosyncrasies of the healthcare domain require important modifications to current general-purpose GIS solutions.

In this article, we have introduced HGIS, a healthcare-oriented GIS solution. Due to space limitations, we have not been able to describe HGIS in detail. We have briefly described our three-tier architecture and reported preliminary comments on the implemented proof-of-concept. Future work will focus on fully implementing the solution and thoroughly testing it in real medical practice.

Acknowledgements This work is supported by Generalitat de Catalunya projects 2017-SGR-896 and 2017-DI-002, and by Universitat Rovira i Virgili project 2017PFR-URV-B2-41.

## References

1. Batista, E., Solanas, A.: Process mining in healthcare: a systematic review. In: 9th International Conference on Information, Intelligence, Systems and Applications (IISA), pp. 1–6. IEEE (2018)
2. Graves, B.A.: Integrative literature review: a review of literature related to geographical information systems, healthcare access, and health outcomes. Perspect. Health Inf. Manag. 5, 11 (2008)
3. Lyseen, A.K., Nøhr, C., Sørensen, E.M., Gudes, O., Geraghty, E.M., Shaw, N.T., et al.: A review and framework for categorizing current research and development in health related geographical information systems (GIS) studies. Yearb. Med. Inform. 23(1), 110–124 (2014)
4. Papageorgiou, A., Strigkos, M., Politou, E., Alepis, E., Solanas, A., Patsakis, C.: Security and privacy analysis of mobile health applications: the alarming state of practice. IEEE Access 6, 9390–9403 (2018)
5. Riano, D., Solanas, A.: Exploiting the relation between environmental factors and diseases: a case study on chronic obstructive pulmonary disease. In: Workshop on Knowledge Representation for Health-Care Data, Processes and Guidelines, KR4HC 2014, pp. 160–173 (2014)
6. Solanas, A., Casino, F., Batista, E., Rallo, R.: Trends and challenges in smart healthcare research: a journey from data to wisdom. In: 3rd International Forum on Research and Technologies for Society and Industry (RTSI), pp. 1–6. IEEE (2017)
7. Solanas, A., Patsakis, C., Conti, M., Vlachos, I.S., Ramos, V., Falcone, F., et al.: Smart health: a context-aware health paradigm within smart cities. IEEE Commun. Mag. 52(8), 74–81 (2014)
8. Waters, N.: GIS: history. In: International Encyclopedia of Geography, pp. 1–13 (2018)

# Part III Technology and Testing Issues

## Main Parasitic Effects in Contactless Wafer Testing



Alessandro Finocchiaro, Giovanni Girlando, Alessandro Motta, Alberto Pagani and Giuseppe Palmisano

**Abstract** The paper presents an analysis of the principal parasitic effects in contactless wafer-level testing. Contactless technology exploits inductive coupling between a tester antenna and many integrated on-chip antennas (OCAs) to transfer energy and exchange bidirectional data. Electromagnetic crosstalk between adjacent on-chip antennas and the eddy currents generated in the substrate were analyzed. Simulations, varying the thickness and the conductivity of the substrate, have highlighted the strengths of this approach. Moreover, a wafer scribe line pre-cutting, used to drastically reduce the eddy currents, was also adopted.

Keywords On-chip antenna  $\cdot$  Contactless testing  $\cdot$  Magnetic coupling  $\cdot$  Eddy currents

## 1 Introduction

The Electrical Wafer Sorting (EWS) is generally adopted for Integrated Circuits (ICs) as the first test to identify good dice before packaging. For simple ICs, the EWS is the only means of validating operation, whereas complex ICs require additional functional tests to guarantee the performance certified to the customer in the product data sheet. However, the final test represents a challenge for wafer-level testing, especially with high parallelism and technology scaling. Indeed, miniaturization and parallelism require a high number of probes, which increases the complexity of the probe card and of the Automatic Test Equipment (ATE). Moreover, reliability and yield must be guaranteed during the wafer-level test by avoiding pad cracking and ensuring stable, low contact resistances.

A. Finocchiaro (🖂) · G. Girlando

STMicroelectronics, 95121 Catania, Italy e-mail: alessandro.finocchiaro@st.com

G. Girlando e-mail: giovanni.girlando@st.com

A. Motta · A. Pagani STMicroelectronics, 20864 Agrate Brianza, Italy e-mail: alessandro.motta@st.com

A. Pagani e-mail: alberto.pagani@st.com

G. Palmisano DIEEI, University of Catania, 95126 Catania, Italy e-mail: giuseppe.palmisano@unict.it

© Springer Nature Switzerland AG 2019 S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_9

A viable solution that drastically reduces the testing cost without compromising reliability and yield is contactless wafer-level testing [1–3]. It exploits magnetic coupling between a tester antenna and an On-Chip Antenna (OCA) on a Device Under Test (DUT). In [1], an OCA was designed using a standard 90-nm CMOS process on a 900-MHz Radio Frequency Identification (RFID) front-end. The operating frequency chosen for contactless wafer-level testing determines the type of OCA and of all the required circuit blocks (i.e., the rectifier and the RF transceiver). A fully contactless wafer-level test was instead demonstrated in [2], using a complete UHF RFID tag with OCA as the test vehicle. This OCA was designed on the perimeter of the underlying tag IC and fabricated in a post-processing step.

In [3], contactless parallel testing of a System-in-Package (SiP) module was presented. This method exploits a magnetic coupling interface between wireless probes integrated in a standard tester probe card (e.g., for JTAG testers) and the SiPs, which are equipped with RF circuitry. The RF carrier frequency was 1.5 GHz, and the test system was able to deliver power and bidirectional data.

This paper describes the main constraints of inductive contactless wafer-level testing due to substrate parasitic effects, which are not addressed in [1–4]. Crosstalk between adjacent OCAs and the eddy currents generated in the silicon substrate are analyzed, and a pre-cutting of the wafer scribe line is adopted to drastically reduce eddy currents. Simulation results, obtained by varying the thickness and conductivity of the substrate, describe the behavior of the parasitic effects and guide the choice of design parameters for optimizing this contactless approach.

## 2 Main Parasitic Effects in Silicon Substrate

The design of contactless wafer-level testing faces challenging substrate parasitic effects, which shift the resonance frequency of the reader antenna (de-tuning) and reduce its $\omega QL$ product. Indeed, the magnetic flux generated by the reader antenna links the substrate, whose electrical conductivity is non-negligible, and induces eddy currents in it. These parasitic currents reduce the efficiency of the reader antenna.

Electromagnetic (EM) simulations were performed with Ansys HFSS using a $6.3 \times 6.3\ \text{mm}^2$ reader antenna (as in [2]) at different values of substrate resistivity, $\rho$. The resistivity ranged from $10\ \text{m}\Omega\,\text{cm}$ (i.e., a highly doped silicon wafer) to $1\ \text{k}\Omega\,\text{cm}$ (i.e., a high-purity silicon wafer), thus covering the main IC technologies. Figure 1a shows the contactless interface modeled in the HFSS environment.



Fig. 1 a Ansys HFSS snapshot of the reader antenna coupled to OCAs; b Resonance frequency vs. distance between reader antenna and wafer with a low (10 m $\Omega$ cm) and high (125  $\Omega$ cm) resistivity: experimental results (solid line) and simulations (dashed line)

The reader antenna resonance frequency, defined as the frequency at which the magnitude of the reflection coefficient is minimum, depends on the substrate resistivity and on the distance, $d$, between the antenna and the wafer. As shown in Fig. 1b, for wafers with a low substrate resistivity (i.e., $10\ \text{m}\Omega\,\text{cm}$), the resonance frequency of the reader antenna rises as $d$ is reduced. This is due to a reduction in the effective inductance of the reader antenna caused by eddy currents, especially at short distances where the magnetic coupling is stronger. Conversely, for wafers with a high substrate resistivity (i.e., $125\ \Omega\,\text{cm}$), the resonance frequency decreases as $d$ is reduced. This happens because the capacitive coupling between the reader antenna and the silicon substrate prevails over the magnetic one. In addition, at the maximum distance of 4 mm, the resonance frequency converges to a single value regardless of the substrate resistivity. This indicates that the parasitic effects no longer affect the resonance frequency. EM simulations and experimental results are in good agreement. The resonance frequency was obtained by measuring the S11 magnitude on two wafers of different conductivity, using the experimental setup described in [2].
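The upward shift at low resistivity can be illustrated with a textbook lumped model. This is an assumption, not the authors' EM model. Suppose the substrate behaves as a nearly ideal shorted loop magnetically coupled to the reader antenna with coefficient $k$. The antenna's effective inductance then becomes $L(1 - k^2)$, and the LC resonance $f_0 = 1/(2\pi\sqrt{LC})$ rises accordingly. The inductance, capacitance and $k$ values below are hypothetical, chosen only so that the unloaded resonance falls near 868 MHz. A larger $k$ stands in for a smaller distance $d$.

```python
import math


def resonance_frequency(l_h: float, c_f: float) -> float:
    """LC resonance frequency in Hz: f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_h * c_f))


def shorted_loop_inductance(l1_h: float, k: float) -> float:
    """Effective inductance of a coil coupled (coefficient k) to a lossless
    shorted loop, a crude stand-in for a very conductive substrate:
    L_eff = L1 * (1 - k**2)."""
    return l1_h * (1.0 - k * k)


L1 = 50e-9    # hypothetical reader-antenna inductance (H)
C = 0.68e-12  # hypothetical tuning capacitance (F)
f_free = resonance_frequency(L1, C)
for k in (0.0, 0.1, 0.2, 0.3):  # larger k ~ wafer closer to the antenna
    f = resonance_frequency(shorted_loop_inductance(L1, k), C)
    print(f"k = {k:.1f}: f0 = {f / 1e6:.1f} MHz ({(f / f_free - 1) * 100:+.2f} %)")
```

The model only covers the inductive (eddy-current) case. The downward shift seen on high-resistivity wafers would correspond to extra capacitance, which this sketch does not include.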

Figure 2a shows the resonance frequency versus substrate resistivity at various distances between the wafer and the reader antenna. Three ranges of substrate resistivity can be identified. For $\rho < 100\ \text{m}\Omega\,\text{cm}$ and for $\rho > 1\ \Omega\,\text{cm}$, the resonance frequency is almost constant at each distance. In the range $100\ \text{m}\Omega\,\text{cm} < \rho < 1\ \Omega\,\text{cm}$, instead, the resonance frequency varies rapidly, especially at small distances. When the distance exceeds 3 mm, the resonance frequency is independent of the substrate resistivity. Furthermore, at a substrate resistivity of about $300\ \text{m}\Omega\,\text{cm}$, the resonance frequency is quite stable for all the considered distances. In the first range (i.e., $\rho < 100\ \text{m}\Omega\,\text{cm}$), the inductive coupling between the reader antenna and the silicon substrate prevails over the capacitive coupling. In the third range (i.e., $\rho > 1\ \Omega\,\text{cm}$), the capacitive coupling predominates over the inductive one. In the second range (i.e., $100\ \text{m}\Omega\,\text{cm} < \rho < 1\ \Omega\,\text{cm}$), the two couplings counteract each other, balancing at a resistivity of about $300\ \text{m}\Omega\,\text{cm}$.

Fig. 2 a Reader antenna resonance frequency and b $\omega QL$ product versus substrate resistivity for various distances between wafer and reader antenna

As discussed, the reader antenna induces eddy currents inside the substrate, which affect the antenna performance. An expression for the eddy current $I_e$ and for the effect of substrate partitioning can be derived starting from [5], which considers a small coil inside an IC. In our case, however, the reader antenna irradiates many ICs on the wafer, adding many parasitic effects (due to seal rings, metal layers and circuits). The model can then be generalized to account for all the effects due to substrate conductivity and to the integrated circuits underneath the reader antenna. As reported in [2], the effect of the induced current on the magnetic field cannot be neglected. A solution to decrease eddy currents was implemented in [6]: trenches obtained in the scribe lines of the wafer cut the paths of the circular eddy currents. Figure 2b shows the $\omega QL$ product of the reader antenna versus wafer substrate resistivity for different distances. This parameter describes the loss effects on the reader antenna due to its coupling with a wafer substrate of non-negligible electrical conductivity. Unlike the de-tuning effects, the loss effects on the reader antenna remain significant even at a distance of 4 mm, since even a small substrate loss heavily degrades a high-quality-factor reader antenna. Moreover, the losses due to inductive and capacitive couplings add together, always lowering the reader antenna quality factor. However, at reduced distances (i.e., up to 1 mm), the losses due to inductive coupling (i.e., for $\rho = 10\ \text{m}\Omega\,\text{cm}$) are more severe than those due to capacitive coupling (i.e., for $\rho = 1\ \text{k}\Omega\,\text{cm}$). Thus, the worst case occurs for a substrate resistivity between 0.3 and $1\ \Omega\,\text{cm}$, where both coupling effects contribute losses.
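A lumped model (again an assumption, not the authors' model) shows why eddy-current loss can peak at an intermediate resistivity. Represent the substrate as one loop with inductance $L_2$ and a resistance $R_2$ that grows with $\rho$. Couple it to the reader antenna ($L_1$, $R_1$) through $M = k\sqrt{L_1 L_2}$. The reflected impedance $(\omega M)^2/(R_2 + j\omega L_2)$ then adds a resistance $\omega^2 M^2 R_2/(R_2^2 + \omega^2 L_2^2)$. This term vanishes for $R_2 \to 0$ and for $R_2 \to \infty$, and it peaks at $R_2 = \omega L_2$, where it equals $\omega k^2 L_1/2$. Only the inductive path is modelled; the capacitive coupling discussed above is not. All component values, and the 868 MHz operating frequency, are assumed for illustration.

```python
import math


def reader_input_impedance(omega: float, l1: float, r1: float,
                           l2: float, r2: float, k: float) -> complex:
    """Z_in = R1 + j*w*L1 + (w*M)**2 / (R2 + j*w*L2), with M = k*sqrt(L1*L2)."""
    m = k * math.sqrt(l1 * l2)
    return complex(r1, omega * l1) + (omega * m) ** 2 / complex(r2, omega * l2)


def quality_factor(z: complex) -> float:
    """Q = w*L_eff / R_eff = Im(Z) / Re(Z)."""
    return z.imag / z.real


OMEGA = 2 * math.pi * 868e6   # assumed operating frequency (rad/s)
L1, R1 = 50e-9, 1.0           # hypothetical reader antenna
L2, K = 10e-9, 0.2            # hypothetical substrate loop and coupling coefficient
for ratio in (0.01, 0.1, 1.0, 10.0, 100.0):  # R2 relative to w*L2 (grows with rho)
    r2 = ratio * OMEGA * L2
    z = reader_input_impedance(OMEGA, L1, R1, L2, r2, K)
    print(f"R2 = {ratio:>6} * wL2: R_refl = {z.real - R1:6.2f} ohm, Q = {quality_factor(z):6.1f}")
```

In this toy model, Q collapses only around $R_2 \approx \omega L_2$. A high-Q antenna ($Q \approx 270$ unloaded here) is very sensitive to even a few ohms of reflected resistance, which is consistent with the losses still visible at 4 mm.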



Fig. 3 a Four OCAs drawn in the HFSS environment to evaluate crosstalk between adjacent OCAs; b Magnitude of $S_{12}$ parameter versus silicon substrate resistivity for unsawn and sawed wafers

## 3 Crosstalk Between OCAs

Crosstalk is electromagnetic interference that can compromise the results of functional and parametric tests and, in the worst case, lead to the wrong classification of a die as good or faulty. As described below, the crosstalk among tag ICs depends on the substrate resistivity. EM simulations were performed using a 5-turn square OCA with a 5-$\mu$m-thick metal layer and a 0.45-$\mu$m<sup>2</sup> area, as described in [2]. The analysis considered both the unsawn wafer (i.e., a wafer without trenches or cuts) and the sawed wafer (i.e., a wafer with trenches or cuts). The results refer to an operating frequency of 868 MHz, which falls in the European band, as reported in the EPCglobal Class 1 Gen 2 standard [7]. With reference to Fig. 3a, only the $S_{12}$ element of the scattering matrix has been considered to evaluate the crosstalk between OCAs, since the coupling between the neighboring OCA1 and OCA2 represents the worst case.

Figure 3b depicts the $S_{12}$ parameter versus substrate resistivity for unsawn and sawed wafers. For the unsawn wafer, $S_{12}$ varies non-monotonically for substrate resistivity from $5\ \text{m}\Omega\,\text{cm}$ to $1\ \Omega\,\text{cm}$, with a peak of -48.5 dB at about $30\ \text{m}\Omega\,\text{cm}$. Beyond $1\ \Omega\,\text{cm}$, it stabilizes at about -51 dB. For the sawed wafer, instead, the cut removes the peak and $S_{12}$ always remains below -50.6 dB, because only one (magnetic) coupling mechanism remains, as predicted by the electrical model below. Figure 4 depicts an equivalent circuit model of two adjacent OCAs (i.e., OCA1 and OCA2 in Fig. 3a) used to analyze the crosstalk phenomena. In this model, $M$ is the magnetic coupling between the nominal inductances $L_1$ and $L_2$, and $M_P$ are the parasitic magnetic couplings of $L_1$ and $L_2$ with the parasitic inductances $L_{IM1}$ and $L_{IM2}$, respectively, due to eddy currents in the substrate. $C_{SUB}$ and $R_{SUB}$ are the substrate capacitance and resistance between OCA1 and OCA2. For the unsawn wafer, crosstalk between OCA1 and OCA2 can arise through the magnetic coupling, $M$, or through the substrate current, $I_X$. For the sawed wafer, crosstalk between OCA1 and OCA2 can arise only through the magnetic coupling, $M$, because $I_X = 0$.

Fig. 4 Equivalent circuit model for crosstalk analysis related to OCA1 and OCA2

For $\rho$ close to zero, the equivalent inductances $L_{eq1}$ and $L_{eq2}$ (i.e., $L_{eq1} = L_1 - M_P$ and $L_{eq2} = L_2 - M_P$) are close to zero, so $M$ is negligible. Moreover, $R_{SUB}$ is also close to zero; thus $I_X$ tends to zero and the crosstalk between OCA1 and OCA2 is ideally zero.

For low substrate resistivity (e.g., $5\ \text{m}\Omega\,\text{cm} < \rho < 0.03\ \Omega\,\text{cm}$ in Fig. 3b), the equivalent inductances $L_{eq1}$ and $L_{eq2}$ increase, thus increasing $M$. Moreover, in the unsawn wafer, $R_{SUB}$ increases, enhancing $I_X$, and the crosstalk between OCA1 and OCA2 rises.

At medium substrate resistivity, when $M_P$ is negligible (i.e., $L_{eq1} \cong L_1$ and $L_{eq2} \cong L_2$), $M$ reaches its maximum value. In particular, for the unsawn wafer, $R_{SUB}$ and $R_X$ increase until $I_X$ is maximized, which explains the crosstalk peak at about $\rho = 0.03\ \Omega\,\text{cm}$ in Fig. 3b.

For medium/high substrate resistivity (e.g., $0.03\ \Omega\,\text{cm} < \rho < 1\ \Omega\,\text{cm}$), $M$ is at its maximum value, as in the previous case. Moreover, for the unsawn wafer, $I_{CSUB}$ increases because $R_{SUB}$ and $R_X$ are high, thus reducing $I_X$ and the crosstalk between OCA1 and OCA2. Finally, for very high substrate resistivity (e.g., $\rho > 1\ \Omega\,\text{cm}$), the contribution of $I_X$ to crosstalk becomes negligible, so the crosstalk converges to the same value for both sawed and unsawn wafers.
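The $S_{12}$ values of Fig. 3b can be expressed in linear terms. S-parameters in dB are $20\log_{10}|S_{12}|$, so the coupled power fraction is $|S_{12}|^2 = 10^{\text{dB}/10}$. The short sketch below applies only this conversion to the numbers quoted above, with nothing else assumed. It shows that the 2.5 dB gap between the unsawn-wafer peak (-48.5 dB) and the high-resistivity plateau (-51 dB) corresponds to about 1.78 times more coupled power.

```python
def db_to_power_ratio(db: float) -> float:
    """|S12|^2 from S12 in dB (20*log10|S12| == 10*log10|S12|^2)."""
    return 10 ** (db / 10)


s12_db = {"unsawn peak": -48.5, "sawed max": -50.6, "plateau": -51.0}
for label, db in s12_db.items():
    print(f"{label:12s}: {db:6.1f} dB -> |S12|^2 = {db_to_power_ratio(db):.2e}")

ratio = db_to_power_ratio(s12_db["unsawn peak"]) / db_to_power_ratio(s12_db["plateau"])
print(f"peak / plateau coupled power: {ratio:.2f}")  # 1.78
```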

Figure 5a shows the measurement setup, while Fig. 5b shows the IC input voltage versus substrate resistivity for various trench thicknesses in the sawed substrate, at a distance of 1 mm. For a completely sawed substrate (i.e., $th = 750\ \mu\text{m}$), the induced input voltage is always higher than the threshold voltage (i.e., 500 mV). For a partially sawed wafer, a substrate resistivity of at least $100\ \text{m}\Omega\,\text{cm}$ is required.



**Fig. 5** a Snapshot of measurement setup; **b** Induced input voltage versus substrate resistivity at various trench thickness, *th* ( $\mu$ m), in the sawed substrate and at the distance of 1 mm

## 4 Conclusions

Contactless wafer-level testing exploits simultaneous inductive coupling to transfer energy and exchange data between a tester antenna and many integrated OCAs. The main parasitic effects due to substrate resistivity were analyzed, as well as the crosstalk between adjacent OCAs. Moreover, a pre-cutting of the wafer scribe lines, adopted to drastically reduce eddy currents, was analyzed. EM simulations, varying the thickness and conductivity of the substrate, describe the behavior of the parasitic effects and guide the choice of design parameters for optimizing this contactless approach.

## References

1. Finocchiaro, A., et al.: A 900-MHz RFID system with TAG-antenna magnetically-coupled to the die. In: IEEE Radio Frequency Integrated Circuits Symposium, Atlanta, GA, pp. 281–284 (2008)
2. Finocchiaro, A., et al.: A fully contactless wafer-level testing for UHF RFID tag with on-chip antenna. In: 13th International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS), Taormina, Italy, pp. 1–6 (2018)
3. Moore, B., et al.: High throughput non-contact SiP testing. In: IEEE International Test Conference, Santa Clara, CA, pp. 1–10 (2007)
4. Hsu, H.-M., Chang, J.-Z.: Mutual coupling of on-chip inductors in CMOS technology. J. Micromech. Microeng. 18(3) (2008)
5. Zhang, F., Kinget, P.R.: Design of components and circuits underneath integrated inductors. IEEE J. Solid-State Circuits 41(10), 2265–2271 (2006)
6. Pagani, A., Girlando, G., Ziglioli, F.G., Finocchiaro, A.: IC with insulating trench and related methods. US Patent 9 887 165 (2018)
7. EPCglobal Standard Specification GS1. Available from: https://www.gs1.org/sites/default/files/docs/epc/uhfc1g2\_1\_1\_0-standard-20071017.pdf

## Study of Low-Dose Long-Exposure Gamma Radiation Effects on InP DBR Cavity Lasers from Generic Integration Technology



F. Gambini, N. Andriolli, V. Nurra, M. Chiesa, F. Petroni and S. Faralli

**Abstract** The electro-optical performance of InP distributed Bragg reflector cavity lasers has been studied after Gamma irradiation at different total doses, up to 50 krad. Results demonstrate a wavelength shift of the emitted peak without significant optical loss.

**Keywords** Photonic integrated circuits (PIC)  $\cdot$  Integrated optics  $\cdot$  Gamma radiation  $\cdot$  Indium phosphide

## 1 Introduction

The next era of satellite communications will require constellations of highly efficient mini-satellites with reduced mass and size, to provide better coverage at lower cost [1]. Hence, new technologies must be studied and implemented to reduce the size, weight and power consumption of the components while improving the communication speed. Recently, free-space laser communications (lasercom) have undergone intensive development to overcome the limitations of RF-based wireless communications. In this scenario, photonic integration technologies reduce the mass, volume and power consumption of the components thanks to their extremely compact footprint and power efficiency, while achieving the highest communication performance. Nowadays, III–V compound platforms are the only integration technology that supports electronic and photonic functionalities, providing both passive and active components. Integration also enables high-volume manufacturing, which would reduce fabrication costs [2–4]. However, the space radiation environment can limit the performance and lifetime of the circuits. It is therefore necessary to study the effects of space radiation on photonic integrated circuits (PICs) to ensure the reliability of lasercom systems. Studies on particle and ionizing radiation effects on light-emitting diodes (LEDs) and laser diodes have been performed since 1970, and the results demonstrated a relation between damage, particle type and energy [5–8]. These works highlighted a degradation in the emitted optical power and threshold current of LEDs and laser diodes.

F. Gambini · N. Andriolli · M. Chiesa · S. Faralli (🖂)

Scuola Superiore Sant'Anna, Via Moruzzi 1, 56124 Pisa, Italy e-mail: stefano.faralli@santannapisa.it

V. Nurra · F. Petroni SITAEL S.p.A, Via Livornese 1019, 56122 Pisa, Italy

© Springer Nature Switzerland AG 2019 S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_10

Table 1 Irradiation testing parameters

| PIC name | Irradiation period (hh:mm:ss) | Total dose (krad) | Absorbed dose (Gy) |
|----------|-------------------------------|-------------------|--------------------|
| CHIP 001 | 18:12:05                      | 6                 | 60.07              |
| CHIP 002 | 24:20:08                      | 20                | 80.07              |
| CHIP 003 | 68:30:03                      | 50                | 225.30             |

The evolution of fabrication processes and material composition of optoelectronic devices has improved the efficiency of emitters and allowed the emergence of generic integration technologies. These serve a variety of applications thanks to a standardized fabrication process and predefined design libraries [4]. Nevertheless, further investigations must be carried out to understand the effect of radiation on the newer devices, especially for long exposure times and low total doses, identified as the most detrimental scenario [9, 10]. Gamma rays are the shortest-wavelength and highest-energy form of electromagnetic radiation. This radiation can penetrate materials and alter them by ionizing the atoms it interacts with. This paper investigates the effect of low-dose Gamma radiation with a long exposure time on InP sampled grating distributed Bragg reflector (DBR) cavity lasers from generic integration technology. Gamma irradiation of the DBR lasers at different total doses has been studied using a cobalt-60 Gamma source, as recommended by the European Space Agency (ESA) basic specification for the irradiation test method [11]. As reported in Table 1, irradiation with total doses between 6 and 50 krad has been performed to monitor the level of radiation damage of devices intended for low Earth orbit satellites [12].
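The irradiation periods and absorbed doses in Table 1 imply the average dose rate of each exposure. The sketch below simply divides each absorbed dose by its period. The values are copied from Table 1, and nothing else is assumed.

```python
from datetime import timedelta
from typing import List, Tuple

# (PIC name, irradiation period hh:mm:ss, absorbed dose in Gy), from Table 1
TABLE_1: List[Tuple[str, str, float]] = [
    ("CHIP 001", "18:12:05", 60.07),
    ("CHIP 002", "24:20:08", 80.07),
    ("CHIP 003", "68:30:03", 225.30),
]


def parse_period(hms: str) -> timedelta:
    """Parse an 'hh:mm:ss' period; hours may exceed 24."""
    h, m, s = (int(part) for part in hms.split(":"))
    return timedelta(hours=h, minutes=m, seconds=s)


def dose_rates(rows: List[Tuple[str, str, float]]) -> List[Tuple[str, float]]:
    """Average absorbed-dose rate in Gy/h for each irradiated chip."""
    return [(name, dose / (parse_period(period).total_seconds() / 3600))
            for name, period, dose in rows]


for name, rate in dose_rates(TABLE_1):
    print(f"{name}: {rate:.2f} Gy/h")
```

All three chips come out at roughly 3.3 Gy/h. The different absorbed doses therefore result from different exposure times rather than different dose rates.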

## 2 The Fabricated Device

Figure 1 shows a photograph of one of the fabricated PICs. The circuit consists of a monolithically integrated tunable laser providing a continuous-wave source in the C-band. The DBR laser can be divided into four sections: a front mirror (30 $\mu$m long), a gain section (600 $\mu$m), a phase shifter (125 $\mu$m) and a rear mirror (250 $\mu$m). The mirrors are DBR gratings with a grating pitch of 237.52 nm. They allow the emitted wavelength peak to be controlled by varying the injected electrical current, while the phase shifter slightly varies the optical length of the cavity. The laser output passes through a $2 \times 2$ multi-mode interference coupler (adding an optical loss of 3 dB) before reaching the spot size converter (SSC). The SSC widens the optical mode to improve coupling to the tapered fiber (spot size diameter of 3 $\mu$m) used to collect the light from the chip. Reflections are minimized by an anti-reflective coating on the facet and a 7° angled SSC. The devices were designed using the building blocks provided in a process design kit [4] and fabricated by Oclaro Technology plc, UK, in a multi-project wafer run of the PARADIGM generic integration platform [3].
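As a rough consistency check on the grating pitch, the first-order Bragg condition $\lambda_B = 2 n_{eff} \Lambda$ links the pitch $\Lambda$ to the reflected wavelength. The minimal sketch below assumes an effective index of about 3.26 for the InP waveguide, a value that is not given in the text, and only shows that a 237.52 nm pitch places the Bragg peak in the C-band (about 1530–1565 nm).

```python
# First-order Bragg condition: lambda_B = 2 * n_eff * pitch / order.
# N_EFF_ASSUMED is a hypothetical effective index for an InP waveguide;
# it is NOT reported in the paper.
GRATING_PITCH_NM = 237.52  # DBR grating pitch from the device description
N_EFF_ASSUMED = 3.26


def bragg_wavelength_nm(pitch_nm: float, n_eff: float, order: int = 1) -> float:
    """Return the Bragg wavelength (nm) for a grating of given pitch and order."""
    return 2.0 * n_eff * pitch_nm / order


def required_n_eff(target_nm: float, pitch_nm: float) -> float:
    """Effective index that places the first-order Bragg peak at target_nm."""
    return target_nm / (2.0 * pitch_nm)


if __name__ == "__main__":
    lam = bragg_wavelength_nm(GRATING_PITCH_NM, N_EFF_ASSUMED)
    print(f"Bragg wavelength with assumed n_eff: {lam:.1f} nm")
    print(f"n_eff needed for 1550 nm: {required_n_eff(1550.0, GRATING_PITCH_NM):.3f}")
```

Because the effective index is an assumption, the second function is arguably the more informative one: it gives the index that a 1550 nm peak would require (about 3.26).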

## 3 The Gamma Radiation Test

Prior to irradiation, the emitted optical power, the spectrum, and the current-voltage characteristic of the lasers were measured using the setup shown in Fig. 2. An electrical source meter provides a DC driving current to the gain section of the device under test (DUT) through the gold electrical pads, while the voltage is recorded. No current was injected into the rear and front mirror sections or into the phase shifter. The emitted optical power and the spectra are recorded by a power meter (synchronized with the source meter through a computer) and an optical spectrum analyzer (OSA), respectively. During the measurements, the temperature of the DUT was controlled by a TEC controller and set to 22 °C, to improve test repeatability by maintaining the same operating conditions. Three samples were measured and then irradiated with total doses of 6, 20 and 50 krad, respectively, using a 1.25 MeV cobalt-60 ($^{60}$Co) radioisotope source at room temperature (24 °C). The three bare chips were placed 2.5 m from the $^{60}$Co source. Table 1 reports the irradiation parameters (period and dose) for the three PICs. No thermal annealing was performed after irradiation.

## 4 Results

Fig. 2 Experimental setup for electro-optical performance testing. OSA: Optical Spectrum Analyzer, DUT: Device Under Test

Fig. 3 Measured emitted optical power as a function of the injected current for the three samples before (dashed lines) and after (continuous lines) different total irradiation doses: **a** 6 krad, **b** 20 krad and **c** 50 krad. In the insets the lasing thresholds are reported

Figure 3 reports the emitted optical power of the three PICs under test as a function of the current injected into the gain section (PI curve), before (dashed curves) and after (continuous curves) Gamma irradiation. The variation of the optical output power is negligible, indicating that the low-dose, long-exposure Gamma radiation applied here does not affect this parameter. The insets also show that the lasing threshold does not vary, remaining at 16 mA in all cases. Figure 4 shows the laser emission spectra of each of the three chips at an operating current of 100 mA, before (dashed lines) and after (continuous lines) irradiation. Shifts of the emission wavelength peak are clearly visible: 0.171 nm, 0.473 nm, and 0.881 nm for total doses of 6 krad (red), 20 krad (green), and 50 krad (blue), respectively. Figure 5 shows the correlation between the total radiation dose and the shift of the emission peak for gain-section currents of 80 mA, 90 mA and 100 mA. All curves show a similar red shift that increases linearly with the dose. This behavior can be ascribed to the interaction between the ionizing radiation and the semiconductor, which produces electron-hole pairs; the electrons propagate in the material and generate secondary electron cascades [13]. This changes the refractive index of the DBR gratings and thus causes the observed wavelength shift.
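The linear trend can be checked against the three peak shifts quoted for 100 mA (0.171, 0.473 and 0.881 nm at 6, 20 and 50 krad). The sketch below fits an ordinary least-squares line to those values and, using the Bragg relation $\Delta\lambda/\lambda \approx \Delta n_{eff}/n_{eff}$ (neglecting waveguide dispersion), converts the largest shift into an effective-index change. The 1550 nm operating wavelength and $n_{eff} = 3.26$ are assumptions, not values reported in the paper, so the index figure is an order-of-magnitude estimate only.

```python
# Least-squares fit of peak wavelength shift vs. total dose (100 mA values
# quoted in the text), plus a rough effective-index estimate.
DOSE_KRAD = [6.0, 20.0, 50.0]
SHIFT_NM = [0.171, 0.473, 0.881]


def linear_fit(x, y):
    """Return (slope, intercept, r_squared) of an ordinary least-squares line."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def index_change(shift_nm, wavelength_nm=1550.0, n_eff=3.26):
    """Delta n_eff from Delta lambda / lambda = Delta n_eff / n_eff.

    wavelength_nm and n_eff are ASSUMED values, not taken from the paper;
    dispersion (group index vs. effective index) is ignored.
    """
    return n_eff * shift_nm / wavelength_nm


if __name__ == "__main__":
    slope, intercept, r2 = linear_fit(DOSE_KRAD, SHIFT_NM)
    print(f"slope = {slope * 1000:.1f} pm/krad, intercept = {intercept:.3f} nm, R^2 = {r2:.3f}")
    print(f"Delta n_eff at 50 krad ~ {index_change(SHIFT_NM[-1]):.1e}")
```

With only three doses, the fitted slope (roughly 16 pm/krad) should be read as indicative rather than as a calibrated dose coefficient.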



Fig. 5 Wavelength shift as a

function of the total radiation dose for 80 (red), 90 (green)

and 100 mA (blue) of

injected current




## 5 Conclusions

This work presents an experimental study of the effect of Gamma radiation on InP DBR lasers fabricated on a generic integration platform. The electrical and optical performance of the devices was evaluated for total doses up to 50 krad and compared with measurements taken before irradiation. The total emitted power and the lasing threshold are not affected by the irradiation. However, a shift of the emission wavelength peak is observed, showing a linear correlation with the total dose. This effect can be attributed to the variation of the refractive index of the DBR gratings.

## References

- 1. Busch, S., et al.: UWE-3, in-orbit performance and lessons learned of a modular and flexible satellite bus for future pico-satellite formations. Acta Astronaut. **117**, 73 (2015)
- 2. Heck, M.J.R., et al.: Hybrid silicon photonic integrated circuit technology. IEEE J. Sel. Top. Quantum Electron. **19**(4) (2013)
- 3. http://paradigm.jeppix.eu
- 4. Smit, M., et al.: An introduction to InP-based generic integration technology. Semicond. Sci. Technol. **29**(8) (2014)
- 5. Compton, D.M.J., Cesena, R.A.: Mechanisms of radiation effects on lasers. IEEE Trans. Nucl. Sci. **14**, 55 (1967)
- 6. Barnes, C.E.: Radiation effects in electroluminescent diodes. IEEE Trans. Nucl. Sci. **18**, 322 (1971)
- 7. Hum, R.H., Barry, A.L.: Radiation damage constants of light-emitting diodes by a low-current evaluation method. IEEE Trans. Nucl. Sci. **22**, 2482 (1975)
- 8. Epstein, A.S., Trimmer, P.A.: Radiation damage and annealing effects in photon coupled isolators. IEEE Trans. Nucl. Sci. **20**, 391 (1972)
- 9. Johnston, A.H., et al.: Low dose rate effects in shallow trench isolation regions. IEEE Trans. Nucl. Sci. **57**(6), 3279 (2010)
- 10. Witczak, S.C., et al.: Dose-rate sensitivity of modern nMOSFETs. IEEE Trans. Nucl. Sci. **52**(6), 2602 (2005)
- 11. ESCC Basic Specification No. 22900 Issue 5, Total dose steady-state irradiation test method. https://escies.org
- 12. Suparta, W., Zulkeple, S.K.: Investigating space radiation environment effects on communication of Razaksat-1. J. Aerosp. Technol. Manag. **10**, e2218 (2018). https://doi.org/10.5028/jatm.v10.815
- 13. Phifer, C.C.: Effects of radiation on laser diodes. Report Lasers, Optics, Plasma Sciences, Vision Science and Remote Sensing Department, Sandia National Laboratories, p. 15 (2004)

## Technological Advances Towards 4H-SiC JBS Diodes for Wind Power Applications



Jonas Buettner, Tobias Erlbacher and Anton Bauer

Abstract Carefully designed 4H-SiC Junction Barrier Schottky (JBS) diodes can provide the low on-state losses and surge current ruggedness required of freewheeling diodes in wind turbine generators. Ion implantation is a crucial process step for the performance of such JBS diodes. To better understand the influence of the implantation on the forward characteristics, JBS and Schottky diodes were fabricated and characterized, and the measurement data were compared with TCAD models. Monte Carlo simulations were used to accurately model the implantation, including lateral straggling and channeling. The simulations show that the actual junction barrier spacing in the manufactured devices is about 1 $\mu$m smaller than the intended spacing. Schottky region pinch-off, which occurs at spacings of less than 3 $\mu$m, must be avoided.

J. Buettner (🖂) · T. Erlbacher · A. Bauer

Fraunhofer Institute for Integrated Systems and Device Technology IISB, Schottkystr. 10, 91058 Erlangen, Germany

e-mail: jonas.buettner@iisb.fraunhofer.de

T. Erlbacher e-mail: tobias.erlbacher@iisb.fraunhofer.de

A. Bauer e-mail: anton.bauer@iisb.fraunhofer.de

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), Applications in Electronics Pervading Industry, Environment and Society, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_11

## 1 Introduction

Doubly fed induction generators (DFIGs) are widely used in large, variable-speed wind turbines because they allow the rotor to operate over a wide range of frequencies instead of being fixed to a single operating frequency [1]. To provide this ability, the DFIG is equipped with a four-quadrant converter that adjusts the amplitude and frequency of the rotor current so that power is fed into the grid at 50 Hz. However, during a grid fault, e.g. a sudden voltage drop, the rotor windings induce a surge current through the converter, as depicted in Fig. 1. Conventionally, crowbars are used to protect the converter from extended periods of overcurrent: they short-circuit the rotor windings in the case of a surge current event. One major drawback of this crowbar configuration is the complete shutdown of power generation after the fault. Ideally, to be efficient and to require little maintenance, the converter should be able to ride through temporary grid faults, so that the wind turbine can immediately return to normal operation once the disturbance subsides. Therefore, active protection inside the converter is being investigated. High-voltage freewheeling diodes with high ruggedness against surge currents and low conduction losses are part of this configuration [2].

Fig. 1 A wind turbine setup with a doubly fed induction generator featuring crowbars to protect the converter from surge current events in the rotor due to grid faults

Fig. 2 Schematic cross-section of a JBS diode with implanted $p^+$-junction barrier. The cut-out shows the region covered by the TCAD model. $W_p$ is the $p^+$-region width and $W_s$ is the Schottky width

4H-SiC Junction Barrier Schottky (JBS) diodes are capable of high current densities at low forward bias owing to their unipolar operating mode, while an implanted junction barrier shields the Schottky contact from high electric fields under reverse bias. A schematic cross-section of a JBS diode is shown in Fig. 2. In the event of a surge current, minority carriers are injected from the junction barrier into the epitaxial layer, producing conductivity modulation. This mechanism provides high ruggedness against overcurrents, allowing up to 20 times the nominal current without damaging the device.


The junction barrier is formed by a regular striped pattern of $p^+$-doped regions at the surface of the epitaxial layer. These $p^+$-regions occupy part of the active area of the device, reducing the Schottky area available for carrier transport and thus increasing the on-state voltage drop. When designing the diode, the ratio $W_s/W_p$, where $W_s$ is the Schottky contact width and $W_p$ the width of the $p^+$-region (Fig. 2), can be used to tune the electrical properties of the device: a high ratio reduces the on-state resistance during normal operation, while a lower ratio improves surge current ruggedness.

Since the p<sup>+</sup>-barrier is formed by implanting aluminium ions into the nitrogen-doped epitaxial layer, the doping profile, and thus the actual spacing of the barrier, depends on several factors. These include lateral straggling and channeling of the implanted ions, as well as the accuracy of the lithography during fabrication. These influences are difficult to model analytically. A previous study used TCAD to show the dependence of the forward current density on the barrier spacing [3], but it used abrupt, rectangular analytical implant profiles and did not take into account lateral straggling, which arises from the scattering of the implanted ions by lattice atoms [4]. In this work, the geometry of the simulation model accounts for these manufacturing-dependent parameters and is compared with experimental data from fabricated devices.

## 2 Fabrication and Simulation

We fabricated high-voltage JBS diodes with 2, 3 and 4 $\mu$m wide Schottky contacts. The p<sup>+</sup>-region width was chosen to be 2 $\mu$m, so the respective ratios W<sub>s</sub>/W<sub>p</sub> were 1.0, 1.5 and 2.0. The diodes were fabricated on 100 mm 4H-SiC wafers with a 45 $\mu$m thick n-doped epitaxial layer with a surface doping concentration of $1.5 \times 10^{15}$ cm<sup>-3</sup>. The junction barrier was formed by Al ion implantation through a resist mask patterned by photolithography. The implantation parameters were chosen to yield a $5 \times 10^{19}$ cm<sup>-3</sup> box profile with a total dose of $2.6 \times 10^{15}$ cm<sup>-2</sup>, and the implantation was performed with a 7° tilt. Ni silicidation was used to form the backside Ohmic contact. The anode contact metallization was 100 nm thick titanium annealed at 450 °C for 30 min. For reference and for calibration of the metal work function in the simulation, we also fabricated Schottky barrier diodes (SBDs) on the same wafers. The diodes were processed analogously to the fabrication previously outlined for similar diodes [5].
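The implantation parameters above fix the depth of the intended box profile: for a uniform concentration $N_{box}$ over a depth $d$, the areal dose is $\Phi = N_{box} d$, so $d = \Phi / N_{box}$. The short sketch below applies this to the stated dose and concentration (ignoring the straggling and channeling that the Monte Carlo simulations capture) and also lists the nominal $W_s/W_p$ ratios together with the Schottky fraction $W_s/(W_s + W_p)$ of one cell pitch.

```python
DOSE_CM2 = 2.6e15          # total Al dose, cm^-2 (from the text)
BOX_CONC_CM3 = 5e19        # target box concentration, cm^-3 (from the text)
W_P_UM = 2.0               # nominal p+ width, um
W_S_UM = [2.0, 3.0, 4.0]   # nominal Schottky widths, um


def box_depth_nm(dose_cm2: float, conc_cm3: float) -> float:
    """Depth of an ideal box profile that carries the given areal dose."""
    return dose_cm2 / conc_cm3 * 1e7  # convert cm to nm


def layout_metrics(w_s: float, w_p: float):
    """Return (W_s/W_p ratio, Schottky fraction of one stripe pitch)."""
    return w_s / w_p, w_s / (w_s + w_p)


if __name__ == "__main__":
    print(f"ideal box depth: {box_depth_nm(DOSE_CM2, BOX_CONC_CM3):.0f} nm")
    for w_s in W_S_UM:
        ratio, frac = layout_metrics(w_s, W_P_UM)
        print(f"W_s = {w_s} um: ratio = {ratio:.1f}, Schottky fraction = {frac:.2f}")
```

The resulting depth of about 520 nm is only the depth of the idealized box; the simulated profile extends deeper because of channeling.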

All diodes were modeled with Synopsys Sentaurus. The model uses Monte Carlo process simulation of the implantation profiles to accurately account for the implantation parameters as well as lateral straggling and channeling. Manufacturing tolerances, such as overexposure during lithography and resist shrinkage due to ion bombardment during implantation, widen the p<sup>+</sup>-regions and shorten the Schottky contact. These deviations from the intended dimensions are taken into account by increasing $W_p$ and reducing $W_s$ by equal amounts, keeping the sum of the two parameters constant.

## 3 Results and Discussion

The Monte Carlo simulated 2D doping concentration profile of a 2 $\mu$m wide aluminium implantation is shown in Fig. 3. The metallurgical pn-junction, i.e. where the aluminium concentration equals the doping concentration of the epitaxial layer, is indicated by the thin black line. The vertical 1D profile along line B in the center of the structure is plotted on the left. Compared with an analytical approximation, channeling of the ions is evident from a depth of 700 nm downwards. The horizontal doping profile just beneath the surface along line A is shown below the Al concentration map and exhibits significant lateral straggling: the junction is located about 250 nm outside the implantation window on each side. Straggling is largest at a depth of about 300 nm, where the p<sup>+</sup>-region is close to 800 nm wider than intended. This is significant for diodes with a nominal Schottky width of only a few micrometers.
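Why a modest straggle moves the junction so far can be illustrated with a simple analytical approximation, which is not the Monte Carlo method used in this work. Assume the lateral profile near a mask edge is a step of height $N_0$ convolved with a Gaussian of standard deviation $\sigma_L$, so $N(x) = \tfrac{N_0}{2}\,\mathrm{erfc}\!\left(x/(\sqrt{2}\,\sigma_L)\right)$ with $x$ measured outward from the edge; the junction sits where $N(x) = N_D$. Because $N_0/N_D$ exceeds four orders of magnitude, the junction lands roughly four standard deviations outside the edge. $\sigma_L$ is a free, hypothetical parameter, and the model ignores the depth dependence of straggling and channeling.

```python
import math

N0_CM3 = 5e19    # box concentration from the implant design
ND_CM3 = 1.5e15  # epitaxial-layer doping


def erfc_inverse(y, lo=0.0, hi=10.0, tol=1e-12):
    """Invert math.erfc on [lo, hi] by bisection (erfc is decreasing there)."""
    if not (math.erfc(hi) < y <= math.erfc(lo)):
        raise ValueError("y outside the invertible range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.erfc(mid) > y:
            lo = mid  # value still too large: root lies further out
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lateral_profile(x_nm, sigma_nm, n0=N0_CM3):
    """Al concentration at distance x_nm outside the mask edge (Gaussian edge model)."""
    return 0.5 * n0 * math.erfc(x_nm / (math.sqrt(2.0) * sigma_nm))


def junction_offset_nm(sigma_nm, n0=N0_CM3, nd=ND_CM3):
    """Distance outside the mask edge where lateral_profile equals nd."""
    return math.sqrt(2.0) * sigma_nm * erfc_inverse(2.0 * nd / n0)


def sigma_for_offset(offset_nm, n0=N0_CM3, nd=ND_CM3):
    """Lateral straggle that would place the junction offset_nm outside the edge."""
    return offset_nm / junction_offset_nm(1.0, n0, nd)


if __name__ == "__main__":
    print(f"offset per unit sigma: {junction_offset_nm(1.0):.2f}")
    print(f"sigma for a 250 nm offset: {sigma_for_offset(250.0):.0f} nm")
```

Within this crude model, the roughly 250 nm surface offset corresponds to a lateral straggle of about 60 nm.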

Fig. 3 The simulated concentration map of the implanted aluminium ions. A Horizontal doping profile below the Schottky contact. B Analytical and simulated profile in vertical direction along the center of the structure

Fig. 4 Measurement (symbols) and simulated (lines) J-V data of SBD and JBS diodes with junction barrier spacing of 2 $\mu$m, 3 $\mu$m and 4 $\mu$m

To obtain a good match between measurement and simulation, the $p^+$-width had to be set 500 nm larger than the nominal width, and the Schottky contact was shortened by the same amount. A ratio $W_s/W_p = 2.0$ is thus reduced to 1.4 in the actual device. In combination with lateral straggling, this results in an effective reduction of the Schottky region by about 1 $\mu$m.
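The width bookkeeping described above can be reproduced in a few lines. The sketch assumes, as the text implies, that the 500 nm widening is moved from $W_s$ to $W_p$ at constant pitch, and that the roughly 250 nm junction offset observed near the surface eats into the Schottky stripe from both neighbouring $p^+$ edges; the function names are illustrative only.

```python
W_P_NOMINAL_UM = 2.0
W_S_NOMINAL_UM = [2.0, 3.0, 4.0]
LITHO_WIDENING_UM = 0.5      # p+ widening needed to match the measurements
STRAGGLE_PER_SIDE_UM = 0.25  # junction offset beyond each mask edge (surface)


def adjusted_widths(w_s, w_p=W_P_NOMINAL_UM, widen=LITHO_WIDENING_UM):
    """Move `widen` from the Schottky stripe to the p+ stripe, keeping the pitch."""
    return w_s - widen, w_p + widen


def effective_schottky_um(w_s, widen=LITHO_WIDENING_UM, straggle=STRAGGLE_PER_SIDE_UM):
    """Schottky width left after widening and straggling from both p+ edges."""
    return w_s - widen - 2.0 * straggle


if __name__ == "__main__":
    for w_s in W_S_NOMINAL_UM:
        ws_adj, wp_adj = adjusted_widths(w_s)
        print(f"nominal W_s = {w_s}: ratio {w_s / W_P_NOMINAL_UM:.1f} -> "
              f"{ws_adj / wp_adj:.1f}, effective W_s = {effective_schottky_um(w_s):.1f} um")
```

For the 4 $\mu$m design this gives the ratio of 1.4 quoted above and an effective Schottky width of about 3 $\mu$m; the 2 $\mu$m design is left with only about 1 $\mu$m.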

Representative forward current density-voltage data of the SBD and of each JBS diode variant are plotted in Fig. 4, together with curves from TCAD simulations. The Schottky metal work function was calibrated using the data obtained from the SBD. Excellent agreement between measurement and simulation for this diode was achieved with a value of 4.31 eV, which agrees well with published data for titanium. A Schottky barrier height of about 1.05 eV is extracted from the conduction band energy ($E_c$) diagram in Fig. 5. The simulated curves for the JBS diodes agree very well with the measured data, except for the diodes with spacing $W_s = 3 \,\mu$m, whose measured forward voltage drop is higher than the simulated one. Diffusion of carriers across the pn-junction might cause this discrepancy, and further refinements of the physical models used in the simulation are needed to capture this increased voltage drop.
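To see what a 1.05 eV barrier implies for the forward characteristic, the ideal thermionic-emission model $J = A^* T^2 e^{-\phi_B/kT}\left(e^{V/(n\,kT/q)} - 1\right)$ can be inverted for the junction voltage. This is a textbook sketch, not the TCAD model used here: the Richardson constant $A^* = 146$ A cm$^{-2}$ K$^{-2}$ commonly quoted for 4H-SiC, unit ideality, 300 K and zero series resistance are all assumptions, so the result covers only the barrier contribution and omits the drift-layer voltage drop.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K
A_STAR = 146.0           # ASSUMED Richardson constant for 4H-SiC, A cm^-2 K^-2
PHI_B_EV = 1.05          # barrier height extracted in the text


def saturation_current(phi_b_ev=PHI_B_EV, temp_k=300.0, a_star=A_STAR):
    """Thermionic-emission saturation current density J_s in A/cm^2."""
    return a_star * temp_k ** 2 * math.exp(-phi_b_ev / (K_B_EV * temp_k))


def forward_voltage(j_a_cm2, phi_b_ev=PHI_B_EV, temp_k=300.0, ideality=1.0):
    """Barrier voltage for current density j (A/cm^2), no series resistance."""
    vt = K_B_EV * temp_k  # thermal voltage in volts
    j_s = saturation_current(phi_b_ev, temp_k)
    return ideality * vt * math.log(j_a_cm2 / j_s + 1.0)


if __name__ == "__main__":
    print(f"J_s = {saturation_current():.2e} A/cm^2")
    print(f"V(100 A/cm^2) = {forward_voltage(100.0):.3f} V")
```

Under these assumptions the barrier alone accounts for roughly 0.75 V at 100 A/cm$^2$; the measured voltage additionally includes the resistive drop in the 45 $\mu$m epitaxial layer.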

The 2D map of $E_c$ for a JBS diode with nominal spacing $W_s = 2 \mu m$ (left of Fig. 5) shows that the space charge region, indicated by the thin white lines, fully pinches off the Schottky contact. The two p<sup>+</sup>-regions create a potential barrier for electrons at a depth of about 600 nm. This barrier is about 0.8 eV higher than the Schottky barrier and causes the increased voltage drop of diodes with this spacing. The SBD and the JBS diodes with $W_s = 4 \mu m$ have very similar conduction band energy profiles; the small increase in forward voltage drop seen in Fig. 4 is due to the reduced Schottky area available for current transport.



Fig. 5 On the left, a 2D map shows the conduction band energy at 0 V for a JBS diode with  $W_s = 2 \mu m$ . The graph on the right plots the profiles of the conduction band energy along the horizontal center for each diode (line A)

The decrease in Schottky spacing due to manufacturing is expected to benefit the surge current properties of these diodes. Knowledge of the actual $p^+$-width and of the lateral straggling is critical for the accuracy of the model in the high-current regime. Measurements under such conditions will reveal the influence of the junction barrier on device ruggedness and the applicability of the simulation model at high currents.

## 4 Conclusion

Wind turbine generators need freewheeling diodes with low conduction losses and surge current ruggedness. We fabricated SBD and JBS diodes with differently patterned junction barriers to investigate the effects of manufacturing on the electrical properties. TCAD simulations of these diodes focused on accurately modelling the implantation of the p<sup>+</sup>-regions. The measured data show nearly ideal Schottky diodes with a Schottky barrier of 1.05 eV. Lateral straggling is evident in the Monte Carlo simulated doping profiles, and manufacturing tolerances further reduce the junction barrier spacing; together, these effects reduce the Schottky contact width by about 1 $\mu$m. The forward current density decreases with smaller Schottky regions. At spacings of less than 3 $\mu$m, the p<sup>+</sup>-regions start to pinch off the Schottky region, causing an increased forward voltage drop. Further investigation of the surge current ruggedness is necessary to fully assess the impact of the junction barrier design and manufacturing parameters.

Acknowledgements This work was supported by the SPEED (Silicon Carbide Power Technology for Energy Efficient Devices) project funded by the European Commission under FP7 grant no. 604057.

## References

- 1. Seman, S., Niiranen, J., Virtanen, R., et al.: Low voltage ride-through analysis of 2 MW DFIG wind turbine - grid code compliance validations. Energy Society General Meeting, Pittsburgh, PA, USA (2008)
- 2. Huang, X., Wang, G., Lee, M.-C., et al.: Reliability of 4H-SiC SBD/JBS diodes under repetitive surge current stress. In: 2012 IEEE Energy Conversion Congress and Exposition (ECCE), Raleigh, NC, USA (2012)
- 3. Huang, Y., Wachutka, G.: Comparative study of contact topographies of 4.5 kV SiC MPS diodes for optimizing the forward characteristics. In: 2016 International Conference on Simulation of Semiconductor Processes and Devices (SISPAD), Nuremberg, Germany (2016)
- 4. Jiang, Y.F., Baliga, B.J., Huang, A.Q.: Influence of lateral straggling of implanted aluminum ions on high voltage 4H-SiC device edge termination design. Mater. Sci. Forum **924**, 361–364 (2018)
- 5. Schoeck, J., Buettner, J., Rommel, M., et al.: 4.5 kV SiC junction barrier Schottky diodes with low leakage current and high forward current density. Mater. Sci. Forum **897**, 427–430 (2017)

# Part IV Sensors and Transducers

## A Scalable 2D, Low Power Airflow Probe for Unmanned Vehicle and WSN Applications



Paolo Bruschi, Andrea Ria and Massimo Piotto

**Abstract** A compact anemometer, capable of detecting the magnitude and direction of the wind in a plane, is presented. The device constitutes an evolution of a class of sensors that exploit a recently proposed original approach, involving fluidic processing of the pressures induced around a cylinder. A significant size reduction with respect to previous prototypes has been achieved by the use of a tiny differential pressure sensor based on a MEMS System on a Chip. Preliminary characterization performed in a wind tunnel is presented.

P. Bruschi (🖂) · A. Ria · M. Piotto

Università di Pisa, Dept. Ingegneria Dell'Informazione, via Caruso 16, 56122 Pisa, Italy e-mail: paolo.bruschi@unipi.it

A. Ria e-mail: andrea.ria@ing.unipi.it

M. Piotto e-mail: massimo.piotto@unipi.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), Applications in Electronics Pervading Industry, Environment and Society, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_12

## 1 Introduction

Measurement of wind velocity and direction is traditionally required in many application fields, including meteorology and aviation. Recently, the development of ICT and, in particular, the evolution of wireless sensor networks (WSNs) have stimulated new applications in different disciplines. In agriculture, WSNs equipped with directional anemometers are useful for optimizing the application of agrochemicals [1], improving the living conditions of animals in livestock and poultry facilities [2], and controlling the greenhouse microclimate [3]. Distributed monitoring of wind in urban areas is required for precise assessment of wind resources [4, 5] and for monitoring the spread of pollutants [6]. In this field, small unmanned aerial vehicles (UAVs) equipped with directional anemometers can be an effective means of measuring fluctuating flows within urban environments with good spatial resolution [7, 8]. Directional anemometers mounted on mobile robots have been proposed for gas-tracking applications [9, 10] and for indoor dead-reckoning localization [11]. Most of these new applications require miniaturized devices with low power consumption and no moving parts. Ultrasonic anemometers are currently the solution that best satisfies these requirements [12, 13]. However, they are expensive, and their power consumption is not negligible compared with that of small UAVs or robots. Solutions based on multi-hole pressure probes [14, 15] usually require complex calibration and bulky data acquisition equipment [7]. In 2009, Bruschi et al. [16] demonstrated that a compact 2D airflow probe can be obtained by sampling the pressure around a small cylinder through holes connected to only two MEMS flow sensors. This method has since been used to fabricate low-power, low-cost directional anemometers [8, 17, 18].

Fig. 1 a Representation of the cylinder cross-section exposed to a wind of velocity w. A reference diameter forming an angle $\theta$ with the wind direction is shown. b Typical behavior of the differential pressure $\Delta P$ measured across the diameter as a function of $\theta$

In this work, a further evolution of the device proposed in [17] is presented, which combines an improved fluidic section with a versatile System on a Chip (SoC) that includes both flow-sensitive devices and an ultra-low-noise, low-power analog front end.

## 2 Principle of Operation and Device Description

Let us consider a cylinder exposed to a wind of velocity **w**, perpendicular to the cylinder axis, as shown by the cross section in Fig. 1a. The differential pressure  $(\Delta P)$  developed across a diameter depends on the angle  $\theta$  between the diameter and the wind direction according to the behavior shown in Fig. 1b. Such a dependence is not suitable for simple estimation of the wind direction and magnitude [16].

The solution proposed in [16] is illustrated in Fig. 2a. A symmetrical configuration of diameters, placed at angular distances $\phi_i$ from the symmetry axis, replaces the single diameter of Fig. 1a. The approach consists of combining the pressures $\Delta P_i$ developed across the various diameters to form a global differential pressure given by:

$$\Delta P_X = \sum_{i=-N}^{N} a_i \Delta P_i, \tag{1}$$

where $a_i$ are suitable weights. In [16], it was shown for the first time that, with only three diameters placed at angles $-40^\circ$, $0^\circ$ and $+40^\circ$ and uniform weights, the global pressure follows a cosine law surprisingly well across a wide wind velocity range. As a result, two identical orthogonal arrangements of diameters give two differential pressures, $\Delta P_X$ and $\Delta P_Y$, with the following dependence on the wind direction ($\theta$) and velocity (*u*):

Fig. 2 Schematic representation of a symmetrical set of diameters (a), whose differential pressures are combined to form a quantity that depends on the wind direction according to a cosine law (b)

$$\begin{cases} \Delta P_X = H(u)\cos(\theta) \\ \Delta P_Y = H(u)\sin(\theta) \end{cases} \tag{2}$$

where $H(u)$ is a monotonic function of the wind velocity. In [16] it was also shown that the calculation in (1) can be performed in the fluidic domain (i.e. using only a fluidic structure), avoiding the need to acquire a relatively large number of pressures. The approach is shown in Fig. 2b: two cavities $C_1$ and $C_2$ are created inside the cylinder, and micro-channels connect the cavities to the outside air exactly at the points where the diameters meet the outer surface. By modulating the channel lengths, it is possible to change the weights $a_i$ in (1). The first complete 2D anemometer based on this principle was presented in [17], where an improved 5-diameter configuration was combined with a 2-channel MEMS flow sensor used as an ultra-sensitive differential pressure sensor.

A theoretical explanation of the principle was recently developed [19], introducing a systematic approach to find the optimal combinations of weights  $(a_i)$  and angles  $(\phi_i)$ .

Fig. 3 Simplified exploded view (left) and photograph (right) of the proposed anemometer

The anemometer described in this work uses a seven-diameter configuration, with equal spacing between diameters (22.5°) and weights proportional to $\cos(\phi_i)$ (non-uniform weights). The final configuration of the device is shown in the exploded view and photograph of Fig. 3. The conversion from airflow to the differential pressures $\Delta P_X$ and $\Delta P_Y$ is performed by a 2 cm cylindrical fluidic probe, formed by stacking two orthogonal sections (X and Y) into which the cavity/channel structure described earlier has been carved with a computer-controlled milling machine. The probe is made of polymethyl methacrylate (PMMA). The cross-sections of the channels that connect cavities C<sub>1</sub> and C<sub>2</sub> to the outside air are 1 × 1 mm<sup>2</sup> for the X section and $0.5 \times 0.5 \text{ mm}^2$ for the Y section. This difference was introduced to check the role of channel size in sensor accuracy and sensitivity [19]. The two differential pressures are read by an ultra-sensitive 2-channel MEMS sensor.
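One way to see how channel lengths set the weights in (1) is to treat each cavity as a node of a hydraulic resistor network. The sketch below rests on assumptions that are ours, not the paper's: the MEMS sensor draws negligible flow, flow in the micro-channels is laminar so a channel of fixed cross-section has conductance $G_i \propto 1/L_i$, and both ends of diameter $i$ are connected to $C_1$ and $C_2$ through identical channels. Zero net flow into a cavity then makes its pressure the conductance-weighted average $\sum G_i P_i / \sum G_i$, so $a_i = G_i/\sum G$ and the relative lengths follow as $L_i \propto 1/\cos\phi_i$ for the seven-diameter, 22.5° layout.

```python
import math

SPACING_DEG = 22.5
N_SIDE = 3  # diameters at -3..+3 times the spacing (seven in total)


def diameter_angles_deg(n_side=N_SIDE, spacing=SPACING_DEG):
    """Angles phi_i of the symmetric diameter set, in degrees."""
    return [i * spacing for i in range(-n_side, n_side + 1)]


def cosine_weights(angles_deg):
    """Weights a_i proportional to cos(phi_i), normalized to sum to 1."""
    raw = [math.cos(math.radians(a)) for a in angles_deg]
    total = sum(raw)
    return [r / total for r in raw]


def relative_channel_lengths(weights):
    """Lengths relative to the shortest channel, ASSUMING G_i ~ 1/L_i."""
    w_max = max(weights)
    return [w_max / w for w in weights]


def cavity_pressure(port_pressures, weights):
    """Conductance-weighted average pressure of a cavity with no net outflow."""
    return sum(p * w for p, w in zip(port_pressures, weights))


if __name__ == "__main__":
    angles = diameter_angles_deg()
    w = cosine_weights(angles)
    for phi, wi, li in zip(angles, w, relative_channel_lengths(w)):
        print(f"phi = {phi:+6.1f} deg  a_i = {wi:.3f}  L_i/L_min = {li:.2f}")
```

In this idealized picture the outermost channels ($\pm 67.5^\circ$) would be about 2.6 times longer than the central one; entrance effects and the sensor's own flow would perturb these ratios in a real probe.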

This kind of MEMS sensor detects the differential pressure by measuring the flow induced in a micro-channel of known hydraulic conductance [20]. Several differential thermal flow sensors are integrated into a single MEMS SoC. To expose the flow sensors to independent flows, the packaging approach shown in Fig. 4 (left) was used [21, 22]. Briefly, a PMMA air conveyor, whose front face is provided with trenches and sized to fit within the chip pad frame, is coated with a sealant (silicone glue) and applied to the chip surface. In this way, the trenches form flow channels, each containing a single sensing structure. Channels carved into the PMMA conveyor give access to both ends of each trench from the opposite, purposely enlarged, surface of the conveyor.

The SoC was designed and fabricated with the STMicroelectronics BCD6s process, completed with post-processing procedures (selective anisotropic etching) [23]. The architecture of the SoC is shown in Fig. 4 (right). The chip includes three flow-sensing structures, $S_1$–$S_3$ (only $S_2$ and $S_3$ are used in this work). A 4-way analog multiplexer (mux) selects one of the structures and connects it to an analog front end formed by a low-noise, low-offset chopper amplifier (gain = 200) and a programmable heater driver. The analog signal is read by a purpose-built printed circuit board (PCB) equipped with an MSP430-i2041 microcontroller (Texas Instruments) and a serial-to-USB converter for connection to a personal computer. The total power consumption of the SoC is only about 6 mW.



Fig. 4 Schematic description of the 2-channel differential pressure sensor

## 3 Experimental Results

The anemometer, mounted on a rotating goniometer, was placed inside a wind tunnel formed by a 1 m long pipe of 12 cm diameter equipped with a controllable fan. Figure 5 (left) shows the pressures $\Delta P_X$ and $\Delta P_Y$ measured by the MEMS sensor as a function of the angle. Good agreement with the cosine and sine fits can be observed. The main result, confirmed by the data at different velocities, is that the smaller channel cross-section of the Y section reduces sensitivity without improving accuracy (see the standard deviations). This seems to rule out the hypothesis, introduced in [19], that the larger-than-theoretical errors measured in actual devices are due to channel non-idealities such as finite cross-section and reduced aspect ratio.
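A cosine fit of the kind shown in Fig. 5 can be done as a linear least-squares problem by fitting $\Delta P(\theta) \approx a\cos\theta + b\sin\theta$, from which the amplitude $\sqrt{a^2+b^2}$ and a phase offset follow. The sketch below is one possible implementation; the paper does not state how its fits or its relative deviations were computed, so normalizing the RMS residual by the fitted amplitude is an assumed definition. The data in the usage example are synthetic.

```python
import math


def fit_cosine(theta_deg, dp):
    """Least-squares fit of dp ~ a*cos(theta) + b*sin(theta).

    Returns (amplitude, phase_deg, relative_rms), where relative_rms is the
    RMS residual divided by the fitted amplitude (an ASSUMED definition).
    """
    c = [math.cos(math.radians(t)) for t in theta_deg]
    s = [math.sin(math.radians(t)) for t in theta_deg]
    # Normal equations: [scc scs; scs sss] [a; b] = [scy; ssy]
    scc = sum(ci * ci for ci in c)
    sss = sum(si * si for si in s)
    scs = sum(ci * si for ci, si in zip(c, s))
    scy = sum(ci * yi for ci, yi in zip(c, dp))
    ssy = sum(si * yi for si, yi in zip(s, dp))
    det = scc * sss - scs * scs
    a = (scy * sss - ssy * scs) / det
    b = (ssy * scc - scy * scs) / det
    amplitude = math.hypot(a, b)
    residuals = [yi - (a * ci + b * si) for yi, ci, si in zip(dp, c, s)]
    rms = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return amplitude, math.degrees(math.atan2(b, a)), rms / amplitude


if __name__ == "__main__":
    # Synthetic example: a pure cosine of amplitude 3 shifted by 20 degrees.
    angles = list(range(0, 360, 15))
    data = [3.0 * math.cos(math.radians(t - 20.0)) for t in angles]
    print(fit_cosine(angles, data))
```

The ratio of the amplitudes fitted for the X and Y sections is also a natural way to obtain the sensitivity-equalization factor mentioned next.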

Figure 5 (right) shows the angle measured by the proposed anemometer as a function of the actual wind direction. The angle was determined by inverting (2) after equalizing the $\Delta P_Y$ pressure to compensate for its reduced sensitivity. The maximum angular error is $\pm 10^{\circ}$, considerably higher than the theoretical value predicted in [19] for the configuration used in this paper ($\pm 1^{\circ}$). Further work is required to investigate the cause of this residual inaccuracy. Nevertheless, the accuracy of the proposed device meets the specifications of many applications, such as WSNs for environmental monitoring and autonomous robots for gas source finding.
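Inverting (2) amounts to a four-quadrant arctangent: $\theta = \operatorname{atan2}(\Delta P_Y, \Delta P_X)$ and $H(u) = \sqrt{\Delta P_X^2 + \Delta P_Y^2}$. The sketch below adds a gain on $\Delta P_Y$ to mimic the sensitivity equalization described above; the gain value would come from calibration (for instance, the ratio of fitted X and Y amplitudes) and is a hypothetical parameter here. Converting $H$ into wind speed would additionally require a calibration of the monotonic function $H(u)$, which is not modeled.

```python
import math


def estimate_direction(dp_x, dp_y, y_gain=1.0):
    """Invert Eq. (2): return (theta_deg in [0, 360), H) from the two pressures.

    y_gain rescales dp_y to equalize the Y-section sensitivity; its value is
    a calibration constant (hypothetical here).
    """
    dp_y_eq = dp_y * y_gain
    theta = math.degrees(math.atan2(dp_y_eq, dp_x)) % 360.0
    return theta, math.hypot(dp_x, dp_y_eq)


def angular_error(estimated_deg, true_deg):
    """Signed difference wrapped to [-180, 180) degrees."""
    return (estimated_deg - true_deg + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    # Synthetic reading: H = 2, theta = 130 deg, Y section 1.6x less sensitive.
    dp_x = 2.0 * math.cos(math.radians(130.0))
    dp_y = 2.0 * math.sin(math.radians(130.0)) / 1.6
    print(estimate_direction(dp_x, dp_y, y_gain=1.6))
```

Wrapping the error as in `angular_error` avoids spurious 360° jumps when comparing estimated and true directions near 0°.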



**Fig. 5** Left: measured differential pressures produced by sections X and Y as a function of the wind direction ($\theta$) for a wind velocity of 2 m/s, with cosine and sine fits; the relative deviations from the fits (standard deviations $\sigma$) are indicated for each curve. Right: estimated angle as a function of the actual wind direction; the ideal curve is shown by the solid line

## References

1. Pajares, G., Peruzzi, A., Gonzalez-de-Santos, P.: Sensors in agriculture and forestry. Sensors 13, 12132–12139 (2013)
2. Gao, Y., Ramirez, B.C., Hoff, S.J.: Omnidirectional thermal anemometer for low airspeed and multi-point measurement applications. Comput. Electron. Agric. 127, 439–450 (2016)
3. López, A., Valera, D.L., Molina-Aiz, F., Peña, A.: Thermography and sonic anemometry to analyze air heaters in Mediterranean greenhouses. Sensors 12, 13852–13870 (2012)
4. Murthy, K.S.R., Rahi, O.P.: A comprehensive review of wind resource assessment. Renew. Sustain. Energy Rev. 72, 1320–1342 (2017)
5. Karthikeya, B.R., Negi, P.S., Srikanth, N.: Wind resource assessment for urban renewable energy application in Singapore. Renew. Energy 87, 403–414 (2016)
6. Dobre, A., Arnold, S.J., Smalley, R.J., Boddy, J.W.D., Barlow, J.F., Tomlin, A.S., Belcher, S.E.: Flow field measurements in the proximity of an urban intersection in London, UK. Atmos. Environ. 39, 4647–4657 (2005)
7. Prudden, S., Fisher, A., Marino, M., Mohamed, A., Watkins, S., Wild, G.: Measuring wind with Small Unmanned Aircraft Systems. J. Wind Eng. Ind. Aerodyn. 176, 197–210 (2018)
8. Bruschi, P., Piotto, M., Dell'Agnello, F., Ware, J., Roy, N.: Wind speed and direction detection by means of solid-state anemometers embedded on small quadcopters. Procedia Eng. 168, 802–805 (2016)
9. Fukazawa, Y., Ishida, H.: Estimating gas-source location in outdoor environment using mobile robot equipped with gas sensors and anemometer. In: Proceedings of IEEE Sensors 2009, pp. 1721–1724 (2009)
10. Martínez, D., Clotet, E., Tresanchez, M., Moreno, J., Jiménez-Soto, J.M., Magrans, R., Palacín, J.: First characterization results obtained in a wind tunnel designed for indoor gas source detection. In: Proceedings of Advanced Robotics (ICAR), pp. 629–634 (2015)
11. Seo, W., Baek, K.R.: Indoor dead reckoning localization using ultrasonic anemometer with IMU. J. Sens. 2017, 1–12 (2017)
12. Han, D., Kim, S., Park, S.: Two-dimensional ultrasonic anemometer using the directivity angle of an ultrasonic sensor. Microelectron. J. 39, 1195–1199 (2008)
13. Lopes, G.M.G., da Silva Junior, D.P., de França, J.A., de Morais França, M.B., de Souza Ribeiro, L., Moreira, M., Elias, P.: Development of 3-D ultrasonic anemometer with nonorthogonal geometry for the determination of high-intensity winds. IEEE Trans. Instrum. Meas. 66, 2836–2844 (2017)
14. Bryer, D.W., Pankhurst, R.C.: Pressure-probe methods for determining wind speed and flow direction, pp. 41–74. Campfield Press, St Albans, UK (1971)
15. Hall, B.F., Povey, T.: The Oxford Probe: an open access five-hole probe for aerodynamic measurements. Meas. Sci. Technol. 28(035004), 1–12 (2017)
16. Bruschi, P., Dei, M., Piotto, M.: A low-power 2-D wind sensor based on integrated flow meters. IEEE Sens. J. 9, 1688–1696 (2009)
17. Piotto, M., Pennelli, G., Bruschi, P.: Fabrication and characterization of a directional anemometer based on a single chip MEMS flow sensor. Microelectron. Eng. 88, 2214–2217 (2011)
18. Liu, C., Du, L., Zhao, Z.: A directional cylindrical anemometer with four sets of differential pressure sensors. Rev. Sci. Instrum. 87(035105), 1–8 (2016)
19. Bruschi, P., Piotto, M.: Determination of the wind speed and direction by means of fluidic-domain signal processing. IEEE Sens. J. 18, 985–994 (2018)
20. Piotto, M., Del Cesta, S., Bruschi, P.: A compact, dual channel flow-based differential pressure sensor with mPa resolution and sub-10 mW power consumption. Procedia Eng. 168, 757–761 (2016)
21. Bruschi, P., Nurra, V., Piotto, M.: A compact package for integrated silicon thermal gas flow meters. Microsyst. Technol. 14, 943–949 (2008)
22. Bruschi, P., Dei, M., Piotto, M.: A single chip, double channel thermal flow meter. Microsyst. Technol. 15, 1179–1186 (2009)
23. Piotto, M., Del Cesta, S., Bruschi, P.: Precise measurement of gas volumes by means of low-offset MEMS flow sensors with μl/min resolution. Sensors 17(2497), 1–13 (2017)

# **Electronics System for Velocity Profile Emulation**



Dario Russo, Valentino Meacci and Stefano Ricci

D. Russo (🖂) · V. Meacci · S. Ricci
Information Engineering Department, University of Florence, 50123 Florence, Italy
e-mail: dario.russo@unifi.it

V. Meacci e-mail: valentino.meacci@unifi.it

S. Ricci e-mail: stefano.ricci@unifi.it

© Springer Nature Switzerland AG 2019
S. Saponara and A. De Gloria (eds.), Applications in Electronics Pervading Industry, Environment and Society, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_13

Abstract The possibility of detecting the velocity profile of a fluid flowing in an industrial pipe is of great importance for several applications, such as the accurate measurement of the volume flow or the rheological characterization of the fluid. Industrial embedded systems that detect the velocity profile in pipes through the Pulsed Ultrasound Velocimetry (PUV) method, based on Doppler ultrasound, have recently been presented. The development, test and characterization of these systems currently rely on flow-rigs, i.e. hydraulic systems in which a known fluid is pushed by a pump through a pipe circuit. Unfortunately, flow-rigs are cumbersome and produce velocity profiles whose features are not perfectly known. In this work, an electronic system that mimics the echo signal produced by a flow-rig is presented. The characteristics of the emulated profile, such as signal-to-noise ratio, shape and velocity, are fully programmable and perfectly known, thus enabling a complete and reliable evaluation of the performance of the PUV system under test.

## 1 Introduction

Ultrasound is widely employed in industry to monitor the fluids and suspensions involved in production processes [1]. Recently, ultrasound systems have become available that characterize fluids in-line and in real time, without the intervention of operators, by measuring the velocity profile that a flow develops when moving in a pipe [2]. These systems are typically based on the Pulsed Ultrasound Velocimetry (PUV) technique [3, 4], in which the velocity profile is detected through Doppler ultrasound in a way similar to medical echo-Doppler [5].

A Doppler PUV system must undergo several tests during development, production and maintenance. Unfortunately, a comprehensive test is possible only by connecting the system to a cumbersome flow-rig, i.e. a hydraulic system in which a known fluid is driven by a pump through a pipe circuit. Moreover, the flow velocity profile, which should serve as the ground truth in the test, is only partially known, limiting the accuracy of the test.

In this work, a simple and flexible electronic system that emulates the Radio Frequency (RF) echo signal generated by a flow-rig is presented for the first time. Not only can the cumbersome flow-rig be avoided but, since the echo signals are synthesized with known and programmable features, the performance of the PUV Doppler system under test can also be effectively evaluated. The features of the profile emulator are demonstrated in experiments in which it is connected to a PUV system previously developed by our research group for rheological fluid characterization [6, 7].

## 2 The Profile Emulator

Fig. 1 A fluid flowing in a pipe is investigated with an angled ultrasound beam

### 2.1 Ultrasound Signal Features and PUV Basic Operations

In a typical PUV application the fluid flowing in a pipe is investigated by an ultrasound beam angled by  $\theta$  with respect to the fluid velocity (see Fig. 1).

The transducer periodically emits ultrasound bursts, one every Pulse Repetition Interval (PRI); each burst typically consists of 3–10 sinusoidal cycles at a frequency $f_t$ ranging from hundreds of kHz to tens of MHz. The fluid particles, moving at velocity v, produce echoes whose frequency is shifted by the Doppler effect:

$$f_D = 2f_t \frac{v}{c} \cos(\theta) \tag{1}$$

where c is the speed of sound in the fluid. Figure 2 shows an example of the RF signal received in one PRI from a fluid moving in an 8 mm pipe, investigated with a 7 MHz burst. The two strong echoes visible at 6 and 17 μs are reflections from the static pipe walls, while the weak signal between them is the fluid echo, affected by the Doppler shift. The Doppler shift, and thus the velocity, is detected by a spectral analysis of the signal sampled at the same depth, i.e. at the same time offset from the beginning of each PRI, over successive PRIs. The power spectra obtained at the different depths, arranged as the rows of a power spectral matrix, give an intuitive representation of the flow profile; an example is reported in the experimental results (Fig. 4, left). Segmentation of this image yields the velocity profile (see, e.g., Fig. 4, right).

Fig. 2 Example of RF signal from a fluid in an 8 mm pipe investigated at 7 MHz

**Table 1** Features of the proposed system

| Parameter              | Value          |
|------------------------|----------------|
| Output voltage         | Up to 400 mVpp |
| Frequency range        | 0.1–10 MHz     |
| PRI range              | 0.1–10 ms      |
| Sampling frequency     | 50 Msps        |
| Resolution             | 14 bit         |
| SDRAM size             | 64 MB          |
| Flash size             | 128 MB         |
| Emulated pipe diameter | Up to 15 cm    |
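The spectral analysis described above can be sketched in Python as follows. The sketch assumes the received RF signal has already been quadrature-demodulated to complex baseband samples, arranged as a matrix with one row per PRI and one column per depth; for each depth it applies a Hann window and an FFT along the PRI (slow-time) axis, then averages the power spectra of consecutive frames. The 64-PRI frame length mirrors the experiment of Sect. 3; the window, the function name `spectral_matrix` and the per-depth peak picking (a crude stand-in for the image segmentation mentioned above) are assumptions.

```python
import numpy as np

def spectral_matrix(iq, frame_len=64):
    """Frame-averaged power spectral matrix from slow-time data.

    iq: complex baseband samples with shape (n_pri, n_depth); row k holds PRI k,
        column d holds the samples taken at the same depth (same time offset).
    Returns (power, freqs): power[d] is the averaged power spectrum at depth d,
    and freqs are Doppler frequencies normalized to 1/PRI, in [-0.5, 0.5).
    """
    n_frames = iq.shape[0] // frame_len
    window = np.hanning(frame_len)[:, None]          # taper along the PRI axis
    power = np.zeros((iq.shape[1], frame_len))
    for f in range(n_frames):
        frame = iq[f * frame_len:(f + 1) * frame_len] * window
        spectrum = np.fft.fftshift(np.fft.fft(frame, axis=0), axes=0)
        power += (np.abs(spectrum) ** 2).T            # one row per depth
    freqs = np.fft.fftshift(np.fft.fftfreq(frame_len))
    return power / n_frames, freqs

# Synthetic noise-free example: parabolic profile with a 0.34/PRI peak Doppler.
n_pri, n_depth = 1024, 32
r = (np.arange(n_depth) - (n_depth - 1) / 2) / (n_depth / 2)   # radius in (-1, 1)
f_true = 0.34 * (1 - r ** 2)
k = np.arange(n_pri)[:, None]
iq = np.exp(2j * np.pi * f_true[None, :] * k)

power, freqs = spectral_matrix(iq)
f_est = freqs[np.argmax(power, axis=1)]   # crude per-depth Doppler estimate
```

Inverting (1), each normalized estimate converts to a velocity by multiplying it by $(1/\text{PRI})\,c/(2 f_t \cos\theta)$; the frequency resolution of one frame is $1/(64\,\text{PRI})$.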

### 2.2 The Electronics System

The proposed system is essentially a signal synthesizer that produces, for every PRI, the complex ultrasound echo signal generated by the fluid scatterers, like that of Fig. 2. Its output signal, processed by the PUV system under test, should result in the desired spectral matrix and velocity profile, like those shown in Fig. 4. The main characteristics of the system are listed in Table 1.

The system, whose architecture is shown in Fig. 3, is based on the EP3C25F256 Field Programmable Gate Array (FPGA) of the Cyclone family from Altera-Intel (San Jose, CA, USA). The FPGA is connected to a 64 MB MT48H32M16 SDRAM (Micron Technology, Boise, USA), to a Flash memory (Altera-Intel) where the signal samples are stored, and to an AD9707 (Analog Devices, Norwood, MA, USA) 14-bit Digital-to-Analog (DA) converter. The DA converter is followed by an analog section that amplifies the signal up to 400 mVpp over a 0.1–10 MHz bandwidth. The internal operation sequence is managed by a NIOS II® soft processor included in the FPGA. At power-on, the soft processor moves the signal samples from the slow serial flash memory to the SDRAM to allow fast playback during signal generation. Then, for each PRI, the samples of the PRI to be produced transit through a FIFO memory (see Fig. 3). The PUV system generates the *PRI sync* signal, which starts the PRI generation synchronously with the PUV system. This is a key point, since any jitter between successive PRIs is interpreted by the PUV system as a phase rotation of the signal, which generates Doppler artifacts. For this reason, the system can resynchronize its internal clock to the external reference *CLK sync* to reduce jitter.

Fig. 3 Architecture of the proposed system
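The sensitivity to PRI jitter can be quantified with a back-of-the-envelope calculation: a timing error Δt at the start of a PRI delays the whole echo by Δt, i.e. rotates its phase by $2\pi f_t \Delta t$, which the PUV system cannot distinguish from scatterer motion. If the error grew by Δt at every PRI, it would mimic a Doppler shift of $f_t \Delta t/\text{PRI}$. The sketch uses the 5 MHz carrier and 0.2 ms PRI of the experiments (Table 2); the 1 ns jitter value and the function names are hypothetical.

```python
import math

def jitter_phase_deg(jitter_s, f_t_hz):
    """Phase rotation (degrees) of an f_t carrier delayed by jitter_s seconds."""
    return math.degrees(2 * math.pi * f_t_hz * jitter_s)

def apparent_doppler_hz(jitter_step_s, f_t_hz, pri_s):
    """Spurious Doppler shift if the timing error grows by jitter_step_s every PRI."""
    return f_t_hz * jitter_step_s / pri_s

f_t, pri = 5e6, 0.2e-3      # carrier and PRI from the experiments
jitter = 1e-9               # hypothetical 1 ns timing error
print(f"phase rotation: {jitter_phase_deg(jitter, f_t):.2f} deg")
print(f"apparent Doppler: {apparent_doppler_hz(jitter, f_t, pri):.1f} Hz")
```

For comparison, the peak Doppler shift in the experiments is 1689 Hz, so a systematic 1 ns per PRI drift would already bias it by about 1.5%.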

The 128 MB flash memory can store thousands of PRIs, the exact number depending on the diameter of the pipe to be emulated. Emulating a pipe with a diameter of d [m] requires N samples, according to:

$$N = \frac{2 \cdot d}{1500 \text{ m/s}} \cdot 50 \text{ MHz} \tag{2}$$

where 1500 m/s is the speed of sound in water and 50 MHz is the sampling frequency. For instance, the signal from a pipe with a 2 cm diameter lasts about 28 μs and requires 1400 samples at 50 MHz. In this case, the 128 MB flash memory holds more than 45 k PRIs.
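Equation (2) and the memory figures can be checked with a short calculation. Storing each 14-bit sample in a 16-bit word and reading 128 MB as $128 \cdot 2^{20}$ bytes are assumptions, since the text does not specify the storage format. Under them, (2) gives about 1333 samples for a 2 cm pipe (the 1400 samples quoted above leave some margin), and both figures fit more than 45 k PRIs in the flash.

```python
import math

C_WATER = 1500.0           # m/s, sound velocity used in (2)
FS = 50e6                  # Hz, DAC sampling frequency
BYTES_PER_SAMPLE = 2       # assumption: 14-bit samples stored in 16-bit words
FLASH_BYTES = 128 * 2**20  # assumption: 128 MB = 128 * 2^20 bytes

def samples_per_pri(d_m):
    """Number of samples N needed to emulate a pipe of diameter d_m, as in (2)."""
    return 2 * d_m / C_WATER * FS

def pri_capacity(n_samples, mem_bytes=FLASH_BYTES):
    """How many PRIs of n_samples each fit in mem_bytes."""
    return mem_bytes // (math.ceil(n_samples) * BYTES_PER_SAMPLE)

for d in (0.02, 0.15):     # 2 cm example and the 15 cm maximum of Table 1
    n = samples_per_pri(d)
    print(f"d = {d * 100:.0f} cm: N = {n:.0f} samples, {pri_capacity(n)} PRIs")
print(f"1400 samples/PRI: {pri_capacity(1400)} PRIs")
```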

The signal produced by the profile emulator is generated off-line with the specialized ultrasound simulation software Field II [8, 9], freely available at http://field-ii.dk, and stored in the flash. Field II works as an extension of Matlab (The MathWorks, Natick, MA) and is widely used in biomedical ultrasound research. Given the geometrical and electrical features of the transducer, the samples of the transmission signal, the static configuration of the scatterers in the field of view of the transducer, the desired SNR, etc., Field II generates an accurate simulation of the RF signal received from the mimicked configuration. A Doppler simulation of a flow is obtained by updating the scatterer configuration between successive PRIs according to a predefined flow velocity field. The fluid velocity distribution can be preset (e.g. parabolic) or, for complex non-Newtonian fluids or pipe geometries, calculated with other dedicated external CAD tools. An example of this technique, applied to a biomedical application, is reported in [10].
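The flow-simulation step (moving the scatterers between successive PRIs according to a predefined velocity field) can be sketched as follows. This is not Field II code: the sketch only computes the scatterer positions, which would then be passed to the simulator PRI by PRI. The axial profile $v(r) = v_{max}(1 - (|r|/R)^n)$ is an assumed family: n = 2 gives the parabolic profile mentioned above, while larger n gives blunter profiles, used here only as a hypothetical stand-in for a "smashed" profile. Scatterers leaving the simulated pipe segment re-enter from the other end.

```python
import numpy as np

def axial_velocity(r, radius, v_max, n=2.0):
    """Axial velocity at radial distance r; n=2 is parabolic, larger n is blunter."""
    return v_max * (1.0 - (np.abs(r) / radius) ** n)

def advance_scatterers(pos, radius, v_max, pri, length, n=2.0):
    """Move scatterers (rows of x, y, z; x along the pipe axis) by v(r) * PRI.

    Positions are wrapped into [0, length) so the scatterer density stays constant.
    """
    r = np.hypot(pos[:, 1], pos[:, 2])
    out = pos.copy()
    out[:, 0] = (pos[:, 0] + axial_velocity(r, radius, v_max, n) * pri) % length
    return out

def random_scatterers(count, radius, length, rng):
    """Uniformly distributed scatterers inside a cylinder of given radius and length."""
    r = radius * np.sqrt(rng.random(count))     # sqrt gives uniform density over the area
    phi = 2 * np.pi * rng.random(count)
    x = length * rng.random(count)
    return np.column_stack((x, r * np.cos(phi), r * np.sin(phi)))

rng = np.random.default_rng(1)
radius, length, v_max, pri = 8e-3, 10e-3, 0.5, 0.2e-3   # 16 mm pipe, 0.5 m/s, 0.2 ms
pos = random_scatterers(5000, radius, length, rng)
for _ in range(3):                                       # three PRIs
    pos = advance_scatterers(pos, radius, v_max, pri, length)
```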

## 3 Experiments and Results

Some of the features of the profile emulator are demonstrated in the following experiment, whose parameters are summarized in Table 2. The signal generator is connected to the PUV system described in [7]. A "smashed" profile, typical of most non-Newtonian industrial fluids, is generated with +10 dB and −20 dB SNR. The mimicked fluid flows in a 16 mm diameter pipe at 0.5 m/s peak velocity and is investigated with a 7 mm diameter circular transducer excited at 5 MHz. Through (1), the peak velocity corresponds to a Doppler frequency of 1689 Hz, or 0.34 when normalized with respect to 1/PRI. For each profile, 1024 PRIs are stored in the system memory, and the PUV system is programmed to produce a power spectral matrix every 64 PRIs; 8 frames per profile are produced and averaged. The result is compared with the profile used for signal generation, and their agreement is quantified by the root mean square error (RMSE) between the curves.
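Equation (1) with the parameters of Table 2 reproduces the quoted Doppler figures. The text does not state the sound velocity used in this calculation; c = 1480 m/s is an assumption that yields the quoted 1689 Hz (the 1500 m/s of (2) would give about 1667 Hz).

```python
import math

def doppler_shift_hz(v, f_t, theta_deg, c):
    """Doppler shift from (1): f_D = 2 f_t (v / c) cos(theta)."""
    return 2 * f_t * v / c * math.cos(math.radians(theta_deg))

pri = 0.2e-3                                                     # s, from Table 2
f_d = doppler_shift_hz(v=0.5, f_t=5e6, theta_deg=60, c=1480.0)   # c is an assumption
print(f"f_D = {f_d:.0f} Hz, normalized to 1/PRI: {f_d * pri:.2f}")
```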

The measured spectral matrices are reported in the left column of Fig. 4 with a 60 dB dynamic range, while the right column shows the measured velocity profiles (blue continuous curves) compared with the reference profiles (red dashed curves). When the signal is generated with a +10 dB SNR, the spectral profile is clearly detectable and the noise is lower than −60 dB. The corresponding velocity profile (Fig. 4, top right) agrees well with the reference (RMSE = 4.6%), and its peak velocity corresponds to the expected 0.5 m/s. When the noise is increased to an SNR of −20 dB (Fig. 4, bottom), the spectral matrix is heavily corrupted by noise. Nevertheless, the velocity profile can still be detected (Fig. 4, bottom right), although with a relatively high error (RMSE = 17.6%).

**Table 2** Parameters used in experiments

| General             | Value  | Transducer and transmission | Value      | Pipe and profile   | Value                               |
|---------------------|--------|-----------------------------|------------|--------------------|-------------------------------------|
| PRIs per experiment | 1024   | Diameter                    | 7 mm       | Diameter           | 16 mm                               |
| PRI                 | 0.2 ms | Bandwidth                   | 3–7 MHz    | Peak velocity      | 0.5 m/s                             |
| Samples per PRI     | 2048   | Burst                       | Sinusoidal | Doppler angle      | 60°                                 |
|                     |        | Frequency                   | 5 MHz      | Profile shape      | Smashed                             |
|                     |        | Cycles                      | 5          | SNR                | +10, −20 dB                         |
|                     |        | Apodization                 | Hanning    | Peak Doppler shift | 1689 Hz (0.34 normalized to 1/PRI)  |

**Fig. 4** Spectral matrices (left) and velocity profiles (right) for a +10 dB (top) and −20 dB (bottom) SNR. Frequency is normalized to 1/PRI. Reference profiles are shown as red dashed curves. The horizontal black dashed line marks the reference peak velocity (0.5 m/s)
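A sketch of the RMSE figure of merit used above. The text does not state how the percentage is normalized; dividing by the peak of the reference profile is an assumption, as are the sample profiles in the example.

```python
import numpy as np

def rmse_percent(measured, reference):
    """RMSE between two velocity profiles sampled at the same depths,
    as a percentage of the reference peak velocity (assumed normalization)."""
    measured = np.asarray(measured, dtype=float)
    reference = np.asarray(reference, dtype=float)
    rmse = np.sqrt(np.mean((measured - reference) ** 2))
    return 100.0 * rmse / np.max(np.abs(reference))

# Hypothetical profiles (m/s) at five depths.
reference = np.array([0.0, 0.4, 0.5, 0.4, 0.0])
measured = reference + np.array([0.02, -0.02, 0.02, -0.02, 0.02])
print(f"RMSE = {rmse_percent(measured, reference):.1f}%")
```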

## 4 Conclusion

In this paper, a simple electronic system has been presented that produces an RF signal similar to that generated at the output of an ultrasound transducer connected to a cumbersome flow-rig. Moreover, the generated signal emulates a velocity profile with perfectly known features. This represents a great advantage in terms of time and cost savings, both in the development of new ultrasound Doppler systems and in the automation of quality checks in PUV system production [11].

## References

1. Salazar, J., Alava, J.M., Sahi, S.S., Turo, A., Chavez, J.A., Garcia, M.J.: Ultrasound measurements for determining rheological properties of flour-water systems. In: IEEE Ultrasonics Symposium 2002 Proceedings (2002). https://doi.org/10.1109/ULTSYM.2002.1193537
2. Birkhofer, B., Debacker, A., Russo, S., Ricci, S., Lootens, D.: In-line rheometry based on ultrasonic velocity profiles: comparison of data processing methods. Appl. Rheol. 22(4), 3. https://doi.org/10.3933/ApplRheol-22-44701
3. Wiklund, J., Shahram, I., Stading, M.: Methodology for in-line rheology by ultrasound Doppler velocity profiling and pressure difference techniques. Chem. Eng. Sci. 62(16), 4277–4293 (2007). https://doi.org/10.1016/j.ces.2007.05.007
4. Muller, M., Brunn, P.O., Harder, C.: New rheometric technique: the gradient-ultrasound pulse Doppler method. Appl. Rheol. 7(5), 204–210 (1997)
5. Ricci, S., Matera, R., Tortoli, P.: An improved Doppler model for obtaining accurate maximum blood velocities. Ultrasonics 54(7), 2006–2014 (2014). https://doi.org/10.1016/j.ultras.2014.05.012
6. Ricci, S., Liard, M., Birkhofer, B., Lootens, D., Brühwiler, A., Tortoli, P.: Embedded Doppler system for industrial in-line rheometry. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 59(7), 1395–1401 (2012). https://doi.org/10.1109/TUFFC.2012.2340
7. Ricci, S., Meacci, V., Birkhofer, B., Wiklund, J.: FPGA-based system for in-line measurement of velocity profiles of fluids in industrial pipe flow. IEEE Trans. Ind. Electr. 64(5), 3997–4005 (2017). https://doi.org/10.1109/TIE.2016.2645503
8. Jensen, J.A., Svendsen, N.B.: Calculation of pressure fields from arbitrarily shaped, apodized, and excited ultrasound transducers. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 39(2), 262–267 (1992). https://doi.org/10.1109/58.139123
9. Jensen, J.A.: Field: a program for simulating ultrasound systems. Med. Biol. Eng. Comput. 34(1), 351–353 (1996)
10. Ricci, S., Swillens, A., Ramalli, A., Segers, P., Tortoli, P.: Wall shear rate measurement: validation of a new method through multiphysics simulations. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 64(1), 66–77 (2017). https://doi.org/10.1109/TUFFC.2016.2608442
11. Kotzé, R., Ricci, S., Birkhofer, B., Wiklund, J.: Performance tests of a new non-invasive sensor unit and ultrasound electronics. Flow Meas. Instr. 48, 104–111 (2016). https://doi.org/10.1016/j.flowmeasinst.2015.08.013

# An Ultra-Low Cost Triboelectric Flowmeter



Alessandro Bertacchini and Paolo Pavan

A. Bertacchini (🖂)
DISMI – Department of Sciences and Methods for Engineering, University of Modena and Reggio Emilia, 42122 Reggio Emilia, Italy
e-mail: alessandro.bertacchini@unimore.it

P. Pavan
DIEF – Department of Engineering "Enzo Ferrari", University of Modena and Reggio Emilia, 41125 Modena, Italy
e-mail: paolo.pavan@unimore.it

© Springer Nature Switzerland AG 2019
S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_14

Abstract In this paper, we present an ultra-low-cost flowmeter suitable for both gases and liquids. Unlike other flowmeters presented in the literature, the prototype is based on the triboelectric effect. The device is extremely low cost because it uses commercial silicone as the triboelectric material. The comparison between experimental measurements and the output data of a commercial flowmeter, used as reference, demonstrates the effectiveness of the proposed solution under both constant and variable flow conditions. Moreover, thanks to its reconfigurable architecture, the device can be used for both redundant measurements and triboelectric energy harvesting.

## 1 Introduction

In the last few years, thanks to advances in emerging and enabling technologies such as wireless connectivity and energy harvesting systems, smart metering has received growing interest, resulting in a boost of research activities in both industry and academia. In this scenario, new flowmeters for civil and industrial applications have gained renewed attention, as demonstrated by the number of papers presented in the literature. Most of the presented solutions are enhanced versions of devices based on well-known transduction principles.

For example, electromagnetic flowmeters (e.g. [1]) are obstruction-free systems whose main advantage is that they do not interrupt the flow in the pipe, but they do not work with non-conductive fluids. Coriolis mass flowmeters do not suffer from this limitation and retain the non-invasive feature, but they are based on the mechanical oscillation of the pipe, so a closed-loop control system is essential to keep the oscillation stable and thus obtain a precise and stable measurement (e.g. [2]). Vortex flowmeters, instead, are suitable for measuring all types of fluids and gases, but are not obstruction-free. In real applications, their measurement is affected by impacts and vibrations occurring during normal operation, so complex signal processing algorithms are required to remove these effects (e.g. [3]).

Optical and ultrasonic flowmeters have the advantage of not contacting the fluid or gas flowing through the pipe, and are therefore very useful for applications with extreme temperatures or highly polluted fluids and gases. They can use several mechanisms, such as the Doppler effect, laser beams (e.g. [4]) or sound paths (e.g. [5, 6]), and are based on the cross-correlation of known signals disturbed by the fluid flowing through the pipe. Their main disadvantages are the strong dependence on the acoustic properties of the fluid, which can be influenced by environmental factors (e.g. temperature), on the material properties of the fluid (e.g. low density, especially in the case of gases) and on the impurities in the fluid itself, as well as the very small amplitude of the signals to be processed.

### 1.1 Main Contributions of This Work

The solution we propose is based on detecting the speed of a rotating impeller, which in turn is proportional to the flow rate, as in classic electromagnetic turbine-based flowmeters, but it has the following distinctive features:

1. It uses triboelectricity as the transduction mechanism, exploiting the triboelectric properties of ultra-low-cost commercial acrylic silicone.
2. Unlike the most commonly used materials for triboelectric devices (e.g. PDMS, PET, FEP, EVA, PVDF), commercial silicone can be easily deposited without complex processes or machinery, and it has a natural roughness (which has been shown to increase surface charge density and voltage generation in triboelectric devices) without the need for additional manufacturing processes (e.g. nano-patterning of bumps).
3. Unlike other solutions presented in the literature and based on the same working principle (e.g. [7]), the proposed device has a reconfigurable architecture that enables different capabilities, such as redundant measurement, flow direction detection and energy harvesting, depending on the chosen output terminal configuration.
4. It operates properly with both gases and liquids, overcoming the poor compatibility of triboelectricity with liquids, because the sensing element is not in contact with the material flowing through the main body of the device.

## 2 System Description

The exploded sketch of the TriboElectric Flowmeter (TEF) we realized is shown in Fig. 1. All the mechanical parts have been made of ABS (Acrylonitrile Butadiene Styrene) with a commercial 3D printer. The system consists of two main parts. The first one (i.e. Main Body) resembles the classic structure of impeller-based flowmeters, while the second one (i.e. Crankcase) contains the triboelectric transducer.

When a gas/fluid flows through the main body, it rotates the impeller inside the main body, which in turn drives the rotation of the triboelectric impeller inside the crankcase thanks to the rigid connection provided by the shaft.

The crankcase is a dry chamber isolated from the main body by means of gaskets. Consequently, the measurement is independent of the gas/fluid flowing through the main body and depends only on the rotation speed of the triboelectric impeller and on the internal diameter of the inlet and outlet pipes of the main body.

A 0.2 mm thick layer of commercial acetylic silicone has been deposited onto the surfaces of the four blades of the triboelectric impeller facing the closure cap. The crankcase closure cap, instead, is made of a double-sided 1.6 mm thick FR4 board with 35 μm of copper on each side. The surface of the closure cap facing the impeller has been divided into eight sectors by mechanical milling, and an electric contact has been realized for each sector. Each sector has the same physical dimensions (and thus the same area) as one of the four blades of the triboelectric impeller.

Two adjacent sectors (namely S- and S+) form a pair of negative and positive contacts across which the output voltage generated by the rotation of the triboelectric impeller can be measured. Consequently, the whole cap is divided into four sensing elements $S_i$ (i = 1…4). By suitably combining the electric contacts, different configurations of the device can be obtained. For example, four independent sensing elements provide a redundant measurement. Alternatively, just one sensing element can be used for flow metering while the remaining ones are used for energy harvesting. In this work, we focused only on the validation of the measuring capabilities of the proposed flowmeter.

Fig. 1 Exploded 3D sketch of the proposed TEF prototype (left) and a picture of the fully assembled prototype (right)

Fig. 2 (*Left*) Four-step sequence of the operating principle of the proposed TEF: **a** approach; **b** cross; **c** overlap; **d** overtake. The material stacks used for the triboelectric impeller blade and the closure cap (with the copper electric contacts) are sketched on the upper side. (*Right*) Output voltage generated by the TEF under low air flow, highlighting the corresponding steps of the sequence shown on the left

## 3 Operating Principle

The silicone surface of the impeller and the copper surface of the closure cap were brought into contact and then separated during the assembly of the device. This step is needed to activate the triboelectrification process (i.e. the transfer of electrons from the copper surface of the closure cap to the silicone).

Once fully assembled, the transducer resembles the structure of a single-dielectric triboelectric device operating in freestanding non-contact mode, as schematically depicted by the 4-step sequence shown in Fig. 2. For the sake of simplicity, the following description refers to just one pair of adjacent sectors (i.e. only one S+ and one S-) and one blade of the triboelectric impeller.

The basic assumption is that the initial amount of triboelectric charge, Q, obtained by bringing the triboelectric blade and the closure cap into contact during assembly, remains constant during the rotation of the impeller because the device operates in non-contact mode [7]. Under this assumption, a change in the capacitive coupling between S- and S+, represented by the parameter C, generates a voltage V across the S- and S+ terminals, since $Q = C \cdot V$. The variation of C is easily obtained through the rotation of the triboelectric impeller, which changes the overlap between the impeller and the S+ and S- sectors.
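The link between charge, capacitance and voltage can be made explicit. Differentiating $Q = C \cdot V$ under the stated assumption that Q stays constant gives, for small variations,

$$\frac{\mathrm{d}Q}{\mathrm{d}t} = V\,\frac{\mathrm{d}C}{\mathrm{d}t} + C\,\frac{\mathrm{d}V}{\mathrm{d}t} = 0 \quad\Longrightarrow\quad \Delta V \approx -\frac{Q}{C^{2}}\,\Delta C = -V\,\frac{\Delta C}{C}$$

so the terminal voltage changes only while C changes, and opposite variations of C produce voltage changes of opposite sign.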

An example of the measured TEF output voltage confirming this behavior is shown on the right of Fig. 2. While the impeller blade is far from the gap between S- and S+ (see Fig. 2a, left) or overlaps the gap (see Fig. 2c, left), C does not change and consequently no $\Delta V$ is generated between S- and S+. Conversely, when the blade crosses and overtakes the gap between S- and S+ (Fig. 2b, left and Fig. 2d, left, respectively), a $\Delta C$ occurs, generating a negative and a positive voltage pulse, respectively. The frequency of the pulse train generated by the cyclic sequence of the four phases described above is related to the rotation speed of the triboelectric impeller, and consequently to the speed of the gas/fluid flowing through the main body of the TEF.
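Because the measurement relies on the pulse frequency rather than the pulse amplitude, the signal processing reduces to timing the pulses. The sketch below detects the positive pulses of one sector by upward threshold crossings, converts the mean interval into a pulse frequency, and divides it by the four pulses per revolution generated by each sector (as stated in Sect. 4) to obtain the rotation speed. The threshold, the synthetic Gaussian pulse train and the function names are assumptions; converting the rotation speed into a flow rate would additionally require a calibration constant that the text does not provide.

```python
import numpy as np

def crossing_times(signal, t, threshold):
    """Times of the samples where the signal first rises above threshold."""
    above = signal > threshold
    idx = np.flatnonzero(~above[:-1] & above[1:]) + 1
    return t[idx]

def rotation_speed_rps(pulse_times, pulses_per_rev=4):
    """Revolutions per second from the mean interval between consecutive pulses."""
    pulse_freq = 1.0 / np.mean(np.diff(pulse_times))
    return pulse_freq / pulses_per_rev

# Synthetic TEF-like pulse train: 10 rev/s, 4 positive pulses per revolution.
fs = 100e3
t = np.arange(0, 1.0, 1 / fs)
centers = np.arange(0.0125, 1.0, 0.025)          # 40 pulses, 25 ms apart
signal = np.exp(-((t[:, None] - centers) / 2e-4) ** 2).sum(axis=1)

times = crossing_times(signal, t, threshold=0.5)
print(f"{rotation_speed_rps(times):.2f} rev/s")
```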

The silicone layer of the triboelectric impeller cannot store the initial charge generated by the triboelectrification process indefinitely: the charge has a decay time that depends strongly on the environmental conditions (e.g. humidity) and on the properties of the material used [7]. Several countermeasures can be taken in future TEF implementations to overcome this issue. For example, using a professional 3D printer for the mechanical parts would significantly reduce the air gap between the closure cap and the triboelectric impeller, which benefits both the generation and the retention of triboelectric charge. Moreover, instead of the screws currently used, a properly designed accordion could be used to fix the closure cap to the crankcase. Under the reasonable assumption that (even small) vibrations occur during normal operation of the TEF, the accordion would allow random oscillations of the closure cap. These would cause random contacts between the cap and the triboelectric impeller, with consequent regeneration of the triboelectric charge.

## 4 Experimental Results

The prototype has been characterized with the setup shown in Fig. 3. The air leaving the blow gun of the portable air compressor is channeled into the TEF through a flexible rubber pipe. The air exiting the TEF flows into a second flexible rubber pipe and enters the commercial Hall-effect flowmeter used as reference transducer. In this way, the two flowmeters are connected in series and subjected to the same air flow. Both the output voltages generated by the TEF and the output signal of the Reference Flowmeter (RF) are acquired with an Agilent DSO9254A oscilloscope. The RF produces four output voltage pulses per revolution and is powered by an Agilent E3631A power supply.

It is important to note that the main body of the TEF has been designed with the same mechanical dimensions as the commercial flowmeter (i.e. the same internal diameter of the pipes, and the same dimensions and number of blades of the impeller). To a first approximation, this ensures the same head loss in both RF and TEF. The triboelectric impeller has been designed with four blades so that each sector generates four output voltage pulses per revolution (the same as the RF), allowing a direct comparison between TEF and RF.

Figure 4 shows an example of comparison between the two synchronized output signals provided by the RF and by the sector  $S_1$  of the proposed TEF when operating with a constant (on the left) and a variable (on the right) air flow.

Fig. 3 Experimental setup (*left*) and close-up of the realized TEF and the reference flowmeter connected in series (*right*)

Fig. 4 Comparison between the output signal of the reference flowmeter and the output voltage generated by the TEF measured across the S- and S+ terminals of the sector S<sub>1</sub> in case of constant (*left*) and variable air flow (*right*)

As can be seen, there is very good agreement between the voltage peaks (negative or positive) generated by the TEF and the rising and falling edges of the RF output signal. The slight differences in the peak amplitude of the TEF output signal are due to the non-uniform thickness of the silicone layer, caused by the manual deposition process, and to the slightly eccentric rotation of the triboelectric impeller, caused by the mechanical tolerance of some 3D-printed parts. Both these aspects vary the gap between the triboelectric impeller and the closure cap during normal operation of the TEF, resulting in a variation of the equivalent capacitance (and consequently of the voltage), as in any triboelectric/electrostatic device. It is worth noting that these non-idealities do not affect the flow rate measurement because, as in classic impeller-based flowmeters, it is based on the frequency of the pulses and not on their amplitude. Moreover, by combining the outputs of two adjacent sectors (e.g. $S_1$ and $S_2$), the TEF easily provides the flow direction (from the time sequence of the pulses generated by the two sectors) and a redundant measurement (by comparing the time interval between two consecutive peaks for each sector).
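The redundant measurement described above can be sketched as a consistency check between sectors: each sector yields its own rotation speed from the interval between consecutive peaks, and the readings are accepted only if they all agree within a tolerance. The 2% tolerance, the use of the mean interval and the synthetic peak-time lists are hypothetical choices for illustration; peak detection is assumed to have been done already.

```python
import numpy as np

def sector_speed_rps(peak_times, pulses_per_rev=4):
    """Rotation speed seen by one sector from its mean peak-to-peak interval."""
    return 1.0 / (np.mean(np.diff(peak_times)) * pulses_per_rev)

def redundant_speed(sectors_peak_times, rel_tol=0.02):
    """Mean speed over the sectors and whether all sectors agree within rel_tol."""
    speeds = np.array([sector_speed_rps(p) for p in sectors_peak_times])
    mean = speeds.mean()
    agree = bool(np.all(np.abs(speeds - mean) <= rel_tol * mean))
    return mean, agree

s1 = np.arange(40) * 0.025          # 10 rev/s
s2 = s1 + 0.003                     # adjacent sector: same rate, shifted in time
faulty = np.arange(40) * 0.030      # a sector giving an inconsistent reading
print(redundant_speed([s1, s2]))
print(redundant_speed([s1, s2, faulty]))
```

The time shift between `s1` and `s2` does not affect the speed estimate; the direction detection mentioned above relies on the ordering of such shifts, which this sketch does not attempt.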

# 5 Conclusions

In this paper, we presented an ultra-low-cost triboelectric flowmeter based on commercial silicone. Experimental results demonstrate the effectiveness of the proposed solution, showing very good agreement between the output of the realized prototype and that of a commercial reference flowmeter. Thanks to its reconfigurable architecture, additional features such as flow direction detection, redundant measurements or even energy harvesting can be enabled.

# References

- Subramanian, S., Kumar, U.: Augmenting numerical stability of the Galerkin finite element formulation for electromagnetic flowmeter analysis. IET Sci. Meas. Technol. 10(4), 288–295 (2016)
- Zheng, H.D., Fan, S.: Experimental study and implementation of a novel digital closed-loop control system for Coriolis mass flowmeter. IEEE Sens. J. 13(8), 3032–3038 (2013)
- Shao, L., Xu, K.J., Shu, Z.P.: Segmented Kalman filter based antistrong transient impact method for vortex flowmeter. IEEE Trans. Instrum. Meas. 66(1), 93–103 (2017)
- Fernandes, W., Bellar, M.D., Werneck, M.M.: Cross-correlation-based optical flowmeter. IEEE Trans. Instrum. Meas. 59(4), 840–846 (2010)
- Zhao, H., Peng, L., Takahashi, T., Hayashi, T., Shimizu, K., Yamamoto, T.: CFD-aided investigation of sound path position and orientation for a dual-path ultrasonic flowmeter with square pipe. IEEE Sens. J. 15(1), 128–137 (2015)
- Zheng, D., Mei, J., Wang, M.: Improvement of gas ultrasonic flowmeter measurement nonlinearity based on ray tracing method. IET Sci. Meas. Technol. 10(6), 602–606 (2016)
- Lin, L., Wang, S., Niu, S., Liu, C., Xie, Y., Wang, Z.L.: Noncontact free-rotating disk triboelectric nanogenerator as a sustainable energy harvester and self-powered mechanical sensor. ACS Appl. Mater. Interfaces 6(4), 3031–3038 (2014)

# **Electro-Thermal Characterization and Modeling of a 4-Wire Microheater for Lab-on-Chip Systems**



Andrea Scorzoni, Pisana Placidi, Paolo Valigi and Nicola Lovecchio

**Abstract** This paper presents part of a broader research effort devoted to the application of Lab-on-Chip systems to the detection of viral infections. The focus is on accurate lumped-element modeling, simulation and experimental characterization of the thermal behavior of an integrated microheater on glass. Such a model is useful for the design and simulation of dedicated electronic control systems. A lumped three-compartment model for the analysis of the thermal behavior of a heater has been proposed and discussed. The mathematical model was extensively simulated, and its parameters were chosen to minimize the L2 norm of the error with respect to experimental data collected through an experimental campaign.

# 1 Introduction

Lab-on-Chip (LoC) electronic systems are smart laboratories that integrate a set of simultaneous analyses to achieve high sensitivity, diagnostic speed, cost efficiency, parallelization, safety, etc. They can precisely detect dozens of chemical and biological, possibly hazardous, substances. These include a wide range of chemicals (nitrates, chlorides, heavy metals, etc.) and microorganisms (bacteria, fungi, yeasts, cells, etc.). LoCs can also perform a variety of tests based on DNA analysis [1–3].

In these systems, heaters are among the most frequently used devices for controlling temperature for analytical purposes. Glass is the material of choice for conventional analytical applications, and glass substrates are therefore of particular interest for lab-on-chip devices. A recent literature survey of thin-film heaters on glass, reported in [4], summarizes their applications in the field of LoCs. Integrated microheaters are also widely used in other sensor systems, such as micromachined vacuum sensors, flow sensors and humidity sensors [5].

A. Scorzoni · P. Placidi (🖂) · P. Valigi

Dipartimento di Ingegneria, via G. Duranti 93, 06125 Perugia, Italy e-mail: pisana.placidi@unipg.it

- A. Scorzoni e-mail: andrea.scorzoni@unipg.it
- P. Valigi e-mail: paolo.valigi@unipg.it

N. Lovecchio DIET, Sapienza University of Rome, Rome, Italy

<sup>©</sup> Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_15

In [6], first- and second-order lumped-element models of a heater on glass with a single power injection point were presented. These models were based on series/parallel connections of thermal resistors and capacitors. SPICE simulation results were shown to improve when the second-order model was adopted. However, the distributed power generation of the heater was not correctly taken into account. Moreover, 4-wire geometries can be adopted for heaters to accurately monitor the resistance of the inner region, where the temperature is uniform.

Section 2 describes the heater structure and the conventions used to identify the different spots on the heater. Section 3 presents the experimental thermal transient characterization. The lumped-element thermal model is described in detail in Sect. 4, while Sect. 5 presents and discusses the simulation results. Section 6 concludes the paper.

# 2 Heater Structure

The heater geometry was previously designed and optimized by means of multiphysics finite element simulations, following the procedure reported in [7]. The goal was a spatially uniform temperature distribution over the whole active area of the device. A similar approach was recently adopted in [8] for silicon-based metal oxide gas sensors, to optimize the layout geometry in terms of power consumption and temperature distribution. The structure (Fig. 1a) is a serpentine-shaped 4-wire resistor made of a Cr/Al/Cr sandwich. The width and spacing of its meanders vary so as to resemble a "chirp" signal. The electrical current *I* is injected between taps #1 and #4. Good temperature uniformity is guaranteed in the "body" (or "active area") region between the central taps #2 and #3. This requires that about half of the total power is dissipated in the surrounding "wing" regions, between taps #1 and #2 and, symmetrically, between taps #3 and #4. The "wings" do not contribute to the 4-wire electrical signal, but they are essential from the thermal point of view. An accurate electro-thermal model of the structure should therefore not neglect their presence and behavior.

The good temperature uniformity in the body allows us to relate the steady-state 4-wire heater resistance $R_{h-4w}$ to the active area temperature *T* through the simple relationship

$$R_{h-4w} = R_{h-4w0}(1 + TCR_0 \cdot T) \tag{1}$$

where $TCR_0$ and $R_{h-4w0}$ are the Temperature Coefficient of Resistance and the 4-wire heater resistance referred to 0 °C, respectively. Figure 1a also shows the dimensions and the geometry of the structure.

Fig. 1 a Optimized 4-wire microheater geometry, with the measurement spots highlighted as Sp1, Sp2, Sp3, Sp4 and Sp5. b FLIR thermometric image of the microheater during the transient experiment at 48.7 °C

In the next sections we describe temperature measurements collected with a FLIR A325 IR thermal imaging camera at different spots of the structure. The body of the heater includes spots sp2, sp3 and sp4 of Fig. 1, while spots sp1 and sp5 belong to the wings of the heater (outside the voltage taps). We chose spot locations in the glass regions between meanders because the emissivity of glass is well characterized (typically 0.95). The metal emissivity, by contrast, is much lower and often affected by unpredictable errors.

# 3 Transient Thermal Characterization

A microheater was characterized both (i) during heating at constant 4-wire resistance and (ii) during free temperature evolution after power removal. Constant-resistance bias was chosen because of the direct link, noted above, between active area temperature and 4-wire resistance. It can be obtained by measuring the heater current (between taps #1 and #4) and regulating the voltage drop between taps #2 and #3. The regulation keeps the ratio of that voltage to the current equal to the set-point resistance. Based on previous experience, a current saturation of about 360 mA was imposed during the initial transient. This limits the power density on the glass to a safe value of about 8 W/cm<sup>2</sup> for a few seconds.
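One possible constant-resistance bias update can be sketched as follows. It is a hypothetical illustration only, since the text does not describe the actual control electronics. The body resistance is estimated as the tap #2 to #3 voltage divided by the tap #1 to #4 current. The drive current is then corrected by an integral term with an assumed gain `ki` and clamped to the 360 mA saturation.

```python
# Hypothetical sketch of one update of a constant-resistance bias loop.
# The 360 mA limit is from the text; the integral law and the gain ki
# (A per ohm per second) are assumptions made for illustration.

I_SAT = 0.360  # A, current saturation used during the initial transient


def four_wire_resistance(v23, i14):
    """Body resistance from the tap #2-#3 voltage and the tap #1-#4 current."""
    return v23 / i14


def next_current(i_cmd, v23, i14, r_set, ki, dt):
    """Raise the drive current while the body is too cold (R < R_set), lower it otherwise."""
    r_meas = four_wire_resistance(v23, i14)
    i_new = i_cmd + ki * (r_set - r_meas) * dt
    return min(max(i_new, 0.0), I_SAT)  # clamp to [0, I_SAT]
```

In this sketch the current is corrected rather than the voltage. Either quantity can be regulated, as long as the measured ratio tracks the set point; the loop must start from a small non-zero current so that the resistance can be computed.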

Infrared maps as a function of time were collected with the FLIR A325 system during repeated measurement sessions. The FLIR system accuracy is $\pm 2$ °C for temperatures below 100 °C. Figure 1b shows a thermometric image at 48.7 °C. The darker shadow corresponds to a region where the SU-8 passivation still adheres to the substrate; elsewhere the passivation has delaminated, leaving the glass uncoated.





Thermometric measurements were performed in the uncoated glass region, where the commonly accepted emissivity value of 0.95 had previously been checked in-house against thermocouple measurements. The resistor was heated at constant electrical resistance until the target resistance and temperature were reached. Then the power was removed, and the observation continued until the heater returned to ambient temperature. The ambient temperature was about 30 °C. The temperature of the five spots (Fig. 1a) was recorded as a function of time for two different 4-wire resistance set points: 21 and 23.1 $\Omega$. Two pairs of the five spots are symmetric with respect to the central axis of the chirp heater. Figure 2 shows the experimental data for a target temperature of 48.7 °C. The initial transient shows a noticeable but expected temperature overshoot of the glass at spots #1 and #5, because the design power density in that region is higher than near the other spots. At those other spots, a smoother increase is observed. Temperatures are measured in the glass only, but during the initial transient the metal is certainly hotter than the glass. A significantly larger temperature overshoot is therefore expected in the metal regions of the heater, which the IR camera cannot reliably measure.

After the power is cut off, the different spots behave more uniformly. Symmetric spots, e.g. sp#2 and sp#4, do not give identical results. The causes are uneven glass cutting, asymmetric passivation delamination, asymmetric metal over-etching during fabrication and non-uniform glass thickness. Steady-state IR camera measurements were also used to extract the $TCR_0$ of the device. The measured steady-state temperature was 48.7 °C at a set-point resistance $R_{h-4w} = 21 \Omega$, and 82 °C at a 23.1 $\Omega$ set point. A two-parameter fit yielded $R_{h-4w0} = 17.9 \Omega$ and $TCR_0 = 0.00356 \text{ °C}^{-1}$ for the body of the heater. The following simulations assume that the wings share the same TCR.
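Eq. (1) can be checked against the two set points quoted above. The sketch below solves Eq. (1) exactly through the two (temperature, resistance) pairs and inverts it to recover temperature from resistance. It uses only two rounded data points, so it returns about 17.93 Ω and 0.00352 °C⁻¹. These are close to, but not identical with, the fitted values reported in the text. The function names are illustrative.

```python
# Two-point solve of Eq. (1), R = R0 * (1 + TCR * T), and its inversion.
# Using exactly two rounded points is a simplification of the paper's fit.


def fit_tcr(t1, r1, t2, r2):
    """Return (R0, TCR) of the line R = R0 * (1 + TCR * T) through two points."""
    slope = (r2 - r1) / (t2 - t1)  # equals R0 * TCR, in ohm per degC
    r0 = r1 - slope * t1           # resistance extrapolated to 0 degC
    return r0, slope / r0


def temperature_from_resistance(r, r0, tcr):
    """Invert Eq. (1): active-area temperature [degC] from the 4-wire resistance."""
    return (r / r0 - 1.0) / tcr


r0, tcr = fit_tcr(48.7, 21.0, 82.0, 23.1)
print(f"R0 = {r0:.2f} ohm, TCR0 = {tcr:.5f} 1/degC")
print(f"T(21 ohm) = {temperature_from_resistance(21.0, 17.9, 0.00356):.1f} degC")
```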



# 4 Lumped-Element Electro-Thermal Model

To define an accurate lumped-element model of the integrated microheater, we improved and refined the model reported in [5]. A new lumped, three-state-variable, three-compartment linear thermal model was defined (Fig. 3). The model includes two thermal CRs associated with the two different thermal time constants of the heater body. The *inner* (i) components $R_{\text{th-i}}$ and $C_{\text{th-i}}$ refer to the glass at spot sp3. The *outer* (o) components $R_{\text{th-o}}$ and $C_{\text{th-o}}$ refer to spots sp2 and sp4. $R_{\text{th-io}}$ is the thermal resistance between the inner and outer spots of the body. A further thermal CR ($C_{\text{th-w}}$, $R_{\text{th-w}}$) was defined for the wings of the heater and connected to the outer temperature of the body through a thermal resistor $R_{\text{th-ow}}$. The thermal resistors $R_{\text{th-wmw}}$, $R_{\text{th-omo}}$ and $R_{\text{th-imi}}$ account for the thermal resistance between each metal meander of the heater and the surrounding glass, where the relevant power is injected. The lumped model represents the microheater folded around its middle axis, and therefore assumes that the thermal behavior is perfectly symmetric about this axis. The "outer" and "wing" thermal components represent the thermal behavior of pairs of meanders. Their injected power is therefore twice the power of a single meander, i.e. twice the squared heater current multiplied by the relevant electrical resistance, as shown in Fig. 3. Figure 2 clearly shows that the real thermal behavior is not perfectly symmetric, especially in the wings of the microheater. The model could be refined by introducing two different outer components and two different wings. However, we preferred to keep the thermal model relatively simple, at the expense of slightly worse parameter identification results.
The values of $R_{ih-4w0}$ (resistance of the inner part of the body at 0 °C), $R_{oh-4w0}$ (resistance of each of the two outer meanders of the body at 0 °C) and $R_{wh-4w0}$ (resistance of each of the two wing meanders at 0 °C) follow Eq. (1). They have been calibrated from the real geometries of the inner and outer meanders (Fig. 1), expressed in geometrical squares. The inner part accounts for about 34% of the body of the heater, while each wing comb is 1.23 times as resistive as each outer comb. The adopted values are $R_{wh-4w0} = 7.25 \ \Omega$, $R_{oh-4w0} = 5.91 \ \Omega$, $R_{ih-4w0} = 6.09 \ \Omega$.
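The resistance partition and the factor of two in the injected power can be checked numerically. This sketch derives the three resistances from the 17.9 Ω body resistance, the 34% inner fraction and the 1.23 wing-to-outer ratio. It gives approximately 6.09, 5.91 and 7.27 Ω; the text adopts 7.25 Ω for the wings. Evaluating each resistance at the temperature of its own compartment through Eq. (1) is an assumption of the sketch.

```python
# Resistance partition of the heater at 0 degC and per-compartment Joule power.
# Evaluating each resistance at its own compartment temperature is an
# assumption of this sketch (the metal/glass temperature difference is ignored).

R_BODY0 = 17.9   # ohm, 4-wire body resistance at 0 degC (Sect. 3)
TCR0 = 0.00356   # 1/degC, assumed identical for body and wings

r_i0 = 0.34 * R_BODY0          # inner part: about 34 % of the body
r_o0 = (R_BODY0 - r_i0) / 2    # each of the two outer meanders
r_w0 = 1.23 * r_o0             # each wing comb: 1.23 x an outer comb


def resistance(r0, t):
    """Eq. (1) applied to a single meander at temperature t [degC]."""
    return r0 * (1.0 + TCR0 * t)


def compartment_powers(current, t_w, t_o, t_i):
    """Joule power [W] injected into the wing, outer and inner compartments.

    The wing and outer compartments lump two symmetric meanders, hence the
    factor 2 in front of I^2 * R.
    """
    p_w = 2 * current**2 * resistance(r_w0, t_w)
    p_o = 2 * current**2 * resistance(r_o0, t_o)
    p_i = current**2 * resistance(r_i0, t_i)
    return p_w, p_o, p_i
```

As a sanity check, the outer and inner powers together equal $I^2 R_{h-4w}$, the power dissipated in the 4-wire body.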

# 5 Simulation Results and Discussion

The three-compartment lumped model of the heater has been cast in state-space form, and several simulation experiments have been carried out. The time behavior of the three sections, "wing", "outer" and "inner", has been compared with the measured one. A simple heuristic identification procedure for the parameters of the thermal model has also been implemented. It minimizes the L2 norm of the error between measured and simulated curves at the three chosen sections. The three-compartment model relies on the symmetry of the microheater, whereas the measured temperatures of the "wing" and "outer" sections are not perfectly symmetric. We therefore chose to fit the average temperature data in these sections. The model should represent the transient behavior of the microheater at typical LoC temperatures [9]. To test its robustness, we ran the identification procedure on the experimental data at 48.7 °C. We then used the extracted thermal parameters to simulate the system behavior at a considerably higher temperature, 82 °C. Optimization involved the 8 parameters $R_{\text{th-i}}$, $R_{\text{th-o}}$, $R_{\text{th-io}}$, $C_{\text{th-o}}$, $C_{\text{th-wy}}$, $R_{\text{th-io}}$.
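The three-compartment network described in Sect. 4 can be written as three coupled first-order ODEs for the temperature rises above ambient. The sketch below is an illustration under stated assumptions. It uses the capacitances and resistances to ambient listed in the Fig. 4 caption. The coupling resistances `R_IO` and `R_OW` are HYPOTHETICAL placeholders, not the identified values. Powers are held constant, the external control circuitry is omitted, and integration uses plain forward Euler.

```python
# Three-compartment thermal model: temperature rises theta = (wing, outer, inner)
# above ambient. C and R-to-ambient values are from the Fig. 4 caption;
# R_IO and R_OW are HYPOTHETICAL placeholders. The metal-to-glass resistances
# are omitted because, without metal capacitances, they do not change the
# glass-node dynamics.

C_W, C_O, C_I = 89.1e-3, 83.6e-3, 232.6e-3   # J/K
R_W, R_O, R_I = 142.5, 400.4, 1080.0         # K/W, to ambient
R_IO, R_OW = 200.0, 200.0                    # K/W, hypothetical couplings


def derivatives(theta, powers, r_io=R_IO, r_ow=R_OW):
    """Heat balance at each node: C * dtheta/dt = P - losses - coupling flows."""
    tw, to, ti = theta
    pw, po, pi = powers
    dtw = (pw - tw / R_W - (tw - to) / r_ow) / C_W
    dto = (po - to / R_O - (to - tw) / r_ow - (to - ti) / r_io) / C_O
    dti = (pi - ti / R_I - (ti - to) / r_io) / C_I
    return dtw, dto, dti


def simulate(theta0, powers, t_end, dt=0.01, **kw):
    """Forward-Euler integration with constant powers; returns the state at t_end."""
    theta = list(theta0)
    for _ in range(round(t_end / dt)):
        d = derivatives(theta, powers, **kw)
        theta = [x + dt * dx for x, dx in zip(theta, d)]
    return theta
```

When the couplings are made very large, the wing node decays with its own time constant $R_{\text{th-w}} C_{\text{th-w}}$. At steady state, the total injected power equals $\sum_x \theta_x / R_{\text{th-x}}$, because the coupling terms cancel when the three node equations are summed.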

For identification purposes, the external circuitry used to power the heater in constant-resistance mode has also been modeled. The complete system can then be represented by a nonlinear model with five state variables, three of which belong to the heater. The system has two equilibrium points: the origin (all signals equal to zero) and the point associated with the target temperature. Applying the reduced Lyapunov criterion shows that the origin is unstable, while the latter equilibrium is asymptotically stable.

For the sake of accuracy, the thermal model has also been extended with three additional CRs modeling the metal strips in the three sections, "wing", "outer" and "inner". Simulation results show that the sensitivity of the heater dynamic response to these additional parameters is practically zero, so they are neglected in the following.

The simulated time behavior of the proposed lumped-element thermal model is reported in Fig. 4a and zoomed in Fig. 4b. The main characteristics of the measured signals are reproduced. The worst-case error, larger than the $\pm 2$ °C accuracy of the FLIR system, occurs only in the "wing" regions (i.e., the temperature at spots #1 and #5) during the first 10 s of the transient (Fig. 4b). This indicates that, in that part of the microheater, the simplified three-compartment model approximates the actual distributed behavior less well. Nevertheless, the simulated "wing" temperature exhibits the overshoot observed in the experiments. So does the "outer" section, although with a much smaller amplitude that is barely noticeable in the experimental data.

**Fig. 4** a Lumped model comparison with experimental data in glass spots @48.7 °C. **b** Zoom on the previous figure during the first 20 s. **c** Glass spots versus metal strips temperatures. **d** Zoom on the previous figure during the first 20 s. Identified parameters:  $C_{\text{th-w}} = 89.1 \text{ mJ/K}$ ,  $C_{\text{th-o}} = 83.6 \text{ mJ/K}$ ,  $C_{\text{th-i}} = 232.6 \text{ mJ/K}$ ,  $R_{\text{th-w}} = 142.5 \text{ K/W}$ ,  $R_{\text{th-o}} = 400.4 \text{ K/W}$ ,  $R_{\text{th-i}} = 1080 \text{ K/W}$ ,  $R_{\text{th-imi}} = 1080 \text{ K/W}$ ,  $R_{\text{th-imi}} = 10 \text{ K/W}$ 

Simulation experiments also allow us to qualitatively predict the metal temperature during the transient. As expected, it rises faster than the glass temperature (Fig. 4c, zoomed in Fig. 4d). The identified parameters of the thermal model, together with other constant parameters, are listed in the caption of Fig. 4. The experimental data used were those collected with a target temperature of 48.7 °C. The optimized time constants of the thermal CRs are 12.7 s, 33.5 s and 251.2 s for the "wing", "outer" and "inner" sections, respectively.
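As a quick consistency check, each time constant is the product of the identified capacitance and resistance listed in the Fig. 4 caption. The products reproduce the 12.7 s, 33.5 s and 251.2 s quoted above.

```python
# Time constant of each thermal CR: tau = C * R (values from the Fig. 4 caption).
params = {              # (C [J/K], R [K/W])
    "wing": (89.1e-3, 142.5),
    "outer": (83.6e-3, 400.4),
    "inner": (232.6e-3, 1080.0),
}
taus = {name: c * r for name, (c, r) in params.items()}
for name, tau in taus.items():
    print(f"{name:>5}: tau = {tau:.1f} s")
```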

The ambient temperature differs from one experiment to another; it was calculated from the steady-state values reached after cooling.

The data from the second experiment were used to validate the estimated parameters and the predictive quality of the whole mathematical model (Fig. 5a), with the same parameter set as in the previous case. The agreement between simulations and measured data is only qualitative during the first 25 s (Fig. 5b). After that, the worst-case discrepancy is found between $T_{\text{o-meas}}$ and $T_{\text{o-sim}}$. Their difference always stays below 2.5 °C, i.e., only slightly above the $\pm 2$ °C accuracy of the FLIR system.



Fig. 5 a Lumped model comparison with experimental data @82 °C. b Zoom on the previous figure during the first 20 s

# 6 Conclusion

A lumped three-compartment model for the analysis of the thermal behavior of a heater has been proposed and discussed. The heater is intended for use in a Lab-on-Chip system.

The mathematical model has been extensively simulated, and its parameters have been chosen to minimize the L2 norm of the error with respect to experimental data collected through an experimental campaign. The model turns out to be quite accurate. Its behavior under operating conditions different from those used for identification confirms its quality and suitability.

Acknowledgements This work has been carried out within the framework of the "*Fondo di Ricerca di Base 2018*", funded by the University of Perugia (Project no. PJ RICBA18PP). The authors would like to thank Prof. G. de Cesare and the DIET group for sample preparation and their suggestions.

# References

- Shagun, G., Kritika, R., Suhaib, A., Vipan, K.: Lab-on-Chip technology: a review on design trends and future scope in biomedical applications. Int. J. Bio-Sci. Bio-Technol. 8(5), 311–322 (2016)
- Richard, B.F., et al.: Chemical and biological applications of digital-microfluidic devices. IEEE Des. Test Comput. 24(1), 10–24 (2007)
- Francesca, C., et al.: Integrated sensor system for DNA amplification and separation based on thin film technology. IEEE Trans. Compon. Packag. Manuf. Technol. 8(7), 1141–1148 (2018)
- Scorzoni, A., et al.: Thermal characterization of a thin film heater on glass substrate for lab-on-chip applications. In: Proceedings of I2MTC, pp. 1089–1094 (2014)
- Meijer, G.C.M. (ed.): Smart Sensor Systems. Wiley (2008)
- Scorzoni, A., Tavernelli, M., Placidi, P., Valigi, P., Nascetti, A.: Accurate analog temperature control of a thin film microheater on glass substrate for lab-on-chip applications. In: Proceedings of the IEEE Sensors Conference, November 2–5, 2014, Valencia, Spain (2014)
- Caputo, D., de Cesare, G., Nascetti, A., Scipinotti, R.: a-Si:H temperature sensor integrated in a thin film heater. Phys. Status Solidi A 207(3), 708–711 (2010)
- Lahlalia, A., Filipovic, L., Selberherr, S.: Modeling and simulation of novel semiconducting metal oxide gas sensors for wearable devices. IEEE Sens. J. 18(5), 1960–1970 (2018)
- Sripumkhai, W., et al.: On-chip platinum micro-heater with platinum temperature sensor for a fully integrated disposable PCR module. Adv. Mater. Res. 93–94, 129–132 (2010)

# Part V Signal Processing Systems

# **Toward the Real Time Implementation of the 2-D Frequency-Domain Vector Doppler Method**



Stefano Rossi, Matteo Lenge, Alessandro Dallai, Alessandro Ramalli and Enrico Boni

**Abstract** Ultrasonography is widely used for tissue imaging and blood flow assessment thanks to its important advantages in terms of performance, cost and safety. High-frame-rate 2-D vector blood flow imaging is an innovative method that delivers accurate two-dimensional velocity maps. However, it requires time-consuming algorithms, which have so far limited its real-time application. In this paper, a GPU-based implementation of the technique is proposed. Its speed is compared with that of previous implementations aimed at rapidly processing 2-D vector data. The results obtained with optimized code and massively parallel computing devices indicate that the method can become suitable for real-time applications in the near future.

# 1 Introduction

Several ultrasound Doppler techniques have recently been proposed for estimating both the direction and the magnitude of blood flow vectors [1]. The high-frame-rate 2-D vector flow imaging method [2] produces two-dimensional maps of 2-D velocity vectors. It processes data obtained by insonifying the region of interest (ROI) with plane waves transmitted at a given pulse repetition frequency (PRF). As proposed for elastography [3], the phase shifts caused by blood movement are evaluated in the frequency domain. Applying this approach to 2-D Doppler analysis has brought significant advantages in terms of computational time compared to similar approaches [4]. However, the overall computational requirements are still too high to permit real-time operation on conventional devices such as DSPs.

S. Rossi (🖂) · A. Dallai · E. Boni

Department of Information Engineering, University of Florence, Florence, Italy e-mail: stefano.rossi@unifi.it

M. Lenge

Neurosurgery Department, Children's Hospital Meyer-University of Florence, Florence, Italy

A. Ramalli

Laboratory of Cardiovascular Imaging and Dynamics, Department of Cardiovascular Sciences, KU Leuven, Louvain, Belgium

<sup>©</sup> Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_16



Fig. 1 Acquisition setup

The goal of this work is to evaluate whether a GPU implementation of the frequency-domain method can achieve execution times short enough to enable real-time operation. Compared with, e.g., FPGA implementations, a GPU offers advantages in terms of cost, easier software programming and the continuous upgrade of commercially available boards.

# 2 2-D Displacement Estimation in Frequency Domain

In this section, the 2-D displacement estimation algorithm is reviewed [2].

High-frame-rate (HFR) radio-frequency (RF) images are obtained by parallel beamforming of the backscattered echoes when plane waves are transmitted from the elements of a linear array probe (see Fig. 1). Maps of velocity vectors are produced by tracking the displacements of blood particles across consecutive RF frames, as sketched in Fig. 2.



Fig. 2 Main steps of method

A high-pass filter, called "clutter filter", suppresses tissue clutter in each RF frame by subtracting the DC component at every point. This component is estimated through a moving average over 20 RF frames.
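The clutter filter can be sketched as a per-point moving-average subtraction. The 20-frame window is from the text. The sketch also makes three assumptions of its own: a trailing (causal) window, a window that shrinks during start-up, and frames represented as nested Python lists.

```python
# Clutter filter sketch: at each (depth, line) point, subtract the DC component
# estimated as the mean over the last `window` frames (trailing window, an
# assumption of this sketch; the window shrinks for the first frames).

WINDOW = 20


def clutter_filter(frames, window=WINDOW):
    """frames: list of 2-D frames (lists of rows of samples); returns filtered frames."""
    out = []
    for k, frame in enumerate(frames):
        past = frames[max(0, k - window + 1): k + 1]  # frames k-window+1 .. k
        n = len(past)
        filtered = [
            [frame[m][c] - sum(f[m][c] for f in past) / n for c in range(len(frame[m]))]
            for m in range(len(frame))
        ]
        out.append(filtered)
    return out
```

A real-time implementation would maintain running sums instead of re-summing the whole window for every frame.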

A single velocity map is obtained from two consecutive frames, whose points are indicated here as $s_{mn}$ and $\tilde{s}_{mn}$ (where *m* and *n* are the coordinates of each point). Each frame is divided into partially overlapping blocks of $M_B \times N_B$ points each. The interval between the two frames, acquired at times *t* and $\tilde{t}$, respectively, is the pulse repetition interval $\tau = \tilde{t} - t$. Suppose that, during $\tau$, the blood particles (scatterers) have covered average displacements $\Delta z$ and $\Delta x$ in the axial and lateral directions, respectively. A block of the $\tilde{s}_{mn}$ frame can then be assumed to be a shifted version of the corresponding block of the $s_{mn}$ frame.

The scatterer displacements $\Delta z$ and $\Delta x$ are estimated from the phase shifts between the 2-D spectra of corresponding blocks in two consecutive frames. The spectral analysis of each block uses a 1-D discrete Fourier transform (DFT) for the z direction and a 2-D DFT for the x direction. To further reduce computation time, the 2-D DFT is split into a pair of 1-D DFTs, the first of which is the one already calculated for the z direction [2].

The number of DFT frequencies is crucial for the algorithm performance. Small values are preferable in terms of memory resources and computation time. Large values increase robustness and accuracy by averaging the results obtained at many frequencies. In this work, as suggested in [2], we chose $n_f = 5$ axial frequencies, uniformly distributed around the transmitted signal frequency $f_0 = 6$ MHz. We also chose $2n_f = 10$ lateral frequencies, 5 on the negative and 5 on the positive axis.

The mean velocity matrices are obtained by scaling the mean displacements by the sample spacing and dividing by the time interval $\tau$:

$$\overline{v_z} = \frac{\overline{\Delta z} \delta z}{\tau} \qquad \overline{v_x} = \frac{\overline{\Delta x} \delta x}{\tau} \tag{1}$$

where  $\delta z$  and  $\delta x$  are the spatial distances between adjacent depths and lines, respectively.
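The phase-shift principle can be illustrated in one dimension. This simplified sketch estimates an axial shift only; the method described above works on 2-D blocks with separate axial and lateral DFTs. A circular shift by $d$ samples rotates the DFT coefficient at bin $k$ by $-2\pi k d/N$. The shift $d$ can therefore be recovered from the phase of the cross-spectrum, averaged over several bins, and converted to velocity with Eq. (1). The bin choice, block length and function names are assumptions.

```python
import cmath
import math

# 1-D illustration of frequency-domain displacement estimation (axial only).
# The phase of X2[k] * conj(X1[k]) equals -2*pi*k*d/N for a shift of d samples;
# estimates from several bins are averaged, as in the multi-frequency scheme.


def dft_bin(x, k):
    """Single DFT coefficient X[k] = sum_n x[n] * exp(-j*2*pi*k*n/N)."""
    n_pts = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * n / n_pts) for n, v in enumerate(x))


def estimate_shift(block1, block2, bins):
    """Mean shift (in samples) of block2 relative to block1 over the given bins."""
    n_pts = len(block1)
    shifts = []
    for k in bins:
        phase = cmath.phase(dft_bin(block2, k) * dft_bin(block1, k).conjugate())
        shifts.append(-phase * n_pts / (2 * math.pi * k))
    return sum(shifts) / len(shifts)


def velocity(shift_samples, spacing_m, tau_s):
    """Eq. (1): mean velocity = mean displacement [samples] * spacing / tau."""
    return shift_samples * spacing_m / tau_s
```

Take a 64-sample block made of cosines at bins 3 to 7 and shifted by 1.5 samples: `estimate_shift` returns 1.5 up to rounding. With $\delta z = 0.01$ mm (Table 1) and a hypothetical PRF of 4.5 kHz, this corresponds to 0.0675 m/s. The shift must satisfy $|2\pi k d/N| < \pi$ at the highest bin used, otherwise the phase wraps.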

Partial block overlapping (79% along the axial direction and 95% along the lateral direction) was used to further average the estimates and reduce the influence of noise. The drawback is a larger number of blocks to be processed and longer execution times.

Figure 3 shows two examples of screenshots captured during the analysis of common carotid arteries of two volunteers.



Fig. 3 Vector maps a, b in two different carotid arteries

# 3 Computational Performance

The computational cost of the frequency-domain method was shown to be significantly lower than that of the conventional 2-D cross-correlation approach [5]. In particular, high-frame-rate 2-D displacement estimation in the frequency domain was shown to be 50 times more computationally efficient than other techniques, while providing comparable flow velocity estimates [2]. However, a CPU implementation of the method is still not fast enough for clinical applications that require real-time execution.

To increase processing speed, we propose a parallel processing approach on a GPU. Preliminary tests were conducted with the Parallel Computing Toolbox supplied with Matlab (The MathWorks Inc., Natick, MA, USA) to ease implementation. The results were encouraging, showing a $5 \times$ speed-up of the GPU over the bare CPU implementation. This is reasonable, though not fully satisfactory, since the Matlab GPU toolbox does not allow the programmer to fully control every step of the workflow.

Consequently, we implemented a first version of the same algorithm in C++ using the CUDA toolkit (Nvidia Corporation, Santa Clara, CA, USA). The code cyclically transfers a set of fresh RF samples, comprising several frames, to the GPU and arranges them in a buffer of adjustable size. Meanwhile, the data are processed in parallel by the thousands of cores of the device.

At every cycle, the code runs all the GPU kernel functions sequentially, one for each step of the algorithm. The number of blocks and threads per block was set specifically for each function to improve computational performance. Several code optimization techniques were exploited, such as coalesced memory accesses, tiling and asynchronous memory transfers. Particular attention was paid to careful allocation and arrangement of the available resources in GPU memory.

# 4 Experiments and Results

The performance tests were run on a PC equipped with an Intel i5-2320 processor (Intel Corporation, Santa Clara, CA, USA) and an Nvidia GeForce GTX TITAN Black GPU board. The GPU features 15 streaming multiprocessors with 192 cores each (2880 cores in total) and 6 GB of GDDR5 memory with a very fast internal bus (up to 336 GB/s bandwidth).

To simplify the test setup, stored experimental data have been used. The frame dimensions and the most important computational parameters are shown in Table 1. The axial  $(M_w)$  and lateral  $(N_w)$  dimensions of the output velocity vector matrix can be easily calculated as:

$$M_w = \frac{\text{Depths} - M_B + 1}{M_B - OD} \qquad N_w = \frac{\text{Lines} - N_B + 1}{N_B - OL} \tag{2}$$

where OD and OL are the numbers of overlapped depths and lines of consecutive blocks.
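Eq. (2) can be evaluated directly with the Table 1 values. The sketch keeps the integer part of each ratio. This reproduces $M_w = 42$ (1693/40 ≈ 42.3) and $N_w = 45$. The function name is illustrative.

```python
def output_size(depths, lines, m_b, od, n_b, ol):
    """Eq. (2), keeping the integer part of each ratio as in Table 1."""
    m_w = (depths - m_b + 1) // (m_b - od)
    n_w = (lines - n_b + 1) // (n_b - ol)
    return m_w, n_w


print(output_size(1842, 64, 150, 110, 20, 19))  # (42, 45)
```

With these settings, each velocity map therefore contains 42 × 45 = 1890 vectors.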

The tests involved running the optimized CPU Matlab code, the GPU Matlab code and the new GPU code. Figure 4a reports the performance of the three tested implementations in terms of frame rate.

The sustainable input frame rate increases from about 320 frame/s, obtained with the GPU Matlab code, to almost 4500 frame/s with the new GPU code.

The size of the buffer processed at each cycle is crucial for the real-time implementation. It was therefore varied between 2 and 2000 frames, and the resulting performance was compared.

The efficiency of the algorithm is maximized when only 20–40 frames are processed per cycle, which also minimizes output latency.

| Test parameter                               | Value     |
|----------------------------------------------|-----------|
| No. of frequencies ($n_f$)                   | 5         |
| Depth distance ($\delta z$)                  | 0.01 mm   |
| Line distance ($\delta x$)                   | 0.15 mm   |
| Depths                                       | 1842      |
| Lines                                        | 64        |
| $M_B$ (depths per block)                     | 150       |
| OD (overlapped depths of consecutive blocks) | 110 (79%) |
| $N_B$ (lines per block)                      | 20        |
| OL (overlapped lines of consecutive blocks)  | 19 (95%)  |
| $M_w$                                        | 42        |
| $N_w$                                        | 45        |

Table 1 Test parameters



Fig. 4 Sustainable input frame rates: **a** for the settings summarized in Table 1; **b** as a function of the number of frequencies

Lastly, reducing the number of DFT frequencies from $n_f = 5$ to $n_f = 3$, as suggested in [2], maintains good accuracy and further increases the frame rate up to 5380 frame/s (Fig. 4b).

# 5 Conclusions

In this work, an implementation of the high-frame-rate 2-D vector blood flow imaging method in C++ (CUDA) has been proposed, and its performance has been compared with that of a previous MATLAB version. The new code was refined until its results substantially coincided with those of the MATLAB code.

The GPU implementation of the algorithm has reached almost 4500 frame/s and is thus capable of supporting a PRF of 4.5 kHz. Further tests have evaluated the effect of reducing the number of DFT frequencies and the dependence of the frame rate on the buffer size. Future improvements may come from further optimization of the code, from a revision of the method, or from both. Since the devices we used (CPU, GPU and motherboard) are not top-of-the-range, the target PRF of 7 kHz may be considered achievable. Note that such a high PRF is redundant for real-time display (for which frame rates of 50 Hz might be sufficient), but it may enable further processing strategies aimed at improving image quality and robustness. Finally, since the PRF coincides with the Doppler signal sampling rate, high values are necessary to correctly analyze high flow velocities.
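To make the last point concrete: in pulsed Doppler the PRF sets the Nyquist limit, so the largest axial velocity that can be measured without aliasing is $v_{max} = c \cdot PRF / (4 f_0)$. The sketch below evaluates this classical axial limit for the two PRFs mentioned above. The speed of sound (1540 m/s, the usual soft-tissue value) and the 5 MHz transmit frequency are assumptions, since neither is given in this section. The example only illustrates how the limit scales with PRF.

```python
def nyquist_velocity(prf_hz, f0_hz, c=1540.0):
    """Maximum unambiguous axial velocity (m/s) for pulsed Doppler: c*PRF/(4*f0)."""
    return c * prf_hz / (4.0 * f0_hz)

F0_HZ = 5e6  # hypothetical transmit frequency, not taken from the paper
for prf in (4500, 7000):
    print(prf, "Hz ->", round(nyquist_velocity(prf, F0_HZ), 2), "m/s")
```

Under these assumptions, raising the PRF from 4.5 kHz to 7 kHz raises the aliasing limit from about 0.35 m/s to about 0.54 m/s.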

Besides minimizing processing time, real-time operation requires an ultrasound system such as ULA-OP 256 [6], developed at the University of Florence for research purposes, which implements high-frame-rate imaging methods and a suitable high-bandwidth data transfer system. For real-time processing at a PRF of 7 kHz, a communication bus sustaining at least 3.5 GB/s is necessary. This specification is compatible with the latest PCI Express standards. Ultimately, real-time implementation of this method will be feasible on future ultrasound platforms combining FPGA and GPU processing [7].
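As a back-of-the-envelope reading of the 3.5 GB/s requirement, dividing it by the 7 kHz PRF gives the data volume that must be moved for each transmitted frame. The sketch assumes 1 GB = 10^9 bytes and ignores bus protocol overhead.

```python
PRF_HZ = 7_000           # target PRF from the text
BUS_BYTES_PER_S = 3.5e9  # minimum bus throughput from the text (3.5 GB/s)

# Data that must be transferred per frame at the target PRF
bytes_per_frame = BUS_BYTES_PER_S / PRF_HZ
print(f"{bytes_per_frame / 1e3:.0f} kB per frame")
```

This works out to about 500 kB of raw data per frame.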

Acknowledgements The authors are grateful to Prof. Piero Tortoli for providing valuable suggestions and guidelines in this work.

#### References

- Tortoli, P., Lenge, M., Righi, D., Ciuti, G., Liebgott, H., Ricci, S.: Comparison of carotid artery blood velocity measurements by vector and standard doppler approaches. Ultrasound Med. Biol. 41(5), 1354–1362 (2015)
- Lenge, M., Ramalli, A., Boni, E., Liebgott, H., Cachard, C., Tortoli, P.: High-frame-rate 2-D vector blood flow imaging in the frequency domain. IEEE Trans. Ultrasound Ferroelectr. Freq. Control 61(9), 1504–1514 (2014)
- Ramalli, A., Basset, O., Cachard, C., Boni, E., Tortoli, P.: Frequency-domain-based strain estimation and high-frame-rate imaging for quasi-static elastography. IEEE Trans. Ultrasound Ferroelectr. Freq. Control 59(4), 817–824 (2012)
- Lenge, M., Ramalli, A., Tortoli, P., Cachard, C., Liebgott, H.: Plane-wave transverse oscillation for high-frame-rate 2-D vector flow imaging. IEEE Trans. Ultrasound Ferroelectr. Freq. Control 62(12), 2126–2137 (2015)
- Udesen, J., Gran, F., Hansen, K.L., Jensen, J.A., Thomsen, C., Nielsen, M.B.: High frame-rate blood vector velocity imaging using plane waves: simulations and preliminary experiments. IEEE Trans. Ultrasound Ferroelectr. Freq. Control 55(8), 1729–1743 (2008)
- Boni, E., Bassi, L., Dallai, A., Meacci, V., Ramalli, A., Scaringella, M., Guidi, F., Ricci, S., Tortoli, P.: Architecture of an ultrasound system for continuous real-time high frame rate imaging. IEEE Trans. Ultrasound Ferroelectr. Freq. Control 64, 1276–1284 (2017)
- Boni, E., Yu, A., Freear, S., Jensen, J., Tortoli, P.: Ultrasound open platforms for next-generation imaging technique development. IEEE Trans. Ultrasound Ferroelectr. Freq. Control 65(7), 1078–1092 (2018)

# A Field Experiment of Rainfall Intensity Estimation Based on the Analysis of Satellite-to-Earth Microwave Link Attenuation



# M. Colli, M. Stagnaro, A. Caridi, L. G. Lanza, A. Randazzo, M. Pastorino, D. D. Caviglia and A. Delucchi

**Abstract** Globally, the risks from extreme weather are significant and increasing, mainly because larger numbers of people and their assets are being exposed to floods. To cope with such events, a now-casting service would be valuable to validate weather alerts and to give civil protection offices timely information on precipitation. In this work, rain-monitoring sensors based on measuring the attenuation of the microwave signals transmitted by geostationary satellites are employed for this purpose. The proposed system estimates the rain rate in real time by inverting the propagation model. This paper describes the approach, the set-up and the preliminary results of a comparative field campaign currently under way in Genoa (Italy).

# 1 Introduction

In recent years, flash floods have occurred in many countries around the world, and our city (Genoa, Italy) was seriously affected in 2010, 2011, and 2014 [1].

When flash floods hit urban areas, many people, properties and industries can be severely damaged [2]. To reduce the negative impact on society, i.e. to manage emergencies efficiently, Civil Protection operators need efficient and reliable rainfall monitoring systems [3]. Unfortunately, existing systems typically lack real-time information about the areal distribution of heavy rain. This information is fundamental for the short-term forecasting (now-casting) of flood risk and for the management of the emergency phase, and it is a key element of Decision Support Systems (DSS).

M. Colli (🖂) · A. Randazzo · M. Pastorino · D. D. Caviglia

Department of Electrical, Electronics and Telecommunication Engineering and Naval Architecture (DITEN), Università di Genova, Genoa, Italy e-mail: matteo.colli@unige.it

M. Stagnaro · L. G. Lanza Department of Civil, Chemical and Environmental Engineering (DICCA), Università di Genova, Genoa, Italy

M. Colli · M. Stagnaro · L. G. Lanza WMO-CIMO Lead Centre "B. Castelli" on Precipitation Intensity, Genoa, Italy

A. Caridi · D. D. Caviglia · A. Delucchi Artys Srl, Genoa, Italy

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_17

In particular, validated rainfall maps computed from long-range weather radar (WR) measurements are usually updated no more often than every 10 min, and their spatial resolution is often insufficient to describe the variability of rain storms [4]. It is also well known that orographic shielding limits the applicability of WR to measuring orographic and convective rain in steep mountain valleys [5]. On the other hand, rain gauge networks rely on point-scale measurements that are generally not dense enough to provide the spatial distribution of rainfall over small basins [6], such as the complex system of Ligurian sub-catchments in Italy. Near-real-time measurements from rain gauge networks are possible only if the sensors are dynamically calibrated in the laboratory and specific algorithms are used to process the data (e.g. as proposed by Colli et al. [3]).

The Smart Rainfall System (SRS<sup>TM</sup>) [5] has been conceived as a contribution in this direction: it produces a map of estimated precipitation, in terms of one-minute rainfall amounts, by processing the attenuation of satellite microwave link signals. The system consists of a set of peripheral microwave sensors placed in the area of interest and connected to a central processing and analysis node. The SRS architecture relies on low-cost, low-power Internet of Things (IoT) nodes. In the present implementation, each node hosts up to four sensors connected to the same commercial satellite dishes that citizens use to receive digital television broadcasts. This makes it possible to envisage a participatory vision, where "*things*", people and processes are connected through the "*Internet of everything*", ultimately leading to accurate monitoring of the water system.

The present contribution reports the results obtained with three experimental SRS sensors installed in the city of Genoa (Italy) between 2016 and 2018. This is the first comparative field experiment organized at the urban scale involving several microwave link measuring stations. Dynamically calibrated tipping-bucket rain gauges (TBRG) provide the reference rainfall observations used for comparison [7]. The TBRG measurements have been processed with advanced algorithms developed by the Lead Centre "B. Castelli" on Precipitation Intensity of the World Meteorological Organization [8].

# 2 Methodology

To illustrate the working principle of the technique, consider a receiving (Rx) antenna located at position $\mathbf{r}_{Rx}$ (Fig. 1). The satellite television dish receives a plane-wave electromagnetic (EM) signal transmitted by a commercial digital video broadcasting satellite (DVB-S).



Since the propagating wave is attenuated by precipitation along its path, the rainfall intensity *RI* [mm h<sup>-1</sup>] can be expressed, using the ITU model described in [9], as:

$$RI = \sqrt[a_p]{\frac{10\log(P_o/P)}{b_p l}} \tag{1}$$

where $P \propto |\mathbf{E}(\mathbf{r}_{Rx})|^2$ is the power available at the antenna output, $\mathbf{E}(\mathbf{r}_{Rx})$ being the electric field impinging on the antenna [10], and $P_o$ is the clear-sky power at the antenna output. Both powers are expressed in linear units, so that $10\log(P_o/P)$ is the rain attenuation in dB. The quantities $b_p$ and $a_p$ are empirical parameters derived from the ITU model [9]. The microwave link length is assumed to be $l = H/\sin\vartheta$, where H [m] is the height of the melting layer at the given latitude and $\vartheta$ is the elevation angle of the Rx antenna (Fig. 1).
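Equation (1) is the inversion of the ITU power law $\gamma = b_p\, RI^{a_p}$ (written $k R^{\alpha}$ in ITU-R P.838), with the total attenuation $\gamma l$ measured as $10\log_{10}(P_o/P)$. The sketch below implements this inversion together with the slant-path length $l = H/\sin\vartheta$. Three assumptions apply. Lengths are in km, because the ITU coefficients give attenuation per km. The coefficients and geometry are placeholders, not the values used for the Turksat links. Negative attenuation is clamped to zero rain. The function names are ours.

```python
import math

def link_length_km(melting_layer_height_km, elevation_deg):
    """Slant path through the rain layer: l = H / sin(theta)."""
    return melting_layer_height_km / math.sin(math.radians(elevation_deg))

def rain_rate_mm_h(p_clear, p_rain, a_p, b_p, l_km):
    """Invert Eq. (1): RI = (10*log10(Po/P) / (b_p * l)) ** (1 / a_p).

    p_clear and p_rain are linear powers in the same unit. A non-positive
    attenuation (P >= Po, e.g. noise around the baseline) maps to zero rain.
    """
    attenuation_db = 10.0 * math.log10(p_clear / p_rain)
    if attenuation_db <= 0.0:
        return 0.0
    return (attenuation_db / (b_p * l_km)) ** (1.0 / a_p)

# Hypothetical values, chosen only to exercise the functions
A_P, B_P = 1.2, 0.02            # placeholders, not ITU coefficients for this link
l = link_length_km(3.0, 38.0)   # assumed 3 km melting-layer height, 38 deg elevation
print(f"l = {l:.2f} km, RI = {rain_rate_mm_h(1.0, 0.5, A_P, B_P, l):.1f} mm/h")
```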

As a first validation, experiments aimed at verifying the operation and effectiveness of the SRS sensor have been carried out in the laboratories of the University of Genoa (DITEN). The block diagram of the SRS sensor module is shown in Fig. 2. The sensor exploits the information contained in the power level of the signal at the output of an LNB (Low Noise Block converter), a component commonly used in the receiving chain of a consumer DVB-S set.

Fig. 2 Block diagram of the sensor module

For our purposes, it is convenient to use a low-cost Universal LNB. In this case, the down-converted signal on the down-lead cable contains (in the 915–2150 MHz band) one of four possible bands, corresponding to two different polarizations (vertical and horizontal) and, in the Ku band, to two different frequency bands (lower and upper).
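How one of the four bands is placed on the cable can be sketched with the standard Universal LNB parameters. These parameters are not stated in the paper and are assumed here: local oscillators at 9750 MHz (low band) and 10600 MHz (high band). The receiver selects the high band with a 22 kHz tone and the polarization with the 13/18 V supply voltage, the same voltages handled by the DC/DC converter of Fig. 3. The function name is ours.

```python
def universal_lnb_if_mhz(rf_mhz):
    """Return (IF in MHz, band) for a Ku-band carrier on a Universal LNB.

    Assumed standard Universal LNB values: LO = 9750 MHz for the low band
    (10.7-11.7 GHz) and 10600 MHz for the high band (11.7-12.75 GHz).
    The polarization (13 V vertical, 18 V horizontal) does not change the IF.
    """
    if 10_700 <= rf_mhz < 11_700:
        return rf_mhz - 9_750, "low"
    if 11_700 <= rf_mhz <= 12_750:
        return rf_mhz - 10_600, "high"
    raise ValueError("frequency outside the Ku range of a Universal LNB")

print(universal_lnb_if_mhz(11_000))  # (1250, 'low')
print(universal_lnb_if_mhz(12_000))  # (1400, 'high')
```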

Dedicated circuitry detects the level of the television signal as a voltage produced by a logarithmic detector (Fig. 2). This value enables estimation of the rain intensity averaged along the path, and it is transmitted over either a wired or a wireless link to a collection center for recording and monitoring. A detail of the SRS sensing board is shown in Fig. 3.
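Because a logarithmic detector outputs a voltage that is linear in the input power expressed in dB, the rain attenuation follows from the difference between the clear-sky voltage and the current voltage, and the detector intercept cancels out. Dividing by the link length then gives the path-averaged specific attenuation $\gamma_{rain}$ in dB/km, the quantity plotted in Fig. 6. The slope value below is hypothetical (the real one must come from the detector datasheet), and the function names are ours.

```python
def attenuation_db(v_clear, v_now, slope_v_per_db):
    """Rain attenuation (dB) from log-detector output voltages.

    Assumes V = slope * (P_dBm - intercept); the intercept cancels in the
    clear-sky minus current difference. slope_v_per_db carries its sign,
    so detectors with a negative slope are handled too.
    """
    return (v_clear - v_now) / slope_v_per_db

def specific_attenuation(att_db, link_km):
    """Path-averaged specific attenuation gamma_rain in dB/km."""
    return att_db / link_km

SLOPE = 0.02  # hypothetical 20 mV/dB, not a datasheet value
a = attenuation_db(1.00, 0.90, SLOPE)
print(round(a, 2), "dB ->", round(specific_attenuation(a, 4.0), 2), "dB/km")
```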

To test our approach, an experiment is under way in the city of Genoa (Italy) with three antennas pointed towards the Turksat 42°E satellite. The antennas have been installed at three different sites (named A1, A2 and A3 in Fig. 4) and form a subsystem of a larger set of microwave links to various satellites, selected for their favorable alignment as shown in Fig. 4 (red lines). The three microwave links span the same portion of territory with a slight offset, and sites A2 and A3 are equipped with tipping-bucket rain gauges (TBRG).

The experimental campaign started in November 2016, and the SRS voltage level signals are recorded at one-minute intervals.

The TBRGs have been dynamically calibrated by the WMO Lead Centre "B. Castelli" on Precipitation Intensity according to the national standard UNI 11452 [11], and the one-minute rainfall intensity (RI) measurements are computed using advanced interpretation algorithms [8].



**Fig. 3** Layout of the SRS sensing board: (1) decoder interface; (2) DC/DC converter (13/18 V to 5 V); (3) 75/50 $\Omega$ directional coupler; (4) LNA (MGA-86563, by Avago Technologies); (5) L-band BP filter; (6) LDO for the analog circuitry; (7) logarithmic power detector (AD8314, by Analog Devices). At the top, two F connectors lie mostly outside the figure: one allows connection to an optional decoder, while the other is dedicated to the connection with the LNB



**Fig. 4** Map of the experimental field site in Genoa (Italy) and horizontal projection of the microwave links. Sites A1 and A2 are equipped with tipping-bucket rain gauges (TBRG)



Fig. 5 Rainfall maps computed by the SRS system on November 11th 2016 from the extended network of sensors deployed in the city of Genoa (Italy); the panels show the time-space evolution of the RI field at 5-min intervals

# 3 Results and Discussion

Figure 5 shows an example of the rainfall maps computed by the SRS system from the extended network of sensors deployed in the city of Genoa (Italy). Each panel depicts the one-minute rainfall intensity (RI) field computed at a different moment of the precipitation event that occurred between 08:00:00 and 08:15:00 GMT+1 on November 11th 2016.

Figure 6 shows the time series of the microwave signal attenuation $\gamma_{rain}$ [dB km<sup>-1</sup>] (blue lines) measured by the three antennas located at sites A1, A2 and A3 and pointing towards Turksat 42°E; the reference rainfall intensity *RI* time series are shown as red lines.

A visual comparison of the time series in Fig. 6 reveals a clear correlation between the SRS signal attenuation and the *RI* measured by the rain gauges. The temporal and spatial variability of the precipitation event must be taken into account when evaluating the differences between the signals of antennas and TBRGs that are not co-located.

Currently, the RI estimates provided by the SRS antennas are obtained by solving Eq. (1). The main objective of the experimental campaign is to evaluate the accuracy of the intensity maps obtained by interpolating these estimates. Figure 7 gives an overview of the SRS sensor performance in terms of the linear correlation coefficient between the 10-min rainfall accumulations [mm] measured by SRS and the reference measurements, for a selection of eight precipitation events.
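The comparison metric of Fig. 7 involves two steps: converting one-minute intensities [mm h<sup>-1</sup>] into 10-min accumulations [mm], then computing a linear (Pearson) correlation coefficient between the SRS and gauge accumulations. The sketch below assumes non-overlapping windows and discards an incomplete final window. These windowing choices are ours, and the input values are made up.

```python
import math

def accumulations_mm(ri_mm_per_h, window_min=10):
    """Sum one-minute intensities (mm/h) into non-overlapping window totals (mm)."""
    per_minute_mm = [ri / 60.0 for ri in ri_mm_per_h]
    return [
        sum(per_minute_mm[i:i + window_min])
        for i in range(0, len(per_minute_mm) - window_min + 1, window_min)
    ]

def pearson_r(x, y):
    """Linear (Pearson) correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Made-up 10-min accumulations (mm) for one event, SRS vs. gauge
print(round(pearson_r([1.2, 3.4, 0.5, 2.0], [1.0, 3.9, 0.7, 1.8]), 3))
```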



**Fig. 6** Time series of the one-minute rainfall intensity RI [mm h<sup>-1</sup>] measured by the reference rain gauge (red line) and of the microwave signal attenuation $\gamma_{rain}$ [dB km<sup>-1</sup>] measured by the three SRS sensors (blue lines) on November 25th 2016



**Fig. 7** Linear correlation coefficients between 10 min rainfall accumulations measured by the SRS sensors located at the A1, A2 and A3 sites, and the reference rain gauges during the experimental campaign

Small liquid or solid particles in the atmosphere, such as cloud droplets and fog, as well as water vapor, are unlikely to affect the attenuation time series measured by the SRS sensors, given the Ku-band wavelength (approximately 3 cm). On the other hand, assimilating more detailed information about the height and extent of the melting layer into our methodology would make it possible to account for bright-band effects in the Ku band, e.g. following the approach proposed by Gray et al. [9]. Further experimental analysis will assess the role of the drop size distribution in determining the parameters of the attenuation/rainfall-rate model [11].

Another challenge for the operational application of the proposed technique is assessing the clear-sky power at the antenna output (also called the "*baseline*"), which is a fundamental parameter for estimating the rainfall rate (Eq. 1). The current methodology assumes that the power measured during the last no-rain minute (identified from the reference observations) remains constant throughout the precipitation event. The impact of different baseline detection methods on the accuracy of the system is currently under analysis.
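The baseline rule just described can be sketched as a hold of the last dry-minute power. During dry minutes the baseline tracks the measured power. During rain it stays frozen at the value of the last dry minute, where the rain flag comes from the reference observations. Returning `None` for rain minutes with no preceding dry minute is our choice, and the function name is ours.

```python
def hold_baseline(power, raining):
    """Clear-sky baseline per minute: the last dry-minute power, held during rain.

    power:   measured power per minute (any consistent unit)
    raining: per-minute rain flags from the reference observations
    Returns None for rain minutes that have no preceding dry minute.
    """
    baseline, last_dry = [], None
    for p, wet in zip(power, raining):
        if not wet:
            last_dry = p
        baseline.append(last_dry)
    return baseline

print(hold_baseline([10, 9, 8, 7, 9.5, 6], [False, False, True, True, False, True]))
```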

# References

- 1. Acosta-Coll, M., Ballester-Merelo, F., Martinez-Peiró, M., La Hoz-Franco, D.: Real-time early warning system design for pluvial flash floods—A review. Sensors **18**(7), 2255 (2018)
- 2. Balanis, C.A.: Antenna theory. McGraw-Hill (1989)
- 3. Colli, M., Lanza, L.G., Chan, P.W.: Co-located tipping-bucket and optical drop counter RI measurements and a simulated correction algorithm. Atmos. Res. 119, 3–12 (2013)
- 4. LaDue, D.S., Heinselman, P.L., Newman, J.F.: Strengths and limitations of current radar systems for two stakeholder groups in the southern plains. Bull. Amer. Meteor. Soc. 91, 899–910 (2010)
- 5. Germann, U., Galli, G., Boscacci, M., Bolliger, M.: Radar precipitation measurement in a mountainous region. Q. J. R. Meteorol. Soc. 132, 1669–1692 (2006)
- 6. Krajewski, W.F., Ciach, G.J., Habib, E.: An analysis of small-scale rainfall variability in different climatic regimes. Hydrol. Sci. J. 48(2), 151–162 (2003)
- 7. UNI: Metrological requirements and test methods for catching type gauges, Standard UNI 11452:2012 (2012)
- 8. Federici, B., Caviglia, D., Pastorino, M., Sguerso, D., Randazzo, A., Delucchi, A., Caridi, A.: System and method for monitoring a territory. Italian Patent UIBM No. 0001412786 - European Patent No. EP2688223 (2014)
- 9. Gray, W.R., Cluckie, I.D., Griffith, R.J.: Aspects of melting and the radar bright band. Meteorol. Appl. 8, 371–379 (2001)
- 10. Faccini, F., Luino, F., Sacchini, A., Turconi, L., De Graff, J.V.: Geohydrological hazards and urban development in the mediterranean area: an example from Genoa, Liguria, Italy. Nat. Hazards Earth Syst. Sci. 15, 2631–2652 (2015)
- 11. ITU: Specific attenuation model for rain for use in prediction methods, Standard ITU-R P.838 (2005)

# A Face Recognition System Using Off-the-Shelf Feature Extractors and an Ad-Hoc Classifier



### Stefano Marsi, Luca De Bortoli, Francesco Guzzi, Jhilik Bhattacharya, Francesco Cicala, Sergio Carrato, Alfredo Canziani and Giovanni Ramponi

**Abstract** Face recognition systems are of great interest in many applications. We present results from a comparison of different classification methods that use an open-source tool based on Convolutional Neural Networks to extract facial features. This work focuses on the performance obtainable from a multi-class classifier, trained with a reduced number of images, in identifying a person among a group of known and unknown subjects. The overall system has been implemented on an Odroid XU-4 platform.

**Keywords** Face detection  $\cdot$  Face recognition  $\cdot$  Deep learning  $\cdot$  Convolutional Neural Networks

### 1 Introduction

This work aims to assess the performance that can actually be achieved with the most recent open-source tools for face recognition. The range of application fields that require the recognition of a person is huge. In many of them a high degree of reliability is essential. Examples include surveillance and security systems (recognition of suspects, access control, detection of falsified documents, etc.) and aids for people with visual or cognitive impairments. Other areas, such as recreational or marketing applications (tools for tagging friends in a photograph, recognizing expressions or emotions, etc.), are less demanding in terms of reliability. Depending on the purpose, the robustness of the system against incorrect recognition (false positives and missed classifications) should be evaluated and optimized. Another factor of interest is the minimum number of images needed to adequately train the system. In many real-world applications only a small number of images of each subject is available, an important factor that is often disregarded in the literature.

S. Marsi (⊠) · L. De Bortoli · F. Guzzi · F. Cicala · S. Carrato · G. Ramponi University of Trieste, Trieste, Italy e-mail: marsi@units.it

J. Bhattacharya Thapar Institute of Engineering and Technology, Patiala, India

A. Canziani New York University, New York City, NY, USA

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_18

Many traditional face recognition systems rely on handcrafted techniques for feature representation and classification. Among tools for the visually impaired, for example, a Kinect-based wearable system [1] uses Histograms of Oriented Gradients (HOG), Principal Component Analysis (PCA) and K-Nearest Neighbors (KNN) classifiers. Krishna et al. [2] developed a PCA-based recognition system using sunglasses with a mounted pinhole camera, and a cane-mounted recognition device was implemented using image-set-based recognition [3]. A major reason for adopting simple handcrafted techniques in many systems is the difficulty of running state-of-the-art, high-performance deep learning algorithms on low-power wearable devices. Moreover, obtaining robust descriptors with deep neural networks for face recognition requires a huge training dataset. It must contain images of people of different ethnicities, genders and ages, in different poses and lighting conditions. Open issues also remain regarding the depth, width and mixture of the datasets. A suitable compromise, which we adopt in this paper, is to employ a pre-trained network followed by a user-specific compact classifier. The latter is preferably a shallow network that can perform satisfactorily on the user's limited sample set of faces, since with a limited number of samples a shallow network is generally argued to perform better than a deeper one. Some of these scenarios are discussed in [4]. Currently, two widely used open-source projects provide pre-trained Convolutional Neural Networks (CNNs) and Application Programming Interfaces (APIs) dedicated to face recognition: OpenFace [5] and Dlib [6].

### 2 Architecture of the System

Our goal is a small device, built from OEM hardware, that can be installed in confined spaces or worn. The proposed system follows the classical processing chain of many recognizers: an object detector, a feature extractor and a feature classifier. To detect people over a large area, the scene is acquired with a wide-angle full-HD ($1920 \times 1080$) camera, and the acquired frames are passed to a face detector. We compared different face detectors in terms of face detection accuracy and speed on portable hardware. The most promising results were obtained by the HOG-based detector included in the Dlib library and by the Viola-Jones detector [7] available in OpenCV. However, the Dlib implementation allows a quality threshold to be tuned so that most non-faces are rejected, which improves the robustness of the system.

The second step in the chain is feature extraction, carried out by a CNN that produces a descriptor vector for each input image. We tried two different open-source pre-trained networks, OpenFace NN-n4-v2 [5] and Dlib-Resnet [6]. OpenFace takes a 96 × 96 RGB image as input and, through a deep Inception [8] architecture, produces a 128-D vector lying on the unit hypersphere. This geometry (obtained through both triplet training and output normalization) makes the cosine distance a reasonable distance metric. Dlib-Resnet also produces 128-D features but uses $150 \times 150$ input images. As the name suggests, the network is a scaled-down version (with halved width) of the well-known ResNet-34 network [9]. It has been trained within the Dlib environment by enforcing a margin between the features of different identities (IDs). Both networks have been trained on a mixture of datasets composed mainly of Face Scrub [10], whose images have been aligned using five points. The precise location of these points in the face Region of Interest (RoI) is given by a landmark detection process based on the Kazemi-Sullivan algorithm [11], which is included in Dlib. The alignment is carried out using an affine transform whose parameters are found with the method of Umeyama [12].
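The parameter estimation in the alignment step can be illustrated with a minimal 2-D sketch. The sketch fits a similarity transform (scale, rotation and translation, a special case of the affine transform mentioned above) to matched landmark pairs by least squares, writing 2-D points as complex numbers. For proper rotations this should give the same solution as Umeyama's method [12]. The sketch is not Dlib's code, and the function name is ours.

```python
def fit_similarity_2d(src, dst):
    """Least-squares 2-D similarity transform dst ~ a*src + b (complex form).

    Points are complex numbers x + 1j*y. The complex factor a encodes the
    scale |a| and the rotation arg(a); b is the translation.
    """
    n = len(src)
    ms = sum(src) / n  # centroid of source landmarks
    md = sum(dst) / n  # centroid of target landmarks
    num = sum((d - md) * (s - ms).conjugate() for s, d in zip(src, dst))
    den = sum(abs(s - ms) ** 2 for s in src)
    a = num / den
    b = md - a * ms
    return a, b

# Toy landmarks: scale 2, rotation 90 degrees, shift (3, 1)
src = [0, 1, 1j, 1 + 1j]
dst = [2j * s + (3 + 1j) for s in src]
print(fit_similarity_2d(src, dst))
```

An aligned landmark `p` is then obtained as `a * p + b`.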

Finally, the features are classified to determine the correct ID. Three methodologies have been tested in this paper: a distance classifier (Rocchio), Support Vector Machines (SVM), and a Multi-Layer Perceptron (MLP). The first requires a reasonable distance metric (the Euclidean distance in the case of Dlib) and the computation of the centroid of each class distribution. Both the SVM and the MLP have been implemented with the methods provided by scikit-learn, using a suitable number of samples. The proposed MLP has two hidden layers of 100 nodes each and, naturally, as many output nodes as there are classes.
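The Rocchio step amounts to a nearest-centroid rule. The minimal pure-Python sketch below uses toy 2-D vectors in place of the 128-D Dlib descriptors, and the labels are hypothetical. The function returns the full distance ranking because the confidence measure of Sect. 3.2 needs the two smallest distances. For the MLP, scikit-learn's `MLPClassifier(hidden_layer_sizes=(100, 100))` would express the two-hidden-layer topology described above, although the paper's exact settings are not given.

```python
import math

def fit_centroids(samples_by_class):
    """Rocchio training: mean feature vector of each class."""
    centroids = {}
    for label, vectors in samples_by_class.items():
        n = len(vectors)
        centroids[label] = [sum(col) / n for col in zip(*vectors)]
    return centroids

def rank_classes(centroids, feature):
    """(label, Euclidean distance) pairs, nearest centroid first."""
    return sorted(
        ((label, math.dist(feature, c)) for label, c in centroids.items()),
        key=lambda pair: pair[1],
    )

train = {"alice": [[0.0, 0.0], [0.0, 2.0]], "bob": [[10.0, 10.0], [10.0, 12.0]]}
cents = fit_centroids(train)
print(rank_classes(cents, [1.0, 1.0])[0][0])  # alice
```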

The overall system has been implemented on an Odroid XU-4 platform. The inference time depends on the number of faces in the scene; with a couple of faces, the system analyzes a single full-HD frame in about 1.8 s.

### 3 Training of the Classifier and System Performance

The dataset used to train and test the classifier consists of about 13,000 images divided into 34 classes of known people: 30 subjects were taken from the well-known CASIA database [13], and the other 4 were acquired by ourselves. We call this dataset $Kp_{db}$; it represents the group of known people that the system must be able to recognize, and it contains hundreds of samples for each subject. We also use a second dataset, called $Up_{db}$, of about 20,000 images of completely unknown people. It serves to test whether the system avoids confusing strangers with known people. $Up_{db}$ includes hundreds of different subjects, with images both acquired by ourselves and collected from the Internet.

### 3.1 ID Recognition in a Closed Number of Classes

The first test used only $Kp_{db}$, to show the system's performance in a simple task, namely recognizing one subject within a limited number of classes. In particular, we wanted to evaluate how the number of training samples used to configure the classifiers influences the results, and to compare the performance obtainable with different feature extractors and different classifiers.

A predetermined number of samples per person was extracted randomly from $Kp_{db}$; these samples form the training set used to set up the different classifiers. All the remaining samples in the dataset were split between a validation set and a test set in a 20:80 proportion. The validation set is used, if needed, to monitor and stop the training process, while the test set is used to verify the system performance. Each test was repeated 20 times, each time randomly extracting the training samples, training the classifiers and evaluating them on the test set. The performance metric is the misclassification rate, i.e. the percentage of misclassified samples over the total number of test samples, as defined in Eq. 1:

$$MsC = \frac{N_s - \sum_{i=1}^{N} Tp_i}{N_s} \times 100 \tag{1}$$

where $N$ is the number of classes (34 in our case), $Tp_i$ is the number of true positives for the $i$-th class and $N_s$ is the number of samples used in the test.
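The protocol and metric above can be sketched in a few lines. Training samples are drawn per class, and the pooled remainder is split 20:80 into validation and test sets. Eq. (1) then reduces to the percentage of test samples whose predicted ID differs from the true one, since $\sum_i Tp_i$ counts the correct predictions. Pooling the remainder before the split, and the function names, are our assumptions.

```python
import random

def split_dataset(samples_by_class, n_train, val_fraction=0.2, seed=None):
    """n_train random samples per class for training; the pooled remainder
    is split val_fraction : (1 - val_fraction) into validation and test sets."""
    rng = random.Random(seed)
    train, rest = [], []
    for label, items in samples_by_class.items():
        shuffled = list(items)
        rng.shuffle(shuffled)
        train += [(x, label) for x in shuffled[:n_train]]
        rest += [(x, label) for x in shuffled[n_train:]]
    rng.shuffle(rest)
    n_val = round(val_fraction * len(rest))
    return train, rest[:n_val], rest[n_val:]

def misclassification_pct(true_labels, predicted_labels):
    """Eq. (1): MsC = (N_s - sum_i Tp_i) / N_s * 100."""
    n_s = len(true_labels)
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return (n_s - correct) / n_s * 100.0

print(misclassification_pct(["a", "a", "b", "b"], ["a", "b", "b", "b"]))  # 25.0
```

Repeating the split with different seeds (20 times in the text) and averaging `misclassification_pct` gives the mean values plotted in Fig. 1.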

The results of this test are shown in Fig. 1, which reports for each configuration the mean value over the repeated tests and the standard deviation (as a superimposed bar). The graphs show that Dlib always outperforms OpenFace, regardless of the classifier used. Several other tests, not reported here for the sake of brevity, have been conducted to compare the two feature extractors, and in all of them the system based on Dlib outperformed the one based on OpenFace. This leads us to conclude that Dlib's neural network has been better trained than OpenFace's to recognize the salient characteristics of a face. Hence, only Dlib will be considered as the feature extractor for the overall system. Note also that setting up the classifiers does not seem to require a large amount of data: a few tens of samples prove sufficient to obtain good performance.

### 3.2 Confidence Evaluation

These results can be significantly improved by introducing a confidence assessment of the classification. As a confidence measure, we introduce the normalized distance between the scores of the two most likely classes, as defined in Eq. 2:



Fig. 1 Percentage of misclassified testing samples using two feature extractors (Dlib and OpenFace) and three classifiers (Rocchio, SVM and MLP), vs. the number of training samples

$$C = \frac{|d_1 - d_2|}{\max(d_1, d_2)} \tag{2}$$

where $d_1$ and $d_2$ are, depending on the classifier:

- for the Rocchio classifier: the lowest and second-lowest Euclidean distances between the feature vector under examination and the class centroids;
- for the SVM classifier: the highest and second-highest class probabilities;
- for the MLP classifier: the largest and second-largest "logits", i.e. the outputs of the layer preceding the softmax operation.
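Eq. (2) and the rejection rule it enables can be sketched as follows. The ranked scores are the two best (label, score) pairs from any of the three classifiers, best first. Because C is symmetric in $d_1$ and $d_2$, the same function serves for distances (smaller is better) and for probabilities or logits (larger is better). Results with C below the threshold are labeled "unknown", as in the test described next. The sketch assumes positive scores, and the function names are ours.

```python
def confidence(d1, d2):
    """Eq. (2): C = |d1 - d2| / max(d1, d2), assuming positive scores."""
    return abs(d1 - d2) / max(d1, d2)

def decide(ranked_scores, threshold):
    """Top label if its confidence reaches the threshold, else "unknown".

    ranked_scores: (label, score) pairs sorted best-first, e.g. Rocchio
    distances nearest-first or SVM probabilities highest-first.
    """
    (best_label, d1), (_, d2) = ranked_scores[0], ranked_scores[1]
    return best_label if confidence(d1, d2) >= threshold else "unknown"

print(decide([("alice", 1.0), ("bob", 4.0)], threshold=0.5))    # alice
print(decide([("alice", 0.51), ("bob", 0.49)], threshold=0.5))  # unknown
```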

This measure closely resembles the criterion used in [1], where the system expresses doubt in its feedback when the ratio between the votes of the second-ranked and the top-ranked classes is close to 1. An appropriate threshold on C makes it possible to discard low-confidence results. This considerably reduces the misclassification error, at the cost of a partial decrease in the TP ratio. Most importantly, a confidence threshold is essential to discriminate between known and unknown persons.

To verify this aspect, we carried out the following test. The three classifiers were set up using a variable number of samples extracted from $Kp_{db}$ (as in the previous test), and each system was tested using both the remaining $Kp_{db}$ samples and all the $Up_{db}$ samples. For each confidence threshold, all the results below the threshold were rejected and labeled as "unknown". Ideally, none or very few of the samples belonging to $Up_{db}$ should be erroneously classified as one of the subjects in $Kp_{db}$.

Fig. 2 ROC comparison of the three classifiers trained with different numbers of samples. Curves are parametrized by the confidence threshold

The parameters adopted to evaluate the overall performance are:

- the True Positive Ratio (TPr), i.e. the ratio between the number of correctly classified samples belonging to $Kp_{db}$ and the total number of $Kp_{db}$ samples used during the test;
- the False Positive Ratio (FPr), i.e. the ratio between the number of $Up_{db}$ samples that exceed the threshold (and have therefore been erroneously classified) and the total number of $Up_{db}$ samples.
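The two ratios, and the threshold sweep that traces the ROC curves of Fig. 2, can be sketched as follows. Each known sample carries a correctness flag and its confidence C. Each unknown sample carries only its confidence, since any accepted unknown sample is a false positive. We assume that a known sample counts as a true positive only if it is both correct and accepted. The function names are ours.

```python
def roc_point(known_results, unknown_confidences, threshold):
    """TPr and FPr at one confidence threshold.

    known_results: (is_correct, C) for every Kp_db test sample.
    unknown_confidences: C for every Up_db sample.
    """
    tp = sum(1 for ok, c in known_results if ok and c >= threshold)
    fp = sum(1 for c in unknown_confidences if c >= threshold)
    return tp / len(known_results), fp / len(unknown_confidences)

def roc_curve(known_results, unknown_confidences, steps=11):
    """(threshold, TPr, FPr) for evenly spaced thresholds in [0, 1]."""
    thresholds = [i / (steps - 1) for i in range(steps)]
    return [(t, *roc_point(known_results, unknown_confidences, t)) for t in thresholds]

known = [(True, 0.9), (True, 0.2), (False, 0.8)]
unknown = [0.1, 0.6]
for point in roc_curve(known, unknown, steps=3):
    print(point)
```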

The test was performed using different threshold values within the [0, 1] range and different numbers of training samples. Each test was repeated ten times to evaluate the mean value and the variance of the results. The results, shown as ROC curves, are reported in Fig. 2. With suitable training, all methods can reach an almost zero false positive ratio. However, the MLP classifier achieves this while still yielding more than 50% true positives, even when trained with just a couple of samples; conversely, the SVM-based classifiers need at least twenty training samples to provide acceptable results. We conclude that the MLP performs slightly better than the others, at the cost of a more complex structure. Moreover, the SVM relies on a statistical analysis and therefore cannot be used if only a few samples are available.

### 4 Conclusions

In this paper we have analyzed the performance of a face recognition system based on an off-the-shelf face detector and feature extractor followed by a suitable classifier. Evaluating the confidence of the results keeps false recognitions close to zero, while the true positive ratio can be kept above 50% even when the classifier is trained on a very limited number of samples. The overall system, implemented on an Odroid XU-4 platform, recognizes faces at distances of up to 4 m at a speed of 1.8 seconds per frame. Whether this performance is adequate obviously depends on the specific application in which the face recognition system is used, but in our opinion it is largely sufficient for most practical applications.

Acknowledgements University of Trieste-FRA projects.

### References

1. Neto, L.B., Grijalva, F., Maike, V.R.M.L., Martini, L.C., Florencio, D., Baranauskas, M.C.C., Rocha, A., Goldenstein, S.: A kinect-based wearable face recognition system to aid visually impaired users. IEEE Trans. Hum. Mach. Syst. 47(1), 52–64 (2017)
2. Krishna, S., Little, G., Black, J., Panchanathan, S.: A wearable face recognition system for individuals with visual impairments. In: Proceedings of the 7th International ACM SIGACCESS Conference on Computers and Accessibility, pp. 106–113 (2005)
3. Astler, D., et al.: Increased accessibility to nonverbal communication through facial and expression recognition technologies for blind/visually impaired subjects. In: Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility, pp. 259–260 (2011)
4. Bansal, A., Castillo, C., Ranjan, R., Chellappa, R.: The Do's and Don'ts for CNN-based Face Verification (2017)
5. Amos, B., Ludwiczuk, B., Satyanarayanan, M.: OpenFace: A general-purpose face recognition library with mobile applications. Technical Report CMU-CS-16-118, CMU School of Computer Science (2016)
6. http://blog.dlib.net/2017/02/high-quality-face-recognition-with-deep.html. Accessed 12 July 2018
7. Viola, P., Jones, M.: Robust real-time face detection. Int. J. Comput. Vis. 57(2), 137–154 (2004)
8. Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A.: Going deeper with convolutions. https://arxiv.org/abs/1409.4842
9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. https://arxiv.org/abs/1512.03385
10. Ng, H.W., Winkler, S.: A data-driven approach to cleaning large face datasets. In: Proceedings of the IEEE International Conference on Image Processing (ICIP), pp. 27–30. Paris, France (2014)
11. Kazemi, V., Sullivan, J.: One millisecond face alignment with an ensemble of regression trees. In: 2014 IEEE Conference on Computer Vision and Pattern Recognition
12. Umeyama, S.: Least-squares estimation of transformation parameters between two point patterns. IEEE Trans. Pattern Anal. Mach. Intell. 13(4), 376–380 (1991)
13. Yi, D., Lei, Z., Liao, S., Li, S.Z.: Learning face representation from scratch (2014). https://arxiv.org/abs/1411.7923

# **Twiddle Factor Generation Using Chebyshev Polynomials and HDL for Frequency Domain Beamforming**



Ghattas Akkad, Ali Mansour, Bachar ElHassan, Frederic Le Roy and Mohamad Najem

**Abstract** Twiddle factor generation is considered a computationally intensive task in generic-length, high-resolution FFT operations. In order to accelerate twiddle factor generation, we propose a reconfigurable hardware architecture based on Chebyshev polynomial expansion for computing the cosine and sine trigonometric functions under finite-precision arithmetic. We show that our approach provides a flexible three-decimal-digit precision output for variable-length FFT operations, since the same design can be used for any power-of-2 FFT length. In particular, this study focuses on communication systems incorporating frequency domain beamforming algorithms for single and multiple beams. The proposed architecture is competitive with classical designs, i.e. the Coordinate Rotation Digital Computer (CORDIC) and Taylor series, by providing low-latency, high-precision twiddle factors for variable-length FFT.

**Keywords** FFT  $\cdot$  FPGA  $\cdot$  Accelerated computing  $\cdot$  VHDL  $\cdot$  Beamforming  $\cdot$  Twiddle factor  $\cdot$  Chebyshev  $\cdot$  Frequency domain

G. Akkad (✉) · A. Mansour · F. Le Roy

Lab-STICC, CNRS, UMR 6285, ENSTA Bretagne, Brest, France e-mail: Ghattas.akkad@ensta-bretagne.org

A. Mansour e-mail: Ali.Mansour@ensta-bretagne.fr

F. Le Roy e-mail: Frederic.LE\_ROY@ensta-bretagne.fr

G. Akkad

Department of Electrical Engineering, University of Balamand, Koura, Lebanon

B. ElHassan Faculty of Engineering, Lebanese University, Tripoli, Lebanon e-mail: bachar.elhassan@gmail.com

M. Najem Computer and Communication Engineering, Lebanese International University, Mount Lebanon, Lebanon e-mail: najem.mhd@gmail.com

© Springer Nature Switzerland AG 2019 S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_19

### 1 Introduction

Beamforming, or spatial filtering, is of particular importance in wireless communication and multiple-input multiple-output (MIMO) systems, since steering the beam and the nulls of an antenna array towards specific directions increases the signal-to-noise ratio (SNR) and nulls interfering signals [1, 2]. Beamforming techniques can be time or frequency domain based, and are often implemented on dedicated hardware, e.g. FPGAs or digital signal processors (DSPs), for real-time performance.

In the time domain, phase and delay corrections are performed on the entire input signal at a high sampling rate [3], which increases design complexity and processing requirements when implemented on dedicated hardware. In contrast, frequency domain beamforming divides the input signal into parallel data streams of narrower frequency components, each processed separately [3]; it is therefore preferable on dedicated hardware, where it lends itself to parallelism and pipelining. This decomposition is achieved by applying the Fast Fourier Transform (FFT) or its inverse (IFFT), considered one of the most computationally intensive operations [4]. In addition, twiddle factor precision greatly affects the overall design accuracy and output resolution.
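The frequency-domain principle described above can be sketched with NumPy as a simple delay-and-sum beamformer: each sensor signal is transformed with an FFT, every frequency bin is phase-rotated to compensate that sensor's steering delay, the channels are summed, and an IFFT returns to the time domain. This is an illustrative model only, assuming known per-sensor delays in samples and a circular (FFT-periodic) signal; `fd_delay_and_sum` is a hypothetical name and not the architecture studied in this paper.

```python
import numpy as np


def fd_delay_and_sum(signals, delays):
    """Frequency-domain delay-and-sum beamformer (illustrative sketch).

    signals -- array of shape (M, N): one row of N samples per sensor
    delays  -- steering delay of each sensor, in samples (may be fractional)
    Bin k of sensor m is multiplied by exp(+j 2 pi f_k d_m), which advances that
    channel by d_m samples (circularly) before the channels are summed.
    """
    signals = np.asarray(signals)
    n = signals.shape[1]
    spectra = np.fft.fft(signals, axis=1)                  # per-sensor FFT
    freqs = np.fft.fftfreq(n)                              # bin frequencies in cycles/sample
    phase = np.exp(2j * np.pi * np.outer(delays, freqs))   # per-bin steering phase
    beam = (spectra * phase).sum(axis=0)                   # align and sum the channels
    return np.fft.ifft(beam)                               # back to the time domain
```

When the steering delays match the actual arrival delays, the channels add coherently (an $M$-fold gain for $M$ sensors); mismatched delays leave them misaligned and the output is attenuated, which is what gives the array its directivity.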

In this context, FPGAs are chosen as the main processing unit for their ability to support massive parallelism, pipelining and hardware reuse while increasing throughput, speed and performance [5]. Given the requirements of beamforming applications, namely real-time beam steering and interference cancellation while processing an increasing number of data streams in MIMO systems, it is of great interest to accelerate such algorithms on FPGAs while maintaining high resolution with a low-complexity architecture.

In this work, Chebyshev polynomial expansion [6] is used to compute the trigonometric functions required for variable-length FFT/IFFT twiddle factor generation, offering three-digit precision with a low-complexity architecture for a fifth-order polynomial.

Many hardware optimization techniques for efficient FFT computation on FPGA have been proposed in [7–12]. However, these designs focus on implementing an optimized butterfly unit with precomputed twiddle factors for a fixed-length FFT. In contrast, [13–16] present memory-efficient techniques for twiddle factor generation using the Coordinate Rotation Digital Computer (CORDIC) algorithm, although the use of internal memory cannot be avoided entirely. Furthermore, a disadvantage of CORDIC is that a large number of iterations is required to achieve high precision, resulting in considerable latency in pipelined architectures [6].

While previous work focuses on optimizing the CORDIC architecture for real-time, high-precision twiddle factor generation, none has considered the Chebyshev polynomial approach. This paper proposes an alternative approach for implementing a low-latency, high-precision twiddle factor generation technique using Chebyshev polynomial expansion for variable power-of-2-length FFT/IFFT operations on FPGA.

Moreover, the presented hardware architecture is a low-latency, memoryless design that supports parallelism and pipelining.

The paper is organized as follows: Sect. 2 presents a summary of the FFT algorithm and describes the Chebyshev polynomial expansion for the required trigonometric functions. Then, Sects. 3 and 4 detail simulation and synthesis results using MATLAB and HDL; Sect. 5 presents a comparison with classical approaches. Finally, the conclusion and suggested future work are presented in Sect. 6.

### 2 Chebyshev Polynomial Expansion

In order to achieve a high-resolution FFT for accurate beam steering, a low-latency, high-precision twiddle factor generation architecture for a generic FFT length is proposed using Chebyshev approximation. This section briefly reviews the FFT and describes the approximation of trigonometric functions using Chebyshev polynomials and coefficients. The following notation is used: s[n] is a discrete signal sample, S[k] an FFT output sample, n the input sample index, k the output sample index and N the signal length.

### 2.1 FFT Overview

The FFT is an efficient technique for computing the discrete Fourier transform (DFT) described in Eq. (1). Its effectiveness lies in reducing the number of operations required, especially the number of multiplications (from the order of $N^2$ for a direct evaluation of Eq. (1) to the order of $N \log_2 N$), which further reduces execution time and complexity [1, 7].

$$S[k] = \sum_{n=0}^{N-1} s[n] e^{-j2\pi nk/N} \tag{1}$$
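For reference, Eq. (1) can be evaluated directly as a matrix-vector product. The short NumPy sketch below (the name `dft` is chosen for illustration) costs on the order of $N^2$ complex multiplications, which is precisely the cost the FFT avoids; it can be checked against `numpy.fft.fft`.

```python
import numpy as np


def dft(s):
    """Direct evaluation of Eq. (1): S[k] = sum_n s[n] * exp(-j 2 pi n k / N)."""
    s = np.asarray(s, dtype=complex)
    n_len = s.size
    n = np.arange(n_len)
    k = n.reshape(-1, 1)                      # column of output indices
    # N x N matrix of complex exponentials: O(N^2) multiplications in total
    return np.exp(-2j * np.pi * k * n / n_len) @ s
```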

The first FFT algorithms, proposed by Cooley and Tukey, are the radix-2 decimation in time (DIT) and the radix-2 decimation in frequency (DIF) algorithms [7]. Given that N is a power of 2, the radix-2 DIT (DIF) FFT iteratively splits the sequence s[n] (S[k]) of length N into even- and odd-indexed samples. The samples are then multiplied by rotations, or twiddle factors, denoted  $W_N^k$, where

$$W_N^k = e^{-j2\pi k/N} \tag{2}$$

The multiply-add and multiply-subtract operations are represented by the butterfly. The radix-2 algorithm can be used to compute any power-of-2 FFT; however, higher-radix (radix-r) algorithms can reduce the number of pipeline stages [1]. The pursuit of lower computational complexity has led to the implementation of the radix-4 FFT, in which the number of data points is a power of 4 [8]. The radix-4 FFT decomposes the DFT into four independent sub-transforms of size N/4 [1]. The typical structure of a radix-2 DIT butterfly is shown in Fig. 1.

As shown in Fig. 1, the butterfly, considered the core unit of the FFT, multiplies the input samples s[n] by the appropriate twiddle factors [11]. Thus, twiddle factor precision greatly affects the overall resolution. Moreover, high-precision precomputed twiddle factors require additional storage elements.
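To show where the twiddle factors enter, the following recursive radix-2 DIT FFT in NumPy computes $W_N^k$ from Eq. (2) at each stage and applies the multiply-add / multiply-subtract butterfly. It is a textbook software formulation given for illustration (the function names are hypothetical), not the hardware butterfly of Fig. 1.

```python
import numpy as np


def twiddle(k, n):
    """Twiddle factor W_N^k = exp(-j 2 pi k / N), Eq. (2)."""
    return np.exp(-2j * np.pi * k / n)


def fft_radix2_dit(s):
    """Recursive radix-2 decimation-in-time FFT; len(s) must be a power of 2."""
    s = np.asarray(s, dtype=complex)
    n = s.size
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    if n == 1:
        return s.copy()
    even = fft_radix2_dit(s[0::2])      # DFT of the even-indexed samples
    odd = fft_radix2_dit(s[1::2])       # DFT of the odd-indexed samples
    k = np.arange(n // 2)
    t = twiddle(k, n) * odd             # one complex multiplication per butterfly
    # Butterfly: multiply-add for the first half, multiply-subtract for the second
    return np.concatenate([even + t, even - t])
```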

### 2.2 Chebyshev Polynomial Representation

The CORDIC algorithm discussed in [14–16] approximates trigonometric functions using shift-add operations at a low to moderate computational cost [6]. However, since the number of bits of precision grows linearly with the number of iterations (roughly one bit per iteration), approximating functions with high precision results in increased latency for pipelined architectures [6]. Thus, an alternative low-latency, high-precision approximation technique is required for parallel and pipelined implementations.
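The iteration/precision trade-off of CORDIC can be observed with a floating-point model of the classic rotation-mode algorithm. In this Python sketch (a generic textbook formulation, not the optimized designs of [14–16]), each iteration applies a micro-rotation by $\pm\arctan(2^{-i})$, so the residual angle, and hence the error, roughly halves per iteration; a pipelined implementation needs one stage per iteration.

```python
import math


def cordic_sin_cos(theta, iterations):
    """Rotation-mode CORDIC for theta in [-pi/2, pi/2] (floating-point model).

    Returns (cos(theta), sin(theta)). In hardware each iteration is a pair of
    shift-and-add operations; the constant CORDIC gain is pre-compensated here
    by starting from x = K instead of x = 1.
    """
    gain = 1.0
    for i in range(iterations):
        gain /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0          # rotate towards the residual angle
        # x - d*y*2^-i and y + d*x*2^-i are shifts and adds in hardware
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)        # angle from a small lookup table
    return x, y
```

With 8 iterations the error can reach a few thousandths, while 16 iterations bring it down to a few hundred-thousandths, illustrating why high precision translates into a long pipeline.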

For some functions, a Taylor series approximation converges fast enough to provide sufficient precision. However, for other functions, e.g. sine and cosine, a Chebyshev approximation gives better precision with fewer polynomial terms [6]. Chebyshev polynomials are orthogonal polynomials of the form:

$$T_k(u) = \cos(k \arccos u) \tag{3}$$

where *u* is defined over the interval [-1, 1]; a function can then be approximated by the truncated series of *L* terms shown in Eq. (4).

$$f(u) \cong \sum_{k=0}^{L-1} c(k)T_k(u) \tag{4}$$
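Eqs. (3) and (4) can be checked numerically. The Python sketch below evaluates $T_k(u)$ both from the closed form of Eq. (3) and from the standard three-term recurrence, which avoids the arccosine, and sums a truncated series of $L$ terms. The function names are illustrative; a natural check is to use the cosine coefficients reported in Table 1 at $u = -0.75$, i.e. $x = 1/8$, where the result should be close to $\cos(\pi/8) \approx 0.9239$.

```python
import math


def chebyshev_t(k, u):
    """T_k(u) from the recurrence T_0 = 1, T_1 = u, T_{k+1} = 2u T_k - T_{k-1}."""
    if k == 0:
        return 1.0
    t_prev, t_curr = 1.0, u
    for _ in range(k - 1):
        t_prev, t_curr = t_curr, 2.0 * u * t_curr - t_prev
    return t_curr


def chebyshev_t_closed(k, u):
    """T_k(u) = cos(k arccos u), Eq. (3); valid for u in [-1, 1]."""
    return math.cos(k * math.acos(u))


def chebyshev_series(c, u):
    """Truncated expansion of Eq. (4): sum of c[k] * T_k(u) over L = len(c) terms."""
    return sum(ck * chebyshev_t(k, u) for k, ck in enumerate(c))
```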

For twiddle factor generation, the cosine and sine functions, representing the real and imaginary components described in Eqs. (5)–(7), are approximated over the interval [0, 1].

$$\mathrm{Re}\{f(x)\} = \cos(\pi x) \tag{5}$$

$$\mathrm{Im}\{f(x)\} = \sin(-\pi x) \tag{6}$$

$$f(x) = \cos(\pi x) - j\,\sin(\pi x) \tag{7}$$

where the approximation is carried out in the variable *u*, obtained by mapping *x* from the interval [a, b] = [0, 1] to [-1, 1] with Eq. (8) [6].

$$u = \frac{2x - b - a}{b - a} \underset{a=0,b=1}{\Longrightarrow} 2x - 1 \tag{8}$$

Eqs. (5) and (6) are then expanded into their Chebyshev polynomial representation by computing the coefficients in the variable u of Eq. (8). Here, x is the twiddle factor exponent defined by Eq. (9), in which k is the rotation index and N is the FFT length.

$$x = \frac{2k}{N} \tag{9}$$

When the FFT length *N* is a power of two, substituting Eq. (9) into Eq. (8) gives [7]:

$$u = k2^{-r+2} - 1 \tag{10}$$

where r is given by Eq. (11). Since $u = 4k/N - 1$, the multiplication and division reduce to shifting k by r − 2 bit positions, which eliminates the need for a multiplier or divider.

$$r = \log_2 N \tag{11}$$
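The shift-based evaluation of Eq. (10) can be modeled with Python integers. This sketch assumes that u is produced in the Q2.15 format used in Sect. 5 (15 fractional bits) and that $0 \le k \le N/2$; `u_fixed` is a hypothetical helper, not part of the described HDL.

```python
def u_fixed(k, r, frac_bits=15):
    """Integer code of u = k * 2**(2 - r) - 1 (Eq. (10)) with frac_bits fractional bits.

    N = 2**r is the FFT length and 0 <= k <= N/2, so u lies in [-1, 1].
    The scaling by 4/N is a pure shift: left by (frac_bits + 2 - r) bits when that
    amount is non-negative, otherwise a truncating right shift by (r - 2 - frac_bits).
    """
    shift = frac_bits + 2 - r
    scaled = k << shift if shift >= 0 else k >> -shift
    return scaled - (1 << frac_bits)       # subtract 1.0 in the same format
```

For example, with $N = 8$ ($r = 3$) and $k = 1$, k is shifted left by 14 bits and 32768 is subtracted, giving $-16384$, i.e. $u = -0.5$.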

Moreover, a generalized form for computing the exponent of the N terms of a discrete Fourier transform (DFT) is obtained by modifying Eq. (10) as in Eq. (12), where n is the discrete signal sample index. This allows the rotation index of a variable power-of-2-length DFT/FFT to be computed at minimal cost [8].

$$u = nk2^{-r+2} - 1 \tag{12}$$

### 2.3 Chebyshev Coefficients

The fifth-order Chebyshev polynomial coefficients for Eqs. (5) and (6), satisfying Eq. (4), are computed using the *Chebfun* toolbox in MATLAB for u in the interval [-1, 1] and listed in Table 1.

In addition, the expanded coefficients satisfying Eq. (13) are computed and listed in Table 2.

Table 1 Chebyshev coefficients

| f(u)   | *c*(0)  | *c*(1)  | *c*(2) | *c*(3) | *c*(4)  | *c*(5)  |
|--------|---------|---------|--------|--------|---------|---------|
| sine   | -0.4720 | 0       | 0.4994 | 0      | -0.0279 | 0       |
| cosine | 0       | -1.1336 | 0      | 0.1380 | 0       | -0.0045 |

Table 2 Expanded Chebyshev coefficients

| f(u)   | *a*(0)  | *a*(1)  | *a*(2) | *a*(3) | *a*(4)  | *a*(5)  |
|--------|---------|---------|--------|--------|---------|---------|
| sine   | -0.9994 | 0       | 1.2227 | 0      | -0.2239 | 0       |
| cosine | 0       | -1.5707 | 0      | 0.6435 | 0       | -0.0729 |

Table 3 General form coefficients

| f(x)   | *g*(0)  | *g*(1)  | *g*(2)  | *g*(3)  | *g*(4)  | *g*(5)  |
|--------|---------|---------|---------|---------|---------|---------|
| sine   | -0.0006 | -3.0997 | -0.4824 | 7.1643  | -3.5821 | 0       |
| cosine | 1.0001  | -0.0099 | -4.8041 | -0.6871 | 5.8348  | -2.3339 |

$$f(u) \cong \sum_{k=0}^{L-1} a(k)u^k \tag{13}$$

Finally, the general-purpose coefficients for x in [0, 1] satisfying Eq. (14) are computed by substituting $u = 2x - 1$ (Eq. (8)) into Eq. (13) and are listed in Table 3.

$$f(x) \cong \sum_{k=0}^{L-1} g(k) x^k \tag{14}$$
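Chebfun is a MATLAB toolbox; as a rough Python analogue (an assumption, not the authors' tool chain), NumPy's `chebinterpolate` interpolates a function at Chebyshev points of the first kind, `cheb2poly` converts Chebyshev coefficients into powers of u as in Eq. (13), and substituting $u = 2x - 1$ yields the general form of Eq. (14). For these smooth functions the degree-5 interpolant tracks the truncated Chebyshev series closely, so its coefficients should be expected to agree with Tables 1–3 to within roughly $10^{-3}$ rather than digit for digit. The helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C
from numpy.polynomial import polynomial as P


def fixed_len(coef, n=6):
    """Zero-pad a lowest-degree-first coefficient vector to n entries."""
    out = np.zeros(n)
    out[:len(coef)] = coef
    return out


def substitute_u(a):
    """Coefficients g(k) in x of sum_k a[k] * u**k, with u = 2x - 1 (Eq. (8))."""
    g = np.zeros(len(a))
    for k, ak in enumerate(a):
        term = ak * P.polypow([-1.0, 2.0], k)    # a_k * (2x - 1)**k
        g[:len(term)] += term
    return g


# Eqs. (5) and (6) written as functions of u, since x = (u + 1) / 2.
def cos_pi_x(u):
    return np.cos(np.pi * (u + 1) / 2)


def sin_minus_pi_x(u):
    return np.sin(-np.pi * (u + 1) / 2)


# Table 1 analogue: degree-5 interpolation at Chebyshev points of the first kind.
c_cos = C.chebinterpolate(cos_pi_x, 5)
c_sin = C.chebinterpolate(sin_minus_pi_x, 5)

# Table 2 analogue: power-series coefficients a(k) in u, Eq. (13).
# (cheb2poly may drop exactly-zero trailing terms, hence the padding.)
a_cos = fixed_len(C.cheb2poly(c_cos))
a_sin = fixed_len(C.cheb2poly(c_sin))

# Table 3 analogue: general-form coefficients g(k) in x, Eq. (14).
g_cos = substitute_u(a_cos)
g_sin = substitute_u(a_sin)
```

Because each function is purely even or odd in u, every other entry of `c_*` and `a_*` comes out as zero up to rounding, while most entries of `g_*` do not.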

Although the general form polynomial of Eq. (14) can be implemented as a general-purpose circuit that approximates any function simply by swapping in the corresponding coefficients, the expanded Chebyshev form is more favorable. Because each function is either even or odd in u, half of its expanded coefficients vanish (Table 2), whereas nearly all general form coefficients are non-zero (Table 3); the expanded form therefore yields parallel and pipelined architectures with lower resource consumption, fewer product terms, minimal latency and shorter word lengths [6].

### 3 Software Simulation

To assess the performance and accuracy of the previously computed coefficients in contrast to the Taylor series expansion, and to choose the most convenient polynomial order, simulations were implemented in MATLAB. The current section details and discusses the obtained results.

Table 4 Twiddle factor direct computation

| f(x)   | *x*(0) | *x*(1)  | *x*(2)  | *x*(3)  | *x*(4) | *x*(5) |
|--------|--------|---------|---------|---------|--------|--------|
| sine   | 0      | -0.3827 | -0.5000 | -0.7071 | -1     | 0      |
| cosine | 1      | 0.9239  | 0.8660  | 0.7071  | 0      | -1     |

Table 5 Twiddle factor using 5th order Taylor series

| f(x)   | *x*(0) | *x*(1)  | *x*(2)  | *x*(3)  | *x*(4)  | *x*(5)  |
|--------|--------|---------|---------|---------|---------|---------|
| sine   | 0      | -0.3827 | -0.5000 | -0.7071 | -1.0045 | -0.5240 |
| cosine | 1      | 0.9239  | 0.8661  | 0.7074  | 0.0200  | 0.1239  |

Table 6 Expanded form 4th and 5th order polynomial

| f(u)             | *u*(0) | *u*(1)  | *u*(2)  | *u*(3)  | *u*(4)  | *u*(5)  |
|------------------|--------|---------|---------|---------|---------|---------|
| 4th order sine   | 0      | -0.3825 | -0.5002 | -0.7077 | -0.9994 | 0       |
| 4th order cosine | 0.9956 | 0.9279  | 0.8683  | 0.7049  | 0       | -0.9956 |
| 5th order sine   | 0      | -0.3825 | -0.5002 | -0.7077 | -0.9994 | 0       |
| 5th order cosine | 1.0001 | 0.9238  | 0.8661  | 0.7072  | 0       | -1.0001 |

MATLAB simulations are performed for the Chebyshev coefficients in expanded form with different polynomial degrees. The input test vector is the array of constants x = [0, 1.0/8, 1.0/6, 1.0/4, 1.0/2, 1.0]. The simulated results were compared with a direct computation of the twiddle factor and with a Taylor series expansion in terms of accuracy and precision, and are presented in Tables 4, 5 and 6.

Tables 4 and 5 present, respectively, the direct computation of the twiddle factor real and imaginary components of Eq. (7) and the Taylor series expansion of Eqs. (5) and (6) about the expansion point p = 0, for x in the interval [0, 1].

Table 6 lists the results for 4th- and 5th-order Chebyshev polynomials using the expanded-form coefficients in u.

Tables 4, 5 and 6 show that a 5th-order Taylor series expansion is accurate close to the expansion point, but its precision degrades sharply the farther the input is from it: at x = 1, for instance, it returns 0.1239 instead of -1 for the cosine. In contrast, a 4th-order Chebyshev polynomial in expanded form offers three digits of precision for Eq. (6) and two digits for Eq. (5), while a 5th-order polynomial offers three digits of precision for both Eqs. (5) and (6) over all inputs. Therefore, the 5th-order Chebyshev polynomial in expanded form is chosen over the Taylor series expansion for implementing an HDL twiddle factor generation processor on FPGA, given its stable precision and error distribution [6].
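The comparison of Tables 4–6 can be reproduced in spirit with NumPy. The sketch evaluates the 5th-order expanded Chebyshev polynomial with the rounded Table 2 coefficients and the 5th-order Taylor expansions about $p = 0$ on the test vector of this section, then reports the worst-case absolute error against the direct computation. It is an illustrative floating-point model; the variable names are hypothetical.

```python
import numpy as np
from numpy.polynomial import polynomial as P

X_TEST = np.array([0, 1 / 8, 1 / 6, 1 / 4, 1 / 2, 1.0])   # test vector of Sect. 3

# Expanded-form coefficients a(k) in u, as listed in Table 2 (lowest degree first).
A_COS = [0.0, -1.5707, 0.0, 0.6435, 0.0, -0.0729]
A_SIN = [-0.9994, 0.0, 1.2227, 0.0, -0.2239, 0.0]


def chebyshev_expanded(a, x):
    """Evaluate sum_k a[k] * u**k with u = 2x - 1 (Eqs. (8) and (13))."""
    u = 2.0 * np.asarray(x) - 1.0
    return P.polyval(u, a)


def taylor5(x):
    """5th-order Taylor expansions about p = 0 of cos(pi x) and sin(-pi x)."""
    t = np.pi * np.asarray(x)
    cos_t = 1 - t**2 / 2 + t**4 / 24            # the t**5 term of cos is zero
    sin_t = -(t - t**3 / 6 + t**5 / 120)
    return cos_t, sin_t


exact_cos = np.cos(np.pi * X_TEST)
exact_sin = np.sin(-np.pi * X_TEST)
cheb_cos_err = np.abs(chebyshev_expanded(A_COS, X_TEST) - exact_cos).max()
cheb_sin_err = np.abs(chebyshev_expanded(A_SIN, X_TEST) - exact_sin).max()
taylor_cos_err = np.abs(taylor5(X_TEST)[0] - exact_cos).max()
```

On this test vector the Chebyshev errors stay below about $10^{-3}$, whereas the Taylor cosine is off by more than 1 at x = 1, consistent with the trend in Tables 5 and 6.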

### 4 HDL-Based Implementation

We have previously introduced MATLAB simulation results for an alternative approach to generating high-precision twiddle factors using Chebyshev polynomials. In this section, we present a low-latency FPGA architecture implementing a 5th-order Chebyshev expanded-form polynomial. This design computes Eqs. (5), (6) and (13) in a pipelined, parallel fashion, as shown in Fig. 2, following the nested grouping of Eqs. (15) and (16).

$$\cos(\pi x) \cong u\left(a_1 + u^2\left(a_3 + a_5 u^2\right)\right) \tag{15}$$

$$\sin(-\pi x) \cong a_0 + u^2\left(a_2 + a_4 u^2\right) \tag{16}$$

where $u = 2x - 1$, as in Eq. (8).
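The grouping of Eqs. (15) and (16) maps directly onto a multiply-add datapath. The following Python model (illustrative only; the default coefficients are the rounded values of Table 2 and the names are hypothetical) mirrors that nesting and makes the multiplier count explicit.

```python
import math

# Expanded coefficients of Table 2 in the order (a0, a1, a2, a3, a4, a5).
A_TABLE2 = (-0.9994, -1.5707, 1.2227, 0.6435, -0.2239, -0.0729)


def twiddle_chebyshev(u, a=A_TABLE2):
    """Evaluate Eqs. (15) and (16) with the nested grouping of the datapath.

    Returns (cos(pi x), sin(-pi x)) for u = 2x - 1 using six multiplications:
    one shared u**2, three in the cosine branch and two in the sine branch.
    """
    a0, a1, a2, a3, a4, a5 = a
    u2 = u * u                                   # shared by both branches
    cos_part = u * (a1 + u2 * (a3 + a5 * u2))    # Eq. (15)
    sin_part = a0 + u2 * (a2 + a4 * u2)          # Eq. (16)
    return cos_part, sin_part


# Worst-case deviation from the exact twiddle factor over the Sect. 3 test vector.
X_TEST = (0.0, 1 / 8, 1 / 6, 1 / 4, 1 / 2, 1.0)
MAX_ERR = max(
    max(abs(c - math.cos(math.pi * x)), abs(s - math.sin(-math.pi * x)))
    for x in X_TEST
    for c, s in [twiddle_chebyshev(2 * x - 1)]
)
```

The six multiplications match the six 18 × 18 multipliers reported in the synthesis results below.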

The HDL implementation uses an 18-bit signed fixed-point representation and targets a Xilinx ZynQ '7z020clg484 -1' and an Intel/Altera Cyclone V SoC 5CSEMA5F31C6.

The parallel pipelined architecture presented in Fig. 2 implements four pipeline stages with a latency of five clock cycles, i.e. the time required to obtain the first twiddle factor components. Synthesis and implementation results of the 5th-order Chebyshev polynomial in VHDL for different targets are presented in Table 7.

Table 7 presents the synthesis results on the different targets. The architecture of Fig. 2 is a parallel design with 5 pipeline stages, i.e. a latency of 5, using 6 DSP blocks at 174.64 MHz for the Cyclone V target and 174.917 MHz for the ZynQ. The DSP block utilization is consistent with the architecture of Fig. 2, since 6 signed 18 × 18 multipliers are needed for an 18-bit signed input data type [6]. Furthermore, the design shows minimal logic element (LE) utilization on both targets, because the inferred adders are the hardwired adders embedded in the DSP blocks.

A second implementation uses a behavioral description of a parallel one-stage architecture, as described in [6], based on Clenshaw's recurrence formula [6]. Synthesis results are presented in Table 8.
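Clenshaw's recurrence evaluates the Chebyshev form of Eq. (4) directly from the coefficients c(k), without first converting them to powers of u. A floating-point Python sketch of the textbook recurrence follows; it models the arithmetic only and is not the behavioral VHDL of the one-stage design.

```python
def clenshaw(c, u):
    """Evaluate sum_k c[k] * T_k(u) with Clenshaw's backward recurrence.

    b_k = c[k] + 2u * b_{k+1} - b_{k+2}, for k = L-1 down to 1,
    and the result is c[0] + u * b_1 - b_2.
    """
    b1 = b2 = 0.0                        # b_{k+1} and b_{k+2}
    for ck in reversed(c[1:]):
        b1, b2 = ck + 2.0 * u * b1 - b2, b1
    return c[0] + u * b1 - b2
```

With the cosine coefficients of Table 1 and $u = -0.75$ (i.e. $x = 1/8$), the recurrence returns approximately 0.9238, close to $\cos(\pi/8)$. Each step is a multiply-add that depends on the previous one, which is why a purely combinational version chains DSP blocks into a long critical path.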

Table 8 presents the synthesis results for a parallel one-stage architecture with no pipeline. Compared with Table 7, this architecture infers the same 6 DSP units of 18 × 18 multipliers on both targets, but at a much lower frequency of 24.73 MHz for the Cyclone V and 25.124 MHz for the ZynQ. The minimal use of LEs is a result of cascading multiple embedded DSP blocks, which lengthens the design's critical path and greatly degrades the operating frequency.

Table 7 Chebyshev 5th order polynomial pipelined: synthesis results, parallel pipelined

| Target    | DSP (%) | FF  | LE (%) | Frequency (MHz) |
|-----------|---------|-----|--------|-----------------|
| Cyclone V | 6.89    | 138 | 0.143  | 174.64          |
| ZynQ      | 2.72    | 246 | 0.076  | 174.917         |

Table 8 Chebyshev 5th order polynomial combinational: synthesis results, parallel 1 stage

| Target    | DSP (%) | FF | LE (%) | Frequency (MHz) |
|-----------|---------|----|--------|-----------------|
| Cyclone V | 6.89    | 9  | 0.162  | 24.73           |
| ZynQ      | 2.72    | 16 | 0.078  | 25.124          |

Table 9 Twiddle factor using Q2.15 format: finite precision 5th order Chebyshev

| f(u)   | *x*(0) | *x*(1)  | *x*(2)  | *x*(3)  | *x*(4)  | *x*(5)  |
|--------|--------|---------|---------|---------|---------|---------|
| sine   | 0      | -0.3825 | -0.5001 | -0.7077 | -0.9993 | -0.0006 |
| cosine | 1      | 0.9231  | 0.8610  | 0.7060  | 0       | -1.0002 |

Table 10 Twiddle factor using Q2.15 format: finite precision CORDIC

| f(x)          | *x*(0) | *x*(1)   | *x*(2)   | *x*(3)  | *x*(4)  | *x*(5)  |
|---------------|--------|----------|----------|---------|---------|---------|
| CORDIC sine   | 0      | -0.38269 | -0.49981 | -0.7070 | -0.9999 | 0.0000  |
| CORDIC cosine | 1      | 0.92379  | 0.86611  | 0.7071  | 0.0001  | -1.0000 |

As shown in Tables 7 and 8, implementing a parallel, pipelined architecture increases the operating frequency about sevenfold (174.64 MHz versus 24.73 MHz on the Cyclone V), at the cost of a larger latency for the first output. The operating frequency can be increased further by pipelining critical paths, e.g. the multiplication-addition paths shown in Fig. 2.

### 5 Results Discussion

We have previously described the difference in precision between the Chebyshev and Taylor series polynomial expansions. This section presents an additional comparison in finite precision and discusses the results of the different CORDIC optimization techniques presented in [6, 14–16]. The finite-precision simulation is performed on the input vector x = [0, 1.0/8, 1.0/6, 1.0/4, 1.0/2, 1.0] translated to *u* in a signed fixed-point Q2.15 format, i.e. one sign bit, two integer bits and fifteen fractional bits, covering the range [–4, 3.99997]. Tables 9, 10, 11 and Fig. 3 show the fixed-point simulation results for the Chebyshev, CORDIC and Taylor twiddle factor computations using the Q2.15 and Q3.14 representations.
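A bit-level flavor of the Q2.15 datapath can be modeled with Python integers. The sketch below is an assumption-laden model rather than the authors' VHDL: the Table 2 coefficients and u are rounded to 18-bit Q2.15 codes, each 18 × 18 product is truncated back to 15 fractional bits, and additions are assumed not to overflow. Since the rounding mode and internal word lengths of the actual design are not specified, its outputs need not match Table 9 digit for digit; all names are hypothetical.

```python
import math

FRAC_BITS = 15                                  # Q2.15: sign, 2 integer bits, 15 fraction bits
LSB = 2.0 ** -FRAC_BITS
Q_MIN, Q_MAX = -(1 << 17), (1 << 17) - 1        # 18-bit two's-complement range


def to_q(value):
    """Round a real number to the nearest Q2.15 integer code, saturating to 18 bits."""
    return max(Q_MIN, min(Q_MAX, round(value / LSB)))


def q_mul(a, b):
    """18 x 18 -> 36-bit product truncated back to Q2.15 (arithmetic right shift)."""
    return (a * b) >> FRAC_BITS


def twiddle_q215(u_code, a_codes):
    """Integer model of Eqs. (15) and (16); additions are assumed not to overflow."""
    a0, a1, a2, a3, a4, a5 = a_codes
    u2 = q_mul(u_code, u_code)
    cos_code = q_mul(u_code, a1 + q_mul(u2, a3 + q_mul(a5, u2)))
    sin_code = a0 + q_mul(u2, a2 + q_mul(a4, u2))
    return cos_code * LSB, sin_code * LSB


# Table 2 coefficients (a0..a5) quantized to Q2.15.
A_CODES = [to_q(v) for v in (-0.9994, -1.5707, 1.2227, 0.6435, -0.2239, -0.0729)]
X_TEST = (0.0, 1 / 8, 1 / 6, 1 / 4, 1 / 2, 1.0)
RESULTS = [twiddle_q215(to_q(2 * x - 1), A_CODES) for x in X_TEST]
MAX_ERR = max(
    max(abs(c - math.cos(math.pi * x)), abs(s - math.sin(-math.pi * x)))
    for x, (c, s) in zip(X_TEST, RESULTS)
)
```

Changing the rounding in `q_mul` (for example adding half an LSB before the shift) is one way to explore how such implementation choices move the finite-precision results.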

As shown in Tables 9, 10, 11 and Fig. 3, the CORDIC computation provides near-exact results; however, as stated previously, it requires a number of iterations that grows with the required precision in bits, causing a considerable increase in latency for pipelined architectures.

Table 11 Twiddle factor using Q3.14 format: finite precision 5th order Taylor

| f(x)   | *x*(0) | *x*(1)  | *x*(2)  | *x*(3)  | *x*(4)  | *x*(5)  |
|--------|--------|---------|---------|---------|---------|---------|
| sine   | 0      | -0.3826 | -0.5000 | -0.7048 | -0.9447 | -0.5240 |
| cosine | 1      | 0.9238  | 0.8661  | 0.7073  | 0.0198  | 0.1218  |

Fig. 3 Sin(-pi\*x) and Cos(pi\*x) Q2.15 fixed point direct, Taylor, Chebyshev and CORDIC results

In addition, the Taylor series coefficients would require four bits for the sign and integer part, i.e. the Q3.14 format, for a 5th-order polynomial with a total word length of 18 bits; this leaves one fewer fractional bit and thus lower accuracy for the same hardware resources, on top of the growing error as the input moves away from the expansion point. Furthermore, the mean square error (MSE) of the difference between the direct computation and the finite-precision Chebyshev approximation is 0.000 for the sine function and 0.0102 for the cosine function.

While the performance of an FFT processor is determined by its main computational unit, i.e. the butterfly [11, 12], real-time computation of high-resolution twiddle factors can introduce unwanted delays that affect the overall performance of the system in critical applications [14]. Optimized CORDIC implementations for generating twiddle factors were introduced in [15], using 16 rotation stages, and in [16], using embedded memory. Even though these techniques achieve better accuracy, performance and resource consumption than classical CORDIC implementations, they still suffer from the high latency of 16 rotation stages [15] or rely heavily on internal memory [16], which increases access time and reduces operating frequency. In contrast, a Taylor series polynomial expansion is presented in [6] for implementing trigonometric functions with lower latency and better precision. However, higher-order Taylor series are required for increased accuracy, since the approximation degrades the farther the input value is from the expansion point [6].

Real-time digital communication applications, such as frequency domain beamforming, require high-precision, low-latency twiddle factor generation for the FFT [2, 17]. In contrast to the CORDIC and Taylor series implementations, the presented Chebyshev polynomial technique offers a low-latency, high-precision and low-resource architecture for variable-length FFT without the use of internal memory elements. In addition, as shown in [6] for a 5th-order polynomial, the Taylor series expansion provides only 6 bits of accuracy, while the Chebyshev approximation reaches 16 bits of precision. Thus, the presented technique is a promising alternative for low-latency, high-precision twiddle factor generation on reconfigurable systems such as FPGAs [18].

### 6 Conclusion

This paper presents an alternative technique for implementing a high-resolution, low-latency twiddle factor generation processor on FPGA. In contrast to naïve implementations, Chebyshev polynomial expansion offers the flexibility to implement a high-resolution, reconfigurable twiddle factor generation scheme on FPGA that meets the performance required by beamforming applications, while preserving speed, resource utilization and minimal latency. Although CORDIC allows multiplier-free architectures based on shift-and-add operations, it still suffers from increased latency in pipelined designs. Moreover, in contrast to the Taylor series expansion, Chebyshev polynomials achieve better precision with fewer product terms and a lower order. While the general form polynomial offers a generic way to approximate a function by preloading its coefficients, it includes more product terms and thus incurs higher latency. For this design, the expanded polynomial form is chosen, given the application requirements for processing high-precision data in real time in a parallel, pipelined, low-latency architecture. The design can be further improved toward higher-digit precision by slightly increasing the polynomial order.

### References

- 1. Hamid, U., Qamar, R.A., Waqas, K.: Performance comparison of time domain and frequency domain beamforming techniques for sensor array processing. In: Proceedings of 2014 11th International Conference on Applied Sciences & Technology (IBCAST). Islamabad, Pakistan (2014)
- 2. Allen, B., Ghavami, M.: Adaptive Array Systems Fundamentals and Applications. Wiley (2005)
- 3. Navaro, R.: Frequency domain beamforming for a deep space network downlink array. In: Aerospace Conference, Big Sky, MT, USA (2012)
- 4. Nadal, A.B.J., Abdel Nour, C., Lin, H.: Hardware prototyping of FBMC/OQAM base-band for 5G mobile communication systems. In: IEEE International Symposium on Rapid System Prototyping, New Delhi, India, pp. 135–141 (2014)
- 5. Dali, M., Gibson, R., Amira, A., Guessoum, A., Ramzan, N.: An efficient MIMO-OFDM radix-2 single-path delay feedback FFT implementation on FPGA. In: NASA/ESA Conference on Adaptive Hardware and Systems (2015)
- 6. Meyer-Baese, U.: Digital Signal Processing with Field Programmable Gate Arrays, 4th edn. Springer (2014)
- 7. Garrido, M.: A new representation of FFT algorithms using triangular matrices. In: IEEE Transactions on Circuits and Systems I: Regular Papers, pp. 1737–1745 (2016)
- 8. Hung, C.H., Chen, S.G., Chen, K.L.: Design of an efficient variable length FFT processor. In: Proceedings of the International Symposium on Circuits and Systems, pp. 833–836 (2004)
- 9. Lucius, G., Le Roy, F., Aulagnier, D., Azou, S.: An algorithm for external eigenvectors computation of Hermitian matrices and its FPGA implementation. In: IEEE 56th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 1407–1410 (2013)
- 10. Nane, R., Sima, V.M., Pilato, C., Choi, J., Fort, B., Canis, A., Chen, Y.T., Hsiao, H., Brown, S., Ferrandi, F., Anderson, J., Bertels, K.: A survey and evaluation of FPGA high-level synthesis tools. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 35(10), 1591–1604 (2016)
- 11. Neuenfeld, R.H., Fonesca, M.B., da Costa, E.A.C., Oses, J.P.: Exploiting addition schemes for the improvement of optimized radix-2 and radix-4 FFT butterflies. In: IEEE 8th Latin American Symposium on Circuits & Systems (LASCAS). Bariloche, Argentina (2017)
- 12. Neuenfeld, R., Fonesca, M., Costa, E.: Design of optimized radix-2 and radix-4 butterflies from FFT with decimation in time. In: IEEE 7th Latin American Symposium on Circuits & Systems (LASCAS). Florianopolis, Brazil (2016)
- 13. Yu, C., Lee, K.H., Kuo, C.F.: Low-complexity twiddle factor generator for FFT processors. In: International Conference Consumer Electronics. Las Vegas, NV, USA (2017)
- 14. Chi, J.C., Chen, S.G.: An efficient twiddle factor generator. In: 12th European Signal Processing Conference. Vienna, Austria (2004)
- 15. Zhou, J.: A new method to generate twiddle factor using CORDIC based radix-4 FFT butterfly. In: 2013 International Conference on Communications, Circuits and Systems (ICCCAS). Chengdu, China (2013)
- 16. Shinde, S.N.: Twiddle factor generation using CORDIC processor for fingerprint application. In: 2015 International Conference on Computer, Communication and Control (ICA). Indore, India (2015)
- 17. Srar, J.A., Chung, K.-S., Mansour, A.: Analysis of the LLMS adaptive beamforming algorithm implemented with finite precision. In: IEEE Wireless Communications and Networking Conference (WCNC 2012). Paris, France (2012)
- 18. Najem, M., Bollengier, T., Le Lann, J.C., Lagadec, L.: A cost effective approach for efficient time-sharing of reconfigurable architectures. In: 2017 International Conference on FPGA Reconfiguration for General-Purpose Computing (FPGA4GPC). Hamburg, Germany (2017)

# Part VI Wireless Circuits and Systems

## A LoRaWAN Wireless Sensor Network for Data Center Temperature Monitoring



Tommaso Polonelli, Davide Brunelli, Andrea Bartolini and Luca Benini

Abstract High-performance computing installations, which underpin web and cloud servers as well as supercomputers, are constrained by two conflicting requirements: the IT power consumed by the computing nodes and the heat that must be removed to avoid thermal hazards. In the worst cases, up to 60% of the energy consumed in a data center is used for cooling, often because of an over-designed cooling system. We propose a low-cost, battery-supplied wireless sensor network (WSN) for fine-grained, flexible and long-term temperature monitoring in data centers. The WSN has been operational for six months without battery recharges and has collected more than six million data points with no losses. Our work achieves 300× better energy efficiency than previously reported WSNs for similar scenarios, over a 7× wider area. The data collected by the network can be used to optimize the cooling effort while avoiding dangerous hot spots.

### 1 Introduction

Due to the extraordinarily fast growth of the IT (Information Technology) industry, the energy consumption of data centers worldwide has attracted global attention because of its impact on pollution and climate change. In past years, the overall trend in hardware design has been to improve the power efficiency of both processors and memory. In parallel, however, the heat density of computing systems in data centers has increased at a faster rate. In today's data center facilities, IT devices account for only 30–60% [1] of the overall electricity bill. The rest of the energy is consumed by environmental control systems, such as Computer Room Air Conditioning (CRAC) units, water chillers and humidifiers, or is lost in the power conversion process.

A. Bartolini e-mail: andrea.bartolini@unibo.it

L. Benini e-mail: luca.benini@unibo.it; luca.benini@iis.ee.ethz.ch

D. Brunelli (⊠) University of Trento, Trento, Italy e-mail: davide.brunelli@unitn.it

L. Benini ETH Zurich, Zurich, Switzerland

T. Polonelli · A. Bartolini · L. Benini University of Bologna, Bologna, Italy e-mail: tommaso.polonelli2@unibo.it

© Springer Nature Switzerland AG 2019 S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_20

Real-time temperature data from inside a data center building [2], together with the historical data collected, are useful for diagnosing problems and predicting possible thermal issues, and also for improving the data center's power efficiency [3]. Tier0 supercomputing centers [4] are designed for peak computational performance and are therefore characterized by the highest power and computational densities among data centers. In this domain, cooling efficiency limits the maximum achievable performance and thus the Tier0 QoS. For example, Tianhe-2, formerly the world's most powerful supercomputer, occupies 720 m² and consumes 17.8 MW to deliver 33.2 Petaflops. When the cooling infrastructure is included, its power consumption rises to 24 MW [5]. This cost can be reduced by adopting predictive control approaches [6, 7], which combine the supercomputer's IT power consumption with external and room temperatures to optimally control the cooling effort [8]. Traditional commercial temperature monitoring solutions use wired sensors, but because of the high installation costs, these systems use only a few measurement points [3]. Wireless sensor networks (WSNs), on the other hand, are ideal for distributed sensing: mobile nodes can be placed freely in critical areas to measure temperature or power consumption [9, 10]. Unfortunately, the data center environment is generally adverse to wireless communication. Data center facilities consist mainly of metal: switches, racks and cables, plus other obstacles such as cooling fans, power distribution systems and cable rails, which generate intense electromagnetic noise.
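The Tianhe-2 figures quoted above can be turned into the cooling overhead and the energy efficiency with and without cooling. This is plain arithmetic on the numbers given in the text, with no additional data:

$$
\begin{aligned}
P_{\text{cool}} &= P_{\text{tot}} - P_{\text{IT}} = 24\ \text{MW} - 17.8\ \text{MW} = 6.2\ \text{MW} \approx 26\%\ \text{of}\ P_{\text{tot}},\\
\frac{P_{\text{tot}}}{P_{\text{IT}}} &= \frac{24}{17.8} \approx 1.35,\\
\eta_{\text{IT}} &= \frac{33.2\ \text{PFLOPS}}{17.8\ \text{MW}} \approx 1.87\ \text{GFLOPS/W},\qquad
\eta_{\text{tot}} = \frac{33.2\ \text{PFLOPS}}{24\ \text{MW}} \approx 1.38\ \text{GFLOPS/W}.
\end{aligned}
$$

Cooling therefore lowers the delivered energy efficiency by roughly a quarter, which is the margin that better temperature data and predictive cooling control aim to recover.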

This paper proposes a WSN that monitors the temperature evolution of a data center in a distributed fashion, with the goal of improving cooling efficiency. We use Semtech's LoRa (Long Range) [11], which, thanks to its proprietary spread-spectrum modulation, is one of the most promising wide-area IoT communication technologies [12, 13].

So far, indoor monitoring applications have mostly used ZigBee and other mesh-oriented protocols in the 2.4 GHz band, which have been studied intensively during the past decade. With these protocols, multi-hop communication is necessary for long-distance transmission or for reliability in noisy or crowded environments. Using LoRa indoors marks a return to the single-hop communication model. The cost is a reduced available bandwidth, but a single hop can cover up to 34 000 m² [14] with a similar transmission power. Moreover, [15] demonstrated that, for low traffic intensity, indoor LoRa communication is more energy efficient than a multi-hop network that needs more than one router to cover the same distance. The same study shows that even IEEE 802.11ah is significantly less energy efficient than LoRa when an application only needs to exchange tiny data packets. Considering these recent results and the requirements of data center environmental monitoring, this paper describes the first deployment of a distributed temperature monitoring system in a data center using LoRa technology. The contribution of the paper is twofold: (i) to explore the performance of the application using LoRaWAN, especially regarding throughput, energy consumption and packet conflicts; (ii) to investigate the merits of LoRaWAN as an alternative to the mesh-oriented communication used so far in this class of applications.

The sensor nodes are designed for easy deployment within an existing supercomputing facility. Experiments were run at CINECA, the Tier0 supercomputing center for scientific research in Italy [4].

The article is organized as follows: Sect. 1.1 presents related work. Section 2 describes the LoRa capabilities, and Sect. 3 discusses the hardware design. Section 4 describes the WSN setup and the results obtained. Finally, Sect. 5 concludes the paper with comments and final remarks.

### 1.1 Related Works

Thermal management of data centers has been studied in depth in the last few years [6]. The proposed strategies aim to reduce the power consumption of the cooling infrastructure and improve overall efficiency. In [3], optimization is applied to select the cooling mode and liquid flow that minimize the overall facility power under quality-of-service and thermal constraints. The optimization in [16, 17] selects the rack fan speeds, whereas in [18] the internal CRAC temperatures are used as the reference to minimize the computational and thermal power. In [19], the effort goes into selecting the opening of the active subfloor tiles and the blower speeds to optimize the Computer Room Air Handler (CRAH) cooling system. In [14], an optimal control policy is presented for hybrid systems featuring free, liquid and air cooling. The policy relies on a predictive model of the cooling efficiency built from environmental, room and IT temperature measurements. All model-predictive approaches implicitly assume that high-quality temperature data are available, both offline for model identification and online for driving control decisions. In [20], a green cooling system is proposed in which the management model collects information from a WSN of temperature sensors to control the ventilation system and the air conditioning. The WSN is based on the ZigBee protocol and also includes the actuators.

This results in a highly sophisticated network and a complex deployment, even with only 10 boards scattered over 20 m². By comparison, our deployment is much cheaper because it does not need any router, and at the same cost we could deploy more than 30 nodes. In our test, we covered a 150 m² data center room and transmitted data to a gateway installed in another area of the building, 60 m away. Moreover, some of the sensor nodes proposed in [21] require a wired power supply, which severely restricts the deployment and increases the installation cost.

Another recent WSN for environmental data monitoring, presented in [5], uses ZigBee sensor nodes supplied by 2000 mAh batteries. The deployment consists of 10 nodes in a 30 m² area, which is half of their data room. The network is configured as a very dense mesh to provide reliability in case of packet loss. As a result, several boards are programmed as routers, and packets from the most distant nodes need up to 4 hops. This has a large impact on the power consumption and on the lifetime of the installation: a sensor in [5] consumes 73 mA on average, more than two orders of magnitude more than our LoRa solution. With an average current consumption of 194 µA, our deployment can operate unattended for more than 200 days on a 1000 mAh battery, half the capacity used in [5].
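The lifetime and current figures above can be sanity-checked with a quick calculation. The sketch assumes an ideal battery that delivers its full nominal capacity, so it ignores self-discharge, temperature and cutoff-voltage effects. Under that assumption, lifetime is simply capacity divided by average current. The function name is illustrative.

```python
def battery_lifetime_days(capacity_mah: float, avg_current_ua: float) -> float:
    """Ideal lifetime in days: nominal capacity / average current (no derating, no self-discharge)."""
    hours = capacity_mah / (avg_current_ua / 1000.0)  # uA -> mA, then mAh / mA = h
    return hours / 24.0

if __name__ == "__main__":
    ours = battery_lifetime_days(1000, 194)  # 1000 mAh battery, 194 uA average (values from the text)
    print(f"ideal lifetime: {ours:.0f} days")
    print(f"current ratio 73 mA / 194 uA: {73_000 / 194:.0f}x")
```

With 1000 mAh and 194 µA this gives about 215 days, consistent with the "more than 200 days" stated above. The ratio 73 mA / 194 µA is about 376, i.e. between two and three orders of magnitude.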

### 2 LoRa Wireless Modulation Capabilities

LoRa™ is a wireless modulation developed by Semtech for long-range, low-power, low-data-rate applications. LoRa is supported by the LoRa Alliance, which has defined LoRaWAN. LoRaWAN standardizes the higher-layer protocols on top of the physical radio and regulates secure communication for IoT applications and wide area networks. The network consists of end devices and gateways, and the LoRaWAN specification defines three classes of end devices: Class A, Class B and Class C.

LoRa modulation is scalable in both bandwidth and frequency. Moreover, thanks to its high Bandwidth-Time product (BT), a LoRa signal is very resistant to both in-band and out-of-band interference. Because the symbol period can be longer than the typically short duration of a noise spike, the modulation is immune to pulsed interference. With spread spectrum, the communication issues caused by interference are reduced by the processing gain inherent to the modulation. Interfering signals are spread beyond the desired information bandwidth and can easily be removed by filtering at the receiver side.
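The symbol period referred to above grows with the spreading factor (SF). A LoRa symbol lasts 2^SF / BW, and the nominal bit rate is SF · BW / 2^SF · 4/(4 + CR), as given in Semtech's LoRa modulation application note. The paper does not state which SF, bandwidth or coding rate the nodes use, so the values below are illustrative only.

```python
def lora_symbol_time_s(sf: int, bw_hz: float) -> float:
    """Duration of one LoRa chirp symbol: 2**SF chips at a chip rate equal to BW."""
    return (2 ** sf) / bw_hz

def lora_bit_rate_bps(sf: int, bw_hz: float, cr: int = 1) -> float:
    """Nominal bit rate SF * (BW / 2**SF) * 4 / (4 + CR); CR = 1..4 means coding rate 4/5..4/8."""
    return sf * (bw_hz / 2 ** sf) * 4.0 / (4 + cr)

if __name__ == "__main__":
    for sf in (7, 10, 12):  # illustrative spreading factors at 125 kHz bandwidth
        ts_ms = lora_symbol_time_s(sf, 125e3) * 1e3
        rb = lora_bit_rate_bps(sf, 125e3)
        print(f"SF{sf}: symbol {ts_ms:.3f} ms, nominal bit rate {rb:.1f} bit/s")
```

Each SF step doubles the symbol duration (1.024 ms at SF7 and 32.768 ms at SF12 with 125 kHz) and roughly halves the bit rate. This is the trade-off that lets long symbols ride through short noise spikes at the cost of throughput.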

Another factor that must be considered is the wireless link budget, which determines the maximum communication range for a given transmission power. Compared with an FSK transceiver with a sensitivity of -122 dBm at 1.2 kbps, at the same transmission output power, LoRa's link budget improvement yields more than four times the communication range, as studied in depth in [7].
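A link budget difference in dB can be translated into a range ratio. The sketch assumes a log-distance path-loss model, PL(d) = PL₀ + 10·n·log₁₀(d/d₀), so a gain of Δ dB multiplies the reachable distance by 10^(Δ/(10n)). The transmit power and the indoor exponent n = 3 are illustrative assumptions. The −148 dBm figure is the best-case RFM96 sensitivity quoted in Sect. 3.1.

```python
def link_budget_db(tx_power_dbm: float, sensitivity_dbm: float) -> float:
    """Link budget: maximum tolerable path loss between transmitter and receiver."""
    return tx_power_dbm - sensitivity_dbm

def range_factor(delta_db: float, path_loss_exponent: float = 2.0) -> float:
    """Distance ratio gained from delta_db of extra link budget (log-distance path-loss model)."""
    return 10 ** (delta_db / (10.0 * path_loss_exponent))

if __name__ == "__main__":
    tx = 10.0                          # dBm, same output power for both radios (illustrative)
    fsk = link_budget_db(tx, -122.0)   # FSK sensitivity quoted in the text
    lora = link_budget_db(tx, -148.0)  # best-case LoRa sensitivity quoted in Sect. 3.1
    for n in (2.0, 3.0):               # free space vs. an assumed indoor exponent
        print(f"n={n}: {lora - fsk:.0f} dB -> {range_factor(lora - fsk, n):.1f}x range")
```

In free space (n = 2), "more than four times the range" corresponds to a link budget gain of at least about 12 dB, since 10^(12/20) ≈ 4. Indoors, where the exponent is typically higher, the same gain buys less distance.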

As is well known, the propagation loss of a wireless path increases with the distance between nodes. For narrowband systems, this means that additional nodes may be needed to build a mesh network topology (with increased network complexity and redundancy) or to act as repeaters in star networks. Unfortunately, installing repeaters increases the overall cost of the WSN, in terms of both hardware components and software development. LoRa can minimize this cost with a simple star network, because signals with a different spreading factor or sequence appear as noise at the gateway.

### 3 Network Hardware Design

The sensor node is designed to be low-power and versatile, with an on-board temperature and humidity sensor, and its component cost was carefully assessed. Moreover, additional sensors and devices can be connected to its expansion connectors for future work. The gateway used for the tests is a commercial product [22] that is easy to customize in hardware and software. It can be configured either as a standalone gateway or as a combined gateway and server.

### 3.1 Sensor Node

The end node, shown in Fig. 1, is based on the STM32L4 MCU from STMicroelectronics, which combines low-power operation with high computational resources. The RFM96 SoM transceiver [23] handles the LoRa physical layer. Its high receiver sensitivity (-148 dBm) and integrated +20 dBm power amplifier make it well suited to applications requiring long range or robust communication.

We measured the power consumption of each element. In sleep mode, with the Real Time Clock (RTC) running, the node draws 4 µA at 3 V. In RUN mode at 48 MHz, the STM32L4 draws only about 8.25 mA. The analog circuits (2 mA) and the RFM96 (76 mA in TX at 10 dBm) are the most power-hungry parts. The sensor node firmware is based on the I-CUBE-LRWAN library package, configured to be compliant with LoRaWAN Class A. Each sensor node is programmed to transmit a packet to the gateway every 30 s. Every packet includes a temperature and a humidity sample. To monitor and manage the WSN, it also carries node status information, such as the battery voltage and channel conflicts. With this configuration and a 1000 mAh battery, the sensor node lifespan is seven months. By contrast, the solution presented in [21] can operate for only a couple of days with the same battery energy.
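The measured currents can be combined into a first-order duty-cycle model. The sketch uses a two-state model: the node is either asleep at 4 µA or "active" with MCU, analog circuits and radio TX all drawing current at once. That simplification is ours, since real firmware goes through phases with different currents, such as sensing, TX and the Class A receive windows. The "equivalent active time" is a derived illustrative quantity, not a measurement from the paper.

```python
SLEEP_UA = 4.0                 # sleep current with RTC running (from the text)
ACTIVE_MA = 8.25 + 2.0 + 76.0  # MCU RUN @48 MHz + analog + RFM96 TX @10 dBm (from the text)
PERIOD_S = 30.0                # one packet every 30 s (from the text)

def average_current_ua(active_s: float, period_s: float = PERIOD_S,
                       sleep_ua: float = SLEEP_UA, active_ma: float = ACTIVE_MA) -> float:
    """Two-state duty-cycle model: active for active_s seconds per period, asleep otherwise."""
    active_ua = active_ma * 1000.0
    return (active_ua * active_s + sleep_ua * (period_s - active_s)) / period_s

def equivalent_active_time_s(avg_ua: float, period_s: float = PERIOD_S,
                             sleep_ua: float = SLEEP_UA, active_ma: float = ACTIVE_MA) -> float:
    """Invert the model: full-load time per period that yields the given average current."""
    active_ua = active_ma * 1000.0
    return (avg_ua - sleep_ua) * period_s / (active_ua - sleep_ua)

if __name__ == "__main__":
    t_eq = equivalent_active_time_s(194.0)  # measured average current from the text
    print(f"equivalent full-load time per 30 s period: {t_eq * 1e3:.1f} ms")
```

With the measured 194 µA average, the inverse model gives roughly 66 ms of equivalent full-load activity per 30 s period. This shows why the sleep current and the radio-on time, rather than the MCU run current, dominate the lifetime.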



### 3.2 LoRaWAN Gateway and Server

MultiConnect® Conduit™ is a highly configurable, manageable and scalable communications gateway for industrial IoT applications. Network connectivity options to any preferred data management platform include carrier-approved 4G-LTE, 3G, 2G and Ethernet. A diverse range of accessory cards provides local wired or wireless field asset connectivity and plugs directly into the rear of the Conduit gateway. The LoRaWAN radio front-end includes a Semtech SX1301 baseband processor and two SX1257 RF transceivers, which together demodulate packets received simultaneously on all channels.

### 4 Network Setup

A final experimental deployment was carried out with 20 sensor nodes placed at key points of the data center room. This test aimed to verify the LoRa capabilities in a noisy wireless environment under realistic operating conditions. We positioned the devices in the hallways between racks, close to the CRAC outputs and under the floor. Some sensors were also placed inside all-metal air conditioning pipes and structures. Figure 2 shows two pictures of the WSN deployment.

### 4.1 Network Results

We tested the sensor nodes for six months in the CINECA data center without recharging the batteries at any point during the trial. During this time, we acquired more than six million measurements. Taking one month of operation as a reference, with 1,073,771 points acquired, LoRa radio conflicts averaged 0.55%. This provides highly reliable packet communication and data analysis, a prerequisite for automatic cooling systems. Thanks to automatic retransmission of collided packets, no discontinuity in the collected data was detected. Only one sensor node had to be rebooted during the entire trial period, due to firmware issues.
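LoRaWAN Class A uplinks are unslotted, so a pure-ALOHA model is a common first-order estimate of collision probability. The sketch models a collision rate of 1 − e^(−2G) with offered load G = N · ToA / (T · channels). It also estimates the residual loss after retransmissions, assuming independent collisions between attempts. The airtime (ToA) and channel count are hypothetical, since the text does not give them. The measured 0.55% conflict rate is the authoritative figure.

```python
import math

def aloha_collision_prob(n_nodes: int, airtime_s: float, period_s: float,
                         n_channels: int = 1) -> float:
    """Pure-ALOHA collision probability 1 - exp(-2G), G = offered load per channel."""
    g = n_nodes * airtime_s / (period_s * n_channels)
    return 1.0 - math.exp(-2.0 * g)

def residual_loss(p_collision: float, attempts: int) -> float:
    """Probability that every attempt collides, assuming independent collisions."""
    return p_collision ** attempts

if __name__ == "__main__":
    # 20 nodes and a 30 s period are from the text; 50 ms airtime and 8 channels are hypothetical
    p = aloha_collision_prob(20, 0.05, 30.0, n_channels=8)
    print(f"modelled collision probability: {p:.2%}")
    # measured per-attempt conflict rate from the text, with one retransmission
    print(f"residual loss after 2 attempts at 0.55%: {residual_loss(0.0055, 2):.1e}")
```

Even a single retransmission turns a 0.55% per-attempt conflict rate into a residual loss of about 3 × 10⁻⁵ under the independence assumption. This is consistent with the absence of data gaps reported above.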

All collected data are stored in an InfluxDB time-series database, managed by a Node-RED application. To provide a ready-to-use cooling monitoring system, we used Grafana (Fig. 3), a platform for data analytics and monitoring. The data center operators use this tool to dynamically adjust the room temperature according to the heat generated by the IT equipment.
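The path from a LoRaWAN uplink to the time-series database can be sketched as follows. The paper does not specify its payload format, so the 7-byte big-endian layout, the measurement name `environment` and the tag `node` are all hypothetical. The output follows InfluxDB's line protocol (`measurement,tags fields timestamp`, with an `i` suffix for integer fields), which is the kind of record a Node-RED flow would write.

```python
import struct

# Hypothetical uplink layout (big-endian, 7 bytes):
# temperature [0.01 degC, int16], humidity [0.01 %RH, uint16], battery [mV, uint16], conflicts [uint8]
PAYLOAD_FMT = ">hHHB"

def encode_payload(temp_c: float, rh_pct: float, batt_mv: int, conflicts: int) -> bytes:
    """Node side: pack one sample plus status information into the uplink payload."""
    return struct.pack(PAYLOAD_FMT, round(temp_c * 100), round(rh_pct * 100), batt_mv, conflicts)

def to_line_protocol(node_id: str, payload: bytes, ts_ns: int) -> str:
    """Server side: decode one uplink and format it as an InfluxDB line-protocol record."""
    t, rh, batt, conf = struct.unpack(PAYLOAD_FMT, payload)
    fields = (f"temperature={t / 100:.2f},humidity={rh / 100:.2f},"
              f"battery_mv={batt}i,conflicts={conf}i")
    return f"environment,node={node_id} {fields} {ts_ns}"

if __name__ == "__main__":
    pkt = encode_payload(23.45, 41.2, 3012, 2)
    print(to_line_protocol("node07", pkt, 1_700_000_000_000_000_000))
```

Scaling the sensor values to integers keeps the payload small, which matters at LoRa data rates. Decoding on the server keeps the node firmware simple.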

With an average current consumption of 194 µA, and thanks to the characteristics of LoRa modulation, our WSN operates with single-hop communication in challenging environments and achieves both reliable communication and low power. Compared with other solutions (Table 1), our WSN reduces the average number of hops (ANH) by up to 4×. It also improves battery lifetime by 90× with respect to ZigBee-based WSNs and by 1.5× with respect to other LoRaWAN WSNs.



Fig. 2 Sensor node deployment. **a** sensor node placed in a hallway, **b** sensor node under the data center floor, **c** map of the GALILEO (CINECA) data center



Fig. 3 Grafana dashboard, temperatures acquired in CINECA data center

| WSN                   | Protocol | Power supply     | ANH | Max range (m) | Battery lifetime |
|-----------------------|----------|------------------|-----|---------------|------------------|
| This paper            | LoRaWAN  | 1000 mAh battery | 1   | 100           | 6 months         |
| Liu et al. [20]       | ZigBee   | Wired            | 3   | 10            | –                |
| Rodriguez et al. [21] | ZigBee   | 2000 mAh battery | 4   | 10            | 48 h             |
| Belady [1]            | LoRaWAN  | 1000 mAh battery | 1   | 100           | 4 months         |

Table 1 WSN in data center environments

### 5 Conclusion

This paper demonstrates that LoRa wireless sensor nodes can be an effective, low-cost tool for temperature and humidity sensing in a data center. The proposed nodes and LoRaWAN WSN can provide reliable data for monitoring the data center room and its environment, and for training and feeding numerical models for optimal cooling control. A LoRaWAN network is easy to deploy throughout data center facilities because no wiring is needed for power or communication. The network also offers freedom in deployment: sensor modules can be placed where wired sensors would be unfeasible for technical or safety reasons. Unlike in mesh-oriented protocols, nodes are not constrained to keep a specific distance from routers.

Acknowledgements This work was partially supported by a collaboration grant with CINECA. Special thanks go to Michele Toni, Massimo Alessio Mauri and Emanuele Sacco for their support.

### References

- 1. Belady, C.L.: In the data center, power and cooling costs more than the equipment it supports. Electronics Cooling magazine **3**, 1 (2007)
- 2. Rossi, M., Rizzon, L., Fait, M., Passerone, R., Brunelli, D.: Energy neutral wireless sensing for server farms monitoring. IEEE J. Emerg. Sel. Top. Circuits Syst. 4(3), 324–334 (2014)
- 3. Kim, K., Ruggiero, M., Atienza, D.: Free cooling-aware dynamic power management for green datacenters. In: Proceedings of IEEE HPCS, pp. 140–146 (2012)
- 4. Top 500 list. Available from: https://www.top500.org/
- 5. Dongarra, J.: Visit to the national university for defense technology changsha. University of Tennessee, China (2013)
- 6. Rhomadon, R., Ali, M., Mahdzir, A.M., Abakr, Y.A.: Energy efficiency and renewable energy integration in data centers. Strategies and modeling review. Renew. Sustain. Energy Rev. 42, 429–445 (2015)
- 7. Park, S., Seo, J.: Analysis of air-side economizers in terms of cooling-energy performance in a data center considering exhaust air recirculation. Energies **11**(2) (2018)
- 8. Conficoni, C., Bartolini, A., Tilli, A., Cavazzoni, C., Benini, L.: HPC cooling: a flexible modeling tool for effective design and management. IEEE Trans. Sustain. Comput. (2018)
- 9. Porcarelli, D., Brunelli, D., Benini, L.: Clamp-and-forget: a self-sustainable non-invasive wireless sensor node for smart metering applications. Microelectron. J. 45(12), 1671–1678 (2014)
- 10. Balsamo, D., Porcarelli, D., Benini, L., Brunelli, D.: A new non-invasive voltage measurement method for wireless analysis of electrical parameters and power quality. In: SENSORS 2013, IEEE, Baltimore, MD, pp. 1–4 (2013)
- 11. LoRa™ Modulation Basics, AN1200 v22, LoRa Alliance, Inc. 2400 Camino Ramon, Suite 375 San Ramon, CA 94583, LoRa Alliance, Tech (2015)
- 12. Brunelli, D., Bedeschi, E., Ferrari, M., Tinti, F., Barbaresi, A., Benini, L.: Long-range radio for underground sensors in geothermal energy systems. In: Applications in Electronics Pervading Industry, Environment and Society. Lecture Notes in Electrical Engineering, vol 429. Springer, Cham (2016)
- 13. Sartori, D., Brunelli, D.: A smart sensor for precision agriculture powered by microbial fuel cells. In: 2016 IEEE Sensors Applications Symposium (SAS). Catania, pp. 1–6 (2016)
- 14. Haxhibeqiri, J., Karaagac, A., Van den Abeele, F., Joseph, W., Moerman, I., Hoebeke, J.: LoRa indoor coverage and performance in an industrial environment: case study. In: 2017 22nd IEEE International Conference on Emerging Technologies and Factory Automation (ETFA). Limassol, pp. 1–8 (2017)
- 15. Morin, É., Maman, M., Guizzetti, R., Duda, A.: Comparison of the device lifetime in wireless networks for the internet of things. IEEE Access 5, 7097–7114 (2017)
- 16. Das, R., Kephart, J.O., Lenchner, J., Hamann, H.: Utility function-driven energy-efficient cooling in data centers. In: Proceedings of ICAC, pp. 1526–1544 (2010)
- 17. Banerjee, A., Mukherjee, T., Varsamopoulos, G., Gupta, S.K.: Energy-optimal dynamic thermal management: computation and cooling power co-optimization. IEEE Trans. Ind. Informat. 6(3), 340–351 (2010)
- 18. Parolini, L., Sinopoli, B., Krogh, B.H., Wang, Z.: A cyber-physical systems approach to data center modeling and control for energy efficiency. Proc. IEEE 100(1), 255–268 (2012)
- 19. Zhou, R., Wang, Z., Bash, C.E., McReynolds, A., Hoover, C., Shih, R., Kumari, N., Sharma, R.K.: A holistic and optimal approach for data center cooling management. In: Proceedings of IEEE American Control Conference, pp. 1346–1351 (2011)
- 20. Liu, Q., et al.: Green data center with IoT sensing and cloud-assisted smart temperature control system. Comput. Netw. 101, 104–112 (2016)
- 21. Rodriguez, M.G., et al.: Wireless sensor network for data-center environmental monitoring. In: Sensing Technology (ICST), 2011 Fifth International Conference on IEEE (2011)
- 22. MultiConnect® Conduit™, programmable gateway with Linux. Available from: http://www.multitech.net/developer/products/multiconnect-conduit-platform/conduit/
- 23. 868/915 MHz RF Transceiver Module. Available from: http://www.hoperf.com/rf\_transceiver/lora/RFM95W.html

## Wireless Low Energy System Architecture for Event-Driven Surface Electromyography



Fabio Rossi, Paolo Motto Ros, Stefano Sapienza, Paolo Bonato, Emilio Bizzi and Danilo Demarchi

**Abstract** The development of surface ElectroMyoGraphy (sEMG) acquisition systems with an optimal trade-off among accuracy, resolution, size and power consumption is a hot topic today. Applied to the sEMG signal, the event-driven Average Threshold Crossing (ATC) technique reduces both the complexity and the power consumption of the acquisition board. The paper presents an sEMG acquisition system based on this approach and shows the advantages of the ATC in this field. A framework for developing bio-signal ATC-processing applications is provided, enabling comparison with a standard sEMG sampling approach. Analyses of both system performance and power consumption show promising results in terms of real-time behavior and energy saving. As a sample application, the system has been used to control Functional Electrical Stimulation (FES), in order to verify the behavior of the ATC approach in such an application.

**Keywords** Surface ElectroMyoGraphy · Event-driven · Average threshold crossing · Bluetooth low energy · Functional electrical stimulation

F. Rossi (⊠) · D. Demarchi Dipartimento di Elettronica e Telecomunicazioni, Politecnico di Torino, Turin, Italy e-mail: fabio.rossi@polito.it

P. Motto Ros Electronic Design Laboratory, Istituto Italiano di Tecnologia, Genoa, Italy

S. Sapienza · P. Bonato Department of Physical Medicine and Rehabilitation, Harvard Medical School, Boston, USA

E. Bizzi Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, USA

© Springer Nature Switzerland AG 2019 S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_21

### 1 Introduction

Surface ElectroMyoGraphy (sEMG) is a non-invasive electrodiagnostic technique for evaluating and recording the electrical activity produced by skeletal muscles [1]. Features obtained by processing and classifying the sEMG signal are employed in many applications, such as determining muscle activation timings, estimating muscle force, recognizing movements and controlling prostheses [2–4]. Miniaturization of the acquisition channel, wireless transmission and power consumption are key aspects in developing wearable sensing systems for the Internet of Things (IoT) [5, 6]. The Average Threshold Crossing (ATC) is an event-driven technique, applied to the sEMG signal, that achieves an optimal trade-off among these requirements. The approach is based on thresholding the muscle signal: every time the signal crosses a static or dynamic threshold, an event is generated, producing the quasi-digital Threshold Crossing (TC) signal. The TC signal has a digital shape but carries analog timing information (i.e., the number of events and, in general, their timing), which makes it directly interpretable by digital electronics. The simple hardware implementation of this technique, shown in Fig. 1, allows on-board TC feature extraction, reducing data digitization, storage and transmission. The ATC parameter is calculated as the ratio between the number of TC events detected during an observation window and the length of the window. It has proven to be highly correlated with muscle force (95% ATC–force correlation) [7]. We have previously shown the benefits of the ATC in the sEMG field [8, 9]: reduced circuit complexity and system size, and energy savings in wireless transmission using Impulse Radio Ultra-Wide Band (IR-UWB).
Nevertheless, IR-UWB does not provide an easy interface with common computers. Therefore, to provide a standard development framework, in this work we replaced IR-UWB with the now ubiquitous Bluetooth Low Energy (BLE).
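The ATC definition above can be emulated offline on a sampled signal. In the actual system the TC signal is produced by an analog comparator (Fig. 1) and counted in hardware. This sketch only reproduces the definition in software, with a static threshold; function names and parameters are illustrative.

```python
def threshold_crossing(signal, threshold):
    """Quasi-digital TC signal: 1 while the sample exceeds the threshold, else 0."""
    return [1 if x > threshold else 0 for x in signal]

def atc(signal, threshold, fs_hz, window_s):
    """ATC per observation window: number of rising TC edges divided by window length (events/s)."""
    tc = threshold_crossing(signal, threshold)
    win = int(round(window_s * fs_hz))
    rates = []
    for start in range(0, len(tc) - win + 1, win):
        # an edge exactly at the window start counts only if the previous sample was 0
        prev = tc[start - 1] if start > 0 else 0
        edges = 0
        for s in tc[start:start + win]:
            if s and not prev:
                edges += 1
            prev = s
        rates.append(edges / window_s)
    return rates

if __name__ == "__main__":
    # synthetic test signal: 5 samples low, 5 high, sampled at 1 kHz -> 100 crossings per second
    sig = ([0.0] * 5 + [1.0] * 5) * 100
    print(atc(sig, threshold=0.5, fs_hz=1000, window_s=0.1))
```

Only the event count per window is kept, so the data to store or transmit shrinks from every sample to one number per window. This is the saving the ATC approach relies on.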



Fig. 1 Average Threshold Crossing (ATC) technique: on the left, a comparison with standard sampling approach; on the right, HW implementation of the ATC

We employed the developed system to control Functional Electrical Stimulation (FES), a rehabilitation technique that applies low-energy electrical pulses to promote muscle contraction. Following the same flow as [10], the basic idea is to use the ATC parameter, instead of the sEMG signal, to define a stimulation pattern in *real time*.

To the best of our knowledge, our system is the first to apply an event-driven sEMG approach, in contrast to common bio-signal acquisition boards (e.g. Bitalino, FreeEMG and OpenBCI boards [11–13]).

### 2 Hardware System Architecture

The hardware architecture can be conceptually divided into two main parts: the acquisition board, already presented in a previous paper [14], and the BLE module. Here we discuss the interface between the ATC Analog Front End (AFE) and a MicroController Unit (MCU), and the transmission of the event data using the BLE protocol. The system has the dual purpose of acquiring the sEMG and generating the TC signals. This makes it possible to compare the performance of the two techniques while fulfilling the real-time constraints. Fig. 2 gives an overview of the whole system.

The TI MSP430FR5969 MCU was chosen for its high performance and ultra-low-power features. The raw sEMG signal is acquired by the integrated ADC at 1 kHz with 8-bit resolution. The resources required to implement the ATC are minimal compared with typical bio-signal digitization [15]. Because the TC signal is sparse in time, each rising edge can be handled as an interrupt that increments a counter. An internal timer resets the counter at the end of each observation window, and Low Power Mode (LPM) is used to reduce power consumption.
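The interrupt-driven counting scheme described above can be modelled in a few lines. This is a host-side Python model, not MSP430 firmware. The class and method names are hypothetical stand-ins for the GPIO edge interrupt and the window timer interrupt. The 130 ms window matches the TC update period mentioned later in this section.

```python
class AtcCounter:
    """Model of the firmware logic: the edge interrupt increments, the window timer latches and resets."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self.count = 0
        self.latest_atc = None

    def on_rising_edge(self) -> None:
        # stands in for the GPIO interrupt raised by each TC rising edge
        self.count += 1

    def on_window_timer(self) -> float:
        # stands in for the timer interrupt at the end of the observation window
        self.latest_atc = self.count / self.window_s  # events per second
        self.count = 0
        return self.latest_atc

if __name__ == "__main__":
    counter = AtcCounter(window_s=0.130)
    for _ in range(13):  # 13 edges arrive during one window (illustrative)
        counter.on_rising_edge()
    print(f"ATC = {counter.on_window_timer():.1f} events/s")
```

Between interrupts the MCU has nothing to do, which is what makes LPM effective. The work per event is a single increment, compared with handling every ADC sample in the sampling approach.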

The Microchip RN4020 and the TI CC2540-USBDongle were chosen as the devices connected to the acquisition system and to the computer, respectively. Both the Generic Access Profile (GAP) and the Generic Attribute Profile (GATT) are defined according to the Bluetooth 4.0 specification [16]. Following the GAP description, the CC2540 acts as the central/master device and the RN4020 as the peripheral/slave. In this configuration, once the connection with the acquisition board is established, the user can supervise the system and switch from one function to another depending on the required task.

The next step is to define the data-exchange structure, which is based on a server/client architecture. We implement the server on the RN4020. Every time new data are available, they are written to the corresponding characteristic and a *notification* message is sent to the client. Within GATT, we decided to use a *private* service and private characteristics, instead of public ones, in order to have autonomous control over characteristic data size, permissions, security and identifiers.



Fig. 2 Overview of the overall system: on the top, a schematic representation of the hardware architecture; on the bottom, the graphical user interface developed for the FES application

The server is organized as follows: a private service contains four private characteristics, for ATC data, sEMG data, control commands and the threshold value. The ATC characteristic is 4 B long, so that the information from all four AFEs fits in the same packet. The sEMG characteristic is 20 B long, the maximum data size supported by the RN4020 [17], to maximize data throughput. The commands exchanged between the user and the acquisition board are stored in the *command* characteristic. The possible operations are *ATC evaluation* of all channels simultaneously, *sEMG acquisition* of one channel at a time, and *Threshold setting* to set a different threshold for TC generation. The last characteristic holds the threshold value. Each characteristic is identified by a handle. In addition to read/write permission, the ATC and sEMG characteristics have notification permission, so that packet data are sent to the master as soon as new data are available.
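The server layout and payload sizes above can be summarized in a small data model. The following Python sketch is illustrative only and does not use the RN4020 command set. The `Characteristic` class, the helper names, and the one-byte-per-AFE encoding of the 4 B ATC packet are assumptions; the sizes of the command and threshold characteristics are not given in the text and are left as `None`.

```python
from dataclasses import dataclass
from typing import Optional
import struct

SEMG_PACKET_BYTES = 20  # maximum RN4020 payload, used for sEMG notifications


@dataclass(frozen=True)
class Characteristic:
    name: str
    size_bytes: Optional[int]  # None where the text gives no size
    notify: bool               # notification permission in addition to read/write


# Hypothetical model of the private service described in the text.
PRIVATE_SERVICE = (
    Characteristic("atc", 4, notify=True),
    Characteristic("semg", SEMG_PACKET_BYTES, notify=True),
    Characteristic("command", None, notify=False),
    Characteristic("threshold", None, notify=False),
)


def pack_atc(counts):
    """Pack the four per-AFE ATC counts into one 4-byte payload.

    Assumes one unsigned byte per channel; values are saturated to 0..255.
    """
    if len(counts) != 4:
        raise ValueError("expected one count per AFE (4 channels)")
    return struct.pack("4B", *(min(max(c, 0), 255) for c in counts))


def pack_semg(samples):
    """Split 8-bit sEMG samples into full 20-byte payloads (a partial tail is held back)."""
    data = bytes(samples)
    usable = len(data) - len(data) % SEMG_PACKET_BYTES
    return [data[i:i + SEMG_PACKET_BYTES] for i in range(0, usable, SEMG_PACKET_BYTES)]
```

For example, `pack_atc([1, 2, 3, 300])` yields `b"\x01\x02\x03\xff"`, and 45 sEMG samples produce two full 20-byte notifications.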

An analysis of the data throughput of both ATC and sEMG wireless transfers is needed to define the most suitable connection parameters for the application, i.e. *connection interval*, *slave latency* and *connection supervision timeout*. The analysis must consider that six packets can be transmitted per connection event. TC data are available every 130 ms for each channel, so the ATC data throughput is:

$$\mathrm{ATC}_{\mathrm{throughput}} = \mathrm{ATC}_{\mathrm{data,4Ch}} \cdot \frac{1}{\mathrm{ATC}_{\mathrm{availability\,time}}} = 4\,\mathrm{B} \cdot \frac{1}{0.13\,\mathrm{s}} \simeq 30.8\,\mathrm{B/s}$$

Therefore, the ATC transmission needs $\frac{1}{0.13\,\mathrm{s}} \simeq 7.7$ events/s to transmit this amount of data, with one ATC packet per event, so we set the connection interval to 130 ms.

On the other hand, the sEMG signal is sampled one channel at a time at 1 kHz with 8-bit resolution, giving a data rate of 1 kB/s. The connection interval therefore has to be changed when sEMG acquisition is enabled. The new value is calculated with the following formula:

$$\frac{\mathrm{sEMG}_{\mathrm{throughput}}}{\mathrm{sEMG}_{\mathrm{packet\,size}}} \cdot \frac{1}{\#\mathrm{packets}_{\mathrm{MAX}}} = \frac{1\,\mathrm{kB/s}}{20\,\mathrm{B}} \cdot \frac{1}{6\,\mathrm{packets/event}} \simeq 8.33\,\mathrm{events/s}$$

This rate allows at most 120 ms between connection events. We set an interval of 110 ms, i.e. about 9 connection events per second, each carrying 6 packets. The slave latency and the supervision timeout are set to 0 and 2 s, respectively, for both acquisition types.
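The throughput arithmetic above can be checked with a few helper functions. This Python sketch restates the two formulas and adds two link-layer facts that are not stated in the text. First, BLE connection intervals are multiples of 1.25 ms. Second, the Bluetooth Core Specification requires the supervision timeout to exceed (1 + slave latency) × interval × 2, as we read it. All function names are hypothetical.

```python
import math

BLE_INTERVAL_STEP_MS = 1.25  # connection intervals are multiples of 1.25 ms


def throughput_bps(bytes_per_update, period_s):
    """Data rate produced by a payload refreshed every period_s seconds."""
    return bytes_per_update / period_s


def required_event_rate(throughput, packet_bytes, packets_per_event):
    """Connection events per second needed to carry the throughput."""
    return throughput / packet_bytes / packets_per_event


def max_interval_ms(throughput, packet_bytes, packets_per_event):
    """Longest connection interval (on the 1.25 ms grid) that still keeps up."""
    raw_ms = 1000.0 * packet_bytes * packets_per_event / throughput
    return math.floor(raw_ms / BLE_INTERVAL_STEP_MS) * BLE_INTERVAL_STEP_MS


def capacity_bps(interval_ms, packet_bytes, packets_per_event):
    """Payload rate offered by a given connection interval."""
    return packet_bytes * packets_per_event * 1000.0 / interval_ms


def supervision_timeout_ok(interval_ms, slave_latency, timeout_ms):
    # Link-layer rule from the Bluetooth Core Specification:
    # timeout > (1 + slave latency) * interval * 2
    return timeout_ms > (1 + slave_latency) * interval_ms * 2


if __name__ == "__main__":
    print(throughput_bps(4, 0.13))          # ATC: about 30.8 B/s
    print(required_event_rate(1000, 20, 6))  # sEMG: about 8.33 events/s
    print(max_interval_ms(1000, 20, 6))      # sEMG: 120.0 ms upper bound
    print(capacity_bps(110, 20, 6))          # about 1091 B/s at 110 ms
```

With these numbers, the 110 ms interval leaves some margin below the 120 ms bound. A 2 s timeout with a slave latency of 0 satisfies the timeout rule for both the 110 ms and 130 ms intervals.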

### 3 Software System Architecture

In the FES field, the goal of our project is to control a commercial stimulator (RehaStim2 by Hasomed<sup>®</sup>), with the stimulation pattern derived from the processed ATC parameter. The device provides a SIMULINK<sup>®</sup> model to interface the stimulator with the simulation software, which allows precise control of the features of each pulse. We used MATLAB<sup>®</sup>, coupled with SIMULINK<sup>®</sup>, to provide a well-known software environment for developing ATC-related applications. On top of this environment, we developed a Graphical User Interface (GUI) for data processing and inter-system control. The GUI manages the BLE connection and the acquisition board, drives the RehaStim2, and monitors both the acquisition and actuation processes. It uses a *multi-threading* approach to meet the *soft real-time* requirement. Figure 2 (bottom) shows the GUI during active stimulation: the graph plots the amplitude of the FES pulses modulated by the value of the ATC parameter.
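The multi-threading approach can be illustrated with a producer/consumer pipeline. In this sketch, one thread feeds ATC values as they arrive and another maps them to pulse amplitudes. It is written in Python with the standard `threading` and `queue` modules rather than MATLAB/SIMULINK, and it does not call any RehaStim2 interface. The linear ATC-to-amplitude mapping, `atc_max`, and the 30 mA ceiling are hypothetical placeholders, not the authors' modulation law.

```python
import queue
import threading


def atc_to_amplitude(atc, atc_max, amp_max_ma):
    """Hypothetical linear map from an ATC value to a pulse amplitude (mA), clamped."""
    ratio = min(max(atc / atc_max, 0.0), 1.0)
    return ratio * amp_max_ma


def run_pipeline(atc_values, atc_max=20, amp_max_ma=30.0):
    """Acquisition thread feeds ATC values; actuation thread turns them into amplitudes."""
    q = queue.Queue()
    amplitudes = []

    def acquisition():
        # Stands in for the thread receiving BLE notifications.
        for v in atc_values:
            q.put(v)
        q.put(None)  # sentinel: end of stream

    def actuation():
        # Stands in for the thread that would update the stimulator.
        while True:
            v = q.get()
            if v is None:
                break
            amplitudes.append(atc_to_amplitude(v, atc_max, amp_max_ma))

    workers = [threading.Thread(target=acquisition), threading.Thread(target=actuation)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return amplitudes
```

For example, `run_pipeline([0, 10, 20, 40])` returns `[0.0, 15.0, 30.0, 30.0]`. The queue decouples reception from actuation, so a slow actuation step does not block incoming data.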

### 4 Performance and Discussion

Finally, we analyze the power consumption of the acquisition board and the soft real-time performance of the system.

Using LPM, we measured a consumption of 5.126 mW for the acquisition board (4 AFEs and 1 MCU). Including wireless transmission, we measured 20.23 mW for TC transmission and 23.47 mW for sEMG transmission. The advantage of the event-driven approach is easy to appreciate: for a comparable power dissipation, the TC information of four AFEs is transmitted instead of a single sEMG channel.
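A back-of-the-envelope normalization of the reported figures makes the comparison explicit. The Python snippet below only rearranges the three measured values from the text. The per-channel division and the "increase over LPM" quantity are our own derived metrics, not measurements reported by the authors.

```python
BOARD_LPM_MW = 5.126  # acquisition board (4 AFEs + MCU) in LPM, no transmission
ATC_TX_MW = 20.23     # with BLE transmission of four TC channels
SEMG_TX_MW = 23.47    # with BLE transmission of one sEMG channel


def per_channel_mw(total_mw, channels):
    """Average power per transmitted channel."""
    return total_mw / channels


def increase_over_lpm_mw(total_mw):
    """Extra power relative to the LPM-only board figure."""
    return total_mw - BOARD_LPM_MW


if __name__ == "__main__":
    atc = per_channel_mw(ATC_TX_MW, 4)
    semg = per_channel_mw(SEMG_TX_MW, 1)
    print(f"TC:   {atc:.2f} mW per channel")
    print(f"sEMG: {semg:.2f} mW per channel")
    print(f"increase over LPM: TC {increase_over_lpm_mw(ATC_TX_MW):.3f} mW, "
          f"sEMG {increase_over_lpm_mw(SEMG_TX_MW):.3f} mW")
```

On this basis, each TC channel costs about 5.06 mW, roughly 4.6 times less than the single sEMG channel.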

Timing performance is evaluated as the delay between muscle contraction and stimulation onset, using an articular goniometer to trigger a timer when a movement is detected. We measured a mean delay of 774.5 ms. This value, relatively high for a real-time application, has two causes. The first is the BLE protocol, which is not fully suited to event-driven transmission. The second is the MATLAB/SIMULINK software, which will be replaced by an embedded computing system in the future.

### 5 Conclusion

In this paper we present a framework for the acquisition and processing of the sEMG signal based on the ATC event-driven approach. We designed and tested the interfacing of event data with a microcontroller and a standard BLE transmission. Compared with the standard bio-signal acquisition approach, this gave good results in terms of firmware complexity and power consumption. As a sample application, we used the system to control a FES stimulator. For this purpose we developed a multi-threading GUI that manages a MATLAB<sup>®</sup> and SIMULINK<sup>®</sup> soft real-time ATC-processing system.

### References

- 1. Robertson, D.G.E.: Research Methods in Biomechanics. 2 edn, part III, chapter 8: Electromyographic Kinesiology - Gary Kamen (2004)
- 2. Mitra, S., Acharya, T.: Gesture recognition: a survey. IEEE Trans. Syst. Man Cybern. Part C (Appl. Rev.) 37, 311–324 (2007)
- 3. Chang, Y.-J., Chen, S.-F., Huang, J.-D.: A Kinect-based system for physical rehabilitation: a pilot study for young adults with motor disabilities, vol. 32, pp. 2566–2570 (2011)
- 4. Girolamo, M.D., Favetto, A., Paleari, M., Celadon, N., Ariano, P.: A comparison of sEMG temporal and spatial information in the analysis of continuous movements. Inf. Med. Unlocked 9, 255–263 (2017)
- 5. Atzori, M., Gijsberts, A., Kuzborskij, I., Elsig, S., Mittaz Hager, A.-G., Deriaz, O., Castellini, C., Müller, H., Caputo, B.: Characterization of a Benchmark Database for Myoelectric Movement Classification (2015)
- 6. Jani, A.B., Bagree, R., Roy, A.K.: Design of a low-power, low-cost ECG EMG sensor for wearable biometric and medical application. In: 2017 IEEE Sensors, pp. 1–3, Oct 2017
- 7. Crepaldi, M., Paleari, M., Bonanno, A., Sanginario, A., Ariano, P., Tran, D.H., Demarchi, D.: A quasi-digital radio system for muscle force transmission based on event-driven IR-UWB. In: 2012 IEEE Biomedical Circuits and Systems Conference (BioCAS), pp. 116–119, Nov 2012
- 8. Motto Ros, P., Paleari, M., Celadon, N., Sanginario, A., Bonanno, A., Crepaldi, M., Ariano, P., Demarchi, D.: A wireless address-event representation system for ATC-based multi-channel force wireless transmission. In: 5th IEEE International Workshop on Advances in Sensors and Interfaces IWASI, pp. 51–56, June 2013
- 9. Sapienza, S., Crepaldi, M., Motto Ros, P., Bonanno, A., Demarchi, D.: On integration and validation of a very low complexity ATC UWB system for muscle force transmission. IEEE Trans. Biomed. Circuits Syst. 10, 497–506, April 2016
- 10. Shalaby, R.E.-S.: Development of an electromyography detection system for the control of functional electrical stimulation in neurological rehabilitation (2011)
- 11. Bitalino: BITalino Board Kit Data Sheet (2015)
- 12. BTS Bioengineering: BTS FreeEMG user manual, October 2008
- 13. OpenBCI, OPENBCI CYTON. http://docs.openbci.com/Hardware/02-Cyton
- 14. Guzman, D.A.F., Sapienza, S., Sereni, B., Motto Ros, P.: Very low power event-based surface EMG acquisition system with off-the-shelf components. In: IEEE Biomedical Circuits and Systems Conference (BioCAS) (2017). https://doi.org/10.1109/BIOCAS.2017.8325152
- 15. Lichtman, A., Fuchs, P.: Hardware and software design for one channel ECG measurement using MSP430 microcontroller. In: 2018 Cybernetics & Informatics (K&I), pp. 1–5, Jan 2018
- 16. Bluetooth Special Interest Group (SIG) and Bluetooth SIG Working Groups, BLUETOOTH SPECIFICATION Version 4.0, June 2010
- 17. Microchip, RN4020 Bluetooth Low Energy Module DataSheet, September 2015

# Activity Monitoring and Phase Detection Using a Portable EMG/ECG System



## Wilhelm Daniel Scherz, Ralf Seepold, Natividad Martínez Madrid, Paolo Crippa, Giorgio Biagetti, Laura Falaschetti and Claudio Turchetti

**Abstract** The investigation of stress requires distinguishing between stress caused by physical activity and stress caused by psychosocial factors. If only a single parameter is monitored, the behaviour of the heart in response to stress is very similar to its response to physical activity. Currently, this differentiation remains difficult: methods that use only the heart rate cannot differentiate between stress and physical activity without additional sensor input. The proposed approach focuses on methods that generate signals with characteristics useful for detecting stress, physical activity, no activity and relaxation.

W. D. Scherz (⊠) · R. Seepold HTWG Konstanz, Alfred-Wachtel-Str. 8, 78462 Konstanz, Germany e-mail: wscherz@htwg-konstanz.de

R. Seepold e-mail: ralf.seepold@htwg-konstanz.de

R. Seepold · N. Martínez Madrid Department of Information and Internet Technology, Sechenov University, Campus Trubetskaya str., 8, b. 2, 119992 Moscow, Moscow, Russia e-mail: natividad.martinez@reutlingen-university.de

N. Martínez Madrid Reutlingen University, Alteburgstr. 150, 72762 Reutlingen, Germany

P. Crippa · G. Biagetti · L. Falaschetti · C. Turchetti Department of Information Engineering, Università Politecnica delle Marche, Via Brecce Bianche, 12, 60131 Ancona, Italy e-mail: p.crippa@univpm.it

G. Biagetti e-mail: g.biagetti@univpm.it

L. Falaschetti e-mail: l.falaschetti@univpm.it

C. Turchetti e-mail: c.turchetti@univpm.it

© Springer Nature Switzerland AG 2019 S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_22

# 1 Introduction

Stress and activity are related in several ways. As a long-term psychosocial factor, stress has a negative effect on the cardiovascular system [1]. Acute stress can even trigger spontaneous panic attacks [2], induce immune dysfunction and reduce quality of life [3]. Stress is always present in modern life, and it is not always destructive [4]. Health self-awareness and the analysis of bio-vital data enabled by current technology are emerging topics in modern society. New technologies offer options for analysing stress and understanding the patterns of bio-vital data. In the presence of real danger, stress is a mechanism that allows us to react faster and more effectively: under stress, the body reorganizes and prioritizes its functions to achieve maximal performance [4]. This mechanism is also used in daily life, whenever quick and effective reactions are needed. Such a response can be triggered by social media in adolescents or by work events [5], and as a consequence the body may be under stress. Stressors are defined as external influences that induce stress. Long-term high stress levels can cause difficulty in responding adequately to physical, mental and emotional demands [6–9], as well as burnout and cardiac diseases [10, 11]. In addition, brain structure might change [12]. Stress should not be considered exclusively as the peak of a dramatic event: many events that require a fast response and elevated activity can contribute to stress.

Physical activity and stress have very similar effects on the body, for example changes in heart rate, heart rate variability and blood pressure. Physical activity has similar short-term effects, such as changes in heart rate and respiration rate. It also has long-term effects if it is repeated over a long period [13].

In previous attempts [14, 15] to measure stress from electrocardiography (ECG), heart rate (HR) and heart rate variability (HRV) data, we faced the problem of differentiating stress from physical activity. Both cause very similar changes in the ECG, HR, HRV, etc. [12, 16].

Mobile ECG devices and activity monitoring devices can also be used to determine stress, but the main challenge remains the correct distinction between stress and (sport) activity. Currently, there is no clear and easy method to differentiate between stress and activity when movement detectors are not used.

For this purpose, we designed a test to collect data in different situations, with the aim of distinguishing stress from physical activity. The original experiment was designed to generate four datasets: no activity, activity, relaxation and mental stress. For this work, we used a shorter, modified version of the test with only three datasets, leaving out mental stress. This simplified the analysis and let us concentrate, as a first step, on the major impact factors.

To collect data from the subjects and analyse their behaviour (no activity, activity and relaxation), we used a low-cost wearable wireless system specifically designed to acquire surface electromyography (EMG), ECG and accelerometer data [17, 18]. The system consists of wireless sensing nodes capable of acquiring, processing and transmitting motion-related body signals and ECG data.

# 1.1 State of the Art

Currently, there are three well-known approaches to measuring stress: stress reports, physiological signals, and behavioural analysis. Stress reports are variations of the Perceived Stress Scale [19], which measures the degree of stress in various situations. Their disadvantage is that answering these questionnaires requires considerable time and effort and depends on individual motivation.

The second method for measuring stress is based on physiological signals and can be divided into two sub-methods. The first, (a), uses laboratory measurements: hormones such as cortisol and adrenaline, released into saliva and blood, can be measured to determine stress [20]. The second, (b), captures and analyses bio-vital data (physiological signals) such as HR, ECG, skin conductivity (electrodermal activity), EMG and HRV calculated from the ECG. As mentioned in [15], stress can influence the behaviour of the heart, and heart rate variability can give useful information about such changes. Another example of the use of physiological parameters is the analysis of skin conductivity. Most of these methods work offline (post-processing).

The third method uses behaviour as the input for stress measurement, analysing small differences in behaviour with and without stress. In [21, 22], typing behaviour under stress was analysed. Another example is the analysis of steering-wheel use while driving [23]. All these methods are tailored to specific situations and are difficult to adapt to other use cases.

It is known that stress has an influence on physical activity, meditation and relaxation [24], and that relaxation and meditation can be analysed by observing heart dynamics [24]. A popular method of detecting physical activity uses wearable sensors placed on the body [25]. The physiological signals usually used to detect physical activity are EMG, ECG, photoplethysmography (PPG) and temperature. These signals are nowadays widely available because suitable sensors are embedded in smart watches, smartphones, smart wearables (clothes, glasses, etc.) and chest bands. The second common option for evaluating physical activity is inertial measurement, which records changes in orientation and acceleration. However, the detection of physical activity faces a challenge: its quality is strongly influenced by sensor placement, so these methods are optimised for a specific placement. Activity detection based on sensors on the limbs can also fail because of major body movements [18].

# 2 Method

In this section, we describe the tests performed and the details of the data collection. The experiment was performed in a room with artificial light and constant temperature. Each phase was limited to 5 min, with a 2-min pause between phases. For the first iteration, we collected data from two participants, one male and one female, aged 26–31. None of the participants reported health problems such as heart disease, and neither of them smokes.

The experiment consists of four phases in the following order. The first phase is rest: data are collected while the participant rests (sitting on a chair, eyes open), i.e. no special activity is performed. The data from this phase serve as a reference for comparison with the data of the later phases. This phase lasts 5 min. The second phase is physical activity, in which the participant performs physical exercises. For repeatability, this test was simplified to easy exercises performed in this order: 11 biceps curls, 11 lateral raises, 1 isometric contraction (approx. 10 s), 11 frontal raises and 11 vertical raises. The phase takes approx. 5 min.

In the third phase, the participant is exposed to psychological stressors. The most commonly used are the Stroop test [26], the Trier test [27] and arithmetic operations that must be solved within a limited time. This phase lasts approx. 5 min. The fourth phase is relaxation. It was designed to induce relaxation, so that heart-rate dynamics can be observed and compared with the other phases. Here, participants listen to classical music (sitting, eyes closed). We used Mozart's Sonata for Piano Duet in D major, K. 448: II. Andante. The phase lasts approx. 5 min (although the composition is longer).
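The protocol above can be written down as a simple schedule. This Python sketch treats each "approx. 5 min" phase as exactly 5 min and inserts the 2-min pause only between consecutive phases. It also lets the stress phase be dropped, matching the shortened three-dataset version used in this work. The phase labels and the `schedule` helper are hypothetical.

```python
PHASES = [
    ("rest", 5),
    ("physical activity", 5),
    ("mental stress", 5),
    ("relaxation", 5),
]
PAUSE_MIN = 2


def schedule(phases, pause_min=PAUSE_MIN, include=None):
    """Return (name, start_min, end_min) for each selected phase, with pauses in between."""
    timeline = []
    t = 0
    selected = [p for p in phases if include is None or p[0] in include]
    for k, (name, duration) in enumerate(selected):
        if k > 0:
            t += pause_min  # pause between consecutive phases only
        timeline.append((name, t, t + duration))
        t += duration
    return timeline


if __name__ == "__main__":
    for row in schedule(PHASES):
        print(row)
```

Under these assumptions, the full four-phase protocol ends at minute 26. The shortened version without the stress phase ends at minute 19.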

During the experiment, ECG, EMG and accelerometer data are recorded with Wireless Sensor Nodes developed at the Università Politecnica delle Marche in Ancona. Three mobile nodes [17] were used: two collected EMG data and the third collected ECG data. For EMG, the nodes were placed on the biceps (biceps brachii) and the deltoideus medius; for ECG, the node was placed near the musculus pectoralis major. Figure 1 shows the placement of the nodes on the arm. This placement was chosen, based on [17], to simplify the detection and reconstruction of activity. The ECG data are evaluated by extracting the R-peaks (maximum peaks) of the QRS complexes and comparing each with its predecessor. All data from the nodes are stored per session in a file containing all sensor values (EMG/ECG values, time stamp, accelerometer and temperature). Currently, temperature and accelerometer data are not used.
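The text does not specify how the R-peaks are extracted. The following Python sketch is therefore only a simple stand-in, not the authors' detector and not a full algorithm such as Pan-Tompkins. It marks local maxima above an amplitude threshold, merges candidates closer than a refractory period (keeping the taller one), and converts consecutive peak positions into RR intervals. The threshold and the 0.25 s refractory period are assumptions.

```python
def detect_r_peaks(ecg, fs_hz, threshold, refractory_s=0.25):
    """Return sample indices of R-peaks: local maxima above `threshold`,
    at least `refractory_s` apart (the taller candidate wins inside that window)."""
    min_gap = int(refractory_s * fs_hz)
    peaks = []
    for i in range(1, len(ecg) - 1):
        is_local_max = ecg[i] > ecg[i - 1] and ecg[i] >= ecg[i + 1]
        if ecg[i] < threshold or not is_local_max:
            continue
        if peaks and i - peaks[-1] < min_gap:
            if ecg[i] > ecg[peaks[-1]]:
                peaks[-1] = i  # replace a smaller peak inside the refractory window
            continue
        peaks.append(i)
    return peaks


def rr_intervals_s(peaks, fs_hz):
    """RR intervals in seconds between consecutive R-peaks."""
    return [(b - a) / fs_hz for a, b in zip(peaks, peaks[1:])]


if __name__ == "__main__":
    fs = 100
    ecg = [0.0] * 300
    for k in (50, 130, 220):  # synthetic "beats"
        ecg[k] = 1.0
    p = detect_r_peaks(ecg, fs, threshold=0.5)
    print(p, rr_intervals_s(p, fs))
```

On real recordings, the signal would normally be band-pass filtered before peak picking, and the threshold adapted to the signal amplitude.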





Fig. 2 Both figures show the temporal evolution of the RR intervals and their change. The upper figure shows a resting phase. The lower figure shows the physical activity phase

# 3 Results

In a previously performed experiment, we obtained the following preliminary results by observing the RR intervals, as shown in Fig. 2. Both panels visualise the change between two consecutive heartbeats:

$$Y = \left| RR_{i-1} - RR_i \right| \tag{1}$$

The upper panel of Fig. 2 shows no clear changes in the signal. By contrast, the lower panel shows a remarkable shift in the signal. We expect this shift to serve as the differentiator between stress and physical activity. We also expect significant differences among all four phases of the test. A further observed characteristic is a lower variance in the HRV during the different phases of the experiment. For the test persons, the experiment shows a clear marker to distinguish between stress and physical activity.
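Eq. (1) and the variance observation can be turned into per-phase descriptors that are easy to compare. The following Python sketch is a minimal illustration, not the authors' analysis pipeline. It computes Y for every pair of consecutive RR intervals, then summarizes a phase by the mean of Y and the variance of its RR series. The choice of these two summary statistics is our assumption. Each phase needs at least two RR intervals.

```python
from statistics import mean, pvariance


def successive_rr_differences(rr):
    """Eq. (1): Y = |RR_{i-1} - RR_i| for each pair of consecutive RR intervals."""
    return [abs(prev - cur) for prev, cur in zip(rr, rr[1:])]


def phase_summary(rr):
    """Per-phase descriptors: mean of Y and population variance of the RR series."""
    if len(rr) < 2:
        raise ValueError("need at least two RR intervals")
    y = successive_rr_differences(rr)
    return {"mean_Y": mean(y), "var_RR": pvariance(rr)}


if __name__ == "__main__":
    rest = [0.82, 0.84, 0.83, 0.85, 0.84]      # illustrative values only
    activity = [0.70, 0.62, 0.58, 0.55, 0.52]  # illustrative values only
    print("rest:", phase_summary(rest))
    print("activity:", phase_summary(activity))
```

The RR values in the usage block are made up for illustration. With real recordings, the same summary would be computed per phase from the RR intervals extracted from the ECG.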

# 4 Conclusion and Outlook

In this paper, we briefly describe a simple and compact experiment that generates data for distinguishing between rest, physical activity, stress and relaxation. We used a mobile node system that can detect different physical activities and could also be used to detect stress and relaxation phases. This serves as a starting point for further investigation of the difference between stress and physical activity.

Stress and physical activity have similar properties, such as variations in heart rate. When developing a stress-detection system, it is very important to minimize false detections of stress caused by physical activity or relaxation. Misdetection can itself cause stress and reduces trust in the system. Analysing the behaviour of the heart during different activities (no activity, physical activity, stress and relaxation) could be a suitable method for answering this question. Currently, physical activity and stress can be differentiated for validation purposes with an activity monitoring system. To confirm this preliminary result, a larger study with more participants has to be performed, which is planned for the future. The aim of that study is to validate the behaviour of the heart as a discriminator for stress.

Acknowledgements This research was partially funded by the EU Interreg V-Program "Alpenrhein-Bodensee-Hochrhein": Project "IBH Living Lab Active and Assisted Living".

### References

- 1. Black, P.H., Garbutt, L.D.: Stress, inflammation and cardiovascular disease. J. Psychosom. Res. 52(1), 1–23 (2002)
- 2. Schmidt, N.B., Lerew, D.R., Jackson, R.J.: The role of anxiety sensitivity in the pathogenesis of panic: prospective evaluation of spontaneous panic attacks during acute stress. J. Abnorm. Psychol. 5(2), 355–364 (1997)
- 3. Epel, E.S., Blackburn, E.H., Lin, J., Dhabhar, F.S., Adler, N.E., Morrow, J.D., Cawthon, R.M.: Accelerated telomere shortening in response to life stress. Proc. Natl. Acad. Sci. 101(49), 17312–17331 (2004)
- 4. Jansen, A.S.P., Van Nguyen, X., Karpitskiy, V., Mettenleiter, T.C., Loewy, A.D.: Central command neurons of the sympathetic nervous system: basis of the fight-or-flight response. Science 270, 644–646 (1995)
- 5. Afifi, T.D., Zamanzadeh, N., Harrison, K., Callejas, M.A.: WIRED: the impact of media and technology use on stress (cortisol) and inflammation (interleukin IL-6) in fast paced families. Comput. Hum. Behav. 81, 265–273 (2018)
- 6. Martinez Fernandez, J., Augusto, J.C., Seepold, R., Martínez Madrid, N.: A sensor technology survey for a stress-aware trading process. IEEE Trans. Sys. Man Cybern. Part C Appl. Rev. 42, 809–824 (2012)
- 7. Martínez Fernández, J., Augusto, J.C., Seepold, R., Martínez Madrid, N.: Why Traders Need Ambient Intelligence, Germany. Springer Berlin, Heidelberg (2010)
- 8. Martínez Fernández, J., Augusto, J.C., Trombino, G., Seepold, R., Martínez Madrid, N.: Self-aware trader: a new approach to safer trading. J. Univ. Comput. Sci. (2013)
- 9. Roozendaal, B., McEwen, B.S., Chattarji, S.: Stress, memory and the amygdala. Nature Rev. Neurosci. 10, 423–433 (2009)
- 10. Orth-Gomér, K., et al.: Marital stress worsens prognosis in women with coronary heart disease: the Stockholm female coronary risk study. J. Am. Med. Assoc. (2000)
- 11. Mei, Y., Thompson, M.D., Cohen, R.A., Tong, X.: Autophagy and oxidative stress in cardiovascular diseases. In: Biochimica et Biophysica Acta (BBA) - Molecular Basis of Disease (2015)
- 12. Lupien, S.J., McEwen, B.S., Gunnar, M.R., Heim, C.: Effects of stress throughout the lifespan on the brain, behaviour and cognition. Nat. Rev. Neurosci. **10**, 434–445 (2009)
- 13. Gutin, B., et al.: Effects of exercise intensity on cardiovascular fitness, total body composition, and visceral adiposity of obese adolescents. Am. J. Clin. Nutr. 75(5), 818–826 (2002)
- 14. Scherz, W.D., Ortega, J.A., Seepold, R.: Towards emotion pattern extraction with the help of stress detection techniques in order to enable a healthy life. In: ARCA XXVII Conference on Qualitative Systems and Applications in Diagnosis, Robotics and Ambient Intelligence (2016)
- 15. Scherz, W.D., Ortega, J.A., Seepold, R.: Heart rate variability indicating stress visualized by correlations plots. In: Lecture Notes in Bioinformatics and Biomedical Engineering (LNBI), Vol 9044, Subseries of Lecture Notes in Computer Science (2015)
- 16. Kidd, T., Carvalho, L.A., Steptoe, A.: The relationship between cortisol responses to laboratory stress and cortisol profiles in daily life. Biol. Psychol. 25(02), 34–40 (2014)
- 17. Biagetti, G., Crippa, P., Falaschetti, L., Orcioni, S., Turchetti, C.: Wireless surface electromyograph and electrocardiograph system on 802.15.4. IEEE Trans. Consum. Electron. 62(3), 258–266 (2016)
- 18. Biagetti, G., Crippa, P., Falaschetti, L., Orcioni, S., Turchetti, C.: Human activity monitoring system based on wearable sEMG and accelerometer wireless sensor nodes. BioMed. Eng. Online 17(132) (2018)
- 19. Cohen, S., Kamarck, T., Mermelstein, R.: A global measure of perceived stress. J. Health Soc. Behav. 24(4), 385–396 (1983)
- 20. Hellhammer, J., Schubert, M.: The physiological response to Trier Social Stress Test relates to subjective measures of stress during but not before or after the test. Psychoneuroendocrinology 37(1) (2012)
- 21. Gunawardhane, S.D., De Silva, P.M., Kulathunga, D.S., Arunatileka, S.M.: Non invasive human stress detection using key stroke dynamics and pattern variations. In: International Conference on Advances in ICT for Emerging Regions (ICTer), Colombo (2013)
- 22. Vizer, L., Zhou, L., Sears, A.: Automated stress detection using keystroke and linguistic features: an exploratory study. Int. J. Hum Comput Stud. 67(10), 870–886 (2009)
- 23. Paredes, P.E., Ordoñez, F., Ju, W., Landay, J.A.: Fast & furious: detecting stress with a car steering wheel. In: CHI Conference on Human Factors in Computing Systems, Montréal (2018)
- 24. Peng, C.-K., Henry, I.C., Mietus, J.E., Hausdorff, J.M., Khalsa, G., Benson, H., Goldberger, A.L.: Heart rate dynamics during three forms of meditation. Int. J. Cardiol., 19–27 (2004)
- 25. Magno, M., Benini, L., Spagnol, C., Popovici, E.: Wearable low power dry surface wireless sensor node for healthcare monitoring application. In: International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), pp. 189–195 (2013)
- 26. Stroop, J.R.: Studies of interference in serial verbal reactions. J. Exp. Psychol., 643–662 (1935)
- 27. Kirschbaum, C., Pirke, K.-M., Hellhammer, D.H.: The 'Trier Social Stress Test': a tool for investigating psychobiological stress responses in a laboratory setting. Neuropsychobiology, 78–81 (1993)

# Transformer Design for 77-GHz Down-Converter in 28-nm FD-SOI CMOS Technology



Andrea Cavarra, Claudio Nocera, Giuseppe Papotto, Egidio Ragonese and Giuseppe Palmisano

**Abstract** This paper presents a comparative analysis of integrated transformers for a 77-GHz down-converter in a 28-nm fully depleted (FD) silicon-on-insulator (SOI) CMOS technology. The proposed down-converter targets long-range automotive radar applications. It is based on a fully differential mixer-first architecture and exploits two integrated transformers: an input transformer for single-ended-to-differential conversion of the 77-GHz signal, and an inter-stage transformer that feeds a current-driven passive Gilbert cell. Both transformers were designed using the most suitable spiral configuration to meet the stringent requirements of automotive applications. To this aim, stacked, interleaved, and interstacked transformers have been compared by means of extensive electromagnetic simulations at 77 GHz. The comparison has been carried out in terms of insertion loss (IL) and transformer characteristic resistance (TCR), which are the most suitable figures of merit. The interstacked configuration provides the lowest IL (i.e., 1.2 dB at 77 GHz) and is thus the best choice as input balun. The interleaved topology was instead chosen as inter-stage transformer thanks to its high TCR (i.e., $1.9 \text{ k}\Omega$ at 77 GHz), which leads to better conversion gain.

A. Cavarra · C. Nocera · E. Ragonese (⊠) · G. Palmisano DIEEI, Università di Catania, Viale A. Doria 6, 95125 Catania, Italy e-mail: egidio.ragonese@unict.it

A. Cavarra e-mail: andrea.cavarra@studium.unict.it

C. Nocera e-mail: claudio.nocera@unict.it

G. Palmisano e-mail: giuseppe.palmisano@unict.it

G. Papotto STMicroelectronics, Stradale Primosole 50, 95121 Catania, Italy e-mail: giuseppe.papotto@st.com

© Springer Nature Switzerland AG 2019 S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_23

# 1 Introduction

In recent years, 77-GHz radar sensors have attracted great interest due to their application in Advanced Driver Assistance Systems (ADAS). Highly scaled CMOS technologies are very promising for implementing low-cost mm-wave automotive radar systems. They enable System-on-Chip (SoC) implementations that include the mm-wave front-end, the analog base-band, and the digital processing circuitry. As far as the mm-wave front-end is concerned, one of the most critical building blocks is the down-converter. It has to cope with stringent requirements in terms of noise figure (NF), linearity, and power consumption. In fact, the major design challenge comes from the limited voltage headroom. To overcome this limitation, architectures based on inductive components are extensively adopted at mm-wave frequencies [1, 2]. Different solutions have been proposed in the literature [3–6]. However, the best trade-off between linearity and noise performance is provided by mixer-first architectures, which rely on a low-noise current-driven mixer to down-convert the RF input signal. These solutions guarantee high linearity and low power consumption while providing acceptable noise and conversion gain performance, despite the lack of an input low-noise amplifier (LNA).

Figure 1 displays the simplified schematic of the proposed down-converter [7]. It consists of two main blocks. The first is an input transformer $T_1$, which performs both single-ended-to-differential signal conversion and 50-$\Omega$ input matching. The second is an active transconductor (*V-I* converter), which delivers the RF current to a Gilbert cell through transformer $T_2$. Optimizing the down-converter performance calls for a careful design of both transformers, $T_1$ and $T_2$. Specifically, input transformer $T_1$ mainly affects the down-converter *NF*, so its *IL* at 77 GHz must be minimized at resonance. Conversely, load transformer $T_2$ of the *V-I* converter affects the conversion gain, so its *TCR* must be maximized [8, 9].
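This section does not spell out how IL is computed. A common choice for passive transformers is the minimum insertion loss derived from the maximum available gain of the reciprocal two-port. In Z-parameters, this is $G_{max} = 1 + 2x - 2\sqrt{x^2 + x}$ with $x = (\mathrm{Re}\,Z_{11}\,\mathrm{Re}\,Z_{22} - (\mathrm{Re}\,Z_{12})^2)/|Z_{12}|^2$, and $IL = -10\log_{10} G_{max}$. The Python sketch below assumes that definition. It does not model the TCR, and the Z-parameter values in the usage example are hypothetical rather than simulated data.

```python
import math


def max_available_gain(z11, z22, z12):
    """G_max of a passive reciprocal two-port (z21 = z12) from its Z-parameters.

    Uses G_max = 1 + 2x - 2*sqrt(x^2 + x), where
    x = (Re z11 * Re z22 - (Re z12)^2) / |z12|^2  (x >= 0 for a passive network).
    """
    x = (z11.real * z22.real - z12.real ** 2) / abs(z12) ** 2
    return 1 + 2 * x - 2 * math.sqrt(x * x + x)


def min_insertion_loss_db(z11, z22, z12):
    """Minimum insertion loss in dB: IL = -10 * log10(G_max)."""
    return -10 * math.log10(max_available_gain(z11, z22, z12))


if __name__ == "__main__":
    # Hypothetical Z-parameters (ohms) at one frequency point.
    z11 = 2.0 + 40.0j
    z22 = 2.5 + 45.0j
    z12 = 0.5 + 30.0j
    print(f"IL_min = {min_insertion_loss_db(z11, z22, z12):.2f} dB")
```

A lossless transformer (purely imaginary Z-parameters) gives $x = 0$, $G_{max} = 1$ and $IL = 0$ dB. Resistive losses increase $x$ and therefore the IL.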

In this paper, a comparative analysis of monolithic transformers is carried out to identify the most suitable structures for the proposed down-converter. To this end, stacked, interleaved, and interstacked transformers were designed in 28-nm FD-SOI CMOS technology through extensive EM simulations.

Section 2 describes the designed transformers, Sect. 3 reports the simulation results, and Sect. 4 draws the conclusions.



Fig. 1 Simplified schematic of the down-converter architecture for the 77-GHz radar application

### 2 Transformer Design

The 28-nm FD-SOI CMOS technology by STMicroelectronics [10] provides a back-end-of-line (BEOL) with 8 Cu metal layers, with a thick-Cu option for the top two (referred to as IB and IA), and an Al metal layer (LB) at the top of the stack, as shown in Fig. 2a. The two thick-copper layers and the aluminum layer can profitably be used to implement integrated transformers, minimizing both resistive losses and parasitic capacitance towards the substrate.

Three octagonal single-turn transformers were designed with stacked, interleaved, and interstacked configurations, as shown in Fig. 2. Each structure was sized to guarantee a self-resonance frequency (SRF) of around twice the operating frequency. The stacked (Fig. 2b) and interleaved (Fig. 2c) transformers are based on standard configurations. Specifically, the primary and secondary coils of the stacked transformer are built in LB/IB and IA, respectively, whereas both coils of the interleaved transformer are built in LB/IB. The interstacked transformer adopts a mixed interleaved/stacked coil configuration, which achieves a higher magnetic coupling [11–13]. Specifically, the outer (inner) spiral of the primary winding is stacked over the outer (inner) spiral of the secondary winding and, at the same time, interleaved with the inner (outer) spiral of the secondary winding, as shown in Fig. 2d. Both primary and secondary coils are built in IA/IB. Note that no patterned ground shield (PGS) was used, since at 77 GHz it would have a negligible impact on the substrate losses while significantly reducing the SRF [13, 14].

Fig. 2 Stack of 28-nm FD-SOI CMOS technology (a) and 3D-view of stacked (b), interleaved (c) and interstacked transformers (d)

Fig. 3 *Q*-factors of primary (a) and secondary (b) coils for stacked, interleaved, and interstacked transformers

# 3 Simulation Results

Transformer EM simulations were carried out in ADS Momentum by Keysight. Figures 3 and 4 report the simulated quality factor, *Q*, of the primary and secondary coils and the magnetic coupling, *k*, of each designed transformer. As apparent, all the designed structures exhibit an SRF of about 170 GHz. Furthermore, the interleaved transformer guarantees the highest *Q*-factor for both the primary and the secondary coil, whereas the interstacked configuration maximizes the magnetic coupling, with a *k*-factor of about 0.68 at 77 GHz (see Fig. 4). Figure 5 displays the *IL* and *TCR* of each transformer. The lowest *IL* at 77 GHz is achieved by the interstacked topology (around 1.2 dB), whereas the interleaved transformer ensures the maximum *TCR* (about 1.9 k$\Omega$). Since the *IL* directly affects the down-converter *NF*, the interstacked transformer is the best choice for the input balun.



Fig. 5 Simulated performance of the stacked, interleaved, and interstacked transformers: insertion loss (a) and TCR (b)

| Parameters                         | Stacked | Interstacked | Interleaved | Units |
|------------------------------------|---------|--------------|-------------|-------|
| Metal width                        | 5.5     | 6.5          | 5.5         | (µm)  |
| Inner diameters P/S                | 44      | 30           | 55/70       | (µm)  |
| Primary coil inductance @ 77 GHz   | 90      | 72           | 130         | (pH)  |
| Secondary coil inductance @ 77 GHz | 96      | 72           | 110         | (pH)  |
| Primary coil Q-factor @ 77 GHz     | 17      | 18           | 23          | -     |
| Secondary coil Q-factor @ 77 GHz   | 26      | 18           | 27          | -     |
| SRF                                | 170     | 174          | 175         | (GHz) |
| k @ 77 GHz                         | 0.62    | 0.68         | 0.56        | -     |
| IL @ 77 GHz (in resonance mode)    | 1.4     | 1.2          | 2.5         | (dB)  |
| *TCR* @ 77 GHz                     | 1.4     | 0.9          | 1.9         | (kΩ)  |

Table 1 Transformer geometrical and electrical parameter comparison

The interleaved structure maximizes the amount of RF current delivered to the Gilbert cell thanks to its high *TCR*, and is therefore the best solution for the *V-I* converter load transformer. Table 1 summarizes the geometrical and electrical parameters of the designed transformers.
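The selection rule applied in this section can be written down explicitly. The sketch below uses the Table 1 values and picks the lowest-*IL* structure for $T_1$ and the highest-*TCR* structure for $T_2$; the SRF screening threshold (at least twice the 77-GHz operating frequency) is an assumption inferred from the sizing criterion in Sect. 2, not a rule stated by the authors.

```python
# Table 1 values at 77 GHz (SRF in GHz, IL in dB, TCR in kOhm).
TABLE_1 = {
    "Stacked":      {"srf_ghz": 170.0, "il_db": 1.4, "tcr_kohm": 1.4},
    "Interstacked": {"srf_ghz": 174.0, "il_db": 1.2, "tcr_kohm": 0.9},
    "Interleaved":  {"srf_ghz": 175.0, "il_db": 2.5, "tcr_kohm": 1.9},
}

F_OP_GHZ = 77.0


def srf_ok(params, f_op_ghz=F_OP_GHZ, ratio=2.0):
    """Assumed screening rule: keep structures whose SRF is at least ratio * f_op."""
    return params["srf_ghz"] >= ratio * f_op_ghz


def select_transformers(table):
    """Return (T1, T2): lowest-IL input balun and highest-TCR V-I converter load."""
    candidates = {name: p for name, p in table.items() if srf_ok(p)}
    t1 = min(candidates, key=lambda n: candidates[n]["il_db"])     # IL sets the NF
    t2 = max(candidates, key=lambda n: candidates[n]["tcr_kohm"])  # TCR sets the gain
    return t1, t2


if __name__ == "__main__":
    t1, t2 = select_transformers(TABLE_1)
    print(f"T1 (input balun): {t1}, T2 (V-I load): {t2}")
```

Keeping the two figures of merit separate mirrors the text: each transformer is chosen for the one parameter that dominates its role, rather than through a single weighted score.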

Finally, the interstacked and interleaved structures were used as transformers $T_1$ and $T_2$, respectively, in the 77-GHz down-converter shown in Fig. 1. The *V-I* converter basically consists of a pseudo-differential common-source stage with inductive degeneration for 50-$\Omega$ input matching, and draws a current of about 12 mA. The *V-I* converter feeds the RF current, through $T_2$, to a passive Gilbert cell, chosen for its reduced flicker noise. Figure 6 reports the short-circuit transconductance conversion gain of the 77-GHz down-converter along with its *NF*. The down-converter exhibits an *NF* as low as 6.3 dB at 1-MHz intermediate frequency (IF), with a conversion gain of 46 mS at 77 GHz.
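Because the conversion gain above is a short-circuit transconductance (IF output current per RF input voltage), it is expressed in siemens rather than in dB. As a rough illustration, if the IF current flows entirely into a resistive load, the voltage conversion gain is approximately $G_c R_L$. The sketch below assumes a hypothetical 1-kΩ IF load and that this load is much smaller than the mixer output resistance; neither assumption comes from the paper.

```python
import math


def voltage_conversion_gain_db(gc_siemens: float, r_load_ohm: float) -> float:
    """Approximate voltage conversion gain of a current-output down-converter.

    Assumes the IF load is much smaller than the mixer output resistance, so the
    short-circuit IF current gc * v_rf flows entirely into r_load.
    """
    return 20.0 * math.log10(gc_siemens * r_load_ohm)


if __name__ == "__main__":
    # 46 mS is the simulated value at 77 GHz; the 1-kOhm load is hypothetical.
    print(f"{voltage_conversion_gain_db(46e-3, 1e3):.1f} dB")  # about 33.3 dB
```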



**Fig. 6** Simulated down-converter performance. Short-circuit transconductance conversion gain as a function of the RF frequency (**a**) and NF as a function of the IF frequency (**b**)

# 4 Conclusion

The design of mm-wave integrated transformers for a 77-GHz automotive radar down-converter in 28-nm FD-SOI CMOS technology has been presented. Three different transformers, namely stacked, interleaved, and interstacked, with the SRF set to about twice the operating frequency, have been compared in terms of *IL* and *TCR*, and the best solution has been selected for each transformer of the down-converter. The interstacked configuration exhibits the lowest *IL*, making it the most suitable solution for the down-converter input balun. On the other hand, the interleaved configuration turns out to be the best option for the inter-stage transformer, thanks to its high *TCR*.

### References

1. Medra, A., et al.: An 80 GHz low-noise amplifier resilient to the TX spillover in phase-modulated continuous-wave radars. IEEE J. Solid-State Circuits 51, 2299–2311 (2016)
2. Vigilante, M., Reynaert, P.: On the design of wideband transformer-based fourth order matching networks for E-Band receivers in 28-nm CMOS. IEEE J. Solid-State Circuits 52, 2071–2082 (2017)
3. Fujibayashi, T., et al.: A 76- to 81-GHz multi-channel radar transceiver. IEEE J. Solid-State Circuits 52, 2226–2241 (2017)
4. Trotta, S., et al.: An RCP packaged transceiver chipset for automotive LRR and SRR systems in SiGe BiCMOS technology. IEEE Trans. Microw. Theor. Tech. 60, 778–794 (2012)
5. Ragonese, E., Scuderi, A., Giammello, V., Palmisano, G.: A SiGe BiCMOS 24-GHz receiver front-end for automotive short-range radar. Springer Analog Integr. Circuits Signal Process. 67, 121–130 (2011)
6. Ragonese, E., Scuderi, A., Giammello, V., Messina, E., Palmisano, G.: A fully integrated 24 GHz UWB radar sensor for automotive applications. In: IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, California, USA, pp. 306–307 (2009)
7. Nocera, C., Cavarra, A., Ragonese, E., Papotto, G., Palmisano, G.: Down-converter solutions for 77-GHz automotive radar sensors in 28-nm FD-SOI CMOS technology. In: 14th Conference on Ph.D. Research in Microelectronics & Electronics (PRIME), Prague, Czech Republic, pp. 153–156 (2018)
8. Italia, A., Carrara, F., Biondi, T., Scuderi, A., Ragonese, E., Palmisano, G.: The transformer characteristic resistance and its application to the performance analysis of silicon integrated transformers. In: IEEE Radio Frequency Integrated Circuits Symp. Dig. (RFIC), Long Beach, California, USA, pp. 597–600 (2005)
9. Carrara, F., Italia, A., Ragonese, E., Palmisano, G.: Design methodology for the optimization of transformer-loaded RF circuits. IEEE Trans. Circuits Syst. I: Regul. Papers 53, 761–768 (2006)
10. Cathelin, A.: Fully depleted silicon on insulator devices CMOS: the 28 nm node is the perfect technology for analog, RF, mmW, and mixed-signal system-on-chip integration. IEEE Solid State Circuits Mag. 9, 18–26 (2017)
11. Ragonese, E., Sapone, G., Palmisano, G.: High-performance interstacked transformers for mm-wave ICs. Wiley Microw. Opt. Technol. Lett. 52, 2160–2163 (2010)
12. Ragonese, E., Sapone, G., Giammello, V., Palmisano, G.: Analysis and modeling of interstacked transformers for mm-wave applications. Springer Analog Integr. Circuits Signal Process. 72, 121–128 (2012)
13. Giammello, V., Ragonese, E., Palmisano, G.: A transformer-coupling current-reuse SiGe HBT power amplifier for 77-GHz automotive radar. IEEE Trans. Microw. Theor. Tech. 60, 1676–1683 (2012)
14. Giammello, V., Ragonese, E., Palmisano, G.: Transmitter chipset for 24/77 GHz automotive radar sensors. In: IEEE Radio Frequency Integrated Circuits Symposium Digest (RFIC), Anaheim, California, USA, pp. 75–78 (2010)

# Part VII Power and Thermal Electronics

# Investigating an Active Cooling System Powered by a Thermoelectric Generator



Pietro Tosato, Maurizio Rossi and Davide Brunelli

**Abstract** The diffusion of powerful microprocessors, even in embedded processing, calls for more efficient heat dissipation. Although implementing active cooling is often easy, in this study we show how to lower the temperature of a general-purpose microprocessor with energy harvesting techniques. We present a joint thermal and electrical analysis of a thermoelectric-powered active cooler, and we demonstrate that it is possible to decrease the temperature by a few degrees Celsius with respect to a regular passive cooler, using only the energy dissipated by the processor under the heat-sink.

**Keywords** Thermoelectric generator · Energy harvesting · Energy neutral system · Thermoelectric cooling

# 1 Introduction

Energy harvesting is nowadays a consolidated technique for converting energy from external environmental sources (e.g., solar power [1, 2], wind energy [11, 14], kinetic energy, bacteria [4, 18], or electromagnetic coupling [13]) into electrical energy and storing it for consumer and embedded electronics. Thermoelectric energy harvesting is a well-known example of such techniques, especially in wearable applications [3], which take advantage of the temperature difference between the human body and the surrounding air, and in smart sensors deployed in industrial plants, which exploit the high-temperature waste heat of industrial thermal processes [17].

P. Tosato (🖂) · M. Rossi · D. Brunelli

Department of Industrial Engineering, University of Trento, via Sommarive 9, 38123 Trento, Italy e-mail: pietro.tosato@unitn.it

M. Rossi e-mail: maurizio.rossi@unitn.it

D. Brunelli e-mail: davide.brunelli@unitn.it

S. Saponara and A. De Gloria (eds.), Applications in Electronics Pervading Industry, Environment and Society, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_24

© Springer Nature Switzerland AG 2019

Many applications of Thermoelectric Generators (TEGs) have been reported in the literature; most of them involve high power, as in automotive, aircraft, and spacecraft systems, or combine heat and solar energy in thermoelectric power generators [6].

With the development of low-power electronics and new battery generations [12], TEGs have also entered the market of smart sensing electronics. One of the first contributions on harvesting energy from microprocessor waste heat dates back to 2008, when Zhou et al. [21] studied the problem of heat spreading from the microprocessor chip and how the benchmark and load type affect the harvested power.

A more recent investigation is provided in [16], where an energy-neutral cooling system powered by a TEG is designed. In that work, intermittent cooling was achieved by placing an energy harvesting and storage stage between the TEG and the fan, which enabled system cooling during processor overclocking. Martínez et al. [9] presented a modeling approach, but did not provide experimental data on a real prototype.

Many other contributions discuss efficient energy harvesting solutions, for example the works of Spies et al. [20] and a number of contributions on wearable devices powered by human body heat [10, 19].

In this paper, we present the results of an investigation into the feasibility of a passive-yet-active cooling system self-powered by thermoelectric generators. The basic idea is to place a thermoelectric generator in the same stack as the heat-sink, to scavenge the energy needed to power a fan that boosts heat dissipation. Although a patent application was already filed in the early 1990s [15], experimental results confirming this approach are not yet available.

The paper is organized as follows: the thermal and electrical models are presented in Sect. 2, the achieved results are discussed in Sect. 3, and concluding remarks are given in Sect. 4.

# 2 System Prototype and Model

The thermal resistance between the silicon die and the ambient determines the operating temperature of the electronics. The clock speed of microprocessors is usually tuned automatically according to the internal temperature, so as not to exceed a safe operating threshold while the CPU is running. A lower thermal resistance therefore allows the processor to run at faster clock rates, thanks to the enhanced heat spreading capability. Likewise, increasing the airflow through the heat-sink results in more efficient heat dissipation. Therefore, even though inserting a TEG between the chip package and the heat-sink appears to be a drawback (because of the higher thermal resistance), the energy produced by this element may be enough to power a fan and thereby improve the heat-sink dissipation.

The system requires a joint thermal and electrical model to characterize its thermal properties and to find the optimal trade-off among the design parameters.



Fig. 1 Thermal model of the thermoelectric energy harvesting cooler and example of the stack implementation

Table 1 Operating parameters of the prototype in the laboratory environment

| Parameter           | Value  |
|---------------------|--------|
| Fan min voltage     | 80 mV  |
| Fan min power       | 1.5 mW |
| T_core max          | 78 °C  |
| Ambient temperature | ~25 °C |

The heat source used in our experiments is the microprocessor of a Raspberry PI 3 single-board computer (SBC). This board was chosen because it is a well-known and widely used SBC, although the heat generated by its BCM2837 ARM core is limited compared with other platforms. For instance, a typical *Exynos* SoC used in smartphones and tablets can draw 5–8 W under load, while the BCM2837 usually consumes around 3 W and therefore provides less heat for the TEG to harvest.

The resulting scheme is represented in Fig. 1, both as a schematic of the stack and as the equivalent thermal circuit. Some parameters of the prototype are listed in Table 1. Because the main purpose of the system is removing heat from the processor, the energy scavenged by the TEG is reused to boost the cooling. Therefore, the arrow leaving the TEG module in Fig. 1 represents a change in the thermal resistance of the heat-sink (HS in the figure) obtained by means of a fan, which improves thermal convection on the fins.

In our experiments, we used a Laird CP10-31-05 TEG. Since the thermal characteristics of this thermoelectric module are not available in its datasheet, we extracted a model from previous experiments reported in the literature [7].



Fig. 2 Two Raspberry PIs, with and without the TEG-powered cooler, used to evaluate the effectiveness of the cooling system running the same benchmark

The fan is mounted on top of an aluminum heat-sink of $1.5 \times 1.5 \times 1.5$ cm by means of a custom 3D-printed support, which is also meant to foster a chimney effect on the heat-sink. Figure 2 shows the two Raspberry PIs used to execute a parallel benchmark, with and without the proposed cooler.

All the parameters of the model in Fig. 1 were estimated and verified by measurements. Rc is the thermal resistance at both the heat-sink-to-TEG and the TEG-to-processor interfaces, realized with a silicone thermal compound, while Rj is the thermal resistance between the junction and the processor case. Both can be considered negligible with respect to the thermal resistances of the TEG and the heat-sink. The junction resistance Rj of the BCM2837 is not actually known, but in similar devices it is as small as 0.5 K/W.

As widely discussed in the literature, the thermal resistance of a TEG is a function of the current flowing through it [5]. With a parametric test setup similar to the one presented in [8], the thermal resistance of the TEG was measured to lie between 5 and 8 K/W. Similarly, the thermal resistance of the aluminum heat-sink used is around 11 K/W, a typical value for a small extruded aluminum heat-sink.
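The trade-off described in this section can be made quantitative with a steady-state series model of Fig. 1. The sketch below assumes that all processor heat flows through the stack (no parallel path through the PCB), counts one Rc interface for the passive cooler and two for the TEG stack, and uses the values quoted above (Rj = 0.5 K/W, Rc negligible, R_TEG = 5 to 8 K/W, passive heat-sink 11 K/W). Under these assumptions, the TEG cooler runs cooler only if the fan lowers the heat-sink resistance below R_HS − R_TEG − Rc.

```python
"""Steady-state series thermal model of the TEG cooler stack (sketch)."""


def stack_resistance(r_j, r_c, r_hs, r_teg=None):
    """Total junction-to-ambient thermal resistance (K/W).

    Passive stack: junction -> interface -> heat-sink.
    TEG stack:     junction -> interface -> TEG -> interface -> heat-sink.
    """
    if r_teg is None:
        return r_j + r_c + r_hs
    return r_j + 2 * r_c + r_teg + r_hs


def core_temperature(p_heat_w, t_amb_c, r_total):
    """Core temperature (deg C) when p_heat_w flows through r_total to ambient."""
    return t_amb_c + p_heat_w * r_total


def max_fan_cooled_hs_resistance(r_hs_passive, r_teg, r_c):
    """Largest fan-cooled heat-sink resistance that still beats the passive stack."""
    return r_hs_passive - r_teg - r_c


if __name__ == "__main__":
    R_J, R_C, R_HS = 0.5, 0.0, 11.0   # R_c treated as negligible, as in the text
    for r_teg in (5.0, 8.0):          # measured TEG range
        limit = max_fan_cooled_hs_resistance(R_HS, r_teg, R_C)
        print(f"R_TEG = {r_teg} K/W: fan-cooled heat-sink must drop below {limit} K/W")
```

Note that Rj cancels out of the comparison, since it appears in both stacks; only the TEG and the extra interface must be compensated by the fan.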

# 3 Results

As a first experiment, we ran a parallel test with the two setups. The main reason for running them concurrently is the need for comparable environmental conditions, in order to correctly evaluate the performance gain obtained with the TEG and the fan.

Indeed, the temperature on the cold side of the TEG (and therefore its power generation) depends directly on the amount of heat dissipated by the heat-sink, so it is important to control the external environmental temperature. With a double test executed in parallel, at the same time and with the same external temperature, we evaluated the difference produced by the proposed approach.

Fig. 3 Resulting temperature on two Raspberry PI processors ($T_{core}$) running the same benchmark at the same time. The reference board has just the heat-sink, while the other is fitted with the TEG cooler

The processor temperature was extracted with a bash script that also logged other parameters such as CPU frequencies, voltages, and workload. The CPU load was produced with the *sysbench* tool, commonly available in many Linux distributions. The temperatures logged on the BCM2837 processor are reported in Fig. 3. In the prototype with the TEG cooler, the temperature stabilizes a couple of degrees Celsius below that of the processor with only the heat-sink. The temperature of both heat-sinks was also logged, showing that in the TEG-enhanced system not only the processor but also the heat-sink is kept at a lower temperature.
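The authors' logging script is not given. As an illustrative alternative to a bash script, the Python sketch below samples the core temperature and clock frequency from the standard Linux sysfs nodes, which report millidegrees Celsius and kHz, respectively. The node paths are assumptions that hold on Raspberry Pi OS but may differ on other boards, and the voltage and workload logging mentioned above is omitted.

```python
import csv
import time
from pathlib import Path

# Assumed sysfs nodes (standard on Raspberry Pi OS; may differ on other boards).
TEMP_PATH = Path("/sys/class/thermal/thermal_zone0/temp")                   # millidegrees C
FREQ_PATH = Path("/sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq")  # kHz


def read_int(path) -> int:
    """Read a single integer value from a sysfs-style text file."""
    return int(Path(path).read_text().strip())


def read_temp_c(path=TEMP_PATH) -> float:
    """Core temperature in degrees Celsius."""
    return read_int(path) / 1000.0


def log_samples(out_csv, n_samples, period_s=1.0, temp_path=TEMP_PATH, freq_path=FREQ_PATH):
    """Append n_samples rows of (timestamp, temperature, frequency) to a CSV file."""
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp_s", "temp_c", "freq_mhz"])
        for _ in range(n_samples):
            writer.writerow([time.time(), read_temp_c(temp_path), read_int(freq_path) / 1000.0])
            f.flush()  # keep the log usable even if the run is interrupted
            time.sleep(period_s)


if __name__ == "__main__":
    log_samples("cpu_log.csv", n_samples=600, period_s=1.0)  # ten minutes at 1 Hz
```

Running one logger per board while the same load generator runs on both keeps the two traces aligned in time, as required by the parallel test.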

We also measured the power extracted by the TEG (shown in Fig. 4) to assess the possibility of powering the fan indirectly through a power manager. In fact, when the processor is idle, the power extracted by the TEG is not enough to activate the fan, but it could be stored in an accumulator for later use if power management were implemented.

The advantage would be the recovery of all the energy, during both idle and load periods of the processor, and its release to the fan only when needed. We will test this approach in the future, because some issues are still open, such as the low efficiency of energy harvester ICs under similar conditions, which wastes a remarkable part of the power.

Fig. 4 Power harvested from the TEG and used by the fan, with respect to the temperature on the Raspberry PI core

The power generated by the TEG is approximately proportional to the square of the temperature difference between the device surfaces, which in turn depends on the overall thermal resistance. The latter is decreased by the fan, which runs at a higher speed when the generated power is higher, according to the following chain of dependencies:

$$P_{TEG} \rightarrow \text{fan power} \rightarrow \text{fan air flow} \rightarrow \sum R_{thermal} \rightarrow \Delta T_{TEG} \rightarrow P_{TEG}$$
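The quadratic dependence on the temperature difference mentioned above follows from the textbook matched-load model of a TEG: the open-circuit voltage is $V_{oc} = S\,\Delta T$ and the maximum power into a load equal to the internal resistance is $V_{oc}^2 / (4 R_{int})$. The sketch below assumes hypothetical values for $S$ and $R_{int}$ (not Laird CP10-31-05 data), takes $\Delta T = Q \cdot R_{TEG}$ with $R_{TEG} = 6.5$ K/W (midpoint of the measured range), and ignores Peltier effects and the current dependence of $R_{TEG}$.

```python
def teg_delta_t(q_watt: float, r_teg_k_per_w: float) -> float:
    """Temperature difference across the TEG for heat flow q through it (K)."""
    return q_watt * r_teg_k_per_w


def teg_matched_power(seebeck_v_per_k: float, r_int_ohm: float, delta_t_k: float) -> float:
    """Maximum electrical power (W) into a matched load R_load = R_int."""
    v_oc = seebeck_v_per_k * delta_t_k        # open-circuit (Seebeck) voltage
    return v_oc ** 2 / (4.0 * r_int_ohm)      # P = V_oc^2 / (4 R_int)


if __name__ == "__main__":
    S, R_INT = 0.012, 2.0   # hypothetical module coefficients (V/K, ohm)
    R_TEG = 6.5             # K/W, midpoint of the 5-8 K/W measured range
    for q in (1.0, 2.0):    # hypothetical heat flows through the stack (W)
        dt = teg_delta_t(q, R_TEG)
        print(f"Q = {q} W -> dT = {dt} K -> P = {teg_matched_power(S, R_INT, dt) * 1e3:.2f} mW")
```

Doubling the heat flow doubles $\Delta T$ and therefore quadruples the available power, which is why even a small reduction of the thermal resistance downstream of the TEG matters for the loop above.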

Inserting a further stage between the TEG and the fan (an energy harvester such as the LTC3108) lowers the overall efficiency of the system, resulting in the loss of self-sustainability. In fact, under the experimental conditions, the efficiency of the tested energy harvester is typically lower than 30%, so the fan would inevitably operate intermittently. The power continuously available at the LTC3108 output was about $600\,\mu$W, remarkably lower than the power depicted in Fig. 4, which is extracted by the TEG and used directly by the fan.
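The intermittent operation mentioned above can be estimated with a simple energy balance. Assuming the harvester output $P_{in}$ charges a storage element while the fan is off, and keeps flowing while the fan runs at power $P_{fan}$, the off time is $E/P_{in}$ and the on time is $E/(P_{fan} - P_{in})$, so the duty cycle reduces to $P_{in}/P_{fan}$ regardless of the storage size. With the 600 µW measured at the LTC3108 output and the 1.5-mW fan minimum power of Table 1, this gives about 40%. The 0.1-F storage capacitor and its voltage window below are hypothetical and only set the cycle period; storage leakage is ignored.

```python
def storage_energy_j(c_farad: float, v_high: float, v_low: float) -> float:
    """Usable energy in a storage capacitor between two voltage thresholds (J)."""
    return 0.5 * c_farad * (v_high ** 2 - v_low ** 2)


def fan_duty_cycle(p_in_w: float, p_fan_w: float, e_store_j: float):
    """Charge with the fan off, then run the fan until the store is depleted.

    Returns (t_off_s, t_on_s, duty). The harvester output keeps flowing while the fan runs.
    """
    if p_in_w >= p_fan_w:
        return 0.0, float("inf"), 1.0          # continuous operation is possible
    t_off = e_store_j / p_in_w                 # charging time
    t_on = e_store_j / (p_fan_w - p_in_w)      # net discharge time with the fan on
    return t_off, t_on, t_on / (t_on + t_off)


if __name__ == "__main__":
    P_IN = 600e-6    # LTC3108 continuous output reported in the text (W)
    P_FAN = 1.5e-3   # fan minimum power from Table 1 (W)
    e = storage_energy_j(0.1, 4.1, 3.0)   # hypothetical 0.1 F store, 4.1 V to 3.0 V window
    t_off, t_on, duty = fan_duty_cycle(P_IN, P_FAN, e)
    print(f"off {t_off:.0f} s, on {t_on:.0f} s, duty {duty:.0%}")
```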

# 4 Conclusion

In this paper, a prototype of a thermoelectric-powered cooler for low-power ARM processors has been presented. The system was successfully applied to a Raspberry PI3, showing the advantage of this approach: a self-powered active cooler capable of decreasing the temperature by a few degrees Celsius with respect to a commercial passive cooler.

# References

1. Bergonzini, C., Brunelli, D., Benini, L.: Comparison of energy intake prediction algorithms for systems powered by photovoltaic harvesters. Microelectron. J. 41(11), 766–777 (2010)
2. Brunelli, D., Dondi, D., Bertacchini, A., Larcher, L., Pavan, P., Benini, L.: Photovoltaic scavenging systems: modeling and optimization. Microelectron. J. 40(9), 1337–1344 (2009)
3. Brunelli, D., Farella, E., Rocchi, L., Dozza, M., Chiari, L., Benini, L.: Bio-feedback system for rehabilitation based on a wireless body area network. In: Fourth Annual IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOMW 2006), p. 5 p. 531, March 2006
4. Brunelli, D., Tosato, P., Rossi, M.: Microbial fuel cell as a biosensor and a power source for flora health monitoring. In: 2016 IEEE SENSORS, pp. 1–3, Oct 2016
5. Carmo, J., Antunes, J., Silva, M., Ribeiro, J., Goncalves, L., Correia, J.: Characterization of thermoelectric generators by measuring the load-dependence behavior. Measurement 44(10), 2194–2199, Dec 2011
6. Champier, D.: Thermoelectric generators: a review of applications. Energy Convers. Manag. 140, 167–181 (2017)
7. Dziurdzia, P.: Modeling and simulation of thermoelectric energy harvesting processes. In: Sustainable Energy Harvesting Technologies - Past, Present and Future. InTech, Dec 2011
8. Izidoro, C., Junior, O.A., Carmo, J., Schaeffer, L.: Characterization of thermoelectric generator for energy harvesting. Measurement 106, 283–290, Aug 2017
9. Martínez, A., Astrain, D., Rodríguez, A.: Dynamic model for simulation of thermoelectric self cooling applications. Energy 55, 1114–1126, Jun 2013
10. Nardello, M., Tosato, P., Rossi, M., Brunelli, D.: A thermoelectric powered system for skiing performance monitoring. In: Applications in Electronics Pervading Industry, Environment and Society, pp. 135–144. Springer International Publishing (2019)
11. Pasquato, L., Bonotto, N., Tosato, P., Brunelli, D.: An optimized wind energy harvester for remote pollution monitoring. In: 2017 IEEE Workshop on Environmental, Energy, and Structural Monitoring Systems (EESMS), pp. 1–6, July 2017
12. Porcarelli, D., Brunelli, D., Benini, L.: Characterization of lithium-ion capacitors for low-power energy neutral wireless sensor networks. In: 2012 Ninth International Conference on Networked Sensing (INSS), pp. 1–4, June 2012
13. Porcarelli, D., Brunelli, D., Benini, L.: Clamp-and-forget: a self-sustainable non-invasive wireless sensor node for smart metering applications. Microelectron. J. 45(12), 1671–1678 (2014)
14. Porcarelli, D., Spenza, D., Brunelli, D., Cammarano, A., Petrioli, C., Benini, L.: Adaptive rectifier driven by power intake predictors for wind energy harvesting sensor networks. IEEE J. Emerg. Sel. Top. Power Electron. 3(2), 471–482 (2015)
15. Primus, F.J., Goldenberg, M.D., Hills, S.: United States Patent (19) (1991)
16. Rizzon, L., Rossi, M., Passerone, R., Brunelli, D.: Energy neutral hybrid cooling system for high performance processors. In: International Green Computing Conference, pp. 1–6, Nov 2014
17. Rossi, M., Rizzon, L., Fait, M., Passerone, R., Brunelli, D.: Energy neutral wireless sensing for server farms monitoring. IEEE J. Emerg. Sel. Top. Circuits Syst. 4(3), 324–334 (2014)
18. Sartori, D., Brunelli, D.: A smart sensor for precision agriculture powered by microbial fuel cells. In: 2016 IEEE Sensors Applications Symposium (SAS), pp. 1–6, April 2016
19. Siddique, A.R.M., Mahmud, S., Heyst, B.V.: A review of the state of the science on wearable thermoelectric power generators (TEGs) and their existing challenges. Renew. Sustain. Energy Rev. 73(January), 730–744 (2017)
20. Spies, P., Pollak, M., Rohmer, G.: Power management for energy harvesting applications (2018)
21. Zhou, Y., Paul, S., Bhunia, S.: Harvesting waste heat in a microprocessor using thermoelectric generators: modeling, analysis and measurement, pp. 98–103 (2008)

# A Smart Torque Control for a High Efficiency 4WD Electric Vehicle



Antonio Cordopatri and Giuseppe Cocorullo

**Abstract** In state-of-the-art electric propulsion systems, hub motors allow mechanical components to be removed while also increasing vehicle stability. However, since motors and wheels are directly coupled in such a configuration, when a high propulsion torque is delivered the motors may work at low-efficiency operating points, negatively affecting the vehicle consumption. In this paper, a torque distribution strategy is defined to minimize this consumption. The propulsion configuration under investigation is a four-wheel drive that adopts two different pairs of Brushless DC (BLDC) hub motors. The proposed research highlights how the electric efficiency improves when the propulsion torque required by the vehicle is delivered by the BLDCs asymmetrically, according to the motors' operating conditions. Since this torque repartition changes dynamically while the vehicle is running, its effects on vehicle stability are also evaluated. The analysis considers, as reference, a Class A vehicle designed principally for urban mobility.

# 1 Introduction

Currently, the propulsion system of an Electric Vehicle (EV) can be realized with one of six different configurations [1]. One of them consists of placing the traction motors directly inside the wheels. This solution yields the most compact and lightest propulsion configuration, because mechanical components such as the differential, gearbox, transmission, and clutch are not required. In addition, unlike the other configurations, when independent motors are employed, the yaw rate, and thus the vehicle stability, can also be corrected by accelerating each wheel drive differently.

A. Cordopatri (⊠) · G. Cocorullo

Department of Computer Science, Modelling, Electronic and System Engineering, University of Calabria, Arcavacata di Rende, Italy e-mail: a.cordopatri@dimes.unical.it

G. Cocorullo e-mail: cocorullo@dimes.unical.it

S. Saponara and A. De Gloria (eds.), Applications in Electronics Pervading Industry, Environment and Society, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_25

© Springer Nature Switzerland AG 2019

Moreover, since the motor temperature is related to the motor surface-to-volume ratio, a simple cooling system, even an air-based one, can be adopted in this multi-motor propulsion system. Nevertheless, once the mechanical gearing is removed, electric motors and wheels are directly connected. Therefore, the hub motors have to deliver a high traction torque during low-speed urban acceleration phases, and high currents flow in the motor windings. A large amount of energy is lost as Joule heating, reducing the energy efficiency and increasing the consumption of this kind of propulsion system. This problem has been investigated in [2], where the efficiency of propulsion systems composed of BLDC hub motors was analyzed. In [2], two-wheel drive (2WD) and four-wheel drive (4WD) configurations were proposed, and for each of them the energy consumption over the New European Driving Cycle (NEDC) was obtained. In particular, the 4WD configurations were implemented with BLDCs having equal (homogeneous solution) and different (inhomogeneous solutions) electric-magnetic features. Results showed that, among all the proposed solutions, the inhomogeneous 4WD composed of two 120 V–14 kW and two 72 V–8 kW BLDCs maximizes the energy efficiency, reducing consumption by 27% with respect to the 2WD propulsion configuration. However, the results in [2] were obtained assuming that all the hub motors delivered the same propulsion torque. The works in [3–5] demonstrated that the energy consumption of a 4WD propulsion system can be reduced by distributing the propulsion torque properly. Based on this consideration, in this paper we analyze how the energy consumption of the propulsion configuration that adopts two 120 V–14 kW and two 72 V–8 kW BLDCs can be further reduced with an asymmetrical torque distribution.

# 2 Models

In order to compare our results with those of [2], we adopt the same models and operating conditions. Thus, a 4WD Class A vehicle is adopted as the reference car. This vehicle is designed principally for urban mobility, although it can also face extra-urban travel, with a maximum speed of 120 km/h. It provides a good load capacity (400 kg) and four seats. The car adopts two 120 V–14 kW BLDCs on the rear axle and two 72 V–8 kW BLDCs on the front axle. To evaluate the energy consumption of the proposed propulsion system, each BLDC is described by the equivalent electric circuit depicted in Fig. 1, which takes into account the stator coil resistance, $R_c$, the back electromotive force, b-EMF, and the losses due to eddy currents, hysteresis, and mechanical friction, modeled by $R_p$ and $I_p$. These electrical parameters are obtained for each motor from the BLDC nominal electrical values and efficiency curve. In the worst case, the difference between the efficiency provided by the BLDC manufacturer and that obtained with our approach does not exceed 5%.

The propulsion system adopts LiFePO4 batteries with a total stored energy of 30 kWh. For both batteries and converters, the efficiency has been assumed equal to one, because their behavior is outside the scope of this research. Nevertheless, we verified by preliminary simulations that, in hub-motor-based propulsion systems, almost all the energy losses occur in the electric motors. Since the energy consumption depends on the driving path, the standard NEDC is adopted as reference. This driving cycle is composed of four urban driving cycles (UDC) and one extra-urban driving cycle (EUDC), for a total length of 11.023 km. An accurate analysis of the energy consumption requires taking into account the dynamics of the vehicle and of the hub motors, as well as the constraints imposed by the driving cycle. In this paper, this complex system has been simulated through the interaction of two different software tools, CarSim and MATLAB: the former describes the vehicle dynamics, while the latter simulates the propulsion system and the reference driving cycle.

Fig. 1 On the left the BLDC equivalent electrical model; on the right the accuracy results for the 14 kW BLDC

# 3 The Proposed Torque Distribution Strategy

As reported in Eqs. (1) and (2), the required propulsion torque  $(\tau_v)$  is split between the front axle and the rear axle BLDCs in accordance with the repartition index  $\alpha$ , a value between 0 and 1. When  $\alpha$  is different from 0.5, the propulsion torque is delivered in an asymmetrical way. In that case, the BLDCs on the same vehicle axle deliver the same torque, but the propulsion force is differently allocated between the front axle and the rear axle motors.

$$\tau_{\text{BLDC,front}} = \frac{1}{2}\,\alpha\,\tau_v \tag{1}$$

$$\tau_{\text{BLDC,rear}} = \frac{1}{2}\,(1 - \alpha)\,\tau_v \tag{2}$$
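Eqs. (1) and (2) can be illustrated with a minimal sketch. The function name `split_torque` and the numeric torques (in N·m, an arbitrary choice) are ours and purely illustrative. The function returns the torque of one front-axle BLDC and of one rear-axle BLDC, so the four motors together always deliver $\tau_v$.

```python
from typing import Tuple


def split_torque(tau_v: float, alpha: float) -> Tuple[float, float]:
    """Return (front-motor torque, rear-motor torque) for a total torque tau_v.

    Each axle carries two identical BLDCs, hence the factor 1/2 in Eqs. (1)-(2).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("the repartition index alpha must lie in [0, 1]")
    tau_front = 0.5 * alpha * tau_v          # Eq. (1): one front-axle BLDC
    tau_rear = 0.5 * (1.0 - alpha) * tau_v   # Eq. (2): one rear-axle BLDC
    return tau_front, tau_rear


print(split_torque(400.0, 0.5))   # (100.0, 100.0): symmetric distribution
print(split_torque(400.0, 0.75))  # (150.0, 50.0): front axle favored
```

With $\alpha = 0.75$, each front motor delivers three times the torque of each rear motor, while the total remains $2\,(150 + 50) = 400$.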

It is worth pointing out that such an asymmetric torque distribution reduces energy consumption only if different pairs of motors are used. In that case, the total propulsion force must be properly partitioned between the front-axle and rear-axle motors. This requires identifying an optimal repartition index ($\alpha_{opt}$) that minimizes the BLDC losses (Fig. 2).

Fig. 2 Relationship between the repartition index and the BLDC losses

For a BLDC-based propulsion system, $\alpha_{opt}$ is given by the general expression in Eq. (3).

$$\alpha_{opt} = \frac{-\dfrac{R_{c1}\,\tau_v}{k_{m1}}\left(I_{p1} + \dfrac{\text{b-EMF}_1}{R_{p1}}\right) + \dfrac{R_{c2}\,\tau_v^2}{2\,k_{m2}^2} + \dfrac{R_{c2}\,\tau_v}{k_{m2}}\left(I_{p2} + \dfrac{\text{b-EMF}_2}{R_{p2}}\right)}{0.5\,\tau_v^2\left(\dfrac{R_{c1}}{k_{m1}^2} + \dfrac{R_{c2}}{k_{m2}^2}\right)} \tag{3}$$

The subscripts 1 and 2 refer to the front and the rear BLDCs, respectively.
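Eq. (3) can be recovered from a simple loss model. We state this model as our own assumption, since it is not written out explicitly. The coil current of a motor is taken to be the torque-producing current $\tau/k_m$ plus a loss current $a = I_p + \text{b-EMF}/R_p$, and only the Joule losses in $R_c$ are taken to depend on $\alpha$. With $a_1$ and $a_2$ defined this way for the front and rear motors, and two motors per axle, the $\alpha$-dependent losses and their stationarity condition are:

$$\begin{aligned}
L(\alpha) &= 2R_{c1}\left(\frac{\alpha\,\tau_v}{2k_{m1}} + a_1\right)^2 + 2R_{c2}\left(\frac{(1-\alpha)\,\tau_v}{2k_{m2}} + a_2\right)^2, \\
\frac{dL}{d\alpha} &= \frac{2R_{c1}\tau_v}{k_{m1}}\left(\frac{\alpha\,\tau_v}{2k_{m1}} + a_1\right) - \frac{2R_{c2}\tau_v}{k_{m2}}\left(\frac{(1-\alpha)\,\tau_v}{2k_{m2}} + a_2\right) = 0 \\
\Rightarrow\ \alpha &= \frac{-\dfrac{R_{c1}\tau_v a_1}{k_{m1}} + \dfrac{R_{c2}\tau_v^2}{2k_{m2}^2} + \dfrac{R_{c2}\tau_v a_2}{k_{m2}}}{0.5\,\tau_v^2\left(\dfrac{R_{c1}}{k_{m1}^2} + \dfrac{R_{c2}}{k_{m2}^2}\right)}
\end{aligned}$$

This coincides with Eq. (3). Since $d^2L/d\alpha^2 = \tau_v^2\,(R_{c1}/k_{m1}^2 + R_{c2}/k_{m2}^2) > 0$, the stationary point is a minimum. For two identical pairs of motors, the $a$ terms cancel and $\alpha_{opt} = 0.5$. This is consistent with the observation that an asymmetric distribution only helps when different pairs of motors are used.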

As Eq. (3) shows, $\alpha_{opt}$ is not constant. It varies with the propulsion torque requested by the driver and, through the b-EMF terms, with the BLDC rotation speed. Thus, each vehicle operating condition has its own optimal repartition index.
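Below is a sketch of a direct transcription of Eq. (3), with the motor parameters grouped in a small data class. The following are our assumptions for illustration:

- the class and function names;
- the clipping of $\alpha_{opt}$ to [0, 1];
- the numeric parameters in the example.

The helper `joule_losses` uses an assumed loss model: coil current $\tau/k_m + I_p + \text{b-EMF}/R_p$, with Joule losses in $R_c$ only. This model is our reading of the equivalent circuit, not a formula given in the text, and is included only to check that the returned index actually minimizes those losses.

```python
from dataclasses import dataclass


@dataclass
class BldcParams:
    r_c: float    # coil stator resistance R_c [ohm]
    k_m: float    # torque ("mechanical") constant k_m [N*m/A]
    i_p: float    # loss current I_p [A]
    r_p: float    # loss resistance R_p [ohm]
    bemf: float   # back-EMF at the present rotation speed [V]

    def loss_current(self) -> float:
        """a = I_p + b-EMF / R_p, the term that appears in Eq. (3)."""
        return self.i_p + self.bemf / self.r_p


def alpha_opt(tau_v: float, front: BldcParams, rear: BldcParams) -> float:
    """Optimal repartition index of Eq. (3), clipped to [0, 1] (clipping is our addition)."""
    if tau_v <= 0.0:
        raise ValueError("Eq. (3) needs a positive propulsion torque")
    a1, a2 = front.loss_current(), rear.loss_current()
    num = (-front.r_c * tau_v * a1 / front.k_m
           + rear.r_c * tau_v ** 2 / (2.0 * rear.k_m ** 2)
           + rear.r_c * tau_v * a2 / rear.k_m)
    den = 0.5 * tau_v ** 2 * (front.r_c / front.k_m ** 2 + rear.r_c / rear.k_m ** 2)
    return min(1.0, max(0.0, num / den))


def joule_losses(alpha: float, tau_v: float, front: BldcParams, rear: BldcParams) -> float:
    """Coil Joule losses of the four motors under the assumed loss model [W]."""
    i1 = alpha * tau_v / (2.0 * front.k_m) + front.loss_current()
    i2 = (1.0 - alpha) * tau_v / (2.0 * rear.k_m) + rear.loss_current()
    return 2.0 * front.r_c * i1 ** 2 + 2.0 * rear.r_c * i2 ** 2


# Hypothetical parameters: the front (8 kW) motor has the lower R_c and higher k_m.
front = BldcParams(r_c=0.05, k_m=0.9, i_p=0.5, r_p=50.0, bemf=30.0)
rear = BldcParams(r_c=0.12, k_m=0.7, i_p=0.8, r_p=60.0, bemf=45.0)
a = alpha_opt(200.0, front, rear)
print(f"alpha_opt = {a:.3f}, losses = {joule_losses(a, 200.0, front, rear):.1f} W")
```

Because the b-EMF grows with rotation speed, calling `alpha_opt` at each simulation step with the current torque and speed-dependent b-EMF yields a time-varying index of the kind described in the text.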

## 4 Simulations

## 4.1 Vehicle Stability

According to Eq. (3), the torque repartition index changes dynamically while the vehicle moves, and this could have negative effects on vehicle maneuverability. To investigate this phenomenon, the MOOSE test is executed in CarSim on two cars with identical parameters but different propulsion systems:

- The first uses a 2WD solution.
- The second adopts an inhomogeneous 4WD solution. Here the propulsion torque is delivered asymmetrically, with a repartition index that cyclically sweeps from 0 to 1 and back.

For both cars, the speed is 50 km/h. The lateral distance of each vehicle from the path during the MOOSE test is depicted in Fig. 3.

The two curves overlap perfectly, i.e., the two vehicles under test follow the same driving path. This result indicates that, in this maneuver, the asymmetric torque distribution does not affect vehicle stability.

## 4.2 Energy Consumption


To analyze how the asymmetric torque distribution influences the propulsion energy consumption, the following equations are implemented in the MATLAB-CarSim simulation environment:

- the vehicle dynamics;
- the propulsion system;
- the driving cycle;
- the optimal repartition index.

Again, note that, as in [2], only the BLDC losses are considered in this analysis. As mentioned above, this assumption does not have a significant impact on our results.

# 5 Results Discussion

The results of our MATLAB-CarSim simulations are compared with the energy consumption of the "two 120 V–14 kW and two 72 V–8 kW" configuration calculated in [2] with a symmetric torque distribution (4WD STD). The comparison also includes the energy consumption of the 2WD solution, taken once again as the reference propulsion system. For these three propulsion configurations, the energy consumption over the UDC, the EUDC, and the complete NEDC is reported in Table 1.

Focusing on the UDC, the energy consumption of the 4WD configuration with asymmetric torque distribution (4WD ATD) is 10% lower than that of the same configuration with symmetric torque distribution. It is also 50% lower than that of the 2WD configuration. This large saving is due to the strong accelerations that the vehicle must perform during the urban driving cycle.

In these phases, the BLDCs must provide a high torque, so high currents flow in the motor windings. If the propulsion torque is provided asymmetrically according to Eq. (3), most of the torque is delivered by the 8 kW BLDCs. These motors have the lower coil resistance and the higher mechanical constant, and hence need less current for the same torque. Therefore, less energy is lost as Joule heating in the 8 kW BLDCs than in the 14 kW BLDCs.

| Configuration | UDC (Wh/km) | EUDC (Wh/km) | NEDC (Wh/km) |
|---|---|---|---|
| 2 × 14 kW (2WD) [2] | 296 | 193 | 231 |
| 2 × 14 kW + 2 × 8 kW (4WD STD) [2] | 164 | 162 | 167 |
| 2 × 14 kW + 2 × 8 kW (4WD ATD) | 148 | 155 | 152 |

Table 1 Energy consumption of the evaluated propulsion systems

Table 2 Autonomy comparison for the evaluated propulsion systems

| Configuration | Autonomy (km) | Variation (%) |
|---|---|---|
| 2 × 14 kW (2WD) [2] | 104 | - |
| 2 × 14 kW + 2 × 8 kW (4WD STD) [2] | 143 | +37 |
| 2 × 14 kW + 2 × 8 kW (4WD ATD) | 158 | +52 |

In contrast, during the EUDC, the energy consumption of the 4WD ATD is only 4% lower than that of the 4WD STD. The auxiliary 8 kW BLDCs cannot reach the high rotation speeds imposed by the extra-urban driving cycle. Thus, for most of the EUDC, propulsion is provided by the two 14 kW BLDCs alone. In this condition, an asymmetric torque distribution is not feasible, so the benefits of the strategy are limited.

Finally, over the complete NEDC, the 4WD ATD shows an energy consumption of 152 Wh/km. This is 8.9% lower than that of the same configuration with symmetric torque distribution, and 34% lower than that of the reference 2WD. The propulsion systems can also be compared in terms of autonomy. Regardless of the propulsion system, the LiFePO4 battery pack stores 30 kWh of energy. To avoid deep charge/discharge cycles, we assume that only 80% of the 30 kWh (i.e., 24 kWh) can be used. The autonomy of each propulsion system is then obtained from its NEDC energy consumption and the available energy (Table 2).

Among the three solutions, the 4WD with asymmetric torque distribution offers the highest autonomy: 158 km. This is 10% more than the 4WD STD and 52% more than the 2WD solution.
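The autonomy figures follow from a simple energy balance: the usable energy (80% of 30 kWh, i.e., 24 kWh) divided by the NEDC consumption of Table 1. The sketch below, whose names are illustrative, reproduces Table 2 to within about 1 km and 1 percentage point. The residual differences are consistent with the Table 1 consumptions having been rounded to integer values.

```python
BATTERY_KWH = 30.0      # energy stored in the LiFePO4 pack
USABLE_FRACTION = 0.8   # only 80% is used, to avoid deep charge/discharge cycles

# NEDC consumption from Table 1 [Wh/km]
nedc_wh_per_km = {"2WD": 231.0, "4WD STD": 167.0, "4WD ATD": 152.0}


def autonomy_km(consumption_wh_per_km: float) -> float:
    """Driving range on the NEDC for a given average consumption."""
    usable_wh = BATTERY_KWH * 1000.0 * USABLE_FRACTION   # 24 kWh
    return usable_wh / consumption_wh_per_km


ref = autonomy_km(nedc_wh_per_km["2WD"])
for name, consumption in nedc_wh_per_km.items():
    km = autonomy_km(consumption)
    print(f"{name}: {km:.0f} km ({100.0 * (km / ref - 1.0):+.0f}% vs 2WD)")
```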

# 6 Conclusion

This paper proposes a strategy to reduce the energy consumption of a propulsion system composed of different pairs of BLDCs. More specifically, we exploit the features of different BLDCs and an asymmetric repartition of the propulsion force to reduce the motor losses. Tests were performed on the "two 120 V–14 kW and two 72 V–8 kW" configuration. Compared with the same configuration using a symmetric torque distribution, the proposed strategy reduces the NEDC energy consumption by 8.9%. Compared with the reference 2WD, the reduction is 34%. In addition, the effects on vehicle stability have been investigated: the MOOSE tests simulated in CarSim show that an asymmetric torque distribution does not affect vehicle maneuverability. This further supports the proposed approach.

# References

- 1. Tie, S.F., Tan, C.W.: A review of energy sources and energy management system in electric vehicles. Renew. Sustain. Energy Rev. 20, 82–102 (2013)
- 2. Cordopatri, A., Cocorullo, G.: A hub motors choice strategy for an electric four independent wheel drive vehicle. In: Proceedings of 2017 International Conference of Electric and Electronic Technologies for Automotive, Torino, Italy, June 2017
- 3. Wu, D., Li, Y., Zhang, J., Du, C.: Torque distribution of a four in-wheel motors electric vehicle based on PMSM system model. Proc. IMechE Part D: J Autom. Eng. 1–18
- 4. Yuan, X., Wang, J.: Torque distribution strategy for a front- and rear-wheel-driven electric vehicle. IEEE Trans. Veh. Technol. 61(8), 3365–3374 (2012)
- 5. Chen, L., Wang, J., Lazari, P.: Influence of driving cycles on traction motor design optimizations for electric vehicles. In: Proceedings of Transport Research Arena (TRA) 5th Conference: Transport Solutions from Research to Deployment, Paris, France (2014)

# **Experimental Analysis of Battery Management System Algorithms of Li-ion Batteries**



Federico Garbuglia, Matteo Unterhorst, Luca Buccolini, Simone Orcioni and Massimo Conti

**Abstract** The widespread use of lithium batteries for energy storage drives research into new systems that keep them operating safely and estimate their state of charge and state of health. Better algorithms can be developed through software simulation, but they need to be tested on real cells. In this paper, two charging algorithms are compared by testing their efficiency on a new Arduino-based HW platform developed for this purpose. The platform, which implements passive balancing, is controlled by a PC running Matlab scripts.

# 1 Introduction

In recent years, the large-scale deployment of renewable energy sources has raised multiple challenges for energy storage. At the same time, the proliferation of electric vehicles over the last decade is becoming an important part of everyday life. Lithium batteries are among the most widely used energy storage systems because of their excellent performance in both stationary storage systems and electric vehicles [1]. This performance stems from their high specific energy, energy density, specific power, efficiency, and long life. Lithium batteries also have some disadvantages. Lithium metal is highly reactive, so the operating parameters must be monitored to ensure that the cells remain within their safe operating area in terms of temperature, voltage, and current.

Other disadvantages arise because batteries are composed of several cells connected in series. Repeated charge-discharge cycles may cause voltage and charge imbalance among the cells because of variations in their physical characteristics, due to manufacturing tolerances, temperature, and cell aging. Imbalanced voltage and charge profiles may reduce the overall performance and durability of energy storage systems [2].

F. Garbuglia · M. Unterhorst · L. Buccolini · S. Orcioni · M. Conti (⊠) Department of Information Engineering, Università Politecnica delle Marche, via brecce bianche 12, 60124 Ancona, Italy e-mail: m.conti@univpm.it

<sup>©</sup> Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_26

For these reasons, battery management systems (BMSs) are mandatory. They continuously monitor cell temperatures, cell voltages, and battery current, and they handle cell-to-cell mismatch through one of two techniques:

- passive balancing, where the cells with higher voltage are discharged on resistors;
- active balancing, where the weakest cells are charged by transferring charge from the strongest cell or group of cells.

Although not strictly mandatory, knowledge of the battery state of charge (SoC) and state of health (SoH) is required by almost all BMS applications. Thus, many papers focus on SoC and SoH estimation algorithms [3, 4].

Simulation approaches that help reduce testing time have been presented in [5–7], including some on balancing strategies [8]. However, a HW test bench is still required for the final validation of any algorithm [9, 10].

The authors developed an open-source HW platform for research on battery charging and passive charge balancing algorithms, as well as on SoC and SoH estimation algorithms [11]. The platform uses a custom, open-source, Arduino-compatible BMS board controlled by a PC via Matlab scripts. Through the PC, it is also possible to regulate battery charging by controlling a digital DC/DC converter. With this setup, testers can acquire real-time plots and respond quickly to critical conditions. Other advantages of this platform include:

- high modularity, thanks to its extended Arduino compatibility;
- the high versatility of the Matlab programming environment, which allows algorithm parameters to be modified on the fly.

In this paper, we compare two different charging algorithms that use passive charge balancing. The HW platform allows us to take the BMS temperature into account during the development of the charging and balancing algorithms. The Matlab script that contains the algorithms and handles the test displays and estimates, in real time, the efficiency of the algorithms and the main battery parameters.

# 2 BMS Hardware

The hardware used to test the algorithms consists of a six-cell battery management system named BMSino, a digital DC/DC converter, and a PC. The BMS is designed to measure the state of the battery pack and to exchange serial commands with the PC. A Matlab script was developed to communicate with the BMS through the serial port. The Matlab software requests the values measured by the BMS and processes them. Following the algorithm rules, it then applies the balancing mask and imposes the current set-point of the DC/DC converter.

The main hardware components of the test platform are:

- 1. Six battery cells. These are standard 3.7 V cells: we use 6 new Panasonic NCR18650B cells and 6 used Sanyo UCR18650F cells.
- 2. DC/DC converter. It is based on the LM2596 step-down regulator from Texas Instruments. The output voltage and current set-points can be set from the PC via UART communication, as the converter can operate in constant-voltage or constant-current mode.
- 3. Power supply unit. It supplies, from the mains, the 48 V DC needed to power the DC/DC converter.
- 4. Brushless 80 mm fan. It blows toward the BMSino balancing resistors, which are the main heat source of the system. Thermal characteristics can be improved using thermal pads and an aluminium heat sink.
- 5. BMSino board. It performs three tasks:
  - measuring cell voltages, current, and cell temperatures using an AD converter;
  - communicating with the PC via an Arduino board;
  - activating charge balancing on each cell.

  The board is described in detail in [11].

Fig. 1 Complete BMS hardware, composed of the custom BMS, the battery charger, the cells, and a power supply unit. On the right, a detail of the BMSino is shown

The AD converter, balancing resistors, and current sensors are integrated in the BMS board, while the Arduino UNO is connected through the compatible pinout. The AD7280 converter, made by Analog Devices, is controlled by the Arduino via an SPI bus.

The AD acquires up to 12 input channels: 6 are connected to the cell terminals and 6 to NTCs that measure the cell temperatures (Fig. 1).

The channel resolution is 976 µV over a 1–5 V range, and the maximum sample rate is 11.6 kHz. Moreover, the AD controls MOSFET switches that connect a 10 $\Omega$, 2 W resistor to each cell. These resistors implement passive balancing, discharging each cell when required.
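As a quick consistency check on these figures, we assume a 12-bit conversion over the 1–5 V span. The bit depth is our assumption, implied by the stated resolution. We also consider a cell at the 4.2 V end-of-charge voltage of the test cells:

$$\Delta V = \frac{(5 - 1)\ \text{V}}{2^{12}} = \frac{4\ \text{V}}{4096} \approx 976.6\ \mu\text{V}, \qquad I_{bal} = \frac{4.2\ \text{V}}{10\ \Omega} = 0.42\ \text{A}, \qquad P_{bal} = \frac{(4.2\ \text{V})^2}{10\ \Omega} \approx 1.76\ \text{W} < 2\ \text{W}$$

Each balancing resistor therefore stays within its 2 W rating even at the maximum cell voltage.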

The Arduino board stores the voltages and temperatures received from the AD. It also measures the battery current and the BMS board temperature using its own analog inputs. The BMS temperature is acquired by an on-board 10 k$\Omega$ NTC, while the battery current can be measured in three different ways:

- 1. For high currents, an integrated open-loop Hall-effect sensor (HAIS family, by LEM) can be used, with a range of ±150 A and 0.39 A resolution.
- 2. For lower currents, an integrated INA170 current-sense amplifier (by Texas Instruments) is provided: it senses the voltage drop across a $0.22\ \Omega$ shunt resistor.
- 3. A sensor of choice can be connected to a dedicated input on the BMS.

The BMS-Arduino system can communicate with external SPI devices, such as SD cards, to save log files of the measured values. In this configuration, the UART bus is connected to the PC, which sends the commands issued by the Matlab algorithm and receives the BMS data for real-time plotting. The AD7280's I2C makes it possible to manage higher-voltage battery packs by connecting several BMS boards in series in a master-slave configuration. In addition, many Arduino general-purpose pins are still available. For example, they can be used to control safety contactors that disconnect the battery.

# 3 BMS Algorithm

The software is built in MATLAB on a PC that sends commands to the BMS board and acquires and stores the measurements. It consists of a continuous loop, repeated every second, with the following steps:

- 1. **Measure**: the time instant is saved and charge balancing is temporarily disabled. Then the cell voltages, cell temperatures, battery current, and BMS shield temperature are measured.
- 2. **Security Control**: the measured values are compared with security thresholds, and a flag is set for each exceeded threshold. If a new flag is detected, the supply current is forced to zero and the procedure is stopped.
- 3. **BMS algorithm**: the balancing algorithm is evaluated. The loop exits if charging is complete.
- 4. **Output set**: the charging current set-point and the balancing mask are applied.
- 5. **Wait**: wait until the end of the current second, then go back to step 1.
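The five steps above can be sketched as a Python loop; the original software is a MATLAB script, which we do not reproduce. The following elements are hypothetical placeholders, because the text specifies neither the board command set nor the threshold values:

- the `BmsLink` interface;
- its method names;
- the security thresholds in `LIMITS`.

The algorithm is passed in as a function, so that algorithm A or B can be plugged in.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Protocol, Tuple


@dataclass
class Measurement:
    t: float                    # time instant [s]
    cell_voltages: List[float]  # [V]
    cell_temps: List[float]     # [degC]
    battery_current: float      # [A]
    bms_temp: float             # BMS shield temperature [degC]


class BmsLink(Protocol):
    """Hypothetical PC-side link to the BMS board and the DC/DC converter."""

    def disable_balancing(self) -> None: ...
    def measure(self) -> Measurement: ...
    def apply(self, current_a: float, mask: List[bool]) -> None: ...


# One algorithm step: measurement -> (current set-point [A], balancing mask, charge done?)
Algorithm = Callable[[Measurement], Tuple[float, List[bool], bool]]

# Hypothetical security thresholds; the text does not give their values.
LIMITS = {"v_cell_max": 4.25, "v_cell_min": 2.5, "temp_max": 80.0, "current_max": 2.0}


def security_flags(m: Measurement) -> List[str]:
    """Step 2: one flag per exceeded threshold."""
    flags = []
    if max(m.cell_voltages) > LIMITS["v_cell_max"]:
        flags.append("cell overvoltage")
    if min(m.cell_voltages) < LIMITS["v_cell_min"]:
        flags.append("cell undervoltage")
    if max(m.cell_temps + [m.bms_temp]) > LIMITS["temp_max"]:
        flags.append("overtemperature")
    if abs(m.battery_current) > LIMITS["current_max"]:
        flags.append("overcurrent")
    return flags


def run_charge(link: BmsLink, algorithm: Algorithm, period_s: float = 1.0) -> str:
    while True:
        start = time.monotonic()
        # 1. Measure, with balancing paused so it does not disturb the readings
        link.disable_balancing()
        m = link.measure()
        all_off = [False] * len(m.cell_voltages)
        # 2. Security control: force zero current and stop on any flag
        flags = security_flags(m)
        if flags:
            link.apply(0.0, all_off)
            return "stopped: " + ", ".join(flags)
        # 3. BMS algorithm
        current, mask, done = algorithm(m)
        if done:
            link.apply(0.0, all_off)  # assumption: leave the charger at 0 A on exit
            return "charge completed"
        # 4. Output set
        link.apply(current, mask)
        # 5. Wait until the end of the period
        time.sleep(max(0.0, period_s - (time.monotonic() - start)))
```

Passing `period_s=0.0` makes the loop run back to back, which is convenient when testing against a simulated battery instead of the real board.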

Two BMS algorithms have been implemented, called algorithm A and algorithm B.

#### Algorithm A

The average value of the cell voltages over 8 s is calculated.

- 1. If all cell voltages are under a balancing threshold  $(V_{th})$ , the charging current is set to a constant maximum value  $(I_{MAX})$ . Balancing is not activated.
- 2. When the first cell voltage exceeds the balancing threshold ($V_{th}$), the first balancing phase is activated. The charging current is set according to Eq. (1), with $V_{imax}$ given by Eq. (2), while each balanced cell is discharged with the current of Eq. (3).
- 3. Balancing is applied to every cell for which $V_i > V_{imax} - V_{th2}$.

$$I_{charge} = I_{MAX} \left[ 1 - \left( \frac{V_{imax} - V_{th}}{V_{MAX} - V_{th}} \right)^2 \right] \tag{1}$$


$$V_{imax} = max_{i=1..n}V_i \tag{2}$$

$$I_{i,balance} = \frac{V_i}{R_{balance}} \tag{3}$$

- 4. The balancing phase stops when every cell has been balanced at least once. Charging continues, with the current decreasing according to Eq. (1), and balancing is not activated for 100 s.
- 5. After 100 s, a new balancing phase starts, going back to step 3.
- 6. Balancing and non-balancing phases alternate until one cell reaches the maximum allowed voltage ($V_{MAX}$).
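Steps 1–3 and Eqs. (1)–(3) can be sketched as three small functions plus a moving average. The sketch uses the parameter values chosen for both algorithms: $V_{th} = 4$ V, $V_{th2} = 5$ mV, $V_{MAX} = 4.2$ V, $I_{MAX} = 1.6$ A, $R_{balance} = 10\ \Omega$. Two choices are our assumptions:

- treating the 8 s average as the mean of the last 8 one-second samples;
- forcing the current to zero at or above $V_{MAX}$.

All names are illustrative.

```python
from collections import deque
from typing import Deque, List

V_TH = 4.0        # balancing threshold V_th [V]
V_TH2 = 0.005     # balancing window V_th2 below the highest cell [V]
V_MAX = 4.2       # maximum allowed cell voltage V_MAX [V]
I_MAX = 1.6       # maximum charging current I_MAX [A]
R_BALANCE = 10.0  # balancing resistor R_balance [ohm]


class VoltageAverager:
    """Per-cell moving average over the last `window` one-second samples (assumed 8 s)."""

    def __init__(self, n_cells: int, window: int = 8) -> None:
        self.buffers: List[Deque[float]] = [deque(maxlen=window) for _ in range(n_cells)]

    def update(self, voltages: List[float]) -> List[float]:
        for buf, v in zip(self.buffers, voltages):
            buf.append(v)
        return [sum(buf) / len(buf) for buf in self.buffers]


def charge_current(v: List[float]) -> float:
    """Charging current set-point from the averaged cell voltages [A]."""
    v_imax = max(v)                      # Eq. (2)
    if v_imax <= V_TH:                   # step 1: constant maximum current
        return I_MAX
    if v_imax >= V_MAX:                  # assumption: stop charging at V_MAX
        return 0.0
    x = (v_imax - V_TH) / (V_MAX - V_TH)
    return I_MAX * (1.0 - x * x)         # Eq. (1)


def balance_mask(v: List[float]) -> List[bool]:
    """Step 3: balance every cell within V_TH2 of the highest cell."""
    v_imax = max(v)
    if v_imax <= V_TH:                   # step 1: no balancing below the threshold
        return [False] * len(v)
    return [vi > v_imax - V_TH2 for vi in v]


def balance_currents(v: List[float], mask: List[bool]) -> List[float]:
    """Eq. (3): current drawn by each active balancing resistor [A]."""
    return [vi / R_BALANCE if on else 0.0 for vi, on in zip(v, mask)]
```

At 4 V, Eq. (3) gives 0.4 A per balanced cell, which matches the balancing current observed in the experiments.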

#### Algorithm B

The average value of the cell voltages over 8 s is calculated.

- 1. If all cell voltages are under a balancing threshold  $(V_{th})$ , the charging current is set to a constant maximum value  $(I_{MAX})$ . Balancing is not activated.
- 2. When the first cell voltage exceeds the balancing threshold ($V_{th}$), balancing is activated. The charging current is set according to Eq. (1), with $V_{imax}$ given by Eq. (2), while each balanced cell is discharged with the current of Eq. (3).
- 3. Balancing is applied to every cell for which $V_i > V_{imax} - V_{th2}$.
- 4. The balancing phase continues until one cell reaches the maximum allowed voltage ($V_{MAX}$).

The following parameter values were chosen for both algorithms:

$$V_{th} = 4\ \text{V},\quad V_{th2} = 5\ \text{mV},\quad V_{MAX} = 4.2\ \text{V},\quad I_{MAX} = 1.6\ \text{A},\quad R_{balance} = 10\ \Omega$$
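The two algorithms differ only in when balancing is allowed. The sketch below, with illustrative names, encodes this difference as a small scheduler:

- With `pausing=True` (algorithm A), a balancing phase ends once every cell has been balanced at least once, and a 100 s pause follows.
- With `pausing=False` (algorithm B), balancing stays active once the first cell crosses $V_{th}$.

Ending the phase after, rather than before, the step in which the last cell is balanced is our interpretation of step 4 of algorithm A.

```python
from typing import List, Optional

V_TH = 4.0        # balancing threshold [V]
V_TH2 = 0.005     # balancing window below the highest cell [V]
PAUSE_S = 100.0   # algorithm A: pause between balancing phases [s]


class BalancingScheduler:
    """Decides, at each loop step, which cells are balanced.

    pausing=True  -> algorithm A: a phase ends once every cell has been balanced
                     at least once, then balancing pauses for PAUSE_S seconds.
    pausing=False -> algorithm B: balancing stays active once V_TH is crossed.
    """

    def __init__(self, n_cells: int, pausing: bool) -> None:
        self.n = n_cells
        self.pausing = pausing
        self.active = False                      # set when the first cell exceeds V_TH
        self.seen = [False] * n_cells            # cells balanced in the current phase
        self.pause_until: Optional[float] = None

    def step(self, t: float, v: List[float]) -> List[bool]:
        off = [False] * self.n
        v_imax = max(v)
        if not self.active:
            if v_imax <= V_TH:
                return off                       # step 1: no balancing yet
            self.active = True                   # step 2: first balancing starts
        if self.pause_until is not None:
            if t < self.pause_until:
                return off                       # algorithm A: pause in progress
            self.pause_until = None              # pause over: start a new phase
            self.seen = [False] * self.n
        mask = [vi > v_imax - V_TH2 for vi in v]  # step 3
        if self.pausing:
            self.seen = [s or m for s, m in zip(self.seen, mask)]
            if all(self.seen):                   # step 4: every cell balanced once
                self.pause_until = t + PAUSE_S
        return mask
```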

# 4 Experimental Results

The hardware and software platform has been validated using six new Panasonic NCR18650B cells and six used Sanyo UCR18650F cells, whose main characteristics are reported in Table 1. The 12 cells were characterized, and the OCV model, internal resistance, and capacitances of the model described in [6] were extracted.

| | Panasonic NCR18650B | Sanyo UR18650F |
|---|---|---|
| Nominal voltage | 3.6 V | 3.7 V |
| Discharge end voltage / end-of-charge voltage | 2.5 V / 4.2 V | 2.75 V / 4.2 V |
| Rated capacity | 3200 mAh | 2500 mAh |
| Max charging current | 1.5 A | 1.75 A |

Table 1 Characteristics of the test cells used


Fig. 2 Voltages of the 6 Panasonic cells during charge using the BMS A algorithm, test 4



Fig. 3 Balancing control of the cells during charge (1: ON, 0: OFF), test 4

Several charge and discharge tests have been performed to verify the hardware and the BMS algorithms. Some results are reported in Figs. 2–11, and the overall performance is summarized in Table 2.

Figures 2–7 show the results of test 4 of Table 2, in which BMS algorithm A is applied. The batteries are initially about half charged and the mismatch in the initial SoC is low, so the cells are almost balanced from the beginning. The mismatch in the cell parameters is also not significant: only cell 5 shows a higher internal resistance than the other cells.

Figure 2 shows the voltages of the six cells during charge in test 4. In the first 180 s all the cells are charged, and some of them are also balanced. The charging current of a balanced cell is reduced by the current (about 0.4 A) flowing in the 10 $\Omega$ balancing resistor connected in parallel to it. Figure 3 shows the balancing signals: in this phase, cell 4 has the lowest voltage and is never balanced, while cell 1 is always balanced.



Fig. 4 Maximum voltage difference dv during charge, test 4



Fig. 5 BMS temperature during charge, test 4



Fig. 6 Current feed during charge, test 4







Fig. 8 Voltages of the 6 Panasonic cells during charge using the BMS B algorithm, test 2



Fig. 9 Balancing control of the cells during charge (1: ON, 0: OFF), test 2



Fig. 10 BMS temperature during charge, test 2



Fig. 11 Power efficiency $\eta$ during charge, test 2

After about 180 s, the voltage of cell 4 rises relative to the other cells, and cell 4 starts balancing. From this instant, algorithm A stops balancing, avoiding continuous balancing, and all the cells are charged. Because of the mismatch, the difference among the cell voltages increases. After 100 s, the algorithm starts balancing again and the difference among the cell voltages decreases. The effect of the balancing algorithm on the maximum difference among the cell voltages, dv, is shown in Fig. 4: during balancing, dv decreases rapidly. On the other hand, during balancing the power dissipated on the BMS board, and hence its temperature, increases. The pauses in balancing help keep the temperature under control, and the average temperature remains low, as shown in Fig. 5.

Figure 6 shows the charging current during the recharge of test 4. Its value is imposed by the BMS following Eq. (1): starting from about 1.65 A, the current decreases as the maximum voltage among the six cells increases.

Figure 7 shows the time evolution of the power efficiency, defined as

$$\eta_P = \frac{(P_{sup} - P_{int} - P_{bal})}{P_{sup}} \tag{4}$$

| Experimental test | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| BMS algorithm | B | B | A | A | A | A | A |
| η (not internal) (%) | 59.73 | 75.13 | 78.88 | 92.86 | 93.78 | 93.76 | 96.26 |
| η (%) | 58.63 | 73.28 | 76.52 | 89.69 | 91.54 | 91.84 | 94.00 |
| Max temperature (°C) | 60 | 76 | 75 | 72 | 72 | 58 | 47 |
| Average temperature (°C) | 53.81 | 43.62 | 41.39 | 40.05 | 33.85 | 26.70 | 23.92 |
| SoC min init (%) | 25 | 35 | 10 | 55 | 75 | 65 | 40 |
| dSoC max init (%) | 45.0 | 16.0 | 35.0 | 4.0 | 5.0 | 3.0 | 2.0 |
| dSoC max final (%) | 22.8 | 2.7 | 12.9 | 3.8 | 4.8 | 2.8 | 0.9 |
| V min init (mV) | 3714 | 3816 | 3525 | 3932 | 4092 | 4003 | 3855 |
| dV max init (mV) | 298 | 153 | 229 | 20 | 12 | 13 | 20 |
| dV max final (mV) | 151 | 4.83 | 32 | 8 | 7 | 8 | 9 |
| OCV min init (mV) | 3600 | 3680 | 3361 | 3723 | 3905 | 3907 | 3657 |
| OCV max final (mV) | 4044 | 4032 | 4062 | 3989 | 4039 | 3994 | 4046 |
| Current init (A) | 1.5 | 1.5 | 1.5 | 1.6 | 1.6 | 1.6 | 1.6 |
| Current final (A) | 0.348 | 0.397 | 0.231 | 1.000 | 0.730 | 0.688 | 0.350 |
| Recharge time (s) | 7000 | 6000 | 11000 | 700 | 500 | 1000 | 4500 |

Table 2 Performance of the recharge in the 7 tests performed on Panasonic NCR18650B and Sanyo UR18650 cells

where $P_{sup}$ is the power coming from the charger, $P_{int}$ is the power dissipated on the internal resistances of the cells, and $P_{bal}$ is the power dissipated on the balancing resistors. When no balancing is active, the inefficiency is due only to the internal resistances, and $P_{int}$ decreases as the current decreases, i.e., close to the end of charge. During balancing, the efficiency drops, especially when many cells are being balanced. The pauses in balancing therefore increase the average power efficiency.
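The three terms of Eq. (4) can be evaluated at each one-second sample, as sketched below. The following are hypothetical values chosen for illustration:

- the per-cell internal resistances;
- the operating point;
- computing $P_{sup}$ as pack voltage times supply current.

$P_{bal}$ uses $V_i^2/R_{balance}$ for each cell being balanced, consistent with Eq. (3).

```python
from typing import List

R_BALANCE = 10.0  # balancing resistor [ohm]


def p_bal(v_cells: List[float], mask: List[bool]) -> float:
    """Power dissipated in the active balancing resistors [W]."""
    return sum(v * v / R_BALANCE for v, on in zip(v_cells, mask) if on)


def p_int(i_cells: List[float], r_int: List[float]) -> float:
    """Joule power in the cells' internal resistances [W]."""
    return sum(r * i * i for i, r in zip(i_cells, r_int))


def eta_p(p_sup: float, p_int_w: float, p_bal_w: float) -> float:
    """Instantaneous power efficiency, Eq. (4)."""
    return (p_sup - p_int_w - p_bal_w) / p_sup


# Hypothetical instant: six cells at 4.0 V, cells 1 and 2 balanced, 1.0 A from the charger.
v = [4.0] * 6
mask = [True, True, False, False, False, False]
# A balanced cell receives the supply current minus its balancing current (Eq. (3)).
i_cells = [1.0 - vi / R_BALANCE if on else 1.0 for vi, on in zip(v, mask)]
p_sup = sum(v) * 1.0            # assumption: pack voltage times supply current
eta = eta_p(p_sup, p_int(i_cells, [0.05] * 6), p_bal(v, mask))  # 0.05 ohm: hypothetical
print(f"eta_P = {eta:.3f}")
```

Here the two balancing resistors dissipate 3.2 W, far more than the internal resistances, which is the mechanism behind the efficiency drops seen during balancing.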

For comparison, Figs. 8–11 show the results of test 2, in which BMS algorithm B is applied. The batteries are initially about one-third charged; consequently, the charging time is longer than in test 4. The mismatch in the initial SoC is high (about 16%), so some cells must be balanced from the beginning of the charge.

Figure 8 shows the voltages of the six cells during charge in test 2, and Fig. 9 shows the balancing signals. Cells 1 and 4 soon reach the balancing threshold and are balanced almost continuously. As the other cells reach the voltage of the highest cells, they also start balancing. After about 4100 s, the difference among the cell voltages is small, and from that moment the cell with the lowest voltage changes continuously. Balancing is always active and, on average, many cells are being balanced. Therefore, the power dissipated by the BMS is high and the temperature increases toward the end of the charge, as shown in Fig. 10. Figure 11 shows the power efficiency over time. At the end of the charge, the power efficiency is extremely low, since almost all the supply current flows in the balancing resistors.

Table 2 summarizes the overall performance of the 7 tests on the two types of cells, discussed in detail above for tests 2 and 4. It reports:

• The energy efficiency neglecting internal dissipation, defined as

$$\eta_{\text{not internal}} = \frac{E_{sup} - E_{bal}}{E_{sup}} \tag{5}$$

• The energy efficiency defined as

$$\eta = \frac{(E_{sup} - E_{int} - E_{bal})}{E_{sup}} \tag{6}$$

where $E_{sup}$ is the energy coming from the charger over the whole recharge, $E_{int}$ is the energy dissipated on the internal resistances of the cells, and $E_{bal}$ is the energy dissipated on the balancing resistors.

- The maximum and average temperatures over the whole charge.
- The minimum initial SoC and the maximum difference in SoC ($dSoC_{max}$) among the cells at the beginning and at the end of the recharge. The initial SoC of each cell, $SoC_i(0)$, is estimated from the model parameters, while its evolution is computed as

$$SoC_i(t) = SoC_i(0) + \frac{1}{C_{nom}} \int_0^t I_i(\tau)\, d\tau \tag{7}$$

where  $C_{nom}$  is the nominal capacity of the cell and  $I_i(\tau)$  is the measured current flowing in the cell (considering balancing).

• The minimum initial voltage of the cells when the current is applied, i.e., the open-circuit voltage (OCV) plus the voltage drop across the internal resistance. Also reported is the maximum difference among the cell voltages at the beginning and at the end of the charge:

$$dV_{max} = \max_{i=1..n} V_i - \min_{i=1..n} V_i \tag{8}$$

- The initial OCV measured before the charge with zero supply current, and the final OCV, estimated from the cell voltage measurement and the estimated internal voltage drop.
- The initial and final imposed supply current.
- The total time of the measurements.
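The quantities of Eqs. (5)–(8) can be computed from one-second logs, as sketched below. The following are our choices, and the function names are illustrative:

- trapezoidal integration for the energies;
- rectangular coulomb counting for Eq. (7);
- expressing $C_{nom}$ in Ah;
- taking the charging current as positive.

```python
from typing import List, Sequence, Tuple


def energy_wh(power_w: Sequence[float], dt_s: float = 1.0) -> float:
    """Trapezoidal integral of uniformly sampled power, in Wh."""
    if len(power_w) < 2:
        return 0.0
    joules = dt_s * (sum(power_w) - 0.5 * (power_w[0] + power_w[-1]))
    return joules / 3600.0


def efficiencies(p_sup: Sequence[float], p_int: Sequence[float],
                 p_bal: Sequence[float], dt_s: float = 1.0) -> Tuple[float, float]:
    """Return (eta_not_internal, eta) from Eqs. (5) and (6)."""
    e_sup, e_int, e_bal = (energy_wh(p, dt_s) for p in (p_sup, p_int, p_bal))
    return (e_sup - e_bal) / e_sup, (e_sup - e_int - e_bal) / e_sup


def soc_trace(soc0: float, i_cell_a: Sequence[float], c_nom_ah: float,
              dt_s: float = 1.0) -> List[float]:
    """Eq. (7) by rectangular coulomb counting; charging current positive, C_nom in Ah."""
    soc, trace = soc0, [soc0]
    for i in i_cell_a:
        soc += i * dt_s / (c_nom_ah * 3600.0)   # A*s converted to Ah
        trace.append(soc)
    return trace


def dv_max(v_cells: Sequence[float]) -> float:
    """Eq. (8): spread between the highest and lowest cell voltage [V]."""
    return max(v_cells) - min(v_cells)
```

For example, a 3200 mAh Panasonic cell charged at a constant 1.6 A for one hour gains 0.5 in SoC according to Eq. (7).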

In all the tests shown in Table 2, the recharge reaches an OCV of about 4.05 V, i.e., about 90% of full charge. The exceptions are tests 4 and 6, which end at an OCV of about 4 V.

In tests 1–3, the recharge starts from a low state of charge and the initial $dSoC_{max}$ is high, so the charging time and the time spent balancing are long. In test 1, balancing is not completed at the end of the test: $dV_{max}$ is still about 150 mV. Owing to the high $dSoC_{max}$, balancing is continuous; consequently, the average temperature is high and the energy efficiency $\eta$ is low.

The two BMS algorithms can be compared using tests 2 and 3. Their initial conditions are:

- test 3 (algorithm A): $SoC_{min} = 10\%$, $dSoC_{max} = 35\%$, and $dV_{max} = 229$ mV;
- test 2 (algorithm B): $SoC_{min} = 35\%$, $dSoC_{max} = 16\%$, and $dV_{max} = 153$ mV.

Thus, the initial charging conditions are much worse for test 3 than for test 2. Nevertheless, the maximum and average temperatures, as well as the energy efficiency, are similar.

In tests 4–7, the initial $dSoC_{max}$ and $dV_{max}$ are low, so the cells are almost balanced from the beginning. Algorithm A performs well, with an energy efficiency of about 90% or higher and a low average temperature.

The difference between $\eta_{\text{not internal}}$ and $\eta$ is small, about 2%. In every test, the energy lost on the internal resistances is therefore smaller than the energy dissipated for balancing. The BMS algorithm is thus fundamental to reducing the energy loss.

# 5 Conclusions

A hardware and software test system for BMS algorithms has been presented and applied to 12 commercial cells. Several tests have been performed to validate the system and to evaluate the efficiency of two BMS algorithms. The results show that the proposed algorithm improves energy efficiency and temperature control. Further improvements to the BMS algorithm will be developed using this hardware.

# References

- 1. Horiba, T.: Lithium-Ion battery systems. Proc. IEEE 102(6), 939–950 (2014)
- 2. Hannan, M.A., Hoque, M.M., Hussain, A., Yusof, Y., Ker, P.J.: State-of-the-Art and energy management system of lithium-ion batteries in electric vehicle applications: issues and recommendations. IEEE Access 6, 19362–19378 (2018)
- 3. Xiong, R., Cao, J., Yu, Q., He, H., Sun, F.: Critical review on the battery state of charge estimation methods for electric vehicles. IEEE Access 6, 1832–1843 (2018)
- 4. Lin, C., Tang, A., Wang, W.: A review of SOH estimation methods in lithium-ion batteries for electric vehicle applications. Energy Procedia 75, 1920–1925 (2015)
- 5. Rusu, F.A., Livint, G.: Estimator for a pack of lithium-ion cell. In: 2016 20th International Conference on System Theory, Control and Computing (ICSTCC), Sinaia, pp. 573–577 (2016)
- 6. Orcioni, S., Buccolini, L., Ricci, A., Conti, M.: Lithium-ion battery electrothermal model, parameter estimation, and simulation environment. Energies 10(3), 375 (2017)
- 7. Scavongelli, C., et al.: Battery management system simulation using system C. In: International Workshop on Intelligent Solutions in Embedded Systems WISES2015, pp. 151–156, Ancona, Italy, October 29–30, 2015
- 8. Zheng, L., Zhu, J., Wang, G.: A comparative study of battery balancing strategies for different battery operation processes. In: 2016 IEEE Transportation Electrification Conference and Expo (ITEC), Dearborn, MI, pp. 1–5 (2016)
- 9. Lotfi, N., Fajri, P., Novosad, S., Savage, J., et al.: Development of an experimental Testbed for research in Lithium-Ion battery management systems. Energies 6(10), 5231–5258 (2013)
- 10. Alvarez, B., Garcia, S., Ramis, C.: Developing an active balancing model and its battery management system platform for lithium ion batteries. In: IEEE International Symposium on Industrial Electronics, pp. 1–5 (2013)
- 11. Buccolini, L., Garbuglia, F., Unterhorst, M., Conti, M.: HW platform for BMS algorithm validation. In: Proceedings of the 14th Conference on Ph.D. Research Microelectronics and Electronics (PRIME 2018), Praha, July 2–5 2018

# Part VIII Digital Circuits and Systems

## Design of Low-Power Approximate LMS Filters with Precision-Scalability



Darjn Esposito, Gennaro Di Meo, Davide De Caro, Antonio G. M. Strollo and Ettore Napoli

**Abstract** Approximate Computing (AC) gives up error-free computation to improve circuit performance. Adaptive Least-Mean-Squares (LMS) filters can benefit from AC, since they are both power-hungry and inherently approximate. In this paper an approximate LMS filter is proposed that can change its precision level at runtime through an external quality knob. An auxiliary circuit enables the approximation mode, in which the update of some of the filter coefficients is frozen. The proposed filter achieves a power reduction of 5–32%, as a function of the tolerable quality degradation.

### 1 Introduction

Approximate Computing (AC) is a new paradigm that enlarges the design space by sacrificing exact calculation in order to improve circuit performance. Error-resilient applications, which show little quality degradation in the presence of inexact processing, benefit most from AC [1, 2]. In particular, several arithmetic-intensive applications are highly resilient to errors and are hence fertile ground for approximate hardware circuits [3-8].

The adaptive Least-Mean-Squares (LMS) algorithm is the most widely used approach to adaptive filtering [9], with countless applications ranging from noise cancellation to system identification and channel equalization. As shown in Fig. 1a, an adaptive LMS filter includes a time-varying FIR filter and a weights-update block, which updates the FIR filter coefficients in order to minimize the mean squared difference between the desired signal d(n) and the filter output p(n). Fig. 1b shows the hardware implementation of an adaptive LMS filter using the direct-form FIR filter architecture [10]. Both the FIR filter and the weights-update block make heavy use of multipliers, which strongly affect power consumption.

D. Esposito (🖾) · G. Di Meo · D. De Caro · A. G. M. Strollo · E. Napoli

Department of Electrical Engineering and Information Technology, University of Napoli "Federico II", Naples, Italy

e-mail: darjn.esposito@unina.it; darjesp@gmail.com

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), Applications in Electronics Pervading Industry, Environment and Society, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_27



Fig. 1 Adaptive LMS filter. a general overview; b hardware implementation

As discussed in [11], the LMS algorithm can be classified as an error-resilient application, in which approximate computing can be successfully exploited to decrease power consumption. To that end, [11] introduces approximate multipliers in the FIR section of the adaptive filter. A drawback of this approach is that the approximation level is fixed at design time, whereas runtime precision scalability is highly desirable for applying AC techniques in general-purpose systems [12–15].

In this paper we propose a runtime precision-scalable LMS filter, in which the quality level can be tuned to the application through an external quality knob. Approximation is introduced at the algorithmic level, by freezing the update of some filter weights to save power. A simple auxiliary circuit monitors the error and the quality knob and drives the filter into and out of the low-power approximate mode. Implementation results show that the proposed LMS filter achieves a power reduction of 5–32%, as a function of the tolerable quality degradation.

### 2 LMS Algorithm

In this section the LMS algorithm is briefly reviewed. The weights-update block of Fig. 1 iteratively updates the weights $w_k(n)$ according to the LMS algorithm:

$$w_k(n+1) = w_k(n) + \mu \cdot e(n) \cdot x(n-k) \tag{1}$$

where the term $e(n) \cdot x(n-k)$ is an instantaneous estimate of the *k*-th component of the negative Mean Squared Error (MSE) gradient (up to a factor of 2), $\mu$ is the step-size parameter (which trades convergence speed against steady-state error) and $e(n) = d(n) - p(n)$ is the difference between the desired signal d(n) and the filter output p(n). The weights $w_k$ are used in the FIR filter as follows:

$$p(n) = \sum_{k=1}^{T} w_k(n) \cdot x(n-k) \tag{2}$$

where *T* is the number of taps. Fig. 1b highlights the *k*-th LMS tap, which implements both (1) (upper part of Fig. 1b) and (2) (lower part of Fig. 1b).
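Equations (1) and (2) can be exercised in a few lines of floating-point code. The sketch below is an assumed behavioral model, not the authors' 16-bit fixed-point design: at each step it applies (2) to compute p(n), forms e(n) = d(n) − p(n), and then applies (1) to every tap, in a system identification setup. The function name `lms_identify`, the "unknown" coefficients `h_true`, the step size and the signal length are illustrative choices.

```python
import numpy as np


def lms_identify(x, d, num_taps, mu):
    """Adapt T = num_taps weights so that p(n) tracks d(n) (floating-point LMS)."""
    w = np.zeros(num_taps)
    e_trace = np.zeros(len(x))
    for n in range(len(x)):
        # Tap inputs x(n-1), ..., x(n-T); samples before n = 0 are taken as zero.
        taps = np.array([x[n - k] if n - k >= 0 else 0.0
                         for k in range(1, num_taps + 1)])
        p = w @ taps            # filter output, eq. (2)
        e = d[n] - p            # error e(n) = d(n) - p(n)
        w = w + mu * e * taps   # weight update, eq. (1), applied to all taps
        e_trace[n] = e
    return w, e_trace


# Illustrative system identification run (hypothetical "unknown" FIR system).
rng = np.random.default_rng(0)
h_true = np.array([0.5, -0.3, 0.2, 0.0, 0.1])
x = rng.standard_normal(5000)
# d(n) = sum_k h_k x(n-k), k = 1..T, matching the indexing of eq. (2).
d = np.concatenate(([0.0], np.convolve(x, h_true)[:len(x) - 1]))
w_est, e_trace = lms_identify(x, d, num_taps=len(h_true), mu=0.01)
```

With white, unit-variance input and no measurement noise, `w_est` converges to `h_true`; a larger `mu` speeds up convergence but, once noise is present, raises the steady-state error.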

### 3 Proposed Precision-Scalable Adaptive LMS Filter

We modify the *k*-th LMS tap of Fig. 1b to obtain a precision-scalable tap. Fig. 2a shows the proposed circuit (hereafter called *PS*-tap). The *PS*-tap can operate in either exact mode or approximate mode, depending on the control signal $\alpha_k$, which is latched in the *PS*-tap by the flip-flop FF (see Fig. 2a).

The tap is in exact mode when $\alpha_k$ is high: in this case the gates A1 and A2 let the signals propagate and the CGIC (clock-gating integrated cell) is enabled. Note that the components A1, A2, FF and CGIC add only a small power overhead compared with the standard tap of Fig. 1b.

The tap operates in the low-power approximate mode when $\alpha_k$ is low. In this mode, the outputs of gates A1 and A2 are stuck at zero, which prevents the multipliers *m*1 and *m*2 from switching. Since the weight $w_k(n)$ is not updated in this case, the register R1 can be clock-gated through the CGIC.



Fig. 2 a Proposed precision-scalable tap, b Approximation management block

Freezing the coefficients introduces an approximation at the algorithmic level. The Approximation Management block (hereafter *A-M*), shown in Fig. 2b, determines when each *PS-tap* is driven into approximate mode. The *A-M* takes as inputs the error e(n), the weights $w_k(n)$ and the threshold *THR*, and produces the $\alpha_k(n)$ signals (the enable signal *en* is used for testing purposes: when it is not asserted, the *A-M* is disabled and the filter always operates at full precision).

The *A-M* monitors the variations of the error e(n) and asserts the scalability flag *F* when a steady-state condition is reached; this check is performed in the Approximation Scalability Flag sub-block of Fig. 2b. When F = 1, the absolute value of $w_k$ is compared with the threshold *THR* in the Approximation Control sub-block of Fig. 2b. If $|w_k| \leq THR$, the output $\alpha_k$ is lowered and the *k*-th *PS-tap* is driven into approximate mode. The rationale is that small weights contribute less to the quality of the adaptation (*THR* is therefore the quality knob of our quality-scalable system).
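The decision made by the Approximation Control sub-block can be sketched behaviorally as follows. This is a minimal model under stated assumptions: weights are given as signed integers in LSB units (so *THR* is an integer multiple of the LSB, as in Table 2), and $\alpha_k$ simply mirrors the comparison while F = 1. The function name `approximation_control` is hypothetical.

```python
def approximation_control(weights_lsb, F, thr, en=True):
    """Return alpha_k per tap: 1 = exact mode, 0 = approximate mode (update frozen).

    weights_lsb: signed integer weights expressed in LSB units (assumption).
    F: scalability flag from the Approximation Scalability Flag sub-block.
    thr: quality knob THR, an integer multiple of the LSB.
    en: when False the A-M block is disabled and every tap stays exact.
    """
    if not en or not F:
        return [1] * len(weights_lsb)
    # Small weights (|w_k| <= THR) contribute least to adaptation quality: freeze them.
    return [0 if abs(w) <= thr else 1 for w in weights_lsb]


# Example: with THR = 2 LSB, the three smallest weights are frozen.
alphas = approximation_control([0, 1, -2, 5, -100], F=True, thr=2)
```

Raising *THR* (for example from 2 to 64 LSB) freezes more taps, which is how the quality knob trades accuracy for power.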

The Approximation Scalability Flag sub-block works with a divided clock to minimize its power consumption. It receives a sub-sampled e(n) and evaluates its derivative de: if large variations are detected (i.e., de is large), the LMS filter is assumed not to have reached a steady-state condition and the signal F1 is kept low. Conversely, if the *k* most significant bits of de are all equal, the error variation is inferred to be small and, therefore, a steady-state condition has been reached; in this case, F1 is asserted. Since the accuracy of this approach is strongly affected by gradient noise (due to the approximate gradient used in (1)), the scalability flag F is asserted only when F1 is high for L consecutive clock cycles.
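A behavioral sketch of this sub-block, under the assumptions that de is a 16-bit two's-complement word, that `step()` is called once per slow-clock tick (i.e. on every 32nd error sample), and that F stays high for as long as F1 remains high; k = 13 and L = 7 are the values chosen in Section 4. The names `top_bits_equal` and `ScalabilityFlag`, and the error samples in the example, are hypothetical.

```python
def top_bits_equal(value, n_bits=16, k=13):
    """True if the k MSBs of an n_bits two's-complement word are all equal.

    Python's >> is an arithmetic shift, so after dropping the n_bits - k LSBs
    the result is 0 (all zeros) or -1 (all ones) exactly when the k MSBs agree,
    i.e. when -2**(n_bits - k) <= value < 2**(n_bits - k).
    """
    return (value >> (n_bits - k)) in (0, -1)


class ScalabilityFlag:
    """Behavioral model of the Approximation Scalability Flag sub-block."""

    def __init__(self, n_bits=16, k=13, L=7):
        self.n_bits, self.k, self.L = n_bits, k, L
        self.prev_e = None
        self.run = 0  # consecutive slow-clock cycles with F1 = 1

    def step(self, e_sub):
        if self.prev_e is None:        # no derivative available yet
            self.prev_e = e_sub
            return False
        de = e_sub - self.prev_e       # discrete derivative of the sub-sampled error
        self.prev_e = e_sub
        f1 = top_bits_equal(de, self.n_bits, self.k)
        self.run = self.run + 1 if f1 else 0
        return self.run >= self.L      # flag F: F1 high for L consecutive cycles


flag = ScalabilityFlag()
# Hypothetical integer error samples, already sub-sampled by 32.
e_sub = [1000, 400, 150, 60, 20, 10, 8, 9, 7, 8, 8, 9, 8, 7, 8]
F_trace = [flag.step(e) for e in e_sub]
```

With n = 16 and k = 13, F1 is high only while $-8 \leq de \leq 7$, so in the example F rises after seven consecutive small derivatives.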

### 4 Precision-Power Trade-Off

We investigated the precision-power trade-off in a system identification application, in which the systems to be identified are one FIR filter (FIR1) and two IIR filters (IIR1, IIR2). The impulse responses of the three filters are shown in Fig. 3 (for the IIR filters, the first 20 coefficients are shown).



Fig. 3 Impulse responses of the filters to be identified. a FIR1, b IIR1, c IIR2

Table 1 Post-layout area and leakage power results

| LMS Filter | Area [mm²] | $P_{LEAKAGE}$ [$\mu$W] |
|------------|------------|------------------------|
| Standard   | 0.205      | 69                     |
| Proposed   | 0.217 (+6%) | 71 (+3%)              |

We designed the proposed and the exact LMS filters in 16-bit fixed-point arithmetic, with T = 20 taps. For the A-M block of Fig. 2b we chose L = 7 and k = 13, while the clock frequency of the Approximation Scalability Flag sub-block is 32 times lower than that of the rest of the system. The proposed and exact filters have been synthesized in TSMC 40 nm technology using Cadence RTL Compiler, and placed and routed using Cadence SoC Encounter. To minimize power, a multiple-supply-voltage flow has been implemented, in which the A-M block is placed in a voltage island with VDD = 0.9 V, while the rest of the system uses the nominal 1.1 V. The system operates at a 222 MHz clock frequency. Power consumption has been evaluated post-layout with SDF- and VCD-based gate-level simulations.

Table 1 reports the area and leakage results for the proposed and the exact adaptive filters, showing that the proposed circuit has limited overheads of 6% in area and 3% in leakage.

Table 2 reports the precision-power trade-off for the three filters in three different modes: "E", in which the *A*-*M* block is disabled and no approximations are introduced; "L.D.", low error degradation with moderate approximation; and "H.D.", high error degradation with aggressive approximation. The *THR* values used for the L.D. and H.D. modes are reported in Table 2 as integer multiples of the LSB. In E mode a power overhead of 3–7% is observed. In L.D. mode, when a low quality degradation is tolerable, the power reduction ranges from 9% to 29%. Finally, when power dissipation is of primary importance and H.D. mode can be tolerated, a power reduction of up to 32% can be achieved. The power improvement is limited for filters with few zero coefficients, such as IIR1 (compare with Fig. 3b).
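The percentages in Table 2 are relative variations of $P_{dyn}$ with respect to the standard filter. A quick check on the IIR2 row, with values copied from Table 2 (the function name `relative_change_pct` is hypothetical):

```python
def relative_change_pct(p_proposed, p_standard):
    """Percentage variation of the proposed filter's dynamic power vs the standard one."""
    return 100.0 * (p_proposed - p_standard) / p_standard


# P_dyn [uW/MHz] of the proposed filter for IIR2 (Table 2); the standard filter uses 253.
iir2 = {"E": 262, "L.D.": 180, "H.D.": 172}
changes = {mode: round(relative_change_pct(p, 253), 2) for mode, p in iir2.items()}
```

A positive value is an overhead (E mode) and a negative value is a saving (L.D. and H.D. modes).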

To investigate the quality degradation, Fig. 4 shows the steady-state harmonic response in the **L.D.** (Fig. 4a) and **H.D.** (Fig. 4b) modes for the FIR1 identification. The harmonic response in **L.D.** mode is almost indistinguishable from the one obtained with the standard LMS filter, whereas in **H.D.** mode some degradation of the stop-band behavior appears.

### 5 Conclusions

A precision-scalable adaptive LMS filter has been proposed. The filter can change its precision level at runtime through an external quality knob. A simple auxiliary circuit automatically drives the adaptive filter into low-precision mode once a steady-state condition has been reached. The proposed filter gives power savings of up to 32% with negligible or tolerable quality degradation.

Table 2 Post-layout power-precision trade-off for the FIR1, IIR1 and IIR2 filter identification

| Filter | Circuit | E (en = 0) MSE | E (en = 0) $P_{dyn}$ [µW/MHz] | L.D. MSE | L.D. $P_{dyn}$ [µW/MHz] | H.D. MSE | H.D. $P_{dyn}$ [µW/MHz] |
|--------|---------|----------------|-------------------------------|----------|-------------------------|----------|-------------------------|
| FIR1 (L.D.: THR = 2; H.D.: THR = 64) | Standard | $7.89 \times 10^{-10}$ | 239 | - | - | - | - |
| | Proposed | $7.89 \times 10^{-10}$ | 250 (+6.69%) | $7.95 \times 10^{-10}$ | 172 (-28%) | $9.73 \times 10^{-9}$ | 167 (-30%) |
| IIR1 (L.D.: THR = 1; H.D.: THR = 32) | Standard | $2.63 \times 10^{-9}$ | 247 | - | - | - | - |
| | Proposed | $2.63 \times 10^{-9}$ | 258 (+4.45%) | $2.66 \times 10^{-9}$ | 224 (-9.31%) | $7.00 \times 10^{-8}$ | 201 (-18.62%) |
| IIR2 (L.D.: THR = 1; H.D.: THR = 32) | Standard | $7.48 \times 10^{-9}$ | 253 | - | - | - | - |
| | Proposed | $7.48 \times 10^{-9}$ | 262 (+3.56%) | $7.58 \times 10^{-9}$ | 180 (-28.85%) | $8.47 \times 10^{-5}$ | 172 (-32.02%) |



Fig. 4 Harmonic response of FIR1 identified using standard or proposed adaptive filter. **a L.D.** Low Degradation modality, **b H.D.** High Degradation modality

### References

1. Han, J., Orshansky, M.: Approximate computing: An emerging paradigm for energy-efficient design. In: 18th IEEE European Test Symp. (ETS), Avignon, pp. 1–6 (2013)
2. Roy, K., Raghunathan, A.: Approximate computing: an energy-efficient computing technique for error resilient applications. In: IEEE Computer Society Annual Symposium VLSI, pp. 473–475 (2015)
3. Shafique, M., Ahmad, W., Hafiz, R.: A low latency generic accuracy configurable adder. In: DAC Design Automation Conference, San Francisco, CA, pp. 1–6 (2015)
4. Esposito, D., Castellano, G., De Caro, D., Napoli, E., Petra, N., Strollo, A.G.M.: Approximate adder with output correction for error tolerant applications and Gaussian distributed inputs. In: 2016 IEEE International Symposium on Circuits and Systems (ISCAS), Montreal, QC, pp. 1970–1973 (2016)
5. Verma, A.K., Brisk, P., Ienne, P.: Variable latency speculative addition: a new paradigm for arithmetic circuit design. In: Proceedings of Design, Automation and Test in Europe, Munich, pp. 1250–1255 (2008)
6. Esposito, D., De Caro, D., Napoli, E., Petra, N., Strollo, A.G.M.: Variable latency speculative Han-Carlson adder. IEEE Trans. Circuits Syst. I Regul. Pap. 62(5), 1353–1361 (2015)
7. Esposito, D., De Caro, D., Strollo, A.G.M.: Variable latency speculative parallel prefix adders for unsigned and signed operands. IEEE Trans. Circuits Syst. I Regul. Pap. 63(8), 1200–1209 (2016)
8. Esposito, D., Strollo, A.G.M., Napoli, E., De Caro, D., Petra, N.: Approximate multipliers based on new approximate compressors. IEEE Trans. Circuits Syst. I Regul. Pap. https://doi.org/10.1109/tcsi.2018.2839266
9. Haykin, S.: Adaptive Filter Theory. Prentice-Hall (2002)
10. Meher, P.K., Park, S.Y.: Critical-path analysis and low-complexity implementation of the LMS adaptive algorithm. IEEE Trans. Circuits and Syst. I 61(3), 778–788 (2014)
11. Esposito, D., Di Meo, G., De Caro, D., Petra, N., Napoli, E., Strollo, A.G.M.: On the use of approximate multipliers in LMS adaptive filters. In: 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, pp. 1–5 (2018)
12. Frustaci, F., Khayatzadeh, M., Blaauw, D., Sylvester, D., Alioto, M.: SRAM for error-tolerant applications with dynamic energy-quality management in 28 nm CMOS. IEEE J. Solid State Circuits 50(5), 1310–1323 (2015)
13. Esposito, D., Strollo, A.G.M., Alioto, M.: Power-precision scalable latch memories. In: 2017 IEEE International Symposium on Circuits and Systems (ISCAS), Baltimore, MD, pp. 1–4 (2017)
14. de la Guia Solaz, M., Han, W., Conway, R.: A flexible low power DSP with a programmable truncated multiplier. IEEE Trans. Circuits Syst. I Regul. Pap. 59(11), 2555–2568 (2012)
15. Moons, B., Verhelst, M.: An energy-efficient precision-scalable ConvNet processor in 40-nm CMOS. IEEE J. Solid State Circuits 52(4), 903–914 (2017)

## An Optimized Partial-Distortion-Elimination Based Sum-of-Absolute-Differences Architecture for High-Efficiency-Video-Coding



Paolo Selvo, Maurizio Masera, Riccardo Peloso, Guido Masera, Muhammad Shafique and Maurizio Martina

**Abstract** Sum of Absolute Differences (SAD) is one of the most time-consuming tasks in video coding. This paper proposes an architecture that computes the SADs for all the block sizes required by the High Efficiency Video Coding (HEVC) standard. Moreover, Partial Distortion Elimination (PDE), clock gating and a low-leakage technology enable significant power and energy savings over the state of the art.

**Keywords** VLSI architecture · Motion estimation · HEVC

P. Selvo · M. Masera · R. Peloso · G. Masera · M. Martina ($\boxtimes$)

Department of Electronics and Telecommunications, Politecnico di Torino, Turin, Italy e-mail: maurizio.masera@polito.it

P. Selvo e-mail: paolo.selvo@polito.it

R. Peloso e-mail: riccardo.peloso@polito.it

G. Masera e-mail: guido.masera@polito.it

M. Martina e-mail: maurizio.martina@polito.it

M. Shafique Institute of Computer Engineering, Vienna University of Technology (TU Wien), Vienna, Austria e-mail: muhammad.shafique@tuwien.ac.at

© Springer Nature Switzerland AG 2019 S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_28

### 1 Introduction

High Efficiency Video Coding (HEVC) is the latest video compression standard developed by the Joint Collaborative Team on Video Coding (JCT-VC) [10]. The encoder relies on inter prediction to exploit the temporal redundancy between successive frames, which is implemented through Motion Estimation (ME). ME is performed on a Prediction Unit (PU) basis, where PUs are obtained by splitting the current Coding Unit (CU) into one, two or four PUs, as shown in Fig. 1a. During the Integer Motion Estimation (IME) phase, the encoder explores a search window in the reference frames to find the best match for the current block (see Fig. 1b). This operation is repeated for each PU by computing the Sum of Absolute Differences (SAD) between the samples of the current block and those of the candidate blocks from the reference frames:

$$SAD = \sum_{i=1}^{W} \sum_{j=1}^{H} |C_{i,j} - R_{i,j}|$$

where $C_{i,j}$ and $R_{i,j}$ denote the pixels of the current and reference blocks, respectively, both of size $W \times H$. The reference block that produces the minimum SAD is chosen as the predictor of the current block.

Fig. 1 The organization of the CU in HEVC (a). ME process (b)

As argued in [1], the calculation of SAD and other distortion metrics can take up to 40% of the encoding time and about 80% of the energy budget [4], making it one of the most time-consuming and energy-demanding tasks at the encoder side. For this reason, several architectures have been proposed in the literature to accelerate SAD computation in HEVC. In [7], a high-speed SAD architecture for FPGA supporting all the PU sizes specified in HEVC was proposed, relying on the parallel computation of four $8 \times 8$ PUs. The works in [5, 6] implement two very fast accelerators, on FPGA and ASIC respectively, which compute the $64 \times 64$ SAD in only 16 clock cycles. However, these works consider neither bandwidth constraints nor real memories, which leads to implementations that are far from easy to integrate in a system. A slower but more realistic hardware SAD architecture was proposed in [3], which requires 32 clock cycles to compute the $64 \times 64$ SAD. Moreover, Partial Distortion Elimination (PDE) is a very effective technique to improve performance without affecting coding efficiency [2]: since IME seeks the candidate that minimizes the SAD cost function, the computation for a candidate can be stopped as soon as its partial SAD exceeds the current minimum SAD. PDE has recently been exploited in [8], but only for $4 \times 4$ PUs. Therefore, this paper proposes a reconfigurable, low-power SAD architecture that extends the PDE approach to all the PU sizes specified in HEVC.
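To make the SAD metric and PDE concrete, the following sketch performs an exhaustive IME search over a square search window, with or without a row-by-row PDE check. It is an assumed software model, not the proposed hardware: the function names, the row granularity of the PDE check and the frame and block sizes in the example are illustrative choices.

```python
import numpy as np


def sad(cur, ref):
    """Plain SAD between two equally sized blocks (int32 avoids uint8 wrap-around)."""
    return int(np.abs(cur.astype(np.int32) - ref.astype(np.int32)).sum())


def sad_pde(cur, ref, best):
    """Row-by-row SAD that gives up once the partial sum exceeds `best`."""
    partial = 0
    for row_c, row_r in zip(cur.astype(np.int32), ref.astype(np.int32)):
        partial += int(np.abs(row_c - row_r).sum())
        if partial > best:
            return None  # this candidate cannot beat the current minimum
    return partial


def full_search(cur, ref_frame, top, left, search_range, use_pde=True):
    """Exhaustive IME around (top, left); returns (motion vector, minimum SAD)."""
    h, w = cur.shape
    best_sad, best_mv = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + h > ref_frame.shape[0] or x + w > ref_frame.shape[1]:
                continue  # candidate block falls outside the reference frame
            cand = ref_frame[y:y + h, x:x + w]
            if use_pde and best_sad is not None:
                s = sad_pde(cur, cand, best_sad)
            else:
                s = sad(cur, cand)
            if s is not None and (best_sad is None or s < best_sad):
                best_sad, best_mv = s, (dy, dx)
    return best_mv, best_sad


# Example: an 8 x 8 block copied from a random 64 x 64 reference frame.
rng = np.random.default_rng(1)
ref_frame = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
cur = ref_frame[20:28, 18:26].copy()
mv, best = full_search(cur, ref_frame, top=16, left=16, search_range=8)
```

Both variants return the same motion vector and SAD: PDE only abandons candidates whose partial SAD already exceeds the running minimum, which is why it does not affect coding efficiency.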

### 2 Proposed Architecture

The proposed architecture, which computes the SAD for all the PU sizes listed in the first column of Table 1, is made of three main blocks, as shown in Fig. 2: three SRAMs, a control unit named SAD interface, and a pipelined SAD datapath. Moreover, when PDE is enabled, a minimum-SAD updater module handles it: it stores the current minimum SAD value and provides the comparison value to the comparators inside the datapath.

Table 1 Timing, power and energy results for different SAD cases

| SAD size | Clock cycles w/o PDE | Clock cycles with PDE | Time variation (%) | Power w/o PDE [mW] | Power with PDE [mW] | Power variation (%) | Energy w/o PDE [pJ] | Energy with PDE [pJ] | Energy variation (%) |
|----------|------|------|-------|-------|--------|------|-------|-------|-------|
| $8 \times 4$   | 6.0  | 6.0  | 0.0   | 12.10 | 12.12  | 0.2  | 454   | 455   | 0.2   |
| $8 \times 8$   | 7.0  | 7.0  | 0.0   | 15.30 | 15.30  | 0.0  | 669   | 669   | 0.0   |
| $16 \times 4$  | 7.0  | 7.0  | 0.0   | 15.42 | 15.43  | 0.1  | 675   | 675   | 0.1   |
| $16 \times 8$  | 8.0  | 7.2  | -10.0 | 21.77 | 23.20  | 6.6  | 1089  | 1044  | -4.1  |
| $16 \times 12$ | 9.0  | 7.6  | -15.6 | 33.47 | 36.97  | 10.5 | 1883  | 1756  | -6.7  |
| $16 \times 16$ | 9.0  | 7.8  | -13.3 | 33.82 | 36.16  | 6.9  | 1902  | 1763  | -7.3  |
| $32 \times 8$  | 9.0  | 7.7  | -14.4 | 33.88 | 37.65  | 11.1 | 1906  | 1812  | -4.9  |
| $32 \times 16$ | 11.0 | 9.4  | -14.5 | 40.56 | 44.74  | 10.3 | 2789  | 2628  | -5.7  |
| $32 \times 24$ | 12.0 | 10.2 | -15.0 | 48.39 | 53.36  | 10.3 | 3629  | 3402  | -6.3  |
| $32 \times 32$ | 13.0 | 10.8 | -16.9 | 54.02 | 60.58  | 12.1 | 4389  | 4089  | -6.8  |
| $64 \times 16$ | 13.0 | 12.0 | -7.7  | 54.45 | 61.95  | 13.8 | 4424  | 4646  | 5.0   |
| $64 \times 32$ | 17.0 | 14.0 | -17.6 | 72.34 | 87.15  | 20.5 | 7686  | 7626  | -0.8  |
| $64 \times 48$ | 21.0 | 15.8 | -24.8 | 86.02 | 98.67  | 14.7 | 11290 | 9744  | -13.7 |
| $64 \times 64$ | 25.0 | 18.4 | -26.4 | 92.49 | 106.28 | 14.9 | 14452 | 12222 | -15.4 |

Fig. 2 Proposed architecture block scheme

The maximum input bandwidth has been fixed to about 50 GB/s, so that the external memory can be a common SDRAM running at 1280 MHz (*clock1*) with a 256-bit data bus. As a consequence, with a clock frequency of 160 MHz (*clock2*), the architecture can read 2048 bits (256 samples) per clock cycle from the SRAMs where the reference and current PUs are stored. Two SRAMs for the reference PU allow the SAD datapath to work every clock cycle without pipeline stalls: the two SRAMs are used in an interleaved fashion, so that while the first one provides data to the datapath (reading), the second one loads a new frame from the external SDRAM (writing). The dashed blue line in Fig. 2 separates the two clock domains. The data stored in the local SRAM are arranged as sets of $4 \times 4$ samples, so the datapath is made of 16 Processing Elements (PEs) working concurrently, as shown in Fig. 3. Each PE computes a $4 \times 4$ SAD and therefore contains an adder tree made of 15 adders. At each clock cycle, up to 16 sets of $4 \times 4$ samples from the current and reference PUs are sent to the datapath. As a consequence, a $16 \times 16$ SAD requires one cycle to be computed, whereas a $64 \times 64$ SAD needs 16 clock cycles. This is handled by the multiplexer in the upper-right part of Fig. 3 (before the SAD register), which selects the correct exit point.
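The cycle and bandwidth figures quoted above follow from simple arithmetic, sketched below. The sketch assumes 8-bit samples (consistent with 2048 bits = 256 samples) and counts only datapath cycles, not the control and memory cycles included in Table 1; the helper names are hypothetical.

```python
import math

PES = 16                          # processing elements working concurrently
SAMPLES_PER_CYCLE = PES * 4 * 4   # 16 sets of 4 x 4 samples = 256


def datapath_cycles(width, height):
    """Datapath cycles for a width x height SAD (control/memory overhead excluded)."""
    return math.ceil(width * height / SAMPLES_PER_CYCLE)


def adder_tree_adders(n_inputs):
    """Two-input adders needed to reduce n_inputs values to a single sum."""
    adders = 0
    while n_inputs > 1:
        pairs = n_inputs // 2
        adders += pairs
        n_inputs = pairs + n_inputs % 2
    return adders


# Bit rate of each clock domain, in Gbit/s.
sdram_gbit_s = 1280e6 * 256 / 1e9     # clock1 x 256-bit SDRAM bus
sram_gbit_s = 160e6 * 2048 / 1e9      # clock2 x 2048-bit SRAM read
```

Each PE sums 16 absolute differences, hence the 15 adders of its tree, and both clock domains carry 327.68 Gbit/s, so the SRAM read port keeps pace with the SDRAM bus.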

Fig. 3 Datapath block scheme

Each pipeline register (dashed red line in Fig. 3) has its own enable port (blue lines in Fig. 3), which allows clock gating to be applied to reduce power consumption. Finally, five comparators have been placed in the datapath (see Fig. 3) to implement PDE: if one of the comparators detects a partial SAD greater than the current minimum SAD, the SAD computation is stopped. Extensive simulations have shown that adding further comparators inside the datapath gives no significant latency reduction.
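The effect of comparators placed along the datapath can be modeled at cycle level. In this sketch, which is an assumption and not the authors' RTL, the absolute differences are consumed 256 at a time in raster order, one chunk per cycle, and the partial SAD is compared with the current minimum only after the cycles listed in `checkpoints`. The paper does not state where its five comparators sit, so the checkpoint positions in the example, like the name `chunked_sad_pde`, are hypothetical.

```python
import numpy as np


def chunked_sad_pde(cur, ref, best=None, chunk=256, checkpoints=None):
    """Accumulate the SAD one datapath cycle (chunk of samples) at a time.

    best: current minimum SAD, or None when no candidate has been evaluated yet.
    checkpoints: 1-based cycle indices after which a comparator checks the
    partial SAD; None means 'check after every cycle'.
    Returns (SAD or None if eliminated, cycles spent).
    """
    diffs = np.abs(cur.astype(np.int32) - ref.astype(np.int32)).ravel()
    n_cycles = -(-diffs.size // chunk)  # ceiling division
    if checkpoints is None:
        checkpoints = set(range(1, n_cycles + 1))
    partial = 0
    for c in range(1, n_cycles + 1):
        partial += int(diffs[(c - 1) * chunk:c * chunk].sum())
        if best is not None and c in checkpoints and partial > best:
            return None, c  # PDE: stop, this candidate is already worse
    return partial, n_cycles


# A 64 x 64 candidate whose every sample differs by 1 (SAD = 4096).
cur = np.zeros((64, 64), dtype=np.uint8)
ref = np.ones((64, 64), dtype=np.uint8)
```

With `best=1000`, checking after every cycle eliminates the candidate after 4 cycles, while checkpoints at cycles 8, 12 and 16 only catch it after 8: fewer comparators mean later elimination of a losing candidate.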

### 3 Implementation Results

The proposed architecture has been described in VHDL [9] and synthesized in a 65 nm CMOS standard-cell technology using SRAM IPs. Timing simulations have been performed by computing the SADs on successive frames of $1920 \times 1080$ standard sequences (e.g. BQTerrace). The SADs have been computed with the PDE technique either enabled or disabled, so that the average number of clock cycles needed to calculate the SAD for each PU could be measured. Table 1 reports the total number of clock cycles required to compute the SAD on PUs of different sizes without and with PDE: as expected, the speed-up offered by PDE generally increases with PU size. Power consumption results have been obtained by extracting the switching activity of the circuit while running 1024 SAD test vectors for each PU. Since the architecture is parallel and pipelined, clock gating has been employed to reduce the power consumption for the small PUs, which do not use all the datapath elements. This technique is effective for PUs from $8 \times 4$ to $16 \times 8$, with savings ranging from 24.5% to 8.3%, respectively. Clock gating costs about 0.4% of the total area, which is negligible. Although PDE increases the power consumption, increasingly so for larger PUs, the achieved speed-up yields an energy reduction of about 4–15%.
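The energy column of Table 1 is consistent with $E = P \cdot N_{cycles} / f_{clk}$ at the 160 MHz datapath clock. The helper names below are hypothetical; the numbers are copied from Table 1.

```python
F_CLK_MHZ = 160.0  # clock2, the SAD datapath clock


def energy_pj(power_mw, cycles, f_mhz=F_CLK_MHZ):
    """E = P * cycles / f; 1 mW / 1 MHz = 1 nJ = 1000 pJ."""
    return power_mw * cycles / f_mhz * 1e3


def latency_ns(cycles, f_mhz=F_CLK_MHZ):
    """Time to compute one SAD: cycles / f, in nanoseconds."""
    return cycles / f_mhz * 1e3


# 64 x 64 SAD, without and with PDE (Table 1).
e_no_pde = energy_pj(92.49, 25.0)
e_pde = energy_pj(106.28, 18.4)
saving_pct = 100.0 * (e_pde - e_no_pde) / e_no_pde
```

At 160 MHz, the $64 \times 64$ SAD therefore takes 156.25 ns without PDE and 115 ns with PDE, which is the figure to set against the latencies quoted for [5–7] in the comparison that follows.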

A fair comparison of the proposed solution with those in the literature is not straightforward. Although several architectures in the literature, like the proposed one, use 16 datapath clock cycles to compute a $64 \times 64$ SAD, the literature gives no information about the number of clock cycles spent by the control unit and on loading/storing samples from/to memory (the proposed architecture requires at most 25 clock cycles, 18.4 with PDE). Moreover, the architectures in [5–7] achieve very high clock frequencies, so their delays for a $64 \times 64$ SAD are 96.63 ns, 34.88 ns and 22 ns, respectively. However, these works do not specify the required input bandwidth, nor do they explain how all the required samples are delivered to the architecture. As far as power consumption is concerned, the SAD architecture in [7] consumes three times as much power as the proposed architecture, and four times as much if on-chip memory is not considered. Finally, the proposed architecture requires an area of $1.2 \text{ mm}^2$, corresponding to 90 k, which is more than four times lower than the area reported in [5].

### 4 Conclusion

This paper presented an energy-efficient ASIC architecture that performs all the SAD operations required by the HEVC standard. Optimized memory management and a parallel datapath achieve high throughput, while the PDE method further speeds up the computation and saves energy, leading to an area- and energy-efficient solution.

### References

1. Bossen, F., Bross, B., Suhring, K., Flynn, D.: HEVC complexity and implementation analysis. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1685–1696 (2012)
2. Chiu, M.Y., Siu, W.C.: New results on exhaustive search algorithm for motion estimation using adaptive partial distortion search and successive elimination algorithm. In: 2006 IEEE International Symposium on Circuits and Systems, pp. 4–3981 (May 2006)
3. Dinh, V.N., Phuong, H.A., Duc, D.V., Ha, P.T.K., Tien, P.V., Thang, N.V.: High speed SAD architecture for variable block size motion estimation in HEVC encoder. In: 2016 IEEE International Conference on Communications and Electronics, pp. 195–198 (July 2016)
4. El-Harouni, W., Rehman, S., Prabakaran, B.S., Kumar, A., Hafiz, R., Shafique, M.: Embracing approximate computing for energy-efficient motion estimation in high efficiency video coding. In: Design, Automation and Test in Europe Conference, pp. 1384–1389 (Mar 2017)
5. Medhat, A., Shalaby, A., Sayed, M.S.: High-throughput hardware implementation for motion estimation in HEVC encoder. In: IEEE International Midwest Symposium on Circuits and Systems, pp. 1–4 (Aug 2015)
6. Medhat, A., Shalaby, A., Sayed, M.S., Elsabrouty, M., Mehdipour, F.: A highly parallel SAD architecture for motion estimation in HEVC encoder. In: IEEE Asia Pacific Conference on Circuits and Systems, pp. 280–283 (Nov 2014)
7. Nalluri, P., Alves, L.N., Navarro, A.: High speed SAD architectures for variable block size motion estimation in HEVC video coding. In: IEEE International Conference on Image Processing, pp. 1233–1237 (Oct 2014)
8. Seidel, I., Brascher, A.B., Guntzel, J.L.: Combining pel decimation with partial distortion elimination to increase SAD energy efficiency. In: International Workshop on Power and Timing Modeling, Optimization and Simulation, pp. 177–184 (Sept 2015)
9. Selvo, P.: VHDL code of an optimized SAD architecture for HEVC (Oct 2017). http://personal.det.polito.it/maurizio.martina/hevc.html
10. Sullivan, G.J., Ohm, J.R., Han, W.J., Wiegand, T.: Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 22(12), 1649–1668 (2012)

## Efficient Ensemble Machine Learning Implementation on FPGA Using Partial Reconfiguration



Gian Carlo Cardarilli, Luca Di Nunzio, Rocco Fazzolari, Daniele Giardino, Marco Matta, Marco Re, Francesca Silvestri and Sergio Spanò

**Abstract** Ensemble Machine Learning (EML) consists of the combination of multiple Artificial Intelligence algorithms. This paper presents an efficient FPGA implementation of an Ensemble based on Long Short-Term Memory networks (LSTMs). For an efficient implementation, the proposed design uses the Partial Reconfiguration feature available in FPGAs. Results are presented in terms of resource utilization, reconfiguration speed, power consumption and maximum clock frequency.

G. C. Cardarilli · L. Di Nunzio (⊠) · R. Fazzolari · D. Giardino · M. Matta · M. Re · F. Silvestri · S. Spanò

University of Rome Tor Vergata, Via del Politecnico 1, 00133 Rome, Italy e-mail: di.nunzio@ing.uniroma2.it

G. C. Cardarilli e-mail: cardarilli@ing.uniroma2.it

R. Fazzolari e-mail: fazzolari@ing.uniroma2.it

D. Giardino e-mail: giardino@ing.uniroma2.it

M. Matta e-mail: matta@ing.uniroma2.it

M. Re e-mail: re@ing.uniroma2.it

F. Silvestri e-mail: f.silvestri@ing.uniroma2.it

S. Spanò e-mail: spano@ing.uniroma2.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_29

## 1 Introduction

In the last few years, Machine Learning (ML) has gained an important role in several fields [1–3]. Increasing computational power and the introduction of new technologies have increased the interest in ML [4–8]. In this scenario, Ensemble Machine Learning (EML) plays an important role. EML combines multiple ML algorithms, which improves performance with respect to a single ML algorithm [9–11]. Each single ML algorithm is called a "Base Learner" (BL), and the set of Base Learners composes the Ensemble Learner. Fig. 1 shows an example of EML for a classification problem. The EML is composed of three different Base Learners, whose results are combined to obtain a more accurate classification. EML requires two main features: Base Learner "diversity" and "aggregation". Base Learner diversity can be achieved by changing ML models, hyperparameters, learning options, initialization or training sets. Aggregation, instead, can be realized using several techniques, such as bagging, majority voting or boosting [9–11]. In the last few years, ML has also been introduced in Embedded Systems. There it is useful in several devices and applications, such as sensor nodes [12], unmanned autonomous vehicles [13], health, and video and image processing.

Fig. 1 An example of EML in a classification problem. Three different classifiers that represent the Base Learners are aggregated to obtain a better classifier
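Majority voting is one of the aggregation techniques listed above. It can be sketched in a few lines of Python for the three-learner classifier of Fig. 1. This is an illustrative software sketch, not the authors' aggregation logic. The function names are hypothetical, and ties are resolved in favour of the label that appears first, which is the first learner's vote.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label among the Base Learner votes for one sample.

    Counter.most_common orders equal counts by first appearance, so a tie is
    resolved in favour of the earliest learner's vote."""
    return Counter(votes).most_common(1)[0][0]


def ensemble_classify(*learner_outputs):
    """Combine the per-sample predictions of several Base Learners by majority voting."""
    return [majority_vote(votes) for votes in zip(*learner_outputs)]


# Three hypothetical Base Learners classifying the same three samples
bl1 = ["A", "B", "B"]
bl2 = ["A", "A", "C"]
bl3 = ["C", "B", "C"]
print(ensemble_classify(bl1, bl2, bl3))  # ['A', 'B', 'C']
```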

In Embedded Systems too, EML can be used to improve ML performance. In general, ML algorithms are characterized by a high level of parallelism, so microprocessors are not suitable when a high computation rate is required. For this reason, the literature offers several solutions involving FPGAs, GPUs, or mixed HW/SW architectures. Since EML is based on multiple ML algorithms, it requires more computational effort than a single Base Learner. To address this complexity, in this paper we propose an efficient EML implementation on a commercial Xilinx XC7Z045 FPGA using the Partial Reconfiguration technique [14].

Fig. 2 Partial Reconfiguration and EML: every Base Learner (BL) is implemented by the relevant partial bitstream

Partial Reconfiguration is a feature provided by some commercial FPGAs. It allows designers to change the functionality of a portion of the FPGA on the fly, without fully reconfiguring the device. Blocks of logic can be modified by downloading partial bitstreams as many times as needed. During the reconfiguration of a block, the remaining logic continues to operate without interruption. Partial Reconfiguration fits EML perfectly. It allows Base Learners to be loaded from an external memory into the FPGA only when they are required, so only the Base Learners needed at a given moment reside on the FPGA. There is therefore no need to load all the Base Learners into the FPGA, and a smaller FPGA can be used (Fig. 2).

## 2 Proposed Experiment

As a case study, this paper proposes an EML based on Long Short-Term Memory (LSTM) networks for sequence prediction. The system is based on a Xilinx Zynq-7000 XC7Z045 System on Chip (SoC), composed of an ARM microprocessor and an FPGA, with a DDR memory available on the board. Five partial bitstreams, representing five different Base Learners, are stored in the DDR RAM. When a Base Learner is required for computation, the ARM processor reads the corresponding partial bitstream from memory. It then configures the FPGA using the Partial Reconfiguration technique. Up to three Base Learners can be loaded on the FPGA simultaneously.
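The load-on-demand policy described above can be modelled in software to estimate the reconfiguration time for a given sequence of Base Learner requests. The Python sketch below is a hypothetical model, not the authors' ARM software. It assumes:

- three interchangeable reconfigurable regions;
- least-recently-used (LRU) replacement, since the paper does not state a replacement policy;
- a fixed 16.67 ms per partial bitstream load, the figure reported in Sect. 3.

It also ignores the fact that, in practice, a partial bitstream targets one specific reconfigurable partition.

```python
from collections import OrderedDict

RECONFIG_TIME_MS = 16.67  # load time of one partial bitstream (Sect. 3)


class ReconfigurableRegion:
    """Hypothetical model of Base Learners (BLs) swapped in and out of the FPGA.

    Assumptions: `num_slots` interchangeable reconfigurable regions, LRU
    replacement, and a constant load time per partial bitstream."""

    def __init__(self, bitstreams, num_slots=3):
        self.bitstreams = set(bitstreams)  # partial bitstreams stored in DDR
        self.num_slots = num_slots
        self.loaded = OrderedDict()        # resident BLs, least recently used first
        self.loads = 0

    def request(self, name):
        """Make BL `name` resident and return the reconfiguration time it cost (ms)."""
        if name not in self.bitstreams:
            raise KeyError(f"no partial bitstream for {name!r}")
        if name in self.loaded:
            self.loaded.move_to_end(name)  # already on the FPGA: mark as recently used
            return 0.0
        if len(self.loaded) == self.num_slots:
            self.loaded.popitem(last=False)  # evict the least recently used BL
        self.loaded[name] = None  # stands for the ARM core writing the partial bitstream
        self.loads += 1
        return RECONFIG_TIME_MS


# Five BLs in DDR, at most three on the FPGA at the same time
region = ReconfigurableRegion([f"BL{i}" for i in range(1, 6)])
times = [region.request(bl) for bl in ["BL1", "BL2", "BL3", "BL1", "BL4", "BL2"]]
print(f"{region.loads} loads, {sum(times):.2f} ms spent reconfiguring")  # 5 loads, 83.35 ms
```

A request for a Base Learner that is already resident costs nothing. Under this model, the order of requests therefore directly affects how much time is lost to reconfiguration.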



Fig. 3 An example of a time series prediction problem



Fig. 4 LSTM architecture: Layer (left) and Unit (right)

### 2.1 LSTM

Long Short-Term Memory networks (LSTMs) are a special kind of Recurrent Neural Network capable of learning long-term dependencies [15]. They are applied to time series prediction problems. Given an input sequence X[n] composed of N samples, an LSTM can estimate future samples. The time between the last actual sample of X[n] and the predicted sample is called the prediction horizon (Fig. 3).
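The Python sketch below illustrates the prediction horizon. It turns a series into training pairs in which the target lies `horizon` samples after the last sample of the input window. The window length and horizon of 20 samples mirror the values used in Sect. 2.2. Predicting a single sample per window and the name `make_windows` are assumptions made for the example.

```python
def make_windows(x, window=20, horizon=20):
    """Split a series into (input window, target) pairs.

    The target is the sample `horizon` steps after the last sample of the
    window, following the definition of prediction horizon in the text."""
    pairs = []
    for start in range(len(x) - window - horizon + 1):
        last = start + window - 1  # index of the last actual sample in the window
        pairs.append((x[start:start + window], x[last + horizon]))
    return pairs


series = list(range(50))
pairs = make_windows(series)
print(len(pairs), pairs[0][1], pairs[-1][1])  # 11 39 49
```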

LSTMs are organized in one or more layers of elements called Units (Fig. 4 shows a single layer and a single Unit). Each Unit is composed of Gates. Gates are the computational elements of the LSTM and essentially consist of artificial neurons with sigmoid and hyperbolic tangent activation functions. A detailed description of LSTMs is provided in [15].
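One time step of a single-layer LSTM uses four gates: input, forget, output and candidate. Each gate combines a vector-matrix product on the input with another on the previous output. The Python sketch below uses the common formulation with a forget gate, which may differ in detail from the Units used by the authors. Plain lists keep the operation count visible: 8 vector-matrix products and 3 element-wise products per Unit, i.e. 60 for 20 Units. This is consistent with the figures given in Sect. 2.2.

```python
import math


def vecmat(v, M):
    """Row vector (1 x n) times matrix (n x k) -> list of length k."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h, c, W, U, b):
    """One time step of a single-layer LSTM (forget-gate formulation).

    W, U, b are dicts keyed by gate: 'i' input, 'f' forget, 'o' output,
    'g' candidate. W[k] and U[k] are the input and recurrent weight matrices,
    b[k] the bias vector. Returns the new (h, c)."""
    # 4 gates x 2 vector-matrix products = 8 vector-matrix products
    pre = {k: [a + r + bias for a, r, bias in zip(vecmat(x, W[k]), vecmat(h, U[k]), b[k])]
           for k in "ifog"}
    i = [sigmoid(z) for z in pre["i"]]
    f = [sigmoid(z) for z in pre["f"]]
    o = [sigmoid(z) for z in pre["o"]]
    g = [math.tanh(z) for z in pre["g"]]
    # 3 element-wise products per Unit: f*c, i*g and o*tanh(c)
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


n = 20  # 20-sample input vector and 20 Units, as in Sect. 2.2
W = {k: [[0.0] * n for _ in range(n)] for k in "ifog"}
U = {k: [[0.0] * n for _ in range(n)] for k in "ifog"}
b = {k: [0.0] * n for k in "ifog"}
h, c = lstm_step([0.1] * n, [0.0] * n, [1.0] * n, W, U, b)
```

With all-zero weights, every gate outputs 0.5 and the candidate is 0, so each cell state halves. This gives a simple sanity check of the data flow.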

### 2.2 Hardware Architecture

As discussed in Sect. 1, this paper investigates the possibility of efficiently implementing EML on FPGA using the Partial Reconfiguration technique.

In our experiments we consider the following scenario:

- A total of 5 pre-trained Base Learners are ready to use
- Up to 3 Base Learners are used in parallel
- All the Base Learners are based on LSTM

Each LSTM is designed for a prediction horizon of 20 samples. The LSTMs differ from each other in training sequences, training parameters and weight initialization. Each Base Learner (an LSTM) is mapped to a partial bitstream that is loaded into the FPGA from the external memory when required. The first aspect to deal with is the hardware implementation of a single LSTM. As shown in [16], LSTMs can be implemented with tensor operations, such as matrix-vector multiplications, and activation functions.

The LSTMs in our experiments have 20-sample input vectors, 1 layer and 20 Units per layer. These specifications imply the following computations: 8 vector-matrix multiplications (between $1 \times 20$ vectors and $20 \times 20$ matrices) and 60 scalar products. To reduce area occupation, the 8 vector-matrix products are implemented with a semi-parallel approach. The product of the vector with a single matrix row is computed in parallel, and the rows are processed sequentially. As a consequence, the multipliers are time-shared and the LSTM coefficients must be stored in local memories. The processing uses a word size of 16 bits, with scaling and truncation after each multiplication.
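The semi-parallel scheme and the 16-bit arithmetic can be sketched as follows. The Python model below is a hypothetical illustration. Three details are assumptions, since the paper only states a 16-bit word with scaling and truncation after each multiplication:

- the Q7.8 format (8 fractional bits);
- saturation to the 16-bit range;
- storage of the coefficients with one row per output.

As a side note, suppose each of the 8 vector-matrix units used 20 parallel multipliers and each of the 60 scalar products a dedicated one. A Base Learner would then need $8 \times 20 + 60 = 220$ multipliers, and three Base Learners 660. This is consistent with the DSP count in Table 1, but the mapping is our inference, not a statement by the authors.

```python
FRAC_BITS = 8                                     # assumed Q7.8 format
INT16_MIN, INT16_MAX = -(1 << 15), (1 << 15) - 1


def sat16(v):
    """Clamp an integer to the signed 16-bit range (saturation is assumed)."""
    return max(INT16_MIN, min(INT16_MAX, v))


def to_fixed(x):
    return sat16(round(x * (1 << FRAC_BITS)))


def from_fixed(q):
    return q / (1 << FRAC_BITS)


def fx_mul(a, b):
    """16 x 16-bit product rescaled to 16 bits by dropping FRAC_BITS LSBs.

    Python's >> on negative ints is an arithmetic shift (truncation toward
    minus infinity), like a hardware right shift of a two's complement value."""
    return sat16((a * b) >> FRAC_BITS)


def fx_vecmat_semiparallel(v, rows):
    """Product of a 16-bit vector with a coefficient matrix stored one row per output.

    Each iteration models one cycle: len(v) multipliers work in parallel on one
    stored row, and the rows are processed one after another. The same
    multipliers are therefore reused and the coefficients come from local memory."""
    out = []
    for row in rows:                                       # one output element per cycle
        products = [fx_mul(a, w) for a, w in zip(v, row)]  # parallel multipliers
        out.append(sat16(sum(products)))                   # adder tree
    return out


v = [to_fixed(0.5)] * 20
rows = [[to_fixed(0.25)] * 20 for _ in range(20)]
y = fx_vecmat_semiparallel(v, rows)
print(from_fixed(y[0]))  # 2.5, i.e. 20 * 0.5 * 0.25
```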

## 3 Experimental Results

The EML system has been trained and validated through MATLAB simulations and coded in VHDL. The FPGA implementation has been realized using Xilinx Vivado 2018.2, targeting the Xilinx XC7Z045 SoC.

Table 1 reports the resource utilization when three Base Learners are loaded.

The maximum clock frequency is 107.53 MHz. The partial bitstream of a single Base Learner is 6850 KB, and loading it takes 16.67 ms. The static power is 0.213 W and the dynamic power at 100 MHz is 1.933 W.
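Two quantities follow directly from the figures above. The first is the effective configuration throughput, assuming 1 KB = $10^3$ bytes; with 1 KB = 1024 bytes the result is about 421 MB/s. The second is the total power at 100 MHz. These are back-of-the-envelope derivations, not values reported by the authors.

$$
\frac{6850 \times 10^{3}\ \text{B}}{16.67 \times 10^{-3}\ \text{s}} \approx 4.11 \times 10^{8}\ \text{B/s} \approx 411\ \text{MB/s},
\qquad
P_{\text{tot}} = P_{\text{static}} + P_{\text{dyn}} = 0.213\ \text{W} + 1.933\ \text{W} = 2.146\ \text{W}
$$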

The implemented EML has been tested by repeating the experiments presented in [17], which are based on the following publicly available datasets: daily maximum temperature in Melbourne, Internet traffic (in bits) in France, daily births in Quebec and the number of monthly sunspots.

Table 2 compares the performance of our setup with that of the experiments in [17] in terms of Root Mean Square Error (RMSE).
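The RMSE follows the usual definition $\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}(\hat{x}[n] - x[n])^2}$. The Python sketch below computes it for an ensemble whose Base Learner predictions are combined by simple averaging. Averaging is an assumption for illustration, since the aggregation rule used for these regression experiments is not stated here. The function names are hypothetical.

```python
import math


def ensemble_average(*learner_predictions):
    """Combine Base Learner forecasts by averaging them sample by sample (assumed rule)."""
    return [sum(vals) / len(vals) for vals in zip(*learner_predictions)]


def rmse(predicted, actual):
    """Root Mean Square Error between two sequences of equal, non-zero length."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


# Three hypothetical Base Learner forecasts for two future samples
forecast = ensemble_average([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])
print(forecast, rmse(forecast, [3.0, 6.0]))  # [3.0, 4.0] 1.4142135623730951
```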

Table 1 Xilinx XC7Z045 resource utilization

| Resource | Utilization | Available | Utilization % |
|----------|-------------|-----------|---------------|
| LUT      | 47,182      | 218,600   | 21.57         |
| FF       | 14,059      | 437,200   | 3.22          |
| DSP      | 660         | 900       | 73.33         |

Table 2 Implemented EML results

| Dataset | RMSE, implemented EML | RMSE, literature [17] |
|---------|-----------------------|-----------------------|
| Births  | 38.84                 | 42.19                 |
| Traffic | 1097.63               | 1380.53               |
| Temp.   | 5.35                  | 7.74                  |
| Sun     | 64.46                 | 74.88                 |

## 4 Conclusions

Partial Reconfiguration is a very powerful solution for Ensemble Machine Learning. It allows Base Learners to be implemented on FPGA without stopping the computation or reloading the full bitstream. We implemented a flexible, scalable and interchangeable Ensemble Learner on a Xilinx XC7Z045. Three LSTM Base Learners compute their outputs simultaneously, and a total of five Base Learners are available in the external DDR memory. We validated our EML algorithm by comparing our design with a set of recent reference experiments from the literature [17]. As for the FPGA implementation, the system reaches a maximum clock frequency of 107.53 MHz and dissipates 1.9 W of dynamic power and 0.2 W of static power. It occupies less than 22% of the LUTs and 75% of the DSPs of the selected device, leaving free resources for future developments.

Acknowledgements The authors would like to thank Xilinx Inc. for providing FPGA hardware and software tools through the Xilinx University Program.

## References

1. Lo Sciuto, G., Susi, G., Cammarata, G., Capizzi, G.: A spiking neural network-based model for anaerobic digestion process. In: IEEE 23rd International Symposium on Power Electronics, Electrical Drives, Automation and Motion (SPEEDAM) (2016)
2. Brusca, S., Capizzi, G., Lo Sciuto, G., Susi, G.: A new design methodology to predict wind farm energy production by means of a spiking neural network based system. Int. J. Numer. Model. Electron. Netw. Devices Fields 7 (2017)
3. Scarpato, N., Pieroni, A., Di Nunzio, L., Fallucchi, F.: E-health-IoT universe: a review. Int. J. Adv. Sci. Eng. Inf. Technol. 7(6), 2328–2336 (2017)
4. Cardarilli, G.C., Cristini, A., Di Nunzio, L., Re, M., Salerno, M., Susi, G.: Spiking neural networks based on LIF with latency: simulation and synchronization effects. In: Asilomar Conference on Signals, Systems and Computers, pp. 1838–1842 (2013)
5. Khanal, G.M., Acciarito, S., Cardarilli, G.C., Chakraborty, A., Di Nunzio, L., Fazzolari, R., Cristini, A., Re, M., Susi, G.: Synaptic behaviour in ZnO-rGO composites thin film memristor. Electron. Lett. 53(5), 296–298 (2017)
6. Acciarito, S., Cardarilli, G.C., Cristini, A., Di Nunzio, L., Fazzolari, R., Khanal, G.M., Re, M., Susi, G.: Hardware design of LIF with Latency neuron model with memristive STDP synapses. Integr. VLSI J. 59, 81–89 (2017)
7. Khanal, G.M., Cardarilli, G., Chakraborty, A., Acciarito, S., Mulla, M.Y., Di Nunzio, L., Fazzolari, R., Re, M.: A ZnO-rGO composite thin film discrete memristor. IEEE, ICSE, art. no. 7573608, pp. 129–132 (2016)
8. Acciarito, S., Cristini, A., Di Nunzio, L., Khanal, G.M., Susi, G.: An aVLSI driving circuit for memristor-based STDP. PRIME 2016, art. no. 7519503 (2016)
9. Opitz, D., Maclin, R.: Popular ensemble methods: an empirical study. J. Artif. Intell. Res. 11, 169–198
10. Polikar, R.: Ensemble based systems in decision making. IEEE Circuits Syst. Mag. 6(3), 21–45
11. Rokach, L.: Ensemble-based classifiers. Artif. Intell. Rev. 33(1–2), 1–39
12. Dalmasso, I., Galletti, I., Giuliano, R., Mazzenga, F.: WiMAX networks for emergency management based on UAVs. In: IEEE–AESS European Conference on Satellite Telecommunications (IEEE ESTEL 2012), Rome, Italy, Oct. 2012, pp. 1–6 (2010)
13. Giuliano, R., Mazzenga, F., Neri, A., Vegni, A.M.: Security access protocols in IoT capillary networks. IEEE Internet Things J. 4(3), 645–657 (2017)
14. Xilinx: Vivado Design Suite User Guide: Partial Reconfiguration (UG909)
15. Hochreiter, S., Schmidhuber, J.: Long Short-Term Memory. Neural Comput. 9(8), 1735–1780 (1997)
16. Chang, A.X.M., Culurciello, E.: Hardware accelerators for recurrent neural networks on FPGA. In: 2017 IEEE International Symposium on Circuits and Systems (ISCAS)
17. Krstanovic, S., et al.: Ensembles of recurrent neural networks for robust time series forecasting. In: 2017 International Conference on Innovative Techniques and Applications of AI, Cambridge

## Synthesis Time Reconfigurable Floating Point Unit for Transprecision Computing



Giulia Stazi, Federica Silvestri, Antonio Mastrandrea, Mauro Olivieri and Francesco Menichelli

**Abstract** This paper presents the design and implementation of a fully combinatorial floating point unit (FPU). The FPU can be reconfigured at implementation time to use an arbitrary number of bits for the mantissa and the exponent. It can be synthesized to support not only all IEEE-754 compliant FP formats but also non-standard FP formats, exploring the trade-off between precision (mantissa field), dynamic range (exponent field) and physical resources. This work is motivated by the observation that, in modern low-power embedded systems, floating point operations are a significant contributor to energy consumption (up to 50% of the energy consumed by the CPU). In this scenario, adopting multiple FP formats, with a tunable number of bits for the mantissa and exponent fields, is very interesting for reducing energy consumption. Simplifying the circuit also reduces area and propagation delay. Adopting multiple FP formats on the same platform complies with the concept of *transprecision computing*, since it allows fine-grained control of approximation while meeting the required constraints on the precision of output results. The designed FPU has been tested to verify the correctness of all supported operations, and implemented on a Kintex-7 FPGA. Experimental results are provided, illustrating the impact and the benefits of non-standard precision formats at circuit level.

**Keywords** Floating point unit · Low power consumption · Approximate computing · Transprecision computing

G. Stazi (⊠) · F. Silvestri · A. Mastrandrea · M. Olivieri · F. Menichelli Department of Information Engineering, Electronics and Telecommunications (DIET), Sapienza University of Rome, Via Eudossiana, 18, 00184 Roma, Italy e-mail: stazi@diet.uniroma1.it

A. Mastrandrea e-mail: mastrandrea@diet.uniroma1.it

M. Olivieri e-mail: olivieri@diet.uniroma1.it

F. Menichelli e-mail: menichelli@diet.uniroma1.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_30

## 1 Introduction

In most embedded applications, FP operations emerge as a major contributor to system energy consumption. In [1], experimental results show that 30% of the energy consumption is due to FP operations and an additional 20% to moving FP operands between memory and registers. Approximate computing [2–5] is an emerging technique that relaxes the specifications on precise computation. It allows digital systems to tolerate errors introduced by imprecise hardware/software, trading off quality, in terms of computational accuracy, for energy consumption or speed. According to another paradigm, transprecision computing [6], low-power embedded systems should be designed to deliver just the precision required by the computation. In this scenario, several works try to overcome the limitations of fixed-format FP types. For example, [7, 8] propose multi-precision arithmetic software libraries for calculations on numbers with arbitrary precision. In [1] the authors present a transprecision FPU capable of handling 8-bit and 16-bit operations in addition to the IEEE-754 compliant formats.

In this paper, we describe the design and the implementation of a fully combinatorial and reconfigurable FPU, supporting IEEE-754 and also capable of working with reduced precision formats, characterized by an arbitrary number of bits for the mantissa and the exponent.

## 2 Floating Point Representation, IEEE-754 Standard

The IEEE floating point standard (IEEE 754) [9] is a technical standard for floating-point computation, established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE).

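$$X = (-1)^s \times 1.m \times 2^{\,e - \text{bias}} \tag{1}$$

According to Eq. 1, a floating point number consists of three fields: a sign bit ($s$), a biased exponent ($e$) and a mantissa ($m$). Here $1.m$ denotes the significand with its implicit leading 1, and $\text{bias} = 2^{k-1} - 1$ for a $k$-bit exponent field.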

The IEEE standard defines the following formats (mantissa widths exclude the implicit leading 1):

- *half precision*: 1 sign bit, a 5-bit exponent and a 10-bit mantissa;
- *single precision*: 1 sign bit, an 8-bit exponent and a 23-bit mantissa;
- *double precision*: 1 sign bit, an 11-bit exponent and a 52-bit mantissa.

The advantage of floating point over fixed point is the wider range of numbers that can be represented with the same number of bits. In addition, the IEEE-754 standard defines representations for zero, negative and positive infinity, and NaN (Not a Number).
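The Python sketch below makes Eq. 1 concrete for an arbitrary split of the bits. It decodes a bit pattern with a 1-bit sign, an $e$-bit exponent and an $m$-bit mantissa; here $e$ and $m$ denote field widths, as in the FPU generics of Sect. 3. It is an illustrative software model, not the VHDL of the FPU. It flushes subnormal patterns to zero, anticipating the treatment of denormalized inputs described in Sect. 3.1. The function name `decode` is hypothetical.

```python
import math


def decode(bits, e, m):
    """Value of a bit pattern with 1 sign bit, an e-bit exponent and an m-bit mantissa.

    Implements Eq. 1 with bias = 2**(e-1) - 1. An all-ones exponent encodes
    infinity (zero mantissa) or NaN. Zero-exponent (subnormal) patterns are
    flushed to zero."""
    sign = -1.0 if (bits >> (e + m)) & 1 else 1.0
    exp_field = (bits >> m) & ((1 << e) - 1)
    man_field = bits & ((1 << m) - 1)
    bias = (1 << (e - 1)) - 1
    if exp_field == (1 << e) - 1:
        return sign * math.inf if man_field == 0 else math.nan
    if exp_field == 0:
        return sign * 0.0  # zero or subnormal -> signed zero
    return sign * (1 + man_field / (1 << m)) * 2.0 ** (exp_field - bias)


print(decode(0x3C00, 5, 10), decode(0x7BFF, 5, 10))  # 1.0 65504.0 (half precision)
print(decode(31 << 11, 6, 11))                       # 1.0 in the m11_e6 format (bias 31)
```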

## 3 Reconfigurable Floating Point Unit

The primary goal of this work is the design and implementation of a fully combinatorial and reconfigurable FPU, described in VHDL at Register Transfer Level (RTL). The unit operates on floating point numbers in any format up to 64 bits long, with an arbitrary number of bits dedicated to the exponent and mantissa fields. It therefore supports floating point operations fully compliant with the IEEE-754 standard (double, single and half precision) as well as operations in reduced precision formats. The FPU is reconfigurable at synthesis time: the length of all signals and variables is declared as a function of two generics, *m* and *e*, which define the length of the mantissa and the exponent, respectively.

The hardware architecture has been designed to meet the following targets: (a) reduced area occupation; (b) low power consumption; (c) maximum propagation speed. The FPU is fully combinatorial, so that it can be inserted in a single-cycle CPU pipeline stage. The implemented operations are addition, subtraction, multiplication, and conversion from floating point to integer and from integer to floating point.

### 3.1 Floating\_Point\_Unit\_core

Figures 1 and 2 show, respectively, the internal diagram and the external interface of the *Floating\_Point\_Unit\_core*. The FPU does not support operations on denormalized numbers, because they would require more logic, and therefore more area, for only a marginal increase in accuracy. The approach is the same one used in the VFP of ARM processors: all denormalized numbers at the FPU input are directly approximated to 0. The other FPU input signals are:

- *operation*, which encodes the type of operation to be executed;
- *control*, which configures the unit before performing computational tasks; in particular it assigns the sign during integer to float conversion and it determines whether exceptions are enabled on the status port.

The output signals are:

- *result*, which provides the result of the performed operation; its length depends on the number of bits of the exponent and the mantissa (*e* and *m* parameters).
- *status*, which contains information regarding exceptions that have occurred and that are appropriately flagged.

The FPU supports the following exceptions: inexact result, invalid operation, underflow and overflow. When an underflow is detected, the result is approximated to 0. When an overflow occurs, the result is approximated to infinity. When an invalid operation is detected, the result is set to *NaN*.
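The exception behaviour just described can be modelled in software. The Python sketch below rounds a real value into a format with an $e$-bit exponent and an $m$-bit mantissa and raises the corresponding flags:

- overflow returns infinity;
- underflow returns zero;
- a NaN input, standing for an invalid operation such as $\infty - \infty$, returns NaN.

It is a hypothetical reference model, not the authors' design. Truncation (round toward zero) is an assumed rounding mode, and the flag names and the NaN bit pattern are our choices.

```python
import math


def encode(x, e, m):
    """Round x toward zero into a (1 sign, e exponent, m mantissa) format.

    Returns (bits, flags); flags is a subset of
    {'invalid', 'overflow', 'underflow', 'inexact'}.
    Overflow -> infinity, underflow -> zero (flush to zero), invalid -> NaN."""
    exp_ones = (1 << e) - 1
    bias = (1 << (e - 1)) - 1
    if math.isnan(x):
        return (exp_ones << m) | (1 << (m - 1)), {"invalid"}  # quiet NaN pattern
    sign = (1 << (e + m)) if math.copysign(1.0, x) < 0 else 0
    a = abs(x)
    if math.isinf(a):
        return sign | (exp_ones << m), set()
    if a == 0.0:
        return sign, set()
    frac, exp2 = math.frexp(a)   # a = frac * 2**exp2 with 0.5 <= frac < 1
    biased = exp2 - 1 + bias     # exponent of the 1.xxx form, plus bias
    if biased >= exp_ones:
        return sign | (exp_ones << m), {"overflow", "inexact"}
    if biased <= 0:
        return sign, {"underflow", "inexact"}
    scaled = (2.0 * frac - 1.0) * (1 << m)  # fractional part, in units of the last mantissa bit
    man_field = int(scaled)                 # truncation drops the extra bits
    flags = {"inexact"} if man_field != scaled else set()
    return sign | (biased << m) | man_field, flags


print(hex(encode(65504.0, 5, 10)[0]))  # 0x7bff, largest finite half-precision value
```

The model can be fed the double-precision product of two operands already representable in the target format. This reproduces a reduced-precision multiplication exactly when $m \le 25$ and $e \le 9$. Under these limits, the product of two $(m+1)$-bit significands fits in the 53-bit significand of a double, and its exponent stays within the double range.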



Fig. 1 Floating point unit core architecture



## 4 Experimental Results

### 4.1 Testing

The testing phase, performed immediately after the FPU design, was an important step to make sure that the computational unit operates according to its design specifications and produces correct results. A testbench has been built to verify the behavior of the designed core for a variety of inputs. It applies input test vectors, automatically generated by a C program, to the FPU. It then compares the results produced by the computational core with those produced by the C program (using hardware FP). All validation was performed by simulating the FPU at behavioral level with *Modelsim SE 10.1c* [10].


Table 1 List of analyzed formats

| #bit sign | #bit exponent | #bit mantissa | Total bits                 |
|-----------|---------------|---------------|----------------------------|
| 1         | 5             | 10            | 16 (IEEE half precision)   |
| 1         | 6             | 11            | 18                         |
| 1         | 5             | 12            | 18                         |
| 1         | 6             | 13            | 20                         |
| 1         | 5             | 14            | 20                         |
| 1         | 7             | 16            | 24                         |
| 1         | 6             | 17            | 24                         |
| 1         | 8             | 23            | 32 (IEEE single precision) |
| 1         | 10            | 37            | 48                         |
| 1         | 9             | 38            | 48                         |
| 1         | 11            | 52            | 64 (IEEE double precision) |

For an FP number of predefined length, the partition of bits between the mantissa and exponent fields affects the represented numbers, enforcing a trade-off between dynamic range and precision. The number of bits in the exponent field determines the range of numbers that can be represented, while the number of bits in the mantissa field determines their precision. We focused our analysis on reduced precision formats from a minimum of 16 bits up to 64 bits. This choice reflects the consideration that the IEEE single precision format is often unnecessary for embedded applications, while IEEE half precision can be affected by underflow/overflow problems. Table 1 lists the standard and non-standard formats analyzed in this section.
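The range/precision trade-off can be quantified directly from the field widths. The Python sketch below is a hypothetical helper that assumes the IEEE-like encoding of Eq. 1, with bias $2^{e-1}-1$ and the all-ones exponent reserved for infinity and NaN. It prints the largest finite value, the smallest normal value and the machine epsilon $2^{-m}$ for the formats of Table 1 up to 32 bits.

```python
def format_stats(e, m):
    """Return (largest finite value, smallest normal value, epsilon) of a format
    with 1 sign bit, an e-bit exponent and an m-bit mantissa (IEEE-like encoding)."""
    bias = (1 << (e - 1)) - 1
    max_finite = (2.0 - 2.0 ** -m) * 2.0 ** bias  # exponent field 2**e - 2, all-ones mantissa
    min_normal = 2.0 ** (1 - bias)                # exponent field 1, zero mantissa
    return max_finite, min_normal, 2.0 ** -m


# (e, m) pairs of Table 1 up to 32 bits
FORMATS = [(5, 10), (6, 11), (5, 12), (6, 13), (5, 14), (7, 16), (6, 17), (8, 23)]
for e, m in FORMATS:
    hi, lo, eps = format_stats(e, m)
    print(f"m{m}_e{e} ({1 + e + m} bits): max {hi:.4g}, min normal {lo:.4g}, eps 2^-{m}")
```

Consider the two 18-bit formats. m12_e5 tops out at 65,528 with $\epsilon = 2^{-12}$, while m11_e6 reaches about $4.29 \times 10^{9}$ with $\epsilon = 2^{-11}$. One extra exponent bit thus buys a much wider range at the cost of one bit of precision. Since subnormals are flushed to zero in this FPU, the smallest normal value is also the smallest nonzero magnitude.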

### 4.2 Synthesis and Implementation

The next step was the synthesis and implementation of the FPU core on a Kintex-7 FPGA with the *Xilinx Vivado Design Suite* [11]. Since the FPU is fully combinatorial, it has no input clock signal. A *virtual clock*, not connected to any design object, was therefore used as the timing constraint for the project. To obtain data that reflect the effective gate area occupation, we ran synthesis with the FPGA DSP blocks disabled, so the synthesizer could use only the remaining logic blocks. This was required to compare the area occupied by different FP formats, which would otherwise have been implemented with DSP blocks of fixed word length. Table 2 shows the number of LUTs and slices used by the unit, taken from the *Utilization report*. The report was produced after implementing the FPU core at a 40 MHz clock (25 ns is the minimum period obtained for the single precision FP format). It can be seen that the reduced precision formats significantly limit hardware resources.

Table 2 Resources @ 40 MHz clock with DSP disabled

| Precision | #slice | #LUT | Slices rel. to single precision [%] |
|-----------|--------|------|-------------------------------------|
| Half      | 191    | 624  | 37.9                                |
| m11_e6    | 198    | 664  | 39.2                                |
| m12_e5    | 209    | 730  | 41.5                                |
| m13_e6    | 238    | 796  | 47.2                                |
| m14_e5    | 243    | 807  | 48.2                                |
| m16_e7    | 299    | 1037 | 59.3                                |
| m17_e6    | 338    | 1114 | 67.0                                |
| Single    | 504    | 1787 | 100                                 |

Table 3 Propagation delay for reduced precision formats

| Precision | Propagation delay [ns] |
|-----------|------------------------|
| Half      | 20                     |
| m11_e6    | 20                     |
| m12_e5    | 20                     |
| m13_e6    | 20                     |
| m14_e5    | 20                     |
| m16_e7    | 21                     |
| m17_e6    | 21                     |
| Single    | 25                     |
| m37_e10   | 25                     |
| m38_e9    | 25                     |
| Double    | 29                     |

Table 3 shows the propagation delays, extracted from the *Timing Summary Report* produced after each implementation. For half precision and for formats between 18 and 20 bits, the propagation delay remains constant at 20 ns. This allows a 25% higher operating frequency than single precision (50 MHz vs. 40 MHz). Single precision has a propagation delay of 25 ns, which limits the operating frequency to 40 MHz. Finally, double precision increases the propagation delay by about 45% with respect to the best case (29 ns vs. 20 ns).
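The percentages quoted in this section and in the conclusions can be recomputed from Tables 2 and 3. The hypothetical Python helper below reproduces the savings of about 38% (m17_e6) and 63% (m11_e6) from the LUT counts. This suggests those figures refer to LUTs rather than slices. The frequency figures assume that the maximum clock frequency of the purely combinational path is the reciprocal of its propagation delay.

```python
# LUT counts from Table 2 and propagation delays from Table 3
LUTS = {"Half": 624, "m11_e6": 664, "m12_e5": 730, "m13_e6": 796,
        "m14_e5": 807, "m16_e7": 1037, "m17_e6": 1114, "Single": 1787}
DELAY_NS = {"Half": 20, "Single": 25, "Double": 29}


def lut_saving_pct(fmt, ref="Single"):
    """Percentage of LUTs saved by `fmt` with respect to the reference format."""
    return 100.0 * (1.0 - LUTS[fmt] / LUTS[ref])


def fmax_mhz(delay_ns):
    """Clock frequency whose period equals the combinational propagation delay."""
    return 1000.0 / delay_ns


for fmt in ("m11_e6", "m17_e6"):
    print(f"{fmt}: {lut_saving_pct(fmt):.1f}% fewer LUTs than single precision")
print(f"half vs. single: {fmax_mhz(DELAY_NS['Half']) / fmax_mhz(DELAY_NS['Single']) - 1:.0%} higher frequency")
print(f"double vs. half: {DELAY_NS['Double'] / DELAY_NS['Half'] - 1:.0%} longer delay")
```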

## 5 Conclusions

This paper presented the design of an FPU that can be synthesized with arbitrary precision formats. We showed that an FPU with reduced precision is a good solution for low-power and low-cost microprocessor systems. For the analyzed formats, the savings in LUT occupation with respect to single precision range from about 38% for the m17\_e6 format up to about 63% for the m11\_e6 format. Reducing precision also considerably decreases the propagation delay. The reduced-precision implementations have a propagation delay of about 20 ns, a gain of about 25% and 45% with respect to the single and double precision formats, respectively.

In future work, synthesizing the FPU on ASIC will allow more accurate estimates of area occupation and speed gain. It will also enable an estimate and comparison of power consumption, which proved unreliable with an FPGA as target.

## References

1. Tagliavini, G., Mach, S., Rossi, D., Marongiu, A., Benini, L.: A transprecision floating-point platform for ultra-low power computing. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), vol. 2018, pp. 1051–1056. IEEE (2018)
2. Han, J., Orshansky, M.: Approximate computing: an emerging paradigm for energy-efficient design. In: 18th IEEE European Test Symposium (ETS), vol. 2013, pp. 1–6. IEEE (2013)
3. Stazi, G., Menichelli, F., Mastrandrea, A., Olivieri, M.: Introducing approximate memory support in Linux kernel. In: 2017 13th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME), pp. 97–100. IEEE (2017)
4. Menichelli, F., Stazi, G., Mastrandrea, A., Olivieri, M.: An emulator for approximate memory platforms based on QEMU. In: International Conference on Applications in Electronics Pervading Industry, Environment and Society, pp. 153–159. Springer (2016)
5. Stazi, G., Adani, L., Mastrandrea, A., Olivieri, M., Menichelli, F.: Impact of approximate memory data allocation on a H.264 software video encoder. In: International Workshop on Approximate and Transprecision Computing on Emerging Technologies (ATCET). Springer (2018)
6. Malossi, A.C.I., Schaffner, M., Molnos, A., Gammaitoni, L., Tagliavini, G., Emerson, A., Tomás, A., Nikolopoulos, D.S., Flamand, E., Wehn, N.: The transprecision computing paradigm: concept, design, and applications. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), vol. 2018, pp. 1105–1110. IEEE (2018)
7. Bailey, D.H., Hida, Y., Li, X.S., Thompson, B.: ARPREC: an arbitrary precision computation package (2002)
8. Fousse, L., Hanrot, G., Lefèvre, V., Pélissier, P., Zimmermann, P.: MPFR: a multiple-precision binary floating-point library with correct rounding. ACM Trans. Math. Softw. (TOMS) 33(2), 13 (2007)
9. Kahan, W.: IEEE standard 754 for binary floating-point arithmetic. Lecture Notes on the Status of IEEE, vol. 754, no. 94720–1776, p. 11 (1996)
10. Mentor Graphics: ModelSim - advanced simulation and debugging (2012)
11. Feist, T.: Vivado Design Suite. White Paper, vol. 5 (2012)

## Radiation Hardness by Design Techniques for 1 Grad TID Rad-Hard Systems in 65 nm Standard CMOS Technologies



Gabriele Ciarpi, Sergio Saponara, Guido Magazzù and Fabrizio Palla

**Abstract** The paper shows the radiation effects on 65 nm standard CMOS technology and the RHBD (Radiation Hardening By Design) techniques developed to reduce MOSFET performance degradation. It focuses on techniques to address an extremely high Total Ionizing Dose (TID) of up to 1 Grad, the level required for the planned upgrade of CERN's LHC (HL-LHC). To date, only a few measurements of single MOSFETs at 1 Grad have been presented in the literature. In this paper these data are collected and transistor models are developed to present the first system simulation results under 1 Grad conditions. As a case study, the paper presents the performance degradation of two full-custom D flip-flops, highlighting the radiation robustness of CML (Current Mode Logic) for high-speed applications (10 Gbps).

## 1 Introduction

As electronic systems pervade harsh environments, new challenges have to be faced to increase their reliability under extreme temperature, electromagnetic field and radiation conditions. Regarding radiation, three main approaches have generally been considered to develop radiation-hard components: process enhancement (Radiation Hardening By Process, RHBP [1]), design enhancement (Radiation Hardening By Design, RHBD [2]) and shielded packages (Radiation Hardening By Shield, RHBS [3]). RHBP requires a very expensive dedicated process, or a specific variation of a standard process that lacks the maturity of the standard one. In general, RHBP requires an effort that only large silicon foundries can sustain, whereas the rad-hard market is only a niche. RHBS usually needs expensive and cumbersome shields that cannot be used in applications where weight and volume are critical. Therefore,

G. Ciarpi (🖂) · S. Saponara

Dip. Ingegneria dell'Informazione, University of Pisa, Pisa, Italy e-mail: gabriele.ciarpi@ing.unipi.it

G. Ciarpi · G. Magazzù · F. Palla INFN, Sezione di Pisa, Pisa, Italy

<sup>©</sup> Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), Applications in Electronics

*Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_31

RHBD is preferable whenever possible. Radiation effects on ICs can be divided into cumulative and local events: Total Ionizing Dose (TID) and Single-Event Effects (SEE), respectively. Since space applications are more affected by SEE than by TID, solutions that deal with the impact of energetic heavy ions or protons are more widespread in the literature [4]. Accordingly, techniques such as triple modular redundancy and error correction codes are pervasive in radiation environments [5].

Since space is the main market for rad-hard devices and involves TID levels lower than 1 Mrad [6], the effort devoted to reducing TID effects is significantly lower than that devoted to SEE. In applications with high radiation levels, such as medical physics or high-energy physics experiments, the total dose is a key factor for reliable results. In the current set-up of CERN's LHC, the electronic systems are qualified for a few hundred Mrad, but the planned upgrade to HL-LHC requires system reliability up to 1 Grad [7]. In addition, the large amount of data extracted from the HL-LHC pixel sensors requires high-speed, high-reliability communication links up to 10 Gbps [8, 9]. To the best of the authors' knowledge, no system in the literature currently works at 1 Grad TID, and only a few measurements of single transistors at such high TID exist. The 65 nm CMOS technology appears to be the most promising technology for these TID levels; therefore, in this paper, the measured data of the 65 nm technology at 1 Grad are collected and transistor models are derived to evaluate the system performance degradation at these TID levels. To this end, Sect. 2 shows the main TID effects on the 65 nm technology, and Sect. 3 discusses the main RHBD techniques used to counter mosfet performance degradation. Section 4 presents two examples of performance degradation in full-custom D flip-flops, which give an idea of system-level TID effects. Conclusions are drawn in Sect. 5.

## 2 TID Effects in 65 nm CMOS Technology

In MOS technology, when a device is irradiated, electron-hole (e-h) pairs are generated in the dielectric and separated by the electric field. Electrons, which have a higher mobility than holes, escape from the dielectric towards the gate under positive bias. The holes, while traveling towards the dielectric-silicon interface, can be captured by traps. This generates localized positive-charge defects near the interface, which influence the mosfet characteristics. In addition, hole transport can release H<sup>+</sup> ions, which are incorporated in the oxide during dry deposition. The charges in the oxide are always positive; therefore, they lower the threshold voltage of n-mosfets and increase the absolute threshold voltage of p-mosfets. The sign of the charge trapped in the interface traps depends on the mosfet type: it is negative for n-mosfets and positive for p-mosfets. Radiation-induced hole transport strongly depends on the gate-oxide thickness, as shown in [10]; in particular, a $t_{ox}^2$ dependence of the flatband voltage shift is observed. The drastic reduction of these effects for thin oxides makes the 65 nm CMOS technology, with a gate-oxide thickness of only 2 nm, intrinsically very radiation hard.
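The $t_{ox}^2$ dependence reported in [10] can be turned into a quick relative estimate. The sketch below assumes that the flatband shift scales exactly as $\Delta V_{fb} \propto t_{ox}^2$ and compares two thicknesses; the function name and the 10 nm reference thickness are illustrative assumptions, not values from the paper.

```python
def relative_flatband_shift(t_ox_nm: float, t_ref_nm: float) -> float:
    """Ratio dV_fb(t_ox) / dV_fb(t_ref), assuming dV_fb is proportional to t_ox**2."""
    if t_ox_nm <= 0 or t_ref_nm <= 0:
        raise ValueError("oxide thicknesses must be positive")
    return (t_ox_nm / t_ref_nm) ** 2


if __name__ == "__main__":
    # 2 nm gate oxide (from the text) vs a hypothetical 10 nm reference oxide
    ratio = relative_flatband_shift(2.0, 10.0)
    print(f"the 2 nm oxide shift is {ratio:.2f}x that of a 10 nm oxide")
```

Under this scaling, a five times thinner oxide gives a 25 times smaller shift, which is why the thin gate oxide is expected to be robust while thicker oxides elsewhere in the device dominate the degradation.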



Despite these expectations, Fig. 1 shows the radiation effects on 'n' and 'p' core mosfets of the 65 nm technology. In particular, it highlights the strong effect on the drain current of mosfets connected in diode configuration. Such a large degradation is due to the other oxides in the mosfet process, such as the shallow trench isolation (STI) and the spacers, which are thicker and of lower quality than the gate oxide.

## 3 Radiation Hardness Techniques

In this section, the main techniques used to counter the performance degradation of the mosfets are analyzed.

### 3.1 Enclosed Layout

One of the main causes of such a large performance degradation is the STI. When the STI dielectric is struck by ions or radiation, e-h pairs are generated and the same effects discussed in Sect. 2 for the gate oxide occur. The positive charge trapped near the channel creates parasitic mosfets, which are mainly responsible for the leakage current in n-mosfets. Furthermore, charge trapped in newly created interface defects generates an electric field that leads to narrow-channel effects [11]. A solution to this problem is to avoid any edge between the gate and the STI: at layout level, the mosfets are designed with a closed, ring-shaped gate, as in Fig. 2, known as Enclosed Layout Transistor or Edge Less Transistor (ELT) [12]. The mosfet drain usually has a higher resistance than the source, which makes it a critical node for SEE. Since the inner diffusion of the ELT has a smaller area than the outer one, the inner diffusion should be used as the drain to reduce SEE sensitivity. Figure 3 shows the performance improvement



Fig. 2 Classical (left) and ELT (right) mosfet layout structures. In the classical layout, the leakage paths due to radiation-induced charge in the STI are highlighted



Fig. 3 Degradation of ELT-layout mosfets in diode configuration. The data up to 400 Mrad (SiO<sub>2</sub>) are extracted from [13]. The data for TID higher than 400 Mrad (red-highlighted data) are obtained by linear extrapolation

using the ELT layout for n and p mosfets: the radiation robustness increases by 15% for n-mosfets and by 23% for p-mosfets [13]. On the other hand, the ELT requires more layout area (nearly 2% more for each mosfet) and imposes a minimum mosfet width of 1.3 $\mu$m to meet the design rules. In addition, these mosfets need ad hoc models and tools for electrical simulation and for recognizing the enclosed gate shapes in the LVS (Layout vs. Schematic) check.
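The Fig. 3 caption states that the points above 400 Mrad are derived linearly from the measured data; since they lie beyond the last measurement, this is in effect a linear extrapolation. A minimal sketch of that step is shown below, assuming a least-squares line through the last few measured points and a clamp at zero; which points the authors actually fitted is not stated, and the data below are hypothetical, not taken from [13].

```python
from typing import Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> tuple[float, float]:
    """Least-squares straight line y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def extend_linearly(dose_mrad, current_norm, new_doses, n_last=2):
    """Extend a degradation curve beyond the last measured dose.

    Fits a line to the last `n_last` measured points (an assumption) and clamps
    the result at zero, since a normalized drain current cannot be negative.
    """
    slope, intercept = fit_line(dose_mrad[-n_last:], current_norm[-n_last:])
    return [max(0.0, slope * d + intercept) for d in new_doses]


if __name__ == "__main__":
    # Hypothetical normalized drain-current data up to 400 Mrad
    doses = [100.0, 200.0, 300.0, 400.0]
    i_norm = [0.95, 0.90, 0.85, 0.80]
    print(extend_linearly(doses, i_norm, [600.0, 800.0, 1000.0]))
```

The choice of `n_last` matters: fitting only the last points follows the most recent trend, while fitting all points averages out early non-linear behavior, so extrapolated values up to 1 Grad should be read as estimates.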

### 3.2 Minimum Length

Figure 3 shows that the performance improves as the transistor length increases. This is related to the effects of the LDD (Lightly Doped Drain) spacer oxide. The spacers lie next to the gate oxide and are thicker, so the radiation effects on these structures influence the whole mosfet performance. Two main effects are due to the LDD spacers: the first is an increase of the mosfet series resistance due to charge trapped in the spacer oxide or at its interface; the second is due to H<sup>+</sup> ions, which can move near the channel and cause a Vth shift [13]. Long transistors reduce the contribution of these effects to the mosfet performance, since a smaller fraction of the channel lies near the LDDs. Obviously, long mosfets have a lower transconductance or, if the W/L ratio is kept constant, larger capacitances. Therefore, long mosfets are a solution only for low-speed applications.
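The benefit of longer devices can be illustrated with a deliberately simple 1-D geometric picture. The sketch assumes that each end of the channel is affected by the spacer over a fixed distance `edge_nm`, independent of the channel length; the 20 nm value is hypothetical, and 120 nm is the length chosen later for the CMOS DFF.

```python
def edge_fraction(length_nm: float, edge_nm: float) -> float:
    """Fraction of the channel length lying within `edge_nm` of either LDD spacer.

    Simplified model (an assumption): both channel ends are affected over the
    same fixed distance, so the affected fraction is 2 * edge / L, capped at 1.
    """
    if length_nm <= 0 or edge_nm < 0:
        raise ValueError("length must be positive and edge width non-negative")
    return min(1.0, 2.0 * edge_nm / length_nm)


if __name__ == "__main__":
    EDGE_NM = 20.0  # hypothetical affected width per side
    for L in (60.0, 120.0, 240.0):
        print(f"L = {L:5.0f} nm -> {edge_fraction(L, EDGE_NM):.0%} of the channel near the LDDs")
```

In this picture, doubling the length halves the affected fraction, which is the trend visible in Fig. 3, at the cost of the transconductance or capacitance penalty mentioned above.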

## 4 D Flip-Flops for High-Speed High-TID Systems

In high-speed, low-voltage systems, the lower the mosfet |Vth|, the shorter the signal rise/fall times. The radiation effects shown in Sects. 2 and 3 shift the Vth of both 'n' and 'p' mosfets. For n-mosfets, the effects of oxide-trapped charge and interface defects partially compensate each other, reducing the net Vth shift. For p-mosfets, instead, both collected charges are positive, so there is no compensation. Therefore, the p-mosfet Vth shift makes these devices unsuitable for high-data-rate, high-dose applications. To evaluate the effects of the mosfet performance degradation on an integrated system, the transistor behavior under high dose is modeled using data extracted from the literature [11–15]. The 'n' and 'p' mosfet models are then used with the Spectre simulator to extract the performance of two basic D flip-flops in two different logic styles (CMOS and CML). The DFF is chosen as a representative block of high-speed digital systems.
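The compensation argument and its impact on speed can be sketched numerically. The first function encodes the sign rules of Sect. 2 for the |Vth| change; the second uses the Sakurai-Newton alpha-power law, $t_d \propto V_{DD}/(V_{DD}-|V_{th}|)^{\alpha}$, which is a textbook delay model swapped in here for illustration and not the Spectre-based method of the paper. All voltages and the exponent are hypothetical.

```python
def abs_vth_shift(dv_ot: float, dv_it: float, kind: str) -> float:
    """Change of |Vth| (V) from oxide-trapped (ot) and interface-trap (it) charge.

    dv_ot and dv_it are magnitudes in volts (hypothetical inputs).
    n-mosfet: positive oxide charge lowers |Vth| and negative interface charge
    raises it, so the two contributions partially cancel.
    p-mosfet: both charges are positive and both raise |Vth|, so they add up.
    """
    if kind == "n":
        return -dv_ot + dv_it
    if kind == "p":
        return dv_ot + dv_it
    raise ValueError("kind must be 'n' or 'p'")


def relative_delay(vdd: float, vth0: float, dvth: float, alpha: float = 1.3) -> float:
    """Delay after/before a |Vth| shift, alpha-power law t_d ~ VDD / (VDD - |Vth|)**alpha."""
    overdrive_before = vdd - vth0
    overdrive_after = vdd - (vth0 + dvth)
    if overdrive_after <= 0:
        return float("inf")  # the device no longer turns on at this supply
    return (overdrive_before / overdrive_after) ** alpha


if __name__ == "__main__":
    VDD, VTH0 = 1.2, 0.35      # hypothetical supply and fresh |Vth| (V)
    DV_OT, DV_IT = 0.05, 0.04  # hypothetical shift magnitudes (V)
    for kind in ("n", "p"):
        dv = abs_vth_shift(DV_OT, DV_IT, kind)
        print(kind, f"d|Vth| = {dv:+.3f} V", f"delay x {relative_delay(VDD, VTH0, dv):.3f}")
```

With these example values the n-mosfet net shift nearly vanishes while the p-mosfet shift accumulates, slowing the pull-up transitions; this mirrors, qualitatively, the behavior reported for the CMOS DFF.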

### 4.1 CMOS D Flip-Flop

The CMOS DFF chosen for the system evaluation is the classical 16-transistor static D flip-flop shown in Fig. 4. The mosfets are sized for high-speed switching up to 10 Gbps and for high TID; to reduce the radiation effects, a mosfet length of 120 nm is chosen.

After 500 Mrad TID, the CMOS DFF shows a significant reduction of the maximum data rate, which drops to 3.5 Gbps. This is due to the p-mosfet degradation, which prevents fast switching of the inverters. As shown in Fig. 3, the p-mosfet degradation is particularly heavy at 1 Grad TID, reducing the maximum data rate to only 2.5 Gbps. Fig. 5 shows the eye diagram of the output signal at 1 Grad TID.
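To put these data rates in perspective, the bit period (unit interval, UI) available to the flip-flop can be computed directly; the rates below are the ones reported in the text, and the helper name is illustrative.

```python
def unit_interval_ps(data_rate_gbps: float) -> float:
    """Bit period (unit interval) in picoseconds for a data rate in Gbps."""
    if data_rate_gbps <= 0:
        raise ValueError("data rate must be positive")
    return 1e3 / data_rate_gbps


if __name__ == "__main__":
    # Target rate and the maximum rates reported for the CMOS DFF
    cases = (("target", 10.0), ("CMOS DFF @ 500 Mrad", 3.5), ("CMOS DFF @ 1 Grad", 2.5))
    for label, rate in cases:
        print(f"{label:>20}: {rate:4.1f} Gbps -> UI = {unit_interval_ps(rate):6.1f} ps")
```

At 10 Gbps the flip-flop has 100 ps per bit, whereas at 1 Grad the CMOS DFF needs a bit period of about 400 ps, roughly four times longer.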



Fig. 4 Schematic of the 16-transistor CMOS D flip-flop



### 4.2 CML D Flip-Flop

CML (Current Mode Logic) uses only n-mosfets and resistors, avoiding p-mosfets. Fig. 6 shows the schematic of a CML D flip-flop, which uses two latches, each with the classic pair of differential pairs: one to track the input signal and the other for level regeneration and storage. CML uses differential signals to reject common-mode disturbances, and a voltage swing of 400 mV to allow high switching speed. Under high TID, the CML DFF behaves differently from the CMOS DFF: while the CMOS DFF either works or fails at a given TID level, the CML DFF shows a gradual performance degradation but keeps working. The simulations with the irradiated models show that the CML DFF still works at 10 Gbps at 1 Grad TID, with only a 50 mV reduction of the eye opening, as shown in Fig. 7.

Despite its higher performance, the CML DFF has a significantly higher static power consumption than the CMOS DFF, which makes this architecture worthwhile only for high-speed signals.
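The static-power penalty follows from how a CML gate sets its swing: the tail current flows continuously regardless of switching activity, and in a fully switched differential pair the single-ended swing is $I_{tail} R_{load}$. The sketch below assumes the 400 mV swing is single-ended, a hypothetical 400 Ω load and 1.2 V supply, and one tail current source per latch; none of these sizing values come from the paper.

```python
def cml_tail_current(swing_v: float, r_load_ohm: float) -> float:
    """Tail current (A) for a single-ended CML swing: swing = I_tail * R_load."""
    return swing_v / r_load_ohm


def cml_static_power(vdd_v: float, i_tail_a: float, n_tails: int) -> float:
    """Static power (W): each tail source draws I_tail from VDD continuously."""
    return vdd_v * i_tail_a * n_tails


if __name__ == "__main__":
    SWING = 0.4     # 400 mV swing (from the text, assumed single-ended)
    R_LOAD = 400.0  # hypothetical load resistor (ohm)
    VDD = 1.2       # hypothetical supply (V)
    i_tail = cml_tail_current(SWING, R_LOAD)
    # Master-slave DFF: two latches, one tail current source each (assumed topology)
    p = cml_static_power(VDD, i_tail, n_tails=2)
    print(f"I_tail = {i_tail * 1e3:.2f} mA, static power = {p * 1e3:.2f} mW")
```

A static CMOS DFF, by contrast, draws mostly dynamic power proportional to its switching activity, which is why CML pays off only where its speed is actually needed.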



Fig. 6 Schematic of the CML D flip-flop



## 5 Conclusions

This paper presents the main effects that high radiation levels have on standard 65 nm CMOS technology and the low-cost RHBD techniques used to reduce the resulting degradation. The first simulation results of the 65 nm CMOS technology exposed to 1 Grad TID are shown, thanks to the development of irradiated 'n' and 'p' mosfet models for schematic-level simulations. Two full-custom D flip-flops are taken as examples of performance degradation at very high TID levels. The CMOS DFF shows that standard CMOS cells, such as those used in automatic digital synthesis, are unsuitable for high-speed signals at such high TID. The CML architecture shows a much higher robustness to radiation than CMOS, at the cost of a higher power consumption. Using the developed irradiated transistor models, the performance of more complex systems can be evaluated at design level before actual exposure to radiation, saving money and time. This enables the development of new RHBD techniques for these extremely high TID levels.

## References

- 1. Hadda, N.F., et al.: Incremental enhancement to SEU hardened 90 nm CMOS memory cell. In: Proceedings of RADECS 2010 (2010)
- 2. Clark, L., et al.: Optimizing radiation hard by design SRAM cells. IEEE Trans. Nucl. Sci. 54(6), 2028–2036 (2007)
- 3. Lv, H., Sun, Y., et al.: Research on optimization design of radiation dose shield hardening for aerospace components. In: IEEE PHM-Harbin (2017)
- 4. Garcia, R., Brugger, M., et al.: Simplified SEE sensitivity screening for COTS components in space. IEEE Trans. Nucl. Sci. 64(2) (2017)
- 5. Sielewicz, K.M., Rinella, G.A., et al.: Experimental methods and results for the evaluation of triple modular redundancy SEU mitigation techniques with Xilinx Kintex-7 FPGA. In: IEEE Radiation Effects Data Workshop (REDW) (2017)
- 6. Sinclair, D., Dyer, J.: Radiation effects and COTS parts in SmallSats. In: SSC13, AIAA/Utah
- 7. Butler, J., et al.: Technical proposal for the phase-II upgrade of the CMS detector. CERN-LHCC-2015-010
- 8. Magazzù, G., Ciarpi, G., et al.: Design of a radiation-tolerant high-speed driver for Mach Zehnder modulators in high energy physics. In: IEEE ISCAS (2018)
- 9. Paternò, A., Pacher, L., et al.: New development on digital architecture for efficient pixel readout ASIC at extreme hit rate for HEP detectors at HL-LHC. In: IEEE NSS/MIC/RTSD (2016)
- 10. Saks, N.S., Ancona, M.G., et al.: Radiation effects in MOS capacitors with very thin oxides at 80 °K. IEEE Trans. Nucl. Sci. 31(6) (1984)
- 11. Faccio, F., Michelis, S., et al.: Radiation-induced short channel (RISCE) and narrow channel (RINCE) effects in 65 and 130 nm Mosfets. IEEE Trans. Nucl. Sci. 62(6) (2015)
- 12. Anelli, G., Campbell, M., et al.: Radiation tolerant VLSI circuits in standard deep submicron CMOS technologies for the LHC experiments: practical design aspects. IEEE Trans. Nucl. Sci. 46(6) (1999)
- 13. Faccio, F., Borghello, G., et al.: Influence of LDD spacers and H<sup>+</sup> transport on the total-ionizing-dose response of 65 nm Mosfets irradiated to ultra-high doses. IEEE Trans. Nucl. Sci. 65(1) (2018)
- 14. Menouni, M., Barbero, M., et al.: 1-Grad total dose evaluation of 65 nm CMOS technology for the HL-LHC upgrades. J. Instrum. 10 (2015)
- 15. Ding, L., Gerardin, S., et al.: Drain current collapse in 65 nm PMOS transistors after exposure to Grad dose. IEEE Trans. Nucl. Sci. 62(6) (2015)

# Part IX Poster Session

## Context-Aware Environments in Passenger Train Transportation Systems: Ideas, Feasibility and Risks



Francisco Falcone, Costas Patsakis and Agusti Solanas

Abstract In this article, an interactive context-aware system for passenger train transportation is outlined, following a holistic approach. User-system interaction is provided by means of wireless sensor/actuator networks deployed in several locations of the train infrastructure, coupled with additional information, such as user mobile phone data. Several services, such as user assistance (location/guidance), location-based marketing and interactive agendas, are suggested. In addition to proposing this conceptual idea, we discuss Quality of Service metrics, in terms of wireless coverage/capacity channel estimation, as well as potential security issues. This article does not describe the solution in detail; instead, its goal is to lay the groundwork for developing the idea, discuss its feasibility in communication terms, and point out the security and privacy risks that might arise in this kind of context.

## 1 Introduction

The ageing of the population, coupled with a steady urbanization process, poses important challenges for governments and policy makers. One of the main challenges that our society has to face is to achieve sustainable living environments while increasing quality-of-life indicators, since a relation between environmental factors and diseases has been shown [1]. The trend in population settlements indicates that within the next generation, approximately 70% of the world's population

F. Falcone (🖂)

Universidad Pública de Navarra, Pamplona, Navarre, Spain e-mail: francisco.falcone@unavarra.es

C. Patsakis University of Piraeus, Piraeus, Greece e-mail: kpatsak@gmail.com

A. Solanas Universitat Rovira i Virgili, Tarragona, Catalonia, Spain e-mail: agusti.solanas@urv.cat

<sup>©</sup> Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_32

will be living in urban and suburban areas. As a result, resource management and provisioning become main goals within the Smart Cities and Smart Regions frameworks. To fulfill them, multiple elements are used to provide interactive scenarios as well as data analytics, enabling not only reactive behavior of the Smart City/Smart Region but also inference capabilities that provide a proactive response [2].

Tasks within the Smart City/Region are performed by multiple sub-systems and areas, such as energy (smart grids, micro grids and renewable energy resources), waste management (waste treatment, reduction of greenhouse gases), health care (e-Health, m-Health, Smart Health [3]) or Intelligent Transportation Systems (ITS). Among these systems, this article focuses on ITS and, in particular, on passenger train transportation in the context of Smart Cities and Smart Regions.

ITS are among the most relevant systems regarding sustainability, given the impact of all transportation modes in economic terms (regional development, optimization and reduction of travel time), safety (reduction of accidents) and environmental impact (pollution and noise reduction). The adoption of ITS within the framework of Smart Cities/Smart Regions is motivated by multiple elements, such as vehicle connectivity, platooning capabilities, route optimization, fault diagnostics and autonomous vehicles. In this sense, wireless communication systems play a key role in providing the required levels of ubiquity and mobility, and in helping to put people at the center of transportation systems [4]. Multi-modal transportation is also a key factor in optimizing overall transportation systems, both for passengers and for freight distribution. Alongside the development of ITS as one of the Smart City/Smart Region subsystems, the development of the Internet of Things and the evolution of communication systems towards HetNets and 5G are playing a key role in the transformation of social and business models as a whole. New capabilities of communication systems will enable anywhere, anytime, anything connectivity, with new requirements in terms of bandwidth, delay and data aggregation, to name a few. Among the different transportation types, this article focuses on passenger train transportation systems, exploring new capabilities derived from the use of ICT to build a context-aware environment.

## 2 ITS-Train System Overview

Train transportation systems are one of the key elements contributing to territorial cohesion, as well as to improving overall transportation quality. In this sense, rail transportation systems are undergoing a sustained evolution towards more efficient and safer services with an enhanced user experience. In the case of Europe [5], a railway package has been proposed, in which multiple aspects, such as governance, domestic market access or technical aspects (safety and interoperability), are addressed and are currently under study. Specific initiatives, such as Shift2Rail or the adoption of the European Rail Traffic Management System (ERTMS), point towards ICT-supported railway systems.

In the context of IoT/Smart Cities/Smart Regions, connectivity and interactive capabilities, both among users and with the railway infrastructure, are essential. This can be achieved by an architecture that comprises dedicated embedded networks (*e.g.*, wireless sensor networks and on-board WLAN networks) cooperating with other networks (*e.g.*, Public Land Mobile Networks (PLMN), municipal WLAN networks), which can connect to content-specific servers (*e.g.*, local/regional information, transportation connections in multi-modal transport) as well as to auxiliary systems (*e.g.*, emergency calls, infrastructure monitoring, user statistical analysis, location-based marketing schemes). In this work, we propose the conceptual idea of a context-aware urban train system, applicable to passenger train transportation. The system aims to fulfill the following goals:

- Providing users with transportation information (schedules, gates, boarding times, connections, etc.)
- Enabling multi-modal connectivity, by providing end to end travel information, in combination with other transportation means, such as urban/regional buses, taxi services, flight connection or sea vessel connection.
- Location and guidance services, such as passenger assistance
- Infrastructure tele-control and tele-monitoring
- Location based marketing services
- User facility/service analysis.

The implementation of the system is based on the superposition of on board networks (WSN + WLAN), infrastructure networks (WLAN + WSN located on train stations) and an information gathering/dissemination platform, schematically depicted in Fig. 1.

## 3 Coverage/Capacity and Information Security Brief Analysis

To correctly design and evaluate the proposed context-aware passenger train system, several parameters have to be analyzed [6], mainly concerning coverage/capacity [7] and potential information security issues. In the following, we briefly elaborate on these two aspects.

Given that interconnectivity among users and systems is mainly supported by WSN/WBAN/WPAN and WLAN systems, their use must be carefully studied to provide adequate QoS indicators (*i.e.*, required transmission bit rate as a function of receiver sensitivity, overall delay, etc.) as well as adequate security levels. Regarding physical-layer requirements, the proposed systems operate in the 2.4 GHz and 5 GHz bands, enabling the use of smartphones and mobile terminals with WiFi as well as Bluetooth (Classic/BLE). For the embedded transceivers of the wireless sensor networks, multiple alternative solutions can be used, based on other



Fig. 1 Schematic representation of the context-aware passenger transportation system




frequencies in the 400–900 MHz band. The proposed system can also operate in these frequency bands without loss of generality in the provided description.

The system operation has been analyzed by means of wireless channel simulations with an in-house 3D ray launching code, considering a conventional passenger train wagon. An example of the obtained coverage plots is depicted in Fig. 2. These results enable estimating the overall received power levels, to be compared with the receiver sensitivity, which is mainly determined by the required bit rate at the receiver. Examples of conventional WLAN/WBAN receiver sensitivities, compared with the received power levels to derive coverage/capacity estimations, are depicted in Fig. 3. Based on these results, the network deployment can be determined in terms of node configuration, density and placement.
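The comparison between received power and receiver sensitivity can be illustrated with a first-order link budget. The sketch below replaces the paper's 3D ray launching with the free-space Friis formula plus a lumped excess-loss term, so it is far cruder than the in-house simulator; the transmitter power, distance, losses and the sensitivity table (illustrative values in line with commonly quoted IEEE 802.11a/g minimum requirements) are assumptions.

```python
import math

C = 299_792_458.0  # speed of light (m/s)

# Illustrative sensitivity thresholds (dBm) per OFDM data rate (Mbps)
SENSITIVITY_DBM = {6: -82, 9: -81, 12: -79, 18: -77, 24: -74, 36: -70, 48: -66, 54: -65}


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss (dB): 20*log10(4*pi*d*f/c)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C)


def received_power_dbm(ptx_dbm, gtx_dbi, grx_dbi, distance_m, freq_hz, extra_loss_db=0.0):
    """Friis link budget with an optional lumped excess loss (obstacles, bodies, seats)."""
    return ptx_dbm + gtx_dbi + grx_dbi - fspl_db(distance_m, freq_hz) - extra_loss_db


def max_rate_mbps(prx_dbm: float):
    """Highest data rate whose sensitivity threshold is met, or None if none is."""
    ok = [rate for rate, sens in SENSITIVITY_DBM.items() if prx_dbm >= sens]
    return max(ok) if ok else None


if __name__ == "__main__":
    # Hypothetical access point: 20 dBm, 0 dBi antennas, receiver 25 m away at 2.4 GHz
    for extra in (0.0, 20.0, 35.0):
        prx = received_power_dbm(20.0, 0.0, 0.0, 25.0, 2.4e9, extra_loss_db=extra)
        print(f"excess loss {extra:4.1f} dB -> Prx = {prx:6.1f} dBm, max rate = {max_rate_mbps(prx)} Mbps")
```

A ray-launching simulation effectively replaces `fspl_db` and `extra_loss_db` with a position-dependent received power map; the rate-selection step against the sensitivity table stays the same.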

As stated above, in addition to providing proper communication coverage, it is paramount to guarantee the passengers' security, both physical and informational. Next, we briefly discuss the latter. Offering services in trains implies several security and privacy issues, as discussed in [8]. Many efforts have been



Fig. 3 Coverage/Capacity analysis for on board WSN/WLAN networks, operating at 2.4 GHz and 5 GHz

devoted to proposing security and privacy solutions for vehicular communications [9], but train transportation systems, understood as context-aware environments, have not been analyzed so thoroughly. Since trains are a means of public transportation, they are used on a daily basis by millions of people worldwide. Therefore, a successful attack on such services may harm millions simultaneously. In what follows, one needs to consider that context awareness would stem from the provision of additional data that will most likely be broadcast. In this regard, the attacks may try to block the data transmission, to alter the broadcast data, or to intercept and manipulate traffic.

In the first case, jamming, the cost for the adversary can be considered minimal and the attack is easy to perform. Nevertheless, the actual impact of this attack is at the level of annoyance: jamming would result in a denial of service, and since these services are not critical, it is expected to irritate the users but nothing more. On the other hand, broadcast data is in most cases not encrypted, since encryption is considered an unnecessary overhead. In this regard, whether the data is a beacon, a WiFi broadcast message, or an amplified GPS signal, the recipients cannot verify the information source. Therefore, an adversary can easily broadcast information to all nodes in proximity, mainly to lure them into performing other actions. This can also be considered a social engineering attack, as users inherently trust the service provider, which appears to be the train management organization. In fact, several tools like *wifiphisher*<sup>1</sup> have been developed to exploit such scenarios, and special hardware<sup>2</sup> is available as an off-the-shelf solution. Hence, the attacker can easily trick users into, e.g., installing rogue applications, and steal their credentials (as has been shown in other domains such as healthcare [10]). Should the attacker intercept and manipulate traffic, she may extract a lot of sensitive information. Just from interception, the

<sup>1</sup>https://github.com/wifiphisher/wifiphisher.

<sup>2</sup>https://www.wifipineapple.com/.

adversary can collect unique hardware identifiers, such as MAC addresses, that can be used to identify the victims.<sup>3</sup> Tracking these identifiers can potentially reveal a user's residence, workplace and daily schedule, information that is valuable for targeted advertising.

## 4 Concluding Remarks

In this work, a framework for implementing a context-aware passenger train system is outlined. Several parameters have been analyzed, concerning coverage/capacity as well as security/privacy issues. The results aid in the design phases of the proposed solution, which is based on the combination of multiple wireless networks (on-board/infrastructure) and application layers.

The proposed context-aware scenario description is preliminary in nature but sets the ground for further and more detailed analyses. In this sense, the main contribution of the article is to state the need for context-aware train transportation systems to support sustainable smart cities and regions. The preliminary coverage/capacity results presented in this article show that defining a context-aware system in this domain is feasible. Moreover, we emphasize the importance of paying attention to security issues, since the highly interconnected infrastructure provided to passengers opens the door to multiple and varied attacks on their security and privacy. In this sense, this article also aims to raise awareness of the need to consider these issues in the design of context-aware environments such as the one proposed here.

In the near future, we aim to implement the proposed system in practice and to design a set of applications that will be offered to passengers during their daily rides.

Acknowledgements This work is supported by Generalitat de Catalunya projects 2017-SGR-896 and 2017-DI-002, and by URV project 2017PFR-URV-B2-41.

## References

- 1. Riaño, D., Solanas, A.: Exploiting the relation between environmental factors and diseases: a case study on chronic obstructive pulmonary disease. In: Workshop on Knowledge Representation for Health-Care Data, Processes and Guidelines. KR4HC 2014: Knowledge Representation for Health Care, pp. 160–173 (2014)
- 2. Solanas, A., Casino, F., Batista, E., Rallo, R.: Trends and challenges in smart health-care research: a journey from data to wisdom. In: 3rd International Forum on Research and Technologies for Society and Industry (RTSI), pp. 1–6. IEEE (2017)
- 3. Solanas, A., Patsakis, C., Conti, M., Vlachos, I.S., Ramos, V., Falcone, F., et al.: Smart health: a context-aware health paradigm within smart cities. IEEE Commun. Mag. 52(8), 74–81 (2014)
- 4. All Ways Travelling-POC. https://ec.europa.eu/transport/sites/transport/files/docs/2017-awtphase-2.pdf

<sup>3</sup>https://github.com/sensepost/mana/.

- 5. SWD 226 final document: The implementation of the 2011 White Paper on Transport "Roadmap to a Single European Transport Area—towards a competitive and resource-efficient transport system" five years after its publication: achievements and challenges (2016)
- 6. Azpilicueta, L., Astrain, J.J., Lopez-Iturri, P., Granda, F., Vargas-Rosales, C., Villadangos, J., Perallos, A., Bahillo, A., Falcone, F.: Optimization and design of wireless systems for the implementation of context aware scenarios in railway passenger vehicles. IEEE Trans. Intell. Transp. Syst. 99, 1–13 (2017)
- 7. Aguirre, E., Lopez-Iturri, P., Azpilicueta, L., Redondo, A., Astrain, J.J., Villadangos, J., Bahillo, A., Perallos, A., Falcone, F.: Design and implementation of context aware applications with wireless sensor network support in urban train transportation environments. IEEE Sens. J. 17, 169–178 (2017)
- 8. Chen, B., et al.: Security analysis of urban railway systems: the need for a cyber-physical perspective. In: International Conference on Computer Safety, Reliability, and Security. Springer, Cham (2014)
- 9. Zhang, L., Wu, Q., Solanas, A., Domingo-Ferrer, J.: A scalable robust authentication protocol for secure vehicular communications. IEEE Trans. Veh. Technol. 59(4), 1606–1617 (2010)
- 10. Papageorgiou, A., Strigkos, M., Politou, E., Alepis, E., Solanas, A., Patsakis, C.: Security and privacy analysis of mobile health applications: the alarming state of practice. IEEE Access 6, 9390–9403 (2018)

## Automatic Perishable Goods Shelf Life Optimization in No-Refrigerated Warehouses by Using a WSN-Based Architecture



Daniela De Venuto and Giovanni Mezzina

Abstract Aiming to extend the shelf life (SL) of perishable goods with a generalized solution that minimizes the waste due to careless storage, a low-cost, reprogrammable Wireless Sensor Network (WSN)-based architecture for functional warehouse management is proposed here. The architecture continuously monitors environmental parameters (i.e., temperature, light and humidity) and combines them for the spatio-temporal prediction of the product SL. These parameters are processed by a 1st-order kinetic model that takes into account the storage site area, identifying the position that maximizes the SL. The system manages the pallet positions by means of a set of automated trans-pallets. An experimental proof of concept of the proposed architecture is provided, comparing the presented system with a typical static-management system in the storage of vegetables.

## 1 Introduction

The World Health Organization (WHO) and the Food and Agriculture Organization (FAO) identify pre-consumer losses as the main cause of food loss and waste. In particular, they state that the most critical point in the food supply chain is poor storage of fresh food [1]. Among the causes of food degradation in storage, temperature is considered the most relevant factor [2]. Indeed, if perishable goods are stored in warehouses hotter than their temperature limits, rapid microbiological growth takes place [3], leading to rotten goods and consequent economic losses [2].

By monitoring temperature as the first constraint, and then assessing relative humidity and light exposure, it is possible to predict the period of time during which goods will remain safe for use: the Shelf Life (SL) [2].

D. De Venuto · G. Mezzina (⊠)

Department of Electrical and Information Engineering, Politecnico di Bari, 70125 Bari, Italy e-mail: giovanni.mezzina@poliba.it

D. De Venuto e-mail: daniela.devenuto@poliba.it

<sup>©</sup> Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), Applications in Electronics

Pervading Industry, Environment and Society, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_33



Fig. 1 Overview of the implemented WSN-based architecture

The advances in Wireless Sensor Networks (WSNs), in terms of smart-sensor power consumption (enabling continuous monitoring), the number of available sensing nodes and the introduction of low-cost coordinator platforms (e.g., Raspberry Pi, Arduino), have opened new scenarios for SL prediction and improvement [4]. Combining the benefits of WSNs with the computational capability of a low-cost platform (i.e., Raspberry Pi), an innovative, fully automated architecture for SL optimization in non-refrigerated environments is proposed here. The smart WSN architecture merges the monitoring capability of ZigBee nodes equipped with sensors with a logistic algorithm that maximizes the goods lifetime. Pallet displacement is entrusted to automatic trans-pallets driven step by step.

### 2 The Architecture

Figure 1 shows the WSN-based architecture for functional warehouse management. The figure is divided into three main sections: (i) the Environmental Unit, (ii) the Pallets Monitor Unit and (iii) the Pallets Management Unit. In each unit, the Control Unit is shown inside the white box at the top. The Environmental Unit consists of eight multisensor Light/Humidity/Temperature (L/H/T) nodes (green L/H/T in Fig. 1) that acquire data at fixed positions in the storage room. As shown in Fig. 1, the data are sent to the Control Unit via ZigBee. From these data, the Control Unit builds a time-continuous heat map of the storage site.
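The paper does not state how the eight wall-node readings are turned into a heat map. A minimal sketch, assuming inverse-distance weighting (IDW) onto the 5 × 5 grid used in Sect. 2.3, a rectangular room and hypothetical sensor coordinates in metres, could look like this (IDW is our choice for illustration, not necessarily the authors' method):

```python
import math

def idw_heat_map(sensors, m=5, width=5.0, depth=5.0, power=2.0):
    """Interpolate sparse (x, y, temperature) readings onto an m x m grid of cell centres.

    Inverse-distance weighting is an assumed choice; width/depth are hypothetical
    room dimensions in metres.
    """
    grid = []
    for i in range(m):
        y = (i + 0.5) * depth / m          # centre of row i
        row = []
        for j in range(m):
            x = (j + 0.5) * width / m      # centre of column j
            num = den = 0.0
            value = None
            for sx, sy, t in sensors:
                d = math.hypot(x - sx, y - sy)
                if d < 1e-9:               # cell centre coincides with a sensor
                    value = t
                    break
                w = 1.0 / d ** power
                num += w * t
                den += w
            row.append(value if value is not None else num / den)
        grid.append(row)
    return grid

# Eight hypothetical wall-mounted nodes (x, y, °C): corners and mid-walls.
nodes = [(0, 0, 21.0), (2.5, 0, 21.4), (5, 0, 22.0), (5, 2.5, 23.1),
         (5, 5, 23.8), (2.5, 5, 22.6), (0, 5, 21.9), (0, 2.5, 21.2)]
heat = idw_heat_map(nodes)
```

Because IDW produces weighted averages, every interpolated cell stays within the range of the measured values; re-running it at each acquisition yields a time-continuous map. The exponent `power = 2` is a conventional default, not a value from the paper.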

The Pallets Monitor Unit is realized with dedicated L/H/T nodes (orange) placed directly on the surface of the pallets. It communicates with the Control Unit via ZigBee or through a dedicated commercial gateway. These additional sensors allow the system to derive the product SL accurately [4]. The third section of Fig. 1 is dedicated to the Pallets Management Unit, which consists of a set of trans-pallets whose navigation is driven step by step by the Control Unit. The trans-pallets move the pallets to the positions that optimize the SL. The transporters embed dedicated hardware programmed for automatic movements, curve handling and motor power management, and they communicate with the Control Unit via Bluetooth. The Control Unit collects data from the Environmental and Pallets Monitor Units and computes a predictive SL for the chosen pallet. The SL prediction is then extended to the whole storage area in order to identify the optimal site for that pallet.

Once this site is identified, the Control Unit opens a Bluetooth communication with a dedicated pallet transporter. The Control Unit consists of a Raspberry Pi 2 B+ (RPi) running the Raspbian Jessie OS. It is equipped with an ATmega256RFR2 for ZigBee communication and an HC-05 module for the Bluetooth interface. The RPi runs a set of Python 2.7 scripts for SL computation, the management algorithm and the dispatching of the automated trans-pallets.

As a proof of concept, the WSN-based monitoring and management system has been implemented in a real-life context using 12 ZigBee nodes equipped with L/H/T sensors. Four of these were applied to the pallets to be monitored (Pallets Monitor Unit), while the remaining eight were attached to the storage room walls (Environmental Unit). The Pallets Management Unit consists of an acrylic car prototype [5] equipped with an Arduino UNO and an HC-05 (Bluetooth) module. Although no obstacles are typically present in the areas where the automated robots operate, the robots are equipped with ultrasonic sensors (e.g., HC-SR04) to avoid collisions.

#### 2.1 Sensing System

A single sensing node consists of an NXP JN5164 microcontroller interfacing a 10-bit ADC [6] for data acquisition and digitization. The node embeds a Digi XBee 802.15.4 module for ZigBee communication. Each sensor node is powered by three 1.5 V AA batteries and transmits at less than 1 mW. This allows the node to run for 1.5 years with a read-sleep cycle of 1 reading/30 s, and for 2.5 years with a read-sleep cycle of 1 reading/60 s. The temperature sensor has an accuracy of $\pm 2$ °C (in the range −18 °C to +55 °C). The humidity sensor has an accuracy of $\pm 3.5\%$ RH (interchangeability $\pm 5\%$ for $\leq 59\%$ RH and $\pm 8\%$ for $>59\%$ RH), and the light sensor has an illuminance range of 1–1000 lx ($\pm 20\%$).
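The two quoted lifetimes can be used to sanity-check other sampling rates. A minimal sketch, assuming the average drain is a constant floor (sleep current, leakage) plus a fixed charge per reading, so that 1/L(T) = a + b/T for a reading period T. Self-discharge and temperature effects are ignored, so any extrapolation is an optimistic model estimate, not a datasheet figure:

```python
def fit_lifetime_model(t1_s, life1_y, t2_s, life2_y):
    """Fit 1/L = a + b/T through two (period, lifetime) points; returns (a, b).

    a: drain independent of sampling (1/years); b: per-reading cost (s/years).
    """
    b = (1.0 / life1_y - 1.0 / life2_y) / (1.0 / t1_s - 1.0 / t2_s)
    a = 1.0 / life1_y - b / t1_s
    return a, b

def lifetime_years(period_s, a, b):
    """Predicted battery lifetime (years) for one reading every period_s seconds."""
    return 1.0 / (a + b / period_s)

# Figures quoted in the text: 1.5 years at 1 reading/30 s, 2.5 years at 1 reading/60 s.
a, b = fit_lifetime_model(30, 1.5, 60, 2.5)
print(round(lifetime_years(300, a, b), 2))  # 1 reading / 5 min; prints 5.36
```

Under this model, the 1 reading/5 min rate used in the monitoring cycle would correspond to roughly 5.4 years, which should be read as an upper bound rather than a measured figure.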

#### 2.2 1st Order Kinetic Model Shelf Life Prediction

The Control Unit implements a reprogrammable algorithm for SL assessment. As a proof of concept, we focused on the parameters of fresh-cut tomatoes.

Food quality is affected by several physical, chemical and microbiological reactions, which can be mainly related to food color and firmness [7, 8]. Firmness, in turn, depends on temperature, humidity and light exposure through empirical functions/constants known as kinetic parameters [7]. The empirical law that describes the temperature dependence of a simple chemical reaction is the Arrhenius law [7], which relates the rate constant k of a reaction to the absolute temperature T through the equation [8]:

$$k(T) = p_{ex} \cdot e^{-Ea/RT} \tag{1}$$

where *Ea* is the activation energy, *R* is the gas constant and $p_{ex}$ is the pre-exponential factor [7].

The Arrhenius law parameters for fresh-cut tomatoes are typically determined from empirical data, such as those in [7]. Following [8], these kinetic parameters are inserted into a general formula that gives the rate k at any temperature:

$$k(T) = k_{ref}\, e^{-\frac{Ea}{R} \left(\frac{1}{T} - \frac{1}{T_{ref}}\right)} \tag{2}$$

where $k_{ref}$ is the rate constant at the reference temperature $T_{ref}$.
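Equation (2) follows from Eq. (1) by dividing k(T) by k(T_ref), which cancels $p_{ex}$. The sketch below implements both forms and checks that they agree; the activation energy, $k_{ref}$ and $T_{ref}$ values are placeholders for illustration only, not the fresh-cut tomato parameters of [7]:

```python
import math

R = 8.314  # universal gas constant, J/(mol*K)

def k_eq1(t_k, p_ex, ea):
    """Eq. (1): k(T) = p_ex * exp(-Ea / (R T)), with T in kelvin."""
    return p_ex * math.exp(-ea / (R * t_k))

def k_eq2(t_k, k_ref, t_ref_k, ea):
    """Eq. (2): rate referenced to k_ref measured at T_ref (both temperatures in kelvin)."""
    return k_ref * math.exp(-ea / R * (1.0 / t_k - 1.0 / t_ref_k))

# Placeholder parameters (hypothetical, not taken from [7]).
EA = 50e3            # activation energy, J/mol
T_REF = 285.15       # 12 °C in kelvin
K_REF = 1.0e-6       # rate at T_REF, 1/s

# p_ex implied by Eq. (1) at the reference point; both forms then agree at any T.
P_EX = K_REF * math.exp(EA / (R * T_REF))
```

Working with Eq. (2) is convenient in practice because $k_{ref}$ at a known temperature is easier to measure than the pre-exponential factor.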

This proof of concept uses tomato firmness as the only indicator of the quality level [7]. A first-order approximation of product quality and, thus, of the derived SL is given by:

$$c = c_{eq} + (c_0 - c_{eq})\, e^{-k(T)\,t} \to SL(t) = \frac{1}{k} \ln\left(\frac{c'}{c_{eq}}\right) \tag{3}$$

where *c* is the measured quality factor, $c_0$ is the initial quality value, $c_{eq}$ is the quality factor at equilibrium [7, 8], *t* is the storage time, *k* is the rate obtained from Eq. (2) and *c*' is a specific quality factor at a given time *t*.
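To make the first-order model concrete, the sketch below evaluates c(t) and solves it for the time at which quality decays to an acceptability limit. The limit `c_lim` and all numbers are hypothetical, and solving the model this way is one reading of how an SL follows from the quality equation, not necessarily the exact expression used by the authors:

```python
import math

def quality(t_s, c0, c_eq, k):
    """First-order model: c(t) = c_eq + (c0 - c_eq) * exp(-k t)."""
    return c_eq + (c0 - c_eq) * math.exp(-k * t_s)

def time_to_limit(c_now, c_lim, c_eq, k):
    """Time (s) for quality to decay from c_now to c_lim under rate k (1/s)."""
    if not c_eq < c_lim < c_now:
        raise ValueError("expected c_eq < c_lim < c_now")
    return math.log((c_now - c_eq) / (c_lim - c_eq)) / k

# Hypothetical firmness values in newtons and a placeholder rate constant.
C0, C_EQ, C_LIM, K = 20.0, 5.0, 10.0, 2.0e-6
sl_days = time_to_limit(C0, C_LIM, C_EQ, K) / 86400
```

Because the model is exponential, the remaining SL can be recomputed at any time from the current quality alone, which suits a system that re-evaluates k at every acquisition.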

#### 2.3 Monitoring Cycle and Pallets Repositioning

- 1. L/H/T acquisition is performed at a fixed rate of 1 reading every 5 min.
- 2. The L/H/T values are compared with the reference ones ($T_{ref}$/$H_{ref}$/$L_{ref}$).
- 3. If the recommended conditions are respected, the system uses $k_{ref}$ as k(T) in Eq. (3) to compute *c* and the SL. If at least one threshold is exceeded, the rate *k* is calculated according to Eq. (2).
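Steps 2 and 3 amount to a simple rate-selection rule. A minimal sketch follows; the threshold values and the `Reference` fields are hypothetical, and, following the text literally, Eq. (2) is applied whenever any threshold is exceeded, even though it depends on temperature alone:

```python
import math
from dataclasses import dataclass

R = 8.314  # universal gas constant, J/(mol*K)

@dataclass
class Reference:
    """Hypothetical recommended storage conditions and kinetic parameters."""
    t_max_c: float   # upper temperature limit, °C
    rh_min: float    # lower relative humidity limit, %
    rh_max: float    # upper relative humidity limit, %
    lux_max: float   # upper illuminance limit, lx
    t_ref_c: float   # temperature at which k_ref applies, °C
    k_ref: float     # reference rate, 1/s
    ea: float        # activation energy, J/mol

def effective_rate(temp_c, rh, lux, ref):
    """Return k_ref inside the recommended envelope, otherwise k(T) from Eq. (2)."""
    within = (temp_c <= ref.t_max_c
              and ref.rh_min <= rh <= ref.rh_max
              and lux <= ref.lux_max)
    if within:
        return ref.k_ref
    t_k, t_ref_k = temp_c + 273.15, ref.t_ref_c + 273.15
    return ref.k_ref * math.exp(-ref.ea / R * (1.0 / t_k - 1.0 / t_ref_k))

REF = Reference(t_max_c=20.0, rh_min=50.0, rh_max=90.0, lux_max=500.0,
                t_ref_c=20.0, k_ref=1.0e-6, ea=50e3)
```

Called once per acquisition, the returned rate is what feeds Eq. (3) for the quality and SL update.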

The Control Unit embeds an SL optimization algorithm that exploits the data provided by the L/H/T sensors of the Environmental Unit to build a heat map of the storage room. The repositioning principle can be summarized in five main steps:

- 1. The algorithm virtually overlays a $5 \times 5$ matrix on the storage room floor. The N<sub>p</sub> pallets to be managed are placed at N<sub>p</sub> specific matrix coordinates. The system identifies each pallet with a number (ID) and derives an encumbrance matrix that defines the navigation path of the pallet to be displaced.
- 2. The system extracts two $5 \times 5$ matrices, named $T_{ref}$ and $T_{amb}$. $T_{ref}$ contains the reference temperature values [6, 7], and $T_{ref,i,j}$ denotes the element in the i-th row and j-th column. $T_{amb}$ contains the temperature values acquired by the sensors. In the proposed example, eight values of $T_{amb}$ are taken directly from the sensors (the green nodes in Fig. 1), while the others are mathematically derived [8].

- 3. The temperature matrices lead to the definition of N<sub>p</sub> overlapping 2D quality matrices, which form a 3D matrix $C \in \mathbb{R}^{M \times M \times N_p}$ (here M = 5), where $C_{i,j,p}$ is the quality that the p-th pallet would have in the position at the i-th row and j-th column.
- 4. The algorithm checks all pallets sequentially and, for each one, analyzes $C_{i,j,p}$ at the pallet's actual position (i, j). If the monitored product shows a firmness decrement between two consecutive measurements reaching $C_{th} = 2.68 \cdot 10^{-5}$ N/s (10 min at 24 °C), the system activates the displacement operations. The quality factor threshold $C_{th}$ is reprogrammable.
- 5. The 3D shelf life matrix $SL \in \mathbb{R}^{M \times M \times N_p}$ can be derived from the *C* and $T_{amb}$ matrices, where $SL_{i,j,p}$ is the expected shelf life of the p-th pallet if it were in the position at the i-th row and j-th column. The displacement operations then identify, in the proper order, the pallet movements that maximize the overall shelf life in the warehouse.
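Steps 4 and 5 can be sketched as follows: a pallet is flagged when its firmness drops faster than $C_{th}$, and the layout that maximizes the summed SL is found by exhaustive search over distinct cells. The exhaustive search is a stand-in chosen for clarity (the paper does not describe its search strategy). It is feasible only for small grids, and it does not plan the order of the moves or the trans-pallet paths:

```python
from itertools import permutations

def needs_move(c_prev, c_now, dt_s, c_th=2.68e-5):
    """Flag a pallet whose firmness (N) drops at a rate of at least c_th (N/s)."""
    return (c_prev - c_now) / dt_s >= c_th

def best_layout(sl, m):
    """Exhaustively assign distinct cells to all pallets, maximizing total SL.

    sl[i][j][p] is the (hypothetical) predicted shelf life of pallet p in cell (i, j).
    With m = 5 and 4 pallets this visits 25*24*23*22 = 303,600 layouts.
    """
    n_p = len(sl[0][0])
    cells = [(i, j) for i in range(m) for j in range(m)]
    best, best_total = None, float("-inf")
    for combo in permutations(cells, n_p):
        total = sum(sl[i][j][p] for p, (i, j) in enumerate(combo))
        if total > best_total:
            best, best_total = combo, total
    return list(best), best_total
```

For larger warehouses, the same objective could be handed to an assignment solver instead of brute force; the resulting target cells would still need sequencing so that trans-pallets do not block each other.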

### 3 Results

To validate the proposed WSN-based system, the testing room was monitored for about 10 days. The placement of the environmental sensors inside the room is sketched in Fig. 1.

The testing room was left uncontrolled (no window management, with direct sunlight allowed in) for 5 days, and controlled (window closed between 9:00 and 17:00) for the remaining 5 days. Figure 2a reports a heat map of the room on Day 1, from 8:00 to 23:00, in 3 h steps. Figure 2b shows the SL prediction over 24 h in the control-free environment before and after applying the proposed optimization algorithm. Finally, Fig. 2c shows the SL prediction over 24 h in the controlled environment, with and without the SL optimization algorithm.

**Control Loop**. Typical temperature and RH profiles were extracted with ordinary least squares (OLS) approximations. Over the 10 days, the temperature measurements averaged 21.3 °C $\pm$ 1.2 °C, and the RH recorded in the room over the same time span was 55.73% $\pm$ 3%. Over the 5 control-free days, the monitoring system recorded, on average, a typical temperature increase of 2.8 °C $\pm$ 0.8 °C from 9:00 to 16:00. These overheating cycles in the control-free environment are due to sunlight entering through the window (the sensors were not directly exposed to sunlight).
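A daily warming figure of this kind can be read off an OLS fit. The sketch below fits a straight line to the 9:00 to 16:00 samples and reports the fitted rise over that window. The linear model is an assumption (the order of the authors' OLS approximation is not stated), and the readings are synthetic placeholders, not measured data:

```python
def ols_line(xs, ys):
    """Ordinary least squares fit y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def fitted_rise(hours, temps, start=9.0, end=16.0):
    """Temperature increase predicted by the fitted line between start and end (hours)."""
    window = [(h, t) for h, t in zip(hours, temps) if start <= h <= end]
    xs, ys = zip(*window)
    _, slope = ols_line(xs, ys)
    return slope * (end - start)

# Synthetic hourly readings (°C) from 6:00 to 19:00, not measured data.
hours = list(range(6, 20))
temps = [20.5, 20.6, 20.8, 21.0, 21.4, 21.8, 22.2,
         22.6, 23.0, 23.4, 23.7, 23.2, 22.6, 22.0]
rise = fitted_rise(hours, temps)
```

Averaging such per-day rises over the 5 control-free days would give a mean and spread comparable in form to the 2.8 °C $\pm$ 0.8 °C figure above.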

**SL Maximization**. To validate the SL optimization algorithm, four pallets were initially placed at specific coordinates of the $5 \times 5$ matrix (counted from the bottom-left corner of the matrices in Fig. 2a): ID1 {1,2}, ID2 {1,4}, ID3 {3,3} and ID4 {5,1}.

A typical monitoring system [9–13] would leave the pallets in their initial positions, sending warnings or requesting human intervention. Considering a control-free testing room and static monitoring, according to Fig. 2b the overall SL of the four pallets would be 21 days and 15 h. Under the same environmental conditions (control-free room), the proposed dynamic management extends the SL to 24 days and 23 h (about 3 days and 8 h longer). Considering instead a controlled testing room and static monitoring (Fig. 2c), the overall SL of the four pallets clearly improves, compared with the uncontrolled storage place, to 23 days. In this case too, the SL optimization algorithm yields an overall lifetime of 25 days and 2 h.

Fig. 2 a Heat-map evolution of the testing room (in control-free and controlled conditions) with 3 h steps; SL prediction over 24 h before and after applying the optimization algorithm in b the control-free environment and c the controlled environment

Table 1 compares the most relevant solutions for monitoring perishable goods [10–13] with our work. Unlike state-of-the-art techniques, the proposed solution simultaneously provides food-quality computation, SL prediction and SL optimization while exploiting low-cost technologies [14, 15].

### 4 Conclusions

In this paper, an automatic architecture for monitoring and proactively optimizing the SL of perishable goods has been presented. Exploiting the benefits of WSNs, the system monitors and processes environmental parameters (i.e., temperature, RH, light) and, through low-cost platforms (Raspberry Pi and Arduino), manages pallet placement in order to minimize food losses in a non-refrigerated testing room. The experimental results on agricultural products show a mean SL increase of about 5 days in a control-free, non-refrigerated environment. In a controlled environment, the algorithm improves the expected overall expiration date by 2 days.

| Features | Song [10] | Karimi et al. [11] | Purandare et al. [12] | Abad et al. [13] | This work |
|---|---|---|---|---|---|
| Application | Greenhouse monitoring | Web-based greenhouse | IoT for post-harvest losses | Cold chain monitoring | Optimization of goods SL |
| Continuous monitoring | ✓ | ~ | ✓ | ✓ | ✓ |
| Sensors | T, GPRS | T/H, GPRS | T | L/H/T | L/H/T |
| Protocol | ZigBee | ZigBee | ZigBee | RFID | ZigBee, Bluetooth |
| Computing platform | μC | Web computing | RPi, Web | μC | RPi |
| Shelf life prediction | ✗ | ✓ | ✓ | ✗ | ✓ |
| SL maximization | ✗ | ✗ | ✗ | ✗ | ✓ |
| Low cost | ~ | ✗ | ✓ | ✗ | ✓ |

Table 1 State of the art in reliable monitoring and SL prediction of perishable goods

## References

- 1. Aile, S.S.: EU Waste Target Review: State-of-Play. Presentation to the EU Platform Subgroup on Food Waste Measurement meeting, Brussels, 31 March 2017
- 2. Raab, V., Bruckner, S., Beierle, E., Kampmann, Y., Petersen, B., Kreyenschmidt, J.: Generic model for the prediction of remaining shelf life in support of cold chain management in pork and poultry supply chains. J. Chain Netw. Sci. 8(1), 59–73 (2008)
- 3. Carrara, S., Torre, M.D., Cavallini, A., De Venuto, D., De Micheli, G.: Multiplexing pH and temperature in a molecular biosensor. In: 2010 Biomedical Circuits and Systems Conference (BioCAS), Paphos, pp. 146–149 (2010). https://doi.org/10.1109/biocas.2010.5709592
- 4. Jedermann, R., Ruiz-Garcia, L., Lang, W.: Spatial temperature profiling by semi-passive RFID loggers for perishable food transportation. Comput. Electron. Agric., 145–154 (2009)
- 5. Annese, V.F., Crepaldi, M., Demarchi, D., De Venuto, D.: A digital processor architecture for combined EEG/EMG falling risk prediction. In: 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, pp. 714–719 (2016)
- 6. De Venuto, D., Stikvoort, E., Tio Castro, D., Ponomarev, Y.: Ultra low-power 12-bit SAR ADC for RFID applications. In: 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010), Dresden, pp. 1071–1075 (2010). https://doi.org/10.1109/date.2010.5456968
- 7. Lana, M.M., Tijskens, L.M.M., Van Kooten, O.: Effects of storage temperature and fruit ripening on firmness of fresh cut tomatoes. Postharvest Biol. Technol. 35(1) (2005)
- 8. De Venuto, D., Mezzina, G.: Spatio-temporal optimization of perishable goods' shelf life by a pro-active WSN-based architecture. Sensors 18, 2126 (2018). https://doi.org/10.3390/s18072126
- 9. De Venuto, D., Torre, M.D., Boero, C., Carrara, S., De Micheli, G.: A novel multi-working electrode potentiostat for electrochemical detection of metabolites. In: 2010 IEEE Sensors, Kona, HI, pp. 1572–1577 (2010). https://doi.org/10.1109/icsens.2010.5690297
- 10. Song, J.: Greenhouse monitoring and control system based on ZigBee wireless sensor network. In: International Conference on Electrical and Control Engineering, Wuhan (2010)
- 11. Karimi, N., Arabhosseini, A., Karimi, M., Kianmehr, M.H.: Web-based monitoring system using wireless sensor networks for traditional vineyards and grape drying buildings. Comput. Electron. Agric. 144, 269–283 (2018)
- 12. Purandare, H., Ketkar, N., Pansare, S., Padhye, P., Ghotkar, A.: Analysis of post-harvest losses: an internet of things and machine learning approach. In: 2016 International Conference on Automatic Control and Dynamic Optimization Techniques, Pune, pp. 222–226 (2016)
- 13. Abad, E., Palacio, F., Nuin, M., De Zarate, A.G., Juarros, A., Gómez, J.M., Marco, S.: RFID smart tag for traceability and cold chain monitoring of foods: demonstration in an intercontinental fresh fish logistic chain. J. Food Eng. 93(4), 394–399 (2009)
- 14. De Venuto, D., Annese, V.F., Mezzina, G.: Remote neuro-cognitive impairment sensing based on P300 spatio-temporal monitoring. IEEE Sens. J. 16(23), 8348–8356 (2016). https://doi.org/10.1109/jsen.2016.2606553
- 15. De Tommaso, M., Vecchio, E., Ricci, K., Montemurno, A., De Venuto, D., Annese, V.F.: Combined EEG/EMG evaluation during a novel dual task paradigm for gait analysis. In: Proceedings of the 2015 6th IEEE International Workshop on Advances in Sensors and Interfaces, IWASI 2015, art. no. 7184949, pp. 181–186 (2015). https://doi.org/10.1109/iwasi.2015.7184949

## FPGA-Based Multi Cycle Parallel Architecture for Real-Time Processing in Ultrasound Applications



Valentino Meacci, Enrico Boni, Alessandro Dallai, Alessandro Ramalli, Monica Scaringella, Francesco Guidi, Dario Russo and Stefano Ricci

**Abstract** A typical echograph transmits ultrasound signals and receives backscattered echoes through several tens of small transducers that compose the array of the probe. The system should process the acquired echoes in a very short time, typically a few tens of ms, so that the medical doctor does not perceive any delay between probe movements and image updates. This implies, especially when a high frame rate is required, the real-time processing of several GB of data per second: a demanding constraint that is further tightened by the advanced ultrasound modalities being investigated today. In this paper, we present a Field Programmable Gate Array architecture that uses a combination of parallel and serial strategies capable of achieving a throughput rate of 3.5 GB/s in ultrasound data processing. The proposed architecture is implemented in the research ultrasound system ULA-OP 256. Its performance is demonstrated by the real-time implementation of Plane Wave Vector Doppler, a novel investigation method that shows the blood velocity fields over a 2D section of an artery. Tests in the carotid artery of a volunteer are presented.

### 1 Introduction

A wide range of applications produce a considerable amount of data that must be processed by high-end electronic systems within a well-defined time span (real time) [1]. For example, diagnostic ultrasound echographs reconstruct, process and display a tomographic image within a few milliseconds of receiving the echo data [2, 3]. In echographs, an ultrasound burst is transmitted every Pulse Repetition Interval (PRI); the echoes received by the several tens of transducers that compose the array of the probe are then sampled and beamformed. In the beamforming process, the echo signals are delayed and summed to focus the energy backscattered from a specific direction, which involves several time-consuming operations and high throughput rates. For example, considering signals sampled at 12 bit, 50 Msps, from 64 transducers, the beamformer processor should sustain a peak throughput rate of 4.8 GB/s, which corresponds to an average throughput rate of 0.6 GB/s when a 10 cm-deep region of interest is investigated with a PRI of 1 ms. The required processing power is currently split among several Field Programmable Gate Arrays (FPGAs) or Graphics Processing Units (GPUs) in the echograph electronics [4].

V. Meacci (⊠) · E. Boni · A. Dallai · M. Scaringella · F. Guidi · D. Russo · S. Ricci

Department of Information Engineering, University of Florence, Florence, Italy e-mail: valentino.meacci@unifi.it

A. Ramalli

Laboratory of Cardiovascular Imaging and Dynamics, Department of Cardiovascular Sciences, KU Leuven, Louvain, Belgium

<sup>©</sup> Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), Applications in Electronics

*Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_34

Fig. 1 Multi cycle parallel processing architecture used on the research ultrasound scanner ULA-OP 256
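The throughput figures quoted in the Introduction can be reproduced as follows, assuming that the 10 cm refers to the acquisition depth, that sound travels at 1540 m/s in tissue and that samples are counted at their 12-bit width without packing overhead:

```python
SPEED_OF_SOUND = 1540.0  # m/s, conventional soft-tissue value (assumption)

def peak_throughput(n_channels, bits, fs_hz):
    """Raw data rate in bytes/s when all channels stream continuously."""
    return n_channels * bits * fs_hz / 8

def average_throughput(peak_bps, depth_m, pri_s, c=SPEED_OF_SOUND):
    """Rate averaged over a PRI when echoes are only acquired up to depth_m."""
    listen_s = 2 * depth_m / c       # round-trip time to the deepest point
    return peak_bps * listen_s / pri_s

peak = peak_throughput(64, 12, 50e6)            # 4.8e9 B/s
avg = average_throughput(peak, 0.10, 1e-3)      # about 0.62e9 B/s
```

The average is lower than the peak because echoes from 10 cm return after about 130 µs, so the front end only streams for roughly 13% of each 1 ms PRI.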

In a standard echograph, a single beamforming operation is performed per PRI, but novel methods require up to several tens of beamforming operations per PRI, which current standard echographs can hardly support in real time.

In this work, we propose a compact and advanced FPGA architecture called Multi Cycle Parallel Processing (MCPP), which uses a combination of serial and parallel beamforming strategies (see Fig. 1) to increase the performance of the beamforming process typically used in ultrasound systems. The MCPP was integrated into the high-performance FPGAs of the research ultrasound scanner ULA-OP 256 [5, 6]. Its performance was evaluated by implementing in real time the Plane Wave Vector Doppler (PWVD), an advanced ultrasound method that requires 16 beamforming operations per PRI. Finally, a test on the carotid artery of a volunteer is reported.

### 2 Multi Cycle Parallel Processing Architecture

The MCPP architecture is based on a Dual Port Memory (DPM) and 4 parallel beamformers (BFs). The MCPP (Fig. 1) stores the data to be processed in the DPM at the storing frequency $F_S$, which depends on the sampling frequency of the front end; the 4 BFs then process the stored data *n* times, applying different delays, at the maximum FPGA clock frequency $F_P$. The maximum reprocessing number *n* is bounded by the PRI duration, $F_S$ and the number of beamformed samples.

Depending on the application, different strategies can be used to optimize the performance of the MCPP architecture:

*Single Buffer Mode*: in every PRI, the whole DPM is used to store the acquired data at $F_S$. Processing starts as soon as a first burst of data, typically 32 or 64 samples, has been acquired, and runs at $F_P$ in parallel with data acquisition. The stored data can be reprocessed *n* times to maximize performance, where *n* must be chosen according to the time constraints of the specific application, e.g., the PRI duration.

*Ping-Pong Mode*: in this case, the DPM is divided into two halves. While one half stores, at $F_S$, the acquired data for the next processing run, the data stored during the previous PRI are read and processed several times by the BFs at $F_P$. As in Single Buffer Mode, *n* depends on the PRI.

*Multiple Buffers Mode*: an extension of the Ping-Pong Mode in which the DPM is divided into multiple buffers and the processing strategy can be chosen according to the application.
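The bound on *n* can be estimated with a simple cycle budget. The sketch assumes Ping-Pong Mode, that each beamformer produces one output sample per $F_P$ clock cycle, and that control overheads are negligible. Under these assumptions, each pass over the stored data yields one line per BF, so the lines per PRI are 4*n*. The number of samples per line is an input to the model:

```python
def max_passes(pri_s, fp_hz, samples_per_line):
    """Largest n such that n passes over the stored PRI fit within one PRI."""
    cycles_per_pri = int(pri_s * fp_hz)      # F_P clock cycles available per PRI
    return cycles_per_pri // samples_per_line

def lines_per_pri(n_passes, n_beamformers=4):
    """Each pass lets every parallel BF form one line."""
    return n_passes * n_beamformers

# PRF = 4 kHz (Sect. 4), F_P = 234.375 MHz, 8192 stored samples per PRI (Sect. 2).
n = max_passes(250e-6, 234.375e6, 8192)
```

With these inputs the budget allows 7 passes, i.e., 28 lines per PRI, comfortably above the 16 lines needed for PWVD; shorter PRIs or longer lines shrink *n* proportionally.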

The MCPP was implemented on FPGAs of the Arria V GX family (Altera, San Jose, CA, USA) embedded on the Front-End Digital (FED) boards (Fig. 2) of the research ultrasound scanner ULA-OP 256, designed at the Department of Information Engineering (DINFO) of the University of Florence. ULA-OP 256 manages up to 256 channels by using 8 FED boards; each one embeds an FPGA and 2 DSPs (TMS320C6678; Texas Instruments Inc., Austin, TX, USA) used to acquire and process data coming from 32 channels. The DPM, configured in ping-pong mode, stores at 78.125 MHz the echoes digitized by 32 ADCs. Meanwhile, the 4 BFs process the data saved in the previous PRI several times at $F_P$ = 234.375 MHz [7, 8], using the delay-and-sum strategy to focus the energy backscattered from a specific direction. The main blocks of each BF are the Circular Buffers (CBs), one per channel, which use the delays to align the received data, and an adder that coherently sums the data coming from the CBs. The delays, stored in an external memory during system initialization, are managed by a controller embedded in the FPGA. A block scheme of the MCPP architecture synthesized in ULA-OP 256 is depicted in Fig. 1. Compared with a conventional architecture, in which data flow from the ADCs to a single BF, the MCPP architecture increases the image frame rate up to 10-fold; the factor depends on the acquisition depth, the number of image lines and the number of parallel BFs [7]. In this implementation, the DPM stores up to 16384 words (8192 per PRI) of 392 bits each (about 6.2 Mb).

As detailed in Table 1, the MCPP implemented in the FPGA of the ULA-OP 256 FED boards employs about 31% of the adaptive logic modules (ALMs) and 24% of the logic registers available in the front-end FPGA. The memory is mapped onto the 10-kbit memory blocks (M10K), 71% of which are used. Timing closure was achieved at a maximum frequency $F_P$ = 234.375 MHz.

Fig. 2 ULA-OP 256 front-end board
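The delay-and-sum operation performed by each BF can be illustrated in software. This sketch applies one fixed integer delay per channel (reading each channel ahead by its delay, much as a circular-buffer read pointer would) and sums the aligned samples. A real beamformer usually updates the delays with depth (dynamic focusing), which this simplification omits:

```python
def delay_and_sum(channels, delays):
    """y[n] = sum_c x_c[n + d_c]; samples outside a channel's record count as zero."""
    n_out = len(channels[0])
    out = [0.0] * n_out
    for x, d in zip(channels, delays):
        for n in range(n_out):
            idx = n + d
            if 0 <= idx < len(x):
                out[n] += x[idx]
    return out

# Three channels see the same echo 2 samples apart; the delays realign it at n = 2.
ch = [[0.0] * 8 for _ in range(3)]
ch[0][2] = ch[1][4] = ch[2][6] = 1.0
y = delay_and_sum(ch, [0, 2, 4])
```

Running the same stored data through this function with several delay sets is the software analogue of the *n* reprocessing passes of the MCPP.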

### 3 Plane Wave Vector Doppler

Classical Pulsed Wave Doppler methods are limited to detecting the axial component of the blood velocity in a single sample volume. Methods capable of detecting velocity vectors in 1D, 2D or even 3D regions, such as Plane Wave Vector Doppler (PWVD), overcome these limitations [9, 10]. In the PWVD method, a linear array probe transmits, at the Pulse Repetition Frequency (PRF = 1/PRI), a plane wave that insonifies the region of interest (ROI). In reception, the probe is divided into 8 pairs of sub-apertures, $A_L(k)$ and $A_R(k)$, symmetrically located on the left and right sides of each vector Doppler line ($1 \le k \le 8$). The ROI is divided into 8 vector Doppler lines uniformly spaced at steps of L/7 along the lateral direction, L being the lateral extent of the ROI. Figure 3 shows the geometry of a single Doppler line. The echoes received by the sub-aperture pairs are simultaneously beamformed along the corresponding Doppler line.

A scatterer positioned at depth *d* and moving with velocity *V* produces an echo that is received by the aperture pairs. The echoes acquired by the left and right apertures are processed to compute the mean Doppler shifts, $f_{DL}$ and $f_{DR}$, respectively. The lateral ($V_x$) and axial ($V_z$) components of the velocity vector *V* are calculated by triangulation with respect to the aperture positions [11, 12]. When PWVD is used in ULA-OP 256, the system is programmed to transmit plane waves at 8 MHz through the central 128 elements of a linear array probe (model LA533, Esaote S.p.A., Florence, Italy), giving a transmitting aperture of 31.36 mm. In reception (RX), the 8 vector Doppler lines are spaced by about 1.2 mm, which corresponds to 5 transducer elements. With this geometry, the ROI extends over about 1 cm, well inside the region insonified by the 31.36 mm transmission (TX) aperture. The 16 sub-apertures ($A_R(k)$ and $A_L(k)$, $1 \le k \le 8$) are positioned at $X_{ac}$ = 5.48 mm.

| Resources                     | Usage | %  |
|-------------------------------|-------|----|
| Adaptive logic modules (ALMs) | 13020 | 31 |
| Logic registers               | 22053 | 24 |
| M10K memory blocks            | 612   | 71 |
| DSP blocks                    | 320   | 32 |
| PLL                           | 6     | 2  |

Table 1 FPGA resource utilization of the MCPP in an Arria V GX FPGA for 32 channels

Fig. 3 Apertures on the ultrasound probe for plane wave vector Doppler
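The triangulation step can be sketched with a simple bistatic Doppler model. The assumptions are ours: the plane wave propagates straight down (axis z, pointing into the tissue), the left and right sub-apertures sit at lateral offsets $-X_{ac}$ and $+X_{ac}$ from the Doppler line, the scatterer lies on the line at depth d, and a positive shift means motion towards the probe. The formulation in [11, 12] may differ in detail (for example, in sign conventions or beam steering):

```python
import math

C_TISSUE = 1540.0  # speed of sound, m/s (assumed)

def shifts_from_velocity(vx, vz, f0, x_ac, depth, c=C_TISSUE):
    """Forward model: Doppler shifts seen by the left and right sub-apertures."""
    theta = math.atan2(x_ac, depth)          # receive angle of each sub-aperture
    k = f0 / c
    f_dl = k * (-vx * math.sin(theta) - vz * (1 + math.cos(theta)))
    f_dr = k * (vx * math.sin(theta) - vz * (1 + math.cos(theta)))
    return f_dl, f_dr

def velocity_from_shifts(f_dl, f_dr, f0, x_ac, depth, c=C_TISSUE):
    """Invert the forward model: the difference gives V_x, the sum gives V_z."""
    theta = math.atan2(x_ac, depth)
    vx = c * (f_dr - f_dl) / (2 * f0 * math.sin(theta))
    vz = -c * (f_dl + f_dr) / (2 * f0 * (1 + math.cos(theta)))
    return vx, vz

# 8 MHz transmission and X_ac = 5.48 mm from the text; the 20 mm depth is hypothetical.
f_dl, f_dr = shifts_from_velocity(0.30, -0.10, 8e6, 5.48e-3, 20e-3)
vx, vz = velocity_from_shifts(f_dl, f_dr, 8e6, 5.48e-3, 20e-3)
```

The symmetric aperture pair is what makes the inversion simple: lateral motion shifts the two frequencies in opposite directions, while axial motion shifts both equally.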

### 4 Experiments

The exam was performed on a volunteer with a mechanical aortic valve to evaluate the peak blood velocity in the common carotid artery. A screenshot of the real-time display of ULA-OP 256, taken during the exam, is shown in Fig. 4. It was obtained at 36 fps with PRF = 4 kHz. The sonographer placed the probe over the common carotid artery, about 1 cm from the bifurcation. Guided by the velocity vectors displayed on the screen, the sonographer selected a point of interest (yellow circle near the pointer in Fig. 4), whose velocity trend over time was shown in the bottom panel (yellow curve) together with the moving average of the last five velocity peaks and the heart rate.



Fig. 4 Screenshot of the real-time display of ULA-OP 256 captured during the investigation of the common carotid artery of a healthy volunteer at PRF = 8 kHz

### 5 Conclusion

In this paper, we have presented a Multi Cycle Parallel Architecture based on a Dual Port Memory that stores the data received from the transducers. Four beamformers process the data with a combined parallel/serial approach, applying different delays in each run. This architecture was implemented on the FPGAs of the research echograph ULA-OP 256 and shown to be capable of beamforming 16 lines per PRI and producing real-time vector Doppler images at 36 fps.

Acknowledgements The authors would like to express their deep gratitude to Professor Piero Tortoli for his guidance in this research work.

## References

- 1. Davis, R.I., Grolleau, E.: Editorial note on special issue on scheduling and timing analysis for advanced real-time systems. Real-Time Syst. 51(2), 125–127 (2015)
- 2. Evans, D.H., McDicken, W.N.: Doppler Ultrasound: Physics, Instrumentation and Signal Processing, 2nd edn. Wiley, Chichester (2000)
- 3. Costa, A., De Gloria, A., Faraboschi, P., Olivieri, M.: A parallel architecture for the Color Doppler flow technique in ultrasound imaging. Microprocess. Microprogr. 38(1–5), 545–551 (1993)
- 4. Birk, M., Guth, A., Zapf, M., Balzer, M., Ruiter, N., Hübner, M., Becker, J.: Acceleration of image reconstruction in 3D ultrasound computer tomography: an evaluation of CPU, GPU and FPGA computing. In: Proceedings of Conference on Design & Architectures for Signal & Image Processing (DASIP 2011), pp. 1–8 (2011)
- 5. Boni, E., Yu, A.C.H., Freear, S., Jensen, J.A., Tortoli, P.: Ultrasound open platforms for next-generation imaging technique development. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 65(7), 1078–1092 (2018)
- 6. Boni, E., Bassi, L., Dallai, A., Meacci, V., Ramalli, A., Scaringella, M., Guidi, F., Ricci, S., Tortoli, P.: Architecture of an ultrasound system for continuous real-time high frame rate imaging. IEEE Trans. Ultrason. Ferroelectr. Freq. Control. https://doi.org/10.1109/tuffc.2017.2727980
- 7. Steinberg, B.D.: Digital beamforming in ultrasound. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 39, 716–721 (1992)
- 8. Meacci, V., Bassi, L., Ricci, S., Boni, E., Tortoli, P.: High-performance FPGA architecture for multi-line beamforming in ultrasound applications. In: Proceedings of Euromicro Conference on Digital System Design (DSD 2016), Cyprus, pp. 584–590 (2016)
- 9. Jensen, J.A., Nikolov, S.I., Yu, A.C.H., Garcia, D.: Ultrasound vector flow imaging, Part I: sequential systems. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 63(11), 1704–1721 (2016)
- 10. Jensen, J.A., Nikolov, S.I., Yu, A.C.H., Garcia, D.: Ultrasound vector flow imaging, Part II: parallel systems. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 63(11), 1722–1732 (2016)
- 11. Ricci, S., Bassi, L., Meacci, V., Ramalli, A., Boni, E., Tortoli, P.: Multi-line measurements of blood velocity vectors in real-time. In: Proceedings of IEEE Ultrasonics Symposium, pp. 1–4 (2016)
- 12. Ricci, S., Ramalli, A., Bassi, L., Boni, E., Tortoli, P.: Real-time blood velocity vector measurement over a 2D region. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 65(2), 201–209 (2018). https://doi.org/10.1109/tuffc.2017.2781715
- 13. Costa, A., De Franciscis, A., De Gloria, A., Faraboschi, P., Olivieri, M.: Spectral estimation for 2-D Doppler ultrasound imaging. Electron. Lett. 28(23), 2177–2179 (1992)

# Adaptive Tuning System and Parameter Estimation of a Digitally Controlled Boost Converter with STM32



Gianpaolo Vitale, Antonino Pagano, Leonardo Mistretta and Giuseppe Costantino Giaconia

G. Vitale

ICAR, Institute for High Performance Computing and Networking, National Research Council (CNR), Via Ugo La Malfa, 153, 90146 Palermo, Italy e-mail: gianpaolo.vitale@cnr.it

A. Pagano · L. Mistretta · G. C. Giaconia (🖂)

Dipartimento di Energia, Ingegneria dell'Informazione e Modelli Matematici, Università degli Studi di Palermo, 90128 Palermo, Italy e-mail: costantino.giaconia@unipa.it

A. Pagano e-mail: antonino.pagano1@gmail.com

L. Mistretta e-mail: leonardo.mistretta@unipa.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_35

**Abstract** This paper proposes a diagnostic system for a power electronic converter, based on a cheap STM32 Nucleo board, that extracts relevant measurements from which the values of the converter's parameters, and their evolution, are obtained. The same system is also used to auto-tune the converter in order to optimize its dynamic performance.

## 1 Introduction

Identifying the circuit parameters of a DC-DC converter is often crucial to improving its reliability and performance. During prolonged operation, electrical and thermal stress cause aging, with a consequent drift of the circuit parameters. In power electronic converters, it is widely recognized that electrolytic capacitors and MOSFETs deteriorate faster than the other devices [1]. Electrolytic capacitors account for up to 60% of failures and MOSFETs for about 30%, whereas the remaining components, such as inductors and diodes, lie under 5% [2–4]. Capacitor degradation causes a decrease in the output voltage and an increase in the ripple [5]. Several recent approaches in the literature deal with these issues. The estimation of both the output capacitance and its ESR is discussed in [6, 7], whereas [8] adopts a neural network approach to detect when the converter parameters drift out of tolerance. In [9, 10], soft fault diagnosis is introduced, i.e., the detection of a degradation of system performance that does not completely disrupt normal operation. On the other hand, on the basis of the estimated parameter values, the dynamic response can be optimized even when the circuit parameters vary [11]. Before a failure occurs, however, a performance degradation is noticeable; for this reason researchers have also focused on auto-tuning techniques. This task can also be accomplished on the basis of the system performance, using a reference model as in [12, 13], the stability boundary locus [14], or a forbidden region in the Nyquist plane [15]; however, none of these approaches pursues knowledge of the converter parameters. This paper adopts the approach based on the continuous-time model described in [16], since it is a simple methodology able to estimate all the circuit parameters, namely the load resistance, inductance, output capacitance and equivalent series resistance (ESR); on this basis, a Dahlin controller is set up. This kind of controller cancels the poles of the converter and imposes the desired dynamics. The issues related to implementing this controller on digital systems have been discussed in [17] by numerical analysis, whereas [18] explains how the parameters of this controller can be calculated from the circuit parameters of a DC/DC converter, but without explaining how those parameters are estimated.

The novelty of this paper lies in implementing, on a cheap microcontroller board, the continuous-time model of [16], which estimates all the circuit parameters, and in simultaneously using them to set up a Dahlin controller that auto-tunes the converter during operation. This differs from [6, 7], where only the capacitance and its ESR are estimated, and it is computationally lightweight compared with works that estimate all parameters using a neural network [8], particle swarm optimization [9] or a crow search algorithm [10]. The proposed approach has the same features as [11], which has the further advantage of avoiding the current sensor but uses an FPGA to implement the algorithm, including the current estimator. Unlike the system-performance approaches [12–15], the proposed optimization is based on continuous monitoring of the converter's circuit parameters, so the converter's state of health can be tracked continuously, improving reliability through scheduled maintenance. Finally, it requires only one switching period to perform the estimation and runs on a cheap microcontroller board that also generates the PWM gate drive signal.

## 2 Parameter Estimation of the Boost Converter

Figure 1 shows the boost converter with the STM32 board, which samples the signals, estimates the electrical parameters, performs the auto-tuning of the controller and finally outputs the PWM signal for the power MOSFET.



Fig. 1 Diagram of the Boost converter with the estimation and auto-tuning system

After sampling  $V_{out}$  and  $i_L$ , their mean values are calculated as in (1):

$$\bar{x} = \frac{1}{N-1} \sum_{i=2}^{N} \frac{x(i) + x(i-1)}{2} \tag{1}$$

where *x* is the quantity to be averaged and the number of samples per switching period is $N = f_{sample}/f_{sw}$. The following main parameters are estimated: the load resistance $R_{load}$, the inductance $L$, the output capacitance $C$ and the equivalent series resistance (ESR).
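A minimal Python sketch of this averaging step is shown below. It reads Eq. (1) as a trapezoidal mean over one switching period, where each term is the half-sum of two consecutive samples. The function names are hypothetical, and the sampling rate in the usage note is an illustrative assumption.

```python
def trapezoidal_mean(x):
    """Mean of one switching period of samples, per Eq. (1):
    the average of the N-1 trapezoid heights (x[i] + x[i-1]) / 2."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two samples")
    return sum((x[i] + x[i - 1]) / 2 for i in range(1, n)) / (n - 1)


def samples_per_period(f_sample, f_sw):
    """N = f_sample / f_sw, rounded to the nearest integer."""
    return round(f_sample / f_sw)
```

Take the 10 kHz switching frequency of Sect. 5 and an assumed 1 MS/s sampling rate. Then `samples_per_period(1e6, 10e3)` gives N = 100, which is exactly the two-orders-of-magnitude ratio recommended in [16].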

The function that implements this algorithm takes eight inputs:

- The sampled quantities: inductor current $i_L$, capacitor voltage $v_C$ and MOSFET gate voltage ($V_{gate}$);
- The time vector, containing the sampling instants;
- The constants of the circuit: the duty-cycle  $(D_{on})$ , the switching frequency  $(f_{sw})$ , the sampling frequency  $(f_{sample})$  and the input voltage  $(V_{in})$ .

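The load resistance is obtained as the ratio between the mean output voltage and the mean load current. In a boost converter operating in CCM, the inductor current reaches the load only while the MOSFET is off. The mean load current is therefore derived from the mean inductor current as $\langle I_L \rangle (1 - D_{on})$: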

$$R_{load} = \frac{\langle V_{out} \rangle}{\langle I_L \rangle (1 - D_{on})} \tag{2}$$
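Eq. (2) reduces to a one-line computation. The sketch below adds a guard on the duty cycle, which is our own addition. It assumes CCM operation with linear current ramps, so that the mean diode (load-side) current equals $\langle I_L \rangle (1 - D_{on})$. The function name is hypothetical.

```python
def load_resistance_ccm(v_out_mean, i_l_mean, d_on):
    """Eq. (2): R_load = <V_out> / (<I_L> (1 - D_on)), valid in CCM."""
    if not 0.0 <= d_on < 1.0:
        raise ValueError("duty cycle must be in [0, 1)")
    # Mean current delivered to the load through the diode.
    i_load_mean = i_l_mean * (1.0 - d_on)
    return v_out_mean / i_load_mean
```

For example, a 20 V mean output with a 2 A mean inductor current at $D_{on} = 0.5$ gives $R_{load} = 20\ \Omega$, the lowest load used in the experiments.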

As for the ESR, when the capacitor is in good condition its ESR is usually very small and the output voltage ripple has a triangular shape. The output voltage therefore reaches its mean value halfway through the ON interval, at $T_{on}/2$, where $T_{on}$ is the vector of sampling instants at which the MOSFET is ON. An increase in the ESR changes the shape of the output voltage, which at $T_{on}/2$ no longer reaches its average value. The ESR can then be estimated as the difference between the voltage at $T_{on}/2$ and its average value, divided by the output current (3):

$$ESR = -\frac{V_{out}(T_{on}/2) - \langle V_{out} \rangle}{\langle I_L \rangle} \tag{3}$$

The minus sign is present because $V_{out}(T_{on}/2)$ is smaller than the average voltage.

Fig. 2 Block diagram of the implemented system

The next step is to organize the data and identify the operating mode of the converter, continuous (CCM) or discontinuous (DCM). In CCM the algorithm needs only the $T_{on}$ vector, while in DCM both the $T_{on}$ and $T_{off}$ vectors are necessary. In DCM, the load resistance is estimated by (4):

$$R_{load} = \frac{(D_{on} + D_{off}) \langle V_{out} \rangle}{\langle I_L \rangle (1 - D)} \tag{4}$$
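The ESR estimate of Eq. (3) can be sketched as follows. The sketch makes three assumptions:

- the sampled gate signal is normalized so that a hypothetical threshold of 0.5 separates OFF from ON;
- the window contains one complete ON run;
- the sample closest to the middle of that run stands for $V_{out}(T_{on}/2)$.

Eq. (3) as printed divides by $\langle I_L \rangle$, while the prose speaks of the output current, so the sketch leaves the divisor to the caller. The function names are hypothetical.

```python
def on_interval(gate, threshold=0.5):
    """Indices [start, end) of the first run of samples with the MOSFET ON."""
    start = None
    for i, g in enumerate(gate):
        if g > threshold and start is None:
            start = i
        elif g <= threshold and start is not None:
            return start, i
    if start is None:
        raise ValueError("no ON interval in the sampled gate signal")
    return start, len(gate)


def estimate_esr(v_out, gate, v_out_mean, current, threshold=0.5):
    """ESR = -(V_out(T_on/2) - <V_out>) / current, following Eq. (3).

    V_out(T_on/2) is taken as the sample in the middle of the ON run.
    `current` is <I_L> as in Eq. (3) as printed, or the output current."""
    start, end = on_interval(gate, threshold)
    v_mid = v_out[(start + end - 1) // 2]  # middle sample of the ON run
    return -(v_mid - v_out_mean) / current
```

With a healthy capacitor, `v_mid` is close to the mean and the estimate tends to zero. A grown ESR pushes `v_mid` below the mean and yields a positive value.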

Finally, by sampling $i_L$ and $V_{out}$ in the conducting state ($T_{on}$), the inductance and the capacitance can be estimated with the least mean squares method. The implemented algorithm consists of the following steps:

1. Sampling of the state variables ($i_L$, $V_{out}$) and of the MOSFET gate signal during a switching cycle;
2. Evaluation of the averaged state variables;
3. Identification of the operating mode (CCM or DCM) and evaluation of $T_{on}$ and $T_{off}$;
4. Estimation of $R_{load}$;
5. Estimation of the ESR;
6. Estimation of the inductance by the LMS algorithm, based on $i_L$ sampled during $T_{on}$;
7. Estimation of the capacitance by the LMS algorithm, based on $V_{out}$ sampled during $T_{on}$.

The described technique is very accurate, but to preserve its accuracy the sampling frequency must be relatively high: at least two orders of magnitude above the switching frequency [16]. This is easily achieved by the STM32 board and allows the estimation to be performed within a single switching period.
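Steps 6 and 7 can be sketched with an ordinary least-squares line fit over the ON-interval samples. The physical model is our own simplifying assumption, with an ideal switch and diode and parasitic drops neglected:

- During $T_{on}$ the inductor sees $V_{in}$, so $di_L/dt = V_{in}/L$.
- During $T_{on}$ the capacitor alone feeds the load, so $C\,dv_{out}/dt \approx -\langle V_{out}\rangle/R_{load}$.

This is not necessarily the continuous-time model of [16], and the function names are hypothetical.

```python
def ls_slope(t, y):
    """Ordinary least-squares slope of y versus t."""
    n = len(t)
    if n < 2 or n != len(y):
        raise ValueError("need two or more paired samples")
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    num = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den


def estimate_inductance(t_on, i_l_on, v_in):
    """ON state, ideal switch: L * di_L/dt = V_in  ->  L = V_in / slope."""
    return v_in / ls_slope(t_on, i_l_on)


def estimate_capacitance(t_on, v_out_on, v_out_mean, r_load):
    """ON state, diode off: C * dv_out/dt = -<V_out> / R_load."""
    return -v_out_mean / (r_load * ls_slope(t_on, v_out_on))
```

Fitting a line to all ON samples, rather than differencing two of them, averages out sample-to-sample noise. This is one reason why a high sampling rate helps accuracy.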

## 3 Hardware Description

Figure 2 shows a simplified block diagram of the implemented system. The microcontroller acquires four signals from the Boost converter, executes the algorithm for the estimation of the parameters and sends the data to a notebook through the serial port. The system also generates the PWM signal for the gate of the MOSFET.

The *STM32F303RE* is based on the high-performance ARM Cortex-M4 32-bit RISC core, operating at up to 72 MHz and embedding a floating point unit (FPU) [19]. Among other features, it includes four ADCs with sampling rates up to 5 MSPS, two 12-bit DAC channels with an analog supply from 2.4 to 3.6 V, several communication interfaces and a number of timers.

Table 1 PI parameters for different working conditions

| $R_{load}$ (Ω) | C (µF) | $K_p$ | $K_i$ |
|----------------|--------|-------|-------|
| 20 | 165 | $5.96 \cdot 10^{-4}$ | 6 |
| 20 | 99 | $9.94 \cdot 10^{-4}$ | 10 |
| 20 | 33 | $2.78 \cdot 10^{-3}$ | 28 |
| 40 | 165 | $2.98 \cdot 10^{-4}$ | 3 |
| 100 | 165 | $1.195 \cdot 10^{-4}$ | 1.2 |
| 200 | 165 | $8.57 \cdot 10^{-5}$ | 0.4 |

## 4 Digital Self-tuning Controller

A continuous-time approach has been adopted to design the converter's controller: the controller is synthesized in continuous time and subsequently discretized. The digital PI regulator is a Dahlin regulator, chosen because it offers several useful features [17] and because its parameters can be calculated from the circuit parameter values [18]. Note that the Dahlin regulator imposes the desired dynamics by cancelling the poles of the system; for this reason, the circuit parameters must be estimated accurately.

$$u(k) = u(k-1) + K_p \left[ e(k) - e(k-1) + T_s \frac{K_i}{K_p} e(k) \right] \tag{5}$$

In (5), $k$ is the discrete-time index, $e(k)$ the control error, $u(k)$ the controller output, $T_s$ the sampling time, and $K_p$ and $K_i$ the PI gains. $K_p$ and $K_i$ are adapted on the basis of the parameters estimated by the algorithm implemented in the microcontroller firmware, so that the digital PI performs the auto-tuning of the converter. The values of these key parameters are summarized in Table 1 for some representative working conditions.
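Eq. (5) is the velocity (incremental) form of a PI controller. Its state is just the previous output and the previous error, so new $K_p$ and $K_i$ values can be loaded between steps without resetting anything. This property suits on-line re-tuning.

A minimal sketch follows. The class name, the clamping of $u$ to a [0, 1] duty-cycle range and the example sampling time are our assumptions, not details given in the paper.

```python
class IncrementalPI:
    """Velocity-form digital PI of Eq. (5):
    u(k) = u(k-1) + Kp*[e(k) - e(k-1)] + Ts*Ki*e(k)."""

    def __init__(self, kp, ki, ts, u_min=0.0, u_max=1.0):
        self.kp, self.ki, self.ts = kp, ki, ts
        self.u_min, self.u_max = u_min, u_max
        self.u_prev = 0.0
        self.e_prev = 0.0

    def set_gains(self, kp, ki):
        """Load new gains, e.g. after a new parameter estimate."""
        self.kp, self.ki = kp, ki

    def step(self, e):
        u = self.u_prev + self.kp * (e - self.e_prev) + self.ts * self.ki * e
        u = min(max(u, self.u_min), self.u_max)  # keep a valid duty cycle
        self.u_prev, self.e_prev = u, e
        return u
```

Clamping the accumulated output also acts as a simple anti-windup, since $u(k-1)$ can never drift outside the admissible range. As an example, assume one update per 10 kHz switching period ($T_s = 100$ µs). After an estimate matching the 20 Ω / 165 µF row of Table 1, the firmware could then call `set_gains(5.96e-4, 6)`.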

## 5 Experimental Results

To verify the proposed method, a boost converter has been built and equipped with voltage and current sensors to sample the two state variables $(i_L, v_C)$. The remaining parameters are computed routinely by the microcontroller system on the basis of the feedback control. To emulate a variation of the capacitance or of the ESR over time, the prototype has been fitted with five parallel-connected capacitors and five small resistors that can be freely selected during operation. The switching frequency of the MOSFET is 10 kHz. The main parameters of the converter are summarized in Table 2, while the prototype is shown in Fig. 3. To reject high-frequency noise, a low-pass filter has been placed at the input of the A/D converter, together with a Zener diode to protect it against dangerous overvoltages.

Table 2 Main parameters of the boost converter

| Symbol | Rated values | Supplier | Code |
|--------|--------------|----------|------|
| MOSFET | $R_{DS(on)} = 0.09\ \Omega$ | INFINEON | IRF530NPBF1 |
| Diode | $I_{Dmax} = 15$ A, $V_{Dmax} = 600$ V | Vishay Semiconductors | VS-15ETL06 |
| $R_{load}$ | 20–200 Ω | Vishay Semiconductors | RS005 |
| L | 270 µH | Coilcraft | PCV-2-274-10L |
| C | 5 × 33 µF | Panasonic FC series | EEUFC1V330 |
| ESR | 5 × 200 mΩ | Vishay Semiconductors | LVR03R2000FE12 |



Fig. 3 Prototype of the boost converter with the STM32 Nucleo

### 5.1 Parameter Estimation

Several tests have been performed by varying both the output capacitance and the ESR through suitable jumpers mounted on the PCB. For each configuration, the reported estimate is the mean of the repeated estimates, and their variance is taken as the error. Table 3 shows the results obtained by varying the ESR, and Table 4 those obtained by varying the capacitance. Table 4 shows that the error decreases as the capacitance decreases. This occurs because a higher capacitance yields a lower voltage ripple, which worsens the signal-to-noise ratio.

Table 3 Experimental results varying the ESR

| Known ESR (mΩ) | Estim. ESR (mΩ) | Estim. $R_{load}$ (Ω) | Estim. C (µF) | Estim. L (µH) |
|----------------|-----------------|-----------------------|---------------|---------------|
| 40 | $4 \pm 9$ | $20.13 \pm 0.05$ | $161 \pm 2$ | $269 \pm 2$ |
| 50 | $55 \pm 8$ | $20.10 \pm 0.02$ | $156 \pm 10$ | $268 \pm 3$ |
| 70 | $79 \pm 10$ | $20.11 \pm 0.02$ | $159 \pm 12$ | $266 \pm 5$ |
| 100 | $107 \pm 9$ | $20.09 \pm 0.02$ | $155 \pm 11$ | $267 \pm 2$ |
| 200 | $209 \pm 11$ | $20.13 \pm 0.02$ | $155 \pm 10$ | $265 \pm 5$ |

Table 4 Experimental results varying the capacitance

| Known C (µF) | Estimated C (µF) |
|--------------|------------------|
| 165 | $175 \pm 13$ |
| 132 | $141 \pm 10$ |
| 99 | $104 \pm 5$ |
| 66 | $67 \pm 3$ |
| 33 | $35 \pm 2$ |

### 5.2 Auto-Tuning Performance

Several experimental tests have been performed by tuning the digital regulator parameters as explained in Sect. 4. First, the regulator was set up for the worst case, which in our setup corresponds to a load resistance of 200 Ω. Simulations have also been carried out to investigate the controller behavior under different conditions, with and without auto-tuning. They showed that at $R_{load} = 40\ \Omega$ the adaptive tuning avoids oscillations and reaches steady state with a shorter settling time. Figure 4 shows an experimental test in which the output voltage is measured while the load undergoes a step change from 20 Ω to 40 Ω.

Fig. 4 Experimental test carried out by a load variation from 20 Ω to 40 Ω

In particular, the orange trace depicts the step response under PI control without auto-tuning, while the blue trace shows the response with auto-tuning. This comparison clearly shows the improvement in dynamic performance (lower peak and faster response), as well as a lower steady-state error.

## 6 Conclusions

A diagnostic system that estimates the parameters of a boost converter has been conceived and implemented on an STM32 Nucleo board. It provides the values of the output capacitance and its equivalent series resistance, together with the inductance and the load resistance, which are useful to prevent converter failures. The main novelty lies in a monitoring system that simultaneously uses these monitored values to auto-tune the converter through a Dahlin controller. Moreover, the algorithm requires only one switching period to perform the estimation, in both CCM and DCM, while running on low-cost hardware.

## References

1. Yang, S., Bryant, A., Mawby, P., Xiang, D., Ran, L., Tavner, P.: An industry-based survey of reliability in power electronic converters. IEEE Trans. Ind. Appl. 47(3), 1441–1451 (2011)
2. Amaral, A., Cardoso, A.: On-line fault detection of aluminum electrolytic capacitors in step-down DC–DC converters, using input current and output voltage ripple. IET Power Electronics, pp. 315–322 (2011)
3. Lahyani, A., Venet, P., Grellet, G., Viverge, P.: Failure prediction of electrolytic capacitors during operating of a switch mode power supply. IEEE Trans. Power Electron. 13, 1199–1207 (1998)
4. Harada, K., Katsuki, A., Fujiwara, M.: Use of ESR for deterioration diagnosis of electrolytic capacitor. IEEE Trans. Power Electron. 8, 355–361 (1993)
5. Kulkarni, C., Biswas, G., Koutsoukos, X., Celaya, J., Goebel, K.: Integrated diagnostic/prognostic experimental setup for capacitor degradation and health monitoring. IEEE Autotestcon, 13–16 Sept 2010
6. Yao, K., Tang, W., Hu, W., Lyu, J.: A current-sensorless online ESR and C identification method for output capacitor of buck converter. IEEE Trans. Power Electron. 30(12), 6993–7005 (2015)
7. Laadjal, K., Sahraoui, M., Cardoso, A.J.M., Amaral, A.M.R.: On-line estimation of aluminum electrolytic-capacitor parameters using a modified Prony's method. In: IEEE 11th International Symposium on Diagnostics for Electrical Machines, Power Electronics and Drives (SDEMPED), pp. 387–393 (2017)
8. Catelani, M., Ciani, L., Luchetta, A., Manetti, S., Piccirilli, M.C., Reatti, A., Kazimierczuk, M.K.: MLMVNN for parameter fault detection in PWM DC-DC converters and its applications for buck DC-DC converter. In: IEEE 16th International Conference on Environment and Electrical Engineering (EEEIC) (2016)
9. Sun, Q., Wang, Y., Jiang, Y., Wu, Y.: On-line component-level soft fault diagnostics for power converters. In: Prognostics and System Health Management Conference, pp. 1–5 (2016)
10. Sun, Q., Wang, Y., Jiang, Y., Shao, L.: Condition monitoring and prognosis of power converters based on CSA-LSSVM. In: International Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), pp. 524–529 (2017)
11. Lukić, Z., Ahsanuzzaman, S.M., Zhao, Z., Prodić, A.: Sensorless self-tuning digital CPM controller with multiple parameter estimation and thermal stress equalization. IEEE Trans. Power Electron. 26(12), 3948–3963 (2011)
12. Corradini, L., Mattavelli, P., Stefanutti, W., Saggini, S.: Simplified model reference-based autotuning for digitally controlled SMPS. IEEE Trans. Power Electron. 23(4), 1956–1963 (2008)
13. Costabeber, A., Mattavelli, P., Saggini, S., Bianco, A.: Digital autotuning of dc-dc converters based on model reference impulse response. In: Twenty-Fifth Annual IEEE Applied Power Electronics Conference and Exposition (APEC), pp. 1287–1294 (2010)
14. Garg, M., Hote, Y.V., Pathak, M.K., Behera, L.: Approach for buck converter PI controller design using stability boundary locus. In: IEEE/PES Transmission and Distribution Conference and Exposition (T&D), pp. 1–5 (2018)
15. De Keyser, R., Ionescu, C.M., Muresan, C.I.: Comparative evaluation of a novel principle for PID autotuning. In: 11th Asian Control Conference (ASCC), pp. 1164–1169 (2017)
16. Buiatti, G.M., Amaral, A.M.R., Cardoso, A.J.M.: An online technique for estimating the parameters of passive components in non-isolated DC/DC converters. IEEE Ind. Electron., 606–610 (2007)
17. Zhang, W.D., Sun, Y.X., Xu, X.M.: Robust digital controller design for processes with dead times: new results. IEE Proc. Control Theory Appl. 145, 159–164 (1998)
18. Tahri, F., Tahri, A., Allali, A., Flazi, S.: The digital self-tuning control of step a down DC-DC converter. Acta Polytechnica Hungarica 9(6) (2012)
19. STMicroelectronics, STM32F303xD—STM32F303xE, Datasheet rev. n°5 (2016)

# Developing a Machine Learning Library for Microcontrollers



Andrea Parodi, Francesco Bellotti, Riccardo Berta and Alessandro De Gloria

A. Parodi · F. Bellotti (🖂) · R. Berta · A. De Gloria

DITEN, Università degli Studi di Genova, Via Opera Pia 11/a, 16145 Genoa, Italy e-mail: Franz@elios.unige.it

R. Berta e-mail: Berta@elios.unige.it

A. De Gloria e-mail: ADG@elios.unige.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_36

**Abstract** With the appearance of tools supporting the emerging paradigm of edge computing, we expect that low-cost microcontrollers will become appealing execution platforms for machine learning as well. To explore this field, we implemented the Machine Learning eMbedded Library $(ML)^2$. We tested it in a simple case (classifying human movement as normal or not) and on a benchmark dataset, to obtain a first performance comparison among the implemented algorithms. The results, in terms of accuracy and of execution time for both training and classification, are promising. They encourage the next steps of our work: extending the set of implemented algorithms and testing them in more depth. In any case, we believe that these preliminary results should spur the Internet of Things research community to devise distributed computing algorithms able to support ML computation as close as possible to the source.

## 1 Introduction

With the diffusion of the "Internet of Things", there is a growing demand for processing data and signals from sensors that monitor environments. Artificial intelligence (AI) algorithms based on machine learning (ML) are well suited to high-level processing, extracting useful information from raw data.

The edge computing paradigm has recently gained increasing attention, as it proposes a dense distributed computing model in which as much processing as possible is done near the acquisition nodes [1, 2], offloading the cloud and remote units. Microcontrollers become key devices in such new scenarios, given their wide diffusion in several application domains and their versatility.

However, few ML solutions currently exist for embedded systems, partly because microcontrollers are typically small-footprint, low-cost, low-power devices, whereas ML requires high computing power.

We investigated this emerging research field of distributed intelligence by developing a software library that currently includes a few of the most common and basic learning and classification techniques. We addressed both model learning and online classification on the board, in order to understand the potential and the limits of the microcontroller platform.

The library (available as open source at https://github.com/andreaparodi/ML\_Lib) currently includes three well-established supervised learning algorithms, namely k-nearest neighbors (k-NN), an artificial neural network (ANN) and a perceptron classifier (PC). They provide the same functionality with very different implementations.

## 2 Test Case

We made a first test classifying human movement, distinguishing "normal" walking from "deviated" walking (e.g., a person under the influence of alcohol).

Instead of classifying individual samples, we used time windows of about 5 s, sampling approximately every 100 ms. This yields 300 values for each item to be classified: 50 samples × 6 dimensions, i.e., accelerometer and gyroscope readings on each of the three axes. Such high dimensionality makes real-time classification impossible without some form of data reduction.

Unlike other solutions in the literature (e.g., [3]), we extracted the following 15 statistical features to represent each item to be classified: the sample mean of the accelerometer and gyroscope on each axis (6 in total), the sample standard deviation of each component (6 in total), and the Pearson correlation between accelerometer and gyroscope values on the same axis (3 in total).
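The 15-feature reduction can be sketched as below. The sketch makes three assumptions of our own, and the library's own C implementation may differ:

- each 5 s window is stored as 50 tuples ordered (ax, ay, az, gx, gy, gz);
- the standard deviation is the sample version, with an n − 1 denominator;
- a constant channel yields a correlation of 0.

```python
import math


def _mean(v):
    return sum(v) / len(v)


def _std(v, m):
    """Sample standard deviation (n - 1 denominator)."""
    return math.sqrt(sum((x - m) ** 2 for x in v) / (len(v) - 1))


def _pearson(a, b, ma, mb):
    """Pearson correlation; returns 0.0 if either channel is constant."""
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def extract_features(window):
    """Map a window of (ax, ay, az, gx, gy, gz) samples to 15 features:
    6 means, 6 standard deviations, 3 same-axis accel/gyro correlations."""
    channels = list(zip(*window))  # 6 channels, one per sensor axis
    means = [_mean(c) for c in channels]
    stds = [_std(c, m) for c, m in zip(channels, means)]
    corrs = [_pearson(channels[k], channels[k + 3], means[k], means[k + 3])
             for k in range(3)]  # (ax, gx), (ay, gy), (az, gz)
    return means + stds + corrs
```

A 50 × 6 window thus shrinks to 15 numbers, a 20× reduction of the input vector used for on-board training and classification.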

Reducing the number of features, and consequently of operations, enables on-board training and real-time classification. Figure 1 (whose data were computed by inspecting the relevant algorithms) shows how this reduction affects the total number of operations needed to classify and to train on a new sample. The counts assume the following settings:

- an input vector size equal to the number of features (300 or 15, respectively);
- 100 training examples;
- 30 hidden nodes (a bare minimum for good results);
- 1 output node (binary classification);
- 100 training cycles for the perceptron classifier;
- 1 training cycle for the ANN.
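As a rough illustration of the effect of this reduction, the sketch below counts the multiply-accumulate operations needed to classify one item. It uses our own simplified cost model:

- k-NN: one term per feature per stored example;
- ANN: a single-hidden-layer forward pass;
- PC: one dot product.

These are not the figures plotted in Fig. 1, which were obtained by inspecting the actual algorithms, and the function name is hypothetical.

```python
def classification_macs(d, n_train=100, hidden=30, outputs=1):
    """Rough multiply-accumulate counts to classify one item with d
    input features (simplified model, not the data of Fig. 1)."""
    return {
        "k-NN": n_train * d,                   # one distance term per feature per example
        "ANN": d * hidden + hidden * outputs,  # forward pass, one hidden layer
        "PC": d,                               # single weighted sum
    }
```

Going from 300 to 15 features cuts the k-NN and PC counts by a factor of 20 (30000 to 1500, and 300 to 15). The ANN count drops by roughly 19 times (9030 to 480), because the hidden-to-output term does not shrink.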

Fig. 1 Number of operations necessary for the 300 (above) and 15 (below) input vector sizes

Real-time data acquisition was performed through an additional shield, the "X-NUCLEO-IKS01A2", designed specifically for the STM32 family. Among the sensors it embeds, we used the "LSM6DSL", which incorporates a 3D accelerometer ($\pm 2/\pm 4/\pm 8/\pm 16$ g) and a 3D gyroscope ($\pm 125/\pm 245/\pm 500/\pm 1000/\pm 2000$ dps). The functions for its initialization and data acquisition were written according to the datasheet [4].

We used an STM32F401RE microcontroller, chosen mainly for its versatility, its low cost and its powerful 84 MHz ARM *Cortex*-M4 processor, which includes a floating point unit (FPU) [5]. It has 512 KB of Flash memory, 96 KB of SRAM and 64 KB of core coupled memory (CCM).

The library is independent of the specific microcontroller. Our test platform has an FPU. Without an FPU, the algorithms can work in fixed-point arithmetic, because both the dataset and the model parameters can be represented with 8 significant digits (3 + 5) while losing very little information. The cost is a small decrease in performance.
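A minimal sketch of such a decimal fixed-point format follows. It assumes that "3 + 5" means 3 integer and 5 fractional decimal digits, which is our reading of the text. Values are stored as integer multiples of $10^{-5}$, and their magnitude never exceeds 99,999,999, so they fit a signed 32-bit integer. In C, a product would need a 64-bit intermediate. The function names are hypothetical and not part of the library.

```python
SCALE = 100_000      # 5 fractional decimal digits
MAX_Q = 99_999_999   # 8 significant digits; fits in a signed 32-bit int


def _check(q):
    if abs(q) > MAX_Q:
        raise OverflowError("value outside the 3+5 digit range")
    return q


def to_fixed(x):
    """Encode a real value as an integer count of 1e-5 units."""
    return _check(round(x * SCALE))


def to_float(q):
    return q / SCALE


def fixed_add(a, b):
    return _check(a + b)


def fixed_mul(a, b):
    """Rounded product; in C the intermediate a*b needs int64_t."""
    p = a * b
    q = (abs(p) + SCALE // 2) // SCALE  # round half away from zero
    return _check(q if p >= 0 else -q)
```

For example, `fixed_mul(to_fixed(1.5), to_fixed(-0.25))` returns -37500, i.e. -0.375. Any value of magnitude 1000 or more is rejected rather than silently wrapped.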

## 3 Results

The effectiveness of the library algorithms was first verified online, with a test set obtained through direct data collection. This human movement classification test gave very good results, but it is not reported because it is not repeatable. It was carried out on two subjects only, mainly because data collection takes a long time and in some cases must be repeated for each subject.

Table 1 reports the results achieved using the recorded training and test sets. Both contain an equal number of samples for each class (100 in total for training and 50 for testing). Results are reported for three configurations:

- a single k-NN configuration (K = 11, Euclidean distance, majority vote);
- an ANN trained with the whole training set (learning rate = 0.125, crosstrain disabled, 30 hidden nodes, single hidden layer);
- the PC (learning rate = 0.1).

As expected, the most complex algorithms (ANN and k-NN) achieve the best accuracy.

Table 1 Results for the movement recognition test

| Classifier | Training time (s) | Classification time (ms) | Errors (%) |
|------------|-------------------|--------------------------|------------|
| k-NN | - | 0.18 | 2 |
| ANN | 84.6 | 0.3 | 0 |
| PC | 0.76 | 0.037 | 12 |
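For reference, the k-NN configuration used above (K = 11, Euclidean distance, majority vote) can be sketched as follows. This is a hypothetical Python illustration, not the library's C code. It compares squared distances, which preserve the neighbor ordering, and sorts all distances for clarity. A memory-constrained implementation would instead keep only the K best candidates.

```python
from collections import Counter


def knn_classify(train_x, train_y, x, k=11):
    """Label of x by majority vote among the k training examples
    closest to it in Euclidean distance."""
    if not 0 < k <= len(train_x):
        raise ValueError("k must be between 1 and the training-set size")
    # Squared distances keep the same ordering, so the sqrt is skipped.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
        for xi, yi in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

Every prediction scans the whole stored training set, which is why k-NN classification time and memory grow with the number of examples. With an odd K and two classes, the vote cannot tie.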

Doing everything on the board has costs, in particular in terms of memory size, training time, input vector size and number of training elements. In a second test (whose results we do not report in full for space limitations), we used a benchmark dataset available online [6]. It contains the readings of a sonar that analyzes materials, distinguishing rocks from metallic material. The dataset has 60 features per sample (thus also serving as a stress test), a training set of 191 samples and a test set of 100 samples, neither of them equally distributed between the classes. The ANN training time exceeded the 30-min limit we had set, so ANN training was performed remotely. Classification times were negligible for all three techniques, but the accuracy of the PC was poor (40%) because the classes are not linearly separable. Surprisingly, k-NN performed better than the ANN (3% vs. 6% error). In general, classifier accuracy comes at a price. The ANN pays it in training time and in the number of values that must be stored. k-NN pays it in memory, needed to store the examples, and in prediction time, which grows with the number of examples because each one must be compared with the item to classify.
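The effect of linear separability on the PC can be illustrated with a minimal perceptron sketch. It uses labels in {0, 1}, the learning rate of 0.1 from the movement test and at most 100 training cycles, as assumed for Fig. 1. This is the textbook perceptron rule, not necessarily the exact update implemented in the library, and the function names are hypothetical.

```python
def perceptron_predict(w, b, x):
    """Threshold unit: 1 if w.x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(xs, ys, lr=0.1, epochs=100):
    """Classic perceptron rule for labels in {0, 1}; returns (w, b)."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(xs, ys):
            delta = y - perceptron_predict(w, b, x)
            if delta:
                errors += 1
                w = [wi + lr * delta * xi for wi, xi in zip(w, x)]
                b += lr * delta
        if errors == 0:  # a separating hyperplane has been found
            break
    return w, b
```

On the linearly separable AND function, training converges within a few cycles. On XOR, no weight vector classifies all four points, however many cycles are run. This is the same limitation that penalizes the PC on the sonar data.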

## 4 Conclusions

The emerging paradigm of edge computing relies on processing information as close as possible to its source, in order to use resources more efficiently. With the appearance of tools supporting this new perspective of distributed computing, we expect that low-cost microcontrollers will become appealing execution platforms for machine learning as well. To explore this field, we implemented the Machine Learning eMbedded Library (ML)<sup>2</sup>, which currently features just three basic algorithms. We assessed its performance in a simple test case (classifying human movement as normal or not) and on a benchmark dataset.

The results, in terms of accuracy and of execution time for both training and classification, are promising. They encourage the next steps of our work: extending the set of implemented algorithms and testing them in more depth. In any case, we believe that these preliminary results should spur the Internet of Things research community to devise distributed computing algorithms able to support ML computation as close as possible to the source.

## References

- 1. Satyanarayanan, M.: The emergence of edge computing. Computer 50(1), 30–39 (2017)
- 2. Shi, W., Dustdar, S.: The promise of edge computing. Computer 49(5), 78–81 (2016)
- 3. Arias, P., Kelley, C., Mason, J., Bryant, K., Roy, K.: Classification of user movement data. In: ICDSP 2018, Tokyo, Japan, 25–27 February 2018
- 4. https://www.st.com/en/mems-and-sensors/lsm6dsl.html
- 5. https://www.st.com/en/microcontrollers/stm32f401.html?querycriteria=productId=LN1810
- 6. http://fizyka.umk.pl/kis-old/projects/datasets.html#Sonar

# The Case for RISC-V in Space



Stefano Di Mascio, Alessandra Menicucci, Gianluca Furano, Claudio Monteleone and Marco Ottavi

Abstract This paper presents a preliminary position on the use of the novel, free and open RISC-V Instruction Set Architecture (ISA) for on-board electronics in space. The modular nature of this ISA, a rich software ecosystem, a rapidly growing community and a pool of open-source IP cores will allow the Space Industry to spin in developments from terrestrial fields (security, artificial intelligence, support for operating systems, hardware acceleration, etc.). The Space Industry can then focus its efforts mainly on the specific needs of on-board electronics for space applications (e.g. fault tolerance, observability, error signaling, etc.). This will improve reuse and avoid developing from scratch when not strategically needed, eventually increasing productivity and reducing costs. The use of an open, non-proprietary ISA will allow ad-hoc design of microarchitecture-level soft-error countermeasures that can greatly increase the robustness of Application Specific Standard Product (ASSP) and FPGA implementations.

## 1 Introduction

While open-source software has been around for decades, being the driving force behind most of the Internet and all of the top-500 supercomputers [1], hardware has not yet fully experienced the disruptive effects of openness. Nevertheless, RISC-V has risen in popularity in recent years. It has drawn the attention of several universities and companies that previously focused on other open and free ISAs, on proprietary ISAs or even on ISAs designed in-house (with the big drawback of having to design and maintain a software ecosystem).

S. Di Mascio · A. Menicucci

Delft University of Technology, Faculty of Aerospace Engineering, Space Systems Engineering, Delft, The Netherlands

G. Furano · C. Monteleone European Space Agency, European Space Research and Technology Centre, Noordwijk, The Netherlands

M. Ottavi (🖂) Department of Electronic Engineering, University of Rome Tor Vergata, Rome, Italy e-mail: ottavi@ing.uniroma2.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_37

Fig. 1 Main CPU ISAs and their market share evolution in Europe and the US. In bold: SoCs. TCLS and DAHLIA are H2020 EU-funded research projects, while SPARC IP and SoC development in Europe has been largely funded by ESA

In the space industry, LEON anticipated what is now happening with RISC-V in the COTS market. There were compelling reasons to use an open and free ISA such as SPARC when LEON was introduced for space [2], and the code of the non-FT version is publicly available. The European Space Industry (and a large part of the worldwide space community) is using LEON-based Systems-on-Chip (SoCs) in all ongoing and planned missions [3]. Most of the cost of these custom SoCs lies in design. The introduction of reusable standard IP libraries reduced design complexity, allowing all major European players to develop their own LEON-based processors and product lines (Fig. 1).

LEON was designed for a single function (processor), targeting an ASSP that arrived in 2009 with Atmel's AT697F. However, it has been increasingly used as an SoC platform since the very beginning. This created a positive feedback loop between the availability of IP libraries and the use of commercial ones in several space-grade SoCs. The availability of large rad-hard FPGAs (especially after the introduction of the ACTEL/Microsemi RTAX series) did the rest. Space component and unit manufacturers now have a very efficient SoC platform that helps minimize design and SW development work, resulting in many first-silicon-good ASICs. The LEON 'ecosystem' is now mature (mostly thanks to ESA and Cobham Gaisler). Its portability and CAD tool independence allow it to support both ASIC and FPGA technologies, taking the best of both worlds. Adopting a new ISA for the design of safety- and mission-critical processors requires robustness and dependability of the whole infrastructure around the processor IP:

- Complete, easy to reproduce verification with dedicated tool set
- Clear documentation on how to use it, with reference designs
- Mature, maintained SW tools
- Robustness validated through Silicon implementation and test
- Widespread knowledge in academia and industry.

## 2 The RISC-V ISA

RISC-V was originally developed at UC Berkeley to support computer architecture research and education oriented towards hardware implementations, because its developers could not find an existing ISA fit for that purpose. OpenRISC illustrates the limits of previously existing free ISAs: it exposes microarchitectural choices such as branch delay slots and is not designed to be modular. DARPA funded RISC-V at its very beginning [4] and continues to fund other activities related to the spin-in of open-source IPs in trustable electronic systems. The reason is that open-source IPs and open ISAs can reduce the resources, time and complexity required for 'secure and trusted' custom SoC design. Detailed information about open-source IP cores can be obtained by inspection, and ad-hoc improvements or modifications for security are much easier. This avoids the need to design everything from scratch and ultimately increases reusability [5]. The Space Industry can apply many of these considerations to fault-tolerance enhancements of existing open-source COTS IPs. RISC-V is also backed by big commercial players: companies such as Google, Qualcomm, IBM, NVIDIA, Samsung and Western Digital are members of the RISC-V Foundation. Their support reflects growing concern about monopolistic positions in the embedded market. ISA owners protect their IP by not allowing freely available implementations or free-market competition from other core designers, which ultimately prevents reuse and ad-hoc designs. The adoption of a free and open ISA can thus lead to shorter time to market and lower costs through reuse.

The main feature of RISC-V is modularity. The RISC-V manual is structured in two volumes. One covers the user-level ISA, and the other describes the privileged architecture with three privilege levels (User, Supervisor and Machine mode). An implementation can employ machine mode only, machine and user mode (when security is a concern), or all three modes for Linux-like Operating Systems (OSs). The user-level ISA is defined as a base integer (I) ISA, which must be present in any implementation, plus optional extensions to the base ISA. The integer base is restricted to a minimal set of instructions sufficient to provide a reasonable target for compilers, assemblers, linkers and OSs (with additional supervisor-level operations). It therefore provides a convenient ISA and software toolchain skeleton around which more customized processor ISAs can be built. A reduced variant of the integer base (E) can optionally be implemented when targeting small 32-bit microcontrollers, with 16 general-purpose registers instead of 32. The standard defines a "general" subset (G) as the set of extensions required for general-purpose computing systems (e.g. from an instruction perspective, this subset is enough to run Linux). RISC-V allows both standard and non-standard extensions (the latter defined outside the specification). Other ISAs are treated as a single entity that moves to a new version as instructions are added over time. RISC-V instead aims to keep the base and the standard extensions frozen once ratified. This will ultimately increase software reuse, especially in the long term.
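The modular naming described above can be illustrated by decoding ISA strings such as RV32EC or RV64GC into width, base and extensions. The helper `parse_isa` below is hypothetical. It is a sketch that handles only single-letter standard extensions and ignores multi-letter (Z*/X*) names and version suffixes. It also treats G as I plus MAFD, leaving out the Zicsr/Zifencei components that later versions of the specification list explicitly.

```python
# Single-letter standard extensions recognised by this sketch (not exhaustive).
KNOWN_EXTENSIONS = {"M", "A", "F", "D", "Q", "C"}


def parse_isa(isa):
    """Decode an ISA string such as 'RV32IMAC' or 'RV64GC' (hypothetical helper)."""
    isa = isa.upper()
    if not isa.startswith("RV"):
        raise ValueError("ISA string must start with 'RV'")
    rest = isa[2:]
    digits = ""
    while rest and rest[0].isdigit():
        digits, rest = digits + rest[0], rest[1:]
    if digits not in ("32", "64", "128"):
        raise ValueError(f"unsupported width {digits!r}")
    if not rest:
        raise ValueError("missing base ISA letter")
    base, letters = rest[0], rest[1:]
    if base == "G":
        # G = integer base I plus the M, A, F and D standard extensions.
        base, letters = "I", "MAFD" + letters
    if base not in ("I", "E"):
        raise ValueError(f"unknown base {base!r}")
    extensions = []
    for letter in letters:
        if letter not in KNOWN_EXTENSIONS:
            raise ValueError(f"extension {letter!r} not handled by this sketch")
        extensions.append(letter)
    return {
        "xlen": int(digits),
        "base": base,
        # E halves the integer register file: 16 instead of 32 registers.
        "registers": 16 if base == "E" else 32,
        "extensions": extensions,
    }


print(parse_isa("RV32EC"))
# {'xlen': 32, 'base': 'E', 'registers': 16, 'extensions': ['C']}
print(parse_isa("RV64GC")["extensions"])
# ['M', 'A', 'F', 'D', 'C']
```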

The base of RISC-V is similar to the original RISC developed in the Berkeley RISC project [6]. It has been updated to account for new trends and needs of the embedded market, since it is relatively new and, as an open standard, allows open discussion of what must be included in it. For instance, the standard defines 32-bit (RV32), 64-bit (RV64) and even 128-bit (RV128) address-space variants. It is designed to support manycore implementations (including heterogeneous multiprocessors) and to be fully virtualizable to ease hypervisor development. It also provides features such as a 16-bit compressed instruction extension (C) to increase performance, code density and power efficiency. The standard atomic instruction extension (A) adds instructions that atomically read, modify and write memory for inter-processor synchronization. It uses load-link/store-conditional instructions instead of compare-and-swap, thus avoiding the ABA problem that affects the latter. It also allows straightforward use of modern crossbars that do not support locked accesses, such as AMBA AXI4. Furthermore, RISC-V is little-endian, allowing straightforward integration with the most popular state-of-the-art embedded infrastructures and proprietary architectures.
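The ABA argument above can be illustrated with a toy, single-address memory model. Compare-and-swap only checks that the value is unchanged. A store-conditional, by contrast, fails whenever any store has hit the reserved address since the matching load. This is a simplified sketch of the semantics with hypothetical names. It ignores reservation granularity, ordering annotations, and the atomic memory operations that the A extension also defines.

```python
class ToyMemory:
    """One memory word with value-compare CAS and reservation-based LR/SC."""

    def __init__(self, value):
        self.value = value
        self.reserved_by = set()   # harts holding a valid LR reservation

    def store(self, value):
        # Any store to the word invalidates every outstanding reservation.
        self.value = value
        self.reserved_by.clear()

    def compare_and_swap(self, expected, new):
        # Succeeds whenever the current value equals `expected`, even if the
        # word was changed and then changed back in the meantime.
        if self.value == expected:
            self.store(new)
            return True
        return False

    def load_reserved(self, hart):
        self.reserved_by.add(hart)
        return self.value

    def store_conditional(self, hart, new):
        # Succeeds only if no store has hit the word since this hart's LR.
        if hart in self.reserved_by:
            self.store(new)
            return True
        return False


def aba_demo():
    # CAS: hart 0 reads "A"; hart 1 then writes "B" and writes "A" back.
    mem = ToyMemory("A")
    seen = mem.value
    mem.store("B")
    mem.store("A")
    cas_ok = mem.compare_and_swap(seen, "C")   # the ABA change goes undetected

    # LR/SC: same interleaving, but the intervening stores drop the reservation.
    mem = ToyMemory("A")
    mem.load_reserved(hart=0)
    mem.store("B")
    mem.store("A")
    sc_ok = mem.store_conditional(hart=0, new="C")
    return cas_ok, sc_ok


print(aba_demo())  # (True, False)
```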

## 3 State of the Art

UC Berkeley and SiFive have released an open-source SoC generator called Rocket Chip, which makes it easy to configure an SoC and automatically generate synthesizable RTL (Verilog). It is written in Chisel, a hardware construction language based on Scala, and can be configured to generate a wide range of SoCs. Based on the Rocket Chip generator, SiFive has released several IPs, components and development boards (e.g. the Arduino-compatible HiFive1 and the Linux-capable HiFive Unleashed). SiFive also provided IP cores for Microsemi's Mi-V ecosystem for its line of flash-based FPGAs, which includes the radiation-tolerant RTG4. The RISC-V software ecosystem is maturing quickly. Several ISA simulators (e.g. Spike and QEMU), C compilers (e.g. GCC), C libraries (e.g. glibc and newlib) and debugging tools (e.g. gdb) are already available. This ecosystem has sparked the development of several open-source hardware platforms by universities and companies. For instance, ETH Zürich and the University of Bologna are working on the Parallel Ultra Low Power (PULP) Platform, an ultra-low-power processing platform mainly targeted at the Internet of Things (IoT) [7]. It is based on several processors, ranging from a simple 2-stage 32-bit core (RV32EC) supporting compressed instructions to a 6-stage Linux-capable core (RV64GC) with caches and TLBs. It comprises several SoC architectures (e.g. the single-core PULPino and the many-core PULP). The platform also provides IPs for communication (I2C, UART, SPI, etc.). It can easily be integrated in FPGAs and extended, as it is built around the popular AMBA AXI4 crossbar and the APB peripheral bus. The source code is written in SystemVerilog, an HDL suited to both design and verification that is becoming increasingly popular in the commercial field. Based on the PULPino SoC, Sapienza University of Rome released Klessydra, an open-source SoC with a multi-threaded CPU written in VHDL [8].
Other open-source implementations are already available, such as VectorBlox Computing's Orca. It is a stand-alone VHDL implementation of RV32I and RV32IM that targets FPGAs and was also developed for inclusion in the company's proprietary products. Established players have also announced developments based on RISC-V. For instance, Western Digital, a founder of the RISC-V Foundation, announced that over the next few years all the processors shipped within its products will transition to RISC-V [9]. Several other big players are reportedly working on a similar transition.

## 4 Future Developments and Benefits for Space Industry

Processors in space face unique challenges due to the effects of the space environment. Components must be designed taking into account the physical effects of a wide temperature range, mechanical vibrations, vacuum and radiation. When considering an IP core, however, all of these effects are lumped into functional faults. It is up to the IP core designer to consider how the space environment will induce functional faults and how to counteract them with an effective area/power/frequency trade-off. The designer can typically choose between:

- a Fault-Tolerant (FT) IP Core designed from scratch
- a FT processor obtained modifying a COTS IP core
- a COTS IP core without any modifications.

The last choice has become very popular in recent years. The use of proprietary COTS processors has been indicated as a way to reduce costs and increase performance, relying only on system-level enhancements for fault tolerance. However, such processors were not designed for the specific needs of electronics for space, and sometimes their missing functionalities cannot be compensated for at system level [10]. Observability is a typical example. If the IP core or the ISA was not designed to let the user know what is happening inside the box, the response at system level may be inadequate or inefficient, reducing safety and availability. Implementing fault tolerance at system level often implies heavy redundancy (e.g. Triple Modular Redundancy of the processors), with big penalties in terms of power and size. An FT-enhanced COTS IP core that remains compatible with the original software ecosystem therefore seems the best approach. It takes the effects of the space environment into account while keeping costs and development time for both hardware and software under control. The main obstacle is that commercial processors and ISAs are typically proprietary. Accessing the source of the IP core or obtaining a license to use the ISA may be too expensive, may limit what can be done with the final product, or may simply not be possible. If the FT processor is based on an open-source IP core implementing an open and free ISA, modifying the RTL becomes a viable option. Open and free ISAs designed to be modular and easily extendable, like RISC-V, have enabled a vast range of research activities for terrestrial applications (e.g. security, artificial intelligence, multi-threading, digital signal processing, etc.).
The Space Industry can then spin in developments from other fields, focusing its efforts mainly on the specific needs of space applications and without wasting effort on non-strategic activities.
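As an illustration of what system-level TMR entails, the sketch below shows a bitwise 2-out-of-3 majority voter (`tmr_vote` is a hypothetical name). A single upset copy is masked, but only at the price of keeping three copies of every protected resource.

```python
def tmr_vote(a, b, c):
    """Bitwise 2-out-of-3 majority of three redundant copies of a word."""
    return (a & b) | (a & c) | (b & c)


golden = 0b1011_0010
upset = golden ^ 0b0000_1000      # a single-event upset flips one bit in copy b
print(bin(tmr_vote(golden, upset, golden)))  # 0b10110010
```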

Open and free ISAs also offer advantages when FT processors are developed from scratch, as there is no need to pay for a proprietary ISA and the designer can own the final product.

## 5 Conclusion

Thanks to the open and modular nature of RISC-V, designers are free to implement whatever architecture is deemed best for their applications. The options range from low-performance/low-power microcontrollers, through high-performance CPUs for payload applications, to reliable processors handling a large number of tasks. As concerns grow about monopolistic positions in the embedded market, the introduction of RISC-V in space will help provide alternatives to proprietary solutions within a framework of new architectures for on-board embedded systems. RISC-V looks like the best solution in this case, as it is backed by big commercial players and academia for precisely this reason. Further work will give the European Space Industry the tools and knowledge required to choose proprietary solutions when actually needed and various degrees of openness when possible. This will lead the European Space Industry to build the next generation of embedded systems for space with a harmonious and effective mix of the two approaches, taking the best from both worlds.

## References

- 1. Margan, D., et al.: The success of open source software: a review. In: MIPRO (2015)
- 2. Gaisler, J.: A portable and fault-tolerant microprocessor based on the SPARC V8 architecture. In: International Conference on Dependable Systems and Networks (2002)
- 3. Furano, G., et al.: Roadmap for on-board processing and data handling systems in space. Springer International (2017)
- 4. Waterman, A., et al.: The RISC-V instruction set. In: 2013 IEEE Hot Chips 25 Symposium (HCS) (2013)
- 5. Salmon, L.G.: A perspective on the role of open-source IP in government electronic systems. In: 7th RISC-V Workshop Proceedings, November 2017
- 6. Patterson, D.A., et al.: A VLSI RISC. IEEE Computer (1982)
- 7. Rossi, D., et al.: PULP: a parallel ultra low power platform for next generation IoT applications. In: 2015 IEEE Hot Chips 27 Symposium (2015)
- 8. Cheikh, A., et al.: The microarchitecture of a multi-threaded RISC-V compliant processing core family for IoT end-nodes. In: APPLEPIES (2017)
- 9. Fink, M.: RISC-V: enabling a new era of open data-centric computing architectures. In: 7th RISC-V Workshop Proceedings, November 2017
- 10. Furano, G., et al.: A novel method for SEE validation of complex SoCs using low-energy proton beams. In: 2016 IEEE DFT (2016)

# Encoder-Motor Misalignment Compensation for Closed-Loop Hybrid Stepper Motor Control



Stefano Ricci, Valentino Meacci, Dario Russo and Riccardo Matera

Abstract Field-Oriented Control (FOC) of hybrid stepper motors allows high performance in motor movements. In FOC, the shaft position is tracked through an encoder (or a similar device), and the stator magnetic field orientation is continuously adjusted to maintain the desired load angle. The load angle is the phase between the magnetic fields produced by the stator windings and the rotor magnet. Unfortunately, measuring the load angle with high accuracy is not trivial. For example, in a typical 200 step/turn motor the electro-mechanical configuration repeats every 7.2°. A 5% load angle accuracy therefore requires a 5%/50 = 0.1% absolute accuracy in the alignment between motor windings and encoder and/or their coupling. When this is not achieved, torque and velocity are affected by oscillations. This paper proposes a simple solution: the misalignment errors are mapped with an open-loop motor run and then compensated during normal FOC operation. Experiments with a 1.1 Nm, 2-phase, 200 step/turn hybrid stepper motor show how the proposed method reduces velocity oscillations in a constant torque condition.

## 1 Introduction

Today most commercial and industrial apparatuses include several mechanical parts that must move quickly and accurately through a sequence of positions; a 3D printer is a notable example. The 2-phase hybrid stepper motor [1] is often selected for this kind of task. A typical 200 step/turn stepper motor allows the shaft position to be controlled with 1.8° precision using minimal electronics. In open loop, a digital device (typically a microcontroller) generates pulse bursts and a direction command. A power driver converts these into the 2-phase currents that feed the motor. For every input pulse, the driver modifies the phase currents so that the magnetic field produced in the stator rotates by a "step" or a fraction of it ($\mu$-step) [2].

S. Ricci (🖂) · V. Meacci · D. Russo · R. Matera

Information Engineering Department, University of Florence, Via S. Marta no. 3, 50123 Florence, Italy

e-mail: Stefano.ricci@unifi.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), Applications in Electronics Pervading Industry, Environment and Society, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_38

Fig. 1 The electro-mechanical configuration in a hybrid stepper motor repeats every 4 steps. The torque is related to the load angle $\theta_L$, which is the difference between the positions of the magnetic fields produced by the stator windings (FP) and the rotor magnet (RP)

Unfortunately, if the motor load accidentally exceeds the motor capacity, the controller loses track of the shaft position and the application fails. Closed-loop motor control [3] with the Field Oriented Control (FOC) method [4] solves this problem. In FOC, the physical position of the shaft (RP) is read by an encoder or a similar device (see Fig. 1) and compared with the position of the magnetic field (FP) generated by the stator windings. Their difference is the load angle, $\theta_L$, which is directly related to the motor torque *T*:

$$T = K \cdot \sin(\theta_L) \qquad \theta_L = FP - RP \tag{1}$$

where *K* is the torque constant. FOC dynamically adjusts FP for maximum torque, i.e. $\theta_L = 90^\circ$ (other strategies are possible depending on the application). An error in the readings of the shaft and/or winding magnetic field positions produces a corresponding error in the load angle, which affects the motor torque. The result is an undesired torque variation and velocity oscillation.
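Equation (1) makes the effect of a load-angle error easy to quantify. Assume FOC targets $\theta_L = 90^\circ$ and the position reading is off by $\delta$ electrical degrees. The actual load angle then becomes $90^\circ \pm \delta$, and the torque drops to $K\cos\delta$. The following is a minimal numeric sketch; the error values are illustrative, not measurements.

```python
import math


def fractional_torque(theta_l_deg):
    """T/K from Eq. (1), T = K * sin(theta_L), with theta_L in electrical degrees."""
    return math.sin(math.radians(theta_l_deg))


# FOC aims at theta_L = 90 deg; a reading error of delta shifts the actual
# load angle to 90 + delta (or 90 - delta), giving T/K = cos(delta).
for delta in (0, 5, 18, 36):
    print(delta, round(fractional_torque(90 + delta), 3))
# 0 1.0
# 5 0.996
# 18 0.951
# 36 0.809
```

The misalignment varies along the turn, so the torque, and hence the velocity, oscillates rather than simply dropping by a constant amount.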

Unfortunately, reading the relative positions of the shaft and the winding magnetic field with the necessary accuracy is not trivial, since several inaccuracies can affect the measurement. The winding magnetic field position is inferred from the currents imposed on the windings. This assumption does not account, for example, for possible inaccuracies in winding construction and symmetry. The encoder is typically connected through a flexible mechanical coupling. If the rotor and encoder axes are not perfectly aligned, the movement along the turn is not homokinetic. The shaft position is tracked by the encoder, whose possible non-linearity along the turn also affects the measurement. Consider, for example, a typical 200 step/turn stepper motor. The electro-mechanical configuration of the magnetic fields represented in Fig. 1 repeats every 4 steps, i.e. 50 times per turn [1]. Suppose we need to measure the load angle $\theta_L$ with, say, 5% accuracy. The inaccuracies mentioned above must then be less than 5%/50 = 0.1% of a turn. In other words, a small error in the construction of the motor, in the mechanical coupling between motor shaft and encoder, and/or in the linearity of the encoder itself will produce relatively large errors in the load angle assessment.
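The accuracy budget above can be checked numerically for the set-up used later in the paper (200 step/turn motor, 10000 tick/turn encoder). The sketch uses exact fractions to avoid rounding noise; the variable names are illustrative.

```python
from fractions import Fraction

STEPS_PER_TURN = 200
STEPS_PER_CYCLE = 4               # Fig. 1: the configuration repeats every 4 steps
ENCODER_TICKS_PER_TURN = 10_000   # encoder resolution used in the experiments

cycles_per_turn = STEPS_PER_TURN // STEPS_PER_CYCLE          # repetitions per turn
cycle_deg = Fraction(360, cycles_per_turn)                    # mechanical degrees
ticks_per_cycle = ENCODER_TICKS_PER_TURN // cycles_per_turn   # ticks per cycle
electrical_deg_per_tick = Fraction(360, ticks_per_cycle)

load_angle_accuracy = Fraction(5, 100)                 # 5% of an electrical cycle
turn_accuracy = load_angle_accuracy / cycles_per_turn  # fraction of a full turn
tolerance_ticks = turn_accuracy * ENCODER_TICKS_PER_TURN
tolerance_mech_deg = turn_accuracy * 360

print(f"electrical cycle: {float(cycle_deg):.1f} deg, {ticks_per_cycle} ticks")
print(f"1 tick = {float(electrical_deg_per_tick):.1f} electrical deg")
print(f"required accuracy: {float(turn_accuracy) * 100:.1f}% of a turn "
      f"= {float(tolerance_ticks):.0f} ticks = {float(tolerance_mech_deg):.2f} deg")
# electrical cycle: 7.2 deg, 200 ticks
# 1 tick = 1.8 electrical deg
# required accuracy: 0.1% of a turn = 10 ticks = 0.36 deg
```

With this encoder, the 5% budget corresponds to only 10 ticks out of 10000.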

This work proposes a simple but effective procedure for compensating this error. First, the misalignment errors between the theoretical position of the stator magnetic field and the shaft position are mapped by driving the stepper motor in open loop over one shaft turn. The resulting map is then saved and used during normal motor operation. The method was implemented in a Field Programmable Gate Array (FPGA), nowadays a standard platform in industrial applications [5, 6], within a system dedicated to motor control. The experiments presented show how this method reduces velocity oscillations in a 1.1 Nm stepper motor coupled with a 10 k pulse/turn encoder.

## 2 The Proposed Method

### 2.1 Correction Table Calculation

We assume a 200 step/turn hybrid motor connected to a 10 k pulse/turn encoder. Apart from the encoder, the motor has no load. The motor is driven in open loop in ½-step mode [1], i.e. 400 half-steps per turn. With reference to the pseudo-code of Fig. 2 (left), SP is the theoretical winding position over the physical motor turn in ½-step units, i.e. the $\mu$-step counter. P is the shaft position read by the encoder, in encoder units.

At the beginning, the motor is moved to the absolute zero reference provided by the encoder, which is taken as position P = 0. Here the $\mu$-step counter is zeroed as well (SP = 0). The algorithm then enters a loop. In each iteration the motor is moved forward by ½ step and SP is incremented accordingly. After each movement, the motor rests so that oscillations are damped. At rest, since the motor has a negligible load, the load angle is zero and the shaft aligns with the physical position of the winding field. In this condition P is read from the encoder and subtracted from its theoretical position, i.e. SP·10000/400. Here 10000/400 = 25 is the conversion factor between the encoder scale (10000 ticks/turn) and the ½-step scale (400 $\mu$-steps/turn). This difference (Err) estimates the alignment inaccuracy at this particular position, in encoder units, and is what we want to compensate for. The error is therefore saved in a table at address SP. The procedure is repeated while SP < 400, i.e. until a shaft revolution is completed. At the end of the loop, the table holds the position errors along the turn with ½-step resolution. Finally, the table is remapped to encoder units for use in the run-time procedure detailed in Sect. 2.2. As an example, Fig. 2 (right) reports the position error measured in the set-up used in the experiments described in Sect. 3. In this example the error is within ±10 encoder ticks. Since one step spans 10000/200 = 50 ticks, this corresponds to ±20% of an electrical step.

Fig. 2 Left: Procedure for error estimation; Right: Example of misalignment between encoder readings and theoretical position measured in the experimental set-up described below
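The open-loop mapping of Fig. 2 (left) can be sketched as follows. The hardware hooks `move_half_step_and_settle` and `read_encoder` are hypothetical placeholders. The once-per-turn sinusoidal misalignment used to exercise the code is invented for illustration. Two details are assumptions of this sketch rather than the paper's method. First, the error is stored as measured minus theoretical position, so that it can be subtracted at run time. Second, the 400-entry table is expanded to 10000 entries by linear interpolation.

```python
import math

HALF_STEPS_PER_TURN = 400
ENCODER_TICKS_PER_TURN = 10_000
TICKS_PER_HALF_STEP = ENCODER_TICKS_PER_TURN // HALF_STEPS_PER_TURN  # 25


def wrap_error(err):
    """Map a tick difference into [-5000, 5000) so that readings straddling
    the encoder zero do not produce errors of almost a full turn."""
    half = ENCODER_TICKS_PER_TURN // 2
    return (err + half) % ENCODER_TICKS_PER_TURN - half


def build_error_table(move_half_step_and_settle, read_encoder):
    """One error entry per half step, SP = 0..399 (sign: measured - theoretical).

    Assumes the unloaded motor already sits at the encoder zero (P = 0, SP = 0).
    """
    table = []
    for sp in range(HALF_STEPS_PER_TURN):
        if sp > 0:
            move_half_step_and_settle()   # advance 1/2 step, wait for damping
        p = read_encoder()                # measured position, 0..9999 ticks
        table.append(wrap_error(p - sp * TICKS_PER_HALF_STEP))
    return table


def remap_to_encoder_units(half_step_table):
    """Expand to one entry per encoder tick by linear interpolation (assumed)."""
    n = len(half_step_table)
    ticks_per_entry = ENCODER_TICKS_PER_TURN // n
    full = []
    for tick in range(ENCODER_TICKS_PER_TURN):
        i, frac = divmod(tick, ticks_per_entry)
        a = half_step_table[i]
        b = half_step_table[(i + 1) % n]  # wrap to SP = 0 at the end of the turn
        full.append(round(a + (b - a) * frac / ticks_per_entry))
    return full


# Simulated set-up (illustrative numbers only): the encoder reading deviates
# from the winding position by a once-per-turn sinusoid of 10 ticks.
state = {"sp": 0}


def fake_move():
    state["sp"] += 1


def fake_read():
    true_pos = state["sp"] * TICKS_PER_HALF_STEP
    offset = 10 * math.sin(2 * math.pi * true_pos / ENCODER_TICKS_PER_TURN)
    return round(true_pos + offset) % ENCODER_TICKS_PER_TURN


table = build_error_table(fake_move, fake_read)
full_table = remap_to_encoder_units(table)
print(len(table), max(table), min(table))  # 400 10 -10
print(len(full_table))                     # 10000
```

In the paper, the table-filling procedure runs on the soft processor inside the FPGA (Sect. 3.1). The sketch only mirrors its logic.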

### 2.2 Misalignment Correction at Run Time

With reference to Fig. 3, consider again a 10 k pulse/turn quadrature encoder connected to the shaft of a 200 step/turn motor. The misalignment errors have already been mapped and saved in the correction table, as explained above. An up/down counter tracks the shaft position along the turn in the 0–9999 range. Since the error depends on the position along the turn, the shaft position is used as the address of the correction table. The correction value is subtracted from the uncorrected position. The corrected value is then mapped into the electro-mechanical configuration of Fig. 1 by taking the remainder of its division by 200 (Mod200). This works because the configuration repeats 50 times per turn, i.e. every 10000/50 = 200 encoder ticks. The result, RP, is the corrected shaft position, which can now be safely used in the FOC processing.
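The data path of Fig. 3 reduces to a table lookup, a subtraction and a modulo. The following is a minimal sketch. It assumes a 10000-entry correction table that holds measured minus theoretical position in encoder ticks, so that subtracting it recovers the theoretical position. `corrected_rotor_position` is a hypothetical name.

```python
ENCODER_TICKS_PER_TURN = 10_000
TICKS_PER_ELECTRICAL_CYCLE = 200   # 10000 ticks / 50 repetitions per turn


def corrected_rotor_position(counter, correction_table):
    """Return RP, the corrected shaft position within the electrical cycle.

    counter          -- up/down quadrature counter value, 0..9999
    correction_table -- 10000 corrections in encoder ticks, addressed by counter
    """
    if not 0 <= counter < ENCODER_TICKS_PER_TURN:
        raise ValueError("counter must be in 0..9999")
    corrected = counter - correction_table[counter]
    # 10000 is a multiple of 200, so one modulo both wraps around the turn and
    # maps into the 0..199 electrical cycle; Python's % is never negative.
    return corrected % TICKS_PER_ELECTRICAL_CYCLE


table = [0] * ENCODER_TICKS_PER_TURN
table[2510] = 10    # reading 10 ticks ahead of the winding field here
table[5] = 10       # near the zero reference the result wraps around
print(corrected_rotor_position(2510, table))  # 100
print(corrected_rotor_position(5, table))     # 195
```

RP can then be converted to electrical degrees by multiplying by 360/200 = 1.8.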

## 3 Experiments and Results

### 3.1 Experimental Set-up

The proposed method was tested on a system [7] based on three electronic boards. The first, the DECA MAX10 (Arrow Electronics, Centennial, CO), includes an FPGA from the low-cost MAX10 family [8] produced by Intel-Altera (San Jose, CA). It is connected to an in-house interface board. This board includes the basic electronics to read the quadrature signals from incremental encoders and to connect to the EVLPOWERSTEP01 evaluation board of the PowerSTEP01 motor control device [9]. The board and the device are both produced by STMicroelectronics (Geneva, Switzerland). PowerSTEP01 can drive a stepper motor with up to 85 V and 10 A. The proposed method (Fig. 3) was integrated in the MAX 10 FPGA, together with a basic FOC algorithm [4] and other ancillary logic required by the project. A Nios II® soft processor was also included in the FPGA. It was employed for system supervision, for implementing the table-filling procedure described in Sect. 2.1, and for connecting to a host PC used for setting parameters and starting/stopping the motor.

Fig. 3 Data processing chain used for the correction of the shaft position

An M1233041 hybrid stepper motor (Lam Technologies, Florence (FI), Italy) was connected to the PowerSTEP01 driver and coupled to a REV621 encoder (Elap, Milan (MI), Italy). Details of these devices are reported in Table 1. The driver was powered at 45 V, and the FOC was set to maintain the maximum torque, i.e. $\theta_L = 90^\circ$. The first experiment was performed without the misalignment compensation, i.e. with the correction table zeroed. The table was then calculated and the experiment was repeated. In both cases, the position read by the encoder was saved for post-analysis.

### 3.2 Results

Figure 4 reports the velocity obtained by differentiating the encoder reading over a 200 ms time span. With compensation disabled (Fig. 4, top), the mean velocity stabilizes at 26.8 revolutions/s after about 100 ms. However, a periodic oscillation of more than 2 revolutions/s is clearly visible on top of it. Note that the oscillation period (~37 ms) corresponds to the time needed for one shaft revolution (1/26.8 s ≈ 37.3 ms). When the compensation is activated (Fig. 4, bottom), the velocity oscillation is greatly reduced. Moreover, with compensation the velocity rises to 27.8 revolutions/s, which was the peak velocity without compensation. This indicates that the proposed method eliminated the torque loss due to the motor/encoder misalignment.

Table 1 Main devices employed in experiments

| Motor parameter | Value | Encoder parameter | Value |
|-----------------|-------|-------------------|-------|
| Model | M1233041 | Model | REV621 |
| Manufacturer | Lam Technologies, Florence (FI), Italy | Manufacturer | Elap, Milan (MI), Italy |
| Flange | NEMA23 | Zero reference | Yes |
| Step angle | 1.8° (200 step/rev) | Pulses/revolution | 10000 |
| Holding torque | 1.1 N·m | | |
| Current | 4.2 A | | |
| Phase resistance/inductance | 0.4 Ω/1.2 mH | | |

Fig. 4 Velocity measured without (top) and with (bottom) misalignment compensation
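The velocity traces and oscillation figures in this section can be reproduced from encoder samples along these lines. The helpers are hypothetical. They assume unwrapped (cumulative) encoder counts sampled at a fixed interval. Oscillation is defined as peak-to-peak velocity relative to the mean, which is one plausible reading of the percentages given in the Conclusion.

```python
def velocity_rev_per_s(counts, dt_s, ticks_per_turn=10_000):
    """Finite-difference velocity from cumulative (unwrapped) encoder counts."""
    return [(b - a) / ticks_per_turn / dt_s for a, b in zip(counts, counts[1:])]


def oscillation_percent(velocities):
    """Peak-to-peak velocity oscillation as a percentage of the mean velocity."""
    mean = sum(velocities) / len(velocities)
    return 100 * (max(velocities) - min(velocities)) / mean


# 1 ms sampling of a shaft turning at 26.8 rev/s: 268 ticks per sample.
counts = [268 * k for k in range(5)]
print([round(v, 1) for v in velocity_rev_per_s(counts, 0.001)])  # [26.8, 26.8, 26.8, 26.8]

# One revolution at 26.8 rev/s lasts about 37.3 ms, matching the ~37 ms period.
print(round(1000 / 26.8, 1))  # 37.3

# A 2 rev/s peak-to-peak oscillation around 26.8 rev/s is about 7.5%.
print(round(oscillation_percent([25.8, 26.8, 27.8]), 1))  # 7.5
```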

## 4 Conclusion

This work presents a simple technique for compensating for possible alignment inaccuracies in closed-loop hybrid stepper motor control. The reported experiment shows a reduction of the velocity oscillation from 7.5% (without compensation) to 1.8% (with compensation).

Acknowledgements This work is part of the MIPEC project (CUP 4421.02102014.072000051), funded by the Tuscany Region government through the FAR-FAS 2014 program. The authors thank Microtest srl (Altopascio, Italy) and the DIEF of the University of Florence (Italy) for their support.

## References

- 1. Athani, V.V.: Stepper Motors: Fundamentals, Applications and Design. New Age International (P) Ltd, New Delhi, India, reprint. ISBN 978-8122410068 (2005)
- 2. Gaan, D.R., Kumar, M., Sudhakar, S.: Real-time precise position tracking with stepper motor using frequency modulation based microstepping. IEEE Trans. Ind. Appl. 54(1), 693–701 (2018). https://doi.org/10.1109/TIA.2017.2753158
- 3. Le, K.M., Hoang, H.V., Jeon, J.W.: An advanced closed-loop control to improve the performance of hybrid stepper motors. IEEE Trans. Power Electron. 32(9), 7244–7255 (2016). https://doi.org/10.1109/TPEL.2016.2623341
- 4. Kim, W., Yang, C., Chung, C.C.: Design and implementation of simple field-oriented control for permanent magnet stepper motors without DQ transformation. IEEE Trans. Magn. 47(10), 4231–4234 (2011). https://doi.org/10.1109/TMAG.2011.2157956
- 5. Ricci, S., Meacci, V., Birkhofer, B., Wiklund, J.: FPGA-based system for in-line measurement of velocity profiles of fluids in industrial pipe flow. IEEE Trans. Ind. Electron. 64(5), 3997–4005 (2017). https://doi.org/10.1109/TIE.2016.2645503
- 6. Ricci, S., Liard, M., Birkhofer, B., Lootens, D., Brühwiler, A., Tortoli, P.: Embedded doppler system for industrial in-line rheometry. IEEE Trans. Ultrason. Ferroelectr. Freq. Control 59(7), 1395–1401 (2012). https://doi.org/10.1109/TUFFC.2012.2340
- 7. Ricci, S., Meacci, V.: Simple torque control method for hybrid stepper motors implemented in FPGA. Electronics 7(10), 242 (2018). https://doi.org/10.3390/electronics7100242
- 8. Intel: Intel MAX10 FPGA Device Datasheet. M10-DATASHEET, December 2017. https://www.altera.com/documentation/mcn1397700832153.html
- 9. STMicroelectronics: PowerSTEP01 Datasheet. DocID025022 Rev 6, November 2017. http://www.st.com/resource/en/datasheet/powerstep01.pdf

# Fully Integrated Galvanically Isolated DC-DC Converters Based on Inductive Coupling



Egidio Ragonese, Alessandro Parisi, Nunzio Spina and Giuseppe Palmisano

Abstract The paper reviews the most interesting architectures of integrated isolated dc-dc converters based on inductive coupling. Circuit implementations for both power and data/control links are discussed, taking advantage of a 0.35-µm BCD technology with a thick-oxide back-end. The paper mainly addresses isolated sensor interface and gate-driver applications. Innovative dc-dc converter architectures are proposed for both applications, highlighting their advantages and drawbacks with respect to traditional implementations.

## 1 Introduction

Nowadays, galvanic isolation is mandatory when high-power equipment is operated by human beings, and it is also adopted to guarantee better reliability in harsh industrial environments. Galvanic isolation is also required at relatively low power levels, e.g., in sensor interfaces, serial-link transceivers, low-power medical devices, and housekeeping power supplies such as gate drivers or controllers for power converters. Figure 1a summarizes the most important application fields where galvanic isolation is adopted. A general block diagram of a galvanically isolated system is depicted in Fig. 1b. Two domains, A and B, are isolated since one of them is subject to hazardous voltages and/or requires a different ground reference. Data signals are transferred across the galvanic isolation barrier to enable bidirectional communication between the two domains, while an isolated power supply for domain B is provided from domain A by a power transfer technique. Typically, isolated power levels between 100 mW and 1 W, with data rates up to 100 Mbit/s, can be required. Traditional isolation devices, i.e., optocouplers and discrete transformers, can be replaced by highly integrated isolators exploiting capacitive or inductive transfer techniques [1]. This paper reviews the most interesting architectures for fully integrated isolated dc-dc converters with transformer-based on-chip galvanic barriers. To this aim, a 0.35-µm BCD technology by STMicroelectronics, enriched with a thick-oxide back-end, is exploited. A simplified cross-section of the technology is shown in Fig. 1c [2]. An isolation rating, $BV_{AC}$, of 5 kV is guaranteed by the oxide layer between a thick Cu metal 4 and a thin Al metal 3, while 6-kV isolation can be achieved by also using the oxide layer down to metal 2. The paper is organized as follows. Section 2 deals with the isolated power link design, while different dc-dc converter architectures are discussed in Sect. 3.

E. Ragonese (🖂) · A. Parisi · G. Palmisano

DIEEI, Università di Catania, Viale A. Doria 6, 95125 Catania, Italy e-mail: egidio.ragonese@unict.it

A. Parisi e-mail: alessandro.parisi@dieei.unict.it

G. Palmisano e-mail: giuseppe.palmisano@unict.it

N. Spina STMicroelectronics, Stradale Primosole 50, 95121 Catania, Italy e-mail: nunzio.spina@st.com

© Springer Nature Switzerland AG 2019 S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_39

Fig. 1 a Galvanic isolation application fields. b Simplified block-diagram of a galvanically isolated system. c Metal stack with thick-oxide option for galvanic isolation

## 2 Isolated Power Link Design

The design of an isolated dc-dc converter poses several challenges, especially for the isolated power link. The latter typically exploits a transformer-loaded power oscillator topology, which performs efficient dc-ac conversion. Isolation is guaranteed by the dielectric layer between the transformer windings (e.g., thick silicon oxide). The rectifier, on the other hand, usually exploits traditional full-bridge topologies that achieve high power efficiency without increasing system complexity. If available, Schottky diodes are used for better rectifier efficiency. The power link is characterized by non-linear interactions between its blocks (i.e., the oscillator, the isolation transformer, and the rectifier) and requires system-level optimization to maximize the overall efficiency, $\eta$, which is the product of the efficiencies of the individual blocks.
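Because the overall efficiency is the product of the block efficiencies, a known rectifier efficiency fixes how efficient the oscillator and transformer must be together. The minimal sketch below illustrates this bookkeeping; it assumes that the η column of Table 1 is the overall converter efficiency and uses the 75–80% rectifier efficiency quoted in the text. The function names are hypothetical.

```python
from math import prod


def overall_efficiency(*stage_efficiencies: float) -> float:
    """Overall efficiency of cascaded blocks = product of the block efficiencies."""
    for eta in stage_efficiencies:
        if not 0.0 < eta <= 1.0:
            raise ValueError(f"efficiency out of range: {eta}")
    return prod(stage_efficiencies)


def required_remaining_efficiency(eta_total: float, *known_stages: float) -> float:
    """Joint efficiency the remaining blocks must reach to obtain eta_total."""
    return eta_total / overall_efficiency(*known_stages)


# n-LDMOS converter of Table 1 (eta = 30 %) with a 75-80 % rectifier
for eta_rect in (0.75, 0.80):
    eta_osc_tx = required_remaining_efficiency(0.30, eta_rect)
    print(f"rectifier {eta_rect:.0%} -> oscillator x transformer = {eta_osc_tx:.3f}")
```

Under these assumptions, the oscillator and the isolation transformer of that converter would jointly reach only about 38–40%, which is why the text stresses system-level optimization of the power link.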

Figure 2 shows the most widely adopted power oscillator topologies with the corresponding isolation transformers and die micrographs, namely the cross-coupled oscillator with stacked transformer, the inductive-coupled current-reuse oscillator with three-winding interleaved transformer, and the hybrid-coupled current-reuse oscillator with three-winding tapped transformer [3-6]. The first topology is based on an n-LDMOS cross-coupled pair. The oscillator is operated in class D, with the LDMOS transistors working as power switches and withstanding a large voltage swing at their drains to guarantee high power efficiency [7]. A traditional stacked isolation transformer is used, which requires a center tap on the primary winding to connect the power supply. The transformer can also be used to step up the oscillation voltage by properly setting the turns ratio, as usually required in dc-dc converters for gate-driver applications, where the output voltage is about 20 V. The second oscillator topology is an enhanced version of the traditional complementary cross-coupled oscillator. It takes advantage of current-reuse and power-combining techniques to improve both output power and efficiency [8]. The reduced oscillation voltage imposed by the p-type pair makes implementation possible in a low-cost standard CMOS technology, which is an advantage over the LDMOS cross-coupled pair topology. On the other hand, it requires a three-winding isolation transformer with a more complex configuration than that of the class-D oscillator. Oscillation synchronization at the primary windings is mandatory and can be obtained by means of a purely inductive or a hybrid (i.e., inductive and capacitive) coupling between the p-MOS and n-MOS cross-coupled pairs. These different coupling mechanisms lead to different implementations of the transformer primary windings: interleaved or tapped spirals can be used for purely inductive or hybrid coupling, respectively, as shown in Fig. 2b, c.
While the interleaved configuration achieves higher power efficiency at the cost of lower power density, the tapped one considerably improves power density but slightly reduces efficiency. Both topologies are suitable for low-voltage applications, since only a small voltage step-up can be performed. Table 1 compares the main performance figures of three isolated dc-dc converters based on the oscillator topologies of Fig. 2. All dc-dc converters use a Schottky-diode full-bridge rectifier with a power efficiency of about 75–80%. Reinforced isolation ratings up to 10 kV [1] can be achieved by exploiting two thick-oxide isolation barriers connected in series and using different oscillator topologies, as in [9].



Fig. 2 Power oscillator topologies, corresponding isolation transformers and die micrographs: a cross-coupled oscillator, b inductive-coupled current-reuse oscillator and c hybrid-coupled current reuse oscillator

Table 1 Comparison of isolated dc-dc converters based on the oscillator topologies of Fig. 2

| Power oscillator<br>topology    | Rectifier<br>topology      | P<sub>out</sub><br>[mW] | η<br>[%] | V<sub>DD</sub>/V<sub>OUT</sub><br>[V] | f<sub>osc</sub><br>[MHz] | Isolation<br>[kV] | Power density<br>[mW/mm<sup>2</sup>] |
|---------------------------------|----------------------------|--------------|----------|-----------------|---------------|-------------------|----------------------------------------|
| Inductive-coupled current-reuse | Schottky diode full bridge | 200          | 27       | 5/8             | 240           | 5                 | 19                                     |
| Hybrid-coupled current-reuse    | Schottky diode full bridge | 300          | 24       | 5/10            | 225           | 5                 | 36                                     |
| n-LDMOS cross-coupled           | Schottky diode full bridge | 980          | 30       | 5/20            | 165           | 5                 | 105                                    |
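A few quantities not listed in Table 1 can be derived from it: the input power $P_{in} = P_{out}/\eta$, the power lost in the conversion chain, and the occupied area implied by the power density. The sketch below assumes that η is the overall efficiency and that power density means output power per occupied area; neither definition is stated explicitly in the text.

```python
from dataclasses import dataclass


@dataclass
class Converter:
    name: str
    p_out_mw: float        # output power [mW]
    eta: float             # power efficiency (fraction)
    density_mw_mm2: float  # power density [mW/mm^2]

    @property
    def p_in_mw(self) -> float:
        return self.p_out_mw / self.eta

    @property
    def p_loss_mw(self) -> float:
        return self.p_in_mw - self.p_out_mw

    @property
    def area_mm2(self) -> float:
        # Assumption: power density = output power / occupied area
        return self.p_out_mw / self.density_mw_mm2


TABLE_1 = [
    Converter("Inductive-coupled current-reuse", 200, 0.27, 19),
    Converter("Hybrid-coupled current-reuse", 300, 0.24, 36),
    Converter("n-LDMOS cross-coupled", 980, 0.30, 105),
]

for c in TABLE_1:
    print(f"{c.name:32s} Pin = {c.p_in_mw:7.1f} mW  loss = {c.p_loss_mw:7.1f} mW  "
          f"area ~ {c.area_mm2:4.1f} mm^2")
```

For instance, the n-LDMOS converter draws about 3.27 W to deliver 0.98 W, so roughly 2.3 W is dissipated in the conversion chain, which highlights the thermal side of the efficiency problem.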

## 3 DC-DC Converter Architectures

Traditional architectures for isolated dc-dc converters exploit several isolated links, each of which requires a dedicated isolation component [10]. In other words, such architectures include at least three transformers to implement:

- an isolated link for the power transmission (i.e., the isolated power link);
- an isolated link for the feedback path for the output power regulation;
- several dedicated isolated links, one for each data channel.

Fig. 3 Block diagrams and silicon implementations of innovative dc-dc converters with data communication **a** without output power regulation, **b** with output power regulation

A fully integrated single-transformer architecture for low-power isolated dc-dc conversion is reported in Fig. 3a [11]. The main idea is to use only one isolated channel for both power transfer and bidirectional half-duplex data communication, by means of ASK modulation of the power oscillating signal at the primary and/or secondary windings of the isolation transformer. Demodulation circuitries recover the data and clock bit streams on both the first and second interfaces. This solution can be profitably exploited when power regulation is not mandatory. Indeed, data communication requires the presence of the power signal, and thus typical power control based on an on/off modulation of the oscillator (i.e., PWM modulation or bang-bang control schemes) cannot be implemented. The architecture is suitable for low-voltage isolated sensor interface applications, with maximum output power, power efficiency and data rate of about 30 mW, 10% and 40 Mb/s, respectively, using a power oscillation frequency as high as 330 MHz [12]. Its main drawback is its low common-mode transient immunity (CMTI) due to the primary-to-secondary winding capacitances, especially when a large power transformer is used for better efficiency (e.g., in high-power applications). In this case, the alternative architecture shown in Fig. 3b can be profitably exploited [13]. It consists of only two isolated links, i.e., a dedicated high-efficiency power link and an isolated signal link. The latter is used for both output voltage/power regulation and bidirectional half-duplex data communication. The output voltage $V_{\rm ISO}$ is regulated by means of a feedback control link that exploits a low-power RF oscillator, whose peak oscillation voltage changes according to the output power imposed by the reference voltage $V_{\text{REF,OUT}}$ and the load resistance $R_{\text{L}}$. The peak oscillation voltage of the RF oscillator is hence the control variable: it drives the power control block, which produces a PWM signal, $V_{\rm CTR}$, that turns the power oscillator on and off with a duty cycle strictly related to the oscillation voltage.
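The single-transformer idea of Fig. 3a works because ASK only changes the amplitude of the power carrier and never switches it off, so power keeps flowing while bits are sent. The idealized, noise-free sketch below illustrates this; the 330-MHz carrier and the 40-Mb/s data rate come from the text, whereas the 40% modulation depth, the sampling rate and the averaging envelope detector are assumptions for illustration, not the demodulator of [11, 12].

```python
import math

F_CARRIER = 330e6                       # power oscillation frequency [Hz] (from the text)
BIT_RATE = 40e6                         # data rate [bit/s] (from the text)
FS = 16 * F_CARRIER                     # simulation sampling rate [Hz] (assumption)
SAMPLES_PER_BIT = round(FS / BIT_RATE)  # 132 samples per bit
A_HIGH, A_LOW = 1.0, 0.6                # carrier amplitude for '1' and '0' (assumed 40 % depth)


def ask_modulate(bits):
    """Change the carrier amplitude bit by bit; the carrier itself never stops."""
    samples = []
    for i, bit in enumerate(bits):
        amp = A_HIGH if bit else A_LOW
        for k in range(SAMPLES_PER_BIT):
            n = i * SAMPLES_PER_BIT + k
            samples.append(amp * math.sin(2 * math.pi * F_CARRIER * n / FS))
    return samples


def ask_demodulate(samples):
    """Envelope detection: rectify, average over each bit period, compare with a threshold."""
    # The mean of |A sin| over whole carrier cycles is 2A/pi; threshold midway between levels
    threshold = (A_HIGH + A_LOW) / 2 * 2 / math.pi
    bits = []
    for start in range(0, len(samples), SAMPLES_PER_BIT):
        chunk = samples[start:start + SAMPLES_PER_BIT]
        envelope = sum(abs(s) for s in chunk) / len(chunk)
        bits.append(1 if envelope > threshold else 0)
    return bits


def mean_power(samples):
    """Mean normalized power (1-ohm load) carried by the waveform."""
    return sum(s * s for s in samples) / len(samples)


data = [1, 0, 1, 1, 0, 0, 1, 0]
tx = ask_modulate(data)
print("decoded:", ask_demodulate(tx))
print(f"mean power: {mean_power(tx):.3f} (always-high carrier: {A_HIGH ** 2 / 2:.3f})")
```

The second printed line shows the trade-off behind the architecture: deeper modulation eases detection but lowers the average transferred power, and the carrier can never be gated off for power regulation without also interrupting the data link.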

The control link is also exploited for bidirectional half-duplex data communication by means of ASK modulation of the RF oscillation. The high-speed (HS) data stream (from chip B to chip A) also exploits a common-mode transient (CMT) rejection block to improve the CMTI. The architecture is suitable for gate-driver applications, with output power, power efficiency and data rate of about 100 mW, 20% and 50 Mb/s, using oscillation frequencies of about 350 MHz for the power channel and 850 MHz for the control/data channel [14]. The architecture can also be extended to reinforced isolation applications up to 10 kV by using two thick-oxide isolation barriers connected in series for both the power and the control/data links [15].
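In the regulated architecture of Fig. 3b, the power oscillator is gated on and off, so in steady state the delivered power scales with the duty cycle of $V_{\rm CTR}$. The sketch below computes the operating point the loop must settle at under an idealized model (the link delivers a fixed power when on and nothing when off, start-up transients neglected). It is not the actual control law, which acts through the RF oscillator peak voltage; the function name and the numbers in the example are hypothetical.

```python
def required_duty_cycle(v_iso: float, r_load: float, p_on: float) -> float:
    """Duty cycle of the on/off gating that delivers V_ISO^2 / R_L on average.

    Idealized model: when enabled, the power link delivers p_on to the output;
    when disabled, it delivers nothing, so P_avg = D * p_on.
    """
    if r_load <= 0 or p_on <= 0:
        raise ValueError("load resistance and link power must be positive")
    p_required = v_iso ** 2 / r_load
    if p_required > p_on:
        raise ValueError("the load demands more power than the link can deliver")
    return p_required / p_on


# Hypothetical operating point: 5 V across 500 ohm with a 100-mW power link
duty = required_duty_cycle(5.0, 500.0, 0.100)
print(f"duty cycle = {duty:.2f}")  # 0.50
```

A lighter load (larger $R_{\text{L}}$) lowers the required duty cycle, which is the behavior the feedback link has to reproduce by changing the oscillation voltage.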

Acknowledgements The authors would like to thank V. Palumbo of STMicroelectronics for technology support. They also thank P. Lombardo, V. Fiore, and N. Greco for their technical contributions within their Ph.D. activities.

## References

- 1. DIN VDE V 0884-11: Semiconductor Devices – Magnetic and Capacitive Coupler for Basic and Reinforced Isolation. VDE Verlag, Jan. 2017
- 2. Palumbo, V., Ghidini, G., Carollo, E., Toia, F.: Integrated transformer. U.S. Patent App. 14733009, filed Jun. 8, 2015
- 3. Fiore, V., Ragonese, E., Palmisano, G.: A fully-integrated watt-level power transfer system with on-chip galvanic isolation in silicon technology. IEEE Trans. Power Electron. 32, 1984–1995 (2017)
- 4. Spina, N., Fiore, V., Lombardo, P., Ragonese, E., Palmisano, G.: Current-reuse transformer coupled oscillators with output power combining for galvanically isolated power transfer systems. IEEE Trans. Circuits Syst. I: Reg. Papers 62, 2940–2948 (2015)
- 5. Greco, N., Spina, N., Fiore, V., Ragonese, E., Palmisano, G.: A galvanically isolated DC–DC converter based on current-reuse hybrid-coupled oscillators. IEEE Trans. Circuits Syst. II: Exp. Briefs 64, 56–60 (2017)
- 6. Greco, N., Parisi, A., Spina, N., Ragonese, E., Palmisano, G.: Integrated transformer modelling for galvanically isolated power transfer systems. In: Proceedings of the Conference on PhD Research in Microelectronics and Electronics (PRIME), Giardini Naxos, Italy, pp. 325–328, Jul. 2017
- 7. Fanori, L., Andreani, P.: Class-D CMOS oscillators. IEEE J. Solid-State Circuits 48, 3105–3119 (2013)
- 8. Ragonese, E., Fiore, V., Spina, N., Palmisano, G.: Power oscillator apparatus with transformer-based power combining. US patent 9240752 B2, granted 19 Jan. 2016
- 9. Greco, N., Parisi, A., Lombardo, P., Palmisano, G., Spina, N., Ragonese, E.: A 100-mW fully integrated DC-DC converter with double galvanic isolation. In: Proceedings 43rd IEEE European Solid-State Circuits Conf. (ESSCIRC), pp. 291–294, Sep. 2017
- 10. Analog Devices: ADuM6201 – Dual-Channel, 5 kV Isolators with Integrated DC-to-DC Converter, datasheet. http://www.analog.com. Accessed 2018
- 11. Ragonese, E., Fiore, V., Spina, N., Lombardo, P., Palmisano, G.: Power oscillator apparatus with transformer-based power combining for galvanically-isolated bidirectional data communication and power transfer. US patent 9306614 B2, granted 05 April 2016
- 12. Lombardo, P., Fiore, V., Ragonese, E., Palmisano, G.: A fully-integrated half-duplex data/power transfer system with up to 40 Mbps data rate, 23 mW output power and on-chip 5 kV galvanic isolation. In: Proceedings IEEE International Solid-State Circuits Conf. Tech. Dig. (ISSCC), pp. 300–301, Feb. 2016
- 13. Ragonese, E., Spina, N., Lombardo, P., Greco, N., Parisi, A., Palmisano, G.: Galvanically isolated DC-DC converter with bidirectional data transmission. U.S. Patent 9948193 B2, granted 17 April 2018
- 14. Ragonese, E., Spina, N., Castorina, A., Lombardo, P., Greco, N., Parisi, A., Palmisano, G.: A fully integrated galvanically isolated DC-DC converter with data communication. IEEE Trans. Circuits Syst. I: Reg. Papers 65, 1432–1441 (2018)
- 15. Greco, N., Parisi, A., Lombardo, P., Spina, N., Ragonese, E., Palmisano, G.: A double-isolated dc-dc converter based on integrated LC resonant barriers. IEEE Trans. Circuits Syst. I: Reg. Papers 65, 4423–4433 (2018)

# A Distributed Condition Monitoring System for the Non-invasive Temperature Measurement of Heat Fluids Circulating in Turbomachinery Pipes Based on Self-Powered Sensing Nodes



Tommaso Addabbo, Elia Landi, Riccardo Moretti, Marco Mugnaini, Lorenzo Parri and Marco Tani

**Abstract** In this paper, a network of temperature measurement sensors based on the dual heat flux method for the non-invasive monitoring of the temperature of hot fluids circulating in pipes is presented. Each node has an energy-harvesting power supply based on Peltier cells exploiting the heat dissipated by the pipes. The measured data are transmitted over a LoRa radio communication channel. The proposed solution aims to address the need for easy-to-mount, low-power, distributed sensor networks for Industry 4.0 and Internet of Things purposes. The network has been tested by evaluating the sensor performance and the energy balance.

**Keywords** Condition monitoring  $\cdot$  Industry 4.0  $\cdot$  IoT  $\cdot$  LoRa  $\cdot$  Dual heat flux method  $\cdot$  Seebeck effect  $\cdot$  Energy harvesting

T. Addabbo · E. Landi · R. Moretti (🖂) · M. Mugnaini · L. Parri · M. Tani

Department of Information Engineering and Mathematics, University of Siena,

Via Roma 56, 53100 Siena, Italy e-mail: moretti@diism.unisi.it URL: http://www.diism.unisi.it

T. Addabbo e-mail: addabbo@diism.unisi.it URL: http://www.diism.unisi.it

E. Landi e-mail: landi@diism.unisi.it URL: http://www.diism.unisi.it

M. Mugnaini e-mail: mugnaini@diism.unisi.it URL: http://www.diism.unisi.it

L. Parri e-mail: parri@diism.unisi.it URL: http://www.diism.unisi.it

M. Tani e-mail: tani@diism.unisi.it URL: http://www.diism.unisi.it

© Springer Nature Switzerland AG 2019 S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_40

## 1 Introduction

Condition Monitoring (CM) is the process of monitoring one or more condition parameters of machinery in order to estimate the system health and to support preventive maintenance actions that avoid major faults [1-3].

A CM system should monitor the running machinery, identifying and locating defects in detail. To perform these tasks, a CM system should typically accomplish four main functions [4]:

- sensing;
- data acquisition;
- fault detection;
- diagnosis.

Focusing on sensing, the hardware should be designed taking into account the monitoring method and the knowledge of the failure mechanisms of the machine, which are usually accompanied by changes in physical quantities such as temperature, pressure or acceleration. In general, the sensors should be suitable for online measurement, providing good sensitivity and non-invasiveness.

In recent years, CM and sensors have gained even more importance, thanks to the new trends in Industry 4.0 and the Internet of Things (IoT). In fact, the need for information requires acquiring large amounts of data from the monitored area, which can be obtained by using networks of distributed sensors, possibly wireless and low-power [5–7].

In this work we present a system of temperature sensors based on the dual heat flux method for the non-invasive monitoring of the temperature of hot fluids in pipes. The system has an energy-harvesting power supply based on Peltier cells exploiting the heat dissipated by the pipes. The measured data are transmitted over a LoRa radio communication channel. The target application is the CM of turbomachinery for energy conversion or production [3, 8–10].

## 2 Sensing Node Structure

The sensing node of the network performs a non-invasive measurement based on the dual heat flux method, which can be used as an alternative to thermowells in industrial applications.

In this paper, the sensing device is described only briefly. An extensive description, including a comparison with the standard techniques employed for measurements of this kind, is provided in our previous work [8].

**Fig. 1** Scheme of the dual heat flux insulating rings mounted on a pipe

Fig. 2 Sensing device and pipe equivalent circuit: $R_1$, $R_2$ and $R_f$ describe the thermal characteristics of the insulators and the fluid; $T_a$, $T_f$, $T_1$ and $T_2$ are the ambient, fluid and pipe temperatures, respectively

A device of this kind consists of two insulating rings of different thickness, on which four Resistance Temperature Detectors (RTDs) are mounted (two per ring, one on the inner face and one on the outer face). As shown in Fig. 1, the two rings are used to insulate two adjacent pipe sections, keeping a small gap between them to avoid thermal coupling. The ratio between the width and the length of the rings is chosen so as to obtain a dominant heat flux from the inner surface to the outer surface of the rings.

This allows the system to be modeled with an equivalent circuit in which voltages and currents correspond to temperatures and heat flows, respectively. The circuit is shown in Fig. 2, where $T_a$ is the ambient temperature, $T_f$ is the fluid temperature, $T_1$ and $T_2$ are the pipe temperatures below the insulators, $T_3$ and $T_4$ are the temperatures on the outer faces of the insulators with resistance $R_1$ and $R_2$, respectively, $R_f$ is the thermal resistance of the fluid, and $R_1$ and $R_2$ are the thermal resistances of the insulators. Temperatures are measured in °C, heat flows in W and thermal resistances in °C/W.

As shown in [8], solving the equivalent circuit the fluid temperature is obtained:

$$T_f = \frac{T_1 + T_2}{2} + \frac{T_1 - T_2}{2} \cdot \frac{\frac{R_1}{R_2} \cdot (T_2 - T_4) + (T_1 - T_3)}{\frac{R_1}{R_2} \cdot (T_2 - T_4) - (T_1 - T_3)} \tag{1}$$
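Equation (1) can be evaluated directly from the four RTD readings and the ratio $R_1/R_2$. The sketch below does so and checks it against a hypothetical forward model of the equivalent circuit in which the same fluid-side resistance $R_f$ lies under both rings, as the structure of Eq. (1) implies; all numerical values in the example are made up.

```python
def fluid_temperature(t1, t2, t3, t4, r1_over_r2):
    """Fluid temperature T_f from Eq. (1) of the dual heat flux method.

    t1, t2: pipe-side (inner-face) temperatures under insulators 1 and 2 [degC]
    t3, t4: outer-face temperatures of insulators 1 and 2 [degC]
    r1_over_r2: ratio R_1 / R_2 of the insulator thermal resistances
    """
    a = r1_over_r2 * (t2 - t4)
    b = t1 - t3
    if abs(a - b) < 1e-12:
        raise ValueError("equal heat flows through both rings: Eq. (1) is undefined")
    return (t1 + t2) / 2 + (t1 - t2) / 2 * (a + b) / (a - b)


def inner_face_temperature(t_fluid, t_outer, r_fluid, r_ring):
    """Hypothetical forward model of one branch: T_f -(R_f)- T_inner -(R_ring)- T_outer."""
    # Same heat flow through R_f and R_ring: (T_f - T_in) / R_f = (T_in - T_out) / R_ring
    return (t_fluid * r_ring + t_outer * r_fluid) / (r_ring + r_fluid)


# Synthetic check with made-up values [degC and degC/W]: true T_f = 100 degC
R_F, R_1, R_2 = 1.0, 10.0, 20.0
T3, T4 = 30.0, 40.0
T1 = inner_face_temperature(100.0, T3, R_F, R_1)
T2 = inner_face_temperature(100.0, T4, R_F, R_2)
print(f"estimated T_f = {fluid_temperature(T1, T2, T3, T4, R_1 / R_2):.3f} degC")  # 100.000
```

Note that $R_f$ never appears in Eq. (1): the method only needs the ratio of the two insulator resistances, which is what makes the measurement non-invasive.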

## 3 Electronics

The device electronics consist of:

- the temperature sensors conditioning circuit;
- the energy-harvesting module;
- the LoRa transmitting module;
- the digital control of the power supply.

Fig. 3 Architecture of the proposed system

The temperature sensing system estimates the fluid temperature from the readings of four Pt100 RTDs. The conditioning electronics implements a four-wire reading with constant-current excitation. The front-end consists of a current buffer stage (realized using a REF3425 voltage reference and a TLV2371 operational amplifier), which provides the excitation, and an amplifying stage (based on an INA826 instrumentation amplifier) to enhance the output dynamic range.
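With a four-wire reading, the RTD resistance is the sensed voltage divided by the excitation current (and by the amplifier gain), and lead resistance does not enter the result. The sketch below then converts resistance to temperature with the standard IEC 60751 Callendar-Van Dusen relation for $T \ge 0$ °C. The text does not state which conversion the authors use, and the excitation current and gain below are hypothetical values.

```python
import math

# IEC 60751 Callendar-Van Dusen coefficients for platinum RTDs, valid for T >= 0 degC
R0 = 100.0     # Pt100 resistance at 0 degC [ohm]
A = 3.9083e-3  # [1/degC]
B = -5.775e-7  # [1/degC^2]


def rtd_resistance(v_out: float, i_exc: float, gain: float) -> float:
    """Four-wire reading: amplified sense voltage / (gain * excitation current)."""
    return v_out / (gain * i_exc)


def pt100_temperature(r_ohm: float) -> float:
    """Solve R = R0 * (1 + A*T + B*T^2) for T (0 degC <= T <= 850 degC)."""
    return (-A + math.sqrt(A * A - 4 * B * (1 - r_ohm / R0))) / (2 * B)


# Hypothetical front-end settings (not given in the text)
I_EXC = 1e-3  # excitation current [A]
GAIN = 10.0   # instrumentation-amplifier gain

r = rtd_resistance(1.385055, I_EXC, GAIN)  # about 138.5055 ohm
print(f"R = {r:.4f} ohm -> T = {pt100_temperature(r):.2f} degC")
```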

The electronics have been designed to minimize power consumption: to ease installation in plants where wires cannot be brought to the sensors, each node is assumed to be independent of external power supplies, gathering energy from its local energy-harvesting module. For this reason, the low-power STM32L476 microcontroller from ST has been chosen.

As shown in Fig. 3, both the electronic front-end and the LoRa radio module are powered through a switch controlled by the microcontroller; this allows the front-end and the radio module to be supplied only when required, saving energy for the rest of the time.

The power supply consists of two branches: one connected to a 3-V lithium battery (CR123A, 1550/2000 mAh) through a pMOS, and the other connected to the energy-harvesting system through a Schottky diode. The energy-harvesting part consists of a thermoelectric generator (TEG), based on TEC1-12706 Peltier cells, with the hot side in thermal contact with the pipe and the cold side in thermal contact with a heat sink. A boost converter raises the generated voltage to a stable value of 3.3 V. If the power provided by the energy-harvesting module drops, the pMOS switches on automatically, connecting the battery to the system.

A capacitor is inserted in parallel with the power supply to filter the current peaks required by the radio module.

The LoRa module (SX1272) is a commercial device with a guaranteed open-field coverage of more than 1 km, operating at a transmission frequency of 868 MHz.

## 4 Measurements and Results

The system has been tested by evaluating the measurement accuracy of the sensors and the power balance of the electronics.

The tests have been performed on a 3" AISI 316L stainless steel pipe, using heated air as fluid. The insulating rings have been realized in polystyrene, with a width of 5 cm and thicknesses of 1 and 2 cm respectively.



### 4.1 Sensors

The measurement system has been tested by changing the fluid temperature from $100\,^{\circ}\text{C}$ down to $70\,^{\circ}\text{C}$ in steps of $5\,^{\circ}\text{C}$, one step every 8 min.

As shown in Fig. 4, the system exhibits a steady-state error lower than $1\,^{\circ}\text{C}$ and adapts to a fast $5\,^{\circ}\text{C}$ change in about 2 min.

### 4.2 Power Balance

The active components used in the front-end are low-power devices, resulting in a current absorption of 750 $\mu$A per RTD.

The microcontroller is specified to consume 100 $\mu$A at 1 MHz in run mode, less than 420 nA in standby mode, and a few hundred pA in sleep mode.

The LoRa module draws up to 50 mA only during communication; for the rest of the time its current absorption is zero. It has been assumed that the measurements are performed every 2 min and that the measured data are directly transmitted by the LoRa module.

The whole system works with a supply voltage of 3 V; therefore, the total power consumption is on average equal to 10 mW, with peaks of 160 mW when the LoRa module is activated for transmission.
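The 10-mW and 160-mW figures can be reproduced from the currents quoted above. The sketch below shows one plausible breakdown, assuming the front-end of all four RTDs and the microcontroller (in run mode) are counted continuously and the 50-mA LoRa current adds on top during transmission; the text does not detail how the average was computed.

```python
SUPPLY_V = 3.0                # supply voltage [V]
N_RTD = 4                     # Pt100 sensors per node
I_FRONT_END_PER_RTD = 750e-6  # front-end current per RTD [A]
I_MCU_RUN = 100e-6            # microcontroller run-mode current at 1 MHz [A]
I_LORA_TX = 50e-3             # LoRa module current while transmitting [A]


def node_power_budget():
    """Return (average, peak) node power in watts from the currents quoted in the text."""
    p_front_end = N_RTD * I_FRONT_END_PER_RTD * SUPPLY_V  # 9 mW
    p_mcu = I_MCU_RUN * SUPPLY_V                           # 0.3 mW
    p_average = p_front_end + p_mcu
    p_peak = p_average + I_LORA_TX * SUPPLY_V              # +150 mW during transmission
    return p_average, p_peak


avg, peak = node_power_budget()
print(f"average ~ {avg * 1e3:.1f} mW, peak ~ {peak * 1e3:.0f} mW")
```

Under this assumption the front-end dominates the average budget, while the radio dominates the peaks, which is consistent with using the battery only to cover transmission peaks.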

If the pipe temperature is high enough to produce a temperature difference of at least 30 °C with respect to the heat sink (e.g., ambient temperature of 25 °C, heat sink temperature of 30 °C and pipe temperature greater than or equal to 60 °C), the TEG module is able to provide between 20 and 50 mW.

Under these conditions, the energy-harvesting module can cover the entire power consumption, and the battery is employed only during the current peaks caused by radio transmissions. If the temperature difference drops below $30\,^{\circ}\text{C}$, operation can continue simply by reducing the duration of the front-end and LoRa module run-mode intervals or by increasing the duration of the sleep intervals between two transmissions. The duty cycle is controlled by the microcontroller according to the temperature estimated on the pipe.
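The duty-cycle adaptation described above amounts to an energy balance over one measurement period. The sketch below computes the shortest period that keeps a node energy-neutral for a given harvested power. The microcontroller's actual control law is not given in the text; the function name, the active-phase energy and durations, and the sleep power are hypothetical values for illustration.

```python
def min_measurement_period(e_active_j: float, t_active_s: float,
                           p_sleep_w: float, p_harvest_w: float) -> float:
    """Shortest measurement/transmission period T that keeps a node energy-neutral.

    Energy balance over one period:
        p_harvest * T >= e_active + p_sleep * (T - t_active)
    """
    if p_harvest_w <= p_sleep_w:
        raise ValueError("harvested power does not cover the sleep consumption")
    t_min = (e_active_j - p_sleep_w * t_active_s) / (p_harvest_w - p_sleep_w)
    return max(t_min, t_active_s)


# Hypothetical cycle: 2 s of measurement at ~9.3 mW plus 1 s of transmission at 150 mW
E_ACTIVE = 2.0 * 9.3e-3 + 1.0 * 150e-3  # [J]
for p_harvest in (5e-3, 20e-3, 50e-3):
    t = min_measurement_period(E_ACTIVE, 3.0, 1e-6, p_harvest)
    print(f"harvest {p_harvest * 1e3:4.0f} mW -> period >= {t:6.1f} s")
```

With these made-up numbers, even 5 mW of harvested power would support a period well below the 2-min interval assumed in the text; lengthening the period as the temperature difference falls is the knob the text describes.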

## 5 Conclusions

In this work, a LoRa network of temperature sensors powered by Peltier cells for industrial applications has been presented. The implementation of the network involved the use of non-standard sensors, the design of dedicated electronics for energy harvesting and sensor conditioning, and the definition of a control logic for the efficient use of energy. The tests analyzed the measurement accuracy of the sensors and the ratio between consumed and generated power. The results show that the system can be employed for the intended application.

## References

- 1. Brotherton, T., Jahns, G., Jacobs, J., Wroblewski, D.: Prognosis of faults in gas turbine engines. In: 2000 IEEE Aerospace Conference Proceedings (Cat. No.00TH8484), vol. 6, pp. 163–171 (2000)
- 2. Yang, S., Bryant, A., Mawby, P., Xiang, D., Ran, L., Tavner, P.: An industry-based survey of reliability in power electronic converters. IEEE Trans. Ind. Appl. 47(3), 1441–1451 (2011)
- 3. Addabbo, T., Biondi, R., Cioncolini, S., Fort, A., Rossetti, F., Vignoli, V.: A zero-crossing detection system based on FPGA to measure the angular vibrations of rotating shafts. IEEE Trans. Instrum. Meas. 63(12), 3002–3010 (2014)
- 4. Han, Y., Song, Y.: Condition monitoring techniques for electrical equipment – a literature survey. IEEE Trans. Power Deliv. 18(1), 4–13 (2003)
- 5. Mikusz, M., Houben, S., Davies, N., Moessner, K., Langheinrich, M.: Raising awareness of IoT sensor deployments. In: Living in the Internet of Things: Cybersecurity of the IoT 2018, pp. 1–8 (2018)
- 6. Monica, M., Yeshika, B., Abhishek, G.S., Sanjay, H.A., Dasiga, S.: IoT based control and automation of smart irrigation system: an automated irrigation system using sensors, GSM, bluetooth and cloud technology. In: 2017 International Conference on Recent Innovations in Signal Processing and Embedded Systems (RISE), pp. 601–607 (2017)
- 7. Singh, H., Pallagani, V., Khandelwal, V., Venkanna, U.: IoT based smart home automation system using sensor node. In: 2018 4th International Conference on Recent Advances in Information Technology (RAIT), pp. 1–5 (2018)
- 8. Addabbo, T., Fort, A., Moretti, R., Mugnaini, M., Vignoli, V., Cinelli, C., Gerbi, F.: Development of a non-invasive thermometric system for fluids in pipes. In: 2017 IEEE International Symposium on Systems Engineering, ISSE 2017 – Proceedings (2017)
- 9. Addabbo, T., Fort, A., Biondi, R., Cioncolini, S., Mugnaini, M., Rocchi, S., Vignoli, V.: Measurement of angular vibrations in rotating shafts: effects of the measurement setup nonidealities. IEEE Trans. Instrum. Meas. 62(3), 532–543 (2013)
- 10. Addabbo, T., Cordovani, O., Fort, A., Mugnaini, M., Vignoli, V.: Exhaust thermoelements redundant strategy to improve temperature reading reliability and serviceability. In: SMART-GREENS 2014 – Proceedings of the 3rd International Conference on Smart Grids and Green IT Systems, pp. 96–100 (2014)

# Towards Subsea Non-ohmic Power Transfer via a Capacitor-Like Structure



Anwar Mohamed, Valentina Palazzi, Sunny Kumar, Federico Alimenti, Paolo Mezzanotte and Luca Roselli

**Abstract** In this work, a preliminary investigation of the behavior of a parallel-plate capacitor-like structure immersed in seawater is presented as the first step towards the development of a non-inductive contactless power transfer system under seawater. The reference structure consists of two square copper parallel plates with a side of 18 cm. Four media have been considered (air, deionized water, seawater and tap water); the distance between the plates is varied from 0.5 to 50 cm for each medium, and the transmission coefficient of the capacitor is recorded for each case. The measurements are performed in the frequency range 100 kHz–50 MHz. An equivalent circuit model has been derived, and some considerations about the power transfer mechanism in capacitor-like structures in seawater have been drawn to pave the way towards more applicable structures.

**Keywords** Capacitive power transfer (CPT)  $\cdot$  Undersea wireless power transfer (U-WPT)  $\cdot$  Resistive coupling  $\cdot$  Wireless power transfer (WPT)  $\cdot$  Non-ohmic power transfer  $\cdot$  Contact-less power transfer

## 1 Introduction

Undersea Wireless Power Transfer (U-WPT) is becoming increasingly important for several applications, ranging from oil platform maintenance to oceanographic investigations, to cite a few [1]. So far, many studies have been carried out to extend to undersea applications the wide body of knowledge produced for terrestrial ones [2–5]. Along this path, some peculiar differences have emerged. In particular, seawater has a significant conductivity, so WPT via inductive coupling lacks efficiency because of the relevant eddy currents [5]. Capacitive coupling can be envisaged as a valuable alternative. However, the few studies carried out so far have still evidenced limitations due to the conductivity of the water [6, 7]. A thorough investigation of the opportunities to increase the efficiency of U-WPT is therefore needed, both to minimize the cited detrimental factors and to exploit the potential advantages deriving from the peculiarities of seawater.

A. Mohamed (🖂)

Department of Electronic Engineering, University of Roma 2, Roma Tor Vergata, Rome, Italy e-mail: anwar.mohamed@students.uniroma2.eu

V. Palazzi · F. Alimenti · P. Mezzanotte · L. Roselli Department of Engineering, University of Perugia, Perugia, Italy

S. Kumar PEC University of Technology, Chandigarh, India

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), Applications in Electronics Pervading Industry, Environment and Society, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_41

Motivated by these considerations, the present work analyzes the performance of a single parallel-plate capacitor immersed in seawater, so as to obtain an accurate circuit model for use in future WPT applications. The results of this preliminary investigation highlight the differences with respect to air and pure water. They also open the door to new hypotheses for modeling and interpreting U-WPT by means of capacitor-like structures.

## 2 Experimental Analysis

The aim of the present work is to investigate the nature of the power transfer, taking a capacitor immersed in seawater as the starting structure.

The parallel-plate capacitor is investigated by measuring its transmission coefficient versus frequency (from 100 kHz to 50 MHz) when the capacitor is immersed in four different media: air, deionized water, seawater and tap water.

The experimental setup is sketched in Fig. 1. The transmission coefficient ($S_{21}$) is measured with a FieldFox N9918A Vector Network Analyzer (VNA). The inner pins of the two SMA connectors are connected to the two plates of the capacitor via two wires. A purely inductive coupling uses coils that are coupled only magnetically, with no ohmic connection for either the signal or the ground. This experiment differs: it cannot be considered a true example of WPT, because the ground references on both sides of the Device Under Test (DUT) are shared through the VNA ground. Consequently, an ohmic return path for the current is provided outside the water tank. Nevertheless, the experiment is useful for focusing on the power transfer mechanism and on the performance of a single capacitor-like structure in different media.

The two metal plates of the capacitor, two square copper plates with a side of 18 cm, are placed inside two thin plastic bags (film thickness around 0.1 mm) to prevent any part of the DUT from coming into direct contact with the liquid. The insulated plates are then placed inside a container ($56 \times 36 \times 30$ cm³) filled with the medium under test.

Before measuring the capacitor, the transmission coefficient of the standalone wires is evaluated by connecting the two wires together (see Fig. 2). As a whole, the wires can be approximated as a series inductor of about $2\,\mu\mathrm{H}$ ($1\,\mu\mathrm{H}$ for each wire). By itself, this inductance gives the interconnection a low-pass behavior, with a cut-off frequency around 10 MHz. This behavior must be taken into account when considering the DUT performance in the subsequent analysis.
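The quoted cut-off can be roughly cross-checked by modeling the two wires as one ideal series inductance between the two VNA ports. This sketch assumes 50 Ω ports and a lumped 2 µH inductor, and it ignores wire capacitance and radiation. For a series impedance Z between two Z0 ports, S21 = 2Z0/(2Z0 + Z), so the -3 dB point falls where ωL = 2Z0.

```python
import math

def s21_series(z: complex, z0: float = 50.0) -> complex:
    """Transmission coefficient of a series impedance z between two z0 ports."""
    return 2 * z0 / (2 * z0 + z)

def series_inductor_cutoff(l_henry: float, z0: float = 50.0) -> float:
    """-3 dB frequency of a series inductor: |S21|^2 = 1/2 when omega*L = 2*z0."""
    return 2 * z0 / (2 * math.pi * l_henry)

L_WIRES = 2e-6  # total wire inductance from the text (about 1 uH per wire)

fc = series_inductor_cutoff(L_WIRES)
print(f"cut-off ~ {fc / 1e6:.2f} MHz")
for f in (1e6, fc, 50e6):
    s21 = s21_series(1j * 2 * math.pi * f * L_WIRES)
    print(f"{f / 1e6:6.2f} MHz: |S21| = {20 * math.log10(abs(s21)):6.2f} dB")
```

Under these assumptions the cut-off is about 8 MHz. This is the same order as the value of around 10 MHz read from the wire-only measurement.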

### 2.1 Measurements in Air

First, the DUT (a connecting wire, the capacitor and a second wire, in series) is measured in air, and the results are reported in Fig. 3. The relationship between the geometry of the parallel-plate capacitor and its equivalent capacitance C is reasonably described by the following well-known equation:

$$C = \varepsilon_0 \varepsilon_r \frac{A}{d},\tag{1}$$



Fig. 2 Transmission coefficient of the measurement setup consisting of the two connecting wires only



**Fig. 3** Transmission coefficient of the capacitor measured in air, for different distances between the copper plates. For better readability, the results are reported in a subset of the total frequency range, i.e., 1–50 MHz

where  $\varepsilon_0$  is the permittivity of the vacuum,  $\varepsilon_r$  is the relative permittivity of the medium, A is the area of the square plate and d is the distance between the plates.

In the present case,  $\varepsilon_r$  is almost equal to 1, and the impact of the plastic films can be considered negligible.

Consequently, varying the distance between the copper plates in air changes the equivalent capacitance accordingly.

This is confirmed by the frequency of the $S_{21}$ maximum, which is mainly ascribable to the series resonance of the capacitor with the wires. This frequency increases as the distance *d* between the plates increases. As a whole, the setup under test can be approximated as a series resonant circuit (see Fig. 4) whose capacitance C varies with *d* according to Eq. (1).
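Eq. (1) and the series-resonant picture can be combined in a short numerical sketch. It assumes the 2 µH wire inductance estimated above, the 18 cm plate side, ε_r = 1 and no fringing fields. The last assumption degrades quickly once d is no longer small compared with the plate side, so only small distances are evaluated.

```python
import math

EPS0 = 8.854e-12    # vacuum permittivity, F/m
AREA = 0.18 ** 2    # plate area for an 18 cm side, m^2
L_WIRES = 2e-6      # series wire inductance estimated from the wire-only test, H

def plate_capacitance(d: float, eps_r: float = 1.0) -> float:
    """Eq. (1): ideal parallel-plate capacitance, fringing fields neglected."""
    return EPS0 * eps_r * AREA / d

def series_resonance(l: float, c: float) -> float:
    """Resonance frequency of a series LC circuit, f0 = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2 * math.pi * math.sqrt(l * c))

# Small distances only: Eq. (1) becomes unreliable as d approaches the plate side.
for d_cm in (0.5, 1.0, 2.0, 5.0):
    c = plate_capacitance(d_cm / 100)
    f0 = series_resonance(L_WIRES, c)
    print(f"d = {d_cm:3.1f} cm: C = {c * 1e12:6.2f} pF, f0 = {f0 / 1e6:5.2f} MHz")
```

With these idealizations the resonance moves from about 15 MHz at 0.5 cm to about 47 MHz at 5 cm. This reproduces the described trend (higher peak frequency for larger d), but the absolute values should not be read as predictions of Fig. 3.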



Fig. 4 Circuit model for the measured capacitor, both in air and in deionized water



### 2.2 Measurements in Deionized Water

The same measurements are then performed in deionized water, and the results are illustrated in Fig. 5. Again, the resonant frequency of the circuit varies with d, so the circuit can still be modeled as in Fig. 4. The difference is that the equivalent capacitance is higher in deionized water than in air, as expected from the higher permittivity of water ($\varepsilon_r$ around 80 [8]). As a consequence, the main resonant frequency of the circuit is lower than in air.
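In Eq. (1), C scales linearly with ε_r, and the series resonance scales as 1/√C. The Fig. 4 model therefore predicts that, at the same distance, the resonance in deionized water is lower than in air by a factor √80 ≈ 8.9. The sketch below assumes a lumped 2 µH wire inductance, neglects fringing fields and the plastic films, and treats ε_r = 80 as frequency independent. The distance d = 1 cm is an arbitrary illustrative choice.

```python
import math

EPS0 = 8.854e-12   # vacuum permittivity, F/m
AREA = 0.18 ** 2   # plate area, m^2
L_WIRES = 2e-6     # series wire inductance, H

def f0(d: float, eps_r: float) -> float:
    """Series resonance of the wire inductance with the Eq. (1) capacitance."""
    c = EPS0 * eps_r * AREA / d
    return 1.0 / (2 * math.pi * math.sqrt(L_WIRES * c))

d = 0.01  # 1 cm, illustrative distance
f_air, f_water = f0(d, 1.0), f0(d, 80.0)  # eps_r = 80 for deionized water, as in the text
print(f"air: {f_air / 1e6:.2f} MHz, deionized water: {f_water / 1e6:.2f} MHz, "
      f"ratio: {f_air / f_water:.2f}")
```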

### 2.3 Measurements in Seawater

Finally, the circuit is immersed in seawater, prepared by adding 35 g of salt per liter of tap water, according to [8].

By adding salt, the water becomes moderately conductive (in particular, a conductivity  $\sigma$  around 4 S/m is achieved [8]). As a consequence, the behavior of this medium is closer to a lossy conductor than to a dielectric.
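One way to quantify "closer to a lossy conductor than to a dielectric" is the ratio σ/(ωε₀ε_r) between conduction and displacement current densities. Values well above 1 indicate conductor-like behavior. The sketch below uses σ = 4 S/m and assumes ε_r = 80 for salt water over the whole band, which is a simplification.

```python
import math

EPS0 = 8.854e-12   # vacuum permittivity, F/m
SIGMA = 4.0        # seawater conductivity from the text, S/m
EPS_R = 80.0       # relative permittivity of water, assumed constant over the band

def conduction_to_displacement(f: float, sigma: float = SIGMA, eps_r: float = EPS_R) -> float:
    """Ratio sigma / (omega * eps0 * eps_r) of conduction to displacement current density."""
    return sigma / (2 * math.pi * f * EPS0 * eps_r)

for f in (100e3, 1e6, 10e6, 50e6):
    print(f"{f / 1e6:6.1f} MHz: sigma/(omega*eps) = {conduction_to_displacement(f):8.1f}")
```

The ratio stays above 10 across the measured band (about 18 at 50 MHz). For a parallel-plate volume, the same ratio also equals |X_C|/R for the C of Eq. (1) and the resistance of Eq. (2) in Sect. 3, independently of d.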

Figure 6 shows that the measured transmission coefficient of the circuit under test is fairly independent of the distance between the copper plates of the capacitor.

Moreover, the total capacitance is higher than that measured in deionized water. Consequently, the previous circuit model is no longer valid.





## 3 Equivalent Model for a Capacitor Under Seawater

To derive an extended model that accounts for the new situation introduced by the conductive medium, it is worth underlining two aspects:

(a) Seawater also acts as a conductor. The equivalent resistance depends on the conductivity of the water. As a first approximation given by Ohm's second law, it also depends on the plate area A and on the distance d between the plates:

$$R = \frac{d}{\sigma A}.\tag{2}$$

Owing to the large area of the copper plates adopted in the experiment, R varies from about 0.04 to about 4 $\Omega$ as *d* varies from 0.5 to 50 cm. This is a conservative (upper-bound) estimate, obtained by assuming that only the water strictly between the two plates takes part in ion transport. Such a small resistance, in parallel with C, nearly short-circuits C at radio frequency, so this capacitance has a negligible effect in this scenario.
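Eq. (2) can be evaluated directly for the plate geometry and σ = 4 S/m. The sketch keeps the upper-bound assumption stated above, namely that only the water column between the plates conducts.

```python
SIGMA = 4.0         # seawater conductivity, S/m
AREA = 0.18 ** 2    # plate area, m^2

def water_resistance(d: float, sigma: float = SIGMA, area: float = AREA) -> float:
    """Eq. (2): resistance of the water column strictly between the plates (upper bound)."""
    return d / (sigma * area)

for d_cm in (0.5, 5.0, 50.0):
    print(f"d = {d_cm:4.1f} cm: R = {water_resistance(d_cm / 100):.3f} ohm")
```

R grows linearly with d, so the hundredfold range of distances maps onto a hundredfold range of R, from roughly 0.04 Ω to roughly 3.9 Ω.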

(b) A direct consequence of (a) is that the copper plate sees the seawater near the covering film as a poor metallic surface, with the thin plastic layer acting as a dielectric in between. This results in a different capacitive structure. On one side is the copper plate. On the other side is the conductive seawater surface adjacent to the plastic film, which thus behaves as a thin dielectric layer.

In the air and deionized water experiments, a Metal-Insulator-Metal (M-I-M) line-up was observed, with the "I" layer formed by air or deionized water. Here a more complex line-up is found: seawater behaves as a poor metal from the capacitive point of view and as a low-value resistor from the conductive point of view. Assuming for brevity that this medium can be considered a Non-Ideal Metal (NIM), the resulting line-up is of the M-I-NIM-I-M kind. In this case "I" is just the thin plastic film isolating the plates from the water.

Fig. 7 Circuit model for the measured capacitor-like structure in the seawater

To account for this new situation, the equivalent circuit of Fig. 7 is proposed. The inductors are associated with the external connecting wires and are unchanged with respect to the previous situation. In addition, two fixed capacitors $C_f$ are introduced to account for the two M-I-NIM structures formed by inserting the insulated plates into seawater, while C is approximately the original capacitance of Fig. 4. The plastic film is much thinner than the plate distances considered so far, by about two orders of magnitude. The two fixed capacitances are therefore expected to be higher than the capacitance measured in deionized water (Fig. 8).

To obtain a first estimate of these capacitances, the experimental results have been fitted a posteriori, yielding a reasonable value of 7 nF. This is consistent with Eq. (1), which gives 7 nF for a plastic film with a relative permittivity of about 2.5 and a thickness of 0.1 mm, as in the experiment.
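The film capacitance and the model of Fig. 7 can be explored numerically. The sketch below assumes that Fig. 7 is the series chain L_wires, C_f, (R in parallel with C), C_f between two 50 Ω ports. R comes from Eq. (2), C from Eq. (1) with ε_r = 80, and C_f from Eq. (1) applied to the 0.1 mm film with ε_r = 2.5. It illustrates the structure of the model and is not a fit to the measured curves.

```python
import math

EPS0 = 8.854e-12   # vacuum permittivity, F/m
AREA = 0.18 ** 2   # plate area, m^2
L_WIRES = 2e-6     # series wire inductance, H
SIGMA = 4.0        # seawater conductivity, S/m
EPS_WATER = 80.0   # relative permittivity of water
EPS_FILM = 2.5     # relative permittivity of the plastic film (from the text)
T_FILM = 0.1e-3    # film thickness, m
Z0 = 50.0          # VNA port impedance, ohm (assumed)

# Eq. (1) applied to the copper plate / plastic film / seawater (M-I-NIM) stack
C_F = EPS0 * EPS_FILM * AREA / T_FILM

def s21_seawater(f: float, d: float) -> complex:
    """S21 of the assumed chain L - C_f - (R || C) - C_f between two Z0 ports."""
    w = 2 * math.pi * f
    r = d / (SIGMA * AREA)              # Eq. (2)
    c = EPS0 * EPS_WATER * AREA / d     # Eq. (1) for the water gap
    z_rc = r / (1 + 1j * w * r * c)     # R in parallel with C
    z = 1j * w * L_WIRES + 2 / (1j * w * C_F) + z_rc
    return 2 * Z0 / (2 * Z0 + z)

def peak(d: float, f_min: float = 100e3, f_max: float = 50e6, n: int = 4000):
    """Frequency and magnitude of the |S21| maximum over a log-spaced sweep."""
    best_f, best_mag = 0.0, 0.0
    for k in range(n):
        f = f_min * (f_max / f_min) ** (k / (n - 1))
        mag = abs(s21_seawater(f, d))
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f, best_mag

print(f"C_f = {C_F * 1e9:.2f} nF")
for d_cm in (0.5, 5.0, 50.0):
    f_pk, mag = peak(d_cm / 100)
    print(f"d = {d_cm:4.1f} cm: peak at {f_pk / 1e6:.2f} MHz, |S21| = {20 * math.log10(mag):.2f} dB")
```

In this idealized model the |S21| peak stays near 1/(2π√(L·C_f/2)) ≈ 1.9 MHz for every d. The resonance is set by the wire inductance and the two film capacitances. The distance d only changes the small series resistance R, and therefore slightly lowers the peak.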

To assess the model and extend its validity, a case in which R varies more widely than in seawater has also been measured, using tap water as the medium. Tap water exhibits a lower conductivity than seawater, but one still high enough to treat the medium as a NIM in the equivalent circuit model. In this case R varies significantly with distance, yielding a corresponding decrease in transfer efficiency. There is still no significant variation in the shape of the frequency response, i.e., in the reactances involved, as shown in Fig. 9.

## 4 Conclusion

This paper describes experiments aimed at investigating the actual nature of non-inductive coupling in seawater. The experimental campaign starts from the simplest non-inductive coupling structure that can be conceived: a single series capacitor consisting of a pair of parallel plates with a medium in between (air, deionized water, seawater and finally tap water). The results have been used to derive equivalent circuit models, whose parameters have been evaluated by fitting schematic simulations to the experimental data.

Some considerations can be drawn from this activity. In particular, it pointed out that the conductivity of the water, usually considered detrimental for inductive as well as capacitive coupling, is actually the main transfer mechanism in seawater, through ion transport. In principle, it can be exploited by a capacitor-like structure whose plates are kept isolated from the water. The contact then remains non-ohmic (the two contacted parts are still isolated), while the power transfer itself can be considered mainly ohmic.

This aspect is of paramount importance, because it opens the way to new, very simple structures able to provide contactless power transfer under seawater. Such structures would face less stringent constraints than usual in terms of alignment, cost and implementation technology. They would also mark a paradigm shift, exploiting an aspect usually considered detrimental: the high conductivity of seawater.

## References

1. Urano, M., Ata, K., Takahashi, A.: Study on underwater wireless power transfer via electric coupling with a submerged electrode. In: 2017 IEEE International Meeting for Future of Electron Devices, Kansai (IMFEDK), pp. 36–37. Kyoto (2017)
2. Zou, L.J., Hu, A.P., Su, Y.G.: A single-wire capacitive power transfer system with large coupling alignment tolerance. In: 2017 IEEE PELS Workshop on Emerging Technologies: Wireless Power Transfer (WoW), pp. 93–98. Chongqing (2017)
3. Lu, F., Zhang, H., Mi, C.: A two-plate capacitive wireless power transfer system for electric vehicle charging applications. IEEE Trans. Power Electron. **33**(2), 964–969 (2018)
4. Kumar, A., Pervaiz, S., Chang, C.K., Korhummel, S., Popovic, Z., Afridi, K.K.: Investigation of power transfer density enhancement in large air-gap capacitive wireless power transfer systems. In: 2015 IEEE Wireless Power Transfer Conference (WPTC), pp. 1–4. Boulder, CO (2015)
5. Cheng, Z., Lei, Y., Song, K., Zhu, C.: Design and loss analysis of loosely coupled transformer for an underwater high-power inductive power transfer system. IEEE Trans. Magn. **51**(7), 1–10 (2015)
6. Kline, M., Izyumin, I., Boser, B., Sanders, S.: Capacitive Power Transfer for Contactless Charging. In: Proceedings of IEEE Applied Power Electronics Conference and Exposition, pp. 1398–1404. Fort Worth, TX (2011)
7. Naka, Y., Yamamoto, K., Nakata, T., Tamura, M., Masuda, M.: Verification efficiency of electric coupling wireless power transfer in water. In: Proceedings of IEEE MTT-S International Conference on Microwaves for Intelligent Mobility, pp. 83–86. Nagoya, Japan (2017)
8. Somaraju, R., Trumpf, J.: Frequency, temperature and salinity variation of the permittivity of seawater. IEEE Trans. Antennas Propag. **54**(11), 3441–3448 (2006)

# A Robust Sensing Node for Wireless Monitoring of Drinking Water Quality



Lorenzo Mezzera, Michele Di Mauro, Marco Tizzoni, Andrea Turolla, Manuela Antonelli and Marco Carminati

**Abstract** In this work, a low-cost, credit-card-sized electronic platform for continuous monitoring of water quality is presented. It simultaneously measures water pH, temperature, conductivity, flow rate, pressure and the micrometric thickness of surface fouling of both chemical and biological nature. The system includes a GSM transceiver that creates a wireless sensor network for real-time monitoring of these parameters. This enables increased safety and, in perspective, automation and predictive maintenance of the water network. An external watchdog timer increases the robustness of the solution and completes its self-diagnostic capability. Preliminary results from field validation are reported.

## 1 Introduction

Water is often called the "blue gold" of the 21st century, owing to its key role for human life, agriculture and the economy. Public opinion is becoming increasingly aware of its limited availability and, thus, of its value for mankind. For this reason, public agencies managing the water cycle are under increasing pressure to guarantee the safety, quality and quantity of drinking water. In particular, monitoring drinking water quality is fundamental to detect natural or deliberate contamination (such as in terrorist attacks).

The system presented in this work can monitor the main chemical and physical parameters of drinking water: conductivity, pH and temperature. It also monitors three parameters useful for water network operators: water flow, pressure and, by means of an innovative sensor, the thickness of biological (biofilm) or chemical (limestone) deposits on the inner surface of pipes. Along with the introduction of this new deposit sensor, the major design challenges are (i) reliability and robustness, of both the electronics and the mechanics, and (ii) the combination of low cost with high sensitivity, mostly achieved through accurate analog design.

L. Mezzera · M. Carminati (🖂)

Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy e-mail: marco1.carminati@polimi.it

M. Di Mauro · M. Tizzoni · A. Turolla · M. Antonelli Dipartimento di Ingegneria Civile e Ambientale, Politecnico di Milano, Piazza Leonardo da Vinci 32, 20133 Milan, Italy e-mail: manuela.antonelli@polimi.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), Applications in Electronics *Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_42

When pervasively installed in hundreds of nodes, the system can provide an accurate, real-time overview of the network status with no need for traditional laboratory analyses. Such analyses are very accurate, and thus complementary to online sensors. However, they are more expensive and time-consuming and, most of the time, not representative of the entire network. At present, a few of these sensing nodes have been installed, on a pilot scale, at strategic points of a real water distribution network. Their purpose is to model the system and improve maintenance and potabilization processes, for instance by optimizing the use of chemicals.

## 2 Sensors Platform and Electronic Circuit

The platform is the third evolution of previous prototypes, each endowed with an increasing number of sensors and functions [1]. Like other works on drinking water monitoring [2], it measures the basic parameters of drinking water: conductivity, pH and temperature, the latter also being used to compensate for the temperature dependence of other sensors. Flow rate, pressure along pipes and deposit thickness can also be monitored. The measurement of scaling and biofilm is the most original and challenging aspect of this project, because no other sensor on the market can easily discriminate between these two types of fouling. Other commercial products adopt optical or electrochemical principles [3]. Our system instead measures the impedance across planar microelectrodes, at a frequency high enough to shunt the double-layer capacitance. This impedance depends on the thickness and type of deposit [4]. In this way, the main goals of this project, i.e., robustness and sensitivity, can be fulfilled through the use of very compact, low-cost sensors [5].

The system is managed by an Arduino Mega microcontroller board (Fig. 1). The board acquires data from the sensors after conditioning by a custom analog electronic board and manages the data. Through a GSM Shield with an external antenna, it transmits them (every 15–60 s) to an IoT cloud server (ThingSpeak). This portal handles data processing and visualization, either raw or processed by suitable models, on a web page accessible from both computers and smartphones. To increase the robustness of the solution, an external watchdog monitors the correct execution of the code in the microcontroller and resets the system in case of failure. The external watchdog is necessary because the connection and data-sending phase can be very long (up to 20 s), whereas the maximum timeout of the internal microcontroller watchdog is only 8 s.
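A toy timing model illustrates why the external watchdog is needed. The sketch assumes that the firmware services the watchdog only at phase boundaries, so any single phase longer than the timeout triggers a reset. The idle delay is assumed to service the watchdog in a loop and is not modeled. The 20 s and 8 s figures come from the text; the 2 s acquisition time and the 30 s external timeout are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Phase:
    name: str
    duration_s: float  # time spent in the phase without servicing the watchdog

def first_reset(phases: List[Phase], timeout_s: float) -> Optional[str]:
    """Name of the first phase that outlasts the watchdog timeout, or None.

    Assumes the firmware services ("kicks") the watchdog only at phase boundaries.
    """
    for phase in phases:
        if phase.duration_s > timeout_s:
            return phase.name
    return None

# Worst-case cycle: the 20 s GSM phase is from the text; the 2 s acquisition is hypothetical.
cycle = [Phase("acquisition", 2.0), Phase("gsm_connect_and_send", 20.0)]

print(first_reset(cycle, timeout_s=8.0))   # internal watchdog maximum (8 s)
print(first_reset(cycle, timeout_s=30.0))  # hypothetical external watchdog timeout
```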

Fig. 1 Scheme of the whole electronic platform including the sensors embedded in the pipe and the GSM shield for wireless communication

The design specifications in terms of sensor resolution are: $0.5\,^{\circ}$C for temperature, 0.1 for pH, 2 $\mu$S/cm for conductivity and sub-micrometric resolution for deposit. The spacing of the microelectrodes determines the measurable deposit thickness and the operating frequency. For 10 $\mu$m electrodes, the operating frequency is around 2 MHz, a decade above the first pole of the electrochemical interface. This relatively high stimulation frequency requires a shielded, waterproof cable, such as a standard VGA cable. A synchronous demodulation scheme is adopted for accurate impedance measurement in the 1–10 MHz range. A low-power Direct Digital Synthesis (DDS) chip generates the sinusoidal stimulation. It is used in combination with a 50-$\Omega$ transimpedance amplifier (TIA) that converts the sine current into a 100 mV voltage. This voltage is filtered by a second-order Chebyshev low-pass filter (LPF) in Sallen-Key configuration, which drives the forcing electrode. The current from the sensing electrode is collected by another TIA and fed to an analog multiplier together with the reference sinusoid. A final LPF then yields a DC signal inversely proportional to the resistance of the deposit. The measured resolution is better than 2 $\mu$m [4].
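The multiply-and-filter principle of the synchronous demodulator can be sketched in discrete time. The sensed signal is multiplied by the reference sinusoid and averaged over an integer number of periods, which stands in for an ideal final LPF. This leaves a DC value equal to A·cos(φ)/2, where A and φ are the amplitude and phase of the sensed current, while an interferer at a different frequency averages out. Only the 2 MHz stimulation frequency comes from the text. The sampling rate, amplitude, phase and 3 MHz interferer are hypothetical, and the real circuit performs this operation with an analog multiplier.

```python
import math

F_STIM = 2e6                       # stimulation frequency from the text, Hz
SAMPLES_PER_PERIOD = 64
FS = SAMPLES_PER_PERIOD * F_STIM   # hypothetical sampling rate of this simulation
N = SAMPLES_PER_PERIOD * 100       # exactly 100 stimulation periods

def lock_in(signal, reference):
    """Multiply by the reference and average; the mean acts as an ideal LPF."""
    return sum(s * r for s, r in zip(signal, reference)) / len(signal)

t = [n / FS for n in range(N)]
reference = [math.sin(2 * math.pi * F_STIM * tn) for tn in t]

amplitude, phase = 0.3, math.radians(30)   # hypothetical sensed-current amplitude and phase
sensed = [
    amplitude * math.sin(2 * math.pi * F_STIM * tn + phase)
    + 0.5 * math.sin(2 * math.pi * 3e6 * tn)   # hypothetical 3 MHz interferer
    for tn in t
]

dc = lock_in(sensed, reference)
print(f"DC = {dc:.5f}, expected A*cos(phi)/2 = {amplitude * math.cos(phase) / 2:.5f}")
```

For a resistive deposit driven by a constant stimulus voltage, the in-phase current amplitude scales as 1/R. This is consistent with the DC output being inversely proportional to the deposit resistance.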

Fig. 2 Stack of the three electronic boards and the external watchdog that monitors code execution

Gold-coated connector pins [6] are used for conductivity sensing. Thanks to the low cost of these probes, four of them are mounted, which enables physical redundancy and a fault-detection (voting) logic to identify and exclude faulty probes [7]. The electrode geometry is different here (larger separation), so the operating frequency is lower (100 kHz). This allows the use of a single-chip 12-bit DFT-based impedance converter (AD5933 by Analog Devices), saving space and complexity. The conductivity value (along with temperature) is also used by the deposit-thickness estimation algorithm to account for the baseline water conductivity. pH and temperature are measured by an industrial probe whose signals are acquired by a 24-bit ADC, which sets a reference and reads the value from the sensor. The manometer output is read by the internal Arduino ADC, whose performance is adequate for this purpose. The flow meter, based on a Hall sensor, provides a signal whose frequency depends on the water speed.

A key aspect of the PCB layout was the design of the power supply, since digital and analog sections coexist and must be routed separately. The Arduino board supplies the digital power of the ICs, and a separate LDO regulator is used only for the analog supply. The 4-layer PCB uses a ground layer to separate the analog portion of the circuit from the on-board communication buses, such as SPI and I2C. The electronics form a stack, with the Arduino Mega at the bottom, the custom analog-conditioning board in the middle and the GSM Shield on top (Fig. 2).
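The text does not detail the voting logic used with the four conductivity probes. One common approach is shown here only as a hypothetical sketch. It takes the median of the redundant readings as a robust reference, flags probes that deviate from it by more than a relative tolerance, and averages the remaining ones. The 5% tolerance and the sample readings are assumptions.

```python
from statistics import median
from typing import List, Tuple

def vote(readings: List[float], rel_tol: float = 0.05) -> Tuple[float, List[int]]:
    """Median-based voting over redundant probes (hypothetical scheme).

    Probes deviating from the median by more than rel_tol (relative) are flagged
    as faulty; the remaining readings are averaged.
    """
    ref = median(readings)
    good = [i for i, x in enumerate(readings) if abs(x - ref) <= rel_tol * abs(ref)]
    if not good:
        raise ValueError("no consensus among probes")
    faulty = [i for i in range(len(readings)) if i not in good]
    value = sum(readings[i] for i in good) / len(good)
    return value, faulty

# Hypothetical conductivity readings in uS/cm; probe 2 is drifting (e.g., fouled).
value, faulty = vote([512.0, 508.0, 730.0, 515.0])
print(f"conductivity = {value:.1f} uS/cm, excluded probes: {faulty}")
```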

## 3 Performance Assessment and Validation

The entire system has been thoroughly tested. Testing began with a static laboratory analysis and continued in a closed water loop reproducing dynamic conditions (water flow and pressure) similar to those of a real distribution network. The experimental results confirm performance in agreement with the design specifications and in line with commercial instrumentation, but with higher integration and lower cost and size. In particular:

- temperature: resolution better than 0.1 °C between 0 and 50 °C
- pH: resolution better than 0.1 between pH 0 and pH 14
- conductivity: resolution better than 20 ppm between 50 and 2500 $\mu$S/cm
- film monitoring: resolution better than 2 $\mu$m between 5 and 35 $\mu$m and better than 5 $\mu$m between 35 and 350 $\mu$m
- flow: resolution better than 0.1 L/s between 2.5 and 80 L/s
- pressure: the system was tested up to 4 bar

The system has been tested within the loop for a continuous 3-week period. Among various tests tracking the chemo-physical parameters of the water, the power consumption was measured with a Keithley DMM7510 multimeter. The data (Fig. 3) show that the acquisition and transmission periods account for only a small fraction of the total energy consumption. Most of the energy is spent in the "delay" condition. This is because the prototyping platform used here (Arduino) cannot switch to sleep mode and the custom acquisition board has no power-down capability. GSM technology was adopted instead of an emerging IoT-oriented solution (such as Sigfox or LoRaWAN) because of the lack of standardization and the excellent coverage of the traditional cellular network. Nevertheless, an auxiliary serial port for communication with different transceiver modules is present, so the node can adapt to the various radio infrastructures locally available along the pipe network.



## 4 Conclusions and Outlook

After the laboratory validation in the loop, three fully equipped prototypes (Fig. 4) are currently undergoing pilot validation in a real water distribution network. Meanwhile, the time tracking of power consumption shows that consumption must be reduced if an energy harvesting system (with an external battery) is to allow installation of the nodes away from power mains. Preliminary tests were performed with an energy-recovering turbine combined with a smart algorithm running in the cloud. The algorithm adapts the time delay between two consecutive acquisitions in order to maximize the battery charge. At the same time, a redesign of the entire board is in progress to reduce the stand-by consumption. During the sleep phase, the microcontroller turns off the sensor acquisition and transmission sections, powering only the energy harvesting circuitry. The consumption drops to about 30 mA, i.e., ten times lower than the present average consumption of about 1.5 W. Another solution for offline operation is on-board data storage. An SD card reader and a Real Time Clock (RTC) are present to log the sensor values with the exact time.
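The trade-off handled by the cloud algorithm that adapts the acquisition delay can be sketched as a simple duty-cycle energy balance: the average power over one period must not exceed the harvested power. All numbers below are assumptions for illustration. The active power is taken equal to the present 1.5 W average. The sleep power assumes 30 mA at a hypothetical 5 V supply (0.15 W). The 25 s active window and the 0.3 W harvested power are hypothetical.

```python
def average_power(period_s: float, t_active_s: float, p_active_w: float, p_sleep_w: float) -> float:
    """Average power of a node that is active for t_active_s in every period_s."""
    if period_s < t_active_s:
        raise ValueError("period shorter than the active window")
    return (p_active_w * t_active_s + p_sleep_w * (period_s - t_active_s)) / period_s

def min_period(p_harvest_w: float, t_active_s: float, p_active_w: float, p_sleep_w: float) -> float:
    """Shortest period whose average power does not exceed the harvested power."""
    if p_harvest_w <= p_sleep_w:
        raise ValueError("harvested power does not cover the sleep consumption")
    if p_harvest_w >= p_active_w:
        return t_active_s  # the node could stay active continuously
    # Solve P_a*t + P_s*(T - t) <= P_h*T for T.
    return t_active_s * (p_active_w - p_sleep_w) / (p_harvest_w - p_sleep_w)

P_ACTIVE = 1.5    # W, taken equal to the present average consumption (assumption)
P_SLEEP = 0.15    # W, 30 mA at a hypothetical 5 V supply
T_ACTIVE = 25.0   # s, hypothetical acquisition + GSM transmission window
P_HARVEST = 0.3   # W, hypothetical turbine output

T = min_period(P_HARVEST, T_ACTIVE, P_ACTIVE, P_SLEEP)
print(f"minimum period: {T:.0f} s, average power: {average_power(T, T_ACTIVE, P_ACTIVE, P_SLEEP):.3f} W")
```

With these values the node could wake up at most once every 225 s. An adaptive algorithm can recompute this bound as the harvested power changes.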

**Acknowledgements** Politecnico di Milano is acknowledged for providing partial financial support to this activity through the Switch2Product Competition 2017.

## References

1. Carminati, M., et al.: A smart sensing node for pervasive water quality monitoring with antifouling self-diagnostics. In: Proceedings of IEEE ISCAS, pp. 1–5 (2018)
2. Lambrou, T.P., et al.: A low-cost sensor network for real-time monitoring and contamination detection in drinking water distribution systems. IEEE Sensors J. 2765–2772 (2014)
3. Pavanello, G., et al.: Exploiting a new electrochemical sensor for biofilm monitoring and water treatment optimization. Water Res. **45**, 1651–1658 (2011)
4. Turolla, A., et al.: Development of a miniaturized and selective impedance sensor for real-time slime monitoring in pipes and tanks. Sensors and Actuators B **281**, 288–295 (2019)
5. Munoz-Berbel, X., Munoz, F., Vigués, N., Mas, J.: On-chip impedance measurements to monitor biofilm formation in the drinking water distribution network. Sensors and Actuators B **118**, 129–134 (2006)
6. Carminati, M., Luzzatto-Fegiz, P.: Conduino: affordable and high-resolution multichannel water conductivity sensor using micro USB connectors. Sensors and Actuators B Chem. **251**, 1034–1041 (2017)
7. Ramotsoela, D., Abu-Mahfouz, A., Hancke, G.: A survey of anomaly detection in industrial wireless sensor networks with critical water system infrastructure as a case study. Sensors **8**, 2491(1–24) (2018)

# Doubly-Balanced Gilbert Cell Down-Conversion Mixer in AMS 0.35 µm SiGe CMOS for Mode-1 MB-OFDM UWB Receivers



S. Cammarata, G. Fieramosca, B. Neri, F. Baronti and S. Saponara

**Abstract** This work presents the design of a mixer in RF CMOS technology for an MB-OFDM UWB transceiver operating in Mode-1 (3.1–4.8 GHz). Working at circuit level in the ADS™ RF CAD and at system level in Matlab-Simulink™, a Gilbert cell architecture is proposed, exploiting current bleeding, integrated matching and inductive tuning techniques. Circuit performance results in AMS 0.35 $\mu$m SiGe CMOS technology prove the feasibility of the circuit for integration in a low-cost UWB receiver.

**Keywords** Multi-band orthogonal frequency division multiplexing (MB-OFDM) · Ultra-wideband (UWB) · CMOS mixer · Current bleeding

## 1 Introduction

The aim of this work is to design a CMOS down-conversion mixer in AMS $0.35\,\mu$m SiGe CMOS technology that works properly for the MB-OFDM UWB service [1]. State-of-the-art solutions, such as that of Touati [2], are taken as a benchmark. The spectrum allocated to this service is divided into multiple band groups. Only the first group is mandatory, while the others are reserved for future use. The mandatory group, called Mode-1, is composed of 3 channels centered at 3.432, 3.960 and 4.488 GHz, respectively. Each channel is divided into 128 sub-channels, which constitute the orthogonal subcarriers of the OFDM signal associated with the channel. In a UWB receiver, mixer design involves many trade-offs: wideband matching, high conversion gain (about 14 dB in [2]) and flatness of parameters over all the channels. These are very hard to obtain with the considered technology. It offers a low-cost solution, more than one order of magnitude cheaper in Europractice than the GF 28 nm RF process, but it is not well suited for frequencies above 4 GHz.
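The Mode-1 channel plan can be reproduced from the MB-OFDM band numbering of the ECMA-368 standard. In that scheme, band n_b is centered at 2904 + 528·n_b MHz and has a 528 MHz bandwidth. The subcarrier spacing follows from dividing each band into 128 subcarriers.

```python
BANDWIDTH_MHZ = 528.0   # MB-OFDM band width
N_SUBCARRIERS = 128     # subcarriers per band

def band_center_mhz(n_band: int) -> float:
    """ECMA-368 band center frequency: 2904 + 528 * n_band MHz."""
    return 2904.0 + BANDWIDTH_MHZ * n_band

for n in (1, 2, 3):   # band group 1, i.e. Mode-1
    fc = band_center_mhz(n)
    print(f"band {n}: {fc - BANDWIDTH_MHZ / 2:.0f}-{fc + BANDWIDTH_MHZ / 2:.0f} MHz, center {fc:.0f} MHz")

print(f"subcarrier spacing: {BANDWIDTH_MHZ / N_SUBCARRIERS:.3f} MHz")
```

Band group 1 thus spans 3168–4752 MHz, consistent with the 3.1–4.8 GHz range quoted in the abstract, with a subcarrier spacing of 4.125 MHz.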

S. Cammarata (⊠) · G. Fieramosca · B. Neri · F. Baronti · S. Saponara Dipartimento Ingegneria dell'Informazione – Università di Pisa, Via G. Caruso 16, Pisa, Italy

e-mail: cammarata.simone@yahoo.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_43

A doubly-balanced active mixer topology has been chosen to increase the conversion gain and to largely suppress LO feedthrough and even-order distortion products at the mixer output. Several circuit techniques are investigated to improve matching, gain and noise performance without increasing power consumption. All circuit-level simulations are performed in ADS™, as discussed in Sect. 3. In Sect. 4, a system-level simulation is carried out by means of a Matlab-Simulink™ model and conclusions are drawn.
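A behavioral sketch with ideal switches shows why the doubly-balanced topology removes LO feedthrough and even-order products. A single transconductor with a hypothetical quadratic term, I = I_DC + g·v + a2·v², is compared with a differential pair driven by ±v/2 whose outputs are subtracted through cross-coupled switches. The subtraction cancels both the bias current and the v² term before switching. As a result, neither the LO tone nor the 2f_RF - f_LO product appears at the output. All coefficients and frequencies (expressed as DFT bins) are hypothetical, and the switches are ideal.

```python
import cmath
import math

N = 1024                      # samples in the analysis window
K_LO, K_RF = 32, 40           # LO and RF frequencies in DFT bins (IF at bin 8)
G, A2, I_DC = 1.0, 0.2, 1.0   # hypothetical transconductance, 2nd-order term, bias current

def bin_mag(x, k):
    """Magnitude of DFT bin k, normalized by the record length."""
    acc = sum(v * cmath.exp(-2j * math.pi * k * n / len(x)) for n, v in enumerate(x))
    return abs(acc) / len(x)

def i_out(v):
    """Hypothetical transconductor output current with a quadratic nonlinearity."""
    return I_DC + G * v + A2 * v * v

half = N // K_LO // 2    # half LO period in samples
lo = [1.0 if (n // half) % 2 == 0 else -1.0 for n in range(N)]   # ideal switching waveform
v_rf = [0.1 * math.cos(2 * math.pi * K_RF * n / N) for n in range(N)]

# Single-balanced: one transconductor followed by one switching pair.
single = [i_out(v) * s for v, s in zip(v_rf, lo)]
# Doubly-balanced: pair driven by +v/2 and -v/2, currents subtracted by cross-coupled switches.
double = [(i_out(v / 2) - i_out(-v / 2)) * s for v, s in zip(v_rf, lo)]

for name, y in (("single", single), ("double", double)):
    print(f"{name:6}: LO {bin_mag(y, K_LO):.2e}, IF {bin_mag(y, K_RF - K_LO):.2e}, "
          f"2RF-LO {bin_mag(y, 2 * K_RF - K_LO):.2e}")
```

The IF component is identical in both cases. Only the single-balanced output shows a strong LO tone and the even-order product.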

## 2 Gilbert Cell Operation, Sizing and Biasing

Referring to Fig. 1, the sinusoidal RF signal is transformer-coupled [3] to the transconductive stage in order to drive the input transistor pair with a balanced signal. Assume an RF source internal impedance $Z_{S_{RF}} = 2Z_0 = 100\,\Omega$. The differential input impedance $Z_{in_{RF}}$ seen between $v_{rf}^+$ and $v_{rf}^-$, i.e., at the RF port, must then equal $Z_{S_{RF}}$ to achieve maximum power transfer. The load of the switching stage is formed by two contributions:

1. A resistance $R_L^{(DC)}$ to adjust the DC bias point (as also done in [2, 3]).
2. The «actual» load of the mixer $Z_L^{(IF)}$, i.e., the input impedance of the subsequent stage, transformer-coupled to the output port. A common choice is to connect a transmission line to each output terminal, usually with characteristic impedance $Z_0 = 50\,\Omega$; hence, $Z_L^{(IF)} = 2Z_0 = 100\,\Omega$.

A proper choice of device parameters and bias is needed to obtain the required performance. The circuit has six MOSFETs. The two devices of the transconductive stage must be nominally equal (denoted $M_{RF}$), and so must the four devices of the switching stage (denoted $M_{LO}$). To size and bias all the active devices, it is helpful to recall that a single independent MOSFET can be described by a set of four «DC point» quantities and two geometrical dimensions: $(V_{GS}, V_{DS}, V_{BS}, I_D, W, L)$. The whole set of degrees of freedom (DOFs) that affects the Gilbert cell DC operation is then:

$$\left\{ (V_{\text{GS}}, V_{\text{DS}}, V_{\text{BS}}, I_D, W, L)_{M_{\text{RF}}, M_{\text{LO}}}, R_L^{(\text{DC})}, V_{\text{DD}}, I_{\text{bias}}, V_{\text{tail}} \right\} .$$

The parameters that come directly from the specifications, taken from [2], are $V_{\text{DD}} = 3$ V, $I_{\text{bias}} = 6$ mA and $R_L^{(\text{DC})} = 600\,\Omega$, which set the power consumption to $P_D = V_{\text{DD}} I_{\text{bias}} = 18$ mW. The channel length *L* of each transistor has been set to the minimum value allowed by the technology in order to maximize speed: $L_{M_{\text{RF}}} = L_{M_{\text{LO}}} = 0.35\,\mu\text{m}$. Regarding transistor width, the two stages must be treated separately:

1. $M_{\rm RF}$ is sized with the heuristic formula provided by Shaeffer and Lee [4], which gives a power-constrained optimum for the noise figure of a single MOSFET. The «rule of thumb» is $W \cdot f = 500\text{–}600 \,\mu$m GHz. Because of the large bandwidth, $M_{\rm RF}$ has been sized at the central frequency, i.e. $f = 4 \,\text{GHz}$: $W_{M_{\rm RF}} = 150 \,\mu\text{m}$.
2. Since the $M_{\rm LO}$ devices act alternately as open and short circuits, they have a lower impact on the overall noise performance, so it is not crucial to size them with the Shaeffer-Lee formula. This DOF can instead be used to lower the intrinsic drain-source series resistance of the switches in the «on-state», by setting $W_{M_{\rm LO}}$ to the maximum value allowed by the technology.

**Fig. 1** Gilbert cell improved by the $L_{\pi}$ tuning inductor and the $I_{bleed}$ current-bleeding sources (highlighted in dashed boxes). The substrate tap of each MOSFET is connected to ground. For the sizing phase described in Sect. 2, the $I_{bleed}$ current sources should be considered switched off.
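The specification-driven numbers above follow from simple arithmetic. The sketch below recomputes $P_D$ and the width window implied by the Shaeffer-Lee rule of thumb at 4 GHz. The chosen $W_{M_{\rm RF}} = 150\,\mu$m sits at the upper end of that window. The constant and function names are illustrative only.

```python
# Recompute the Sect. 2 sizing arithmetic from the stated specifications.
V_DD = 3.0                      # supply voltage [V]
I_BIAS = 6e-3                   # tail bias current [A]
F_CENTER_GHZ = 4.0              # band centre used for sizing [GHz]
WF_RULE_UM_GHZ = (500.0, 600.0) # Shaeffer-Lee W*f rule of thumb [um*GHz]


def power_dissipation(v_dd, i_bias):
    """DC power P_D = V_DD * I_bias, in watts."""
    return v_dd * i_bias


def width_window_um(f_ghz, wf_range=WF_RULE_UM_GHZ):
    """Width range in um implied by W*f = const at frequency f (GHz)."""
    lo, hi = wf_range
    return lo / f_ghz, hi / f_ghz


p_d = power_dissipation(V_DD, I_BIAS)
w_min, w_max = width_window_um(F_CENTER_GHZ)
print(f"P_D = {p_d * 1e3:.1f} mW")
print(f"W_RF window at {F_CENTER_GHZ} GHz: {w_min:.0f}-{w_max:.0f} um")
```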

A voltage headroom $V_{\text{tail}}$ is required for the current generator $I_{\text{bias}}$. It has been set taking into account that the most common implementation is a simple current mirror. To make the $M_{\text{LO}}$ switches as close to ideal as possible, we set the voltage drop across them to a low value, of the same order as $V_{\text{tail}}$. The DC operating point is summarized in Table 1, and the resulting DC bias voltages are given in (1).

**Table 1** $M_{\rm RF}$ and $M_{\rm LO}$ DC operating point summary

| Quantity      | $M_{\rm RF}$ | $M_{\rm LO}$ |
|---------------|--------------|--------------|
| $V_{\rm GS}$  | 0.894 V      | 0.933 V      |
| $V_{\rm DS}$  | 0.7 V        | 0.25 V       |
| $V_{\rm BS}$  | -0.25 V      | -0.95 V      |
| $I_D$         | 3 mA         | 1.5 mA       |

$$\begin{aligned} V_{\text{RF}_{\text{DC}}} &= 1.144\ \text{V},\\ V_{\text{LO}_{\text{DC}}} &= 1.883\ \text{V},\\ V_{\text{tail}} &= 0.25\ \text{V}. \end{aligned} \tag{1}$$
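The entries of Table 1 and the voltages in (1) are mutually consistent. This can be checked by walking up the transistor stack from the tail node. The sketch makes several assumptions:

- There is no DC drop across the source-degeneration inductors.
- All substrates are tied to ground, as stated in the Fig. 1 caption.
- Each output node collects the drain currents of two switching transistors.
- $R_L^{(DC)}$ is the only DC path at each output node, because the IF load is transformer-coupled.

The `DcPoint` class and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DcPoint:
    """DC operating point of one MOSFET (values from Table 1)."""
    v_gs: float  # gate-source voltage [V]
    v_ds: float  # drain-source voltage [V]
    v_bs: float  # bulk-source voltage [V]
    i_d: float   # drain current [A]


M_RF = DcPoint(v_gs=0.894, v_ds=0.7, v_bs=-0.25, i_d=3e-3)
M_LO = DcPoint(v_gs=0.933, v_ds=0.25, v_bs=-0.95, i_d=1.5e-3)
V_TAIL, V_DD, R_L_DC, I_BIAS = 0.25, 3.0, 600.0, 6e-3


def node_voltages(rf, lo, v_tail):
    """Walk up the stack from the tail node (bulk of every device at 0 V)."""
    v_s_rf = v_tail              # assumed: no DC drop across L_s
    v_d_rf = v_s_rf + rf.v_ds    # RF drain node = LO source node
    return {
        "V_RF_DC": v_s_rf + rf.v_gs,  # RF gate bias
        "V_LO_DC": v_d_rf + lo.v_gs,  # LO gate bias
        "V_out": v_d_rf + lo.v_ds,    # LO drain = output node
        "V_BS_LO": 0.0 - v_d_rf,      # bulk at ground, source at v_d_rf
    }


v = node_voltages(M_RF, M_LO, V_TAIL)
# Assumed: two switches feed each output node and R_L^(DC) carries all of it.
v_out_from_load = V_DD - 2 * M_LO.i_d * R_L_DC
for name, value in v.items():
    print(f"{name} = {value:.3f} V")
print(f"V_out from load = {v_out_from_load:.3f} V")
```

The two independent estimates of the output node voltage, one from the stack and one from the load drop, should agree. Likewise, $2 I_{D,M_{\rm RF}}$ and $4 I_{D,M_{\rm LO}}$ should both return $I_{\rm bias}$.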

## 3 RF Design and Optimization

To achieve complex conjugate matching at the RF port, inductive source degeneration ($L_s$) and gate matching networks ($L_g$) have been added to the $M_{\rm RF}$ transistors. According to [4], this solution guarantees the best noise performance. A simplified analysis of a single $M_{\rm RF}$ device (including $L_s$ and $L_g$) at the central frequency of the band (i.e. $\omega_0/2\pi = 4$ GHz) leads to the following analytical expression for its input impedance $Z_{\rm in}$:

$$Z_{\rm in} = L_{\rm s} \frac{g_{\rm m}}{c_{\rm gs}} + j \left( \omega_0 L_{\rm s} + \omega_0 L_{\rm g} - \frac{1}{\omega_0 c_{\rm gs}} \right) . \tag{2}$$

Imposing $\Re \{Z_{in}\} = Z_0$ and $\Im \{Z_{in}\} = 0$ yields two constraints on $L_s$ and $L_g$. Since $g_m/c_{gs} \approx \omega_T$, the first gives $L_s = Z_0/\omega_T$, and the second gives $L_s + L_g = 1/(\omega_0^2 c_{gs})$. In the AMS $0.35 \,\mu$m technology, $\frac{c_{gs}}{W} \sim 1.15 \,\text{pF/mm}$ and $f_T \sim 15 \,\text{GHz}$ (evaluated at the bias point of Table 1). These values give $L_s \approx 0.5 \,\text{nH}$ and $L_g \approx 8.7 \,\text{nH}$, which are the starting point for the RF optimization phase. To further improve mixer performance, two circuit techniques can be adopted: **current bleeding** and **tuning inductance**.
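The closed-form solution of the two matching constraints can be checked numerically. The sketch assumes $c_{gs} = 1.15$ pF/mm $\times$ 150 μm and a target of $Z_0 = 50\,\Omega$ per half-circuit. Both values are taken from the text. The function name is illustrative. The sketch reproduces the quoted starting values to within rounding.

```python
import math

Z0 = 50.0              # target real part per half-circuit [ohm]
F0 = 4e9               # band centre [Hz]
F_T = 15e9             # transit frequency at the Table 1 bias [Hz]
CGS_PER_MM = 1.15e-12  # c_gs / W [F/mm]
W_RF_MM = 0.150        # M_RF width [mm]


def matching_inductors(z0, f0, f_t, c_gs):
    """Return (L_s, L_g) in henry from Re{Z_in}=Z0 and Im{Z_in}=0 in Eq. (2)."""
    w0 = 2 * math.pi * f0
    w_t = 2 * math.pi * f_t             # g_m / c_gs ~ omega_T
    l_s = z0 / w_t                      # real part: L_s * omega_T = Z0
    l_total = 1.0 / (w0 ** 2 * c_gs)    # imag. part: w0 (L_s + L_g) = 1 / (w0 c_gs)
    return l_s, l_total - l_s


c_gs = CGS_PER_MM * W_RF_MM
l_s, l_g = matching_inductors(Z0, F0, F_T, c_gs)
print(f"c_gs = {c_gs * 1e15:.1f} fF, L_s = {l_s * 1e9:.2f} nH, L_g = {l_g * 1e9:.2f} nH")
```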

Current bleeding was originally proposed by Lee [5]. It stems from the fact that both the input third-order intercept point (IIP<sub>3</sub>) and the conversion gain ($G_C$) are proportional to the square root of the quiescent current of $M_{RF}$, since $M_{RF}$ operates in the saturation region [2]. Moreover, decreasing the bias current of the $M_{LO}$ devices reduces the flicker noise produced by the switching stage [6].

Mixer performance can therefore be improved by increasing $I_{D_{M_{\text{RF}}}}$ and decreasing $I_{D_{M_{\text{LO}}}}$. Lee's solution was to place a current generator $I_{\text{bleed}}$ at each output node of the transconductive stage, in order to supply this «excess current» to the $M_{\text{RF}}$ devices. Typically, $I_{\text{bleed}}$ is set to half the bias current of the RF stage [5]: $I_{\text{bleed}} = I_{D_{M_{\text{RF}}}}/2 = 1.5 \text{ mA}$. Note that this procedure changes the bias point designed so far.
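The square-root dependence gives a quick first-order estimate of what a current increase is worth. In the illustrative scenario below, the bleeding current simply adds to the quiescent current of each $M_{\rm RF}$, taking it from 3 mA to 4.5 mA. This is a hypothetical assumption, because the text does not state the new bias point. The estimate also ignores the change in the switching stage, so it is not a prediction of the simulated results in Table 2.

```python
import math


def square_law_gain_change_db(i_old, i_new):
    """Change of G_C in dB if G_C scales with sqrt(I_D) (saturation, square law)."""
    return 20.0 * math.log10(math.sqrt(i_new / i_old))


I_D_RF = 3e-3            # M_RF quiescent current from Table 1 [A]
I_BLEED = I_D_RF / 2     # rule of thumb from [5]: 1.5 mA
# Hypothetical scenario: the bleeding current adds to the M_RF current.
i_new = I_D_RF + I_BLEED
delta_db = square_law_gain_change_db(I_D_RF, i_new)
print(f"I_bleed = {I_BLEED * 1e3:.1f} mA, first-order G_C change = {delta_db:.2f} dB")
```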



The standard Gilbert cell topology suffers from the parasitic capacitance at the drain nodes of the transconductive transistor pair, shown in Fig. 1. This capacitance $c_{par}$ lowers the impedance at those nodes and causes RF signal loss. The loss reduces the effective transconductance and degrades both conversion gain and noise figure [7]. To cancel this effect, a single common inductor $L_{\pi}$ can be connected between the output nodes of the first stage so that it resonates with $c_{par}$. To maximize conversion gain and minimize noise figure (NF), $L_{\pi}$ was set to 3.5 nH after several parametric sweeps with the other parameters held constant.

The tuning inductor also plays a crucial role in adjusting the real part of the impedance seen from the RF port of the Gilbert cell. As shown in Fig. 2, without $L_{\pi}$, $\Re \{Z_{in_{RF}}\}$ is always greater than $Z_{S_{RF}} = 100 \Omega$. Even after a parametric sweep on $L_s$, $\Re \{Z_{in_{RF}}\}$ remains much larger than the value expected from the simplified analysis. This may be due to the unavoidable coupling between the gate and the drain circuit of $M_{\rm RF}$, which feeds the load back to the input mesh. With $L_{\pi}$ in the circuit, $\Re \{Z_{in_{RF}}\}$ settles to the desired value over a considerable part of the band. It stays slightly below $100 \Omega$ only at the lower end, without affecting $\Im\{Z_{in_{RF}}\}$.

Unfortunately, as Fig. 2 also shows, it is impossible to achieve $\Im\{Z_{in_{RF}}\} = 0$ over the entire bandwidth, or even over a single channel. This is because the implemented gate network, i.e. $L_g$, realizes only a narrow-band match. A possible solution is a wide-band matching network. Such a network uses more reactive elements to distribute multiple resonance conditions across the bandwidth and obtain the desired $Z_{in_{RF}}$ response. Its design requires filter theory [2] and is beyond the scope of this work.
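The resonance condition gives an order-of-magnitude feel for the parasitic capacitance that $L_{\pi}$ compensates. The estimate rests on two hypothetical assumptions:

- $L_{\pi} = 3.5$ nH resonates with $c_{par}$ exactly at the 4 GHz band centre.
- For the differential signal, each drain node sees half of $L_{\pi}$ to the virtual ground at the inductor's midpoint.

The text does not report $c_{par}$. The actual $L_{\pi}$ was found by a sweep that also traded off NF, so the number printed below is only an illustration.

```python
import math


def implied_cpar(l_pi, f0):
    """Per-node capacitance resonating with L_pi/2 at f0 (differential half-circuit)."""
    w0 = 2 * math.pi * f0
    return 1.0 / (w0 ** 2 * (l_pi / 2))


def resonance_hz(l_pi, c_par):
    """Resonant frequency of the half-circuit (L_pi/2) || c_par."""
    return 1.0 / (2 * math.pi * math.sqrt((l_pi / 2) * c_par))


L_PI = 3.5e-9   # tuning inductor from the text [H]
c_par = implied_cpar(L_PI, 4e9)
print(f"implied c_par ~ {c_par * 1e12:.2f} pF per drain node")
```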

Since $L_g$ cannot provide wide-band matching anyway, it can be used as a DOF to optimize other figures of merit. Compared with the results obtained in [2], our design fell furthest behind in conversion gain (Fig. 3), so we decided to focus on enhancing $G_C$.

The results of a parametric sweep on $L_g$ are reported in Fig. 4, where a kink appears for $L_g$ values below 6 nH. Further simulations suggest that it may be due to the *potential instability* of the transconductive amplifier under those conditions. Hence, $L_g$ should be greater than 6 nH to avoid the sharp peak within the band, and less than 8 nH to keep $G_C > 0$ dB for all the channels.



**Fig. 3**  $G_{\rm C}$  ( $f_{\rm LO}$ ) comparison between the three different designs



**Fig. 4** Conversion gain $G_{\rm C}$ and noise figure NF<sub>SSB</sub> as a function of the frequency $f_{\rm LO}$ and of $L_{\rm g}$. $f_{\rm RF}$ is 45 MHz apart from $f_{\rm LO}$, so that an IF signal at $f_{\rm IF} = 45$ MHz is observed. Here: $L_{\pi} = 3.5$ nH, $I_{\rm bleed} = 0$

All the observations made for $G_{\rm C}$ apply, in a mirrored way, to NF: where $G_{\rm C}$ increases, NF decreases, and where $G_{\rm C}$ flattens, so does NF. This is a favorable result, because conversion gain and noise performance are optimized at the same time. Finally, $L_{\rm g}$ has been set to 7.5 nH.

Table 2 summarizes all the relevant performance figures of the proposed design. For simplicity, they are evaluated at the central frequencies of the three channels of the Mode-1 service, i.e. $f = \{3.4, 3.9, 4.5\}$ GHz. Note that the RF simulations do not include models for mismatch or for non-idealities of the components used, so isolation parameters are not considered in this work.

| Circuit version    | Mode-1 channel | $Z_{\text{in}_{RF}}$ (Ω) | $G_{\rm C}$ (dB) | NF<sub>SSB</sub> (dB) | CP<sub>1 dB</sub> (dBm) | IIP<sub>3</sub> (dBm) |
|--------------------|----------------|--------------------------|------------------|-----------------------|-------------------------|-----------------------|
| Standard           | $f_{\#1}$      | 112 - j124               | 2.151            | 5.705                 | -11.50                  | -0.425                |
|                    | $f_{\#2}$      | 112 - j21.0              | 1.394            | 5.775                 | -10.98                  | 0.193                 |
|                    | $f_{\#3}$      | 112 + j86.0              | -1.012           | 6.516                 | -9.11                   | -0.208                |
| w/$L_{\pi}$        | $f_{\#1}$      | 89.7 - j123              | 4.535            | 5.107                 | -13.67                  | -0.303                |
|                    | $f_{\#2}$      | 95.4 - j18.0             | 4.146            | 5.204                 | -12.10                  | 2.096                 |
|                    | $f_{\#3}$      | 100 + j90.0              | 1.061            | 5.883                 | -8.46                   | 1.923                 |
| w/$I_{\rm bleed}$  | $f_{\#1}$      | 84.4 - j114              | 5.704            | 4.902                 | -15.34                  | 0.031                 |
|                    | $f_{\#2}$      | 93.1 - j8.78             | 4.421            | 5.064                 | -10.83                  | -1.056                |
|                    | $f_{\#3}$      | 101 + j98.2              | 1.552            | 5.741                 | -11.83                  | 0.918                 |

**Table 2** Simulated mixer performances for  $f_{\#1} = 3.4$  GHz,  $f_{\#2} = 3.9$  GHz,  $f_{\#3} = 4.5$  GHz,  $f_{IF} = 45$  MHz. Operating conditions:  $P_{RF} = -30$  dBm,  $P_{LO} = 7.5$  dBm,  $L_{\pi} = 3.5$  nH,  $I_{bleed} = 1.5$  mA,  $\Delta f = 4.125$  MHz
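The residual reactance in Table 2 can be translated into power lost at the RF port. This quantifies the narrow-band nature of the $L_g$ match. The sketch uses the standard reflection-coefficient formula for a real 100 Ω source and the «w/$L_{\pi}$» impedances from the table. Variable names are illustrative. The match turns out to be good only near the middle channel, close to the 4 GHz design frequency.

```python
import math

Z_S = 100.0  # differential source impedance [ohm]

# Z_in_RF of the "w/L_pi" version, copied from Table 2 [ohm]
Z_IN_W_LPI = {
    "f#1": complex(89.7, -123.0),
    "f#2": complex(95.4, -18.0),
    "f#3": complex(100.0, 90.0),
}


def reflection_coefficient(z_in, z_s=Z_S):
    """Gamma seen from a real source impedance z_s."""
    return (z_in - z_s) / (z_in + z_s)


def mismatch_loss_db(z_in, z_s=Z_S):
    """Power lost to reflection, -10 log10(1 - |Gamma|^2), in dB."""
    return -10.0 * math.log10(1.0 - abs(reflection_coefficient(z_in, z_s)) ** 2)


losses = {ch: mismatch_loss_db(z) for ch, z in Z_IN_W_LPI.items()}
for ch, ml in losses.items():
    print(f"{ch}: mismatch loss = {ml:.2f} dB")
```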

To verify the benefits of the circuit techniques, three different versions of the mixer are considered (the $G_{\rm C}(f)$ trend is reported in Fig. 3):

1. Standard Gilbert cell, without $L_{\pi}$ and bleeding current.
2. Modified Gilbert cell with $L_{\pi} = 3.5$ nH, but without bleeding current (referred to as «w/$L_{\pi}$» throughout the article).
3. Complete Gilbert cell with 3.5 nH inductive tuning and 1.5 mA bleeding current (referred to as «w/$I_{\rm bleed}$» throughout the article).

## 4 System-Level Simulation Results and Conclusions

An «out-of-the-box» Matlab-Simulink<sup>TM</sup> testbench has been used to obtain system-level results. Without loss of generality, system performance can be evaluated on a single subchannel digitally encoded with 8-PSK modulation. The model includes:

1. A digital signal generator, which creates the numerical symbol stream with symbol bandwidth B = 4.125 MHz, according to the modulation format.
2. A path-loss model over a fading-free AWGN communication channel.
3. A blocker signal generator. Its bandwidth equals that of the main RF signal because, in our case, the blocker represents an adjacent subchannel.
4. A receiving unit, which models a direct-conversion I/Q receiver followed by numerical de-mapping blocks. Two nominally identical instances of the mixer, characterized with the parameters extracted from the ADS<sup>TM</sup> simulations reported in Table 2, have been implemented for comparison purposes.
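The first two blocks of the list above can be approximated in a few lines outside Simulink. This is a minimal NumPy sketch, not the authors' testbench. It assumes Gray-coded 8-PSK with unit symbol energy and a complex AWGN channel set by a hypothetical $E_s/N_0$. Path loss, the blocker and the mixer models are omitted, and all names are illustrative.

```python
import numpy as np

M = 8  # 8-PSK

# Gray labelling: the point at phase index k carries label k ^ (k >> 1),
# so adjacent constellation points differ by exactly one bit.
GRAY_LABEL = np.array([k ^ (k >> 1) for k in range(M)])
PHASE_INDEX_OF_LABEL = np.argsort(GRAY_LABEL)  # inverse permutation


def psk8_modulate(bits):
    """Map a bit array (length multiple of 3) to unit-energy 8-PSK symbols."""
    triplets = np.asarray(bits).reshape(-1, 3)
    labels = triplets[:, 0] * 4 + triplets[:, 1] * 2 + triplets[:, 2]
    k = PHASE_INDEX_OF_LABEL[labels]
    return np.exp(1j * 2 * np.pi * k / M)


def awgn(symbols, es_n0_db, rng):
    """Add complex white Gaussian noise for a given Es/N0 (unit Es assumed)."""
    n0 = 10 ** (-es_n0_db / 10)
    noise = rng.normal(scale=np.sqrt(n0 / 2), size=(symbols.size, 2))
    return symbols + noise[:, 0] + 1j * noise[:, 1]


rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=3 * 1000)
tx = psk8_modulate(bits)
rx = awgn(tx, es_n0_db=20.0, rng=rng)  # hypothetical Es/N0
print(tx[:3], rx[:3])
```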

The quality of the system can be assessed at first glance from received constellation plots and eye diagrams. The information provided by these graphical inspection tools can be summarized by the Error Vector Magnitude (EVM). Its values for each mixer version over the Mode-1 channels are reported in Table 3.

| Mode-1 channel<br>(GHz) | EVM <sub>standard</sub> (%) | $\text{EVM}_{\text{w}/L_{\pi}}$ (%) | EVM <sub>w/Ibleed</sub> (%) |
|-------------------------|-----------------------------|-------------------------------------|-----------------------------|
| $f_{\#1} = 3.4$         | 5.671                       | 5.261                               | 5.518                       |
| $f_{\#2} = 3.9$         | 7.384                       | 5.719                               | 6.203                       |
| $f_{\#3} = 4.5$         | 11.54                       | 9.236                               | 9.808                       |

**Table 3** Simulated EVM performance
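The EVM condenses a constellation plot into a single number: the RMS magnitude of the error vector between received and ideal symbols, relative to the reference. This is a minimal sketch that assumes normalization to the average reference-symbol power and perfectly aligned symbols. Other normalizations exist, such as to the peak constellation power, although for PSK the peak and average power coincide. The sketch is not the Simulink block used to produce Table 3.

```python
import numpy as np


def evm_rms_percent(received, reference):
    """RMS EVM in percent, normalized to the mean reference-symbol power."""
    received = np.asarray(received, dtype=complex)
    reference = np.asarray(reference, dtype=complex)
    error_power = np.mean(np.abs(received - reference) ** 2)
    ref_power = np.mean(np.abs(reference) ** 2)
    return 100.0 * np.sqrt(error_power / ref_power)


# Two unit-energy symbols, one of them received 0.1 too large in amplitude.
example = evm_rms_percent([1.1, 1j], [1.0, 1j])
print(f"EVM = {example:.2f} %")
```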

System-level simulations show that the tuning inductor $L_{\pi}$ clearly improves the Gilbert cell, both in terms of purely «electrical» parameters and of system-level performance: the «w/$L_{\pi}$» version has the lowest EVM on all three channels (Table 3). The current-bleeding technique, as implemented here, does not appear to bring further benefits to the system. This may be because it was designed using only «rules of thumb» from the literature. Further work could therefore focus on optimizing the bleeding stage, possibly tailoring the $I_{bleed}$ value and the associated bias point, and on feasible implementations of such current sources.

### References

1. ECMA International: High Rate-Ultra Wide Band (UWB) Background
2. Touati, F., Douss, S., Loulou, M.: A high-performance doubly-balanced mixer in 0.35-μm CMOS for mode-1 MB-OFDM UWB receivers, vol. 46, pp. 351–363 (2008)
3. Gordon, M.Q., Yao, T., Voinigescu, S.P.: 65-GHz receiver in SiGe BiCMOS using monolithic inductors and transformers. In: Digest of Papers, 2006 Topical Meeting on Silicon Monolithic Integrated Circuits in RF Systems, pp. 4 (2006)
4. Shaeffer, D.K., Lee, T.H.: A 1.5-V, 1.5-GHz CMOS low noise amplifier. IEEE J. Solid-State Circuits 32(5), 745–759 (1997)
5. Lee, S.G., Choi, J.K.: Current-reuse bleeding mixer. Electron. Lett. 36(8), 696–697 (2000)
6. Razavi, B.: RF Microelectronics, 2nd edn. Prentice Hall Press, Upper Saddle River, NJ, USA (2011)
7. Phan, A.T., Kim, C.W., Kang, M.S., Lee, S.G., Su, C.D.: Frequency-controllable image rejection down CMOS mixer, vol. 1, pp. 46–50 (2004)

# **Efficient Implementation of Recurrent Neural Network Accelerators**



Vida Abdolzadeh and Nicola Petra

**Abstract** In this paper we propose an accelerator for the implementation of the Long Short-Term Memory (LSTM) layer in Recurrent Neural Networks. We analyze the effect of quantization on the accuracy of the network and derive an architecture that improves the throughput and latency of the accelerator. The proposed technique requires only one training process, hence reducing the design time. We present implementation results for the proposed accelerator, whose performance compares favorably with other solutions presented in the literature.

## 1 Introduction

In recent years, Neural Networks have been widely used for data modeling and classification. Among the many types of Neural Networks, Recurrent Neural Networks (RNNs) have been used over the last decades for audio captioning/speech recognition [1], data injection attack detection [2], and data forecasting and prediction [3].

Recurrent Neural Networks are the natural extension of Convolutional Neural Networks (CNNs) [4–6]. The main difference is the use of a memory that allows dynamic data, such as sequences, to be modeled [7]. As a consequence, the computation of the result in an RNN is based on a feedback architecture, where the output is a function of the whole input sequence.

The feedback architecture has two main effects. First, the throughput of the RNN strictly depends on the latency of the computation and cannot easily be improved with pipelining or similar techniques. Second, the quantization error must be carefully taken into account, because it may accumulate.

V. Abdolzadeh (⊠) · N. Petra

Department of Electrical Engineering and Information Technology, University Federico II, via Claudio 21, 80125 Naples, Italy e-mail: vida.abdolzadeh@unina.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), Applications in Electronics Pervading Industry, Environment and Society, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_44

For low-cost implementations of Neural Networks, one of the most promising architectures is a SoC with one or more hardware accelerators that speed up the execution of the Neural Network algorithm [8, 9]. In these architectures, the algorithm is partially run on a general-purpose processor, while the bottlenecks of the computation are implemented on an FPGA or similar technologies. Most implementations based on this approach target CNNs.

In this paper we focus on the implementation of accelerators for RNNs. The core of an RNN is the Long Short-Term Memory (LSTM) layer. We analyze the effect of quantization on this layer and derive an architecture for the efficient implementation of the LSTM.

This paper is organized as follows. Section 2 describes the architecture of the RNN and the LSTM layer. In Sect. 3 we analyze the effect of the quantization on the network accuracy. Section 4 discusses the implementation of the LSTM layer.

## 2 Recurrent Neural Network Architecture

The architecture of a typical RNN is shown in Fig. 1. The input of the network is a sequence of vectors. The feature extraction layer often performs a linear operation on the inputs. The core of the network is the LSTM layer. The final layers define the type of RNN. In Fig. 1a the soft-max layer is used for sequence classification, while the regression layer in Fig. 1b is used for data prediction.

The LSTM layer implements a feedback computation, so that its outputs are a function of the whole input sequence. The input of the LSTM layer is a sequence $x_t$, where *t* is the time step. Each element of the input sequence $x_t$ is a vector of Z elements (cf. Fig. 1). The output of the layer is the sequence $h_t$, called the hidden state of the layer. Each element of the output sequence is a vector of N elements (cf. Fig. 1). The hidden state is computed according to the following equation [8]:

$$h_t = \mathcal{U}(h_{t-1}, x_t) \tag{1}$$

where $\mathcal{U}$ is a non-linear function. One of the most widely used implementations of $\mathcal{U}$ is given by the following equations:

$$i_{t} = \sigma(W_{ix} \times x_{t} + W_{ih} \times h_{t-1} + b_{i}) \tag{2}$$

$$f_{t} = \sigma(W_{fx} \times x_{t} + W_{fh} \times h_{t-1} + b_{f}) \tag{3}$$

$$g_{t} = \tanh(W_{cx} \times x_{t} + W_{ch} \times h_{t-1} + b_{c}) \tag{4}$$

$$c_{t} = f_{t} \cdot c_{t-1} + i_{t} \cdot g_{t} \tag{5}$$

$$o_{t} = \sigma(W_{ox} \times x_{t} + W_{oh} \times h_{t-1} + b_{o}) \tag{6}$$

$$h_{t} = o_{t} \cdot \tanh(c_{t}) \tag{7}$$

where $c_t$ is the cell activation vector, $i_t$ and $f_t$ are the input gate and the forget gate, respectively, and $o_t$ is the output gate. The function $\sigma$ is the sigmoid function, while $\tanh(c_t)$ and $g_t$ are called the output activation function and the input activation function, respectively.

The multiplications $\times$ in Eqs. (2)–(7) are matrix-vector multiplications. $W_{ix}$, $W_{fx}$, $W_{cx}$ and $W_{ox}$ are N $\times$ Z constant weight matrices, while $W_{ih}$, $W_{fh}$, $W_{ch}$ and $W_{oh}$ are N $\times$ N constant weight matrices. The multiplications $\cdot$ are element-wise (Hadamard) products of N-element vectors. Finally, $b_i$, $b_f$, $b_c$ and $b_o$ are constant bias vectors of N elements. The network is trained by choosing the constant values of the bias vectors and of the weight matrices.
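Eqs. (2)–(7) map directly onto a few lines of NumPy. This is a minimal floating-point reference sketch, not the accelerator's fixed-point datapath. The parameter-dictionary layout and helper names are illustrative assumptions. The sizes N = 50 and Z = 12 are those of the speaker-identification network described in Sect. 3.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step following Eqs. (2)-(7).

    '@' is the matrix-vector product (x), '*' the element-wise product (.).
    """
    i_t = sigmoid(p["W_ix"] @ x_t + p["W_ih"] @ h_prev + p["b_i"])  # (2)
    f_t = sigmoid(p["W_fx"] @ x_t + p["W_fh"] @ h_prev + p["b_f"])  # (3)
    g_t = np.tanh(p["W_cx"] @ x_t + p["W_ch"] @ h_prev + p["b_c"])  # (4)
    c_t = f_t * c_prev + i_t * g_t                                  # (5)
    o_t = sigmoid(p["W_ox"] @ x_t + p["W_oh"] @ h_prev + p["b_o"])  # (6)
    h_t = o_t * np.tanh(c_t)                                        # (7)
    return h_t, c_t


def zero_params(n, z):
    """All-zero parameter set: N x Z input weights, N x N recurrent weights, N biases."""
    p = {f"W_{g}x": np.zeros((n, z)) for g in "ifco"}
    p.update({f"W_{g}h": np.zeros((n, n)) for g in "ifco"})
    p.update({f"b_{g}": np.zeros(n) for g in "ifco"})
    return p


N, Z = 50, 12
rng = np.random.default_rng(0)
params = {k: 0.1 * rng.standard_normal(v.shape) for k, v in zero_params(N, Z).items()}
h, c = np.zeros(N), np.zeros(N)
for x_t in rng.standard_normal((5, Z)):  # a short random input sequence
    h, c = lstm_step(x_t, h, c, params)
```

With all-zero parameters every gate outputs 0.5 and $g_t = 0$, so one step halves $c_{t-1}$. This makes a handy sanity check for a hardware implementation.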

## 3 Quantization Effects

Aggressive quantization enables an efficient implementation of Eqs. (2)–(7), but it also affects the accuracy of the network.

Quantization can be taken into account during the training process. Instead, we propose to apply quantization after the network has been trained, which reduces the design time of the accelerator. As we will show, this approach introduces a negligible accuracy loss.

Searching for the optimal quantization for a given target accuracy is not straightforward. The number of bits must be fixed independently for each of the 7 signals in the network: the hidden state, the cell activation, the three gates and the two hyperbolic tangents. It must also be fixed for each of the 12 constant matrices/vectors, because these values have different dynamic ranges.

In order to reduce the search space, we define two parameters: the maximum variable error (MVE =  $2^{-M}$ ) and the maximum constant error (MCE =  $2^{-L}$ ).

MVE is the maximum error allowed on the representation of each variable signal *s* in Eqs. (2)–(7). If we define  $\hat{s}$  as the quantized version of the signal *s* we have:

$$\left|\hat{s} - s\right| \le \mathsf{MVE}\forall s \in \{h_t, i_t, f_t, c_t, o_t, g_t, \mathsf{tanh}(c_t)\}\tag{8}$$



MCE is the maximum error allowed on the representation of each weight *w* of each weight matrix. If we call  $\hat{w}$  the quantized representation of the weight *w* we have:

$$\left|\hat{w} - w\right| \le \text{MCE}\,\forall w \in W_{\text{ix}} \cup W_{\text{fx}} \cup W_{\text{cx}} \cup W_{\text{ox}} \cup W_{\text{ih}} \cup W_{\text{fh}} \cup W_{\text{ch}} \cup W_{\text{oh}} \tag{9}$$

The quantized representation  $\hat{b}$  of a bias value *b* is obtained according to the following rule:

$$\left| \hat{b} - b \right| \le \text{MCE} \cdot \text{MVE} \,\forall b \in b_{i} \cup b_{f} \cup b_{c} \cup b_{o} \tag{10}$$
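Constraints (8)–(10) can be met by any quantizer whose error stays within the stated bound. This is a minimal sketch that assumes round-to-nearest on a uniform grid whose step is twice the allowed error, the largest step that still satisfies the bound. The paper does not specify the rounding mode or number format. L = 5 is the value reported in this section for the first network, while M = 6 and the random test data are hypothetical.

```python
import numpy as np


def quantize(x, max_err):
    """Round to a uniform grid with step 2*max_err, so that |q - x| <= max_err."""
    step = 2.0 * max_err
    return np.round(np.asarray(x, dtype=float) / step) * step


M, L = 6, 5                      # M is hypothetical; L = 5 as found for the first RNN
MVE, MCE = 2.0 ** -M, 2.0 ** -L  # maximum variable / constant error
BIAS_ERR = MCE * MVE             # bound of constraint (10)

rng = np.random.default_rng(1)
w = rng.uniform(-1, 1, size=1000)       # stand-in weights, constraint (9)
b = rng.uniform(-1, 1, size=50)         # stand-in biases, constraint (10)
s = np.tanh(rng.standard_normal(200))   # stand-in signal values, constraint (8)
w_q, b_q, s_q = quantize(w, MCE), quantize(b, BIAS_ERR), quantize(s, MVE)
print(np.max(np.abs(w_q - w)), np.max(np.abs(b_q - b)), np.max(np.abs(s_q - s)))
```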

In order to show the effect of our quantization scheme we have designed and trained two RNNs. The first network is based on the scheme in Fig. 1a and is used to identify a speaker among 9 possible candidates. For this network Z is equal to 12 and N is equal to 50. The second network is based on the scheme of Fig. 1b. The network is used to predict the monthly occurrence of chickenpox on the basis of previous history. For this network Z is equal to 1 and N is equal to 200. The training and the test sequences of the two networks are available on-line [7, 10].

We have trained both networks using a floating-point representation for each variable signal and each constant factor in Eqs. (2)–(7). The training has been performed in Matlab. After training, we applied constraint (9) to the weight factors.

Figure 2 shows the dependence of the accuracy of the first RNN on the MCE. The accuracy is computed as the percentage of correct speaker identifications over the entire test set. As can be seen, the accuracy of the network improves as the MCE decreases. However, Fig. 2 also shows that there is an optimal value for the MCE: reducing the MCE below this optimal spot does not further increase the accuracy. As can be seen, L = 5 achieves the same accuracy as the floating-point representation.

Figure 3 shows the result of a similar analysis performed on the second RNN. Here, the accuracy is measured as the root mean square error (RMSE) between the value predicted by the network and the actual value of the series. Again, there is an optimal spot that can be used to fix the value of L.



Fig. 3 RMSE versus MCE for the forecasting RNN



Fig. 4 Number of non-zero constant weights versus MCE for the classification RNN

Increasing the MCE not only reduces the number of bits used for the weight factors, it also reduces the overall number of non-zero constant weights. Figure 4 shows the number of non-zero values as a function of MCE for the first RNN. As can be seen, the overall number of non-zero coefficients is reduced by 50% at the optimal spot (the one chosen in Fig. 2). Similar considerations apply to the second RNN.
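The pruning effect follows from the quantizer itself. Under the same hypothetical round-to-nearest quantizer with grid step 2·MCE, a weight becomes zero whenever |w| ≤ MCE. Coarser MCE values therefore remove small coefficients. This minimal sketch counts the surviving weights. The example weights and the Gaussian weight distribution are made up, and the 50% figure of Fig. 4 comes from the trained network, not from this code.

```python
import numpy as np


def quantize(x, max_err):
    """Round to a uniform grid with step 2*max_err (same assumption as above)."""
    step = 2.0 * max_err
    return np.round(np.asarray(x, dtype=float) / step) * step


def nonzero_fraction(weights, mce):
    """Fraction of weights that stay non-zero after quantization with error bound MCE."""
    q = quantize(weights, mce)
    return np.count_nonzero(q) / q.size


toy = [0.01, -0.02, 0.05, 0.3]       # hypothetical weights
coarse = nonzero_fraction(toy, 2.0 ** -5)  # MCE = 1/32: the two small weights vanish
fine = nonzero_fraction(toy, 2.0 ** -8)    # MCE = 1/256: all weights survive

rng = np.random.default_rng(2)
w = 0.1 * rng.standard_normal(5000)  # hypothetical weight distribution
fractions = {L: nonzero_fraction(w, 2.0 ** -L) for L in range(3, 10)}
for L, frac in fractions.items():
    print(f"L = {L}: {100 * frac:.1f}% non-zero")
```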

Once the optimal spot for the MCE has been found, constraints (8) and (10) can be applied to the variable signals and to the bias values, respectively. Figure 5 shows the relation between accuracy, RMSE and MVE for both networks. In this figure, L is fixed at its optimal spot. As can be seen, an optimal spot can be found for MVE as well, and hence for M.

The optimal-spot technique can be applied to the quantization of any RNN. It reduces the size of the signals in the accelerator, as well as the number of constants that must be stored in its internal memory. Overall, the loss in network accuracy is negligible.



Fig. 5 Accuracy versus MVE. (a) Accuracy of the classification RNN. (b) RMSE of the forecasting RNN





## 4 Circuit Implementation

A direct implementation of Eqs. (2)–(7) requires two memories to store the values of $h_{t-1}$ and $c_{t-1}$. However, in a SoC architecture this choice is suboptimal. Eqs. (2), (3), (4) and (6) all depend on the value of $h_{t-1}$. As a consequence, the scheduling of these operations on the DSPs of the SoC is limited by conflicts in accessing the memory that stores $h_{t-1}$ [3]. A better implementation can be obtained with the architecture in Fig. 6.

In our architecture, Eqs. (2), (3), (4) and (6) are divided into two parts: the recurrence part, which depends on $h_{t-1}$, and the input part, which depends only on $x_t$. Instead of storing $h_{t-1}$, we store the recurrence part of each of the four equations separately. This choice greatly reduces the latency of the circuit and, because of the feedback architecture, improves the throughput of the accelerator.
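The split can be illustrated functionally. Once $h_t$ is available, the four products $W_{\ast h} \times h_t$ are computed independently and kept in four separate buffers. At step t+1, each gate then adds its own buffer to the input part, and no gate has to read $h$ from a shared memory. This plain NumPy sketch of the dataflow is not the HLS code. The function names are illustrative. The check confirms that the split form gives the same result as a direct evaluation of Eqs. (2)–(7).

```python
import numpy as np

GATES = ("i", "f", "c", "o")  # gates of Eqs. (2), (3), (4) and (6)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cell_update(pre, c_prev):
    """Eqs. (5) and (7), given the four gate pre-activations."""
    i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
    g = np.tanh(pre["c"])
    c = f * c_prev + i * g
    return o * np.tanh(c), c


def direct_step(x_t, h_prev, c_prev, Wx, Wh, b):
    """Reference: every gate reads h_{t-1} from one shared memory."""
    pre = {k: Wx[k] @ x_t + Wh[k] @ h_prev + b[k] for k in GATES}
    return cell_update(pre, c_prev)


def split_step(x_t, rec_prev, c_prev, Wx, Wh, b):
    """Input part (x_t only) plus stored recurrence part; returns next recurrence parts."""
    pre = {k: Wx[k] @ x_t + b[k] + rec_prev[k] for k in GATES}
    h, c = cell_update(pre, c_prev)
    rec_next = {k: Wh[k] @ h for k in GATES}  # four independent products, one buffer each
    return h, c, rec_next


N, Z, T = 50, 12, 8
rng = np.random.default_rng(0)
Wx = {k: 0.1 * rng.standard_normal((N, Z)) for k in GATES}
Wh = {k: 0.1 * rng.standard_normal((N, N)) for k in GATES}
b = {k: 0.1 * rng.standard_normal(N) for k in GATES}
xs = rng.standard_normal((T, Z))

h_ref, c_ref = np.zeros(N), np.zeros(N)
rec, c_split = {k: np.zeros(N) for k in GATES}, np.zeros(N)  # W_h @ 0 = 0
for x_t in xs:
    h_ref, c_ref = direct_step(x_t, h_ref, c_ref, Wx, Wh, b)
    h_split, c_split, rec = split_step(x_t, rec, c_split, Wx, Wh, b)
print(np.max(np.abs(h_ref - h_split)))
```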

We have designed two accelerators. Both implement the first RNN discussed in the previous section and can be used for speaker classification. The accelerators were synthesized with high-level synthesis, using Vivado HLS as the synthesis tool and the Xilinx Zynq XC7Z020 as the target technology. The accelerator receives commands through an AXI4-Lite compatible interface. The inputs and outputs are read from and stored into external block-RAMs in order to keep the IP as fast as possible. We fixed the clock frequency at 100 MHz, which limits the maximum number of DSPs that can be cascaded in the data path when no pipelining is allowed.

|                                | SRAM   | DSP  | FF    | Latency | Recurrence<br>latency | f <sub>clock</sub> (MHz) |
|--------------------------------|--------|------|-------|---------|-----------------------|--------------------------|
| Single<br>Recurrence<br>Mem.   | 387 Kb | 11   | 30604 | 58 K    | 31 K                  | 115                      |
| Multiple<br>Recurrence<br>Mem. | 387 Kb | 11   | 1157  | 30 K    | 7651                  | 115                      |
| [1]                            | 86 Kb  | 96   |       | 40 M    | N.A.                  | 200                      |
| [5]                            | 17 Mb  | 1504 | 453 K | 16 K    | N.A.                  | 200                      |
| [6]                            | 288 Kb | 50   | 13 K  | 127 K   | N.A.                  | 142                      |

**Table 1** Accelerator performance

The first accelerator uses a single recurrence memory to store the value of $h_{t-1}$. The achieved performance is shown in the first row of Table 1. Four DSPs are used to implement the recurrence operations (the multiply-and-add operations involving $h_{t-1}$), plus 7 DSPs for the remaining operations.

The second accelerator is based on the architecture of Fig. 6 and uses 4 recurrence memories. The recurrence operations are computed with a parallel data path using 4 DSPs, and 7 DSPs are allowed for the computation of the other equations. The results are shown in the second row of Table 1.

As shown, the 4 data paths used to compute the recurrence operations reduce the recurrence latency by 75%. Furthermore, the use of 4 memories reduces the overall latency by 49% with the same number of arithmetic units. Compared with prior art, the proposed circuit has a very small footprint and is suitable for efficient accelerators in IoT devices.

### References

1. Han et al.: ESE: efficient speech recognition engine with sparse LSTM on FPGA. In: Proceedings of the 2017 ACM/SIGDA FPGA 2017, Monterey, CA, USA, Feb. 2017, pp. 75–84
2. Wang, W., Xie, Y., Ren, L., Zhu, X., Chang, R., Yin, Q.: Detection of data injection attack in industrial control system using long short term memory recurrent neural network. In: 13th IEEE ICIEA, Wuhan, China, pp. 2710–2715 (2018). https://doi.org/10.1109/iciea.2018.8398169
3. Zou, L., Gu, Y., Song, J., Liu, W., Yao, Y.: Long short-term memory based recurrent neural networks for collaborative filtering. In: IEEE UIC 2017, San Francisco, CA, USA, pp. 1–6. https://doi.org/10.1109/uic-atc.2017.8397539
4. Ardakani et al.: An architecture to accelerate convolution in deep neural networks. IEEE TCAS I: Regular Papers **65**(4) (2018)
5. Price et al.: A low-power speech recognizer and voice activity detector using deep neural networks. IEEE JSSC **53**(1) (2018)
6. Moini et al.: A resource-limited hardware accelerator for convolutional neural networks in embedded vision applications. IEEE TCAS II: Express Briefs **64**(10) (2017)
7. UCI Machine Learning Repository: Japanese Vowels Dataset. https://archive.ics.uci.edu/ml/datasets/Japanese+Vowels
8. Chang et al.: Recurrent Neural Networks Hardware Implementation on FPGA (2015). https://arxiv.org/abs/1511.05552v4
9. Du et al.: A reconfigurable streaming deep convolutional neural network accelerator for internet of things. IEEE TCAS I **65**(1), 198–208 (2018)
10. Hyndman, R.J.: Time Series Data Library. https://datamarket.com/data/list/?q=cat:g24%20provider:tsdl

# Ultrasound Measurement of the Peak Blood Flow Based on a Doppler Spectrum Model



**Riccardo Matera, David Vilkomerson and Stefano Ricci** 

**Abstract** Doppler ultrasound techniques play an important role in the investigation of blood flow and, in recent decades, have become standard in cardiovascular medicine. In current clinical practice, an arterial stenosis is evaluated from the maximum blood velocity measured in an echo-Doppler investigation. Unfortunately, the blood Doppler signal produces a relatively wide Doppler spectrum, and it is not trivial to detect the exact frequency that corresponds to the maximum velocity through the Doppler formula. The measurement is therefore highly inaccurate. In this work, a method based on a mathematical model of the Doppler spectrum is proposed to detect the frequency that corresponds to the maximum velocity. The method has been implemented in a custom electronic system and validated through experiments on a flow phantom. Experiments with flows between 100 and 300 mL/min (peak velocity range 6.6–19.9 cm/s) resulted in a bias lower than 1% and a standard deviation of 4%.

## 1 Introduction

The peak velocity of blood flowing in arteries is of significant importance in current clinical practice. In the presence of a stenosis, the vessel lumen is reduced and the blood velocity rises correspondingly. Consequently, the blood peak velocity is one of the main parameters evaluated to decide whether surgery is needed [1].

Pulsed Wave (PW) echo-Doppler ultrasound is the main investigation method for hemodynamic assessment [2], but unfortunately the peak velocity is not easily detectable. Ideally, the maximum Doppler frequency  $f_{d_P}$  of the signal backscattered from blood should correspond to the maximum velocity,  $v_M$ , through the Doppler formula:

R. Matera (🖂) · S. Ricci

Department of Information Engineering, University of Florence, Florence, Italy e-mail: stefano.ricci@unifi.it

D. Vilkomerson DVX LLC, Princeton, NJ, USA

<sup>©</sup> Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), Applications in Electronics Pervading Industry, Environment and Society, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_45

**Fig. 1** A large sample volume is generated that covers all the depths of the vessel for a length A



$$v_M = \frac{cf_{d_P}}{2f_t \cos\theta} \tag{1}$$

where *c* is the speed of sound,  $f_t$  the transmission frequency, and  $\theta$  the Doppler angle [3]. Unfortunately, the Doppler spectrum is distorted by several artefacts [4]. For example, the finite transit time of the blood particles through the ultrasound beam produces a significant widening of the Doppler spectrum [2]. Which Doppler frequency  $f_{d_P}$  of the spectrum, then, corresponds to the blood peak velocity? At present, heuristic methods are used to find this crucial highest frequency, usually by choosing the Doppler frequency at which the spectrum crosses a power threshold set in relation to the noise power level, or a given percentage of the total spectral power [4, 5]. Such heuristic methods provide peak velocities that vary with the signal-to-noise ratio and the transducer configuration, and therefore yield unreliable velocity estimates. In this work, a Doppler method for blood peak velocity detection is presented. Based on a mathematical model of the Doppler spectrum, the method predicts where to locate the Doppler frequency  $f_{d_P}$  related to the blood peak velocity  $v_M$ .
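
As a quick numerical check of Eq. (1), the sketch below converts a Doppler frequency into a velocity and back. It assumes c = 1480 m/s together with the 5 MHz transmission frequency and 46.5° Doppler angle quoted in Sect. 3.1; the function names are illustrative only.

```python
import math


def doppler_to_velocity(f_d, f_t, theta_deg, c=1480.0):
    """Eq. (1): velocity (m/s) corresponding to the Doppler frequency f_d (Hz)."""
    return c * f_d / (2.0 * f_t * math.cos(math.radians(theta_deg)))


def velocity_to_doppler(v, f_t, theta_deg, c=1480.0):
    """Inverse of Eq. (1): Doppler frequency (Hz) produced by the velocity v (m/s)."""
    return 2.0 * f_t * v * math.cos(math.radians(theta_deg)) / c


# Highest reference velocity of the experiments: 19.9 cm/s at 5 MHz and 46.5 degrees
f_d = velocity_to_doppler(0.199, 5e6, 46.5)
print(f"f_d = {f_d:.0f} Hz")                                   # f_d = 926 Hz
print(f"v   = {doppler_to_velocity(f_d, 5e6, 46.5):.3f} m/s")  # v   = 0.199 m/s
```

Under these assumptions the fastest flow produces about 926 Hz, just inside the ±1 kHz range that complex (I/Q) sampling at the highest PRF of 2 kHz can represent without aliasing.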

## 2 Doppler Spectrum Model

Consider the configuration shown in Fig. 1, where a flow with a parabolic velocity distribution travels through a vessel of diameter 2*R*. A transducer, excited in PW mode, transmits an unfocused ultrasound beam of wavelength  $\lambda$  that insonates the flow, and receives the echoes from the moving scatterers at an angle  $\theta$  with respect to the flow direction.

The transmitted beam insonates all the flow components travelling through the cylindrical vessel region of length A, and all of the backscattered signals generated by the flow are received. This condition can be obtained by transmitting sufficiently long ultrasound pulses at the Pulse Repetition Frequency (PRF). For example, 40 cycles at 5 MHz cover a region more than 11 mm long when c = 1480 m/s.
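
The 11 mm figure is the burst duration multiplied by the speed of sound. The hypothetical helper below reproduces it:

```python
def pulse_length_mm(n_cycles, f_t, c=1480.0):
    """Spatial length (mm) of an n_cycles burst at frequency f_t (Hz), sound speed c (m/s)."""
    duration = n_cycles / f_t      # burst duration in seconds (8 us for 40 cycles at 5 MHz)
    return duration * c * 1e3      # distance travelled by sound during the burst, in mm


print(f"{pulse_length_mm(40, 5e6):.2f} mm")   # 11.84 mm
```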

**Fig. 2** Quantization of fluid flow into discrete velocity shells



The continuous distribution of velocity present in the flow is approximated by a group of *M* discrete *shells of fluid*, as shown in Fig. 2. The fluid in the *m*-th shell has a constant velocity:

$$v_m = (m+0.5) \cdot v_s \tag{2}$$

where  $v_s$  is the velocity quantization step and  $v_M$  is the peak velocity. The scatterers are observed for a time  $T_0 = N/PRF$ , where N is the number of samples acquired in reception. The shells where the scatterers do not have enough velocity to cross the whole length A in the observation time  $T_0$ , i.e. where  $v_m < A/T_0$ , are numbered from 0 to  $m_t - 1$ ; the remaining shells, i.e. where  $v_m \ge A/T_0$ , are labeled from  $m_t$  to M - 1, where  $m_t$  is the transition shell.
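
The split into slow and fast shells can be made concrete with a few lines of code. This sketch assumes $v_s = v_M/M$, so that the top shell sits just below $v_M$; the text implies this but does not state it. The example values (A = 10 mm, N = 1024, PRF = 2 kHz, 100 shells) are chosen to match Sect. 3.1 for illustration only.

```python
def shell_velocities(v_peak, n_shells):
    """Eq. (2): v_m = (m + 0.5) * v_s, assuming v_s = v_peak / n_shells."""
    v_s = v_peak / n_shells
    return [(m + 0.5) * v_s for m in range(n_shells)]


def transition_shell(velocities, A, T0):
    """Index m_t of the first shell whose scatterers cross length A within T0 (v_m >= A / T0)."""
    for m, v in enumerate(velocities):
        if v >= A / T0:
            return m
    return len(velocities)   # no shell is fast enough: every shell is "slow"


T0 = 1024 / 2000.0                    # observation time T0 = N / PRF = 0.512 s
v = shell_velocities(0.199, 100)      # v_M = 19.9 cm/s split into 100 shells
m_t = transition_shell(v, A=0.010, T0=T0)
print(m_t)                            # 10
```

With these values $A/T_0 \approx 1.95$ cm/s, so only the ten slowest shells stay in the sample volume for the whole observation time.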

According to the model [6], the total Doppler power spectrum is approximated by summing the contributions of all the shells in an *N*-point FFT:

$$
\begin{aligned}
S(f) ={}& \sum_{m=0}^{m_t-1} \left[ (A - v_m T_0)\, P(f, f_m, N) + \frac{2 v_m T_0}{N} \sum_{j=1}^{N-1} P(f, f_m, j) \right] \\
&+ \sum_{m=m_t}^{M-1} \left[ A \left( \frac{T_0}{tt_m} - 1 \right) P(f, f_m, w_m) + \frac{2A}{w_m} \sum_{j=1}^{w_m-1} P(f, f_m, j) \right] \\
P(f, f_m, w_m) ={}& \left( \frac{w_m}{N} \right)^2 \operatorname{sinc}^2\!\left[ |f_m - i|\, \frac{w_m}{N}\, \pi \right], \qquad 0 < i < N
\end{aligned}
\tag{3}
$$

where  $tt_m$  is the transit time of the particles at velocity  $v_m$ ,  $P(f, f_m, w_m)$  is the FFT frequency pulse generated by the shell at velocity  $v_m$ , observed for  $w_m$  samples and centered at the Doppler frequency  $f_m$ :

$$f_m = \frac{v_m}{\lambda} \cos\theta \tag{4}$$
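
Eq. (3) can be evaluated numerically to see how the shells combine. The following is a minimal sketch, not the authors' implementation, under several assumptions not stated in this section: $v_s = v_M/M$, $tt_m = A/v_m$, $w_m = \mathrm{round}(tt_m \cdot PRF)$, shell frequencies computed from Eq. (4) as written and expressed in FFT-bin units, and $\operatorname{sinc}(x) = \sin(x)/x$. The function name and example parameters are illustrative.

```python
import numpy as np


def doppler_spectrum_model(v_peak, A, theta_deg, wavelength, prf, n_fft, n_shells=50):
    """Sketch of Eq. (3) evaluated on FFT bins i = 0..n_fft-1 (assumptions in the lead-in)."""
    bins = np.arange(n_fft)
    T0 = n_fft / prf                          # observation time T0 = N / PRF
    v_s = v_peak / n_shells                   # assumed quantization step v_s = v_M / M
    S = np.zeros(n_fft)

    def pulse(f_bin, w):
        # P(f, f_m, w) of Eq. (3); np.sinc(x) = sin(pi x) / (pi x), so pi is built in.
        # w may be a scalar or an array of window lengths (one row per length).
        w = np.asarray(w, dtype=float)[..., None]
        return (w / n_fft) ** 2 * np.sinc((f_bin - bins) * w / n_fft) ** 2

    for m in range(n_shells):
        v_m = (m + 0.5) * v_s                                    # Eq. (2)
        f_m = v_m * np.cos(np.radians(theta_deg)) / wavelength   # Eq. (4), in Hz
        f_bin = f_m * n_fft / prf                                # Hz -> FFT-bin units
        if v_m < A / T0:
            # slow shells (m < m_t): some scatterers stay for all N samples
            partial = pulse(f_bin, np.arange(1, n_fft)).sum(axis=0)
            S += (A - v_m * T0) * pulse(f_bin, n_fft) + 2 * v_m * T0 / n_fft * partial
        else:
            # fast shells (m >= m_t): each scatterer is seen for about w_m samples
            tt_m = A / v_m
            w_m = max(1, int(round(tt_m * prf)))
            partial = pulse(f_bin, np.arange(1, w_m)).sum(axis=0)
            S += A * (T0 / tt_m - 1) * pulse(f_bin, w_m) + 2 * A / w_m * partial
    return S


wavelength = 1480.0 / 5e6                     # lambda = c / f_t
S = doppler_spectrum_model(0.199, 0.010, 46.5, wavelength, prf=2000.0, n_fft=128)
```

Plotting `S` against `np.arange(128) * 2000.0 / 128` (bins converted to Hz) is a simple way to inspect the plateau and slope regions.
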
The Doppler spectrum features a typical plateau region before a slope down towards the higher frequencies. Thanks to this model, with the analytical calculation reported in [7], the Doppler frequency  $f_{d_P}$ , corresponding to the peak velocity  $v_M$ , can be accurately located in the slope region, exactly at half the reference power of the spectrum, taken in the plateau region.
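
Locating the half-power point on a measured spectrum could be sketched as follows. How the plateau reference is computed in [7] is not described here, so taking the mean power over a user-chosen `plateau_band` is an assumption, as is the linear interpolation between bins; the function name is hypothetical.

```python
import numpy as np


def half_power_frequency(freqs, spectrum, plateau_band):
    """Frequency at which the spectrum falls to half of its plateau reference power.

    plateau_band = (f_lo, f_hi) is an assumed band inside the plateau; the reference
    power is the mean spectrum over that band.
    """
    freqs = np.asarray(freqs, dtype=float)
    spectrum = np.asarray(spectrum, dtype=float)
    in_band = (freqs >= plateau_band[0]) & (freqs <= plateau_band[1])
    half = spectrum[in_band].mean() / 2.0
    # search the slope region above the plateau band for the first downward crossing
    for k in range(np.nonzero(in_band)[0][-1], len(spectrum) - 1):
        if spectrum[k] >= half > spectrum[k + 1]:
            frac = (spectrum[k] - half) / (spectrum[k] - spectrum[k + 1])
            return freqs[k] + frac * (freqs[k + 1] - freqs[k])   # linear interpolation
    return None   # no crossing found


# Synthetic check: flat plateau up to 400 Hz, linear slope reaching zero at 600 Hz
f = np.arange(0.0, 1000.0, 10.0)
s = np.clip((600.0 - f) / 200.0, 0.0, 1.0)
print(half_power_frequency(f, s, (100.0, 300.0)))   # 500.0
```

The returned frequency plays the role of $f_{d_P}$ and would then be converted to $v_M$ through Eq. (1).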

## 3 Experimental Validation

### 3.1 Experimental Set-up and Data Processing

The validity of the model has been tested at the MSD Laboratory (Department of Information Engineering, DINFO, University of Florence, Italy) using the set-up depicted in Fig. 3. A hydraulic circuit consisting of two reservoirs located at different heights and connected by an 8 mm-diameter pipe (the measuring cell) was used. A blood-mimicking fluid flowed between the reservoirs. The experiments were performed using an ultrasound Doppler system specifically designed at the MSD Laboratory for ultrasound research activities [8], connected to 5 MHz transducers. The probe, whose configuration is shown in Fig. 1, features a 14 mm aperture capable of uniformly insonating the region below it over a lateral extension of A = 10 mm. The same aperture receives the echo backscattered from the blood-mimicking fluid with a Doppler angle of 46.5°. Bursts of 40 cycles at 5 MHz were transmitted, so that the large sample volume generated (about  $10 \times 10 \times 10 \text{ mm}^3$ ) completely covered the pipe diameter. Thirteen experiments were performed with flow rates between 100 and 300 mL/min, corresponding to peak velocities of 6.6 and 19.9 cm/s, respectively. The PRF was set between 500 Hz and 2 kHz, depending on the flow velocity.

In each acquisition, about 60 s of signal was acquired, coherently demodulated into complex (I/Q) samples in hardware, and saved. A 50 Hz wall filter was applied first, and the power spectral density was then estimated through a 1024-point FFT, achieving a spectral resolution of about 1–2 Hz over an observation time of 1–0.5 s, respectively. For each flow setting, all the spectra obtained from the acquired signal were averaged. The proposed method was used to locate the corresponding frequency value  $f_{d_P}$ , which was finally converted to the measured peak velocity through the Doppler equation (1). The measured velocity was compared with the reference peak velocity  $v_P$ , obtained from Eq. (5), which assumes a parabolic flow profile:

$$v_p = \frac{8 \cdot Q}{\pi \cdot D^2} \tag{5}$$

where D is the pipe diameter.
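
Eq. (5) and the FFT settings can be checked numerically. In this sketch the flow rate is converted from mL/min and the diameter from mm to SI units; the resolution of an N-point FFT at a given PRF is PRF/N, corresponding to an observation time N/PRF. The function names are illustrative.

```python
import math


def reference_peak_velocity(q_ml_min, d_mm):
    """Eq. (5): centre-line velocity (m/s) of a parabolic profile, v_p = 8Q / (pi D^2)."""
    q = q_ml_min * 1e-6 / 60.0     # mL/min -> m^3/s
    d = d_mm * 1e-3                # mm -> m
    return 8.0 * q / (math.pi * d ** 2)


def fft_resolution(prf, n_fft=1024):
    """Frequency resolution (Hz) and observation time (s) of an n_fft-point FFT at this PRF."""
    t_obs = n_fft / prf
    return 1.0 / t_obs, t_obs


for q in (100, 300):
    print(f"{q} mL/min -> {100 * reference_peak_velocity(q, 8.0):.1f} cm/s")
# 100 mL/min -> 6.6 cm/s
# 300 mL/min -> 19.9 cm/s
```

For example, `fft_resolution(2000.0)` gives a resolution of about 1.95 Hz over 0.512 s.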



Fig. 3 Experimental set-up used for validating the method. Hydraulic circuit **a**, ultrasound Doppler system **b**, and software management **c** 



Fig. 4 Measurements (o) and regression line (red dashed) are compared with identity line (black dotted)

### 3.2 Results

Figure 4 reports the measurements together with the regression line (dashed red) and the identity line (dotted black). The regression line has a gain of about 1.04 and an offset of about -0.4 cm/s. The coefficient of determination,  $R^2$ , is very close to 1. Figure 5 reports the Bland–Altman analysis of the data. Here, the vertical axis represents the difference between the measurement and the reference, expressed as a percentage of the mean value. A +0.5% bias is observed, with limits of agreement of -2.4% and +3.4%.
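
The statistics of Figs. 4 and 5 can be computed with a short routine. This sketch assumes an ordinary least-squares fit for the regression line and the common definition of the Bland–Altman limits of agreement as bias ± 1.96 standard deviations of the percentage differences; the paper does not state these details, and the function name is hypothetical.

```python
import numpy as np


def regression_and_bland_altman(reference, measured):
    """Least-squares line measured = gain * reference + offset, R^2, and percentage
    Bland-Altman statistics (bias and 95% limits of agreement)."""
    ref = np.asarray(reference, dtype=float)
    meas = np.asarray(measured, dtype=float)
    gain, offset = np.polyfit(ref, meas, 1)          # slope first, then intercept
    residuals = meas - (gain * ref + offset)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((meas - meas.mean()) ** 2)
    # difference expressed as a percentage of the mean of the two values
    diff_pct = 100.0 * (meas - ref) / ((meas + ref) / 2.0)
    bias = diff_pct.mean()
    sd = diff_pct.std(ddof=1)
    return gain, offset, r2, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Toy example (not the paper's data): a method reading slightly high with small scatter
ref = [6.6, 10.0, 13.3, 16.6, 19.9]
meas = [6.75, 10.18, 13.60, 16.90, 20.33]
gain, offset, r2, bias, loa = regression_and_bland_altman(ref, meas)
```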



Fig. 5 Measurements are compared with the reference with Bland-Altman method

## 4 Conclusions

A method for the accurate measurement of the maximum velocity in echo-Doppler blood investigations was presented. The method, based on a mathematical model of the Doppler spectrum, is suitable for adaptation to the array probes employed in modern echographs [9] and for use in vector Doppler applications [10], where the Doppler angle is calculated automatically. Other biomedical examinations, such as volume flow measurement or arterial wall shear rate evaluation [11], can benefit from this method as well. The method showed excellent linearity with practically no gain error, and the Bland–Altman analysis revealed no dependence of the error on the flow rate. The bias and standard deviation were lower than 1% and 3%, respectively.

## References

- 1. Grant, E., Benson, C., Moneta, G., Alexandrov, A., Baker, J., Bluth, E., et al.: Carotid artery stenosis: gray-scale and Doppler US diagnosis. Radiology **229**(2), 340–346 (2003)
- 2. Evans, D., McDicken, W.: Doppler Ultrasound: Physics, Instrumentation and Signal Processing. Wiley, Chichester (2000)
- 3. Tortoli, P., Guidi, G., Newhouse, V.: Improved blood velocity estimation using the maximum Doppler frequency. Ultrasound Med. Biol. **21**(4), 527–532 (1995). https://doi.org/10.1016/0301-5629(94)00137-3
- 4. Steinman, A.H., Tavakkoli, J., Myers, J.G., Cobbold, R.S., Johnston, K.W.: Sources of error in maximum velocity estimation using linear phased-array Doppler systems with steady flow. Ultrasound Med. Biol. **27**(5), 655–664 (2001). https://doi.org/10.1016/S0301-5629(01)00352-0
- 5. Marasek, K., Nowicki, A.: Comparison of the performance of three maximum Doppler frequency estimators coupled with different spectral estimation methods. Ultrasound Med. Biol. **20**(7), 629–638 (1994). https://doi.org/10.1016/0301-5629(94)90111-2
- 6. Vilkomerson, D., Ricci, S., Tortoli, P.: Finding the peak velocity in a flow from its Doppler spectrum. IEEE Trans. Ultrason. Ferroelect. Freq. Control **60**(10), 2079–2088 (2013). https://doi.org/10.1109/tuffc.2013.2798
- 7. Ricci, S.: An analytical model of the Doppler spectrum for peak blood velocity detection. In: IEEE Ultrasonics Symposium Proceedings, pp. 2241–2244. Chicago (IL) (2014). https://doi.org/10.1109/ultsym.2014.0558
- 8. Ricci, S., Liard, M., Birkhofer, B., Lootens, D., Brühwiler, A., Tortoli, P.: Embedded Doppler system for industrial in-line rheometry. IEEE Trans. Ultrason. Ferroelect. Freq. Control **59**(7), 1395–1401 (2012). https://doi.org/10.1109/tuffc.2012.2340
- 9. Ricci, S., Matera, R., Tortoli, P.: An improved Doppler model for obtaining accurate maximum blood velocities. Ultrasonics **54**(7), 2006–2014 (2014). https://doi.org/10.1016/j.ultras.2014.05.012
- 10. Ricci, S., Vilkomerson, D., Matera, R., Tortoli, P.: Accurate blood peak velocity estimation using spectral models and vector Doppler. IEEE Trans. Ultrason. Ferroelect. Freq. Control **62**(4), 686–696 (2015). https://doi.org/10.1109/tuffc.2015.006982
- 11. Ricci, S., Swillens, A., Ramalli, A., Segers, P., Tortoli, P.: Wall shear rate measurement: validation of a new method through multi-physics simulations. IEEE Trans. Ultrason. Ferroelect. Freq. Control **64**(1), 66–77 (2017). https://doi.org/10.1109/tuffc.2016.2608442

# Embedded System to Recognize Movement and Breathing in Assisted Living Environments



**Eva Rodríguez de Trujillo, Ralf Seepold, Maksym Gaiduk, Natividad Martínez Madrid, Simone Orcioni and Massimo Conti**

**Abstract** This paper shows how a bed equipped with an embedded sensor system can analyze a person's movement and breathing and recognize the positions in which the subject lies on the bed during the night, without any additional physical contact. The measurements are performed with sensors placed between the mattress and the frame. An Intel Edison board was used as an endpoint, serving as a communication node between the mesh network and external services. Two nodes and the Intel Edison board are attached to the bottom of the bed frame and are connected to the sensors.

## 1 Introduction

Extensive research has shown that the human body needs sleep, just as it needs food, water or oxygen, to survive. In general, sleep is a state where the body and mind are

M. Gaiduk e-mail: maksym.gaiduk@htwg-konstanz.de

R. Seepold · N. Martínez Madrid

Department of Information and Internet Technology, Sechenov University, Campus Trubetskaya Str., 8, b. 2, 119992 Moscow, Russia e-mail: natividad.martinez@reutlingen-university.de

N. Martínez Madrid Reutlingen University, Alteburgstr. 150, 72762 Reutlingen, Germany

M. Conti e-mail: m.conti@univpm.it

© Springer Nature Switzerland AG 2019 S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_46

E. R. de Trujillo · R. Seepold (⊠) · M. Gaiduk

HTWG Konstanz, Alfred-Wachtel-Str. 8, 78462 Konstanz, Germany e-mail: ralf.seepold@htwg-konstanz.de

S. Orcioni · M. Conti Department of Information Engineering, Università Politecnica delle Marche, Via Brecce Bianche, 12, 60131 Ancona, Italy e-mail: s.orcioni@univpm.it

allowed to rest and become restored [1]. Poor sleep can lead to many health issues, such as dementia, diabetes and heart disease [2]. Sleep quality depends on the amount of sleep we get and on the positions in which we sleep [3]. If you don't get enough sleep, you spend less time in this relaxed state and, over time, are at a higher risk of stroke, angina and heart attack.

There are five stages of sleep: stages 1–4 are non-REM sleep, followed by REM sleep. These stages progress cyclically from stage 1 through REM and then begin again with stage 1. A complete sleep cycle takes on average 90–110 min. Stage 1 is light sleep, in which you drift in and out of sleep and can be awakened easily; the eyes move slowly and muscle activity slows. In stage 2, eye movement stops and brain waves become slower, with only occasional bursts of rapid brain waves. When a person enters stage 3 (the first stage of deep sleep), extremely slow brain waves called delta waves are interspersed with smaller, faster waves. In stage 4, deep sleep continues as the brain produces delta waves almost exclusively. During deep sleep, human growth hormone is released and restores the body and muscles from the stresses of the day. Stage 2 follows this deep sleep phase, and then the REM phase occurs. During REM (rapid eye movement) sleep, brain waves mimic the activity of the waking state. The eyes remain closed but move rapidly from side to side. Breathing becomes more rapid, irregular and shallow, and limb muscles are temporarily paralysed. Heart rate also increases and blood pressure rises. A sleep cycle is the period of time it takes an individual to progress through the stages of sleep outlined above; after REM sleep, the individual returns to stage 1 of light sleep and begins a new cycle. The amount of time spent in each stage also depends on age [4] (Fig. 1).

Recent research suggests that different sleep positions not only affect our quality of sleep but are also linked to our personalities [3]. The three main sleeping positions are side, back and stomach, and each has advantages and disadvantages. For example, people with sleep apnea are advised to sleep on their side rather than on their back. A lateral position during sleep is effective in reducing sleep disorder



Fig. 1 Hypnogram showing sleep stages and cycles in adult sleep [5]

symptoms of moderate snoring and apnea, since these increase in the supine posture and decrease in the lateral posture. Sleep specialists recommend sleeping on your side, as it gives a more comfortable rest and can decrease the likelihood of interrupted sleep.

## 2 State of the Art

Polysomnography (PSG), a type of sleep study, is the measurement of multiple physiological parameters during sleep. It is usually conducted in sleep laboratories. The signals recorded in PSG include the electroencephalogram (EEG), electrooculogram (EOG), electromyogram (EMG), electrocardiogram (ECG), blood oxygen saturation (SpO2), respiratory airflow and respiratory effort [6]. PSG is the most accurate way to measure physiological changes during sleep. It is used to determine sleep stages and to diagnose sleep disorders such as insomnia, sleep apnea and circadian rhythm sleep disorders, among others [7]. Unfortunately, PSG is a complex procedure: it takes approximately 30–40 min to prepare a patient for the sleep study, and the subject must wear at least 22 wire attachments during sleep [8]. The three parameters EEG, EOG and EMG make it possible to divide sleep into different stages [9]. Fig. 2 gives an overview of the polysomnography leads used and of how they look in practice.

This commonly used method puts the subject under stress, making it even more difficult for the subject to fall asleep. The drawback of these methods is that they require complex equipment, the expertise to evaluate the results and a controlled environment, which is why these measurements are usually done only for clinical or research purposes. Simple sleep monitoring, however, can be done with much simpler means, such as heart rate, body movement and position tracking. There is a persistent need for a relatively low-cost, simple alternative to polysomnography for the evaluation of sleepiness and insomnia. For these reasons, among others, there is a trend of new research on non-invasive sleep analysis systems, which is the approach proposed in this work. Examples of sensors used for this kind of analysis are piezoelectric sensors, video motion systems, optical fiber sensors, radar sensors, load cells, textile recording systems, pneumatic methods and pressure sensors. The system on which this work is based uses pressure sensors. Pressure is force applied over an area, and for sleep analysis the pressure is measured by the change in the electric field. The force sensing resistor (FSR) is the type of sensor selected for this work. It shows great potential for identifying not only gross movement and respiration but also sleep phases. An FSR has a resistance that changes according to the applied force; its typical detection range is 0.01–10 kg [11]. Other sleep-tracking methods use motion sensors that measure body movement (actigraphy) [12]. Other studies propose wearable devices that read real-time data [13].
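
An FSR is typically read through a voltage divider feeding the microcontroller ADC. The section does not describe the circuit, so the sketch below assumes, hypothetically, that the FSR sits between the supply and the ADC pin with a 10 kΩ resistor to ground and that the ADC reference equals the supply; 4095 is the full-scale count of a 12-bit converter.

```python
def adc_to_fsr_resistance(count, r_fixed=10_000.0, adc_max=4095):
    """FSR resistance (ohm) from an ADC count, for the hypothetical divider
    V_out / V_cc = r_fixed / (r_fixed + R_fsr)  ->  R_fsr = r_fixed * (adc_max - count) / count.
    """
    if count <= 0:
        return float("inf")          # no measurable force: the FSR behaves as an open circuit
    return r_fixed * (adc_max - count) / count


def fsr_resistance_to_adc(r_fsr, r_fixed=10_000.0, adc_max=4095):
    """Expected ADC count for a given FSR resistance with the same divider."""
    return round(adc_max * r_fixed / (r_fixed + r_fsr))


count = fsr_resistance_to_adc(10_000.0)    # equal resistances -> about half scale
print(count, round(adc_to_fsr_resistance(count)))
```

Because the FSR resistance falls as the applied force increases, the ADC count in this configuration rises with the load on the mattress.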

## 3 System and Methodology

The proposed solution is based on a mesh network of pressure sensors placed under the mattress, which tracks the movement and vital signs of the sleeper. In addition, an algorithm was created to detect the position in which the subject is lying on the bed. The algorithm determines the position from the values of 24 pressure sensors placed under the mattress, avoiding any contact with the subject and the associated inconvenience. The pressure sensors are distributed on four vertical and six horizontal lines, following the main pressure points of the most frequent sleep positions. FSRs were chosen because their contact surface is larger than that of other pressure sensors. The 24 pressure sensors are held on plastic base plates set in the bed frame and are connected to a 32-bit ARM ATSAMD21J16 microcontroller, which converts the analogue voltage into a digital value. An Intel Edison board was used as the endpoint, serving as a communication node that connects the sensor network to external services; it records the data and passes it to a visualization and analysis tool. Two nodes and the Intel Edison board are attached to the bottom of the bed frame and are connected to the sensors. Each node is connected to 12 sensors and to the endpoint. The nodes communicate with the endpoint via a single I2C bus, with address arbitration implemented digitally. Each node measures the voltage on its sensor pins and saves it in a local dynamic buffer. When a read request arrives via the system bus, the microcontroller processes the request and returns the latest measurements. Data collection is done on the endpoint node, which periodically queries the nodes for new data [14].
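
The data flow between the nodes and the endpoint can be modelled in a few lines. This is a hypothetical, hardware-free sketch: the class names, I2C addresses and buffer depth are illustrative, and the actual I2C transactions are not modelled.

```python
from collections import deque


class SensorNode:
    """Hypothetical node: samples its FSR channels into a small local buffer."""

    def __init__(self, address, n_channels=12, depth=8):
        self.address = address              # illustrative I2C address
        self.n_channels = n_channels
        self.buffer = deque(maxlen=depth)   # oldest frames are discarded when full

    def sample(self, voltages):
        """Store one frame of sensor-pin voltages (one value per channel)."""
        if len(voltages) != self.n_channels:
            raise ValueError("expected one value per channel")
        self.buffer.append(list(voltages))

    def handle_read_request(self):
        """Return the latest frame, or None if nothing has been sampled yet."""
        return self.buffer[-1] if self.buffer else None


class Endpoint:
    """Hypothetical endpoint that polls every node and merges their latest frames."""

    def __init__(self, nodes):
        self.nodes = sorted(nodes, key=lambda node: node.address)
        self.log = []                       # collected frames, one per successful poll

    def poll(self):
        frame = []
        for node in self.nodes:
            latest = node.handle_read_request()
            if latest is None:
                return None                 # wait until every node has data
            frame.extend(latest)
        self.log.append(frame)
        return frame


# Two nodes with 12 sensors each give one 24-value frame per poll
node_a, node_b = SensorNode(0x10), SensorNode(0x11)
node_a.sample([0.1] * 12)
node_b.sample([0.2] * 12)
endpoint = Endpoint([node_a, node_b])
frame = endpoint.poll()
```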

The goal of this algorithm is to detect the position in which the subject is lying on the bed. To reach this goal, typical patterns were created for each group of positions and for each specific position within each group. The algorithm compares all values of the current position with the patterns of the groups of positions. In detail, the algorithm works in two stages: (1) it selects the group of positions that fits the current position; (2) it looks for the exact position within the group selected in (1). Statistical analysis was performed to find the typical patterns of the positions and of each group of positions.
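
The two-stage matching could be sketched as a nearest-template search. The paper does not specify the comparison metric, so Euclidean distance to one template vector per group and per position is an assumption here, as are the function name and the toy 4-sensor templates.

```python
import numpy as np


def classify_position(frame, group_templates, position_templates):
    """Two-stage nearest-template classification (Euclidean distance is an assumption)."""
    x = np.asarray(frame, dtype=float)

    def nearest(templates):
        # key of the template vector closest to the current frame
        return min(templates, key=lambda k: np.linalg.norm(x - np.asarray(templates[k], dtype=float)))

    group = nearest(group_templates)                  # stage 1: group of positions
    position = nearest(position_templates[group])     # stage 2: position within the group
    return group, position


# Toy 4-sensor templates (the real system uses 24 sensor values)
groups = {"sitting": [0, 0, 5, 5], "side": [3, 3, 0, 0], "back": [2, 2, 2, 2]}
positions = {
    "sitting": {"top": [0, 0, 6, 4], "end": [0, 0, 4, 6]},
    "side": {"right": [4, 2, 0, 0], "left": [2, 4, 0, 0]},
    "back": {"back": [2, 2, 2, 2]},
}
print(classify_position([3.9, 2.1, 0.0, 0.0], groups, positions))   # ('side', 'right')
```

In practice the templates would be the typical patterns obtained from the statistical analysis, for example the mean of the 24 sensor values over labelled recordings.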

## 4 Results

Several measurements have been performed with different people to evaluate movement and position recognition. Each person simulated different sleep positions while recording. The bed system is able to recognise the different movements during the test, the position in which the person is lying, and the breathing rate.

The current algorithm can recognize three main groups of positions: sitting on the bed, lying on one side and lying on the back. Within these groups, it is possible to distinguish between sitting at the top and sitting at the end of the bed, and between lying on the right and lying on the left side. In total, five different positions are recognized.

It is possible to recognise the periodic signal and the peaks due to breathing. The signal was recorded at 1 Hz; with a higher sampling frequency, the periodic signal would be clearer.
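
The section does not detail how the breathing rate is extracted. One common option, used here as an illustrative substitute, is to take the dominant peak of the spectrum of the pressure signal. At the 1 Hz recording rate, only rates below the 0.5 Hz Nyquist limit (30 breaths/min) can be resolved, which is one reason a higher sampling frequency would help. The `min_hz` cut-off is an assumed value.

```python
import numpy as np


def breathing_rate_bpm(signal, fs=1.0, min_hz=0.1):
    """Breathing rate (breaths/min) from the dominant spectral peak of a pressure signal.

    fs is the sampling frequency in Hz; min_hz excludes slow drift near DC.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                                   # remove the static body-weight offset
    power = np.abs(np.fft.rfft(x * np.hanning(len(x)))) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    valid = freqs >= min_hz
    return 60.0 * freqs[valid][np.argmax(power[valid])]


# Synthetic 5-minute recording at 1 Hz with a 0.25 Hz (15 breaths/min) component
t = np.arange(300)
pressure = 50.0 + np.sin(2 * np.pi * 0.25 * t)
print(breathing_rate_bpm(pressure))   # 15.0
```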

In the laboratory, 30 measurements were taken in different positions to check the algorithm's matching capability. In eight of them, the subject was lying on the right side; in another eight, on the left side. Nine measurements were taken with the subject lying on her back and, finally, five with the subject sitting on the bed. The algorithm's hit ratio is 83% (25 hits and 5 mistakes).

Figure 3 shows the relation between the distribution of the maximal values taken from the sensors placed under the bed and the main pressure points of the right-side position. The main pressure points of this position correspond to the sensors with the maximal values.



Fig. 3 Distribution of sensors values in position "lying on the right side"

## 5 Conclusions

The proposed system can recognise movements and positions and shows the capability to detect respiration signals; it could therefore be suitable for sleep analysis, providing data on movement and respiration. In particular, the clear and simple hardware setup and the embedding of the algorithm into small prototyping boards demonstrate its potential for home use. In addition, the classification accuracy of about 83% underlines the effectiveness of the implemented algorithm.

The results obtained from the algorithm can be evaluated in cooperation with sleep medicine experts, since the position in which a person lies during the night can be related to the personality of the subject, to sleep quality and to some diseases. Some specific positions can be effective in reducing sleep disorder symptoms.

As a next step, we plan to move to a higher scan frequency in order to improve breathing-rate detection and possibly to detect heart activity.

In parallel, the algorithm will be extended to recognize more positions.

Acknowledgements This research was partially funded by the EU Interreg V-Program "Alpenrhein-Bodensee-Hochrhein": Project "IBH Living Lab Active and Assisted Living", grants ABH040 and ABH66.

## References

- 1. Spriggs, W.H.: Essentials of Polysomnography (2009)
- 2. Cummings, S.: The best and worst sleep positions for different medical problems. In: The Sleep Advisor (2017). https://www.sleepadvisor.org/best-sleeping-positions/
- 3. Hines, J.: Which sleep position is the best. Alaska Sleep Clinic (2018). http://www.alaskasleep.com/blog/which-sleep-position-is-best
- 4. Willson, A.: Benefits of sleep. Tuck: Advancing Better Sleep (2017). https://www.tuck.com/sleepbenefits/
- 5. Mastin, L.: How sleep works. www.howsleepworks.com
- 6. Lee-Chiong, T.: Sleep Medicine: Essentials and Review (2008)
- 7. Redeker, N.S., McEnnany, G.P.: Sleep Disorders and Sleep Promotion in Nursing Practice (2011)
- 8. Chokroverty, S., Thomas, R.: Atlas of Sleep Medicine, 2nd edn (2013)
- 9. Friedman, M.: Sleep Apnea and Snoring: Surgical and Nonsurgical (2008)
- 10. Brain Ripley, W.V.: (2016) https://cran.rproject
- 11. Yaniger, S.I.: Force Sensing Resistors: A Review of the Technology, pp. 666–668. Electro International (1999)
- 12. Hedner, J., Pillar, G., et al.: A novel adaptive wrist actigraphy algorithm for sleep-wake assessment in sleep apnea patients. Sleep **27**(8), 1560–1566 (2004)
- 13. Velicu, O.R., Madrid, N.M., Seepold, R.: Experimental sleep phases monitoring. In: 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI), pp. 625–628 (2016)
- 14. Gaiduk, M., Vunderl, B., Seepold, R., Ortega, J.A., Penzel, T.: Sensor-mesh-based system with application on sleep study. In: Rojas, I., Ortuño, F. (eds.) Bioinformatics and Biomedical Engineering. IWBBIO 2018. Lecture Notes in Computer Science, vol. 10814. Springer, Cham (2018)

# Energy Harvesting with Current Sensors to Sustain Embedded IoT Platforms



Matteo Nardello and Davide Brunelli

**Abstract** Powering IoT devices is becoming a major issue due to the number of nodes expected to be installed in the coming years. Harvesting energy from the environment to power these devices can avoid the need for batteries or AC/DC converters. In this paper, we present a study of an electrical energy harvesting power supply that uses a current transformer sensor, as a feasible solution for powering a low-power embedded platform without batteries. The proposed system does not depend on the load and can be adapted to different scenarios and IoT applications.

Keywords Energy harvesting · Autonomous systems · Embedded systems

## 1 Introduction

Energy harvesting has emerged as a prominent research area and continues to grow at a rapid pace. Solutions that avoid the need for batteries or power switches would support the spread of IoT devices as well as distributed monitoring systems. A wide range of applications suitable to be powered by harvesters is starting to appear [1–5], including distributed wireless sensor nodes for data streaming [6], embedded and implanted sensors for wearable applications [7], sensor data acquisition from unmanned vehicles [8], smart meters [9], and the Industrial Internet of Things (IIoT) [10].

To this end, we have focused our work on the development and study of an energy harvesting power supply for low-power embedded systems, in this case an embedded platform based on a low-power STM32 MCU and a LoRa radio (Fig. 1). To tailor the design to sustain the load, we started the study from the power

M. Nardello (🖂) · D. Brunelli

Department of Industrial Engineering, University of Trento, 38123 Trento, Italy e-mail: matteo.nardello@unitn.it URL: https://www.dii.unitn.it/

D. Brunelli e-mail: davide.brunelli@unitn.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), Applications in Electronics Pervading Industry, Environment and Society, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_47



Fig. 1 System architecture block diagram

requirements of the load and then tailored the main components of the energy harvesting power supply, as the intrinsic characteristics of the components used affect efficiency.

Two current transformer (CT) sensors have been compared: one (CT1) with a 1:1000 ratio<sup>1</sup> and the other (CT2) with a 1:3000 ratio.<sup>2</sup> Several diodes produced by different manufacturers have also been tested to find the most efficient combination at low currents. The last key component is the energy storage, which should be tailored to sustain the energy requirements of the platform to be powered. Although innovative technologies such as lithium-ion capacitors [11] are available on the market, supercapacitors are more suitable for achieving an energy-neutral architecture in our case. The paper is structured as follows: Sects. 2 and 3 discuss the selection of the most efficient rectifier components and CT sensor, respectively. Results and sustainability tests are presented in Sect. 4; finally, Sect. 5 concludes this work.

## 2 Rectifier Analysis and Energy Reservoir Selection

Several diodes with different characteristics were chosen and analyzed in terms of harvesting efficiency to identify the best bridge rectifier configuration. Initial tests were conducted using a 100 mF supercapacitor as the energy reservoir and the Vitec CT sensor with a 3:3000 turns ratio. Different primary loads, ranging between 100 and 1400 W, were emulated using an array of incandescent light bulbs and other loads. Regarding the rectifiers, bridges with the following diodes

<sup>1</sup>LEM TOP 90-S10/SP2. https://docs-emea.rs-online.com/webdocs/14ce/0900766b814cedb3.pdf.

<sup>2</sup>Vitec 57PR1673. http://www.viteccorp.com/data/CatalogSensing.pdf.



**Fig. 2** a Charging curves of a 100 mF supercapacitor, using 7 different diodes. For clarity, the figure shows curves ranging from 2.5 to 4.5 V. **b/c** Charging curves of a 22 mF supercapacitor, while harvesting energy from a 500 W (**b**) and 1400 W (**c**) primary load. For clarity, the figure shows curves ranging from 2 to 4.5 V

have been investigated: SDM10P45,<sup>3</sup> PMEG2010ER,<sup>4</sup> BAT30,<sup>5</sup> BAT43,<sup>6</sup> BAT48,<sup>7</sup> 1N5711<sup>8</sup> and 1N4007,<sup>9</sup> and the results are compared in Fig. 2a. The time required to charge the supercapacitor to a given voltage can vary dramatically depending on the characteristics of the selected diodes (e.g. BAT48 vs. 1N5711).

### 2.1 Bridge Selection

To evaluate the small differences between the diodes, another set of tests was conducted with a smaller supercapacitor, this time using only the three selected diodes BAT43, BAT48 and SDM10, since they are the most efficient at low input currents, when the energy that can be harvested is limited. After a performance comparison, BAT48 was selected for further experiments as the most efficient at low primary loads. Results are summarized in Fig. 2b, c.
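
Comparing diodes as in Fig. 2 amounts to measuring how long each charging curve takes to reach a given voltage. The helpers below are a hypothetical sketch operating on logged (time, voltage) samples; the diode names and linear curves in the example are synthetic, not measured data.

```python
import numpy as np


def time_to_voltage(t, v, target):
    """First time at which a logged charging curve v(t) reaches `target` volts.

    Uses linear interpolation between samples; returns None if the target is never reached.
    """
    t = np.asarray(t, dtype=float)
    v = np.asarray(v, dtype=float)
    reached = np.nonzero(v >= target)[0]
    if reached.size == 0:
        return None
    k = reached[0]
    if k == 0:
        return float(t[0])
    frac = (target - v[k - 1]) / (v[k] - v[k - 1])
    return float(t[k - 1] + frac * (t[k] - t[k - 1]))


def rank_diodes(curves, target):
    """Diode names sorted by time to reach `target`, fastest first; curves = {name: (t, v)}."""
    times = {name: time_to_voltage(t, v, target) for name, (t, v) in curves.items()}
    return sorted((name for name, x in times.items() if x is not None), key=times.get)


# Synthetic example: two linear charging curves sampled once per second
t = np.arange(0.0, 101.0)
curves = {"diode_A": (t, 0.05 * t), "diode_B": (t, 0.02 * t)}
print(rank_diodes(curves, 1.5))   # ['diode_A', 'diode_B']
```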

### 2.2 Supercapacitor Selection

The energy reservoir, used as the accumulator for the harvested energy, is usually tailored to the specific application requirements. Starting from the

<sup>3</sup>SDM10P45. https://www.mouser.it/datasheet/2/115/ds30287-71260.pdf.

<sup>4</sup>PMEG2010ER. https://assets.nexperia.com/documents/data-sheet/PMEG2010ER.pdf.

<sup>5</sup>BAT30. https://www.st.com/resource/en/datasheet/bat30.pdf.

<sup>6</sup>BAT43. https://www.vishay.com/docs/85660/bat42.pdf.

<sup>7</sup>BAT48. https://www.st.com/resource/en/datasheet/bat48.pdf.

<sup>8</sup>1N5711. https://www.st.com/resource/en/datasheet/1n5711.pdf.

<sup>9</sup>1N4007. https://www.diodes.com/assets/Datasheets/ds28002.pdf.

amount of energy consumed by the specific task set, the capacitance of the supercapacitor is chosen accordingly. Three different supercapacitors have been tested. We have used a 5.5 V 100 mF supercapacitor, a 4.5 V 22 mF, and a series of two 12 V 90 mF. To prevent the power line dropping below the brownout of the platform, another smaller capacitor has been inserted after the buck converter, connected to the board power line. This capacitor is used for withstanding the current peak associated with the radio transmission.
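The sizing step can be made explicit with the energy a capacitor releases between two voltages, $E = \tfrac{1}{2}C(V_{max}^2 - V_{min}^2)$. In the sketch below, `v_min` stands for an assumed minimum input voltage of the buck converter and `margin` for a hypothetical safety factor. Converter losses and supercapacitor leakage are ignored, so a real design would need headroom beyond this estimate.

```python
def usable_energy_j(capacitance_f: float, v_max: float, v_min: float) -> float:
    """Energy released when the capacitor discharges from v_max down to v_min."""
    if not 0.0 <= v_min <= v_max:
        raise ValueError("expected 0 <= v_min <= v_max")
    return 0.5 * capacitance_f * (v_max ** 2 - v_min ** 2)

def required_capacitance_f(task_energy_j: float, v_max: float, v_min: float,
                           margin: float = 1.0) -> float:
    """Smallest capacitance holding task_energy_j between v_max and v_min, times a margin."""
    if not 0.0 <= v_min < v_max:
        raise ValueError("expected 0 <= v_min < v_max")
    return margin * 2.0 * task_energy_j / (v_max ** 2 - v_min ** 2)

# Example: 22 mF supercapacitor charged to 4.5 V, assumed 2.0 V converter cut-off
print(f"{usable_energy_j(22e-3, 4.5, 2.0):.3f} J")
```

Under these assumptions, the 22 mF part delivers about 0.18 J per discharge. `required_capacitance_f` inverts the same relation when starting from a known task energy.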

#### 3 CT Sensor Analysis

Like any transformer, a current transformer (CT) consists of a primary winding, a core and a secondary winding. In our case, the primary is the wire on which the CT sensor is clamped, while the secondary winding is a characteristic of the sensor itself. We studied two different sensors, CT1 and CT2, with 1000 and 3000 secondary turns respectively. The relation between the current flowing in the primary and the current in the secondary is described in Fig. 3a. Contrary to what might be expected from Fig. 3a, the sensor with the lower turns ratio charges the supercapacitor faster. Experimental tests have shown two things. First, the output current is affected by the magnetic coupling and by the impedance mismatch between the CT and the bridge rectifier. Second, a higher number of primary turns improves the magnetic coupling and thus increases the secondary output current. The results obtained while charging a 100 mF supercapacitor with a 500 W primary load are presented in Fig. 3b. As can be seen, the LEM sensor charges the supercapacitor 8 times more slowly than the Vitec sensor, despite having a higher turns ratio.
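For reference, the ideal relation summarized by Fig. 3a is the ampere-turn balance $N_p I_p = N_s I_s$. The sketch below evaluates it for a resistive load, assuming 230 V rms mains and unity power factor (neither value is stated in the paper). It models only an ideal transformer, so it ignores the magnetic coupling and rectifier impedance effects that the measurements above show to be significant.

```python
def primary_current_a(load_w: float, mains_v_rms: float = 230.0,
                      power_factor: float = 1.0) -> float:
    """RMS current drawn by the primary load (assumed mains voltage and power factor)."""
    return load_w / (mains_v_rms * power_factor)

def ideal_secondary_current_a(primary_a: float, primary_turns: int,
                              secondary_turns: int) -> float:
    """Ideal CT ampere-turn balance: N_p * I_p = N_s * I_s."""
    return primary_a * primary_turns / secondary_turns

i_p = primary_current_a(500.0)  # 500 W primary load
for n_s in (1000, 3000):
    print(f"1:{n_s}: {ideal_secondary_current_a(i_p, 1, n_s) * 1e3:.2f} mA")
# Passing the primary three times through a 3000-turn CT, as in Fig. 3c
print(f"3:3000: {ideal_secondary_current_a(i_p, 3, 3000) * 1e3:.2f} mA")
```

In this ideal model, three primary turns on a 3000-turn CT give the same secondary current as one turn on a 1000-turn CT.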



**Fig. 3** **a** Working principle of a current transformer. **b** Charging curves using two different CTs. **c** Picture of the installed sensor, with 3 turns of the primary for a ratio of 3:3000

#### 4 Sustainability Analysis

Once the main components of the energy harvesting power supply had been selected, we moved on to powering an embedded platform based on an STM32L4 MCU and an SX1276 LoRa radio. To power the board, an efficient ultra-low-quiescent-current buck converter is connected between the high-voltage input supercapacitor and the power line of the board. Another, smaller capacitor is connected to the output of the buck converter to withstand the peak current associated with the transmission of a LoRa packet. The firmware running on the platform simulates a typical scenario in which data must be acquired, analyzed and then sent over a radio channel:

- **Data acquisition:** The first task of the board is to acquire a signal using the internal ADC. In this state, the DMA offloads the microcontroller CPU, and the board draws just 200 μA.
- **Data analysis:** The acquired data is first filtered and then analyzed. To simulate a heavy task, the board computes an FFT of the acquired data. In this case, the energy requirement is proportional to the MCU clock frequency.
- **LoRa transmission:** The last task is data transmission. Every 5 min, a report containing all the collected and analyzed data is sent to a remote server. This is the most energy-demanding task, with a current peak of 120 mA.
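The energy demanded by one 5 min cycle can be estimated by summing supply voltage × current × duration over the tasks listed above. Only the 200 μA acquisition current, the 120 mA transmission peak and the 500 ms / 5 min periods come from the text. The 3.3 V rail, the task durations and the analysis current are hypothetical placeholders, and sleep current is ignored. Treating the 120 mA peak as constant for the whole transmission is deliberately pessimistic.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class Task:
    name: str
    current_a: float     # average current while the task runs
    duration_s: float    # duration of one execution
    runs_per_cycle: int  # executions per complete 5 min cycle

def cycle_energy_j(tasks: Iterable[Task], supply_v: float) -> float:
    """Energy drawn from the supply rail during one complete cycle (sleep ignored)."""
    return sum(supply_v * t.current_a * t.duration_s * t.runs_per_cycle for t in tasks)

CYCLE_S = 5 * 60      # one LoRa report every 5 min (from the text)
FAST_PERIOD_S = 0.5   # acquisition + analysis every 500 ms (from the text)
fast_runs = round(CYCLE_S / FAST_PERIOD_S)  # 600 repetitions per cycle

tasks = [
    Task("acquisition", 200e-6, 0.100, fast_runs),  # 200 uA from the text; 100 ms assumed
    Task("analysis", 3e-3, 0.020, fast_runs),       # current and duration assumed
    Task("lora_tx", 120e-3, 0.200, 1),              # 120 mA peak from the text; 200 ms assumed
]
energy_j = cycle_energy_j(tasks, supply_v=3.3)      # 3.3 V rail assumed
print(f"{energy_j:.4f} J per cycle, {energy_j / CYCLE_S * 1e3:.3f} mW average")
```

With these placeholder numbers, the load needs about 0.24 J per cycle, i.e. under 1 mW on average. Because the analysis energy scales with the MCU clock, raising the `analysis` current is one simple way to mimic a higher CPU frequency.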

The first two tasks are repeated every 500 ms, while data is transmitted over the radio every 5 min. This 5 min period is the complete cycle used to evaluate the sustainability of the whole system. The analysis was carried out by monitoring the voltage of the energy reservoir, which should increase, or at least remain constant, between two successive cycles. This means that the harvested energy is equal to or greater than the energy used by the load. A further set of tests was conducted to relate the CPU frequency to the minimum primary load that ensures self-sustained operation, with a packet transmission every 5 min. Frequencies between 1 and 48 MHz were tested, and the results are presented in Table 1. Figure 4 shows the voltage trend of the input supercapacitor during the execution of the three tasks described above. As can be seen, the node is able to sustain its own operation, avoiding the need for a battery or an AC/DC converter.
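The sustainability criterion can be written as a check on reservoir voltages sampled at successive cycle boundaries. The voltage must not decrease, and the net energy gained over cycle $k$ is $\tfrac{1}{2}C(V_{k+1}^2 - V_k^2)$. The function names, the `tolerance_v` parameter and the example voltages are hypothetical.

```python
from typing import List, Sequence

def net_energy_per_cycle_j(cycle_voltages: Sequence[float],
                           capacitance_f: float) -> List[float]:
    """Energy gained (+) or lost (-) by the reservoir over each complete cycle."""
    return [0.5 * capacitance_f * (v1 ** 2 - v0 ** 2)
            for v0, v1 in zip(cycle_voltages, cycle_voltages[1:])]

def is_self_sustained(cycle_voltages: Sequence[float], tolerance_v: float = 0.0) -> bool:
    """True if the voltage never drops (beyond tolerance) between successive cycles."""
    return all(v1 >= v0 - tolerance_v
               for v0, v1 in zip(cycle_voltages, cycle_voltages[1:]))

# Hypothetical reservoir voltages read at the start of consecutive 5 min cycles
voltages = [3.00, 3.05, 3.05, 3.10]
print(is_self_sustained(voltages), net_energy_per_cycle_j(voltages, 0.1))
```

A small `tolerance_v` can absorb ADC noise, so that a flat trend is not misread as a deficit.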

Table 1 Relation between CPU frequency and primary load, to ensure self-sustainability

| Frequency [MHz]  | 1   | 2   | 4   | 8   | 16  | 24  | 32  | 48  |
|------------------|-----|-----|-----|-----|-----|-----|-----|-----|
| Primary load [W] | 300 | 300 | 400 | 400 | 500 | 600 | 500 | 600 |
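Table 1 can serve directly as a lookup when choosing a clock frequency for a known primary load. The sketch below copies the values from Table 1 and returns the highest tested frequency whose required load does not exceed the available one. It does not interpolate between tested points, and the function name is hypothetical.

```python
from typing import Optional

# Minimum primary load [W] for self-sustained operation, per CPU frequency [MHz] (Table 1)
TABLE_1_MIN_LOAD_W = {1: 300, 2: 300, 4: 400, 8: 400, 16: 500, 24: 600, 32: 500, 48: 600}

def max_sustainable_frequency_mhz(available_load_w: float) -> Optional[int]:
    """Highest tested frequency whose required primary load is met, or None."""
    feasible = [f for f, load in TABLE_1_MIN_LOAD_W.items() if load <= available_load_w]
    return max(feasible) if feasible else None

for load in (250, 400, 500, 600):
    print(load, "W ->", max_sustainable_frequency_mhz(load), "MHz")
```

For example, 400 W yields 8 MHz and 600 W yields 48 MHz, matching the two operating points shown in Fig. 4.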



**Fig. 4** Input supercapacitor voltage trend while powering the embedded platform. **a** CPU clocked at 8 MHz while harvesting from a 400 W primary load. **b** CPU clocked at 48 MHz while harvesting from a 600 W primary load. As can be seen, scaling the frequency adapts the energy demand to the energy provided by the harvester, achieving self-sustainable operation

#### 5 Conclusion

Providing new solutions that avoid the use of batteries and AC/DC converters can foster the widespread adoption of resource-constrained devices for environmental monitoring. In this paper, starting from the power requirements of the embedded system to be powered, we tested and sized an energy harvesting power supply. Experimental results have demonstrated that optimizing components and parameters is essential to maximize the energy that can be harvested and transferred to the load. They have also shown that the proposed system is a feasible solution for powering resource-constrained platforms. Moreover, the proposed power supply is transparent to the load and can be adapted to different scenarios.

#### References

- 1. Sartori, D., Brunelli, D.: A smart sensor for precision agriculture powered by microbial fuel cells. In: 2016 IEEE Sensors Applications Symposium (SAS), pp. 1–6 (2016)
- 2. Kondo, T., Chiwaki, N., Sugahara, S.: Design and performance of thin-film μTEG modules for wearable device applications. In: 2017 IEEE Electron Devices Technology and Manufacturing Conference (EDTM), pp. 201–203 (2017)
- 3. Rossi, M., Rizzon, L., Fait, M., Passerone, R., Brunelli, D.: Energy neutral wireless sensing for server farms monitoring. IEEE J. Emerg. Sel. Top. Circuits Syst. 4(3), 324–334 (2014)
- 4. Bergonzini, C., Brunelli, D., Benini, L.: Comparison of energy intake prediction algorithms for systems powered by photovoltaic harvesters. Microelectron. J. 41(11) (2010)
- 5. Porcarelli, D., Brunelli, D., Benini, L.: Clamp-and-forget: a self-sustainable non-invasive wireless sensor node for smart metering applications. Microelectron. J. 45(12) (2014)
- 6. Brunelli, D., Maggiorotti, M., Benini, L., Bellifemine, F.L.: Analysis of audio streaming capability of Zigbee networks. In: Wireless Sensor Networks, EWSN 2008. Lecture Notes in Computer Science, vol. 4913. Springer, Berlin (2008)
- 7. Brunelli, D., Farella, E., Rocchi, L., Dozza, M., Chiari, L., Benini, L.: Bio-feedback system for rehabilitation based on a wireless body area network. In: Fourth Annual IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOMW 2006), pp. 5, 531 (2006)
- 8. Rossi, M., Brunelli, D., Adami, A., Lorenzelli, L., Menna, F., Remondino, F.: Gas-drone: portable gas sensing system on UAVs for gas leakage localization. In: Sensors, pp. 1431–1434. IEEE (2014)
- 9. Dalpiaz, G., Longo, A., Nardello, M., Passerone, R., Brunelli, D.: A battery-free non-intrusive power meter for low-cost energy monitoring. In: IEEE Industrial Cyber-Physical Systems (ICPS), St. Petersburg, vol. 2018, pp. 653–658 (2018)
- 10. Tessaro, L., Raffaldi, C., Rossi, M., Brunelli, D.: Lightweight synchronization algorithm with self-calibration for industrial LORA sensor networks. In: Workshop on Metrology for Industry 4.0 and IoT, Brescia, vol. 2018, pp. 259–263 (2018)
- 11. Porcarelli, D., Brunelli, D., Benini, L.: Characterization of lithium-ion capacitors for low-power energy neutral wireless sensor networks. In: 2012 Ninth International Conference on Networked Sensing (INSS), pp. 1–4 (2012)

# A PXI Based Implementation of a TLK2711 Equivalent Interface



Pietro Nannipieri, Luca Dello Sterpaio, Antonino Marino and Luca Fanucci

**Abstract** In the last few years, the data rate requirements of satellite on-board data handling have increased dramatically, from hundreds of megabits per second to several gigabits per second. SpaceWire, the current-generation European standard communication protocol, is no longer able to satisfy these demanding requirements, while the upcoming protocol, SpaceFibre, has not yet been adopted in any current space mission. As a result, industry and space agencies have adopted a non-standardized solution, the TLK2711 ASIC. This chip implements the lower communication protocol layers of the Open Systems Interconnection stack (Wizard Link) and leaves the implementation of the higher layers to the user. Consequently, the space community's need for hardware to test these systems has grown. This paper presents an implementation of a Wizard Link TLK2711 equivalent circuit on a National Instruments PXI platform, in order to boost the development of TLK2711 compatible ground test equipment.

**Keywords** PXI · EGSE · TLK2711 · Wizard link · Space · High speed serial communications

#### 1 Introduction

The evolution of space missions continuously pushes the demand for higher on-board communication data rates [1]. The European state of the art (SOA) for spacecraft on-board communication is the SpaceWire standard [2]. It provides a

P. Nannipieri (🖂) · L. Dello Sterpaio · A. Marino · L. Fanucci

Department of Information Engineering, University of Pisa, Pisa, Italy e-mail: pietro.nannipieri@ing.unipi.it

L. Dello Sterpaio e-mail: luca.dellosterpaio@ing.unipi.it

A. Marino e-mail: antonino.marino@ing.unipi.it

L. Fanucci e-mail: luca.fanucci@unipi.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), Applications in Electronics

*Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_48


bi-directional, full-duplex data links from 2 Megabit per second (Mbps) up to 200 Mbps. These links are designed to connect instruments, mass memories, high data-rate sensors and telemetry/telecommand subsystems on board spacecraft. The purpose of the SpaceWire standard was to meet the need for data rates of hundreds of Mbps and to reduce system integration costs by re-using data handling equipment across different missions. Nowadays SpaceWire is used as a medium-rate on-board interface, whereas applications such as Synthetic Aperture Radar (SAR) require data rates in the order of Gigabits per second (Gbps). To meet these needs, the European Space Agency (ESA) promoted the development of a new standard: SpaceFibre [3]. SpaceFibre is the upcoming standard for high-speed, highly reliable on-board data communications. It supports data rates of 2.5, 3.125 and 6.25 Gbps per lane, which can be further increased with a multi-lane design. SpaceFibre introduces the concept of Virtual Channels (VCs) and provides coherent Quality of Service (QoS) and Fault Detection, Isolation and Recovery (FDIR) techniques. Although SpaceFibre will be able to meet high data-rate requirements, the final version of the standard has not been released yet. Currently, Gbps data rates are obtained through different non-standardised on-board interfaces. These include general-purpose Serialiser/Deserialiser (SerDes) devices running completely custom protocols (with technology-dependent data rates), parallel LVDS (up to 1.6 Gbps) and the widely used Wizard Link (up to 2 Gbps). Wizard Link is a chipset solution from Texas Instruments, also called TLK2711 [4] (see Fig. 1). It provides bi-directional point-to-point communication, a 16-bit data bus, 8b/10b encoding, parallel-to-serial and serial-to-parallel conversion, and data rates up to 2 Gbps.

In the last decade, Wizard Link has been adopted as the high-speed interface in many space missions, such as Sentinel-1, COSMO-SkyMed Second Generation and ALOS-2, and it is widely used in upcoming space missions [5]. As the high-speed interface of the on-board data handling, the TLK2711 is a relevant part of the Electrical Ground Support Equipment (EGSE). In fact, a Wizard Link capable EGSE must be equipped with as many TLK2711 chips as the Device Under Test (DUT) has, in order to test the correct functionality of the system.
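As a rough illustration of how the 16-bit bus and 8b/10b encoding relate to the serial rate, the sketch below makes two assumptions. First, the "2 Gbps" figure is taken to be the serial line rate (the TLK2711 datasheet [4] is titled "1.6-Gbps to 2.5-Gbps"). Second, each 16-bit word is taken to travel as two 10-bit symbols, i.e. 20 line bits. Idle and control characters are ignored, so the payload figure is an upper bound. The helper names are hypothetical.

```python
DATA_BITS_PER_WORD = 16  # width of the TLK2711 parallel data bus
LINE_BITS_PER_WORD = 20  # two 8b/10b symbols per 16-bit word

def word_clock_hz(line_rate_bps: float) -> float:
    """Parallel word rate needed to sustain the given serial line rate."""
    return line_rate_bps / LINE_BITS_PER_WORD

def max_payload_bps(line_rate_bps: float) -> float:
    """Upper bound on user data rate: 8b/10b keeps 16 of every 20 line bits."""
    return line_rate_bps * DATA_BITS_PER_WORD / LINE_BITS_PER_WORD

for rate in (1.6e9, 2.0e9, 2.5e9):
    print(f"{rate / 1e9:.1f} Gbps line -> {word_clock_hz(rate) / 1e6:.0f} MHz word clock, "
          f"{max_payload_bps(rate) / 1e9:.2f} Gbps payload")
```

Under these assumptions, a 2 Gbps line corresponds to a 100 MHz word clock and at most 1.6 Gbps of user data.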

#### 2 Wizard Link Equivalent IP

The core of the proposed system is the TLK2711 equivalent IP. It exactly emulates the behaviour of a Wizard Link node in a technology-independent solution that is portable to different FPGA platforms. With these capabilities, a system integrator can connect the programmable logic section of the EGSE directly to a Device Under Test (DUT) with a Wizard Link interface, provided that the logic embeds high-speed serial link capabilities. Suppose that a link must be established between a DUT with a serial Wizard Link interface and a host FPGA, onto which the low-level portion of the EGSE system is mapped. The user would need a test set-up like the one represented in Fig. 2.

Three different devices are therefore needed. In particular, the host side itself requires a TLK2711 Integrated Circuit (IC) as an interface to connect the host



Fig. 1 TLK2711 high level block diagram

FPGA with the DUT. This results in higher costs, due both to the number of components and to the greater area required on the circuit board. With the proposed solution, the same test set-up can be realised as shown in Fig. 3, by connecting the DUT's TLK2711 directly to the high-speed serial interface of the FPGA, which then also embeds the hardware section of the EGSE.

To achieve full compatibility with a Wizard Link node, the system shall:

- Perform 8B/10B encoding of each 16-bit word into a 20-bit word on the TX side.
- Perform 8B/10B decoding of each 20-bit word into a 16-bit word on the RX side.
- Perform comma alignment of the symbols parallelised by the SerDes.
- Serialise the 20-bit word on the TX side and deserialise the 20-bit word on the RX side.
- Properly signal error situations to the host, e.g. output the K31.7 symbol on both bytes in case of loss of signal.
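The comma-alignment requirement can be illustrated with a small bit-level model. The sketch searches a recovered bit stream for the K28.5 comma character, written in transmission order (bit a first) for both running disparities. It then slices the stream into 10-bit symbols from that offset. This is only a software model under those assumptions, with hypothetical helper names. The IP performs the alignment in hardware on the words delivered by the SerDes, and this sketch does not model where the TLK2711 places the comma within the 20-bit word.

```python
from typing import List, Optional

# K28.5 in transmission order (abcdei fghj), for negative and positive running disparity
K28_5_PATTERNS = ("0011111010", "1100000101")
SYMBOL_BITS = 10

def find_comma_offset(bits: str) -> Optional[int]:
    """Index of the first K28.5 comma in the stream, or None if none is found."""
    for i in range(len(bits) - SYMBOL_BITS + 1):
        if bits[i:i + SYMBOL_BITS] in K28_5_PATTERNS:
            return i
    return None

def align_symbols(bits: str, offset: int) -> List[str]:
    """Cut the stream into whole 10-bit symbols starting at the comma boundary."""
    return [bits[i:i + SYMBOL_BITS]
            for i in range(offset, len(bits) - SYMBOL_BITS + 1, SYMBOL_BITS)]

# Hypothetical stream: 3 bits of misalignment, then K28.5 (RD-) and D21.5
stream = "101" + "0011111010" + "1010101010"
offset = find_comma_offset(stream)
print(offset, align_symbols(stream, offset))
```

Here the comma is found at bit offset 3, after which the two symbols come out correctly framed.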



Fig. 2 Classical test setup for a DUT with a Wizard Link interface



Fig. 3 Test setup with the proposed architecture for a DUT with a Wizard Link interface

#### 3 Proposed System

#### 3.1 PXI Platform

The PCI eXtension for Instrumentation (PXI) standard aims to extend the Peripheral Component Interconnect (PCI) specification in terms of both performance and reliability [6]. Such requirements are needed for instrumentation intended for harsh industrial environments and applications. System flexibility, another key aspect, is improved by the standard's inherently modular approach, Eurocard form factor and hot-swap capabilities. Built on the PCI communication protocol and CompactPCI (cPCI) hardware, the PXI specification adds dedicated signals for common reference clock distribution, several levels of precise synchronization triggers, and local peer-to-peer buses. Compared to custom or proprietary solutions, a standardized approach minimizes the risk of hardware incompatibility and makes it possible to unify instruments of very different natures under a single environment. On the downside, setting up a PXI environment often requires a steep initial investment compared to custom, stand-alone instruments. National Instruments is one of the main test equipment manufacturers and supporters of PXI adoption, and all National Instruments PXI systems are natively integrated into its LabVIEW environment. The Laboratory Virtual Instrument Engineering Workbench (LabVIEW) is a system design environment that represents data flow and operations graphically as block diagrams. It is intended for rapidly designing test, measurement and control applications [7].

The National Instruments PXIe 6591 High Speed Serial Instrument was chosen as the target PXI peripheral module, mainly for the GTX transceivers integrated into its on-board Xilinx Kintex 7 FPGA [8]. The PXIe 6591 transceivers can sustain serial links with throughput up to 12.5 Gbps. Transceiver signals are available on the board's front panel through two miniSAS HD connectors (four GTXs per port). No physical connection adapter has been designed for the miniSAS HD ports, since this aspect is highly application dependent and unrelated to the TLK2711 itself.





#### 3.2 TLK2711 Equivalent System Design

The TLK2711 equivalent IP has been used to emulate a Wizard Link node. Figure 4 presents a block scheme of the IP.

In particular, the system is composed of the following blocks:

- A SerDes block, responsible for converting parallel data paths into serial links and vice versa, on both TX and RX. The SerDes must also recover the RX CLK signal from the input serial data stream.
- On the RX side, a Symbol Synchronisation block, responsible for properly aligning the 20-bit words output by the deserialiser, using unique bit sequences called commas.
- An 8B/10B encoder/decoder. This encoding adds redundancy to the symbols, which makes it easy to detect whether an error occurred. It also eases clock recovery thanks to the high number of transitions between zeros and ones.
- A compact Control block built on top of the other sections of the IP. It ensures that all the high-level features of the TLK2711 are properly emulated, e.g. the way errors are handled.
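To make the error-detection argument concrete, the sketch below checks properties that every valid 8B/10B symbol satisfies. The 6-bit and 4-bit sub-blocks have disparity 0 or ±2, the whole symbol contains 4 to 6 ones, and no run of identical bits exceeds five, which is what guarantees frequent transitions for clock recovery. These are necessary conditions only. A complete decoder also checks the code tables and tracks running disparity across symbols, which this sketch does not do. The helper names are hypothetical.

```python
from itertools import groupby

def max_run_length(bits: str) -> int:
    """Length of the longest run of identical bits."""
    return max(len(list(group)) for _, group in groupby(bits))

def is_plausible_8b10b_symbol(symbol: str) -> bool:
    """Necessary (not sufficient) checks for a 10-bit 8B/10B symbol, bit a first."""
    if len(symbol) != 10 or set(symbol) - {"0", "1"}:
        return False
    six, four = symbol[:6], symbol[6:]
    if six.count("1") not in (2, 3, 4):     # 6b sub-block disparity must be 0 or +-2
        return False
    if four.count("1") not in (1, 2, 3):    # 4b sub-block disparity must be 0 or +-2
        return False
    if symbol.count("1") not in (4, 5, 6):  # symbol disparity must be 0 or +-2
        return False
    return max_run_length(symbol) <= 5      # bounded run length keeps transitions frequent

print(is_plausible_8b10b_symbol("0011111010"))  # K28.5, RD-
print(is_plausible_8b10b_symbol("0011111100"))  # run of six ones: rejected
```

A bit error that breaks any of these properties is detected immediately. An error that produces another valid-looking symbol would need the full tables or a disparity error to be caught.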

To implement the system design on an NI PXIe FPGA board, the Socketed Component-Level IP (Sck CLIP) feature of the LabVIEW FPGA workflow must be used. The LabVIEW FPGA module lets users benefit from custom hardware acceleration in their own test, measurement and processing set-ups by drawing block diagrams, as for typical LabVIEW Virtual Instruments (VIs). Hardware and software partitioning is left entirely to the design tool, which also handles HDL synthesis and implementation on the target FPGA module. A Sck CLIP is usually instantiated to import small custom user designs or third-party IPs into a LabVIEW FPGA VI [9]. Valid source types for a Sck CLIP module are Electronic Design Interchange Format (EDIF) netlists, Xilinx Design Constraints (XDC) files, Verilog



Fig. 5 Block diagram of the proposed system

Table 1 TLK2711 equivalent resource utilisation on Xilinx Kintex 7

| LUT | REG | GTX primitive | MMCME2 or PLL |
|-----|-----|---------------|---------------|
| 476 | 596 | 1             | 2             |

or VHDL sources. For some PXIe peripherals, such as the NI PXIe 6591 board, Sck CLIP is also the only way to access signals to and from the module's front-panel connectors. The TLK2711 equivalent system was designed in the Xilinx Vivado design tool, targeting the FPGA on board the PXIe module. The design was then synthesized in Vivado for the same FPGA, and the results were exported as EDIF netlists. These netlists were imported into the LabVIEW project as sources for the Sck CLIP module.

In other words, the Sck CLIP methodology was intentionally used to wrap a whole system design within the LabVIEW environment, rather than to import a small custom HDL design into a larger LabVIEW FPGA project.

Table 1 reports the results in terms of occupied resources.

Figure 5 provides an overview of the TLK2711 equivalent system. The NI PXIe 6591 peripheral module is hosted in a PXIe-compatible chassis, along with the user's PXI controller. The on-board FPGA communicates over the PXIe bus through a dedicated interface, while the front-panel connectors are accessed through the Sck CLIP methodology.

#### 4 Conclusions

In this paper, a PXI implementation of a TLK2711 equivalent system has been presented. After introducing the problem, we presented the key advantages of the approach in terms of cost, signal integrity and system integrability. We then described the IP core, with special attention to the main requirements that must be taken into account when designing a fully Wizard Link compatible system. Finally, we described in detail the design flow needed to integrate the IP core within the PXI platform and reported the resource utilisation. The results highlight once more the low number of resources needed to implement such a system, which leaves plenty of space on the FPGA to build specific EGSE functions.

#### References

- 1. Evans, B.G., Thompson, P.T., Corazza, G.E., Vanelli-Coralli, A., Candreva, E.A.: 1945–2010: 65 years of satellite history from early visions to latest missions. In: Proceedings of the IEEE, vol. 99, no. 11 (2011)
- 2. European Cooperation for Space Standardization: Standard ECSS-E-ST-50-12C, SpaceWire, Links, Nodes, Routers and Networks, Issue 1, European Cooperation for Space Data Standardization (formerly ECSS-E-50-12A February 2003) (2003)
- 3. Space Engineering: SpaceFibre Very High-Speed Serial Link, ECSS-E-ST-50-11C-DIR1. http://ecss.nl/standard/ecss-e-st-50-11c-dir1/ (2018). Accessed July 2018
- 4. Texas Instruments: TLK2711-SP 1.6-Gbps to 2.5-Gbps Class V Transceiver, TLK2711-SP datasheet (2006). Revised Feb 2018
- 5. Earth Observation Portal. https://directory.eoportal.org/web/eoportal/satellite-missions (2018). Accessed July 2018
- 6. PXISA: PXI-1 Hardware Specifications, Rev. 2.2, PXISA, Niwot (USA) (2004)
- 7. National Instruments: What is LabVIEW. ni.com. http://www.ni.com/en-us/shop/labview.html (2018). Accessed July 2018
- 8. National Instruments: PXIe-6591 PXI High-Speed Serial Instrument. ni.com. http://www.ni.com/en-us/support/model.pxie-6591.html (2018). Accessed July 2018
- 9. National Instruments: Importing External IP Into LabVIEW FPGA. ni.com. http://www.ni.com/tutorial/7444/en/ (2018). Accessed July 2018

# An FPGA-Based Real-Time Acquisition System for a Distributed Acoustic Sensor Based on Φ-OTDR



Francesco Martina, Yonas Muanenda, Stefano Faralli and Fabrizio Di Pasquale

**Abstract** We report on the design and implementation of a cost-effective FPGA-based acquisition system for real-time detection of vibrations in a Distributed Acoustic Sensor (DAS), demonstrating a highly scalable architecture with features suitable for industrial applications.

#### 1 Introduction

Distributed dynamic fiber optic sensing enables continuous measurements of environmental parameters, such as dynamic strain or vibration, over long distances [1]. This capability is attracting considerable interest from many industries for monitoring large plants and infrastructures such as oil and gas pipelines, railways, roads, bridges and surveillance systems [2]. Recently, enhanced Distributed Acoustic Sensing (DAS) systems have been widely investigated. Market surveys show that the technology is growing rapidly, at a rate of 20% per year, and that its revenue is projected to exceed US\$ 2 billion by 2025 [3]. Among others, real-time measurement techniques using Phase-Sensitive Optical Time Domain Reflectometry ($\Phi$-OTDR) have been a major focus of research [4]. $\Phi$-OTDR is based on observing, in the time domain, the coherent Rayleigh backscattering of light pulses sent into a sensing fiber. Consecutive traces are acquired, and the evolution of the amplitude or phase of the received signal at each spatial location is used to measure perturbations that change the refractive index and the optical path length at the point of impact. However, despite the high volume of raw data involved in such measurements, little attention has been paid to implementing them with cost-effective acquisition systems.

On the other hand, FPGA-based data acquisition and processing systems have been proposed for many data-intensive applications, such as multipoint shockwave

F. Martina · Y. Muanenda (🖂) · S. Faralli · F. Di Pasquale

Institute of Communication, Information and Perception Technologies (TeCIP), Scuola Superiore Sant'Anna, Via Moruzzi 1, 56124 Pisa, Italy

e-mail: y.muanenda@santannapisa.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), Applications in Electronics

*Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_49

measurements and fiber-optic based gyroscopes [5, 6]. A recent FPGA-based solution for distributed sensing has been proposed for Brillouin-based sensors [7], but it does not involve a real-time data acquisition architecture. In this paper we address this issue, reporting on the end-to-end design and implementation of a real-time acquisition system for a DAS based on $\Phi$-OTDR. The system covers backscattered signal acquisition along the fiber, its amplification, analog-to-digital conversion, signal processing on a Xilinx SoC device, data transmission over a network, and display of the results in a user interface.

#### 2 Main System Specifications

A complete DAS system includes hardware and software architectures, each with its own specifications. At the initial design stage, we distinguished the requirements relevant to the final user from the low-level specifications. The former include the sensing range, spatial resolution and measurement accuracy, while the latter concern the sampling frequency and the required signal processing. Furthermore, we targeted a product with moderate performance, in order to address the typical acquisition issues effectively while keeping development and debugging times short.

The proposed solution can easily be scaled to higher performance while keeping the same architecture. The sensor is designed to detect acoustic vibrations along 1.5 km of standard single-mode fiber, with a spatial resolution of 20 m. The chosen acoustic band, extending from 50 Hz to 1.0 kHz, is suitable for the sensing applications of interest to us. In addition, the sensor is designed to send the data to the user through an Ethernet interface, so that the system can be integrated into a network environment. Additional details on the system specifications are reported in [8].
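A back-of-the-envelope check of how these specifications constrain the pulse parameters is sketched below. It uses the usual OTDR relations, a round-trip time of $2nL/c$ and a spatial resolution of $c\tau/(2n)$, with an assumed group index $n = 1.468$ for standard single-mode fiber. This index value and the helper names are assumptions, not values taken from the paper.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum
GROUP_INDEX = 1.468        # assumed group index of standard single-mode fiber

def round_trip_time_s(fiber_length_m: float, n: float = GROUP_INDEX) -> float:
    """Time for a pulse to reach the fiber end and its backscatter to return."""
    return 2.0 * n * fiber_length_m / C_M_PER_S

def max_pulse_rate_hz(fiber_length_m: float) -> float:
    """Only one pulse may travel in the fiber at a time, so traces do not overlap."""
    return 1.0 / round_trip_time_s(fiber_length_m)

def pulse_width_s(spatial_resolution_m: float, n: float = GROUP_INDEX) -> float:
    """Pulse duration giving the requested spatial resolution: tau = 2 n dz / c."""
    return 2.0 * n * spatial_resolution_m / C_M_PER_S

length_m, resolution_m, f_max_acoustic_hz = 1500.0, 20.0, 1000.0
rate = max_pulse_rate_hz(length_m)
print(f"max pulse rate ~ {rate / 1e3:.1f} kHz, "
      f"pulse width ~ {pulse_width_s(resolution_m) * 1e9:.0f} ns")
# Each fiber position is sampled once per pulse, so the pulse rate must exceed
# twice the highest acoustic frequency (Nyquist) to cover the 1 kHz band.
print("Nyquist satisfied:", rate > 2 * f_max_acoustic_hz)
```

With these assumptions, 1.5 km of fiber allows pulse rates up to roughly 68 kHz, well above the 2 kHz needed for a 1 kHz band. A 20 m resolution corresponds to a pulse of roughly 196 ns.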

#### 3 Optical System and Electronic Front-End

Figure 1 shows the optical system architecture, which is a typical  $\Phi$ -OTDR scheme using direct detection, together with the receiver electronics. Pump light pulses are generated using a coherent laser with a linewidth of 50 kHz, which is modulated by an acousto-optic modulator (AOM). The Erbium-doped fiber amplifiers (EDFA) and the optical band-pass filters (BPF) are used to boost the signal before and after the pulse modulation. The amplified coherent pulses are then sent into the optical fiber via an optical circulator. Subsequently, the back-scattered light is detected with a PIN photodiode followed by a trans-impedance amplifier (TIA), which is connected to the electronic front-end through a standard coaxial cable.

As shown in Fig. 1 (bottom left), the analog electronic front-end consists of a wideband amplifier and an analog-to-digital converter (ADC). The former was specifically designed for this prototype and amplifies the signal by 20 dB



Fig. 1 Scheme of the optical system architecture and the acquisition electronics, including custom-made modules: an ADC board adapter (red) and the wideband amplifier (black)

over the full photo-receiver bandwidth (125 MHz). The electrical signal is digitized by a 100 MS/s ADC with a resolution of 8 bits. An additional card was also designed to adapt the digital bus to the FPGA board inputs and to acquire the trigger. The trigger is generated externally by the source of the pulses that modulate the light source.
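Each ADC sample corresponds to a round-trip delay, so the sampling rate fixes the spacing of the raw samples along the fiber. The sketch below assumes a group index $n = 1.468$ (not a value from the paper) and places sample 0 at the trigger, ignoring any trigger latency. Under these assumptions, the 100 MS/s converter yields one sample roughly every metre. A 1.5 km trace then holds on the order of 1,470 samples, and a 20 m resolution cell spans about 20 samples. The helper names are hypothetical.

```python
import math

C_M_PER_S = 299_792_458.0
GROUP_INDEX = 1.468  # assumed group index of standard single-mode fiber

def sample_spacing_m(sample_rate_hz: float, n: float = GROUP_INDEX) -> float:
    """Fiber length covered by one ADC sample: dz = c / (2 n fs)."""
    return C_M_PER_S / (2.0 * n * sample_rate_hz)

def sample_to_position_m(index: int, sample_rate_hz: float) -> float:
    """Distance from the input end for a sample index (sample 0 assumed at the trigger)."""
    return index * sample_spacing_m(sample_rate_hz)

def samples_per_trace(fiber_length_m: float, sample_rate_hz: float) -> int:
    """Number of samples needed to cover the whole fiber length in one trace."""
    return math.ceil(fiber_length_m / sample_spacing_m(sample_rate_hz))

fs = 100e6
print(f"{sample_spacing_m(fs):.3f} m per sample, "
      f"{samples_per_trace(1500.0, fs)} samples per trace")
```

This fixed sample-to-position mapping is what lets each sample index be treated as a fiber location in the processing that follows.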

#### 4 Digital Signal Processing

The operations needed to convert the raw backscattered traces into vibration data are entirely implemented on a Xilinx device. This is a System on Chip (SoC) that integrates both a processor and FPGA fabric in the same package. The programmable logic performs high-throughput processing, while the dual-core processor reorganizes the results and transmits them over the network.

In this prototype, we tested a simple and efficient method to detect vibrations along the fiber. We developed custom logic modules that acquire the input data stream of backscattering traces repeatedly and synchronously with both the trigger and the pump light pulses. In this way, a fixed correspondence is guaranteed between the acquired samples and the positions along the sensing fiber. Therefore, by studying the intensity variations of each sample over repeated acquisitions, we obtain information on the local vibrations along the sensing distance. To implement this procedure in the device, a multi-channel finite impulse response (FIR) filter has been included in the logic, with the frequency specifications reported in Sect. 2. The device also implements averaging functions to reduce the out-of-band noise. The designed architecture guarantees that each filter channel corresponds to the backscattered signal from a specific fiber position. Figure 2 shows the digital signal processing architecture implemented in the SoC device. The scalability of the proposed scheme follows from the fact that enhancing the capacity in terms of



Fig. 2 Signal processing architecture: FPGA logic in the upper part, processor unit in the lower one

sensing distance, number of measurement points, etc., can be achieved by using a higher-performance FPGA module while maintaining a similar overall architecture.
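The per-position processing described above can be prototyped in software before it is committed to logic. The NumPy sketch below arranges the traces as a matrix in which rows are successive pulses and columns are fiber positions, and treats each column as one filter channel. For each channel it removes the static backscatter level and applies a band-pass FIR for the 50 Hz to 1 kHz band of Sect. 2. It then reports the RMS of the filtered signal as the vibration magnitude at each position. The windowed-sinc design, the 401-tap length, the RMS metric and the 5 kHz pulse rate of the demo are all assumptions for illustration. They are not the coefficients or the averaging scheme implemented on the SoC.

```python
import numpy as np

def bandpass_taps(f_lo_hz: float, f_hi_hz: float, fs_hz: float,
                  n_taps: int = 401) -> np.ndarray:
    """Windowed-sinc band-pass FIR (Hamming window): low-pass(f_hi) minus low-pass(f_lo)."""
    n = np.arange(n_taps) - (n_taps - 1) / 2.0
    ideal = ((2.0 * f_hi_hz / fs_hz) * np.sinc(2.0 * f_hi_hz * n / fs_hz)
             - (2.0 * f_lo_hz / fs_hz) * np.sinc(2.0 * f_lo_hz * n / fs_hz))
    return ideal * np.hamming(n_taps)

def vibration_profile(traces: np.ndarray, pulse_rate_hz: float,
                      f_lo_hz: float = 50.0, f_hi_hz: float = 1000.0,
                      n_taps: int = 401) -> np.ndarray:
    """Return one vibration magnitude per fiber position.

    traces has shape (n_pulses, n_positions): each column is one filter channel,
    sampled once per pulse, i.e. at the pulse repetition rate.
    """
    # Remove the static (time-invariant) backscatter level of each position
    x = traces - traces.mean(axis=0, keepdims=True)
    taps = bandpass_taps(f_lo_hz, f_hi_hz, pulse_rate_hz, n_taps)
    # Filter every position independently along the pulse (slow-time) axis
    filtered = np.apply_along_axis(lambda col: np.convolve(col, taps, mode="valid"), 0, x)
    # RMS of the in-band signal as a simple magnitude estimate
    return np.sqrt(np.mean(filtered ** 2, axis=0))

# Synthetic demo (hypothetical numbers): 5 kHz pulse rate, 100 positions,
# a 500 Hz vibration applied only at position 60.
pulse_rate_hz = 5000.0
t = np.arange(3000) / pulse_rate_hz
traces = np.tile(np.linspace(0.2, 1.0, 100), (t.size, 1))
traces[:, 60] += 0.5 * np.sin(2 * np.pi * 500.0 * t)
profile = vibration_profile(traces, pulse_rate_hz)
print(int(np.argmax(profile)))
```

In the synthetic demo, the profile peaks at position 60, mirroring the single sharp peak expected for a localized actuator. On the FPGA, the same structure becomes one FIR channel per position, fed by samples aligned to the trigger.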

The proposed acquisition system is also cost-effective, since some of the electronic components shown in Fig. 1 are custom-made boards. The combined cost of these parts and of the FPGA device is significantly lower than that of schemes using commercially available acquisition systems.

#### 5 Experimental Results

To verify the effectiveness of the proposed real-time acquisition system, a preliminary assessment of its capability was performed using an attenuated tap of the pulses sent into the fiber, monitored by a photodiode. Consecutive pulses were clearly visible in the user interface, including changes in pulse width and repetition rate. This shows that a simplified version of the system (without filtering) provides the basic functionality of an oscilloscope. Several tests were performed on the acquisition of raw $\Phi$-OTDR traces, an example of which is shown in Fig. 3.

Subsequently, we enabled the digital filter processing to detect vibrations. The results are displayed in Figs. 4 and 5, where the x-axis represents the fiber length in meters and the y-axis the magnitude of the variation in arbitrary units. In the first test, a vibration signal of about 500 Hz was applied to the fiber under test by a piezoelectric actuator (PZT) placed 1 km from the input end of the fiber. Fig. 4 shows the corresponding sharp peak at the expected position.

In the second test, shown in Fig. 5, we lightly perturbed the fiber spool with a finger. A periodic trend in the vibration magnitude can be observed. This is because fiber positions lying along the same radial direction of the spool are excited more than others when the external stress is applied.



Fig. 3 Example of acquired raw back-scattering trace (signal intensity in volt versus fiber length in meters)



Fig. 4 Detection of a vibration generated by a piezoelectric actuator



Fig. 5 Detection of a mechanical perturbation applied on the fiber spool

# 6 Conclusions

In summary, we have developed a real-time, cost-effective and scalable acquisition system for coherent $\Phi$-OTDR-based DAS, which is able to detect mechanical vibrations along a single-mode optical fiber. In particular, we have implemented the signal acquisition chain: low-noise amplification, analog-to-digital conversion and signal processing on a Xilinx SoC device. We also designed the data transmission and the OpenGL real-time rendering of the results. The system was used to detect vibrations and measured the relative perturbation positions within the expected precision. The proposed acquisition and processing scheme is highly cost-effective, easily scalable to higher-performance FPGAs and suitable for the industrial development of distributed acoustic sensing systems.

#### References

- 1. Masoudi, A., Belal, M., Newson, T.: A distributed optical fibre dynamic strain sensor based on phase-OTDR. Meas. Sci. Technol. **24**, 085204 (2013)
- 2. Baronti, F., Lazzeri, A., Roncella, R., Saletti, R., Signorini, A., Soto, M., Bolognini, G., Di Pasquale, F.: SNR enhancement of Raman-based long-range distributed temperature sensors using cyclic simplex codes. Electron. Lett. 46, 1221–1223 (2010)
- 3. Persistence Market Research: Global distributed acoustic sensing market to surpass US\$ 2 billion in revenues by 2025. Pers. Mark. Res. https://www.persistencemarketresearch.com/mediarelease/distributed-acoustic-sensing-market.asp (2017)
- 4. Muanenda, Y., Oton, C.J., Faralli, S., Di Pasquale, F.: A cost-effective distributed acoustic sensor using a commercial off-the-shelf DFB laser and direct detection phase-OTDR. IEEE Photonics J. 8, 1–10 (2016)
- 5. Abbas, S.H., Lee, J.R., Jang, J.K., Kim, Z.: FPGA-based multipoint shock wave measurement system using LDVs for aerospace applications. In: Proceedings of IEEE Aerospace Conference, pp. 1–6 (2016)
- 6. Peesapati, R., Kumar, K.S., Sabat, S.L.: FPGA based fiber optic gyroscope signal denoising using discrete wavelet transform. ARPN J. of Eng. App. Sci. 7, 1480–1489 (2012)
- 7. Abbasnejad, M., Alizadeh, B.: FPGA-based implementation of a novel method for estimating the Brillouin frequency shift in BOTDA and BOTDR sensors. IEEE Sensors J. 18, 2015–2022 (2018)
- 8. Martina, F.: Sistema di acquisizione in tempo reale per riflettometria opto-acustica distribuita: UNIPI (2018). https://etd.adm.unipi.it, https://www.dropbox.com/s/qsts7pwzrd14dvj/thesis.pdf?dl=0

# Sound-Based Detection and Ranging System as Example Application of a Rapid Prototyping and Low-Cost Technology for Board-Level Electronic Systems Education



#### Stefano Di Pascoli, Gabriele Ciarpi and Sergio Saponara

**Abstract** The work presents a low-cost, rapid prototyping technology for printed circuit board (PCB) design. The PCB design flow and the prototyping technology, tailored for educational applications in university courses on board-level electronic systems, are presented. Several educational electronic systems designed by bachelor-level student teams and fabricated with the proposed technology, e.g. a sound-based detection and ranging system, are presented. The achieved results show the feasibility of the proposed technique as a rapid prototyping tool and its suitability for engineering education.

S. Di Pascoli (🖂) · G. Ciarpi · S. Saponara

Dipartimento di Ingegneria dell'Informazione, Università di Pisa, Via G. Caruso 16, 56122 Pisa, Italy

e-mail: Stefano.di.pascoli@unipi.it

G. Ciarpi e-mail: gabriele.ciarpi@unipi.it

S. Saponara e-mail: sergio.saponara@unipi.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_50

## 1 Introduction

One of the main issues in electronics education is transferring to students the know-how in the design, prototyping and testing of electronic systems implemented at board level with COTS (Commercial Off-the-Shelf) components. Indeed, this area of electronics is still full of heuristic rules and usually requires years of expertise to manage complex issues related to the electrical, electromagnetic, mechanical and thermal aspects of components [1]. Moreover, the design of a printed-circuit-board (PCB) electronic system typically involves both analog and digital domains, both power-supply management issues and low-power signal ones, and all the non-ideal effects due to saturation, cross-talk, interference coupling and parasitic components. Multiple modeling approaches have to be used, e.g. both lumped (concentrated) and distributed models, since signal frequencies keep increasing due to the success of wireless systems and of high-speed digital ones. Finally, both CAD and prototyping technology concepts have to be transferred to the students, covering schematic design, component selection, functional, electrical and thermal simulation, proper thermal and reliability sizing, and place-and-route design. The laboratory part is also essential for board prototyping, assembly and testing. Given the educational aim, and to allow the experiments to be replicated at home, a free CAD flow and a simple technology have to be set up.

Several other student lab activities are used in tertiary-level electronics courses [2–6]. However, actual system fabrication (PCB design and fabrication, component selection and procurement, and final assembly and testing) is often overlooked: many lab activities concentrate on algorithm development [3, 4], or employ a commonly available or specially built platform [2, 6].

To overcome these issues, Sects. 2 and 3 of this work describe the course and present a low-cost CAD flow and rapid prototyping technology for PCB design, tailored for educational applications in university courses. A sample circuit designed by a bachelor-level student team and fabricated with the proposed technology, a sound-based detection and ranging system, is presented in Sect. 4. Section 5 analyzes the achieved results, which show the feasibility of the proposed technique as a rapid prototyping tool and its suitability for engineering education, and draws the conclusions.

#### 2 Board-Level Electronic Systems Education

The Board Level Electronics (BLE) course (Italian: Costruzioni Elettroniche, CE) is worth 6 ECTS credits (60 h of classroom lessons and supervised laboratory activities) and focuses on the applied and industrial aspects of electronic system design. Enrolled students are required to have a prerequisite background in signal theory, electronic device theory, circuit theory and basic electronics. The course material can be divided into three main blocks:

- 1. "Circuit board theory": signal integrity, board parasitic effects and estimation, thermal aspects, reliability theory and estimation, basic rules for board level design. (50% of total time)
- 2. CAD tools for board development (15%)
- 3. Supervised lab: The students are encouraged to divide into groups (1–4 people each) and to propose the development of an electronic system. (35%)
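A quick worked breakdown, assuming the percentages above are shares of the 60 h total stated for the course:

$$
\begin{aligned}
\text{Circuit board theory:}\quad & 0.50 \times 60\ \text{h} = 30\ \text{h}\\
\text{CAD tools:}\quad & 0.15 \times 60\ \text{h} = 9\ \text{h}\\
\text{Supervised lab:}\quad & 0.35 \times 60\ \text{h} = 21\ \text{h}
\end{aligned}
$$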

The main purpose of the course is to introduce students, for the first time, to the practical aspects of circuit fabrication. It is the first occasion for them to realize that circuit elements have a shape and a mass, that some solutions are easy and others are not, and that some are cheap and others are costly. In addition, the course aims to develop the students' teamwork skills, so as to prepare them for the industrial or research world, where teamwork is the key to success. They are required to work with a high degree of autonomy, finding and integrating information about the electrical, thermal, mechanical and electromagnetic aspects of components and their packages, and to develop a problem-solving attitude.

#### 3 Rapid Prototyping and Low-Cost Technology

Each proposed system is then developed using only free and/or open-source software and low-cost, easily available hardware, materials and tools.

Although it is not a formal objective of the course, participating students are encouraged to verify the functionality of the proposed system with a circuit simulator. The most popular free circuit simulators are *Cadence OrCAD PSpice* (lite edition) and *Linear LTspice*. The circuit simulation phase is of paramount importance for analog systems (or analog subsystems/front ends in mixed-signal systems, such as the circuit described later in this paper).

The remaining development phases are carried out with the *Kicad EDA* suite. Kicad is reasonably complete and powerful, and has a large library of components and footprints. Only a few components and footprints, especially for electromechanical components (switches, pushbuttons, connectors, ...), cannot be found in the Kicad libraries and have to be designed from scratch.

The participants use *Kicad Eeschema* to design the schematic, select or design the component footprints, perform annotation and the electrical rule check, and generate the netlist. The PCB place-and-route phase is performed manually with *Kicad Pcbnew*.

The chosen technology is based on Surface Mount Devices (SMD) with a single top copper layer for interconnections. Electromechanical components (connectors, switches, etc.) are often mounted on the bottom face of the PCB and soldered on the top layer, since this configuration provides much more mechanical strength and reliability than SMD connectors.

The students are instructed to use a minimum track width of 0.4 mm and a 0.3 mm clearance. These layout rules allow the use of components with a pin pitch of 0.8 mm, such as microcontrollers in TQFP packages. Two-terminal components (resistors, capacitors, diodes, LEDs, ...) can use the 0805 SMD package.
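The consequences of these layout rules for fine-pitch packages can be checked with a little arithmetic. The sketch below works in integer micrometres to avoid floating-point rounding; the helper names are hypothetical, and the check covers only pad-to-pad spacing and routing a track between adjacent pads, not a full design-rule check such as the one Kicad Pcbnew performs.

```python
TRACK_MIN_UM = 400      # minimum track width: 0.4 mm
CLEARANCE_MIN_UM = 300  # minimum clearance: 0.3 mm

def max_pad_width_um(pitch_um, clearance_um=CLEARANCE_MIN_UM):
    """Widest pad that still leaves the required clearance to its neighbour."""
    return pitch_um - clearance_um

def track_fits_between_pads(pitch_um, pad_width_um,
                            track_um=TRACK_MIN_UM, clearance_um=CLEARANCE_MIN_UM):
    """True if one minimum-width track can pass between two adjacent pads
    while keeping the clearance on both sides."""
    gap_um = pitch_um - pad_width_um
    return gap_um >= track_um + 2 * clearance_um

# 0.8 mm pitch, e.g. a microcontroller in a TQFP package
print(max_pad_width_um(800))            # 500 -> pads at most 0.5 mm wide
print(track_fits_between_pads(800, 0))  # False: 0.8 mm < 0.3 + 0.4 + 0.3 mm
```

Under these assumptions, pads on a 0.8 mm pitch can be at most 0.5 mm wide, and no 0.4 mm track can pass between two adjacent pins even in the limit of zero pad width, so on a single copper layer the tracks have to leave such packages from the ends of the pads.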

The final PCB layout developed with the Kicad Pcbnew software is exported to the SVG file format. The tailored procedure used to move from an abstract SVG file to a real physical board is described in the next 6 stages:

- 1. The SVG file is printed with a common office laser printer (Epson AL-M200) on special dextrin-coated *toner transfer paper*.
- 2. The paper print is placed face down on a 0.8 mm FR4 substrate coated with a 35 $\mu$m copper layer, and the sandwich is heated with a common office laminator. The heat glues the toner artwork to the copper surface.

- 3. After 5 min of immersion in water, the dextrin coating releases the paper from the other side of the toner artwork, and the paper can be removed. The toner layout now on the copper cannot withstand the copper etching solution and has to be reinforced. This is obtained by re-laminating the board with a polymer-coated polyester foil (Toner Reactive Foil, TRF): a new layer of high-opacity toner sticks to the original toner layer, increasing its chemical robustness.
- 4. After the mask has been created, the uncovered copper is removed by chemical etching. We use a mixture of tap water (500 ml), 38% HCl in water (150 ml) and 30% hydrogen peroxide (50 ml). The etching time is about 10 min. The finished PCB is cleaned with acetone. Since the components are usually soldered immediately, no copper finishing stage is used.
- 5. The holes for non-SMD components are drilled manually with a vertical mount drill; the minimum hole diameter is 0.8 mm. The maximum board size that can be fabricated with the proposed technology is 210 mm $\times$ 290 mm; this limitation arises from the size of the dextrin paper sheets used for printing the artwork and from the maximum width of the office laminator.
- 6. The components are soldered manually with the help of a low-magnification microscope.

# 4 Design Examples: Sound-Based Detection and Ranging System

Group 12 proposed a "sound-based detection and ranging system": a system that detects the distance and direction (in two dimensions) of a loud sound. The system is composed of three omnidirectional microphones, three analog front ends (each based on an amplifier stage and a filter, followed by a comparator) and a microcontroller (Atmega32) driving an LCD display, see Fig. 1.

The three front ends employ three OpAmps for a gain stage (30 dB) and a second-order Butterworth band-pass filter (20 Hz–20 kHz). The amplified, band-limited signal is then fed to a comparator with a user-trimmable threshold. Finally, the three digital signals reach the interrupt inputs of the microcontroller, which implements the range and direction finding algorithm (written in C and assembly language) and drives the LCD display.
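The range and direction finding algorithm itself is not described here. As a purely illustrative sketch (in Python rather than the C/assembly used on the Atmega32), the code below assumes that the firmware timestamps the three comparator interrupts and solves a closed-form time-difference-of-arrival (TDOA) problem. Taking microphone 0 as reference, each range difference r_i - r_0 = c·(t_i - t_0) yields an equation that is linear in the source position, and the constraint |x| = r_0 turns the problem into a quadratic in r_0. The microphone coordinates, the speed of sound (343 m/s) and the function name `locate_source` are assumptions.

```python
import math
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed (dry air at about 20 degrees C)

def locate_source(mics, arrival_times, c=SPEED_OF_SOUND):
    """Closed-form 2-D TDOA localization with exactly three microphones.

    mics: 3x2 microphone coordinates [m]; mics[0] is the reference.
    arrival_times: three arrival times [s]; only their differences matter.
    Returns the candidate source positions (one per positive range root).
    """
    mics = np.asarray(mics, dtype=float)
    t = np.asarray(arrival_times, dtype=float)
    p = mics[1:] - mics[0]                 # microphones 1 and 2 relative to reference
    d = c * (t[1:] - t[0])                 # range differences r_i - r_0 [m]
    b = 0.5 * (np.sum(p * p, axis=1) - d * d)
    u = np.linalg.solve(p, b)              # source offset x = u - r0 * v
    v = np.linalg.solve(p, d)
    # Enforce |x| = r0: (|v|^2 - 1) r0^2 - 2 (u.v) r0 + |u|^2 = 0
    qa, qb, qc = v @ v - 1.0, -2.0 * (u @ v), u @ u
    if abs(qa) < 1e-12:                    # degenerate case: equation is linear
        roots = [-qc / qb] if qb != 0 else []
    else:
        disc = qb * qb - 4.0 * qa * qc
        if disc < 0:
            return []
        sq = math.sqrt(disc)
        roots = [(-qb + sq) / (2 * qa), (-qb - sq) / (2 * qa)]
    return [mics[0] + (u - r0 * v) for r0 in roots if r0 > 0]

# Hypothetical example: microphones at three corners of a 1 m square, source at (3, 4) m
mics = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
source = np.array([3.0, 4.0])
t_emit = 0.0123                            # unknown emission time; it cancels out
times = [t_emit + np.linalg.norm(source - np.array(m)) / SPEED_OF_SOUND for m in mics]

for est in locate_source(mics, times):
    dx, dy = est - np.array(mics[0])
    print(f"distance {math.hypot(dx, dy):.2f} m, "
          f"direction {math.degrees(math.atan2(dy, dx)):.1f} deg")
```

With this layout the quadratic has one positive and one negative root, so a single candidate is returned; for other geometries two positive roots can occur, and the ambiguity would then have to be resolved by other means.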

The circuit is powered at 5 V by a linear regulator fed from a 9 V battery.
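With a 9 V battery feeding a 5 V linear regulator, the regulator dissipates roughly (V_in - V_out)·I_load, plus V_in times its quiescent current, and its junction temperature settles at T_j = T_a + θ_JA·P. A minimal sketch of this estimate follows; the 50 mA load current and the 65 °C/W junction-to-ambient thermal resistance are hypothetical values, not figures from the students' report.

```python
def linear_regulator_dissipation(v_in, v_out, i_load, i_quiescent=0.0):
    """Power dissipated in a linear regulator [W].

    The pass element drops (v_in - v_out) at the load current, and the
    quiescent current flows from v_in to ground.
    """
    return (v_in - v_out) * i_load + v_in * i_quiescent

def junction_temperature(t_ambient, theta_ja, power):
    """Steady-state junction temperature [deg C] from the junction-to-ambient
    thermal resistance theta_ja [deg C/W]."""
    return t_ambient + theta_ja * power

# Hypothetical numbers: 9 V battery, 5 V rail, 50 mA load, no heat sink
p = linear_regulator_dissipation(9.0, 5.0, 0.050)
tj = junction_temperature(25.0, 65.0, p)   # 65 deg C/W is an assumed package value
print(f"P = {p:.2f} W, Tj = {tj:.1f} deg C")
```

The same two functions make it easy to see how quickly the dissipation, and hence the heat-sinking requirement, grows with the load current and the input-output voltage drop.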

The students also provided an activity record describing the development of the system. Besides the description of the circuits, this report contains a discussion of some power, thermal and reliability aspects of the system:

- A power consumption estimate of the circuit is provided, together with a thermal discussion of the most critical component (the voltage regulator).
- Furthermore, the designers verified that their assumption of lumped (concentrated) elements, used during the design of the system, was reasonable at the frequencies in use; they also assessed the effect of parasitic board effects (interconnection resistance, inductance and capacitance) and of thermal noise sources.
- Finally, a reliability analysis of the system (Fig. 2) was performed, based on the reliability data available for the microphone, the processor, the comparator and the OpAmp.



Fig. 1 Block diagram of the sound-based detection and ranging system



Fig. 2 Main board (left) and the three microphone boards (right) of the sound-based detection and ranging system
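Reliability estimates of this kind are commonly based on a constant-failure-rate (exponential) model of a series system, in which any component failure causes a system failure, so that failure rates add: λ_sys = Σ n_i·λ_i, MTBF = 1/λ_sys and R(t) = exp(-λ_sys·t). The sketch below assumes that model; the component quantities and FIT values (failures per 10^9 h) are hypothetical placeholders, not the data used by the group.

```python
import math

def series_failure_rate_fit(parts):
    """Series system under a constant-failure-rate model: failure rates add.

    parts: iterable of (quantity, failure rate in FIT = failures per 1e9 h).
    """
    return sum(qty * fit for qty, fit in parts)

def mtbf_hours(lambda_fit):
    """Mean time between failures [h] for a constant failure rate."""
    return 1e9 / lambda_fit

def reliability(lambda_fit, hours):
    """Probability of surviving `hours` of operation: R(t) = exp(-lambda * t)."""
    return math.exp(-lambda_fit * 1e-9 * hours)

# Hypothetical quantities and FIT values, for illustration only
parts = {
    "microphone": (3, 50.0),
    "microcontroller": (1, 100.0),
    "comparator": (3, 10.0),
    "op-amp": (3, 10.0),
}
lam = series_failure_rate_fit(parts.values())
print(f"lambda = {lam:.0f} FIT, MTBF = {mtbf_hours(lam):.3g} h, "
      f"R(1 year) = {reliability(lam, 8760):.4f}")
```

Keeping the parts list as a dictionary makes it simple to see which component dominates the total failure rate and would therefore be the first candidate for derating or replacement.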
| Group ID | Description                                      | Status after 6 weeks |
|----------|--------------------------------------------------|----------------------|
| 7        | Class D audio amplifier with spectral visualizer | Working              |
| 11       | Class AB audio amplifier                         | Working              |
| 2        | Psychedelic LED lights                           | Partially working    |
| 8        | Rotating LED display                             | Working              |
| 9        | Greenhouse automator                             | Partially working    |
| 6        | Drink dispenser                                  | Working              |
| 4        | Two wheel robot                                  | Testing              |
| 12       | Sound-based detection and ranging system         | Testing              |
| 16       | Signal generator                                 | PCB ready            |
| 1        | FM sonar rangefinder                             | PCB ready            |
| 3        | Robotic car with wrist remote                    | PCB ready            |
| 10       | Theremin                                         | PCB development      |
| 13       | Magnetic gun                                     | PCB development      |
| 15       | Amplified loudspeaker                            | PCB development      |

Table 1 Group progress after about 6 weeks from the start of stage 3

#### 5 Course Statistics and Conclusions

Apart from the specific example described in Sect. 4, this section presents the course and project lab statistics. The projects are proposed and selected by the students, to stimulate their attitude to innovation, but with guidance and suggestions from the teachers. Project development can be divided into three stages:

- (1) Initial development: starting idea, analysis of the theory of operation, block diagram, gathering of information;
- (2) Schematic development: selection of components and schematic design with *Kicad Eeschema*;
- (3) PCB layout development with *Kicad Pcbnew*, PCB fabrication, assembly and testing, using the technology described in Sect. 3. For projects using a microcontroller, this stage also includes firmware development.

Stages 1 and 2 were conducted in the classroom under supervision. Component selection was guided by a "suggested components list" (including passive components, common discrete active devices, a few optical devices, a microprocessor, operational amplifiers, the 555 timer, some connectors and other commonly used devices). Other components could be purchased, subject to a cost constraint.

Stage 3 was performed partly as homework (Kicad Pcbnew design) and partly under supervision (the fabrication process). After the three stages are completed, technical documentation in English has to be produced, describing the three stages above at system, circuit and technology level, and including an analysis of signal integrity, parasitics, thermal issues and proper cooling/heat-sink strategies, reliability, and FMEA (failure mode and effects analysis). The final exam is based on the analysis of the prototypes plus an oral session, in which 50% of the questions concern the specific project implemented and 50% concern theoretical aspects of the CAD flow, design methodology or technology. Table 1 shows the progress after about 6 weeks from the start of stage 3.

Acknowledgements We thank the members of group 12, Marco Bellamio, Paolo Bonifati, Francesco Lombardi and Luca Lucarelli, who developed the described system.

#### References

- 1. Awawdeh, M., Faisal, T., Fadhel, F., Al Hamadi, A.: Improving electronics engineering students' skills by projects' college competition. In: 2018 Advances in Science and Engineering Technology International Conferences (ASET), pp. 1–5. Dubai, Sharjah, Abu Dhabi, United Arab Emirates. https://doi.org/10.1109/icaset.2018.8376936 (2018)
- 2. Cunha, B.G.P., et al.: DidacTronic: a low-cost and portable didactic lab for electronics: Kit for digital and analog electronic circuits. In: 2016 IEEE Global Humanitarian Technology Conference (GHTC), pp. 296–303. Seattle, WA. https://doi.org/10.1109/ghtc.2016.7857296 (2016)
- 3. Esposito, W.J., Mujica, F.A., Garcia, D.G., Kovacs, G.T.A.: The Lab-In-A-Box project: An Arduino compatible signals and electronics teaching system. In: 2015 IEEE Signal Processing and Signal Processing Education Workshop (SP/SPE), pp. 301–306. Salt Lake City, UT. https://doi.org/10.1109/ewme.2016.7496479 (2015)
- 4. Esposito, W.J., Mujica, F.A., Garcia, D.G., Kovacs, G.T.A.: The Lab-In-A-Box project: An Arduino compatible signals and electronics teaching system. In: 2015 IEEE Signal Processing and Signal Processing Education Workshop (SP/SPE), pp. 301–306. Salt Lake City, UT. https://doi.org/10.1109/dsp-spe.2015.7369570 (2015)
- 5. Nerguizian, V., Mhiri, R., Saad, M., Kane, H., Deschênes, J.S., Saliah-Hassane, H.: Lab@home for analog electronic circuit laboratory. In: 2012 6th IEEE International Conference on E-Learning in Industrial Electronics (ICELIE), pp. 110–115. Montreal, QC. https://doi.org/10.1109/icelie.2012.6471157 (2012)
- 6. Lumpp, J.K., Blackburn, W.C., et al.: Instrumentation and measurement in a first-year engineering program. IEEE Instrum. Meas. Mag. 21, 20–24 (2018). https://www.orcad.com/resources/download-orcad-lite

# **Approximate Memory Support for Linux Early Allocators in ARM Architectures**



Giulia Stazi, Antonio Mastrandrea, Mauro Olivieri and Francesco Menichelli

**Abstract** Approximate computing is a new paradigm for energy-efficient design, based on the idea of designing digital systems that trade off computational accuracy for energy consumption. The paradigm can be applied to different units (e.g. internal units of the CPU, floating-point coprocessors, memories). Considering the memory subsystem, approximate memories are physical memories where circuit-level or architecture-level techniques are implemented in order to reduce energy at the expense of errors occurring in bit cells. Supporting approximate memories at operating system level is required for managing them efficiently and for allowing user-level applications to use them directly, but its implementation is subject to specific requirements and constraints, sometimes architecture dependent. In this paper we describe the introduction of approximate memory support on ARM architectures, which are widely adopted in low-power embedded systems. While Linux support for approximate memory has already been introduced for the main allocators, porting it to ARM architectures required the introduction of specific support in the Linux early allocators, which are a fundamental function of the Linux kernel startup phase, before the main allocators are instantiated.

**Keywords** Approximate memory · Approximate computing · Linux OS · Low power embedded systems

G. Stazi · A. Mastrandrea · M. Olivieri · F. Menichelli (🖂)

Department of Information Engineering, Electronics and Telecommunications (DIET), Sapienza University of Rome, Via Eudossiana, 18, 00184 Roma, Italy e-mail: menichelli@diet.uniroma1.it

G. Stazi e-mail: stazi@diet.uniroma1.it

A. Mastrandrea e-mail: mastrandrea@diet.uniroma1.it

M. Olivieri e-mail: olivieri@diet.uniroma1.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_51


#### 1 Introduction

Approximate memories, as part of the *approximate computing* design paradigm, have been a particularly prolific research topic in recent years. Depending on the technology (e.g. DRAM, SRAM), many architectural ideas and circuit design implementations have been proposed [1–4], demonstrating a large impact in reducing the energy consumption of the memory subsystem. In parallel with their introduction as components of a computing platform, a second problem arises: how to manage them efficiently and make them usable by software applications. While in simpler architectures memories are managed and allocated directly by software, larger and more complex embedded platforms (e.g. embedded systems for networked applications, graphics or multimedia [5, 6]) tend to have an operating system that provides fundamental services such as multiprocessing, virtual memory management, file system and network stack. These systems also require larger physical memory arrays, so they benefit more from the reduction in energy consumption offered by approximate memories.

Since Linux is widely used as an OS in embedded systems, thanks to its flexibility and the availability of its source code, approximate memory support for the main allocators has already been investigated and implemented [7]. However, complete support of approximate memories, especially on embedded platforms such as ARM architectures, also requires modifying the *Linux early allocators*, the fundamental allocators used during the startup of the operating system. The *Linux early allocators* are responsible for allocating, among others, the data structures required by deeply internal kernel functions, such as virtual memory management in the main allocator.

The rest of the paper is organized as follows: Sect. 2 describes the state of approximate memory support before this work; Sect. 3 reports the main contribution of the present work; Sect. 4 provides results, in the form of allocation statistics provided by the kernel, and discusses the characteristics of the implementation.

### 2 Previous Works

In this section we briefly describe our previous work on introducing approximate memory support in the Linux kernel [7]. This extension, which relies on the internal concept of *physical zone*, involved the creation of a new Linux memory zone, called ZONE\_APPROXIMATE, where approximate pages containing non-critical data can be grouped, and the implementation of a custom system call allowing user space applications to dynamically request pages within this zone.

The advancements described in this paper extend approximate memory support to the Linux kernel early allocators, along with boot time vector allocation. These further steps were required for porting approximate memory support to ARM architectures, but they are valid for all architectures that use early boot allocators during the booting process.

#### 3 Linux Early Allocators and Approximate Memory

Linux early boot allocators are used during the boot process to allocate data structures in the initial phase of system startup, before the main allocators are instantiated. For ARM architectures, the initialization of all physical zones, including ZONE\_APPROXIMATE, takes place in the *bootmem\_init* function. This routine determines the limits of all available physical memory (PFN limits) and sets up the early memory management subsystem. After this process, the pages allocated by the boot allocator are freed and the physical zone limits, including those of ZONE\_APPROXIMATE, are determined.

## 3.1 Bootmem Allocator

At start-up, the Linux kernel gains access to all the physical memory available in the system. Before the memory zone allocator is set up and running, it may be necessary to preallocate some initial memory areas for kernel data structures and system-wide use, taking them from the available RAM. To address this requirement, a special allocator, called the *bootmem allocator* or *memblock allocator*, is introduced. The initialization of this early allocator is architecture dependent and takes place in the *setup\_arch* routine.

Once boot memory management is available, it can allocate areas from low memory (memory directly mapped in kernel space) with page granularity. The early allocator is used only at boot time to reserve and allocate pages for internal kernel use. For example, *page tables* are built from this pool of physical memory pages, allowing the MMU to be turned on and the Linux kernel to switch to virtual memory management.

The whole mechanism requires the kernel to be aware of approximate memory in the early boot phases: approximate memory must be visible in order to properly instantiate paging and the main allocators, but it must not be used for kernel data structures, which contain critical data that cannot tolerate any form of corruption. In order to exclude physical memory pages mapped as approximate from bootmem allocation, we modified the allocation algorithm of the memblock interface. A memblock is a structure that stores information about the physical memory regions reserved by the Linux kernel during the early bootstrap period. The core function of this early allocation is *memblock\_virt\_alloc\_internal*, which in turn calls *memblock\_find\_in\_range\_node*. This routine receives as parameters, among others, the requested memory size and the lower and upper bounds of the physical region where the memory block will be allocated. Allocation starts from the lower bound, and each subsequent request is served at the lowest available address above it. The upper bound corresponds to the end of the candidate physical memory range and is set to the value of the global parameter *memblock\_current\_limit*, which in turn is set to the end of the low memory region; this forces early boot allocations to stay within low memory, the only region the kernel can directly access.

In order to support the presence of approximate memory, we modified the algorithm for computing *memblock\_current\_limit*, forcing it to always be below the lower limit of the approximate memory physical region. This ensures that the bootmem allocator obtains free pages only from exact physical memory.
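The effect of clamping the allocation limit can be illustrated with a deliberately simplified model. The Python sketch below is not kernel code: it models memblock as a list of free physical ranges, serves each request bottom-up at the lowest suitable address below a current limit (following the description above), and clamps that limit to the start of the approximate region. The class `ToyMemblock` and its methods are hypothetical; the addresses follow the exact/approximate split configured in Sect. 4.1.

```python
PAGE_SIZE = 4096
MB = 1 << 20

class ToyMemblock:
    """Bottom-up early allocator over one contiguous RAM bank (simplified model)."""

    def __init__(self, ram_start, ram_end, lowmem_end, approx_start=None):
        self.free = [(ram_start, ram_end)]       # free [start, end) ranges
        limit = lowmem_end                       # default: end of low memory
        if approx_start is not None:
            limit = min(limit, approx_start)     # never hand out approximate pages
        self.current_limit = limit

    def alloc(self, size, align=PAGE_SIZE):
        """Return the lowest aligned address whose block ends below current_limit."""
        for i, (start, end) in enumerate(self.free):
            base = (start + align - 1) // align * align
            if base + size <= min(end, self.current_limit):
                # Split the free range around the allocated block
                pieces = [(start, base), (base + size, end)]
                self.free[i:i + 1] = [(s, e) for s, e in pieces if e > s]
                return base
        raise MemoryError("no exact memory left below current_limit")

# Hypothetical layout: exact RAM 0x60000000-0x67FFFFFF,
# approximate RAM 0x68000000-0x6FFFFFFF, all of it in low memory.
mb = ToyMemblock(ram_start=0x60000000, ram_end=0x70000000,
                 lowmem_end=0x70000000, approx_start=0x68000000)
pages = [mb.alloc(PAGE_SIZE) for _ in range(4)]   # e.g. early page-table pages
print([hex(a) for a in pages])
print(hex(mb.current_limit))                      # clamped to the approximate base
try:
    mb.alloc(128 * MB)                            # would spill into approximate RAM
except MemoryError as err:
    print("refused:", err)
```

In this model the clamp is applied once, when the limit is computed, so the allocation loop needs no knowledge of approximate memory at all; this mirrors the idea of changing only how *memblock\_current\_limit* is derived.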

#### 3.2 Vectors

In order to boot the primary core, the kernel allocates a single 4 KB page as the vectors page, mapping it to the location of the ARM exception vectors at virtual address 0xFFFF0000 or 0x00000000. When this step is completed, the *trap\_init* function copies the exception vector table, exception stubs and helpers from *entry-arm.S* into the vectors page.

In particular, the ARM vectors page is allocated by the *early\_alloc* allocator function. This allocator cannot exclude approximate memory, since it does not allow the memory zone to be specified. To ensure that the vectors page is never allocated in approximate memory, a new implementation of *early\_alloc* was required: it uses the *memblock* interface and allows an address limit to be explicitly specified for the allocation request, so that approximate memory is excluded.

#### 3.3 Approximate Memory and DTB

During the boot process, a "Device Tree Blob" (DTB) file is loaded into memory by the bootloader and passed to the Linux kernel. The DTB is a tree data structure whose nodes describe the system hardware layout to the Linux kernel; this allows platform-specific code to be moved out of the kernel sources and replaced with generic code that parses the DTB and configures the entire system as required. Each physical device is described in the device tree as a node, under which all its properties are defined.

In order to support approximate memory management in ARM architectures, we defined a new DTS file (from which the DTB is generated), specific to each ARM platform, with a special node for approximate memory (Fig. 1 on left). This node, called *approx\_mem*, holds the physical address range and the size of ZONE\_APPROXIMATE.

#### 4 Experimental Results

In this section we describe the results and the architecture setup used to evaluate the introduction of approximate memory support for early boot allocators in Linux kernel on ARM architectures.

```
memory {
    reg = <(baseaddr1) (size1)
           (baseaddr2) (size2)
           ...
           (baseaddrN) (sizeN)>;
};
```

| Address    | Region     |
|------------|------------|
| 0xA0000000 | Reserved   |
| …          | Local DDR2 |
| 0x84000000 | REMAP      |
| 0x60000000 | Local DDR2 |
| 0x5C000000 | Reserved   |

Fig. 1 On left: Memory node in device tree. On right: Vexpress Cortex A9 board memory map (extract)

### 4.1 Hardware Setup

ARM Versatile Express (Vexpress) Cortex A9 was chosen as the architecture for the tests. These tests were run on the emulation platform AppropinQuo [8], which contains specific approximate memory models for the chosen architecture. Fig. 1 on right shows the memory map of Vexpress Cortex A9: in particular, RAM can be present from 0x60000000 to 0x80000000 and from 0x84000000 to 0xA0000000. We chose to map 128 + 128 MB of RAM: the first 128 MB part, from address 0x60000000 to 0x67FFFFFF, is exact memory, and the second 128 MB part, from address 0x68000000 to 0x6FFFFFFF, is approximate memory. This map was used to configure the emulator and to write the corresponding kernel DTS file.

### 4.2 Results

Figure 2 (left) shows the statistics for the ZONE\_NORMAL region, obtained through the *zoneinfo* system command. The region contains exact memory and is composed of 32768 pages; since every page in the ARM architecture is 4 KB, this confirms the availability of 128 MB of exact memory. The start\_pfn number gives the start address of exact memory in physical pages (393216 × 4096 = 0x60000000). Fig. 2 (right) shows the same statistics for ZONE\_APPROXIMATE. The approximate area has 32768 pages; again, this confirms that the approximate region is 128 MB large. The start\_pfn number indicates that approximate memory starts at address 0x68000000 (425984 × 4096 = 0x68000000). The 'present' and 'managed' lines are particularly important. The latter counts the pages managed by the buddy system (the main allocator), computed as the number of present pages minus the pages reserved during boot. Since in ZONE\_APPROXIMATE the number of present pages matches the number of managed pages, this demonstrates that no pages belonging to ZONE\_APPROXIMATE were allocated during the boot phase.
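The checks described above can be automated. The sketch below parses the `/proc/zoneinfo` fields shown in Fig. 2 and derives each zone's start address, size and the number of present pages not handed to the buddy allocator. It only handles simple `key value` lines (the real file contains many more fields, such as per-CPU pagesets), and the function name is hypothetical; the sample values are those of Fig. 2.

```python
SAMPLE = """\
Node 0, zone Normal
  pages free 28036
  min 167
  low 208
  high 250
  spanned 32768
  present 32768
  managed 30487
  start_pfn: 393216
Node 0, zone Approximate
  pages free 32768
  min 180
  low 225
  high 270
  spanned 32768
  present 32768
  managed 32768
  start_pfn: 425984
"""

def parse_zoneinfo(text: str) -> dict:
    """Collect simple per-zone counters from /proc/zoneinfo-style text."""
    zones, current = {}, None
    for line in text.splitlines():
        parts = line.replace(":", " ").split()
        if not parts:
            continue
        if parts[0] == "Node" and "zone" in parts:
            idx = parts.index("zone")
            current = parts[idx + 1]
            zones[current] = {}
            parts = parts[idx + 2:]          # tolerate "pages free N" on the header line
            if not parts:
                continue
        if current is None:
            continue
        if parts[:2] == ["pages", "free"] and len(parts) >= 3:
            zones[current]["free"] = int(parts[2])
        elif len(parts) == 2 and parts[1].isdigit():
            zones[current][parts[0]] = int(parts[1])
    return zones

PAGE_SIZE = 4096
zones = parse_zoneinfo(SAMPLE)
for name, z in zones.items():
    print(f"{name:12s} starts at {z['start_pfn'] * PAGE_SIZE:#010x}, "
          f"{z['present'] * PAGE_SIZE // 2**20} MB present, "
          f"{z['present'] - z['managed']} pages not managed by the buddy allocator")
```

A zero difference between `present` and `managed` for the approximate zone is the condition used above to conclude that no approximate pages were taken during boot.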

Moreover, to verify the correct allocation of approximate physical pages in ZONE\_APPROXIMATE, we ran a testbench that requests a block of 1000 memory pages (about 4 MB) from ZONE\_APPROXIMATE. Using the *zoneinfo* 

```
cat /proc/zoneinfo
Node 0, zone Normal
pages free 28036
min 167
low 208
high 250
scanned 0
spanned 32768
present 32768
managed 30487
start_pfn: 393216
```

```
cat /proc/zoneinfo
Node 0, zone Approximate
pages free 32768
min 180
low 225
high 270
scanned 0
spanned 32768
present 32768
managed 32768
start_pfn: 425984
```

Fig. 2 On left: zone\_normal statistics. On right: zone\_approximate statistics

```
cat /proc/zoneinfo
Node 0, zone Approximate
pages free 32768
min 180
low 225
high 270
scanned 0
spanned 32768
present 32768
managed 32768
...
start_pfn: 425984
```

```
cat /proc/zoneinfo
Node 0, zone Approximate
pages free 31768
min 180
low 225
high 270
scanned 0
spanned 32768
present 32768
managed 32768
...
start_pfn: 425984
```

Fig. 3 On left: zone\_approximate statistics before allocations. On right: zone\_approximate statistics after allocation

system command, we obtain the information shown in Fig. 3 before (left) and after (right) the allocation request. ZONE\_APPROXIMATE has 31768 free pages after the allocation (32768 − 31768 = 1000), confirming that the application allocated exactly 1000 pages.

## 5 Conclusions

In this work we analyzed the early allocators of the Linux kernel and proposed and implemented an extension of their functionality to support approximate memory. This work was done in the specific context of porting approximate memory support to ARM, an architecture widely adopted in low-power embedded systems, but it also applies to other architectures that use early allocators during the boot process. After completing the port, we ran the kernel on an ARM platform and demonstrated the correctness of the boot process and of the main allocators, which can now see and manage the correct number and set of physical pages for approximate memory.

## References

1. Liu, S., Pattabiraman, K., Moscibroda, T., Zorn, B.G.: Flikker: saving DRAM refresh-power through critical data partitioning. ACM SIGPLAN Not. **47**(4), 213–224 (2012)
2. Lucas, J., Alvarez-Mesa, M., Andersch, M., Juurlink, B.: Sparkk: quality-scalable approximate storage in DRAM. In: The Memory Forum, pp. 1–9 (2014)
3. Raha, A., Sutar, S., Jayakumar, H., Raghunathan, V.: Quality configurable approximate DRAM. IEEE Trans. Comput. **66**(7), 1172–1187 (2017)
4. Frustaci, F., Blaauw, D., Sylvester, D., Alioto, M.: Approximate SRAMs with dynamic energy-quality management. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. **24**(6), 2128–2141 (2016)
5. Nguyen, D.T., Kim, H., Lee, H.-J., Chang, I.-J.: An approximate memory architecture for a reduction of refresh power consumption in deep learning applications. In: IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1–5. IEEE (2018)
6. Stazi, G., Adani, L., Mastrandrea, A., Olivieri, M., Menichelli, F.: Impact of approximate memory data allocation on a H.264 software video encoder. In: Workshop on Approximate and Transprecision Computing on Emerging Technologies (ATCET 2018) (2018)
7. Stazi, G., Menichelli, F., Mastrandrea, A., Olivieri, M.: Introducing approximate memory support in Linux kernel. In: 2017 13th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME), pp. 97–100. IEEE (2017)
8. Menichelli, F., Stazi, G., Mastrandrea, A., Olivieri, M.: An emulator for approximate memory platforms based on QEMU. In: International Conference on Applications in Electronics Pervading Industry, Environment and Society, pp. 153–159. Springer (2016)

# Fully Digital Low-Power Implementation of an Audio Front-End for Portable Applications



Gabriele Meoni, Luca Pilato, Gabriele Ciarpi, Alessandro Palla and Luca Fanucci

**Abstract** This work presents a fully digital implementation of an audio front-end for portable applications, developed with a low-power perspective. The platform acquires audio samples through an array of four digital Pulse Density Modulation (PDM) microphones and implements two-channel (left and right) beamformers to perform speech enhancement. The system also features a low-power Voice Activity Detector (VAD) for the implementation of source localization or noise reduction algorithms. Finally, the processed data can be converted into a 16-bit Pulse Code Modulation (PCM) format and then transmitted through a standard Integrated Interchip Sound (I2S) interface. The front-end has been implemented on a Microsemi IGLOO nano Field Programmable Gate Array (FPGA), with a power consumption of only 2.264 mW. The low-power architecture and the implemented functionalities make the system a promising front-end for portable speech applications.

**Keywords** Low-power digital architecture · Low-power FPGA · Digital signal processing · Audio beamforming · Voice activity detection

Department of Information Engineering (DII), University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy

e-mail: gabriele.meoni@ing.unipi.it

L. Pilato e-mail: luca.pilato@for.unipi.it

G. Ciarpi e-mail: gabriele.ciarpi@ing.unipi.it

A. Palla e-mail: alessandro.palla@for.unipi.it

L. Fanucci e-mail: luca.fanucci@unipi.it

G. Meoni (🖂) · L. Pilato · G. Ciarpi · A. Palla · L. Fanucci

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_52

## 1 Introduction

In recent years, portable systems for speech signal processing have become crucial thanks to their importance in human-to-human and human-to-machine communication [2]. The tasks of these systems include data acquisition through microphones, signal conditioning, analog-to-digital conversion and speech enhancement. The latter, in particular, is gaining importance, since the presence of noise can reduce the quality of the services [2]. Nevertheless, the choice of speech enhancement algorithms is constrained by the low-power requirements typical of portable systems. For this reason, trade-offs between power consumption and complexity must be found to meet such requirements. Modern systems implement Differential Microphone Array (DMA) beamforming algorithms to reject disturbance sources coming from particular angular directions [2, 6]. Beamforming exploits the differences in the time of arrival of the audio waveforms at the various microphones of an array to modify the directivity pattern. DMA beamforming is an interesting solution because it offers beampatterns with reduced frequency dependency, and its architecture is particularly well suited to implementation on Field Programmable Gate Arrays (FPGAs) or microcontrollers [4].

Moreover, a beamforming-based system can be enhanced by implementing *Voice Activity Detectors* (VADs), which can be used to perform source localization over the different angular directions [8]. For these reasons, in this paper we present a low-power, low-area digital audio front-end for different speech-based applications. The presented front-end acquires data through 4 *Pulse Density Modulation* (PDM) microphones, which avoid the use of *Analog-to-Digital Converters* (ADCs) for signal digitization. In this way, the complexity of the *fully digital* front-end system is reduced [6]. Area can also be saved by performing first-order DMA beamforming directly on the PDM microphone signals, which requires only single-bit operations. Finally, the VAD of our previous work [3] is used, whose design is low-power and low-area oriented. The remainder of the work is structured as follows: in Sect. 2 the whole system architecture is presented (Fig. 1); Sect. 3 describes the implementation of the system on a low-power *IGLOO nano* FPGA and provides details on the area occupation and on the post-layout simulation performed to estimate the system power. Finally, in Sect. 4, conclusions are given.



Fig. 1 Schematic of the system architecture

## 2 System Architecture

### 2.1 Architecture Description

The system interfaces 4 PDM microphones and produces a two-channel (left and right) *Integrated Interchip Sound (I2S)* output stream. To implement source localization or speech enhancement, the system performs beamforming. The architecture is programmable through an internal 32-bit configuration register, accessible through a *Serial Peripheral Interface* (SPI).

To reduce the power consumption and increase the time resolution [1], the beamforming algorithms are executed directly on the PDM signals. Two different first-order PDM DMA beamformers are implemented, and their outputs are converted to 16-bit *Pulse Code Modulation* (PCM) format through low-pass and downsampling *Cascaded Integrator-Comb* (CIC) filters. Each beamformer computes the single-bit difference between the signals of two microphones, one of which is delayed by a configurable delay line. Both the left and the right delay lines consist of a 64-bit shift register, whose delay is programmed through the SPI.
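The beamformer described above can be modeled behaviorally in a few lines. In this sketch (function name hypothetical) PDM bits are taken as 0/1, so the difference of two bits falls in {−1, 0, +1} and needs 2 bits, and the 64-bit shift register is modeled as a FIFO pre-filled with zeros whose output tap sits `n_delay` cycles back. It is an illustration of the principle, not the authors' HDL.

```python
from collections import deque

def pdm_dma_beamformer(mic_a, mic_b, n_delay, line_len=64):
    """First-order differential beamformer on single-bit PDM streams.

    mic_a, mic_b: iterables of PDM bits (0/1).
    n_delay: delay applied to mic_b, in PDM clock cycles (0 .. line_len).
    Returns a stream of signed 2-bit values in {-1, 0, +1}.
    """
    if not 0 <= n_delay <= line_len:
        raise ValueError("delay exceeds the shift-register length")
    line = deque([0] * n_delay)          # shift register, initially cleared
    out = []
    for a, b in zip(mic_a, mic_b):
        line.append(b)
        delayed_b = line.popleft()       # mic_b sample from n_delay cycles ago
        out.append(a - delayed_b)        # single-bit difference
    return out

print(pdm_dma_beamformer([1, 0, 1, 1], [1, 1, 0, 1], 1))
```

Identical, time-aligned inputs cancel to zero, which is the mechanism that creates the null of the differential beampattern.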

Each microphone can be routed to the inputs (delayed and non-delayed) of the left and right beamformers through 4 multiplexers, which are controllable through the SPI. This approach makes the front-end usable with various microphone array patterns and different distances among the microphones. In addition, since the left and right channels use separate beamformers, it is possible to generate a different beampattern for each channel, filtering out sources coming from different angular directions.

Some configuration register bits can be used to disable the beamforming operation by forcing the inputs of *MUX1* and *MUX3* to zero; this prevents bit toggling from propagating through the corresponding delay line, reducing the power consumption.

A VAD is also implemented to estimate the voice activity at the output of the CICs. An 8x amplification with saturation control is provided to compensate for the attenuation introduced by the beamforming operation when it is enabled; this is necessary for a correct estimation of the voice activity. Conversely, if the corresponding beamformer is disabled, the input of the VAD is directly connected to the selected CIC output.
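A minimal sketch of an 8x gain with saturation control, assuming signed 16-bit two's-complement PCM samples and a gain implemented as a 3-bit left shift (both are assumptions; the text does not describe the hardware at this level):

```python
INT16_MIN, INT16_MAX = -32768, 32767

def amplify_saturate(sample: int, shift: int = 3) -> int:
    """Apply a 2**shift gain (8x for shift = 3) to a signed 16-bit PCM sample,
    clamping to the 16-bit range instead of letting the value wrap around."""
    y = sample << shift                     # 8x gain as a left shift
    return max(INT16_MIN, min(INT16_MAX, y))

print([amplify_saturate(s) for s in (1000, -1000, 5000, -5000)])
```

Without the clamp, large samples would wrap around in a 16-bit register and produce sign flips, which would corrupt the energy estimates used by the VAD.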

### 2.2 Data Acquisition

The PDM beamformers generate 2-bit PDM signals, which must be converted into the 16-bit PCM format. The choice of the number of bits requires design trade-offs between resolution and power consumption. Indeed, improving the resolution of the audio signal representation requires increasing the *OverSampling Ratio* (OSR) to improve quantization noise rejection, with a consequent increase in the system clock frequency and in the power consumption. In addition, we fixed the sample rate to  $f_{sample} = 31250 \text{ Hz}$, corresponding to a system bandwidth of about 16 kHz. For such bandwidths, most commercial PDM microphones rarely offer resolutions much higher than 16 bits. A 16-bit accuracy requires a signal-to-noise ratio of:

$$SNR_Q|_{dB} = DR|_{dB} + 1.76 = 6.02 \cdot n + 1.76 = 98.08 \, \text{dB} \tag{1}$$

where  $SNR_Q|_{dB}$  is the signal-to-noise ratio considering only quantization noise,  $DR|_{dB}$  is the dynamic range in decibels, and *n* is the number of required bits (here n = 16). Since, with an ideal filter, the  $SNR_Q|_{dB}$  of a PDM signal is [7]:

$$SNR_Q|_{dB} \approx 7.78 + (2L+1) \cdot 10\log_{10}(OSR) - 9.94 \cdot L + 10\log_{10}(2L+1) \tag{2}$$

we fixed OSR = 64, assuming that the microphones implement a third-order (L = 3) *Delta-Sigma* modulation. In this way, considering the filter non-idealities, we obtain  $15\frac{3}{4}$  bits of resolution.
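Equations (1) and (2) can be evaluated directly to see the margin left by the choice L = 3, OSR = 64. The sketch below (helper names hypothetical) computes the SNR required for 16 bits, the ideal SNR of the PDM stream, and the equivalent number of bits obtained by inverting Eq. (1).

```python
import math

def sqnr_required(bits: int) -> float:
    """Eq. (1): SNR_Q in dB needed for an n-bit representation."""
    return 6.02 * bits + 1.76

def sqnr_pdm(order: int, osr: int) -> float:
    """Eq. (2): ideal SNR_Q in dB of an order-L, 1-bit delta-sigma stream."""
    L = order
    return (7.78 + (2 * L + 1) * 10 * math.log10(osr)
            - 9.94 * L + 10 * math.log10(2 * L + 1))

def effective_bits(sqnr_db: float) -> float:
    """Invert Eq. (1): number of bits equivalent to a given SNR_Q."""
    return (sqnr_db - 1.76) / 6.02

ideal = sqnr_pdm(3, 64)
print(f"required for 16 bits: {sqnr_required(16):.2f} dB")
print(f"ideal, L=3, OSR=64:   {ideal:.2f} dB (~{effective_bits(ideal):.1f} bits)")
```

The ideal figure (about 112.8 dB, roughly 18.5 bits) is an upper bound with a perfect filter; the lower resolution quoted in the text accounts for the non-idealities of the real decimation filter.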

For such OSR values, the *polyphase* implementation of the CIC can be used to reduce power consumption. However, it leads to a higher area occupation and requires clock gating or other asynchronous approaches to achieve acceptable improvements, which makes the design much more complex [5]. For this reason, the *standard recursive* implementation is preferred. To achieve sufficient accuracy, a fourth-order CIC with a *differential delay* of 2 has been implemented. The CIC order, *K*, was chosen through the rule of thumb K = L + 1 suggested in [7]. To obtain OSR = 64, the system and microphone clocks have been fixed to  $f_{clk} = f_{mic} = OSR \cdot f_{sample} = 2$  MHz.
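The standard recursive CIC described above (K = 4 integrators at the PDM rate, decimation by R = OSR = 64, K = 4 combs with differential delay M = 2 at the output rate) can be sketched behaviorally as follows. The sketch uses unbounded Python integers, so it ignores register widths and the final truncation to 16 bits; it only shows the structure and the well-known DC gain (RM)^K = 128^4 = 2^28, i.e. 28 bits of word growth.

```python
def cic_decimate(x, R=64, M=2, K=4):
    """Recursive CIC decimator: K integrators, downsample by R, K combs with delay M.

    x: input samples at the PDM rate (e.g. beamformer output in {-1, 0, +1}).
    Returns full-precision integer outputs at the rate f_pdm / R.
    """
    integrators = [0] * K
    comb_memory = [[0] * M for _ in range(K)]   # last M inputs of each comb stage
    out = []
    for i, sample in enumerate(x):
        acc = sample
        for k in range(K):                       # integrators run at the input rate
            integrators[k] += acc
            acc = integrators[k]
        if (i + 1) % R == 0:                     # keep one sample every R
            y = acc
            for k in range(K):                   # combs run at the output rate
                delayed = comb_memory[k].pop(0)  # this stage's input M output samples ago
                comb_memory[k].append(y)
                y -= delayed
            out.append(y)
    return out

DC_GAIN = (64 * 2) ** 4                          # (R*M)**K = 2**28
pcm = cic_decimate([1] * (64 * 20))              # constant input, 20 output samples
print(pcm[-1] == DC_GAIN)
```

After the initial transient, a constant input is reproduced at the output scaled by the DC gain, which is why the output register must accommodate the 28-bit growth before any scaling to PCM.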

### 2.3 Beamforming Performance

First, beamforming is performed in the PDM domain to reduce the power consumption. Indeed, following the approach suggested in [5], the power consumption of the beamformer is estimated to be proportional to the bus width, W, of the adder cells,  $P \propto VDD^2 \cdot W \cdot f_{clk}$ . In our case W = 2, which minimizes P. The DMA beamforming characterization uses two omnidirectional PDM microphones as fundamental elements. To achieve good performance in a linear array, the beamforming algorithm requires the distance *l* between the microphones to be smaller than the wavelength of the audio signal. For a bandwidth of 16 kHz, the distance *l* should be much smaller than the shortest wavelength, given by

$$l \ll \lambda_{min} = \frac{c_{sound}}{f_{BW}} = \frac{340 \text{ m/s}}{16 \text{ kHz}} \approx 2.1 \text{ cm} \tag{3}$$

where  $c_{sound}$  is the speed of sound in air. We must also consider that, in the beamforming processing, each delay element of the shift register adds one clock period  $T_{clk} = 1/f_{clk}$  to the delayed signal. Eq. 4 gives a simple relation for the optimal delay of the shift register, in terms of the number of clock cycles *n* and the microphone distance *l*, to adjust the directivity of the microphone array and obtain the cardioid pattern.

$$n = \operatorname{round}\left(\frac{l}{c_{sound}}f_{clk}\right) \tag{4}$$

Using 64-bit shift registers, our system can shape the beampattern as a cardioid for a maximum microphone distance of about  $l_{max} = 1$  cm. In addition, the cardioid beamformer can be set with a spatial resolution of  $\Delta l = c_{sound}/f_{clk} = 0.017$  cm (0.17 mm), which allows a generic array geometry to be used with fine tuning of the cardioid beamformer.
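The figures quoted in this subsection follow from Eqs. (3) and (4) with $c_{sound} = 340$ m/s, $f_{clk} = 2$ MHz and a 64-cycle delay line. A small sketch reproducing them (the helper name is hypothetical):

```python
C_SOUND = 340.0   # m/s, speed of sound used in Eq. (3)
F_CLK = 2e6       # Hz, PDM/system clock
F_BW = 16e3       # Hz, system bandwidth
LINE_LEN = 64     # shift-register length, i.e. maximum delay in clock cycles

def delay_cycles(l: float) -> int:
    """Eq. (4): shift-register delay n that yields a cardioid for mic spacing l (m)."""
    return round(l / C_SOUND * F_CLK)

lambda_min = C_SOUND / F_BW      # Eq. (3): shortest wavelength in the band
delta_l = C_SOUND / F_CLK        # spacing covered by one clock cycle of delay
l_max = LINE_LEN * delta_l       # largest spacing the 64-cycle line can compensate

print(f"lambda_min = {lambda_min * 100:.2f} cm")
print(f"delta_l    = {delta_l * 1000:.2f} mm")
print(f"l_max      = {l_max * 100:.3f} cm")
print(f"n(5 mm)    = {delay_cycles(5e-3)}")
```

For the 5 mm spacing used below, Eq. (4) gives 29.4, which rounds to n = 29; the residual rounding error is bounded by half of $\Delta l$.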

Figure 2 shows the directivity pattern of two microphones spaced 5 mm apart for a 1 kHz audio source, with and without the beamforming delay. The directivity pattern is a function of the incidence angle  $\theta$  of the audio waves with respect to the normal of the microphone plane. To achieve a cardioid configuration, Eq. 4 requires n = 29.
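The two patterns of Fig. 2 can be approximated with the textbook frequency response of a first-order differential pair, $|1 - e^{-j\omega(\tau_{ac} + n T_{clk})}|$, where $\tau_{ac}$ is the acoustic delay between the microphones. In this sketch the angle is measured from the array axis (θ = 0 on the front-microphone side); this convention, and the function name, are assumptions and may differ from the one used in the figure.

```python
import cmath
import math

C_SOUND = 340.0   # m/s
F_CLK = 2e6       # Hz, rate of the delay-line shift register

def dma_response(theta: float, f: float, l: float, n: int) -> float:
    """Magnitude response of a first-order differential pair.

    Output = front mic - rear mic delayed by n clock cycles. theta is measured
    from the array axis (theta = 0: source on the front-mic side).
    """
    tau_acoustic = l * math.cos(theta) / C_SOUND   # extra travel time to the rear mic
    tau_electronic = n / F_CLK                     # delay inserted by the shift register
    w = 2 * math.pi * f
    return abs(1 - cmath.exp(-1j * w * (tau_acoustic + tau_electronic)))

for n in (0, 29):
    gains = [dma_response(math.radians(a), 1e3, 5e-3, n) for a in (0, 90, 180)]
    print(n, [f"{g:.4f}" for g in gains])
```

With n = 0 the pair has a null broadside (figure-of-eight), while with n = 29 the null moves to the rear (cardioid). The front gain at 1 kHz is only about 0.18 (roughly −15 dB), which illustrates the attenuation that the 8x amplification in front of the VAD is meant to compensate.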



Fig. 2 Beampattern of two microphones with n = 0 and n = 29 for the beamformer

### 2.4 Voice Activity Detector

The VAD proposed in our previous work [3] has been used. That system was designed to work with an 8 kHz input stream organized in 10 ms frames. In this work, the same VAD works at the system sample rate  $f_{sample} = 31250$  Hz; each frame contains 312 samples, corresponding to a frame length of 9.984 ms. The functionality of the system has been verified through a Matlab® simulation. The number of bits for the coefficients,  $b_{coeff}$, and the threshold value for the decision (*voice/silence*) were established through simulations on a specific cross-validation dataset, as specified in [3].

## 3 FPGA Implementation and Results

The described architecture was implemented on a 1.2 V *Microsemi IGLOO nano AGL250V2* FPGA. The layout step was performed with a *Power driven* approach to minimize the power consumption. The FPGA occupation is 98.10% (6027/6144 cores), which corresponds to a design of about 245250 system gates or 2943 Logic Elements (LEs).

Regarding timing, the system clock was constrained to 2 MHz, giving a maximum working frequency of 6.442 MHz and a minimum slack of 345 ns. The power consumption of the whole system was estimated through a post-layout simulation. The testbench for the simulation was generated in Simulink®. In particular, two PDM audio streams were created with the  $\Sigma - \Delta$  toolbox to emulate PDM microphones and stimulate the four inputs of the system. The PDM streams contain two audio tracks, appropriately delayed and summed, to model two microphones stimulated by two sources coming from the angular directions  $\theta = 0$  and  $\theta = \pi$. In the testbench, the system was configured to recover the first track in channel R and the second track in channel L. The VAD was also enabled in the simulation, so that the whole processing chain was stimulated. The post-layout simulation was used to estimate the *switching activity* of all the system nodes, from which the power consumption was estimated at 2.264 mW.
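The reported slack is consistent with the two frequencies: it is the constrained clock period minus the minimum period the placed-and-routed design can sustain. A one-function check (the function name is hypothetical):

```python
def min_slack_ns(f_constraint_hz: float, f_max_hz: float) -> float:
    """Setup slack of the critical path: constrained clock period minus the
    minimum period the placed-and-routed design can sustain."""
    return (1.0 / f_constraint_hz - 1.0 / f_max_hz) * 1e9

print(f"{min_slack_ns(2e6, 6.442e6):.1f} ns")
```

A 2 MHz constraint gives a 500 ns period, and 6.442 MHz corresponds to about 155 ns, leaving roughly 345 ns of slack.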

## 4 Conclusions

This work describes a fully digital audio speech front-end implementing first-order DMA beamforming of PDM signals, right and left channel PDM/PCM conversion, and a VAD algorithm. The use of PDM microphones, the PDM-based beamforming and the particular design of the VAD made the implementation low-power and low-area oriented, allowing its synthesis on a *Microsemi IGLOO nano* FPGA.

A post-layout simulation showed that the power consumption of the whole platform is only 2.264 mW. This feature, together with the implemented functionalities, makes the platform a promising device for processing speech signals in portable applications.

## References

1. Altinok, G.D., Al-Janabi, M., Kale, I.: Improved ultrasound digital beamforming using single-bit sigma-delta modulators with band-pass decimation filters. In: 2011 IEEE International Symposium of Circuits and Systems (ISCAS) (2011)
2. Martín-Doñas, J.M., Gomez, A.M., López-Espejo, I., Peinado, A.M.: Dual-channel DNN-based speech enhancement for smartphones. In: 2017 IEEE 19th International Workshop on Multimedia Signal Processing (MMSP), pp. 1–6. IEEE (2017)
3. Meoni, G., Pilato, L., Fanucci, L.: A low power voice activity detector for portable applications. In: 2018 14th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME), pp. 41–44. IEEE (2018)
4. Palla, A., Fanucci, L.: High performance embedded short time Fourier transform architecture for real-time speech enhancement using differential microphone arrays. In: Embedded World Conference. WEKA FACHMEDIEN GmbH (2017)
5. Palla, A., Meoni, G., Fanucci, L.: Area and power consumption trade-off for Σ-Δ decimation filter in mixed signal wearable IC. In: Nordic Circuits and Systems Conference (NORCAS), pp. 1–5. IEEE (2016)
6. Sanchez-Hevia, H.A., Gil-Pita, R., Rosa-Zurera, M.: FPGA-based real-time acoustic camera using PDM MEMS microphones with a custom demodulation filter. In: 2014 IEEE 8th Sensor Array and Multichannel Signal Processing Workshop (SAM), pp. 181–184. IEEE (2014)
7. Schreier, R., Temes, G.C.: Understanding Delta-Sigma Data Converters (2005)
8. Taghizadeh, M.J., Garner, P.N., Bourlard, H., Abutalebi, H.R., Asaei, A.: An integrated framework for multi-channel multi-source localization and voice activity detection. In: 2011 Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), pp. 92–97. IEEE (2011)

# Comparison and Implementation of Variable Fractional Delay Filters for Wideband Digital Beamforming



Gian Carlo Cardarilli, Daniele Giardino, Marco Matta, Marco Re, Francesca Silvestri, Lorenzo Simone and Sergio Spanò

**Abstract** Digital delays are one of the crucial elements of a digital beamforming system. If the signals involved in beamforming are wideband, a phase shifter cannot be used to generate the delay, and other solutions, such as fractional delay filters, are required. This paper compares different fractional delay methods in terms of magnitude response, phase error response, computational cost and hardware complexity, targeting an FPGA implementation.

## 1 Introduction

G. C. Cardarilli e-mail: cardarilli@ing.uniroma2.it

M. Matta e-mail: matta@ing.uniroma2.it

M. Re e-mail: re@ing.uniroma2.it

F. Silvestri e-mail: f.silvestri@ing.uniroma2.it

S. Spanò e-mail: spano@ing.uniroma2.it

L. Simone Thales Alenia Space, Rome, Italy e-mail: lorenzo.simone@thalesaleniaspace.com

© Springer Nature Switzerland AG 2019 S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_53

G. C. Cardarilli · D. Giardino (⊠) · M. Matta · M. Re · F. Silvestri · S. Spanò University of Rome Tor Vergata, Rome, Italy e-mail: giardino@ing.uniroma2.it

The beamforming technique is used to transmit and receive directional signals, and it is applied in several scenarios, ranging from satellites [1] and unmanned aerial vehicles [2] to small mediator devices in the Internet of Things, where it increases the link reception of low-cost sensors [3]. It consists of the combination of M signals coming from M antennas, whose differences in phase and amplitude give information about the beam direction. To control the directivity of the antenna array, the processing must compensate for the impairments introduced by the analog chain [4].

As mentioned above, beamforming is mainly based on signal delays. Delays are one of the crucial elements, in particular for the digital implementation of beamforming, because of the discrete-time representation of the signals. In this context, delays are typically realized using phase shifters, but if the involved signals are wideband, phase shifters cannot be used and other solutions are required. In this case, a common approach is to use filters that can also generate delays that are a fraction of the system clock cycle (fractional delays). The main requirements of these filters are:

1. Large bandwidth (ideally up to the Nyquist frequency)
2. Reduced magnitude ripple
3. Reduced phase ripple
4. Constant group delay
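Why a phase shifter is not enough for wideband signals can be seen numerically: a true time delay τ imposes a phase 2πfτ that grows linearly with frequency, whereas a phase shifter applies the same phase at all frequencies, so its equivalent delay φ/(2πf) changes across the band. The values below (a 100 ps delay set at 1 GHz) are hypothetical and chosen only for illustration.

```python
import math

def delay_to_phase(f_hz: float, tau_s: float) -> float:
    """Phase (rad) that a pure time delay tau imposes at frequency f."""
    return 2 * math.pi * f_hz * tau_s

def phase_to_delay(f_hz: float, phi_rad: float) -> float:
    """Time delay (s) equivalent to a constant phase shift phi at frequency f."""
    return phi_rad / (2 * math.pi * f_hz)

F0, TAU = 1e9, 100e-12               # hypothetical design frequency and target delay
phi = delay_to_phase(F0, TAU)        # phase-shifter setting that is exact at F0
for f in (0.5e9, 1e9, 1.5e9):
    print(f"{f / 1e9:.1f} GHz: equivalent delay {phase_to_delay(f, phi) * 1e12:.1f} ps (target 100 ps)")
```

The delay is exact only at the design frequency and halves or doubles at the band edges of this example, which would steer the beam differently for each spectral component.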

In this work, we analyzed different methods for realizing fractional delays, namely the Polyphase Filter [5], Lagrange Interpolation [6] and Weighted Least Squares (WLS) [7]. The experimental results show that the WLS method is the most advantageous in terms of performance and complexity. Based on this result, we improved the frequency response and phase response of the WLS method used in [7].

## 2 Variable Fractional Delay and Its Approximation

Generating different beams in a beamforming system requires a Variable Fractional Delay (VFD). With a VFD, a digital system can generate fractional delays, similarly to the continuous delay line used in analog systems [8]. The impulse response of a VFD is a *sinc* function,  $h[n] = \mathrm{sinc}(n - D)$, where the delay D consists of an integer part and a fractional part  $\alpha$. Since the *sinc* impulse response has infinite length, the ideal VFD is not realizable. For this reason, we analyzed different methods for obtaining a good approximation of the *sinc* function.
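The simplest finite approximation is to truncate the ideal response $h[n] = \mathrm{sinc}(n - D)$ to a few taps around its peak. The sketch below (helper names hypothetical) does exactly that with a rectangular window; it is a baseline for comparison, not one of the three methods analyzed in this paper.

```python
import math

def sinc(x: float) -> float:
    """Normalized sinc, sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def truncated_sinc_fd(n_taps: int, alpha: float):
    """Rectangular-window truncation of h[n] = sinc(n - D), n = 0 .. n_taps - 1.

    The integer part of D is placed mid-filter (n_taps // 2 - 1) so the kept
    samples cover the main lobe; alpha is the fractional part.
    """
    D = n_taps // 2 - 1 + alpha
    return [sinc(n - D) for n in range(n_taps)]

h = truncated_sinc_fd(8, 0.5)
print(round(sum(h), 4))   # DC gain of the truncated filter
```

For α = 0 the filter collapses to a pure integer delay, but for the worst case α = 0.5 the 8 coefficients sum to only about 0.92, so even the DC gain deviates from the ideal value of 1. This is the kind of error that windowing, Lagrange and WLS designs aim to control.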

### 2.1 Comparison Methods

There are many ways to implement a fractional delay element and, in this work, we consider three different approaches:

- 1. Polyphase FIR filter
- 2. Maximally-flat FIR approximation (Lagrange Interpolation)
- 3. Weighted Least-Squares (WLS) approximation

The polyphase FIR filter approach is used to implement interpolation and decimation filters [9] and filter banks [10–12]. As explained in [5], the polyphase decomposition can be used to design a fractional delay filter, and a serial implementation can be developed using a time-varying filter with the desired delay. A limitation of this approach is that only rational fractional delays are available and the filter order must be very high. On the other hand, a properly designed lowpass prototype filter yields a linear phase; for example, a window design method can be used to decrease the ripple in the frequency response. In this way, the designer can implement high-quality wideband VFD filters.
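A minimal sketch of the polyphase idea with the sizes of Table 1 (64 branches of 8 taps, prototype length 512): a windowed-sinc lowpass prototype is split into 64 branches, and branch p approximates a delay of 4 − p/64 samples. The raised-cosine window and the helper names are assumptions for illustration; the actual prototype design used in this work is not specified here.

```python
import math

def sinc(x: float) -> float:
    """Normalized sinc, sin(pi x)/(pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def polyphase_fd_bank(P: int = 64, taps: int = 8):
    """Split a windowed-sinc interpolation prototype of length P*taps into P branches.

    Branch p holds prototype samples p, p+P, p+2P, ... and approximates a delay of
    taps//2 - p/P samples, so only the rational fractional delays p/P are available.
    """
    N = P * taps
    c = N // 2                                        # prototype centre
    proto = [sinc((k - c) / P) * 0.5 * (1 + math.cos(math.pi * (k - c) / c))
             for k in range(N)]                       # Hann-type window centred on c
    return [proto[p::P] for p in range(P)]

bank = polyphase_fd_bank()
print(len(bank), len(bank[0]))                        # 64 branches of 8 taps
```

Selecting a different branch changes the fractional delay, which is why the delay resolution is limited to 1/P and the prototype order grows with the number of required delays.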

Lagrange Interpolation is the most commonly used approach. It accurately approximates the delay at low frequencies with a very simple implementation (in particular using the Farrow structure). There are many ways to design a maximally-flat FIR [13, 14]. In this work, we use the method described in [6] for obtaining a polynomial approximation:

$$y[n] = \sum_{k=-N_1}^{N_2} P_k(\alpha)\, x[n+k] = P_{-N_1}(\alpha)\, x[n-N_1] + \ldots + P_{N_2}(\alpha)\, x[n+N_2] \tag{1}$$

where  $P_k(\alpha)$  are the Lagrange polynomials and  $\alpha$  is the fractional delay. When large delays are required, the Lagrange method performs poorly at high frequencies, since its response is flat only at low frequencies. Consequently, it cannot be used in wideband applications unless the polynomial order is increased.
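For a given total delay D, the Lagrange fractional-delay coefficients have the classical closed form $h[k] = \prod_{i \ne k} (D - i)/(k - i)$. The sketch below computes them directly for order 7 (8 taps, as in Table 1). It is the direct per-delay form, not the polynomial-in-α Farrow form of [6], and the helper name is hypothetical.

```python
def lagrange_fd(order: int, D: float):
    """Lagrange fractional-delay FIR with order + 1 taps.

    h[k] = prod_{i != k} (D - i) / (k - i), k = 0 .. order. The response is
    maximally flat at omega = 0; D is best kept close to order / 2.
    """
    h = []
    for k in range(order + 1):
        coeff = 1.0
        for i in range(order + 1):
            if i != k:
                coeff *= (D - i) / (k - i)
        h.append(coeff)
    return h

h = lagrange_fd(7, 3.5)          # order 7, delay 3.5 samples (alpha = 0.5, worst case)
ramp = list(range(20))
y = [sum(h[k] * ramp[n - k] for k in range(8)) for n in range(7, 20)]
print(y[:3])
```

Because Lagrange interpolation is exact for polynomials up to its order, a ramp comes out delayed by exactly 3.5 samples; the errors appear only for high-frequency content, which is the wideband limitation noted above.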

The WLS method addresses this problem and is based on the minimization of the following weighted squared error function:

$$J = \int_{0}^{\delta \pi} \int_{-0.5}^{0.5} W(\omega, \alpha)\, |e(\omega, \alpha)|^2 \, d\alpha \, d\omega \tag{2}$$

where  $W(\omega,\alpha)$  is a non-negative weighting function,  $e(\omega,\alpha)$  is the difference between the actual and desired variable frequency responses, and  $\delta\pi$  is the upper edge of the approximation band.
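As a sketch of how Eq. (2) is commonly solved (a generic discretized formulation, not necessarily the exact procedure of [7]), the integrals can be replaced by sums over grids $\{\omega_i\}$ and $\{\alpha_j\}$, and the Farrow model can be written as a linear combination of known basis functions with real coefficients $c_m[k]$ collected in a vector $\mathbf{c}$. Here $D_I$ denotes the integer part of the delay and $H_d$ the desired response:

$$H(\omega,\alpha) = \sum_{m=0}^{M}\sum_{k=0}^{N} c_m[k]\,\alpha^{m} e^{-j\omega k} = \mathbf{c}^{T}\boldsymbol{\phi}(\omega,\alpha), \qquad H_d(\omega,\alpha) = e^{-j\omega (D_I+\alpha)}$$

$$J \approx \sum_{i}\sum_{j} W_{ij}\,\bigl|\mathbf{c}^{T}\boldsymbol{\phi}_{ij} - H_d(\omega_i,\alpha_j)\bigr|^{2}, \qquad W_{ij} = W(\omega_i,\alpha_j), \quad \boldsymbol{\phi}_{ij} = \boldsymbol{\phi}(\omega_i,\alpha_j)$$

Setting $\partial J / \partial \mathbf{c} = 0$ yields the linear normal equations

$$\Bigl[\sum_{i,j} W_{ij}\,\mathrm{Re}\{\boldsymbol{\phi}_{ij}\boldsymbol{\phi}_{ij}^{H}\}\Bigr]\mathbf{c} = \sum_{i,j} W_{ij}\,\mathrm{Re}\{\boldsymbol{\phi}_{ij}^{*}\, H_d(\omega_i,\alpha_j)\}$$

so the weighting function directly scales the contribution of each grid point, which is why changing $W$ reshapes where the approximation error (ripple) is concentrated.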

This technique achieves good performance with a lower filter order than the Lagrange method. For this method, too, there are many algorithms for obtaining a polynomial approximation; in this work, we used the method presented in [7], whose authors investigated the coefficient symmetry of the FIR subfilters of VFD filters.

In this work we implemented (1) a polyphase FIR with 64 branches, to generate a considerable number of fractional delays, (2) a polynomial approximation for the Lagrange method, and (3) the WLS method proposed in [7], with the same order as the Lagrange method (Table 1).

The following table summarizes the properties of the three methods.

The term "by constant" indicates that the input signal is multiplied by a constant, while "general" indicates that the input signal is multiplied by a variable coefficient. For the polyphase FIR, the variable coefficients are the taps of the subfilters, while for the other methods the variable coefficient is the fractional delay itself. We also point out that the chosen coefficients of the WLS method are constant and symmetric, so only 45 fixed multiplications are needed.

| Polyphase FIR                   | Lagrange method                                   | WLS method                                        |
|---------------------------------|---------------------------------------------------|---------------------------------------------------|
| Time-varying FIR                | Farrow structure                                  | Farrow structure                                  |
| # of branches: 64               | # of FIR filters: 7                               | # of FIR filters: 7                               |
| # of taps per FIR: 8            | # of taps per FIR: 8                              | # of taps per FIR: 13                             |
| Prototype filter order: 512     | Polynomial order: 7                               | Polynomial order: 7                               |
| # of multiplications: general 8 | # of multiplications: by constant 56, general 7   | # of multiplications: by constant 91, general 7   |

Table 1 Summary of the properties of each method that have been analyzed



Fig. 1 The three methods are compared with  $\alpha = 0.5$  (worst case). On the left, the magnitude comparison; on the right, the phase error response with respect to the ideal phase response



Fig. 2 The Farrow filter implements a polynomial that depends on a variable; in this case, the variable is the fractional delay  $\alpha$

Figure 1 shows the magnitude frequency response and the phase error response of the three methods.

The most critical blocks for the FPGA implementation are the memory, used for storing the filter parameters and the delay, and the general multiplications. In this context, Table 1 and Fig. 1 show that the WLS method does not need many resources and that its frequency behavior is suitable for wideband applications (Fig. 2).

Based on these results, we decided to implement the WLS method using the Farrow filter structure [15].
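The Farrow structure evaluates the delay polynomial with Horner's rule: fixed FIR subfilters $C_m$ process the input, and their outputs are combined as $y = C_0 x + \alpha(C_1 x + \alpha(C_2 x + \ldots))$, so each output sample needs only as many "general" multiplications by α as the polynomial order (7 for the order-7 designs of Table 1). The sketch below uses a toy first-order example (linear interpolation) for clarity; it is not the 7th-order WLS filter, and the helper names are hypothetical.

```python
def fir(coeffs, x):
    """Direct-form FIR with zero initial state: y[n] = sum_k coeffs[k] * x[n - k]."""
    return [sum(c * x[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(len(x))]

def farrow(subfilters, x, alpha):
    """Farrow structure: y = C0*x + alpha*(C1*x + alpha*(C2*x + ...)).

    subfilters[m] holds the fixed taps multiplying alpha**m. The polynomial in
    alpha is evaluated with Horner's rule, so each output sample needs
    len(subfilters) - 1 'general' multiplications by alpha.
    """
    branches = [fir(c, x) for c in subfilters]
    y = branches[-1]
    for branch in reversed(branches[:-1]):
        y = [b + alpha * v for b, v in zip(branch, y)]
    return y

# Toy first-order example (linear interpolation): y[n] = (1 - alpha) x[n] + alpha x[n-1]
LINEAR = [[1.0], [-1.0, 1.0]]
ramp = [float(n) for n in range(10)]
print(farrow(LINEAR, ramp, 0.25))
```

Changing the delay only changes the scalar α; the subfilter coefficients stay fixed, which is what makes the structure attractive for an FPGA with constant-coefficient multipliers.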

## 3 WLS Implementation and Optimization

In this implementation, we used the same WLS method as [7] to obtain the polynomial approximation. In particular, we modified the design procedure to obtain a polynomial of the same order as in [7] but with a reduced ripple. This improvement was obtained by changing the weighting function.

As in [7], we used weighting functions defined piecewise over two regions, but with different values. Table 2 compares the functions of [7] with those proposed in this paper.

Our polynomial is implemented using the same Farrow filter as [7]. Figure 3 shows that our WLS method has a greater bandwidth with smaller ripple and, consequently, a reduced degradation of the signal.

The polynomial approximation of the WLS method has been validated in MATLAB. The hardware implementation, based on a Farrow structure, has been tested using FPGA-in-the-Loop (FIL) simulation. This approach synchronizes a Simulink or MATLAB simulation with an HDL design running on an FPGA board.

The target board is a Xilinx ZedBoard; the post-synthesis resource utilization of the 16-bit Farrow filter is reported in Table 3.

| WLS of [7] | WLS of this work |
|------------|------------------|
| $W_1(\omega) = \begin{cases} 1 & \omega \in [0, 0.88\pi] \\ 47 & \omega \in (0.88\pi, 0.9\pi] \end{cases}$ | $W_1(\omega) = \begin{cases} 10^3 & \omega \in [0, 0.86\pi] \\ 0.2 & \omega \in (0.86\pi, \pi] \end{cases}$ |
| $W_2(\alpha) = \begin{cases} 1 & \alpha \in [0, 0.456] \\ 3700 & \alpha \in (0.456, 0.5] \end{cases}$ | $W_2(\alpha) = \begin{cases} e^{-10^3 \alpha} & \alpha \in [0, 0.49] \\ -10^3 & \alpha \in (0.49, 0.5] \end{cases}$ |

Table 2 Comparison of the weighting functions between [7] and this work

Fig. 3 The WLS methods compared for  $\alpha = 0.5$  (worst case): magnitude response (left) and phase error with respect to the ideal phase response (right)

*[Fig. 3 plots: magnitude (dB) and phase error response versus normalized frequency (×π rad/sample).]*

Table 3 Resource utilization of the Farrow filter

| Site type | Estimation | Available | Utilization % |
|-----------|------------|-----------|---------------|
| LUT       | 6277       | 53200     | 11.80         |
| FF        | 1701       | 106400    | 1.60          |
| DSP       | 7          | 220       | 3.18          |


Because the coefficients are constant, the FIR sub-filters of the Farrow structure were implemented with LUT-based fixed-coefficient multipliers. We also exploited the coefficient symmetry proposed in [7]. The DSP blocks of the FPGA implement the products between the sub-filter outputs and the desired fractional delay  $\alpha$  (see Fig. 2).
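Coefficient symmetry roughly halves the number of multiplications: when $c[N-n] = \pm c[n]$, the two input samples sharing a coefficient can be added (or subtracted) first and multiplied once. The sketch below, with hypothetical names, shows this folding for one FIR branch. For centred designs such as a Lagrange interpolator, branch $m$ satisfies $c_m[N-n] = (-1)^m c_m[n]$, so even branches would use `sign=+1` and odd branches `sign=-1`. It is a software illustration, not a model of the LUT-based FPGA mapping.

```python
import numpy as np


def folded_fir(x, c, sign):
    """Causal FIR y[k] = sum_n c[n] x[k-n], assuming c[N-n] == sign * c[n].

    Samples sharing a coefficient are pre-added (sign=+1) or pre-subtracted
    (sign=-1), so each distinct coefficient is multiplied only once.
    """
    c = np.asarray(c, dtype=float)
    x = np.asarray(x, dtype=float)
    N, L = len(c) - 1, len(x)
    xp = np.concatenate([np.zeros(N), x])  # xp[N + j] == x[j]; zeros = empty delay line
    y = np.zeros(L)
    for n in range((N + 1) // 2):
        # x[k-n] + sign * x[k-(N-n)] for every output index k
        pair = xp[N - n : N - n + L] + sign * xp[n : n + L]
        y += c[n] * pair
    if N % 2 == 0:
        y += c[N // 2] * xp[N // 2 : N // 2 + L]  # unpaired middle tap x[k - N/2]
    return y


def multiplies_per_sample(num_taps):
    """Coefficient multiplications per output sample after folding."""
    return (num_taps + 1) // 2
```

A 4-tap branch, for instance, needs 2 multiplications per output sample instead of 4.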

### 4 Conclusions

This work compares three methods for obtaining variable fractional delay (VFD) filters for digital beamforming applications in terms of performance, required resources, and flexibility. The WLS method was selected and implemented using a Farrow filter. The WLS design of [7] reduces the computational cost, and we modified its weighting functions to obtain a polynomial approximation with a wider bandwidth while keeping the same order as [7]. This study enables the creation of a low-complexity VFD for wideband digital beamforming.

## References

- 1. Rossi, T., De Sanctis, M., Maggio, F.: Evaluation of outage probability for satellite systems exploiting smart gateway configurations. IEEE Commun. Lett. **21**(7), 1541–1544 (2017)
- 2. Dalmasso, I., Galletti, I., Giuliano, R., Mazzenga, F.: WiMAX networks for emergency management based on UAVs. In: IEEE-AESS European Conference on Satellite Telecommunications (ESTEL), Rome, Italy, pp. 1–6 (2012)
- 3. Giuliano, R., Mazzenga, F., Neri, A., Vegni, A.M.: Security access protocols in IoT capillary networks. IEEE Internet Things J. **4**(3), 645–657 (2017)
- 4. Cardarilli, G.C., Di Nunzio, L., Fazzolari, R., Re, M., Rufolo, G., Bernocchi, G.: Analog chain calibration in digital beam-forming applications. ARPN J. Eng. Appl. Sci. **13**(2), 752–760 (2018)
- 5. Välimäki, V., Laakso, T.I.: Fractional delay filters: design and applications. In: Nonuniform Sampling, pp. 835–895. Springer, Boston, MA (2001)
- 6. Mitra, S.K., Kuo, Y.: Digital Signal Processing: A Computer-Based Approach. McGraw-Hill Higher Education, New York (2006)
- 7. Deng, T.-B.: Symmetry-based low-complexity variable fractional-delay FIR filters. In: IEEE International Symposium on Communications and Information Technology (ISCIT 2004), vol. 1. IEEE (2004)
- 8. Laakso, T.I., et al.: Splitting the unit delay [FIR/all pass filters design]. IEEE Signal Process. Mag. **13**(1), 30–60 (1996)
- 9. Bellanger, M., Bonnerot, G., Coudreuse, M.: Digital filtering by polyphase network: application to sample-rate alteration and filter banks. IEEE Trans. Acoust. Speech Signal Process., pp. 109–114 (1976)
- 10. Acciarito, S., Cardarilli, G.C., Giardino, D., Khanal, G.M., Di Nunzio, L., Fazzolari, R., Re, M., Silvestri, F.: FPGA implementation of a channelizer with 2048 channels utilizing USRP-SDR platform for satellite communications. In: Applications in Electronics Pervading Industry, Environment and Society (2017)
- 11. Cardarilli, G.C., Di Nunzio, L., Fazzolari, R., Fereidountabar, A., Giuliani, F., Re, M., Simone, L.: Comparison of jamming excision methods for direct sequence/spread spectrum (DS/SS) modulated signal. J. Theor. Appl. Inf. Technol. **95**(13), 2878–2888 (2017)
- 12. Cappello, S., Cardarilli, G.C., Di Nunzio, L., Fazzolari, R., Re, M., Albicocco, P.: Flexible channel extractor for wideband systems based on polyphase filter bank. J. Theor. Appl. Inf. Technol. **95**(16), 3841–3850 (2017)
- 13. Hermanowicz, E.: Explicit formulas for weighting coefficients of maximally flat tunable FIR delayers. Electron. Lett. **28**(20), 1936–1937 (1992)
- 14. Minocha, S., Roy, S.C., Kumar, B.: A note on the FIR approximation of a fractional sample delay. Int. J. Circuit Theory Appl. **21**(3), 265–274 (1993)
- 15. Farrow, C.W.: A continuously variable digital delay element. In: IEEE International Symposium on Circuits and Systems. IEEE (1988)

# Autonomous Sail Surface Boats, Design and Testing Results of the MOUNTAINS Prototype



Enrico Boni, Marco Montagni and Luca Pugi

Abstract In previous research activities, the authors developed a prototype of an innovative autonomous sail surface boat, called MOUNTAINS for short. Over the last year, this prototype has been the object of intense design and calibration activities, which produced an enhanced version that overcomes some limitations of the previous design and, more generally, represents a clear innovation with respect to the state of the art of existing sail drones. The most important innovation is a new sail configuration which, together with a revised version of the control system, assures higher and more robust performance of the vessel under different operating conditions. The proposed solution has been tested experimentally at Renai Lakes, and the results obtained are also described in this work.

# 1 Introduction: Autonomous Sail Drones, State of the Art

There is increasing interest in the study of autonomous surface sea vehicles [1], driven by the need for prolonged, low-cost patrolling missions, which are performed for various reasons such as:

- monitoring of relevant sites (marine archeological sites [2], protected marine areas, marine production sites, etc.);
- data collection over extended lake or marine areas, for example to verify the large-scale distribution of pollutants or of relevant resources (food, minerals, etc.), or to acquire large, coherent data sets needed to investigate large-scale phenomena such as climate change, marine currents, and the evolution of biological diversity.

E. Boni · M. Montagni

DINFO (Dipartimento di Ingegneria dell'Informazione), Università degli Studi di Firenze, Via di Santa Marta 3, 50139 Florence, Italy

© Springer Nature Switzerland AG 2019

L. Pugi (🖾) DIEF (Dipartimento di Ingegneria Industriale), Università degli Studi di Firenze, Via di Santa Marta 3, Florence, Italy e-mail: luca.pugi@unifi.it

S. Saponara and A. De Gloria (eds.), Applications in Electronics

*Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_54



Fig. 1 a, b Advanced autonomous sail vehicles, a the Saildrone<sup>TM</sup> and b the Datamaran<sup>TM</sup>

Sail propulsion is an ideal solution for assuring prolonged vehicle autonomy, since it relies on a renewable energy source (the wind). The energy needed by additional on-board systems, such as the navigation logic and the transported payload, must be provided by other renewable sources (for example, solar panels) that can be easily installed on the vehicle. Further advantages of this solution are its reduced impact in terms of noise and, more generally, of emissions, which are substantially null compared to the other propulsion systems most commonly used for autonomous vehicles [3].

It should be noticed that, apart from a few navigation and guidance surfaces such as the rudder, sail propulsion operates with essentially no moving parts in the water. This is a clear safety advantage, since moving parts of a propulsion system can harm living beings. It also improves vehicle reliability and robustness against various kinds of debris (natural or artificial) that could damage a conventional propulsion system with moving parts.

For all the reasons described above, there is wide industrial and research interest in the development of sail drones. In particular, there are some advanced commercial products, such as the Saildrone<sup>TM</sup> [4] and the Datamaran<sup>TM</sup> [5] shown in Figs. 1a, b, which are substantially based on the following common concepts:

• They are catamarans or, more generally, multi-hull vessels (such as the Saildrone<sup>™</sup> shown in Fig. 1a), in order to increase stability, especially against roll. Unfortunately, when a complete roll-over occurs, it is almost impossible to restore the correct configuration of the vehicle. Some systems, such as the active solution adopted for the AMS Datamaran<sup>™</sup> (Fig. 1b), are very interesting but really complex, since they affect the whole design of the vehicle.

• The sails are solid wing/aerodynamic profiles. Solid profiles are robust and easier to simulate (single-domain aerodynamic simulations are easier than multi-domain, aero-elastic ones). Actuation is also simpler, since it reduces to controlling the angular position of a rigid element, as in the example of Fig. 1a. However, solid profiles are heavier than conventional sails and, ultimately, more expensive and less efficient under variable or off-design conditions.

Existing solutions also highlight another interesting aspect of this kind of propulsion: to manage the sails satisfactorily, the on-board systems must continuously monitor or estimate wind and/or surface current conditions. Relevant meteorological or oceanographic information can therefore be obtained simply by logging the estimated navigation states of the vehicle, without any additional payload, since the vehicle itself acts as a servo-sensor.

The authors have developed an innovative prototype, the so-called MOUNTAINS [6] (MOnitoring with UNmanned Teleoperated or Autonomous INnovative Sail-drone), which is described in Sect. 2.

# 2 "Mountains" Prototype

The design of the MOUNTAINS prototype started from the specifications listed in Table 1.

| Specification | Value | Note |
|---------------|-------|------|
| Length | Less than 2 m | Easy transportation |
| Weight | 20 kg (without payload) | Manual handling |
| Sail | Conventional sail system | Lighter, cheaper, efficient |
| Sail actuation | Commercial servo-systems | Cheap assembly and maintenance |
| Sail control | Full sail setting | Sail settings adjusted according to different weather/operating conditions |
| Electronic control units | Cheap microcontrollers | Cheap assembly |
| Anemometer | Low-cost automotive components | Cheap assembly and maintenance; easy customization [6] |
| Energy management | Solar panels and backup batteries (assuring 48 h of operation) | Night functionality; robust against weather or system degradation |
| Autonomy | Virtually unlimited | |
| Use mode | Autonomous/remote control | Easy calibration with remote control |
| Production cost | 2000 € | Calculated considering only the cost of materials that can be purchased online for a single unit |

Table 1 Main specifications and features of MOUNTAINS prototype



Fig. 2 a, b Comparison between a MOUNTAINS version 1.0 (2017) and b the new 2.0 one

The general aim of these specifications is to exploit thousands of years of naval technology, improving the cost and performance of the proposed solution with advanced mechatronic technologies that are available in large volumes at low prices (the so-called "Makers Revolution" [7, 8]). This allows fast prototyping, distributed manufacturing, and the creation of light, small drones that can be easily transported, handled, and put into service by one or two people with little or no infrastructure (no lifting devices, a car or small van for transportation, and no need for specific laboratories, which are difficult to manage during field activities). Finally, the proposed solution could be scaled to create automated cruising and sailing technologies for leisure boats such as yachts: these applications are often only marginally considered by drone developers and thus represent an enormous potential market.

Figure 2a, b compares the design evolution of the sailing system from the first MOUNTAINS prototype (version 1.0, spring 2017) to the current one (version 2.0, spring 2018).

Unlike the catamarans shown in Figs. 1a, b, the proposed solution has a single hull stabilized by a relatively large keel, so that, in case of roll-over, the weight distribution returns the boat to the upright position. With respect to the original MOUNTAINS prototype (version 1.0), the current vehicle (version 2.0) is equipped with servo-regulated sails that can be automatically retracted or restored according to weather conditions, in order to prevent roll-over or sail damage. In case of roll-over, an automatic winding system (the object of a dedicated publication) can reduce the sail surface to nearly zero, making it easier to restore a stable vehicle configuration (an innovation with respect to the previous version of the vehicle). The adoption of two sails (mainsail and headsail) also provides a more balanced distribution of the propulsion forces due to the wind, which reduces rudder usage, increases efficiency, and further improves stability against roll-over. A simplified scheme of the proposed navigation loop is shown in Fig. 3: since the angular position of the sails is still controlled by a single actuator, the authors have substantially maintained the same control architecture for both vehicle versions. In the new version of the vehicle, the sail surface can be adjusted according to wind intensity, so the new controller schedules its gains according to the state of the sails.

Fig. 3 Simplified scheme of the inner navigation loop (low-level drone navigation system)

**Fig. 4** a, b Experimental behavior of the MOUNTAINS sail drone (test performed at Renai Lakes in spring 2018): a performed path, b time history of boom, wind and rudder angles
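Fig. 3 describes the inner loop only at block level, so the control law itself is not specified here. As a purely illustrative sketch of gain scheduling on the sail state, the code below assumes a PI heading controller acting on the rudder, with gains looked up from the fraction of sail area currently deployed. The gain values, the table keys, the 30° rudder limit, and all names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PIGains:
    kp: float
    ki: float


# Hypothetical gains, keyed by the fraction of sail area currently deployed.
GAIN_TABLE = {
    1.0: PIGains(kp=1.2, ki=0.05),  # full sail
    0.5: PIGains(kp=1.8, ki=0.08),  # partially wound
    0.0: PIGains(kp=2.5, ki=0.10),  # sails wound
}


def wrap_angle(a):
    """Wrap an angle in radians to [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi


class ScheduledHeadingController:
    """PI heading loop whose gains are selected from the current sail state."""

    def __init__(self, table, rudder_limit):
        self.table = table
        self.rudder_limit = rudder_limit
        self.integral = 0.0

    def step(self, heading_ref, heading, sail_area, dt):
        # Use the gains of the closest scheduled sail state.
        state = min(self.table, key=lambda s: abs(s - sail_area))
        g = self.table[state]
        err = wrap_angle(heading_ref - heading)
        u = g.kp * err + g.ki * (self.integral + err * dt)
        if abs(u) < self.rudder_limit:
            self.integral += err * dt  # integrate only when unsaturated (anti-windup)
        return max(-self.rudder_limit, min(self.rudder_limit, u))
```

Picking the nearest scheduled state is the simplest policy; interpolating between table entries would avoid gain jumps while the sails are being wound or unwound.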

Figure 4a, b shows preliminary results of experimental activities performed at the Renai Lakes near Florence. The results refer to a very small closed loop, performed under conditions that severely stress the ability of the control system to drive the drone correctly:

- Small distances between waypoints: under this condition, noise in localization and guidance produces large disturbances.
- Low wind speed (3–5 km/h): under these conditions, errors due to the poor signal-to-noise ratio of the feedback sensors (anemometers) become evident, and the turning policy is stressed by the modest propulsive forces.
- Square/rectangular closed-loop trajectory: the variable relative direction of the wind produces a rapid sequence of different navigation conditions; typically, the trajectory is chosen to verify four conditions (in irons/close-hauled, reach, running, and reach with the wind on the opposite side). Since the loop is short, it is quite easy to perform a test in which the wind conditions are almost known and constant.
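The four navigation conditions covered by the test loop depend only on the wind direction relative to the bow. A minimal classifier is sketched below. It assumes the wind direction is given as the direction the wind blows from and ignores the difference between true and apparent wind; the angular thresholds and the function name are hypothetical and would depend on the hull and sail configuration.

```python
def point_of_sail(wind_from_deg, heading_deg):
    """Classify the navigation condition from the wind direction relative to the bow.

    wind_from_deg: direction the wind blows FROM (degrees, compass convention).
    heading_deg:   boat heading (degrees, compass convention).
    Returns (condition, side), where side is the hull side the wind comes from.
    """
    rel = (wind_from_deg - heading_deg) % 360.0  # clockwise angle from bow to wind
    angle = min(rel, 360.0 - rel)                # 0 = head to wind, 180 = dead downwind
    # Dead ahead (rel == 0) is reported as starboard, dead astern as port, arbitrarily.
    side = "starboard" if rel < 180.0 else "port"
    # Hypothetical thresholds in degrees.
    if angle < 40.0:
        condition = "in irons"
    elif angle < 70.0:
        condition = "close-hauled"
    elif angle < 150.0:
        condition = "reach"
    else:
        condition = "running"
    return condition, side
```

On a square loop with a steady wind, the four legs fall into different classes, with the two reaches on opposite sides.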

Interestingly, the trajectory-tracking performance is quite satisfactory; moreover, the rudder profile is very regular and the rudder is almost aligned with the hull and keel for a large part of the path. This is a very positive feature in terms of efficiency, controllability of the system, and reliability of the servo-actuator.

# 3 Conclusions and Future Developments

The results of the current activities are quite encouraging, considering that this research is almost completely self-financed through crowdfunding. In particular, the proposed solution could be applied, with limited modifications, also to conventional boats not originally designed for this purpose. The authors are therefore working on a scalable retrofit kit and conversion procedure that could be adapted to some of the most common commercially available boat layouts [9]. Scaling criteria [10] are another interesting aspect of the problem, in terms of both research and industrial impact [11].

Acknowledgements The authors wish to thank all the students and private sponsors who have enthusiastically supported this initiative, coordinated by Marco Montagni (https://www.facebook.com/labarchettamagica/).

# References

- 1. Stelzer, R., Pröll, T.: Autonomous sailboat navigation for short course racing. Robot. Auton. Syst. **56**(7), 604–614 (2008)
- 2. Allotta, B., Pugi, L., Bartolini, F., Costanzi, R., Ridolfi, A., Monni, N., Gelli, J., Vettori, G., Gualdesi, L., Natalini, M.: The Thesaurus project, a long range AUV for extended exploration, surveillance and monitoring of archaeological sites. In: Computational Methods in Marine Engineering V: Proceedings of the 5th International Conference on Computational Methods in Marine Engineering, MARINE 2013, pp. 760–771 (2013)
- 3. Allotta, B., Pugi, L., Bartolini, F., Ridolfi, A., Costanzi, R., Monni, N., Gelli, J.: Preliminary design and fast prototyping of an autonomous underwater vehicle propulsion system. Proc. Inst. Mech. Eng. Part M: J. Eng. Marit. Environ. **229**(3), 248–272 (2015). https://doi.org/10.1177/1475090213514040
- 4. Voosen, P.: Fleet of sailboat drones could monitor climate change's effect on oceans. http://www.sciencemag.org. Online: https://doi.org/10.1126/science.aat5323
- 5. Images and technical documentation freely available at the official site of ASM: www.automarinesys.com
- 6. Pugi, L., Allotta, B., Boni, E., Guidi, F., Montagni, M., Massai, T.: Integrated design and testing of an anemometer for autonomous sail drones. J. Dyn. Syst. Meas. Control Trans. ASME **140**(5), 055001 (2018). https://doi.org/10.1115/1.4037840
- 7. Marsh, P.: The New Industrial Revolution: Consumers, Globalization and the End of Mass Production. Yale University Press (2012)
- 8. Gomes, L., Costa, A., Fernandes, D., Marques, H., Anjos, F.: Improving instrumentation support and control strategies for autonomous sailboats in a regatta contest. In: Robotic Sailing 2016, pp. 45–56. Springer, Cham (2017)
- 9. Gomes, L., Costa, A., Moutinho, F., Mota, R.: Attracting students to engineering through autonomous sailing yacht development. In: 2015 IEEE International Conference on Industrial Technology (ICIT), pp. 3252–3257. IEEE (2015)
- 10. Lavigne, E., Piquemal, B., Bourdon, A., Chesné, S., Guillou, G., Babau, J.P.: A process for evaluating parametric models for mechanical systems simulation: the case of a sailboat. In: Software and Hardware Architectures for Robots Control (2016)
- 11. Silva Junior, A.G.D., Lima Sa, S.T.D., Santos, D.H.D., Negreiros, Á.P.F.D., Souza Silva, J.M.V.B.D., Álvarez Jácobo, J.E., Garcia Gonçalves, L.M.: Towards a real-time embedded system for water monitoring installed in a robotic sailboat. Sensors **16**(8), 1226 (2016)

# A Low Cost ALS and VLC Circuit for Solid State Lighting



Massimo Ruo Roch and Maurizio Martina

Abstract Solid-state lighting is now widespread in residential, office, and industrial environments. Ambient light sensing to modulate lamp power is also common, but placing sensors inside a lamp is challenging, because the high flux of these sources easily saturates nearby light detectors. Usually, separate sensing devices must be introduced in the system, increasing complexity and cost. This work presents a methodology that allows a light-sensing device to be integrated inside a lamp, using low-cost circuitry to mitigate the interaction between high-power LED sources and sensing photodiodes. Moreover, the same circuit enables visible light communication among sources.

Keywords LED · ALS · VLC · IoT · Sensing

# 1 Introduction

White LEDs are rapidly pervading the lighting market, with large-scale marketing starting around 2010. Initially, however, these devices were used simply as plug-in replacements for traditional lamps, i.e., incandescent and fluorescent sources. The main driving force is energy savings: around 50% with respect to fluorescent lamps, and up to 80% compared to incandescent bulbs [1, 2].

As electronic devices, LEDs have specific characteristics that are dramatically different from those of their predecessors and that open new usage perspectives. The main differences are as follows:

- Quasi-linear dependence of the light flux on the LED current. This relationship makes it easy to modulate the total emitted power, dimming the light in the environment to save energy or to improve the user experience.
- Reduced dimensions. White LEDs are small (1 mm × 1 mm for a 1 W source), so very small lamps are now common.

M. Ruo Roch (🖂) · M. Martina

Dipartimento di Elettronica e Telecomunicazioni, Politecnico di Torino, Turin, Italy e-mail: massimo.ruoroch@polito.it


S. Saponara and A. De Gloria (eds.), Applications in Electronics

*Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_55

- High modulation speed. White LEDs can change their flux on a nanosecond timescale, so they can be used both for illumination and, given a proper modulation scheme, to carry information.
- Complex driving electronics. The light source, typically built from several parallel strings of series-connected LEDs, is driven by a current generator, with voltages in the range 10–100 V and currents from 100 mA up to several amperes. Moreover, regulations require a high power factor for the power supply unit.

In the age of the Internet of Things (IoT), it is also typical to introduce light sensors in the environment to collect illumination data and send them to a control unit able to change the flux emitted by the LED lamps [3].

The overall system is usually built from physically separate objects. Specifically, the following devices are required:

- Solid state lamps, with a control input to dim them.
- Light sensors, with a data output to send collected data.
- A central control unit, which modulates the lamp power according to the sensed information.
- A communication network connecting these building blocks. Several solutions exist, both wired (e.g., DALI) and wireless (e.g., ZigBee).

Separate sensors and lamps increase cost, since deployment costs are related to the number of different objects to be installed. On the other hand, integrating a light sensor inside an LED lamp is challenging, because the high flux of the LEDs can easily saturate nearby photosensors. Several topologies have been suggested to overcome this problem [4–6], but they still require the sensor to be optically shielded from the LEDs, which is not practical in compact lamps.

Visible Light Communication (VLC) is also rapidly gaining interest, in both indoor and outdoor environments [7–9], but the main focus is on achieving high bandwidth efficiency, and LED lamps are mainly used as transmitters [10]. In addition, spread-spectrum techniques are usually adopted [11].

In this work, instead, simplicity and cost are the driving forces, to allow large-scale deployment of ALS and VLC distributed systems fully integrated inside low-cost LED lamps. In this context, high bandwidth is not a priority, since IoT systems do not need to transmit much information (ambient light levels, temperature, people presence, etc.).

# 2 Proposed Solution

In a typical solid-state lamp, represented in Fig. 1, a Power Supply Unit (PSU) converts the AC line voltage into a constant DC current. In practice, this current is not perfectly constant: it is slightly modulated at twice the AC line frequency as a consequence of High Power Factor (HPF) requirements. The LEDs are fed directly by the PSU. Finally, the PSU, if dimmable, has a control input (DIM) used to change the amount of light emitted by the lamp.

Fig. 1 Standard LED lamp

Fig. 2 Modified LED lamp

In the proposed solution (Fig. 2), a control and sensing unit is interposed between the PSU and the LEDs. This block enables accurate Ambient Light Sensing (ALS) and, at the same time, modulates the emitted light so as to allow Visible Light Communication (VLC). The light sensor is a photodiode, which acts both as an ambient light sensor and as a receiver for optical signals coming from other lamps.

To accomplish these functions, the LEDs are turned off for a short interval (a negligible fraction of the cycle time) at each cycle of the AC line input. This interval is centered around the zero crossing of the sinusoidal input waveform. Using the AC line as a time reference gives the best results, as it automatically synchronizes every lamp connected to the same power line. Moreover, performing measurement and communication while the line voltage crosses zero minimizes errors due to interference, whether generated by the internal PSU or injected from outside.

After the LEDs are turned off, a short time must elapse to allow the photodetector and amplifier to settle. A sequence of samples is then gathered and stored for further processing. If no communication is needed, the average value of these samples represents the sensed ambient light level.
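The ALS computation described above reduces to discarding the samples acquired during the settling time and averaging the rest. A minimal sketch follows; the function name and the number of settling samples are hypothetical, and the latter would depend on the photodetector and amplifier.

```python
def ambient_light_level(samples, settle_count):
    """Average the samples taken while the LEDs are off, skipping the settling interval.

    settle_count (hypothetical): number of samples acquired before the
    photodiode and amplifier have settled.
    """
    usable = samples[settle_count:]
    if not usable:
        raise ValueError("no samples left after the settling interval")
    return sum(usable) / len(usable)


# The first two samples still carry the LED turn-off transient and are dropped.
level = ambient_light_level([5.0, 3.0, 1.0, 1.5, 0.5], settle_count=2)
```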

If communication is needed, a modulation scheme must be introduced instead. The chosen approach is based on the emission of a few pulses by the LEDs, injecting a controlled current. Zeros and ones are represented by different numbers of pulses at different frequencies. A key point is that the total energy must not change, so that the luminance of the lamp does not visibly vary with the transmitted pattern. As an example, the two modulations can be 4 pulses at 100 kHz and 8 pulses at 200 kHz, representing '0' and '1', respectively (Fig. 3).

Fig. 3 Modulation patterns

Fig. 4 Control and sensing block
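The equal-energy requirement can be checked with simple arithmetic: 4 pulses at 100 kHz last $4 \times 10\ \mu s = 40\ \mu s$, and 8 pulses at 200 kHz last $8 \times 5\ \mu s = 40\ \mu s$, so with the same pulse amplitude and duty cycle both symbols keep the LEDs on for the same total time. The sketch below assumes 50% duty-cycle rectangular pulses and a hypothetical 2 MHz sampling rate, and uses the sum of the samples as a proxy for the emitted light, relying on the quasi-linear flux-current relation of LEDs.

```python
def pulse_train(freq_hz, n_pulses, fs_hz, level=1.0):
    """Rectangular pulses with a 50% duty cycle (assumption), sampled at fs_hz."""
    period = round(fs_hz / freq_hz)  # samples per pulse period
    on = period // 2
    return ([level] * on + [0.0] * (period - on)) * n_pulses


def optical_energy(signal):
    """Proxy for emitted light: flux is roughly proportional to LED current."""
    return sum(signal)


FS = 2_000_000                    # hypothetical sampling rate, 2 MHz
bit0 = pulse_train(100e3, 4, FS)  # '0': 4 pulses at 100 kHz
bit1 = pulse_train(200e3, 8, FS)  # '1': 8 pulses at 200 kHz
```

Both symbols span 80 samples (40 µs) and carry the same energy, so the transmitted pattern does not change the average luminance.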

The internal structure of the circuit designed to implement the above functionalities is depicted in Fig. 4.

A photodiode with a spectral response resembling that of the human eye feeds a transresistance amplifier. The amplifier has a differential output because it must be mounted on the front of the lamp, near the LEDs, and the noise level along the cables can be quite high: the lamp PSU itself has nodes swinging from 0 to 700 V at 20–200 kHz, generating significant EMI. Moreover, the gain of the transresistance amplifier is limited, as it must not saturate even when the LEDs are turned on. As a consequence, the signal measured when the LEDs are turned off has a limited amplitude, and a single-ended approach would degrade precision and reliability.

The differential signal is fed to an instrumentation amplifier (A1), which eliminates common-mode noise, and then to a sample-and-hold (S/H). The S/H is in hold mode when the LEDs are on and in sample mode when they are off. Its task is to minimize transients at the input of the following stages, which simplifies the design (and allows a higher gain) of the subsequent band-pass filter A2. In fact, an abrupt change at the filter input (the drop in LED power, see Fig. 3) would be amplified anyway, and must be avoided. A2 is introduced to further improve the SNR and to increase sensitivity.

The outputs of A1 and A2 are sent to a microcontroller (MCU), which samples the signals while the LEDs are off and stores them for further computation. The signal-processing algorithms are executed while the LEDs are on, in the remaining portion of the AC line cycle.

Two different calculations must be performed:

• ALS. The measured luminance is obtained simply by averaging the samples acquired while the LEDs are off, but outside the communication interval.

• VLC. To detect the received bit, a differential approach is used, comparing the amplitudes of the frequency components corresponding to '0' and '1'. To boost sensitivity, the samples are pre-processed by two Goertzel filters centered on these frequencies. These filters [12, 13] are marginally stable, but they can be used efficiently on short sequences.

Fig. 5 Details of the modulator block
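The differential VLC decision can be sketched with two Goertzel recursions, one per symbol frequency. This is a minimal illustration, not the authors' firmware: it assumes a hypothetical 2 MHz sampling rate, ideal 50% duty-cycle pulses, and one 80-sample window per bit, so that 100 kHz and 200 kHz fall exactly on DFT bins 4 and 8. Each Goertzel filter has its poles on the unit circle (hence marginally stable), which is why it is run only over a short block and then restarted from zero state.

```python
import math


def goertzel_power(samples, freq_hz, fs_hz):
    """Squared magnitude of the DFT bin nearest freq_hz, via the Goertzel recursion."""
    n = len(samples)
    k = round(n * freq_hz / fs_hz)  # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0  # s[n-1] and s[n-2], reset for every block
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_bit(samples, fs_hz, f0_hz=100e3, f1_hz=200e3):
    """Differential decision: 1 if the f1 component is stronger than the f0 one."""
    p0 = goertzel_power(samples, f0_hz, fs_hz)
    p1 = goertzel_power(samples, f1_hz, fs_hz)
    return 1 if p1 > p0 else 0


def pulse_train(freq_hz, n_pulses, fs_hz):
    """Hypothetical test signal: 50% duty-cycle pulses of unit amplitude."""
    period = round(fs_hz / freq_hz)
    return ([1.0] * (period // 2) + [0.0] * (period - period // 2)) * n_pulses


FS = 2_000_000  # hypothetical ADC sampling rate
bit0 = pulse_train(100e3, 4, FS)
bit1 = pulse_train(200e3, 8, FS)
```

Adding a constant offset (for example, residual ambient light) does not change the decision, because over an integer number of periods a DC component contributes nothing to the non-zero bins.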

Light modulation is implemented by the modulator block of Fig. 4, whose simplified schematic is shown in Fig. 5. It consists essentially of a switch, used to interrupt the current flow from the lamp PSU, and a current generator driven by the DAC output of the MCU. The current generator is implemented with a current mirror with gain, which is needed because the LED driving current is typically two orders of magnitude greater than the output current capability of the DAC. The switch is implemented with a pair of source-coupled NMOS transistors. This topology neutralizes the effect of the body diodes, so that current flow can be blocked in both directions (right side of the figure).

#### 3 Implementation and Results

A physical implementation has been designed using off-the-shelf discrete components, and simulated taking parasitics into account. The circuit is designed to fit inside a T8 form-factor LED tube. Special care has been taken to reduce the overall cost, resulting in a module that costs less than 10 US\$.

Fig. 6 shows the relevant waveforms of the ALS and VLC blocks:

- the upper waveform is the signal at the input of the S/H;
- the middle waveform is the output of the S/H, showing almost complete cancellation of the LED turn-on/turn-off transients;
- the bottom waveform is the modulated signal at the output of the band-pass filter. It is sent to the A/D converter of the MCU and processed by the two Goertzel digital filters.



Fig. 6 Simulation output

#### References

- 1. Nakamura, S.: Energy savings by LED lighting. In: 2015 Conference on Lasers and Electro-Optics (CLEO), San Jose, CA, p. 1 (2015)
- 2. Garcia-Llera, D.: Optimizing LED lamps design for street lighting with staggered arrangement allowing energy saving strategies in a Lighting Smart Grid context. In: 2015 IEEE Industry Applications Society Annual Meeting, Addison, TX, pp. 1–8 (2015)
- 3. Dheena, P.P.F., Raj, G.S., Dutt, G., Jinny, S.V.: IOT based smart street light management system. In: 2017 IEEE International Conference on Circuits and Systems (ICCS), Thiruvananthapuram, pp. 368–371 (2017)
- 4. Belmonte, P.N.A., Chaves, L.M., Torres, F.S., de Lima Monteiro, D.W.: On overcoming photodetector saturation due to background illumination while maintaining high sensitivity by means of a tailored CMOS pixel. In: 2018 Global LIFI Congress (GLC), Paris, pp. 1–5 (2018)
- 5. Belmonte, P.N., French, P.J., Monteiro, D.D.L., Torres, F.S.: Linear high-dynamic-range bouncing pixel with single sample. In: Proceedings of 2013 International Image Sensor Workshop ISSW, pp. 12–16 (2013)
- 6. Spivak, A., Belenky, A., Fish, A., Yadid-Pecht, O.: Wide-dynamic-range CMOS image sensors: comparative performance analysis. IEEE Trans. Electron Devices 56(11), 2446–2461 (2009)
- 7. Ayub, S., Kariyawasam, S., Honary, M., Honary, B.: A practical approach of VLC architecture for smart city. In: 2013 Loughborough Antennas Propagation Conference (LAPC), pp. 106–111, Nov 2013
- 8. Boubakri, W., Abdallah, W., Boudriga, N.: A light-based communication architecture for smart city applications. In: 2015 17th International Conference on Transparent Optical Networks (ICTON), pp. 1–6, July 2015
- 9. Pathak, P.H., Feng, X., Hu, P., Mohapatra, P.: Visible light communication networking and sensing: a survey potential and challenges. IEEE Commun. Surv. Tutor. 17(4), 2047–2077 (2015)
- 10. Hranilovic, S.: On the design of bandwidth efficient signalling for indoor wireless optical channels. Int. J. Commun. Syst. 18(3), 205–228 (2005)
- 11. Martina, M., Roch, M.R., Ghirardi, F.: Implementation of a spread-spectrum-based smart lighting system on an embedded platform. In: De Gloria, A. (ed.) Applications in Electronics Pervading Industry, Environment and Society. Lecture Notes in Electrical Engineering, vol. 351. Springer, Cham (2016)
- 12. Oppenheim, A.V., Schafer, R.W., Buck, J.R.: Discrete-Time Signal Processing, 2nd edn. Prentice Hall, Upper Saddle River, NJ, 870 p. (1999). ISBN 01-375-4920-2
- 13. Zplata, F., Kasal, M.: Using the Goertzel algorithm as a filter. In: 2014 24th International Conference Radioelektronika, Bratislava, pp. 1–3 (2014)

# Chamberlin State-Variable Filter Structure in FPGA for Musical Applications



Adriana Ricci, Mattia Silvestrini, Massimo Conti, Marco Caldari and Franco Ripa

**Abstract** This paper describes the hardware implementation of the Chamberlin filter and how its structure can be mapped onto a Xilinx FPGA (Field Programmable Gate Array) for musical applications. It also compares the performance of the Chamberlin filter with that of other types of digital filters for different audio inputs. Finally, the structure of the Chamberlin filter has been implemented in C code to check whether high-level synthesis leads to further improvements in system optimization.

#### 1 Introduction

Digital electronic musical instruments usually generate sound using Pulse Code Modulation (PCM) synthesis. A PCM synthesizer is made up of several components (oscillators, controlled filters, controlled amplifiers and controller devices) that are combined to generate the desired sound.

Polyphony is the ability of an electronic instrument to process and play a certain number of voices simultaneously. The polyphony of a modern electronic musical instrument is at least 128 voices. Sound generation must be performed in a very short time to minimize the latency between an action, such as pressing a key, and the perception of the produced sound. The signal processing must therefore be fast enough to execute complex algorithms simultaneously on 128 voices.

Years ago, the core of a digital electronic musical instrument was an ASIC, since a general-purpose DSP was not able to perform the signal processing in real time. Technological improvements in the past few years have opened new paths for engineers to design applications on FPGAs. FPGAs provide the high performance of an ASIC while avoiding its high development cost and its inability to accommodate design modifications after production. Digital signal processing is an important area where FPGAs have found many applications in recent years.

Modern FPGAs contain over a million equivalent logic blocks and tens of thousands of flip-flops. Traditional logic design methods based on drawing logic diagrams are therefore impractical when a digital circuit may contain thousands of gates [1]. The complex algorithms of electronic musical instruments must be accurately translated into VHDL to reduce latency, hardware occupancy and, no less important, power dissipation. Some implementations of digital filters on FPGA are reported in [2]. Today, software tools can automatically synthesize a design from C/C++ source code down to the RTL architecture. Modern HLS (High-Level Synthesis) compilers, such as Xilinx Vivado HLS, support the evaluation of different architectures, allowing extensive design space exploration without much effort.

M. Caldari · F. Ripa KORG Italy S.P.A., Osimo, Italy e-mail: caldari@korg.it

F. Ripa e-mail: ripa@korg.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_56

A. Ricci · M. Silvestrini · M. Conti (🖂)

Dipartimento di Ingegneria dell'Informazione, Università Politecnica delle Marche, Ancona, Italy e-mail: m.conti@univpm.it

In our work, we consider the sound effect of the Chamberlin filter [3]. The filter is widely used but, up to now, has been implemented only in software and not in dedicated hardware. This paper explains the process of designing a Chamberlin state-variable filter in VHDL and in MATLAB. The design is tested with different input signals, and the filter outputs are compared to assess the effect of the quantization error between the fixed-point and floating-point designs. The VHDL design is then compared with FIR filter designs of two different orders, to evaluate the synthesis results on FPGA in terms of power and resource utilization. Finally, the VHDL design is replicated in C code to evaluate the performance after automatic high-level synthesis.

#### 2 Chamberlin State-Variable Filter

We study the Chamberlin filter as a component of the processing chain in digital music synthesis.

The Chamberlin structure is a form of IIR (Infinite Impulse Response) filter. It was introduced to computer music in [3] and has since gained wide popularity and undergone successive refinements.

Like other IIR filters, it has advantages over an FIR filter with a comparable magnitude response: it typically requires fewer coefficients and less storage for variables, and it has lower latency.

The poles in the transfer function make it possible to satisfy the filter constraints with a lower order than an FIR implementation requires. However, an IIR filter that is stable with infinite-precision coefficients may become unstable when its coefficients are quantized for implementation [4]. Beyond stability and the desired frequency response, the Chamberlin structure offers two further features. It allows both the cut-off frequency $f_c$ and the quality factor Q to be controlled. It also provides a complete set of filter types (low pass, high pass, band pass and band stop) from a single hardware structure.

Fig. 1 Chamberlin state-variable filter structure

Figure 1 shows the structure of the Chamberlin filter. F and D are its control parameters. F controls the natural frequency $f_0$, while D sets the damping ratio, which is inversely proportional to the quality factor Q. In the undamped case (D = 0), the natural frequency $f_0$ coincides with the cut-off frequency $f_c$. The following relation then holds, where $f_s$ denotes the sample rate (48 kHz for musical applications):

$$F = 2\sin\left(\pi f_c / f_s\right) \tag{1}$$

The transfer function of the Chamberlin filter is

$$H(z) = \frac{N(z)}{z^{2} + z\left(F^{2} + DF - 2\right) + \left(1 - DF\right)} \tag{2}$$

where the numerator N(z), which sets the zeros of the transfer function, depends on the selected output, as explained in detail in [5].
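The following minimal Python sketch makes the structure concrete. It implements one commonly published form of the Chamberlin recursion, assuming that the band-pass state of the previous sample feeds both the low-pass integrator and the damping term. Eliminating the states from the three update equations gives the band-pass transfer function $F(z^2 - z)/[z^2 + z(F^2 + DF - 2) + (1 - DF)]$, whose denominator matches Eq. (2). The exact arrangement of Fig. 1 may differ in detail. The function name and output keys are illustrative, and floating-point arithmetic is used throughout.

```python
from typing import Dict, List, Sequence


def chamberlin_svf(x: Sequence[float], f: float, d: float) -> Dict[str, List[float]]:
    """Run a Chamberlin state-variable filter over the input samples x.

    f is the frequency coefficient F of Eq. (1); d is the damping D (about 1/Q).
    Returns the low-pass, high-pass, band-pass and band-stop (notch) outputs.
    """
    low = 0.0   # state of the low-pass integrator
    band = 0.0  # state of the band-pass integrator
    out: Dict[str, List[float]] = {"low": [], "high": [], "band": [], "notch": []}
    for sample in x:
        low = low + f * band            # first integrator, uses previous band-pass value
        high = sample - low - d * band  # input minus low-pass and damping feedback
        band = band + f * high          # second integrator
        notch = high + low              # band stop = high pass + low pass
        out["low"].append(low)
        out["high"].append(high)
        out["band"].append(band)
        out["notch"].append(notch)
    return out
```

With a unit step at the input and a stable (F, D) pair, the low-pass and band-stop outputs settle to 1 and the high-pass and band-pass outputs settle to 0. This agrees with the DC gains of the four transfer functions.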

An analysis of the transfer functions of the filter outputs [5] shows their behaviour and the stability constraints for varying values of the parameters F and D. The discretized filter converges to the continuous one as F decreases, and a flat-band response is obtained for F = D = 1.

The most prominent deviation from ideal behaviour is found for large D. There the filter starts to peak around $f_s/2$ and finally becomes unstable as F increases. Consequently, the stability limits for the Chamberlin filter are taken as 0 < F < 1 and 0 < D < 2. Since F < 1, Eq. (1) gives $f_c < f_s/6$. With $f_s$ = 48 kHz, the maximum cut-off frequency for musical applications is therefore 8 kHz.
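The mapping between $f_c$ and F in Eq. (1) can be checked numerically, as can stability using the denominator of Eq. (2). The sketch below uses illustrative function names. It computes the two pole radii of $z^2 + z(F^2 + DF - 2) + (1 - DF)$ and declares an (F, D) pair stable when both poles lie strictly inside the unit circle. It is a generic root check, not the analysis of [5].

```python
import cmath
import math
from typing import Tuple


def f_from_fc(fc: float, fs: float) -> float:
    """Eq. (1): F = 2 sin(pi * fc / fs)."""
    return 2.0 * math.sin(math.pi * fc / fs)


def fc_from_f(f: float, fs: float) -> float:
    """Inverse of Eq. (1), valid for 0 <= F <= 2."""
    return fs * math.asin(f / 2.0) / math.pi


def pole_radii(f: float, d: float) -> Tuple[float, float]:
    """Magnitudes of the roots of z^2 + a1*z + a0, the denominator of Eq. (2)."""
    a1 = f * f + d * f - 2.0
    a0 = 1.0 - d * f
    root = cmath.sqrt(a1 * a1 - 4.0 * a0)   # complex sqrt covers real and complex poles
    return abs((-a1 + root) / 2.0), abs((-a1 - root) / 2.0)


def is_stable(f: float, d: float) -> bool:
    """True when both poles lie strictly inside the unit circle."""
    return max(pole_radii(f, d)) < 1.0


if __name__ == "__main__":
    print(fc_from_f(1.0, 48_000.0))   # cut-off reached at F = 1 with fs = 48 kHz
    print(is_stable(0.5, 1.0))
```

For example, F = 1 gives $f_c = f_s/6$, i.e. 8 kHz at 48 kHz. Because the check works directly on Eq. (2), it can also probe individual (F, D) pairs near the edges of the range quoted above, such as F = 1 with D close to 2.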

#### 3 Hardware Implementation

The Chamberlin filter is implemented as a VHDL architecture with the Xilinx Vivado 2017.2 tool and tested with different audio input signals. Finally, its structure has been mapped onto a Xilinx Zynq 7020 FPGA (ZedBoard).





In the hardware implementation of the filter, the numbers representing the variables and the coefficients must be stored in finite-size registers. In fixed-point arithmetic, the numbers are fractional, defined in the [-1, 1] range, and represented by a finite number of bits; this representation introduces a quantization error. In this work, we use the Qm.n notation, where m is the number of magnitude bits and n the number of fractional bits, so the total number of bits is b = m + n. The VHDL structure of the filter is tested with different values of n (Qm.4, Qm.7, Qm.9, Qm.11, Qm.13, Qm.15), up to a maximum total of 20 bits. Since the input signal is normalized, we verified that one bit for the integer part is enough to avoid overflow. In the rest of the work we therefore use m = 2.
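The Qm.n rounding and saturation described above can be emulated in software when comparing fixed-point and floating-point results. The sketch below is an illustration under stated assumptions, not the VHDL arithmetic used in this work. It assumes two's-complement numbers whose m bits include the sign bit (so b = m + n, as in the text), round-to-nearest, and saturation on overflow. The function name `to_qmn` is hypothetical.

```python
def to_qmn(x: float, m: int, n: int) -> float:
    """Quantize x to a two's-complement Qm.n value (b = m + n bits, sign bit in m).

    Rounds to the nearest code (Python's round() sends exact ties to the even
    code) and saturates to the representable range
    [-2**(m-1), 2**(m-1) - 2**-n].
    """
    scale = 1 << n                      # 2**n quantization steps per unit
    lo = -(1 << (m - 1)) * scale        # most negative integer code
    hi = (1 << (m - 1)) * scale - 1     # most positive integer code
    code = max(lo, min(hi, round(x * scale)))
    return code / scale


if __name__ == "__main__":
    for n in (4, 7, 9, 11, 13, 15):     # fractional widths tested in the text
        print(f"Q2.{n}: {to_qmn(0.3, 2, n)!r}")
```

In a full emulation this helper would be applied to every intermediate result of the filter recursion, not just to its input.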

As an example, Fig. 2 shows the impulse response of the band-pass output of the Chamberlin filter for different numbers of fixed-point representation bits. Note that the deviation from the floating-point case becomes relevant for n < 9.

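To measure the error introduced by quantization, we use the SQNR (Signal to Quantization Noise Ratio), defined as:

$$\mathrm{SQNR} = 10\log_{10}\left(\frac{\sum_{j} x_j^{2}}{\sum_{j}\left(x_j - x_{fixed,j}\right)^{2}}\right) \tag{3}$$

where $x_j$ are the samples of the floating-point output and $x_{fixed,j}$ the corresponding fixed-point samples.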

Figure 3 reports the SQNR as a function of the total number of bits b, for an impulse input and for the three outputs of the filter.
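Eq. (3) translates directly into code. In this minimal sketch, the inputs are assumed to be two equal-length sequences: the floating-point reference output and the corresponding fixed-point output, for example one obtained by applying Qm.n rounding as described above. The function name is hypothetical.

```python
import math
from typing import Sequence


def sqnr_db(x_float: Sequence[float], x_fixed: Sequence[float]) -> float:
    """Signal to quantization noise ratio of Eq. (3), in dB."""
    if len(x_float) != len(x_fixed):
        raise ValueError("sequences must have the same length")
    signal = sum(v * v for v in x_float)
    noise = sum((a - b) ** 2 for a, b in zip(x_float, x_fixed))
    if noise == 0.0:
        return math.inf  # identical sequences: no quantization error
    return 10.0 * math.log10(signal / noise)
```

As a rough rule for uniform quantization, each extra fractional bit halves the quantization step. The SQNR is therefore expected to grow by about 6 dB per bit, until other effects dominate.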

To compare hardware implementation complexity on the FPGA, 5-tap and 10-tap FIR filter structures were also implemented in VHDL, since FIR filter design is simpler than IIR filter design. The Chamberlin filter is evidently less complex than the FIR filters, while it maintains the filtering performance of a standard IIR filter, as shown in Fig. 4.

Figure 5 shows the resource utilization on a Zynq 7020. The FIR filters use more flip-flops than the Chamberlin filter, because they require more registers. The synthesis tool does not use DSP slices when b < 11, mapping the multipliers and adders onto LUTs instead. When b > 10, the Chamberlin filter requires fewer FFs than both FIR filters. Up to b = 19 it also requires fewer LUTs than both, and it uses fewer DSP slices than the FIR 10. The response of the Chamberlin filter is approximated much better by the FIR 10 than by the FIR 5, as shown in Fig. 4.

Power dissipation is an important specification in the design of modern electronic systems [6, 7], so we investigated this aspect in the design of the Chamberlin filter. The Xilinx XPE (XPower Estimator) tool was used to estimate the power consumption and resource utilization of the Chamberlin filter implementation on the FPGA. We tested the design for different total numbers of fixed-point representation bits. Figure 6 shows the dynamic power of all the filters under test. Note that the power consumption of the Chamberlin filter is lower than that of the 5-tap FIR filter up to a total number of precision bits b = 16.

Used resources on ZedBoard (Zynq 7020)

| b  | Chamberlin LUT | Chamberlin FF | Chamberlin DSP | FIR 5 LUT | FIR 5 FF | FIR 5 DSP | FIR 10 LUT | FIR 10 FF | FIR 10 DSP |
|----|----------------|---------------|----------------|-----------|----------|-----------|------------|-----------|------------|
| 8  | 249            | 16            | 0              | 47        | 36       | 0         | 141        | 78        | 0          |
| 9  | 341            | 18            | 0              | 95        | 44       | 0         | 161        | 88        | 0          |
| 10 | 394            | 20            | 0              | 98        | 49       | 0         | 207        | 98        | 0          |
| 11 | 78             | 22            | 3              | 85        | 54       | 2         | 163        | 108       | 4          |
| 12 | 86             | 24            | 3              | 95        | 59       | 3         | 182        | 118       | 5          |
| 13 | 92             | 26            | 3              | 100       | 64       | 3         | 192        | 128       | 5          |
| 14 | 99             | 28            | 3              | 105       | 69       | 3         | 202        | 138       | 6          |
| 15 | 106            | 30            | 3              | 114       | 74       | 3         | 221        | 148       | 6          |
| 16 | 114            | 32            | 3              | 125       | 79       | 3         | 240        | 158       | 6          |
| 17 | 120            | 34            | 3              | 130       | 84       | 3         | 250        | 168       | 6          |
| 18 | 127            | 36            | 3              | 135       | 89       | 3         | 260        | 178       | 6          |
| 19 | 134            | 38            | 3              | 145       | 94       | 3         | 299        | 190       | 6          |
| 20 | 363            | 40            | 3              | 269       | 100      | 3         | 531        | 200       | 6          |

Fig. 5 Resource utilization for varying b

Fig. 6 Dynamic power consumption

For b = 16, the power dissipation of the Chamberlin filter is 39% of that of the FIR 10. The advantage of the Chamberlin filter is even more evident because it provides four different outputs (low pass, high pass, band pass and band stop) at the same time. Four FIR filters would be needed to obtain the same outputs.

After mapping onto the FPGA, we implemented the same Chamberlin filter structure in C code, in fixed-point representation with b = 20. We used Xilinx Vivado HLS 2017.3 to evaluate the performance of the design generated by automatic synthesis.

Table 1 shows that, compared with the VHDL design, the HLS-synthesized design uses fewer LUTs and DSP slices but more flip-flops.

Table 1 Resource utilization for b = 20

| Resource | VHDL | HLS |
|----------|------|-----|
| LUT      | 363  | 268 |
| FF       | 40   | 183 |
| DSP      | 3    | 0   |

#### 4 Conclusions

This work presents the hardware implementation of the Chamberlin state-variable filter on an FPGA. The VHDL implementation was tested for different numbers of precision bits, and the results were compared with a floating-point design.

The performance obtained in terms of precision and subjective analysis is satisfactory for a number of fractional bits n > 14.

The Chamberlin state-variable filter VHDL design was also compared with FIR filter designs of two different orders. Compared with the low-order FIR filters tested (5 and 10 taps), the Chamberlin filter requires fewer hardware resources. It also provides four outputs at the same time: low pass, high pass, band pass and band stop.

Finally, the Chamberlin state-variable filter was implemented with a High-Level Synthesis tool to verify the optimization of the VHDL design. The results show a reduction in logic resources but an increase in flip-flop use, which suggests that further optimization of the design is possible.

#### References

- 1. Goslin, R., Jose, S.: A guide to using field programmable gate arrays (FPGAs) for application-specific digital signal processing performance
- 2. Chauhan, A.S., Lal, A.M., Maheshwari, V., Das, D.B.: Hardware implementation of DSP filter on FPGAs. Int. J. Comput. Appl. 62(16), 0975–8887 (2013)
- 3. Chamberlin, H.: Musical Applications of Microprocessors. ISBN 978-0810457683
- 4. Proakis, J., Manolakis, D.: Digital Signal Processing. Prentice Hall
- 5. Frei, B.: Digital Sound Generation. ICST Zurich University of the Arts, Zürich, Switzerland
- 6. Giammarini, M., Conti, M., Orcioni, S.: System-level energy estimation with Powersim. In: Proceedings of IEEE International Conference on Electronics, Circuits and Systems (ICECS) 2011, Beirut, pp. 723–726 (2011)
- 7. Vece, G., Conti, M.: Power estimation in embedded systems within a SystemC-based design context: the PK tool environment. In: Proceedings of the IEEE International Workshop on Intelligent Solutions in Embedded Systems WISES09, Ancona, Italy, pp. 179–184 (2009)

# Brake Blending and Optimal Torque Allocation Strategies for Innovative Electric Powertrains



Luca Pugi, Tommaso Favilli, Lorenzo Berzi, Edoardo Locorotondo and Marco Pierini

**Abstract** The development of electric vehicles is an opportunity in terms of environmental sustainability. It also offers interesting possibilities in terms of the control performance achievable by on-board systems that increase vehicle safety and stability by modulating the longitudinal efforts applied to the tires. This is not only a matter of performance, but also of standardization. A single integrated subsystem can safely control various vehicle dynamics functions that are currently implemented by different subsystems. This simplification and rationalization of the whole mechatronic system should also be of fundamental importance for the integration of autonomous or assisted driving functionalities, making system integration easier and safer.

#### 1 Introduction: Brake Blending for Automotive Applications

Conventional brake plants for road vehicles are mainly fluid-based. Hydraulic solutions [1] are preferred for small to medium-sized vehicles, while heavy trucks often adopt pneumatic schemes [2] closely resembling the conventional UIC railway brake [3]. Electric traction systems are spreading in the automotive sector, creating a growing opportunity to exploit their four-quadrant capabilities. Regenerative braking can then be used extensively, mainly to optimize energy consumption and consequently the autonomy of the vehicle, both for traction [4] and to feed on-board subsystems [5]. The amount of recovered energy depends on the characteristics of the typical driving cycle. It can exceed 20% of the energy spent for traction, especially in urban and suburban contexts [6]. This is a clear difference from railway applications of regenerative braking, where the technology was originally developed mainly to reduce the consumption of brake friction elements (pads and discs). Reduced friction wear is nonetheless a non-negligible benefit in terms of environmental impact [7].

Finally, unlike in railway applications, brakes also play an important role in controlling the lateral stability of the vehicle. This happens either indirectly, through systems such as ABS [8], or directly, through systems such as ESP [9]. For these applications, the superior dynamic response of electric motors could be exploited to further improve vehicle stability and controllability. This is especially true when wheels are actuated independently, which makes it possible to implement torque vectoring strategies [10].

The simultaneous management of braking forces produced by different plants and actuation systems is often called brake blending. The blended braking plants have different reliability and availability levels. The blending system must therefore also guarantee the requested braking performance, compensating for limitations arising from the current state of motors, drives and storage systems. In addition, the braking command strategy must let the user fully exploit the regenerative braking potential while maintaining comfort and intuitiveness [11]. In this work, the authors propose and describe innovative criteria for easily integrating optimal allocation and blending policies. These policies can fully exploit, in a relatively simple way, the torque vectoring capabilities of distributed electric traction systems.

L. Pugi (🖂) · T. Favilli · L. Berzi · E. Locorotondo · M. Pierini

DIEF (Dipartimento di Ingegneria Industriale), Università degli Studi di Firenze, Via di Santa Marta 3, Florence, Italy

e-mail: luca.pugi@unifi.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_57

#### 2 Reference Benchmark Configuration

In this work, the authors consider a generic electric vehicle equipped with independent in-wheel motors. The motors distribute traction among two or four motorized wheels, according to the simplified schemes shown in Fig. 1a–c.

For the proposed benchmark configuration, the authors assumed a nested layout of standard mechatronic systems that access the brake actuation. This reproduces a common scheme, also adopted by widely used simulation software such as Siemens Amesim™ [12], as shown in Fig. 2a and b. The conventional scheme (Fig. 2a) works in three stages:

- The brake demand is pre-processed by an EBD system, which optimally distributes braking performance among the wheels according to an estimated distribution of normal contact forces.
- This brake demand is then modified by an ESP system, which can activate and modulate the brake demand even during traction to correct the vehicle behavior. The stability criteria are mostly based on a comparison between the measured kinematics (yaw rate, lateral acceleration) and the expected ones (a tolerated trajectory with respect to ideal steering conditions).
- Finally, the resulting brake demand is processed by an inner loop corresponding to the ABS system. The ABS modulates the braking performance on each wheel to optimally exploit the available wheel-road adhesion. It avoids wheel locking and saturation of the available tangential forces, both of which are potentially dangerous for lateral stability as well.

The authors generalized this rather conventional scheme into the more general and innovative approach summarized in Fig. 2b:

- Since electric motors can operate in four quadrants, the brake demand is generalized into a longitudinal torque demand covering both traction and braking. This demand is split among the wheels according to the powertrain configuration and the estimated normal contact forces (modified EBD block in Fig. 2b).
- The torque reference on each wheel is then modified by the ESP (extended ESP-torque vectoring block in Fig. 2b). This block can modify both traction and braking efforts on the wheels, according to the chosen powertrain configuration and the limitations of the involved actuation systems.
- These reference efforts are processed by a hybrid ABS-ASR subsystem. The processed signals can be either positive or negative, since the same system controls both traction and braking maneuvers. For braking efforts, the system must also distribute the braking action between the conventional and the electric plants, performing the blending functionalities defined above.

The most noticeable differences between the two schemes of Fig. 2a and b concern two blocks: the allocation of longitudinal efforts by the extended ESP-torque vectoring block, and the brake blending block. For this reason, in this short work the authors concentrate on describing these two blocks.

#### 2.1 Optimal Allocation of Efforts for the Enhanced ESP System

To correct the vehicle trajectory, the ESP must allocate a known correction torque $M_z$. This torque is a function of the error between the desired yaw rate $r_{ref}$ and the estimated yaw rate $r_{feed}$, as in Eq. (1):

$$M_z = f(r_{ref}, r_{feed}) \tag{1}$$

Fig. 2 a, b. Comparison between conventional (a) and innovative (b) layouts of the mechatronic on-board systems aiming to modulate vehicle braking efforts

To allocate the torque $M_z$, the effort applied on each wheel is corrected by a term $T_{Cij}$. Here i and j are two indexes describing the position of the wheel (Front/Rear, Left/Right). The applied corrections must simultaneously satisfy relation Eq. (2) and the constraints $T_{\min ij}$, $T_{\max ij}$ of Eq. (3). These constraints depend on the availability of both actuation systems (electric motors and mechanical brakes).

$$M_{z} = \frac{t}{2}\left(T_{Cfr} - T_{Cfl} + T_{Crr} - T_{Crl}\right) = \underbrace{\begin{bmatrix} \frac{t}{2} & -\frac{t}{2} & \frac{t}{2} & -\frac{t}{2} \end{bmatrix}}_{B\ (\text{allocation matrix})} \underbrace{\begin{bmatrix} T_{Cfr} \\ T_{Cfl} \\ T_{Crr} \\ T_{Crl} \end{bmatrix}}_{T_{C}\ (\text{vector of corrections applied by ESP})} \tag{2}$$

$$T_{\min ij} \leq \underbrace{T^{*}_{ij} + T_{Cij}}_{T_{ij}} \leq T_{\max ij} \tag{3}$$

In Eq. (3), $T^{*}_{ij}$ and $T_{ij}$ represent the reference torque and the corrected one (after the application of $T_{Cij}$), respectively. Since Eq. (2) potentially has multiple solutions, it is solved using the Moore-Penrose pseudo-inverse of the torque allocation matrix B. The authors previously used this approach for optimal thrust allocation in underwater vehicles [13]. The pseudo-inverse solution is optimal in the sense that it minimizes the norm of the correction vector $T_C$. The applied correction is therefore minimal, while respecting, where possible, the constraints of Eq. (3). The solution also distributes efforts well among the wheels. This is especially useful in degraded adhesion conditions, because it avoids, as far as possible, saturating the available adhesion at the wheel-road contact patches.

The calculation is performed iteratively. At each computational step, the $T_{ij}$ values that violate the constraints of Eq. (3) are saturated to the corresponding limits, and the pseudo-inverse calculation is repeated. The iterative procedure is not a problem: it can be shown that no more than four iterations are necessary even in the worst numerical conditions. The cost of computing the pseudo-inverse of a matrix with four or fewer elements is almost negligible.
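The following minimal NumPy sketch shows the iterative pseudo-inverse allocation. It is one plausible reading of the procedure described above, not the authors' implementation. It makes these assumptions:

- the wheel order (fr, fl, rr, rl) follows Eq. (2), and `track` stands for t in Eq. (2);
- wheels that hit a limit are frozen there;
- the remaining yaw moment is redistributed over the free wheels with `numpy.linalg.pinv`.

Function and argument names are hypothetical. Each pass either satisfies all constraints or saturates at least one more wheel, so at most four passes are needed. This is consistent with the bound quoted above.

```python
import numpy as np


def allocate_yaw_moment(m_z, track, t_ref, t_min, t_max, max_iter=4):
    """Return corrected wheel torques T = T* + T_C for wheels ordered (fr, fl, rr, rl)."""
    t_ref = np.asarray(t_ref, dtype=float)
    t_min = np.asarray(t_min, dtype=float)
    t_max = np.asarray(t_max, dtype=float)
    b = 0.5 * track * np.array([1.0, -1.0, 1.0, -1.0])  # allocation row B of Eq. (2)
    t_c = np.zeros(4)                                  # corrections T_C
    free = np.ones(4, dtype=bool)                      # wheels not yet saturated
    for _ in range(max_iter):
        # Yaw moment still to be produced after the contribution of saturated wheels.
        residual = m_z - b[~free] @ t_c[~free]
        b_free = b[free].reshape(1, -1)
        # Minimum-norm solution of b_free @ t_c_free = residual.
        t_c[free] = (np.linalg.pinv(b_free) @ np.array([residual])).ravel()
        t = t_ref + t_c
        violated = free & ((t < t_min) | (t > t_max))
        if not violated.any():
            break
        # Saturate the violating wheels on their limits and freeze them.
        t_c[violated] = np.clip(t, t_min, t_max)[violated] - t_ref[violated]
        free &= ~violated
        if not free.any():
            break  # every wheel saturated: M_z may not be fully achievable
    return t_ref + t_c
```

For example, with `track = 1.6`, zero reference torques, a requested moment of 400 and a 50 upper limit on the front-right wheel, the first pass assigns ±125 to every wheel. The front-right wheel then saturates at 50, and the second pass spreads the remaining moment as ±150 over the other three wheels.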

#### 2.2 Brake Blending Controller

The ABS/ASR system further processes and limits the efforts $T_{ij}$ according to the available wheel-road adhesion. The resulting references are then processed by a low-level blending controller, which essentially performs the operations described in simplified form by Eq. (4):

$$\begin{cases} T_{ij\_ele} = \min\left(T_{ij},\, T_{ij\_reg\,lim}\right), \quad T_{ij\_brk} = \min\left(T_{ij} - T_{ij\_reg},\, 0\right) & \text{if } T_{ij} \le 0 \ \text{("braking")} \\ T_{ij\_ele} = \min\left(T_{ij},\, T_{ij\_tra\,lim}\right) & \text{otherwise ("traction")} \end{cases} \tag{4}$$

According to Eq. (4), for braking efforts the blending controller gives priority to electric efforts ($T_{ij\_ele}$) over conventional braking ($T_{ij\_brk}$). In both braking and traction, the electric efforts are limited by constraints ($T_{ij\_tra\,lim}$, $T_{ij\_reg\,lim}$). These limits can easily be calculated from the powertrain configuration and the state and availability of motors, drives and connected energy storage systems.
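The per-wheel logic of Eq. (4) can be sketched as follows. The sketch fixes a sign convention that Eq. (4) leaves implicit:

- braking torques are negative;
- the regenerative and traction limits `reg_lim` and `tra_lim` (hypothetical names for $T_{ij\_reg\,lim}$ and $T_{ij\_tra\,lim}$) are positive magnitudes.

Under this convention, the braking saturation is written as a `max` against `-reg_lim`. Any braking demand beyond the regenerative limit is assigned to the friction brake, which reproduces the priority given to electric efforts. The function name `blend` is illustrative.

```python
from typing import Tuple


def blend(t_req: float, reg_lim: float, tra_lim: float) -> Tuple[float, float]:
    """Split one wheel's torque request into (electric, friction brake) parts.

    Assumed convention: t_req < 0 means braking; reg_lim and tra_lim are
    positive magnitudes of the regenerative and traction limits.
    """
    if t_req <= 0.0:                      # braking: electric first
        t_ele = max(t_req, -reg_lim)      # regenerative part, saturated at the limit
        t_brk = min(t_req - t_ele, 0.0)   # remainder goes to the friction brake
    else:                                 # traction: motors only
        t_ele = min(t_req, tra_lim)
        t_brk = 0.0
    return t_ele, t_brk
```

For a braking request of -500 with a regenerative limit of 300, the sketch returns -300 of electric torque and -200 of friction braking. A traction request above `tra_lim` is simply clipped.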

#### 3 Preliminary Results

The proposed model was implemented in a preliminary "toy" version using MATLAB/Simulink 2018a. In particular, it uses the new Vehicle Dynamics Blockset™, which provides in MATLAB both advanced vehicle multibody models and relatively detailed tyre-road interaction models based on the widely accepted approach proposed by Pacejka. The potential advantages of the proposed approach can be understood from the preliminary results in Fig. 3a–c. They refer to a vehicle with four independent in-wheel motors (powertrain layout in Fig. 1b) performing a narrow curve (radius 18 m) in degraded adhesion conditions. These results show that the proposed model can implement and represent some typical behaviors of ESP and ABS systems.

Since the vehicle is equipped with four motors, the performed maneuvers make negligible use of the conventional brake. This has positive consequences for the friction brake discs and pads, which are not used, and for the recovered energy, since braking is almost entirely regenerative.



Fig. 3 a–c. Example of speed (a) and torque profiles (b) with respect to the performed trajectory (c)

#### 4 Conclusions and Future Developments

The results of the current activities, applied to a generic vehicle with distributed traction systems, are quite encouraging. As a next step, the authors are working to further improve the proposed approach. The aim is to generalize it to as many "use cases" as possible, which the industrial partners of the OBELICS project are expected to make available (current results refer to a preliminary toy model). An extended version of this paper, describing both the modelling methodologies and the obtained results in detail, will be the natural continuation of this preliminary work.

Acknowledgements This work is part of the OBELICS project, which has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 769506.

#### References

- Gerdes, J.C., Hedrick, J.K.: Brake system modeling for simulation and control. J. Dyn. Syst. Meas. Control. Trans. ASME 121(3), 496–503 (1999). https://doi.org/10.1115/1.2802501
- Subramanian, S.C., Darbha, S., Rajagopal, K.R.: Modeling the pneumatic subsystem of an S-cam air brake system. J. Dyn. Syst. Meas. Control. Trans. ASME 126(1), 36–46 (2004). https://doi.org/10.1115/1.1666893
- Pugi, L., Malvezzi, M., Papini, S., Vettori, G.: Design and preliminary validation of a tool for the simulation of train braking performance. J. Mod. Transp. 21(4), 247–257 (2013). https://doi.org/10.1007/s40534-013-0027-6
- Lv, C., Zhang, J., Li, Y., Yuan, Y.: Mechanism analysis and evaluation methodology of regenerative braking contribution to energy efficiency improvement of electrified vehicles. Energy Convers. Manag. 92, 469–482 (2015). https://doi.org/10.1016/j.enconman.2014.12.092
- Pugi, L., Pagliai, M., Nocentini, A., Lutzemberger, G., Pretto, A.: Design of a hydraulic servoactuation fed by a regenerative braking system. Appl. Energy 187, 96–115 (2017). https://doi.org/10.1016/j.apenergy.2016.11.047
- Berzi, L., Delogu, M., Pierini, M.: Development of driving cycles for electric vehicles in the context of the city of Florence. Transp. Res. D: Transp. Environ. 47, 299–322 (2016). https://doi.org/10.1016/j.trd.2016.05.010
- Kukutschová, J., Roubíček, V., Malachová, K., Pavlíčková, Z., Holuša, R., Kubačková, J., Mička, V., MacCrimmon, D., Filip, P.: Wear mechanism in automotive brake materials, wear debris and its potential environmental impact. Wear 267(5–8), 807–817 (2009). https://doi.org/10.1016/j.wear.2009.01.034
- Pasillas-Lépine, W.: Hybrid modeling and limit cycle analysis for a class of five-phase anti-lock brake algorithms. Veh. Syst. Dyn. 44(2), 173–188 (2007). https://doi.org/10.1080/00423110500385873
- Fennel, H., Ding, E.L.: A model-based failsafe system for the Continental TEVES electronic stability program (ESP) (No. 2000-01-1635). SAE Technical Paper (2000)
- Pugi, L., Grasso, F., Pratesi, M., Cipriani, M., Bartolomei, A.: Design and preliminary performance evaluation of a four wheeled vehicle with degraded adhesion conditions. Int. J. Electr. Hybrid Veh. 9(1), 1–32 (2017). https://doi.org/10.1504/ijehv.2017.082812
- Zhang, J., Lv, C., Gou, J., Kong, D.: Cooperative control of regenerative braking and hydraulic braking of an electrified passenger car. Proc. Inst. Mech. Eng. D: J. Automob. Eng. 226, 1289–1302 (2012). https://doi.org/10.1177/0954407012441884
- Siemens Amesim<sup>™</sup>, Technical documentation, release 14.00
- Pugi, L., Pagliai, M., Allotta, B.: A robust propulsion layout for underwater vehicles with enhanced manoeuvrability and reliability features. Proc. Inst. Mech. Eng. M: J. Eng. Marit. Environ. (2017). https://doi.org/10.1177/1475090217696569

# **Smart Coaster: An Example of IoT Design and Implementation**



Maurizio Rossi, Matteo Nardello and Davide Brunelli

**Abstract** The Internet of Things (IoT) is entering our daily life at a fast pace, with physical devices connected to the internet, collecting and sharing data. In this paper, we describe the design issues of a smart coaster that brings IoT technology to bars and clubs. The device enables new IoT services and facilities associated with traditional drinking glasses. Using this practical example, we present a methodological approach to IoT design.

# 1 Introduction

In this work, we present the design of a smart coaster featuring low power consumption, a compact form factor, a LoRaWAN communication interface and low cost. The system wakes up only when required and detects the fill level of the glass. It is cheap and easy to use and maintain, with a 3D-printed enclosure and wireless recharging.

The system has been designed to bring IoT facilities to clubs and similar commercial activities, where coasters can *interact* with the user and notify waiters only when needed, transparently to the customers.

M. Rossi · M. Nardello · D. Brunelli (🖂)

Department of Industrial Engineering, University of Trento, 38122 Trento, Italy e-mail: davide.brunelli@unitn.it URL: https://www.dii.unitn.it/

M. Rossi e-mail: maurizio.rossi@unitn.it

M. Nardello e-mail: matteo.nardello@unitn.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_58

Fig. 1 Coaster 3D design (left), PCB layout (middle) and functional scheme (right)

Low-Power Wide Area Network (LPWAN) technologies are used in this application, specifically LoRaWAN, which is gaining momentum in the market. LoRa is a wireless modulation that enables long-range communication at very low energy consumption and bit rate. The LoRaWAN stack defines the communication and security protocols that guarantee interoperability on top of the LoRa network. Many applications have been proposed in the literature in both the Industry 4.0 [1, 2] and IoT [3–5] fields. While LoRa alone is mostly used for its ease of customization [6], the security and privacy provided by the LoRaWAN stack allow a shorter time-to-market.
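The low bit rate of LoRa is easiest to see through the time on air of a frame. The sketch below follows the time-on-air formula published in Semtech's LoRa modem design guide (AN1200.13); the chosen parameters (125 kHz bandwidth, coding rate 4/5, 8-symbol preamble, explicit header, CRC on) are common defaults assumed here, not settings reported for the coaster. A 13-byte PHY payload corresponds to a LoRaWAN uplink with no application data (MHDR, FHDR, FPort and MIC).

```python
import math

def lora_time_on_air(payload_bytes, sf=7, bw_hz=125_000, cr=1, n_preamble=8,
                     explicit_header=True, crc=True, low_dr_opt=None):
    """Time on air (s) of one LoRa frame; cr=1..4 means coding rate 4/5..4/8."""
    t_sym = (2 ** sf) / bw_hz
    if low_dr_opt is None:
        # Low data rate optimisation is normally enabled when a symbol lasts > 16 ms.
        low_dr_opt = t_sym > 0.016
    t_preamble = (n_preamble + 4.25) * t_sym
    ih = 0 if explicit_header else 1
    de = 1 if low_dr_opt else 0
    num = 8 * payload_bytes - 4 * sf + 28 + 16 * int(crc) - 20 * ih
    n_payload = 8 + max(math.ceil(num / (4 * (sf - 2 * de))) * (cr + 4), 0)
    return t_preamble + n_payload * t_sym

if __name__ == "__main__":
    for sf in (7, 12):
        print(f"SF{sf}: {lora_time_on_air(13, sf=sf) * 1e3:.1f} ms")
```

Moving from SF7 (about 46 ms) to SF12 (about 1.16 s) trades range for a roughly 25 times longer transmission, and hence more radio energy per message.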

# 2 Design

The project has been developed around the *CMWX1ZZABZ* module by Murata,<sup>1</sup> because it integrates both the MCU and the LoRa radio in a very small footprint. Moreover, the ultra-low power consumption of its STM32L0 MCU makes it possible to achieve the energy autonomy required by the application. Figure 1-right shows the overall configuration of the system, while Fig. 1-left presents the final rendering of the prototype.

The system includes multiple interfaces to detect the presence and the state of the glass, a button that can be used to call the waiter, and an inductive charging circuit to charge the built-in battery.

## 2.1 3D-Printed Coaster

The coaster is a 3D-printed stack of interlocking layers. From the bottom up, the first layer acts as a support, the second holds the electronics and the piezo sensors, the third distributes the pressure uniformly over the sensors, and the top ring holds the whole structure together.

<sup>1</sup>CMWX1ZZABZ - http://wireless.murata.com/RFM/data/type\_abz.pdf.



Fig. 2 Response of FSR 406 using  $37 \times 37$  mm (left) and  $18 \times 18$  mm (right) tips

## 2.2 Sensors

Two kinds of sensors were used to detect glass presence and fill level. A Force Sensitive Resistor (*FSR*) was chosen to detect the physical pressure, i.e., the weight, from which the fill level is estimated. This sensor is composed of two layers of semi-conductive material separated by a non-conductive spacer, and it offers a variable resistance: the higher the applied pressure, the lower the resistance of the junction between the two layers. Typically, the resistance varies from  $1 \text{ M}\Omega$  (unloaded) to  $100 \Omega$  (maximum load).

The transfer function is non-linear; moreover, the output depends on the sensor's shape (square *FSR 406* or circular *FSR 402*) and on how the force is applied. The best detection occurs when the tip (on the bottom side of the glass support) covers the whole sensor surface, so that the pressure is homogeneous. The readout is performed through a voltage divider connected directly to the built-in 12-bit ADC of the MCU.

Figure 2 shows how the output changes with the tip size,  $37 \times 37$  mm versus  $18 \times 18$  mm, using the same *FSR 406* sensor. The graphs show that a larger tip (and sensor) provides better and more stable detection of the glass fill level.
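The divider readout can be sketched as follows, assuming the FSR sits between the reference voltage and the ADC pin with a fixed resistor to ground. The 3.3 V reference and the 10 kΩ fixed resistor are hypothetical values; only the 12-bit ADC and the 1 MΩ to 100 Ω resistance range come from the text. The inverse function shows how firmware could recover the FSR resistance (and from it, via a calibration, the load) from a raw ADC code.

```python
ADC_BITS = 12
ADC_FULL_SCALE = 1 << ADC_BITS  # 4096 codes for a 12-bit ADC
V_REF = 3.3                     # hypothetical reference voltage (V)
R_FIXED = 10_000                # hypothetical fixed divider resistor (ohm)

def divider_voltage(r_fsr, r_fixed=R_FIXED, v_ref=V_REF):
    """FSR between V_REF and the ADC pin, R_FIXED from the pin to ground."""
    return v_ref * r_fixed / (r_fixed + r_fsr)

def adc_code(r_fsr):
    """Ideal 12-bit conversion (truncating) of the divider output."""
    v = divider_voltage(r_fsr)
    return min(ADC_FULL_SCALE - 1, int(v / V_REF * ADC_FULL_SCALE))

def fsr_resistance(code, r_fixed=R_FIXED):
    """Invert the divider: estimate the FSR resistance from an ADC code."""
    if code <= 0:
        return float("inf")  # no measurable load
    return r_fixed * (ADC_FULL_SCALE / code - 1)
```

Because the FSR resistance falls steeply with force, the ADC code rises with weight but not linearly, which is consistent with the non-linear transfer function noted above; a per-coaster calibration table would still be needed to map codes to fill levels.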

To detect glass presence while minimizing power consumption, we included a self-powered wake-up system based on a piezoelectric disc. This device converts a dynamic stress into a voltage using charge pump circuitry with a fast rail-to-rail JFET amplifier, and the resulting signal is sufficient to wake up the MCU. Such piezo devices are very versatile and are used in a wide range of applications, for example in healthcare [7] or wellbeing [8, 9].

## 2.3 Battery and Charging System

To guarantee a working life of at least 50 hours, we chose a 100 mAh LiPo battery (Table 1). The battery, sized  $15 \times 18 \times 7$  mm to fit in the coaster, is charged through an inductive charging coil and a LiPo charger controller, both embedded in the custom PCB.

| Parameter                       | Symbol        | Value  | Unit |
|---------------------------------|---------------|--------|------|
| *Consumption data*              |               |        |      |
| Worst-case consumption          | $C_{worst}$   | 0.0086 | mAh  |
| Sending time                    | $T_{send}$    | 4      | s    |
| Op-amp consumption              | $C_{opamp}$   | 5      | mA   |
| *Assumptions*                   |               |        |      |
| Mean working time               | $t_{hour}$    | 8      | h    |
| Number of messages sent per day | $num_{day}$   | 500    | #    |

Table 1 Energy requirement computed for the IoT device



Fig. 3 Firmware flow chart

## 2.4 Software

**Initialization**: when the microcontroller turns on, it configures the necessary registers and peripherals (see Fig. 3).

**Sleep mode**: the device remains in sleep mode until an interrupt occurs. This mode reduces power consumption, although some peripherals remain active.

**Wake up**: the device wakes up from sleep mode when a glass is placed on it or when the user presses the *call waiter* button. In the first case, the piezo generates an interrupt that wakes up the MCU, which, after a few microseconds, enables the ADC reading of the FSR. If the ADC value is in the empty-glass range, the device sends the message *empty glass* through the LoRaWAN protocol. In the second case, the user calls the waiter simply by pressing the button, and the device sends the message *waiter is needed*. In both cases, the battery status (*low battery*) is included in the payload. Finally, an LED blinks and the device returns to sleep mode.
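The wake-up decision described above can be sketched as a small host-side model of the firmware logic. The ADC range for an empty glass (`EMPTY_GLASS_RANGE`) is a hypothetical calibration value, and the ADC read is passed in as a callable so the button path can be seen not to touch the FSR; the microsecond settling delay, the LED blink and the return to sleep are left to the caller.

```python
from enum import Enum
from typing import Callable, Optional

class WakeSource(Enum):
    PIEZO = "piezo"    # a glass was placed on the coaster
    BUTTON = "button"  # the "call waiter" button was pressed

# Hypothetical calibration: 12-bit ADC codes read with an empty glass.
EMPTY_GLASS_RANGE = (200, 1500)

def handle_wakeup(source: WakeSource,
                  read_fsr_adc: Callable[[], int],
                  battery_low: bool) -> Optional[dict]:
    """Decide which message (if any) to send after a wake-up interrupt."""
    if source is WakeSource.PIEZO:
        adc = read_fsr_adc()          # ADC is enabled only on this path
        lo, hi = EMPTY_GLASS_RANGE
        if not lo <= adc <= hi:
            return None               # glass not empty: nothing to report
        message = "empty glass"
    else:
        message = "waiter is needed"  # button path: no FSR reading
    # The battery status travels in the payload in both cases.
    return {"message": message, "low_battery": battery_low}
```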

The LoRaWAN protocol implementation on the selected SoC platform features an extremely low CPU load, reduced latency, a small memory footprint and low-power timing services, and it ensures easy application integration with security and privacy. The implemented network architecture connects multiple coasters (Class A end devices) to a LoRaWAN gateway implemented on a Raspberry Pi, which uses The Things Network as the network server. The data packet contains the data related to the coaster and the unique coaster *ID*.
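The payload layout is not specified in the text, so the following is only a hypothetical encoding: a big-endian 16-bit coaster ID, one byte for the event and one byte of flags carrying the low-battery bit. The event codes and field widths are assumptions; a compact fixed layout like this keeps the frame, and therefore the time on air, short.

```python
import struct

EVENT_EMPTY_GLASS = 0x01    # hypothetical event codes
EVENT_WAITER_NEEDED = 0x02
FLAG_LOW_BATTERY = 0x01     # hypothetical flag bit

_LAYOUT = ">HBB"  # big-endian: uint16 coaster ID, uint8 event, uint8 flags

def pack_payload(coaster_id, event, low_battery):
    """Encode one coaster message into 4 bytes."""
    flags = FLAG_LOW_BATTERY if low_battery else 0
    return struct.pack(_LAYOUT, coaster_id, event, flags)

def unpack_payload(data):
    """Decode a 4-byte payload on the application-server side."""
    coaster_id, event, flags = struct.unpack(_LAYOUT, data)
    return coaster_id, event, bool(flags & FLAG_LOW_BATTERY)
```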

# 3 Power Consumption

We assessed the power consumption of the system. Figure 4-left shows the current drawn when the MCU reads the weight on the coaster and then sends the message, while Fig. 4-right shows the *call waiter* operation triggered by the button on the coaster. Both profiles include the consumption of the LoRa radio.
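A per-operation figure such as $C_{worst}$ in Table 1 can be obtained by integrating a measured current trace over time. The sketch below applies the trapezoidal rule to sampled (time, current) pairs; the sample values in the usage are made up for illustration and do not come from Fig. 4.

```python
def charge_mah(times_s, currents_ma):
    """Integrate a sampled current trace (mA) over time (s) with the
    trapezoidal rule and return the charge in mAh."""
    if len(times_s) != len(currents_ma) or len(times_s) < 2:
        raise ValueError("need two equally long sequences with >= 2 samples")
    mas = 0.0  # accumulated charge in mA*s
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        mas += 0.5 * (currents_ma[k] + currents_ma[k - 1]) * dt
    return mas / 3600.0  # 1 mAh = 3600 mA*s

# Made-up trace: 10 mA for 3.6 s corresponds to 0.01 mAh.
print(charge_mah([0.0, 3.6], [10.0, 10.0]))
```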

Fig. 4 Current consumption of FSR with LoRa tasks (left) and waiter call with LoRa (right)

A detail of the LoRa consumption is shown in Fig. 5. The first and highest peak represents the transmission (Tx), followed by the two receive windows (Rx) typical of Class A nodes. In sleep mode, the consumption is approximately 5 mA, due to the JFET amplifier. The step after the Tx peak, which is present in both graphs of Fig. 4 but not in Fig. 5, is due to the MCU consumption before re-entering the sleep state. The main difference between the two profiles in Fig. 4 is that when the glass is placed on the coaster (left), the ADC is enabled for sensor acquisition, whereas during the waiter call (right) it is not.
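The two Rx windows that follow each uplink of a Class A device open at fixed delays after the end of the transmission. The sketch uses the default delays from the LoRaWAN specification (RECEIVE_DELAY1 = 1 s, RECEIVE_DELAY2 = 2 s); network servers may configure a longer RX1 delay, so the actual values used with The Things Network in this setup are an assumption and are left as parameters.

```python
RECEIVE_DELAY1_S = 1.0  # LoRaWAN default delay of RX1 after the end of the uplink
RECEIVE_DELAY2_S = 2.0  # default delay of RX2 (RECEIVE_DELAY1 + 1 s)

def class_a_rx_windows(tx_start_s, time_on_air_s,
                       rx1_delay_s=RECEIVE_DELAY1_S, rx2_delay_s=RECEIVE_DELAY2_S):
    """Opening times (s) of the two receive windows of a Class A end device."""
    tx_end = tx_start_s + time_on_air_s
    return tx_end + rx1_delay_s, tx_end + rx2_delay_s
```

This timing explains why the radio draws current in two short bursts well after the Tx peak, even when no downlink is received.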

# 4 Conclusions

In this paper, we presented a smart and compact device that integrates all the features of an innovative and useful IoT-enabled coaster. The main design steps focused on ultra-low power consumption, using a state-of-the-art SoC with a built-in LoRaWAN radio and protocol stack.

# References

- Tessaro, L., Raffaldi, C., Rossi, M., Brunelli, D.: Lightweight synchronization algorithm with self-calibration for industrial LORA sensor networks. In: 2018 Workshop on Metrology for Industry 4.0 and IoT, pp. 259–263. Brescia (2018)
- Tessaro, L., Raffaldi, C., Rossi, M., Brunelli, D.: LoRa performance in short range industrial applications. In: 2018 International Symposium on Power Electronics, Electrical Drives, Automation and Motion (SPEEDAM), pp. 1089–1094. Amalfi, Italy (2018)
- Dalpiaz, G., Longo, A., Nardello, M., Passerone, R., Brunelli, D.: A battery-free non-intrusive power meter for low-cost energy monitoring. In: 2018 IEEE Industrial Cyber-Physical Systems (ICPS), pp. 653–658. St. Petersburg (2018)
- Rizzi, M., Ferrari, P., Flammini, A., Sisinni, E.: Evaluation of the IoT LoRaWAN solution for distributed measurement applications. IEEE Trans. Instrum. Meas. 66(12), 3340–3349 (2017)
- Neumann, P., Montavont, J., Noël, T.: Indoor deployment of low-power wide area networks (LPWAN): a LoRaWAN case study. In: 2016 IEEE 12th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob), pp. 1–8. New York, NY (2016)
- Saravanan, M., Das, A., Iyer, V.: Smart water grid management using LPWAN IoT technology. In: Global Internet of Things Summit (GIoTS), pp. 1–6. Geneva (2017)
- Rossi, M., Rizzi, A., Lorenzelli, L., Brunelli, D.: Remote rehabilitation monitoring with an IoT-enabled embedded system for precise progress tracking. In: 2016 IEEE International Conference on Electronics, Circuits and Systems (ICECS), pp. 384–387. Monte Carlo (2016)
- Rossi, M., Khouia, A.O., Lorenzelli, L., Brunelli, D.: Energy neutral 32-channels embedded readout system for IoT-ready fitness equipments. In: Sensors Applications Symposium (SAS), 2016 IEEE, pp. 1–6. Catania (2016)
- Brunelli, D., Maggiorotti, M., Benini, L., Bellifemine, F.L.: Analysis of audio streaming capability of ZigBee networks. In: Wireless Sensor Networks. Lecture Notes in Computer Science, LNCS, vol. 4913, pp. 189–204. Springer, Heidelberg (2008)

# **IP Generator Tool for Efficient Hardware Acceleration of Self-organizing Maps**



Daniele Giardino, Marco Matta, Marco Re, Francesca Silvestri and Sergio Spanò

**Abstract** In this paper, the authors present an IP generator for FPGA-based hardware acceleration of Kohonen's Self-Organizing Maps (SOM). The IP generator is implemented in MATLAB and lets the user design an efficient FPGA hardware accelerator with several settings, such as the number of features and the number of neurons. The optimization is achieved by applying some approximations to the original SOM algorithm; these modifications do not affect the functionality of the map. The generated IP cores can be used for both training and inference, and the software can check the clustering performance.

# 1 Introduction

In recent years, Machine Learning (ML) algorithms have been introduced in several fields [1–3]. The increasing interest in ML can be attributed both to the high computational capabilities of modern electronic devices and to the introduction of new technologies [4–8]. Moreover, ML can also be applied in scenarios such as the Internet of Things (IoT) [9], satellite [10] or unmanned aerial systems (UAS) [11], where, due to the limitations of some devices, intelligence should be moved to specific elements of the network.

D. Giardino · M. Matta · M. Re · F. Silvestri · S. Spanò (⊠) University of Rome Tor Vergata, Rome, Italy e-mail: spano@ing.uniroma2.it

D. Giardino e-mail: giardino@ing.uniroma2.it

M. Matta e-mail: matta@ing.uniroma2.it

M. Re e-mail: re@ing.uniroma2.it

F. Silvestri e-mail: f.silvestri@ing.uniroma2.it

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_59

Currently, the market offers several electronic devices for efficiently implementing ML systems. In this context, FPGAs represent a very interesting choice thanks to their flexibility and high computational power.

The literature proposes multiple examples of FPGA-based hardware accelerators for machine learning. However, the design of such systems requires considerable effort, as it involves the use of hardware description languages (HDLs).

In this work, the authors present an IP generator that produces optimized VHDL code for Self-Organizing Maps (SOM). The IP generator lets the designer generate SOMs with several settings, for example the number of features (dimensions) and the number of neurons. The generated IP cores can be used for both the training and the inference stages. Our architecture reduces the hardware complexity of the map without affecting the clustering performance; this is achieved through some mathematical approximations of the original algorithm.

The SOM algorithm proposed by Teuvo Kohonen [12] uses an unsupervised learning method to map high-dimensional input data to a low-dimensional space, typically two-dimensional. The neurons of a SOM are arranged in a two-dimensional array, and each of them is assigned an *N*-dimensional weight vector  $\vec{m}_i$ .

In the traditional training mode of SOM, a set of *N*-dimensional input vectors  $\vec{x}$ , representing the examples for the training process, is fed to the algorithm one at a time.

The update process is based on the winner neuron, also known as the Best-Matching Unit (BMU). Since the weight vectors and the inputs have the same dimension, they can all be represented in the same *N*-dimensional space. The winner neuron, identified in this work by the subscript *w*, is the one closest to the considered input.
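The winner search can be sketched in a few lines: the BMU is the neuron whose weight vector has the smallest Euclidean distance to the input, as in the original SOM. This is a plain software illustration, not the generated VHDL.

```python
import math

def best_matching_unit(weights, x):
    """Index w of the neuron whose weight vector is closest to input x
    (Euclidean distance, as in the original SOM formulation)."""
    return min(range(len(weights)), key=lambda i: math.dist(weights[i], x))

# Three neurons in a 2-D feature space; input (6, 4) is closest to (5, 5).
print(best_matching_unit([[0, 0], [5, 5], [10, 0]], [6, 4]))
```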

The update formula for the weight vectors, Eq. (1), depends on the winner neuron and on a function  $h_{wi}$  called the *neighbourhood* function. All neurons are updated simultaneously.

$$\vec{m}_{i}(t+1) = \vec{m}_{i}(t) + h_{wi}(t) \cdot \left[\vec{x} - \vec{m}_{i}\right]$$
(1)

The neighbourhood function is centred on the winner neuron and decreases monotonically as the distance from it increases. The most common neighbourhood function is the Gaussian one (GNF), defined as:

$$h_{wi}(t) = \eta(t) \exp\left(-\frac{\|\vec{m}_i - \vec{m}_w\|^2}{2\sigma^2(t)}\right)$$
(2)

where  $\eta(t)$  is the learning rate of the network and the variance  $\sigma^2(t)$  represents the neighbourhood radius; both should be monotonically decreasing functions of time.
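A floating-point reference model of Eqs. (1) and (2) is useful as a golden model against which a hardware version can be compared. In this sketch, `eta` and `sigma` are passed per step so that the caller can apply any decreasing schedule; following Eq. (2) as written, the neighbourhood distance is measured between weight vectors (the classical formulation uses the lattice coordinates of the neurons instead).

```python
import math

def gaussian_h(m_i, m_w, eta, sigma):
    """Eq. (2): Gaussian neighbourhood weight of neuron i around winner w."""
    d2 = sum((a - b) ** 2 for a, b in zip(m_i, m_w))
    return eta * math.exp(-d2 / (2 * sigma ** 2))

def som_step(weights, x, eta, sigma):
    """One training step: find the BMU, then update all neurons (Eq. (1))
    simultaneously, i.e. every h is computed from the old weights."""
    w = min(range(len(weights)), key=lambda i: math.dist(weights[i], x))
    m_w = weights[w]
    new_weights = []
    for m_i in weights:
        h = gaussian_h(m_i, m_w, eta, sigma)
        new_weights.append([mi + h * (xi - mi) for mi, xi in zip(m_i, x)])
    return w, new_weights
```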

# 2 Hardware Accelerator Model

To implement SOMs efficiently on FPGAs, the proposed IP generator introduces some modifications to the original SOM presented in [12]. Equations (1) and (2) are modified by replacing the Euclidean distance with the Manhattan distance, which requires less hardware, and by replacing the Gaussian neighbourhood function with a base-two exponential function that can be computed with simple shifts [13].

The proposed neighbourhood function updates all neurons simultaneously, in order to obtain a convergence performance as close as possible to that of the original Gaussian function. It avoids multiplications, divisions and the computation of any exponential function by reducing all operations to powers of two,  $2^i$ . Multiplications and divisions by powers of two can be performed simply with arithmetic barrel shifters, which are very fast compared with traditional architectures. Also using the approximation  $e \cong 2$ , Eq. (2) becomes:

$$h_{wi}(t) = 2^{-\left[\frac{\|\vec{m}_{i} - \vec{m}_{w}\|}{2^{b}} + \eta + t_{corr}(t)\right]}$$
(3)

The  $2^{b}$  factor is constant and is related to the original neighbourhood radius  $\sigma^{2}(t)$ , while  $\eta + t_{corr}(t)$  is related to the original learning rate  $\eta(t)$ . The monotonically decreasing trend of the function has been moved entirely to the  $t_{corr}(t)$  term, which is designed to decrease by one unit every  $2^{t_{bias}}$  epochs, where  $t_{bias}$  is a constant parameter.
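The shift-based update of Eqs. (1) and (3) can be modelled in integer arithmetic as below. This is a sketch under stated assumptions, not the generated VHDL: the term $\|\vec{m}_i - \vec{m}_w\| / 2^b$ is approximated by an integer right shift, the exponent is clamped at zero, the step is truncated toward zero, and `eta` and `t_corr` are passed in as non-negative integers, leaving their schedule to the caller.

```python
def manhattan(a, b):
    """L1 distance, used instead of the Euclidean one to save hardware."""
    return sum(abs(p - q) for p, q in zip(a, b))

def neighbourhood_shift(d_m, b, eta, t_corr):
    """Exponent of Eq. (3): h = 2**-shift, so multiplying by h is a right shift.
    d_m / 2**b is approximated by an integer right shift (assumption)."""
    return max(0, (d_m >> b) + eta + t_corr)

def shift_som_step(weights, x, b, eta, t_corr):
    """One integer training step of the shift-based SOM (Eqs. (1) and (3))."""
    w = min(range(len(weights)), key=lambda i: manhattan(weights[i], x))
    m_w = weights[w]
    new_weights = []
    for m_i in weights:
        s = neighbourhood_shift(manhattan(m_i, m_w), b, eta, t_corr)
        updated = []
        for xi, mi in zip(x, m_i):
            diff = xi - mi
            step = abs(diff) >> s  # |x - m_i| * 2**-s, truncated
            updated.append(mi + step if diff >= 0 else mi - step)
        new_weights.append(updated)
    return w, new_weights
```

Since each step moves a weight toward the input by at most the full difference, unsigned weights never leave their original range, so no saturation logic is needed in this model.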

## 2.1 Hardware Architecture

The implementation of Eqs. (1) and (3) is shown in Fig. 1. Each neuron has its own update structure.

In Fig. 1,  $\vec{m}_i$  are the weight vectors,  $\vec{m}_w$  is the weight vector of the winner neuron,  $|d_i|$  are the distances between the input  $\vec{x}$  and the *i*-th neuron, with  $\vec{d}_i = \vec{x} - \vec{m}_i$ , and  $d_m$  is the distance between the winner and the *i*-th neuron.

# 3 VHDL SOM IP Generator

The IP generator lets designers configure the parameters and generate the VHDL code through a Graphical User Interface (GUI) implemented in MATLAB. At start-up, the program prompts the user for the parameters of the map, as shown in Fig. 2. The user can choose the number of features, the number of neurons and the bit width of all the weights. The neurons can be initialized in a hexagonal, grid or random topology covering a given percentage of the *N*-dimensional space.
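The random and grid initializations can be sketched as follows. How the tool interprets "space span" is not stated, so here it is assumed to be a hypercube centred in the unsigned weight range whose side covers `span_pct` percent of that range; the hexagonal topology is not modelled, and the grid case is shown only for 2 features with a square number of neurons.

```python
import math
import random

def span_bounds(bits, span_pct):
    """Centred interval covering span_pct % of the unsigned range [0, 2**bits - 1]."""
    full = (1 << bits) - 1
    width = full * span_pct / 100.0
    return round((full - width) / 2), round((full + width) / 2)

def init_random(n_neurons, n_features, bits, span_pct, seed=None):
    """Random topology: integer weights drawn uniformly inside the span."""
    rng = random.Random(seed)
    lo, hi = span_bounds(bits, span_pct)
    return [[rng.randint(lo, hi) for _ in range(n_features)]
            for _ in range(n_neurons)]

def init_grid_2d(n_neurons, bits, span_pct):
    """Grid topology for 2 features (n_neurons must be a perfect square)."""
    side = math.isqrt(n_neurons)
    if side * side != n_neurons:
        raise ValueError("grid topology needs a square number of neurons")
    lo, hi = span_bounds(bits, span_pct)
    step = (hi - lo) / (side - 1) if side > 1 else 0.0
    return [[round(lo + c * step), round(lo + r * step)]
            for r in range(side) for c in range(side)]
```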



Fig. 1 Hardware architecture for the update formula of the optimized SOM model

[GUI screenshot: "Welcome to the Self-Organizing Maps IP core generator", with the panels *SOM settings* (number of features, number of neurons, weights bit precision (unsigned)), *Neurons initialization settings* (topology, e.g. Random; space span [%]) and *SOM architecture preview* (weight updater, input distance evaluation, winner evaluation, winner distance evaluation)]

Fig. 2 Initial configuration prompt

# 4 Simulation

The proposed tool provides a fast way to test the functionality of the generated SOM. The user can train the network on an array of inputs for a chosen number of epochs. The software can plot the results for maps with up to 3 features.

As an example, Fig. 3 shows the training results for a system with 3 features and 6 noisy clusters (each consisting of 100 inputs) randomly placed in a 16-bit quantized space. The map was randomly initialized with 16 neurons; the first plot shows the initial state and the second the result of the training process. The smaller green dots represent the clusters and the blue circles represent the neurons. In more than 90% of our tests, the clustering performance was satisfactory.
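A test set similar to the one described (6 noisy clusters of 100 points, 3 features, 16-bit space) can be generated as below. The text does not say how the tool judges clustering quality, so this sketch swaps in the quantization error (mean distance between each input and its closest neuron), a standard SOM quality measure, computed here with the Manhattan distance; the noise level is a hypothetical parameter.

```python
import random

def make_clusters(n_clusters, points_per_cluster, n_features, bits, noise, seed=None):
    """Noisy clusters quantized to unsigned `bits`-bit integers."""
    rng = random.Random(seed)
    top = (1 << bits) - 1
    centers = [[rng.randint(0, top) for _ in range(n_features)]
               for _ in range(n_clusters)]
    data = []
    for c in centers:
        for _ in range(points_per_cluster):
            # Gaussian noise around the centre, clipped to the valid range.
            data.append([min(top, max(0, round(rng.gauss(v, noise)))) for v in c])
    return centers, data

def quantization_error(data, weights):
    """Mean Manhattan distance between each input and its best-matching neuron."""
    total = 0
    for x in data:
        total += min(sum(abs(a - b) for a, b in zip(x, m)) for m in weights)
    return total / len(data)

if __name__ == "__main__":
    centers, data = make_clusters(6, 100, 3, 16, noise=1000.0, seed=1)
    print(len(data), quantization_error(data, centers))
```

A trained map whose quantization error approaches that of the true cluster centres has, in this sense, captured the clusters.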



Fig. 3 Learning simulation results using 3 features



Fig. 4 Number of slices, dynamic power and GCUPS of different architectures

# 5 Circuit Area, Power and Performances

To validate the IP generator, synthesis and place-and-route runs were performed with the Xilinx Vivado 2017.4 toolchain, targeting the Virtex-7 xc7vx690t FPGA. Experiments were carried out with different SOM configurations; in this section, the authors show experimental results for the following ones:

- 8 bits per neuron weight
- 1 to 4 features
- 16, 36, 64, 100 and 144 neurons.

Figure 4 shows the number of slices, the dynamic power and the Connection Updates Per Second (CUPS, reported in Fig. 4 as GCUPS, i.e. $10^9$ CUPS) for the various generated architectures. Note that the power was estimated with a worst-case approach, assuming an activity factor of 0.5 on every node of the synthesized network.

The implementation results show that our architectures achieve very high computational performance using a limited amount of hardware resources.
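As a rough guide to how CUPS scales with the configuration: every training input updates all $N_{neurons} \times N_{features}$ weights, so the throughput is that product times the input rate. The clock frequency and the number of cycles per input below are hypothetical parameters; the values behind Fig. 4 are not given here.

```python
def cups(n_neurons, n_features, f_clk_hz, cycles_per_input=1):
    """Connection updates per second: each training input updates all
    n_neurons * n_features weights; one input every cycles_per_input clocks."""
    updates_per_input = n_neurons * n_features
    inputs_per_second = f_clk_hz / cycles_per_input
    return updates_per_input * inputs_per_second

def gcups(n_neurons, n_features, f_clk_hz, cycles_per_input=1):
    """Same metric expressed in units of 10**9 updates per second."""
    return cups(n_neurons, n_features, f_clk_hz, cycles_per_input) / 1e9

# Hypothetical: 144 neurons, 4 features, one input per cycle at 100 MHz.
print(gcups(144, 4, 100e6))
```

Because all neurons are updated in parallel, CUPS grows linearly with both the number of neurons and the number of features, as long as the clock frequency is not degraded by the larger design.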

# 6 Conclusion

In this work, we proposed an optimized IP core generator that produces VHDL code for a hardware accelerator for Self-Organizing Maps. It can accelerate both the learning phase (training) and the recall phase (inference). Thanks to its flexibility, it can be used in any application that requires a high number of neurons or features with few hardware resources, even coupled to a microprocessor [14, 15]. In a future update of the software, we plan to provide an AXI interface, which would further ease the integration of our SOM IP core in a System-on-Chip (SoC).

# References

- Lo Sciuto, G., Susi, G., Cammarata, G., Capizzi, G.: A spiking neural network-based model for anaerobic digestion process. In: IEEE 23rd International Symposium on Power Electronics, Electrical Drives, Automation and Motion (2016)
- Brusca, S., Capizzi, G., Lo Sciuto, G., Susi, G.: A new design methodology to predict wind farm energy production by means of a spiking neural network based-system. Int. J. Numer. Model. Electron. Netw. Devices Fields 7 (2017)
- Scarpato, N., Pieroni, A., Di Nunzio, L., Re, M., Salerno, M., Susi, G.: E-health-IoT universe: A review. Int. J. Adv. Sci. Eng. Inf. Technol. 7(6), 2328–2336 (2017)
- Cardarilli, G.C., Cristini, A., Di Nunzio, L., Re, M., Salerno, M., Susi, G.: Spiking neural networks based on LIF with latency: simulation and synchronization effects. In: Asilomar Conference on Signals, Systems and Computers (2016)
- Khanal, G.M., Acciarito, S., Cardarilli, G.C., Chakraborty, A., Di Nunzio, L., Fazzolari, R., Cristini, A., Re, M., Susi, G.: Synaptic behaviour in ZnO-rGO composites thin film memristor. Electron. Lett. 53(5), 296–298 (2017)
- Acciarito, S., Cardarilli, G.C., Cristini, A., Di Nunzio, L., Fazzolari, R., Khanal, G.M., Re, M., Susi, G.: Hardware design of LIF with Latency neuron model with memristive STDP synapses. Integr. VLSI J. 59, 81–89 (2017)
- Khanal, G.M., Cardarilli, G., Chakraborty, A., Acciarito, S., Mulla, M.Y., Di Nunzio, L., Fazzolari, R., Re, M.: A ZnO-rGO composite thin film discrete memristor. IEEE, ICSE, Article No. 7573608, pp. 129–132 (2016)
- Acciarito, S., Cristini, A., Di Nunzio, L., Khanal, G.M., Susi, G.: An aVLSI driving circuit for memristor-based STDP. PRIME 2016, Article No. 7519503 (2016)
- Giuliano, R., Mazzenga, F., Neri, A., Vegni, A.M.: Security access protocols in IoT capillary networks. IEEE Internet Things J. 4(3), 645–657 (2017)
- Sacchi, C., Rossi, T., Menapace, M., Granelli, F.: Utilization of UWB transmission techniques for broadband satellite connections operating in W-Band. In: 2008 IEEE Globecom Workshops, pp. 1–6 (2008)
- Dalmasso, I., Galletti, I., Giuliano, R., Mazzenga, F.: WiMAX networks for emergency management based on UAVs. In: IEEE-AESS European Conference on Satellite Telecommunications, pp. 1–6 (2012)
- Kohonen, T.: The self-organizing map. Neurocomputing 21, 1–6 (1998)
- Martín-del-Brío, B., Blasco-Alberto, J.: Hardware-oriented models for VLSI implementation of self-organizing maps. In: International Workshop on Artificial Neural Networks, pp. 712–719 (1995)
- Cardarilli, G.C., Di Nunzio, L., Fazzolari, R., Re, M., Silvestri, F., Spanò, S.: Energy consumption saving in embedded microprocessors using hardware accelerators. Telkomnika 16(3), 1019–1026 (2018)
- Cardarilli, G.C., Di Nunzio, L., Fazzolari, R., Re, M., Lee, R.B.: Integration of butterfly and inverse butterfly nets in embedded processors: effects on power saving. In: Conference Record – Asilomar Conference on Signals, Systems and Computers, Article No. 6489268, pp. 1457–1459 (2012)

# **Correction to: Applications in Electronics Pervading Industry, Environment and Society**



Sergio Saponara and Alessandro De Gloria

Correction to: S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7

In the original version of the book, the following belated correction has been incorporated: The Volume number "550" has been changed to "573" in the book.

The updated version of the book can be found at https://doi.org/10.1007/978-3-030-11973-7

<sup>©</sup> Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7\_60

# **Author Index**

#### A

Abdolzadeh, Vida, 375 Addabbo, Tommaso, 343 Akkad, Ghattas, 153 Alimenti, Federico, 349 Andriolli, N., 77 Antonelli, Manuela, 359 Armenise, Mario N., 53

#### B

Baldanzi, Luca, 11 Baronti, F., 367 Bartolini, Andrea, 169 Batista, Edgar, 59 Bauer, Anton, 83 Bellotti, Francesco, 3, 313 Benini, Luca, 169 Bertacchini, Alessandro, 109 Berta, Riccardo, 3, 313 Bertolucci, Matteo, 11 Berzi, Lorenzo, 477 Bhattacharya, Jhilik, 145 Biagetti, Giorgio, 187 Bizzi, Emilio, 179 Bonato, Paolo, 179 Boni, Enrico, 129, 295, 453 Brunelli, Davide, 169, 205, 399, 485 Brunetti, Giuseppe, 53 Bruschi, Paolo, 93 Buccolini, Luca, 221 Buettner, Jonas, 83

#### C

Caldari, Marco, 469 Cammarata, S., 367 Canziani, Alfredo, 145 Cardarilli, Gian Carlo, 253, 445 Caridi, A., 137 Carminati, Marco, 359 Carrato, Sergio, 145 Cavarra, Andrea, 195 Caviglia, D. D., 137 Chiarelli, A. M., 45 Chiesa, M., 77 Ciarpi, Gabriele, 269, 421, 437 Cicala, Francesco, 145 Ciminelli, Caterina, 53 Cito, Michele, 53 Cocorullo, Giuseppe, 213 Colli, M., 137 Conteduca, Donato, 53 Conti, Massimo, 221, 391, 469 Cordopatri, Antonio, 213 Crippa, Paolo, 187 Crocetti, Luca, 11

#### D

Dallai, Alessandro, 129, 295 De Bortoli, Luca, 145 De Caro, Davide, 237 De Gloria, Alessandro, 3, 313 de Trujillo, Eva Rodríguez, 391 De Venuto, Daniela, 37, 287 Dell'Olio, Francesco, 53 Dello Sterpaio, Luca, 407 Delucchi, A., 137 Demarchi, Danilo, 179 Di Mascio, Stefano, 319 Di Mauro, Michele, 359 Di Meo, Gennaro, 237 Di Nunzio, Luca, 253 Di Pascoli, Stefano, 421 Di Pasquale, Fabrizio, 415

© Springer Nature Switzerland AG 2019

S. Saponara and A. De Gloria (eds.), *Applications in Electronics Pervading Industry, Environment and Society*, Lecture Notes in Electrical Engineering 573, https://doi.org/10.1007/978-3-030-11973-7

#### E

Edlinger, G., 45 ElHassan, Bachar, 153 Erlbacher, Tobias, 83 Esposito, Darjn, 237

#### F

Falaschetti, Laura, 187 Falcone, Francisco, 279 Fanucci, Luca, 11, 407, 437 Faralli, Stefano, 77, 415 Favilli, Tommaso, 477 Fazzolari, Rocco, 253 Ferrari, Paolo, 19 Fieramosca, G., 367 Finocchiaro, Alessandro, 69 Flammini, Alessandra, 19 Furano, Gianluca, 319

#### G

Gaiduk, Maksym, 391
Gambini, F., 77
Garbuglia, Federico, 221
Giaconia, Giuseppe Costantino, 45, 303
Giardino, Daniele, 253, 445, 493
Girlando, Giovanni, 69
Grammatikakis, M. D., 27
Greco, G., 45
Guidi, Francesco, 295
Guzzi, Francesco, 145

#### K

Kobeissi, Ahmad, 3
Krauss, Thomas F., 53
Kumar, Sunny, 349

#### L

Landi, Elia, 343
Lanza, L. G., 137
Lenge, Matteo, 129
Le Roy, Frederic, 153
Locorotondo, Edoardo, 477
Lovecchio, Nicola, 117

#### M

Magazzù, Guido, 269
Mansour, Ali, 153
Marino, Antonino, 407
Marsi, Stefano, 145
Martínez-Ballesté, 59
Martínez Madrid, Natividad, 187, 391
Martina, Francesco, 415
Martina, Maurizio, 245, 461
Masera, Guido, 245
Masera, Maurizio, 245
Mastrandrea, Antonio, 261, 429
Matera, Riccardo, 327, 383
Matta, Marco, 253, 445, 493
Meacci, Valentino, 101, 295, 327
Menichelli, Francesco, 261, 429
Menicucci, Alessandra, 319
Meoni, Gabriele, 437
Merla, A., 45
Mezzanotte, Paolo, 349
Mezzera, Lorenzo, 359
Mezzina, Giovanni, 37, 287
Mistretta, Leonardo, 45, 303
Mohamed, Anwar, 349
Montagni, Marco, 453
Monteleone, Claudio, 319
Moretti, Riccardo, 343
Motta, Alessandro, 69
Motto Ros, Paolo, 179
Mouzakitis, N., 27
Muanenda, Yonas, 415
Mugnaini, Marco, 343

#### N

Najem, Mohamad, 153
Nannipieri, Pietro, 407
Napoli, Ettore, 237
Nardello, Matteo, 399, 485
Neri, B., 367
Nocera, Claudio, 195
Ntallaris, E., 27
Nurra, V., 77

#### O

Olivieri, Mauro, 261, 429
Orcioni, Simone, 221, 391
Ottavi, Marco, 319

#### P

Pagani, Alberto, 69
Pagano, Antonino, 303
Palazzi, Valentina, 349
Palla, Alessandro, 437
Palla, Fabrizio, 269
Palmisano, Giuseppe, 69, 195, 335
Papotto, Giuseppe, 195
Parisi, Alessandro, 335
Parodi, Andrea, 313
Parri, Lorenzo, 343
Pasetti, Marco, 19
Pastorino, M., 137
Patelis, K., 27
Patsakis, Costas, 279
Pavan, Paolo, 109
Peloso, Riccardo, 245
Peña, Marta, 59
Petra, Nicola, 375
Petroni, F., 77
Pierini, Marco, 477
Pilato, Luca, 437
Piotto, Massimo, 93
Piperaki, V., 27
Placidi, Pisana, 117
Polonelli, Tommaso, 169
Pugi, Luca, 453, 477

#### R

Ragonese, Egidio, 195, 335
Ramalli, Alessandro, 129, 295
Ramponi, Giovanni, 145
Randazzo, A., 137
Re, Marco, 253, 445, 493
Ria, Andrea, 93
Ricci, Adriana, 469
Ricci, Stefano, 101, 295, 327, 383
Rinaldi, Stefano, 19
Ripa, Franco, 469
Rizzo, R., 45
Roselli, Luca, 349
Rossi, Fabio, 179
Rossi, Maurizio, 205, 485
Rossi, Stefano, 129
Ruo Roch, Massimo, 461
Russo, Dario, 101, 295, 327

#### S

Sapienza, Stefano, 179
Saponara, Sergio, 11, 269, 367, 421
Scaringella, Monica, 295
Scherz, Wilhelm Daniel, 187
Scorzoni, Andrea, 117
Seepold, Ralf, 187, 391
Selvo, Paolo, 245
Shafique, Muhammad, 245
Silvestri, Federica, 261
Silvestri, Francesca, 253, 445, 493
Silvestrini, Mattia, 469
Simoncini, Flavio, 19
Simone, Lorenzo, 445
Singla, Xavier, 59
Sisinni, Emiliano, 19
Solanas, Agusti, 59, 279
Spanò, Sergio, 253, 445, 493
Spina, Nunzio, 335
Stagnaro, M., 137
Stazi, Giulia, 261, 429
Strollo, Antonio G. M., 237

#### T

Tani, Marco, 343
Tizzoni, Marco, 359
Tosato, Pietro, 205
Turchetti, Claudio, 187
Turolla, Andrea, 359

#### U

Unterhorst, Matteo, 221

#### V

Valigi, Paolo, 117
Vilkomerson, David, 383
Vitale, Gianpaolo, 303
Vougioukalos, G., 27

#### Z

Zappasodi, F., 45