Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

So you’re planning your IOT solution and have decided to build into your solution a database server. Perhaps you’ve never used a database system before or maybe you’ve used one as a user but have never had any need to set up one from scratch. Or perhaps you’ve decided to discover what all the fuss is about database systems in general. Whichever the case, you have the core knowledge you need to get startedFootnote 1: you know what you want to store and what the data looks like.

Recall we discussed and saw some examples of IOT data and how best to store them for an IOT solution. Recall there are trade-offs for each type of data we want to store. You also learned more about how to augment data to make it more useable in your IOT solutions. As you saw, it isn’t always the case that the originator of the data (the sensor nodes or platform) that needs to do this augmentation. As you learned, data aggregators with more computing resources are better suited for such operations. However, you will also see that there is even more power in database servers (sometimes called data nodes since you can have more than one), which can perform data aggregation and annotation automatically.

In this chapter, you will also see how to put these techniques in practice to store data in a database. More specifically, you will learn how to use MySQL and leverage that knowledge in an IOT solution. We begin with a short discussion on how to get MySQL, install it, and make your first database. The rest of the chapter is devoted to presenting a short primer on how to use MySQL through examples.

Getting Started

MySQL is the world’s most popular open source database system for many excellent reasons. First, it is open source, which means anyone can use it for a wide variety of tasks for free.Footnote 2 Best of all, MySQL is included in many platform repositories, making it easy to get and install. If your platform doesn’t include MySQL in the repository (such as aptitude), you can download it from the MySQL web site ( http://dev.mysql.com ).

Oracle Corporation owns MySQL. Oracle obtained MySQL through an acquisition of Sun Microsystems, which acquired MySQL from its original owners, MySQL AB. Despite fears to the contrary, Oracle has shown excellent stewardship of MySQL by continuing to invest in the evolution and development of new features as well as faithfully maintaining its open source heritage. Although Oracle also offers commercial licenses of MySQL—just as its prior owners did in the past—MySQL is still open source and available to everyone.

What Is Open Source? Is It Really Free?

Open source software grew from a conscious resistance to the corporate-property mind-set. While working for MIT, Richard Stallman, the father of the free software movement, resisted the trend of making software private (closed) and left MIT to start the GNU (GNU Not Unix) project and the Free Software Foundation (FSF).

Stallman’s goal was to reestablish a cooperating community of developers. He had the foresight, however, to realize that the system needed a copyright license that guaranteed certain freedoms. (Some have called Stallman’s take on copyright “copyleft,” because it guarantees freedom rather than restricts it.) To solve this, Stallman created the GNU Public License (GPL). The GPL, a clever work of legal permissions that permits the code to be copied and modified without restriction, states that derivative works (the modified copies) must be distributed under the same license as the original version without any additional restrictions.

There was one problem with the free software movement. The term free was intended to guarantee freedom to use, modify, and distribute; it was not intended to mean “no cost” or “free to a good home.” To counter this misconception, the Open Source Initiative (OSI) formed and later adopted and promoted the phrase open source to describe the freedoms guaranteed by the GPL license. For more information about open source software, visit www.opensource.org .

How Do I Use MySQL?

MySQL runs as a background process (or as a foreground process if you launch it from the command line) on your system. Like most database systems, MySQL supports Structured Query Language (SQL). You can use SQL to create databases and objects (using data definition language [DDL]), write or change data (using data manipulation language [DML]), and execute various commands for managing the server.

To issue these commands, you must first connect to the database server. MySQL provides a client application named mysqlFootnote 3 that enables you to connect to and run commands on the server. The client accepts SQL commands as well as a few commands specific to the client itself. A semicolon must terminate all commands.

Tip

To see a list of the commands available in the client, type help and press Enter at the prompt.

To connect to the server, you must specify a user account and the server to which you want to connect. If you are connecting to a server on the same machine, you can omit the server information (host and port) as these default to localhost on port 3306. The user is specified using the --user (or -u) option. You can specify the password for the user on the command, but the more secure practice is to specify --password (or -p), and the client with prompt you for the password. If you do specify the password on the command line, you will be prompted with a warning encouraging you to not use that practice.

Using the mysql client on the same machine without the --host (or -h) and --port option does not use a network connection. If you want to connect using a network connection or want to connect using a different port, you must use the loopback address. For example, to connect to a server running on port 13001 on the same machine, use the command mysql -uroot -p –h127.0.0.1 --port=13001.

Listing 5-1 shows examples of several SQL commands in action using the mysql client.

Listing 5-1.Commands Using the mysql Client

$ mysql -uroot -p

Enter password:

Welcome to the MySQL monitor.  Commands end with ; or \g.

Your MySQL connection id is 4

Server version: 5.7.8-rc-log MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its

affiliates. Other names may be trademarks of their respective

owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> CREATE DATABASE plant_monitoring;

Query OK, 1 row affected (0.01 sec)

mysql> CREATE TABLE plant_monitoring.plants (plant_name char(50), sensor_value int, sensor_event timestamp);

Query OK, 0 rows affected (0.06 sec)

mysql> INSERT INTO plant_monitoring.plants VALUES ('living room', 23, NULL);

Query OK, 1 row affected (0.04 sec)

mysql> SELECT * FROM plant_monitor.plants;

+-------------+--------------+---------------------+

| plant_name  | sensor_value | sensor_event        |

+-------------+--------------+---------------------+

| living room |           23 | 2015-09-22 19:54:01 |

+-------------+--------------+---------------------+

1 row in set (0.01 sec)

mysql> SET @@global.server_id = 111;

Query OK, 0 rows affected (0.00 sec)

mysql>

In this example, you see DML in the form of the CREATE DATABASE and CREATE TABLE statements, DDL in the form of the INSERT and SELECT statements, and a simple administrative command to set a global server variable. Next you see the creation of a database and a table to store the data, the addition of a row in the table, and finally the retrieval of the data in the table. Notice how I used capital letters for SQL command keywords. This is a common practice and helps make the SQL commands easier to read and easier for find user-supplied options or data.

You can exit the MySQL client by typing the command quit. On Linux and Unix systems, you can press Ctrl+D to exit the client.

A great many commands are available in MySQL. Fortunately, you need master only a few of the more common ones. The following are the commands you will use most often. The portions enclosed in <> indicate user-supplied components of the command, and [...] indicates that additional options are needed.

  • CREATE DATABASE <database_name>: Creates a database

  • USE <database>: Sets the default database (not an SQL command)

  • CREATE TABLE <table_name> [...]: Creates a table or structure to store data

  • INSERT INTO <table_name> [...]: Adds data to a table

  • UPDATE [...]: Changes one or more values for a specific row

  • DELETE FROM <table_name> [...]: Removes data from a table

  • SELECT [...]: Retrieves data (rows) from the table

  • SHOW [...]: Shows a list of the objects

Although this list is only a short introduction and nothing like a complete syntax guide, there is an excellent online reference manual that explains every command (and much more) in great detail. You should refer to the online reference manual whenever you have a question about anything in MySQL. You can find it at http://dev.mysql.com/doc/ .

One of the more interesting commands shown allows you to see a list of objects. For example, you can see the databases with SHOW DATABASES, a list of tables (once you change to a database) with SHOW TABLES, and even the permissions for users with SHOW GRANTS. I find myself using these commands quite frequently.

Tip

If you use the mysql client, you must terminate each command with a semicolon (;) or \G.

If you are thinking that there is a lot more to MySQL than a few simple commands, you are absolutely correct. Despite its ease of use and fast startup time, MySQL is a full-fledged relational database management system (RDBMS). There is much more to it than you’ve seen here. For more information about MySQL, including all the advanced features, see the reference manual.

Mysql—What Does It Mean?

The name MySQL is a combination of a proper name and an acronym. SQL is Structured Query Language. The My part isn’t the possessive form—it is a name. In this case, My is the name of the founder’s daughter. As for pronunciation, MySQL experts pronounce it “My-S-Q-L” and not “my sequel.”

How to Get and Install MySQL

The MySQL server is available for a variety of platforms including most Linux and Unix platforms, Mac OS X, and Windows. To download MySQL server, visit http://dev.mysql.com/downloads/ and click Community and then MySQL Community Server. This is the GPLv2 license of MySQL.Footnote 4 The page will automatically detect your operating system. If you want to download for another platform, you can select it from the drop-down list.

The download page will list several files for download. Depending on your platform, you may see several options including compressed files, source code, and installation packages. Most will choose the installation package for installation on a laptop or desktop computer. Figure 5-1 shows an example for the APT repository for the Debian and Ubuntu platforms.

Figure 5-1.
figure 1figure 1

Download page for Ubuntu Linux

One of the most popular platforms is Microsoft Windows. Oracle has provided a special installation packaging for Windows named the Windows Installer. This package includes all the MySQL products available under the community license including MySQL Server, Workbench, Utilities, Fabric, and all of the available connectors (program libraries for connecting to MySQL). This makes installing on Windows a one-stop, one-installation affair. Figure 5-2shows the download page for the Windows installer. The following paragraphs demonstrate how to install MySQL on Windows 10. You will see how to install MySQL on single-board computers like the Raspberry Pi in the next chapter.

Figure 5-2.
figure 2figure 2

Download page for Windows Installer

Begin by choosing either the Windows Installer 32- or 64-bit installation package that matches your Windows version. Once the file is downloaded, click the file to begin installation. Note that some browsers such as the new Edge browser may ask you if you want to launch the installation. You may need to reply to a dialog permitting the installation.

The first step is agreeing to the license. Figure 5-3 shows the license agreement panel of the installation dialog.

Figure 5-3.
figure 3figure 3

License agreement

The license shown is the GPLv2 license for the community edition. Once you have read the licenseFootnote 5 and agree, select the “I accept the license terms” checkbox and click Next.

The next panel displays the setup or installation type. Most will choose the developer option because it installs all the MySQL components and applications and sets the defaults for running MySQL on the local machine. You can choose a different option and read more about each in the text to the right. Figure 5-4 shows the setup type panel. Once you make a selection, click Next.

Figure 5-4.
figure 4figure 4

Setup type

The next panel will check for any required components. For example, if you are installing on a machine that does not include Python or Visual Studio, you will get a warning, as shown in Figure 5-5. To proceed, you must resolve each of the issues. That is, the buttons will not be available until the requirement is resolved. Once you have them resolved, click Next.

Figure 5-5.
figure 5figure 5

Check Requirements page

The next panel will show all the packages available for installation. If you do not want to install one or more of them, you can click each and choose not to install them. Figure 5-6 shows an example with all packages marked for installation. Once you are satisfied with the options for installation, click Execute.

Figure 5-6.
figure 6figure 6

Preparing for installation

The panel will update with the progress of each installation, as shown in Figure 5-7.

Figure 5-7.
figure 7figure 7

Installing packages

Different symbols will appear next to each package as the installation proceeds. In some cases, the installer may download additional packages. You can see this in Figure 5-7. Note that Connector/Python 2.1.2 was downloaded. You can click the Show Details button to see more details of the installation.

Once all the packages selected have been installed, you will see the configuration panel, as shown in Figure 5-8. This panel shows you a list of the different configuration options based on the packages you chose. If you chose to install everything, you will see a panel similar to the figure. Click Next to proceed.

Figure 5-8.
figure 8figure 8

Product configuration

The configuration process will be different for each product. For example, to configure MySQL server, there are several steps beginning with networking, as shown in Figure 5-9. This panel allows you to choose a configuration type (how the server launches and runs) as well as the networking specifics. It is on this page that you can choose the TCP/IP port that the server will listen for connections. If you want to configure additional parameters, select the Show Advanced Options checkbox. Once you have made you selections, click Next to move to the next step.

Figure 5-9.
figure 9figure 9

Type and Networking page

The next panel is the accounts and roles panel, as shown in Figure 5-10. This panel allows you to set up initial user accounts as well as the root password. It is strongly recommended you select a strong password for the root account. You can also set up additional user accounts with different roles by clicking the Add User button. Once you have your settings chosen, click Next to move to the next step.

Figure 5-10.
figure 10figure 10

Accounts and Roles page

The next panel allows you to control how MySQL is started on the Windows machine. Figure 5-11 shows the details. Notice you can configure the server to start as a Windows service, start MySQL automatically at startup (or not), and what type of account the server will use. I strongly recommend leaving the default for that setting unless you know how to setup an account for running a service. Click Next to move to the next step.

Figure 5-11.
figure 11figure 11

Windows Service page

If you checked the advanced settings in the first configuration panel, you will see the advanced options panel, as shown in Figure 5-12. This panel allows you to turn on the general log (for recording server feedback statements), query log (for recording all queries), and the binary log (for use in replication and backup). If you plan to use the server in a replication setup (I will discuss this in Chapter 7), you should turn on the binary log. The server ID must be unique among all servers in a replication setup, and you can set that on this panel. Once you have chosen your settings, click Next.

Figure 5-12.
figure 12figure 12

Advanced Options panel

The next panel shown in Figure 5-13 displays the progress of the configuration.

Figure 5-13.
figure 13figure 13

Server configuration execution

If you chose to start MySQL as a Windows service, you will see a second set of statements listed, as shown in Figure 5-14. Finally, if you chose to install the samples and example databases, you will see another dialog panel showing the progress of installing the sample databases. Once all steps are complete, click Finish.

Figure 5-14.
figure 14figure 14

Configuring examples

One of the things I like most about the Windows Installer besides being a one-stop installation mechanism is the ability to use the installer again to make changes. That is, you can run the installer another time to install a different package or even remove packages you no longer need. It is a handy way to install and configure MySQL on Windows.

Note

Installers for other platforms are adopting similar mechanisms as the Windows Installer. For example, most have the configuration steps in the installation package.

Now that you know how to set up MySQL, let’s discuss how you use MySQL to store and retrieve data.

How Data Is Stored and Retrieved

Now that you know what MySQL is and how it is used, you need to know a bit more about RDBMSs and MySQL in particular before you start building your first database server. This section discusses how MySQL stores data (and where it is stored), how it communicates with other systems, and some basic administration tasks required to manage your new MySQL server.

Note

I will show you how to install MySQL on the Raspberry Pi and similar boards in Chapter 6.

What Is A Relational Database Management System?

An RDBMS is a data storage and retrieval service based on the Relational Model of Data as proposed by E. F. Codd in 1970. These systems are the standard storage mechanism for structured data. A great deal of research is devoted to refining the essential model proposed by Codd, as discussed by C. J. Date in The Database Relational Model: A Retrospective Review and Analysis.Footnote 6 This evolution of theory and practice is best documented in The Third Manifesto.Footnote 7

The relational model is an intuitive concept of a storage repository (database) that can be easily queried by using a mechanism called a query language to retrieve, update, and insert data. Many vendors have implemented the relational model because it has a sound systematic theory, a firm mathematical foundation, and a simple structure. The most commonly used query mechanism is SQL, which resembles natural language. Although SQL is not included in the relational model, it provides an integral part of the practical application of the relational model in RDBMSs.

The data are represented as related pieces of information (attributes or columns sometimes called fields) about a certain event or entity. The set of values for the attributes is formed as a tuple (sometimes called a record or row). Tuples are stored in tables that have the same set of attributes. Tables can then be related to other tables through constraints on keys, attributes, and tuples.

Tables can have special mappings of columns called indexes that permit you to read the data in a specific order. Indexes are also useful for fast retrieval of rows that match the value(s) of the indexed columns.

How and Where MySQL Stores Data

The MySQL database system stores data via an interesting mechanism of programmatic isolation called a storage engine that is governed by the handler interface. The handler interface permits the use of interchangeable storage components in the MySQL server so that the parser, the optimizer, and all manner of components can interact in storing data on disk using a common mechanism. This is also referred to as a pluggable storage engine.

Note

MySQL supports several storage engines. Most are designed to write data to disk by default. However, the MEMORY storage engine stores data in memory but is not persistent. That is, when the computer is rebooted, the data is lost. You can use the MEMORY storage engine for fast lookup tables. Indeed, one optimization technique is to create copies of lookup tables at startup using the MEMORY storage engine.

What does this mean to you? It means you have the choice of different mechanisms for storing data. You can specify the storage engine in the table CREATE statement shown in the following code sample. Notice the last line in the command: this is how a storage engine is specified. Leaving off this clause results in MySQL using the default storage engine. For the examples in this book, MySQL 5.5 uses the MyISAM storage engine by default.

Tip

The default storage engine was changed from MyISAM to InnoDB in MySQL version 5.6.

CREATE DATABASE `bvm`;

CREATE TABLE `bvm`.`books` (

  `ISBN` varchar(15) DEFAULT NULL,

  `Title` varchar(125) DEFAULT NULL,

  `Authors` varchar(100) DEFAULT NULL,

  `Quantity` int(11) DEFAULT NULL,

  `Slot` int(11) DEFAULT NULL,

  `Thumbnail` varchar(100) DEFAULT NULL,

  `Description` text

) ENGINE=MyISAM;

Great! Now, what storage engines exist on MySQL? You can discover which storage engines are supported by issuing the SHOW STORAGE ENGINES command, as shown in Listing5-2. As you see, there are a lot to choose from. I cover a few that may be pertinent to planning IOT solutions.

Note

The following sections show how to work with MySQL on a typical Linux-like (actually Unix-like) platform. I’ve found most IOT solutions will use forms of these platforms rather than Windows 10, but that may change in the future. For now, I focus on exploring MySQL on these platforms rather than Windows. However, many of the examples shown can be executed on Windows albeit with a different set of commands.

Listing 5-2.Available Storage Engines

mysql> SHOW STORAGE ENGINES \G

*************************** 1. row ***************************

      Engine: FEDERATED

     Support: NO

     Comment: Federated MySQL storage engine

Transactions: NULL

          XA: NULL

  Savepoints: NULL

*************************** 2. row ***************************

      Engine: MRG_MYISAM

     Support: YES

     Comment: Collection of identical MyISAM tables

Transactions: NO

          XA: NO

  Savepoints: NO

*************************** 3. row ***************************

      Engine: CSV

     Support: YES

     Comment: CSV storage engine

Transactions: NO

          XA: NO

  Savepoints: NO

*************************** 4. row ***************************

      Engine: BLACKHOLE

     Support: YES

     Comment: /dev/null storage engine (anything you write to it disappears)

Transactions: NO

          XA: NO

  Savepoints: NO

*************************** 5. row ***************************

      Engine: MyISAM

     Support: YES

     Comment: MyISAM storage engine

Transactions: NO

          XA: NO

  Savepoints: NO

*************************** 6. row ***************************

      Engine: InnoDB

     Support: DEFAULT

     Comment: Supports transactions, row-level locking, and foreign keys

Transactions: YES

          XA: YES

  Savepoints: YES

*************************** 7. row ***************************

      Engine: ARCHIVE

     Support: YES

     Comment: Archive storage engine

Transactions: NO

          XA: NO

  Savepoints: NO

*************************** 8. row ***************************

      Engine: MEMORY

     Support: YES

     Comment: Hash based, stored in memory, useful for temporary tables

Transactions: NO

          XA: NO

  Savepoints: NO

*************************** 9. row ***************************

      Engine: PERFORMANCE_SCHEMA

     Support: YES

     Comment: Performance Schema

Transactions: NO

          XA: NO

  Savepoints: NO

9 rows in set (0.00 sec)

mysql>

As of version 5.6, MySQL uses the InnoDB storage engine by default. Previous versions used MyISAM as the default. InnoDB is a fully transactional, ACIDFootnote 8 storage engine. A transaction is a batch of statements that must all succeed before any changes are written to disk. The classic example is a bank transfer. If you consider a system that requires deducting an amount from one account and then crediting that amount to another account to complete the act of moving funds, you would not want the first to succeed and the second to fail, or vice versa!

Wrapping the statements in a transaction ensures that no data is written to disk until and unless all statements are completed without errors. Transactions in this case are designated with a BEGIN statement and concluded with either a COMMIT to save the changes or a ROLLBACK to undo the changes. InnoDB stores its data in a single file (with some additional files for managing indexes and transactions).

The MyISAM storage engine is optimized for reads. MyISAM has been the default for some time and was one of the first storage engines available. In fact, a large portion of the server is dedicated to supporting MyISAM. It differs from InnoDB in that it does not support transactions and stores its data in an indexed sequential access method format. This means it supports fast indexing. You would choose MyISAM over InnoDB if you did not need transactions and you wanted to be able to move or back up individual tables.

Another storage engine that you may want to consider, especially for sensor networks, is Archive. This engine does not support deletes (but you can drop entire tables) and is optimized for minimal storage on disk. Clearly, if you are running MySQL on a small system like a Raspberry Pi, small is almost always better! The inability to delete data may limit more advanced applications, but most sensor networks merely store data and rarely delete it. In this case, you can consider using the Archive storage engine.

There is also the CSV storage engine (where CSV stands for comma-separated values). This storage engine creates text files to store the data in plain text that can be read by other applications such as a spreadsheet application. If you use your sensor data for statistical analysis, the CSV storage engine may make the process of ingesting the data easier.

So, where is all this data? If you query the MySQL server and issue the command SHOW VARIABLES LIKE "datadir";, you see the path to the location on disk that all storage engines use to store data. In the case of InnoDB, this is a single file on disk located in the data directory. InnoDB also creates a few administrative files, but the data is stored in the single file. For most other storage engines except NDB and MEMORY, the data for the tables is stored in a folder with the name of the database under the data directory. Listing 5-3 shows an example from a Mac OS X machine. You may need to use different paths on your own machine.

Listing 5-3.Finding Where Your Data Is Located

mysql> SHOW VARIABLES LIKE 'datadir'

+---------------+------------------------+

| Variable_name | Value                  |

+---------------+------------------------+

| datadir       | /usr/local/mysql/data/ |

+---------------+------------------------+

1 row in set (0.00 sec)

mysql> quit;

bye

$ sudo ls -lsa /usr/local/mysql/data

rwxr-x---    58 _mysql  wheel       1972 Feb  6 15:05 .

drwxr-xr-x   17 root    wheel        578 Jan 20 16:38 ..

-rw-rw----    1 _mysql  wheel          0 Feb  6 15:04 Chucks-iMac.local.err

-rw-rw----    1 _mysql  wheel          5 Feb  6 15:00 Chucks-iMac.local.pid

drwx------    6 _mysql  wheel        204 Oct 17 15:16 bvm

-rw-rw----    1 _mysql  wheel    5242880 Feb  6 15:00 ib_logfile0

-rw-rw----    1 _mysql  wheel    5242880 Feb  6 15:00 ib_logfile1

-rw-rw----    1 _mysql  wheel  815792128 Feb  1 17:16 ibdata1

-rw-rw----    1 _mysql  wheel   52428800 Feb  1 17:16 ibdata2

drwxr-x---   77 _mysql  wheel       2618 Jan  8 15:24 mysql

drwx------   38 _mysql  wheel       1292 Nov 27 08:46 sakila

drwx------  192 _mysql  wheel       6528 Oct 22 12:17 test

drwx------    6 _mysql  wheel        204 Dec 18 17:05 world_innodb

$ sudo ls -lsa /usr/local/mysql/data/bvm

drwx------   6 _mysql  wheel   204 Oct 17 15:16 .

drwxr-x---  58 _mysql  wheel  1972 Feb  6 15:05 ..

-rw-rw----   1 _mysql  wheel  5056 Oct 17 15:24 books.MYD

-rw-rw----   1 _mysql  wheel  1024 Oct 17 15:25 books.MYI

-rw-rw----   1 _mysql  wheel  8780 Oct 17 15:16 books.frm

-rw-rw----   1 _mysql  wheel    65 Oct 17 15:15 db.opt

This example first queries the database server for the location of the data directory (it is in a protected folder on this machine). If you issue a listing command, you can see the InnoDB files identified by the ib and ibd prefixes. You also see a number of directories, all of which are the databases on this server. After that is a listing of one of the database folders. Notice the files with the extension .MY?: these are MyISAM files (data and index). The .frm files are the configuration files created and maintained by the server.

Tip

If you want to copy data from one server to another by copying files, be sure to copy the .frm files as well! This is easy for MyISAM and Archive but much harder with InnoDB. In the case of InnoDB, you have to copy all the database folders and the InnoDB files to make sure you get everything.

Although it is unlikely that you would require a transactional storage engine for a database node in your IOT solution, such as a Raspberry Pi running MySQL Server, MySQL 5.6 has one, and it’s turned on by default. A more likely scenario is that you would use the MyISAM or Archive engine for your tables.

For more information about storage engines and the choices and features of each, please see the online MySQL Reference Manual section “Storage Engines” ( http://dev.mysql.com/doc/ ).

The MySQL Configuration File

The MySQL server can be configured using a configuration file similar to the way you configure the Raspberry Pi. On Windows, the MySQL configuration file is located in the installation folder and is named my.ini. On other systems, it is located in the /etc/mysql folder and is named my.cnf. This file contains several sections, one of which is labeled [mysqld]. The items in this list are key-value pairs; the name on the left of the equal sign is the option, and its value on the right. The following is a typical configuration file (with many lines suppressed for brevity):

[mysqld]

port = 3306

basedir = /usr/local/mysql

datadir = /usr/local/mysql/data

server_id = 5

general_log

As you can see, this is a simple way to configure a system. This example sets the TCP port, base directory (the root of the MySQL installation including the data as well as binary and auxiliary files), data directory, and server ID (used for replication, as discussed shortly) and turns on the general log (when the Boolean switch is included, it turns on the log). There are many such variables you can set for MySQL. See the online MySQL reference manual for details concerning using the configuration file. You will change this file when you set up MySQL on the Raspberry Pi.

How to Start, Stop, and Restart MySQL on Windows

While working with your databases and configuring MySQL on your computer, you may need to control the startup and shutdown of the MySQL server. The default mode for installing MySQL is to automatically start on boot and stop on shutdown, but you may want to change that, or you may need to stop and start the server after changing a parameter. In addition, when you change the configuration file, you need to restart the server to see the effect of your changes.

You can start, stop, and restart the MySQL server with the notifier tray application or via the Windows services control panel. Simply select the MySQL service and right-click to stop or start the service. This will execute a controlled shutdown and startup should you need to do so.

Creating Users and Granting Access

You need to know about two additional administrative operations before working with MySQL: creating user accounts and granting access to databases. MySQL can perform both of these with the GRANT statement, which automatically creates a user if one does not exist. But the more pedantic method is first to issue a CREATE USER command followed by one or more GRANT commands. For example, the following shows the creation of a user named sensor1 and grants the user access to the database room_temp:

CREATE USER 'sensor1'@'%' IDENTIFIED BY 'secret'

GRANT SELECT, INSERT, UPDATE ON room_temp.* TO 'sensor1'@'%'

The first command creates the user named sensor1, but the name also has an @ followed by another string. This second string is the host name of the machine with which the user is associated. That is, each user in MySQL has both a user name and a host name, in the form user@host, to uniquely identify them. That means the user and host sensor1@10.0.1.16 and the user and host sensor1@10.0.1.17 are not the same. However, the % symbol can be used as a wildcard to associate the user with any host. The IDENTIFIED BY clause sets the password for the user.

A Note About Security

It is always a good idea to create a user for your application that does not have full access to the MySQL system. This is so you can minimize any accidental changes and also to prevent exploitation. For sensor networks, it is recommended that you create a user with access only to those databases where you store (or retrieve) data. You can change MySQL user passwords with the following command:

SET PASSWORD FOR sensor1@"%" = PASSWORD("secret");

Also be careful about using the wildcard % for the host. Although it makes it easier to create a single user and let the user access the database server from any host, it also makes it much easier for someone bent on malice to access your server (once they discover the password).

Another consideration is connectivity. As with the Raspberry Pi, if you connect a database to your network and the network is in turn connected to the Internet, it may be possible for other users on your network or the Internet to gain access to the database. Don’t make it easy for them—change your root user password, and create users for your applications.

The second command allows access to databases. There are many privileges that you can give a user. The example shows the most likely set that you would want to give a user of a sensor network database: read (SELECT), add data (INSERT), and change data (UPDATE). See the online reference manual for more about security and account access privileges.

The command also specifies a database and objects to which to grant the privilege. Thus, it is possible to give a user read (SELECT) privileges to some tables and write (INSERT, UPDATE) privileges to other tables. This example gives the user access to all objects (tables, views, and so on) in the room_temp database.

As mentioned, you can combine these two commands into a single command. You are likely to see this form more often in the literature. The following shows the combined syntax. In this case, all you need to do is add the IDENTIFIED BY clause to the GRANT statement. Cool!

GRANT SELECT, INSERT, UPDATE ON room_temp. * TO  'sensor1'@'%' IDENTIFIED BY 'secret'

Common MySQL Commands and Concepts

Learning and mastering a database system requires training, experience, and a good deal of perseverance. Chief among the knowledge needed to become proficient is how to use the common SQL commands and concepts. This section completes the primer on MySQL by introducing the most common MySQL commands and concepts.

Note

Rather than regurgitate the reference manual,Footnote 9 this section introduces the commands and concepts at a high level. If you decide to use any of the commands or concepts, please refer to the online reference manual for additional details, complete command syntax, and additional examples.

MySQL Commands

This section reviews the most common SQL and MySQL-specific commands that you will need to know to get the most out of your IOT database. While you have already seen some of these in action, this section provides additional information to help you use them.

Note

Case sensitivity of user-supplied variables (for example, last_name versus Last_Name) is not consistent across platforms. For example, case-sensitivity behavior is different on Windows than it is on Mac OS X. MySQL adheres to the platform’s case-sensitivity policy. Check the online reference manual for your platform to see how case sensitivity affects user-supplied variables.

Creating Databases and Tables

The most basic commands you will need to learn and master are the CREATE DATABASE and CREATE TABLE commands. Recall that database servers such as MySQL allow you to create any number of databases that you can add tables and store data in a logical manner.

To create a database, use CREATE DATABASE followed by a name for the database. If you are using the MySQL client, you must use the USE command to switch to a specific database. The client focus is the latest database specified either at startup (on the command line) or via the USE command. You can override this by referencing the database name first. For example, SELECT * FROM db1.table1 will execute regardless of the default database set. However, leaving off the database name will cause the mysql client to use the default database. The following shows two commands to create and change the focus of the database:

mysql> CREATE DATABASE plant_monitoring;

mysql> USE plant_monitoring;

Tip

Recall if you want to see all the databases on the server, use the SHOW DATABASES command.

Creating a table requires the, yes, CREATE TABLE command. This command has many options allowing you to specify not only the columns and their data types but also additional options such as indexes, foreign keys, and so on. An index can also be created using the CREATE INDEX command (see the following code). The following shows how to create a simple table for storing plant sensor data.

CREATE TABLE `plant_monitoring`.`plants` (

  `plant_name` char(30) NOT NULL,

  `sensor_value` float DEFAULT NULL,

  `sensor_event` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

  `sensor_level` char(5) DEFAULT NULL,

  PRIMARY KEY `plant_name` (`plant_name`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Notice here that I specified the table name (plants) and four columns (plant_name, sensor_value, sensor_event, and sensor_level). I used several data types. For plant_name, I used a character field with a maximum of 30 characters, a floating-point data type for sensor_value, a timestamp value for sensor_event, and another character field for sensor_level of five characters.

The TIMESTAMP data type is of particular use in IOT solutions or any time you want to record the date and time of an event or action. For example, it is often helpful to know when a sensor value is read. By adding a TIMESTAMP column to the table, you do not need to calculate, read, or otherwise format a date and time at the sensor or even aggregate node.

Notice also that I specified that the sensor_name column be defined as a key, which creates an index. In this case, it is also the primary key. The PRIMARY KEY phrase tells the server to ensure there exists one and only one row in the table that matches the value of the column. You can specify several columns to be used in the primary key by repeating the keyword. Note that all primary key columns must not permit nulls (NOT NULL).

If you cannot determine a set of columns that uniquely identify a row (and you want such a behavior—some favor tables without this restriction, but a good DBA would not), you can use an artificial data type option for integer fields called AUTO INCREMENT. When used on a column (must be the first column), the server automatically increases this value for each row inserted. In this way, it creates a default primary key. For more information about auto increment columns, see the online reference manual.

Tip

Best practices suggest using a primary key on a character field to be suboptimal in some situations such as tables with large values for each column or many unique values. This can make searching and indexing slower. In this case, you could use an auto increment field to artificially add a primary key that is smaller in size (but somewhat more cryptic).

There are far more data types available than those shown in the previous example. You should review the online reference manual for a complete list of data types. See the section “Data Types.” If you want to know the layout or “schema” of a table, use the SHOW CREATE TABLE command.

Like databases, you can also get a list of all the tables in the database with the SHOW TABLES command.

Getting Results

The most used basic command you need to know is the command to return the data from the table (also called a result set or rows). To do this, you use the SELECT statement. This SQL statement is the workhorse for a database system. All queries for data will be executed with this command.Footnote 10 As such, we will spend a bit more time looking at the various clauses (parts) that can be used starting with the column list.

The SELECT statement allows you to specify which columns you want to choose from the data. The list appears as the first part of the statement. The second part is the FROM clause, which specifies the table(s) you want to retrieve rows from.

Note

The FROM clause can be used to join tables with the JOIN operator. You will see a simple example of a join in a later section.

The order that you specify the columns determines the order shown in the result set. If you want all of the columns, use an asterisks (*) instead. Listing 5-4 demonstrates three statements that generate the same result sets. That is, the same rows will be displayed in the output of each. In fact, I am using a table with only four rows for simplicity.

Listing 5-4.Example SELECT Statements

mysql> SELECT plant_name, sensor_value, sensor_event, sensor_level FROM plant_monitoring.plants;

+------------------------+--------------+---------------------+--------------+

| plant_name             | sensor_value | sensor_event        | sensor_level |

+------------------------+--------------+---------------------+--------------+

| fern in den            |       0.2319 | 2015-09-23 21:04:35 | NULL         |

| fern on deck           |         0.43 | 2015-09-23 21:11:45 | NULL         |

| flowers in bedroom1    |        0.301 | 2015-09-23 21:11:45 | NULL         |

| weird plant in kitchen |        0.677 | 2015-09-23 21:11:45 | NULL         |

+------------------------+--------------+---------------------+--------------+

4 rows in set (0.00 sec)

mysql> SELECT * FROM plant_monitoring.plants;

+------------------------+--------------+---------------------+--------------+

| plant_name             | sensor_value | sensor_event        | sensor_level |

+------------------------+--------------+---------------------+--------------+

| fern in den            |       0.2319 | 2015-09-23 21:04:35 | NULL         |

| fern on deck           |         0.43 | 2015-09-23 21:11:45 | NULL         |

| flowers in bedroom1    |        0.301 | 2015-09-23 21:11:45 | NULL         |

| weird plant in kitchen |        0.677 | 2015-09-23 21:11:45 | NULL         |

+------------------------+--------------+---------------------+--------------+

4 rows in set (0.00 sec)

mysql> SELECT sensor_value, plant_name, sensor_level, sensor_event FROM plant_monitoring.plants;

+--------------+------------------------+--------------+---------------------+

| sensor_value | plant_name             | sensor_level | sensor_event        |

+--------------+------------------------+--------------+---------------------+

|       0.2319 | fern in den            | NULL         | 2015-09-23 21:04:35 |

|         0.43 | fern on deck           | NULL         | 2015-09-23 21:11:45 |

|        0.301 | flowers in bedroom1    | NULL         | 2015-09-23 21:11:45 |

|        0.677 | weird plant in kitchen | NULL         | 2015-09-23 21:11:45 |

+--------------+------------------------+--------------+---------------------+

4 rows in set (0.00 sec)

Notice that the first two statements result in the same rows as well as the same columns in the same order, but the third statement, while it generates the same rows, displays the columns in a different order.

You can also use functions in the column list to perform calculations and similar operations. One special example is using the COUNT() function to determine the number of rows in the result set, as shown here. See the online reference manual for more examples of functions supplied by MySQL.

SELECT COUNT(*) FROM plant_monitoring.plants;

The next clause in the SELECT statement is the WHERE clause. This is where you specify the conditions you want to use to restrict the number of rows in the result set. That is, only those rows that match the conditions. The conditions are based on the columns and can be quite complex. That is, you can specify conditions based on calculations, results from a join, and more. But most conditions will be simple equalities or inequalities on one or more columns in order to answer a question. For example, suppose you wanted to see the plants where the sensor value read is less than 0.40. In this case, we issue the following query and receive the results. Notice I specified only two columns: the plant name and the value read from sensor.

mysql> SELECT plant_name, sensor_value FROM plant_monitoring.plants WHERE sensor_value < 0.40;

+---------------------+--------------+

| plant_name          | sensor_value |

+---------------------+--------------+

| fern in den         |       0.2319 |

| flowers in bedroom1 |        0.301 |

+---------------------+--------------+

2 rows in set (0.01 sec)

There are additional clauses you can use including the GROUP BY clause, which is used for grouping rows for aggregation or counting, and the ORDER BY clause, which is used to order the result set. Let’s take a quick look at each starting with aggregation.

Suppose you wanted to average the sensor values read in the table for each sensor. In this case, we have a table that contains sensor readings over time for a variety of sensors. While the example contains only four rows (and thus may not be statistically informative), the example demonstrates the concept of aggregation quite plainly, as shown in Listing 5-5. Notice what we receive is simply the average of the four sensor values read.

Listing 5-5.GROUP BY Example

mysql> SELECT plant_name, sensor_value FROM plant_monitoring.plants WHERE plant_name = 'fern on deck'

+--------------+--------------+

| plant_name   | sensor_value |

+--------------+--------------+

| fern on deck |         0.43 |

| fern on deck |         0.51 |

| fern on deck |        0.477 |

| fern on deck |         0.73 |

+--------------+--------------+

4 rows in set (0.00 sec)

mysql> SELECT plant_name, AVG(sensor_value) as avg_value FROM plant_monitoring.plants WHERE plant_name = 'fern on deck' GROUP BY plant_name;

+--------------+-------------------+

| plant_name   | avg_value         |

+--------------+-------------------+

| fern on deck | 0.536750003695488 |

+--------------+-------------------+

1 row in set (0.00 sec)

Notice I specified the average function, AVG(), in the column list and passed in the name of the column I wanted to average. There are many such functions available in MySQL to perform some powerful calculations. Clearly, this is another example of how much power exists in the database server that would require many more resources on a typical lightweight sensor or aggregator node in the network.

Notice also that I renamed the column with the average with the AS keyword. You can use this to rename any column specified, which changes the name in the result set, as you can see in the listing.

Another use of the GROUP BY clause is counting. In this case, we replaced AVG() with COUNT() and received the number of rows matching the WHERE clause. More specifically, we want to know how many sensor values were stored for each plant.

mysql> SELECT plant_name, COUNT(sensor_value) as num_values FROM plant_monitoring.plants GROUP BY plant_name;

+------------------------+------------+

| plant_name             | num_values |

+------------------------+------------+

| fern in den            |          1 |

| fern on deck           |          4 |

| flowers in bedroom1    |          1 |

| weird plant in kitchen |          1 |

+------------------------+------------+

4 rows in set (0.00 sec)

Now let’s say we want to see the results of our result set ordered by sensor value. We will use the same query that selected the rows for the fern on the deck, but we order the rows by sensor value in ascending and descending order using the ORDER BY clause. Listing 5-6shows the results of each option.

Listing 5-6.ORDER BY Examples

mysql> SELECT plant_name, sensor_value FROM plant_monitoring.plants WHERE plant_name = 'fern on deck' ORDER BY sensor_value ASC;

+--------------+--------------+

| plant_name   | sensor_value |

+--------------+--------------+

| fern on deck |         0.43 |

| fern on deck |        0.477 |

| fern on deck |         0.51 |

| fern on deck |         0.73 |

+--------------+--------------+

4 rows in set (0.00 sec)

mysql> SELECT plant_name, sensor_value FROM plant_monitoring.plants WHERE plant_name = 'fern on deck' ORDER BY sensor_value DESC;

+--------------+--------------+

| plant_name   | sensor_value |

+--------------+--------------+

| fern on deck |         0.73 |

| fern on deck |         0.51 |

| fern on deck |        0.477 |

| fern on deck |         0.43 |

+--------------+--------------+

4 rows in set (0.00 sec)

As I mentioned, there is a lot more to the SELECT statement than shown here, but what we have seen here will get you very far, especially when working with data typical of most small to medium-sized IOT solutions.

Adding Data

Now that you have a database and tables created, you will want to load or insert data into the tables. You can do so using the INSERT INTO statement. Here we specify the table and the data for the row. The following shows a simple example:

INSERT INTO plant_monitoring.plants (plant_name, sensor_value) VALUES ('fern in den', 0.2319);

In this example, I am inserting data for one of my plants by specifying the name and value. What about the other columns, you wonder? In this case, the other columns include a timestamp column, which will be filled in by the database server. All other columns (just the one) will be set to NULL, which means no value is available, the value is missing, the value is not zero, or the value is empty.Footnote 11

Notice I specified the columns before the data for the row. This is necessary whenever you want to insert data for fewer columns than what the table contains. More specifically, leaving the column list off means you must supply data (or NULL) for all columns in the table. Also, the order of the columns listed can be different from the order they are defined in the table. Leaving the column list off will result in the ordering the column data based on how they appear in the table.Footnote 12

You can also insert several rows using the same command by using a comma-separated list of the row values, as shown here:

INSERT INTO plant_monitoring.plants (plant_name, sensor_value) VALUES ('flowers in bedroom1', 0.301), ('weird plant in kitchen', 0.677), ('fern on deck', 0.430);

Here I’ve inserted several rows with the same command. Note that this is just a shorthand mechanism and, except for automatic commits, no different than issuing separate commands.Footnote 13

Changing Data

There are times when you want to change or update data. You may have a case where you need to change the value of one or more columns, replace the values for several rows, or correct formatting or even scale of numerical data. To update data, we use the UPDATE command.

You can update a particular column, update a set of columns, perform calculations one or more columns, and more. I do not normally have much need to change data in an IOT solution, but sometimes in the case of mistakes in sensor-reading code or similar data entry problems, it may be necessary.

What may be more likely is you or your users will want to rename an object in your database. For example, suppose we determine the plant on the deck is not actually a fern but was an exotic flowering plant.Footnote 14 In this case, we want to change all rows that have a plant name of “fern on deck” to “flowers on deck.” The following command performs the change:

UPDATE plant_monitoring.plants SET plant_name = 'flowers on deck' WHERE plant_name = 'fern on deck'

Notice the key operator here is the SET operator. This tells the database to assign a new value to the column(s) specified. You can list more than one set operation in the command.

Notice I used a WHERE clause here to restrict the UPDATE to a particular set of rows. This is the same WHERE clause as you saw in the SELECT statement, and it does the same thing; it allows you to specify conditions that restrict the rows affected. If you do not use the WHERE clause, the updates will apply to all rows.

Caution

Don’t forget the WHERE clause! Issuing an UPDATE command without a WHERE clause will affect all rows in the table!

Removing Data

Sometimes you end up with data in a table that needs to be removed. Maybe you used test data and want to get rid of the fake rows, or perhaps you want to compact or purge your tables or want to eliminate rows that no longer apply. To remove rows, use the DELETE FROM command.

Let’s look at an example. Suppose you have a plant-monitoring solution under development and you’ve discovered that one of your sensors or sensor nodes are reading values that are too low, because of a coding, wiring, or calibration error. In this case, we want to remove all rows with a sensor value less than 0.20. The following command does this:

DELETE FROM plants WHERE sensor_value < 0.20;

Caution

Don’t forget the WHERE clause! Issuing a DELETE FROM command without a WHERE clause will permanently delete all rows in the table!

Notice I used a WHERE clause here. That is, a conditional statement to limit the rows acted upon. You can use whatever columns or conditions you want; just be sure you have the correct ones! I like to use the same WHERE clause in a SELECT statement first. For example, I would issue the following first to check that I am about to delete the rows I want and only those rows. Notice it is the same WHERE clause.

SELECT * FROM plants WHERE sensor_value < 0.20;

MySQL Concepts

Besides the commands shown earlier, there are additional concepts that you may want to consider using. While it is true each has one SQL command, I list them here as a concept because they are more complex than a simple object creation or data retrieval.

Indexes

Tables are created without the use of any ordering. That is, tables are unordered. While it is true MySQL will return the data in the same order each time, there is no implied (or reliable) ordering unless you create an index. The ordering I am referring to here is not like you think when sorting (that’s possible with the ORDER BY clause in the SELECT statement).

Rather, indexes are mappings that the server uses to read the data when queries are executed. For example, if you had no index on a table and wanted to select all rows with a value greater than a certain value for a column, the server will have to read all rows to find all the matches. However, if we added an index on that column, the server would have to read only those rows that match the criteria.

I should note that there are several forms of indexes. What I am referring to here is a clustered index where the value for column in the index is stored in the index, allowing the server to read the index only and not the rows to do the test for the criteria.

To create an index, you can either specify the index in the CREATE TABLE statement or issue a CREATE INDEX command. The following shows a simple example:

CREATE INDEX plant_name ON plants (plant_name);

This command adds an index on the plant_name column. Observe how this affects the table.

CREATE TABLE `plants` (

  `plant_name` char(30) DEFAULT NULL,

  `sensor_value` float DEFAULT NULL,

  `sensor_event` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

  `sensor_level` char(5) DEFAULT NULL,

  KEY `plant_name` (`plant_name`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1

Indexes created like this do not affect the uniqueness of the rows in the table, in other words, making sure there exists one and only one row that can be accessed by a specific value of a specific column (or columns). What I am referring to is the concept of a primary key (or primary index), which is a special option used in the creation of the table as described earlier.

Views

Views are logical mappings of results of one or more tables. They can be referenced as if they were tables in queries, making them a powerful tool for creating subsets of data to work with. You create a view with CREATE VIEW and give it a name similar to a table. The following shows a simple example where we create a test view to read values from a table. In this case, we limit the size of the view (number of rows), but you could use a wide variety of conditions for your views, including combining data from different tables.

CREATE VIEW test_plants AS SELECT * FROM plants LIMIT 5;

Views are not normally encountered in small or medium-sized database solutions, but I include them to make you aware of them in case you decide to do additional analysis and want to organize the data into smaller groups for easier reading.

Triggers

Another advanced concept (and associated SQL command) is the use of an event-driven mechanism that is “triggered” when data is changed. That is, you can create a short set of SQL commands (a procedure) that will execute when data is inserted or changed.

There are several events or conditions under which the trigger will execute. You can set up a trigger either before or after an update, insert, or delete action. A trigger is associated with a single table and has as its body a special construct that allows you to act on the rows affected. The following shows a simple example:

DELIMITER //

CREATE TRIGGER set_level BEFORE INSERT ON plants FOR EACH ROW

BEGIN

  IF NEW.sensor_value < 0.40 THEN

    SET NEW.sensor_level = 'LOW'

  ELSEIF NEW.sensor_value < 0.70 THEN

    SET NEW.sensor_level = 'OK'

  ELSE

    SET NEW.sensor_level = 'HIGH'

  END IF;

END //

DELIMITER ;

This trigger will execute before each insert into the table. As you can see in the compound statement (BEGIN...END), we set a column called sensor_level to LOW, OK, or HIGH depending on the value of the sensor_value. To see this in action, consider the following command. The FOR EACH ROW syntax allows the trigger to act on all rows in the transaction.

INSERT INTO plants (plant_name, sensor_value) VALUES ('plant1', 0.5544);

Since the value we supplied is less than the middle value (0.70), we expect the trigger to fill in the sensor_level column for us. The following shows this indeed is what happened when the trigger fired:

+-------------+--------------+---------------------+--------------+

| plant_name  | sensor_value | sensor_event        | sensor_level |

+-------------+--------------+---------------------+--------------+

| plant1      |       0.5544 | 2015-09-23 20:00:15 | OK           |

+-------------+--------------+---------------------+--------------+

1 row in set (0.00 sec)

This demonstrates an interesting and powerful way you can create derived columns with the power of the database server and save the processing power of your sensor or aggregator nodes. I encourage you to consider this and similar powerful concepts for leveraging the power of the database server.

Simple Joins

One of the most powerful concepts of database systems is the ability to make relationships (hence the name relational) among the data. That is, data in one table can reference data in another (or several tables). The most simplistic form of this is called a master-detail relationship where a row in one table references or is related to one or more rows in another.

A common (and classic) example of a master-detail relationship is from an order-tracking system where we have one table containing the data for an order and another table containing the line items for the order. Thus, we store the order information such as customer number and shipping information once and combine or “join” the tables when we retrieve the order proper.

Let’s look at an example from the sample database named world. You can find this database on the MySQL web site ( http://dev.mysql.com/doc/index-other.html ). Feel free to download it and any other sample database. They all demonstrate various designs of database systems. You will also find it handy to practice querying the data as it contains more than a few, simple rows.

Note

If you want to run the following examples, you need to install the world database as described in the documentation for the example ( http://dev.mysql.com/doc/world-setup/en/world-setup-installation.html ).

Listing 5-7 shows an example of a simple join. There is a lot going on here, so take a moment to examine the parts of the SELECT statement, especially how I specified the JOIN clause. You can ignore the LIMIT option because that simply limits the number of rows in the result set.

Listing 5-7.Simple JOIN Example

mysql> SELECT Name, Continent, Language FROM Country JOIN CountryLanguage ON Country.Code = CountryLanguage.CountryCode LIMIT 10;

+-------------+---------------+------------+

| Name        | Continent     | Language   |

+-------------+---------------+------------+

| Aruba       | North America | Dutch      |

| Aruba       | North America | English    |

| Aruba       | North America | Papiamento |

| Aruba       | North America | Spanish    |

| Afghanistan | Asia          | Balochi    |

| Afghanistan | Asia          | Dari       |

| Afghanistan | Asia          | Pashto     |

| Afghanistan | Asia          | Turkmenian |

| Afghanistan | Asia          | Uzbek      |

| Angola      | Africa        | Ambo       |

+-------------+---------------+------------+

10 rows in set (0.00 sec)

Here I used a JOIN clause that takes two tables specified such that the first table is joined to the second table using a specific column and its values (the ON specifies the match). What the database server does is read each row from the tables and returns only those rows where the value in the columns specified a match. Any rows in one table that are not in the other are not returned.

Tip

But you can retrieve those rows with different joins. See the online reference manual on inner and outer joins for more details.

Notice also that I included only a few columns. In this case, I specified the country name and continent from the Country table and the language column from the CountryLanguage table. If the column names were not unique (the same column appears in each table), I would have to specify them by table name such as Country.Name. In fact, it is considered good practice to always qualify the columns in this manner.

There is one interesting anomaly in this example that I feel important to point out. In fact, some would consider it a design flaw. Notice in the JOIN clause I specified the table and column for each table. This is normal and correct, but notice the column name does not match in both tables. While this really doesn’t matter and creates only a bit of extra typing, some DBAs would consider this erroneous and would have a desire to make the common column name the same in both tables.

Another use for a join is to retrieve common, archival, or lookup data. For example, suppose you had a table that stored details about things that do not change (or rarely change) such as cities associated with ZIP codes or names associated with identification numbers (e.g., SSN). You could store this information in a separate table and join the data on a common column (and values) whenever you needed. In this case, that common column can be used as a foreign key, which is another advanced concept.

Foreign keys are used to maintain data integrity (that is, if you have data in one table that relates to another table but the relationship needs to be consistent). For example, if you wanted to make sure when you delete the master row that all of the detail rows are also deleted, you could declare a foreign key in the master table to a column (or columns) to the detail table. See the online reference manual for more information about foreign keys.

This discussion on joins touches only the very basics. Indeed, joins are arguably one of the most difficult and often confused areas in database systems. If you find you want to use joins to combine several tables or extend data so that data is provided from several tables (outer joins), you should spend some time with an in-depth study of database concepts such as Clare Churcher’s book Beginning Database Design (Apress, 2012).

Additional Advanced Concepts

There are more concepts and commands available in MySQL, but two that may be of interest are PROCEDURE and FUNCTION, sometimes called routines. I introduce these concepts here so that if you want to explore them, you understand how they are used at a high level.

Suppose you need to run several commands to change data. That is, you need to do some complex changes based on calculations. For these types of operations, MySQL provides the concept of a stored procedure. The stored procedure allows you to execute a compound statement (a series of SQL commands) whenever the procedure is called. Stored procedures are sometimes considered an advanced technique used mainly for periodic maintenance, but they can be handy in even the more simplistic situations.

For example, suppose you want to develop your IOT solution, but since you are developing it, you need to periodically start over and want to clear out all the data first. If you had only one table, a stored procedure would not help much, but suppose you have several tables spread over several databases (not unusual for larger IOT solutions). In this case, a stored procedure may be helpful.

Tip

When entering commands with compound statements in the MySQL client, you need to change the delimiter (the semicolon) temporarily so that the semicolon at the end of the line does not terminate the command entry. For example, use DELIMITER // before writing the command with a compound statement, use // to end the command, and change the delimiter back with DELIMITER ;. This is only when using the client.

Since stored procedures can be quite complicated, if you decide to use them, read the “CREATE PROCEDURE and CREATE FUNCTION Syntax” section of the online reference manual before trying to develop your own. There is more to creating stored procedures than described in this section.

Now suppose you want to execute a compound statement and return a result—you want to use it as a function. You can use functions to fill in data by performing calculations, data transformation, or simple translations. Functions therefore can be used to provide values to populate column values, provide aggregation, provide date operations, and more.

You have already seen a couple of functions (COUNT, AVG). These are considered built-in functions, and there is an entire section devoted to them in the online reference manual. However, you can also create your own functions. For example, you may want to create a function to perform some data normalization on your data. More specifically, suppose you have a sensor that produces a value in a specific range, but depending on that value and another value from a different sensor or lookup table, you want to add, subtract, average, and so on, the value to correct it. You could write a function to do this and call it in a trigger to populate the value for a calculation column.

Tip

Use a new column for calculated values so that you preserve the original value.

What About Changing Objects?

You may be wondering what you do when you need to modify a table, procedure, trigger, and so on. Rest easy, you do not have to start over from scratch! MySQL provides an ALTER command for each object. That is, there is an ALTER TABLE, ALTER PROCEDURE, and so on. See the online reference manual section entitled “Data Definition Statements” for more information about each ALTER command.

Planning Database Storage for IOT Data

Now that you know how to get, install, and use MySQL, it’s time to focus on how to apply what you learned in previous chapters and set up a database for storing IOT data. Recall IOT data can be any form of sensor data, personal information, codes, dates of events, identification of devices, and so on.

I present this topic by way of example. More specifically, I feel it is best to demonstrate database design rather than dictate practice and policy by rhetoric. I think the examples show many of the concepts and constructs you are likely to encounter when designing your own database(s) for your IOT solutions. Let’s dive in with a complete design for a plant-monitoring solution.

The following may seem familiar since I used some primitives of these tables in previous sections. However, this section contains a fully developed database design. This is just one possible design I could have used. I challenge you to consider alternatives should you consider implementing your own plant-monitoring solution.

Correct Database Design: Am I Doing This Right?

There isn’t really any wrong way to design your database. While some DBAs would cringe at such a claim, so long as you can achieve all your goals with reasonable performance and no loss of data, you should consider your design feasible. After all, even the most complex and professionally design databases go through routine and evolutional changes. That is, you don’t have to get it right on the first go. You can always adapt your database to your growing and maturing IOT solutions needs.

Example 1: Plant-Monitoring System

Let’s explore an IOT solution that monitors ambient temperature and soil moisture for plants. In this case, the solution is designed to support any number of plants (the target data object). A key component of this IOT solution is that all the data collected is stored where the sensors employed are read about once every hour.

In the following example, I show you four basic steps that I like to use when designing a database. You may find other philosophies that have more steps with more rigid processes (and that’s great), but for the enthusiasts and hobbyists, I recommend using these simplified steps. That doesn’t mean you cannot design any other way, just that this method should work for most. Indeed, if you have experience in designing databases, you should see parallels with your own methodology.

Step 1: Describe the Data

The first thing you should do when designing a database is to describe the data as completely as you can. You should describe the data in English terms perhaps even writing it out on a piece of paper. Doing so helps you define what the data looks like conceptually so that you can determine how many objects you want to store, what they are composed of, and how to organize them.

The plant-monitoring system should store information about plants. Specifically, we want to store information that tells us when plants need watering and how often. It is also important to know the name of the plant, where it is located, and whether it is inside or outside. Thus, the data we need to store consists of the plant name, location, whether they are inside or outside, soil moisture sensor value, temperature sensor value, and time of the sensor reading. We also determine that we want to quantify the value of the soil moisture to make it easy to write an application to detect when a plant needs watering. Indeed, we could even add an automatic watering feature in the future!

Since the plant name is always the same, we don’t need to store that information more than once. Thus, we will create one table to store the information about the plant we are monitoring and create another table to store the sensor readings. This way, if you need to change the name of a plant (like we saw previously) or you want to change its location, you need to change it in only one place. Since we will store the sensor readings in a different table, we must choose a column to be used to join the tables. Since we don’t have any reasonable numeric value assigned to each plant, we can use the auto increment feature to add a unique key (in this case the primary key).

Step 2: Design the Database Objects

Now let’s see how these tables would look like. We name the master table (the one that stores the plant information) plants and the detail table (sensor readings) readings. We place each of these tables in the database named plant_monitoring. Listing 5-8 shows the layout or schema of each of the tables in the database.

Listing 5-8.Plant-Monitoring Schema

-- A database for storing plant soil moisture and ambient temperature

CREATE DATABASE plant_monitoring;

USE plant_monitoring;

-- This table stores information about a plant.

CREATE TABLE `plants` (

  `id` int(11) NOT NULL AUTO_INCREMENT,

  `name` char(50) DEFAULT NULL,

  `location` char(30) DEFAULT NULL,

  `climate` enum ('inside','outside') DEFAULT 'inside',

  PRIMARY KEY (`id`)

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

-- This table stores values read from sensors. The sensors are read

-- periodically by a sensor node and the values stored as a single row

-- with the date and time added by the database server.

CREATE TABLE `readings` (

  `id` int(11) NOT NULL,

  `moisture` float DEFAULT NULL,

  `temperature` float DEFAULT NULL,

  `event_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,

  `soil_status` enum ('DRY', 'OK', 'WET') DEFAULT NULL

) ENGINE=InnoDB DEFAULT CHARSET=latin1;

DELIMITER //

CREATE TRIGGER set_status BEFORE INSERT ON readings FOR EACH ROW

BEGIN

  IF NEW.moisture < 250 THEN

    SET NEW.soil_status = 1;

  ELSEIF NEW.moisture < 400 THEN

    SET NEW.soil_status = 2;

  ELSE

    SET NEW.soil_status = 3;

  END IF;

END //

DELIMITER ;

Notice the trigger defined on the readings table. Recall from a previous section that a trigger can be used to provide data for calculated columns, which is what we want to store in the soil_status column. In this case, we set the low threshold for dry soil to 250 and wet for anything over 400. Thus, between 250 and 399, the plant soil is considered normal (Ok).

Also, notice the comments included in the SQL code. The double dash is a comment that the MySQL client will ignore. While this example database is rather simple, it does not hurt to add short descriptions of your database objects (tables, and so on) so that you can remember what each does should you forget or as someone else to work with the database or your IOT solution (like for a bespoke application).

Notice also the location column in the plants table. This is an example of using enumerated values so that you don’t have to repeatedly enter the string with each plant you add to the table. Values start at 1, so you can specify 1 for inside and 2 for outside, as shown here:

mysql> INSERT INTO plants VALUES(NULL, 'Jerusalem Cherry', 'deck', 2);

Query OK, 1 row affected (0.01 sec)

mysql> select * from plants;

+----+---------------------+----------+---------+

| id | name                | location | climate |

+----+---------------------+----------+---------+

|  1 | Jerusalem Cherry    | deck     | outside |

+----+---------------------+----------+---------+

1 row in set (0.00 sec)

Note

I chose as descriptive names as I could for the columns. I also threw in some that are a bit fuzzy. Can you tell which ones could use a better name? Hint: what does climate mean to you? Clearly, choosing valid, meaningful names for columns is a challenge and a bit of an art form.

Finally, notice that while I added a primary key, in this case the AUTO_INCRMENT data type to the plants table, I did not add one to the readings table. This is so that we can store any number of rows in the readings table and that it is entirely possible that the data will not be unique. More precisely, the values read from the sensors may be identical from two or more readings. Thus, I left the uniqueness factor off of the definition of the readings table.

I should also note a consequence of using the AUTO_INCREMENT data type to uniquely identify rows. While conceptually there cannot be more than one plant named the same in the same location in the plants table (even if there were three ferns on the deck, most likely you’d name them differently), the fact that AUTO_INCREMENT is an artificial uniqueness mechanism means you could very well enter the same data twice resulting in different AUTO_INCREMENT values. So, a bit of caution is prudent when working with AUTO_INCREMENT.

There are two handy tools you can use when designing tables. First, you can use the SHOW CREATE TABLE command to see the actual SQL command to re-create the table. In fact, the SHOW CREATE can be used for any object such as SHOW CREATE TRIGGER. Second, you can use the EXPLAIN command, as shown in Listing 5-9.

Listing 5-9.Using EXPLAIN

mysql> explain plants;

+------------+----------+------+-----+---------+----------------+

| Field      | Type     | Null | Key | Default | Extra          |

+------------+----------+------+-----+---------+----------------+

| id         | int(11)  | NO   | PRI | NULL    | auto_increment |

| name       | char(50) | YES  |     | NULL    |                |

| location   | char(30) | YES  |     | NULL    |                |

| plant_type | char(30) | YES  |     | NULL    |                |

+------------+----------+------+-----+---------+----------------+

4 rows in set (0.00 sec)

mysql> explain readings;

+-------------+-------------+------+-----+-------------------+-----------------------------+

| Field       | Type        | Null | Key | Default           | Extra                       |

+-------------+-------------+------+-----+-------------------+-----------------------------+

| id          | int(11)     | NO   |     | NULL              |                             |

| moisture    | float       | YES  |     | NULL              |                             |

| temperature | float       | YES  |     | NULL              |                             |

| event_time  | timestamp   | NO   |     | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |

| soil_status | enum('DRY', | YES  |     | NULL              |                             |

|             |  'OK','WET')|      |     |                   |                             |

+-------------+-------------+------+-----+-------------------+-----------------------------+

5 rows in set (0.00 sec)

Here you see a result set that shows all the columns and their data types and options for each table in the plant monitoring database. Notice the extra information we get for the timestamp field. Here it tells us the timestamp will be updated when the row is updated, but it also applies to a new row.

But wait! How does each sensor know what the ID is for each plant? Well, this is a bit of a cart-and-horse situation. If you consider each sensor may be read and the data sent via a small microcontroller board, then the code (sketch in Arduino lingo) must know the ID from the plants table. Since we used an AUTO_INCREMENT column in the table, we must first insert the data about the plant and then query the table for its ID. The following shows an example:

mysql> INSERT INTO plant_monitoring.plants VALUES (NULL, 'fern', 'beside picnic table', 2);

Query OK, 1 row affected (0.01 sec)

mysql> SELECT LAST_INSERT_ID();

+------------------+

| LAST_INSERT_ID() |

+------------------+

|                9 |

+------------------+

1 row in set (0.00 sec)

mysql> SELECT * FROM plant_monitoring.plants WHERE id = 9;

+----+------+---------------------+---------+

| id | name | location            | climate |

+----+------+---------------------+---------+

|  9 | fern | beside picnic table | outside |

+----+------+---------------------+---------+

1 row in set (0.00 sec)

But notice I did not have to actually query the plants table. Instead, I used the LAST_INSERT_ID() function, which returns the value of the last AUTO_INCREMENT value generated. Cool!

Once we have this information, we can program the sensor reader to send this information to the database to populate the readings table and therefore make the relationship to the plants table. That means when we query for all the sensor readings for a particular plant, the join will return the correct information. Thus, if I wanted to store data about the plant with id = 9, I would use an INSERT statement like the following:

INSERT INTO plant_monitoring.readings VALUES (9, 233, 38.4, NULL, NULL);

Database Design First? What About The Application?

I once worked for an organization that felt the application and user interface were more important than the database design. I guess that’s why they eventually hired more DBAs than developers. They had to reengineer the database several times during the development of the application, which caused major delays in deliveries and some interesting and serious bugs.

Always design your database including the questions you want to ask after you’ve defined the high-level requirements but before you implement the source code. This way, your application will be based on a fully functional database system with well-defined queries rather than struggling to make queries fit some code implementation.

Now let’s see how to get information out of the database.

Step 3: Design the Queries or Questions for the Database

Whenever designing a database, I make it a point, if not a required milestone, to determine what queries I want to run on the data (in other words, the questions I want to ask and the answers I expect). These questions are those that would be asked of an application designed to present the information to users. Think of it this way: if you had a plant-monitoring system, what features or results do you expect to see in the user interface? Defining these before you design your application will make the application development proceed much more smoothly.

Queries we may want to perform on a plant-monitoring solution include but are not limited to the following. As you will see, some of these are simple and easy, while others may take a more complex SELECT statement to achieve. I will demonstrate some of these in the following sections, leaving the others for your exploration.

  • Which plants need watering?

  • Which plants are experiencing the highest temperatures?

  • What is the average soil moisture for each plant for each day?

  • What is the temperature range that a particular plant endures during the daylight hours?

  • How many plants are being monitored?

  • How many plants are outside?

  • What is the average temperature of the inside or outside plants?

The following sections walk you through how to create the queries for three of these questions. I present the queries first without validation and later will walk through the same logical process showing you how to test each query with known test data. However, for some of the more complex queries, I show you how they work in context so that you can see how the parts fit together.

Example 1: A Simple Query

Let’s begin with a simple query. This one involves doing some aggregation. Recall one of the questions we want to ask is, “What is the average temperature of the inside or outside plants?” In this case, let’s make it a bit easier and query only for the plants that are outside. As you will see, it is trivial to change the query to find the inside plants.

Like all good query design, we begin by breaking down the problem. For this query, we need to know the average temperature for each plant. Recall we need to use a GROUP BY clause with the AVG() function. But what if we want to restrict the rows to only the readings taken on a given day such as today? That is, our application may provide a periodic status of the average temperature for each plant for the current day. We could query the data over a range of times, but that is more of an analysis operation—not that you couldn’t do that. Just to keep it simple, we’ll use today’s readings.

So, how would you query for such information? Clearly, we need to use some form of date check. Fortunately, MySQL provides many such functions. For this, we will use the DATE() and CURRENT_DATE() functions. These perform the operation of determining the samples that were taken today using the event_time TIMESTAMP column. One of the really cool and powerful features of the database is if we use these functions, we can create queries, views, functions, and so on, that automatically determine which rows were recorded today. Thus, we don’t need to read, store, and transmit the date when doing queries using these functions. Cool!

The following WHERE clause demonstrates how to do this. Notice we pass in the event_time column to the function, which then determines the date. We compare this to the current date and thus find only those rows where the sensor reading was taken today.

WHERE DATE(event_time) = CURRENT_DATE()

Now we just need to do the average. We’ve already seen an example earlier, so the following SQL command should look familiar—at least in concept:

SELECT id, AVG(temperature) as avg_temp FROM readings WHERE DATE(event_time) = CURRENT_DATE() GROUP BY id;

This will give us the average temperature and the ID for each plant for those readings taken today. All that is left is to join this information with the information in the plants table to get the name, location, and average temperature. The following SQL statement shows how to do this:

SELECT name, location, AVG(temperature) as avg_temp

FROM plants JOIN readings ON plants.id = readings.id

WHERE DATE(event_time) = CURRENT_DATE()

GROUP BY plants.id;

Wow, there is a lot going on here for such a simple query! I placed each clause on a separate line to make it easier to read. Let’s review each.

First, we see the columns selected are the plant name and location as well as the average temperature. Recall the GROUP BY clause is what controls which rows are supplied to the function. In this case, we group by the plant ID. Notice the FROM clause performs a join between the plants table and the readings table. This is so that we can get the plant information from the plants table but perform our calculations (average temperature) on the data from the readings table that matches the plants table. Thus, a simple join! Finally, we use the WHERE clause described earlier to restrict the data to only those samples taken today.

But we’re not done. We haven’t answered the question in its entirety. That is, we also want to know the average temperature for the outside plants only. All we need to do then is add that condition to the WHERE clause as follows:

SELECT name, location, AVG(temperature) as avg_temp

FROM plants JOIN readings ON plants.id = readings.id

WHERE DATE(event_time) = CURRENT_DATE() AND plants.climate = 2

GROUP BY plants.id;

Now, we’re done! To query for the inside plants, just change the climate value to 1. Recall this is the value of the enumerated data type for the climate column.

Now let’s look at a much more complex query. At first glance, this will see simple but as you shall see has many layers.

Note

What follows is just one example of how to form the query. There are several others, some more optimal, but I want to demonstrate the logical thought process you could take to solve the problem. Feel free to experiment and improve the following example.

Example 2: A Complex Query

Let’s now look at a more complex query. Or rather one that is simple in its formation but not trivial in implementation. In this case, consider the query, “Which plants need watering?” For this, we need to know which of the plants have a moisture value that is below our threshold of DRY/OK/WET. Since we use a calculated column, we do not have to see the actual sensor value. That is, we can query the readings table and discover which are marked DRY.

You may think that we just need to query the readings table for any value of DRY for the sensor readings for today. This will get you close, but it likely result in a number of rows, but what if you or an automatic plant-watering system waters the plant or what if it rains? You could have only a few readings of DRY but more readings of OK or WET. In this case, the plant may not need watering at all.

You may also consider selecting the most recent sensor reading for each plant. Indeed, this is what most people would do. But that won’t cover situations where the sensors are reading values on the borderline or produce spurious readings. For example, the hobbyist-level soil moisture sensors (in other words, the affordable ones) are not 100 percent accurate and can produce slightly inconsistent readings. They are perfectly fine for a “broad” reading so long as you understand you should review several readings either averaging or considering the percentage of readings of one range versus another.

What we need is a way to determine those plant readings of DRY that are more frequent than the other values, which requires a bit of mathematics. And you thought this was simple. Well, it is if you break it down into smaller parts, which is why I chose this example. Let’s break this down into parts. We begin with the most basic statement of what we want.

More specifically, we want to know the current day’s soil status for all plants. We will count the values of soil_status and determine the percentage of the occurrence based on the total readings for the day. We can use that information later to determine which plants need watering.

To make this work, we’re going to use a concept in MySQL called a view. Recall a view is a logical representation of a result set and can be treated like a table in other SELECT statements. The following shows the view to retrieve the current day’s soil status:

CREATE VIEW soil_status_today AS

  SELECT id, soil_status, count(soil_status) as num_events FROM plant_monitoring.readings

  WHERE DATE(event_time) = CURRENT_DATE() GROUP BY id, soil_status;

Notice also I used a GROUP BY clause to aggregate the values counting how many were each status value. Let’s see an example result. Notice I query the view just like a table.

mysql> SELECT * FROM plant_monitoring.soil_status_today;

+----+-------------+------------+

| id | soil_status | num_events |

+----+-------------+------------+

|  1 | DRY         |         10 |

|  2 | OK          |         10 |

|  3 | DRY         |          4 |

|  3 | OK          |          4 |

|  3 | WET         |          2 |

|  4 | OK          |          6 |

|  4 | WET         |          4 |

|  5 | OK          |         10 |

+----+-------------+------------+

8 rows in set (0.01 sec)

So, this tells us that, for today, plant IDs 1 and 3 are dry. But we’re not done! Consider there are multiple samples taken during the day. Some plants may be on the threshold where they are a little dry but not so much that the sensor readings are consistent. So, we want plants that are consistently dry where there are more DRY events read than OK or WET. We could use another view to get this information, but let’s see how we can do this with a stored function. The following creates a function that returns the maximum number of samples collected for today for a given plant:

DELiMITER //

CREATE FUNCTION plant_monitoring.max_samples_today (in_id int)

RETURNS int DETERMINISTIC READS SQL DATA

BEGIN

  DECLARE num_samples int;

  SELECT COUNT(*) into num_samples FROM plant_monitoring.readings

  WHERE DATE(event_time) = CURRENT_DATE() AND readings.id = in_id;

  RETURN num_samples;

END //

DELIMITER ;

Let’s see how this function works. Let’s create a query using the view and calculate the percentage of occurrence for each value for each plant. The following SELECT statement accomplishes this with a bit of mathematics. I include the rows to show you that it works.

mysql> SELECT *, max_samples_today(id) as max_samples, (num_events/max_samples_today(id)) as percent_occurrence FROM plant_monitoring.soil_status_today;

+----+-------------+------------+-------------+--------------------+

| id | soil_status | num_events | max_samples | percent_occurrence |

+----+-------------+------------+-------------+--------------------+

|  1 | DRY         |         10 |          10 |             1.0000 |

|  2 | OK          |         10 |          10 |             1.0000 |

|  3 | DRY         |          4 |          10 |             0.4000 |

|  3 | OK          |          4 |          10 |             0.4000 |

|  3 | WET         |          2 |          10 |             0.2000 |

|  4 | OK          |          6 |          10 |             0.6000 |

|  4 | WET         |          4 |          10 |             0.4000 |

|  5 | OK          |         10 |          10 |             1.0000 |

+----+-------------+------------+-------------+--------------------+

8 rows in set (0.01 sec)

In this case, I have exactly ten sensor readings for each plant for today. This is because my test data (which I will show you in a later section) is fixed. When I run this on my live plant-monitoring solution in my home, my sensor readings average about one per hour with 18 to 24 readings per day.

Notice I added the function call to get the maximum number of samples (readings) and then another column calculating the percentage of the total occurrences. But this is a lot of work. We can do it a bit easier now that we know how to do mathematical operations in the query.

All we need to do now is add a check for the percentage, say more than 50 percent, and restrict the rows to those with a soil_status of DRY. Thus, we take the previous query and add a few more conditions. We will also limit the columns in the result to just the id. The following query shows an example SELECT statement to determine which readings indicate which plant or plants need watering today:

mysql> SELECT id FROM soil_status_today WHERE ((num_events/max_samples_today(id)) > 0.50) AND soil_status = 1;

+----+

| id |

+----+

|  1 |

+----+

1 row in set (0.00 sec)

We’re almost there. Once again, we will use the previous query and join against the plant table to determine which plants need monitoring.

SELECT name, location FROM plants JOIN (SELECT id FROM soil_status_today WHERE ((num_events/max_samples_today(id)) > 0.50) AND soil_status = 1) as sub_query on plants.id = sub_query.id;

And now we know the plant that needs watering today. Notice we simply used our logical breakdown of how to find the data starting from the most basic and moving outward (as in narrowing our results). As you can see, this query wasn’t nearly that difficult, and since we created a function and a view to help us, queries similar to this one will be much easier to write.

Step 4: Testing the Database

Now that we have the database designed, implemented, and the queries decided upon (or at least all of the ones we can think of), we are ready to start building our sensor network and code the application, right? Nope. We need to test the queries not only to make sure they work (that is, have the correct syntax) but also to make sure we are getting the results we expect. This requires having a known data set to work with. It will do you little good if you find out after your application has gone viral that one of the queries doesn’t return the correct data.

At this point, the data may not be completely accurate nor does it require actual, live data. You can use made-up data so long as you make data that is representative of the range of values for each column in the table. That is, make sure you know what the min and max values are that your sensors can read. Listing 5-10 shows some sample data I created to test the queries for the plant-monitoring solution.

Listing 5-10.Sample Data

INSERT INTO plant_monitoring.plants VALUES (NULL, 'Jerusalem Cherry', 'deck', 2);

INSERT INTO plant_monitoring.plants VALUES (NULL, 'Moses in the Cradle', 'patio', 2);

INSERT INTO plant_monitoring.plants VALUES (NULL, 'Peace Lilly', 'porch', 1);

INSERT INTO plant_monitoring.plants VALUES (NULL, 'Thanksgiving Cactus', 'porch', 1);

INSERT INTO plant_monitoring.plants VALUES (NULL, 'African Violet', 'porch', 1);

INSERT INTO plant_monitoring.readings VALUES (1, 235, 39.9, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (1, 235, 38.7, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (1, 230, 38.8, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (1, 230, 39.1, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (1, 215, 39.2, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (1, 215, 39.5, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (1, 225, 39.2, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (1, 220, 38.9, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (1, 222, 38.5, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (1, 218, 37.1, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (2, 355, 38.1, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (2, 350, 38.6, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (2, 366, 38.7, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (2, 378, 38.8, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (2, 361, 38.7, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (2, 348, 37.5, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (2, 343, 39.1, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (2, 342, 38.8, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (2, 358, 36.9, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (2, 377, 36.1, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (3, 155, 33.6, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (3, 150, 33.7, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (3, 166, 33.6, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (3, 278, 32.3, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (3, 261, 31.2, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (3, 248, 32.5, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (3, 313, 33.6, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (3, 342, 32.8, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (3, 458, 31.9, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (3, 470, 33.4, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (4, 333, 33.1, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (4, 345, 33.6, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (4, 360, 34.4, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (4, 380, 34.2, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (4, 395, 33.7, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (4, 385, 33.4, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (4, 425, 32.3, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (4, 420, 31.1, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (4, 422, 33.8, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (4, 418, 32.5, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (5, 335, 39.9, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (5, 335, 38.7, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (5, 330, 38.8, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (5, 330, 39.1, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (5, 315, 39.2, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (5, 315, 39.5, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (5, 325, 39.2, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (5, 320, 38.9, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (5, 322, 38.5, NULL, NULL);

INSERT INTO plant_monitoring.readings VALUES (5, 318, 37.1, NULL, NULL);

Notice I used NULL for the last two columns. This is habit of mine from my DBA days where I always supply a value for every field in the table. You do not have to do it that way. Indeed, it is also correct to specify the SQL commands with just the values you know, but remember to use the column list, as shown here:

INSERT INTO plant_monitoring.readings (id, moisture, temperature) VALUES (7,418,32.5);

As a simple test, let’s do a simple join to find all the readings for a particular plant. For example, we ask the question, “What are the moisture readings for the plants on the deck?” The following query returns the results:

mysql> SELECT name, location, moisture FROM plants JOIN readings ON plants.id = readings.id WHERE location LIKE '%deck%'

+------+----------+----------+

| name | location | moisture |

+------+----------+----------+

| fern | deck     |      235 |

| fern | deck     |      235 |

| fern | deck     |      230 |

| fern | deck     |      230 |

| fern | deck     |      215 |

| fern | deck     |      215 |

| fern | deck     |      225 |

| fern | deck     |      220 |

| fern | deck     |      222 |

| fern | deck     |      218 |

+------+----------+----------+

10 rows in set (0.00 sec)

Notice I used a LIKE function in the WHERE clause. I used it because I wasn’t sure if there were more than one plant on the deck. For example, there could have been a plant whose location was, “deck by table” or “on deck under tree.” Using the LIKE with a wildcard, %, on each side returns all rows that have “deck” in the location column value. Cool! Take some time to look at the sample data in Listing 5-10 to ensure you are seeing the correct data.

Now that we have some sample data, let’s see the results of the example queries. I walk through the results in the same order that I explained the previous query.

Testing Example 1

Recall for this query we simply want the average temperature for the plants that are outside. When I test a query, I like to break it down into its most simplistic parts—the same way I did to develop it. This way, I can verify I am getting the correct results for each part. Let’s see that query again.

SELECT name, location, AVG(temperature) as avg_temp

FROM plants JOIN readings ON plants.id = readings.id

WHERE DATE(event_time) = CURRENT_DATE() AND plants.climate = 2

GROUP BY plants.id;

Let’s begin with the most basic data—the name and location of the plants that live outside.

Note

I will enter the query by parts so you can read the results better. As you can see, it is much easier to read in the MySQL client this way, and the client will “wait” until you type the semicolon to execute the query.

mysql> SELECT * FROM plants

    -> WHERE climate = 2;

+----+---------------------+----------+---------+

| id | name                | location | climate |

+----+---------------------+----------+---------+

|  1 | Jerusalem Cherry    | deck     | outside |

|  2 | Moses in the Cradle | patio    | outside |

+----+---------------------+----------+---------+

2 rows in set (0.00 sec)

So, we see there are two plants outside, one on the deck and another on the patio. Now, what is the average temperature of each plant’s readings for today?

mysql> SELECT id, AVG(temperature)

    -> FROM readings

    -> WHERE DATE(event_time) = CURRENT_DATE() GROUP BY id;

+----+--------------------+

| id | AVG(temperature)   |

+----+--------------------+

|  1 |  38.89000015258789 |

|  2 |  38.12999954223633 |

|  3 | 32.859999656677246 |

|  4 |  33.21000003814697 |

|  5 |  38.89000015258789 |

+----+--------------------+

5 rows in set (0.00 sec)

Let’s combine the two to make sure we get exactly what we expect from the test data. We should first examine the data and try to determine what we should see. In this case, we should see one plant with an average temperature of 38.89 and another with 38.13 (rounding up).

mysql> SELECT name, location, AVG(temperature) as avg_temp

    -> FROM plants JOIN readings ON plants.id = readings.id

    -> WHERE DATE(event_time) = CURRENT_DATE() AND plants.climate = 2

    -> GROUP BY plants.id;

+---------------------+----------+-------------------+

| name                | location | avg_temp          |

+---------------------+----------+-------------------+

| Jerusalem Cherry    | deck     | 38.89000015258789 |

| Moses in the Cradle | patio    | 38.12999954223633 |

+---------------------+----------+-------------------+

2 rows in set (0.00 sec)

Yes! Now we know that query gets the results we want. What about that other, complex query?

Testing the Complex Query

For the complex query, we want to know the plants that need watering today. You have already seen how to break this query down into parts to construct the SQL statement. Let’s see how the database generates the results of each part then combine them to validate the query.

Let’s begin with the view we created. Recall, this view returns the ID, soil_status, and count of each soil_status value for those readings taken today. That is, we should see several rows for those plants that have more than one soil status value for today.

mysql> SELECT *

    -> FROM soil_status_today;

+----+-------------+------------+

| id | soil_status | num_events |

+----+-------------+------------+

|  1 | DRY         |         10 |

|  2 | OK          |         10 |

|  3 | DRY         |          4 |

|  3 | OK          |          4 |

|  3 | WET         |          2 |

|  4 | OK          |          6 |

|  4 | WET         |          4 |

|  5 | OK          |         10 |

+----+-------------+------------+

8 rows in set (0.00 sec)

Great! Now we know the status of each plant’s soil moisture for today. Notice some plants have more than one value as their soil moisture changed throughout the day. What we want are those plants that have more “dry” values than other values. But let’s slow down a bit.

Recall we used a function to calculate the maximum samples for a given plant taken today. Let’s use that function to zero in on the plant IDs in the previous results. In this case, we see several rows that have a higher number of one value for soil_status. Let’s use the function to return the percentage of occurrences for each value found. In this case, I will query for all the plants, calculating the percentages so that we can see all the data more easily. What we should see are the same rows as before only with the average samples added.

mysql> SELECT id, soil_status, num_events, (num_events/max_samples_today(id)) as percent_occurrence

    -> FROM soil_status_today;

+----+-------------+------------+--------------------+

| id | soil_status | num_events | percent_occurrence |

+----+-------------+------------+--------------------+

|  1 | DRY         |         10 |             1.0000 |

|  2 | OK          |         10 |             1.0000 |

|  3 | DRY         |          4 |             0.4000 |

|  3 | OK          |          4 |             0.4000 |

|  3 | WET         |          2 |             0.2000 |

|  4 | OK          |          6 |             0.6000 |

|  4 | WET         |          4 |             0.4000 |

|  5 | OK          |         10 |             1.0000 |

+----+-------------+------------+--------------------+

8 rows in set (0.00 sec)

Great, seven rows! Now, we’re getting somewhere. Now let’s limit the output of that result with only those that have a higher than 50 percent occurrence.

mysql> SELECT id, soil_status, num_events, (num_events/max_samples_today(id)) as percent_occurrence

    -> FROM soil_status_today

    -> WHERE (num_events/max_samples_today(id)) > 0.50;

+----+-------------+------------+--------------------+

| id | soil_status | num_events | percent_occurrence |

+----+-------------+------------+--------------------+

|  1 | DRY         |         10 |             1.0000 |

|  2 | OK          |         10 |             1.0000 |

|  4 | OK          |          6 |             0.6000 |

|  5 | OK          |         10 |             1.0000 |

+----+-------------+------------+--------------------+

4 rows in set (0.00 sec)

Notice all we did was add a WHERE clause. Now we’ve got those rows for readings taken today that have a higher than 50 percent occurrence of a single value for soil_status. Let’s expand that query one more time and add the condition for soil_status = 'DRY'.

Tip

Did you notice something there? Look at soil_status = 'DRY'. Notice anything weird about that? Yep, it’s an enumerated column, and I used one of the values instead of a numeric value like previous examples. As you can see, so long as the values are listed in the enumeration, you can use either the numeric or text value. You will get an error if the text does not match one of the enumerated values.

mysql> SELECT id, soil_status, num_events, (num_events/max_samples_today(id)) as percent_occurrence                    -> FROM soil_status_today

    -> WHERE (num_events/max_samples_today(id)) > 0.50 AND soil_status = 'DRY'

+----+-------------+------------+--------------------+

| id | soil_status | num_events | percent_occurrence |

+----+-------------+------------+--------------------+

|  1 | DRY         |         10 |             1.0000 |

+----+-------------+------------+--------------------+

1 row in set (0.00 sec)

Perfect! We’re almost there. Now, we want to know the name and location of that plant, id = 1. For this, we add a new join to get the information from the plants table. Notice I use the numeric value for the soil status column.

mysql> SELECT plants.id, name, location, soil_status, num_events, (num_events/max_samples_today(plants.id)) as percent_occurrence

    -> FROM soil_status_today JOIN plants ON soil_status_today.id = plants.id

    -> WHERE (num_events/max_samples_today(plants.id)) > 0.50 AND soil_status = 1;

+----+------------------+----------+-------------+------------+--------------------+

| id | name             | location | soil_status | num_events | percent_occurrence |

+----+------------------+----------+-------------+------------+--------------------+

|  1 | Jerusalem Cherry | deck     | DRY         |         10 |             1.0000 |

+----+------------------+----------+-------------+------------+--------------------+

1 row in set (0.01 sec)

Aha! Now, we’ve got it. Well, almost. We have too much information. But at least we can see the information is correct. Now, we just want the plant name and location. Let’s limit that output a bit.

mysql> SELECT name, location

    -> FROM soil_status_today JOIN plants ON soil_status_today.id = plants.id

    -> WHERE (num_events/max_samples_today(plants.id)) > 0.50 AND soil_status = 1;

+------------------+----------+

| name             | location |

+------------------+----------+

| Jerusalem Cherry | deck     |

+------------------+----------+

1 row in set (0.00 sec)

And there it is! Easy, right? Well, not so much if this was your first attempt, but notice how much easier it is to write the query when we use tools such as views, functions, and aggregate features!

Now that we have seen a complete and working example, let’s consider the recommendations and best practices for designing databases for IOT solutions.Footnote 15

Recommendations and Best Practices

Let’s review the best practices and recommendations for how you should build you database(s) for your IOT solution. This section presents a number of tips for designing databases. That said, this section cannot cover all there is to learn or know about database design. Indeed, there are many more, especially for enterprise-level database design and management. Thus, it would require several chapters, many pages, and an entire book many times the size of this section to do it justice. However, I feel it is important to close out the discussion of learning to use MySQL and databases in your IOT solutions with a reference for you to review when planning your IOT database design. I list them in no particular order.

  • Use the correct data type: Do your homework and determine the best data type to use for each column. Avoid the temptation of using all character fields, which can make comparisons return false results or frustrate you as you try to figure out why a view or function fails.

  • Use the smallest data type: Try to use the smallest data type for your data. For example, avoid using wide character fields. You can always increase the size if you need to (should you start truncating the data). Similarly, use the binary type that matches the maximum value you will store. For example, if you do not need high precision or big numbers, do not use double or long integer data types.

  • Use indexes: For queries on data that is more than a few dozen rows, consider adding indexes on the frequently queried columns. Indexes can vastly improve joins and complex queries such as ranges if there are indexes on the columns being compared or calculated.

  • Don’t store redundant data: Try to avoid the temptation of storing the same value in multiple tables. It may make your queries easier to write but can make updates difficult. That is, how would you know where every occurrence of the column exists for larger databases? Use master-detail relationships to simplify the data.

  • Plan your queries: Always include the questions you want to ask of the database when designing tables. Preparing the queries ahead of time will help with development efforts later.

  • Avoid using SELECT*: While it is easy to just get all the columns for a table, the use of * in the column specification can be suboptimal for large tables or tables with a lot of columns. We saw this earlier in the complex example. The only columns needed were the name and location. Thus, you should resist the temptation to get all the columns and instead specify the columns you want.

  • Use lookup tables for fixed or seldom changed data: Similar to redundant data, using static (or seldom updated) tables can help reduce the amount of extra data you are passing in result sets and storing. It is much more efficient to store a shorter integer or similar key than to store one or more fields throughout the table.

  • Use the power of the database server whenever possible: Try to think of ways to offload the processing of aggregates, mathematical functions, and especially computational expensive tasks such as date and string manipulation to the database server. The power of views, functions, triggers, and so on, cannot be understated. It may take some time to become proficient in these concepts, but the payoff can mean using smaller, cheaper components in your IOT network nodes.

  • Use good, coherent names: Try to use as descriptive and coherent names in your database objects. Resist the temptation to save coding keystrokes by using a, b, c, and so on, as object or column names. If someone other than you tried to figure out what the data describes, they’d be completely lost. It is better to be slightly verbose than terse. Finally, avoid using acronyms or nonstandard abbreviations, which can obfuscate the meaning. For example, what is kdxprt? Kid expert? Nope. I had to ask the designer—it means (in a politically correct manner), “former parent.” Don’t do that.

  • Use primary keys on master tables: For any data that can be uniquely identified, use a primary key that can uniquely identify each row. You can use multiple columns in a primary key. For tables where the columns do not uniquely identify the row, you can add a surrogate or artificial primary key using an AUTO_INCREMENT integer data type.

  • Avoid wide tables: If your table contains 20 or more columns, it is likely it is poorly designed. More specifically in database terms, it is not normalized. Look at the data again and determine whether the columns used can be moved to another table. Look for columns that are reference in nature, do not change frequently, or are redundant.

  • Do not throw away data: You should always retain the raw data in your tables. If your database stores sensor readings, store the raw values. If you need a calculated column to make reading or queries easier, it is OK to store them like I showed you in the plant-monitoring database, but try to make these columns simple and use a trigger to set the value rather than code on your data or sensor nodes. This isolates the code to a single location and once tested can be relied upon. Having the raw data will allow you to plan queries in the future that you may not have considered requiring the raw data.

  • Avoid storing binary data: While database systems allow you to store large, binary data (in MySQL a BLOB or binary large object), they are not efficient and a poor choice for tables that store many rows this way, especially if they do not change. For example, if you wanted to store a photo associated with a data item (a row in your master table), you should consider making a field to store a path to the image and store it on the database server. While this creates a new problem—changing the path—it removes the BLOB from your table and can make queries a bit more efficient.

  • Normalize your database: Normalization is a big deal for a lot of database experts and rightly so, but for the enthusiast and hobbyist getting your database into one of the higher normal forms (from relational database theory) may be far too much work. That said, I encourage you to strive for third normal form,Footnote 16 but don’t kill yourself getting there. You can ruin a large database quickly trying to over normalize. A small amount of under normalization is permissible if efficiency is not lost to gain simplicity.

  • Design your database before coding: Always design your database after you’ve defined the high-level requirements but before you implement the source code. This may sound backward, but it is an excellent habit to form.

  • Test, test, test! I cannot stress this enough. Time spent testing your database and more importantly your queries with known (test) data will save you a lot of headaches later when you start developing an application to query and present the data.

  • Back up your data: Once you have your solution running, perform a backup on the data. If you do not have a lot of data, tools such as mysqldump, MySQL Utilities,Footnote 17 and similar can make a logical backup of your data in SQL form, which you can restore should you need to do so. If your data is larger, say gigabytes or more, you should consider a commercial backup solution such as MySQL Enterprise Backup.

  • Document your database: It may seem like extra work if your database contains only a single or small number of tables and little data, but imagine what would happen if your solution ran for years without issue and then one day you needed to add new features or troubleshoot a problem. If you don’t know what the database stores or how it produces the results (views, triggers, and so on), you may spend a lot of time digging into false leads. You can document your database in a number of ways. I like to store the SQL statements in a file and write short descriptions of each object. See the example code for this chapter for an example.

Summary

The MySQL database server is a powerful tool. Given its unique placement in the market as the database server for the Internet, it is not surprising that IOT developers (as well as many startup and similar Internet properties) have chosen MySQL for their IOT solutions. Not only is the server robust and easy to use, it is also available as a free community license that you can use to keep your initial investment within budget.

In this chapter, you discovered some of the power of using a database server, how database servers store data, and how to issue commands for creating databases and tables for storing data as well as commands for retrieving that data. While this chapter presents only a small primer on MySQL, you learned how to get started with your own IOT data through the example IOT solution shown. It is likely your own IOT solution will be similar in scope (but perhaps not the same database objects or table layout [schema]).

In the next chapter, you will see how to build a database node using a Raspberry Pi. You will see how the examples in this chapter concerning how and where the data is stored can be leveraged to make a robust MySQL server using a low-cost computer. The process for installing MySQL on other boards is similar.