Database Forensics

Le-Khac, Nhien-An; Choo, Kim-Kwang Raymond

doi:10.1007/978-3-031-16127-8_2

Part of the book series: Studies in Big Data ((SBD,volume 116))

563 Accesses
1 Citations

Abstract

This chapter provides a high-level overview on databases including relational databases, database management systems and the structure query language (SQL). A database forensic process is also described, following by some examples of the database investigation. Concepts presented in this chapter are necessary for understanding the later chapters in this book.

Access provided by Autonomous University of Puebla. Download chapter PDF

Schema Reconstruction in Database Forensics

Overview of Databases

History of Databases

2.1 Introduction to Databases

Today, investigators need databases to store and analyze forensic and criminal data (Fig. 2.1). Hence, they should design and build database solutions for investigations.

Two tables. One has 8 columns and 5 rows with headers case number, date, block, primary type, description, location d e, arrest, and F B I code, and the other titled, Investigations has headers case number, case type, and O i C. — **Fig. 2.1**

Besides, most parts of digital forensics today deal with extraction and collection of evidences from databases such as history or cookies information of browsers [1], account information, contact list or call logs of VoIP (Voice over Internet Protocol) application (Fig. 2.2) [2, 3] and social media apps [4, 5]. Therefore with a best understanding of database structure, investigator could retrieve evidences more efficient from variant types of data across electronic devices.

Two sets of databases. One is Contacts dot s q lite, S Q Lite 3, under which are 7 labeled tables, and the other one has a folder labeled with a series of letters and numbers, with subfolders, documents, dot s giggle, app data, and l u a. — **Fig. 2.2**

2.1.1 What Is a Database?

Database is not only tables as shown in previous examples. To define what a database is, we should know what is data? So, Eliot said: “Where is the wisdom? Lost in the knowledge. Where is the knowledge? Lost in the information” [6], and Mr. Celko, he said: “Where is the information? Lost in the data” [7]. Maybe the poet Eliot never wrote a computer program in his life but Mr. Celko did. However, we agree with both Eliot and Celko on their points about unofficial definition of knowledge, information and data. Actually, data is a representation of fact, figure or idea. Data normally refers to a collection of numbers, characters, image, etc.

Besides, when data has a relational connection, its meaning is an information. For example, Bob, 10 are data. “Bob” is “10” years old is an information. Knowledge is the appropriate collection of information such that it is to be useful. For example, when we have a collection of information such as “Bob is 10 years old”, “Alice is 5 years old”, “Rian is 6 years old”, etc. we can have a rule “80% of children in this room is less than 8 years old”. This rule is a knowledge.

Data is typically processed by human or stored and processed in computer as files. But how these files are organized so that users can access their information easily and effectively at any given time? It could be a database.

There are many way is define a database. Firstly, it is a collection of data, which is structurally stored in a computer system. A database can be considered a collection of related data which are describing the activities of relevant organizations. For instance, a school database contains information about the pupils, school, subjects, and rooms. This database also has the relationships between these objects such as school teaching subjects and the use of rooms for subjects. At the technical point of view, a database could be a tool that stores data, and allows users to create, read, update, and delete the relevant data per request [7].

Databases can be simple or very complicated [8]. We can have not only Universities’ databases as discussed previously but also many examples of databases in everyday uses. For example, a telephone company has a customer database with basic information about their clients such as first name, surname, address, city, postal code and phone number.

An example of large databases is databases existing in banking systems, library catalogues, hotel or airline reservation, social networking, etc. Besides, databases can also be used to store images, audio streams and videos digitally.

Law enforcement has been used databases to store and manage criminal data, which can help to identify and catch. Also, by suing the databases, law enforcement members can effectively gather relevant information to assist in investigations.

2.1.2 Database Management System (DBMS)

A Database Management System (DBMS) [9] is a software system specifically designed to handle and exploit databases at different scales. Most DBMSs are used to manage relational databases. But why we need a DBMS? Today, we live in a world experiencing information explosion. In order to manage efficiently the huge amount of data, we need DBMSs because a DBMS can provide:

Data independence: A DBMS gives an abstract view of data representation and storage to the application programs.
Efficient data access: A DBMS normally applies optimal techniques to handle and access data efficiently.
Data integrity and security: A DBMS implements a variety of mechanisms to guarantee the integrity constraints on the data and to control the access the relevant data.
Data administration: A DBMS allows the databases to share among several users or different user groups in an efficient way in terms of storing and retrieving.

A DBMS moreover can handles concurrent access and crash recovery as well as to reduce application development time. However, a DBMS has some drawbacks such as its overhead costs.

Some popular DBMSs in the market can be listed as Microsoft SQL Server [10], Oracle database (Oracle DB) [11], PostgreSQL [12] (open source) and MySQL [13] (open source).

There are four important elements of a DBMS: modeling language, data structures, data query language and transactions. A data model is a collection of data description at a high-level that hides details in low-level storage. A data model can be represented by data schemas. A modeling language is used to define the schema of each database stored in a DBMS, according to the data model. As mentioned previously, most DBMSs today are using the relational data model. There are moreover other data model such as: entity-relationship model, relational algebra, hierarchical model, network model and object data model.

Data structure relates to data types, relationships and constraints and the data structures allows DBMS to interact with the data align to their integrity.

A query language is a specialized language that allows to post a queries. A query is a question involving the data stored in a DBMS. For example, some queries can be posted for the University database such as “What is the student name of the student with an ID 1234?”, “How many students are enrolled in course COMP47370?” etc.

A transaction is an execution from the user program in a DBMS. It is also a basic unit of change in the DBMS.

2.1.3 Database Types and Users

Let’s have a look at different types of a database [14]. First, it is a flat file. A flat file is simply file containing text. A flat file could be a database if we add structure in. For example, by separating values with commas, a flat file becomes a .csv file. Although flat files do not provide many services, they are simple and easy to understand. Flat files are also good places to store configuration settings such as .INI files.

In a hierarchical database, records’ relationships form a tree-like structure. One data record logically links to other data records. The structure of hierarchical database is simple and it is restricted to a one-to-many relationship. An example of a hierarchical database is the Windows system registry (Fig. 2.3).

A window titled registry editor has different labeled folders and subfolders on the left panel, where the windows folder is highlighted. On the right panel is a table with 3 columns and 9 rows with headers, name, type, and data. — **Fig. 2.3**

A network database is not a database that is used over the network of computers. It relates to the database structure that consists of a collection of nodes connected by links. The nodes and links represent objects such as members of a social network. This database structure is uncommon and normally used to express many-to-many relationships [14].

The object oriented database is used to manage objects of varied types such as pictures, video clips, voice and text, etc. The object database management systems (ODBMS) provides special query syntax for accessing and retrieving objects from the database. This database type is popular for Web-based applications.

Regarding the database users, there are some popular roles in practice such as a database administrators (DBA), database designers, System Analysts and Application programmers and finally end users.

2.2 Relational Databases

2.2.1 Basic Concepts

In the relational data model, data is organized in tables [8, 9, 14]. A table has a set of records (or rows) and each record can have many attributes/fields (or columns). The simple structure of relational data model makes it easy to understand and easy to exploit with high-level languages such as Structured Query Language (SQL) [15] to query the data. Current popular relational DBMSs include Oracle DB, MS SQL Server, MySQL, DBs, etc.

Figure 2.4 illustrates an example of a relational database, which is a university database to explain the following concepts: table, row and column. Within this database will be three tables named Student, Module and Enroll. Table Student for example includes student information such as student id (SID), name, age and grade point average (gpa). Each information is described as a field (or an attribute, or a column) of this table. A column is also called an attribute. The set of the validate values of an attribute is called the attribute’s domain. It relates to data type of this field. For example, the domain of the attribute age is integer, the domain of the attribute name is string and of the attribute gpa is real number. Besides, a table is sometime called a relation.

A database has 3 tables where column headers have arrows on them labeled module and fields. A set of arrows pointed at the rows is named tuples. — **Fig. 2.4**

Schema versus Instance

At the model level, there are two concepts related to relational database: schema and instance.

A schema describes the column head for the table that specifies how data to be structured at the logic level. The schema is also called meta-data. In fact, a schema species the name of table, all attribute (field) names, and the domain for each attribute. The schema is normally defined at the setup time of a relational database and it’s rarely changed because of high cost for updating of instances.

For example, for the Student table in previous example, we have the Student schema as you can see here, the name of this schema is also Student and this schema specifies the name and data type of each attribute.

A record/row in a table is called a table instance, it is also called a tuple. In fact, all tuples in a table have the same number of fields/attributes and there is no two tuples that are identical. That means a table contains a set of unique instances. Besides the order of these instances are not important. The value of an instance can be updated but it always conforms to its schema. In table Student for example (Fig. 2.4), there are five instance such as (234, Rian, 18, 2.3), (541, James, 22, 1.8), etc.

Keys and Index

Key is an important concept of relational database. A key is a combination of one or more fields that it can be used to identify records (rows) in a table. A key of a table is defined as a set of one or more attributes (or columns) in this table. For example, a Student table can use SID as a key to find students. If a student’s ID is known, the relevant student’s record can be found in the table. Hence, this table, the key is SID.

A compound/composite key is a key that has more than one attributes (or columns). For example, in table Case_Investigation (Fig. 2.5), it might be used the combination of CaseID and Investigator as a compound key. Of course there is an assumption that there is no duplication of investigator names in the department. Besides, in the previous example of University’s database, it might be used the combination of SID and CID to look up enrollments.

A table has 6 columns and 5 rows. The column headers are Case I D, Criminal type, Description, Location, Investigator, and Date. — **Fig. 2.5**

A superkey is a set of one or more attributes in a table so that a record is unique. There are no two records that can have the same values. Hence, a superkey is also called a unique key. For example, in the Enroll table (Fig. 2.4), the SID and CID attributes together form a superkey because no two records have exactly the same SID and CID values. Besides, in the Student table, the superkey is SID.

There are more key definition in relational database. A primary key is a superkey and a table can have only one primary key. It should be noted that every records in a database has the own primary key.

Besides, a secondary key is used to lookup records but it does not guarantee the record uniqueness.

Another kind of key is the foreign key, which is used to refer to a primary key of another table.

Index is another important concept of the relational databases. This special database structure allows to find records quicker and easier by using one or more attributes’ values. Note that indexes are not the same as keys. For example, as shown in Fig. 2.4, Student table holds student information: name, age, and gpa. This table also has the primary key SID. If students do not remember their student IDs, the student name can be used to search a student. If this table is indexed by name, users can quickly locate a student’s record in two ways: by student ID or by name.

2.2.2 Database Design

Databases store vast quantities of information. Consider government social security databases for instance. Information is stored on every citizen in the entire nation. This information includes name, address, date of birth, income, tax status et cetera. Searching through all of this information can be very time consuming. If every record needed to be checked in the database it would be a very inefficient system. As mentioned above, there are two ways to make these searches faster: using keys or indexes. For example, the social security system gives each person in the nation a unique identifying number: a social security number. This number acts as a key. It is a unique number that identifies the person in question and can be searched for very quickly in the database. Rather than search for the name for example, Jim Murphy in the database users can search for the unique social security number.

There are three main steps in database design [9]. Gathering the required information, in other words to make sure that all of the necessary information can be stored in the database. These pieces of information will form the fields. This information should also be logically divided into tables. For each of the fields identified designers should select a datatype for this. Finally designers need to create keys or indexes to make retrieval quicker.

2.3 Structured Query Language (SQL)

2.3.1 SQL and SQLite

SQL, or Structured English Query Language [15] is the standard query language for relational DBMSs. There are different versions of SQL: SQL-86 or SQL-1 is the first standard version of SQL. SQL-2 is a revised and expanded version of SQL-86, it is also called SQL-92. The next version is a well-recognized standard SQL-99. The other standards such as SQL-3 have been proposed. However they are not fully endorsed by the industry.

SQL includes statements for data definitions, queries and updates. SQL uses the term table, row and column for the relation model.

For example, the CREATE command in SQL is for the data definition that is used to create schemas, tables, relations, domains, etc. The ALTER command is used to change the definition of a table such as adding or dropping a column, changing a column definition (name of column, data type), and adding or dropping table constraints. The DROP command is used to drop tables, domains or constraints. The SELECT command in SQL is a basic statement for retrieving information from database.

SQLite is public-domain, lightweight relational DBMS [16]. SQLite follows nearly entire SQL-92 standard. SQLite is designed by D. Richard Hipp and the first version was released in August 2000.

SQLite does not require a separate server as other DBMSs such as MySQL, SQOL server of Oracle to operate. That means users do not need to setup/configure a server. It is stability, ease to use as for instance, creating a SQLite database is as simple as opening a file. Moreover, the whole database are stored in a single file that can run on many platforms. SQLite can runs with a limit of hardware resource (CPU, RAM).

Recently, many applications are now storing log information in SQLite 3 database format. These include: Skype; WhatsApp; iOS; Google-Chrome; Mozilla Firefox and many more. So, the ability to extract this log information and handle it in meaningful ways is essential in forensics.

2.3.2 SQLite Basic Commands

The command line interface for SQLite available for all major. There are also some SQLite management commands that start with a dot as follows:

.tables: display all tables in the database
.schema: display the schema of tables created
.quit to exit the SQLite application

There are also some basic SQLite commands as follows^{Footnote 1}:

Launch SQLite: sqlite3
Create/open a SQLite file: sqlite3 <filename>
Create a table: CREATE TABLE command
Add a record into a table: INSERT INTO command
Delete a table: DROP TABLE command
Retrieve information: SELECT command
Delete a record: DELETE command

Examples:

Create/open university database:

sqlite3 university

Create Student table:

CREATE TABLE Student (SID INTEGER,

name TEXT,

age INTEGER,

gpa REAL);

Add a record into Student table:

INSERT INTO Student VALUES (1, “Rian”, 18, 3.3);

Retrieve all students who have gpa greater than 3.68:

SELECT * FROM Student WHERE gpa > 3.68;

Delete student James:

DELETE FROM Student WHERE name = ‘James’;

Delete Student table:

DROP TABLE Student.

2.4 Database Forensics

As mentioned in Sect. 2.1, there are many applications varied from web browsers to mobile apps, running on different operating systems and platforms that are using databases, especially SQLite databases to store relevant information such as searching history or cookies information of browsers, account information or call logs, which are important artifacts for any forensic investigation. This also means that investigators regularly encounter such artifacts from databases. Hence, how to conduct the database forensics from a given application? How to acquire and analysis artifacts?

Figure 2.6 illustrates a flow chart of the suggested forensic method that has three main phases: preparation, acquisition and analysis.

A flowchart has 6 major stages namely, explore the app, identify database files, collect database info, run S Q L queries, analysis, and report. The sub-stages that use the app and perform tasks are boxed. — **Fig. 2.6**

The investigator should explore the application in the first phase (Step 1, Fig. 2.6). The objective is to understand the application i.e. what can the application do? What information might be stored in its databases (e.g. SQLite files)? To achieve this objective, investigators should look for information related to the application, using application websites, open sources, published documents, forensic sites and blogs etc. Without knowledge of what the application does it is very difficult to know what to look for. Hence, in some cases, using the application (Step 1a, Fig. 2.6) is recommended to gain a better understanding of this application.

The second phase includes three steps: identify the database files, collect database information and extract relevant information from databases.

The objective of identifying the database files (Step 2, Fig. 2.6) is to locate all databases used by the application. Investigators can look at open sources on the internet. Databases of existing applications have probably been illustrated in published documents, relevant websites, blogs, etc. Moreover, Profiles, Preferences folders in applications could store location information. Another approach is to install the application in a virtual machine or a simulated platform. Investigators should take snapshots of (database) files existing before and after the installation and try to identify the new (database) files in the after snapshot. The outcome of this step is a list of all database files used by this application.

Next, investigators should collect as much as much information on the databases located (Step 3, Fig. 2.6). The relevant information of each database can be listed as a list of tables, the structure of all tables, the number of entries in all tables, the relationship between tables, index and key for each table, etc. To assist the collection information in this step, a good practice is to use the application by performing its common tasks (Step 3a, Fig. 2.6) then repeat Step 3 to identify which information created or updated, which table entries have been modified and examine these changes.

The final step of this acquisition stage is to launch SQL queries to extract necessary data (artifacts) from all relevant tables (Step 4, Fig. 2.6) of the databases. Note that the storage of the acquired data and devices should follow the policy regarding the chain of custody and jurisdiction.

The acquired data are analyzed in Stage 3 (Step 5, Fig. 2.6). Note that investigators could re-run Step 4 (Fig. 2.6) if the acquired data outcome is un-sufficient for the investigation.

Finally, when the analysis phase is finished, investigators should conduct a review of the process and the actions of the previous steps to validate the investigation process. A formal report is produced in this phase (Step 6, Fig. 2.6) to record all the steps of the investigation, explain the findings etc.

2.5 Examples

This section presents the forensic investigation of databases from some popular applications as examples, more case studies of database forensics will be introduced in the following chapters.

2.5.1 IOS Database Investigation

Smartphones are widely used today and there are security risks associated with their use such as conducting a digital crime or becoming a victim of one. Hence, smartphones have an important role in the crime investigation [17]. Eventually, in crime scenes the importance of evidence out of the smartphone is increasing with the effect that more and more phones are seized and for a thoroughly investigation offered at the digital department of the police. In the early days the possible information out of a cell phone was phone calls, SMS and contacts. Nowadays the smartphone contains email, web browsing information, Chat, Instant Message, Documents, Photos, Videos, and plenty of applications used daily [18]. The most popular operating systems on smartphones today can be listed as iOS and Android.

iOS (or iPhone OS up to Version 3) is developed by Apple specifically for many Apple’s products such as iPhone, iPad, iPod touch, Apple TV, etc. iOS revolutionized the way cell phones have been created. Like on other mobile platforms, most important information on the iOS devices are stored in SQLite databases that are used by both native and third-party applications. The popular databases in iOS are the Address Book, SMS, and Call History databases. Table 2.1 lists common iOS databases in different iOS versions.

Table 2.1 List of common iOS databases

Full size table

Most evidence generated by native applications are located in the Library directory. In this directory, the AddressBook refers to the information related to the personal contacts that present in the Contact application. There are two databases of interest in this AddressBook directory: AddressBook.sqlitedb and AddressBookImages.sqlitedb. The AddressBook.sqlitedb contains the information for each contact such as name, surname, phone number, e-mail address, etc. The number of tables in this database depends on iOS version. On the left of Fig. 2.7 is a list of tables from the contact database AddressBook.sqlitedb, which can be found in the Home/Library/AddressBook folder of recent iOS versions. The tables of interest are mainly ABPerson and ABMultiValue. ABPerson table (Fig. 2.7, right) with 46 fields contains the name, organization, department, and other general information for each contact. ABMultivalue table contains phone numbers, email addresses, website URLs, and other data for the case a contact may have more than one.

A dataset has 2 folders. The one on the left has the table, first sort section count total, highlighted, and the one on the right is labeled, A B person. — **Fig. 2.7**

Some multivalued entries contain multiple values. For example, an address consists of a city, state, zip code, and country code and these values can be found in ABMultiValueEntry table. This table has parend_id field, which corresponds to the rowid of ABMultiValue table.

Figure 2.8 illustrates an example of retrieving some important information from tables of AddressBook.sqlitedb database.

A snippet from the S Q Lite query contains some functions for the selection of a person's info, like the name, department, organization, and birthday. — **Fig. 2.8**

AddressBookImages.sqlitedb contains images associated to a given contact. In this database, the important table is ABFullSizeImage.

The Call History information is stored in the Call History database: callhistory.storedata that contains each of the missed, placed, and received calls, etc. on the device. So, this information helps to find tracks about incoming, outgoing and missed calls with time and date occurred and their duration. Tables contained within this database are listed in Fig. 2.9 (left) where ZCALLRECORD (right) is the important one.

A folder, labeled tables has 5 entries, the entry Z call record is highlighted, and a list of its fields is depicted on the right. A snippet of an S Q Lite query is below. — **Fig. 2.9**

Figure 2.9 (bottom) shows an example of exploring some important fields of ZCALLRECORD table such as the call sequence id, phone number, the duration of the call, the call direction, the call status and the call timestamp. The headers command is used in this example to display the column headers in our query results. The value of its parameter must be either ON or OFF. Note that the call direction showed in this example is a number i.e. 0, 1. To display this information in a more comprehensive format for the query result, ZORIGINATED field is used with the value 0 is “Incoming” and value 1 is “Outgoing” with the SQLite CASE command (Fig. 2.10). Similarly, in the last example, the CASE command for the call status ZANSWERED field is used, with value 0—a missed call and 1—a call answered (Fig. 2.10).

Three sets of snippets from S Q Lite queries have certain texts on them boxed. Boxed texts include call direction, case when and then, among others. — **Fig. 2.10**

Messages are one of the most significant data items to be recovered from iOS devices. The database, which is used to storedsuch information in iOs is sms.db that can be found in the Home/Library/SMS folder and the important table is message table (Fig. 2.11). This table contains the message, date and time, and whether the message was sent or received, etc. For example, this table uses specific fields for each status such as is_sent, is_read, is_delievered, etc. and the value of each specific field is either 0 (default) or 1 (Fig. 2.11).

A folder, labeled tables has 10 entries, and the entry, message, is highlighted. A portion of the list of entries under the message folder is boxed. — **Fig. 2.11**

From the forensic point of views, there are also other interesting sqlite databases in iOS such as notes, photos, voicemail, healthdb and tcc.db (Table 2.1). tcc.db database for example, tcc means Transparency Consent and Control system, it contains all the prevailing settings for privacy controls, including the allow lists which are displayed in the Privacy tab of the Security and Privacy pane. The healthdb.sqlite database contains all information collected or received by the Health app. The voicemail.db database contains voicemail entries. The photos.sqlite database stores a lot of photo asset information including location, date, time, etc.

Note database NoteStore.sqlite is also interesting because some users tend to store passwords and other important info in notes. Figure 2.12 illustrates tables in the NoteStore.sqlite database, and the important tables are ZNOTE and ZNOTEBODY. Some important fields are ZAUTHOR that contains the author’s email address, ZTITLE contains the title of the note and ZCONTENT field of the ZNOTEBODY table has the content of the note. However, most of the notes’ contains in the recent iOS version can be found in NoteStore.sqlite database that can be synchronized with the iCloud account and the note content is normally encrypted.

A folder, labeled tables has 11 entries, Z note, and Z note body are open on the right. The entries Z author, Z title, and Z content are boxed. — **Fig. 2.12**

Figure 2.13 lists tables in the Photos.sqlite database, and an important table is ZGENERICASSET. There are many fields in this ZGENERICASSET table. Some important fields are ZDATECREATED, which is the timestamp of the image, ZLATITUDE and ZLONGITUDE, which are the location where the photo is taken, ZFILENAME and so on. Figure 2.13 also has an example of SQLite query to show all photo names with the timestamp and location.

A folder, labeled tables, has the Z generic asset table highlighted with boxed entries like z longitude, and z latitude. A snippet of an S Q Lite query is below. — **Fig. 2.13**

Finally, Fig. 2.14 shows tables of voicemail.db database with an example of a SQLite query to get a list of voicemails sorted by date.

A folder for call history dot store data tables has the voicemail table highlighted with the entry date boxed. A snippet of an S Q Lite query is below. — **Fig. 2.14**

2.5.2 WhatsApp Database Forensics

WhatsApp is an Instant Messaging and Voice over IP (VoIP) app developed by Brian Acton and Jan Koum. Users can use WhatsApp to exchange instant messages, images, video and audio media messages. Today, there are around two billion WhatsApp registered users active monthly. Hence it becomes an important source of forensic investigation [19].

WhatsApp is available on different devices such as iPhone, iPad, Android phones and tablets and also with different platforms such as Windows, Mac, iOS, Android, etc.

On iOS systems, the manual file system analysis initially shown that the WhatsApp files are in the directory group.net.whatsapp.WhatsApp.shared/. The activity and contact information are stored in SQLite database files. For example, the communication activity is stored in ChatStorage.sqlite database and the contact information is stored in the ContactsV2.sqlite database. There are other artefacts in net.whatsapp.WhatsApp/Documents/ folder. The ChatStorage.sqlite database (Fig. 2.15) might be of interest in an investigation. Its most significant tables are ZWACHATSESSION, ZWAGROUPINFO, ZWAGROUPMEMBER, ZWAMEDIAITEM and ZWAMESSAGE. ZWACHATSESSION table contains a list of unique conversations started with different contacts or groups. ZWAGROUPINFO table stores a list of group conversations. ZWAGROUPMEMBER table has a list of contacts participating in group conversations. ZWAMEDIAITEM table contains a list of exchanged media items and finally ZWAMESSAGE is an important one, it stores a list of messages exchanged.

A database has a folder labeled as tables. It has a list with 17 entries which include Z W A blacklist item, and Z W A chat properties, among others. — **Fig. 2.15**

Figure 2.16 illustrates more details of ZWAMESSAGE table. The interesting fields are: ZMESSAGESTATUS contains the message status. For example, value ‘1’ is ‘received’, value ‘8’ is ‘read’. ZMESSAGETYPE is the message type, with ‘0’ is text, ‘1’ is image; ‘2’ is video; ‘3’ is audio; ‘4’ is contact; ‘5’ is location; ‘6’ is group; ‘7’ is URL; ‘8’ is file; etc. ZGROUPMEMBER stores a value, which is a primary key in ZWAGROUPMEMBER table, it links with ZWAGROUPMEMBER to identify the sender in a group. ZMESSAGEDATE contains the created date or the received date of a message. ZSENTDATE contains the sent date of a message. And finally ZTEXT contains text or Emojis.

A folder for the Z W A message table has some entries encircled like message status and type, among others. Examples of the encircled entries are given on the right. — **Fig. 2.16**

Another interesting WhatsApp database is Contacts.sqlite with the most important table is ZWAADDRESSBOOKCONTACT (Fig. 2.17). In this table, we can find the list of all contacts found on the phone with the phone numbers and the contact’s profile. Some important fields are ZFULLNAME stores the Contact’s full name. ZPHONENUMBER stores the contact’s phone number. ZWHATSAPPID stores the contact’s number in WhatsApp id format, i.e. the phone number in international format.

A folder for contacts dot s q lite tables is on the left where the table entry Z W A address book contact is highlighted. The entry Z sort is selected. — **Fig. 2.17**

On the Android systems, WhatsApp artifacts can be found in the following folders: data/com.whatsapp/databases/msgstore.db; data/com.whatsapp/databases/wa.db; data/com.whatsapp/…

The folder ‘databases’ was identified as storing the most valuable information in order to reconstruct the communication history in WhatsApp, which were the two main database files wa.db and msgstore.db that contained all the exchanged attachments, such as images, video and contact cards. Figure 2.18 illustrates the structure of database and fields of its message table.

A folder is labeled as tables and has entries, away messages, and call logs, among others. A list of fields under the message table is on the right. — **Fig. 2.18**

The wa.db database has one important table wacontacts (Fig. 2.19). This table contains a list of all contacts found on the phone (not only WhatsApp contacts). It has the following attributes:

A folder is labeled as tables and the entry, w a contacts, is highlighted. Under w a contacts, some entries like status and given name are encircled. — **Fig. 2.19**

jid: The contact or group WhatsApp id in < phonenumber> @s.whatsapp.net format
is_whatsapp_user: Boolean if the phone number is a registered WhatsApp user
status: The status text of the contact
number: The contact’s phone number
display_name: The contact’s display name in the contact list
phone_type: Number mapped to phone type, i.e. mobile, home etc.
given_name: Contact’s first name
family_name: Contact’s family name
wa_name: Contact’s WhatsApp name.

We refer readers to other related research efforts on WhatsApp forensics, such as those outlined in [19, 20].

2.6 Summary

This chapter present background of databases and database investigation that are necessary to follow case studies in following chapters. Basic concepts of databases including relational databases, database design and SQL language were introduced. This chapter also described a process for investigating application databases. Finally, two examples of database forensics were illustrated to show how to examine databases from iOS and from an instant message application. The following chapters will present case studies where database investigation techniques are applied in acquire relevant artifacts from different applications of forensic cases.

Notes

1.
More details of SQLite commands’ syntax and examples can be found in: https://www.sqlite.org/docs.html.

References

Warren, C., El-Sheikh, E., & Le-Khac, N.-A. (2017). Privacy preserving internet browsers—forensic analysis of browzar. In K. Daimi, et al. (Eds.), Computer and network security essentials (pp. 369–388 (18 pages)). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-319-58424-9_21
Sgaras, C., Tahar Kechadi, M., & Le-Khac, N.-A. (2015). Forensics acquisition and analysis of instant messaging and VoIP applications. Lecture Notes in Computer Science (Vol. 8915, pp. 188–199 (12 pages)). https://doi.org/10.1007/978-3-319-20125-2_16
Schipper, G. C., Seelt, R., & Le-Khac, N.-A. (2021). Forensic analysis of matrix protocol and Riot.im application. Forensic Science International: Digital Investigation, 36, 301118. https://doi.org/10.1016/j.fsidi.2021.301118
Thantilage, R., & Le-Khac, N.-A. (2020). Forensic analysis of e-dating applications based on iPhone backups. In G. Peterson & S. Shenoi (Eds.), Advances in digital forensics XVI. Digital forensics 2020. IFIP advances in information and communication technology (Vol. 589). Springer. https://doi.org/10.1007/978-3-030-56223-6_12
Voorst, R. V., Kechadi, T., & Le-Khac, N.-A. (2015). Forensics acquisition of IMVU: a case study. Journal of Association of Digital Forensics, Security and Law, 10(4), 69–78 (10 pages). https://doi.org/10.15394/jdfsl.2015.1212
Eliot, T. S. (1934). The Rock, Hardcourt, Brace and Company Inc. https://archive.org/details/in.ernet.dli.2015.3608/mode/2up
Celko, J. (1999). Data and databases: Concepts in practice, Morgan Kaufmann (1st ed.). ISBN-13: 978-1558604322.
Google Scholar
Connolly, T., & Begg, C. (2010). Database systems: A practice; approach to design, implementation and management (5th ed.). Addison Wesley.
Google Scholar
Coronel, C., et al. (2020). Database principles: Fundamentals of design, implementation and management (3rd ed.). Cengage Learning EMEA. ISBN-13: 978-1473768048.
Google Scholar
Assaf, W., et al. (2018). SQL server 2017 administration inside out (1st ed.). Microsoft Press. ISBN-13: 978-1509305216.
Google Scholar
Freeman, R. G. (2013). Oracle database 12c new features (1st ed.). McGraw Hill.
Google Scholar
Matthew, N., & Stones, R. (2005). Beginning databases with PostgreSQL: From novice to professional (2nd Corrected ed., Corr. 3rd printing ed.). A Press. ISBN-13: 978-1590594780.
Google Scholar
Grippa, V. M., & Kuzmichev, S. (2021). Learning MySQL: Get a handle on your data (2nd ed.). O’Reilly Media, Inc. ISBN-13: 978-1492085928.
Google Scholar
Coronel, C., & Morris, S. (2018). Database systems: Design, implementation, & management (13th ed.). Cengage learning (13th ed.). ISBN-13: 978-1337627900.
Google Scholar
Rockoff, L. (2016). The language of SQL (2nd ed.). Addison-Wesley Professional.
Google Scholar
Kreibich, J. A. (2010). Using SQLite (1st ed.). O’Reilly Media. ISBN-13: 978-0596521189.
Google Scholar
Aouad, L., Kechadi, M.-T., Trentesaux, J., & Le-Khac, N.-A. (2012). An open framework for smartphone evidence acquisition. In P. Gilbert & S. Sujeet (Eds.), Advances in digital forensics VIII (pp. 159–166 (8 pages)). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-33962-2
Thantilage, R., & Le-Khac, N.-A. (2019). Framework for the retrieval of social media and instant messaging evidence from volatile memory. In 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (IEEE TrustCom-19) (CORE Rank A).
Google Scholar
Cents, R., & Le-Khac, N.-A. (2020). Towards a new approach to identify WhatsApp messages. In 19th IEEE International Conference on Trust, Security and Privacy in Computing and Communications (IEEE TrustCom-20) (CORE Rank A).
Google Scholar
Wijnberg, D., & Le-Khac, N.-A. (2021). Identifying interception possibilities for WhatsApp communication. Forensic Science International: Digital Investigation, 38(Supplement), 301132. https://doi.org/10.1016/j.fsidi.2021.301132
Article Google Scholar

Download references

Acknowledgements

Authors would like to thank Ranul Thantilage, who helped us to prepare relevant SQLite databases used in this chapter.

Author information

Authors and Affiliations

School of Computer Science, University College Dublin, Dublin, Ireland
Nhien-An Le-Khac
Department of Information Systems and Cyber Security, University of Texas at San Antonio, San Antonio, TX, USA
Kim-Kwang Raymond Choo

Authors

Nhien-An Le-Khac
View author publications
You can also search for this author in PubMed Google Scholar
Kim-Kwang Raymond Choo
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Nhien-An Le-Khac .

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Le-Khac, NA., Choo, KK.R. (2022). Database Forensics. In: A Practical Hands-on Approach to Database Forensics. Studies in Big Data, vol 116. Springer, Cham. https://doi.org/10.1007/978-3-031-16127-8_2

Download citation

DOI: https://doi.org/10.1007/978-3-031-16127-8_2
Published: 22 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16126-1
Online ISBN: 978-3-031-16127-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Database Forensics

Abstract

Similar content being viewed by others