Keywords

1 Introduction

In recent years, there has been a growing deployment of machine learning systems on Linux Operating Systems. Within the Linux environment, numerous runtime problems and potential threats necessitate resolution through security monitoring and analysis, which includes the examination of logs to identify causes [1]. Log information serves as an immediate reflection of a system’s operational status. Developers utilize logs for diagnosing system malfunctions, recording experimental results, and generating data for training security analysis models.

In this paper, we focus on the logging service in the Linux server kernel and aim to assist programmers in designing kernel drivers and generating data for machine learning. Generally, developers strive to capture system state dumps, execute tracing, and communicate events through log data. Within the 13,390,104 lines of source code in the Linux kernel, there are 498,897 lines (approximately 3.79%) dedicated to logging code [2]. Logging functions in the Linux kernel write messages into the log buffer. Commonly used logging functions include the printk function, which is similar to the printf function. The distinction lies in printk’s specification of the log level for event recording. Subsequent Linux kernel versions have introduced variations of the printk function such as pr_info, and dev_warn, which incorporate log levels into their nomenclature.

However, the existing kernel logging services are unreliable due to several factors. Firstly, the Linux kernel generates a substantial volume of continuous log output from numerous programs. When Shiqing et al. [3] tested the overhead of the Linux audit system, they found that servers generate approximately 130GB of log data per day, while client machines generate about 5GB of log data per day. Nevertheless, developers often focus on specific Linux kernel modules. For instance, Tan Y et al. [4] employ authorization lists recorded in the logs as the basis for comparison. In the RootAgency [5], the logs document the time consumption from the test app initiating the request until the end user receives the reply for the root privilege request. Neither of them pays attention to logs related to unrelated modules. The excessive volume of irrelevant logs has caused interference in their experiments. Furthermore, the kernel logging service also faces the issue of log loss. The kernel log buffer is a ring buffer, which operates on a first-in, first-out (FIFO) basis. The default size of the Linux kernel log buffer is 128KB. As data accumulates beyond the capacity of the ring buffer, the oldest data is overwritten to accommodate new information.

In this paper, we introduce a kernel logging service solution, EasyLog, which effectively mitigates the above issues. In terms of interface utilization, EasyLog provides write interfaces such as easy_xx for log writing at the kernel layer and system call interfaces such as open, read, and close for log reading at the application layer. Besides, regarding log simplification, the design of the easy_xx functions is influenced by kernel functions like dev_info, appending specific identifiers to the end of each log. When our write interfaces append to record logs, EasyLog extracts logs using the identifier and stores them in our new ring log buffer, which can be expanded to 2 MB. This approach significantly reduces the volume of logs. It also elongates the time required to fill the circular buffer and decreases the likelihood of log loss. Furthermore, developers can utilize the easy_xx functions in their experimental kernel, subsequently compile and execute the code, and directly get logs from the new log buffer. To sum up, EasyLog reduces log volume, enhances effective log density, diminishes the probability of log loss, and facilitates the development of new modules for programmers.

The rest of the paper is structured as follows. Section 2 describes recent application of logs, the principles underlying the printk mechanism, and recent advancements in kernel-level logging service. In Sect. 3, the comprehensive architectural design is presented. Section 4 describes the details of the implementation and the interface design of EasyLog. In Sect. 5, we validate the effectiveness of EasyLog in mitigating the loss rate of important logs. Section 6 elucidates the utilization of the EasyLog service to assist in the development of new modules for the Linux kernel. Section 7 concludes.

2 Background

2.1 The Applications of Logs

When the kernel crashes, developers can solve bugs by analyzing the preserved system log files. In recent years, logs aid developers in diagnosing system errors, training security models, and documenting experimental results. In 2015, the EASEAndroid platform [6], the inaugural audit log analytic system for SEAndroid, employed semi-supervised learning to autonomously enhance the SEAndroid policy. In 2018, Xue B et al. [7] obtain the encryption rate and data processing size of the baseband processor through log information. In 2021, Li Y et al. [1] proposes a host security analysis method based on D-S evidence theory, which involves extracting information from monitoring logs and subsequently training a security analysis model. The model can be applied to host security analysis in different operating systems with minimal or almost no modification.

2.2 The Analysis of Printk

The kernel log module resides in ./kernel/printk/. Figure 1 illustrates the read-write framework of the kernel log module. As depicted in the figure, its core component is the ring buffer, denoted as the “log buffer". The printk function and the devkmsg_write function, acting as producers, store messages in the log buffer. On the other hand, the log service modules on the right side of the figure function as a consumer, reading messages from the log buffer.

Fig. 1.
figure 1

The Read-Write Framework of the kernel log module

Log Buffer: The size of the kernel log buffer is determined jointly by the configuration parameters CONFIG_LOG_BUF_SHIFT and the number of CPU cores in the SMP system. During the kernel boot, information regarding the system’s memory layout and the number of CPU cores is unknown before device tree parsing. To support the utilization of the printk function, the kernel defines a static global log buffer with a size of (1 <​<​ CONFIG_LOG_BUF_SHIFT). After CPU initialization, an additional global log buffer is dynamically allocated with a size of (1 <​<​ CONFIG_LOG_BUF_SHIFT + 1 <​<​ LOG_CPU_MAX_BUF_LEN), and the log data from the original static buffer is copied into it.

The log buffer is managed through the data structure printk_ringbuffer. Figure 2 presents the data structures of printk_ringbuffer. As depicted, it comprises three main components: 1) a ring buffer for data storage, managed using head and tail pointers to track the buffer’s status. When data needs to be written, the head pointer is updated based on the length of the data being written. If the free space in the ring buffer is insufficient, the oldest data is purged starting from the tail pointer. Additionally, each piece of written data is assigned an ID, which is used to indicate the index in the prb_desc array and the printk_info array. 2) An array of prb_desc structures, with each element maintaining the position information of a log within the ring buffer, along with its status. 3) An array of printk_info structures, with each element responsible for managing additional information associated with a log, such as its sequence number(seq), timestamp, length, log level, and more.

Fig. 2.
figure 2

Principal Members and Data Structures of printk_ringbuffer

Log Stoarge Process: We use the printk interface as an example for explaining the log storage process. 1) Allocate an entry and increment the sequence number: The desc_reserve function retrieves an available entry from the prb_desc structure array. If no free entry is available, it overwrites the oldest log. Upon successful allocation, the descriptor’s status is set to desc_reserved. Subsequently, the sequence number for that log is set. 2) Allocate space and copy the log: The data_alloc function allocates a segment of space from the ring buffer to store the new log data. The log_data_copy function copies the data to be written into the allocated space within the ring buffer. 3) Update the status: The _prb_commit function updates the status of the new element in the prb_desc array to desc_committed, and then the desc_make_final function updates the status to desc_finalized. After this operation, the log is written and ready for reading.

Log Retrieval Process: When reading logs, a sequence number (seq) is provided as a parameter, and the prb_read function is called to retrieve the log corresponding to that sequence number. 1) Retrieve a valid status log: The desc_read_finalized_seq function reads the status of the log corresponding to the provided sequence number. If the status is valid, the subsequent log data and information retrieval operations are executed. After completion, the status of that log is rechecked. If it remains valid and the sequence number has not changed, it signifies that during the reading process, the data was not modified by write operations. Otherwise, the reading process fails. 2) Handling of Reading Failures: In case of reading failures, the prb_first_seq function is called to obtain the first readable log after the provided sequence number, and the retrieval process is restarted.

Devkmsg Log Interface: The devkmsg service provides log read and write operations to user space through the device file node /dev/kmsg. Devkmsg maintains an independent sequence number and log reading is based on this sequence number to determine which log needs to be read. The user space read interface provided by EasyLog in this paper is modeled after the Devkmsg service.

Other Interfaces: 1) syslog interface: This interface exports logs through system calls but does not provide log writing operations. The syslog interface finds utility in various scenarios, including applications like dmesg, klogd, and /proc/kmsg. 2) Console log interface: It primarily offers console initialization, and registration processes, and specifies the preferred console interface through command-line parameters. 3) Kmsg dump interface: This interface is primarily used by pstore. pstore is applied to save system logs to a backend device in the event of a system crash, assisting developers in debugging and analysis.

2.3 Recent Work About Logging

Linux manages storage devices, networks, man-machine interfaces, CPUs, and more through software layers such as device drivers, file systems, and communication protocols. These intricate modules are maintained by hundreds of programmers. As Linux grows in complexity, an increasing number of system analysis tools have been proposed to help developers in analyzing system behavior. The simplest logging tool in Linux is the printk function, as mentioned earlier. In Linux kernel v1.3.983 [2], a set of additional logging functions was introduced to enhance the conciseness of log statement recording. These functions incorporate log levels in their names. Consequently, programmers are no longer required to use printk function with log-level parameters such as KERN_DEBUG and KERN_INFO. Another set of logging functions specifically designed for device drivers, such as dev_dbg and dev_info, automatically embed the device name in their outputs, thereby facilitating the identification of the source of log messages.

Both dev_xx functions and pr_xx functions are variants of the printk function, and the underlying issues with printk remain unresolved. The printk function uses an asynchronous daemon to read and write a ring buffer, making the buffer vulnerable to overwriting and event loss. The Linux Trace Toolkit (LTT) [8] logs around 45 predefined events, including interrupts, system calls, and network packet arrivals. The tool is advantageous due to its relatively low overhead and the presence of a visualization tool to aid in analyzing logged data. However, it lacks flexibility and scalability. Relayfs [9] is proposed, which divides logs into different subsystem/client channels, effectively addressing the fundamental overhead caused by locking during logging. KLogger [10] is presented as a software tool for logging operating system kernel events. Developers can insert new log events into the kernel using this tool. Furthermore, an alternative approach to logging all events is sampling. OProfile [11] adopts a sampling approach, serving as the underlying infrastructure for HP’s Prospect tool. OProfile uses Intel’s hardware performance counters to generate traps for every N occurrence of specific hardware events. However, since OProfile is based on periodic sampling, it may miss events with finer granularity than the sampling rate.

3 Architecture

Figure 3 illustrates the architecture of EasyLog. 1) The server kernel subsystem utilizes the easy_xx functions to record log entries. 2) EasyLog service filters logs with specific suffixes and directs them into a newly created ring buffer. 3) User-space applications access the new log buffer through the character device node /dev/easylog.

Fig. 3.
figure 3

The architecture of EasyLog

4 Implementation

4.1 Channel and Data Management Schemes

The EasyLog module maintains a ring buffer as its log buffer. Its structure, initialization, and read-write processes resemble the printk’s log buffer.

Initialization: During the early stages of kernel startup, the original static log buffer is still employed. Subsequently, in the start_kernel function, an initialization function is called to dynamically allocate a global log buffer of size (1>​>​CONFIG_EASYLOG_BUF_SHIFT).

Log Storage: The kernel subsystem uses the easy_xx functions to record log entries, with each log appended with a special identifier suffix.

Log Filtering: The log_store function is responsible for appending logs generated by other modules to the original log buffer. Within the log_store function, the EasyLog service performs a hook-like operation. The service filters logs generated by other modules, extracts logs with suffixes, removes the suffixes, and subsequently stores them in our log buffer following the writing procedure described in Sect. 2.1. This process ensures that the log buffer’s read and write operations do not conflict through spin locks and local interrupt disabling. During this filtering step, both our log buffer and the printk’s log buffer need to be locked for protection.

Log Retrieval: EasyLog maps our ring buffer to a pseudo-file, namely the character device file /dev/easylog, which can be accessed by userspace for reading or process memory mapping.

4.2 Interface

This section describes the basic utilization of EasyLog in both kernel subsystems and user space. Within the kernel subsystem, a set of kernel-space APIs can be employed to write logs with suffixes, facilitating the extraction of logs by the EasyLog service. In the user space, user programs can access logs from EasyLog by reading from the character device node /dev/easylog.

Drawing inspiration from the pr_xx functions, the kernel subsystems write a special log using the easy_xx function, where ’xx’ denotes the corresponding log level. Each kernel log comprises three essential components [2]: the kernel event level, a static message detailing the event, and variable values associated with the log event. Detailed API definitions are provided below.

figure a

In this context, the easy_suffix_printk function, inspired by the dev_xx functions, appends the “–easylog–" identifier to the end of each log. Subsequently, it invokes the vprintk_emit function, passing the kernel event level and the newly generated log with the suffix.

The reading operation of user programs in EasyLog draws inspiration from the reading operation of the devkmsg service. /dev/easylog is a readable character device file, permitting multiple user processes to access log records. Each process can obtain a complete set of log entries from EasyLog.

The collection of user-space file operations provided by EasyLog is as follows: 1) The open function, responsible for opening the character device file node /dev/easylog and creating a deveasylog_user structure object (as detailed later) named user, corresponding to the deceasylog_open function in the kernel. 2) The read function, which reads log entries from the log buffer, corresponds to the deceasylog_read function in the kernel. 3) The release function, responsible for releasing all resources acquired by the open function, corresponding to the deceasylog_release function in the kernel.

figure b

Each process that opens /dev/EasyLog is associated with an independent deveasylog_user structure object, as described below. The deveasylog_user structure maintains a unique sequence number for each reading process, denoted as seq, which represents the sequence number of the log currently being read by the process. The mutex lock, denoted as lock, ensures that only one thread within each process can perform write operations on the text_buf. The rs variable is used for rate limiting. Following the Log Retrieval Process in Sect. 2.1, logs are read from EasyLog’s log buffer, recorded in the text_buf, and subsequently returned to user space by invoking the copy_to_user function.

figure c

5 Experiment

This section of experiments aims to demonstrate the significant reduction in the loss rate of important logs achieved by EasyLog. Important logs refer to the logs associated with the development modules during server kernel development. Within the developing server kernel subsystems, developers invoke the easy_xx functions and append specific identifiers to the end of each log. This experiment considers such logs as important logs, defined as label logs. EasyLog extracts the logs with specific identifiers and stores them in a new ring log buffer.

In our experiments, it was essential to simulate real-world log generation scenarios as closely as possible, ensuring that normal logs and label logs terminated their output as synchronously as feasible. The original printk and EasyLog log buffer were set to 256 KB. We designed two kernel modules: one module invokes pr_info to write N normal log entries, while the other module invokes easy_info to write N*0.1 label log entries, ensuring a ratio of 100 normal logs to 10 label logs. The values of N range from 2048, 4096, 6144...20480. Additionally, normal logs were generated at intervals of 0.1 s, whereas the generation intervals for label logs were a random number between [0.5, 1.5] (with a mean of 1). These two modules were executed concurrently, writing normal logs and label logs in parallel. We consider the logs recorded in /dev/kmsg as the logs logged by the original logging system, while the logs in /dev/easylog are the logs recorded by EasyLog. The loss rate of label logs was tested separately. The formula for calculating the loss rate of label logs is provided in Eq. 1. Table 1 presents the experimental results.

$$\begin{aligned} Label\ Logs\ Loss\ Rate = \frac{Label\ Logs\ Loss\ Count}{Label\ Logs\ Count} \times 100\% \end{aligned}$$
(1)
Table 1. The loss rate of label logs

The Table 1 shows that: 1) When the log volume is relatively small, there is no significant difference in the loss rate of label logs and it remains at 0. 2) As the log volume gradually increases, the loss rate of label logs in the /dev/kmsg increases progressively. 3) Due to the constraint of ensuring that normal and label logs terminate their outputs concurrently, the growth in the loss rate of label logs in the /dev/kmsg slows down as the log volume increases, and it will not reach 100%. 4) Because the total size of label logs is less than 256 KB, the loss rate of label logs consistently remains at 0. In summary, EasyLog reduces the volume of logs that need to be recorded by extracting important logs, allowing the system sufficient time to store the logs from the buffer into files, thus effectively reducing the probability of log loss.

6 Application

EasyLog aims to assist programmers in designing kernel drivers and generating data for machine learning. In this section, we demonstrate how EasyLog aids developers in kernel driver development. Some hardware devices, such as GPUs, have old versions phased out and new versions released, necessitating corresponding driver updates. Besides, the driver subsystems are significantly larger than other subsystems. From Linux versions v4.3 to v5.3, there were a total of 211,437 modifications to log statements, with the driver subsystem accounting for 86.60% of the overall log code changes [2]. Therefore, optimizing the logging system is of paramount importance for Linux server kernel driver development.

Fig. 4.
figure 4

The output of “dmesg". The figure indicates the original kernel logs include too much unrelated information.

Fig. 5.
figure 5

The output of “cat /dev/easylog". The figure indicates the EasyLog’s logs are clearer and more coherent.

In this chapter, we take USB storage device-related drivers as an example. We output logs when USB flash drives are inserted and removed. The log content includes relevant information about the USB flash drive, such as product, vendor, manufacturer, serial number, as well as the time of insertion and removal. The driver functions involved in this process include the scsi_add_lun function in scsi/scsi_scan.c and the usb_disconnect function in usb/core/core.c. In these respective locations within the functions, the easy_info function is called to write the relevant log content. Repeatedly inserting and removing different USB flash drives or hard drives, Fig. 4 and Fig. 5 depict screenshots of the dmesg output and the cat /dev/easylog output. The Fig. 5 is clearer and more coherent, devoid of interference in the logs, which is advantageous for programmers working on new modules.

7 Conclusion

To address issues such as log loss and excessive log volume in the Linux logging service, this paper proposes the EasyLog service. EasyLog maintains a ring buffer as the log buffer. EasyLog extracts logs with special identifiers from the logs written to the original log buffer and stores them in our new log buffer. Furthermore, EasyLog provides write functions such as easy_xx for kernel modules to write logs and read functions for applications to read logs.

In the fifth section, experimental results demonstrate that when the total log volume is relatively low, both EasyLog and the original Linux logging module have a log loss rate of 0 for critical logs. However, when the proportion of critical logs remains constant but the total log volume increases, the log loss rate of critical logs in the original Linux logging system gradually rises. In contrast, the EasyLog module en sures that these critical logs are not overwritten by unrelated logs by extracting critical logs and storing them in our new ring buffer. As a result, the log loss rate of critical logs in EasyLog is significantly lower than in the original Linux logging system. When the total size of critical logs does not exceed the log buffer capacity, the loss rate of critical logs is 0.

In conclusion, EasyLog proficiently mitigates log loss concerns and facilitates the creation of kernel drivers as well as the generation of data for machine learning applications.