1 Introduction

The Internet Control Message Protocol (ICMP) is part of the original Internet Protocol specification (ICMP is IP protocol number one), and has remained largely unchanged since RFC 792 [21]. Its primary function is to communicate error and diagnostic information; well-known uses today include ICMP echo to test for reachability (i.e., ping), ICMP time exceeded to report packet loops (i.e., traceroute), and ICMP port unreachable to communicate helpful information to the initiator of a transport-layer connection. Today, 27 ICMP types are defined by the IESG, 13 of which are deprecated [11].

Among the non-deprecated ICMP messages are timestamp (type 13) and timestamp reply (type 14). These messages, originally envisioned to support time synchronization and provide one-way delay measurements [19], contain three 32-bit time values that represent milliseconds (ms) since midnight UTC. Modern clock synchronization is now performed using the Network Time Protocol [18] and ICMP timestamps are generally regarded as a potential security vulnerability [20] as they can leak information about a remote host’s clock. Indeed, Kohno et al. demonstrated in 2005 the potential to identify individual hosts by variations in their clock skew [12], while [6] and [4] show similar discriminating power when fingerprinting wireless devices.

In this work, we reassess the extent to which Internet hosts respond to ICMP timestamps. Despite no legitimate use for ICMP timestamps today, and best security practices that recommend blocking or disabling these timestamps, we receive timestamp responses from 2.2 million IPv4 hosts in 42,656 distinct autonomous systems (approximately 15% of the hosts queried) during a large-scale measurement campaign in September and October 2018. In addition to characterizing this unexpectedly large pool of responses, we seek to better understand how hosts respond. Rather than focusing on clock-skew fingerprinting, we instead make the following primary contributions:

  1. The first Internet-wide survey of ICMP timestamp support and responsiveness.

  2. A taxonomy of ICMP timestamp response behavior, and a methodology to classify responses.

  3. Novel uses of ICMP timestamp responses, including fine-grained operating system fingerprinting and coarse geolocation.

Fig. 1. ICMP timestamp message fields

2 Background and Related Work

Several TCP/IP protocols utilize timestamps, and significant prior work has examined TCP timestamps in the context of fingerprinting [12]. TCP timestamps have since been used to infer whether IPv4 and IPv6 server addresses map to the same physical machine in [2] and combined with clock skew to identify server “siblings” on a large scale in [24].

In contrast, this work focuses on ICMP timestamps. Although originally intended to support time synchronization [19], ICMP timestamps have no modern legitimate application use (having been superseded by NTP). Despite this, timestamps are not deprecated [11], suggesting that while hosts must support them, little attention is paid to their implementation and use.

Figure 1 depicts the structure of timestamp request (type 13) and response (type 14) ICMP messages. The 16-bit identifier and sequence values enable responses to be associated with requests. Three four-byte fields are defined: the originate timestamp (orig_ts), receive timestamp (recv_ts), and transmit timestamp (xmit_ts). Per RFC 792 [21], timestamp fields encode milliseconds (ms) since UTC midnight unless the most significant bit is set, in which case the field may be a “non-standard” value. The originator of timestamp requests should set the originate timestamp using her own clock; the value of the receive and transmit fields for timestamp requests is not specified in the RFC.

To respond to an ICMP timestamp request, a host simply copies the request packet, changes the ICMP type, and sets the receive and transmit time fields. The receive time indicates when the request was received, while the transmit time indicates when the reply was sent.
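As an illustration of the message layout and of the reply construction just described, the following Python sketch packs a 20-byte timestamp message and derives a reply from a request. The function names are hypothetical and the code is not taken from any particular operating system; it assumes the field order of Fig. 1 (type, code, checksum, identifier, sequence number, then the three timestamps) in network byte order.

```python
import struct

ICMP_TIMESTAMP_REQUEST = 13
ICMP_TIMESTAMP_REPLY = 14
LAYOUT = "!BBHHHIII"  # type, code, checksum, id, seq, orig_ts, recv_ts, xmit_ts


def inet_checksum(data: bytes) -> int:
    """Internet checksum: one's-complement of the one's-complement 16-bit sum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_timestamp_message(icmp_type, ident, seq, orig_ts, recv_ts=0, xmit_ts=0):
    """Pack a timestamp message, computing the checksum over a zeroed checksum field."""
    body = struct.pack(LAYOUT, icmp_type, 0, 0, ident, seq, orig_ts, recv_ts, xmit_ts)
    checksum = inet_checksum(body)
    return body[:2] + struct.pack("!H", checksum) + body[4:]


def parse_timestamp_message(packet: bytes) -> dict:
    """Unpack the first 20 bytes of an ICMP timestamp message into named fields."""
    names = ("type", "code", "checksum", "ident", "seq", "orig_ts", "recv_ts", "xmit_ts")
    return dict(zip(names, struct.unpack(LAYOUT, packet[:20])))


def make_reply(request: bytes, recv_ms: int, xmit_ms: int) -> bytes:
    """Copy the request, change the type, and fill in receive and transmit times."""
    req = parse_timestamp_message(request)
    return build_timestamp_message(
        ICMP_TIMESTAMP_REPLY, req["ident"], req["seq"], req["orig_ts"], recv_ms, xmit_ms
    )
```

A responder that performs a single time lookup would call `make_reply` with the same value for `recv_ms` and `xmit_ms`.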

Several prior research works have explored ICMP timestamps, primarily for fault diagnosis and fingerprinting. Anagnostakis et al. found in 2003 that 93% of the approximately 400k routers they probed responded to ICMP timestamp requests, and developed a tomography technique using ICMP timestamps to measure per-link one-way network-internal delays [1]. Mahajan et al. leveraged and expanded the use of ICMP timestamps to enable user-level Internet fault and path diagnosis in [16].

Buchholz and Tjaden leveraged ICMP timestamps in the context of forensic reconstruction and correlation [3]. Similar to our results, they find a wide variety of clock behaviors. However, while they probe \(\sim \)8,000 web servers, we perform an Internet-wide survey including 2.2M hosts more than a decade later, and demonstrate novel fingerprinting and geolocation uses of ICMP timestamps.

Finally, the nmap security scanner [15] uses ICMP timestamp requests, in addition to other protocols, during host discovery for non-local networks in order to circumvent firewalls and blocking. nmap sets the request originate timestamp to zero by default, in violation of the standard [21] (though the user can manually specify a timestamp). Thus, ICMP timestamp requests with zero-valued origination times provide a signature of nmap scanners searching for live hosts. While nmap uses ICMP timestamps for liveness testing, it does not use them for operating system detection as we do in this work.

To better understand the prevalence of ICMP timestamp scanners, we analyze 240 days of traffic arriving at a /17 network telescope. We observe a total of 413,352 timestamp messages, 93% of which are timestamp requests. Only 33 requests contain a non-zero originate timestamp, suggesting that the remainder (nearly 100%) are nmap scanners. The top 10 sources account for more than 86% of the requests we observe, indicating a relatively small number of active Internet-wide scanners.

3 Behavioral Taxonomy

During initial probing, we found significant variety in timestamp responses. Not only do structural differences exist in the implementation of [21] by timestamp-responsive routers and end systems (e.g., little- vs big-endian), they also occur relative to how the device counts time (e.g., milliseconds vs. seconds), the device’s reference point (e.g., UTC or local time), whether the reply is a function of request parameters, and even whether the device is keeping time at all.

Table 1. ICMP timestamp classification fingerprints

3.1 Timestamp Implementation Taxonomy

Table 1 provides an exhaustive taxonomy of the behaviors we observe; we term these the ICMP timestamp classifications. Note that this taxonomy concerns only the implementation of the timestamp response, rather than whether the responding host’s timestamp values are correct.

  • Normal: Conformant to [21]. Assuming more than one ms of processing time, the receive and transmit timestamps should not be equal, and both should be nonzero except at midnight UTC.

  • Lazy: Performs a single time lookup and sets both receive and transmit timestamp fields to the same value. A review of current Linux and FreeBSD kernel source code reveals this common lazy implementation [10, 13].

  • Checksum-Lazy: Responds to timestamp requests even when the ICMP checksum is incorrect.

  • Stuck: Returns the same value in the receive and transmit timestamp fields regardless of the input sent to it and time elapsed between probes.

  • Constant 0, 1, Little-Endian 1: A strict subset of “stuck” that always returns a small constant value in the receive and transmit timestamp fields.

  • Reflection: Copies the receive and transmit timestamp fields from the timestamp request into the corresponding fields of the reply message.

  • Non-UTC: Receive and transmit timestamp values with the most significant bit set. As indicated in [21], network devices that are unable to provide a timestamp with respect to UTC midnight or in ms may use an alternate time source, provided that the high order bit is set.

  • Linux htons() Bug: Certain versions of the Linux kernel (and Android) contain a flawed ICMP timestamp implementation where replies are truncated to a 16-bit value; see Appendix A for details.

  • Unknown: Any reply not otherwise classified.

3.2 Timekeeping Behavior Taxonomy

We next categorize the types of timestamp responses we observe by what the host is measuring and what they are measuring in relation to.

  • Precision: Timestamp reply fields should encode ms to be conformant; however, some implementations encode seconds.

  • UTC reference: Conformant to the RFC; receive and transmit timestamps encode ms since midnight UTC.

  • Timezone: Replies with receive and transmit timestamps in ms relative to midnight in the device’s local timezone, rather than UTC midnight.

  • Epoch reference: Returned timestamps encode time in seconds relative to the Unix epoch time.

  • Little-Endian: Receive and transmit timestamps containing a correct timestamp when viewed as little-endian four-byte integers.

4 Methodology

We develop sundial, a packet prober that implements the methodology described herein to elicit timestamp responses that permit behavioral classification. sundial is written in C and sends raw IP packets in order to set specific IP and ICMP header fields, while targets are randomized to distribute load. We have since ported sundial to a publicly available ZMap [8] module [22].

Our measurement survey consists of probing 14.5 million IPv4 addresses of the August 7, 2018 ISI hitlist, which includes one address per routable /24 network [9]. We utilize two vantage points connected to large academic university networks named after their respective locations: “Boston” and “San Diego.” Using sundial, we elicit ICMP timestamp replies from \(\sim \)2.2 million unique IPs.

This section first describes sundial’s messages and methodology, then our ground truth validation. We then discuss ethical concerns and precautions undertaken in this study.

4.1 sundial Messages

In order to generate and categorize each of the response behaviors, sundial transmits four distinct types of ICMP timestamp requests. Both of our vantage points have their time NTP-synchronized to stratum 2 or better servers. Thus time is “correct” on our prober relative to NTP error.

  1. Standard: We fill the originate timestamp field with the correct ms from UTC midnight, zero the receive and transmit timestamp fields, and place the lower 32 bits of the MD5 hash of the destination IP address and originate timestamp into the identifier and sequence number fields. The hash permits detection of destinations or middleboxes that tamper with the originate timestamp, identifier, or sequence number.

  2. Bad Clock: We zero the receive and transmit fields of the request, choose an identifier and sequence number, and compute the MD5 hash of the destination IP address together with the identifier and sequence number. The lower 32 bits of the hash are placed in the originate timestamp. This hash again provides the capability to detect modification of the reply.

  3. Bad Checksum: The correct time in ms since UTC midnight is placed in the originate field, the receive and transmit timestamps are set to zero, and the identifier and sequence number fields contain an encoding of the destination IP address along with the originate timestamp. We deliberately choose a random, incorrect checksum and place it into the ICMP timestamp request’s checksum field. This timestamp message should appear corrupted to the destination, and a correct ICMP implementation should discard it.

  4. Duplicate Timestamp: The receive and transmit timestamps are initialized to the originate timestamp value by the sender, setting all three timestamps to the same correct value. The destination IP address and originate timestamp are again encoded in the identifier and sequence number to detect modifications.

Many implementation behaviors in Sect. 3 can be inferred from the first, standard probe. For instance, the standard timestamp request can determine a normal, lazy, non-UTC and little-endian implementation. In order to classify a device as stuck, both the standard and duplicate timestamp requests are required. Two requests are needed in order to determine that the receive and transmit timestamps remain fixed over time, and the inclusion of the duplicate timestamp request ensures that the remote device is not simply echoing the values in the receive and transmit timestamp fields of the request. Similarly, timestamp reflectors can be detected using the standard and duplicate request responses.
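A minimal Python sketch of this two-probe logic is shown below. It is an illustration under simplifying assumptions, not the classifier behind Table 1: the label names and the `Exchange` type are hypothetical, the two probes are assumed to be sent far enough apart that a working clock advances between them, and the little-endian and htons() cases are left out.

```python
from typing import NamedTuple, Set

MSB = 0x80000000
CONSTANTS = {0: "constant-0", 1: "constant-1", 0x01000000: "constant-le-1"}


class Exchange(NamedTuple):
    """Receive and transmit fields of one request and of its reply."""
    req_recv: int
    req_xmit: int
    rep_recv: int
    rep_xmit: int


def echoes_request(x: Exchange) -> bool:
    return (x.rep_recv, x.rep_xmit) == (x.req_recv, x.req_xmit)


def classify(standard: Exchange, duplicate: Exchange) -> Set[str]:
    """Label a host from its replies to the standard and duplicate probes."""
    replies = [(standard.rep_recv, standard.rep_xmit),
               (duplicate.rep_recv, duplicate.rep_xmit)]
    labels = set()
    if echoes_request(standard) and echoes_request(duplicate):
        # Both replies mirror whatever the request carried.
        labels.add("reflection")
    elif replies[0] == replies[1] and replies[0][0] == replies[0][1]:
        # Same single value regardless of input and elapsed time.
        labels.add("stuck")
        if replies[0][0] in CONSTANTS:
            labels.add(CONSTANTS[replies[0][0]])
    elif all(recv == xmit for recv, xmit in replies):
        labels.add("lazy")
    else:
        labels.add("normal")
    if all(recv & MSB and xmit & MSB for recv, xmit in replies):
        labels.add("non-utc")
    return labels
```

A constant-0 host echoes the zeroed fields of the standard probe but not the nonzero fields of the duplicate probe, which is why the duplicate request is needed to separate it from a reflector.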

The checksum-lazy behavior is detected via responses to the bad checksum request type. The Linux htons() bug behavior can be detected using the standard request and filtering for reply timestamps with the two lower bytes set to zero. In order to minimize the chance of false positives (i.e., the correct time in ms from UTC midnight is represented with the two lower bytes zeroed), we count only destinations that match this behavior in responses from both the standard and bad clock timestamp request types.
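The htons() filter reduces to a bit mask applied to both replies. The Python sketch below is illustrative; the function names are hypothetical, and it does not exclude all-zero replies, which would also pass the mask and are assumed here to be handled separately by the constant 0 classification.

```python
def low_two_bytes_zero(ts: int) -> bool:
    """True when the two lower bytes of a 32-bit timestamp field are zero."""
    return ts & 0xFFFF == 0


def matches_htons_signature(standard_reply, bad_clock_reply) -> bool:
    """Each reply is a (recv_ts, xmit_ts) pair; both probes must show the pattern."""
    return all(
        low_two_bytes_zero(ts)
        for reply in (standard_reply, bad_clock_reply)
        for ts in reply
    )
```

Requiring the pattern in both replies matters because a correct millisecond clock passes through a value with zeroed lower bytes once every 65,536 ms.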

To detect the unit precision of the timestamp reply fields, we leverage the multiple requests sent to each target. Because we know the time at which requests are transmitted, we compare the time difference between the successive requests to a host and classify them based on the inferred time difference from the replies.

Finally, we classify responsive devices by the reference by which they maintain time. We find many remote machines that observe nonstandard reference times, but do not set the high order timestamp field bit. A common alternative timekeeping methodology is to track the number of ms elapsed since midnight local time. We detect local timezone timekeepers by comparing the receive and transmit timestamps to the originate timestamp in replies to the standard request. Receive and transmit timestamps that differ from our correct originate timestamp by the number of ms for an existing timezone (within an allowable error discussed in Sect. 5.2) are determined to be keeping track of their local time.
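The timezone comparison can be sketched in Python as below. This is a simplified illustration with hypothetical names: it assumes whole-hour candidate offsets from UTC-11 to UTC+12 (a fuller list would also include fractional offsets such as +5:30), uses the 200 ms margin of Sect. 5.2, and relies on the fact that timestamps wrap at midnight, so offsets that differ by exactly 24 hours cannot be told apart.

```python
MS_PER_DAY = 86_400_000
MS_PER_MINUTE = 60_000


def signed_day_diff(a: int, b: int) -> int:
    """a - b in ms, wrapped into [-12 h, +12 h) to survive midnight rollover."""
    half = MS_PER_DAY // 2
    return (a - b + half) % MS_PER_DAY - half


def infer_timezone_offset(orig_ts, recv_ts,
                          candidates_min=range(-11 * 60, 12 * 60 + 1, 60),
                          margin_ms=200):
    """Return the matching UTC offset in minutes, or None if no candidate fits."""
    error = signed_day_diff(recv_ts, orig_ts)
    for minutes in candidates_min:
        if minutes == 0:
            continue  # a zero offset is a correct UTC clock, not a timezone leak
        if abs(signed_day_diff(error, minutes * MS_PER_MINUTE)) <= margin_ms:
            return minutes
    return None
```

For example, a reply whose receive timestamp is nine hours and 50 ms ahead of the originate timestamp maps to an offset of 540 minutes.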

Last, a small number of devices we encountered measured time relative to the Unix epoch. Epoch-relative timestamps are detected in two steps: first, we compare the epoch timestamp’s date to the date on which we sent the request; if they match, we determine whether the number of seconds elapsed since UTC midnight in the reply is suitably close to the correct UTC time.
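The two-step epoch test might look like the following Python sketch. The function name is hypothetical, and because the text does not quantify "suitably close", the one-second tolerance is an assumption.

```python
from datetime import datetime, timezone


def looks_epoch_relative(reply_ts: int, sent_at: datetime, margin_s: int = 1) -> bool:
    """Check a reply field against the Unix epoch; sent_at must be timezone-aware UTC."""
    # Step 1: the reply, read as epoch seconds, must fall on the probing date.
    reply_time = datetime.fromtimestamp(reply_ts, tz=timezone.utc)
    if reply_time.date() != sent_at.date():
        return False
    # Step 2: seconds since UTC midnight must be close to the sender's clock.
    reply_secs = reply_ts % 86_400
    sent_secs = sent_at.hour * 3600 + sent_at.minute * 60 + sent_at.second
    return abs(reply_secs - sent_secs) <= margin_s
```

A conformant reply fails the first step, since a millisecond count below 86,400,000 read as epoch seconds lands in the early 1970s.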

4.2 Ground Truth

To validate our inferences and understand the more general behavior of popular operating systems and devices, we run sundial against a variety of known systems; Table 2 lists their ICMP timestamp reply behavior.

Table 2. Ground truth classification of ICMP timestamp behaviors

Apple desktop and mobile operating systems, macOS and iOS, do not respond to ICMP timestamp messages by default. Initially, we could not elicit any response from Microsoft Windows devices until we disabled Windows Firewall. Once the firewall is disabled, the Windows device responds with correct timestamps in little-endian byte order. This suggests that timestamp-responsive devices with little-endian timestamp replies are Windows hosts and, worryingly, that their built-in firewall has been turned off by the administrator.

BSD and Linux devices respond with lazy timestamp replies, as their source code indicates they should. JunOS and Android respond like FreeBSD and Linux, on which they are based, respectively. Of note, we built the Linux 3.18 kernel, which has the htons() bug described in Sect. 3.1 and Appendix A; it responded with the lower two bytes zeroed, as expected. This bug has made its way into Android, where we find devices running the 3.18 kernel exhibiting the same signature.

Cisco devices respond differently depending on whether they have enabled NTP. NTP is not enabled by default on IOS; the administrator must manually enable the protocol and configure the NTP servers to use. If NTP has not been enabled, we observe devices setting the most significant bit, presumably to indicate that they are unsure whether the timestamp is accurate, and filling in a UTC-based timestamp with the remaining bits, according to their internal clocks.

Telnet Banner and CWMP GET Ground Truth. To augment the ground truth we obtained from devices we were able to procure locally, we leveraged IPv4 Internet-wide Telnet banner- and CPE WAN Management Protocol (CWMP) parameter-grabbing scans from scans.io [23]. From October 3, 2018 scans, we search banners (Telnet) and GET requests (CWMP) for IP addresses associated with known manufacturer strings. We then probe these addresses with sundial.

Figure 2 displays the most common fingerprints for a subset of the manufacturers probed from scans.io’s Telnet banner-grab dataset, while Fig. 3 is the analogous CWMP plot. We note that non-homogeneous behavior within a manufacturer’s plot may be due to several factors: different behaviors among devices of the same manufacturer, banner spoofing, IP address changes, and middleboxes between the source and destination. We provide further details regarding our use of the scans.io datasets in Appendix B.

Fig. 2. Incidence of fingerprints for most common Telnet banner manufacturers

Fig. 3. Incidence of fingerprints for most common CWMP scan manufacturers

4.3 Ethical Considerations

Internet-wide probing invariably raises ethical concerns. We therefore follow the recommended guidelines for good Internet citizenship provided in [8] to mitigate the potential impact of our probing. At a high level, we only send ICMP packets, which are generally considered less abusive than, e.g., TCP or UDP probes that may reach active application services. Further, our pseudo-random probing order is designed to distribute probes among networks in time so that they do not appear as attack traffic. Finally, we make an informative web page accessible via the IP address of our prober, along with instructions for opting out. In this work, we did not receive any abuse reports or opt-out requests.

5 Results

On October 6, 2018, we sent four ICMP timestamp request messages as described in Sect. 4.1 from both of our vantage points to each of the 14.5 million target IPv4 addresses in the ISI hitlist. We obtained at least one ICMP timestamp reply message from 2,221,021 unique IP addresses in 42,656 distinct autonomous systems as mapped by Team Cymru’s IP-to-ASN lookup service [5]. Our probing results are publicly available [22].

We classify the responses according to the implementation taxonomy outlined in Sect. 3 and Table 1, the timekeeping behavior detailed in Sect. 3.2, and the correctness of the timestamp reply according to Sect. 5.2. Tables 3 and 4 summarize our results in tabular form; note that the implementation behavior categories are not mutually exclusive, and the individual columns will sum to more than the total column, which is the number of unique responding IP addresses. We received replies from approximately 11,000 IP addresses whose computed MD5 hashes as described in Sect. 4.1 indicated tampering of the source IP address, originate timestamp, or id and sequence number fields; we discard these replies.

5.1 Macro Behavior

Lazy replies outnumber normal timestamp replies by a margin of over 50 to 1. Because we had assumed the normal reply type would be the most common, we investigated open-source operating systems’ implementations of ICMP. In both the Linux and BSD implementations, the receive timestamp is filled in via a call to retrieve the current kernel time, after which this value is simply copied into the transmit timestamp field. Therefore, all BSD and Linux systems, and their derivatives, exhibit the lazy timestamp reply behavior.

Normal hosts can appear lazy if the receive and transmit timestamps are set within the same millisecond. This ambiguity can be resolved in part via multiple probes. For instance, Table 3 shows that only \(\sim \)50% of responders classified as normal by one vantage are also marked normal by the other.

The majority (61%) of responding devices do not reply with timestamps within 200 ms of our NTP-synchronized reference clock, our empirically-derived correctness bound discussed in Sect. 5.2. Only \(\sim \)40% of responding IP addresses reply with timestamps inside this bound; notably, we also detect smaller numbers of devices with correct clocks that incorrectly implement the timestamp reply message standard. For example, across both vantage points we detect thousands of devices whose timestamps are correct when interpreted as a little-endian integer, rather than in network byte order. Sect. 4.2 identifies one operating system that implements little-endian timestamps. In another incorrect behavior that nevertheless indicates a correct clock, some devices respond with the correct timestamp and the most significant bit set, a behavior at odds with the specification [21], where the most significant bit indicates that the timestamp is either not in ms or not referenced to UTC midnight. In Sect. 4.2, we discuss an operating system that sets the most significant bit when its clock has not been synchronized with NTP.

Table 3. Timestamp reply implementation behaviors (values do not sum to total)

Over 200,000 unique IPs (>10% of each vantage point’s total) respond with the most significant bit set in the receive and transmit timestamps; those timestamps that are otherwise correct are but a small population of those we term Non-UTC due to the prescribed meaning of this bit in [21]. Some hosts and routers fall into this category due to the nature of their timestamp reply implementation – devices that mark the receive and transmit timestamps with little-endian timestamps will be classified as Non-UTC if the most significant bit of the lowest order byte is on, when the timestamp is viewed in network byte order. Others, as described above, turn on the Non-UTC bit if they have not synchronized with NTP.

Another major category of non-standard implementation behavior comprises devices that report their timestamp relative to their local timezone. It is unclear whether these devices programmatically report their local time without human intervention, or whether administrator action is required to change the system time (from UTC to local time) to produce this classification. In either case, timezone timestamp replies allow us to coarsely geolocate the responding device. We delve deeper into this possibility in Sect. 5.4.

Finally, while most responding IP addresses are unsurprisingly classified as using milliseconds as their unit of measure, approximately 14–16% of IP addresses are not (see Table 4). In order to determine what units are being used in the timestamp, we compute the time elapsed between the standard timestamp request and duplicate timestamp request, both of which contain correct originate timestamp fields. We then subtract from it the time elapsed according to the receive and transmit timestamps in the corresponding timestamp reply messages. If the absolute value of this difference of differences is less than 400 ms (two times 200 ms, the error margin for one reply), we conclude that the remote IP is counting in milliseconds. A similar calculation is done to find devices counting in seconds. Several of the behavioral categories outlined in Sect. 3.1 are included among the hosts with undefined timekeeping behavior; those whose clocks are stuck at a particular value and those that reflect the request’s receive and transmit timestamps into the corresponding fields are two examples. Others may be filling the reply timestamps with random values.
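The difference-of-differences test for units can be sketched in Python as follows. The names are hypothetical, midnight rollover is ignored for brevity, and the probes are assumed to be spaced several seconds apart so that a stalled clock is not mistaken for a millisecond counter. The text does not detail the seconds variant, so widening the margin by 1000 ms to absorb truncation to whole seconds is an assumption.

```python
def infer_unit(request_gap_ms: int, first_reply_ts: int, second_reply_ts: int,
               margin_ms: int = 400) -> str:
    """Compare the sender's elapsed time with the elapsed time the replies report."""
    reply_gap = second_reply_ts - first_reply_ts
    # Millisecond counter: both gaps agree directly.
    if abs(request_gap_ms - reply_gap) < margin_ms:
        return "ms"
    # Second counter: scale the reply gap to ms and allow one second of truncation.
    if abs(request_gap_ms - reply_gap * 1000) < margin_ms + 1000:
        return "s"
    return "undefined"
```

With probes sent 5000 ms apart, replies that advance by 5020 are labeled "ms", replies that advance by 5 are labeled "s", and replies that do not advance at all are labeled "undefined".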

Table 4. Timestamp reply timekeeping behaviors

5.2 Timestamp Correctness

In order to make a final classification – whether the remote host’s clock is correct or incorrect – as well as to assist in making many of the classifications within our implementation and timekeeping taxonomies that require a correctness determination, we describe in this section our methodology for determining whether or not a receive or transmit timestamp is correct.

To account for clock drift and network delays, we aim to establish a margin of error relative to a correctly marked originate timestamp, and consider receive and transmit timestamps within that margin from the originate timestamp to be correct. To that end, we plot the probability density of the differences between the receive and originate timestamps from 2.2 million timestamp replies generated by sending a single standard timestamp request to each of 14.5 million IP addresses from the ISI hitlist [9] in Fig. 4.

Figure 4 clearly depicts a trough in the difference probability values around 200 ms, indicating that receive timestamps more than 200 ms greater than the originate timestamp are less likely than those between zero and 200 ms. We reflect this margin about the y-axis, despite the trough occurring somewhat closer to the origin on the negative side. Therefore, we declare a timestamp correct if it is within our error margin of 200 ms of the originate timestamp.
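The correctness rule, together with the little-endian and most-significant-bit variants noted in Sect. 5.1, can be expressed as a short Python sketch. The function and label names are hypothetical, and the order in which the interpretations are tried is an assumption.

```python
import struct

MS_PER_DAY = 86_400_000
MARGIN_MS = 200
MSB = 0x80000000


def within_margin(ts: int, orig_ts: int, margin_ms: int = MARGIN_MS) -> bool:
    """True if ts is a valid ms-since-midnight value within the margin of orig_ts."""
    if not 0 <= ts < MS_PER_DAY:
        return False
    half = MS_PER_DAY // 2
    diff = (ts - orig_ts + half) % MS_PER_DAY - half  # tolerate midnight rollover
    return abs(diff) <= margin_ms


def byteswap32(value: int) -> int:
    """Reinterpret a 32-bit network-order value as little-endian."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


def correctness(ts: int, orig_ts: int) -> str:
    """Label one reply timestamp relative to the correct originate timestamp."""
    if within_margin(ts, orig_ts):
        return "correct"
    if within_margin(byteswap32(ts), orig_ts):
        return "correct-little-endian"
    if ts & MSB and within_margin(ts & 0x7FFFFFFF, orig_ts):
        return "correct-msb-set"
    return "incorrect"
```

The range check in `within_margin` keeps large values, such as byte-swapped or high-bit timestamps, from wrapping around the day length and matching by accident.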

Fig. 4. Empirical recv_ts - orig_ts PMF

Fig. 5. Response error; note hourly peaks

5.3 Middlebox Influence

To investigate the origin of some of the behaviors observed in Sect. 3 for which we have no ground truth implementations, we use tracebox [7] to detect middleboxes. In particular, we chose for investigation hosts implementing the reflection, lazy with MSB set (but not counting milliseconds), and constant 0 behaviors, as we do not observe any of these fingerprints in our ground truth dataset, yet there exist nontrivial numbers of them in our Internet-wide dataset.

In order to determine whether a middlebox may be responsible for these behaviors for which we have no ground truth, we tracebox to a subset of 500 random IP addresses exhibiting them. For our purposes, we consider an IP address to be behind a middlebox if the last hop modifies fields beyond the standard IP TTL and checksum modifications, and DSCP and MPLS field alterations and extensions. Of 500 reflection IP addresses, only 44 showed evidence of being behind a middlebox, suggesting that some operating systems implement the reflect behavior and that this is a less common middlebox modification. The lazy with MSB set (but non-ms counting) behavior, on the other hand, was inferred to be behind a middlebox in 333 out of 500 random IP addresses, suggesting it is most often middleboxes that are causing the lazy-MSB-set fingerprint. Finally, about half of the constant 0 IP addresses show middlebox tampering in tracebox runs, suggesting that this behavior is both an operating system implementation of timestamp replies as well as a middlebox modification scheme.

5.4 Geolocation

Figure 5 displays the probability distribution of response error, i.e., \(\texttt {recv\_ts}- \texttt {orig\_ts}\), after correct replies have been removed from the set of standard request type responses. While there is a level of uniform randomness, we note the peaks at hour intervals. We surmise that these represent hosts that have correct time, but return a timezone-relative response (in violation of the standard [21] where responses should be relative to UTC). The origin of timezone-relative responses may be a non-conformant implementation. Alternatively, these responses may simply be an artifact of non-NTP synchronized machines where the administrator instead sets the localtime correctly, but incorrectly sets the timezone. In this case, the machine’s notion of UTC is incorrect, but it is off by the difference between the host’s actual and configured timezone offsets. Nevertheless, these timezone-relative responses effectively leak the host’s timezone. We note the large spike in the +9 timezone, which covers Japan and South Korea; despite the use of nmap’s OS-detection feature, and examining web pages and TLS certificates where available, we could not definitively identify a specific device manufacturer or policy underpinning this effect.

To evaluate our ability to coarsely geolocate IP addresses reporting a timezone-relative timestamp, we begin with \(\sim \)34,000 IP addresses in this category obtained by sending a single probe to every hitlist IP from our Boston vantage. Using the reply timestamps, we compute the remote host’s local timezone offset relative to UTC to infer the host’s timezone. We then compare our inferred timezone with the timezone reported by the MaxMind GeoLite-2 database [17].

For each IP address, we compare the MaxMind timezone’s standard time UTC offset and, if applicable, daylight saving time UTC offset, to the timestamp-inferred offset. Of the 34,357 IP addresses tested, 32,085 (93%) correctly matched either the standard timezone UTC offset or daylight saving UTC offset, if the MaxMind-derived timezone observes daylight saving time. More specifically, 18,343 IP addresses had timestamp-inferred timezone offsets that matched their MaxMind-derived timezone, which did not observe daylight saving time. 11,188 IP addresses resolved to a MaxMind timezone whose daylight saving time offset matched the offset inferred from the timestamp. 2,554 IP addresses had timestamp-inferred UTC offsets that matched their MaxMind-derived standard time offset for timezones that do observe daylight saving time. Of the inferred UTC offsets that were not correct, 1,641 did not match either the standard time offset derived from MaxMind, or the daylight saving time offset, if it existed, and 631 IP addresses did not resolve to a timezone in MaxMind’s free database.
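The five outcomes in this comparison can be summarized by a small decision function, sketched here in Python with hypothetical names. Offsets are in minutes, `None` stands for a missing value, and the order in which the daylight saving and standard offsets are tested is an assumption. The final lines only re-add the counts reported above.

```python
def match_category(inferred_min, standard_min, dst_min=None) -> str:
    """Compare a timestamp-inferred UTC offset with a database timezone's offsets."""
    if standard_min is None:
        return "unresolved"  # the address has no timezone in the database
    if dst_min is None:
        # Timezone does not observe daylight saving time.
        return "match-no-dst" if inferred_min == standard_min else "mismatch"
    if inferred_min == dst_min:
        return "match-dst"
    if inferred_min == standard_min:
        return "match-standard"
    return "mismatch"


# Counts reported in the text, keyed by the categories above.
reported = {"match-no-dst": 18_343, "match-dst": 11_188, "match-standard": 2_554,
            "mismatch": 1_641, "unresolved": 631}
matched = sum(v for k, v in reported.items() if k.startswith("match"))
total = sum(reported.values())
```

Under these definitions the three matching categories sum to 32,085 of 34,357 addresses, consistent with the 93% figure.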

6 Conclusions and Future Work

We observe a wide variety of implementation behavior of the ICMP timestamp reply type, caused by timestamps’ lack of a modern use but continued requirement to be supported. In particular, we are able to uniquely fingerprint the behavior of several major operating systems and kernel versions, and geolocate Internet hosts to timezone accuracy with >90% success.

As future work, we intend to exhaustively scan and classify the IPv4 Internet, scan a subset with increased frequency over a sustained time period, and do so from many vantage points. We further plan to integrate the OS-detection capabilities we uncover in this work into nmap, and add tracebox functionality to sundial in order to better detect middlebox tampering with ICMP timestamp messages.