
1 Introduction

Our focus is to mitigate Distributed Denial of Service (DDoS) attacks against Domain Name System (DNS) servers. DNS servers are an essential component of ICT-based Information Systems. By Information Systems, we mean the “Technology View” category, as defined in [3]. Our case study concerns a Web (including Web application) service. Web servers are the most popular type of public Internet server in “Technology View” based Information Systems. Less common public Internet Information System servers are email and database servers; however, even email services are sometimes accessible via Web servers. We have made progress in mitigating these attacks.

1.1 Background

For computer networking, the DNS service is used to translate between symbolic names and IP addresses. The DNS hierarchy levels are shown in Fig. 1 [25].

Fig. 1. Summary of DNS hierarchy

Here is a simplified example showing how DNS works. To access Twitter via a Web browser, you type in a URL such as https://Twitter.Com. Your browser then asks the operating system to resolve the symbolic name (Twitter.Com), and the operating system returns an IP address. The operating system will already have one or more DNS servers configured (either manually or automatically via the Dynamic Host Configuration Protocol (DHCP)).
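The same name-to-address step can be observed from a program. The following minimal Python sketch (our illustration, not part of the study) asks the operating system's configured resolver for the addresses of a name through the standard `socket.getaddrinfo` call; the helper name `resolve` is hypothetical.

```python
import socket
from typing import List


def resolve(name: str) -> List[str]:
    """Ask the operating system's configured resolver for the IP addresses of `name`."""
    # SOCK_STREAM avoids one entry per socket type; the set removes any remaining duplicates.
    infos = socket.getaddrinfo(name, None, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP address.
    return sorted({info[4][0] for info in infos})
```

For example, `resolve("twitter.com")` returns the addresses currently published for that name, provided the machine has network access and a reachable DNS server. If the DNS service cannot be reached, `getaddrinfo` raises `socket.gaierror` instead of returning an answer, which is exactly the failure users experience during a DNS DDoS attack.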

One way or another, you should get the relevant IP address. However, if the DNS service is under attack, you might not get an answer at all. The DNS name to IP address resolution process is shown in Fig. 2.

Fig. 2. DNS name to IP address resolution

There are many diverse types of network attacks intended to decrease the availability of services. One common type is the DDoS attack. DDoS attacks are easy to perform and difficult to defend against. There are malicious companies that charge a service fee and then perform a DDoS attack on your behalf. A general DDoS attack strategy based on a Botnet is shown in Fig. 3.

The term Botnet comes from robot and network. A Botnet is a large group of computers under the command and control of the Botnet owner. In a malicious Botnet, the machines were often compromised by malware. These Botnets often comprise thousands (or even hundreds of thousands) of machines, which the Botnet owner can use to perform a coordinated DDoS attack.

DDoS attacks can render DNS and other online services unavailable [24]. If the attacks are successful against DNS servers, then clients will be unable to obtain the IP address of the intended online service.

1.2 Motivation

Dyn is a DNS hosting provider for thousands of different organizations. For example, Twitter uses Dyn for its DNS hosting, and we will use Twitter as our case study. Twitter has 313 million monthly active users and one billion unique visits to sites with embedded tweets [20]. In October 2016, a successful DDoS attack against Dyn made Twitter unavailable to many of its users [1].

Fig. 3. General DDoS attack

Once Dyn’s DNS servers were no longer fully available, many users could not reach several Internet services, such as Twitter. The problem was that Twitter’s users tried to perform a symbolic DNS name to IP address translation, but never received a valid answer. For example, when users asked for the IP address corresponding to Twitter.com, they did not receive an answer. This made Dyn’s DNS hosting customers, including Twitter, unavailable, even though Twitter’s Web servers were up and running. Had adequate, easy-to-implement DNS DDoS defenses been in place, this DDoS attack against Dyn’s DNS should not have succeeded.

The research community has provided many general DDoS solutions, but we were unable to find any DNS-specific design guidelines or DNS protocol changes that would mitigate DNS DDoS attacks. Our contribution is therefore to identify minor DNS protocol changes that can be used to mitigate DNS DDoS attacks.

1.3 General Related Work

A brief mention of the more general related work follows. In [28], S. Zargar et al. provide a taxonomy of DDoS defenses. In [1], V. Almeida et al. discuss DDoS and cyberwarfare. In [29], K. Zeb et al. provide a survey of DDoS attacks and defenses in cyberspace. In [19], R. Soni et al. provide a summary of security in public clouds.

1.4 Contributions

We believe that part of the problem is that many companies lack the security knowledge needed to defend against some types of DNS DDoS attacks; we discuss our related contribution in this paper.

With our TTL2 contribution, we give DNS authoritative server administrators much more control over the actual TTL at DNS resolvers, recursive servers, and clients. With our TTL3 contribution, we give DNS authoritative server administrators much better control over the actual DNS cache timeout at DNS resolvers, recursive DNS servers, and clients. Even when the end client does not support our TTL2/TTL3 timers, we show how intermediate DNS resolvers and recursive servers can automatically provide the enhanced DNS timer functionality, with no changes to the end client. A summary of our specific contributions follows:

  1. We propose best practices that can mitigate some types of DNS DDoS attacks.

  2. Our proposed DNS TTL2 protocol enhancement.

  3. Our proposed DNS TTL3 protocol enhancement.

  4. We show how DNS clients can take advantage of our enhancements with no changes to the clients.

1.5 Outline of This Article

The rest of this article is organized as follows. In Sect. 2, our design guidelines to mitigate some DNS DDoS attacks are presented. In Sect. 3, our DNS TTL2 contribution is presented, which allows better control of the DNS TTL. In Sect. 4, our DNS TTL3 contribution is presented, which allows better control over DNS caching. In Sect. 5, we cover the related work not mentioned previously. We wrap up with our conclusions and recommended future work in Sect. 6.

2 Contribution 1 - Design Guidelines

Without making any changes to the DNS protocol, it is quite easy to mitigate some types of DNS DDoS attacks. Since we are using Dyn and Twitter as our case study, let’s evaluate the October 2016 Dyn attack that affected Twitter [1]. A summary of the relevant DNS structure for Twitter follows:

  1. DNS root servers

  2. DNS TLD servers, including the .Com TLD servers

  3. DNS Twitter.Com servers (which were Dyn servers, since Twitter was hosting their DNS at Dyn)

The attack was not against the DNS root or TLD servers, so we can ignore them for the moment. We know the attack was against the Dyn DNS hosting provider and that it affected Twitter. But was the attack specifically against Twitter? Keep in mind that Dyn hosts DNS for thousands of organizations. Therefore, we should not assume that the attack was specifically against Twitter. Likewise, we should not assume that the attack was specifically against Dyn. The attack could very well have been against just one of Dyn’s DNS hosting customers other than Twitter.

To simplify the conditions under which our contribution is helpful, let’s simplify the case study as follows: Dyn was hosting DNS for 1,000 organizations. The October attack was against one organization, which was not Twitter. We’ll assume that the attacked organization’s domain name was Under-attack.com.

So Dyn’s DNS servers were hosting DNS for Under-attack.com, Twitter.com, and 998 other organizations. When the attack against Under-attack.com started, Dyn’s DNS servers were attacked as a by-product, which meant that they were not 100% available for Dyn’s other DNS hosting customers. Since Twitter was DNS hosted on the very same Dyn DNS servers, Twitter was also affected.

Our design guideline contribution is that Twitter should have developed a script ahead of time to deal with this potential vulnerability. As a first step, Twitter should have developed a simple Linux script that makes simple DNS requests against all of Twitter’s DNS servers hosted at Dyn and measures DNS resolution availability. When availability dropped below 100%, the script should have automatically made a configuration change at the .Com servers (in practice, through the domain’s registrar), removing the specific Dyn-hosted Twitter DNS servers that were not at 100% availability. The script should also have immediately enabled DNS hosting at another DNS hosting provider, such as Microsoft, Google, etc., with those accounts enabled ahead of time. Under certain conditions, Twitter could simply have moved all DNS hosting to another provider. With this simple script, Twitter would only have been offline for new customers (those without a cached DNS record) for a short amount of time. All of Dyn’s other 998 customers could have run a similar script to greatly mitigate the DNS availability issues for their domains.
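The monitoring half of such a script could look like the following Python sketch. It is only an illustration under our own assumptions: the helper names (`build_query`, `udp_probe`, `availability`, `plan_failover`) are hypothetical, servers are identified by IP address, and the final step of publishing the new name-server list at the .Com zone and enabling the backup provider is not modeled, because it depends on each registrar's and provider's own API. The probe builds a minimal RFC 1035 A query by hand so that no third-party library is needed.

```python
import random
import socket
import struct
from typing import Callable, List, Optional

# A probe takes (server_ip, name) and reports whether one query was answered.
Probe = Callable[[str, str], bool]


def build_query(name: str, qid: int) -> bytes:
    """Minimal RFC 1035 query for an A record, with recursion not requested."""
    header = struct.pack("!HHHHHH", qid, 0, 1, 0, 0, 0)  # ID, flags, QDCOUNT=1, AN/NS/AR=0
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def udp_probe(server_ip: str, name: str, timeout: float = 2.0) -> bool:
    """True if server_ip sends back a NOERROR response to our query within timeout."""
    qid = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name, qid), (server_ip, 53))
            data, _ = sock.recvfrom(512)
        except OSError:  # timeouts and network errors both count as failures
            return False
    if len(data) < 12:
        return False
    rid, flags = struct.unpack("!HH", data[:4])
    # Matching ID, QR bit set (it is a response), RCODE 0 (NOERROR).
    return rid == qid and bool(flags & 0x8000) and (flags & 0x000F) == 0


def availability(server_ip: str, name: str, probe: Probe, attempts: int = 10) -> float:
    """Fraction of attempts that the server answered."""
    return sum(probe(server_ip, name) for _ in range(attempts)) / attempts


def plan_failover(primary: List[str], backup: List[str], name: str, probe: Probe,
                  attempts: int = 10, threshold: float = 1.0) -> Optional[List[str]]:
    """New name-server list to publish, or None if every primary server is healthy."""
    healthy = [s for s in primary if availability(s, name, probe, attempts) >= threshold]
    if len(healthy) == len(primary):
        return None
    # Keep the primaries that still answer and add the pre-provisioned backup provider.
    return healthy + backup
```

A real run would pass `udp_probe` as the probe, e.g. `plan_failover(dyn_ips, backup_ips, "twitter.com", udp_probe)`. The 100% threshold mirrors the guideline above; over a real network a single lost UDP datagram already triggers failover, so re-probing before acting may be worthwhile.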

If you are considering scripts, we recommend that you check out Google’s DNS over HTTPS RESTful JSON API [10]. To learn about OpenDNS’s related offering, we refer you to [15] and [9].
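As an illustration of scripting against such an interface, the sketch below queries Google Public DNS's JSON interface, assuming that this is the API referred to in [10]; `https://dns.google/resolve` is that interface's current endpoint, and the helper names are our own. Only URL building and parsing are exercised offline; `lookup` needs network access.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

DOH_JSON_ENDPOINT = "https://dns.google/resolve"  # Google Public DNS JSON interface


def doh_url(name: str, rtype: str = "A") -> str:
    """Build the query URL, e.g. ...?name=twitter.com&type=A."""
    return DOH_JSON_ENDPOINT + "?" + urllib.parse.urlencode({"name": name, "type": rtype})


def parse_answers(body: str) -> List[Tuple[str, int]]:
    """Return (address, TTL) pairs for A records in a JSON response body."""
    doc = json.loads(body)
    if doc.get("Status") != 0:  # 0 means NOERROR; anything else means no usable answer
        return []
    # The Answer list may also contain CNAME records (type 5); keep only A records (type 1).
    return [(a["data"], a["TTL"]) for a in doc.get("Answer", []) if a.get("type") == 1]


def lookup(name: str, timeout: float = 5.0) -> List[Tuple[str, int]]:
    """Fetch and parse the A records for `name` (requires network access)."""
    with urllib.request.urlopen(doh_url(name), timeout=timeout) as resp:
        return parse_answers(resp.read().decode("utf-8"))
```

Because the response carries each record's TTL, the same helper is also a convenient way to watch how the TTLs discussed in the next section change over time.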

3 Contribution 2 - Anti-DDoS Timer TTL2

Some relevant background on DNS timers is now in order. A summary of the DNS process is found in Fig. 4.

DNS server records include what is called a TTL (time to live), expressed in seconds. Let’s assume the TTL for the DNS record of Twitter’s main Web page is set to 1 h (3600 s). One might think that after 1 h, clients will contact their DNS server to get a new copy of the DNS record. However, end-user DNS clients almost never contact Twitter’s authoritative DNS servers directly (far less than 0.01% of the time). As shown in Fig. 4, clients contact intermediate caches, resolvers, and recursive servers to perform the symbolic name to IP address translation.

Fig. 4. Summary of DNS process

We’ll provide an example showing that the TTL, as actually applied by end clients, does not give us all the control we may wish to have (at least not in the way we might expect). We then give the reasoning showing that our TTL2 feature is superior.

Time 0, 10:00: Client 1’s Web application requests the Twitter.com name to IP address resolution. The answer is not found in client 1’s cache, so client 1 asks its DNS server, ISP 1’s recursive DNS server 2, for the answer. DNS server 2 gets the answer from the authoritative server and delivers it to client 1 at 10:00:10.

Time 1, 10:01: The authoritative server changes its DNS record for Twitter.com. This is where the limitation becomes a big problem for DNS DDoS attacks: after a DNS DDoS attack, the authoritative server may wish to change its DNS records, but it cannot fully control when all clients will time out their TTL, as the rest of this example shows.

Time 2, 10:30: Client 1’s Web application requests the Twitter.com resolution again. Client 1 finds the answer in its cache. However, client 1’s cache entry has been invalid since 10:01. This is a limitation of the current DNS design.

Time 3, 10:59: Client 2’s Web application requests the Twitter.com resolution. The answer is not found in client 2’s cache, and client 2 uses the same ISP recursive DNS server 2. So client 2 asks ISP 1’s recursive DNS server 2 for the answer. ISP 1’s recursive DNS server 2 still has the answer cached from client 1’s previous request, so it provides the cached answer to client 2 at 10:59:10. However, that entry has been invalid since 10:01.

Time 4, 11:58: Client 2’s Web application again requests the Twitter.com resolution. The answer is found in client 2’s cache (received at 10:59:10). Less than an hour has passed since client 2 received the answer, so the cache entry will be used, even though the authoritative DNS server invalidated it at 10:01. This is the intended behavior of the current design: clients may lose access (in this case) for up to one hour plus one hour times the number of hops, due to caching. There was one hop, so under the current DNS design those clients can lose access for up to two hours. The problem is that DNS zone administrators have no control over this delay, which we now solve.
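The timeline can be checked with a short Python calculation. It follows the example's assumption that each cache keeps a record for the full TTL it was given, counted from when that cache received it (a resolver that instead passes on only the remaining TTL would shorten client 2's window). The date is arbitrary and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

TTL = timedelta(hours=1)  # TTL assumed in the example


def cache_expiry(received_at: datetime, ttl: timedelta = TTL) -> datetime:
    """When a cache drops a record, if it keeps the full TTL from the moment it received it."""
    return received_at + ttl


def worst_case_staleness(ttl: timedelta, hops: int) -> timedelta:
    """Bound stated in the text: one TTL plus one TTL per intermediate caching hop."""
    return ttl * (1 + hops)


day = datetime(2000, 1, 1)  # arbitrary date; only the times of day matter
changed = day.replace(hour=10, minute=1)                  # Time 1: record changed
server2_got = day.replace(hour=10, minute=0, second=10)   # Time 0: server 2 caches the record
client2_got = day.replace(hour=10, minute=59, second=10)  # Time 3: client 2 served from that cache

# Server 2's copy is valid until 11:00:10, so it can still answer client 2 at 10:59.
assert cache_expiry(server2_got) > day.replace(hour=10, minute=59)

client2_expires = cache_expiry(client2_got)
print(client2_expires.time())             # 11:59:10, so the 11:58 lookup (Time 4) is stale
print(client2_expires - changed)          # 1:58:10 of stale answers after the change
print(worst_case_staleness(TTL, hops=1))  # 2:00:00
```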

As a contribution, we propose some small changes to the DNS protocol to mitigate this problem. We propose to continue to use the TTL as it is used today, and to add a new field, called TTL2. TTL works as it does today: it counts down from when the downstream DNS server or client received the record. Our proposed TTL2 is a timer that counts down from when the very first DNS server received the record from the authoritative server. So, TTL2 should be considered an absolute timer, based only on when the authoritative server sent the record. Let’s assume that Twitter sets the TTL to 5 min and sets TTL2 to 150% of the TTL, i.e., 7.5 min.

Here is how our proposed downstream DNS server handles the TTL2 field. Suppose the downstream DNS server receives a request and forwards the DNS record to client 1. At this time, client 1 receives the DNS record with TTL set to 5 min and TTL2 set to 7.5 min. Assume that 4 min later, client 2 asks for the same record. The recursive DNS server serves it from its cache, but changes TTL2 from 7.5 to 3.5 min (subtracting the four minutes that have passed). Client 2 therefore receives the DNS record with TTL set to 5 min and TTL2 set to 3.5 min, and knows it should perform a new DNS request after just 3.5 min. With our solution, it does not matter how many DNS servers (supporting TTL2) are between the original server and the end clients: clients can always ask for a new record once TTL2 expires, independent of the number of intermediate DNS servers. Even if the authoritative DNS server or another DNS server along the path does not support TTL2, any downstream DNS server that supports it could assign TTL2 itself, e.g., 150% of the TTL (or whatever default it is configured with). TTL2 mitigates DNS DDoS attacks by giving the DNS owner much stronger control over when caches expire.

Let’s now assume that the client does not support TTL2. As long as any upstream DNS resolver or server supports TTL2, it can still answer the client properly, with the original TTL adjusted so that it does not exceed the remaining TTL2.
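The TTL2 behavior described in the two paragraphs above can be sketched in Python as follows. This is a hypothetical model of a TTL2-aware recursive resolver cache, not an existing DNS implementation or API: times are plain seconds, a record is reduced to a single address, and the 150% default factor is the example value from the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedRecord:
    address: str
    ttl: float          # seconds; restarts at every hop, as today
    ttl2: float         # seconds of TTL2 left when this cache received the record
    received_at: float  # local time (seconds) at which this cache stored it

    def ttl2_remaining(self, now: float) -> float:
        # TTL2 keeps counting down across hops; it is never reset.
        return max(0.0, self.ttl2 - (now - self.received_at))

    def expired(self, now: float) -> bool:
        # The cache honours whichever of TTL and TTL2 runs out first.
        return now - self.received_at >= min(self.ttl, self.ttl2)


class Ttl2Cache:
    """Hypothetical TTL2-aware resolver cache (not an existing library)."""

    def __init__(self, default_factor: float = 1.5) -> None:
        self.default_factor = default_factor  # used when upstream sent no TTL2
        self.records: Dict[str, CachedRecord] = {}

    def store(self, name: str, address: str, ttl: float, now: float,
              ttl2: Optional[float] = None) -> None:
        if ttl2 is None:
            ttl2 = ttl * self.default_factor  # e.g. 150% of TTL
        self.records[name] = CachedRecord(address, ttl, ttl2, now)

    def answer(self, name: str, now: float, client_supports_ttl2: bool = True) -> Optional[dict]:
        rec = self.records.get(name)
        if rec is None or rec.expired(now):
            return None  # caller must refetch from upstream
        left = rec.ttl2_remaining(now)
        if client_supports_ttl2:
            return {"address": rec.address, "ttl": rec.ttl, "ttl2": left}
        # Legacy client: fold TTL2 into the ordinary TTL it already honours.
        return {"address": rec.address, "ttl": min(rec.ttl, left)}


resolver = Ttl2Cache()
# Twitter's example: TTL = 5 min, TTL2 = 7.5 min, received by the resolver at t = 0 s.
resolver.store("twitter.com", "192.0.2.10", ttl=300, now=0, ttl2=450)
print(resolver.answer("twitter.com", now=240))  # client 2, 4 min later: ttl 300, ttl2 210.0 s (3.5 min)
print(resolver.answer("twitter.com", now=240, client_supports_ttl2=False))  # legacy client: ttl 210.0 s
```

A second TTL2-aware cache that stores this answer with `ttl2=210` at t = 240 s expires it at t = 450 s, the original 7.5 min mark, which is why the number of intermediate servers does not matter in this model.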

4 Contribution 3 - Anti-DDoS Timer TTL3

However, DNS has another major limitation concerning DDoS attacks. Suppose all of Twitter’s customers have a cached DNS entry for Twitter’s main site, and Dyn’s DNS service then becomes unavailable for a few hours (longer than the TTL or TTL2), so Dyn can no longer serve DNS records to clients. Once the clients’ TTL and TTL2 have expired, they will not use their stale cache entries, and they cannot reach the DNS service, which is down. So even though the clients still hold stale DNS cache entries, they no longer have access to Twitter.

As another contribution, we propose a slight change to the DNS protocol to mitigate this specific DDoS problem: a new field, TTL3, which is also a timer. We call it the DNS service down field. It lets Twitter configure how long its DNS service might be offline (due to a DDoS attack or otherwise); during that time, clients and DNS servers are explicitly instructed to keep using their cache entries.

We now provide an example showing how our TTL3 feature gives the DNS zone administrator control that the TTL alone does not.


Assume that Twitter wants to allow clients who have accessed its site during the past week to continue to use their DNS cache entries for at least one more week. Twitter would then set TTL3 to two weeks. The reason for two weeks instead of one week is the following: a client might have accessed Twitter’s site six days ago, yet Twitter wants it to continue to use its cache for at least one more week. If TTL3 were set to one week, this client would stop using its cache after just one more day. With TTL3 set to two weeks, once the client’s cache entry times out while the DNS service is unavailable, the client checks its TTL3 timer and continues to use its cache entry for at least one more week (eight more days, in this example).

Let’s now assume that the client does not support TTL3. As long as any upstream DNS resolver or server supports TTL3, it can still answer the client with its stale cache entry. We found that OpenDNS has a similar proprietary solution, which only works on its servers and has other limitations [15].
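The TTL3 decision can be sketched in Python as below. The function names are hypothetical; we assume, as in the example above, that TTL3 counts from when the cache received the record, and `fetch` stands in for a real upstream query that returns None when the DNS service is unreachable. Because the same logic can run in an upstream resolver, it also covers clients that do not support TTL3.

```python
from datetime import datetime, timedelta
from typing import Callable, Optional


def stale_window_left(received_at: datetime, ttl3: timedelta, now: datetime) -> timedelta:
    """How much longer a TTL3-aware cache may keep serving a stale entry."""
    return max(timedelta(0), received_at + ttl3 - now)


def resolve_with_ttl3(cached: Optional[str], received_at: datetime, ttl: timedelta,
                      ttl3: timedelta, now: datetime,
                      fetch: Callable[[], Optional[str]]) -> Optional[str]:
    """Hypothetical TTL3 lookup for a client or resolver cache."""
    if cached is not None and now < received_at + ttl:
        return cached  # still fresh: normal DNS behaviour
    fresh = fetch()    # None models an unreachable DNS service
    if fresh is not None:
        return fresh
    if cached is not None and stale_window_left(received_at, ttl3, now) > timedelta(0):
        return cached  # DNS down: keep using the stale entry
    return None        # DNS down and TTL3 exhausted


now = datetime(2000, 1, 15)        # arbitrary reference time
visited = now - timedelta(days=6)  # the client last resolved the name six days ago
print(stale_window_left(visited, timedelta(weeks=1), now))  # 1 day, 0:00:00
print(stale_window_left(visited, timedelta(weeks=2), now))  # 8 days, 0:00:00
```

Here `ttl` could equally be the effective lifetime from TTL and TTL2 combined; TTL3 only matters once that lifetime has passed and the refresh fails.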

5 Related Work

Our literature review initially focused on the more recent DDoS papers, published in 2014 or later. This research topic is well developed. Here are the numbers of hits for a few DDoS searches: via Semantic Scholar, we had 3,352 hits (filtered to Computer Science only); via Scopus, 967 hits (filtered to Computer Science only); and via Web of Science, 658 hits (filtered to Computer Science only, which included three sub-categories). We reviewed papers with higher numbers of citations and papers that were more influential. This literature review helped us in this and other papers.

For this article, we then moved back and forth through the literature, reviewing each paper’s references and the papers that cited it. Some of the more relevant papers follow, with our comments.

5.1 Our Related Work

When we discuss the following works, we first present each work as presented by its original authors. We precede our own viewpoints with “Our Comments:”. A few of our own papers that are related to this article, and on which we build, follow:

  1. In [4], we (Booth and Andersson) introduced ways to strongly mitigate DDoS reflection attacks. Our technique was based on micro-segmentation and protocol/port firewall rule-sets. The destination IP/protocol/port was fixed. Our New Comments: This previous paper provided a general solution to DDoS. However, it did not take the specific DNS protocol into account. In this article, we carefully analyzed the DNS protocol and found better mitigation solutions specific to DNS DDoS attacks.

  2. In [5], we (Booth and Andersson) continued our previous work, but extended the defense to all Layer 3, 4, and 7 network attacks and discussed how to defend against each specific attack. One method is to hide the service behind unique URLs: servers are hidden behind secret URLs, and a client must be authenticated to obtain its secret URL. If there is an attack against any URL, we know exactly which client has leaked the information. Our New Comments: This article is similar, but focuses only on DNS service DDoS attacks. Therefore, we covered numerous specific DNS DDoS attacks and provided DNS-specific protocol change mitigation techniques.

5.2 Other Related Work

We now present other related work to show:

  1. How we have accepted previous knowledge

  2. How we build upon that knowledge

  3. Our (limited) comments on the related works

We also widened the scope of our literature review to include DDoS-related papers that did not specifically concern DNS DDoS attacks, in order to gain a stronger general theoretical background.

In [21], S. Venkatesan et al. replaced the active public IP addresses on an hourly basis to mitigate the effect of DDoS attacks. In [2], A. Aydeger et al. also used this moving target strategy, but took advantage of SDN. In [23], H. Wang et al. also propose concealing the address changes from clients, in order to determine which clients are malicious. Our Comments: We really like the idea of dynamically changing IP addresses, and these authors have provided great contributions. However, we have two concerns about the fixed hourly strategy. Since the addresses are changed on an hourly basis, either (1) there was no attack and the address was changed too early, or (2) there was an attack earlier and they waited too long before changing the address. So, we suggest changing the IP address only upon an attack, and immediately once the attack is detected. SDN simply provides a more efficient implementation than a non-SDN solution, which can at times be very helpful. If the reader is interested in SDN, we recommend reviewing J. Esch [7] to see how SDN can help. Also, in [11], S. Lim et al. provide a lot of specific information on how SDN can be used to detect and prevent DDoS attacks.

In [27], S. Yu et al. attempt to mitigate DDoS attacks via filtering inline traffic. Our Comments: Filtering inline is a good strategy when the DDoS attack bandwidth is low.

In [8], S. Fay et al. try to mitigate DDoS attacks by scaling up based on the attack traffic level. Our Comments: This is a good strategy to defeat DDoS attacks. However, we must point out that scaling up in the cloud in response to an attack often incurs significant charges, which can be extremely high depending on the strength of the attack. So, we recommend configuring limits, so that the organization is not surprised by a very expensive invoice. However, once those limits are reached, the DDoS attack becomes very successful.

In [11], A. Yaar et al. propose a path identification method to mitigate DDoS attacks. Our Comments: Path identifiers are an extremely interesting approach to defeating DDoS attacks. However, if one cannot stop Botnet clients from repeatedly sending massive numbers of requests that traverse paths marked with path identifiers, we do not see how this will help (we should disclose, however, that we are not yet experts in DDoS prevention via path identifiers).

In [11], S. Lim et al. propose an SDN solution to defeat DDoS attacks. In [17], R. Sahay et al. also propose DDoS mitigation via SDN. In [14], S. Mousavi et al. propose to use SDN and to try to prevent direct inline SDN attacks. Our Comments: SDN is only a technique for much more efficient and perhaps lower-cost networking. Therefore, SDN has, in general, the same exact DDoS issues to solve. Having said that, these authors have done good work in using SDN to mitigate DDoS attacks more efficiently.

In [22], B. Wang et al. propose a DDoS attack mitigation architecture that integrates highly programmable network monitoring, to enable attack detection, with a flexible control structure, to allow fast and specific attack reaction. Our Comments: For the general DDoS case, they have done good work. However, our work was more focused, on DNS DDoS attacks, which allowed us to develop a much simpler strategy to detect an attack. We simply treat the DNS service as a black box: from the outside, we run a script to see whether we receive answers to 100% of our DNS queries. If not, we assume an attack. Having said that, after we detect a loss of availability, we could of course reuse their contribution to try to determine the type and strength of the attack.

In [6], J. Czyz et al. propose ways to mitigate reflection DDoS attacks that abuse the NTP protocol. Our Comments: Reflection DDoS attacks are a major issue because they can generate huge bandwidth attacks. To mitigate these problems, organizations can host their DNS services at providers such as Dyn, Microsoft, Google, etc. In security, we call this a transfer of risk. However, as we have shown, even Dyn lacked the knowledge or capability to eliminate the vulnerabilities. So, Twitter thought that it had transferred the risk to Dyn, but Twitter actually retained some of the risk.

In [21], S. Venkatesan et al. propose a moving target DDoS defense. Our Comments: The idea that after an attack the services move or change IP addresses is a fantastic contribution. We really hope many more researchers explore this moving target defense strategy.

In [16], C. Rossow provides a great study of DDoS amplification attacks. Our Comments: These attacks can be mitigated by simply performing stateless filtering. Having said that, if the bandwidth is too high, filtering is no longer a valid solution.

In [18], M. Shtern et al. present an interesting study of low-and-slow DDoS attacks and of how to deal with this special case.

6 Conclusion

As a reminder, our contributions are the following:

  1. We propose best practices that can mitigate the effects of DNS DDoS attacks.

  2. Our proposed DNS TTL2 protocol enhancement gives DNS authoritative server administrators much more control over the actual TTL at DNS resolvers, recursive servers, and clients.

  3. Our proposed DNS TTL3 protocol enhancement gives DNS authoritative server administrators much better control over the actual DNS cache timeout at DNS resolvers, recursive DNS servers, and clients.

  4. Even when the end client does not support our TTL2/TTL3 timers, we show how intermediate DNS resolvers and recursive servers can automatically provide enhanced DNS timer functionality, with no changes to the end client.

As future work, we plan to implement our DNS backup service, and we recommend that others do the same. We are also seeking other researchers who wish to work with us on preventing DDoS attacks using serverless functions [26], such as Microsoft’s [13], and API gateways, such as Microsoft’s [12].