Keywords

1 Introduction

In recent years, organizations have made significant progress in many aspects at a global level, thanks to the accelerated growth and constant use of technology in their day-to-day life, that is, digital tools have become critical for organizations and your daily operation. For example, in 2019, more than 3.9 billion email accounts were registered, with an approximate of 559,000 emails sent and received every second during that year, and it is expected that by 2024 the figure will reach 4.3 billion email accounts [1]. However, dependence on digital information and tools brings with it a large number of challenges that organizations must face, since, in recent years, trends in digital crimes and cyber threats have increased considerably. Following the example mentioned above of the use of email, according to Internet Live Stats [2], approximately 2,962,154 emails were sent per second in 2020, with 67% of these being classified as spam. This tells us that digital systems are being pressured by adversaries and not only in quantity, but also in the complexity of the techniques used, such as phishing or Business Email Compromise (BEC), to be able to compromise the systems or steal corporate information [3], this being the main vector of digital fraud and malware distribution in recent years with a recorded loss of $ 1.25 trillion in 2018 alone.

Unfortunately, dependence on technology, as well as the large amount of critical and confidential information handled by organizations today, has become the target of cybercriminals, who have developed new ways to affect integrity, availability, and confidentiality of the organizations’ systems and data, using advanced techniques such as digital hijacking, better known as ransomware, which recorded an approximate loss of 10.1 billion euros in 2019 [4]. The main problem that organizations face is that advanced cyber threats bypass the protection controls implemented in organizations, such as firewalls and antivirus, which are commonly based on detection engines for static signatures or known threats. Therefore, cybersecurity specialists have chosen to develop new ways to prepare for and generate new strategies in the face of these advanced threats, using collaborative intelligence platforms to prevent and/or minimize the risk of a cyberattack before it happens. This practice is known as Cyber Threat Intelligence (CTI), which is defined as knowledge based on evidence, which includes context, mechanisms, indicators, implications, and practical advice, about an existing or emerging threat to information assets of organizations that can be used to inform decisions regarding the subject’s response to that threat [5].

1.1 Contributions

The main contribution of this research is based on the compilation and investigation of the schemes, tools, challenges, and sets of methodologies most used for the execution of a CTI program. This will allow organizations with smaller budgets to use the CTI platform and the methodologies described in this document to generate a defense strategy and stay secure. This chapter is organized as follows:

  • Section II raises the importance of the CTI field, as well as the challenges that generating intelligence represents for organizations.

  • Section III aims to present the research of the most relevant methodologies in the field of CTI during the last years.

  • Section IV’s contribution is the deployment of a TIP exposed to the internet based on deception techniques for the collection of cyber threat events for the generation of CTI.

  • Section V presents the results obtained from the investigation.

  • Section VI presents the conclusions of the research article (Fig. 1).

Fig. 1
figure 1

Contributions and research structure

2 Importance and Challenges of CTI

The CTI field has been widely accepted by different cybersecurity specialists from different industry sectors, such as the energy sector, which has been subject to different types of advanced cyberattacks, for which reason research has been carried out to develop CTI platforms that integrate the strategic, tactical, and operational levels of IT, aiming to provide a comprehensive response to the evolving threat landscape of energy systems [6]. On the other hand, Verizon confirmed 927 incidents of cyberattacks and about 207 cases of disclosure of confirmed unauthenticated data in the financial sector, for which various cyber defense strategies and techniques for the financial sector have been explored [7], as well as the integration of digital systems, Internet of Things (IoT), cloud computing paradigms to develop smart systems, smart homes and smart cities [8]. However, one of the main points that needs to be understood in this area is the difference between data, information, and intelligence to understand CTI. Therefore, we can define data as an individual element that contains information on either a system, an action or an executed process, that is, individual elements with a specific meaning. On the other hand, we can define the term threat as the possible danger that can be used to exploit an existing vulnerability with the intention of causing damage to systems, networks, or entire organizations (Table 1).

Table 1 Differences between data, information, and intelligence [9]

These three terms are sometimes used without much attention according to Dr. Christopher Ahl-berg [9], which explains that some threat feeds are advertised as intelligence when in reality they are just data packets. Organizations often embed threat data sources into their network only to find that they cannot process all the additional data, which only adds to the burden on analysts trying to classify threats. Rather, threat intelligence lightens that burden by helping analysts decide what to prioritize and what to ignore. Therefore, a different context can be given to the terms data, information, and intelligence when talking about cybersecurity (Table 2).

Table 2 Cybersecurity: Data, information and intelligence [9]

Therefore, we can say that CTI is the ability to acquire knowledge about a company, as well as its existing conditions and capabilities, in order to determine the possible actions of a malicious actor or threat when exploiting existing critical vulnerabilities. In addition, it uses multiple information security disciplines (threat intelligence, vulnerability management, security configuration management, incident response, etc.) and sets of tools to collect information about the network through monitoring and reporting to provide decision makers at all levels to prioritize resource allocation to perform risk mitigation.

2.1 CTI Challenges

The main challenge in executing a CTI strategy focuses on the quality of intelligence obtained through data analysis and lies mainly in being able to transform this large amount of information into something actionable that can be used to make decisions for top management [10], for example, being able to prioritize activities and assign budget or personnel based on the impacts that may be had based on the data collected. This requires not only time and/or effort from the IT or cybersecurity team, but also organization, collaboration between the different areas, and experience and resources assigned by senior management to carry out these investigations successfully. Although, the commercialization of products and services related to CTI from different developers and manufacturers has helped automate many of the tasks related to the extraction, detection, and update of threats, and especially the automation of responses to incidents [11].

In order to significantly interrupt or prevent the attack or intrusion of the adversary, it is necessary to have a defensive strategy and prepare the infrastructure to take into account the security needs, controls, processes, and resources that must be available. With knowledge of the adversary’s tactics and objectives, defenders must prepare their infrastructure to counter attacks in the widest range possible to cover all possible adversaries’ attack vectors. However, there is some research that mentions this as a great challenge, since it is mentioned that the CTI field lacks a mature methodology, which can affect the analysis of threats and adversaries by defenders [12]. Despite its challenges, CTI should not be ruled out yet, since it is an emerging field that has great potential and constant development by the cybersecurity analyst community, with the objective of applying defense strategies and controls against adversaries. seeking to engage an organization

3 CTI Methodologies

There are different methodologies in the field of CTI, which could be defined as a structure to think about how attackers operate, discover their methods and in which part of the general life cycle of the attack that event is occurring. More specifically, CTI methodologies allow us to be very prescriptive in how we attack a specific situation, that is, they allow us to focus attention on the appropriate areas to ensure monitoring and mitigation of existing or emerging threats. They also provide a common language to communicate internally and also externally regarding threat details, interrelationships between events, and correlations with external data sources. So, we can say that they allow us to connect and understand where something is happening and focus our resources within that small area rather than trying to take a reactive approach. In addition, it allows us to be much more focused on the specific area that needs our attention to assign specific resources to a threat or technique used by an adversary that could harm our organization. That way, defenders don’t waste time, effort, and resources working in areas that are not necessarily affected or perhaps not necessarily relevant to the incident they are facing or are about to face.

3.1 Cyber Kill Chain Model

The methodology developed by Lockheed Martin [13] is based on the military concept of “Kill Chain,” which consists of seven different areas that allow us to understand in which part of the process or attack chain a specific threat occurs, whether in reconnaissance, weaponry, delivery, etc. Therefore, if we understand where that threat is in the process, we can focus our resources and our efforts on mitigating it. And if we have a proper framework, we can understand what actions need to be taken in that area so that we can respond quickly to the opponent’s techniques [14]. For example, first comes recognition, where the adversary is looking for a weakness, that is, obtaining registration credentials or information that can be used for a phishing attack, where in this case, the weakness is the user who you receive the phishing added to the vulnerabilities of your device. The next thing is weaponization, which consists of creating the delivery of the threat, using an exploitation technique (exploit), such as a Backdoor typically. Delivery is the process of sending that payload to the victim, which could be a malicious email, it could be a USB memory left on a desk on the floor of a parking lot near the organization or even in the parking of this. Then we have vulnerabilities, in which the attacker performs the act of executing the code on the remote system and then proceeds to the phase of the actual installation of malware in that target. And that brings us to C&C, which will create a channel or persistence where the attacker can control the system remotely by sending instructions with some specific objective or purpose. At that point, the attacker already has control of perhaps more than one system; therefore, it is very important to monitor the different C&C channels detected in our network. And at the end of this, the desired actions are performed. So that’s the intended goal, whether it’s encrypting data, destroying it, exfiltering it, etc (Fig. 2).

Fig. 2
figure 2

Cyber Kill Chain Model phases [14, 15]

This methodology describes an intelligence-based, threat-focused approach to studying intrusions from the perspective of adversaries. Each phase of the intrusion is assigned to courses of action for detection, mitigation, and response and shows that there are certain phases that the adversary has to fulfill in order to complete its objective, and that in addition, they have been used in different fields of the industry to generate strategies based on taxonomies of specific threats such as Trojans that affect the financial sector [16] and various investigations are explored on how to be able to predict certain threats or adversaries’ behavior [17], which leads us to think that this model is based on the reconstruction of an attack in order to better understand it and mitigate it. Therefore, if we can understand where a specific action is in the process, we will know how to focus our efforts and resources to mitigate the threat before it reaches a final phase, such as a link in a chain is broken. the objective is to break the progress of the cyberattack in any of its phases [18].

3.2 The Diamond Model of Intrusion Analysis

Intrusion analysis has long been considered an art to be learned and practiced, rather than a science to be studied and refined. However, approaching it only as an art has long-delayed improvements and understanding, further slowing down the evolution of threat mitigation that relies on efficient, effective, and accurate analysis. Unknowingly, analysts have used the Diamante model for decades, but have lacked the full framework to understand, improve, and focus their efforts. This model describes the main capacities and characteristics of an intrusion event: adversary, capacity (techniques and tools used by an adversary), infrastructure, and victim, which are linked in a diamond-shaped diagram, in which the edges are used to represent relationships between features that can be exploited analytically to discover and further develop awareness of malicious activity [19], that is, the model describes that an adversary deploys a capacity on some infrastructure against a victim (Fig. 3).

Fig. 3
figure 3

The Diamond Model of intrusion analysis [19]

These activities, in turn, are known as events, which define a series of steps that the adversary must execute to achieve their objective. Likewise, the authors of the model describe 7 fundamental bases to understand the intrusion model process in the form of Axioms, where the objectives and/or needs of the adversaries are defined in order to meet their objectives (Table 3).

Table 3 Axioms of the Diamond Model [19]

The model establishes a formal method that applies scientific principles to the analysis of threats, in particular, those of measurement, testability, and repeatability, providing a comprehensive method of documentation, synthesis, and correlation of the activity [20]. For this reason, we can say that both the events and the processes or threads of each activity carried out by the attacker are necessary elements for a complete understanding of the malicious activity itself, since a more effective and strategic mitigation requires an understanding and context of the intrusions themselves, with the main objective of being able to expand the panorama of the threat and understand the phases and processes of this. This scientific approach and simplicity produce improvements in analytical effectiveness, efficiency, and precision. Ultimately, the model provides opportunities to integrate and generate real-time intelligence for network defense, automating correlation between events, confidently classifying events in adverse campaigns, and forecasting adverse operations while planning and playing mitigation strategies.

While the Kill Chain model provides information on the attackers’ operations, the Diamante model broadens the perspective and context of the attackers between each of the intrusion phases, that is, it together allows to have a broader view of the attacker and not just the technical indicators. In addition, the Diamante model provides a formal mathematical method for the analysis and grouping of effective graphs (e.g., grouping/ranking) to solve many kinds of analytical problems. [19]. The Diamond Model identifies how and why an attack occurs, since we can see that an attacker attacks a victim based on two main attributes, called infrastructure and capacity, precisely capturing and organizing the fundamental concepts that underpin everything that does intrusion analysis, as well as how intrusion analysis is synthesized and used for network defense and mitigation [21]. However, its greatest contribution is that it ultimately applies scientific rigor and principles of measurement, testability, and respectability to the domain, allowing intrusion analysis to be more effective, efficient, and accurate, leading to faster, more effective, and efficient mitigation to defeat adversaries.

3.3 MITRE ATT&CK

MITRE is a nonprofit organization that works in the public interest in federal, state, and local governments, as well as in industry and academia. Likewise, it contributes to different areas such as artificial intelligence, intuitive data science, quantum information science, health informatics, space security, political and economic experience, reliable autonomy, threat exchange, and cyber resilience [22]. Likewise, this organization developed a methodology called ATT&CK, which is used as a basis for the development of specific threat models in the private sector, in the government, and in the community of cybersecurity products and services that contain known behaviors of attackers or adversaries, better known as advanced persistent threats (APT), which are organized attack groups with a large amount of resources and advanced techniques that allow them to carry out complex attacks [23], and in turn, research related to this methodology in critical infrastructures has been presented [24]. For example, they often stay on an organization’s network and obtain information before proceeding to the next phases of their attack. ATT&CK focuses on how external adversaries engage and operate within computer information networks (Table 4).

Table 4 MITRE ATT&CK core components [25]

This model originated from a project that aimed to document and categorize postengagement adversary tactics, techniques, and procedures (TTP) against Microsoft Windows systems to improve the detection of malicious behavior. [25]. Since then, it has grown to include other operating systems, as well as other areas such as mobile devices, cloud-based systems, and industrial control systems. The basis of ATT&CK is the set of techniques and sub-techniques that represent actions that opponents can perform to achieve objectives. These objectives are represented by the categories of tactics to which the techniques and sub-techniques belong. The relationship between tactics, techniques, and sub-techniques can be visualized in the ATT&CK Matrix (Fig. 4).

Fig. 4
figure 4

MITRE ATT&CK Enterprise Matrix [26]

MITRE ATT&CK offers cybersecurity analysts a common language to structure, compare, and analyze CTI for any organization that wants to move toward an informed defense on existing or emerging threats, as it includes information on malware, tools, TTP, business techniques, behavior, and other indicators associated with threats.

3.4 MITRE Shield

Based on the ATT&CK methodology, a new methodology called MITRE Shield was developed, which is based on the implementation of the concept of active cybernetic defense, which aims to carry out cyberdefensive actions until being able to deceive the adversary, having an active participation with him to study and learn more about the tactics and techniques used to generate a CTI and prepare for future threats. The Shield matrix consists of the following components (Table 5).

Table 5 MITRE Shield core components [27]

Within Shield, there is also a matrix of tactics that denotes what the defender is trying to achieve through columns and techniques, which describe how the defense achieves the tactics. However, those terms have been made to fit the domain of Active Cyber Defense. These tactics within MITRE Shield must be taken into account as a strategy of each planned active defense operation in order to respond to any intrusion from an adversary or threat. Given this, it is necessary to develop the techniques described within Shield to implement security controls in an operational environment. MITRE found that a single technique can be compatible with several different tactics, and for any tactic, there are multiple techniques that can be used. In addition, it has a section where ATT&CK tactics and techniques are mapped so that defenders can have the applicable active defense information, including the presented opportunity space, the active defense technique that will be implemented, and the use case for that implementation (Fig. 5).

Fig. 5
figure 5

The Shield Matrix [28]

The combination of ATT&CK and Shield methodologies can help defenders deepen their understanding of the opponent’s behavior and engagements and suggest ways the defender can implement an active defense strategy. [28]. Therefore, the goal of Shield is that defenders can take advantage of the tactics and techniques of this methodology to better create, implement and operate their active defense solutions, showing how the defensive side of Shield to align with ATT&CK, with the main objective that organizations and their defenders can take advantage of both strategies to maximize their defensive efforts and be able to generate a more solid strategy that allows them to learn from adversaries while defending against them.

4 Threat Intelligence Platform (TIP)

The techniques and results that CTI has provided in recent years have gained a great deal of attention in cybersecurity communities as a way to forecast potential threats and reduce attack detection time in terms of supply chain processes. death, as well as the use of open-source intelligence information (OSINT) is becoming a fundamental approach to gain awareness about cybersecurity threats; however, to process the large amount of information is usually one of the most important challenges in this area for defenders. Therefore, cybersecurity researchers and analysts have opted for the use of intelligence tools and platforms to be able to manage all threat analysis tasks in an orderly manner. A threat intelligence platform (TIP) helps organizations aggregate, correlate, and analyze threat data from multiple sources in real time and is comprised of several core features that enable organizations to implement an intelligence-driven approach to security to support defensive actions. The main purpose is to help organizations understand risks and protect against a variety of threat types that are likely to affect their environments collected from different intelligence sources (Table 6).

Table 6 Threat Intelligence Platforms stages

A TIP automatically analyzes the content of the threat indicators and the relationships between them to enable the production of useful, relevant, and timely threat intelligence from the data collector. This analysis allows the identification of tactics, techniques, and procedures of the threat actors or TTPs. These platforms can be a cloud or on-premises system to facilitate threat data management from a variety of existing security tools, such as SIEM, Firewall, API, Terminal Management Software, or Intrusion Prevention System (IPS). Investigations and papers have been presented where some threat intelligence exchange platforms are evaluated [29] [30], where users from the IT community and other communities in general can share their incident information in a trusted environment, such as Malware Information Sharing Platform (MISP) [31], which allows us to share, store, and correlate indicators of compromise of targeted attacks, threat intelligence, financial fraud information, and vulnerability information [32]. The malware information exchange platform can be accessed from different interfaces, such as a web interface (for analysts or incident handlers) or via a ReST API (for systems that push and pull IOCs). The inherent goal of MISP is to be a robust platform that ensures smooth operation when revealing, maturing, and exploiting threat information.

4.1 T-Pot: Platform Based on Deception Techniques

Currently, there are tools and/or platforms whose main objective is to be able to simulate a productive environment or service to maintain active communication with the adversary, commonly known as honeypots, which allows defenders to collect more information about the TTPs to be able to generate CTI. There are different honeypot platforms on the market, especially by some cybersecurity manufacturers, such as TrapX Security [33], Attivo Networks [34] and Fortinet [35], that help to detect external and internal threats, and studies reveal that two-thirds of the incidents found were from external actors, while the remaining third involved internal actors [36]. There are some investigations related to platforms that have multi-honeypot systems, which have focused on the investigation and relevance of the data collected through attacks. [37]. T-Pot is based on a vanilla ISO image of Ubuntu 14.04.02, which is heavily dependent on docker and docker-compose. The goal proposed by T-Pot is to create a system whose full TCP network range, as well as some important UDP services, acts as decoys for adversaries, in order to forward all incoming attack traffic to the honeypot sensors. more suitable for interacting and processing said information [38] (Fig. 6).

Fig. 6
figure 6

T-Pot arquitecture [38]

The project provides multiple coupled honeypots and a large number of preinstalled research tools, such as ELK, which provides a search engine and analytics, as well as an interface for data visualization. [39], Spiderfoot, which allows to perform Footprinting tasks, by acting as an aggregator of a multitude of sources, on which it allows a simple and fast search by having its own web interface [40], Cyberchef—a web application for data encryption, encoding, compression, and analysis [41]; Suricata—capable of real-time intrusion detection (IDS); online intrusion prevention (IPS); network security monitoring (NSM); and offline pcap processing [42], among others (Table 7).

Table 7 T-Pot honeypots [38]

5 Experimentation and Results

In this investigation, we deployed T-Pot in the public cloud of Amazon Web Services (AWS), with the main objective of exposing the available honeypots of the T-Pot platform to the internet, in order to be able to collect real threats that are mostly automated in cyberspace, which will allow us to collect some of the adversaries’ TTPs and analyze the information to model it with the methodologies described in the document through case studies, with the purpose of being able to identify how each of the models facing a real cyberattack and how they complement each other. The following is the data collected by the platform for 20 days and the characteristics and resources used to deploy the T-Pot platform on AWS (Fig. 7 and Table 8).

Fig. 7
figure 7

Detected attacks on T-Pot honeypots at AWS after 20 days

Table 8 AWS-arranged resources to deploy T-Pot

5.1 Case Study

In this section, we analyze the data collected by the honeypots exposed to the internet to analyze the Cyber Kill Chain, ATT&CK, and Shield and Diamond Models, in order to understand some of the phases and TTP’s carried out by the adversaries to achieve their objectives within the infrastructure deployed.

The case study that arises involves a victim who deploys a service with SSH, or Secure Shell, makes it very easy to access remote servers while keeping them safe from hackers. The adversaries used mass scanning techniques to detect known vulnerabilities in the exposed systems. Additionally, adversaries performed email creation using techniques such as social engineering and phishing based on previous GeoIP scans to decide the language to use. Additionally, adversaries generated an exploit to run a webshell as malware from a previously infected website. Different attempts to access the platforms by protocols such as SSH and RDP from IPs classified as malicious were detected. Some of these IPs managed to access the systems through the use of techniques such as Brute Force and dictionary attacks. Once the adversary entered the system, command executions were detected for downloading files through a shellcode to a URL classified as malicious by malware content. After downloading, command executions were detected for downloading malware files from an IP classified as malicious. Once the malicious domains/IPs and the malware downloaded by the adversary were detected, we proceeded to review open intelligence sources such as Virus Total to search for the IPs or domains related to the installed malware. The results of the investigations carried out with the methodologies described in the document are presented below. The Cyber Kill Chain Model helps us understand each of the phases of the attack chain carried out by the adversaries and guides us to be able to start searching for the relevant events in each of the phases (Table 9).

Table 9 Case study results with Cyber Kill Chain Model

With the information collected by the honeypot platform and the Cyber Kill Chain Model, it is possible to delve much more into the tactics and techniques used by the adversaries to understand their objectives and operations through the use of MITRE ATT&CK (Table 10).

Table 10 Case study results with MITRE ATT&CK

Once the TTPs of the adversaries are modeled in ATT&CK, we can use MITRE Shield to make a map of the TTPs found in ATT&CK used by the adversaries to carry out an active defense strategy. For example, initial access to the exposed systems was done through a remote exposed system through a valid account. Given this, Shield gives us the possibility of generating an active defense strategy with the opportunity to validate if the adversary already has credentials from one or more accounts valid for any network system by using a decoy or honeypot to collect more information on the TTPs used by the adversary as we have been doing during this investigation (Table 11).

Table 11 Case study results with MITRE Shield

With Diamond Model, we use a victim-centered approach (honeypot) in order to reveal the connection between the adversary’s related elements, such as infrastructure and capabilities organized in events (Table 12 and Fig. 8).

Table 12 Case study results with Diamond Model
Fig. 8
figure 8

Case study results with Diamond Model

6 Conclusions

The conventional incident response process is generally initiated after a successful exploitation phase is executed, causing defenders to be inherently disadvantaged and their response too late. In this research, we have demonstrated the effective use of the most used methodologies in the CTI field for the detection, mitigation, and generation of CTI during the analysis and investigation process of real TTPs collected using a platform with deception techniques; it was found that these methodologies complement each other. In which, the Cyber Kill Chain Model helps us understand the phases of an adversary’s attack; therefore, we obtain an initial guide for defenders on where to focus their resources, and in turn, it helps to feed the MITRE ATT&CK model, which gives us the opportunity to list and understand what TTPs are used by adversaries in each of their attack phases and provide the necessary information for the generation of an active defense strategy with MITRE Shield. Therefore, Shield is responsible for guiding us on how to take advantage of the adversaries’ TTPs to actively defend ourselves and collect more information in order to actively generate CTI to complement ATT&CK and the Diamond Model. Finally, the Diamond Model powered by Kill Chain, ATT&CK, and Shield, is very useful to list the capabilities and infrastructure of the adversaries, with the main objective of understanding and documenting their attack methodology to generate CTI, which can be alienated by Shield constantly.

Understanding how a cyberattack can benefit a security team in our organization can benefit the cybersecurity community by encouraging defenders to collect data on adversaries to increase the knowledge base of TTPs, facilitating the selection of defense measures. If defenders implement countermeasures faster than their opponents evolve, they maintain a tactical advantage.