Keywords

1 Introduction

The Internet creates an opportunity for offenders to carry out different types of attacks. Public service transparency and anonymous internet connectivity are improving those attacks. Cyber-attacks cost US$ 114 billion per year, according to the Symantec cybercrime study released in April 2012. If indeed the time lost by companies attempting to recover from cyber-attacks is counted, the overall cost of cyber-attacks will exceed US$385 billion [1]. Cyber-attack victims are also rising significantly. Symantec’s study, in which 20,000 people in 24 countries were interviewed, revealed that 69% were the victim of a cyber-attack in their lives. Symantec also estimated that 14 people are the targets of cyber-attacks every second [1]. How do cyber threats thrive? It’s because cyber-attacks are less costly, easier and involve lesser risk than physical attacks. Cyber attackers only need a few resources besides a computer and an Internet connection. They are unaffected by geography and space. They are difficult to track and prosecute because of the open and anonymous nature of the Internet. Considering that attacks on IT infrastructure are very enticing, it is expected that the number and severity of cyber-attacks will continue to increase.

Cybersecurity is built to protect the digital assets of the company from ever-growing cyber-attacks. Cybersecurity can be achieved by the introduction of adequate security controls that include many security features, such as cybercrime deterrence, prevention and identification. Cybersecurity’s primary purpose is to provide data and services with confidentiality, integrity and availability (CIA), sometimes referred to as the CIA triad (Fig. 1).

Fig. 1
figure 1

The CIA triad

Attackers begin by gathering all types of data for a successful attack. Reconnaissance is one of the very first steps that attackers use to obtain information. Advanced port scanning techniques are accessible via free open-source software that provides a wealth of network information. This useful information includes port status, device types and variants, operating systems and service packages. Attackers are likely to succeed in combining information about vulnerable facilities with reconnaissance data. Social engineering is a technique of deception that uses human error to gain sensitive information, access or useful Intel. These “human hackers” scams in cyber-crimes appear to attract unsuspecting users to reveal data, to spread malware or to give them access to restricted systems. Attacks can occur online, in-person and through other interactions. Social engineering schemes are based on how people think and behave. Once an attacker knows what motivates a user’s actions, it can successfully trick and exploit the user. Unpatched software is terminology for defining a computer code with known vulnerabilities. Once security bugs occur in the computer code, software developers write add-ons to the code known as “patches” to fix the security gaps in the code. Running unpatched software is dangerous because hackers are well aware of vulnerabilities until they appear. Advanced Persistent Threats (APT’s) typically involve multiple stages, including network hacking, the avoidance of tracking, the development of a strategy to target and monitor company data, the collections and exfiltration of confidential company data, to decide when the required information is most available. Advanced persistent threats have resulted in many massive, expensive privacy violations and are proven to be unnoticeable by traditional security assessments. Advanced persistent attacks are now becoming more and more common as cybercriminals try advanced techniques to accomplish their objectives. There have been a number of qualitative efforts to model a cyber-attack. As highlighted in Fig. 2, the most prominent of them is Lockheed Martin Model, which introduces the notion of a cyber-attack kill chain, describing sophisticated attacks, such as Advanced Persistent Threats (APTs), in seven steps: reconnaissance, weaponization, delivery, exploitation, installation, command and control and act on the objective. Rutherford and White improved the model by adding intelligence gathering. Researchers at FireEye and Command Five introduced their own models to describe and model a cyber-attack.

Fig. 2
figure 2

Models for describing cyber-attacks: a Lockheed Martin cyber kill chain; b the Rutherford and White variant of the cyber kill chain model; c Fire Eye model; and d the Command Five Pty Ltd attack model [2]

2 Background

Confidentiality is securing resources from unwanted or illegal access, integrity is ensuring that data remain unchanged, availability ensures that data are available without any drop in access, non-repudiation ensures no one can deny another person any service and authentication ensures people with proper access is only allowed access. These features are the cornerstone of any security system and are better known as the five pillars of information security. We can consider a cyber-attack as an assault that breaches any of the pillars. The attacker, thus, tries to breach one or more of the pillars or to attack a targeted system as successfully as possible. The objective might be a host, a network or a major infrastructure of information systems [3].

2.1 Anatomy of a Typical Cyber-Attack

Reconnaissance. The first step where the attacker begins by learning and understanding the details of an organization and network and to identify online behavior of key people of the organization, typically the system architect and network administrator.

Information Enumeration. Enumeration is characterized as the process of listing of usernames, hostnames and IP addresses, network resources and devices used, the operating systems and network topology. The attacker creates an active connection and conducts guided queries to get to know the target and collect more information. These collected data will contain all the security vulnerabilities or weaknesses present in network and attempts to exploit them during the exploitation point.

Network and System Penetration. The attacker is now trying to penetrate the network and the infrastructures once they have access to the network. This can occur in several ways. The attacker can record keystrokes using a malware called key-logger to capture important information entered by a user like a password, or the attacker can plant a worm and leave. The worm will be able to spot vulnerable systems and start spreading across the entire network. They expect the attack to continue as long as the attacker wants to until detected. It could be months or years. A modern vector for attackers is cloud-based data storage. With large-scale adoption of this technology, there has been an increased lack of security features [4]. Basic Role-Based-Encryption schemes have not been implemented. This gives an attacker more data to play with.

Data Exfiltration. Once the attacker has penetrated the system, he has to transfer the data to his system. Any unauthorized transition of data is data exfiltration. Whether data are stolen from a printer or a thumb drive, it’s a rather real threat to corporations. Attacks can be performed manually by an authorized person who has access to enterprise systems or by malicious external actors who have access to them. Exfiltration of data can be a big business for cybercriminals and a big problem for any organization that is attacked.

Attack Sanitation. This is the last process of an attack where an attacker cleans up. All evidence of the attackers’ presence is removed from the network and host systems to look like nothing has happened.

2.2 Cyber Kill Chain

Lockheed Martin, an American defense company initially authored the Cyber Kill-Chain Framework for Cyber Intrusion Identification and Prevention as part of its information and intelligence-driven defense and security model. The model defines what adversaries need to accomplish in pursuit of that goal by targeting the network, exfiltration of data and maintaining persistence within the organization. This model essentially implies that stopping adversaries at any step in the kill chain breaks the sequence of attack. Attackers must make full headway through all steps of the kill chain to complete a successful attack. The outcome is an inescapable phase where all attacks last, and thus stopping them at this stage massively improves the potential of busting any cyber-attack. When stopped in the early stages of the chain, the speed of success will be enhanced. Moreover, any threat, and therefore its traces, could be an opportunity to learn more about the attackers, their methodology and use their resistance to improve security systems. A far more realistic configuration of defenses results in a greater understanding of enemies and their directions [5]. The Cyber-Kill Chain proposes that the attackers will take six specific steps to try their attacks: Reconnaissance: This stage may where attacker employs the process of target selection, the classification of organizational information, organizational rules and laws, information on the organization’s technological preference and topography, social network behavior of the employees or mailing lists available on a public repository. Weaponization and Packaging: In this stage, the attacker employs several forms: internet-facing application and thin client penetration, licensed and publicly available resources or modified software (licensed by the organization or modified to suit their purpose), enterprise software bugs like an XML parser or a word processor spreadsheets and presentation software which makes use of macros, target the most common sites used by the employees and set up phishing sites. Usually, they are equipped with information on duplicitous or accurate objectives. Delivery: The payload is supplied either by the victim (for example the browsing of a malicious site that leads to a malware deployment). The supply of ransom software, or the opening of fake PDF documents, or an intrusion (SQL or network access exploit) is triggered. Delivery: After delivery to the user, computer or device, the malicious payload will compromise the targeted asset and therefore the environment will become an initial foothold. This will be generally achieved by exploiting a security flaw that already has a patch. Although zero-days are exploited, in most cases, adversaries do not need to perform them as most of the targets will already be affected.

3 Tool Selection

The goal of this research is to assess the features of free/open source reconnaissance tools developed by independent researchers. Open-source software is a type of computer software in which source code is released under a license in which the copyright holder grants users the rights to use, study, change, and distribute the software to anyone and for any purpose [3]. The following research process was adopted for the assessment:

  • The selection of the most used reconnaissance tools.

  • The assessment and evaluation criteria are defined solely based on the ease of use, availability of a graphical interface and the number of features it provides.

  • The execution of the tool.

  • The analysis of the output generated by each tool.

3.1 Reconnaissance Frameworks: A Comparison and Study of Open-Source Automated Recon Tools

We searched the GitHub repository, one of the biggest sources for open source projects, with the goal of selecting the five most common free/open source tools to be evaluated. We used keywords such as “Automated”, “Reconnaissance” and “Cyber”. We further filtered the search results using the following filters:

  • The last update or commit is after 2016.

  • OS independent, i.e. compatible with both Windows and Linux

  • Makes use of OSINT for data collection.

In total, there were 264 public repositories on GitHub matching the above keywords. Using the inclusion filters and sorting by the most downloaded, we shortlisted the following five tools:

  • Osmedeus

  • AutoRecon

  • AutoReconR

  • InstaRecon

  • InstaRecon.

For the assessment, only key functionalities of each framework are addressed, and no additional plug-ins are considered.

3.2 Evaluation Criteria

To evaluate the tools, we consider the following features.

  • IP Recon: These tools are used to perform reconnaissance and gather information about the network the host is connected to.

  • Social Engineering: This technique involves looking for reasoning to gain sensitive information or text by stimulating an individual mind or sense of social norms.

  • Threat Intel: Cyber threat intelligence is information about risks and threat actors helping to prevent dangerous cyberspace incidents. Information outlets for cyber threats include open source information, social networking intelligence, human intelligence, digital intelligence, or deep and dark web intelligence.

  • Forensic Tools: Tools used to investigate and gather evidence from various sources like an image file, audio file or even a sound file.

3.3 Evaluation Results

  • Osmedeus is an open-source tool developed by a security researcher called @j3ssie. It is developed using a python client and Django API server. It can be used to scan the vulnerabilities of the target network and server. It features an impressive collection of tools such as web technology detection, IP discovery, and machine discovery. You can split the workspace to store all the scan data and logging information. Finally, it facilitates continuous scanning and allows you to access the scan report from the command line [6]. It is also loaded with web-based technology tracking, IP discovery, and system discovery backtrack features. The framework will split the workspace to store all scan data and log information. Eventually, it can support continuous scanning and allow you to view the scan report from the command line (Fig. 3).

    Fig. 3
    figure 3

    Osemedus [7]

  • AutoRecon, developed by Tib3rius is an automated recon tool, which is capable of running multiple threads. This means that it can perform simultaneous network scan at a time. The tool is highly flexible and is written in python. It uses modules from standalone recon apps [7]. The initial vector for the tool is a network scan which includes both port scan as well. The results from these scans are used for performing further enumeration of the network. The IP addresses of the targets can be provided as a list. Since multi-threading is enabled, multiple hosts can be scanned simultaneously. Port scanning can be customized to perform scans on well-known ports, top 1000 ports or all ports, suggests recommended commands to be performed after the initial scan. Proper directories are maintained. Logs and scan results are saved in a structural format. Vulnerable components are highlighted to help the user to target the system (Fig. 4).

    Fig. 4
    figure 4

    AutoRecon [8]

  • AutoReconR developed by Stefan Voemel is an improvement on AutoRecon with various added features. It attempts to automate parts of the identification and listing process of the network. Respective findings are described and summarized in an automatic report. As such, AutoReconR can make it easier to identify potential weaknesses in target systems more quickly and to find an entry point [8]. The tool is designed to run in the background while the tester can focus on other tasks in parallel. For example, in the laboratory environments provided by Offensive Security or during security tests such as OSCP, the tester can start writing exploits while AutoReconR scans the remaining targets and performs automatic service listings. Some of the features are: the possibility of specifying targets either via a command line or an input file. Define scanning and service listing profiles in custom configuration files. Automatically store scan results in a folder structure defined by service name. Launch additional actions based on identified services and service patterns. Summarize the results in a corresponding (at present very basic) PDF study combine software runtime and scan depth with the aid of complexity levels.

  • InstaRecon developed by Luis Teixeira is an automated digital reconnaissance used to get the initial footprint of the system. It helps the attacker to obtain information about the target device or the network. This knowledge can be used to strike the device. That is why it can be called a Pre-Attack because all the information is checked to ensure a complete and effective resolution of the attack [9]. The features of InstaRecon are IP information and domain name information using whois and dig, Obtain IP address of domain name by using whatsmyip, Search engine lookup using Shodan, Domain Name Lookup using MX records, subdomain enumeration by using Google website reverse lookup (Fig. 5).

    Fig. 5
    figure 5

    InstaRecon [9]

From Table 1, it is evident that there is a significant gap in the current open source tools. With the ever-changing threat landscape, the need for integrating social engineering and forensic tools along with a threat intelligence model has become paramount.

Table 1 Comparison between the tools

4 Literature Survey/Related Work

To develop a reconnaissance tool, we need to analyze and characterize the relationship between an attacker and a victim. To systematically understand and characterize the behavior of cyber-attack reconnaissance behavior, Richard B. Garcia-LeBron (2018) proposes a systematic framework to study an attacker’s behavior [10]. The system is composed of three abstraction levels: macroscopic, mesoscopic and microscopic as seen in Fig. 6.

Fig. 6
figure 6

The three levels of abstraction proposed by Richard B. Garcia-Lebron [11]

On the macroscopic level, the graph of the time series of attacks in victim relations of the perception and understanding of cyber-attacker activities is studied. This graph-theoretical abstraction allows various information to be obtained by the use of a set of existing capabilities. To define the graphs of the attacker/victim relationship, the features that display these graphs were used which implements the comparisons that match different times windows between these graphs [12]. Additionally, the notions of efficient characteristics (i.e. features that may or may not characterize the creation of bipartite attacker-victim graphs) and robust characteristics (i.e. characteristics that are efficient in time resolutions) are described [13]. Finally, a dataset that was collected from a honeypot and used to conduct a case study to investigate the time resolutions that need to be considered to characterize as comprehensive as possible the evolution of the attacker-victim bipartite graphs [11]. Therefore to represent the behavior, only a small number of time resolutions have to be considered. At the mesoscopic level, the study of clustering cyber attackers using their identification behaviors, modeled as time series of reconnaissance activities is initiated [14]. A two-resolution methodology is used to characterize cyber-attack reconnaissance behaviors. At the microscopic level, a system to classify and organize temporal-spatial behaviors is proposed. The system provides a novel concept for the identification actions of the attackers dubbed: attacker identification trajectories. It also provides the cluster of attacker reconnaissance trajectories called a visual representation: Attacker reconnaissance trajectory hierarchy trees. A target reconnaissance model was proposed by Hung T. Nguyen et al. (2016) using probabilistic graphs on Facebook. Michael Glassman et al. (2012) present the idea of OSINT as an essential component in understanding the resolution of human problems in the twenty-first century. OSINT is a result of the emerging ties of human intelligence, which occur off the emergence of the Internet and World Wide Web expansionism of everyday life in various ways. Abel Yeboah-Ofori et al. (2016) propose a framework to thoroughly analyze and review existing study results on cyber intelligence and open-source information analysis and recognize all the risks and weaknesses of social networks online for mitigating purposes.

5 Design Considerations for a New Open Source Tool

Reconnaissance can take place vertically and horizontally. First, an intruder needs to obtain all possible subnet information from hosts and ports. This allows for the development of a targeted network map. Second, the attacker tries to evaluate the possible vulnerabilities of particular services. Cyber Reconnaissance is critical in today’s world due to an increase in cyber warfare involving actions by a political entity or nation-states to attack and attempt to damage other nations’ computers or information networks through, for example, computer viruses or denial of service attacks. Cyber reconnaissance is designed to level the playing field by providing organizations with a high-resolution picture of their cyber landscape from an adversary’s perspective. While cybersecurity uses the implementation of technical means to protect the critical infrastructure or assets of an organization, it is also important that data collected during the reconnaissance phase is converted into intelligence, also known as cyber threat intelligence (CTI). The CTI is based on the collection of intelligence using open-source intelligence (OSINT), social media intelligence (SOCMINT), human intelligence (HUMINT), or insight from the deep and dark web.

5.1 Techniques Involved

Network scanning and port scanning. Processes to learn about a network’s structure and actions—not hostile intrinsically, but often used by malicious individuals to conduct recon before trying to penetrate a network. Network Scan detects and maps all active hosts to their IP addresses within a network. Port Scan scanning refers to the process of sending packets to various host ports and analyzing responses to gain information about their operating system and running services. The first step in aggressive network scanning is often the host discovery process of deciding which devices on a network are down and up. For host discovery, two protocols are used most regularly: Address Resolution Protocol (ARP) scan and various forms of ICMP (Internet Control Message Protocol) scan. Because all individual ARP requests are required to map IP addresses to MAC addresses on a local subnet, ARP requests can be forwarded to multiple IP addresses on a local area network (LAN) based on the ARP response. Port scanning may be used to identify service uses in particular ports once the hosts that are available on the network have been scanned by network scanning. Typically, port scanning attempts to classify port as one of three names, Open: the destination answers with a packet that indicates the port that listens to it, meaning that any service used (usually TCP or UDP) for scanning is also in use. Closed: The request packet has received the destination but replies that there is no port listening service. Filtered: the port may be open, but the packet was filtered out of the firewall and excluded, so no response was received.

OS Fingerprinting. To detect OS, networks, utilities, and program names and numbers, attackers can send custom packets to the target. These packets will receive the victim’s response in the form of a digital signature. This signature is one of the keys to recognize which applications, protocols and OS is running the target system. If the attackers have the right details, they will know your scenario and will be able to build a complete infrastructure map of all your services and potential network topology to fine-tune their digital assault. OS Fingerprinting is a method for determining which operating system the remote computer is running. OS Fingerprinting is often used for cyber reconnaissance because most exploitable bugs are unique to the operating system.

6 Conclusion and Future Work

The importance of reconnaissance has been discussed in-depth in the paper. Global cyber threat landscape mandates organizations to perform penetration test exercises often to secure their systems from the ever-evolving attacker landscape. From the cyber-kill chain, it is obvious that reconnaissance is the first step in any attack. Therefore, the need for an automated tool is necessary to streamline the process. The tools mentioned in the paper are the first step toward that and are not the finished product. We can further improve this software by including the following modules: Forensic Module: To obtain metadata from video and audio files, documents, etc. Automated Workflows: To directly provide input for reverse DNS lookup from the Nmap scan result page. CVE details: To obtain public vulnerabilities of the technology used Since OSINT is an ocean of data, there have to be two key components to any software, Fluidity and Rigor. Therefore, the new proposed system would be designed based on the experimental results of the attacker victim relationship. This helps us to model our software on current attacker trends. Along with this, a study on alert correlation systems will help us in reducing the number of alerts generated by an IDS or IPS when obtaining dataset to generate an attacker model [15]. A major issue when taking datasets generated from these systems is the false positive. Negating this will help us better in identifying actual threats. This will help in timely insight, which will enable to organize a better defense from this ever-changing threat.