Keywords

1 Introduction

The primary reason behind any web attack is insufficient security or design flaws in the web application, thereby, allowing hackers to enter into the system and steal confidential data such as username, passwords, transaction details, session Ids/token, and database-related information. According to a survey, PHP (78.5%) and JavaScript (94.7%) are the most commonly used server-side and client-side programming languages, respectively [1].

A cyber-criminal would first analyse the website for a vulnerability using online tools such as vulnerability scanners or botnets. Vulnerabilities such as virus-infected administrator’s system, weak password, out of date security patches, browser plugins, and permissive coding practices may give a chance to the hacker to enter the system and steal the data. Moreover, a recent survey was conducted on the usage of world-wide web in which the authors depicted that most common vulnerabilities are found at application level, which is layer 7 according to the OSI network model [2], and 93% of data breaches occur due to human error while designing and developing the web application [3]. For an instance, neglect of data validation could give a clear path to attacker to deceive the web server into running unsafe commands [4]. In 2019, an EDGESCAN organisation generated a vulnerability statistics report in which it claimed that 19% of all vulnerabilities were associated with layer 7, and the rest 81% of vulnerabilities were linked with network layer. SQL injection was significant at 5.55%, XSS at 14.69%, and other injection attacks such as OS, CRLF, and JavaScript were significant at 8.18%. According to 2020 Cyber security report, approximately 93% of the files, which were shared through web in India, were found to be malicious [5], and 64% of the organisations in India are believed to be impacted by the information disclosure vulnerability. Despite the extensive research being done in developing new tools and protocols to detect, prevent, and mitigate the web attacks, still numerous websites are non-immune to the web attacks. This clearly depicts the need to detect the software-related vulnerability in order to prevent web security exploitation by the hacker. Following are the top 10 vulnerabilities in 2020 [6] according to OWASP (Open Web Application Security Project) [7] (Table 1).

Table 1 Top 10 common vulnerabilities and exploits (CVE)

Figure 1 demonstrates an attacker interrupting normal communication between client and server, getting successful in bypassing the system, and modifying the crucial data. The attack is viable due to the nature of HTTP and HTTPS protocol. In case of HTTPS, two connections are built up: one SSL connection is created between the client and the hacker, second SSL connection is created between the hacker and web server where the cybercriminal splits the TCP connection between the client and web server. In 2019, SSL labs claimed that 1.2% of HTTPS servers are still vulnerable to attack. For example, DROWN attack vulnerability can be easily carried out on websites using HTTPS and SSL/TLS services [8]. The misconfiguration and inappropriate default setting allow the attacker to decrypt TLS connection between client and server.

Fig. 1
figure 1

Scenario of attack on a web application

2 Literature Survey

2.1 SQL Injectıon Vulnerabılıty Attack and Preventıon

This web attack was first discovered in 1998 by a security researcher, Jeff Forristal, and is still at the top list since 2003. Antivirus programs are ineffective at handling SQLI attack. Any company that operates its website on SQL database is prone to this attack if it does not have sufficient input validations in its web forms. As a result, anyone can insert malicious SQL commands into the input string of a web form, web cookie, or a page request (browser), and can retrieve, modify, and delete the data present in the database putting data integrity, authentication, authorisation, and confidentiality at risk. In 2012, a researcher claimed that 97% of data breaches occur due to SQLi. Surprisingly, health industry is the most attacked industry and with maximum number of data breaches due to SQLi attack. The attack is done on data-driven applications as the behaviour of these applications generally depends upon the data input. Therefore, this attack is quite easy to execute. However, lack of awareness and implementation of security protocols by the organisations leads to data leaks resulting from SQLI attack. The attack could be carried out with one of the following objectives [9]:

  • to identify injectable parameters

  • to extract/retrieve data

  • to add/modify data

  • to perform DOS

  • to evade detection

  • to bypass authentication

  • to execute remote commands

  • to perform privilege escalation.

SQL injection attack has two stages [10]:

  1. i.

    Injection attack stage 1

  2. ii.

    Injection attack stage 2.

Stage 1 is known as reconnaissance. At this stage, the attacker passes random unexpected values to the arguments and observes how application responds. Stage 2 is known as actual attack. At this stage, the attacker provides carefully-crafted input values that will be interpreted as part of SQL commands rather than merely data. The database then executes the SQL commands as altered by the attacker. SQLi can be categorised into the following four types as illustrated in Fig. 2:

Fig. 2
figure 2

Types of SQL injection vulnerability attacks [9]

2.1.1 CLASSIC SQL Injection Attack

It occurs generally when the user input is not filtered and escaped correctly in the web form. Owing to this, attacker sends batch commands to the database server, and in return receives specific output based on the input statements [11]. As a result, he can control application’s entire database as illegitimate admin user. The input may include SELECT commands, which can download entire database including users’ personal information such as unique identification, phone number or credit/debit card numbers. The attacker could also use INCLUDE or UPDATE commands to create new user accounts or alter the existing ones.

For instance, following Fig. 3, integer value ‘1’ is passed to the web submission form (DVWA), where the security level was set as low, which returned the first name and surname for user id ‘1’. Similarly, it will return the values for user id 2 or 3. This means that website is vulnerable to SQLI attack. Moreover, the URL also depicts the id number. The URL is:

Fig. 3
figure 3

Extracting the values of argument ‘id’

http://localhost/dvwa/vulnerabilities/sqli/?id=1&Submit=Submit#

If the id number is changed in URL itself, the results will be displayed for that particular id. For example, if we change the id from 1 to 2 and press enter, the database will return the first name and surname for id 2.

We can also extract all first names and surnames by passing the string %’ or ‘0’=‘0 in the input form. This will return the information for all five records present in the database (Fig. 4).

Fig. 4
figure 4

Extracting the values for all records

Classic SQLI can be implemented by one of the following techniques [1, 9, 12]:

  1. i.

    Tautologies

  2. ii.

    UNION SQL Queries

  3. iii.

    Piggy-backed SQL queries

  4. iv.

    Alternate encoding

  5. v.

    Illogical Queries

  6. vi.

    Stored Procedures.

USING TAUTOLOGY

Tautology means an expression or a logical statement, which is always true. This means that attacker can use such SQL statements, which will always be true, and hence results in executing the queries at the database server. The attack is carried with one of the following objectives:

  • to extract/retrieve data

  • to bypass authentication

  • to identify injectable arguments.

The attack is implemented by using conditional expressions using OR operator.

USING UNION COMMAND

UNION SQL attack is operated especially to determine the database version or the information about the number of rows and columns. Just like the former uses OR operator, this attack uses UNION operator where UNION is used to merge two SELECT statements.

The attack is carried out with one of the following objectives:

  • to extract/retrieve data

  • to bypass authentication.

By default, most of the databases such as MySQL stores the database-related information like name with version, number of tuples, etc. and can display the database version while generating error messages for incorrect queries [13]. Such misconfiguration could allow attackers to compromise database for future attacks. For example, the following UNION SQL query will end up in extracting the information of all records with database version as the last one.

%’ or 0=0 union select null, version() #

Attacker could also use the union statement to extract the hostname (Fig. 5) using the following command:

Fig. 5
figure 5

Using union query to retrieve the hostname

‘union select null, @@hostname#

The attacker could also pass union queries to extract the details of information schema, location of database files, and even read files located on the remote system. Therefore, in order to avoid such type of attacks, it is always recommended to use prepared statements in conjunction with GET statement.

USING PIGGY-BACKED STATEMENTS

As the name implies, piggy-backed statement means to add one statement at the end of another statement to make it a single command by using semicolon. The database that would be vulnerable to such attack could allow multiple statements to be treated as a single statement if and only if the former statement comes out to be valid and true. For example, following Fig. 6 demonstrates inserting second query using semicolon after the first true query.

Fig. 6
figure 6

Sample example of piggybacked SQLI attack

This vulnerability can be misused by the attacker to execute remote commands such as dropping the tables or to shut-down the entire system using command SHUTDOWN;–. As a result, he can effortlessly implement ‘Denial of service (DoS)’ attack.

USING ALTERNATE ENCODINGS

An attacker may use special encoding techniques in order to prevent detection of malicious code by the software [14]. For instance, he may use ghost characters to bypass the filters as Web or FTP server fails to detect the extra characters. These characters are the extra characters, which do not have any effect on the API layer, hence, will automatically get stripped off from input string. Following is the list of ‘improper handling of encoding’ vulnerabilities [15] that could allow attacker to do further damage:

  1. i.

    Using char() of ASCII [1]

  2. ii.

    Using ghost characters

  3. iii.

    Passing special characters using % in the URL (URL encoding due to insufficient filtering on the URL)

  4. iv.

    Repetition of encoding or Double encoding

  5. v.

    Encoding IP/web address

  6. vi.

    Adding NULL bytes in the input

  7. vii.

    Using Unicode/UTF-8 encoding technique

  8. viii.

    Using NULL terminator by post-fixing the data to avoid filter.

This type of attack is difficult to implement as the developer needs to check the validation and proper sanitisation for all of the above-mentioned encodings including URLs, IP address, and input.

USING ILLOGICAL QUERIES

As the name suggests, a threat actor can pass incorrect SQL statements in order to collect critical information about the database just from the error or log messages, which could display errors related to syntax of code, logical error, or type mismatch error. This could lead to exposure of injectable arguments/parameters to the attacker. Due to this reason, this type of attack is also sometimes referred to as error-based injection [1]. For example, in the following Fig. 7, after inserting incorrect query, server return name of database in the error message.

Fig. 7
figure 7

Sample example of error-based SQLI attack

USING STORED PROCEDURES

Stored procedures are the compound statements that contain a set of multiple SQL statements as a group, which further gets saved in a data dictionary of RDBMS [16]. This group is given a specific unique name. This provides flexibility to call these set of statements from multiple programs using a single name (just like we call functions). As a result, they provide various benefits such as handling runtime errors, data validation, provide mechanism for access control, etc. There is a common myth among most of the developers that stored procedures are always safe. However, they are completely not, if dynamic SQL inside the stored procedure is not handled properly. What I mean to say is, if the dynamic query used inside the stored procedures is created by concatenating the user input values instead of formal parameters, then it is at high risk. For example, the following first statement illustrates the bad example of dynamic SQL.

sb.command.Append(“Name=”+inputName.value+, “,”);

Good example:

sb.command.Append(“Name=@Name”);

2.1.2 BLIND SQL Injection Attack

As the name suggests, in this type of attack, the results of SQL injection are hidden from the attacker, therefore, it becomes quite difficult for the attacker to extract data in one attempt [9, 11]. The attacker performs number of attempts before reaching to final successful request. It is also known as inference injection attack. For example, let us take the same example that we took in CLASSIC SQL injection attack. If we pass a true value, i.e. 1, then instead of getting the actual results such as value for first name and last name, we will get out mentioned in Fig. 8.

Fig. 8
figure 8

Sample example of SQL blind injection attack

Blind SQLI attack has two types:

  1. i.

    Time-based blind SQL injection attack

  2. ii.

    Content-based SQL injection attack.

Sometimes, it is also referred to as conditional response as the attacker sends a malicious code with some conditions to the server and checks the response. In most cases, the queries are crafted as Boolean values, i.e. true or false. If the response happens to be true, the injectable parameters can be detected, else attacker can try another malicious set. It could also be the response rate of HTTP request [17]. In the first type, the attacker sets a time limit in the code and analyses the response received from the web server. Whereas, in the content-type, it is done depending upon the content generated by the query. In order to check whether the website is vulnerable to BLIND SQLI attack (stage 1: reconnaissance), attacker could use online vulnerability tool such as SQLMap (could be even used by researchers for teaching and learning process).

2.1.3 DBMS Specific SQL Injection Attack

This type of attack is done using two techniques: DB fingerprinting and DB mapping. DB fingerprinting means executing illogical queries in order to extract database-related information such as analysing error messages, inserting query to know DB version, ascertaining table names, information schema, and number of rows and columns, etc.. The type of error message generated by the database will vary depending upon the type of back-end database used. For example, the following error messages tell about the incorrect number of columns, so attacker can easily modify the input to obtain the correct result (Fig. 9).

Fig. 9
figure 9

Sample error message

The attacker could also construct a query to retrieve the exact version of database using inference testing as discussed earlier. By mapping the database using online tools, hackers can easily access the application’s data layer.

2.1.4 COMPOUNDED SQL İnjection Attack

Compounded SQLI attack means that the attacker can use another attack in conjunction with SQLI attack. For instance, the following attacks can be executed by the attacker after performing SQLI attack.

  1. i.

    XSS attack,

  2. ii.

    insufficient authentication attack,

  3. iii.

    DDoS attack, and

  4. iv.

    DNS hijacking attack.

Finally, SQLI attack can be prevented only by considering the above-mentioned vulnerabilities while developing a website as firewalls, antivirus programs, and SSL are ineffective in preventing such attacks. Therefore, developer must consider the following points in order to avoid SQL injection attack on web application:

  1. i.

    Using prepare() function (prepared statements)

  2. ii.

    Including user input validation statements such as removing the extra special character or string such as –,;, ‘, SHUTDOWN, DROP, or DELETE (from web URL, web form or cookie) while receiving input from the user as such characters could be used to bypass the web filters.

  3. iii.

    Treating received input from the user as a string instead of a command.

  4. iv.

    Always keep in check of permission scheme of database, and doing regular checks of all system files for any modification to the system.

  5. v.

    Configuring the database error messages so that critical information do not get exposed to someone who do not have access rights.

2.2 Broken Authentıcatıon and Sessıon Management Vulnerabılıty Attack and Preventıon

Since HTTP is a stateless protocol, some kind of protocol is required that can keep track of the activities of a particular user using the website and is passed as an argument in the GET or POST query. This is achieved by providing session ID or token to a user when he visits any website [18]. This session id is used to identify that user during the information exchange (HTTP request and HTTP response). The time span of the sessions is kept as short as possible for security purposes. If sessions are not handled properly during the website development, the attacker could use or steal any logged-in user’s session id and can obtain the potential privileges. Session ID is usually generated as a random long string so that it becomes difficult for the user to guess the next one [18].

Generally, sessions can be maintained either on server side or on client side depending upon the web application’s requirements. While storing session on server is a highly complex process and may result in an increase in latency time, the users’ credentials are generally claimed to be much safer as the users’ data are not exposed, and the cookie size is kept small. On the other hand, due to the complex nature of server-side session management, most developers prefer storing the session inside the authentication cookie on the client side. However, this common technique generally poses higher risks if the data integrity, authenticity, and confidentiality are not guaranteed.

The website could be vulnerable to session fixation attacks if the sessions and authentication are not handled properly while designing or developing the website. The following Fig. 10 demonstrates the session fixation attack.

Fig. 10
figure 10

Sample scenario of session fixation attack

Any website is vulnerable to broken authentication and session fixation attack if the following points are not considered:

  1. i.

    Permits the use of weak password

  2. ii.

    Permits the multiple failed login attempts

  3. iii.

    Session ID is visible in the URL

  4. iv.

    Multi-factor authentication is missing

  5. v.

    Session ID is not refreshed during the activity

  6. vi.

    Session id still persists in memory even when the user has logged out, especially when user sign-in using SSO (Single Sign ON that means signing-in by trusting the third party such as login through Google or Facebook)

  7. vii.

    Using unencrypted communication channel for sending password or session ids/tokens.

  8. viii.

    Using weak account recovery algorithms.

Md. Maruf Hassan et al. [15] executed a case study on weak authentication and session management vulnerability in Bangladesh and found out that a total of 267 public (72%) and private (28%) organisations were vulnerable to this attack, i.e. approximately 56% websites among their sample. The intruder can obtain the session id of targeted user using online tools such as Google dork, eat my cookie, or cookie manager.

Following points must be considered to prevent this type of attack:

  1. i.

    Ensure strong password by adding validation checks.

  2. ii.

    Limit the failed login attempts and alert the concerned user and the admin regarding the brute-force attempt.

  3. iii.

    Ensure the shortest life span of each session ID.

  4. iv.

    Sessions must be shared over the encrypted channels.

  5. v.

    Use strong hashing and salting algorithm to store passwords in the database such as SHA256 [19].

  6. vi.

    Better use POST method instead of GET method as it is more secure because it never expose the user’s data either through web URL or server logs.

  7. vii.

    Use strong hashing function to encrypt password.

  8. viii.

    It is necessary to test all the platforms (such as Google or Facebook) used where sessions are being shared through URL.

  9. ix.

    Include the option of asking old password while the processing request of changing the password.

  10. x.

    Ensure the data are not cached in the browser, i.e. back button must not show the previous result in case of banking websites.

  11. xi.

    Web application firewalls can be used to validate the sessions.

  12. xii.

    Use SSL certificate.

2.3 Cross-Site Scrıptıng Attack and Preventıon

Two-third of all web apps are found to be vulnerable to cross-site scripting attack, also known as XSS attack. The term was first introduced in November 1999 when a group of security researchers heard about the injection of malicious scripts and image tags into the HTML pages of some dynamic websites. After 2 months, in February 2000, they published a report demonstrating the XSS vulnerability. For your knowledge, it was named XSS instead of its short form CSS only to avoid name ambiguity for Cascading Style sheets (CSS). The malicious script is executed on the client side, usually on user’s browser. As a result, the communication between the user and vulnerable website is compromised. If a dynamic website is vulnerable to SQLI attack or broken authentication and session management attack, then there is a higher risk that the website will be vulnerable to XSS attack as well. Just like SQLI attack is targeted for SQL-based applications by passing SQL queries, XSS attack is targeted to HTML pages where the intruder injects the malicious code into HTML web pages. It is the most popular technique used by cyber-criminals to steal sessions or to attack a company’s entire social network. According to Wikipedia, the most prominent websites such as Facebook, Twitter, and YouTube had also suffered from this attack in the past. The following steps explain the general scenario of attack:

Step 1: Attacker finds a vulnerable website, which allows the injection of untrusted malicious code into its webpage. For example, inserting false advertisements on the web page, displaying false content on the website.

Step 2: Attacker inserts malicious client-side JavaScript/ActiveX/VBScript/HTML code on the web application. This code is either sent to the victim’s web browser or the web server depending upon the type of XSS attack.

Step 3: User clicks on the malicious link either while visiting the website or accessing service from the web server.

Step 4: Attacker has access to private credentials or details of the victim through a vulnerable website by bypassing the SOP (Same Origin Policy).

M. Liu et al. [18] conducted a survey on XSS attacks on their local vulnerable test website. The paper illustrated the various risks associated with XSS vulnerability. The risks include phishing attacks, exploitation of user’s session id or token id, DoS and DDoS attacks, stealing client’s web browser screenshot, and risk of XSS worms on click malicious link.

Germán E. Rodríguez et al. [16] conducted a survey on mitigation of XSS attacks and discovered that 40% of attacks are implemented using XSS technique. The following table illustrates the use of most common attacks with their percentage according to [16] (Table 2).

Table 2 Percentage of occurrence of attacks

XSS attacks can be categorised into the following types [18]:

  1. i.

    Server-side vulnerability

    • Persistent XSS

    • Non-Persistent XSS

  2. ii.

    Client-side vulnerability

    • DOM-based XSS

Persistent XSS is also known as Stored XSS. In this attack, the malicious script is added directly on the website (especially forms, blogs or comment sections), therefore, it is also known as direct/second-order/type-1/stored XSS attack as the script gets stored on the web server. So, whenever user visits that website, the malicious code gets executed, and hence, it is said to be more harmful than other two types.

If website is vulnerable to this attack, then attacker can execute phishing attack and key-logger attack. In former attack, the credentials of the user are compromised. In later, attacker is able to capture the keystrokes of the user for the vulnerable web page. Attacker can also construct a script to take screenshot of the web page by injecting that script on the website. As a result, personal data or bank balance of victim can be easily exploited. Bind-XSS is one of the types of Persistent XSS attacks. The following steps explain the scenario of stored XSS attack:

Step 1: Attacker posts a message containing malicious script on a form/blog.

Step 2: The script gets stored in the server’s database.

Step 3: Victim visits the webpage with malicious content and requests a service.

Step 4: The website displays the content containing the malicious code.

Step 5: Attacker gets complete control over the victim’s system.

Non-Persistent XSS attack is also referred as type-II or reflected XSS attack where reflected means that the results of malicious query are visible to the attacker. The attacker crafts the malicious link in such a way that it appears to be from a trusted source. When the victim clicks on the malicious link, web server sends a response including the malicious script to the user. For example, the following figure demonstrates reflected XSS attack when a script query: <script> alert(“HELLO”) </script> is entered in the name box, it is reflected in the URL (Fig. 11).

Fig. 11
figure 11

Example of reflected XSS vulnerability

DOM-based/type-0 XSS attack is a client-side vulnerability attack where DOM is abbreviated as Document Object Model. DOM is an object model for every HTML webpage. It includes the properties of the HTML page, which allows it to change its’ content. So, when DOM-based XSS attack is executed, the JavaScript code is embedded in the client-side program, which allows it to modify the content of DOM and can also change the values of objects’ properties, while the user visits the page without malicious link. Since the malicious code executes on victim’s computer, server-side detection algorithm would fail to detect this type of attack.

Following points should be considered to prevent XSS attacks:

  1. i.

    Execution of JavaScript code can be prevented by setting the cookie to HTTPOnly flag.

  2. ii.

    Invalid requests can be redirected.

  3. iii.

    Simultaneous multiple logins to the same account must be detected and session must be declared invalid.

  4. iv.

    Escaping schemes could be used.

  5. v.

    Appropriate response header must be used.

  6. vi.

    Detection should be done at both client side and server side.

3 Conclusıon and Future Scope

In conclusion, with the rise of internet technology, it has become crucial to protect one’s data and privacy, where the hackers could use numerous online tools to catch just one vulnerability in website, and if found can put the users’ integrity, confidentiality, and authenticity at risk. SQL injection, XSS attack, and broke authentication attacks could put users’ privacy at risk. It is suggested to use the artificial intelligence-based detection method to detect a web vulnerability.

Since it has become extremely crucial to protect online resources from being exposed to hackers as new attacks are being carried out every day by hackers, web attacks could be defeated by integrating the detection and prevention techniques using machine learning algorithms.