In this chapter, we will discuss and walk through, at a broad level, a scenario in which an organization adopts a multi-cloud environment, integrates on-premises systems with the cloud, and deploys services leveraging Infrastructure-as-a-Service, Platform-as-a-Service, or Software-as-a-Service. This is a big-picture scenario, and sooner or later most enterprises reach this state to some extent. Monitoring at this stage can become complicated and overwhelming, owing to the variety of tools, methods, and levels of knowledge available. Figure 3-1 represents the scenario.

Figure 3-1 Multi-cloud and hybrid scenario

The Environment

There are different ways of looking at monitoring and ensuring that all requirements are taken care of. Monitoring can be scoped to individual applications, to the entire environment, or to a specific purpose such as troubleshooting an ongoing issue.

In our scenario, we have an enterprise with on-premises applications and resources. They have subscriptions from two different cloud vendors: Microsoft Azure and another leading cloud provider. Both cloud environments are integrated with on-premises, and good network connectivity exists between them. There are applications hosted on Azure virtual machines and on Azure PaaS offerings such as Azure App Service. On the other cloud service provider, the organization has a few virtual machines hosting other workloads. Dedicated VPN connectivity exists across the locations/sites to ensure secure access to cloud and on-premises resources. They have also adopted an identity integrated between cloud and on-premises, and all users use their on-premises identity to access cloud resources. They have recently migrated to Office 365, an end-to-end SaaS solution, to replace their business productivity applications and use Exchange Online, SharePoint Online, and the Office suite that is part of the O365 subscription. To keep the overall scenario simple, let us consider this to be a single-subscription, single-tenant, one-region scenario for the cloud environments. They currently have SCOM on-premises for monitoring their workloads.

We will not lay down any specific business, technical, functional, or nonfunctional requirements for now. We will explore the various monitoring options and services for our environment first, and then in upcoming chapters we will try to achieve more.

Monitoring the Platform

In order to monitor the platform, let us first break it into components. Table 3-1 is a good start.

Table 3-1 Platforms in Our Scenario

It is important to identify whether we will require a single solution or multiple solutions to monitor all the platforms. In our scenario, we can simply start with Azure Monitor. Note that the other cloud provider may have similar toolsets; it is up to us to guide the customer on which one may be the most useful and best fit, and it is up to the customer to decide which one to go with. Now let us look at the tools that we can use (Table 3-2).

Table 3-2 Monitoring the Platform – Tools
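
Most of these platform signals can be explored straight from the Azure portal, but they can also be pulled programmatically. As a minimal sketch (not the only approach), the following Python snippet uses the azure-identity and azure-monitor-query packages to summarize activity log operations from a Log Analytics workspace; the workspace ID is a placeholder, and it assumes the activity log has already been routed to that workspace.

from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-id>"  # placeholder

client = LogsQueryClient(DefaultAzureCredential())

# Summarize platform (activity log) operations by resource provider
# over the last 24 hours.
query = """
AzureActivity
| summarize Operations = count() by ResourceProvider
| order by Operations desc
"""

response = client.query_workspace(WORKSPACE_ID, query, timespan=timedelta(days=1))
for table in response.tables:
    for row in table.rows:
        print(row)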

Health Monitoring

There are two types of health monitoring that we can address. Let’s look at Table 3-3.

Table 3-3 Monitoring the Services Health

Microsoft Azure provides health monitoring for both scenarios. Let us now understand those tools so we can take the necessary steps (Table 3-4).

Table 3-4 Monitoring the Services Health – Tools
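
Service Health events also surface in the activity log under the ServiceHealth category, so once the activity log is routed to a Log Analytics workspace they can be queried like any other record. The following is a hedged sketch along those lines; the workspace ID is a placeholder, and the exact column and category names should be verified against your workspace schema.

from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-id>"  # placeholder

# List recent service health records from the activity log.
query = """
AzureActivity
| where CategoryValue == "ServiceHealth"
| project TimeGenerated, Level, Properties
| order by TimeGenerated desc
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(WORKSPACE_ID, query, timespan=timedelta(days=7))
for row in response.tables[0].rows:
    print(row)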

Figure 3-2 is a quick look at what resource health and health alerts in Azure Monitor look like.

Figure 3-2 Resource health and health alerts

Usage Monitoring

Usage monitoring should be able to help with items shown in Table 3-5.

Table 3-5 Usage Monitoring

Let us explore how these items can be addressed and with which tools. In most cases, usage is closely related to billing. To ensure that you do not exceed your spending limit, it is advisable to keep track of usage by resource and of how much is spent over a period of time. You will then be able to identify whether there is any specific trend of usage in a particular department or environment. For example, in our scenario, if the organization is not migrating much workload to Azure, it may not see any steep increase in Azure costs; spending may stay flat or grow gradually with new user or customer additions. Spending may increase suddenly if existing workloads are being migrated to Azure or if new projects are hosted only on Azure. Similarly, in a development environment spending may not be very high compared to production, yet it may show sudden cost spikes due to development sprint cycles, test scenarios, and so on. Other scenarios may contradict these predictions, but in all cases it is a good idea to identify where the money is going.

Let us look at some of the tools that can show us where (Table 3-6).

Table 3-6 Usage Monitoring – Tools
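
In addition to the portal experiences listed in Table 3-6, usage and cost records are reachable from code. The sketch below is a rough illustration using the azure-mgmt-consumption Python package; the subscription ID is a placeholder, the returned fields vary by offer type, and Azure Cost Management in the portal remains the simpler route for most teams.

from azure.identity import DefaultAzureCredential
from azure.mgmt.consumption import ConsumptionManagementClient

SUBSCRIPTION_ID = "<subscription-id>"  # placeholder
scope = f"/subscriptions/{SUBSCRIPTION_ID}"

client = ConsumptionManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# List usage detail records for the current billing period.
# Field names differ between legacy and modern offer types,
# so the records are simply dumped as dictionaries here.
for i, usage in enumerate(client.usage_details.list(scope=scope)):
    print(usage.as_dict())
    if i >= 9:  # only sample the first few records
        break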

Performance Monitoring

Performance monitoring is key to ensuring that the system is functioning optimally. We will discuss this in more detail in upcoming chapters and explore how various metrics and values can make a difference in your design decisions. For this scenario, we have various resources, so let us look at how and what we can monitor (Table 3-7).

Table 3-7 Performance Monitoring

Now let us look at our tool to monitor performance data in Table 3-8.

Table 3-8 Performance Monitoring – Tools
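
Beyond the tools in Table 3-8, platform metrics can also be retrieved programmatically, for example to feed a custom report. The following is a minimal sketch using the azure-monitor-query Python package to read the "Percentage CPU" metric of one of our Azure virtual machines; the resource ID is a placeholder.

from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient, MetricAggregationType

# Placeholder resource ID of an Azure VM in our scenario.
VM_RESOURCE_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<rg>"
    "/providers/Microsoft.Compute/virtualMachines/<vm-name>"
)

client = MetricsQueryClient(DefaultAzureCredential())

# Average CPU over the last hour at 5-minute granularity.
response = client.query_resource(
    VM_RESOURCE_ID,
    metric_names=["Percentage CPU"],
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=5),
    aggregations=[MetricAggregationType.AVERAGE],
)

for metric in response.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(point.timestamp, point.average)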

There may be different counters for on-premises systems, but they follow the same established practice of metric-based performance monitoring. Since we have SCOM in our case, it can easily be used to monitor performance counters for on-premises systems. With the help of the Azure Monitor integration with SCOM, we can gain better insight and analytics across the various performance counters.

Availability Monitoring

Availability monitoring and health monitoring are very closely related. Health monitoring helps to identify the current health state of the system, whereas availability monitoring is about the statistics of uptime for a system and its components. It includes granular identification of failures in crucial individual components of a system, so that corrective action can be taken to avoid future failures and maximize availability. Availability monitoring depends on lower-level factors and on identifying the critical segments of the system that might cause overall availability issues. The recorded statistical values of all such components can be aggregated to arrive at the availability of the system (Table 3-9).

Table 3-9 Availability Monitoring

Now let us investigate what parameters we need to record to calculate such values and monitor a system’s availability. Table 3-10 tries to explain some of the critical items.

Table 3-10 Availability Monitoring - Tools
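
As a concrete illustration of aggregating availability statistics, the sketch below derives an approximate availability percentage per machine from the Heartbeat table in Log Analytics, assuming the agent reports roughly one heartbeat per minute. The workspace ID is a placeholder, and the 60-heartbeats-per-hour assumption should be adjusted to your agent configuration.

from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-id>"  # placeholder

# The agent sends a heartbeat roughly once a minute, so ~60 heartbeats
# per hour is treated as 100% availability for that hour.
query = """
Heartbeat
| summarize HeartbeatsPerHour = count() by Computer, bin(TimeGenerated, 1h)
| extend AvailabilityPct = min_of(HeartbeatsPerHour, 60) * 100.0 / 60
| summarize AvgAvailabilityPct = avg(AvailabilityPct) by Computer
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(WORKSPACE_ID, query, timespan=timedelta(days=1))
for row in response.tables[0].rows:
    print(row)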

Security Monitoring

One of the most important areas of monitoring is security monitoring. Let us quickly look through the various aspects of security monitoring for our scenario (Table 3-11).

Table 3-11 Security Monitoring – Tools
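
To give one concrete example of the kind of security signal we care about, the hedged sketch below counts failed Windows sign-ins (event ID 4625) from the SecurityEvent table; it assumes security events are being collected into the Log Analytics workspace, whose ID is a placeholder here.

from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-id>"  # placeholder

# Event ID 4625 corresponds to failed Windows sign-in attempts.
query = """
SecurityEvent
| where EventID == 4625
| summarize FailedLogons = count() by Account, Computer
| order by FailedLogons desc
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(WORKSPACE_ID, query, timespan=timedelta(days=1))
for row in response.tables[0].rows:
    print(row)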

Auditing Compliance and SLA Monitoring

This part can be very specific to the requirement, and hence we would like you to review Chapter 2 – Scenario 1 and Scenario 2 for these two items.

Paint the Picture

At a very high level, the monitoring components for our scenario will look like Figure 3-7.

Figure 3-7 Monitoring components high-level view

Hybrid Cloud Monitoring

In hybrid cloud architectures, it is important to ensure that the existing toolsets being used by organizations are put to best use while leveraging the capabilities offered by the cloud. An IT team shouldn’t have to hop between tools to get a picture of the overall environment’s health. Azure Monitor provides multiple integration points for such deployment scenarios so that you have a single repository for all your monitoring data. Let us explore some of the hybrid cloud monitoring options that work well with Azure Monitor.

Most enterprises use Security Information and Event Management (SIEM) tools such as Splunk, QRadar, or ArcSight. These tools provide various features and capabilities with the core objective of collecting, storing, analyzing, and managing security information and events. They collect data from various sources (physical and virtual systems, devices, applications, and hardware); store and analyze it; alert on security anomalies, detections, and issues; and provide various reports. Though the services and capabilities of these tools differ to some extent, most of them provide overlapping features and functionality.

Customers can configure logs and data from Azure resources to be forwarded to a centralized storage location, which can then be consumed by the SIEM tools. To support this capability, multiple diagnostic settings can be enabled. The diagnostic information can be sent to Azure storage accounts, Event Hubs, or Log Analytics workspaces, depending on how you want to configure it. Microsoft has also partnered with leading SIEM tool vendors to develop connectors that can collect and integrate data from Azure Monitor. Azure Monitor integration with such tools can be achieved using different approaches. Some of these tools will now be briefly discussed.
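
As a rough sketch of that routing step, the snippet below uses the azure-mgmt-monitor Python package to create a diagnostic setting that sends a resource's logs and metrics to an event hub. All IDs and the log category are placeholders, and the available categories differ per resource type; the same setting can equally be created from the portal or CLI.

from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient
from azure.mgmt.monitor.models import (
    DiagnosticSettingsResource,
    LogSettings,
    MetricSettings,
)

SUBSCRIPTION_ID = "<subscription-id>"     # placeholder
RESOURCE_ID = "<resource-id-to-monitor>"  # placeholder, e.g. a Key Vault
EVENT_HUB_AUTH_RULE_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<rg>"
    "/providers/Microsoft.EventHub/namespaces/<namespace>"
    "/authorizationRules/RootManageSharedAccessKey"
)

client = MonitorManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Route the resource's logs and metrics to an event hub so that a SIEM
# connector can pick them up. Log categories are resource specific.
client.diagnostic_settings.create_or_update(
    RESOURCE_ID,
    "send-to-siem",
    DiagnosticSettingsResource(
        event_hub_authorization_rule_id=EVENT_HUB_AUTH_RULE_ID,
        event_hub_name="<event-hub-name>",
        logs=[LogSettings(category="<log-category>", enabled=True)],
        metrics=[MetricSettings(category="AllMetrics", enabled=True)],
    ),
)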

Splunk

The Azure Monitor Add-on for Splunk helps to collect and integrate data from Azure into Splunk. At the time of writing this book, the add-on supports collection of activity logs, diagnostic logs, and metrics. Activity logs are enabled via the log profile, while diagnostic logs and metrics should be enabled in the diagnostic settings of the respective Azure resources.

The architecture uses Event Hubs as a mediator: the logs from different Azure sources are first sent to an event hub. The logs remain in the event hub for the configured retention period, waiting to be retrieved periodically by add-ons such as the one for Splunk. The add-on is configured to read information from an Event Hubs namespace, which can contain multiple event hubs, each storing a different type of log associated with a resource.

Steps to be completed at a high level on the Azure side for the integration are as follows:

  1. Splunk uses the context of an Azure AD application to retrieve data from Azure. This AD application should be created with the application type Web app/API and assigned Reader access to the subscription.

  2. Create the Event Hubs namespace to which the activity logs, metrics, and diagnostic logs will be routed. It could be a single namespace or multiple namespaces, depending on how you want to integrate the add-on.

  3. The Azure AD application keys and the Event Hubs keys should be securely stored in a Key Vault.

  4. Assign the necessary permissions for the AD application on the Azure resources from which the data will be read.

  5. Configure the metrics, activity log, and diagnostic log settings so that the data is sent to Event Hubs and can be pulled by the Splunk add-on.

A ready-to-use script is available on GitHub that pulls from the Azure subscription all the information required for configuring the Splunk add-on. The script is available here: https://github.com/microsoft/AzureMonitorAddonForSplunk.

The script will generate all the required Azure configuration information, which can then be used as data inputs for the add-on configuration in the Splunk UI.
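
To get a feel for what such an add-on does under the hood, the following sketch uses the azure-eventhub Python package to read events directly from one of the event hubs. The connection string and hub name are placeholders; in practice, the Splunk add-on performs this retrieval for you once its data inputs are configured.

from azure.eventhub import EventHubConsumerClient

CONNECTION_STR = "<event-hub-namespace-connection-string>"  # placeholder
EVENT_HUB_NAME = "<event-hub-name>"  # placeholder, e.g. the hub receiving activity logs

def on_event(partition_context, event):
    # Each event body contains a JSON payload of one or more log records.
    print(event.body_as_str())

client = EventHubConsumerClient.from_connection_string(
    CONNECTION_STR, consumer_group="$Default", eventhub_name=EVENT_HUB_NAME
)

with client:
    # Read from the beginning of each partition; blocks until interrupted.
    client.receive(on_event=on_event, starting_position="-1")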

IBM QRadar

To integrate IBM QRadar, the SIEM can be configured to collect data routed to Event Hubs using the Microsoft Azure DSM and the Microsoft Azure Event Hubs protocol. As with Splunk, the prerequisite configuration required on the Azure side is that the logs be routed to an event hub.

The high-level configuration steps to be completed for QRadar to enable log collection are listed here:

  1. Ports 443, 5671, and 5672 should be opened from the QRadar box.

  2. Install the most recent versions of the following RPMs in QRadar:

    • DSMCommon RPM

    • Microsoft Azure DSM RPM

    • Protocol Common RPM

    • Microsoft Azure Event Hubs Protocol RPM

  3. Once the protocols are installed, the “Microsoft Azure” log type will be available in QRadar along with the “Microsoft Azure Event Hubs” protocol when you configure the log source.

  4. Add the Event Hubs connection string, consumer group, and storage account connection string in the log source configuration and deploy the changes.

The logs from Azure will be available in QRadar on successful completion of the configuration. As per the Azure DSM specifications published by IBM, the following event types from Azure can be collected by QRadar through this integration: Network Security Group (NSG) Flow logs, NSG Logs, Authorization, Classic Compute, Classic Storage, Compute, Insights, KeyVault, SQL, Storage, Automation, Cache, CDN, Devices, Event Hub, HDInsight, Recovery Services, AppService, Batch, Bing Maps, Certificate Registration, Cognitive Services, Container Service, Content Moderator, Data Catalog, Data Factory, Data Lake Analytics, Data Lake Store, Domain Registration, Dynamics LCS, Features, Logic, Media, Notification Hubs, Search, Servicebus, Support, Web, Scheduler, Resources, Resource Health, Operation Insights, Market Place Ordering, API Management, AD Hybrid Health Service, and Server Management.

ArcSight

There are multiple options available for integrating ArcSight with logs from Azure. One solution is to use the Azure Log Integration service, which is installed on a machine that collects the logs to be sent to the SIEM system, in this case ArcSight. However, this tool has been deprecated, and customers are expected to use connectors like the ones provided by Splunk and IBM QRadar. The SmartConnector for Microsoft Event Hub is available in ArcSight SmartConnector 7.10.0. At the time of writing this book, the SmartConnector supports collection of diagnostic, audit, sign-in, and activity logs from Azure.

Multi-Cloud Monitoring

For customers with a multi-cloud landscape, monitoring can be even more challenging, as they need a holistic view of resource health across multiple cloud platforms. As per RightScale’s State of the Cloud report for 2018, 84% of organizations have a multi-cloud strategy. There are many factors to be considered while finalizing the monitoring strategy for a multi-cloud environment; here are some pointers:

  • Is it easier to use the native cloud monitoring solution offered by the cloud platform for each cloud?

  • What is the trade-off when compared to the efforts involved in hopping between multiple tools?

  • Can you plug in the cloud services to your existing monitoring tools? If yes, what are the advantages and limitations?

  • What are the key monitoring metrics that are inevitable for your organization? Are they supported in the tools being considered?

  • If adopting a new monitoring tool, what is the learning curve involved, and what investment would the organization need to make?

Vendors like SolarWinds, OpsRamp, Grafana, and New Relic offer multi-cloud monitoring capabilities to monitor resources deployed in cloud platforms such as AWS, GCP, and Azure as well as in on-premises environments. If you are already using monitoring tools, it is worth checking the availability of connectors to monitor cloud platforms using the same tool. Let us briefly explore a few of these options.

System Center Operations Manager (SCOM)

System Center Operations Manager (SCOM) has a management pack available for AWS to collect performance information from AWS resources and feed it into the tool. It leverages Amazon CloudWatch metrics, which appear as performance counters in SCOM, and alerts from CloudWatch surface as alerts in SCOM. The AWS management pack supports collection of metrics from AWS resources such as EC2 instances, EBS volumes, load balancers, EC2 Auto Scaling groups, Availability Zones, CloudWatch custom metrics, etc. SCOM can also be integrated with Azure Monitor by installing the Azure Log Analytics agent, so that you have a single-pane view of multiple environments.

Grafana

Another possible solution that can be explored in a multi-cloud scenario is Grafana. Grafana offers a seamless way of visualizing data coming from multiple cloud platforms, thereby providing a dashboard view of your overall environment’s health. Grafana has plug-ins available to collect data from Stackdriver, CloudWatch, Oracle Cloud Infrastructure, Azure Monitor, etc., to generate insights. With Azure, the integration is as easy as adding a data source and providing an Azure AD service principal with the Log Analytics Reader role assigned to it. You can also integrate Application Insights with Grafana by calling the Application Insights APIs with credentials that have permission to read the metrics.
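
For the Application Insights route, the hedged sketch below shows the kind of call a Grafana data source makes against the Application Insights REST API, using an application ID and API key generated from the API Access blade; both values are placeholders here.

import requests

APP_ID = "<application-insights-app-id>"    # placeholder, from the API Access blade
API_KEY = "<application-insights-api-key>"  # placeholder, key with read-telemetry permission

# Request the server request count for the last day from the
# Application Insights REST API.
url = f"https://api.applicationinsights.io/v1/apps/{APP_ID}/metrics/requests/count"
response = requests.get(url, headers={"x-api-key": API_KEY}, params={"timespan": "P1D"})
response.raise_for_status()
print(response.json())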

Azure Log Analytics

Among the native Azure monitoring solutions, the Log Analytics agent supports data collection from multiple sources, be it virtual or physical machines on-premises or resources deployed in other cloud platforms. Azure Log Analytics is therefore another solution that can be considered in both multi-cloud and hybrid cloud scenarios, as it can integrate with practically any environment provided there is connectivity to Azure Monitor. It also provides a single dashboard view of your resources deployed across platforms, with built-in analytics capabilities to give you deeper insights and generate value from your monitoring data.
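
As a small illustration of that single-pane view, the sketch below groups the machines connected to the workspace by environment and operating system using the Heartbeat table; the workspace ID is a placeholder, and the ComputerEnvironment column distinguishes Azure from non-Azure machines.

from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import LogsQueryClient

WORKSPACE_ID = "<log-analytics-workspace-id>"  # placeholder

# Break down connected machines by environment (Azure vs. non-Azure)
# and OS type, using the Heartbeat table each connected agent populates.
query = """
Heartbeat
| summarize Machines = dcount(Computer) by ComputerEnvironment, OSType
"""

client = LogsQueryClient(DefaultAzureCredential())
response = client.query_workspace(WORKSPACE_ID, query, timespan=timedelta(hours=24))
for row in response.tables[0].rows:
    print(row)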