Cloud Incident Readiness: Critical infrastructure for cloud incident response

May 6, 2025

Introduction

Welcome back to our series on preparing for incidents in the cloud. Check out Part 1 if you want to know how to set up access in the various clouds for your teams. In Part 2 we ranked the various logs for cloud incident response across the major clouds. In this final part we will go over the infrastructure that you can use to perform incident response tasks such as root cause analysis, containment and eradication. We will also provide some tips to make your cloud incident response life easier.

Microsoft

As usual we will start with Microsoft. Let's start with the analysis phase of an incident, where you typically search through the available logs for traces of an attacker. We will focus on two critical resources that you want to set up right now to make incident response possible:

  • Log Analytics
  • Storage accounts

Log Analytics

This Microsoft Azure service can be used to search through logs generated in the Microsoft cloud, but it also supports logging from on-premises systems as well as third party logging. Log Analytics (LA) is also the underlying analytics service for Microsoft Sentinel, so if you're searching through logs in there you are also using LA. Now to the point: what should you do right now to make your IR life easier?

  • Centralize your logs in LA, please forward your Big 3 logs:
    • Entra Sign-in Logs
    • Entra Audit Logs
    • Azure Activity Logs
  • Set up alerts in LA for high impact actions. When you perform a search you can create an Alert Rule that notifies you when this activity occurs. This is recommended for high impact activity with low noise. Some suggestions:
    • Someone logs in with the break glass account
    • New role assignment for highly privileged roles (e.g. Global Administrator or User Access Administrator in Azure)
  • Make sure you configure a resource lock for the workspace, so it can't be 'accidentally' deleted by someone or intentionally by a threat actor:
    • Azure Log Analytics --> Select Workspace --> Locks --> Add --> Set lock type to Delete
  • Log LA queries. It's possible to audit which queries are executed against your LA workspace. If you worry about insider threats or want to be able to audit who searched for certain sensitive data, enable this via:
    • Azure Log Analytics --> Select Workspace --> Select Diagnostics Settings --> Enable Audit logging

The best part of LA is of course that you can query logs with KQL. If you want inspiration, I suggest you check out the work of our part-time IR specialist and full-time KQL master Bert-Jan.
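As a starting point for the two alert suggestions above, here is a minimal sketch that renders KQL query strings from Python. SigninLogs and AuditLogs are the Log Analytics tables that the Entra sign-in and audit logs land in. The function names, the account name and the role list are hypothetical placeholders. The second query only covers Entra ID roles; Azure RBAC assignments such as User Access Administrator are recorded in the Azure Activity Logs and would need their own query.

```python
# Hypothetical helpers that render KQL strings for Log Analytics alert rules.
# The account name and the role names passed in are placeholders.

def breakglass_signin_query(upn: str, lookback: str = "1h") -> str:
    """KQL that returns any sign-in by the given break glass account."""
    return (
        "SigninLogs\n"
        f"| where TimeGenerated > ago({lookback})\n"
        f'| where UserPrincipalName =~ "{upn}"\n'
        "| project TimeGenerated, UserPrincipalName, IPAddress, AppDisplayName, ResultType"
    )


def privileged_role_assignment_query(roles: list[str], lookback: str = "1h") -> str:
    """KQL that returns Entra ID role assignments mentioning one of the given roles."""
    role_list = ", ".join(f'"{role}"' for role in roles)
    return (
        "AuditLogs\n"
        f"| where TimeGenerated > ago({lookback})\n"
        '| where OperationName startswith "Add member to role"\n'
        f"| where tostring(TargetResources) has_any ({role_list})\n"
        "| project TimeGenerated, OperationName, InitiatedBy, TargetResources"
    )


if __name__ == "__main__":
    print(breakglass_signin_query("breakglass@example.com"))
    print(privileged_role_assignment_query(["Global Administrator"]))
```

Paste the generated query into the LA query editor, check that it returns what you expect, and then create the Alert Rule from that search.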

Storage accounts

As an alternative, you can configure the logs generated by Microsoft to be sent to a storage account. The big advantage is that it is cheaper than LA. The downside is that it takes a bit more work to query the data, because it's not directly searchable in LA. To send logs to a storage account, go to 'Diagnostic settings' in Entra ID or in Azure, depending on the log you want to forward, and set the destination to Storage account. Below is an example of a configuration that forwards the Entra ID Audit & Sign-in Logs.

Storage account tips

  • Be careful with public access. By default storage accounts have quite loose access restrictions, so make sure you tighten the access control to your sensitive logs
  • You need to configure retention in the storage account. Often you don't want to store the logs forever, or it's not allowed due to regulations
  • In case of an incident you can use Azure Data Explorer or Azure Synapse to access and analyze the data. This requires some setup, but it's possible
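If you just need a quick look at a few exported files before setting up Azure Data Explorer or Synapse, a small script can go a long way. The sketch below assumes the log blobs have been downloaded locally and contain one JSON record per line, and that each record has an operationName field; both are assumptions to verify against your own export. The sample lines are made up purely to show the call.

```python
import json
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSON line, skipping blank or unparsable lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or partial lines
        if isinstance(record, dict):
            yield record


def filter_by_operation(records: Iterable[dict], needle: str) -> list[dict]:
    """Keep records whose (assumed) operationName field contains needle."""
    needle = needle.lower()
    return [r for r in records if needle in str(r.get("operationName", "")).lower()]


# Made-up sample lines, only to illustrate the shape of the input.
sample = [
    '{"time": "2025-05-06T10:00:00Z", "operationName": "Sign-in activity"}',
    '{"time": "2025-05-06T10:05:00Z", "operationName": "Add member to role"}',
    "not json",
]
hits = filter_by_operation(iter_records(sample), "role")
print(hits)
```

For a real file you would pass an open file handle instead of the sample list, for example `filter_by_operation(iter_records(open(path)), "role")`.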

Moving on to the later stages of an IR, containment and eradication, we can also leverage various Microsoft services to make IR life easier. We will focus on the following:

  • Subscription/Resource Group
  • Microsoft Entra ID

Subscription/Resource Group

Let's say your Azure tenant is compromised and you need to contain the incident, but you can't afford to shut down the running resources. In that case you can use Subscriptions and Resource Groups to set up a 'new' segmented environment in the cloud. Some examples:

  • Set up a new Subscription that is only accessible by IR personnel and break glass accounts
  • Set up a new Resource Group with an Azure Virtual Network with very strict access permissions
  • Create a 'washing street' in Azure using Subscriptions or Resource Groups, a staging area where resources are checked and cleaned before they move into the new environment

Microsoft Entra ID

Within Entra ID we can leverage the security features to assist us with containment and eradication of an incident. We want to highlight the following features; note that they require a P1/P2 license:

  • Conditional Access
  • Privileged Identity Management

The concept of "pulling the plug" is probably familiar if you have worked on large scale breaches. It basically means you disconnect an organization from the outside world in a (final) effort to stop the threat actor from connecting to your systems. In the cloud this is much harder, as you can imagine, because you can't control the datacenters of the cloud providers. However, in the cloud the identity is often the perimeter, and we can use that to our advantage.

Conditional Access

With Conditional Access we can create a lock-out policy, which basically prevents any identity (except one or two emergency accounts) from performing an action in the cloud. This policy is relatively straightforward, and you can tweak it so that a group of incident responders or sysadmins can still perform the required actions to remove the threat actors from the network.

Tip: Don't forget to exclude users that still need access ;) 
Tip 2: Use Trusted Locations to limit the logins to only known IP-addresses/ranges to make this even more secure, but be careful of IP address changes.
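To make the lock-out idea concrete, the sketch below builds a policy body in the shape used by the Microsoft Graph conditionalAccessPolicy resource. It only constructs the JSON and does not call Graph. The function name, display name and object IDs are hypothetical, and starting in report-only mode is our own cautious default rather than a requirement. Treat it as a draft to review in the portal, not as a finished policy.

```python
import json


def build_lockout_policy(excluded_user_ids, excluded_group_ids=(),
                         allow_trusted_locations=False, report_only=True):
    """Build a block-everything Conditional Access policy body with exclusions."""
    excluded_user_ids = list(excluded_user_ids)
    excluded_group_ids = list(excluded_group_ids)
    # Guard for Tip 1: never build a lock-out policy that excludes nobody.
    if not excluded_user_ids and not excluded_group_ids:
        raise ValueError("refusing to build a lock-out policy without exclusions")

    conditions = {
        "clientAppTypes": ["all"],
        "applications": {"includeApplications": ["All"]},
        "users": {
            "includeUsers": ["All"],
            "excludeUsers": excluded_user_ids,
            "excludeGroups": excluded_group_ids,
        },
    }
    if allow_trusted_locations:
        # Tip 2: the block then applies everywhere except trusted named locations.
        conditions["locations"] = {
            "includeLocations": ["All"],
            "excludeLocations": ["AllTrusted"],
        }

    return {
        "displayName": "IR - emergency lock-out",  # hypothetical name
        "state": "enabledForReportingButNotEnforced" if report_only else "enabled",
        "conditions": conditions,
        "grantControls": {"operator": "OR", "builtInControls": ["block"]},
    }


if __name__ == "__main__":
    # Placeholder object ID of a break glass account.
    body = build_lockout_policy(["00000000-0000-0000-0000-000000000001"])
    print(json.dumps(body, indent=2))
```

Preparing and reviewing such a body before an incident means that during one you only have to switch the state to enabled.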

Privileged Identity Management (PIM)

If the previous action is a bit too 'nuclear', you can also set up more granular controls during an ongoing incident to support containment and eradication. With PIM you can configure highly privileged Entra ID roles such as Global Administrator, (Cloud) Application Administrator, Cloud Device Administrator and others to be limited to trusted users only. This prevents a threat actor that might still be active from performing additional sensitive actions.

AWS

Let's switch over to AWS. There's also a bunch of things you can do right now to prepare for the worst case. For the Analysis phase we want to highlight the following things you should do.

Central log management

To be honest this is a must and not a should. If you have an Organization in AWS, go to CloudTrail --> Trails --> Create Trail and enable the checkbox 'Enable for all accounts in my organization'.

Ideally, you store these logs in a separate Logs or Archive AWS account that is only used for that purpose, which makes it easier to lock down access.

In-cloud analysis with Athena

Building on the previous task, now that we have central log management, we can leverage another AWS service called Athena. This service makes it possible to directly query data in S3 buckets; in the background it uses another service, Glue, to map the data. To be honest AWS is awesome at documenting how you can leverage Athena, so I want to refer you to the official docs on how to query the Organization CloudTrail. All you have to do is:

  • Go to Athena
  • Go to Query Editor
  • Copy and paste the SQL statement from this page
  • Modify the Account ID and location of the S3 bucket so that they match your or your client's logging location and run the query
  • Your logs are now searchable, no need to index or process the data

Tip: Do not write expensive queries such as SELECT * FROM cloudtrail_logs. Athena charges by the amount of data scanned, so broad queries can get costly ($$$).
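To illustrate the tip, here is a minimal sketch of a helper that always produces a narrow query. Because CloudTrail logs are stored as compressed JSON, the biggest saving usually comes from restricting the partitions that Athena has to read. The sketch assumes the table is partitioned by a date string column in yyyy/MM/dd format, named timestamp here as in AWS's partition projection example; if your table is set up differently, adjust the partition_column argument. The function name and defaults are hypothetical.

```python
import re
from datetime import date


def build_cloudtrail_query(event_name, start, end, table="cloudtrail_logs",
                           partition_column="timestamp", limit=100):
    """Build a narrow Athena query for one event name within a date range."""
    # CloudTrail event names are alphanumeric; reject anything else.
    if not re.fullmatch(r"[A-Za-z0-9]+", event_name):
        raise ValueError("unexpected characters in event name")
    if start > end:
        raise ValueError("start must not be after end")
    return (
        "SELECT eventtime, eventname, useridentity.arn AS actor, sourceipaddress\n"
        f"FROM {table}\n"
        # The partition filter is what limits the amount of data scanned.
        f"WHERE \"{partition_column}\" BETWEEN '{start:%Y/%m/%d}' AND '{end:%Y/%m/%d}'\n"
        f"  AND eventname = '{event_name}'\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(build_cloudtrail_query("ConsoleLogin", date(2025, 5, 1), date(2025, 5, 6)))
```

The column names in the SELECT come from the table definition in the AWS documentation; the date range and event name are the parts you would change per investigation.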

Now that you have the ability to search through your data, and assuming you have figured out what happened, you want to perform containment and eradication actions. We will focus on:

  • Separate accounts
  • Cross-Org access

Separate accounts

For AWS we don't have to reinvent the wheel, as there's a lot of information available on this topic. AWS has developed a great resource called the AWS Security Incident Response User Guide (link). It contains a lot of information on how you should set up your infrastructure. The picture below shows the high level architecture for setting up an IR capability in an AWS organization.

Now this is an oversimplified picture, but you should try to work on a setup similar to this.

Cross-Org access

What we like to do for our retainer clients is set up a way for us to activate our access in case of an incident. One of the ways to do that is using IAM roles. The example below shows a trust policy attached to a custom IR role in the customer account, which allows the role to be assumed from our AWS account.

In addition to the trust policy above, you will have to attach a permissions policy to the role to define what the role is allowed to do.
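For reference, a trust policy of this kind can be sketched as follows. The account ID and external ID are placeholders, and the function name is hypothetical. The sts:ExternalId condition is an addition on our side that is commonly used for third-party access; it is not necessarily part of the original example.

```python
import json
import re


def build_ir_trust_policy(ir_account_id: str, external_id: str) -> dict:
    """Build a trust policy that lets another AWS account assume the IR role."""
    if not re.fullmatch(r"[0-9]{12}", ir_account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The IR provider's account; tighten this to a specific role if possible.
                "Principal": {"AWS": f"arn:aws:iam::{ir_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Assumption: an agreed external ID, to guard against confused-deputy misuse.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values for illustration only.
policy = build_ir_trust_policy("111122223333", "example-external-id")
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the 'Trust relationships' tab of the role. The permissions policy mentioned above is attached separately.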

Google

Last but not least, let's talk about Google. From an incident detection perspective Google has the built-in Logs Explorer that can be used to search through Google generated logging, and Google also has its own Log Analytics service, so let's look at both.

Logs Explorer

Within a Google Cloud environment you can always navigate to Logs Explorer to search through available logs. By default Logs Explorer allows you to search through logs that are part of a 'Logging Bucket'. A logging bucket is a special bucket in Google that can only be used for log storage; you can't manually add files to it. To prepare for an incident in Google, we recommend enabling the logs in your Google Cloud services and storing them in a logging bucket. Additionally, there's another 'trick' that you should implement if your company is using Google Workspace: you can make the Workspace logs directly searchable in Logs Explorer:

  • Go to Google Workspace Admin (link)
  • Go to Account Settings - Legal & Compliance (link)
  • Under Sharing Options set it to Enabled as shown below:

Now you can go to Logs Explorer on the Organization scope and search through the following logs from Google Workspace directly:

  • Access Transparency: Logs when Google personnel access your Google Workspace content, separate from user activity logs.
  • Google Workspace Admin Audit: Logs actions taken in the Admin console, like user creation or service changes.
  • Google Workspace Enterprise Groups Audit: Logs changes to groups and memberships, such as adding users or deleting groups.
  • Google Workspace Login Audit: Logs user sign-in events.
  • Google Workspace OAuth Token Audit: Logs usage and authorization of third-party apps accessing Google Account data.
  • Google Workspace SAML Audit: Logs successful and failed sign-ins to SAML applications.

The picture below shows Logs Explorer with the Google Workspace logs:

Now to be honest, querying in Logs Explorer isn't the most pleasant exercise; the query language and GUI are not our favorite. The aggregation options are also limited, which is (probably) why Google also created a service called Log Analytics. This service allows you to perform log analysis and querying using SQL. Before you can use the service you have to make the logging bucket compatible with Log Analytics.

When you create a new bucket you must configure the highlighted setting, otherwise you can't use it in Log Analytics.

You can also upgrade existing buckets directly, but keep in mind you can't downgrade them. The upgrade basically means a schema is generated for the data in the bucket, so Google's Log Analytics can perform statistics/aggregation on the data in there. The picture below shows that two out of three buckets are available for Log Analytics.

Within Log Analytics you can perform SQL style queries, which is a lot easier than using Logs Explorer:
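As an illustration of the kind of aggregation that is awkward in Logs Explorer, the sketch below builds a query that counts audit log events per principal and method over the last day. The table path format (project, location, bucket, view), the _AllLogs default view and the proto_payload column names are written from our understanding of the Log Analytics schema, so treat them as assumptions and check them against your own bucket. The project and bucket names are placeholders.

```python
import re


def build_log_analytics_query(project_id, bucket_id, location="global",
                              view="_AllLogs", limit=50):
    """Build a SQL query counting audit events per principal and method."""
    for part in (project_id, bucket_id, location, view):
        # Keep identifiers simple so they cannot break out of the backticks.
        if not re.fullmatch(r"[A-Za-z0-9_\-]+", part):
            raise ValueError(f"unexpected identifier: {part!r}")
    table = f"`{project_id}.{location}.{bucket_id}.{view}`"
    return (
        "SELECT\n"
        "  proto_payload.audit_log.authentication_info.principal_email AS principal,\n"
        "  proto_payload.audit_log.method_name AS method,\n"
        "  COUNT(*) AS events\n"
        f"FROM {table}\n"
        "WHERE proto_payload.audit_log.method_name IS NOT NULL\n"
        "  AND timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)\n"
        "GROUP BY principal, method\n"
        "ORDER BY events DESC\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    print(build_log_analytics_query("my-project", "ir-logs"))
```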

Tip: Make sure you forward your Google Workspace logs to Google Cloud, as it's free of charge, and make sure you are sending relevant logs to a logging bucket. This will allow the IR team to quickly start their analysis.

Another thing we can do right now in Google is use Google IAM to give your (external) IR team access in case of an incident. Similar to an AWS IAM role, you can take a Google IAM role and assign it to an external identity. This identity is an email based identity such as user@invictus-ir.com or a Google group such as IR-Consultants@invictus-ir.com. You then grant this role on a resource or set of resources in Google Cloud. You can do this on the Organization, Folder, Project or even individual resource level. This is something you can set up right now and it will save you crucial time in case of an incident.
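In IAM policy terms, such a grant is a binding of a role to a member. The sketch below adds a binding to a policy document in the standard bindings/role/members shape without calling any Google API. The function name is hypothetical, and roles/logging.viewer is only an example of a read-only role; pick the roles your IR team actually needs.

```python
import copy


def add_ir_binding(policy: dict, member: str, role: str = "roles/logging.viewer") -> dict:
    """Return a copy of an IAM policy with member added to the binding for role."""
    if not member.startswith(("user:", "group:")):
        raise ValueError("member must look like 'user:<email>' or 'group:<email>'")
    updated = copy.deepcopy(policy)  # leave the caller's policy (and etag) untouched
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        if binding.get("role") == role:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return updated
    bindings.append({"role": role, "members": [member]})
    return updated


if __name__ == "__main__":
    current = {"bindings": [], "etag": "placeholder"}
    print(add_ir_binding(current, "group:IR-Consultants@invictus-ir.com"))
```

Granting to a group rather than to individual users means the IR provider can rotate consultants without you having to change the policy.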

Conclusion

In this final part of our cloud incident response readiness series we showed the infrastructure and resources you can use to prepare for an incident across the Big 3 cloud providers. The theme across all of them is that the cloud providers have built-in log analysis tools that we can and should use, because the speed of going from log to analysis is unparalleled. The big issue that we see over and over again is that most organizations only start preparing for an incident after an incident. I hope this series will be a lesson for organizations to do it right now. Don't wait!

If you ever need help or custom advice on how to set up an incident response capability in the cloud, contact us!

About Invictus Incident Response

We are an incident response company and we ❤️ the cloud. We specialize in supporting organizations in preparing for and responding to a cyber attack. We help our clients stay undefeated!

🆘 For incident response support, reach out to cert@invictus-ir.com or go to https://www.invictus-ir.com/24-7