Incident Response in Kubernetes (EKS)
The pager goes off at 3:00 AM.
Your security dashboard is flashing red: an alert has fired inside a production cluster. In the old days of physical servers, you could simply pull the network cable. But this is Kubernetes. If you simply kill the compromised pod, the scheduler will faithfully recreate it elsewhere, potentially bringing the attacker along for the ride.
Welcome to the first part of our series on incident response in managed Kubernetes clusters. Over the next three posts, we’ll be exploring how to handle security breaches across the big three: EKS (AWS), GKE (Google Cloud), and AKS (Azure), starting with EKS.
tl;dr
This guide covers how logging is handled in EKS and advises turning on logging before an incident occurs. Since containers in EKS are ephemeral, any log data not forwarded to CloudWatch (or another log analysis platform) is lost forever when a container restarts.
Understanding EKS
Before we can investigate Amazon Elastic Kubernetes Service (EKS), we have to understand how it actually operates. Kubernetes clusters are notorious for their management overhead. To reduce it, AWS offers EKS, in which AWS hosts and manages the cluster's control plane, offloading that complexity from the operator.
The cluster consists of two main parts:
- The Control Plane: This is managed by AWS. It includes the Kubernetes API server, the etcd database (where all your cluster configs live), and the controllers.
- The Data Plane: These are the worker nodes where your actual code runs.
Think of it like renting an apartment. The landlord is responsible for the building and the plumbing, so you as the renter can focus on decorating the place. However, if a guest starts a fire, the landlord isn't going to put it out for you, but they might have CCTV footage of who entered the building.
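To make the split concrete, here is a minimal sketch (the cluster name and region are placeholders): the control plane is inspected through the AWS API, while the worker nodes are visible through the Kubernetes API.
aws eks describe-cluster --name my-cluster --region region-code   # control plane: managed by AWS, queried via the AWS API
kubectl get nodes -o wide                                         # data plane: the worker nodes running your workloads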

Investigating EKS
Within EKS, there are several log sources, each serving a different purpose in a security investigation, as shown in table 1.
These are complemented by several log sources that are not specific to EKS, as depicted in table 2.
Log collection
By default, if a pod is compromised and then crashes or is deleted, its local logs vanish. This is why shipping logs to a centralized location (like CloudWatch Logs or an S3 bucket) is strongly advised. If you aren't shipping them, the attacker can cover their tracks just by crashing the container. The AWS CLI can be used to enable all control plane log types mentioned in table 1:
aws eks update-cluster-config \
--region region-code \
--name my-cluster \
--logging '{"clusterLogging":[{"types":["api","audit","authenticator"
,"controllerManager","scheduler"],"enabled":true}]}'
To send application logs to a centralized environment (such as CloudWatch), a separate log forwarding agent is required, such as Fluent Bit. Furthermore, if kubectl is configured correctly, it can be used to retrieve logs as well. The limiting factor is that kubectl only surfaces data plane logs, such as the pods' stdout output. Control plane logs (such as the audit logs) are handled by AWS and are only accessible once control plane logging is enabled and the logs are exported to CloudWatch.
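As a minimal sketch (pod name and namespace are placeholders), retrieving the stdout logs of a suspicious pod with kubectl, including the previous container instance if it has already restarted:
kubectl logs suspicious-pod-7d4b9 -n production              # stdout/stderr of the current container
kubectl logs suspicious-pod-7d4b9 -n production --previous   # logs of the previous (crashed) container, if any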
Pod Forensics
While this guide focuses on the logs from EKS itself, sometimes the investigation requires you to get your hands dirty on the worker node itself. If an attacker has deployed fileless malware, logs alone won't tell the full story. For a deep dive into capturing memory dumps, analyzing container runtimes (containerd/CRI-O), and inspecting the /var/log/pods directory on the underlying host, we recommend this SANS guide on Forensics in EKS.
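As a quick illustration (assuming you can reach the node via SSM or SSH, that it runs containerd, and that crictl is installed), the on-host artefacts mentioned above can be located as follows:
sudo crictl ps           # list containers known to the container runtime on this node
sudo ls /var/log/pods/   # per-pod log directories kept on the host filesystem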
Mapping to Kubernetes TTPs
Microsoft released a threat matrix for Kubernetes, providing a way to map attacker techniques (TTPs) to their goals (see figure 2).

We can use this matrix to determine in which log source each behaviour would be detected; see table 3. Note that this table is non-exhaustive, as certain tactics surface in multiple log sources, depending on the technique used.
Containment & Eradication
Once a compromise has been detected, it is important to contain and eradicate the actor as soon as possible to limit the damage. One option is to power off the compromised resource, but this comes with the caveat that volatile data, such as temporary files and fileless malware, is lost. Hence, it is recommended to keep the resource running and quarantine it instead:
- Apply a "Deny-All" NetworkPolicy to the specific pod. This cuts the attacker's connection to their C2 server but keeps the pod alive for you to investigate.
- Remove the pod's labels so the Service (and its load balancer) stops routing production traffic to it.
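A minimal sketch of both steps, assuming the pod lives in the production namespace, is selected by an app label, and that the cluster's CNI actually enforces NetworkPolicies (on EKS this requires, for example, the VPC CNI network policy feature or Calico); all names are placeholders:
# 1. Re-label the pod: add a quarantine label to target it, then drop the selector label so Services stop routing to it
kubectl label pod compromised-pod-abc12 -n production quarantine=true
kubectl label pod compromised-pod-abc12 -n production app-
# 2. Apply a deny-all NetworkPolicy that selects only the quarantined pod (no ingress/egress rules = all traffic blocked)
cat <<'EOF' | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quarantine-deny-all
  namespace: production
spec:
  podSelector:
    matchLabels:
      quarantine: "true"
  policyTypes:
  - Ingress
  - Egress
EOF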
Ensure that compromised credentials (such as access keys) are rotated and that vulnerabilities are patched. Remember: treat your containers like cattle, not pets. Don't try to clean a compromised container; patch the vulnerability in your code, build a new image, and deploy a fresh, clean version of the application.
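For the credential rotation step, a hedged example (user name and key ID are placeholders) of immediately deactivating a compromised IAM access key with the AWS CLI:
aws iam update-access-key \
  --user-name compromised-user \
  --access-key-id AKIAEXAMPLEKEYID \
  --status Inactive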
Investigating in CloudWatch
Having the logs in CloudWatch is one thing; finding the right events is another. Below are a few sample CloudWatch Logs Insights queries for common abuse scenarios.
Common attack scenarios
Investigate kube exec abuse:
The kubectl tool can be used to execute commands directly within a pod, essentially granting a shell. Since kubectl talks to the Kubernetes API server, such execution can be found by filtering Kubernetes audit log entries where the requestURI contains the 'exec' keyword:
fields @timestamp, requestURI, userAgent, sourceIPs.0
| filter @logStream like /kube-apiserver-audit/
| filter requestURI like "/exec"
| sort @timestamp desc
While this activity may appear suspicious, it can also be a legitimate administrator performing routine tasks. Correlate the user and the source IP address to determine whether the behaviour is malicious.
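To make that correlation easier, a hedged variant of the query above aggregates exec requests per user and source IP so outliers stand out (field names follow the same audit log schema as the query above):
fields user.username, sourceIPs.0, requestURI
| filter @logStream like /kube-apiserver-audit/
| filter requestURI like "/exec"
| stats count(*) as execCount by user.username, sourceIPs.0
| sort execCount desc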
Investigate forbidden secrets listing:
kubectl can also be used to retrieve secrets from the cluster. The API behaviour is quite similar to exec'ing into a pod, so the query to detect it is also quite similar; here we focus on forbidden (HTTP 403) attempts, which indicate an identity probing for permissions it does not have:
fields user.username, responseStatus.code, operation, requestURI
| filter @logStream like /kube-apiserver-audit/
| filter requestURI like "/secrets"
| filter responseStatus.code = 403
| stats count(*) by user.username, requestURI
Real-World Context
In mid-2025, TraderTraitor, a North Korean state-sponsored threat group, was seen attacking a cryptocurrency exchange, where a Kubernetes cluster was compromised and used as a pivot to other services [source]. Initial access to the cluster was achieved via phishing, allowing the actor to deploy a malicious pod designed to expose the mounted service account token. Using this privileged service account, the actor could authenticate to the Kubernetes API, perform discovery, and create a backdoor in a production pod to maintain persistent access within the cluster. From there, the actor could move laterally to other cloud services and ultimately reach the financial systems that were the final goal.
Conclusion
Incident response in EKS is a shared responsibility with AWS. As the real-world case shows, Kubernetes can also serve as a pivot to other internal (cloud) systems, hence the need to harden the environment appropriately. Since containers within Kubernetes are ephemeral, ensure that logging is enabled beforehand and accessible to your IR team. If an incident occurs, quarantine the affected resources, and treat them as cattle rather than pets by replacing them with clean versions.
Coming up next in Part 2: We’re crossing the fence over to Google Cloud to look at GKE. We’ll see how Google handles logging differently and how it compares to AWS.
About Invictus Incident Response
We are an incident response company and we ❤️ the cloud. We help our clients stay undefeated.
🆘 Incident Response support: reach out to cert@invictus-ir.com or go to https://www.invictus-ir.com/24-7
Be ready for the next cloud incident.
