Rich Gibbs

AWS IMDSv2 Migration Without Breaking Things

aws · ec2 · imdsv2 · security · devops · cloud-security

If you have EC2 instances older than a year or two, some of them probably still allow IMDSv1. The Instance Metadata Service is the HTTP endpoint at 169.254.169.254 every EC2 instance can hit to learn about itself: instance ID, region, attached IAM role, and the temporary credentials that come with it. IMDSv1 is the original unauthenticated GET protocol. IMDSv2 is the session-token version that blocks a class of SSRF and confused-deputy attacks from walking off with your IAM Role credentials.

AWS has been nudging everyone toward IMDSv2 for years, but existing fleets, AMIs baked before the change, and ASGs pinned to old launch templates are full of IMDSv1-allowing instances. Migration is conceptually simple — flip a setting per instance — and operationally annoying, because flipping it on the wrong workload breaks credential lookups for SDKs, kubelet, the ECS agent, or your own scripts.

This guide walks through the migration the way an operator actually has to do it: detect what is still using v1, change instances in safe waves, validate, and have a rollback path.

Why Migrate

IMDSv1 is a plain HTTP GET against the link-local address. Anything inside the instance that can make an outbound HTTP request — including a vulnerable web app with SSRF — can read instance metadata, including the IAM security-credentials path:

GET http://169.254.169.254/latest/meta-data/iam/security-credentials/<role-name>

That returns short-lived credentials for whatever role is attached to the instance. With IMDSv1, no proof of locality is required. An SSRF in a public-facing service can pivot directly to your IAM credentials.

IMDSv2 changes the protocol in two important ways:

  1. Session tokens. Callers PUT to /latest/api/token for a session token, then send it back as X-aws-ec2-metadata-token. SSRF primitives that only allow GET are blocked.
  2. Hop limit. The response that carries the token is sent with a capped IP TTL (the HttpPutResponseHopLimit). The default is 1, so a container behind a Docker bridge or a pod behind a CNI cannot get a token unless the limit is explicitly raised.

Set IMDSv2 to required and v1 stops responding. That’s the goal state.

What Breaks

The realistic breakage list is short and well-known. Knowing it upfront is most of the migration.

  • Old AWS SDKs. Anything older than the published cutoffs only knows IMDSv1: AWS CLI v1 < 1.18.x, boto3 < 1.12.x, AWS SDK for Java v1 < 1.11.678, Go SDK v1 < 1.25.38, .NET SDK before late-2019. Modern SDKs auto-negotiate v2 with v1 fallback, but if v2 is required the fallback never engages.
  • Containers behind Docker bridge or CNI. The default hop limit of 1 denies pods/containers that route through the bridge. Raise the hop limit to 2 — or better, use IRSA on EKS, EC2 Pod Identity, or task roles on ECS so workloads don’t depend on instance metadata at all.
  • kubelet on self-managed nodes. Older kubelets only spoke v1. Modern EKS-optimized AMIs are fine; legacy kops clusters and old custom AMIs are the usual offenders.
  • ECS agent. amazon-ecs-init >= 1.50 supports IMDSv2. Old ECS-optimized AMIs not re-rolled in years can fail credential fetch.
  • CloudWatch / SSM agent. Recent versions fine; very old pinned versions not.
  • Custom scripts. curl http://169.254.169.254/latest/meta-data/... without a token will get a 401 once v1 is off. A token-aware drop-in is sketched just below.
  • Third-party agents in old AMIs. Old Datadog, New Relic, Splunk, or backup agents from years-old golden images can be v1-only.

That’s the whole list. Everything else either works on day one or never touched IMDS.
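
If the only offender is a shell script, the fix is mechanical: fetch a session token first, then send it with the read. A minimal sketch (the imds_get helper name is mine, not anything AWS ships):

# Hypothetical drop-in helper for scripts that used bare curl against IMDS
imds_get() {
  local token
  # Grab a short-lived session token (fails cleanly if IMDS is unreachable)
  token=$(curl -sf -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300") || return 1
  # Use the token for the actual metadata read
  curl -sf -H "X-aws-ec2-metadata-token: $token" \
    "http://169.254.169.254/latest/meta-data/$1"
}

# Before: curl -s http://169.254.169.254/latest/meta-data/instance-id
# After:
imds_get instance-id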

Detect IMDSv1 Use

Don’t flip the switch blind. Find the callers first.

CloudWatch metric: MetadataNoToken

Every EC2 instance emits a CloudWatch metric called MetadataNoToken in the AWS/EC2 namespace. It increments every time something on the instance hits IMDSv1. This is the single most useful signal you have.

aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name MetadataNoToken \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistics Sum \
  --period 3600 \
  --start-time "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time   "$(date -u +%Y-%m-%dT%H:%M:%SZ)"

If Sum across the last 7 days is 0, that instance is not making any IMDSv1 calls and is safe to switch. Anything non-zero means something is still hitting v1.

For a fleet view, query across all instance IDs, or use CloudWatch Metrics Insights / metric math to aggregate MetadataNoToken across the fleet. Tag the noisy instances and dig in.
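
A Metrics Insights sketch of that fleet view, run through get-metric-data. Metrics Insights queries only cover roughly the most recent three hours of data, so treat this as a spot check rather than the 7-day answer:

aws cloudwatch get-metric-data \
  --start-time "$(date -u -d '3 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
  --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
  --metric-data-queries '[{
    "Id": "v1_calls",
    "Expression": "SELECT SUM(MetadataNoToken) FROM SCHEMA(\"AWS/EC2\", InstanceId) GROUP BY InstanceId ORDER BY SUM() DESC LIMIT 25",
    "Period": 3600
  }]'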

Inventory: which instances even allow v1?

aws ec2 describe-instances \
  --query 'Reservations[].Instances[].{
    Id:InstanceId,
    State:State.Name,
    HttpTokens:MetadataOptions.HttpTokens,
    HopLimit:MetadataOptions.HttpPutResponseHopLimit,
    Endpoint:MetadataOptions.HttpEndpoint
  }' \
  --output table

HttpTokens is what you care about. It will be one of:

  • optional — IMDSv1 still allowed (the thing you’re trying to remove)
  • required — IMDSv2 only (the goal state)

A simple “what’s left?” query:

aws ec2 describe-instances \
  --filters "Name=metadata-options.http-tokens,Values=optional" \
            "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[].InstanceId' \
  --output text

CloudTrail and VPC flow logs

CloudTrail does not log calls to IMDS itself — those never leave the instance. What it does show is the AWS API calls made with the credentials IMDS handed out, via userIdentity.sessionContext and the accessKeyId of the temporary credentials; for instance-profile sessions, the role session name is the instance ID, which makes them easy to spot. Useful for finding workloads still authenticating via instance role that should have moved to IRSA or task roles.

VPC flow logs do not see 169.254.169.254 traffic either — link-local stays inside the host. Stick to MetadataNoToken plus the inventory query.
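
If you keep CloudTrail in S3 with an Athena table over it, a hedged sketch of that hunt (the cloudtrail_logs table name and the results bucket are placeholders for your setup):

aws athena start-query-execution \
  --query-execution-context Database=default \
  --result-configuration OutputLocation=s3://my-athena-results/ \
  --query-string "
    SELECT useridentity.arn, eventsource, eventname, count(*) AS calls
    FROM cloudtrail_logs
    WHERE useridentity.arn LIKE '%:assumed-role/%/i-%'
    GROUP BY useridentity.arn, eventsource, eventname
    ORDER BY calls DESC
    LIMIT 50"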

On-host detection

If you have shell access to a candidate instance, run something quick before you change settings:

# Try IMDSv1 — if this returns data, v1 is still on
curl -s -o /dev/null -w "%{http_code}\n" \
  http://169.254.169.254/latest/meta-data/instance-id

# Try IMDSv2 — should return 200 regardless of the HttpTokens setting
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id

To find callers on a host, auditd rules on connects to 169.254.169.254 plus ss -tnp snapshots usually identify the offending process. On a Kubernetes node, look at old DaemonSets and sidecars first.
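
If auditd feels heavy, an iptables LOG rule is a cruder but effective alternative. A sketch, assuming iptables and journald are present:

# Log every new outbound IMDS connection from host processes, with the
# calling UID. Noisy on busy hosts; delete the rule when done.
# (Bridged container traffic traverses FORWARD instead; check there too.)
iptables -I OUTPUT -d 169.254.169.254 -p tcp --dport 80 \
  -m conntrack --ctstate NEW -j LOG --log-prefix "imds-caller: " --log-uid

# Watch the kernel log and map UIDs back to services
journalctl -kf | grep imds-caller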

Migration Steps

The flow that has worked reliably for small and mid-size fleets:

1. Baseline and freeze new IMDSv1

Set the region's account-level defaults so anything launched from now on is IMDSv2-required, and mark the AMIs you control as v2-only:

# Default IMDS options for new instances in this region
aws ec2 modify-instance-metadata-defaults \
  --http-tokens required \
  --http-put-response-hop-limit 2 \
  --http-endpoint enabled

# Mark an AMI you own as IMDSv2-only
aws ec2 modify-image-attribute \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --imds-support v2.0

Use modify-image-attribute --imds-support v2.0 on each AMI you control. Once set, instances launched from that AMI get v2-required automatically. Note that this attribute is one-way: once an AMI is marked v2.0, it cannot be reverted.

Also set the launch template / Auto Scaling group launch template versions to require IMDSv2:

aws ec2 create-launch-template-version \
  --launch-template-id lt-0123456789abcdef0 \
  --source-version 1 \
  --launch-template-data '{
    "MetadataOptions": {
      "HttpTokens": "required",
      "HttpPutResponseHopLimit": 2,
      "HttpEndpoint": "enabled"
    }
  }'
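
Creating a new version does nothing by itself unless the ASG tracks $Latest; make it the default so future launches pick it up:

# Make the new version (2 here) the default
aws ec2 modify-launch-template \
  --launch-template-id lt-0123456789abcdef0 \
  --default-version 2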

This stops the bleeding. Old instances may still be on v1, but no new ones are.

2. Sort instances into waves

Pull the list of HttpTokens=optional instances and sort them into waves:

  • Wave 0 — disposable. Stateless workers, batch nodes, dev/test. Cheap to break, cheap to recreate. Migrate first.
  • Wave 1 — replaceable through autoscaling. ASG-managed web tiers, ECS/EKS nodes. New launches are already v2-required; old nodes get rotated out by simply triggering an instance refresh.
  • Wave 2 — stateful or hand-built. Bastions, databases on EC2, single-instance services, anything pet-shaped.

For waves 0 and 1, prefer rotation over modification — relaunch from updated launch templates rather than mutating live instances. Less risky, fewer surprises.

3. Optional: try optional → required with a hop bump

For a stateful instance you cannot easily relaunch, raise the hop limit first (so containers keep working), then flip tokens to required:

# Step A: bump hop limit while still allowing v1
aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-put-response-hop-limit 2 \
  --http-tokens optional \
  --http-endpoint enabled

# Verify everything still works for at least one full agent cycle
# (CloudWatch agent, SSM agent, your app, container credential lookups)

# Step B: require v2
aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-tokens required

Watch MetadataNoToken after step A — if any callers are still using v1, they will keep showing up in the metric. Fix or upgrade them before step B.

4. Roll Auto Scaling groups

After the launch template is updated:

aws autoscaling start-instance-refresh \
  --auto-scaling-group-name my-asg \
  --preferences '{"MinHealthyPercentage": 90, "InstanceWarmup": 300}'

For EKS managed node groups, the equivalent is updating the node group to a new launch template version and letting AWS drain and replace nodes. For ECS, update the capacity provider’s launch template and either drain instances or wait for natural turnover.
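
For EKS, a sketch of that node group update (cluster and node group names are placeholders):

# EKS cordons, drains, and replaces nodes against the new LT version
aws eks update-nodegroup-version \
  --cluster-name my-cluster \
  --nodegroup-name my-nodegroup \
  --launch-template id=lt-0123456789abcdef0,version=2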

5. Sweep and confirm

After each wave, re-run the inventory query and the MetadataNoToken check. Anything still on optional should have a name attached to it and a reason.
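
The sweep can be one loop that joins the two checks. A sketch, assuming GNU date (macOS wants date -u -v-7d) and your default region:

# For every still-optional running instance, print its 7-day IMDSv1 call count
for id in $(aws ec2 describe-instances \
    --filters "Name=metadata-options.http-tokens,Values=optional" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceId' --output text); do
  sum=$(aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 --metric-name MetadataNoToken \
    --dimensions Name=InstanceId,Value="$id" \
    --statistics Sum --period 86400 \
    --start-time "$(date -u -d '7 days ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --query 'sum(Datapoints[].Sum)' --output text)
  echo "$id: $sum v1 calls in the last 7 days"
done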

Want a one-shot read-only audit that tells you which of your EC2 instances still allow IMDSv1, plus a dozen other quiet AWS posture issues? That’s exactly what QuickCheck is built for. Skim a sample report before you decide.

Validation

After you flip an instance, you want fast confirmation it’s actually on v2 and nothing is silently failing.

Confirm v2-required at the API level

aws ec2 describe-instances \
  --instance-ids i-0123456789abcdef0 \
  --query 'Reservations[0].Instances[0].MetadataOptions'

Expected:

{
  "State": "applied",
  "HttpTokens": "required",
  "HttpPutResponseHopLimit": 2,
  "HttpEndpoint": "enabled",
  "HttpProtocolIpv6": "disabled",
  "InstanceMetadataTags": "disabled"
}

State: applied matters — pending means the change has not landed yet.

Confirm v1 is actually rejected on the host

# Should now return 401 Unauthorized
curl -s -o /dev/null -w "v1: %{http_code}\n" \
  http://169.254.169.254/latest/meta-data/instance-id

# Should return 200 with the instance ID
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  -w "\nv2: %{http_code}\n" \
  http://169.254.169.254/latest/meta-data/instance-id

v1: 401 and v2: 200 is the correct pair.

Confirm credentials still resolve

TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
ROLE=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/)
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/iam/security-credentials/$ROLE \
  | head -c 200; echo

You should see AccessKeyId, SecretAccessKey, Token, and Expiration.

Confirm app-level health

  • aws sts get-caller-identity from the instance using whichever SDK your workloads use.
  • Container credential lookups from inside one container per host, especially if you raised the hop limit; a quick check is sketched after this list.
  • ECS agent: curl -s http://localhost:51678/v1/metadata should still respond.
  • kubelet health: nodes still Ready, image pulls from ECR still work.
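
For the container check, a quick docker-based sketch (any image with curl works; curlimages/curl is just convenient):

# A bridge-networked container tries the token PUT. Success means the
# hop limit is sufficient for containerized workloads on this host.
docker run --rm curlimages/curl -sf -X PUT \
  "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 60" \
  && echo "  <- token OK: hop limit is sufficient" \
  || echo "blocked: raise the hop limit to 2"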

Confirm MetadataNoToken is zero

After 24–48 hours on v2-required, MetadataNoToken should be a flat zero line. If not, something is still calling v1 — which now means it is failing. Find it.

Rollback

You want this written down before you need it.

Per-instance rollback is one CLI call:

aws ec2 modify-instance-metadata-options \
  --instance-id i-0123456789abcdef0 \
  --http-tokens optional \
  --http-put-response-hop-limit 2 \
  --http-endpoint enabled

That re-enables IMDSv1 immediately, no instance restart required. It is the same call you used to flip forward — just with optional instead of required.

Launch template rollback: revert to the previous version.

aws ec2 modify-launch-template \
  --launch-template-id lt-0123456789abcdef0 \
  --default-version 1

Auto Scaling rollback: trigger another instance refresh against the previous LT version, or roll forward with a fixed template once you know what broke. Avoid the temptation to mutate live ASG instances; relaunch is cleaner.

For account-level defaults, you can relax them again, but generally should not. Once new instances are v2-required by default, leave that in place even if you have to roll back individual stragglers.

QuickCheck

If you’d rather not hand-roll the inventory queries and CloudWatch checks across every account and region, QuickCheck runs a read-only, one-shot review of your AWS posture and produces a plain-English report. IMDSv1 stragglers are one of the dozen things it surfaces — alongside open security groups, public S3, missing MFA on root, untagged keys, and a few other “you’d rather know” items. See an example in the sample report. It is not magic and not a replacement for proper cloud security tooling, but it is a fast way to know where you stand before you start migrating.

What This Is Not

To set expectations clearly:

  • This is not a penetration test. It is a configuration migration, not an adversarial exercise.
  • This is not a certification or compliance attestation. Migrating to IMDSv2 is a control improvement; it does not by itself constitute SOC 2, ISO 27001, PCI, or anything else. Your auditor still wants the artifacts they always want.
  • This is not a guarantee. Cloud security is a portfolio of controls. IMDSv2 closes one well-known SSRF-to-credentials path; it does not address misconfigured security groups, overly broad IAM policies, leaked long-lived keys, or vulnerable application code. Treat it as one item on the list.
  • This is not a substitute for moving workloads to IRSA / EC2 Pod Identity / ECS task roles where those fit. IMDSv2 makes instance metadata safer; per-workload identity is still the better long-term answer for containers.

Migrate to IMDSv2 because it is cheap, well-understood, and removes a real foot-gun. Then keep going.


About Tuck Sentinel

Tuck Sentinel is the security-focused side of an indie operator workshop by Rich Gibbs. It builds small, sharp tools — like QuickCheck — for founders and small teams who want a competent read of their cloud posture without an enterprise platform. The bias: fast, honest, read-only assessments and migrations you can actually finish.
