Memory Manager Hugepages Availability Verification #5759

@srikalyan

Enhancement Description

This enhancement proposes adding OS-level hugepage availability verification to the Memory Manager's Static policy during pod admission.

Problem Statement

The Memory Manager's Static policy tracks hugepage allocations only for Guaranteed QoS pods. Burstable pods can legitimately request hugepages through standard Kubernetes resource requests, but these allocations are not tracked by the Memory Manager for NUMA placement purposes.

This creates a tracking gap where:

  1. The scheduler approves a Guaranteed pod (node-level accounting shows availability)
  2. The Memory Manager's internal state shows hugepages as available
  3. But the OS has already allocated those hugepages to a Burstable pod
  4. The Guaranteed pod fails at runtime when hugepages are exhausted
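The sequence above can be illustrated with a toy model. This is only a sketch: the `node` type, its fields, and the page counts are hypothetical and are not taken from the kubelet code. It shows how a tracker that only counts Guaranteed pods diverges from what the OS actually has free.

```go
package main

import "fmt"

// node is a hypothetical model of one NUMA node's hugepage pool.
type node struct {
	totalPages   uint64 // pages preallocated on the node
	trackedPages uint64 // pages the tracker has accounted for (Guaranteed pods only)
	osUsedPages  uint64 // pages actually consumed, regardless of QoS class
}

// allocate consumes pages at the OS level; only Guaranteed pods update the tracker.
func (n *node) allocate(pages uint64, guaranteed bool) {
	n.osUsedPages += pages
	if guaranteed {
		n.trackedPages += pages
	}
}

// trackedFree is what the tracker believes is free.
func (n *node) trackedFree() uint64 { return n.totalPages - n.trackedPages }

// osFree is what the OS would actually report as free.
func (n *node) osFree() uint64 { return n.totalPages - n.osUsedPages }

func main() {
	n := &node{totalPages: 8}
	n.allocate(6, false) // a Burstable pod takes 6 pages, invisible to the tracker

	request := uint64(4) // a Guaranteed pod now asks for 4 pages
	fmt.Println("tracker admits pod:", request <= n.trackedFree())
	fmt.Println("OS can satisfy pod:", request <= n.osFree())
}
```

In this model the tracker would admit the Guaranteed pod while the OS cannot back the request, which is the gap the list describes.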

Real-world example from #134395:

  • Memory Manager internal state: 15.2 GB free hugepages
  • Actual OS state (sysfs): 3.2 GB free hugepages
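For reference, the OS-side figure can be read from the per-NUMA-node `free_hugepages` files that Linux exposes in sysfs. The following is a minimal sketch, not the code from the issue: the function names and the `sysfsRoot` parameter (added so the path can be redirected in tests) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// freeHugepagesPath builds the per-NUMA-node sysfs path for one page size,
// e.g. /sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages.
func freeHugepagesPath(sysfsRoot string, numaNode int, pageSizeKB uint64) string {
	return filepath.Join(sysfsRoot, "devices", "system", "node",
		fmt.Sprintf("node%d", numaNode), "hugepages",
		fmt.Sprintf("hugepages-%dkB", pageSizeKB), "free_hugepages")
}

// parseFreeHugepages converts the file contents (a decimal count plus newline).
func parseFreeHugepages(raw string) (uint64, error) {
	return strconv.ParseUint(strings.TrimSpace(raw), 10, 64)
}

// readFreeHugepages returns the number of free pages of the given size on a node.
func readFreeHugepages(sysfsRoot string, numaNode int, pageSizeKB uint64) (uint64, error) {
	data, err := os.ReadFile(freeHugepagesPath(sysfsRoot, numaNode, pageSizeKB))
	if err != nil {
		return 0, err
	}
	return parseFreeHugepages(string(data))
}

func main() {
	free, err := readFreeHugepages("/sys", 0, 2048)
	if err != nil {
		fmt.Println("could not read free hugepages:", err)
		return
	}
	fmt.Printf("node0: %d free 2048 kB hugepages\n", free)
}
```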

Proposed Solution

  1. cAdvisor: Add a FreePages field to the HugePagesInfo struct, populated from sysfs
  2. Memory Manager: Verify OS-reported free hugepages during Allocate() before admitting pods

Related Links

Graduation Criteria

  • Alpha (v1.36): Feature gate MemoryManagerHugepagesVerification, unit tests, e2e tests
  • Beta (v1.37): Metrics, user feedback incorporated
  • GA (v1.38): Feature enabled by default

/sig node
/kind feature

Metadata

Labels

  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lead-opted-in: Denotes that an issue has been opted in to a release
  • sig/node: Categorizes an issue or PR as relevant to SIG Node.
  • stage/alpha: Denotes an issue tracking an enhancement targeted for Alpha status

Projects

Status

At risk for PRR freeze
