Go Runtime Metrics: CPU, GC, Memory, Scheduler

Go exposes many runtime metrics through runtime/metrics, and many metrics clients (Prometheus, VictoriaMetrics) use it to export Go runtime stats from your application. These metrics include things like:

how much memory your Go runtime has mapped,

how much is used by heap objects,

how much is free,

how much has been returned to the OS,

how much time goroutines have spent waiting for mutex locks,

etc.

Getting Go runtime stats in a few lines:

go
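package main

import (
    "fmt"
    "runtime/metrics"
)

func main() {
    s := []metrics.Sample{{Name: "/sched/goroutines:goroutines"}}
    metrics.Read(s)

    // KindBad means this Go version does not support the metric.
    if s[0].Value.Kind() == metrics.KindBad {
        fmt.Println("metric not supported")
        return
    }
    fmt.Println("goroutines:", s[0].Value.Uint64())
}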
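
The set of metrics changes between Go releases, so it can help to ask the runtime what it supports before hard-coding a name. Here is a minimal sketch using metrics.All; the helper name supportedNames is only for illustration:

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// supportedNames returns the name of every metric this Go version exposes.
func supportedNames() []string {
	descs := metrics.All()
	names := make([]string, 0, len(descs))
	for _, d := range descs {
		names = append(names, d.Name)
	}
	return names
}

func main() {
	for _, d := range metrics.All() {
		// Cumulative is true for counters that only go up.
		fmt.Printf("%-55s kind=%d cumulative=%t\n", d.Name, d.Kind, d.Cumulative)
	}
	fmt.Println("total:", len(supportedNames()))
}
```

The output depends on the Go version you build with, so newer metrics in this cheat sheet may be missing on older toolchains.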

Below is the Go runtime metrics cheat sheet.

NOTES

Gauge means a value right now.

Cumulative means a counter that only goes up.

Estimated means the runtime is measuring a real quantity (often time/bytes) but can’t measure it exactly, so it computes a best-effort estimate (usually close, but not guaranteed exact).

Approximate means the runtime is counting things, but the count itself is intentionally rough (may miss or double-count due to sampling/races/aggregation), so it may not match other “counts” of the same thing.

CGO

This number is only relevant if your program uses cgo, which means calling C from Go.

/cgo/go-to-c-calls:calls (uint64, cumulative) How many calls were made from Go into C since the process started.

CPU

These metrics split up where CPU time goes inside the Go process. They are cumulative (they only go up), so you usually look at how fast they increase over time. These are estimates, so they are most useful when comparing with other /cpu/classes metrics.

/cpu/classes/gc/mark/assist:cpu-seconds (float64, cumulative, estimated) CPU time your goroutines spent doing some of the garbage collector’s “marking” work (figuring out which objects are still reachable) while your program runs.

During a GC cycle, the runtime needs to scan a certain amount of memory and if your program is allocating quickly, the GC may not scan fast enough to finish on time. In that case, the runtime can make the goroutine that is allocating memory stop briefly and do a small amount of marking work before it can continue. If this grows quickly, allocations may feel slower because your goroutines are being used for GC work more often.

/cpu/classes/gc/mark/dedicated:cpu-seconds (float64, cumulative, estimated) CPU time spent doing GC marking work by GC worker goroutines that the runtime runs specifically for GC. These are dedicated GC workers: when they are running, that CPU time is going to GC instead of your application code.

/cpu/classes/gc/mark/idle:cpu-seconds (float64, cumulative, estimated) CPU time spent running the GC’s marking phase using spare CPU time, when the scheduler had nothing else useful to run. This is GC work happening in the “gaps” when your program is not fully using the available CPU.

It usually hurts throughput less than dedicated GC CPU, and subtracting this from total GC CPU gives a rough idea of how much GC work competed directly with your application.

/cpu/classes/gc/pause:cpu-seconds (float64, cumulative, estimated) CPU time attributed to GC “stop-the-world” pauses, when the runtime briefly stops all goroutines so the GC can safely do certain work. It is computed as (GOMAXPROCS * pause time), because during the pause your whole Go process is effectively not doing normal work. If this grows quickly, GC pauses may be contributing to latency spikes.

For the pause durations themselves (in seconds), see /sched/pauses/total/gc:seconds.

/cpu/classes/gc/total:cpu-seconds (float64, cumulative, estimated) Total CPU time spent doing garbage collection work. This is the sum of assist, dedicated, idle, and pause.

/cpu/classes/idle:cpu-seconds (float64, cumulative, estimated) CPU time that was available to the Go scheduler but wasn’t used. If this grows fast, your process had unused CPU time. It may be waiting on something else like I/O, locks, or the OS.

/cpu/classes/scavenge/assist:cpu-seconds (float64, cumulative, estimated) CPU time spent quickly returning unused Go heap memory back to the OS when the runtime decides memory is tight.

This is work done to shrink the process sooner (so the OS or your container can reclaim RAM), and it can cost CPU. If this rises quickly, Go is spending more time trading CPU for a smaller memory footprint.

/cpu/classes/scavenge/background:cpu-seconds (float64, cumulative, estimated) CPU time spent in the background returning unused memory to the OS. This is the “low priority” version of scavenging. It usually matters less for program speed than assist scavenging.

/cpu/classes/scavenge/total:cpu-seconds (float64, cumulative, estimated) Total CPU time spent returning unused memory to the OS. This is the sum of assist and background.

/cpu/classes/total:cpu-seconds (float64, cumulative, estimated) Total CPU time “budget” available to this Go process over time, based on GOMAXPROCS. A simple way to think about it is: if GOMAXPROCS is 4, this number grows by about 4 CPU-seconds each second of wall time.

Compare this with user and idle to see whether the process is actually using the CPU time it could use.

/cpu/classes/user:cpu-seconds (float64, cumulative, estimated) CPU time spent running your application’s Go code (your goroutines doing your work). A small amount of Go runtime work may be included. Compare this with total to estimate how much of the available CPU time is going to your code.
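
To make the comparisons above concrete, here is a rough sketch that reads several /cpu/classes metrics in one call and prints each as a share of the total. The helper names share and readCPUClasses are assumptions for this example. It uses values since process start; in a real dashboard you would normally compare the increase between two readings instead.

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// share returns part/total as a fraction, or 0 when total is not positive.
func share(part, total float64) float64 {
	if total <= 0 {
		return 0
	}
	return part / total
}

// readCPUClasses reads the given float64 metrics in one call.
// ok is false if any of them is not a float64 on this Go version.
func readCPUClasses(names ...string) (vals []float64, ok bool) {
	samples := make([]metrics.Sample, len(names))
	for i, n := range names {
		samples[i].Name = n
	}
	metrics.Read(samples)
	vals = make([]float64, len(names))
	for i, s := range samples {
		if s.Value.Kind() != metrics.KindFloat64 {
			return nil, false
		}
		vals[i] = s.Value.Float64()
	}
	return vals, true
}

func main() {
	v, ok := readCPUClasses(
		"/cpu/classes/total:cpu-seconds",
		"/cpu/classes/user:cpu-seconds",
		"/cpu/classes/gc/total:cpu-seconds",
		"/cpu/classes/gc/mark/idle:cpu-seconds",
		"/cpu/classes/idle:cpu-seconds",
	)
	if !ok {
		fmt.Println("cpu class metrics not supported")
		return
	}
	total, user, gc, gcIdle, idle := v[0], v[1], v[2], v[3], v[4]
	fmt.Printf("user: %.1f%%\n", 100*share(user, total))
	fmt.Printf("idle: %.1f%%\n", 100*share(idle, total))
	// GC work that competed with the application: total GC minus idle-time marking.
	fmt.Printf("competing GC: %.1f%%\n", 100*share(gc-gcIdle, total))
}
```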

GC

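The garbage collector, also called GC, frees heap memory that is no longer reachable. The heap holds values that cannot stay on a goroutine's stack, for example because they outlive the function that created them; many values created with make or new end up there, but not all. GC work and allocation rates are often the first place to look when memory grows.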

/gc/cleanups/executed:cleanups (uint64, cumulative, approximate) How many cleanup functions have run since start. A cleanup is a function registered with runtime.AddCleanup, which the runtime runs later after an object becomes unreachable (usually to release some underlying resource). If this is much smaller than “queued”, many cleanups are still waiting to run.

/gc/cleanups/queued:cleanups (uint64, cumulative, approximate) How many cleanup functions have been queued to run since start. Subtract “executed” from this to estimate how many are waiting.

/gc/cycles/automatic:gc-cycles (uint64, cumulative) How many GC cycles the runtime started by itself. The runtime triggers these when it decides it is time to collect garbage (based on how the heap is growing and the current GC settings). If this rises quickly, your app is spending more CPU time on garbage collection.

/gc/cycles/forced:gc-cycles (uint64, cumulative) How many GC cycles were started because your program explicitly asked for one (for example, by calling runtime.GC). This is common in tests or debugging. In production, forcing GC often adds extra work and can increase latency if done frequently.

/gc/cycles/total:gc-cycles (uint64, cumulative) Total number of GC cycles since start. This includes automatic and forced cycles. If this rises quickly, GC is running often.

/gc/finalizers/executed:finalizers (uint64, cumulative) How many finalizer functions have actually run since start. A finalizer is a function registered with runtime.SetFinalizer that the runtime may run later, after an object becomes unreachable, usually to release some underlying resource.

Finalizers run on a single runtime goroutine, so if finalizers are slow or numerous, this number may grow slowly and a backlog can build up.

/gc/finalizers/queued:finalizers (uint64, cumulative) How many finalizer functions have been queued up to run since start (because their objects became unreachable). Subtract “executed” from this to estimate how many are waiting.

If the gap keeps growing, finalizers are piling up faster than they are being processed, which can add CPU work and make resource release timing less predictable.
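
The "queued minus executed" estimate can be sketched as below. The helper names backlog and readPair are made up for this example, and the code assumes the finalizer metrics may be missing on older Go versions, so it checks the value kind first:

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// backlog estimates how many items are still waiting, given cumulative
// queued and executed counters. It clamps at 0 as a guard against an
// executed count that is ahead of the queued count.
func backlog(queued, executed uint64) uint64 {
	if executed > queued {
		return 0
	}
	return queued - executed
}

// readPair reads two uint64 metrics in a single metrics.Read call.
func readPair(a, b string) (x, y uint64, ok bool) {
	s := []metrics.Sample{{Name: a}, {Name: b}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindUint64 || s[1].Value.Kind() != metrics.KindUint64 {
		return 0, 0, false // not supported by this Go version
	}
	return s[0].Value.Uint64(), s[1].Value.Uint64(), true
}

func main() {
	q, e, ok := readPair("/gc/finalizers/queued:finalizers", "/gc/finalizers/executed:finalizers")
	if !ok {
		fmt.Println("finalizer metrics not supported by this Go version")
		return
	}
	fmt.Println("finalizers waiting:", backlog(q, e))
}
```

The same two helpers work for the cleanup metrics if you pass /gc/cleanups/queued:cleanups and /gc/cleanups/executed:cleanups instead.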

/gc/gogc:percent (uint64, gauge) Your GC heap growth target (a percentage, default 100). Higher means “allow the heap to grow more between GC cycles”, which can reduce CPU spent on GC but use more memory.

Lower means GC runs more often and uses more CPU. This is controlled by the GOGC setting (or runtime/debug.SetGCPercent).

/gc/gomemlimit:bytes (uint64, gauge) The Go runtime’s memory limit for the process. When set, Go tries harder to keep memory below this level. This is controlled by the GOMEMLIMIT setting.

If this is set low, you may see more GC activity and more memory being returned to the OS.

/gc/heap/allocs-by-size:bytes (float64 histogram, cumulative) Histogram of heap allocations by approximate size (bucket counts only go up). It helps you see whether you allocate mostly small objects or frequently allocate large ones.

Note: this does not include tiny objects counted by /gc/heap/tiny/allocs:objects (it counts tiny blocks instead).

/gc/heap/allocs:bytes (uint64, cumulative) Total heap bytes your program has allocated since start. This is a lifetime total (it never goes down), so it tells you how much allocation work is happening over time, not how much memory you are using right now.

/gc/heap/allocs:objects (uint64, cumulative) Total number of heap allocation events since start. The rate of increase tells you how frequently your program allocates objects; a high rate is often caused by many small, short-lived allocations.

Note: tiny allocations are not counted as individual objects here; see /gc/heap/tiny/allocs:objects.

/gc/heap/frees-by-size:bytes (float64 histogram, cumulative) Histogram of allocation sizes whose memory was freed by the GC. Comparing this with allocs by size can hint at which sizes tend to die quickly.

If most frees are small, your workload may create many short-lived small objects.

/gc/heap/frees:bytes (uint64, cumulative) Total heap bytes that the garbage collector has freed since start. This is a lifetime total (it never goes down), so compare how fast it grows with allocs to understand churn.

If frees grows much slower than allocs, your program is likely keeping more data alive (the live heap is growing).
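
As a rough illustration of comparing the two counters, the sketch below reads both in one call and prints the difference, which is the number of bytes allocated but not yet freed (live data plus garbage the GC has not reclaimed yet). The helper name heapChurn is an assumption for this example:

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// heapChurn reads lifetime allocated and freed heap bytes in one call.
func heapChurn() (allocs, frees uint64, ok bool) {
	s := []metrics.Sample{
		{Name: "/gc/heap/allocs:bytes"},
		{Name: "/gc/heap/frees:bytes"},
	}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindUint64 || s[1].Value.Kind() != metrics.KindUint64 {
		return 0, 0, false
	}
	return s[0].Value.Uint64(), s[1].Value.Uint64(), true
}

func main() {
	allocs, frees, ok := heapChurn()
	if !ok {
		fmt.Println("metrics not supported")
		return
	}
	fmt.Printf("allocated: %d bytes, freed: %d bytes\n", allocs, frees)
	if allocs >= frees {
		fmt.Printf("not yet freed: %d bytes\n", allocs-frees)
	}
}
```

Tracking how this difference moves over many GC cycles is one way to spot the "frees grows slower than allocs" pattern described above.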

/gc/heap/frees:objects (uint64, cumulative) Total number of heap allocations whose storage was freed by the garbage collector since start. The rate tells you how many objects are dying and getting reclaimed over time.

Note: tiny allocations are not counted as individual objects here; see /gc/heap/tiny/allocs:objects.

/gc/heap/goal:bytes (uint64, gauge) The heap size target for the end of the current GC cycle. The GC paces its work so the cycle finishes before the total heap grows past this goal. If the goal is low, GC may run more often to keep memory down.

/gc/heap/live:bytes (uint64, gauge) How many heap bytes were still reachable after the most recent GC cycle (the “live heap”). This is a good approximation of how much heap memory your program is actually using, not just allocating and freeing over time.

If this trends upward across many GC cycles, your program is keeping more data alive (for example, growing caches, slices, maps, or leaks).

/gc/heap/objects:objects (uint64, gauge) How many heap allocations are currently occupying heap memory (live objects plus objects not yet fully swept). This helps you see whether your live heap is made up of many small objects or fewer large ones.

If this grows steadily, you may be retaining more objects than expected (even if each object is small).

/gc/heap/tiny/allocs:objects (uint64, cumulative) Count of “tiny” allocations. On 64-bit Go, “tiny” here means very small, pointer-free allocations smaller than 16 bytes.

If this increases quickly, your program is doing lots of very small allocations.

/gc/limiter/last-enabled:gc-cycle (uint64, gauge) The GC cycle number when the runtime last enabled the GC CPU limiter. 0 means it has never been enabled.

The CPU limiter is a safety feature in the Go runtime. If GC starts taking too much CPU, the runtime backs off (often by letting the heap grow more) so your program can keep making progress. This can happen more often when you set a memory limit (GOMEMLIMIT).

/gc/pauses:seconds (float64 histogram, cumulative) Deprecated. Use /sched/pauses/total/gc:seconds instead.

/gc/scan/globals:bytes (uint64, gauge) How many bytes of package-level (global) data the GC needs to scan for pointers during a GC cycle.

Larger values mean the GC has more work every cycle even if you are not allocating much, which can increase GC CPU time.

/gc/scan/heap:bytes (uint64, gauge) How many heap bytes are “scannable” in the current GC cycle, meaning they may contain pointers the GC must follow. Pointer-free heap data is mostly skipped.

Higher values mean the GC has more to look through each cycle, which can increase GC CPU time.

/gc/scan/stack:bytes (uint64, gauge) How many bytes of goroutine stack the GC scanned in the last GC cycle (looking for pointers to heap objects).

This tends to grow with more goroutines and larger or deeper stacks, and it can add GC work even if the heap is stable.

/gc/scan/total:bytes (uint64, gauge) Approximate total bytes of memory the GC may need to scan for pointers. This is the sum of /gc/scan/globals, /gc/scan/heap, and /gc/scan/stack. Higher values usually mean more GC scanning work per cycle.

/gc/stack/starting-size:bytes (uint64, gauge) Starting stack size for new goroutines. Go stacks grow and shrink as needed, but every new goroutine begins with this initial stack reservation.

If this is larger, new goroutines use more memory up front, but they may need to grow their stacks less often.

MEMORY

These are “how much memory is in each bucket right now” gauges. The heap is memory used for objects. Stacks are per goroutine call stacks. “Metadata” is runtime bookkeeping needed to manage allocations.

/memory/classes/heap/free:bytes (uint64, gauge) Heap memory that is completely free and could be returned to the OS, but has not been returned yet. This is free space inside Go’s heap that is still backed by real memory.

If this is high, the process can look large at the OS level even though the heap has plenty of free space, and it may shrink later if the runtime returns memory to the OS.

/memory/classes/heap/objects:bytes (uint64, gauge) Heap memory occupied by objects: both live objects and objects that are already dead but have not been marked free yet.

This is the closest “how much heap is filled with objects right now” number. If this is high and stays high, your program is keeping a lot of data in memory, or the GC has not reclaimed it yet.

/memory/classes/heap/released:bytes (uint64, gauge) Heap memory that Go has returned to the OS. It is still part of the process address space, but the OS does not need to keep physical memory for it.

If this grows, Go is giving memory back to the OS. The process may look smaller in RSS, even if the total mapped address space does not change much.

/memory/classes/heap/stacks:bytes (uint64, gauge) Heap memory reserved for stacks. This includes goroutine stacks (and, in some cases, OS thread stacks), whether or not the full reserved space is currently being used.

If this is high, you may have many goroutines or goroutines that need larger stacks (deep call chains, large local variables), which can make stack memory a noticeable part of total memory.

/memory/classes/heap/unused:bytes (uint64, gauge) Heap memory reserved for heap objects but not currently holding objects. Think of it as empty space inside Go’s heap that is ready to be reused for future allocations.

If this is high while /memory/classes/heap/objects is low, your process may look larger than the live heap because Go is holding on to reserved heap space for reuse.

/memory/classes/metadata/mcache/free:bytes (uint64, gauge) Memory reserved for the runtime’s mcache bookkeeping (allocator caches), but not currently in use. The allocator keeps these caches so small allocations can be fast.

This is internal runtime overhead and usually not something you tune directly.

/memory/classes/metadata/mcache/inuse:bytes (uint64, gauge) Memory currently used by the runtime’s mcache bookkeeping (allocator caches).

If this is high, more memory is going to allocator cache bookkeeping, often because the program is doing lots of allocations across many goroutines and threads.

/memory/classes/metadata/mspan/free:bytes (uint64, gauge) Memory reserved for the runtime’s mspan bookkeeping (metadata used to track regions of the heap), but not currently in use.

This is internal allocator overhead. If it grows a lot, it can mean the heap is fragmented into many tracked pieces.

/memory/classes/metadata/mspan/inuse:bytes (uint64, gauge) Memory currently used by the runtime’s mspan bookkeeping (metadata used to track regions of the heap). If this is high, more memory is going to heap-tracking metadata, often because the heap has many active spans to manage.

/memory/classes/metadata/other:bytes (uint64, gauge) Other runtime metadata memory that doesn’t fit the other metadata buckets. This includes GC-related internal buffers and other runtime bookkeeping. If this is high, the runtime itself is using more memory overhead, even if your heap objects are not growing.

/memory/classes/os-stacks:bytes (uint64, gauge) Stack memory allocated directly by the OS for OS threads (not goroutine stacks). This is often 0 in pure Go programs, but can grow when using cgo or when many OS threads are created.

/memory/classes/other:bytes (uint64, gauge) Other runtime memory that doesn’t fit the other buckets. This includes things like execution trace buffers, runtime debugging structures, and internal bookkeeping for finalizers and profiling.

If this grows, it is often because tracing/debugging/profiling features are enabled or heavily used.

/memory/classes/profiling/buckets:bytes (uint64, gauge) Memory used by the runtime’s profiling data structures for tracking stack traces (a hash map of stack trace “buckets”). This can grow when you enable or heavily use profilers that record stack traces.

/memory/classes/total:bytes (uint64, gauge) Total read-write memory mapped by the Go runtime for this process. This is the sum of all the /memory/classes buckets (heap, stacks, and runtime metadata), and it does not include memory owned by C code via cgo.

This can be much larger than “heap objects” because it includes free space, reserved space, and runtime overhead.
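
To see how the total breaks down, here is a sketch that discovers every /memory/classes metric through metrics.All, reads them in a single call, and prints each bucket next to the reported total. The helper name memoryClasses is made up for this example:

```go
package main

import (
	"fmt"
	"runtime/metrics"
	"strings"
)

// memoryClasses reads every uint64 /memory/classes metric and returns the
// per-class values along with the reported total.
func memoryClasses() (classes map[string]uint64, total uint64) {
	const prefix = "/memory/classes/"
	const totalName = "/memory/classes/total:bytes"

	var samples []metrics.Sample
	for _, d := range metrics.All() {
		if strings.HasPrefix(d.Name, prefix) && d.Kind == metrics.KindUint64 {
			samples = append(samples, metrics.Sample{Name: d.Name})
		}
	}
	metrics.Read(samples)

	classes = make(map[string]uint64)
	for _, s := range samples {
		if s.Name == totalName {
			total = s.Value.Uint64()
			continue
		}
		classes[s.Name] = s.Value.Uint64()
	}
	return classes, total
}

func main() {
	classes, total := memoryClasses()
	var sum uint64
	for name, v := range classes {
		sum += v
		fmt.Printf("%-50s %12d\n", name, v)
	}
	fmt.Printf("sum of classes: %d, reported total: %d\n", sum, total)
}
```

Reading all buckets in one metrics.Read call matters here, because values read in separate calls can come from different moments and may not add up.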

SCHEDULER

The scheduler decides which goroutines run on which OS threads. “Runnable” means ready to run. “Running” means currently on CPU. “Waiting” means blocked, like on I/O, channels, or locks.

/sched/gomaxprocs:threads (uint64, gauge) Current GOMAXPROCS value, which limits how many OS threads may run Go code at the same time. If this is smaller than the available CPU cores, your program may not use all cores even if there is enough runnable work.

/sched/goroutines-created:goroutines (uint64, cumulative) Total number of goroutines created since start. This is a lifetime counter, not the number currently alive.

If this grows very fast, your program is creating goroutines frequently, which can add overhead (scheduling work, stack growth, and garbage).

/sched/goroutines/not-in-go:goroutines (uint64, gauge, approximate) Number of goroutines that are not running Go code because they are in a system call or a cgo call (either running in the kernel/C, or blocked there). If this is high, your program may be spending a lot of time waiting on the OS (disk, network, timers) or inside C code.

/sched/goroutines/runnable:goroutines (uint64, gauge, approximate) Goroutines that are ready to run, but are waiting for CPU time (not currently executing).

If this stays high, the program likely has more runnable work than CPU (or GOMAXPROCS) can handle, so goroutines are building up in the run queue.

/sched/goroutines/running:goroutines (uint64, gauge, approximate) Goroutines currently executing on CPU. This is capped by /sched/gomaxprocs:threads.

If this is near /sched/gomaxprocs:threads and runnable is also high, the program likely has more work than CPU, so extra goroutines are waiting their turn.
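
That check can be sketched as a small function. The name cpuSaturated and the "more runnable goroutines than GOMAXPROCS" threshold are assumptions made for this example, not something the runtime defines. The goroutine state metrics may not exist on older Go versions, so the code checks the value kind before using them:

```go
package main

import (
	"fmt"
	"runtime/metrics"
)

// cpuSaturated reports whether every slot is busy while more goroutines are
// queued than there are slots. The threshold is an arbitrary illustration.
func cpuSaturated(running, runnable, gomaxprocs uint64) bool {
	return gomaxprocs > 0 && running >= gomaxprocs && runnable > gomaxprocs
}

func main() {
	s := []metrics.Sample{
		{Name: "/sched/goroutines/running:goroutines"},
		{Name: "/sched/goroutines/runnable:goroutines"},
		{Name: "/sched/gomaxprocs:threads"},
	}
	metrics.Read(s)
	for _, m := range s {
		if m.Value.Kind() != metrics.KindUint64 {
			fmt.Println("not supported by this Go version:", m.Name)
			return
		}
	}
	running, runnable, procs := s[0].Value.Uint64(), s[1].Value.Uint64(), s[2].Value.Uint64()
	fmt.Printf("running=%d runnable=%d gomaxprocs=%d saturated=%t\n",
		running, runnable, procs, cpuSaturated(running, runnable, procs))
}
```

A single reading is noisy, so in practice you would look at whether this condition holds across many samples.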

/sched/goroutines/waiting:goroutines (uint64, gauge, approximate) Goroutines that are blocked, waiting on something (I/O, timers, channels, locks, etc.). If this is high or rises suddenly, many goroutines are stalled on a shared bottleneck. Check what they are blocked on.

/sched/goroutines:goroutines (uint64, gauge) Number of goroutines currently alive. If this grows and does not come back down, you may have goroutines that are stuck, leaking, or waiting forever.

/sched/latencies:seconds (float64 histogram, cumulative) Histogram of scheduling delay: how long goroutines stayed ready-to-run before they actually started running on CPU.

Higher values mean goroutines are waiting longer to get CPU time, usually due to CPU pressure (too much runnable work) or frequent wakeups that create bursts of runnable goroutines.

/sched/pauses/stopping/gc:seconds (float64 histogram, cumulative) Histogram of the “stopping” part of a GC stop-the-world event: how long it took to get all processors (Ps) to stop running Go code after the runtime decided to stop the world.

This is only the lead-up to the pause. The full pause time that affects latency is /sched/pauses/total/gc:seconds.

/sched/pauses/stopping/other:seconds (float64 histogram, cumulative) Histogram of the “stopping” part of a non-GC stop-the-world event: how long it took to get all processors (Ps) to stop.

This is only the lead-up to the full pause. See /sched/pauses/total/other:seconds for the total stop-the-world time for non-GC pauses.

/sched/pauses/total/gc:seconds (float64 histogram, cumulative) Histogram of total stop-the-world pause time for GC: from the moment the runtime requests the stop until the world is running again. This includes the time spent stopping all Ps and the time spent fully paused.

Higher values here directly show GC-related latency pauses that your program experiences.
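
Since the bucket counts are cumulative, pauses from hours ago stay in the histogram forever. To look only at a recent window, you can subtract an earlier snapshot from a later one. The sketch below does that; snapshot and countsDelta are helper names made up for this example, and runtime.GC is called only so the window contains at least one GC cycle:

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/metrics"
)

// snapshot reads a histogram metric and copies its counts, so the copy stays
// valid no matter what later reads do with the original memory.
func snapshot(name string) (counts []uint64, buckets []float64, ok bool) {
	s := []metrics.Sample{{Name: name}}
	metrics.Read(s)
	if s[0].Value.Kind() != metrics.KindFloat64Histogram {
		return nil, nil, false
	}
	h := s[0].Value.Float64Histogram()
	return append([]uint64(nil), h.Counts...), h.Buckets, true
}

// countsDelta returns the per-bucket counts added between two snapshots.
// It returns nil if the snapshots have different bucket layouts.
func countsDelta(prev, cur []uint64) []uint64 {
	if len(prev) != len(cur) {
		return nil
	}
	out := make([]uint64, len(cur))
	for i := range cur {
		if cur[i] > prev[i] {
			out[i] = cur[i] - prev[i]
		}
	}
	return out
}

func main() {
	const name = "/sched/pauses/total/gc:seconds"
	before, _, ok := snapshot(name)
	if !ok {
		fmt.Println("metric not supported by this Go version")
		return
	}
	runtime.GC()
	after, buckets, _ := snapshot(name)
	for i, c := range countsDelta(before, after) {
		if c > 0 {
			fmt.Printf("[%g, %g) s: %d pauses\n", buckets[i], buckets[i+1], c)
		}
	}
}
```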

/sched/pauses/total/other:seconds (float64 histogram, cumulative) Histogram of total stop-the-world pause time for non-GC reasons: from the moment the runtime requests the stop until the world is running again. This includes the time spent stopping all Ps and the time spent fully paused.

If you see large values here, your program is seeing stop-the-world pauses that are not caused by GC.

/sched/threads/total:threads (uint64, gauge) Number of live OS threads currently owned by the Go runtime. This can grow with cgo, blocking syscalls, or workloads that need many threads. Very large values can indicate lots of goroutines stuck in syscalls/cgo or heavy thread creation.

SYNC

Locks (like sync.Mutex and sync.RWMutex) protect shared data, but if many goroutines try to take the same lock, they will line up and wait. This is called lock contention, and it can limit throughput and increase latency.

/sync/mutex/wait/total:seconds (float64, cumulative, approximate) Total time goroutines spent blocked waiting to acquire locks (including sync.Mutex, sync.RWMutex, and some runtime-internal locks).

If this grows fast, lock contention is likely a meaningful cost in your program. The usual next step is to capture a mutex or block profile (runtime/pprof) to find the hottest locks and reduce shared locking or time spent inside the lock.
