jvm-gc-tuning
Diagnoses JVM garbage-collection behaviour under load: reads and interprets unified GC logs (-Xlog:gc*), selects the right collector (G1 vs ZGC vs Parallel vs Serial), tunes heap sizing and pause-time targets, quantifies allocation rate, and traces the GC-pause-to-latency-tail link using GCViewer and Java Flight Recorder (JFR). Use when a load test reveals p99/p999 latency spikes that correlate with GC activity, or when heap sizing and collector selection need justification before a performance baseline is locked.
jvm-gc-tuning
Overview
JVM garbage collection is a common hidden cause of latency tail in load tests. A JVM application that looks CPU-bound or I/O-bound in a flame graph may instead be stalling on stop-the-world (STW) pauses: the request thread is live but blocked while the collector scans live objects. This skill walks the full diagnostic workflow from log capture through collector selection to heap sizing and tool analysis.
Upstream: pair with flame-graph-analyzer when GC frames appear as wide leaves in the async-profiler output. Downstream: hand tuned JVM flags to the load runner (k6-load-testing, gatling-load-testing) to re-run the baseline after changes.
Step 1 - Enable unified GC logging
Java 9+ replaced the legacy -XX:+PrintGCDetails family with the unified logging framework. The canonical capture command, per the Java 21 java man page:
# Minimum useful capture (info level, timestamped, rotating files)
java -Xlog:gc*:file=gc.log:time,uptime,level,tags:filecount=5,filesize=20M \
-Xms4g -Xmx4g \
-jar app.jarRecommended tag set for GC diagnostics:
| Tag | What it adds |
|---|---|
gc | Every collection event + pause time |
gc+heap | Heap occupancy before/after each GC |
gc+phases | Phase breakdown inside each pause (G1/ZGC) |
gc+age | Object age histograms (tenuring threshold) |
gc+alloc+profiling | Top allocation call sites (requires -Xlog:gc+alloc+profiling=debug) |
For a single diagnostic capture that covers all these tags:
java -Xlog:gc*,gc+phases,gc+heap,gc+age,gc+alloc+profiling=debug:file=gc_full.log:time,uptime,pid,level \
-jar app.jarLegacy flags (-XX:+PrintGCDetails, -XX:+PrintGCTimeStamps) still compile in Java 21 but are deprecated. Migrate to -Xlog:gc* before Java 25.
Step 2 - Read the GC log: key signals
A G1 GC log line at info level looks like:
[2.415s][info][gc] GC(3) Pause Young (Normal) (G1 Evacuation Pause) 512M->320M(2048M) 12.453msFields:
| Field | Meaning |
|---|---|
2.415s | Uptime at pause start |
GC(3) | Monotonic GC event counter |
Pause Young (Normal) | Pause type (Young / Mixed / Full) |
512M->320M(2048M) | Heap before -> after (heap capacity) |
12.453ms | Wall-clock pause duration (STW time) |
Red flags to scan for first:
Step 3 - Select the right collector
Per the HotSpot GC Tuning Guide, Available Collectors:
| Situation | Collector | Flag |
|---|---|---|
| Small dataset (<= 100 MB heap) or single CPU | Serial | -XX:+UseSerialGC |
| Batch / offline job; throughput dominant; pauses >= 1 s acceptable | Parallel | -XX:+UseParallelGC |
| Server app; pause target 50-500 ms; default on modern JVMs | G1 | -XX:+UseG1GC (default) |
| Latency-critical service; pause target < 1 ms; heap 100 MB to 16 TB | ZGC (Generational) | -XX:+UseZGC -XX:+ZGenerational |
Oracle's guidance: "Allow the VM to select automatically unless you have strict pause-time requirements. Adjust heap size first before changing collectors." (Available Collectors)
Pause-time vs throughput tradeoff
Per the Ergonomics section: GC throughput overhead is calculated via GCTimeRatio:
max GC time fraction = 1 / (1 + GCTimeRatio)-XX:GCTimeRatio=19 targets at most 5% of CPU in GC (default for Parallel collector). Setting -XX:MaxGCPauseMillis too aggressively on G1 reduces young-gen size and increases GC frequency, which can hurt throughput even while individual pauses shrink. This is the core tradeoff: shorter pauses require more frequent collections.
ZGC eliminates this tradeoff for most workloads by doing nearly all work concurrently, at the cost of slightly higher CPU and memory overhead compared to G1 (ZGC documentation).
Step 4 - Tune G1 for a pause-time target
From the G1 GC Tuning section:
Baseline low-latency profile (target < 50 ms pauses)
-XX:+UseG1GC
-Xms4g -Xmx4g # fixed heap avoids resize pauses
-XX:MaxGCPauseMillis=50 # pause-time target (statistical goal, not hard cap)
-XX:GCPauseIntervalMillis=200 # measure target over 200 ms window
-XX:G1NewSizePercent=10 # smaller young gen -> shorter pauses
-XX:G1MaxNewSizePercent=30
-XX:+G1UseAdaptiveIHOP # let G1 learn IHOP from observed marking durationKey G1 flags
| Flag | Default | Effect |
|---|---|---|
-XX:MaxGCPauseMillis | 200 | Pause-time target; G1 sizes young gen to meet this with high probability |
-XX:InitiatingHeapOccupancyPercent | 45% | Triggers concurrent marking when old gen occupancy reaches this level |
-XX:G1HeapRegionSize | ergonomic (~2048 regions) | Region size 1-512 MB (power of 2); objects >= half this become humongous |
-XX:G1MixedGCLiveThresholdPercent | 85% | Regions with live occupancy above this are skipped during mixed GC |
-XX:G1HeapWastePercent | 5% | Space-reclamation phase ends when reclaimable space drops below this |
Important: G1 is not a real-time collector. The pause target is a statistical goal. Setting it below 10 ms without also switching to ZGC typically causes G1 to thrash with very frequent tiny collections.
Step 5 - Tune ZGC for sub-millisecond latency
From the ZGC section: ZGC performs all expensive work concurrently; pauses remain below 1 ms regardless of heap size. The primary tuning lever is -Xmx.
-XX:+UseZGC -XX:+ZGenerational # generational mode recommended
-Xms8g -Xmx8g
-XX:+AlwaysPreTouch # pre-page memory at startup (reduces first-request jitter)
-XX:SoftMaxHeapSize=7g # soft ceiling; GC keeps heap under this when possibleZGC does NOT benefit from manual tuning of pause-time targets - those flags are G1-specific. If ZGC pauses grow, increase -Xmx or reduce allocation rate.
Step 6 - Analyse allocation rate
High allocation rate is the primary driver of GC frequency. Allocation rate (MB/s) = heap consumed between GC events / time between events.
Compute from GC log heap lines:
# Extract heap-before values from gc.log
grep "Pause Young" gc.log \
| awk -F'[-> ()ms]+' '{print $5, $7, $9, $10}' \
# fields: before_MB after_MB capacity_MB pause_msAllocation rate signals:
| Rate | Implication |
|---|---|
| < 100 MB/s | Normal for most services |
| 100-500 MB/s | Watch for frequent young-gen collections |
| > 500 MB/s | Object pooling, streaming, or escape-analysis fixes needed |
| Sudden spike during load test | Request path creating large short-lived objects |
To identify the allocation call sites, enable gc+alloc+profiling:
-Xlog:gc+alloc+profiling=debug:file=alloc.log:timeOr use JFR (see Step 7) which captures allocation sites with lower overhead.
Step 7 - Use Java Flight Recorder (JFR) for deep analysis
JFR provides structured GC event data with nanosecond resolution and captures allocation stack traces. Per the JFR API documentation, JFR provides more context, better timing control, and richer event information than text log files.
Enable JFR during a load test:
java -XX:StartFlightRecording=duration=120s,filename=recording.jfr,settings=profile \
-jar app.jarKey JFR event categories for GC analysis:
| Event category | What to look for |
|---|---|
jdk.GarbageCollection | Duration, cause, GC ID per collection |
jdk.G1HeapSummary | Eden/survivor/old occupancy before and after |
jdk.ObjectAllocationInNewTLAB | Allocation call stacks (sampled) |
jdk.ObjectAllocationOutsideTLAB | Large object allocations (humongous) |
jdk.GCHeapSummary | Heap used/committed at collection start |
Analyse the .jfr file with JDK Mission Control (GUI) or the jfr CLI:
# Print GC pause summary from recording
jfr print --events jdk.GarbageCollection recording.jfr | head -100
# Print allocation call sites (top allocators)
jfr print --events jdk.ObjectAllocationInNewTLAB recording.jfr \
| sort -t'=' -k2 -rn | head -20Step 8 - Analyse with GCViewer
GCViewer parses GC log files and computes summary metrics (throughput, pause time statistics, allocation rate, promotion rate) from the same -Xlog:gc* output.
# Run GCViewer (requires Java)
java -jar gcviewer-1.36.jar gc.logKey metrics GCViewer surfaces:
| Metric | GC health interpretation |
|---|---|
| Throughput % | % of time NOT in GC; target > 95% for most services |
| Avg pause time | Mean STW duration; should be below MaxGCPauseMillis |
| Max pause time | Worst-case STW; the main driver of p99/p999 latency |
| Allocation rate (MB/s) | Matches analysis in Step 6 |
| Promotion rate (MB/s) | Objects escaping young gen; high rate fills old gen faster |
Step 9 - Trace the GC-pause-to-latency-tail link
A GC pause does not appear in CPU flame graphs (threads are suspended). To correlate GC pauses with HTTP latency spikes:
Output format
After completing the diagnostic, emit a findings block:
## JVM GC diagnosis - <service-name>
**Collector:** G1 / ZGC / Parallel / Serial
**Heap:** Xms=Ng / Xmx=Ng (fixed or dynamic)
**Log capture window:** Ns during load test at Nrps
### Pause summary
| Metric | Observed | Target |
|--------|----------|--------|
| Mean pause | Nms | MaxGCPauseMillis |
| p99 pause | Nms | - |
| Max pause (cause) | Nms (Pause Full / Humongous / ...) | - |
| Throughput (% non-GC) | N% | >95% |
### Allocation rate
N MB/s average; peak N MB/s at N rps load.
Top allocator: <class>.<method> (<evidence: JFR or alloc log>)
### Finding
<One sentence: what is wrong and why it causes latency tail.>
### Recommended changes
1. <Flag change with before/after value>
2. <Heap resize or allocation fix>
### Verification
Re-run load test with the same scenario; confirm p99 drops below threshold and
throughput % improves. Feed results to perf-budget-gate.Anti-patterns
| Anti-pattern | Why it fails | Fix |
|---|---|---|
| Tuning pause target without fixing allocation rate | Shorter target forces smaller young gen; collections become more frequent; throughput drops | Reduce allocation rate first; then lower pause target |
Setting -Xms much smaller than -Xmx | Heap resize pauses during warm-up inflate early p99 | Set Xms = Xmx in production |
| Switching to ZGC without increasing heap | ZGC needs more headroom for concurrent reclamation | Add 20-30% extra heap when migrating from G1 to ZGC |
Reading only Pause Full lines and ignoring Young pauses | Full GC is the most visible event but frequent young pauses add up to more cumulative stall | Compute cumulative GC time from all pause types |
| Diagnosing GC from flame graph alone | STW pauses do not appear in on-CPU profilers; threads are suspended | Correlate GC log timestamps with latency percentiles |