Skip to content

Add LockSupport TaskBlock profiling instrumentation#11545

Draft
kaahos wants to merge 7 commits into
masterfrom
paul.fournillon/wallclock-locksupport-taskblock
Draft

Add LockSupport TaskBlock profiling instrumentation#11545
kaahos wants to merge 7 commits into
masterfrom
paul.fournillon/wallclock-locksupport-taskblock

Conversation

@kaahos

@kaahos kaahos commented Jun 3, 2026

Copy link
Copy Markdown

What Does This Do

Add TaskBlock profiling instrumentation for LockSupport.park/unpark, including blocker and unblocking span attribution where available.

Motivation

LockSupport.park is widely used by executors, queues, schedulers, and virtual-thread internals. Capturing these park intervals as TaskBlock events makes parked time attributable to spans and easier to separate from active execution in wall-clock profiles.

Additional Notes

Contributor Checklist

  • Format the title according to the contribution guidelines
  • Assign the type: and (comp: or inst:) labels in addition to any other useful labels
  • Avoid using close, fix, or any linking keywords when referencing an issue
    Use solves instead, and assign the PR milestone to the issue
  • Update the CODEOWNERS file on source file addition, migration, or deletion
  • Update public documentation with any new configuration flags or behaviors
  • Add your completed PR to the merge queue by commenting /merge. You can also:
    • Customize the commit message associated with the merge with /merge --commit-message "..."
    • Remove your PR from the merge queue with /merge -c
    • Skip all merge queue checks with /merge -f --reason "reason"; please use this judiciously, as some checks do not run at the PR-level
    • Get more information in this doc

Jira ticket: [PROJ-IDENT]

@datadog-official

This comment has been minimized.

@kaahos kaahos force-pushed the paul.fournillon/wallclock-locksupport-taskblock branch from 804e4ef to c3fca82 Compare June 3, 2026 12:20
@kaahos kaahos force-pushed the paul.fournillon/wallclock-locksupport-taskblock branch from c3fca82 to db17e59 Compare June 5, 2026 08:47
@kaahos kaahos force-pushed the paul.fournillon/wallclock-locksupport-taskblock branch from db17e59 to b5ead17 Compare June 5, 2026 09:32
@kaahos kaahos force-pushed the paul.fournillon/wallclock-locksupport-taskblock branch from b5ead17 to 235547a Compare June 5, 2026 13:18
"profiling.ddprof.wall.precheck";
public static final boolean PROFILING_DATADOG_PROFILER_WALL_PRECHECK_DEFAULT = false;

public static final String PROFILING_TASK_BLOCK_UNBLOCKING_ENABLED =

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Sphinx Review — MEDIUM] PROFILING_TASK_BLOCK_UNBLOCKING_ENABLED and its _DEFAULT are defined but never read anywhere in the codebase. The unblocking-span attribution path (recordUnpark writing UNPARKING_SPAN) runs unconditionally with no kill switch. The constant is also absent from supported-configurations.json.

Suggestion: Either wire the flag into LockSupportHelper.recordUnpark() (guard UNPARKING_SPAN population when disabled) and register DD_PROFILING_TASKBLOCK_UNBLOCKING_ENABLED in supported-configurations.json, or delete both constants.

[CONSENSUS · concern: necessity · confidence: high]

"aliases": []
}
],
"DD_PROFILING_ASYNC_WALL_PRECHECK": [

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Sphinx Review — MEDIUM] Spurious metadata entry DD_PROFILING_ASYNC_WALL_PRECHECK: no profiling.async.wall.precheck property exists in the codebase. The only precheck config is profiling.ddprof.wall.precheck, which maps to DD_PROFILING_DDPROF_WALL_PRECHECK and is already present at line 2996. This phantom entry will be surfaced to docs generators and Helm chart consumers as a real env-var.

Suggestion: Remove the DD_PROFILING_ASYNC_WALL_PRECHECK block. The precheck flag is already exposed via DD_PROFILING_DDPROF_WALL_PRECHECK.

[concern: consistency · confidence: high]

} catch (Throwable ignored) {
// parkEnter failed (e.g. profiler not yet initialised, JNI error); do not track this park.
// Drain any stale unblocking-span entry so it is not mis-attributed to the next park.
UNPARKING_SPAN.remove(Thread.currentThread());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Sphinx Review — LOW] recordUnpark inserts an entry for the target thread on every traced unpark, but draining only happens inside an instrumented park return. The unpark-before-park idiom (a normal LockSupport pattern where the permit is set before the thread parks) leaves a stale entry that the next unrelated instrumented park on that thread will consume and mis-attribute as being unblocked by the wrong span.

Suggestion: Extend the existing 'Known limitation' javadoc to explicitly cover the unpark-before-park staleness window. For stronger correctness, stamp entries with a monotonic sequence and ignore stale entries on park return.

[concern: correctness · confidence: medium]

} else {
// Platform thread: parkExit() clears native parked state and records an eligible TaskBlock
// using the entry tick saved by parkEnter().
state.profiling.parkExit(state.blockerHash, unblockingSpanId);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Sphinx Review — LOW] Asymmetric exception handling in finish(ParkState,long): the virtual-thread branch wraps recordTaskBlockWithContext in try/catch(Throwable) but the platform-thread branch calls parkExit() unguarded. The advice suppress=Throwable covers the production call path, but the inconsistency is a latent reliability hole for direct callers.

Suggestion: Wrap the parkExit() call in try { ... } catch (Throwable ignored) {} to match the virtual-thread branch.

[concern: robustness · confidence: medium]

* TaskBlock} emitted on that thread.
*/
@Test
void stale_entry_is_drained_when_park_fires_without_active_span() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Sphinx Review — LOW] Drain-semantics coverage gap. The one-arg finish(ParkState) overload drains UNPARKING_SPAN before delegating; the public two-arg finish(ParkState,long) with null state returns early without touching the map. Tests exercise only the one-arg drain path.

Suggestion: Either add a test asserting finish(null, x) does NOT drain (documenting the intended asymmetry), or make the two-arg overload also drain on null state for uniform behavior.

[concern: completeness · confidence: medium]

ProfilerContext ctx = ProfilerContexts.of(AgentTracer.activeSpan());
if (ctx == null) {
// No active span - nothing to record.
UNPARKING_SPAN.remove(Thread.currentThread());

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Sphinx Review — LOW] The virtual-thread branch of captureState (if (VirtualThreads.isCurrent())) is never executed in unit tests — tests always run on platform threads. Inverting the condition would pass all tests silently. Span/root ID capture, getCurrentTicks(), and recordTaskBlockWithContext invocation have no deterministic unit coverage.

Suggestion: Add a unit test that constructs a virtual ParkState directly (5-arg constructor) and calls finish, asserting recordTaskBlockWithContext is invoked and that the spanId==0 guard suppresses emission when inactive.

[concern: test-adequacy · confidence: high · mutation-detected]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants