Add LockSupport TaskBlock profiling instrumentation#11545
Conversation
This comment has been minimized.
This comment has been minimized.
804e4ef to
c3fca82
Compare
c3fca82 to
db17e59
Compare
db17e59 to
b5ead17
Compare
b5ead17 to
235547a
Compare
| "profiling.ddprof.wall.precheck"; | ||
| public static final boolean PROFILING_DATADOG_PROFILER_WALL_PRECHECK_DEFAULT = false; | ||
|
|
||
| public static final String PROFILING_TASK_BLOCK_UNBLOCKING_ENABLED = |
There was a problem hiding this comment.
[Sphinx Review — MEDIUM] PROFILING_TASK_BLOCK_UNBLOCKING_ENABLED and its _DEFAULT are defined but never read anywhere in the codebase. The unblocking-span attribution path (recordUnpark writing UNPARKING_SPAN) runs unconditionally with no kill switch. The constant is also absent from supported-configurations.json.
Suggestion: Either wire the flag into LockSupportHelper.recordUnpark() (guard UNPARKING_SPAN population when disabled) and register DD_PROFILING_TASKBLOCK_UNBLOCKING_ENABLED in supported-configurations.json, or delete both constants.
[CONSENSUS · concern: necessity · confidence: high]
| "aliases": [] | ||
| } | ||
| ], | ||
| "DD_PROFILING_ASYNC_WALL_PRECHECK": [ |
There was a problem hiding this comment.
[Sphinx Review — MEDIUM] Spurious metadata entry DD_PROFILING_ASYNC_WALL_PRECHECK: no profiling.async.wall.precheck property exists in the codebase. The only precheck config is profiling.ddprof.wall.precheck, which maps to DD_PROFILING_DDPROF_WALL_PRECHECK and is already present at line 2996. This phantom entry will be surfaced to docs generators and Helm chart consumers as a real env-var.
Suggestion: Remove the DD_PROFILING_ASYNC_WALL_PRECHECK block. The precheck flag is already exposed via DD_PROFILING_DDPROF_WALL_PRECHECK.
[concern: consistency · confidence: high]
| } catch (Throwable ignored) { | ||
| // parkEnter failed (e.g. profiler not yet initialised, JNI error); do not track this park. | ||
| // Drain any stale unblocking-span entry so it is not mis-attributed to the next park. | ||
| UNPARKING_SPAN.remove(Thread.currentThread()); |
There was a problem hiding this comment.
[Sphinx Review — LOW] recordUnpark inserts an entry for the target thread on every traced unpark, but draining only happens inside an instrumented park return. The unpark-before-park idiom (a normal LockSupport pattern where the permit is set before the thread parks) leaves a stale entry that the next unrelated instrumented park on that thread will consume and mis-attribute as being unblocked by the wrong span.
Suggestion: Extend the existing 'Known limitation' javadoc to explicitly cover the unpark-before-park staleness window. For stronger correctness, stamp entries with a monotonic sequence and ignore stale entries on park return.
[concern: correctness · confidence: medium]
| } else { | ||
| // Platform thread: parkExit() clears native parked state and records an eligible TaskBlock | ||
| // using the entry tick saved by parkEnter(). | ||
| state.profiling.parkExit(state.blockerHash, unblockingSpanId); |
There was a problem hiding this comment.
[Sphinx Review — LOW] Asymmetric exception handling in finish(ParkState,long): the virtual-thread branch wraps recordTaskBlockWithContext in try/catch(Throwable) but the platform-thread branch calls parkExit() unguarded. The advice suppress=Throwable covers the production call path, but the inconsistency is a latent reliability hole for direct callers.
Suggestion: Wrap the parkExit() call in try { ... } catch (Throwable ignored) {} to match the virtual-thread branch.
[concern: robustness · confidence: medium]
| * TaskBlock} emitted on that thread. | ||
| */ | ||
| @Test | ||
| void stale_entry_is_drained_when_park_fires_without_active_span() { |
There was a problem hiding this comment.
[Sphinx Review — LOW] Drain-semantics coverage gap. The one-arg finish(ParkState) overload drains UNPARKING_SPAN before delegating; the public two-arg finish(ParkState,long) with null state returns early without touching the map. Tests exercise only the one-arg drain path.
Suggestion: Either add a test asserting finish(null, x) does NOT drain (documenting the intended asymmetry), or make the two-arg overload also drain on null state for uniform behavior.
[concern: completeness · confidence: medium]
| ProfilerContext ctx = ProfilerContexts.of(AgentTracer.activeSpan()); | ||
| if (ctx == null) { | ||
| // No active span - nothing to record. | ||
| UNPARKING_SPAN.remove(Thread.currentThread()); |
There was a problem hiding this comment.
[Sphinx Review — LOW] The virtual-thread branch of captureState (if (VirtualThreads.isCurrent())) is never executed in unit tests — tests always run on platform threads. Inverting the condition would pass all tests silently. Span/root ID capture, getCurrentTicks(), and recordTaskBlockWithContext invocation have no deterministic unit coverage.
Suggestion: Add a unit test that constructs a virtual ParkState directly (5-arg constructor) and calls finish, asserting recordTaskBlockWithContext is invoked and that the spanId==0 guard suppresses emission when inactive.
[concern: test-adequacy · confidence: high · mutation-detected]
What Does This Do
Add TaskBlock profiling instrumentation for
LockSupport.park/unpark, including blocker and unblocking span attribution where available.Motivation
LockSupport.parkis widely used by executors, queues, schedulers, and virtual-thread internals. Capturing these park intervals as TaskBlock events makes parked time attributable to spans and easier to separate from active execution in wall-clock profiles.Additional Notes
DD_TRACE_LOCK_SUPPORT_ENABLEDContributor Checklist
type:and (comp:orinst:) labels in addition to any other useful labelsclose,fix, or any linking keywords when referencing an issueUse
solvesinstead, and assign the PR milestone to the issue/merge. You can also:/merge --commit-message "..."/merge -c/merge -f --reason "reason"; please use this judiciously, as some checks do not run at the PR-levelJira ticket: [PROJ-IDENT]