Skip to content

Fix deadlock releasing TSFN after environment shutdown#480

Open
tetra-fox wants to merge 1 commit into
microsoft:mainfrom
tetra-fox:fix/tsfn-release-after-env-teardown
Open

Fix deadlock releasing TSFN after environment shutdown#480
tetra-fox wants to merge 1 commit into
microsoft:mainfrom
tetra-fox:fix/tsfn-release-after-env-teardown

Conversation

@tetra-fox

Copy link
Copy Markdown

Since nodejs/node#55877 (appeared in Node 24.14.0), environment shutdown finalizes a TSFN that still has use counts by releasing its resources while holding the TSFN mutex. Dropping the env reference there runs napi finalizers synchronously. The instance data finalizer disposes JSRuntimeContext, whose JSTsfnSynchronizationContext.Dispose() called napi_release_threadsafe_function on the same TSFN and re-locked a mutex the thread already holds. The JS thread deadlocks against itself. On older Node the stale release was a silent use-after-free. In VRCX (Electron 40 / .NET 9 / Linux) every quit hung the process:

Microsoft.JavaScript.NodeApi.Runtime.NodejsRuntime.ReleaseThreadSafeFunction(...)
Microsoft.JavaScript.NodeApi.Interop.JSThreadSafeFunction.Release()
Microsoft.JavaScript.NodeApi.Interop.JSTsfnSynchronizationContext.Dispose()
Microsoft.JavaScript.NodeApi.Interop.JSRuntimeContext.Dispose()
Microsoft.JavaScript.NodeApi.JSValue.FinalizeGCHandleToDisposable(...)

thread_finalize_cb fires when Node.js destroys the TSFN, before the instance data finalizer runs. The fix tracks it and skips the release once it has fired. The normal dispose path is unchanged, and both the callback and Dispose() run on the JS thread, so there is no race on the flag. The finalization docs describe this ordering and warn about use-after-free in napi_finalize callbacks.

Repro on Node.js >= 24.14.0 (Linux x64, stock mcr.microsoft.com/dotnet/runtime:9.0 container): npm install node-api-dotnet@0.9.21, then node -e "require('node-api-dotnet/net9.0')" never exits, parked on the stack above. With this change it exits immediately. On Node.js <= 24.13.0 the one-liner exits normally. Quit in VRCX hung 3/3 times on stock and exits cleanly 3/3 on this patch, and the existing suite passes.

Considered but left out: the same guard on the other TSFN entry points (Ref/Unref, Post/Send), and disposing JSRuntimeContext from a cleanup hook registered after the TSFN is created, so it runs while the TSFN is still alive (the pattern those docs recommend). The latter changes disposal timing during shutdown, so it seemed like a separate discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant