Fix memory leak and stale-model results when hot-swapping PredictionEnginePool#7626
Fix memory leak and stale-model results when hot-swapping PredictionEnginePool#7626kotlarmilos wants to merge 2 commits into
Conversation
25e5a39 to
dbaddcf
Compare
There was a problem hiding this comment.
Pull request overview
This PR improves the reliability of PredictionEnginePool hot-swapping under live traffic by ensuring pooled PredictionEngine instances are returned to (and disposed with) the correct pool generation, avoiding stale-model reuse and unbounded native-memory growth. It also adds test coverage for hot-swap behavior and pool disposal, and introduces an Azure DevOps “report green” pipeline for Build Analysis.
Changes:
- Add a generation-owning
DisposablePredictionEnginePooland updatePoolLoaderto route rentals/returns to the originating generation. - Make
PredictionEnginePooldisposable and add tests covering in-flight usage across a hot-swap and disposal behavior. - Add
build/report-green.ymlto emit a build.completed event for docs/workflow-only PRs.
Show a summary per file
| File | Description |
|---|---|
| test/Microsoft.Extensions.ML.Tests/PredictionEnginePoolTests.cs | Adds tests for hot-swap behavior and pool disposal. |
| src/Microsoft.Extensions.ML/PredictionEnginePool.cs | Implements IDisposable and routes rent/return through PoolLoader generation-aware APIs. |
| src/Microsoft.Extensions.ML/PoolLoader.cs | Introduces generation-aware Get/Return and swaps generations on reload. |
| src/Microsoft.Extensions.ML/DisposablePredictionEnginePool.cs | New pool implementation that disposes retained engines and drains on generation disposal. |
| build/report-green.yml | Adds an AzDO no-op pipeline for Build Analysis on build-excluded PRs. |
Copilot's findings
- Files reviewed: 5/5 changed files
- Comments generated: 6
dbaddcf to
e66c00f
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #7626 +/- ##
==========================================
+ Coverage 69.59% 69.60% +0.01%
==========================================
Files 1484 1484
Lines 273606 273764 +158
Branches 27949 27969 +20
==========================================
+ Hits 190410 190567 +157
+ Misses 75832 75816 -16
- Partials 7364 7381 +17
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
@kotlarmilos, I guess you already tried so asking mainly to get better context - there is a DisposableObjectPool already in the dotnet/aspnetcore and it should be actually used when we ask for a pool of a disposable type (see here) so is the problem really in the disposing and do we make the pool the right way or it's a bug in the pool we should report (not with fixing it temporarily with our own patched pool for sure)? Edit: I actually see the place where we create DefaultObjectPool directly. |
rosebyte
left a comment
There was a problem hiding this comment.
Left two comments to discuss.
…nginePool ## Summary Makes zero-downtime model hot-swapping in `Microsoft.Extensions.ML` safe under live traffic: `PredictionEnginePool` can now be reloaded continuously without leaking native memory, serving predictions from a stale model, or disposing a shared transformer out from under sibling engines. ## Problem `PredictionEngine<TData,TPrediction>` wraps native resources and is `IDisposable`. Three defects made hot-swap unsafe for a long-lived, high-throughput service: 1. **Native memory leak on reload.** `PoolLoader.LoadPool()` created the pool with `new DefaultObjectPool<>(...)` and, on reload, swapped in a new pool via `Interlocked.Exchange` without disposing the old one. `DefaultObjectPool<T>` never disposes the objects it retains, so every reload abandoned a generation of native engines to the GC finalizer (effectively unbounded growth at a 15-20 minute swap cadence). 2. **Cross-generation contamination (wrong-model results).** `ReturnPredictionEngine` always returned an engine to the *current* pool. An engine rented against the old model and returned after a swap landed in the *new* model's pool and was later handed to a caller, producing stale-model predictions. 3. **Shared-transformer double dispose.** `PredictionEnginePoolPolicy.Create()` built engines with the default `ownsTransformer: true`, so every engine in a generation believed it owned the single shared `ITransformer`. Once engines are actually disposed (e.g. on overflow), the first disposal tears down the shared model for all sibling engines. ## Change - `PoolLoader` now builds each pool generation with `DefaultObjectPoolProvider`, which returns the framework's `DisposableObjectPool` for `IDisposable` element types. That pool already disposes overflow engines, disposes engines returned after it has been disposed, and disposes everything it retains on `Dispose()` - so no custom pool type is needed. - `PoolLoader` is generation-aware: a `ConditionalWeakTable` records the generation each rented engine came from and routes it back to that exact generation on return, so an old-model engine is never mixed into the new pool. On reload the old generation is disposed atomically after the swap, and the generation's `ITransformer` is disposed with it (guarded by reference equality so a model still in use by the surviving generation is never disposed). - `PredictionEnginePoolPolicy.Create()` now passes `OwnsTransformer = false`; the shared model's lifetime is owned by `PoolLoader` (per generation) instead of by each engine. - `PredictionEnginePool` implements `IDisposable`, guards `GetPredictionEngine` against use after disposal, disposes (rather than throws on) engines returned after disposal, and disposes its loaders so the file/uri watchers and change-token registrations are torn down. ## Proof New tests: `pool_serves_predictions_across_a_hot_swap` (serve across an in-flight swap; stale engine not reused), `disposing_pool_releases_loader_resources`, and `pooled_engines_do_not_dispose_the_shared_model` (rents past the retention limit to force overflow disposal and asserts the shared transformer is never disposed; this test fails if `ownsTransformer` is left at its default of true). ``` Passed! - Failed: 0, Passed: 10, Skipped: 0, Total: 10 - Microsoft.Extensions.ML.Tests.dll (net8.0) ``` ## Risk Behavioral change limited to disposal/lifetime semantics. Public API addition only (`PredictionEnginePool` now `IDisposable`). No change to prediction results on the happy path; stale-model engines are simply no longer reused after a swap. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
e66c00f to
d421aab
Compare
…Pool Make the _disposed flag an int updated via Interlocked.Exchange so Dispose runs exactly once, and read it with Volatile.Read on all hot paths to avoid races between rent/return, reload callbacks, and disposal. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
anicka-net
left a comment
There was a problem hiding this comment.
Reviewed the concurrency model — ConditionalWeakTable for generation tracking, idempotent dispose, OwnsTransformer=false to prevent shared model disposal. Tests cover hot-swap, double-dispose, and pool overflow. LGTM.
Description
This PR makes zero-downtime model hot-swapping in Microsoft.Extensions.ML safe under live traffic. Before this change, reloading a PredictionEnginePool had three problems. It leaked native memory, because the pool was built with new DefaultObjectPool, which never disposes the prediction engines it holds, so every reload abandoned a generation of native engines. It could also serve predictions from the old model, because a returned engine always went back to the current pool, so an engine rented against the previous model could be handed out by the new pool after a swap. And it risked disposing the shared model out from under other engines, because each engine was created with ownsTransformer set to true and so believed it solely owned the single shared transformer.
This PR builds each pool generation with DefaultObjectPoolProvider, which returns the framework's DisposableObjectPool for disposable types and disposes engines for us. It routes each rented engine back to the exact generation it came from using a ConditionalWeakTable and disposes the old generation atomically on swap. It creates engines with ownsTransformer set to false, so PoolLoader owns the shared model's lifetime per generation and disposes it only when no surviving generation still uses it. PredictionEnginePool now also implements IDisposable, guards against use after disposal, and disposes its loaders so the file and uri watchers are torn down. New tests cover serving across an in-flight hot-swap, loader disposal, and a regression test that forces overflow disposal and confirms the shared model is never disposed.