[flat] Post-processing support and use DataProvider::InternalId#1160
Conversation
# Conflicts: # diskann-bftree/src/neighbors.rs
…t-process # Conflicts: # diskann-bftree/src/neighbors.rs # diskann-providers/src/model/graph/provider/async_/inmem/full_precision.rs # diskann-providers/src/model/graph/provider/async_/inmem/product.rs # diskann-providers/src/model/graph/provider/async_/inmem/provider.rs # diskann-providers/src/model/graph/provider/async_/inmem/scalar.rs # diskann-providers/src/model/graph/provider/async_/inmem/spherical.rs # diskann/src/graph/index.rs
There was a problem hiding this comment.
Pull request overview
This PR extends FlatIndex::knn_search to support post-processing via the shared diskann::graph::glue::SearchPostProcess interface, aligning flat search with the graph search pipeline. It also simplifies flat search ID plumbing by tying DistancesUnordered/SearchStrategy directly to the provider’s DataProvider::InternalId (and making DistancesUnordered require HasId).
Changes:
- Add post-processing support to
FlatIndex::knn_searchviaSearchPostProcess. - Simplify flat-search ID typing by removing
SearchStrategy::Idand makingDistancesUnordered: HasId, usingP::InternalIdend-to-end. - Extend flat test harness/tests to validate post-processing behavior (including an “even IDs only” post-processor).
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| diskann/src/flat/test/provider.rs | Updates test visitor/strategy implementations for the new HasId-based ID wiring. |
| diskann/src/flat/test/harness.rs | Adds an oracle/processor abstraction and post-processing test utilities. |
| diskann/src/flat/test/cases/flat_knn_search.rs | Updates baseline flat knn test to use the new harness API (processor-aware). |
| diskann/src/flat/strategy.rs | Makes DistancesUnordered require HasId and ties SearchStrategy visitor IDs to P::InternalId. |
| diskann/src/flat/mod.rs | Updates module docs to reflect the shared post-processing interface. |
| diskann/src/flat/index.rs | Adds the SearchPostProcess hook to FlatIndex::knn_search and updates tests accordingly. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1160 +/- ##
==========================================
+ Coverage 89.46% 89.47% +0.01%
==========================================
Files 487 487
Lines 92106 92155 +49
==========================================
+ Hits 82401 82455 +54
+ Misses 9705 9700 -5
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
This PR adds support to do post-processing in the
FlatIndex::knn_searchvia thediskann::glue::SearchPostProcesstrait; which is the same interface used by the graph search API.In addition, I had left a hook in the flat search
SearchStrategyandDistancesUnorderedto support ID types that were not convertible to and from scalar-like values. With #1145, that constraint is now gone inVectorIdand we are free to correctly tie the ID type in theDistancesUnorderedimplementation used by the flat index to theInternalIdof the underlyingDataProvider. This means:DistancesUnorderednow hasHasIdas a super-trait.SearchStrategyno longer needsIdas an associated type.Testing
Added testing in the test module to make sure post-processing works as it should via a custom post-processor that only keeps even ids. Added some extra code in the
test::harnessmodule just to make the dispatch to post-processing generic enough so I can enumerate easily over cases.For reviewers: Review the
flat\strategy.rschanges first, thenflat\index.rs. Optionally, then, review the testing code.