Add a FilteredAccessor for filtered search.#1141
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces a new glue::FilteredAccessor abstraction to make filtered graph search “filter-aware” at the accessor layer, and refactors multi-hop filtered search to use this new trait. It also relocates the legacy QueryLabelProvider-based filtering hook into a new graph::ext::labeled compatibility module with adapter types so existing call sites can be wrapped with minimal churn.
Changes:
- Add
Decision<T>,ExpansionKind, and a newglue::FilteredAccessortrait for filter-aware expansion and start-point handling. - Refactor
MultihopSearchto operate onFilteredAccessorand fix start-point handling (rejected start points now flow into two-hop expansion). - Add
graph::ext::labeledcompatibility adapters (strategy + accessor wrapper) and update affected crates/tests; remove distance-tweaking callback filtering test coverage and baselines.
Reviewed changes
Copilot reviewed 23 out of 25 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann/test/generated/graph/test/cases/multihop/even_filtering_grid.json | Updates baseline stats due to corrected start-point handling in multihop filtered search. |
| diskann/test/generated/graph/test/cases/multihop/callback_filtering_grid.json | Removes baseline for dropped “distance tweaking” callback-filtering behavior. |
| diskann/src/internal/counter.rs | Adds local counter value accessor used by tests to assert distance computation counts. |
| diskann/src/graph/test/provider.rs | Exposes get_vector_count() for test assertions around distance computation behavior. |
| diskann/src/graph/test/cases/multihop.rs | Refactors tests to use ext::labeled compatibility layer and updates expectations for rejected start points; removes callback-filtering tests. |
| diskann/src/graph/search/range_search.rs | Tightens SearchStrategy bounds to require SearchAccessor: SearchAccessor for range search. |
| diskann/src/graph/search/multihop_search.rs | Rewrites multihop search to use FilteredAccessor and Decision-based acceptance/rejection. |
| diskann/src/graph/search/knn_search.rs | Tightens SearchStrategy bounds to require SearchAccessor: SearchAccessor for KNN search. |
| diskann/src/graph/search/diverse_search.rs | Tightens SearchStrategy bounds and imports SearchAccessor for diverse search. |
| diskann/src/graph/mod.rs | Adds graph::ext module. |
| diskann/src/graph/index.rs | Removes QueryLabelProvider/QueryVisitDecision from graph::index and updates SearchStrategy bounds for paged search. |
| diskann/src/graph/glue.rs | Adds Decision, ExpansionKind, and FilteredAccessor; relaxes SearchStrategy::SearchAccessor associated type bound to HasId. |
| diskann/src/graph/ext/mod.rs | Introduces graph::ext module root. |
| diskann/src/graph/ext/labeled.rs | Adds QueryLabelProvider and adapters (Filtered strategy + FilteredAccessor wrapper + post-process step) for legacy filter integration. |
| diskann-providers/src/model/graph/provider/layers/betafilter.rs | Updates imports to new ext::labeled::QueryLabelProvider location. |
| diskann-providers/src/model/graph/provider/async_/inmem/spherical.rs | Adjusts SearchStrategy bounds with explicit associated-type constraint for SearchAccessor. |
| diskann-providers/src/model/graph/provider/async_/inmem/scalar.rs | Adjusts SearchStrategy bounds with explicit associated-type constraint for SearchAccessor. |
| diskann-providers/src/index/wrapped_async.rs | Updates SearchStrategy bounds and refactors internal “noawait” paged-search wiring to avoid projection issues. |
| diskann-providers/src/index/diskann_async.rs | Updates imports and generic bounds to align with updated SearchStrategy associated-type constraints. |
| diskann-garnet/src/labels.rs | Updates imports to new ext::labeled::QueryLabelProvider location. |
| diskann-benchmark/src/utils/filters.rs | Updates imports to new ext::labeled::QueryLabelProvider location. |
| diskann-benchmark/src/backend/index/benchmarks.rs | Adjusts DefaultSearchStrategy bounds with explicit associated-type constraint for SearchAccessor. |
| diskann-benchmark-core/src/search/graph/range.rs | Adds a trait bound ensuring graph::search::Range implements graph::Search for the selected strategy. |
| diskann-benchmark-core/src/search/graph/multihop.rs | Updates to new multihop API (filter moved into strategy wrapper) and import path changes. |
| diskann-benchmark-core/src/search/graph/knn.rs | Adds a trait bound ensuring graph::search::Knn implements graph::Search for the selected strategy. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1141 +/- ##
==========================================
- Coverage 89.47% 89.46% -0.01%
==========================================
Files 486 487 +1
Lines 92161 92170 +9
==========================================
+ Hits 82458 82463 +5
- Misses 9703 9707 +4
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
| type Id = A::Id; | ||
| } | ||
|
|
||
| impl<'a, A> glue::FilteredAccessor for FilteredAccessor<'a, A> |
There was a problem hiding this comment.
impl glue::FilteredAccessor for FilteredAccessor is confusing.
What about naming the struct LabelFilteredAccessor or QueryLabelFilteredAccessor?
There was a problem hiding this comment.
It's a bit confusing, but I find there are often naming issues that arise like this. The FilteredAccessor's main job is to implement the FilteredAccessor trait. It is in the labeled module and not reexported, so can be used as labeled::FilteredAccessor. The problem I have with adding prefixes is that you end up with something like labeled::LabelFilterAccessor, but where do you stop? You could to graph::labeled::LabelGraphFilterAccessor etc.
I'm slowly settling on a pattern where I make the module path part of the identity of the type, but recognize I could go over-board on this sometimes. If you feel strongly about it, I'm happy to change.
| P: glue::HybridPredicate<Self::Id> + Send + Sync, | ||
| F: FnMut(Decision<Self::Id>, f32) + Send, | ||
| { | ||
| match kind { |
There was a problem hiding this comment.
I'm having trouble following this.
In the ExpansionKind::All case, we call decide (which calls is_match()). The the AcceptOnly case we wrap the predicate and then unconditional return Accept on everything. This appears to be because Wrapper folds the is_match into the eval.
I think this deserves a comment to lay out how the filter matching is happening. It's a little bit obtuse due to each branch handling it differently and with the wrapper types.
There was a problem hiding this comment.
Thanks Jack - I realized that this is way more obtuse than I originally thought (and I got it wrong originally as well). Because of these nuances, I've split the two traversal modes into two independent methods, with the AcceptOnly side taking an Accept struct directly where needed.
| I: VectorId, | ||
| { | ||
| fn eval_mut(&mut self, item: &I) -> bool { | ||
| // Oreder must be `label` -> `inner` because we have to know an ID is accepted before |
There was a problem hiding this comment.
Sorry to be picky but small typo here
Introduce a
FilteredAccessoras a filter aware version ofSearchAccessor.Motivation
Our current hook for performing filtered search relies on passing a
QueryLabelProviderexternally to all call sites. This has the following disadvantages:QueryLabelProvideris hard-coded to use the accessor's internal ID, which may not always be the best fit depending on context.QueryLabelProviderevaluations always live behind a trait object and are just queried one at a time.How
This PR boils down to the following enums and trait:
The existing multi-hop search is rewritten to use
FilteredAccessorand #1131 can be easily rewritten this way as well. This makes theFilteredAccessor"filter-aware" and allows the stated problems to be solved (the TOCTOU potential is really pushed down to the accessor, but that's the layer that knows the concurrency story so it's in a much better place to solve it anyways).To reduce code churn, I added a new module
diskann::graph::ext::labeledthat is the new home forQueryLabelProviderand introduces a wrappingFilteredAccessorand associated strategy. The functionality in this module is intended as a compatibility layer for code that is currently relying onQueryLabelProvider.Two new
expand_*methodsUnfortunately, the semantics of filtered search for
MultiHopFilteredSearchbasically requires the use of two differentexpand_*methods. This is because the accept-only path really needs only accepted items to be passed topred.eval_mut. This PR models that by bespokeAccept/Rejecttypes and because of this nuance, two methods seemed like the most robust way of expressing this.Bug Fixes in MultiHop
With this change, it was clear that multi-hop search was slightly buggy with respect to start point handling: start points were treated as always fulfilling the predicate. This PR goes ahead and fixes that bug by tracking rejected start points explicitly and filtering them out in the post-process call.
What's up with
QueryVisitDecisionThis PR gets rid of
QueryVisitDecision. It is conflating a large number of unrelated decisions.diskannever bothered with it, leading to semantic issues with, for example, beta-filtered search never callingon_visit.two_hop_neighbors. To actually use it right would require a redundant interrogation of theQueryLabelProvider.Dropping this extra functionality means we can ditch the multi-hop tests specifically testing this functionality.
Suggested Review Order
SearchStrategy, this commit relaxes theSearchAccessortrait bound onSearchStrategy::SearchAccessorto allow the disjointFilteredAccessorto join the party. This commit is the one responsible for most of the churn in random parts of the code.diskann/src/graph/glue.rs: The newFilteredAccessorand associated enums.diskann/src/graph/search/multihop_search.rs: Refactored multi-hop search to use the new trait.diskann/src/graph/ext/labeled.rs: The adaptor layer that allows code currently written forQueryLabelProviderto keep working.diskann/test/generated/graph/test/cases/multihop/callback_filtering_grid.jsonwas deleted since it's corresponding test was deleted. We no longer support distance tweaking.