Refactor of diskann-benchmarks to make it potentially easier to spin up new backends#1142
Refactor of diskann-benchmarks to make it potentially easier to spin up new backends#1142JordanMaples wants to merge 5 commits into
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1142 +/- ##
=======================================
Coverage 89.51% 89.52%
=======================================
Files 487 489 +2
Lines 92443 92513 +70
=======================================
+ Hits 82751 82820 +69
- Misses 9692 9693 +1
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
1875da1 to
28d08fa
Compare
There was a problem hiding this comment.
Pull request overview
This PR refactors the diskann-benchmark “graph index” benchmark stack to improve modularity and reduce duplication, primarily by introducing a shared streaming runner/managed layer and by renaming “Dynamic” concepts to “Streaming”. It also restructures benchmark registration so index benchmarks live under a new crate::index module rather than crate::backend::index.
Changes:
- Added a reusable streaming execution path (
StreamRunner,Managed,build_streamer,run_streaming) shared across in-mem and bf-tree streaming benchmarks. - Renamed Dynamic → Streaming for runbook params and streaming runs, updating example inputs and integration tests accordingly.
- Consolidated bf-tree build/stream inputs to support either “none” or “spherical” quantization via a single
QuantConfig.
Reviewed changes
Copilot reviewed 36 out of 39 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann-benchmark/src/main.rs | Registers new index benchmark module; updates integration tests and example input filenames. |
| diskann-benchmark/src/inputs/graph_index.rs | Renames Dynamic→Streaming types; privatizes some input fields and adds accessors; updates tags. |
| diskann-benchmark/src/inputs/exhaustive.rs | Derives Clone for TransformKind for reuse in new config structures. |
| diskann-benchmark/src/inputs/bftree.rs | Unifies bf-tree inputs with QuantConfig for build + streaming modes; updates validation/display. |
| diskann-benchmark/src/index/streaming/stats.rs | Rehomes imports under crate::index::* after refactor. |
| diskann-benchmark/src/index/streaming/runner.rs | Introduces StreamRunner + Maintainer abstraction used by streaming benchmarks. |
| diskann-benchmark/src/index/streaming/mod.rs | Adds shared build_streamer and run_streaming helpers for runbook-driven streaming. |
| diskann-benchmark/src/index/streaming/managed.rs | Adds the Managed ID→slot layer and SlotReclaim policy for streaming backends. |
| diskann-benchmark/src/index/search/range.rs | Updates imports to new crate::index module structure. |
| diskann-benchmark/src/index/search/plugins.rs | Updates imports to new crate::index module structure. |
| diskann-benchmark/src/index/search/mod.rs | New module wrapper for search submodules. |
| diskann-benchmark/src/index/search/knn.rs | Updates imports to new crate::index module structure. |
| diskann-benchmark/src/index/result.rs | Updates imports to new crate::index module structure. |
| diskann-benchmark/src/index/mod.rs | New top-level index module with benchmark registration entrypoint. |
| diskann-benchmark/src/index/inmem/streaming.rs | Adds in-mem streaming maintenance implementation (DropDeleted + Release). |
| diskann-benchmark/src/index/inmem/spherical.rs | Adjusts to new module layout and new accessor-based input APIs. |
| diskann-benchmark/src/index/inmem/scalar.rs | Adjusts to new module layout and new accessor-based input APIs. |
| diskann-benchmark/src/index/inmem/product.rs | Adjusts to new module layout and new accessor-based input APIs. |
| diskann-benchmark/src/index/inmem/mod.rs | New inmem module wrapper; exports InmemMaintainer. |
| diskann-benchmark/src/index/build.rs | New shared build helpers (insert variants, start points, progress meter, save/load). |
| diskann-benchmark/src/index/bftree/spherical.rs | Refactors bf-tree spherical build benchmark to use QuantConfig and shared quantizer helper. |
| diskann-benchmark/src/index/bftree/spherical_streaming.rs | Adds bf-tree spherical streaming benchmark using shared streaming helpers. |
| diskann-benchmark/src/index/bftree/quantizer_util.rs | New shared spherical quantizer construction helper for bf-tree benchmarks. |
| diskann-benchmark/src/index/bftree/mod.rs | Updates bf-tree benchmark registration and module structure. |
| diskann-benchmark/src/index/bftree/full_precision.rs | Refactors bf-tree full-precision build benchmark to use unified BfTreeBuild input. |
| diskann-benchmark/src/index/bftree/full_precision_streaming.rs | Adds bf-tree full-precision streaming benchmark using shared streaming helpers. |
| diskann-benchmark/src/index/benchmarks.rs | Refactors graph-index benchmark registration and streaming path to use shared helpers. |
| diskann-benchmark/src/backend/mod.rs | Stops registering the old backend::index benchmarks (now registered via crate::index). |
| diskann-benchmark/src/backend/index/streaming/mod.rs | Removed (streaming moved under crate::index::streaming). |
| diskann-benchmark/src/backend/index/streaming/full_precision.rs | Removed (replaced by StreamRunner + inmem maintainer). |
| diskann-benchmark/src/backend/index/bftree/streaming_utils.rs | Removed (replaced by crate::index::streaming::run_streaming). |
| diskann-benchmark/src/backend/index/bftree/spherical_streaming.rs | Removed (replaced by crate::index::bftree::spherical_streaming). |
| diskann-benchmark/src/backend/index/bftree/full_precision_streaming.rs | Removed (replaced by crate::index::bftree::full_precision_streaming). |
| diskann-benchmark/README.md | Updates example command and job type/tag for streaming runs. |
| diskann-benchmark/example/graph-index-stream.json | Updates job type to the new streaming tag. |
| diskann-benchmark/example/graph-index-bftree.json | Updates job type to unified bf-tree tag and adds quantization.kind. |
| diskann-benchmark/example/graph-index-bftree-stream.json | Updates job type to unified bf-tree streaming tag and adds quantization.kind. |
| diskann-benchmark/example/graph-index-bftree-stream-spherical.json | New example demonstrating bf-tree streaming with spherical quantization. |
| diskann-benchmark/example/graph-index-bftree-spherical.json | Updates example to use unified bf-tree tag and nested quantization config. |
Comments suppressed due to low confidence (1)
diskann-benchmark/src/inputs/graph_index.rs:1204
- PR description says input fields are privatized behind accessor methods, but
StreamingRunbookParamsstill exposes most fields aspub(crate)and downstream code accesses them directly. Either update the PR description/scope, or make these fields private and add accessors to keep the encapsulation goal consistent.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Merge BfTreeFullPrecisionBuild + BfTreeSphericalBuild into BfTreeBuild, and BfTreeStreamingRun + BfTreeSphericalStreamingRun into a single BfTreeStreamingRun. Both structs now carry a QuantConfig enum that discriminates between full-precision (None) and spherical quantization. - Add QuantConfig enum with tagged serde (kind: none | spherical) - Consolidate quantizer training into quantizer_util::build_quantizer - Update benchmark callers to check QuantConfig variant in try_match - Update example JSON files with new unified tags and quantization field - Add Clone to exhaustive::TransformKind Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
4d1b44e to
8af5e9a
Compare
|
Thanks Jordan, this looks pretty good to me. One thing I am wondering is if we're refactoring streaming, there is one other thing in the streaming benchmark that really doesn't make sense, and it's that we don't modify the Would it make sense to also tackle that as part of a streaming overhaul? |
Yeah, I can tackle those here. |
hildebrandmw
left a comment
There was a problem hiding this comment.
Thanks Jordan - this looks like a net simplification, but merging the behavioral changes with the module reorganization is making it hard for me to follow this early on a Monday. Can I request the module reorganization/mechanical changes be moved into their own quick PR? Will review ASAP.
Following up on #1142 (review), I'm breaking #1142 into two prs. Pt 1 (this) is a purely mechanical module reorganization with no behavioral changes: - Move backend/index/ contents to top-level index/ module - Create index/inmem/ submodule for product, scalar, spherical backends - Update all crate::backend::index:: paths to crate::index:: - Widen pub(super) to pub(crate) where items are now used across submodule boundaries - Register index benchmarks directly from main.rs instead of through backend/mod.rs Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Refactors the diskann-benchmark infrastructure to improve modularity, reduce duplication, and privatize input struct fields behind accessor methods.
Input field privatization: