
DistanceTester: take a direct path for Overlap features (~8% faster TRIBL2/IB1 test)#16

Open. antalvdb wants to merge 1 commit into master from distance-tester-dispatch-flatten


@antalvdb

Member

Summary

The inner distance loop in DistanceTester::test() dispatched every feature through a virtual metricTestFunction::test() → Feature::fvDistance() (which re-checks the storable / numeric branches on every call) → a virtual metric->distance(), plus a permutation[] indirection. All of that ran just to compute, for a plain Overlap feature, (F == G ? 0 : weight).

This change precomputes, once and in permuted order, a flat metricTestFunction array (removing the permutation[] indirection) and a per-feature Overlap flag, so that Overlap features take the direct (F == G ? 0 : weight) path. Other metrics (MVDM, numeric, …) keep the existing path unchanged.
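
For illustration only, here is a minimal, self-contained sketch of the idea. It is not the code in this PR: `Metric`, `Feature`, `Slot`, `flatten`, `distance_generic` and `distance_flat` are hypothetical names, and `AbsDiffMetric` merely stands in for any non-Overlap metric. It assumes instance values are already stored in permuted order while per-feature data is indexed through `permutation[]`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the Timbl types.
using Value = int;  // a symbolic feature value, compared by identity

struct Metric {
  virtual ~Metric() = default;
  virtual double distance(Value f, Value g, double weight) const = 0;
};

struct OverlapMetric : Metric {
  double distance(Value f, Value g, double weight) const override {
    return f == g ? 0.0 : weight;
  }
};

// Stands in for "any other metric" (MVDM, numeric, ...).
struct AbsDiffMetric : Metric {
  double distance(Value f, Value g, double weight) const override {
    return weight * std::abs(f - g);
  }
};

struct Feature {  // stored in original feature order
  const Metric* metric;
  double weight;
};

struct Slot {  // one entry per feature, stored in permuted order
  const Metric* metric;
  double weight;
  bool is_overlap;
};

// Done once, before testing: resolve permutation[] and the Overlap check.
std::vector<Slot> flatten(const std::vector<Feature>& features,
                          const std::vector<std::size_t>& permutation) {
  std::vector<Slot> slots;
  slots.reserve(permutation.size());
  for (std::size_t original : permutation) {
    const Feature& feat = features[original];
    bool overlap = dynamic_cast<const OverlapMetric*>(feat.metric) != nullptr;
    slots.push_back({feat.metric, feat.weight, overlap});
  }
  return slots;
}

// Baseline: permutation lookup plus a virtual call for every feature.
double distance_generic(const std::vector<Feature>& features,
                        const std::vector<std::size_t>& permutation,
                        const std::vector<Value>& f,
                        const std::vector<Value>& g) {
  assert(f.size() == permutation.size() && g.size() == permutation.size());
  double d = 0.0;
  for (std::size_t i = 0; i < permutation.size(); ++i) {
    const Feature& feat = features[permutation[i]];
    d += feat.metric->distance(f[i], g[i], feat.weight);
  }
  return d;
}

// Flattened: Overlap features skip the virtual call entirely.
double distance_flat(const std::vector<Slot>& slots,
                     const std::vector<Value>& f,
                     const std::vector<Value>& g) {
  assert(f.size() == slots.size() && g.size() == slots.size());
  double d = 0.0;
  for (std::size_t i = 0; i < slots.size(); ++i) {
    const Slot& s = slots[i];
    d += s.is_overlap ? (f[i] == g[i] ? 0.0 : s.weight)
                      : s.metric->distance(f[i], g[i], s.weight);
  }
  return d;
}
```

Both functions add the same per-feature terms in the same order, which is the property a byte-identical output relies on in this sketch.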

Measured impact

TRIBL2, 20k test instances written to /dev/null, reused saved instance base; deterministic instruction counts, min of 3 (the base load is identical before/after, so the delta is the test phase):

| 20k run | instructions |
|---|---|
| before | 161.30 B |
| after | 155.19 B |

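That is about −8% of the test phase (about 3.8% of the total instruction count shown above, which includes the unchanged base load). Only IB1 and TRIBL2 use DistanceTester; IGTree computes no distances, so it is unaffected. IB1, being all-distance, likely benefits more.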

Correctness

Output is byte-identical, including with +v db (distributions printed), which depends on the exact distances and neighbour ordering. Verified on TRIBL2 (20k), plain and +v db.

Posting for your consideration — happy to adjust.

🤖 Generated with Claude Code

The inner distance loop dispatched every feature through a virtual
metricTestFunction::test() -> Feature::fvDistance() (which re-checks the
storable/numeric branches on every call) -> a virtual metric->distance(),
plus a permutation[] indirection -- all to compute, for a plain Overlap
feature, just (F == G ? 0 : weight).

Precompute once, in permuted order, a flat metricTestFunction array (removing
the permutation indirection) and an Overlap flag per feature, and let Overlap
features take the direct (F == G ? 0 : weight) path. Other metrics (MVDM,
numeric, ...) keep the existing path.

Measured (TRIBL2, 20k test instances, reused saved base; deterministic
instruction counts, min of 3): 161.30 B -> 155.19 B, i.e. about -8% of the
test phase. Only IB1/TRIBL2 use DistanceTester (IGTree computes no distances).

Output is byte-identical, including with +v db (which depends on the exact
distances and neighbour ordering).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
