[fix](NestedColumnPruning): visit key/value/entry args in visitMapContainsKey/Value/Entry#64440
[fix](NestedColumnPruning): visit key/value/entry args in visitMapContainsKey/Value/Entry#64440englefly wants to merge 1 commit into
Conversation
…ry (apache#26176) visitMapContainsKey/Value/Entry only visited the map argument (getArgument(0)) and skipped the key/value/entry argument (getArgument(1)). When the key is a nested sub-column expression (e.g. element_at(s, 'a')) and the same sub-column also appears in IS NULL, NestedColumnPruning would only see [s.a.NULL] and prune the column to null-only metadata access. Fix: after visiting the map argument, also visit the second argument with a fresh context to register its data access paths. Regression test map_contains_arg_pruning covers: - map_contains_key with struct sub-column key + IS NULL - map_contains_value with struct sub-column value + IS NULL - Combined key + value sub-column access Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
/review |
|
run buildall |
There was a problem hiding this comment.
I found blocking issues in the PR as written. The key/value visitor changes address the reported shape, but the map_contains_entry path is still incomplete, and the added regression suite will not run in normal mode because its quick-test output file is missing.
Critical checkpoint conclusions:
- Goal and proof: Partially met.
map_contains_keyandmap_contains_valuenow visit argument 1, butmap_contains_entryis ternary and still misses the value argument. The intended entry regression is not actually testingmap_contains_entry. - Scope and clarity: The implementation is small and localized, but the entry function comments/model are inaccurate (
<map>, <entry>instead of<map>, <key>, <value>). - Concurrency, lifecycle, config, compatibility: Not applicable for this FE rewrite visitor/test change; no new shared mutable runtime state, lifecycle ownership, configuration, storage format, or FE-BE protocol fields were introduced.
- Parallel code paths: Key/value/entry are parallel paths. The key/value cases were updated, but entry was only partially updated and needs both argument 1 and argument 2 traversal.
- Test coverage and results: Coverage is currently insufficient. The new suite has
order_qt_*checks but no generated.outfile, and its Case 3 usesmap_contains_key/map_contains_valueinstead ofmap_contains_entry. - Observability: Not applicable; this is optimizer access-path collection and existing explain output is enough for the regression.
- Transaction/persistence/data writes: Not applicable.
- Performance: No material performance issue found; the added traversal is small and only visits expression arguments already present in the plan.
- User focus: No additional user-provided review focus was present.
I did not run the full regression suite in this runner; the test-runner issue is based on the regression framework behavior, which throws for missing quick-test output files unless output generation mode is enabled.
| return continueCollectAccessPath(mapContainsEntry.getArgument(0), context); | ||
| continueCollectAccessPath(mapContainsEntry.getArgument(0), context); | ||
| // Entry argument: visit with a fresh context to register its data access paths. | ||
| Expression entryArg = mapContainsEntry.getArgument(1); |
There was a problem hiding this comment.
MapContainsEntry is a ternary function (map_contains_entry(map, key, value); the class implements TernaryExpression and its signature reads getArgument(2)), but this visitor only collects argument 1. A query such as map_contains_entry(m, 'hello', element_at(s, 'b')) together with element_at(s, 'b') IS NULL still only records the s.b.NULL path for the value expression and can prune s.b to null-only access, which is the same wrong-result class this PR is fixing. Please visit both search arguments, for example by collecting arguments 1..arity-1 with fresh contexts, and update the comment to reflect the actual (map, key, value) shape.
| sql """ | ||
| SELECT id, | ||
| element_at(s, 'a') IS NULL, | ||
| map_contains_key(m, element_at(s, 'a')), |
There was a problem hiding this comment.
This case is labeled as map_contains_entry, but the query calls map_contains_key and map_contains_value; it never exercises visitMapContainsEntry. Because the entry function is ternary (map_contains_entry(m, key, value)), this test would not catch the current missing traversal of argument 2. Please add a real map_contains_entry(m, element_at(s, 'a'), element_at(s, 'b')) case and assert both nested argument paths are collected.
| contains "m.KEYS" // map_contains_key needs KEYS path | ||
| } | ||
|
|
||
| order_qt_case1 """ |
There was a problem hiding this comment.
This new suite adds order_qt_case1/case2/case3, but the PR does not add regression-test/data/nereids_rules_p0/column_pruning/map_contains_arg_pruning.out. In normal regression mode quickRunTest throws Missing outputFile when the suite .out file is absent, so run-regression-test.sh -d nereids_rules_p0/column_pruning -s map_contains_arg_pruning will fail unless run with output generation enabled. Please add the generated .out file for these quick-test blocks.
TPC-H: Total hot run time: 29049 ms |
TPC-DS: Total hot run time: 168788 ms |
FE UT Coverage ReportIncrement line coverage |
FE Regression Coverage ReportIncrement line coverage |
What problem does this PR solve?
visitMapContainsKey/Value/Entry only visited the map argument (getArgument(0)) and skipped the key/value/entry argument (getArgument(1)). When the key is a nested sub-column expression (e.g. element_at(s, 'a')) and the same sub-column also appears in IS NULL, NestedColumnPruning would only see [s.a.NULL] and prune the column to null-only metadata access.
Fix: after visiting the map argument, also visit the second argument with a fresh context to register its data access paths.
Regression test map_contains_arg_pruning covers:
26176
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)