Refactor 0612 #64444
Merged
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Add Iceberg-specific TableReader debug output so row lineage, delete file, delete filter, and column mapping state can be inspected when diagnosing Iceberg scan behavior.
### Release note
None
### Check List (For Author)
- Test: Manual test
- Ran git diff --check
- Attempted BE unit tests with run-be-ut.sh, but the local run hit environment setup issues and was then interrupted by the user
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Document the distinction between Iceberg row lineage metadata columns and Doris internal Iceberg row locator virtual columns in TableVirtualColumnType.
### Release note
None
### Check List (For Author)
- Test: No need to test (comment-only change)
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Iceberg v3 row lineage metadata columns should preserve physical non-null values and inherit data file metadata only for NULL or missing values. The reader previously treated missing _row_id as a pure virtual column and overwrote _last_updated_sequence_number with a constant value, while physical _row_id was mapped as a normal file column and skipped inheritance. Mark physical row lineage columns for finalize-stage materialization, fill only NULL values from first_row_id plus row position or last_updated_sequence_number, and add unit coverage for physical, missing, and metadata-missing cases.
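A minimal standalone sketch of the inheritance rule, assuming a nullable Int64 column can be modeled as std::vector<std::optional<int64_t>> with std::nullopt standing in for NULL. inherit_row_ids, inherit_last_updated_seq, row_positions, and inherited_sequence_number are hypothetical names, not Doris APIs, and leaving NULLs untouched when the file metadata is missing is a choice of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in for a nullable Int64 column: std::nullopt plays the role of NULL.
using NullableInt64Column = std::vector<std::optional<int64_t>>;

// Hypothetical helper: fill NULL _row_id values with first_row_id + row position.
// row_positions[i] is the position of row i inside the data file; it can differ
// from i when rows were skipped. Physical non-null values are preserved.
void inherit_row_ids(NullableInt64Column& row_ids, const std::vector<int64_t>& row_positions,
                     std::optional<int64_t> first_row_id) {
    if (!first_row_id.has_value()) {
        return;  // metadata missing: NULLs stay NULL in this sketch
    }
    for (std::size_t i = 0; i < row_ids.size(); ++i) {
        if (!row_ids[i].has_value()) {
            row_ids[i] = *first_row_id + row_positions[i];
        }
    }
}

// Hypothetical helper: fill NULL _last_updated_sequence_number values with the
// sequence number inherited from file metadata; non-null values are preserved.
void inherit_last_updated_seq(NullableInt64Column& seqs,
                              std::optional<int64_t> inherited_sequence_number) {
    if (!inherited_sequence_number.has_value()) {
        return;
    }
    for (auto& value : seqs) {
        if (!value.has_value()) {
            value = *inherited_sequence_number;
        }
    }
}
```

The key point is the NULL check: a constant overwrite (the previous behavior) would clobber physical values such as the 7 in a column like {NULL, 7, NULL}.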
### Release note
None
### Check List (For Author)
- Test: Unit Test / Manual test
- Added BE unit coverage in table_reader_test.cpp for Iceberg row lineage inheritance
- Ran git diff --check
- Attempted run-be-ut.sh for the related TableReaderTest filter, but local execution failed because nproc is unavailable, submodule .git/modules writes are denied in the sandbox, and github.com could not be resolved; the escalated rerun was interrupted by the user
- Behavior changed: Yes. Iceberg v3 row lineage metadata columns now preserve physical non-null values and inherit only missing/NULL values according to Iceberg rules.
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Add BE unit coverage for Iceberg row lineage predicates. ColumnMapper now has coverage that physical row lineage columns remain FINALIZE_ONLY and are not localized to file-reader conjuncts. TableReader coverage simulates scanner final filtering after row lineage materialization for _row_id and _last_updated_sequence_number predicates.
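A simplified model of why these predicates must stay FINALIZE_ONLY: a predicate such as _row_id = 102 evaluated on raw file values sees NULL for rows whose _row_id has not been inherited yet, and drops them. The functions below are hypothetical, use std::optional for NULL, and assume the row position equals the row index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

using NullableInt64 = std::optional<int64_t>;
using Predicate = std::function<bool(const NullableInt64&)>;

// Returns the indices of rows that satisfy `pred`.
std::vector<std::size_t> select_rows(const std::vector<NullableInt64>& values,
                                     const Predicate& pred) {
    std::vector<std::size_t> selected;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (pred(values[i])) {
            selected.push_back(i);
        }
    }
    return selected;
}

// File-reader conjunct path: the predicate sees raw file values, where a NULL
// _row_id has not been inherited yet. This is the path FINALIZE_ONLY avoids.
std::vector<std::size_t> filter_in_file_reader(const std::vector<NullableInt64>& raw,
                                               const Predicate& pred) {
    return select_rows(raw, pred);
}

// Finalize path: inherit NULL _row_id values first (first_row_id + position,
// position == index in this sketch), then apply the predicate.
std::vector<std::size_t> filter_after_materialization(std::vector<NullableInt64> values,
                                                      int64_t first_row_id,
                                                      const Predicate& pred) {
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (!values[i].has_value()) {
            values[i] = first_row_id + static_cast<int64_t>(i);
        }
    }
    return select_rows(values, pred);
}
```

With raw values {NULL, 7, NULL} and first_row_id 100, the file-reader path selects nothing, while the finalize path selects row 2.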
### Release note
None
### Check List (For Author)
- Test: Unit Test / Manual test
- Added ColumnMapperConstantTest.PhysicalRowLineageFiltersStayFinalizeOnly
- Added TableReaderTest.IcebergRowIdPredicateFiltersAfterRowLineageMaterialization
- Added TableReaderTest.IcebergLastUpdatedSequencePredicateFiltersAfterMaterialization
- Ran git diff --check
- Attempted run-be-ut.sh with the added tests, but local execution failed before tests because nproc is unavailable, .git/modules writes are denied in the sandbox, and github.com could not be resolved
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Rename the optional nullable int64 expectation helper so initializer-list calls with non-null expected values continue to resolve to the plain int64 helper without ambiguous overload resolution.
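A sketch of the naming fix, with hypothetical helper names and a nullable column modeled as std::vector<std::optional<int64_t>>. If both helpers shared one name, a call like expect_int64_values(col, {1, 2, 3}) would be ambiguous: the braced list can initialize either std::vector<int64_t> or std::vector<std::optional<int64_t>>, and the two user-defined conversions cannot be ranked against each other.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using NullableColumn = std::vector<std::optional<int64_t>>;

// Plain helper: every row is expected to be non-null with the given value.
bool expect_int64_values(const NullableColumn& actual, const std::vector<int64_t>& expected) {
    if (actual.size() != expected.size()) {
        return false;
    }
    for (std::size_t i = 0; i < actual.size(); ++i) {
        if (!actual[i].has_value() || *actual[i] != expected[i]) {
            return false;
        }
    }
    return true;
}

// Nullable helper with a distinct name, so {1, 2, 3} always resolves to the
// plain helper above and only the nullable form accepts std::nullopt.
bool expect_nullable_int64_values(const NullableColumn& actual, const NullableColumn& expected) {
    return actual == expected;
}
```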
### Release note
None
### Check List (For Author)
- Test: Manual test
- Ran git diff --check
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Refactor TableColumnMapper row lineage virtual column selection to avoid duplicated column-name branches. Add comments for the physical row lineage field path, the missing row lineage virtual path, and the Doris internal Iceberg row locator path.
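A minimal sketch of replacing per-name branches with one lookup table, assuming selection is keyed by column name as described here. RowLineageColumn, kRowLineageColumns, and row_lineage_column_for are hypothetical; the real TableVirtualColumnType values may differ, and the Doris internal row locator path is not modeled.

```cpp
#include <array>
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical stand-in for the row lineage virtual column kinds.
enum class RowLineageColumn { ROW_ID, LAST_UPDATED_SEQUENCE_NUMBER };

struct RowLineageEntry {
    std::string_view name;
    RowLineageColumn type;
};

// One table instead of one if/else branch per column name.
constexpr std::array<RowLineageEntry, 2> kRowLineageColumns = {{
        {"_row_id", RowLineageColumn::ROW_ID},
        {"_last_updated_sequence_number", RowLineageColumn::LAST_UPDATED_SEQUENCE_NUMBER},
}};

// Returns the row lineage kind for `name`, or std::nullopt for ordinary columns.
std::optional<RowLineageColumn> row_lineage_column_for(std::string_view name) {
    for (const auto& entry : kRowLineageColumns) {
        if (entry.name == name) {
            return entry.type;
        }
    }
    return std::nullopt;
}
```

Adding a new row lineage column then means adding a table entry rather than another branch.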
### Release note
None
### Check List (For Author)
- Test: Manual test
- Ran git diff --check
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Move file-local expression tree cloning away from a ColumnMapper-only type switch. Add VExpr::deep_clone with per-expression clone_node hooks for the expression nodes used by ColumnMapper, while keeping table-reader-specific slot, literal, and cast clone policy in ColumnMapper.
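The split between a generic deep clone and per-node clone hooks can be illustrated with a simplified hierarchy. Expr, Literal, and Add are hypothetical stand-ins, not Doris's VExpr, VLiteral, or arithmetic nodes, and the real hook signatures may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

class Expr {
public:
    virtual ~Expr() = default;

    // Generic deep clone: each node type copies only its own state through the
    // clone_node() hook, and the base class recursively clones the children.
    std::shared_ptr<Expr> deep_clone() const {
        std::shared_ptr<Expr> copy = clone_node();
        for (const auto& child : _children) {
            copy->_children.push_back(child->deep_clone());
        }
        return copy;
    }

    virtual int64_t evaluate() const = 0;

    void add_child(std::shared_ptr<Expr> child) { _children.push_back(std::move(child)); }
    const std::vector<std::shared_ptr<Expr>>& children() const { return _children; }

protected:
    // Per-node hook: return a copy of this node without its children.
    virtual std::shared_ptr<Expr> clone_node() const = 0;

    std::vector<std::shared_ptr<Expr>> _children;
};

class Literal : public Expr {
public:
    explicit Literal(int64_t value) : _value(value) {}
    int64_t evaluate() const override { return _value; }
    void set_value(int64_t value) { _value = value; }

protected:
    std::shared_ptr<Expr> clone_node() const override { return std::make_shared<Literal>(_value); }

private:
    int64_t _value;
};

class Add : public Expr {
public:
    int64_t evaluate() const override {
        int64_t sum = 0;
        for (const auto& child : _children) {
            sum += child->evaluate();
        }
        return sum;
    }

protected:
    std::shared_ptr<Expr> clone_node() const override { return std::make_shared<Add>(); }
};
```

Because the recursion lives in the base class, supporting a new node type only requires a clone_node override instead of another case in a caller-side type switch.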
### Release note
None
### Check List (For Author)
- Test: Manual test
- Ran git diff --cached --check
- Attempted ./run-be-ut.sh --run --filter='ColumnMapper*:*TableColumnMapper*', but local build did not reach C++ compilation because generated/thirdparty dependencies were missing: thirdparty/installed/bin/protoc and Snappy
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Remove the format_v2 TableLiteral and TableSlotRef wrapper classes. Move their split-local literal and pre-resolved slot capabilities into VLiteral and VSlotRef, and update ColumnMapper, table readers, scanners, and related tests to use the base expression classes directly.
### Release note
None
### Check List (For Author)
- Test: Manual test
- Ran git diff --cached --check
- Attempted ./run-be-ut.sh --run --filter='VLiteralTest*:*VSlotRefTest*:*ColumnMapper*:*TableColumnMapper*' with JDK 17; build did not reach C++ compilation because local thirdparty/gensrc dependencies are missing: thirdparty/installed/bin/protoc and Snappy
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Iceberg row lineage materialization modified nullable columns through assert_mutable after convert_to_full_column_if_const. When the column was shared, this failed the COW::assert_mutable check (use_count() > 1). Use IColumn::mutate for both _row_id and _last_updated_sequence_number so that shared columns are detached before inherited values are filled in.
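A sketch of the difference, using std::shared_ptr reference counts as a stand-in for Doris's COW machinery. Column, assert_mutable, mutate, and fill_nulls here are hypothetical simplifications that only imitate the behavior described above; throwing on a shared column is a choice of this sketch, not how COW::assert_mutable reports the failure.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <stdexcept>
#include <vector>

// Hypothetical copy-on-write column; Doris's COW<IColumn> is more involved.
struct Column {
    std::vector<std::optional<int64_t>> data;
};
using ColumnPtr = std::shared_ptr<Column>;

// Imitates assert_mutable: in-place mutation is only allowed for the sole owner.
Column& assert_mutable(ColumnPtr& column) {
    if (column.use_count() > 1) {
        throw std::logic_error("column is shared: use_count() > 1");
    }
    return *column;
}

// Imitates mutate: detach (deep-copy) a shared column before writing, so other
// owners keep seeing the original values.
Column& mutate(ColumnPtr& column) {
    if (column.use_count() > 1) {
        column = std::make_shared<Column>(*column);
    }
    return *column;
}

// Fill NULLs with an inherited value, detaching first if the column is shared.
void fill_nulls(ColumnPtr& column, int64_t value) {
    for (auto& v : mutate(column).data) {
        if (!v.has_value()) {
            v = value;
        }
    }
}
```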
### Release note
None
### Check List (For Author)
- Test: Manual test
- Ran git diff --cached --check
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: RuntimeFilterExpr owns the real runtime filter expression in _impl, while the wrapper itself does not keep that child tree in its own _children vector. A generic expression deep clone therefore needs RuntimeFilterExpr to clone _impl explicitly. Add clone hooks for RuntimeFilterExpr and runtime filter predicate implementations used by file-local filter rewrites, and add a unit test that verifies the cloned runtime filter wrapper owns an independent impl tree.
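The ownership problem can be reproduced with a small hypothetical hierarchy: deep_clone only walks _children, so a wrapper that stores its real filter in a separate _impl member has to clone it in its own clone_node hook. Expr, InPredicate, and RuntimeFilterWrapper are illustrative stand-ins, not the Doris RuntimeFilterExpr API.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

class Expr {
public:
    virtual ~Expr() = default;

    // Generic deep clone: copies this node via clone_node(), then walks _children.
    std::shared_ptr<Expr> deep_clone() const {
        std::shared_ptr<Expr> copy = clone_node();
        for (const auto& child : _children) {
            copy->_children.push_back(child->deep_clone());
        }
        return copy;
    }

protected:
    virtual std::shared_ptr<Expr> clone_node() const = 0;
    std::vector<std::shared_ptr<Expr>> _children;
};

// Hypothetical leaf predicate standing in for a runtime filter implementation.
class InPredicate : public Expr {
public:
    explicit InPredicate(std::vector<int> in_values) : values(std::move(in_values)) {}
    std::vector<int> values;

protected:
    std::shared_ptr<Expr> clone_node() const override {
        return std::make_shared<InPredicate>(values);
    }
};

// Wrapper that keeps the real filter in _impl, not in _children. The generic
// loop in deep_clone() never sees _impl, so clone_node() deep-clones it
// explicitly; otherwise the copy would share the original impl tree.
class RuntimeFilterWrapper : public Expr {
public:
    explicit RuntimeFilterWrapper(std::shared_ptr<Expr> impl) : _impl(std::move(impl)) {}
    const std::shared_ptr<Expr>& impl() const { return _impl; }

protected:
    std::shared_ptr<Expr> clone_node() const override {
        return std::make_shared<RuntimeFilterWrapper>(_impl->deep_clone());
    }

private:
    std::shared_ptr<Expr> _impl;
};
```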
### Release note
None
### Check List (For Author)
- Test: Unit Test
- Added RuntimeFilterExprSamplingTest.deep_clone_clones_impl_tree
- Ran git diff --cached --check
- Attempted ./run-be-ut.sh --run --filter='RuntimeFilterExprSamplingTest.deep_clone_clones_impl_tree', but the local thirdparty installation is incomplete: thirdparty/installed/bin/protoc is missing and Snappy cannot be found.
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: In BY_FIELD_ID mapping mode, Iceberg row lineage metadata columns must be identified by their reserved field ids instead of only by column name. Name-only matching can miss renamed metadata columns and can also misclassify ordinary columns that happen to use the same name with a different field id. The missing row lineage path also has to take precedence over generic default expressions so IcebergTableReader can apply the Iceberg v3 inheritance rules.
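A sketch of field-id based classification. The two constants are the reserved field ids that the Iceberg table spec assigns to _row_id and _last_updated_sequence_number (verify against the spec version in use); Field and classify_by_field_id are hypothetical names. The column name is deliberately ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

// Reserved Iceberg field ids for the v3 row lineage columns (from the Iceberg spec).
constexpr int32_t kIcebergRowIdFieldId = 2147483540;
constexpr int32_t kIcebergLastUpdatedSequenceNumberFieldId = 2147483539;

enum class RowLineageColumn { ROW_ID, LAST_UPDATED_SEQUENCE_NUMBER };

struct Field {
    std::string name;  // not consulted in BY_FIELD_ID mode
    int32_t field_id;
};

// Hypothetical BY_FIELD_ID classifier: a renamed metadata column is still
// recognized, and an ordinary column named "_row_id" with another id is not.
std::optional<RowLineageColumn> classify_by_field_id(const Field& field) {
    switch (field.field_id) {
    case kIcebergRowIdFieldId:
        return RowLineageColumn::ROW_ID;
    case kIcebergLastUpdatedSequenceNumberFieldId:
        return RowLineageColumn::LAST_UPDATED_SEQUENCE_NUMBER;
    default:
        return std::nullopt;
    }
}
```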
### Release note
None
### Check List (For Author)
- Test: Unit Test
- Added ColumnMapperConstantTest.MissingRowLineageDefaultExprStillUsesVirtualMapping
- Added ColumnMapperConstantTest.ByFieldIdDoesNotTreatSameNameDifferentIdAsRowLineage
- Ran git diff --cached --check
- BE UT execution is blocked in this workspace because thirdparty/installed/bin/protoc is missing and Snappy cannot be found.
- Behavior changed: Yes. Iceberg row lineage virtual columns in BY_FIELD_ID mode are now resolved by reserved Iceberg field id and take precedence over generic missing-column defaults.
- Does this need documentation: No