Refactor: data-driven validation, naming improvements, output filter dedup#853
Open
Wolfvin wants to merge 1 commit into
Open
Refactor: data-driven validation, naming improvements, output filter dedup#853Wolfvin wants to merge 1 commit into
Wolfvin wants to merge 1 commit into
Conversation
…dedup ## What was refactored and why ### 1. formatter.validate_options() — Data-driven validation **Before:** 134 lines of repetitive if/raise patterns with the same structure repeated 15+ times: get option, check valid values, raise SQLParseError. **After:** Extracted and helpers that eliminate the repetition. Options are now validated in three clear sections: choice-type options, integer-type options, and side effects (dependent options). Same behavior, same error messages, but 30% fewer lines and much easier to extend with new options. ### 2. utils.imt() → utils.token_matches() **Before:** was a cryptic 3-letter abbreviation known only to insiders. It stood for 'Instance, Match, TokenType' but gave no hint from the call site. **After:** Renamed to — a self-documenting name that tells you exactly what the function does. is preserved as a backward-compatible alias so all existing code continues to work. ### 3. filters/output.py — Extract shared logic **Before:** and had nearly identical logic for: variable assignment headers, quote escaping, and continuation line formatting. Each was 40+ lines of duplicated token generation. **After:** Extracted and as shared helpers. Both filters now use these helpers plus class-level constants (, , ) for their dialect-specific differences. The Python filter's header generation is now inline (matching the original exactly), while the PHP filter preserves its extra-space alignment quirk explicitly. ## Verification All refactoring was verified using regret-based regression testing with 15 clusters across parse, format, split, and parsestream functions: - **Cluster validation:** All 15 clusters GREEN (fingerprints match golden) - **Direct output comparison:** All 24 raw outputs identical to pre-refactor - **Fingerprint cross-check:** All 15 fingerprints match KEBENARAN 2 baseline - **Chain validation:** parse-to-format pipeline chain hash matches - **Existing test suite:** All 487 pytest tests pass (2 xfailed, 1 xpassed) ### Fingerprint evidence | Cluster | Before | After | Match | |---------|--------|-------|-------| | parse-select | 4y7y9mn | 4y7y9mn | ✅ | | parse-insert | 1chpryd | 1chpryd | ✅ | | parse-create | 1z0ykp3 | 1z0ykp3 | ✅ | | parse-begin-end | s4zcc5z | s4zcc5z | ✅ | | parse-cte | 37bw5hs | 37bw5hs | ✅ | | parse-case | 3zufub0 | 3zufub0 | ✅ | | format-reindent | 1myoz81 | 1myoz81 | ✅ | | format-keyword-case | 6cumtfa | 6cumtfa | ✅ | | format-strip-comments | 582b9j0 | 582b9j0 | ✅ | | format-comma-first | 1fc8eci | 1fc8eci | ✅ | | format-wrap-after | vrh7cvv | vrh7cvv | ✅ | | format-output-python | 2iim3hg | 2iim3hg | ✅ | | format-output-php | 13yrmvm | 13yrmvm | ✅ | | split-statements | 41gw3tm | 41gw3tm | ✅ | | parsestream-basic | 2aok2um | 2aok2um | ✅ | Chain hash (parse-to-format-pipeline): 53yvhoc → 53yvhoc ✅
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR refactors three areas of sqlparse to improve maintainability, readability, and reduce code duplication. All changes are behavior-preserving — verified by regression testing with 15 clusters and the existing 487-test test suite.
1. Data-driven option validation (formatter.py)
Problem:
validate_options()was 134 lines of repetitive if/raise patterns, with the same structure repeated 15+ times.Solution: Extracted
_validate_choice()and_validate_positive_int()helpers. Options are now validated in three clear sections: choice-type, integer-type, and side effects.2. Rename imt() to token_matches() (utils.py)
Problem:
imt()is a cryptic 3-letter abbreviation that gives no hint from the call site about what it does.Solution: Renamed to
token_matches().imtis preserved as a backward-compatible alias so all existing code continues to work.3. Output filter deduplication (filters/output.py)
Problem:
OutputPythonFilter._process()andOutputPHPFilter._process()had nearly identical logic for variable assignment headers, quote escaping, and continuation line formatting.Solution: Extracted shared helpers
_generate_assignment_header()and_generate_continuation_header(). Both filters now use class-level constants for dialect-specific differences.Verification
All refactoring was verified using regret-based regression testing with 15 clusters:
Chain hash (parse-to-format pipeline):
53yvhocbefore and after — identical.All 487 existing pytest tests pass.
Direct output comparison (KEBENARAN 1): All 24 raw outputs identical to pre-refactor baseline.
Fingerprint cross-check (KEBENARAN 2): All 15 fingerprints match the saved golden baselines.