feat: auto-apply Snowflake ownership object tags on publish #26

Open
Vinay Shende (vinay79n) wants to merge 6 commits into main from feat/tagging-tables

@vinay79n Vinay Shende (vinay79n) commented Jun 16, 2026

Contributor

Implements the table-ownership RFC (RFC: Snowflake table ownership via object tags) by tagging tables through the existing publish chokepoint instead of hundreds of hand-written ALTER statements.

  • Add _snowflake/object_tags.py:
    • build_table_tags: derives TEAM/DOMAIN/PROJECT from Metaflow context, defaults STATUS to active, and validates STATUS/SLA against allowed-value lists. OWNER is resolved by priority: explicit tags={"owner": ...} → owning-team alias ds-<domain>-team when the domain is known → unknown. current.username is deliberately not used, because on Argo/deployed runs it is a service identity, not a person.
    • build_set_tag_sql / apply_table_tags: build and run the ALTER ... SET TAG statement, with identifier validation to block SQL injection and warn-don't-fail behavior (a missing tag definition or a missing APPLY grant never breaks an otherwise successful publish).
  • Wire automatic tagging into publish() and publish_pandas() via a new optional tags override. Tags are applied only to production tables (DATA_SCIENCE); non-prod (DATA_SCIENCE_STAGE) runs are left untouched.
  • Add metaflow/registry.py: create_ownership_registry_view() + TABLE_OWNERSHIP_REGISTRY view (plain view, adoption-based, no refresh needed; ~2h ACCOUNT_USAGE lag — use INFORMATION_SCHEMA.TAG_REFERENCES / SYSTEM$GET_TAG for real-time checks).
  • Export create_ownership_registry_view from the metaflow package.
  • Tests: unit tests with 100% coverage of object_tags.py (derivation, owner priority, validation, warn-don't-fail).
  • Docs: publish / publish_pandas ownership-tags sections + registry view page.
  • Bump version 0.4.2 → 0.5.0.
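
For reviewers who want the owner priority spelled out, here is a minimal sketch of the resolution order described in the build_table_tags bullet. It is not the code in object_tags.py: the function names, the "deprecated" status value, and the choice to raise on an invalid STATUS are assumptions made for illustration.

```python
from typing import Dict, Mapping, Optional

# Assumption: "active" comes from the PR description; "deprecated" is a
# hypothetical second value. The real allowed-value lists live in the RFC.
ALLOWED_STATUS = frozenset({"active", "deprecated"})


def resolve_owner(overrides: Mapping[str, str], domain: Optional[str]) -> str:
    """Resolve OWNER: explicit override, then team alias, then 'unknown'."""
    explicit = overrides.get("owner")
    if explicit:
        return explicit
    if domain:
        # Owning-team alias, used only when the domain is known.
        return f"ds-{domain}-team"
    return "unknown"


def build_tags_sketch(
    domain: Optional[str],
    overrides: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Hypothetical stand-in for build_table_tags (OWNER/STATUS/DOMAIN only)."""
    overrides = overrides or {}
    status = overrides.get("status", "active")  # STATUS defaults to active
    if status not in ALLOWED_STATUS:
        # Assumption: invalid values raise; the PR only says they are validated.
        raise ValueError(f"STATUS {status!r} is not one of {sorted(ALLOWED_STATUS)}")
    tags = {"OWNER": resolve_owner(overrides, domain), "STATUS": status}
    if domain:
        tags["DOMAIN"] = domain
    return tags
```

Note that the username of the running process never enters the resolution, which matches the stated reason for avoiding current.username on deployed runs.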

Admin prerequisites (per the RFC §3): in PATTERN_DB.DATA_SCIENCE, create the 7 TABLE_* tag definitions (with ALLOWED_VALUES on TABLE_STATUS / TABLE_SLA) and grant the publishing role APPLY on each. Until then, tagging is skipped with a warning (publish still succeeds).
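
As a rough illustration of the admin step, the statements could be generated as below. This is a sketch under assumptions, not the RFC's script: the helper names, the role name PUBLISHER_ROLE, and the "deprecated" value are hypothetical, and only TABLE_STATUS is shown because the full list of seven tags is not spelled out here. The inputs are treated as trusted admin constants, so no escaping is attempted.

```python
from typing import List, Optional, Sequence


def create_tag_sql(tag: str, allowed_values: Optional[Sequence[str]] = None) -> str:
    """Build a CREATE TAG statement, optionally with ALLOWED_VALUES."""
    sql = f"CREATE TAG IF NOT EXISTS {tag}"
    if allowed_values:
        quoted = ", ".join(f"'{value}'" for value in allowed_values)
        sql += f" ALLOWED_VALUES {quoted}"
    return sql


def grant_apply_sql(tag: str, role: str) -> str:
    """Build the grant that lets a role apply an existing tag."""
    return f"GRANT APPLY ON TAG {tag} TO ROLE {role}"


def admin_statements() -> List[str]:
    # Hypothetical values: only "active" is confirmed by the PR description.
    tag = "PATTERN_DB.DATA_SCIENCE.TABLE_STATUS"
    return [
        create_tag_sql(tag, ["active", "deprecated"]),
        grant_apply_sql(tag, "PUBLISHER_ROLE"),
    ]
```

Printing the result of admin_statements() gives statements an admin could review before running them once per tag.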

ClickUp card - DS-2491

Copilot AI left a comment


Pull request overview

Adds first-class Snowflake table-ownership tagging to the Metaflow publish “chokepoints” and introduces an admin helper that creates a central registry view surfacing those tags.

Changes:

  • Introduces _snowflake/object_tags.py to derive/validate ownership tags, generate ALTER TABLE … SET TAG SQL, and apply tags (warn-only on tagging failures).
  • Wires optional tags overrides into publish() and publish_pandas() and applies tags for production tables.
  • Adds create_ownership_registry_view() + docs to create/query the TABLE_OWNERSHIP_REGISTRY view; bumps version to 0.5.0.
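
To make the "validate, build, warn-only apply" flow in the summary concrete, here is a minimal sketch. It is not the implementation in object_tags.py: the function names, the regular expressions, the conservative whitelist for tag values, and the decision to let validation errors propagate while swallowing execution errors are all assumptions. The execute argument stands in for whatever runs SQL against Snowflake.

```python
import re
import warnings
from typing import Callable, Mapping

# Unquoted identifier: starts with a letter or underscore, then letters,
# digits, underscores, or dollar signs.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")
# Assumption: tag values are restricted to a conservative character set
# instead of being escaped.
_SAFE_VALUE = re.compile(r"[A-Za-z0-9_.@\- ]+")


def _check_identifier(name: str) -> str:
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def set_tag_sql(table: str, tags: Mapping[str, str]) -> str:
    """Build ALTER TABLE ... SET TAG for a dotted table name."""
    parts = [_check_identifier(part) for part in table.split(".")]
    assignments = []
    for tag_name, value in tags.items():
        _check_identifier(tag_name)
        if not _SAFE_VALUE.fullmatch(value):
            raise ValueError(f"unsafe tag value: {value!r}")
        assignments.append(f"{tag_name} = '{value}'")
    return f"ALTER TABLE {'.'.join(parts)} SET TAG {', '.join(assignments)}"


def apply_tags_warn_only(
    execute: Callable[[str], object], table: str, tags: Mapping[str, str]
) -> bool:
    """Run the statement; on failure warn and return False instead of raising."""
    sql = set_tag_sql(table, tags)  # validation errors still propagate
    try:
        execute(sql)
        return True
    except Exception as exc:  # e.g. missing tag definition or APPLY grant
        warnings.warn(f"Skipping ownership tags for {table}: {exc}")
        return False
```

Rejecting unexpected characters outright is simpler to reason about than escaping them, at the cost of refusing some legitimate values; whether the PR takes the same approach is not stated.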

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| uv.lock | Bumps locked project version to 0.5.0. |
| pyproject.toml | Bumps package version to 0.5.0 and updates authors. |
| src/ds_platform_utils/_snowflake/object_tags.py | New tag derivation/SQL builder/apply helpers for Snowflake object tags. |
| src/ds_platform_utils/metaflow/write_audit_publish.py | Adds tags parameter and applies tags post-SWAP in production. |
| src/ds_platform_utils/metaflow/pandas.py | Adds tags parameter and applies tags in production for both direct and S3-stage paths. |
| src/ds_platform_utils/metaflow/registry.py | New helper to create the ownership registry view in Snowflake. |
| src/ds_platform_utils/metaflow/__init__.py | Exports create_ownership_registry_view. |
| tests/unit_tests/snowflake/test__object_tags.py | New unit tests for tag derivation, SQL construction, and apply behavior. |
| README.md | Links to the new create_ownership_registry_view docs page. |
| docs/metaflow/publish.md | Documents the new tags param and ownership-tag behavior. |
| docs/metaflow/publish_pandas.md | Documents the new tags param and ownership-tag behavior. |
| docs/metaflow/create_ownership_registry_view.md | New docs for creating/querying the registry view. |


Comment thread src/ds_platform_utils/metaflow/pandas.py Outdated
Comment thread src/ds_platform_utils/_snowflake/object_tags.py
Comment thread docs/metaflow/publish.md Outdated
…proved error handling, GitHub Copilot suggested code changes
… tags

Owner now derives from domain (ds-<domain>-team); tags co-locate with the
table's schema; adds a functional test for the tagging path.
Team decided dev (DATA_SCIENCE_STAGE) tables shouldn't be tagged. Re-gate
publish()/publish_pandas() on current.is_production; keep the owner-from-
domain derivation, validation, and expanded allowed values.
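
The production gate from the last commit could look roughly like the sketch below. The helper name and signature are hypothetical; in the real publish()/publish_pandas() the flag would presumably come from current.is_production, which is passed in as a plain boolean here so the example runs without Metaflow.

```python
from typing import Callable


def tag_if_production(is_production: bool, apply_tags: Callable[[], None]) -> bool:
    """Call apply_tags only for production runs; report whether it ran."""
    if not is_production:
        # Dev (DATA_SCIENCE_STAGE) tables are left untouched.
        return False
    apply_tags()
    return True
```

Keeping the gate separate from the tagging call means the warn-don't-fail behavior and the production check can be tested independently.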
3 participants