Skip to content

Mask as a struct attribute.#304

Open
yfukai wants to merge 40 commits into
royerlab:mainfrom
yfukai:mask_struct_attr
Open

Mask as a struct attribute.#304
yfukai wants to merge 40 commits into
royerlab:mainfrom
yfukai:mask_struct_attr

Conversation

@yfukai

@yfukai yfukai commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

This PR explores options to save the mask as a struct attribute. Expect #268 be merged.

yfukai and others added 30 commits February 17, 2026 13:26
The previous commit (db35287) mistakenly deleted upstream/main's _SQLIDSet
scratch-table machinery, _create_id_scratch_table, the out_degree/copy
bound-variable handling, and the three scratch-table tests — they were diffed
against a stale fork main and wrongly treated as PR-added code. Restore them
verbatim from main; the struct-attr column-expansion simplification from the
previous commit is kept.
yfukai and others added 4 commits June 8, 2026 01:41
Masks were registered as pl.Object and round-tripped through pickle,
leaving the bbox locked inside an opaque blob. They are now stored as
pl.Struct({min_(z)yx, max_(z)yx: Int64, data: Binary}) so bbox fields
are natively filterable via NodeAttr("mask").struct.field(...), while
the binary mask stays blosc2-compressed in the data field.

- Mask.struct_dtype() / to_struct() / from_struct() conversion API,
  plus as_mask() to coerce struct dicts and legacy Mask objects alike
- RegionPropsNodes and MaskDiskAttrs register the struct dtype and
  write struct values
- consumers (GraphArrayView, IoUEdgeAttr, MaskMatching, ctc metrics,
  compute_overlaps, to_geff) materialize masks via as_mask(), so
  legacy pl.Object mask attributes keep working
- ctc metrics skip the pickle-to-bytes multiprocessing shim for
  struct mask columns, which are Arrow-native

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
This was referenced Jun 14, 2026
@yfukai yfukai mentioned this pull request Jun 17, 2026
16 tasks
yfukai and others added 6 commits June 16, 2026 17:34
Resolve conflicts from upstream's struct-attribute PR (royerlab#268) and the
single->bulk add_node/add_edge refactor:

- attrs.py: take upstream's `columns` property and `f.to_attr().expr`
  filter reduction (handles compound AttrFilters as well as struct-field
  AttrComparisons).
- _rustworkx_graph.py: drop the now-duplicate nested `_extract_field_path`;
  use upstream's module-level `_eval_filter` (supports AND/OR/XOR/NOT).
- _sql_graph.py: adopt upstream's `_to_sql_clause` and the removal of the
  single `add_node`/`add_edge` methods (BaseGraph now delegates to the bulk
  variants). Mask struct flattening is preserved via `_flatten_attrs_for_write`
  on the bulk write paths.

Mask remains stored as a struct attribute (this branch's feature); upstream
left Mask as pl.Object.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The Mask struct's `data` leaf (blosc2-compressed bytes) was stored through a
SQLAlchemy PickleType column, wrapping the already-compressed bytes in a second
pickle layer on every write and unpickling on every read.

Map `pl.Binary` to `sa.LargeBinary` so binary bytes are stored as a raw BLOB.
Pickling is now detected from the actual SQL column type (PickleType only, not
LargeBinary):

- `_is_pickled_sql_type` returns True only for PickleType columns.
- `unpickle_bytes_columns` takes the explicit set of pickled physical columns
  so raw-binary columns (e.g. the Mask `data` leaf) are left untouched.
- `_restore_pickled_column_types` re-tags reflected LargeBinary columns as
  PickleType except genuine raw-binary columns (schema dtype pl.Binary), which
  reflection cannot otherwise distinguish from pickled blobs. Reordered
  `_define_schema` so the attribute schemas are available when this runs.

Adds a regression test asserting the Mask `data` leaf is a raw LargeBinary
column (not PickleType) before and after reload, and that the mask round-trips
and struct-field filtering still works.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Remove duplicate `_StructNamespace` class in attrs.py left by the upstream
  merge (this branch and upstream PR royerlab#268 each added an identical copy).
- Drop the now-empty `if TYPE_CHECKING: pass` block and unused `TYPE_CHECKING`
  import in array/_graph_array.py.
- Fix `_restore_pickled_column_types` docstring to reference the helper it
  actually uses (`_raw_binary_physical_columns`).
- Align Mask.bbox_struct_fields/struct_dtype docstrings with the supported
  1-to-3 dimension range.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
These four imports break a graph <-> nodes package import cycle: `_base_graph`
is imported by nearly every package `__init__`, so a module-level
`from tracksdata.nodes._mask import as_mask` re-enters the partially-initialized
graph package. Add a short comment at each site so the pattern isn't naively
"tidied" into a module-level import.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@yfukai yfukai left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@yfukai yfukai marked this pull request as ready for review June 22, 2026 22:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants