mitoverse (code)

Build + access tooling for the MitoVerse 3D-EM mitochondria benchmark. Data + splits live in a separate HuggingFace repo, pytc/MitoVerse (cluster: /projects/weilab/dataset/mitoverse). See DESIGN.md for the format decision and .agent/plan.md for the benchmark rationale. Current build: 212 volumes, 13 datasets, ~29.8k mitochondria.

mitoverse/io.py          load(volume_id) -> Volume(img, mito, mask, meta)
catalog/datasets.yaml    per-dataset source metadata (modality/species/voxel/license)
scripts/
  ingest.py              one HDF5/TIFF image+instance pair -> one zarr in the data repo
  ingest_dir.py          batch-ingest every *_im/_mito(/_mask) pair in a directory
  ingest_stream.py       slab-by-slab ingest for huge volumes (e.g. MitoEM), no full RAM load
  to_pytc.py             a splits/*.json benchmark -> PyTorchConnectomics cfg.data block
  rebuild_index.py       regenerate data-repo catalog.json from every store's .zattrs
  build_web.py           generate docs/index.html — the multi-tab dataset explorer
docs/index.html          self-contained catalog explorer (All / by modality / organism / resolution / tissue / dataset / provenance)
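rebuild_index.py regenerates the data repo's catalog.json from every store's .zattrs. The sketch below is a minimal illustration of that idea, not the script's actual code. It assumes the data/<dataset>/<volume>.zarr layout from "Layout & conventions", that each store's metadata sits under the "mitoverse" key of a top-level Zarr v2 .zattrs JSON file, and that catalog entries are keyed by volume IDs of the form <dataset>_<volume> (as in guay21_vol0). The function name rebuild_catalog and the output shape are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path


def rebuild_catalog(data_root: str | Path, out_path: str | Path) -> dict:
    """Hypothetical sketch: gather .zattrs["mitoverse"] from every store.

    Assumes data_root/<dataset>/<volume>.zarr/.zattrs (the Zarr v2 attribute
    file) and keys each entry as "<dataset>_<volume>".
    """
    catalog = {}
    for zattrs in sorted(Path(data_root).glob("*/*.zarr/.zattrs")):
        store = zattrs.parent                     # .../<dataset>/<volume>.zarr
        dataset = store.parent.name
        volume = store.name[: -len(".zarr")]
        attrs = json.loads(zattrs.read_text())
        if "mitoverse" not in attrs:              # skip stores without MitoVerse metadata
            continue
        catalog[f"{dataset}_{volume}"] = attrs["mitoverse"]
    # Sorted keys keep the regenerated file diff-friendly between rebuilds.
    Path(out_path).write_text(json.dumps(catalog, indent=2, sort_keys=True))
    return catalog
```

Because the catalog is derived entirely from per-store metadata, it can be thrown away and rebuilt at any time; the stores stay the single source of truth.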

Layout & conventions

One zarr DirectoryStore per original-chunk volume: data/<dataset>/<volume>.zarr/ with arrays img (uint8 ZYX) and mito (instance, 0=bg), metadata in .zattrs["mitoverse"]. Datasets are <author><yy> folders; OpenOrganelle/COSEM volumes live in openorganelle/ and their annotation provenance is carried by split membership (splits/cellmap.json vs splits/mitoem2.0.json). Benchmarks are split overlays in the data repo's splits/, never duplicated data.
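The one-store-per-volume convention makes a volume ID resolvable to array paths without any lookup table. A minimal sketch, assuming IDs have the form <dataset>_<volume> (as in guay21_vol0) and that dataset folder names contain no underscore, which holds for <author><yy> names and for openorganelle. The helper store_paths is hypothetical; its image/label keys simply mirror the <vol>.zarr/img and <vol>.zarr/mito paths that PyTorchConnectomics reads.

```python
from pathlib import PurePosixPath


def store_paths(volume_id: str, data_root: str = "data") -> dict:
    """Hypothetical helper: map "<dataset>_<volume>" to its zarr store and arrays.

    Assumes dataset names contain no "_", so the first underscore splits
    dataset from volume (volume names may still contain underscores).
    """
    dataset, sep, volume = volume_id.partition("_")
    if not sep or not dataset or not volume:
        raise ValueError(f"expected <dataset>_<volume>, got {volume_id!r}")
    store = PurePosixPath(data_root) / dataset / f"{volume}.zarr"
    return {
        "store": str(store),
        "image": str(store / "img"),   # uint8 ZYX image
        "label": str(store / "mito"),  # instance labels, 0 = background
    }
```

For example, store_paths("guay21_vol0")["label"] gives "data/guay21/vol0.zarr/mito".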

Use

# add a volume (reuses existing curated h5/tiff; no re-annotation)
python scripts/ingest.py --dataset guay21 --volume vol0 \
    --im .../guay21/vol0_im.h5 --mito .../guay21/vol0_mito.h5 \
    --voxel 10,10,50 --modality SBF-SEM --species Human --tissue platelet

# batch a whole source dir, then refresh the index
python scripts/ingest_dir.py --src-dir .../betaSeg --dataset muller20 --voxel 16,16,16 --modality FIB-SEM --species Mouse --tissue pancreas
python scripts/rebuild_index.py
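ingest_dir.py picks up every *_im/_mito(/_mask) pair in a directory. A minimal sketch of that pairing step, not the script's actual code: it assumes files are named <name>_im.<ext>, <name>_mito.<ext> and optionally <name>_mask.<ext>, with .h5/.hdf5/.tif/.tiff extensions. The function find_pairs and the regex are hypothetical.

```python
import re
from pathlib import Path

# Hypothetical naming assumption: <name>_im.<ext>, <name>_mito.<ext>, optional <name>_mask.<ext>
_PAIR_RE = re.compile(r"^(?P<name>.+)_(?P<role>im|mito|mask)\.(?P<ext>h5|hdf5|tif|tiff)$")


def find_pairs(src_dir) -> dict:
    """Group files by <name>; keep only groups with both an image and mito labels."""
    groups = {}
    for path in sorted(Path(src_dir).iterdir()):
        m = _PAIR_RE.match(path.name)
        if m:
            groups.setdefault(m["name"], {})[m["role"]] = path
    # A volume is ingestible only if image and instance labels both exist; mask is optional.
    return {name: roles for name, roles in groups.items() if "im" in roles and "mito" in roles}
```

Unmatched files (notes, previews) and image-only volumes are skipped rather than failing the whole batch; each surviving group would then be handed to the single-volume ingest.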

# a benchmark split -> PyTC data config
python scripts/to_pytc.py /projects/weilab/dataset/mitoverse/splits/mitoem2.0.json

# regenerate the explorer website
python scripts/build_web.py

from mitoverse import load
v = load("guay21_vol0")        # v.img, v.mito, v.semantic (derived), v.meta
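v.semantic is derived from the instance array rather than stored. A minimal sketch of one such derivation, assuming "semantic" means every voxel with a nonzero instance label (the 0 = background convention from "Layout & conventions"); it is not mitoverse/io.py's actual code, and both helper names are hypothetical.

```python
import numpy as np


def semantic_from_instances(mito: np.ndarray) -> np.ndarray:
    """Binary foreground mask: 1 wherever an instance label is nonzero."""
    # Instance convention: 0 = background, any other value = one mitochondrion.
    return (mito != 0).astype(np.uint8)


def instance_ids(mito: np.ndarray) -> np.ndarray:
    """Sorted instance IDs present in the volume, excluding background."""
    ids = np.unique(mito)
    return ids[ids != 0]


# Toy ZYX instance volume with two mitochondria (IDs 3 and 7).
toy = np.zeros((2, 3, 3), dtype=np.uint32)
toy[0, :2, :2] = 3
toy[1, 2, :] = 7
print(semantic_from_instances(toy).sum())   # 7 foreground voxels
print(instance_ids(toy).tolist())           # [3, 7]
```

Deriving the mask on load avoids storing a second label array that could drift out of sync with mito.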

PyTorchConnectomics reads the stores natively — image: <vol>.zarr/img, label: <vol>.zarr/mito.

Environments

pytc conda env (zarr 2.18, h5py, numcodecs, tifffile) runs everything except v3 OME-Zarr reads.
Reading Peng's MitoEM2.0_OMEZarr (Zarr v3 multiscale) needs a separate env: conda create -n zarrv3 -c conda-forge "zarr>=3" numcodecs h5py tifffile. Don't put zarr 3 in pytc.
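Which environment a store needs can be decided from its root metadata file: Zarr v2 stores (like the .zattrs-based stores described in "Layout & conventions") carry .zgroup or .zarray, while Zarr v3 stores such as MitoEM2.0_OMEZarr carry a single zarr.json. A minimal sketch using only the standard library, so it runs in either env; zarr_format and env_for are hypothetical helpers, and the env names are the two from this section.

```python
from pathlib import Path


def zarr_format(store) -> int:
    """Return 2 or 3 based on the metadata file at the store root.

    Zarr v2 keeps .zgroup/.zarray (+ .zattrs); Zarr v3 keeps a single zarr.json.
    """
    root = Path(store)
    if (root / "zarr.json").is_file():
        return 3
    if (root / ".zgroup").is_file() or (root / ".zarray").is_file():
        return 2
    raise ValueError(f"no Zarr v2/v3 metadata found at {root}")


def env_for(store) -> str:
    # zarr 2.18 lives in pytc; zarr>=3 lives in the separate zarrv3 env.
    return "zarrv3" if zarr_format(store) == 3 else "pytc"
```

This checks only the store root, which is enough to route a whole store to the right env before importing zarr at all.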
