Alexhillsley/refactor by ahillsley · Pull Request #14 · royerlab/ops_model

ahillsley · 2026-06-12T16:15:41Z

Changes:

Restructure combination (dropped old pipeline and focused on pca_optimization)
Add config entry point to pca_optimization
Score CORUM/CHAD/EBI independently, so when one metric fails the others still run
Unify CSV -> AnnData processing into process_features_csv
Deprecate eval subdir and remove outdated tests
Add end-to-end test scripts for the feature pipelines
Fix stale hardcoded config paths
Fix stale icd.ops gene cluster path

…ntal + add-on modules

The argparse subpackage (`python -m ...combination.pca_optimization`) is now the canonical entry point, so the config/baseline.yml path is removed and the remaining modules are grouped by role.

Remove deprecated config/baseline.yml path (no non-test/non-scratch importers):
- cli.py, config_handler.py, file_validator.py
- combiners.py (the duplicate PcaOptimizationCombiner + deprecated ComprehensiveCombiner)
- classifier_combiner.py / classifier_aggregator.py (dormant; never wired into the CLI)

Group downstream-only analysis tools into analysis/:
- embedding_overlays, compare_map_scores, compare_modalities, pca_component_to_feature, marker_norm_sweep_runner

Group optional flag-gated stages into pipeline_add_ons/:
- op_signal, chromosome, guide_chrom_arm_correction

Update all importers (pca_optimization __init__/handlers/phase2/embeddings and models/attention/embedding/regen_umap_html) to the new paths. pca_sweep_op_signal is still re-exported through pca_optimization's namespace; test_pca_optimization_refactor passes (41/41).

Add README.md (how to run the subpackage) and SCRIPT_MAP.md (core vs supplemental inventory).

cell_filters.py is retained pending a port-vs-drop decision (no subpackage equivalent yet).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…tion

Two additive features for the pca_optimization combination pipeline, both reusing the existing argparse Namespace + Phase 1/2 with no parallel schema.

1. --config <yaml>: run the pipeline from a config file whose keys are the CLI argument names (snake_case dest names). Config values populate argparse defaults via set_defaults; any explicit CLI flag still overrides. Adds run_from_config() (programmatic entry) and _load_and_validate_config() (rejects unknown keys + the phase_only/no_phase conflict). main() is split into main() (parse + config merge) and run(args) (the unchanged dispatch). Example at pca_optimization/example_config.yml.

2. signal_paths (a config key): combine cell-level embedding h5ads that live OUTSIDE the standard experiment layout. Maps a signal-group name -> one h5ad path or a list of paths (pooled); each h5ad uses the same schema as the discovery features_processed_*.h5ad. phase1.pca_sweep_pooled_signal gains an optional cell_paths override (explicit path instead of find_cell_h5ad_path); new handlers._handle_external builds signal groups from the manifest and reuses the pooled worker + Phase 2. Experiment discovery is skipped; output lands under <output_dir>/external/.

Verified: 41/41 structural tests pass; external ingest validated end-to-end (two synthetic h5ads pooled into one signal -> per_signal guide/gene outputs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
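The config merge described in item 1 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `build_parser`, `validate_config`, `parse_with_config`, and `normalize_signal_paths` are hypothetical names, the three flags are a reduced stand-in for the real parser, and the config is passed as an already-loaded mapping (YAML loading is omitted). It only relies on `argparse.ArgumentParser.set_defaults`, which the commit message names.

```python
import argparse
from typing import Any, Dict, List, Mapping, Sequence, Union

# Config-only keys that have no CLI flag (signal_paths is named in the commit).
CONFIG_ONLY_KEYS = {"signal_paths"}


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical, heavily reduced stand-in for the real parser."""
    parser = argparse.ArgumentParser(prog="pca_optimization")
    parser.add_argument("--output-dir", dest="output_dir", default="out")
    parser.add_argument("--phase-only", dest="phase_only", default=None)
    parser.add_argument("--no-phase", dest="no_phase", default=None)
    return parser


def validate_config(config: Mapping[str, Any],
                    parser: argparse.ArgumentParser) -> Dict[str, Any]:
    """Reject unknown keys and the phase_only/no_phase conflict."""
    # Parsing an empty argv yields every dest name the parser defines.
    known = set(vars(parser.parse_args([]))) | CONFIG_ONLY_KEYS
    unknown = sorted(set(config) - known)
    if unknown:
        raise ValueError(f"unknown config keys: {unknown}")
    if config.get("phase_only") is not None and config.get("no_phase") is not None:
        raise ValueError("phase_only and no_phase cannot both be set")
    return dict(config)


def normalize_signal_paths(
    signal_paths: Mapping[str, Union[str, Sequence[str]]]
) -> Dict[str, List[str]]:
    """Map each signal-group name to a list of paths (one path -> one-item list)."""
    return {
        name: [paths] if isinstance(paths, str) else list(paths)
        for name, paths in signal_paths.items()
    }


def parse_with_config(argv: Sequence[str],
                      config: Mapping[str, Any]) -> argparse.Namespace:
    """Config values become defaults; explicit CLI flags still override them."""
    parser = build_parser()
    parser.set_defaults(**validate_config(config, parser))
    return parser.parse_args(list(argv))
```

With this shape, `parse_with_config(["--output-dir", "cli"], {"output_dir": "cfg"})` yields `output_dir == "cli"`, while an empty argv falls back to the config value. Validating before `set_defaults` keeps a typo in the config from silently becoming an unused Namespace attribute.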

Move src/ops_model/eval into src/ops_model/deprecated/eval (following the existing deprecated/ convention) and delete tests/eval. Also drop the now-dead run_eval console-script entry point from pyproject.toml. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Replace the two near-duplicate pipelines (evaluate_cp.process and evaluate_embeddings.process_embedding_csv) with a single processing_common.process_features_csv that branches on feature type: CellProfiler builds the cell AnnData and splits by reporter; embedding models (dinov3/cell_dino/subcell) build one per-channel AnnData. Shared embedding-config parsing, guide/gene aggregation, and validate_and_save now live in processing_common; cp_features and batch_process_embeddings call the single entry point and the old ones are removed.

Also folds in the features/ cleanup: dead functions + unused imports removed, the good_rows_mask NaN-row bug fixed, and the broken test_evaluate_dinov3.py deleted (its module was generalized into evaluate_embeddings).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
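The single-entry-point idea (one function that branches on feature type) might look roughly like the sketch below. It is only an illustration of the dispatch: plain lists of dicts stand in for the AnnData objects, and `process_features`, `group_rows`, and the `reporter`/`channel` column names are assumptions rather than the actual signature of process_features_csv.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]

# Embedding model names as listed in the commit message.
EMBEDDING_MODELS = {"dinov3", "cell_dino", "subcell"}


def group_rows(rows: List[Row], key: str) -> Dict[Any, List[Row]]:
    """Group rows by the value stored under `key`."""
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def process_features(rows: List[Row], feature_type: str) -> Dict[Any, List[Row]]:
    """Hypothetical dispatcher: one entry point, two branches."""
    if feature_type == "cellprofiler":
        # CellProfiler branch: build the cell-level table, then split by reporter.
        return group_rows(rows, "reporter")
    if feature_type in EMBEDDING_MODELS:
        # Embedding branch: one table per channel.
        return group_rows(rows, "channel")
    raise ValueError(f"unsupported feature type: {feature_type!r}")
```

Callers such as cp_features and batch_process_embeddings would then differ only in the `feature_type` they pass, and an unrecognized type fails immediately instead of falling through to the wrong branch.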

tests/e2e_tests/ — self-contained scripts (run directly with uv run python, not pytest) covering each core ops_model feature: the cell_dino, dinov3, subcell and cell_profiler extractors, and the pca_optimization combination pipeline. Each subsets real inputs to a minimal example in a tmp dir, points an inline config at it, runs the feature normally, and verifies the outputs at each step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Two hardcoded paths that silently disabled features outside one environment:
- annotated_gene_panel_July2025.csv: repoint from the now-missing /hpc/projects/intracellular_dashboard/ops/configs path to the present icd.fast.ops/configs location (data/embeddings/utils.py, funk_clusters.py, combination/analysis/embedding_overlays.py). The dead path was caught-and-skipped, silently dropping CORUM consistency scoring.
- gene_supercategory_mapping.yaml: default to the in-repo copy (resolved from the repo root) instead of a personal home-dir path that was permission-denied for other users (combination/analysis/embedding_overlays.py, compare_modalities.py, models/attention/atlas/attention_atlas.py).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
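One way to resolve a default resource from the repo root, instead of hardcoding an absolute path, is sketched below. The helper names, the `pyproject.toml` marker, and the `configs` directory are assumptions made for illustration; the PR does not state how the repo root is located or where the in-repo copy lives. Raising on a missing file is also a choice of this sketch, made so that a dead path cannot be skipped silently.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def find_repo_root(start: PathLike, marker: str = "pyproject.toml") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = Path(start).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"no {marker} found above {start}")


def resolve_resource(filename: str,
                     start: PathLike,
                     relative_dir: str = "configs",  # hypothetical in-repo location
                     override: Optional[PathLike] = None) -> Path:
    """Return an explicit override if given, else the in-repo default."""
    if override is not None:
        path = Path(override)
    else:
        path = find_repo_root(start) / relative_dir / filename
    if not path.is_file():
        # Fail loudly so a stale path cannot quietly disable a feature.
        raise FileNotFoundError(f"resource not found: {path}")
    return path
```

A module would typically call `resolve_resource("gene_supercategory_mapping.yaml", start=Path(__file__).parent)` so the lookup does not depend on the current working directory or on any one user's home directory.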

_score_consistency previously ran CORUM, CHAD and EBI inside one shared try/except, so a failure in one metric (e.g. CHAD failing to parse its annotation) silently suppressed the others and dropped EBI entirely. Each metric now runs in its own try/except and returns (None, 0.0) on failure; the panel/volcano plots are best-effort. One metric failing no longer takes the rest down. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
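The isolation pattern described above can be sketched as follows. This is a simplified illustration rather than the actual _score_consistency: `run_isolated`, `score_consistency`, and the scorer callables are hypothetical, and only the `(None, 0.0)` failure value is taken from the commit message.

```python
import logging
from typing import Any, Callable, Dict, Mapping, Optional, Tuple

logger = logging.getLogger(__name__)

MetricResult = Tuple[Optional[Any], float]
Scorer = Callable[[Any], MetricResult]


def run_isolated(name: str, scorer: Scorer, embedding: Any) -> MetricResult:
    """Run one metric in its own try/except; return (None, 0.0) on failure."""
    try:
        return scorer(embedding)
    except Exception:
        # Log the traceback so the failure is visible, then keep going.
        logger.exception("%s scoring failed; continuing with other metrics", name)
        return None, 0.0


def score_consistency(embedding: Any,
                      scorers: Mapping[str, Scorer]) -> Dict[str, MetricResult]:
    """Score every metric independently of the others."""
    return {name: run_isolated(name, fn, embedding) for name, fn in scorers.items()}
```

If the CHAD scorer raises while parsing its annotation, the returned dict still carries the CORUM and EBI results, and the CHAD entry is `(None, 0.0)`. Logging inside the handler keeps the failure from being silent, which was the problem with the shared try/except.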

/hpc/projects/icd.ops/configs/gene_clusters/ no longer exists; the CHAD cluster YAMLs live under icd.fast.ops. Repoint the dead references (CHAD overlay hierarchy/cluster-map in pca_optimization phase2/handlers, deprecated gene/guide eval, titration decay tools, compare_map_scores) so they load instead of being skipped with warnings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ahillsley · 2026-06-12T16:16:41Z

@gav-sturm Could you try running a few extraction pipelines / evaluations with this PR to make sure everything works? I tested it a few times and it seems good, but I want to make sure it's actually usable on your end.

Delete the old base_dataset.py, the data/embeddings/* helpers (cosine_similarity, embedding_metrics, funk_clusters, pca, umap_plots, utils), and move_links.py, along with their now-obsolete tests (test_basedataset.py, test_feature_metrics.py). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…and should not be part of the public repo

post_process/map/ was only a backward-compat shim re-exporting the mAP functions from ops_utils.analysis (map_scores, map_umap). Nothing in ops_model imports it anymore, so move it to deprecated/ (kept locally only) and drop it from the tree. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ahillsley · 2026-06-24T16:21:07Z

@gav-sturm When you have a chance can you go through ops_model/src/ops_model/models/attention and identify what code is core to the method and what is just extra analysis? Can you then move all the analysis scripts to a separate dir?

Alexander Hillsley and others added 8 commits June 10, 2026 12:52

ahillsley requested a review from gav-sturm June 12, 2026 16:15

Alexander Hillsley and others added 4 commits June 24, 2026 08:59

Merge remote-tracking branch 'origin/main' into alexhillsley/refactor

b0b11aa

remove iss_drift_fix.py from the tracked repo, it was a 1 off script …

70e1a3b

…and should not be part of the public repo

ahillsley wants to merge 12 commits into main from alexhillsley/refactor
