
Alexhillsley/refactor#14

Open
ahillsley wants to merge 12 commits into main from alexhillsley/refactor

@ahillsley

Collaborator

Changes:

  • Restructure combination (drop the old pipeline and focus on pca_optimization)

  • Add a config entry point to pca_optimization

  • Score CORUM/CHAD/EBI independently, so when one metric fails the others still run

  • Unify CSV -> AnnData processing into process_features_csv

  • Deprecate the eval subdir and remove outdated tests

  • Add end-to-end test scripts for the feature pipelines

  • Fix stale hardcoded config paths

  • Fix stale icd.ops gene cluster path

Alexander Hillsley and others added 8 commits June 10, 2026 12:52
…ntal + add-on modules

The argparse subpackage (`python -m ...combination.pca_optimization`) is now the
canonical entry point, so the config/baseline.yml path is removed and the remaining
modules are grouped by role.

Remove deprecated config/baseline.yml path (no non-test/non-scratch importers):
- cli.py, config_handler.py, file_validator.py
- combiners.py (the duplicate PcaOptimizationCombiner + deprecated ComprehensiveCombiner)
- classifier_combiner.py / classifier_aggregator.py (dormant; never wired into the CLI)

Group downstream-only analysis tools into analysis/:
- embedding_overlays, compare_map_scores, compare_modalities,
  pca_component_to_feature, marker_norm_sweep_runner

Group optional flag-gated stages into pipeline_add_ons/:
- op_signal, chromosome, guide_chrom_arm_correction

Update all importers (pca_optimization __init__/handlers/phase2/embeddings and
models/attention/embedding/regen_umap_html) to the new paths. pca_sweep_op_signal is
still re-exported through pca_optimization's namespace; test_pca_optimization_refactor
passes (41/41).

Add README.md (how to run the subpackage) and SCRIPT_MAP.md (core vs supplemental
inventory). cell_filters.py is retained pending a port-vs-drop decision (no subpackage
equivalent yet).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…tion

Two additive features for the pca_optimization combination pipeline, both reusing
the existing argparse Namespace + Phase 1/2 with no parallel schema.

1. --config <yaml>: run the pipeline from a config file whose keys are the CLI
   argument names (snake_case dest names). Config values populate argparse
   defaults via set_defaults; any explicit CLI flag still overrides. Adds
   run_from_config() (programmatic entry) and _load_and_validate_config()
   (rejects unknown keys + the phase_only/no_phase conflict). main() is split
   into main() (parse + config merge) and run(args) (the unchanged dispatch).
   Example at pca_optimization/example_config.yml.

2. signal_paths (a config key): combine cell-level embedding h5ads that live
   OUTSIDE the standard experiment layout. Maps a signal-group name -> one h5ad
   path or a list of paths (pooled); each h5ad uses the same schema as the
   discovery features_processed_*.h5ad. phase1.pca_sweep_pooled_signal gains an
   optional cell_paths override (explicit path instead of find_cell_h5ad_path);
   new handlers._handle_external builds signal groups from the manifest and
   reuses the pooled worker + Phase 2. Experiment discovery is skipped; output
   lands under <output_dir>/external/.
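
As a rough illustration of the --config merge described in item 1, the sketch below feeds config values into argparse through set_defaults so that explicit CLI flags still win. It is not the PR's code: the option names other than --config, the option types, and the helper names build_parser, validate_config and parse_with_config are assumptions, and the config is passed as a dict rather than loaded from YAML.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of options; the real parser defines many more.
    parser = argparse.ArgumentParser(prog="pca_optimization_sketch")
    parser.add_argument("--config", default=None)
    parser.add_argument("--output-dir", dest="output_dir", default="out")
    parser.add_argument("--n-components", dest="n_components", type=int, default=50)
    parser.add_argument("--phase-only", dest="phase_only", type=int, default=None)
    parser.add_argument("--no-phase", dest="no_phase", type=int, default=None)
    return parser


def validate_config(parser: argparse.ArgumentParser, config: dict) -> dict:
    """Reject keys that are not argparse dest names, and conflicting options."""
    # Parsing an empty argv yields a Namespace holding every known dest name.
    known = set(vars(parser.parse_args([])))
    unknown = sorted(set(config) - known)
    if unknown:
        raise ValueError(f"unknown config keys: {unknown}")
    # Assumption: the conflict is "both keys set at the same time".
    if config.get("phase_only") is not None and config.get("no_phase") is not None:
        raise ValueError("phase_only and no_phase cannot both be set")
    return config


def parse_with_config(argv: list, config: dict) -> argparse.Namespace:
    parser = build_parser()
    # Config values become defaults, so any flag present in argv overrides them.
    parser.set_defaults(**validate_config(parser, config))
    return parser.parse_args(argv)
```

With this shape, parse_with_config([], {"n_components": 10}) yields n_components == 10, while passing ["--n-components", "20"] alongside the same config yields 20.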

Verified: 41/41 structural tests pass; external ingest validated end-to-end
(two synthetic h5ads pooled into one signal -> per_signal guide/gene outputs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move src/ops_model/eval into src/ops_model/deprecated/eval (following the
existing deprecated/ convention) and delete tests/eval. Also drop the now-dead
run_eval console-script entry point from pyproject.toml.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the two near-duplicate pipelines (evaluate_cp.process and
evaluate_embeddings.process_embedding_csv) with a single
processing_common.process_features_csv that branches on feature type:
CellProfiler builds the cell AnnData and splits by reporter; embedding models
(dinov3/cell_dino/subcell) build one per-channel AnnData. Shared embedding-config
parsing, guide/gene aggregation, and validate_and_save now live in
processing_common; cp_features and batch_process_embeddings call the single
entry point and the old ones are removed.

Also folds in the features/ cleanup: dead functions + unused imports removed,
the good_rows_mask NaN-row bug fixed, and the broken test_evaluate_dinov3.py
deleted (its module was generalized into evaluate_embeddings).
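
The single-entry-point idea above can be sketched roughly as follows. This is only an illustration of branching on feature type: it groups plain dict rows instead of building AnnData objects, and the function names, the "reporter" and "channel" column names, and the feature-type strings are assumptions rather than the actual processing_common API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

# Assumed feature-type identifiers, taken from the model names in the message.
EMBEDDING_TYPES = {"dinov3", "cell_dino", "subcell"}

Row = Mapping[str, str]


def group_rows(rows: Iterable[Row], key: str) -> Dict[str, List[Row]]:
    """Group rows by the value found in column `key`."""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def process_features(rows: Iterable[Row], feature_type: str) -> Dict[str, List[Row]]:
    """One entry point that branches on feature type (stand-in for AnnData building)."""
    if feature_type == "cell_profiler":
        # CellProfiler branch: one output per reporter.
        return group_rows(rows, "reporter")
    if feature_type in EMBEDDING_TYPES:
        # Embedding branch: one output per channel.
        return group_rows(rows, "channel")
    raise ValueError(f"unsupported feature type: {feature_type!r}")
```

Callers such as cp_features and batch_process_embeddings would then differ only in the feature_type they pass, which is the main point of the consolidation.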

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
tests/e2e_tests/ — self-contained scripts (run directly with uv run python, not
pytest) covering each core ops_model feature: the cell_dino, dinov3, subcell and
cell_profiler extractors, and the pca_optimization combination pipeline. Each
subsets real inputs to a minimal example in a tmp dir, points an inline config
at it, runs the feature normally, and verifies the outputs at each step.
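
A minimal sketch of that script shape (subset, inline config, run, verify) might look like the following. The run_feature function is a stand-in for whichever extractor or pipeline is under test, and the config keys and file names are hypothetical, not those used in tests/e2e_tests/.

```python
import csv
import tempfile
from pathlib import Path


def subset_csv(src: Path, dst: Path, n_rows: int) -> None:
    """Copy the header plus the first n_rows data rows of src into dst."""
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader, writer = csv.reader(fin), csv.writer(fout)
        for i, row in enumerate(reader):
            if i > n_rows:  # row 0 is the header
                break
            writer.writerow(row)


def run_feature(config: dict) -> Path:
    """Stand-in for the feature under test: writes a one-line summary file."""
    with Path(config["input_csv"]).open(newline="") as fin:
        n_cells = sum(1 for _ in csv.DictReader(fin))
    out = Path(config["output_dir"]) / "summary.txt"
    out.write_text(f"n_cells={n_cells}\n")
    return out


def run_e2e(real_input: Path, n_rows: int = 5) -> str:
    """Subset the real input into a tmp dir, run the feature, verify its output."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        small = tmp_dir / "subset.csv"
        subset_csv(real_input, small, n_rows)
        # Inline config pointing at the minimal example.
        config = {"input_csv": str(small), "output_dir": str(tmp_dir)}
        out = run_feature(config)
        assert out.exists(), "feature did not write its output"
        return out.read_text()
```

Keeping everything inside a temporary directory means the script can be run directly and leaves nothing behind.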

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Two hardcoded paths that silently disabled features outside one environment:

- annotated_gene_panel_July2025.csv: repoint from the now-missing
  /hpc/projects/intracellular_dashboard/ops/configs path to the present
  icd.fast.ops/configs location (data/embeddings/utils.py, funk_clusters.py,
  combination/analysis/embedding_overlays.py). The missing path was caught and
  skipped, silently dropping CORUM consistency scoring.
- gene_supercategory_mapping.yaml: default to the in-repo copy (resolved from
  the repo root) instead of a personal home-dir path that was permission-denied
  for other users (combination/analysis/embedding_overlays.py, compare_modalities.py,
  models/attention/atlas/attention_atlas.py).
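
One way to default to an in-repo copy, sketched under assumptions: walk upward to a marker file to find the repo root, and log a warning rather than skipping silently when the file is absent. The function names, the pyproject.toml marker, and the relative path passed in are hypothetical; the PR may resolve the root differently.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def find_repo_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Walk upward from start until a directory containing marker is found."""
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker} found above {start}")


def resolve_config_path(
    relative: str, start: Path, override: Optional[str] = None
) -> Optional[Path]:
    """Prefer an explicit override, else the in-repo copy; warn if it is missing."""
    path = Path(override) if override else find_repo_root(start) / relative
    if not path.exists():
        # A visible warning avoids the silent feature loss described above.
        logger.warning("config file not found: %s", path)
        return None
    return path
```

A module would typically call this with start set to Path(__file__).resolve().parent, so the result does not depend on any one user's home directory or working directory.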

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
_score_consistency previously ran CORUM, CHAD and EBI inside one shared
try/except, so a failure in one metric (e.g. CHAD failing to parse its
annotation) silently suppressed the others and dropped EBI entirely.
Each metric now runs in its own try/except and returns (None, 0.0) on
failure; the panel/volcano plots are best-effort. One metric failing no
longer takes the rest down.
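
The isolation pattern can be sketched as below. This is an illustration rather than the actual _score_consistency body: run_metric and score_consistency are hypothetical names, and each metric is represented by a zero-argument callable returning a (result, score) pair.

```python
import logging
from typing import Callable, Dict, Optional, Tuple

logger = logging.getLogger(__name__)

MetricResult = Tuple[Optional[object], float]


def run_metric(name: str, fn: Callable[[], MetricResult]) -> MetricResult:
    """Run one metric in its own try/except; return (None, 0.0) on failure."""
    try:
        return fn()
    except Exception:
        # Log the traceback so the failure is visible, then keep going.
        logger.exception("%s consistency scoring failed; continuing", name)
        return None, 0.0


def score_consistency(
    metrics: Dict[str, Callable[[], MetricResult]]
) -> Dict[str, MetricResult]:
    """Score every metric independently so one failure cannot suppress the rest."""
    return {name: run_metric(name, fn) for name, fn in metrics.items()}
```

If the CHAD callable raises while CORUM and EBI succeed, the result still contains real values for CORUM and EBI and (None, 0.0) for CHAD.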

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
/hpc/projects/icd.ops/configs/gene_clusters/ no longer exists; the CHAD
cluster YAMLs live under icd.fast.ops. Repoint the dead references (CHAD
overlay hierarchy/cluster-map in pca_optimization phase2/handlers,
deprecated gene/guide eval, titration decay tools, compare_map_scores)
so they load instead of being skipped with warnings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ahillsley requested a review from gav-sturm on June 12, 2026 16:15
@ahillsley

Collaborator Author

@gav-sturm Could you try running a few extraction pipelines / evaluations with this PR to make sure everything works? I tested it a few times and it seems good, but I want to make sure it's actually usable on your end.

Alexander Hillsley and others added 4 commits June 24, 2026 08:59
Delete the old base_dataset.py, the data/embeddings/* helpers
(cosine_similarity, embedding_metrics, funk_clusters, pca, umap_plots,
utils), and move_links.py, along with their now-obsolete tests
(test_basedataset.py, test_feature_metrics.py).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
post_process/map/ was only a backward-compat shim re-exporting the mAP
functions from ops_utils.analysis (map_scores, map_umap). Nothing in
ops_model imports it anymore, so move it to deprecated/ (kept locally
only) and drop it from the tree.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@ahillsley

Collaborator Author

@gav-sturm When you have a chance can you go through ops_model/src/ops_model/models/attention and identify what code is core to the method and what is just extra analysis? Can you then move all the analysis scripts to a separate dir?
