Humble attempt to fix typos by adding codespell action, config etc. #53

Open
yarikoptic wants to merge 4 commits into canlab:master from yarikoptic:enh-codespell

yarikoptic commented Nov 28, 2023
Contributor

Original description: but the number is way too many. If someone could help and push more skips to add -- would be great.

Adjusted one -- mostly redone with my codespell Claude skill. Since 260 files changed, it might need some quality time to click through all of them marking "viewed -- legit" ;):

Adds codespell spell-checking infrastructure and fixes all existing typos it detects (~250 unique typos across ~245 files), so the next typo to land will be caught in CI rather than after the fact.

I've introduced this exact pattern to over a hundred projects with positive feedback (see improveit-dashboard's notes for context). The GitHub Actions workflow has permissions: contents: read only.

There are about 25 prior typo-related commits in master, so this is a recurring issue worth automating.

What's in the PR

Infrastructure (4 commits)

  1. Add CI workflow to codespell master on push and PRs – .github/workflows/codespell.yml, pinned to a commit SHA for actions-codespell@v2.2, runs on push and on PRs targeting master.
  2. Add codespell configuration – .codespellrc with:
    • Skip patterns for binary/data files, External/ (vendored), docs_sphinx_old/, etc.
    • ignore-regex for URLs and for base64-encoded image data embedded by MATLAB live-script export in docs/markdown_tutorials/*.m.
    • ignore-words-list for MATLAB function names (ttest, assignin, evalin), variable-name conventions (indx, alph, te, selt, etc.), domain proper names (Sepulcre, Shepard), and a few struct-field / file names that are public API (prctiles, efficency).
  3. Fix ambiguous typos requiring context review – 29 files, ~32 lines. Each fixed manually after reading the surrounding code; commit body lists every fix with file and rationale (e.g. interation -> iteration in an algorithm-convergence comment, not interaction).
  4. Fix non-ambiguous typos with codespell -w – 244 files, ~405 lines. All single-suggestion typos auto-applied by codespell -w, then I reviewed the diff and reverted the changes that would have broken things:
    • desc.prctiles is a documented struct field of descriptives() output – kept (would have broken downstream readers that codespell missed due to apostrophe-transpose syntax).
    • efficency.m filename retained – the function name must match the file; the sibling efficiency.m (newer version) was correctly cleaned up.
    • groupt variable in effect_size_map.m kept (with inline codespell:ignore) – example workflow variable, renaming would silently break user scripts.
    • Two docs cells that intentionally cite source-code typos (continguous, classfy) restored with inline pragmas.
    • Pipe-column alignment in canlab_glm_dsgninfo.txt re-spaced after word-length changes (because/performed/with/details).
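
The split between commits 3 and 4 can be pictured with a small sketch. This is not codespell's own code: the `SUGGESTIONS` mapping and the `triage` helper are hypothetical, and only the example words are taken from this PR. The assumption is simply that a typo with exactly one candidate replacement is safe to apply automatically, while one with several candidates needs someone to read the surrounding context.

```python
# Hypothetical typo -> candidate-corrections mapping (illustrative only;
# codespell ships its own dictionaries, which are not reproduced here).
SUGGESTIONS = {
    "saggital": ["sagittal"],
    "dispaly": ["display"],
    "interation": ["iteration", "interaction"],
    "trough": ["through", "trough"],
}


def triage(suggestions):
    """Split typos into auto-fixable and needs-review groups."""
    auto, review = {}, {}
    for typo, candidates in suggestions.items():
        if len(candidates) == 1:
            # Exactly one candidate: safe to rewrite in place.
            auto[typo] = candidates[0]
        else:
            # Several plausible fixes: a human must pick from context.
            review[typo] = list(candidates)
    return auto, review


auto_fixable, needs_review = triage(SUGGESTIONS)
print(sorted(auto_fixable))   # ['dispaly', 'saggital']
print(sorted(needs_review))   # ['interation', 'trough']
```

Even the auto-fixable group still needs a diff review afterwards, as the reverts listed above show.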

One real bug found as a side effect

Visualization_functions/tor_wedge_plot.m line 468 had handels(i).texth(1) = ... (typo of the function's return variable handles). The text-handle assignment was being silently discarded into a phantom variable; codespell -w corrected it.

Most-frequent fixes

| typo | count | replacement |
| --- | --- | --- |
| saggital | 40 | sagittal |
| expermental | 28 | experimental |
| dispaly | 22 | display |
| atleast | 17 | at least |
| aproach | 15 | approach |
| concensus | 12 | consensus |
| initalize | 12 | initialize |
| fucntion etc. | 9 each | function, etc. |
| signficant | 8 | significant |
| homogenous | 7 | homogeneous |
| accomodate | 7 | accommodate |
| re-used | 7 | reused |
| analagous | 6 | analogous |
| efficency (in efficiency.m only) | 6 | efficiency |
| ~230 other unique typos | 1–5 each | |

Verification
$ uvx codespell
# (no output – clean)

The CI workflow runs this same command on every push to master and on PRs.


Generated with Claude Code and love to typo-free code.

jcf2 commented Sep 19, 2024

Contributor

In the 'sagittal' case, the change could be extended slightly:

case {'sagg', 'sagittal', 'saggital'}

really needs the first as well as the 2nd/3rd to be fixed?

@yarikoptic

Contributor Author

so, overall, would you like me to polish it up? -- I could push more fixes (as you can see there is still a good number)...

Comment thread CanlabCore/@fmridisplay/addpoints.m Outdated
whcol = 3;

- case {'sagg', 'sagittal', 'saggital'}
+ case {'sagg', 'sagittal', 'sagittal'}

Contributor Author

@jcf2, following your comment in the main thread (sorry -- missed it), you want to expand this to contain all of them, and then we need to ignore the line so codespell does not fix them up, so something like

Suggested change
- case {'sagg', 'sagittal', 'sagittal'}
+ case {'sagg', 'sagittal', 'sagittal', 'saggital'} % codespell:ignore

?

Contributor

I'm not sure exactly what I meant there, but looking at that case, 'sagg' and 'saggital' are misspellings, though possibly intentionally handled in this case statement? So this may be less a "typo" issue than a design one.

The original has case {'sagg', 'sagittal', 'saggital'}. That handles both correct and misspelled sagittal, but only misspelled sag. So I think the correct line might be

case {'sag', 'sagg', 'sagittal', 'saggital'}  % codespell:ignore

IF the intention is indeed to gently support misspelled versions?...

Contributor Author

FWIW git history is not useful here
❯ git blame CanlabCore/@fmridisplay/addpoints.m | grep case.*sagg
d7bb2736 fmridisplay/@fmridisplay/addpoints.m (Zeb Delk  2014-08-12 15:45:56 -0600 181)             case {'sagg', 'sagittal', 'saggital'}
❯ git show d7bb2736 | head -n 20
commit d7bb27368a04c20a3bb62672ed5854bb698c4aab
Author: Zeb Delk <elizabeth.delk@colorado.edu>
Date:   Tue Aug 12 15:45:56 2014 -0600

    Import the SCN Core Support.

diff --git a/@canlab_dataset/add_var.m b/@canlab_dataset/add_var.m
new file mode 100644
index 0000000..1876c74
--- /dev/null
+++ b/@canlab_dataset/add_var.m
@@ -0,0 +1,57 @@
+% Not complete yet. Please edit me
+% 
+% Function for adding a variable to a dataset in a systematic way.
+% - Checks IDs of subjects to make sure data is added in the correct order.
+% - Values for missing data are coded as missing values, as specified in dat.Description.Missing_Values
+% - Handles Subject_Level or Event_Level data
+
+varname = 'ValenceType';

But indeed, I would say the idea likely was to allow for human errors. I think that was a mistake, and one not to amplify here: allowing errors in one place in the code leads to the need to allow them in every place where such mistakes could be made. So I would really advise against expanding the list, and would just add the correct value here... Looking back at my own mistake of adding the correct value a second time, I think we just need

Suggested change
- case {'sagg', 'sagittal', 'sagittal'}
+ case {'sagg', 'sagittal', 'sagittal'} % codespell:ignore

and be done here!

Contributor

I agree with that reasoning, but I don't understand the change line. Why two copies of 'sagittal'? Why not the correct spelling of 'sag'?

I would think it would be just case {'sag', 'sagittal'} to follow those principles (optionally with codespell:ignore but I'm guessing it cannot "correct" anything in this line erroneously so that part seems unnecessary, if not intentionally preserving bad spelling...?).
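
A minimal Python sketch of the strict-matching principle discussed in this thread. The real code is the MATLAB case statement in addpoints.m. The `normalize_orientation` name, the `SAGITTAL_ALIASES` set, and the error message are hypothetical. The sketch assumes the proposed `case {'sag', 'sagittal'}`: only correctly spelled values are accepted, and a misspelling fails loudly instead of being quietly tolerated.

```python
# Accepted spellings only, mirroring the proposed case {'sag', 'sagittal'}.
SAGITTAL_ALIASES = {"sag", "sagittal"}


def normalize_orientation(name):
    """Return 'sagittal' for an accepted alias, otherwise raise ValueError."""
    key = name.strip().lower()
    if key in SAGITTAL_ALIASES:
        return "sagittal"
    # Misspellings such as 'sagg' or 'saggital' are rejected explicitly.
    raise ValueError(f"Unknown orientation: {name!r}")
```

With this shape, no line contains a deliberate misspelling, so no inline ignore pragma is needed.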

@yarikoptic

Contributor Author

Should I just ignore docs_sphinx_old?

@torwager

Contributor

Thank you, @yarikoptic! Now there are substantial documentation changes and some updates using Claude's help. Let me know if you see other problems or if it would be useful to run codespell regularly.

@torwager torwager closed this Jun 19, 2026
@torwager torwager reopened this Jun 19, 2026
@torwager

Contributor

You could ignore docs_sphinx_old, yes. @yarikoptic

yarikoptic and others added 4 commits June 19, 2026 08:59
GitHub Actions workflow runs codespell on every push to master and on
pull requests targeting master. The workflow is pinned to a commit SHA
for reproducibility and uses 'permissions: contents: read' for safety.

Co-Authored-By: Claude Code 2.1.138 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Configure codespell to skip vendored / data / build artifacts and to
ignore short variable names, MATLAB built-in function names, proper
names, and domain abbreviations that would otherwise be flagged.

The ignore-regex skips URLs (so we don't "fix" typos in third-party
links) and the base64-encoded image data embedded by MATLAB
live-script export in docs/markdown_tutorials/*.m.

Co-Authored-By: Claude Code 2.1.138 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These typos have multiple plausible corrections (e.g. trough -> through
or trough). Fixed manually after reading each occurrence's context:

- regoin -> region (@atlas/threshold.m: brain region splitting)
- interals -> intervals (@canlab_dataset/plot_var.m: 95% confidence intervals)
- converstion -> conversion (×2: @fmri_data/fmri_data.m, Model_building_tools/design_matrix.m)
- extacted -> extracted (×2: @fmri_data/fmri_data.m, @fmri_timeseries/fmri_timeseries.m)
- interation -> iteration (@fmri_glm_design_matrix/robustfit.m: algorithm iteration)
- exlude -> exclude (@image_vector/image_similarity_plot.m: exclude empty data)
- followd -> followed (@image_vector/orthviews.m: followed by integer)
- pring -> print (Cifti_plotting/plot_surface_map.m: figure to print)
- numbe -> number (GLM_Batch_tools/spm_splines.README)
- nexted -> nested (HRF_Est_Toolbox4/.../testFMinSearchNew.m: nested function)
- fiels -> fields (Image_computation_tools/image_eval_function_multisubj.m: .n fields)
- shoul -> should (Image_computation_tools/image_histogram1d.m: trailing comment fragment)
- inbetween -> in between (×2: Model_building_tools/design_matrix.m)
- specificy -> specify (OptimizeDesign11/.../README_gst_notes)
- propotions -> proportions (OptimizeDesign11/core_functions/calcFreqDev.m)
- defauls -> defaults (OptimizeDesign11/other_functions/designsim_gui_script.m)
- cicles -> circles (Statistics_tools/.../xval_SVM_BKedit25Dec.m: plot legend)
- foor -> for (Statistics_tools/cluster_confusion_matrix.m)
- decription -> description (×5 in Visualization_functions/riverplot/*.m)
- psace -> space (canlab_canonical_brains/.../spherical_icosahedral_interpolation.m)
- covert -> convert (diagnostics/effect_size_map.m: convert to power estimate)
- coverge -> converge (diagnostics/fmri_mask_thresh_canlab.m: mixture models converge)
- sensivity -> sensitivity (web_repository_tools/...: sensitivity, specificity, PPV)
- warpper -> wrapper (×2 CBIG_RF_*.sh: wrapper script)

Co-Authored-By: Claude Code 2.1.138 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single-suggestion typos applied automatically by `codespell -w`, plus a
few targeted reverts for false positives the auto-fix had introduced:

  * `desc.prctiles` (struct field, documented public API) reverted from
    the `percentiles` "fix" in @image_vector/descriptives.m,
    Image_computation_tools/mean_image.m, and the docs table.
  * `efficency.m` function name kept (must match file name); sibling
    `efficiency.m` correctly renamed to its filename.
  * `groupt` variable in diagnostics/effect_size_map.m kept (script-level
    example variable); inline `codespell:ignore` annotation added.
  * `continguous` and `classfy` in docs intentionally retained as they
    reference real source identifiers; inline pragmas added.
  * Pipe-column alignment in canlab_glm_dsgninfo.txt restored after
    word-length changes (because, performed, with, details).

The dominant fixes by frequency: saggital -> sagittal (40 occurrences),
expermental -> experimental (28), dispaly -> display (22), atleast -> at
least (17, all in comments), aproach -> approach (15), concensus ->
consensus (12), initalize -> initialize (12), fucntion -> function (9),
signficant -> significant (8), homogenous -> homogeneous (7), accomodate
-> accommodate (7), re-used -> reused (7), efficency -> efficiency (in
efficiency.m only), analagous -> analogous (6), and ~230 unique
single-suggestion fixes across docs, comments, strings, and code.

Also fixes a long-standing latent bug in
Visualization_functions/tor_wedge_plot.m where `handels(i).texth(1) =
...` was a typo of the return variable `handles` -- the text-handle
assignment was being discarded into a phantom variable.

Co-Authored-By: Claude Code 2.1.138 / Claude Opus 4.7 (1M context) <noreply@anthropic.com>