Switch to more efficient sexp datatype #1365

Open: myreen wants to merge 17 commits into master from sexp-switch

myreen commented Mar 25, 2026

Contributor

No description provided.

tanyongkiam and others added 4 commits February 20, 2026 01:08
Rewrite compiler/parsing/fromSexpScript.sml to use mlsexp (Atom/Expr)
from basis/pure instead of simpleSexp from HOL4's context-free examples.
This is Phase 1 of eliminating simpleSexp from CakeML.

Key changes:
- Ancestor: mlsexp replaces simpleSexpParse
- Holmakefile: INCLUDES basis/pure instead of HOL4 context-free
- Encoding: SX_SYM/SX_NUM/SX_STR/SX_CONS replaced by Atom/Expr
- listsexp xs = Expr xs (trivial, lists are native)
- dstrip_sexp extracts tag + args from Expr (Atom tag :: args)
- All roundtrip proofs (encoder/decoder bijection) updated
- dstrip_sexp_SOME uses strlit nm form for efficient gvs resolution
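A minimal sketch of the shapes named in the list above, written as plain Standard ML rather than HOL4 script. It assumes atoms are ordinary SML strings (the real mlsexp type uses mlstring), and the bodies of `listsexp` and `dstrip_sexp` are reconstructed from the one-line descriptions in this commit message, not copied from fromSexpScript.sml.

```sml
(* Simplified model of the mlsexp datatype. ASSUMPTION: atoms are plain
   SML strings here; the real type carries mlstring. *)
datatype sexp = Atom of string | Expr of sexp list

(* Lists need no cons/nil encoding: they are the Expr payload itself. *)
fun listsexp (xs : sexp list) : sexp = Expr xs

(* Split a tagged node into its tag and its arguments.
   Anything that is not Expr (Atom tag :: args) yields NONE. *)
fun dstrip_sexp (Expr (Atom tag :: args)) = SOME (tag, args)
  | dstrip_sexp _ = NONE

(* Usage: a hypothetical node tagged "Lit" with one argument. *)
val example = dstrip_sexp (Expr [Atom "Lit", Atom "42"])
(* example = SOME ("Lit", [Atom "42"]) *)
```

With the old SX_CONS encoding a list had to be rebuilt from nested pairs before its tag could be inspected; in this model the tag is simply the head of the `Expr` payload.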

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Use mlsexp$fromString instead of parse_sexp for sexp input parsing,
and sexp_to_string instead of print_sexp for sexp output. Remove
simpleSexpParse ancestor and formal-languages/context-free includes
from compiler, scheme, and dafny Holmakefiles.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…n files

SexpProg (in basis/) already translates the mlsexp parser/printer to CakeML,
so the translation files no longer need to translate simpleSexp's PEG parser,
printer, or destructor functions. Remove ~300 lines of now-unnecessary code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dnezam added the "test failing" label (regression test failed on the latest commit of this pull request) on Mar 26, 2026

dnezam commented Mar 26, 2026

Contributor

In addition to the proof failure in sexp_parserProg, the branch also replaced Unicode double quotes with ASCII string quotes in dafny_compilerProg.

@tanyongkiam

Contributor

The sexp_parser failure is known; I just didn't want to fix it manually yet...

dnezam and others added 6 commits June 5, 2026 18:25
Resolve conflicts in compiler/compilerScript.sml and
compiler/parsing/fromSexpScript.sml by keeping the sexp-switch
representation (Atom/Expr + mlsexp$fromString); master's conflicting
blocks were in the old SX_CONS/SX_SYM/parse_sexp representation that
this branch replaces.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The rewritten proofs referenced mlstringTheory.implode_def, which no
longer exists now that mlstring's constructor is `implode` (with `strlit`
an inferior overload of it). Those references caused static errors that
aborted the theory. Since `strlit = implode` is definitional, the
implode_def rewrite was a no-op; the affected proofs close with the
existing implode_explode/explode_implode lemmas.

Also fixes the Char witness in litsexp_sexplit: `str c` (a string) ->
`implode [c]` (the correct mlstring).

fromSexpTheory now builds with all proofs complete.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

SXNUM is now an ordinary smart-constructor definition
(SXNUM n = Atom (toString (&n))) rather than a simpleSexp datatype
constructor, so the translator no longer handles it automatically.
Translate fromSexpTheory.SXNUM_def before locnsexp_def (its first use)
so the encoder translations close.
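The difference between a datatype constructor and a smart constructor can be sketched in plain Standard ML. This is an illustration under assumptions, not the HOL4 definition: `Int.toString` stands in for the HOL-side `toString (&n)`, `n` is assumed non-negative, and `dest_SXNUM` is a hypothetical helper introduced only for this example.

```sml
(* Simplified model of the mlsexp datatype (atoms as plain strings). *)
datatype sexp = Atom of string | Expr of sexp list

(* SXNUM is an ordinary function, not a constructor: it builds an Atom
   holding the decimal digits of n. ASSUMPTION: n is non-negative. *)
fun SXNUM (n : int) : sexp = Atom (Int.toString n)

(* Because SXNUM is a function, it cannot be used as a pattern. A decoder
   has to match on Atom and parse the digits itself.
   dest_SXNUM is hypothetical, for illustration only. *)
fun dest_SXNUM (Atom s) =
      if s <> "" andalso List.all Char.isDigit (String.explode s)
      then Int.fromString s
      else NONE
  | dest_SXNUM _ = NONE

val seven = SXNUM 7                       (* Atom "7" *)
val back  = dest_SXNUM (SXNUM 123)        (* SOME 123 *)
```

A tool that processes definitions in order needs the definition of such a function before any definition that calls it, which is consistent with the commit translating SXNUM_def ahead of locnsexp_def.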

to_sexpProgTheory now builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Translate fromSexpTheory.SXNUM_def before locnsexp_def: SXNUM is now a
  smart-constructor function, not a datatype constructor, so the
  translator no longer handles it automatically.
- Remove the obsolete litsexp_side_thm: litsexp now translates without a
  precondition, so litsexp_side no longer exists.
- Restore HOL term/type quotation marks on the main_function sanity check
  and the program-assembly antiquotation, which an earlier commit had
  turned into ASCII string literals.

dafny_compilerProgTheory now builds.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Assisted-by: Claude:claude-opus-4-8[1m]
dnezam added and removed the "test failing" label (regression test failed on the latest commit of this pull request) on Jun 6, 2026

dnezam commented Jun 6, 2026

Contributor

Claude did some really bad merging...

dnezam added and removed the "test failing" label (regression test failed on the latest commit of this pull request) on Jun 6, 2026
dnezam and others added 4 commits June 6, 2026 21:45
The switch from simpleSexpParse to mlsexp dropped simpleSexpParse from the
Ancestors, removing print_sexp. The --print_sexp path in compile_def still
referenced it. Use mlsexp$sexp_to_string (flat output) instead; it returns
an mlstring directly, so the old implode/"++" wrapping is no longer needed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

astToSexprLib is a hand-written SML mirror of the fromSexpScript.sml
encoders that serialises a CakeML AST term to s-expression text. It still
produced the old simpleSexp format, which no longer matches the migrated
mlsexp-based encoders/decoders, so its output would not re-parse.

Rewrite it to emit exactly what mlsexp$sexp_to_string would print for the
encoder results (decsexp/expsexp/...), keeping the public API
(write_ast, write_ast_to_file) unchanged so all consumers are unaffected.

Notable format changes (mlsexp sexp = Atom | Expr):
- no dotted pairs/tuples; pairs become 2-element lists, empty list "()"
- atoms quoted only when unsafe (faithful encode_control + make_str_safe
  + escape_str ports), using an ordinal isPrint (32..126) test
- IntLit now tagged with a "~" sign; Char via SEXSTR; StrLit bare;
  words/Float64 via decimal SXNUM
- locations as nested s-expressions, incl. EOFpt
- ThunkOp operators handled; explicit op->tag table that keeps
  Vsub_unsafe's underscore while stripping the other unsafe ops
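The first two format changes above (no dotted pairs, atoms quoted only when unsafe) can be illustrated with a small flat printer in plain Standard ML. This is a sketch under assumptions and not a port of mlsexp$sexp_to_string: the set of characters treated as unsafe and the escaping of `"` and `\` are guesses, control-character encoding (encode_control) is not modelled, and `render` is a hypothetical name.

```sml
(* Simplified model of the mlsexp datatype (atoms as plain strings). *)
datatype sexp = Atom of string | Expr of sexp list

(* Ordinal printability test: character codes 32..126. *)
fun isPrint c =
  let val n = Char.ord c in 32 <= n andalso n <= 126 end

(* ASSUMPTION: an atom character is safe when it is printable and is not
   whitespace, a parenthesis, a double quote, or a backslash. *)
fun isSafeChar c =
  isPrint c andalso not (Char.isSpace c)
  andalso not (Char.contains "()\"\\" c)

(* ASSUMPTION: inside quotes, only the double quote and the backslash
   are escaped, each with a leading backslash. *)
fun escape s =
  String.translate
    (fn #"\"" => "\\\"" | #"\\" => "\\\\" | c => String.str c) s

(* Print an atom bare when every character is safe, quoted otherwise. *)
fun atomToString s =
  if s <> "" andalso List.all isSafeChar (String.explode s)
  then s
  else "\"" ^ escape s ^ "\""

(* Flat output: elements separated by single spaces, no dotted pairs. *)
fun render (Atom s) = atomToString s
  | render (Expr xs) = "(" ^ String.concatWith " " (map render xs) ^ ")"

val emptyList = render (Expr [])                     (* "()" *)
val pair      = render (Expr [Atom "a", Atom "b"])   (* "(a b)" *)
```

In this model a pair is indistinguishable from a two-element list on the wire, so a decoder has to know from context which one it expects.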

Verified in HOL: byte-exact match against sexp_to_string (decsexp d) for
every literal/op/pattern/type/declaration form (incl. tricky strings),
and a full write_ast -> fromString -> sexplist sexpdec round-trip
recovering the original program.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

dnezam commented Jun 7, 2026

Contributor

Seems like it builds now. Before this gets merged, the changes should be documented (including checking for regressions).

One example of a silent regression:

- if s = "Vsubunsafe" then SOME Vsub_unsafe else
+ if a = «Vsub_unsafe» then SOME Vsub_unsafe else
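One reading of why this counts as a silent regression, sketched in plain Standard ML: the decoder still type-checks and still accepts its own encoder's output, but input written with the old tag is no longer recognised. The names `oper`, `decode_old`, and `decode_new` are hypothetical stand-ins for the two sides of the diff above, and plain strings replace mlstring.

```sml
(* Hypothetical one-constructor stand-in for the operator type. *)
datatype oper = Vsub_unsafe

(* Old decoder: the tag has no underscore. *)
fun decode_old (s : string) : oper option =
  if s = "Vsubunsafe" then SOME Vsub_unsafe else NONE

(* New decoder: the tag keeps the underscore. *)
fun decode_new (a : string) : oper option =
  if a = "Vsub_unsafe" then SOME Vsub_unsafe else NONE

(* A file produced before the change uses the old tag. *)
val oldInput = "Vsubunsafe"
val before   = decode_old oldInput   (* SOME Vsub_unsafe *)
val after    = decode_new oldInput   (* NONE *)
```

Nothing fails at build time in this situation, which is why such changes are easiest to catch by listing every tag string that differs between the two branches.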

Labels

test failing regression test failed on the latest commit of this pull request

3 participants