Skip to content

Add static checks via Liquid Haskell annotations#1

Open
ninioArtillero wants to merge 5 commits into
masterfrom
lh-check
Open

Add static checks via Liquid Haskell annotations#1
ninioArtillero wants to merge 5 commits into
masterfrom
lh-check

Conversation

@ninioArtillero

@ninioArtillero ninioArtillero commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

This PR adds static checks using Liquid Haskell (LH). The PR does not modify the existing functions, except for one small change with no semantic effect, explained below.

The Liquid Haskell annotations are ignored in regular builds of Diff, but they can be checked by building a new package Diff-liquidhaskell that re-uses the source tree of the original Diff package, so that the actual code of Diff is checked. Usually, using a cabal flag is possible to enable Liquid Haskell instead of creating a separate package, but here that would create a circular dependency since Diff is a transitive dependency of liquidhaskell.

Liquid Haskell checks the Haskell modules by using an external SMT solver, which needs to be installed separately. A README is provided with a note about it.

The static checks were developed during a Tweag's project, described in this blog post; the main goal being to strenghten the codebase with static checks of previously informal invariants and expectations. The post explains the full scope of the project, a fraction of which is contributed in this PR, with other PRs to follow if desired.

In the following sections (statically cheked invariants, termination metrics and omitted checks) I explain the meaning of the LH annotations contained in this PR. The last section addresses some possible concerns from the maintenance perspective.

Statically checked invariants

The DL data type has the following specification:

{-@
data DL = DL
    { poi  :: Nat
    , poj  :: Nat
    , path :: { p : [DI] | len p <= poi + poj }
    }
@-}

This checks that DL coordinate values are always natural numbers (LH defines Nat as Int values that are >= 0), and that their edit trace lenght is bounded by their sum. This specification is in accordance with the algorithm definition and an invariant of the implementation that is expected from any DL value.

The dstep function has this specification:

{-@
dstep
  :: (Nat -> Nat -> Bool)
  -> {nodes : [DL] | len nodes > 0}
  -> {v : [DL] | len v = len nodes + 1}
@-}

Which requires its function argument (the parameterized version of canDiag, a.k.a. the diagonal predicate) to only have natural number arguments (pre-condition), the node list argument to be non-empty (pre-condition) and the resulting node list to increase the input length by one (post-condition). Note how we bind the nodes variable to the input value and v to the output so that we can relate them in the post-condition. This specification provides a detail on the shape of the result, which relates directly to the original algorithm inner loop.

The last provided specification is

{-@ addsnake :: (Nat -> Nat -> Bool) -> x : DL -> {v : DL | path v == path x } @-}

Which again requires the function argument (the diagonal predicate) to have only natural number arguments and specifies that input and output nodes must have the same edit trace, which is the fundamental requirement for addsnake to do what it ought.

The pre- and post-conditions forming the function specifidcations are checked at each call of this functions (unless instructed otherwise).

Termination metrics

Termination metrics are introduced using the / [metric1, metric2,...] syntax at the end of a function specification.
They point LH to a reducing quantity across recursive calls in a recursive function definition, which it uses to prove that such function terminates. Sometimes metric1 decreases, but when it stays constant, then metric2 should decrease, and so on to define a lexicographic order.

All the introduced metrics are appended to specifications matching the existing function signatures, so no additional checks have been introduced:

{-@ go :: Hunk c -> xs : [Diff [c]] -> [Hunk c] / [len xs] @-}

{-@ toLineRange :: Int -> Int -> diffs : [Diff[String]] -> [DiffOperation LineRange] / [len diffs, 0] @-}

{-@ toChange :: Int -> Int -> [String] -> [String] -> diffs : [Diff [String]] -> [DiffOperation LineRange] / [len diffs, 1] @-}

{-@ doParse :: [DiffOperation LineRange] -> diffs : [String] -> [DiffOperation LineRange] / [len diffs] @-}

The relevant example among these are the mutually recursive doPreffix and doSuffix:

 {-@ doPrefix :: h : Hunk c -> [Diff [c]] / [len h, 0]@-}
 {-@ doSuffix :: h : Hunk c -> [Diff [c]] / [len h, 1] @-}

Which uses the fact that, in most equations, the (mutually) recursive call is performed on the tail of the list, so the list length is reduced. The second metric (just constants) is a fallback for the case of doSuffix second guard of its third equation where doPreffix is called on a list of equal length as the input; in this case the second metric decreases from 1 to 0.

The only source change in this PR is a small refactoring introduced for this termination check to work:

-      doPrefix (d : ds) = doSuffix (d : ds)
+      doPrefix (d : ds) = d : doSuffix ds

This does not change the semantics, because when we reach this equation d is known to not be a Both value and this takes us to the last doSuffix equation that does the same thing.

Omitted checks

The following annotations

{-@ lazy addsnake @-} 

{-@ ignore ses @-} 

Prevent LH from trying to prove termination for addsnake and omit all checks inside ses definition body. This is so because both are significantly more complex to check and are left for another PR.

Other changes

  • The README file explains how to build and test Diff, and how to run Liquid Haskell.
  • The .gitignore file was updated to also ignore nested build outputs.

Notes on maintenance

Here are some arguments that support the integration of these static checks:

  • This addition is easy to ignore: even if the LH specifications go unmaintained and break, the library can be build as usual. The LH specification annotations are, to plain GHC, just additional documentation comments.
  • For the same reason as above, we have easy exit: the static checks can be easily removed by just deleting the annotations (and helper functions) and removing the Diff-liquidhaskell.
  • In many cases LH specifications mirror the existing documentation, so people get extra confidence that those are correct.
  • The static checks build up to a clearer connection between the implementation and the original algorithm specification (in the specific case of the Diff module), which could help identify deviations from the expected space-time complexity and possible optimizations.
  • This approach doesn't need potential contributors to learn new syntax/concepts: they can go about their Haskell code and don't bother with the formal syntax.
  • On the other hand, maintainers gets extra confidence of any change or contribution that passes the LH checks.
  • The additional layer of assurance could give this implementation an edge for production use (speculative).

A question that might naturally arise is: What is a maintainer supposed to make of a failing LH check whose errors she doesn't understand? e.g. an error turns out to be uninformative. In relation to the previous points, these checks can't break the package build or CI (unless added to the pipeline), they are currently intended to be run locally and can be safely ignored.

On the other hand, if the decision is not to ignore the check failures, the failing check readily signals that either the changes violate a specification or introduce new ambiguity that LH cannot solve from existing specifications. The latter case might arise e.g. from a new function definition using other annotated functions or lacking a termination metric if it's recursive; this can be solved simply by adding a lazy or ignore annotation to the offending function. Cases of the former kind can be trickier to diagnose and fix, but are probably worth a detailed look, e.g. the refactoring of a annotated function might be violating a property we care about. Alternatively the changes could be making a specification obsolete, so removing it altogether is a way forward if writing a new one is not viable given effort constraints.

Both Haskell and Liquid Haskell are technologies we endorse at Tweag, so we have vested interest in their improvement and adoption, and are generally available to help with whatever problem comes up. We are currently working on improving error messages, documentation and developer experience. Helping maintain this annotations and fix surfacing bugs align with our current agenda.

@ninioArtillero ninioArtillero changed the title Add invariant stating checks using Liquid Haskell annotations Add static checking of specifications with Liquid Haskell annotations Jun 11, 2026
@ninioArtillero ninioArtillero marked this pull request as ready for review June 11, 2026 23:46
@facundominguez facundominguez changed the title Add static checking of specifications with Liquid Haskell annotations Add static checking of specifications with Liquid Haskell Jun 12, 2026
@facundominguez facundominguez changed the title Add static checking of specifications with Liquid Haskell Add static checks via Liquid Haskell annotations Jun 12, 2026
Comment thread cabal.project Outdated
@ninioArtillero ninioArtillero force-pushed the lh-check branch 10 times, most recently from 744bbe7 to 7b679ef Compare June 12, 2026 22:10
@facundominguez

Copy link
Copy Markdown
Member

👋 @ninioArtillero, maybe check the instructions in the README once more.

I think eliminating the cabal.project file is for the better, as no rebuild of LH happens now. Moreover, cabal test and cabal bench would not work before without extra flags.

Comment thread README.md Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants