CI: Check html links for doc #4178

Draft
hdiethelm wants to merge 3 commits into LinuxCNC:master from hdiethelm:ci_htmltest

@hdiethelm hdiethelm commented Jun 19, 2026

Contributor

Check HTML links in docs in CI

There are a few broken links, so continue-on-error is set to true for now.

This is just one tool of many. I tried:

@grandixximo

Contributor

LGTM. htmltest in a container works well and keeps doc builds free of new dev dependencies.

One note for later: we already have an in-tree link checker (docs/src/checkref) built on w3c-linkchecker, but it has been a silent no-op for a while since recent w3c-linkchecker refuses file:// URIs, so it validates nothing. This CI check is a real improvement over that.

If we ever want something more actively maintained than htmltest, lychee (Rust, async) is excellent. I'd keep it CI-only via Docker though: there is no Debian apt package for it, so it is not a good fit for the local make checkref target where we rely on apt-installable deps.

@hdiethelm

Contributor Author

Hmm, it doesn't really make sense to have two ways of checking links. However, after some investigation:

  • w3c-linkchecker is one single perl file, no need to build and install
  • --follow-file-links does the trick and it works again

It should be easy to integrate it into CI, let's see...

@grandixximo

Contributor

Please no, it's soooooo slow 😿

@hdiethelm

Contributor Author

> Please no, it's soooooo slow 😿

:-D OK, but thanks for the tip about lychee; it even has a GitHub Action. Let's see what the CI has to say about my change.

@grandixximo

Contributor

We will have to see what @BsAtHome thinks. I'd rather have it quick, but this is an out-of-tree dep, so it will probably get contested...

@BsAtHome

Contributor

It seems to be an integral part of CI, so there should be less of a problem.

The real issue would be how you can do this locally. It is one thing to have a CI check, but while working you want this to be functional on your dev machine. Lychee is also the name of a photo management system. Confusion is preprogrammed, it seems.

Do you actually need build integration for a link checker? Checking internal links would be nice, to catch inconsistencies. However, testing external links in the build prevents building on systems that have no network. That is not a nice thing to do.

FWIW, Fedora doesn't even have the w3c checker. I had to create a local empty script to get through the doc build. Adding more dependencies is, well, not nice. Especially not nice when they are not in any repo. And very "unnice" when requiring internet connections.

So, let's just say, I'm not convinced.

@hdiethelm

Contributor Author

Ah, now I see the issue: checklink is indeed active in the CI, but only for English.

###
### language: English
### all links are good!
###

Of course, all other languages have broken links.

So this could also be corrected by checking all other languages as well instead of running a recursive link checker in CI afterwards.

At the moment, checklink is run with each HTML file as an argument. Running it once with --recursive on the top-level index.html would also be an option.
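To make the recursive idea concrete, here is a minimal Python sketch. It is not how checklink, htmltest, or lychee work internally; it only illustrates crawling from one start page, following local links, and skipping anything with a URL scheme so that no network is needed. The function and class names are made up for this example, and it assumes links are relative paths within the built doc tree.

```python
import os
from html.parser import HTMLParser
from urllib.parse import urlsplit, unquote


class LinkCollector(HTMLParser):
    """Collect href/src targets and id/name anchors from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if name in ("href", "src"):
                self.links.append(value)
            elif name == "id" or (tag == "a" and name == "name"):
                self.anchors.add(value)


def parse_page(path):
    collector = LinkCollector()
    with open(path, encoding="utf-8", errors="replace") as handle:
        collector.feed(handle.read())
    return collector


def check_internal_links(start_page):
    """Crawl local pages reachable from start_page.

    Returns a sorted list of (page, link) pairs whose target file or
    fragment does not exist. External links are skipped entirely.
    """
    start_page = os.path.abspath(start_page)
    pages = {}       # absolute path -> parsed LinkCollector
    queue = [start_page]
    broken = []
    fragments = []   # (page, link, target, fragment), checked after the crawl
    while queue:
        page = queue.pop()
        if page in pages:
            continue
        pages[page] = parse_page(page)
        for link in pages[page].links:
            parts = urlsplit(link)
            if parts.scheme or parts.netloc:
                continue  # http:, mailto:, //host/... are external: not checked
            if parts.path:
                # Assumption: paths are relative to the linking page.
                target = os.path.normpath(
                    os.path.join(os.path.dirname(page), unquote(parts.path)))
            else:
                target = page  # pure "#fragment" link into the same page
            if not os.path.exists(target):
                broken.append((page, link))
                continue
            if os.path.isfile(target) and target.endswith((".html", ".htm")):
                queue.append(target)
                if parts.fragment:
                    fragments.append((page, link, target, unquote(parts.fragment)))
    for page, link, target, fragment in fragments:
        if fragment not in pages[target].anchors:
            broken.append((page, link))
    return sorted(broken)
```

Calling `check_internal_links` on the top-level index.html of each language would cover all translations in one pass each, without touching the network.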

@hdiethelm

Contributor Author

> It seems to be an integral part of CI, so there should be less of a problem.
>
> The real issue would be how you can do this locally. It is one thing to have CI check, but when working, then you want this to be functional from your dev machine. Lychee is also a photo management system. Confusion is preprogrammed, it seems. Do you actually need build integration for a link checker? Internal links,... that would be nice to catch inconsistencies. However, testing external links in the build prevents building on systems that have no network. That is not a nice thing to do.

I disabled external link checks in CI because they take ages. But you should run them locally from time to time; there are hundreds of broken external links...

Of course this also needs a way to run locally. It's not nice to have to use CI to check your issues.

> FWIW, Fedora doesn't even have the w3c checker. I had to create a local empty script to get through the doc build. Adding more dependencies is, well, not nice. Especially not nice when they are not in any repo. And very "unnice" when requiring internet connections.

For w3c-linkchecker, just run
wget https://raw.githubusercontent.com/w3c/link-checker/0623dd9b9a3136a70a73256169e4fd2543a93cce/bin/checklink
and the problem is solved. The script is only 116k, so we could also add it directly to the LinuxCNC repo.

For Debian, there is a package, but it is broken and throws a million
Use of uninitialized value $_ in pattern match (m//) at /usr/bin/checklink line 1296.
warnings at you. This is even handled in the checkref script... :-D

> So, lets just say, I'm not convinced.

Me neither, this is still WIP. I was not aware that there is already a check in place, and that it is broken.

The question is:
Do we fix the existing link check or add a new one and drop the old? w3c-linkchecker is also not bad, just takes 30s instead of 3s. Having two link checkers is not desirable in my opinion.

Downside: Neither of the new ones has a Debian package. You would need to use Docker, download a binary, or build from source. Not so nice.

However, the two I tried have quite a nice output.
lychee:
In the overview:
https://github.com/LinuxCNC/linuxcnc/actions/runs/27866839643?pr=4178
And in the run:
https://github.com/LinuxCNC/linuxcnc/actions/runs/27866839643/job/82472249104?pr=4178#step:8:97

htmltest:
https://github.com/LinuxCNC/linuxcnc/actions/runs/27837874384/job/82390015173#step:8:41

@BsAtHome

Contributor

Just including the w3c link checker is not enough. You then need to add all the Perl dependencies, at least in the dev-build dependencies.

Why do lychee and htmltest not report the same problems? That is a bit fishy.

Doing docker runs is also a problem IMO. They fall into the same category as curl runs. At least lychee has a CI provided target, which is a better shield than a download-and-run scenario.

This is already installed because the build depends on it.
@hdiethelm

Contributor Author

> Just including the w3c link checker is not enough. You then need to add all the Perl dependencies, at least in the dev-build dependencies.
>
> Why do lychee and htmltest not report the same problems? That is a bit fishy.

They all report different kinds of errors. It has to be investigated which tool reports the most useful errors and some can also be turned on/off.

> Doing docker runs is also a problem IMO. They fall into the same category as curl runs. At least lychee has a CI provided target, which is a better shield than a download-and-run scenario.

Docker used this way should be fine, as I understand it. If the tool is compromised, it still cannot break out of Docker and compromise the CI.

The GitHub Actions come from a marketplace: https://github.com/marketplace/actions/lychee-broken-link-checker I don't know if or how well they are checked; I assume it is as bad as curl from a GitHub link to a random repo.

By pinning the git hash instead of the version, you can keep compromised later versions from coming in, but if the current version is already bad, bad luck. I would have to switch to the hash before merging.

Only the ones starting with actions/ come directly from GitHub and are safe as long as GitHub itself is not compromised.
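The pinning rule described above can be checked mechanically. The following Python sketch is only an illustration under stated assumptions: it scans workflow text with a regular expression instead of a real YAML parser, it treats everything under `actions/` as first-party (as argued above), and the action name `example-org/link-check-action` and the 40-character hash are placeholders, not real references.

```python
import re

# Matches "uses: owner/repo@ref" lines, with or without a leading "- ".
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s#]+)", re.MULTILINE)
# A full git commit hash is 40 lowercase hex characters.
FULL_SHA_RE = re.compile(r"[0-9a-f]{40}")


def unpinned_third_party_actions(workflow_text):
    """Return 'uses:' references that are neither under actions/ nor pinned to a full hash."""
    findings = []
    for ref in USES_RE.findall(workflow_text):
        ref = ref.strip("'\"")
        if ref.startswith("./") or ref.startswith("docker://"):
            continue  # local actions and container images are outside this check
        name, _, version = ref.partition("@")
        first_party = name.startswith("actions/")
        pinned = FULL_SHA_RE.fullmatch(version) is not None
        if not first_party and not pinned:
            findings.append(ref)
    return findings


# Hypothetical workflow fragment; names and hash are placeholders.
EXAMPLE = """
steps:
  - uses: actions/checkout@v4
  - name: Check links
    uses: example-org/link-check-action@v2
  - uses: example-org/link-check-action@0123456789abcdef0123456789abcdef01234567
  - uses: ./local-action
"""

print(unpinned_third_party_actions(EXAMPLE))
```

A pinned hash only protects against later tampering with a tag; as noted above, it does nothing if the pinned commit itself is already bad.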

And here is the variant with w3c in CI:
https://github.com/LinuxCNC/linuxcnc/actions/runs/27868137894/job/82475536953?pr=4178#step:8:9

So the options so far are:

  1. CI: w3c / takes 40s (Ready)
  2. CI: htmltest / takes 5s (Ready)
  3. CI: lychee / takes 4s (Ready)
  4. Fix the Makefile so all languages are checked (Someone else has to do this, I am not good with Makefiles)

@BsAtHome

Contributor

> They all report different kinds of errors. It has to be investigated which tool reports the most useful errors and some can also be turned on/off.

Chasing false positives and missing true problems will just take a lot of time. There needs to be real resilience in the checker for it to be worth the effort.

> Docker used this way should be fine, as I understand it. If the tool is compromised, it still cannot break out of Docker and compromise the CI.
> The GitHub Actions come from a marketplace: https://github.com/marketplace/actions/lychee-broken-link-checker I don't know if or how well they are checked; I assume it is as bad as curl from a GitHub link to a random repo.

So far, I'm not liking any way or option of getting code from "other" places. The "marketplace" is another word for "jungle law rules". Not a system I trust or would want to have bindings to.

As for the time it takes to check, yes, the old w3c checker is slow. Others are faster, but I have strong reservations about allowing integration of any of them.

Maybe we shouldn't integrate it at all and leave it to a side project? OTOH, making sure internal links are fine is an issue. Would it be possible to do internal consistency checking already at the adoc level?
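One way such an adoc-level check could look is sketched below in Python. This is a rough illustration, not an existing tool: it assumes cross-references only target explicit anchors (`[[id]]`, `[#id]`, `anchor:id[]`), that IDs are unique across the whole doc set, and it ignores auto-generated section IDs, `xref:` macros, and `<<` appearing inside code listings. The function name is made up for this example.

```python
import re

# Explicit anchor forms: [[id]] or [[id,reftext]], [#id] or [#id.role], anchor:id[...]
ANCHOR_RE = re.compile(
    r"\[\[([A-Za-z_:][\w:.-]*)(?:,[^\]]*)?\]\]"
    r"|\[#([A-Za-z_:][\w:.-]*)[^\]]*\]"
    r"|anchor:([A-Za-z_:][\w:.-]*)\["
)
# Cross references: <<id>>, <<id,label>>, <<file.adoc#id,label>>
XREF_RE = re.compile(r"<<([^,>]+)(?:,[^>]*)?>>")


def find_dangling_xrefs(sources):
    """sources maps filename -> AsciiDoc text.

    Returns sorted (filename, target) pairs whose anchor ID is not
    defined anywhere in the given sources.
    """
    anchors = set()
    for text in sources.values():
        for match in ANCHOR_RE.finditer(text):
            # Exactly one of the three groups is set per match.
            anchors.add(next(group for group in match.groups() if group))
    dangling = []
    for filename, text in sources.items():
        for target in XREF_RE.findall(text):
            target = target.strip()
            anchor_id = target.rpartition("#")[2]  # "file.adoc#id" -> "id"
            if not anchor_id:
                continue  # "file.adoc#" refers to a whole document
            if anchor_id not in anchors:
                dangling.append((filename, target))
    return sorted(dangling)
```

Because it works on the sources, a check like this needs no network and no built HTML, but it would only cover cross-references, not links to images or other files.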

3 participants