CI: Check html links for doc #4178
|
LGTM. htmltest in a container works well and keeps doc builds free of new dev dependencies. One note for later: we already have an in-tree link checker (docs/src/checkref) built on w3c-linkchecker, but it has been a silent no-op for a while since recent w3c-linkchecker refuses file:// URIs, so it validates nothing. This CI check is a real improvement over that. If we ever want something more actively maintained than htmltest, lychee (Rust, async) is excellent. I'd keep it CI-only via Docker though: there is no Debian apt package for it, so it is not a good fit for the local make checkref target where we rely on apt-installable deps. |
|
Hmm, it doesn't really make sense to have two ways of checking links. However, after some investigation:
It should be easy to integrate it into CI, let's see... |
|
Please no, it's soooooo slow 😿 |
:-D Ok, but thanks for the tip with lychee, it even has a GitHub Action. Let's see what the CI has to say about my change. |
|
Will have to see what @BsAtHome thinks, I'd rather have it quick, but this is an out of tree dep, probably will get contested... |
|
It seems to be an integral part of CI, so there should be less of a problem. The real issue would be how you can do this locally. It is one thing to have a CI check, but when you are working you want this to be functional on your dev machine. Lychee is also a photo management system. Confusion is preprogrammed, it seems. FWIW, Fedora doesn't even have the w3c checker. I had to create a local empty script to get through the doc build. Adding more dependencies is, well, not nice. Especially not nice when they are not in any repo. And very "unnice" when requiring internet connections. So, let's just say, I'm not convinced. |
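For reference, checking only internal links needs neither extra packages nor an internet connection. Below is a minimal sketch using just the Python standard library; it is not one of the tools discussed in this thread, and the names `check_tree` and `scan` are made up for illustration. It assumes the built docs are plain `.html` files with relative links, and it skips external and site-absolute links entirely.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urldefrag, urlparse


class _Collector(HTMLParser):
    """Collect <a href> targets and id/name anchors from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if value is None:
                continue
            if key == "href" and tag == "a":
                self.links.append(value)
            elif key == "id" or (key == "name" and tag == "a"):
                self.anchors.add(value)


def scan(path):
    """Parse one file and return (links, anchors)."""
    parser = _Collector()
    with open(path, encoding="utf-8", errors="replace") as fh:
        parser.feed(fh.read())
    return parser.links, parser.anchors


def check_tree(root):
    """Return a sorted list of (page, href) pairs for broken internal links."""
    pages = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith((".html", ".htm")):
                full = os.path.normpath(os.path.join(dirpath, name))
                pages[full] = scan(full)

    broken = []
    for page, (links, _anchors) in pages.items():
        for href in links:
            # External (http, mailto, ...) and site-absolute links are
            # skipped on purpose: this sketch is offline-only.
            if urlparse(href).scheme or href.startswith("/"):
                continue
            target, fragment = urldefrag(href)
            if target:
                dest = os.path.normpath(
                    os.path.join(os.path.dirname(page), unquote(target))
                )
            else:
                dest = page  # bare "#fragment" points into the same page
            if not os.path.exists(dest):
                broken.append((page, href))
            elif fragment and dest in pages and unquote(fragment) not in pages[dest][1]:
                broken.append((page, href))
    return sorted(broken)
```

Calling `check_tree()` on the directory holding the generated HTML returns the broken `(page, href)` pairs. Query strings and redirects are not handled, so treat this as a starting point rather than a replacement for a real checker.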
|
Ah, now I see the issue: checklink is indeed active in the CI, but only for English. Of course, all other languages have broken links. So this could also be corrected by checking all other languages as well, instead of running a recursive link checker in CI afterwards. At the moment, checklink is run with each html file as an argument. Doing it recursively with
I disabled external link checks in CI because they take ages. But you should run them locally from time to time, there are hundreds of broken external links... Of course this also needs a way to run locally, it's not nice to have to use CI to check your issues.
For w3c-linkchecker: Just For debian, there is a package, but it is broken and throws a million
Me neither, still WIP. I was not aware that there is already a check in place, and that it is broken. The question is: Downside: Both new ones have no Debian package. You would need to use Docker, download a binary or build from source. Not so nice. However, the two I tried have quite nice output. htmltest: |
|
Just including the w3c link checker is not enough. You then need to add all the Perl dependencies, at least in the dev-build dependencies. Why do lychee and htmltest not report the same problems? That is a bit fishy. Doing docker runs is also a problem IMO. They fall into the same category as curl runs. At least lychee has a CI provided target, which is a better shield than a download-and-run scenario. |
This is already installed because the build depends on it.
They all report different kinds of errors. It has to be investigated which tool reports the most useful errors and some can also be turned on/off.
Docker this way should be fine as I understand it. If the tool is compromised, it still cannot break out of Docker and compromise the CI. The GitHub Actions are from a marketplace: https://github.com/marketplace/actions/lychee-broken-link-checker I don't know if or how well they are checked; I assume it is as bad as curl from a GitHub link to a random repo. By using the git hash instead of the version, you can avoid compromised versions coming in, but if the actual version is already bad, bad luck. I would have to switch to the hash before merging. Only the ones starting with actions/ are directly from GitHub and are safe as long as GitHub itself is not compromised. So and the variant with w3c in CI: So the options so far are:
|
Chasing false positives and missing true problems will just take a lot of time. There needs to be real resilience in the checker for it to be worth the effort.
So far, I'm not liking any way or option of getting code from "other" places. The "marketplace" is another word for "jungle law rules". Not a system I trust or would want to have bindings to. As for the time it takes to check, yes, the old w3c checker is slow. Others are faster, but I have some strong reservations about integrating any of them. Maybe we shouldn't integrate it at all and leave it to a side project? OTOH, making sure internal links are fine is an issue. Would it be possible to do internal consistency checking already at the adoc level? |
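On the adoc-level question, a rough sketch of what such a check could look like, again with only the Python standard library. The function name `undefined_xrefs` is hypothetical. It assumes cross references use the `<<id>>` or `<<id,text>>` form and anchors are declared explicitly as `[[id]]` or `[#id]`. Auto-generated section IDs, references by section title and inter-document references (`<<file.adoc#id>>`) are not covered, so false positives are to be expected on real documents.

```python
import re

# Explicit anchors: [[id]], [[id,label]] or the [#id] shorthand.
ANCHOR_RE = re.compile(
    r"\[\[([A-Za-z_:][\w:.-]*)(?:,[^\]]*)?\]\]"
    r"|\[#([A-Za-z_:][\w:.-]*)[^\]]*\]"
)
# Same-document cross references: <<id>> or <<id,text>>.
# Targets containing '#' (inter-document references) do not match.
XREF_RE = re.compile(r"<<([^,>#]+)(?:,[^>]*)?>>")


def undefined_xrefs(text):
    """Return (line number, target) for xrefs without an explicit anchor."""
    anchors = set()
    for match in ANCHOR_RE.finditer(text):
        anchors.add(match.group(1) or match.group(2))

    missing = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in XREF_RE.finditer(line):
            target = match.group(1).strip()
            if target not in anchors:
                missing.append((lineno, target))
    return missing
```

Run per `.adoc` file, this only catches dangling references inside a single document. Links between documents would still need a pass over the generated HTML.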
Check HTML links in docs in CI
There are a few broken links, so continue-on-error is set to true for now.
This is just one tool of many. I tried: