swlc introduction
arts & ego is rather big, with thousands of pages. They are all static HTML, whatever their extension might imply. They contain links. I would like to verify that the links are valid.
The solution I’ve been using is rather good for checking single pages, but pretty hopeless for checking lots of them. It takes days to cover the entire site, and misses far too many broken links.
The occasional results from the application I used before make it clear that it misses many internal broken links. If an unchanged page in an unchanged section is only reported, after months of repeated checks, to have a broken link to a non-existent page in that same section, then the software must have missed that broken link every time before. Since it’s meant to check links, and doesn’t properly do so, I decided to retire it, at least from here.
A lot of my pages include common external links, for example to schema.org. Because the previous software checks each page independently, a full site check tests the same external link again and again, when it only actually needs checking once. It’s a nice way to set off hacker alerts on other sites.
I didn’t find an alternative package. In the end, I decided the easiest solution was to write my own link checker, the relatively simple swlc. It does the following (there’s a small sketch of the idea after the list):
- Checks a static site as a whole, not individual pages one by one.
- Builds up a map of all the files in the local website, before it starts any checking.
- Checks static internal links against that map rather than repeatedly annoying a web server.
- Keeps track of external links that have been checked, and doesn’t repeat those checks.
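To make that concrete, here’s a minimal sketch of the same idea in Python. It is not swlc’s actual code: the regex, the HEAD-request check, and every name in it are my assumptions for illustration, and it assumes POSIX-style paths.

```python
# Sketch of a whole-site link checker: map the files first, check internal
# links against the map, and check each external URL at most once.
import os
import re
import sys
import urllib.request
from urllib.parse import urlparse

HREF = re.compile(r'href="([^"#]+)', re.IGNORECASE)

def build_file_map(root):
    """Collect every file under root, as paths relative to root."""
    paths = set()
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            paths.add(os.path.relpath(os.path.join(dirpath, name), root))
    return paths

def check_external(url, seen):
    """Check an external URL at most once, caching the result."""
    if url not in seen:
        try:
            req = urllib.request.Request(url, method="HEAD")
            urllib.request.urlopen(req, timeout=10)
            seen[url] = True
        except Exception:
            seen[url] = False
    return seen[url]

def check_site(root):
    files = build_file_map(root)   # the whole map is built before any checking
    external_seen = {}             # url -> reachable?
    broken = []
    for page in sorted(files):
        if not page.endswith((".html", ".htm")):
            continue
        with open(os.path.join(root, page), encoding="utf-8",
                  errors="replace") as f:
            text = f.read()
        for link in HREF.findall(text):
            scheme = urlparse(link).scheme
            if scheme in ("http", "https"):
                if not check_external(link, external_seen):
                    broken.append((page, link))
            elif scheme == "":
                # internal link: resolve against the page, then look it up
                # in the map instead of asking a web server
                path = urlparse(link).path   # drop any ?query part
                if not path:
                    continue
                if path.startswith("/"):
                    target = os.path.normpath(path.lstrip("/"))
                else:
                    target = os.path.normpath(
                        os.path.join(os.path.dirname(page), path))
                # a link to a directory may really mean its index page
                if (target not in files and
                        os.path.join(target, "index.html") not in files):
                    broken.append((page, link))
            # other schemes (mailto: etc.) are ignored in this sketch
    return broken

if __name__ == "__main__":
    for page, link in check_site(sys.argv[1]):
        print(f"{page}: broken link {link}")
```

The `external_seen` cache is what stops the checker hammering the same third-party URL once per page, and using HEAD rather than GET keeps each external check cheap; the real swlc presumably does rather more (redirects, retries, politeness delays) than this sketch.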
And it works! Well, it works far better than my previous solution. This software covers my entire site in under an hour, and has found thousands of broken links that the previous package missed. That’s high-quality results in less than one hundredth of the time. Good God.