Broken Links Won't Fix Themselves

You do not notice broken links when you publish. You notice them months later, when a reader clicks one and lands on a 404.

That is the annoying part. The content still looks fine in your editor. The page builds. The site ships. The dead link sits there until someone outside your workflow finds it.

I got tired of waiting for that email.

So I wrote go-linkchecker, a small Go tool. Today it reads markdown files; the roadmap extends to MDX, HTML, CMS-backed content, and sitemap-based checks. It extracts HTTP and HTTPS links, checks them concurrently, and gives me a report I can act on. Broken, OK, Skipped. That is the whole idea.

Why I wanted this

Static sites make publishing easy and maintenance easy to forget. A post from two years ago can still rank, still get read, and still send people to a domain that no longer exists.

Most teams only catch that problem by accident. A reader complains. A search console report shows a spike. Somebody notices the link in passing and makes a ticket.

I wanted a check that runs before the complaint.

The first version targets .md files because that fits how my content is stored now. The open roadmap is broader: WordPress and other PHP CMS setups, Node apps with database-backed content, .mdx, .html, and live sites that expose a sitemap or URL list.

That is the real goal. Markdown is the starting point, not the limit.

What the tool does

go-linkchecker walks a directory, finds .md files, extracts links, and checks each unique URL once. It uses the Go standard library only. No extra dependencies, no broker, no service to babysit.
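As a rough sketch of the walk-and-filter step, using only the standard library: this is not the go-linkchecker source, and `findMarkdown` is a name made up for illustration. It assumes a plain `.md` extension match is enough.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// findMarkdown returns every .md file under root, in the lexical order
// that filepath.WalkDir visits them. (Hypothetical helper, not the tool's API.)
func findMarkdown(root string) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err // surface unreadable paths instead of hiding them
		}
		if !d.IsDir() && filepath.Ext(path) == ".md" {
			files = append(files, path)
		}
		return nil
	})
	return files, err
}

func main() {
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	files, err := findMarkdown(root)
	if err != nil {
		fmt.Fprintln(os.Stderr, "walk failed:", err)
		os.Exit(2)
	}
	for _, f := range files {
		fmt.Println(f)
	}
}
```

Returning the walk error instead of swallowing it is a deliberate choice here: a directory the tool could not read is a directory whose links were never checked.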

It also does a few things that matter in practice:

- Checks links concurrently
- Tries HEAD first, then falls back to GET when a site blocks HEAD
- Supports skip patterns for bot-hostile or local URLs
- Lets you ignore specific status codes when a site is live but unfriendly to automated requests
- Exits with status code 1 when broken links exist, which makes it useful in CI
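The first two items can be sketched together. This is a minimal illustration, not the tool's actual code: the names `probe`, `checkURL`, and `checkAll` are hypothetical, and so are two assumptions baked in: that any error or 4xx/5xx answer to HEAD triggers the GET retry, and that a fixed cap on in-flight requests is how concurrency is bounded.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"sync"
	"time"
)

// probe sends one request and reports the status code it got back.
type probe func(method, url string) (int, error)

// httpProbe builds a probe backed by a real http.Client.
func httpProbe(client *http.Client) probe {
	return func(method, url string) (int, error) {
		req, err := http.NewRequest(method, url, nil)
		if err != nil {
			return 0, err
		}
		resp, err := client.Do(req)
		if err != nil {
			return 0, err
		}
		resp.Body.Close()
		return resp.StatusCode, nil
	}
}

// checkURL tries HEAD first and retries with GET when HEAD errors or
// returns a 4xx/5xx status (assumed trigger; some servers reject HEAD).
func checkURL(p probe, url string) (int, error) {
	status, err := p(http.MethodHead, url)
	if err == nil && status < 400 {
		return status, nil
	}
	return p(http.MethodGet, url)
}

type result struct {
	URL    string
	Status int
	Err    error
}

// checkAll checks urls with at most `workers` requests in flight and
// returns results in the same order as the input.
func checkAll(p probe, urls []string, workers int) []result {
	if workers < 1 {
		workers = 1
	}
	results := make([]result, len(urls))
	sem := make(chan struct{}, workers) // counting semaphore
	var wg sync.WaitGroup
	for i, u := range urls {
		wg.Add(1)
		sem <- struct{}{} // blocks while `workers` checks are running
		go func(i int, u string) {
			defer wg.Done()
			defer func() { <-sem }()
			status, err := checkURL(p, u)
			results[i] = result{URL: u, Status: status, Err: err}
		}(i, u)
	}
	wg.Wait()
	return results
}

func main() {
	client := &http.Client{Timeout: 10 * time.Second}
	// URLs come from the command line; with none given, nothing is requested.
	for _, r := range checkAll(httpProbe(client), os.Args[1:], 8) {
		fmt.Println(r.Status, r.URL, r.Err)
	}
}
```

Passing the probe in as a function keeps the fallback logic testable without touching the network.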

That last part is the main reason I built it. A link checker is only useful if it can fail the build.
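The mechanism behind that is small enough to show in full. A sketch, with a hypothetical `report` type standing in for whatever the tool tracks internally:

```go
package main

import (
	"fmt"
	"os"
)

// report holds the three counts from a run. The field names are hypothetical.
type report struct {
	OK, Broken, Skipped int
}

// exitCode returns 1 when at least one link is broken, otherwise 0.
func exitCode(r report) int {
	if r.Broken > 0 {
		return 1
	}
	return 0
}

func main() {
	// Hard-coded for illustration; a real run would fill this from the checks.
	r := report{OK: 12, Broken: 0, Skipped: 3}
	fmt.Printf("ok=%d broken=%d skipped=%d\n", r.OK, r.Broken, r.Skipped)
	os.Exit(exitCode(r))
}
```

CI systems generally treat any non-zero exit status as a failed step, so nothing else is needed to block a merge.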

The open issue for the roadmap tracks support for other source types, starting with broader file extensions and moving toward sitemap and URL-list workflows for stacks that do not keep content in plain .md files.

The code is on GitHub: srmdn/go-linkchecker.

The mental model

The flow is simple:

markdown files
    ↓
extract URLs
    ↓
deduplicate
    ↓
check each link concurrently
    ↓
report broken, OK, and skipped links

Nothing fancy. That is the point.
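The two middle steps of that flow, extract and deduplicate, might look like the sketch below. It is an illustration under assumptions, not the tool's parser: the regular expression is a simple one of my choosing, `extractURLs` is a hypothetical name, and URLs that contain parentheses will be cut short.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// urlPattern matches http:// and https:// URLs up to the first whitespace,
// bracket, parenthesis, angle bracket, or quote. Deliberately simple.
var urlPattern = regexp.MustCompile(`https?://[^\s<>()\[\]"']+`)

// extractURLs returns each distinct URL in the text once, in first-seen order.
func extractURLs(markdown string) []string {
	seen := make(map[string]bool)
	var urls []string
	for _, m := range urlPattern.FindAllString(markdown, -1) {
		// Drop sentence punctuation that trails a bare URL in prose.
		u := strings.TrimRight(m, ".,;:!?")
		if !seen[u] {
			seen[u] = true
			urls = append(urls, u)
		}
	}
	return urls
}

func main() {
	sample := "Read the [docs](https://example.com/docs) or <https://example.org>.\n" +
		"Same link again: https://example.com/docs."
	for _, u := range extractURLs(sample) {
		fmt.Println(u)
	}
}
```

Deduplicating before any request is sent is what keeps a link that appears in fifty posts from being fetched fifty times.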

I do not want a crawler. I do not want a black box. I want a small tool that tells me which link broke, where it lives, and whether I should care.

Why not use something bigger

You can, and sometimes you should. If you need full site crawling, JavaScript rendering, or a hosted dashboard, a heavier tool makes sense.

That was not my problem.

My site is markdown-based. The content already lives in files. I needed something that could check those files directly and fit into a weekly maintenance pass. A queue, a browser, and a web crawler would have been noise.

The same problem shows up elsewhere too: WordPress sites, PHP CMS installs, Node apps with content in a database, MDX projects, raw HTML sites, and live sites that publish a sitemap. Those stacks need the same broken-link discipline, just from different inputs.

The tradeoffs

The tool is intentionally narrow. It does not crawl live HTML. It does not try to understand MDX. It does not pretend to solve every broken-link problem on the internet.

That limitation is a feature. Narrow tools are easier to trust. They are easier to run in CI. They are easier to debug when one URL starts returning garbage.

The other tradeoff is false positives. Some sites return 403 to automated requests even when the page is live. I handle that with skip patterns and ignored status codes, because pretending those sites are broken would make the report less useful.
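One way to picture that handling is a small policy applied to every URL and status. This is a sketch under assumptions rather than the tool's real configuration: the `policy` type, its field names, substring matching for skip patterns, and reporting an ignored status as OK are all choices made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

type verdict string

const (
	ok      verdict = "OK"
	broken  verdict = "Broken"
	skipped verdict = "Skipped"
)

// policy is a hypothetical configuration for taming false positives.
type policy struct {
	SkipSubstrings []string     // URLs containing any of these are never requested
	IgnoreStatus   map[int]bool // status codes that should not count as broken
}

// shouldSkip reports whether the URL matches any skip pattern.
func (p policy) shouldSkip(url string) bool {
	for _, s := range p.SkipSubstrings {
		if strings.Contains(url, s) {
			return true
		}
	}
	return false
}

// classify maps a URL and its status code to a verdict. A status of 0
// stands for a request that failed before any response arrived.
func (p policy) classify(url string, status int) verdict {
	switch {
	case p.shouldSkip(url):
		return skipped
	case p.IgnoreStatus[status]:
		return ok // assumption: an ignored status is reported as OK
	case status >= 200 && status < 400:
		return ok
	default:
		return broken
	}
}

func main() {
	p := policy{
		SkipSubstrings: []string{"localhost", "bot-hostile.example"},
		IgnoreStatus:   map[int]bool{403: true},
	}
	fmt.Println(p.classify("http://localhost:3000/", 0))
	fmt.Println(p.classify("https://example.com/members", 403))
	fmt.Println(p.classify("https://example.com/gone", 404))
}
```

Keeping the ignore list explicit and short matters: every status code added to it is a class of real breakage the report can no longer see.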

The part I care about

The real value is not the code. It is the habit.

Once a tool is cheap to run, you run it. Once you run it regularly, broken links stop being a surprise. Once they stop being a surprise, they stop turning into cleanup debt.

That is the version of maintenance I like. Boring, automatic, and hard to ignore.

Is This Right for You?

This kind of tool makes sense if your content lives in markdown files, you want a small check that can run in CI or cron, and you care more about reliability than feature count.

It is not the right fit if you need full website crawling, client-side rendering, or a link management platform with dashboards and notifications everywhere.

For me, the useful test is simple: does it catch the problem before a reader does? If yes, it earns its place.