𝗪𝗵𝗮𝘁 𝗜 𝗟𝗲𝗮𝗿𝗻𝗲𝗱 𝗔𝘂𝗱𝗶𝘁𝗶𝗻𝗴 𝗝𝗦𝗢𝗡-𝗟𝗗 𝗗𝗮𝘁𝗮 𝗶𝗻 𝗖𝗜
JSON-LD structured data can vanish from your site without breaking anything visible. Your build succeeds. Your deploy completes. Your pages look fine in a browser.
But Googlebot reads the JSON-LD script tags to decide whether your pages qualify for rich results. If the data is missing or broken, you won't know until Search Console flags it weeks later.
I added a post-deploy audit step to my CI pipeline. It finds these errors in under 60 seconds.
How it works:
I run three static sites built with Astro and deployed via Cloudflare Pages. These sites use schema like SoftwareApplication, VideoGame, and ItemList. Because they are static, a template change can drop schema from thousands of pages without triggering a build error.
The audit script does this:
• Fetches the live homepage and sample detail pages.
• Reads the live sitemap to find real URLs.
• Extracts all JSON-LD blocks using regex.
• Unwraps @graph containers to find nested items.
• Compares the @type values it finds against my expected list.
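The extraction, @graph unwrapping, and type comparison steps can be sketched roughly like this. This is a minimal Python illustration, not the actual audit script: the function names (extract_json_ld, iter_nodes, found_types, missing_types) are hypothetical, and skipping unparseable blocks is an assumption made to keep the example short.

```python
import json
import re

# Matches <script type="application/ld+json">...</script>, across lines.
JSON_LD_RE = re.compile(
    r'<script\b[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)


def extract_json_ld(html):
    """Return every parseable JSON-LD payload found in the HTML."""
    payloads = []
    for raw in JSON_LD_RE.findall(html):
        try:
            payloads.append(json.loads(raw))
        except json.JSONDecodeError:
            # Assumption: unparseable blocks are skipped here.
            # A real audit would probably report them as findings.
            continue
    return payloads


def iter_nodes(data):
    """Yield every typed node, unwrapping arrays and @graph containers."""
    if isinstance(data, list):
        for item in data:
            yield from iter_nodes(item)
    elif isinstance(data, dict):
        if "@graph" in data:
            yield from iter_nodes(data["@graph"])
        if "@type" in data:
            yield data


def found_types(html):
    """Collect all @type values on the page (@type may be a string or a list)."""
    types = set()
    for node in iter_nodes(extract_json_ld(html)):
        value = node["@type"]
        types.update(value if isinstance(value, list) else [value])
    return types


def missing_types(html, expected):
    """Return the expected types that the page does not declare."""
    return set(expected) - found_types(html)
```

Calling missing_types(page_html, ["WebSite", "ItemList"]) on a fetched page returns an empty set when everything expected is present, so any non-empty result is a finding worth logging.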
I run this against live deployed pages rather than build artifacts. This catches issues with CDN caching or edge delivery.
The script found three issues on the first run:
- ossfind.com: Missing ItemList on specific pages. This turned a vague idea into a concrete task.
- findindiegame.com: An incorrect http:// protocol in the WebSite @id. This was a copy-paste error that looked fine to a human but made the @id inconsistent with the site's https:// URLs.
- aiappdex.com: Using raw database IDs instead of human-readable names in the SoftwareApplication schema.
These were real bugs. None of them showed up in build logs or browser reviews.
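The second and third bugs are the kind a small content check can flag. The sketch below is a Python illustration under stated assumptions: it takes already-extracted JSON-LD nodes as dicts, it assumes the raw IDs ended up in the name field, and its definition of "looks like a raw database ID" (all digits, a UUID, or a long hex string) is a guess. The function names are hypothetical.

```python
import re

# Assumption: a raw database ID is all digits, a UUID, or 24+ hex characters.
RAW_ID_RE = re.compile(
    r"^(\d+"
    r"|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
    r"|[0-9a-f]{24,})$",
    re.IGNORECASE,
)


def _types_of(node):
    """Return the node's @type as a list, whether it is a string or a list."""
    value = node.get("@type", [])
    return value if isinstance(value, list) else [value]


def insecure_ids(nodes):
    """Return @id values that use http:// instead of https://."""
    return [
        node["@id"]
        for node in nodes
        if isinstance(node.get("@id"), str) and node["@id"].startswith("http://")
    ]


def raw_id_names(nodes, schema_type="SoftwareApplication"):
    """Return name values on the given type that look like database IDs."""
    return [
        node["name"]
        for node in nodes
        if schema_type in _types_of(node)
        and isinstance(node.get("name"), str)
        and RAW_ID_RE.match(node["name"].strip())
    ]
```

A pattern check like this will miss IDs in other formats and could flag an app whose real name is a number, so it suits a warning more than a hard failure.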
I set the CI step to be non-fatal. If the audit finds an issue, the deployment still finishes, but the error appears in the logs. This allows me to observe real-world behavior before I make the check block the pipeline.
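One way to get that behavior is to let the audit decide its own exit code, so the switch to a blocking check later is a one-line change. This is a minimal Python sketch, not the actual pipeline config: the AUDIT_STRICT variable, the log prefix, and the function names are all hypothetical.

```python
def is_strict(env):
    """Strict mode is opt-in via a hypothetical AUDIT_STRICT=1 variable."""
    return env.get("AUDIT_STRICT") == "1"


def format_findings(findings):
    """Turn findings into log lines that are easy to grep in CI output."""
    return [f"[jsonld-audit] WARNING: {finding}" for finding in findings]


def exit_code(findings, strict):
    """Return 1 only when there are findings AND strict mode is on.

    In the default non-strict mode the step always exits 0, so the
    deployment finishes and the findings show up only in the logs.
    """
    return 1 if (findings and strict) else 0


# Typical wiring at the end of the audit script (left as a comment so the
# sketch has no side effects when imported):
#
#   import os, sys
#   for line in format_findings(findings):
#       print(line, file=sys.stderr)
#   sys.exit(exit_code(findings, is_strict(os.environ)))
```

Keeping the decision in the script rather than in the CI config means the same command runs in both modes, and only the environment changes.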
It is a smoke test, not a full validation suite. It checks two samples per site. It won't catch every edge case, but it catches the big mistakes before they sit in production for a month.
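Picking the samples from the sitemap could look roughly like this. It is a Python sketch under assumptions: the post does not say how the two pages are chosen, so this version simply takes the first two non-homepage URLs in sitemap order to keep runs reproducible. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> value from a sitemap document, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


def pick_samples(urls, homepage, count=2):
    """Pick the first `count` URLs that are not the homepage.

    Assumption: taking the first entries is good enough for a smoke test.
    A random choice would cover more pages over time but makes a failing
    run harder to reproduce.
    """
    home = homepage.rstrip("/")
    details = [url for url in urls if url.rstrip("/") != home]
    return details[:count]
```

If the sitemap is an index file, its loc entries point at child sitemaps rather than pages, so those would need one more fetch before sampling.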