Regex Broke My Scraper

I built scrapers for years using CSS selectors and regex. They worked until a website changed.

I managed scrapers for 200 supplier sites. When one site changed its layout, my code broke, and I spent days fixing it. I tried headless browsers, but they were too slow.

Then I tried a new way: an LLM. I sent the HTML text to GPT and asked for JSON back.
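Here is a minimal sketch of that idea, not my exact code. It assumes "HTML text" means the visible text of the page (you could send raw markup instead), and it takes the model call as a plain function so any client can be plugged in. The names `html_to_text`, `build_prompt`, `extract_fields`, and `complete` are made up for this example.

```python
import json
from html.parser import HTMLParser
from typing import Callable, Dict, List


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce a page to its visible text, one chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def build_prompt(page_text: str, fields: List[str]) -> str:
    """Ask for one JSON object containing exactly the requested fields."""
    return (
        "Extract the following fields from the page text and reply with "
        "a single JSON object and nothing else. Use null for any field "
        "that is not present.\n"
        f"Fields: {', '.join(fields)}\n\n"
        f"Page text:\n{page_text}"
    )


def extract_fields(
    html: str,
    fields: List[str],
    complete: Callable[[str], str],  # hypothetical: prompt in, model reply out
) -> Dict[str, object]:
    """Send the page text to the model and parse its JSON reply."""
    reply = complete(build_prompt(html_to_text(html), fields))
    data = json.loads(reply)  # raises ValueError if the reply is not JSON
    if not isinstance(data, dict):
        raise ValueError("model reply was not a JSON object")
    # Keep only the requested fields; missing ones become None.
    return {name: data.get(name) for name in fields}
```

In practice `complete` would be a thin wrapper around whatever chat-completion call you use. Because no selectors are involved, a layout change only alters the text the model reads, not the code.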

My process:

Pros:

Cons:

My tips for you:

How do you handle website changes?

Source: https://dev.to/__c1b9e06dc90a7e0a676b/regex-broke-my-scraper-using-llms-for-robust-data-extraction-5bef