LLMs For Better Web Scraping

I spent years writing scrapers. I used CSS selectors and regex. It worked until the website changed.

One layout update broke my code. I spent days fixing it. I lost the battle against changing HTML.

I tried a new way. I used LLMs. I stopped guessing selectors. Now I send the page text to the model.
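Here is a minimal sketch of that idea, not my exact code. It strips the HTML down to visible text, wraps it in a prompt, and parses the model's JSON reply. The function names, the `name` / `price` / `in_stock` keys, and the `call_model` callable are all assumptions for illustration. `call_model` stands in for whichever LLM client you use.

```python
import json
from html.parser import HTMLParser
from typing import Callable


class _TextExtractor(HTMLParser):
    """Collects visible text and skips <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_text(html: str) -> str:
    """Reduce a page to its visible text, one chunk per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def build_prompt(text: str) -> str:
    """Hypothetical prompt; the field names are assumptions."""
    return (
        "Extract the product name, price, and stock status "
        "from the page text below.\n"
        'Reply with JSON only, using the keys "name", "price", '
        'and "in_stock".\n\n' + text
    )


def extract_product(html: str, call_model: Callable[[str], str]) -> dict:
    """Send page text to the model and parse its JSON reply.

    call_model is a placeholder: any function that takes a prompt
    string and returns the model's reply as a string.
    """
    reply = call_model(build_prompt(page_text(html)))
    return json.loads(reply)
```

No selector appears anywhere in this code. If the site moves the price from a `span` to a `div`, `page_text` still returns the same words and the prompt does not change. In real use you would also want to handle replies that are not valid JSON, since `json.loads` raises on them.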

My process is simple:

The results are better. It works on different layouts. It recognizes prices and stock status without specific rules.

There are trade-offs:

Choose your tool based on your needs:

I have not touched my code in three weeks. The LLM handles the fragile DOM for me.

How do you handle website changes? Do you use selectors or models?

Source: https://dev.to/__c1b9e06dc90a7e0a676b/regex-broke-my-scraper-using-llms-for-robust-data-extraction-5bef