๐—ช๐—ต๐˜† ๐—œ ๐—ฆ๐˜„๐—ถ๐˜๐—ฐ๐—ต๐—ฒ๐—ฑ ๐˜๐—ผ ๐—”๐—œ ๐—ณ๐—ผ๐—ฟ ๐—ช๐—ฒ๐—ฏ ๐—ฆ๐—ฐ๐—ฟ๐—ฎ๐—ฝ๐—ถ๐—ป๐—ด

I scraped websites for years. I used CSS selectors and XPath. It worked until a site changed its layout. Then my scripts broke. I spent more time fixing code than using the data.

I tried BeautifulSoup. I tried regex. I tried OCR. Nothing lasted. Small changes in the HTML broke everything.
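To make that breakage concrete, here is a small Python sketch. The markup, class names, and function names are hypothetical, and it uses the standard library's xml.etree.ElementTree (which supports only a limited XPath subset and parses only well-formed markup) as a stand-in for a real HTML parser. A selector-style lookup and a regex lookup both work on the old page, then both fail after a cosmetic redesign, even though the price itself never changed.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical product markup before and after a site redesign.
OLD_HTML = '<div><h1 class="title">Desk Lamp</h1><span class="price">$19.99</span></div>'
NEW_HTML = ('<div><h1 class="product-name">Desk Lamp</h1>'
            '<span class="cost" data-currency="USD">$19.99</span></div>')


def price_by_xpath(html):
    """Selector-style lookup: depends on the exact tag and class name."""
    node = ET.fromstring(html).find(".//span[@class='price']")
    return node.text if node is not None else None


# Regex-style lookup: depends on the exact attribute text around the price.
PRICE_RE = re.compile(r'<span class="price">([^<]+)</span>')


def price_by_regex(html):
    match = PRICE_RE.search(html)
    return match.group(1) if match else None


for label, html in [("old", OLD_HTML), ("new", NEW_HTML)]:
    print(label, price_by_xpath(html), price_by_regex(html))
```

On the old markup both lookups return "$19.99"; on the new markup both return None, because the rename from "price" to "cost" is enough to break them.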

Now I use AI models. I send the HTML to an LLM and ask for a JSON object. The model finds the product name and price. It doesn't rely on the exact HTML structure. It looks at the meaning of the text.
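The LLM approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's code: call_llm is a hypothetical stand-in for whatever model client you use (it takes a prompt string and returns the reply text), and the prompt wording and the name/price fields are examples. The important step is parsing the reply as JSON and checking that the expected fields are present, since a model's output is not guaranteed to be well formed.

```python
import json
from typing import Callable

# Hypothetical client type: takes a prompt string, returns the model's text reply.
LLMClient = Callable[[str], str]

# Example prompt; the doubled braces become literal { } after str.format.
PROMPT_TEMPLATE = (
    "Extract the product name and price from the HTML below.\n"
    'Reply with only a JSON object of the form {{"name": string, "price": string}}.\n\n'
    "HTML:\n{html}"
)

REQUIRED_FIELDS = {"name", "price"}


def extract_product(html: str, call_llm: LLMClient) -> dict:
    """Send raw HTML to the model, parse its JSON reply, and check the fields."""
    reply = call_llm(PROMPT_TEMPLATE.format(html=html))
    data = json.loads(reply)  # raises json.JSONDecodeError (a ValueError) on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("model reply is not a JSON object")
    missing = REQUIRED_FIELDS - set(data)
    if missing:
        raise ValueError(f"model reply is missing fields: {sorted(missing)}")
    return {field: data[field] for field in sorted(REQUIRED_FIELDS)}


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call so the sketch runs offline.
    return '{"name": "Desk Lamp", "price": "$19.99"}'


print(extract_product('<span class="cost">$19.99</span>', fake_llm))
```

Because the prompt describes the data rather than the markup, nothing in this code names a tag or class, so a redesign like the one above does not require a code change; the validation step is what keeps a malformed reply from slipping into your data.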

Why this works:

The trade-offs:

My strategy:

Stop fighting with HTML tags. Focus on your data.

Source: https://dev.to/__c1b9e06dc90a7e0a676b/why-i-gave-up-on-regex-and-started-using-ai-for-web-scraping-339d