๐—”๐—œ ๐—ช๐—ฒ๐—ฏ ๐—ฆ๐—ฐ๐—ฟ๐—ฎ๐—ฝ๐—ถ๐—ป๐—ด ๐—ฉ๐˜€ ๐—ง๐—ฟ๐—ฎ๐—ฑ๐—ถ๐˜๐—ถ๐—ผ๐—ป๐—ฎ๐—น ๐—ฆ๐—ฒ๐—น๐—ฒ๐—ฐ๐˜๐—ผ๐—ฟ๐˜€

Traditional web scraping breaks often. You write CSS selectors. The site updates. Your code fails. I tried a new way using AI.

I built a price tool. I used XPath and regex. Site redesigns broke my scrapers. Regex picked up wrong numbers. I needed a tool to understand meaning.

I first sent raw HTML to an LLM. It cost too much. It hallucinated data. I tried removing too much text. The model lost the context.

I changed my process. First, I cleaned the HTML. I removed scripts and footers. I kept only headings and prices. This cut token use by 70%.

Second, I gave the AI examples. I showed it what a price looks like. Third, I set temperature to 0. This made the output stable.

There are trade-offs.

Skip AI if:

Respect robots.txt. Start with AI to save time. Add validation for safety.

What does your scraping stack look like?

Source: https://dev.to/__c1b9e06dc90a7e0a676b/i-tried-ai-powered-web-scraping-so-my-selectors-could-finally-rest-2llf