𝗔𝗜 𝗪𝗲𝗯 𝗦𝗰𝗿𝗮𝗽𝗶𝗻𝗴 𝗩𝘀 𝗧𝗿𝗮𝗱𝗶𝘁𝗶𝗼𝗻𝗮𝗹 𝗦𝗲𝗹𝗲𝗰𝘁𝗼𝗿𝘀

📅2 weeks ago⏱1 min read

Traditional web scraping breaks often. You write CSS selectors. The site updates. Your code fails. I tried a new way using AI.

I built a price tool. I used XPath and regex. Site redesigns broke my scrapers. Regex picked up wrong numbers. I needed a tool to understand meaning.

I first sent raw HTML to an LLM. It cost too much. It hallucinated data. I tried removing too much text. The model lost the context.

I changed my process. First, I cleaned the HTML. I removed scripts and footers. I kept only headings and prices. This cut token use by 70%.

Second, I gave the AI examples. I showed it what a price looks like. Third, I set temperature to 0. This made the output stable.

There are trade-offs.

Skip AI if:

Respect robots.txt. Start with AI to save time. Add validation for safety.

What does your scraping stack look like?

Continue reading