𝗪𝗵𝘆 𝗜 𝗦𝘄𝗶𝘁𝗰𝗵𝗲𝗱 𝘁𝗼 𝗔𝗜 𝗳𝗼𝗿 𝗪𝗲𝗯 𝗦𝗰𝗿𝗮𝗽𝗶𝗻𝗴


I scraped websites for years. I used CSS selectors and XPath. That worked until a site changed its layout. Then my scripts broke. I spent more time fixing code than using the data.

I tried BeautifulSoup. I tried regex. I tried OCR. Nothing lasted. A small change in the HTML broke everything.

Now I use AI models. I send the page's HTML to an LLM and ask for a JSON object. The model finds the product name and price. It does not depend on the exact HTML structure. It reads the meaning of the text.
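Here is a minimal Python sketch of that idea. It is not the exact code from the original post. The `llm` argument is a hypothetical stand-in for whatever model client you use: any function that takes a prompt string and returns the reply text. The prompt wording and the `name`/`price` fields are illustrative assumptions.

```python
import json
from typing import Callable

# Hypothetical: any function that takes a prompt and returns the model's reply text.
LLMCall = Callable[[str], str]

# Illustrative prompt; doubled braces are literal braces for str.format.
PROMPT_TEMPLATE = (
    "Extract the product name and price from the HTML below.\n"
    'Reply with only a JSON object like {{"name": string, "price": number}}.\n'
    "Use null for any field you cannot find.\n\n"
    "HTML:\n{html}"
)


def extract_product(html: str, llm: LLMCall) -> dict:
    """Ask the model for a JSON object instead of walking the DOM with selectors."""
    reply = llm(PROMPT_TEMPLATE.format(html=html))
    # Models sometimes wrap the JSON in prose or a code fence; keep the outermost {...}.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError(f"no JSON object in model reply: {reply!r}")
    return json.loads(reply[start : end + 1])


if __name__ == "__main__":
    # A fake model so the sketch runs offline; replace with a real client call.
    def fake_llm(prompt: str) -> str:
        return 'Sure! {"name": "Blue Mug", "price": 12.5}'

    page = "<div class='item'><h2>Blue Mug</h2><span>$12.50</span></div>"
    print(extract_product(page, fake_llm))  # {'name': 'Blue Mug', 'price': 12.5}
```

Slicing from the first `{` to the last `}` is a cheap guard against replies that wrap the JSON in a sentence or a code fence. Some providers also offer a JSON or structured-output mode that makes this step less necessary.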

Why this works:

It survives most layout changes.
Adding a new field means editing the prompt, not writing a new selector.
One prompt works for many sites.
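A sketch of why new fields are cheap, assuming each field is described in a plain dict (the `FIELDS` dict and the `build_prompt` helper are hypothetical names, not from the original post). Adding a field is one more dict entry, and the same prompt works for every site because nothing in it refers to a particular page's markup.

```python
import json

# Hypothetical field spec: field name -> plain-English description for the model.
FIELDS = {
    "name": "the product's display name",
    "price": "the current price as a number, without a currency symbol",
}
# Adding a field is one more entry:
FIELDS["in_stock"] = "true if the item can be bought right now, otherwise false"


def build_prompt(fields: dict, page_text: str) -> str:
    """Build one site-agnostic extraction prompt from the field spec."""
    field_lines = "\n".join(f'- "{key}": {desc}' for key, desc in fields.items())
    skeleton = json.dumps({key: None for key in fields})  # e.g. {"name": null, ...}
    return (
        "Extract these fields from the web page below:\n"
        f"{field_lines}\n"
        f"Reply with only a JSON object shaped like {skeleton}. "
        "Use null for anything you cannot find.\n\n"
        f"PAGE:\n{page_text}"
    )


if __name__ == "__main__":
    # The same function serves any site; only page_text changes.
    print(build_prompt(FIELDS, "Blue Mug | $12.50 | In stock"))
```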

The trade-offs:

Each request costs money.
It is slower than a selector-based script.
The AI sometimes makes mistakes.
Large pages can exceed the model's context limit.
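One common way to soften the cost and size limits is to send the model visible text instead of raw HTML. The sketch below uses only Python's standard `html.parser`. The `shrink_page` name and the 8,000-character cap are assumptions, and a character count is only a rough proxy for the model's token limit.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes, skipping <script>, <style> and <noscript> contents."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def shrink_page(html: str, max_chars: int = 8000) -> str:
    """Turn raw HTML into compact visible text, capped at max_chars characters."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())  # collapse runs of whitespace
    return text[:max_chars]


if __name__ == "__main__":
    page = "<style>.x{color:red}</style><h1> Blue  Mug </h1><p>$12.50</p>"
    print(shrink_page(page))  # Blue Mug $12.50
```

The trade-off: attributes such as a `data-price` value or image `alt` text are dropped, so sites that keep their data in attributes still need the raw HTML, or at least those attributes, in the prompt.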

My strategy:

Use simple rules first.
Use AI as a fallback when the rules fail.
Validate every result.
Cache results so you do not pay twice for the same page.
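The four rules of this strategy can be wired together roughly like this. It is a sketch under stated assumptions, not a production scraper: `fetch` and `llm` are hypothetical callables you supply, the `<h1>` and `$` regex rules fit only an imagined store layout, the in-memory dict stands in for a real cache (a file, SQLite, Redis), and "price must be positive" is just an example validation rule.

```python
import json
import re
from typing import Callable, Optional

# Hypothetical callables you supply: fetch(url) -> HTML, llm(prompt) -> reply text.
Fetch = Callable[[str], str]
LLMCall = Callable[[str], str]

# Illustrative rules for one imagined store layout: name in <h1>, price like "$12.50".
TITLE_RE = re.compile(r"<h1[^>]*>(.*?)</h1>", re.S)
PRICE_RE = re.compile(r"\$\s*(\d+(?:\.\d{1,2})?)")


def rule_extract(html: str) -> Optional[dict]:
    """Rule 1: cheap, deterministic rules. Returns None when they do not match."""
    title, price = TITLE_RE.search(html), PRICE_RE.search(html)
    if title and price:
        return {"name": title.group(1).strip(), "price": float(price.group(1))}
    return None


def llm_extract(html: str, llm: LLMCall) -> dict:
    """Rule 2: fall back to the model and parse the JSON object out of its reply."""
    reply = llm('Return only a JSON object {"name": string, "price": number} for this page:\n' + html)
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("model reply contained no JSON object")
    return json.loads(reply[start : end + 1])


def is_valid(item: dict) -> bool:
    """Rule 3: reject missing fields and implausible values (example: price > 0)."""
    name, price = item.get("name"), item.get("price")
    return (
        isinstance(name, str)
        and name.strip() != ""
        and isinstance(price, (int, float))
        and not isinstance(price, bool)
        and price > 0
    )


def scrape(url: str, fetch: Fetch, llm: LLMCall, cache: dict) -> dict:
    """Cache, then rules, then LLM fallback; only validated results are cached."""
    if url in cache:  # Rule 4: no download and no model call for pages seen before
        return cache[url]
    html = fetch(url)
    item = rule_extract(html)
    if item is None or not is_valid(item):
        item = llm_extract(html, llm)
    if not is_valid(item):
        raise ValueError(f"could not extract a valid item from {url}")
    cache[url] = item
    return item
```

The ordering is deliberate. The cache is checked before fetching, so a cached page costs neither a download nor a model call. An invalid rule result falls through to the model, and an invalid model result raises an error instead of being stored.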

Stop fighting with HTML tags. Focus on your data.

Source: https://dev.to/__c1b9e06dc90a7e0a676b/why-i-gave-up-on-regex-and-started-using-ai-for-web-scraping-339d
