๐—ฆ๐˜๐—ผ๐—ฝ ๐—ช๐—ฟ๐—ถ๐˜๐—ถ๐—ป๐—ด ๐—™๐—ฟ๐—ฎ๐—ด๐—ถ๐—น๐—ฒ ๐—ช๐—ฒ๐—ฏ ๐—ฆ๐—ฐ๐—ฟ๐—ฎ๐—ฝ๐—ฒ๐—ฟ๐˜€

I spent years building scrapers. I used CSS selectors and BeautifulSoup. This worked for simple sites.

Then I hit a wall. E-commerce sites change layouts often. Some use randomly generated class names. My 300-line scraper kept breaking. Headless browsers used too much memory.

I tried a new method. I fed the raw HTML to an LLM. I asked it to find the data.

The AI understands meaning. It does not care if a price is in a span or a div. You write a prompt instead of a selector.
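Here is a minimal sketch of what "a prompt instead of a selector" could look like. The helper names (`build_extraction_prompt`, `parse_model_reply`), the field descriptions, and the prompt wording are my own assumptions, not taken from the original article. The actual model call is left out on purpose, since it depends on which provider you use.

```python
import json


def build_extraction_prompt(html: str, fields: dict[str, str]) -> str:
    """Describe the wanted fields in plain language instead of writing selectors."""
    field_lines = "\n".join(f'- "{name}": {desc}' for name, desc in fields.items())
    return (
        "Extract the following fields from the HTML below.\n"
        "Reply with a single JSON object and nothing else. "
        "Use null for any field you cannot find.\n\n"
        f"Fields:\n{field_lines}\n\n"
        f"HTML:\n{html}"
    )


def parse_model_reply(reply: str, fields: dict[str, str]) -> dict:
    """Pull the JSON object out of the reply and keep only the requested keys."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(reply[start:end + 1])
    return {name: data.get(name) for name in fields}


# Hypothetical example: the class names are random, but the prompt does not care.
fields = {"title": "the product name", "price": "the current price as shown"}
html = '<div class="x9f2"><h1>Blue Mug</h1><span class="a81">$12.50</span></div>'
prompt = build_extraction_prompt(html, fields)

# A real run would send `prompt` to a model. This reply is hand-written for illustration.
reply = 'Here is the data: {"title": "Blue Mug", "price": "$12.50", "extra": 1}'
print(parse_model_reply(reply, fields))
# {'title': 'Blue Mug', 'price': '$12.50'}
```

The parser tolerates extra chatter around the JSON and drops keys you did not ask for, which is a cheap guard against the model adding fields on its own.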

How to make it work:

Use this for:

Avoid this for:

The best setup is a hybrid. Use CSS selectors for stable sites. Use AI when selectors fail.
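One way the hybrid could be wired up, as a rough sketch: try the cheap selector path first and only call the model when the selector misses or returns incomplete data. Everything named here (`extract_hybrid`, `selector_extract`, `stub_llm_extract`, the `required` list) is hypothetical. The regex is only a stand-in for a real CSS selector such as BeautifulSoup's `soup.select_one(".price")`, and the LLM function is a stub.

```python
import re
from typing import Callable, Optional

Extractor = Callable[[str], Optional[dict]]


def is_complete(record: Optional[dict], required: list[str]) -> bool:
    """A record counts only if every required field is present and non-empty."""
    return record is not None and all(record.get(key) for key in required)


def extract_hybrid(html: str, selector_extract: Extractor,
                   llm_extract: Extractor, required: list[str]):
    """Return (record, path), where path says which extractor produced the record."""
    try:
        record = selector_extract(html)
    except Exception:
        record = None  # a crashing selector is treated the same as a miss
    if is_complete(record, required):
        return record, "selectors"
    return llm_extract(html), "llm"


def selector_extract(html: str) -> Optional[dict]:
    # Stand-in for a CSS selector on a stable class name.
    match = re.search(r'class="price"[^>]*>([^<]+)<', html)
    return {"price": match.group(1).strip()} if match else None


def stub_llm_extract(html: str) -> Optional[dict]:
    # Placeholder: a real version would send the HTML and a prompt to a model.
    return {"price": "from-model"}


stable = '<span class="price">$12.50</span>'
changed = '<span class="a81">$12.50</span>'
print(extract_hybrid(stable, selector_extract, stub_llm_extract, ["price"]))
# ({'price': '$12.50'}, 'selectors')
print(extract_hybrid(changed, selector_extract, stub_llm_extract, ["price"]))
# ({'price': 'from-model'}, 'llm')
```

Returning the path alongside the record makes it easy to log how often the fallback fires, which tells you when a selector needs fixing.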

Describe the data you need. Let the model handle the structure.

Source: https://dev.to/__c1b9e06dc90a7e0a676b/my-web-scraping-nightmare-ended-when-i-let-an-llm-read-the-html-1bj4