Extracting Messy Web Data With LLMs
I scraped websites for years. I used BeautifulSoup and Scrapy. One site broke my process. The HTML was a mess. The layout changed every week. My selectors broke.
I tried an LLM. I gave it raw HTML. I asked for JSON.
Traditional tools rely on structure. LLMs rely on meaning. I describe the data. The AI finds it.
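A minimal sketch of "describe the data, get JSON back", not the exact setup from this post. The names `build_prompt`, `extract`, and `call_llm` are hypothetical; `call_llm` stands in for whatever function sends a prompt to your model and returns its text reply, so no specific vendor API is assumed.

```python
import json


def build_prompt(html: str, fields: dict) -> str:
    """Describe the wanted fields in plain language and attach the raw HTML."""
    field_lines = "\n".join(f"- {name}: {desc}" for name, desc in fields.items())
    return (
        "Extract the following fields from the HTML below.\n"
        "Reply with a single JSON object and nothing else. "
        "Use null for any field that is not present.\n\n"
        f"Fields:\n{field_lines}\n\nHTML:\n{html}"
    )


def extract(html: str, fields: dict, call_llm) -> dict:
    """Ask the model for JSON and keep only the fields that were requested.

    call_llm is a hypothetical callable: prompt string in, reply string out.
    """
    reply = call_llm(build_prompt(html, fields))
    # Models often wrap JSON in prose or code fences; keep the outermost object.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model reply")
    data = json.loads(reply[start:end + 1])
    return {name: data.get(name) for name in fields}


# Usage with a stand-in model so the sketch runs without any API key.
def fake_llm(prompt: str) -> str:
    return 'Here you go:\n{"title": "Blue Widget", "price": "19.99"}'


fields = {"title": "the product name", "price": "the price as shown on the page"}
result = extract("<h1>Blue Widget</h1><p>$19.99</p>", fields, fake_llm)
```

The prompt names fields by meaning, not by CSS selector or XPath, which is why a layout change does not require touching it.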
Pros:
- Layout changes do not stop it.
- Setup takes minutes.
- It ignores noise.
Cons:
- API calls cost money.
- It is slower than parsing HTML locally.
- It sometimes makes up data that is not on the page.
- Huge HTML needs cleaning before it fits in a prompt.
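One possible way to do that cleaning, sketched with only the Python standard library. The names `clean_html` and `max_chars` are illustrative assumptions, and the 20,000-character cap is an arbitrary placeholder, not a recommended limit. It drops scripts, styles, and similar non-content tags, then keeps only visible text.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect visible text, skipping tags that never hold page content."""

    SKIP = {"script", "style", "noscript", "svg", "template"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside every skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(" ".join(data.split()))


def clean_html(html: str, max_chars: int = 20000) -> str:
    """Return the page's visible text, one chunk per line, capped in length."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)[:max_chars]


page = (
    "<html><head><style>p{color:red}</style>"
    "<script>var x = 1;</script></head>"
    "<body><h1>Blue   Widget</h1><p>Price: $19.99</p></body></html>"
)
cleaned = clean_html(page)
```

This also throws away attributes such as `href`, so it only suits cases where the data you want is in the visible text. Smaller input also means fewer tokens per call, which helps with the cost and speed points above.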
My strategy:
- Use traditional tools for stable sites.
- Use AI for hard sites.
- Validate the data the AI returns.
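A rough sketch of one validation check, aimed at made-up data: flag any extracted value that does not literally appear in the page text. The function name `find_unsupported` is hypothetical, and this is a heuristic, not a complete validator. It will also flag values the model legitimately reformatted, such as a date converted to another format.

```python
import re


def _norm(text: str) -> str:
    """Collapse whitespace and lowercase so minor formatting does not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_unsupported(record: dict, source_text: str) -> list:
    """Return the field names whose values cannot be found in the source text."""
    haystack = _norm(source_text)
    suspicious = []
    for name, value in record.items():
        if value is None:
            continue  # the model said "not present", nothing to verify
        if _norm(str(value)) not in haystack:
            suspicious.append(name)
    return suspicious


source = "Blue Widget\nPrice: $19.99"
record = {"title": "Blue  Widget", "price": "19.99", "sku": "BW-1001", "color": None}
flagged = find_unsupported(record, source)
```

Here `sku` is flagged because "BW-1001" appears nowhere in the source. Flagged fields can be dropped, retried, or sent for manual review, depending on how much the data matters.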
Do you use AI for scraping? Do you prefer XPath?