AI Fails at IT Tasks
Artificial Analysis and IBM released a new benchmark called ITBench-AA. It tests AI on enterprise IT tasks. Even the top models scored under 50%.
Agentic IT tasks are ones the AI carries out on its own, without a human directing each step. These include:
- Troubleshooting errors.
- Managing system updates.
- Supporting users.
- Running security operations.
General-purpose AI models write well, but they fail at real technical work because they lack grounding. Grounding is the ability to check a live system instead of guessing from the problem description.
Here are the failures:
- One model suggested restarting a PC for a router issue.
- Models missed how system components depend on each other, which leads to crashes.
- Models ignored security rules.
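
To make "grounding" concrete, here is a minimal Python sketch that picks a fix only after checking observed system state, using the router case above. Everything in it is an assumption for illustration: the `probe` callback, the target names "pc" and "router", and the suggested actions are hypothetical and are not taken from ITBench-AA.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Diagnosis:
    action: str
    evidence: Dict[str, bool]


def grounded_diagnosis(probe: Callable[[str], bool]) -> Diagnosis:
    """Choose an action from observed state, not from the ticket text alone.

    `probe` is a hypothetical callback that returns True when the named
    component responds. In a real setup it would query the live system.
    """
    evidence = {
        "pc_link_up": probe("pc"),
        "router_reachable": probe("router"),
    }
    if not evidence["pc_link_up"]:
        action = "check the PC network adapter"
    elif not evidence["router_reachable"]:
        action = "investigate the router"
    else:
        action = "escalate: no local fault observed"
    return Diagnosis(action=action, evidence=evidence)


def fake_probe(target: str) -> bool:
    """Stand-in for a live check: the PC is fine, the router is down."""
    return {"pc": True, "router": False}[target]


result = grounded_diagnosis(fake_probe)
print(result.action)  # investigate the router
```

With the probe in place, restarting the PC is never suggested, because the evidence shows the PC link is up.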
Do not rely only on the newest models for IT workflows.
Follow these steps:
- Use AI tools built for IT.
- Start with pilot programs.
- Check performance metrics.
- Keep human experts in charge.
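
The steps above can be sketched in a few lines of Python: score a pilot before expanding it, and block any action that no human has approved. This is an illustration under assumptions, not a standard. The `PilotResult` type, the 0.9 threshold, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PilotResult:
    task: str
    passed: bool


def pass_rate(results: List[PilotResult]) -> float:
    """Fraction of pilot tasks the AI tool completed correctly."""
    if not results:
        raise ValueError("no pilot results to score")
    return sum(r.passed for r in results) / len(results)


def ready_to_expand(results: List[PilotResult], threshold: float = 0.9) -> bool:
    """The 0.9 threshold is an assumed value; set your own from evidence."""
    return pass_rate(results) >= threshold


def execute(action: str, approved_by: Optional[str]) -> str:
    """Refuse to run any action that a human expert has not signed off on."""
    if not approved_by:
        return f"BLOCKED: {action!r} needs human approval"
    return f"RUN: {action!r} (approved by {approved_by})"


pilot = [
    PilotResult("reset password", True),
    PilotResult("patch server", False),
    PilotResult("triage alert", True),
    PilotResult("diagnose router", False),
]
print(pass_rate(pilot))        # 0.5
print(ready_to_expand(pilot))  # False
print(execute("restart router", approved_by=None))
```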
Human oversight is a must. Trust evidence over claims.
Source: https://huggingface.co/blog/ibm-research/itbench-aa
Optional learning community: https://t.me/GyaanSetuAi