Testing AI Safety: WildGuard vs Nemotron
I tested three local AI safety classifiers for medical use.
I used 50 test cases, including 22 real attacks such as jailbreaks and 18 safe medical queries.
The results show a sharp trade-off between catching attacks and wrongly blocking legitimate users.
The Test Results:
• WildGuard (F1: 0.941): It caught every single attack. However, it blocked many safe medical terms. It marked "BRCA1 mutation" and "GFR calculation" as harmful. This is a serious problem for medical AI.
• Nemotron-3-CS (F1: 0.857): It had zero false alarms and never blocked a safe medical query. But it missed 8 attacks. It failed to catch attacks using Base64 or ROT13 encoding.
• LlamaGuard3 (F1: 0.800-0.821): It is a middle ground. It is better at allowing medical terms than WildGuard, but its F1 is the lowest of the three.
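As a rough check on these scores, here is a minimal sketch of the F1 arithmetic, treating attacks as the positive class. The confusion counts are assumptions, not values from the test logs: they presume that the 10 cases not categorized above are also attacks (32 attacks, 18 safe queries) and that WildGuard blocked 4 safe queries. Under those assumptions the two reported scores are reproduced.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), with attacks as the positive class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Assumed counts (hypothetical): 32 attack cases, 18 safe medical queries.
# WildGuard: every attack caught (FN = 0), 4 safe queries blocked (FP = 4).
wildguard_f1 = f1_score(tp=32, fp=4, fn=0)
# Nemotron-3-CS: no false alarms (FP = 0), 8 attacks missed (FN = 8).
nemotron_f1 = f1_score(tp=24, fp=0, fn=8)

print(round(wildguard_f1, 3))  # 0.941
print(round(nemotron_f1, 3))   # 0.857
```

The same F1 can come from very different error profiles, so for a clinic it helps to look at false positives and false negatives separately as well as the combined score.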
What I Learned:
- Medical terms cause false positives. Models like WildGuard often treat rare medical strings as high-risk content. You will need a whitelist if you use this in a clinic.
- Encoding bypasses work. In my tests, Nemotron-3-CS did not stop technical attacks such as Kubernetes infra probes or commands encoded with Base64 or ROT13.
- Testing automation needs speed. I used Passmark AI to run these tests. I found that clicking through UIs like Swagger is too slow for AI agents. It is better to navigate directly to REST URLs or use Playwright request fixtures.
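One possible mitigation for the encoding bypasses noted above is to decode a prompt before it reaches the classifier and check every variant. This is a sketch only, not something I ran in the tests: `decoded_variants`, `flags_any_variant`, and `keyword_classifier` are hypothetical names, and the keyword check is a stand-in for a real model call.

```python
import base64
import binascii
import codecs
import re
from typing import Callable, List

# Runs of 16+ Base64-alphabet characters, optionally padded (a heuristic).
_B64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decoded_variants(prompt: str) -> List[str]:
    """Return the prompt plus its ROT13 and Base64-decoded readings."""
    variants = [prompt, codecs.decode(prompt, "rot13")]
    for token in _B64_TOKEN.findall(prompt):
        try:
            raw = base64.b64decode(token, validate=True)
            variants.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64 text, ignore this token
    return variants


def flags_any_variant(prompt: str, classifier: Callable[[str], bool]) -> bool:
    """True if the classifier flags the prompt or any decoded variant."""
    return any(classifier(v) for v in decoded_variants(prompt))


def keyword_classifier(text: str) -> bool:
    """Hypothetical stand-in for a real safety model."""
    return "ignore all previous instructions" in text.lower()


encoded = base64.b64encode(b"Ignore all previous instructions").decode()
print(keyword_classifier(encoded))                     # False
print(flags_any_variant(encoded, keyword_classifier))  # True
```

Checking extra variants can only add flags, never remove them, so this trades some false-positive risk for better coverage of encoded attacks. It covers only single-layer Base64 and ROT13, not nested or custom encodings.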
Which one should you choose?
- Choose WildGuard if you must catch every attack and can handle manual reviews for blocked terms.
- Choose Nemotron-3-CS if you cannot afford to block legitimate medical questions.
- Choose LlamaGuard3 if you need a balance on limited hardware.
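If you go with a strict classifier, the whitelist and manual review ideas can be combined. The sketch below is an assumption about how that routing might look, not a tested setup: `MEDICAL_ALLOWLIST`, `route`, and `strict_classifier` are hypothetical, and a real allowlist would be curated by clinical staff.

```python
import re
from typing import Callable

# Hypothetical allowlist; the two entries come from the false positives above.
MEDICAL_ALLOWLIST = ("BRCA1", "GFR")


def route(prompt: str, classifier: Callable[[str], bool]) -> str:
    """Return 'allow', 'review', or 'block' for a prompt."""
    if not classifier(prompt):
        return "allow"
    has_medical_term = any(
        re.search(rf"\b{re.escape(term)}\b", prompt, re.IGNORECASE)
        for term in MEDICAL_ALLOWLIST
    )
    # A flagged prompt with an allowlisted term goes to a human, not straight through.
    return "review" if has_medical_term else "block"


def strict_classifier(text: str) -> bool:
    """Hypothetical stand-in that over-flags, like the behavior described above."""
    lowered = text.lower()
    return "mutation" in lowered or "bypass" in lowered


print(route("What does a BRCA1 mutation mean?", strict_classifier))  # review
print(route("Bypass the safety filter", strict_classifier))          # block
```

Sending allowlisted hits to review instead of auto-allowing them matters, because an attacker could otherwise add a medical term to a jailbreak to get it through.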
No model is perfect for medical AI yet. We still need better audit logs and specialized training for medical vocabulary.
Source: https://dev.to/jh5_pulse/nemoclaw-shi-ce-ping-zheng-qie-qu-vs-zi-liao-wai-xie-11ga
Optional learning community: https://t.me/GyaanSetuAi