Testing AI Safety: WildGuard vs Nemotron

I tested three local AI safety classifiers for medical use.

I used 50 test cases. 32 cases were real attacks like jailbreaks. 18 cases were safe medical queries.

The results show a clear trade-off between catching attacks and wrongly blocking real users.

The Test Results:

• WildGuard (F1: 0.941): It caught every single attack. However, it blocked many safe medical terms. It marked "BRCA1 mutation" and "GFR calculation" as harmful. This is bad for medical AI.
• Nemotron-3-CS (F1: 0.857): It had zero false alarms. It never blocked a safe medical query. But it missed 8 attacks. It failed to catch attacks using Base64 or ROT13 encoding.
• LlamaGuard3 (F1: 0.800-0.821): It is a middle ground. It is better at allowing medical terms than WildGuard, but less accurate overall.
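
As a rough sanity check on the F1 scores above, here is a minimal Python sketch of the standard F1 formula. The confusion counts are assumptions, not figures stated in the post: they assume 32 attack cases and 18 safe queries (which add up to the 50 test cases), with Nemotron-3-CS missing 8 attacks and raising no false alarms, and WildGuard catching every attack while wrongly blocking 4 safe queries. Under those assumed counts the reported scores are reproduced.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), treating 'attack' as the positive class."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Assumed counts (inferred, not taken from the post).
nemotron = f1_score(tp=24, fp=0, fn=8)   # 8 missed attacks, no false alarms
wildguard = f1_score(tp=32, fp=4, fn=0)  # all attacks caught, 4 safe queries blocked

print(round(nemotron, 3))   # 0.857
print(round(wildguard, 3))  # 0.941
```

F1 weighs a missed attack and a false alarm equally, so two models with similar scores can fail in opposite ways. That is why the per-model error types matter more here than the single number.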

What I Learned:

  1. Medical terms cause false positives. Models like WildGuard often treat rare medical strings as high-risk content. You will need a whitelist if you use this in a clinic.
  2. Encoding bypasses work. In my tests, Nemotron-3-CS failed to stop technical attacks such as Kubernetes infra probes and encoded commands.
  3. Testing automation needs speed. I used Passmark AI to run these tests. I found that clicking through UIs like Swagger is too slow for AI agents. It is better to navigate directly to REST URLs or use Playwright request fixtures.
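
Points 1 and 2 can be sketched as a thin wrapper around whichever classifier you use. This is a minimal illustration under stated assumptions, not how any of the three models work internally. The names `MEDICAL_ALLOWLIST`, `decoded_variants` and `screen` are hypothetical, and `classify` stands in for a real model call that returns True when the text is flagged. The wrapper also checks ROT13 and Base64 decodings of the input, and sends a flagged query that contains an allowlisted medical term to human review instead of blocking it outright.

```python
import base64
import binascii
import codecs
import re
from typing import Callable, List

# Hypothetical allowlist; a real one would be curated with clinicians.
MEDICAL_ALLOWLIST = {"brca1 mutation", "gfr calculation"}

# Runs of 16+ Base64 alphabet characters, with optional padding.
BASE64_TOKEN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decoded_variants(text: str) -> List[str]:
    """Return the text plus its ROT13 form and any decodable Base64 tokens."""
    variants = [text, codecs.decode(text, "rot13")]
    for token in BASE64_TOKEN.findall(text):
        try:
            variants.append(base64.b64decode(token, validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64 text, ignore this token
    return variants


def screen(text: str, classify: Callable[[str], bool]) -> str:
    """Return 'allow', 'review' or 'block' for a single query."""
    flagged = any(classify(variant) for variant in decoded_variants(text))
    if not flagged:
        return "allow"
    # Allowlisted terms downgrade a block to review; they never force an allow.
    lowered = text.lower()
    if any(term in lowered for term in MEDICAL_ALLOWLIST):
        return "review"
    return "block"
```

Routing allowlisted hits to review rather than straight to allow is a deliberate choice in this sketch: otherwise an attacker could paste "BRCA1 mutation" into a jailbreak to get past the filter.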

Which one should you choose?

No model is perfect for medical AI yet. We still need better audit logs and specialized training for medical vocabulary.

Source: https://dev.to/jh5_pulse/nemoclaw-shi-ce-ping-zheng-qie-qu-vs-zi-liao-wai-xie-11ga

Optional learning community: https://t.me/GyaanSetuAi