Apple Finally Shipped Local Multimodal in Xcode 27 Beta
Six months ago, I built a custom on-device multimodal inference pipeline for iOS using llama.cpp.
I wanted one specific result: Input an image. Get structured JSON out. Do it all without the cloud.
The system worked and stayed fast. But the technical debt was high. I had to manage:
- XCFramework builds
- ObjC++ bridges
- Tokenizer and sampling internals
- Model file management
- Strict JSON guardrails
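To give a sense of what the guardrail item can involve, here is a minimal Swift sketch of one common approach: trim the model's raw text to its outermost JSON object, then decode it strictly so missing keys or wrong types fail loudly. The `ImageSummary` type, its fields, and `parseModelOutput` are hypothetical names for illustration; the original pipeline's schema is not shown in this post.

```swift
import Foundation

// Hypothetical schema, for illustration only.
struct ImageSummary: Codable, Equatable {
    let caption: String
    let tags: [String]
}

enum GuardrailError: Error {
    case noJSONObject
}

/// Trims raw model text to its outermost {...} span and decodes it strictly.
func parseModelOutput(_ raw: String) throws -> ImageSummary {
    // Models often wrap JSON in prose or code fences, so cut to the outermost braces first.
    guard let start = raw.firstIndex(of: "{"),
          let end = raw.lastIndex(of: "}"),
          start < end else {
        throw GuardrailError.noJSONObject
    }
    let json = String(raw[start...end])
    // JSONDecoder throws on missing keys or mismatched types, which serves as the schema check.
    return try JSONDecoder().decode(ImageSummary.self, from: Data(json.utf8))
}
```

A caller would typically retry generation or fall back to a default value when this throws.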
Apple just released Foundation Models for image analysis in the Xcode 27.0 beta. You can now run capable on-device multimodal models without building the inference engine yourself.
The new API is simple. You import FoundationModels and use @Generable to define your data structure.
You create a session and ask the model to respond. You pass your image as an attachment. The model returns the data in your exact format.
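The flow described above might look roughly like the sketch below. It uses the `@Generable`, `@Guide`, `LanguageModelSession`, and `respond(to:generating:)` names from the FoundationModels framework, and assumes a deployment target where that framework is available. The `ImageSummary` type, its fields, and the `summarize` helper are hypothetical. The image attachment step is deliberately left out, because this post does not show its exact signature; the sketch covers only the text-prompt path.

```swift
import FoundationModels

// Hypothetical output type; the field names are illustrative, not taken from the post.
@Generable
struct ImageSummary {
    @Guide(description: "One-sentence description of the input")
    var caption: String

    @Guide(description: "Short keywords for the main subjects")
    var tags: [String]
}

// Hypothetical helper: sends a text prompt and returns the typed result.
func summarize(_ prompt: String) async throws -> ImageSummary {
    let session = LanguageModelSession(
        instructions: "Describe the user's input as structured data."
    )
    // Passing the type asks the framework to generate output matching the @Generable structure.
    let response = try await session.respond(to: prompt, generating: ImageSummary.self)
    return response.content
}
```

The result arrives as a typed Swift value, so no manual JSON parsing or schema validation is needed on the caller's side.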
This change removes the need for:
- Manual llama.cpp management
- ObjC++ wrappers and thread safety
- Custom schema failover logic
- Manual model file bundling
The new system provides:
- Native LanguageModelSession
- Native image attachments
- Native structured generation
- Native model availability checks
- Native profiling via Instruments.app
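Of the items above, the availability check is the easiest to show in a few lines. This is a minimal sketch assuming the `SystemLanguageModel.default.availability` property from the FoundationModels framework; the `modelStatus` helper and its return strings are hypothetical.

```swift
import FoundationModels

// Hypothetical helper that turns the system model's availability into a short status string.
func modelStatus() -> String {
    switch SystemLanguageModel.default.availability {
    case .available:
        return "ready"
    case .unavailable(let reason):
        // The reason says why the model cannot be used, for example ineligible hardware,
        // Apple Intelligence being turned off, or the model not being ready yet.
        return "unavailable: \(reason)"
    }
}
```

An app could call this before creating a session and show a fallback UI when the model is not ready.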
This is how multimodal inference should work. It is cleaner and faster for developers.
Source: https://dev.to/fosteman/100-years-later-apple-finally-shipped-local-multimodal-in-xcode-27-beta-nmc