๐—”๐—ฝ๐—ฝ๐—น๐—ฒ ๐—™๐—ถ๐—ป๐—ฎ๐—น๐—น๐˜† ๐—ฆ๐—ต๐—ถ๐—ฝ๐—ฝ๐—ฒ๐—ฑ ๐—Ÿ๐—ผ๐—ฐ๐—ฎ๐—น ๐— ๐˜‚๐—น๐˜๐—ถ๐—บ๐—ผ๐—ฑ๐—ฎ๐—น ๐—ถ๐—ป ๐˜…๐—–๐—ผ๐—ฑ๐—ฒ ๐Ÿฎ๐Ÿณ ๐—•๐—ฒ๐˜๐—ฎ

Six months ago, I built a custom implementation for iOS using llama.cpp.

I wanted one specific result: Input an image. Get structured JSON out. Do it all without the cloud.

The system worked and stayed fast. But the technical debt was high: I had to build and maintain the whole llama.cpp integration myself.

Apple just shipped image analysis in the Foundation Models framework with the Xcode 27.0 beta. You can now run capable multimodal models on-device without building the inference engine yourself.

The new API is simple. You import FoundationModels and use @Generable to define your data structure.

You create a session and ask the model to respond. You pass your image as an attachment. The model returns the data in your exact format.
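Here is a minimal sketch of that flow. It assumes an OS 26+ deployment target and uses only calls from the publicly documented Foundation Models framework (LanguageModelSession, respond(to:generating:), @Generable, @Guide). The PhotoSummary type, its fields, and summarizePhoto(prompt:) are hypothetical names for illustration. The post does not show the exact image-attachment call, so the sketch sends a text prompt only and marks where the image would go.

```swift
import FoundationModels

// Hypothetical output type: the field names are illustrative, not from the post.
// @Generable lets the framework generate an instance of this struct directly,
// so there is no JSON string to parse afterwards.
@Generable
struct PhotoSummary {
    @Guide(description: "A one-sentence caption for the image")
    var caption: String

    @Guide(description: "Names of the main objects visible in the image")
    var objects: [String]

    @Guide(description: "True if the scene is outdoors")
    var isOutdoors: Bool
}

// Hypothetical error type for this sketch.
enum SummaryError: Error {
    case modelUnavailable
}

/// Asks the on-device model for a PhotoSummary and returns the typed result.
/// Text-only sketch: the image-attachment step is left as a marked comment
/// because its exact API is not given in the post.
func summarizePhoto(prompt: String) async throws -> PhotoSummary {
    // Bail out on devices where the on-device model cannot run
    // (for example, Apple Intelligence disabled or the model not downloaded).
    guard case .available = SystemLanguageModel.default.availability else {
        throw SummaryError.modelUnavailable
    }

    let session = LanguageModelSession(
        instructions: "Describe the photo using only the requested fields."
    )

    // ASSUMPTION: the image would be attached here with the Xcode 27 beta
    // attachment API; that call is intentionally not invented in this sketch.
    let response = try await session.respond(to: prompt, generating: PhotoSummary.self)
    return response.content
}
```

From an async context, a caller would write let summary = try await summarizePhoto(prompt: "Summarize this photo.") and read summary.caption directly, with no JSON decoding step in between.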

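This change removes the need to build and maintain your own inference engine. The framework runs inference on-device and hands back typed, structured output instead of raw text.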

This is how multimodal inference should work. It is cleaner and faster for developers.

Source: https://dev.to/fosteman/100-years-later-apple-finally-shipped-local-multimodal-in-xcode-27-beta-nmc
