𝗔𝗽𝗽𝗹𝗲 𝗙𝗶𝗻𝗮𝗹𝗹𝘆 𝗦𝗵𝗶𝗽𝗽𝗲𝗱 𝗟𝗼𝗰𝗮𝗹 𝗠𝘂𝗹𝘁𝗶𝗺𝗼𝗱𝗮𝗹 𝗶𝗻 𝗫𝗰𝗼𝗱𝗲 𝟮𝟳 𝗕𝗲𝘁𝗮


Six months ago, I built a custom on-device inference pipeline for iOS using llama.cpp.

I wanted one specific result: Input an image. Get structured JSON out. Do it all without the cloud.

The system worked and stayed fast. But the technical debt was high. I had to manage:

XCFramework builds
ObjC++ bridges
Tokenizer and sampling internals
Model file management
Strict JSON guardrails

Apple just shipped image analysis support in Foundation Models with the Xcode 27.0 beta. You can now run capable on-device models without building the inference engine yourself.

The new API is simple. You import FoundationModels and use @Generable to define your data structure.

You create a session and ask the model to respond. You pass your image as an attachment. The model returns the data in your exact format.
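Here is a minimal sketch of that flow, using only the parts of FoundationModels I can vouch for: @Generable, @Guide, LanguageModelSession, and respond(to:generating:). The ReceiptSummary type, its fields, and the summarizeReceipt function are hypothetical names made up for illustration. The prompt is text only. I left the image attachment out because the exact attachment call in the Xcode 27 beta is not shown here, and I don't want to guess its signature.

```swift
import FoundationModels

// Hypothetical output type for illustration; the field names are assumptions.
// @Generable lets the framework constrain generation to this shape.
@Generable
struct ReceiptSummary {
    @Guide(description: "Merchant name as printed on the receipt")
    var merchant: String

    @Guide(description: "Grand total as a decimal number")
    var total: Double
}

// Asks the on-device model for a ReceiptSummary.
// Text-only prompt: the image attachment step is intentionally omitted
// because its beta API signature is not confirmed in this post.
func summarizeReceipt(from receiptText: String) async throws -> ReceiptSummary {
    let session = LanguageModelSession(
        instructions: "Extract receipt fields from the user's input."
    )
    let response = try await session.respond(
        to: "Receipt contents:\n\(receiptText)",
        generating: ReceiptSummary.self
    )
    // response.content is already a typed ReceiptSummary, not raw JSON.
    return response.content
}
```

Call it from an async context, for example `let summary = try await summarizeReceipt(from: text)`. There is no JSON parsing step and no schema retry loop, which is exactly the code I used to maintain by hand.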

This change removes the need for:

Manual llama.cpp management
ObjC++ wrappers and manual thread-safety handling
Custom schema failover logic
Manual model file bundling

The new system provides:

Native LanguageModelSession
Native image attachments
Native structured generation
Native model availability checks
Native profiling via Instruments.app
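The availability check is the piece most worth wiring in first. A small sketch, assuming the default system model; modelStatusMessage is a hypothetical helper name, and the message strings are placeholders:

```swift
import FoundationModels

// Hypothetical helper: turns the system model's availability into a
// message you could log or show in the UI.
func modelStatusMessage() -> String {
    switch SystemLanguageModel.default.availability {
    case .available:
        return "On-device model is ready."
    case .unavailable(let reason):
        // The reason covers cases such as an ineligible device or a
        // model that has not finished downloading.
        return "On-device model unavailable: \(reason)"
    }
}
```

Check this before creating a session, so the app can fall back gracefully on devices where the model is not present.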

This is how multimodal inference should work. It is cleaner to integrate and faster to ship.

Source: https://dev.to/fosteman/100-years-later-apple-finally-shipped-local-multimodal-in-xcode-27-beta-nmc

