Cross-platform Swift SDK for Cactus inference.
Cactus is a low-latency inference engine for mobile devices and wearables that lets you run high-performance AI locally in your app. The engine also supports hybrid cloud inference for cases where local inference isn't sufficient, and performs the handoff automatically.
This package supports all Apple platforms, Android, and Linux on ARM.
Supported Engine Version: 1.14
This package exports 3 products:
- Cactus - The main library product. This product exports CactusCore.
- CactusCore - The core of the library without macros bundled in. This product exports CXXCactusShims.
- CXXCactusShims - A direct export of the Cactus FFI.
To get started, you'll need a URL to a model in Cactus format. This means you can either side-load a model yourself, or use CactusModelsDirectory to download one. Afterwards, you can create a CactusAgentSession to begin conversing with the model.
import Cactus
let modelURL = try await CactusModelsDirectory.shared.modelURL(
for: .lfm2_5_1_2bThinking()
)
let session = try CactusAgentSession(from: modelURL) {
"You are a helpful assistant who can answer questions."
}
let message = CactusUserMessage {
"What is the meaning of time?"
}
let completion = try await session.respond(to: message)
print(completion.output)

Function calling is supported through the CactusFunction protocol, which can be passed to a CactusAgentSession.
import Cactus
struct GetWeather: CactusFunction {
@JSONSchema
struct Input: Codable, Sendable {
@JSONSchemaProperty(description: "The city to load the weather for.")
let city: String
}
let name = "get_weather"
let description = "Loads the weather for a city."
func invoke(input: Input) async throws -> sending String {
let weatherCondition = try await weather(for: input.city)
return "The current weather for \(input.city) is: \(weatherCondition)"
}
}
let session = try CactusAgentSession(
from: modelURL,
functions: [GetWeather()]
) {
"You are a weather assistant who can get the current weather."
}
let message = CactusUserMessage {
"What is the weather in San Francisco?"
}
let completion = try await session.respond(to: message)
print(completion.output)

The input type of the function must conform to Decodable, and the function must have a JSON Schema description for its parameters. You can use the @JSONSchema macro as shown in the above example to synthesize the schema automatically.
If a model makes multiple function calls in a single prompt, they are executed in parallel by default (this is configurable), and the results are passed back to the model in the same order that the model invoked the functions in.
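This ordered-parallel behavior can be pictured with plain Swift concurrency. The sketch below is a generic illustration of the pattern (the function name and signature are hypothetical, not the SDK's internals):

```swift
// Generic sketch: run async calls in parallel, then return the results in
// the original invocation order. Not the SDK's actual implementation.
func invokeAllPreservingOrder(
    _ calls: [@Sendable () async -> String]
) async -> [String] {
    await withTaskGroup(of: (Int, String).self) { group in
        for (index, call) in calls.enumerated() {
            group.addTask { (index, await call()) }
        }
        // Children complete in any order; slot each result by its index.
        var results = [String?](repeating: nil, count: calls.count)
        for await (index, value) in group {
            results[index] = value
        }
        // Every slot was filled, so the compactMap drops nothing.
        return results.compactMap { $0 }
    }
}
```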
Certain models also support vision through images. You can analyze images by passing them into the prompt via CactusPromptContent.
import Cactus
let modelURL = try await CactusModelsDirectory.shared.modelURL(
for: .lfm2Vl_450m()
)
let session = try CactusAgentSession(from: modelURL) {
"You describe interesting parts of images."
}
let message = CactusUserMessage {
"Describe this image in 1 sentence."
CactusPromptContent(images: [imageURL])
}
let completion = try await session.respond(to: message)
print(completion.output)

Audio transcription is supported through the CactusSTTSession class. You can transcribe both audio files (WAV) and PCM buffers (16 kHz, 16-bit mono).
let modelURL = try await CactusModelsDirectory.shared.modelURL(
for: .parakeetCtc_1_1b()
)
let session = try CactusSTTSession(from: modelURL)
// WAV File
let request = CactusTranscription.Request(
prompt: .default,
content: .audio(.documentsDirectory.appending(path: "audio.wav"))
)
let transcription = try await session.transcribe(request: request)
print(transcription.content)
// PCM Buffer
let pcmBytes: [UInt8] = [...]
let request = CactusTranscription.Request(
prompt: .default,
content: .pcm(pcmBytes)
)
let transcription = try await session.transcribe(request: request)
print(transcription.content)
// AVFoundation (Apple Platforms Only)
import AVFoundation
let buffer: AVAudioPCMBuffer = ...
let request = CactusTranscription.Request(
prompt: .default,
content: try .pcm(buffer)
)
let transcription = try await session.transcribe(request: request)
print(transcription.content)

For Whisper models, you can rely on a special prompt constructor for Whisper-style prompts.
import Cactus
let modelURL = try await CactusModelsDirectory.shared.modelURL(
for: .whisperSmall()
)
let session = try CactusSTTSession(from: modelURL)
let request = CactusTranscription.Request(
prompt: .whisper(language: .english, includeTimestamps: true),
content: .audio(.documentsDirectory.appending(path: "audio.wav"))
)
let transcription = try await session.transcribe(request: request)
print(transcription.content)

Both CactusAgentSession and CactusSTTSession support streaming via CactusInferenceStream.
// Agent Session
let session = try CactusAgentSession(from: modelURL) {
"You are a helpful assistant."
}
let message = CactusUserMessage {
"What is the weather in San Francisco?"
}
let stream = try session.stream(to: message)
for await token in stream.tokens {
print(token.stringValue, token.tokenId, token.generationStreamId)
}
let completion = try await stream.collectResponse()
print(completion.output)
// STT Session
let session = try CactusSTTSession(from: modelURL)
let request = CactusTranscription.Request(
prompt: .default,
content: .audio(.documentsDirectory.appending(path: "audio.wav"))
)
let stream = try session.transcriptionStream(request: request)
for await token in stream.tokens {
print(token.stringValue, token.tokenId, token.generationStreamId)
}
let transcription = try await stream.collectResponse()
print(transcription.content)

You can also do live transcription through the CactusTranscriptionStream class by passing chunks of audio to the stream.
let modelURL = try await CactusModelsDirectory.shared.modelURL(
for: .parakeetCtc_1_1b()
)
let stream = try CactusTranscriptionStream(from: modelURL)
let recordingTask = Task {
for try await chunk in stream {
print(chunk)
}
}
// `chunk` is an audio buffer captured from your recording pipeline.
try await stream.process(buffer: chunk)
try await stream.process(buffer: chunk)
try await stream.process(buffer: chunk)
try await stream.finish()
_ = try await recordingTask.value

CactusSTTSession also supports detecting the language from an audio source.
let modelURL = try await CactusModelsDirectory.shared.modelURL(
for: .whisperSmall()
)
let session = try CactusSTTSession(from: modelURL)
// WAV File
let request = CactusLanguageDetection.Request(
content: .audio(.documentsDirectory.appending(path: "audio.wav"))
)
let detection = try await session.detectLanguage(request: request)
print(detection.language)
// PCM Buffer
let pcmBytes: [UInt8] = [...]
let request = CactusLanguageDetection.Request(content: .pcm(pcmBytes))
let detection = try await session.detectLanguage(request: request)
print(detection.language)
// AVFoundation (Apple Platforms Only)
import AVFoundation
let buffer: AVAudioPCMBuffer = ...
let request = CactusLanguageDetection.Request(content: try .pcm(buffer))
let detection = try await session.detectLanguage(request: request)
print(detection.language)

Note
Language detection is currently limited to Whisper models.
VAD is supported through the CactusVADSession class, and supports the same audio formats as CactusSTTSession.
let modelURL = try await CactusModelsDirectory.shared.modelURL(
for: .sileroVad()
)
let session = try CactusVADSession(from: modelURL)
// WAV File
let request = CactusVAD.Request(
content: .audio(.documentsDirectory.appending(path: "audio.wav"))
)
let vad = try await session.vad(request: request)
print(vad.segments)
// PCM Buffer
let pcmBytes: [UInt8] = [...]
let request = CactusVAD.Request(content: .pcm(pcmBytes))
let vad = try await session.vad(request: request)
print(vad.segments)
// AVFoundation (Apple Platforms Only)
import AVFoundation
let buffer: AVAudioPCMBuffer = ...
let request = CactusVAD.Request(content: try .pcm(buffer))
let vad = try await session.vad(request: request)
print(vad.segments)

Certain models also have a pro version on Apple platforms, which enables NPU acceleration through the Apple Neural Engine (ANE). For models that support NPU acceleration, you can indicate the pro version you want inside the model download request.
let modelURL = try await CactusModelsDirectory.shared.modelURL(
for: .lfm2Vl_450m(pro: .apple)
)
let modelURL = try await CactusModelsDirectory.shared.modelURL(
for: .moonshineBase(pro: .apple)
)

The CactusModelsDirectory class manages access to all models stored locally in your app, and it can even download and remove models directly on device.
let directory = CactusModelsDirectory(
baseURL: .applicationSupportDirectory.appending(path: "models")
)
// Downloading
// directory.modelURL will only download if it cannot find the
// model in the directory.
let modelURL = try await directory.modelURL(for: .whisperSmall())
let downloadTask = try await directory.downloadTask(for: .whisperSmall())
downloadTask.onProgress = { progress in
print(progress)
}
// Removing
try directory.removeModel(with: .whisperSmall())
try directory.removeModels { $0.request == .whisperSmall() }

The CactusModel class is a non-Copyable, non-Sendable struct that provides a synchronous wrapper around the cactus_model_t pointer and C FFI. All higher-level APIs in the SDK are built on top of this struct.
let model = try CactusModel(from: modelURL)
let turn = try model.complete(
messages: [
.system("You are a helpful assistant."),
.user("What is the meaning of life?")
]
) { token, tokenId in
print(token, tokenId) // Streaming
}
print(turn.response)
let transcription = try model.transcribe(
audio: wavURL,
prompt: ""
) { token, tokenId in
print(token, tokenId) // Streaming
}
print(transcription.response)

Note
Since the struct is non-copyable, it uses ownership semantics to manage the memory of the underlying model pointer. You can read the Swift Evolution proposal for non-Copyable types to understand how they function at a deeper level.
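As a generic illustration of these ownership semantics (this is not the CactusModel source; RawHandle is a made-up type), a non-copyable struct ties a raw resource's lifetime to a single owner:

```swift
// Generic sketch of ownership semantics with a non-copyable type
// (illustrative only; not the CactusModel implementation).
var liveHandleCount = 0

struct RawHandle: ~Copyable {
    private let pointer: UnsafeMutableRawPointer

    init(byteCount: Int) {
        self.pointer = .allocate(byteCount: byteCount, alignment: 8)
        liveHandleCount += 1
    }

    // Runs exactly once, when the single owner's lifetime ends.
    deinit {
        pointer.deallocate()
        liveHandleCount -= 1
    }
}

// `consuming` takes ownership; the handle is destroyed when this returns.
func finish(_ handle: consuming RawHandle) {}

func demo() {
    let handle = RawHandle(byteCount: 64)
    finish(handle)
    // `handle` cannot be used after this point; the compiler rejects access.
}
demo()
```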
The CactusModelActor is an actor variant of CactusModel that is Sendable and supports background-thread execution.
let model = try CactusModelActor(from: modelURL)
let turn = try await model.complete(
messages: [
.system("You are a helpful assistant."),
.user("What is the meaning of life?")
]
) { token, tokenId in
print(token, tokenId) // Streaming
}
print(turn.response)
let transcription = try await model.transcribe(
audio: wavURL,
prompt: ""
) { token, tokenId in
print(token, tokenId) // Streaming
}
print(transcription.response)

You can get embeddings for text, audio, and images through either CactusModel or CactusModelActor.
// Synchronous
let model = try CactusModel(from: modelURL)
var embeddings = [2048 of Float](repeating: 0)
var span = embeddings.mutableSpan
try model.embeddings(for: "This is some text", buffer: &span)
try model.imageEmbeddings(for: imageURL, buffer: &span)
try model.audioEmbeddings(for: audioFileURL, buffer: &span)
// Async/Await
let model = try CactusModelActor(from: modelURL)
var embeddings = [2048 of Float](repeating: 0)
var span = embeddings.mutableSpan
try await model.embeddings(for: "This is some text", buffer: &span)
try await model.imageEmbeddings(for: imageURL, buffer: &span)
try await model.audioEmbeddings(for: audioFileURL, buffer: &span)

You can use the low-level CactusIndex struct for vector indexing. Like CactusModel, CactusIndex is a non-Copyable, non-Sendable struct, which means it uses ownership semantics to manage the memory of its underlying pointer.
import Cactus
let model = try CactusModel(from: modelURL)
let index = try CactusIndex(
from: .applicationSupportDirectory.appending(path: "my-index")
)
let embeddings = try model.embeddings(for: "Some text")
let document = CactusIndex.Document(
id: 0,
embeddings: embeddings,
content: "Some text"
)
try index.add(document: document)
let queryEmbeddings = try model.embeddings(for: "Another text")
let query = CactusIndex.Query(embeddings: queryEmbeddings)
let results = try index.query(query)
for result in results {
print(result.documentId, result.score)
}

You can enable and disable inference telemetry like so.
import Cactus
CactusTelemetry.setup()
await CactusTelemetry.disable()

You can control logging from the Cactus engine via CactusLogging.
import Cactus
CactusLogging.setLevel(.debug)
CactusLogging.setHandler { entry in
print(entry.level, entry.message)
}
CactusLogging.removeHandler()

The available log levels are:
- .debug - All log messages
- .info - Informational messages and above
- .warn - Warnings and errors (default)
- .error - Errors only
- .none - Disables all logging
Many types in the library, such as CactusAgentSession, CactusInferenceStream, and CactusModel.DownloadTask conform to the Observable protocol from the Observation framework such that you can use them for live UI updates in SwiftUI views.
import SwiftUI
import Cactus
struct MyChatView: View {
@State var session: CactusAgentSession
var body: some View {
VStack {
ForEach(self.session.transcript) { entry in
Text(entry.message.content)
}
if self.session.isResponding {
ProgressView()
}
}
}
}

The library ships with a built-in, strongly typed JSONSchema representation that supports both validation and Codable types. You can easily generate a JSON schema for a struct through the @JSONSchema macro.
Additionally, the library supports encoding and decoding Codable values from an intermediate JSON representation.
@JSONSchema
struct MyValue: Codable {
@JSONSchemaProperty(.string(pattern: /[0-9A-Za-z]+/))
var property: String
@JSONSchemaProperty(.integer(minimum: 10))
var num: Int
}
let jsonValue = JSONSchema.Value.object([
"property": "this is a string",
"num": 10
])
// Validation
try JSONSchema.Validator.shared.validate(
value: jsonValue,
with: MyValue.jsonSchema
)
// Codable Support
let decoded = try JSONSchema.Value.Decoder()
.decode(MyValue.self, from: jsonValue)
let encoded: JSONSchema.Value = try JSONSchema.Value.Encoder()
.encode(MyValue(property: "blob", num: 20))

Cactus supports hybrid inference (i.e. handing off to a cloud model when the local model's confidence falls below a certain threshold) automatically through the cactusCloudAPIKey property.
import Cactus
Cactus.cactusCloudAPIKey = "<optional key here for hybrid inference>"

Note
Avoid hardcoding an API key into an app that's publicly distributed. Attackers can inspect your binary, network traffic, or use a debugger to extract the key.
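One common mitigation is to resolve the key at runtime instead of compiling it into source. The sketch below is illustrative; "CactusCloudAPIKey" is a hypothetical Info.plist key name, not something the SDK defines:

```swift
import Foundation

// Illustrative sketch: look up the key from the app bundle's Info.plist
// (or a remote configuration service) at launch rather than hardcoding it.
// "CactusCloudAPIKey" is a hypothetical key name.
func resolveCloudAPIKey(from bundle: Bundle = .main) -> String? {
    bundle.object(forInfoDictionaryKey: "CactusCloudAPIKey") as? String
}
```

At launch you could assign the resolved value to Cactus.cactusCloudAPIKey, and simply fall back to local-only inference when no key is available.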
This library uses its own versioning scheme, separate from the upstream engine (e.g. version X.Y.Z of this library != Cactus version X.Y.Z). You can check the supported engine version through the cactusEngineVersion constant.
import Cactus
// Prints the supported engine version ("1.14" at the time of writing this).
print(Cactus.cactusEngineVersion)

The supported engine version is also displayed at the top of this README.
In no particular order:
- AnyLanguageModel backend via package trait.
- Reliable structured generation using the @JSONSchema macro and any EBNF grammar.
  - This requires CFG support in the upstream engine.
  - This would also support incremental structured streaming for JSON-complete formats via StreamParsing.
- Higher-level vector index abstractions.
- Integrations with more Apple native frameworks (eg. CoreAudio).
- Cactus Graphs.
- Example apps.
You can add Swift Cactus to an Xcode project by adding it to your project as a package.
If you want to use Swift Cactus in a SwiftPM project, it's as simple as adding it to your Package.swift.
dependencies: [
.package(url: "https://github.com/mhayes853/swift-cactus", from: "2.0.0")
]

Then add the product to any target that needs access to the library.
.product(name: "Cactus", package: "swift-cactus")

This library is licensed under an MIT License. See LICENSE for details.