feat: support parakeet-tdt-ctc-110m hybrid model #383
JarbasAl wants to merge 1 commit into FluidInference:main
Conversation
Add AsrModelVersion.tdtCtc110m for the 110M parameter hybrid TDT-CTC model.

Key differences from the 0.6B models:
- Fused preprocessor+encoder (no separate Encoder.mlmodelc)
- Smaller dimensions: encoderHidden=512, vocabSize=1024, 1 LSTM layer
- Array-format vocabulary (vocab.json) instead of dict format
- blankId=1024 (same as v2)

Changes:
- AsrModels: optional encoder, fused frontend loading, array vocab support
- AsrManager: version-aware decoder state shapes, fused frontend availability
- AsrTranscription: skip encoder step when preprocessor output is fused
- TdtDecoderState: parameterized LSTM layer count
- TdtDecoderV3: use config.encoderHiddenSize instead of auto-detection
- EncoderFrameView: accept explicit hidden size parameter
- TranscribeCommand: --model-version tdt-ctc-110m, --model-dir flags
- ModelNames: parakeetTdtCtc110m repo, fused model requirements
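The per-version differences listed above could be expressed as computed properties on the version enum. This is a hedged sketch only; the names mirror the PR description, but the real `AsrModelVersion` API in FluidAudio may differ:

```swift
// Illustrative sketch, not the PR's actual code: model-specific constants
// hang off the version enum so callers never hard-code dimensions.
enum AsrModelVersion {
    case v2, v3, tdtCtc110m

    /// Encoder output hidden size; 512 for the 110M model, 1024 for the 0.6B models.
    var encoderHiddenSize: Int {
        self == .tdtCtc110m ? 512 : 1024
    }

    /// Number of LSTM layers in the decoder; the 110M model uses a single layer.
    var lstmLayerCount: Int {
        self == .tdtCtc110m ? 1 : 2  // 2 assumed for the 0.6B models
    }

    /// Whether preprocessor and encoder ship as one fused .mlmodelc
    /// (the 110M package has no separate Encoder.mlmodelc).
    var hasFusedFrontend: Bool {
        self == .tdtCtc110m
    }
}
```

With constants centralized like this, loading code can branch on `models.version.hasFusedFrontend` instead of probing the filesystem for a missing Encoder.mlmodelc.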
```diff
 switch models.version {
-case .v2:
+case .v2, .tdtCtc110m:
     let decoder = TdtDecoderV2(config: config)
```
🔴 Missing encoderHiddenSize adaptation causes runtime crash for tdtCtc110m with default config
When AsrManager is created with ASRConfig.default (or any config that doesn't explicitly set encoderHiddenSize), it defaults to ASRConstants.encoderHiddenSize (1024). If the manager is then initialized with tdtCtc110m models (whose encoder output has hidden size 512), transcription will fail at runtime with "Encoder hidden size mismatch" in EncoderFrameView (Sources/FluidAudio/ASR/TDT/EncoderFrameView.swift:32-33).
The blankId mismatch is handled by TdtDecoderV2.adaptConfigForV2 (Sources/FluidAudio/ASR/TDT/TdtDecoderV2.swift:55-74), but encoderHiddenSize is never adapted. AsrManager.initialize(models:) has the model version info (models.version.encoderHiddenSize returns 512 for the 110m model) but neither validates nor adapts the config. Since ASRConfig is stored as a let on AsrManager, it cannot be corrected after init. This means the natural usage pattern AsrManager() → initialize(models: tdtCtc110m) silently accepts the mismatch and crashes only during transcription.
Example that triggers the crash:

```swift
let models = try await AsrModels.downloadAndLoad(version: .tdtCtc110m)
let manager = AsrManager()  // encoderHiddenSize defaults to 1024
try await manager.initialize(models: models)
let result = try await manager.transcribe(url)  // CRASH in EncoderFrameView
```
Prompt for agents
In Sources/FluidAudio/ASR/AsrManager.swift, in the tdtDecodeWithTimings method (around line 306-348), the config is passed directly to TdtDecoderV2/V3 without adapting encoderHiddenSize based on the model version. Since models.version is already available at this point (line 317-319), create an adapted config that uses models.version.encoderHiddenSize before passing it to the decoder.
Specifically, around line 320-322, where `let decoder = TdtDecoderV2(config: config)` is called, replace `config` with a version that has the correct encoderHiddenSize from models.version.encoderHiddenSize. Same for the v3 case at line 335.
Alternatively, add validation in initialize(models:) at line 105-120 that throws an error if config.encoderHiddenSize != models.version.encoderHiddenSize, giving the user a clear error at initialization time rather than a cryptic error during transcription.
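Both options suggested above can be sketched briefly. This is a hypothetical illustration, not the repository's actual code: it assumes `ASRConfig` has a mutable (or copyable) `encoderHiddenSize` field and invents an `ASRError.configMismatch` case for the validation path.

```swift
// Option 1 (sketch): adapt the config to the loaded model version before
// constructing the decoder, mirroring what adaptConfigForV2 does for blankId.
func adaptedConfig(_ config: ASRConfig, for version: AsrModelVersion) -> ASRConfig {
    guard config.encoderHiddenSize != version.encoderHiddenSize else { return config }
    var adapted = config  // assumes ASRConfig is a value type with a settable field
    adapted.encoderHiddenSize = version.encoderHiddenSize  // e.g. 512 for .tdtCtc110m
    return adapted
}

// Option 2 (sketch): fail fast in initialize(models:) with a clear error
// instead of crashing later inside EncoderFrameView during transcription.
func validate(_ config: ASRConfig, against version: AsrModelVersion) throws {
    guard config.encoderHiddenSize == version.encoderHiddenSize else {
        // ASRError.configMismatch is a hypothetical error case for illustration.
        throw ASRError.configMismatch(
            "config.encoderHiddenSize (\(config.encoderHiddenSize)) does not match "
            + "the model's encoder hidden size (\(version.encoderHiddenSize))")
    }
}
```

Option 1 keeps the "it just works" ergonomics of the default config; Option 2 surfaces the mismatch at init time, which is preferable if silently overriding a user-supplied config value is considered surprising.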
@JarbasAl did you test this on iOS? We originally had a fused preprocessor+encoder and it had incompatibility issues on iOS. Also, what about the benchmarks?
Companion PR: FluidInference/mobius#25
Why is this change needed?
Better support for https://huggingface.co/nvidia/parakeet-tdt_ctc-110m
AI Disclosure
I have never worked with Swift before; Claude Opus did most of the work.