
feat: add TDT CoreML export for parakeet-tdt-ctc-110m#25

Open
JarbasAl wants to merge 1 commit into FluidInference:main from TigreGotico:feat/tdt-ctc-110m-coreml-export


JarbasAl commented Mar 16, 2026

Add convert-tdt-coreml.py, which exports the TDT decoder components (fused mel+encoder, RNNT decoder LSTM, and joint decision with duration head) instead of the CTC head. The CTC export produces only blank-dominant log-probabilities, which are unsuitable for greedy transcription in hybrid models.
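To make the distinction concrete, here is a minimal, framework-free sketch of greedy TDT decoding: at each step the joint yields token logits and duration logits, a non-blank token is emitted and fed back to the decoder, and time advances by the predicted duration. The callables `decoder_step` and `joint`, the toy script, and `max_symbols_per_frame` are hypothetical stand-ins, not the interfaces of the exported CoreML models or of FluidAudio.

```python
from typing import Callable, List, Sequence, Tuple


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest value (first one on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def tdt_greedy_decode(
    encoder_frames: Sequence[object],
    decoder_step: Callable[[int, object], Tuple[object, object]],
    joint: Callable[[object, object], Tuple[List[float], List[float]]],
    blank_id: int,
    durations: Sequence[int],
    max_symbols_per_frame: int = 10,
) -> List[int]:
    tokens: List[int] = []
    dec_out, state = decoder_step(blank_id, None)  # prime the decoder with blank
    t = 0
    symbols_on_frame = 0
    while t < len(encoder_frames):
        token_logits, duration_logits = joint(encoder_frames[t], dec_out)
        token = argmax(token_logits)
        skip = durations[argmax(duration_logits)]  # duration head picks the frame skip
        if token != blank_id:
            tokens.append(token)
            dec_out, state = decoder_step(token, state)  # only non-blank updates the LSTM
        elif skip == 0:
            skip = 1  # a blank must move time forward, or the loop never ends
        if skip == 0:
            symbols_on_frame += 1
            if symbols_on_frame >= max_symbols_per_frame:
                skip = 1  # cap emissions on a single frame
        if skip > 0:
            symbols_on_frame = 0
        t += skip
    return tokens


# Toy components: each "frame" scripts (token, duration index) directly.
def one_hot(i: int, n: int) -> List[float]:
    return [1.0 if j == i else 0.0 for j in range(n)]


BLANK = 3
DURATIONS = [0, 1, 2]
script = [(1, 2), (0, 0), (2, 1), (BLANK, 0)]  # frame 1 is skipped over


def toy_joint(frame, dec_out):
    tok, dur = frame
    return one_hot(tok, BLANK + 1), one_hot(dur, len(DURATIONS))


def toy_decoder_step(token, state):
    return token, state


print(tdt_greedy_decode(script, toy_decoder_step, toy_joint, BLANK, DURATIONS))  # [1, 2]
```

The blank-with-zero-duration rule and the per-frame cap are safeguards that keep the loop from stalling; real decoders may handle these edge cases differently. A CTC head offers no duration output, so it cannot drive this loop.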

Components:

  • convert-tdt-coreml.py: Full TDT export pipeline (iOS 18 target)
  • individual_components.py: Shared torch.nn.Module wrappers for tracing
  • Updated README.md: Documents both TDT and CTC export paths
  • Updated pyproject.toml: Adds script entry point and includes

Companion PR: FluidInference/FluidAudio#383

AI Disclosure

Claude Opus did most of the work




devin-ai-integration bot left a comment


Devin Review found 2 potential issues.

View 4 additional findings in Devin Review.


```python
    ):
        logits = self.joint(encoder_outputs, decoder_outputs)
        token_logits = logits[..., : self.vocab_with_blank]
        duration_logits = logits[..., -self.num_extra :]
```


🔴 `-0:` slice returns all logits when `num_extra == 0`, producing incorrect duration outputs

When `num_extra` is 0 (a plain RNNT model without a TDT duration head), `logits[..., -self.num_extra :]` evaluates to `logits[..., -0:]`. Since `-0` is just `0` in Python, that is the same as `logits[..., 0:]`, which returns all logits instead of an empty tensor. As a result, `duration_logits` contains the full joint output (vocab + blank), and `torch.argmax(duration_logits, dim=-1)` produces meaningless duration values derived from token logits rather than duration bins.
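The pitfall is easy to reproduce without torch. A minimal sketch with made-up sizes, where plain lists stand in for the last dimension of the joint output (basic slicing on a torch tensor follows the same start/stop rules):

```python
# Lists stand in for the last dimension of the joint output; the sizes are illustrative.
vocab_with_blank = 4
num_extra = 0                   # plain RNNT joint: no duration bins
logits = [0.1, 0.2, 0.3, 0.4]   # vocab_with_blank + num_extra values

buggy = logits[-num_extra:]     # -0 == 0, so this is logits[0:]
fixed = logits[vocab_with_blank:]

print(buggy)  # [0.1, 0.2, 0.3, 0.4]
print(fixed)  # []

# With a real duration head (here 5 hypothetical bins) both forms select the same values.
tdt_logits = list(range(vocab_with_blank + 5))
print(tdt_logits[-5:] == tdt_logits[vocab_with_blank:])  # True
```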

The same issue exists in both JointDecisionWrapper (individual_components.py:146) and JointDecisionSingleStep (individual_components.py:179). The code in convert-tdt-coreml.py:226-230 warns about num_extra == 0 but continues the export, producing a model that silently emits incorrect duration predictions.

Prompt for agents
Fix the -0: slicing bug in both JointDecisionWrapper.forward() (individual_components.py:146) and JointDecisionSingleStep.forward() (individual_components.py:179). When self.num_extra is 0, logits[..., -0:] returns all logits instead of an empty slice. Either:

1. Guard the duration slice: use logits[..., self.vocab_with_blank :] instead of logits[..., -self.num_extra :], which correctly returns an empty tensor when vocab_with_blank equals the total logit dimension. Or:
2. Add a conditional: if self.num_extra > 0, compute duration_logits normally; otherwise return a zeros tensor of the appropriate shape for duration.
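A pure-Python sketch of both options, again with lists standing in for the last dimension of the joint tensor. The function names and `placeholder_bins` are hypothetical; in individual_components.py the same slices would be written with `...` indexing on a torch tensor. The "appropriate shape" for option 2 is not specified above, so a single zero bin is an assumption.

```python
from typing import List, Sequence, Tuple

Split = Tuple[List[float], List[float]]


def split_slice_from_vocab(logits: Sequence[float], vocab_with_blank: int) -> Split:
    """Option 1: slice durations from the vocab boundary (empty when there are none)."""
    return list(logits[:vocab_with_blank]), list(logits[vocab_with_blank:])


def split_with_fallback(
    logits: Sequence[float], vocab_with_blank: int, num_extra: int, placeholder_bins: int = 1
) -> Split:
    """Option 2: branch on num_extra and return zeros when there is no duration head."""
    token_logits = list(logits[:vocab_with_blank])
    if num_extra > 0:
        duration_logits = list(logits[-num_extra:])
    else:
        # Assumption: one zero-valued bin, so a downstream argmax yields index 0.
        duration_logits = [0.0] * placeholder_bins
    return token_logits, duration_logits


rnnt = [0.1, 0.2, 0.3, 0.4]            # vocab_with_blank = 4, num_extra = 0
tdt = [0.1, 0.2, 0.3, 0.4, 0.9, 0.5]   # vocab_with_blank = 4, num_extra = 2
print(split_slice_from_vocab(rnnt, 4))  # ([0.1, 0.2, 0.3, 0.4], [])
print(split_with_fallback(rnnt, 4, 0))  # ([0.1, 0.2, 0.3, 0.4], [0.0])
print(split_slice_from_vocab(tdt, 4) == split_with_fallback(tdt, 4, 2))  # True
```

Option 1 is the smaller change: when the joint emits `vocab_with_blank + num_extra` values, it selects exactly the same elements as `-num_extra:` for any `num_extra > 0`, and it degrades to an empty slice otherwise.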

Additionally, in convert-tdt-coreml.py:226-230, consider raising an error or skipping the JointDecision export entirely when num_extra == 0 rather than continuing with a broken duration head.
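Raising early could look like the following minimal sketch. The helper name and the way `num_extra` is derived here (joint output width minus `vocab_with_blank`) are assumptions for illustration; the script's actual check at convert-tdt-coreml.py:226-230 is not shown in this thread.

```python
def require_duration_head(joint_output_dim: int, vocab_with_blank: int) -> int:
    """Return the number of duration outputs, refusing to continue without any.

    Hypothetical helper: assumes the joint emits vocab_with_blank token logits
    followed by num_extra duration logits along its last dimension.
    """
    num_extra = joint_output_dim - vocab_with_blank
    if num_extra < 0:
        raise ValueError(
            f"joint output ({joint_output_dim}) is smaller than vocab+blank ({vocab_with_blank})"
        )
    if num_extra == 0:
        raise ValueError(
            "joint has no TDT duration outputs; refusing to export a JointDecision "
            "model with a duration head"
        )
    return num_extra


print(require_duration_head(9, 4))  # 5
try:
    require_duration_head(4, 4)
except ValueError as err:
    print(f"export aborted: {err}")
```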

```python
    model: ct.models.MLModel, path: Path, description: str
) -> None:
    try:
        model.minimum_deployment_target = ct.target.iOS17
```


🔴 _save_mlpackage overwrites iOS18 deployment target with iOS17, creating invalid metadata

The _save_mlpackage function at convert-tdt-coreml.py:58 unconditionally sets model.minimum_deployment_target = ct.target.iOS17, but the TDT export converts all models with deployment_target=ct.target.iOS18 (convert-tdt-coreml.py:188). The README explicitly states "iOS 18 deployment target: Required for int ops in the encoder's positional encoding."

If coremltools allows this downgrade (the try/except may not catch it, since it only sets a metadata property), the saved .mlpackage will claim iOS 17 compatibility while containing iOS 18-specific operations. iOS 17 devices would then attempt to load the model and fail at runtime with confusing errors, rather than getting a clear "requires iOS 18" rejection. This function was copied from convert-coreml.py:71, where iOS17 was the correct target.

Suggested change
```diff
-        model.minimum_deployment_target = ct.target.iOS17
+        model.minimum_deployment_target = ct.target.iOS18
```
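An alternative to the one-line suggestion is to pass the target into the helper so it cannot drift from the conversion call. A minimal sketch, assuming a single shared constant: the parameter name `deployment_target`, the constant `DEPLOYMENT_TARGET`, and the fake model are hypothetical, the original's try/except is omitted for brevity, and whether coremltools acts on the `minimum_deployment_target` attribute is not something this thread confirms.

```python
from pathlib import Path
from typing import Any


def save_mlpackage(model: Any, path: Path, description: str, deployment_target: Any) -> None:
    """Save an exported model, stamping the target it was actually converted for.

    Hypothetical variant of _save_mlpackage: the caller passes the same value it
    gave to the converter (e.g. ct.target.iOS18) instead of a hard-coded iOS17.
    """
    # Same assignment as the PR, but the value now comes from the caller.
    model.minimum_deployment_target = deployment_target
    model.save(str(path))
    print(f"Saved {description} to {path}")


class _FakeModel:
    """Stand-in for ct.models.MLModel so the sketch runs without coremltools."""

    def save(self, path: str) -> None:
        self.saved_to = path


DEPLOYMENT_TARGET = "iOS18"  # in the real script: ct.target.iOS18, shared with the convert call
m = _FakeModel()
save_mlpackage(m, Path("joint_decision.mlpackage"), "joint decision", DEPLOYMENT_TARGET)
```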
