1 change: 1 addition & 0 deletions README.md
@@ -31,6 +31,7 @@ Each plugin lives in `plugins/<slug>`. The directory name is the install keyword
| `mac-notify` | macOS notifications when a Cline run completes. |
| `nanobanana` | Image generation through OpenRouter and Gemini image models. |
| `speak` | Speaks completed Cline replies with ElevenLabs text to speech. |
| `togetherai` | Together AI workflow skills for inference, training, media, evaluation, and infrastructure. |
| `typescript-lsp` | TypeScript language service `goto_definition` support. |
| `weather-metrics` | Demo weather tool plus runtime metrics hooks. |
| `web-search` | Exa-backed web search as a Cline tool. |
21 changes: 21 additions & 0 deletions plugins/togetherai/LICENSE.togetherai-skills
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Together AI

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
37 changes: 37 additions & 0 deletions plugins/togetherai/README.md
@@ -0,0 +1,37 @@
# Together AI

Together AI workflow skills for Cline.

## What It Does

This plugin bundles Together AI skills for chat completions, batch inference, embeddings, evaluations, fine-tuning, images, video, audio, sandboxes, dedicated endpoints, dedicated containers, and GPU clusters.

Each skill includes workflow guidance plus local reference files and example Python or TypeScript scripts. The plugin does not register an MCP server and does not run Together AI calls during install.

The bundled skills ask Cline to get approval before running scripts, installing SDKs, spending credits, uploading data, creating or deleting endpoints, launching clusters, or using remote execution.

## Install

```bash
cline plugin install togetherai
```

For local development from this repository:

```bash
cline plugin install ./plugins/togetherai --cwd .
```

## Requirements

- `TOGETHER_API_KEY` in the environment before running Together AI API examples.
- Python examples generally expect `together>=2.0.0`; TypeScript examples expect `together-ai`.
- Some workflows may also require external provider keys, Docker/container tooling, Kubernetes/Slurm access, Jig, Sprocket, or Together Cloud cluster permissions.
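
A minimal pre-flight sketch for the first requirement, assuming only that the key is read from `TOGETHER_API_KEY` as stated above. The helper name `require_together_api_key` is hypothetical and is not part of the plugin or the Together SDK.

```python
import os


def require_together_api_key(env=None):
    """Return the Together AI key from the environment, or raise if it is missing."""
    # `env` defaults to the process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get("TOGETHER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "TOGETHER_API_KEY is not set; export it before running Together AI examples."
        )
    return key
```

Calling this at the top of a script fails fast with a clear message and keeps the key out of source files.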

## Security Notes

Together AI workflows can spend credits, upload datasets or models, generate media, execute remote code, and provision billable infrastructure. Review scripts and target resources before running them, keep API keys out of source control, and clean up endpoints, clusters, sandboxes, storage, and generated artifacts when they are no longer needed.

## Attribution

The bundled Together AI skill material is MIT licensed. See `LICENSE.togetherai-skills`.
10 changes: 10 additions & 0 deletions plugins/togetherai/index.ts
@@ -0,0 +1,10 @@
import type { AgentPlugin } from "@cline/sdk"

const plugin: AgentPlugin = {
  name: "togetherai",
  manifest: {
    capabilities: ["skills"],
  },
}

export default plugin
19 changes: 19 additions & 0 deletions plugins/togetherai/package.json
@@ -0,0 +1,19 @@
{
  "name": "togetherai",
  "version": "0.0.0",
  "private": true,
  "type": "module",
  "description": "Cline plugin that bundles Together AI workflow skills.",
  "cline": {
    "plugins": [
      {
        "paths": [
          "./index.ts"
        ],
        "capabilities": [
          "skills"
        ]
      }
    ]
  }
}
84 changes: 84 additions & 0 deletions plugins/togetherai/skills/together-audio/SKILL.md
@@ -0,0 +1,84 @@
---
name: together-audio
description: "Text-to-speech and speech-to-text via Together AI, including REST, streaming, and realtime WebSocket TTS, plus transcription, translation, diarization, timestamps, and live STT. Reach for it whenever the user needs audio in or audio out on Together AI rather than chat generation, image or video creation, or model training."
---

# Together Audio

## Overview

Use Together AI audio APIs for:

- text-to-speech generation
- streaming or realtime voice output
- speech-to-text transcription
- translation, diarization, and timestamps
- live captioning and realtime transcription

## When This Skill Wins

- Generate spoken audio from text
- Transcribe uploaded audio files or URLs
- Add realtime voice or captioning to an app
- Extract speaker segments or word timings

## Hand Off To Another Skill

- Use `together-chat-completions` for text-only generation
- Use `together-video` or `together-images` for visual generation workflows
- Use `together-dedicated-endpoints` only when the audio model itself must be hosted on dedicated infrastructure

## Quick Routing

- REST TTS or streaming TTS
  - Read [references/tts-models.md](references/tts-models.md)
  - Start with [scripts/tts_generate.py](scripts/tts_generate.py) or [scripts/tts_generate.ts](scripts/tts_generate.ts)
- Realtime TTS over WebSocket
  - Read [references/tts-models.md](references/tts-models.md)
  - Start with [scripts/tts_websocket.py](scripts/tts_websocket.py)
- File transcription, translation, diarization, or timestamps
  - Read [references/stt-models.md](references/stt-models.md)
  - Start with [scripts/stt_transcribe.py](scripts/stt_transcribe.py) or [scripts/stt_transcribe.ts](scripts/stt_transcribe.ts)
- Realtime STT
  - Read [references/stt-models.md](references/stt-models.md)
  - Start with [scripts/stt_realtime.py](scripts/stt_realtime.py)

## Workflow

1. Confirm whether the task is TTS or STT.
2. Choose REST, streaming, or realtime transport based on latency and interaction needs.
3. Pick the model and response format from the relevant reference file.
4. Start from the matching script instead of rebuilding the request contract from memory.
5. For Python STT uploads, open audio files in binary mode and pass the file handle rather than a bare path string.
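
Step 5 can be sketched as follows. This is an illustration, not the bundled `scripts/stt_transcribe.py`: the wrapper name `transcribe_file` is hypothetical, `client` is assumed to be an already constructed Together v2 client, and the `model=` keyword is assumed to take an STT model id chosen from the STT reference file.

```python
from pathlib import Path


def transcribe_file(client, path, model):
    """Send a local audio file for transcription through an open binary handle."""
    audio_path = Path(path)
    # Open in binary mode ("rb") and pass the handle itself, not the path string.
    with audio_path.open("rb") as audio_file:
        return client.audio.transcriptions.create(file=audio_file, model=model)
```

Taking `client` as a parameter keeps the helper free of credentials handling and makes it easy to exercise with a stub before spending credits.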

## High-Signal Rules

- Python scripts require the Together v2 SDK (`together>=2.0.0`). If the user is on an older version, they must upgrade first: `uv pip install --upgrade "together>=2.0.0"`.
- Use `client.audio.speech.create()` for TTS.
- REST TTS returns a `BinaryAPIResponse`; call `response.write_to_file(path)` to save it. Do NOT use `stream_to_file` (it does not exist on this object).
- Streaming TTS (`stream=True`) returns a `Stream` of `AudioSpeechStreamChunk` objects. Iterate chunks, check `chunk.type`, and decode `base64.b64decode(chunk.delta)` for audio data. There is no file-writing helper on the stream object.
- Use `client.audio.transcriptions.create()` for transcription and `client.audio.translations.create()` for translation.
- Batch transcription and translation share hard limits: 500 MB direct upload, 1 GB URL-fetch, 4 hours of audio per request. For larger payloads, pass a public HTTPS URL on `file=`; for longer audio, split into ≤ 4 h chunks. See the Limits section of [references/stt-models.md](references/stt-models.md).
- Realtime APIs require audio-format discipline; confirm PCM expectations before streaming bytes.
- Diarization and word timestamps change response shape; code for the richer verbose output explicitly.
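
Two of the rules above, streaming chunk decoding and the batch limits, can be sketched without calling the API. All helper names here are hypothetical. The `audio_type` argument stands in for the chunk type that carries audio, which these rules do not name, so take it from the TTS reference. The byte limits assume decimal units (1 MB = 10^6 bytes), which is also an assumption.

```python
import base64
import math

MAX_UPLOAD_BYTES = 500 * 1000 * 1000      # 500 MB direct upload (decimal MB assumed)
MAX_URL_FETCH_BYTES = 1000 * 1000 * 1000  # 1 GB URL fetch (decimal GB assumed)
MAX_AUDIO_SECONDS = 4 * 60 * 60           # 4 hours of audio per request


def decode_stream_audio(chunks, audio_type):
    """Concatenate base64 audio deltas from chunks whose type equals audio_type."""
    audio = bytearray()
    for chunk in chunks:
        if chunk.type != audio_type:
            continue  # skip events that carry no audio payload
        audio.extend(base64.b64decode(chunk.delta))
    return bytes(audio)


def pick_source(size_bytes):
    """Return how a file of this size can be submitted: 'upload', 'url', or 'too_large'."""
    if size_bytes <= MAX_UPLOAD_BYTES:
        return "upload"
    if size_bytes <= MAX_URL_FETCH_BYTES:
        return "url"
    return "too_large"


def segment_bounds(duration_seconds):
    """Split a duration into equal (start, end) windows of at most four hours each."""
    count = max(1, math.ceil(duration_seconds / MAX_AUDIO_SECONDS))
    step = duration_seconds / count
    return [(i * step, (i + 1) * step) for i in range(count)]
```

The bytes returned by `decode_stream_audio` can be written to disk with an ordinary binary file write, since the stream object has no file-writing helper. `segment_bounds` only plans the cut points; the actual splitting needs an audio tool of your choice.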

## Resource Map

- TTS reference: [references/tts-models.md](references/tts-models.md)
- STT reference: [references/stt-models.md](references/stt-models.md)
- Python TTS workflow: [scripts/tts_generate.py](scripts/tts_generate.py)
- TypeScript TTS workflow: [scripts/tts_generate.ts](scripts/tts_generate.ts)
- Python realtime TTS workflow: [scripts/tts_websocket.py](scripts/tts_websocket.py)
- Python STT workflow: [scripts/stt_transcribe.py](scripts/stt_transcribe.py)
- TypeScript STT workflow: [scripts/stt_transcribe.ts](scripts/stt_transcribe.ts)
- Python realtime STT workflow: [scripts/stt_realtime.py](scripts/stt_realtime.py)

## Official Docs

- [Text-to-Speech](https://docs.together.ai/docs/text-to-speech)
- [Speech-to-Text](https://docs.together.ai/docs/speech-to-text)
- [TTS REST API](https://docs.together.ai/reference/audio-speech)
- [TTS WebSocket API](https://docs.together.ai/reference/audio-speech-websocket)
- [Audio Transcriptions API](https://docs.together.ai/reference/audio-transcriptions)
- [Audio Translations API](https://docs.together.ai/reference/audio-translations)
- [Realtime Audio Transcriptions API](https://docs.together.ai/reference/audio-transcriptions-realtime)