5 changes: 5 additions & 0 deletions cookbook/_routes.json
@@ -364,6 +364,11 @@
"docsPath": "integrations/frameworks/llamaindex",
"isGuide": true
},
{
"notebook": "integration_alibaba_cloud_model_studio.ipynb",
"docsPath": "integrations/model-providers/alibaba-cloud-model-studio",
"isGuide": true
},
{
"notebook": "integration_amazon_bedrock.ipynb",
"docsPath": "integrations/model-providers/amazon-bedrock",
368 changes: 368 additions & 0 deletions cookbook/integration_alibaba_cloud_model_studio.ipynb
@@ -0,0 +1,368 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- NOTEBOOK_METADATA source: \"\\u26a0\\ufe0f Jupyter Notebook\" title: \"Integrate Alibaba Cloud Model Studio (DashScope) with Langfuse\" sidebarTitle: \"Alibaba Cloud Model Studio\" description: \"Guide on integrating Alibaba Cloud Model Studio (DashScope) Qwen models with Langfuse for observability, tracing, and cost tracking.\" category: \"Integrations\" -->"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Observability for Alibaba Cloud Model Studio (DashScope) with Langfuse\n",
"\n",
"This cookbook demonstrates how to integrate [Alibaba Cloud Model Studio](https://www.alibabacloud.com/en/product/model-studio) (DashScope) with [Langfuse](https://langfuse.com) for full observability of your Qwen model calls.\n",
"\n",
"DashScope provides an **OpenAI-compatible API endpoint**, which means you can use Langfuse's drop-in replacement for the OpenAI SDK to automatically trace all your calls with zero additional code.\n",
"\n",
"We will cover:\n",
"- Basic chat completions with automatic tracing\n",
"- Streaming responses\n",
"- Embedding model tracing\n",
"- `@observe()` decorator for grouping multiple generations\n",
"- Multi-model comparison (qwen-plus / qwen-turbo / qwen-max) in a single trace\n",
"- Viewing traces in the Langfuse dashboard"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> **What is Alibaba Cloud Model Studio (DashScope)?** [Model Studio](https://www.alibabacloud.com/en/product/model-studio) is Alibaba Cloud's AI model service platform, providing access to the Qwen family of large language models (qwen-plus, qwen-turbo, qwen-max) and embedding models via an OpenAI-compatible API."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> **What is Langfuse?** [Langfuse](https://langfuse.com) is an open source LLM engineering platform that helps teams trace API calls, monitor performance, and debug issues in their AI applications."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- STEPS_START -->\n",
"## Step 1: Install Dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%pip install openai langfuse --quiet"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Step 2: Set Up Environment Variables\n\nYou need:\n- A **DashScope API key** from [Alibaba Cloud Model Studio console](https://bailian.console.alibabacloud.com/) (international) or [DashScope console](https://dashscope.console.aliyun.com/) (China)\n- **Langfuse credentials** from [Langfuse Cloud](https://cloud.langfuse.com) or your self-hosted instance\n\nAlibaba Cloud Model Studio is available in multiple regions. Use the corresponding API endpoint:\n\n| Region | Endpoint |\n|---|---|\n| International (Singapore) | `https://dashscope-intl.aliyuncs.com/compatible-mode/v1` |\n| US (Virginia) | `https://dashscope-us.aliyuncs.com/compatible-mode/v1` |\n| China (Beijing) | `https://dashscope.aliyuncs.com/compatible-mode/v1` |\n\n> **Note:** API keys are region-specific and not interchangeable. Ensure your key matches the endpoint region."
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "import os\n\n# Get keys for your project from the project settings page\n# https://cloud.langfuse.com\nos.environ[\"LANGFUSE_PUBLIC_KEY\"] = \"pk-lf-...\"\nos.environ[\"LANGFUSE_SECRET_KEY\"] = \"sk-lf-...\"\nos.environ[\"LANGFUSE_BASE_URL\"] = \"https://cloud.langfuse.com\" # 🇪🇺 EU region\n# os.environ[\"LANGFUSE_BASE_URL\"] = \"https://us.cloud.langfuse.com\" # 🇺🇸 US region\n\n# Get your API key from Alibaba Cloud Model Studio console\n# https://bailian.console.alibabacloud.com/\nos.environ[\"DASHSCOPE_API_KEY\"] = \"sk-...\""
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Verify the Langfuse connection:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langfuse import get_client\n",
"\n",
"langfuse = get_client()\n",
"\n",
"if langfuse.auth_check():\n",
" print(\"Langfuse client is authenticated and ready!\")\n",
"else:\n",
" print(\"Authentication failed. Please check your credentials and host.\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: OpenAI Drop-in Replacement for DashScope\n",
"\n",
"Since DashScope exposes an OpenAI-compatible endpoint, we use the Langfuse drop-in replacement for the OpenAI SDK. Simply import `openai` from `langfuse.openai` and point the `base_url` to DashScope's endpoint.\n",
"\n",
"All API calls are then automatically traced by Langfuse."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": "# Instead of: from openai import OpenAI\nfrom langfuse.openai import openai\n\nclient = openai.OpenAI(\n api_key=os.environ.get(\"DASHSCOPE_API_KEY\"),\n base_url=\"https://dashscope-intl.aliyuncs.com/compatible-mode/v1\",\n)"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Basic Chat Completion\n",
"\n",
"A simple chat completion request using `qwen-plus`. The trace is automatically captured in Langfuse."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"response = client.chat.completions.create(\n",
" model=\"qwen-plus\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
" {\"role\": \"user\", \"content\": \"What is Langfuse and how does it help with LLM observability?\"},\n",
" ],\n",
" name=\"qwen-plus-basic\",\n",
")\n",
"\n",
"print(response.choices[0].message.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4: Streaming Chat Completion\n",
"\n",
"Streaming works out of the box. Langfuse captures the full streamed response and token usage."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"stream = client.chat.completions.create(\n",
" model=\"qwen-plus\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": \"You are a creative storyteller.\"},\n",
" {\"role\": \"user\", \"content\": \"Tell me a short story about a cloud that wanted to become a river.\"},\n",
" ],\n",
" stream=True,\n",
" name=\"qwen-plus-streaming\",\n",
")\n",
"\n",
"for chunk in stream:\n",
"    # Guard against chunks without choices (e.g., usage-only chunks)\n",
"    if chunk.choices and chunk.choices[0].delta.content:\n",
"        print(chunk.choices[0].delta.content, end=\"\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": "## Step 5: Embedding Model Tracing\n\nDashScope also provides embedding models. Langfuse traces these calls just like chat completions.\n\n> **Note:** Embedding models (e.g., `text-embedding-v3`) are currently available in the **China (Beijing)** and **International (Singapore)** regions only. They are not available in the US (Virginia) region."
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"embedding_response = client.embeddings.create(\n",
" model=\"text-embedding-v3\",\n",
" input=[\"Langfuse provides open source LLM observability.\"],\n",
" name=\"dashscope-embedding\",\n",
")\n",
"\n",
"print(f\"Embedding dimension: {len(embedding_response.data[0].embedding)}\")\n",
"print(f\"First 5 values: {embedding_response.data[0].embedding[:5]}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 6: Group Multiple Generations with `@observe()`\n",
"\n",
"Use the `@observe()` decorator to group multiple LLM calls into a single trace. This is useful when your application logic involves chaining several model calls."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langfuse import observe\n",
"\n",
"@observe()\n",
"def translate_and_summarize(text: str) -> str:\n",
" # Generation 1: Translate to English\n",
" translation = client.chat.completions.create(\n",
" model=\"qwen-plus\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": \"You are a translator. Translate the following text to English.\"},\n",
" {\"role\": \"user\", \"content\": text},\n",
" ],\n",
" name=\"translate\",\n",
" ).choices[0].message.content\n",
"\n",
" # Generation 2: Summarize the translation\n",
" summary = client.chat.completions.create(\n",
" model=\"qwen-plus\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": \"You are a summarizer. Provide a one-sentence summary.\"},\n",
" {\"role\": \"user\", \"content\": translation},\n",
" ],\n",
" name=\"summarize\",\n",
" ).choices[0].message.content\n",
"\n",
" return summary\n",
"\n",
"result = translate_and_summarize(\n",
" \"\\u4e91\\u8ba1\\u7b97\\u662f\\u4e00\\u79cd\\u901a\\u8fc7\\u4e92\\u8054\\u7f51\\u63d0\\u4f9b\\u8ba1\\u7b97\\u670d\\u52a1\\u7684\\u6280\\u672f\\uff0c\\u5305\\u62ec\\u670d\\u52a1\\u5668\\u3001\\u5b58\\u50a8\\u3001\\u6570\\u636e\\u5e93\\u548c\\u7f51\\u7edc\\u7b49\\u8d44\\u6e90\\u3002\"\n",
")\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 7: Multi-Model Comparison\n",
"\n",
"Compare responses from different Qwen models (qwen-turbo, qwen-plus, qwen-max) in a single trace. This is useful for evaluating model quality, latency, and cost."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from langfuse import observe\n",
"\n",
"@observe()\n",
"def compare_qwen_models(prompt: str) -> dict:\n",
" models = [\"qwen-turbo\", \"qwen-plus\", \"qwen-max\"]\n",
" results = {}\n",
"\n",
" for model in models:\n",
" response = client.chat.completions.create(\n",
" model=model,\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": \"You are a helpful assistant. Be concise.\"},\n",
" {\"role\": \"user\", \"content\": prompt},\n",
" ],\n",
" name=f\"{model}-comparison\",\n",
" temperature=0.7,\n",
" max_tokens=200,\n",
" )\n",
" results[model] = response.choices[0].message.content\n",
"\n",
" return results\n",
"\n",
"comparison = compare_qwen_models(\"Explain what makes a good API design in three bullet points.\")\n",
"\n",
"for model, answer in comparison.items():\n",
" print(f\"\\n--- {model} ---\")\n",
" print(answer)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 8: Enhance Tracing (Optional)\n",
"\n",
"You can enrich your DashScope traces with additional context:\n",
"\n",
"- Add [metadata](https://langfuse.com/docs/tracing-features/metadata), [tags](https://langfuse.com/docs/tracing-features/tags), [log levels](https://langfuse.com/docs/tracing-features/log-levels), and [user IDs](https://langfuse.com/docs/tracing-features/users) to traces\n",
"- Group traces by [sessions](https://langfuse.com/docs/tracing-features/sessions)\n",
"- Use [Langfuse Prompt Management](https://langfuse.com/docs/prompts/get-started) and link prompts to traces\n",
"- Add [scores](https://langfuse.com/docs/scores/custom) to traces\n",
"\n",
"Visit the [OpenAI SDK cookbook](https://langfuse.com/guides/cookbook/integration_openai_sdk) to see more examples on passing additional parameters."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"response = client.chat.completions.create(\n",
" model=\"qwen-plus\",\n",
" messages=[\n",
" {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n",
" {\"role\": \"user\", \"content\": \"What are the benefits of observability in LLM applications?\"},\n",
" ],\n",
" name=\"qwen-plus-enhanced\",\n",
" metadata={\n",
" \"langfuse_tags\": [\"dashscope\", \"qwen-plus\", \"demo\"],\n",
" \"langfuse_user_id\": \"user-123\",\n",
" \"langfuse_session_id\": \"session-abc\",\n",
" \"langfuse_metadata\": {\"use_case\": \"observability-demo\", \"region\": \"cn\"},\n",
" },\n",
")\n",
"\n",
"print(response.choices[0].message.content)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 9: View Traces in Langfuse\n",
"\n",
"After running the examples above, navigate to [Langfuse Cloud](https://cloud.langfuse.com) (or your self-hosted instance) to see detailed traces including:\n",
"\n",
"- Request and response content\n",
"- Token usage and latency metrics\n",
"- Cost tracking per model\n",
"- Nested traces from `@observe()` decorated functions\n",
"\n",
"For accurate cost tracking, define custom model prices for your DashScope models in the Langfuse dashboard ([see docs](https://langfuse.com/docs/model-usage-and-cost)).\n",
"<!-- STEPS_END -->"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- MARKDOWN_COMPONENT name: \"LearnMore\" path: \"@/components-mdx/integration-learn-more.mdx\" -->"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}