Skip to content

Python models cannot write to Data Warehouse with Service Principal auth (Fabric limitation) #28

@tominatorcz

Description

@tominatorcz

@sdebruyn

Python models fail with Service Principal authentication (two issues)

Environment

  • dbt-core: 1.11.2
  • dbt-fabric: 1.10.1
  • Authentication: ActiveDirectoryServicePrincipal
  • Profile config: using workspace_id + lakehouse_id (no host)
  • OS: Windows 11

Summary

When running Python models with Service Principal (SP) authentication, two separate issues prevent successful execution. SQL models work perfectly with the same SP configuration.


Issue 1: GET /sessions returns HTTP 500 with SP token

Description

LivySession._get_existing_session() calls GET /livyapi/versions/2023-12-01/sessions to look for a reusable idle session. This endpoint returns 500 Internal Server Error when authenticated with a Service Principal token, even though the SP has all required Fabric API permissions and workspace roles.

The subsequent POST /sessions call works fine (returns 202) with the exact same token.

Steps to reproduce

from azure.identity import ClientSecretCredential
import requests

credential = ClientSecretCredential(tenant_id="...", client_id="...", client_secret="...")
token = credential.get_token("https://analysis.windows.net/powerbi/api/.default")
headers = {"Authorization": f"Bearer {token.token}", "Accept": "application/json"}
base = "https://api.fabric.microsoft.com/v1/workspaces//lakehouses//livyapi/versions/2023-12-01"

# This fails with 500
r1 = requests.get(base + "/sessions", headers=headers)
print("GET:", r1.status_code)  # 500

# This works fine
r2 = requests.post(base + "/sessions", headers=headers, json={"name": "dbt-fabric"})
print("POST:", r2.status_code)  # 202

Workaround

Wrap the GET call in _get_existing_session() in a try/except so the adapter falls through to creating a new session:

def _get_existing_session(self) -> str | None:
    try:
        response = requests.get(self._get_base_url() + "/sessions", headers=self._get_headers())
        response.raise_for_status()
        for session in response.json().get("items", []):
            if session["name"] == "dbt-fabric" and session["livyState"] in (
                "idle",
                "starting",
                "running",
            ):
                return session["id"]
    except Exception:
        pass
    return None

The only downside is that existing idle sessions won't be reused — a new session is created every time. This is acceptable for most dbt workflows.

Root cause

This appears to be a Fabric API bug — GET /sessions doesn't work with SP tokens while POST /sessions does. Worth reporting to Microsoft, but the adapter should handle this gracefully regardless.


Issue 2: synapsesql Spark connector does not support Service Principal authentication

Description

After applying the workaround above, the Livy session is created successfully and the Spark job starts. However, the job itself fails with InternalServerError because the generated Spark code uses spark.read.synapsesql() and df.write.synapsesql() to read from and write to the Fabric Data Warehouse.

Microsoft explicitly documents that the synapsesql connector only supports interactive user authentication — Service Principal is not supported:

"The Spark connector for Fabric Data Warehouse only supports interactive Microsoft Entra user authentication. Service principal authentication isn't supported."

Microsoft Learn: Spark connector for Microsoft Fabric Data Warehouse

Generated Spark code (from target/run/)

import com.microsoft.spark.fabric
from com.microsoft.spark.fabric.Constants import Constants
spark.conf.set("spark.datawarehouse.DW_dbt_club_silver_dev.sqlendpoint", "...,1433")
dbt = dbtObj(spark.read.synapsesql)
df = model(dbt, spark)
df.write.mode("overwrite").synapsesql("DW_dbt_club_silver_dev.mikt.int_accounts_fixed__dbt_tmp")

The Spark job starts but fails because synapsesql cannot authenticate with the SP identity of the Livy session.

Error

The Spark monitoring page shows:

Status: Failed
Error Code: InternalServerError
Error Message: An internal server error occurred while processing the request.

And dbt reports:

('42000', "[42000] No item by the name of 'mikt.int_accounts_fixed__dbt_tmp'
could be found in the current database 'DW_dbt_club_silver_dev'")

This is because the tmp table was never created — the Spark write failed silently.

Possible solution

Microsoft provides an alternative Spark connector (mssql) that maybe does support Service Principal authentication via an explicit access token.

Reference: Microsoft Learn: Spark connector for SQL databases

The adapter could detect SP authentication and generate Spark code using the mssql connector with an explicit access token instead of synapsesql. This would require passing the SP credentials into the Spark session (e.g., via spark.conf.set or environment variables) so the Spark code can acquire its own token.


What next?

By any chance, have you come across these issues? Do you know any workaround?

Thank you for the great adapter, Sam! Happy to help test any fixes.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingfabricsparkRelated to the FabricSpark (Spark SQL / Lakehouse) adapterupstream-limitationBlocked or limited by upstream dbt / package / platform constraints

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions