The foundation under every LLM app and agent. Before RAG, tools, or multi-agent orchestration, you need fluent, idiomatic, production Python. This guide covers the slice of Python that AI engineers use daily — and that interviewers expect you to write without thinking.
Why it's listed first in the roadmap: agent code is concurrent, schema-driven, API-heavy, and failure-prone. Weak Python here shows up immediately as flaky, slow, unobservable agents.
- Core Python You Actually Use
- Type Hints
- async / await & Concurrency
- Pydantic Models
- Environment & Dependency Management
- Working with APIs
- Error Handling & Retries
- Logging & Observability
- Project Hygiene
- Interview Questions
| Feature | Why it matters in AI code |
|---|---|
| Comprehensions / generators | Stream large datasets/embeddings without loading everything into memory |
| Context managers (`with`) | Manage clients, files, DB sessions, spans cleanly |
| Decorators | Wrap functions with retries, caching, tracing, timing |
| `dataclasses` / `Enum` | Lightweight typed records and fixed choice sets |
| `functools` (`lru_cache`, `partial`) | Cache deterministic calls; pre-bind config |
| `pathlib` | Safe, cross-platform path handling (and traversal checks) |
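
As a small illustration of the generator row in the table, here is a minimal sketch of lazy batching. The helper name `iter_batches` and the sample data are hypothetical, chosen only for this example.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def iter_batches(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield lists of up to `size` items without materializing the whole input."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    # islice pulls at most `size` items per pass; an empty list ends the loop.
    while batch := list(islice(it, size)):
        yield batch


# The source can itself be a generator (for example, lines read lazily from a file).
docs = (f"doc-{i}" for i in range(5))
batches = list(iter_batches(docs, size=2))
```

Each batch could then be handed to an embedding call, so memory use is bounded by the batch size rather than by the dataset.
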

```python
from functools import lru_cache

# `embed` is assumed to be defined elsewhere and to return a sequence of floats.
@lru_cache(maxsize=1024)
def embed_cached(text: str) -> tuple[float, ...]:
    """Cache deterministic embeddings so repeated text isn't re-embedded."""
    return tuple(embed(text))
```

Type hints make agent code maintainable. They also power schema generation (Pydantic, tool definitions) and editor/static checks.

```python
from typing import Literal, Optional, Any

def call_model(
    prompt: str,
    model: str = "claude-sonnet-4-6",
    temperature: float = 0.2,
    tools: Optional[list[dict[str, Any]]] = None,
    role: Literal["system", "user", "assistant"] = "user",
) -> str:
    ...
```

- Prefer built-in generics (`list[str]`, `dict[str, int]`) on modern Python.
- `Literal` for fixed string choices; `Optional[X]` = `X | None`.
- Run a static checker (`mypy`/`pyright`) in CI: it catches a class of bugs before they reach a flaky agent run.
Agents make many network calls, and running them sequentially is often the largest avoidable source of latency. `asyncio` lets independent calls run concurrently.

```python
import asyncio

# `run_tool_async` is assumed to be an async function defined elsewhere.
async def fetch_tool(call):
    return await run_tool_async(call)

async def run_parallel_tools(tool_calls):
    # Execute independent tool calls concurrently, gather all results
    return await asyncio.gather(*(fetch_tool(c) for c in tool_calls))
```

- When to use async: I/O-bound work (API/tool/DB calls). Not CPU-bound work (use processes for that).
- `asyncio.gather`: fan out parallel calls; `asyncio.as_completed`: handle results as they finish; `asyncio.Semaphore`: cap concurrency to respect rate limits.
- Don't block the event loop: never call a synchronous, blocking SDK inside an async handler; use the async client or `run_in_executor`.
- Most LLM SDKs ship both sync and async clients; pick the one that matches your app.
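
To make two of the bullets above concrete, the following sketch shows `asyncio.as_completed` and `run_in_executor` using only the standard library. `fake_tool` and `blocking_call` are hypothetical stand-ins that sleep where real code would do network I/O.

```python
import asyncio
import time


async def fake_tool(name: str, delay: float) -> str:
    """Stand-in for an async tool call; sleeps instead of doing network I/O."""
    await asyncio.sleep(delay)
    return name


def blocking_call(x: int) -> int:
    """Stand-in for a synchronous SDK call that would block the event loop."""
    time.sleep(0.05)
    return x * 2


async def main() -> tuple[list[str], int]:
    calls = [fake_tool("slow", 0.2), fake_tool("fast", 0.01)]
    finished_order = []
    # as_completed yields awaitables in completion order, not submission order.
    for next_done in asyncio.as_completed(calls):
        finished_order.append(await next_done)

    # Offload the blocking function to the default thread pool executor.
    loop = asyncio.get_running_loop()
    doubled = await loop.run_in_executor(None, blocking_call, 21)
    return finished_order, doubled


finished_order, doubled = asyncio.run(main())
```

Here the faster call is handled first even though it was submitted second, and the blocking function runs in a worker thread so other coroutines can keep making progress.
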

```python
sem = asyncio.Semaphore(10)  # respect provider rate limits

async def guarded_call(req):
    async with sem:
        return await client_async.call(req)
```

Pydantic is the backbone of structured outputs, tool schemas, and config. It validates and coerces data at the boundary so the rest of your code can trust its types.

```python
from pydantic import BaseModel, Field

class SearchArgs(BaseModel):
    query: str = Field(description="Search query")
    top_k: int = Field(default=5, ge=1, le=50)

class Settings(BaseModel):
    model: str = "claude-opus-4-8"
    temperature: float = 0.2
    max_tokens: int = 4096
```

- Tool schemas: `SearchArgs.model_json_schema()` produces the JSON schema you hand to an LLM's function-calling API.
- Structured outputs: parse model JSON into a validated object; on failure, re-ask with the validation error (see the `instructor` library).
- Config: `pydantic-settings` reads typed config from env vars / `.env`.
- v2 essentials: `model_validate()`, `model_dump()`, `Field(...)`, validators.
→ Deeper: Pydantic for AI Systems · Structured Outputs
| Tool | Use |
|---|---|
| `venv` | Built-in isolated environments |
| `uv` | Fast installer + resolver + lockfile (modern default) |
| `poetry` / `pip-tools` | Dependency management + locking |
| `.env` + `python-dotenv` / `pydantic-settings` | Load secrets/config from env |

```python
# config.py: never hardcode keys
from pydantic_settings import BaseSettings, SettingsConfigDict

class Config(BaseSettings):
    # pydantic-settings v2 style; replaces the older nested `class Config`
    model_config = SettingsConfigDict(env_file=".env")

    anthropic_api_key: str
    vector_db_url: str = "http://localhost:6333"

config = Config()  # reads from environment / .env
```

- Pin dependencies with a lockfile for reproducibility.
- Never commit `.env`; commit `.env.example` and load real secrets from env / a secrets manager. See GitHub Project Setup.
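
Where a settings library is not available, the same "load from env, fail fast" idea can be sketched with the standard library alone. `require_env` and the variable name `EXAMPLE_API_KEY` are hypothetical and exist only for this illustration.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail fast with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"missing required environment variable: {name}")
    return value


# Demo only: set a placeholder so the example runs; real values come from the environment.
os.environ.setdefault("EXAMPLE_API_KEY", "placeholder-value")
api_key = require_env("EXAMPLE_API_KEY")
```

Failing at startup with the variable's name in the message tends to be easier to debug than an authentication error on the first request.
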

```python
import httpx

async def post_json(url: str, payload: dict, timeout: float = 30.0) -> dict:
    async with httpx.AsyncClient(timeout=timeout) as client:
        resp = await client.post(url, json=payload)
        resp.raise_for_status()
        return resp.json()
```

- `httpx` (async-capable) or `requests` (sync) for raw HTTP; prefer the provider's official SDK when one exists.
- Always set timeouts: a hung request stalls the whole agent.
- Streaming: consume Server-Sent Events incrementally for token-by-token UX.
- Pagination: loop until the cursor/`has_next` is exhausted.
- Token counting: use the provider's counter (not `tiktoken` for Claude).
→ Production HTTP/API patterns: FastAPI for AI Backends
Networks fail, providers rate-limit, and models occasionally return junk. Handle it.

```python
import random
import time

# RateLimitError, ServerError, and BadRequestError stand in for your SDK's exception classes.
def with_retry(fn, *, retries=5, base=1.0, max_delay=30.0):
    last_exc = None
    for attempt in range(retries):
        try:
            return fn()
        except (RateLimitError, ServerError) as exc:  # 429 / 5xx: retryable
            last_exc = exc
        except BadRequestError:  # other 4xx: do NOT retry
            raise
        if attempt < retries - 1:  # no point sleeping after the final attempt
            delay = min(base * 2 ** attempt + random.uniform(0, 1), max_delay)
            time.sleep(delay)
    raise RuntimeError("exhausted retries") from last_exc
```

- Catch a chain, most-specific first: distinguish retryable (429, 5xx, network) from non-retryable (other 4xx).
- Exponential backoff + jitter: avoid thundering-herd retries. (Most SDKs do this automatically; configure `max_retries`.)
- Validate model output: wrap structured parsing in try/except and retry with the error message so the model self-corrects.
- Fail gracefully: return a useful error to the agent (e.g. tool `is_error`), don't crash the whole run.
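
The last two bullets (validate output, fail gracefully) can be combined in one sketch. Everything here is hypothetical: `parse_with_reask`, the result dict with an `is_error` key, and the scripted `replies` that stand in for a model. The real shape of an error result depends on your SDK.

```python
import json
from collections.abc import Callable


def parse_with_reask(ask: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Ask for JSON; on a parse failure, re-ask with the error appended to the prompt."""
    current = prompt
    for _ in range(max_attempts):
        raw = ask(current)
        try:
            data = json.loads(raw)
            if not isinstance(data, dict):
                raise ValueError("expected a JSON object")
            return {"is_error": False, "data": data}
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            current = (
                f"{prompt}\n\nYour previous reply was invalid: {exc}. "
                "Reply with a JSON object only."
            )
    # Fail gracefully: hand the caller a structured error instead of raising.
    return {"is_error": True, "error": f"no valid JSON after {max_attempts} attempts"}


# Scripted stand-in for a model: the first reply is junk, the second is valid.
replies = iter(["not json", '{"city": "Paris"}'])
result = parse_with_reask(lambda _prompt: next(replies), "Return the city as JSON.")
```
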
Agents are non-deterministic — logs are how you debug them.

```python
import logging, json, uuid

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("agent")

def log_event(event: str, **fields):
    """Structured log line with a correlation id for tracing a run."""
    log.info(json.dumps({"event": event, **fields}))

run_id = str(uuid.uuid4())
log_event("tool_call", run_id=run_id, tool="search", args={"query": "..."})
```

- Structured (JSON) logs: machine-parseable; filter by `run_id`, tool, latency.
- Correlation IDs: tag every log/span in one agent run with the same id.
- Log the right things: each model call (model, tokens, cost, latency), each tool call (name, args, result, error), and decisions.
- Levels: `DEBUG` for traces, `INFO` for events, `WARNING`/`ERROR` for failures.
- Graduate to tracing: OpenTelemetry / LangSmith / Langfuse for end-to-end spans.
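
One way to capture latency and errors consistently is a small context manager around each call. This is a sketch using only the standard library; `timed_event` and the field names are hypothetical, and it assumes logging has been configured elsewhere (for example with `logging.basicConfig`) so that `INFO` lines are emitted.

```python
import json
import logging
import time
import uuid
from contextlib import contextmanager

log = logging.getLogger("agent")


@contextmanager
def timed_event(event: str, run_id: str, **fields):
    """Log one JSON line with latency and outcome when the wrapped block exits."""
    start = time.perf_counter()
    record = {"event": event, "run_id": run_id, **fields}
    try:
        yield record  # callers may attach extra fields, e.g. token counts
        record["status"] = "ok"
    except Exception as exc:
        record["status"] = "error"
        record["error"] = repr(exc)
        raise
    finally:
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 2)
        log.info(json.dumps(record, default=str))


run_id = str(uuid.uuid4())
with timed_event("tool_call", run_id, tool="search") as rec:
    rec["result_count"] = 3  # placeholder for real tool work
```

Because the exception is re-raised after being recorded, the log line is written for both successful and failed calls without changing control flow.
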
→ Deeper: Observability & Tracing · LLMOps
- `src/` layout, tests, configs separated from code → see ML/AI Project Folder Structures.
- Lint/format with `ruff`; type-check with `mypy`/`pyright`; test with `pytest`, all in CI.
- Pre-commit hooks to catch issues before they hit CI.

- When do you reach for `async` in an AI app? → I/O-bound concurrency: parallel model/tool/DB calls. Use `asyncio.gather`, cap with a semaphore for rate limits; don't use it for CPU-bound work.
- Why Pydantic in an LLM pipeline? → Validate/coerce model output and tool args at the boundary, auto-generate JSON schemas, and load typed config.
- How do you handle a 429 vs a 400? → Retry 429 (and 5xx) with exponential backoff + jitter; never retry 400, since it's a client error: fix the request.
- How do you make a non-deterministic agent debuggable? → Structured logs with a per-run correlation id, log every model/tool call with tokens/latency, and end-to-end tracing.
- How do you keep secrets out of code? → `.env` (git-ignored) + a settings loader locally, secrets manager / CI secrets in production; never hardcode.
- Sync vs async SDK client: does it matter? → Yes. Calling a blocking sync client inside an async handler stalls the event loop; use the async client or offload to an executor.