Python vs TypeScript

benzsevern edited this page Apr 8, 2026 · 1 revision

The infermap project ships two packages with identical mapping behavior:

  • infermap on PyPI — the original Python implementation.
  • infermap on npm — the TypeScript port.

Mapping decisions (which source maps to which target) and confidence scores agree to 4 decimal places, enforced by a shared golden-test parity suite that runs on every CI build. If Python scorer logic changes, the golden generator must be re-run and the TS parity tests re-verified before anything merges.

Which should I use?

| If you are… | Use |
| --- | --- |
| Building a Python data pipeline or notebook | Python: direct Polars/Pandas integration, full DB support |
| Building a Next.js app, Node service, or browser tool | TypeScript: zero runtime deps, edge-runtime compatible |
| Running mapping in a JAMstack / serverless edge function | TypeScript: runs on V8 isolates without Node built-ins |
| Doing ad-hoc CSV exploration from the command line | Python CLI has more features (YAML schema files, Excel); TS CLI is leaner |
| Integrating with SQLAlchemy / DuckDB / psycopg2 | Python: richer DB layer |
| Integrating with Prisma / Drizzle / Kysely (TS ORMs) | TypeScript: infermap/node for DB extraction, then the TS-native ORM for the application |

If you need both (e.g. a Python backend plus a Next.js admin UI), use both: outputs are interoperable via the JSON config format.

Feature parity

| Feature | Python | TypeScript | Notes |
| --- | --- | --- | --- |
| 6 built-in scorers | ✅ | ✅ | Same algorithms, same weights, same defaults |
| Weighted average with min 2 contributors | ✅ | ✅ | Identical gate |
| Hungarian assignment | ✅ (scipy) | ✅ (vendored) | ~150 LOC Kuhn-Munkres, parity-verified |
| In-memory provider | ✅ (polars/pandas/list[dict]) | ✅ (Array) | TS variant is plain JS records |
| CSV provider | ✅ (polars) | ✅ (vendored RFC 4180) | TS parser is ~80 LOC, no deps |
| JSON records provider | ❌ | ✅ | TS-only convenience for JSON data |
| Schema definition file | ✅ (YAML + JSON) | ✅ (JSON only) | TS is JSON-only by design |
| SQLite provider | ✅ | ✅ | TS uses better-sqlite3 (optional) |
| PostgreSQL provider | ✅ | ✅ | TS uses pg (optional) |
| DuckDB provider | ✅ | ✅ | TS uses @duckdb/node-api (optional) |
| MySQL provider | 🔶 stubbed | 🔶 stubbed | Neither implemented |
| Parquet / Excel | ✅ | ❌ | Python only; TS has no polars equivalent |
| Engine runtime config | ✅ (YAML) | ✅ (JSON) | Same shape, JSON only in TS |
| Saved MapResult config | ✅ (YAML + load) | ✅ (JSON + load) | Interoperable via JSON |
| Custom scorers (decorator) | @infermap.scorer | defineScorer() | Equivalent function-style APIs |
| LLM scorer | 🔶 stub | 🔶 stub + adapter type | TS exposes LLMAdapter for future async path |
| CLI | ✅ Typer-based | ✅ parseArgs-based | Same subcommands: map/apply/inspect/validate |
| Apply mapping to DataFrame | ✅ (polars/pandas rename) | ✅ (CSV rewrite only) | TS CLI supports apply for CSVs |
| Edge runtime compatible | ❌ | ✅ | TS default entrypoint has zero Node built-ins |
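
The "weighted average with min 2 contributors" row can be pictured with a small sketch. This is not infermap's actual code: the helper name `combine_scores`, the example weights, and the choice to return 0.0 when fewer than two scorers contribute are all assumptions made for illustration.

```python
from typing import Optional


# Hypothetical helper, not part of infermap. A scorer that abstains
# reports None; every other scorer reports a score in [0, 1].
def combine_scores(
    results: dict[str, Optional[float]],
    weights: dict[str, float],
    min_contributors: int = 2,
) -> float:
    # Keep only the scorers that actually returned a score.
    contributing = {name: s for name, s in results.items() if s is not None}
    # Gate: too few contributors means no confidence (assumed behavior).
    if len(contributing) < min_contributors:
        return 0.0
    total_weight = sum(weights[name] for name in contributing)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(weights[name] * s for name, s in contributing.items())
    return weighted_sum / total_weight


# Illustrative weights and scores only.
weights = {"exact": 1.0, "alias": 0.5, "pattern": 0.5}
two_votes = combine_scores({"exact": 1.0, "alias": 0.5, "pattern": None}, weights)
one_vote = combine_scores({"exact": 1.0, "alias": None, "pattern": None}, weights)
```

Here `two_votes` averages only the two scorers that contributed, while `one_vote` is gated to 0.0 even though its single scorer reported a perfect match.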

API mapping

Side-by-side of common operations.

Basic map

# Python
import infermap
result = infermap.map(source, target)

// TypeScript
import { map } from "infermap";
const result = map(source, target);

Engine with config

# Python
from infermap import MapEngine
engine = MapEngine(min_confidence=0.4, config_path="infermap.yaml")
result = engine.map(source, target, required=["email"])

// TypeScript
import { MapEngine, loadEngineConfig, applyScorerOverrides, defaultScorers, AliasScorer } from "infermap";
import { readFile } from "node:fs/promises";

const cfg = loadEngineConfig(await readFile("infermap.json", "utf8"));
const aliasScorer = new AliasScorer(cfg.aliases ?? {});
const scorers = applyScorerOverrides(
  defaultScorers().map((s) => s.name === "AliasScorer" ? aliasScorer : s),
  cfg.scorers
);
const engine = new MapEngine({ minConfidence: 0.4, scorers });
const result = engine.mapSchemas(srcSchema, tgtSchema, { required: ["email"] });

Or, more concisely, via the map() wrapper:

import { map } from "infermap";
const result = map(source, target, {
  engineOptions: { minConfidence: 0.4 },
  config: { scorers: { /* ... */ }, aliases: { /* ... */ } },
  required: ["email"],
});

Custom scorer

# Python
from infermap import scorer, ScorerResult, FieldInfo

@scorer("domain", weight=0.6)
def domain(src: FieldInfo, tgt: FieldInfo) -> ScorerResult | None:
    if src.name == tgt.name.lower():
        return ScorerResult(score=1.0, reasoning="domain match")
    return None

// TypeScript
import { defineScorer, makeScorerResult } from "infermap";

const domain = defineScorer(
  "domain",
  (src, tgt) => {
    if (src.name === tgt.name.toLowerCase()) {
      return makeScorerResult(1.0, "domain match");
    }
    return null;
  },
  0.6
);

Database extraction

# Python
from infermap.providers import extract_schema
schema = extract_schema("sqlite:///crm.db", table="customers")

// TypeScript
import { extractDbSchema } from "infermap/node";
const schema = await extractDbSchema("sqlite:///crm.db", { table: "customers" });

Save + reload a mapping

# Python — YAML format
result.to_config("mapping.yaml")
restored = infermap.from_config("mapping.yaml")

// TypeScript — JSON format
import { mapResultToConfigJson, fromConfig } from "infermap";
import { readFile, writeFile } from "node:fs/promises";

await writeFile("mapping.json", mapResultToConfigJson(result));
const restored = fromConfig(await readFile("mapping.json", "utf8"));

Interop note: The Python from_config also accepts a JSON file if you rename the extension or adjust the loader. Mapping JSON produced by the TS side works in Python downstream.

Naming conventions

Python uses snake_case, TypeScript uses camelCase. The data shapes are otherwise identical.

| Python field | TypeScript field |
| --- | --- |
| sample_values | sampleValues |
| null_rate | nullRate |
| unique_rate | uniqueRate |
| value_count | valueCount |
| source_name | sourceName |
| required_fields | requiredFields |
| unmapped_source | unmappedSource |
| unmapped_target | unmappedTarget |

The saved-config JSON format uses snake_case on both sides for interop.
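
When comparing in-memory objects dumped from each side (rather than saved configs, which already share snake_case), it can help to normalise key names first. The sketch below is one way to do that; `snake_to_camel`, `camel_to_snake`, and `convert_keys` are hypothetical helpers written for this page, not infermap functions.

```python
import re


def snake_to_camel(name: str) -> str:
    # "sample_values" -> "sampleValues"
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camel_to_snake(name: str) -> str:
    # "sampleValues" -> "sample_values"; the lookbehind skips a leading capital.
    return re.sub(r"(?<!^)([A-Z])", r"_\1", name).lower()


def convert_keys(value, convert):
    # Recursively rename dict keys; lists are walked, other values are kept as-is.
    if isinstance(value, dict):
        return {convert(k): convert_keys(v, convert) for k, v in value.items()}
    if isinstance(value, list):
        return [convert_keys(v, convert) for v in value]
    return value


# A TS-style field record (made-up values) renamed to the Python spelling.
ts_field = {"sampleValues": ["a@example.com"], "nullRate": 0.0, "valueCount": 1}
py_field = convert_keys(ts_field, camel_to_snake)
```

The helpers assume plain lowercase words with no acronyms or digits, which holds for the field names in the table above.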

Migrating a Python project to TypeScript

  1. Identify the integration points — where does infermap.map(...) get called? For most projects this is in 1–3 places.
  2. Replace inputs — if you were passing file paths, switch to extractSchemaFromFile(path) from infermap/node. If you were passing Polars/Pandas DataFrames, convert to Array<Record<string, unknown>> and use the records: input shape.
  3. Convert custom scorers — replace @infermap.scorer decorators with defineScorer() calls. The function signature is identical (two FieldInfo arguments, returning ScorerResult | null).
  4. Port infermap.yaml → JSON — the shapes match; just change the format. You can keep both files if Python and TS are both running against the same rules.
  5. Verify parity — run the same inputs through both implementations and diff the output. Confidence scores should agree to 4 decimal places and the mappings should be identical. Any drift is a bug; please report it.
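
The parity check in step 5 could be scripted along these lines. This is a minimal sketch, assuming each implementation's output has been exported as a list of records with `source`, `target`, and `confidence` keys; that record shape and the `diff_mappings` helper are assumptions for illustration, not infermap's documented output format.

```python
def diff_mappings(py_rows, ts_rows, places=4):
    """Return human-readable differences between two mapping outputs."""
    # Index both sides by source field name (assumed unique per mapping).
    py = {row["source"]: row for row in py_rows}
    ts = {row["source"]: row for row in ts_rows}
    problems = []
    for source in sorted(py.keys() | ts.keys()):
        a, b = py.get(source), ts.get(source)
        if a is None or b is None:
            problems.append(f"{source}: present on only one side")
        elif a["target"] != b["target"]:
            problems.append(f"{source}: target {a['target']!r} != {b['target']!r}")
        elif round(a["confidence"], places) != round(b["confidence"], places):
            problems.append(
                f"{source}: confidence {a['confidence']} != {b['confidence']}"
            )
    return problems


# Made-up sample outputs: the first pair differs only past the 4th decimal.
py_rows = [
    {"source": "email_addr", "target": "email", "confidence": 0.91234},
    {"source": "fname", "target": "first_name", "confidence": 0.8},
]
ts_rows = [
    {"source": "email_addr", "target": "email", "confidence": 0.91233999},
    {"source": "fname", "target": "first_name", "confidence": 0.8},
]
drifted = [
    {"source": "email_addr", "target": "email", "confidence": 0.9130},
    {"source": "fname", "target": "first_name", "confidence": 0.8},
]

clean_report = diff_mappings(py_rows, ts_rows)
drift_report = diff_mappings(py_rows, drifted)
```

An empty report means the two sides agree at the chosen precision; any entry is worth attaching to a bug report along with the inputs.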

When parity might drift

The golden parity suite catches drift between the shipped Python and TS packages, but a few subtle edge cases could in principle diverge:

  • Floating-point in the last bit — both runtimes are IEEE 754, but accumulation order could differ by one ULP. Tests round to 4 decimal places.
  • Dtype inference on borderline cases — TS uses a vendored heuristic, Python uses polars. Mixed-content columns (e.g. "0" and "0.5") could classify differently. The parity suite covers the common cases; report drift if you see it.
  • Regex engine quirks — the vendored semantic-type regexes are pure ASCII and work identically in Python re and JS RegExp. If a future regex adds lookbehind or Unicode properties, parity may require careful translation.
  • LLMScorer is async-capable in TS but not in Python — both stubs always abstain in the sync path. If you wire up an actual LLM adapter, the sync paths will still match (both abstain); the async path is TS-only for now.
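
The floating-point bullet can be seen with plain Python floats, independent of infermap. The sketch below sums the same three values in two different orders; the variable names are illustrative only.

```python
# Same three addends, different accumulation order.
forward = 0.1 + 0.2 + 0.3
backward = 0.3 + 0.2 + 0.1

# The raw doubles are not equal...
same_bits = forward == backward  # False
# ...but they agree once rounded to 4 decimal places.
same_rounded = round(forward, 4) == round(backward, 4)  # True
```

This is why comparing rounded scores is a more stable parity criterion than comparing raw doubles.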

If you hit a parity bug, please file an issue with the two inputs and both outputs.
