Python vs TypeScript
The infermap project ships two packages, one for Python and one for TypeScript, with identical mapping behavior.
Mapping decisions (which source maps to which target) and confidence scores agree to 4 decimal places, enforced by a shared golden-test parity suite that runs on every CI build. If Python scorer logic changes, the golden generator must be re-run and the TS parity tests re-verified before anything merges.
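The parity rule can be pictured as a small comparison. The following is a minimal sketch, not the project's actual golden-test suite: the `mappings_agree` helper and the dictionary shape (source field name mapped to a `(target, confidence)` pair) are assumptions made purely for illustration.

```python
def mappings_agree(py_result, ts_result, places=4):
    """Return True when both results pick the same targets and their
    confidence scores match after rounding to `places` decimals.

    Each argument is a hypothetical dict: source field -> (target, confidence).
    """
    if py_result.keys() != ts_result.keys():
        return False
    for source, (py_target, py_conf) in py_result.items():
        ts_target, ts_conf = ts_result[source]
        if py_target != ts_target:
            return False
        if round(py_conf, places) != round(ts_conf, places):
            return False
    return True


# Illustrative values only, not real infermap output.
python_out = {"email_addr": ("email", 0.91234567)}
typescript_out = {"email_addr": ("email", 0.91234561)}
drifted_out = {"email_addr": ("email", 0.9130)}

agree = mappings_agree(python_out, typescript_out)
drift = mappings_agree(python_out, drifted_out)
```

Here `agree` is true because both scores round to the same 4-decimal value, while `drift` is false because the third result differs in the third decimal place.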
| If you are… | Use |
|---|---|
| Building a Python data pipeline or notebook | Python — direct Polars/Pandas integration, full DB support |
| Building a Next.js app, Node service, or browser tool | TypeScript — zero runtime deps, edge-runtime compatible |
| Running mapping in a JAMstack / serverless edge function | TypeScript — runs on V8 isolates without Node built-ins |
| Doing ad-hoc CSV exploration from the command line | Python CLI has more features (YAML schema files, Excel); TS CLI is leaner |
| Integrating with SQLAlchemy / DuckDB / psycopg2 | Python — richer DB layer |
| Integrating with Prisma / Drizzle / Kysely (TS ORMs) | TypeScript: `infermap/node` for DB extraction, then the TS-native ORM in the application |
If you need both (e.g. a Python backend plus a Next.js admin UI), use both. Outputs are interoperable via the JSON config format.
| Feature | Python | TypeScript | Notes |
|---|---|---|---|
| 6 built-in scorers | ✅ | ✅ | Same algorithms, same weights, same defaults |
| Weighted average with min 2 contributors | ✅ | ✅ | Identical gate |
| Hungarian assignment | ✅ (scipy) | ✅ (vendored) | ~150 LOC Kuhn-Munkres, parity-verified |
| In-memory provider | ✅ (polars/pandas/list[dict]) | ✅ (Array) | TS variant is plain JS records |
| CSV provider | ✅ (polars) | ✅ (vendored RFC 4180) | TS parser is ~80 LOC, no deps |
| JSON records provider | ❌ | ✅ | TS-only convenience for JSON data |
| Schema definition file | ✅ (YAML + JSON) | ✅ (JSON only) | TS is JSON-only by design |
| SQLite provider | ✅ | ✅ | TS uses better-sqlite3 (optional) |
| PostgreSQL provider | ✅ | ✅ | TS uses pg (optional) |
| DuckDB provider | ✅ | ✅ | TS uses @duckdb/node-api (optional) |
| MySQL provider | 🔶 stubbed | 🔶 stubbed | Neither implemented |
| Parquet / Excel | ✅ | ❌ | Python only — TS has no polars equivalent |
| Engine runtime config | ✅ (YAML) | ✅ (JSON) | Same shape, JSON only in TS |
| Saved MapResult config | ✅ (YAML + load) | ✅ (JSON + load) | Interoperable via JSON |
| Custom scorers (decorator) | `@infermap.scorer` | `defineScorer()` | Equivalent function-style APIs |
| LLM scorer | 🔶 stub | 🔶 stub + adapter type | TS exposes LLMAdapter for future async path |
| CLI | ✅ Typer-based | ✅ parseArgs-based | Same subcommands: map/apply/inspect/validate |
| Apply mapping to DataFrame | ✅ (polars/pandas rename) | ✅ (CSV rewrite only) | TS CLI supports apply for CSVs |
| Edge runtime compatible | ❌ | ✅ | TS default entrypoint has zero Node built-ins |
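As a rough illustration of the "weighted average with min 2 contributors" row, here is a sketch under stated assumptions: a scorer abstains by returning `None`, and a field pair with fewer than two contributing scorers gets a confidence of 0.0. Both assumptions, the `combine` helper, and the weights are hypothetical and are not taken from the infermap source.

```python
def combine(results, min_contributors=2):
    """Weighted average of (score, weight) pairs, skipping abstentions (None).

    Returns 0.0 when fewer than `min_contributors` scorers contributed
    (assumed gate behavior, for illustration only).
    """
    contributing = [r for r in results if r is not None]
    if len(contributing) < min_contributors:
        return 0.0
    total_weight = sum(weight for _, weight in contributing)
    if total_weight == 0:
        return 0.0
    return sum(score * weight for score, weight in contributing) / total_weight


# Two scorers contribute and one abstains: (1.0 * 1.0 + 0.5 * 1.0) / 2.0
two_votes = combine([(1.0, 1.0), None, (0.5, 1.0)])

# Only one scorer contributes, so the assumed gate rejects the pair.
one_vote = combine([(1.0, 0.9), None, None])
```

The point of such a gate is that a single enthusiastic scorer cannot produce a confident mapping on its own.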
Side-by-side of common operations.
```python
# Python
import infermap

result = infermap.map(source, target)
```

```typescript
// TypeScript
import { map } from "infermap";

const result = map(source, target);
```

```python
# Python
from infermap import MapEngine

engine = MapEngine(min_confidence=0.4, config_path="infermap.yaml")
result = engine.map(source, target, required=["email"])
```

```typescript
// TypeScript
import { MapEngine, loadEngineConfig, applyScorerOverrides, defaultScorers, AliasScorer } from "infermap";
import { readFile } from "node:fs/promises";

const cfg = loadEngineConfig(await readFile("infermap.json", "utf8"));
const aliasScorer = new AliasScorer(cfg.aliases ?? {});
const scorers = applyScorerOverrides(
  defaultScorers().map((s) => s.name === "AliasScorer" ? aliasScorer : s),
  cfg.scorers
);
const engine = new MapEngine({ minConfidence: 0.4, scorers });
const result = engine.mapSchemas(srcSchema, tgtSchema, { required: ["email"] });
```

Or, more concisely, via the `map()` wrapper:

```typescript
import { map } from "infermap";

const result = map(source, target, {
  engineOptions: { minConfidence: 0.4 },
  config: { scorers: {...}, aliases: {...} },
  required: ["email"],
});
```

```python
# Python
from infermap import scorer, ScorerResult, FieldInfo

@scorer("domain", weight=0.6)
def domain(src: FieldInfo, tgt: FieldInfo) -> ScorerResult | None:
    if src.name == tgt.name.lower():
        return ScorerResult(score=1.0, reasoning="domain match")
    return None
```

```typescript
// TypeScript
import { defineScorer, makeScorerResult } from "infermap";

const domain = defineScorer(
  "domain",
  (src, tgt) => {
    if (src.name === tgt.name.toLowerCase()) {
      return makeScorerResult(1.0, "domain match");
    }
    return null;
  },
  0.6
);
```

```python
# Python
from infermap.providers import extract_schema

schema = extract_schema("sqlite:///crm.db", table="customers")
```

```typescript
// TypeScript
import { extractDbSchema } from "infermap/node";

const schema = await extractDbSchema("sqlite:///crm.db", { table: "customers" });
```

```python
# Python: YAML format
result.to_config("mapping.yaml")
restored = infermap.from_config("mapping.yaml")
```

```typescript
// TypeScript: JSON format
import { mapResultToConfigJson, fromConfig } from "infermap";
import { readFile, writeFile } from "node:fs/promises";

await writeFile("mapping.json", mapResultToConfigJson(result));
const restored = fromConfig(await readFile("mapping.json", "utf8"));
```

Interop note: The Python `from_config` also accepts a JSON file if you rename the extension or adjust the loader. Mapping JSON produced by the TS side works in Python downstream.
Python uses snake_case, TypeScript uses camelCase. The data shapes are otherwise identical.
| Python field | TypeScript field |
|---|---|
| `sample_values` | `sampleValues` |
| `null_rate` | `nullRate` |
| `unique_rate` | `uniqueRate` |
| `value_count` | `valueCount` |
| `source_name` | `sourceName` |
| `required_fields` | `requiredFields` |
| `unmapped_source` | `unmappedSource` |
| `unmapped_target` | `unmappedTarget` |
The saved-config JSON format uses snake_case on both sides for interop.
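If you ever need to translate field names between the two conventions yourself (for example when comparing in-memory results rather than saved configs), the conversion is mechanical. A minimal sketch using only the standard library; the helper names are hypothetical and are not part of either infermap package.

```python
import re


def snake_to_camel(name):
    """Convert a snake_case field name to camelCase."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def camel_to_snake(name):
    """Convert a camelCase field name to snake_case."""
    # Insert "_" before every uppercase letter except at the start.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


def rename_keys(record, convert):
    """Return a copy of a flat dict with every key passed through `convert`."""
    return {convert(key): value for key, value in record.items()}


python_style = {"sample_values": ["a", "b"], "null_rate": 0.0}
ts_style = rename_keys(python_style, snake_to_camel)
round_trip = rename_keys(ts_style, camel_to_snake)
```

`rename_keys` only handles flat dictionaries; nested structures would need a recursive variant.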
- Identify the integration points: where does `infermap.map(...)` get called? For most projects this is in 1–3 places.
- Replace inputs: if you were passing file paths, switch to `extractSchemaFromFile(path)` from `infermap/node`. If you were passing Polars/Pandas DataFrames, convert to `Array<Record<string, unknown>>` and use the `records:` input shape.
- Convert custom scorers: replace `@infermap.scorer` decorators with `defineScorer()` calls. The function signature is identical (two `FieldInfo` → `ScorerResult | null`).
- Port `infermap.yaml` → JSON: the shapes match; just change the format. You can keep both files if Python and TS are both running against the same rules.
- Verify parity: run the same inputs through both implementations and diff the output. Confidence scores should agree to 4 decimal places, and mappings should be identical. Any drift is a bug; please report it.
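For the "Replace inputs" step, one way to hand Python-side rows to the TypeScript package is to write them out as a JSON array of objects, which corresponds to `Array<Record<string, unknown>>`. The sketch below uses only the standard library and hypothetical helper names; if you start from a DataFrame, you would first convert it to a list of dicts (for example with Polars `to_dicts()` or Pandas `to_dict(orient="records")`).

```python
import csv
import io
import json


def csv_text_to_records(csv_text):
    """Parse CSV text into a list of dicts, one per row, keyed by header."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def records_to_json(records):
    """Serialize rows as a JSON array of objects for the TypeScript side."""
    return json.dumps(records, indent=2)


# Illustrative data only.
csv_text = "id,email\n1,a@example.com\n2,b@example.com\n"
records = csv_text_to_records(csv_text)
payload = records_to_json(records)
```

Note that `csv.DictReader` yields every value as a string, so any dtype inference still happens on whichever side consumes the records.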
The golden parity suite catches drift between the shipped Python and TS packages, but a few subtle edge cases could in principle diverge:
- Floating-point in the last bit: both runtimes are IEEE 754, but accumulation order could differ by one ULP. Tests round to 4 decimal places.
- Dtype inference on borderline cases: TS uses a vendored heuristic, Python uses polars. Mixed-content columns (e.g. `"0"` and `"0.5"`) could classify differently. The parity suite covers the common cases; report drift if you see it.
- Regex engine quirks: the vendored semantic-type regexes are pure ASCII and work identically in Python `re` and JS `RegExp`. If a future regex adds lookbehind or Unicode properties, parity may require careful translation.
- `LLMScorer` is async-capable in TS but not in Python: both stubs always abstain in the sync path. If you wire up an actual LLM adapter, the sync paths will still match (both abstain); the async path is TS-only for now.
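The floating-point item is easy to reproduce in isolation. The snippet below is a generic IEEE 754 illustration, not infermap code: it adds the same three values in two different orders and shows why rounding to 4 decimal places hides the difference.

```python
# Summing the same three values in a different order.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# The raw results differ in the last bit...
raw_equal = left_to_right == right_to_left

# ...but they agree once rounded to 4 decimal places.
rounded_equal = round(left_to_right, 4) == round(right_to_left, 4)
```

Here `raw_equal` is false and `rounded_equal` is true, which is the situation the 4-decimal tolerance is meant to absorb.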
If you hit a parity bug, please file an issue with the two inputs and both outputs.
- TypeScript API — full TS reference
- Python API — full Python reference
- Getting Started
- TypeScript examples
- Python examples