9 changes: 5 additions & 4 deletions packages/cli/LIMITATIONS.md
@@ -59,13 +59,14 @@ If you hit something that's not documented here, open an issue.
### Padrón RUC

- ✅ **Local padrón download + lookup** — verified end-to-end (PR #3 smoke test).
- **Padrón puntual via portal `e-consultaruc.sunat.gob.pe`** — the form now requires a `numRnd` token + reCAPTCHA. Plain HTTP POSTs return 404. Workaround would need `agent-browser` automation (same pattern as RHE/F616 already use). **Local padrón is strictly better for batch/scriptable use anyway** — instantaneous after sync, no network roundtrip per RUC.
- ⚠️ **`padron ruc-online` via SUNAT portal** (PR #8) — agent-browser drives `e-consultaruc.sunat.gob.pe` (bypasses the `numRnd` + reCAPTCHA gate that broke direct fetch). Pure parser unit-tested with 7 fixture cases. Live scraping untested in CI (no Chrome) — verify post-merge by running `sunat padron ruc-online 20131312955`. **For batch use always prefer local padrón** (`padron ruc/batch`) — `ruc-online` is ~5-10s per RUC.

### Tipo de Cambio

- ⛔ **SUNAT `e-consulta.sunat.gob.pe/cl-at-ittipcam/tcS01Alias`** — blocked by WAF, returns "Request Rejected".
- ⛔ **SBS `sbs.gob.pe`** — also blocked by WAF.
- 🚧 **`sunat tipo-cambio` command** — not implemented. Future PR with `agent-browser` driver.
- ⚠️ **`sunat tipo-cambio` via SUNAT portal** (PR #8) — agent-browser scrapes `e-consulta.sunat.gob.pe/cl-at-ittipcam/tcS01Alias` (the WAF blocks direct fetch but allows headless Chrome via DevTools). Pure parser unit-tested with 7 fixture cases. Cache: `~/.sunat/cache/tipo-cambio.jsonl` keyed by ISO date (immutable per date, cached forever).
- ⛔ **SBS `sbs.gob.pe`** — also blocked by WAF, NOT bypassed in PR #8 (SUNAT's own TC is the legally-valid one for tax purposes anyway).
- 🚧 **Live scraping untested in CI** (no Chrome). Verify post-merge by running `sunat tipo-cambio` and confirm a reasonable USD/PEN value comes back.
- 🚧 **No automatic fallback** — if SUNAT changes the table layout, the parser returns null. The error message hints at running with debug to inspect the snapshot. Future PR could add a third-party fallback (with explicit user opt-in via env var).

### Consulta CPE Integrada

15 changes: 15 additions & 0 deletions packages/cli/README.md
@@ -76,8 +76,23 @@ sunat-cli cpe gre emit --params '{
sunat-cli cpe gre status --ticket 20240100000001 --wait
```

### Tipo de Cambio oficial SUNAT (USD/PEN)

```bash
sunat-cli tipo-cambio # today's USD/PEN
sunat-cli tipo-cambio --fecha 2026-04-15 # historical, immutable
sunat-cli tipo-cambio cached --fecha 2026-04-15
```

Scrapes the SUNAT portal via agent-browser, because the WAF blocks direct fetches.
Each date's rate is cached indefinitely.

### Padrón Reducido del RUC (offline lookup, no auth)

```bash
sunat-cli padron ruc-online 20131312955 # single RUC via the live portal (online, ~5-10s; no sync needed)
```

```bash
sunat-cli padron sync # ~370MB download, refreshes daily
sunat-cli padron ruc 20131312955 # razon social, estado, condicion
2 changes: 2 additions & 0 deletions packages/cli/bin/sunat.ts
@@ -10,6 +10,7 @@ import { createLukeaCommand } from "../src/commands/lukea/index.ts";
import { createCpeCommand } from "../src/commands/cpe/index.ts";
import { createPadronCommand } from "../src/commands/padron/index.ts";
import { createSireCommand } from "../src/commands/sire/index.ts";
import { createTipoCambioCommand } from "../src/commands/tipo-cambio.ts";

const program = new Command();

@@ -35,5 +36,6 @@ program.addCommand(createLukeaCommand());
program.addCommand(createCpeCommand());
program.addCommand(createPadronCommand());
program.addCommand(createSireCommand());
program.addCommand(createTipoCambioCommand());

program.parse();
21 changes: 21 additions & 0 deletions packages/cli/skills/sunat-cli/SKILL.md
@@ -336,6 +336,27 @@ Polling: `--wait` polls getStatus with backoff (2s/4s/8s/16s/30s, max 5min).
Without `--wait`, returns the ticket and you poll independently with
`sunat sire {ventas|compras} ticket --num <id> [--wait]`.
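
The `--wait` schedule described above can be sketched as a small generator. This is an illustration only: `backoffDelays` and `pollWithBackoff` are hypothetical names, not functions from this repository, and the sketch assumes the final 30s step repeats until the 5-minute budget is used up.

```typescript
// Hypothetical helpers, not taken from the sunat-cli source.
// Yields the wait (in ms) before each retry: 2s, 4s, 8s, 16s, then 30s
// repeatedly, stopping once another wait would exceed the total budget.
export function* backoffDelays(
  steps: number[] = [2000, 4000, 8000, 16000, 30000],
  maxTotalMs: number = 5 * 60 * 1000,
): Generator<number> {
  let elapsed = 0;
  for (let i = 0; ; i++) {
    const delay = steps[Math.min(i, steps.length - 1)];
    if (elapsed + delay > maxTotalMs) return;
    elapsed += delay;
    yield delay;
  }
}

// Calls `check` until it returns a non-null value or the budget runs out.
// `sleep` is injectable so the loop can be exercised without real waiting.
export async function pollWithBackoff<T>(
  check: () => Promise<T | null>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T | null> {
  for (const delay of backoffDelays()) {
    const result = await check();
    if (result !== null) return result;
    await sleep(delay);
  }
  // One last attempt after the final wait.
  return check();
}
```

Keeping the schedule in a generator separates the timing policy from the polling loop, so either can be changed or tested on its own.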

### Tipo de Cambio oficial SUNAT

```bash
sunat tipo-cambio # today's USD/PEN
sunat tipo-cambio --fecha 2026-04-15 # historical (immutable)
sunat tipo-cambio --force # bypass cache
sunat tipo-cambio cached --fecha 2026-04-15 # cache-only, no scrape
```

Scrapes the official SUNAT portal via agent-browser (WAF blocks direct
fetch). Cached forever per date in `~/.sunat/cache/tipo-cambio.jsonl`
since SUNAT TCs are immutable.
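
A date-keyed JSONL cache like the one described above could look roughly like this. The record shape is an assumption: the field names `fecha`, `compra`, and `venta` mirror what the command passes to `audit`, but the real on-disk schema is not shown in this PR, and `parseTcCache` / `appendTcRecord` are hypothetical names. The sketch works on strings so it stays independent of the file system.

```typescript
// Hypothetical record shape; the real cache schema may differ.
interface TcRecord {
  fecha: string; // ISO date, e.g. "2026-04-15"
  compra: number;
  venta: number;
}

// Parse JSONL text into a map keyed by date. Blank or malformed lines are
// skipped; if a date appears more than once, the later line wins.
export function parseTcCache(jsonl: string): Map<string, TcRecord> {
  const byDate = new Map<string, TcRecord>();
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    try {
      const rec = JSON.parse(trimmed) as TcRecord;
      if (typeof rec.fecha === "string") byDate.set(rec.fecha, rec);
    } catch {
      // Ignore a corrupt line rather than failing the whole lookup.
    }
  }
  return byDate;
}

// Return the cache text with one more record appended as a single JSON line.
// The caller is responsible for writing the result back to disk.
export function appendTcRecord(jsonl: string, rec: TcRecord): string {
  const sep = jsonl === "" || jsonl.endsWith("\n") ? "" : "\n";
  return jsonl + sep + JSON.stringify(rec) + "\n";
}
```

Because a published rate for a past date does not change, an append-only file with a simple "last line wins" rule is enough; no invalidation logic is needed.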

### Padrón RUC online (single lookup, no padrón sync)

```bash
sunat padron ruc-online 20131312955 # ~5-10s, drives SUNAT portal via browser
```

For batch: always use `sunat padron ruc/batch` (offline padrón, instantaneous).

### Padrón Reducido del RUC (offline)

Local copy of the SUNAT RUC registry. ~370MB ZIP, ~600MB TXT, ~3.5M entries.
26 changes: 26 additions & 0 deletions packages/cli/src/commands/padron/index.ts
@@ -152,5 +152,31 @@ export function createPadronCommand(): Command {
}
});

padron
.command("ruc-online")
.description(
"Lookup a single RUC by driving the SUNAT portal via agent-browser " +
"(slow ~5-10s, no padrón sync needed). For batch use 'padron ruc/batch' instead. T0.",
)
.argument("<ruc>", "11-digit RUC")
.action(async (ruc, _opts, cmd) => {
const format = getFormat(cmd);
try {
if (!/^\d{11}$/.test(ruc)) {
outputError(`Invalid RUC: '${ruc}'. Must be 11 digits.`, format);
return;
}
const { consultarRucPortal } = await import("../../sunat-rest/ruc-portal.ts");
const entry = await consultarRucPortal(ruc);
if (!entry) {
output(format, { json: { ruc, found: false, source: "sunat-portal" } });
return;
}
output(format, { json: { found: true, ...entry } });
} catch (err) {
outputError(err instanceof Error ? err.message : String(err), format);
}
});

return padron;
}
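
The `ruc-online` action above only checks that the argument is 11 digits. A stricter pre-check could also verify the check digit before spending 5-10s on a browser round trip. The sketch below is not part of this PR: `hasValidRucCheckDigit` is a hypothetical helper, and it assumes the commonly documented modulo-11 scheme with weights 5, 4, 3, 2, 7, 6, 5, 4, 3, 2.

```typescript
// Sketch only: the PR validates length and digits, not the check digit.
// Assumes the commonly documented modulo-11 scheme for Peruvian RUC numbers.
const RUC_WEIGHTS = [5, 4, 3, 2, 7, 6, 5, 4, 3, 2];

export function hasValidRucCheckDigit(ruc: string): boolean {
  if (!/^\d{11}$/.test(ruc)) return false;
  const digits = ruc.split("").map(Number);
  // Weighted sum over the first 10 digits.
  const sum = RUC_WEIGHTS.reduce((acc, w, i) => acc + w * digits[i], 0);
  const candidate = 11 - (sum % 11);
  // Candidates 10 and 11 fold to 0 and 1 respectively.
  const expected = candidate >= 10 ? candidate - 10 : candidate;
  return expected === digits[10];
}
```

For the RUC used in the examples, `hasValidRucCheckDigit("20131312955")` returns `true`, while changing the last digit makes it return `false`.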
69 changes: 69 additions & 0 deletions packages/cli/src/commands/tipo-cambio.ts
@@ -0,0 +1,69 @@
import { Command } from "commander";
import { audit } from "../data/audit.ts";
import { getTipoCambio, loadCachedTc } from "../sunat-rest/tipo-cambio.ts";
import { output, outputError } from "../utils/output.ts";

type Format = "json" | "table" | "auto";

function getFormat(cmd: Command): Format {
let parent: Command | null = cmd;
while (parent) {
const opts = parent.opts();
if (opts.output) return opts.output as Format;
parent = parent.parent;
}
return "auto";
}

export function createTipoCambioCommand(): Command {
const tc = new Command("tipo-cambio").description(
"Tipo de Cambio oficial SUNAT (USD/PEN) — scrapes the SUNAT portal via agent-browser. T0.",
);

tc
.option("--fecha <YYYY-MM-DD>", "Date for which to fetch the rate (defaults to today)")
.option("--force", "Bypass local cache (default: cached if present, since SUNAT TC is immutable per date)")
.action(async (opts, cmd) => {
const format = getFormat(cmd);
try {
const fecha = opts.fecha;
if (fecha && !/^\d{4}-\d{2}-\d{2}$/.test(fecha)) {
outputError(`--fecha must be YYYY-MM-DD, got: ${fecha}`, format);
return;
}
const rate = await getTipoCambio({ fecha, force: !!opts.force });
audit({
command: "tipo-cambio",
args: { fecha: fecha || "today", force: !!opts.force },
result: "success",
details: { fecha: rate.fecha, compra: rate.compra, venta: rate.venta },
});
output(format, { json: rate });
} catch (err) {
outputError(err instanceof Error ? err.message : String(err), format);
}
});

tc
.command("cached")
.description("Look up a locally cached rate for one date without scraping. T0.")
.option("--fecha <YYYY-MM-DD>", "Date to look up (required)")
.action((opts, cmd) => {
const format = getFormat(cmd);
try {
if (opts.fecha) {
const r = loadCachedTc(opts.fecha);
output(format, { json: r ? { found: true, ...r } : { found: false, fecha: opts.fecha } });
return;
}
outputError(
"--fecha is required for 'cached' (listing the full cache is not implemented)",
format,
);
} catch (err) {
outputError(err instanceof Error ? err.message : String(err), format);
}
});

return tc;
}
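
The `--fecha` check in this file is shape-only: the pattern `^\d{4}-\d{2}-\d{2}$` also accepts strings such as `2026-13-45`. If stricter validation were wanted, one option is a round trip through `Date`. This is a sketch, not code from the PR, and `isRealIsoDate` is a hypothetical name.

```typescript
// Hypothetical helper: returns true only for YYYY-MM-DD strings that name
// a real calendar date.
export function isRealIsoDate(fecha: string): boolean {
  const m = /^(\d{4})-(\d{2})-(\d{2})$/.exec(fecha);
  if (!m) return false;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  // Date.UTC normalizes overflow (e.g. Feb 30 becomes Mar 2), so comparing
  // the parts after the round trip detects impossible dates.
  const d = new Date(Date.UTC(year, month - 1, day));
  return (
    d.getUTCFullYear() === year &&
    d.getUTCMonth() === month - 1 &&
    d.getUTCDate() === day
  );
}
```

Working in UTC avoids the local time zone shifting the date across midnight during the comparison.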
137 changes: 137 additions & 0 deletions packages/cli/src/sunat-rest/ruc-portal.ts
@@ -0,0 +1,137 @@
/**
* RUC consulta puntual via SUNAT portal (e-consultaruc.sunat.gob.pe).
*
* Direct HTTP POSTs return 404 because the portal added a `numRnd` token
* + reCAPTCHA in 2024. Workaround: drive a real Chrome via agent-browser,
* fill the form, parse the rendered detail page.
*
* For BATCH lookups always prefer `sunat padron ruc/batch` (offline,
* instantaneous after sync). This module is for ad-hoc single-RUC checks
* when you don't want to download the 370MB padrón.
*/

import * as browser from "../browser/client.ts";

const PORTAL_URL = "https://e-consultaruc.sunat.gob.pe/cl-ti-itmrconsruc/FrameCriterioBusquedaWeb.jsp";

export interface RucPortalEntry {
ruc: string;
razonSocial: string;
estado?: string; // "ACTIVO", "BAJA DE OFICIO", etc
condicion?: string; // "HABIDO", "NO HABIDO", "NO HALLADO", etc
tipoContribuyente?: string;
direccion?: string;
departamento?: string;
provincia?: string;
distrito?: string;
source: "sunat-portal";
fetchedAt: string;
}

/**
* Pure parser for a SUNAT RUC detail page snapshot.
*
* The portal renders a table with rows like:
* "Número de RUC: 20131312955 - SUPERINTENDENCIA NACIONAL ..."
* "Tipo Contribuyente: ..."
* "Estado del Contribuyente: ACTIVO"
* "Condición del Contribuyente: HABIDO"
* "Domicilio Fiscal: AV. ... LIMA - LIMA - LIMA"
*
* agent-browser snapshot strips formatting but preserves these
* "Label: Value" pairs. We extract them with a tolerant regex.
*/
export function parseRucSnapshot(snapshot: string, ruc: string): RucPortalEntry | null {
// Header line: "Número de RUC: {ruc} - {razon social}"
const headerMatch = snapshot.match(/N[uú]mero de RUC[:\s]*(\d{11})\s*[-–]?\s*([^\n]+)/i);
if (!headerMatch || headerMatch[1] !== ruc) return null;

const razonSocial = headerMatch[2].trim();

const labelValue = (label: RegExp): string | undefined => {
const m = snapshot.match(new RegExp(`${label.source}[:\\s]*([^\\n]+)`, "i"));
return m ? m[1].trim() : undefined;
};

const estado = labelValue(/Estado del Contribuyente/);
const condicion = labelValue(/Condici[óo]n del Contribuyente/);
const tipoContribuyente = labelValue(/Tipo (?:de )?Contribuyente/);
const direccion = labelValue(/Domicilio Fiscal/);

let departamento: string | undefined;
let provincia: string | undefined;
let distrito: string | undefined;
if (direccion) {
// Assumed SUNAT format: "AV CALLE 123 DISTRITO - PROVINCIA - DEPARTAMENTO".
// Splitting on hyphens leaves the street address and the distrito fused in
// the third-from-last segment, so we take that segment's last token as the
// distrito. Limitation: a multi-word distrito (e.g. "SAN ISIDRO") is
// truncated to its final word.
const parts = direccion.split(/\s*-\s*/).map((p) => p.trim()).filter(Boolean);
if (parts.length >= 3) {
departamento = parts[parts.length - 1];
provincia = parts[parts.length - 2];
const tail = parts[parts.length - 3];
// distrito is the last whitespace-separated token in the tail
const tokens = tail.split(/\s+/);
distrito = tokens[tokens.length - 1];
}
}

return {
ruc,
razonSocial,
estado,
condicion,
tipoContribuyente,
direccion,
departamento,
provincia,
distrito,
source: "sunat-portal",
fetchedAt: new Date().toISOString(),
};
}

/**
* Navigate the portal, fill the RUC field, click consultar, parse the result.
*
* Uses headless agent-browser. Slow (~5-10s per RUC). For batch use, fall
* back to local padrón instead.
*/
export async function consultarRucPortal(ruc: string): Promise<RucPortalEntry | null> {
if (!/^\d{11}$/.test(ruc)) {
throw new Error(`Invalid RUC: '${ruc}'. Must be exactly 11 digits.`);
}

await browser.open(PORTAL_URL, { headed: false });
await browser.sleep(2000);

const formSnap = await browser.snapshot({ interactive: true });
const rucRef = extractRef(formSnap, "txtRuc") || extractRef(formSnap, "RUC");
const submitRef = extractRef(formSnap, "Buscar") || extractRef(formSnap, "btnAceptar");

if (rucRef) await browser.fill(rucRef, ruc);
else {
// Last-resort: try evaluating the form fields directly
await browser.evalJS(`document.getElementById('txtRuc').value = '${ruc}';`);
}

if (submitRef) await browser.click(submitRef);
else {
await browser.evalJS("document.forms.mainForm && document.forms.mainForm.submit();");
}

await browser.sleep(2500);
const detail = await browser.snapshot();
return parseRucSnapshot(detail, ruc);
}

/**
* Best-effort ref extraction from agent-browser interactive snapshot.
* The interactive output formats refs as `[ref=e1]` next to interactive elements.
*/
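function extractRef(snapshot: string, marker: string): string | null {
// Escape the marker so any regex metacharacters in it are matched literally.
const escaped = marker.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const rx = new RegExp(`${escaped}[\\s\\S]{0,80}?\\[ref=([a-z]\\d+)\\]`, "i");
const m = snapshot.match(rx);
return m ? m[1] : null;
}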
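
To make the "Label: Value" extraction used by `parseRucSnapshot` concrete, here is a minimal, self-contained re-creation of the idea. It is a sketch under assumptions: `extractLabel` is a hypothetical stand-in for the inner `labelValue` helper (it takes a plain string rather than a `RegExp`), and the sample snapshot is invented for illustration, not a real portal capture.

```typescript
// Hypothetical stand-in for the parser's inner helper: finds "Label: Value"
// on a single line and returns the trimmed value, or undefined if absent.
export function extractLabel(snapshot: string, label: string): string | undefined {
  // Escape the label so it is matched literally inside the pattern.
  const escaped = label.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const m = snapshot.match(new RegExp(`${escaped}\\s*:\\s*([^\\n]+)`, "i"));
  return m ? m[1].trim() : undefined;
}

// Invented sample text in the shape the module's doc comment describes.
const sample = [
  "Tipo Contribuyente: SOCIEDAD ANONIMA",
  "Estado del Contribuyente: ACTIVO",
  "Condición del Contribuyente:   HABIDO  ",
].join("\n");

console.log(extractLabel(sample, "Estado del Contribuyente")); // ACTIVO
console.log(extractLabel(sample, "Condición del Contribuyente")); // HABIDO
console.log(extractLabel(sample, "Domicilio Fiscal")); // undefined
```

Returning `undefined` for a missing label matches the optional fields on `RucPortalEntry`: a layout change degrades to missing fields rather than an exception.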