diff --git a/AGENTS.md b/AGENTS.md index bd7e255..f3f59e1 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -41,6 +41,7 @@ Pre-commit hooks provide fast feedback; `prek` is a drop-in replacement that rea - Configured hooks: - Pre-commit: `trailing-whitespace`, `end-of-file-fixer`, `check-merge-conflict`, `check-yaml`, `check-toml`, `check-json`, `check-added-large-files`, `detect-private-key`, `check-executables-have-shebangs`, `check-symlinks`, `check-case-conflict`, `cargo fmt --check`. - Pre-push: `cargo clippy --workspace --all-targets --all-features -D warnings`, `cargo test --workspace`. +- If a pre-push hook fails, fix the reported failures, rerun the full test suite, then commit and push again once the hooks pass. ## Commit & Pull Request Guidelines No established commit conventions are present yet. Until standards are set: @@ -59,3 +60,17 @@ No established commit conventions are present yet. Until standards are set: - Do not add proprietary binaries, keys, or assets to the repository. - Always test changes, update relevant documentation, and commit all code you modify or add. - Push all commits after creating them. + +## Subagent Conflict Resolution +When using subagents, apply this workflow to keep ownership and diffs clear. + +1) Record a clean baseline before spawning subagents: `git status -sb` and `git diff --stat`. +2) Assign each subagent a strict file or directory scope. +3) After each subagent finishes, compare changes to the baseline and declared scope. +4) If unexpected changes appear: + - Stop all subagents. + - Inspect the diff (`git diff --name-status` + `git diff` for unexpected files). + - Decide to accept, revert, or move the changes into a separate commit. +5) Do not mix subagent outputs across scopes in a single commit. +6) If hooks modify files on commit/push, rerun the hook targets, re-stage, re-commit, then push again. +7) Only push when `git status -sb` shows a clean tree and hooks are green. 
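The baseline-and-scope check in steps 1–4 above can be sketched as a small helper that diffs two `git status --porcelain` snapshots. This is an illustrative sketch only: the function name, the scope-prefix convention, and the use of the porcelain format (rather than `-sb`) are assumptions, not part of the repository's tooling.

```rust
use std::collections::BTreeSet;

/// Report paths that changed since the baseline but fall outside a subagent's
/// declared scope prefix. Inputs are `git status --porcelain` snapshots, one
/// "XY path" entry per line. Hypothetical helper, not repository tooling.
fn out_of_scope<'a>(baseline: &str, current: &'a str, scope: &str) -> Vec<&'a str> {
    let before: BTreeSet<&str> = baseline.lines().collect();
    current
        .lines()
        .filter(|line| !before.contains(line))
        // Porcelain lines are "XY path": two status columns, a space, then the path.
        .filter_map(|line| line.get(3..))
        .filter(|path| !path.starts_with(scope))
        .collect()
}

fn main() {
    let baseline = "";
    let current = " M src/lift.rs\n M docs/notes.md";
    // A subagent scoped to `src/` should not have touched `docs/`.
    println!("{:?}", out_of_scope(baseline, current, "src/")); // prints ["docs/notes.md"]
}
```

Anything this reports would trigger step 4: stop the subagents and inspect the diff before committing.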
diff --git a/Cargo.lock b/Cargo.lock index 5c1a126..824930b 100644 --- a/Cargo.lock +++ b/Cargo.lock @@ -238,6 +238,15 @@ version = "0.11.0" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "df1d3c3b53da64cf5760482273a98e575c651a67eec7f77df96b5b642de8f039" +[[package]] +name = "lz4_flex" +version = "0.11.5" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "08ab2867e3eeeca90e844d1940eab391c9dc5228783db2ed999acbc0a9ed375a" +dependencies = [ + "twox-hash", +] + [[package]] name = "memchr" version = "2.7.6" @@ -312,6 +321,7 @@ dependencies = [ name = "recomp-pipeline" version = "0.1.0" dependencies = [ + "lz4_flex", "pathdiff", "serde", "serde_json", @@ -328,6 +338,8 @@ dependencies = [ "recomp-gfx", "recomp-services", "recomp-timing", + "serde", + "serde_json", "thiserror", ] @@ -524,6 +536,12 @@ version = "0.1.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "5d99f8c9a7727884afe522e9bd5edbfc91a3312b36a77b5fb8926e4c31a41801" +[[package]] +name = "twox-hash" +version = "2.1.2" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "9ea3136b675547379c4bd395ca6b938e5ad3c3d20fad76e7fe85f9e0d011419c" + [[package]] name = "typenum" version = "1.19.0" diff --git a/PLANS.md b/PLANS.md index 2ad186a..de559ad 100644 --- a/PLANS.md +++ b/PLANS.md @@ -11,6 +11,12 @@ This file tracks implementation work derived from specs that do not yet have a c - SPEC-096 Bundle Manifest Integrity - SPEC-100 Validation and Acceptance - SPEC-110 Target Title Selection Criteria +- SPEC-120 Homebrew Candidate Intake +- SPEC-130 Homebrew Module Extraction +- SPEC-140 Homebrew Runtime Surface +- SPEC-150 Homebrew Asset Packaging +- SPEC-160 AArch64 Decode Coverage +- SPEC-170 Function Discovery and Control-Flow Graph ## SPEC-000: Project Charter and Ethics Outcome @@ -120,3 +126,100 @@ Work items Exit criteria (from SPEC-110) - A documented selection that satisfies all checklist items. 
- A published plan for obtaining inputs legally and privately. + +## SPEC-120: Homebrew Candidate Intake +Outcome +- Accept a legally distributable homebrew candidate and emit a deterministic intake manifest. + +Work items +- [x] Define a module intake manifest schema for NRO + optional NSO inputs. +- [x] Implement NRO intake parsing for header fields and asset section offsets. +- [x] Add provenance validation checks for homebrew inputs (reject proprietary or encrypted formats). +- [x] Emit deterministic `module.json` and `manifest.json` with hashes, sizes, and tool versions. +- [x] Add sample intake tests using non-proprietary NRO fixtures. + +Exit criteria (from SPEC-120) +- A homebrew NRO can be ingested with hashes, build id, and asset offsets recorded. +- Asset extraction is recorded without mixing assets into code output. +- Intake errors are explicit when required fields are missing or unsupported. + +## SPEC-130: Homebrew Module Extraction +Outcome +- Normalize NRO/NSO binaries into module.json and extracted segment blobs. + +Work items +- [x] Implement NSO parsing including LZ4 segment decompression. +- [x] Capture build id/module id and preserve section boundaries in module.json. +- [x] Preserve relocation and symbol metadata when present. +- [x] Ensure extraction is deterministic across runs. +- [x] Add tests for NRO-only and NRO + NSO ingestion paths. + +Exit criteria (from SPEC-130) +- NRO and NSO inputs yield module.json with correct segment sizes and build id. +- Compressed NSO segments are decompressed and emitted deterministically. +- Section boundaries are preserved for later translation. + +## SPEC-140: Homebrew Runtime Surface +Outcome +- Provide a minimal runtime ABI surface that can boot a recompiled homebrew title. + +Work items +- [x] Implement homebrew entrypoint shim with loader config setup. +- [x] Define loader config keys and defaults (EndOfList, MainThreadHandle, AppletType). 
+- [x] Add runtime manifest that enumerates provided config keys and stubbed services. +- [x] Implement deterministic time and input stubs for validation runs. +- [x] Add logging for unsupported service calls with explicit failure behavior. + +Exit criteria (from SPEC-140) +- Recompiled binaries boot with required loader config keys present. +- Unsupported services fail with explicit, logged errors. +- Runtime manifest records provided loader config keys. + +## SPEC-150: Homebrew Asset Packaging +Outcome +- Extract NRO asset section contents and package them alongside recompiled output. + +Work items +- [x] Implement asset section extraction (icon, NACP, RomFS). +- [x] Validate and store NACP as `control.nacp` with expected size. +- [x] Emit deterministic asset output directory and hashes in manifest.json. +- [x] Document runtime RomFS mount expectations. +- [x] Add tests for asset extraction and manifest hashes. + +Exit criteria (from SPEC-150) +- Icon, NACP, and RomFS assets are extracted deterministically when present. +- Asset hashes in manifest.json match extracted bytes. +- Code output remains separate from extracted assets. + +## SPEC-160: AArch64 Decode Coverage +Outcome +- Expand decode coverage and IR support to lift real homebrew code paths. + +Work items +- [x] Extend the lifted IR schema with arithmetic, logical, shift, memory, and branch ops. +- [x] Add decoder support for MOV (ORR alias), SUB, AND/OR/XOR, ADR/ADRP, LDR/STR, and branch opcodes listed in SPEC-160. +- [x] Map 32-bit W-register operations to zero-extended 64-bit IR semantics. +- [x] Add per-op unit tests that validate opcode decoding and emitted IR structure. +- [x] Add decode-limit enforcement tests for oversized text segments. + +Exit criteria (from SPEC-160) +- A synthetic instruction stream containing Phase 1 opcodes lifts without errors. +- Unsupported opcodes report the PC and opcode value. +- Tests confirm 32-bit variants are zero-extended. 
+- Loads/stores emit correctly typed IR ops with aligned access checks. + +## SPEC-170: Function Discovery and Control-Flow Graph +Outcome +- Replace linear decoding with basic blocks and deterministic control-flow graphs. + +Work items +- [x] Extend the lifted module schema to allow block-based functions alongside legacy linear ops. +- [x] Implement a sorted worklist decoder that builds blocks and edges deterministically. +- [x] Add control-flow terminators for unconditional, conditional, call, and indirect branches. +- [x] Seed function discovery from entrypoint and direct call targets. +- [x] Add tests for if/else blocks, direct call discovery, and unresolved indirect branches. + +Exit criteria (from SPEC-170) +- A synthetic binary with a conditional branch yields at least two blocks and correct edges. +- Direct call targets are discovered and lifted as separate functions. +- The lifted module is deterministic when run twice on the same input. diff --git a/RESEARCH.md b/RESEARCH.md index 521724a..3c6433e 100644 --- a/RESEARCH.md +++ b/RESEARCH.md @@ -73,6 +73,10 @@ Needed research: - Nintendo Switch platform baseline: https://en.wikipedia.org/wiki/Nintendo_Switch - Tegra X1 whitepaper: https://www.nvidia.com/content/tegra/embedded-systems/pdf/tegra-x1-whitepaper.pdf - Switch hardware overview: https://switchbrew.org/wiki/Hardware +- Switch homebrew NRO format: https://switchbrew.org/wiki/NRO +- Switch NSO format and compression: https://switchbrew.org/wiki/NSO +- Homebrew ABI entrypoint and loader config: https://switchbrew.org/wiki/Homebrew_ABI +- NACP title metadata format: https://switchbrew.org/wiki/NACP ## Research Deliverables - A research summary for each category with sources. 
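The sorted worklist decoder described in SPEC-170 can be illustrated with a minimal sketch: a `BTreeSet` worklist always pops the lowest pending address, so block discovery order, and therefore the emitted module, is deterministic across runs. The toy instruction enum, the fixed 4-byte stride, and all names below are assumptions for illustration; unlike a full CFG builder, this sketch does not split blocks at join points.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Toy instruction model standing in for decoded AArch64 ops (illustrative only).
#[derive(Clone, Copy)]
enum Insn {
    Plain,           // falls through to pc + 4
    CondBranch(u64), // two successors: fall-through and target
    Jump(u64),       // one successor: target
    Ret,             // no successors
}

#[derive(Debug)]
struct Block {
    start: u64,
    end: u64, // exclusive
    successors: Vec<u64>,
}

/// Deterministic block discovery: pop the lowest pending address, decode until
/// a terminator, record the block, and enqueue the successor addresses.
fn build_blocks(code: &BTreeMap<u64, Insn>, entry: u64) -> Vec<Block> {
    let mut pending = BTreeSet::from([entry]);
    let mut seen = BTreeSet::new();
    let mut blocks = Vec::new();

    while let Some(start) = pending.pop_first() {
        if !seen.insert(start) {
            continue; // already decoded this block start
        }
        let mut pc = start;
        let mut successors = Vec::new();
        loop {
            match code.get(&pc) {
                Some(Insn::Plain) => pc += 4,
                Some(Insn::CondBranch(t)) => {
                    successors = vec![pc + 4, *t];
                    pc += 4;
                    break;
                }
                Some(Insn::Jump(t)) => {
                    successors = vec![*t];
                    pc += 4;
                    break;
                }
                Some(Insn::Ret) | None => {
                    pc += 4;
                    break;
                }
            }
        }
        pending.extend(successors.iter().copied());
        blocks.push(Block { start, end: pc, successors });
    }
    blocks
}

fn main() {
    // if/else shape: a conditional at 0, two arms, and a shared return at 16.
    let code = BTreeMap::from([
        (0u64, Insn::CondBranch(12)),
        (4, Insn::Plain),
        (8, Insn::Jump(16)),
        (12, Insn::Jump(16)),
        (16, Insn::Ret),
    ]);
    let blocks = build_blocks(&code, 0);
    // Matches the SPEC-170 exit criterion: a conditional yields multiple blocks.
    println!("{} blocks", blocks.len()); // prints "4 blocks"
}
```

Because both the worklist and the `seen` set are ordered, running this twice on the same input yields byte-identical output, which is the determinism property SPEC-170's exit criteria call for.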
diff --git a/crates/recomp-cli/src/main.rs b/crates/recomp-cli/src/main.rs index 6237c16..4823e38 100644 --- a/crates/recomp-cli/src/main.rs +++ b/crates/recomp-cli/src/main.rs @@ -1,5 +1,8 @@ -use clap::{Parser, Subcommand}; +use clap::{Parser, Subcommand, ValueEnum}; use recomp_pipeline::bundle::{package_bundle, PackageOptions}; +use recomp_pipeline::homebrew::{ + intake_homebrew, lift_homebrew, IntakeOptions, LiftMode, LiftOptions, +}; use recomp_pipeline::{run_pipeline, PipelineOptions}; use std::path::PathBuf; @@ -14,6 +17,8 @@ struct Args { enum Command { Run(RunArgs), Package(PackageArgs), + HomebrewIntake(HomebrewIntakeArgs), + HomebrewLift(HomebrewLiftArgs), } #[derive(Parser, Debug)] @@ -42,6 +47,45 @@ struct PackageArgs { assets_dir: Option<PathBuf>, } +#[derive(Parser, Debug)] +struct HomebrewIntakeArgs { + #[arg(long)] + module: PathBuf, + #[arg(long)] + nso: Vec<PathBuf>, + #[arg(long)] + provenance: PathBuf, + #[arg(long)] + out_dir: PathBuf, +} + +#[derive(Parser, Debug)] +struct HomebrewLiftArgs { + #[arg(long)] + module_json: PathBuf, + #[arg(long)] + out_dir: PathBuf, + #[arg(long, default_value = "entry")] + entry: String, + #[arg(long, value_enum, default_value = "decode")] + mode: HomebrewLiftMode, +} + +#[derive(ValueEnum, Debug, Clone)] +enum HomebrewLiftMode { + Stub, + Decode, +} + +impl From<HomebrewLiftMode> for LiftMode { + fn from(value: HomebrewLiftMode) -> Self { + match value { + HomebrewLiftMode::Stub => LiftMode::Stub, + HomebrewLiftMode::Decode => LiftMode::Decode, + } + } +} + fn main() { let args = Args::parse(); @@ -100,5 +144,55 @@ fn main() { } } } + Command::HomebrewIntake(intake) => { + let options = IntakeOptions { + module_path: intake.module, + nso_paths: intake.nso, + provenance_path: intake.provenance, + out_dir: intake.out_dir, + }; + match intake_homebrew(options) { + Ok(report) => { + println!( + "Homebrew intake wrote {} files to {}", + report.files_written.len(), + report.out_dir.display() + ); + println!("module.json: {}", 
report.module_json_path.display()); + println!("manifest.json: {}", report.manifest_path.display()); + } + Err(err) => { + eprintln!("Homebrew intake error: {err}"); + std::process::exit(1); + } + } + } + Command::HomebrewLift(lift) => { + let options = LiftOptions { + module_json_path: lift.module_json, + out_dir: lift.out_dir, + entry_name: lift.entry, + mode: lift.mode.into(), + }; + match lift_homebrew(options) { + Ok(report) => { + println!( + "Homebrew lift wrote {} functions to {}", + report.functions_emitted, + report.module_json_path.display() + ); + if !report.warnings.is_empty() { + println!("Warnings:"); + for warning in report.warnings { + println!("- {}", warning); + } + } + } + Err(err) => { + eprintln!("Homebrew lift error: {err}"); + std::process::exit(1); + } + } + } } } diff --git a/crates/recomp-isa/src/lib.rs b/crates/recomp-isa/src/lib.rs index 57af6c8..932f7cd 100644 --- a/crates/recomp-isa/src/lib.rs +++ b/crates/recomp-isa/src/lib.rs @@ -121,7 +121,7 @@ impl Memory { pub fn read(&self, address: usize, size: MemSize) -> Result<u64, ExecError> { let width = size.bytes(); - if !address.is_multiple_of(width) { + if address % width != 0 { return Err(ExecError::Unaligned { address, size: width, @@ -142,7 +142,7 @@ pub fn write(&mut self, address: usize, size: MemSize, value: u64) -> Result<(), ExecError> { let width = size.bytes(); - if !address.is_multiple_of(width) { + if address % width != 0 { return Err(ExecError::Unaligned { address, size: width, diff --git a/crates/recomp-pipeline/Cargo.toml b/crates/recomp-pipeline/Cargo.toml index 5cfb673..bb94684 100644 --- a/crates/recomp-pipeline/Cargo.toml +++ b/crates/recomp-pipeline/Cargo.toml @@ -11,6 +11,7 @@ serde_json = "1.0" sha2 = "0.10" thiserror = "1.0" toml = "0.8" +lz4_flex = "0.11" [dev-dependencies] tempfile = "3.10" diff --git a/crates/recomp-pipeline/src/homebrew/intake.rs b/crates/recomp-pipeline/src/homebrew/intake.rs new file mode 100644 index 0000000..2c42f0a --- /dev/null +++ 
b/crates/recomp-pipeline/src/homebrew/intake.rs @@ -0,0 +1,647 @@ +use crate::homebrew::module::{ + BssInfo, ModuleBuild, ModuleJson, ModuleSegment, OffsetInfo, MODULE_SCHEMA_VERSION, +}; +use crate::homebrew::nro::{parse_nro, NroModule}; +use crate::homebrew::nso::{extract_segments, parse_nso, NsoModule, NsoSegmentKind}; +use crate::homebrew::romfs::{list_romfs_entries, RomfsEntry}; +use crate::homebrew::util::hex_bytes; +use crate::output::{GeneratedFile, InputSummary}; +use crate::provenance::{InputFormat, ProvenanceManifest}; +use sha2::{Digest, Sha256}; +use std::collections::BTreeMap; +use std::fs; +use std::path::{Path, PathBuf}; + +const INTAKE_SCHEMA_VERSION: &str = "1"; + +#[derive(Debug)] +pub struct IntakeOptions { + pub module_path: PathBuf, + pub nso_paths: Vec<PathBuf>, + pub provenance_path: PathBuf, + pub out_dir: PathBuf, +} + +#[derive(Debug)] +pub struct IntakeReport { + pub out_dir: PathBuf, + pub module_json_path: PathBuf, + pub manifest_path: PathBuf, + pub files_written: Vec<PathBuf>, +} + +#[derive(Debug, serde::Serialize)] +struct IntakeManifest { + schema_version: String, + tool: ToolInfo, + modules: Vec<ModuleRecord>, + assets: Vec<AssetRecord>, + inputs: Vec<InputSummary>, + generated_files: Vec<GeneratedFile>, +} + +#[derive(Debug, serde::Serialize)] +struct ToolInfo { + name: String, + version: String, +} + +#[derive(Debug, serde::Serialize)] +struct ModuleRecord { + name: String, + format: String, + build_id: String, + module_json_path: String, +} + +#[derive(Debug, serde::Serialize, Clone)] +struct AssetRecord { + kind: String, + path: String, + sha256: String, + size: u64, + source_offset: u64, + source_size: u64, +} + +pub fn intake_homebrew(options: IntakeOptions) -> Result<IntakeReport, String> { + let module_path = absolute_path(&options.module_path)?; + let nso_paths = options + .nso_paths + .iter() + .map(|path| absolute_path(path)) + .collect::<Result<Vec<_>, _>>()?; + let provenance_path = absolute_path(&options.provenance_path)?; + let out_dir = absolute_path(&options.out_dir)?; + + let provenance_src = + 
fs::read_to_string(&provenance_path).map_err(|err| format!("read provenance: {err}"))?; + let provenance = ProvenanceManifest::parse(&provenance_src)?; + let validation = provenance.validate(&provenance_path, &provenance_src)?; + + enforce_homebrew_formats(&validation.inputs)?; + ensure_input_present(&validation.inputs, &module_path)?; + for nso_path in &nso_paths { + ensure_input_present(&validation.inputs, nso_path)?; + } + + let nro = parse_nro(&module_path)?; + let nso_modules = nso_paths + .iter() + .map(|path| parse_nso(path)) + .collect::<Result<Vec<_>, _>>()?; + + fs::create_dir_all(&out_dir) + .map_err(|err| format!("create out dir {}: {err}", out_dir.display()))?; + let segments_dir = out_dir.join("segments"); + let assets_dir = out_dir.join("assets"); + fs::create_dir_all(&segments_dir).map_err(|err| format!("create segments dir: {err}"))?; + fs::create_dir_all(&assets_dir).map_err(|err| format!("create assets dir: {err}"))?; + + let (nro_build, mut generated_files, mut files_written) = + write_nro_segments(&nro, &segments_dir)?; + let mut module_builds = vec![nro_build]; + + for nso in &nso_modules { + let (build, segment_files, segment_written) = write_nso_segments(nso, &segments_dir)?; + module_builds.push(build); + generated_files.extend(segment_files); + files_written.extend(segment_written); + } + + let mut assets = Vec::new(); + let (asset_files, asset_written) = extract_assets(&nro, &out_dir, &assets_dir, &mut assets)?; + generated_files.extend(asset_files); + files_written.extend(asset_written); + + module_builds.sort_by(|a, b| a.name.cmp(&b.name)); + let module_json = ModuleJson { + schema_version: MODULE_SCHEMA_VERSION.to_string(), + module_type: "homebrew".to_string(), + modules: module_builds, + }; + + let module_json_path = out_dir.join("module.json"); + let module_json_src = + serde_json::to_string_pretty(&module_json).map_err(|err| err.to_string())?; + fs::write(&module_json_path, module_json_src.as_bytes()) + .map_err(|err| format!("write module.json: 
{err}"))?; + files_written.push(module_json_path.clone()); + generated_files.push(GeneratedFile { + path: "module.json".to_string(), + sha256: sha256_bytes(module_json_src.as_bytes()), + size: module_json_src.len() as u64, + }); + + let inputs = validation + .inputs + .iter() + .map(|input| InputSummary { + path: input.path.clone(), + format: input.format.as_str().to_string(), + sha256: input.sha256.clone(), + size: input.size, + role: input.role.clone(), + }) + .collect::<Vec<_>>(); + + let module_records = module_json + .modules + .iter() + .map(|module| ModuleRecord { + name: module.name.clone(), + format: module.format.clone(), + build_id: module.build_id.clone(), + module_json_path: "module.json".to_string(), + }) + .collect::<Vec<_>>(); + + assets.sort_by(|a, b| a.path.cmp(&b.path)); + generated_files.sort_by(|a, b| a.path.cmp(&b.path)); + + let manifest = IntakeManifest { + schema_version: INTAKE_SCHEMA_VERSION.to_string(), + tool: ToolInfo { + name: "recomp-pipeline".to_string(), + version: env!("CARGO_PKG_VERSION").to_string(), + }, + modules: module_records, + assets, + inputs, + generated_files, + }; + + let manifest_path = out_dir.join("manifest.json"); + let manifest_src = serde_json::to_string_pretty(&manifest).map_err(|err| err.to_string())?; + fs::write(&manifest_path, manifest_src.as_bytes()) + .map_err(|err| format!("write manifest.json: {err}"))?; + files_written.push(manifest_path.clone()); + + Ok(IntakeReport { + out_dir, + module_json_path, + manifest_path, + files_written, + }) +} + +fn write_nro_segments( + module: &NroModule, + segments_dir: &Path, +) -> Result<(ModuleBuild, Vec<GeneratedFile>, Vec<PathBuf>), String> { + let module_name = module + .path + .file_stem() + .and_then(|name| name.to_str()) + .unwrap_or("nro") + .to_string(); + let module_dir = segments_dir.join(&module_name); + fs::create_dir_all(&module_dir).map_err(|err| format!("create module dir: {err}"))?; + + let bytes = fs::read(&module.path) + .map_err(|err| format!("read NRO {}: {err}", 
module.path.display()))?; + let mut segments = Vec::new(); + let mut generated = Vec::new(); + let mut written = Vec::new(); + + for segment in &module.segments { + let start = segment.file_offset as usize; + let end = start + .checked_add(segment.size as usize) + .ok_or_else(|| "segment offset overflow".to_string())?; + if end > bytes.len() { + return Err(format!("NRO segment out of range: {}..{}", start, end)); + } + let data = &bytes[start..end]; + let file_name = format!("{}.bin", segment.name); + let output_rel = format!("segments/{module_name}/{file_name}"); + let output_path = module_dir.join(&file_name); + fs::write(&output_path, data).map_err(|err| format!("write segment {file_name}: {err}"))?; + written.push(output_path.clone()); + generated.push(GeneratedFile { + path: output_rel.clone(), + sha256: sha256_bytes(data), + size: data.len() as u64, + }); + segments.push(ModuleSegment { + name: segment.name.clone(), + file_offset: segment.file_offset as u64, + file_size: segment.size as u64, + memory_offset: segment.memory_offset as u64, + memory_size: segment.size as u64, + permissions: segment.permissions.as_str().to_string(), + compressed: None, + output_path: output_rel, + }); + } + + let input_sha256 = sha256_path(&module.path)?; + let input_size = bytes.len() as u64; + let bss_offset = module + .segments + .iter() + .find(|segment| segment.name == "data") + .map(|segment| segment.memory_offset as u64 + segment.size as u64) + .unwrap_or(0); + + let build = ModuleBuild { + name: module_name, + format: "nro".to_string(), + input_path: module.path.clone(), + input_sha256, + input_size, + build_id: module.build_id_hex(), + segments, + bss: BssInfo { + size: module.bss_size as u64, + memory_offset: bss_offset, + }, + embedded: None, + dynstr: None, + dynsym: None, + }; + + Ok((build, generated, written)) +} + +fn write_nso_segments( + module: &NsoModule, + segments_dir: &Path, +) -> Result<(ModuleBuild, Vec<GeneratedFile>, Vec<PathBuf>), String> { + let module_name = module + .path 
+ .file_stem() + .and_then(|name| name.to_str()) + .unwrap_or("nso") + .to_string(); + let module_dir = segments_dir.join(&module_name); + fs::create_dir_all(&module_dir).map_err(|err| format!("create module dir: {err}"))?; + + let segment_data = extract_segments(module)?; + let mut segments = Vec::new(); + let mut generated = Vec::new(); + let mut written = Vec::new(); + + for entry in segment_data { + let file_name = format!("{}.bin", segment_name(entry.segment.kind)); + let output_rel = format!("segments/{module_name}/{file_name}"); + let output_path = module_dir.join(&file_name); + fs::write(&output_path, &entry.data) + .map_err(|err| format!("write NSO segment {file_name}: {err}"))?; + written.push(output_path.clone()); + generated.push(GeneratedFile { + path: output_rel.clone(), + sha256: sha256_bytes(&entry.data), + size: entry.data.len() as u64, + }); + segments.push(ModuleSegment { + name: segment_name(entry.segment.kind).to_string(), + file_offset: entry.segment.file_offset as u64, + file_size: entry.segment.file_size as u64, + memory_offset: entry.segment.memory_offset as u64, + memory_size: entry.segment.size as u64, + permissions: entry.segment.permissions.as_str().to_string(), + compressed: Some(entry.segment.compressed), + output_path: output_rel, + }); + } + + let input_sha256 = sha256_path(&module.path)?; + let bss_offset = module + .segments + .iter() + .find(|segment| segment.kind == NsoSegmentKind::Data) + .map(|segment| segment.memory_offset as u64 + segment.size as u64) + .unwrap_or(0); + + let build = ModuleBuild { + name: module_name, + format: "nso".to_string(), + input_path: module.path.clone(), + input_sha256, + input_size: module.size, + build_id: module.module_id_hex(), + segments, + bss: BssInfo { + size: module.bss_size as u64, + memory_offset: bss_offset, + }, + embedded: Some(OffsetInfo { + offset: module.embedded_offset as u64, + size: module.embedded_size as u64, + }), + dynstr: Some(OffsetInfo { + offset: module.dynstr_offset as 
u64, + size: module.dynstr_size as u64, + }), + dynsym: Some(OffsetInfo { + offset: module.dynsym_offset as u64, + size: module.dynsym_size as u64, + }), + }; + + Ok((build, generated, written)) +} + +fn extract_assets( + module: &NroModule, + root_dir: &Path, + assets_dir: &Path, + records: &mut Vec<AssetRecord>, +) -> Result<(Vec<GeneratedFile>, Vec<PathBuf>), String> { + let Some(assets) = module.assets.clone() else { + return Ok((Vec::new(), Vec::new())); + }; + let bytes = fs::read(&module.path) + .map_err(|err| format!("read NRO {}: {err}", module.path.display()))?; + let mut generated = Vec::new(); + let mut written = Vec::new(); + + if assets.icon.size > 0 { + let (path, info) = extract_asset( + &bytes, + &assets, + assets.icon, + root_dir, + assets_dir, + "icon.bin", + "icon", + )?; + records.push(info); + generated.push(path.generated_file); + written.push(path.path); + } + + if assets.nacp.size > 0 { + let (path, info) = extract_asset( + &bytes, + &assets, + assets.nacp, + root_dir, + assets_dir, + "control.nacp", + "nacp", + )?; + if info.size != 0x4000 { + return Err(format!( + "NACP size mismatch: expected 0x4000, got {}", + info.size + )); + } + records.push(info); + generated.push(path.generated_file); + written.push(path.path); + } + + if assets.romfs.size > 0 { + let romfs_dir = assets_dir.join("romfs"); + fs::create_dir_all(&romfs_dir).map_err(|err| format!("create romfs dir: {err}"))?; + let (romfs_bytes, romfs_base_offset) = extract_asset_bytes(&bytes, &assets, assets.romfs)?; + let entries = list_romfs_entries(&romfs_bytes)?; + let (generated_entries, written_entries) = write_romfs_entries( + &romfs_bytes, + &entries, + romfs_base_offset, + root_dir, + &romfs_dir, + "romfs", + records, + )?; + generated.extend(generated_entries); + written.extend(written_entries); + } + + Ok((generated, written)) +} + +struct AssetWrite { + path: PathBuf, + generated_file: GeneratedFile, +} + +fn extract_asset( + bytes: &[u8], + assets: &crate::homebrew::nro::NroAssetHeader, + section: 
crate::homebrew::nro::NroAssetSection, + root_dir: &Path, + out_dir: &Path, + file_name: &str, + kind: &str, +) -> Result<(AssetWrite, AssetRecord), String> { + let start = assets + .base_offset + .checked_add(section.offset) + .ok_or_else(|| "asset offset overflow".to_string())? as usize; + let end = start + .checked_add(section.size as usize) + .ok_or_else(|| "asset size overflow".to_string())?; + if end > bytes.len() { + return Err(format!( + "asset out of range: {}..{} (len={})", + start, + end, + bytes.len() + )); + } + let data = &bytes[start..end]; + let out_path = out_dir.join(file_name); + fs::write(&out_path, data).map_err(|err| format!("write asset {file_name}: {err}"))?; + + let rel = out_path + .strip_prefix(root_dir) + .unwrap_or(&out_path) + .to_string_lossy() + .replace('\\', "/"); + + let generated_file = GeneratedFile { + path: rel.clone(), + sha256: sha256_bytes(data), + size: data.len() as u64, + }; + let record = AssetRecord { + kind: kind.to_string(), + path: rel, + sha256: generated_file.sha256.clone(), + size: generated_file.size, + source_offset: section.offset, + source_size: section.size, + }; + + Ok(( + AssetWrite { + path: out_path, + generated_file, + }, + record, + )) +} + +fn extract_asset_bytes( + bytes: &[u8], + assets: &crate::homebrew::nro::NroAssetHeader, + section: crate::homebrew::nro::NroAssetSection, +) -> Result<(Vec<u8>, u64), String> { + let start = assets + .base_offset + .checked_add(section.offset) + .ok_or_else(|| "asset offset overflow".to_string())? 
as usize; + let end = start + .checked_add(section.size as usize) + .ok_or_else(|| "asset size overflow".to_string())?; + if end > bytes.len() { + return Err(format!( + "asset out of range: {}..{} (len={})", + start, + end, + bytes.len() + )); + } + Ok((bytes[start..end].to_vec(), start as u64)) +} + +fn write_romfs_entries( + romfs_bytes: &[u8], + entries: &[RomfsEntry], + romfs_base_offset: u64, + root_dir: &Path, + romfs_dir: &Path, + kind: &str, + records: &mut Vec<AssetRecord>, +) -> Result<(Vec<GeneratedFile>, Vec<PathBuf>), String> { + let mut generated = Vec::new(); + let mut written = Vec::new(); + for entry in entries { + let rel_path = Path::new(&entry.path); + if rel_path.is_absolute() { + return Err(format!("romfs entry path is absolute: {}", entry.path)); + } + for component in rel_path.components() { + match component { + std::path::Component::Normal(_) => {} + _ => { + return Err(format!( + "romfs entry path contains invalid component: {}", + entry.path + )) + } + } + } + + let out_path = romfs_dir.join(rel_path); + if let Some(parent) = out_path.parent() { + fs::create_dir_all(parent) + .map_err(|err| format!("create romfs dir {}: {err}", parent.display()))?; + } + + let start = entry.data_offset as usize; + let end = start + .checked_add(entry.data_size as usize) + .ok_or_else(|| "romfs file size overflow".to_string())?; + if end > romfs_bytes.len() { + return Err(format!( + "romfs entry out of range: {}..{} (len={})", + start, + end, + romfs_bytes.len() + )); + } + let data = &romfs_bytes[start..end]; + fs::write(&out_path, data) + .map_err(|err| format!("write romfs entry {}: {err}", out_path.display()))?; + + let rel = out_path + .strip_prefix(root_dir) + .unwrap_or(&out_path) + .to_string_lossy() + .replace('\\', "/"); + let generated_file = GeneratedFile { + path: rel.clone(), + sha256: sha256_bytes(data), + size: data.len() as u64, + }; + let record = AssetRecord { + kind: kind.to_string(), + path: rel, + sha256: generated_file.sha256.clone(), + size: generated_file.size, + 
source_offset: romfs_base_offset + .checked_add(entry.data_offset) + .ok_or_else(|| "romfs source offset overflow".to_string())?, + source_size: entry.data_size, + }; + records.push(record); + generated.push(generated_file); + written.push(out_path); + } + + Ok((generated, written)) +} + +fn ensure_input_present( + inputs: &[crate::provenance::ValidatedInput], + path: &Path, +) -> Result<(), String> { + if inputs.iter().any(|input| input.path == path) { + Ok(()) + } else { + Err(format!( + "input {} not listed in provenance metadata", + path.display() + )) + } +} + +fn enforce_homebrew_formats(inputs: &[crate::provenance::ValidatedInput]) -> Result<(), String> { + let mut disallowed = BTreeMap::new(); + for input in inputs { + match input.format { + InputFormat::Nro0 | InputFormat::Nso0 => {} + other => { + disallowed + .entry(other.as_str()) + .or_insert_with(Vec::new) + .push(input.path.clone()); + } + } + } + if disallowed.is_empty() { + return Ok(()); + } + let mut message = String::from("disallowed input formats for homebrew intake:\n"); + for (format, paths) in disallowed { + for path in paths { + message.push_str(&format!("- {format}: {}\n", path.display())); + } + } + Err(message) +} + +fn segment_name(kind: NsoSegmentKind) -> &'static str { + match kind { + NsoSegmentKind::Text => "text", + NsoSegmentKind::Rodata => "rodata", + NsoSegmentKind::Data => "data", + } +} + +fn absolute_path(path: &Path) -> Result<PathBuf, String> { + if path.is_absolute() { + Ok(path.to_path_buf()) + } else { + std::env::current_dir() + .map_err(|err| err.to_string()) + .map(|cwd| cwd.join(path)) + } +} + +fn sha256_bytes(bytes: &[u8]) -> String { + let mut hasher = Sha256::new(); + hasher.update(bytes); + let digest = hasher.finalize(); + hex_bytes(&digest) +} + +fn sha256_path(path: &Path) -> Result<String, String> { + let bytes = fs::read(path).map_err(|err| err.to_string())?; + Ok(sha256_bytes(&bytes)) +} diff --git a/crates/recomp-pipeline/src/homebrew/lift.rs b/crates/recomp-pipeline/src/homebrew/lift.rs 
new file mode 100644 index 0000000..bca0141 --- /dev/null +++ b/crates/recomp-pipeline/src/homebrew/lift.rs @@ -0,0 +1,1206 @@ +use crate::homebrew::module::{ModuleJson, ModuleSegment, MODULE_SCHEMA_VERSION}; +use crate::input::{Block, Function, Module, Op, Terminator}; +use serde_json; +use std::collections::{BTreeMap, BTreeSet}; +use std::fs; +use std::path::{Path, PathBuf}; + +const MAX_BLOCK_INSTRUCTIONS: usize = 10_000; +const MAX_FUNCTION_INSTRUCTIONS: usize = 200_000; + +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum LiftMode { + Stub, + Decode, +} + +#[derive(Debug)] +pub struct LiftOptions { + pub module_json_path: PathBuf, + pub out_dir: PathBuf, + pub entry_name: String, + pub mode: LiftMode, +} + +#[derive(Debug)] +pub struct LiftReport { + pub module_json_path: PathBuf, + pub functions_emitted: usize, + pub warnings: Vec<String>, +} + +pub fn lift_homebrew(options: LiftOptions) -> Result<LiftReport, String> { + if options.entry_name.trim().is_empty() { + return Err("entry name must be non-empty".to_string()); + } + + let module_json_path = absolute_path(&options.module_json_path)?; + let out_dir = absolute_path(&options.out_dir)?; + + let module_src = fs::read_to_string(&module_json_path).map_err(|err| err.to_string())?; + let module_json: ModuleJson = + serde_json::from_str(&module_src).map_err(|err| err.to_string())?; + + validate_homebrew_module(&module_json)?; + + let base_dir = module_json_path + .parent() + .ok_or_else(|| "homebrew module.json has no parent directory".to_string())?; + + match options.mode { + LiftMode::Stub => lift_stub(&options, &module_json, base_dir, &out_dir), + LiftMode::Decode => lift_decode(&options, &module_json, base_dir, &out_dir), + } +} + +fn validate_homebrew_module(module_json: &ModuleJson) -> Result<(), String> { + if module_json.schema_version != MODULE_SCHEMA_VERSION { + return Err(format!( + "unsupported homebrew module schema version: {}", + module_json.schema_version + )); + } + if module_json.module_type != "homebrew" { + return 
Err(format!( + "unsupported module type for homebrew lifter: {}", + module_json.module_type + )); + } + if module_json.modules.is_empty() { + return Err("homebrew module list is empty".to_string()); + } + Ok(()) +} + +fn lift_stub( + options: &LiftOptions, + module_json: &ModuleJson, + base_dir: &Path, + out_dir: &Path, +) -> Result { + let mut warnings = Vec::new(); + + if module_json.modules.len() > 1 { + warnings.push(format!( + "homebrew lifter emitted a stub entry for {} modules without decoding instructions", + module_json.modules.len() + )); + } else { + warnings + .push("homebrew lifter emitted a stub entry without decoding instructions".to_string()); + } + + let segments = collect_segments(module_json, base_dir)?; + if segments.is_empty() { + warnings.push("homebrew module contains no segments".to_string()); + } + + ensure_dir(out_dir)?; + + let lifted = Module { + arch: "aarch64".to_string(), + functions: vec![Function { + name: options.entry_name.clone(), + ops: vec![Op::Ret], + blocks: Vec::new(), + }], + }; + + let output_path = out_dir.join("module.json"); + let output_json = serde_json::to_string_pretty(&lifted).map_err(|err| err.to_string())?; + fs::write(&output_path, output_json).map_err(|err| err.to_string())?; + + Ok(LiftReport { + module_json_path: output_path, + functions_emitted: 1, + warnings, + }) +} + +fn lift_decode( + options: &LiftOptions, + module_json: &ModuleJson, + base_dir: &Path, + out_dir: &Path, +) -> Result { + let mut warnings = Vec::new(); + if module_json.modules.len() > 1 { + warnings.push(format!( + "homebrew lifter decoding only the first text segment across {} modules", + module_json.modules.len() + )); + } + + let segments = collect_segments(module_json, base_dir)?; + let text_segment = find_text_segment(&segments, &mut warnings)?; + let text = load_text_segment(&text_segment, &mut warnings)?; + + let mut functions = Vec::new(); + let mut pending = BTreeSet::new(); + let mut seen = BTreeSet::new(); + let mut name_table 
= FunctionNames::new(); + name_table.seed_entry(text.base_addr, options.entry_name.clone()); + + pending.insert(text.base_addr); + + while let Some(func_addr) = pop_first(&mut pending) { + if !text.contains(func_addr) { + warnings.push(format!( + "function entry 0x{func_addr:x} is outside text segment" + )); + continue; + } + if seen.contains(&func_addr) { + continue; + } + seen.insert(func_addr); + + let func_name = name_table.name_for(func_addr); + let mut state = DecodeState::new(); + let (blocks, call_targets) = + decode_function(&text, func_addr, &mut state, &mut warnings, &mut name_table)?; + + for target in call_targets { + if !text.contains(target) { + warnings.push(format!("call target 0x{target:x} is outside text segment")); + continue; + } + pending.insert(target); + } + + functions.push(Function { + name: func_name, + ops: Vec::new(), + blocks, + }); + } + + if functions.is_empty() { + warnings.push("decoded zero functions from text segment".to_string()); + } + + ensure_dir(out_dir)?; + + let lifted = Module { + arch: "aarch64".to_string(), + functions, + }; + + let output_path = out_dir.join("module.json"); + let output_json = serde_json::to_string_pretty(&lifted).map_err(|err| err.to_string())?; + fs::write(&output_path, output_json).map_err(|err| err.to_string())?; + + Ok(LiftReport { + module_json_path: output_path, + functions_emitted: lifted.functions.len(), + warnings, + }) +} + +#[derive(Debug, Clone)] +struct SegmentInfo { + name: String, + permissions: String, + memory_offset: u64, + file_size: u64, + path: PathBuf, +} + +#[derive(Debug, Clone)] +struct TextSegment { + bytes: Vec, + base_addr: u64, +} + +impl TextSegment { + fn contains(&self, addr: u64) -> bool { + let end = self.base_addr + self.bytes.len() as u64; + addr >= self.base_addr && addr < end + } + + fn read_word(&self, addr: u64) -> Result { + if !self.contains(addr) { + return Err(format!("address 0x{addr:x} outside text segment")); + } + let offset = addr - self.base_addr; + if 
offset % 4 != 0 { + return Err(format!("unaligned instruction address 0x{addr:x}")); + } + let idx = offset as usize; + if idx + 4 > self.bytes.len() { + return Err(format!("instruction read overrun at 0x{addr:x}")); + } + Ok(u32::from_le_bytes([ + self.bytes[idx], + self.bytes[idx + 1], + self.bytes[idx + 2], + self.bytes[idx + 3], + ])) + } +} + +fn collect_segments(module_json: &ModuleJson, base_dir: &Path) -> Result, String> { + let mut segments = Vec::new(); + for module in &module_json.modules { + for segment in &module.segments { + let segment_path = resolve_segment_path(base_dir, segment); + if !segment_path.exists() { + return Err(format!( + "segment file not found: {}", + segment_path.display() + )); + } + segments.push(SegmentInfo { + name: segment.name.clone(), + permissions: segment.permissions.clone(), + memory_offset: segment.memory_offset, + file_size: segment.file_size, + path: segment_path, + }); + } + } + Ok(segments) +} + +fn find_text_segment( + segments: &[SegmentInfo], + warnings: &mut Vec, +) -> Result { + let mut candidates = segments + .iter() + .filter(|segment| segment.name == "text" || segment.permissions.contains('x')); + let first = candidates + .next() + .cloned() + .ok_or_else(|| "no executable text segment found in homebrew module".to_string())?; + if candidates.next().is_some() { + warnings.push("multiple executable segments found; using the first".to_string()); + } + Ok(first) +} + +fn load_text_segment( + segment: &SegmentInfo, + warnings: &mut Vec, +) -> Result { + let bytes = fs::read(&segment.path).map_err(|err| err.to_string())?; + if bytes.len() as u64 != segment.file_size { + warnings.push(format!( + "text segment size mismatch: manifest {} bytes, file {} bytes", + segment.file_size, + bytes.len() + )); + } + if bytes.len() % 4 != 0 { + return Err(format!( + "text segment size is not 4-byte aligned: {} bytes", + bytes.len() + )); + } + Ok(TextSegment { + bytes, + base_addr: segment.memory_offset, + }) +} + +#[derive(Debug, 
Default)] +struct DecodeState { + temp_counter: u32, + reg_values: [Option; 32], +} + +impl DecodeState { + fn new() -> Self { + Self { + temp_counter: 0, + reg_values: [None; 32], + } + } + + fn next_temp(&mut self, prefix: &str) -> String { + let name = format!("{prefix}{}", self.temp_counter); + self.temp_counter += 1; + name + } +} + +struct FunctionNames { + names: BTreeMap, +} + +impl FunctionNames { + fn new() -> Self { + Self { + names: BTreeMap::new(), + } + } + + fn name_for(&mut self, addr: u64) -> String { + if let Some(name) = self.names.get(&addr) { + return name.clone(); + } + let name = format!("fn_{addr:016x}"); + self.names.insert(addr, name.clone()); + name + } + + fn seed_entry(&mut self, addr: u64, name: String) { + self.names.insert(addr, name); + } +} + +fn decode_function( + text: &TextSegment, + entry: u64, + state: &mut DecodeState, + warnings: &mut Vec, + names: &mut FunctionNames, +) -> Result<(Vec, Vec), String> { + let mut blocks = BTreeMap::new(); + let mut pending = BTreeSet::new(); + let mut call_targets = BTreeSet::new(); + let mut decoded_count = 0_usize; + + pending.insert(entry); + + while let Some(addr) = pop_first(&mut pending) { + if blocks.contains_key(&addr) { + continue; + } + + let (block, block_targets, block_calls, block_insts) = + decode_block(text, addr, state, warnings, names)?; + decoded_count += block_insts; + if decoded_count > MAX_FUNCTION_INSTRUCTIONS { + return Err(format!( + "function decode limit exceeded ({} instructions)", + MAX_FUNCTION_INSTRUCTIONS + )); + } + + for target in block_targets { + pending.insert(target); + } + for target in block_calls { + call_targets.insert(target); + } + blocks.insert(addr, block); + } + + Ok(( + blocks.into_values().collect(), + call_targets.into_iter().collect(), + )) +} + +fn decode_block( + text: &TextSegment, + start: u64, + state: &mut DecodeState, + warnings: &mut Vec, + names: &mut FunctionNames, +) -> Result<(Block, Vec, Vec, usize), String> { + if start % 4 != 0 
{ + return Err(format!("unaligned block start 0x{start:x}")); + } + if !text.contains(start) { + return Err(format!("block start 0x{start:x} outside text segment")); + } + + let mut ops = Vec::new(); + let mut pc = start; + let mut inst_count = 0_usize; + + loop { + if inst_count >= MAX_BLOCK_INSTRUCTIONS { + return Err(format!( + "block decode limit exceeded ({} instructions)", + MAX_BLOCK_INSTRUCTIONS + )); + } + let word = text.read_word(pc)?; + let decoded = decode_one(word, pc, state, &mut ops, warnings)?; + inst_count += 1; + + match decoded { + DecodedOutcome::Continue => { + pc = pc.wrapping_add(4); + continue; + } + DecodedOutcome::Terminate(term) => { + let (terminator, block_targets, call_targets) = lower_terminator(term, names); + let block = Block { + label: block_label(start), + start, + ops, + terminator, + }; + return Ok((block, block_targets, call_targets, inst_count)); + } + } + } +} + +fn lower_terminator(term: TermInfo, names: &mut FunctionNames) -> (Terminator, Vec, Vec) { + match term { + TermInfo::Br { target } => ( + Terminator::Br { + target: block_label(target), + }, + vec![target], + Vec::new(), + ), + TermInfo::BrCond { + cond, + then_target, + else_target, + } => ( + Terminator::BrCond { + cond, + then: block_label(then_target), + else_target: block_label(else_target), + }, + vec![then_target, else_target], + Vec::new(), + ), + TermInfo::Call { target, next } => { + let name = names.name_for(target); + ( + Terminator::Call { + target: name, + next: block_label(next), + }, + vec![next], + vec![target], + ) + } + TermInfo::BrIndirect { reg } => ( + Terminator::BrIndirect { reg: reg_name(reg) }, + Vec::new(), + Vec::new(), + ), + TermInfo::Ret => (Terminator::Ret, Vec::new(), Vec::new()), + } +} + +#[derive(Debug)] +enum DecodedOutcome { + Continue, + Terminate(TermInfo), +} + +#[derive(Debug)] +enum TermInfo { + Br { + target: u64, + }, + BrCond { + cond: String, + then_target: u64, + else_target: u64, + }, + Call { + target: u64, + next: 
u64, + }, + BrIndirect { + reg: u8, + }, + Ret, +} + +fn decode_one( + word: u32, + pc: u64, + state: &mut DecodeState, + ops: &mut Vec, + warnings: &mut Vec, +) -> Result { + if word == 0xD503201F { + return Ok(DecodedOutcome::Continue); + } + if is_ret(word) { + return Ok(DecodedOutcome::Terminate(TermInfo::Ret)); + } + if let Some(reg) = decode_br_register(word) { + return Ok(DecodedOutcome::Terminate(TermInfo::BrIndirect { reg })); + } + if let Some(target) = decode_branch_imm(word, pc) { + return Ok(DecodedOutcome::Terminate(TermInfo::Br { target })); + } + if let Some(target) = decode_branch_link(word, pc) { + return Ok(DecodedOutcome::Terminate(TermInfo::Call { + target, + next: pc.wrapping_add(4), + })); + } + if let Some((cond, target)) = decode_branch_cond(word, pc)? { + return Ok(DecodedOutcome::Terminate(TermInfo::BrCond { + cond, + then_target: target, + else_target: pc.wrapping_add(4), + })); + } + + if let Some((is_nz, reg, target)) = decode_cbz(word, pc)? { + let zero = state.next_temp("imm"); + ops.push(Op::ConstI64 { + dst: zero.clone(), + imm: 0, + }); + ops.push(Op::CmpI64 { + lhs: reg_name(reg), + rhs: zero, + }); + let cond = if is_nz { "ne" } else { "eq" }; + return Ok(DecodedOutcome::Terminate(TermInfo::BrCond { + cond: cond.to_string(), + then_target: target, + else_target: pc.wrapping_add(4), + })); + } + + if let Some((is_nz, reg, bit, target)) = decode_tbz(word, pc)? 
{ + let tmp_shift = state.next_temp("tmp"); + let tmp_mask = state.next_temp("tmp"); + let tmp_one = state.next_temp("imm"); + let tmp_zero = state.next_temp("imm"); + ops.push(Op::ConstI64 { + dst: tmp_one.clone(), + imm: 1, + }); + ops.push(Op::ConstI64 { + dst: tmp_zero.clone(), + imm: 0, + }); + ops.push(Op::ConstI64 { + dst: tmp_shift.clone(), + imm: bit as i64, + }); + ops.push(Op::LsrI64 { + dst: tmp_mask.clone(), + lhs: reg_name(reg), + rhs: tmp_shift, + }); + ops.push(Op::AndI64 { + dst: tmp_mask.clone(), + lhs: tmp_mask.clone(), + rhs: tmp_one, + }); + ops.push(Op::CmpI64 { + lhs: tmp_mask, + rhs: tmp_zero, + }); + let cond = if is_nz { "ne" } else { "eq" }; + return Ok(DecodedOutcome::Terminate(TermInfo::BrCond { + cond: cond.to_string(), + then_target: target, + else_target: pc.wrapping_add(4), + })); + } + + if let Some(decoded) = decode_mov_wide(word)? { + emit_mov_wide(decoded, pc, state, ops)?; + return Ok(DecodedOutcome::Continue); + } + + if let Some(decoded) = decode_adr(word, pc)? { + ops.push(decoded); + return Ok(DecodedOutcome::Continue); + } + + if let Some(decoded) = decode_add_sub_immediate(word)? { + emit_add_sub_immediate(decoded, state, ops); + return Ok(DecodedOutcome::Continue); + } + + if let Some(decoded) = decode_add_sub_register(word)? { + emit_add_sub_register(decoded, state, ops); + return Ok(DecodedOutcome::Continue); + } + + if let Some(decoded) = decode_logical_register(word)? { + emit_logical_register(decoded, state, ops); + return Ok(DecodedOutcome::Continue); + } + + if let Some(decoded) = decode_load_store(word)? 
{ + emit_load_store(decoded, ops, warnings); + return Ok(DecodedOutcome::Continue); + } + + Err(format!( + "unsupported instruction 0x{word:08x} at 0x{pc:016x}" + )) +} + +#[derive(Debug, Clone, Copy)] +struct MovWide { + dst: u8, + kind: MovWideKind, + imm: u16, + shift: u8, + is_32: bool, +} + +#[derive(Debug, Clone, Copy)] +enum MovWideKind { + MovZ, + MovN, + MovK, +} + +fn decode_mov_wide(word: u32) -> Result, String> { + let sf = (word >> 31) & 0x1; + let opc = (word >> 29) & 0x3; + let fixed = (word >> 23) & 0x3F; + if fixed != 0b100101 { + return Ok(None); + } + let hw = ((word >> 21) & 0x3) as u8; + let imm16 = ((word >> 5) & 0xFFFF) as u16; + let rd = (word & 0x1F) as u8; + let kind = match opc { + 0b00 => MovWideKind::MovN, + 0b10 => MovWideKind::MovZ, + 0b11 => MovWideKind::MovK, + _ => return Ok(None), + }; + Ok(Some(MovWide { + dst: rd, + kind, + imm: imm16, + shift: hw, + is_32: sf == 0, + })) +} + +fn emit_mov_wide( + decoded: MovWide, + pc: u64, + state: &mut DecodeState, + ops: &mut Vec, +) -> Result<(), String> { + let shift_bits = (decoded.shift as u32) * 16; + let imm_value = (decoded.imm as u64) << shift_bits; + let value = match decoded.kind { + MovWideKind::MovZ => imm_value, + MovWideKind::MovN => !imm_value, + MovWideKind::MovK => { + let prev = state.reg_values[decoded.dst as usize].ok_or_else(|| { + format!( + "movk requires prior value for x{} at 0x{:x}", + decoded.dst, pc + ) + })?; + let mask = !(0xFFFF_u64 << shift_bits); + (prev as u64 & mask) | imm_value + } + }; + state.reg_values[decoded.dst as usize] = Some(value as i64); + ops.push(Op::ConstI64 { + dst: reg_name(decoded.dst), + imm: value as i64, + }); + if decoded.is_32 { + zero_extend_32(®_name(decoded.dst), state, ops); + } + Ok(()) +} + +fn decode_adr(word: u32, pc: u64) -> Result, String> { + let op = (word >> 31) & 0x1; + let fixed = (word >> 24) & 0x1F; + if fixed != 0b10000 { + return Ok(None); + } + let immlo = (word >> 29) & 0x3; + let immhi = (word >> 5) & 0x7FFFF; + 
let rd = (word & 0x1F) as u8; + let imm = ((immhi << 2) | immlo) as i64; + let signed = sign_extend(imm, 21); + if op == 0 { + return Ok(Some(Op::PcRel { + dst: reg_name(rd), + pc: pc as i64, + offset: signed, + })); + } + let page = (pc as i64) & !0xFFF; + let offset = signed << 12; + Ok(Some(Op::PcRel { + dst: reg_name(rd), + pc: page, + offset, + })) +} + +#[derive(Debug, Clone, Copy)] +struct AddSubImm { + dst: u8, + src: u8, + imm: u64, + is_sub: bool, + set_flags: bool, + is_32: bool, +} + +fn decode_add_sub_immediate(word: u32) -> Result, String> { + let sf = (word >> 31) & 0x1; + let op = (word >> 30) & 0x1; + let s = (word >> 29) & 0x1; + let opcode = (word >> 24) & 0x1F; + if opcode != 0b10001 { + return Ok(None); + } + let shift = (word >> 22) & 0x3; + if shift > 1 { + return Ok(None); + } + let imm12 = (word >> 10) & 0xFFF; + let rn = ((word >> 5) & 0x1F) as u8; + let rd = (word & 0x1F) as u8; + let imm = (imm12 as u64) << (shift * 12); + Ok(Some(AddSubImm { + dst: rd, + src: rn, + imm, + is_sub: op == 1, + set_flags: s == 1, + is_32: sf == 0, + })) +} + +fn emit_add_sub_immediate(decoded: AddSubImm, state: &mut DecodeState, ops: &mut Vec) { + let temp = state.next_temp("imm"); + ops.push(Op::ConstI64 { + dst: temp.clone(), + imm: decoded.imm as i64, + }); + if decoded.set_flags && decoded.dst == 31 { + if decoded.is_sub { + ops.push(Op::CmpI64 { + lhs: reg_name(decoded.src), + rhs: temp, + }); + } else { + ops.push(Op::CmnI64 { + lhs: reg_name(decoded.src), + rhs: temp, + }); + } + return; + } + if decoded.is_sub { + ops.push(Op::SubI64 { + dst: reg_name(decoded.dst), + lhs: reg_name(decoded.src), + rhs: temp, + }); + } else { + ops.push(Op::AddI64 { + dst: reg_name(decoded.dst), + lhs: reg_name(decoded.src), + rhs: temp, + }); + } + if decoded.is_32 { + zero_extend_32(®_name(decoded.dst), state, ops); + } +} + +#[derive(Debug, Clone, Copy)] +struct AddSubReg { + dst: u8, + lhs: u8, + rhs: u8, + is_sub: bool, + set_flags: bool, + is_32: bool, +} + +fn 
decode_add_sub_register(word: u32) -> Result, String> { + let sf = (word >> 31) & 0x1; + let op = (word >> 30) & 0x1; + let s = (word >> 29) & 0x1; + let opcode = (word >> 24) & 0x1F; + if opcode != 0b01011 { + return Ok(None); + } + let shift = (word >> 22) & 0x3; + let imm6 = (word >> 10) & 0x3F; + if shift != 0 || imm6 != 0 { + return Ok(None); + } + let rm = ((word >> 16) & 0x1F) as u8; + let rn = ((word >> 5) & 0x1F) as u8; + let rd = (word & 0x1F) as u8; + Ok(Some(AddSubReg { + dst: rd, + lhs: rn, + rhs: rm, + is_sub: op == 1, + set_flags: s == 1, + is_32: sf == 0, + })) +} + +fn emit_add_sub_register(decoded: AddSubReg, state: &mut DecodeState, ops: &mut Vec) { + if decoded.set_flags && decoded.dst == 31 { + if decoded.is_sub { + ops.push(Op::CmpI64 { + lhs: reg_name(decoded.lhs), + rhs: reg_name(decoded.rhs), + }); + } else { + ops.push(Op::CmnI64 { + lhs: reg_name(decoded.lhs), + rhs: reg_name(decoded.rhs), + }); + } + return; + } + if decoded.is_sub { + ops.push(Op::SubI64 { + dst: reg_name(decoded.dst), + lhs: reg_name(decoded.lhs), + rhs: reg_name(decoded.rhs), + }); + } else { + ops.push(Op::AddI64 { + dst: reg_name(decoded.dst), + lhs: reg_name(decoded.lhs), + rhs: reg_name(decoded.rhs), + }); + } + if decoded.is_32 { + zero_extend_32(®_name(decoded.dst), state, ops); + } +} + +#[derive(Debug, Clone, Copy)] +struct LogicalReg { + dst: u8, + lhs: u8, + rhs: u8, + opc: u8, + is_32: bool, +} + +fn decode_logical_register(word: u32) -> Result, String> { + let sf = (word >> 31) & 0x1; + let opc = ((word >> 29) & 0x3) as u8; + let fixed = (word >> 24) & 0x1F; + let n = (word >> 21) & 0x1; + if fixed != 0b01010 || n != 0 { + return Ok(None); + } + let shift = (word >> 22) & 0x3; + let imm6 = (word >> 10) & 0x3F; + if shift != 0 || imm6 != 0 { + return Ok(None); + } + let rm = ((word >> 16) & 0x1F) as u8; + let rn = ((word >> 5) & 0x1F) as u8; + let rd = (word & 0x1F) as u8; + Ok(Some(LogicalReg { + dst: rd, + lhs: rn, + rhs: rm, + opc, + is_32: sf == 0, + 
})) +} + +fn emit_logical_register(decoded: LogicalReg, state: &mut DecodeState, ops: &mut Vec) { + let dst = reg_name(decoded.dst); + let lhs = reg_name(decoded.lhs); + let rhs = reg_name(decoded.rhs); + + match decoded.opc { + 0b00 => { + ops.push(Op::AndI64 { + dst: dst.clone(), + lhs, + rhs, + }); + if decoded.is_32 { + zero_extend_32(&dst, state, ops); + } + } + 0b01 => { + if decoded.lhs == 31 { + ops.push(Op::MovI64 { + dst: dst.clone(), + src: rhs, + }); + } else { + ops.push(Op::OrI64 { + dst: dst.clone(), + lhs, + rhs, + }); + } + if decoded.is_32 { + zero_extend_32(&dst, state, ops); + } + } + 0b10 => { + ops.push(Op::XorI64 { + dst: dst.clone(), + lhs, + rhs, + }); + if decoded.is_32 { + zero_extend_32(&dst, state, ops); + } + } + 0b11 => { + if decoded.dst == 31 { + ops.push(Op::TestI64 { lhs, rhs }); + } else { + ops.push(Op::AndI64 { + dst: dst.clone(), + lhs, + rhs, + }); + ops.push(Op::TestI64 { + lhs: dst.clone(), + rhs: dst.clone(), + }); + if decoded.is_32 { + zero_extend_32(&dst, state, ops); + } + } + } + _ => {} + } +} + +#[derive(Debug, Clone, Copy)] +struct LoadStore { + is_load: bool, + size: u8, + base: u8, + rt: u8, + offset: u64, +} + +fn decode_load_store(word: u32) -> Result, String> { + if (word & 0x3B000000) != 0x39000000 { + return Ok(None); + } + let size = ((word >> 30) & 0x3) as u8; + let is_load = ((word >> 22) & 0x1) == 1; + let imm12 = (word >> 10) & 0xFFF; + let rn = ((word >> 5) & 0x1F) as u8; + let rt = (word & 0x1F) as u8; + let offset = (imm12 as u64) << size; + Ok(Some(LoadStore { + is_load, + size, + base: rn, + rt, + offset, + })) +} + +fn emit_load_store(decoded: LoadStore, ops: &mut Vec, warnings: &mut Vec) { + let offset = decoded.offset as i64; + let base = reg_name(decoded.base); + let reg = reg_name(decoded.rt); + let op = match (decoded.is_load, decoded.size) { + (true, 0) => Op::LoadI8 { + dst: reg, + addr: base, + offset, + }, + (true, 1) => Op::LoadI16 { + dst: reg, + addr: base, + offset, + }, + (true, 2) 
=> Op::LoadI32 { + dst: reg, + addr: base, + offset, + }, + (true, 3) => Op::LoadI64 { + dst: reg, + addr: base, + offset, + }, + (false, 0) => Op::StoreI8 { + src: reg, + addr: base, + offset, + }, + (false, 1) => Op::StoreI16 { + src: reg, + addr: base, + offset, + }, + (false, 2) => Op::StoreI32 { + src: reg, + addr: base, + offset, + }, + (false, 3) => Op::StoreI64 { + src: reg, + addr: base, + offset, + }, + _ => { + warnings.push("unsupported load/store size".to_string()); + return; + } + }; + ops.push(op); +} + +fn decode_branch_imm(word: u32, pc: u64) -> Option { + if (word & 0xFC000000) != 0x14000000 { + return None; + } + let imm26 = (word & 0x03FFFFFF) as i64; + let offset = sign_extend(imm26 << 2, 28); + Some((pc as i64 + offset) as u64) +} + +fn decode_branch_link(word: u32, pc: u64) -> Option { + if (word & 0xFC000000) != 0x94000000 { + return None; + } + let imm26 = (word & 0x03FFFFFF) as i64; + let offset = sign_extend(imm26 << 2, 28); + Some((pc as i64 + offset) as u64) +} + +fn decode_branch_cond(word: u32, pc: u64) -> Result, String> { + if (word & 0xFF000010) != 0x54000000 { + return Ok(None); + } + let cond = (word & 0xF) as u8; + let cond_str = cond_code(cond).ok_or_else(|| format!("unsupported condition {cond}"))?; + let imm19 = ((word >> 5) & 0x7FFFF) as i64; + let offset = sign_extend(imm19 << 2, 21); + Ok(Some((cond_str.to_string(), (pc as i64 + offset) as u64))) +} + +fn decode_cbz(word: u32, pc: u64) -> Result, String> { + let top = word & 0x7F000000; + let is_cbz = top == 0x34000000 || top == 0xB4000000; + let is_cbnz = top == 0x35000000 || top == 0xB5000000; + if !is_cbz && !is_cbnz { + return Ok(None); + } + let imm19 = ((word >> 5) & 0x7FFFF) as i64; + let offset = sign_extend(imm19 << 2, 21); + let rt = (word & 0x1F) as u8; + Ok(Some((is_cbnz, rt, (pc as i64 + offset) as u64))) +} + +fn decode_tbz(word: u32, pc: u64) -> Result, String> { + let top = word & 0x7F000000; + let is_tbz = top == 0x36000000 || top == 0xB6000000; + let 
is_tbnz = top == 0x37000000 || top == 0xB7000000; + if !is_tbz && !is_tbnz { + return Ok(None); + } + let b5 = ((word >> 31) & 0x1) as u8; + let b40 = ((word >> 19) & 0x1F) as u8; + let bit = (b5 << 5) | b40; + let imm14 = ((word >> 5) & 0x3FFF) as i64; + let offset = sign_extend(imm14 << 2, 16); + let rt = (word & 0x1F) as u8; + Ok(Some((is_tbnz, rt, bit, (pc as i64 + offset) as u64))) +} + +fn decode_br_register(word: u32) -> Option { + if (word & 0xFFFFFC1F) != 0xD61F0000 { + return None; + } + Some(((word >> 5) & 0x1F) as u8) +} + +fn is_ret(word: u32) -> bool { + (word & 0xFFFFFC1F) == 0xD65F0000 +} + +fn cond_code(code: u8) -> Option<&'static str> { + match code { + 0x0 => Some("eq"), + 0x1 => Some("ne"), + 0x2 => Some("cs"), + 0x3 => Some("cc"), + 0x4 => Some("mi"), + 0x5 => Some("pl"), + 0x6 => Some("vs"), + 0x7 => Some("vc"), + 0x8 => Some("hi"), + 0x9 => Some("ls"), + 0xA => Some("ge"), + 0xB => Some("lt"), + 0xC => Some("gt"), + 0xD => Some("le"), + 0xE => Some("al"), + _ => None, + } +} + +fn sign_extend(value: i64, bits: u8) -> i64 { + let shift = 64 - bits as i64; + (value << shift) >> shift +} + +fn zero_extend_32(dst: &str, state: &mut DecodeState, ops: &mut Vec) { + let mask = state.next_temp("imm"); + ops.push(Op::ConstI64 { + dst: mask.clone(), + imm: 0xFFFF_FFFF, + }); + ops.push(Op::AndI64 { + dst: dst.to_string(), + lhs: dst.to_string(), + rhs: mask, + }); +} + +fn reg_name(reg: u8) -> String { + format!("x{reg}") +} + +fn block_label(addr: u64) -> String { + format!("bb_{addr:016x}") +} + +fn pop_first(set: &mut BTreeSet) -> Option { + let first = set.iter().next().copied(); + if let Some(value) = first { + set.remove(&value); + } + first +} + +fn resolve_segment_path(base_dir: &Path, segment: &ModuleSegment) -> PathBuf { + let path = Path::new(&segment.output_path); + if path.is_absolute() { + path.to_path_buf() + } else { + base_dir.join(path) + } +} + +fn ensure_dir(path: &Path) -> Result<(), String> { + 
fs::create_dir_all(path).map_err(|err| err.to_string()) +} + +fn absolute_path(path: &Path) -> Result { + if path.is_absolute() { + Ok(path.to_path_buf()) + } else { + std::env::current_dir() + .map_err(|err| err.to_string()) + .map(|cwd| cwd.join(path)) + } +} diff --git a/crates/recomp-pipeline/src/homebrew/mod.rs b/crates/recomp-pipeline/src/homebrew/mod.rs new file mode 100644 index 0000000..05a6905 --- /dev/null +++ b/crates/recomp-pipeline/src/homebrew/mod.rs @@ -0,0 +1,13 @@ +pub mod intake; +pub mod lift; +pub mod module; +pub mod nro; +pub mod nso; +pub mod romfs; +mod util; + +pub use intake::{intake_homebrew, IntakeOptions, IntakeReport}; +pub use lift::{lift_homebrew, LiftMode, LiftOptions, LiftReport}; +pub use module::{ModuleBuild, ModuleJson, ModuleSegment, ModuleWriteReport}; +pub use nro::{NroAssetHeader, NroModule, NroSegment}; +pub use nso::{NsoModule, NsoSegment, NsoSegmentKind, NsoSegmentPermissions}; diff --git a/crates/recomp-pipeline/src/homebrew/module.rs b/crates/recomp-pipeline/src/homebrew/module.rs new file mode 100644 index 0000000..c73f946 --- /dev/null +++ b/crates/recomp-pipeline/src/homebrew/module.rs @@ -0,0 +1,60 @@ +use serde::{Deserialize, Serialize}; +use std::path::PathBuf; + +pub const MODULE_SCHEMA_VERSION: &str = "1"; + +#[derive(Debug, Serialize, Deserialize, Clone)] +pub struct ModuleJson { + pub schema_version: String, + pub module_type: String, + pub modules: Vec, +} + +#[derive(Debug, Serialize, Deserialize, Clone)] +pub struct ModuleBuild { + pub name: String, + pub format: String, + pub input_path: PathBuf, + pub input_sha256: String, + pub input_size: u64, + pub build_id: String, + pub segments: Vec, + pub bss: BssInfo, + #[serde(skip_serializing_if = "Option::is_none")] + pub embedded: Option, + #[serde(skip_serializing_if = "Option::is_none")] + pub dynstr: Option, + #[serde(skip_serializing_if = "Option::is_none")] + pub dynsym: Option, +} + +#[derive(Debug, Serialize, Deserialize, Clone)] +pub struct 
ModuleSegment { + pub name: String, + pub file_offset: u64, + pub file_size: u64, + pub memory_offset: u64, + pub memory_size: u64, + pub permissions: String, + #[serde(skip_serializing_if = "Option::is_none")] + pub compressed: Option, + pub output_path: String, +} + +#[derive(Debug, Serialize, Deserialize, Clone)] +pub struct BssInfo { + pub size: u64, + pub memory_offset: u64, +} + +#[derive(Debug, Serialize, Deserialize, Clone)] +pub struct OffsetInfo { + pub offset: u64, + pub size: u64, +} + +#[derive(Debug)] +pub struct ModuleWriteReport { + pub module_json_path: PathBuf, + pub segment_paths: Vec, +} diff --git a/crates/recomp-pipeline/src/homebrew/nro.rs b/crates/recomp-pipeline/src/homebrew/nro.rs new file mode 100644 index 0000000..963f4d9 --- /dev/null +++ b/crates/recomp-pipeline/src/homebrew/nro.rs @@ -0,0 +1,226 @@ +use crate::homebrew::util::{find_magic, hex_bytes, read_bytes, read_u32, read_u64}; +use std::fs; +use std::path::{Path, PathBuf}; + +const NRO_MAGIC: u32 = 0x304F524E; // "NRO0" +const NRO_HEADER_MAGIC_OFFSET: usize = 0x10; +const NRO_HEADER_MIN_SIZE: usize = 0x80; +const ASSET_MAGIC: u32 = 0x54455341; // "ASET" + +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum NroSegmentPermissions { + Rx, + R, + Rw, +} + +impl NroSegmentPermissions { + pub fn as_str(self) -> &'static str { + match self { + NroSegmentPermissions::Rx => "r-x", + NroSegmentPermissions::R => "r--", + NroSegmentPermissions::Rw => "rw-", + } + } +} + +#[derive(Debug, Clone)] +pub struct NroSegment { + pub name: String, + pub file_offset: u32, + pub size: u32, + pub memory_offset: u32, + pub permissions: NroSegmentPermissions, +} + +#[derive(Debug, Clone, Copy)] +pub struct NroAssetSection { + pub offset: u64, + pub size: u64, +} + +#[derive(Debug, Clone)] +pub struct NroAssetHeader { + pub icon: NroAssetSection, + pub nacp: NroAssetSection, + pub romfs: NroAssetSection, + pub base_offset: u64, +} + +#[derive(Debug, Clone)] +pub struct NroModule { + pub path: PathBuf, 
+ pub size: u32, + pub segments: Vec, + pub bss_size: u32, + pub build_id: [u8; 0x20], + pub assets: Option, +} + +impl NroModule { + pub fn build_id_hex(&self) -> String { + hex_bytes(&self.build_id) + } +} + +pub fn parse_nro(path: &Path) -> Result { + let bytes = fs::read(path).map_err(|err| format!("read NRO {}: {err}", path.display()))?; + let magic_offset = + find_magic(&bytes, NRO_MAGIC, 0x80).ok_or_else(|| "NRO magic not found".to_string())?; + let header_start = magic_offset + .checked_sub(NRO_HEADER_MAGIC_OFFSET) + .ok_or_else(|| "NRO header offset underflow".to_string())?; + if bytes.len() < header_start + NRO_HEADER_MIN_SIZE { + return Err("NRO header truncated".to_string()); + } + + let size = read_u32(&bytes, header_start + 0x18).map_err(|err| err.to_string())?; + let text_mem_offset = read_u32(&bytes, header_start + 0x20).map_err(|err| err.to_string())?; + let text_size = read_u32(&bytes, header_start + 0x24).map_err(|err| err.to_string())?; + let ro_mem_offset = read_u32(&bytes, header_start + 0x28).map_err(|err| err.to_string())?; + let ro_size = read_u32(&bytes, header_start + 0x2C).map_err(|err| err.to_string())?; + let data_mem_offset = read_u32(&bytes, header_start + 0x30).map_err(|err| err.to_string())?; + let data_size = read_u32(&bytes, header_start + 0x34).map_err(|err| err.to_string())?; + let bss_size = read_u32(&bytes, header_start + 0x38).map_err(|err| err.to_string())?; + let build_id = read_bytes(&bytes, header_start + 0x40, 0x20) + .map_err(|err| err.to_string())? 
+        .try_into()
+        .map_err(|_| "NRO build id length mismatch".to_string())?;
+
+    let segments = match parse_segments_libnx(&bytes, magic_offset) {
+        Some(mut segments) => {
+            if segments.iter().all(|seg| {
+                let end = seg.file_offset.saturating_add(seg.size);
+                end as usize <= bytes.len()
+            }) {
+                segments[0].memory_offset = text_mem_offset;
+                segments[1].memory_offset = ro_mem_offset;
+                segments[2].memory_offset = data_mem_offset;
+                segments
+            } else {
+                synthesize_segments(
+                    header_start,
+                    text_mem_offset,
+                    text_size,
+                    ro_mem_offset,
+                    ro_size,
+                    data_mem_offset,
+                    data_size,
+                )
+            }
+        }
+        None => synthesize_segments(
+            header_start,
+            text_mem_offset,
+            text_size,
+            ro_mem_offset,
+            ro_size,
+            data_mem_offset,
+            data_size,
+        ),
+    };
+
+    let assets = parse_assets(&bytes, size as usize);
+
+    Ok(NroModule {
+        path: path.to_path_buf(),
+        size,
+        segments,
+        bss_size,
+        build_id,
+        assets,
+    })
+}
+
+fn parse_segments_libnx(bytes: &[u8], magic_offset: usize) -> Option<Vec<NroSegment>> {
+    let mut segments = Vec::new();
+    let offsets = [0x10, 0x18, 0x20];
+    let names = ["text", "rodata", "data"];
+    let perms = [
+        NroSegmentPermissions::Rx,
+        NroSegmentPermissions::R,
+        NroSegmentPermissions::Rw,
+    ];
+    for ((offset, name), perm) in offsets.iter().zip(names.iter()).zip(perms.iter()) {
+        let file_offset = read_u32(bytes, magic_offset + offset).ok()?;
+        let size = read_u32(bytes, magic_offset + offset + 0x4).ok()?;
+        segments.push(NroSegment {
+            name: name.to_string(),
+            file_offset,
+            size,
+            memory_offset: 0,
+            permissions: *perm,
+        });
+    }
+    Some(segments)
+}
+
+fn synthesize_segments(
+    header_start: usize,
+    text_mem_offset: u32,
+    text_size: u32,
+    ro_mem_offset: u32,
+    ro_size: u32,
+    data_mem_offset: u32,
+    data_size: u32,
+) -> Vec<NroSegment> {
+    let text_file_offset = (header_start + NRO_HEADER_MIN_SIZE) as u32;
+    let ro_file_offset = text_file_offset.saturating_add(text_size);
+    let data_file_offset = ro_file_offset.saturating_add(ro_size);
+
+    vec![
+        NroSegment {
+            name: "text".to_string(),
+            file_offset: text_file_offset,
+            size: text_size,
+            memory_offset: text_mem_offset,
+            permissions: NroSegmentPermissions::Rx,
+        },
+        NroSegment {
+            name: "rodata".to_string(),
+            file_offset: ro_file_offset,
+            size: ro_size,
+            memory_offset: ro_mem_offset,
+            permissions: NroSegmentPermissions::R,
+        },
+        NroSegment {
+            name: "data".to_string(),
+            file_offset: data_file_offset,
+            size: data_size,
+            memory_offset: data_mem_offset,
+            permissions: NroSegmentPermissions::Rw,
+        },
+    ]
+}
+
+fn parse_assets(bytes: &[u8], offset: usize) -> Option<NroAssetHeader> {
+    if offset + 0x38 > bytes.len() {
+        return None;
+    }
+    let magic = read_u32(bytes, offset).ok()?;
+    if magic != ASSET_MAGIC {
+        return None;
+    }
+    let icon_offset = read_u64(bytes, offset + 0x8).ok()?;
+    let icon_size = read_u64(bytes, offset + 0x10).ok()?;
+    let nacp_offset = read_u64(bytes, offset + 0x18).ok()?;
+    let nacp_size = read_u64(bytes, offset + 0x20).ok()?;
+    let romfs_offset = read_u64(bytes, offset + 0x28).ok()?;
+    let romfs_size = read_u64(bytes, offset + 0x30).ok()?;
+
+    Some(NroAssetHeader {
+        icon: NroAssetSection {
+            offset: icon_offset,
+            size: icon_size,
+        },
+        nacp: NroAssetSection {
+            offset: nacp_offset,
+            size: nacp_size,
+        },
+        romfs: NroAssetSection {
+            offset: romfs_offset,
+            size: romfs_size,
+        },
+        base_offset: offset as u64,
+    })
+}
diff --git a/crates/recomp-pipeline/src/homebrew/nso.rs b/crates/recomp-pipeline/src/homebrew/nso.rs
new file mode 100644
index 0000000..dee6aab
--- /dev/null
+++ b/crates/recomp-pipeline/src/homebrew/nso.rs
@@ -0,0 +1,190 @@
+use crate::homebrew::util::{hex_bytes, parse_error, read_bytes, read_u32};
+use lz4_flex::block::decompress;
+use std::fs;
+use std::path::{Path, PathBuf};
+
+const NSO_MAGIC: u32 = 0x304F534E; // "NSO0"
+const NSO_HEADER_SIZE: usize = 0x100;
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum NsoSegmentKind {
+    Text,
+    Rodata,
+    Data,
+}
+
+#[derive(Debug, Clone, Copy, PartialEq, Eq)]
+pub enum NsoSegmentPermissions {
+    Rx,
+    R,
+    Rw,
+}
+
+impl NsoSegmentPermissions {
+    pub fn as_str(self) -> &'static str {
+        match self {
+            NsoSegmentPermissions::Rx => "r-x",
+            NsoSegmentPermissions::R => "r--",
+            NsoSegmentPermissions::Rw => "rw-",
+        }
+    }
+}
+
+#[derive(Debug, Clone)]
+pub struct NsoSegment {
+    pub kind: NsoSegmentKind,
+    pub file_offset: u32,
+    pub memory_offset: u32,
+    pub size: u32,
+    pub file_size: u32,
+    pub compressed: bool,
+    pub permissions: NsoSegmentPermissions,
+}
+
+#[derive(Debug, Clone)]
+pub struct NsoModule {
+    pub path: PathBuf,
+    pub size: u64,
+    pub segments: Vec<NsoSegment>,
+    pub bss_size: u32,
+    pub module_id: [u8; 32],
+    pub embedded_offset: u32,
+    pub embedded_size: u32,
+    pub dynstr_offset: u32,
+    pub dynstr_size: u32,
+    pub dynsym_offset: u32,
+    pub dynsym_size: u32,
+}
+
+impl NsoModule {
+    pub fn module_id_hex(&self) -> String {
+        hex_bytes(&self.module_id)
+    }
+}
+
+#[derive(Debug, Clone)]
+pub struct NsoSegmentData {
+    pub segment: NsoSegment,
+    pub data: Vec<u8>,
+}
+
+pub fn parse_nso(path: &Path) -> Result<NsoModule, String> {
+    let bytes = fs::read(path).map_err(|err| format!("read NSO {}: {err}", path.display()))?;
+    if bytes.len() < NSO_HEADER_SIZE {
+        return Err(format!("NSO too small: {} bytes", bytes.len()));
+    }
+    let magic = read_u32(&bytes, 0).map_err(|err| err.to_string())?;
+    if magic != NSO_MAGIC {
+        return Err(format!("NSO magic mismatch: {magic:#x}"));
+    }
+
+    let flags = read_u32(&bytes, 0x8).map_err(|err| err.to_string())?;
+    let text = read_segment(&bytes, 0x10, NsoSegmentKind::Text)?;
+    let rodata = read_segment(&bytes, 0x20, NsoSegmentKind::Rodata)?;
+    let data = read_segment(&bytes, 0x30, NsoSegmentKind::Data)?;
+
+    let file_sizes = [
+        read_u32(&bytes, 0x60).map_err(|err| err.to_string())?,
+        read_u32(&bytes, 0x64).map_err(|err| err.to_string())?,
+        read_u32(&bytes, 0x68).map_err(|err| err.to_string())?,
+    ];
+
+    let mut segments = vec![text, rodata, data];
+    for (segment, file_size) in segments.iter_mut().zip(file_sizes) {
+        segment.file_size = file_size;
+        segment.compressed = match segment.kind {
+            NsoSegmentKind::Text => flags & 0x1 != 0,
+            NsoSegmentKind::Rodata => flags & 0x2 != 0,
+            NsoSegmentKind::Data => flags & 0x4 != 0,
+        };
+    }
+
+    let bss_size = read_u32(&bytes, 0x3C).map_err(|err| err.to_string())?;
+    let embedded_offset = read_u32(&bytes, 0x70).map_err(|err| err.to_string())?;
+    let embedded_size = read_u32(&bytes, 0x74).map_err(|err| err.to_string())?;
+    let dynstr_offset = read_u32(&bytes, 0x78).map_err(|err| err.to_string())?;
+    let dynstr_size = read_u32(&bytes, 0x7C).map_err(|err| err.to_string())?;
+    let dynsym_offset = read_u32(&bytes, 0x80).map_err(|err| err.to_string())?;
+    let dynsym_size = read_u32(&bytes, 0x84).map_err(|err| err.to_string())?;
+
+    let module_id = read_bytes(&bytes, 0x40, 0x20)
+        .map_err(|err| err.to_string())?
+        .try_into()
+        .map_err(|_| "NSO module id length mismatch".to_string())?;
+
+    Ok(NsoModule {
+        path: path.to_path_buf(),
+        size: bytes.len() as u64,
+        segments,
+        bss_size,
+        module_id,
+        embedded_offset,
+        embedded_size,
+        dynstr_offset,
+        dynstr_size,
+        dynsym_offset,
+        dynsym_size,
+    })
+}
+
+pub fn extract_segments(module: &NsoModule) -> Result<Vec<NsoSegmentData>, String> {
+    let bytes = fs::read(&module.path)
+        .map_err(|err| format!("read NSO {}: {err}", module.path.display()))?;
+    let mut out = Vec::new();
+    for segment in &module.segments {
+        let start = segment.file_offset as usize;
+        let file_size = segment.file_size as usize;
+        let end = start
+            .checked_add(file_size)
+            .ok_or_else(|| "segment offset overflow".to_string())?;
+        if end > bytes.len() {
+            return Err(format!(
+                "NSO segment out of range: {}..{} for {}",
+                start,
+                end,
+                module.path.display()
+            ));
+        }
+        let data = &bytes[start..end];
+        let decoded = if segment.compressed {
+            decompress(data, segment.size as usize)
+                .map_err(|err| parse_error(format!("lz4 decode failed: {err}")))
+                .map_err(|err| err.to_string())?
+        } else {
+            data.to_vec()
+        };
+        if decoded.len() != segment.size as usize {
+            return Err(format!(
+                "NSO segment size mismatch: expected {}, got {}",
+                segment.size,
+                decoded.len()
+            ));
+        }
+        out.push(NsoSegmentData {
+            segment: segment.clone(),
+            data: decoded,
+        });
+    }
+    Ok(out)
+}
+
+fn read_segment(bytes: &[u8], offset: usize, kind: NsoSegmentKind) -> Result<NsoSegment, String> {
+    let file_offset = read_u32(bytes, offset).map_err(|err| err.to_string())?;
+    let memory_offset = read_u32(bytes, offset + 0x4).map_err(|err| err.to_string())?;
+    let size = read_u32(bytes, offset + 0x8).map_err(|err| err.to_string())?;
+    let permissions = match kind {
+        NsoSegmentKind::Text => NsoSegmentPermissions::Rx,
+        NsoSegmentKind::Rodata => NsoSegmentPermissions::R,
+        NsoSegmentKind::Data => NsoSegmentPermissions::Rw,
+    };
+
+    Ok(NsoSegment {
+        kind,
+        file_offset,
+        memory_offset,
+        size,
+        file_size: 0,
+        compressed: false,
+        permissions,
+    })
+}
diff --git a/crates/recomp-pipeline/src/homebrew/romfs.rs b/crates/recomp-pipeline/src/homebrew/romfs.rs
new file mode 100644
index 0000000..de84d7b
--- /dev/null
+++ b/crates/recomp-pipeline/src/homebrew/romfs.rs
@@ -0,0 +1,462 @@
+use std::collections::HashSet;
+
+const ROMFS_HEADER_SIZE: u64 = 0x50;
+const ROMFS_INVALID_OFFSET: u32 = 0xFFFF_FFFF;
+
+#[derive(Debug, Clone)]
+pub struct RomfsEntry {
+    pub path: String,
+    pub data_offset: u64,
+    pub data_size: u64,
+}
+
+#[derive(Debug, Clone)]
+struct RomfsHeader {
+    dir_table_off: u64,
+    dir_table_size: u64,
+    file_table_off: u64,
+    file_table_size: u64,
+    file_data_off: u64,
+}
+
+#[derive(Debug, Clone)]
+struct DirEntry {
+    sibling: u32,
+    child_dir: u32,
+    child_file: u32,
+    name: String,
+}
+
+#[derive(Debug, Clone)]
+struct FileEntry {
+    sibling: u32,
+    data_off: u64,
+    data_size: u64,
+    name: String,
+}
+
+pub fn list_romfs_entries(bytes: &[u8]) -> Result<Vec<RomfsEntry>, String> {
+    let header = parse_header(bytes)?;
+    let mut entries = Vec::new();
+    let mut dir_stack = Vec::new();
+    let mut visited_dirs = HashSet::new();
+    let mut visited_files = HashSet::new();
+    let mut seen_paths = HashSet::new();
+
+    dir_stack.push((0u32, Vec::<String>::new()));
+
+    while let Some((dir_off, parent_path)) = dir_stack.pop() {
+        if !visited_dirs.insert(dir_off) {
+            return Err(format!("romfs directory loop at offset {dir_off}"));
+        }
+        let dir = read_dir_entry(bytes, &header, dir_off)?;
+        let mut current_path = parent_path.clone();
+        if !dir.name.is_empty() {
+            current_path.push(dir.name);
+        }
+
+        if dir.child_file != ROMFS_INVALID_OFFSET {
+            extract_files(
+                bytes,
+                &header,
+                dir.child_file,
+                &current_path,
+                &mut visited_files,
+                &mut seen_paths,
+                &mut entries,
+            )?;
+        }
+
+        if dir.sibling != ROMFS_INVALID_OFFSET {
+            dir_stack.push((dir.sibling, parent_path));
+        }
+        if dir.child_dir != ROMFS_INVALID_OFFSET {
+            dir_stack.push((dir.child_dir, current_path));
+        }
+    }
+
+    Ok(entries)
+}
+
+fn extract_files(
+    bytes: &[u8],
+    header: &RomfsHeader,
+    start_off: u32,
+    dir_components: &[String],
+    visited_files: &mut HashSet<u32>,
+    seen_paths: &mut HashSet<String>,
+    entries: &mut Vec<RomfsEntry>,
+) -> Result<(), String> {
+    let mut file_off = start_off;
+    while file_off != ROMFS_INVALID_OFFSET {
+        if !visited_files.insert(file_off) {
+            return Err(format!("romfs file loop at offset {file_off}"));
+        }
+        let file = read_file_entry(bytes, header, file_off)?;
+        if file.name.is_empty() {
+            return Err(format!("romfs file entry {file_off} has empty name"));
+        }
+        let mut components = dir_components.to_vec();
+        components.push(file.name);
+        let path = components.join("/");
+        if !seen_paths.insert(path.clone()) {
+            return Err(format!("romfs duplicate path {path}"));
+        }
+
+        let data_offset = header
+            .file_data_off
+            .checked_add(file.data_off)
+            .ok_or_else(|| "romfs data offset overflow".to_string())?;
+        let data_end = data_offset
+            .checked_add(file.data_size)
+            .ok_or_else(|| "romfs data size overflow".to_string())?;
+        if data_end > bytes.len() as u64 {
+            return Err(format!(
+                "romfs file data out of range: {}..{} (len={})",
+                data_offset,
+                data_end,
+                bytes.len()
+            ));
+        }
+
+        entries.push(RomfsEntry {
+            path,
+            data_offset,
+            data_size: file.data_size,
+        });
+
+        file_off = file.sibling;
+    }
+
+    Ok(())
+}
+
+fn parse_header(bytes: &[u8]) -> Result<RomfsHeader, String> {
+    if bytes.len() < ROMFS_HEADER_SIZE as usize {
+        return Err(format!("romfs image too small: {} bytes", bytes.len()));
+    }
+    let header_size = read_u64(bytes, 0x0, "header_size")?;
+    if header_size != ROMFS_HEADER_SIZE {
+        return Err(format!(
+            "romfs header size mismatch: expected 0x50, got 0x{header_size:x}"
+        ));
+    }
+
+    let dir_table_off = read_u64(bytes, 0x18, "dir_table_off")?;
+    let dir_table_size = read_u64(bytes, 0x20, "dir_table_size")?;
+    let file_table_off = read_u64(bytes, 0x38, "file_table_off")?;
+    let file_table_size = read_u64(bytes, 0x40, "file_table_size")?;
+    let file_data_off = read_u64(bytes, 0x48, "file_data_off")?;
+
+    validate_range(bytes, dir_table_off, dir_table_size, "dir_table")?;
+    validate_range(bytes, file_table_off, file_table_size, "file_table")?;
+    if file_data_off > bytes.len() as u64 {
+        return Err(format!(
+            "romfs file data offset out of range: {file_data_off} (len={})",
+            bytes.len()
+        ));
+    }
+
+    Ok(RomfsHeader {
+        dir_table_off,
+        dir_table_size,
+        file_table_off,
+        file_table_size,
+        file_data_off,
+    })
+}
+
+fn read_dir_entry(bytes: &[u8], header: &RomfsHeader, dir_off: u32) -> Result<DirEntry, String> {
+    let dir_off = dir_off as u64;
+    if dir_off >= header.dir_table_size {
+        return Err(format!(
+            "romfs dir offset out of range: {dir_off} (size={})",
+            header.dir_table_size
+        ));
+    }
+    let entry_off = header
+        .dir_table_off
+        .checked_add(dir_off)
+        .ok_or_else(|| "romfs dir offset overflow".to_string())?;
+    let entry_off = entry_off as usize;
+    let sibling = read_u32(bytes, entry_off + 0x4, "dir_sibling")?;
+    let child_dir = read_u32(bytes, entry_off + 0x8, "dir_child_dir")?;
+    let child_file = read_u32(bytes, entry_off + 0xC, "dir_child_file")?;
+    let name_len = read_u32(bytes, entry_off + 0x14, "dir_name_len")? as usize;
+    let name_off = entry_off + 0x18;
+    let name = read_name(bytes, name_off, name_len, "dir")?;
+
+    Ok(DirEntry {
+        sibling,
+        child_dir,
+        child_file,
+        name,
+    })
+}
+
+fn read_file_entry(bytes: &[u8], header: &RomfsHeader, file_off: u32) -> Result<FileEntry, String> {
+    let file_off = file_off as u64;
+    if file_off >= header.file_table_size {
+        return Err(format!(
+            "romfs file offset out of range: {file_off} (size={})",
+            header.file_table_size
+        ));
+    }
+    let entry_off = header
+        .file_table_off
+        .checked_add(file_off)
+        .ok_or_else(|| "romfs file offset overflow".to_string())?;
+    let entry_off = entry_off as usize;
+    let sibling = read_u32(bytes, entry_off + 0x4, "file_sibling")?;
+    let data_off = read_u64(bytes, entry_off + 0x8, "file_data_off")?;
+    let data_size = read_u64(bytes, entry_off + 0x10, "file_data_size")?;
+    let name_len = read_u32(bytes, entry_off + 0x1C, "file_name_len")? as usize;
+    let name_off = entry_off + 0x20;
+    let name = read_name(bytes, name_off, name_len, "file")?;
+
+    Ok(FileEntry {
+        sibling,
+        data_off,
+        data_size,
+        name,
+    })
+}
+
+fn read_name(bytes: &[u8], offset: usize, len: usize, kind: &str) -> Result<String, String> {
+    if len == 0 {
+        return Ok(String::new());
+    }
+    let end = offset
+        .checked_add(len)
+        .ok_or_else(|| format!("romfs {kind} name overflow"))?;
+    if end > bytes.len() {
+        return Err(format!(
+            "romfs {kind} name out of range: {}..{} (len={})",
+            offset,
+            end,
+            bytes.len()
+        ));
+    }
+    let name_bytes = &bytes[offset..end];
+    let terminator = name_bytes.iter().position(|b| *b == 0).unwrap_or(len);
+    let name_bytes = &name_bytes[..terminator];
+    let name = String::from_utf8(name_bytes.to_vec())
+        .map_err(|_| format!("romfs {kind} name is not valid UTF-8"))?;
+    if name.contains('/') || name.contains('\\') {
+        return Err(format!("romfs {kind} name has path separator: {name}"));
+    }
+    if name == "." || name == ".." {
+        return Err(format!("romfs {kind} name is invalid: {name}"));
+    }
+    Ok(name)
+}
+
+fn read_u32(bytes: &[u8], offset: usize, label: &str) -> Result<u32, String> {
+    let end = offset + 4;
+    if end > bytes.len() {
+        return Err(format!(
+            "romfs read {label} out of range: {}..{} (len={})",
+            offset,
+            end,
+            bytes.len()
+        ));
+    }
+    let mut buf = [0u8; 4];
+    buf.copy_from_slice(&bytes[offset..end]);
+    Ok(u32::from_le_bytes(buf))
+}
+
+fn read_u64(bytes: &[u8], offset: usize, label: &str) -> Result<u64, String> {
+    let end = offset + 8;
+    if end > bytes.len() {
+        return Err(format!(
+            "romfs read {label} out of range: {}..{} (len={})",
+            offset,
+            end,
+            bytes.len()
+        ));
+    }
+    let mut buf = [0u8; 8];
+    buf.copy_from_slice(&bytes[offset..end]);
+    Ok(u64::from_le_bytes(buf))
+}
+
+fn validate_range(bytes: &[u8], offset: u64, size: u64, label: &str) -> Result<(), String> {
+    let end = offset
+        .checked_add(size)
+        .ok_or_else(|| format!("romfs {label} range overflow"))?;
+    if end > bytes.len() as u64 {
+        return Err(format!(
+            "romfs {label} out of range: {}..{} (len={})",
+            offset,
+            end,
+            bytes.len()
+        ));
+    }
+    Ok(())
+}
+
+#[cfg(test)]
+mod tests {
+    use super::list_romfs_entries;
+
+    fn align_up(value: usize, align: usize) -> usize {
+        value.div_ceil(align) * align
+    }
+
+    fn write_u64(bytes: &mut [u8], offset: usize, value: u64) {
+        bytes[offset..offset + 8].copy_from_slice(&value.to_le_bytes());
+    }
+
+    fn push_dir_entry(
+        buf: &mut Vec<u8>,
+        parent: u32,
+        sibling: u32,
+        child_dir: u32,
+        child_file: u32,
+        next_hash: u32,
+        name: &str,
+    ) -> u32 {
+        let offset = buf.len() as u32;
+        buf.extend_from_slice(&parent.to_le_bytes());
+        buf.extend_from_slice(&sibling.to_le_bytes());
+        buf.extend_from_slice(&child_dir.to_le_bytes());
+        buf.extend_from_slice(&child_file.to_le_bytes());
+        buf.extend_from_slice(&next_hash.to_le_bytes());
+        buf.extend_from_slice(&(name.len() as u32).to_le_bytes());
+        buf.extend_from_slice(name.as_bytes());
+        while buf.len() % 4 != 0 {
+            buf.push(0);
+        }
+        offset
+    }
+
+    fn push_file_entry(
+        buf: &mut Vec<u8>,
+        parent: u32,
+        sibling: u32,
+        data_off: u64,
+        data_size: u64,
+        next_hash: u32,
+        name: &str,
+    ) -> u32 {
+        let offset = buf.len() as u32;
+        buf.extend_from_slice(&parent.to_le_bytes());
+        buf.extend_from_slice(&sibling.to_le_bytes());
+        buf.extend_from_slice(&data_off.to_le_bytes());
+        buf.extend_from_slice(&data_size.to_le_bytes());
+        buf.extend_from_slice(&next_hash.to_le_bytes());
+        buf.extend_from_slice(&(name.len() as u32).to_le_bytes());
+        buf.extend_from_slice(name.as_bytes());
+        while buf.len() % 4 != 0 {
+            buf.push(0);
+        }
+        offset
+    }
+
+    fn build_romfs_image() -> Vec<u8> {
+        let file_root = b"HELLO";
+        let file_nested = b"NESTED";
+        let nested_dir = "data";
+        let root_name = "";
+
+        let root_entry_size = align_up(0x18 + root_name.len(), 4);
+        let nested_entry_off = root_entry_size as u32;
+        let nested_entry_size = align_up(0x18 + nested_dir.len(), 4);
+        let dir_table_size = root_entry_size + nested_entry_size;
+
+        let file_root_name = "hello.txt";
+        let file_nested_name = "nested.bin";
+        let file_root_entry_size = align_up(0x20 + file_root_name.len(), 4);
+        let file_nested_off = file_root_entry_size as u32;
+        let file_nested_entry_size = align_up(0x20 + file_nested_name.len(), 4);
+        let file_table_size = file_root_entry_size + file_nested_entry_size;
+
+        let file_root_data_off = 0u64;
+        let file_nested_data_off = align_up(file_root.len(), 0x10) as u64;
+        let mut file_data = Vec::new();
+        file_data.extend_from_slice(file_root);
+        let padding = align_up(file_data.len(), 0x10) - file_data.len();
+        file_data.extend(std::iter::repeat_n(0u8, padding));
+        file_data.extend_from_slice(file_nested);
+
+        let mut dir_table = Vec::new();
+        push_dir_entry(
+            &mut dir_table,
+            0xFFFF_FFFF,
+            0xFFFF_FFFF,
+            nested_entry_off,
+            0,
+            0xFFFF_FFFF,
+            root_name,
+        );
+        push_dir_entry(
+            &mut dir_table,
+            0,
+            0xFFFF_FFFF,
+            0xFFFF_FFFF,
+            file_nested_off,
+            0xFFFF_FFFF,
+            nested_dir,
+        );
+
+        assert_eq!(dir_table.len(), dir_table_size);
+
+        let mut file_table = Vec::new();
+        push_file_entry(
+            &mut file_table,
+            0,
+            0xFFFF_FFFF,
+            file_root_data_off,
+            file_root.len() as u64,
+            0xFFFF_FFFF,
+            file_root_name,
+        );
+        push_file_entry(
+            &mut file_table,
+            nested_entry_off,
+            0xFFFF_FFFF,
+            file_nested_data_off,
+            file_nested.len() as u64,
+            0xFFFF_FFFF,
+            file_nested_name,
+        );
+
+        assert_eq!(file_table.len(), file_table_size);
+
+        let header_size = 0x50usize;
+        let dir_table_off = align_up(header_size, 0x10);
+        let file_table_off = align_up(dir_table_off + dir_table_size, 0x10);
+        let file_data_off = align_up(file_table_off + file_table_size, 0x10);
+        let total_size = file_data_off + file_data.len();
+
+        let mut image = vec![0u8; total_size];
+        write_u64(&mut image, 0x0, 0x50);
+        write_u64(&mut image, 0x8, dir_table_off as u64);
+        write_u64(&mut image, 0x10, 0);
+        write_u64(&mut image, 0x18, dir_table_off as u64);
+        write_u64(&mut image, 0x20, dir_table_size as u64);
+        write_u64(&mut image, 0x28, file_table_off as u64);
+        write_u64(&mut image, 0x30, 0);
+        write_u64(&mut image, 0x38, file_table_off as u64);
+        write_u64(&mut image, 0x40, file_table_size as u64);
+        write_u64(&mut image, 0x48, file_data_off as u64);
+
+        image[dir_table_off..dir_table_off + dir_table_size].copy_from_slice(&dir_table);
+        image[file_table_off..file_table_off + file_table_size].copy_from_slice(&file_table);
+        image[file_data_off..file_data_off + file_data.len()].copy_from_slice(&file_data);
+        image
+    }
+
+    #[test]
+    fn list_romfs_entries_emits_paths() {
+        let image = build_romfs_image();
+        let mut entries = list_romfs_entries(&image).expect("list entries");
+        entries.sort_by(|a, b| a.path.cmp(&b.path));
+        let paths = entries
+            .iter()
+            .map(|entry| entry.path.as_str())
+            .collect::<Vec<_>>();
+        assert_eq!(paths, vec!["data/nested.bin", "hello.txt"]);
+    }
+}
diff --git a/crates/recomp-pipeline/src/homebrew/util.rs b/crates/recomp-pipeline/src/homebrew/util.rs
new file mode 100644
index 0000000..dfe0eab
--- /dev/null
+++ b/crates/recomp-pipeline/src/homebrew/util.rs
@@ -0,0 +1,82 @@
+use std::fmt;
+
+pub fn read_u32(bytes: &[u8], offset: usize) -> Result<u32, String> {
+    let end = offset
+        .checked_add(4)
+        .ok_or_else(|| "offset overflow".to_string())?;
+    if end > bytes.len() {
+        return Err(format!(
+            "read_u32 out of range: offset={offset} len={}",
+            bytes.len()
+        ));
+    }
+    Ok(u32::from_le_bytes(
+        bytes[offset..end].try_into().expect("slice length"),
+    ))
+}
+
+pub fn read_u64(bytes: &[u8], offset: usize) -> Result<u64, String> {
+    let end = offset
+        .checked_add(8)
+        .ok_or_else(|| "offset overflow".to_string())?;
+    if end > bytes.len() {
+        return Err(format!(
+            "read_u64 out of range: offset={offset} len={}",
+            bytes.len()
+        ));
+    }
+    Ok(u64::from_le_bytes(
+        bytes[offset..end].try_into().expect("slice length"),
+    ))
+}
+
+pub fn read_bytes(bytes: &[u8], offset: usize, size: usize) -> Result<&[u8], String> {
+    let end = offset
+        .checked_add(size)
+        .ok_or_else(|| "offset overflow".to_string())?;
+    if end > bytes.len() {
+        return Err(format!(
+            "read_bytes out of range: offset={offset} size={size} len={}",
+            bytes.len()
+        ));
+    }
+    Ok(&bytes[offset..end])
+}
+
+pub fn hex_bytes(bytes: &[u8]) -> String {
+    use std::fmt::Write;
+
+    let mut out = String::with_capacity(bytes.len() * 2);
+    for byte in bytes {
+        let _ = write!(&mut out, "{byte:02x}");
+    }
+    out
+}
+
+pub fn find_magic(bytes: &[u8], magic: u32, search_len: usize) -> Option<usize> {
+    let target = magic.to_le_bytes();
+    let len = bytes.len().min(search_len);
+    bytes
+        .windows(4)
+        .take(len.saturating_sub(3))
+        .position(|window| window == target)
+}
+
+#[derive(Debug)]
+pub struct ParseError {
+    pub context: String,
+}
+
+impl fmt::Display for ParseError {
+    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
+        write!(f, "{}", self.context)
+    }
+}
+
+impl std::error::Error for ParseError {}
+
+pub fn parse_error(context: impl Into<String>) -> ParseError {
+    ParseError {
+        context: context.into(),
+    }
+}
diff --git a/crates/recomp-pipeline/src/input.rs b/crates/recomp-pipeline/src/input.rs
index 6066061..6978474 100644
--- a/crates/recomp-pipeline/src/input.rs
+++ b/crates/recomp-pipeline/src/input.rs
@@ -1,18 +1,30 @@
-use serde::Deserialize;
+use serde::{Deserialize, Serialize};
 
-#[derive(Debug, Deserialize)]
+#[derive(Debug, Deserialize, Serialize)]
 pub struct Module {
     pub arch: String,
     pub functions: Vec<Function>,
 }
 
-#[derive(Debug, Deserialize)]
+#[derive(Debug, Deserialize, Serialize)]
 pub struct Function {
     pub name: String,
+    #[serde(default)]
     pub ops: Vec<Op>,
+    #[serde(default)]
+    pub blocks: Vec<Block>,
 }
 
-#[derive(Debug, Deserialize)]
+#[derive(Debug, Deserialize, Serialize)]
+pub struct Block {
+    pub label: String,
+    pub start: u64,
+    #[serde(default)]
+    pub ops: Vec<Op>,
+    pub terminator: Terminator,
+}
+
+#[derive(Debug, Deserialize, Serialize)]
 #[serde(tag = "op", rename_all = "snake_case")]
 pub enum Op {
     ConstI64 {
@@ -24,6 +36,122 @@ pub enum Op {
         lhs: String,
         rhs: String,
     },
+    MovI64 {
+        dst: String,
+        src: String,
+    },
+    SubI64 {
+        dst: String,
+        lhs: String,
+        rhs: String,
+    },
+    AndI64 {
+        dst: String,
+        lhs: String,
+        rhs: String,
+    },
+    OrI64 {
+        dst: String,
+        lhs: String,
+        rhs: String,
+    },
+    XorI64 {
+        dst: String,
+        lhs: String,
+        rhs: String,
+    },
+    CmpI64 {
+        lhs: String,
+        rhs: String,
+    },
+    CmnI64 {
+        lhs: String,
+        rhs: String,
+    },
+    TestI64 {
+        lhs: String,
+        rhs: String,
+    },
+    LslI64 {
+        dst: String,
+        lhs: String,
+        rhs: String,
+    },
+    LsrI64 {
+        dst: String,
+        lhs: String,
+        rhs: String,
+    },
+    AsrI64 {
+        dst: String,
+        lhs: String,
+        rhs: String,
+    },
+    PcRel {
+        dst: String,
+        pc: i64,
+        offset: i64,
+    },
+    LoadI8 {
+        dst: String,
+        addr: String,
+        #[serde(default)]
+        offset: i64,
+    },
+    LoadI16 {
+        dst: String,
+        addr: String,
+        #[serde(default)]
+        offset: i64,
+    },
+    LoadI32 {
+        dst: String,
+        addr: String,
+        #[serde(default)]
+        offset: i64,
+    },
+    LoadI64 {
+        dst: String,
+        addr: String,
+        #[serde(default)]
+        offset: i64,
+    },
+    StoreI8 {
+        src: String,
+        addr: String,
+        #[serde(default)]
+        offset: i64,
+    },
+    StoreI16 {
+        src: String,
+        addr: String,
+        #[serde(default)]
+        offset: i64,
+    },
+    StoreI32 {
+        src: String,
+        addr: String,
+        #[serde(default)]
+        offset: i64,
+    },
+    StoreI64 {
+        src: String,
+        addr: String,
+        #[serde(default)]
+        offset: i64,
+    },
+    Br {
+        target: String,
+    },
+    BrCond {
+        cond: String,
+        then: String,
+        #[serde(rename = "else")]
+        else_target: String,
+    },
+    Call {
+        target: String,
+    },
     Syscall {
         name: String,
         args: Vec<String>,
@@ -31,6 +159,28 @@ pub enum Op {
     Ret,
 }
 
+#[derive(Debug, Deserialize, Serialize)]
+#[serde(tag = "op", rename_all = "snake_case")]
+pub enum Terminator {
+    Br {
+        target: String,
+    },
+    BrCond {
+        cond: String,
+        then: String,
+        #[serde(rename = "else")]
+        else_target: String,
+    },
+    Call {
+        target: String,
+        next: String,
+    },
+    BrIndirect {
+        reg: String,
+    },
+    Ret,
+}
+
 impl Module {
     pub fn validate_arch(&self) -> Result<(), String> {
         if self.arch != "aarch64" {
diff --git a/crates/recomp-pipeline/src/lib.rs b/crates/recomp-pipeline/src/lib.rs
index 09f948f..4071d4c 100644
--- a/crates/recomp-pipeline/src/lib.rs
+++ b/crates/recomp-pipeline/src/lib.rs
@@ -1,5 +1,6 @@
 pub mod bundle;
 pub mod config;
+pub mod homebrew;
 pub mod input;
 pub mod output;
 pub mod pipeline;
diff --git a/crates/recomp-pipeline/src/output.rs b/crates/recomp-pipeline/src/output.rs
index d1ae1db..69e7026 100644
--- a/crates/recomp-pipeline/src/output.rs
+++ b/crates/recomp-pipeline/src/output.rs
@@ -1,4 +1,4 @@
-use crate::pipeline::{ensure_dir, RustFunction, RustProgram};
+use crate::pipeline::{ensure_dir, FunctionBody, RustFunction, RustProgram, RustTerminator};
 use serde::Serialize;
 use sha2::{Digest, Sha256};
 use std::fs;
@@ -198,26 +198,106 @@ fn emit_function(function: &RustFunction) -> String {
     for reg in &function.regs {
         out.push_str(&format!("    let mut {reg}: i64 = 0;\n"));
     }
-    if !function.regs.is_empty() {
-        out.push('\n');
+    if function.needs_flags {
+        out.push_str("    let mut flag_n = false;\n");
+        out.push_str("    let mut flag_z = false;\n");
+        out.push_str("    let mut flag_c = false;\n");
+        out.push_str("    let mut flag_v = false;\n");
     }
-    for line in &function.lines {
-        out.push_str("    ");
-        out.push_str(line);
+    if !function.regs.is_empty() || function.needs_flags {
         out.push('\n');
     }
-    if function
-        .lines
-        .last()
-        .map(|line| !line.trim_start().starts_with("return"))
-        .unwrap_or(true)
-    {
-        out.push_str("    Ok(())\n");
+    match &function.body {
+        FunctionBody::Linear(lines) => {
+            for line in lines {
+                out.push_str("    ");
+                out.push_str(line);
+                out.push('\n');
+            }
+            if lines
+                .last()
+                .map(|line| !line.trim_start().starts_with("return"))
+                .unwrap_or(true)
+            {
+                out.push_str("    Ok(())\n");
+            }
+        }
+        FunctionBody::Blocks(blocks) => {
+            let entry = blocks
+                .first()
+                .map(|block| block.label.as_str())
+                .unwrap_or("entry");
+            out.push_str(&format!("    let mut block_label = \"{entry}\";\n"));
+            out.push_str("    loop {\n");
+            out.push_str("        match block_label {\n");
+            for block in blocks {
+                out.push_str(&format!("            \"{}\" => {{\n", block.label));
+                for line in &block.lines {
+                    out.push_str("                ");
+                    out.push_str(line);
+                    out.push('\n');
+                }
+                emit_block_terminator(&mut out, &block.terminator);
+                out.push_str("            },\n");
+            }
+            out.push_str("            _ => {\n");
+            out.push_str("                panic!(\"unknown block label: {}\", block_label);\n");
+            out.push_str("            }\n");
+            out.push_str("        }\n");
+            out.push_str("    }\n");
+        }
     }
     out.push_str("}\n");
     out
 }
 
+fn emit_block_terminator(out: &mut String, terminator: &RustTerminator) {
+    match terminator {
+        RustTerminator::Br { target } => {
+            out.push_str(&format!("                block_label = \"{target}\";\n"));
+            out.push_str("                continue;\n");
+        }
+        RustTerminator::BrCond {
+            cond_expr,
+            cond,
+            then_label,
+            else_label,
+        } => {
+            if let Some(expr) = cond_expr {
+                out.push_str(&format!("                if {expr} {{\n"));
+                out.push_str(&format!(
+                    "                    block_label = \"{then_label}\";\n"
+                ));
+                out.push_str("                } else {\n");
+                out.push_str(&format!(
+                    "                    block_label = \"{else_label}\";\n"
+                ));
+                out.push_str("                }\n");
+                out.push_str("                continue;\n");
+            } else {
+                out.push_str(&format!(
+                    "                panic!(\"unsupported condition: {cond}\");\n"
+                ));
+            }
+        }
+        RustTerminator::Call { call_line, next } => {
+            out.push_str("                ");
+            out.push_str(call_line);
+            out.push('\n');
+            out.push_str(&format!("                block_label = \"{next}\";\n"));
+            out.push_str("                continue;\n");
+        }
+        RustTerminator::BrIndirect { reg } => {
+            out.push_str(&format!(
+                "                panic!(\"indirect branch via {reg} is not supported\");\n"
+            ));
+        }
+        RustTerminator::Ret => {
+            out.push_str("                return Ok(());\n");
+        }
+    }
+}
+
 fn sanitize_name(name: &str) -> String {
     name.chars()
         .map(|ch| match ch {
diff --git a/crates/recomp-pipeline/src/pipeline.rs b/crates/recomp-pipeline/src/pipeline.rs
index 0b63c1a..cd3afc5 100644
--- a/crates/recomp-pipeline/src/pipeline.rs
+++ b/crates/recomp-pipeline/src/pipeline.rs
@@ -1,5 +1,6 @@
 use crate::config::{PerformanceMode, StubBehavior, TitleConfig};
-use crate::input::{Function, Module, Op};
+use crate::homebrew::ModuleJson;
+use crate::input::{Block, Function, Module, Op, Terminator};
 use crate::output::{emit_project, BuildManifest, GeneratedFile, InputSummary};
 use crate::provenance::{ProvenanceManifest, ValidatedInput};
 use pathdiff::diff_paths;
@@ -45,9 +46,18 @@ pub fn run_pipeline(options: PipelineOptions) -> Result {
+    let module = match parse_module_source(&module_src)? {
+        ModuleSource::Lifted(module) => {
+            module.validate_arch().map_err(PipelineError::Module)?;
+            module
+        }
+        ModuleSource::Homebrew(module_json) => {
+            return Err(PipelineError::Module(format!(
+                "homebrew module.json detected (schema_version={}, module_type={}). Run the lifter to produce a lifted module.json before translation.",
+                module_json.schema_version, module_json.module_type
+            )));
+        }
+    };
 
     let config_src = fs::read_to_string(&config_path)?;
     let config = TitleConfig::parse(&config_src).map_err(PipelineError::Config)?;
@@ -108,6 +118,40 @@ pub fn run_pipeline(options: PipelineOptions) -> Result {
+fn parse_module_source(module_src: &str) -> Result<ModuleSource, PipelineError> {
+    let value: serde_json::Value = serde_json::from_str(module_src)
+        .map_err(|err| PipelineError::Module(format!("invalid module JSON: {err}")))?;
+    if looks_like_homebrew_module(&value) {
+        let module_json: ModuleJson = serde_json::from_value(value)
+            .map_err(|err| PipelineError::Module(format!("invalid homebrew module JSON: {err}")))?;
+        return Ok(ModuleSource::Homebrew(module_json));
+    }
+    let module: Module = serde_json::from_value(value)
+        .map_err(|err| PipelineError::Module(format!("invalid module JSON: {err}")))?;
+    Ok(ModuleSource::Lifted(module))
+}
+
+fn looks_like_homebrew_module(value: &serde_json::Value) -> bool {
+    value
+        .get("schema_version")
+        .and_then(|value| value.as_str())
+        .is_some()
+        && value
+            .get("module_type")
+            .and_then(|value| value.as_str())
+            .is_some()
+        && value
+            .get("modules")
+            .and_then(|value| value.as_array())
+            .is_some()
+}
+
 fn translate_module(module: &Module, config: &TitleConfig) -> Result<RustProgram, PipelineError> {
     let entry = config.entry.clone();
     let mut functions = Vec::new();
@@ -128,44 +172,16 @@ fn translate_function(
     function: &Function,
     config: &TitleConfig,
 ) -> Result<RustFunction, PipelineError> {
-    let mut regs = Vec::new();
-    let mut lines = Vec::new();
-
-    for op in &function.ops {
-        match op {
-            Op::ConstI64 { dst, imm } => {
-                track_reg(&mut regs, dst);
-                lines.push(format!("{dst} = {imm};"));
-            }
-            Op::AddI64 { dst, lhs, rhs } => {
-                track_reg(&mut regs, dst);
-                track_reg(&mut regs, lhs);
-                track_reg(&mut regs, rhs);
-                lines.push(format!("{dst} = {lhs} + {rhs};"));
-            }
-            Op::Syscall { name, args } => {
-                for arg in args {
-                    track_reg(&mut regs, arg);
-                }
-                let behavior = config
-                    .stubs
-                    .get(name)
-                    .copied()
-                    .unwrap_or(StubBehavior::Panic);
-                let call = render_syscall(name, behavior, args);
-                lines.push(call);
-            }
-            Op::Ret => {
-                lines.push("return Ok(());".to_string());
-            }
-        }
+    if !function.blocks.is_empty() {
+        translate_block_function(function, config)
+    } else if !function.ops.is_empty() {
+        translate_linear_function(function, config)
+    } else {
+        Err(PipelineError::Module(format!(
+            "function {} has no ops or blocks",
+            function.name
+        )))
     }
-
-    Ok(RustFunction {
-        name: function.name.clone(),
-        regs,
-        lines,
-    })
 }
 
 fn track_reg(regs: &mut Vec<String>, name: &str) {
@@ -189,11 +205,348 @@ fn render_syscall(name: &str, behavior: StubBehavior, args: &[String]) -> String
     }
 }
 
+fn translate_linear_function(
+    function: &Function,
+    config: &TitleConfig,
+) -> Result<RustFunction, PipelineError> {
+    let mut regs = Vec::new();
+    let mut lines = Vec::new();
+    let mut needs_flags = false;
+
+    for op in &function.ops {
+        translate_op(op, config, &mut regs, &mut lines, &mut needs_flags)?;
+    }
+
+    Ok(RustFunction {
+        name: function.name.clone(),
+        regs,
+        needs_flags,
+        body: FunctionBody::Linear(lines),
+    })
+}
+
+fn translate_block_function(
+    function: &Function,
+    config: &TitleConfig,
+) -> Result<RustFunction, PipelineError> {
+    let mut regs = Vec::new();
+    let mut blocks = Vec::new();
+    let mut needs_flags = false;
+
+    for block in &function.blocks {
+        let mut lines = Vec::new();
+        for op in &block.ops {
+            translate_op(op, config, &mut regs, &mut lines, &mut needs_flags)?;
+        }
+        let terminator = translate_terminator(block, &mut needs_flags)?;
+        blocks.push(RustBlock {
+            label: block.label.clone(),
+            lines,
+            terminator,
+        });
+    }
+
+    Ok(RustFunction {
+        name: function.name.clone(),
+        regs,
+        needs_flags,
+        body: FunctionBody::Blocks(blocks),
+    })
+}
+
+fn translate_op(
+    op: &Op,
+    config: &TitleConfig,
+    regs: &mut Vec<String>,
+    lines: &mut Vec<String>,
+    needs_flags: &mut bool,
+) -> Result<(), PipelineError> {
+    match op {
+        Op::ConstI64 { dst, imm } => {
+            track_reg(regs, dst);
+            lines.push(format!("{dst} = {imm};"));
+        }
+        Op::AddI64 { dst, lhs, rhs } => {
+            track_reg(regs, dst);
+            track_reg(regs, lhs);
+            track_reg(regs, rhs);
+            lines.push(format!("{dst} = {lhs} + {rhs};"));
+        }
+        Op::MovI64 { dst, src } => {
+            track_reg(regs, dst);
+            track_reg(regs, src);
+            lines.push(format!("{dst} = {src};"));
+        }
+        Op::SubI64 { dst, lhs, rhs } => {
+            track_reg(regs, dst);
+            track_reg(regs, lhs);
+            track_reg(regs, rhs);
+            lines.push(format!("{dst} = {lhs} - {rhs};"));
+        }
+        Op::AndI64 { dst, lhs, rhs } => {
+            track_reg(regs, dst);
+            track_reg(regs, lhs);
+            track_reg(regs, rhs);
+            lines.push(format!("{dst} = {lhs} & {rhs};"));
+        }
+        Op::OrI64 { dst, lhs, rhs } => {
+            track_reg(regs, dst);
+            track_reg(regs, lhs);
+            track_reg(regs, rhs);
+            lines.push(format!("{dst} = {lhs} | {rhs};"));
+        }
+        Op::XorI64 { dst, lhs, rhs } => {
+            track_reg(regs, dst);
+            track_reg(regs, lhs);
+            track_reg(regs, rhs);
+            lines.push(format!("{dst} = {lhs} ^ {rhs};"));
+        }
+        Op::CmpI64 { lhs, rhs } => {
+            track_reg(regs, lhs);
+            track_reg(regs, rhs);
+            *needs_flags = true;
+            lines.push(format!(
+                "let (__recomp_cmp_res, __recomp_cmp_overflow) = {lhs}.overflowing_sub({rhs});"
+            ));
+            lines.push(format!(
+                "let (_, __recomp_cmp_borrow) = ({lhs} as u64).overflowing_sub({rhs} as u64);"
+            ));
+            lines.push("flag_n = __recomp_cmp_res < 0;".to_string());
+            lines.push("flag_z = __recomp_cmp_res == 0;".to_string());
+            lines.push("flag_c = !__recomp_cmp_borrow;".to_string());
+            lines.push("flag_v = __recomp_cmp_overflow;".to_string());
+        }
+        Op::CmnI64 { lhs, rhs } => {
+            track_reg(regs, lhs);
+            track_reg(regs, rhs);
+            *needs_flags = true;
+            lines.push(format!(
+                "let (__recomp_cmn_res, __recomp_cmn_overflow) = {lhs}.overflowing_add({rhs});"
+            ));
+            lines.push(format!(
+                "let (_, __recomp_cmn_carry) = ({lhs} as u64).overflowing_add({rhs} as u64);"
+            ));
+            lines.push("flag_n = __recomp_cmn_res < 0;".to_string());
+            lines.push("flag_z = __recomp_cmn_res == 0;".to_string());
+            lines.push("flag_c = __recomp_cmn_carry;".to_string());
+            lines.push("flag_v = __recomp_cmn_overflow;".to_string());
+        }
+        Op::TestI64 { lhs, rhs } => {
+            track_reg(regs, lhs);
+            track_reg(regs, rhs);
+            *needs_flags = true;
+            lines.push(format!("let __recomp_tst_res = {lhs} & {rhs};"));
+            lines.push("flag_n = __recomp_tst_res < 0;".to_string());
+            lines.push("flag_z = __recomp_tst_res == 0;".to_string());
+            lines.push("flag_c = false;".to_string());
+            lines.push("flag_v = false;".to_string());
+        }
+        Op::LslI64 { dst, lhs, rhs } => {
+            track_reg(regs, dst);
+            track_reg(regs, lhs);
+            track_reg(regs, rhs);
+            lines.push(format!(
+                "{dst} = (({lhs} as u64) << (({rhs} as u64) & 63)) as i64;"
+            ));
+        }
+        Op::LsrI64 { dst, lhs, rhs } => {
+            track_reg(regs, dst);
+            track_reg(regs, lhs);
+            track_reg(regs, rhs);
+            lines.push(format!(
+                "{dst} = (({lhs} as u64) >> (({rhs} as u64) & 63)) as i64;"
+            ));
+        }
+        Op::AsrI64 { dst, lhs, rhs } => {
+            track_reg(regs, dst);
+            track_reg(regs, lhs);
+            track_reg(regs, rhs);
+            lines.push(format!("{dst} = {lhs} >> (({rhs} as u64) & 63);"));
+        }
+        Op::PcRel { dst, pc, offset } => {
+            track_reg(regs, dst);
+            lines.push(format!("{dst} = {pc} + {offset};"));
+        }
+        Op::LoadI8 { dst, addr, .. }
+        | Op::LoadI16 { dst, addr, .. }
+        | Op::LoadI32 { dst, addr, .. }
+        | Op::LoadI64 { dst, addr, .. } => {
+            track_reg(regs, dst);
+            track_reg(regs, addr);
+            lines.push(format!(
+                "panic!({});",
+                rust_string_literal("load op not supported in runtime")
+            ));
+        }
+        Op::StoreI8 { src, addr, .. }
+        | Op::StoreI16 { src, addr, .. }
+        | Op::StoreI32 { src, addr, .. }
+        | Op::StoreI64 { src, addr, .. } => {
+            track_reg(regs, src);
+            track_reg(regs, addr);
+            lines.push(format!(
+                "panic!({});",
+                rust_string_literal("store op not supported in runtime")
+            ));
+        }
+        Op::Br { target } => {
+            lines.push(format!(
+                "panic!({});",
+                rust_string_literal(&format!("control-flow op in linear IR: br to {target}"))
+            ));
+        }
+        Op::BrCond { cond, ..
} => { + *needs_flags = true; + lines.push(format!( + "panic!({});", + rust_string_literal(&format!("control-flow op in linear IR: br_cond {cond}")) + )); + } + Op::Call { target } => { + let call_line = render_call_line(target); + lines.push(call_line); + } + Op::Syscall { name, args } => { + for arg in args { + track_reg(regs, arg); + } + let behavior = config + .stubs + .get(name) + .copied() + .unwrap_or(StubBehavior::Panic); + let call = render_syscall(name, behavior, args); + lines.push(call); + } + Op::Ret => { + lines.push("return Ok(());".to_string()); + } + } + Ok(()) +} + +fn translate_terminator( + block: &Block, + needs_flags: &mut bool, +) -> Result<RustTerminator, PipelineError> { + match &block.terminator { + Terminator::Br { target } => Ok(RustTerminator::Br { + target: target.clone(), + }), + Terminator::BrCond { + cond, + then, + else_target, + } => { + *needs_flags = true; + let cond_expr = render_cond_expr(cond); + Ok(RustTerminator::BrCond { + cond_expr, + cond: cond.clone(), + then_label: then.clone(), + else_label: else_target.clone(), + }) + } + Terminator::Call { target, next } => { + let call_line = render_call_line(target); + Ok(RustTerminator::Call { + call_line, + next: next.clone(), + }) + } + Terminator::BrIndirect { reg } => Ok(RustTerminator::BrIndirect { reg: reg.clone() }), + Terminator::Ret => Ok(RustTerminator::Ret), + } +} + +fn render_call_line(target: &str) -> String { + if is_rust_ident(target) { + format!("{target}()?;") + } else { + format!( + "panic!({});", + rust_string_literal(&format!("unsupported call target: {target}")) + ) + } +} + +fn render_cond_expr(cond: &str) -> Option<String> { + let expr = match cond { + "eq" => "flag_z", + "ne" => "!flag_z", + "cs" | "hs" => "flag_c", + "cc" | "lo" => "!flag_c", + "mi" => "flag_n", + "pl" => "!flag_n", + "vs" => "flag_v", + "vc" => "!flag_v", + "hi" => "flag_c && !flag_z", + "ls" => "!flag_c || flag_z", + "ge" => "flag_n == flag_v", + "lt" => "flag_n != flag_v", + "gt" => "!flag_z && (flag_n == flag_v)", + "le"
=> "flag_z || (flag_n != flag_v)", + "al" => "true", + _ => return None, + }; + Some(expr.to_string()) +} + +fn is_rust_ident(name: &str) -> bool { + let mut chars = name.chars(); + let Some(first) = chars.next() else { + return false; + }; + if !(first == '_' || first.is_ascii_alphabetic()) { + return false; + } + chars.all(|ch| ch == '_' || ch.is_ascii_alphanumeric()) +} + +fn rust_string_literal(value: &str) -> String { + format!("{value:?}") +} + #[derive(Debug)] pub struct RustFunction { pub name: String, pub regs: Vec, + pub needs_flags: bool, + pub body: FunctionBody, +} + +#[derive(Debug)] +pub enum FunctionBody { + Linear(Vec), + Blocks(Vec), +} + +#[derive(Debug)] +pub struct RustBlock { + pub label: String, pub lines: Vec, + pub terminator: RustTerminator, +} + +#[derive(Debug)] +pub enum RustTerminator { + Br { + target: String, + }, + BrCond { + cond_expr: Option, + cond: String, + then_label: String, + else_label: String, + }, + Call { + call_line: String, + next: String, + }, + BrIndirect { + reg: String, + }, + Ret, } #[derive(Debug)] diff --git a/crates/recomp-pipeline/src/provenance.rs b/crates/recomp-pipeline/src/provenance.rs index edd8c35..ee369e1 100644 --- a/crates/recomp-pipeline/src/provenance.rs +++ b/crates/recomp-pipeline/src/provenance.rs @@ -1,7 +1,6 @@ use serde::Deserialize; use sha2::{Digest, Sha256}; use std::fs; -use std::io::Read; use std::path::{Path, PathBuf}; const SCHEMA_VERSION: &str = "1"; @@ -245,24 +244,33 @@ pub fn detect_format(path: &Path) -> Result { } } - let mut file = - fs::File::open(path).map_err(|err| format!("failed to open {}: {err}", path.display()))?; - let mut magic = [0u8; 4]; - file.read_exact(&mut magic) - .map_err(|err| format!("failed to read {}: {err}", path.display()))?; - - match &magic { + let bytes = + fs::read(path).map_err(|err| format!("failed to read {}: {err}", path.display()))?; + if bytes.len() < 4 { + return Err(format!( + "unsupported input format (too small: {} bytes) for {}", + 
bytes.len(), + path.display() + )); + } + let magic = &bytes[0..4]; + match magic { b"NCA3" | b"NCA2" => Ok(InputFormat::Nca), b"PFS0" => Ok(InputFormat::Exefs), b"NSO0" => Ok(InputFormat::Nso0), b"NRO0" => Ok(InputFormat::Nro0), b"NRR0" => Ok(InputFormat::Nrr0), b"META" | b"NPDM" => Ok(InputFormat::Npdm), - _ => Err(format!( - "unsupported input format (magic {:?}) for {}", - magic, - path.display() - )), + _ => { + if bytes.len() >= 0x14 && &bytes[0x10..0x14] == b"NRO0" { + return Ok(InputFormat::Nro0); + } + Err(format!( + "unsupported input format (magic {:?}) for {}", + magic, + path.display() + )) + } + } } } diff --git a/crates/recomp-pipeline/tests/homebrew_intake.rs b/crates/recomp-pipeline/tests/homebrew_intake.rs new file mode 100644 index 0000000..5575f54 --- /dev/null +++ b/crates/recomp-pipeline/tests/homebrew_intake.rs @@ -0,0 +1,362 @@ +use recomp_pipeline::homebrew::{intake_homebrew, IntakeOptions}; +use sha2::{Digest, Sha256}; +use std::fs; +use std::path::{Path, PathBuf}; +use tempfile::tempdir; + +fn sha256_hex(bytes: &[u8]) -> String { + let mut hasher = Sha256::new(); + hasher.update(bytes); + let digest = hasher.finalize(); + let mut out = String::with_capacity(digest.len() * 2); + for byte in digest { + use std::fmt::Write; + let _ = write!(&mut out, "{byte:02x}"); + } + out +} + +fn write_u32(bytes: &mut [u8], offset: usize, value: u32) { + bytes[offset..offset + 4].copy_from_slice(&value.to_le_bytes()); +} + +fn write_u64(bytes: &mut [u8], offset: usize, value: u64) { + bytes[offset..offset + 8].copy_from_slice(&value.to_le_bytes()); +} + +fn align_up(value: usize, align: usize) -> usize { + value.div_ceil(align) * align +} + +fn build_romfs_image() -> Vec<u8> { + let file_root = b"HELLO"; + let file_nested = b"NESTED"; + let nested_dir = "data"; + let root_name = ""; + + let root_entry_size = align_up(0x18 + root_name.len(), 4); + let nested_entry_off = root_entry_size as u32; + let nested_entry_size = align_up(0x18 + nested_dir.len(), 4); + let 
dir_table_size = root_entry_size + nested_entry_size; + + let file_root_name = "hello.txt"; + let file_nested_name = "nested.bin"; + let file_root_entry_size = align_up(0x20 + file_root_name.len(), 4); + let file_nested_off = file_root_entry_size as u32; + let file_nested_entry_size = align_up(0x20 + file_nested_name.len(), 4); + let file_table_size = file_root_entry_size + file_nested_entry_size; + + let file_root_data_off = 0u64; + let file_nested_data_off = align_up(file_root.len(), 0x10) as u64; + let mut file_data = Vec::new(); + file_data.extend_from_slice(file_root); + let padding = align_up(file_data.len(), 0x10) - file_data.len(); + file_data.extend(std::iter::repeat_n(0u8, padding)); + file_data.extend_from_slice(file_nested); + + let mut dir_table = Vec::new(); + push_dir_entry( + &mut dir_table, + 0xFFFF_FFFF, + 0xFFFF_FFFF, + nested_entry_off, + 0, + 0xFFFF_FFFF, + root_name, + ); + push_dir_entry( + &mut dir_table, + 0, + 0xFFFF_FFFF, + 0xFFFF_FFFF, + file_nested_off, + 0xFFFF_FFFF, + nested_dir, + ); + assert_eq!(dir_table.len(), dir_table_size); + + let mut file_table = Vec::new(); + push_file_entry( + &mut file_table, + 0, + 0xFFFF_FFFF, + file_root_data_off, + file_root.len() as u64, + 0xFFFF_FFFF, + file_root_name, + ); + push_file_entry( + &mut file_table, + nested_entry_off, + 0xFFFF_FFFF, + file_nested_data_off, + file_nested.len() as u64, + 0xFFFF_FFFF, + file_nested_name, + ); + assert_eq!(file_table.len(), file_table_size); + + let header_size = 0x50usize; + let dir_table_off = align_up(header_size, 0x10); + let file_table_off = align_up(dir_table_off + dir_table_size, 0x10); + let file_data_off = align_up(file_table_off + file_table_size, 0x10); + let total_size = file_data_off + file_data.len(); + + let mut image = vec![0u8; total_size]; + write_u64(&mut image, 0x0, 0x50); + write_u64(&mut image, 0x8, dir_table_off as u64); + write_u64(&mut image, 0x10, 0); + write_u64(&mut image, 0x18, dir_table_off as u64); + write_u64(&mut image, 0x20, 
dir_table_size as u64); + write_u64(&mut image, 0x28, file_table_off as u64); + write_u64(&mut image, 0x30, 0); + write_u64(&mut image, 0x38, file_table_off as u64); + write_u64(&mut image, 0x40, file_table_size as u64); + write_u64(&mut image, 0x48, file_data_off as u64); + + image[dir_table_off..dir_table_off + dir_table_size].copy_from_slice(&dir_table); + image[file_table_off..file_table_off + file_table_size].copy_from_slice(&file_table); + image[file_data_off..file_data_off + file_data.len()].copy_from_slice(&file_data); + + image +} + +fn push_dir_entry( + buf: &mut Vec<u8>, + parent: u32, + sibling: u32, + child_dir: u32, + child_file: u32, + next_hash: u32, + name: &str, +) -> u32 { + let offset = buf.len() as u32; + buf.extend_from_slice(&parent.to_le_bytes()); + buf.extend_from_slice(&sibling.to_le_bytes()); + buf.extend_from_slice(&child_dir.to_le_bytes()); + buf.extend_from_slice(&child_file.to_le_bytes()); + buf.extend_from_slice(&next_hash.to_le_bytes()); + buf.extend_from_slice(&(name.len() as u32).to_le_bytes()); + buf.extend_from_slice(name.as_bytes()); + while buf.len() % 4 != 0 { + buf.push(0); + } + offset +} + +fn push_file_entry( + buf: &mut Vec<u8>, + parent: u32, + sibling: u32, + data_off: u64, + data_size: u64, + next_hash: u32, + name: &str, +) -> u32 { + let offset = buf.len() as u32; + buf.extend_from_slice(&parent.to_le_bytes()); + buf.extend_from_slice(&sibling.to_le_bytes()); + buf.extend_from_slice(&data_off.to_le_bytes()); + buf.extend_from_slice(&data_size.to_le_bytes()); + buf.extend_from_slice(&next_hash.to_le_bytes()); + buf.extend_from_slice(&(name.len() as u32).to_le_bytes()); + buf.extend_from_slice(name.as_bytes()); + while buf.len() % 4 != 0 { + buf.push(0); + } + offset +} + +fn build_nro(path: &Path, with_assets: bool) -> Vec<u8> { + let header_size = 0x80usize; + let text = b"TEXT"; + let rodata = b"RODT"; + let data = b"DATA"; + let text_off = header_size as u32; + let ro_off = text_off + text.len() as u32; + let data_off = 
ro_off + rodata.len() as u32; + + let nro_size = header_size + text.len() + rodata.len() + data.len(); + let mut bytes = vec![0u8; nro_size]; + + bytes[0x10..0x14].copy_from_slice(b"NRO0"); + write_u32(&mut bytes, 0x18, nro_size as u32); + write_u32(&mut bytes, 0x20, text_off); + write_u32(&mut bytes, 0x24, text.len() as u32); + write_u32(&mut bytes, 0x28, ro_off); + write_u32(&mut bytes, 0x2C, rodata.len() as u32); + write_u32(&mut bytes, 0x30, data_off); + write_u32(&mut bytes, 0x34, data.len() as u32); + write_u32(&mut bytes, 0x38, 0x20); + + let build_id = [0xABu8; 0x20]; + bytes[0x40..0x60].copy_from_slice(&build_id); + + bytes[text_off as usize..text_off as usize + text.len()].copy_from_slice(text); + bytes[ro_off as usize..ro_off as usize + rodata.len()].copy_from_slice(rodata); + bytes[data_off as usize..data_off as usize + data.len()].copy_from_slice(data); + + if with_assets { + let asset_base = bytes.len(); + let icon = b"ICON"; + let nacp = vec![0x11u8; 0x4000]; + let romfs = build_romfs_image(); + let asset_header_size = 0x38usize; + let icon_offset = asset_header_size as u64; + let nacp_offset = icon_offset + icon.len() as u64; + let romfs_offset = nacp_offset + nacp.len() as u64; + let total = asset_base + asset_header_size + icon.len() + nacp.len() + romfs.len(); + bytes.resize(total, 0u8); + + bytes[asset_base..asset_base + 4].copy_from_slice(b"ASET"); + write_u64(&mut bytes, asset_base + 0x8, icon_offset); + write_u64(&mut bytes, asset_base + 0x10, icon.len() as u64); + write_u64(&mut bytes, asset_base + 0x18, nacp_offset); + write_u64(&mut bytes, asset_base + 0x20, nacp.len() as u64); + write_u64(&mut bytes, asset_base + 0x28, romfs_offset); + write_u64(&mut bytes, asset_base + 0x30, romfs.len() as u64); + + let icon_start = asset_base + icon_offset as usize; + bytes[icon_start..icon_start + icon.len()].copy_from_slice(icon); + let nacp_start = asset_base + nacp_offset as usize; + bytes[nacp_start..nacp_start + nacp.len()].copy_from_slice(&nacp); 
+ let romfs_start = asset_base + romfs_offset as usize; + bytes[romfs_start..romfs_start + romfs.len()].copy_from_slice(&romfs); + } + + fs::write(path, &bytes).expect("write NRO"); + bytes +} + +fn build_nso(path: &Path) -> Vec<u8> { + let header_size = 0x100usize; + let text = b"TEXTDATA"; + let rodata = b"RO"; + let data = b"DATA"; + let compressed_text = lz4_flex::block::compress(text); + + let text_off = header_size as u32; + let ro_off = text_off + compressed_text.len() as u32; + let data_off = ro_off + rodata.len() as u32; + let total = header_size + compressed_text.len() + rodata.len() + data.len(); + let mut bytes = vec![0u8; total]; + + bytes[0x0..0x4].copy_from_slice(b"NSO0"); + write_u32(&mut bytes, 0x8, 0x1); + write_u32(&mut bytes, 0x10, text_off); + write_u32(&mut bytes, 0x14, 0); + write_u32(&mut bytes, 0x18, text.len() as u32); + write_u32(&mut bytes, 0x20, ro_off); + write_u32(&mut bytes, 0x24, 0x1000); + write_u32(&mut bytes, 0x28, rodata.len() as u32); + write_u32(&mut bytes, 0x30, data_off); + write_u32(&mut bytes, 0x34, 0x2000); + write_u32(&mut bytes, 0x38, data.len() as u32); + write_u32(&mut bytes, 0x3C, 0x40); + + let module_id = [0xCDu8; 0x20]; + bytes[0x40..0x60].copy_from_slice(&module_id); + write_u32(&mut bytes, 0x60, compressed_text.len() as u32); + write_u32(&mut bytes, 0x64, rodata.len() as u32); + write_u32(&mut bytes, 0x68, data.len() as u32); + + bytes[text_off as usize..text_off as usize + compressed_text.len()] + .copy_from_slice(&compressed_text); + let ro_start = ro_off as usize; + bytes[ro_start..ro_start + rodata.len()].copy_from_slice(rodata); + let data_start = data_off as usize; + bytes[data_start..data_start + data.len()].copy_from_slice(data); + + fs::write(path, &bytes).expect("write NSO"); + bytes +} + +fn write_provenance(path: &Path, entries: Vec<(PathBuf, &str, &[u8])>) { + let mut inputs = String::new(); + for (entry_path, format, bytes) in entries { + let sha = sha256_hex(bytes); + let size = bytes.len(); + 
inputs.push_str(&format!( + "[[inputs]]\npath = \"{}\"\nsha256 = \"{}\"\nsize = {}\nformat = \"{}\"\n\n", + entry_path.display(), + sha, + size, + format + )); + } + + let toml = format!( + "schema_version = \"1\"\n\n[title]\nname = \"Test\"\ntitle_id = \"0100000000000000\"\nversion = \"1.0.0\"\nregion = \"US\"\n\n[collection]\ndevice = \"Switch\"\ncollected_at = \"2024-01-01\"\n\n[collection.tool]\nname = \"collector\"\nversion = \"0.1\"\n\n{}", + inputs + ); + fs::write(path, toml).expect("write provenance"); +} + +#[test] +fn intake_homebrew_extracts_assets_and_segments() { + let dir = tempdir().expect("tempdir"); + let nro_path = dir.path().join("main.nro"); + let nro_bytes = build_nro(&nro_path, true); + let provenance_path = dir.path().join("provenance.toml"); + write_provenance( + &provenance_path, + vec![(nro_path.clone(), "nro0", &nro_bytes)], + ); + + let out_dir = dir.path().join("out"); + let report = intake_homebrew(IntakeOptions { + module_path: nro_path, + nso_paths: Vec::new(), + provenance_path, + out_dir: out_dir.clone(), + }) + .expect("intake homebrew"); + + assert!(report.module_json_path.exists()); + assert!(report.manifest_path.exists()); + assert!(out_dir.join("segments/main/text.bin").exists()); + assert!(out_dir.join("assets/icon.bin").exists()); + assert!(out_dir.join("assets/control.nacp").exists()); + assert!(out_dir.join("assets/romfs/hello.txt").exists()); + assert!(out_dir.join("assets/romfs/data/nested.bin").exists()); + + let manifest = fs::read_to_string(report.manifest_path).expect("read manifest"); + assert!(manifest.contains("control.nacp")); + assert!(manifest.contains("romfs/hello.txt")); + assert!(manifest.contains("romfs/data/nested.bin")); +} + +#[test] +fn intake_homebrew_handles_nso_segments() { + let dir = tempdir().expect("tempdir"); + let nro_path = dir.path().join("main.nro"); + let nro_bytes = build_nro(&nro_path, false); + let nso_path = dir.path().join("mod.nso"); + let nso_bytes = build_nso(&nso_path); + + let 
provenance_path = dir.path().join("provenance.toml"); + write_provenance( + &provenance_path, + vec![ + (nro_path.clone(), "nro0", &nro_bytes), + (nso_path.clone(), "nso0", &nso_bytes), + ], + ); + + let out_dir = dir.path().join("out"); + let report = intake_homebrew(IntakeOptions { + module_path: nro_path.clone(), + nso_paths: vec![nso_path.clone()], + provenance_path, + out_dir: out_dir.clone(), + }) + .expect("intake homebrew"); + + let nso_text = fs::read(out_dir.join("segments/mod/text.bin")).expect("read text"); + let nso_data = fs::read(out_dir.join("segments/mod/data.bin")).expect("read data"); + assert_eq!(nso_text, b"TEXTDATA"); + assert_eq!(nso_data, b"DATA"); + + let module_json = fs::read_to_string(report.module_json_path).expect("read module.json"); + assert!(module_json.contains("\"format\": \"nso\"")); +} diff --git a/crates/recomp-pipeline/tests/homebrew_lift.rs b/crates/recomp-pipeline/tests/homebrew_lift.rs new file mode 100644 index 0000000..b1d0a26 --- /dev/null +++ b/crates/recomp-pipeline/tests/homebrew_lift.rs @@ -0,0 +1,603 @@ +use recomp_pipeline::homebrew::{lift_homebrew, LiftMode, LiftOptions}; +use serde_json::Value; +use std::fs; + +#[test] +fn homebrew_lift_emits_stub_module() { + let temp = tempfile::tempdir().expect("tempdir"); + let base_dir = temp.path(); + let segment_dir = base_dir.join("segments/sample"); + fs::create_dir_all(&segment_dir).expect("segment dir"); + let segment_path = segment_dir.join("text.bin"); + fs::write(&segment_path, [0_u8; 16]).expect("segment data"); + + let module_json = format!( + r#"{{ + "schema_version": "1", + "module_type": "homebrew", + "modules": [ + {{ + "name": "sample", + "format": "nro", + "input_path": "module.nro", + "input_sha256": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", + "input_size": 16, + "build_id": "deadbeef", + "segments": [ + {{ + "name": "text", + "file_offset": 0, + "file_size": 16, + "memory_offset": 0, + "memory_size": 16, + "permissions": "r-x", + 
"output_path": "{}" + }} + ], + "bss": {{ "size": 0, "memory_offset": 0 }} + }} + ] +}}"#, + segment_path.strip_prefix(base_dir).unwrap().display() + ); + + let module_path = base_dir.join("module.json"); + fs::write(&module_path, module_json).expect("write module.json"); + + let out_dir = base_dir.join("lifted"); + let report = lift_homebrew(LiftOptions { + module_json_path: module_path, + out_dir: out_dir.clone(), + entry_name: "entry".to_string(), + mode: LiftMode::Stub, + }) + .expect("lift homebrew"); + + assert_eq!(report.functions_emitted, 1); + assert!(!report.warnings.is_empty()); + + let lifted_path = out_dir.join("module.json"); + let lifted_src = fs::read_to_string(lifted_path).expect("read lifted module"); + let lifted_json: Value = serde_json::from_str(&lifted_src).expect("parse lifted module"); + assert_eq!( + lifted_json.get("arch").and_then(|v| v.as_str()), + Some("aarch64") + ); + let functions = lifted_json + .get("functions") + .and_then(|v| v.as_array()) + .expect("functions"); + assert_eq!(functions.len(), 1); + assert_eq!( + functions[0].get("name").and_then(|v| v.as_str()), + Some("entry") + ); + let ops = functions[0] + .get("ops") + .and_then(|v| v.as_array()) + .expect("ops"); + assert_eq!(ops.len(), 1); + assert_eq!(ops[0].get("op").and_then(|v| v.as_str()), Some("ret")); +} + +#[test] +fn homebrew_lift_decodes_minimal_block() { + let temp = tempfile::tempdir().expect("tempdir"); + let base_dir = temp.path(); + let segment_dir = base_dir.join("segments/sample"); + fs::create_dir_all(&segment_dir).expect("segment dir"); + let segment_path = segment_dir.join("text.bin"); + + let words = [movz_x(0, 7), movz_x(1, 35), add_reg_x(2, 0, 1), ret_x(30)]; + let mut bytes = Vec::new(); + for word in words { + bytes.extend_from_slice(&word.to_le_bytes()); + } + fs::write(&segment_path, &bytes).expect("segment data"); + + let module_json = format!( + r#"{{ + "schema_version": "1", + "module_type": "homebrew", + "modules": [ + {{ + "name": "sample", + 
"format": "nro", + "input_path": "module.nro", + "input_sha256": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", + "input_size": {input_size}, + "build_id": "deadbeef", + "segments": [ + {{ + "name": "text", + "file_offset": 0, + "file_size": {file_size}, + "memory_offset": 0, + "memory_size": {file_size}, + "permissions": "r-x", + "output_path": "{path}" + }} + ], + "bss": {{ "size": 0, "memory_offset": 0 }} + }} + ] +}}"#, + input_size = bytes.len(), + file_size = bytes.len(), + path = segment_path.strip_prefix(base_dir).unwrap().display() + ); + + let module_path = base_dir.join("module.json"); + fs::write(&module_path, module_json).expect("write module.json"); + + let out_dir = base_dir.join("lifted"); + let report = lift_homebrew(LiftOptions { + module_json_path: module_path, + out_dir: out_dir.clone(), + entry_name: "entry".to_string(), + mode: LiftMode::Decode, + }) + .expect("lift homebrew decode"); + + assert_eq!(report.functions_emitted, 1); + + let lifted_path = out_dir.join("module.json"); + let lifted_src = fs::read_to_string(lifted_path).expect("read lifted module"); + let lifted_json: Value = serde_json::from_str(&lifted_src).expect("parse lifted module"); + let functions = lifted_json + .get("functions") + .and_then(|v| v.as_array()) + .expect("functions"); + let blocks = functions[0] + .get("blocks") + .and_then(|v| v.as_array()) + .expect("blocks"); + assert_eq!(blocks.len(), 1); + let ops = blocks[0] + .get("ops") + .and_then(|v| v.as_array()) + .expect("block ops"); + assert!(ops.len() >= 3); + assert_eq!(ops[0].get("op").and_then(|v| v.as_str()), Some("const_i64")); + assert_eq!(ops[1].get("op").and_then(|v| v.as_str()), Some("const_i64")); + assert_eq!(ops[2].get("op").and_then(|v| v.as_str()), Some("add_i64")); + let terminator = blocks[0] + .get("terminator") + .and_then(|v| v.get("op")) + .and_then(|v| v.as_str()); + assert_eq!(terminator, Some("ret")); +} + +fn movz_x(dst: u8, imm: u16) -> u32 { + let hw = 0_u32; + let 
sf = 1_u32; + let opc = 0b10_u32; + let fixed = 0b100101_u32; + (sf << 31) | (opc << 29) | (fixed << 23) | (hw << 21) | ((imm as u32) << 5) | (dst as u32) +} + +fn add_reg_x(dst: u8, lhs: u8, rhs: u8) -> u32 { + let sf = 1_u32; + let op = 0_u32; + let s = 0_u32; + let opcode = 0b01011_u32; + let shift = 0_u32; + let imm6 = 0_u32; + (sf << 31) + | (op << 30) + | (s << 29) + | (opcode << 24) + | (shift << 22) + | (imm6 << 10) + | ((rhs as u32) << 16) + | ((lhs as u32) << 5) + | (dst as u32) +} + +fn ret_x(reg: u8) -> u32 { + 0xD65F0000 | ((reg as u32) << 5) +} + +#[test] +fn homebrew_lift_discovers_call_targets() { + let temp = tempfile::tempdir().expect("tempdir"); + let base_dir = temp.path(); + let segment_dir = base_dir.join("segments/sample"); + fs::create_dir_all(&segment_dir).expect("segment dir"); + let segment_path = segment_dir.join("text.bin"); + + let mut bytes = vec![0_u8; 0x24]; + let bl = bl_to(0x0, 0x20); + bytes[0..4].copy_from_slice(&bl.to_le_bytes()); + let ret0 = ret_x(30); + bytes[4..8].copy_from_slice(&ret0.to_le_bytes()); + let ret1 = ret_x(30); + bytes[0x20..0x24].copy_from_slice(&ret1.to_le_bytes()); + fs::write(&segment_path, &bytes).expect("segment data"); + + let module_json = format!( + r#"{{ + "schema_version": "1", + "module_type": "homebrew", + "modules": [ + {{ + "name": "sample", + "format": "nro", + "input_path": "module.nro", + "input_sha256": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", + "input_size": {input_size}, + "build_id": "deadbeef", + "segments": [ + {{ + "name": "text", + "file_offset": 0, + "file_size": {file_size}, + "memory_offset": 0, + "memory_size": {file_size}, + "permissions": "r-x", + "output_path": "{path}" + }} + ], + "bss": {{ "size": 0, "memory_offset": 0 }} + }} + ] +}}"#, + input_size = bytes.len(), + file_size = bytes.len(), + path = segment_path.strip_prefix(base_dir).unwrap().display() + ); + + let module_path = base_dir.join("module.json"); + fs::write(&module_path, 
module_json).expect("write module.json"); + + let out_dir = base_dir.join("lifted"); + let report = lift_homebrew(LiftOptions { + module_json_path: module_path, + out_dir: out_dir.clone(), + entry_name: "entry".to_string(), + mode: LiftMode::Decode, + }) + .expect("lift homebrew decode"); + + assert_eq!(report.functions_emitted, 2); + + let lifted_path = out_dir.join("module.json"); + let lifted_src = fs::read_to_string(lifted_path).expect("read lifted module"); + let lifted_json: Value = serde_json::from_str(&lifted_src).expect("parse lifted module"); + let functions = lifted_json + .get("functions") + .and_then(|v| v.as_array()) + .expect("functions"); + assert_eq!(functions.len(), 2); + let names: Vec<_> = functions + .iter() + .filter_map(|func| func.get("name").and_then(|v| v.as_str())) + .collect(); + assert!(names.contains(&"entry")); + assert!(names.contains(&"fn_0000000000000020")); +} + +#[test] +fn homebrew_lift_builds_conditional_blocks() { + let temp = tempfile::tempdir().expect("tempdir"); + let base_dir = temp.path(); + let segment_dir = base_dir.join("segments/sample"); + fs::create_dir_all(&segment_dir).expect("segment dir"); + let segment_path = segment_dir.join("text.bin"); + + let cbz = cbz_to(0x0, 0x8, 0); + let ret = ret_x(30); + let mut bytes = Vec::new(); + bytes.extend_from_slice(&cbz.to_le_bytes()); + bytes.extend_from_slice(&ret.to_le_bytes()); + bytes.extend_from_slice(&ret.to_le_bytes()); + fs::write(&segment_path, &bytes).expect("segment data"); + + let module_json = format!( + r#"{{ + "schema_version": "1", + "module_type": "homebrew", + "modules": [ + {{ + "name": "sample", + "format": "nro", + "input_path": "module.nro", + "input_sha256": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", + "input_size": {input_size}, + "build_id": "deadbeef", + "segments": [ + {{ + "name": "text", + "file_offset": 0, + "file_size": {file_size}, + "memory_offset": 0, + "memory_size": {file_size}, + "permissions": "r-x", + 
"output_path": "{path}" + }} + ], + "bss": {{ "size": 0, "memory_offset": 0 }} + }} + ] +}}"#, + input_size = bytes.len(), + file_size = bytes.len(), + path = segment_path.strip_prefix(base_dir).unwrap().display() + ); + + let module_path = base_dir.join("module.json"); + fs::write(&module_path, module_json).expect("write module.json"); + + let out_dir = base_dir.join("lifted"); + let report = lift_homebrew(LiftOptions { + module_json_path: module_path, + out_dir: out_dir.clone(), + entry_name: "entry".to_string(), + mode: LiftMode::Decode, + }) + .expect("lift homebrew decode"); + + assert_eq!(report.functions_emitted, 1); + + let lifted_path = out_dir.join("module.json"); + let lifted_src = fs::read_to_string(lifted_path).expect("read lifted module"); + let lifted_json: Value = serde_json::from_str(&lifted_src).expect("parse lifted module"); + let functions = lifted_json + .get("functions") + .and_then(|v| v.as_array()) + .expect("functions"); + let blocks = functions[0] + .get("blocks") + .and_then(|v| v.as_array()) + .expect("blocks"); + assert!(blocks.len() >= 2); + let first_term = blocks[0] + .get("terminator") + .and_then(|v| v.get("op")) + .and_then(|v| v.as_str()); + assert_eq!(first_term, Some("br_cond")); +} + +#[test] +fn homebrew_lift_handles_indirect_branch() { + let temp = tempfile::tempdir().expect("tempdir"); + let base_dir = temp.path(); + let segment_dir = base_dir.join("segments/sample"); + fs::create_dir_all(&segment_dir).expect("segment dir"); + let segment_path = segment_dir.join("text.bin"); + + let br = br_x(1); + let mut bytes = Vec::new(); + bytes.extend_from_slice(&br.to_le_bytes()); + fs::write(&segment_path, &bytes).expect("segment data"); + + let module_json = format!( + r#"{{ + "schema_version": "1", + "module_type": "homebrew", + "modules": [ + {{ + "name": "sample", + "format": "nro", + "input_path": "module.nro", + "input_sha256": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", + "input_size": {input_size}, + 
"build_id": "deadbeef", + "segments": [ + {{ + "name": "text", + "file_offset": 0, + "file_size": {file_size}, + "memory_offset": 0, + "memory_size": {file_size}, + "permissions": "r-x", + "output_path": "{path}" + }} + ], + "bss": {{ "size": 0, "memory_offset": 0 }} + }} + ] +}}"#, + input_size = bytes.len(), + file_size = bytes.len(), + path = segment_path.strip_prefix(base_dir).unwrap().display() + ); + + let module_path = base_dir.join("module.json"); + fs::write(&module_path, module_json).expect("write module.json"); + + let out_dir = base_dir.join("lifted"); + lift_homebrew(LiftOptions { + module_json_path: module_path, + out_dir: out_dir.clone(), + entry_name: "entry".to_string(), + mode: LiftMode::Decode, + }) + .expect("lift homebrew decode"); + + let lifted_path = out_dir.join("module.json"); + let lifted_src = fs::read_to_string(lifted_path).expect("read lifted module"); + let lifted_json: Value = serde_json::from_str(&lifted_src).expect("parse lifted module"); + let functions = lifted_json + .get("functions") + .and_then(|v| v.as_array()) + .expect("functions"); + let blocks = functions[0] + .get("blocks") + .and_then(|v| v.as_array()) + .expect("blocks"); + let term = blocks[0] + .get("terminator") + .and_then(|v| v.get("op")) + .and_then(|v| v.as_str()); + assert_eq!(term, Some("br_indirect")); +} + +#[test] +fn homebrew_lift_decodes_load_store_ops() { + let temp = tempfile::tempdir().expect("tempdir"); + let base_dir = temp.path(); + let segment_dir = base_dir.join("segments/sample"); + fs::create_dir_all(&segment_dir).expect("segment dir"); + let segment_path = segment_dir.join("text.bin"); + + let ldr = ldr_x_imm(0, 1, 16); + let str = str_x_imm(0, 1, 16); + let ret = ret_x(30); + let mut bytes = Vec::new(); + bytes.extend_from_slice(&ldr.to_le_bytes()); + bytes.extend_from_slice(&str.to_le_bytes()); + bytes.extend_from_slice(&ret.to_le_bytes()); + fs::write(&segment_path, &bytes).expect("segment data"); + + let module_json = format!( + r#"{{ + 
"schema_version": "1", + "module_type": "homebrew", + "modules": [ + {{ + "name": "sample", + "format": "nro", + "input_path": "module.nro", + "input_sha256": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", + "input_size": {input_size}, + "build_id": "deadbeef", + "segments": [ + {{ + "name": "text", + "file_offset": 0, + "file_size": {file_size}, + "memory_offset": 0, + "memory_size": {file_size}, + "permissions": "r-x", + "output_path": "{path}" + }} + ], + "bss": {{ "size": 0, "memory_offset": 0 }} + }} + ] +}}"#, + input_size = bytes.len(), + file_size = bytes.len(), + path = segment_path.strip_prefix(base_dir).unwrap().display() + ); + + let module_path = base_dir.join("module.json"); + fs::write(&module_path, module_json).expect("write module.json"); + + let out_dir = base_dir.join("lifted"); + lift_homebrew(LiftOptions { + module_json_path: module_path, + out_dir: out_dir.clone(), + entry_name: "entry".to_string(), + mode: LiftMode::Decode, + }) + .expect("lift homebrew decode"); + + let lifted_path = out_dir.join("module.json"); + let lifted_src = fs::read_to_string(lifted_path).expect("read lifted module"); + let lifted_json: Value = serde_json::from_str(&lifted_src).expect("parse lifted module"); + let functions = lifted_json + .get("functions") + .and_then(|v| v.as_array()) + .expect("functions"); + let ops = functions[0] + .get("blocks") + .and_then(|v| v.as_array()) + .expect("blocks")[0] + .get("ops") + .and_then(|v| v.as_array()) + .expect("ops"); + assert_eq!(ops[0].get("op").and_then(|v| v.as_str()), Some("load_i64")); + assert_eq!(ops[1].get("op").and_then(|v| v.as_str()), Some("store_i64")); +} + +#[test] +fn homebrew_lift_rejects_oversized_block() { + let temp = tempfile::tempdir().expect("tempdir"); + let base_dir = temp.path(); + let segment_dir = base_dir.join("segments/sample"); + fs::create_dir_all(&segment_dir).expect("segment dir"); + let segment_path = segment_dir.join("text.bin"); + + let nop = 0xD503_201F_u32; + let 
mut bytes = Vec::new(); + for _ in 0..10_001 { + bytes.extend_from_slice(&nop.to_le_bytes()); + } + fs::write(&segment_path, &bytes).expect("segment data"); + + let module_json = format!( + r#"{{ + "schema_version": "1", + "module_type": "homebrew", + "modules": [ + {{ + "name": "sample", + "format": "nro", + "input_path": "module.nro", + "input_sha256": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", + "input_size": {input_size}, + "build_id": "deadbeef", + "segments": [ + {{ + "name": "text", + "file_offset": 0, + "file_size": {file_size}, + "memory_offset": 0, + "memory_size": {file_size}, + "permissions": "r-x", + "output_path": "{path}" + }} + ], + "bss": {{ "size": 0, "memory_offset": 0 }} + }} + ] +}}"#, + input_size = bytes.len(), + file_size = bytes.len(), + path = segment_path.strip_prefix(base_dir).unwrap().display() + ); + + let module_path = base_dir.join("module.json"); + fs::write(&module_path, module_json).expect("write module.json"); + + let out_dir = base_dir.join("lifted"); + let err = lift_homebrew(LiftOptions { + module_json_path: module_path, + out_dir: out_dir.clone(), + entry_name: "entry".to_string(), + mode: LiftMode::Decode, + }) + .expect_err("expected decode failure"); + + assert!(err.contains("block decode limit exceeded")); +} + +fn bl_to(from: u64, target: u64) -> u32 { + let delta = target as i64 - from as i64; + let imm26 = (delta >> 2) & 0x03FF_FFFF; + 0x9400_0000 | (imm26 as u32) +} + +fn cbz_to(from: u64, target: u64, reg: u8) -> u32 { + let delta = target as i64 - from as i64; + let imm19 = (delta >> 2) & 0x7FFFF; + 0xB400_0000 | ((imm19 as u32) << 5) | (reg as u32) +} + +fn br_x(reg: u8) -> u32 { + 0xD61F_0000 | ((reg as u32) << 5) +} + +fn ldr_x_imm(rt: u8, rn: u8, offset: u64) -> u32 { + let size = 3_u32; + let imm12 = (offset >> size) as u32; + 0x3900_0000 | (size << 30) | (1 << 22) | (imm12 << 10) | ((rn as u32) << 5) | (rt as u32) +} + +fn str_x_imm(rt: u8, rn: u8, offset: u64) -> u32 { + let size = 
3_u32; + let imm12 = (offset >> size) as u32; + 0x3900_0000 | (size << 30) | (imm12 << 10) | ((rn as u32) << 5) | (rt as u32) +} diff --git a/crates/recomp-pipeline/tests/pipeline.rs b/crates/recomp-pipeline/tests/pipeline.rs index dba1dc9..fa7d791 100644 --- a/crates/recomp-pipeline/tests/pipeline.rs +++ b/crates/recomp-pipeline/tests/pipeline.rs @@ -109,6 +109,46 @@ fn pipeline_emits_project() { assert_eq!(report.detected_inputs.len(), 2); } +#[test] +fn pipeline_rejects_homebrew_module_json() { + let temp = tempfile::tempdir().expect("tempdir"); + let module_path = temp.path().join("module.json"); + let config_path = temp.path().join("title.toml"); + let provenance_path = temp.path().join("provenance.toml"); + let out_dir = temp.path().join("out"); + let runtime_path = PathBuf::from("../crates/recomp-runtime"); + + let homebrew_module = r#"{ + "schema_version": "1", + "module_type": "homebrew", + "modules": [ + { + "name": "sample", + "format": "nro", + "input_path": "module.nro", + "input_sha256": "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", + "input_size": 4, + "build_id": "deadbeef", + "segments": [], + "bss": { "size": 0, "memory_offset": 0 } + } + ] +}"#; + + fs::write(&module_path, homebrew_module).expect("write module"); + let err = run_pipeline(PipelineOptions { + module_path, + config_path, + provenance_path, + out_dir, + runtime_path, + }) + .expect_err("pipeline rejects homebrew module.json"); + + let message = err.to_string(); + assert!(message.contains("homebrew module.json detected")); +} + fn sha256_hex(bytes: &[u8]) -> String { let mut hasher = Sha256::new(); hasher.update(bytes); diff --git a/crates/recomp-runtime/Cargo.toml b/crates/recomp-runtime/Cargo.toml index 12fd3e3..0b9f63b 100644 --- a/crates/recomp-runtime/Cargo.toml +++ b/crates/recomp-runtime/Cargo.toml @@ -8,4 +8,6 @@ license = "MIT OR Apache-2.0" recomp-gfx = { path = "../recomp-gfx" } recomp-services = { path = "../recomp-services" } recomp-timing = { path = 
"../recomp-timing" } +serde = { version = "1.0", features = ["derive"] } +serde_json = "1.0" thiserror = "1.0" diff --git a/crates/recomp-runtime/src/homebrew.rs b/crates/recomp-runtime/src/homebrew.rs new file mode 100644 index 0000000..13b54c1 --- /dev/null +++ b/crates/recomp-runtime/src/homebrew.rs @@ -0,0 +1,455 @@ +use crate::RuntimeError; +use recomp_services::StubBehavior; +use serde::Serialize; +use std::collections::BTreeSet; + +pub const NRO_ENTRY_X1: u64 = 0xFFFF_FFFF_FFFF_FFFF; + +#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash, Serialize)] +#[serde(rename_all = "snake_case")] +#[repr(u32)] +pub enum LoaderConfigKey { + EndOfList, + MainThreadHandle, + AppletType, + Argv, + OverrideHeap, + AllocPages, + LockRegion, +} + +impl LoaderConfigKey { + pub fn supported_keys() -> Vec { + vec![ + LoaderConfigKey::EndOfList, + LoaderConfigKey::MainThreadHandle, + LoaderConfigKey::AppletType, + LoaderConfigKey::Argv, + LoaderConfigKey::OverrideHeap, + LoaderConfigKey::AllocPages, + LoaderConfigKey::LockRegion, + ] + } +} + +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +#[repr(C)] +pub struct LoaderConfigEntry { + pub key: LoaderConfigKey, + pub flags: u32, + pub values: [u64; 2], +} + +impl LoaderConfigEntry { + pub fn new(key: LoaderConfigKey, value0: u64, value1: u64, flags: u32) -> Self { + Self { + key, + flags, + values: [value0, value1], + } + } +} + +#[derive(Debug, Clone)] +pub struct LoaderConfig { + entries: Vec, +} + +impl LoaderConfig { + pub fn entries(&self) -> &[LoaderConfigEntry] { + &self.entries + } + + pub fn entry_ptr(&self) -> *const LoaderConfigEntry { + self.entries.as_ptr() + } + + pub fn provided_keys(&self) -> Vec { + let present: BTreeSet = + self.entries.iter().map(|entry| entry.key).collect(); + LoaderConfigKey::supported_keys() + .into_iter() + .filter(|key| present.contains(key)) + .collect() + } +} + +#[derive(Debug, Default)] +pub struct LoaderConfigBuilder { + entries: Vec, +} + +impl LoaderConfigBuilder { + pub 
fn new() -> Self { + Self::default() + } + + pub fn main_thread_handle(mut self, handle: u64) -> Self { + self.entries.push(LoaderConfigEntry::new( + LoaderConfigKey::MainThreadHandle, + handle, + 0, + 0, + )); + self + } + + pub fn applet_type(mut self, applet_type: u64) -> Self { + self.entries.push(LoaderConfigEntry::new( + LoaderConfigKey::AppletType, + applet_type, + 0, + 0, + )); + self + } + + pub fn argv(mut self, argv_ptr: u64) -> Self { + self.entries.push(LoaderConfigEntry::new( + LoaderConfigKey::Argv, + argv_ptr, + 0, + 0, + )); + self + } + + pub fn override_heap(mut self, heap_ptr: u64) -> Self { + self.entries.push(LoaderConfigEntry::new( + LoaderConfigKey::OverrideHeap, + heap_ptr, + 0, + 0, + )); + self + } + + pub fn alloc_pages(mut self, page_count: u64) -> Self { + self.entries.push(LoaderConfigEntry::new( + LoaderConfigKey::AllocPages, + page_count, + 0, + 0, + )); + self + } + + pub fn lock_region(mut self, region_ptr: u64) -> Self { + self.entries.push(LoaderConfigEntry::new( + LoaderConfigKey::LockRegion, + region_ptr, + 0, + 0, + )); + self + } + + pub fn build(mut self) -> Result { + let present: BTreeSet = + self.entries.iter().map(|entry| entry.key).collect(); + if !present.contains(&LoaderConfigKey::MainThreadHandle) { + return Err(RuntimeError::MissingLoaderConfigKey { + key: LoaderConfigKey::MainThreadHandle, + }); + } + if !present.contains(&LoaderConfigKey::AppletType) { + return Err(RuntimeError::MissingLoaderConfigKey { + key: LoaderConfigKey::AppletType, + }); + } + + self.entries + .retain(|entry| entry.key != LoaderConfigKey::EndOfList); + self.entries + .push(LoaderConfigEntry::new(LoaderConfigKey::EndOfList, 0, 0, 0)); + + Ok(LoaderConfig { + entries: self.entries, + }) + } +} + +pub type NroEntrypoint = unsafe extern "C" fn(*const LoaderConfigEntry, u64) -> i32; + +pub fn entrypoint_shim(entry: NroEntrypoint, loader_config: &LoaderConfig) -> i32 { + unsafe { entry(loader_config.entry_ptr(), NRO_ENTRY_X1) } +} + 
+#[derive(Debug, Clone)] +pub struct ServiceStub { + pub name: String, + pub behavior: StubBehavior, +} + +#[derive(Debug, Clone, Serialize)] +pub struct RuntimeManifest { + pub abi_version: String, + pub loader_config: LoaderConfigManifest, + pub services: Vec, + pub determinism: DeterminismManifest, +} + +#[derive(Debug, Clone, Serialize)] +pub struct LoaderConfigManifest { + pub supported_keys: Vec, + pub provided_keys: Vec, +} + +#[derive(Debug, Clone, Serialize)] +pub struct ServiceStubManifest { + pub name: String, + pub behavior: String, +} + +#[derive(Debug, Clone, Serialize)] +pub struct DeterminismManifest { + pub time_source: String, + pub input_source: String, +} + +impl RuntimeManifest { + pub fn new( + abi_version: &str, + loader_config: &LoaderConfig, + service_stubs: &[ServiceStub], + ) -> Self { + let services = service_stubs + .iter() + .map(|stub| ServiceStubManifest { + name: stub.name.clone(), + behavior: format!("{:?}", stub.behavior).to_ascii_lowercase(), + }) + .collect(); + Self { + abi_version: abi_version.to_string(), + loader_config: LoaderConfigManifest { + supported_keys: LoaderConfigKey::supported_keys(), + provided_keys: loader_config.provided_keys(), + }, + services, + determinism: DeterminismManifest { + time_source: "deterministic".to_string(), + input_source: "deterministic".to_string(), + }, + } + } + + pub fn to_json(&self) -> Result { + serde_json::to_string_pretty(self).map_err(|err| RuntimeError::ManifestSerialize { + message: err.to_string(), + }) + } +} + +#[derive(Debug, Clone, PartialEq, Eq)] +pub struct InputEvent { + pub time: u64, + pub code: u32, + pub value: i32, +} + +#[derive(Debug, Default)] +pub struct DeterministicClock { + time: u64, +} + +impl DeterministicClock { + pub fn new(start: u64) -> Self { + Self { time: start } + } + + pub fn now(&self) -> u64 { + self.time + } + + pub fn advance(&mut self, delta: u64) -> u64 { + self.time = self.time.saturating_add(delta); + self.time + } + + pub fn set(&mut self, 
time: u64) { + self.time = time; + } +} + +#[derive(Debug, Default)] +pub struct InputQueue { + next_id: u64, + pending: Vec, +} + +#[derive(Debug, Clone)] +struct QueuedInput { + id: u64, + event: InputEvent, +} + +impl InputQueue { + pub fn new() -> Self { + Self::default() + } + + pub fn push(&mut self, event: InputEvent) -> u64 { + let id = self.next_id; + self.next_id += 1; + self.pending.push(QueuedInput { id, event }); + id + } + + pub fn drain_ready(&mut self, time: u64) -> Vec { + self.pending.sort_by(|a, b| { + a.event + .time + .cmp(&b.event.time) + .then_with(|| a.id.cmp(&b.id)) + }); + + let mut ready = Vec::new(); + let mut remaining = Vec::new(); + for queued in self.pending.drain(..) { + if queued.event.time <= time { + ready.push(queued.event); + } else { + remaining.push(queued); + } + } + self.pending = remaining; + ready + } + + pub fn pending(&self) -> usize { + self.pending.len() + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn builder_requires_main_thread_handle() { + let err = LoaderConfigBuilder::new() + .applet_type(1) + .build() + .unwrap_err(); + assert!(matches!( + err, + RuntimeError::MissingLoaderConfigKey { + key: LoaderConfigKey::MainThreadHandle + } + )); + } + + #[test] + fn builder_requires_applet_type() { + let err = LoaderConfigBuilder::new() + .main_thread_handle(5) + .build() + .unwrap_err(); + assert!(matches!( + err, + RuntimeError::MissingLoaderConfigKey { + key: LoaderConfigKey::AppletType + } + )); + } + + #[test] + fn builder_appends_end_of_list_last() { + let config = LoaderConfigBuilder::new() + .main_thread_handle(2) + .applet_type(3) + .argv(99) + .build() + .expect("build loader config"); + let entries = config.entries(); + assert_eq!(entries.last().unwrap().key, LoaderConfigKey::EndOfList); + } + + #[test] + fn entrypoint_shim_passes_expected_registers() { + use std::sync::{Mutex, OnceLock}; + + #[derive(Default)] + struct Seen { + x1: u64, + ptr: usize, + } + + static SEEN: OnceLock> = 
OnceLock::new(); + unsafe extern "C" fn probe(entry: *const LoaderConfigEntry, x1: u64) -> i32 { + let seen = SEEN.get_or_init(|| Mutex::new(Seen::default())); + let mut guard = seen.lock().expect("lock"); + guard.x1 = x1; + guard.ptr = entry as usize; + 0 + } + + let config = LoaderConfigBuilder::new() + .main_thread_handle(9) + .applet_type(1) + .build() + .expect("build loader config"); + let expected_ptr = config.entry_ptr() as usize; + + let result = entrypoint_shim(probe, &config); + assert_eq!(result, 0); + let seen = SEEN.get_or_init(|| Mutex::new(Seen::default())); + let guard = seen.lock().expect("lock"); + assert_eq!(guard.x1, NRO_ENTRY_X1); + assert_eq!(guard.ptr, expected_ptr); + } + + #[test] + fn manifest_includes_provided_keys() { + let config = LoaderConfigBuilder::new() + .main_thread_handle(1) + .applet_type(2) + .argv(3) + .build() + .expect("config"); + let manifest = RuntimeManifest::new( + "0.1.0", + &config, + &[ServiceStub { + name: "svc_stub".to_string(), + behavior: StubBehavior::Panic, + }], + ); + let json = manifest.to_json().expect("serialize manifest"); + assert!(json.contains("provided_keys")); + assert!(json.contains("svc_stub")); + } + + #[test] + fn deterministic_clock_advances() { + let mut clock = DeterministicClock::new(10); + assert_eq!(clock.now(), 10); + clock.advance(5); + assert_eq!(clock.now(), 15); + } + + #[test] + fn input_queue_is_deterministic() { + let mut queue = InputQueue::new(); + queue.push(InputEvent { + time: 5, + code: 1, + value: 0, + }); + queue.push(InputEvent { + time: 3, + code: 2, + value: 1, + }); + queue.push(InputEvent { + time: 5, + code: 3, + value: 2, + }); + let ready = queue.drain_ready(5); + let codes: Vec = ready.into_iter().map(|event| event.code).collect(); + assert_eq!(codes, vec![2, 1, 3]); + assert_eq!(queue.pending(), 0); + } +} diff --git a/crates/recomp-runtime/src/lib.rs b/crates/recomp-runtime/src/lib.rs index f81d2ef..d4e7171 100644 --- a/crates/recomp-runtime/src/lib.rs +++ 
b/crates/recomp-runtime/src/lib.rs @@ -1,7 +1,13 @@ use std::fmt; +mod homebrew; + pub const ABI_VERSION: &str = "0.1.0"; +pub use homebrew::{ + entrypoint_shim, DeterministicClock, InputEvent, InputQueue, LoaderConfig, LoaderConfigBuilder, + LoaderConfigEntry, LoaderConfigKey, NroEntrypoint, RuntimeManifest, ServiceStub, NRO_ENTRY_X1, +}; pub use recomp_gfx::{CommandStream, GraphicsBackend, GraphicsError, StubBackend}; pub use recomp_services::{ stub_handler, ServiceAccessControl, ServiceCall, ServiceError, ServiceLogger, ServiceRegistry, @@ -38,6 +44,10 @@ impl Default for RuntimeConfig { pub enum RuntimeError { #[error("stubbed syscall: {name}")] StubbedSyscall { name: String }, + #[error("missing loader config key: {key:?}")] + MissingLoaderConfigKey { key: LoaderConfigKey }, + #[error("runtime manifest serialization failed: {message}")] + ManifestSerialize { message: String }, } pub type RuntimeResult = Result; @@ -125,11 +135,10 @@ mod tests { #[test] fn panic_syscall_returns_error() { let err = syscall_panic("svc_test", &[]).unwrap_err(); - match err { - RuntimeError::StubbedSyscall { name } => { - assert_eq!(name, "svc_test"); - } - } + assert!(matches!( + err, + RuntimeError::StubbedSyscall { name } if name == "svc_test" + )); } #[test] diff --git a/docs/DEVELOPMENT.md b/docs/DEVELOPMENT.md index 406428c..becc95f 100644 --- a/docs/DEVELOPMENT.md +++ b/docs/DEVELOPMENT.md @@ -88,3 +88,24 @@ cargo run -p recomp-cli -- package \ --provenance samples/minimal/provenance.toml \ --out-dir out/bundle-minimal ``` + +- Run homebrew intake (NRO + optional NSO inputs): + +``` +cargo run -p recomp-cli -- homebrew-intake \ + --module path/to/homebrew.nro \ + --nso path/to/optional.nso \ + --provenance path/to/provenance.toml \ + --out-dir out/homebrew-intake +``` + +- Lift homebrew intake output into a lifted module: + +``` +cargo run -p recomp-cli -- homebrew-lift \ + --module-json out/homebrew-intake/module.json \ + --out-dir out/homebrew-lift +``` + +The default mode 
attempts to decode a small AArch64 subset (mov wide, add, ret). Use `--mode stub` +to emit a placeholder lifted module without decoding instructions. diff --git a/docs/exploratory-pipeline.md b/docs/exploratory-pipeline.md index dcca901..2c021b8 100644 --- a/docs/exploratory-pipeline.md +++ b/docs/exploratory-pipeline.md @@ -13,9 +13,12 @@ This document captures the initial exploratory pipeline that mirrors proven stat - Produces compilable artifacts for validation and iteration. ## Inputs -- `module.json` describes a module, functions, and operations. +- `module.json` describes a lifted module, functions, and operations. - `title.toml` provides the title name, entry function, ABI version, and stub map. - `provenance.toml` records lawful input provenance and format metadata. +- Homebrew intake emits a separate `module.json` + `manifest.json` with segment blobs and assets extracted from NRO/NSO inputs; this homebrew module.json is not consumed by the translator until a lifter produces a lifted module.json. +- The `homebrew-lift` command defaults to decoding a small AArch64 subset (mov wide, add, ret). Use `--mode stub` to emit a placeholder lifted module when decoding is not possible. +- Homebrew RomFS assets are emitted as a file tree under `assets/romfs/`; runtime implementations should mount or map this directory when wiring up RomFS access. Example stub map: ``` @@ -39,5 +42,5 @@ performance_mode = "handheld" ## Next Steps - Add a real input parser for Switch binaries. -- Replace the JSON module with lifted IR from the pipeline. +- Expand the lifter to cover more AArch64 instructions and control flow. - Expand runtime services and ABI validation. 
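For orientation, the lifted `module.json` shape those bullets refer to can be sketched as a tiny hypothetical instance. Only the `functions`/`blocks`/`ops`/`terminator` fields and the op names (`load_i64`, `store_i64`, `ret`) are grounded in this change's tests; everything else here is illustrative, not a schema definition:

```python
import json

# Hypothetical minimal lifted module, shaped like the structures the new
# homebrew-lift tests assert on ("functions" -> "blocks" -> "ops"/"terminator").
lifted = json.loads("""
{
  "functions": [
    {
      "name": "entry",
      "blocks": [
        {
          "ops": [ { "op": "load_i64" }, { "op": "store_i64" } ],
          "terminator": { "op": "ret" }
        }
      ]
    }
  ]
}
""")

def block_op_names(module: dict) -> list[str]:
    """Collect every op name, including block terminators, in document order."""
    names = []
    for function in module.get("functions", []):
        for block in function.get("blocks", []):
            names.extend(op["op"] for op in block.get("ops", []))
            names.append(block["terminator"]["op"])
    return names

print(block_op_names(lifted))  # ['load_i64', 'store_i64', 'ret']
```

A traversal like this is roughly what the new integration tests do when they assert on terminator ops such as `br_cond` and `br_indirect`.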
diff --git a/samples/homebrew-intake/README.md b/samples/homebrew-intake/README.md new file mode 100644 index 0000000..d9689e8 --- /dev/null +++ b/samples/homebrew-intake/README.md @@ -0,0 +1,45 @@ +# Homebrew Intake Sample + +This walkthrough generates synthetic NRO/NSO inputs (plus a non-proprietary asset section) and runs the homebrew intake pipeline. + +Usage (from repo root): + +1) Generate inputs and provenance metadata. + +``` +python3 samples/homebrew-intake/generate.py +``` + +To skip the asset section, pass `--no-assets`. + +2) Run homebrew intake. + +``` +cargo run -p recomp-cli -- homebrew-intake \ + --module samples/homebrew-intake/inputs/homebrew.nro \ + --nso samples/homebrew-intake/inputs/overlay.nso \ + --provenance samples/homebrew-intake/provenance.toml \ + --out-dir out/homebrew-intake +``` + +3) Lift the intake output into a placeholder lifted module. + +``` +cargo run -p recomp-cli -- homebrew-lift \ + --module-json out/homebrew-intake/module.json \ + --out-dir out/homebrew-lift \ + --mode stub +``` + +4) Inspect outputs. + +``` +ls out/homebrew-intake +``` + +The output includes: +- `segments/` with extracted NRO/NSO segments. +- `assets/` with `icon.bin`, `control.nacp`, and `romfs/romfs.bin` when assets are enabled. +- `module.json` and `manifest.json` describing the intake results. + +All generated data is synthetic and non-proprietary. 
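A quick way to sanity-check the generated fixture is to look for the header fields the generator writes. This sketch mirrors only the synthetic layout produced by `generate.py` in this sample (`NRO0` magic at 0x10, image size at 0x18), not the full NRO format:

```python
import struct

def looks_like_synthetic_nro(data: bytes) -> bool:
    """Check the magic and size fields the sample generator writes."""
    if len(data) < 0x80:  # generator uses a 0x80-byte header
        return False
    if data[0x10:0x14] != b"NRO0":
        return False
    (nro_size,) = struct.unpack_from("<I", data, 0x18)
    # The optional asset section is appended after the NRO image, so the
    # file may be larger than the recorded image size.
    return nro_size <= len(data)

# Build a tiny header the same way generate.py does, then check it.
buf = bytearray(0x80)
buf[0x10:0x14] = b"NRO0"
buf[0x18:0x1C] = struct.pack("<I", len(buf))
print(looks_like_synthetic_nro(bytes(buf)))  # True
```

Running this against `inputs/homebrew.nro` after step 1 should give the same result.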
diff --git a/samples/homebrew-intake/generate.py b/samples/homebrew-intake/generate.py
new file mode 100644
index 0000000..a749c69
--- /dev/null
+++ b/samples/homebrew-intake/generate.py
@@ -0,0 +1,167 @@
+#!/usr/bin/env python3
+"""Generate synthetic NRO/NSO inputs and provenance metadata."""
+
+from __future__ import annotations
+
+import argparse
+import hashlib
+import os
+from pathlib import Path
+import struct
+
+
+def write_u32(buf: bytearray, offset: int, value: int) -> None:
+    buf[offset : offset + 4] = struct.pack("<I", value)
+
+
+def write_u64(buf: bytearray, offset: int, value: int) -> None:
+    buf[offset : offset + 8] = struct.pack("<Q", value)
+
+
+def sha256_path(path: Path) -> str:
+    hasher = hashlib.sha256()
+    with path.open("rb") as handle:
+        for chunk in iter(lambda: handle.read(8192), b""):
+            hasher.update(chunk)
+    return hasher.hexdigest()
+
+
+def build_nro(path: Path, with_assets: bool) -> None:
+    header_size = 0x80
+    text = b"TEXT-SEGMENT"
+    rodata = b"RODATA-SEGMENT"
+    data = b"DATA-SEGMENT"
+
+    text_off = header_size
+    ro_off = text_off + len(text)
+    data_off = ro_off + len(rodata)
+
+    nro_size = header_size + len(text) + len(rodata) + len(data)
+    buf = bytearray(nro_size)
+
+    buf[0x10:0x14] = b"NRO0"
+    write_u32(buf, 0x18, nro_size)
+    write_u32(buf, 0x20, 0x0)
+    write_u32(buf, 0x24, len(text))
+    write_u32(buf, 0x28, 0x1000)
+    write_u32(buf, 0x2C, len(rodata))
+    write_u32(buf, 0x30, 0x2000)
+    write_u32(buf, 0x34, len(data))
+    write_u32(buf, 0x38, 0x20)
+
+    build_id = b"SYNTHETIC-NRO-BUILD-ID".ljust(0x20, b"0")
+    buf[0x40:0x60] = build_id
+
+    buf[text_off : text_off + len(text)] = text
+    buf[ro_off : ro_off + len(rodata)] = rodata
+    buf[data_off : data_off + len(data)] = data
+
+    if with_assets:
+        asset_base = len(buf)
+        asset_header_size = 0x38
+        icon = b"SYNTH-ICON-DATA"
+        nacp = bytearray(0x4000)
+        nacp[:24] = b"SYNTHETIC NACP METADATA"
+        romfs = b"ROMFS-SAMPLE-DATA"
+
+        icon_offset = asset_header_size
+        nacp_offset = icon_offset + len(icon)
+        romfs_offset = nacp_offset + len(nacp)
+        total = asset_base + asset_header_size + len(icon) + len(nacp) + 
len(romfs) + if total > len(buf): + buf.extend(b"\x00" * (total - len(buf))) + + buf[asset_base : asset_base + 4] = b"ASET" + write_u64(buf, asset_base + 0x8, icon_offset) + write_u64(buf, asset_base + 0x10, len(icon)) + write_u64(buf, asset_base + 0x18, nacp_offset) + write_u64(buf, asset_base + 0x20, len(nacp)) + write_u64(buf, asset_base + 0x28, romfs_offset) + write_u64(buf, asset_base + 0x30, len(romfs)) + + icon_start = asset_base + icon_offset + buf[icon_start : icon_start + len(icon)] = icon + nacp_start = asset_base + nacp_offset + buf[nacp_start : nacp_start + len(nacp)] = nacp + romfs_start = asset_base + romfs_offset + buf[romfs_start : romfs_start + len(romfs)] = romfs + + path.write_bytes(buf) + + +def build_nso(path: Path) -> None: + header_size = 0x100 + text = b"NSO-TEXT-SEGMENT" + rodata = b"NSO-RODATA" + data = b"NSO-DATA" + + text_off = header_size + ro_off = text_off + len(text) + data_off = ro_off + len(rodata) + + total = header_size + len(text) + len(rodata) + len(data) + buf = bytearray(total) + + buf[0x0:0x4] = b"NSO0" + write_u32(buf, 0x8, 0x0) + write_u32(buf, 0x10, text_off) + write_u32(buf, 0x14, 0x0) + write_u32(buf, 0x18, len(text)) + write_u32(buf, 0x20, ro_off) + write_u32(buf, 0x24, 0x1000) + write_u32(buf, 0x28, len(rodata)) + write_u32(buf, 0x30, data_off) + write_u32(buf, 0x34, 0x2000) + write_u32(buf, 0x38, len(data)) + write_u32(buf, 0x3C, 0x40) + + module_id = b"SYNTHETIC-NSO-BUILD-ID".ljust(0x20, b"0") + buf[0x40:0x60] = module_id + write_u32(buf, 0x60, len(text)) + write_u32(buf, 0x64, len(rodata)) + write_u32(buf, 0x68, len(data)) + + buf[text_off : text_off + len(text)] = text + ro_start = ro_off + buf[ro_start : ro_start + len(rodata)] = rodata + data_start = data_off + buf[data_start : data_start + len(data)] = data + + path.write_bytes(buf) + + +def build_provenance(path: Path, nro_path: Path, nso_path: Path) -> None: + nro_sha = sha256_path(nro_path) + nso_sha = sha256_path(nso_path) + nro_size = 
nro_path.stat().st_size + nso_size = nso_path.stat().st_size + + content = f"""schema_version = \"1\"\n\n[title]\nname = \"Homebrew Intake Sample\"\ntitle_id = \"0100000000000000\"\nversion = \"0.1.0\"\nregion = \"US\"\n\n[collection]\ndevice = \"demo\"\ncollected_at = \"2026-02-01\"\nnotes = \"Synthetic homebrew intake fixture with non-proprietary assets.\"\n\n[collection.tool]\nname = \"synthetic-generator\"\nversion = \"1.0\"\n\n[[inputs]]\npath = \"inputs/{nro_path.name}\"\nformat = \"nro0\"\nsha256 = \"{nro_sha}\"\nsize = {nro_size}\nrole = \"homebrew_module\"\n\n[[inputs]]\npath = \"inputs/{nso_path.name}\"\nformat = \"nso0\"\nsha256 = \"{nso_sha}\"\nsize = {nso_size}\nrole = \"auxiliary_module\"\n""" + path.write_text(content) + + +def main() -> int: + parser = argparse.ArgumentParser(description="Generate synthetic homebrew intake inputs") + parser.add_argument("--no-assets", action="store_true", help="skip asset section") + args = parser.parse_args() + + root = Path(__file__).resolve().parent + inputs_dir = root / "inputs" + inputs_dir.mkdir(parents=True, exist_ok=True) + + nro_path = inputs_dir / "homebrew.nro" + nso_path = inputs_dir / "overlay.nso" + + build_nro(nro_path, with_assets=not args.no_assets) + build_nso(nso_path) + build_provenance(root / "provenance.toml", nro_path, nso_path) + + print(f"Wrote {nro_path} ({nro_path.stat().st_size} bytes)") + print(f"Wrote {nso_path} ({nso_path.stat().st_size} bytes)") + print("Updated provenance.toml") + + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/samples/homebrew-intake/inputs/homebrew.nro b/samples/homebrew-intake/inputs/homebrew.nro new file mode 100644 index 0000000..ce7e686 Binary files /dev/null and b/samples/homebrew-intake/inputs/homebrew.nro differ diff --git a/samples/homebrew-intake/inputs/overlay.nso b/samples/homebrew-intake/inputs/overlay.nso new file mode 100644 index 0000000..2153eba Binary files /dev/null and 
b/samples/homebrew-intake/inputs/overlay.nso differ diff --git a/samples/homebrew-intake/provenance.toml b/samples/homebrew-intake/provenance.toml new file mode 100644 index 0000000..9830921 --- /dev/null +++ b/samples/homebrew-intake/provenance.toml @@ -0,0 +1,30 @@ +schema_version = "1" + +[title] +name = "Homebrew Intake Sample" +title_id = "0100000000000000" +version = "0.1.0" +region = "US" + +[collection] +device = "demo" +collected_at = "2026-02-01" +notes = "Synthetic homebrew intake fixture with non-proprietary assets." + +[collection.tool] +name = "synthetic-generator" +version = "1.0" + +[[inputs]] +path = "inputs/homebrew.nro" +format = "nro0" +sha256 = "c30c135e46d5b0baea8a9ce8f2686366c754e926beb653c47882477b65f2feb2" +size = 16637 +role = "homebrew_module" + +[[inputs]] +path = "inputs/overlay.nso" +format = "nso0" +sha256 = "a90d4d0c38c9a690869133407253a5f39f2e16a88bb9c0781acd8642e34b5d6b" +size = 290 +role = "auxiliary_module" diff --git a/specs/README.md b/specs/README.md index a19c45d..9203213 100644 --- a/specs/README.md +++ b/specs/README.md @@ -17,6 +17,12 @@ This folder contains the project specs for the Switch static recompilation prese - SPEC-096-BUNDLE-MANIFEST-INTEGRITY.md - SPEC-100-VALIDATION.md - SPEC-110-TITLE-SELECTION.md +- SPEC-120-HOMEBREW-INTAKE.md +- SPEC-130-HOMEBREW-MODULE-EXTRACTION.md +- SPEC-140-HOMEBREW-RUNTIME-SURFACE.md +- SPEC-150-HOMEBREW-ASSET-PACKAGING.md +- SPEC-160-AARCH64-DECODE-COVERAGE.md +- SPEC-170-FUNCTION-DISCOVERY-CFG.md ## Template - SPEC-TEMPLATE.md diff --git a/specs/SPEC-030-RECOMP-PIPELINE.md b/specs/SPEC-030-RECOMP-PIPELINE.md index 3d5a1a4..60715fb 100644 --- a/specs/SPEC-030-RECOMP-PIPELINE.md +++ b/specs/SPEC-030-RECOMP-PIPELINE.md @@ -1,11 +1,13 @@ # SPEC-030: Static Recompilation Pipeline ## Status -Draft v0.4 +Draft v0.6 ## Rationale - Added an exploratory pipeline scaffold and config schema to validate the end-to-end shape. - Added deterministic build manifest emission with input hashes. 
+- Added a placeholder homebrew lifter command to bridge intake into lifted JSON. +- Added a minimal AArch64 decode mode to lift homebrew text segments into the placeholder IR. ## Purpose Define the end-to-end pipeline for static recompilation from input binaries to native output. diff --git a/specs/SPEC-050-CPU-ISA.md b/specs/SPEC-050-CPU-ISA.md index 7b69530..0972900 100644 --- a/specs/SPEC-050-CPU-ISA.md +++ b/specs/SPEC-050-CPU-ISA.md @@ -1,13 +1,14 @@ # SPEC-050: CPU ISA Lifting and Semantics ## Status -Draft v0.4 +Draft v0.5 ## Rationale - Added a minimal ISA execution module for early instruction semantics tests. - Expanded semantics coverage with arithmetic and NZCV flag updates. - Added shift and rotate immediate semantics with carry updates. - Added load/store stubs with alignment checks for memory access validation. +- Added a minimal AArch64 lifter subset for mov wide, add, and ret to feed the exploratory pipeline. ## Purpose Define instruction coverage and semantics for the Switch CPU ISA lifting layer. diff --git a/specs/SPEC-120-HOMEBREW-INTAKE.md b/specs/SPEC-120-HOMEBREW-INTAKE.md new file mode 100644 index 0000000..ff659f6 --- /dev/null +++ b/specs/SPEC-120-HOMEBREW-INTAKE.md @@ -0,0 +1,63 @@ +# SPEC-120: Homebrew Candidate Intake + +## Status +Draft v0.3 + +## Purpose +Define the intake requirements and metadata capture needed to select a Switch homebrew title and feed it into the static recompilation pipeline. + +## Goals +- Accept a legally distributable homebrew candidate with clear provenance. +- Normalize input artifacts into a deterministic intake manifest. +- Preserve asset separation while extracting optional metadata and RomFS content. + +## Non-Goals +- Supporting retail titles or proprietary content. +- Recompiling dynamically generated code. +- Handling titles that require runtime emulation rather than static recompilation. 
+
+## Background
+Homebrew on Switch is commonly distributed as NRO modules. NRO is an executable format for non-ExeFS binaries and may include an optional asset section carrying an icon, NACP metadata, and RomFS content.
+
+## Requirements
+- Intake must accept an NRO as the primary module, with optional auxiliary NSO modules when supplied.
+- Intake must reject inputs that contain proprietary retail assets or encrypted formats.
+- The NRO header and module identifier must be parsed to capture segment sizes, memory offsets, and build id metadata.
+- If a homebrew asset section is present, intake must detect and extract the icon, NACP, and RomFS offsets and sizes.
+- If NACP is present, intake must capture it as a 0x4000-byte control.nacp blob with UTF-8 strings.
+- Inputs must be hashed (SHA-256) and stored alongside provenance metadata.
+- Intake must emit a deterministic manifest describing:
+  - Input file paths, sizes, hashes.
+  - Parsed module id/build id.
+  - Asset section presence and sizes.
+  - Tool versions used for parsing.
+
+## Interfaces and Data
+- Inputs are stored under a per-title directory containing:
+  - `module.nro`
+  - Optional `module*.nso`
+  - Optional `assets/` extracted from the NRO asset section
+  - `provenance.toml` and `title.toml` metadata files
+- Intake produces a `module.json` and `manifest.json` compatible with the pipeline CLI; the homebrew module.json is consumed by the lifter stage to produce a lifted module.json for translation.
+
+## Deliverables
+- NRO and optional NSO intake parser.
+- Deterministic intake manifest schema.
+- Documentation for required input layout and provenance fields.
+
+## Open Questions
+- Should the intake step allow raw ELF inputs for developer-built homebrew, or require NRO only?
+- How should optional NSO modules be mapped into a single module.json layout?
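To make the determinism requirement concrete: a manifest emitter can pin key order and formatting so identical inputs always serialize byte-identically. A minimal sketch, with illustrative field names that are assumptions rather than the spec's schema:

```python
import hashlib
import json

def intake_manifest(inputs: dict[str, bytes], build_id: str, tool_version: str) -> str:
    """Emit a deterministic intake manifest (field names are illustrative).

    Determinism comes from sort_keys, fixed indentation, and sorting the
    input list, so identical inputs serialize to byte-identical JSON.
    """
    manifest = {
        "schema_version": "1",
        "build_id": build_id,
        "tool": {"name": "homebrew-intake", "version": tool_version},
        "inputs": [
            {
                "path": path,
                "size": len(data),
                "sha256": hashlib.sha256(data).hexdigest(),
            }
            for path, data in sorted(inputs.items())
        ],
    }
    return json.dumps(manifest, sort_keys=True, indent=2) + "\n"

a = intake_manifest({"module.nro": b"NRO0"}, "deadbeef", "1.0")
b = intake_manifest({"module.nro": b"NRO0"}, "deadbeef", "1.0")
print(a == b)  # True
```

The same policy (stable ordering plus recorded hashes) is what makes a byte-identical re-run checkable in CI.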
+
+## Acceptance Criteria
+- A sample homebrew NRO can be ingested with a generated manifest that records hashes, build id, and asset offsets.
+- If the NRO contains NACP and RomFS, intake extracts and records them without mixing assets into code output.
+- Intake fails fast with a clear error when a required field is missing or unsupported.
+
+## Risks
+- Homebrew titles that embed unexpected custom data in the asset section may require extra parsing rules.
+- Incorrect module id parsing could break reproducibility or provenance tracking.
+
+## References
+- https://switchbrew.org/wiki/NRO
+- https://switchbrew.org/wiki/NACP
diff --git a/specs/SPEC-130-HOMEBREW-MODULE-EXTRACTION.md b/specs/SPEC-130-HOMEBREW-MODULE-EXTRACTION.md
new file mode 100644
index 0000000..eec8e11
--- /dev/null
+++ b/specs/SPEC-130-HOMEBREW-MODULE-EXTRACTION.md
@@ -0,0 +1,59 @@
+# SPEC-130: Homebrew Module Extraction
+
+## Status
+Draft v0.2
+
+## Purpose
+Define how NRO and NSO binaries are parsed and normalized into the internal module representation used by the static recompilation pipeline.
+
+## Goals
+- Provide a deterministic, lossless mapping from NRO/NSO to module.json.
+- Support compressed NSO segments and record build ids.
+- Preserve section boundaries and relocation metadata for later translation.
+
+## Non-Goals
+- Full dynamic loader emulation.
+- Recovering symbols beyond what the module provides.
+
+## Background
+NRO is a Switch executable format used for non-ExeFS binaries and includes header offsets for code, rodata, data, and bss, plus an optional asset section.
+NSO is another Switch executable format that can store segments compressed with LZ4 and includes a ModuleId for build identification.
+
+## Requirements
+- The extractor must parse NRO headers and map text, rodata, data, and bss segments into module.json with file and memory offsets.
citeturn1view0 +- The extractor must parse NSO headers and segment tables, including any LZ4-compressed segments, and produce decompressed outputs for translation. citeturn2view0 +- The extractor must capture the ModuleId/build id from NRO or NSO metadata for reproducible builds. citeturn1view0turn2view0 +- If dynamic symbol tables or relocation metadata are present, the extractor must preserve them in module.json for later resolution. +- Extraction must be deterministic: identical inputs produce byte-identical module.json and extracted segment files. + +## Interfaces and Data +- Input: `module.nro` and optional `module*.nso`. +- Output: + - `module.json` with: + - Segment list (name, file offset, size, vaddr, permissions). + - BSS size and base address. + - Build id/module id. + - Optional relocation and symbol table references. + - Extracted segment blobs stored under `out//segments/`. + +## Deliverables +- NRO and NSO parsers. +- Module normalization logic. +- Tests covering NRO-only and NRO + NSO ingestion. + +## Open Questions +- Do we need to support embedded MOD0 metadata beyond symbol resolution? +- Should compressed NSO segments be cached to avoid repeated LZ4 decoding? + +## Acceptance Criteria +- A homebrew NRO produces a module.json with correct segment sizes and a non-empty build id. +- An NSO with compressed segments can be parsed, decompressed, and emitted as deterministic blobs. +- Extraction preserves all section boundaries necessary for later instruction translation. + +## Risks +- Incorrect segment alignment or padding could lead to wrong control flow reconstruction. +- Missing relocation metadata may require fallback heuristics. 
+
+## References
+- https://switchbrew.org/wiki/NRO
+- https://switchbrew.org/wiki/NSO
diff --git a/specs/SPEC-140-HOMEBREW-RUNTIME-SURFACE.md b/specs/SPEC-140-HOMEBREW-RUNTIME-SURFACE.md
new file mode 100644
index 0000000..3884651
--- /dev/null
+++ b/specs/SPEC-140-HOMEBREW-RUNTIME-SURFACE.md
@@ -0,0 +1,57 @@
+# SPEC-140: Homebrew Runtime Surface
+
+## Status
+Draft v0.2
+
+## Purpose
+Define the runtime ABI surface required to boot a recompiled homebrew title and satisfy the Switch homebrew ABI expectations.
+
+## Goals
+- Provide a minimal, deterministic runtime surface for homebrew startup.
+- Map required loader configuration fields into the runtime environment.
+- Establish a clear contract for unsupported services.
+
+## Non-Goals
+- Re-implementing the full Horizon OS service set.
+- Supporting dynamically loaded NROs at runtime.
+
+## Background
+The Switch homebrew ABI defines how NRO entrypoints receive a loader configuration and which config entries must be present at startup, including EndOfList, MainThreadHandle, and AppletType.
+It also defines the register arguments used for NRO entrypoints.
+
+## Requirements
+- The runtime must provide an entrypoint shim that invokes the recompiled NRO entrypoint with:
+  - X0 pointing to the loader configuration structure.
+  - X1 set to 0xFFFFFFFFFFFFFFFF for NROs.
+- The runtime must populate loader config entries for EndOfList, MainThreadHandle, and AppletType at a minimum.
+- The runtime must surface loader config entries for optional fields (Argv, OverrideHeap, AllocPages, LockRegion) when present, and fail with a clear error if a required field is missing.
+- The runtime must provide a stable, deterministic time source and input event queue to minimize nondeterminism in validation.
+- The runtime must document which Horizon OS services are stubbed and which are implemented, with a hard failure for unsupported calls.
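The minimum loader-config block above can be sketched as a builder. The entry layout (u32 key, u32 flags, two u64 values) and the key numbers used here (EndOfList = 0, MainThreadHandle = 1, AppletType = 7) are recalled from the Homebrew ABI wiki and must be re-verified before the shim relies on them:

```rust
// Sketch of the loader-config block handed to the entrypoint in X0.
// Entry layout and key values are assumptions from the Homebrew ABI
// wiki (re-verify): Key/Flags u32s plus two u64 values per entry;
// EndOfList = 0, MainThreadHandle = 1, AppletType = 7.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
struct ConfigEntry {
    key: u32,
    flags: u32,
    value: [u64; 2],
}

const KEY_END_OF_LIST: u32 = 0;
const KEY_MAIN_THREAD_HANDLE: u32 = 1;
const KEY_APPLET_TYPE: u32 = 7;

// Build the minimum entry list; the list is only well-formed if it is
// terminated by an EndOfList entry.
fn build_loader_config(main_thread_handle: u64, applet_type: u64) -> Vec<ConfigEntry> {
    vec![
        ConfigEntry { key: KEY_MAIN_THREAD_HANDLE, flags: 0, value: [main_thread_handle, 0] },
        ConfigEntry { key: KEY_APPLET_TYPE, flags: 0, value: [applet_type, 0] },
        ConfigEntry { key: KEY_END_OF_LIST, flags: 0, value: [0, 0] },
    ]
}
```

The entry shim would then pass a pointer to this slice in X0 and `0xFFFF_FFFF_FFFF_FFFF` in X1, per the requirements above.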
+ +## Interfaces and Data +- Runtime ABI struct definitions for loader config entries. +- A generated `runtime_manifest.json` describing: + - Supported loader config keys. + - Stubbed services and behavior. + - Determinism knobs (time, input). + +## Deliverables +- Entry shim implementation for the homebrew ABI. +- Loader config builder. +- Runtime service capability documentation. + +## Open Questions +- Should we map libnx service calls through a thin compatibility layer or directly implement a minimal subset? +- What is the smallest deterministic input/timing surface that still allows real gameplay? + +## Acceptance Criteria +- A recompiled homebrew binary boots and reaches its main loop with the runtime providing required loader config keys. +- Unsupported services fail with an explicit, logged error that references the missing service. +- The runtime manifest enumerates which loader config keys were provided for a run. + +## Risks +- Some homebrew titles may assume additional loader config keys not covered by the minimum set. +- Overly strict service stubs may block otherwise runnable titles. + +## References +- https://switchbrew.org/wiki/Homebrew_ABI diff --git a/specs/SPEC-150-HOMEBREW-ASSET-PACKAGING.md b/specs/SPEC-150-HOMEBREW-ASSET-PACKAGING.md new file mode 100644 index 0000000..c46a8dc --- /dev/null +++ b/specs/SPEC-150-HOMEBREW-ASSET-PACKAGING.md @@ -0,0 +1,57 @@ +# SPEC-150: Homebrew Asset Packaging + +## Status +Draft v0.3 + +## Purpose +Define how homebrew asset data (icon, NACP, RomFS) is extracted from NROs and packaged with the recompiled output while preserving asset separation. + +## Goals +- Extract and preserve NRO asset section contents deterministically. +- Keep code output and asset output strictly separated. +- Emit metadata that allows a runtime to mount RomFS content. + +## Non-Goals +- Repacking assets back into NRO format. +- Handling proprietary retail assets. 
+
+## Background
+NRO files can include an optional asset section that contains icon, NACP metadata, and RomFS content.
+NACP (control.nacp) is a 0x4000-byte metadata file with UTF-8 strings used for title metadata.
+
+## Requirements
+- If an NRO asset section is present, extraction must locate the icon, NACP, and RomFS offsets/sizes and copy them into the output asset directory.
+- Extracted NACP must be stored verbatim as `control.nacp` and validated for size 0x4000.
+- Extracted icon data must be preserved as raw bytes with file metadata describing expected image type when known.
+- RomFS content must be extracted into a deterministic directory structure and hashed for provenance.
+- The output `manifest.json` must include per-asset hashes and sizes alongside code hashes.
+
+## Interfaces and Data
+- Output layout:
+  - `out/<title>/assets/control.nacp`
+  - `out/<title>/assets/icon.bin`
+  - `out/<title>/assets/romfs/<romfs-path>`
+  - `out/<title>/manifest.json`
+- `manifest.json` fields for asset hashes, sizes, and source offsets.
+
+## Deliverables
+- Asset extraction tool.
+- Asset manifest schema updates.
+- Documentation for runtime RomFS mounting expectations.
+
+## Open Questions
+- Should icon data be normalized to a specific image format for downstream tooling?
+- How should multi-language NACP strings be surfaced in title metadata?
+
+## Acceptance Criteria
+- A homebrew NRO with asset section yields extracted icon, NACP, and RomFS file tree in a deterministic output directory.
+- Asset hashes in manifest.json match the extracted bytes.
+- Code output remains separate from extracted assets.
+
+## Risks
+- Some homebrew titles may omit NACP or RomFS entirely, requiring graceful handling.
+- Incorrect RomFS extraction could break resource loading at runtime.
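Locating the asset entries and enforcing the 0x4000 NACP size can be sketched together. The header layout assumed here (an `ASET` magic, a version word, then u64 (offset, size) pairs for icon, NACP, and RomFS, offsets relative to the asset header) is recalled from the switchbrew NRO page and should be re-verified:

```rust
// Sketch of locating the NRO asset section. The layout is an
// assumption to re-verify against the switchbrew NRO page:
// "ASET" magic, version, then u64 (offset, size) pairs for
// icon, NACP, RomFS, relative to the asset header.
fn read_u64(buf: &[u8], off: usize) -> Option<u64> {
    buf.get(off..off + 8).map(|b| {
        let mut a = [0u8; 8];
        a.copy_from_slice(b);
        u64::from_le_bytes(a)
    })
}

#[derive(Debug, PartialEq)]
struct AssetSection {
    icon: (u64, u64), // (offset, size)
    nacp: (u64, u64),
    romfs: (u64, u64),
}

fn parse_aset(buf: &[u8]) -> Result<AssetSection, String> {
    if buf.get(0..4) != Some(b"ASET".as_slice()) {
        return Err("missing ASET magic".into());
    }
    let pair = |off| -> Result<(u64, u64), String> {
        Ok((
            read_u64(buf, off).ok_or("truncated asset header")?,
            read_u64(buf, off + 8).ok_or("truncated asset header")?,
        ))
    };
    let section = AssetSection { icon: pair(0x8)?, nacp: pair(0x18)?, romfs: pair(0x28)? };
    // NACP, when present, must be exactly 0x4000 bytes.
    if section.nacp.1 != 0 && section.nacp.1 != 0x4000 {
        return Err(format!("unexpected NACP size {:#x}", section.nacp.1));
    }
    Ok(section)
}
```

Rejecting a malformed NACP size here keeps bad metadata from silently propagating into `manifest.json`.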
+ +## References +- https://switchbrew.org/wiki/NRO +- https://switchbrew.org/wiki/NACP diff --git a/specs/SPEC-160-AARCH64-DECODE-COVERAGE.md b/specs/SPEC-160-AARCH64-DECODE-COVERAGE.md new file mode 100644 index 0000000..ab94066 --- /dev/null +++ b/specs/SPEC-160-AARCH64-DECODE-COVERAGE.md @@ -0,0 +1,68 @@ +# SPEC-160: AArch64 Decode Coverage + +## Status +Draft v0.1 + +## Purpose +Define the minimum AArch64 instruction decode coverage needed to lift real homebrew code into the exploratory IR. + +## Goals +- Expand decode coverage beyond the current MOV/ADD/RET subset. +- Map decoded instructions into deterministic, testable IR operations. +- Capture enough semantics to support basic control flow and memory access. + +## Non-Goals +- Full AArch64 ISA coverage. +- SIMD, floating point, or system instruction support. +- Accurate exception modeling beyond explicit traps for unsupported opcodes. + +## Background +The current homebrew lifter decodes only a tiny subset of instructions and produces a single linear block. Real homebrew titles quickly encounter control flow, memory access, and wider arithmetic operations that must be lifted to progress. + +## Requirements +### Decode Coverage (Phase 1) +- Move wide: `MOVZ`, `MOVN`, `MOVK` (64-bit). +- Register move: `ORR` with `XZR` (64-bit) as a move alias. +- Integer arithmetic: `ADD`/`SUB` (immediate and register, 64-bit). +- Compare/test: `CMP`/`CMN`/`TST` via `SUBS`/`ADDS`/`ANDS` (64-bit), emitting flag updates. +- PC-relative: `ADR`, `ADRP` (64-bit). +- Loads/stores: `LDR`/`STR` unsigned immediate for byte/half/word/dword sizes (64-bit base). +- Branching: `B`, `BL`, `BR`, `RET`, `B.<cond>`, `CBZ`/`CBNZ`, `TBZ`/`TBNZ`. +- NOP: `NOP`. + +### IR Extensions +- Add IR ops for: + - Arithmetic: `sub_i64`, `and_i64`, `or_i64`, `xor_i64`. + - Comparisons: `cmp_i64` that updates flags. + - Shifts: `lsl_i64`, `lsr_i64`, `asr_i64`. + - Memory: `load_i{8,16,32,64}`, `store_i{8,16,32,64}`. 
+ - Control flow: `br`, `br_cond`, `call`, `ret`. + - PC-relative: `pc_rel` or explicit `const_i64` of resolved addresses. +- When decoding 32-bit variants (W registers), zero-extend to 64-bit in the IR. + +### Decode Rules +- Decode little-endian 32-bit words with 4-byte alignment. +- Reject unsupported instructions with the opcode and PC offset. +- Enforce deterministic decode limits per function to avoid runaway scans. + +## Interfaces and Data +- The lifted `module.json` must include the instruction-derived ops and any new op fields. +- Flag updates must be explicit in the IR so later stages do not infer hidden side effects. + +## Deliverables +- A decoder module covering Phase 1 instructions. +- IR extensions with serialization support. +- Tests that validate opcode decoding and IR emission for each instruction class. + +## Acceptance Criteria +- A synthetic instruction stream containing Phase 1 opcodes lifts without errors. +- Unsupported opcodes report the PC and opcode value. +- Tests confirm 32-bit variants are zero-extended. +- Loads/stores emit correctly typed IR ops with aligned access checks. + +## Risks +- Partial decode coverage may still be insufficient for real titles. +- Incorrect flag modeling can break control flow and comparisons. + +## References +- https://developer.arm.com/documentation/ddi0596/latest diff --git a/specs/SPEC-170-FUNCTION-DISCOVERY-CFG.md b/specs/SPEC-170-FUNCTION-DISCOVERY-CFG.md new file mode 100644 index 0000000..85e6428 --- /dev/null +++ b/specs/SPEC-170-FUNCTION-DISCOVERY-CFG.md @@ -0,0 +1,74 @@ +# SPEC-170: Function Discovery and Control-Flow Graph + +## Status +Draft v0.1 + +## Purpose +Define basic function discovery and control-flow block construction for lifted homebrew code. + +## Goals +- Replace single linear decoding with basic blocks and explicit control-flow edges. +- Discover functions starting from entrypoints and direct call targets. +- Produce deterministic, reproducible function layouts in `module.json`. 
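The branch instructions in SPEC-160's Phase 1 set are also what drive the block and edge construction this spec describes. A sketch of classifying branch words, with bit patterns recalled from the Arm A64 encoding tables (B has 0b000101 in bits 31:26, BL 0b100101, B.cond is the 0x54000000 family, RET X30 is 0xD65F03C0) — treat the masks as assumptions to re-verify against the manual:

```rust
// Classify a subset of the Phase 1 branch encodings from SPEC-160.
// Bit patterns are assumptions from the Arm A64 tables (re-verify):
// B = 0b000101 in bits 31:26, BL = 0b100101, B.cond = 0x54xxxxxx
// with bit 4 clear, RET X30 = 0xD65F03C0.
#[derive(Debug, PartialEq)]
enum Branch {
    Unconditional { target: u64 },
    Call { target: u64 },
    Conditional { target: u64 }, // taken edge; fallthrough is pc + 4
    Ret,
}

// Sign-extend a 26-bit immediate and scale by the 4-byte word size.
fn sext_imm26(word: u32) -> i64 {
    ((((word & 0x03FF_FFFF) as i32) << 6 >> 6) as i64) * 4
}

fn classify_branch(pc: u64, word: u32) -> Option<Branch> {
    match word >> 26 {
        0b000101 => {
            return Some(Branch::Unconditional { target: pc.wrapping_add(sext_imm26(word) as u64) })
        }
        0b100101 => {
            return Some(Branch::Call { target: pc.wrapping_add(sext_imm26(word) as u64) })
        }
        _ => {}
    }
    if word == 0xD65F03C0 {
        return Some(Branch::Ret); // RET X30
    }
    if word & 0xFF00_0010 == 0x5400_0000 {
        // B.cond: sign-extended imm19 in bits 23:5, scaled by 4.
        let off = (((((word >> 5) & 0x7FFFF) as i32) << 13 >> 13) as i64) * 4;
        return Some(Branch::Conditional { target: pc.wrapping_add(off as u64) });
    }
    None // CBZ/CBNZ, TBZ/TBNZ, BR etc. omitted from this sketch
}
```

Returning `None` for everything else lets the decoder treat unclassified words as straight-line instructions or report them as unsupported, per SPEC-160's decode rules.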
+ +## Non-Goals +- Full symbol recovery or decompilation quality control flow. +- Indirect branch target resolution beyond explicit metadata. +- Advanced tail-call or exception unwinding analysis. + +## Background +Linear decoding fails once branches appear and cannot represent multiple control-flow paths. Basic block construction allows the pipeline to model conditional branches and calls while keeping the IR deterministic. + +## Requirements +### Entry Points +- Seed function discovery from the title entrypoint in `title.toml`. +- If a homebrew module provides symbol or relocation metadata, include direct export targets as additional seeds. + +### Basic Blocks +- Each function is a list of basic blocks with: + - A unique label. + - A starting PC address. + - A list of IR ops. + - A terminator (`br`, `br_cond`, `call`, `ret`). +- Decode sequentially until a terminator or a known block boundary is reached. +- Split blocks at branch targets and fallthrough addresses. + +### Control-Flow Edges +- `B` creates an unconditional edge to its target. +- `B.<cond>`, `CBZ`/`CBNZ`, `TBZ`/`TBNZ` create two edges: taken and fallthrough. +- `BL` creates a call edge and a fallthrough edge. +- `BR` creates an indirect edge tagged as unresolved. + +### Determinism +- Function and block ordering must be stable across runs. +- Use a sorted worklist for addresses to avoid nondeterministic traversal. +- Enforce per-function decode limits with explicit errors. + +### Error Handling +- Unsupported opcodes should fail the function decode with PC and opcode. +- Overlapping blocks or invalid alignment should fail with a clear error. + +## Interfaces and Data +- Update the lifted module schema to support `blocks` under each function. +- Blocks must preserve their start addresses for traceability. +- The pipeline must accept both linear `ops` (legacy) and block-based functions during the transition. + +## Deliverables +- Function discovery engine with a block builder. 
+- Schema updates and validation for block-based functions. +- Tests for: + - Simple if/else control flow. + - Direct calls that seed new functions. + - Indirect branches recorded as unresolved edges. + +## Acceptance Criteria +- A synthetic binary with a conditional branch yields at least two blocks and correct edges. +- Direct call targets are discovered and lifted as separate functions. +- The lifted module is deterministic when run twice on the same input. + +## Risks +- Missing branch targets may drop code paths. +- Overly strict decode limits may block real-world binaries. + +## References +- https://en.wikipedia.org/wiki/Basic_block
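As a closing sketch, the deterministic block-splitting rule above (split at branch targets and fallthrough addresses, traverse in sorted order) can be shown independently of real decoding. `Insn` and its fields are illustrative names, not the pipeline's actual schema:

```rust
use std::collections::BTreeSet;

// Abstract per-instruction summary; field names are illustrative only.
struct Insn {
    pc: u64,
    branch_target: Option<u64>, // direct target, if this is a branch
    falls_through: bool,        // false for unconditional B and RET
}

// Block starts = function entry, every direct branch target, and the
// address after every branch or terminator. A BTreeSet keeps results
// sorted, so traversal order is stable across runs (the determinism
// requirement), with 4-byte instruction spacing assumed.
fn block_starts(entry: u64, insns: &[Insn]) -> Vec<u64> {
    let mut starts = BTreeSet::new();
    starts.insert(entry);
    for insn in insns {
        if let Some(target) = insn.branch_target {
            starts.insert(target);
            // The instruction after a branch begins a new block,
            // whether reached by fallthrough or only by other edges.
            starts.insert(insn.pc + 4);
        } else if !insn.falls_through {
            starts.insert(insn.pc + 4);
        }
    }
    starts.into_iter().collect()
}
```

A real implementation would additionally discard starts that fall outside the decoded range (e.g. the address after a trailing RET) and use the same sorted set as the decode worklist.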