
Initial version of opencc-wasm library with a demo site #1

Merged
frankslin merged 6 commits into master from wasm-demo
Jan 1, 2026


@frankslin
Owner

frankslin merged commit 143b623 into master on Jan 1, 2026
23 checks passed
frankslin deleted the wasm-demo branch on January 1, 2026 at 15:40
frankslin added a commit that referenced this pull request Jan 1, 2026
* Add WASM demo scaffold and project notes
* Add OpenCC WASM demo with converter UI and test runner
  - Document how to use the compiled WASM output from frontend JS
* Polish WASM demo UI and paths, run tests, and streamline converter export
* Add wasm-based OpenCC package and update demo to consume it
* Add wasm-based OpenCC package, static demo bundle, and benchmarking page
* Add copyright notice and LICENSE
frankslin added a commit that referenced this pull request Jan 3, 2026
frankslin pushed a commit that referenced this pull request Jan 9, 2026
This commit addresses two severe security vulnerabilities discovered in
OpenCC's UTF-8 text processing logic.

## Vulnerability 1: MaxMatchSegmentation Buffer Overflow (Issue BYVoid#997)

**Location:** src/MaxMatchSegmentation.cpp
**Type:** Heap buffer overflow via integer underflow
**CVSS:** ~7.5 (High)

Problem:
- Used a manual length decrement: `length -= matchedLength`
- When a UTF-8 character's declared length exceeded the remaining bytes, the subtraction underflowed `size_t`
- The next `MatchPrefix()` call then received a huge length value and read beyond the buffer

Example trigger:
- Input: "一" + \xE4\xB8 (truncated 3-byte sequence)
- Iteration 2: remainingLength=2, NextCharLength=3
- Old: length = 2 - 3 = SIZE_MAX (underflow)
- Result: out-of-bounds read past the end of the buffer
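
The arithmetic can be reproduced in isolation. Below is a minimal C++ sketch, assuming `size_t` byte counts as described above; `OldRemaining`, `NewRemaining`, and `CheckTruncatedRemaining` are hypothetical helpers for illustration, not OpenCC functions, and they model only the length bookkeeping, not the dictionary lookup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Model of the old bookkeeping: subtract the length of the character just
// consumed. size_t arithmetic is modular, so 2 - 3 wraps to SIZE_MAX.
std::size_t OldRemaining(std::size_t length, std::size_t matchedLength) {
  return length - matchedLength;
}

// Model of the fixed bookkeeping: clamp the step to the bytes that really
// remain, advance, then recompute the remaining length from two pointers.
std::size_t NewRemaining(const char* pstr, const char* textEnd,
                         std::size_t matchedLength) {
  const std::size_t remaining = static_cast<std::size_t>(textEnd - pstr);
  if (matchedLength > remaining) {
    matchedLength = remaining;  // never step past textEnd
  }
  pstr += matchedLength;
  return static_cast<std::size_t>(textEnd - pstr);
}

// The "一" + \xE4\xB8 input from the example (5 bytes in total).
void CheckTruncatedRemaining() {
  const char text[] = "\xE4\xB8\x80\xE4\xB8";
  const char* textEnd = text + 5;
  // Iteration 2 starts 3 bytes in: 2 bytes left, NextCharLength = 3.
  assert(OldRemaining(2, 3) == SIZE_MAX);           // old: underflow
  assert(NewRemaining(text + 3, textEnd, 3) == 0);  // new: stops at the end
}
```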

## Vulnerability 2: Conversion Information Disclosure (More Severe)

**Location:** src/Conversion.cpp
**Type:** Information disclosure + heap buffer overflow
**CVSS:** ~8.6 (High)

Problem:
- Similar to Vulnerability 1, but worse: leaked memory is written to the conversion result
- When processing a truncated UTF-8 sequence, the read pointer could jump over the null terminator
- The loop would then keep reading heap memory and copy its contents to the output
- Could leak: encryption keys, passwords, user data, etc.

Example exploit:
- Input: "干" + \xE5\xB9 + null
- Output: "幹" + heap_garbage_data
- Attacker receives sensitive information directly
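
The fixed behavior for this input can be sketched as a simplified conversion loop. This is an illustrative model under stated assumptions, not OpenCC's `Conversion.cpp`: `LeadByteLength` is a hypothetical stand-in for `NextCharLength`, only single-character lookups in a `std::map` are modeled (the real converter matches whole phrases), and the source file is assumed to be saved as UTF-8 so that "干" and "幹" are 3-byte literals.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for NextCharLength(): the sequence length announced
// by a UTF-8 lead byte. It cannot know how many bytes actually remain.
std::size_t LeadByteLength(unsigned char c) {
  if (c < 0x80) return 1;
  if ((c & 0xE0) == 0xC0) return 2;
  if ((c & 0xF0) == 0xE0) return 3;
  if ((c & 0xF8) == 0xF0) return 4;
  return 1;  // invalid lead byte: consume a single byte
}

// Simplified conversion loop with the fix's bounds checks. Only
// single-character dictionary lookups are modeled here.
std::string ConvertChars(const std::string& phrase,
                         const std::map<std::string, std::string>& dict) {
  std::string out;
  const char* pstr = phrase.c_str();
  const char* phraseEnd = pstr + phrase.length();  // computed once
  while (pstr < phraseEnd && *pstr != '\0') {
    const std::size_t remaining = static_cast<std::size_t>(phraseEnd - pstr);
    std::size_t len = LeadByteLength(static_cast<unsigned char>(*pstr));
    if (len > remaining) {
      len = remaining;  // truncated sequence: take only the bytes present
    }
    const std::string ch(pstr, len);
    const auto it = dict.find(ch);
    out += (it != dict.end()) ? it->second : ch;
    pstr += len;
  }
  return out;
}

void CheckTruncatedConversion() {
  const std::map<std::string, std::string> dict{{"干", "幹"}};
  const std::string input = std::string("干") + "\xE5\xB9";
  // Expected: the converted character followed by the untouched tail.
  assert(ConvertChars(input, dict) == std::string("幹") + "\xE5\xB9");
}
```

Because each step is clamped to the bytes that remain, the incomplete `\xE5\xB9` tail is copied through unchanged and the loop never reads past `phraseEnd`.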

Why this is more severe than Vulnerability 1:
- Issue BYVoid#997: Buffer overflow (no data output)
- This bug: Buffer overflow + data exfiltration
- Direct information disclosure to attacker

## Solution

Implemented a defense-in-depth approach with multiple layers:

1. **Layer 1 - Dynamic length calculation:**
   ```cpp
   const char* textEnd = text.c_str() + text.length();
   size_t remainingLength = textEnd - pstr;  // Recomputed every iteration, so it cannot underflow
   ```

2. **Layer 2 - Explicit boundary checks:**
   ```cpp
   if (matchedLength > remainingLength) {
       matchedLength = remainingLength;  // Clamp to safe value
   }
   ```

3. **Layer 3 - Loop termination:**
   - Existing `*pstr != '\0'` check as final safeguard

4. **Layer 4 - Dictionary match validation:**
   - Also validate KeyLength() doesn't exceed remainingLength
   - Defends even against corrupted dictionary data
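
Putting the four layers together, a maximum-match segmentation loop might look like the following sketch. `MatchPrefixLength` is a hypothetical linear-scan stand-in for the dictionary's `MatchPrefix`, and `LeadByteLength` for `NextCharLength`; the names, signatures, and data structure are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for NextCharLength(): length announced by a lead byte.
std::size_t LeadByteLength(unsigned char c) {
  if (c < 0x80) return 1;
  if ((c & 0xE0) == 0xC0) return 2;
  if ((c & 0xF0) == 0xE0) return 3;
  if ((c & 0xF8) == 0xF0) return 4;
  return 1;
}

// Hypothetical stand-in for MatchPrefix(): length of the longest key that is
// a prefix of [text, text + maxLength), or 0. Keys longer than maxLength are
// skipped, so the comparison never reads past the caller's bound.
std::size_t MatchPrefixLength(const std::vector<std::string>& keys,
                              const char* text, std::size_t maxLength) {
  std::size_t best = 0;
  for (const std::string& key : keys) {
    if (key.size() > best && key.size() <= maxLength &&
        key.compare(0, key.size(), text, key.size()) == 0) {
      best = key.size();
    }
  }
  return best;
}

std::vector<std::string> Segment(const std::string& text,
                                 const std::vector<std::string>& keys) {
  std::vector<std::string> segments;
  const char* pstr = text.c_str();
  const char* textEnd = pstr + text.length();  // Layer 1: end pointer, once
  while (pstr < textEnd && *pstr != '\0') {    // Layer 3: terminator check
    // Layer 1: recompute from pointers instead of decrementing a counter.
    const std::size_t remainingLength =
        static_cast<std::size_t>(textEnd - pstr);
    std::size_t matchedLength = MatchPrefixLength(keys, pstr, remainingLength);
    if (matchedLength == 0) {
      matchedLength = LeadByteLength(static_cast<unsigned char>(*pstr));
    }
    // Layers 2 and 4: whatever produced the length, clamp it to what remains.
    if (matchedLength > remainingLength) {
      matchedLength = remainingLength;
    }
    segments.emplace_back(pstr, matchedLength);
    pstr += matchedLength;
  }
  return segments;
}

void CheckSegmentation() {
  const std::vector<std::string> keys{"一二"};
  // Normal input: longest dictionary match first, then a single character.
  const std::vector<std::string> normal = Segment("一二三", keys);
  assert(normal.size() == 2 && normal[0] == "一二" && normal[1] == "三");
  // Truncated input: the incomplete tail becomes its own segment, so the
  // segments concatenate back to the original bytes.
  const std::string truncated = std::string("一") + "\xE4\xB8";
  const std::vector<std::string> parts = Segment(truncated, keys);
  assert(parts.size() == 2 && parts[0] + parts[1] == truncated);
}
```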

## Changes

**Code fixes:**
- src/MaxMatchSegmentation.cpp:
  * Calculate textEnd pointer once
  * Dynamically compute remainingLength per iteration
  * Add explicit bounds check for NextCharLength result
  * Pass remainingLength to MatchPrefix

- src/Conversion.cpp:
  * Calculate phraseEnd pointer once
  * Dynamically compute remainingLength per iteration
  * Add bounds checks for both NextCharLength and KeyLength
  * Prevent reading beyond the null terminator

**Test coverage:**
- src/MaxMatchSegmentationTest.cpp:
  * Add TruncatedUtf8Sequence test
  * Verifies handling of incomplete UTF-8 sequences
  * Ensures output preserves all input bytes (no data loss)

- src/ConversionTest.cpp:
  * Add TruncatedUtf8Sequence test
  * Verifies that conversion still works and leaks no information
  * Tests with "干" → "幹" + preserved incomplete sequence

## Behavior Verification

**Normal input:** Behavior completely unchanged
- Old: length values 9→6→3→0
- New: remainingLength values 9→6→3→0
- Boundary checks never trigger
- Negligible performance impact (one pointer subtraction and one comparison per iteration)
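
The 9→6→3→0 trace for a three-character input can be checked with a small sketch that records `remainingLength` at each step. `RemainingTrace` and `LeadByteLength` are hypothetical helpers, and valid 3-byte CJK input ("一二三", source saved as UTF-8) is assumed; for such input the clamp never fires, which is why normal behavior is unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for NextCharLength(): length announced by a lead byte.
std::size_t LeadByteLength(unsigned char c) {
  if (c < 0x80) return 1;
  if ((c & 0xE0) == 0xC0) return 2;
  if ((c & 0xF0) == 0xE0) return 3;
  if ((c & 0xF8) == 0xF0) return 4;
  return 1;
}

// Records remainingLength at the start of each iteration, plus the final
// value after the loop exits.
std::vector<std::size_t> RemainingTrace(const std::string& text) {
  std::vector<std::size_t> trace;
  const char* pstr = text.c_str();
  const char* textEnd = pstr + text.length();
  while (pstr < textEnd) {
    const std::size_t remaining = static_cast<std::size_t>(textEnd - pstr);
    trace.push_back(remaining);
    std::size_t len = LeadByteLength(static_cast<unsigned char>(*pstr));
    if (len > remaining) {
      len = remaining;  // boundary check: never triggers for valid UTF-8
    }
    pstr += len;
  }
  trace.push_back(static_cast<std::size_t>(textEnd - pstr));
  return trace;
}

void CheckNormalTrace() {
  // Three 3-byte characters: 9 -> 6 -> 3 -> 0, as with the old decrement.
  const std::vector<std::size_t> expected{9, 6, 3, 0};
  assert(RemainingTrace("一二三") == expected);
}
```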

**Malicious input:** Now safely handled
- Incomplete UTF-8 sequences preserved (no data loss)
- No buffer overruns
- No information disclosure
- All tests pass (15/15)

## Security Impact

- Fixes CWE-125 (Out-of-bounds Read)
- Fixes CWE-200 (Information Exposure)
- Prevents denial-of-service (DoS) crashes caused by out-of-bounds reads
- Prevents information disclosure attacks
- Backward compatible with all normal use cases

Discovered during a security audit. All users processing untrusted input
should upgrade immediately.

Fixes BYVoid#997
frankslin pushed a commit that referenced this pull request Jan 9, 2026
frankslin added a commit that referenced this pull request Jan 13, 2026
frankslin added a commit that referenced this pull request Jan 14, 2026
frankslin added a commit that referenced this pull request Jan 14, 2026
frankslin added a commit that referenced this pull request Jan 14, 2026
frankslin added a commit that referenced this pull request Jan 14, 2026
frankslin added a commit that referenced this pull request Jan 14, 2026
frankslin added a commit that referenced this pull request Jan 16, 2026
frankslin added a commit that referenced this pull request Jan 21, 2026
frankslin added a commit that referenced this pull request Jan 24, 2026
frankslin added a commit that referenced this pull request Jan 28, 2026
frankslin added a commit that referenced this pull request Mar 9, 2026
frankslin added a commit that referenced this pull request Mar 17, 2026
frankslin added a commit that referenced this pull request Mar 18, 2026