Angry Data Core is a Kotlin Multiplatform library for locating sensitive data (PII) inside free-form text. The project focuses on Russian banking and personal identifiers and ships with tuned regular expressions, contextual validation logic, and two execution engines: a high-performance HyperScan backend for the JVM and a portable Kotlin regex engine that runs everywhere Kotlin does. Native bindings are also provided so the scanner can be embedded from non-JVM applications.
- Kotlin Multiplatform distribution with JVM, JS, and Native targets published to Maven Central.
- Dual detection engines: `HyperScanEngine` (Hyperscan-powered, `AutoCloseable`) for throughput-sensitive JVM workloads and `KotlinEngine` for portable scanning.
- Hardened matchers for bank card numbers (Luhn + BIN validation), CVV/CVC, bank account numbers, Russian passports, SNILS, OMS, INN, phone numbers, vehicle licence plates, full names, addresses, logins, passwords, e-mail addresses, IPv4/IPv6, and customizable user signatures.
- `Match` results include the detected value, ten characters of surrounding context, and absolute character offsets to simplify redaction or highlighting.
- Built-in matcher registry (`org.angryscan.common.extensions.Matchers`) plus Kotlin Serialization support, making it easy to persist engine configurations or ship them over the wire.
- Kotlin/Native shared library (`AngryData.dll` / `libAngryData.so` / `libAngryData.dylib`) with C ABI exports for integrations such as Python.
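The card matcher is described above as using Luhn validation. The snippet below is a minimal standalone sketch of a Luhn check, not the library's own implementation; the function name `passesLuhn` is hypothetical, and BIN validation is not covered.

```kotlin
/**
 * Hypothetical helper (not part of Angry Data Core): returns true when the
 * digits found in [raw] satisfy the Luhn checksum. Non-digit characters such
 * as spaces or dashes are ignored.
 */
fun passesLuhn(raw: String): Boolean {
    val digits = raw.filter { it.isDigit() }
    if (digits.isEmpty()) return false

    var sum = 0
    // Walk from the rightmost digit; double every second digit.
    for ((index, ch) in digits.reversed().withIndex()) {
        var d = ch - '0'
        if (index % 2 == 1) {
            d *= 2
            if (d > 9) d -= 9 // same as summing the two digits of the product
        }
        sum += d
    }
    return sum % 10 == 0
}

fun main() {
    println(passesLuhn("4276 8070 1492 7948"))
    println(passesLuhn("4276 8070 1492 7949"))
}
```

A checksum like this only filters out random digit runs; it cannot tell whether a number belongs to a real card.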

Project modules:

- `:kotlin-lib` - core multiplatform library with engines, matchers, shared constants, and publishing settings.
- `:native-lib` - Kotlin/Native wrapper that exposes C-callable detection functions backed by the core library.
- `python-example` - `ctypes` demo that consumes the native shared library and prints detection results.

All artifacts are published under the org.angryscan group.
repositories {
    mavenCentral()
}

dependencies {
    // Multiplatform artifact; usable from JVM or MPP projects
    implementation("org.angryscan:core:1.3.5")
}

<dependency>
    <groupId>org.angryscan</groupId>
    <artifactId>core</artifactId>
    <version>1.3.5</version>
</dependency>

dependencies {
    implementation("org.angryscan:core-js:1.3.5")
}

Note: The HyperScan backend relies on the `com.gliwka.hyperscan` wrapper. Ensure your runtime platform is supported by that dependency; otherwise fall back to the portable `KotlinEngine`.

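One way to implement the fallback mentioned in the note is to try engine factories in order and keep the first one that can be created. The sketch below is JVM-only and uses only the standard library; `firstAvailable` is a hypothetical helper, and the exact exception an unsupported platform raises is an assumption (a `LinkageError` such as `UnsatisfiedLinkError` is typical for missing native code).

```kotlin
/**
 * Hypothetical helper (not part of Angry Data Core): calls each factory in
 * order and returns the first result that is created without throwing.
 */
fun <T> firstAvailable(vararg factories: () -> T): T {
    var lastError: Throwable? = null
    for (factory in factories) {
        try {
            return factory()
        } catch (e: Exception) {
            lastError = e
        } catch (e: LinkageError) {
            // Assumption: a missing or incompatible native library surfaces here.
            lastError = e
        }
    }
    throw IllegalStateException("No engine could be created", lastError)
}

fun main() {
    // Stand-in factories; real code would construct the two engines instead.
    val chosen = firstAvailable<String>(
        { throw UnsatisfiedLinkError("native backend unavailable") },
        { "portable engine" },
    )
    println(chosen)
}
```

In real code the factories would construct `HyperScanEngine` and `KotlinEngine` with their respective matcher lists. Whether the two engines share a common supertype that `T` can resolve to is not stated here, so check the library's types before relying on this pattern.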
import org.angryscan.common.engine.hyperscan.HyperScanEngine
import org.angryscan.common.engine.hyperscan.IHyperMatcher
import org.angryscan.common.extensions.Matchers
val text = """
Client Ivan Ivanov lives at Moscow, Tverskaya street 1.
Card 4276 8070 1492 7948 (CVV 123), phone +7 (916) 123-45-67 and email ivanov@example.org.
""".trimIndent()
HyperScanEngine(Matchers.filterIsInstance<IHyperMatcher>()).use { engine ->
    val matches = engine.scan(text)
    matches.forEach { match ->
        println("${match.matcher.name}: ${match.value} at ${match.startPosition}-${match.endPosition}")
    }
}

Compiling regular expressions on every startup can be slow when the matcher set is large. You can compile once, save the database, and reload it on subsequent runs without recompiling.

import org.angryscan.common.engine.hyperscan.HyperScanEngine
import org.angryscan.common.engine.hyperscan.IHyperMatcher
import org.angryscan.common.extensions.Matchers
import java.io.File
val matchers = Matchers.filterIsInstance<IHyperMatcher>()
val dbFile = File("hyperscan.db")

// First run: compile and save
HyperScanEngine(matchers).use { engine ->
    engine.saveCompiledDatabase(dbFile)
}

// Subsequent runs: load (no compilation)
HyperScanEngine.fromCompiledDatabase(matchers, dbFile).use { engine ->
    engine.scan(text).forEach { match ->
        println("${match.matcher.name}: ${match.value}")
    }
}

In-memory `ByteArray` variants are also available:

// Save to bytes
val bytes: ByteArray = engine.saveCompiledDatabase()

// Load from bytes
val fast = HyperScanEngine.fromCompiledDatabase(matchers, bytes)

Compatibility note: the saved database is tied to the exact matcher set, their order, and the `requireKeywords` flag used during compilation. Loading a database with a different configuration will throw `IllegalArgumentException`. The binary format is also platform-specific (see Hyperscan documentation).

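Because a stale database is rejected with `IllegalArgumentException`, a load-or-rebuild wrapper can keep startup code simple. The following is a minimal, library-independent sketch for the JVM; `loadOrRebuild` and its parameters are hypothetical names, and the engine calls are passed in as lambdas.

```kotlin
import java.io.File

/**
 * Hypothetical helper (not part of Angry Data Core): loads a cached artifact
 * from [cache]; if the file is missing or is rejected with
 * IllegalArgumentException, builds a fresh one and saves it.
 */
fun <T> loadOrRebuild(
    cache: File,
    load: (File) -> T,
    build: () -> T,
    save: (T, File) -> Unit,
): T {
    if (cache.exists()) {
        try {
            return load(cache)
        } catch (e: IllegalArgumentException) {
            // Cache was produced for a different configuration; discard it.
            cache.delete()
        }
    }
    val fresh = build()
    save(fresh, cache)
    return fresh
}

// Possible wiring with the calls shown earlier (not compiled here):
// val engine = loadOrRebuild(
//     dbFile,
//     load = { HyperScanEngine.fromCompiledDatabase(matchers, it) },
//     build = { HyperScanEngine(matchers) },
//     save = { e, f -> e.saveCompiledDatabase(f) },
// )
```

The helper only handles the configuration mismatch described in the note. How a database copied from another platform fails is not specified here, so that case may need separate handling.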
import org.angryscan.common.engine.kotlin.IKotlinMatcher
import org.angryscan.common.engine.kotlin.KotlinEngine
import org.angryscan.common.extensions.Matchers
val engine = KotlinEngine(Matchers.filterIsInstance<IKotlinMatcher>())
val matches = engine.scan(text)

You can mask card numbers using the matcher's `IMask` API to keep only the first 6 and last 4 digits visible.

import org.angryscan.common.matchers.CardNumber
val masked = CardNumber().mask("1234 5678 9012 3456")
// masked == "1234 56** **** 3456"

To redact card numbers inside a larger text using scan results:

import org.angryscan.common.engine.kotlin.IKotlinMatcher
import org.angryscan.common.engine.kotlin.KotlinEngine
import org.angryscan.common.extensions.Matchers
import org.angryscan.common.engine.IMask
import org.angryscan.common.matchers.CardNumber
val text = "Card 4276 8070 1492 7948 and 1234567890123456"
val engine = KotlinEngine(Matchers.filterIsInstance<IKotlinMatcher>())
val matches = engine.scan(text)
val sb = StringBuilder(text)
matches
    .filter { it.matcher is CardNumber }
    .sortedByDescending { it.startPosition } // replace from the end so earlier offsets stay valid
    .forEach { m ->
        val masked = (m.matcher as IMask).mask(m.value)
        sb.replace(m.startPosition, m.endPosition, masked)
    }
val redacted = sb.toString()
// e.g. "Card 4276 80** **** 7948 and 123456******3456"

import org.angryscan.common.matchers.CardNumber
import org.angryscan.common.matchers.UserSignature
import org.angryscan.common.engine.kotlin.KotlinEngine

val customEngine = KotlinEngine(
    listOf(
        CardNumber(checkCardBins = false), // skip BIN validation for testing data
        UserSignature(
            name = "Corporate stamp",
            searchSignatures = mutableListOf("OOO Romashka", "ZAO Test")
        )
    )
)

Each `Match` contains:

| Field | Description |
|---|---|
| `value` | Extracted token. |
| `before` / `after` | Up to ten characters of context on each side. |
| `startPosition` / `endPosition` | Absolute offsets counted from zero. |
| `matcher` | Reference to the matcher that produced the hit. |

Use this metadata to redact or highlight findings in downstream systems.
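
As an illustration of offset-based redaction, here is a self-contained JVM sketch that does not depend on the library. `Hit` is a hypothetical stand-in carrying only the fields listed in the table, `maskDigits` imitates the "first 6, last 4" rule rather than calling the real `IMask` implementation, and `endPosition` is assumed to be exclusive.

```kotlin
/** Hypothetical stand-in for a scan result; endPosition is assumed exclusive. */
data class Hit(val value: String, val startPosition: Int, val endPosition: Int)

/** Keeps the first [keepHead] and last [keepTail] digits; other digits become '*'. */
fun maskDigits(value: String, keepHead: Int = 6, keepTail: Int = 4): String {
    val total = value.count { it.isDigit() }
    var seen = 0
    return buildString {
        for (ch in value) {
            if (!ch.isDigit()) {
                append(ch) // separators such as spaces are preserved
                continue
            }
            val visible = seen < keepHead || seen >= total - keepTail
            append(if (visible) ch else '*')
            seen++
        }
    }
}

/** Applies masks from the last hit to the first so earlier offsets stay valid. */
fun redact(text: String, hits: List<Hit>): String {
    val sb = StringBuilder(text)
    for (hit in hits.sortedByDescending { it.startPosition }) {
        sb.replace(hit.startPosition, hit.endPosition, maskDigits(hit.value))
    }
    return sb.toString()
}

fun main() {
    val text = "Card 4276 8070 1492 7948 on file"
    val card = "4276 8070 1492 7948"
    val start = text.indexOf(card)
    println(redact(text, listOf(Hit(card, start, start + card.length))))
}
```

Processing hits in descending order matters whenever a replacement changes the length of the text; with same-length masks, as here, it is simply a safe default.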
# Run all tests (JVM + JS + Native)
./gradlew check
# JVM-specific tests
./gradlew :kotlin-lib:jvmTest
# Produce a fat JAR with all dependencies
./gradlew :kotlin-lib:jvmShadowJar

On Windows, use `gradlew.bat` instead of `./gradlew`.

./gradlew :native-lib:linkNativeReleaseShared

The command produces `native-lib/build/bin/native/releaseShared/AngryData.*`. The Python demo in `python-example/interop.py` shows how to load this library with `ctypes`, call functions such as `detectPassport`, and clean detected text using the exported utilities.

- Keep additions ASCII unless the feature requires Cyrillic or other Unicode characters already present in the codebase.
- Add or update tests for new matchers or engine features.
- Run `./gradlew check` before opening a pull request.

Angry Data Core is distributed under the Apache License 2.0. See LICENSE for details.