Add tests.py: unit tests and paper experiment replications for ASOI#6
Conversation
Agent-Logs-Url: https://github.com/Javen-W/ASOI-Python/sessions/367f26f4-5341-4316-826e-899f6a34d16c Co-authored-by: Javen-W <21213702+Javen-W@users.noreply.github.com>
Javen-W left a comment:
Add instructions on how to run these tests. Also add documentation with the actual results of running these unit tests, maybe at the end of the README.md, to demonstrate the validity of this implementation.
Agent-Logs-Url: https://github.com/Javen-W/ASOI-Python/sessions/0d7dd403-257f-4ace-95fa-186e893b2d32 Co-authored-by: Javen-W <21213702+Javen-W@users.noreply.github.com>
Done in commit
@copilot as per your testing example in the README.md, when there is a noise level of 0% and an F1 score of 1.0, why is our ASOI score only 0.7430 and not 1.0?
ASOI is not a measure of label correctness; it measures the geometric structure of the data in feature space. F1 = 1.0 simply means the labels perfectly identify which points are anomalies; it says nothing about how far apart the two groups actually sit in feature space. ASOI has two components: a separation term (how far the labeled anomalies sit from the normal points in feature space) and an overlap term (how much per-feature histogram mass the two groups share).
So ASOI = 1.0 would require perfect geometric separation: every anomaly at the farthest possible position from all normals and zero shared histogram mass in every feature. That is essentially impossible for data drawn from overlapping continuous distributions. In the synthetic test case (normals ~ N(0,1), anomalies ~ N(5,1) in 5D), the two clusters are well-separated but the tails of the Gaussian distributions overlap slightly, and the anomaly centroid is not at the feature-wise maximum. That is why ASOI = 0.7430 despite F1 = 1.0. This is intentional and is actually the strength of ASOI: a detector that perfectly labels hard-to-separate points (F1 = 1.0, ASOI = 0.55) is working harder than one that perfectly labels trivially-separated points (F1 = 1.0, ASOI = 0.95). ASOI captures that distinction; F1 alone cannot.
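The geometric point is easy to demonstrate with the same synthetic setup (normals ~ N(0,1), anomalies ~ N(5,1) in 5D): labels can be perfect while the per-feature histograms still share mass. This is a plain-NumPy sketch of that observation; it does not call the ASOI implementation itself.

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
normals = rng.normal(0.0, 1.0, size=(500, 5))   # normal cluster ~ N(0,1)
anomalies = rng.normal(5.0, 1.0, size=(50, 5))  # anomaly cluster ~ N(5,1)
y_true = np.r_[np.zeros(500, dtype=int), np.ones(50, dtype=int)]

# Perfect labels give F1 = 1.0 regardless of geometry.
assert f1_score(y_true, y_true) == 1.0

# Yet the per-feature histograms of the two groups still share mass,
# because the Gaussian tails overlap slightly.
overlaps = []
for j in range(5):
    edges = np.linspace(-4.0, 9.0, 11)          # common bin edges per feature
    h_norm, _ = np.histogram(normals[:, j], bins=edges)
    h_anom, _ = np.histogram(anomalies[:, j], bins=edges)
    h_norm = h_norm / h_norm.sum()              # probability mass per bin
    h_anom = h_anom / h_anom.sum()
    overlaps.append(np.minimum(h_norm, h_anom).sum())

print(np.mean(overlaps))  # small but nonzero shared mass
```

The shared histogram mass is exactly the kind of residual overlap that keeps ASOI below 1.0 even when every label is correct.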
The repository had no test coverage for the ASOI implementation. This adds `tests.py` with 24 tests that validate correctness and replicate key experimental findings from the paper (precision degradation, Spearman correlation with F1).

Test classes

- `TestInputValidation`: `ValueError` on invalid `alpha` range and on non-binary / single-class label arrays.
- `TestScoreProperties`: return type, `[0, 1]` bounds, determinism, list inputs, constant/single features, `alpha=0` / `alpha=1` edge cases, the `normalize` flag, and separation-vs-overlap ordering.
- `TestPrecisionDegradation`: replicates the paper's precision degradation experiment: ASOI decreases as random label noise is injected, and perfect labels outscore random labels on a well-separated synthetic dataset.
- `TestSpearmanCorrelation`: replicates the paper's core correlation finding: ASOI has a positive Spearman correlation with F1 score as label quality varies, tested on Breast Cancer Wisconsin and a binary Digits subset.
- `TestBenchmarkDatasets`: score sanity checks on Breast Cancer Wisconsin, binary Digits, and synthetic datasets covering low contamination (~2%), high contamination (~40%), and high dimensionality (100 features).

Example
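The idea behind the precision degradation experiment can be sketched on the well-separated synthetic dataset. Since ASOI's call signature is not shown in this thread, `silhouette_score` is used here as a hypothetical stand-in (both reward geometric separation of the labeled groups):

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(7)
# Well-separated synthetic dataset: normals ~ N(0,1), anomalies ~ N(5,1) in 5D.
X = np.vstack([rng.normal(0.0, 1.0, (500, 5)), rng.normal(5.0, 1.0, (50, 5))])
y = np.r_[np.zeros(500, dtype=int), np.ones(50, dtype=int)]

scores = []
for noise in (0.0, 0.2, 0.4):
    labels = y.copy()
    flip = rng.random(len(labels)) < noise   # inject random label noise
    labels[flip] = 1 - labels[flip]
    scores.append(silhouette_score(X, labels))

print([round(s, 3) for s in scores])
# The separation score degrades as label noise rises:
# perfect labels score highest, heavily corrupted labels lowest.
```

The actual tests make the same assertion against the ASOI score rather than a proxy.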
Requires `scipy` (for `spearmanr`) in addition to the existing `numpy` / `scikit-learn` dependencies.
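The `spearmanr`-based correlation check can be sketched end-to-end on Breast Cancer Wisconsin. As above, `silhouette_score` stands in for the ASOI score, since only ASOI's behavior (not its API) is described in this thread:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import f1_score, silhouette_score

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(42)

f1_vals, sep_vals = [], []
for noise in (0.0, 0.1, 0.2, 0.3, 0.4):
    labels = y.copy()
    flip = rng.random(len(labels)) < noise   # vary label quality
    labels[flip] = 1 - labels[flip]
    f1_vals.append(f1_score(y, labels))
    sep_vals.append(silhouette_score(X, labels))

rho, _ = spearmanr(f1_vals, sep_vals)
print(rho)  # positive: geometric separation tracks label quality
```

A positive `rho` is the pattern the paper reports for ASOI vs F1; the test suite asserts it against the real ASOI implementation.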