Skip to content

fix: replace manual anomalies with a hampel filter#1997

Open
matthewp wants to merge 3 commits intonpmx-dev:mainfrom
matthewp:feat/hampel-anomaly-detection
Open

fix: replace manual anomalies with a hampel filter#1997
matthewp wants to merge 3 commits intonpmx-dev:mainfrom
matthewp:feat/hampel-anomaly-detection

Conversation

@matthewp
Copy link

@matthewp matthewp commented Mar 8, 2026

🔗 Linked issue

Previous issue: #1707

🧭 Context

The current implementation of anomaly removal is biased due to the manual nature.

This replaces the implementation with one that uses a hampel filter to automatically remove deviations.

📚 Description

The important part here is applyHampelCorrection.

It goes over each data point and creates a sliding window of data points, by default 3 on each side.

Of those data points it finds the median, since a spike can't pull it off center.

Then it measures how spread out the neighbors are by calculating the MAD (see here).

Then it gives the data point a score and if it's above a threshold then it gets replaced with the median of that window and marked as being an anomaly.

Here are two articles about how hampel filters work:

Here's a screenshot of Vite which still shows its spike removed after this change.

Screenshot from 2026-03-08 17-24-36

@vercel
Copy link

vercel bot commented Mar 8, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
npmx.dev Ready Ready Preview, Comment Mar 8, 2026 9:45pm
2 Skipped Deployments
Project Deployment Actions Updated (UTC)
docs.npmx.dev Ignored Ignored Preview Mar 8, 2026 9:45pm
npmx-lunaria Ignored Ignored Mar 8, 2026 9:45pm

Request Review

@matthewp matthewp changed the title Replace manual anomalies with a hampel filter fix: replace manual anomalies with a hampel filter Mar 8, 2026
@codecov
Copy link

codecov bot commented Mar 8, 2026

Codecov Report

❌ Patch coverage is 23.52941% with 26 lines in your changes missing coverage. Please review.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
app/utils/download-anomalies.ts 20.00% 20 Missing and 4 partials ⚠️
app/components/Package/TrendsChart.vue 50.00% 1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

@github-actions
Copy link

github-actions bot commented Mar 8, 2026

Lunaria Status Overview

🌕 This pull request will trigger status changes.

Learn more

By default, every PR changing files present in the Lunaria configuration's files property will be considered and trigger status changes accordingly.

You can change this by adding one of the keywords present in the ignoreKeywords property in your Lunaria configuration file in the PR's title (ignoring all files) or by including a tracker directive in the merged commit's description.

Tracked Files

File Note
i18n/locales/bg-BG.json Localization changed, will be marked as complete.
i18n/locales/cs-CZ.json Localization changed, will be marked as complete.
i18n/locales/de-DE.json Localization changed, will be marked as complete.
i18n/locales/en.json Source changed, localizations will be marked as outdated.
i18n/locales/es.json Localization changed, will be marked as complete.
i18n/locales/fr-FR.json Localization changed, will be marked as complete.
i18n/locales/hu-HU.json Localization changed, will be marked as complete.
i18n/locales/id-ID.json Localization changed, will be marked as complete.
i18n/locales/ja-JP.json Localization changed, will be marked as complete.
i18n/locales/pl-PL.json Localization changed, will be marked as complete.
i18n/locales/ru-RU.json Localization changed, will be marked as complete.
i18n/locales/tr-TR.json Localization changed, will be marked as complete.
i18n/locales/uk-UA.json Localization changed, will be marked as complete.
i18n/locales/zh-CN.json Localization changed, will be marked as complete.
i18n/locales/zh-TW.json Localization changed, will be marked as complete.
Warnings reference
Icon Description
🔄️ The source for this localization has been updated since the creation of this pull request, make sure all changes in the source have been applied.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 8, 2026

📝 Walkthrough

Walkthrough

This PR replaces blocklist-based anomaly correction with a Hampel-filter implementation (add: applyHampelCorrection) and removes the DOWNLOAD_ANOMALIES dataset and related blocklist functions. TrendsChart and WeeklyDownloadStats now call the Hampel correction. Per-package anomaly UI, detailed anomaly tooltips and related state were removed or simplified. i18n keys for known anomaly ranges were deleted across locales and schema. Tests were updated to exercise the Hampel-based correction.

Possibly related PRs

Suggested reviewers

  • graphieros
🚥 Pre-merge checks | ✅ 1
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed The pull request description clearly relates to the changeset, explaining the shift from manual blocklist-based anomaly correction to Hampel filter-based automatic anomaly detection with supporting technical details.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs).
Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3


ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: beb9630b-e84a-4279-a4f8-362d18f451b4

📥 Commits

Reviewing files that changed from the base of the PR and between d26e250 and 1b0a0c7.

📒 Files selected for processing (5)
  • app/components/Package/TrendsChart.vue
  • app/components/Package/WeeklyDownloadStats.vue
  • app/utils/download-anomalies.data.ts
  • app/utils/download-anomalies.ts
  • test/unit/app/utils/download-anomalies.spec.ts
💤 Files with no reviewable changes (1)
  • app/utils/download-anomalies.data.ts

Comment on lines +32 to +34
// halfWindow controls how many neighbors on each side to consider.
// A window of 3 means we look at 7 points total (3 left + current + 3 right).
const halfWindow = opts?.halfWindow ?? DEFAULT_HALF_WINDOW
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Do not score boundary samples with truncated windows.

Line 33 says a halfWindow of 3 uses 3 neighbours on each side, but Lines 50-52 clamp the first and last samples to shorter windows after Line 42 only validates the overall series length. That makes edge points easy false positives: 100,100,100,100,100,100,200 gets its last point flattened back to 100 because there is no right-hand context and windowMad falls to 0. Skip indices that cannot form a full symmetric window, or handle boundaries explicitly.

Suggested fix
-  for (let i = 0; i < values.length; i++) {
-    // Build a sliding window around the current point, clamped to array bounds.
-    const start = Math.max(0, i - halfWindow)
-    const end = Math.min(values.length - 1, i + halfWindow)
+  for (let i = halfWindow; i < values.length - halfWindow; i++) {
+    // Only evaluate points that have a full symmetric window.
+    const start = i - halfWindow
+    const end = i + halfWindow
     const window = values.slice(start, end + 1)

Also applies to: 41-42, 48-52

Comment on lines +65 to +72
// MAD of 0 means most values in the window are identical.
// If this point differs from the median at all, it's an outlier.
if (windowMad === 0) {
if (deviation > 0) {
result[i]!.value = Math.round(windowMedian)
result[i]!.hasAnomaly = true
}
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

The zero-MAD branch will erase real low-volume traffic.

When windowMad === 0, Line 68 treats any deviation from the local median as an anomaly. For sparse packages, a legitimate series like 0,0,0,1,0,0,0 is rewritten to all zeros on Line 69, which drops the only real activity in that period. Please gate this path behind an absolute/relative floor, or leave zero-MAD windows untouched.

Comment on lines +16 to 83
describe('applyHampelCorrection', () => {
it('flags and corrects a spike in the middle of steady data', () => {
const data: WeeklyDataPoint[] = [
makeWeeklyPoint('2022-11-07', 100),
makeWeeklyPoint('2022-11-14', 100),
makeWeeklyPoint('2022-11-21', 100),
makeWeeklyPoint('2022-11-28', 1000), // spike
makeWeeklyPoint('2022-12-05', 100),
makeWeeklyPoint('2022-12-12', 100),
makeWeeklyPoint('2022-12-19', 100),
]

const result = applyHampelCorrection(data) as WeeklyDataPoint[]

// The spike should be corrected
expect(result[3]!.hasAnomaly).toBe(true)
expect(result[3]!.value).toBe(100) // replaced with median

// Non-spike points should be unchanged
expect(result[0]!.value).toBe(100)
expect(result[0]!.hasAnomaly).toBeUndefined()
expect(result[1]!.value).toBe(100)
expect(result[6]!.value).toBe(100)
})

it('does not flag gradual growth as anomalies', () => {
const data: WeeklyDataPoint[] = [
makeWeeklyPoint('2022-11-07', 100),
makeWeeklyPoint('2022-11-14', 110),
makeWeeklyPoint('2022-11-21', 120),
makeWeeklyPoint('2022-11-28', 130),
makeWeeklyPoint('2022-12-05', 140),
makeWeeklyPoint('2022-12-12', 150),
makeWeeklyPoint('2022-12-19', 160),
]

const result = applyHampelCorrection(data) as WeeklyDataPoint[]

for (const point of result) {
expect(point.hasAnomaly).toBeUndefined()
}
})

it('returns data unchanged when too few points for the window', () => {
const data: WeeklyDataPoint[] = [
makeWeeklyPoint('2022-11-07', 100),
makeWeeklyPoint('2022-11-14', 1000),
makeWeeklyPoint('2022-11-21', 100),
]

const result = applyHampelCorrection(data) as WeeklyDataPoint[]
expect(result[1]!.value).toBe(1000) // not enough data to detect
})

it('does not mutate the original data', () => {
const data: WeeklyDataPoint[] = [
{
value: 100,
weekKey: '2022-11-07_2022-11-13',
weekStart: '2022-11-07',
weekEnd: '2022-11-13',
timestampStart: 0,
timestampEnd: 0,
},
{
value: 999,
weekKey: '2022-11-14_2022-11-20',
weekStart: '2022-11-14',
weekEnd: '2022-11-20',
timestampStart: 0,
timestampEnd: 0,
},
{
value: 999,
weekKey: '2022-11-21_2022-11-27',
weekStart: '2022-11-21',
weekEnd: '2022-11-27',
timestampStart: 0,
timestampEnd: 0,
},
{
value: 999,
weekKey: '2022-11-28_2022-12-04',
weekStart: '2022-11-28',
weekEnd: '2022-12-04',
timestampStart: 0,
timestampEnd: 0,
},
{
value: 200,
weekKey: '2022-12-05_2022-12-11',
weekStart: '2022-12-05',
weekEnd: '2022-12-11',
timestampStart: 0,
timestampEnd: 0,
},
makeWeeklyPoint('2022-11-07', 100),
makeWeeklyPoint('2022-11-14', 100),
makeWeeklyPoint('2022-11-21', 100),
makeWeeklyPoint('2022-11-28', 1000),
makeWeeklyPoint('2022-12-05', 100),
makeWeeklyPoint('2022-12-12', 100),
makeWeeklyPoint('2022-12-19', 100),
]

expect(
applyBlocklistCorrection({
data,
packageName: 'svelte',
granularity: 'weekly',
}),
).toEqual([
data[0],
{ ...data[1], value: 125, hasAnomaly: true },
{ ...data[2], value: 150, hasAnomaly: true },
{ ...data[3], value: 175, hasAnomaly: true },
data[4],
])
applyHampelCorrection(data)
expect(data[3]!.value).toBe(1000) // original unchanged
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Please add regressions for the boundary and sparse-series cases.

The suite covers a centred large spike and gradual growth, but not the two easiest false-positive cases in this implementation: the first/last halfWindow indices, and flat low-volume series such as 0,0,0,1,0,0,0. Please pin both behaviours down here so the filter does not silently flatten real data at the edges or on sparse packages.

As per coding guidelines "**/*.{test,spec}.{ts,tsx}: Write unit tests for core functionality using vitest".

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
app/components/Package/TrendsChart.vue (1)

1831-1838: Consider using v-model for simpler checkbox binding.

The explicit :checked + @change pattern is functionally correct, but v-model provides equivalent behaviour with less boilerplate.

♻️ Optional simplification
 <input
-  :checked="settings.chartFilter.anomaliesFixed"
-  `@change`="
-    settings.chartFilter.anomaliesFixed = ($event.target as HTMLInputElement).checked
-  "
+  v-model="settings.chartFilter.anomaliesFixed"
   type="checkbox"
   class="accent-[var(--accent-color,var(--fg-subtle))]"
 />

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 91971e88-96ef-48e4-befd-3a0865a5eb4c

📥 Commits

Reviewing files that changed from the base of the PR and between 1b0a0c7 and 17e63ce.

📒 Files selected for processing (17)
  • app/components/Package/TrendsChart.vue
  • i18n/locales/bg-BG.json
  • i18n/locales/cs-CZ.json
  • i18n/locales/de-DE.json
  • i18n/locales/en.json
  • i18n/locales/es.json
  • i18n/locales/fr-FR.json
  • i18n/locales/hu-HU.json
  • i18n/locales/id-ID.json
  • i18n/locales/ja-JP.json
  • i18n/locales/pl-PL.json
  • i18n/locales/ru-RU.json
  • i18n/locales/tr-TR.json
  • i18n/locales/uk-UA.json
  • i18n/locales/zh-CN.json
  • i18n/locales/zh-TW.json
  • i18n/schema.json
💤 Files with no reviewable changes (16)
  • i18n/locales/en.json
  • i18n/locales/bg-BG.json
  • i18n/locales/es.json
  • i18n/locales/de-DE.json
  • i18n/locales/zh-TW.json
  • i18n/locales/fr-FR.json
  • i18n/locales/tr-TR.json
  • i18n/locales/cs-CZ.json
  • i18n/locales/hu-HU.json
  • i18n/schema.json
  • i18n/locales/zh-CN.json
  • i18n/locales/uk-UA.json
  • i18n/locales/ru-RU.json
  • i18n/locales/ja-JP.json
  • i18n/locales/pl-PL.json
  • i18n/locales/id-ID.json

@danielroe
Copy link
Member

danielroe commented Mar 8, 2026

this looks very promising! tagging @jycouet who may have some thoughts on this 🙏

@jycouet
Copy link
Contributor

jycouet commented Mar 9, 2026

Great to have a second look into this.

You probably checked my initial PR, #1636 I started with hampel implementation 👍

If I remember correctly, the sweet spot was 2.5 or 3 for vite. But this setting would hide "great start" of a lib. (eg: 0 0 0 0 0 0 20000)
I will pull the branch later (I'm on phone atm) as I'm curious to see it, maybe you have a more robust implem' 👍

Another note, when the start or end of the chart is in an anomalie period, any algo can't fix it. A good example is: if you start the chart in the middle of the vite spike.

Maybe we should add some tests around all this ? (to keep the intent)

Another note 2: I would love to have more than just vite there today ! 😅

@graphieros
Copy link
Contributor

Another note 2: I would love to have more than just vite there today ! 😅

@jycouet corrections were applied to Svelte in #1934 & #1983

@jycouet
Copy link
Contributor

jycouet commented Mar 9, 2026

Another note 2: I would love to have more than just vite there today ! 😅

@jycouet corrections were applied to Svelte in #1934 & #1983

That's the drawback of answering on phone
It shows my age 😅

@jycouet
Copy link
Contributor

jycouet commented Mar 9, 2026

Thanks for this! A few thoughts after pulling the branch:

Reproduction URL that matters:
http://127.0.0.1:3000/package/vite?modal=chart&start=2025-03-09&end=2026-03-07
(because I suspect there's also a day-of-the-week issue atm, see #2005 let's not speak about this here)

"Great start" test cases:

Packages like @bramus/specificity or @sveltejs/sv-utils are good real-world examples of the 0 0 0 0 0 → lots of downloads pattern.
Would be great to add these as test cases (or manual verification) to see how Hampel manages them (or to find good defaults)

PR suggestion

Could we expose the Hampel filter as its own (separate from manual correction), with tweakable halfWindow / threshold sliders? That way we can compare manual vs Hampel side by side and find the right balance before committing to one approach?
Let me know if you want me to help or if you want to do it?

// Happy coding

@matthewp
Copy link
Author

matthewp commented Mar 9, 2026

Could we expose the Hampel filter as its own (separate from manual correction), with tweakable halfWindow / threshold sliders? That way we can compare manual vs Hampel side by side and find the right balance before committing to one approach?
Let me know if you want me to help or if you want to do it?

I don't follow, as its own what?

@jycouet
Copy link
Contributor

jycouet commented Mar 9, 2026

own

Something like
image

It's own controls. (It's more to be able to tests different senarios)

@matthewp
Copy link
Author

matthewp commented Mar 9, 2026

Yes I can do that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants