Instantiate DatasetContextClassifier in DatasetParser and guard null case #19

Merged
lfoppiano merged 1 commit into dev from claude/fix-dataset-classifier-null-BW5OR on Apr 14, 2026

@lfoppiano
Collaborator

Summary

  • Fix the NPE at DatasetParser.java:1485 that occurred on every /service/processDatasetPDF request. this.datasetContextClassifier was permanently null: the getInstance(...) factory accepted the classifier as a parameter, but the private constructor it delegated to only took configuration, so this.datasetContextClassifier = datasetContextClassifier assigned the field to itself and did nothing. The same pattern was already fixed for DatasetDisambiguator in #18 (Fix NPE when disambiguator is null in DatasetParser). This change applies that fix to DatasetContextClassifier: eager initialization via DatasetContextClassifier.getInstance(configuration) in the constructor, a try/catch so model-loading failures don't cascade, a warn-once helper, and a null guard at the call site for graceful degradation.
  • Also removed the equivalent dead no-op line this.dataTypeClassifier = dataTypeClassifier; (the real initialization is lazy, in dataTypeClassify at lines 649-650, and is untouched).
  • Bumped the project version to 0.9.0 in build.gradle, resources/config/config.yml, resources/config/config-docker.yml, and Readme.md (docker tags, GROBID_VERSION build-arg, sample JSON response, build instructions). The GROBID dependencies have been at 0.9.0 since #5 (Remove DataseerML, upgrade to grobid 0.9.0 and Gradle 8.5).
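
For reviewers who want to see the failure mode in isolation, here is a minimal sketch of the field-to-itself no-op and of the eager, guarded initialization described in the first bullet. It is not the code from this PR: ClassifierInitSketch, ContextClassifier, BuggyParser, and FixedParser are hypothetical stand-ins, the Supplier stands in for DatasetContextClassifier.getInstance(configuration), and catching RuntimeException is an assumption about how loading failures surface.

```java
import java.util.function.Supplier;

public class ClassifierInitSketch {

    /** Hypothetical stand-in for DatasetContextClassifier. */
    static class ContextClassifier {
        String classify(String sentence) {
            return "used";
        }
    }

    /** Reproduces the bug: the constructor has no classifier parameter. */
    static class BuggyParser {
        private ContextClassifier contextClassifier;

        BuggyParser(String configuration) {
            // No parameter or local named contextClassifier is in scope, so the
            // right-hand side resolves to the field itself and it stays null.
            this.contextClassifier = contextClassifier;
        }

        ContextClassifier classifier() {
            return contextClassifier;
        }
    }

    /** Sketch of the fix: build the classifier eagerly and contain failures. */
    static class FixedParser {
        private ContextClassifier contextClassifier;

        FixedParser(String configuration, Supplier<ContextClassifier> factory) {
            try {
                // Assumption: factory plays the role of getInstance(configuration).
                this.contextClassifier = factory.get();
            } catch (RuntimeException e) {
                // A loading failure leaves the field null instead of
                // aborting construction of the parser.
                this.contextClassifier = null;
            }
        }

        ContextClassifier classifier() {
            return contextClassifier;
        }
    }

    public static void main(String[] args) {
        System.out.println(new BuggyParser("cfg").classifier() == null);
        System.out.println(new FixedParser("cfg", ContextClassifier::new).classifier() != null);
    }
}
```

Because a null field is still possible after a loading failure, the fixed constructor only helps in combination with a null guard wherever the classifier is used.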

Compatibility with recent fixes

Test plan

  • ./gradlew compileJava — passes.
  • Happy path: with full GROBID install and context models present, start the service (./gradlew run) and curl -F "input=@<sample>.pdf" http://localhost:8060/service/processDatasetPDF. Expect HTTP 200, no NPE in logs, dataset mentions carrying used/created/shared context labels.
  • Degraded path: simulate missing context models by renaming context_used/context_creation/context_shared under grobid-home/models/, restart, and resend the request. Expect HTTP 200 with mentions (no context labels), a single warn-once log line, no NPE, and /service/health reporting the context models as failed.
  • Version bump visible: /service/processDatasetPDF JSON response shows "version": "0.9.0".

https://claude.ai/code/session_01MLUqfjKtqyvzt9ALYKpshr

…case

The datasetContextClassifier field was never initialized: the getInstance
factory accepted it as a parameter, but the private constructor it
delegated to took only configuration, so
`this.datasetContextClassifier = datasetContextClassifier` was a no-op
field-to-itself assignment. This caused an NPE at DatasetParser.java:1485
on every /service/processDatasetPDF request.

Apply the same pattern as #18 (DatasetDisambiguator): eagerly instantiate
via DatasetContextClassifier.getInstance(configuration) in the constructor,
catch model-loading failures, and null-guard the call site with a
warn-once helper so the service degrades gracefully when context models
are unavailable.
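
A rough sketch of the call-site half of that pattern, assuming a single
AtomicBoolean flag is enough for "warn once". WarnOnceSketch, Parser,
warnOnce, and contextLabel are hypothetical names, and the warnings list
stands in for the service's real logger so the example is self-contained:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class WarnOnceSketch {

    /** Hypothetical stand-in for the context classifier. */
    interface ContextClassifier {
        String classify(String sentence);
    }

    static class Parser {
        private final ContextClassifier classifier; // null when models failed to load
        private final AtomicBoolean warned = new AtomicBoolean(false);
        // Stand-in for a logger: collects the warnings that would be emitted.
        final List<String> warnings = new ArrayList<>();

        Parser(ContextClassifier classifier) {
            this.classifier = classifier;
        }

        /** Records the message only the first time it is called. */
        private void warnOnce(String message) {
            if (warned.compareAndSet(false, true)) {
                warnings.add(message);
            }
        }

        /** Returns a context label, or null when no classifier is available. */
        String contextLabel(String sentence) {
            if (classifier == null) {
                warnOnce("Context classifier unavailable; skipping context labels");
                return null;
            }
            return classifier.classify(sentence);
        }
    }

    public static void main(String[] args) {
        Parser degraded = new Parser(null);
        degraded.contextLabel("first request");
        degraded.contextLabel("second request");
        System.out.println(degraded.warnings.size());
    }
}
```

compareAndSet keeps the warning to a single entry even if several
requests reach the guard concurrently.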

Also bump project version to 0.9.0 in build.gradle, config.yml,
config-docker.yml, and Readme.md.

https://claude.ai/code/session_01MLUqfjKtqyvzt9ALYKpshr
@lfoppiano merged commit 0d32a2f into dev on Apr 14, 2026
3 checks passed
@lfoppiano deleted the claude/fix-dataset-classifier-null-BW5OR branch on April 14, 2026 19:01