Adopt a per-token classifier and assembler over Pre-classifier/Sub-classifier Architecture #13

peliclove · 2026-04-03T08:21:24Z


Firstly, this is really cool work. I had been looking for a project working in this domain. From what I could gather from your literature, the current NLP pipeline has two span-level classification stages after the span splitter: first the preclassifier, which assigns an overarching class to the span based mostly on grammar rules, and then the subclassifier, which categorizes the span and extracts a structured object.

The problem with this design is that the span category is determined top-down from preposition patterns, which breaks on spans where the leading token suggests one category but the content suggests another.

If we look at a span like this:

convert all videos to 1080p

The current process would handle it like so:

[to, 1080p] preclassifier says DESTINATION (leading "to")
subclassifier says enum (bare word, not a path)

But 1080p is actually an ARGUMENT (the output resolution), not a destination file path at all.
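
To make the failure concrete, here is a toy sketch of the top-down flow as I understand it. The function names, the preposition table, and the path check are all my own assumptions for illustration, not the project's actual code:

```python
# Toy illustration only: the rules and names below are hypothetical,
# not taken from the project's implementation.
LEADING_PREPOSITION_CATEGORY = {"to": "DESTINATION"}


def preclassify(span):
    """Assign a span category from the leading token alone (top-down)."""
    return LEADING_PREPOSITION_CATEGORY.get(span[0].lower(), "UNKNOWN")


def subclassify(span):
    """Pick a subtype from the content: path-like values vs. bare words."""
    value = span[-1]
    return "path" if ("/" in value or "." in value) else "enum"


span = ["to", "1080p"]
# The category is fixed by the leading "to" before the content is inspected,
# so the subclassifier can only refine DESTINATION, never overturn it.
print(preclassify(span), subclassify(span))
```

The point is that once the leading token has committed the span to DESTINATION, nothing downstream gets the chance to say "this is a resolution".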

What I'm proposing is that, instead of classifying spans top-down, we classify each token individually first and then assemble span-level meaning bottom-up from the token composition.

So the token classifier tells you what each token signifies, while the span assembler works out what the combination of tokens means as a whole.

The span assembler produces the same final output as the old preclassifier + subclassifier combined, which is a SpanResult with category, subtype, value, and confidence, but it makes that determination based on token type composition rather than POS patterns alone.
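
A minimal sketch of what I have in mind, under stated assumptions: the token type names (PREPOSITION, RESOLUTION, PATH, WORD), the regexes, the assembly rules, and the confidence numbers are placeholders I made up. Only the SpanResult fields (category, subtype, value, confidence) come from the description above.

```python
import re
from dataclasses import dataclass


@dataclass
class SpanResult:
    category: str
    subtype: str
    value: str
    confidence: float


def classify_token(token):
    """Tag a single token with a (hypothetical) fundamental type."""
    t = token.lower()
    if t in {"to", "into"}:
        return "PREPOSITION"
    if re.fullmatch(r"\d{3,4}p", t):
        return "RESOLUTION"
    if "/" in t or re.search(r"\.\w{1,5}$", t):
        return "PATH"
    return "WORD"


def assemble_span(tokens):
    """Derive span-level meaning from the composition of token types."""
    types = [classify_token(t) for t in tokens]
    content = [(tok, typ) for tok, typ in zip(tokens, types)
               if typ != "PREPOSITION"]
    value = " ".join(tok for tok, _ in content)
    content_types = {typ for _, typ in content}

    # Content type wins over the leading preposition.
    if "RESOLUTION" in content_types:
        return SpanResult("ARGUMENT", "resolution", value, 0.9)
    if "PATH" in content_types and "PREPOSITION" in types:
        return SpanResult("DESTINATION", "path", value, 0.8)
    # Placeholder fallback with low confidence.
    return SpanResult("UNKNOWN", "enum", value, 0.3)


print(assemble_span(["to", "1080p"]))
print(assemble_span(["to", "/mnt/out"]))
```

With these toy rules, `[to, 1080p]` comes out as an ARGUMENT while `[to, /mnt/out]` is still a DESTINATION, because the decision is driven by what the tokens are rather than by the leading "to" alone.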

sortedcord · 2026-04-04T17:25:36Z

Maintainer

This seems like a good idea to me, and it is similar to what I've done with Coggle-TOK-Assembler. There I decided not to use the pipeline structure we have here, with span splitting followed by role classification. Instead, I tag each token from the start into a couple of fundamental classes and then work my way up to grouping them.

In a nutshell, here I'm trying a top-down approach (first classifying the spans as a whole and then looking at the tokens), while what you're suggesting is a bottom-up approach: understand what each token says, then group the tokens together to form some semantic meaning out of them.
