Replies: 1 comment
This seems like a good idea to me, and it is similar to what I've done with Coggle-TOK-Assembler. There I decided not to use the pipeline structure we have here, with span splitting followed by role classification. Instead, I tag each token from the start into a few fundamental classes and then work my way up by grouping them. In a nutshell, here I'm trying a top-down approach (first classifying the spans as a whole and then looking at tokens), while what you're suggesting is a bottom-up approach: understand what each token says, then group the tokens together to derive semantic meaning from them.
Firstly, this is some really cool work being done here. I had been looking for a project working in this domain. From what I could gather from your literature, the current NLP pipeline design has two span-level classification stages after the span splitter: first the preclassifier, which assigns an overarching class to the span based mostly on grammar rules, and then the subclassifier, which categorizes the span and extracts a structured object.
The problem this design runs into is that span category is determined top down from preposition patterns, which breaks on spans where the leading token suggests one category but the content suggests another.
If we look at a span like this:
convert all videos to 1080p
The current process would key on the leading preposition "to" and classify the span as a destination filepath, but 1080p is actually an ARGUMENT (output resolution), not a destination filepath at all.
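To make the failure mode concrete, here is a minimal sketch of a preposition-driven preclassifier. The table `PREPOSITION_CATEGORY`, the function `preclassify`, and the category labels are hypothetical names I made up for illustration; the project's actual rules are surely richer than this.

```python
# Hypothetical preposition-to-category table, for illustration only.
# The real preclassifier rules are not shown in this thread.
PREPOSITION_CATEGORY = {
    "to": "DESTINATION",
    "into": "DESTINATION",
    "from": "SOURCE",
}


def preclassify(span: str) -> str:
    """Assign a category from the leading token only (top-down)."""
    tokens = span.lower().split()
    if not tokens:
        return "UNKNOWN"
    # Only the first token is consulted; the rest of the span is ignored.
    return PREPOSITION_CATEGORY.get(tokens[0], "UNKNOWN")


# Both spans get the same category, although only the second is a path.
print(preclassify("to 1080p"))
print(preclassify("to ./out/clips"))
```

Because the decision is made before the content of the span is inspected, nothing downstream gets a chance to notice that `1080p` does not look like a path.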
What I'm proposing is that, instead of classifying spans top down, we classify each token individually first and then assemble span-level meaning bottom-up from token composition.
So the token classifier would tell you what each token signifies, while the span assembler analyzes what the combination of tokens means as a whole.
The span assembler produces the same final output as the old preclassifier and subclassifier combined, namely a SpanResult with category, subtype, value, and confidence, but it makes that determination based on token type composition rather than POS patterns alone.
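A rough sketch of how the two pieces could fit together is below. Only the `SpanResult` fields (category, subtype, value, confidence) come from the description above. The token classes, the regexes, the function names `classify_token` and `assemble_span`, and the confidence numbers are all placeholder assumptions on my part, not the project's API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class SpanResult:
    category: str
    subtype: str
    value: str
    confidence: float


# Hypothetical token classes and patterns, chosen only for this example.
RESOLUTION = re.compile(r"^\d{3,4}p$")       # e.g. 720p, 1080p
PATHLIKE = re.compile(r"[/\\]|\.\w{2,4}$")   # has a separator or a file extension
PREPOSITIONS = {"to", "into", "from", "in"}


def classify_token(token: str) -> str:
    """Label a single token without looking at its neighbours."""
    t = token.lower()
    if t in PREPOSITIONS:
        return "PREP"
    if RESOLUTION.match(t):
        return "RESOLUTION"
    if PATHLIKE.search(t):
        return "PATH"
    return "WORD"


def assemble_span(tokens: list[str]) -> SpanResult:
    """Derive span-level meaning from the mix of token types (bottom-up)."""
    types = [classify_token(t) for t in tokens]
    if "RESOLUTION" in types:
        value = tokens[types.index("RESOLUTION")]
        return SpanResult("ARGUMENT", "output_resolution", value, 0.9)
    if "PATH" in types:
        value = tokens[types.index("PATH")]
        # The preposition now only refines the subtype; it no longer decides the category.
        subtype = "source" if tokens[0].lower() == "from" else "destination"
        return SpanResult("FILEPATH", subtype, value, 0.8)
    return SpanResult("UNKNOWN", "", " ".join(tokens), 0.3)


print(assemble_span(["to", "1080p"]))
print(assemble_span(["to", "./out/clips"]))
```

The point of the design is that "to" still contributes information, but it is weighed together with the other token types instead of fixing the category up front.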