Our README originally listed this as a possible qualitative way to analyze metrics. We can in fact generate this plot, since we already have all the scores; we just didn't get around to it for the study.
For a good metric, we would expect a plot like this to show mostly high values, with somewhat lower values for in-class samples.

(note: image is from https://www.researchgate.net/publication/385010313_signwriting-evaluation_Effective_Sign_Language_Evaluation_via_SignWriting)
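
For anyone who wants to try this, here is a rough sketch of how the pairwise scores could be arranged for such a plot. It is only an illustration, not the code from our study: `class_sorted_matrix`, `in_vs_out_class_means`, and the toy distance are made-up names, and it assumes the metric is a distance-like score where lower means more similar.

```python
from statistics import mean

def class_sorted_matrix(samples, labels, score):
    """Build a pairwise score matrix with samples grouped by class label."""
    # Sorting indices by label puts in-class pairs into blocks along the diagonal.
    order = sorted(range(len(samples)), key=lambda i: labels[i])
    matrix = [[score(samples[i], samples[j]) for j in order] for i in order]
    return [labels[i] for i in order], matrix

def in_vs_out_class_means(sorted_labels, matrix):
    """Average the off-diagonal scores separately for in-class and out-of-class pairs."""
    in_class, out_class = [], []
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if i == j:
                continue  # skip self-comparisons
            if sorted_labels[i] == sorted_labels[j]:
                in_class.append(value)
            else:
                out_class.append(value)
    return mean(in_class), mean(out_class)

# Toy data (hypothetical): numbers stand in for samples, abs difference for the metric.
samples = [0.0, 5.0, 0.2, 5.1]
labels = ["A", "B", "A", "B"]

def toy_distance(a, b):
    return abs(a - b)

sorted_labels, matrix = class_sorted_matrix(samples, labels, toy_distance)
in_mean, out_mean = in_vs_out_class_means(sorted_labels, matrix)
print(sorted_labels)
print(in_mean < out_mean)
```

The resulting `matrix` can then be drawn as a heatmap, for example with matplotlib's `imshow`. Comparing the two means gives a quick numeric check of the same pattern the plot is meant to show.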