Support non-binary word vectors in compute-accuracy #8

Open
Witiko wants to merge 1 commit into agnusmaximus:master from Witiko:non-binary-compute-accuracy

Witiko commented Jan 10, 2019

This PR adds binary and text format identifiers to the header of word vector files so that the two formats can be told apart. The compute-accuracy script has been extended to recognize both binary and non-binary word vectors. To remain backwards-compatible, binary word vectors without the format identifier are also recognized.

The PR has been successfully tested as follows:

  • Three word vector files were produced using -binary 0 on HEAD, -binary 1 on HEAD, and -binary 1 on HEAD^.
  • All three word vector files were evaluated using compute-accuracy to check that the standard deviation of the total accuracy across the three files was below 1%.
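
The acceptance check in the second step can be sketched as follows. This is only an illustration: the helper name `stddev_below` is hypothetical, the sample (n - 1) standard deviation is an assumption since the description does not say which estimator was used, and "1%" is read here as one percentage point. Any accuracy values passed in are the caller's own and are not results from this PR.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the sample standard deviation of the
 * n accuracy values (in percent) is below `threshold`, 0 otherwise.
 * The variance is compared against threshold^2, which avoids sqrt().
 * Assumption: sample (n - 1) estimator; threshold is non-negative. */
static int stddev_below(const double *acc, size_t n, double threshold) {
    if (n < 2) return 1;  /* a single value has no spread */

    double mean = 0.0;
    for (size_t i = 0; i < n; i++) mean += acc[i];
    mean /= (double)n;

    double sum_sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = acc[i] - mean;
        sum_sq += d * d;
    }

    double variance = sum_sq / (double)(n - 1);
    return variance < threshold * threshold;
}
```

With the three total accuracies collected into an array, `stddev_below(acc, 3, 1.0)` would express the criterion described above.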

Witiko commented Jan 29, 2019

I am sorry, I did not realize the vector file format was a de-facto standard. I suppose a better course of action might be to add a -binary argument to compute-accuracy.c instead.
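
A minimal sketch of what that alternative might look like, leaving the vector file format untouched. The function names `parse_binary_flag` and `read_vector` are hypothetical and not taken from this PR; defaulting to binary when the flag is absent is an assumption made to keep existing invocations working. The text branch assumes whitespace-separated decimal values and the binary branch assumes raw 4-byte floats.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: looks for "-binary <0|1>" among the arguments.
 * Assumption: defaults to 1 (binary) when the flag is not given. */
static int parse_binary_flag(int argc, char **argv) {
    for (int i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], "-binary") == 0) return atoi(argv[i + 1]);
    }
    return 1;
}

/* Hypothetical: reads `dim` floats for one word from `f`.
 * binary != 0: raw floats, read with fread.
 * binary == 0: whitespace-separated decimal text, read with fscanf.
 * Returns 0 on success, -1 on a short or malformed read. */
static int read_vector(FILE *f, float *vec, size_t dim, int binary) {
    if (binary) {
        return fread(vec, sizeof(float), dim, f) == dim ? 0 : -1;
    }
    for (size_t i = 0; i < dim; i++) {
        if (fscanf(f, "%f", &vec[i]) != 1) return -1;
    }
    return 0;
}
```

The caller would parse the flag once at startup and pass it to every per-word read, so files written with either -binary 0 or -binary 1 could be evaluated without changing their header.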

Witiko force-pushed the non-binary-compute-accuracy branch from 442386f to b7a8906 on April 9, 2019 16:33