Using FlexiTerm
Unai Lopez edited this page Jul 5, 2018 · 2 revisions
Software requirements to run FlexiTerm:
- Java version "1.6.0" or above
- Java(TM) SE Runtime Environment (build 1.6.0-b105)
- Java HotSpot(TM) Client VM (build 1.6.0-b105, mixed mode)
How to run FlexiTerm:
- Place the plain-text input files in the "text" folder.
- OPTIONAL: Replace the file stoplist.txt in the "resources" folder with your own, if needed.
- From the command line, run FlexiTerm.bat (Windows) or FlexiTerm.sh (Unix/Linux) in the "script" folder.
- Check the results in the "out" folder. They are written in several formats: txt, csv and html.
Folder structure:
- bin: Binary (Java .class) files
- config: Contains "settings.txt" file with configuration options for FlexiTerm
- lib: External libraries required by FlexiTerm
- out: Output files
- resources: Contains text resources required by FlexiTerm, including:
  - resources/dict: WordNet files used by WordNet.java
  - resources/models: Models used by Stanford CoreNLP
  - resources/stoplist.txt : Stoplist used to filter out stopwords.
  - resources/dictionary.txt : A list of distinct tokens used as a dictionary by Jazzy to suggest similar tokens.
- script: Windows/Unix scripts to run FlexiTerm
- src: Source (.java) files
- text: Input text files.
Output files format:
- output.csv : A table of results: Rank | Term | Score | Frequency
- output.html : A table of results: ID | Term variants | Score | Rank
- output.txt : A list of recognised term variants ordered by their scores.
- output.mixup : A Mixup file used by MinorThird to annotate term occurrences in text.
- text.html : Input text annotated with occurrences of terms listed in output.html.
- log.txt : Log output used for debugging.
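For downstream processing, the rows of output.csv can be loaded into simple objects. The following is a minimal sketch, not part of FlexiTerm: the class `OutputCsvSketch` and its `TermRow` type are hypothetical, and it assumes comma-separated fields without quoting, an integer rank and frequency, and a decimal score. Check these assumptions against an actual output.csv before relying on it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of FlexiTerm): reads rows shaped like
// Rank | Term | Score | Frequency from output.csv.
class OutputCsvSketch {

    // One row of the table; field names mirror the column names listed above.
    static final class TermRow {
        final int rank;
        final String term;
        final double score;
        final int frequency;

        TermRow(int rank, String term, double score, int frequency) {
            this.rank = rank;
            this.term = term;
            this.score = score;
            this.frequency = frequency;
        }
    }

    // Assumption: fields are separated by commas and terms contain no commas.
    static List<TermRow> parse(List<String> lines) {
        List<TermRow> rows = new ArrayList<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length != 4) {
                continue; // blank line or unexpected layout: skipped in this sketch
            }
            try {
                rows.add(new TermRow(
                        Integer.parseInt(fields[0].trim()),
                        fields[1].trim(),
                        Double.parseDouble(fields[2].trim()),
                        Integer.parseInt(fields[3].trim())));
            } catch (NumberFormatException e) {
                // A header line or a malformed row ends up here and is ignored.
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Default location follows the folder structure described on this page.
        String path = args.length > 0 ? args[0] : "out/output.csv";
        for (TermRow row : parse(Files.readAllLines(Paths.get(path)))) {
            System.out.println(row.rank + "\t" + row.term + "\t" + row.score + "\t" + row.frequency);
        }
    }
}
```

If the real file uses a different delimiter or quotes its fields, a proper CSV library would be a safer choice than `String.split`.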
Format of configuration file settings.txt:
- term pattern(s)
- stoplist
- edit distance threshold
- minimum term candidate frequency
- minimum (implicit) acronym frequency
- acronym recognition mode
- profiling of the runtime, divided into blocks (0 is off, 1 is on)
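To illustrate what an edit distance threshold means, the sketch below computes a plain Levenshtein distance (insertions, deletions and substitutions, each costing 1) and compares it against a threshold. This is an illustration only: Jazzy uses its own distance implementation, the class and method names here are hypothetical, and treating the threshold as inclusive is an assumption.

```java
// Hypothetical illustration (not FlexiTerm or Jazzy code): plain Levenshtein distance.
class EditDistanceSketch {

    // Returns the minimum number of single-character insertions, deletions
    // and substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumption: two tokens count as similar when their distance is at most max.
    static boolean withinThreshold(String a, String b, int max) {
        return levenshtein(a, b) <= max;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("haemorrhage", "hemorrhage"));
        System.out.println(withinThreshold("kitten", "sitting", 3));
    }
}
```

A lower threshold accepts fewer token pairs as variants of each other, so only closely spelled tokens are grouped.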
Default parameters in settings.txt:
- pattern = "(((((NN|JJ) )*NN) IN (((NN|JJ) )*NN))|((NN|JJ )*NN POS (NN|JJ )*NN))|(((NN|JJ) )+NN)"
- max = 3 : Jazzy edit distance threshold: how many edit operations apart two tokens may be and still be treated as similar. Reduce it to require closer matches.
- min = 2 : Term frequency threshold: occurrence > min. Increase for better precision.
- MIN = 9 : Implicit acronym frequency threshold: occurrence > MIN. Increase for better precision.
- acronyms = explicit : Acronyms have to be explicitly defined in text using parentheses.
- profiling = 0 : Profiling is disabled
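The default pattern is a regular expression over part-of-speech tags (NN noun, JJ adjective, IN preposition, POS possessive ending). The sketch below shows which tag sequences it accepts. It is an illustration, not FlexiTerm's own code: the class `TermPatternSketch` is hypothetical, and it assumes that tags are joined by single spaces and that a candidate must match the pattern in full.

```java
import java.util.regex.Pattern;

// Hypothetical illustration (not FlexiTerm code): tests tag sequences
// against the default term pattern from settings.txt.
class TermPatternSketch {

    // The default pattern, copied verbatim from the list above.
    static final Pattern TERM_PATTERN = Pattern.compile(
            "(((((NN|JJ) )*NN) IN (((NN|JJ) )*NN))|((NN|JJ )*NN POS (NN|JJ )*NN))|(((NN|JJ) )+NN)");

    // Assumption: tags is a space-separated tag sequence such as "JJ NN",
    // and the whole sequence has to match.
    static boolean isCandidate(String tags) {
        return TERM_PATTERN.matcher(tags).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"JJ NN", "JJ NN NN", "NN IN NN", "NN POS NN", "NN", "DT NN"};
        for (String tags : samples) {
            System.out.println(tags + " -> " + isCandidate(tags));
        }
    }
}
```

Under these assumptions a single NN is rejected, because every alternative in the pattern requires at least two tokens, and sequences containing other tags such as DT are rejected as well.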