Translate Russian words and phrases to English with Google Gemini AI and generate audio with Google Cloud Text-to-Speech. The results are used to create Russian-English note data for Anki flashcards.
The input text file consists of Russian words and phrases to be translated. Each line can have one or more Russian texts separated by semicolons. Blank lines and lines starting with hash (#) are ignored.
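The input rules above can be sketched in a few lines of Python. This is only an illustration, not the code from anki_import_ai.py: the function name parse_vocabulary_lines is hypothetical, and ignoring leading whitespace before the hash is an assumption.

```python
def parse_vocabulary_lines(lines):
    """Yield each Russian text found in an iterable of input lines."""
    for line in lines:
        stripped = line.strip()
        # Blank lines and comment lines carry no text to translate.
        if not stripped or stripped.startswith("#"):
            continue
        # One line may hold several texts separated by semicolons.
        for text in stripped.split(";"):
            text = text.strip()
            if text:
                yield text


sample = ["# Greetings", "привет; пока", "", "спасибо"]
print(list(parse_vocabulary_lines(sample)))
```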
The output file is a semicolon-delimited text file compatible with Anki text import.
The Anki note records have the following fields:
- russian - the Russian text to be translated from the input text file
- stressed_russian - the Russian text with added acute stress accent
- romanize - a Latin transliteration of the Russian text
- audio - an MP3 sound clip of the Russian generated by Text-to-Speech
- english - an English translation of the Russian text
- notes - additional information not populated by this app
- section - section information from comments preceding the Russian text
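As a rough illustration of the output format, one note can be written as a semicolon-delimited row with Python's csv module. The write_notes helper and the sample values are hypothetical. The `[sound:...]` value follows Anki's usual sound-tag convention, but whether this app writes the audio field that way is an assumption.

```python
import csv
import io

# Field order as listed above.
FIELDS = ["russian", "stressed_russian", "romanize", "audio",
          "english", "notes", "section"]


def write_notes(notes, stream):
    """Write note dicts as semicolon-delimited rows, one note per line."""
    writer = csv.writer(stream, delimiter=";", lineterminator="\n")
    for note in notes:
        # Missing fields (such as notes) are written as empty strings.
        writer.writerow([note.get(name, "") for name in FIELDS])


note = {
    "russian": "привет",
    "stressed_russian": "приве\u0301т",
    "romanize": "privet",
    "audio": "[sound:ru_0001.mp3]",  # assumed field format
    "english": "hi; hello",
    "section": "Greetings",
}
buffer = io.StringIO()
write_notes([note], buffer)
print(buffer.getvalue())
```

The csv writer quotes any field that itself contains a semicolon, such as the English translation in this sample, so the row still splits into seven fields on import.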
Gemini performs several tasks. It first spell-checks the Russian input texts and
provides a short description of any errors it identifies. If no errors are
identified, Gemini adds combining acute accents (U+0301, as in NFD) to stressed
vowels, creates a Latin transliteration using the BGN/PCGN system, and produces an
English translation with brief explanatory notes where appropriate. Results are
returned as JSON data. See the system_instruction and AnkiBaseNote class in
anki_import_ai.py.
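To make the stress-mark encoding concrete, the sketch below places U+0301 after a vowel and removes it again. In the app, Gemini decides where the stress falls. The add_stress and strip_stress helpers are hypothetical and only show how the combining character sits in the string.

```python
import unicodedata

COMBINING_ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def add_stress(word, vowel_index):
    """Insert the combining accent directly after the vowel at vowel_index."""
    return word[: vowel_index + 1] + COMBINING_ACUTE + word[vowel_index + 1:]


def strip_stress(text):
    """Remove stress marks, for example to recover the plain spelling."""
    return text.replace(COMBINING_ACUTE, "")


stressed = add_stress("привет", 4)  # stress on the "е"
print(stressed, len(stressed))
print(unicodedata.name(COMBINING_ACUTE))
print(strip_stress(stressed) == "привет")
```

The accent is a separate code point following the vowel, so the stressed string is one character longer than the plain word.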
Sound files are saved to the default Anki Media Folder (collection.media) or a location specified on the command line. A sound filename prefix must be specified on the command line. A sequential numerical index is added to the sound filename.
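The naming scheme might look something like the sketch below. The underscore separator, the four-digit zero padding, and the helper names are all assumptions; the text above specifies only a prefix plus a sequential numerical index.

```python
from pathlib import Path


def sound_filename(prefix, index, width=4):
    """Build a name such as 'ru_0007.mp3' (separator and padding assumed)."""
    return f"{prefix}_{index:0{width}d}.mp3"


def sound_path(media_dir, prefix, index):
    """Join the media folder and the generated file name."""
    return Path(media_dir) / sound_filename(prefix, index)


print(sound_filename("ru_course", 7))
print(sound_path("collection.media", "ru_course", 8))
```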
Vocabulary files can be created and maintained on Google Drive and downloaded to local storage for processing. When new words and phrases are added to an existing vocabulary file, processing creates Anki import files for the newly added content.
Comments in the source text files are used as section names and added to the section field of the Anki notes. Sections can be used to create filtered decks that limit study to portions of a note deck. See data/Course.pdf for an example.
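One way to picture the section handling: each text inherits the most recent comment that precedes it. This is a sketch under that assumption, with a hypothetical assign_sections helper. It does not separate header comments from section comments, which the real processing presumably does.

```python
def assign_sections(lines):
    """Yield (text, section) pairs; section is the latest preceding comment."""
    section = ""
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            # Remember the comment text as the current section name.
            section = stripped.lstrip("#").strip()
            continue
        for text in stripped.split(";"):
            text = text.strip()
            if text:
                yield text, section


sample = ["# Greetings", "привет; пока", "# Food", "хлеб"]
for text, section in assign_sections(sample):
    print(section, "->", text)
```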
Header information to be passed to the processing routine can be included with comment lines in the format:
# name:value
Headers start on the first line of the input file and the header section ends with the first non-header line. Leading and trailing whitespace is stripped from the header name and value. The header name cannot contain a colon.
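These header rules can be sketched as follows. The parse_headers function is hypothetical. It treats any leading comment of the form name:value as a header, whereas the real routine may accept only the supported header names.

```python
def parse_headers(lines):
    """Collect '# name:value' headers from the top of the input."""
    headers = {}
    for line in lines:
        stripped = line.strip()
        if not stripped.startswith("#"):
            break  # first non-comment line ends the header section
        # Split at the first colon, so the value may itself contain colons.
        name, sep, value = stripped[1:].partition(":")
        name = name.strip()
        if not sep or not name:
            break  # a plain comment also ends the header section
        headers[name] = value.strip()
    return headers


sample = ["# deck: Russian::Basics", "# Greetings", "привет", "# late:ignored"]
print(parse_headers(sample))
```

Because the split happens at the first colon, a value such as an Anki subdeck name containing `::` stays intact.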
Supported headers are:
# deck:deck_name