Hi, the existing support for VoiceOver is broken. I have done a lot of research, and created a fork with my findings on this subject. I would like to open a discussion on my findings, and possible better solutions before I create a pull request.
The existing solution for VoiceOver support uses a library called appscript. I was never able to get it working myself, and a quick Google search shows that appscript is in fact deprecated:
http://appscript.sourceforge.net/
I opened a thread in the audiogames.net developers room a while back to discuss the issues I was having with the current implementation of the VoiceOver class and my findings at the time:
https://forum.audiogames.net/topic/37573/voiceover-error-using-accessibleoutput2/
I have pasted the existing implementation of the VoiceOver class below for reference:
from __future__ import absolute_import

from .base import Output


class VoiceOver(Output):
    """Speech output supporting the Apple VoiceOver screen reader."""

    name = "VoiceOver"

    def __init__(self, *args, **kwargs):
        import appscript
        self.app = appscript.app("voiceover")

    def speak(self, text, interrupt=False):
        self.app.output(text)

    def silence(self):
        self.app.output(u"")

    def is_active(self):
        return self.app.isrunning()

output_class = VoiceOver
I started doing research into alternative methods for making VoiceOver speak on the Mac, and found this rather helpful article:
https://wiki.lazarus.freepascal.org/macOS_Text-To-Speech
To summarize the article, there are a total of five ways to get speech output on the Mac that I am aware of:
- Speech Synthesis Manager: looks like an older way to get speech output, and it does not have friendly functions to use. I did not look into this, and I am not even sure there are Python bindings for the framework it lives in under Application Services.
https://developer.apple.com/documentation/applicationservices/speech_synthesis_manager
- NSSpeechSynthesizer: this is the solution I ended up using for default output. There is a lot of good functionality here, and AppKit, the framework NSSpeechSynthesizer lives in, already has Python bindings.
https://developer.apple.com/documentation/appkit/nsspeechsynthesizer
- AVSpeechSynthesizer: I did some research on this but could not confirm that Python bindings exist for the AVFAudio framework. I did not look into it thoroughly since I had already written a class using NSSpeechSynthesizer; perhaps it should be explored more.
https://developer.apple.com/documentation/avfaudio/avspeechsynthesizer
- The say utility: this was the easiest to test since it is just a command, however I would prefer something a bit lower level. Using say for output just does not seem ideal, especially since a new process must be forked to run the command every time.
https://ss64.com/osx/say.html
- AppleScript output: it is a little less clear what is going on here. The article only shows another way to use the say utility, but there is actually a further way to get speech output using AppleScript: you can talk to the currently running VoiceOver process and use the output command to have VoiceOver speak the provided text. This is the only way I have found to talk to the running instance of VoiceOver; every other method creates a new synthesizer independent of the user's settings, i.e. it provides programmatic control over settings but uses default voice settings unless told otherwise. Since this was the only way to talk to VoiceOver directly, I used it as the primary speech output. It is not ideal, since, as noted above for the say utility, a process must be forked every time you want to speak, which is expensive. A minimal Python sketch of the two command-line approaches follows this list.

Example: tell application "VoiceOver" to output "hello world"

Note: I could not find any formal documentation on the output instruction to VoiceOver. I found the code in previous versions of AO2 and other projects. If anyone knows where to find the documentation, please let me know.
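To make the two command-line approaches concrete, here is a minimal sketch of how they look from Python. The function names are mine, and the quoting of the text is deliberately naive:

import subprocess

def speak_with_say(text):
    # The say utility speaks with the system voice, not through VoiceOver.
    subprocess.run(["say", text])

def speak_with_voiceover(text):
    # Talks to the running VoiceOver process via AppleScript; requires the
    # VoiceOver setting that allows AppleScript control to be enabled.
    subprocess.run(["osascript", "-e",
                    f'tell application "VoiceOver" to output "{text}"'])

speak_with_say("hello world")
speak_with_voiceover("hello world")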
So, as mentioned above, I ended up settling on AppleScript output as the primary output when VoiceOver is running, and NSSpeechSynthesizer when it is not. There is one caveat to the AppleScript solution, though: the user must enable the VoiceOver setting that allows it to be controlled by AppleScript. In the old AO2 code, this was handled like so:
First, check if VoiceOver is running (very slow, probably because all processes have to be enumerated):
tell application "System Events" to (name of processes) contains "VoiceOver"
Then check if VoiceOver can be controlled by AppleScript by throwing a command at VoiceOver and returning false if the command fails (I have no idea where this command came from; it probably lives in the elusive formal documentation on controlling VoiceOver with AppleScript):
tell application "voiceover"
    try
        return bounds of vo cursor
    on error
        return false
    end try
end tell
Note: I rewrote these commands to make them more readable, since they were nested inside Python strings, so apologies for any syntax errors.
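For reference, this is roughly how those two checks look when driven from Python through osascript (the helper names here are mine):

import subprocess

def run_osascript(script):
    # Each call forks an osascript process; stdout carries the script's result.
    result = subprocess.run(["osascript", "-e", script],
                            capture_output=True, text=True)
    return result.stdout.strip()

def voiceover_is_running():
    script = 'tell application "System Events" to (name of processes) contains "VoiceOver"'
    return run_osascript(script) == "true"

def applescript_control_enabled():
    # Returns the VO cursor bounds when control is enabled, "false" otherwise.
    script = ('tell application "VoiceOver"\n'
              '    try\n'
              '        return bounds of vo cursor\n'
              '    on error\n'
              '        return false\n'
              '    end try\n'
              'end tell')
    return run_osascript(script) != "false"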
The AO2 code I am referring to is here (I am not sure whether this is part of the existing commit history for this repo):
https://raw.githubusercontent.com/frastlin/accessible_output2/master/accessible_output2/outputs/voiceover.py
The issue with this solution is the incredible lag, seen most clearly when using the Auto class. To run AppleScript from Python, you have to go through the command-line tool osascript, which means using subprocess.Popen or os.system. These functions fork a new process to run the script, so with the existing implementation we fork three separate processes every time speak is called, since is_active is called on each speak: we first check whether VoiceOver is running, then check whether AppleScript control is enabled in VoiceOver, and finally fork a process to speak the output. This is incredibly expensive, and it was clear when I timed the is_active method alone, which took approximately 0.3 seconds, not to mention the speak function, which took a similar amount of time to execute. This meant that when Auto.output was called, the user could observe a very noticeable lag between when speech output should have started and when it actually started. I do not think the lag is all down to process forking; I saw posts hinting that AppleScript itself is not particularly fast. Regardless, a different solution is necessary if we still want to talk to VoiceOver directly.

Determining whether VoiceOver is running is not terribly difficult outside of AppleScript; Python provides the psutil module for inspecting processes. The check for whether VoiceOver's AppleScript control is enabled, however, had to be dropped. I could not find any way to check this other than throwing a script at VoiceOver and seeing if it failed, and that is just too expensive to be doing in is_active.
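For anyone who wants to reproduce the numbers, this is the kind of measurement I mean; the 0.3 second figure will of course vary by machine:

import subprocess
import time

import psutil

def timed(label, func):
    start = time.perf_counter()
    func()
    print(f"{label}: {time.perf_counter() - start:.3f}s")

# One osascript round trip; the old is_active did two of these per call,
# and speak added a third, so each speak cost roughly three process forks.
timed("osascript round trip", lambda: subprocess.run(
    ["osascript", "-e",
     'tell application "System Events" to (name of processes) contains "VoiceOver"'],
    capture_output=True))

# The psutil replacement scans the process list without forking anything.
timed("psutil process scan", lambda: any(
    p.name().lower() == "voiceover" for p in psutil.process_iter()))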
This is my solution for talking to VoiceOver directly using AppleScript. I cleaned up the code quite a bit, and also used the NSSpeechSynthesizer class to provide a bonus method for checking whether VoiceOver is speaking. Unfortunately, that was the only NSSpeechSynthesizer functionality I found that interacts with the running instance of VoiceOver.
import subprocess

import psutil

from accessible_output2.outputs.base import Output


class VoiceOver(Output):
    """Speech output supporting the Apple VoiceOver screen reader."""

    name = "VoiceOver"

    def __init__(self, *args, **kwargs):
        from AppKit import NSSpeechSynthesizer
        self.NSSpeechSynthesizer = NSSpeechSynthesizer

    def is_speaking(self):
        return self.NSSpeechSynthesizer.isAnyApplicationSpeaking()

    def run_apple_script(self, command, process="voiceover"):
        return subprocess.Popen(
            ["osascript", "-e",
             f'tell application "{process}"\n{command}\nend tell'],
            stdout=subprocess.PIPE).communicate()[0]

    def speak(self, text, interrupt=False):
        # The AppleScript output command seems to interrupt by default;
        # outputting an empty string first seems to force VoiceOver not to interrupt.
        if not interrupt:
            self.silence()
        self.run_apple_script(f'output "{text}"')

    def silence(self):
        self.run_apple_script('output ""')

    def is_active(self):
        # psutil avoids the slow AppleScript round trip through System Events.
        for process in psutil.process_iter():
            if process.name().lower() == "voiceover":
                return True
        return False

output_class = VoiceOver
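Here is a quick usage sketch of the class above (assuming VoiceOver is running and AppleScript control is enabled; the strings are just examples):

vo = VoiceOver()
if vo.is_active():
    # interrupt=True skips the leading silence(), so the second call cuts
    # off the first (VoiceOver's default behavior).
    vo.speak("first message", interrupt=True)
    vo.speak("second message", interrupt=True)

    # interrupt=False sends the empty output first, which forces VoiceOver
    # to finish the first message before speaking the second (see the
    # discussion below).
    vo.speak("first message")
    vo.speak("second message")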
This class is not perfect by any means, but it gets the job done. The rather confusing interrupt flag in the speak method was something I wrestled with, but as mentioned in the comment, the default behavior seems to be to interrupt, and oddly, if you follow a spoken message with an empty string, it forces polite behavior: VoiceOver finishes the first message in full before starting to speak the second. That said, I am not sure how useful the silence method is here outside the context of the speak method. This was a happy accident, since the default behavior is to interrupt and there is no clear way to disable that feature.

Also, the NSSpeechSynthesizer import looks a little odd, although I was just following the convention of the rest of AO2; this is simply the only place where I needed to access a static method, so I did not have an instance to work with. If anyone thinks this should change, perhaps by putting the import at the top of the file, I am happy to change it. I know there is concern about importing modules that exist only on certain platforms, such as AppKit, which would throw an ImportError on Windows. I do not think this is a concern, since the outputs/__init__.py file conditionally imports the screen reader classes for each platform, so the VoiceOver class should never be imported on Windows. If my assumption is wrong here, please correct me.
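To illustrate the platform gating I am assuming, here is a sketch of the idea; the real outputs/__init__.py is structured differently, this just shows why AppKit is never imported on Windows:

import platform

# Sketch only: the real outputs/__init__.py builds its list of output
# classes differently, but the gating idea is the same.
if platform.system() == "Darwin":
    from .voiceover import VoiceOver
    from .system_voiceover import SystemVoiceOver
elif platform.system() == "Windows":
    # The Windows outputs (NVDA, JAWS, SAPI5, ...) would be imported here
    # instead, so voiceover.py and its AppKit import are never touched.
    pass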
The second class I wrote uses the AppKit.NSSpeechSynthesizer object. Conveniently, Python bindings for AppKit already exist, I believe provided by Apple. I have installed so many things in the global Python installation on my Mac that I am not 100% certain, but based on my research I am fairly sure AppKit is a default library on modern macOS; if someone can confirm this, that would be great.

I implemented this class the same way SAPI5 was implemented, providing access to is_speaking, speak, silence, and voice configuration. The only thing I could not get to work was the pitch. There appears to be a method to set the pitch but not to get it, so I left pitch out altogether. I saw how you are supposed to get the pitch through a generic object property method, but I could not get it to work in Python (see the sketch after the class below). Any help on this would be appreciated; otherwise, I do not think people will care much about not being able to change the pitch.

The voices dict does not hold an instance of the voice object; rather, it holds the voice identifier. I had some trouble here since the setVoice method takes the identifier rather than the voice object. It could probably be done with voice objects if anyone has a strong argument for storing objects rather than identifiers in the dict.

The last thing to note is the name. I called the file system_voiceover.py and the class SystemVoiceOver. If anyone has a better idea for a name, I am happy to change it.
from __future__ import absolute_import

import platform
from collections import OrderedDict

from .base import Output, OutputError


class SystemVoiceOver(Output):
    """Default speech output supporting the Apple VoiceOver screen reader."""

    name = "VoiceOver"
    priority = 101
    system_output = True

    def __init__(self, *args, **kwargs):
        from AppKit import NSSpeechSynthesizer
        self.NSSpeechSynthesizer = NSSpeechSynthesizer
        self.voiceover = NSSpeechSynthesizer.alloc().init()
        self.voices = self._available_voices()

    def _available_voices(self):
        voices = OrderedDict()
        for voice in self.NSSpeechSynthesizer.availableVoices():
            voice_attr = self.NSSpeechSynthesizer.attributesForVoice_(voice)
            voice_name = voice_attr["VoiceName"]
            voice_identifier = voice_attr["VoiceIdentifier"]
            voices[voice_name] = voice_identifier
        return voices

    def list_voices(self):
        return list(self.voices.keys())

    def get_voice(self):
        voice_attr = self.NSSpeechSynthesizer.attributesForVoice_(self.voiceover.voice())
        return voice_attr["VoiceName"]

    def set_voice(self, voice_name):
        voice_identifier = self.voices[voice_name]
        self.voiceover.setVoice_(voice_identifier)

    def get_rate(self):
        return self.voiceover.rate()

    def set_rate(self, rate):
        self.voiceover.setRate_(rate)

    def get_volume(self):
        return self.voiceover.volume()

    def set_volume(self, volume):
        self.voiceover.setVolume_(volume)

    def is_speaking(self):
        return self.NSSpeechSynthesizer.isAnyApplicationSpeaking()

    def speak(self, text, interrupt=False):
        if interrupt:
            self.silence()
        return self.voiceover.startSpeakingString_(text)

    def silence(self):
        self.voiceover.stopSpeaking()

    def is_active(self):
        return self.voiceover is not None

output_class = SystemVoiceOver
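On the pitch question above: the generic property accessor I was referring to is objectForProperty:error:, and in PyObjC the NSError out-parameter becomes part of the return value. Below is a sketch of what I was attempting, for anyone who wants to verify it on their machine; as noted, I could not get it working reliably myself:

from AppKit import NSSpeechSynthesizer, NSSpeechPitchBaseProperty

synth = NSSpeechSynthesizer.alloc().init()

# PyObjC maps objectForProperty:error: to a method that takes None for the
# error out-parameter and returns a (value, error) tuple.
pitch, error = synth.objectForProperty_error_(NSSpeechPitchBaseProperty, None)
if error is None:
    print("base pitch:", pitch)

# The setter mirrors it: setObject:forProperty:error: returns (success, error).
ok, error = synth.setObject_forProperty_error_(pitch, NSSpeechPitchBaseProperty, None)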
Here is my fork of accessible_output2 with the implemented classes, plus the modification to the outputs/__init__.py file necessary to add system_voiceover to the outputs. I also modified the readme to include VoiceOver as an output option, with a note at the bottom about the need to enable the VoiceOver setting that allows control by AppleScript. I added eSpeak to the list of outputs as well, since it was missing; I do not know whether eSpeak currently works, but it was absent from the list. Perhaps someone can speak to this.
https://github.com/tbreitenfeldt/accessible_output2
Timothy Breitenfeldt