Adding Wakeword in AGL
The diagram below summarizes the discussion about wakeword support at the AGL Santa Clara F2F (Sept 2018). The feasibility of the proposed flow has not yet been established.
Some of the open questions:
- How do we arbitrate control of the audio buffer between voice agents, so that voice agent X can access the buffer only when it is supposed to? (currently: via the startListening API call sent by VSHL)
- Different voice agents may have different requirements for the duration of silence before speech that they need for ASR calibration - we need a configuration mechanism for that
- Wakeword detection, caching, and voice agent ASR recognition happen in 3 separate processes. How do we make sure all 3 processes stay in sync on buffer position? For example, ahl-softmixer needs to know the exact wakeword position so that the wakeword is not included when ASR recognition begins.
- How do we accommodate voice barge-in in this scenario?
- Do we need to accommodate the scenario where a voice agent also needs access to the uttered wakeword as part of the cached buffer?
- Event subscription flow and other definitions need to be formalized
- We need to decide whether it is safe for the wakeword engine to close the audio buffer on wakeword detection without the risk of ahl-softmixer dropping audio packets
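
To make the buffer-control and buffer-position questions above more concrete, here is a minimal sketch of one possible approach: the cache addresses audio by an absolute sample index, so that separate components can agree on where the wakeword ends, and only the agent that has been granted access may read. Everything in it (the CachedAudioBuffer class, grant, revoke, read, the agent IDs) is hypothetical and exists only for illustration; it is not an existing AGL, VSHL, or ahl-softmixer API, and it runs in a single process rather than across three.

```python
from collections import deque


class CachedAudioBuffer:
    """Hypothetical audio cache addressed by absolute sample index."""

    def __init__(self, capacity):
        # Only the most recent `capacity` samples are retained.
        self.samples = deque(maxlen=capacity)
        self.total_written = 0    # absolute index of the next sample to write
        self.active_agent = None  # agent currently allowed to read, if any
        self.read_start = 0       # absolute index the active agent reads from

    def write(self, chunk):
        """Append captured samples; the oldest ones fall out when full."""
        self.samples.extend(chunk)
        self.total_written += len(chunk)

    def oldest_index(self):
        """Absolute index of the oldest sample still held in the cache."""
        return self.total_written - len(self.samples)

    def grant(self, agent_id, wakeword_end):
        """Give one agent access, starting just after the wakeword.

        `wakeword_end` is the absolute index of the first sample after
        the wakeword, as reported by the (assumed) wakeword detector.
        """
        self.active_agent = agent_id
        self.read_start = max(wakeword_end, self.oldest_index())

    def revoke(self):
        """Withdraw access, e.g. when the dialog ends."""
        self.active_agent = None

    def read(self, agent_id):
        """Return all unread samples for the agent that holds the grant."""
        if agent_id != self.active_agent:
            raise PermissionError(f"{agent_id} has no access to the buffer")
        # Skip anything that was evicted between grant() and read().
        start = max(self.read_start, self.oldest_index())
        offset = start - self.oldest_index()
        data = list(self.samples)[offset:]
        self.read_start = self.total_written
        return data


buf = CachedAudioBuffer(capacity=10)
buf.write([0, 1, 2, 3])   # samples 0..3 stand in for the wakeword
buf.write([4, 5, 6])      # samples 4..6 stand in for the user's request
buf.grant("agent-x", wakeword_end=4)
speech = buf.read("agent-x")   # wakeword samples are excluded
```

Because every position is an absolute sample count rather than an offset into whatever the cache currently holds, the wakeword detector and the consumer do not need to share the cache's internal state to agree on where speech starts. If an agent does need the wakeword itself, the same scheme could grant it an earlier start index instead.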