The diagram below summarizes the wakeword discussion at the AGL Santa Clara F2F (Sept 2018). The feasibility of the proposed flow has not yet been ascertained.


Some of the open questions:

  • How do we manage control of the audio buffer between voice agents so that voice agent X can access it only when it is supposed to? (Currently this is gated by the startListening API call sent by VSHL.)
  • Different voice agents may have different requirements for the duration of silence before speech that is used for ASR calibration. We need a configuration mechanism for that.
  • Wakeword detection, caching, and voice agent ASR happen in three separate processes. How do we make sure all three processes stay in sync on buffer position? For example, ahl-softmixer needs to know the exact wakeword position so that the wakeword is not included when ASR begins.
  • How do we accommodate voice barge-in in this scenario?
  • Do we need to accommodate the scenario in which a voice agent also needs access to the uttered wakeword as part of the cached buffer?
  • The event subscription flow and other definitions need to be formalized.
  • We need to decide whether it is safe for the wakeword engine to close the audio buffer on wakeword detection without the risk of ahl-softmixer dropping audio packets.
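
The buffer-position and access-control questions above can be made more concrete with a small sketch. It is not the AGL, VSHL, or ahl-softmixer API: every name in it (`CachedAudioBuffer`, `WakewordEvent`, `grant`, `read`, `revoke`) is hypothetical, and it runs in a single process purely for illustration. The assumed idea is that all three components address audio by an absolute sample index (total samples written since start) rather than by an offset into the cache, so a wakeword position reported by one process means the same thing to the others, and that only the agent currently granted access may read.

```python
from dataclasses import dataclass
from collections import deque
from itertools import islice
from typing import List, Optional, Sequence


@dataclass
class WakewordEvent:
    """Hypothetical event published by the wakeword engine."""
    start: int  # absolute index of the first wakeword sample
    end: int    # absolute index one past the last wakeword sample


class CachedAudioBuffer:
    """Hypothetical cache addressed by absolute sample positions."""

    def __init__(self, capacity: int) -> None:
        self._samples: deque = deque(maxlen=capacity)
        self._written = 0                      # total samples ever written
        self._granted_agent: Optional[str] = None
        self._grant_start = 0

    @property
    def oldest(self) -> int:
        """Absolute index of the oldest sample still held in the cache."""
        return self._written - len(self._samples)

    def write(self, samples: Sequence[int]) -> None:
        # Once the cache is full, the deque silently drops the oldest samples.
        self._samples.extend(samples)
        self._written += len(samples)

    def grant(self, agent_id: str, event: WakewordEvent,
              include_wakeword: bool = False) -> None:
        """Give one agent read access, by default starting after the wakeword."""
        start = event.start if include_wakeword else event.end
        self._granted_agent = agent_id
        self._grant_start = max(self.oldest, start)

    def revoke(self) -> None:
        self._granted_agent = None

    def read(self, agent_id: str, position: int, count: int) -> List[int]:
        """Return up to `count` samples starting at absolute `position`."""
        if agent_id != self._granted_agent:
            raise PermissionError(f"{agent_id} has not been granted access")
        if position < self._grant_start:
            raise PermissionError("position is before the granted start")
        if position < self.oldest:
            raise LookupError("samples at this position were overwritten")
        offset = position - self.oldest
        return list(islice(self._samples, offset, offset + count))


# Usage: samples 2..4 are the wakeword, so the agent's audio starts at index 5.
buf = CachedAudioBuffer(capacity=10)
buf.write(range(8))
buf.grant("agent-x", WakewordEvent(start=2, end=5))
utterance = buf.read("agent-x", position=5, count=10)
```

In this sketch the `include_wakeword` flag stands in for the open question of whether an agent should also see the wakeword audio, and the `LookupError` path makes dropped audio visible to the reader instead of silently returning later samples. How positions and grants would actually be exchanged between separate processes is left open, as in the discussion.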