I should be able to follow the guidelines of the AGL speech framework to plug in my voice agent.
I should be able to follow the guidelines of the AGL speech framework to plug in my wake word solution.
I should be able to follow the guidelines of the AGL speech framework to plug in my NLU engine.

High Level Architecture

Quoting the AGL documentation (http://docs.automotivelinux.org/docs/apis_services/en/dev/reference/signaling/architecture.html#architecture):
“Good practice is often based on modularity with clearly separated components assembled within a common framework. Such modularity ensures separation of duties, robustness, resilience, achievable long term maintenance and security.”

High Level Components

Architecture

The Voice Services architecture in AGL has two layers: the High Level Voice Service layer and the vendor software layer. The high-level voice service is composed of multiple binding APIs (colored green in the architecture diagram) that abstract the functioning of all the voice assistants running on the system. The vendor software layer consists of vendor-specific voice agent software implementations that comply with the Voice Agent Binding APIs.

...

This design makes no assumptions about the mode in which the high level voice service component is configured and running.

1) agl-service-voice-high

This binding has the following responsibilities.

Structurally follows a bridge pattern to abstract the functioning of the specific voice agent software from the application layer.
The request arbitrator is the main entry point to the system. It is responsible for routing the utterance to the correct voice agent based on parameters such as configuration and wake word detection.
Registers for dialog, connection, and auth events from the voice agents, and maintains the latest state of each voice agent.
Audio/visual focus management. Provides an interface through which the voice agents can request audio or visual focus before actually rendering content. When multiple voice agents are active, each agent may compete for audio and visual focus. Based on the priority of the content, the core grants or denies focus to an agent. When it grants focus, it has to inform the agent currently rendering content to duck or stop. The core makes audio and visual focus decisions on behalf of the voice agents it is managing.
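
The focus arbitration described in the last item can be sketched as follows. This is only an illustration under stated assumptions: the class names, the integer priority scale (higher wins) and the "stop" notification string are hypothetical and are not part of the binding API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FocusHolder:
    agent_id: int
    priority: int  # assumption: a higher value means more important content


class FocusArbiter:
    """Arbitrates a single focus channel (audio or visual) between agents."""

    def __init__(self) -> None:
        self.holder: Optional[FocusHolder] = None
        # Record of (agent_id, instruction) messages sent to displaced agents.
        self.notifications: List[Tuple[int, str]] = []

    def request_focus(self, agent_id: int, priority: int) -> bool:
        current = self.holder
        if current is not None and current.agent_id != agent_id:
            if priority <= current.priority:
                return False  # deny: the current content is at least as important
            # Grant: first tell the agent currently rendering to stop (or duck).
            self.notifications.append((current.agent_id, "stop"))
        self.holder = FocusHolder(agent_id, priority)
        return True

    def release_focus(self, agent_id: int) -> None:
        # Only the agent holding focus can release it.
        if self.holder is not None and self.holder.agent_id == agent_id:
            self.holder = None
```

A real implementation would likely distinguish ducking from stopping and keep separate arbiters for audio and visual focus; the sketch keeps one channel to show the grant/deny decision only.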

State Diagram

[Image: state diagram, removed in the current version]

API

...

language	cpp
title	vshl/startListening
collapse	true

...

Current Architecture

The following diagrams dive a little deeper into the low-level components of the high level voice service (VSHL) and their dependencies, depicted by directional arrows. A dependency in this case can be either an association of objects between components or an interface implementation relationship.

For example:

A depends on B if A aggregates or composes B.

A depends on B if A implements an interface that is used by B to talk to A.

[Diagram: VSHL Architecture]

VoiceAgents Module

[Diagram: VoiceAgentsModule]

Core Module

[Diagram: VSHL Core Module]

Capabilities Module

[Diagram: VSHL Capabilities Module]

Sequence Diagrams

OnLoad

On load, the controller instantiates the entry-level classes of each module and injects their dependencies. For example, the Core module observes changes to voice agent data in the VoiceAgents module.

[Image: OnLoad sequence diagram]
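
The on-load wiring can be illustrated with a small sketch. It assumes a plain callback-based observer; the class and method names (`Controller.on_load`, `add_observer`, `add_agent`) are hypothetical and only mirror the module names used in this section.

```python
from typing import Callable, Dict, List


class VoiceAgentsModule:
    """Holds voice agent data and notifies observers when it changes."""

    def __init__(self) -> None:
        self._agents: Dict[int, str] = {}
        self._observers: List[Callable[[int, str], None]] = []

    def add_observer(self, callback: Callable[[int, str], None]) -> None:
        self._observers.append(callback)

    def add_agent(self, agent_id: int, name: str) -> None:
        self._agents[agent_id] = name
        for notify in self._observers:
            notify(agent_id, name)


class CoreModule:
    """Depends on VoiceAgentsModule: it implements the callback the module calls."""

    def __init__(self, voice_agents: VoiceAgentsModule) -> None:
        self.known_agents: Dict[int, str] = {}
        voice_agents.add_observer(self.on_agent_added)

    def on_agent_added(self, agent_id: int, name: str) -> None:
        self.known_agents[agent_id] = name


class Controller:
    """Creates the entry-level class of each module and injects dependencies."""

    def on_load(self) -> None:
        self.voice_agents = VoiceAgentsModule()
        self.core = CoreModule(self.voice_agents)  # dependency injected here
```

Passing the VoiceAgents module into the Core module's constructor keeps the two modules independently testable, which is the point of having the controller do the wiring.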

StartListening to Audio Input & Events

[Image: StartListening to Audio Input & Events sequence diagram]

State Diagram

[Image: state diagram]

API

Code Block

language	cpp
title	vshl/startListening
collapse	true

vshl/cancel


Cancels the speech recognition processing in the chosen agent.
If agent id is not passed then the cancel request is sent to the default voice agent.

"permission": "urn:AGL:permission:speech:public:audiocontrol"


vshl/startListening


Starts listening for speech input. As a part of request, common configuration related information is passed. 


Note: The config inputs below are just examples and not the final list of configurations. 


Request: { 
  "agent_id" : "integer"
}


Responses: { 
  "jtype":"afb-reply", 
  "request": { 
    "status":"string" // success or bad-state or bad-request
  },
  "response":{
    "request_id": "string", // Request created by this call.
    "agent_id": "string" // Agent to which the request has been proxied.
  }
}
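
As a rough illustration of the request and reply shapes above, the following sketch builds a request body and reads the reply. The helper names are hypothetical, and how the JSON is actually transported to the binding is outside the scope of the sketch.

```python
import json
from typing import Optional, Tuple


def build_start_listening_request(agent_id: int) -> str:
    """Serialise the request body for vshl/startListening."""
    return json.dumps({"agent_id": agent_id})


def parse_start_listening_reply(raw: str) -> Optional[Tuple[str, str]]:
    """Return (request_id, agent_id) on success, or None for any other status."""
    reply = json.loads(raw)
    if reply["request"]["status"] != "success":
        return None  # bad-state or bad-request
    response = reply["response"]
    return response["request_id"], response["agent_id"]
```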

Code Block

language	cpp
title	vshl/cancel
collapse	true

vshl/isAvailable

Check if the voice agents are available and running on the platform.

"permission": "urn:AGL:permission:speech:public:accesscontrol"

vshl/cancelListening


Cancels the speech recognition processing in the chosen agent.
If agent id is not passed then the cancel request is sent to the default voice agent.


Request:
{

}

Responses:
{
  "jtype":"afb-reply",
  "request":{
    "status":"string" // success or bad-state or bad-request
  },
  "response": {
    "available":"boolean"
  }
}

Code Block

language	cpp
title	vshl/subscribe
collapse	true

vshl/subscribe

Subscribe/Unsubscribe to voice service high level events.

"permission": "urn:AGL:permission:speech:public:accesscontrol"

Request:
{
  {
    "type":"array",
    "items" : [{
        "type":"string" // List of events to subscribe to
      }
    ]
  },
  {
    "subscribe":"boolean"
  }  
}

Responses:
{
  "jtype":"afb-reply",
  "request":{
    "status":"string" // success or bad-state or bad-request
  }
}

...

Code Block

language	cpp
title	vshl_dialogstate_event
collapse	true

Dialog state describes the state of the currently active voice agent's dialog interaction. 

Event Data:
{
  "name" : "voice_dialogstate_event",
  "state":"string",
  "agent_id": "integer"
}


Values for state are
1) IDLE
High level voice service is ready for speech interaction.

2) LISTENING
High level voice service is currently listening.

3) THINKING
A customer request has been completed and no more input is accepted. In this state, Voice service is working on a response.

4) SPEAKING
Responding to a request with speech.
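
A client could model the four state values with an enumeration, as in the sketch below. The helper name `ready_for_interaction` is hypothetical; the sketch only reads the `state` field of the event data and assumes the event has already been decoded into a dictionary.

```python
from enum import Enum


class DialogState(Enum):
    IDLE = "IDLE"
    LISTENING = "LISTENING"
    THINKING = "THINKING"
    SPEAKING = "SPEAKING"


def ready_for_interaction(event: dict) -> bool:
    """True when the event reports IDLE, i.e. ready for speech interaction."""
    # DialogState(...) raises ValueError for a state string not listed above.
    return DialogState(event["state"]) is DialogState.IDLE
```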

...

Code Block

language	cpp
title	vshl_connectionstate_event
collapse	true

Connection state describes the state of the voice agent along with errors. 

Event Data:
{
  "name" : "voice_connectionstate_event",
  "state":"string",
  "agent_id": "integer"
}

1) DISCONNECTED
Voice agent is not connected to its voice service endpoint.

2) PENDING
Voice agent is attempting to establish connection to its endpoint.

3) CONNECTED
Voice agent is connected to its endpoint.

4) CONNECTION_TIMEDOUT
Voice agent connection attempt failed due to excessive load on its server endpoint.

5) CONNECTION_ERROR
Captures other network related errors.

...

Code Block

language	cpp
title	vshl_authstate_event
collapse	true

Auth state describes the state of the authorization of the voice agent with its cloud endpoint. 

Event Data:
{
  "name" : "voice_authstate_event",
  "state":"string",
  "agent_id": "integer"
}

1) UNINITIALIZED
Authorization not yet acquired.


2) REFRESHED
Authorization has been refreshed.

3) EXPIRED
Authorization has expired.


4) ERROR
Authorization error has occurred.

...

vshl/phonecontrol/subscribe - For subscribing to phone control messages below.

Messages

Upstream

{

Topic : "phonecontrol"

Action : "dial"

Payload : {

"callId": "{{STRING}}", // A unique identifier for the call

...

}

Upstream

{

Topic : "phonecontrol"

Action : "call_activated"

Payload : {

"callId": "{{STRING}}", // A unique identifier for the call

"required": [ "callId"]

}

{

Topic : "phonecontrol"

Action : "call_failed"

Payload : {

"callId": "{{STRING}}", // A unique identifier for the call

...

4xx range: Validation failure for the input from the dial() directive
500: Internal error on the platform unrelated to the cellular network
503: Error on the platform related to the cellular network
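
The error code ranges listed above could be handled by a subscriber roughly as follows. The function name and the category strings are hypothetical; the name of the payload field that carries the code is elided in this document, so the sketch takes the code as a plain integer.

```python
def classify_call_failure(code: int) -> str:
    """Map a call failure code to the category described in the list above."""
    if 400 <= code <= 499:
        return "validation"  # bad input from the dial directive
    if code == 500:
        return "platform"    # internal error unrelated to the cellular network
    if code == 503:
        return "cellular"    # platform error related to the cellular network
    return "unknown"         # codes this document does not define
```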

{

Topic : "phonecontrol"

Action : "call_terminated"

Payload : {

"callId": "{{STRING}}", // A unique identifier for the call

"required": [ "callId"]

}

{

Topic : "phonecontrol"

Action : "connection_state_changed"

Payload : {

"callId": "{{STRING}}", // A unique identifier for the call

...

vshl/navigation/subscribe - For subscribing to navigation messages.

Messages

Upstream

{

Topic : "navigation"

Action : "set_destination"

Payload : {

"destination": {
"coordinate": {
"latitudeInDegrees": {{DOUBLE}},
"longitudeInDegrees": {{DOUBLE}}
},
"name": "{{STRING}}",
"singleLineDisplayAddress": "{{STRING}}",
"multipleLineDisplayAddress": "{{STRING}}"
}

}

}

{

Topic : "navigation"

Action : "cancel_navigation"

}
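
To make the destination payload concrete, here is a minimal sketch that assembles a set_destination message. It assumes the lowercase topic and action names `navigation` and `set_destination`; the function name and the coordinate range checks are additions for illustration and are not mandated by this document.

```python
def build_set_destination(latitude: float, longitude: float, name: str,
                          single_line_address: str,
                          multiple_line_address: str) -> dict:
    """Assemble a set_destination message as a plain dictionary."""
    # Sanity checks on geographic coordinates (an assumption, not a spec rule).
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude out of range")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude out of range")
    return {
        "Topic": "navigation",
        "Action": "set_destination",
        "Payload": {
            "destination": {
                "coordinate": {
                    "latitudeInDegrees": latitude,
                    "longitudeInDegrees": longitude,
                },
                "name": name,
                "singleLineDisplayAddress": single_line_address,
                "multipleLineDisplayAddress": multiple_line_address,
            }
        },
    }
```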

GuiMetadata

API

vshl/guimetadata/publish - For publishing ui metadata messages for rendering.

vshl/guimetadata/subscribe - For subscribing ui metadata messages for rendering.

Messages

Upstream

{

Topic : "guimetadata"

Action : "render_template"

Payload : {

<Yet to be standardized>

}

{

Topic : "guimetadata"

Action : "clear_template"

}

Template Rendering

API

vshl/templates/publish - For publishing template rendering messages.

vshl/templates/subscribe - For subscribing to template rendering messages.

Messages

Downstream Payload : {

<Yet to be standardized>

}

{

Topic : "guimetadata"

Action : "render_player_info"

Payload : {

<Yet to be standardized>

}

{

Topic : "guimetadata"

Action : "clear_player_info"

Payload : {

<Yet to be standardized>

}

3) Configuration

Provides a mechanism for OEMs to configure the functionality of the high level voice service. OEMs should be able to configure:

List of active agents
Roles and responsibilities of each agent
Language setting
Default agent
Enable/disable fallback invocation mode
Enable/disable agent switching during multi-turn dialog
... more
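
One possible way to represent these settings in code is sketched below. Every field name and default value is hypothetical, since this document does not define a configuration format; the check that the default agent must be one of the active agents is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VoiceServiceConfig:
    active_agents: List[int] = field(default_factory=list)
    default_agent: int = 0
    language: str = "en-US"
    fallback_invocation: bool = False
    agent_switching_in_dialog: bool = False

    def problems(self) -> List[str]:
        """Return human-readable configuration problems (empty when valid)."""
        issues: List[str] = []
        if not self.active_agents:
            issues.append("no active agents configured")
        elif self.default_agent not in self.active_agents:
            issues.append("default agent is not in the active list")
        return issues
```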

...

Code Block

language	cpp
title	vshl/enumerate_agents
collapse	true

vshl/enumerateVoiceAgents


"permission": "urn:AGL:permission:vshl:voiceagents:public"


Enumerates and returns an array of voice agents running in the system. Applications such as settings might need this to present a UI with a list of agents to enable or disable, show status, etc.


Request: 
{ } 


Responses: { 
  "jtype":"afb-reply",
  "request": { 
    "status":"string" // success or bad-state or bad-request
  },
  "response": { 
    "type":"array", 
    "items" : 
      [ 
        { 
          "name":"string", 
          "description":"string", 
          "agent_id":"integer", // Voice agent ID 
          "status":"string" // enabled, disabled 
        } 
      ] 
  }
}
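
A settings application might consume the reply along these lines. The sketch assumes the `response` field of an actual reply is a plain JSON array of agent objects (the schema above describes it as type array); the function name is hypothetical.

```python
from typing import Dict, List


def enabled_agent_names(reply: Dict) -> List[str]:
    """Return the names of agents whose status is "enabled"."""
    if reply["request"]["status"] != "success":
        return []  # bad-state or bad-request: nothing to list
    return [agent["name"] for agent in reply["response"]
            if agent["status"] == "enabled"]
```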

...

Code Block

language	cpp
title	vshl/setActive
collapse	true

vshl/setDefaultVoiceAgent

Activate or deactivate a voice agent.

"permission": "urn:AGL:permission:vshl:voiceagents:public"


Request:
{ 
  "agent_id":"integer",
  "is_active":"boolean"
}


Responses: { 
  "jtype":"afb-reply",
  "request":
  {
    "status":"string" // success or bad-state or bad-request
  }
}

...

afb-voiceservice-wakeword-detector

...

Provides an interface primarily for the core afb-voiceservice-highlevel to listen for wakeword detection events and make request routing decisions.
This binding will internally talk to or host voice assistant vendor specific wake word solutions to enable the wake word detection.

Voice Agent Vendor Software

1) voice-agent-binding

The API specification of the voice agent is defined in this document. All vendor-specific voice agent bindings will follow the same specification to integrate with the high level voice service.
The voice agent will listen to audio input when instructed by the high level voice service.
The voice agent will run its own automatic speech recognition and natural language processing, and generate intents to perform the requested action.
The voice agent will have its own authentication, connection, and dialog management flows, and will generate events to notify the high level voice service of its state transitions.
The voice agent will use the high level voice service's interaction manager to command system applications to perform tasks such as routing to a specific geocode, dialing a number, or playing music.
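
The relationship between the high level voice service and vendor bindings can be sketched as a bridge: one abstract interface, one implementation per vendor. All names below are hypothetical. The fallback to a default agent when no agent id is given is stated in this document only for the cancel request, so applying it to listening is an assumption.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class VoiceAgent(ABC):
    """Implementation side of the bridge: one subclass per vendor."""

    @abstractmethod
    def setup(self, agent_id: int, language: str) -> None: ...

    @abstractmethod
    def start_listening(self, request_id: str) -> str: ...


class DemoAgent(VoiceAgent):
    """Hypothetical vendor agent used only for illustration."""

    def setup(self, agent_id: int, language: str) -> None:
        self.agent_id = agent_id
        self.language = language

    def start_listening(self, request_id: str) -> str:
        return "success"


class HighLevelVoiceService:
    """Abstraction side: applications talk to this, never to a vendor agent."""

    def __init__(self) -> None:
        self._agents: Dict[int, VoiceAgent] = {}
        self._default: Optional[int] = None
        self._next_request = 0

    def register(self, agent: VoiceAgent, language: str = "en-US") -> int:
        agent_id = len(self._agents) + 1
        agent.setup(agent_id, language)
        self._agents[agent_id] = agent
        if self._default is None:
            self._default = agent_id  # first registered agent becomes default
        return agent_id

    def start_listening(self, agent_id: Optional[int] = None) -> dict:
        target = self._default if agent_id is None else agent_id
        if target not in self._agents:
            return {"status": "bad-request"}
        self._next_request += 1
        request_id = f"req-{self._next_request}"
        status = self._agents[target].start_listening(request_id)
        return {"status": status, "request_id": request_id, "agent_id": target}
```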

API

Code Block

language	cpp
title	voiceagent/setup
collapse	true

voiceagent/setup

This API is exposed to high level voice service to pass any setup or high level config information like agent_id to the voice agent.

"permission": "urn:AGL:permission:speech:public:accesscontrol"

Request:
{ 
  "agent_id":"integer",
  "language":"string"
}


Responses:
{
  "jtype":"afb-reply",
  "request":{
    "status":"string" // success or bad-state or bad-request
  }
}

...

Code Block

language	cpp
title	voiceagent/startListening
collapse	true

voiceagent/startListening

Starts listening for speech input. As a part of the request, common configuration related information is passed.
Note: The config inputs below are just examples and not the final list of configurations.

"permission": "urn:AGL:permission:speech:public:audiocontrol"
Request:
{
  "request_id": "string", // Request ID assigned by the high level voice service.
  "language":"string",
  "location":"string",
  "preferred_network_mode":"string", // online, offline, hybrid
  "audio_input_device": "string" // ID of the alsa device to read the input
}


Responses:
{
  "jtype":"afb-reply",
  "request":{
    "status":"string" // success or bad-state or bad-request
  }
}

Events

Code Block

language	cpp
title	voiceagent_endofspeechdetected_event
collapse	true

Voice agent will notify its clients that end of speech is detected.
Event Data:
{
  "name" : "voiceagent_endofspeechdetected_event",
  "agent_id": "integer",
  "request_id": "integer" // the request for which the end of speech is detected
}

...

Climate Control (CC)

Use cases


1	CC - on/off	Turn on or off the climate control (e.g. turn off climate control)
2	CC - specific temperature	Set the car's temperature to 70 degrees (e.g. set the temperature to 70)
3	CC - target range	Set the car's heating to a set gradient (e.g. set the heat to high)
4	CC - min / max temperature	Set the car's temperature to max or min A/C (or heat) (e.g. set the A/C to max)
5	CC - increase / decrease temperature	Increase or decrease the car's temperature (e.g. increase the temperature)
6	CC - specific fan speed	Set the fan to a specific value (e.g. set the fan speed to 3)
7	CC - target range	Set the fan to a specific value (e.g. set the fan speed to high)
8	CC - min / max fan speed	Set the fan to min / max (e.g. set the fan to max)
9	CC - increase / decrease fan speed	Increase / decrease the fan speed (e.g. increase the air flow)
10	CC - Temp Status	What is the current temperature of the car (e.g. how hot is it in my car?)
11	CC - Fan Status	Determine the fan setting (e.g. what's the fan set to?)

Set the cabin temperature

...

how hot is it in my car?

Navigation

Use cases

1

Set Destination

Notify the navigation application to route to the specified destination.

For example:

"Navigate to the nearest Starbucks"

"Navigate to my home"

2

Cancel Navigation

Cancel the navigation based on touch input or voice.

For example:

The user can say "cancel navigation".

The user can cancel the navigation by interacting with the navigation application directly on the device using touch input.

3

Suggest Alternate Route

Suggest an alternate route to the user and proceed as per the user's preference.

For example:

"There is an alternate route available that is 4 minutes faster. Do you wish to select it?"

When the user says "No", continue navigation on the current route.

When the user says "Yes", proceed with navigation on the alternate route.

Set Destination

Cancel Navigation

...

This use case is currently unsupported by Alexa. It is a high-level proposal for how the interaction is supposed to work. Alternatively, the AGL navigation app can use the STT and TTS APIs (out of scope for this document) with some minimal NLU to enable similar behavior.

Technical References & Demos

Technical Video Presentation of AGL Speech Framework High-Level Architecture and Live Demo with Alexa Integration.

Alexa Demo on Renesas board.

Version	Old Version 1	New Version Current
Changes made by	Naveen Bobbili	Naveen Bobbili
Saved on	Oct 12, 2018	Dec 08, 2018
