
Voice

We chain speech-to-text, text-to-text, and text-to-speech, working with both real and artificial voices.

Streaming all of that quickly across different devices is kind of complicated, but we try.
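To make the chain concrete, here is a minimal sketch of one voice turn. The `VoiceStages` interface and `runVoiceTurn` are hypothetical names that are not part of the codebase, and the three stages are injected so nothing here assumes how the real endpoints behave. It also ignores streaming entirely and just runs the stages in order.

```typescript
// Hypothetical stage signatures; the real components and endpoints may differ.
interface VoiceStages<Audio> {
  transcribe(audio: Audio): Promise<string>; // speech to text
  reply(transcript: string): Promise<string>; // text to text
  speak(text: string): Promise<Audio>; // text to speech
}

interface VoiceTurn<Audio> {
  transcript: string;
  replyText: string;
  audio: Audio;
}

// Runs one full turn, or returns null when nothing was said.
async function runVoiceTurn<Audio>(
  input: Audio,
  stages: VoiceStages<Audio>
): Promise<VoiceTurn<Audio> | null> {
  const transcript = (await stages.transcribe(input)).trim();
  if (transcript === "") {
    // Silence or noise: skip the chat and text-to-speech calls.
    return null;
  }
  const replyText = await stages.reply(transcript);
  const audio = await stages.speak(replyText);
  return { transcript, replyText, audio };
}
```

Keeping the stages behind an interface like this makes it possible to swap a provider, or to fake all three in a test, without touching the turn logic.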

@todo explain the components and endpoints and streaming and cross-browser compatibility

graph TD
  subgraph client-side
    /
    memory["memory.getMemory()"] --> ls[(LocalStorage)] --> Messages
  end
  subgraph deps[External APIs]
    claude(Claude)
    deepgram(Deepgram)
    elevenlabs(ElevenLabs)
  end
  subgraph server-side
    /api/chat <-.-> claude
    /api/deepgram-key <-.-> deepgram
    /api/tts <-.-> elevenlabs
  end
  subgraph client-components
    ChatInterface --> MessageList & MessageForm
    MicRecorder <-.-> deepgram
  end
  / <-.->|messages| /api/chat
  / -.->|memory| /api/summarize -.->|mermaid graph| /
  / -.->|microphone recorder| deepgramWebSocket -.->|transcript| /
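The last edge of the graph, where the Deepgram WebSocket hands a transcript back to the page, can be sketched as a small parser. This assumes Deepgram's live transcription result shape (`channel.alternatives[0].transcript` plus an `is_final` flag). The function name `extractFinalTranscript` is hypothetical, and dropping interim results is a choice made for this example, not something the diagram specifies.

```typescript
// Returns the transcript of a finalized Deepgram result, or null for interim
// results, empty transcripts, non-JSON frames, and non-result messages.
function extractFinalTranscript(raw: unknown): string | null {
  if (typeof raw !== "string") {
    return null; // binary frames carry no transcript
  }
  let message: any;
  try {
    message = JSON.parse(raw);
  } catch {
    return null; // not JSON, ignore it
  }
  if (!message || message.is_final !== true) {
    return null; // interim result or metadata message
  }
  const transcript = message.channel?.alternatives?.[0]?.transcript;
  if (typeof transcript !== "string" || transcript.trim() === "") {
    return null; // silence produces empty transcripts
  }
  return transcript;
}
```

In the browser this would be called from the socket's message handler with `event.data`, and any non-null result passed on to the chat UI.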

The Voice Service handles text-to-speech and speech-to-text functionality.

VoiceService.ts
export async function textToSpeech(text: string, voice: string): Promise<string> {
  // Convert text to speech using ElevenLabs or OpenAI
}

export async function speechToText(audioBlob: Blob): Promise<string> {
  // Convert speech to text using Deepgram
}

export function startRecording(): Promise<void> {
  // Start recording audio
}

export function stopRecording(): Promise<Blob> {
  // Stop recording and return audio blob
}
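The bodies above are elided, so the following is only a sketch of how the `startRecording` / `stopRecording` pair could be built, not the actual implementation. `RecorderLike` and `createRecorderControls` are hypothetical names. The interface covers just the slice of the browser `MediaRecorder` API the logic needs (`start`, `stop`, `ondataavailable`, `onstop`), which lets the state handling run outside a browser.

```typescript
// Minimal subset of the MediaRecorder API this sketch relies on (an assumption).
interface RecorderLike<Chunk> {
  ondataavailable: ((event: { data: Chunk }) => void) | null;
  onstop: (() => void) | null;
  start(): void;
  stop(): void;
}

type RecorderState = "idle" | "starting" | "recording";

// Hypothetical helper, not part of VoiceService.ts.
function createRecorderControls<Chunk>(
  makeRecorder: () => Promise<RecorderLike<Chunk>>
) {
  let state: RecorderState = "idle";
  let recorder: RecorderLike<Chunk> | null = null;
  let chunks: Chunk[] = [];

  async function startRecording(): Promise<void> {
    if (state !== "idle") throw new Error(`cannot start while ${state}`);
    state = "starting";
    try {
      const next = await makeRecorder(); // in a browser, this is where mic permission is requested
      const buffer: Chunk[] = []; // one buffer per recording
      next.ondataavailable = (event) => {
        buffer.push(event.data);
      };
      next.start();
      chunks = buffer;
      recorder = next;
      state = "recording";
    } catch (error) {
      state = "idle"; // permission denied or no device: allow a retry
      throw error;
    }
  }

  function stopRecording(): Promise<Chunk[]> {
    const active = recorder;
    if (state !== "recording" || active === null) {
      return Promise.reject(new Error("not recording"));
    }
    const recorded = chunks;
    recorder = null;
    state = "idle";
    return new Promise((resolve) => {
      // MediaRecorder delivers a final dataavailable event before stop fires.
      active.onstop = () => resolve(recorded);
      active.stop();
    });
  }

  return { startRecording, stopRecording };
}
```

In a browser, the factory would be a thin adapter around `navigator.mediaDevices.getUserMedia({ audio: true })` and `new MediaRecorder(stream)`, and the returned chunks could be joined with `new Blob(chunks)` to match the `Promise<Blob>` signature above.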