Voice
We go from speech-to-text to text-to-speech, with both real and artificial voices.
That’s kind of complicated to stream fast on various devices, but we try.
@todo explain the components and endpoints and streaming and cross-browser compatibility
graph TD
  subgraph client-side
    /
    memory["memory.getMemory()"] --> ls[(LocalStorage)] --> Messages
  end
  subgraph deps[External APIs]
    claude(Claude)
    deepgram(Deepgram)
    elevenlabs(Elevenlabs)
  end
  subgraph server-side
    /api/chat <-.-> claude
    /api/deepgram-key <-.-> deepgram
    /api/tts <-.-> elevenlabs
  end
  subgraph client-components
    ChatInterface --> MessageList & MessageForm
    MicRecorder <-.-> deepgram
  end
  / <-.->|messages| /api/chat
  / -.->|memory| /api/summarize -.->|mermaid graph| /
  / -.->|microphone recorder| deepgramWebSocket -.->|transcript| /

The Voice Service handles text-to-speech and speech-to-text functionality.
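The microphone-to-transcript edge in the diagram could be sketched roughly as below. This is a minimal sketch, not the project's code: `buildListenUrl` and `extractFinalTranscript` are hypothetical helper names, the query options are illustrative, and the message shape is an assumption that models only the fields the sketch reads from a Deepgram streaming result.

```typescript
// Assumed shape of a Deepgram streaming result; only the fields read below
// are modeled, everything else in the message is ignored.
interface DeepgramResult {
  is_final?: boolean;
  channel?: { alternatives?: { transcript?: string }[] };
}

// Builds the WebSocket URL for Deepgram's streaming endpoint.
// The params are passed through as query options (illustrative, not the
// project's actual settings).
function buildListenUrl(params: Record<string, string> = {}): string {
  const url = new URL("wss://api.deepgram.com/v1/listen");
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

// Returns the trimmed transcript of a final result, or null for anything
// else (interim results, other message types, empty text, invalid JSON).
function extractFinalTranscript(raw: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const message = parsed as DeepgramResult;
  if (message.is_final !== true) return null;

  const transcript = message.channel?.alternatives?.[0]?.transcript?.trim();
  return transcript ? transcript : null;
}
```

In the browser, the recorder component would presumably open a `WebSocket` to the URL from `buildListenUrl`, authenticate with the key obtained from `/api/deepgram-key`, and pass each incoming `message.data` through `extractFinalTranscript` before appending the text to the chat input. Dropping interim results keeps the sketch simple; showing them live would need a second code path.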
export async function textToSpeech(text: string, voice: string): Promise<string> {
  // Convert text to speech using ElevenLabs or OpenAI
}

export async function speechToText(audioBlob: Blob): Promise<string> {
  // Convert speech to text using Deepgram
}

export function startRecording(): Promise<void> {
  // Start recording audio
}

export function stopRecording(): Promise<Blob> {
  // Stop recording and return audio blob
}
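
One way the recording pair above might be filled in is sketched below. It assumes a recorder object with the same `start`/`stop`/`ondataavailable`/`onstop` surface as the browser's `MediaRecorder`. `RecorderLike` and `createRecording` are hypothetical names introduced for illustration, the `audio/webm` default is a guess, and microphone acquisition (for example via `getUserMedia`) is left out, which is why `start` here is synchronous.

```typescript
// The subset of the browser MediaRecorder surface this sketch relies on
// (hypothetical interface, declared here so the code also runs in tests).
interface RecorderLike {
  start(): void;
  stop(): void;
  ondataavailable: ((event: { data: Blob }) => void) | null;
  onstop: (() => void) | null;
}

// Wraps a recorder and exposes a start/stop pair that collects audio chunks
// and resolves with a single Blob once recording has stopped.
function createRecording(recorder: RecorderLike, mimeType = "audio/webm") {
  const chunks: Blob[] = [];

  recorder.ondataavailable = (event) => {
    // Skip empty chunks; keep only those that carry data.
    if (event.data.size > 0) chunks.push(event.data);
  };

  return {
    start(): void {
      chunks.length = 0; // discard audio from any earlier take
      recorder.start();
    },
    stop(): Promise<Blob> {
      return new Promise((resolve) => {
        // Resolve in onstop, which MediaRecorder fires after the last
        // dataavailable event.
        recorder.onstop = () => resolve(new Blob(chunks, { type: mimeType }));
        recorder.stop();
      });
    },
  };
}
```

Passing the recorder in, rather than constructing `MediaRecorder` inside the function, keeps the chunk handling testable outside a browser. A real `MediaRecorder` may need a thin adapter or a cast to satisfy `RecorderLike` under strict type checking.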