Voice

We chain speech-to-text, text-to-text, and text-to-speech, using both real and artificial voices.

Streaming all of that quickly across devices is complicated, but we try.

@todo explain the components and endpoints and streaming and cross-browser compatibility

graph TD
    subgraph client-side
        /
        memory["memory.getMemory()"] --> ls[(LocalStorage)] --> Messages
    end

    subgraph deps[External APIs]
        claude(Claude)
        deepgram(Deepgram)
        elevenlabs(ElevenLabs)
    end

    subgraph server-side
        /api/chat <-.-> claude
        /api/deepgram-key <-.-> deepgram
        /api/tts <-.-> elevenlabs
    end

    subgraph client-components
        ChatInterface --> MessageList & MessageForm
        MicRecorder <-.-> deepgram
    end

    / <-.->|messages| /api/chat
    / -.->|memory| /api/summarize -.->|mermaid graph| /
    / -.->|microphone recorder| deepgramWebSocket -.->|transcript| /
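
The `transcript` edge in the graph can be sketched as a small reducer over the messages arriving on the Deepgram WebSocket. This is a minimal sketch, not the project's code: `DeepgramResult`, `TranscriptState`, `applyResult`, and `displayText` are hypothetical names, and the message shape is assumed to follow Deepgram's live-transcription results (`is_final` plus `channel.alternatives[0].transcript`), trimmed to the fields used here.

```typescript
// Assumed subset of a Deepgram live-transcription message (only the fields used below).
interface DeepgramResult {
  is_final: boolean;
  channel: { alternatives: { transcript: string }[] };
}

// Hypothetical state: text that is settled, plus the current interim guess.
interface TranscriptState {
  finalText: string;
  interimText: string;
}

const emptyTranscript: TranscriptState = { finalText: "", interimText: "" };

// Fold one message into the state without mutating the previous state.
function applyResult(state: TranscriptState, msg: DeepgramResult): TranscriptState {
  const text = (msg.channel.alternatives[0]?.transcript ?? "").trim();
  if (!msg.is_final) {
    // Interim results replace each other until a final result arrives.
    return { ...state, interimText: text };
  }
  if (text === "") {
    // A final result with no speech only clears the interim guess.
    return { ...state, interimText: "" };
  }
  const finalText = state.finalText === "" ? text : `${state.finalText} ${text}`;
  return { finalText, interimText: "" };
}

// What the UI would show: settled text followed by the interim guess.
function displayText(state: TranscriptState): string {
  return [state.finalText, state.interimText].filter((s) => s !== "").join(" ");
}
```

In a browser this would typically be driven from the socket's `onmessage` handler, parsing each event's data as JSON and passing it to `applyResult`. Keeping the reducer pure makes it testable without a microphone or a network connection.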

The Voice Service handles text-to-speech and speech-to-text. It exposes four functions:

export async function textToSpeech(text: string, voice: string): Promise<string> {
  // Convert text to speech using ElevenLabs or OpenAI
}

export async function speechToText(audioBlob: Blob): Promise<string> {
  // Convert speech to text using Deepgram
}

export function startRecording(): Promise<void> {
  // Start recording audio
}

export function stopRecording(): Promise<Blob> {
  // Stop recording and return audio blob
}
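
To show how these four functions could fit together in one voice turn (record, transcribe, get a reply, speak it), here is a minimal sketch. It is an illustration under assumptions, not the project's implementation: `VoiceService`, `VoiceTurn`, `runVoiceTurn`, `chat`, and `untilDone` are hypothetical names, the generic `TAudio` parameter stands in for `Blob` so the sketch also runs outside a browser, and what the string returned by `textToSpeech` represents is not specified here, so it is passed through untouched.

```typescript
// The four functions above, gathered behind an interface so they can be injected.
// TAudio stands in for Blob (assumption made so the sketch is not browser-only).
interface VoiceService<TAudio> {
  startRecording(): Promise<void>;
  stopRecording(): Promise<TAudio>;
  speechToText(audio: TAudio): Promise<string>;
  textToSpeech(text: string, voice: string): Promise<string>;
}

// Hypothetical result of one turn; speech is null when nothing was said.
interface VoiceTurn {
  transcript: string;
  reply: string;
  speech: string | null;
}

async function runVoiceTurn<TAudio>(
  service: VoiceService<TAudio>,
  chat: (userText: string) => Promise<string>, // hypothetical text-to-text step
  voice: string,
  untilDone: () => Promise<void>, // resolves when the user stops talking
): Promise<VoiceTurn> {
  await service.startRecording();
  await untilDone();
  const audio = await service.stopRecording();

  const transcript = (await service.speechToText(audio)).trim();
  if (transcript === "") {
    // Silence: skip the chat and TTS calls entirely.
    return { transcript, reply: "", speech: null };
  }

  const reply = await chat(transcript);
  const speech = await service.textToSpeech(reply, voice);
  return { transcript, reply, speech };
}
```

This version runs each step to completion before starting the next, which is the simplest shape but also the slowest; a streaming variant would overlap the steps. Injecting the service keeps the flow testable with fakes.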