How LiveKit Powers OpenAI's Voice Mode: From Near-Death to AI Unicorn

Episode 17 · February 23, 2026

Bottom Line Up Front

LiveKit co-founder Russ d'Sa reveals how his struggling live streaming startup secretly became the infrastructure powering ChatGPT's voice mode. After facing a 'sell or die' ultimatum from a tech giant, an unexpected email from OpenAI transformed LiveKit into a $1B+ AI infrastructure company. This is the story of pivoting from video conferencing to becoming the backbone of multimodal AI.

Key Facts

  • OpenAI Partnership: Powers ChatGPT's voice mode for 3+ years (Russ d'Sa)
  • Revenue Growth: Hit $1M ARR in August 2023, then 3x-5x-3x growth (Russ d'Sa)
  • Emergency Use: Powers 35% of 911 calls through dispatch centers (Russ d'Sa)
  • Market Size: Trillions of phone calls daily represent massive AI opportunity (Russ d'Sa)
  • Series B: Raised $45M in 2024, achieved unicorn status (Episode description)

When Russ d'Sa got a threatening ultimatum from a tech giant to sell his startup for cheap or face destruction, he thought it was over. Then OpenAI sent an email that changed everything.

The Secret Behind ChatGPT's Voice Mode Infrastructure

LiveKit provides the WebRTC infrastructure that enables ChatGPT's voice mode to handle real-time, stateful conversations between users and AI, fundamentally different from stateless HTTP-based text interactions.

When OpenAI launched ChatGPT's voice mode, few knew the infrastructure powering it came from a struggling live streaming startup. LiveKit's WebRTC technology enables the real-time audio streaming that makes voice conversations with AI feel natural and responsive.

The key technical challenge lies in the difference between stateless and stateful interactions. As Russ d'Sa explains, traditional web applications use HTTP, a stateless protocol where each request is independent. Voice AI requires maintaining context throughout extended conversations, similar to how multiplayer video games handle persistent sessions.
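To make the stateless versus stateful distinction concrete, here is a minimal Python sketch. It is an illustration only, not LiveKit's or OpenAI's code: the names `stateless_context_size` and `VoiceSession` are hypothetical, and a real voice session would carry audio frames rather than strings. The point is simply where the conversation context lives: in each request, or in a long-lived session object.

```python
from dataclasses import dataclass, field


def stateless_context_size(history: list[str], message: str) -> int:
    """Stateless (HTTP-style) turn: the server remembers nothing, so the
    client must resend the full history alongside the new message."""
    return len(history) + 1  # all context arrives inside this one request


@dataclass
class VoiceSession:
    """Stateful turn handling: one object lives as long as the connection
    and accumulates context across utterances (hypothetical class)."""
    turns: list[str] = field(default_factory=list)

    def on_utterance(self, message: str) -> int:
        self.turns.append(message)
        return len(self.turns)  # context comes from session memory


if __name__ == "__main__":
    messages = ["hi", "what's the weather?", "and tomorrow?"]

    # Stateless: the client keeps the history and resends it every time.
    history: list[str] = []
    for msg in messages:
        print("stateless context size:", stateless_context_size(history, msg))
        history.append(msg)

    # Stateful: the server-side session keeps the history.
    session = VoiceSession()
    for msg in messages:
        print("stateful context size:", session.on_utterance(msg))
```

Both loops see the same amount of context at each turn. The difference is who carries it, and in the stateful case the session object has to stay alive, on one server, for the whole conversation.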

This infrastructure challenge extends beyond the protocol. Load balancing voice AI agents differs sharply from load balancing web applications: a voice session can last anywhere from 30 seconds to an hour, so deployment and scaling have to account for long-lived connections. LiveKit has refined its approach to this while supporting OpenAI's scale.

"hey, so we signed up for LiveKit Cloud in secret with a personal Gmail address three weeks ago. And we built this voice interface for ChatGPT, on top of you guys. And now we're fans, and we want to talk commercial." — Russ d'Sa
"HTTP is a stateless protocol. What that means is that you don't have any context building up across requests. Every request is independent. But with a voice conversation, we're having a conversation that has been building up over an hour." — Russ d'Sa

From Near-Death Threat to Unicorn Partnership

A tech giant's 'sell or die' ultimatum became the catalyst for LiveKit's transformation when OpenAI's partnership email arrived the same day, shifting the company from video conferencing to AI infrastructure.

The timing couldn't have been more dramatic. On August 4th, 2023, Russ d'Sa endured a five-hour lunch where a tech hyperscaler delivered an ultimatum: sell for a lowball offer, license their technology, or face destruction. The company was struggling at around $300k run rate with their live streaming business.

That same evening, driving home in defeat, d'Sa received the email that would change everything. OpenAI had been secretly using LiveKit's infrastructure and wanted to discuss a commercial partnership. This wasn't just any customer – it was the company defining the future of AI.

The partnership transformed LiveKit's positioning overnight. When enterprise customers like Salesforce later inquired about their capabilities, the conversation became simple. The credibility of powering ChatGPT's voice mode opened doors that years of traditional sales efforts couldn't.

"So I went to this five hour long lunch with them. They were, like, can we buy you? Can we license? Or we're going to kill you." — Russ d'Sa
"Salesforce came to me, and they were like, so first question, who do you guys work with? Who do you guys power? Can you handle our scale? I'm like, oh, well, we power ChatGPT. They're like, all right, next question." — Russ d'Sa
  • Tech giant offered lowball acquisition or threatened competitive destruction
  • OpenAI email arrived same day as the threatening ultimatum meeting
  • Partnership instantly elevated LiveKit's enterprise credibility
  • Company pivoted entire strategy from video conferencing to AI infrastructure

Critical Infrastructure Beyond Consumer AI

LiveKit powers critical infrastructure including 35% of 911 emergency calls and major platforms like Spotify, demonstrating how voice AI infrastructure extends far beyond consumer chatbots into essential services.

While ChatGPT's voice mode captures headlines, LiveKit's infrastructure serves much broader critical applications. The company powers 911 dispatch centers across the United States, handling approximately 35% of emergency calls through their partnership with Prepared (now owned by Axon).

These emergency systems showcase the sophisticated capabilities required for mission-critical voice AI. Dispatch agents receive real-time audio, video, and GPS data while AI agents assist with tasks like gun detection, license plate reading, and automatic coordination with FBI field offices.

Enterprise applications span from Spotify's live concerts to Oracle's experimental police drone projects. Each use case demonstrates different scaling challenges, from massive community events to specialized government applications, all requiring the robust, low-latency infrastructure that LiveKit provides.

"When you call 911, for about thirty-five or so percent of the calls, that's actually going through LiveKit infrastructure." — Russ d'Sa
"Oracle's was crazy. That was like a Cybertruck police drone. They use us today too for a bunch of different initiatives, but that was the first one they had an idea for." — Russ d'Sa

The Technical Architecture of Stateful Voice AI

Voice AI requires fundamentally different infrastructure than web applications, using stateful WebRTC connections instead of stateless HTTP, with specialized load balancing similar to multiplayer gaming rather than traditional web servers.

The technical challenges of voice AI infrastructure mirror those of multiplayer gaming more than traditional web applications. Unlike HTTP requests that complete independently, voice conversations maintain persistent connections that can last from minutes to hours, requiring entirely different deployment strategies.

WebRTC, originally designed for peer-to-peer connections between client devices, had to be adapted for server-side AI agents. This creates unique scaling challenges where load balancing can't simply round-robin requests, as sessions have dramatically different durations and resource requirements.
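A small simulation shows why round-robin struggles when session lengths vary. This is a sketch under stated assumptions, not LiveKit's actual load balancer: the functions `peak_load`, `round_robin`, and `least_sessions` are hypothetical, one session arrives per time step, and "load" is just the count of active sessions on a server.

```python
from typing import Callable

# A picker receives the arrival time and, per server, the end times of
# its currently active sessions; it returns the index of the chosen server.
Picker = Callable[[int, list[list[int]]], int]


def round_robin(t: int, active: list[list[int]]) -> int:
    """Rotate through servers, ignoring how busy each one is."""
    return t % len(active)


def least_sessions(t: int, active: list[list[int]]) -> int:
    """Choose the server with the fewest active sessions (first wins ties)."""
    return min(range(len(active)), key=lambda s: len(active[s]))


def peak_load(durations: list[int], n_servers: int, pick: Picker) -> int:
    """Return the highest number of concurrent sessions seen on any server."""
    active: list[list[int]] = [[] for _ in range(n_servers)]
    peak = 0
    for t, duration in enumerate(durations):  # one arrival per time step
        # Drop sessions that have already ended by time t.
        active = [[end for end in ends if end > t] for ends in active]
        active[pick(t, active)].append(t + duration)
        peak = max(peak, max(len(ends) for ends in active))
    return peak


if __name__ == "__main__":
    # Alternating long (60-step) and short (1-step) sessions, two servers.
    durations = [60, 1] * 5
    print(peak_load(durations, 2, round_robin))     # 5
    print(peak_load(durations, 2, least_sessions))  # 3
```

With this arrival pattern, round-robin sends every long session to the same server, which ends up holding five at once while the other sits nearly idle. Picking the server with the fewest active sessions spreads the long sessions out. A production system would weigh more than session counts, but the sketch captures why duration-blind routing breaks down.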

LiveKit's solution involves sophisticated orchestration that handles interruption detection, context preservation, and seamless failover across global infrastructure. When OpenAI needed to scale voice mode, they specifically asked LiveKit for guidance on best practices for this new architectural paradigm.

"The way I think about it is like multiplayer video games. When you play Fortnite, one Fortnite game might take five minutes and you'll win. And then somebody else is in a Fortnite session that might take thirty minutes for somebody to win." — Russ d'Sa
"OpenAI put this meeting on our calendar, and we go into this meeting. And they're like, so how do we scale this thing? No joke, my answer was, you're OpenAI, why are you asking me?" — Russ d'Sa

Competing in the Voice AI Infrastructure Stack

LiveKit competes as high-code infrastructure while companies like Vapi offer low-code solutions built on top of LiveKit, creating an ecosystem similar to Next.js/Vercel versus Webflow for voice AI development.

The voice AI ecosystem has emerged with clear layers of abstraction. Companies like Vapi, Bland, and Retell operate as low-code platforms, many actually building on LiveKit's underlying infrastructure while providing simplified interfaces for rapid deployment.

LiveKit positions itself as the high-code solution, comparable to Next.js and Vercel for web development, while low-code platforms resemble Webflow. This creates different value propositions: low-code platforms enable quick time-to-market through point-and-click interfaces, while high-code solutions provide the control needed for differentiation.

The strategic advantage of controlling the infrastructure layer becomes apparent as voice AI applications mature. While multiple companies can quickly build similar products using low-code platforms, sustainable competitive advantages require the fine-grained control that only infrastructure-level access provides.

"Most of them actually are built on top of LiveKit and so they use our infrastructure for all of the transport. Then they have their own infrastructure for orchestration" — Russ d'Sa
"I think of it personally as Webflow versus Next.js and Vercel. Solid businesses, and they address a different segment of customer. But my personal opinion is that eventually you need to go to code." — Russ d'Sa

Lessons from Selling a Struggling Startup

Successful startup sales require relationship-driven approaches, with strategic partnerships providing leverage in negotiations, while acqui-hires work best when positioning team talent as the primary value proposition.

d'Sa's experience selling Evie Labs to Medium reveals crucial insights about startup exits when companies aren't thriving. The key strategic insight from their banker was to build strategic partnerships first, creating mutual dependencies that provide leverage in acquisition discussions.

The relationship-driven nature of startup acquisitions became evident through connections spanning years. The Medium deal materialized through Josh Elman at Robinhood, who sat on Medium's board and knew d'Sa from Twitter, demonstrating how professional networks compound over time.

When facing acquisition offers that didn't include the entire team, ethical considerations proved financially costly but relationship-preserving. Declining Coinbase's offer to acquire only five of eleven team members maintained team integrity, with d'Sa's co-founder holding firm on doing right by everyone despite potentially lucrative individual outcomes.

"the recommendation is to try to find a very strong strategic partnership with someone first. Kind of prove out some kind of shared win that you can achieve during that partnership where they depend on you" — Russ d'Sa
"In all situations where a company is sold, not bought, it's almost purely relationship driven, in my opinion." — Russ d'Sa
  • Strategic partnerships before acquisition discussions create negotiating leverage
  • Professional relationships often determine acquisition opportunities
  • Team integrity decisions have long-term relationship implications
  • Good cop/bad cop negotiation tactics with investors require coordination

Voice AI Infrastructure: High-Code vs Low-Code Platforms

Aspect          | LiveKit (High-Code)                   | Vapi/Bland (Low-Code)
Time to Market  | Longer development cycle              | Quick deployment via UI
Customization   | Full control over agent behavior      | Limited to platform features
Infrastructure  | Direct infrastructure access          | Built on LiveKit infrastructure
Differentiation | High barrier to entry for competitors | Lower barrier to entry
Use Cases       | Complex, custom applications          | Standard voice AI workflows

Frequently Asked Questions

How does LiveKit power OpenAI's ChatGPT voice mode?

LiveKit provides the WebRTC infrastructure that enables real-time audio streaming between users and ChatGPT's AI agents. OpenAI secretly signed up with a personal Gmail address and built their voice interface on LiveKit's platform before revealing the partnership.

What's the difference between stateful and stateless AI interactions?

Stateless interactions like text chat send the entire conversation history with each request. Stateful voice conversations maintain persistent connections with built-up context, similar to multiplayer gaming sessions, requiring completely different infrastructure approaches.

How did the tech giant's ultimatum affect LiveKit's strategy?

The 'sell or die' threat from a hyperscaler coincided with OpenAI's partnership email on the same day. This timing pushed LiveKit to pivot from video conferencing to AI infrastructure, transforming a struggling company into a unicorn.

What other critical infrastructure does LiveKit power besides ChatGPT?

LiveKit powers roughly 35% of 911 emergency calls in the US, Spotify's live concerts, and various enterprise applications. Its infrastructure handles everything from emergency dispatch systems to experimental police drone projects.

LiveKit's transformation from a struggling video conferencing startup to the infrastructure backbone of voice AI demonstrates the importance of timing, relationships, and recognizing inflection points. Listen to the full conversation on The Product Market Fit Show for more insights on navigating startup pivots and building critical AI infrastructure.
