On the July 14, 2021 episode of WebRTC Live, we welcomed another new face to the show. Anton Venema, CTO at LiveSwitch Inc. (formerly Frozen Mountain Software), joined Arin Sime for a deep dive into scalability. With the drastic increase in live video that has come with the pandemic, scaling is an important topic for many members of the WebRTC community.
Anton gave us some background on how his company began when he and his brother had an idea to develop a web diagramming tool. Over time this led them down a path of building other data synchronization applications and eventually into real-time video applications.
An early challenge with live video was scaling: trouble with the mesh network model of peer-to-peer WebRTC video often sets in once you have video calls with more than four people. They decided to build server-side solutions that could ingest the content into the server and fan it out, similar to an SFU (selective forwarding unit). This was the birth of LiveSwitch.
Basics of Scalability and Media Servers
The first thing people learn about WebRTC is that it is not quite as peer-to-peer as advertised. STUN and TURN servers are needed to establish and relay the connections around firewalls. Anton shared that 20-30% of WebRTC calls require a TURN server.
Unlike STUN servers, which are stateless, a TURN server relays the video traffic itself. As your service grows, you are going to need to scale the TURN server as well. Typically this is done with an elastic cluster that can expand and contract with demand. Unless you are doing strictly point-to-point calling, you usually need an SFU. SFUs are one of the most common media server architectures in use today: each participant basically forms a peer-to-peer connection with a server instead of with other peers.
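To get a feel for why TURN capacity has to grow with usage, here is a rough back-of-the-envelope sketch in Python. It was not presented in the episode. The function name, the per-stream bitrate, and the per-node capacity are all illustrative assumptions; only the 20-30% relay share comes from Anton's remarks.

```python
import math

def turn_nodes_needed(concurrent_calls, relay_percent, kbps_per_stream, node_capacity_mbps):
    """Estimate TURN nodes needed for the relayed share of 1:1 calls.

    All inputs are assumptions for illustration, not measured values.
    """
    relayed_calls = concurrent_calls * relay_percent / 100
    # Each relayed 1:1 call has two media streams (one per direction)
    # passing through the relay.
    relayed_mbps = relayed_calls * 2 * kbps_per_stream / 1000
    # Keep at least one node running even when demand is near zero.
    return max(1, math.ceil(relayed_mbps / node_capacity_mbps))

# Hypothetical load: 10,000 concurrent calls at 1 Mbps per stream,
# with each node assumed to relay about 1 Gbps.
for percent in (20, 25, 30):
    print(percent, turn_nodes_needed(10_000, percent, 1000, 1000))
```

An elastic cluster would re-run an estimate like this against live demand and add or remove nodes accordingly.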
When LiveSwitch was designed, they avoided the problem of scaling the TURN servers separately by implementing intermediary services that forward the individual RTP packets carrying the frames of each inbound stream. This is the approach that many WebRTC-compatible media servers take today: acting as a selective forwarding unit, or SFU. The TURN server is actually embedded in the LiveSwitch media server, so both are scaled simultaneously.
The server takes the RTP packets and forwards them on, allowing each participant to upload their stream once instead of once per other participant, though they will still download a separate stream for each other participant. This is less of an issue, though, because most internet connections are asymmetric: users generally have more download bandwidth than upload bandwidth.
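The difference in upload cost can be made concrete with a small sketch. It simply counts video streams per participant, assuming every participant sends exactly one video stream; the function name is made up for illustration.

```python
def streams_per_participant(n, topology):
    """Return (uploads, downloads) of video streams for one participant
    in an n-person call, assuming everyone sends one stream."""
    if n < 2:
        return (0, 0)
    if topology == "mesh":
        # Every peer sends a copy to, and receives from, every other peer.
        return (n - 1, n - 1)
    if topology == "sfu":
        # One upload to the server; the server fans it out to the others.
        return (1, n - 1)
    raise ValueError(f"unknown topology: {topology}")

for n in (2, 4, 8):
    print(n, "mesh", streams_per_participant(n, "mesh"),
          "sfu", streams_per_participant(n, "sfu"))
```

In an eight-person call the mesh model needs seven uploads per participant, while the SFU model needs one. The download count is the same in both.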
Additionally, the media server can selectively choose how high a quality to send to each user, based on their network quality. In this way, one person’s bad network doesn’t degrade the call for everyone else.
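The episode recap does not say how a server makes this choice. One common approach, assumed here, is that each sender offers a few quality layers and the server picks the highest one that fits each receiver's estimated bandwidth. The layer names, bitrates, and headroom factor below are hypothetical.

```python
# Hypothetical quality layers as (name, bitrate in kbps), lowest first.
LAYERS = [("low", 150), ("medium", 500), ("high", 1500)]

def pick_layer(estimated_kbps, layers=LAYERS, headroom=0.8):
    """Pick the highest layer that fits within a fraction of the
    receiver's estimated bandwidth; fall back to the lowest layer."""
    budget = estimated_kbps * headroom
    chosen = layers[0]
    for layer in layers:
        if layer[1] <= budget:
            chosen = layer
    return chosen[0]

# Each receiver gets a layer based on its own estimate, so a slow
# receiver does not drag the others down.
for kbps in (3000, 700, 100):
    print(kbps, pick_layer(kbps))
```

The headroom factor leaves some of the estimated bandwidth unused so that audio and short bursts still fit. The right value depends on the application.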
MCUs: the dark horse in the room
A multipoint conferencing unit (MCU) is generally considered an older-style media server architecture, because it has to process and combine all the video streams on the media server. The advantage is that each participant only downloads a single composite video stream which includes all participants. The disadvantage is that this puts a lot of processing burden on the server running the MCU, perhaps requiring what Anton called a “beast of a box.”
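The trade-off between the two architectures can be sketched by counting what the server has to do for one call. This is a simplified model: it assumes the MCU encodes one shared composite for everyone, where a real MCU may encode several layouts, and that the SFU never decodes or encodes video. The function name and field names are illustrative.

```python
def server_work(n, architecture):
    """Rough per-call server workload for n participants who each send video."""
    if architecture == "mcu":
        return {
            "decodes": n,             # every inbound stream is decoded
            "encodes": 1,             # one composite, assumed shared by all
            "streams_sent": n,        # the composite goes to each participant
            "downloads_per_client": 1,
        }
    if architecture == "sfu":
        return {
            "decodes": 0,             # packets are forwarded, not transcoded
            "encodes": 0,
            "streams_sent": n * (n - 1),
            "downloads_per_client": n - 1,
        }
    raise ValueError(f"unknown architecture: {architecture}")

print(server_work(10, "mcu"))
print(server_work(10, "sfu"))
```

The MCU spends CPU on decoding and encoding to keep each client at a single download. The SFU spends bandwidth and leaves the decoding work to the clients.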
Anton and Arin discussed that there is still a use for MCUs, particularly in use cases like SIP and telephony, where it’s helpful to have a media server that can interface with older or less compatible technologies and send them a single composite video or audio stream.
Optimize for server or client efficiency?
It is hard to optimize for large numbers of parallel conversations and large group chats at the same time. This is why it is so hard to give a single answer on how to scale a WebRTC application. You really need to dig into those use cases and ask what your users value most.
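A small sketch shows why these two shapes of load behave so differently on an SFU. It counts the streams the server sends out, assuming every participant sends video to everyone else in the room. Real large rooms usually limit who sends, so treat this as a worst case; the function name is made up for illustration.

```python
def sfu_egress_streams(room_sizes):
    """Total streams an SFU sends out, given a list of room sizes,
    assuming every participant's video goes to all others in the room."""
    return sum(n * (n - 1) for n in room_sizes)

# The same 1,000 users arranged two different ways.
many_small_rooms = [2] * 500   # 500 one-to-one conversations
one_large_room = [1000]        # a single group call

print(sfu_egress_streams(many_small_rooms))  # 1000
print(sfu_egress_streams(one_large_room))    # 999000
```

Many small rooms spread evenly across servers, while a single large room concentrates a much bigger load in one place, so the two cases call for different scaling strategies.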
Also, who will bear the cost of the server: you or your customers? The answer lies in your business model. Are you serving a lot of free clients, or is your price point very aggressive? Who is paying the bills and what are their expectations?
When do you scale and when do you design for scale?
This question comes back to startups vs. well-defined environments and features. When building a first version of an application: how much do you do upfront and how much do you come back to later? It is important to have a tight focus on the feature sets you want to build in the first version so that your team can focus on more fundamental architectural issues, like scaling!
This question is also related to how you want to host the application: will you manage the application yourself using open source media servers or leverage others’ expertise and infrastructure using a cloud-based CPaaS?
The biggest mistake people make in scaling their WebRTC applications is becoming victims of their own success. Anton notes that it is hard to set aside time, but startups need to think about their pricing model and how it is going to impact them when they scale to a million clients. There is no margin for error in real-time communication, so plan for success and plan for scale from the get-go!
To learn more about LiveSwitch’s offerings for cloud-based or on-premise media servers, check out LiveSwitch.io.
Register for WebRTC Live #58 on August 25: XDN (Experience Delivery Network) as an Architecture for Our New Interactive World
Do you have a topic that you would like to see discussed on WebRTC Live? Let us know by emailing email@example.com.
Never miss an episode of WebRTC Live, our webinar series hosted by WebRTC.ventures Founder and CEO, Arin Sime. We feature the latest use cases and technical updates to this increasingly popular coding standard for live video. Watch past episodes on our WebRTC Live page, our YouTube channel, and on our blog.