The WebRTC Architecture Landscape is Changing

There’s a new type of WebRTC application architecture emerging right now, referred to as “WebRTC Unbundling”. Although it may not be appropriate for all applications, it should at least be considered for any new live video development project. It was an important topic in my recent WebRTC Live interview with Tsahi Levent-Levi of BlogGeek.me.

In the past, we’ve spoken of three different types of WebRTC application architectures: WebRTC “to the standard”, open source media servers, and the commercial media server options referred to as CPaaS. All three of these remain valid architectural choices. WebRTC Unbundling is simply a fourth choice, which we could consider to be a twist on the WebRTC “to the standard” option.

To start, let’s review each of the previous three architectural options. I’ve also covered this in a “How to Architect Your WebRTC Application” conference talk that our CTO, Alberto Gonzalez, and I gave virtually at the 2021 TADSummit. You can skip ahead to this part of the presentation here, and I’ll highlight each option quickly below.


Option #1: WebRTC “to the standard”

This is the original way to build your WebRTC application and is still the best option for a limited set of use cases. From its inception, WebRTC was described as a simple way to access the camera and microphone and establish peer-to-peer video, audio, and data channels using plain-old JavaScript. It was intended to bring telecommunications to the developer masses so they could build things in the browser without having to understand VoIP or telephony.

This pitch was revolutionary. However, it’s always been a bit more complicated than advertised. Along with this great power comes great responsibility. You have to manage considerations like STUN/TURN servers, application signaling, and browser/mobile support. On top of that, scaling your application is non-trivial.
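To make the STUN/TURN consideration concrete, here is a minimal TypeScript sketch of the configuration a peer connection needs before it can connect through NATs. The server URLs, the credentials, and the `buildPeerConfig` helper are placeholders of my own for illustration. The local interfaces only mirror the shape of the browser's `RTCConfiguration` dictionary so the snippet compiles outside a browser.

```typescript
// Local types mirroring the shape of the browser's RTCIceServer and
// RTCConfiguration dictionaries, declared here so the sketch is self-contained.
interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface PeerConfig {
  iceServers: IceServer[];
}

// Hypothetical short-lived TURN credentials, normally issued by your own backend.
interface TurnCredentials {
  username: string;
  credential: string;
}

// Builds the ICE portion of a peer connection configuration:
// STUN to discover the public address, TURN as a relay fallback.
function buildPeerConfig(
  stunUrl: string,
  turnUrl: string,
  creds: TurnCredentials,
): PeerConfig {
  return {
    iceServers: [
      { urls: stunUrl },
      { urls: turnUrl, username: creds.username, credential: creds.credential },
    ],
  };
}

// Placeholder hosts and credentials, for illustration only.
const peerConfig = buildPeerConfig(
  "stun:stun.example.com:3478",
  "turn:turn.example.com:3478",
  { username: "demo-user", credential: "demo-secret" },
);

// In a browser you would then write: const pc = new RTCPeerConnection(peerConfig);
```

Even with this in place, you still have to run or rent those servers and write the signaling that exchanges offers, answers, and ICE candidates between peers.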

For the vast majority of applications these days, there’s no reason for you to handle the complexity of building to the standard. You’re better off building on top of an open source media server or CPaaS (see below). Many people start building applications this way as a learning exercise, and then leverage existing frameworks for their production applications.

The exception has been when you want to build your own media server because nothing on the market fits your precise requirements, or when you want to modify and recompile libwebrtc for a unique use case. Now, WebRTC Unbundling will cover most of those use cases better.

Option #2: Open Source Media Servers

Open source media servers and libraries such as MediaSoup, Janus, Jitsi, and Pion are all appealing options because they handle many of the complexities of WebRTC for you. They typically have the details of STUN/TURN, signaling, and browser/mobile support built right in. Because they are based on a media server design such as a Selective Forwarding Unit (SFU), they also scale far better than peer-to-peer WebRTC.
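A quick way to see why a Selective Forwarding Unit scales better is to count streams per client. The sketch below is a simplified model of my own, not taken from any of the servers named above. It assumes every participant publishes exactly one stream and subscribes to everyone else, with no simulcast or stream pausing.

```typescript
type Topology = "mesh" | "sfu";

interface StreamLoad {
  uploads: number;   // streams each client sends
  downloads: number; // streams each client receives
}

// Counts the streams a single client handles in a call of `participants` people.
function streamsPerClient(participants: number, topology: Topology): StreamLoad {
  if (!Number.isInteger(participants) || participants < 1) {
    throw new RangeError("participants must be a positive integer");
  }
  const others = participants - 1;
  if (topology === "mesh") {
    // Peer-to-peer mesh: one upload and one download per remote peer.
    return { uploads: others, downloads: others };
  }
  // SFU: a single upload to the server, which forwards it to the other clients.
  return { uploads: Math.min(1, others), downloads: others };
}

const meshLoad = streamsPerClient(10, "mesh");
const sfuLoad = streamsPerClient(10, "sfu");
```

For a ten-person call this model gives nine uploads per client in a mesh and one with an SFU. Upload bandwidth is usually the scarcer resource on the client, which is why the mesh approach breaks down first.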

You will still need to handle all aspects of your server infrastructure and the scaling of that infrastructure. The work involved will be less than with our first option, but it still requires a good WebRTC DevOps lead on your team who understands cloud infrastructure scaling.

Option #3: CPaaS

A Communications Platform as a Service (CPaaS) is a compelling architectural option for many of our clients because it’s the fastest-to-market solution. The commercial platform has built in all the global scaling you will need for the video portion of your application, and it keeps up to date with new browser and mobile releases. Common CPaaS solutions include Agora, LiveSwitch, Twilio, and Vonage.

The tradeoff is cost. You will have higher operating costs since these services all charge some form of usage-based metering. A CPaaS is still an excellent choice for getting started. However, it can be time-consuming and expensive to change your application later if you find you need more low-level control of the video than your chosen CPaaS provides, or if your application grows to a point where the charges become prohibitive.

Option #4: WebRTC Unbundling

Now we arrive at the new kid on the block for WebRTC architectures. WebRTC Unbundling is a term for combining newer browser APIs to replace individual parts of the media pipeline in WebRTC applications. These include WebAssembly, WebTransport, and WebCodecs.

As Tsahi showed in this graphic from our WebRTC Live episode, you can choose to replace different parts of the WebRTC media pipeline as you see fit for your specific application.

You can watch that specific part of the WebRTC Live episode here. Tsahi discusses the new technologies that make up the next version of WebRTC and how they can be used individually in the media pipeline.

Using this model of unbundling, you can replace the standard encoding/decoding done in a WebRTC application with the WebCodecs API. This allows you to do your own optimization of the video based on your unique application needs, as well as to manipulate individual video frames.
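As a rough illustration of what "doing your own optimization" can look like, here is a minimal TypeScript sketch that builds an encoder configuration for WebCodecs. The `buildEncoderConfig` helper and its bits-per-pixel heuristic are assumptions of mine, not a recommendation from the spec. The local `EncoderConfig` type mirrors a subset of the WebCodecs `VideoEncoderConfig` dictionary so the snippet compiles outside a browser.

```typescript
// Subset of the WebCodecs VideoEncoderConfig dictionary, declared locally.
interface EncoderConfig {
  codec: string;
  width: number;
  height: number;
  bitrate: number; // bits per second
  framerate: number;
  latencyMode: "quality" | "realtime";
}

// Hypothetical helper: derives a target bitrate from resolution and frame rate.
// The default of 0.1 bits per pixel is an illustrative starting point only.
function buildEncoderConfig(
  width: number,
  height: number,
  framerate: number,
  bitsPerPixel: number = 0.1,
): EncoderConfig {
  return {
    codec: "vp8",
    width,
    height,
    bitrate: Math.round(width * height * framerate * bitsPerPixel),
    framerate,
    latencyMode: "realtime", // favor low delay for live video
  };
}

const hdConfig = buildEncoderConfig(1280, 720, 30);

// In a browser that ships WebCodecs, the config would be used roughly like this
// (`send` is a hypothetical function that ships each encoded chunk to the network):
//   const { supported } = await VideoEncoder.isConfigSupported(hdConfig);
//   const encoder = new VideoEncoder({ output: (chunk) => send(chunk), error: console.error });
//   encoder.configure(hdConfig);
//   encoder.encode(videoFrame);
//   videoFrame.close();
```

The point is that every one of these knobs is yours to set per frame or per session, rather than being decided inside the WebRTC stack.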

You can also change the send/receive portions of the media pipeline by using WebTransport if you want a lighter transmission protocol than WebRTC. Implementing your own WebTransport-based application might make sense if your use case doesn’t need NAT traversal, in which case the STUN/TURN overhead of WebRTC is unnecessary. WebAssembly lets you connect all these pieces together more quickly and integrate with other things like machine learning around the video itself.

These APIs are not available in all browsers yet, and working with them is more complex than most of our clients will need. However, the concept of Unbundled WebRTC is a very viable alternative to working with the WebRTC standard itself, especially if you have a use case where you find yourself saying, “I wish WebRTC would do this instead…”. In that situation, unbundling may make sense. You can build your own version of WebRTC that works however you like, without the complexities and maintenance problems of modifying libwebrtc itself.
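Because browser support is uneven, an unbundled application will typically check for each API before using it and fall back to standard WebRTC otherwise. Here is a minimal sketch of that check. The `detectUnbundlingSupport` and `missingApis` helpers are hypothetical names of mine. The global names they look for (`VideoEncoder`, `VideoDecoder`, `WebTransport`, `WebAssembly`) are the ones browsers expose when the APIs are present.

```typescript
type UnbundlingApi = "WebCodecs" | "WebTransport" | "WebAssembly";

// Reports which of the unbundling building blocks exist on the given global scope.
// Passing the scope in makes the function easy to exercise outside a browser.
function detectUnbundlingSupport(scope: object): Record<UnbundlingApi, boolean> {
  return {
    WebCodecs: "VideoEncoder" in scope && "VideoDecoder" in scope,
    WebTransport: "WebTransport" in scope,
    WebAssembly: "WebAssembly" in scope,
  };
}

// Lists the APIs that are absent, so the caller can decide whether to fall back.
function missingApis(support: Record<UnbundlingApi, boolean>): UnbundlingApi[] {
  return (Object.keys(support) as UnbundlingApi[]).filter((api) => !support[api]);
}

// In an application you would pass the real global scope.
const support = detectUnbundlingSupport(globalThis);
const missing = missingApis(support);
```

If `missing` is non-empty, the safest path is usually to keep that user on the regular `RTCPeerConnection` pipeline.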

Comparing the WebRTC Architectural Options

Previously, I’ve described the following matrix of WebRTC architectural options. This table compares different architectures against the parameters that our clients most often care about.

  • Upfront cost. How much will it cost me to build this application? Generally speaking, a CPaaS or open source media server abstracts out more of the complexity of building a WebRTC video application, so it should take you less time to build your application than coding directly to the standard. Because software developers are expensive, the best way to lower the upfront cost of building an application is to make it simpler so it requires less development time.
  • Ongoing cost. How much will it cost me to run this application in production? This also refers to the transaction cost of each individual video call on the system. Because a CPaaS charges you per-minute or based on the amount of bandwidth required, it is more expensive than building your own infrastructure with an open source media server or coding to the standard. This may not be a big deal when you first launch your application. If your plan is to scale up to millions of users and video calls, however, the transactional costs of a CPaaS could become prohibitive.
  • Technical difficulty. Closely related to the upfront cost, this tells you how specialized a developer you need to build the application. The more specialized that developer, the harder it will be for you to maintain this application after your initial development team moves on. Whether you’re hiring a development company like ours or hiring your own developers, this is an important consideration. It’s not easy to find any kind of developer these days, but you’ll have a lot more luck finding talented JavaScript developers than low-level C++ developers.
  • Features included. How many features are already available to you just by choosing this architecture? A CPaaS is going to have the most features already available to you.  For example, a CPaaS may already provide an option for background replacement or captioning the video, whereas you have to build that functionality yourself if you’re writing code directly to the WebRTC standard.
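The ongoing cost tradeoff can be sketched as simple break-even arithmetic. The model below is a deliberate simplification of my own: each option has a fixed monthly cost plus a cost per participant-minute. All the dollar figures are made-up placeholders, not quotes from any vendor or cloud provider.

```typescript
interface CostModel {
  fixedMonthly: number; // servers, DevOps time, etc., in dollars per month
  perMinute: number;    // dollars per participant-minute of video
}

// Total monthly cost for a given volume of participant-minutes.
function monthlyCost(model: CostModel, minutes: number): number {
  return model.fixedMonthly + model.perMinute * minutes;
}

// Monthly participant-minutes above which self-hosting is cheaper than the CPaaS.
// Returns null when self-hosting never becomes cheaper per minute.
function breakEvenMinutes(cpaas: CostModel, selfHosted: CostModel): number | null {
  const rateSaving = cpaas.perMinute - selfHosted.perMinute;
  if (rateSaving <= 0) return null;
  const extraFixed = selfHosted.fixedMonthly - cpaas.fixedMonthly;
  return Math.max(0, extraFixed / rateSaving);
}

// Hypothetical numbers for illustration only.
const cpaasModel: CostModel = { fixedMonthly: 0, perMinute: 0.004 };
const selfHostedModel: CostModel = { fixedMonthly: 2000, perMinute: 0.0005 };
const threshold = breakEvenMinutes(cpaasModel, selfHostedModel);
```

With these placeholder numbers the break-even lands at roughly 570,000 participant-minutes per month. Below that, the CPaaS is cheaper to run. Above it, your own infrastructure wins, provided you can staff it.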

How Unbundled WebRTC changes these tradeoffs

Unbundled WebRTC is similar to coding to the WebRTC standard with the additional benefits of more features available and arguably less technical difficulty.

The upfront costs of building an Unbundled WebRTC application are still higher than the other options, similar to building against the WebRTC standard directly. You’ll need a good understanding of video encoding if you want to manipulate individual video frames in your media pipeline. That extra expertise and time/cost is why this won’t be the best choice for most applications.

Similar to an open source media server, you’re going to control your own server infrastructure in this scenario. You won’t be paying any transaction fees to a CPaaS, so Unbundled WebRTC apps still offer a relatively lower ongoing cost at high scale.

The technical difficulty is a bit lower because the WebCodecs APIs give you easier access to video frames than if you were trying to modify the internals of libwebrtc itself. Your developers will still need a strong understanding of media processing pipelines and things like video codecs. 

The big differentiator of Unbundled WebRTC is the power that it gives you. On the one hand, we have previously said that a CPaaS offers the most included features because there is so much more functionality built into their commercial APIs. For instance, you might not need to build recording or background replacement if the CPaaS offers that as a feature.

However, what happens if you don’t like the way that the CPaaS implements recording? There’s very little you can do about it, as we have found when a client doesn’t like the way that a CPaaS lays out the videos in a composite recording file. Or perhaps you don’t like the “funny hats” that a CPaaS offers you in their APIs and you want to build your own custom graphics into a live video stream? Or maybe the video transcription service that the CPaaS integrates with just costs too much and you want to roll your own solution?

A CPaaS doesn’t typically give you much flexibility around their particular implementation. The advantage of Unbundled WebRTC is that you can modify each aspect of the media processing pipeline however you want. You have tremendous potential to build features exactly as you need them, but it will still be hard work, thus the asterisk disclaimer on my diagram above!

Which WebRTC Architecture is best for you?

It’s rare for any of our clients to request the exact same application as any other client we’ve worked with. Building a 1-1 video chat application is quite different from building an interactive webinar application with multiple participants and thousands of live viewers. The architecture you need is driven by all of the combined considerations in our tradeoff discussion above: the features you need, the budget you have, the developers you can find, etc.

What is best for you? Most of the time it’s still going to be the CPaaS or open source media server options, because of the benefits they offer in lower upfront cost and built-in features and scaling. However, if you already have a well-identified secret sauce that you know your customers need, and you already know it cannot be fulfilled through existing media servers (open source or CPaaS), then unbundling will be worth the time. It allows you to achieve higher levels of differentiation from your competitors.

If you need help navigating these tradeoffs and building your live video application, we’re here to help! Contact us today!
