It glues everything together: an interview with Marc Cymontkowski, evangelist of SRT

SRT has been instrumental in the development of Video Transport and is also available in our video SDK. I had a chance to have this great conversation at IBC2019 with Marc, whom I have known since the days we were both involved in creating DirectShow filters for the video industry. We talked about how SRT was created, why it is open source and what’s on the roadmap for the future.

. . .

Can you give us some background into the Montivision days and how you got involved with Haivision and SRT?

Yeah, sure. I started MontiVision in 2002, when I was working as a freelancer for companies that needed machine vision in production, for example to do quality control on conveyor belts. I looked for ways to make the video processing independent of the capture card (at the time, every capture card came with its own SDK). I found that DirectShow was a way to achieve this and created DirectShow components doing machine vision for the industrial world.

We built the MontiVision development kit with a UI and a vast choice of video processing components, and you could export the configurations into an ActiveX control to use from within an application. That was nice.

Then, over time, we got phone calls from people in the film and broadcast industry asking, since you guys are using DirectShow, could you also build, for example, a standards conversion component? I said we could, because signal processing was my background. So I wrote a DirectShow filter for standards conversion. I sent it back to the customer in London and got a phone call in which they said, it seems to work really well, but something is wrong. When I asked what was wrong, they said: it’s too fast. It’s the coolest phone call I ever received: why would you complain if an algorithm is too fast? And the guy goes, as per the math you shouldn’t be able to go faster than 3–4 frames per second without sacrificing quality. They wanted to let a consultant decide. So I flew over to London, they had one of those expensive hardware converters attached to one screen and our software attached to another, and the independent consultant looked at both screens for a while and finally pointed at the screen showing the output of the MontiVision DirectShow filter.

And that’s the day we decided to put more focus on broadcast. We developed a lot of DirectShow filters for this specific customer, but also onboarded new customers from the broadcast space. And then, in 2005, the Kulabyte founders from Austin called us. Those guys were super innovative and cool to work with. They had all these ideas for an OTT encoder, but were not C++ software developers. They needed a very special custom DirectShow filter and asked us whether we would be able to build it. We did it and basically ended up working on the whole encoder pipeline up to 2010, when they decided to go Linux, and we had to re-architect the whole thing to make it platform-independent.

In the middle of that redesign, Haivision started talking to Kulabyte regarding an acquisition, and towards the end of the process they realized that they needed these guys from Germany. The CTO came over to Germany and we quickly figured out that we would be able to work together very well.

Was that a technology acquisition or a talent acquisition?

It was both, because without MontiVision there wouldn’t have been the Kulabyte encoder the way we know it today. Haivision needed the people who developed the encoder engine in order to move into the OTT space.
Haivision was always about embedded devices, MPEG-TS end to end, low latency, high quality. They had realized that they needed an OTT product to extend their customers’ reach with mass delivery over CDNs.

So, Haivision originally was a transport company?

Totally. Haivision was initially doing video conferencing, very low latency video transmission. And then the business grew into other verticals, but the focus was always on high-quality, low-latency, end-to-end video solutions. We always hesitated to do broadcast, as our encoders were missing some typical broadcast requirements at the time, but the customers loved them for their contribution feeds, so this field became one of the major pillars.

So, this is when the remote production idea was ramping up?

Exactly.

And was that done over the public Internet? Or did it require a dedicated line?

That was already over the public Internet, because by that time we had SRT. SRT was originally created in 2012 and first presented in 2013. When I joined in 2011, Haivision was evaluating a proprietary technology from a third-party company. We liked the technology, but weren’t able to come to an acceptable business agreement.

So, one day our CMO and our CTO came to me and said, Marc, we need SRT. And I was like, what is SRT? Secure Reliable Transport: it’s capable of sending encrypted MPEG transport streams over the public Internet at very low latency.

I had some experience with transmitting DVB-compliant video over the public Internet to satellite teleports, but we had to smooth the signal with a long buffer before feeding it into the teleport. It had to be a perfectly timed signal, since DVB is really picky, as you know. That whole setup was based on a library called UDT, and we were using huge 10-second buffers to smooth out the signal. This UDT stuff worked really well over the public internet, but with crazy high latency.

I figured that, to change this, we would need to capture the system timestamps on the sender machine, put them into the packet header and use them to recreate the signal shape on the receiver side.
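That idea, stamping each packet with the sender’s clock and replaying it with a fixed delay at the receiver, is essentially what SRT today calls timestamp-based packet delivery. A minimal illustration of the concept (hypothetical helper names, not actual SRT code):

```c
/* Minimal sketch of timestamp-based delivery: the sender stamps each packet
 * with the elapsed time since the stream started; the receiver releases the
 * packet only when that offset plus a fixed latency budget has passed on its
 * own clock, recreating the original signal shape. */
#include <stdint.h>
#include <time.h>

static uint64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Sender side: value to put into the packet header. */
uint32_t stamp_packet(uint64_t stream_start_us)
{
    return (uint32_t)(now_us() - stream_start_us);
}

/* Receiver side: how long to hold a packet before handing it to the
 * application, given a fixed latency budget (e.g. 120 ms). */
int64_t delivery_delay_us(uint32_t pkt_timestamp_us,
                          uint64_t first_pkt_arrival_us,
                          uint32_t first_pkt_timestamp_us,
                          uint32_t latency_us)
{
    uint64_t due = first_pkt_arrival_us
                 + (pkt_timestamp_us - first_pkt_timestamp_us)
                 + latency_us;
    return (int64_t)(due - now_us());   /* sleep this long if positive */
}
```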

Because I knew that in Montreal we have some hardcore networking people, I was convinced they’d help me implement it. So I went to one of the smartest people I know, sat down with him and explained what I wanted to do with the UDT library.

And after I explained everything, I could sense from his reaction that he was thinking something like “hm, stupid”. So I flew back to Germany, really frustrated. But then, after a few weeks, I got an email that said something like “…well, maybe. I tried some things and it could work.”

At IBC 2013, we demonstrated live HEVC encoding from a hotel suite outside IBC, using SRT over the public internet to the show floor, where we had a tiny booth with a screen, and we were showing live interviews happening at the hotel. And when people said it was probably fake, we asked them to call the interviewer on his mobile and watch him pull out his phone on screen.
That’s how it started: we put SRT into all our products, encoders, decoders, gateways, the media platform. And suddenly we were able to connect all the dots worldwide. We brought it to a stage where our media gateway became this backend routing service, which we could fire up on different clouds and set up end-to-end routes. Now we’re taking it to the next stage with a serverless cloud architecture: same concept, but with one central UI based on web APIs. Worldwide routing workflows, extended to devices integrated over IoT control. So you can create end-to-end routes from a device on one side of the planet, across the Internet, to a device on the other side of the planet. Or to a destination in the cloud.

The SRT Hub?

Yes, that’s the SRT Hub, exactly.

Between what you envisioned with SRT and what you actually accomplished, what was the difference? Did the idea evolve?

Wildly. I mean, the idea was to get data from A to B, and it was unidirectional in the beginning. And when it worked, people immediately said that it would be great to get data back across the same channel. Okay, that should be possible, so we made it bi-directional. Then there was something UDT already had, the multiplexing feature. What does it do? It multiplexes streams over UDP. Oh, cool, let’s activate that. So we made all the code changes to get it working, and now you can run several bi-directional SRT connections over one UDP port, which is what we call SRT multiplexing.
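In libsrt terms, the multiplexing he describes is most visible on the listener side: every connection accepted from one listening socket is carried over that socket’s single UDP port. A minimal sketch with the public C API (error handling omitted; treat the details as an assumption):

```c
#include <srt/srt.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    srt_startup();

    SRTSOCKET lsn = srt_create_socket();
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(9000);            /* one UDP port for all callers */
    sa.sin_addr.s_addr = INADDR_ANY;

    srt_bind(lsn, (struct sockaddr*)&sa, sizeof sa);
    srt_listen(lsn, 10);                  /* queue up to 10 pending callers */

    for (;;) {
        struct sockaddr_storage peer;
        int peerlen = (int)sizeof peer;
        SRTSOCKET s = srt_accept(lsn, (struct sockaddr*)&peer, &peerlen);
        if (s == SRT_INVALID_SOCK)
            break;
        /* Each accepted socket is an independent bi-directional SRT
         * connection, yet all of them share UDP port 9000. */
        printf("new connection: @%d\n", s);
    }

    srt_cleanup();
    return 0;
}
```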

Then the next thing: it’s nice to have this bi-directional connectivity, but how do I know which stream is which? There is an identifier in the header. But a simple identifier means that I need a database to translate it into customer X, stream Y, and so on. Two months ago, we came up with what we call the content access specification, which allows you to add a username, a stream name and a mode (publish/request) to the SRT handshake. Now a listener socket can react to incoming SRT connections, decide whether it wants to allow or decline a connection, and also select a customer-assigned encryption passphrase.
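A sketch of how that looks with the public libsrt API: the caller puts user, resource and mode into the stream ID, and the listener inspects it in a callback before the handshake completes. The #!:: key syntax follows the SRT access control guidelines; the user name and passphrase below are made up for illustration:

```c
#include <srt/srt.h>
#include <string.h>

/* Caller side: identify yourself in the handshake via the stream ID. */
static void set_stream_id(SRTSOCKET s)
{
    const char* sid = "#!::u=alice,r=camera1,m=publish";
    srt_setsockflag(s, SRTO_STREAMID, sid, (int)strlen(sid));
}

/* Listener side: inspect the stream ID before the handshake completes,
 * reject unknown users, and pick a per-customer encryption passphrase. */
static int on_connect(void* opaque, SRTSOCKET ns, int hsversion,
                      const struct sockaddr* peeraddr, const char* streamid)
{
    (void)opaque; (void)hsversion; (void)peeraddr;
    if (streamid == NULL || strstr(streamid, "u=alice") == NULL)
        return -1;                            /* decline the connection */
    const char* pass = "alice-secret-0123";   /* hypothetical per-user key */
    srt_setsockflag(ns, SRTO_PASSPHRASE, pass, (int)strlen(pass));
    return 0;                                 /* allow the connection */
}

static void install_callback(SRTSOCKET listener)
{
    srt_listen_callback(listener, &on_connect, NULL);
}
```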

Authentication?

Yes, it’s a way to authenticate.

Another thing that we introduced at NAB 2019 in an experimental branch is a packet filter API, which allows you to write a custom network packet filter to be used on the sender and receiver sides; we used it to add FEC support to the protocol. The reason we added FEC is not that everybody wanted FEC. But we are working in the broadcast market, and broadcasters have known FEC for a long time. Since we faced concerns and criticism, we added FEC support to SRT and gave people the flexibility to use ARQ only, FEC only, or both combined.

But it wasn’t necessary, you’re saying?

It’s only necessary in specific use cases. Some broadband customers told us that if they want to transmit a very high bitrate stream over a dedicated, good-quality, very long distance link, say U.S. to Australia, FEC gives them lower latency than ARQ. But now you have the choice: you can do ARQ only, with retransmission of lost packets, you can choose pure FEC, or you can combine FEC with ARQ, so if FEC doesn’t recover the packet loss, ARQ can still retransmit the lost packet.
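In libsrt this choice is exposed through the packet filter socket option; the configuration string below follows the SRT documentation for the built-in FEC filter, and the concrete values are just an example, not a recommendation:

```c
#include <srt/srt.h>
#include <string.h>

void configure_fec(SRTSOCKET s)
{
    /* 10x10 FEC matrix; "arq:onreq" keeps ARQ as a fallback when FEC
     * cannot rebuild a lost packet. Use "arq:never" for FEC only, or
     * skip the filter entirely for ARQ only. The filter settings are
     * negotiated during the handshake. */
    const char* fec = "fec,cols:10,rows:10,arq:onreq";
    srt_setsockflag(s, SRTO_PACKETFILTER, fec, (int)strlen(fec));
}
```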

How would you segment the areas where SRT is used? Any unexpected applications?

We tend to call it glue, because SRT glues everything together. People use it for uncompressed AES67 audio only. There is this guy in Germany who does machine analytics on the endpoints and then just sends the metadata over SRT. Other people run multiple video links over one SRT connection by putting their own custom multiplexer on top of it. There are so many different use cases… Somebody told me once: if TCP had known that in the future it would mainly carry video, it would have been like SRT. Nice description.

Can you speak about the open source idea? Why did you arrive at this decision? And what was the goal?

We knew that SRT was working really, really well, because we were delivering it with encoders, decoders and gateways in high volumes. Partners were asking whether they could license the protocol from us, and customers were asking whether other companies they work with would be able to add SRT support. At the same time, so many proprietary solutions were popping up. We had two options: either we end up implementing all of them, or we go out there with a strong push and try to unify the market. It wasn’t easy to get everybody’s support inside the company, as we were about to give away something unique.

But after we did it, we gained so much recognition, because we created an ecosystem around the protocol that we support with a lot of energy. And suddenly we got the attention of much bigger players in the market. Microsoft became a big supporter of the protocol, because they love what we’re doing. And suddenly, this little-known niche player, Haivision, is on everyone’s lips here at IBC2019.

But then, obviously, there’s the element of competition. Some of those companies could now be competing with you for sales of encoders and decoders, right?

There’s always competition, but the thing is, the market is big enough for many companies. We’re repeating the pattern with the SRT Hub. This time it’s not open source, but an open ecosystem. We built this media routing service in the cloud. And we have the concept of hublets, where you can plug into our system. If a third party writes a hublet, their service can be used as part of the routing service. We provide the routing service and create this broadcast ecosystem, where you can connect all sorts of solutions in the cloud.

So, as far as I understand it, instead of competing with everybody’s proprietary protocols, you’ve created a protocol that everybody can use. And you have, sort of, expanded the market. And by expanding the market, you have gained more than you would have with a proprietary solution.

Completely.

But my other question is, why open source? Because, for instance, NDI® is doing virtually the same thing, right? They are creating a market for NDI, but it’s proprietary, it’s closed, and it’s free to use. Why did you go with open source? Did that give you any particular benefits?

Marc holding the 2018 Emmy® Award for Technology and Engineering.

A huge benefit is credibility. What I often mention to people is that if you want to build credibility around an open source project, then you need to be open and honest about it. That’s why, from day one, all the features go into the protocol first. There are sometimes rumors out there that Haivision has a secret sauce and runs a custom SRT. That’s not true. For example, Haivision is doing network adaptive encoding with SRT. So the encoder is doing dynamic stuff? Yes, it does. But this is not SRT: our encoder looks at the statistics that the protocol is generating and uses those statistics to adjust the encoder bitrate, and everybody can do that. There’s no secret sauce there. That is not the protocol. The more credibility we get over time, the more people trust us and the bigger the ecosystem we can work with.
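A rough sketch of that kind of feedback loop, reading the statistics libsrt exposes through srt_bstats(). The thresholds and set_encoder_bitrate() are made up for illustration; this is not Haivision’s encoder logic:

```c
#include <srt/srt.h>

extern void set_encoder_bitrate(double mbps);   /* hypothetical encoder hook */

void adapt_bitrate(SRTSOCKET s, double max_mbps)
{
    SRT_TRACEBSTATS st;
    if (srt_bstats(s, &st, 1) == SRT_ERROR)     /* 1 = reset interval counters */
        return;

    double target = max_mbps;
    /* Back off when the link shows loss or the estimated capacity drops. */
    if (st.pktSndLoss > 0)
        target = max_mbps * 0.7;
    if (st.mbpsBandwidth > 0.0 && st.mbpsBandwidth < target)
        target = st.mbpsBandwidth * 0.8;

    set_encoder_bitrate(target);
}
```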

In the beginning, many people told me they didn’t understand why we open-sourced SRT. They assumed we would ask for licensing at some point, or that we would keep back some hidden source code that offers certain advantages. I don’t hear that anymore. People understand that we are serious about it, and that’s very important to us.

But are you going to do licensing for hardware manufacturers? Or does your open source license allow me to create a hardware appliance with SRT in it?

It’s free: you take the protocol and put it into your appliance.

So, it’s different from NDI…

SRT is a framework, like FFmpeg: take it and do whatever you want. The only requirement: please contribute back if you have an improvement.

How did that work out? Were there, like, hundreds of people contributing?

In the beginning it was slow. People were watching; many people cloned the code and looked into it, but contributions were picking up slowly. I would say the first breakthrough was at IBC last year, and then another one at NAB this year. By now we have around 50 contributors to the protocol, and the amount and quality of feedback are increasing steadily.

So, the fact that 50 people are contributing is actually helping you to make the product better versus creating management overhead…

Yeah, stabilize it.

What would be the top three improvements that you see coming in the next few months into the protocol?

Currently we’re out of ammunition for a while, because we pushed so much stuff into the protocol over the last months, it’s crazy. The next thing that is coming, which I announced at the open source panel, is the experimental branch for socket groups, basically network link bonding. Socket groups are exciting, because this very simple concept is very flexible and therefore powerful. You can take multiple independent SRT connections and add them to a socket group, and the socket group will handle all the connections, synchronize them and talk to all of them. So you send the data into the group, it travels to the destination, where there’s another group receiver, and you get one stream out.

This allows you to say, now I want all these links to be completely redundant, send all the data over all the links, and have an SMPTE 2022-7 style seamless protection switching receiver on the receiver side. So basically, first come, first served: whatever packet comes first, you receive it, and then SRT does its timing recovery and outputs the signal as usual.

Secondly, we implemented a main/backup solution, which is doing seamless switching between the links. Within the SRT buffer time range, the protocol detects whether the main link has problems and switches to the backup link. So you’re not sending all the data over all links anymore, but you are only sending the data once.

Yes, you would have different network providers, different network types or simply at least two different routes across the internet. The third socket group feature the team is working on is bonding. Once you control multiple links, you can split the data across multiple links. So you could get a higher throughput by combining different links.

This is a big task and a lot of work is left, but we want to get SRT version 1.5 out at NAB 2020. There is still a way to go, because the main/backup links alone, for example, are a very complex topic. You need intelligence in the protocol that makes the decision when to switch. We will need to run a lot of experiments, and that’s why we put it out there in an experimental branch. People can take it and help us run it over different networks, tell us what breaks, give us feedback, help us validate, so hopefully it’ll be solid for a release at NAB 2020.
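For readers who want to experiment, this is roughly what the socket-group API looks like in libsrt builds compiled with the bonding feature enabled. The group types map to the redundancy (broadcast) and main/backup modes described above; since the branch is explicitly experimental, treat every detail here as subject to change:

```c
#include <srt/srt.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

static struct sockaddr_in make_addr(const char* ip, int port)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    inet_pton(AF_INET, ip, &sa.sin_addr);
    return sa;
}

int main(void)
{
    srt_startup();

    /* One logical connection carried over two independent links.
     * SRT_GTYPE_BROADCAST sends every packet over every link (2022-7
     * style redundancy); SRT_GTYPE_BACKUP keeps a standby link and
     * switches over when the main link degrades. */
    SRTSOCKET grp = srt_create_group(SRT_GTYPE_BROADCAST);

    struct sockaddr_in a = make_addr("198.51.100.10", 9000); /* link 1 (example IPs) */
    struct sockaddr_in b = make_addr("203.0.113.20", 9000);  /* link 2 */

    SRT_SOCKGROUPCONFIG links[2] = {
        srt_prepare_endpoint(NULL, (struct sockaddr*)&a, (int)sizeof a),
        srt_prepare_endpoint(NULL, (struct sockaddr*)&b, (int)sizeof b),
    };

    if (srt_connect_group(grp, links, 2) != SRT_ERROR) {
        const char payload[1316] = {0};   /* one MPEG-TS sized chunk */
        /* Sending to the group sends over all member links (broadcast)
         * or over the currently active link only (backup). */
        srt_send(grp, payload, (int)sizeof payload);
    }

    srt_close(grp);
    srt_cleanup();
    return 0;
}
```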

What is your role in the SRT movement? Are you designing the architecture, promoting?

I collect ideas and wishes in the community, come up with a lot of ideas myself and talk to people within our organization to see what it is that they need. The redundancy topic has been hot for years now, and in the labs we have gone through different experiments over time. But it’s not a trivial topic. The developers came up with a really cool solution, but the CPU utilization was so high that you wouldn’t be able to run it on an embedded device. Then they came up with another solution, which was very low-level and super lightweight, but wasn’t reliable enough. And now, with the socket group model, we think we have found a great balance. That’s what I’m doing, trying to be the SRT ambassador.

Are you facing any competition? The RIST protocol?

The RIST initiative started around the time when we open-sourced SRT, and we are a member of the working group. There was a clear trend in that group to rely on existing standards, to take RTP as a base and build on top of it, which was true for the simple profile of RIST. So you can send a RIST simple profile stream to an existing RTP receiver. Once you go to the main profile, because you have encryption, multiplexing and other functionality, that interoperability breaks. In the end, the main profile is a protocol that is not very different from SRT.

The difference between them is, SRT is one code base. It’s a community project. Everybody uses the same code, and it’s implementation-driven. RIST is standards-driven. If you are a big broadcaster, you might have a team of developers to implement it yourself, right? So some big guys might do it. But many of the smaller players don’t have the engineering power to implement every protocol they need and maintain it over time. They can just take SRT and it works.

Either way is totally valid. I don’t see a problem with both existing. The RIST group is strongly focused on the broadcast market. SRT has been taken into all kinds of areas. People use it for IoT connectivity, metadata exchange, as a communication protocol, for uncompressed data, and many obviously use it for sending MPEG transport streams over the public internet.

That’s why I call it glue! Whatever you want to glue together, put SRT in between.
