Vertex: a next-generation AV production suite based on MFormats Video SDK

Light show on the walls of the building
© audio visual solutions ag

Vertex is the next-generation AV production suite that powers fixed installations at museums, creative projects for marketing agencies, and live events. The product is developed by ioversal – a German company founded by Jan Huewel and Martin Kuhn in 2018.

We had an in-depth discussion with Jan about how and why MFormats became an important part of their development process.

Jan started his professional career as a theatre lighting designer and operator. At some point he realized he wanted to create a product that would simplify and automate some of those tasks, especially as shows began to use more video: the whole process of video editing and playout during rehearsals and performances had to become easier and faster.

Together with a friend from high school, a computer engineer, he built their first media server – the product prototype that led to the foundation of coolux and the development of Pandoras Box, which, over the course of 15 years, grew into one of the biggest hardware-based turnkey solutions on the market.

We did projection mapping before the word was ever invented. We did this in the early 2000s. The whole buzz about that started years later. When we did the first 3D-mapped projection on a car, I thought: no one will ever do this again, it’s too complicated. A couple of years later, this was all over the place.

Soon after coolux was acquired by Christie, Jan found himself thinking about the next logical step: what would the future look like? His thinking led to the creation of Vertex, which is what this story centers on.

What is Vertex?

Vertex can be described as a modern multimedia framework – an application for immersive multimedia projects built on three core concepts: Multi-System, Multi-Display, and Multi-User by design.

The system allows users to interact with lots of different elements that are vital for multimedia shows and applications. One integral part is content: it can be anything from video to PowerPoint, Photoshop files, audio files, input and output streams, and so on. The content can be organized in one or multiple canvases for structuring playout across multiple systems. For example, in a museum with multiple rooms, each room could be a separate canvas where all the displays in that room are composited together.
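
To picture the canvas idea, it helps to think of a compositing space that a set of displays, possibly driven by different systems, map into. The sketch below is a hypothetical data model for illustration, not Vertex's actual code:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model (not Vertex's actual code): one canvas per room,
// compositing the displays in that room, which may be driven by
// different systems.
record Display(string Name, string SystemId, int X, int Y, int Width, int Height);

record Canvas(string Name, List<Display> Displays)
{
    // The canvas is the shared compositing space its displays map into.
    // Assumes at least one display has been added.
    public (int Width, int Height) Bounds() => (
        Displays.Max(d => d.X + d.Width),
        Displays.Max(d => d.Y + d.Height));
}
```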

Everything in the application – and this was the biggest challenge during development – is multi-user, down to every property. Multiple people can work on any aspect of the project simultaneously.

On top of this foundation, Vertex features a large device library: projectors, switchers, tracking systems that can feed us data; and since it's bi-directional, we can send data out to control lighting devices… In fact, it's so generic that we don't care whether it's a coffee machine, a motor, a hydraulic pump, an SMS or an email – we can process it and do something with it.

Vertex also offers so-called control surfaces, so you can easily build your own web pages or control pages with buttons, sliders, and faders – this covers the use cases where people interact with the screens via touch devices or third-party controllers.

How is the Video SDK used?

MFormats is an integral part of our video playback engine and DirectX render engine – it's basically the lower-level part. MFormats helps us in several ways. First, we do video analysis on import: we look at the frame rates and analyze the frame times for each frame. This gives us the data we use during the decoding process, since we do our own decoding for some file formats, such as HAP or NotchLC. We've built our own file reader that uses fast, non-cached reading: we analyze a HAP file with MFormats when we import it, and then use our own reader to read it much faster. We also use this data for visualization – the metadata that users see.
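
ioversal's reader itself isn't public, but the "fast, non-cached reading" idea can be sketched in plain .NET. The snippet below is an illustration, not their implementation: it reads a file in large sequential chunks with a sequential-scan hint; a truly unbuffered reader (Win32 FILE_FLAG_NO_BUFFERING) would go a step further, at the cost of sector-aligned buffers and offsets:

```csharp
using System;
using System.IO;

// Hypothetical sketch of a fast sequential file reader, in the spirit of
// the custom HAP reader described above. FileOptions.SequentialScan hints
// the OS cache about the access pattern; fully non-cached I/O
// (FILE_FLAG_NO_BUFFERING) would bypass the cache entirely but requires
// sector-aligned buffers and offsets.
static class FastReader
{
    public static void ReadChunks(string path, Action<byte[], int> onChunk)
    {
        using var fs = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read,
            bufferSize: 1 << 16, FileOptions.SequentialScan);

        var chunk = new byte[1 << 20]; // 1 MiB per read keeps syscall count low
        int read;
        while ((read = fs.Read(chunk, 0, chunk.Length)) > 0)
            onChunk(chunk, read); // e.g. hand the bytes to a HAP frame parser
    }
}
```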

We’ve also built our own video player, with our own buffering and synchronization mechanism. We use the Medialooks framework to acquire the frames, and then feed them into our video player so that we can take care of the synchronization between multiple Vertex systems.
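
The synchronization mechanism itself is proprietary, but the buffering idea can be pictured as a timestamped frame queue drained against a shared show clock, so that several systems running off the same clock present the same frame at the same time. All names below are illustrative:

```csharp
using System.Collections.Concurrent;

// Hypothetical sketch: frames acquired from the SDK are queued with their
// presentation timestamps; the player releases a frame only when the shared
// show clock reaches that timestamp, keeping multiple systems in step.
sealed class FrameBuffer<TFrame>
{
    readonly ConcurrentQueue<(double TimeSec, TFrame Frame)> _queue = new();

    public void Push(double timeSec, TFrame frame) => _queue.Enqueue((timeSec, frame));

    // Returns the next frame whose presentation time is due on the clock.
    public bool TryPop(double clockSec, out TFrame frame)
    {
        if (_queue.TryPeek(out var head) && head.TimeSec <= clockSec)
        {
            _queue.TryDequeue(out head);
            frame = head.Frame;
            return true;
        }
        frame = default!;
        return false;
    }
}
```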

Next, there is video proxy generation. For every video that we import, we always encode a low-res proxy version. This is done by MFormats in the background, usually as H.264 or HAP Alpha. It helps a lot when our clients work with really large canvases and high-res content (8K).

Another thing that we offer in our product is video transcoding: the option to transcode files into your desired format, for example from ProRes into HAP. This is based on MFormats and directly accessible to our clients in our application, which is great.
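
Both the proxy pass and the transcoder boil down to the same read-decode-encode job running off the main thread. Here is a hypothetical sketch of how such background jobs could be queued; the ITranscoder interface and the property strings are invented for illustration, not MFormats API:

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

// ITranscoder stands in for the MFormats-based read-decode-encode loop;
// the property strings below are illustrative, not actual SDK syntax.
interface ITranscoder
{
    void Transcode(string inputPath, string outputPath, string encoderProps);
}

sealed class TranscodeQueue
{
    readonly BlockingCollection<(string In, string Out, string Props)> _jobs = new();

    public TranscodeQueue(ITranscoder transcoder)
    {
        // A single background worker: imports stay responsive while
        // proxies and transcodes render one after another.
        Task.Run(() =>
        {
            foreach (var (src, dst, props) in _jobs.GetConsumingEnumerable())
                transcoder.Transcode(src, dst, props);
        });
    }

    // Low-res proxy for every imported clip.
    public void EnqueueProxy(string source) =>
        _jobs.Add((source, source + ".proxy.mov", "codec=h264;height=540"));

    // User-requested transcode, e.g. ProRes -> HAP.
    public void EnqueueTranscode(string source, string target, string props) =>
        _jobs.Add((source, target, props));
}
```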

Since MFormats supports it, we also offer video encryption for MOV files where it's required. We use the standard MLB encryption for this.

MFormats is also used for live input, to acquire video content from various input sources. If we were to do the whole thing ourselves, we would have to integrate the Blackmagic drivers and deal with the APIs of DELTACAST, AJA, and other vendors – that would be extra workload and time.
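
The value here is that one abstraction replaces a pile of vendor SDKs. A hypothetical sketch of the seam that MFormats fills for them (the interface and factory below are ours for illustration, not the SDK's API):

```csharp
using System;

// Hypothetical illustration: one generic live-source abstraction instead of
// per-vendor integrations (Blackmagic, DELTACAST, AJA, ...).
interface ILiveSource : IDisposable
{
    string DeviceName { get; }
    bool TryGetFrame(out byte[] bgraPixels, out double timestampSec);
}

static class LiveSources
{
    // Without a unifying SDK, this factory would hide a separate vendor
    // SDK to learn, wrap, and keep up to date for each device family.
    public static ILiveSource Open(string deviceName) =>
        throw new NotImplementedException(
            "backed by the capture SDK; one code path for every vendor");
}
```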

Finally, we use it for streaming – such as via NDI or SRT – as well as for file rendering. That was a little tricky to set up, since we needed to get the GPU textures back and convert them into MFormats frames in order to send them as a stream. I know it's not a common use case, but with the Medialooks support team we managed to get it working, and it works great.
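
The path Jan describes – GPU texture readback, wrap the pixels as a frame, push to a network stream – might be sketched like this. Every type here is hypothetical, and the actual MFormats calls for wrapping memory as frames are deliberately not shown:

```csharp
// Hypothetical sketch of a render-to-stream path: read the composited GPU
// texture back to CPU memory, then hand it to a stream sink (e.g. NDI/SRT).
interface IGpuReadback { byte[] ReadBgra(out int width, out int height); }
interface IStreamSink  { void SendFrame(byte[] bgra, int width, int height, double timeSec); }

sealed class RenderStreamer
{
    readonly IGpuReadback _gpu;
    readonly IStreamSink _sink;
    double _time;

    public RenderStreamer(IGpuReadback gpu, IStreamSink sink) { _gpu = gpu; _sink = sink; }

    public void PushRenderedFrame(double frameDurationSec)
    {
        var pixels = _gpu.ReadBgra(out int w, out int h); // staging-texture copy
        _sink.SendFrame(pixels, w, h, _time);             // hand off to the stream
        _time += frameDurationSec;
    }
}
```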

So, those are the seven ways we use MFormats for video acquisition and processing. On the audio side, we import any kind of audio format via MFormats and transcode it during import into uncompressed WAV files – so we also use the audio conversion features of MFormats.
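
The "decode anything, store uncompressed WAV" step is easy to picture: once the SDK has delivered decoded samples, writing canonical 16-bit PCM is a few lines. A minimal, self-contained sketch, assuming interleaved float samples:

```csharp
using System;
using System.IO;

// Minimal sketch: decoded samples (floats, from whatever source format the
// SDK handled) written out as a canonical 16-bit PCM WAV file.
static class WavWriter
{
    public static void Write(string path, float[] samples, int sampleRate, short channels)
    {
        using var bw = new BinaryWriter(File.Create(path));
        int dataBytes = samples.Length * 2;

        bw.Write("RIFF"u8); bw.Write(36 + dataBytes); bw.Write("WAVE"u8);
        bw.Write("fmt "u8); bw.Write(16); bw.Write((short)1);      // PCM
        bw.Write(channels); bw.Write(sampleRate);
        bw.Write(sampleRate * channels * 2);                       // byte rate
        bw.Write((short)(channels * 2)); bw.Write((short)16);      // block align, bits
        bw.Write("data"u8); bw.Write(dataBytes);

        foreach (var s in samples)                                 // clamp and quantize
            bw.Write((short)(Math.Clamp(s, -1f, 1f) * short.MaxValue));
    }
}
```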

What did our product give you? Development speed?

Yes, hell yes! I can tell you: thank you, thank you for that. Without MFormats, Vertex wouldn't be here. It was a timesaver.

At ioversal we decided to develop everything in .NET. There are a few plugins that are written in C++, but most of the application is in C#, and that's also where MFormats was very well suited to our needs.

It also allowed us to offer a purely software-based solution and put the choice of hardware into the hands of the end clients, or the integrator. We also didn’t have to look at every input driver or streaming protocol.

Did you consider any alternatives to MFormats?

To be honest, I searched a little bit, but I didn't find a good alternative. The only thing that came to mind at some point was creating our own FFmpeg wrapper, and there are a couple of open-source projects around for that. But the response time of your support team and your sophisticated responses made it more than worth it, because it's good to keep this expertise separate – you've been doing this for such a long time.

Light show on the walls of the hall
© OASIS immersion

What features or aspects of the product do you like best?

I would say it's not a specific feature, it's more the overall character: the setup, the acquisition, and the handling of frames are, honestly, very easy. It doesn't require a lot of code to implement, and most of the API is very slim. I think that's what I like.

And, speaking of simple features, some developers might hate it, but the way parameters and properties are sometimes passed as strings makes it very easy. There has probably been a controversial discussion in your team about whether it should be based on lots of enums and different sets of classes, but I find the string approach efficient: if you put in the right string, it does what you want.
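
The pattern Jan is describing – one generic string-keyed setter instead of a large typed API surface – looks roughly like this. The class and property names are invented for illustration, not actual MFormats keys:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of string-based configuration: one generic
// setter/getter pair instead of a typed method per parameter.
sealed class PropertyBag
{
    readonly Dictionary<string, string> _props = new(StringComparer.OrdinalIgnoreCase);

    public void PropsSet(string name, string value) => _props[name] = value;
    public string PropsGet(string name) => _props.TryGetValue(name, out var v) ? v : "";
}

// Usage: reader.PropsSet("loop", "true");
```

The trade-off is the one Jan hints at: no compile-time checking of names and values, but new parameters can be added without changing or breaking the API.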

Also, the value that we get from MFormats is the fact that it knows all the different file containers, and it gives us access to all the frames, and we can leverage this data and use it in our framework for multi-system synchronization.

How would you explain MFormats to a friend?

I would explain it as a framework that makes it easy to acquire video frames and audio data, then transform, play out, display, or even stream them. It's an audio and video framework for easily acquiring and manipulating video and audio – that's probably the simplest way I could put it.
