Video Transcoding Reference Architecture
The diagrams featured here provide different perspectives on the architecture of a video transcoding workflow:
Figure 1 describes the high-level components of a general video transcoding architecture without referring to particular technologies that may be used (with some exceptions).
Figure 2 presents a detailed version of this architecture with descriptions of how specific technologies are used to implement the components of the video transcoding workflow.
Figure 1 illustrates a media processing lifecycle that has two typical contributors: content creators and content distributors.
Content creators produce some type of multimedia content. This architecture assumes that this content is video. Typically, the original content is in a high-quality format so that it can be archived and transcoded to different formats in the future.
Content distributors take the original high-resolution content and convert it into formats suitable for online distribution.
The steps in Figure 1 are described as follows:
Before any new content is ingested from a creator, the content distribution team devises a strategy that identifies the platforms, audiences, and devices the content is ultimately distributed to. The team needs to understand:
- What intake formats they are expecting from content creators
- What destinations the content is published to
- What output formats are needed for those destinations. This includes ancillary content such as metadata and images.
For example, HTTP Live Streaming (HLS) was the original format specified to support video playback on Apple devices. Although HLS is now supported well beyond Apple devices, the choice of distribution formats still has to be weighed against factors such as budgets, audiences, devices to be supported, and geographic regions.
Those concepts are encapsulated in transcoding system configurations. These configurations are called workflow specifications and transcode profiles, and they are defined prior to any content ingestion. Argo Workflows is an example of a tool that supports DAG-based (Directed Acyclic Graph) workflows, and it can be used to define workflow specifications. An open-source tool called PyTranscoder can be used to create transcode profiles for FFmpeg.
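To make the idea of a DAG-based workflow specification concrete, the following is a minimal sketch in Python. It is not Argo's manifest format. The step names are hypothetical labels for the steps described in this document, and the standard-library `graphlib` module (Python 3.9 or later) stands in for a workflow engine's dependency resolution.

```python
from graphlib import TopologicalSorter

# Hypothetical step names. Each key maps to the set of steps it depends on.
WORKFLOW_SPEC = {
    "download": set(),
    "probe_source": {"download"},
    "transcode": {"probe_source"},
    "probe_outputs": {"transcode"},
    "thumbnails": {"transcode"},
    "upload": {"probe_outputs", "thumbnails"},
}


def execution_order(spec):
    """Return one valid order in which the steps can run.

    Raises graphlib.CycleError if the specification is not acyclic.
    """
    return list(TopologicalSorter(spec).static_order())


if __name__ == "__main__":
    print(execution_order(WORKFLOW_SPEC))
```

Because `probe_outputs` and `thumbnails` depend only on `transcode`, a workflow engine is free to run them in parallel. The sketch returns just one of the valid sequential orders.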
Content creators send content files into the media processing system. A way to upload content into a Media Ingest Location is provided to the content creator. This is typically a file system, FTP/SFTP endpoint, or cloud storage bucket.
When new files are added, a Media Ingest Event starts the workflow lifecycle. For example, there may be an application process that watches for new files and triggers the workflow. Or, the workflow may be started when a webhook notification is received from the Media Ingest Location.
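The file-watching variant of a Media Ingest Event could look roughly like the sketch below, assuming the ingest location is a local or mounted directory. The function names, the list of file suffixes, and the `start_workflow` callback are illustrative assumptions, not part of any specific product.

```python
import time
from pathlib import Path


def find_new_files(ingest_dir, seen, suffixes=(".mp4", ".mov", ".mxf")):
    """Return media files in ingest_dir that are not in `seen`, and record them."""
    new_files = []
    for path in sorted(Path(ingest_dir).iterdir()):
        is_media = path.is_file() and path.suffix.lower() in suffixes
        if is_media and path.name not in seen:
            seen.add(path.name)
            new_files.append(path)
    return new_files


def watch(ingest_dir, start_workflow, interval_s=5.0):
    """Poll the ingest directory forever and trigger a workflow per new file."""
    seen = set()
    while True:
        for path in find_new_files(ingest_dir, seen):
            start_workflow(path)  # hypothetical callback supplied by the caller
        time.sleep(interval_s)
```

A real watcher would also need a way to confirm that an upload has finished before triggering the workflow. Webhook notifications from the storage service avoid polling altogether.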
Because transcoding is CPU-intensive, and because the degree of parallel processing depends on the computing resources available, a media processing workflow typically queues workloads in a Media Ingest Queue. This queue is consumed as processing slots become available.
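The relationship between the queue and a fixed number of processing slots can be sketched with the Python standard library. This is an in-process illustration only. A production system would more likely use a durable queue or the workflow engine's own scheduling, and the `process` callback here is a hypothetical stand-in for a Media Ingest Worker.

```python
import queue
import threading


def run_ingest_queue(jobs, process, slots=2):
    """Process every job with at most `slots` running at once.

    Returns a dict mapping each job to its result, or to the exception
    raised while processing it.
    """
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    results = {}
    lock = threading.Lock()

    def worker():
        # Each worker thread is one processing slot.
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, so this slot shuts down
            try:
                outcome = process(job)
            except Exception as exc:  # record the failure and keep the slot alive
                outcome = exc
            with lock:
                results[job] = outcome

    threads = [threading.Thread(target=worker) for _ in range(slots)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

For example, `run_ingest_queue(["a.mp4", "b.mp4", "c.mp4"], transcode_file, slots=2)` would never run more than two transcodes at the same time, where `transcode_file` is whatever function performs the work.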
Media Ingest Workers represent the actual processing tasks that are performed on a given file in a workflow. Steps 5a-5e in Figure 1, described in the next five paragraphs, illustrate the typical sub-processes that are carried out on video files.
The video file is downloaded from an object storage location to the local compute instance where it is processed.
The MediaInfo tool is used to collect metadata about the source file.
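As a rough illustration, metadata collection can be done by calling the `mediainfo` command-line tool with JSON output and extracting a few fields. The sketch assumes the report has a `media.track` list whose entries carry an `@type` key and string-valued fields such as `Width`, `Height`, and `Duration`. The exact layout can vary between MediaInfo versions, so treat the field names as assumptions to verify against real output.

```python
import json
import subprocess


def run_mediainfo(path):
    """Run the mediainfo CLI on a file and return its JSON report as a dict."""
    completed = subprocess.run(
        ["mediainfo", "--Output=JSON", str(path)],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(completed.stdout)


def summarize_video(report):
    """Extract a few fields from the first video track, or None if there is none."""
    tracks = (report.get("media") or {}).get("track", [])
    video = next((t for t in tracks if t.get("@type") == "Video"), None)
    if video is None:
        return None
    return {
        "codec": video.get("Format"),
        "width": int(video["Width"]),
        "height": int(video["Height"]),
        "duration_s": float(video["Duration"]),
    }
```

Keeping the parsing separate from the subprocess call makes it possible to test the summary logic against saved reports without having MediaInfo installed.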
The source file is transcoded to the desired outputs. These outputs are specified by the transcoding parameters, which are based on the content distribution requirements.
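One way to picture how transcoding parameters drive the output is a small profile object that is turned into an FFmpeg command line. The `TranscodeProfile` class and its fields are hypothetical and are not PyTranscoder's profile format. The FFmpeg options used (`-vf scale`, `-c:v libx264`, `-b:v`, `-c:a aac`, `-b:a`, `-f hls`, `-hls_time`, `-hls_playlist_type`) are standard, but the chosen values are only examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscodeProfile:
    """Hypothetical description of one output rendition."""
    name: str
    height: int
    video_bitrate: str
    audio_bitrate: str


def ffmpeg_hls_command(source, profile, out_dir):
    """Build an argument list that transcodes `source` to one HLS rendition."""
    return [
        "ffmpeg", "-y", "-i", source,
        # Scale to the target height; -2 keeps the aspect ratio with an even width.
        "-vf", f"scale=-2:{profile.height}",
        "-c:v", "libx264", "-b:v", profile.video_bitrate,
        "-c:a", "aac", "-b:a", profile.audio_bitrate,
        # Write a video-on-demand HLS playlist with roughly 6-second segments.
        "-f", "hls", "-hls_time", "6", "-hls_playlist_type", "vod",
        f"{out_dir}/{profile.name}.m3u8",
    ]
```

The returned list can be passed to `subprocess.run(..., check=True)`. Building one command per profile lets a worker produce several renditions from the same source file.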
After the new transcoded outputs are created, metadata is gathered for them. This information is used to validate that the output formats match the desired specifications. This metadata can also be used to catalog the content in other systems of record.
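The validation step can be as simple as comparing the gathered metadata with the values the profile asked for. The sketch below assumes both sides are plain dictionaries with hypothetical keys such as `codec`, `height`, and `duration_s`, and it treats a small duration difference from the source as acceptable.

```python
def validate_output(expected, actual, source_duration_s, tolerance_s=0.5):
    """Return a list of mismatch descriptions; an empty list means the output passes."""
    problems = []

    # Fields that must match the specification exactly.
    for field, want in expected.items():
        got = actual.get(field)
        if got != want:
            problems.append(f"{field}: expected {want!r}, got {got!r}")

    # The output should be about as long as the source it was made from.
    drift = abs(actual.get("duration_s", 0.0) - source_duration_s)
    if drift > tolerance_s:
        problems.append(f"duration_s: differs from source by {drift:.3f}s")

    return problems
```

An empty result lets the workflow continue, while a non-empty one can fail the job and surface the mismatches to the operator.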
After transcoding to the required outputs, one or more thumbnail images may be generated. These typically accompany the content when it is published for consumption.
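Thumbnail generation could be sketched as picking evenly spaced timestamps and grabbing a single frame at each one with FFmpeg. The function names and the even-spacing rule are assumptions for illustration. The FFmpeg options `-ss` (seek) and `-frames:v 1` (write one video frame) are standard.

```python
def thumbnail_timestamps(duration_s, count):
    """Return `count` evenly spaced timestamps strictly inside the video."""
    if count < 1:
        raise ValueError("count must be at least 1")
    step = duration_s / (count + 1)
    return [round(step * i, 3) for i in range(1, count + 1)]


def ffmpeg_thumbnail_command(source, timestamp_s, out_path):
    """Build an argument list that saves one frame at `timestamp_s` as an image."""
    return [
        "ffmpeg", "-y",
        "-ss", str(timestamp_s),  # seek before decoding the input
        "-i", source,
        "-frames:v", "1",         # write exactly one video frame
        out_path,
    ]
```

Spacing the timestamps away from the start and end of the file avoids black leader or credit frames in many cases, though a real pipeline might choose frames by scene detection instead.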
The files are uploaded to the desired Media Destination Location (also sometimes referred to as the content origin).
After being prepared in the proper distribution formats, most content is delivered to end consumers through a Content Delivery Network (CDN). The CDN caches content in the geographic regions it is served to, which reduces latency and increases reliability of playback.
A system is set up to allow observability of the content workflows. Content distributors need to be able to handle errors, assess utilization, and make decisions based on the types of workloads their systems are processing. A dashboard or tool to view and collect this information helps diagnose these situations.
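As a small illustration of the kind of figures such a dashboard might show, the sketch below summarizes a batch of job records. The record shape (a `status` string and a `seconds` duration) is a hypothetical assumption, not the output of any particular monitoring tool.

```python
from collections import Counter


def summarize_jobs(jobs):
    """Summarize job records into counts, a failure rate, and an average duration."""
    jobs = list(jobs)
    statuses = Counter(job["status"] for job in jobs)
    succeeded = [job["seconds"] for job in jobs if job["status"] == "succeeded"]
    total = len(jobs)
    return {
        "total": total,
        "failed": statuses.get("failed", 0),
        "failure_rate": statuses.get("failed", 0) / total if total else 0.0,
        # Average only successful jobs so that fast failures do not skew the figure.
        "avg_seconds": sum(succeeded) / len(succeeded) if succeeded else None,
    }
```

A rising failure rate or average duration is the kind of signal that would prompt a closer look at error logs or at how much compute capacity the workers have.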
Figure 2 provides an overview of the tools and technologies overlaid onto the video-on-demand (VOD) workflow use case. This overview illustrates how the Akamai Connected Cloud and Linode Kubernetes Engine (LKE) are combined with an event-driven and highly scalable workflow management tool called Argo. Argo is an open-source, Kubernetes-native workflow engine that supports DAG and step-based workflows, and it is a member of the Cloud Native Computing Foundation (CNCF). This combination of technologies allows for a flexible, portable, and cost-effective media processing solution.