Speaker: Heinrich Fink
This thesis investigates GPU-based video processing in the context of a graphics system for live TV broadcasting. Upcoming TV standards such as UHD-1 result in much higher data rates than existing formats. Processing such data rates while meeting the real-time requirement of live TV poses a particular challenge for a software-based broadcast graphics system. To reach the required data rates, the software needs to process image data concurrently on the central processing unit (CPU) and the graphics processing unit (GPU) of the machine. In particular, transfers of image data between main memory and graphics memory need to overlap with CPU-based and GPU-based execution in order to maximize data throughput. In this thesis, we therefore investigate the following questions: Which methods are available to a software implementation to reach this level of parallelism? Which data rates can actually be reached using these methods?

To answer these questions, we implement a prototype of a software system for rendering TV graphics. To take advantage of the GPU's ability to process image data efficiently, we use the OpenGL application programming interface (API). We use advanced OpenGL programming methods to render high-quality video and to increase the degree of parallelism on the GPU. We implement the transcoding between RGB and the professional video format V210, which is more complex to process than conventional consumer-oriented image formats. In our software, we apply the pipeline programming pattern to distribute the stages of the video processing algorithm across different threads. As a result, those stages execute concurrently on different hardware units of the system. Our prototype exposes the applied degree of concurrency to the user as a collection of optimization settings. To evaluate these optimizations, we integrate a profiling mechanism directly into the execution of the pipeline. This allows us to create performance traces automatically while running our prototype with various test scenarios. The results of this thesis are based on the analysis of these traces.

Our prototype shows that the methods described in this thesis enable a software program to process high-resolution video in high quality. The results of our evaluations also show that there is no single best optimization setting for every GPU architecture. Different driver implementations and hardware features require our prototype to apply different optimization settings for each device. The ability of our software structure to change the degree of concurrency dynamically is therefore an important feature, and for broadcasting software that is expected to perform well on a range of hardware devices, it is ultimately essential.
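
To illustrate why V210 is more complex to process than typical consumer formats, the following is a minimal CPU-side sketch in Python of the V210 packing as it is commonly documented: six pixels of 10-bit 4:2:2 Y'CbCr are stored in four little-endian 32-bit words, and each row is padded to a multiple of 48 pixels (128 bytes). This is not the thesis's implementation, which performs the transcoding on the GPU via OpenGL. The function names `pack_v210_group`, `unpack_v210_group`, and `v210_row_bytes` are hypothetical and chosen for this example only.

```python
import struct

MAX_10BIT = 0x3FF  # largest value of a 10-bit component


def _word(a, b, c):
    """Place three 10-bit components in bits 0-9, 10-19, 20-29 of one word."""
    return a | (b << 10) | (c << 20)


def _parts(word):
    """Split one 32-bit word back into its three 10-bit components."""
    return word & MAX_10BIT, (word >> 10) & MAX_10BIT, (word >> 20) & MAX_10BIT


def pack_v210_group(y, cb, cr):
    """Pack 6 luma and 3+3 chroma samples (4:2:2) into 16 bytes."""
    if len(y) != 6 or len(cb) != 3 or len(cr) != 3:
        raise ValueError("expected 6 Y, 3 Cb and 3 Cr samples")
    if any(not 0 <= v <= MAX_10BIT for v in (*y, *cb, *cr)):
        raise ValueError("components must fit in 10 bits")
    words = (
        _word(cb[0], y[0], cr[0]),
        _word(y[1], cb[1], y[2]),
        _word(cr[1], y[3], cb[2]),
        _word(y[4], cr[2], y[5]),
    )
    return struct.pack("<4I", *words)


def unpack_v210_group(data):
    """Inverse of pack_v210_group: 16 bytes -> (y, cb, cr) sample lists."""
    w0, w1, w2, w3 = struct.unpack("<4I", data)
    cb0, y0, cr0 = _parts(w0)
    y1, cb1, y2 = _parts(w1)
    cr1, y3, cb2 = _parts(w2)
    y4, cr2, y5 = _parts(w3)
    return [y0, y1, y2, y3, y4, y5], [cb0, cb1, cb2], [cr0, cr1, cr2]


def v210_row_bytes(width):
    """Bytes per row: width rounded up to 48 pixels, 128 bytes per 48 pixels."""
    return ((width + 47) // 48) * 128
```

Under these assumptions, a 3840-pixel-wide row occupies `v210_row_bytes(3840)` = 10240 bytes, so one 3840x2160 frame is 10240 * 2160 = 22,118,400 bytes. Because samples straddle byte boundaries and the component order changes from word to word, a shader or CPU routine cannot treat V210 as a simple array of per-pixel values, which is the source of the extra complexity mentioned above.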