2.6 Representing Color

2.6.1 Digital Image - Raster Graphics

A digital image is a rectangular data structure (a 2 or 3-dimensional array) of picture elements (pixels).

The color of a pixel is determined by a single code for achromatic images or multiple codes for chromatic images, commonly three.

Raster graphics represent color image data as a rectangular grid of code triplets.

The word code implies that a function encoded each pixel color, selecting the appropriate function depends on many factors and requires the introduction of concepts such as color encodings, image states, quantization or perceptual coding.

2.6.2 Quantization

Quantization is the process of mapping a continuous input signal (or sizeable discrete set of input values) to a smaller discrete set of output values (or codes). Quantization is ubiquitous in digital image processing and used to reduce images storage requirements through file size reduction or to lower bandwidth requirements when streaming images.

During quantization, the information between each quantizer step is discarded and irretrievably lost which makes it a lossy compression technique. The difference between the input value and the code value is known as the quantization error or signal distortion.

4-bit quantization of a continuous sinusoidal signal, there are 24==16 quantizer steps.

Quantization error decreases the signal-to-noise ratio (SNR) which in the context of digital imaging typically creates banding and contouring artifacts. They can be reduced by introducing a small amount of noise (? ? quantizer step) before the quantization, this is called dithering, and even though it also reduces the SNR, the noise introduced by dithering is usually more compelling than banding artifacts, although it may increase the required bandwidth for compressed deliverables.

2.6.3 Perceptual Coding

As Poynton (1998) explains, a color imaging system is perceptually uniform if a small perturbation of a component value is approximately equally perceptible across the range of that value.

Most electronic color imaging systems account for the non-linearity of the HVS and its logarithmic perceptual response to brightness when encoding RGB scene relative luminance values (linear-light values) into R’G’B’ perceptually uniform values using an OETF. The encoding process makes the reduction of bandwidth and the number of bits required per pixel possible by optimizing digital codes allocation as described in the next section.

Comparing 4 bit linear and perceptually uniform quantization of an image. Image copyright ? Disney 2018

Old cathode ray tube (CRTs) displays’ electron gun characteristics imposed an EOTF that is approximately the inverse of the HVS perception of brightness. The combination of HVS perceptual response to brightness with the CRT power function produces code values displayed in a perceptually uniform way. Non-linear coding was well understood by the engineers who designed the NTSC television system and was necessary to achieve good visual performance. Modern Standard Dynamic Range (SDR) display devices, e.g. Liquid Crystal Display (LCD), plasma, and Digital Light Processing (DLP), replicate this behavior by imposing a 2.2, 2.4 or 2.6 power function through signal processing circuitry. Their response to the input signal is often reasonably modeled using a gain-offset-gamma (GOG) approximation.

L = (gain ? V + offset)? where L is the normalized luminance emitted from the display, V is the normalized input device code value, and ? is the Gamma exponent and is discussed in details in section 2.5.5.

As described in section 2.2.6, an increment in Luminance is judged relative to the previous amount. The difference between a Luminance value L and a Luminance value L + ?L is noticeable for ?L >  0.01L. In other words, the just-noticeable-difference (JND) between two Luminance values is about 1

The 1.01 (101 / 100) ratio is known as the Weber contrast or Weber fraction. On a linear-light values scale from code 0 to code 255, code 100 is the location where the Weber contrast reaches 1An ideal non-linear transfer function allocates code values to minimize the just-noticeable difference. Advanced approaches, e.g. DICOM GSDF and PQ, account for the spatial frequency behavior of the HVS and leverage Barten (2003) Contrast Sensitivity Function (CSF) to model almost optimal functions.

A linear-light values scale showing various Weber contrast values. Poynton, C. (2012). Digital Video and HD, Second Edition: Algorithms and Interfaces (2nd ed.). Elsevier / Morgan Kaufmann. ISBN:978-0123919267

The contrast ratio is the ratio between the lowest luminance (reference black) and the highest luminance (reference white) that an electronic color imaging system is capable of producing. The artifact devoid contrast ratio is the ratio between the lowest luminance value that can be reproduced without artifacts (code 100 on a linear system) and the peak luminance of the system.

For 8 bit linear-light code values, the artifact devoid contrast ratio is only 2.55:1 (255 / 100). The "code 100" problem as Poynton (2003) terms it, can be addressed by using 12 bit coding (4095 code values) yielding an artifact devoid contrast ratio of 40.95:1, however, most of those codes cannot be visually discriminated.

On the other hand, using a logarithmic or power based perceptual coding, the required number of code values can be dramatically reduced, for example maintaining a 1.01 Weber contrast over scene relative luminance range of [0.01, 100] (contrast ratio of 100:1), requires approximately 462 codes (? 9 bits).

Perceptual coding is not required when using 16 bit integer (artifacts free contrast ratio of 655.35:1) or half float-representations (Weber contrast of 0.1

2.6.4 Floating-Point and Logarithmic Representations

Integer representations are not appropriate for storing linear high-dynamic range scene-referred imagery due to the typical luminance distribution in real-world scenes, even when they are middle-grey normalized. Floating-point systems gracefully address the precision issues associated with encoding scene-referred linear imagery. They represent numbers approximately to a fixed number of significant digits, i.e. the significand, scaled by an exponent in some fixed base, e.g. 2, 10 or 16.

The same unit can represent different orders of magnitude, for example, the temperature of the cosmic microwave background radiation is -270.45 ?C while the quasar 3C273 temperature has been estimated to 10 trillion ?C. Whereas with integer representations, the interval between two consecutive values is the same throughout the domain, with floating point representations the intervals increase as the exponent does. Fractional values can be represented using a negative exponent, and one bit is reserved to indicate the sign, allowing the coding of negative values as well. 4 ? 2-3 = 4 ? ? = ? 5 ? 2-3 = 5 ? ? = ? 6 ? 2-3 = 6 ? ? = ? ? 4 ? 22 = 16 5 ? 22 = 20 6 ? 22 = 24 ? 4 ? 28 = 1024 5 ? 28 = 1280 6 ? 28 = 1536 For games, where the balance between performance and memory use (or both) and quality often lies firmly towards performance, smaller floating point formats dropping the sign bit or having less precision (or both) are common. For more information on these, please refer to sections 3.4.3, and 3.5.3. For storage and performance reasons, it is common to encode high-dynamic range, scene-referred color spaces with integer representations. Digital motion picture cameras often record 10 or 12 bit integer media, e.g. ProRes, X-AVC, HDCAM SR or DPX files using an integer logarithmic encoding. It allows for most of the benefits of floating point representations, without actually requiring floating-point storage media. Being integer, logarithmic image data is also suitable for transporting over SDI. In logarithmic encodings, successive code values in the integer log space map to multiplicative values in linear space. Put more intuitively, this is analogous to converting each pixel to a representation of the number of stops above or below a reference level, and then storing an integer representation of this quantity.

As logarithmic images encode an extensive dynamic range, most mid-tone pixels reside in the central portion of the encoded scale. Thus, if you directly display a logarithmic image on a sRGB monitor, it exhibits low contrast.

Marcie, a famous reference image in theatrical exhibition and DI workflows, is shown as logarithmically encoded and exhibit low contrast. Image copyright ? Eastman Kodak Company.

2.6.5 Gamma

Gamma (?) is a numerical parameter giving the exponent of a power function assumed to approximate the relationship between a signal quantity (such as a video signal code) and light power.

Encoding gamma (?E), characteristic of OETFs uses an exponent approximately between 0.4 and 0.5 while decoding gamma (?D), characteristic of EOTFs uses an exponent approximately between 2.2 and 2.6.

The various gamma values are usually dependent on the color imaging systems and their defined viewing conditions. A color imaging system achieves representation of a scene in a way that matches viewer expectation of the appearance of that scene instead of attempting to reproduce physical color stimuli quantities. An outdoor sunlight scene can have luminance over 50,000 cd/m2 but may be displayed on a consumer electronic display with a white peak luminance of 320 cd/m2.

Object Luminance cd/m2 Relative Exposure Sun 1,600,000,000 23.9 Incandescent lamp (filament) 23,000,000 17.8 White paper in sunlight (Maximum value of PQ) 10,000 6.6 Blue Sky (Maximum value of HLG) 5000 5.6 Dolby Pulsar HDR reference monitor 4000

HDR reference monitor 1000

White paper in office lighting (500 lux) standard television reference monitor 100 0 preferred values for indoor lighting 50 - 500 -1.0 - 2.3 Digital Cinema Projector 48 -1.1 White paper in candlelight (5 lux) 1 -6.6 Night vision (rods in the retina) 0.01 -13.3

The different viewing conditions and image formation medium/device capabilities impose that scene luminance must be mapped to image formation medium/device luminance.

As described in 2.5.3 Color Appearance Phenomena, the Lateral-Brightness Adaptation Effect (by virtue of the surround change) and the Steven Effect (by virtue of the display device peak luminance depending on the viewing conditions) will affect the perceived image contrast and is critical to the faithful reproduction of images in the home or theater viewing conditions.

A simple but efficient way to compensate for the different surrounds, viewing conditions and display peak luminance when exhibiting pictures is to increase their contrast as the viewing conditions get darker. Typically, the surround brightness and the display peak luminance will be lower as well. Hunt (2004) reports that to overcome the loss in apparent contrast, the end-to-end system function or Opto-Optical Transfer Function (OOTF) for a digital color imaging system may have appropriate the, controversial, exponent values of 1, 1.25, and 1.5 respectively for bright, dim, and dark surrounds. The surround effect is likely overpredicted when those values are used in color appearance models to transform image lightness contrast based on the relative luminance of the surround. Daniels, Giorgianni and Fairchild (1997) report exponent values of 1, 1.06, and 1.16 as general guidelines. Gamma and its correction on a system, the combination of the different encoding and decoding gammas yields an end-to-end (OOTF) system gamma which increases perceived image contrast.

2.6.6 Color Encodings

A color encoding is a digital representation of colors for image processing, storage, and interchange between systems. Madden and Giorgianni (2007) define it as the numerical specification of color information.

A color encoding specification (CES) is a fully specified color encoding scheme and must define both the following components:

Color Encoding Method determining the meaning of the encoded data or what will be represented by the data. Color Encoding Data Metric characterizing the color space and the numerical units used to encode the data or how the representation will be numerically expressed. It may also include the specification for data file format, data compression method and other relevant attributes.

- The Academy Color Encoding System (ACES) used in this document to illustrate many color management concepts, implements various color encodings united into a unified paradigm supporting a wide range of input and output devices.