Multimedia Security, Volume 1. William Puech

Читать онлайн.
Название Multimedia Security, Volume 1
Автор произведения William Puech
Жанр Зарубежная компьютерная литература
Серия
Издательство Зарубежная компьютерная литература
Год выпуска 0
isbn 9781119901792



Скачать книгу

Each of them only counts the photons of a certain wavelength. As a result, each pixel has a value relative to one color. By using filters of different colors on neighboring pixels, the missing colors can then be interpolated.

      Although others exist, almost all cameras use the same CFA: the Bayer array, which is illustrated in Figure 1.3. This matrix samples half the pixels in green, a quarter in red and the last quarter in blue. Sampling more pixels in green is justified by the human visual system, which is more sensitive to the color green.

      Unlike other steps in the formation of an image, a wide variety of algorithms are used to demosaic an image. The most simple demosaicing algorithm is bilinear interpolation: missing values are interpolated by averaging the most direct neighbors sampled in that channel. As the averaging is done regardless of the image gradient, this can cause visible artifacts when interpolated against a strong gradient, such as on image edges.

      Figure 1.3. The Bayer matrix is by far the most used for sampling colors in cameras

      Linear minimum mean-square error demosaicing (Getreuer 2011) suggests working not directly on the three color channels (red, green and blue), but on the pixelwise differences between the green channel and each of the other two channels separately. It interpolates this difference separately in the horizontal and vertical directions, in order to estimate first the green channel, followed by the differences between red and green, and then between blue and green. The red and blue channels can then be recovered by a simple subtraction. This method, as well as many others, makes the underlying assumption that the difference of color channels is smoother than the color channels themselves, and therefore easier to interpolate.

      More recently, convolutional neural networks have been proposed to demosaic an image. For instance, demosaicnet uses a convolutional neural network to jointly interpolate and denoise an image (Gharbi et al. 2016; Ehret and Facciolo 2019). Even if these methods offer superior results to algorithms without training, they also require more resources, and are therefore not widely used yet in digital cameras.

      The methods described here are only a brief overview of the large array of methods that exist for image demosaicing. This variety is increased by the fact that most industrial cameras do not disclose their often private demosaicing algorithm.

      White balance aims to adjust values obtained by the sensors, so that they match the colors perceived by the observer by adjusting the gain values of each channel. White balance adjusts the output using characteristics of the light source, so that achromatic objects in the real scene are rendered as such (Losson and Dinet 2012).

      For example, white balance can be achieved by multiplying the value of each channel, so that a pixel that has a maximum value in each channel is found to have the same maximum value 255 in all channels.

      Then, the image goes through what is known as gamma correction. The charge accumulated by the sensor is proportional to the number of photons incident on the device during the exposure time. However, human perception is not linear with the signal intensity (Fechner 1860). Therefore, the image is processed to accurately represent human vision by applying a concave function of the form

, where γ typically varies between 1.8 and 2.2. The idea behind this procedure is not only to enhance the contrast of the image but also to encode more precisely the information in the dark areas, which are too dark in the raw image.

      Nevertheless, commercial cameras generally do not apply this simple function, but rather a tone curve. Tone curves allow image intensities to be mapped according to precomputed tables that simulate the nonlinearity present in human vision.

      Figure 1.4. JPEG compression pipeline

      The stages of the JPEG compression algorithm, illustrated in Figure 1.4, are detailed below. The first stage of the JPEG encoding process consists of performing a color space transformation from RGB to YCBCR, where Y is the luminance component and CB and CR are the chrominance components of the blue difference and the red difference. Since the Human Visual System (HVS) is less sensitive to color changes than to changes in luminance, color components can be subsampled without affecting visual perception too much. The subsampling ratio generally applied is 4:2:0, which means that the horizontal and vertical resolution is reduced by a factor of 2. After the color subsampling, each channel is divided into blocks of 8 × 8 and each block is processed independently. The discrete cosine transform (DCT) is applied to each block and the coefficients are quantized.

      The JPEG quality factor Q, ranging between 1 and 100, corresponds to the rate of image compression. The lower this rate, the lighter the resulting file, but the more deteriorated the image. A quantization matrix linked to Q provides a factor for each component of the DCT blocks. It is during this quantization step that the greatest loss of information occurs, but it is also this step that allows the most space in memory to be saved. The coefficients corresponding to the high frequencies, whose variations the HVS struggles to distinguish, are the most quantized, sometimes going so far as to be entirely canceled.

      Finally, as in the example in Figure 1.5, the quantized blocks are encoded without loss to obtain a JPEG file. Each 8 × 8 block is zig-zagged and the coefficients are arranged as a vector in which the first components represent the low frequencies and the last ones represent the high frequencies.

      Lossless compression by RLE coding (range coding) then exploits the long series of zeros at the end of each vector due to the strong quantization of the high frequencies, and then a Huffman code allows for a final lossless compression of the data, to which a header is finally added to form the file.

      1.3.1. Non-parametric estimation of noise in images