The Bayer Sensor Strategy
In this article, we take an inside look at how most cameras acquire their images: with something called a “Bayer array” sensor. This puts the approach into context and provides a foundation for future topics.
Digital sensors capture images using arrays of light-gathering wells or “photosites.” When the exposure starts, the photosites are uncovered to collect photons of light; when the exposure ends, each one is read as an electrical signal, quantized and stored as a value in a digital file. To assess color, photosites typically also use filters to ensure that only one color is stored at a time (see monochrome vs. color sensors). What makes Bayer sensors unique is how these color filters are arranged:
Bayer sensors use a simple strategy: capture a single color (red, green or blue) at each photosite, alternating the colors so that there are twice as many green photosites as there are of either of the other two colors. Values from these photosites are then intelligently combined to produce full color pixels using a process called “demosaicing” (also called “debayer” in REDCINE-X®).
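As a rough illustration of this strategy, the following Python sketch simulates a Bayer capture and then rebuilds full color pixels by simple neighbor averaging. The RGGB layout, the function names (bayer_color, mosaic, demosaic) and the averaging method are all assumptions made for this example; production demosaicing algorithms are typically far more sophisticated.

```python
# Assumed layout for illustration (RGGB): even rows alternate R, G;
# odd rows alternate G, B. Real sensors may start on a different color.
def bayer_color(row, col):
    """Return which color filter covers the photosite at (row, col)."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def mosaic(rgb):
    """Simulate a Bayer capture: keep only one color sample per photosite."""
    channel = {"R": 0, "G": 1, "B": 2}
    return [
        [pixel[channel[bayer_color(r, c)]] for c, pixel in enumerate(row)]
        for r, row in enumerate(rgb)
    ]


def demosaic(raw):
    """Rebuild (R, G, B) pixels from a mosaic that is at least 2x2.

    The color measured at a photosite is kept as-is. The two missing
    colors are the average of the matching samples among the (up to)
    eight surrounding photosites.
    """
    height, width = len(raw), len(raw[0])
    out = []
    for r in range(height):
        out_row = []
        for c in range(width):
            own = bayer_color(r, c)
            sums = {"R": 0.0, "G": 0.0, "B": 0.0}
            counts = {"R": 0, "G": 0, "B": 0}
            for rr in range(max(r - 1, 0), min(r + 2, height)):
                for cc in range(max(c - 1, 0), min(c + 2, width)):
                    color = bayer_color(rr, cc)
                    sums[color] += raw[rr][cc]
                    counts[color] += 1
            out_row.append(tuple(
                raw[r][c] if k == own else sums[k] / counts[k]
                for k in "RGB"
            ))
        out.append(out_row)
    return out
```

Feeding a uniformly colored test image through mosaic and then demosaic returns the original color at every pixel. With real scenes, the averaging step is where fine detail and edges become the hard part.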
Why this works so well, however, comes down to a deeper understanding of our visual system. The two key concepts are: (1) our eyes resolve brightness detail much more finely than color detail, and (2) green light contributes roughly twice as much to our perception of brightness as red and blue combined. Allocating more photosites to green therefore produces a far better looking image than if each color were allocated equally.
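To put rough numbers on the second point, the sketch below uses the Rec. 709 luma weights, one common way of estimating perceived brightness from red, green and blue values. The choice of Rec. 709 is an assumption made for illustration (this article does not name a standard), and other standards weight the colors somewhat differently.

```python
# Rec. 709 luma coefficients (assumed here for illustration only).
REC709_LUMA = {"R": 0.2126, "G": 0.7152, "B": 0.0722}


def luma(r, g, b, weights=REC709_LUMA):
    """Weighted sum approximating the perceived brightness of an RGB value."""
    return weights["R"] * r + weights["G"] * g + weights["B"] * b


green_share = REC709_LUMA["G"]
red_blue_share = REC709_LUMA["R"] + REC709_LUMA["B"]

# How much more green contributes than red and blue combined.
green_to_red_blue = green_share / red_blue_share
```

With these particular weights, green contributes about 2.5 times as much as red and blue combined. The exact figure depends on the standard chosen, but green dominates in each of the common ones.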
Note: those with a video encoding background may be tempted to apply the 4:2:2, 4:1:1, etc. categorizations to a Bayer sensor, but this terminology is intended for compression methodologies and final images, not the sensors themselves. A 4K Bayer sensor is capable of producing full 4K 4:4:4 RGB files, for example; 4:2:2 is what could be applied to this file afterwards.
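To make the distinction concrete, here is a minimal sketch of 4:2:2 subsampling applied to one row of already finished pixels. It assumes the pixels have been converted to (Y, Cb, Cr) triples; the function name subsample_422 and the pair-averaging approach are illustrative choices rather than any particular codec's method.

```python
def subsample_422(row):
    """Reduce one row of (Y, Cb, Cr) pixels to 4:2:2.

    Every luma (Y) value is kept, while each horizontal pair of pixels
    shares a single averaged (Cb, Cr) chroma sample.
    """
    luma = [y for y, _, _ in row]
    chroma = []
    for i in range(0, len(row), 2):
        pair = row[i:i + 2]  # the last pair may hold one pixel if the row length is odd
        cb = sum(p[1] for p in pair) / len(pair)
        cr = sum(p[2] for p in pair) / len(pair)
        chroma.append((cb, cr))
    return luma, chroma
```

The input here is a row of complete pixels, each with its own brightness and color values. That is why the notation describes what happens to a finished image and not how the sensor's photosites are laid out.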
A high-quality image requires a sensor that measures all of the following as accurately as possible: (i) how much light is received, (ii) the color of this light, and (iii) precisely where this light hits the sensor. Improving upon these measurements yields better dynamic range (or lower noise), color accuracy and resolution, respectively. The problem is that, everything else being equal, trying to improve any one property often comes at the expense of the others.
For example, using additional red and blue filters at the expense of green gives more color resolution (which is less perceptible), but less brightness resolution (which is more perceptible). Alternatively, photosites without color filters don’t discard any light or require demosaicing, but these only yield a monochrome image. An optimally designed sensor is therefore one that provides the best compromise for a given task.
However, the above photosite comparisons can also be misleading if taken in isolation. Ultimately, the technology and design behind a photosite are what matter most, particularly when comparing cameras across different generations. This is especially true when considering other properties such as noise and dynamic range.
Let’s take a look at the compromises made by the following alternative approaches:
1. Striped RGB. This groups three photosites into each pixel, similar to how the phosphors on older CRT televisions were arranged. The initial motivation was to be able to read these into RGB pixels directly, without the need for a demosaicing algorithm. However, without demosaicing, the actual resolution often ends up being less than a third of what the photosite count alone would lead one to believe. Furthermore, since each color is in a different position, these sensors are prone to rainbow artifacts when read into RGB values directly. Modern processor speeds have also made demosaicing much less time-consuming, which removes most of the original motivation.
2. Three-Chip RGB & Prism. These achieve the impressive feat of recording the precise location of each color of light without using color filters. However, this greatly increases costs by tripling the sensor area with three chips. To keep prices competitive, the size of each sensor is often reduced, but this ends up requiring non-standard lenses and reducing the light-gathering area. Using a prism to direct light also has the potential to introduce new image artifacts, since light may not refract as intended (depending on angle, path length and polarization).
3. Stacked RGB. While this might initially seem to be the ideal solution, since a single sensor records all three colors at each photosite, all current implementations have fallen far short of its theoretical benefits. A big reason is that these sensors have trouble distinguishing colors, since they work by assessing penetration into the photosite instead of using specially-designed color filters. As a result, these sensors require an internal saturation boost, which increases image noise (or requires detail-reducing noise reduction). Furthermore, the read-out rate of these sensors has yet to achieve standard video frame rates at normal broadcast resolutions.
In addition, although the last two examples might at first seem to capture a higher resolution than Bayer, this potential advantage is offset by the need for an anti-aliasing filter to achieve artifact-free video. In such cases, photosites are inefficiently allocated toward improving resolution, and could have been better utilized for other aspects of image capture, such as sensitivity or dynamic range.
Ultimately, the optimal strategy is the one that makes the most effective use of available photosites. Such a sensor would need to take into account how our eyes work by prioritizing brightness resolution over color resolution, while also capturing as high a resolution as possible without unnecessarily compromising dynamic range, noise or color accuracy. Thus far, Bayer sensors have been the best way of achieving these goals. They are capable of far surpassing the quality of film, have remained the dominant approach for over a decade, and are being continually improved.