A Study on the Usefulness of DCT Scaling and SmartScale

Author: D. R. Commander

Disclaimer: My background is in high-performance computing and visualization, not image processing, so the information below is largely the product of my own understanding and experimentation, not the product of any official source on the topic (of which there are few). Thus, if I've gotten anything wrong, please correct me.

This report is the product of many hours of pro bono work, so if you have benefited from the information herein, please consider donating to our project.

Introduction

With version 7 of the Independent JPEG Group's software, new "scaled" versions of the forward DCT algorithms were introduced, thus allowing an image to be scaled by a factor of 8/N, N={1...16} while compressing. These algorithms accomplish this by operating over multiple blocks but outputting only one block. For instance, for a DCT block size of 8x8 (baseline JPEG), the DCT could operate over 16x16 pixels to effectively scale the image by 1/2 horizontally and vertically. The most obvious use for this feature is as a means of generating a smaller version of a large image for the purposes of transmission or previewing. However, since DCT scaling cannot produce a scaled image of an arbitrary size, one is left to wonder whether it is really more useful in this context than scaling in image space prior to compression.
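The arithmetic behind the 8/N scaling factors can be sketched as follows. This is a minimal Python illustration, not libjpeg code: the function name `scaled_dimensions` and the 1920x1080 example size are hypothetical, and rounding partial pixels up is an assumption made for the sketch.

```python
from fractions import Fraction
from math import ceil


def scaled_dimensions(width, height, n, block_size=8):
    """Return (scale, new_width, new_height) for a DCT scale factor of block_size/n.

    Rounding fractional pixel counts up is an assumption of this sketch.
    """
    if not 1 <= n <= 16:
        raise ValueError("n must be in the range 1..16")
    scale = Fraction(block_size, n)
    return scale, ceil(width * scale), ceil(height * scale)


if __name__ == "__main__":
    # Hypothetical 1920x1080 source image.
    for n in range(1, 17):
        scale, w, h = scaled_dimensions(1920, 1080, n)
        print(f"N={n:2d}  scale={str(scale):>5} ({float(scale):.2f})  ->  {w}x{h}")
```

The listing makes the limitation discussed above concrete: only sixteen output sizes are reachable for a given source image, so an arbitrary target size generally cannot be hit exactly.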

The second use for DCT scaling is as a means of data reduction. In this context, DCT downscaling is an intermediate step that, when combined with IDCT upscaling during decompression, will produce an image of the same resolution as the original source. Understanding the motivation behind this requires a brief discussion of how data is typically reduced in a JPEG image. Decreasing the JPEG quality increases the amount of quantization, which causes more and more high frequencies in each DCT block (a DCT block is 8x8 in baseline JPEG) to be discarded. Recall that "frequency" in an image is the relative change in brightness from pixel to pixel, so when one says "high-frequency", one is referring to a sudden spatial change in brightness (in other words, a sharp feature or edge). Thus, the visual effect of discarding high frequencies is to increase the distance required to change from one color to another within a block, which can cause "blocking artifacts" around sharp features. Within each DCT block, if the highest frequencies are discarded, then that block can no longer represent sharp changes in color or brightness, so if a sharp feature passes through the block, the pixels in the block on either side of the feature will not be represented accurately. In the most extreme case (quality=1), a block can only represent a single color or a linear gradient between two colors.
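The effect of quantization on a sharp edge can be illustrated in one dimension. The following is a minimal Python sketch under stated assumptions, not the libjpeg implementation: it uses a textbook orthonormal 8-point DCT on a single row of pixels, and the step formula `1 + k * coarseness` is a made-up stand-in for a real JPEG quantization table (the names `quantize_roundtrip` and `coarseness` are hypothetical).

```python
from math import cos, pi, sqrt


def dct(samples):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = sqrt(1.0 / n) if k == 0 else sqrt(2.0 / n)
        out.append(scale * sum(x * cos(pi * (2 * i + 1) * k / (2 * n))
                               for i, x in enumerate(samples)))
    return out


def idct(coeffs):
    """Inverse of dct() (orthonormal 1-D DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = sqrt(1.0 / n) if k == 0 else sqrt(2.0 / n)
            total += scale * c * cos(pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


def quantize_roundtrip(samples, coarseness):
    """Quantize and dequantize the DCT coefficients of one row of pixels.

    The step size grows with the frequency index k, so higher frequencies
    are the first to round to zero as coarseness increases. The formula is
    an assumption for illustration, not a real JPEG quantization table.
    """
    steps = [1 + k * coarseness for k in range(len(samples))]
    quantized = [round(c / s) for c, s in zip(dct(samples), steps)]
    restored = idct([q * s for q, s in zip(quantized, steps)])
    return quantized, restored


if __name__ == "__main__":
    edge = [10, 10, 10, 10, 200, 200, 200, 200]  # a sharp edge mid-block
    for coarseness in (0, 20, 40, 100):
        quantized, restored = quantize_roundtrip(edge, coarseness)
        print(coarseness, quantized, [round(v) for v in restored])
```

As the coarseness grows, fewer coefficients survive and the restored row changes from one brightness level to the other over a longer distance, which is the one-dimensional analogue of the loss of sharpness described above.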

The principle behind DCT scaling is that, rather than discard frequency data after the DCT is performed, data can instead be discarded by the DCT algorithm itself. The idea is that there is a point at which increasing the level of quantization produces so many blocking artifacts that a better-quality image can be generated at the same compression ratio by reducing the number of pixels and increasing the JPEG quality. The visual effect of this is somewhat different from that of quantization. When you downscale and then upscale in the image domain, you end up with either a pixelated or a blurred version of the original image. IDCT upscaling, however, tends to preserve edges, but it produces "ghosting" or "ringing" artifacts around them. Additionally, when used on a JPEG image that was compressed with even a moderate amount of quantization, IDCT upscaling can generate sharp artifacts at the boundaries of blocks containing high-frequency features, and a large amount of noise can be generated within those blocks. It also goes without saying that a DCT-scaled image will not be as sharp as the original. However, the claim is that there exist some images for which balancing quantization with DCT scaling will produce either a better perceptual quality at the same compression ratio, or a better compression ratio at the same perceptual quality, than quantization alone.
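A one-dimensional sketch can make the downscale/upscale round trip more tangible. The Python below is an illustration under stated assumptions rather than the IJG's code: it treats DCT downscaling as transforming 16 samples and keeping only the 8 lowest-frequency coefficients, and IDCT upscaling as zero-padding those coefficients back to 16 before inverting. The function names `dct_downscale` and `idct_upscale` are hypothetical.

```python
from math import cos, pi, sqrt


def dct(samples):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = sqrt(1.0 / n) if k == 0 else sqrt(2.0 / n)
        out.append(scale * sum(x * cos(pi * (2 * i + 1) * k / (2 * n))
                               for i, x in enumerate(samples)))
    return out


def idct(coeffs):
    """Inverse of dct() (orthonormal 1-D DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = sqrt(1.0 / n) if k == 0 else sqrt(2.0 / n)
            total += scale * c * cos(pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


def dct_downscale(samples, keep):
    """Transform all samples but keep only the `keep` lowest frequencies.

    The sqrt(keep / n) gain rescales the coefficients so that they describe
    a block of `keep` pixels with the same average brightness.
    """
    gain = sqrt(keep / len(samples))
    return [gain * c for c in dct(samples)[:keep]]


def idct_upscale(coeffs, size):
    """Zero-pad the coefficients to `size` frequencies and invert the DCT."""
    gain = sqrt(size / len(coeffs))
    padded = [gain * c for c in coeffs] + [0.0] * (size - len(coeffs))
    return idct(padded)


if __name__ == "__main__":
    edge = [0] * 8 + [255] * 8              # 16 pixels with a sharp edge
    coeffs = dct_downscale(edge, 8)         # one 8-coefficient block
    print([round(v) for v in idct(coeffs)])             # 8-pixel (1/2 scale) row
    print([round(v) for v in idct_upscale(coeffs, 16)])  # restored 16-pixel row
```

In this sketch a flat row survives the round trip unchanged, whereas the row with an edge comes back with the same average brightness but with samples that deviate from the original, since the discarded high frequencies are exactly what a sharp edge needs.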

Upscaling an image during compression would not be of much use, since it would increase the JPEG file size with no increase in quality. Thus, with jpeg-7, the only really useful scaling factors were 8/N, N={9...16}, which placed the scaling factors in the narrow range of 0.5 (8/16) to 0.89 (8/9). Enter SmartScale. With jpeg-8, the IJG introduced a non-standard extension to the JPEG format that allowed for DCT block sizes other than 8x8. There were two purposes for this. The first was that it allowed DCT scaling by any factor of M/N, where both M and N were in the range of 1...16. This provided a wider range of useful scaling factors, from 0.06 (1/16) up to 0.94 (15/16). The second purpose was that adjusting the block size gave the user an additional quality vs. compression ratio knob. Reducing the block size increases the file size, but it also means that sharp features can be accurately represented at much lower quality levels. Thus, whereas low JPEG quality levels still effectively reduce the number of colors in the image, reducing the block size greatly reduces or, in the case of a block size of 1, eliminates the blocking artifacts due to quantization. When combined with DCT scaling, reducing the block size reduces the ringing and block edge artifacts, but the image becomes more pixelated. In the extreme case of a block size of 1, the ringing/block edge artifacts are eliminated altogether, but the result is identical to that of image-space scaling.
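The difference in available downscaling factors can be checked with a few lines of arithmetic. This Python sketch simply enumerates the fractions named above, taking the M/N description at face value; the function names are hypothetical, and nothing here calls into libjpeg.

```python
from fractions import Fraction


def jpeg7_downscale_factors():
    """Distinct factors 8/N with N in 9..16 (the useful jpeg-7 range)."""
    return sorted({Fraction(8, n) for n in range(9, 17)})


def smartscale_downscale_factors():
    """Distinct factors M/N below 1 with M and N both in 1..16."""
    return sorted({Fraction(m, n)
                   for m in range(1, 17)
                   for n in range(1, 17)
                   if m < n})


if __name__ == "__main__":
    for name, factors in (("jpeg-7", jpeg7_downscale_factors()),
                          ("SmartScale", smartscale_downscale_factors())):
        print(f"{name}: {len(factors)} distinct factors, "
              f"from {factors[0]} ({float(factors[0]):.2f}) "
              f"to {factors[-1]} ({float(factors[-1]):.2f})")
```

Using `Fraction` in a set removes duplicates such as 2/4 and 4/8, so the counts reflect genuinely different scaling factors rather than different M/N pairs.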

The SmartScale extension effectively creates a new image format that is not an industry standard or even (yet) an accepted community standard, and this author is unaware of any prominent software that has yet adopted it. This report is, in part, an attempt to review the efficacy of the SmartScale extension as both a data reduction mechanism and a quality enhancement mechanism, in hopes that the community can decide for itself whether the extension is worth supporting. This will largely determine whether or not it ever gets added to libjpeg-turbo.

A blog post on hardwarebug.org evaluated the effectiveness of DCT scaling for photographic content in a narrow quality range. While this post offered much useful insight, further study was warranted, as it did not actually take advantage of the SmartScale feature (it only used DCT scaling with a block size of 8), nor did it examine the effectiveness of DCT scaling and SmartScale on different image types and quality ranges.

Although scant information is available on the topic, one can ascertain from various online posts that the IJG intends DCT scaling to be used as a means of data reduction and SmartScale to be used as a means of quality improvement. In fact, they have even used SmartScale to implement a lossless JPEG format, which is basically just a SmartScale file with a block size of 1 and a JPEG colorspace of RGB rather than YCbCr. This article evaluates both the data reduction and quality improvement claims.