======================================
Thoughts on PNG Gamma Handling
draft 0.1.2 (Sun 7 Jun 1998)
Adam M. Costello
with contributions from Dave Martindale
======================================


Revision history
================

0.1.1 (Sat 14 Mar 1998)

0.1.2 (Sun 7 Jun 1998)  Fixed non-substantive errors in wording.


Model 0
=======

This is the existing model from the PNG 1.0 spec, including its flaws.

    light_in                     file                          light_out
    --------> camera --> encoder ----> decoder --> LUT --> CRT --------->

    LUT = lookup table (of a frame buffer)
    CRT = cathode ray tube (a monitor)

The camera, encoder, decoder, LUT, and CRT each maps its input to its
output using a function that we approximate by a power function.  The
exponents are called camera_gamma, encoding_gamma, decoding_gamma,
LUT_gamma, and CRT_gamma, respectively.  CRT_gamma is almost always
2.5.  LUT_gamma is 1.0 on PCs, but not on Macs and SGIs.

The frame buffer contains the values output by the decoder, input to
the LUT.  The function mapping the frame buffer values to light_out is
called display_gamma, and we can see from the pipeline that:

    display_gamma = LUT_gamma * CRT_gamma

The function mapping light_in to the file samples has an exponent
called file_gamma, and we can see from the pipeline that:

    file_gamma = camera_gamma * encoding_gamma

For a modern (Rec 709) camera, camera_gamma is between 0.5 and 0.52.
For an old NTSC camera, camera_gamma is nominally 0.45.

The function mapping light_in to light_out has an exponent called
viewing_gamma, and we can see from the pipeline that:

    viewing_gamma = camera_gamma * encoding_gamma * decoding_gamma
                    * LUT_gamma * CRT_gamma
                  = file_gamma * decoding_gamma * display_gamma

Notice that viewing_gamma is not necessarily 1.0.  It is a function of
the monitor environment (1.0 for "bright", 1.5 for "dark", and 1.25
for "dim").

If the decoder knows about the LUT and the monitor and the monitor
environment, then it knows display_gamma and viewing_gamma.
(If you don't know the monitor environment, assume viewing_gamma is
1.0 or 1.25.)  Because file_gamma is stored in the file, the decoder
can deduce decoding_gamma:

    decoding_gamma = viewing_gamma / (file_gamma * display_gamma)

Actually, file_gamma, which is a floating point number, is not stored
directly in the gAMA chunk.  It is multiplied by 100000 and rounded to
the nearest integer.  The decoder divides the gAMA value by 100000 to
obtain file_gamma.

    gAMA_value = file_gamma * 100000

Therefore:

    decoding_gamma = (viewing_gamma * 100000) / (gAMA_value * display_gamma)


Model 0 problems
================

CRT_gamma might not be 2.5.  Sony says it's 2.2.

viewing_gamma is not a function of just the monitor environment--it is
a function of both the monitor environment and the camera environment.
Therefore, the decoder does not have enough information to deduce
decoding_gamma.  Even if we were to assume a standard camera
environment, the mapping from monitor environment to viewing_gamma is
imprecise, and the recommended default is imprecise.

Many images are generated without the use of a camera, making it
conceptually difficult to apply this model.  The model assumes that
the goal is to reproduce an original scene faithfully, but often the
goal is to achieve a desired displayed image, which might be a
deliberate distortion of an original scene, or there might not be an
original scene at all.

The decoder behavior is not predictable enough.  Our primary goal
should be to ensure that two observers looking at the same PNG file on
identical monitors in identical environments see the same thing,
regardless of which computer platform and which decoding software they
are using.


Backward compatibility
======================

When we change the model, we must strive to maintain backward
compatibility with model 0, because we made a promise with PNG
draft 9.
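Since judging compatibility requires knowing exactly what model 0
decoders compute, it may help to have the model 0 decoder arithmetic in
executable form.  This is only an illustrative sketch; the function
names and the PC example values are mine, not from any PNG
implementation:

```python
def model0_decoding_gamma(gAMA_value, display_gamma, viewing_gamma=1.0):
    # decoding_gamma = (viewing_gamma * 100000) / (gAMA_value * display_gamma)
    return (viewing_gamma * 100000.0) / (gAMA_value * display_gamma)

def decode_sample(sample, decoding_gamma, max_value=255):
    # The decoder itself is approximated by a power function on
    # samples scaled to [0,1].
    return ((sample / max_value) ** decoding_gamma) * max_value

# Example: a PC (LUT_gamma = 1.0, CRT_gamma = 2.5, so display_gamma = 2.5)
# showing a file with gAMA_value = 45000, assuming viewing_gamma = 1.0.
dg = model0_decoding_gamma(45000, display_gamma=1.0 * 2.5)
# dg is about 0.889 -- close to 1.0, i.e. nearly a pass-through.
```

Note how the imprecision described above shows up directly: changing
the assumed viewing_gamma from 1.0 to 1.25 changes every decoded
sample.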
Backward compatibility means simply that encoders and decoders written
to the new spec will behave the same as those written to the old spec,
given the same inputs.  This can be hard to judge because the old spec
is imprecise, so we don't know *exactly* how old encoders and decoders
behave.  Even if we did know the exact old behavior, it might be
impossible to fix all the problems and achieve full backward
compatibility, but we must try to come as close as we can.


Solution options
================

The CRT_gamma problem is separate from the rest.  There is a correct
value, and we just need to find out what it is.  Once we do, it will
have implications for backward compatibility.

I can think of four ways to address the decoder's lack of information
about the camera environment:

1) Allow the encoder to store additional information in the file about
   the camera environment.

2) Assume a standard camera environment.  We can choose one that
   causes decoders to behave the same as before, but unfortunately
   encoders will behave differently, because new encoders will
   compensate for nonstandard camera environments, while old encoders
   will not.

3) Base the gAMA value on the relation between the file samples and
   light_out, instead of the relation between light_in and the file
   samples.  Now the decoder doesn't need to know the camera
   environment, but now the encoder needs to assume a monitor
   environment.  This is the mirror image of the previous situation,
   and we are faced with an analogous choice:

   3a) The encoder stores information in the file about the monitor
       environment it assumed when choosing the gAMA value.

   3b) We agree on a single reference monitor environment, so it
       needn't be stored in the file.

4) Base the gAMA value on the relation between the file samples and a
   well-specified perceptual color space that is divorced from both
   light_in and light_out.

Options 1 and 2 do not address the problem of images without cameras.
Option 4 is beyond my capabilities.
Option 3 defines gAMA in terms of monitors, which I'm told are the
best understood color devices, so I'm going to explore that one
further.  Describing environments sounds hard, so I'll focus on
option 3b, in the hope that we can just point at someone else's
description.  If anyone would like to discuss the other options, we
should certainly do so.


Model 1
=======

Decoding
--------

    file                      light_out      light_eye          MI
    ----> decoder --> LUT --> CRT ---------> env -------------> HVS -->

    light_ref          light_eye_ref          MI
    ---------> env_ref -------------> HVS_ref -->

    LUT = lookup table (of a frame buffer)
    CRT = cathode ray tube (a monitor)
    HVS = human visual system
    MI  = mental image (the same one in both pipelines)

The decoder, LUT, and CRT each maps its input to its output using a
function that we approximate by a power function.  The exponents are
called decoding_gamma, LUT_gamma, and CRT_gamma, respectively.
CRT_gamma is almost always CRT_gamma_default.  LUT_gamma is 1.0 on
PCs, but not on Macs and SGIs.

The environment may cause the light emitted from the CRT phosphors not
to be proportional to the light entering the observer's eye.  For
example, ambient light reflects off the monitor screen (this is called
"flare").  The distortion function in the actual monitor environment
is denoted by env, while the function in the reference environment is
denoted by env_ref.  The corresponding exponents are called
CRT_env_phys and CRT_env_phys_ref.

The human visual system maps its input to its output using a function
that we don't pretend to understand, which varies with the
environment.  HVS denotes the function in the actual monitor
environment, while HVS_ref denotes the function in the reference
monitor environment.  Luckily for us, the function HVS --> HVS_ref^-1
can be approximated by a power function, whose exponent is called
CRT_env_percept_delta, which is obviously 1.0 when the actual
environment equals the reference environment.
The function mapping light_out to light_ref (env --> HVS -->
HVS_ref^-1 --> env_ref^-1) has an exponent called CRT_env_delta, which
is 1.0 when the actual environment equals the reference environment.
We can see from the pipeline that:

    CRT_env_delta = CRT_env_phys * CRT_env_percept_delta / CRT_env_phys_ref

The frame buffer contains the values output by the decoder, input to
the LUT.  The function mapping the frame buffer values to light_out is
called display_gamma, and we can see from the pipeline that:

    display_gamma = LUT_gamma * CRT_gamma

The function mapping light_ref to the file samples has an exponent
called sample_gamma, and we can see from the pipeline that:

    sample_gamma = 1 / (decoding_gamma * LUT_gamma * CRT_gamma * CRT_env_delta)
                 = 1 / (decoding_gamma * display_gamma * CRT_env_delta)

If the decoder knows about the LUT, the monitor, and the monitor
environment, then it knows display_gamma and CRT_env_delta.  (If you
don't know the monitor environment, assume CRT_env_delta_default.)
Because sample_gamma is stored in the file, the decoder can deduce
decoding_gamma:

    decoding_gamma = 1 / (sample_gamma * display_gamma * CRT_env_delta)

Actually, sample_gamma, which is a floating point number, is not
stored directly in the gAMA chunk.  It is multiplied by gAMA_one and
rounded to the nearest integer.  The decoder divides the gAMA value by
gAMA_one to obtain sample_gamma.  [gAMA_one is not a parameter; it's a
constant.  The spec would contain the particular number, not the
symbol "gAMA_one".]

    gAMA_value = sample_gamma * gAMA_one

Therefore:

    decoding_gamma = gAMA_one / (gAMA_value * display_gamma * CRT_env_delta)

If the decoder gets its knowledge of the monitor environment from the
user, it should set:

    CRT_env_delta = CRT_env_delta_default * user_extra_gamma

where user_extra_gamma is directly adjustable by the user.  A value of
1.0 for user_extra_gamma yields the same behavior as if the decoder
knew nothing about the monitor environment.
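The decoding arithmetic above, including the user_extra_gamma
adjustment, can be sketched as follows.  gAMA_one, CRT_gamma_default,
and CRT_env_delta_default are still open constants, so the values
below are placeholders, not proposals:

```python
GAMA_ONE = 100000              # placeholder; the spec would fix this constant
CRT_GAMMA_DEFAULT = 2.5        # placeholder; see "Next steps"
CRT_ENV_DELTA_DEFAULT = 1.0    # placeholder; see "Next steps"

def model1_decoding_gamma(gAMA_value, LUT_gamma=1.0,
                          CRT_gamma=CRT_GAMMA_DEFAULT,
                          user_extra_gamma=1.0):
    # decoding_gamma = gAMA_one / (gAMA_value * display_gamma * CRT_env_delta)
    display_gamma = LUT_gamma * CRT_gamma
    CRT_env_delta = CRT_ENV_DELTA_DEFAULT * user_extra_gamma
    return GAMA_ONE / (gAMA_value * display_gamma * CRT_env_delta)

# user_extra_gamma = 1.0 behaves as if nothing were known about the
# monitor environment; other values simply rescale CRT_env_delta.
```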
The decoder should offer the user hints about which values of
user_extra_gamma are appropriate for various environments.  For
example:

    user_extra_gamma = (pick one)

    <*> 1.0 for a well-lit indoor environment
    < > 1.2 for a dim environment
    < > 1.4 for a dark environment
    < > [___] enter your own value

Those numbers are for illustration purposes only--they're not
recommended.  The spec would need good numbers in the example.


Encoding
--------

An encoder should write whatever gAMA_value will cause decoders to
display the image "correctly", where "correct" means whatever the
observer is supposed to see--the goal may or may not be faithful
reproduction of an original scene.

For images that have been designed to be shown on (or approved as
looking correct on) a particular display in a particular environment,
the encoder simply uses the equation for sample_gamma, with
decoding_gamma = 1 and the other parameters set for the target display
and environment.

For images captured from cameras, when you know about the camera and
its environment, but not about the target display and environment, it
may be useful to think about the full pipeline:

    light_in                     file
    --------> camera --> encoder ---->

    file                      light_out         MI
    ----> decoder --> LUT,CRT ---------> env,HVS -->

    light_ref                  MI
    ---------> env_ref,HVS_ref -->

The camera and encoder each maps its input to its output using a
function that we approximate by a power function.  The exponents are
called camera_gamma and encoding_gamma, respectively.  For a Rec 709
camera, camera_gamma is about 0.52.
The function mapping light_in to light_ref has an exponent called
camera_env_delta, and we can see from the pipeline that:

    camera_env_delta = camera_gamma * encoding_gamma * decoding_gamma
                       * display_gamma * CRT_env_delta
                     = camera_gamma * encoding_gamma / sample_gamma

Therefore:

    gAMA_value = gAMA_one * camera_gamma * encoding_gamma / camera_env_delta

Remember that the whole point of the camera/CRT pipeline is to produce
the same mental image as if the observer were looking at the actual
scene:

    light_in           MI
    ---------> HVS_cam -->

    light_ref                  MI
    ---------> env_ref --> HVS_ref -->

HVS_cam denotes the human visual system in the camera environment.
There is no env_cam because the light entering the eye is the same
light entering the camera.  The exponent of the function
HVS_cam --> HVS_ref^-1 is called camera_env_percept_delta.  It is
analogous to CRT_env_percept_delta, so it can use the same map from
environment parameters to env_percept_delta values.  We can see from
the pipelines that:

    camera_env_delta = camera_env_percept_delta / CRT_env_phys_ref

If you don't know the camera environment, assume
camera_env_delta_default.

Notice that model 1 does not use the function from light_in to
light_out, because neither the encoder nor the decoder has enough
information to determine this function.  Its exponent was called
viewing_gamma in model 0, and we infer from the relations between
light_in, light_ref, and light_out that:

    viewing_gamma = camera_env_delta / CRT_env_delta

But remember that model 1 has no use for this exponent.


Alpha handling
==============

Alpha operations are supposed to be performed on values proportional
to light, but there are five different lights:

    light_in       Input to the camera, if there is one.
    light_out      Output from the monitor in the actual environment.
    light_ref      Output from the reference monitor in the reference
                   environment.
    light_eye      Input to the observer's eye in the actual
                   environment.
    light_eye_ref  Input to the reference observer's eye in the
                   reference environment.

I don't know which of these is most correct, but the numerical results
will be quite similar no matter which one you use.  The exponents of
the following functions may be useful:

    light_in -----> file_sample:    camera_gamma
    file_sample --> light_ref:      sample_gamma
    file_sample --> light_out:      decoding_gamma * display_gamma
    light_out ----> light_eye:      CRT_env_phys
    light_ref ----> light_eye_ref:  CRT_env_phys_ref

Not all of the lights are available to both encoder and decoder:

                   encoder   decoder
    light_in          X
    light_ref         X         X
    light_eye_ref     X         X
    light_out                   X
    light_eye                   X


Making model 1 backward compatible
==================================

We need to make sure that encoders and decoders behave the same under
models 0 and 1, so let's examine a few test cases.

Case 1: Encoder, Rec 709 camera, unknown camera environment
-----------------------------------------------------------

Model 0 behavior:  Write a gAMA_value between 50000 and 52000.  The
PNG spec states unequivocally that this is the correct behavior (near
the end of the gamma tutorial).

Model 1 behavior:

    gAMA_value = gAMA_one * 0.52 / camera_env_delta_default

Compatibility constraint:

    [96154,100000] = gAMA_one / camera_env_delta_default

Case 2: Encoder = paint program
-------------------------------

Model 0 behavior:  Hard to say, since model 0 is in terms of the
camera, and there is no camera.  The most sensible behavior I can
think of is to use the decoder spec, and write whatever gAMA_value
will result in correct display.

Model 1 behavior:  Write whatever gAMA_value will result in correct
display.

Compatibility constraint:  none if decoders are compatible (see
case 4)

Case 3: Encoder = raytracer
---------------------------

Model 0 behavior:  The raytracer works with virtual light, so it will
write gAMA_value = encoding_gamma * 100000.  However, the scene
designer will adjust the scene parameters until the image looks
"correct" on the display.
Although it is the file samples that have been adjusted rather than
the gAMA_value, the end result is that the gAMA_value is the one that
causes the file samples to be displayed correctly.

Model 1 behavior:  Write whatever gAMA_value will result in correct
display.

Compatibility constraint:  none if decoders are compatible (see
case 4)

Case 4: Decoder, PC display, unknown monitor environment
--------------------------------------------------------

Model 0 behavior:

    decoding_gamma = (viewing_gamma * 100000) / (gAMA_value * 2.5)

where viewing_gamma is "1.0 or 1.25".  In other words:

    decoding_gamma = [40000,50000] / gAMA_value

Model 1 behavior:

    decoding_gamma = gAMA_one / (gAMA_value * CRT_gamma_default
                                 * CRT_env_delta_default)

Compatibility constraint:

    [40000,50000] = gAMA_one / (CRT_gamma_default * CRT_env_delta_default)

Note: There is anecdotal evidence that some existing PNG encoders and
decoders expect decoding_gamma to be 1.0 when gAMA_value is near 45000
and the image is displayed on a PC in an unknown monitor environment.

Constraint summary
------------------

Including some additional constraints having nothing to do with
backward compatibility:

* CRT_gamma_default must be correct.

* [96154,100000] = gAMA_one / camera_env_delta_default

* [40000,50000] = gAMA_one / (CRT_gamma_default * CRT_env_delta_default)

  Existing practice suggests that the left side should be near 45000.

* CRT_env_delta_default should be based on a realistic typical monitor
  environment.  Its value depends on what we choose to be the
  reference viewing environment.  (Note that the reference environment
  and the default environment need not be the same.)

* camera_env_delta_default should be based on a realistic typical
  camera environment.  Its value depends on what we choose to be the
  reference viewing environment.

* We would like to point to an existing spec for the reference viewing
  environment.

* We would like gAMA_one to be 100000.
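Once candidate constants are on the table, the two interval
constraints are easy to test mechanically.  The values below are
hypothetical stand-ins purely to show the check; whether any
*realistic* defaults land inside both intervals is exactly the open
question:

```python
# Hypothetical candidate constants (NOT recommendations):
gAMA_one = 100000
CRT_gamma_default = 2.5
CRT_env_delta_default = 0.9
camera_env_delta_default = 1.0

camera_side = gAMA_one / camera_env_delta_default
decoder_side = gAMA_one / (CRT_gamma_default * CRT_env_delta_default)

ok_camera = 96154 <= camera_side <= 100000
ok_decoder = 40000 <= decoder_side <= 50000
near_45000 = abs(decoder_side - 45000) < 2500   # "near 45000" is informal

print(camera_side, decoder_side, ok_camera, ok_decoder, near_45000)
```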
Unless we get lucky, it may be impossible to satisfy all seven
constraints.  I personally think the least important is the desire for
gAMA_one to be 100000.  Model 0 software reads and writes file_gamma,
while model 1 software reads and writes sample_gamma, but in order to
interoperate, they need to use the same gAMA_value in the common cases
(it's impossible in general).  It should not be surprising that we may
need to use a backward compatibility fudge factor in order to pull
that off.

Programs that print the contents of PNG chunks should print both
gAMA_value (to avoid ambiguity) and sample_gamma (to be
user-friendly).


Next steps
==========

Gain confidence in a value for CRT_gamma_default.

Agree on a realistic range of default environments for monitors and
cameras.

Compile a list of candidate reference viewing environments.  Maybe a
very short list.  ;)

For each candidate, determine the realistic range of values of
CRT_env_delta_default and camera_env_delta_default.