======================================
Thoughts on PNG Gamma Handling
draft 0.1.2 (Sun 7 Jun 1998)
Adam M. Costello
with contributions from Dave Martindale
======================================


Revision history
================

0.1.1 (Sat 14 Mar 1998)

0.1.2 (Sun 7 Jun 1998)  Fixed non-substantive errors in wording.


Model 0
=======

This is the existing model from the PNG 1.0 spec, including its flaws.

    light_in                     file                          light_out
    --------> camera --> encoder ----> decoder --> LUT --> CRT --------->

    LUT = lookup table (of a frame buffer)
    CRT = cathode ray tube (a monitor)

The camera, encoder, decoder, LUT, and CRT each maps its input to its
output using a function that we approximate by a power function.  The
exponents are called camera_gamma, encoding_gamma, decoding_gamma,
LUT_gamma, and CRT_gamma, respectively.  CRT_gamma is almost always
2.5.  LUT_gamma is 1.0 on PCs, but not on Macs and SGIs.

The frame buffer contains the values output by the decoder, input to
the LUT.  The function mapping the frame buffer values to light_out is
called display_gamma, and we can see from the pipeline that:

    display_gamma = LUT_gamma * CRT_gamma

The function mapping light_in to the file samples has an exponent
called file_gamma, and we can see from the pipeline that:

    file_gamma = camera_gamma * encoding_gamma

For a modern (Rec 709) camera, camera_gamma is between 0.5 and 0.52.
For an old NTSC camera, camera_gamma is nominally 0.45.

The function mapping light_in to light_out has an exponent called
viewing_gamma, and we can see from the pipeline that:

    viewing_gamma = camera_gamma * encoding_gamma * decoding_gamma
                    * LUT_gamma * CRT_gamma
                  = file_gamma * decoding_gamma * display_gamma

Notice that viewing_gamma is not necessarily 1.0.  It is a function of
the monitor environment (1.0 for "bright", 1.5 for "dark", and 1.25
for "dim").

If the decoder knows about the LUT and the monitor and the monitor
environment, then it knows display_gamma and viewing_gamma.
(If you don't know the monitor environment, assume viewing_gamma is
1.0 or 1.25.)  Because file_gamma is stored in the file, the decoder
can deduce decoding_gamma:

    decoding_gamma = viewing_gamma / (file_gamma * display_gamma)

Actually, file_gamma, which is a floating point number, is not stored
directly in the gAMA chunk.  It is multiplied by 100000 and rounded to
the nearest integer.  The decoder divides the gAMA value by 100000 to
obtain file_gamma.

    gAMA_value = file_gamma * 100000

Therefore:

    decoding_gamma = (viewing_gamma * 100000) / (gAMA_value * display_gamma)


Model 0 problems
================

CRT_gamma might not be 2.5.  Sony says it's 2.2.

viewing_gamma is not a function of just the monitor environment--it is
a function of both the monitor environment and the camera environment.
Therefore, the decoder does not have enough information to deduce
decoding_gamma.  Even if we were to assume a standard camera
environment, the mapping from monitor environment to viewing_gamma is
imprecise, and the recommended default is imprecise.

Many images are generated without the use of a camera, making it
conceptually difficult to apply this model.  The model assumes that
the goal is to reproduce an original scene faithfully, but often the
goal is to achieve a desired displayed image, which might be a
deliberate distortion of an original scene, or there might not be an
original scene at all.

The decoder behavior is not predictable enough.  Our primary goal
should be to ensure that two observers looking at the same PNG file on
identical monitors in identical environments see the same thing,
regardless of which computer platform and which decoding software they
are using.


Backward compatibility
======================

When we change the model, we must strive to maintain backward
compatibility with model 0, because we made a promise with PNG
draft 9.
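Since judging compatibility requires knowing exactly what model 0
decoders compute, it may help to have the model 0 decoder arithmetic in
executable form.  This is only an illustrative sketch; the function
names and the PC example values are mine, not from any PNG
implementation:

```python
def model0_decoding_gamma(gAMA_value, display_gamma, viewing_gamma=1.0):
    # decoding_gamma = (viewing_gamma * 100000) / (gAMA_value * display_gamma)
    return (viewing_gamma * 100000.0) / (gAMA_value * display_gamma)

def decode_sample(sample, decoding_gamma, max_value=255):
    # The decoder itself is approximated by a power function on
    # samples scaled to [0,1].
    return ((sample / max_value) ** decoding_gamma) * max_value

# Example: a PC (LUT_gamma = 1.0, CRT_gamma = 2.5, so display_gamma = 2.5)
# showing a file with gAMA_value = 45000, assuming viewing_gamma = 1.0.
dg = model0_decoding_gamma(45000, display_gamma=1.0 * 2.5)
# dg is about 0.889 -- close to 1.0, i.e. nearly a pass-through.
```

Note how the imprecision described above shows up directly: changing
the assumed viewing_gamma from 1.0 to 1.25 changes every decoded
sample.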
Backward compatibility means simply that encoders and decoders written
to the new spec will behave the same as those written to the old spec,
given the same inputs.  This can be hard to judge because the old spec
is imprecise, so we don't know *exactly* how old encoders and decoders
behave.  Even if we did know the exact old behavior, it might be
impossible to fix all the problems and achieve full backward
compatibility, but we must try to come as close as we can.


Solution options
================

The CRT_gamma problem is separate from the rest.  There is a correct
value, and we just need to find out what it is.  Once we do, it will
have implications for backward compatibility.

I can think of four ways to address the decoder's lack of information
about the camera environment:

1) Allow the encoder to store additional information in the file about
   the camera environment.

2) Assume a standard camera environment.  We can choose one that
   causes decoders to behave the same as before, but unfortunately
   encoders will behave differently, because new encoders will
   compensate for nonstandard camera environments, while old encoders
   will not.

3) Base the gAMA value on the relation between the file samples and
   light_out, instead of the relation between light_in and the file
   samples.  Now the decoder doesn't need to know the camera
   environment, but now the encoder needs to assume a monitor
   environment.  This is the mirror image of the previous situation,
   and we are faced with an analogous choice:

   3a) The encoder stores information in the file about the monitor
       environment it assumed when choosing the gAMA value.

   3b) We agree on a single reference monitor environment, so it
       needn't be stored in the file.

4) Base the gAMA value on the relation between the file samples and a
   well-specified perceptual color space that is divorced from both
   light_in and light_out.

Options 1 and 2 do not address the problem of images without cameras.
Option 4 is beyond my capabilities.
Option 3 defines gAMA in terms of monitors, which I'm told are the
best understood color devices, so I'm going to explore that one
further.  Describing environments sounds hard, so I'll focus on
option 3b, in the hope that we can just point at someone else's
description.  If anyone would like to discuss the other options, we
should certainly do so.


Model 1
=======

Decoding
--------

    file                      light_out      light_eye          MI
    ----> decoder --> LUT --> CRT ---------> env -------------> HVS -->

    light_ref          light_eye_ref          MI
    ---------> env_ref -------------> HVS_ref -->

    LUT = lookup table (of a frame buffer)
    CRT = cathode ray tube (a monitor)
    HVS = human visual system
    MI  = mental image (the same one in both pipelines)

The decoder, LUT, and CRT each maps its input to its output using a
function that we approximate by a power function.  The exponents are
called decoding_gamma, LUT_gamma, and CRT_gamma, respectively.
CRT_gamma is almost always CRT_gamma_default.  LUT_gamma is 1.0 on
PCs, but not on Macs and SGIs.

The environment may cause the light emitted from the CRT phosphors not
to be proportional to the light entering the observer's eye.  For
example, ambient light reflects off the monitor screen (this is called
"flare").  The distortion function in the actual monitor environment
is denoted by env, while the function in the reference environment is
denoted by env_ref.  The corresponding exponents are called
CRT_env_phys and CRT_env_phys_ref.

The human visual system maps its input to its output using a function
that we don't pretend to understand, which varies with the
environment.  HVS denotes the function in the actual monitor
environment, while HVS_ref denotes the function in the reference
monitor environment.  Luckily for us, the function HVS --> HVS_ref^-1
can be approximated by a power function, whose exponent is called
CRT_env_percept_delta, which is obviously 1.0 when the actual
environment equals the reference environment.
The function mapping light_out to light_ref (env --> HVS -->
HVS_ref^-1 --> env_ref^-1) has an exponent called CRT_env_delta, which
is 1.0 when the actual environment equals the reference environment.
We can see from the pipeline that:

    CRT_env_delta = CRT_env_phys * CRT_env_percept_delta / CRT_env_phys_ref

The frame buffer contains the values output by the decoder, input to
the LUT.  The function mapping the frame buffer values to light_out is
called display_gamma, and we can see from the pipeline that:

    display_gamma = LUT_gamma * CRT_gamma

The function mapping light_ref to the file samples has an exponent
called sample_gamma, and we can see from the pipeline that:

    sample_gamma = 1 / (decoding_gamma * LUT_gamma * CRT_gamma * CRT_env_delta)
                 = 1 / (decoding_gamma * display_gamma * CRT_env_delta)

If the decoder knows about the LUT, the monitor, and the monitor
environment, then it knows display_gamma and CRT_env_delta.  (If you
don't know the monitor environment, assume CRT_env_delta_default.)
Because sample_gamma is stored in the file, the decoder can deduce
decoding_gamma:

    decoding_gamma = 1 / (sample_gamma * display_gamma * CRT_env_delta)

Actually, sample_gamma, which is a floating point number, is not
stored directly in the gAMA chunk.  It is multiplied by gAMA_one and
rounded to the nearest integer.  The decoder divides the gAMA value by
gAMA_one to obtain sample_gamma.  [gAMA_one is not a parameter; it's a
constant.  The spec would contain the particular number, not the
symbol "gAMA_one".]

    gAMA_value = sample_gamma * gAMA_one

Therefore:

    decoding_gamma = gAMA_one / (gAMA_value * display_gamma * CRT_env_delta)

If the decoder gets its knowledge of the monitor environment from the
user, it should set:

    CRT_env_delta = CRT_env_delta_default * user_extra_gamma

where user_extra_gamma is directly adjustable by the user.  A value of
1.0 for user_extra_gamma yields the same behavior as if the decoder
knew nothing about the monitor environment.
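The decoding arithmetic above, including the user_extra_gamma
adjustment, can be sketched as follows.  gAMA_one, CRT_gamma_default,
and CRT_env_delta_default are still open constants, so the values
below are placeholders, not proposals:

```python
GAMA_ONE = 100000              # placeholder; the spec would fix this constant
CRT_GAMMA_DEFAULT = 2.5        # placeholder; see "Next steps"
CRT_ENV_DELTA_DEFAULT = 1.0    # placeholder; see "Next steps"

def model1_decoding_gamma(gAMA_value, LUT_gamma=1.0,
                          CRT_gamma=CRT_GAMMA_DEFAULT,
                          user_extra_gamma=1.0):
    # decoding_gamma = gAMA_one / (gAMA_value * display_gamma * CRT_env_delta)
    display_gamma = LUT_gamma * CRT_gamma
    CRT_env_delta = CRT_ENV_DELTA_DEFAULT * user_extra_gamma
    return GAMA_ONE / (gAMA_value * display_gamma * CRT_env_delta)

# user_extra_gamma = 1.0 behaves as if nothing were known about the
# monitor environment; other values simply rescale CRT_env_delta.
```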
The decoder should offer the user hints about which values of
user_extra_gamma are appropriate for various environments.  For
example:

    user_extra_gamma = (pick one)

    <*> 1.0 for a well-lit indoor environment
    < > 1.2 for a dim environment
    < > 1.4 for a dark environment
    < > [___] enter your own value

Those numbers are for illustration purposes only--they're not
recommended.  The spec would need good numbers in the example.


Encoding
--------

An encoder should write whatever gAMA_value will cause decoders to
display the image "correctly", where "correct" means whatever the
observer is supposed to see--the goal may or may not be faithful
reproduction of an original scene.

For images that have been designed to be shown on (or approved as
looking correct on) a particular display in a particular environment,
the encoder simply uses the equation for sample_gamma, with
decoding_gamma = 1 and the other parameters set for the target display
and environment.

For images captured from cameras, when you know about the camera and
its environment, but not about the target display and environment, it
may be useful to think about the full pipeline:

    light_in                     file
    --------> camera --> encoder ---->

    file                      light_out         MI
    ----> decoder --> LUT,CRT ---------> env,HVS -->

    light_ref                  MI
    ---------> env_ref,HVS_ref -->

The camera and encoder each maps its input to its output using a
function that we approximate by a power function.  The exponents are
called camera_gamma and encoding_gamma, respectively.  For a Rec 709
camera, camera_gamma is about 0.52.
The function mapping light_in to light_ref has an exponent called
camera_env_delta, and we can see from the pipeline that:

    camera_env_delta = camera_gamma * encoding_gamma * decoding_gamma
                       * display_gamma * CRT_env_delta
                     = camera_gamma * encoding_gamma / sample_gamma

Therefore:

    gAMA_value = gAMA_one * camera_gamma * encoding_gamma / camera_env_delta

Remember that the whole point of the camera/CRT pipeline is to produce
the same mental image as if the observer were looking at the actual
scene:

    light_in           MI
    ---------> HVS_cam -->

    light_ref                  MI
    ---------> env_ref --> HVS_ref -->

HVS_cam denotes the human visual system in the camera environment.
There is no env_cam because the light entering the eye is the same
light entering the camera.  The exponent of the function
HVS_cam --> HVS_ref^-1 is called camera_env_percept_delta.  It is
analogous to CRT_env_percept_delta, so it can use the same map from
environment parameters to env_percept_delta values.  We can see from
the pipelines that:

    camera_env_delta = camera_env_percept_delta / CRT_env_phys_ref

If you don't know the camera environment, assume
camera_env_delta_default.

Notice that model 1 does not use the function from light_in to
light_out, because neither the encoder nor the decoder has enough
information to determine this function.  Its exponent was called
viewing_gamma in model 0, and we infer from the relations between
light_in, light_ref, and light_out that:

    viewing_gamma = camera_env_delta / CRT_env_delta

But remember that model 1 has no use for this exponent.


Alpha handling
==============

Alpha operations are supposed to be performed on values proportional
to light, but there are five different lights:

    light_in       Input to the camera, if there is one.
    light_out      Output from the monitor in the actual environment.
    light_ref      Output from the reference monitor in the reference
                   environment.
    light_eye      Input to the observer's eye in the actual
                   environment.
    light_eye_ref  Input to the reference observer's eye in the
                   reference environment.

I don't know which of these is most correct, but the numerical results
will be quite similar no matter which one you use.  The exponents of
the following functions may be useful:

    light_in -----> file_sample:    camera_gamma
    file_sample --> light_ref:      sample_gamma
    file_sample --> light_out:      decoding_gamma * display_gamma
    light_out ----> light_eye:      CRT_env_phys
    light_ref ----> light_eye_ref:  CRT_env_phys_ref

Not all of the lights are available to both encoder and decoder:

                   encoder   decoder
    light_in          X
    light_ref         X         X
    light_eye_ref     X         X
    light_out                   X
    light_eye                   X


Making model 1 backward compatible
==================================

We need to make sure that encoders and decoders behave the same under
models 0 and 1, so let's examine a few test cases.

Case 1: Encoder, Rec 709 camera, unknown camera environment
-----------------------------------------------------------

Model 0 behavior:  Write a gAMA_value between 50000 and 52000.  The
PNG spec states unequivocally that this is the correct behavior (near
the end of the gamma tutorial).

Model 1 behavior:

    gAMA_value = gAMA_one * 0.52 / camera_env_delta_default

Compatibility constraint:

    [96154,100000] = gAMA_one / camera_env_delta_default

Case 2: Encoder = paint program
-------------------------------

Model 0 behavior:  Hard to say, since model 0 is in terms of the
camera, and there is no camera.  The most sensible behavior I can
think of is to use the decoder spec, and write whatever gAMA_value
will result in correct display.

Model 1 behavior:  Write whatever gAMA_value will result in correct
display.

Compatibility constraint:  none if decoders are compatible (see
case 4)

Case 3: Encoder = raytracer
---------------------------

Model 0 behavior:  The raytracer works with virtual light, so it will
write gAMA_value = encoding_gamma * 100000.  However, the scene
designer will adjust the scene parameters until the image looks
"correct" on the display.
Although it is the file samples that have been adjusted rather than
the gAMA_value, the end result is that the gAMA_value is the one that
causes the file samples to be displayed correctly.

Model 1 behavior:  Write whatever gAMA_value will result in correct
display.

Compatibility constraint:  none if decoders are compatible (see
case 4)

Case 4: Decoder, PC display, unknown monitor environment
--------------------------------------------------------

Model 0 behavior:

    decoding_gamma = (viewing_gamma * 100000) / (gAMA_value * 2.5)

where viewing_gamma is "1.0 or 1.25".  In other words:

    decoding_gamma = [40000,50000] / gAMA_value

Model 1 behavior:

    decoding_gamma = gAMA_one / (gAMA_value * CRT_gamma_default
                                 * CRT_env_delta_default)

Compatibility constraint:

    [40000,50000] = gAMA_one / (CRT_gamma_default * CRT_env_delta_default)

Note: There is anecdotal evidence that some existing PNG encoders and
decoders expect decoding_gamma to be 1.0 when gAMA_value is near 45000
and the image is displayed on a PC in an unknown monitor environment.

Constraint summary
------------------

Including some additional constraints having nothing to do with
backward compatibility:

* CRT_gamma_default must be correct.

* [96154,100000] = gAMA_one / camera_env_delta_default

* [40000,50000] = gAMA_one / (CRT_gamma_default * CRT_env_delta_default)

  Existing practice suggests that the left side should be near 45000.

* CRT_env_delta_default should be based on a realistic typical monitor
  environment.  Its value depends on what we choose to be the
  reference viewing environment.  (Note that the reference environment
  and the default environment need not be the same.)

* camera_env_delta_default should be based on a realistic typical
  camera environment.  Its value depends on what we choose to be the
  reference viewing environment.

* We would like to point to an existing spec for the reference viewing
  environment.

* We would like gAMA_one to be 100000.
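Once candidate constants are on the table, the two interval
constraints are easy to test mechanically.  The values below are
hypothetical stand-ins purely to show the check; whether any
*realistic* defaults land inside both intervals is exactly the open
question:

```python
# Hypothetical candidate constants (NOT recommendations):
gAMA_one = 100000
CRT_gamma_default = 2.5
CRT_env_delta_default = 0.9
camera_env_delta_default = 1.0

camera_side = gAMA_one / camera_env_delta_default
decoder_side = gAMA_one / (CRT_gamma_default * CRT_env_delta_default)

ok_camera = 96154 <= camera_side <= 100000
ok_decoder = 40000 <= decoder_side <= 50000
near_45000 = abs(decoder_side - 45000) < 2500   # "near 45000" is informal

print(camera_side, decoder_side, ok_camera, ok_decoder, near_45000)
```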
Unless we get lucky, it may be impossible to satisfy all seven
constraints.  I personally think the least important is the desire for
gAMA_one to be 100000.  Model 0 software reads and writes file_gamma,
while model 1 software reads and writes sample_gamma, but in order to
interoperate, they need to use the same gAMA_value in the common cases
(it's impossible in general).  It should not be surprising that we may
need to use a backward compatibility fudge factor in order to pull
that off.

Programs that print the contents of PNG chunks should print both
gAMA_value (to avoid ambiguity) and sample_gamma (to be
user-friendly).


Next steps
==========

Gain confidence in a value for CRT_gamma_default.

Agree on a realistic range of default environments for monitors and
cameras.

Compile a list of candidate reference viewing environments.  Maybe a
very short list.  ;)

For each candidate, determine the realistic range of values of
CRT_env_delta_default and camera_env_delta_default.