Rather than relying upon definitions of facial features that a plastic surgeon might agree with, we take our inspiration from David Martin’s dissertation work [84], which relied on the natural intuitions of very lightly trained students to divide photographic scenes into their constituent objects. Although his annotators were not asked to do so, he found that a subset of them wanted to mark more than just the boundaries of any humans appearing in the images. In particular, they wanted to break down the faces into their constituent parts. For our annotation effort, we encourage and rely on this tendency, asking our annotators to lean heavily on their own intuition about how to break down a face into substructures¹.
Since annotating the images of 303 subjects with many anatomical markers is a large, tedious job, we realized in the very early planning stages that the task would need to be divided across a group of annotators if it was to be completed in a reasonable amount of time. This practical necessity gave us the advantage of being able to obtain a consensus opinion on how to define each facial feature, rather than relying on the opinion of any single person. It is, of course, also a disadvantage in that there is no guarantee that such a consensus exists. To mitigate this risk, we involved a subset of the annotation staff at each stage of the feature-definition process. We describe this process in the remainder of this section.
Before any annotation could begin, a single image needed to be selected from the images we had collected for each subject. For this primary annotation round, we used only pose A as depicted in figure 2.16, but there could still be up to 24 such images, depending on how many sessions each subject participated in. Images were selected primarily according to the quality of their pose with respect to the alignment markers, as described in section 2.3.3, and additionally according to the sharpness of the image. A few annotators were hired earlier than the others in order to complete this task. While they were going through this selection process, they were asked to also take a look at the most recognizable facial features: the eyes, the nose, the mouth, and the shape of the head. Having consulted several anatomical references and other efforts toward measuring facial structures [25, 41, 43, 70, 112, 135], together with the annotators, we considered various anatomical boundaries connected to each of these core features. We evaluated the difficulty of definitively identifying these structures across the entire set of images. For instance, if one is trying to find the boundaries of an eye, one can consider looking for one of several structures: the inner or outer limits of the eyelid, the outermost edges of the orbit (the eye socket), or the orbicularis oculi (the muscles that surround the eye socket and enable the motion of the eyelids). All of these structures are visible and we could have chosen to mark the positions of any or all of them. We chose to mark only the inner edge of the eyelid for this round of annotation simply because it is the easiest boundary to identify reliably: the contrast between the bright white of the sclera and the surrounding eyelids makes this boundary particularly easy to define.
¹ The reason behind this seemingly instinctive desire to parse faces is interesting in its own right, but studying this behavior for its own sake is well beyond the scope of this dissertation. We speculate that we are tapping into the mental processes that facilitate human recognition of faces, and there is doubtless extensive relevant research in Cognitive Science on the subject.
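The image-selection criteria described above (pose quality first, sharpness second) were applied by hand by the annotators. Purely as an illustration, the same ordering can be written as a two-level ranking. This is a minimal sketch, not the procedure used in this work: the `Candidate` record, the `pose_error` score, and the use of Laplacian variance as a sharpness proxy are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one candidate image of a subject."""
    name: str          # hypothetical image identifier
    pose_error: float  # assumed score: smaller means better alignment
    pixels: list       # grayscale rows, e.g. [[0, 255, ...], ...]


def laplacian_variance(pixels):
    """Variance of a 4-neighbour Laplacian over the interior pixels.

    A common sharpness proxy: higher values suggest more fine detail.
    """
    height, width = len(pixels), len(pixels[0])
    values = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            values.append(
                pixels[y - 1][x] + pixels[y + 1][x]
                + pixels[y][x - 1] + pixels[y][x + 1]
                - 4 * pixels[y][x]
            )
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def select_image(candidates):
    """Pick the best pose; break ties by the sharpest image."""
    return min(
        candidates,
        key=lambda c: (c.pose_error, -laplacian_variance(c.pixels)),
    )
```

Because the sort key is a tuple, sharpness only matters among images whose pose scores are equal, which mirrors the "primarily ... and additionally" ordering in the text.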
Working with a prototype of the project annotation tool, VisageMap, described in detail in section 3.4.2, we then tried marking up each feature according to our working definitions, refining these definitions further to handle ambiguities or other problems we encountered. For instance, with the eyes, we decided to include anatomical structures inside the eyelid, such as the lacrimal caruncle (the fleshy part of the inner corner of the eye), the plica semilunaris, and other parts of the conjunctiva². We also settled on a final name for the feature, eye opening, to express precisely that we are interested in marking the visible opening between the eyelids.
In one case, we were not able to come up with a reasonable rule for defining part of a feature: the topmost point of the nose. Anatomically, there is a clear choice: the nasion, which is the intersection of the nasal bones and the frontal bone on the human skull. This is easy to find by touch: there is a very perceptible notch at this point. However, the soft tissues that cover this part of the skull typically make it completely invisible on a photograph. We therefore chose a completely arbitrary, but visible, point to define the top of the nose feature, and during analysis, we will simply treat this arbitrary point as having a large measurement error.
² The decision to include all of these structures was primarily to simplify identifying the boundaries of this feature to finding the edges of the eyelid, but this also has the advantage of including the complete inner corner of the eye, which very obviously has a lot of individual variation in its shape.
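One standard way to treat a landmark as having a large measurement error is to down-weight it by its inverse variance when comparing two annotations. The sketch below illustrates that idea only; it is not necessarily the analysis used in this work, and the function name and the per-landmark standard deviations (`sigmas`) are hypothetical.

```python
import math


def weighted_rms_distance(points_a, points_b, sigmas):
    """Inverse-variance weighted RMS distance between two landmark sets.

    points_a, points_b: sequences of (x, y) positions, in matching order.
    sigmas: assumed measurement standard deviation for each landmark.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for (ax, ay), (bx, by), sigma in zip(points_a, points_b, sigmas):
        weight = 1.0 / (sigma * sigma)  # noisy landmarks count for less
        weighted_sum += weight * ((ax - bx) ** 2 + (ay - by) ** 2)
        total_weight += weight
    return math.sqrt(weighted_sum / total_weight)
```

Assigning a large `sigma` to the top-of-nose point means that disagreement at that point contributes little to the overall distance, while well-defined landmarks dominate it.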
Additional refinements to the feature definitions were made during the early stages of the production work. All such changes were communicated to the annotation staff so that a group consensus was always in force. The final working definitions are recorded as part of the annotation template, described in more detail in section 3.4.2, which is itself copied into every completed annotation file as part of its structured metadata.
In addition to refining our definitions, we processed the raw-format images with Adobe Photoshop CS3 Extended. The processing steps were as follows (the precise parameters used for each image were recorded as part of the image metadata):
• The automatic exposure tool for camera raw was used and small hand adjustments were made to optimize facial information. Specifically, exposure was adjusted to reveal as much detail as possible in the shadowed part of the face (usually the eyes) without excessively blowing out the image highlights (usually on the tip of the nose).
• The white balance was adjusted based on the color reference card (when available) or the scale reference card.
• Convert the color profile to the working color space.
• Rotate the images to correct for left↔right head tilt. More precisely, we rotated the image so that the line bisecting the face was vertical. We used the ruler tool provided with CS3 Extended to make sure that the bisecting line was chosen correctly. This rotation of the head within the imaging plane was one pose factor that we could not reliably control when taking the photographs, so we elected to correct for it in post-processing.
The canvas size will have become slightly larger if the image was rotated.
• Convert to a JPEG with no embedded color profile³ at the 94% quality level.
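The geometry behind the rotation step, and the reason the canvas grows, can be sketched as follows. This is an illustrative calculation only, not a description of what Photoshop does internally: the function names, the choice of two points on the bisecting line, and the sign convention (image y axis pointing down) are assumptions made here.

```python
import math


def tilt_angle_degrees(top, bottom):
    """Angle between the face's bisecting line and the image vertical.

    top, bottom: (x, y) pixel positions of two points on the bisecting
    line, with y increasing downward. Returns 0 for an upright face.
    """
    dx = bottom[0] - top[0]
    dy = bottom[1] - top[1]
    return math.degrees(math.atan2(dx, dy))


def rotated_canvas_size(width, height, angle_degrees):
    """Bounding-box size of a width x height image rotated by an angle."""
    theta = math.radians(angle_degrees)
    c, s = abs(math.cos(theta)), abs(math.sin(theta))
    return (width * c + height * s, width * s + height * c)
```

For any small nonzero angle, both returned dimensions exceed the originals, since the corners of the rotated image extend past the original frame. Rotating the image by the opposite of the measured tilt brings the bisecting line back to vertical.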