We present a component-based framework for face detection and identification. The face detection and identification modules share the same hierarchical architecture. They both consist of two layers of classifiers: a layer with a set of component classifiers and a layer with a single combination classifier. The component classifiers independently detect/identify facial parts in the image. Their outputs are passed to the combination classifier, which performs the final detection/identification of the face. We describe an algorithm which automatically learns two separate sets of facial components for the detection and identification tasks. In experiments we compare the detection and identification systems to standard global approaches. The experimental results clearly show that our component-based approach is superior to global approaches.
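The two-layer architecture described in this abstract can be sketched as follows. This is a minimal illustration, assuming linear SVMs for both the component and the combination classifiers and random synthetic features; the abstract does not specify the classifiers or features, so all names, dimensions, and data here are hypothetical.

```python
# Minimal sketch of a two-layer component/combination classifier, assuming
# linear SVMs at both layers (an illustrative choice, not the paper's).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Hypothetical training data: 200 image windows, 3 facial components
# (e.g. eyes, nose, mouth), each described by a 16-dim feature vector.
n, n_components, d = 200, 3, 16
X = rng.normal(size=(n, n_components, d))
y = rng.integers(0, 2, size=n)     # 1 = face, 0 = non-face
X[y == 1] += 0.8                   # shift face windows so classes separate

# Layer 1: one component classifier per facial part, trained independently.
component_clfs = [LinearSVC().fit(X[:, c, :], y) for c in range(n_components)]

# Layer 2: the combination classifier sees only the component outputs
# (real-valued decision scores) and makes the final face/non-face decision.
scores = np.column_stack([clf.decision_function(X[:, c, :])
                          for c, clf in enumerate(component_clfs)])
combiner = LinearSVC().fit(scores, y)

train_acc = combiner.score(scores, y)
```

The key design point the abstract describes is that the combination classifier never sees raw pixels, only the component classifiers' outputs, which is what makes the two layers independent.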
This paper is motivated by an open problem around deep networks, namely, the apparent absence of over-fitting despite large over-parametrization which allows perfect fitting of the training data. In this paper, we analyze this phenomenon in the case of regression problems when each unit evaluates a periodic activation function. We argue that the minimal expected value of the square loss is inappropriate for measuring the generalization error in the approximation of compositional functions, if one is to take full advantage of the compositional structure. Instead, we measure the generalization error in the sense of maximum loss, and sometimes, as a pointwise error. We give estimates on exactly how many parameters ensure both zero training error as well as a good generalization error. We prove that a solution of a regularization problem is guaranteed to yield a good training error as well as a good generalization error, and estimate how much error to expect at which test data.
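The distinction the abstract draws between expected square loss and maximum loss can be seen in a small numeric example. The target function and approximant below are purely illustrative choices, not taken from the paper: an error confined to a narrow region barely registers in the mean square loss but dominates the max (sup-norm) loss.

```python
# Illustration: mean square loss can look good while maximum loss is poor.
import numpy as np

x = np.linspace(-1, 1, 1001)
f = np.cos(np.pi * x)                              # hypothetical target
g = np.cos(np.pi * x) + 0.5 * (np.abs(x) < 0.01)   # approximant with a narrow spike

mean_sq_loss = np.mean((f - g) ** 2)   # tiny: the error occupies ~1% of the domain
max_loss = np.max(np.abs(f - g))       # 0.5: the spike dominates the sup norm
```

Measuring generalization in the max-loss sense, as the paper does, rules out approximants like `g` that an expected-loss criterion would accept.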
We present a general example-based framework for detecting objects in static images by components. The technique is demonstrated by developing a system that locates people in cluttered scenes. The system is structured with four distinct example-based detectors that are trained to separately find the four components of the human body: the head, legs, left arm, and right arm. After ensuring that these components are present in the proper geometric configuration, a second example-based classifier combines the results of the component detectors to classify a pattern as either a "person" or a "nonperson." We call this type of hierarchical architecture, in which learning occurs at multiple stages, an adaptive combination of classifiers (ACC). We present results that show that this system performs significantly better than a similar full-body person detector. This suggests that the improvement in performance is due to the component-based approach and the ACC data classification architecture. The algorithm is also more robust than the full-body person detection method in that it is capable of locating partially occluded views of people and people whose body parts have little contrast with the background.
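The geometric-configuration check mentioned in this abstract can be sketched as a gating step before the combination classifier: a component detection only counts if it lies in a plausible sub-region of the candidate body window. The regions and detection interface below are hypothetical, chosen only to illustrate the idea; the paper's actual constraints are not given here.

```python
# Sketch of the geometric-configuration gate: each component detection must
# fall inside a plausible region of a normalized 1x1 body window. The region
# coordinates (x0, y0, x1, y1) are illustrative assumptions.
REGIONS = {
    "head":      (0.25, 0.00, 0.75, 0.30),
    "legs":      (0.20, 0.55, 0.80, 1.00),
    "left_arm":  (0.00, 0.20, 0.35, 0.70),
    "right_arm": (0.65, 0.20, 1.00, 0.70),
}

def in_region(center, region):
    x, y = center
    x0, y0, x1, y1 = region
    return x0 <= x <= x1 and y0 <= y <= y1

def geometric_ok(detections):
    """detections: component name -> (x, y) detection center, normalized."""
    return all(name in detections and in_region(detections[name], region)
               for name, region in REGIONS.items())

# A detection set in the proper configuration passes the gate...
person = {"head": (0.5, 0.15), "legs": (0.5, 0.8),
          "left_arm": (0.2, 0.45), "right_arm": (0.8, 0.45)}
# ...while a "head" detected where the legs should be does not.
scrambled = dict(person, head=(0.5, 0.8))
```

Only patterns passing this gate would then reach the second-stage ACC classifier described in the abstract.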
Theory I: Deep networks and the curse of dimensionality
T. Poggio; Q. Liao
Bulletin of the Polish Academy of Sciences. Technical sciences, 12/2018, Volume 66, No 6 (Special Section on Deep Learning: Theory and Practice)
Journal Article
Reanimating Faces in Images and Video
Blanz, V.; Basso, C.; Poggio, T. ...
Computer graphics forum, September 2003, Volume 22, Issue 3
Journal Article
Peer-reviewed
This paper presents a method for photo-realistic animation that can be applied to any face shown in a single image or a video. The technique does not require example data of the person's mouth movements, and the image to be animated is not restricted in pose or illumination. Video reanimation allows for head rotations and speech in the original sequence, but neither of these motions is required.
In order to animate novel faces, the system transfers mouth movements and expressions across individuals, based on a common representation of different faces and facial expressions in a vector space of 3D shapes and textures. This space is computed from 3D scans of neutral faces, and scans of facial expressions.
The 3D model's versatility with respect to pose and illumination is conveyed to photo-realistic image and video processing by a framework of analysis and synthesis algorithms: the system automatically estimates 3D shape and all relevant rendering parameters, such as pose, from single images. In video, head pose and mouth movements are tracked automatically. Reanimated with new mouth movements, the 3D face is rendered into the original images.
Categories and Subject Descriptors (according to ACM CCS): I.3.7 Computer Graphics: Animation
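The expression-transfer idea in this abstract, that faces and expressions share a common vector space of 3D shapes and textures, reduces at its core to vector arithmetic: an expression is a displacement vector, and transferring it means adding that displacement to another person's neutral face. The tiny vectors below are purely illustrative stand-ins for full 3D scan data.

```python
# Sketch of expression transfer in a common shape vector space. Real shape
# vectors hold tens of thousands of 3D vertex coordinates; these 4-dim
# vectors are hypothetical placeholders.
import numpy as np

neutral_a = np.array([0.0, 1.0, 2.0, 3.0])   # person A, neutral scan
smiling_a = np.array([0.0, 1.4, 2.6, 3.0])   # person A, smiling scan
neutral_b = np.array([0.5, 0.9, 2.2, 2.8])   # person B, neutral scan

# The expression is a displacement in the common vector space...
expression = smiling_a - neutral_a
# ...and transferring it means applying that displacement to the new face.
smiling_b = neutral_b + expression
```

This is why the system needs no example data of the target person's mouth movements: the movements are borrowed, as displacement vectors, from scans of other individuals.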
We present an example-based learning approach for locating vertical frontal views of human faces in complex scenes. The technique models the distribution of human face patterns by means of a few view-based "face" and "nonface" model clusters. At each image location, a difference feature vector is computed between the local image pattern and the distribution-based model. A trained classifier determines, based on the difference feature vector measurements, whether or not a human face exists at the current image location. We show empirically that the distance metric we adopt for computing difference feature vectors, and the "nonface" clusters we include in our distribution-based model, are both critical for the success of our system.
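The distribution-based pipeline in this abstract can be sketched end to end: cluster the face and nonface training patterns, compute a difference feature vector of distances from each window to every cluster centroid, and train a classifier on those vectors. The clustering method, classifier, distance metric, and data below are illustrative assumptions (k-means, logistic regression, Euclidean distance), not the paper's exact components.

```python
# Sketch of the distribution-based model: a few "face" and "nonface"
# clusters, distance-to-centroid difference features, then a classifier.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

d = 12                                      # hypothetical window-feature dimension
faces = rng.normal(loc=1.0, size=(150, d))
nonfaces = rng.normal(loc=-1.0, size=(150, d))

# Model the pattern distribution with a few view-based clusters of each kind.
face_model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(faces)
nonface_model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(nonfaces)
centroids = np.vstack([face_model.cluster_centers_,
                       nonface_model.cluster_centers_])

def difference_features(patterns):
    """Euclidean distance from each pattern to every face/nonface centroid."""
    return np.linalg.norm(patterns[:, None, :] - centroids[None, :, :], axis=2)

X = difference_features(np.vstack([faces, nonfaces]))
y = np.array([1] * len(faces) + [0] * len(nonfaces))
clf = LogisticRegression(max_iter=1000).fit(X, y)
train_acc = clf.score(X, y)
```

Note that the classifier operates on the 6-dimensional difference feature vectors, not on the raw window features, which is the abstract's central design choice; including the nonface centroids gives it explicit distances to known distractor patterns.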
Theory II: Deep learning and optimization
T. Poggio; Q. Liao
Bulletin of the Polish Academy of Sciences. Technical sciences, 12/2018, Volume 66, No 6 (Special Section on Deep Learning: Theory and Practice)
Journal Article
Items are categorized differently depending on the behavioral context. For instance, a lion can be categorized as an African animal or a type of cat. We recorded lateral prefrontal cortex (PFC) neural activity while monkeys switched between categorizing the same image set along two different category schemes with orthogonal boundaries. We found that each category scheme was largely represented by independent PFC neuronal populations and that activity reflecting a category distinction was weaker, but not absent, when that category was irrelevant. We suggest that the PFC represents competing category representations independently to reduce interference between them.
Deep Learning: Theory and Practice
A. Cichocki; T. Poggio; S. Osowski ...
Bulletin of the Polish Academy of Sciences. Technical sciences, 12/2018, Volume 66, No 6 (Special Section on Deep Learning: Theory and Practice)
Journal Article