Taigman, Yaniv, et al. "DeepFace: Closing the gap to human-level performance in face verification." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014.
Facebook likes to suggest that I tag somebody in a photo. Amazingly, it is almost always right! DeepFace was published by Facebook in 2014. DeepFace in the real world, wow.
- Face Recognition: Detection -> Alignment -> Representation -> Recognition
This work contributes to the Alignment, Representation, and Recognition stages.
They proposed a 3D-model-based alignment method. Although 3D-model-based methods had fallen out of favor, the authors argue that this is the right approach because faces are 3D objects. The alignment pipeline is shown in the image below.
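A common building block of 3D-model-based alignment is registering a camera that maps points on a generic 3D face model to landmarks detected in the 2D image. Below is a minimal sketch of that single step, assuming the 2D landmarks and their 3D model counterparts are already available (the function names are hypothetical, and the inputs in the demo are synthetic): a 2x4 affine camera is fitted by least squares. The frontalizing warp that would follow is not shown.

```python
import numpy as np

def fit_affine_camera(points_3d, points_2d):
    """Least-squares fit of a 2x4 affine camera P with P @ [X, Y, Z, 1] ~= [x, y].

    points_3d: (N, 3) landmark positions on a generic 3D face model (assumed given).
    points_2d: (N, 2) matching landmark positions detected in the image (assumed given).
    """
    n = points_3d.shape[0]
    homog = np.hstack([points_3d, np.ones((n, 1))])  # (N, 4) homogeneous 3D points
    # Solve homog @ P.T ~= points_2d for P.T (shape 4x2) in the least-squares sense.
    p_t, *_ = np.linalg.lstsq(homog, points_2d, rcond=None)
    return p_t.T  # (2, 4)

def project(P, points_3d):
    """Project 3D points into the image plane with the affine camera P."""
    homog = np.hstack([points_3d, np.ones((points_3d.shape[0], 1))])
    return homog @ P.T

# Synthetic check: project random model points with a known camera, then recover it.
rng = np.random.default_rng(0)
model_pts = rng.normal(size=(67, 3))
true_P = rng.normal(size=(2, 4))
image_pts = project(true_P, model_pts)
print(np.allclose(fit_affine_camera(model_pts, image_pts), true_P))
```

With noiseless synthetic correspondences the fit recovers the camera exactly; with real, noisy landmarks it gives the best affine approximation in the least-squares sense.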
They proposed a novel CNN model that operates directly on the raw pixels of aligned faces:
- C1 + M1: convolution followed by max-pooling
- C2: convolution (no max-pooling)
- L4, L5, L6: locally connected layers
- F7, F8: fully connected layers; the face representation is extracted at F7
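To see how the spatial size shrinks through the layers listed above, here is a small shape walk. The input size (152x152), filter sizes, strides, and channel counts are assumptions for illustration, loosely following the paper's architecture figure; ceil-mode pooling is likewise assumed so that M1 yields 71x71.

```python
import math

def conv_out(size, kernel, stride=1):
    """Spatial size after an unpadded conv or locally connected layer."""
    return (size - kernel) // stride + 1

def pool_out(size, kernel, stride):
    """Spatial size after max-pooling with ceil rounding (an assumption here)."""
    return math.ceil((size - kernel) / stride) + 1

# Assumed hyperparameters: (name, kind, kernel, stride, output channels).
LAYERS = [
    ("C1", "conv", 11, 1, 32),
    ("M1", "pool", 3, 2, 32),
    ("C2", "conv", 9, 1, 16),
    ("L4", "local", 9, 1, 16),
    ("L5", "local", 7, 2, 16),
    ("L6", "local", 5, 1, 16),
]

def walk_shapes(size=152):
    """Return (name, height, width, channels) after each layer for a square input."""
    shapes = []
    for name, kind, k, s, c in LAYERS:
        size = pool_out(size, k, s) if kind == "pool" else conv_out(size, k, s)
        shapes.append((name, size, size, c))
    return shapes

for name, h, w, c in walk_shapes():
    print(f"{name}: {h}x{w}x{c}")
# F7 and F8 (fully connected) would operate on the flattened L6 output.
```

Under these assumptions the script prints sizes from C1: 142x142x32 down to L6: 21x21x16.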
Locally connected layers are like normal conv layers, except that every location in the feature map learns a different set of filters. Because the input faces are aligned, different regions of the image (e.g., around the eyes versus around the mouth) have different local statistics, which these location-specific filters can learn.
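The difference from a convolution can be made concrete with a minimal NumPy sketch of a locally connected forward pass (stride 1, no padding). The function name and weight layout are assumptions for illustration, not the paper's implementation: the only change from a conv is that the filter bank is indexed by output location.

```python
import numpy as np

def locally_connected_2d(x, weights, bias):
    """Forward pass of an unshared ('locally connected') 2D layer, stride 1, no padding.

    x:       (C_in, H, W) input feature map
    weights: (H_out, W_out, C_out, C_in, k, k), one filter bank per output location
    bias:    (H_out, W_out, C_out), one bias per output location and channel
    returns: (C_out, H_out, W_out)
    """
    h_out, w_out, c_out, c_in, k, _ = weights.shape
    assert x.shape == (c_in, h_out + k - 1, w_out + k - 1), "shape mismatch"
    out = np.empty((c_out, h_out, w_out))
    for i in range(h_out):
        for j in range(w_out):
            patch = x[:, i:i + k, j:j + k]  # (C_in, k, k) receptive field at (i, j)
            # Unlike a conv, the filters used here belong to location (i, j) only.
            out[:, i, j] = np.tensordot(
                weights[i, j], patch, axes=([1, 2, 3], [0, 1, 2])
            ) + bias[i, j]
    return out

# If every location gets the same filter bank, the layer reduces to an ordinary conv.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8, 8))
shared = rng.normal(size=(4, 3, 3, 3))            # one conv filter bank
w = np.broadcast_to(shared, (6, 6, 4, 3, 3, 3))   # tiled across the 6x6 output grid
y = locally_connected_2d(x, w, np.zeros((6, 6, 4)))
print(y.shape)  # (4, 6, 6)
```

The price of unshared filters is parameters: the weight count is multiplied by the number of output locations. For illustrative numbers (16 input and 16 output channels, 9x9 filters, a 55x55 output grid), a conv has 16 * 16 * 81 = 20,736 weights, while the locally connected version has 3,025 times as many, 62,726,400.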
Several metrics are tested:
- unsupervised metric: similarity = inner product of the two normalized feature vectors
- weighted χ² distance, with per-dimension weights learned by a linear SVM
- a Siamese network, fine-tuning only the last 2 layers
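The first two metrics above can be written in a few lines. This is a minimal sketch with hypothetical function names; the χ² weights are simply passed in here rather than learned, and the features are assumed to be non-negative so the denominator is well behaved.

```python
import numpy as np

def inner_product_similarity(f1, f2):
    """Unsupervised similarity: inner product of L2-normalized feature vectors."""
    f1 = f1 / np.linalg.norm(f1)
    f2 = f2 / np.linalg.norm(f2)
    return float(np.dot(f1, f2))

def weighted_chi2(f1, f2, w, eps=1e-12):
    """Weighted chi-squared distance: sum_i w_i * (f1_i - f2_i)^2 / (f1_i + f2_i).

    Assumes non-negative features; eps avoids 0/0 where both entries are zero.
    The weights w would normally be learned (e.g. with a linear SVM); here they are given.
    """
    return float(np.sum(w * (f1 - f2) ** 2 / (f1 + f2 + eps)))

a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 2.0, 0.0])
print(inner_product_similarity(a, b))
print(weighted_chi2(a, b, np.ones(3)))
```

Note the direction: a higher inner product means more similar, while a higher χ² value means less similar, so a verification threshold has to be chosen accordingly.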
On the LFW (Labeled Faces in the Wild) dataset, it reached 97.35% accuracy, versus 97.53% for humans.
On the YTF (YouTube Faces) dataset, it reached 91.4%.