Study Notes | Face Liveness Detection

PAD (Presentation Attack Detection)

- Action-based liveness detection: the system issues prescribed actions that the user must perform; liveness is judged by tracking the state of the user's eyes, mouth, and head pose in real time.

- H5 video liveness detection: the user uploads a video recorded on the spot, reading out a randomly assigned spoken verification code; liveness is judged by analyzing whether the face information in the video and the verification code both match.

- Silent liveness detection: in contrast to the action-based methods, the user needs to do nothing at all, simply facing the camera for 3-4 seconds. Because a real face is never perfectly still, micro-expressions such as the rhythm of the eyelids and eyeballs, blinking, and the stretching of the lips and surrounding cheeks can serve as anti-spoofing cues.

- Image-based liveness detection: judges whether the subject is live from flaws in the captured portrait (moiré patterns, imaging distortion, etc.), effectively defending against screen-recapture attacks; the decision logic can use a single image or several.

- Near-infrared liveness detection: uses near-infrared imaging to judge liveness at night or without natural light. Its imaging characteristics (screens do not show up, different materials have different reflectance, etc.) allow highly robust liveness judgments.

- 3D structured-light liveness detection: based on 3D structured-light imaging, builds a depth image from the light reflected off the face surface to judge whether the subject is live, strongly defending against photo, video, screen, and mask attacks.

- Optical flow: uses the temporal variation and correlation of pixel intensities in an image sequence to determine the "motion" at each pixel, obtaining per-pixel motion information from the sequence; Difference-of-Gaussian filtering, LBP features, and a support vector machine then perform the statistical analysis. Since the optical-flow field is sensitive to object motion, it can detect eyeball movement and blinking in a unified way. This kind of liveness detection works blind, without any user cooperation.
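As a rough illustration of the optical-flow principle (mine, not from the original notes), a minimal Lucas-Kanade step in NumPy estimates one motion vector for a patch by least squares over the spatial and temporal gradients; the Gaussian blob below is an assumed toy input:

```python
import numpy as np

def lucas_kanade_patch(prev: np.ndarray, curr: np.ndarray) -> np.ndarray:
    """Estimate a single (vx, vy) motion vector for a grayscale patch by
    solving the optical-flow constraint Ix*vx + Iy*vy = -It in the
    least-squares sense over all pixels of the patch."""
    Ix = np.gradient(prev, axis=1)   # spatial gradient along x (columns)
    Iy = np.gradient(prev, axis=0)   # spatial gradient along y (rows)
    It = curr - prev                 # temporal gradient between frames
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    v, *_ = np.linalg.lstsq(A, b, rcond=None)
    return v                         # (vx, vy)

# Toy input: a smooth Gaussian blob shifted one pixel to the right.
x = np.arange(16)
d2 = (x[None, :] - 7.0) ** 2 + (x[:, None] - 7.0) ** 2
d2_shift = (x[None, :] - 8.0) ** 2 + (x[:, None] - 7.0) ** 2
blob, blob_shift = np.exp(-d2 / 8.0), np.exp(-d2_shift / 8.0)
vx, vy = lucas_kanade_patch(blob, blob_shift)   # vx ≈ 1, vy ≈ 0
```

A real pipeline would compute flow densely over the whole face region and feed flow statistics, together with the DoG/LBP features mentioned above, to the SVM stage.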

Traditional methods:

1) Specular reflection + image-quality distortion + color

Specular reflection, blurriness features, chromatic moments, and color diversity.

Di Wen, Hu Han, Anil K. Jain. Face Spoof Detection with Image Distortion Analysis. IEEE Transactions on Information Forensics and Security, 2015
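A hedged NumPy sketch of two of these image-distortion cues; the blurriness and chromatic-moment definitions below are simplified stand-ins of my own, not the exact formulations in the paper:

```python
import numpy as np

def distortion_features(rgb: np.ndarray) -> np.ndarray:
    """Toy IDA-style features: Laplacian energy as a blurriness cue
    (recaptured images tend to be blurrier) plus per-channel chromatic
    moments (mean, standard deviation, skewness)."""
    gray = rgb.mean(axis=2)
    # 4-neighbour Laplacian; low energy suggests a blurry recapture.
    lap = (np.roll(gray, 1, 0) + np.roll(gray, -1, 0) +
           np.roll(gray, 1, 1) + np.roll(gray, -1, 1) - 4 * gray)
    blurriness = float(np.mean(lap ** 2))
    moments = []
    for c in range(3):
        ch = rgb[..., c].ravel()
        mu, sd = ch.mean(), ch.std() + 1e-8
        moments += [mu, sd, np.mean(((ch - mu) / sd) ** 3)]  # skewness
    return np.array([blurriness, *moments])

rng = np.random.default_rng(0)
sharp = rng.random((32, 32, 3))                      # high-detail image
blurry = np.ones((32, 32, 3)) * sharp.mean((0, 1))   # detail-free image
f_sharp, f_blurry = distortion_features(sharp), distortion_features(blurry)
```

The resulting 10-D vector would then feed a conventional classifier such as an SVM.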

2) Multi-level LBP features of the face in HSV space + LPQ features of the face in YCbCr space

Zinelabidine Boulkenafet, Jukka Komulainen, Abdenour Hadid. Face Spoofing Detection Using Colour Texture Analysis. IEEE Transactions on Information Forensics and Security, 2016
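For reference, a minimal NumPy implementation of the basic 8-neighbour LBP histogram underlying this color-texture family of methods; in the paper it is applied per channel in the HSV and YCbCr spaces rather than to a single gray image:

```python
import numpy as np

def lbp_histogram(img: np.ndarray) -> np.ndarray:
    """Basic 8-neighbour LBP: threshold each neighbour against the
    centre pixel, pack the 8 comparison bits into a code in [0, 255],
    and return the normalized 256-bin code histogram."""
    c = img[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(c.shape, dtype=np.uint8)
    h, w = img.shape
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]  # shifted view
        code |= (nb >= c).astype(np.uint8) << bit
    hist = np.bincount(code.ravel(), minlength=256).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(1)
hist = lbp_histogram(rng.random((64, 64)))
```

Per-channel histograms are concatenated into the final texture descriptor before classification.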

3) Designs features that capture the difference in micro-motion between live and spoof faces: one approach first amplifies facial micro-motions via motion magnification and then extracts Histogram of Oriented Optical Flow (HOOF) + dynamic-texture LBP-TOP features; another applies Dynamic Mode Decomposition (DMD) to obtain the subspace image with maximal motion energy and then analyzes its texture.

Santosh Tirunagari, Norman Poh. Detection of Face Spoofing Using Visual Dynamics. IEEE Transactions on Information Forensics and Security, 2015

4) First distinguish live faces from photo attacks by the different frequency-domain distribution of the pulse signal (the heart-rate distribution extracted from a face in a photo differs); then, if step 1 judges the sample live, cascade an LBP texture classifier to distinguish live faces from screen attacks (since the heart-rate distribution of a face in a screen-replayed video is close to that of a live face).

Xiaobai Li, Guoying Zhao. Generalized face anti-spoofing by detecting pulse from face videos. 2016 23rd ICPR

5) Multi-scale local binary patterns (LBP) followed by a non-linear SVM (texture)

J. Määttä, A. Hadid, and M. Pietikäinen, "Face spoofing detection from single images using micro-texture analysis," in Biometrics (IJCB), 2011 International Joint Conference on. IEEE, 2011, pp. 1–7.

6) Used LBP-TOP features, containing space and time descriptors, to encode the motion information along with the face texture

T. de Freitas Pereira, A. Anjos, J. M. De Martino, and S. Marcel, "LBP-TOP based countermeasure against face spoofing attacks," in Asian Conference on Computer Vision. Springer, 2012, pp. 121–132.

7) Multiple Difference-of-Gaussian (DoG) filters remove the noise and low-frequency information; the remaining high-frequency information forms the feature vector for an SVM classifier

Z. Zhang, J. Yan, S. Liu, Z. Lei, D. Yi, and S. Z. Li, “A face antispoofing database with diverse attacks,” in Biometrics (ICB), 2012 5th IAPR international conference on. IEEE, 2012, pp. 26–31.
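A small NumPy sketch of the DoG idea (the sigma values are illustrative choices of mine): subtracting two Gaussian blurs yields a band-pass filter that suppresses both noise and low-frequency illumination, keeping the high-frequency content used for the SVM feature vector:

```python
import numpy as np

def gaussian_blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur via two 1-D convolutions (zero padding)."""
    r = max(1, int(3 * sigma))
    xs = np.arange(-r, r + 1)
    k = np.exp(-0.5 * (xs / sigma) ** 2)
    k /= k.sum()
    out = np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 0, img)
    return np.apply_along_axis(lambda m: np.convolve(m, k, mode='same'), 1, out)

def dog(img: np.ndarray, s1: float = 0.5, s2: float = 1.0) -> np.ndarray:
    """Difference of Gaussians: band-pass between scales s1 and s2."""
    return gaussian_blur(img, s1) - gaussian_blur(img, s2)

rng = np.random.default_rng(2)
band = dog(rng.random((32, 32)))
```

A constant (zero-frequency) image maps to zero away from the borders, which is exactly the low-frequency suppression the method relies on.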

8)the motion relation between foreground and background

A. Anjos and S. Marcel, "Counter-measures to photo attacks in face recognition: a public database and a baseline," in Biometrics (IJCB), 2011 International Joint Conference on. IEEE, 2011, pp. 1–7.

9) (2015) Face Anti-Spoofing Based on Color Texture Analysis

Deep-learning methods: paper summaries

1 (2018) Discriminative Representation Combinations for Accurate Face Spoofing Detection

SPMT: spatial pyramid coding micro-texture (local)

SSD: Single Shot MultiBox Detector (context)

TFBD: template-face matched binocular depth (stereo)

1) SSD + SPMT: single image

2) TFBD + SPMT: binocular image pair

Puts liveness detection directly into the face-detection module (SSD, MTCNN, etc.) as an extra class: each detected bbox carries confidences for three classes (background, real face, fake face), so part of the non-live inputs can be filtered out at an early stage.


2 (2018) Learning Deep Models for Face Anti-Spoofing: Binary or Auxiliary Supervision

Binary classification with softmax learns only some particular discriminating feature of the training set; it is a black box and generalizes poorly.

Proposes auxiliary supervision to extract spatial and temporal information: face depth (pixel-wise, CNN) and rPPG signals (sequence-wise, RNN).

Designs a quasi-end-to-end deep framework that predicts pulse statistics and a depth map ("quasi" because no classifier is attached at the end; the decision is made by thresholding the feature-similarity distance between samples).

1) Previous work treats liveness detection as a binary classification problem and lets a DNN learn it directly; the cues learned this way are neither general nor discriminative enough.

2) Replaces the binary classification with targeted feature supervision: regress the pulse statistics and regress the depth map, so that the network is guaranteed to learn exactly these two kinds of features (though one cannot rule out that the clever black-box network also picks up some color texture along the way).


To regress the depth map, a 3D face shape is obtained via landmarks and 3DMM fitting, the background is removed by thresholding to produce the ground-truth depth map, and an L2 loss is applied between it and the network's estimated depth map.
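The depth supervision boils down to a pixel-wise L2 comparison; a minimal sketch with synthetic maps (the curved "live" surface below is an assumed stand-in, contrasted with the flat map a planar photo or screen would produce):

```python
import numpy as np

def depth_l2_loss(pred: np.ndarray, gt: np.ndarray) -> float:
    """Pixel-wise L2 loss between the estimated depth map and the
    ground truth (background already thresholded to 0 in the GT)."""
    return float(np.mean((pred - gt) ** 2))

# Synthetic stand-ins: a curved "live" depth surface vs. a flat plane.
u = np.linspace(-1.0, 1.0, 32)
gt_live = np.clip(1.0 - u[None, :] ** 2 - u[:, None] ** 2, 0.0, None)
flat_spoof = np.zeros((32, 32))   # a photo or screen is (nearly) planar
```

Spoof inputs are supervised toward the all-zero map, so their loss against a live-face depth target stays high.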

The paper's highlight is a Non-rigid Registration Layer that aligns the non-rigid motion of the face across frames (pose, expression, etc.), after which the RNN can better learn the temporal pulse information.


Why is this alignment network needed? In action-recognition tasks one can simply stack sampled or consecutive frames and feed them to the network, on the assumption that the camera is fixed and the subject moves. Here, pulse features are extracted from consecutive face frames, and the signal of interest is the temporal intensity variation of corresponding facial ROIs, so the face itself must instead be treated as the fixed "camera".

3 (2018) Face De-Spoofing: Anti-Spoofing via Noise Modeling

Assumes the spoof noise is ubiquitous and repetitive.

A single-frame method inspired by image denoising and deblurring: a noisy or blurred image can be seen as the original image with a noise or blur operation applied, and denoising/deblurring amounts to estimating the noise distribution or blur kernel and reconstructing the original. Here the live face image is treated as the original, and the spoof face image as a degraded image x with noise added, so the task becomes estimating the spoof noise and then classifying on this noise-pattern feature.


The problem: the datasets contain no pixel-aligned ground truth, and there is no prior model of the spoof noise (if the noise model were known, spoof faces could simply be generated from live faces). So what serves as the ground truth, and how should a network be designed to estimate the spoof noise?

As in typical low-level image tasks, the paper uses an encoder-decoder to obtain the spoof noise N and reconstructs the live image via the residual; this is the DS Net in the figure. To ensure the noise the network learns is meaningful for different inputs, three prior-based losses are used as constraints:

1) Magnitude loss: when the input is a live face, N should approach 0.

2) Repetitive loss: the noise map of a spoof face should have a strong peak in the high-frequency band.

3) Map loss: the deep feature map of a real face should approach all zeros, while that of a spoof face should approach all ones.
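A hedged NumPy sketch of the first two constraints; the exact formulations in the paper differ, and this only illustrates the intent:

```python
import numpy as np

def magnitude_loss(noise: np.ndarray, is_live: bool) -> float:
    """For a live input the estimated spoof noise N should vanish, so
    penalize its mean absolute magnitude; spoof inputs go unpenalized."""
    return float(np.abs(noise).mean()) if is_live else 0.0

def repetitive_reward(noise: np.ndarray) -> float:
    """Spoof noise is assumed repetitive, i.e. it shows a strong peak in
    the high-frequency band; measure the largest FFT magnitude outside a
    small low-frequency window (to be maximized for spoof inputs)."""
    f = np.abs(np.fft.fftshift(np.fft.fft2(noise)))
    h, w = f.shape
    f[h // 2 - 2:h // 2 + 3, w // 2 - 2:w // 2 + 3] = 0.0  # mask low freqs
    return float(f.max())

# A periodic pattern (period 4) mimicking repetitive spoof noise.
periodic = np.cos(2 * np.pi * np.arange(16) / 4)[None, :] * np.ones((16, 1))
```

A repetitive pattern yields a much larger high-frequency peak than an empty noise map, which is what the repetitive loss exploits.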


What are the VQ-Net and DQ-Net on the right of the network for? Since there is no ground truth for the live face, the authors use a GAN (the VQ-Net) to constrain the distribution of the reconstructed live face to match that of real live faces, and a pre-trained depth model to keep the depth map of the reconstructed live face consistent with that of a live face.

Pros: through the visualizations, everyone finally gets to see what spoof noise actually looks like.

Cons: hard to deploy in real scenarios (the model assumes the spoof noise is strongly present; when the live face images are of low quality while the spoof attacks are of relatively high quality, the spoof-noise assumption breaks down).

4 (2019) A Performance Evaluation of Convolutional Neural Networks for Face Anti-Spoofing

Evaluates several backbone CNNs, with and without transfer learning, random vs. pre-trained initialization, and different learning rates.

Face anti-spoofing is treated as a two-class classification problem, with a real-face class and a spoofed-face class. The CNN model predicts the class scores for the training images and computes the categorical cross-entropy loss.


TABLE : The training, validation and testing performance comparison among Inception-v3, ResNet50 and ResNet152 models in terms of the accuracy, convergence rate, and varying parameters like initial weights, number of trainable layers and learning rate. In this table, the ‘Epochs’ is the number of epochs for highest validation accuracy.

5 (2019) Deep Transfer Across Domains for Face Anti-spoofing

Argues that existing methods all generalize insufficiently, for two reasons:

1)the variety of spoofing materials can make the spoofing attacks quite different.

2)limited labeled data is available for training in face anti-spoofing.

We propose to learn a shared feature subspace where the distributions of the real access samples (genuine) from different domains, and the distributions of different types of spoofing attacks (fake) from different domains are drawn close, respectively. In the proposed framework, the sufficient labeled source data are used to learn discriminative representations that distinguish the genuine samples and the fake samples, meanwhile the sparsely labeled target samples are fed to the network to calculate the feature distribution distance between the genuine samples from the source and the target domain, and between the fake samples from the source and the target domains, corresponding to their materials. The kernel approach is adopted to map the features output from the CNN into a common kernel space, and the Maximum Mean Discrepancy (MMD) is adopted to measure the distribution distance between the samples from the source and target domains. This feature distribution distance is treated as a domain loss term added to the objective function and minimized along with training of the network.
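The kernel-MMD domain loss described above can be sketched directly in NumPy; this is the standard biased RBF-kernel estimator, with gamma picked arbitrarily for illustration:

```python
import numpy as np

def rbf_mmd2(X: np.ndarray, Y: np.ndarray, gamma: float = 1.0) -> float:
    """Biased estimate of squared Maximum Mean Discrepancy between
    sample sets X and Y under the RBF kernel exp(-gamma * ||a - b||^2)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return float(k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean())

rng = np.random.default_rng(3)
src = rng.normal(0.0, 1.0, (100, 4))        # source-domain features
tgt_same = rng.normal(0.0, 1.0, (100, 4))   # same distribution
tgt_shift = rng.normal(3.0, 1.0, (100, 4))  # shifted domain
```

During training, this quantity computed on the CNN features of source and target batches is added to the classification objective as the domain loss.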


Figure : The flowchart of the proposed framework, where every input batch contains half the source images and half the target images. Features of the two domains output from the last pooling layer are used to calculate the distribution distance with kernel based MMD. The network is trained using the classification loss along with the distribution distance which is taken as domain loss.

6 (2018) Deep Tree Learning for Zero-shot Face Anti-Spoofing

Frames the detection of unknown spoof attacks as Zero-Shot Face Anti-spoofing (ZSFA). A novel Deep Tree Network (DTN) is proposed to partition the spoof samples into semantic sub-groups in an unsupervised fashion. Assuming there are both homogeneous features among different spoof types and distinct features within each spoof type, a tree-like model is well suited to this case: it learns the homogeneous features in the early tree nodes and the distinct features in the later ones.


Figure : The proposed Deep Tree Network (DTN) architecture. (a) the overall structure of DTN. A tree node consists of a Convolutional Residual Unit (CRU) and a Tree Routing Unit (TRU), and a leaf node consists of a CRU and a Supervised Feature Learning (SFL) module. (b) the concept of Tree Routing Unit (TRU): finding the base with largest variations; (c) the structure of each Convolutional Residual Unit(CRU); (d) the structure of the Supervised Feature Learning (SFL) in the leaf nodes.

7 (2019) Enhance the Motion Cues for Face Anti-Spoofing using CNN-LSTM

Fine-grained motions: e.g. blinking, hand trembling.

Extracts highly discriminative features from video frames using a conventional Convolutional Neural Network (CNN), then feeds them to a Long Short-Term Memory (LSTM) network to capture the temporal dynamics in the videos. To make the fine-grained motions easier to perceive during training, **Eulerian motion magnification** is used as preprocessing to enhance the facial expressions, and an **attention mechanism** is embedded in the LSTM so that the model learns to focus selectively on the dynamic frames across the video clips.
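The attention over LSTM outputs amounts to a learned softmax weighting of per-frame features; a minimal sketch where `w` stands in for the learned attention parameters (an assumption, not the paper's exact parameterization):

```python
import numpy as np

def attention_pool(frame_feats: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Score each frame feature h_t by h_t . w, softmax the scores into
    weights, and return the weighted sum, so dynamic frames (e.g. the
    middle of a blink) dominate the clip-level representation."""
    scores = frame_feats @ w
    scores -= scores.max()                      # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ frame_feats                  # weighted sum over frames

# Four static frames and one highly dynamic frame.
feats = np.vstack([np.zeros((4, 3)), 5.0 * np.ones((1, 3))])
pooled = attention_pool(feats, np.ones(3))
```

The pooled clip feature is dominated by the dynamic frame, which is the selective-focus behavior the attention mechanism is meant to learn.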


Fig: (a) The flowchart of the proposed CNN-LSTM framework. (b) The cascaded LSTM architecture. (c) Illustration of a single LSTM unit; the current state t depends on the past state t-1 of the same neuron.

8 (2019) FeatherNets: Convolutional Neural Networks as Light as Feather

Code: https://github.com/SoftwareGift/FeatherNets_Face-Anti-spoofing-Attack-Detection-Challenge-CVPR2019

Proposes a very small network structure; fixes the weakness of Global Average Pooling with a Streaming Module; **uses the depth image only** (the depth information is estimated from the RGB image); "ensemble + cascade" structure.


Figure. Streaming Module. The last block's output is down-sampled by a depthwise convolution [28, 29] with stride larger than 1 and flattened directly into a one-dimensional vector.


Figure. Multi-Modal Fusion Strategy: two cascaded stages. Stage 1 is an ensemble classifier consisting of several depth models; stage 2 employs IR models to classify the uncertain samples from stage 1.

9 (2018) LiveNet: Improving features generalization for face liveness detection

continuous data-randomization (like bootstrapping)


Fig. The sampling is done in the form of mini-batches. (a) Conventional method for training CNN Networks. (b) Proposed method for training CNN networks.

10 (2019) Learning Generalizable and Identity-Discriminative Representations for face anti-spoofing

Code: https://github.com/XgTu/GFA-CNN

1) Total Pairwise Confusion (TPC) loss: deliberately introduces confusion into the feature activations to combat overfitting in fine-grained visual classification.


Here x_i, x_j come from the training set (not necessarily from different categories), so that the model learns slightly less discriminative features; using the Euclidean distance makes image pairs have a similar conditional probability distribution.
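A minimal NumPy version of the TPC idea, the average squared Euclidean distance over all feature pairs in a batch, to be minimized during training (the paper's exact pairing and weighting may differ):

```python
import numpy as np

def tpc_loss(feats: np.ndarray) -> float:
    """Total Pairwise Confusion sketch: pull the features of training
    pairs together regardless of category, so the model keeps slightly
    less discriminative (and hence more generalizable) features."""
    n = feats.shape[0]
    loss, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            loss += float(((feats[i] - feats[j]) ** 2).sum())
            pairs += 1
    return loss / pairs
```

Identical features give zero loss; the further apart the pair features, the stronger the pull toward confusion.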

2) A Fast Domain Adaptation (FDA) component in the CNN model alleviates the negative effects brought by domain changes. The FDA consists of an image transformation network f(·), which generates a synthetic image y = f(x) from a given image x, and a loss network φ(·), which computes a content reconstruction loss L_content and a domain reconstruction loss L_domain.

3) A Generalizable Face Authentication (GFA) CNN model that works in a multi-task manner, performing face anti-spoofing and face recognition simultaneously.


Figure: Architecture of the proposed GFA-CNN. The whole network contains two branches. The face anti-spoofing branch (upper) takes as input the domain-adaptive images transferred by FDA and is optimized by the TPC loss and Anti loss, while the face recognition branch (bottom) takes the cropped face images as input and is trained by minimizing the Recog loss. The structure settings are shown on top of each block, where "ID number" indicates the number of subjects involved in training. The two branches share parameters during training.

Table: Ablation study (HTER %). "+" means the corresponding component is used, while "-" indicates removing the component. The numbers in bold are the best results.

11 (2019) Improving Face Anti-Spoofing by 3D Virtual Synthesis

Synthesizes more spoof data.

12 (2019) Generalized Presentation Attack Detection: a face anti-spoofing evaluation proposal

1) Proposes GRAD-GPAD, a framework for systematic evaluation of the generalization properties of face-PAD methods.

2) Proposes two new evaluation protocols: Cross-FaceResolution and Cross-Conditions.

Existing protocols: Grandtest, Cross-Dataset, One-PAI, Unseen Attacks (Cross-PAI), Unseen Capture Devices.

13 (2019) Exploiting temporal and depth information for multi-frame face anti-spoofing

estimate depth information from multiple RGB frames


Figure. The pipeline of the proposed architecture. The inputs are consecutive frames at a fixed interval. Our single-frame part aims to extract features at various levels and to output the single-frame estimated facial depth. OFF blocks take single-frame features from two consecutive frames as inputs and calculate short-term motion features. Then the final OFF features are fed into the ConvGRUs to obtain long-term motion information, and output the residual of the single-frame facial depth. Finally, the combined estimated multi-frame depth maps are supervised by the depth loss and the binary loss in their respective manners.

14 (2019) Meta Anti-spoofing: Learning to Learn in Face Anti-spoofing

a few-shot learning problem with evolving new attacks


Figure. (a) Network structure of Meta-FAS-CS, which aims to train a meta-learner through classification labels. (b) Network structure of Meta-FAS-DR, which aims to train a meta-learner through depth labels.

15 (2019) Deep Anomaly Detection for Generalized Face Anti-Spoofing


Figure : We propose a deep metric learning approach, using a set of Siamese CNNs, in conjunction with the combination of a triplet focal loss and a novel “metric softmax” loss. The latter accumulates the probability distribution of each pair within the triplet. Our aim is to learn a feature representation that allows us to detect impostor samples as anomalies.
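One plausible form of a triplet focal loss, as a hedged sketch: the distances pass through an exponential before the hinge so that hard triplets contribute disproportionately (the paper's exact weighting, and the margin/sigma values here, are my assumptions):

```python
import numpy as np

def triplet_focal_loss(a, p, n, margin=0.2, sigma=0.3):
    """Triplet loss with focal-style exponential re-scaling: easy
    triplets (d_an >> d_ap) quickly reach zero loss, hard ones blow up."""
    d_ap = np.linalg.norm(a - p)   # anchor-positive distance
    d_an = np.linalg.norm(a - n)   # anchor-negative distance
    return max(0.0, float(np.exp(d_ap / sigma) - np.exp(d_an / sigma)) + margin)

anchor = np.zeros(2)
easy = triplet_focal_loss(anchor, np.zeros(2), np.array([3.0, 0.0]))
hard = triplet_focal_loss(anchor, np.array([1.0, 0.0]), np.array([0.1, 0.0]))
```

An easy triplet hinges to zero, while a hard one (negative closer than positive) yields a large loss, focusing training on the anomalous impostor samples.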

16 (2019) Aurora Guard: Real-Time Face Anti-Spoofing via Light Reflection

extracts the normal cues via light reflection analysis, and then uses an end-to-end trainable multi-task Convolutional Neural Network (CNN) to not only recover subjects’ depth maps to assist liveness classification, but also provide the light CAPTCHA checking mechanism in the regression branch to further improve the system reliability


Figure : Overview of Aurora Guard. From facial reflection frames encoded by casted light CAPTCHA, we estimate the normal cues. In the classification branch, we recover the depth maps from the normal cues, and then perform depth-based liveness classification. In the regression branch, we obtain the estimated light CAPTCHA.

17 (2019) Towards Real-time Eyeblink Detection in the Wild: Dataset, Theory and Practices

After locating and tracking the human eye using the SeetaFace engine and a KCF tracker respectively, a modified LSTM model able to capture multi-scale temporal information is proposed to perform eyeblink verification. A feature extraction approach that reveals appearance and motion characteristics simultaneously is also proposed.


18 (2018) Exploring Hypergraph Representation on Face Anti-spoofing Beyond 2D Attacks

Constructs a computation-efficient and posture-invariant face representation with only a few key points on hypergraphs. The hypergraph representation is then fed into the designed HGCNN with hypergraph convolution for feature extraction, while a depth auxiliary is also exploited for 3D mask anti-spoofing.


Summary

The deep-learning liveness-detection methods proposed in recent years follow four main ideas:

1) Simply feed images into a CNN

The difficulties are poor generalization and black-box features; the proposed remedies include:

- Learning the domain difference between datasets: domain loss [paper 5], FDA + TPC loss [paper 10]

- Supervising specific, targeted features: face depth (pixel-wise, CNN) [paper 2]

- A novel sampling scheme [paper 9]

- New network architectures, an "ensemble + cascade" structure, using the depth image [paper 8]

- Treating the spoofing information as a kind of noise [paper 3]

- Combining with hand-crafted features [paper 1]

2) Use temporal information to recognize fine-grained motions (LSTM) [paper 7] or rPPG signals (RNN) [paper 2]

3) Use unsupervised learning [paper 6]

4) Use binocular image pairs [paper 1]


Datasets

Downloadable:

NUAA

REPLAY-ATTACK https://www.idiap.ch/dataset/replayattack


CASIA-FASD http://www.cbsr.ia.ac.cn/english/FASDB_Agreement/Agreement.pdf

SIW http://cvlab.cse.msu.edu/spoof-in-the-wild-siw-face-anti-spoofing-database.html

OULU-NPU https://sites.google.com/site/oulunpudatabase/

MSU-MFSD http://biometrics.cse.msu.edu/Publications/Databases/MSUMobileFaceSpoofing/index.htm

MSU_USSA http://biometrics.cse.msu.edu/Publications/Databases/MSU_USSA/

HKBU-MARs http://rds.comp.hkbu.edu.hk/mars/


3DMAD https://www.idiap.ch/dataset/3dmad

UVAD https://recodbr.wordpress.com/code-n-data/#UVAD

REPLAY-MOBILE https://www.idiap.ch/dataset/replay-mobile

ROSE-YOUTU http://rose1.ntu.edu.sg/Datasets/faceLivenessDetection.asp

CS-MAD https://www.idiap.ch/dataset/csmad

SMAD

Not yet public:

SiW-M, MMFD

References:

[Survey of liveness detection algorithms](https://zhuanlan.zhihu.com/p/44904820)

[Paper access link 1](https://paperswithcode.com/task/face-anti-spoofing/codeless#code)
