What you will learn in this article

How to get a face recognition Python script up and running, and gain some understanding of how it works along the way.

Introduction
Real-time object detection has become very simple with the TensorFlow Object Detection API. We can just use a pretrained model, such as the SSD MobileNet detector, to recognize many objects in real time on any ordinary computer. But what if we want to do face recognition, like in the movies?
Can you just use a pretrained face recognition model to recognize your friends? Not directly, because:

A pretrained object detection model can recognize the most common objects in the world, but a pretrained face recognition model can only recognize the faces it was trained on. So we need something extra to make use of such a model. Suppose you want to build an application that can recognize the faces of some of your friends, family, or colleagues. Could you do it within an hour?
What you will need
Some pictures containing the faces of the people you want to recognize (they do not need to be cropped). Note: you do not need millions of images to train a model to recognize the people you care about; a single photo per person is enough! You will also need a laptop with Python 3 and a webcam, and a few things to download and install, most notably dlib.

Approach
So we need two models:

a face detection model, and a face embedding model, trained specifically on faces, that turns an image of a face into a meaningful feature vector (or embedding). For both of these you have multiple options, but the simplest is to use the models from the dlib library:

the HOG face detector (Histogram of Oriented Gradients features + a linear SVM classifier), and a face embedding model: a slightly modified ResNet-34 classification model, trained on 3 million faces, with the final classifier layer removed so that it becomes an embedding model. Sounds like a hassle already? Don't worry, we will use a few packages:
dlib for the actual detection and recognition, face_recognition as a nice wrapper that makes it much more convenient to use, and OpenCV for the webcam and other image handling.

Implementation
First: do some imports and define a few constants. The Python code is as follows:
import face_recognition
import cv2
import numpy as np
import glob
import os
import logging
IMAGES_PATH = './images' # put your reference images in here
CAMERA_DEVICE_ID = 0
MAX_DISTANCE = 0.6 # increase to make recognition less strict, decrease to make more strict
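To make the MAX_DISTANCE threshold concrete before we get to the real code, here is a toy sketch of the matching rule used later in the article, with made-up 3-dimensional "embeddings" instead of the real 128-dimensional ones (face_recognition.face_distance is essentially a Euclidean distance, which we recompute here with plain numpy):

```python
import numpy as np

MAX_DISTANCE = 0.6

# made-up reference embeddings for two known identities
known_encodings = np.array([
    [0.0, 0.0, 0.0],   # pretend this is "alice"
    [1.0, 1.0, 1.0],   # pretend this is "bob"
])
known_names = ['alice', 'bob']

# an unknown face whose embedding happens to be close to alice's
unknown = np.array([0.1, 0.0, 0.1])

# Euclidean distance from the unknown embedding to each reference embedding
distances = np.linalg.norm(known_encodings - unknown, axis=1)

# accept the closest match only if it is under the threshold
if np.any(distances <= MAX_DISTANCE):
    name = known_names[int(np.argmin(distances))]
else:
    name = None

print(name)  # alice (its distance is about 0.14, well under 0.6)
```

Raising MAX_DISTANCE makes the recognizer accept looser matches (more false positives); lowering it makes it stricter (more faces labeled as unknown).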
Now let's start with a function that gets the embeddings for any faces in an image. It is simpler than you might think: just two calls into the face_recognition package!
def get_face_embeddings_from_image(image, convert_to_rgb=False):
    """
    Take a raw image and run both the face detection and face embedding model on it
    """
    # Convert from BGR to RGB if needed
    if convert_to_rgb:
        image = image[:, :, ::-1]

    # run the face detection model to find face locations
    face_locations = face_recognition.face_locations(image)

    # run the embedding model to get face embeddings for the supplied locations
    face_encodings = face_recognition.face_encodings(image, face_locations)

    return face_locations, face_encodings
Note: OpenCV reads images in BGR format, while face_recognition works with RGB, so sometimes they need to be converted.
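The `[:, :, ::-1]` trick in the function above is exactly this channel flip. A minimal illustration with a single made-up pixel:

```python
import numpy as np

# one pure-blue pixel in BGR order, as OpenCV would deliver it
bgr_pixel = np.array([[[255, 0, 0]]], dtype=np.uint8)

# reversing the last axis swaps the first and third colour channels (B <-> R)
rgb_pixel = bgr_pixel[:, :, ::-1]

print(rgb_pixel[0, 0].tolist())  # [0, 0, 255]: still blue, but now in RGB order
```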
Now let's build a small identity database containing the encodings of our reference images. The Python code is as follows:
def setup_database():
    """
    Load reference images and create a database of their face encodings
    """
    database = {}

    for filename in glob.glob(os.path.join(IMAGES_PATH, '*.jpg')):
        # load image
        image_rgb = face_recognition.load_image_file(filename)

        # use the name in the filename as the identity key
        identity = os.path.splitext(os.path.basename(filename))[0]

        # get the face encoding and link it to the identity
        locations, encodings = get_face_embeddings_from_image(image_rgb)
        if not encodings:
            logging.warning("No face found in reference image %s; skipping it.", filename)
            continue
        database[identity] = encodings[0]

    return database
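A quick note on the identity key: it is simply the file name without its directory and extension, so a reference image stored at the hypothetical path `./images/alice.jpg` would be registered under the name "alice":

```python
import os

# how setup_database derives the identity key from a (hypothetical) file path
filename = './images/alice.jpg'
identity = os.path.splitext(os.path.basename(filename))[0]

print(identity)  # alice
```

So name your reference images after the people in them.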
Let's use OpenCV to get a video stream from the webcam. The basic skeleton looks like this:
# open a connection to the camera
video_capture = cv2.VideoCapture(CAMERA_DEVICE_ID)

# read from the camera in a loop, frame by frame
while video_capture.isOpened():
    # Grab a single frame of video
    ok, frame = video_capture.read()

    #
    # do face recognition stuff here using this frame...
    #

    # Display the image
    cv2.imshow('my_window_name', frame)

    # Hit 'q' on the keyboard to stop the loop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# release handle to the webcam
video_capture.release()

# close the window (buggy on a Mac btw)
cv2.destroyAllWindows()
From here, all we need to do is insert the recognition part into this loop, plus some code that compares the face embeddings with our 'database' to find the best match. The Python code is as follows:
# run detection and embedding models
face_locations, face_encodings = get_face_embeddings_from_image(frame, convert_to_rgb=True)

# the face_recognition library uses keys and values of your database separately
known_face_encodings = list(database.values())
known_face_names = list(database.keys())

# Loop through each face in this frame of video and see if there's a match
for location, face_encoding in zip(face_locations, face_encodings):
    # get the distances from this encoding to those of all reference images
    distances = face_recognition.face_distance(known_face_encodings, face_encoding)

    # select the closest match (smallest distance) if it's below the threshold value
    if np.any(distances <= MAX_DISTANCE):
        best_match_idx = np.argmin(distances)
        name = known_face_names[best_match_idx]
    else:
        name = None

    # show recognition info on the image
    paint_detected_face_on_image(frame, location, name)
There is one last function we have not defined yet. It is just a couple of OpenCV calls. Be careful not to mix up top and bottom!
def paint_detected_face_on_image(frame, location, name=None):
    """
    Paint a rectangle around the face and write the name
    """
    # unpack the coordinates from the location tuple
    top, right, bottom, left = location

    if name is None:
        name = 'Unknown'
        color = (0, 0, 255)  # red for unrecognized face
    else:
        color = (0, 128, 0)  # dark green for recognized face

    # Draw a box around the face
    cv2.rectangle(frame, (left, top), (right, bottom), color, 2)

    # Draw a label with a name below the face
    cv2.rectangle(frame, (left, bottom - 35), (right, bottom), color, cv2.FILLED)
    cv2.putText(frame, name, (left + 6, bottom - 6), cv2.FONT_HERSHEY_DUPLEX, 1.0, (255, 255, 255), 1)
Almost there! Now just combine these last script parts into the main function of a single application:
def run_face_recognition(database):
    """
    Start the face recognition via the webcam
    """
    # Open a handler for the camera
    video_capture = cv2.VideoCapture(CAMERA_DEVICE_ID)

    # the face_recognition library uses keys and values of your database separately
    known_face_encodings = list(database.values())
    known_face_names = list(database.keys())

    while video_capture.isOpened():
        # Grab a single frame of video (and check if it went ok)
        ok, frame = video_capture.read()
        if not ok:
            logging.error("Could not read frame from camera. Stopping video capture.")
            break

        # run detection and embedding models
        face_locations, face_encodings = get_face_embeddings_from_image(frame, convert_to_rgb=True)

        # Loop through each face in this frame of video and see if there's a match
        for location, face_encoding in zip(face_locations, face_encodings):
            # get the distances from this encoding to those of all reference images
            distances = face_recognition.face_distance(known_face_encodings, face_encoding)

            # select the closest match (smallest distance) if it's below the threshold value
            if np.any(distances <= MAX_DISTANCE):
                best_match_idx = np.argmin(distances)
                name = known_face_names[best_match_idx]
            else:
                name = None

            # put recognition info on the image
            paint_detected_face_on_image(frame, location, name)

        # Display the resulting image
        cv2.imshow('Video', frame)

        # Hit 'q' on the keyboard to quit!
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # Release handle to the webcam
    video_capture.release()
    cv2.destroyAllWindows()
Now we are ready to run it!
database = setup_database()
run_face_recognition(database)
Results and improvements
Did it work? You might be a little disappointed by the frame rate, but you can improve it further:

Resize each frame to about 25% of its original size before feeding it to the models. This alone makes a huge difference, and recognition still works! Skip frames: only run the detection/recognition part on every other frame. Use separate threads for camera reading, model inference, and image display. You could also replace the HOG face detector with a Haar Cascade face detector.
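The first two of those improvements can be sketched as follows. This is a minimal, numpy-only illustration: real code would downscale with cv2.resize(frame, (0, 0), fx=0.25, fy=0.25) for better quality, but crude slicing shows the idea, and the frame counter is a hypothetical helper name:

```python
import numpy as np

def downscale(frame, factor=4):
    """Crude downscale by striding: keep every factor-th row and column."""
    return frame[::factor, ::factor]

def should_process(frame_index, every_n=2):
    """Frame skipping: only run detection/recognition on every n-th frame."""
    return frame_index % every_n == 0

# a fake 480x640 BGR camera frame
frame = np.zeros((480, 640, 3), dtype=np.uint8)

small = downscale(frame)
print(small.shape)  # (120, 160, 3)

print([should_process(i) for i in range(4)])  # [True, False, True, False]
```

One caveat: if you detect faces on the downscaled frame, remember to multiply the returned face locations back up by the same factor before drawing them on the full-size frame.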