My project code is here: Tiny_Face_Recognition

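My earlier face recognition results were poor, and that dragged down all the work built on top of them. So I went looking and found a very thorough existing paper that uses essentially the same datasets as I do and reports very strong results. I set myself a one-week plan to reproduce it and will record each day's progress as a blog post.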

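Before starting, here is how my previous face recognition pipeline worked. When I was learning about Siamese Networks I followed the one-shot face recognition papers directly, so I built a Siamese ResNet152 as a face-similarity network. At recognition time you first feed in a photo of yourself. For every later photo, the network compares face features by similarity and decides whether or not it is you. However, accuracy stayed below 90%, and there were many other problems.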
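The verification step above can be illustrated with a minimal sketch. It compares two face embeddings by cosine similarity and accepts the pair when the similarity reaches a threshold. It assumes the embeddings come from some face network (a Siamese ResNet152 in my case). The function names, the toy 4-D vectors and the 0.5 threshold are all hypothetical; in practice the threshold would be tuned on a validation set such as LFW.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_same_person(enrolled, probe, threshold=0.5):
    """Hypothetical verification rule: same identity if similarity >= threshold."""
    return cosine_similarity(enrolled, probe) >= threshold

# Toy 4-D "embeddings"; real ones are much larger (the reference code uses 512-D).
me = [0.9, 0.1, 0.3, 0.2]
me_again = [0.8, 0.2, 0.35, 0.1]
stranger = [-0.2, 0.9, -0.1, 0.4]

print(is_same_person(me, me_again))  # True
print(is_same_person(me, stranger))  # False
```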

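So I started studying this paper. It happens to use the same training and validation sets as I do, so I can follow it to build a proper face recognition pipeline.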

Paper: https://arxiv.org/abs/1801.07698

Reference code (MXNet): https://github.com/deepinsight/insightface#512-d-feature-embedding

Training set: MS-Celeb-1M

Validation set: LFW

Introducing ArcFace

The paper proposes the Additive Angular Margin Loss, i.e. the ArcFace loss, as an improvement on A-Softmax.

The loss is:

\[L_{\text{arcface}} = -\frac{1}{N} \sum_{j=1}^{N} \log \frac{e^{s\cos(\theta_{y_j} + m)}}{e^{s\cos(\theta_{y_j} + m)} + \sum_{i \neq y_j} e^{s\cos\theta_{i}}}\]
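To make the formula concrete, here is a minimal plain-Python sketch that evaluates it for a batch of precomputed cosines. Each θ is the angle between a sample's normalised feature and a class's normalised weight, s is the feature scale and m is the additive angular margin. The defaults s = 64 and m = 0.5 are the values the paper reports using; treat them as assumptions here. The sketch does not handle θ + m > π, where cos(θ + m) stops decreasing in θ.

```python
import math

def arcface_loss(cosines, labels, s=64.0, m=0.5):
    """Mean ArcFace loss over a batch (illustrative sketch of the formula above).

    cosines[j][i] is cos(theta_i) for sample j, i.e. the cosine between the
    L2-normalised feature of sample j and the L2-normalised weight of class i.
    labels[j] is the ground-truth class y_j.
    """
    total = 0.0
    for cos_row, y in zip(cosines, labels):
        # Recover theta_{y_j}; clamp to [-1, 1] to guard against rounding error.
        theta_y = math.acos(max(-1.0, min(1.0, cos_row[y])))
        # Additive angular margin on the target class only: s * cos(theta + m).
        target_logit = s * math.cos(theta_y + m)
        # Non-target classes keep the plain scaled cosine: s * cos(theta_i).
        logits = [target_logit] + [s * c for i, c in enumerate(cos_row) if i != y]
        # Numerically stable log of the softmax denominator (log-sum-exp).
        mx = max(logits)
        log_denom = mx + math.log(sum(math.exp(z - mx) for z in logits))
        total += log_denom - target_logit  # = -log(probability of class y_j)
    return total / len(labels)

cosines = [[0.8, 0.3, -0.1]]  # one sample, three classes
labels = [0]                  # the true class is 0
print(arcface_loss(cosines, labels, m=0.0))  # no margin: normalised softmax loss
print(arcface_loss(cosines, labels, m=0.5))  # larger: the margin penalises the target
```

With m = 0 this reduces to a softmax loss on scaled cosines. A positive m lowers the target logit, so the network has to push θ_{y_j} further down to reach the same loss.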

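I haven't yet understood in detail why this works or how it compares with the other losses; I'll come back to it tomorrow.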

Day 1 code

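Among the models evaluated in the paper, LResNet50E-IR performs well. Here L indicates 112×112 face crops as input, and IR indicates some modifications to the ResNet residual blocks. So I'll start by building a standard ResNet-50 for 224×224 input.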

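ResNet-50 has (3+4+6+3)×3+2 = 50 layers. There are 16 bottleneck units of three convolutions each, plus the first 7×7 convolution and the final fully connected layer. The units are grouped into four blocks.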
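The layer count can be checked with a couple of lines of arithmetic. As is conventional, only the convolutional and fully connected layers on the main path are counted; pooling and projection shortcuts are not.

```python
# Bottleneck units per block in ResNet-50.
units_per_block = [3, 4, 6, 3]
convs_per_unit = 3  # 1x1 -> 3x3 -> 1x1 in each bottleneck unit
extra_layers = 2    # first 7x7 conv + final fully connected layer

depth = sum(units_per_block) * convs_per_unit + extra_layers
print(depth)  # 50
```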

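First, define the Block data structure:

import collections
import tensorflow as tf                  # TensorFlow 1.x (tf.contrib was removed in 2.x)
import tensorflow.contrib.slim as slim

class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])):
	"""One ResNet block: scope name, residual-unit function, and one (depth, depth_bottleneck, stride) tuple per unit."""
	pass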

Next, the top-level function for the 50-layer ResNet:

def resnet_50(inputs, num_classes=None, global_pool=True, reuse=None, scope='resnet_50'):
	# 3 + 4 + 6 + 3 = 16 bottleneck units; each tuple is (depth, depth_bottleneck, stride).
	# In this v2 layout the stride-2 unit is the last unit of block1 to block3.
	blocks = [
		Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
		Block('block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
		Block('block3', bottleneck, [(1024, 256, 1)] * 5 + [(1024, 256, 2)]),
		Block('block4', bottleneck, [(2048, 512, 1)] * 3)
	]

	return resnet_v2(inputs, blocks, num_classes, global_pool, include_root_block=True, reuse=reuse, scope=scope)
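To see where downsampling happens in this layout, here is a small sketch that walks the same Block arguments and tracks the feature-map size for a 224×224 input. In this v2 layout the stride-2 unit sits at the end of block1 to block3, whereas the original ResNet paper downsamples at the start of a stage. The sketch assumes resnet_arg_scope is active, so max pooling uses SAME padding, and that every strided op outputs ceil(size / stride). The trace_shapes helper and the simplified Block (no unit_fn) are hypothetical.

```python
import math
from collections import namedtuple

# Simplified stand-in for Block; unit_fn is omitted because only the
# (depth, depth_bottleneck, stride) tuples matter here.
Block = namedtuple('Block', ['scope', 'args'])

blocks = [
    Block('block1', [(256, 64, 1)] * 2 + [(256, 64, 2)]),
    Block('block2', [(512, 128, 1)] * 3 + [(512, 128, 2)]),
    Block('block3', [(1024, 256, 1)] * 5 + [(1024, 256, 2)]),
    Block('block4', [(2048, 512, 1)] * 3),
]

def trace_shapes(size, blocks):
    """Return (stage name, spatial size, channels) after each stage."""
    size = math.ceil(size / 2)  # conv1: 7x7, stride 2, 64 channels
    trace = [('conv1', size, 64)]
    size = math.ceil(size / 2)  # pool1: 3x3 max pool, stride 2, SAME padding
    trace.append(('pool1', size, 64))
    for block in blocks:
        for _depth, _bottleneck, stride in block.args:
            size = math.ceil(size / stride)
        trace.append((block.scope, size, block.args[-1][0]))
    return trace

for name, size, channels in trace_shapes(224, blocks):
    print(f'{name}: {size}x{size}x{channels}')
```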

It relies on two parts: the residual unit function bottleneck, and the generic ResNet builder resnet_v2.

The residual unit, bottleneck:

@slim.add_arg_scope  # required so that slim.arg_scope in resnet_v2 can set outputs_collections
def bottleneck(inputs, depth, depth_bottleneck, stride, outputs_collections=None, scope=None):
	with tf.variable_scope(scope, 'bottleneck_v2', [inputs]) as sc:
		# Last dimension of the input, i.e. the number of input channels; the input must have rank >= 4
		depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
		# Pre-activation: batch normalization followed by ReLU
		preact = slim.batch_norm(inputs, activation_fn=tf.nn.relu, scope='preact')

		if depth == depth_in:
			# Input and output channels match: the shortcut is the input, downsampled by stride
			shortcut = subsample(inputs, stride, 'shortcut')
		else:
			# Channels differ: project with a strided 1*1 convolution so the shapes match
			shortcut = slim.conv2d(preact, depth, [1,1], stride=stride, normalizer_fn=None, activation_fn=None, scope='shortcut')

		# 1*1 convolution reducing the channels to depth_bottleneck
		residual = slim.conv2d(preact, depth_bottleneck, [1,1], stride=1, scope='conv1')

		# 3*3 convolution with depth_bottleneck output channels and stride `stride`
		residual = conv2d_same(residual, depth_bottleneck, 3, stride, scope='conv2')

		# Final 1*1 convolution: stride 1, depth output channels, no batch norm, no activation
		residual = slim.conv2d(residual, depth, [1,1], stride=1, normalizer_fn=None, activation_fn=None, scope='conv3')

		# Add the residual branch to the shortcut
		output = shortcut + residual
		# Add output to the collection and return it
		return slim.utils.collect_named_outputs(outputs_collections, sc.name, output)
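One quick way to check the bottleneck above is to track tensor shapes through it. The hypothetical helper below assumes NHWC tensors and that every strided op outputs ceil(size / stride), which is what conv2d_same and the 1×1 subsample produce. The usage lines reproduce the first unit of block1 (projection shortcut, 64 → 256 channels) and the last unit of block1 (identity shortcut, stride 2).

```python
import math

def bottleneck_shapes(h, w, depth_in, depth, depth_bottleneck, stride):
    """(H, W, C) after each op of one pre-activation bottleneck unit."""
    out_h, out_w = math.ceil(h / stride), math.ceil(w / stride)
    return {
        'preact': (h, w, depth_in),                 # BN + ReLU keep the shape
        'conv1': (h, w, depth_bottleneck),          # 1x1, stride 1: reduce channels
        'conv2': (out_h, out_w, depth_bottleneck),  # 3x3, stride `stride`
        'conv3': (out_h, out_w, depth),             # 1x1, stride 1: expand channels
        'shortcut': (out_h, out_w, depth),          # must match conv3 for the addition
        'projection_shortcut': depth != depth_in,   # True: strided 1x1 conv; False: subsample
    }

# First unit of block1 (input: 56x56x64 from pool1) and its last unit (stride 2).
print(bottleneck_shapes(56, 56, 64, 256, 64, 1))
print(bottleneck_shapes(56, 56, 256, 256, 64, 2))
```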

It uses two helpers: subsample for downsampling and conv2d_same for convolution.

Downsampling:

def subsample(inputs, factor, scope=None):
	# Downsample by `factor` with a 1*1 max pool, i.e. keep every factor-th pixel
	if factor == 1:
		return inputs
	else:
		return slim.max_pool2d(inputs, [1, 1], stride=factor, scope=scope)
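A 1×1 max pool with stride `factor` has a window that holds a single pixel, so it simply keeps every factor-th row and column. The toy check below illustrates that equivalence. Plain Python lists stand in for one channel of a feature map, and both helper names are hypothetical.

```python
def max_pool_1x1(grid, stride):
    """1x1 max pooling with the given stride on a 2-D list (one channel)."""
    out = []
    for i in range(0, len(grid), stride):
        row = []
        for j in range(0, len(grid[0]), stride):
            row.append(max([grid[i][j]]))  # a 1x1 window holds exactly one value
        out.append(row)
    return out

def strided_slice(grid, stride):
    """Keep every `stride`-th row and column (2-D analogue of x[:, ::s, ::s, :] on NHWC)."""
    return [row[::stride] for row in grid[::stride]]

grid = [[r * 5 + c for c in range(5)] for r in range(5)]
print(max_pool_1x1(grid, 2) == strided_slice(grid, 2))  # True
print(strided_slice(grid, 2))  # [[0, 2, 4], [10, 12, 14], [20, 22, 24]]
```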

Convolution with explicit padding (conv2d_same):

def conv2d_same(inputs, num_outputs, kernel_size, stride, scope=None):
	if stride == 1:
		return slim.conv2d(inputs, num_outputs, kernel_size, stride=1, padding='SAME', scope=scope)
	else:
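		# For stride > 1, pad explicitly with kernel_size - 1 zeros in total,
		# so the padding does not depend on the input size (unlike padding='SAME')
		pad_total = kernel_size - 1
		pad_beg   = pad_total // 2
		pad_end   = pad_total - pad_beg
		# Zero-pad the height and width dimensions
		inputs = tf.pad(inputs, [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]])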
		return slim.conv2d(inputs, num_outputs, kernel_size, stride=stride, padding='VALID', scope=scope)
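Why pad explicitly instead of using padding='SAME'? With stride > 1, TensorFlow's SAME padding depends on the input size and can be asymmetric. conv2d_same always puts (k − 1) // 2 zeros on the leading side. The sketch below computes both for one spatial dimension, using the SAME formula from TensorFlow's convolution documentation. The helper names are hypothetical.

```python
import math

def explicit_pad(kernel_size):
    """Padding used by conv2d_same for stride > 1: fixed, independent of input size."""
    pad_total = kernel_size - 1
    pad_beg = pad_total // 2
    return pad_beg, pad_total - pad_beg

def tf_same_pad(in_size, kernel_size, stride):
    """Padding that padding='SAME' applies along one spatial dimension."""
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + kernel_size - in_size, 0)
    pad_beg = pad_total // 2
    return pad_beg, pad_total - pad_beg

def valid_out(in_size, kernel_size, stride):
    """Output size of a 'VALID' convolution."""
    return (in_size - kernel_size) // stride + 1

# 3x3, stride-2 convolution on a 56-pixel-wide feature map.
beg, end = explicit_pad(3)
print((beg, end), valid_out(56 + beg + end, 3, 2))  # (1, 1) 28
print(tf_same_pad(56, 3, 2))                          # (0, 1)
```

Both give a 28-pixel output here, but SAME shifts the sampling grid by putting all padding at the end. The explicit version keeps the padding symmetric regardless of input size.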

Finally, the generic ResNet builder:

def resnet_v2(
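	inputs, # input tensor [batch, height, width, channels]
	blocks, # list of Block(scope, unit_fn, args)
	num_classes=None,  # number of output classes (None: no logits layer)
	global_pool=True,  # whether to apply global average pooling at the end
	include_root_block=True, # whether to prepend the 7*7 convolution and max pooling
	reuse=None,    # whether to reuse variables
	scope=None     # variable scope name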
):
	with tf.variable_scope(scope, 'resnet_v2', [inputs], reuse=reuse) as sc:
		# Name of the collection that gathers the end points
		end_points_collection = sc.original_name_scope + '_end_points'
		# Make end_points_collection the default outputs_collections of these three functions
		with slim.arg_scope([slim.conv2d, bottleneck, stack_blocks_dense], outputs_collections=end_points_collection):
			net = inputs
			if include_root_block:
				with slim.arg_scope([slim.conv2d], activation_fn=None, normalizer_fn=None):
					# 7*7 convolution, 64 output channels, stride 2
					net = conv2d_same(net, 64, 7, stride=2, scope='conv1')
				# followed by 3*3 max pooling, stride 2
				net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')
			# Stack the residual blocks
			net = stack_blocks_dense(net, blocks)
			# Final BN + ReLU, needed because the units are pre-activation
			net = slim.batch_norm(net, activation_fn=tf.nn.relu, scope='postnorm')

			if global_pool:
				# Global average pooling over height and width
				net = tf.reduce_mean(net, [1, 2], name='pool5', keepdims=True)
			if num_classes is not None:
				# 1*1 convolution producing num_classes logits (acts as a fully connected layer)
				net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, scope='logits')
			# Convert the collection into a Python dict
			end_points = slim.utils.convert_collection_to_dict(end_points_collection)

			# Softmax activation gives the network's class probabilities
			if num_classes is not None:
				end_points['predictions'] = slim.softmax(net, scope='predictions')

			return net, end_points
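The head of resnet_v2 is global average pooling, a 1×1 'logits' convolution, then softmax. Mirroring it in plain Python makes clear that a 1×1 convolution applied to a 1×1 map is just a fully connected layer. This is an illustrative sketch on nested lists, not how TensorFlow computes it. The function names, weights and toy feature map are made up.

```python
import math

def global_avg_pool(feature_map):
    """feature_map[h][w][c] -> per-channel mean over all positions (like reduce_mean over [1, 2])."""
    h, w, c = len(feature_map), len(feature_map[0]), len(feature_map[0][0])
    return [sum(feature_map[i][j][k] for i in range(h) for j in range(w)) / (h * w)
            for k in range(c)]

def conv1x1_on_1x1(features, weights, biases):
    """A 1x1 conv on a 1x1 map is a fully connected layer: logits = W^T x + b."""
    return [sum(x * w_row[k] for x, w_row in zip(features, weights)) + biases[k]
            for k in range(len(biases))]

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

fmap = [[[1, 0, 2], [3, 0, 2]], [[1, 4, 2], [3, 4, 2]]]  # 2x2 map, 3 channels
pooled = global_avg_pool(fmap)                             # [2.0, 2.0, 2.0]
logits = conv1x1_on_1x1(pooled, [[1, 0], [0, 1], [1, -1]], [0, 0.5])  # [4.0, 0.5]
print(softmax(logits))
```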

It relies on stack_blocks_dense, and the layer defaults come from resnet_arg_scope. Both are implemented below.

resnet_arg_scope, which sets default arguments for the layers:

def resnet_arg_scope(
	is_training=True,        # whether the model is being trained (affects batch norm)
	weight_decay=0.0001,     # L2 weight decay coefficient
	batch_norm_decay=0.997,  # decay of the BN moving averages
	batch_norm_epsilon=1e-5, # BN epsilon
	batch_norm_scale=True    # whether BN learns a scale (gamma)
):
	batch_norm_params = {
		'is_training': is_training,
		'decay': batch_norm_decay,
		'epsilon': batch_norm_epsilon,
		'scale': batch_norm_scale,
		# Moving-average updates go to UPDATE_OPS and must be run together with the train op
		'updates_collections': tf.GraphKeys.UPDATE_OPS,
	}

	with slim.arg_scope(
		[slim.conv2d],
		weights_regularizer = slim.l2_regularizer(weight_decay),
		weights_initializer = slim.variance_scaling_initializer(),
		activation_fn = tf.nn.relu,
		normalizer_fn = slim.batch_norm,
		normalizer_params = batch_norm_params
		):
		with slim.arg_scope([slim.batch_norm], **batch_norm_params):
			with slim.arg_scope([slim.max_pool2d], padding='SAME') as arg_sc:
				return arg_sc # return the innermost, fully nested arg_scope

stack_blocks_dense, which stacks the Blocks:

@slim.add_arg_scope  # required so that slim.arg_scope in resnet_v2 can set outputs_collections
def stack_blocks_dense(net, blocks, outputs_collections=None):
	# Two nested loops: stack the residual units one by one
	for block in blocks:
		with tf.variable_scope(block.scope, 'block', [net]) as sc:
			for i, unit in enumerate(block.args):
				# Inner loop: unpack the three parameters of each unit
				with tf.variable_scope('unit_%d' % (i+1), values=[net]):
					unit_depth, unit_depth_bottleneck, unit_stride = unit
					# Create each residual unit with the block's unit function and chain them in order
					net = block.unit_fn(net, depth=unit_depth, depth_bottleneck=unit_depth_bottleneck, stride=unit_stride)
			net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net)
	# Once every unit of every block has been stacked, return net
	return net
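To see what the two loops produce, here is a small sketch that lists the variable-scope prefix created for each residual unit. It assumes the top-level scope is named resnet_50; each unit then adds its own bottleneck_v2 scope underneath, e.g. .../unit_1/bottleneck_v2/conv1. The unit_scopes helper and the simplified Block without unit_fn are hypothetical and only mirror the naming logic.

```python
from collections import namedtuple

Block = namedtuple('Block', ['scope', 'args'])  # unit_fn omitted for this sketch

def unit_scopes(blocks, root='resnet_50'):
    """Variable-scope prefixes stack_blocks_dense creates, one per residual unit."""
    return [f'{root}/{block.scope}/unit_{i + 1}'
            for block in blocks
            for i, _ in enumerate(block.args)]

blocks = [
    Block('block1', [(256, 64, 1)] * 2 + [(256, 64, 2)]),
    Block('block2', [(512, 128, 1)] * 3 + [(512, 128, 2)]),
    Block('block3', [(1024, 256, 1)] * 5 + [(1024, 256, 2)]),
    Block('block4', [(2048, 512, 1)] * 3),
]
names = unit_scopes(blocks)
print(len(names))  # 16
print(names[0])    # resnet_50/block1/unit_1
print(names[-1])   # resnet_50/block4/unit_3
```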

Getting Started with Deep Learning (11): A Week of Close Reading and Reproducing ArcFace (Day 1)
