一. gradient
Gradients are computed by automatic differentiation based on the chain rule. It is "automatic" because the set of tensor ops is fixed and finite, so a gradient (derivative) routine can be maintained for every op, hard-coded directly in the source.
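As a small illustration (a hedged TF 1.x graph-mode sketch; the tensors x and y are made up here), tf.gradients walks the graph backwards and invokes the gradient routine registered for each op:
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[])
y = 3.0 * tf.square(x)             # y = 3 * x^2
dy_dx, = tf.gradients(y, [x])      # built from the per-op gradient rules: 6 * x

with tf.Session() as sess:
    print(sess.run(dy_dx, feed_dict={x: 2.0}))   # 12.0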
二. optimizer
The optimizer performs gradient computation and parameter updates.
2.1 Optimizer base class
- class tensorflow.python.training.optimizer.Optimizer
  Base class for optimization methods.
- _slots
  Field, Dict[slot_name, Dict[(graph, primary_var), slot_var]], holding the auxiliary (slot) variables.
- minimize(self, loss, global_step=None, var_list=None, …)
  Returns a train_op that minimizes the loss by updating the variables. It is just a wrapper around the two APIs below (gradient computation and parameter update); when you want to do something custom in between, call them explicitly instead. A common case is gradient clipping, see reference [3] and the sketch after this list.
- compute_gradients(self, loss, var_list=None, …)
  Computes the gradients without applying them. var_list defaults to the variables collected under GraphKeys.TRAINABLE_VARIABLES.
  Returns a list of (gradient, variable) pairs, where each gradient may be a Tensor, an IndexedSlices, or None.
- apply_gradients(grads_and_vars, global_step)
  Applies the given gradients. grads_and_vars has the same structure as the return value of the previous method.
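A hedged sketch of that pattern, here with global-norm gradient clipping; loss and global_step are assumed to already exist in the graph, and the optimizer choice and clip norm are arbitrary:
opt = tf.train.AdamOptimizer(learning_rate=1e-3)
grads_and_vars = opt.compute_gradients(loss)              # [(grad, var), ...]
grads, variables = zip(*grads_and_vars)
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
train_op = opt.apply_gradients(zip(clipped_grads, variables),
                               global_step=global_step)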
2.2 Simplified source
@tf_export("train.Optimizer")
class Optimizer(checkpointable.CheckpointableBase):
def __init__(self, use_locking, name):
self._name = name
self._slots = {}
def minimize(self, loss, global_step=None, var_list=None,
gate_gradients=GATE_OP, aggregation_method=None,
colocate_gradients_with_ops=False, name=None,
grad_loss=None):
"""Add operations to minimize `loss` by updating `var_list`.
This method simply combines calls `compute_gradients()` and
`apply_gradients()`. If you want to process the gradient before applying
them, call `compute_gradients()` and `apply_gradients()` explicitly instead
of using this function.
"""
grads_and_vars = self.compute_gradients(
loss, var_list=var_list, gate_gradients=gate_gradients,
aggregation_method=aggregation_method,
colocate_gradients_with_ops=colocate_gradients_with_ops,
grad_loss=grad_loss)
return self.apply_gradients(grads_and_vars, global_step=global_step,
name=name)
def compute_gradients(self, loss, var_list=None,
gate_gradients=GATE_OP,
aggregation_method=None,
colocate_gradients_with_ops=False,
grad_loss=None):
return grads_and_vars
def apply_gradients(self, grads_and_vars, global_step=None, name=None):
with ops.init_scope():
self._create_slots(var_list)
update_ops = []
with ops.name_scope(name, self._name) as name:
self._prepare()
for grad, var, processor in converted_grads_and_vars:
with ops.name_scope("update_" + scope_name), ops.colocate_with(var):
update_ops.append(processor.update_op(self, grad))
with ops.control_dependencies([self._finish(update_ops, "update")]):
with ops.colocate_with(global_step):
apply_updates = state_ops.assign_add(
global_step, 1, name=name)
return apply_updates
def _create_slots(self, var_list):
pass
def _get_or_make_slot_with_initializer(self, var, initializer, shape, dtype,
slot_name, op_name):
new_slot_variable = slot_creator.create_slot_with_initializer(var, initializer, shape, dtype, op_name)
self._slot_dict(slot_name)[_var_key(var)] = new_slot_variable
return new_slot_variable
# slot_creator.py
def create_slot_with_initializer(primary, initializer, shape, dtype, name,
colocate_with_primary=True):
prefix = primary.op.name
with variable_scope.variable_scope(None, prefix + "/" + name):
with distribution_strategy.colocate_vars_with(primary):
return _create_slot_var(primary, initializer, "", validate_shape, shape,
dtype)
def _create_slot_var(primary, val, scope, validate_shape, shape, dtype):
current_partitioner = variable_scope.get_variable_scope().partitioner
slot = variable_scope.get_variable(
scope, initializer=val, trainable=False,
use_resource=resource_variable_ops.is_resource_variable(primary),
shape=shape, dtype=dtype,
validate_shape=validate_shape)
variable_scope.get_variable_scope().set_partitioner(current_partitioner)
return slot
2.3 slot
An optimizer slot in TensorFlow is an auxiliary variable that the optimizer creates and updates alongside a model variable. Each variable can have one or more slots storing its per-variable optimizer state; for example, AdamOptimizer uses two slots holding the first- and second-moment estimates of that variable's gradients. The slots are updated on every optimization step and used when computing the variable's update. More generally, slots are how algorithms such as Momentum, Adagrad and Adam keep whatever extra per-variable state their update rules need.
A concrete case:
Corresponding to the _create_slots() method of AdagradDecayOptimizer below, when the primary var is scope_emb/input_from_feature_columns/word_embedding/weights, the _slots field contains:
{'accumulator': {
(<tensorflow.python.framework.ops.Graph object at 0x000002871512F390>, 'scope_emb/input_from_feature_columns/word_embedding/weights'):
<tf.Variable 'OptimizeLoss/scope_emb/input_from_feature_columns/word_embedding/weights/AdagradDecay:0' shape=(1000, 10) dtype=float32_ref>
},
'accumulator_decay_power': {
(<tensorflow.python.framework.ops.Graph object at 0x000002871512F390>, 'scope_emb/input_from_feature_columns/word_embedding/weights'):
<tf.Variable 'OptimizeLoss/scope_emb/input_from_feature_columns/word_embedding/weights/AdagradDecay_1:0' shape=(1000, 10) dtype=int64_ref>
}
}
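To poke at slots on a stock optimizer, something like the following works (a sketch: the variable w and the toy loss are made up; Adam's slot names are "m" and "v"):
w = tf.get_variable("w", shape=[1000, 10])
loss = tf.reduce_sum(tf.square(w))
opt = tf.train.AdamOptimizer(0.001)
train_op = opt.minimize(loss)

print(opt.get_slot_names())    # ['m', 'v']
print(opt.get_slot(w, "m"))    # e.g. <tf.Variable 'w/Adam:0' shape=(1000, 10) ...>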
三. high level api
- tensorflow.contrib.layers.python.layers.optimizers.optimize_loss(loss, global_step, learning_rate, optimizer, clip_gradients, learning_rate_decay_fn, update_ops, variables, …)
  - optimizer: string, class, or Optimizer instance.
  - update_ops: list of update Operations to execute at each step. If None, uses the elements of the UPDATE_OPS collection. The order of execution between update_ops and loss is non-deterministic.
  - variables: list of variables to optimize, or None to use all trainable variables.
Simplified source:
def optimize_loss(...):
    # updates such as batch norm's moving_mean are executed here
    update_ops = set(ops.get_collection(ops.GraphKeys.UPDATE_OPS))
    loss = control_flow_ops.with_dependencies(list(update_ops), loss)
    gradients = opt.compute_gradients(loss, ...)
    gradients = _clip_gradients_by_norm(gradients, clip_gradients)
    grad_updates = opt.apply_gradients(gradients, global_step=global_step)
    train_tensor = control_flow_ops.with_dependencies([grad_updates], loss)
    return train_tensor
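A usage sketch (loss is assumed to be defined; the optimizer string, learning rate and clip value are arbitrary choices):
from tensorflow.contrib.layers import optimize_loss

global_step = tf.train.get_or_create_global_step()
train_op = optimize_loss(loss=loss,
                         global_step=global_step,
                         learning_rate=0.01,
                         optimizer="Adagrad",   # string, class, or Optimizer instance
                         clip_gradients=5.0)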
四. Common subclasses
4.1 GradientDescentOptimizer
- class GradientDescentOptimizer(optimizer.Optimizer)
  Implements plain gradient descent (usage sketch below).
- __init__(self, learning_rate)
  The learning rate is given to the constructor.
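A minimal usage sketch (loss assumed defined; plain SGD needs no slots, the update is just var -= learning_rate * grad):
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
train_op = opt.minimize(loss)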
4.2 AdagradOptimizer
@tf_export("train.AdagradOptimizer")
class AdagradOptimizer(optimizer.Optimizer):

  def _create_slots(self, var_list):
    for v in var_list:
      dtype = v.dtype.base_dtype
      if v.get_shape().is_fully_defined():
        init = init_ops.constant_initializer(self._initial_accumulator_value,
                                             dtype=dtype)
      else:
        init = self._init_constant_op(v, dtype)
      # method from the Optimizer base class
      self._get_or_make_slot_with_initializer(v, init, v.get_shape(), dtype,
                                              "accumulator", self._name)
4.3 AdagradDecayOptimizer
class AdagradDecayOptimizer(optimizer.Optimizer):

  def _create_slots(self, var_list):
    for v in var_list:
      with ops.colocate_with(v):
        # init / v_shape / dtype are set up as in AdagradOptimizer (elided)
        self._get_or_make_slot_with_initializer(v, init, v_shape, dtype,
                                                "accumulator", self._name)
        self._get_or_make_slot_with_initializer(
            v, init_ops.zeros_initializer(self._global_step.dtype),
            v_shape, self._global_step.dtype, "accumulator_decay_power",
            self._name)
4.4 AdamOptimizer
- class AdamOptimizer(optimizer.Optimizer)
  Optimizer that implements the Adam algorithm, a variant of stochastic gradient descent. Per variable it keeps two slots, "m" and "v", for the first- and second-moment estimates (cf. 2.3); see the sketch below.
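A hedged numpy sketch of a single Adam step, showing what the two slots hold (hyper-parameter defaults follow tf.train.AdamOptimizer; the function name is made up):
import numpy as np

def adam_step(var, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad         # slot "m": 1st-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2    # slot "v": 2nd-moment estimate
    lr_t = lr * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    var = var - lr_t * m / (np.sqrt(v) + eps)
    return var, m, v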
五. Multiple optimizers side by side
Building a computation graph is like building with blocks: it splits into modules, and each module can naturally be driven by its own optimizer.
The minimize method above takes a var_list argument, so different optimizers can optimize different modules. How do we partition the parameters by module?
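One common answer is to filter the trainable variables by scope; a hedged sketch (the scope names "embedding" and "dnn", the optimizers and the learning rates are all made up for the example):
emb_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="embedding")
dnn_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="dnn")

global_step = tf.train.get_or_create_global_step()
emb_op = tf.train.AdagradOptimizer(0.05).minimize(loss, var_list=emb_vars)
dnn_op = tf.train.AdamOptimizer(1e-3).minimize(loss, var_list=dnn_vars,
                                               global_step=global_step)
train_op = tf.group(emb_op, dnn_op)    # run both updates each step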
六. high api
- tensorflow.contrib.layers.optimize_loss(loss, global_step, learning_rate, optimizer, clip_gradients, variables)
  - variables: corresponds to the var_list parameter of compute_gradients(), so the same per-module split works here as well.