一. gradient
Gradients are computed by automatic differentiation based on the chain rule. It is "automatic" because the set of tensor ops is fixed and finite, so a gradient (derivative) routine can be maintained for every op, hard-coded directly in the source.
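As a small illustration (a hedged TF 1.x graph-mode sketch; the tensors x and y are made up here), tf.gradients walks the graph backwards and invokes the gradient routine registered for each op:
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[])
y = 3.0 * tf.square(x)             # y = 3 * x^2
dy_dx, = tf.gradients(y, [x])      # built from the per-op gradient rules: 6 * x

with tf.Session() as sess:
    print(sess.run(dy_dx, feed_dict={x: 2.0}))   # 12.0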
二. optimizer
The optimizer performs gradient computation and parameter updates.
2.1 Optimizer base class
- class tensorflow.python.training.optimizer.Optimizer
  Base class for optimization methods.
- _slots
  Field, Dict[slot_name, Dict[(graph, primary_var), slot_var]], holding the auxiliary (slot) variables.
- minimize(self, loss, global_step=None, var_list=None, …)
  Returns a train_op that minimizes the loss by updating the variables. It is just a wrapper around the two APIs below (gradient computation and parameter update); when you want to do something custom in between, call them explicitly instead. A common case is gradient clipping, see reference [3] and the sketch after this list.
- compute_gradients(self, loss, var_list=None, …)
  Computes the gradients without applying them. var_list defaults to the variables collected under GraphKeys.TRAINABLE_VARIABLES.
  Returns a list of (gradient, variable) pairs, where each gradient may be a Tensor, an IndexedSlices, or None.
- apply_gradients(grads_and_vars, global_step)
  Applies the given gradients. grads_and_vars has the same structure as the return value of the previous method.
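A hedged sketch of that pattern, here with global-norm gradient clipping; loss and global_step are assumed to already exist in the graph, and the optimizer choice and clip norm are arbitrary:
opt = tf.train.AdamOptimizer(learning_rate=1e-3)
grads_and_vars = opt.compute_gradients(loss)              # [(grad, var), ...]
grads, variables = zip(*grads_and_vars)
clipped_grads, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
train_op = opt.apply_gradients(zip(clipped_grads, variables),
                               global_step=global_step)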
2.2 Simplified source
@tf_export("train.Optimizer")
class Optimizer(checkpointable.CheckpointableBase):
def __init__(self, use_locking, name):
self._name = name
self._slots = {}
def minimize(self, loss, global_step=None, var_list=None,
gate_gradients=GATE_OP, aggregation_method=None,
colocate_gradients_with_ops=False, name=None,
grad_loss=None):
"""Add operations to minimize `loss` by updating `var_list`.
This method simply combines calls `compute_gradients()` and
`apply_gradients()`. If you want to process the gradient before applying
them, call `compute_gradients()` and `apply_gradients()` explicitly instead
of using this function.
"""
grads_and_vars = self.compute_gradients(
loss, var_list=var_list, gate_gradients=gate_gradients,
aggregation_method=aggregation_method,
colocate_gradients_with_ops=colocate_gradients_with_ops,
grad_loss=grad_loss)
return self.apply_gradients(grads_and_vars, global_step=global_step,
name=name)
def compute_gradients(self, loss, var_list=None,
gate_gradients=GATE_OP,
aggregation_method=None,
colocate_gradients_with_ops=False,
grad_loss=None):
return grads_and_vars
def apply_gradients(self, grads_and_vars, global_step=None, name=None):
with ops.init_scope():
self._create_slots(var_list)
update_ops = []
with ops.name_scope(name, self._name) as name:
self._prepare()
for grad, var, processor in converted_grads_and_vars:
with ops.name_scope("update_" + scope_name), ops.colocate_with(var):
update_ops.append(processor.update_op(self, grad))
with ops.control_dependencies([self._finish(update_ops, "update")]):
with ops.colocate_with(global_step):
apply_updates = state_ops.assign_add(
global_step, 1, name=name)
return apply_updates
def _create_slots(self, var_list):
pass
def _get_or_make_slot_with_initializer(self, var, initializer, shape, dtype,
slot_name, op_name):
new_slot_variable = slot_creator.create_slot_with_initializer(var, initializer, shape, dtype, op_name)
self._slot_dict(slot_name)[_var_key(var)] = new_slot_variable
return new_slot_variable
# slot_creator.py
def create_slot_with_initializer(primary, initializer, shape, dtype, name,
colocate_with_primary=True):
prefix = primary.op.name
with variable_scope.variable_scope(None, prefix + "/" + name):
with distribution_strategy.colocate_vars_with(primary):
return _create_slot_var(primary, initializer, "", validate_shape, shape,
dtype)
def _create_slot_var(primary, val, scope, validate_shape, shape, dtype):
current_partitioner = variable_scope.get_variable_scope().partitioner
slot = variable_scope.get_variable(
scope, initializer=val, trainable=False,
use_resource=resource_variable_ops.is_resource_variable(primary),
shape=shape, dtype=dtype,
validate_shape=validate_shape)
variable_scope.get_variable_scope().set_partitioner(current_partitioner)
return slot
2.3 slot
An optimizer slot in TensorFlow is an auxiliary variable that the optimizer creates and updates alongside a model variable. Each variable can have one or more slots storing its per-variable optimizer state; for example, AdamOptimizer uses two slots holding the first- and second-moment estimates of that variable's gradients. The slots are updated on every optimization step and used when computing the variable's update. More generally, slots are how algorithms such as Momentum, Adagrad and Adam keep whatever extra per-variable state their update rules need.
A concrete case:
Corresponding to the _create_slots() method of AdagradDecayOptimizer below, when the primary var is scope_emb/input_from_feature_columns/word_embedding/weights, the _slots field contains:
{'accumulator': {
(<tensorflow.python.framework.ops.Graph object at 0x000002871512F390>, 'scope_emb/input_from_feature_columns/word_embedding/weights'):
<tf.Variable 'OptimizeLoss/scope_emb/input_from_feature_columns/word_embedding/weights/AdagradDecay:0' shape=(1000, 10) dtype=float32_ref>
},
'accumulator_decay_power': {
(<tensorflow.python.framework.ops.Graph object at 0x000002871512F390>, 'scope_emb/input_from_feature_columns/word_embedding/weights'):
<tf.Variable 'OptimizeLoss/scope_emb/input_from_feature_columns/word_embedding/weights/AdagradDecay_1:0' shape=(1000, 10) dtype=int64_ref>
}
}
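To poke at slots on a stock optimizer, something like the following works (a sketch: the variable w and the toy loss are made up; Adam's slot names are "m" and "v"):
w = tf.get_variable("w", shape=[1000, 10])
loss = tf.reduce_sum(tf.square(w))
opt = tf.train.AdamOptimizer(0.001)
train_op = opt.minimize(loss)

print(opt.get_slot_names())    # ['m', 'v']
print(opt.get_slot(w, "m"))    # e.g. <tf.Variable 'w/Adam:0' shape=(1000, 10) ...>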
三. high level api
- tensorflow.contrib.layers.python.layers.optimizers.optimize_loss(loss, global_step, learning_rate, optimizer, clip_gradients, learning_rate_decay_fn, update_ops, variables, …)
  - optimizer: string, class, or Optimizer instance.
  - update_ops: list of update Operations to execute at each step. If None, uses the elements of the UPDATE_OPS collection. The order of execution between update_ops and loss is non-deterministic.
  - variables: list of variables to optimize, or None to use all trainable variables.
Simplified source:
def optimize_loss(...):
    # updates such as batch norm's moving_mean are executed here
    update_ops = set(ops.get_collection(ops.GraphKeys.UPDATE_OPS))
    loss = control_flow_ops.with_dependencies(list(update_ops), loss)
    gradients = opt.compute_gradients(loss, ...)
    gradients = _clip_gradients_by_norm(gradients, clip_gradients)
    grad_updates = opt.apply_gradients(gradients, global_step=global_step)
    train_tensor = control_flow_ops.with_dependencies([grad_updates], loss)
    return train_tensor
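A usage sketch (loss is assumed to be defined; the optimizer string, learning rate and clip value are arbitrary choices):
from tensorflow.contrib.layers import optimize_loss

global_step = tf.train.get_or_create_global_step()
train_op = optimize_loss(loss=loss,
                         global_step=global_step,
                         learning_rate=0.01,
                         optimizer="Adagrad",   # string, class, or Optimizer instance
                         clip_gradients=5.0)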
四. Common subclasses
4.1 GradientDescentOptimizer
- class GradientDescentOptimizer(optimizer.Optimizer)
  Implements plain gradient descent (usage sketch below).
- __init__(self, learning_rate)
  The learning rate is given to the constructor.
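A minimal usage sketch (loss assumed defined; plain SGD needs no slots, the update is just var -= learning_rate * grad):
opt = tf.train.GradientDescentOptimizer(learning_rate=0.1)
train_op = opt.minimize(loss)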
4.2 AdagradOptimizer
@tf_export("train.AdagradOptimizer")
class AdagradOptimizer(optimizer.Optimizer):

  def _create_slots(self, var_list):
    for v in var_list:
      dtype = v.dtype.base_dtype
      if v.get_shape().is_fully_defined():
        init = init_ops.constant_initializer(self._initial_accumulator_value,
                                             dtype=dtype)
      else:
        init = self._init_constant_op(v, dtype)
      # method from the Optimizer base class
      self._get_or_make_slot_with_initializer(v, init, v.get_shape(), dtype,
                                              "accumulator", self._name)
4.3 AdagradDecayOptimizer
class AdagradDecayOptimizer(optimizer.Optimizer):

  def _create_slots(self, var_list):
    for v in var_list:
      with ops.colocate_with(v):
        # init / v_shape / dtype are set up as in AdagradOptimizer (elided)
        self._get_or_make_slot_with_initializer(v, init, v_shape, dtype,
                                                "accumulator", self._name)
        self._get_or_make_slot_with_initializer(
            v, init_ops.zeros_initializer(self._global_step.dtype),
            v_shape, self._global_step.dtype, "accumulator_decay_power",
            self._name)
4.4 AdamOptimizer
- class AdamOptimizer(optimizer.Optimizer)
  Optimizer that implements the Adam algorithm, a variant of stochastic gradient descent. Per variable it keeps two slots, "m" and "v", for the first- and second-moment estimates (cf. 2.3); see the sketch below.
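A hedged numpy sketch of a single Adam step, showing what the two slots hold (hyper-parameter defaults follow tf.train.AdamOptimizer; the function name is made up):
import numpy as np

def adam_step(var, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad         # slot "m": 1st-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2    # slot "v": 2nd-moment estimate
    lr_t = lr * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    var = var - lr_t * m / (np.sqrt(v) + eps)
    return var, m, v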
五. Multiple optimizers side by side
Building a computation graph is like building with blocks: it splits into modules, and each module can naturally be driven by its own optimizer.
The minimize method above takes a var_list argument, so different optimizers can optimize different modules. How do we partition the parameters by module?
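One common answer is to filter the trainable variables by scope; a hedged sketch (the scope names "embedding" and "dnn", the optimizers and the learning rates are all made up for the example):
emb_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="embedding")
dnn_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope="dnn")

global_step = tf.train.get_or_create_global_step()
emb_op = tf.train.AdagradOptimizer(0.05).minimize(loss, var_list=emb_vars)
dnn_op = tf.train.AdamOptimizer(1e-3).minimize(loss, var_list=dnn_vars,
                                               global_step=global_step)
train_op = tf.group(emb_op, dnn_op)    # run both updates each step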
六. high api
- tensorflow.contrib.layers.optimize_loss(loss, global_step, learning_rate, optimizer, clip_gradients, variables)
  - variables: corresponds to the var_list parameter of compute_gradients(), so the same per-module split works here as well.