PyTorch: do you need with torch.no_grad() to freeze part of a network's parameters?


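In PyTorch, torch.no_grad() is a context manager that disables gradient tracking for a block of code. Operations executed inside torch.no_grad() are not recorded in the autograd graph. Their results therefore do not require grad, and no gradient can later be computed through them. Skipping this bookkeeping saves memory and computation.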
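Here is a minimal sketch, using toy tensors that are not from the original article, of what the context manager changes. The same operation is recorded outside the block and not recorded inside it. Grad mode is restored when the block exits.

```python
import torch

x = torch.ones(3, requires_grad=True)

# Outside no_grad, the multiplication is recorded in the autograd graph
y = x * 2
print(y.requires_grad, y.grad_fn is not None)  # True True

# Inside no_grad, the same operation is not recorded
with torch.no_grad():
    grad_inside = torch.is_grad_enabled()
    z = x * 2
print(grad_inside)                             # False
print(z.requires_grad, z.grad_fn is None)      # False True
print(torch.is_grad_enabled())                 # True: restored on exit
```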

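Typical situations for torch.no_grad() include:

Testing the model: when evaluating the whole model or part of it, we do not need gradients, because these operations do not affect training. torch.no_grad() fits this case.
Freezing model parameters: sometimes certain parameters must stay fixed. For example, when fine-tuning a pretrained model we may want to update only some parameters and keep the rest unchanged. Running the forward pass of the fixed part inside torch.no_grad() keeps gradients from reaching those parameters, so they are effectively frozen.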
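Below is a minimal sketch of both cases. It assumes a hypothetical two-part model: backbone stands in for a pretrained feature extractor, and head is a newly added layer. The frozen backbone runs inside torch.no_grad(), and only the head's parameters go to the optimizer. This works because the no_grad block sits before every trainable layer. The last lines show the evaluation case.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical setup: "pretrained" backbone to keep fixed, new head to train
backbone = nn.Linear(10, 5)
head = nn.Linear(5, 1)
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)

inputs = torch.randn(3, 10)
labels = torch.tensor([[1.0], [0.0], [1.0]])
backbone_before = backbone.weight.detach().clone()

# The frozen part runs inside no_grad: no graph is recorded for it
with torch.no_grad():
    features = torch.relu(backbone(inputs))

# The trainable head runs normally, so the loss still has a grad_fn
logits = head(features)
loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)

optimizer.zero_grad()
loss.backward()
optimizer.step()

print(backbone.weight.grad)                           # None
print(torch.equal(backbone.weight, backbone_before))  # True

# Evaluation: the whole forward pass runs under no_grad
backbone.eval()
head.eval()
with torch.no_grad():
    preds = torch.sigmoid(head(torch.relu(backbone(inputs))))
print(preds.requires_grad)                            # False
```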

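In PyTorch, freezing part of a network does not necessarily require torch.no_grad(). If we set requires_grad = False on the parameters to be frozen, autograd computes no gradients for them and their .grad stays None. The optimizer therefore does not update them, and torch.no_grad() is unnecessary.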
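Here is a minimal sketch of the requires_grad approach. It uses a hypothetical nn.Sequential model in which the first Linear layer plays the role of the frozen part. Passing only the trainable parameters to the optimizer is a common convention, not a requirement.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 1))

# Freeze the first Linear layer (index 0) by turning off requires_grad
for param in model[0].parameters():
    param.requires_grad_(False)

# Hand only the trainable parameters to the optimizer
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.1)

inputs = torch.randn(3, 10)
labels = torch.tensor([[1.0], [0.0], [1.0]])
frozen_before = model[0].weight.detach().clone()

loss = nn.functional.binary_cross_entropy_with_logits(model(inputs), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

for name, param in model.named_parameters():
    print(name, param.requires_grad, param.grad is None)
# 0.weight False True
# 0.bias False True
# 2.weight True False
# 2.bias True False
print(torch.equal(model[0].weight, frozen_before))  # True
```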

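However, autograd records every operation that has at least one input requiring grad. Some forward passes never need gradients, such as evaluation, or a frozen part at the very front of the network. For these, wrapping the computation in torch.no_grad() guarantees that no graph is built and no intermediate activations are kept for backward. To save memory and compute, it is therefore advisable to use torch.no_grad() in those forward passes.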
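The sketch below uses a toy frozen layer and inputs, assumed for illustration. It shows when autograd actually records work for a frozen layer. Whether a graph is built depends on whether any input requires grad, not only on the layer's own parameters. A frozen layer fed by a plain input records nothing even without torch.no_grad(). A frozen layer fed by a tensor that requires grad is recorded, because gradients must pass through it. In that second case torch.no_grad() does save memory, but only by cutting off everything before it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A frozen layer: its parameters do not require grad
frozen = nn.Linear(10, 5).requires_grad_(False)

# Case 1: the input does not require grad either (frozen part at the front)
x = torch.randn(3, 10)
h1 = frozen(x)
print(h1.grad_fn is None)       # True: nothing to track, even without no_grad

# Case 2: the input requires grad (frozen part behind trainable layers)
x_req = torch.randn(3, 10, requires_grad=True)
h2 = frozen(x_req)
print(h2.grad_fn is not None)   # True: recorded so gradients can reach x_req

# Same call under no_grad: nothing is recorded, and the graph is cut
with torch.no_grad():
    h3 = frozen(x_req)
print(h3.grad_fn is None)       # True

# Backpropagating through h2 still reaches the input, but not the frozen weights
h2.sum().backward()
print(x_req.grad is not None, frozen.weight.grad is None)  # True True
```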

The following example tries to freeze part of the network with torch.no_grad():

import torch
import torch.nn as nn

# A small two-layer network
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, 5)
        self.fc2 = nn.Linear(5, 1)

    def forward(self, x):
        x = self.fc1(x)
        x = nn.functional.relu(x)
        # Attempt to "freeze" fc2: its output is computed without recording a graph
        with torch.no_grad():
            x = self.fc2(x)
        return x

# Random inputs and binary labels
inputs = torch.randn(3, 10)
labels = torch.tensor([[1.0], [0.0], [1.0]])

# Create the network (no optimizer is needed for this demonstration)
net = Net()

# Forward pass; outputs are raw logits
outputs = net(inputs)

# binary_cross_entropy_with_logits applies the sigmoid itself,
# so the logits are passed in directly
loss = nn.functional.binary_cross_entropy_with_logits(outputs, labels)

# Backward pass: this line raises the RuntimeError shown below
loss.backward()

for name, param in net.named_parameters():
    print(name, param.grad)


Output:
Error: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

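In the code above we define a simple Net and run the fc2 layer inside torch.no_grad(). We then compute the loss, call backward, and try to print each parameter's gradient, but the print loop is never reached. Because fc2's forward pass runs under torch.no_grad(), its output has no grad_fn and does not require grad. The network output, and with it the loss, is therefore disconnected from the computation graph. loss.backward() has nothing to backpropagate through and raises the RuntimeError. The problem is not limited to fc2: fc1, which comes before fc2, cannot receive a gradient either, because the graph is cut at fc2.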

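My recommendation:

Use with torch.no_grad() only for testing (inference). To freeze parameters, simply set requires_grad = False.

with torch.no_grad() cuts the computation graph, so during backpropagation the layers in front of it cannot receive gradients. If the loss itself ends up outside the graph, backward() fails with RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.

So if you do use with torch.no_grad(), only put it around the front layers of the network, never around later ones. In the example above, wrapping fc2 keeps gradients from propagating back to fc1 and causes the error.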
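Following the recommendation above, here is a sketch of the first Net with fc2 frozen through requires_grad_(False) instead of torch.no_grad(). The random seed is added only for reproducibility. Because fc2's forward pass is still recorded, gradients flow through it to fc1 and backward() succeeds, while fc2 itself receives no gradient.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 5)
        self.fc2 = nn.Linear(5, 1)
        # Freeze fc2 through requires_grad instead of wrapping it in no_grad
        self.fc2.requires_grad_(False)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        return self.fc2(x)  # still recorded, so gradients can reach fc1

torch.manual_seed(0)
inputs = torch.randn(3, 10)
labels = torch.tensor([[1.0], [0.0], [1.0]])

net = Net()
loss = nn.functional.binary_cross_entropy_with_logits(net(inputs), labels)
loss.backward()  # no RuntimeError this time

for name, param in net.named_parameters():
    print(name, param.grad is None)
# fc1.weight False
# fc1.bias False
# fc2.weight True
# fc2.bias True
```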

If we move torch.no_grad() to fc1 instead, there is no error:

import torch
import torch.nn as nn

# A small two-layer network, this time with fc1 "frozen"
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(10, 5)
        self.fc2 = nn.Linear(5, 1)

    def forward(self, x):
        # fc1 runs without recording a graph, so it receives no gradient
        with torch.no_grad():
            x = self.fc1(x)
        x = nn.functional.relu(x)
        # fc2 is recorded as usual, so the loss still has a grad_fn
        x = self.fc2(x)
        return x

# Random inputs and binary labels
inputs = torch.randn(3, 10)
labels = torch.tensor([[1.0], [0.0], [1.0]])

# Create the network (no optimizer is needed for this demonstration)
net = Net()

# Forward pass; outputs are raw logits
outputs = net(inputs)

# binary_cross_entropy_with_logits applies the sigmoid itself,
# so the logits are passed in directly
loss = nn.functional.binary_cross_entropy_with_logits(outputs, labels)

# Backward pass: succeeds, and gradients reach fc2 only
loss.backward()

for name, param in net.named_parameters():
    print(name, param.grad)

Output (the exact values differ from run to run because the inputs are random):
fc1.weight None
fc1.bias None
fc2.weight tensor([[ 0.0246,  0.0000, -0.0222,  0.0000, -0.0331]])
fc2.bias tensor([-0.0125])
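Freezing fc1 this way works because its .grad is never populated. The sketch below assumes a plain torch.optim.SGD optimizer that is deliberately given all of net.parameters(). The built-in optimizers skip parameters whose .grad is None, so fc1 stays unchanged across steps while fc2 is updated. This relies on fc1 never having received a gradient. If a parameter already holds a zero-filled gradient tensor (for example after zero_grad(set_to_none=False)), momentum or weight decay can still change it, which is one more reason to prefer requires_grad = False.

```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 5)
        self.fc2 = nn.Linear(5, 1)

    def forward(self, x):
        with torch.no_grad():  # fc1 is frozen by cutting the graph here
            x = self.fc1(x)
        x = torch.relu(x)
        return self.fc2(x)

torch.manual_seed(0)
inputs = torch.randn(3, 10)
labels = torch.tensor([[1.0], [0.0], [1.0]])

net = Net()
# All parameters are handed to the optimizer on purpose
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

fc1_before = net.fc1.weight.detach().clone()
fc2_bias_before = net.fc2.bias.detach().clone()

for _ in range(3):
    optimizer.zero_grad()
    loss = nn.functional.binary_cross_entropy_with_logits(net(inputs), labels)
    loss.backward()
    optimizer.step()

print(net.fc1.weight.grad is None)                   # True: fc1 never got a grad
print(torch.equal(net.fc1.weight, fc1_before))       # True: fc1 unchanged
print(torch.equal(net.fc2.bias, fc2_bias_before))    # False: fc2 was updated
```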