Autograd in C++ Frontend

Created On: Apr 01, 2020 | Last Updated: Jan 21, 2025 | Last Verified: Not Verified

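The ``autograd`` package is crucial for building highly flexible and dynamic neural networks in PyTorch. Most of the autograd APIs in the PyTorch Python frontend are also available in the C++ frontend, which makes it easy to translate autograd code from Python to C++.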

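In this tutorial, we explore several examples of doing autograd in the PyTorch C++ frontend. Note that this tutorial assumes that you already have a basic understanding of autograd in the Python frontend. If that is not the case, please first read `Autograd: Automatic Differentiation <https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html>`_.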

Basic autograd operations

(Adapted from `this tutorial <https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#autograd-automatic-differentiation>`_)

Create a tensor and set ``torch::requires_grad()`` to track computation with it:

auto x = torch::ones({2, 2}, torch::requires_grad());
std::cout << x << std::endl;

Output:

1 1
1 1
[ CPUFloatType{2,2} ]

Do a tensor operation:

auto y = x + 2;
std::cout << y << std::endl;

Output:

 3  3
 3  3
[ CPUFloatType{2,2} ]

``y`` was created as the result of an operation, so it has a ``grad_fn``.

std::cout << y.grad_fn()->name() << std::endl;

Output:

AddBackward1

Do more operations on ``y``:

auto z = y * y * 3;
auto out = z.mean();

std::cout << z << std::endl;
std::cout << z.grad_fn()->name() << std::endl;
std::cout << out << std::endl;
std::cout << out.grad_fn()->name() << std::endl;

Output:

 27  27
 27  27
[ CPUFloatType{2,2} ]
MulBackward1
27
[ CPUFloatType{} ]
MeanBackward0

``.requires_grad_(...)`` changes an existing tensor's ``requires_grad`` flag in-place.

auto a = torch::randn({2, 2});
a = ((a * 3) / (a - 1));
std::cout << a.requires_grad() << std::endl;

a.requires_grad_(true);
std::cout << a.requires_grad() << std::endl;

auto b = (a * a).sum();
std::cout << b.grad_fn()->name() << std::endl;

Output:

false
true
SumBackward0

Let's backprop now. Because ``out`` contains a single scalar, ``out.backward()`` is equivalent to ``out.backward(torch::tensor(1.))``.

out.backward();

Print the gradients d(out)/dx:

std::cout << x.grad() << std::endl;

Output:

 4.5000  4.5000
 4.5000  4.5000
[ CPUFloatType{2,2} ]

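You should get a matrix filled with ``4.5``. For an explanation of how we arrive at this value, please see `the corresponding section in this tutorial <https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html#gradients>`_.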
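The value can also be checked by hand. As a short derivation, writing $x_i$ for the entries of ``x`` and $z_i$ for the entries of ``z`` (index names chosen here only for illustration):

```latex
\begin{align*}
\mathit{out} &= \frac{1}{4}\sum_{i} z_i, \qquad z_i = 3\,(x_i + 2)^2, \\
\frac{\partial\, \mathit{out}}{\partial x_i} &= \frac{1}{4}\cdot 6\,(x_i + 2) = \frac{3}{2}\,(x_i + 2), \\
\left.\frac{\partial\, \mathit{out}}{\partial x_i}\right|_{x_i = 1} &= \frac{9}{2} = 4.5 .
\end{align*}
```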

Now let's take a look at an example of a vector-Jacobian product:

x = torch::randn(3, torch::requires_grad());

y = x * 2;
while (y.norm().item<double>() < 1000) {
  y = y * 2;
}

std::cout << y << std::endl;
std::cout << y.grad_fn()->name() << std::endl;

Output:

-1021.4020
  314.6695
 -613.4944
[ CPUFloatType{3} ]
MulBackward1

If we want the vector-Jacobian product, we pass the vector to ``backward`` as an argument:

auto v = torch::tensor({0.1, 1.0, 0.0001}, torch::kFloat);
y.backward(v);

std::cout << x.grad() << std::endl;

Output:

  102.4000
 1024.0000
    0.1024
[ CPUFloatType{3} ]
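Because ``y`` is obtained from ``x`` only by repeated doubling, ``y = s * x`` for some power of two ``s``. The Jacobian is therefore ``s`` times the identity, and the vector-Jacobian product is simply ``s * v``. The following standalone sketch (plain C++ without libtorch; the names ``scaled_norm`` and ``doubling_vjp`` are made up for this illustration) mirrors the loop above and computes that product by hand:

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Euclidean norm of scale * x.
double scaled_norm(const std::vector<double>& x, double scale) {
  double sum = 0.0;
  for (double xi : x) sum += (scale * xi) * (scale * xi);
  return std::sqrt(sum);
}

// Hypothetical helper: mirrors `y = x * 2; while (norm(y) < 1000) y = y * 2;`
// and returns v^T J, where J = dy/dx = scale * I.
// Assumes x has a nonzero norm.
std::vector<double> doubling_vjp(const std::vector<double>& x,
                                 const std::vector<double>& v) {
  double scale = 2.0;
  while (scaled_norm(x, scale) < 1000.0) scale *= 2.0;

  std::vector<double> result(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) result[i] = scale * v[i];
  return result;
}
```

For example, with ``x`` close to ``{-0.9975, 0.3073, -0.5991}`` the loop stops at ``scale = 1024``, so ``v = {0.1, 1.0, 0.0001}`` gives approximately ``{102.4, 1024.0, 0.1024}``.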

You can also stop autograd from tracking history on tensors that require gradients by placing a ``torch::NoGradGuard`` in a code block:

std::cout << x.requires_grad() << std::endl;
std::cout << x.pow(2).requires_grad() << std::endl;

{
  torch::NoGradGuard no_grad;
  std::cout << x.pow(2).requires_grad() << std::endl;
}

Output:

true
true
false

Or by using ``.detach()`` to get a new tensor with the same content that does not require gradients:

std::cout << x.requires_grad() << std::endl;
y = x.detach();
std::cout << y.requires_grad() << std::endl;
std::cout << x.eq(y).all().item<bool>() << std::endl;

Output:

true
false
true
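The scoped behavior of ``torch::NoGradGuard`` shown earlier is an instance of the C++ RAII idiom: the guard takes effect when it is constructed and stops when it goes out of scope at the closing brace. The following is a simplified illustration of that idiom only, not libtorch's actual implementation; the names ``grad_enabled``, ``ScopedNoGrad``, and ``flag_inside_guard`` are hypothetical:

```cpp
// Hypothetical thread-local flag standing in for "is gradient tracking on?".
thread_local bool grad_enabled = true;

// Hypothetical RAII guard: disables the flag on construction and restores
// the previous value on destruction, so nested guards compose correctly.
class ScopedNoGrad {
 public:
  ScopedNoGrad() : previous_(grad_enabled) { grad_enabled = false; }
  ~ScopedNoGrad() { grad_enabled = previous_; }

  // Non-copyable: a copy would restore the flag a second time.
  ScopedNoGrad(const ScopedNoGrad&) = delete;
  ScopedNoGrad& operator=(const ScopedNoGrad&) = delete;

 private:
  bool previous_;
};

// Returns the flag value observed while a guard is alive.
bool flag_inside_guard() {
  ScopedNoGrad guard;
  return grad_enabled;
}
```

Restoring the previous value, rather than unconditionally setting the flag back to ``true``, is what keeps an inner guard from re-enabling tracking inside an outer guard's scope.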

For more information on C++ tensor autograd APIs such as ``grad`` / ``requires_grad`` / ``is_leaf`` / ``backward`` / ``detach`` / ``detach_`` / ``register_hook`` / ``retain_grad``, please see `the corresponding C++ API docs <https://pytorch.org/cppdocs/api/classat_1_1_tensor.html>`_.

Computing higher-order gradients in C++

One application of higher-order gradients is calculating a gradient penalty. Here is an example that uses ``torch::autograd::grad``:

#include <torch/torch.h>

auto model = torch::nn::Linear(4, 3);

auto input = torch::randn({3, 4}).requires_grad_(true);
auto output = model(input);

// Calculate loss
auto target = torch::randn({3, 3});
auto loss = torch::nn::MSELoss()(output, target);

// Use norm of gradients as penalty
auto grad_output = torch::ones_like(output);
auto gradient = torch::autograd::grad({output}, {input}, /*grad_outputs=*/{grad_output}, /*create_graph=*/true)[0];
auto gradient_penalty = torch::pow((gradient.norm(2, /*dim=*/1) - 1), 2).mean();

// Add gradient penalty to loss
auto combined_loss = loss + gradient_penalty;
combined_loss.backward();

std::cout << input.grad() << std::endl;

Output:

-0.1042 -0.0638  0.0103  0.0723
-0.2543 -0.1222  0.0071  0.0814
-0.1683 -0.1052  0.0355  0.1024
[ CPUFloatType{3,4} ]

Please see the documentation for ``torch::autograd::backward`` and ``torch::autograd::grad`` for more information on how to use them.
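Written out, the penalty in the example above has the following form. This is only a restatement of the code, with symbols chosen here for illustration: $x_i$ is row $i$ of ``input``, $f$ is the model, $N$ is the number of rows (3 in the example), and $g_i$ is row $i$ of ``gradient``:

```latex
\begin{align*}
g_i &= \frac{\partial}{\partial x_i} \sum_{n,k} f(x)_{n,k}, \\
\text{gradient\_penalty} &= \frac{1}{N} \sum_{i=1}^{N} \left( \lVert g_i \rVert_2 - 1 \right)^2, \\
\text{combined\_loss} &= \text{loss} + \text{gradient\_penalty}.
\end{align*}
```

The sum over all output entries corresponds to passing ``torch::ones_like(output)`` as ``grad_outputs``. Because ``gradient`` is computed with ``create_graph=true``, it is itself part of the autograd graph, which is what allows ``combined_loss.backward()`` to differentiate through the penalty term.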

Using custom autograd function in C++

(Adapted from `this tutorial <https://pytorch.org/docs/stable/notes/extending.html#extending-torch-autograd>`_)

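Adding a new elementary operation to ``torch::autograd`` requires implementing a new ``torch::autograd::Function`` subclass for each operation. ``torch::autograd::Function`` is what ``torch::autograd`` uses to compute the results and gradients and to encode the operation history. Every new function requires you to implement two methods, ``forward`` and ``backward``; please see `this link <https://pytorch.org/cppdocs/api/structtorch_1_1autograd_1_1_function.html>`_ for the detailed requirements.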

Below is the code for a ``Linear`` function from ``torch::nn``:

#include <torch/torch.h>

using namespace torch::autograd;

// Inherit from Function
class LinearFunction : public Function<LinearFunction> {
 public:
  // Note that both forward and backward are static functions

  // bias is an optional argument
  static torch::Tensor forward(
      AutogradContext *ctx, torch::Tensor input, torch::Tensor weight, torch::Tensor bias = torch::Tensor()) {
    ctx->save_for_backward({input, weight, bias});
    auto output = input.mm(weight.t());
    if (bias.defined()) {
      output += bias.unsqueeze(0).expand_as(output);
    }
    return output;
  }

  static tensor_list backward(AutogradContext *ctx, tensor_list grad_outputs) {
    auto saved = ctx->get_saved_variables();
    auto input = saved[0];
    auto weight = saved[1];
    auto bias = saved[2];

    auto grad_output = grad_outputs[0];
    auto grad_input = grad_output.mm(weight);
    auto grad_weight = grad_output.t().mm(input);
    auto grad_bias = torch::Tensor();
    if (bias.defined()) {
      grad_bias = grad_output.sum(0);
    }

    return {grad_input, grad_weight, grad_bias};
  }
};

Then, we can use ``LinearFunction`` in the following way:

auto x = torch::randn({2, 3}).requires_grad_();
auto weight = torch::randn({4, 3}).requires_grad_();
auto y = LinearFunction::apply(x, weight);
y.sum().backward();

std::cout << x.grad() << std::endl;
std::cout << weight.grad() << std::endl;

Output:

5314  1.2807  1.4864
5314  1.2807  1.4864
[ CPUFloatType{2,3} ]
7608  0.9101  0.0073
7608  0.9101  0.0073
7608  0.9101  0.0073
7608  0.9101  0.0073
[ CPUFloatType{4,3} ]
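The repeated rows in this output follow from the backward formulas. ``y.sum().backward()`` passes a ``grad_output`` of all ones, so every row of ``grad_input = grad_output.mm(weight)`` is the column-wise sum of ``weight``, and every row of ``grad_weight = grad_output.t().mm(input)`` is the column-wise sum of ``x``. The sketch below reproduces this with plain nested vectors instead of tensors; it does not use libtorch, and the names ``Matrix``, ``matmul``, ``transpose``, ``LinearGrads``, and ``linear_backward`` are invented for this illustration:

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Plain matrix product of a (n x k) and b (k x m); assumes non-empty inputs.
Matrix matmul(const Matrix& a, const Matrix& b) {
  Matrix c(a.size(), std::vector<double>(b[0].size(), 0.0));
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t k = 0; k < b.size(); ++k)
      for (std::size_t j = 0; j < b[0].size(); ++j)
        c[i][j] += a[i][k] * b[k][j];
  return c;
}

// Transpose of a (n x m) into (m x n); assumes a non-empty input.
Matrix transpose(const Matrix& a) {
  Matrix t(a[0].size(), std::vector<double>(a.size(), 0.0));
  for (std::size_t i = 0; i < a.size(); ++i)
    for (std::size_t j = 0; j < a[0].size(); ++j)
      t[j][i] = a[i][j];
  return t;
}

struct LinearGrads {
  Matrix grad_input;
  Matrix grad_weight;
};

// Hypothetical helper that mirrors the two matrix products in
// LinearFunction::backward (the bias gradient is omitted here).
LinearGrads linear_backward(const Matrix& input, const Matrix& weight,
                            const Matrix& grad_output) {
  return {matmul(grad_output, weight),
          matmul(transpose(grad_output), input)};
}
```

Calling ``linear_backward`` with a 2x3 ``input``, a 4x3 ``weight``, and a 2x4 ``grad_output`` of ones returns a 2x3 ``grad_input`` and a 4x3 ``grad_weight``, each with identical rows, matching the shapes printed above.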

Here, we give an additional example of a function that is parametrized by non-tensor arguments:

#include <torch/torch.h>

using namespace torch::autograd;

class MulConstant : public Function<MulConstant> {
 public:
  static torch::Tensor forward(AutogradContext *ctx, torch::Tensor tensor, double constant) {
    // ctx is a context object that can be used to stash information
    // for backward computation
    ctx->saved_data["constant"] = constant;
    return tensor * constant;
  }

  static tensor_list backward(AutogradContext *ctx, tensor_list grad_outputs) {
    // We return as many input gradients as there were arguments.
    // Gradients of non-tensor arguments to forward must be `torch::Tensor()`.
    return {grad_outputs[0] * ctx->saved_data["constant"].toDouble(), torch::Tensor()};
  }
};

Then, we can use ``MulConstant`` in the following way:

auto x = torch::randn({2}).requires_grad_();
auto y = MulConstant::apply(x, 5.5);
y.sum().backward();

std::cout << x.grad() << std::endl;

Output:

 5.5000
 5.5000
[ CPUFloatType{2} ]

For more information on ``torch::autograd::Function``, please see `its documentation <https://pytorch.org/cppdocs/api/structtorch_1_1autograd_1_1_function.html>`_.

Translating autograd code from Python to C++

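At a high level, the easiest way to use autograd in C++ is to first get the autograd code working in Python, and then translate it from Python to C++ using the following table: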

Python	C++
`torch.autograd.backward`	`torch::autograd::backward`
`torch.autograd.grad`	`torch::autograd::grad`
`torch.Tensor.detach`	`torch::Tensor::detach`
`torch.Tensor.detach_`	`torch::Tensor::detach_`
`torch.Tensor.backward`	`torch::Tensor::backward`
`torch.Tensor.register_hook`	`torch::Tensor::register_hook`
`torch.Tensor.requires_grad`	`torch::Tensor::requires_grad_`
`torch.Tensor.retain_grad`	`torch::Tensor::retain_grad`
`torch.Tensor.grad`	`torch::Tensor::grad`
`torch.Tensor.grad_fn`	`torch::Tensor::grad_fn`
`torch.Tensor.set_data`	`torch::Tensor::set_data`
`torch.Tensor.data`	`torch::Tensor::data`
`torch.Tensor.output_nr`	`torch::Tensor::output_nr`
`torch.Tensor.is_leaf`	`torch::Tensor::is_leaf`

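After translation, most of your Python autograd code should just work in C++. If that is not the case, please file a bug report at `GitHub issues <https://github.com/pytorch/pytorch/issues>`_ and we will fix it as soon as possible.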

Conclusion

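You should now have a good overview of PyTorch's C++ autograd API. You can find the code examples shown in this note `here <https://github.com/pytorch/examples/tree/master/cpp/autograd>`_. If you run into any problems or have questions, you can use our `forum <https://discuss.pytorch.org/>`_ or `GitHub issues <https://github.com/pytorch/pytorch/issues>`_ to get in touch.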
