(beta) Running the compiled optimizer with an LR Scheduler¶

Created On: May 21, 2024 | Last Updated: May 21, 2024 | Last Verified: Nov 05, 2024

Author: Michael Lazos

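The optimizer is a key algorithm for training any deep learning model. In this example, we show how to pair an optimizer compiled with ``torch.compile`` with an LR scheduler to accelerate training convergence.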

Note

This tutorial requires PyTorch 2.3.0 or later.

Model Setup¶

For this example, we'll use a simple sequence of linear layers.

import torch

# Create simple model
model = torch.nn.Sequential(
    *[torch.nn.Linear(1024, 1024, bias=False, device="cuda") for _ in range(10)]
)
input = torch.rand(1024, device="cuda")

# run forward pass
output = model(input)

# run backward to populate the grads for our optimizer below
output.sum().backward()

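Setting up and running the compiled optimizer with LR Scheduler¶

In this section, we'll use the Adam optimizer with the LinearLR scheduler and create a helper function that wraps the ``step()`` call of each in ``torch.compile()``.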

Note

``torch.compile`` is only supported on CUDA devices with a compute capability of 7.0 or higher.

# exit cleanly if we are on a device that doesn't support ``torch.compile``
if torch.cuda.get_device_capability() < (7, 0):
    print("Exiting because torch.compile is not supported on this device.")
    import sys
    sys.exit(0)

# !!! IMPORTANT !!! Wrap the lr in a Tensor if we are pairing
# the optimizer with an LR Scheduler.
# Without this, torch.compile will recompile as the value of the LR
# changes.
opt = torch.optim.Adam(model.parameters(), lr=torch.tensor(0.01))
sched = torch.optim.lr_scheduler.LinearLR(opt, total_iters=5)

@torch.compile(fullgraph=False)
def fn():
    opt.step()
    sched.step()


# Warmup runs to compile the function
for _ in range(5):
    fn()
    print(opt.param_groups[0]["lr"])
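
As a sanity check on the values printed by the loop above, the schedule can be reproduced in plain Python. This is a minimal sketch, assuming LinearLR's documented defaults (``start_factor=1/3``, ``end_factor=1.0``) and its closed-form interpolation; the helper name ``linear_lr_schedule`` is hypothetical and not part of PyTorch.

```python
def linear_lr_schedule(base_lr, start_factor=1.0 / 3, end_factor=1.0,
                       total_iters=5, num_steps=5):
    """Return the learning rate after each of `num_steps` scheduler steps.

    Hypothetical helper that mirrors the linear interpolation LinearLR is
    documented to perform; it does not call into PyTorch.
    """
    lrs = []
    for step in range(1, num_steps + 1):
        # The factor stops changing once `total_iters` steps have been taken.
        progress = min(step, total_iters) / total_iters
        factor = start_factor + (end_factor - start_factor) * progress
        lrs.append(base_lr * factor)
    return lrs


# Same settings as the tutorial: lr=0.01, total_iters=5, five calls to fn().
expected_lrs = linear_lr_schedule(0.01)
for lr in expected_lrs:
    print(f"{lr:.4f}")
# 0.0047
# 0.0060
# 0.0073
# 0.0087
# 0.0100
```

The tensor-valued LR printed by the tutorial loop should follow the same ramp, up to tensor formatting and float32 rounding: it starts below the base LR and reaches 0.01 after ``total_iters`` steps.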

Extension: What happens with a non-tensor LR?¶

For the curious, we show what happens with ``torch.compile`` when the LR is not wrapped in a tensor.

# No longer wrap the LR in a tensor here
opt = torch.optim.Adam(model.parameters(), lr=0.01)
sched = torch.optim.lr_scheduler.LinearLR(opt, total_iters=5)

@torch.compile(fullgraph=False)
def fn():
    opt.step()
    sched.step()

# Setup logging to view recompiles
torch._logging.set_logs(recompiles=True)

# Warmup runs to compile the function
# We will now recompile on each iteration
# as the value of the lr is mutated.
for _ in range(5):
    fn()

With this example, we can see that the optimizer is recompiled several times because of guard failures on the ``lr`` in ``param_groups[0]``.
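
The difference between the two cases can be pictured with a toy cache. The sketch below is purely illustrative: ``ToyCompiler`` and ``ToyTensor`` are hypothetical classes, not PyTorch APIs, and they do not reflect how ``torch.compile`` is implemented. The assumption being modeled is that a Python float LR is specialized as a constant (so the guard depends on its value), while a tensor LR is treated as an input whose guard depends only on metadata such as dtype and shape.

```python
class ToyCompiler:
    """Toy stand-in for a compile cache; NOT how torch.compile works internally."""

    def __init__(self):
        self.cache = {}
        self.compile_count = 0

    def run(self, guard_key):
        # A cache miss models a guard failure followed by a recompile.
        if guard_key not in self.cache:
            self.compile_count += 1
            self.cache[guard_key] = f"compiled artifact #{self.compile_count}"
        return self.cache[guard_key]


class ToyTensor:
    """Hypothetical mutable scalar holder; only its metadata is guarded."""

    def __init__(self, value):
        self.value = value
        self.dtype = "float32"
        self.shape = ()


# Five distinct LR values, as a scheduler would produce across iterations.
lrs = [0.01 * (1 / 3 + (2 / 3) * step / 5) for step in range(5)]

# Case 1: float LR baked in as a constant, so the guard key is the value itself.
float_compiler = ToyCompiler()
for lr in lrs:
    float_compiler.run(("lr_value", lr))

# Case 2: tensor LR treated as an input; the guard key is dtype and shape only.
tensor_compiler = ToyCompiler()
lr_tensor = ToyTensor(lrs[0])
for lr in lrs:
    lr_tensor.value = lr  # value changes in place, metadata stays the same
    tensor_compiler.run(("lr_meta", lr_tensor.dtype, lr_tensor.shape))

print(float_compiler.compile_count)   # 5
print(tensor_compiler.compile_count)  # 1
```

In this toy model every new float value misses the cache, while the tensor case compiles once. The number of recompiles reported by the real recompile logs may differ from the toy count.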

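Conclusion¶

In this tutorial, we showed how to pair an optimizer compiled with ``torch.compile`` with an LR scheduler to accelerate training convergence. We used a model made of a simple sequence of linear layers, with the Adam optimizer paired with a LinearLR scheduler, to demonstrate the LR changing across iterations.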

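See also:

Compiled optimizer tutorial - An introduction to the compiled optimizer.
Compiling the optimizer with PT2 - Deeper technical details on the compiled optimizer.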
