.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "recipes/torch_export_aoti_python.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_recipes_torch_export_aoti_python.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_recipes_torch_export_aoti_python.py:

.. meta::
   :description: An end-to-end example of how to use AOTInductor for Python runtime.
   :keywords: torch.export, AOTInductor, torch._inductor.aoti_compile_and_package, aot_compile, torch._inductor.aoti_load_package

``torch.export`` AOTInductor Tutorial for Python runtime (Beta)
================================================================
**Authors:** Ankith Gunapal, Bin Bao, Angela Yi

.. GENERATED FROM PYTHON SOURCE LINES 14-33

.. warning::

    ``torch._inductor.aoti_compile_and_package`` and
    ``torch._inductor.aoti_load_package`` are in Beta status and are subject
    to backward-compatibility-breaking changes. This tutorial provides an
    example of how to use these APIs for model deployment with a Python
    runtime.

It has been shown `previously
<https://pytorch.org/docs/stable/torch.compiler_aot_inductor.html>`__ how
AOTInductor can be used to do Ahead-of-Time compilation of PyTorch exported
models by creating an artifact that can be run in a non-Python environment.
In this tutorial, you will work through an end-to-end example of how to use
AOTInductor with a Python runtime.

**Contents**

.. contents::
    :local:

.. GENERATED FROM PYTHON SOURCE LINES 36-41

Prerequisites
-------------
* PyTorch 2.6 or later
* Basic understanding of ``torch.export`` and AOTInductor
* Complete the `AOTInductor: Ahead-Of-Time Compilation for Torch.Export-ed Models
  <https://pytorch.org/docs/stable/torch.compiler_aot_inductor.html>`_ tutorial

.. GENERATED FROM PYTHON SOURCE LINES 43-49

What you will learn
----------------------
* How to use AOTInductor for a Python runtime.
* How to use :func:`torch._inductor.aoti_compile_and_package` along with
  :func:`torch.export.export` to generate a compiled artifact
* How to load and run the artifact in a Python runtime using
  :func:`torch._inductor.aoti_load_package`.
* When to use AOTInductor with a Python runtime

.. GENERATED FROM PYTHON SOURCE LINES 51-72

Model Compilation
-----------------

We will use the TorchVision pretrained ``ResNet18`` model as an example.

The first step is to export the model to a graph representation using
:func:`torch.export.export`. To learn more about using this function, you can
check out the `docs <https://pytorch.org/docs/stable/export.html>`_ or the
`tutorial <https://pytorch.org/tutorials/intermediate/torch_export_tutorial.html>`_.

Once we have exported the PyTorch model and obtained an ``ExportedProgram``,
we can use :func:`torch._inductor.aoti_compile_and_package` to compile the
program with AOTInductor for a specified device, and save the generated
contents into a ``.pt2`` artifact.

.. note::

      This API supports the same available options that :func:`torch.compile`
      has, such as ``mode`` and ``max_autotune`` (for those who want to enable
      CUDA graphs and leverage Triton-based matrix multiplications and
      convolutions).

.. GENERATED FROM PYTHON SOURCE LINES 72-103
.. code-block:: default


    import os
    import torch
    import torch._inductor
    from torchvision.models import ResNet18_Weights, resnet18

    model = resnet18(weights=ResNet18_Weights.DEFAULT)
    model.eval()

    with torch.inference_mode():
        inductor_configs = {}

        # Enable autotuned Triton kernels only when a GPU is available
        if torch.cuda.is_available():
            device = "cuda"
            inductor_configs["max_autotune"] = True
        else:
            device = "cpu"

        model = model.to(device=device)
        example_inputs = (torch.randn(2, 3, 224, 224, device=device),)

        exported_program = torch.export.export(
            model,
            example_inputs,
        )
        path = torch._inductor.aoti_compile_and_package(
            exported_program,
            package_path=os.path.join(os.getcwd(), "resnet18.pt2"),
            inductor_configs=inductor_configs,
        )

.. GENERATED FROM PYTHON SOURCE LINES 104-163

The result of :func:`aoti_compile_and_package` is an artifact "resnet18.pt2"
which can be loaded and executed in Python and C++.

The artifact itself contains the AOTInductor-generated code, such as a
generated C++ runner file, a shared library compiled from the C++ file, and
CUDA binary files, also known as cubin files, if optimizing for CUDA.

The artifact is a structured ``.zip`` file, with the following layout:

.. code::

    .
    ├── archive_format
    ├── version
    ├── data
    │   ├── aotinductor
    │   │   └── model
    │   │       ├── xxx.cpp            # AOTInductor generated cpp file
    │   │       ├── xxx.so             # AOTInductor generated shared library
    │   │       ├── xxx.cubin          # Cubin files (if running on CUDA)
    │   │       └── xxx_metadata.json  # Additional metadata to save
    │   ├── weights
    │   │   └── TBD
    │   └── constants
    │       └── TBD
    └── extra
        └── metadata.json

We can use the following command to inspect the artifact contents:

.. code:: bash

    $ unzip -l resnet18.pt2

.. code::

    Archive:  resnet18.pt2
      Length      Date    Time    Name
    ---------  ---------- -----   ----
            1  01-08-2025 16:40   version
            3  01-08-2025 16:40   archive_format
        10088  01-08-2025 16:40   data/aotinductor/model/cagzt6akdaczvxwtbvqe34otfe5jlorktbqlojbzqjqvbfsjlge4.cubin
        17160  01-08-2025 16:40   data/aotinductor/model/c6oytfjmt5w4c7onvtm6fray7clirxt7q5xjbwx3hdydclmwoujz.cubin
        16616  01-08-2025 16:40   data/aotinductor/model/c7ydp7nocyz323hij4tmlf2kcedmwlyg6r57gaqzcsy3huneamu6.cubin
        17776  01-08-2025 16:40   data/aotinductor/model/cyqdf46ordevqhiddvpdpp3uzwatfbzdpl3auj2nx23uxvplnne2.cubin
        10856  01-08-2025 16:40   data/aotinductor/model/cpzfebfgrusqslui7fxsuoo4tvwulmrxirc5tmrpa4mvrbdno7kn.cubin
        14608  01-08-2025 16:40   data/aotinductor/model/c5ukeoz5wmaszd7vczdz2qhtt6n7tdbl3b6wuy4rb2se24fjwfoy.cubin
        11376  01-08-2025 16:40   data/aotinductor/model/csu3nstcp56tsjfycygaqsewpu64l5s6zavvz7537cm4s4cv2k3r.cubin
        10984  01-08-2025 16:40   data/aotinductor/model/cp76lez4glmgq7gedf2u25zvvv6rksv5lav4q22dibd2zicbgwj3.cubin
        14736  01-08-2025 16:40   data/aotinductor/model/c2bb5p6tnwz4elgujqelsrp3unvkgsyiv7xqxmpvuxcm4jfl7pc2.cubin
        11376  01-08-2025 16:40   data/aotinductor/model/c6eopmb2b4ngodwsayae4r5q6ni3jlfogfbdk3ypg56tgpzhubfy.cubin
        11624  01-08-2025 16:40   data/aotinductor/model/chmwe6lvoekzfowdbiizitm3haiiuad5kdm6sd2m6mv6dkn2zk32.cubin
        15632  01-08-2025 16:40   data/aotinductor/model/c3jop5g344hj3ztsu4qm6ibxyaaerlhkzh2e6emak23rxfje6jam.cubin
        25472  01-08-2025 16:40   data/aotinductor/model/chaiixybeiuuitm2nmqnxzijzwgnn2n7uuss4qmsupgblfh3h5hk.cubin
       139389  01-08-2025 16:40   data/aotinductor/model/cvk6qzuybruhwxtfblzxiov3rlrziv5fkqc4mdhbmantfu3lmd6t.cpp
           27  01-08-2025 16:40   data/aotinductor/model/cvk6qzuybruhwxtfblzxiov3rlrziv5fkqc4mdhbmantfu3lmd6t_metadata.json
     47195424  01-08-2025 16:40   data/aotinductor/model/cvk6qzuybruhwxtfblzxiov3rlrziv5fkqc4mdhbmantfu3lmd6t.so
    ---------                     -------
     47523148                     18 files
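The same inspection can also be done from Python. The snippet below is a small
sketch that is not part of the original recipe: it uses the standard-library
``zipfile`` module to list every entry in the package together with its
uncompressed size.

.. code-block:: python

    # A small sketch (not part of the original recipe): list the contents of
    # the ".pt2" package with Python's standard-library zipfile module.
    import os
    import zipfile

    pt2_path = os.path.join(os.getcwd(), "resnet18.pt2")

    with zipfile.ZipFile(pt2_path) as archive:
        for entry in archive.infolist():
            # Print the uncompressed size and the path of each archive member
            print(f"{entry.file_size:>10}  {entry.filename}")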
.. GENERATED FROM PYTHON SOURCE LINES 166-171

Model Inference in Python
-------------------------

To load and run the artifact in Python, we can use
:func:`torch._inductor.aoti_load_package`.

.. GENERATED FROM PYTHON SOURCE LINES 171-185

.. code-block:: default


    import os
    import torch
    import torch._inductor

    # Select the same device the model was compiled for above
    device = "cuda" if torch.cuda.is_available() else "cpu"

    model_path = os.path.join(os.getcwd(), "resnet18.pt2")

    compiled_model = torch._inductor.aoti_load_package(model_path)
    example_inputs = (torch.randn(2, 3, 224, 224, device=device),)

    with torch.inference_mode():
        output = compiled_model(example_inputs)

.. GENERATED FROM PYTHON SOURCE LINES 186-207

When to use AOTInductor with a Python Runtime
---------------------------------------------

There are mainly two reasons why one would use AOTInductor with a Python Runtime:

-  ``torch._inductor.aoti_compile_and_package`` generates a singular
   serialized artifact. This is useful for model versioning in deployments and
   for tracking model performance over time.
-  With :func:`torch.compile` being a JIT compiler, there is a warmup cost
   associated with the first compilation. Your deployment needs to account for
   the compilation time taken for the first inference. With AOTInductor, the
   compilation is done ahead of time using ``torch.export.export`` and
   ``torch._inductor.aoti_compile_and_package``. At deployment time, after
   loading the model, running inference does not incur any additional
   compilation cost.

The section below shows the speedup achieved with AOTInductor for the first
inference.

We define a utility function ``timed`` to measure the time taken for inference.

.. GENERATED FROM PYTHON SOURCE LINES 207-236

.. code-block:: default


    import time


    def timed(fn):
        # Returns the result of running `fn()` and the time it took for `fn()` to run,
        # in milliseconds. We use CUDA events and synchronization for accurate
        # measurement on CUDA enabled devices.
        if torch.cuda.is_available():
            start = torch.cuda.Event(enable_timing=True)
            end = torch.cuda.Event(enable_timing=True)
            start.record()
        else:
            start = time.time()

        result = fn()
        if torch.cuda.is_available():
            end.record()
            torch.cuda.synchronize()
        else:
            end = time.time()

        # Measure time taken to execute the function in milliseconds
        if torch.cuda.is_available():
            duration = start.elapsed_time(end)
        else:
            duration = (end - start) * 1000

        return result, duration

.. GENERATED FROM PYTHON SOURCE LINES 237-238

Let's measure the time for first inference using AOTInductor.

.. GENERATED FROM PYTHON SOURCE LINES 238-249

.. code-block:: default


    torch._dynamo.reset()

    model = torch._inductor.aoti_load_package(model_path)
    example_inputs = (torch.randn(1, 3, 224, 224, device=device),)

    with torch.inference_mode():
        _, time_taken = timed(lambda: model(example_inputs))
        print(f"Time taken for first inference for AOTInductor is {time_taken:.2f} ms")

.. GENERATED FROM PYTHON SOURCE LINES 250-251

Let's measure the time for first inference using ``torch.compile``.

.. GENERATED FROM PYTHON SOURCE LINES 251-264

.. code-block:: default


    torch._dynamo.reset()

    model = resnet18(weights=ResNet18_Weights.DEFAULT).to(device)
    model.eval()

    model = torch.compile(model)
    example_inputs = torch.randn(1, 3, 224, 224, device=device)

    with torch.inference_mode():
        _, time_taken = timed(lambda: model(example_inputs))
        print(f"Time taken for first inference for torch.compile is {time_taken:.2f} ms")

.. GENERATED FROM PYTHON SOURCE LINES 265-267

We see that there is a drastic speedup in first inference time using
AOTInductor compared to ``torch.compile``.
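First-inference time is only part of the picture; a deployment may also care
about steady-state latency once both models are warm. The snippet below is a
sketch that is not part of the original recipe: it reuses the ``timed`` helper
and the ``model_path`` and ``device`` names defined above, and the choice of
ten timed iterations is an arbitrary assumption.

.. code-block:: python

    # A sketch (not part of the original recipe): compare warm, steady-state
    # latency after the first inference, reusing ``timed``, ``model_path``,
    # ``device``, and the imports defined above.
    import statistics

    aoti_model = torch._inductor.aoti_load_package(model_path)
    compiled_eager_model = torch.compile(
        resnet18(weights=ResNet18_Weights.DEFAULT).to(device).eval()
    )

    example_inputs = (torch.randn(1, 3, 224, 224, device=device),)

    with torch.inference_mode():
        # Warm up both models so compilation and caching costs are excluded
        aoti_model(example_inputs)
        compiled_eager_model(*example_inputs)

        # Time ten warm runs of each model (durations are in milliseconds)
        aoti_ms = [timed(lambda: aoti_model(example_inputs))[1] for _ in range(10)]
        compile_ms = [timed(lambda: compiled_eager_model(*example_inputs))[1] for _ in range(10)]

    print(f"AOTInductor median warm latency: {statistics.median(aoti_ms):.2f} ms")
    print(f"torch.compile median warm latency: {statistics.median(compile_ms):.2f} ms")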
.. GENERATED FROM PYTHON SOURCE LINES 269-277

Conclusion
----------

In this recipe, we have learned how to effectively use AOTInductor for a
Python runtime by compiling and loading a pretrained ``ResNet18`` model. This
process demonstrates the practical application of generating a compiled
artifact and running it within a Python environment. We also looked at the
advantage of using AOTInductor in model deployments, with regard to the
speedup in first-inference time.


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.000 seconds)


.. _sphx_glr_download_recipes_torch_export_aoti_python.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: torch_export_aoti_python.py <torch_export_aoti_python.py>`

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: torch_export_aoti_python.ipynb <torch_export_aoti_python.ipynb>`

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_