Ivy as a Transpiler

On the Building Blocks page, we explored the role of the backend functional APIs, the Ivy functional API, the backend handler and the graph compiler. These parts are labelled (a) in the image below.

Here, we explain the role of the backend-specific frontends in Ivy, and how these enable automatic code conversions between different ML frameworks. This part is labelled as (b) in the image below.

The code conversion tools described on this page are works in progress, as indicated by the construction signs 🚧. This is in keeping with the rest of the documentation.

https://github.com/unifyai/unifyai.github.io/blob/master/img/externally_linked/design/submodule_dependency_graph.png?raw=true

Frontend Functional APIs 🚧

While the backend API, Ivy API and backend handler enable all Ivy code to be framework-agnostic, they do not, for example, enable PyTorch code to be framework-agnostic. But with frontend APIs, we can also achieve this!

Let’s take a look at how the clip function is implemented in each of the frontends:

# ivy/functional/frontends/jax/lax/functions.py
def clamp(x_min, x, x_max):
    return ivy.clip(x, x_min, x_max)
# ivy/functional/frontends/numpy/general.py
def clip(x, x_min, x_max):
    return ivy.clip(x, x_min, x_max)
# ivy/functional/frontends/tensorflow/general.py
def clip_by_value(x, x_min, x_max):
    return ivy.clip(x, x_min, x_max)
# ivy/functional/frontends/torch/general.py
def clamp(x, x_min, x_max):
    return ivy.clip(x, x_min, x_max)

Combined, we have the following situation:

https://github.com/unifyai/unifyai.github.io/blob/master/img/externally_linked/design/clip_backends_n_frontends.png?raw=true

Importantly, we can select the backend and frontend independently from one another. For example, this means we can select a JAX backend, but also select the PyTorch frontend and write Ivy code which fully adheres to the PyTorch functional API. In the reverse direction: we can take pre-written pure PyTorch code, replace each PyTorch function with the equivalent function using Ivy’s PyTorch frontend, and then run this PyTorch code using JAX:

https://github.com/unifyai/unifyai.github.io/blob/master/img/externally_linked/design/clip_conversion.png?raw=true

This example is very simple, as the differences are only syntactic, but the same process works for any function. If there are semantic differences, then these will be captured (a) in the wrapped frontend code, which expresses the frontend method as a composition of Ivy functions, and (b) in the wrapped backend code, which expresses the Ivy functions as compositions of backend methods.
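The independence of frontend and backend can be illustrated with a toy dispatcher. This is only a minimal sketch of the idea, not Ivy's implementation: the names `_BACKENDS`, `set_backend`, `clip`, `torch_clamp` and `jax_clamp` are hypothetical, and plain Python lists stand in for arrays.

```python
# Toy illustration only: none of these names are part of Ivy's real API.

# "Backends": two interchangeable implementations of the same operation.
def _clip_loop(x, x_min, x_max):
    out = []
    for v in x:
        out.append(min(max(v, x_min), x_max))
    return out


def _clip_comprehension(x, x_min, x_max):
    return [x_min if v < x_min else x_max if v > x_max else v for v in x]


_BACKENDS = {"loop": _clip_loop, "comprehension": _clip_comprehension}
_current_backend = "loop"


def set_backend(name):
    """Select which backend implementation the generic API dispatches to."""
    global _current_backend
    if name not in _BACKENDS:
        raise ValueError(f"unknown backend: {name}")
    _current_backend = name


def clip(x, x_min, x_max):
    """Framework-agnostic function: forwards to the current backend."""
    return _BACKENDS[_current_backend](x, x_min, x_max)


# "Frontends": framework-style signatures expressed via the generic clip.
def torch_clamp(x, x_min, x_max):
    return clip(x, x_min, x_max)


def jax_clamp(x_min, x, x_max):
    # Different argument order, same underlying operation.
    return clip(x, x_min, x_max)


set_backend("comprehension")
print(torch_clamp([-2.0, 0.5, 3.0], 0.0, 1.0))
print(jax_clamp(0.0, [-2.0, 0.5, 3.0], 1.0))
```

Either frontend can be combined with either backend, because the frontends only ever call the generic function and never a backend directly.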

Let’s take a more complex example and convert the PyTorch method torch.nn.functional.one_hot() into NumPy code. The frontend is implemented by wrapping a single Ivy method, ivy.one_hot(), as follows:

# ivy/functional/frontends/torch/nn/sparse_functions.py
def one_hot(tensor, num_classes=-1):
    return ivy.one_hot(tensor, num_classes)

Let’s look at the NumPy backend code for this Ivy method:

# ivy/functional/backends/numpy/general.py
def one_hot(
    indices: np.ndarray, depth: int, *, device: str, out: Optional[np.ndarray] = None
) -> np.ndarray:
    res = np.eye(depth)[np.array(indices).reshape(-1)]
    return res.reshape(list(indices.shape) + [depth])

By chaining these methods together, we can now call torch.nn.functional.one_hot() using NumPy:

import numpy as np
import ivy
import ivy.frontends.torch as torch

ivy.set_backend('numpy')

# indices must be integers, as they are used to index into np.eye(depth)
x = np.array([0, 1, 2])
ret = torch.nn.functional.one_hot(x, 3)
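To see what the NumPy backend computes, the indexing trick can be run directly with NumPy. This sketch reproduces only the two-line body of the backend shown above and does not call Ivy; `one_hot_numpy` is a hypothetical name. Note that the indices need to be integers, since they index rows of the identity matrix.

```python
import numpy as np


def one_hot_numpy(indices, depth):
    # Hypothetical standalone helper mirroring the backend body shown above.
    indices = np.asarray(indices)
    # Row i of the identity matrix is the one-hot vector for class i.
    res = np.eye(depth)[indices.reshape(-1)]
    # Restore the original index shape, with a trailing class axis.
    return res.reshape(list(indices.shape) + [depth])


print(one_hot_numpy([0, 1, 2], 3))
print(one_hot_numpy([[0, 2], [1, 1]], 3).shape)
```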

Let’s take one more example and convert the TensorFlow method tf.math.cumprod() into PyTorch code. This time, the frontend is implemented by wrapping two Ivy methods, ivy.cumprod() and ivy.flip(), as follows:

# ivy/functional/frontends/tensorflow/math.py
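def cumprod(x, axis=0, exclusive=False, reverse=False, name=None):
    # reverse=True accumulates from the end of the axis, so flip the
    # input before the cumulative product and flip the result back
    if reverse:
        x = ivy.flip(x, axis)
    ret = ivy.cumprod(x, axis, exclusive)
    if reverse:
        return ivy.flip(ret, axis)
    return ret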

Let’s look at the PyTorch backend code for both of these Ivy methods:

# ivy/functional/backends/torch/general.py
def cumprod(
    x: torch.Tensor,
    axis: int = 0,
    exclusive: bool = False,
    *,
    out: Optional[torch.Tensor] = None,
) -> torch.Tensor:
    if exclusive:
        x = torch.transpose(x, axis, -1)
        x = torch.cat((torch.ones_like(x[..., -1:]), x[..., :-1]), -1, out=out)
        res = torch.cumprod(x, -1, out=out)
        return torch.transpose(res, axis, -1)
    return torch.cumprod(x, axis, out=out)
# ivy/functional/backends/torch/manipulation.py
def flip(
    x: torch.Tensor,
    axis: Optional[Union[int, Sequence[int]]] = None,
    *,
    out: Optional[torch.Tensor] = None,
) -> torch.Tensor:
    num_dims: int = len(x.shape)
    if not num_dims:
        return x
    if axis is None:
        new_axis: List[int] = list(range(num_dims))
    else:
        new_axis: List[int] = axis
    if isinstance(new_axis, int):
        new_axis = [new_axis]
    new_axis = [item + num_dims if item < 0 else item for item in new_axis]
    ret = torch.flip(x, new_axis)
    return ret

Again, by chaining these methods together, we can now call tf.math.cumprod() using PyTorch:

import torch
import ivy
import ivy.frontends.tensorflow as tf

ivy.set_backend('torch')

x = torch.tensor([[0., 1., 2.]])
ret = tf.math.cumprod(x, -1)
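As a reference point for what the exclusive and reverse flags mean, here is a pure-Python sketch on a flat list. It follows the behaviour described in TensorFlow's documentation for tf.math.cumprod, where reverse=True accumulates from the end of the axis. The helper `cumprod_reference` is hypothetical and is not part of Ivy or TensorFlow.

```python
from itertools import accumulate
from operator import mul


def cumprod_reference(xs, exclusive=False, reverse=False):
    """Cumulative product of a flat list (hypothetical reference helper)."""
    xs = list(xs)
    if reverse:
        # Accumulate from the end: work on the reversed sequence.
        xs = xs[::-1]
    if exclusive and xs:
        # Shift right by one and start from the multiplicative identity.
        xs = [1] + xs[:-1]
    out = list(accumulate(xs, mul))
    # Undo the reversal so results line up with the original positions.
    return out[::-1] if reverse else out


a, b, c = 2, 3, 4
print(cumprod_reference([a, b, c]))
print(cumprod_reference([a, b, c], exclusive=True))
print(cumprod_reference([a, b, c], reverse=True))
print(cumprod_reference([a, b, c], exclusive=True, reverse=True))
```

A frontend built from a cumulative product and a flip should agree with this reference for every combination of the two flags, which makes it a convenient check when composing the Ivy functions.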

Role of the Graph Compiler 🚧

The simple examples above worked well, but what about more complex PyTorch code involving Modules, Optimizers, and other higher level objects? This is where the graph compiler plays a vital role. The graph compiler can convert any code into its constituent functions at the functional API level for any ML framework.

For example, let’s take the following PyTorch code and run it using JAX:

import torch

class Network(torch.nn.Module):

    def __init__(self):
        super(Network, self).__init__()
        self._linear = torch.nn.Linear(3, 3)

    def forward(self, x):
        return self._linear(x)

x = torch.tensor([1., 2., 3.])
net = Network()
net(x)

We cannot simply import ivy.frontends.torch in place of import torch as we did in the previous examples. This is because the Ivy frontend only supports the functional API for each framework, whereas the code above makes use of higher level classes through the use of the torch.nn namespace.

In general, we convert code in two steps: we first compile the code into its constituent functions in the core API using Ivy’s graph compiler, and we then convert this executable graph into the new framework. For the example above, this would look like:

import jax
import ivy

jax_graph = ivy.compile_graph(net, x).to_backend('jax')
x = jax.numpy.array([1., 2., 3.])
jax_graph(x)

However, when calling ivy.compile_graph() the graph only connects the inputs to the outputs. Any other tensors or variables which are not listed in the inputs are treated as constants in the graph. In this case, this means the learnable weights in the Module will be treated as constants. This works fine if we only care about running inference on our graph post-training, but this won’t enable training of the Module in JAX.
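The distinction between graph inputs and captured constants can be illustrated without any framework. In this toy sketch a "compiled graph" is just a closure; `trace_with_constants` and `trace_with_inputs` are hypothetical helpers and not Ivy functions, and lists of floats stand in for tensors.

```python
def linear(x, w):
    """A stand-in for a model: a dot product of inputs and weights."""
    return sum(wi * xi for wi, xi in zip(w, x))


def trace_with_constants(fn, w):
    # Hypothetical helper: the weights are snapshotted at trace time,
    # so the resulting graph only depends on x.
    frozen = tuple(w)

    def graph(x):
        return fn(x, frozen)

    return graph


def trace_with_inputs(fn):
    # Hypothetical helper: the weights stay an explicit input of the graph.
    def graph(x, w):
        return fn(x, w)

    return graph


w = [1.0, 1.0, 1.0]
x = [1.0, 2.0, 3.0]
frozen_graph = trace_with_constants(linear, w)
open_graph = trace_with_inputs(linear)

# Pretend an optimizer step has updated the weights.
w = [2.0 * wi for wi in w]

print(frozen_graph(x))   # still computed with the original weights
print(open_graph(x, w))  # reflects the updated weights
```

Only the second form exposes the weights to whatever drives the graph, which is why training requires the weights to be graph inputs rather than constants.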

Converting Network Models 🚧

In order to convert a model from PyTorch to JAX, we first must convert the torch.nn.Module instance to an ivy.Module instance using the method ivy.to_ivy_module() like so:

net = ivy.to_ivy_module(net)

In its current form, the ivy.Module instance thinly wraps the PyTorch model into the ivy.Module interface, whilst preserving the pure PyTorch backend. We can compile this network into a graph using Ivy’s graph compiler like so:

net = net.compile_graph()

In this case, the learnable weights are treated as inputs to the graph rather than constants.

Now, with a compiled graph under the hood of our model, we can call to_backend() directly on the ivy.Module instance to convert it to any backend of our choosing, like so:

net = net.to_backend('jax')

The network can now be trained using Ivy’s optimizer classes with a JAX backend like so:

optimizer = ivy.Adam(1e-4)
x_in = ivy.array([1., 2., 3.])
target = ivy.array([0.])

def loss_fn(v):
    out = net(x_in, v=v)
    return ivy.reduce_mean((out - target)**2)

for step in range(100):
    loss, grads = ivy.execute_with_gradients(loss_fn, net.v)
    net.v = optimizer.step(net.v, grads)

To convert this ivy.Module instance to a haiku.Module instance, we can call to_haiku_module() like so:

net = net.to_haiku_module()

If we want to remove Ivy from the pipeline entirely, we can then train the model in haiku like so:

import haiku as hk
import jax
import jax.numpy as jnp

x_in = jnp.array([1., 2., 3.])
target = jnp.array([0.])

def loss_fn():
    out = net(x_in)
    return jnp.mean((out - target)**2)

loss_fn_t = hk.transform(loss_fn)
loss_fn_t = hk.without_apply_rng(loss_fn_t)

rng = jax.random.PRNGKey(42)
params = loss_fn_t.init(rng)

def update_rule(param, update):
    return param - 0.01 * update

for i in range(100):
    grads = jax.grad(loss_fn_t.apply)(params)
    params = jax.tree_util.tree_map(update_rule, params, grads)
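The final loop applies update_rule to every leaf of the parameter tree, paired with the matching gradient leaf. Below is a framework-free sketch of that pattern, assuming for illustration that parameters are stored as nested dicts of floats. `tree_map2` is a hypothetical stand-in for JAX's tree-mapping utility, and the parameter names and values are made up.

```python
def tree_map2(fn, tree_a, tree_b):
    """Apply fn leaf-wise to two trees with identical structure.

    Hypothetical helper: only nested dicts are treated as containers.
    """
    if isinstance(tree_a, dict):
        return {k: tree_map2(fn, tree_a[k], tree_b[k]) for k in tree_a}
    return fn(tree_a, tree_b)


def update_rule(param, update):
    # Plain gradient descent with a fixed learning rate of 0.01.
    return param - 0.01 * update


# Illustrative values only.
params = {"linear": {"w": 1.0, "b": 0.5}}
grads = {"linear": {"w": 10.0, "b": -5.0}}

new_params = tree_map2(update_rule, params, grads)
print(new_params)
```

The structure of the result matches the structure of the inputs, so the updated tree can be fed straight back into the next step of the loop.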

Other JAX-specific network libraries such as Flax, Trax and Objax are also supported.

Overall, we have taken a torch.nn.Module instance, which can be trained using PyTorch’s optimizer classes, and converted this to a haiku.Module instance which can be trained using Haiku’s optimizer classes. The same is true for any combination of frameworks, and for any network architecture, regardless of its complexity!

Round Up

Hopefully this has explained how, with the addition of backend-specific frontends, Ivy will be able to easily convert code between different ML frameworks 🙂 The tools described here are works in progress, as indicated by the construction signs 🚧. This is in keeping with the rest of the documentation.

Please reach out on Discord if you have any questions!