This page was generated from tools/engine/tensor_plus_tensor.ipynb.

JIT Engine: Tensor + Tensor

This example shows how to compile MLIR code into a function callable from Python.

The example MLIR code we’ll use here performs element-wise tensor addition.

Let’s first import some necessary modules and generate an instance of our JIT engine.

import mlir_graphblas
import numpy as np

engine = mlir_graphblas.MlirJitEngine()

Using development graphblas-opt: /Users/pnguyen/code/mlir-graphblas/mlir_graphblas/src/build/bin/graphblas-opt

We’ll use the same set of passes to optimize and compile all of our examples below.

passes = [
    "--graphblas-structuralize",
    "--graphblas-optimize",
    "--graphblas-lower",
    "--sparsification",
    "--sparse-tensor-conversion",
    "--linalg-bufferize",
    "--func-bufferize",
    "--tensor-bufferize",
    "--finalizing-bufferize",
    "--convert-linalg-to-loops",
    "--convert-scf-to-cf",
    "--convert-memref-to-llvm",
    "--convert-math-to-llvm",
    "--convert-openmp-to-llvm",
    "--convert-arith-to-llvm",
    "--convert-math-to-llvm",
    "--convert-std-to-llvm",
    "--reconcile-unrealized-casts"
]

Fixed-Size Tensor Addition

Here’s some MLIR code to add two 32-bit floating point tensors with the shape 2x3.

mlir_text = """
#trait_add = {
 indexing_maps = [
   affine_map<(i, j) -> (i, j)>,
   affine_map<(i, j) -> (i, j)>,
   affine_map<(i, j) -> (i, j)>
 ],
 iterator_types = ["parallel", "parallel"]
}

func @matrix_add_f32(%arga: tensor<2x3xf32>, %argb: tensor<2x3xf32>) -> tensor<2x3xf32> {
  %answer = linalg.generic #trait_add
    ins(%arga, %argb: tensor<2x3xf32>, tensor<2x3xf32>)
    outs(%arga: tensor<2x3xf32>) {
      ^bb(%a: f32, %b: f32, %s: f32):
        %sum = arith.addf %a, %b : f32
        linalg.yield %sum : f32
  } -> tensor<2x3xf32>
  return %answer : tensor<2x3xf32>
}
"""

Let’s compile our MLIR code.

engine.add(mlir_text, passes)

['matrix_add_f32']

Let’s try out our compiled function.

# grab our callable
matrix_add_f32 = engine.matrix_add_f32

# generate inputs
a = np.arange(6, dtype=np.float32).reshape([2, 3])
b = np.full([2, 3], 100, dtype=np.float32)

# generate output
result = matrix_add_f32(a, b)

result

array([[100., 101., 102.],
       [103., 104., 105.]], dtype=float32)

Let’s verify that our function works as expected.

np.all(result == np.add(a, b))

True
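
As a mental model for what the compiled function computes: the linalg.generic op in matrix_add_f32 uses identity indexing maps and two "parallel" iterators, which corresponds to a plain two-level loop nest. The sketch below is a pure-Python illustration of that reading, not what the engine actually generates; the name matrix_add_reference is made up for this example.

```python
def matrix_add_reference(a, b):
    """Element-wise sum of two equally shaped 2-D lists as an explicit loop nest."""
    rows, cols = len(a), len(a[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):        # first "parallel" iterator
        for j in range(cols):    # second "parallel" iterator
            # identity indexing maps: every operand is accessed at (i, j)
            out[i][j] = a[i][j] + b[i][j]
    return out


a = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
b = [[100.0] * 3 for _ in range(2)]
print(matrix_add_reference(a, b))
# [[100.0, 101.0, 102.0], [103.0, 104.0, 105.0]]
```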

Arbitrary-Size Tensor Addition

The above example created a function to add two matrices of size 2x3. This function won’t work if we want to add two matrices of size 4x5 or any other size.

a = np.arange(20, dtype=np.float32).reshape([4, 5])
b = np.full([4, 5], 100, dtype=np.float32)
matrix_add_f32(a, b)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [8], in <cell line: 3>()
      1 a = np.arange(20, dtype=np.float32).reshape([4, 5])
      2 b = np.full([4, 5], 100, dtype=np.float32)
----> 3 matrix_add_f32(a, b)

File ~/code/mlir-graphblas/mlir_graphblas/engine.py:843, in MlirJitEngine._generate_zero_or_single_valued_functions.<locals>.python_callable(mlir_function, encoders, c_callable, decoder, *args)
    839     raise ValueError(
    840         f"{name} expected {len(mlir_function.args)} args but got {len(args)}."
    841     )
    842 encoded_args = (encoder(arg) for arg, encoder in zip(args, encoders))
--> 843 encoded_args = sum(encoded_args, [])
    844 encoded_result = c_callable(*encoded_args)
    845 result = decoder(encoded_result)

File ~/code/mlir-graphblas/mlir_graphblas/engine.py:842, in <genexpr>(.0)
    838 if len(args) != len(mlir_function.args):
    839     raise ValueError(
    840         f"{name} expected {len(mlir_function.args)} args but got {len(args)}."
    841     )
--> 842 encoded_args = (encoder(arg) for arg, encoder in zip(args, encoders))
    843 encoded_args = sum(encoded_args, [])
    844 encoded_result = c_callable(*encoded_args)

File ~/code/mlir-graphblas/mlir_graphblas/engine.py:400, in input_tensor_to_ctypes.<locals>.encoder(arg)
    395 if not len(dimensions) == len(arg.shape):
    396     raise ValueError(
    397         f"{repr(arg)} is expected to have rank {len(dimensions)} but has rank {len(arg.shape)}."
    398     )
--> 400 validate_arg_shape(arg)
    402 encoded_args = [arg, arg, 0]
    403 encoded_args += list(arg.shape)

File ~/code/mlir-graphblas/mlir_graphblas/engine.py:310, in input_tensor_to_ctypes.<locals>.validate_arg_shape(arg)
    305     expected_dim_size = dimensions[dim_index]
    306     if (
    307         expected_dim_size is not None
    308         and arg.shape[dim_index] != expected_dim_size
    309     ):
--> 310         raise ValueError(
    311             f"{repr(arg)} is expected to have size {expected_dim_size} in the "
    312             f"{dim_index}th dimension but has size {arg.shape[dim_index]}."
    313         )
    314 return

ValueError: array([[ 0.,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.],
       [10., 11., 12., 13., 14.],
       [15., 16., 17., 18., 19.]], dtype=float32) is expected to have size 2 in the 0th dimension but has size 4.

While it’s nice that the JIT engine is able to detect that there’s a size mismatch, it’d be nicer to have a function that can add two tensors of arbitrary size.
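
The check that raised this error can be approximated in plain Python. The sketch below is modelled on the validate_arg_shape lines visible in the traceback. The function name validate_shape and its arguments are hypothetical, and treating None as "any size" is an assumption drawn from the `expected_dim_size is not None` test in that traceback.

```python
def validate_shape(actual_shape, expected_dims):
    """Raise ValueError if a shape disagrees with the expected dimensions.

    expected_dims holds one entry per dimension; None means "any size"
    (an assumption for this sketch).
    """
    if len(actual_shape) != len(expected_dims):
        raise ValueError(
            f"expected rank {len(expected_dims)} but got rank {len(actual_shape)}"
        )
    for dim_index, expected in enumerate(expected_dims):
        if expected is not None and actual_shape[dim_index] != expected:
            raise ValueError(
                f"expected size {expected} in dimension {dim_index} "
                f"but got size {actual_shape[dim_index]}"
            )


validate_shape((2, 3), (2, 3))        # fixed sizes, matching input: passes
validate_shape((4, 5), (None, None))  # no fixed sizes: any rank-2 shape passes
try:
    validate_shape((4, 5), (2, 3))
except ValueError as err:
    print(err)  # expected size 2 in dimension 0 but got size 4
```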

We’ll now show how to create such a function for matrices of 32-bit integers.

mlir_text = """
#trait_add = {
 indexing_maps = [
   affine_map<(i, j) -> (i, j)>,
   affine_map<(i, j) -> (i, j)>,
   affine_map<(i, j) -> (i, j)>
 ],
 iterator_types = ["parallel", "parallel"]
}

func @matrix_add_i32(%arga: tensor<?x?xi32>, %argb: tensor<?x?xi32>) -> tensor<?x?xi32> {
  // Find the max dimensions of both args
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %arga_dim0 = tensor.dim %arga, %c0 : tensor<?x?xi32>
  %arga_dim1 = tensor.dim %arga, %c1 : tensor<?x?xi32>
  %argb_dim0 = tensor.dim %argb, %c0 : tensor<?x?xi32>
  %argb_dim1 = tensor.dim %argb, %c1 : tensor<?x?xi32>
  %dim0_gt = arith.cmpi "ugt", %arga_dim0, %argb_dim0 : index
  %dim1_gt = arith.cmpi "ugt", %arga_dim1, %argb_dim1 : index
  %output_dim0 = arith.select %dim0_gt, %arga_dim0, %argb_dim0 : index
  %output_dim1 = arith.select %dim1_gt, %arga_dim1, %argb_dim1 : index
  %output_tensor = linalg.init_tensor [%output_dim0, %output_dim1] : tensor<?x?xi32>

  // Perform addition
  %answer = linalg.generic #trait_add
    ins(%arga, %argb: tensor<?x?xi32>, tensor<?x?xi32>)
    outs(%output_tensor: tensor<?x?xi32>) {
      ^bb(%a: i32, %b: i32, %s: i32):
        %sum = arith.addi %a, %b : i32
        linalg.yield %sum : i32
    } -> tensor<?x?xi32>
 return %answer : tensor<?x?xi32>
}
"""

The compilation of this MLIR code will be the same as our first example. The main difference is in how we wrote our MLIR code (notice the use of “?x?” when denoting the shapes of tensors).

# compile
engine.add(mlir_text, passes)
matrix_add_i32 = engine.matrix_add_i32

# generate inputs
a = np.arange(20, dtype=np.int32).reshape([4, 5])
b = np.full([4, 5], 100, dtype=np.int32)

# generate output
result = matrix_add_i32(a, b)

result

array([[100, 101, 102, 103, 104],
       [105, 106, 107, 108, 109],
       [110, 111, 112, 113, 114],
       [115, 116, 117, 118, 119]], dtype=int32)

assert np.all(result == np.add(a, b))

Note that we also get some type safety: passing in a tensor with the wrong dtype raises an exception.

matrix_add_i32(a, b.astype(np.int64))

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [13], in <cell line: 1>()
----> 1 matrix_add_i32(a, b.astype(np.int64))

File ~/code/mlir-graphblas/mlir_graphblas/engine.py:843, in MlirJitEngine._generate_zero_or_single_valued_functions.<locals>.python_callable(mlir_function, encoders, c_callable, decoder, *args)
    839     raise ValueError(
    840         f"{name} expected {len(mlir_function.args)} args but got {len(args)}."
    841     )
    842 encoded_args = (encoder(arg) for arg, encoder in zip(args, encoders))
--> 843 encoded_args = sum(encoded_args, [])
    844 encoded_result = c_callable(*encoded_args)
    845 result = decoder(encoded_result)

File ~/code/mlir-graphblas/mlir_graphblas/engine.py:842, in <genexpr>(.0)
    838 if len(args) != len(mlir_function.args):
    839     raise ValueError(
    840         f"{name} expected {len(mlir_function.args)} args but got {len(args)}."
    841     )
--> 842 encoded_args = (encoder(arg) for arg, encoder in zip(args, encoders))
    843 encoded_args = sum(encoded_args, [])
    844 encoded_result = c_callable(*encoded_args)

File ~/code/mlir-graphblas/mlir_graphblas/engine.py:392, in input_tensor_to_ctypes.<locals>.encoder(arg)
    388     raise TypeError(
    389         f"{repr(arg)} is expected to be an instance of {np.ndarray.__qualname__}"
    390     )
    391 if not arg.dtype == element_np_type:
--> 392     raise TypeError(
    393         f"{repr(arg)} is expected to have dtype {element_np_type}"
    394     )
    395 if not len(dimensions) == len(arg.shape):
    396     raise ValueError(
    397         f"{repr(arg)} is expected to have rank {len(dimensions)} but has rank {len(arg.shape)}."
    398     )

TypeError: array([[100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100],
       [100, 100, 100, 100, 100]]) is expected to have dtype <class 'numpy.int32'>
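
One way to avoid this error is to convert the input to the expected dtype before calling the compiled function. The helper below is a hypothetical sketch (the name to_int32_checked is not part of mlir_graphblas) that uses only NumPy and refuses to cast integer values that would not fit in 32 bits.

```python
import numpy as np


def to_int32_checked(arr):
    """Cast an integer array to int32, refusing values that would overflow."""
    arr = np.asarray(arr)
    info = np.iinfo(np.int32)
    if arr.size and (arr.min() < info.min or arr.max() > info.max):
        raise OverflowError("values do not fit in int32")
    return arr.astype(np.int32)


b64 = np.full([4, 5], 100, dtype=np.int64)
b32 = to_int32_checked(b64)
print(b32.dtype)  # int32
```

The converted array b32 could then be passed where the int64 array was rejected.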

Note that in the MLIR code, each dimension of our output tensor is the maximum of the corresponding dimensions of our two inputs.

A consequence of this is that our function doesn’t enforce that our inputs are the same shape.
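
The arith.cmpi "ugt" and arith.select pairs in the MLIR code amount to a per-dimension maximum. A minimal Python sketch of that computation (the function name output_shape is made up for illustration):

```python
def output_shape(a_shape, b_shape):
    """Per-dimension maximum of two shapes of equal rank."""
    # mirrors: select(a_dim > b_dim, a_dim, b_dim)
    return tuple(x if x > y else y for x, y in zip(a_shape, b_shape))


print(output_shape((4, 5), (4, 5)))  # (4, 5)
print(output_shape((2, 3), (4, 5)))  # (4, 5)
print(output_shape((2, 7), (4, 5)))  # (4, 7)
```

The last call shows that the output shape may match neither input.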

# generate differently shaped inputs
a = np.arange(6, dtype=np.int32).reshape([2, 3])
b = np.full([4, 5], 100, dtype=np.int32)

# generate output
result = matrix_add_i32(a, b)

result.shape

(4, 5)

result

array([[       100,        101,        102, -536870912,          7],
       [       103,        104,        105,          0,         48],
       [1852990827,  808348773,  862337379,  758342450, 1667588407],
       [ 879047725,  809053497, 1680696121, 1650798691,  878994488]],
      dtype=int32)

This result is somewhat unexpected. The weird numbers we see (the zeros and large numbers) come from garbage, i.e. uninitialized values, in the memory backing our output (i.e. %output_tensor in the MLIR code).
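
One plausible reading of the result above is that the loops only cover the extent of the first input (2x3), so only that corner of the 4x5 output is ever written. This is an assumption about the lowered code, not something the output proves. The pure-Python sketch below imitates that behavior, with None standing in for uninitialized memory; simulate_add and UNSET are illustrative names.

```python
UNSET = None  # stands in for uninitialized memory


def simulate_add(a, b, out_rows, out_cols):
    """Add a and b into an out_rows x out_cols buffer, looping over a's extent only."""
    out = [[UNSET] * out_cols for _ in range(out_rows)]
    for i in range(len(a)):
        for j in range(len(a[0])):
            out[i][j] = a[i][j] + b[i][j]
    return out


a = [[0, 1, 2], [3, 4, 5]]         # shape 2x3
b = [[100] * 5 for _ in range(4)]  # shape 4x5
for row in simulate_add(a, b, 4, 5):
    print(row)
# [100, 101, 102, None, None]
# [103, 104, 105, None, None]
# [None, None, None, None, None]
# [None, None, None, None, None]
```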

This is a flaw in how we wrote our MLIR code: nothing enforces that both inputs have the same shape. Special care must be taken when dealing with arbitrarily sized tensors, or else we might get bugs or unexpected results like the ones shown here.
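
One way to guard against this from the Python side is to check the shapes before calling the compiled function. The wrapper below is a minimal sketch; require_same_shape is a hypothetical helper, not part of mlir_graphblas, and it only assumes that both arguments expose a .shape attribute as NumPy arrays do.

```python
import functools


def require_same_shape(tensor_fn):
    """Wrap a two-argument tensor function so that mismatched shapes raise early."""
    @functools.wraps(tensor_fn)
    def checked(a, b):
        if a.shape != b.shape:
            raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
        return tensor_fn(a, b)
    return checked


# Intended use with the function compiled above (not executed here):
#     safe_add = require_same_shape(engine.matrix_add_i32)
#     safe_add(a, b)  # raises ValueError when a.shape != b.shape
```

An alternative would be to perform the same check inside the MLIR function itself, which would protect every caller rather than only those that go through the wrapper.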
