This page was generated from tools/engine/matrix_plus_broadcasted_vector.ipynb.

JIT Engine: Matrix + Broadcasted Vector

This example goes over how to compile MLIR code that adds a matrix to a vector using broadcasting.

In other words, we want to write a function equivalent to this NumPy code:


import numpy as np

matrix = np.full([4,3], 100, dtype=np.float32)
vector = np.arange(4, dtype=np.float32)

matrix + np.expand_dims(vector, 1)

array([[100., 100., 100.],
       [101., 101., 101.],
       [102., 102., 102.],
       [103., 103., 103.]], dtype=float32)
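As a quick aside, `np.expand_dims(vector, 1)` is equivalent to the indexing idiom `vector[:, None]`: both reshape the vector from `(4,)` to `(4, 1)` so that it broadcasts across the columns of the matrix. A small sketch:

```python
import numpy as np

matrix = np.full([4, 3], 100, dtype=np.float32)
vector = np.arange(4, dtype=np.float32)

# vector[:, None] reshapes (4,) -> (4, 1), so each row of the matrix
# gets the corresponding vector element added to every column.
result = matrix + vector[:, None]
assert result.shape == (4, 3)
assert np.array_equal(result, matrix + np.expand_dims(vector, 1))
```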

Let’s first import some necessary modules and generate an instance of our JIT engine.


import mlir_graphblas

engine = mlir_graphblas.MlirJitEngine()
Using development graphblas-opt: /Users/pnguyen/code/mlir-graphblas/mlir_graphblas/src/build/bin/graphblas-opt

Here’s the MLIR code we’ll use.


mlir_text = """
#trait_matplusvec = {
  indexing_maps = [
    affine_map<(i,j) -> (i,j)>,
    affine_map<(i,j) -> (i)>,
    affine_map<(i,j) -> (i,j)>
  ],
  iterator_types = ["parallel", "parallel"]
}

func @mat_plus_vec(%arga: tensor<10x?xf32>, %argb: tensor<10xf32>) -> tensor<10x?xf32> {
  %c1 = arith.constant 1 : index
  %arga_dim1 = tensor.dim %arga, %c1 : tensor<10x?xf32>
  %output_tensor = linalg.init_tensor [10, %arga_dim1] : tensor<10x?xf32>
  %answer = linalg.generic #trait_matplusvec
      ins(%arga, %argb : tensor<10x?xf32>, tensor<10xf32>)
      outs(%output_tensor: tensor<10x?xf32>) {
        ^bb(%a: f32, %b: f32, %x: f32):
          %sum = arith.addf %a, %b : f32
          linalg.yield %sum : f32
  } -> tensor<10x?xf32>
  return %answer : tensor<10x?xf32>
}
"""

Note that the input matrix has a fixed number of rows (10) but an arbitrary number of columns, indicated by the ? in tensor<10x?xf32>.
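The three affine maps in #trait_matplusvec spell out how the iteration space (i, j) indexes each operand: the matrix and the output use (i, j), while the vector uses only (i). A pure-NumPy loop nest mirroring that semantics (a reference sketch, not how the compiled code actually executes) looks like this:

```python
import numpy as np

def mat_plus_vec_reference(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Loop nest mirroring the linalg.generic op: out[i, j] = a[i, j] + b[i]."""
    out = np.empty_like(a)
    for i in range(a.shape[0]):      # iterator i: "parallel"
        for j in range(a.shape[1]):  # iterator j: "parallel"
            # %a indexed by (i, j), %b indexed by (i), per the affine maps
            out[i, j] = a[i, j] + b[i]
    return out

a = np.arange(6, dtype=np.float32).reshape(2, 3)
b = np.array([10.0, 20.0], dtype=np.float32)
assert np.array_equal(mat_plus_vec_reference(a, b), a + b[:, None])
```

Both iterators are marked "parallel" because no loop iteration depends on another, which is what lets the sparsification and lowering passes generate efficient code.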

These are the passes we’ll use to optimize and compile our MLIR code.


passes = [
    "--graphblas-structuralize",
    "--graphblas-optimize",
    "--graphblas-lower",
    "--sparsification",
    "--sparse-tensor-conversion",
    "--linalg-bufferize",
    "--arith-bufferize",
    "--func-bufferize",
    "--tensor-bufferize",
    "--finalizing-bufferize",
    "--convert-linalg-to-loops",
    "--convert-scf-to-cf",
    "--convert-memref-to-llvm",
    "--convert-math-to-llvm",
    "--convert-openmp-to-llvm",
    "--convert-arith-to-llvm",
    "--convert-std-to-llvm",
    "--reconcile-unrealized-casts"
]

Let’s compile our MLIR code using our JIT engine.


engine.add(mlir_text, passes)
mat_plus_vec = engine.mat_plus_vec

Let’s see how well our function works.


# generate inputs
m = np.arange(40, dtype=np.float32).reshape([10,4])
v = np.arange(10, dtype=np.float32) / 10

# generate output
result = mat_plus_vec(m, v)

result

array([[ 0. ,  1. ,  2. ,  3. ],
       [ 4.1,  5.1,  6.1,  7.1],
       [ 8.2,  9.2, 10.2, 11.2],
       [12.3, 13.3, 14.3, 15.3],
       [16.4, 17.4, 18.4, 19.4],
       [20.5, 21.5, 22.5, 23.5],
       [24.6, 25.6, 26.6, 27.6],
       [28.7, 29.7, 30.7, 31.7],
       [32.8, 33.8, 34.8, 35.8],
       [36.9, 37.9, 38.9, 39.9]], dtype=float32)

Let’s verify that our result is correct.


np.all(result == m + np.expand_dims(v, 1))

True
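Exact equality holds here because both paths perform the same single addition per element. For more complex kernels, where a compiler may reorder or fuse floating-point arithmetic, a tolerance-based comparison is the safer habit. A small sketch in pure NumPy (without the JIT-compiled function):

```python
import numpy as np

m = np.arange(40, dtype=np.float32).reshape([10, 4])
v = np.arange(10, dtype=np.float32) / 10
expected = m + np.expand_dims(v, 1)

# np.allclose tolerates tiny floating-point differences that exact
# equality (==) would flag as mismatches.
assert np.allclose(m + v[:, None], expected)
assert expected.shape == (10, 4)
```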