This page was generated from dialect/graphblas_dialect_tutorials/graphblas_optimize/fuse_multiply_apply.ipynb.
Fusing graphblas.matrix_multiply with graphblas.apply¶
This example will go over how to use the --graphblas-structuralize and --graphblas-optimize passes from graphblas-opt to fuse graphblas.matrix_multiply ops with graphblas.apply ops into graphblas.matrix_multiply_generic ops.
Let’s first import some necessary libraries.
import tempfile
from mlir_graphblas.cli import GRAPHBLAS_OPT_EXE
Using development graphblas-opt: /Users/pnguyen/code/mlir-graphblas/mlir_graphblas/src/build/bin/graphblas-opt
Since sparse tensor encodings can be very verbose in MLIR, let’s import some helpers to make the MLIR code more readable.
from mlir_graphblas.tools import tersify_mlir
Fusion Details¶
Recall that graphblas.matrix_multiply ops can lower into graphblas.matrix_multiply_generic ops, which take blocks that specify exact behavior at several points during the matrix multiply. One of those blocks is a “transform_out” block.
Since graphblas.apply ops only change tensors in an element-wise fashion, we can perform these element-wise changes in the “transform_out” block of a graphblas.matrix_multiply_generic op if the graphblas.apply op is run on the result of a graphblas.matrix_multiply op.
Simple Fusion¶
Here, we’ll show the simplest example of how we can fuse a graphblas.matrix_multiply op with a graphblas.apply op.
mlir_text = """
#CSR64 = #sparse_tensor.encoding<{
    dimLevelType = [ "dense", "compressed" ],
    dimOrdering = affine_map<(i,j) -> (i,j)>,
    pointerBitWidth = 64,
    indexBitWidth = 64
}>

#CSC64 = #sparse_tensor.encoding<{
    dimLevelType = [ "dense", "compressed" ],
    dimOrdering = affine_map<(i,j) -> (j,i)>,
    pointerBitWidth = 64,
    indexBitWidth = 64
}>

func @fuse_adjacent(%A: tensor<?x?xf64, #CSR64>, %B: tensor<?x?xf64, #CSC64>, %thunk: f64) -> tensor<?x?xf64, #CSR64> {
    %C = graphblas.matrix_multiply %A, %B { semiring = "plus_plus" } : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>) to tensor<?x?xf64, #CSR64>
    %apply_result = graphblas.apply %C, %thunk { apply_operator = "min" } : (tensor<?x?xf64, #CSR64>, f64) to tensor<?x?xf64, #CSR64>
    return %apply_result : tensor<?x?xf64, #CSR64>
}
"""
with tempfile.NamedTemporaryFile() as temp:
    temp_file_name = temp.name
    with open(temp_file_name, 'w') as f:
        f.write(mlir_text)
        temp.flush()
    output_mlir = ! cat $temp_file_name | $GRAPHBLAS_OPT_EXE --graphblas-structuralize --graphblas-optimize

output_mlir = "\n".join(output_mlir)
output_mlir = tersify_mlir(output_mlir)
print(output_mlir)
#CSR64 = #sparse_tensor.encoding<{
    dimLevelType = [ "dense", "compressed" ],
    dimOrdering = affine_map<(d0, d1) -> (d0, d1)>,
    pointerBitWidth = 64,
    indexBitWidth = 64
}>

#CSC64 = #sparse_tensor.encoding<{
    dimLevelType = [ "dense", "compressed" ],
    dimOrdering = affine_map<(d0, d1) -> (d1, d0)>,
    pointerBitWidth = 64,
    indexBitWidth = 64
}>

module {
  func @fuse_adjacent(%arg0: tensor<?x?xf64, #CSR64>, %arg1: tensor<?x?xf64, #CSC64>, %arg2: f64) -> tensor<?x?xf64, #CSR64> {
    %cst = arith.constant 0.000000e+00 : f64
    %0 = graphblas.matrix_multiply_generic %arg0, %arg1 {mask_complement = false} : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>) to tensor<?x?xf64, #CSR64> {
      graphblas.yield add_identity %cst : f64
    }, {
    ^bb0(%arg3: f64, %arg4: f64):
      %1 = arith.addf %arg3, %arg4 : f64
      graphblas.yield add %1 : f64
    }, {
    ^bb0(%arg3: f64, %arg4: f64):
      %1 = arith.addf %arg3, %arg4 : f64
      graphblas.yield mult %1 : f64
    }, {
    ^bb0(%arg3: f64):
      %1 = arith.cmpf olt, %arg3, %arg2 : f64
      %2 = arith.select %1, %arg3, %arg2 : f64
      graphblas.yield transform_out %2 : f64
    }
    return %0 : tensor<?x?xf64, #CSR64>
  }
}
Note how this function now only has one op from the GraphBLAS dialect. This one op, i.e. the graphblas.matrix_multiply_generic op, has a “transform_out” block that performs the exact behavior specified by the graphblas.apply op in the original code.
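As a quick, illustrative sanity check (a minimal sketch using the output_mlir string from above, not part of the generated output), we could assert that the fused IR contains a graphblas.matrix_multiply_generic op and no remaining standalone graphblas.apply op:
# Illustrative check: after fusion, the generic matmul op should be present
# and no standalone graphblas.apply op should remain in the printed IR.
assert "graphblas.matrix_multiply_generic" in output_mlir
assert "graphblas.apply" not in output_mlir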
It’s noteworthy that this fusion also works if the graphblas.matrix_multiply use takes a mask. Rather than explicitly demonstrating this, we’ll leave it as an exercise for the reader since it’s fairly straightforward.
If the intermediate result from the graphblas.matrix_multiply op is used anywhere outside of the graphblas.apply op, this fusion cannot be applied.
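To illustrate, here is a minimal sketch of such a case (the @no_fuse function and the mlir_text_no_fuse/no_fuse_output names are our own, made-up illustration). Since %C is also returned, it has a use outside of the graphblas.apply op, so the element-wise “min” should not be folded into the matrix multiply’s “transform_out” block when the same passes are run:
mlir_text_no_fuse = """
#CSR64 = #sparse_tensor.encoding<{
    dimLevelType = [ "dense", "compressed" ],
    dimOrdering = affine_map<(i,j) -> (i,j)>,
    pointerBitWidth = 64,
    indexBitWidth = 64
}>

#CSC64 = #sparse_tensor.encoding<{
    dimLevelType = [ "dense", "compressed" ],
    dimOrdering = affine_map<(i,j) -> (j,i)>,
    pointerBitWidth = 64,
    indexBitWidth = 64
}>

func @no_fuse(%A: tensor<?x?xf64, #CSR64>, %B: tensor<?x?xf64, #CSC64>, %thunk: f64) -> (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSR64>) {
    // %C is returned as well, so it has a use outside of the graphblas.apply op.
    %C = graphblas.matrix_multiply %A, %B { semiring = "plus_plus" } : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>) to tensor<?x?xf64, #CSR64>
    %apply_result = graphblas.apply %C, %thunk { apply_operator = "min" } : (tensor<?x?xf64, #CSR64>, f64) to tensor<?x?xf64, #CSR64>
    return %C, %apply_result : tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSR64>
}
"""

with tempfile.NamedTemporaryFile() as temp:
    temp_file_name = temp.name
    with open(temp_file_name, 'w') as f:
        f.write(mlir_text_no_fuse)
        temp.flush()
    no_fuse_output = ! cat $temp_file_name | $GRAPHBLAS_OPT_EXE --graphblas-structuralize --graphblas-optimize

# Because %C has a second use, the apply's element-wise "min" should not
# appear in the matrix multiply's "transform_out" block this time.
print(tersify_mlir("\n".join(no_fuse_output)))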