GraphBLAS Passes and graphblas-opt¶
The mlir-graphblas package includes an drop-in replacement for mlir-opt
called graphblas-opt which includes the custom graphblas dialect along
with all the other dialects that ship with MLIR.
This document is not intended to be a complete tutorial on graphblas-opt and acts more
as a reference manual for the passes exclusively available for graphblas-opt. For tutorials
on these passes specific to graphblas-opt, see our GraphBLAS Dialect Tutorials.
The three passes specific to graphblas-opt are:
--graphblas-structuralize: Lowers higher-levelgraphblasops into lower-level genericgraphblasops to enable the--graphblas-optimizepass to perform optimizations such as op fusion.--graphblas-optimize: Fuses lower-level genericgraphblasops together to eliminate temporary tensors and redundant loops.--graphblas-lower: Convertgraphblasops to thescf,std, andmemrefdialects.
The --graphblas-structuralize and --graphblas-optimize passes are optional
and unoptimized code can be generated and executed with --graphblas-lower.
Also, --graphblas-optimize can be used without uses of --graphblas-structuralize,
but in these sorts of situations, --graphblas-optimize may not be able to find all
the available optimizations as it only transforms lower-level generic graphblas ops
(all of which may not yet be present without uses of --graphblas-structuralize).
It’s best practice to always use --graphblas-structuralize prior to uses of
--graphblas-optimize.
--graphblas-structuralize Pass¶
The structuralization pass performs three transformations:
Transform
graphblas.matrix_multiplyops into equivalentgraphblas.matrix_multiply_genericops.Transform
graphblas.applyops into equivalentgraphblas.apply_genericops.Transform
graphblas.reduce_to_scalarops into equivalentgraphblas.reduce_to_scalar_genericops.
--graphblas-optimize Pass¶
The optimization pass performs three transformations:
Combine
graphblas.matrix_selectops on the same input sparse tensor into a single loop with multiple output tensors.Combine
graphblas.matrix_multiplyfollowed bygraphblas.applyinto a singlegraphblas.matrix_multiplywith an additional scalar transformation of the output elements as they are written to memory.Combine
graphblas.matrix_multiplyfollowed bygraphblas.matrix_reduce_to_scalarinto a newgraphblas.matrix_multiply_reduce_to_scalarop that does the sparse multiply and accumulate in a single pass (rather than two passes forgraphblas.matrix_multiply).
--graphblas-lower Pass¶
The lowering pass is intended to allow the graphblas ops to be executed on
a CPU target. Although the lowering assumes the same C++ structure as is used in
the existing sparse tensor related passes in MLIR, --graphblas-lower does
not use the linalg dialect, as there is not yet support for sparse output.
The graphblas lowering pass requires a small runtime based on the MLIR
sparse runtime, but with some additional functions that allow for the
allocation, resizing, and deallocation of the sparse tensor data structure.
In general, this pass uses scf ops to handle traversing the sparse data
structure, with scf.parallel used when possible to indicate loops where the
loop body iterations are independent of each other, or are performing a
well-defined reduction. Note that scf.parallel does not automatically
enable parallel, multithreaded execution without the use of additional passes.
Unfortunately, the --convert-scf-to-openmp pass cannot handle the code
--graphblas-lower generates as OpenMP reductions are not yet supported.