This page was generated from dialect/graphblas_dialect_tutorials/graphblas_lower/python_utilities.ipynb.
Python Utilities for MLIR’s Sparse Tensors
Before going into actual examples, we’ll first go over some useful utilities for working with MLIR’s sparse tensors in Python.
Let’s first import them.
import mlir_graphblas
from mlir_graphblas.sparse_utils import MLIRSparseTensor
from mlir_graphblas.cli import GRAPHBLAS_OPT_EXE
from mlir_graphblas.tools import tersify_mlir
from mlir_graphblas.tools.utils import sparsify_array
import tempfile
import numpy as np
Using development graphblas-opt: /Users/pnguyen/code/mlir-graphblas/mlir_graphblas/src/build/bin/graphblas-opt
The first useful thing to note is that GRAPHBLAS_OPT_EXE from mlir_graphblas.cli holds the location of the locally used graphblas-opt.
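Outside a notebook, a path like this can be handed to Python's subprocess module. The following is a minimal sketch and not part of mlir_graphblas: run_tool is a hypothetical helper, and the demo uses the Python interpreter as a stand-in command so that it runs without graphblas-opt installed. We assume GRAPHBLAS_OPT_EXE is a plain path string, in which case run_tool([GRAPHBLAS_OPT_EXE], some_mlir_text) would pipe MLIR text through graphblas-opt.

```python
import subprocess
import sys


def run_tool(command, text):
    """Pipe `text` to `command` on stdin and return what it writes to stdout."""
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the tool exits with an error
    )
    return result.stdout


# Stand-in command that echoes stdin back unchanged. With mlir_graphblas
# installed, the command would instead be [GRAPHBLAS_OPT_EXE] (assumption:
# GRAPHBLAS_OPT_EXE is a plain path string).
echo_command = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
echoed = run_tool(echo_command, "module {\n}\n")
print(echoed)
```

Passing the text via stdin avoids the need for a temporary file.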
Overview of tersify_mlir
When MLIR code is passed through graphblas-opt or mlir-opt, it often becomes more verbose and harder to read. This is especially true for code that uses sparse tensors, because the sparse tensor encodings are written out in full.
For example, this code is fairly easy to read.
mlir_text = """
#CSR64 = #sparse_tensor.encoding<{
dimLevelType = [ "dense", "compressed" ],
dimOrdering = affine_map<(i,j) -> (i,j)>,
pointerBitWidth = 64,
indexBitWidth = 64
}>
#CSC64 = #sparse_tensor.encoding<{
dimLevelType = [ "dense", "compressed" ],
dimOrdering = affine_map<(i,j) -> (j,i)>,
pointerBitWidth = 64,
indexBitWidth = 64
}>
func @mat_mul(%argA: tensor<?x?xf64, #CSR64>, %argB: tensor<?x?xf64, #CSC64>) -> tensor<?x?xf64, #CSR64> {
%answer = graphblas.matrix_multiply %argA, %argB { semiring = "plus_times" } : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>) to tensor<?x?xf64, #CSR64>
return %answer : tensor<?x?xf64, #CSR64>
}
"""
However, when we pass it through graphblas-opt or mlir-opt with no passes (which produces behaviorally identical code), the aliases for the sparse tensor encodings are expanded, which results in very verbose code.
with tempfile.NamedTemporaryFile() as temp:
    temp_file_name = temp.name
    with open(temp_file_name, 'w') as f:
        f.write(mlir_text)
    temp.flush()
    verbose_mlir = ! cat $temp_file_name | $GRAPHBLAS_OPT_EXE
    verbose_mlir = "\n".join(verbose_mlir)
print(verbose_mlir)
module {
func @mat_mul(%arg0: tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, pointerBitWidth = 64, indexBitWidth = 64 }>>, %arg1: tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)>, pointerBitWidth = 64, indexBitWidth = 64 }>>) -> tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, pointerBitWidth = 64, indexBitWidth = 64 }>> {
%0 = graphblas.matrix_multiply %arg0, %arg1 {semiring = "plus_times"} : (tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, pointerBitWidth = 64, indexBitWidth = 64 }>>, tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)>, pointerBitWidth = 64, indexBitWidth = 64 }>>) to tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, pointerBitWidth = 64, indexBitWidth = 64 }>>
return %0 : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, pointerBitWidth = 64, indexBitWidth = 64 }>>
}
}
We can make this resulting code less verbose and more readable using tersify_mlir from mlir_graphblas.tools.
print(tersify_mlir(verbose_mlir))
#CSR64 = #sparse_tensor.encoding<{
dimLevelType = [ "dense", "compressed" ],
dimOrdering = affine_map<(d0, d1) -> (d0, d1)>,
pointerBitWidth = 64,
indexBitWidth = 64
}>
#CSC64 = #sparse_tensor.encoding<{
dimLevelType = [ "dense", "compressed" ],
dimOrdering = affine_map<(d0, d1) -> (d1, d0)>,
pointerBitWidth = 64,
indexBitWidth = 64
}>
module {
func @mat_mul(%arg0: tensor<?x?xf64, #CSR64>, %arg1: tensor<?x?xf64, #CSC64>) -> tensor<?x?xf64, #CSR64> {
%0 = graphblas.matrix_multiply %arg0, %arg1 {semiring = "plus_times"} : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>) to tensor<?x?xf64, #CSR64>
return %0 : tensor<?x?xf64, #CSR64>
}
}
tersify_mlir mostly moves sparse tensor encodings commonly used in the GraphBLAS dialect (i.e. the CSR, CSC, and compressed vector encodings) to aliases.
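To make the idea of moving an encoding to an alias concrete, here is a rough sketch using plain string replacement. It is not how tersify_mlir is implemented (we have not inspected its source); alias_encodings and CSR64_INLINE are hypothetical names used only for illustration.

```python
# The fully expanded CSR encoding, as printed by graphblas-opt above.
CSR64_INLINE = (
    '#sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], '
    "dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, "
    "pointerBitWidth = 64, indexBitWidth = 64 }>"
)


def alias_encodings(mlir, aliases):
    """Replace each inline encoding with `#name` and prepend its definition.

    `aliases` maps an alias name (without the leading '#') to the exact
    inline text it stands for. Only aliases that actually occur are defined.
    """
    header = []
    for name, inline in aliases.items():
        if inline in mlir:
            mlir = mlir.replace(inline, "#" + name)
            header.append("#" + name + " = " + inline)
    return "\n".join(header + [mlir])


verbose_line = "return %0 : tensor<?x?xf64, " + CSR64_INLINE + ">"
terse = alias_encodings(verbose_line, {"CSR64": CSR64_INLINE})
print(terse)
```

Exact string matching only works here because the printer emits each encoding in one canonical form; a more robust tool would presumably parse the attribute instead.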
tersify_mlir is also available as a tool to be used at the command line.
with tempfile.NamedTemporaryFile() as temp:
    temp_file_name = temp.name
    with open(temp_file_name, 'w') as f:
        f.write(verbose_mlir)
    temp.flush()
    terse_mlir_via_command_line = ! cat $temp_file_name | tersify_mlir 2> /dev/null
    terse_mlir_via_command_line = "\n".join(terse_mlir_via_command_line)
print(terse_mlir_via_command_line)
#CSR64 = #sparse_tensor.encoding<{
dimLevelType = [ "dense", "compressed" ],
dimOrdering = affine_map<(d0, d1) -> (d0, d1)>,
pointerBitWidth = 64,
indexBitWidth = 64
}>
#CSC64 = #sparse_tensor.encoding<{
dimLevelType = [ "dense", "compressed" ],
dimOrdering = affine_map<(d0, d1) -> (d1, d0)>,
pointerBitWidth = 64,
indexBitWidth = 64
}>
module {
func @mat_mul(%arg0: tensor<?x?xf64, #CSR64>, %arg1: tensor<?x?xf64, #CSC64>) -> tensor<?x?xf64, #CSR64> {
%0 = graphblas.matrix_multiply %arg0, %arg1 {semiring = "plus_times"} : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>) to tensor<?x?xf64, #CSR64>
return %0 : tensor<?x?xf64, #CSR64>
}
}
Overview of sparsify_array
Very often when debugging or testing, it is useful to convert a dense tensor, represented as a NumPy array, into an MLIRSparseTensor.
sparsify_array from mlir_graphblas.tools.utils lets us do that.
Let’s say we wanted to convert this vector into an MLIRSparseTensor.
dense_vector = np.array([0, 0, 12, 0, 0, 34, 0, 0], dtype=np.int32)
dense_vector
array([ 0, 0, 12, 0, 0, 34, 0, 0], dtype=int32)
We would normally have to pass the indices, values, sizes, and per-dimension sparsity explicitly to the MLIRSparseTensor constructor, as shown below.
indices = np.array([2, 5], dtype=np.uint64)
values = np.array([12, 34], dtype=np.int32)
sizes = np.array([8], dtype=np.uint64)
sparsity = np.array([True], dtype=np.bool_)  # np.bool8 was removed in NumPy 2.0
explicitly_generated_sparse_vector = MLIRSparseTensor(indices, values, sizes, sparsity)
explicitly_generated_sparse_vector
<mlir_graphblas.sparse_utils.MLIRSparseTensor at 0x7f86d670d860>
explicitly_generated_sparse_vector.shape
(8,)
explicitly_generated_sparse_vector.pointers
(array([0, 2], dtype=uint64),)
explicitly_generated_sparse_vector.indices
(array([2, 5], dtype=uint64),)
explicitly_generated_sparse_vector.values
array([12, 34], dtype=int32)
We can avoid writing such verbose code using sparsify_array. We only need to pass in the desired sparsity for each dimension.
sparse_vector = sparsify_array(dense_vector, [True])
sparse_vector
<mlir_graphblas.sparse_utils.MLIRSparseTensor at 0x7f86d740d220>
sparse_vector.shape
(8,)
sparse_vector.pointers
(array([0, 2], dtype=uint64),)
sparse_vector.indices
(array([2, 5], dtype=uint64),)
sparse_vector.values
array([12, 34], dtype=int32)
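The pointers, indices, and values shown above can be reproduced with plain NumPy, which may help when reading them. This is only a sketch of the compressed-vector layout, treating zeros as the missing value; compress_vector is a hypothetical helper and not a function from mlir_graphblas.

```python
import numpy as np


def compress_vector(dense):
    """Return (pointers, indices, values) for the nonzero entries of a 1-D array."""
    dense = np.asarray(dense)
    positions = np.flatnonzero(dense)          # positions of the stored entries
    values = dense[positions]                  # the stored entries themselves
    indices = positions.astype(np.uint64)
    # A single compressed dimension has one segment: [0, number of stored entries].
    pointers = np.array([0, len(positions)], dtype=np.uint64)
    return pointers, indices, values


dense_vector = np.array([0, 0, 12, 0, 0, 34, 0, 0], dtype=np.int32)
pointers, indices, values = compress_vector(dense_vector)
print(pointers, indices, values)
```

The pointers array marks where the single segment of stored entries starts and ends, which is why it reads [0, 2] for a vector with two stored values.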
The default missing value is 0, but another value can be specified if needed.
sparse_vector = sparsify_array(dense_vector, [True], missing=12)
We’ll show examples of how to use sparsify_array with matrices below. Note that sparsify_array works with any ranked tensor (not just vectors and matrices) as long as the appropriate sparsity values are provided.
Overview of MLIRSparseTensor.toarray
Very often when debugging or testing, it is useful to be able to convert an MLIRSparseTensor into a dense tensor represented as a NumPy array.
MLIRSparseTensor.toarray allows us to do this. By default, this method treats missing values as zeros. This isn’t necessarily the correct behavior for every application, so it’s always worth sanity checking which value is assumed for the missing entries.
Let’s first convert the sparse vector we created above into a dense NumPy vector.
sparse_vector.toarray()
array([ 0, 0, 12, 0, 0, 34, 0, 0], dtype=int32)
We can also convert CSR and CSC matrices into NumPy matrices.
Let’s first create a CSR matrix via sparsify_array.
dense_matrix = np.array(
    [
        [1, 0, 0, 0, 0],
        [0, 2, 3, 0, 0],
        [0, 0, 4, 0, 0],
        [0, 0, 5, 6, 0],
        [0, 0, 0, 0, 0],
    ],
    dtype=np.float64,
)
csr_matrix = sparsify_array(dense_matrix, [False, True])
csr_matrix
<mlir_graphblas.sparse_utils.MLIRSparseTensor at 0x7f86d7538bd0>
csr_matrix.shape
(5, 5)
csr_matrix.pointers
(array([], dtype=uint64), array([0, 1, 3, 4, 6, 6], dtype=uint64))
csr_matrix.indices
(array([], dtype=uint64), array([0, 1, 2, 2, 2, 3], dtype=uint64))
csr_matrix.values
array([1., 2., 3., 4., 5., 6.])
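To see where these arrays come from, the CSR layout can be rebuilt by hand with NumPy. This is a sketch for illustration only; dense_to_csr is a hypothetical helper, not part of mlir_graphblas. It returns only the arrays for the compressed (second) dimension, since the dense (first) dimension stores nothing, which matches the empty arrays in the output above.

```python
import numpy as np


def dense_to_csr(dense):
    """Return (pointers, indices, values) of the CSR layout of a 2-D array."""
    dense = np.asarray(dense)
    rows, cols = np.nonzero(dense)  # nonzero positions in row-major order
    # Number of stored entries per row, including rows with none.
    counts = np.bincount(rows, minlength=dense.shape[0])
    # pointers[r]:pointers[r + 1] is the slice of indices/values for row r.
    pointers = np.concatenate(([0], np.cumsum(counts))).astype(np.uint64)
    indices = cols.astype(np.uint64)  # column index of each stored entry
    values = dense[rows, cols]
    return pointers, indices, values


dense_matrix = np.array(
    [
        [1, 0, 0, 0, 0],
        [0, 2, 3, 0, 0],
        [0, 0, 4, 0, 0],
        [0, 0, 5, 6, 0],
        [0, 0, 0, 0, 0],
    ],
    dtype=np.float64,
)
pointers, indices, values = dense_to_csr(dense_matrix)
print(pointers, indices, values)
```

The last two pointers are equal because the final row has no stored entries.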
Let’s now create a dense matrix from this CSR matrix.
round_trip_dense_matrix = csr_matrix.toarray()
round_trip_dense_matrix
array([[1., 0., 0., 0., 0.],
[0., 2., 3., 0., 0.],
[0., 0., 4., 0., 0.],
[0., 0., 5., 6., 0.],
[0., 0., 0., 0., 0.]])
round_trip_dense_matrix.dtype
dtype('float64')
As with sparsify_array, the missing value can be set to something other than the default 0.
csr_matrix.toarray(missing=-99)
array([[ 1., -99., -99., -99., -99.],
[-99., 2., 3., -99., -99.],
[-99., -99., 4., -99., -99.],
[-99., -99., 5., 6., -99.],
[-99., -99., -99., -99., -99.]])
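The effect of the missing value can be sketched in plain NumPy: start from an array filled with the missing value, then write each stored entry into place. This is an illustration under our own assumptions, not the actual implementation of toarray; csr_to_dense is a hypothetical helper.

```python
import numpy as np


def csr_to_dense(pointers, indices, values, shape, missing=0):
    """Expand CSR arrays into a dense 2-D array, filling gaps with `missing`."""
    dense = np.full(shape, missing, dtype=values.dtype)
    for row in range(shape[0]):
        start, stop = int(pointers[row]), int(pointers[row + 1])
        columns = indices[start:stop].astype(np.intp)
        dense[row, columns] = values[start:stop]
    return dense


# The CSR arrays shown earlier for csr_matrix (compressed dimension only).
pointers = np.array([0, 1, 3, 4, 6, 6], dtype=np.uint64)
indices = np.array([0, 1, 2, 2, 2, 3], dtype=np.uint64)
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

dense_default = csr_to_dense(pointers, indices, values, (5, 5))
dense_filled = csr_to_dense(pointers, indices, values, (5, 5), missing=-99)
print(dense_filled)
```

Only the fill value differs between the two results; the stored entries land in the same positions either way.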