This page was generated from dialect/graphblas_dialect_tutorials/graphblas_lower/python_utilities.ipynb.

Python Utilities for MLIR’s Sparse Tensors

Before going into actual examples, we’ll first go over some useful utilities for working with MLIR’s sparse tensors in Python.

Let’s first import them.


import mlir_graphblas
from mlir_graphblas.sparse_utils import MLIRSparseTensor
from mlir_graphblas.cli import GRAPHBLAS_OPT_EXE
from mlir_graphblas.tools import tersify_mlir
from mlir_graphblas.tools.utils import sparsify_array

import tempfile
import numpy as np
Using development graphblas-opt: /Users/pnguyen/code/mlir-graphblas/mlir_graphblas/src/build/bin/graphblas-opt

The first useful thing to note is that GRAPHBLAS_OPT_EXE from mlir_graphblas.cli holds the location of the locally used graphblas-opt.

Overview of tersify_mlir

When MLIR code is passed through graphblas-opt or mlir-opt, it can often become more verbose or difficult to read. This is especially true when using sparse tensors, because their encodings are written out in full wherever they appear.

For example, this code is fairly easy to read.


mlir_text = """
#CSR64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (i,j)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

#CSC64 = #sparse_tensor.encoding<{
  dimLevelType = [ "dense", "compressed" ],
  dimOrdering = affine_map<(i,j) -> (j,i)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

func @mat_mul(%argA: tensor<?x?xf64, #CSR64>, %argB: tensor<?x?xf64, #CSC64>) -> tensor<?x?xf64, #CSR64> {
    %answer = graphblas.matrix_multiply %argA, %argB { semiring = "plus_times" } : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>) to tensor<?x?xf64, #CSR64>
    return %answer : tensor<?x?xf64, #CSR64>
}
"""

However, passing it through graphblas-opt or mlir-opt with no passes (which produces behaviorally identical code) expands the aliases for the sparse tensor encodings, resulting in very verbose code.


with tempfile.NamedTemporaryFile(mode='w') as temp:
    temp_file_name = temp.name
    # Write through the open handle and flush so the shell command sees the contents.
    temp.write(mlir_text)
    temp.flush()

    verbose_mlir = ! cat $temp_file_name | $GRAPHBLAS_OPT_EXE
    verbose_mlir = "\n".join(verbose_mlir)

print(verbose_mlir)
module {
  func @mat_mul(%arg0: tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, pointerBitWidth = 64, indexBitWidth = 64 }>>, %arg1: tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)>, pointerBitWidth = 64, indexBitWidth = 64 }>>) -> tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, pointerBitWidth = 64, indexBitWidth = 64 }>> {
    %0 = graphblas.matrix_multiply %arg0, %arg1 {semiring = "plus_times"} : (tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, pointerBitWidth = 64, indexBitWidth = 64 }>>, tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)>, pointerBitWidth = 64, indexBitWidth = 64 }>>) to tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, pointerBitWidth = 64, indexBitWidth = 64 }>>
    return %0 : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, pointerBitWidth = 64, indexBitWidth = 64 }>>
  }
}

We can make this resulting code less verbose and more readable using tersify_mlir from mlir_graphblas.tools.


print(tersify_mlir(verbose_mlir))
#CSR64 = #sparse_tensor.encoding<{
    dimLevelType = [ "dense", "compressed" ],
    dimOrdering = affine_map<(d0, d1) -> (d0, d1)>,
    pointerBitWidth = 64,
    indexBitWidth = 64
}>

#CSC64 = #sparse_tensor.encoding<{
    dimLevelType = [ "dense", "compressed" ],
    dimOrdering = affine_map<(d0, d1) -> (d1, d0)>,
    pointerBitWidth = 64,
    indexBitWidth = 64
}>

module {
  func @mat_mul(%arg0: tensor<?x?xf64, #CSR64>, %arg1: tensor<?x?xf64, #CSC64>) -> tensor<?x?xf64, #CSR64> {
    %0 = graphblas.matrix_multiply %arg0, %arg1 {semiring = "plus_times"} : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>) to tensor<?x?xf64, #CSR64>
    return %0 : tensor<?x?xf64, #CSR64>
  }
}


tersify_mlir mainly replaces the sparse tensor encodings commonly used in the GraphBLAS dialect (i.e. the CSR, CSC, and compressed vector encodings) with aliases.
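To make the aliasing idea concrete, here is a minimal sketch of that kind of substitution using plain string replacement. It is not how tersify_mlir is implemented; the helper name alias_encodings and its aliases argument are hypothetical and exist only for illustration.

```python
def alias_encodings(mlir_text, aliases):
    """Replace each expanded attribute with `#name` and prepend its definition.

    `aliases` maps an alias name (e.g. "CSR64") to the exact expanded text.
    This is a hypothetical helper, not part of mlir_graphblas.
    """
    definitions = []
    for name, expanded in aliases.items():
        if expanded in mlir_text:
            mlir_text = mlir_text.replace(expanded, f"#{name}")
            definitions.append(f"#{name} = {expanded}")
    return "\n".join(definitions + [mlir_text])


csr = (
    '#sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], '
    "dimOrdering = affine_map<(d0, d1) -> (d0, d1)>, "
    "pointerBitWidth = 64, indexBitWidth = 64 }>"
)
verbose_line = f"return %0 : tensor<?x?xf64, {csr}>"
terse = alias_encodings(verbose_line, {"CSR64": csr})
print(terse)
```

The last printed line is the shortened return statement, and the first line is the alias definition it now refers to.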

tersify_mlir is also available as a command-line tool.


with tempfile.NamedTemporaryFile(mode='w') as temp:
    temp_file_name = temp.name
    # Write through the open handle and flush so the shell command sees the contents.
    temp.write(verbose_mlir)
    temp.flush()

    terse_mlir_via_command_line = ! cat $temp_file_name | tersify_mlir 2> /dev/null
    terse_mlir_via_command_line = "\n".join(terse_mlir_via_command_line)

print(terse_mlir_via_command_line)
#CSR64 = #sparse_tensor.encoding<{
    dimLevelType = [ "dense", "compressed" ],
    dimOrdering = affine_map<(d0, d1) -> (d0, d1)>,
    pointerBitWidth = 64,
    indexBitWidth = 64
}>

#CSC64 = #sparse_tensor.encoding<{
    dimLevelType = [ "dense", "compressed" ],
    dimOrdering = affine_map<(d0, d1) -> (d1, d0)>,
    pointerBitWidth = 64,
    indexBitWidth = 64
}>

module {
  func @mat_mul(%arg0: tensor<?x?xf64, #CSR64>, %arg1: tensor<?x?xf64, #CSC64>) -> tensor<?x?xf64, #CSR64> {
    %0 = graphblas.matrix_multiply %arg0, %arg1 {semiring = "plus_times"} : (tensor<?x?xf64, #CSR64>, tensor<?x?xf64, #CSC64>) to tensor<?x?xf64, #CSR64>
    return %0 : tensor<?x?xf64, #CSR64>
  }
}
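The `!` shell escapes used in the cells above only work inside IPython or Jupyter. In a plain Python script, the same piping can be sketched with the standard library's subprocess module. The helper name pipe_through is hypothetical, and the example uses a stand-in command that simply echoes its input so that it runs without graphblas-opt installed.

```python
import subprocess
import sys


def pipe_through(command, text):
    """Send `text` to the stdin of `command` and return its stdout as a string."""
    result = subprocess.run(
        command,
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return result.stdout


# Stand-in command that copies stdin to stdout. In the setting of this page,
# the command would instead be [GRAPHBLAS_OPT_EXE] or ["tersify_mlir"],
# both of which read MLIR from stdin in the cells above.
echo_command = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
echoed = pipe_through(echo_command, "module {\n}\n")
print(echoed)
```

Passing the text through stdin this way also removes the need for a temporary file.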


Overview of sparsify_array

Very often when debugging or testing, it is useful to convert a dense tensor, represented as a NumPy array, into an MLIRSparseTensor.

sparsify_array from mlir_graphblas.tools.utils lets us do that.

Let’s say we wanted to convert this vector into an MLIRSparseTensor.


dense_vector = np.array([0, 0, 12, 0, 0, 34, 0, 0], dtype=np.int32)
dense_vector

array([ 0,  0, 12,  0,  0, 34,  0,  0], dtype=int32)

We would normally have to explicitly pass in the indices, values, shape, etc. into the constructor for MLIRSparseTensor as shown below.


indices = np.array([2, 5], dtype=np.uint64)
values = np.array([12, 34], dtype=np.int32)
sizes = np.array([8], dtype=np.uint64)
sparsity = np.array([True], dtype=np.bool_)
explicitly_generated_sparse_vector = MLIRSparseTensor(indices, values, sizes, sparsity)

explicitly_generated_sparse_vector

<mlir_graphblas.sparse_utils.MLIRSparseTensor at 0x7f86d670d860>

explicitly_generated_sparse_vector.shape

(8,)

explicitly_generated_sparse_vector.pointers

(array([0, 2], dtype=uint64),)

explicitly_generated_sparse_vector.indices

(array([2, 5], dtype=uint64),)

explicitly_generated_sparse_vector.values

array([12, 34], dtype=int32)

We can avoid writing such verbose code using sparsify_array. We only need to pass in the desired sparsity for each dimension.


sparse_vector = sparsify_array(dense_vector, [True])

sparse_vector

<mlir_graphblas.sparse_utils.MLIRSparseTensor at 0x7f86d740d220>

sparse_vector.shape

(8,)

sparse_vector.pointers

(array([0, 2], dtype=uint64),)

sparse_vector.indices

(array([2, 5], dtype=uint64),)

sparse_vector.values

array([12, 34], dtype=int32)
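The pointers, indices, and values shown above can be reproduced with plain NumPy, which may help when checking results by hand. This is only a sketch of the compressed vector layout, not the code sparsify_array runs; the helper name compress_vector is hypothetical.

```python
import numpy as np


def compress_vector(dense, missing=0):
    """Return (pointers, indices, values) for a 1-D array.

    Entries equal to `missing` are dropped. Hypothetical helper for illustration.
    """
    dense = np.asarray(dense)
    mask = dense != missing
    indices = np.flatnonzero(mask).astype(np.uint64)  # positions of stored entries
    values = dense[mask]                              # stored entries, original dtype
    pointers = np.array([0, len(indices)], dtype=np.uint64)  # one segment for a vector
    return pointers, indices, values


dense_vector = np.array([0, 0, 12, 0, 0, 34, 0, 0], dtype=np.int32)
pointers, indices, values = compress_vector(dense_vector)
print(pointers)  # [0 2]
print(indices)   # [2 5]
print(values)    # [12 34]
```

For a vector, the pointers array has exactly two entries: the start of the stored range and the number of stored values.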

The default missing value is 0, but another value can be specified with the missing keyword argument.

sparse_vector = sparsify_array(dense_vector, [True], missing=12)

We’ll show examples of how to use sparsify_array with matrices below. Note that sparsify_array works with tensors of any rank (not just vectors and matrices) as long as a sparsity value is provided for each dimension.

Overview of MLIRSparseTensor.toarray

Very often when debugging or testing, it is useful to be able to convert an MLIRSparseTensor into a dense tensor represented as a NumPy array.

MLIRSparseTensor.toarray allows us to do this. By default, this method treats missing values as zeros. This isn’t necessarily the correct behavior for all applications, so it’s always worth sanity checking what value is assumed for the missing values.

Let’s first convert the sparse vector we created above into a dense NumPy vector.


sparse_vector.toarray()

array([ 0,  0, 12,  0,  0, 34,  0,  0], dtype=int32)

We can also convert CSR and CSC matrices into NumPy matrices.

Let’s first create a CSR matrix via sparsify_array.


dense_matrix = np.array(
    [
        [1, 0, 0, 0, 0],
        [0, 2, 3, 0, 0],
        [0, 0, 4, 0, 0],
        [0, 0, 5, 6, 0],
        [0, 0, 0, 0, 0],
    ],
    dtype=np.float64,
)
csr_matrix = sparsify_array(dense_matrix, [False, True])

csr_matrix

<mlir_graphblas.sparse_utils.MLIRSparseTensor at 0x7f86d7538bd0>

csr_matrix.shape

(5, 5)

csr_matrix.pointers

(array([], dtype=uint64), array([0, 1, 3, 4, 6, 6], dtype=uint64))

csr_matrix.indices

(array([], dtype=uint64), array([0, 1, 2, 2, 2, 3], dtype=uint64))

csr_matrix.values

array([1., 2., 3., 4., 5., 6.])
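As a cross-check, the CSR arrays for the compressed dimension can be derived from the dense matrix with plain NumPy. This is a sketch under the assumption that stored entries are exactly those not equal to the missing value; dense_to_csr is a hypothetical helper, and it returns only the arrays for the compressed (second) dimension rather than the per-dimension tuples shown above.

```python
import numpy as np


def dense_to_csr(dense, missing=0):
    """Return (pointers, indices, values) for a 2-D array in CSR layout.

    Hypothetical helper for illustration, not part of mlir_graphblas.
    """
    dense = np.asarray(dense)
    mask = dense != missing
    counts = mask.sum(axis=1)  # stored entries per row
    pointers = np.concatenate(([0], np.cumsum(counts))).astype(np.uint64)
    indices = np.nonzero(mask)[1].astype(np.uint64)  # column indices, row-major order
    values = dense[mask]                             # stored values, row-major order
    return pointers, indices, values


dense_matrix = np.array(
    [
        [1, 0, 0, 0, 0],
        [0, 2, 3, 0, 0],
        [0, 0, 4, 0, 0],
        [0, 0, 5, 6, 0],
        [0, 0, 0, 0, 0],
    ],
    dtype=np.float64,
)
pointers, indices, values = dense_to_csr(dense_matrix)
print(pointers)  # [0 1 3 4 6 6]
print(indices)   # [0 1 2 2 2 3]
print(values)    # [1. 2. 3. 4. 5. 6.]
```

Row r owns the slice pointers[r]:pointers[r + 1] of indices and values, which is why the empty last row repeats the final pointer.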

Let’s now create a dense matrix from this CSR matrix.


round_trip_dense_matrix = csr_matrix.toarray()
round_trip_dense_matrix

array([[1., 0., 0., 0., 0.],
       [0., 2., 3., 0., 0.],
       [0., 0., 4., 0., 0.],
       [0., 0., 5., 6., 0.],
       [0., 0., 0., 0., 0.]])

round_trip_dense_matrix.dtype

dtype('float64')

As with sparsify_array, the missing value can be set to something other than the default of 0.


csr_matrix.toarray(missing=-99)

array([[  1., -99., -99., -99., -99.],
       [-99.,   2.,   3., -99., -99.],
       [-99., -99.,   4., -99., -99.],
       [-99., -99.,   5.,   6., -99.],
       [-99., -99., -99., -99., -99.]])
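The fill behavior shown above can be sketched in plain NumPy: start from an array filled with the missing value, then scatter the stored values into place. This is only an illustration of the idea, not the implementation of MLIRSparseTensor.toarray; csr_to_dense is a hypothetical helper that takes the compressed dimension's arrays directly.

```python
import numpy as np


def csr_to_dense(pointers, indices, values, shape, missing=0):
    """Rebuild a dense 2-D array from CSR arrays, filling gaps with `missing`.

    Hypothetical helper for illustration, not part of mlir_graphblas.
    """
    dense = np.full(shape, missing, dtype=values.dtype)
    row_lengths = np.diff(pointers.astype(np.int64))  # stored entries per row
    rows = np.repeat(np.arange(shape[0]), row_lengths)  # row index of each stored value
    dense[rows, indices.astype(np.int64)] = values
    return dense


pointers = np.array([0, 1, 3, 4, 6, 6], dtype=np.uint64)
indices = np.array([0, 1, 2, 2, 2, 3], dtype=np.uint64)
values = np.array([1, 2, 3, 4, 5, 6], dtype=np.float64)

zero_filled = csr_to_dense(pointers, indices, values, (5, 5))
custom_filled = csr_to_dense(pointers, indices, values, (5, 5), missing=-99)
print(custom_filled)
```

Only the fill value differs between the two results; the stored entries land in the same positions either way.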