This page was generated from tools/cli/using_debugresult.ipynb.

Using DebugResult

Here, we show how to use DebugResult to diagnose problems we might encounter when using our mlir-opt CLI Wrapper.

Let’s first import the necessary classes and create an instance of our mlir-opt CLI Wrapper.


from mlir_graphblas import MlirOptCli

cli = MlirOptCli(executable=None, options=None)
Using development graphblas-opt: /Users/pnguyen/code/mlir-graphblas/mlir_graphblas/src/build/bin/graphblas-opt

Generate Example Input

Let’s say we have a bunch of MLIR code that we’re not familiar with.


mlir_string = """
#trait_sum_reduction = {
  indexing_maps = [
    affine_map<(i,j,k) -> (i,j,k)>,  // A
    affine_map<(i,j,k) -> ()>        // x (scalar out)
  ],
  iterator_types = ["reduction", "reduction", "reduction"],
  doc = "x += SUM_ijk A(i,j,k)"
}

#sparseTensor = #sparse_tensor.encoding<{
  dimLevelType = [ "compressed", "compressed", "compressed" ],
  dimOrdering = affine_map<(i,j,k) -> (i,j,k)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

func @func_f32(%argA: tensor<10x20x30xf32, #sparseTensor>) -> f32 {
  %out_tensor = linalg.init_tensor [] : tensor<f32>
  %reduction = linalg.generic #trait_sum_reduction
     ins(%argA: tensor<10x20x30xf32, #sparseTensor>)
    outs(%out_tensor: tensor<f32>) {
      ^bb(%a: f32, %x: f32):
        %0 = arith.addf %x, %a : f32
        linalg.yield %0 : f32
  } -> tensor<f32>
  %answer = tensor.extract %reduction[] : tensor<f32>
  return %answer : f32
}
"""
mlir_bytes = mlir_string.encode()

Since we’re not familiar with this code, we don’t know exactly which passes are necessary or in what order they should be applied.

Let’s say that this is the first set of passes we try.


passes = [
    "--sparsification",
    "--sparse-tensor-conversion",
    "--linalg-bufferize",
    "--arith-bufferize",
    "--func-bufferize",
    "--tensor-bufferize",
    "--finalizing-bufferize",
    "--convert-linalg-to-loops",
    "--convert-vector-to-llvm",
    "--convert-math-to-llvm",
    "--convert-math-to-libm",
    "--convert-memref-to-llvm",
    "--convert-openmp-to-llvm",
    "--convert-arith-to-llvm",
    "--convert-std-to-llvm",
    "--reconcile-unrealized-casts"
]
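Each entry in this list is a flag that the wrapper presumably forwards to the mlir-opt binary, so the equivalent shell invocation can be assembled directly. A minimal sketch (the input filename is hypothetical, and only the first and last passes are shown):

```python
# Assemble the mlir-opt command line implied by the pass list above.
# "input.mlir" is a hypothetical filename standing in for our module.
passes = [
    "--sparsification",
    "--sparse-tensor-conversion",
    # ... (the remaining passes from the list above go here) ...
    "--reconcile-unrealized-casts",
]
command = " ".join(["mlir-opt"] + passes + ["input.mlir"])
print(command)
```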

Let’s see what results we get.


result = cli.apply_passes(mlir_bytes, passes)
[stderr] <stdin>:20:16: error: failed to legalize operation 'builtin.unrealized_conversion_cast' that was explicitly marked illegal
[stderr]   %reduction = linalg.generic #trait_sum_reduction
[stderr]                ^
[stderr] <stdin>:20:16: note: see current operation: %4 = "builtin.unrealized_conversion_cast"(%3) : (i64) -> index
---------------------------------------------------------------------------
MlirOptError                              Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 result = cli.apply_passes(mlir_bytes, passes)

File ~/code/mlir-graphblas/mlir_graphblas/cli.py:93, in MlirOptCli.apply_passes(self, file, passes)
     91         input = self._read_input(fp)
     92 err.debug_result = self.debug_passes(input, passes) if passes else None
---> 93 raise err

MlirOptError: <stdin>:20:16: error: failed to legalize operation 'builtin.unrealized_conversion_cast' that was explicitly marked illegal
  %reduction = linalg.generic #trait_sum_reduction
               ^

We get an exception.

Unfortunately, the exception message isn’t very helpful: it reports the immediate error but gives no context, e.g. in which pass (if any) the error occurred, or whether a necessary pass is missing.

We only know that the operation builtin.unrealized_conversion_cast shows up somewhere and that it’s a problem.
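Note from the traceback above that the raised MlirOptError already carries a debug_result attribute, populated via debug_passes before the error is re-raised. Here is a minimal sketch of that catch-and-inspect pattern; the stub classes below merely mimic the behavior visible in the traceback so the sketch runs standalone, and are not the real mlir_graphblas API:

```python
# Stub classes mimicking the behavior seen in the traceback above.
# This is an illustrative sketch, not the real mlir_graphblas API.
class DebugResult:
    def __init__(self, passes):
        self.passes = passes  # the passes that were attempted

class MlirOptError(Exception):
    def __init__(self, message, debug_result=None):
        super().__init__(message)
        # The real cli.py attaches a DebugResult before re-raising.
        self.debug_result = debug_result

def apply_passes_stub(source, passes):
    # Simulate a failing pipeline that attaches debug information.
    raise MlirOptError(
        "failed to legalize operation "
        "'builtin.unrealized_conversion_cast'",
        debug_result=DebugResult(passes),
    )

try:
    apply_passes_stub(b"...", ["--reconcile-unrealized-casts"])
except MlirOptError as err:
    # The attached DebugResult can be inspected after the failure.
    result = err.debug_result
    print(result.passes)
```

This pattern lets us recover the debug information from a failed apply_passes call without running the pipeline a second time by hand.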

Let’s use the debug_passes method instead of apply_passes to get more information.


result = cli.debug_passes(mlir_bytes, passes)

result

=================================================
  Error when running reconcile-unrealized-casts
=================================================
<stdin>:24:10: error: failed to legalize operation 'builtin.unrealized_conversion_cast' that was explicitly marked illegal
    %4 = builtin.unrealized_conversion_cast %3 : i64 to index
         ^
<stdin>:24:10: note: see current operation: %4 = "builtin.unrealized_conversion_cast"(%3) : (i64) -> index loc("<stdin>":24:10)


=======================================
  Input to reconcile-unrealized-casts
=======================================
             10        20        30        40        50        60        70        80        90        100       110       120       130       140       150       160       170       180       190       200
    12345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123
    -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  1|module attributes {llvm.data_layout = ""} {
  2|  llvm.func @malloc(i64) -> !llvm.ptr<i8>
  3|  llvm.func @sparseValuesF32(%arg0: !llvm.ptr<i8>) -> !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)> attributes {llvm.emit_c_interface, sym_visibility = "private"} {
  4|    %0 = llvm.mlir.constant(1 : index) : i64
  5|    %1 = llvm.alloca %0 x !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>
  6|    llvm.call @_mlir_ciface_sparseValuesF32(%1, %arg0) : (!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>) -> ()
  7|    %2 = llvm.load %1 : !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>
  8|    llvm.return %2 : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
  9|  }
 10|  llvm.func @_mlir_ciface_sparseValuesF32(!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>) attributes {llvm.emit_c_interface, sym_visibility = "private"}
 11|  llvm.func @sparsePointers64(%arg0: !llvm.ptr<i8>, %arg1: i64) -> !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> attributes {llvm.emit_c_interface, sym_visibility = "private"} {
 12|    %0 = llvm.mlir.constant(1 : index) : i64
 13|    %1 = llvm.alloca %0 x !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr<struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>>
 14|    llvm.call @_mlir_ciface_sparsePointers64(%1, %arg0, %arg1) : (!llvm.ptr<struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>, i64) -> ()
 15|    %2 = llvm.load %1 : !llvm.ptr<struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>>
 16|    llvm.return %2 : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
 17|  }
 18|  llvm.func @_mlir_ciface_sparsePointers64(!llvm.ptr<struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>, i64) attributes {llvm.emit_c_interface, sym_visibility = "private"}
 19|  llvm.func @func_f32(%arg0: !llvm.ptr<i8>) -> f32 {
 20|    %0 = llvm.mlir.constant(0 : index) : i64
 21|    %1 = builtin.unrealized_conversion_cast %0 : i64 to index
 22|    %2 = builtin.unrealized_conversion_cast %1 : index to i64
 23|    %3 = llvm.mlir.constant(1 : index) : i64
 24|    %4 = builtin.unrealized_conversion_cast %3 : i64 to index
 25|    %5 = builtin.unrealized_conversion_cast %4 : index to i64
 26|    %6 = llvm.mlir.constant(2 : index) : i64
 27|    %7 = llvm.mlir.constant(0.000000e+00 : f32) : f32
 28|    %8 = llvm.call @sparsePointers64(%arg0, %0) : (!llvm.ptr<i8>, i64) -> !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
 29|    %9 = builtin.unrealized_conversion_cast %8 : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> to memref<?xi64>
 30|    %10 = builtin.unrealized_conversion_cast %9 : memref<?xi64> to !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
 31|    %11 = llvm.call @sparsePointers64(%arg0, %3) : (!llvm.ptr<i8>, i64) -> !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
 32|    %12 = builtin.unrealized_conversion_cast %11 : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> to memref<?xi64>
 33|    %13 = builtin.unrealized_conversion_cast %12 : memref<?xi64> to !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
 34|    %14 = llvm.call @sparsePointers64(%arg0, %6) : (!llvm.ptr<i8>, i64) -> !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
 35|    %15 = builtin.unrealized_conversion_cast %14 : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> to memref<?xi64>
 36|    %16 = builtin.unrealized_conversion_cast %15 : memref<?xi64> to !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
 37|    %17 = llvm.call @sparseValuesF32(%arg0) : (!llvm.ptr<i8>) -> !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
 38|    %18 = builtin.unrealized_conversion_cast %17 : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)> to memref<?xf32>
 39|    %19 = builtin.unrealized_conversion_cast %18 : memref<?xf32> to !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
 40|    %20 = llvm.mlir.constant(1 : index) : i64
 41|    %21 = llvm.mlir.null : !llvm.ptr<f32>
 42|    %22 = llvm.getelementptr %21[%20] : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>
 43|    %23 = llvm.ptrtoint %22 : !llvm.ptr<f32> to i64
 44|    %24 = llvm.call @malloc(%23) : (i64) -> !llvm.ptr<i8>
 45|    %25 = llvm.bitcast %24 : !llvm.ptr<i8> to !llvm.ptr<f32>
 46|    %26 = llvm.mlir.undef : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
 47|    %27 = llvm.insertvalue %25, %26[0] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
 48|    %28 = llvm.insertvalue %25, %27[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
 49|    %29 = llvm.mlir.constant(0 : index) : i64
 50|    %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
 51|    %31 = llvm.extractvalue %30[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
 52|    llvm.store %7, %31 : !llvm.ptr<f32>
 53|    %32 = llvm.extractvalue %30[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
 54|    %33 = llvm.load %32 : !llvm.ptr<f32>
 55|    %34 = llvm.extractvalue %10[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
 56|    %35 = llvm.getelementptr %34[%2] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
 57|    %36 = llvm.load %35 : !llvm.ptr<i64>
 58|    %37 = builtin.unrealized_conversion_cast %36 : i64 to index
 59|    %38 = llvm.extractvalue %10[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
 60|    %39 = llvm.getelementptr %38[%5] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
 61|    %40 = llvm.load %39 : !llvm.ptr<i64>
 62|    %41 = builtin.unrealized_conversion_cast %40 : i64 to index
 63|    %42 = scf.for %arg1 = %37 to %41 step %4 iter_args(%arg2 = %33) -> (f32) {
 64|      %46 = builtin.unrealized_conversion_cast %arg1 : index to i64
 65|      %47 = builtin.unrealized_conversion_cast %arg1 : index to i64
 66|      %48 = llvm.extractvalue %13[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
 67|      %49 = llvm.getelementptr %48[%47] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
 68|      %50 = llvm.load %49 : !llvm.ptr<i64>
 69|      %51 = builtin.unrealized_conversion_cast %50 : i64 to index
 70|      %52 = llvm.add %46, %3  : i64
 71|      %53 = builtin.unrealized_conversion_cast %52 : i64 to index
 72|      %54 = builtin.unrealized_conversion_cast %53 : index to i64
 73|      %55 = llvm.extractvalue %13[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
 74|      %56 = llvm.getelementptr %55[%54] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
 75|      %57 = llvm.load %56 : !llvm.ptr<i64>
 76|      %58 = builtin.unrealized_conversion_cast %57 : i64 to index
 77|      %59 = scf.for %arg3 = %51 to %58 step %4 iter_args(%arg4 = %arg2) -> (f32) {
 78|        %60 = builtin.unrealized_conversion_cast %arg3 : index to i64
 79|        %61 = builtin.unrealized_conversion_cast %arg3 : index to i64
 80|        %62 = llvm.extractvalue %16[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
 81|        %63 = llvm.getelementptr %62[%61] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
 82|        %64 = llvm.load %63 : !llvm.ptr<i64>
 83|        %65 = builtin.unrealized_conversion_cast %64 : i64 to index
 84|        %66 = llvm.add %60, %3  : i64
 85|        %67 = builtin.unrealized_conversion_cast %66 : i64 to index
 86|        %68 = builtin.unrealized_conversion_cast %67 : index to i64
 87|        %69 = llvm.extractvalue %16[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
 88|        %70 = llvm.getelementptr %69[%68] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
 89|        %71 = llvm.load %70 : !llvm.ptr<i64>
 90|        %72 = builtin.unrealized_conversion_cast %71 : i64 to index
 91|        %73 = scf.for %arg5 = %65 to %72 step %4 iter_args(%arg6 = %arg4) -> (f32) {
 92|          %74 = builtin.unrealized_conversion_cast %arg5 : index to i64
 93|          %75 = llvm.extractvalue %19[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
 94|          %76 = llvm.getelementptr %75[%74] : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>
 95|          %77 = llvm.load %76 : !llvm.ptr<f32>
 96|          %78 = llvm.fadd %arg6, %77  : f32
 97|          scf.yield %78 : f32
 98|        }
 99|        scf.yield %73 : f32
100|      }
101|      scf.yield %59 : f32
102|    }
103|    %43 = llvm.extractvalue %30[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
104|    llvm.store %42, %43 : !llvm.ptr<f32>
105|    %44 = llvm.extractvalue %30[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
106|    %45 = llvm.load %44 : !llvm.ptr<f32>
107|    llvm.return %45 : f32
108|  }
109|}
110|

================================
  Input to convert-std-to-llvm
================================
module {
  llvm.func @malloc(i64) -> !llvm.ptr<i8>
  llvm.func @sparseValuesF32(%arg0: !llvm.ptr<i8>) -> !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)> attributes {llvm.emit_c_interface, sym_visibility = "private"} {
    %0 = llvm.mlir.constant(1 : index) : i64
    %1 = llvm.alloca %0 x !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>
    llvm.call @_mlir_ciface_sparseValuesF32(%1, %arg0) : (!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>) -> ()
    %2 = llvm.load %1 : !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>
    llvm.return %2 : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
  }
  llvm.func @_mlir_ciface_sparseValuesF32(!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>) attributes {llvm.emit_c_interface, sym_visibility = "private"}
  llvm.func @sparsePointers64(%arg0: !llvm.ptr<i8>, %arg1: i64) -> !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> attributes {llvm.emit_c_interface, sym_visibility = "private"} {
    %0 = llvm.mlir.constant(1 : index) : i64
    %1 = llvm.alloca %0 x !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr<struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>>
    llvm.call @_mlir_ciface_sparsePointers64(%1, %arg0, %arg1) : (!llvm.ptr<struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>, i64) -> ()
    %2 = llvm.load %1 : !llvm.ptr<struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>>
    llvm.return %2 : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
  }
  llvm.func @_mlir_ciface_sparsePointers64(!llvm.ptr<struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>, i64) attributes {llvm.emit_c_interface, sym_visibility = "private"}
  llvm.func @func_f32(%arg0: !llvm.ptr<i8>) -> f32 {
    %0 = llvm.mlir.constant(0 : index) : i64
    %1 = builtin.unrealized_conversion_cast %0 : i64 to index
    %2 = builtin.unrealized_conversion_cast %1 : index to i64
    %3 = llvm.mlir.constant(1 : index) : i64
    %4 = builtin.unrealized_conversion_cast %3 : i64 to index
    %5 = builtin.unrealized_conversion_cast %4 : index to i64
    %6 = llvm.mlir.constant(2 : index) : i64
    %7 = llvm.mlir.constant(0.000000e+00 : f32) : f32
    %8 = llvm.call @sparsePointers64(%arg0, %0) : (!llvm.ptr<i8>, i64) -> !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %9 = builtin.unrealized_conversion_cast %8 : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> to memref<?xi64>
    %10 = builtin.unrealized_conversion_cast %9 : memref<?xi64> to !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %11 = llvm.call @sparsePointers64(%arg0, %3) : (!llvm.ptr<i8>, i64) -> !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %12 = builtin.unrealized_conversion_cast %11 : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> to memref<?xi64>
    %13 = builtin.unrealized_conversion_cast %12 : memref<?xi64> to !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %14 = llvm.call @sparsePointers64(%arg0, %6) : (!llvm.ptr<i8>, i64) -> !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %15 = builtin.unrealized_conversion_cast %14 : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> to memref<?xi64>
    %16 = builtin.unrealized_conversion_cast %15 : memref<?xi64> to !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %17 = llvm.call @sparseValuesF32(%arg0) : (!llvm.ptr<i8>) -> !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
    %18 = builtin.unrealized_conversion_cast %17 : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)> to memref<?xf32>
    %19 = builtin.unrealized_conversion_cast %18 : memref<?xf32> to !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
    %20 = llvm.mlir.constant(1 : index) : i64
    %21 = llvm.mlir.null : !llvm.ptr<f32>
    %22 = llvm.getelementptr %21[%20] : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>
    %23 = llvm.ptrtoint %22 : !llvm.ptr<f32> to i64
    %24 = llvm.call @malloc(%23) : (i64) -> !llvm.ptr<i8>
    %25 = llvm.bitcast %24 : !llvm.ptr<i8> to !llvm.ptr<f32>
    %26 = llvm.mlir.undef : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %27 = llvm.insertvalue %25, %26[0] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %28 = llvm.insertvalue %25, %27[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %29 = llvm.mlir.constant(0 : index) : i64
    %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %31 = llvm.extractvalue %30[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    llvm.store %7, %31 : !llvm.ptr<f32>
    %32 = llvm.extractvalue %30[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %33 = llvm.load %32 : !llvm.ptr<f32>
    %34 = llvm.extractvalue %10[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %35 = llvm.getelementptr %34[%2] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %36 = llvm.load %35 : !llvm.ptr<i64>
    %37 = builtin.unrealized_conversion_cast %36 : i64 to index
    %38 = llvm.extractvalue %10[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %39 = llvm.getelementptr %38[%5] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %40 = llvm.load %39 : !llvm.ptr<i64>
    %41 = builtin.unrealized_conversion_cast %40 : i64 to index
    %42 = scf.for %arg1 = %37 to %41 step %4 iter_args(%arg2 = %33) -> (f32) {
      %46 = builtin.unrealized_conversion_cast %arg1 : index to i64
      %47 = builtin.unrealized_conversion_cast %arg1 : index to i64
      %48 = llvm.extractvalue %13[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
      %49 = llvm.getelementptr %48[%47] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
      %50 = llvm.load %49 : !llvm.ptr<i64>
      %51 = builtin.unrealized_conversion_cast %50 : i64 to index
      %52 = llvm.add %46, %3  : i64
      %53 = builtin.unrealized_conversion_cast %52 : i64 to index
      %54 = builtin.unrealized_conversion_cast %53 : index to i64
      %55 = llvm.extractvalue %13[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
      %56 = llvm.getelementptr %55[%54] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
      %57 = llvm.load %56 : !llvm.ptr<i64>
      %58 = builtin.unrealized_conversion_cast %57 : i64 to index
      %59 = scf.for %arg3 = %51 to %58 step %4 iter_args(%arg4 = %arg2) -> (f32) {
        %60 = builtin.unrealized_conversion_cast %arg3 : index to i64
        %61 = builtin.unrealized_conversion_cast %arg3 : index to i64
        %62 = llvm.extractvalue %16[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
        %63 = llvm.getelementptr %62[%61] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
        %64 = llvm.load %63 : !llvm.ptr<i64>
        %65 = builtin.unrealized_conversion_cast %64 : i64 to index
        %66 = llvm.add %60, %3  : i64
        %67 = builtin.unrealized_conversion_cast %66 : i64 to index
        %68 = builtin.unrealized_conversion_cast %67 : index to i64
        %69 = llvm.extractvalue %16[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
        %70 = llvm.getelementptr %69[%68] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
        %71 = llvm.load %70 : !llvm.ptr<i64>
        %72 = builtin.unrealized_conversion_cast %71 : i64 to index
        %73 = scf.for %arg5 = %65 to %72 step %4 iter_args(%arg6 = %arg4) -> (f32) {
          %74 = builtin.unrealized_conversion_cast %arg5 : index to i64
          %75 = llvm.extractvalue %19[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
          %76 = llvm.getelementptr %75[%74] : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>
          %77 = llvm.load %76 : !llvm.ptr<f32>
          %78 = llvm.fadd %arg6, %77  : f32
          scf.yield %78 : f32
        }
        scf.yield %73 : f32
      }
      scf.yield %59 : f32
    }
    %43 = llvm.extractvalue %30[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    llvm.store %42, %43 : !llvm.ptr<f32>
    %44 = llvm.extractvalue %30[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %45 = llvm.load %44 : !llvm.ptr<f32>
    llvm.return %45 : f32
  }
}



==================================
  Input to convert-arith-to-llvm
==================================
module {
  llvm.func @malloc(i64) -> !llvm.ptr<i8>
  llvm.func @sparseValuesF32(%arg0: !llvm.ptr<i8>) -> !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)> attributes {llvm.emit_c_interface, sym_visibility = "private"} {
    %0 = llvm.mlir.constant(1 : index) : i64
    %1 = llvm.alloca %0 x !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>
    llvm.call @_mlir_ciface_sparseValuesF32(%1, %arg0) : (!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>) -> ()
    %2 = llvm.load %1 : !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>
    llvm.return %2 : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
  }
  llvm.func @_mlir_ciface_sparseValuesF32(!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>) attributes {llvm.emit_c_interface, sym_visibility = "private"}
  llvm.func @sparsePointers64(%arg0: !llvm.ptr<i8>, %arg1: i64) -> !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> attributes {llvm.emit_c_interface, sym_visibility = "private"} {
    %0 = llvm.mlir.constant(1 : index) : i64
    %1 = llvm.alloca %0 x !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr<struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>>
    llvm.call @_mlir_ciface_sparsePointers64(%1, %arg0, %arg1) : (!llvm.ptr<struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>, i64) -> ()
    %2 = llvm.load %1 : !llvm.ptr<struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>>
    llvm.return %2 : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
  }
  llvm.func @_mlir_ciface_sparsePointers64(!llvm.ptr<struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>, i64) attributes {llvm.emit_c_interface, sym_visibility = "private"}
  llvm.func @func_f32(%arg0: !llvm.ptr<i8>) -> f32 {
    %0 = llvm.mlir.constant(0 : index) : i64
    %1 = builtin.unrealized_conversion_cast %0 : i64 to index
    %2 = builtin.unrealized_conversion_cast %1 : index to i64
    %3 = llvm.mlir.constant(1 : index) : i64
    %4 = builtin.unrealized_conversion_cast %3 : i64 to index
    %5 = builtin.unrealized_conversion_cast %4 : index to i64
    %6 = llvm.mlir.constant(2 : index) : i64
    %7 = llvm.mlir.constant(0.000000e+00 : f32) : f32
    %8 = llvm.call @sparsePointers64(%arg0, %0) : (!llvm.ptr<i8>, i64) -> !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %9 = builtin.unrealized_conversion_cast %8 : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> to memref<?xi64>
    %10 = builtin.unrealized_conversion_cast %9 : memref<?xi64> to !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %11 = llvm.call @sparsePointers64(%arg0, %3) : (!llvm.ptr<i8>, i64) -> !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %12 = builtin.unrealized_conversion_cast %11 : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> to memref<?xi64>
    %13 = builtin.unrealized_conversion_cast %12 : memref<?xi64> to !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %14 = llvm.call @sparsePointers64(%arg0, %6) : (!llvm.ptr<i8>, i64) -> !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %15 = builtin.unrealized_conversion_cast %14 : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> to memref<?xi64>
    %16 = builtin.unrealized_conversion_cast %15 : memref<?xi64> to !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %17 = llvm.call @sparseValuesF32(%arg0) : (!llvm.ptr<i8>) -> !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
    %18 = builtin.unrealized_conversion_cast %17 : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)> to memref<?xf32>
    %19 = builtin.unrealized_conversion_cast %18 : memref<?xf32> to !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
    %20 = llvm.mlir.constant(1 : index) : i64
    %21 = llvm.mlir.null : !llvm.ptr<f32>
    %22 = llvm.getelementptr %21[%20] : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>
    %23 = llvm.ptrtoint %22 : !llvm.ptr<f32> to i64
    %24 = llvm.call @malloc(%23) : (i64) -> !llvm.ptr<i8>
    %25 = llvm.bitcast %24 : !llvm.ptr<i8> to !llvm.ptr<f32>
    %26 = llvm.mlir.undef : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %27 = llvm.insertvalue %25, %26[0] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %28 = llvm.insertvalue %25, %27[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %29 = llvm.mlir.constant(0 : index) : i64
    %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %31 = llvm.extractvalue %30[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    llvm.store %7, %31 : !llvm.ptr<f32>
    %32 = llvm.extractvalue %30[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %33 = llvm.load %32 : !llvm.ptr<f32>
    %34 = llvm.extractvalue %10[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %35 = llvm.getelementptr %34[%2] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %36 = llvm.load %35 : !llvm.ptr<i64>
    %37 = builtin.unrealized_conversion_cast %36 : i64 to index
    %38 = llvm.extractvalue %10[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %39 = llvm.getelementptr %38[%5] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %40 = llvm.load %39 : !llvm.ptr<i64>
    %41 = builtin.unrealized_conversion_cast %40 : i64 to index
    %42 = scf.for %arg1 = %37 to %41 step %4 iter_args(%arg2 = %33) -> (f32) {
      %46 = builtin.unrealized_conversion_cast %arg1 : index to i64
      %47 = builtin.unrealized_conversion_cast %arg1 : index to i64
      %48 = llvm.extractvalue %13[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
      %49 = llvm.getelementptr %48[%47] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
      %50 = llvm.load %49 : !llvm.ptr<i64>
      %51 = builtin.unrealized_conversion_cast %50 : i64 to index
      %52 = llvm.add %46, %3  : i64
      %53 = builtin.unrealized_conversion_cast %52 : i64 to index
      %54 = builtin.unrealized_conversion_cast %53 : index to i64
      %55 = llvm.extractvalue %13[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
      %56 = llvm.getelementptr %55[%54] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
      %57 = llvm.load %56 : !llvm.ptr<i64>
      %58 = builtin.unrealized_conversion_cast %57 : i64 to index
      %59 = scf.for %arg3 = %51 to %58 step %4 iter_args(%arg4 = %arg2) -> (f32) {
        %60 = builtin.unrealized_conversion_cast %arg3 : index to i64
        %61 = builtin.unrealized_conversion_cast %arg3 : index to i64
        %62 = llvm.extractvalue %16[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
        %63 = llvm.getelementptr %62[%61] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
        %64 = llvm.load %63 : !llvm.ptr<i64>
        %65 = builtin.unrealized_conversion_cast %64 : i64 to index
        %66 = llvm.add %60, %3  : i64
        %67 = builtin.unrealized_conversion_cast %66 : i64 to index
        %68 = builtin.unrealized_conversion_cast %67 : index to i64
        %69 = llvm.extractvalue %16[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
        %70 = llvm.getelementptr %69[%68] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
        %71 = llvm.load %70 : !llvm.ptr<i64>
        %72 = builtin.unrealized_conversion_cast %71 : i64 to index
        %73 = scf.for %arg5 = %65 to %72 step %4 iter_args(%arg6 = %arg4) -> (f32) {
          %74 = builtin.unrealized_conversion_cast %arg5 : index to i64
          %75 = llvm.extractvalue %19[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
          %76 = llvm.getelementptr %75[%74] : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>
          %77 = llvm.load %76 : !llvm.ptr<f32>
          %78 = llvm.fadd %arg6, %77  : f32
          scf.yield %78 : f32
        }
        scf.yield %73 : f32
      }
      scf.yield %59 : f32
    }
    %43 = llvm.extractvalue %30[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    llvm.store %42, %43 : !llvm.ptr<f32>
    %44 = llvm.extractvalue %30[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %45 = llvm.load %44 : !llvm.ptr<f32>
    llvm.return %45 : f32
  }
}



===================================
  Input to convert-openmp-to-llvm
===================================
module {
  llvm.func @malloc(i64) -> !llvm.ptr<i8>
  func private @sparseValuesF32(!llvm.ptr<i8>) -> memref<?xf32> attributes {llvm.emit_c_interface}
  func private @sparsePointers64(!llvm.ptr<i8>, index) -> memref<?xi64> attributes {llvm.emit_c_interface}
  func @func_f32(%arg0: !llvm.ptr<i8>) -> f32 {
    %c0 = arith.constant 0 : index
    %0 = builtin.unrealized_conversion_cast %c0 : index to i64
    %c1 = arith.constant 1 : index
    %1 = builtin.unrealized_conversion_cast %c1 : index to i64
    %c2 = arith.constant 2 : index
    %cst = arith.constant 0.000000e+00 : f32
    %2 = call @sparsePointers64(%arg0, %c0) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %3 = builtin.unrealized_conversion_cast %2 : memref<?xi64> to !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %4 = call @sparsePointers64(%arg0, %c1) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %5 = builtin.unrealized_conversion_cast %4 : memref<?xi64> to !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %6 = call @sparsePointers64(%arg0, %c2) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %7 = builtin.unrealized_conversion_cast %6 : memref<?xi64> to !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %8 = call @sparseValuesF32(%arg0) : (!llvm.ptr<i8>) -> memref<?xf32>
    %9 = builtin.unrealized_conversion_cast %8 : memref<?xf32> to !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
    %10 = llvm.mlir.constant(1 : index) : i64
    %11 = llvm.mlir.null : !llvm.ptr<f32>
    %12 = llvm.getelementptr %11[%10] : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>
    %13 = llvm.ptrtoint %12 : !llvm.ptr<f32> to i64
    %14 = llvm.call @malloc(%13) : (i64) -> !llvm.ptr<i8>
    %15 = llvm.bitcast %14 : !llvm.ptr<i8> to !llvm.ptr<f32>
    %16 = llvm.mlir.undef : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %17 = llvm.insertvalue %15, %16[0] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %18 = llvm.insertvalue %15, %17[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %19 = llvm.mlir.constant(0 : index) : i64
    %20 = llvm.insertvalue %19, %18[2] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %21 = llvm.extractvalue %20[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    llvm.store %cst, %21 : !llvm.ptr<f32>
    %22 = llvm.extractvalue %20[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %23 = llvm.load %22 : !llvm.ptr<f32>
    %24 = llvm.extractvalue %3[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %25 = llvm.getelementptr %24[%0] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %26 = llvm.load %25 : !llvm.ptr<i64>
    %27 = arith.index_cast %26 : i64 to index
    %28 = llvm.extractvalue %3[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %29 = llvm.getelementptr %28[%1] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %30 = llvm.load %29 : !llvm.ptr<i64>
    %31 = arith.index_cast %30 : i64 to index
    %32 = scf.for %arg1 = %27 to %31 step %c1 iter_args(%arg2 = %23) -> (f32) {
      %36 = builtin.unrealized_conversion_cast %arg1 : index to i64
      %37 = llvm.extractvalue %5[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
      %38 = llvm.getelementptr %37[%36] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
      %39 = llvm.load %38 : !llvm.ptr<i64>
      %40 = arith.index_cast %39 : i64 to index
      %41 = arith.addi %arg1, %c1 : index
      %42 = builtin.unrealized_conversion_cast %41 : index to i64
      %43 = llvm.extractvalue %5[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
      %44 = llvm.getelementptr %43[%42] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
      %45 = llvm.load %44 : !llvm.ptr<i64>
      %46 = arith.index_cast %45 : i64 to index
      %47 = scf.for %arg3 = %40 to %46 step %c1 iter_args(%arg4 = %arg2) -> (f32) {
        %48 = builtin.unrealized_conversion_cast %arg3 : index to i64
        %49 = llvm.extractvalue %7[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
        %50 = llvm.getelementptr %49[%48] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
        %51 = llvm.load %50 : !llvm.ptr<i64>
        %52 = arith.index_cast %51 : i64 to index
        %53 = arith.addi %arg3, %c1 : index
        %54 = builtin.unrealized_conversion_cast %53 : index to i64
        %55 = llvm.extractvalue %7[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
        %56 = llvm.getelementptr %55[%54] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
        %57 = llvm.load %56 : !llvm.ptr<i64>
        %58 = arith.index_cast %57 : i64 to index
        %59 = scf.for %arg5 = %52 to %58 step %c1 iter_args(%arg6 = %arg4) -> (f32) {
          %60 = builtin.unrealized_conversion_cast %arg5 : index to i64
          %61 = llvm.extractvalue %9[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
          %62 = llvm.getelementptr %61[%60] : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>
          %63 = llvm.load %62 : !llvm.ptr<f32>
          %64 = arith.addf %arg6, %63 : f32
          scf.yield %64 : f32
        }
        scf.yield %59 : f32
      }
      scf.yield %47 : f32
    }
    %33 = llvm.extractvalue %20[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    llvm.store %32, %33 : !llvm.ptr<f32>
    %34 = llvm.extractvalue %20[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %35 = llvm.load %34 : !llvm.ptr<f32>
    return %35 : f32
  }
}



===================================
  Input to convert-memref-to-llvm
===================================
module {
  func private @sparseValuesF32(!llvm.ptr<i8>) -> memref<?xf32> attributes {llvm.emit_c_interface}
  func private @sparsePointers64(!llvm.ptr<i8>, index) -> memref<?xi64> attributes {llvm.emit_c_interface}
  func @func_f32(%arg0: !llvm.ptr<i8>) -> f32 {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = call @sparsePointers64(%arg0, %c0) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %1 = call @sparsePointers64(%arg0, %c1) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %2 = call @sparsePointers64(%arg0, %c2) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %3 = call @sparseValuesF32(%arg0) : (!llvm.ptr<i8>) -> memref<?xf32>
    %4 = memref.alloc() : memref<f32>
    memref.store %cst, %4[] : memref<f32>
    %5 = memref.load %4[] : memref<f32>
    %6 = memref.load %0[%c0] : memref<?xi64>
    %7 = arith.index_cast %6 : i64 to index
    %8 = memref.load %0[%c1] : memref<?xi64>
    %9 = arith.index_cast %8 : i64 to index
    %10 = scf.for %arg1 = %7 to %9 step %c1 iter_args(%arg2 = %5) -> (f32) {
      %12 = memref.load %1[%arg1] : memref<?xi64>
      %13 = arith.index_cast %12 : i64 to index
      %14 = arith.addi %arg1, %c1 : index
      %15 = memref.load %1[%14] : memref<?xi64>
      %16 = arith.index_cast %15 : i64 to index
      %17 = scf.for %arg3 = %13 to %16 step %c1 iter_args(%arg4 = %arg2) -> (f32) {
        %18 = memref.load %2[%arg3] : memref<?xi64>
        %19 = arith.index_cast %18 : i64 to index
        %20 = arith.addi %arg3, %c1 : index
        %21 = memref.load %2[%20] : memref<?xi64>
        %22 = arith.index_cast %21 : i64 to index
        %23 = scf.for %arg5 = %19 to %22 step %c1 iter_args(%arg6 = %arg4) -> (f32) {
          %24 = memref.load %3[%arg5] : memref<?xf32>
          %25 = arith.addf %arg6, %24 : f32
          scf.yield %25 : f32
        }
        scf.yield %23 : f32
      }
      scf.yield %17 : f32
    }
    memref.store %10, %4[] : memref<f32>
    %11 = memref.load %4[] : memref<f32>
    return %11 : f32
  }
}



=================================
  Input to convert-math-to-libm
=================================
module {
  func private @sparseValuesF32(!llvm.ptr<i8>) -> memref<?xf32> attributes {llvm.emit_c_interface}
  func private @sparsePointers64(!llvm.ptr<i8>, index) -> memref<?xi64> attributes {llvm.emit_c_interface}
  func @func_f32(%arg0: !llvm.ptr<i8>) -> f32 {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = call @sparsePointers64(%arg0, %c0) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %1 = call @sparsePointers64(%arg0, %c1) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %2 = call @sparsePointers64(%arg0, %c2) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %3 = call @sparseValuesF32(%arg0) : (!llvm.ptr<i8>) -> memref<?xf32>
    %4 = memref.alloc() : memref<f32>
    memref.store %cst, %4[] : memref<f32>
    %5 = memref.load %4[] : memref<f32>
    %6 = memref.load %0[%c0] : memref<?xi64>
    %7 = arith.index_cast %6 : i64 to index
    %8 = memref.load %0[%c1] : memref<?xi64>
    %9 = arith.index_cast %8 : i64 to index
    %10 = scf.for %arg1 = %7 to %9 step %c1 iter_args(%arg2 = %5) -> (f32) {
      %12 = memref.load %1[%arg1] : memref<?xi64>
      %13 = arith.index_cast %12 : i64 to index
      %14 = arith.addi %arg1, %c1 : index
      %15 = memref.load %1[%14] : memref<?xi64>
      %16 = arith.index_cast %15 : i64 to index
      %17 = scf.for %arg3 = %13 to %16 step %c1 iter_args(%arg4 = %arg2) -> (f32) {
        %18 = memref.load %2[%arg3] : memref<?xi64>
        %19 = arith.index_cast %18 : i64 to index
        %20 = arith.addi %arg3, %c1 : index
        %21 = memref.load %2[%20] : memref<?xi64>
        %22 = arith.index_cast %21 : i64 to index
        %23 = scf.for %arg5 = %19 to %22 step %c1 iter_args(%arg6 = %arg4) -> (f32) {
          %24 = memref.load %3[%arg5] : memref<?xf32>
          %25 = arith.addf %arg6, %24 : f32
          scf.yield %25 : f32
        }
        scf.yield %23 : f32
      }
      scf.yield %17 : f32
    }
    memref.store %10, %4[] : memref<f32>
    %11 = memref.load %4[] : memref<f32>
    return %11 : f32
  }
}



=================================
  Input to convert-math-to-llvm
=================================
module {
  func private @sparseValuesF32(!llvm.ptr<i8>) -> memref<?xf32> attributes {llvm.emit_c_interface}
  func private @sparsePointers64(!llvm.ptr<i8>, index) -> memref<?xi64> attributes {llvm.emit_c_interface}
  func @func_f32(%arg0: !llvm.ptr<i8>) -> f32 {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = call @sparsePointers64(%arg0, %c0) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %1 = call @sparsePointers64(%arg0, %c1) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %2 = call @sparsePointers64(%arg0, %c2) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %3 = call @sparseValuesF32(%arg0) : (!llvm.ptr<i8>) -> memref<?xf32>
    %4 = memref.alloc() : memref<f32>
    memref.store %cst, %4[] : memref<f32>
    %5 = memref.load %4[] : memref<f32>
    %6 = memref.load %0[%c0] : memref<?xi64>
    %7 = arith.index_cast %6 : i64 to index
    %8 = memref.load %0[%c1] : memref<?xi64>
    %9 = arith.index_cast %8 : i64 to index
    %10 = scf.for %arg1 = %7 to %9 step %c1 iter_args(%arg2 = %5) -> (f32) {
      %12 = memref.load %1[%arg1] : memref<?xi64>
      %13 = arith.index_cast %12 : i64 to index
      %14 = arith.addi %arg1, %c1 : index
      %15 = memref.load %1[%14] : memref<?xi64>
      %16 = arith.index_cast %15 : i64 to index
      %17 = scf.for %arg3 = %13 to %16 step %c1 iter_args(%arg4 = %arg2) -> (f32) {
        %18 = memref.load %2[%arg3] : memref<?xi64>
        %19 = arith.index_cast %18 : i64 to index
        %20 = arith.addi %arg3, %c1 : index
        %21 = memref.load %2[%20] : memref<?xi64>
        %22 = arith.index_cast %21 : i64 to index
        %23 = scf.for %arg5 = %19 to %22 step %c1 iter_args(%arg6 = %arg4) -> (f32) {
          %24 = memref.load %3[%arg5] : memref<?xf32>
          %25 = arith.addf %arg6, %24 : f32
          scf.yield %25 : f32
        }
        scf.yield %23 : f32
      }
      scf.yield %17 : f32
    }
    memref.store %10, %4[] : memref<f32>
    %11 = memref.load %4[] : memref<f32>
    return %11 : f32
  }
}



===================================
  Input to convert-vector-to-llvm
===================================
module {
  func private @sparseValuesF32(!llvm.ptr<i8>) -> memref<?xf32> attributes {llvm.emit_c_interface}
  func private @sparsePointers64(!llvm.ptr<i8>, index) -> memref<?xi64> attributes {llvm.emit_c_interface}
  func @func_f32(%arg0: !llvm.ptr<i8>) -> f32 {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = call @sparsePointers64(%arg0, %c0) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %1 = call @sparsePointers64(%arg0, %c1) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %2 = call @sparsePointers64(%arg0, %c2) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %3 = call @sparseValuesF32(%arg0) : (!llvm.ptr<i8>) -> memref<?xf32>
    %4 = memref.alloc() : memref<f32>
    memref.store %cst, %4[] : memref<f32>
    %5 = memref.load %4[] : memref<f32>
    %6 = memref.load %0[%c0] : memref<?xi64>
    %7 = arith.index_cast %6 : i64 to index
    %8 = memref.load %0[%c1] : memref<?xi64>
    %9 = arith.index_cast %8 : i64 to index
    %10 = scf.for %arg1 = %7 to %9 step %c1 iter_args(%arg2 = %5) -> (f32) {
      %12 = memref.load %1[%arg1] : memref<?xi64>
      %13 = arith.index_cast %12 : i64 to index
      %14 = arith.addi %arg1, %c1 : index
      %15 = memref.load %1[%14] : memref<?xi64>
      %16 = arith.index_cast %15 : i64 to index
      %17 = scf.for %arg3 = %13 to %16 step %c1 iter_args(%arg4 = %arg2) -> (f32) {
        %18 = memref.load %2[%arg3] : memref<?xi64>
        %19 = arith.index_cast %18 : i64 to index
        %20 = arith.addi %arg3, %c1 : index
        %21 = memref.load %2[%20] : memref<?xi64>
        %22 = arith.index_cast %21 : i64 to index
        %23 = scf.for %arg5 = %19 to %22 step %c1 iter_args(%arg6 = %arg4) -> (f32) {
          %24 = memref.load %3[%arg5] : memref<?xf32>
          %25 = arith.addf %arg6, %24 : f32
          scf.yield %25 : f32
        }
        scf.yield %23 : f32
      }
      scf.yield %17 : f32
    }
    memref.store %10, %4[] : memref<f32>
    %11 = memref.load %4[] : memref<f32>
    return %11 : f32
  }
}



====================================
  Input to convert-linalg-to-loops
====================================
module {
  func private @sparseValuesF32(!llvm.ptr<i8>) -> memref<?xf32> attributes {llvm.emit_c_interface}
  func private @sparsePointers64(!llvm.ptr<i8>, index) -> memref<?xi64> attributes {llvm.emit_c_interface}
  func @func_f32(%arg0: !llvm.ptr<i8>) -> f32 {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = call @sparsePointers64(%arg0, %c0) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %1 = call @sparsePointers64(%arg0, %c1) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %2 = call @sparsePointers64(%arg0, %c2) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %3 = call @sparseValuesF32(%arg0) : (!llvm.ptr<i8>) -> memref<?xf32>
    %4 = memref.alloc() : memref<f32>
    linalg.fill(%cst, %4) : f32, memref<f32>
    %5 = memref.load %4[] : memref<f32>
    %6 = memref.load %0[%c0] : memref<?xi64>
    %7 = arith.index_cast %6 : i64 to index
    %8 = memref.load %0[%c1] : memref<?xi64>
    %9 = arith.index_cast %8 : i64 to index
    %10 = scf.for %arg1 = %7 to %9 step %c1 iter_args(%arg2 = %5) -> (f32) {
      %12 = memref.load %1[%arg1] : memref<?xi64>
      %13 = arith.index_cast %12 : i64 to index
      %14 = arith.addi %arg1, %c1 : index
      %15 = memref.load %1[%14] : memref<?xi64>
      %16 = arith.index_cast %15 : i64 to index
      %17 = scf.for %arg3 = %13 to %16 step %c1 iter_args(%arg4 = %arg2) -> (f32) {
        %18 = memref.load %2[%arg3] : memref<?xi64>
        %19 = arith.index_cast %18 : i64 to index
        %20 = arith.addi %arg3, %c1 : index
        %21 = memref.load %2[%20] : memref<?xi64>
        %22 = arith.index_cast %21 : i64 to index
        %23 = scf.for %arg5 = %19 to %22 step %c1 iter_args(%arg6 = %arg4) -> (f32) {
          %24 = memref.load %3[%arg5] : memref<?xf32>
          %25 = arith.addf %arg6, %24 : f32
          scf.yield %25 : f32
        }
        scf.yield %23 : f32
      }
      scf.yield %17 : f32
    }
    memref.store %10, %4[] : memref<f32>
    %11 = memref.load %4[] : memref<f32>
    return %11 : f32
  }
}



=================================
  Input to finalizing-bufferize
=================================
module {
  func private @sparseValuesF32(!llvm.ptr<i8>) -> memref<?xf32> attributes {llvm.emit_c_interface}
  func private @sparsePointers64(!llvm.ptr<i8>, index) -> memref<?xi64> attributes {llvm.emit_c_interface}
  func @func_f32(%arg0: !llvm.ptr<i8>) -> f32 {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = call @sparsePointers64(%arg0, %c0) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %1 = call @sparsePointers64(%arg0, %c1) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %2 = call @sparsePointers64(%arg0, %c2) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %3 = call @sparseValuesF32(%arg0) : (!llvm.ptr<i8>) -> memref<?xf32>
    %4 = memref.alloc() : memref<f32>
    linalg.fill(%cst, %4) : f32, memref<f32>
    %5 = memref.load %4[] : memref<f32>
    %6 = memref.load %0[%c0] : memref<?xi64>
    %7 = arith.index_cast %6 : i64 to index
    %8 = memref.load %0[%c1] : memref<?xi64>
    %9 = arith.index_cast %8 : i64 to index
    %10 = scf.for %arg1 = %7 to %9 step %c1 iter_args(%arg2 = %5) -> (f32) {
      %12 = memref.load %1[%arg1] : memref<?xi64>
      %13 = arith.index_cast %12 : i64 to index
      %14 = arith.addi %arg1, %c1 : index
      %15 = memref.load %1[%14] : memref<?xi64>
      %16 = arith.index_cast %15 : i64 to index
      %17 = scf.for %arg3 = %13 to %16 step %c1 iter_args(%arg4 = %arg2) -> (f32) {
        %18 = memref.load %2[%arg3] : memref<?xi64>
        %19 = arith.index_cast %18 : i64 to index
        %20 = arith.addi %arg3, %c1 : index
        %21 = memref.load %2[%20] : memref<?xi64>
        %22 = arith.index_cast %21 : i64 to index
        %23 = scf.for %arg5 = %19 to %22 step %c1 iter_args(%arg6 = %arg4) -> (f32) {
          %24 = memref.load %3[%arg5] : memref<?xf32>
          %25 = arith.addf %arg6, %24 : f32
          scf.yield %25 : f32
        }
        scf.yield %23 : f32
      }
      scf.yield %17 : f32
    }
    memref.store %10, %4[] : memref<f32>
    %11 = memref.load %4[] : memref<f32>
    return %11 : f32
  }
}



=============================
  Input to tensor-bufferize
=============================
module {
  func private @sparseValuesF32(!llvm.ptr<i8>) -> memref<?xf32> attributes {llvm.emit_c_interface}
  func private @sparsePointers64(!llvm.ptr<i8>, index) -> memref<?xi64> attributes {llvm.emit_c_interface}
  func @func_f32(%arg0: !llvm.ptr<i8>) -> f32 {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = call @sparsePointers64(%arg0, %c0) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %1 = call @sparsePointers64(%arg0, %c1) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %2 = call @sparsePointers64(%arg0, %c2) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %3 = call @sparseValuesF32(%arg0) : (!llvm.ptr<i8>) -> memref<?xf32>
    %4 = memref.alloc() : memref<f32>
    linalg.fill(%cst, %4) : f32, memref<f32>
    %5 = memref.load %4[] : memref<f32>
    %6 = memref.load %0[%c0] : memref<?xi64>
    %7 = arith.index_cast %6 : i64 to index
    %8 = memref.load %0[%c1] : memref<?xi64>
    %9 = arith.index_cast %8 : i64 to index
    %10 = scf.for %arg1 = %7 to %9 step %c1 iter_args(%arg2 = %5) -> (f32) {
      %13 = memref.load %1[%arg1] : memref<?xi64>
      %14 = arith.index_cast %13 : i64 to index
      %15 = arith.addi %arg1, %c1 : index
      %16 = memref.load %1[%15] : memref<?xi64>
      %17 = arith.index_cast %16 : i64 to index
      %18 = scf.for %arg3 = %14 to %17 step %c1 iter_args(%arg4 = %arg2) -> (f32) {
        %19 = memref.load %2[%arg3] : memref<?xi64>
        %20 = arith.index_cast %19 : i64 to index
        %21 = arith.addi %arg3, %c1 : index
        %22 = memref.load %2[%21] : memref<?xi64>
        %23 = arith.index_cast %22 : i64 to index
        %24 = scf.for %arg5 = %20 to %23 step %c1 iter_args(%arg6 = %arg4) -> (f32) {
          %25 = memref.load %3[%arg5] : memref<?xf32>
          %26 = arith.addf %arg6, %25 : f32
          scf.yield %26 : f32
        }
        scf.yield %24 : f32
      }
      scf.yield %18 : f32
    }
    memref.store %10, %4[] : memref<f32>
    %11 = bufferization.to_tensor %4 : memref<f32>
    %12 = tensor.extract %11[] : tensor<f32>
    return %12 : f32
  }
}



===========================
  Input to func-bufferize
===========================
module {
  func private @sparseValuesF32(!llvm.ptr<i8>) -> memref<?xf32> attributes {llvm.emit_c_interface}
  func private @sparsePointers64(!llvm.ptr<i8>, index) -> memref<?xi64> attributes {llvm.emit_c_interface}
  func @func_f32(%arg0: !llvm.ptr<i8>) -> f32 {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = call @sparsePointers64(%arg0, %c0) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %1 = call @sparsePointers64(%arg0, %c1) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %2 = call @sparsePointers64(%arg0, %c2) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %3 = call @sparseValuesF32(%arg0) : (!llvm.ptr<i8>) -> memref<?xf32>
    %4 = memref.alloc() : memref<f32>
    linalg.fill(%cst, %4) : f32, memref<f32>
    %5 = memref.load %4[] : memref<f32>
    %6 = memref.load %0[%c0] : memref<?xi64>
    %7 = arith.index_cast %6 : i64 to index
    %8 = memref.load %0[%c1] : memref<?xi64>
    %9 = arith.index_cast %8 : i64 to index
    %10 = scf.for %arg1 = %7 to %9 step %c1 iter_args(%arg2 = %5) -> (f32) {
      %13 = memref.load %1[%arg1] : memref<?xi64>
      %14 = arith.index_cast %13 : i64 to index
      %15 = arith.addi %arg1, %c1 : index
      %16 = memref.load %1[%15] : memref<?xi64>
      %17 = arith.index_cast %16 : i64 to index
      %18 = scf.for %arg3 = %14 to %17 step %c1 iter_args(%arg4 = %arg2) -> (f32) {
        %19 = memref.load %2[%arg3] : memref<?xi64>
        %20 = arith.index_cast %19 : i64 to index
        %21 = arith.addi %arg3, %c1 : index
        %22 = memref.load %2[%21] : memref<?xi64>
        %23 = arith.index_cast %22 : i64 to index
        %24 = scf.for %arg5 = %20 to %23 step %c1 iter_args(%arg6 = %arg4) -> (f32) {
          %25 = memref.load %3[%arg5] : memref<?xf32>
          %26 = arith.addf %arg6, %25 : f32
          scf.yield %26 : f32
        }
        scf.yield %24 : f32
      }
      scf.yield %18 : f32
    }
    memref.store %10, %4[] : memref<f32>
    %11 = bufferization.to_tensor %4 : memref<f32>
    %12 = tensor.extract %11[] : tensor<f32>
    return %12 : f32
  }
}



============================
  Input to arith-bufferize
============================
module {
  func private @sparseValuesF32(!llvm.ptr<i8>) -> memref<?xf32> attributes {llvm.emit_c_interface}
  func private @sparsePointers64(!llvm.ptr<i8>, index) -> memref<?xi64> attributes {llvm.emit_c_interface}
  func @func_f32(%arg0: !llvm.ptr<i8>) -> f32 {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = call @sparsePointers64(%arg0, %c0) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %1 = call @sparsePointers64(%arg0, %c1) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %2 = call @sparsePointers64(%arg0, %c2) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %3 = call @sparseValuesF32(%arg0) : (!llvm.ptr<i8>) -> memref<?xf32>
    %4 = memref.alloc() : memref<f32>
    linalg.fill(%cst, %4) : f32, memref<f32>
    %5 = memref.load %4[] : memref<f32>
    %6 = memref.load %0[%c0] : memref<?xi64>
    %7 = arith.index_cast %6 : i64 to index
    %8 = memref.load %0[%c1] : memref<?xi64>
    %9 = arith.index_cast %8 : i64 to index
    %10 = scf.for %arg1 = %7 to %9 step %c1 iter_args(%arg2 = %5) -> (f32) {
      %13 = memref.load %1[%arg1] : memref<?xi64>
      %14 = arith.index_cast %13 : i64 to index
      %15 = arith.addi %arg1, %c1 : index
      %16 = memref.load %1[%15] : memref<?xi64>
      %17 = arith.index_cast %16 : i64 to index
      %18 = scf.for %arg3 = %14 to %17 step %c1 iter_args(%arg4 = %arg2) -> (f32) {
        %19 = memref.load %2[%arg3] : memref<?xi64>
        %20 = arith.index_cast %19 : i64 to index
        %21 = arith.addi %arg3, %c1 : index
        %22 = memref.load %2[%21] : memref<?xi64>
        %23 = arith.index_cast %22 : i64 to index
        %24 = scf.for %arg5 = %20 to %23 step %c1 iter_args(%arg6 = %arg4) -> (f32) {
          %25 = memref.load %3[%arg5] : memref<?xf32>
          %26 = arith.addf %arg6, %25 : f32
          scf.yield %26 : f32
        }
        scf.yield %24 : f32
      }
      scf.yield %18 : f32
    }
    memref.store %10, %4[] : memref<f32>
    %11 = bufferization.to_tensor %4 : memref<f32>
    %12 = tensor.extract %11[] : tensor<f32>
    return %12 : f32
  }
}



=============================
  Input to linalg-bufferize
=============================
module {
  func private @sparseValuesF32(!llvm.ptr<i8>) -> memref<?xf32> attributes {llvm.emit_c_interface}
  func private @sparsePointers64(!llvm.ptr<i8>, index) -> memref<?xi64> attributes {llvm.emit_c_interface}
  func @func_f32(%arg0: !llvm.ptr<i8>) -> f32 {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = call @sparsePointers64(%arg0, %c0) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %1 = call @sparsePointers64(%arg0, %c1) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %2 = call @sparsePointers64(%arg0, %c2) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %3 = call @sparseValuesF32(%arg0) : (!llvm.ptr<i8>) -> memref<?xf32>
    %4 = memref.alloc() : memref<f32>
    linalg.fill(%cst, %4) : f32, memref<f32>
    %5 = memref.load %4[] : memref<f32>
    %6 = memref.load %0[%c0] : memref<?xi64>
    %7 = arith.index_cast %6 : i64 to index
    %8 = memref.load %0[%c1] : memref<?xi64>
    %9 = arith.index_cast %8 : i64 to index
    %10 = scf.for %arg1 = %7 to %9 step %c1 iter_args(%arg2 = %5) -> (f32) {
      %13 = memref.load %1[%arg1] : memref<?xi64>
      %14 = arith.index_cast %13 : i64 to index
      %15 = arith.addi %arg1, %c1 : index
      %16 = memref.load %1[%15] : memref<?xi64>
      %17 = arith.index_cast %16 : i64 to index
      %18 = scf.for %arg3 = %14 to %17 step %c1 iter_args(%arg4 = %arg2) -> (f32) {
        %19 = memref.load %2[%arg3] : memref<?xi64>
        %20 = arith.index_cast %19 : i64 to index
        %21 = arith.addi %arg3, %c1 : index
        %22 = memref.load %2[%21] : memref<?xi64>
        %23 = arith.index_cast %22 : i64 to index
        %24 = scf.for %arg5 = %20 to %23 step %c1 iter_args(%arg6 = %arg4) -> (f32) {
          %25 = memref.load %3[%arg5] : memref<?xf32>
          %26 = arith.addf %arg6, %25 : f32
          scf.yield %26 : f32
        }
        scf.yield %24 : f32
      }
      scf.yield %18 : f32
    }
    memref.store %10, %4[] : memref<f32>
    %11 = bufferization.to_tensor %4 : memref<f32>
    %12 = tensor.extract %11[] : tensor<f32>
    return %12 : f32
  }
}



=====================================
  Input to sparse-tensor-conversion
=====================================
module {
  func @func_f32(%arg0: tensor<10x20x30xf32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed", "compressed" ], dimOrdering = affine_map<(d0, d1, d2) -> (d0, d1, d2)>, pointerBitWidth = 64, indexBitWidth = 64 }>>) -> f32 {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = sparse_tensor.pointers %arg0, %c0 : tensor<10x20x30xf32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed", "compressed" ], dimOrdering = affine_map<(d0, d1, d2) -> (d0, d1, d2)>, pointerBitWidth = 64, indexBitWidth = 64 }>> to memref<?xi64>
    %1 = sparse_tensor.pointers %arg0, %c1 : tensor<10x20x30xf32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed", "compressed" ], dimOrdering = affine_map<(d0, d1, d2) -> (d0, d1, d2)>, pointerBitWidth = 64, indexBitWidth = 64 }>> to memref<?xi64>
    %2 = sparse_tensor.pointers %arg0, %c2 : tensor<10x20x30xf32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed", "compressed" ], dimOrdering = affine_map<(d0, d1, d2) -> (d0, d1, d2)>, pointerBitWidth = 64, indexBitWidth = 64 }>> to memref<?xi64>
    %3 = sparse_tensor.values %arg0 : tensor<10x20x30xf32, #sparse_tensor.encoding<{ dimLevelType = [ "compressed", "compressed", "compressed" ], dimOrdering = affine_map<(d0, d1, d2) -> (d0, d1, d2)>, pointerBitWidth = 64, indexBitWidth = 64 }>> to memref<?xf32>
    %4 = memref.alloc() : memref<f32>
    linalg.fill(%cst, %4) : f32, memref<f32>
    %5 = memref.load %4[] : memref<f32>
    %6 = memref.load %0[%c0] : memref<?xi64>
    %7 = arith.index_cast %6 : i64 to index
    %8 = memref.load %0[%c1] : memref<?xi64>
    %9 = arith.index_cast %8 : i64 to index
    %10 = scf.for %arg1 = %7 to %9 step %c1 iter_args(%arg2 = %5) -> (f32) {
      %13 = memref.load %1[%arg1] : memref<?xi64>
      %14 = arith.index_cast %13 : i64 to index
      %15 = arith.addi %arg1, %c1 : index
      %16 = memref.load %1[%15] : memref<?xi64>
      %17 = arith.index_cast %16 : i64 to index
      %18 = scf.for %arg3 = %14 to %17 step %c1 iter_args(%arg4 = %arg2) -> (f32) {
        %19 = memref.load %2[%arg3] : memref<?xi64>
        %20 = arith.index_cast %19 : i64 to index
        %21 = arith.addi %arg3, %c1 : index
        %22 = memref.load %2[%21] : memref<?xi64>
        %23 = arith.index_cast %22 : i64 to index
        %24 = scf.for %arg5 = %20 to %23 step %c1 iter_args(%arg6 = %arg4) -> (f32) {
          %25 = memref.load %3[%arg5] : memref<?xf32>
          %26 = arith.addf %arg6, %25 : f32
          scf.yield %26 : f32
        }
        scf.yield %24 : f32
      }
      scf.yield %18 : f32
    }
    memref.store %10, %4[] : memref<f32>
    %11 = bufferization.to_tensor %4 : memref<f32>
    %12 = tensor.extract %11[] : tensor<f32>
    return %12 : f32
  }
}



===========================
  Input to sparsification
===========================

#trait_sum_reduction = {
  indexing_maps = [
    affine_map<(i,j,k) -> (i,j,k)>,  // A
    affine_map<(i,j,k) -> ()>        // x (scalar out)
  ],
  iterator_types = ["reduction", "reduction", "reduction"],
  doc = "x += SUM_ijk A(i,j,k)"
}

#sparseTensor = #sparse_tensor.encoding<{
  dimLevelType = [ "compressed", "compressed", "compressed" ],
  dimOrdering = affine_map<(i,j,k) -> (i,j,k)>,
  pointerBitWidth = 64,
  indexBitWidth = 64
}>

func @func_f32(%argA: tensor<10x20x30xf32, #sparseTensor>) -> f32 {
  %out_tensor = linalg.init_tensor [] : tensor<f32>
  %reduction = linalg.generic #trait_sum_reduction
     ins(%argA: tensor<10x20x30xf32, #sparseTensor>)
    outs(%out_tensor: tensor<f32>) {
      ^bb(%a: f32, %x: f32):
        %0 = arith.addf %x, %a : f32
        linalg.yield %0 : f32
  } -> tensor<f32>
  %answer = tensor.extract %reduction[] : tensor<f32>
  return %answer : f32
}

This output may seem intimidating due to its size, but it's large mostly because it shows the input to each pass.

We know that the error happens when the builtin.unrealized_conversion_cast operation occurs.
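Rather than scanning the whole dump by eye, we can locate the first pass whose input contains the problematic operation programmatically. The sketch below is a minimal, hypothetical helper (not part of the mlir-graphblas API); in the notebook you would pass it `str(result)` instead of the tiny synthetic example string used here.

```python
# Hedged sketch: walk the "Input to <pass>" sections of a DebugResult dump
# and report the first pass whose input IR mentions the given operation.
# The sample text below is synthetic; in practice use str(result).
sample_dump = """
=================
  Input to pass-a
=================
module { }

=================
  Input to pass-b
=================
module { builtin.unrealized_conversion_cast }
"""

def first_pass_with_op(debug_text, op_name):
    """Return the name of the first pass whose input section contains op_name."""
    current_pass = None
    for line in debug_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Input to "):
            # Banner line: remember which pass the following IR belongs to.
            current_pass = stripped[len("Input to "):]
        elif current_pass is not None and op_name in line:
            return current_pass
    return None

print(first_pass_with_op(sample_dump, "builtin.unrealized_conversion_cast"))
# prints "pass-b"
```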

We can see from the output above that it happens during the convert-std-to-llvm pass.

It’s likely that there’s something problematic in the input to that pass, so it’s worth looking into the IR that was given to the convert-std-to-llvm pass, which we can see under the section labelled `Input to convert-std-to-llvm`. We’ll show a short snippet of it below.


result_string = str(result)
lines = result_string.splitlines()
lines = lines[lines.index("  Input to convert-std-to-llvm  ")-1:]
lines = lines[:lines.index("")]
print("\n".join(lines))
================================
  Input to convert-std-to-llvm
================================
module {
  llvm.func @malloc(i64) -> !llvm.ptr<i8>
  llvm.func @sparseValuesF32(%arg0: !llvm.ptr<i8>) -> !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)> attributes {llvm.emit_c_interface, sym_visibility = "private"} {
    %0 = llvm.mlir.constant(1 : index) : i64
    %1 = llvm.alloca %0 x !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>
    llvm.call @_mlir_ciface_sparseValuesF32(%1, %arg0) : (!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>) -> ()
    %2 = llvm.load %1 : !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>
    llvm.return %2 : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
  }
  llvm.func @_mlir_ciface_sparseValuesF32(!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>) attributes {llvm.emit_c_interface, sym_visibility = "private"}
  llvm.func @sparsePointers64(%arg0: !llvm.ptr<i8>, %arg1: i64) -> !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> attributes {llvm.emit_c_interface, sym_visibility = "private"} {
    %0 = llvm.mlir.constant(1 : index) : i64
    %1 = llvm.alloca %0 x !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr<struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>>
    llvm.call @_mlir_ciface_sparsePointers64(%1, %arg0, %arg1) : (!llvm.ptr<struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>, i64) -> ()
    %2 = llvm.load %1 : !llvm.ptr<struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>>
    llvm.return %2 : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
  }
  llvm.func @_mlir_ciface_sparsePointers64(!llvm.ptr<struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>, i64) attributes {llvm.emit_c_interface, sym_visibility = "private"}
  llvm.func @func_f32(%arg0: !llvm.ptr<i8>) -> f32 {
    %0 = llvm.mlir.constant(0 : index) : i64
    %1 = builtin.unrealized_conversion_cast %0 : i64 to index
    %2 = builtin.unrealized_conversion_cast %1 : index to i64
    %3 = llvm.mlir.constant(1 : index) : i64
    %4 = builtin.unrealized_conversion_cast %3 : i64 to index
    %5 = builtin.unrealized_conversion_cast %4 : index to i64
    %6 = llvm.mlir.constant(2 : index) : i64
    %7 = llvm.mlir.constant(0.000000e+00 : f32) : f32
    %8 = llvm.call @sparsePointers64(%arg0, %0) : (!llvm.ptr<i8>, i64) -> !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %9 = builtin.unrealized_conversion_cast %8 : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> to memref<?xi64>
    %10 = builtin.unrealized_conversion_cast %9 : memref<?xi64> to !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %11 = llvm.call @sparsePointers64(%arg0, %3) : (!llvm.ptr<i8>, i64) -> !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %12 = builtin.unrealized_conversion_cast %11 : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> to memref<?xi64>
    %13 = builtin.unrealized_conversion_cast %12 : memref<?xi64> to !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %14 = llvm.call @sparsePointers64(%arg0, %6) : (!llvm.ptr<i8>, i64) -> !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %15 = builtin.unrealized_conversion_cast %14 : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> to memref<?xi64>
    %16 = builtin.unrealized_conversion_cast %15 : memref<?xi64> to !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %17 = llvm.call @sparseValuesF32(%arg0) : (!llvm.ptr<i8>) -> !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
    %18 = builtin.unrealized_conversion_cast %17 : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)> to memref<?xf32>
    %19 = builtin.unrealized_conversion_cast %18 : memref<?xf32> to !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
    %20 = llvm.mlir.constant(1 : index) : i64
    %21 = llvm.mlir.null : !llvm.ptr<f32>
    %22 = llvm.getelementptr %21[%20] : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>
    %23 = llvm.ptrtoint %22 : !llvm.ptr<f32> to i64
    %24 = llvm.call @malloc(%23) : (i64) -> !llvm.ptr<i8>
    %25 = llvm.bitcast %24 : !llvm.ptr<i8> to !llvm.ptr<f32>
    %26 = llvm.mlir.undef : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %27 = llvm.insertvalue %25, %26[0] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %28 = llvm.insertvalue %25, %27[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %29 = llvm.mlir.constant(0 : index) : i64
    %30 = llvm.insertvalue %29, %28[2] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %31 = llvm.extractvalue %30[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    llvm.store %7, %31 : !llvm.ptr<f32>
    %32 = llvm.extractvalue %30[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %33 = llvm.load %32 : !llvm.ptr<f32>
    %34 = llvm.extractvalue %10[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %35 = llvm.getelementptr %34[%2] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %36 = llvm.load %35 : !llvm.ptr<i64>
    %37 = builtin.unrealized_conversion_cast %36 : i64 to index
    %38 = llvm.extractvalue %10[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
    %39 = llvm.getelementptr %38[%5] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
    %40 = llvm.load %39 : !llvm.ptr<i64>
    %41 = builtin.unrealized_conversion_cast %40 : i64 to index
    %42 = scf.for %arg1 = %37 to %41 step %4 iter_args(%arg2 = %33) -> (f32) {
      %46 = builtin.unrealized_conversion_cast %arg1 : index to i64
      %47 = builtin.unrealized_conversion_cast %arg1 : index to i64
      %48 = llvm.extractvalue %13[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
      %49 = llvm.getelementptr %48[%47] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
      %50 = llvm.load %49 : !llvm.ptr<i64>
      %51 = builtin.unrealized_conversion_cast %50 : i64 to index
      %52 = llvm.add %46, %3  : i64
      %53 = builtin.unrealized_conversion_cast %52 : i64 to index
      %54 = builtin.unrealized_conversion_cast %53 : index to i64
      %55 = llvm.extractvalue %13[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
      %56 = llvm.getelementptr %55[%54] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
      %57 = llvm.load %56 : !llvm.ptr<i64>
      %58 = builtin.unrealized_conversion_cast %57 : i64 to index
      %59 = scf.for %arg3 = %51 to %58 step %4 iter_args(%arg4 = %arg2) -> (f32) {
        %60 = builtin.unrealized_conversion_cast %arg3 : index to i64
        %61 = builtin.unrealized_conversion_cast %arg3 : index to i64
        %62 = llvm.extractvalue %16[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
        %63 = llvm.getelementptr %62[%61] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
        %64 = llvm.load %63 : !llvm.ptr<i64>
        %65 = builtin.unrealized_conversion_cast %64 : i64 to index
        %66 = llvm.add %60, %3  : i64
        %67 = builtin.unrealized_conversion_cast %66 : i64 to index
        %68 = builtin.unrealized_conversion_cast %67 : index to i64
        %69 = llvm.extractvalue %16[1] : !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>
        %70 = llvm.getelementptr %69[%68] : (!llvm.ptr<i64>, i64) -> !llvm.ptr<i64>
        %71 = llvm.load %70 : !llvm.ptr<i64>
        %72 = builtin.unrealized_conversion_cast %71 : i64 to index
        %73 = scf.for %arg5 = %65 to %72 step %4 iter_args(%arg6 = %arg4) -> (f32) {
          %74 = builtin.unrealized_conversion_cast %arg5 : index to i64
          %75 = llvm.extractvalue %19[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
          %76 = llvm.getelementptr %75[%74] : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>
          %77 = llvm.load %76 : !llvm.ptr<f32>
          %78 = llvm.fadd %arg6, %77  : f32
          scf.yield %78 : f32
        }
        scf.yield %73 : f32
      }
      scf.yield %59 : f32
    }
    %43 = llvm.extractvalue %30[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    llvm.store %42, %43 : !llvm.ptr<f32>
    %44 = llvm.extractvalue %30[1] : !llvm.struct<(ptr<f32>, ptr<f32>, i64)>
    %45 = llvm.load %44 : !llvm.ptr<f32>
    llvm.return %45 : f32
  }
}

While this is a good idea in general, it doesn’t seem to be useful here. When MLIR applies a pass, that pass is applied until quiescence, i.e. it keeps applying the pass until nothing changes (or until some limit on the number of applications is reached).

It seems that the convert-std-to-llvm pass has already been applied a few times since we see several ops from the LLVM dialect already present in the IR shown under the Input to convert-std-to-llvm section (for example, we see llvm.mlir.constant).
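We can verify this mechanically rather than by eyeballing the dump. The standalone sketch below (the `section_text` string is a small stand-in for the IR shown above; in the notebook you would slice it out of `result_string` as before) counts how many ops already carry the `llvm.` prefix versus how many leftover casts remain:

```python
import re

# Stand-in for the "Input to convert-std-to-llvm" IR shown above.
section_text = """
%0 = llvm.mlir.constant(0 : index) : i64
%1 = builtin.unrealized_conversion_cast %0 : i64 to index
%42 = llvm.add %46, %3 : i64
"""

# Match op names on the right-hand side of assignments, e.g. `llvm.mlir.constant`.
llvm_ops = re.findall(r"= (llvm\.[\w.]+)", section_text)
cast_ops = re.findall(r"builtin\.unrealized_conversion_cast", section_text)
print(f"{len(llvm_ops)} llvm ops, {len(cast_ops)} unrealized casts")
```

On the full section text this confirms that LLVM-dialect ops are already present, i.e. the pass has already done much of its work by the time the error surfaces.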

Another good place to look is in the output of the last pass right before we get our error. Let’s look at the result of the convert-math-to-llvm pass.


lines = result_string.splitlines()
lines = lines[lines.index("  Input to convert-math-to-llvm  ")-1:]
lines = lines[:lines.index("")]
print("\n".join(lines))
=================================
  Input to convert-math-to-llvm
=================================
module {
  func private @sparseValuesF32(!llvm.ptr<i8>) -> memref<?xf32> attributes {llvm.emit_c_interface}
  func private @sparsePointers64(!llvm.ptr<i8>, index) -> memref<?xi64> attributes {llvm.emit_c_interface}
  func @func_f32(%arg0: !llvm.ptr<i8>) -> f32 {
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    %c2 = arith.constant 2 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = call @sparsePointers64(%arg0, %c0) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %1 = call @sparsePointers64(%arg0, %c1) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %2 = call @sparsePointers64(%arg0, %c2) : (!llvm.ptr<i8>, index) -> memref<?xi64>
    %3 = call @sparseValuesF32(%arg0) : (!llvm.ptr<i8>) -> memref<?xf32>
    %4 = memref.alloc() : memref<f32>
    memref.store %cst, %4[] : memref<f32>
    %5 = memref.load %4[] : memref<f32>
    %6 = memref.load %0[%c0] : memref<?xi64>
    %7 = arith.index_cast %6 : i64 to index
    %8 = memref.load %0[%c1] : memref<?xi64>
    %9 = arith.index_cast %8 : i64 to index
    %10 = scf.for %arg1 = %7 to %9 step %c1 iter_args(%arg2 = %5) -> (f32) {
      %12 = memref.load %1[%arg1] : memref<?xi64>
      %13 = arith.index_cast %12 : i64 to index
      %14 = arith.addi %arg1, %c1 : index
      %15 = memref.load %1[%14] : memref<?xi64>
      %16 = arith.index_cast %15 : i64 to index
      %17 = scf.for %arg3 = %13 to %16 step %c1 iter_args(%arg4 = %arg2) -> (f32) {
        %18 = memref.load %2[%arg3] : memref<?xi64>
        %19 = arith.index_cast %18 : i64 to index
        %20 = arith.addi %arg3, %c1 : index
        %21 = memref.load %2[%20] : memref<?xi64>
        %22 = arith.index_cast %21 : i64 to index
        %23 = scf.for %arg5 = %19 to %22 step %c1 iter_args(%arg6 = %arg4) -> (f32) {
          %24 = memref.load %3[%arg5] : memref<?xf32>
          %25 = arith.addf %arg6, %24 : f32
          scf.yield %25 : f32
        }
        scf.yield %23 : f32
      }
      scf.yield %17 : f32
    }
    memref.store %10, %4[] : memref<f32>
    %11 = memref.load %4[] : memref<f32>
    return %11 : f32
  }
}

We see that the ops are mostly from the standard, llvm, and builtin dialects, but there are also some ops from the scf dialect. It would make sense for the convert-std-to-llvm pass to handle ops from the builtin dialect, and also from the llvm dialect since that’s the target dialect. Given the name of the convert-std-to-llvm pass, however, we can infer that it mostly handles ops from the std dialect and likely cannot handle ops from the scf dialect. Let’s see if there are any passes that can convert ops out of the scf dialect.
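Rather than scanning the dump by eye, we can tally the dialects mechanically. This standalone sketch extracts the dialect prefix of each op (the token before the first dot); the `ir` string is a small stand-in for the IR shown above:

```python
import re
from collections import Counter

# Stand-in for the "Input to convert-math-to-llvm" IR shown above.
ir = """
%c0 = arith.constant 0 : index
%6 = memref.load %0[%c0] : memref<?xi64>
%10 = scf.for %arg1 = %7 to %9 step %c1 iter_args(%arg2 = %5) -> (f32) {
  scf.yield %25 : f32
}
"""

# Ops look like `dialect.opname`; grab the dialect part. On real IR, dotted
# type names (e.g. !llvm.ptr) would also match, but it's fine for a rough tally.
dialects = Counter(m.group(1) for m in re.finditer(r"\b([a-z_]+)\.[a-z_]+", ir))
print(dict(dialects))
```

Running this on the full section text makes the stray scf ops stand out immediately.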


!mlir-opt --help | grep "scf"
Available Dialects: acc, affine, amx, arith, arm_neon, arm_sve, async, bufferization, builtin, cf, complex, dlti, emitc, gpu, linalg, llvm, math, memref, nvvm, omp, pdl, pdl_interp, quant, rocdl, scf, shape, sparse_tensor, spv, std, tensor, test, tosa, vector, x86vector
      --async-parallel-for                              -   Convert scf.parallel operations to multiple async compute ops executed concurrently for non-overlapping iteration ranges
      --convert-linalg-tiled-loops-to-scf               -   Lower linalg tiled loops to SCF loops and parallel loops
      --convert-openacc-to-scf                          -   Convert the OpenACC ops to OpenACC with SCF dialect
      --convert-parallel-loops-to-gpu                   -   Convert mapped scf.parallel ops to gpu launch operations
      --convert-scf-to-cf                               -   Convert SCF dialect to ControlFlow dialect, replacing structured control flow with a CFG
      --convert-scf-to-openmp                           -   Convert SCF parallel loop to OpenMP parallel + workshare constructs.
      --convert-scf-to-spirv                            -   Convert SCF dialect to SPIR-V dialect.
      --convert-vector-to-scf                           -   Lower the operations from the vector dialect into the SCF dialect
      --scf-bufferize                                   -   Bufferize the scf dialect.
      --scf-for-loop-canonicalization                   -   Canonicalize operations within scf.for loop bodies
      --scf-for-loop-peeling                            -   Peel `for` loops at their upper bounds.
      --scf-for-loop-range-folding                      -   Fold add/mul ops into loop range
      --scf-for-loop-specialization                     -   Specialize `for` loops for vectorization
      --scf-for-to-while                                -   Convert SCF for loops to SCF while loops
      --scf-parallel-loop-collapsing                    -   Collapse parallel loops to use less induction variables
      --scf-parallel-loop-fusion                        -   Fuse adjacent parallel loops
      --scf-parallel-loop-specialization                -   Specialize parallel loops for vectorization
      --scf-parallel-loop-tiling                        -   Tile parallel loops
      --test-scf-for-utils                              -   test scf.for utils
      --test-scf-if-utils                               -   test scf.if utils
      --test-scf-pipelining                             -   test scf.forOp pipelining
      --test-vector-transfer-full-partial-split         -   Test lowering patterns to split transfer ops via scf.if + linalg ops
      --tosa-to-scf                                     -   Lower TOSA to the SCF dialect

The convert-scf-to-cf pass seems promising, as it converts the scf dialect to the cf dialect.

Let’s see if adding the convert-scf-to-cf pass before the conversion passes will get rid of our exception.


passes = [
    "--sparsification",
    "--sparse-tensor-conversion",
    "--linalg-bufferize",
    "--arith-bufferize",
    "--func-bufferize",
    "--tensor-bufferize",
    "--finalizing-bufferize",
    "--convert-scf-to-cf", # newly added
    "--convert-linalg-to-loops",
    "--convert-vector-to-llvm",
    "--convert-math-to-llvm",
    "--convert-math-to-libm",
    "--convert-memref-to-llvm",
    "--convert-openmp-to-llvm",
    "--convert-arith-to-llvm",
    "--convert-std-to-llvm",
    "--reconcile-unrealized-casts"
]
result = cli.apply_passes(mlir_bytes, passes)
print(result[:1500])
module attributes {llvm.data_layout = ""} {
  llvm.func @malloc(i64) -> !llvm.ptr<i8>
  llvm.func @sparseValuesF32(%arg0: !llvm.ptr<i8>) -> !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)> attributes {llvm.emit_c_interface, sym_visibility = "private"} {
    %0 = llvm.mlir.constant(1 : index) : i64
    %1 = llvm.alloca %0 x !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>
    llvm.call @_mlir_ciface_sparseValuesF32(%1, %arg0) : (!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>) -> ()
    %2 = llvm.load %1 : !llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>
    llvm.return %2 : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>
  }
  llvm.func @_mlir_ciface_sparseValuesF32(!llvm.ptr<struct<(ptr<f32>, ptr<f32>, i64, array<1 x i64>, array<1 x i64>)>>, !llvm.ptr<i8>) attributes {llvm.emit_c_interface, sym_visibility = "private"}
  llvm.func @sparsePointers64(%arg0: !llvm.ptr<i8>, %arg1: i64) -> !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> attributes {llvm.emit_c_interface, sym_visibility = "private"} {
    %0 = llvm.mlir.constant(1 : index) : i64
    %1 = llvm.alloca %0 x !llvm.struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)> : (i64) -> !llvm.ptr<struct<(ptr<i64>, ptr<i64>, i64, array<1 x i64>, array<1 x i64>)>>

It looks like it fixed our issue!
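As a final sanity check, we can confirm that no builtin.unrealized_conversion_cast ops survive in the lowered output. This is a standalone sketch; `final_ir` stands in for the full output of `cli.apply_passes` above (if `apply_passes` returns bytes, decode it first):

```python
# Stand-in for the final lowered module printed above.
final_ir = """
module attributes {llvm.data_layout = ""} {
  llvm.func @malloc(i64) -> !llvm.ptr<i8>
  llvm.return %2 : f32
}
"""

# The lowering succeeded only if every unrealized cast was reconciled away.
has_leftover_casts = "builtin.unrealized_conversion_cast" in final_ir
print("leftover casts:", has_leftover_casts)
```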