Tensor Interface
The AbstractTensor interface (defined in src/abstract_tensor.jl) is the high-level interface through which Finch understands tensors and through which tensors interact with the rest of the Finch system. The interface is designed to be extensible, allowing users to define their own tensor types and behaviors. For a minimal example, read the definitions in /ext/SparseArraysExt.jl and in /src/interface/abstractarray.jl. Once the methods that tell Finch how to generate code for an array have been defined, the AbstractTensor interface will also use Finch to generate code for several Julia AbstractArray methods, such as getindex, setindex!, map, and reduce. An important note: getindex and setindex! are not a source of truth for Finch tensors. Search the codebase for ::AbstractTensor for a full list of methods that are implemented for AbstractTensor. Note that most AbstractTensor subtypes implement labelled_show and labelled_children methods instead of show(::IO, ::MIME"text/plain", t::AbstractTensor) for pretty-printed display.
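For instance, once these methods are defined, ordinary Julia array operations on a Finch tensor are lowered through Finch-generated kernels. A minimal sketch using the built-in Tensor type:
using Finch

A = Tensor(Dense(SparseList(Element(0.0))), [0.0 1.0; 2.0 0.0])

A[2, 1]              # getindex lowers to a Finch-generated kernel rather than a direct memory lookup
sum(A)               # reduce
B = map(x -> 2x, A)  # map returns another Finch tensor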
Tensor Methods
Finch.declare! — Function
declare!(ctx, tns, init)
Declare the read-only virtual tensor tns in the context ctx with a starting value of init and return it. Afterwards, the tensor is update-only.
Finch.freeze! — Function
freeze!(ctx, tns)
Freeze the update-only virtual tensor tns in the context ctx and return it. This may involve trimming any excess overallocated memory. Afterwards, the tensor is read-only.
Finch.thaw! — Function
thaw!(ctx, tns)
Thaw the read-only virtual tensor tns in the context ctx and return it. Afterwards, the tensor is update-only.
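To make this lifecycle concrete, here is a sketch of a sparse matrix-vector multiply written with the @finch macro (fsprand and @finch are Finch exports not otherwise discussed in this section). The compiler, not the user, calls these lifecycle functions on the virtual tensors: the statement y .= 0 corresponds to a declare! with init 0, and y is frozen again when the kernel ends.
using Finch

A = Tensor(Dense(SparseList(Element(0.0))), fsprand(4, 4, 0.5))
x = Tensor(Dense(Element(0.0)), rand(4))
y = Tensor(Dense(Element(0.0)), zeros(4))

@finch begin
    y .= 0                      # declare!: y is reset to its fill value and becomes update-only
    for j = _, i = _
        y[i] += A[i, j] * x[j]  # updates are only legal while y is update-only
    end
end                             # freeze!: y is read-only once the kernel finishes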
Finch.unfurl — Function
unfurl(ctx, tns, ext, proto)
Return an array object (usually a looplet nest) for lowering the outermost dimension of the virtual tensor tns. ext is the extent of the looplet. proto is the protocol that should be used for this index, but one doesn't need to unfurl all the indices at once.
Finch.instantiate — Function
instantiate(ctx, tns, mode)
Process the tensor tns in the context ctx, just after it has been unfurled, declared, or thawed. This is the earliest opportunity to process tns.
Finch.virtual_eltype — Function
virtual_eltype(arr)
Return the element type of the virtual tensor arr.
Finch.virtual_fill_value — Function
virtual_fill_value(arr)
Return the fill value (initializer) of the virtual array arr.
Finch.virtual_size — Function
virtual_size(ctx, tns)
Return a tuple of the dimensions of tns in the context ctx. This is a function similar in spirit to Base.axes.
Finch.virtual_resize! — Function
virtual_resize!(ctx, tns, dims...)
Resize tns in the context ctx. This is a function similar in spirit to Base.resize!.
Finch.labelled_show — Function
labelled_show(node)
Show the node in a LabelledTree.
Finch.labelled_children — Function
labelled_children(node)
Return the children of node in a LabelledTree. You may label the children by returning a LabelledTree(key, value), which will be shown as key: value.
Finch.is_injective — Function
is_injective(ctx, tns)
Returns a vector of booleans, one for each dimension of the tensor, indicating whether the access is injective in that dimension. A dimension is injective if each index in that dimension maps to a different memory space in the underlying array.
Finch.is_atomic — Function
is_atomic(ctx, tns)
Returns a tuple (atomicities, overall), where atomicities is a vector indicating which indices are guarded by an atomic, and overall is a boolean indicating whether the last level is guarded by an atomic.
Finch.is_concurrent — Function
is_concurrent(ctx, tns)
Returns a vector of booleans, one for each dimension of the tensor, indicating whether that index can be written to without any shared execution state. For example, if a matrix returns [true, false], then we can write to A[i, j] and A[i_2, j] without any shared execution state between the two, but we can't write to A[i, j] and A[i, j_2] without carrying over execution state.
Level Interface
julia> A = [0.0 0.0 4.4; 1.1 0.0 0.0; 2.2 0.0 5.5; 3.3 0.0 0.0]
4×3 Matrix{Float64}:
0.0 0.0 4.4
1.1 0.0 0.0
2.2 0.0 5.5
3.3 0.0 0.0
julia> A_fbr = Tensor(Dense(Dense(Element(0.0))), A)
4×3 Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}:
0.0 0.0 4.4
1.1 0.0 0.0
2.2 0.0 5.5
3.3 0.0 0.0
julia> tensor_tree(A_fbr)
4×3-Tensor
└─ Dense [:,1:3]
├─ [:, 1]: Dense [1:4]
│ ├─ [1]: 0.0
│ ├─ [2]: 1.1
│ ├─ [3]: 2.2
│ └─ [4]: 3.3
├─ [:, 2]: Dense [1:4]
│ ├─ [1]: 0.0
│ ├─ [2]: 0.0
│ ├─ [3]: 0.0
│ └─ [4]: 0.0
└─ [:, 3]: Dense [1:4]
├─ [1]: 4.4
├─ [2]: 0.0
├─ [3]: 5.5
└─ [4]: 0.0
We refer to a node in the tree as a subfiber. All of the nodes at the same level are stored in the same datastructure and disambiguated by an integer position. In the above example, there are three levels: the rootmost level contains only one subfiber, the root; the middle level has 3 subfibers, one for each column; and the leafmost level has 12 subfibers, one for each element of the array. For example, the middle level is A_fbr.lvl.lvl, and we can represent its third position (the third column) as SubFiber(A_fbr.lvl.lvl, 3). The leaf level is A_fbr.lvl.lvl.lvl, and we can access its 9th position as SubFiber(A_fbr.lvl.lvl.lvl, 9). For instructional purposes, you can call a subfiber on an index (using parentheses) to select among its children.
julia> tensor_tree(Finch.SubFiber(A_fbr.lvl.lvl, 3))
Dense [1:4]
├─ [1]: 4.4
├─ [2]: 0.0
├─ [3]: 5.5
└─ [4]: 0.0
julia> tensor_tree(A_fbr[:, 3])
4-Tensor
└─ Dense [1:4]
├─ [1]: 4.4
├─ [2]: 0.0
├─ [3]: 5.5
└─ [4]: 0.0
julia> tensor_tree(A_fbr(3))
Dense [1:4]
├─ [1]: 4.4
├─ [2]: 0.0
├─ [3]: 5.5
└─ [4]: 0.0
julia> Finch.SubFiber(A_fbr.lvl.lvl.lvl, 9)
Finch.SubFiber{ElementLevel{0.0, Float64, Int64, Vector{Float64}}, Int64}:
4.4
julia> A_fbr[1, 3]
4.4
julia> A_fbr(3)(1)
4.4
When we print the tree in text, positions are numbered from top to bottom.
Because our array is sparse (mostly zero, or another fill value), it would be more efficient to store only the nonzero values. In Finch, each level is represented with a different format, and a sparse level stores only the non-fill values. This time, we'll use a SparseList level (which stores a list of the nonzeros) instead of a Dense level for the rows within each column:
julia> A_fbr = Tensor(Dense(SparseList(Element(0.0))), A)
4×3 Tensor{DenseLevel{Int64, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}:
0.0 0.0 4.4
1.1 0.0 0.0
2.2 0.0 5.5
3.3 0.0 0.0
julia> tensor_tree(A_fbr)
4×3-Tensor
└─ Dense [:,1:3]
├─ [:, 1]: SparseList (0.0) [1:4]
│ ├─ [2]: 1.1
│ ├─ [3]: 2.2
│ └─ [4]: 3.3
├─ [:, 2]: SparseList (0.0) [1:4]
└─ [:, 3]: SparseList (0.0) [1:4]
├─ [1]: 4.4
└─ [3]: 5.5
Our Dense(SparseList(Element(0.0))) format is also known as "CSC" and is equivalent to SparseMatrixCSC. The Tensor constructor will perform a zero-cost copy between Finch fibers and sparse matrices, when available (a short sketch follows the DCSC example below). CSC is an excellent general-purpose representation when we expect most of the columns to have a few nonzeros. However, when most of the columns are entirely fill (a situation known as hypersparsity), it is better to compress the root level as well:
julia> A_fbr = Tensor(SparseList(SparseList(Element(0.0))), A)
4×3 Tensor{SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}:
0.0 0.0 4.4
1.1 0.0 0.0
2.2 0.0 5.5
3.3 0.0 0.0
julia> tensor_tree(A_fbr)
4×3-Tensor
└─ SparseList (0.0) [:,1:3]
├─ [:, 1]: SparseList (0.0) [1:4]
│ ├─ [2]: 1.1
│ ├─ [3]: 2.2
│ └─ [4]: 3.3
└─ [:, 3]: SparseList (0.0) [1:4]
├─ [1]: 4.4
└─ [3]: 5.5
Here we see that the entirely zero column has also been compressed. The SparseList(SparseList(Element(0.0))) format is also known as "DCSC".
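Since the Dense(SparseList(Element(0.0))) layout matches SparseMatrixCSC, conversion is cheap in both the CSC and DCSC cases. A small sketch, assuming SparseArrays is loaded so the extension is active:
using Finch, SparseArrays

S = sparse([0.0 0.0 4.4; 1.1 0.0 0.0; 2.2 0.0 5.5; 3.3 0.0 0.0])
A_csc  = Tensor(Dense(SparseList(Element(0.0))), S)       # CSC-like; can reuse S's buffers (zero-cost when available)
A_dcsc = Tensor(SparseList(SparseList(Element(0.0))), S)  # DCSC; also compresses the empty column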
The "COO" (or "Coordinate") format is often used in practice for ease of interchange between libraries. In an N
-dimensional array A
, COO stores N
lists of indices I_1, ..., I_N
where A[I_1[p], ..., I_N[p]]
is the p
^th stored value in column-major numbering. In Finch, COO
is represented as a multi-index level, which can handle more than one index at once. We use curly brackets to declare the number of indices handled by the level:
julia> A_fbr = Tensor(SparseCOO{2}(Element(0.0)), A)
4×3 Tensor{SparseCOOLevel{2, Tuple{Int64, Int64}, Vector{Int64}, Tuple{Vector{Int64}, Vector{Int64}}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}:
0.0 0.0 4.4
1.1 0.0 0.0
2.2 0.0 5.5
3.3 0.0 0.0
julia> tensor_tree(A_fbr)
4×3-Tensor
└─ SparseCOO{2} (0.0) [:,1:3]
├─ [2, 1]: 1.1
├─ [3, 1]: 2.2
├─ ⋮
├─ [1, 3]: 4.4
└─ [3, 3]: 5.5
The COO format is compact and straightforward, but doesn't support random access. For random access, one should use the SparseDict or SparseByteMap format. A full listing of supported formats is given after a rough description of the internals common to all levels, relating to types and storage.
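As a sketch of a random-access-friendly alternative, one can swap SparseList for SparseDict; this assumes the SparseDict level follows the same constructor pattern as the levels shown above:
using Finch

A = [0.0 0.0 4.4; 1.1 0.0 0.0; 2.2 0.0 5.5; 3.3 0.0 0.0]
A_dict = Tensor(Dense(SparseDict(Element(0.0))), A)  # assumed constructor pattern, by analogy with SparseList
A_dict[3, 1]                                         # point reads are efficient in this format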
Types and Storage of Level
All levels have a postype, typically denoted as Tp in the constructors, which is used for internal pointer types and is accessible via the function:
Finch.postype — Function
postype(lvl)
Return a position type with the same flavor as those used to store the positions of the fibers contained in lvl. The name "position" descends from the pos or position or pointer arrays found in many definitions of CSR or CSC. In Finch, positions should be data used to access either a subfiber or some other similar auxiliary data. Thus, we often end up iterating over positions.
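For example, the position type of a level tree can be queried from a level instance; a small sketch (on most machines the default is Int64):
using Finch

A = Tensor(Dense(SparseList(Element(0.0))), [0.0 1.0; 2.0 0.0])
Finch.postype(A.lvl)   # position type used by this level and its children, typically Int64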
Additionally, many levels have a Vp or Vi in their constructors; these stand for a vector of element type Tp or Ti, respectively. More generally, levels are parameterized by the types that they use for storage. By default, all levels use Vector, but a user could change any or all of the storage types of a tensor so that the tensor is stored on a GPU or CPU or some combination thereof, or even just via a vector with a different allocation mechanism. The storage type should behave like AbstractArray and needs to implement the usual abstract array functions as well as Base.resize!. See the tests for an example.
When levels are constructed in short form, as in the examples above, the index, position, and storage types are inferred from the level below. All of the levels at the bottom of a Tensor (Element, Pattern, Repeater) specify an index type, position type, and storage type even if they don't need them; these are used by the levels that take them as parameters.
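As a small illustration of type parameters propagating up from the leaf level (the storage arrays here remain the default Vector), constructing the Element level with a Float32 fill value yields a Float32 tensor:
using Finch

A32 = Tensor(Dense(SparseList(Element(0.0f0))), Float32[0.0 1.0; 2.0 0.0])
eltype(A32)   # Float32, inferred from the Element level's fill value 0.0f0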
Level Methods
Tensor levels are implemented using the following methods:
Finch.declare_level! — Function
declare_level!(ctx, lvl, pos, init)
Initialize and thaw all fibers within lvl, assuming positions 1:pos were previously assembled and frozen. The resulting level has no assembled positions.
Finch.assemble_level! — Function
assemble_level!(ctx, lvl, pos, new_pos)
Assemble positions pos+1:new_pos in lvl, assuming positions 1:pos were previously assembled.
Finch.reassemble_level! — Function
reassemble_level!(lvl, ctx, pos_start, pos_end)
Set the previously assembled positions from pos_start to pos_end to level_fill_value(lvl). Not available on all level types, as this presumes updating.
Finch.freeze_level! — Function
freeze_level!(ctx, lvl, pos, init)
Given the last reference position pos, freeze all fibers within lvl, assuming that we have potentially updated positions 1:pos.
Finch.level_ndims — Function
level_ndims(::Type{Lvl})
The result of level_ndims(Lvl) defines ndims for all subfibers in a level of type Lvl.
Finch.level_size — Function
level_size(lvl)
The result of level_size(lvl) defines the size of all subfibers in the level lvl.
Finch.level_axes — Function
level_axes(lvl)
The result of level_axes(lvl) defines the axes of all subfibers in the level lvl.
Finch.level_eltype — Function
level_eltype(::Type{Lvl})
The result of level_eltype(Lvl) defines eltype for all subfibers in a level of type Lvl.
Finch.level_fill_value — Function
level_fill_value(::Type{Lvl})
The result of level_fill_value(Lvl) defines fill_value for all subfibers in a level of type Lvl.
Combinator Interface
Tensor combinators allow us to modify the behavior of tensors. The AbstractCombinator interface (defined in src/tensors/abstract_combinator.jl) is the interface through which Finch understands tensor combinators. The interface requires the combinator to overload all of the tensor methods, as well as the methods used by Looplets when lowering ranges, etc. For a minimal example, read the definitions in /src/tensors/combinators/offset.jl.