Sparse Array Utilities
Sparse Constructors
In addition to the `Tensor` constructor, Finch provides a number of convenience constructors for common tensor types. For example, the `spzeros` and `sprand` functions have `fspzeros` and `fsprand` counterparts that return Finch tensors. We can also construct a sparse COO `Tensor` from a list of indices and values using the `fsparse` function.
Finch.fsparse — Function

fsparse(I::Tuple, V, [M::Tuple, combine]; fill_value=zero(eltype(V)))

Create a sparse COO tensor `S` such that `size(S) == M` and `S[(i[q] for i = I)...] = V[q]`. The `combine` function is used to combine duplicates. If `M` is not specified, it is set to `map(maximum, I)`. If `combine` is not supplied, it defaults to `+` unless the elements of `V` are Booleans, in which case it defaults to `|`. All elements of `I` must satisfy `1 <= I[n][q] <= M[n]`. Numerical zeros are retained as structural nonzeros; to drop numerical zeros, use `dropzeros!`.

See also: `sparse`
Examples
julia> I = ( [1, 2, 3], [1, 2, 3], [1, 2, 3]);
julia> V = [1.0; 2.0; 3.0];
julia> fsparse(I, V)
SparseCOO (0.0) [1:3×1:3×1:3]
├─├─├─[1, 1, 1]: 1.0
├─├─├─[2, 2, 2]: 2.0
├─├─├─[3, 3, 3]: 3.0
Finch.fsparse! — Function

fsparse!(I..., V, [M::Tuple])

Like `fsparse`, but the coordinates must be sorted and unique, and memory is reused.
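As a quick sketch of the signature above (hypothetical usage, not output-verified; assumes the Finch.jl package is installed):

```julia
using Finch  # assumes the Finch.jl package is available

# Coordinates are passed as separate vectors and must already be
# sorted and unique, since fsparse! skips the sort/combine step.
I1 = [1, 2, 3]
I2 = [1, 2, 3]
V  = [1.0, 2.0, 3.0]

# Build a 3×3 COO tensor; the input vectors may be reused as storage,
# so they should not be mutated afterwards.
S = fsparse!(I1, I2, V, (3, 3))
```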
Finch.fsprand — Function

fsprand([rng], [type], M..., p, [rfn])

Create a random sparse tensor of size `M` in COO format. There are two cases:

- If `p` is floating point, the probability of any element being nonzero is independently given by `p` (and hence the expected density of nonzeros is also `p`).
- If `p` is an integer, exactly `p` nonzeros are distributed uniformly at random throughout the tensor (and hence the density of nonzeros is exactly `p / prod(M)`).

Nonzero values are sampled from the distribution specified by `rfn` and have the type `type`. The uniform distribution is used if `rfn` is not specified. The optional `rng` argument specifies a random number generator.

See also: [`sprand`](https://docs.julialang.org/en/v1/stdlib/SparseArrays/#SparseArrays.sprand)
Examples
julia> fsprand(Bool, 3, 3, 0.5)
SparseCOO (false) [1:3,1:3]
├─├─[1, 1]: true
├─├─[3, 1]: true
├─├─[2, 2]: true
├─├─[3, 2]: true
├─├─[3, 3]: true
julia> fsprand(Float64, 2, 2, 2, 0.5)
SparseCOO (0.0) [1:2,1:2,1:2]
├─├─├─[2, 2, 1]: 0.6478553157718558
├─├─├─[1, 1, 2]: 0.996665291437684
├─├─├─[2, 1, 2]: 0.7491940599574348
Finch.fspzeros — Function

fspzeros([type], M...)

Create a zero tensor of size `M`, with elements of type `type`. The tensor is in COO format.

See also: [`spzeros`](https://docs.julialang.org/en/v1/stdlib/SparseArrays/#SparseArrays.spzeros)
Examples
julia> A = fspzeros(Bool, 3, 3)
3×3 Tensor{SparseCOOLevel{2, Tuple{Int64, Int64}, Vector{Int64}, Tuple{Vector{Int64}, Vector{Int64}}, ElementLevel{false, Bool, Int64, Vector{Bool}}}}:
0 0 0
0 0 0
0 0 0
julia> countstored(A)
0
julia> B = fspzeros(Float64, 2, 2, 2)
2×2×2 Tensor{SparseCOOLevel{3, Tuple{Int64, Int64, Int64}, Vector{Int64}, Tuple{Vector{Int64}, Vector{Int64}, Vector{Int64}}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}:
[:, :, 1] =
0.0 0.0
0.0 0.0
[:, :, 2] =
0.0 0.0
0.0 0.0
julia> countstored(B)
0
Finch.ffindnz — Function

ffindnz(arr)

Return the nonzero elements of `arr`, as Finch understands `arr`. Returns `(I..., V)`, where `I` are the coordinate vectors, one for each mode of `arr`, and `V` is a vector of corresponding nonzero values, which can be passed to `fsparse`.

See also: [`findnz`](https://docs.julialang.org/en/v1/stdlib/SparseArrays/#SparseArrays.findnz)
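For illustration, a hedged sketch of a round trip through `ffindnz` and `fsparse` (assumes the Finch.jl package is installed; outputs not shown since display may vary by version):

```julia
using Finch  # assumes the Finch.jl package is available

A = fsparse(([1, 2], [2, 3]), [1.0, 2.0])

# One coordinate vector per mode of A, then the value vector.
I1, I2, V = ffindnz(A)

# The pieces can be fed straight back into fsparse to rebuild the tensor.
B = fsparse((I1, I2), V)
```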
Fill Values
Finch tensors support an arbitrary "background" value for sparse arrays. While most arrays use `0` as the background value, this is not always the case. For example, a sparse array of `Int` might use `typemin(Int)` as the background value. The `fill_value` function returns the background value of a tensor. If you ever want to change the background value of an existing array, you can use the `set_fill_value!` function. The `countstored` function returns the number of stored elements in a tensor, and calling `pattern!` on a tensor returns a tensor that is true wherever the original tensor stores a value. Note that `countstored` doesn't always return the number of nonzero elements, as stored elements may include the background value. You can call `dropfills!` to remove explicitly stored background values from a tensor.
julia> A = fsparse([1, 1, 2, 3], [2, 4, 5, 6], [1.0, 2.0, 3.0])
3×6 Tensor{SparseCOOLevel{2, Tuple{Int64, Int64}, Vector{Int64}, Tuple{Vector{Int64}, Vector{Int64}}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}:
0.0 1.0 0.0 2.0 0.0 0.0
0.0 0.0 0.0 0.0 3.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0
julia> min.(A, -1)
3×6 Tensor{DenseLevel{Int64, DenseLevel{Int64, ElementLevel{-1.0, Float64, Int64, Vector{Float64}}}}}:
-1.0 -1.0 -1.0 -1.0 -1.0 -1.0
-1.0 -1.0 -1.0 -1.0 -1.0 -1.0
-1.0 -1.0 -1.0 -1.0 -1.0 -1.0
julia> fill_value(A)
0.0
julia> B = set_fill_value!(A, -Inf)
3×6 Tensor{SparseCOOLevel{2, Tuple{Int64, Int64}, Vector{Int64}, Tuple{Vector{Int64}, Vector{Int64}}, ElementLevel{-Inf, Float64, Int64, Vector{Float64}}}}:
-Inf 1.0 -Inf 2.0 -Inf -Inf
-Inf -Inf -Inf -Inf 3.0 -Inf
-Inf -Inf -Inf -Inf -Inf -Inf
julia> min.(B, -1)
3×6 Tensor{SparseDictLevel{Int64, Vector{Int64}, Vector{Int64}, Vector{Int64}, Dict{Tuple{Int64, Int64}, Int64}, Vector{Int64}, SparseDictLevel{Int64, Vector{Int64}, Vector{Int64}, Vector{Int64}, Dict{Tuple{Int64, Int64}, Int64}, Vector{Int64}, ElementLevel{-Inf, Float64, Int64, Vector{Float64}}}}}:
-Inf -1.0 -Inf -1.0 -Inf -Inf
-Inf -Inf -Inf -Inf -1.0 -Inf
-Inf -Inf -Inf -Inf -Inf -Inf
julia> countstored(A)
3
julia> pattern!(A)
3×6 Tensor{SparseCOOLevel{2, Tuple{Int64, Int64}, Vector{Int64}, Tuple{Vector{Int64}, Vector{Int64}}, PatternLevel{Int64}}}:
0 1 0 1 0 0
0 0 0 0 1 0
0 0 0 0 0 0
Finch.fill_value — Function

fill_value(arr)

Return the initializer for `arr`. For SparseArrays, this is `0`. Often, the "fill" value becomes the "background" value of a tensor.
Finch.set_fill_value! — Function

set_fill_value!(fbr, init)

Return a tensor which is equal to `fbr`, but with the fill (implicit) value set to `init`. May reuse memory and render the original tensor unusable when modified.
julia> A = Tensor(SparseList(Element(0.0), 10), [2.0, 0.0, 3.0, 0.0, 4.0, 0.0, 5.0, 0.0, 6.0, 0.0])
10 Tensor{SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}:
2.0
0.0
3.0
0.0
4.0
0.0
5.0
0.0
6.0
0.0
julia> set_fill_value!(A, Inf)
10 Tensor{SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{Inf, Float64, Int64, Vector{Float64}}}}:
2.0
Inf
3.0
Inf
4.0
Inf
5.0
Inf
6.0
Inf
Finch.pattern! — Function

pattern!(fbr)

Return the pattern of `fbr`. That is, return a tensor which is true wherever `fbr` is structurally unequal to its `fill_value`. May reuse memory and render the original tensor unusable when modified.
julia> A = Tensor(SparseList(Element(0.0), 10), [2.0, 0.0, 3.0, 0.0, 4.0, 0.0, 5.0, 0.0, 6.0, 0.0])
10 Tensor{SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}:
2.0
0.0
3.0
0.0
4.0
0.0
5.0
0.0
6.0
0.0
julia> pattern!(A)
10 Tensor{SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, PatternLevel{Int64}}}:
1
0
1
0
1
0
1
0
1
0
Finch.countstored — Function

countstored(arr)

Return the number of stored elements in `arr`. If there are explicitly stored fill elements, they are counted too.

See also: [`SparseArrays.nnz`](https://docs.julialang.org/en/v1/stdlib/SparseArrays/#SparseArrays.nnz) and [`Base.summarysize`](https://docs.julialang.org/en/v1/base/base/#Base.summarysize)
Finch.dropfills — Function

dropfills(src)

Drop the fill values from `src` and return a new tensor with the same shape and format.
Finch.dropfills! — Function

dropfills!(dst, src)

Copy only the non-fill values from `src` into `dst`.
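A hedged sketch contrasting the two (assumes the Finch.jl package is installed; the specific inputs here are illustrative, and outputs are omitted since display may vary by version):

```julia
using Finch  # assumes the Finch.jl package is available

# fsparse retains the numerical zero at (2, 2) as a stored entry.
A = fsparse(([1, 2], [1, 2]), [1.0, 0.0])

# dropfills allocates a fresh tensor without the stored fill values.
B = dropfills(A)

# dropfills! instead copies the non-fill values into a destination
# tensor of whatever format you choose.
C = dropfills!(Tensor(CSCFormat()), A)
```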
How to tell whether an entry is "fill"
In the sparse world, a semantic distinction is sometimes made between "explicitly stored" values and "implicit" or "fill" values (usually zero). However, the formats in the Finch compiler represent a diverse set of structures beyond sparsity, and it is often unclear whether any of the values in the tensor are "explicit" (consider a mask matrix, which can be represented with a constant number of bits). Thus, Finch makes no semantic distinction between values which are stored explicitly or not. If users wish to make this distinction, they should instead store a tensor of tuples of the form `(value, is_fill)`. For example,
julia> A = fsparse(
[1, 1, 2, 3],
[2, 4, 5, 6],
[(1.0, false), (0.0, true), (3.0, false)];
fill_value=(0.0, true),
)
3×6 Tensor{SparseCOOLevel{2, Tuple{Int64, Int64}, Vector{Int64}, Tuple{Vector{Int64}, Vector{Int64}}, ElementLevel{(0.0, true), Tuple{Float64, Bool}, Int64, Vector{Tuple{Float64, Bool}}}}}:
(0.0, 1) (1.0, 0) (0.0, 1) (0.0, 1) (0.0, 1) (0.0, 1)
(0.0, 1) (0.0, 1) (0.0, 1) (0.0, 1) (3.0, 0) (0.0, 1)
(0.0, 1) (0.0, 1) (0.0, 1) (0.0, 1) (0.0, 1) (0.0, 1)
julia> B = Tensor(Dense(SparseList(Element((0.0, true)))), A)
3×6 Tensor{DenseLevel{Int64, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{(0.0, true), Tuple{Float64, Bool}, Int64, Vector{Tuple{Float64, Bool}}}}}}:
(0.0, 1) (1.0, 0) (0.0, 1) (0.0, 1) (0.0, 1) (0.0, 1)
(0.0, 1) (0.0, 1) (0.0, 1) (0.0, 1) (3.0, 0) (0.0, 1)
(0.0, 1) (0.0, 1) (0.0, 1) (0.0, 1) (0.0, 1) (0.0, 1)
julia> sum(map(last, B))
16
julia> sum(map(first, B))
4.0
Format Conversion and Storage Order
Converting Between Formats
You can convert between tensor formats with the `Tensor` constructor. Simply construct a new `Tensor` in the desired format, passing the existing tensor as an argument, and the data will be copied into the new format.
julia> A = Tensor(CSCFormat(), [0 0 2 1; 0 0 1 0; 1 0 0 0])
3×4 Tensor{DenseLevel{Int64, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}:
0.0 0.0 2.0 1.0
0.0 0.0 1.0 0.0
1.0 0.0 0.0 0.0
julia> B = Tensor(DCSCFormat(), A)
3×4 Tensor{SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}:
0.0 0.0 2.0 1.0
0.0 0.0 1.0 0.0
1.0 0.0 0.0 0.0
Storage Order
By default, tensors in Finch are column-major. However, you can use the `swizzle` function to transpose them lazily. To convert to a transposed format, use the `dropfills!` function. Note that the `permutedims` function transposes eagerly.
Finch.swizzle — Function

swizzle(tns, dims)

Create a `SwizzleArray` to transpose any tensor `tns` such that

swizzle(tns, dims)[i...] == tns[i[dims]]
julia> A = Tensor(CSCFormat(), [0 0 2 1; 0 0 1 0; 1 0 0 0])
3×4 Tensor{DenseLevel{Int64, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}:
0.0 0.0 2.0 1.0
0.0 0.0 1.0 0.0
1.0 0.0 0.0 0.0
julia> tensor_tree(swizzle(A, 2, 1))
SwizzleArray (2, 1)
└─ 3×4-Tensor
└─ Dense [:,1:4]
├─ [:, 1]: SparseList (0.0) [1:3]
│ └─ [3]: 1.0
├─ [:, 2]: SparseList (0.0) [1:3]
├─ [:, 3]: SparseList (0.0) [1:3]
│ ├─ [1]: 2.0
│ └─ [2]: 1.0
└─ [:, 4]: SparseList (0.0) [1:3]
└─ [1]: 1.0
julia> tensor_tree(permutedims(A, (2, 1)))
4×3-Tensor
└─ SparseDict (0.0) [:,1:3]
├─ [:, 1]: SparseDict (0.0) [1:4]
│ ├─ [3]: 2.0
│ └─ [4]: 1.0
├─ [:, 2]: SparseDict (0.0) [1:4]
│ └─ [3]: 1.0
└─ [:, 3]: SparseDict (0.0) [1:4]
└─ [1]: 1.0
julia> dropfills!(swizzle(Tensor(CSCFormat()), 2, 1), A)
3×4 Finch.SwizzleArray{(2, 1), Tensor{DenseLevel{Int64, SparseListLevel{Int64, Vector{Int64}, Vector{Int64}, ElementLevel{0.0, Float64, Int64, Vector{Float64}}}}}}:
0.0 0.0 2.0 1.0
0.0 0.0 1.0 0.0
1.0 0.0 0.0 0.0