Package smile.data
Interface SparseDataset<T>
- Type Parameters:
T - the target type.
- All Superinterfaces:
Dataset<SparseArray, T>, Iterable<SampleInstance<SparseArray, T>>
List of Lists (LIL) sparse matrix format. LIL stores one list per row, where each entry holds a column index and a value. These entries are typically kept sorted by column index for faster lookup.
This format is good for incremental matrix construction: LIL is typically used to build the matrix, which is then converted to a format that is more efficient for matrix operations, such as the Harwell-Boeing column-compressed sparse matrix format.
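To make the LIL layout concrete, here is a minimal Java sketch of a row-wise list-of-lists matrix that keeps each row sorted by column index, so entries can be added incrementally in any order. The class LilMatrix, its nested Entry class, and the set and row methods are hypothetical illustrations; they are not part of Smile, whose SparseDataset stores each row as a SparseArray.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical LIL (List of Lists) sparse matrix: one list per row, each
// holding (column, value) entries kept sorted by column index.
final class LilMatrix {
    // One stored entry of a row.
    static final class Entry {
        final int col;
        final double value;

        Entry(int col, double value) {
            this.col = col;
            this.value = value;
        }
    }

    private final List<List<Entry>> rows = new ArrayList<>();
    private final int ncol;

    LilMatrix(int nrow, int ncol) {
        this.ncol = ncol;
        for (int i = 0; i < nrow; i++) {
            rows.add(new ArrayList<>());
        }
    }

    // Binary search for column j in a sorted row. Returns the position if
    // found, otherwise -(insertionPoint + 1), like Arrays.binarySearch.
    private static int find(List<Entry> row, int j) {
        int lo = 0;
        int hi = row.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int c = row.get(mid).col;
            if (c < j) {
                lo = mid + 1;
            } else if (c > j) {
                hi = mid - 1;
            } else {
                return mid;
            }
        }
        return -(lo + 1);
    }

    // Incremental construction: insert entry (i, j), or overwrite it if it
    // already exists, keeping row i sorted by column index.
    void set(int i, int j, double value) {
        if (j < 0 || j >= ncol) {
            throw new IndexOutOfBoundsException("column " + j);
        }
        List<Entry> row = rows.get(i);
        int k = find(row, j);
        if (k >= 0) {
            row.set(k, new Entry(j, value));
        } else {
            row.add(-k - 1, new Entry(j, value));
        }
    }

    // Read-only view of the stored entries of row i.
    List<Entry> row(int i) {
        return Collections.unmodifiableList(rows.get(i));
    }
}
```

Appending entries in increasing column order hits the end of each ArrayList and avoids shifting; out-of-order inserts cost O(k) for a row with k entries, which is one reason LIL is usually converted to a compressed format once construction is finished.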
-
Method Summary
Modifier and Type | Method | Description
static SparseDataset<Void> | from(Path path) | Parses a sparse dataset in coordinate triple tuple list format.
static SparseDataset<Void> | from(Path path, int arrayIndexOrigin) | Reads a sparse dataset in coordinate triple tuple list format.
default double | get(int i, int j) | Returns the value at entry (i, j).
int | ncol() | Returns the number of columns.
default int | nrow() | Returns the number of rows.
int | nz() | Returns the number of nonzero entries.
int | nz(int j) | Returns the number of nonzero entries in column j.
static <T> SparseDataset<T> | of(Collection<SampleInstance<SparseArray, T>> data) | Returns a default implementation of SparseDataset.
static <T> SparseDataset<T> | of(Collection<SampleInstance<SparseArray, T>> data, int ncol) | Returns a default implementation of SparseDataset.
static SparseDataset<Void> | of(Stream<SparseArray> data) | Returns a default implementation of SparseDataset without targets.
static SparseDataset<Void> | of(SparseArray[] data) | Returns a default implementation of SparseDataset without targets.
static SparseDataset<Void> | of(SparseArray[] data, int ncol) | Returns a default implementation of SparseDataset without targets.
default SparseMatrix | toMatrix() | Converts into Harwell-Boeing column-compressed sparse matrix format.
default void | unitize() | Unitizes each row x so that its L2 norm is 1.
default void | unitize1() | Unitizes each row x so that its L1 norm is 1.
Methods inherited from interface smile.data.Dataset
apply, batch, get, isEmpty, size, stream, toList, toString
Methods inherited from interface java.lang.Iterable
forEach, iterator, spliterator
-
Method Details
-
nz
int nz()
Returns the number of nonzero entries.
- Returns:
the number of nonzero entries.
-
nz
int nz(int j)
Returns the number of nonzero entries in column j.
- Parameters:
j - the column index.
- Returns:
the number of nonzero entries in column j.
-
nrow
default int nrow()
Returns the number of rows.
- Returns:
the number of rows.
-
ncol
int ncol()
Returns the number of columns.
- Returns:
the number of columns.
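With one list per row, nrow() and nz() follow directly from the row lists, whereas nz(j) needs either a scan over every row or a per-column count computed once. A minimal sketch under stated assumptions: SparseRow and SparseStats are hypothetical types, and ncol is inferred as the largest column index plus one, which may differ from how a real dataset obtains it (for example from the ncol argument of of).

```java
import java.util.List;

// Hypothetical sparse row: parallel arrays of column indices and values.
final class SparseRow {
    final int[] index;
    final double[] value;

    SparseRow(int[] index, double[] value) {
        if (index.length != value.length) {
            throw new IllegalArgumentException("index/value length mismatch");
        }
        this.index = index;
        this.value = value;
    }
}

// Hypothetical helpers mirroring nrow(), ncol(), nz() and nz(j).
final class SparseStats {
    private SparseStats() {}

    // One stored row per sample.
    static int nrow(List<SparseRow> rows) {
        return rows.size();
    }

    // Assumption: ncol is the largest column index plus one (0 if no entries).
    static int ncol(List<SparseRow> rows) {
        int max = -1;
        for (SparseRow r : rows) {
            for (int j : r.index) {
                max = Math.max(max, j);
            }
        }
        return max + 1;
    }

    // Total number of stored entries across all rows.
    static int nz(List<SparseRow> rows) {
        int n = 0;
        for (SparseRow r : rows) {
            n += r.index.length;
        }
        return n;
    }

    // Stored-entry count of every column in one O(nz) pass; counts[j] answers nz(j).
    static int[] columnCounts(List<SparseRow> rows, int ncol) {
        int[] counts = new int[ncol];
        for (SparseRow r : rows) {
            for (int j : r.index) {
                counts[j]++;
            }
        }
        return counts;
    }
}
```

Computing all column counts at once keeps repeated nz(j) calls at O(1) each, instead of rescanning every row per call.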
-
get
default double get(int i, int j)
Returns the value at entry (i, j).
- Parameters:
i - the row index.
j - the column index.
- Returns:
the cell value.
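Because column indices within a row are kept sorted, get(i, j) can locate an entry with a binary search, in O(log k) time for a row with k stored entries. The sketch below assumes a hypothetical layout of parallel int[][] and double[][] arrays rather than Smile's SparseArray, and treats absent entries as zero; the actual default method may be implemented differently.

```java
import java.util.Arrays;

// Hypothetical lookup over rows stored as sorted column indices plus values.
final class SparseGet {
    private SparseGet() {}

    // Returns the value at entry (i, j). index[i] must be sorted ascending;
    // a column missing from row i is an implicit zero.
    static double get(int[][] index, double[][] value, int i, int j) {
        int k = Arrays.binarySearch(index[i], j);
        return k >= 0 ? value[i][k] : 0.0;
    }
}
```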
-
unitize
default void unitize()
Unitizes each row x so that its L2 norm is 1.
-
unitize1
default void unitize1()
Unitizes each row x so that its L1 norm is 1.
-
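Only the stored entries of a sparse row contribute to its norm, so unitizing a row means scaling just those values. A minimal sketch over one row's value array; RowNormalizer is hypothetical, and leaving an all-zero row unchanged (instead of dividing by zero) is an assumption, not documented behavior of unitize or unitize1.

```java
// Hypothetical row normalization over the stored values of one sparse row.
final class RowNormalizer {
    private RowNormalizer() {}

    // Scales values so that their L2 norm (square root of the sum of squares) is 1.
    // Assumption: a zero-norm row is left unchanged.
    static void unitize(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v * v;
        }
        double norm = Math.sqrt(sum);
        if (norm > 0.0) {
            for (int k = 0; k < values.length; k++) {
                values[k] /= norm;
            }
        }
    }

    // Scales values so that their L1 norm (sum of absolute values) is 1.
    // Assumption: a zero-norm row is left unchanged.
    static void unitize1(double[] values) {
        double norm = 0.0;
        for (double v : values) {
            norm += Math.abs(v);
        }
        if (norm > 0.0) {
            for (int k = 0; k < values.length; k++) {
                values[k] /= norm;
            }
        }
    }
}
```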
toMatrix
default SparseMatrix toMatrix()
Converts into Harwell-Boeing column-compressed sparse matrix format.
- Returns:
the sparse matrix.
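The Harwell-Boeing column-compressed layout stores, for each column, a contiguous run of row indices and values, plus an array of column start offsets. A common way to convert row lists into this layout is a two-pass counting sort, sketched below. CscMatrix and ToCsc are hypothetical names; toMatrix returns Smile's SparseMatrix, whose internals may differ from this sketch.

```java
import java.util.Arrays;

// Hypothetical compressed sparse column (CSC) container, the layout used by
// the Harwell-Boeing column-compressed format.
final class CscMatrix {
    final int nrow;
    final int ncol;
    final int[] colPtr;    // length ncol + 1; column j occupies [colPtr[j], colPtr[j + 1])
    final int[] rowIndex;  // row index of each stored entry
    final double[] values; // value of each stored entry

    CscMatrix(int nrow, int ncol, int[] colPtr, int[] rowIndex, double[] values) {
        this.nrow = nrow;
        this.ncol = ncol;
        this.colPtr = colPtr;
        this.rowIndex = rowIndex;
        this.values = values;
    }
}

final class ToCsc {
    private ToCsc() {}

    // Converts rows (column indices plus values) into CSC in two passes.
    static CscMatrix convert(int[][] index, double[][] value, int ncol) {
        int nrow = index.length;
        int[] colPtr = new int[ncol + 1];
        // Pass 1: count entries per column, stored one slot to the right.
        for (int[] row : index) {
            for (int j : row) {
                colPtr[j + 1]++;
            }
        }
        // Prefix sums turn the counts into column start offsets.
        for (int j = 0; j < ncol; j++) {
            colPtr[j + 1] += colPtr[j];
        }
        int nz = colPtr[ncol];
        int[] rowIndex = new int[nz];
        double[] values = new double[nz];
        // next[j] is the next free slot in column j.
        int[] next = Arrays.copyOf(colPtr, ncol);
        // Pass 2: scatter entries. Visiting rows in increasing order leaves
        // the row indices sorted within each column.
        for (int i = 0; i < nrow; i++) {
            for (int k = 0; k < index[i].length; k++) {
                int p = next[index[i][k]]++;
                rowIndex[p] = i;
                values[p] = value[i][k];
            }
        }
        return new CscMatrix(nrow, ncol, colPtr, rowIndex, values);
    }
}
```

Both passes touch each stored entry once, so the conversion runs in O(nz + ncol) time.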
-
of
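static <T> SparseDataset<T> of(Collection<SampleInstance<SparseArray, T>> data)
Returns a default implementation of SparseDataset.
- Type Parameters:
T - the target type.
- Parameters:
data - the samples, each a sparse array with its target.
- Returns:
the sparse dataset.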
-
of
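static <T> SparseDataset<T> of(Collection<SampleInstance<SparseArray, T>> data, int ncol)
Returns a default implementation of SparseDataset.
- Type Parameters:
T - the target type.
- Parameters:
data - the samples, each a sparse array with its target.
ncol - the number of columns.
- Returns:
the sparse dataset.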
-
of
static SparseDataset<Void> of(SparseArray[] data)
Returns a default implementation of SparseDataset without targets.
- Parameters:
data - sparse arrays.
- Returns:
the sparse dataset.
-
of
static SparseDataset<Void> of(SparseArray[] data, int ncol)
Returns a default implementation of SparseDataset without targets.
- Parameters:
data - sparse arrays.
ncol - the number of columns.
- Returns:
the sparse dataset.
-
of
static SparseDataset<Void> of(Stream<SparseArray> data)
Returns a default implementation of SparseDataset without targets.
- Parameters:
data - sparse arrays.
- Returns:
the sparse dataset.
-
from
static SparseDataset<Void> from(Path path) throws IOException, ParseException
Parses a sparse dataset in coordinate triple tuple list format. A coordinate file stores a list of (row, column, value) tuples.
- Parameters:
path - the input file path.
- Returns:
the sparse dataset.
- Throws:
IOException - when the file cannot be read.
ParseException - when the data cannot be parsed.
-
from
static SparseDataset<Void> from(Path path, int arrayIndexOrigin) throws IOException, ParseException
Reads a sparse dataset in coordinate triple tuple list format. A coordinate file stores a list of (row, column, value) tuples, one per line:
instanceID attributeID value
instanceID attributeID value
...
instanceID attributeID value
Ideally, the entries are sorted (by row index, then column index) to improve random access times. This format is good for incremental matrix construction.
In addition, the file may start with a single header line
D W N
giving the number of rows, columns, and nonzero entries, or with three header lines:
D // the number of rows
W // the number of columns
N // the total number of nonzero entries in the dataset
- Parameters:
path - the input file path.
arrayIndexOrigin - the starting index of arrays. By default, it is 0 as in C/C++ and Java, but it can be 1 to parse data produced by other programming languages such as Fortran.
- Returns:
the sparse dataset.
- Throws:
IOException - if the file stream cannot be read or closed.
ParseException - if an index is not an integer or a value is not a double.
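The following sketch parses the coordinate triple format described above, including the arrayIndexOrigin shift and both header styles. It reads from a list of lines instead of a Path, and CoordinateParser, Result, and the headerLines parameter are hypothetical. Because a single "D W N" header line looks exactly like a data triple, the sketch asks the caller how many header lines to skip rather than guessing, which may not match how from actually detects a header. Errors are reported as java.text.ParseException with the 1-based line number as the error offset.

```java
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical parser for the coordinate triple format; reads lines, not a file.
final class CoordinateParser {
    private CoordinateParser() {}

    // Parsed result: one TreeMap per row, mapping column -> value in column order.
    static final class Result {
        final int nrow, ncol;
        final List<TreeMap<Integer, Double>> rows;
        Result(int nrow, int ncol, List<TreeMap<Integer, Double>> rows) {
            this.nrow = nrow;
            this.ncol = ncol;
            this.rows = rows;
        }
    }

    // headerLines (an assumption): 0 = none, 1 = "D W N", 3 = D, W, N on separate
    // lines. A "D W N" line looks like a triple, so the caller must say which.
    static Result parse(List<String> lines, int headerLines, int arrayIndexOrigin)
            throws ParseException {
        if (headerLines != 0 && headerLines != 1 && headerLines != 3) {
            throw new IllegalArgumentException("headerLines must be 0, 1 or 3");
        }
        if (lines.size() < headerLines) {
            throw new ParseException("missing header", 0);
        }
        int nrow = -1, ncol = -1;
        if (headerLines == 1) {
            String[] h = lines.get(0).trim().split("\\s+");
            if (h.length != 3) {
                throw new ParseException("expected header D W N", 1);
            }
            nrow = parseInt(h[0], 1);
            ncol = parseInt(h[1], 1);
        } else if (headerLines == 3) {
            nrow = parseInt(lines.get(0).trim(), 1);
            ncol = parseInt(lines.get(1).trim(), 2);
            // N on line 3 is not needed to build the rows.
        }

        List<TreeMap<Integer, Double>> rows = new ArrayList<>();
        int maxCol = -1;
        for (int n = headerLines; n < lines.size(); n++) {
            int lineNo = n + 1;
            String line = lines.get(n).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] t = line.split("\\s+");
            if (t.length != 3) {
                throw new ParseException("expected 3 fields: " + line, lineNo);
            }
            // Shift indices so that the first row and column are 0.
            int i = parseInt(t[0], lineNo) - arrayIndexOrigin;
            int j = parseInt(t[1], lineNo) - arrayIndexOrigin;
            if (i < 0 || j < 0) {
                throw new ParseException("index below arrayIndexOrigin", lineNo);
            }
            double v;
            try {
                v = Double.parseDouble(t[2]);
            } catch (NumberFormatException e) {
                throw new ParseException("not a double: " + t[2], lineNo);
            }
            while (rows.size() <= i) {
                rows.add(new TreeMap<>());
            }
            rows.get(i).put(j, v); // a repeated (i, j) overwrites the earlier value
            maxCol = Math.max(maxCol, j);
        }
        // Without a header, infer the dimensions from the largest indices seen.
        if (nrow < 0) {
            nrow = rows.size();
            ncol = maxCol + 1;
        }
        if (rows.size() > nrow || maxCol >= ncol) {
            throw new ParseException("index exceeds header dimensions", 0);
        }
        while (rows.size() < nrow) {
            rows.add(new TreeMap<>());
        }
        return new Result(nrow, ncol, rows);
    }

    private static int parseInt(String s, int lineNo) throws ParseException {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            throw new ParseException("not an integer: " + s, lineNo);
        }
    }
}
```

Storing each row in a TreeMap keeps its columns sorted regardless of input order, matching the LIL layout, at the cost of more memory per entry than parallel arrays.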
-