Package smile.data
Record Class DataFrame
java.lang.Object
java.lang.Record
smile.data.DataFrame
- Record Components:
schema
- the schema of DataFrame.
columns
- the columns of DataFrame.
index
- the optional row index.
- All Implemented Interfaces:
Serializable, Iterable<Row>
public record DataFrame(StructType schema, List<ValueVector> columns, RowIndex index)
extends Record
implements Iterable<Row>, Serializable
Two-dimensional, potentially heterogeneous tabular data.
Constructor Summary
Constructors
DataFrame(RowIndex index, ValueVector... columns) - Constructor.
DataFrame(StructType schema, List<ValueVector> columns, RowIndex index) - Creates an instance of a DataFrame record class.
DataFrame(ValueVector... columns) - Constructor.
-
Method Summary
Modifier and Type, Method, Description
add(ValueVector... vectors) - Adds columns to this data frame.
apply(boolean[] index) - Returns a new data frame with boolean indexing.
apply(int i) - Returns the row at the specified index.
apply(int i, int j) - Returns the cell at (i, j).
apply(name) - Returns the column of given name.
apply(names) - Returns a new DataFrame with selected columns.
apply(index) - Returns a new data frame with row indexing.
column(int j) - Returns the j-th column.
column(name) - Returns the column of given name.
columns() - Returns the value of the columns record component.
concat(dataframes) - Concatenates data frames vertically by rows.
describe() - Returns the data structure and statistics.
drop(int... indices) - Returns a new DataFrame without selected columns.
drop(names) - Returns a new DataFrame without selected columns.
dropna() - Returns a new data frame without rows that have null/missing values.
DataType[] dtypes() - Returns the column data types.
final boolean equals(Object o) - Indicates whether some other object is "equal to" this one.
factorize(names) - Returns a new DataFrame with given columns converted to nominal.
fillna(double value) - Fills null/NaN/Inf values of numeric columns with the specified value.
get(boolean[] index) - Returns a new data frame with boolean indexing.
get(int i) - Returns the row at the specified index.
get(int i, int j) - Returns the cell at (i, j).
get(index) - Returns a new data frame with row indexing.
double getDouble(int i, int j) - Returns the double value at position (i, j).
float getFloat(int i, int j) - Returns the float value at position (i, j).
int getInt(int i, int j) - Returns the int value at position (i, j).
long getLong(int i, int j) - Returns the long value at position (i, j).
getScale(int i, int j) - Returns the value at position (i, j) of NominalScale or OrdinalScale.
getString(int i, int j) - Returns the string representation of the value at position (i, j).
final int hashCode() - Returns a hash code value for this object.
head(int numRows) - Returns the string representation of top rows.
index() - Returns the value of the index record component.
boolean isEmpty() - Returns true if the data frame is empty.
boolean isNullAt(int i, int j) - Checks whether the value at position (i, j) is null or missing value.
iterator()
join(other) - Joins two data frames on their index.
loc(row) - Returns the row with the specified index.
loc(rows) - Returns a new data frame with specified rows.
Measure[] measures() - Returns the columns' levels of measurement.
merge(dataframes) - Merges data frames horizontally by columns.
String[] names() - Returns the column names.
int ncol() - Returns the number of columns.
int nrow() - Returns the number of rows.
static DataFrame of(data, names) - Creates a DataFrame from a 2-dimensional array.
static DataFrame of(data, names) - Creates a DataFrame from a 2-dimensional array.
static DataFrame of(data, names) - Creates a DataFrame from a 2-dimensional array.
static <T> DataFrame of(clazz, data) - Creates a DataFrame from a collection of objects.
static DataFrame of(rs) - Creates a DataFrame from a JDBC ResultSet.
static DataFrame of(StructType schema, List<? extends Tuple> data) - Creates a DataFrame from a set of tuples.
static DataFrame of(StructType schema, Stream<? extends Tuple> data) - Creates a DataFrame from a stream of tuples.
schema() - Returns the value of the schema record component.
select(int... indices) - Returns a new DataFrame with selected columns.
select(names) - Returns a new DataFrame with selected columns.
void set(i, j, value) - Sets the value at position (i, j).
set(String name, ValueVector column) - Sets the column values.
setIndex(index) - Sets the DataFrame index.
setIndex(column) - Sets the DataFrame index using existing column.
int shape(int dim) - Returns the size of given dimension.
int size() - Returns the number of rows.
stream() - Returns a (possibly parallel) Stream of rows.
tail(int numRows) - Returns the string representation of bottom rows.
double[][] toArray(boolean bias, CategoricalEncoder encoder, String... names) - Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix.
double[][] toArray(columns) - Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix.
toList() - Returns the List of rows.
toMatrix() - Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix.
toMatrix(boolean bias, CategoricalEncoder encoder, String rowNames) - Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix.
toString() - Returns a string representation of this record class.
toString(int from, int to, boolean truncate) - Returns the string representation of rows in specified range.
void update(i, j, value) - Updates the value at position (i, j).
update(String name, ValueVector column) - Sets the column values.
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
-
Constructor Details
-
DataFrame
Creates an instance of a DataFrame record class.
-
DataFrame
Constructor.
- Parameters:
columns
- the columns of DataFrame.
-
DataFrame
Constructor.
- Parameters:
index
- the row index.
columns
- the columns of DataFrame.
-
-
Method Details
-
toString
Returns a string representation of this record class. The representation contains the name of the class, followed by the name and value of each of the record components.
-
names
Returns the column names.
- Returns:
- the column names.
-
dtypes
Returns the column data types.
- Returns:
- the column data types.
-
measures
Returns the columns' levels of measurement.
- Returns:
- the columns' levels of measurement.
-
shape
public int shape(int dim)
Returns the size of given dimension. Provided for the convenience of pandas users.
- Parameters:
dim
- the dimension index.
- Returns:
- the size of given dimension.
-
size
public int size()
Returns the number of rows. This is an alias to nrow, following Java's convention.
- Returns:
- the number of rows.
-
nrow
public int nrow()
Returns the number of rows.
- Returns:
- the number of rows.
-
ncol
public int ncol()
Returns the number of columns.
- Returns:
- the number of columns.
-
isEmpty
public boolean isEmpty()
Returns true if the data frame is empty.
- Returns:
- true if the data frame is empty.
-
setIndex
Sets the DataFrame index using existing column. The index column will be removed from the DataFrame.
- Parameters:
column
- the name of column that will be used as row index.
- Returns:
- a new DataFrame with the row index.
-
setIndex
Sets the DataFrame index.
- Parameters:
index
- the row index values.
- Returns:
- a new DataFrame with the row index.
-
column
Returns the j-th column.
- Parameters:
j
- the column index.
- Returns:
- the column vector.
-
column
Returns the column of given name.
- Parameters:
name
- the column name.
- Returns:
- the column vector.
-
apply
Returns the column of given name. This is an alias to column for Scala's convenience.
- Parameters:
name
- the column name.
- Returns:
- the column vector.
-
apply
Returns a new DataFrame with selected columns. This is an alias to select for Scala's convenience.
- Parameters:
names
- the column names.
- Returns:
- a new DataFrame with selected columns.
-
get
Returns the row at the specified index.
- Parameters:
i
- the row index.
- Returns:
- the i-th row.
-
apply
Returns the row at the specified index. This is an alias to get for Scala's convenience.
- Parameters:
i
- the row index.
- Returns:
- the i-th row.
-
loc
Returns the row with the specified index.
- Parameters:
row
- the row index.
- Returns:
- the row with the specified index.
-
loc
Returns a new data frame with specified rows.
- Parameters:
rows
- the row indices.
- Returns:
- a new data frame with specified rows.
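The setIndex and loc entries above describe label-based row lookup. As a rough illustration of that idea only, the following plain-Java sketch maps row labels to positions. The class RowIndexSketch and its methods loc and locAll are hypothetical names invented here; this is not Smile's RowIndex, and the uniqueness check is an assumption of the sketch rather than documented behavior.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Illustrative model of a label-based row index; not Smile's RowIndex class.
class RowIndexSketch {
    // Maps each row label to its zero-based row position.
    private final Map<Object, Integer> position = new HashMap<>();

    RowIndexSketch(Object[] labels) {
        for (int i = 0; i < labels.length; i++) {
            // Assumption of this sketch: labels must be unique.
            if (position.put(labels[i], i) != null) {
                throw new IllegalArgumentException("Duplicate label: " + labels[i]);
            }
        }
    }

    /** Returns the row position of a single label. */
    int loc(Object label) {
        Integer i = position.get(label);
        if (i == null) {
            throw new NoSuchElementException("Unknown label: " + label);
        }
        return i;
    }

    /** Returns the row positions of several labels, in the order requested. */
    int[] locAll(Object... labels) {
        int[] rows = new int[labels.length];
        for (int k = 0; k < labels.length; k++) {
            rows[k] = loc(labels[k]);
        }
        return rows;
    }
}
```

Given positions from such a lookup, selecting the matching rows reduces to ordinary positional indexing.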
-
get
Returns a new data frame with row indexing.
- Parameters:
index
- the row indexing.
- Returns:
- the data frame of selected rows.
-
apply
Returns a new data frame with row indexing. This is an alias to get for Scala's convenience.
- Parameters:
index
- the row indexing.
- Returns:
- the data frame of selected rows.
-
get
Returns a new data frame with boolean indexing.
- Parameters:
index
- the boolean indexing.
- Returns:
- the data frame of selected rows.
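Boolean indexing keeps the rows whose mask entry is true. A minimal plain-Java sketch of that selection follows, with rows modeled as double[] arrays. The class BooleanIndexSketch and its select method are hypothetical, and the requirement that the mask length equal the row count is an assumption of this sketch, not a statement about Smile's behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of boolean (mask) row selection.
// Not Smile's implementation; rows are modeled as double[].
class BooleanIndexSketch {
    /** Returns the rows whose mask entry is true, in their original order. */
    static double[][] select(double[][] data, boolean[] mask) {
        // Assumption of this sketch: one mask entry per row.
        if (mask.length != data.length) {
            throw new IllegalArgumentException("Mask length must equal the number of rows");
        }
        List<double[]> kept = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            if (mask[i]) {
                kept.add(data[i]);
            }
        }
        return kept.toArray(new double[0][]);
    }
}
```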
-
apply
Returns a new data frame with boolean indexing. This is an alias to get for Scala's convenience.
- Parameters:
index
- the boolean indexing.
- Returns:
- the data frame of selected rows.
-
isNullAt
public boolean isNullAt(int i, int j)
Checks whether the value at position (i, j) is null or missing value.
- Parameters:
i
- the row index.
j
- the column index.
- Returns:
- true if the cell value is null.
-
get
Returns the cell at (i, j).
- Parameters:
i
- the row index.
j
- the column index.
- Returns:
- the cell value.
-
apply
Returns the cell at (i, j). This is an alias to get for Scala's convenience.
- Parameters:
i
- the row index.
j
- the column index.
- Returns:
- the cell value.
-
getInt
public int getInt(int i, int j)
Returns the int value at position (i, j).
- Parameters:
i
- the row index.
j
- the column index.
- Returns:
- the int value of cell.
-
getLong
public long getLong(int i, int j)
Returns the long value at position (i, j).
- Parameters:
i
- the row index.
j
- the column index.
- Returns:
- the long value of cell.
-
getFloat
public float getFloat(int i, int j)
Returns the float value at position (i, j).
- Parameters:
i
- the row index.
j
- the column index.
- Returns:
- the float value of cell.
-
getDouble
public double getDouble(int i, int j)
Returns the double value at position (i, j).
- Parameters:
i
- the row index.
j
- the column index.
- Returns:
- the double value of cell.
-
getString
Returns the string representation of the value at position (i, j).
- Parameters:
i
- the row index.
j
- the column index.
- Returns:
- the string representation of cell value.
-
getScale
Returns the value at position (i, j) of NominalScale or OrdinalScale.
- Parameters:
i
- the row index.
j
- the column index.
- Returns:
- the cell scale.
- Throws:
ClassCastException
- when the data is not nominal or ordinal.
-
set
Sets the value at position (i, j).
- Parameters:
i
- the row index.
j
- the column index.
value
- the new value.
-
update
Updates the value at position (i, j). This is an alias to set for Scala's convenience.
- Parameters:
i
- the row index.
j
- the column index.
value
- the new value.
-
stream
Returns a (possibly parallel) Stream of rows.
- Returns:
- a (possibly parallel) Stream of rows.
-
iterator
-
toList
Returns the List of rows.
- Returns:
- the List of rows.
-
dropna
Returns a new data frame without rows that have null/missing values.
- Returns:
- the data frame without null/missing values.
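To make the row-dropping rule concrete, here is a small plain-Java sketch that filters out incomplete rows. The class DropNaSketch is hypothetical, rows are modeled as Double[] so that null can stand for a missing cell, and treating NaN as missing is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of dropping incomplete rows; not Smile's implementation.
class DropNaSketch {
    /** Returns a new list holding only the rows with no null or NaN cell. */
    static List<Double[]> dropna(List<Double[]> rows) {
        List<Double[]> complete = new ArrayList<>();
        for (Double[] row : rows) {
            boolean hasMissing = false;
            for (Double v : row) {
                // Assumption of this sketch: both null and NaN count as missing.
                if (v == null || v.isNaN()) {
                    hasMissing = true;
                    break;
                }
            }
            if (!hasMissing) {
                complete.add(row);
            }
        }
        return complete;
    }
}
```

The input list is left unchanged, which mirrors the "new data frame" wording of the entry above.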
-
fillna
Fills null/NaN/Inf values of numeric columns with the specified value.
- Parameters:
value
- the value to replace NAs.
- Returns:
- this data frame.
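The replacement rule can be sketched on a single numeric column as follows. The class FillNaSketch is hypothetical and works on a plain double[], so null does not arise; it modifies the array in place and returns it, loosely mirroring the "this data frame" return value documented above.

```java
// Plain-Java illustration of filling NaN/Inf cells in one numeric column.
// Not Smile's implementation.
class FillNaSketch {
    /** Replaces NaN and infinite entries in place and returns the same array. */
    static double[] fillna(double[] column, double value) {
        for (int i = 0; i < column.length; i++) {
            if (Double.isNaN(column[i]) || Double.isInfinite(column[i])) {
                column[i] = value;
            }
        }
        return column;
    }
}
```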
-
select
Returns a new DataFrame with selected columns.
- Parameters:
indices
- the column indices.
- Returns:
- a new DataFrame with selected columns.
-
select
Returns a new DataFrame with selected columns.
- Parameters:
names
- the column names.
- Returns:
- a new DataFrame with selected columns.
-
drop
Returns a new DataFrame without selected columns.
- Parameters:
indices
- the column indices.
- Returns:
- a new DataFrame without selected columns.
-
drop
Returns a new DataFrame without selected columns.
- Parameters:
names
- the column names.
- Returns:
- a new DataFrame without selected columns.
-
add
Adds columns to this data frame.
- Parameters:
vectors
- the columns to add.
- Returns:
- this dataframe.
-
set
Sets the column values. If the column does not exist, adds it as the last column of the dataframe.
- Parameters:
name
- the column name.
column
- the new column value.
- Returns:
- this dataframe.
-
update
Sets the column values. If the column does not exist, adds it as the last column of the dataframe. This is an alias to set for Scala's convenience.
- Parameters:
name
- the column name.
column
- the new column value.
- Returns:
- this dataframe.
-
join
Joins two data frames on their index. If either dataframe has no index, merges them horizontally by columns.
- Parameters:
other
- the other data frame to join with.
- Returns:
- a new data frame with combined columns.
-
merge
Merges data frames horizontally by columns. If there are columns with the same name, the latter ones will be renamed with suffix such as _2, _3, etc.
- Parameters:
dataframes
- the data frames to merge.
- Returns:
- a new data frame with combined columns.
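The renaming rule for duplicate column names can be illustrated with a short plain-Java sketch. The class MergeNamesSketch and its uniqueNames method are hypothetical. The sketch applies the _2, _3 suffix scheme described above to a combined list of names, and it does not handle the corner case where a generated name such as x_2 already exists as an original column name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of suffixing repeated column names; not Smile's implementation.
class MergeNamesSketch {
    /** Keeps the first occurrence of each name and suffixes later ones with _2, _3, ... */
    static List<String> uniqueNames(List<String> names) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> result = new ArrayList<>();
        for (String name : names) {
            // Number of times this name has appeared so far, including now.
            int count = seen.merge(name, 1, Integer::sum);
            result.add(count == 1 ? name : name + "_" + count);
        }
        return result;
    }
}
```

For example, the names id, x, x, id, x become id, x, x_2, id_2, x_3.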
-
concat
Concatenates data frames vertically by rows.
- Parameters:
dataframes
- the data frames to concatenate.
- Returns:
- a new data frame that combines all the rows.
-
factorize
Returns a new DataFrame with given columns converted to nominal.
- Parameters:
names
- column names. If empty, all object columns in the data frame will be converted.
- Returns:
- a new DataFrame.
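Converting a column to nominal amounts to replacing each distinct value with an integer level code. The following plain-Java sketch shows one way to do that for a string column. The class FactorizeSketch is hypothetical, and assigning codes in order of first appearance is an assumption of this sketch; the library may order levels differently.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of encoding strings as integer level codes.
// Not Smile's implementation.
class FactorizeSketch {
    /**
     * Returns one integer code per value and appends each newly seen level to
     * the supplied list, so that levels.get(code) recovers the original string.
     */
    static int[] factorize(String[] values, List<String> levels) {
        // Assumption of this sketch: codes follow order of first appearance.
        Map<String, Integer> codeOf = new LinkedHashMap<>();
        int[] codes = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            Integer code = codeOf.get(values[i]);
            if (code == null) {
                code = codeOf.size();
                codeOf.put(values[i], code);
                levels.add(values[i]);
            }
            codes[i] = code;
        }
        return codes;
    }
}
```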
-
toArray
Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls will be encoded as Double.NaN. No bias term is added, and level encoding is used for categorical variables.
- Parameters:
columns
- the columns to export. If empty, all columns will be used.
- Returns:
- the numeric array.
-
toArray
Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls will be encoded as Double.NaN.
- Parameters:
bias
- if true, add the first column of all 1's.
encoder
- the categorical variable encoder.
names
- the columns to export. If empty, all columns will be used.
- Returns:
- the numeric array.
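The effect of the bias flag and the NaN encoding of missing values can be sketched in plain Java. The class ToArraySketch is hypothetical and handles only numeric cells, modeled as Double so that null can stand for a missing value. Categorical encoding is left out entirely.

```java
// Plain-Java illustration of exporting rows to a numeric array with an
// optional leading column of 1's. Not Smile's implementation.
class ToArraySketch {
    /** Copies rows into a double[][], mapping null to NaN and prepending 1.0 if bias is true. */
    static double[][] toArray(boolean bias, Double[][] rows) {
        int ncol = rows.length == 0 ? 0 : rows[0].length;
        int offset = bias ? 1 : 0;
        double[][] out = new double[rows.length][ncol + offset];
        for (int i = 0; i < rows.length; i++) {
            if (bias) {
                out[i][0] = 1.0; // bias (intercept) column
            }
            for (int j = 0; j < ncol; j++) {
                Double v = rows[i][j];
                out[i][j + offset] = (v == null) ? Double.NaN : v;
            }
        }
        return out;
    }
}
```

With bias set to true, every output row is one element longer than its input row and starts with 1.0.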
-
toMatrix
Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls will be encoded as Double.NaN. No bias term is added, and level encoding is used for categorical variables.
- Returns:
- the numeric matrix.
-
toMatrix
Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls will be encoded as Double.NaN.
- Parameters:
bias
- if true, add the first column of all 1's.
encoder
- the categorical variable encoder.
rowNames
- the column to be used as row names.
- Returns:
- the numeric matrix.
-
describe
Returns the data structure and statistics.
- Returns:
- the data structure and statistics.
-
head
Returns the string representation of top rows.
- Parameters:
numRows
- the number of rows to show.
- Returns:
- the string representation of top rows.
-
tail
Returns the string representation of bottom rows.
- Parameters:
numRows
- the number of rows to show.
- Returns:
- the string representation of bottom rows.
-
toString
Returns the string representation of rows in specified range.
- Parameters:
from
- the initial index of the range to show, inclusive.
to
- the final index of the range to show, exclusive.
truncate
- whether to truncate long strings and right-align cells.
- Returns:
- the string representation of rows in specified range.
-
of
Creates a DataFrame from a 2-dimensional array.
- Parameters:
data
- the data array.
names
- the names of columns.
- Returns:
- the data frame.
-
of
Creates a DataFrame from a 2-dimensional array.
- Parameters:
data
- the data array.
names
- the names of columns.
- Returns:
- the data frame.
-
of
Creates a DataFrame from a 2-dimensional array.
- Parameters:
data
- the data array.
names
- the names of columns.
- Returns:
- the data frame.
-
of
Creates a DataFrame from a collection of objects.
- Type Parameters:
T
- The data type of elements.
- Parameters:
clazz
- The class type of elements.
data
- The data collection.
- Returns:
- the data frame.
-
of
Creates a DataFrame from a stream of tuples.
- Parameters:
data
- The data stream.
- Returns:
- the data frame.
-
of
Creates a DataFrame from a set of tuples.
- Parameters:
schema
- The schema of tuple.
data
- The data collection.
- Returns:
- the data frame.
-
of
Creates a DataFrame from a JDBC ResultSet.
- Parameters:
rs
- The JDBC result set.
- Returns:
- the data frame.
- Throws:
SQLException
- when JDBC operation fails.
-
hashCode
public final int hashCode()
Returns a hash code value for this object. The value is derived from the hash code of each of the record components.
-
equals
Indicates whether some other object is "equal to" this one. The objects are equal if the other object is of the same class and if all the record components are equal. All components in this record class are compared with Objects::equals(Object,Object).
-
schema
Returns the value of the schema record component.
- Returns:
- the value of the schema record component
-
columns
Returns the value of the columns record component.
- Returns:
- the value of the columns record component
-
index
Returns the value of the index record component.
- Returns:
- the value of the index record component
-