java.lang.Object

java.lang.Record

smile.data.DataFrame

Record Components:

schema - the schema of the DataFrame.

columns - the columns of the DataFrame.

index - the optional row index.

All Implemented Interfaces:

Serializable, Iterable<Row>

public record DataFrame(StructType schema, List<ValueVector> columns, RowIndex index) extends Record implements Iterable<Row>, Serializable

Two-dimensional, potentially heterogeneous tabular data.


Constructor Summary

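Constructors

- DataFrame(RowIndex index, ValueVector... columns): Creates a data frame from the given columns with a row index.
- DataFrame(StructType schema, List<ValueVector> columns, RowIndex index): Creates a data frame with the given schema, columns, and row index.
- DataFrame(ValueVector... columns): Creates a data frame from the given columns.
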
Method Summary

- DataFrame add(ValueVector... vectors): Adds columns to this data frame.
- DataFrame apply(boolean[] index): Returns a new data frame with the rows selected by a boolean mask.
- Tuple apply(int i): Returns the row at the specified index.
- Object apply(int i, int j): Returns the cell at (i, j).
- ValueVector apply(String name): Returns the column with the given name.
- DataFrame apply(String... names): Returns a new DataFrame with the selected columns.
- DataFrame apply(Index index): Returns a new data frame with the rows selected by an index.
- ValueVector column(int j): Returns the j-th column.
- ValueVector column(String name): Returns the column with the given name.
- List<ValueVector> columns(): Returns the value of the columns record component.
- DataFrame concat(DataFrame... dataframes): Concatenates data frames vertically by rows.
- DataFrame describe(): Returns the data structure and statistics.
- DataFrame drop(int... indices): Returns a new DataFrame without the selected columns.
- DataFrame drop(String... names): Returns a new DataFrame without the selected columns.
- DataFrame dropna(): Returns a new data frame without the rows that have null/missing values.
- DataType[] dtypes(): Returns the column data types.
- final boolean equals(Object o): Indicates whether some other object is "equal to" this one.
- DataFrame factorize(String... names): Returns a new DataFrame with the given columns converted to nominal.
- DataFrame fillna(double value): Fills null/NaN/Inf values of numeric columns with the specified value.
- DataFrame get(boolean[] index): Returns a new data frame with the rows selected by a boolean mask.
- Tuple get(int i): Returns the row at the specified index.
- Object get(int i, int j): Returns the cell at (i, j).
- DataFrame get(Index index): Returns a new data frame with the rows selected by an index.
- double getDouble(int i, int j): Returns the double value at position (i, j).
- float getFloat(int i, int j): Returns the float value at position (i, j).
- int getInt(int i, int j): Returns the int value at position (i, j).
- long getLong(int i, int j): Returns the long value at position (i, j).
- String getScale(int i, int j): Returns the level of the NominalScale or OrdinalScale value at position (i, j).
- String getString(int i, int j): Returns the string representation of the value at position (i, j).
- final int hashCode(): Returns a hash code value for this object.
- String head(int numRows): Returns the string representation of the top rows.
- RowIndex index(): Returns the value of the index record component.
- boolean isEmpty(): Returns true if the data frame is empty.
- boolean isNullAt(int i, int j): Checks whether the value at position (i, j) is null or missing.
- Iterator<Row> iterator(): Returns an iterator over the rows.
- DataFrame join(DataFrame other): Joins two data frames on their index.
- Tuple loc(Object row): Returns the row with the specified row index value.
- DataFrame loc(Object... rows): Returns a new data frame with the rows of the specified row index values.
- Measure[] measures(): Returns the level of measurement of each column.
- DataFrame merge(DataFrame... dataframes): Merges data frames horizontally by columns.
- String[] names(): Returns the column names.
- int ncol(): Returns the number of columns.
- int nrow(): Returns the number of rows.
- static DataFrame of(double[][] data, String... names): Creates a DataFrame from a 2-dimensional array.
- static DataFrame of(float[][] data, String... names): Creates a DataFrame from a 2-dimensional array.
- static DataFrame of(int[][] data, String... names): Creates a DataFrame from a 2-dimensional array.
- static <T> DataFrame of(Class<T> clazz, List<T> data): Creates a DataFrame from a collection of objects.
- static DataFrame of(ResultSet rs): Creates a DataFrame from a JDBC ResultSet.
- static DataFrame of(StructType schema, List<? extends Tuple> data): Creates a DataFrame from a list of tuples.
- static DataFrame of(StructType schema, Stream<? extends Tuple> data): Creates a DataFrame from a stream of tuples.
- StructType schema(): Returns the value of the schema record component.
- DataFrame select(int... indices): Returns a new DataFrame with the selected columns.
- DataFrame select(String... names): Returns a new DataFrame with the selected columns.
- void set(int i, int j, Object value): Sets the value at position (i, j).
- DataFrame set(String name, ValueVector column): Sets the column values.
- DataFrame setIndex(Object[] index): Sets the DataFrame row index.
- DataFrame setIndex(String column): Sets the DataFrame row index using an existing column.
- int shape(int dim): Returns the size of the given dimension.
- int size(): Returns the number of rows.
- Stream<Row> stream(): Returns a (possibly parallel) Stream of rows.
- String tail(int numRows): Returns the string representation of the bottom rows.
- double[][] toArray(boolean bias, CategoricalEncoder encoder, String... names): Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix.
- double[][] toArray(String... columns): Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix.
- List<Row> toList(): Returns the list of rows.
- Matrix toMatrix(): Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix.
- Matrix toMatrix(boolean bias, CategoricalEncoder encoder, String rowNames): Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix.
- String toString(): Returns a string representation of this record class.
- String toString(int from, int to, boolean truncate): Returns the string representation of rows in the specified range.
- void update(int i, int j, Object value): Updates the value at position (i, j).
- DataFrame update(String name, ValueVector column): Sets the column values.

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Methods inherited from interface java.lang.Iterable
forEach, spliterator

Constructor Details
- DataFrame
  
  public DataFrame(StructType schema, List<ValueVector> columns, RowIndex index)
  
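  Creates a data frame with the given schema, columns, and row index.
  
  Parameters:
  
  schema - the schema of the DataFrame.
  
  columns - the columns of the DataFrame.
  
  index - the optional row index.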
- DataFrame
  
  public DataFrame(ValueVector... columns)
  
  Creates a data frame from the given columns.
  
  Parameters:
  
  columns - the columns of the DataFrame.
- DataFrame
  
  public DataFrame(RowIndex index, ValueVector... columns)
  
  Creates a data frame from the given columns with a row index.
  
  Parameters:
  
  index - the row index.
  
  columns - the columns of the DataFrame.
Method Details
- toString
  
  public String toString()
  
  Returns a string representation of this record class. The representation contains the name of the class, followed by the name and value of each of the record components.
  
  Specified by:
  
  toString in class Record
  
  Returns:
  
  a string representation of this object
- names
  
  public String[] names()
  
  Returns the column names.
  
  Returns:
  
  the column names.
- dtypes
  
  public DataType[] dtypes()
  
  Returns the column data types.
  
  Returns:
  
  the column data types.
- measures
  
  public Measure[] measures()
  
  Returns the level of measurement of each column.
  
  Returns:
  
  the level of measurement of each column.
- shape
  
  public int shape(int dim)
  
  Returns the size of the given dimension. Provided for the convenience of pandas users.
  
  Parameters:
  
  dim - the dimension index.
  
  Returns:
  
  the size of the given dimension.
- size
  
  public int size()
  
  Returns the number of rows. This is an alias of nrow(), following the Java size() convention.
  
  Returns:
  
  the number of rows.
- nrow
  
  public int nrow()
  
  Returns the number of rows.
  
  Returns:
  
  the number of rows.
- ncol
  
  public int ncol()
  
  Returns the number of columns.
  
  Returns:
  
  the number of columns.
- isEmpty
  
  public boolean isEmpty()
  
  Returns true if the data frame is empty.
  
  Returns:
  
  true if the data frame is empty.
- setIndex
  
  public DataFrame setIndex(String column)
  
  Sets the DataFrame index using an existing column. The index column is removed from the returned DataFrame.
  
  Parameters:
  
  column - the name of the column to use as the row index.
  
  Returns:
  
  a new DataFrame with the row index.
- setIndex
  
  public DataFrame setIndex(Object[] index)
  
  Sets the DataFrame row index.
  
  Parameters:
  
  index - the row index values.
  
  Returns:
  
  a new DataFrame with the row index.
- column
  
  public ValueVector column(int j)
  
  Returns the j-th column.
  
  Parameters:
  
  j - the column index.
  
  Returns:
  
  the column vector.
- column
  
  public ValueVector column(String name)
  
  Returns the column with the given name.
  
  Parameters:
  
  name - the column name.
  
  Returns:
  
  the column vector.
- apply
  
  public ValueVector apply(String name)
  
  Returns the column with the given name. This is an alias of column(String) for the convenience of Scala users.
  
  Parameters:
  
  name - the column name.
  
  Returns:
  
  the column vector.
- apply
  
  public DataFrame apply(String... names)
  
  Returns a new DataFrame with the selected columns. This is an alias of select(String...) for the convenience of Scala users.
  
  Parameters:
  
  names - the column names.
  
  Returns:
  
  a new DataFrame with selected columns.
- get
  
  public Tuple get(int i)
  
  Returns the row at the specified index.
  
  Parameters:
  
  i - the row index.
  
  Returns:
  
  the i-th row.
- apply
  
  public Tuple apply(int i)
  
  Returns the row at the specified index. This is an alias of get(int) for the convenience of Scala users.
  
  Parameters:
  
  i - the row index.
  
  Returns:
  
  the i-th row.
- loc
  
  public Tuple loc(Object row)
  
  Returns the row with the specified row index value.
  
  Parameters:
  
  row - the row index value.
  
  Returns:
  
  the row with the specified index.
- loc
  
  public DataFrame loc(Object... rows)
  
  Returns a new data frame with the rows of the specified row index values.
  
  Parameters:
  
  rows - the row index values.
  
  Returns:
  
  a new data frame with specified rows.
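  
  Both loc overloads look rows up by their row index values, not by position. The plain-Java sketch below models that lookup. It assumes the row index behaves like a map from index value to row position. LabelIndex is a hypothetical class, not Smile's RowIndex, and rejecting duplicate labels is an assumption of the sketch.
  
  ```java
  import java.util.Arrays;
  import java.util.HashMap;
  import java.util.Map;
  
  // Hypothetical stand-in for a row index: maps each index value (label) to a row position.
  final class LabelIndex {
      private final Map<Object, Integer> positions = new HashMap<>();
  
      LabelIndex(Object[] labels) {
          for (int i = 0; i < labels.length; i++) {
              // Assumption of this sketch: labels must be unique.
              if (positions.put(labels[i], i) != null) {
                  throw new IllegalArgumentException("Duplicate label: " + labels[i]);
              }
          }
      }
  
      // Analogous to loc(Object row): one label -> one row position.
      int loc(Object label) {
          Integer i = positions.get(label);
          if (i == null) {
              throw new IllegalArgumentException("Unknown label: " + label);
          }
          return i;
      }
  
      // Analogous to loc(Object... rows): labels -> row positions, in the requested order.
      int[] loc(Object... labels) {
          int[] rows = new int[labels.length];
          for (int k = 0; k < labels.length; k++) {
              rows[k] = loc(labels[k]);
          }
          return rows;
      }
  
      public static void main(String[] args) {
          LabelIndex index = new LabelIndex(new Object[] {"a", "b", "c"});
          System.out.println(index.loc("b"));                        // 1
          System.out.println(Arrays.toString(index.loc("c", "a")));  // [2, 0]
      }
  }
  ```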
- get
  
  public DataFrame get(Index index)
  
  Returns a new data frame with the rows selected by the given index.
  
  Parameters:
  
  index - the index of the rows to select.
  
  Returns:
  
  the data frame of selected rows.
- apply
  
  public DataFrame apply(Index index)
  
  Returns a new data frame with the rows selected by the given index. This is an alias of get(Index) for the convenience of Scala users.
  
  Parameters:
  
  index - the index of the rows to select.
  
  Returns:
  
  the data frame of selected rows.
- get
  
  public DataFrame get(boolean[] index)
  
  Returns a new data frame with the rows where the boolean mask is true.
  
  Parameters:
  
  index - the boolean mask, with one element per row.
  
  Returns:
  
  the data frame of selected rows.
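  
  Boolean indexing keeps the rows whose mask element is true. The sketch below applies a mask to column-major double arrays, which mirrors the columns record component. BooleanMask and filter are hypothetical names, not Smile API. The mask is built from a condition on one column, much like df[df.y > 15] in pandas.
  
  ```java
  import java.util.Arrays;
  
  // Sketch of boolean indexing over column-major data: columns[j][i] is row i of column j.
  final class BooleanMask {
      // Keeps row i of every column when mask[i] is true.
      static double[][] filter(double[][] columns, boolean[] mask) {
          int kept = 0;
          for (boolean b : mask) {
              if (b) kept++;
          }
          double[][] result = new double[columns.length][kept];
          for (int j = 0; j < columns.length; j++) {
              if (columns[j].length != mask.length) {
                  throw new IllegalArgumentException("Mask length must equal the number of rows");
              }
              int k = 0;
              for (int i = 0; i < mask.length; i++) {
                  if (mask[i]) result[j][k++] = columns[j][i];
              }
          }
          return result;
      }
  
      public static void main(String[] args) {
          double[][] columns = {{1, 2, 3, 4}, {10, 20, 30, 40}};
          // Build the mask from a condition on the second column.
          boolean[] mask = new boolean[columns[1].length];
          for (int i = 0; i < mask.length; i++) {
              mask[i] = columns[1][i] > 15;
          }
          System.out.println(Arrays.deepToString(filter(columns, mask)));
          // [[2.0, 3.0, 4.0], [20.0, 30.0, 40.0]]
      }
  }
  ```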
- apply
  
  public DataFrame apply(boolean[] index)
  
  Returns a new data frame with the rows where the boolean mask is true. This is an alias of get(boolean[]) for the convenience of Scala users.
  
  Parameters:
  
  index - the boolean mask, with one element per row.
  
  Returns:
  
  the data frame of selected rows.
- isNullAt
  
  public boolean isNullAt(int i, int j)
  
  Checks whether the value at position (i, j) is null or missing.
  
  Parameters:
  
  i - the row index.
  
  j - the column index.
  
  Returns:
  
  true if the cell value is null.
- get
  
  public Object get(int i, int j)
  
  Returns the cell at (i, j).
  
  Parameters:
  
  i - the row index.
  
  j - the column index.
  
  Returns:
  
  the cell value.
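  
  Because the record stores its data as a list of columns, cell (i, j) is element i of column j. The sketch below models get(i, j) and isNullAt(i, j) over that layout. ColumnTable is hypothetical, not Smile API. Treating floating-point NaN as missing, alongside null, is an assumption of the sketch.
  
  ```java
  import java.util.List;
  
  // Sketch of a table stored as a list of columns, like the columns record component.
  // Cell (i, j) is element i of column j. ColumnTable is hypothetical, not Smile API.
  final class ColumnTable {
      private final List<Object[]> columns;
  
      ColumnTable(List<Object[]> columns) {
          this.columns = columns;
      }
  
      Object get(int i, int j) {
          return columns.get(j)[i];
      }
  
      // Assumption of this sketch: null and floating-point NaN both count as missing.
      boolean isNullAt(int i, int j) {
          Object v = get(i, j);
          return v == null
              || (v instanceof Double d && d.isNaN())
              || (v instanceof Float f && f.isNaN());
      }
  
      public static void main(String[] args) {
          ColumnTable t = new ColumnTable(List.of(
              new Object[] {1, 2},          // column 0
              new Object[] {"x", null}));   // column 1
          System.out.println(t.get(1, 0));      // 2
          System.out.println(t.isNullAt(1, 1)); // true
      }
  }
  ```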
- apply
  
  public Object apply(int i, int j)
  
  Returns the cell at (i, j). This is an alias of get(int, int) for the convenience of Scala users.
  
  Parameters:
  
  i - the row index.
  
  j - the column index.
  
  Returns:
  
  the cell value.
- getInt
  
  public int getInt(int i, int j)
  
  Returns the int value at position (i, j).
  
  Parameters:
  
  i - the row index.
  
  j - the column index.
  
  Returns:
  
  the int value of the cell.
- getLong
  
  public long getLong(int i, int j)
  
  Returns the long value at position (i, j).
  
  Parameters:
  
  i - the row index.
  
  j - the column index.
  
  Returns:
  
  the long value of the cell.
- getFloat
  
  public float getFloat(int i, int j)
  
  Returns the float value at position (i, j).
  
  Parameters:
  
  i - the row index.
  
  j - the column index.
  
  Returns:
  
  the float value of the cell.
- getDouble
  
  public double getDouble(int i, int j)
  
  Returns the double value at position (i, j).
  
  Parameters:
  
  i - the row index.
  
  j - the column index.
  
  Returns:
  
  the double value of the cell.
- getString
  
  public String getString(int i, int j)
  
  Returns the string representation of the value at position (i, j).
  
  Parameters:
  
  i - the row index.
  
  j - the column index.
  
  Returns:
  
  the string representation of cell value.
- getScale
  
  public String getScale(int i, int j)
  
  Returns the level of the NominalScale or OrdinalScale value at position (i, j).
  
  Parameters:
  
  i - the row index.
  
  j - the column index.
  
  Returns:
  
  the level of the cell value in its scale.
  
  Throws:
  
  ClassCastException - when the data is not nominal or ordinal.
- set
  
  public void set(int i, int j, Object value)
  
  Sets the value at position (i, j).
  
  Parameters:
  
  i - the row index.
  
  j - the column index.
  
  value - the new value.
- update
  
  public void update(int i, int j, Object value)
  
  Updates the value at position (i, j). This is an alias of set(int, int, Object) for the convenience of Scala users.
  
  Parameters:
  
  i - the row index.
  
  j - the column index.
  
  value - the new value.
- stream
  
  public Stream<Row> stream()
  
  Returns a (possibly parallel) Stream of rows.
  
  Returns:
  
  a (possibly parallel) Stream of rows.
- iterator
  
  public Iterator<Row> iterator()
  
  Specified by:
  
  iterator in interface Iterable<Row>
- toList
  
  public List<Row> toList()
  
  Returns the List of rows.
  
  Returns:
  
  the List of rows.
- dropna
  
  public DataFrame dropna()
  
  Returns a new data frame without rows that have null/missing values.
  
  Returns:
  
  the data frame without null/missing values.
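  
  dropna keeps only the rows in which no value is missing. The sketch below applies that rule to row-major data. DropNa is hypothetical. Counting floating-point NaN as missing, alongside null, is an assumption of the sketch.
  
  ```java
  import java.util.Arrays;
  import java.util.List;
  
  // Sketch of dropna semantics over row-major data: a row is kept only if none of
  // its values is missing. Assumption: null and floating-point NaN count as missing.
  final class DropNa {
      static boolean isMissing(Object v) {
          return v == null
              || (v instanceof Double d && d.isNaN())
              || (v instanceof Float f && f.isNaN());
      }
  
      static List<Object[]> dropna(List<Object[]> rows) {
          return rows.stream()
              .filter(row -> Arrays.stream(row).noneMatch(DropNa::isMissing))
              .toList();
      }
  
      public static void main(String[] args) {
          List<Object[]> rows = List.of(
              new Object[] {1, "a"},
              new Object[] {null, "b"},
              new Object[] {Double.NaN, "c"},
              new Object[] {4.0, "d"});
          for (Object[] row : dropna(rows)) {
              System.out.println(Arrays.toString(row));
          }
          // [1, a]
          // [4.0, d]
      }
  }
  ```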
- fillna
  
  public DataFrame fillna(double value)
  
  Fills null/NaN/Inf values of numeric columns with the specified value.
  
  Parameters:
  
  value - the value that replaces null/NaN/Inf values.
  
  Returns:
  
  this data frame.
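  
  Because fillna returns this data frame, the replacement happens in place. The sketch below mirrors that behavior for primitive double columns, where the values to fill are NaN and the two infinities. Nulls only arise in boxed columns and are not shown. FillNa is a hypothetical name.
  
  ```java
  import java.util.Arrays;
  
  // Sketch of fillna on numeric columns: NaN and +/-Infinity are replaced in place,
  // and the same object is returned.
  final class FillNa {
      static double[][] fillna(double[][] columns, double value) {
          for (double[] column : columns) {
              for (int i = 0; i < column.length; i++) {
                  // Double.isFinite is false for NaN and for both infinities.
                  if (!Double.isFinite(column[i])) {
                      column[i] = value;
                  }
              }
          }
          return columns;
      }
  
      public static void main(String[] args) {
          double[][] columns = {{1.0, Double.NaN}, {Double.POSITIVE_INFINITY, 4.0}};
          double[][] same = fillna(columns, 0.0);
          System.out.println(same == columns);               // true
          System.out.println(Arrays.deepToString(columns));  // [[1.0, 0.0], [0.0, 4.0]]
      }
  }
  ```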
- select
  
  public DataFrame select(int... indices)
  
  Returns a new DataFrame with selected columns.
  
  Parameters:
  
  indices - the column indices.
  
  Returns:
  
  a new DataFrame with selected columns.
- select
  
  public DataFrame select(String... names)
  
  Returns a new DataFrame with selected columns.
  
  Parameters:
  
  names - the column names.
  
  Returns:
  
  a new DataFrame with selected columns.
- drop
  
  public DataFrame drop(int... indices)
  
  Returns a new DataFrame without selected columns.
  
  Parameters:
  
  indices - the column indices.
  
  Returns:
  
  a new DataFrame without selected columns.
- drop
  
  public DataFrame drop(String... names)
  
  Returns a new DataFrame without selected columns.
  
  Parameters:
  
  names - the column names.
  
  Returns:
  
  a new DataFrame without selected columns.
- add
  
  public DataFrame add(ValueVector... vectors)
  
  Adds columns to this data frame.
  
  Parameters:
  
  vectors - the columns to add.
  
  Returns:
  
  this data frame.
- set
  
  public DataFrame set(String name, ValueVector column)
  
  Sets the column values. If the column does not exist, adds it as the last column of the data frame.
  
  Parameters:
  
  name - the column name.
  
  column - the new column value.
  
  Returns:
  
  this data frame.
- update
  
  public DataFrame update(String name, ValueVector column)
  
  Sets the column values. If the column does not exist, adds it as the last column of the data frame. This is an alias of set(String, ValueVector) for the convenience of Scala users.
  
  Parameters:
  
  name - the column name.
  
  column - the new column value.
  
  Returns:
  
  this data frame.
- join
  
  public DataFrame join(DataFrame other)
  
  Joins two data frames on their index. If either data frame has no index, merges them horizontally by columns.
  
  Parameters:
  
  other - the data frame to join.
  
  Returns:
  
  a new data frame with combined columns.
- merge
  
  public DataFrame merge(DataFrame... dataframes)
  
  Merges data frames horizontally by columns. If several columns have the same name, the later ones are renamed with a suffix such as _2, _3, etc.
  
  Parameters:
  
  dataframes - the data frames to merge.
  
  Returns:
  
  a new data frame with combined columns.
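  
  The renaming rule for duplicate column names can be modeled on the names alone. In the sketch below, the first column with a given name keeps it, and later ones get _2, _3, and so on, numbered by order of occurrence. MergeNames is hypothetical. Smile may number duplicates differently, for example by the position of the source data frame. Both schemes agree when each input contributes at most one duplicate.
  
  ```java
  import java.util.ArrayList;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;
  
  // Sketch of the duplicate-name rule: the first column with a given name keeps it,
  // and later ones get _2, _3, ... by order of occurrence (an assumption of this sketch).
  final class MergeNames {
      static List<String> merge(String[]... frames) {
          Map<String, Integer> seen = new HashMap<>();
          List<String> merged = new ArrayList<>();
          for (String[] names : frames) {
              for (String name : names) {
                  // Map.merge returns the updated count for this name.
                  int count = seen.merge(name, 1, Integer::sum);
                  merged.add(count == 1 ? name : name + "_" + count);
              }
          }
          return merged;
      }
  
      public static void main(String[] args) {
          List<String> names = merge(new String[] {"id", "x"}, new String[] {"id", "y"}, new String[] {"id"});
          System.out.println(names); // [id, x, id_2, y, id_3]
      }
  }
  ```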
- concat
  
  public DataFrame concat(DataFrame... dataframes)
  
  Concatenates data frames vertically by rows.
  
  Parameters:
  
  dataframes - the data frames to concatenate.
  
  Returns:
  
  a new data frame that combines all the rows.
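  
  With column-major storage, concatenating by rows means that column j of the result is column j of each input, one after another. The sketch below illustrates this with double arrays. Concat is hypothetical. Requiring the same number of columns in every input stands in for the schema compatibility Smile presumably checks.
  
  ```java
  import java.util.Arrays;
  
  // Sketch of vertical concatenation of column-major tables: column j of the result
  // is column j of each input, one after another.
  final class Concat {
      static double[][] concat(double[][]... frames) {
          int ncol = frames[0].length;
          double[][] result = new double[ncol][];
          for (int j = 0; j < ncol; j++) {
              int nrow = 0;
              for (double[][] frame : frames) {
                  if (frame.length != ncol) {
                      throw new IllegalArgumentException("All frames must have " + ncol + " columns");
                  }
                  nrow += frame[j].length;
              }
              double[] column = new double[nrow];
              int offset = 0;
              for (double[][] frame : frames) {
                  System.arraycopy(frame[j], 0, column, offset, frame[j].length);
                  offset += frame[j].length;
              }
              result[j] = column;
          }
          return result;
      }
  
      public static void main(String[] args) {
          double[][] top = {{1, 2}, {10, 20}};
          double[][] bottom = {{3}, {30}};
          System.out.println(Arrays.deepToString(concat(top, bottom)));
          // [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
      }
  }
  ```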
- factorize
  
  public DataFrame factorize(String... names)
  
  Returns a new DataFrame with given columns converted to nominal.
  
  Parameters:
  
  names - column names. If empty, all object columns in the data frame will be converted.
  
  Returns:
  
  a new DataFrame.
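  
  Converting a string column to nominal replaces each value with an integer code and keeps a table of levels. The sketch below shows the idea. Factorize is hypothetical. Sorting levels alphabetically and coding null as -1 are assumptions; Smile's level order and missing-value handling may differ.
  
  ```java
  import java.util.Arrays;
  import java.util.TreeSet;
  
  // Sketch of factorizing a string column into nominal codes plus a level table.
  // Assumptions: levels sorted alphabetically, null coded as -1.
  final class Factorize {
      record Factor(String[] levels, int[] codes) {}
  
      static Factor factorize(String[] values) {
          TreeSet<String> distinct = new TreeSet<>();
          for (String v : values) {
              if (v != null) distinct.add(v);
          }
          String[] levels = distinct.toArray(new String[0]);
          int[] codes = new int[values.length];
          for (int i = 0; i < values.length; i++) {
              // levels is sorted, so binary search finds each value's code.
              codes[i] = values[i] == null ? -1 : Arrays.binarySearch(levels, values[i]);
          }
          return new Factor(levels, codes);
      }
  
      public static void main(String[] args) {
          Factor f = factorize(new String[] {"red", "blue", "red", null});
          System.out.println(Arrays.toString(f.levels())); // [blue, red]
          System.out.println(Arrays.toString(f.codes()));  // [1, 0, 1, -1]
      }
  }
  ```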
- toArray
  
  public double[][] toArray(String... columns)
  
  Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls are encoded as Double.NaN. No bias term is added, and categorical variables use level encoding.
  
  Parameters:
  
  columns - the columns to export. If empty, all columns will be used.
  
  Returns:
  
  the numeric array.
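  
  The resulting array has one row per data frame row and one array column per data frame column, with missing values encoded as NaN. The sketch below shows that layout for nullable numeric columns only. Categorical columns are omitted, and ToArray is a hypothetical name, not Smile's implementation.
  
  ```java
  import java.util.Arrays;
  
  // Sketch: binds nullable numeric columns into a row-major array (x[i][j] is row i,
  // column j), encoding null as Double.NaN. No bias column is added.
  final class ToArray {
      static double[][] toArray(Double[]... columns) {
          int nrow = columns.length == 0 ? 0 : columns[0].length;
          double[][] x = new double[nrow][columns.length];
          for (int j = 0; j < columns.length; j++) {
              for (int i = 0; i < nrow; i++) {
                  Double v = columns[j][i];
                  x[i][j] = v == null ? Double.NaN : v;
              }
          }
          return x;
      }
  
      public static void main(String[] args) {
          double[][] x = toArray(new Double[] {1.0, null, 3.0}, new Double[] {10.0, 20.0, null});
          System.out.println(Arrays.deepToString(x));
          // [[1.0, 10.0], [NaN, 20.0], [3.0, NaN]]
      }
  }
  ```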
- toArray
  
  public double[][] toArray(boolean bias, CategoricalEncoder encoder, String... names)
  
  Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls are encoded as Double.NaN.
  
  Parameters:
  
  bias - if true, add a first column of all 1's (the bias term).
  
  encoder - the categorical variable encoder.
  
  names - the columns to export. If empty, all columns will be used.
  
  Returns:
  
  the numeric array.
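  
  The bias and encoder parameters control the layout of the numeric output. The sketch below encodes one nominal column, given as integer codes in [0, k), with an optional leading column of 1's. It uses three common encodings: level codes, dummy variables with level 0 as baseline, and one-hot indicators. DesignMatrix and its Encoding enum are hypothetical, not Smile's CategoricalEncoder. Which encodings CategoricalEncoder offers, and its exact column layout, should be checked against the Smile documentation.
  
  ```java
  import java.util.Arrays;
  
  // Sketch of three common encodings for one nominal column of codes in [0, k),
  // with an optional leading bias column of 1's.
  final class DesignMatrix {
      enum Encoding { LEVEL, DUMMY, ONE_HOT }
  
      static double[][] encode(int[] codes, int k, boolean bias, Encoding encoding) {
          int width = switch (encoding) {
              case LEVEL -> 1;        // the code itself, as a number
              case DUMMY -> k - 1;    // indicators for levels 1..k-1; level 0 is the baseline
              case ONE_HOT -> k;      // one indicator per level
          };
          int offset = bias ? 1 : 0;
          double[][] x = new double[codes.length][offset + width];
          for (int i = 0; i < codes.length; i++) {
              int c = codes[i];
              if (c < 0 || c >= k) {
                  throw new IllegalArgumentException("Code out of range: " + c);
              }
              if (bias) {
                  x[i][0] = 1.0;
              }
              switch (encoding) {
                  case LEVEL -> x[i][offset] = c;
                  case DUMMY -> {
                      if (c > 0) x[i][offset + c - 1] = 1.0;
                  }
                  case ONE_HOT -> x[i][offset + c] = 1.0;
              }
          }
          return x;
      }
  
      public static void main(String[] args) {
          double[][] x = encode(new int[] {0, 1, 2}, 3, true, Encoding.DUMMY);
          System.out.println(Arrays.deepToString(x));
          // [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
      }
  }
  ```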
- toMatrix
  
  public Matrix toMatrix()
  
  Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls are encoded as Double.NaN. No bias term is added, and categorical variables use level encoding.
  
  Returns:
  
  the numeric matrix.
- toMatrix
  
  public Matrix toMatrix(boolean bias, CategoricalEncoder encoder, String rowNames)
  
  Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls are encoded as Double.NaN.
  
  Parameters:
  
  bias - if true, add a first column of all 1's (the bias term).
  
  encoder - the categorical variable encoder.
  
  rowNames - the column to be used as row names.
  
  Returns:
  
  the numeric matrix.
- describe
  
  public DataFrame describe()
  
  Returns the data structure and statistics.
  
  Returns:
  
  the data structure and statistics.
- head
  
  public String head(int numRows)
  
  Returns the string representation of the top rows.
  
  Parameters:
  
  numRows - the number of rows to show.
  
  Returns:
  
  the string representation of the top rows.
- tail
  
  public String tail(int numRows)
  
  Returns the string representation of the bottom rows.
  
  Parameters:
  
  numRows - the number of rows to show.
  
  Returns:
  
  the string representation of the bottom rows.
- toString
  
  public String toString(int from, int to, boolean truncate)
  
  Returns the string representation of rows in the specified range.
  
  Parameters:
  
  from - the initial index of the range to show, inclusive.
  
  to - the final index of the range to show, exclusive.
  
  truncate - whether to truncate long strings and right-align cells.
  
  Returns:
  
  the string representation of rows in specified range.
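  
  The range [from, to) is half-open, so row `to` itself is not shown. The sketch below lists the row positions such a range covers. It also gives one plausible mapping of head and tail onto it, clipped to the rows that exist. RowRange and its methods are hypothetical; whether Smile's head and tail delegate this way is an assumption.
  
  ```java
  import java.util.Arrays;
  import java.util.stream.IntStream;
  
  // Sketch of half-open row ranges [from, to) and a clipped head/tail mapping.
  final class RowRange {
      static int[] rows(int from, int to) {
          return IntStream.range(from, to).toArray();
      }
  
      static int[] head(int nrow, int numRows) {
          return rows(0, Math.min(numRows, nrow));
      }
  
      static int[] tail(int nrow, int numRows) {
          return rows(Math.max(0, nrow - numRows), nrow);
      }
  
      public static void main(String[] args) {
          System.out.println(Arrays.toString(rows(2, 5)));   // [2, 3, 4]
          System.out.println(Arrays.toString(head(10, 3))); // [0, 1, 2]
          System.out.println(Arrays.toString(tail(10, 3))); // [7, 8, 9]
      }
  }
  ```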
- of
  
  public static DataFrame of(double[][] data, String... names)
  
  Creates a DataFrame from a 2-dimensional array.
  
  Parameters:
  
  data - the data array.
  
  names - the column names.
  
  Returns:
  
  the data frame.
- of
  
  public static DataFrame of(float[][] data, String... names)
  
  Creates a DataFrame from a 2-dimensional array.
  
  Parameters:
  
  data - the data array.
  
  names - the column names.
  
  Returns:
  
  the data frame.
- of
  
  public static DataFrame of(int[][] data, String... names)
  
  Creates a DataFrame from a 2-dimensional array.
  
  Parameters:
  
  data - the data array.
  
  names - the column names.
  
  Returns:
  
  the data frame.
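  
  The 2-dimensional array factories take row-major input, where data[i][j] is row i and column j. A column-oriented data frame stores that data transposed. The sketch below performs the transposition into named columns. FromRows is hypothetical. Requiring exactly one name per column is an assumption; Smile's handling of an empty names argument is not covered.
  
  ```java
  import java.util.Arrays;
  import java.util.LinkedHashMap;
  import java.util.Map;
  
  // Sketch: converts row-major input into named columns, in the order of names.
  final class FromRows {
      static Map<String, double[]> of(double[][] data, String... names) {
          for (int i = 0; i < data.length; i++) {
              if (data[i].length != names.length) {
                  throw new IllegalArgumentException(
                      "Row " + i + " has " + data[i].length + " values, expected " + names.length);
              }
          }
          // LinkedHashMap keeps the columns in the order of names.
          Map<String, double[]> columns = new LinkedHashMap<>();
          for (int j = 0; j < names.length; j++) {
              double[] column = new double[data.length];
              for (int i = 0; i < data.length; i++) {
                  column[i] = data[i][j];
              }
              columns.put(names[j], column);
          }
          return columns;
      }
  
      public static void main(String[] args) {
          Map<String, double[]> columns = of(new double[][] {{1, 2}, {3, 4}, {5, 6}}, "x", "y");
          columns.forEach((name, values) -> System.out.println(name + " " + Arrays.toString(values)));
          // x [1.0, 3.0, 5.0]
          // y [2.0, 4.0, 6.0]
      }
  }
  ```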
- of
  
  public static <T> DataFrame of(Class<T> clazz, List<T> data)
  
  Creates a DataFrame from a collection of objects.
  
  Type Parameters:
  
  T - The data type of elements.
  
  Parameters:
  
  clazz - The class type of elements.
  
  data - The data collection.
  
  Returns:
  
  the data frame.
- of
  
  public static DataFrame of(StructType schema, Stream<? extends Tuple> data)
  
  Creates a DataFrame from a stream of tuples.
  
  Parameters:
  
  schema - the schema of data frame.
  
  data - the data stream.
  
  Returns:
  
  the data frame.
- of
  
  public static DataFrame of(StructType schema, List<? extends Tuple> data)
  
  Creates a DataFrame from a list of tuples.
  
  Parameters:
  
  schema - The schema of the tuples.
  
  data - The data collection.
  
  Returns:
  
  the data frame.
- of
  
  public static DataFrame of(ResultSet rs) throws SQLException
  
  Creates a DataFrame from a JDBC ResultSet.
  
  Parameters:
  
  rs - The JDBC result set.
  
  Returns:
  
  the data frame.
  
  Throws:
  
  SQLException - when JDBC operation fails.
- hashCode
  
  public final int hashCode()
  
  Returns a hash code value for this object. The value is derived from the hash code of each of the record components.
  
  Specified by:
  
  hashCode in class Record
  
  Returns:
  
  a hash code value for this object
- equals
  
  public final boolean equals(Object o)
  
  Indicates whether some other object is "equal to" this one. The objects are equal if the other object is of the same class and if all the record components are equal. All components in this record class are compared with Objects::equals(Object,Object).
  
  Specified by:
  
  equals in class Record
  
  Parameters:
  
  o - the object with which to compare
  
  Returns:
  
  true if this object is the same as the o argument; false otherwise.
- schema
  
  public StructType schema()
  
  Returns the value of the schema record component.
  
  Returns:
  
  the value of the schema record component
- columns
  
  public List<ValueVector> columns()
  
  Returns the value of the columns record component.
  
  Returns:
  
  the value of the columns record component
- index
  
  public RowIndex index()
  
  Returns the value of the index record component.
  
  Returns:
  
  the value of the index record component
