Package smile.data

Interface DataFrame

All Superinterfaces:
Iterable<Tuple>
All Known Implementing Classes:
IndexDataFrame

public interface DataFrame extends Iterable<Tuple>
An immutable collection of data organized into named columns.
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Interface
    Description
    static interface 
    Stream collectors.
  • Method Summary

    Modifier and Type
    Method
    Description
    default Tuple
    apply(int i)
    Returns the row at the specified index.
    default BaseVector
    apply(Enum<?> column)
    Selects column using an enum value.
    default BaseVector
    apply(String column)
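    Selects column based on the column name and returns it as a column vector.
    BooleanVector
    booleanVector(int i)
    Selects column based on the column index.
    default BooleanVector
    booleanVector(Enum<?> column)
    Selects column using an enum value.
    default BooleanVector
    booleanVector(String column)
    Selects column based on the column name.
    ByteVector
    byteVector(int i)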
    Selects column based on the column index.
    default ByteVector
    byteVector(Enum<?> column)
    Selects column using an enum value.
    default ByteVector
    byteVector(String column)
    Selects column based on the column name.
    CharVector
    charVector(int i)
    Selects column based on the column index.
    default CharVector
    charVector(Enum<?> column)
    Selects column using an enum value.
    default CharVector
    charVector(String column)
    Selects column based on the column name.
    BaseVector
    column(int i)
    Selects column based on the column index.
    default BaseVector
    column(Enum<?> column)
    Selects column using an enum value.
    default BaseVector
    column(String column)
    Selects column based on the column name.
    DoubleVector
    doubleVector(int i)
    Selects column based on the column index.
    default DoubleVector
    doubleVector(Enum<?> column)
    Selects column using an enum value.
    default DoubleVector
    doubleVector(String column)
    Selects column based on the column name.
    DataFrame
    drop(int... columns)
    Returns a new DataFrame without selected columns.
    default DataFrame
    drop(String... columns)
    Returns a new DataFrame without selected columns.
    default DataFrame
    factorize(String... columns)
    Returns a new DataFrame with given columns converted to nominal.
    default DataFrame
    fillna(double value)
    Replaces NaN/Inf values in floating-point columns with the specified value.
    FloatVector
    floatVector(int i)
    Selects column based on the column index.
    default FloatVector
    floatVector(Enum<?> column)
    Selects column using an enum value.
    default FloatVector
    floatVector(String column)
    Selects column based on the column name.
    Tuple
    get(int i)
    Returns the row at the specified index.
    default Object
    get(int i, int j)
    Returns the cell at (i, j).
    default Object
    get(int i, String column)
    Returns the cell at row i of the given column.
    default <T> T[]
    getArray(int i, int j)
    Returns the value at position (i, j) of array type.
    default <T> T[]
    getArray(int i, String column)
    Returns the field value of array type.
    default boolean
    getBoolean(int i, int j)
    Returns the value at position (i, j) as a primitive boolean.
    default boolean
    getBoolean(int i, String column)
    Returns the field value as a primitive boolean.
    default byte
    getByte(int i, int j)
    Returns the value at position (i, j) as a primitive byte.
    default byte
    getByte(int i, String column)
    Returns the field value as a primitive byte.
    default char
    getChar(int i, int j)
    Returns the value at position (i, j) as a primitive char.
    default char
    getChar(int i, String column)
    Returns the field value as a primitive char.
    default LocalDate
    getDate(int i, int j)
    Returns the value at position (i, j) of date type as java.time.LocalDate.
    default LocalDate
    getDate(int i, String column)
    Returns the field value of date type as java.time.LocalDate.
    default LocalDateTime
    getDateTime(int i, int j)
    Returns the value at position (i, j) as java.time.LocalDateTime.
    default LocalDateTime
    getDateTime(int i, String column)
    Returns the field value as java.time.LocalDateTime.
    default BigDecimal
    getDecimal(int i, int j)
    Returns the value at position (i, j) of decimal type as java.math.BigDecimal.
    default BigDecimal
    getDecimal(int i, String column)
    Returns the field value of decimal type as java.math.BigDecimal.
    default double
    getDouble(int i, int j)
    Returns the value at position (i, j) as a primitive double.
    default double
    getDouble(int i, String column)
    Returns the field value as a primitive double.
    default float
    getFloat(int i, int j)
    Returns the value at position (i, j) as a primitive float.
    default float
    getFloat(int i, String column)
    Returns the field value as a primitive float.
    default int
    getInt(int i, int j)
    Returns the value at position (i, j) as a primitive int.
    default int
    getInt(int i, String column)
    Returns the field value as a primitive int.
    default long
    getLong(int i, int j)
    Returns the value at position (i, j) as a primitive long.
    default long
    getLong(int i, String column)
    Returns the field value as a primitive long.
    default String
    getScale(int i, int j)
    Returns the value at position (i, j) of NominalScale or OrdinalScale.
    default String
    getScale(int i, String column)
    Returns the field value of NominalScale or OrdinalScale.
    default short
    getShort(int i, int j)
    Returns the value at position (i, j) as a primitive short.
    default short
    getShort(int i, String column)
    Returns the field value as a primitive short.
    default String
    getString(int i, int j)
    Returns the value at position (i, j) as a String object.
    default String
    getString(int i, String column)
    Returns the field value as a String object.
    default Tuple
    getStruct(int i, int j)
    Returns the value at position (i, j) of struct type.
    default Tuple
    getStruct(int i, String column)
    Returns the field value of struct type.
    default LocalTime
    getTime(int i, int j)
    Returns the value at position (i, j) of time type as java.time.LocalTime.
    default LocalTime
    getTime(int i, String column)
    Returns the field value of time type as java.time.LocalTime.
    int
    indexOf(String column)
    Returns the index of a given column name.
    IntVector
    intVector(int i)
    Selects column based on the column index.
    default IntVector
    intVector(Enum<?> column)
    Selects column using an enum value.
    default IntVector
    intVector(String column)
    Selects column based on the column name.
    default boolean
    isEmpty()
    Returns true if the data frame is empty.
    default boolean
    isNullAt(int i, int j)
    Checks whether the value at position (i, j) is null.
    default boolean
    isNullAt(int i, String column)
    Checks whether the field value is null.
    LongVector
    longVector(int i)
    Selects column based on the column index.
    default LongVector
    longVector(Enum<?> column)
    Selects column using an enum value.
    default LongVector
    longVector(String column)
    Selects column based on the column name.
    default Measure[]
    measures()
    Returns the columns' levels of measurement.
    DataFrame
    merge(DataFrame... dataframes)
    Merges data frames horizontally by columns.
    DataFrame
    merge(BaseVector... vectors)
    Merges vectors with this data frame.
    default String[]
    names()
    Returns the column names.
    int
    ncol()
    Returns the number of columns.
    default int
    nrow()
    Returns the number of rows.
    default DataFrame
    of(boolean... index)
    Returns a new data frame with boolean indexing.
    static DataFrame
    of(double[][] data, String... names)
    Creates a DataFrame from a 2-dimensional array.
    static DataFrame
    of(float[][] data, String... names)
    Creates a DataFrame from a 2-dimensional array.
    default DataFrame
    of(int... index)
    Returns a new data frame with row indexing.
    static DataFrame
    of(int[][] data, String... names)
    Creates a DataFrame from a 2-dimensional array.
    static DataFrame
    Creates a DataFrame from a JDBC ResultSet.
    static <T> DataFrame
    of(Collection<Map<String,T>> data, StructType schema)
    Creates a DataFrame from a set of Maps.
    static DataFrame
    of(List<? extends Tuple> data)
    Creates a DataFrame from a set of tuples.
    static DataFrame
    of(List<? extends Tuple> data, StructType schema)
    Creates a DataFrame from a set of tuples.
    static <T> DataFrame
    of(List<T> data, Class<T> clazz)
    Creates a DataFrame from a collection.
    static DataFrame
    of(Stream<? extends Tuple> data)
    Creates a DataFrame from a stream of tuples.
    static DataFrame
    of(Stream<? extends Tuple> data, StructType schema)
    Creates a DataFrame from a stream of tuples.
    static DataFrame
    of(BaseVector... vectors)
    Creates a DataFrame from a set of vectors.
    default DataFrame
    omitNullRows()
    Returns a new data frame without rows that have null/missing values.
    StructType
    schema()
    Returns the schema of DataFrame.
    DataFrame
    select(int... columns)
    Returns a new DataFrame with selected columns.
    default DataFrame
    select(String... columns)
    Returns a new DataFrame with selected columns.
    ShortVector
    shortVector(int i)
    Selects column based on the column index.
    default ShortVector
    shortVector(Enum<?> column)
    Selects column using an enum value.
    default ShortVector
    shortVector(String column)
    Selects column based on the column name.
    int
    size()
    Returns the number of rows.
    default DataFrame
    slice(int from, int to)
    Copies the specified range into a new data frame.
    Stream<Tuple>
    stream()
    Returns a (possibly parallel) Stream of rows.
    StringVector
    stringVector(int i)
    Selects column based on the column index.
    default StringVector
    stringVector(Enum<?> column)
    Selects column using an enum value.
    default StringVector
    stringVector(String column)
    Selects column based on the column name.
    default DataFrame
    structure()
    Returns the structure of data frame.
    default DataFrame
    Returns the statistic summary of numeric columns.
    default double[][]
    toArray(boolean bias, CategoricalEncoder encoder, String... columns)
    Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix.
    default double[][]
    toArray(String... columns)
    Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix.
    default List<Tuple>
    toList()
    Returns the List of rows.
    default Matrix
    Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix.
    default Matrix
    toMatrix(boolean bias, CategoricalEncoder encoder, String rowNames)
    Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix.
    default String
    toString(int numRows)
    Returns the string representation of top rows.
    default String
    toString(int numRows, boolean truncate)
    Returns the string representation of top rows.
    default String
    toString(int i, int j)
    Returns the string representation of the value at position (i, j).
    default String
    toString(int i, String column)
    Returns the string representation of the field value.
    default String[][]
    toStrings(int numRows)
    Returns the string representation of top rows.
    default String[][]
    toStrings(int numRows, boolean truncate)
    Returns the string representation of top rows.
    default DataType[]
    types()
    Returns the column data types.
    DataFrame
    union(DataFrame... dataframes)
    Unions data frames vertically by rows.
    <T> Vector<T>
    vector(int i)
    Selects column based on the column index.
    default <T> Vector<T>
    vector(Enum<?> column)
    Selects column using an enum value.
    default <T> Vector<T>
    vector(String column)
    Selects column based on the column name.

    Methods inherited from interface java.lang.Iterable

    forEach, iterator, spliterator
  • Method Details

    • schema

      StructType schema()
      Returns the schema of DataFrame.
      Returns:
      the schema.
    • names

      default String[] names()
      Returns the column names.
      Returns:
      the column names.
    • types

      default DataType[] types()
      Returns the column data types.
      Returns:
      the column data types.
    • measures

      default Measure[] measures()
      Returns the columns' levels of measurement.
      Returns:
      the columns' levels of measurement.
    • size

      int size()
      Returns the number of rows.
      Returns:
      the number of rows.
    • isEmpty

      default boolean isEmpty()
      Returns true if the data frame is empty.
      Returns:
      true if the data frame is empty.
    • get

      Tuple get(int i)
      Returns the row at the specified index.
      Parameters:
      i - the row index.
      Returns:
      the i-th row.
    • apply

      default Tuple apply(int i)
      Returns the row at the specified index. For Scala's convenience.
      Parameters:
      i - the row index.
      Returns:
      the i-th row.
    • stream

      Stream<Tuple> stream()
      Returns a (possibly parallel) Stream of rows.
      Returns:
      a (possibly parallel) Stream of rows.
    • toList

      default List<Tuple> toList()
      Returns the List of rows.
      Returns:
      the List of rows.
    • nrow

      default int nrow()
      Returns the number of rows.
      Returns:
      the number of rows.
    • ncol

      int ncol()
      Returns the number of columns.
      Returns:
      the number of columns.
    • structure

      default DataFrame structure()
      Returns the structure of data frame.
      Returns:
      the structure of data frame.
    • omitNullRows

      default DataFrame omitNullRows()
      Returns a new data frame without rows that have null/missing values.
      Returns:
      the data frame without nulls.
    • fillna

      default DataFrame fillna(double value)
      Replaces NaN/Inf values in floating-point columns with the specified value.
      Parameters:
      value - the value that replaces NaN/Inf entries.
      Returns:
      this data frame.
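
As a rough illustration of what replacing NaN/Inf values means for one floating-point column, the sketch below works on a plain double[] using only the JDK. The names FillNaSketch and fillNonFinite are hypothetical, and this is not Smile's implementation; it only mirrors the documented contract of modifying the data and returning the same object.

```java
import java.util.Arrays;

// Hypothetical helper: models NaN/Inf replacement on a single column.
class FillNaSketch {
    // Overwrites every NaN or infinite entry in place and returns the same array,
    // analogous to fillna returning "this data frame".
    static double[] fillNonFinite(double[] column, double value) {
        for (int i = 0; i < column.length; i++) {
            if (!Double.isFinite(column[i])) {
                column[i] = value;
            }
        }
        return column;
    }

    public static void main(String[] args) {
        double[] x = {1.0, Double.NaN, Double.POSITIVE_INFINITY, 4.0};
        System.out.println(Arrays.toString(fillNonFinite(x, 0.0)));
        // prints [1.0, 0.0, 0.0, 4.0]
    }
}
```

Because the array is modified in place, callers that need the original values should copy the data first.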
    • get

      default Object get(int i, int j)
      Returns the cell at (i, j).
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
    • get

      default Object get(int i, String column)
      Returns the cell at row i of the given column.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
    • of

      default DataFrame of(int... index)
      Returns a new data frame with row indexing.
      Parameters:
      index - the row indices.
      Returns:
      the data frame of selected rows.
    • of

      default DataFrame of(boolean... index)
      Returns a new data frame with boolean indexing.
      Parameters:
      index - the boolean index.
      Returns:
      the data frame of selected rows.
    • slice

      default DataFrame slice(int from, int to)
      Copies the specified range into a new data frame.
      Parameters:
      from - the initial index of the range to be copied, inclusive.
      to - the final index of the range to be copied, exclusive.
      Returns:
      the data frame of selected range of rows.
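
The three row-selection styles above (row indexing, boolean indexing, and slicing) can be modeled with plain JDK collections. The sketch below is only an illustration under stated assumptions: RowSelectionSketch and its methods are hypothetical, rows are represented as strings, and the boolean mask is assumed to have one entry per row. It does not reproduce Smile's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical JDK-only model of row indexing, boolean indexing, and slicing.
class RowSelectionSketch {
    // Row indexing: picks rows in the order given by index.
    static List<String> byIndex(List<String> rows, int... index) {
        List<String> out = new ArrayList<>();
        for (int i : index) {
            out.add(rows.get(i));
        }
        return out;
    }

    // Boolean indexing: keeps row i exactly when index[i] is true.
    // Assumption: the mask has one entry per row.
    static List<String> byMask(List<String> rows, boolean... index) {
        if (index.length != rows.size()) {
            throw new IllegalArgumentException("mask length must equal row count");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < index.length; i++) {
            if (index[i]) {
                out.add(rows.get(i));
            }
        }
        return out;
    }

    // Slicing: copies rows from 'from' (inclusive) up to 'to' (exclusive).
    static List<String> slice(List<String> rows, int from, int to) {
        return new ArrayList<>(rows.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("r0", "r1", "r2", "r3");
        System.out.println(byIndex(rows, 3, 0));                    // prints [r3, r0]
        System.out.println(byMask(rows, true, false, true, false)); // prints [r0, r2]
        System.out.println(slice(rows, 1, 3));                      // prints [r1, r2]
    }
}
```

Note that slice(rows, 1, 3) yields two rows, not three, because the upper bound is exclusive.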
    • isNullAt

      default boolean isNullAt(int i, int j)
      Checks whether the value at position (i, j) is null.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      true if the cell value is null.
    • isNullAt

      default boolean isNullAt(int i, String column)
      Checks whether the field value is null.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      true if the cell value is null.
    • getBoolean

      default boolean getBoolean(int i, int j)
      Returns the value at position (i, j) as a primitive boolean.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getBoolean

      default boolean getBoolean(int i, String column)
      Returns the field value as a primitive boolean.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
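
The two exceptions listed for the primitive getters are what plain Java produces when a boxed cell is cast and then unboxed. The JDK-only sketch below demonstrates this, assuming cells are held as Object references; asBoolean is a hypothetical stand-in and not Smile's code.

```java
// Hypothetical stand-in for a primitive getter over a boxed cell value.
class PrimitiveGetterSketch {
    // Casts the cell to Boolean, then auto-unboxes it.
    // A non-Boolean cell fails the cast (ClassCastException);
    // a null cell passes the cast but fails unboxing (NullPointerException).
    static boolean asBoolean(Object cell) {
        return (Boolean) cell;
    }

    public static void main(String[] args) {
        System.out.println(asBoolean(Boolean.TRUE)); // prints true

        try {
            asBoolean("yes");
        } catch (ClassCastException e) {
            System.out.println("type mismatch");
        }

        try {
            asBoolean(null);
        } catch (NullPointerException e) {
            System.out.println("null cell");
        }
    }
}
```

In practice this suggests checking isNullAt(i, j) before calling a primitive getter on a column that may contain missing values.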
    • getChar

      default char getChar(int i, int j)
      Returns the value at position (i, j) as a primitive char.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getChar

      default char getChar(int i, String column)
      Returns the field value as a primitive char.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getByte

      default byte getByte(int i, int j)
      Returns the value at position (i, j) as a primitive byte.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getByte

      default byte getByte(int i, String column)
      Returns the field value as a primitive byte.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getShort

      default short getShort(int i, int j)
      Returns the value at position (i, j) as a primitive short.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getShort

      default short getShort(int i, String column)
      Returns the field value as a primitive short.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getInt

      default int getInt(int i, int j)
      Returns the value at position (i, j) as a primitive int.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getInt

      default int getInt(int i, String column)
      Returns the field value as a primitive int.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getLong

      default long getLong(int i, int j)
      Returns the value at position (i, j) as a primitive long.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getLong

      default long getLong(int i, String column)
      Returns the field value as a primitive long.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getFloat

      default float getFloat(int i, int j)
      Returns the value at position (i, j) as a primitive float. Throws an exception if the type mismatches or if the value is null.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getFloat

      default float getFloat(int i, String column)
      Returns the field value as a primitive float. Throws an exception if the type mismatches or if the value is null.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getDouble

      default double getDouble(int i, int j)
      Returns the value at position (i, j) as a primitive double.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getDouble

      default double getDouble(int i, String column)
      Returns the field value as a primitive double.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
      NullPointerException - when value is null.
    • getString

      default String getString(int i, int j)
      Returns the value at position (i, j) as a String object.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getString

      default String getString(int i, String column)
      Returns the field value as a String object.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • toString

      default String toString(int i, int j)
      Returns the string representation of the value at position (i, j).
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the string representation of cell value.
    • toString

      default String toString(int i, String column)
      Returns the string representation of the field value.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the string representation of cell value.
    • getDecimal

      default BigDecimal getDecimal(int i, int j)
      Returns the value at position (i, j) of decimal type as java.math.BigDecimal.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getDecimal

      default BigDecimal getDecimal(int i, String column)
      Returns the field value of decimal type as java.math.BigDecimal.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getDate

      default LocalDate getDate(int i, int j)
      Returns the value at position (i, j) of date type as java.time.LocalDate.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getDate

      default LocalDate getDate(int i, String column)
      Returns the field value of date type as java.time.LocalDate.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getTime

      default LocalTime getTime(int i, int j)
      Returns the value at position (i, j) of time type as java.time.LocalTime.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getTime

      default LocalTime getTime(int i, String column)
      Returns the field value of time type as java.time.LocalTime.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getDateTime

      default LocalDateTime getDateTime(int i, int j)
      Returns the value at position (i, j) as java.time.LocalDateTime.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getDateTime

      default LocalDateTime getDateTime(int i, String column)
      Returns the field value as java.time.LocalDateTime.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getScale

      default String getScale(int i, int j)
      Returns the value at position (i, j) of NominalScale or OrdinalScale.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell scale.
      Throws:
      ClassCastException - when the data is not nominal or ordinal.
    • getScale

      default String getScale(int i, String column)
      Returns the field value of NominalScale or OrdinalScale.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell scale.
      Throws:
      ClassCastException - when the data is not nominal or ordinal.
    • getArray

      default <T> T[] getArray(int i, int j)
      Returns the value at position (i, j) of array type.
      Type Parameters:
      T - the data type of array elements.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getArray

      default <T> T[] getArray(int i, String column)
      Returns the field value of array type.
      Type Parameters:
      T - the data type of array elements.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getStruct

      default Tuple getStruct(int i, int j)
      Returns the value at position (i, j) of struct type.
      Parameters:
      i - the row index.
      j - the column index.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • getStruct

      default Tuple getStruct(int i, String column)
      Returns the field value of struct type.
      Parameters:
      i - the row index.
      column - the column name.
      Returns:
      the cell value.
      Throws:
      ClassCastException - when data type does not match.
    • indexOf

      int indexOf(String column)
      Returns the index of a given column name.
      Parameters:
      column - the column name.
      Returns:
      the index of column.
      Throws:
      IllegalArgumentException - when no column with the given name exists.
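
A name-to-index lookup with this contract can be sketched with a hash map. The class ColumnIndexSketch below is hypothetical and uses only the JDK; how Smile stores its column names internally is not stated here, so treat this as an illustration of the documented behavior only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a column-name lookup that rejects unknown names.
class ColumnIndexSketch {
    private final Map<String, Integer> index = new HashMap<>();

    ColumnIndexSketch(String... names) {
        for (int j = 0; j < names.length; j++) {
            index.put(names[j], j);
        }
    }

    // Returns the position of the named column, or throws if it is unknown.
    int indexOf(String column) {
        Integer j = index.get(column);
        if (j == null) {
            throw new IllegalArgumentException("Column " + column + " does not exist");
        }
        return j;
    }

    public static void main(String[] args) {
        ColumnIndexSketch schema = new ColumnIndexSketch("age", "height", "weight");
        System.out.println(schema.indexOf("height")); // prints 1
    }
}
```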
    • apply

      default BaseVector apply(String column)
      Selects column based on the column name and returns it as a column vector.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • apply

      default BaseVector apply(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • column

      BaseVector column(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • column

      default BaseVector column(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • column

      default BaseVector column(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • vector

      <T> Vector<T> vector(int i)
      Selects column based on the column index.
      Type Parameters:
      T - the data type of column.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • vector

      default <T> Vector<T> vector(String column)
      Selects column based on the column name.
      Type Parameters:
      T - the data type of column.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • vector

      default <T> Vector<T> vector(Enum<?> column)
      Selects column using an enum value.
      Type Parameters:
      T - the data type of column.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • booleanVector

      BooleanVector booleanVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • booleanVector

      default BooleanVector booleanVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • booleanVector

      default BooleanVector booleanVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • charVector

      CharVector charVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • charVector

      default CharVector charVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • charVector

      default CharVector charVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • byteVector

      ByteVector byteVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • byteVector

      default ByteVector byteVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • byteVector

      default ByteVector byteVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • shortVector

      ShortVector shortVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • shortVector

      default ShortVector shortVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • shortVector

      default ShortVector shortVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the field enum.
      Returns:
      the column vector.
    • intVector

      IntVector intVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • intVector

      default IntVector intVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • intVector

      default IntVector intVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the enum value that names the column.
      Returns:
      the column vector.
    • longVector

      LongVector longVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • longVector

      default LongVector longVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • longVector

      default LongVector longVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the enum value that names the column.
      Returns:
      the column vector.
    • floatVector

      FloatVector floatVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • floatVector

      default FloatVector floatVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • floatVector

      default FloatVector floatVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the enum value that names the column.
      Returns:
      the column vector.
    • doubleVector

      DoubleVector doubleVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • doubleVector

      default DoubleVector doubleVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • doubleVector

      default DoubleVector doubleVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the enum value that names the column.
      Returns:
      the column vector.
    • stringVector

      StringVector stringVector(int i)
      Selects column based on the column index.
      Parameters:
      i - the column index.
      Returns:
      the column vector.
    • stringVector

      default StringVector stringVector(String column)
      Selects column based on the column name.
      Parameters:
      column - the column name.
      Returns:
      the column vector.
    • stringVector

      default StringVector stringVector(Enum<?> column)
      Selects column using an enum value.
      Parameters:
      column - the enum value that names the column.
      Returns:
      the column vector.
    • select

      DataFrame select(int... columns)
      Returns a new DataFrame with selected columns.
      Parameters:
      columns - the column indices.
      Returns:
      a new DataFrame with selected columns.
    • select

      default DataFrame select(String... columns)
      Returns a new DataFrame with selected columns.
      Parameters:
      columns - the column names.
      Returns:
      a new DataFrame with selected columns.
    • drop

      DataFrame drop(int... columns)
      Returns a new DataFrame without selected columns.
      Parameters:
      columns - the column indices.
      Returns:
      a new DataFrame without selected columns.
    • drop

      default DataFrame drop(String... columns)
      Returns a new DataFrame without selected columns.
      Parameters:
      columns - the column names.
      Returns:
      a new DataFrame without selected columns.
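
      The select and drop overloads above can be pictured with a small plain-Java sketch. It does not use the library; it models a data frame as a row-major double[][] and assumes that select keeps the columns in the order they are listed and that drop keeps the remaining columns in their original order. The class and method names (SelectDropSketch, select, drop) are hypothetical and exist only for illustration.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Illustrative model only: a "table" is a row-major double[][]; this is not the library's implementation.
final class SelectDropSketch {
    /** Returns a new table holding only the listed columns, in the listed order. */
    static double[][] select(double[][] data, int... columns) {
        double[][] out = new double[data.length][columns.length];
        for (int r = 0; r < data.length; r++) {
            for (int j = 0; j < columns.length; j++) {
                out[r][j] = data[r][columns[j]];
            }
        }
        return out;
    }

    /** Returns a new table with every column except the listed ones, in original order. */
    static double[][] drop(double[][] data, int... columns) {
        int ncol = data.length == 0 ? 0 : data[0].length;
        boolean[] dropped = new boolean[ncol];
        for (int c : columns) {
            dropped[c] = true;
        }
        int[] keep = IntStream.range(0, ncol).filter(c -> !dropped[c]).toArray();
        return select(data, keep);
    }

    public static void main(String[] args) {
        double[][] data = {{1, 2, 3}, {4, 5, 6}};
        System.out.println(Arrays.deepToString(select(data, 2, 0))); // [[3.0, 1.0], [6.0, 4.0]]
        System.out.println(Arrays.deepToString(drop(data, 1)));      // [[1.0, 3.0], [4.0, 6.0]]
    }
}
```

      Both helpers allocate a new array and leave the input untouched, mirroring the "returns a new DataFrame" wording above.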
    • merge

      DataFrame merge(DataFrame... dataframes)
      Merges data frames horizontally by columns.
      Parameters:
      dataframes - the data frames to merge.
      Returns:
      a new data frame that combines this DataFrame with one or more other DataFrames by columns.
    • merge

      DataFrame merge(BaseVector... vectors)
      Merges vectors with this data frame.
      Parameters:
      vectors - the vectors to merge.
      Returns:
      a new data frame that combines this DataFrame with one or more additional vectors.
    • union

      DataFrame union(DataFrame... dataframes)
      Unions data frames vertically by rows.
      Parameters:
      dataframes - the data frames to union.
      Returns:
      a new data frame that combines all the rows.
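
      The difference between merging by columns and unioning by rows can be sketched in plain Java. The model below is not the library's code: it treats each data frame as a row-major double[][], and it assumes that a horizontal merge requires equal row counts and that a vertical union appends rows with the same column layout. MergeUnionSketch and its methods are hypothetical names.

```java
import java.util.Arrays;

// Illustrative model only: a "table" is a row-major double[][].
final class MergeUnionSketch {
    /** Horizontal merge: columns of right are placed beside the columns of left. */
    static double[][] merge(double[][] left, double[][] right) {
        if (left.length != right.length) {
            throw new IllegalArgumentException("row counts differ");
        }
        double[][] out = new double[left.length][];
        for (int r = 0; r < left.length; r++) {
            out[r] = new double[left[r].length + right[r].length];
            System.arraycopy(left[r], 0, out[r], 0, left[r].length);
            System.arraycopy(right[r], 0, out[r], left[r].length, right[r].length);
        }
        return out;
    }

    /** Vertical union: rows of bottom are appended after the rows of top. */
    static double[][] union(double[][] top, double[][] bottom) {
        double[][] out = new double[top.length + bottom.length][];
        for (int r = 0; r < top.length; r++) {
            out[r] = top[r].clone();
        }
        for (int r = 0; r < bottom.length; r++) {
            out[top.length + r] = bottom[r].clone();
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5}, {6}};
        System.out.println(Arrays.deepToString(merge(a, b))); // [[1.0, 2.0, 5.0], [3.0, 4.0, 6.0]]
        System.out.println(Arrays.deepToString(union(new double[][]{{1, 2}}, new double[][]{{3, 4}})));
        // [[1.0, 2.0], [3.0, 4.0]]
    }
}
```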
    • factorize

      default DataFrame factorize(String... columns)
      Returns a new DataFrame with the given columns converted to nominal.
      Parameters:
      columns - the column names. If empty, all object columns in the data frame will be converted.
      Returns:
      a new DataFrame.
    • toArray

      default double[][] toArray(String... columns)
      Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls will be encoded as Double.NaN. No bias term is added, and categorical variables use level encoding.
      Parameters:
      columns - the columns to export. If empty, all columns will be used.
      Returns:
      the numeric array.
    • toArray

      default double[][] toArray(boolean bias, CategoricalEncoder encoder, String... columns)
      Returns an array obtained by converting the columns in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls will be encoded as Double.NaN.
      Parameters:
      bias - if true, add the first column of all 1's.
      encoder - the categorical variable encoder.
      columns - the columns to export. If empty, all columns will be used.
      Returns:
      the numeric array.
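
      The conversion rules described for toArray (nulls become Double.NaN, an optional leading column of 1's, level encoding of categories) can be illustrated with a plain-Java sketch. It is not the library's implementation. In particular, the level order used here (index of first appearance) is an assumption made for the example; the real encoder derives levels from the column's measure. ToArraySketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only; not the library's conversion code.
final class ToArraySketch {
    /** Converts nullable numeric cells to doubles; nulls become NaN, with an optional leading column of 1's. */
    static double[][] toArray(boolean bias, Double[][] rows) {
        int offset = bias ? 1 : 0;
        double[][] out = new double[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            out[r] = new double[rows[r].length + offset];
            if (bias) {
                out[r][0] = 1.0;
            }
            for (int c = 0; c < rows[r].length; c++) {
                Double v = rows[r][c];
                out[r][c + offset] = (v == null) ? Double.NaN : v;
            }
        }
        return out;
    }

    /** Level encoding: each category maps to one integer code (assumed here: order of first appearance). */
    static double[] levelEncode(String[] categories) {
        List<String> levels = new ArrayList<>();
        double[] codes = new double[categories.length];
        for (int i = 0; i < categories.length; i++) {
            if (categories[i] == null) {
                codes[i] = Double.NaN; // missing category
                continue;
            }
            int level = levels.indexOf(categories[i]);
            if (level < 0) {
                levels.add(categories[i]);
                level = levels.size() - 1;
            }
            codes[i] = level;
        }
        return codes;
    }

    public static void main(String[] args) {
        double[][] a = toArray(true, new Double[][]{{2.0, null}});
        System.out.println(Arrays.toString(a[0])); // [1.0, 2.0, NaN]
        System.out.println(Arrays.toString(levelEncode(new String[]{"red", "blue", "red"}))); // [0.0, 1.0, 0.0]
    }
}
```

      Level encoding keeps one numeric column per categorical variable, which is why the no-argument style overload can return a matrix with the same number of columns as the data frame.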
    • toMatrix

      default Matrix toMatrix()
      Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls will be encoded as Double.NaN. No bias term is added, and categorical variables use level encoding.
      Returns:
      the numeric matrix.
    • toMatrix

      default Matrix toMatrix(boolean bias, CategoricalEncoder encoder, String rowNames)
      Returns a matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix. Missing values/nulls will be encoded as Double.NaN.
      Parameters:
      bias - if true, add the first column of all 1's.
      encoder - the categorical variable encoder.
      rowNames - the column to be used as row names.
      Returns:
      the numeric matrix.
    • summary

      default DataFrame summary()
      Returns the statistical summary of numeric columns.
      Returns:
      the statistical summary of numeric columns.
    • toString

      default String toString(int numRows)
      Returns the string representation of top rows.
      Parameters:
      numRows - the number of rows to show.
      Returns:
      the string representation of top rows.
    • toString

      default String toString(int numRows, boolean truncate)
      Returns the string representation of top rows.
      Parameters:
      numRows - the number of rows to show.
      truncate - whether to truncate long strings and right-align cells.
      Returns:
      the string representation of top rows.
    • toStrings

      default String[][] toStrings(int numRows)
      Returns the string representation of top rows.
      Parameters:
      numRows - the number of rows to show.
      Returns:
      the string representation of top rows.
    • toStrings

      default String[][] toStrings(int numRows, boolean truncate)
      Returns the string representation of top rows.
      Parameters:
      numRows - the number of rows to show.
      truncate - whether to truncate long strings.
      Returns:
      the string representation of top rows.
    • of

      static DataFrame of(BaseVector... vectors)
      Creates a DataFrame from a set of vectors.
      Parameters:
      vectors - The column vectors.
      Returns:
      the data frame.
    • of

      static DataFrame of(double[][] data, String... names)
      Creates a DataFrame from a 2-dimensional array.
      Parameters:
      data - The data array.
      names - the column names.
      Returns:
      the data frame.
    • of

      static DataFrame of(float[][] data, String... names)
      Creates a DataFrame from a 2-dimensional array.
      Parameters:
      data - The data array.
      names - the column names.
      Returns:
      the data frame.
    • of

      static DataFrame of(int[][] data, String... names)
      Creates a DataFrame from a 2-dimensional array.
      Parameters:
      data - The data array.
      names - the column names.
      Returns:
      the data frame.
    • of

      static <T> DataFrame of(List<T> data, Class<T> clazz)
      Creates a DataFrame from a collection.
      Type Parameters:
      T - The data type of elements.
      Parameters:
      data - The data collection.
      clazz - The class type of elements.
      Returns:
      the data frame.
    • of

      static DataFrame of(Stream<? extends Tuple> data)
      Creates a DataFrame from a stream of tuples.
      Parameters:
      data - The data stream.
      Returns:
      the data frame.
    • of

      static DataFrame of(Stream<? extends Tuple> data, StructType schema)
      Creates a DataFrame from a stream of tuples.
      Parameters:
      data - The data stream.
      schema - The schema of the tuples.
      Returns:
      the data frame.
    • of

      static DataFrame of(List<? extends Tuple> data)
      Creates a DataFrame from a list of tuples.
      Parameters:
      data - The data collection.
      Returns:
      the data frame.
    • of

      static DataFrame of(List<? extends Tuple> data, StructType schema)
      Creates a DataFrame from a list of tuples.
      Parameters:
      data - The data collection.
      schema - The schema of the tuples.
      Returns:
      the data frame.
    • of

      static <T> DataFrame of(Collection<Map<String,T>> data, StructType schema)
      Creates a DataFrame from a collection of maps.
      Type Parameters:
      T - The data type of elements.
      Parameters:
      data - The data collection.
      schema - The schema of data.
      Returns:
      the data frame.
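
      Building rows from a collection of maps under a schema can be pictured as follows. This plain-Java sketch does not call the library: a List<String> of field names stands in for the StructType schema, and a key that is absent from a map is assumed to produce a null cell. That missing-key behavior is an assumption for illustration, not something the text above confirms. MapsToRowsSketch and toRows are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative model only: a list of field names stands in for the schema.
final class MapsToRowsSketch {
    /** Lays out each map as one row, with cells ordered by the schema's field order. */
    static <T> Object[][] toRows(Collection<Map<String, T>> data, List<String> fields) {
        Object[][] rows = new Object[data.size()][fields.size()];
        int r = 0;
        for (Map<String, T> record : data) {
            for (int c = 0; c < fields.size(); c++) {
                rows[r][c] = record.get(fields.get(c)); // null when the key is absent (assumed)
            }
            r++;
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<String, Object> first = new HashMap<>();
        first.put("name", "a");
        first.put("age", 30);
        Map<String, Object> second = new HashMap<>();
        second.put("name", "b");

        List<Map<String, Object>> data = new ArrayList<>();
        data.add(first);
        data.add(second);

        Object[][] rows = toRows(data, Arrays.asList("name", "age"));
        System.out.println(Arrays.deepToString(rows)); // [[a, 30], [b, null]]
    }
}
```

      The schema, not the map, fixes the column order, so maps with differently ordered keys still line up in the same columns.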
    • of

      static DataFrame of(ResultSet rs) throws SQLException
      Creates a DataFrame from a JDBC ResultSet.
      Parameters:
      rs - The JDBC result set.
      Returns:
      the data frame.
      Throws:
      SQLException - when JDBC operation fails.