Interface Read
-
Method Summary
Modifier and Type | Method | Description
static DataFrame | arff | Reads an ARFF file.
static DataFrame | arff | Reads an ARFF file.
static DataFrame | arrow | Reads an Apache Arrow file.
static DataFrame | arrow | Reads an Apache Arrow file.
static DataFrame | avro(String path, InputStream schema) | Reads an Apache Avro file.
static DataFrame | avro | Reads an Apache Avro file.
static DataFrame | avro(Path path, InputStream schema) | Reads an Apache Avro file.
static DataFrame | avro | Reads an Apache Avro file.
static DataFrame | csv | Reads a CSV file.
static DataFrame | csv | Reads a CSV file.
static DataFrame | csv | Reads a CSV file.
static DataFrame | csv(String path, org.apache.commons.csv.CSVFormat format, StructType schema) | Reads a CSV file.
static DataFrame | csv | Reads a CSV file.
static DataFrame | csv | Reads a CSV file.
static DataFrame | csv(Path path, org.apache.commons.csv.CSVFormat format, StructType schema) | Reads a CSV file.
static DataFrame | data | Reads a data file.
static DataFrame | data | Reads a data file.
static DataFrame | json | Reads a JSON file.
static DataFrame | json(String path, JSON.Mode mode, StructType schema) | Reads a JSON file.
static DataFrame | json | Reads a JSON file.
static DataFrame | json(Path path, JSON.Mode mode, StructType schema) | Reads a JSON file.
static SparseDataset<Integer> | libsvm(BufferedReader reader) | Reads a libsvm sparse dataset.
static SparseDataset<Integer> | libsvm | Reads a libsvm sparse dataset.
static SparseDataset<Integer> | libsvm | Reads a libsvm sparse dataset.
static Object | object | Reads a serialized object from a file.
static DataFrame | parquet | Reads an Apache Parquet file.
static DataFrame | parquet | Reads an Apache Parquet file.
static DataFrame | sas | Reads a SAS7BDAT file.
static DataFrame | sas | Reads a SAS7BDAT file.
-
Method Details
-
object
Reads a serialized object from a file.
- Parameters:
path - the file path.
- Returns:
the serialized object.
- Throws:
IOException - if the stream cannot be read.
ClassNotFoundException - if the class cannot be loaded.
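The two exceptions listed above are the same ones raised by standard Java serialization. The following is a minimal sketch of a write and read round trip using only the JDK, on the assumption that the file was produced with `ObjectOutputStream`; the source does not state how `object` is implemented. The class `ObjectRoundTrip` and its helpers `write` and `read` are hypothetical names used for illustration.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the library documented here.
class ObjectRoundTrip {
    /** Writes a serializable object to a file with standard Java serialization. */
    static void write(Serializable obj, Path path) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(path))) {
            out.writeObject(obj);
        }
    }

    /** Reads the object back; the caller casts it to the expected type. */
    static Object read(Path path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(path))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("model", ".ser");
        try {
            ArrayList<Integer> original = new ArrayList<>(List.of(1, 2, 3));
            write(original, tmp);
            Object restored = read(tmp);
            System.out.println(original.equals(restored));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Because the return type is `Object`, the caller is responsible for casting the result, and the class of the stored object must be on the classpath when the file is read.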
-
data
Reads a data file. Infers the data format from the file name extension.
- Parameters:
path - the input file path.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
ParseException - if the file cannot be parsed.
URISyntaxException - if the file path syntax is invalid.
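Inference from the file name extension can be pictured with the sketch below. It is not the library's code: the class `FormatGuess`, its methods, and in particular the mapping from extensions to reader names are assumptions made for illustration, using only the reader names that appear on this page.

```java
import java.util.Locale;

// Hypothetical helper; the extension-to-reader mapping is an assumption.
class FormatGuess {
    /** Returns the lower-cased extension of a file name, or "" if there is none. */
    static String extensionOf(String path) {
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        int dot = path.lastIndexOf('.');
        // A dot before the last separator belongs to a directory name, and a dot
        // that starts the file name marks a hidden file rather than an extension.
        if (dot <= slash + 1 || dot == path.length() - 1) {
            return "";
        }
        return path.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Maps a path to the name of a reader method listed on this page. */
    static String readerFor(String path) {
        switch (extensionOf(path)) {
            case "csv": return "csv";
            case "json": return "json";
            case "arff": return "arff";
            case "avro": return "avro";
            case "parquet": return "parquet";
            case "sas7bdat": return "sas";
            default:
                throw new IllegalArgumentException("Cannot infer format of: " + path);
        }
    }

    public static void main(String[] args) {
        System.out.println(readerFor("data/iris.arff"));
    }
}
```

When the extension is missing or ambiguous, calling the format-specific reader directly avoids relying on inference.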
-
data
static DataFrame data(String path, String format) throws IOException, URISyntaxException, ParseException
Reads a data file. Infers the data format from the file name extension.
- Parameters:
path - the input file path.
format - the optional file format specification. For CSV files, it is a list of key-value pairs such as delimiter=\t,header=true,comment=#,escape=\,quote=". For JSON files, it is the file mode (single-line or multi-line). For Avro files, it is the path to the schema file.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
ParseException - if the file cannot be parsed.
URISyntaxException - if the file path syntax is invalid.
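To make the shape of the CSV format specification concrete, here is a minimal sketch that splits such a string into key-value pairs. It is an illustration under stated assumptions, not the library's parser: the class `FormatSpec` and its `parse` method are hypothetical, values are kept as raw text (so `\t` stays a backslash followed by `t`), and a naive split on commas cannot represent a comma as a value.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that illustrates the key=value,key=value layout.
class FormatSpec {
    /** Splits "k1=v1,k2=v2" into an ordered map; values are not unescaped. */
    static Map<String, String> parse(String spec) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String pair : spec.split(",")) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + pair);
            }
            // Only the first '=' separates key from value.
            options.put(pair.substring(0, eq).trim(), pair.substring(eq + 1));
        }
        return options;
    }

    public static void main(String[] args) {
        String spec = "delimiter=\\t,header=true,comment=#,escape=\\,quote=\"";
        parse(spec).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```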
-
csv
Reads a CSV file.
- Parameters:
path - the input file path.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
URISyntaxException - if the file path syntax is invalid.
-
csv
Reads a CSV file.
- Parameters:
path - the input file path.
format - the format specification in key-value pairs such as delimiter=\t,header=true,comment=#,escape=\,quote=".
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
URISyntaxException - if the file path syntax is invalid.
-
csv
static DataFrame csv(String path, org.apache.commons.csv.CSVFormat format) throws IOException, URISyntaxException
Reads a CSV file.
- Parameters:
path - the input file path.
format - the CSV file format.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
URISyntaxException - if the file path syntax is invalid.
-
csv
static DataFrame csv(String path, org.apache.commons.csv.CSVFormat format, StructType schema) throws IOException, URISyntaxException
Reads a CSV file.
- Parameters:
path - the input file path.
format - the CSV file format.
schema - the data schema.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
URISyntaxException - if the file path syntax is invalid.
-
csv
Reads a CSV file.
- Parameters:
path - the input file path.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
-
csv
Reads a CSV file.
- Parameters:
path - the input file path.
format - the CSV file format.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
-
csv
static DataFrame csv(Path path, org.apache.commons.csv.CSVFormat format, StructType schema) throws IOException
Reads a CSV file.
- Parameters:
path - the input file path.
format - the CSV file format.
schema - the data schema.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
-
json
Reads a JSON file.
- Parameters:
path - the input file path.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
URISyntaxException - if the file path syntax is invalid.
-
json
static DataFrame json(String path, JSON.Mode mode, StructType schema) throws IOException, URISyntaxException
Reads a JSON file.
- Parameters:
path - the input file path.
mode - the file mode (single-line or multi-line).
schema - the data schema.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
URISyntaxException - if the file path syntax is invalid.
-
json
Reads a JSON file.
- Parameters:
path - the input file path.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
-
json
Reads a JSON file.
- Parameters:
path - the input file path.
mode - the file mode (single-line or multi-line).
schema - the data schema.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
-
arff
Reads an ARFF file. Weka ARFF (attribute relation file format) is an ASCII text file format that is essentially a CSV file with a header that describes the metadata. ARFF was developed for use in the Weka machine learning software.
A dataset is first described, beginning with the name of the dataset (the relation, in ARFF terminology). Each variable (attribute, in ARFF terminology) used to describe the observations is then identified, together with its data type, each definition on a single line. The actual observations are then listed, each on a single line, with fields separated by commas, much like a CSV file.
Missing values in an ARFF dataset are identified using the question mark '?'.
Comments can be included in the file, introduced at the beginning of a line with a '%', whereby the remainder of the line is ignored.
A significant advantage of the ARFF data file over the CSV data file is the metadata information.
Also, the ability to include comments ensures that we can record extra information about the dataset, including how it was derived, where it came from, and how it might be cited.
- Parameters:
path - the input file path.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
ParseException - if the file cannot be parsed.
URISyntaxException - if the file path syntax is invalid.
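The layout described above (relation name, attribute definitions, then comma-separated observations, with '%' comments and '?' for missing values) can be illustrated with a small reader. This is a minimal sketch under simplifying assumptions, not the library's parser: the class `ArffSketch` is hypothetical, attribute types are ignored, and quoted names, quoted values, and sparse rows are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified ARFF reader for illustration only.
class ArffSketch {
    String relation;
    final List<String> attributes = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    static ArffSketch parse(String text) {
        ArffSketch arff = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            // Blank lines and lines that begin with '%' carry no data.
            if (line.isEmpty() || line.startsWith("%")) continue;
            String lower = line.toLowerCase(Locale.ROOT);
            if (inData) {
                String[] fields = line.split(",", -1);
                for (int i = 0; i < fields.length; i++) {
                    String field = fields[i].trim();
                    fields[i] = field.equals("?") ? null : field; // '?' marks a missing value
                }
                arff.rows.add(fields);
            } else if (lower.startsWith("@relation")) {
                arff.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // "@attribute <name> <type>": only the name is kept in this sketch.
                String[] parts = line.split("\\s+", 3);
                if (parts.length < 3) {
                    throw new IllegalArgumentException("Malformed attribute: " + line);
                }
                arff.attributes.add(parts[1]);
            } else if (lower.startsWith("@data")) {
                inData = true;
            }
        }
        return arff;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "% Weather observations, recorded by hand",
            "@relation weather",
            "@attribute temperature numeric",
            "@attribute outlook {sunny,rainy}",
            "@data",
            "21.5,sunny",
            "?,rainy");
        ArffSketch arff = parse(text);
        System.out.println(arff.relation + " " + arff.attributes + " rows=" + arff.rows.size());
    }
}
```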
-
arff
Reads an ARFF file. Weka ARFF (attribute relation file format) is an ASCII text file format that is essentially a CSV file with a header that describes the metadata. ARFF was developed for use in the Weka machine learning software.
A dataset is first described, beginning with the name of the dataset (the relation, in ARFF terminology). Each variable (attribute, in ARFF terminology) used to describe the observations is then identified, together with its data type, each definition on a single line. The actual observations are then listed, each on a single line, with fields separated by commas, much like a CSV file.
Missing values in an ARFF dataset are identified using the question mark '?'.
Comments can be included in the file, introduced at the beginning of a line with a '%', whereby the remainder of the line is ignored.
A significant advantage of the ARFF data file over the CSV data file is the metadata information.
Also, the ability to include comments ensures that we can record extra information about the dataset, including how it was derived, where it came from, and how it might be cited.
- Parameters:
path - the input file path.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
ParseException - if the file cannot be parsed.
-
sas
Reads a SAS7BDAT file.
- Parameters:
path - the input file path.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
URISyntaxException - if the file path syntax is invalid.
-
sas
Reads a SAS7BDAT file.
- Parameters:
path - the input file path.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
-
arrow
Reads an Apache Arrow file. Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware.
- Parameters:
path - the input file path.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
URISyntaxException - if the file path syntax is invalid.
-
arrow
Reads an Apache Arrow file. Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware.
- Parameters:
path - the input file path.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
-
avro
Reads an Apache Avro file.
- Parameters:
path - the input file path.
schema - the input stream of the data schema.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
URISyntaxException - if the file path syntax is invalid.
-
avro
Reads an Apache Avro file.
- Parameters:
path - the input file path.
schema - the path to the data schema file.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
URISyntaxException - if the file path syntax is invalid.
-
avro
Reads an Apache Avro file.
- Parameters:
path - the input file path.
schema - the input stream of the data schema.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
-
avro
Reads an Apache Avro file.
- Parameters:
path - the input file path.
schema - the path to the data schema file.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
-
parquet
Reads an Apache Parquet file.
- Parameters:
path - the input file path.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
URISyntaxException - if the file path syntax is invalid.
-
parquet
Reads an Apache Parquet file.
- Parameters:
path - the input file path.
- Returns:
the data frame.
- Throws:
IOException - if the file cannot be read.
-
libsvm
Reads a libsvm sparse dataset. The format of a libsvm file is:
<label> <index1>:<value1> <index2>:<value2> ...
Here label is the target value of the training data. For classification, it should be an integer that identifies a class (multi-class classification is supported). For regression, it is any real number. For one-class SVM, it is not used and can be any number. Each index is an integer starting from 1, and each value is a real number. The indices must be in ascending order. The labels in the testing data file are only used to calculate accuracy or error. If they are unknown, fill this column with any number.
- Parameters:
path - the input file path.
- Returns:
the sparse dataset.
- Throws:
IOException - if the file cannot be read.
URISyntaxException - if the file path syntax is invalid.
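The line format described above can be illustrated with a small parser for a single line. This is a minimal sketch, not the library's reader: the class `LibsvmLine` and its fields are hypothetical, and the label is held as a `double` so that the same sketch covers both classification and regression labels.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for one parsed line; not part of the library documented here.
class LibsvmLine {
    final double label;
    final Map<Integer, Double> features; // 1-based index -> value, in file order

    LibsvmLine(double label, Map<Integer, Double> features) {
        this.label = label;
        this.features = features;
    }

    /** Parses "<label> <index1>:<value1> <index2>:<value2> ...". */
    static LibsvmLine parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        double label = Double.parseDouble(tokens[0]);
        Map<Integer, Double> features = new LinkedHashMap<>();
        int previous = 0;
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected index:value but got: " + tokens[i]);
            }
            int index = Integer.parseInt(tokens[i].substring(0, colon));
            // Indices start at 1 and must be strictly ascending within a line.
            if (index <= previous) {
                throw new IllegalArgumentException("Indices must be ascending from 1: " + line);
            }
            features.put(index, Double.parseDouble(tokens[i].substring(colon + 1)));
            previous = index;
        }
        return new LibsvmLine(label, features);
    }

    public static void main(String[] args) {
        LibsvmLine row = parse("+1 1:0.5 3:2 7:-1.25");
        System.out.println(row.label + " " + row.features);
    }
}
```

Only the listed indices are stored; every other feature is implicitly zero, which is what makes the format suitable for sparse data.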
-
libsvm
Reads a libsvm sparse dataset. The format of a libsvm file is:
<label> <index1>:<value1> <index2>:<value2> ...
Here label is the target value of the training data. For classification, it should be an integer that identifies a class (multi-class classification is supported). For regression, it is any real number. For one-class SVM, it is not used and can be any number. Each index is an integer starting from 1, and each value is a real number. The indices must be in ascending order. The labels in the testing data file are only used to calculate accuracy or error. If they are unknown, fill this column with any number.
- Parameters:
path - the input file path.
- Returns:
the sparse dataset.
- Throws:
IOException - if the file cannot be read.
-
libsvm
Reads a libsvm sparse dataset. The format of a libsvm file is:
<label> <index1>:<value1> <index2>:<value2> ...
Here label is the target value of the training data. For classification, it should be an integer that identifies a class (multi-class classification is supported). For regression, it is any real number. For one-class SVM, it is not used and can be any number. Each index is an integer starting from 1, and each value is a real number. The indices must be in ascending order. The labels in the testing data file are only used to calculate accuracy or error. If they are unknown, fill this column with any number.
- Parameters:
reader - the file reader.
- Returns:
the sparse dataset.
- Throws:
IOException - if the file cannot be read.
-