Create a list of nanoparquet options.
Usage
parquet_options(
  class = getOption("nanoparquet.class", "tbl"),
  compression_level = getOption("nanoparquet.compression_level", NA_integer_),
  keep_row_groups = FALSE,
  num_rows_per_row_group = getOption("nanoparquet.num_rows_per_row_group", 10000000L),
  use_arrow_metadata = getOption("nanoparquet.use_arrow_metadata", TRUE),
  write_arrow_metadata = getOption("nanoparquet.write_arrow_metadata", TRUE),
  write_data_page_version = getOption("nanoparquet.write_data_page_version", 1L),
  write_minmax_values = getOption("nanoparquet.write_minmax_values", TRUE)
)
Arguments
- class
- The extra class or classes to add to data frames created in read_parquet(). By default nanoparquet adds the "tbl" class, so data frames are printed differently if the pillar package is loaded.
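For example, to make read_parquet() return data frames that print like tibbles, add both classes (a sketch; "data.parquet" is a placeholder file name):
# prints like a tibble if the pillar package is loaded
read_parquet(
  "data.parquet",
  options = parquet_options(class = c("tbl_df", "tbl"))
)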
- compression_level
- The compression level in write_parquet(). NA is the default, and it selects the default compression level of each method. Inf always selects the highest possible compression level. More details:
  - Snappy does not support compression levels currently.
  - GZIP supports levels from 0 (uncompressed) and 1 (fastest) to 9 (best). The default is 6.
  - ZSTD currently allows positive levels up to 22. Levels 20 and above require more memory. Negative levels are also allowed; the lower the level, the faster the speed, at the cost of compression. Currently the smallest level is -131072. The default level is 3.
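A sketch of selecting a higher ZSTD level, assuming write_parquet()'s compression argument is used to pick the method:
# trade write speed for a smaller file with ZSTD level 10
write_parquet(
  mtcars, "mtcars.parquet",
  compression = "zstd",
  options = parquet_options(compression_level = 10)
)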
 
- keep_row_groups
- This option is used when appending to a Parquet file with append_parquet(). If TRUE, then the existing row groups of the file are always kept as is and nanoparquet creates new row groups for the new data. If FALSE (the default), then the last row group of the file will be overwritten if it is smaller than the default row group size, i.e. num_rows_per_row_group.
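For example, to append without touching the existing row groups (a sketch; new_rows and "data.parquet" are placeholders):
# keep existing row groups intact; new rows go into fresh row groups
append_parquet(
  new_rows, "data.parquet",
  options = parquet_options(keep_row_groups = TRUE)
)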
- num_rows_per_row_group
- The number of rows to put into a row group, if row groups are not specified explicitly. It should be an integer scalar. Defaults to 10 million. 
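A sketch of writing smaller row groups, e.g. so that readers can skip more data using row group statistics (big_df is a placeholder data frame):
# one million rows per row group instead of the ten million default
write_parquet(
  big_df, "big.parquet",
  options = parquet_options(num_rows_per_row_group = 1000000L)
)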
- use_arrow_metadata
- TRUE or FALSE. If TRUE, then read_parquet() and read_parquet_schema() will make use of the Apache Arrow metadata to assign R classes to Parquet columns. This is currently used to detect factor columns and "difftime" columns. If this option is FALSE, then:
  - "factor" columns are read as character vectors.
  - "difftime" columns are read as real numbers, in one of seconds, milliseconds, microseconds or nanoseconds; it is impossible to tell which without the Arrow metadata. See the Examples section below for the effect of this option.
 
- write_arrow_metadata
- Whether to add the Apache Arrow types as metadata to the file in write_parquet().
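A sketch of turning the Arrow metadata off, e.g. to write a minimal file (d is a placeholder data frame):
# omit the Apache Arrow type metadata; factors will read back as strings
write_parquet(
  d, "no-arrow-metadata.parquet",
  options = parquet_options(write_arrow_metadata = FALSE)
)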
- write_data_page_version
- Data page version to write by default. Possible values are 1 and 2. The default is 1.
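A sketch of writing V2 data pages; note that some Parquet readers may not support them (d is a placeholder data frame):
# write V2 data pages instead of the default V1
write_parquet(
  d, "v2-pages.parquet",
  options = parquet_options(write_data_page_version = 2L)
)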
- write_minmax_values
- Whether to write minimum and maximum values per row group, for data types that support this, in write_parquet(). However, nanoparquet currently does not support minimum and maximum values for the DECIMAL, UUID and FLOAT16 logical types, nor for the BOOLEAN, BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY primitive types if they are written without a logical type. Currently the default is TRUE.
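A sketch of skipping the statistics, which makes writing slightly cheaper at the cost of reader-side row group filtering (d is a placeholder data frame):
# do not store per-row-group minimum/maximum statistics
write_parquet(
  d, "no-minmax.parquet",
  options = parquet_options(write_minmax_values = FALSE)
)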
Examples
if (FALSE) {
# the effect of using Arrow metadata
tmp <- tempfile(fileext = ".parquet")
d <- data.frame(
  fct = as.factor("a"),
  dft = as.difftime(10, units = "secs")
)
write_parquet(d, tmp)
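# with Arrow metadata: fct is read back as a factor, dft as a difftime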
read_parquet(tmp, options = parquet_options(use_arrow_metadata = TRUE))
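# without Arrow metadata: fct is a character vector, dft a plain number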
read_parquet(tmp, options = parquet_options(use_arrow_metadata = FALSE))
}