Changelog
Source: NEWS.md
nanoparquet 0.3.0
CRAN release: 2024-06-17
- read_parquet() type mapping changes:
  - The STRING logical type and the UTF8 converted type are still read as a character vector, but BYTE_ARRAY types without a converted or logical type no longer are; they are now read as a list of raw vectors. Missing values are indicated as NULL values.
  - The DECIMAL converted type is now read as a REALSXP even if its type is FIXED_LEN_BYTE_ARRAY (not just if it is BYTE_ARRAY).
  - The UUID logical type is now read as a character vector, formatted as 00112233-4455-6677-8899-aabbccddeeff.
  - BYTE_ARRAY and FIXED_LEN_BYTE_ARRAY types without logical or converted types, or with unsupported ones (FLOAT16, INTERVAL), are now read into a list of raw vectors. Missing values are denoted by NULL.
- write_parquet() now automatically uses dictionary encoding for columns that have many repeated values. Only the first 10k rows are used to decide if dictionary will be used or not. Similarly, logical columns are written in RLE encoding if they contain runs of repeated values. NA values are ignored when selecting the encoding (#18).
- write_parquet() can now write a data frame to a memory buffer, returned as a raw vector, if the special ":raw:" filename is used (#31).
- read_parquet() can now read Parquet files with V2 data pages (#37).
- Both read_parquet() and write_parquet() now support GZIP and ZSTD compressed Parquet files.
- read_parquet() now supports the RLE encoding for BOOLEAN columns and also supports the DELTA_BINARY_PACKED, DELTA_LENGTH_BYTE_ARRAY, DELTA_BYTE_ARRAY and BYTE_STREAM_SPLIT encodings.
- The parquet_columns() function is now called parquet_column_types() and it can now map the column types of a data frame to Parquet types.
- parquet_info(), parquet_metadata() and parquet_column_types() now work if the created_by metadata field is unset.
- New parquet_options() function that you can use to set nanoparquet options for a single read_parquet() or write_parquet() call.
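
As an illustration of the in-memory write and the new compression support listed above, here is a minimal sketch, not taken from the package documentation. It assumes that write_parquet() takes the data frame and the file name as its first two arguments and that the codec is chosen with a compression argument accepting "zstd"; the data frame and the temporary file are made up for the example.

```r
library(nanoparquet)

# Hypothetical example data.
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)

# The special ":raw:" file name asks write_parquet() to return the Parquet
# file as a raw vector instead of writing it to disk.
# Assumption: the codec is selected with `compression = "zstd"`.
buf <- write_parquet(df, ":raw:", compression = "zstd")

# Persist the buffer with base R and read it back from the file.
tmp <- tempfile(fileext = ".parquet")
writeBin(buf, tmp)
back <- read_parquet(tmp)
```

If the assumptions hold, buf is a raw vector and back contains the same id and name columns as df.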
nanoparquet 0.2.0
CRAN release: 2024-05-30
- First release on CRAN. It contains the Parquet reader from https://github.com/hannes/miniparquet, a Parquet writer, functions to read Parquet metadata, and many improvements.