Resolves the three findings the auto-check email surfaced for the 2026-05-06 archived 0.5.1 release.
tdc/src/api/decode_impl.c, surfaced through
read_rg_tdc_with_fp in vtr1_tdc.c): the
consolidated decode pipeline now always allocates scratch buffers with a
+16-byte wildcopy slack, so tdc_match_copy’s SIMD overshoot
stays within the allocation. The decode_ex.c variant that
was missing this slack on 0.5.1 is gone (folded into the shared
driver_decode_block_impl). The ASAN-under-vignettes
regression check is now part of the GitHub Actions sanitizer workflow so
a future drift would be caught locally instead of at CRAN’s BDR
memcheck.
src/r_bridge.c,
src/r_bridge_io.c, src/vtr1_tdc.c, and
src/collect.c: every Rf_getAttrib /
Rf_mkString result that crossed an allocating call
(R_alloc, Rf_warning,
Rf_setAttrib, Rf_asReal,
Rf_asInteger, parse_*) is now
PROTECTed and balanced with a matching
UNPROTECT. Touches apply_annotation,
C_write_vtr, C_write_vtr_tdc,
parse_quantize, and parse_spatial.
src/vec_omp.h and call sites: stop including
<omp.h> and forward-declare the three OpenMP runtime
functions vectra calls (omp_get_max_threads,
omp_get_thread_num, omp_in_parallel). clang
21’s bundled omp.h wrapper contains an unbalanced
#pragma omp end declare variant that breaks compilation of
block.c (and any other vectra TU that includes the wrapper)
under r-devel-linux-x86_64-debian-clang. The bug is in the wrapper
itself, so an #ifdef _OPENMP guard around
#include <omp.h> is not enough — when
-fopenmp is on the compile line, _OPENMP is
defined and the broken wrapper is pulled in. Skipping the wrapper avoids
the bug; the #pragma omp ... directives elsewhere in
src/ are still recognised and the runtime symbols resolve
at link time via libomp. Fixes the compilation error that
caused vectra 0.5.1 to be archived from CRAN.
.vec)
A new tiled raster format and accompanying API for larger-than-RAM gridded
data. Each tile is encoded as a self-describing tdc block (PRED_2D +
BYTE_SHUFFLE + LZ); decoding is parallel across tiles.
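These notes do not specify the BYTE_SHUFFLE stage further. As a rough illustration only, and assuming it resembles the byte-transposition filter used by other block codecs, the sketch below gathers byte k of every element into one contiguous plane so that a later LZ pass sees longer runs of similar bytes. The names byte_shuffle and byte_unshuffle are hypothetical and are not tdc's API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write byte k of every element into plane k of dst,
 * i.e. all first bytes, then all second bytes, and so on.
 * src and dst must each hold n_elems * elem_size bytes and must not overlap. */
void byte_shuffle(const uint8_t *src, uint8_t *dst,
                  size_t n_elems, size_t elem_size)
{
    assert(src != NULL && dst != NULL && elem_size > 0);
    for (size_t i = 0; i < n_elems; i++) {
        for (size_t k = 0; k < elem_size; k++) {
            dst[k * n_elems + i] = src[i * elem_size + k];
        }
    }
}

/* Inverse of byte_shuffle: rebuild each element from its byte planes. */
void byte_unshuffle(const uint8_t *src, uint8_t *dst,
                    size_t n_elems, size_t elem_size)
{
    assert(src != NULL && dst != NULL && elem_size > 0);
    for (size_t i = 0; i < n_elems; i++) {
        for (size_t k = 0; k < elem_size; k++) {
            dst[i * elem_size + k] = src[k * n_elems + i];
        }
    }
}
```

For smoothly varying pixels the high-order bytes of neighbouring values tend to repeat, so grouping them usually helps a byte-oriented LZ stage. Whether tdc applies the shuffle before or after its predictor is not stated here.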
vec_write_raster(x, path, ...): write a numeric matrix
or 3D (rows, cols, bands) array to .vec.
Storage dtypes: f64, f32,
i8/u8, i16/u16,
i32/u32, i64/u64.
compression controls per-tile codec probing —
"fast", "balanced", or "max"
(six-spec probe per tile). Decode cost is unchanged across levels
because each tile records its own codec spec.
vec_open_raster(path) /
vec_close_raster(r): lazy open returning a metadata +
handle list (vectra_raster). The handle is auto-finalized
on garbage collection.
vec_read_window(r, band, level, cols, rows): decode a
window of a chosen band, with overview-level support. Pixels outside the
raster come back as NA. Tile decode is parallelized across
worker threads (Phase 5a).
vec_extract_points(r, x, y): sample band values at
(x, y) points.
vec_build_overviews(path, levels, resampling): append
n_levels - 1 reduced-resolution copies in place. Resampling
kernels: "nearest", "average",
"bilinear", "mode", "gauss". The
file’s n_levels is updated atomically.
vec_to_tiff(path, output, compression): export
.vec level-0 pixels to GeoTIFF. Compression is
"none", "deflate", or "lzw"; LZW
also applies horizontal differencing (Predictor 2) for integer pixel
types, matching the layout libtiff/GDAL produce by default. Inherits
dtype, geotransform, EPSG, and nodata from the source.
vec_write_time_cube(x, times, path, layout, ...): write
a 4D (rows, cols, bands, time) array. Two layouts:
"image" (default): one tile per
(band, time, ty, tx) — optimal for “give me one full image
at time T” reads.
"pixel": one tile per (band, ty, tx)
holding the full time stack as [tw*th, n_time] — optimal
for “give me the time series at pixel (x, y)” reads.
vec_read_pixel_series(r, x, y, band): full time series
at a single pixel as a numeric vector. On pixel-major files this is one
tile decode; on image-major files the reader scans the index for
distinct time stamps and decodes one tile per stamp.
vec_read_time_slice(r, time, band, level, cols, rows):
read a single time slice as a matrix.
vec_raster_times(r, band, level): distinct time stamps,
in ascending order.
vec_raster_layout(r): query whether an open raster is
"image" or "pixel" layout.
print.vectra_raster(): prints dimensions, dtype,
geotransform, EPSG, nodata, and band names.
n_blocks_x = 1). Edge-block padding is handled in
block_stored_rows().
tiff_band_names(): parse
<Item role="description"> entries from
GDAL_METADATA (tag 42112). Pure-R scanner, no
xml2 dependency.
tiff_crs(path): read the EPSG code,
geographic-vs-projected flag, and citation string from the GeoKey
directory (tags 34735/34737).
write_tiff() gains tiled,
tile_size, bigtiff, and crs
arguments.
tiled = TRUE emits TIFF tags 322/323/324/325 in place
of strip tags. tile_size accepts a single integer (square)
or a length-2 c(w, h); both dimensions must be positive
multiples of 16. Default 256. Tiled output is the layout required for
Cloud-Optimized GeoTIFF.
bigtiff = "auto" (default) auto-promotes to BigTIFF
(magic 0x002B, 64-bit offsets) when the expected raw
payload exceeds the classic-TIFF 4 GB ceiling; TRUE forces
BigTIFF; FALSE forces classic TIFF. Tiled BigTIFF is not
yet supported.
crs accepts an integer EPSG code, an
"EPSG:xxxx" string, or a list with $epsg,
$geographic, and optional $citation. Outputs
round-trip through terra::rast() for 4326, 3857, and
31287.
collect() / block_array_gather:
empty-string slots now shortcut to R_BlankString.
Previously the gather paths called
Rf_mkCharLenCE(NULL, 0, ...) and the dedup cache called
memcmp(NULL, ...) when a batch happened to contain only
empty/NA strings, tripping UBSAN’s nonnull check even
though the length was zero.
*_push helpers (vec_buf_push,
vec_array_push, …) consolidated into a single
vec_grow_to growth primitive.
configure / configure.win: rewritten as
POSIX /bin/sh (previously #!/usr/bin/env bash
with set -o pipefail and [[ ... ]]). Bash is
not guaranteed on all CRAN build hosts.
src/window.c: the OpenMP task-parallel merge sort
helper was defined unconditionally but called only from
#ifdef _OPENMP branches, producing a clang
-Wunneeded-internal-declaration warning under Debian’s
no-OpenMP build. The definition now shares the guard.
tdc: all fprintf(stderr, ...)
debug/timing prints are routed through a TDC_LOG(...) macro
that is a no-op unless TDC_ENABLE_STDERR_LOG is defined at
build time, so the released .so contains no
stderr / fprintf symbols. Addresses the WRE
§1.6.4 policy forbidding compiled code from writing to
stdout/stderr.
collect(): fix use-after-free in the cross-batch
CHARSXP dedup cache. Each slot stored a raw pointer into the decoder’s
heap buffer, which is freed when the batch is consumed; the next batch’s
hash-collision memcmp then dereferenced freed memory.
Manifested as segfaults on the second consecutive collect()
of a large multi-rowgroup string-heavy .vtr (register,
backbones), more likely under the parallel reader where batches
accumulate before the serial consumer loop. Now verifies cache hits
against CHAR(sexp), which points into the still-alive
interned CHARSXP body.
tdc, a standalone
typed-dimensional compression library vendored into
src/tdc/. Encode and decode go through a self-describing
block record (model + transform chain + entropy) rather than per-column
tag constants. Deleted vtr_codec.c,
vtr_encodings.c, vtr_compress.c,
vtr1.c, and vtr_codec_internal.h.
.vtr on-disk format is a deliberate breaking
change: pre-0.5 files are not readable. write_vtr() and
append_vtr() write the new container; tbl()
reads only the new container.
tools/vendor_tdc.sh and
configure / configure.win pull the latest
upstream tdc on every install when the source checkout is
present; the pre-vendored copy is used otherwise.
tdc’s dictionary-encoded varlen output when it becomes a
hot spot.man/write_vtr.Rd: replaced a literal percent sign in
the compress argument description that produced malformed
Rd output on build.
write_vtr(), append_vtr() and
delete_vtr() now use MoveFileEx with a short
retry loop for the final temp-to-target swap. Previously, a preceding
tbl() read could leave the target file mmap’d pending GC,
and the swap would fail with a sharing violation.
vtr_schema(), link(), and
lookup() functions for star-schema workflows. Register a
fact table with named dimension links once, then pull columns from any
dimension without writing explicit joins. Only referenced dimensions are
scanned.
lookup() reports unmatched keys per dimension by
default, catching referential integrity issues before they propagate NAs
silently.
"left" (default) and "inner"
join modes, named keys for differing column names, and reusable schema
objects across multiple queries.
int64_t memory access in
vtr_codec.c (UBSAN). Dictionary encoding wrote and read
8-byte offsets through an unaligned pointer; delta decoding had the same
issue. All fixed with memcpy.
append_vtr(df, path): append a data.frame as a new row
group to an existing .vtr file. Existing row groups are
never rewritten.
delete_vtr(path, row_ids): logically delete rows by
0-based physical index. Writes a tombstone side file
(<path>.del); the .vtr file is never
modified. Deletions are cumulative and excluded automatically on the
next tbl() call.
diff_vtr(old_path, new_path, key_col): key-based
logical diff between two .vtr files. Returns a list with
added (a lazy vectra_node) and
deleted (a vector of key values). Implemented as a
single-pass C streaming engine with O(n_unique_keys) memory.
tolower(), toupper(),
trimws(): case conversion and whitespace trimming for
string columns in filter() and mutate().
levenshtein(x, y) /
levenshtein_norm(x, y): Levenshtein edit distance and
normalised variant (0–1). Supports column-vs-column and
column-vs-literal comparisons. Optional max_dist argument
for early termination.
dl_dist(x, y) / dl_dist_norm(x, y):
Damerau-Levenshtein distance (counts transpositions as cost 1) and
normalised variant.
jaro_winkler(x, y): Jaro-Winkler similarity (0–1,
higher = more similar). All string-similarity functions propagate
NA and work in filter() and
mutate().
resolve(fk, pk, value): scalar self-join that looks up
value where pk == fk within the same batch.
Useful for denormalising parent-child tables without a join.
propagate(parent_id, id, seed): tree-traversal
aggregation that propagates non-NA seed values down a
parent-child hierarchy until all reachable nodes are filled. Converges
in O(depth) passes.
.vtr format version 4 with a two-layer codec (no
external dependencies):
PLAIN (default), DICTIONARY
(string columns with < 50% unique values), DELTA
(monotonically increasing int64 columns).
LZ_VTR, ~120
lines of C). Applied after encoding; skipped for buffers < 64 bytes
or when compression does not reduce size. Files written with v4 are
typically 30–60% smaller than v3. tbl() reads v1–v4 files;
write_vtr() always writes v4.
.vtr v3 per-rowgroup min/max statistics to skip entire
row groups.
.vtr format version 3 with per-column per-rowgroup
statistics (min/max).
rank() and dense_rank()
(replaces O(n²) comparison-based).
summarise():
summarise(m = mean(x + y)) auto-inserts a hidden
mutate.
year(), month(), day(),
hour(), minute(), second():
date/time component extraction for Date and POSIXct columns.
as.Date() and as.POSIXct() literals in
filter expressions (e.g.
filter(date > as.Date("2020-01-01"))).
as.Date(string_col): convert ISO-format date strings to
Date values.
nchar(): returns string length as integer.
substr(x, start, stop): substring extraction (1-based,
like R).
grepl(pattern, x): fixed string matching (no
regex).
paste0(a, b): two-argument string concatenation.
gsub(pattern, replacement, x) / sub():
fixed-string replacement.
startsWith() / endsWith(): string
prefix/suffix matching.
pmin() / pmax(): element-wise
minimum/maximum.
log2(), log10(), sign(),
trunc(): additional math functions.
sd() and var(): sample standard deviation
and variance via Welford’s online algorithm. Returns NA for groups with
fewer than 2 values (R semantics).
first() and last(): first and last non-NA
value per group. Both support na.rm = TRUE.
slice_min() and slice_max() gain a working
with_ties parameter (default TRUE). Ties at
the boundary are now included by default; use
with_ties = FALSE for exactly n rows.
count() and tally() gain a working
sort parameter. sort = TRUE returns results in
descending order of the count column.
transmute() and reframe() now support
across().
distinct(.keep_all = TRUE) with a column subset now
emits a message when falling back to R.
glimpse(): preview column names, types, and first few
values without collecting the full result.
collect() now works on data.frames (no-op), so
slice_min(...) |> collect() works regardless of the
with_ties path.
vignette("quickstart").
@details sections added to filter(),
mutate(), summarise(), arrange(),
distinct(), count(), and join functions.
group_by() |> summarise() path for
spill-safe aggregation.
int64 <-> double) in
join keys and bind_rows().
rank() and dense_rank() window
functions.
.vtr format version 2 with per-column annotations.
write_vtr() / collect().
where() predicates work in select(),
rename(), relocate(), and
across().
vignette("engine")).
.vtr) with
multi-row-group support.
filter(),
select(), mutate(), transmute(),
rename(), relocate(), group_by(),
summarise(), count(), tally(),
distinct(), reframe(), arrange(),
slice_head(), slice_tail(),
slice_min(), slice_max(),
pull().
left_join(), inner_join(),
right_join(), full_join(),
semi_join(), anti_join().
bind_rows() and bind_cols() for combining
queries.
row_number(), lag(),
lead(), cumsum(), cummean(),
cummin(), cummax().
across() support in mutate() and
summarise().
explain() for inspecting the execution plan.
tidyselect integration for column selection
helpers.
.vtr, CSV, SQLite, GeoTIFF.
write_csv(), write_sqlite(),
write_tiff().
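The sd() and var() entry above cites Welford's online algorithm without showing it. The following is a minimal sketch of that general technique, not vectra's actual aggregation code. The type welford_t and the three functions are hypothetical names, and the NA rule for groups with fewer than 2 values is modelled as a 0 return value that a caller would map to NA.

```c
#include <assert.h>
#include <stddef.h>

/* Running state for one group: count, mean, and the sum of squared
 * deviations from the current mean (often called M2). */
typedef struct {
    size_t n;
    double mean;
    double m2;
} welford_t;

/* Reset the state for a new group. */
void welford_init(welford_t *w)
{
    assert(w != NULL);
    w->n = 0;
    w->mean = 0.0;
    w->m2 = 0.0;
}

/* Fold one non-NA value into the state in a single pass. */
void welford_push(welford_t *w, double x)
{
    assert(w != NULL);
    w->n += 1;
    double delta = x - w->mean;       /* deviation from the old mean */
    w->mean += delta / (double)w->n;  /* updated mean */
    w->m2 += delta * (x - w->mean);   /* uses old and new deviation */
}

/* Store the sample variance (n - 1 denominator) in *out and return 1.
 * Return 0 when fewer than 2 values were seen, leaving *out untouched;
 * the caller would report NA in that case. */
int welford_var(const welford_t *w, double *out)
{
    assert(w != NULL && out != NULL);
    if (w->n < 2) {
        return 0;
    }
    *out = w->m2 / (double)(w->n - 1);
    return 1;
}
```

The standard deviation is the square root of the value stored by welford_var. Because the state is updated one value at a time, an aggregation can consume batches without holding a whole group in memory.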