Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,8 @@

12. `print.data.table()` now truncates long character columns and list-column summaries by default to avoid horizontal console overflow, [#7718](https://github.com/Rdatatable/data.table/issues/7718). When `datatable.prettyprint.char` is `NULL` (the default), the truncation limit is now dynamically calculated based on the available console width. Use `options(datatable.prettyprint.char=Inf)` for the old default behavior (never truncate). Thanks @tdhock for the report and @venom1204 for the fix.

13. `rbindlist()` (and therefore the `rbind()` method for `data.table`s) no longer raises an error upon encountering more than approximately 50000 columns in a list entry, [#7793](https://github.com/Rdatatable/data.table/issues/7793). The bug was introduced in `data.table` version 1.18.2.1. Thanks to @rickhelmus for the report and @aitap for the fix.

### Notes

1. {data.table} now depends on R 3.5.0 (2018).
Expand Down
6 changes: 6 additions & 0 deletions inst/tests/tests.Rraw
Original file line number Diff line number Diff line change
Expand Up @@ -21669,3 +21669,9 @@ test(2375.3, print(data.table(x=c("short", "abcdefghijklmnopqrstuvwxyz"))), outp
test(2375.4, print(data.table(x="abcdefghijklmnopqrstuvwxyz")), output="abcdefghijklmnopqrstuvwxyz", options=list(width=200, datatable.prettyprint.char=NULL))
test(2375.5, print(data.table(id=1L, score=99.1, txt="abcdefghijklmnopqrstuvwxyz")), output="abcdefghijklmn...", options=list(width=20, datatable.prettyprint.char=NULL))
test(2375.6, print(data.table(x=rep("ABCDEFGHIJKLMNOPQRSTUVWXYZ", 1e6)), topn=1), output="1000000: ABCDEFGHIJKLM...", options=list(width=25, datatable.prettyprint.char=NULL))

# rbindlist() used to put O(ncol(...)) > R_PPStackSize items on the protect stack, #7793
# Assuming 50000 as the default protection stack size, although R --max-ppsize=... can increase it to 500000.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per ?Memory, max-ppsize can be set up to 500,000, and I'm not aware of a way to poll it (at a glance R_PPStackSize appears to be non-API)

Probably best to just point to ?Memory and mention here "this test assumes the default value of 50,000", which has been the default for 15+ years, but if it does change in the future, it would be good to know the reason 50,001 was chosen.

x = as.data.table(as.list(1:50001))
test(2376, rbindlist(list(x)), x)
rm(x)
14 changes: 9 additions & 5 deletions src/rbindlist.c
Original file line number Diff line number Diff line change
Expand Up @@ -250,7 +250,7 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg, SEXP ignor
setAttrib(ans, R_NamesSymbol, ansNames);
if (idcol) {
SET_STRING_ELT(ansNames, 0, STRING_ELT(idcolArg, 0));
SEXP idval, listNames=getAttrib(l, R_NamesSymbol);
SEXP idval, listNames=PROTECT(getAttrib(l, R_NamesSymbol));
if (length(listNames)) {
SET_VECTOR_ELT(ans, 0, idval=allocVector(STRSXP, nrow));
for (int i=0,ansloc=0; i<LENGTH(l); ++i) {
Expand All @@ -270,16 +270,19 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg, SEXP ignor
for (int k=0; k<thisnrow; ++k) idvald[ansloc++] = i+1;
}
}
UNPROTECT(1); // listNames
}

SEXP coercedForFactor = NULL;
PROTECT_INDEX IcoercedForFactor;
SEXP coercedForFactor = R_NilValue;
PROTECT_WITH_INDEX(coercedForFactor, &IcoercedForFactor); nprotect++;
for(int j=0; j<ncol; ++j) {
int maxType=LGLSXP; // initialize with LGLSXP for test 2002.3 which has col x NULL in both lists to be filled with NA for #1871
bool factor=false, orderedFactor=false; // ordered factor is class c("ordered","factor"). isFactor() is true when isOrdered() is true.
int longestLen=-1, longestW=-1, longestI=-1; // just for ordered factor; longestLen must be initialized as -1 so that rbind zero-length ordered factor could work #4795
PROTECT_INDEX ILongestLevels;
SEXP longestLevels=R_NilValue; // just for ordered factor
PROTECT_WITH_INDEX(longestLevels, &ILongestLevels); nprotect++;
PROTECT_WITH_INDEX(longestLevels, &ILongestLevels);
bool int64=false, date=false, posixct=false, itime=false, asis=false;
const char *foundName=NULL;
bool anyNotStringOrFactor=false;
Expand Down Expand Up @@ -349,7 +352,7 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg, SEXP ignor
if (factor && anyNotStringOrFactor) {
// in future warn, or use list column instead ... warning(_("Column %d contains a factor but not all items for the column are character or factor"), idcol+j+1);
// some coercing from (likely) integer/numeric to character will be needed. But this coerce can feasibly fail with out-of-memory, so we have to do it up-front
if (coercedForFactor==NULL) { coercedForFactor=PROTECT(allocVector(VECSXP, LENGTH(l))); nprotect++; }
if (coercedForFactor==R_NilValue) REPROTECT(coercedForFactor=allocVector(VECSXP, LENGTH(l)), IcoercedForFactor);
for (int i=0; i<LENGTH(l); ++i) {
SEXP li = VECTOR_ELT(l, i);
int w = usenames ? colMap[i*ncol + j] : (j<length(li) ? j : -1); // check if j exceeds length for fill=TRUE and usenames=FALSE #5444
Expand Down Expand Up @@ -563,7 +566,8 @@ SEXP rbindlist(SEXP l, SEXP usenamesArg, SEXP fillArg, SEXP idcolArg, SEXP ignor
ansloc += thisnrow;
}
}
UNPROTECT(1); // longestLevels
}
UNPROTECT(nprotect); // ans, ansNames, longestLevels? coercedForFactor?
UNPROTECT(nprotect); // ans, ansNames, coercedForFactor
return(ans);
}
Loading