Skip to content

Test output: buffer the result stream and drop the per-line flush (~6x faster file output)#15

Closed
antalvdb wants to merge 1 commit into
masterfrom
cli-output-no-per-line-flush
Closed

Test output: buffer the result stream and drop the per-line flush (~6x faster file output)#15
antalvdb wants to merge 1 commit into
masterfrom
cli-output-no-per-line-flush

Conversation

@antalvdb

Copy link
Copy Markdown
Member

Summary

Writing test results is dominated by write() syscalls. Profiling a batch test run shows ~82% of the time in __write_nocancel: the default std::ofstream buffer is tiny, so emitting one result per test instance becomes a flood of small writes, and show_results() additionally flushes the stream on every line via std::endl.

This:

  • gives outStream a 1 MB buffer (a per-experiment member, set with rdbuf()->pubsetbuf() before open()), so lines batch into a handful of large writes, and
  • writes '\n' instead of std::endl in show_results() so the per-line flush doesn't defeat the buffer.

The stream is flushed when it is closed at the end of testing, so the output is unchanged.

Measured impact

IGTree, 512k test instances written to a file (reused saved instance base):

wall time throughput instructions
before ~31 s ~17k inst/s ~430 B
after ~4.6 s ~111k inst/s ~36 B

6.5× faster wall-clock, ≈ 12× fewer instructions. Both changes matter: the buffer alone takes ~430 B → ~47 B (even with endl), and '\n' then takes ~47 B → ~36 B; either change alone with the default buffer makes no difference. As a side effect this shows the actual IGTree classification is only ~17 B instructions for 512k (~33k each) — the rest was write-syscall overhead.

Correctness

Output is byte-identical and complete — verified on IGTree (512k lines) and TRIBL2 (20k), both plain and with +v db (distributions printed).

Notes

  • pubsetbuf must precede open() to take effect; this is honoured by both libc++ and libstdc++.
  • The rarer neighbour-output path (NS_Test) still uses endl; only the common show_results() path is changed here.

Posting for your consideration — happy to adjust (e.g. buffer size, or making it configurable).

🤖 Generated with Claude Code

Writing test results was dominated by write() syscalls: the default
std::ofstream buffer is tiny, so one result per test instance turned into a
flood of small writes, and show_results() additionally flushed the stream on
every line via std::endl.

Give outStream a 1 MB buffer (a per-experiment member, set with
rdbuf()->pubsetbuf() before open()) and write '\n' instead of std::endl in
show_results(). The stream is flushed when it is closed at the end of testing,
so output is unchanged.

Measured (IGTree, 512k test instances written to a file; reused saved base):
  - wall time:    ~31 s -> ~4.6 s  (~17k -> ~111k instances/s, ~6.5x)
  - instructions: ~430 B -> ~36 B  (~12x)

Output is byte-identical (verified IGTree 512k and TRIBL2, plain and +v db).
pubsetbuf must precede open(); honoured by both libc++ and libstdc++.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@kosloot

kosloot commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

merged into the develop branch

@kosloot kosloot closed this Jun 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants