[Analyzer] rda plugin #2457

Open

EagleoutIce wants to merge 28 commits into main from 2096-rda-plugin

Conversation

EagleoutIce (Member) commented Apr 23, 2026

  • Add a new built-in processor for load
  • In the processor:
    • Run sanity checks (not 1 arg, ...)
    • Identify the loaded file (similar to builtin:source)
    • Load and parse the file with the new logic
    • Apply the definitions (as sent via dc)
    • Return the new information
  • Add new tests (overwrite etc.)
  • Apply the "only if file exists" control dependency
  • Try to distinguish variables from functions
  • Add flowR main config options to avoid reading these load files

=> If the file cannot be found or parsed, or should not be loaded, treat the call as an unknown side effect.
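The processor steps and the unknown-side-effect fallback can be sketched as a single function. This is a hypothetical illustration, not flowR's API: `modelLoadCall`, `LoadOutcome`, `ParsedRObject` and the injected `resolveFile`/`parseRda` callbacks are invented names. The only external facts used are R's SEXPTYPE codes for callable objects (CLOSXP = 3, SPECIALSXP = 7, BUILTINSXP = 8).

```typescript
// Hypothetical sketch of modelling load(); all names are illustrative.

// R SEXPTYPE codes of callable objects (closures, specials, builtins).
const CLOSXP = 3;
const SPECIALSXP = 7;
const BUILTINSXP = 8;

/** A top-level object as an .rda parser might report it (assumed shape). */
interface ParsedRObject {
    name: string;
    sexpType: number;
}

type SymbolKind = 'function' | 'variable';

type LoadOutcome =
    | { kind: 'defined'; symbols: Map<string, SymbolKind> }
    | { kind: 'unknown-side-effect'; reason: string };

interface LoadOptions {
    ignoreLoadCalls: boolean;
    /** Resolves the file argument to a path, or undefined if it cannot be resolved. */
    resolveFile: (arg: string) => string | undefined;
    /** Parses the file; throws if it is missing or malformed. */
    parseRda: (path: string) => ParsedRObject[];
}

function classify(obj: ParsedRObject): SymbolKind {
    const callable = obj.sexpType === CLOSXP || obj.sexpType === SPECIALSXP || obj.sexpType === BUILTINSXP;
    return callable ? 'function' : 'variable';
}

function modelLoadCall(args: readonly string[], opts: LoadOptions): LoadOutcome {
    // Sanity check: only the single-argument form is modelled here.
    if(args.length !== 1) {
        return { kind: 'unknown-side-effect', reason: `expected 1 argument, got ${args.length}` };
    }
    if(opts.ignoreLoadCalls) {
        return { kind: 'unknown-side-effect', reason: 'load calls are ignored by configuration' };
    }
    const path = opts.resolveFile(args[0]);
    if(path === undefined) {
        return { kind: 'unknown-side-effect', reason: 'file could not be resolved' };
    }
    let objects: ParsedRObject[];
    try {
        objects = opts.parseRda(path);
    } catch(e) {
        return { kind: 'unknown-side-effect', reason: `file could not be parsed: ${String(e)}` };
    }
    // A later object with the same name overwrites an earlier one, as assignment would.
    const symbols = new Map<string, SymbolKind>();
    for(const obj of objects) {
        symbols.set(obj.name, classify(obj));
    }
    return { kind: 'defined', symbols };
}
```

The sketch leaves out the "only if file exists" control dependency; in the PR the resulting definitions are additionally guarded by it.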

stimjannik self-assigned this May 2, 2026
EagleoutIce changed the title from 2096 rda plugin to [Analyzer] rda plugin May 7, 2026
EagleoutIce marked this pull request as ready for review June 12, 2026 16:39
Comment thread src/dataflow/environments/built-in.ts Outdated
Comment thread src/dataflow/environments/default-builtin-config.ts Outdated
Comment thread src/dataflow/internal/process/functions/call/built-in/built-in-load.ts Outdated
Comment thread src/dataflow/internal/process/functions/call/built-in/built-in-load.ts Outdated
Comment thread test/functionality/util/project/plugin/random-r-code-generator.ts
Comment thread test/functionality/util/project/plugin/random-r-code-generator.ts
Comment thread package.json Outdated
Comment thread package.json
Comment thread package.json Outdated

Copilot AI left a comment


Pull request overview

This PR adds first-class support for analyzing R’s load() built-in by introducing an RDA/RData parser, wiring a dedicated builtin:load processor into the dataflow environment, and adding configuration/docs/tests to control or validate the behavior.

Changes:

  • Add builtin:load processor that resolves the file argument, parses .rda/.RData, and injects loaded symbols (including closures) into the dataflow graph.
  • Introduce an RDA/RData parsing implementation (FlowrRDAFile / RDAParser) with compression handling (gzip/bzip2/xz/lzma) and add new runtime dependencies.
  • Add functionality/dataflow tests plus a “real-world” download pipeline, and add ignoreLoadCalls config + documentation.
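Compression handling for .rda/.RData files usually starts by inspecting the leading magic bytes. A minimal sketch, assuming the raw file is already in memory: gzip starts with `1F 8B`, bzip2 with ASCII `BZh`, and xz with `FD 37 7A 58 5A 00`. Raw LZMA streams have no reliable magic number, so this sketch does not detect them. After decompression, output of `save()` begins with a 5-byte header such as `RDX3\n` (XDR, version 3) or `RDA3\n` (ASCII). `detectCompression` and `readSaveHeader` are illustrative names, not the PR's.

```typescript
type Compression = 'gzip' | 'bzip2' | 'xz' | 'none';

// Identify the container format from its magic bytes.
function detectCompression(buf: Uint8Array): Compression {
    if(buf.length >= 2 && buf[0] === 0x1f && buf[1] === 0x8b) {
        return 'gzip';
    }
    // "BZh", followed by a block-size digit '1'..'9'
    if(buf.length >= 3 && buf[0] === 0x42 && buf[1] === 0x5a && buf[2] === 0x68) {
        return 'bzip2';
    }
    const xzMagic = [0xfd, 0x37, 0x7a, 0x58, 0x5a, 0x00];
    if(buf.length >= xzMagic.length && xzMagic.every((b, i) => buf[i] === b)) {
        return 'xz';
    }
    // Everything else is treated as uncompressed (raw LZMA is not detected here).
    return 'none';
}

interface SaveHeader {
    format: 'ascii' | 'native' | 'xdr';
    version: number;
}

// Parse the "RD<format><version>\n" header of decompressed save() output.
function readSaveHeader(buf: Uint8Array): SaveHeader | undefined {
    if(buf.length < 5) {
        return undefined;
    }
    const magic = Array.from(buf.subarray(0, 5), b => String.fromCharCode(b)).join('');
    const m = /^RD([ABX])([123])\n$/.exec(magic);
    if(!m) {
        return undefined;
    }
    const format: SaveHeader['format'] = m[1] === 'A' ? 'ascii' : m[1] === 'B' ? 'native' : 'xdr';
    return { format, version: Number(m[2]) };
}
```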

Reviewed changes

Copilot reviewed 17 out of 18 changed files in this pull request and generated 15 comments.

| File | Description |
| --- | --- |
| test/functionality/util/project/plugin/random-r-code-generator.ts | Adds random R code generator used to produce varied .rda test fixtures. |
| test/functionality/project/plugin/load-pipeline/setup.sh | Adds helper script to download real-world .rda/.RData samples from Zenodo. |
| test/functionality/project/plugin/load-pipeline/README.md | Adds placeholder README for the load pipeline tests. |
| test/functionality/project/plugin/load-pipeline/load-pipeline.test.ts | Adds parser-vs-R “oracle” tests for many generated .rda files and optional real-world fixtures. |
| test/functionality/project/plugin/load-pipeline/.gitignore | Ignores the directory of downloaded fixture files. |
| test/functionality/dataflow/main/functions/dataflow-load.test.ts | Adds dataflow tests validating load() symbol definitions/overwrite/callability/config handling. |
| src/project/plugins/file-plugins/files/flowr-rda-file.ts | Introduces RDA/RData parsing + (de)compression logic and object flattening. |
| src/documentation/wiki-interface.ts | Documents new ignoreLoadCalls config option. |
| src/dataflow/internal/process/functions/call/built-in/built-in-load.ts | Implements processLoadCall to model load() in the dataflow graph. |
| src/dataflow/environments/default-builtin-config.ts | Switches load to the new builtin:load processor. |
| src/dataflow/environments/built-in.ts | Registers builtin:load processor. |
| src/dataflow/environments/built-in-proc-name.ts | Adds BuiltInProcName.Load. |
| src/control-flow/semantic-cfg-guided-visitor.ts | Adds CFG visitor dispatch hook for builtin:load. |
| src/config.ts | Adds ignoreLoadCalls to config interface/defaults/schema. |
| package.json / package-lock.json | Adds deps for RDA decompression + license clarifications wiring. |
| license-clarifications.json | Adds license override for bzip2@0.1.1. |


Comment on lines +430 to +437
```ts
const pickIndex = () => {
    const r = this.rnd.int(total);
    for(let i = 0; i < weights.length; i++) {
        if(r < weights[i]) {
            return i;
        }
    }
};
```
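As excerpted, `pickIndex` compares the random number against each raw weight instead of a running sum, and it returns `undefined` when no branch matches. A sketch of the usual cumulative-weight selection, assuming `rnd.int(n)` yields an integer in `[0, n)`; here that is passed in as a hypothetical `randInt` callback so the function is testable:

```typescript
// Weighted random choice over non-negative integer weights via cumulative sums.
// `randInt(n)` is assumed to return an integer in [0, n).
function pickWeightedIndex(weights: readonly number[], randInt: (n: number) => number): number {
    const total = weights.reduce((sum, w) => sum + w, 0);
    if(total <= 0) {
        throw new Error('at least one weight must be positive');
    }
    const r = randInt(total);
    let cumulative = 0;
    for(let i = 0; i < weights.length; i++) {
        cumulative += weights[i];
        if(r < cumulative) {
            return i;
        }
    }
    // Only reachable if randInt violates its [0, total) contract.
    throw new Error(`random value ${r} outside [0, ${total})`);
}
```

With weights `[1, 2, 3]`, the values 0, 1..2 and 3..5 map to indices 0, 1 and 2, so each index is chosen proportionally to its weight, and a zero weight is never chosen.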
Comment on lines +231 to +233
```ts
generatePromise(): { value: string, type: string, len: number } {
    return { value: 'delayedAssign("x", msg)', type: 'promise', len: 1 };
}
```
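The excerpted `generatePromise` always binds the promise to the fixed name `x` and delays the symbol `msg`, which the generated code never defines. Because `save()` forces promises by default (`eval.promises = TRUE`), forcing `msg` at save time fails unless such a binding happens to exist. A hypothetical variant, assuming the caller passes the target variable name and emits `value` as a statement (the binding is created by `delayedAssign` itself):

```typescript
// Hypothetical variant: bind the promise to a caller-chosen name and delay a
// self-contained expression, so forcing it inside save() needs no other bindings.
function generatePromise(name: string): { value: string, type: string, len: number } {
    // Assumes `name` is a plain R identifier, so simple double quoting suffices.
    const target = `"${name}"`;
    const expr = 'sqrt(16)';
    return { value: `delayedAssign(${target}, ${expr})`, type: 'promise', len: 1 };
}
```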
Comment on lines +348 to +354
```ts
generateMatrix(maxNestingLevel: number): { value: string, type: string, len: number } {
    const rows = this.rnd.int(3) + 1;
    const cols = this.rnd.int(3) + 1;
    const elements = this.generateList(maxNestingLevel - 1, maxNestingLevel, rows);
    const byRow = this.rnd.pick(['TRUE', 'FALSE']);
    return { value: `matrix(c(${elements.value}), nrow = ${rows}, ncol = ${cols}, byrow = ${byRow})`, type: 'matrix', len: elements.len };
}
```
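Assuming the third argument of `generateList` is the number of elements, the excerpt generates `rows` elements for a matrix with `rows * cols` cells. R's `matrix()` then recycles the data to fill the remaining cells, so the reported `len` no longer matches the object that is saved. A sketch that emits exactly one element per cell; `genScalar` is a hypothetical stand-in for the generator's own scalar producer:

```typescript
// Build R code for a rows x cols matrix with exactly one generated element per cell.
// `genScalar(i)` is a hypothetical callback returning R code for the i-th element.
function generateMatrixCode(
    rows: number,
    cols: number,
    byRow: boolean,
    genScalar: (i: number) => string
): { value: string, type: string, len: number } {
    const cells = rows * cols;
    const elements = Array.from({ length: cells }, (_, i) => genScalar(i));
    const value = `matrix(c(${elements.join(', ')}), nrow = ${rows}, ncol = ${cols}, byrow = ${byRow ? 'TRUE' : 'FALSE'})`;
    return { value, type: 'matrix', len: cells };
}
```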
```ts
    'none'
];

const tempFolder = fs.mkdtempSync(path.resolve(os.tmpdir(), '/tmp/flowr-load-pipeline-test'));
```
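Because the second argument to `path.resolve` is absolute, the `tempFolder` call ignores `os.tmpdir()` entirely and always targets `/tmp` (on Windows, `\tmp` on the current drive, which typically does not exist). `fs.mkdtempSync` appends six random characters to the prefix it is given, so a portable sketch joins a plain prefix onto the system temp directory; `makeTempFolder` is an illustrative name:

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Create a unique directory such as <tmpdir>/flowr-load-pipeline-test-AbC123.
// path.join keeps os.tmpdir() as the base; path.resolve would discard it
// if the second segment were absolute.
function makeTempFolder(): string {
    return fs.mkdtempSync(path.join(os.tmpdir(), 'flowr-load-pipeline-test-'));
}
```

The folder can then be removed in the suite's teardown with `fs.rmSync(dir, { recursive: true, force: true })`.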
Comment on lines +42 to +53
```ts
const encoding = rnd.pick(saveFormats);
const version = rnd.pick(versions);
const compression = rnd.pick(compressions);

it(`Encoding: ${encoding}, Version: ${version}, Compression: ${compression} - run ${i} - seed ${seed + i}`, () => {

    const rcg = new RandomRCodeGenerator(rnd);

    const { rCode, vars } = rcg.generateRCode(objectsPerRun, maxNestingLevel);

    const shellCode = `${rCode}
save(${vars.join(', ')}, file="${file}", ascii = ${encoding}, version = ${version})`;
```
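The loop picks a `compression` value and reports it in the test name, but the excerpted `save()` call never passes it on, so every file is written with `save()`'s default (gzip for binary files, uncompressed for ASCII). A sketch of building the call with R's `compress` argument, which accepts `FALSE` or one of `"gzip"`, `"bzip2"` and `"xz"`; mapping `'none'` to `FALSE` and taking `ascii` as a boolean are assumptions about the excerpt's `compressions` and `saveFormats` lists:

```typescript
type RCompression = 'gzip' | 'bzip2' | 'xz' | 'none';

// Build an R save() call that actually forwards the chosen compression.
function buildSaveCall(
    vars: readonly string[],
    file: string,
    ascii: boolean,
    version: 2 | 3,
    compression: RCompression
): string {
    const compress = compression === 'none' ? 'FALSE' : `"${compression}"`;
    // JSON.stringify escapes backslashes and double quotes, which R string literals accept too.
    return `save(${vars.join(', ')}, file = ${JSON.stringify(file)}, ascii = ${ascii ? 'TRUE' : 'FALSE'}, version = ${version}, compress = ${compress})`;
}
```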
Comment on lines +1615 to +1618
```ts
    primCache = {};
    primCache.type = SexpType.VecSxp;
    primCache.value = funTabSize;
}
```
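In R's own implementation (`mkPRIMSXP` in `src/main/dstruct.c`), the primitive cache is a list (VECSXP) with one slot per function-table entry; a slot is filled the first time that primitive is created and reused afterwards, and a cached object whose type (builtin vs. special) differs from the request is an error. The excerpt instead stores `funTabSize` as the cache's value. A TypeScript sketch of the R-style cache; `PrimSxp` and `PrimitiveCache` are illustrative types, not the PR's:

```typescript
type PrimType = 'builtin' | 'special';

interface PrimSxp {
    type: PrimType;
    offset: number; // index into R's function table
}

// One slot per function-table entry, filled on first use (as in R's mkPRIMSXP).
class PrimitiveCache {
    private readonly funTabSize: number;
    private readonly slots: (PrimSxp | undefined)[];

    constructor(funTabSize: number) {
        this.funTabSize = funTabSize;
        this.slots = new Array<PrimSxp | undefined>(funTabSize).fill(undefined);
    }

    get(offset: number, type: PrimType): PrimSxp {
        if(!Number.isInteger(offset) || offset < 0 || offset >= this.funTabSize) {
            throw new RangeError(`offset ${offset} is out of the function table range`);
        }
        const cached = this.slots[offset];
        if(cached === undefined) {
            const fresh: PrimSxp = { type, offset };
            this.slots[offset] = fresh;
            return fresh;
        }
        if(cached.type !== type) {
            throw new Error('requested primitive type is not consistent with cached value');
        }
        return cached;
    }
}
```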
Comment on lines +965 to +969
```ts
const name = this.inString(len);
const index = (RFunTabOffsets as Record<string, string | number>)[name] as number;
if(name in RFunTabOffsets) {
    s = this.mkPrimSxp(index, SexpType.BuiltInSxp);
} else {
```
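The `in` check also matches inherited properties such as `toString` or `constructor`. If `RFunTabOffsets` is a numeric TypeScript enum, it also contains reverse (number-to-name) entries. A hypothetical helper that only accepts own, numeric entries is sketched below. Separately, R's unserializer (`ReadItem` in `serialize.c`) takes the builtin/special distinction from the SEXPTYPE it just read, so the fixed `SexpType.BuiltInSxp` in the excerpt may deserve a second look.

```typescript
// Return the function-table offset for `name`, or undefined if it is unknown.
// Only own properties with numeric values count, which rejects inherited keys
// like "toString" and the reverse-mapping entries of a numeric enum.
function lookupPrimitiveOffset(
    table: Readonly<Record<string, string | number>>,
    name: string
): number | undefined {
    if(!Object.prototype.hasOwnProperty.call(table, name)) {
        return undefined;
    }
    const value = table[name];
    return typeof value === 'number' ? value : undefined;
}
```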
Comment on lines +2290 to +2299
```ts
if(x.altRep) {
    const ans = (x as RObject[])[i];
    /* the element is marked as not mutable since complex
       assignment can't see reference counts on any intermediate
       containers in an ALTREP */
    // MARK_NOT_MUTABLE(ans);
    return ans;
} else {
    return (x as RObject[])[i];
}
```
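Both branches of the excerpt return `(x as RObject[])[i]`, so the ALTREP check does not change the result. A sketch of a simplified accessor that also makes an out-of-range index an explicit error instead of silently yielding `undefined`; `RObject` is a placeholder type and `vectorElt` an illustrative name:

```typescript
// Placeholder for the parser's object type.
type RObject = unknown;

// Read element i of a list-like vector. ALTREP and ordinary vectors are read
// the same way here, matching the identical branches of the excerpt.
function vectorElt(x: readonly RObject[], i: number): RObject {
    if(!Number.isInteger(i) || i < 0 || i >= x.length) {
        throw new RangeError(`index ${i} out of range for vector of length ${x.length}`);
    }
    return x[i];
}
```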
```ts
    return { target: undefined, source: undefined };
}

private onLoadCall(_param: { call: DataflowGraphVertexFunctionCall }) {}
```
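The empty `onLoadCall` suggests a template-method design: the visitor dispatches each call on its built-in processor name and offers a no-op default that analyses can override. A minimal sketch of that pattern; the class names, the `CallVertex` shape and the string-keyed dispatch are assumptions, not the actual `SemanticCfgGuidedVisitor` code:

```typescript
interface CallVertex {
    id: number;
    name: string;
}

// Hypothetical visitor: dispatch on the processor name, with overridable hooks.
class SketchVisitor {
    readonly visited: string[] = [];

    protected onSourceCall(call: CallVertex): void {
        this.visited.push(`source:${call.name}`);
    }

    // No-op by default, like the excerpt's onLoadCall.
    protected onLoadCall(_call: CallVertex): void {}

    protected onOtherCall(call: CallVertex): void {
        this.visited.push(`other:${call.name}`);
    }

    dispatch(processor: string, call: CallVertex): void {
        switch(processor) {
            case 'builtin:source': return this.onSourceCall(call);
            case 'builtin:load': return this.onLoadCall(call);
            default: return this.onOtherCall(call);
        }
    }
}

// An analysis that cares about load() overrides only the hook it needs.
class LoadTracker extends SketchVisitor {
    protected override onLoadCall(call: CallVertex): void {
        this.visited.push(`load:${call.name}`);
    }
}
```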
Comment on lines +14 to +17
```ts
const runs = 300;
const seed = 0;
const objectsPerRun = 5;
const maxNestingLevel = 1;
```
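The test is reproducible because run `i` derives its randomness from `seed + i`. The `rnd` helper used by the excerpts (`int`, `pick`) is not shown; as an illustration only, here is a seedable generator built on the well-known mulberry32 function, where `int(n)` returns an integer in `[0, n)`. `SeededRandom` is a hypothetical name:

```typescript
// mulberry32: a small 32-bit seedable PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
    let a = seed >>> 0;
    return () => {
        a = (a + 0x6d2b79f5) >>> 0;
        let t = a;
        t = Math.imul(t ^ (t >>> 15), t | 1);
        t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
        return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
    };
}

class SeededRandom {
    private readonly next: () => number;

    constructor(seed: number) {
        this.next = mulberry32(seed);
    }

    /** Integer in [0, n). */
    int(n: number): number {
        return Math.floor(this.next() * n);
    }

    pick<T>(items: readonly T[]): T {
        if(items.length === 0) {
            throw new Error('cannot pick from an empty array');
        }
        return items[this.int(items.length)];
    }
}
```

The same seed always yields the same sequence, so a failing run can be replayed from the seed printed in its test name.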

Labels

None yet

Projects

None yet


3 participants