add a future work section regarding making the data space tamper proof by Rahien · Pull Request #2 · lblod/gitbook-decide-write-up

Rahien · 2026-06-22T13:42:42Z

No description provided.

mirdono

Imo the text is a bit of a weird mix between concrete and vague. On one hand, specific technologies are mentioned, with some rather concrete steps of how they are to be used. On the other hand, the responsibilities and data flows are not entirely clear to me. For example, who is responsible for signing which data?

Furthermore, should we not also consider the DCAT data itself? For example, how will we detect someone changing a distribution's dcat:downloadURL as well as its http://spdx.org/rdf/terms#checksum?

mirdono · 2026-06-22T15:42:25Z

+### Making the data space tamper-proof

-### Possible future work LBLOD related
+The data space will provide a couple of DCAT Distributions holding various representations of the space's data sets. When downloading, users will want to be sure that the contents of these sets have not been tampered with. This can be guaranteed for simple downloads by creating a [SHA-256 hash](https://datatracker.ietf.org/doc/html/rfc6234) of the archive's contents, then signing the resulting checksum with the private key of the owner of the dataset and publishing that hash as part of the distribution's DCAT description, for instance using the `http://spdx.org/rdf/terms#checksum` predicate. The public key corresponding to this private key can be published using the owner's DID (see [write-up-verifiable-credentials.md](write-up-verifiable-credentials.md "mention")). Users of the distribution can then easily figure out if the distribution has been tampered with by applying the public key of the owner that they find in the owner's DID to the signature and verifying that the SHA-256 of their download results in the same checksum value.


Question: What kind of "archives" are we talking about here?

Do you mean actual archive files (zip, tar.gz, ...) or something else? In the case of the former, why bother with manually calculating the checksum and signing that? You are probably better off using something tried and tested like gnupg to sign the files and then publishing the (contents of the) sig file. Side note: creating a SHA-256 checksum "of the archive's contents", as you say, has the same problem as the triples in the following paragraph: order matters.
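To illustrate the "order matters" point, here is a minimal Python sketch using only `hashlib`. It is not taken from the PR; the helper names `sha256_hex` and `digest_of_members` and the sample triples are hypothetical. It shows that a digest over the exact bytes of one file is well defined, whereas a digest computed by feeding the archive's members one after another depends on the order in which they are read.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def digest_of_members(members: list[bytes]) -> str:
    """Hash member contents in list order (hypothetical helper).

    The result changes when the same members are supplied in a
    different order, which is the fragility noted in the comment.
    """
    h = hashlib.sha256()
    for member in members:
        h.update(member)
    return h.hexdigest()


# Two sample "members" standing in for files inside an archive.
a = b"<s> <p> <o1> .\n"
b = b"<s> <p> <o2> .\n"

# Hashing the same byte sequence twice always gives the same digest.
same_bytes = sha256_hex(a + b) == sha256_hex(a + b)

# Hashing the same members in a different order gives another digest.
order_matters = digest_of_members([a, b]) != digest_of_members([b, a])
```

Under these assumptions, signing the archive file itself (for instance with a detached gnupg signature, as suggested above) avoids having to define an ordering at all.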

mirdono · 2026-06-22T15:42:25Z

-### Possible future work LBLOD related
+The data space will provide a couple of DCAT Distributions holding various representations of the space's data sets. When downloading, users will want to be sure that the contents of these sets have not been tampered with. This can be guaranteed for simple downloads by creating a [SHA-256 hash](https://datatracker.ietf.org/doc/html/rfc6234) of the archive's contents, then signing the resulting checksum with the private key of the owner of the dataset and publishing that hash as part of the distribution's DCAT description, for instance using the `http://spdx.org/rdf/terms#checksum` predicate. The public key corresponding to this private key can be published using the owner's DID (see [write-up-verifiable-credentials.md](write-up-verifiable-credentials.md "mention")). Users of the distribution can then easily figure out if the distribution has been tampered with by applying the public key of the owner that they find in the owner's DID to the signature and verifying that the SHA-256 of their download results in the same checksum value.
+
+The same process can be used to certify the correctness of the DCAT distribution regarding a certain dataset. We could construct an [n-triples](https://www.w3.org/TR/rdf12-n-triples/) file that contains all triples making up the dataset and its distributions in a stable, repeatable fashion, for instance by sorting the triples by subject, then by predicate and then by object, excluding our signature predicate itself. We can then take this n-triples file, compute its SHA-256 hash and sign the result using the private key of the owner of the DCAT description, likely the owner of the dataset or the owner of the dataspace.


Remark: It is unclear to me what exactly is meant to be verified here.

Is this meant to allow recipients of a distribution to verify that the received distribution contains the same data as the actual dataset? If so, recipients would have to construct an n-triples file from the received distribution and check whether that file's checksum/signature matches the published one. This seems rather complicated and very fragile.

Side note: to me the last sentence leaves doubt as to which private key should be used, possibly even implying that private keys are passed around to sign data/checksums at the right place. This would be a big no-no.
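For reference, the sorted n-triples approach from the diff could look roughly like the Python sketch below. It is an illustration under stated assumptions, not the PR's implementation: the signature predicate IRI `http://example.org/signature`, the function name `canonical_digest` and the sample documents are hypothetical, every triple is assumed to sit on one line and to be serialized identically by all parties, and blank nodes are not handled (stable blank node labels need a real canonicalization algorithm such as W3C RDF Dataset Canonicalization). Only the digest is computed; signing is left out because the Python standard library has no asymmetric signing.

```python
import hashlib

# Hypothetical predicate used to publish the signature; excluded from the hash.
SIGNATURE_PREDICATE = "<http://example.org/signature>"


def canonical_digest(ntriples: str) -> str:
    """SHA-256 over de-duplicated, lexicographically sorted N-Triples lines."""
    lines = set()
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)  # subject, predicate, rest of the line
        if len(parts) == 3 and parts[1] == SIGNATURE_PREDICATE:
            continue  # the signature triple cannot sign itself
        lines.add(line)
    canonical = "".join(item + "\n" for item in sorted(lines))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


DIST = "<http://example.org/dist>"
URL = "<http://www.w3.org/ns/dcat#downloadURL>"
TITLE = "<http://purl.org/dc/terms/title>"

doc_a = f'{DIST} {URL} <http://example.org/a.zip> .\n{DIST} {TITLE} "A" .\n'
# Same triples in another order, plus a signature triple.
doc_b = (
    f'{DIST} {TITLE} "A" .\n'
    f'{DIST} {SIGNATURE_PREDICATE} "abc" .\n'
    f'{DIST} {URL} <http://example.org/a.zip> .\n'
)
# Same description with a tampered download URL.
doc_c = f'{DIST} {URL} <http://example.org/evil.zip> .\n{DIST} {TITLE} "A" .\n'

stable = canonical_digest(doc_a) == canonical_digest(doc_b)
tamper_detected = canonical_digest(doc_a) != canonical_digest(doc_c)
```

In this sketch a changed `dcat:downloadURL` alters the digest, but only for a verifier who rebuilds exactly the same set of triples, which is the fragility raised in the remark above.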

add a future work section regarding making the data space tamper proof

8b2080b

mirdono reviewed Jun 22, 2026

Rahien wants to merge 1 commit into master from karel/lbron-1599-tamper-proof-dcat
