[erlang-questions] canonical encoding for term_to_binary/binary_to_term ?

zxq9 zxq9@REDACTED
Wed Nov 8 00:08:05 CET 2017


On Tuesday, 7 November 2017 21:07:45 Benoit Chesneau wrote:
> Following the recent discussion on maps ordering, I’m curious if someone has already worked on an implementation of a canonical term_to_binary/binary_to_term to be able to sign them and make sure we get the same data across the wire.
> 
> Is there any lib that does that around? For now I’m just using a canonical version of JSON but it’s pretty inefficient …

So far I've had success with converting everything to a canonical basic form of tuples and sorted lists. The trick there is that every message in a protocol (or document system, whatever) requires a known schema so that you can identify whether a list of things is a list that needs to be sorted, or is a text string that must never be sorted. Tagging elements with atoms works just fine for that going into the format and back out again, but my point is there is always a requirement for prior knowledge of the data.
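A minimal sketch of what such a tagged canonical form could look like. The tag names (text, seq, bag, map) and the module name are hypothetical, chosen only for illustration; they are not taken from any existing library or from the system described above. Floats are deliberately left out, since 1 and 1.0 compare equal under lists:sort/1 and would make the sorted result depend on input order.

```erlang
-module(canon_sketch).
-export([canonical/1]).

%% Hypothetical tags (assumptions for this sketch, not an existing API):
%%   {text, Chars}  a string; order is meaningful, so it is never sorted
%%   {seq, Items}   an ordered list; elements canonicalized, order kept
%%   {bag, Items}   an unordered list; elements canonicalized, then sorted
%%   {map, Map}     a map; turned into a key-sorted list of pairs
canonical({text, Chars}) when is_list(Chars) ->
    {text, Chars};
canonical({seq, Items}) when is_list(Items) ->
    {seq, [canonical(I) || I <- Items]};
canonical({bag, Items}) when is_list(Items) ->
    {bag, lists:sort([canonical(I) || I <- Items])};
canonical({map, Map}) when is_map(Map) ->
    Pairs = [{canonical(K), canonical(V)} || {K, V} <- maps:to_list(Map)],
    %% Map keys are unique, so sorting on the key gives one fixed order.
    {map, lists:keysort(1, Pairs)};
%% Leaves allowed in this sketch: atoms, integers, binaries.
%% Anything else (untagged lists, floats, ...) fails with function_clause.
canonical(Leaf) when is_atom(Leaf); is_integer(Leaf); is_binary(Leaf) ->
    Leaf.
```

With this, canon_sketch:canonical({bag, [3, 1, 2]}) gives {bag, [1, 2, 3]}, while canon_sketch:canonical({text, "cba"}) comes back unchanged. Rejecting untagged lists is the point: without the tag there is no way to tell which of the two was meant.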

This hasn't really been much extra annoyance in practice because the system where I do this already has well-defined schemas, so writing an importer/exporter isn't much work. It is a bit less trouble than defining an ASN.1 schema to serialize the data as DER, which is what we did before, and that's guaranteed to work everywhere. This may be "inefficient" in some sense, but it's never been a bottleneck.

Even with a BIF, the burden is always going to be on the programmer to canonicalize and mark strings prior to sending anything to a pre-sign canonical serialization function because there is no way a serializer can know the difference between the meanings of certain nested structures.
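To illustrate the point, here is a rough sketch of a pre-sign step that hashes whatever term it is given. The module and function names are hypothetical. Pinning {minor_version, 1} is an assumption made here to keep both ends on the same external term format variant; it is not a documented guarantee of byte-identical output across every OTP release.

```erlang
-module(presign_sketch).
-export([digest/1]).

%% Assumes the caller has ALREADY canonicalized the term.
%% The serializer only sees structure, not meaning: to it, the
%% string "ab" is just the list [97, 98], and [1, 2] and [2, 1]
%% are simply two different lists.
digest(CanonicalTerm) ->
    Bin = term_to_binary(CanonicalTerm, [{minor_version, 1}]),
    crypto:hash(sha256, Bin).
```

Calling presign_sketch:digest([1, 2]) and presign_sketch:digest([2, 1]) yields two different digests, even if the sender meant both as the same unordered set. Only the programmer, working from the schema, can sort the one and leave the other alone before this function is called.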

-Craig


