Uniform Data Fingerprints

A unified notation for trust anchors

The Mesh uses UDF fingerprints to uniquely identify profiles. A fingerprint is a human or machine readable identifier that is bound to the thing it identifies with a high level of confidence.

The UDF fingerprint is described in depth in draft-hallambaker-udf.

Strong Internet Names are described in draft-hallambaker-sin.

The normal representation of a UDF fingerprint is a string of letters and digits. To allow fingerprints to be read aloud, letters may be entered in either upper or lower case, and digits that could be mistaken for letters are avoided (Base32 encoding). For ease of entry and comparison, the characters are arranged into blocks of five separated by dashes.
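The presentation rules above can be sketched in a few lines of Python. This is an illustration only: the function names to_udf_text and same_fingerprint are hypothetical, and the standard library Base32 alphabet (A-Z, 2-7) is assumed to be the one intended.

```python
import base64


def to_udf_text(value: bytes, blocks: int = 5) -> str:
    """Render the leading bits of `value` as dash-separated Base32 blocks."""
    chars = blocks * 5  # five characters per block
    encoded = base64.b32encode(value).decode("ascii").rstrip("=")
    if len(encoded) < chars:
        raise ValueError("value is too short for the requested number of blocks")
    text = encoded[:chars]
    return "-".join(text[i:i + 5] for i in range(0, chars, 5))


def same_fingerprint(a: str, b: str) -> bool:
    """Compare two presentations, ignoring case and dash placement."""
    def normalize(s: str) -> str:
        return s.replace("-", "").upper()
    return normalize(a) == normalize(b)
```

Because comparison ignores case and dashes, a fingerprint typed as mb2gk6duf5 matches MB2GK-6DUF5.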

For example, the following is a UDF fingerprint with five blocks. This is the form that would be used for most purposes:

MB2GK-6DUF5-YGYYL-JNY5E-RWSHZ

Calculating a Fingerprint

UDF fingerprints are calculated using a cryptographic digest function. This is a form of hash function that takes a variable length input and produces a fixed length output. Currently UDF fingerprints are created using the SHA-2-512 digest algorithm.

The details of the fingerprint value calculation are set out in the specification. They are designed so that different applications can use UDF fingerprints without the risk of a substitution attack, in which an attacker devises an ambiguous data structure, gets it trusted for one purpose, and later makes use of it for another.

The first byte of the fingerprint value contains a version identifier which may be used to specify the use of different digest algorithms and/or formats.

Fingerprint = <Version-ID> + H (<Content-ID> + ':' + H(<Data>))

Where:

H(x) is the cryptographic digest function.
<Version-ID> is the fingerprint version and algorithm identifier.
<Content-ID> is the MIME Content-Type of the data.
<Data> is the binary data.

The following version identifiers have been assigned:

SHA-2-512 = 96
SHA-2-512 (compressed) = 97, 98, 99, 100
SHA-3-512 = 144
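As a rough sketch of the calculation above, the following Python computes a SHA-2-512 fingerprint and presents it in Base32 blocks. It assumes the separator between the content type and the inner digest is a single colon character and that the content type is encoded as UTF-8; the function name udf_fingerprint is hypothetical. The specification remains the authority on the exact byte layout.

```python
import base64
import hashlib

VERSION_SHA2_512 = 96  # version identifier from the list above


def udf_fingerprint(content_type: str, data: bytes, blocks: int = 5) -> str:
    """Version-ID + H(Content-ID + ':' + H(Data)), shown as Base32 blocks."""
    inner = hashlib.sha512(data).digest()
    # Assumption: content type is UTF-8 text, separator is one ':' byte.
    outer = hashlib.sha512(content_type.encode("utf-8") + b":" + inner).digest()
    value = bytes([VERSION_SHA2_512]) + outer  # 1 + 64 bytes
    encoded = base64.b32encode(value).decode("ascii").rstrip("=")
    chars = blocks * 5
    if chars > len(encoded):
        raise ValueError("too many blocks requested")
    return "-".join(encoded[i:i + 5] for i in range(0, chars, 5))


if __name__ == "__main__":
    print(udf_fingerprint("text/plain", b"hello"))
```

Because the version byte 96 leads the value, every fingerprint produced this way begins with the letter M. Binding the content type into the outer digest means the same data presented under a different content type yields a different fingerprint.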

Precision

The 25 character UDF fingerprint carries 125 bits (25 characters of 5 bits each), of which 8 bits are the version identifier, leaving a work factor of 117 bits. This is close enough to a 128 bit work factor to be acceptable for most purposes. But it is rather larger than a traditional PGP fingerprint. To allow UDF fingerprints to fit on a business card or in other constrained spaces, a UDF fingerprint MAY be truncated to four blocks. This provides a 92 bit work factor, which is adequate for most purposes but not generous.
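The work factors quoted in this section follow a simple pattern: five bits per Base32 character, less the eight bits taken by the version identifier. The sketch below reproduces the arithmetic. The eight bit deduction is inferred from the quoted figures rather than stated here, and the function name is hypothetical.

```python
BITS_PER_CHAR = 5     # one Base32 character encodes five bits
CHARS_PER_BLOCK = 5
VERSION_BITS = 8      # first byte of the value is the version identifier


def work_factor_bits(blocks: int) -> int:
    """Digest bits that remain once the version byte is set aside."""
    return blocks * CHARS_PER_BLOCK * BITS_PER_CHAR - VERSION_BITS


if __name__ == "__main__":
    for blocks in (4, 5, 10):
        print(blocks, "blocks:", work_factor_bits(blocks), "bits")
```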

The truncated fingerprint is simply the first four blocks of the five block form:

MB2GK-6DUF5-YGYYL-JNY5E

The short fingerprint is easier to use but considerably weaker. It is probably adequate for most 'first contact' applications but we would prefer a higher degree of security. Applications that verify fingerprints can do this through a process called 'fingerprint stretching'.

When an application verifies a fingerprint, it has to calculate the full 512 bit fingerprint value and then truncate it for presentation in the Base32 format. The first time an application verifies a fingerprint value, it can store a much longer result for future comparisons. A ten block fingerprint provides a 242 bit work factor, which is sufficient for most purposes.

MB2GK-6DUF5-YGYYL-JNY5E-RWSHZ-SV75J-C4OZQ-5GIN2-GQ7FQ-EEHFI
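Fingerprint stretching might be sketched as follows. On first contact the application checks that the presented short fingerprint is a prefix of the long form it has just computed and stores the long form. Later verifications are compared against the stored long form. The function name stretch_and_verify and the in-memory dictionary are illustrative assumptions, not part of the specification.

```python
def normalize(text: str) -> str:
    """Strip dashes and fold case so presentations compare equal."""
    return text.replace("-", "").upper()


def stretch_and_verify(presented: str, computed: str, known: dict) -> bool:
    """Verify `presented` against a freshly computed long fingerprint.

    `known` maps a short fingerprint to the long form stored the first
    time that short fingerprint was successfully verified.
    """
    short, full = normalize(presented), normalize(computed)
    stored = known.get(short)
    if stored is not None:
        # Later contact: the whole long form must match what was stored.
        return stored == full
    if not full.startswith(short):
        return False
    known[short] = full  # first contact: remember the stretched value
    return True
```

With the example values above, the four block fingerprint is accepted on first contact and the ten block form is stored. A later object whose fingerprint shares only the first four blocks is then rejected.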

Once the Cryptomesh is established it will be possible to use it to provide a fingerprint stretching service. The first fingerprint enrolled in the Cryptomesh linked log would be considered to be the definitive fingerprint value for a given prefix. This approach allows fingerprints as short as two blocks or even a single block to be used securely.

Another way to shorten fingerprints, which does not require deployment of the Cryptomesh, is to generate fingerprints by brute force until one is found that permits one of the compressed presentation formats. If the first 25 bits of the fingerprint value are zeros, the presentation is shortened by one block without loss of work factor.
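The brute force search can be illustrated with a short sketch. It assumes that the zero bits are counted from the start of the outer digest and that the separator is a single colon. The make_candidate callback is a hypothetical stand-in for whatever produces fresh candidate data, such as generating a new key pair. How the compressed version identifiers map to the number of zero bits is left to the specification.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of `digest`."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def outer_digest(content_type: str, data: bytes) -> bytes:
    """H(Content-ID + ':' + H(Data)) using SHA-2-512."""
    inner = hashlib.sha512(data).digest()
    return hashlib.sha512(content_type.encode("utf-8") + b":" + inner).digest()


def find_compressible(content_type: str, make_candidate, zero_bits: int = 25):
    """Try candidates until the digest starts with `zero_bits` zero bits."""
    for attempt in count():
        data = make_candidate(attempt)
        if leading_zero_bits(outer_digest(content_type, data)) >= zero_bits:
            return attempt, data
```

Each additional zero bit doubles the expected number of attempts, so a 25 bit prefix takes on the order of 2^25 candidates.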

Alternative Comparison Presentations

The Base32 presentation is designed to support entry and comparison of fingerprints. Many uses of fingerprints only require the ability to compare two fingerprints to see if they match. Such uses may be better served by encodings that use large dictionaries of words or images for comparison.

Using a dictionary of 2^15 words allows a comparison to be made between nine words rather than 25 characters. This is likely to be easier to use but only if the user speaks the language used to compile the dictionary.

A dictionary of images with 2^18 distinct images would allow the same work factor to be achieved using seven images.
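The word and image counts above follow from dividing the number of bits presented by the number of bits each symbol carries, rounding up. A small sketch of that arithmetic, taking the 125 bits of the 25 character form as the target (an assumption about which figure the counts are based on):

```python
def symbols_needed(total_bits: int, dictionary_bits: int) -> int:
    """Smallest number of symbols of 2**dictionary_bits choices covering total_bits."""
    return -(-total_bits // dictionary_bits)  # ceiling division


if __name__ == "__main__":
    presented_bits = 25 * 5  # the 25 character Base32 form
    print("words: ", symbols_needed(presented_bits, 15))
    print("images:", symbols_needed(presented_bits, 18))
```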