modified HIP-15 to have a final permutation (multiply by 1,000,003 mod 26^5)
lbaird committed May 6, 2021
1 parent 568100d commit 9eeae39
Showing 5 changed files with 466 additions and 29 deletions.
71 changes: 42 additions & 29 deletions HIP/hip-15.md
@@ -1,13 +1,15 @@
```
---
hip: 15
title: Address Checksum
author: Leemon Baird (@lbaird)
type: Standards Track
category: API
status: Draft
created: 2020-03-11
created: 2021-03-11
discussions-to: https://github.com/hashgraph/hedera-improvement-proposal/discussions/47
---
```

## Abstract

@@ -29,29 +31,29 @@ It is therefore useful to catch such errors before the transaction is sent to th

## Specification

Software should be written to always display Hedera entity addresses in with-checksum format (such as `0.0.123-laujm`), with the checksum after a dash, all lowercase, and no spaces or other characters added. It should accept address inputs in either no-checksum ( `0.0.123` ) or with-checksum ( 0.0.123-laujm ) format, all lowercase, with no additional whitespace or punctuation allowed, and no leading zeros for the integers. So these would both be accepted:
Software should be written to always display Hedera entity addresses in with-checksum format (such as `0.0.123-vfmkw`), with the checksum after a dash, all lowercase, and no spaces or other characters added. It should accept address inputs in either no-checksum (`0.0.123`) or with-checksum (`0.0.123-vfmkw`) format, all lowercase, with no additional whitespace or punctuation allowed, and no leading zeros for the integers. So these would both be accepted:

```
0.0.123
0.0.123-laujm
0.0.123-vfmkw
```

If the user enters any other format, or the checksum doesn't match, then the input should not be accepted, and the user should be told that it is incorrect, such as in these cases:

```
0.0.123-abcde
0.00.123
0.0.0123-laujm
0.0.123-LAUJM
0.0.123-lAuJm
0.0.123#laujm
0.0.123laujm
0.0.123 - laujm
0.0.0123-vfmkw
0.0.123-VFMKW
0.0.123-vFmKw
0.0.123#vfmkw
0.0.123vfmkw
0.0.123 - vfmkw
0.123
0.0.123.
0.0.123-la
0.0.123-lau-jm
0.0.123-laujmxxxx
0.0.123-vf
0.0.123-vfm-kw
0.0.123-vfmkwxxxx
```

An address that is received as input should be rejected if it doesn't match the following regex. It should also be rejected if its checksum is incorrect.
@@ -66,7 +68,7 @@ An address that is displayed or sent as output should always be generated such t
/^(0|(?:[1-9]\d*))\.(0|(?:[1-9]\d*))\.(0|(?:[1-9]\d*))(?:-([a-z]{5}))$/
```
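
The format rules can be sketched in Java as follows. The output pattern is the regex shown above. The input pattern is the one used in the reference Java implementation, which differs only by the trailing `?` that makes the checksum optional. The class and method names here are hypothetical, chosen for illustration, and this only checks the format: an address such as `0.0.123-abcde` passes the format check and must still be rejected when its checksum is verified.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; only the two regexes come from HIP-15 and its reference code.
final class AddressFormat {
    // Input form: the checksum group is optional (note the trailing '?').
    static final Pattern INPUT = Pattern.compile(
            "^(0|(?:[1-9]\\d*))\\.(0|(?:[1-9]\\d*))\\.(0|(?:[1-9]\\d*))(?:-([a-z]{5}))?$");

    // Output form: the checksum group is mandatory.
    static final Pattern OUTPUT = Pattern.compile(
            "^(0|(?:[1-9]\\d*))\\.(0|(?:[1-9]\\d*))\\.(0|(?:[1-9]\\d*))(?:-([a-z]{5}))$");

    /** True if s is well formed as an input address (checksum correctness is NOT checked). */
    static boolean isAcceptableInput(String s) {
        return INPUT.matcher(s).matches();
    }

    /** True if s is well formed as an output address, which must carry a checksum. */
    static boolean isValidOutput(String s) {
        return OUTPUT.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] { "0.0.123", "0.0.123-vfmkw", "0.00.123", "0.0.123-VFMKW" }) {
            System.out.println(s + " input=" + isAcceptableInput(s) + " output=" + isValidOutput(s));
        }
    }
}
```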

The checksum (such as `laujm`) is calculated from the no-checksum address (such as `0.0.123` ) by this algorithm:
The checksum (such as `vfmkw`) is calculated from the no-checksum address (such as `0.0.123` ) by this algorithm:

```
a = a valid no-checksum address string, such as 0.0.123
@@ -79,6 +81,7 @@ s1 = (d[1] + d[3] + d[5] + d[7] + ...) mod 11
s = (...((((d[0] * 31) + d[1]) * 31) + d[2]) * 31 + ... ) * 31 + d[d.length-1]) mod p3
sh = (...(((h[0] * 31) + h[1]) * 31) + h[2]) * 31 + ... ) * 31 + h[h.length-1]) mod p5
c = ((((d.length mod 5) * 11 + s0) * 11 + s1) * p3 + s + sh) mod p5
c = (c * 1000003) mod p5
checksum = c, written as 5 digits in base 26, using a-z
```
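
As a worked example, for the address `0.0.4` on ledger ID `0x00` the steps give d = [0,10,0,10,4], s0 = 4, s1 = 9, s = 17008, sh = 0, c = 948536 before the final multiplication and c = 1074024 after it, which is `cjcuq` in base 26. The Java sketch below traces those intermediate values. The class and method names (`ChecksumTrace`, `trace`, `letters`) are hypothetical and exist only for this illustration; the Java code linked in the Reference Implementation section remains the authority.

```java
import java.util.Arrays;

// Illustrative trace of the algorithm's intermediate values; names are hypothetical.
final class ChecksumTrace {
    static final int P3 = 26 * 26 * 26;      // 3 digits base 26
    static final int P5 = P3 * 26 * 26;      // 5 digits base 26

    /** Returns {s0, s1, s, sh, c before the final permutation, c after it}. */
    static long[] trace(byte[] ledgerId, String addr) {
        int s0 = 0, s1 = 0, s = 0, sh = 0;
        for (int i = 0; i < addr.length(); i++) {
            int d = addr.charAt(i) == '.' ? 10 : addr.charAt(i) - '0';
            s = (31 * s + d) % P3;
            if (i % 2 == 0) {
                s0 = (s0 + d) % 11;
            } else {
                s1 = (s1 + d) % 11;
            }
        }
        for (byte b : ledgerId) {
            sh = (31 * sh + (b & 0xff)) % P5;   // treat each byte as unsigned
        }
        for (int i = 0; i < 6; i++) {
            sh = (31 * sh) % P5;                // the 6 zero bytes appended to the ledger ID
        }
        long before = ((((addr.length() % 5) * 11 + s0) * 11 + s1) * P3 + s + sh) % P5;
        long after = (before * 1_000_003L) % P5;
        return new long[] { s0, s1, s, sh, before, after };
    }

    /** Writes c as 5 base-26 digits using a-z, most significant first. */
    static String letters(long c) {
        char[] out = new char[5];
        for (int i = 4; i >= 0; i--) {
            out[i] = (char) ('a' + c % 26);
            c /= 26;
        }
        return new String(out);
    }

    public static void main(String[] args) {
        long[] t = trace(new byte[] { 0 }, "0.0.4");
        // prints [4, 9, 17008, 0, 948536, 1074024] -> cjcuq
        System.out.println(Arrays.toString(t) + " -> " + letters(t[5]));
    }
}
```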

@@ -94,18 +97,26 @@ The reference implementation is the Java code linked to in the Reference Impleme

```
For ledger ID 0x00:
0.0.1-auzeb
0.0.123-laujm
0.0.1234567890-ueafv
12.345.6789-idmsv
1.23.456-qzwsb
0.0.1-dfkxr
0.0.4-cjcuq
0.0.5-ktach
0.0.6-tcxjy
0.0.12-uuuup
0.0.123-vfmkw
0.0.1234567890-zbhlt
12.345.6789-aoyyt
1.23.456-adpbr
For ledger ID 0xa1ff01:
0.0.1-ktdue
0.0.123-uyyzp
0.0.1234567890-ecevy
12.345.6789-sbriy
1.23.456-aybie
0.0.1-xzlgq
0.0.4-xdddp
0.0.5-fnalg
0.0.6-nwxsx
0.0.12-povdo
0.0.123-pzmtv
0.0.1234567890-tvhus
12.345.6789-vizhs
1.23.456-uxpkq
```

The checksum is always 5 lowercase letters, and is guaranteed to catch any of the following errors:
@@ -115,25 +126,27 @@ The checksum is always 5 lowercase letters, and is guaranteed to catch any of th
- modify a digit (or 2 adjacent digits)
- swap two different adjacent digits

Doing any of those modifications is guaranteed to change the checksum. In fact, it is guaranteed to change at least one of the first 2 letters of the checksum. Furthermore, if the no-checksum part of the address were simply replaced with a random one, or the ledger ID were replaced with a random one, then it is extremely unlikely that the new address would have the same 5-character checksum as the old address (less than one in a million chance).
Doing any of those modifications is guaranteed to change the checksum. In fact, it is guaranteed to change at least one of the first 2 letters of the checksum (before the final permutation). Furthermore, if the no-checksum part of the address were simply replaced with a random one, or the ledger ID were replaced with a random one, then it is extremely unlikely that the new address would have the same 5-character checksum as the old address (less than one in a million chance).

In the algorithm for calculating the checksum, the variable s is a weighted sum of all the digits (mod 26^3), sh is a weighted sum of all the bytes of the ledger ID padded with 6 zeros (mod 26^5), s0 is a sum of the digits in the even positions (mod 11), and s1 is a sum of the digits in the odd positions (mod 11). If a digit is removed or added, then (d.length mod 5) will change. If a digit in an even position is modified, then s0 will change. If a digit in an odd position is modified, then s1 will change. If two different adjacent digits are swapped, then both s0 and s1 will change.

The 3 numbers d.length, s0, s1 are encoded in the 2 most significant letters of the checksum, so if any of those conditions occur, the checksum will change in at least one of those two letters.
The 3 numbers d.length, s0, s1 are encoded in the 2 most significant letters of the checksum (before the final permutation), so if any of those conditions occur, that checksum will change in at least one of those two letters.
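
A small Java sketch of that encoded value, `((d.length mod 5) * 11 + s0) * 11 + s1`, may help. It is at most 604, so it fits in two base-26 digits (676 values). When sh is 0, as for ledger ID `0x00`, it is exactly the top two letters of the checksum before the final permutation. The class name `ChecksumPrefix` and method name `prefix` are hypothetical, used only for this illustration.

```java
// Illustrative only: computes the value packed into the top two letters
// (before the final permutation). Names are hypothetical.
final class ChecksumPrefix {
    /** ((d.length mod 5) * 11 + s0) * 11 + s1 for a no-checksum address. */
    static int prefix(String addr) {
        int s0 = 0, s1 = 0;
        for (int i = 0; i < addr.length(); i++) {
            int d = addr.charAt(i) == '.' ? 10 : addr.charAt(i) - '0';
            if (i % 2 == 0) {
                s0 = (s0 + d) % 11;   // even positions
            } else {
                s1 = (s1 + d) % 11;   // odd positions
            }
        }
        return ((addr.length() % 5) * 11 + s0) * 11 + s1;
    }

    public static void main(String[] args) {
        System.out.println(prefix("0.0.123")); // 286, the original address
        System.out.println(prefix("0.0.124")); // 297, one digit modified
        System.out.println(prefix("0.0.132")); // 276, two adjacent digits swapped
        System.out.println(prefix("0.0.12"));  // 132, one digit removed
    }
}
```

Each of the three typos yields a value different from 286, which is what forces a change in the two most significant letters before the permutation.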

The other 3 letters of the checksum are a very simple hash of the address. There are over 10 million different 5-letter checksums possible, so typos are likely to be caught, even if they aren't one of the 4 kinds of typos listed here.

The hash described above would have strong guarantees that small changes in the address will change the checksum. However, incrementing the last digit of the address would often leave the first character of the checksum unchanged, and only change the other 4 characters. For example, the addresses 0.0.4, 0.0.5, and 0.0.6 would all have checksums starting with the letter "c". Therefore, the algorithm contains a final permutation, to increase the probability of the first character changing, too. That final permutation simply multiplies c by 1,000,003 modulo 26^5. The multiplier is the minimum prime greater than a million, and is therefore coprime to the modulus, and therefore performs a permutation (an invertible transformation). Because it is large (over a million), it allows each of the 5 characters to be affected by each of the others.
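
The invertibility claim can be checked with a short Java sketch. It computes the modular inverse of 1,000,003 modulo 26^5 using `BigInteger.modInverse`, which throws if the two numbers are not coprime, and uses that inverse to undo the permutation. The class and method names are hypothetical. The sample values 948536, 1141873, and 1335210 are the pre-permutation values of c for `0.0.4`, `0.0.5`, and `0.0.6` on ledger ID `0x00`, worked out from the algorithm above.

```java
import java.math.BigInteger;

// Illustrative sketch of the final permutation and its inverse; names are hypothetical.
final class FinalPermutation {
    static final long P5 = 26L * 26 * 26 * 26 * 26;   // 11,881,376
    static final long M = 1_000_003L;                 // multiplier from HIP-15
    // Exists only because gcd(M, P5) == 1; modInverse throws ArithmeticException otherwise.
    static final long M_INV =
            BigInteger.valueOf(M).modInverse(BigInteger.valueOf(P5)).longValueExact();

    static long permute(long c) {
        return (c * M) % P5;
    }

    static long unpermute(long c) {
        return (c * M_INV) % P5;
    }

    /** Most significant base-26 digit of c, as a letter. */
    static char firstLetter(long c) {
        return (char) ('a' + c / (26L * 26 * 26 * 26));
    }

    public static void main(String[] args) {
        long[] before = { 948536L, 1141873L, 1335210L };  // 0.0.4, 0.0.5, 0.0.6
        for (long c : before) {
            long p = permute(c);
            // first letters go from c, c, c to c, k, t
            System.out.println(firstLetter(c) + " -> " + firstLetter(p)
                    + "  round trip ok: " + (unpermute(p) == c));
        }
    }
}
```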

## Backwards Compatibility

Address checksums should be optional so as to support backward compatibility.
Address checksums should be optional so as to support backward compatibility. It is recommended that all software be upgraded to always display addresses with the checksum, but nothing will break if it is not.

## Security Implications

In general, this HIP would improve security, preventing mistakes in addressing that could lead to the loss of funds.

Any function that creates 5-letter checksums will inevitably have collisions, where two addresses have the same checksum. If the checksum function were a cryptographically-strong pseudorandom function (PRF), then there would be collisions where the addresses differ in only a single digit. The function defined here has no collisions like that.

When calculating checksums for all accounts of the form `0.0.x` as `x` counts up 1, 2, 3, ..., a PRF would be expected to reach a collision within the first 3,500 numbers (around sqrt(26^5)), but the function here goes more than 10 times further. It reaches its first collision at `0.0.39004-vwmgo`, which collides with the earlier address `0.0.10690-vwmgo`. But that is fine, because it is unlikely that a person trying to type `0.0.39004-vwmgo` would accidentally type `0.0.10690-vwmgo`. And a more likely typo such as `0.0.3904` has a different checksum `-pgbgg`, so an entered address of `0.0.3904-vwmgo` would be flagged as incorrect by any application that follows this standard.
When calculating checksums for all accounts of the form `0.0.x` as `x` counts up 1, 2, 3, ..., a PRF would be expected to reach a collision within the first 3,500 numbers (around sqrt(26^5)), but the function here goes more than 10 times further. It reaches its first collision at `0.0.39004-gyebe`, which collides with the earlier address `0.0.10690-gyebe`. But that is fine, because it is unlikely that a person trying to type `0.0.39004-gyebe` would accidentally type `0.0.10690-gyebe`. And a more likely typo such as `0.0.3904` has a different checksum `-csury`, so an entered address of `0.0.3904-gyebe` would be flagged as incorrect by any application that follows this standard.
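
A search for the first collision can be sketched in Java as below. It relies on one simplification that is an assumption of this sketch, not part of the standard: for a fixed ledger ID, sh adds the same constant to every address and the final multiplication is a permutation, so two addresses collide exactly when their values of `(prefix * p3 + s)` are equal. The sketch therefore compares that core value and skips sh, the permutation, and the base-26 encoding. The names `CollisionSearch`, `core`, and `firstCollision` are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative collision search over 0.0.1, 0.0.2, ...; names are hypothetical.
final class CollisionSearch {
    static final int P3 = 26 * 26 * 26;

    /** c before sh is added and before the final permutation (always below 26^5). */
    static int core(String addr) {
        int s0 = 0, s1 = 0, s = 0;
        for (int i = 0; i < addr.length(); i++) {
            int d = addr.charAt(i) == '.' ? 10 : addr.charAt(i) - '0';
            s = (31 * s + d) % P3;
            if (i % 2 == 0) {
                s0 = (s0 + d) % 11;
            } else {
                s1 = (s1 + d) % 11;
            }
        }
        return (((addr.length() % 5) * 11 + s0) * 11 + s1) * P3 + s;
    }

    /** Returns {earlier x, later x} for the first repeat among 0.0.1 .. 0.0.limit, or null. */
    static int[] firstCollision(int limit) {
        Map<Integer, Integer> seen = new HashMap<>();
        for (int x = 1; x <= limit; x++) {
            Integer earlier = seen.put(core("0.0." + x), x);
            if (earlier != null) {
                return new int[] { earlier, x };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int[] pair = firstCollision(100_000);
        System.out.println("0.0." + pair[0] + " collides with 0.0." + pair[1]);
        // The likely typo 0.0.3904 has a different core value than 0.0.39004.
        System.out.println(core("0.0.3904") != core("0.0.39004"));
    }
}
```

If the figures in the paragraph above hold, the search should report `0.0.10690` and `0.0.39004`.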

## How to Teach This

@@ -143,8 +156,8 @@ When calculating checksums for all accounts of the form `0.0.x` as `x` counts up

Example code can be downloaded for these languages:

- [AddressChecksums.java.zip](https://github.com/hashgraph/hedera-improvement-proposal/files/5861407/AddressChecksums.java.zip)
- [HIP-1_javascript.html.zip](https://github.com/hashgraph/hedera-improvement-proposal/files/5861376/HIP-1_javascript.html.zip)
- [AddressChecksums.java.zip](https://github.com/hashgraph/hedera-improvement-proposal/assets/hip-15/AddressChecksums.java.zip)
- [HIP-15-javascript.html.zip](https://github.com/hashgraph/hedera-improvement-proposal/assets/hip-15/HIP-15-javascript.html.zip)

## Rejected Ideas

229 changes: 229 additions & 0 deletions assets/hip-15/AddressChecksums.java
@@ -0,0 +1,229 @@
//(c) 2020-2021 Hedera Hashgraph, released under Apache 2.0 license.
package com.hedera;

import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import static java.lang.Math.*;

/**
* Static methods and classes useful for dealing with Hedera address checksums, as defined in
* <a href="https://github.com/hashgraph/hedera-improvement-proposal/HIP/hip-15.md">HIP-15</a>
*/
public class AddressChecksums {
/** regex accepting both no-checksum and with-checksum formats, with 4 capture groups: 3 numbers and a checksum */
final private static Pattern addressInputFormat = Pattern.compile(
"^(0|(?:[1-9]\\d*))\\.(0|(?:[1-9]\\d*))\\.(0|(?:[1-9]\\d*))(?:-([a-z]{5}))?$");

/** the status of an address parsed by parseAddress */
public enum parseStatus {
BAD_FORMAT, //incorrectly formatted
BAD_CHECKSUM, //checksum was present, but it was incorrect
GOOD_NO_CHECKSUM, //good no-checksum format address (no checksum was given)
GOOD_WITH_CHECKSUM //good with-checksum format address (a correct checksum was given)
}

/**
* The result returned by {@link #parseAddress(byte[], String)}, including all 4 components of addr, and correct checksum.
*/
public static class ParsedAddress {
/** is this a valid address? (If it's valid, then it either has a correct checksum, or no checksum) */
boolean isValid;
/** the status of the parsed address */
parseStatus status;
/** the first number in the address (10 in 10.20.30) */
int num1;
/** the second number in the address (20 in 10.20.30) */
int num2;
/** the third number in the address (30 in 10.20.30) */
int num3;
/** the checksum in the address that was parsed */
String givenChecksum;
/** the correct checksum */
String correctChecksum;
/** the address in no-checksum format */
String noChecksumFormat;
/** the address in with-checksum format */
String withChecksumFormat;

public String toString() {
return String.format(
"[isValid: %s, status: %s, num1: %s, num2: %s, num3: %s, correctChecksum: %s, " +
"givenChecksum: %s, noChecksumFormat: %s, withChecksumFormat: %s]",
isValid, status, num1, num2, num3, correctChecksum,
givenChecksum, noChecksumFormat, withChecksumFormat);
}
}


/**
* Given an address in either no-checksum or with-checksum format, return the components of the address, the correct
* checksum, and the canonical form of the address in no-checksum and with-checksum format.
*
* @param ledgerId
* the ledger ID for the ledger this address is on
* @param addr
* the address string to parse, such as "0.0.123" or "0.0.123-vfmkw"
* @return the address components, checksum, and forms
*/
public static ParsedAddress parseAddress(byte[] ledgerId, String addr) {
ParsedAddress results = new ParsedAddress();
Matcher match = addressInputFormat.matcher(addr);
if (!match.matches()) {
results.isValid = false;
results.status = parseStatus.BAD_FORMAT; //when status==BAD_FORMAT, the rest of the fields should be ignored
return results;
}
results.num1 = Integer.parseInt(match.group(1));
results.num2 = Integer.parseInt(match.group(2));
results.num3 = Integer.parseInt(match.group(3));
String ad = results.num1 + "." + results.num2 + "." + results.num3;
String c = checksum(ledgerId, ad);
results.status = (match.group(4) == null) ? parseStatus.GOOD_NO_CHECKSUM //an unmatched optional group is null, not ""
: (c.equals(match.group(4))) ? parseStatus.GOOD_WITH_CHECKSUM
: parseStatus.BAD_CHECKSUM;
results.isValid = (results.status != parseStatus.BAD_CHECKSUM);
results.correctChecksum = c;
results.givenChecksum = match.group(4);
results.noChecksumFormat = ad;
results.withChecksumFormat = ad + "-" + c;
return results;
}

/**
* Given an address like "0.0.123", return a checksum like "vfmkw" . The address must be in no-checksum format, with
* no extra characters (so not "0.0.00123" or "==0.0.123==" or "0.0.123-vfmkw"). The algorithm is defined by the
* HIP-15 standard to be:
*
* <pre>{@code
* a = a valid no-checksum address string, such as 0.0.123
* d = int array for the digits of a (using 10 to represent "."), so 0.0.123 is [0,10,0,10,1,2,3]
* h = unsigned byte array containing the ledger ID followed by 6 zero bytes
* p3 = 26 * 26 * 26
* p5 = 26 * 26 * 26 * 26 * 26
* s0 = (d[0] + d[2] + d[4] + d[6] + ...) mod 11
* s1 = (d[1] + d[3] + d[5] + d[7] + ...) mod 11
* s = (...((((d[0] * 31) + d[1]) * 31) + d[2]) * 31 + ... ) * 31 + d[d.length-1]) mod p3
* sh = (...(((h[0] * 31) + h[1]) * 31) + h[2]) * 31 + ... ) * 31 + h[h.length-1]) mod p5
* c = ((((d.length mod 5) * 11 + s0) * 11 + s1) * p3 + s + sh) mod p5
* c = (c * 1000003) mod p5
* checksum = c, written as 5 digits in base 26, using a-z
* }</pre>
*
* @param ledgerId
* the ledger ID for the ledger this address is on
* @param addr
* no-checksum address string without leading zeros or extra characters (so ==00.00.00123== becomes 0.0.123)
* @return the checksum
*/
public static String checksum(byte[] ledgerId, String addr) {
String a = addr; //address, such as "0.0.123"
int[] d = new int[addr.length()]; //digits of address, with 10 for '.', such as [0,10,0,10,1,2,3]
byte[] h = ledgerId; //ledger ID as an array of unsigned bytes
int s0 = 0; //sum of even positions (mod 11)
int s1 = 0; //sum of odd positions (mod 11)
int s = 0; //weighted sum of all positions (mod p3)
int sh = 0; //hash of the ledger ID
long c = 0; //the checksum, as a single number (it's a long, to prevent overflow in c * m)
String checksum = ""; //the answer to return
final int p3 = 26 * 26 * 26; //3 digits base 26
final int p5 = 26 * 26 * 26 * 26 * 26; //5 digits base 26
final int ascii_0 = '0'; //48
final int ascii_a = 'a'; //97
final int m = 1_000_003; //min prime greater than a million. Used for the final permutation.
final int w = 31; //sum of digit values weights them by powers of w. Should be coprime to p5.

for (int i = 0; i < a.length(); i++) {
d[i] = (a.charAt(i) == '.' ? 10 : (a.charAt(i) - ascii_0));
}
for (int i = 0; i < d.length; i++) {
s = (w * s + d[i]) % p3;
if (i % 2 == 0) {
s0 = (s0 + d[i]) % 11;
} else {
s1 = (s1 + d[i]) % 11;
}
}
for (byte sb : h) {
sh = (w * sh + (sb & 0xff)) % p5; //convert signed byte to unsigned before adding
}
for (int i = 0; i < 6; i++) { //process 6 zeros as if they were appended to the ledger ID
sh = (w * sh + 0) % p5;
}
c = ((((a.length() % 5) * 11 + s0) * 11 + s1) * p3 + s + sh) % p5;
c = (c * m) % p5;
for (int i = 0; i < 5; i++) {
checksum = Character.toString(ascii_a + (int)(c % 26)) + checksum;
c /= 26;
}

return checksum;
}

/**
* Check whether the expected checksum matches the calculated checksum, and println the result
*
* @param ledgerId
* the ledger that the address is on
* @param addr
* the address string (with or without checksum)
* @param correctChecksum
* the checksum that the address is expected to have
*/
private static void verify(byte[] ledgerId, String addr, String correctChecksum) {
ParsedAddress parsed = parseAddress(ledgerId, addr);
System.out.println(
(correctChecksum.equals(parsed.correctChecksum) ? "GOOD: " : "BAD: ")
+ "Ledger " + Arrays.toString(ledgerId)
+ " address " + parsed.withChecksumFormat);
}

/**
* Demonstrate use of checksum and parseAddress methods.
*
* @param args
* ignored
*/
public static void main(String[] args) {
byte[] mainnetLedgerId = new byte[] { (byte) 0 };
byte[] exampleLedgerId = new byte[] { (byte) 0xa1, (byte) 0xff, (byte) 0x01 };

//the following should all output a line starting with "GOOD:"

verify(mainnetLedgerId, "0.0.1", "dfkxr");
verify(mainnetLedgerId, "0.0.4", "cjcuq");
verify(mainnetLedgerId, "0.0.5", "ktach");
verify(mainnetLedgerId, "0.0.6", "tcxjy");
verify(mainnetLedgerId, "0.0.12", "uuuup");
verify(mainnetLedgerId, "0.0.123", "vfmkw");
verify(mainnetLedgerId, "0.0.1234567890", "zbhlt");
verify(mainnetLedgerId, "12.345.6789", "aoyyt");
verify(mainnetLedgerId, "1.23.456", "adpbr");

verify(exampleLedgerId, "0.0.1", "xzlgq");
verify(exampleLedgerId, "0.0.4", "xdddp");
verify(exampleLedgerId, "0.0.5", "fnalg");
verify(exampleLedgerId, "0.0.6", "nwxsx");
verify(exampleLedgerId, "0.0.12", "povdo");
verify(exampleLedgerId, "0.0.123", "pzmtv");
verify(exampleLedgerId, "0.0.1234567890", "tvhus");
verify(exampleLedgerId, "12.345.6789", "vizhs");
verify(exampleLedgerId, "1.23.456", "uxpkq");

//The following should all output a line starting with "[isValid: false".
//The first one should have a status of BAD_CHECKSUM, and the rest should have BAD_FORMAT.

System.out.println(parseAddress(mainnetLedgerId, "0.0.123-abcde"));
System.out.println(parseAddress(mainnetLedgerId, "0.00.123"));
System.out.println(parseAddress(mainnetLedgerId, "0.0.0123-vfmkw"));
System.out.println(parseAddress(mainnetLedgerId, "0.0.123-VFMKW"));
System.out.println(parseAddress(mainnetLedgerId, "0.0.123-vFmKw"));
System.out.println(parseAddress(mainnetLedgerId, "0.0.123#vfmkw"));
System.out.println(parseAddress(mainnetLedgerId, "0.0.123vfmkw"));
System.out.println(parseAddress(mainnetLedgerId, "0.0.123 - vfmkw"));
System.out.println(parseAddress(mainnetLedgerId, "0.123"));
System.out.println(parseAddress(mainnetLedgerId, "0.0.123."));
System.out.println(parseAddress(mainnetLedgerId, "0.0.123-vf"));
System.out.println(parseAddress(mainnetLedgerId, "0.0.123-vfm-kw"));
System.out.println(parseAddress(mainnetLedgerId, "0.0.123-vfmkwxxxx"));
}
}
Binary file added assets/hip-15/AddressChecksums.java.zip
Binary file not shown.