improve all docs and readme
carderne committed Jul 13, 2024
1 parent f65f499 commit 8d9b8d3
Showing 5 changed files with 200 additions and 56 deletions.
134 changes: 113 additions & 21 deletions README.md
@@ -33,7 +33,7 @@ SELECT id FROM users; -- user_2accvpp5guht4dts56je5a
SELECT id FROM users WHERE id = 'user_2accvpp5guht4dts56je5a';
```

Plays nice with your server code too, no extra work needed:
Plays nice with your server code, no extra work needed:
```python
with psycopg.connect("postgresql://...") as conn:
    res = conn.execute("SELECT id FROM users").fetchone()
@@ -79,7 +79,7 @@ Key changes relative to ULID:
```

### Collision
Relative to ULID, the time precision is reduced from 48 to 40 bits (keeping the most significant bits, so oveflow still won't occur until 10889 AD), and the randomness reduced from 80 to 64 bits.
Relative to ULID, the time precision is reduced from 48 to 40 bits (keeping the most significant bits, so overflow still won't occur until 10889 AD), and the randomness reduced from 80 to 64 bits.

The timestamp precision at 40 bits is around 250 milliseconds. In order to have a 50% probability of collision with 64 bits of randomness, you would need to generate around **4 billion items per 250 millisecond window**.
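
As a rough check of the figures above, the sketch below works through the arithmetic using only the Python standard library. It assumes the usual birthday-bound approximation (n ≈ sqrt(2 · N · ln 2) for a 50% collision probability) and a 365.25-day year; the variable names are illustrative and not part of the `upid` package.

```python
import math

# Dropping the 8 least significant bits of a 48-bit millisecond timestamp
# leaves 40 bits, so each tick covers 2**8 milliseconds.
window_ms = 2 ** (48 - 40)

# Keeping the most significant bits preserves the full 48-bit range.
ms_per_year = 1000 * 60 * 60 * 24 * 365.25
overflow_year = 1970 + (2 ** 48) / ms_per_year

# Birthday-bound approximation: items needed within one window for a
# 50% chance that at least two share the same 64 random bits.
items_for_50_percent = math.sqrt(2 * (2 ** 64) * math.log(2))

print(window_ms)                      # 256
print(int(overflow_year))             # 10889
print(f"{items_for_50_percent:.2e}")  # 5.06e+09
```

The approximation lands at roughly 5 billion, the same order of magnitude as the "around 4 billion" (about 2^32) quoted above.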

@@ -104,8 +104,41 @@ from upid import upid
upid("user")
```

Or more explicitly:
```python
from upid import UPID
UPID.from_prefix("user")
```

Or specifying your own timestamp or datetime:
```python
import datetime
import time

milliseconds = int(time.time() * 1000)  # current Unix time in milliseconds
UPID.from_prefix_and_milliseconds("user", milliseconds)
UPID.from_prefix_and_datetime("user", datetime.datetime.now())
```

From and to a string:
```python
u = UPID.from_str("user_2accvpp5guht4dts56je5a")
u.to_str() # user_2a...
```
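
The `from_str` docstring in this commit says invalid strings (too long, too short, or containing characters outside the base32 alphabet) raise a `ValueError`. The stdlib-only sketch below illustrates that kind of check. The alphabet is copied from `ENCODE` in the Rust `b32` module; the helper name `looks_like_upid` and the exact length rule (26 encoded characters plus one underscore after the prefix) are assumptions inferred from the example ID, not the library's actual validation logic.

```python
# Alphabet as defined by ENCODE in the Rust b32 module.
ALPHABET = set("234567abcdefghijklmnopqrstuvwxyz")

# Assumption: 26 encoded characters in total (prefix plus the rest),
# with a single underscore separating the prefix for readability.
ENCODED_LEN = 26


def looks_like_upid(text: str) -> bool:
    """Hypothetical pre-check; not the library's own validation."""
    prefix, sep, rest = text.partition("_")
    if not sep:
        return False
    encoded = prefix + rest
    return len(encoded) == ENCODED_LEN and set(encoded) <= ALPHABET


print(looks_like_upid("user_2accvpp5guht4dts56je5a"))  # True
print(looks_like_upid("user_0accvpp5guht4dts56je5a"))  # False: '0' is not in the alphabet
print(looks_like_upid("user_2acc"))                    # False: too short
```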

Get stuff out:
```python
u.prefix # user
u.datetime # 2024-07-07 ...
```

Convert to other formats:
```python
int(u) # 2079795568564925668398930358940603766
u.hex # 01908dd6a3669b912738191ea3d61576
u.to_uuid() # UUID('01908dd6-a366-9b91-2738-191ea3d61576')
```
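
These formats are all views of the same 128 bits. The sketch below reproduces the relationship with the standard library only, starting from the hex value shown above and mirroring the calls visible in `py/upid/core.py` in this commit (`int.from_bytes(..., "big")` and `uuid.UUID(bytes=...)`). The variable names are illustrative.

```python
import uuid

# The 16 bytes behind the hex value shown above.
raw = bytes.fromhex("01908dd6a3669b912738191ea3d61576")

as_int = int.from_bytes(raw, "big")  # same approach as UPID.__int__
as_uuid = uuid.UUID(bytes=raw)       # same approach as UPID.to_uuid

print(len(raw) * 8)           # 128
print(as_uuid)                # 01908dd6-a366-9b91-2738-191ea3d61576
print(as_int == as_uuid.int)  # True
```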

#### Development
Code and tests are in the [py/](./py/) directory. Using [Rye](https://rye.astral.sh/) for development (installation instructions at the link).
Code and tests are in the [py/](./py/) directory.
Using [Rye](https://rye.astral.sh/) for development (installation instructions at the link).

```bash
# can be run from the repo root
@@ -118,6 +151,8 @@ If you just want to have a look around, pip should also work:
pip install -e .
```

Please open a PR if you spot a bug or improvement!

## Rust implementation
The current Rust implementation is based on [dylanhart/ulid-rs](https://github.com/dylanhart/ulid-rs), but uses the same base32 lookup method as the Python implementation.

@@ -132,6 +167,31 @@ use upid::Upid;
Upid::new("user");
```

Or specifying your own timestamp or datetime:
```rust
use std::time::SystemTime;
Upid::from_prefix_and_milliseconds("user", 1720366572288);
Upid::from_prefix_and_datetime("user", SystemTime::now());
```

From and to a string:
```rust
let u = Upid::from_string("user_2accvpp5guht4dts56je5a");
u.to_string();
```

Get stuff out:
```rust
u.prefix(); // user
u.datetime(); // 2024-07-07 ...
u.milliseconds(); // 17203...
```

Convert to other formats:
```rust
u.to_bytes();
```

#### Development
Code and tests are in the [upid_rs/](./upid_rs/) directory.

@@ -140,48 +200,80 @@ cd upid_rs
cargo check # or fmt/clippy/build/test/run
```

Please open a PR if you spot a bug or improvement!

## Postgres extension
There is also a Postgres extension built on the Rust implementation, using [pgrx](https://github.com/pgcentralfoundation/pgrx) and based on the very similar extension [pksunkara/pgx_ulid](https://github.com/pksunkara/pgx_ulid).

#### Installation
You can try out the Docker image [carderne/postgres-upid:16](https://hub.docker.com/r/carderne/postgres-upid):
The easiest would be to try out the Docker image [carderne/postgres-upid:16](https://hub.docker.com/r/carderne/postgres-upid), currently built for arm64 and amd64 but only for Postgres 16:
```bash
docker run -e POSTGRES_HOST_AUTH_METHOD=trust -p 5432:5432 carderne/postgres-upid:16
```

If you want to install it into another Postgres, you'll install pgrx and follow its [installation instructions](https://github.com/pgcentralfoundation/pgrx/blob/develop/cargo-pgrx/README.md).
Something like this:
```bash
cargo install --locked cargo-pgrx
pgrx init
cd upid_pg
pgrx install
```
You can also grab a Linux `.deb` from the [Releases](https://github.com/carderne/upid/releases) page. This is built for Postgres 16 and amd64 only.

Installable binaries will come soon.
More architectures and versions will follow once it is out of alpha.

#### Usage
```sql
CREATE EXTENSION ulid;

CREATE EXTENSION upid_pg;

CREATE TABLE users (
id upid NOT NULL DEFAULT gen_upid('user') PRIMARY KEY,
name text NOT NULL
);

INSERT INTO users (name) VALUES('Bob');

SELECT * FROM users;
-- id | name
-- -----------------------------+------
-- user_2accvpp5guht4dts56je5a | Bob
```

#### Development
Code and tests are in the [upid_pg/](./upid_pg/) directory.
You can get the raw `bytea` data, or the prefix or timestamp:
```sql
SELECT upid_to_bytea(id) FROM users;
-- \x019...

SELECT upid_to_prefix(id) FROM users;
-- 'user'

SELECT upid_to_timestamp(id) FROM users;
-- 2024-07-07 ...
```

You can convert a `UPID` to a regular Postgres `UUID`:
```sql
SELECT upid_to_uuid(gen_upid('user'));
```

Or the reverse (although the prefix and timestamp will no longer make sense):
```sql
SELECT upid_from_uuid(gen_random_uuid());
```

#### Development
If you want to install it into another Postgres, you'll need to install pgrx and follow its [installation instructions](https://github.com/pgcentralfoundation/pgrx/blob/develop/cargo-pgrx/README.md).
Something like this:
```bash
cd upid_pg
cargo install --locked cargo-pgrx
cargo pgrx init
cargo pgrx install
```

Some `cargo` commands work as normal:
```bash
cargo check # or fmt/clippy
```

But building, testing and running must be done via pgrx.
This will compile it into a Postgres installation, and allow an interactive session and tests there.

# must test/run/install with pgrx
# this will compile it into a Postgres installation
# and run the tests there, or drop you into a psql prompt
cargo pgrx test # or run/install
```bash
cargo pgrx test pg16
# or run
# or install
```
18 changes: 16 additions & 2 deletions py/upid/core.py
@@ -21,7 +21,8 @@ class UPID:
"""
The `UPID` contains a 20-bit prefix, 40-bit timestamp and 68 bits of randomness.
The prefix should only contain lower-case latin alphabet characters.
The prefix should only contain lower-case latin alphabet characters and be max
four characters long.
It is usually created using the `upid(prefix: str)` helper function:
@@ -78,10 +79,19 @@ def from_prefix_and_milliseconds(cls: type[Self], prefix: str, milliseconds: int

    @classmethod
    def from_str(cls: type[Self], string: str) -> Self:
        """
        Convert the provided `str` to a `UPID`.
        Raises a `ValueError` if the string is invalid:
        - too long
        - too short
        - contains characters not in the `ENCODE` base32 alphabet
        """
        return cls(b32.decode(string))

    @property
    def prefix(self) -> str:
        """Return just the prefix as a `str`."""
        prefix, _ = b32.encode_prefix(self.b[b32.END_RANDO_BIN :])
        return prefix

@@ -99,14 +109,18 @@ def datetime(self) -> dt.datetime:
    def hex(self) -> str:
        return self.b.hex()

    def to_str(self) -> str:
        return b32.encode(self.b)

    def to_uuid(self) -> uuid.UUID:
        """Convert to a standard Python UUID."""
        return uuid.UUID(bytes=self.b)

    def __repr__(self) -> str:
        return f"UPID({self!s})"

    def __str__(self) -> str:
        return b32.encode(self.b)
        return self.to_str()

    def __int__(self) -> int:
        return int.from_bytes(self.b, "big")
13 changes: 9 additions & 4 deletions upid_pg/src/lib.rs
@@ -1,10 +1,10 @@
//! # upid_pg
//!
//! `upid_pg` is a thin wrapper for [upid](https://crates.io/crates/upid)
//! providing the UPID datatype and generator as a Postgres extension
//!
//! The code below is based largely on the following:
//! https://github.com/pksunkara/pgx_ulid
//! providing the UPID datatype and generator as a Postgres extension.
// The code below is based largely on the following:
// https://github.com/pksunkara/pgx_ulid

use core::ffi::CStr;
use inner_upid::Upid as InnerUpid;
@@ -105,6 +105,11 @@ fn upid_to_bytea(input: upid) -> Vec<u8> {
bytes.to_vec()
}

#[pg_extern(immutable, parallel_safe)]
fn upid_to_prefix(input: upid) -> String {
InnerUpid(input.0).prefix()
}

#[pg_extern(immutable, parallel_safe)]
fn upid_to_timestamp(input: upid) -> Timestamp {
let inner_seconds = (InnerUpid(input.0).milliseconds() as f64) / 1000.0;
4 changes: 2 additions & 2 deletions upid_rs/src/b32.rs
@@ -14,13 +14,13 @@ const RANDO_CHAR_LEN: usize = 13;
const VERSION_CHAR_LEN: usize = 1;

/// Length of a string-encoded Upid
pub const CHAR_LEN: usize = 26;
const CHAR_LEN: usize = 26;

/// 32-character alphabet modified from Crockford's
/// Numbers first for sensible sorting, but full lower-case
/// latin alphabet so any sensible prefix can be used
/// Effectively a mapping from 8 bit byte -> 5 bit int -> base32 character
const ENCODE: &[u8; 32] = b"234567abcdefghijklmnopqrstuvwxyz";
pub const ENCODE: &[u8; 32] = b"234567abcdefghijklmnopqrstuvwxyz";

/// Speedy O(1) inverse lookup
/// base32 char -> ascii byte int -> base32 alphabet index