lua-resty-txid

Generate sortable, unique transaction or request IDs.

$ opm get GUI/lua-resty-txid

lua-resty-txid

[!CircleCI](https://circleci.com/gh/GUI/lua-resty-txid)

lua-resty-txid provides a function that can be used to generate unique transaction/request IDs for OpenResty/nginx. The IDs can be used to correlate logs or upstream requests and have the following characteristics:

  • 20 characters

  • base32hex encoded

  • Temporally and lexically sortable

  • Case insensitive

  • 96 bit identifier

lua-resty-txid is a LuaJIT port of ngx\_txid for OpenResty (or nginx with ngx\_lua). The IDs generated by lua-resty-txid follow the exact same pattern and are compatible with ngx\_txid.

Installation

Via OPM:

    opm get GUI/lua-resty-txid

Or via LuaRocks:

    luarocks install lua-resty-txid

Usage

A single txid() Lua function is exposed by this module to generate IDs:

    local txid = require "resty.txid"
    local id = txid() -- b2g6q94qdn6h84an7vfg

Each time txid() is called, a new, unique ID will be returned, so you will need to cache the result if you wish to reuse the same ID in multiple places for a single request. Depending on your usage, `ngx.ctx` or `set_by_lua` offer some simple options for caching the value on a per-request basis.

    txid() -- b2g83t2oshrg092mjggg
    txid() -- b2g83t2oodncokuges00
    
    ngx.ctx.txid = txid() -- b2g83t2od939mdvb2l0g
    ngx.ctx.txid          -- b2g83t2od939mdvb2l0g

Finally, txid() accepts an optional argument for what timestamp (in milliseconds) to use when generating the ID. By default, the current timestamp is used. Since the resulting IDs are temporally and lexically sortable, this can be used to generate IDs that will be sorted based on a previous date or time.

    local timestamp_ms = 655829050000 -- 1990-10-13 14:44:10
    txid(timestamp_ms) -- 4om9qi54la8ffr4bd9sg
    
    local timestamp_ms = 655929050000 -- 1990-10-14 12:30:50
    txid(timestamp_ms) -- 4on1lg74nt0ud2ssllu0

Example

A more complete example, with caching, setting request/response headers, and integration with nginx's logging:

    http {
      log_format agent "$lua_txid $http_user_agent";
      log_format addr "$lua_txid $remote_addr";
    
      init_by_lua_block {
        # Pre-load the module.
        require "resty.txid"
      }
    
      server {
        listen 8080;
        access_log logs/agents.log agent;
        access_log logs/addrs.log addr;
    
        # Set an nginx variable that is cached per request and can be used in the
        # nginx log_format.
        set_by_lua_block $lua_txid {
          local txid = require "resty.txid"
          return txid()
        }
    
        location / {
          # Set a header on the response providing the ID.
          more_set_headers "X-Request-Id: $lua_txid";
    
          # Set a header on the request providing the ID (which will be sent to the
          # proxied upstream).
          more_set_input_headers "X-Request-Id: $lua_txid";
    
          proxy_pass http://localhost:8081;
        }
      }
    }

Performance

Benchmarks indicate that performance is equivalent to the ngx\_txid C extension.

Design

The transaction ID design is a direct port of ngx\_txid, so here's all the original information about the design from ngx\_txid:

Background

The design of this transaction ID should meet the following requirements:

  • Be roughly numerically temporally sortable with ~second granularity.

  • Have a representation that is roughly lexically sortable with ~second granularity.

  • Have a probability of less than 1e-9 for collision at 1 million transactions per second.

  • Be efficient and easy to decode into fixed size C types

  • Always be available at the risk of higher collision probability

  • Use as few bytes as possible

  • Work with IPv4 and IPv6 networks

Technique

Use a monotonic millisecond resolution clock in the high 42 bits and system entropy for the low 54 bits. Use enough entropy bits to satisfy a collision probability at a desired global request rate.

    +------------- 64 bits------------+--- 32 bits ----+
    +------ 42 bits ------+--22 bits--|----------------+
    | msec since 1970-1-1 | random    | random         |
    +---------------------+-----------+----------------+

A request rate of 1 million per second across all servers means 1000 random values per millisecond. Estimating the collision probability using the birthday paradox can be done with this formula: 1 - e^(-((m^2)/(2*n))) where m is the number of ids and n is the number of random values possible.

When using 54 bits of entropy:

    1mil req/s  = 1 - exp(-((1000^2) /(2*2^54))) = 2.775558e-11
    10mil req/s = 1 - exp(-((10000^2)/(2*2^54))) = 2.775558e-09

The odds of collision are small even at 10 million requests per second.

Nginx keeps track of the current clock in increments of the configuration directive timer_resolution. The clock resolution for $txid is 1ms, so a timer resolution greater than 1ms means that the probability of collision will increase. If you have a timer_resolution of 10ms, 1 million requests per second would require 10,000 random values per second in the worst case.

Encoding

base32hex is used with a lower case alphabet and without padding characters is chosen for the following reasons:

  • Lexically sort order equivalent to numeric sort order

  • Case insensitive equality

  • Lower case is easer for visual compares

  • Denser than hex encoding by 4 bytes

Other techniques

  • snowflake: Uses time(41) + unique id(10) + sequence(12).

    • Pro: Guaranteed unique sequences

    • Pro: Fits in 63 bits

    • Cons: Requires unique id coordination for each server - 16 workers processes per host means a limit of 64 instances of nginx

    • Cons: Only 11 bits available for unique id, needs monitoring

    • Cons: Total ordering only possible in the same process

    • Cons: Service interruption possible when clocks lose synchronization

  • flake: Uses time + mac id + sequence.

    • Pro: Guaranteed unique sequences

    • Cons: Uses 128 bits

    • Cons: Wastes 22 bits of timestamp data

    • Cons: Only a single process per host can generate ids - needs to synchronize access to the sequence from each worker process

    • Cons: Service interruption possible when clocks lose synchronization

    • Cons: Seeds cross platform MAC Address lookup.

  • UUIDv4: 122 bits of entropy

    • Pro: Very low probability of collision

    • Cons: Unsortable

  • UUID with timestamp: 48 bits of time + 74 bits entropy

    • Pro: Very low probability of collision

    • Cons: String representation is not temporally local

  • httpd mod\_unique\_id: Host ip(32) + pid(32) + time(32) + sequence (16) + thread id (32)

    • Pro: Deterministic

    • Cons: Uses 144 bits

    • Cons: Assumes unique IPv4 for the hostnamme's interface

    • Cons: Unsortable case-sensitive custom representation - base64 with a custom alphabet

    • Cons: Hard limit of 65535 ids per second per pid - small tolerance for clock steps

Development

After checking out the repo, Docker can be used to run the test suite:

    docker-compose run --rm app make test

Release Process

To publish releases to OPM and LuaRocks:

    VERSION=x.x.x make release

Credits

Credit for the original design goes to ngx\_txid. This is simply a pure LuaJIT port for easier installation in OpenResty.

Authors

Nick Muerdter

License

mit

Dependencies

luajit

Versions