Crate genom

Fast reverse geocoding library with enriched location data. Convert coordinates to detailed place information including timezone, currency, region, and more.

Features

  • Simple API - Single function call: genom::lookup(lat, lon)
  • Rich Data - Returns 18 fields including timezone, currency, postal code, region
  • Fast Lookups - Grid-based spatial indexing for sub-millisecond queries
  • Zero Config - Database is built automatically during the first cargo build
  • Thread-Safe - Global singleton with lazy initialization
  • Compact - Efficient binary format with string interning

Quick Start

use genom;

fn main() {
    // Lookup coordinates
    if let Some(place) = genom::lookup(40.7128, -74.0060) {
        println!("{}, {}", place.city, place.country_name);
        // Output: New York, United States
    }
}

Module enrichment

Data enrichment module that adds computed fields to raw geographic data.

This module provides functionality to enrich basic place data with additional information such as:

  • Country names from ISO codes
  • Currency codes by country
  • Continent information
  • EU membership status
  • Timezone calculations (offset, abbreviation, DST status)

All enrichment data is stored in static lazy-initialized hash maps for efficient lookup.

Module types

Core data structures for geographic information.

This module defines the fundamental types used throughout the library:

  • Place - Enriched output with complete geographic context
  • Location - Simple coordinate pair with distance calculations

Struct Geocoder

pub struct Geocoder { /* private fields */ }

The core geocoding engine. Manages the spatial database and performs coordinate lookups.

Conceptual Role

Geocoder is the data-access layer for all geographic queries. It handles:

  • Zero-copy parsing of the embedded binary blob
  • Grid-based spatial indexing for O(1) lookups
  • Nearest-neighbor search across grid cells
  • String table resolution for compact storage

What This Type Does NOT Do

  • Data enrichment (handled by enrichment module)
  • Distance calculations (delegated to Location type)
  • Thread synchronization (beyond the one-time OnceLock initialization of the global instance)

Invariants

  • After construction, the database is fully loaded and valid
  • Grid keys are consistent with coordinate quantization
  • String indices into the embedded blob are bounds-checked at parse time

Thread Safety

Geocoder is Send and Sync (it has to be, because the global instance is stored in a static OnceLock). All operations are read-only after initialization, so the instance returned by Geocoder::global() can be shared across threads without locking.

Implementations

pub fn global() → &'static Self

Returns a reference to the global geocoder singleton.

Initialization

The first call parses the embedded binary blob without copying it and builds two FxHashMap indexes. Subsequent calls return the cached instance. Initialization is thread-safe via OnceLock.
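The initialize-once behavior described above follows the standard OnceLock pattern. A minimal sketch, assuming a toy read-only index: `Index`, `build` and `get` are hypothetical names, not genom's private fields, and the real Geocoder parses its embedded blob inside the initializer instead of filling a small map.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Hypothetical stand-in for the geocoder: an index that is read-only once built.
struct Index {
    cells: HashMap<u32, &'static str>,
}

impl Index {
    /// The expensive one-time construction step.
    fn build() -> Self {
        let cells = HashMap::from([(1, "Paris"), (2, "Tokyo")]);
        Index { cells }
    }

    /// Returns the process-wide instance, building it on the first call only.
    fn global() -> &'static Index {
        static INSTANCE: OnceLock<Index> = OnceLock::new();
        INSTANCE.get_or_init(Index::build)
    }

    /// Read-only query: safe to call from any thread without locking.
    fn get(&self, key: u32) -> Option<&'static str> {
        self.cells.get(&key).copied()
    }
}
```

Because get_or_init runs the initializer at most once, concurrent first callers wait for the single build to finish and then all receive the same &'static reference.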

Panics

Panics if database initialization fails (corrupted data, out of memory). This is intentional - the library cannot function without a valid database.

Examples

use genom::Geocoder;

let geocoder = Geocoder::global();
let place = geocoder.lookup(51.5074, -0.1278);

pub fn lookup(&self, latitude: f64, longitude: f64) → Option<Place>

Finds the nearest place to the given coordinates.

Algorithm

  1. Quantize coordinates into the 0.1° city grid cell
  2. Expanding-ring search across neighboring cells until a candidate is found
  3. Refine the postal code from the matching country's 0.01° postal grid
  4. Enrich with country/currency/continent/timezone metadata
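Steps 1 and 2 hinge on quantizing coordinates into cells and searching outward ring by ring. The sketch below covers only those two steps, under stated assumptions: the row-major u32 key scheme, the `Grid` type and the in-memory point list are hypothetical (genom stores its cells in a varint blob), and candidates are ranked by plain squared degree difference with no longitude wraparound correction.

```rust
use std::collections::HashMap;

const CELL_DEG: f64 = 0.1; // city grid resolution from the docs
const COLS: i64 = 3600; // 360° / 0.1°
const ROWS: i64 = 1800; // 180° / 0.1°

/// Quantizes a coordinate into its (row, col) cell; columns wrap at ±180°.
fn cell_of(lat: f64, lon: f64) -> (i64, i64) {
    let row = ((lat + 90.0) / CELL_DEG).floor() as i64;
    let col = ((lon + 180.0) / CELL_DEG).floor() as i64;
    (row.clamp(0, ROWS - 1), col.rem_euclid(COLS))
}

/// Hypothetical key scheme: row-major index of the cell.
fn key(row: i64, col: i64) -> u32 {
    (row * COLS + col.rem_euclid(COLS)) as u32
}

struct Grid {
    cells: HashMap<u32, Vec<(f64, f64, &'static str)>>,
}

impl Grid {
    fn new(points: &[(f64, f64, &'static str)]) -> Self {
        let mut cells: HashMap<u32, Vec<_>> = HashMap::new();
        for &(lat, lon, name) in points {
            let (r, c) = cell_of(lat, lon);
            cells.entry(key(r, c)).or_default().push((lat, lon, name));
        }
        Grid { cells }
    }

    /// Collects points from the cells at Chebyshev distance `ring` from (r0, c0).
    fn ring_candidates(&self, r0: i64, c0: i64, ring: i64, out: &mut Vec<(f64, f64, &'static str)>) {
        for dr in -ring..=ring {
            for dc in -ring..=ring {
                if dr.abs() != ring && dc.abs() != ring {
                    continue; // interior cells were visited by smaller rings
                }
                let r = r0 + dr;
                if !(0..ROWS).contains(&r) {
                    continue;
                }
                if let Some(v) = self.cells.get(&key(r, c0 + dc)) {
                    out.extend_from_slice(v);
                }
            }
        }
    }

    /// Expands ring by ring until something is found, then checks one more ring.
    fn nearest(&self, lat: f64, lon: f64, max_ring: i64) -> Option<&'static str> {
        let (r0, c0) = cell_of(lat, lon);
        let mut found = Vec::new();
        let mut ring = 0;
        while ring <= max_ring && found.is_empty() {
            self.ring_candidates(r0, c0, ring, &mut found);
            ring += 1;
        }
        if found.is_empty() {
            return None; // e.g. open ocean: nothing within max_ring cells
        }
        self.ring_candidates(r0, c0, ring, &mut found);
        found
            .into_iter()
            .min_by(|a, b| {
                let da = (a.0 - lat).powi(2) + (a.1 - lon).powi(2);
                let db = (b.0 - lat).powi(2) + (b.1 - lon).powi(2);
                da.total_cmp(&db)
            })
            .map(|p| p.2)
    }
}
```

Searching one ring past the first hit is a choice made in this sketch: a point just across a cell edge can be closer than a corner hit in the current ring. It narrows the gap but is not an exact guarantee; an exact search keeps expanding until a ring's minimum possible distance exceeds the best candidate.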

Returns

Some(Place) if a location is found within the search radius, None otherwise. Ocean coordinates typically return None unless they are near coastal cities.

Examples

use genom::Geocoder;

let geocoder = Geocoder::global();

// Paris, France
let place = geocoder.lookup(48.8566, 2.3522).unwrap();
assert_eq!(place.city, "Paris");
assert_eq!(place.country_code, "FR");

Struct Place

pub struct Place {
    pub city: String,
    pub region: String,
    pub region_code: String,
    pub district: String,
    pub country_code: String,
    pub country_name: String,
    pub postal_code: String,
    pub timezone: String,
    pub timezone_abbr: String,
    pub utc_offset: i32,
    pub utc_offset_str: String,
    pub latitude: f64,
    pub longitude: f64,
    pub currency: String,
    pub continent_code: String,
    pub continent_name: String,
    pub is_eu: bool,
    pub dst_active: bool,
}

The enriched output type containing complete geographic context for a location.

This struct is returned by lookup() and contains 18 fields providing comprehensive information about a geographic location.

Fields

city: String

City or locality name (e.g., "New York", "Tokyo", "Paris")

region: String

State, province, or administrative region full name (e.g., "California", "Tokyo", "Île-de-France")

region_code: String

ISO 3166-2 region code (e.g., "CA" for California, "13" for Tokyo)

district: String

County, district, or sub-region (e.g., "Los Angeles County", "Chiyoda")

country_code: String

ISO 3166-1 alpha-2 country code (e.g., "US", "JP", "FR")

country_name: String

Full country name (e.g., "United States", "Japan", "France")

postal_code: String

Postal or ZIP code (e.g., "10001", "100-0001", "75001")

timezone: String

IANA timezone identifier (e.g., "America/New_York", "Asia/Tokyo", "Europe/Paris")

timezone_abbr: String

Current timezone abbreviation (e.g., "EST", "JST", "CET"). Changes based on DST.

utc_offset: i32

Current UTC offset in seconds (e.g., -18000 for UTC-5, 32400 for UTC+9)

utc_offset_str: String

Formatted UTC offset string (e.g., "UTC-5", "UTC+9", "UTC+5:30")

latitude: f64

Precise latitude coordinate in decimal degrees (-90 to 90)

longitude: f64

Precise longitude coordinate in decimal degrees (-180 to 180)

currency: String

ISO 4217 currency code (e.g., "USD", "JPY", "EUR")

continent_code: String

Two-letter continent code (e.g., "NA" for North America, "AS" for Asia, "EU" for Europe)

continent_name: String

Full continent name (e.g., "North America", "Asia", "Europe")

is_eu: bool

Whether the location is in a European Union member state

dst_active: bool

Whether daylight saving time is currently active for this location
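The utc_offset and utc_offset_str fields carry the same value in two forms. A minimal sketch of the conversion, assuming the string format is exactly what the field examples show ("UTC-5", "UTC+9", "UTC+5:30"); `format_utc_offset` is a hypothetical helper, and how genom renders a zero offset is not documented.

```rust
/// Formats an offset in seconds east of UTC as "UTC±H" or "UTC±H:MM".
/// Hypothetical helper; genom's own formatting may differ in edge cases.
fn format_utc_offset(offset_secs: i32) -> String {
    let sign = if offset_secs < 0 { '-' } else { '+' };
    let abs = offset_secs.unsigned_abs();
    let hours = abs / 3600;
    let minutes = (abs % 3600) / 60;
    if minutes == 0 {
        format!("UTC{sign}{hours}")
    } else {
        format!("UTC{sign}{hours}:{minutes:02}")
    }
}
```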

Examples

use genom;

let place = genom::lookup(40.7128, -74.0060).unwrap();

println!("City: {}", place.city);
println!("Country: {}", place.country_name);
println!("Timezone: {} ({})", place.timezone, place.timezone_abbr);
println!("Currency: {}", place.currency);
println!("EU Member: {}", place.is_eu);

Struct Location

pub struct Location {
    pub latitude: f64,
    pub longitude: f64,
}

A coordinate pair with distance calculation capabilities.

This is a simple wrapper around latitude and longitude coordinates that provides utility methods for geographic calculations.

Fields

latitude: f64

Latitude in decimal degrees (-90 to 90)

longitude: f64

Longitude in decimal degrees (-180 to 180)

Implementations

pub fn new(latitude: f64, longitude: f64) → Self

Constructs a new Location from coordinates.

Examples

use genom::Location;

let loc = Location::new(40.7128, -74.0060);
assert_eq!(loc.latitude, 40.7128);
assert_eq!(loc.longitude, -74.0060);

pub fn distance_to(&self, other: &Location) → f64

Calculates the great-circle distance to another location using the haversine formula.

Returns the distance in kilometers. This calculation assumes a spherical Earth with radius 6371 km, which provides accuracy within 0.5% for most distances.

Examples

use genom::Location;

let nyc = Location::new(40.7128, -74.0060);
let la = Location::new(34.0522, -118.2437);

let distance = nyc.distance_to(&la);
assert!(distance > 3900.0 && distance < 4000.0); // ~3936 km
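For reference, the haversine computation behind distance_to fits in a few lines. This is a standalone sketch using the same 6371 km spherical radius; `haversine_km` is a hypothetical free function, not part of genom's API.

```rust
/// Mean Earth radius in kilometers, as stated in the docs.
const EARTH_RADIUS_KM: f64 = 6371.0;

/// Great-circle distance via the haversine formula (spherical Earth).
fn haversine_km(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lon2 - lon1).to_radians();
    // a = sin²(Δφ/2) + cos φ1 · cos φ2 · sin²(Δλ/2)
    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);
    // c = 2 · asin(√a); the clamp guards against rounding pushing √a above 1.
    let c = 2.0 * a.sqrt().min(1.0).asin();
    EARTH_RADIUS_KM * c
}
```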

Reference Binary format

The compiled database is a single contiguous blob (geo.bin, ~37 MB) embedded into your binary via include_bytes! and exposed as &'static [u8]. The previously public storage types (CompactPlace, Database) have been removed: there is no bincode, no Vec deserialization, and no decompression.

Layout

  1. Header: magic "GEO1", version, 12× u32 section offsets/lengths
  2. Strings: interned UTF-8 table with u32 offsets
  3. Country codes: packed 2-byte ISO 3166-1 alpha-2 codes indexed by u16
  4. City grid: FxHashMap<u32, u32> cell → byte offset, varint+zigzag deltas
  5. Cities blob: per-cell varint stream of (Δlat, Δlon, name, a1, a2, a1c, tz, cc)
  6. Postal directory: per-country byte ranges
  7. Postal sections: country-local string table + 0.01° cell grid + delta-encoded entries
  8. Country polygon dir + blob: Natural Earth simplified rings with bbox, varint deltas
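Several sections in this layout store integers as varint+zigzag deltas. The docs do not pin down the exact byte format, so this sketch assumes LEB128-style varints (7 payload bits per byte, continuation flag in the high bit) and the common zigzag mapping; all function names are hypothetical.

```rust
/// Maps signed to unsigned so small magnitudes stay small: 0→0, -1→1, 1→2, -2→3.
fn zigzag(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

fn unzigzag(z: u64) -> i64 {
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}

/// LEB128-style varint: 7 payload bits per byte, high bit = "more bytes follow".
fn write_varint(mut v: u64, out: &mut Vec<u8>) {
    while v >= 0x80 {
        out.push((v as u8) | 0x80);
        v >>= 7;
    }
    out.push(v as u8);
}

/// Reads one varint starting at `*pos`; returns None on truncated or oversized input.
fn read_varint(buf: &[u8], pos: &mut usize) -> Option<u64> {
    let mut result = 0u64;
    let mut shift = 0;
    loop {
        let byte = *buf.get(*pos)?;
        *pos += 1;
        result |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some(result);
        }
        shift += 7;
        if shift >= 64 {
            return None;
        }
    }
}

/// Encodes a sequence (e.g. microdegree coordinates) as zigzag varint deltas.
fn encode_deltas(values: &[i64]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0i64;
    for &v in values {
        write_varint(zigzag(v - prev), &mut out);
        prev = v;
    }
    out
}

/// Inverse of encode_deltas: accumulates decoded deltas back into values.
fn decode_deltas(buf: &[u8]) -> Option<Vec<i64>> {
    let mut pos = 0;
    let mut prev = 0i64;
    let mut values = Vec::new();
    while pos < buf.len() {
        prev += unzigzag(read_varint(buf, &mut pos)?);
        values.push(prev);
    }
    Some(values)
}
```

Zigzag matters because deltas between nearby cities are small but can be negative; without it, a delta of -1 would be encoded from its two's-complement bit pattern and take ten bytes.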

Lookup hot path

  • Zero-copy slicing of the embedded blob — no allocations per query
  • Two FxHashMap indexes built once at startup (grid + postal_by_cc)
  • Distance computed in fixed-point i64 squared microdegrees
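Fixed-point distances let the hot path rank candidates without floating point or square roots. A sketch of the idea, assuming coordinates are rounded to integer microdegrees and compared by plain squared planar distance (whether genom applies a cos(latitude) correction to longitude is not stated); the helper names are hypothetical.

```rust
/// Converts decimal degrees to integer microdegrees (1e-6 degree units).
fn to_micro(deg: f64) -> i64 {
    (deg * 1_000_000.0).round() as i64
}

/// Squared planar distance in microdegrees², computed entirely in i64.
/// Worst case (Δlat = 180°, Δlon = 360°) is about 1.6e17, far below i64::MAX (~9.2e18).
fn dist2_micro(a: (i64, i64), b: (i64, i64)) -> i64 {
    let dlat = a.0 - b.0;
    let dlon = a.1 - b.1;
    dlat * dlat + dlon * dlon
}

/// Index of the nearest candidate; squared values rank the same as true distances.
fn nearest(query: (i64, i64), candidates: &[(i64, i64)]) -> Option<usize> {
    candidates
        .iter()
        .enumerate()
        .min_by_key(|&(_, c)| dist2_micro(query, *c))
        .map(|(i, _)| i)
}
```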

Build pipeline

build.rs downloads GeoNames cities500, admin1/2, countryInfo, allCountries.zip (postal) and Natural Earth country polygons, then runs the encoder in build/builder.rs to emit OUT_DIR/geo.bin. Downloads are cached in OUT_DIR/geonames-cache/. Skip the build with --features no-build-database (lookups then return None).

Struct PlaceInput

pub struct PlaceInput<'a> {
    pub city: &'a str,
    pub region: &'a str,
    pub region_code: &'a str,
    pub district: &'a str,
    pub country_code: &'a str,
    pub postal_code: &'a str,
    pub timezone: &'a str,
    pub latitude: f64,
    pub longitude: f64,
}

Input structure for the enrich_place function.

This struct contains the basic geographic data that will be enriched with additional computed fields (country name, currency, continent, timezone details, etc.).

Uses borrowed string slices to avoid unnecessary allocations during the enrichment process.

Fields

city: &'a str

City name

region: &'a str

Region/state name

region_code: &'a str

Region code

district: &'a str

District/county name

country_code: &'a str

ISO 3166-1 alpha-2 country code

postal_code: &'a str

Postal/ZIP code

timezone: &'a str

IANA timezone identifier

latitude: f64

Latitude coordinate

longitude: f64

Longitude coordinate

Function lookup

pub fn lookup(
    latitude: f64,
    longitude: f64
) → Option<Place>

Performs reverse geocoding on the given coordinates, returning enriched place data if found.

Conceptual Role

This is the primary entry point for all geocoding operations. It abstracts away database access, spatial indexing, and data enrichment into a single call.

What This Function Does

  • Accesses the global geocoder singleton (lazy initialization on first call)
  • Performs grid-based spatial lookup to find nearest place
  • Enriches raw data with timezone, currency, and regional information
  • Returns None if no place found within search radius

Thread Safety

This function is thread-safe and can be called concurrently from multiple threads. The underlying database is initialized once and shared via a static OnceLock.

Performance

Sub-microsecond per lookup in steady state. First call incurs a small one-shot index build (~5 ms). Subsequent calls are lock-free reads.

Examples

use genom;

// Tokyo coordinates
if let Some(place) = genom::lookup(35.6762, 139.6503) {
    println!("{}, {}", place.city, place.country_name);
    println!("Timezone: {}", place.timezone);
    println!("Currency: {}", place.currency);
}

// Ocean coordinates return None
assert!(genom::lookup(0.0, -160.0).is_none());

Function enrich_place

pub fn enrich_place(input: PlaceInput) → Place

Enriches basic place data with computed fields.

This function takes a PlaceInput containing basic geographic information and returns a fully enriched Place with additional computed fields.

Enrichment Process

  1. Timezone Parsing: Parses the IANA timezone to extract current offset, abbreviation, and DST status using chrono-tz
  2. Country Lookup: Maps country code to full country name using static hash map
  3. Currency Lookup: Maps country code to ISO 4217 currency code
  4. Continent Lookup: Maps country code to continent code and name
  5. EU Status: Checks if country is an EU member state

Static Data Sources

All enrichment data is stored in static LazyLock<FxHashMap> instances:

  • COUNTRY_NAMES - 200+ country code to name mappings
  • COUNTRY_CURRENCIES - 200+ country code to currency mappings
  • COUNTRY_CONTINENTS - 200+ country code to continent mappings
  • CONTINENT_NAMES - 7 continent code to name mappings
  • EU_COUNTRIES - 27 EU member states
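These static tables follow the lazy-initialization pattern that the standard library now provides as std::sync::LazyLock (Rust 1.80+). A trimmed sketch with a few illustrative entries: std's HashMap and HashSet stand in for FxHashMap, and `describe` is a hypothetical helper, not genom's API.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::LazyLock;

// genom uses FxHashMap (rustc-hash); std collections keep this sketch dependency-free.
static COUNTRY_NAMES: LazyLock<HashMap<&'static str, &'static str>> = LazyLock::new(|| {
    HashMap::from([("US", "United States"), ("JP", "Japan"), ("FR", "France")])
});

static COUNTRY_CURRENCIES: LazyLock<HashMap<&'static str, &'static str>> =
    LazyLock::new(|| HashMap::from([("US", "USD"), ("JP", "JPY"), ("FR", "EUR")]));

static EU_COUNTRIES: LazyLock<HashSet<&'static str>> =
    LazyLock::new(|| HashSet::from(["FR", "DE", "IT"]));

/// Each table is built on its first dereference and reused afterwards.
fn describe(cc: &str) -> Option<(&'static str, &'static str, bool)> {
    let name = *COUNTRY_NAMES.get(cc)?;
    let currency = *COUNTRY_CURRENCIES.get(cc)?;
    Some((name, currency, EU_COUNTRIES.contains(cc)))
}
```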

DST Detection

DST status is determined by comparing the current UTC offset with the minimum offset observed in January and July. If the current offset differs from the minimum, DST is active.
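The January/July comparison reduces to a small pure function. A sketch, assuming the three offsets have already been obtained (genom uses chrono-tz for that step); `dst_active` is a hypothetical name. Taking the smaller of the two sample offsets identifies standard time in both hemispheres, since January is summer in one and winter in the other.

```rust
/// DST heuristic: standard time is the smaller of the January and July offsets;
/// any other current offset means DST is in effect.
/// Offsets are seconds east of UTC (the same unit as Place::utc_offset).
fn dst_active(current: i32, january: i32, july: i32) -> bool {
    current != january.min(july)
}
```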

Examples

use genom::enrichment::{enrich_place, PlaceInput};

let input = PlaceInput {
    city: "New York",
    region: "New York",
    region_code: "NY",
    district: "New York County",
    country_code: "US",
    postal_code: "10001",
    timezone: "America/New_York",
    latitude: 40.7128,
    longitude: -74.0060,
};

let place = enrich_place(input);
assert_eq!(place.country_name, "United States");
assert_eq!(place.currency, "USD");
assert_eq!(place.continent_name, "North America");
assert!(!place.is_eu);

Performance & Implementation Details

Memory Layout

  • Single blob: ~37 MB geo.bin embedded via include_bytes!, lives in .rodata
  • String interning: shared UTF-8 table, every string field is a u32 offset
  • Varint + zigzag deltas: coordinates and IDs stored as 1–3 byte sequences
  • Two-level grid: 0.1° cells for cities, per-country 0.01° cells for postal codes
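String interning stores each distinct string once and replaces every string field with a u32 offset. A sketch, assuming a u32 little-endian length prefix before each string (the actual delimiter in geo.bin is not documented); `StringTable` and `resolve` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical interner: distinct strings live once in a UTF-8 byte buffer.
#[derive(Default)]
struct StringTable {
    bytes: Vec<u8>,
    seen: HashMap<String, u32>,
}

impl StringTable {
    /// Returns the existing offset for `s`, or appends it and returns the new one.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&off) = self.seen.get(s) {
            return off;
        }
        let off = u32::try_from(self.bytes.len()).expect("string table exceeds 4 GiB");
        self.bytes.extend_from_slice(&(s.len() as u32).to_le_bytes());
        self.bytes.extend_from_slice(s.as_bytes());
        self.seen.insert(s.to_owned(), off);
        off
    }
}

/// Zero-copy read: the returned &str borrows straight from the table bytes.
fn resolve(table: &[u8], off: u32) -> Option<&str> {
    let start = off as usize;
    let len_bytes: [u8; 4] = table.get(start..start + 4)?.try_into().ok()?;
    let len = u32::from_le_bytes(len_bytes) as usize;
    std::str::from_utf8(table.get(start + 4..start + 4 + len)?).ok()
}
```

Because resolve only slices the buffer, reads stay allocation-free when the table lives in an embedded &'static [u8].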

Initialization

  1. include_bytes! bakes geo.bin into the executable
  2. On first lookup, the header is parsed and each section is exposed as a &'static [u8] slice
  3. Two FxHashMap indexes are built (city grid, postal-by-country)
  4. The Geocoder is cached in a OnceLock for the lifetime of the process

Thread Safety

After initialization, all operations are read-only and require no synchronization. Multiple threads can perform lookups concurrently without contention.

Build Process

build.rs generates the database during cargo build:

  1. Download GeoNames cities500, admin1/2, countryInfo, allCountries
  2. Download Natural Earth ne_110m_admin_0_countries shapefile
  3. Parse, simplify polygon rings (Douglas–Peucker)
  4. Intern strings, group cities by 0.1° cell, group postal codes per country
  5. Emit a single varint-encoded blob to OUT_DIR/geo.bin
  6. The library embeds the result with include_bytes!

Downloads are cached in OUT_DIR/geonames-cache/, so a rebuild without network access works as long as that cache still exists (cargo clean deletes it, since OUT_DIR lives under target/). The first build takes roughly 30–60 s.
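The polygon simplification in step 3 uses Douglas–Peucker. A minimal recursive sketch on planar coordinate pairs, assuming a distance tolerance `epsilon` in degrees; the function names are hypothetical and the builder's actual tolerance is not documented. For a closed ring, whose first and last points coincide, the helper falls back to point-to-point distance.

```rust
/// Perpendicular distance from p to the line through a and b (planar).
fn perp_dist(p: (f64, f64), a: (f64, f64), b: (f64, f64)) -> f64 {
    let (dx, dy) = (b.0 - a.0, b.1 - a.1);
    let len = (dx * dx + dy * dy).sqrt();
    if len == 0.0 {
        return ((p.0 - a.0).powi(2) + (p.1 - a.1).powi(2)).sqrt();
    }
    // |cross(b - a, p - a)| / |b - a|
    (dx * (p.1 - a.1) - dy * (p.0 - a.0)).abs() / len
}

/// Keeps the endpoints; if the farthest interior point deviates more than
/// `epsilon`, keeps it too and recurses on both halves.
fn douglas_peucker(points: &[(f64, f64)], epsilon: f64) -> Vec<(f64, f64)> {
    if points.len() < 3 {
        return points.to_vec();
    }
    let (first, last) = (points[0], points[points.len() - 1]);
    let (mut max_d, mut idx) = (0.0, 0);
    for (i, &p) in points.iter().enumerate().take(points.len() - 1).skip(1) {
        let d = perp_dist(p, first, last);
        if d > max_d {
            max_d = d;
            idx = i;
        }
    }
    if idx == 0 || max_d <= epsilon {
        return vec![first, last];
    }
    let mut left = douglas_peucker(&points[..=idx], epsilon);
    let right = douglas_peucker(&points[idx..], epsilon);
    left.pop(); // points[idx] ends `left` and starts `right`
    left.extend(right);
    left
}
```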