Crate genom

Fast reverse geocoding library with enriched location data. Convert coordinates to detailed place information including timezone, currency, region, and more.

Features

  • Simple API - Single function call: genom::lookup(lat, lon)
  • Rich Data - Returns 18 fields including timezone, currency, postal code, region
  • Fast Lookups - Grid-based spatial indexing for sub-millisecond queries
  • Zero Config - Database builds automatically during the first cargo build
  • Thread-Safe - Global singleton with lazy initialization
  • Compact - Efficient binary format with string interning

Quick Start

use genom;

fn main() {
    // Lookup coordinates
    if let Some(place) = genom::lookup(40.7128, -74.0060) {
        println!("{}, {}", place.city, place.country_name);
        // Output: New York, United States
    }
}

Module enrichment

Data enrichment module that adds computed fields to raw geographic data.

This module provides functionality to enrich basic place data with additional information such as:

  • Country names from ISO codes
  • Currency codes by country
  • Continent information
  • EU membership status
  • Timezone calculations (offset, abbreviation, DST status)

All enrichment data is stored in static lazy-initialized hash maps for efficient lookup.

Module types

Core data structures for geographic information.

This module defines the fundamental types used throughout the library:

  • Place - Enriched output with complete geographic context
  • Location - Simple coordinate pair with distance calculations
  • CompactPlace - Compressed storage format using string table indices
  • Database - Complete spatial database with grid index

Struct Geocoder

pub struct Geocoder { /* private fields */ }

The core geocoding engine. Manages the spatial database and performs coordinate lookups.

Conceptual Role

Geocoder is the query engine behind all geographic lookups. It handles:

  • Database initialization and decompression
  • Grid-based spatial indexing for O(1) lookups
  • Nearest-neighbor search across grid cells
  • String table resolution for compact storage

What This Type Does NOT Do

  • Data enrichment (handled by enrichment module)
  • Distance calculations (delegated to Location type)
  • Thread synchronization (uses OnceLock for initialization)

Invariants

  • After construction, the database is fully loaded and valid
  • Grid keys are consistent with coordinate quantization
  • String indices in CompactPlace are valid into strings vector

Thread Safety

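Geocoder holds only immutable data after construction, so the global instance accessed via Geocoder::global() is safe to use from multiple threads: all operations are read-only after initialization. Because the singleton lives in a static OnceLock, which requires its contents to be both Send and Sync, Geocoder must be Sync as well as Send.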

Implementations

pub fn global() -> &'static Self

Returns a reference to the global geocoder singleton.

Initialization

First call initializes the database by decompressing the embedded binary data. Subsequent calls return the cached instance. Initialization is thread-safe via OnceLock.

Panics

Panics if database initialization fails (corrupted data, out of memory). This is intentional - the library cannot function without a valid database.

Examples

use genom::Geocoder;

let geocoder = Geocoder::global();
let place = geocoder.lookup(51.5074, -0.1278);

pub fn lookup(&self, latitude: f64, longitude: f64) -> Option<Place>

Finds the nearest place to the given coordinates.

Algorithm

  1. Quantize coordinates to grid key (0.1° resolution)
  2. Search target cell and 8 neighboring cells
  3. Calculate haversine distance to all candidates
  4. Return nearest place, enriched with metadata

Returns

Some(Place) if a location is found within search radius, None otherwise. Ocean coordinates typically return None unless near coastal cities.

Examples

use genom::Geocoder;

let geocoder = Geocoder::global();

// Paris, France
let place = geocoder.lookup(48.8566, 2.3522).unwrap();
assert_eq!(place.city, "Paris");
assert_eq!(place.country_code, "FR");

Struct Place

pub struct Place {
    pub city: String,
    pub region: String,
    pub region_code: String,
    pub district: String,
    pub country_code: String,
    pub country_name: String,
    pub postal_code: String,
    pub timezone: String,
    pub timezone_abbr: String,
    pub utc_offset: i32,
    pub utc_offset_str: String,
    pub latitude: f64,
    pub longitude: f64,
    pub currency: String,
    pub continent_code: String,
    pub continent_name: String,
    pub is_eu: bool,
    pub dst_active: bool,
}

The enriched output type containing complete geographic context for a location.

This struct is returned by lookup() and contains 18 fields providing comprehensive information about a geographic location.

Fields

city: String

City or locality name (e.g., "New York", "Tokyo", "Paris")

region: String

State, province, or administrative region full name (e.g., "California", "Tokyo", "Île-de-France")

region_code: String

Region code, given as the subdivision part of the ISO 3166-2 code (e.g., "CA" for California, whose full code is "US-CA"; "13" for Tokyo, whose full code is "JP-13")

district: String

County, district, or sub-region (e.g., "Los Angeles County", "Chiyoda")

country_code: String

ISO 3166-1 alpha-2 country code (e.g., "US", "JP", "FR")

country_name: String

Full country name (e.g., "United States", "Japan", "France")

postal_code: String

Postal or ZIP code (e.g., "10001", "100-0001", "75001")

timezone: String

IANA timezone identifier (e.g., "America/New_York", "Asia/Tokyo", "Europe/Paris")

timezone_abbr: String

Current timezone abbreviation (e.g., "EST", "JST", "CET"). Changes with DST: New York reports "EDT" instead of "EST" while daylight saving time is active.

utc_offset: i32

Current UTC offset in seconds (e.g., -18000 for UTC-5, 32400 for UTC+9)

utc_offset_str: String

Formatted UTC offset string (e.g., "UTC-5", "UTC+9", "UTC+5:30")

latitude: f64

Precise latitude coordinate in decimal degrees (-90 to 90)

longitude: f64

Precise longitude coordinate in decimal degrees (-180 to 180)

currency: String

ISO 4217 currency code (e.g., "USD", "JPY", "EUR")

continent_code: String

Two-letter continent code (e.g., "NA" for North America, "AS" for Asia, "EU" for Europe)

continent_name: String

Full continent name (e.g., "North America", "Asia", "Europe")

is_eu: bool

Whether the location is in a European Union member state

dst_active: bool

Whether daylight saving time is currently active for this location

Examples

use genom;

let place = genom::lookup(40.7128, -74.0060).unwrap();

println!("City: {}", place.city);
println!("Country: {}", place.country_name);
println!("Timezone: {} ({})", place.timezone, place.timezone_abbr);
println!("Currency: {}", place.currency);
println!("EU Member: {}", place.is_eu);
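The relationship between utc_offset (seconds) and utc_offset_str described in the field list can be sketched as below. This is a minimal illustration that only reproduces the documented examples ("UTC-5", "UTC+9", "UTC+5:30"); format_utc_offset is a hypothetical helper, not a function exported by the crate, and the crate's own formatting code may differ.

```rust
/// Hypothetical helper: formats an offset in seconds as "UTC+H" or "UTC+H:MM".
/// Assumption: minutes are shown only when they are non-zero.
fn format_utc_offset(offset_secs: i32) -> String {
    let sign = if offset_secs < 0 { '-' } else { '+' };
    // unsigned_abs avoids overflow on i32::MIN and yields a u32.
    let abs = offset_secs.unsigned_abs();
    let hours = abs / 3600;
    let minutes = (abs % 3600) / 60;
    if minutes == 0 {
        format!("UTC{sign}{hours}")
    } else {
        format!("UTC{sign}{hours}:{minutes:02}")
    }
}
```

For instance, format_utc_offset(19800) yields "UTC+5:30" and format_utc_offset(-18000) yields "UTC-5".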

Struct Location

pub struct Location {
    pub latitude: f64,
    pub longitude: f64,
}

A coordinate pair with distance calculation capabilities.

This is a simple wrapper around latitude and longitude coordinates that provides utility methods for geographic calculations.

Fields

latitude: f64

Latitude in decimal degrees (-90 to 90)

longitude: f64

Longitude in decimal degrees (-180 to 180)

Implementations

pub fn new(latitude: f64, longitude: f64) -> Self

Constructs a new Location from coordinates.

Examples

use genom::Location;

let loc = Location::new(40.7128, -74.0060);
assert_eq!(loc.latitude, 40.7128);
assert_eq!(loc.longitude, -74.0060);

pub fn distance_to(&self, other: &Location) -> f64

Calculates the great-circle distance to another location using the haversine formula.

Returns the distance in kilometers. This calculation assumes a spherical Earth with radius 6371 km, which provides accuracy within 0.5% for most distances.

Examples

use genom::Location;

let nyc = Location::new(40.7128, -74.0060);
let la = Location::new(34.0522, -118.2437);

let distance = nyc.distance_to(&la);
assert!(distance > 3900.0 && distance < 4000.0); // ~3936 km with a 6371 km radius
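For readers who want to see the formula behind distance_to, the following is a standalone sketch of the haversine calculation on a sphere of radius 6371 km, as stated above. The free function haversine_km and the constant name are assumptions made for illustration; the crate exposes this logic through Location::distance_to, and its internals may be written differently.

```rust
/// Mean Earth radius in kilometers, as stated in the documentation above.
const EARTH_RADIUS_KM: f64 = 6371.0;

/// Hypothetical free function: great-circle distance in kilometers
/// between two points given in decimal degrees.
fn haversine_km(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let phi1 = lat1.to_radians();
    let phi2 = lat2.to_radians();
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lon2 - lon1).to_radians();

    // a = sin²(Δφ/2) + cos φ1 · cos φ2 · sin²(Δλ/2)
    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);

    // Clamp guards against rounding pushing `a` slightly above 1.
    2.0 * EARTH_RADIUS_KM * a.min(1.0).sqrt().asin()
}
```

A quarter of the equator, haversine_km(0.0, 0.0, 0.0, 90.0), comes out at π/2 × 6371 km, which is a convenient sanity check.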

Struct CompactPlace

pub struct CompactPlace {
    pub city: u32,
    pub region: u32,
    pub region_code: u32,
    pub district: u32,
    pub country_code: u32,
    pub postal_code: u32,
    pub timezone: u32,
    pub lat: i32,
    pub lon: i32,
}

Compressed storage format using string table indices and fixed-point coordinates.

This is the internal storage representation used in the database. All string fields are stored as u32 indices into a shared string table, and coordinates are stored as i32 fixed-point values (multiplied by 100,000).

This reduces memory footprint by approximately 70% compared to storing full Place structs.

Fields

city: u32

Index into the string table for the city name

region: u32

Index into the string table for the region name

region_code: u32

Index into the string table for the region code

district: u32

Index into the string table for the district name

country_code: u32

Index into the string table for the country code

postal_code: u32

Index into the string table for the postal code

timezone: u32

Index into the string table for the timezone identifier

lat: i32

Latitude as fixed-point integer (divide by 100,000 to get decimal degrees)

lon: i32

Longitude as fixed-point integer (divide by 100,000 to get decimal degrees)

Implementations

pub fn location(&self) -> Location

Converts the fixed-point coordinates to a Location.

Divides the integer coordinates by 100,000 to recover the original decimal degree values.
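The fixed-point encoding can be illustrated with a small round-trip sketch. The names SCALE, to_fixed, and from_fixed are hypothetical, and rounding to the nearest integer on encode is an assumption; the documentation only states the factor of 100,000.

```rust
/// Scale factor stated in the documentation: 1 unit = 0.00001 degrees.
const SCALE: f64 = 100_000.0;

/// Hypothetical encoder: decimal degrees to fixed-point i32.
/// Assumption: values are rounded to the nearest unit.
fn to_fixed(degrees: f64) -> i32 {
    (degrees * SCALE).round() as i32
}

/// Hypothetical decoder: fixed-point i32 back to decimal degrees.
fn from_fixed(fixed: i32) -> f64 {
    f64::from(fixed) / SCALE
}
```

The largest magnitude needed is 180 × 100,000 = 18,000,000, which fits comfortably in an i32. With rounding, the round-trip error is at most half a unit, i.e. 0.000005 degrees, which is roughly half a meter of latitude.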

Struct Database

pub struct Database {
    pub strings: Vec<String>,
    pub places: Vec<CompactPlace>,
    pub grid: FxHashMap<(i16, i16), Vec<u32>>,
}

The complete spatial database structure with string interning and grid index.

This struct contains all the data needed for geocoding operations. It uses string interning to deduplicate common strings and a spatial grid index for fast coordinate lookups.

Fields

strings: Vec<String>

Deduplicated string table. All string fields in CompactPlace are stored as indices into this vector. Common strings like country codes and timezone names are stored only once.

places: Vec<CompactPlace>

All geographic entries in compressed format. Each entry contains indices into the string table and fixed-point coordinates.

grid: FxHashMap<(i16, i16), Vec<u32>>

Spatial index mapping grid cells to place indices. The world is divided into 0.1° × 0.1° cells (~11km at equator). Each cell contains a vector of indices into the places vector.

Uses FxHashMap (from rustc-hash) for faster hashing of integer keys compared to the standard library's HashMap.

Spatial Indexing Strategy

The grid divides the world into 0.1° × 0.1° cells. For a lookup:

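  1. Quantize the input coordinates to a grid key: (lat * 100000 / 10000, lon * 100000 / 10000). That is, the fixed-point coordinate is integer-divided by 10,000, which is the same as scaling degrees by 10 and dropping the fractional part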
  2. Search the target cell and 8 neighboring cells (3×3 grid)
  3. Calculate haversine distance to all candidates in these cells
  4. Return the nearest place

This provides O(1) average-case lookup with a small constant factor (typically 10-50 candidates to check).
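The four steps above can be sketched with standard-library types only. This is an illustration under stated assumptions, not the crate's implementation: it uses std's HashMap instead of FxHashMap, plain (lat, lon) tuples instead of CompactPlace, floor-based quantization (the crate's integer division may truncate toward zero instead), and the hypothetical names grid_key, build_grid, and nearest.

```rust
use std::collections::HashMap;

type GridKey = (i16, i16);

/// Hypothetical quantizer: 0.1° cells. Assumption: floor, not truncation.
fn grid_key(lat: f64, lon: f64) -> GridKey {
    ((lat * 10.0).floor() as i16, (lon * 10.0).floor() as i16)
}

/// Haversine distance in km on a sphere of radius 6371 km.
fn haversine_km(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -> f64 {
    let a = ((lat2 - lat1).to_radians() / 2.0).sin().powi(2)
        + lat1.to_radians().cos()
            * lat2.to_radians().cos()
            * ((lon2 - lon1).to_radians() / 2.0).sin().powi(2);
    2.0 * 6371.0 * a.min(1.0).sqrt().asin()
}

/// Builds the cell -> point-index map.
fn build_grid(points: &[(f64, f64)]) -> HashMap<GridKey, Vec<u32>> {
    let mut grid: HashMap<GridKey, Vec<u32>> = HashMap::new();
    for (i, &(lat, lon)) in points.iter().enumerate() {
        grid.entry(grid_key(lat, lon)).or_default().push(i as u32);
    }
    grid
}

/// Scans the target cell and its 8 neighbors; returns the index of the
/// closest point, or None when all nine cells are empty.
fn nearest(
    grid: &HashMap<GridKey, Vec<u32>>,
    points: &[(f64, f64)],
    lat: f64,
    lon: f64,
) -> Option<u32> {
    let (row, col) = grid_key(lat, lon);
    let mut best: Option<(u32, f64)> = None;
    for d_row in -1i16..=1 {
        for d_col in -1i16..=1 {
            let Some(ids) = grid.get(&(row + d_row, col + d_col)) else {
                continue;
            };
            for &id in ids {
                let (p_lat, p_lon) = points[id as usize];
                let d = haversine_km(lat, lon, p_lat, p_lon);
                if best.map_or(true, |(_, best_d)| d < best_d) {
                    best = Some((id, d));
                }
            }
        }
    }
    best.map(|(id, _)| id)
}
```

Because only nine cells are inspected, the cost per query depends on local density rather than on the size of the database. The sketch does not wrap neighbor cells across the antimeridian (±180° longitude); whether the crate does is not stated here.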

Struct PlaceInput

pub struct PlaceInput<'a> {
    pub city: &'a str,
    pub region: &'a str,
    pub region_code: &'a str,
    pub district: &'a str,
    pub country_code: &'a str,
    pub postal_code: &'a str,
    pub timezone: &'a str,
    pub latitude: f64,
    pub longitude: f64,
}

Input structure for the enrich_place function.

This struct contains the basic geographic data that will be enriched with additional computed fields (country name, currency, continent, timezone details, etc.).

Uses borrowed string slices to avoid unnecessary allocations during the enrichment process.

Fields

city: &'a str

City name

region: &'a str

Region/state name

region_code: &'a str

Region code

district: &'a str

District/county name

country_code: &'a str

ISO country code

postal_code: &'a str

Postal/ZIP code

timezone: &'a str

IANA timezone identifier

latitude: f64

Latitude coordinate

longitude: f64

Longitude coordinate

Function lookup

pub fn lookup(
    latitude: f64,
    longitude: f64
) -> Option<Place>

Performs reverse geocoding on the given coordinates, returning enriched place data if found.

Conceptual Role

This is the primary entry point for all geocoding operations. It abstracts away database access, spatial indexing, and data enrichment into a single call.

What This Function Does

  • Accesses the global geocoder singleton (lazy initialization on first call)
  • Performs grid-based spatial lookup to find nearest place
  • Enriches raw data with timezone, currency, and regional information
  • Returns None if no place found within search radius

Thread Safety

This function is thread-safe and can be called concurrently from multiple threads. The underlying database is initialized once and shared via a static OnceLock.

Performance

Typical lookup time: <1ms. First call incurs database initialization overhead (~100ms to decompress and load). Subsequent calls are lock-free reads.

Examples

use genom;

// Tokyo coordinates
if let Some(place) = genom::lookup(35.6762, 139.6503) {
    println!("{}, {}", place.city, place.country_name);
    println!("Timezone: {}", place.timezone);
    println!("Currency: {}", place.currency);
}

// Ocean coordinates return None
assert!(genom::lookup(0.0, -160.0).is_none());

Function enrich_place

pub fn enrich_place(input: PlaceInput) -> Place

Enriches basic place data with computed fields.

This function takes a PlaceInput containing basic geographic information and returns a fully enriched Place with additional computed fields.

Enrichment Process

  1. Timezone Parsing: Parses the IANA timezone to extract current offset, abbreviation, and DST status using chrono-tz
  2. Country Lookup: Maps country code to full country name using static hash map
  3. Currency Lookup: Maps country code to ISO 4217 currency code
  4. Continent Lookup: Maps country code to continent code and name
  5. EU Status: Checks if country is an EU member state

Static Data Sources

All enrichment data is stored in static LazyLock<FxHashMap> instances:

  • COUNTRY_NAMES - 200+ country code to name mappings
  • COUNTRY_CURRENCIES - 200+ country code to currency mappings
  • COUNTRY_CONTINENTS - 200+ country code to continent mappings
  • CONTINENT_NAMES - 7 continent code to name mappings
  • EU_COUNTRIES - 27 EU member states
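The lazily initialized lookup tables listed above follow a common pattern, sketched here with std's HashMap and HashSet so that the example has no dependencies (the crate is documented as using FxHashMap). The table contents are a tiny illustrative subset, and the accessor functions country_name, currency, and is_eu are hypothetical names, not part of the crate's API.

```rust
use std::collections::{HashMap, HashSet};
use std::sync::LazyLock;

// Each table is built on first access and then shared read-only.
static COUNTRY_NAMES: LazyLock<HashMap<&'static str, &'static str>> = LazyLock::new(|| {
    HashMap::from([("US", "United States"), ("JP", "Japan"), ("FR", "France")])
});

static COUNTRY_CURRENCIES: LazyLock<HashMap<&'static str, &'static str>> =
    LazyLock::new(|| HashMap::from([("US", "USD"), ("JP", "JPY"), ("FR", "EUR")]));

static EU_COUNTRIES: LazyLock<HashSet<&'static str>> =
    LazyLock::new(|| HashSet::from(["FR", "DE", "IT"]));

/// Hypothetical accessor: ISO 3166-1 alpha-2 code to country name.
fn country_name(code: &str) -> Option<&'static str> {
    COUNTRY_NAMES.get(code).copied()
}

/// Hypothetical accessor: country code to ISO 4217 currency code.
fn currency(code: &str) -> Option<&'static str> {
    COUNTRY_CURRENCIES.get(code).copied()
}

/// Hypothetical accessor: EU membership check.
fn is_eu(code: &str) -> bool {
    EU_COUNTRIES.contains(code)
}
```

Returning Option from the accessors leaves the choice of fallback for unknown codes to the caller; how the crate handles unknown codes is not specified here. std::sync::LazyLock requires Rust 1.80 or later.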

DST Detection

DST status is determined by comparing the current UTC offset with the smaller of the two offsets observed in January and July, which is taken as the zone's standard-time offset. If the current offset differs from that minimum, DST is active.
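The comparison described above reduces to a few lines once the three offsets are known. In this sketch the offsets are passed in as plain integers (seconds east of UTC) rather than computed with chrono-tz, and is_dst_active is a hypothetical name used only for illustration.

```rust
/// Hypothetical helper: DST is considered active when the current offset
/// differs from the smaller of the January and July offsets.
/// All offsets are in seconds east of UTC.
fn is_dst_active(current: i32, january: i32, july: i32) -> bool {
    // The smaller offset is treated as standard time in both hemispheres:
    // the northern hemisphere's January, the southern hemisphere's July.
    let standard = january.min(july);
    current != standard
}
```

For New York in July the inputs are (-14400, -18000, -14400), giving true; for Tokyo all three values are 32400, giving false. Taking the minimum is what makes the rule work for southern-hemisphere zones such as Sydney, where the larger offset occurs in January.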

Examples

use genom::enrichment::{enrich_place, PlaceInput};

let input = PlaceInput {
    city: "New York",
    region: "New York",
    region_code: "NY",
    district: "New York County",
    country_code: "US",
    postal_code: "10001",
    timezone: "America/New_York",
    latitude: 40.7128,
    longitude: -74.0060,
};

let place = enrich_place(input);
assert_eq!(place.country_name, "United States");
assert_eq!(place.currency, "USD");
assert_eq!(place.continent_name, "North America");
assert!(!place.is_eu);

Performance & Implementation Details

Memory Layout

The database uses a highly optimized memory layout:

  • String Interning: Common strings stored once, reducing memory by ~60%
  • Fixed-Point Coordinates: 32-bit integers instead of 64-bit floats, reducing coordinate storage by 50%
  • Spatial Grid: O(1) lookup with small constant factor (10-50 candidates)
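String interning, the first item above, can be sketched as a table that hands out a u32 index per distinct string. The StringTable type and its methods are hypothetical; the crate's Database stores only the resulting Vec<String>, and the reverse map shown here would be needed only while building.

```rust
use std::collections::HashMap;

/// Hypothetical build-time interner: each distinct string is stored once.
#[derive(Default)]
struct StringTable {
    strings: Vec<String>,
    index: HashMap<String, u32>,
}

impl StringTable {
    /// Returns the existing index for `s`, or appends it and returns the new one.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.index.get(s) {
            return id;
        }
        let id = self.strings.len() as u32;
        self.strings.push(s.to_owned());
        self.index.insert(s.to_owned(), id);
        id
    }

    /// Resolves an index back to its string, if the index is in range.
    fn resolve(&self, id: u32) -> Option<&str> {
        self.strings.get(id as usize).map(String::as_str)
    }

    /// Number of distinct strings stored.
    fn len(&self) -> usize {
        self.strings.len()
    }
}
```

Interning "US" for every place in the United States stores the two bytes once and a 4-byte index per place, which is where savings for highly repeated values such as country codes and timezone names come from.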

Initialization

The database is embedded in the binary at compile time using include_bytes!. On first access:

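  1. Embedded binary data is decompressed
  2. Database struct is deserialized using bincode (decompression and deserialization together take ~100ms)
  3. Reference is stored in static OnceLock
  4. All subsequent accesses return the cached reference without repeating steps 1-3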
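The initialize-once pattern in the steps above can be sketched with std::sync::OnceLock. The Db type, load_db, global_db, and the INIT_COUNT counter are all hypothetical stand-ins; the real loader decompresses and deserializes embedded bytes, which is replaced here by building a small vector.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for the real database type.
struct Db {
    places: Vec<&'static str>,
}

/// Counts how often the loader runs, purely to make the behavior observable.
static INIT_COUNT: AtomicUsize = AtomicUsize::new(0);

/// The singleton slot; empty until the first call to `global_db`.
static DB: OnceLock<Db> = OnceLock::new();

/// Hypothetical loader standing in for decompress + deserialize.
fn load_db() -> Db {
    INIT_COUNT.fetch_add(1, Ordering::SeqCst);
    Db {
        places: vec!["Paris", "Tokyo"],
    }
}

/// Returns the shared instance, running `load_db` at most once
/// even when several threads race on the first call.
fn global_db() -> &'static Db {
    DB.get_or_init(load_db)
}
```

If several threads call global_db() at the same time before initialization, one runs the loader and the others block until it finishes; afterwards every call returns the same reference.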

Thread Safety

After initialization, all operations are read-only and require no synchronization. Multiple threads can perform lookups concurrently without contention.

Build Process

The database is built from GeoNames data during cargo build:

  1. Download GeoNames cities dataset
  2. Parse and filter entries
  3. Build string table with deduplication
  4. Create spatial grid index
  5. Serialize to binary format
  6. Embed in compiled binary