[Figure: Pipeline diagram showing Wialon data flowing through extraction, staging, and validation into PostgreSQL]

Wialon to PostgreSQL: Production Pipeline Blueprint

Daily extracts into PostgreSQL can unlock analytics speed, but only if ingestion, validation, and schema governance are handled deliberately.

Use immutable staging before business tables

The first rule of a production pipeline is: never transform data in place. Raw payloads from Wialon should land in an immutable staging table before any business logic touches them. This table — call it raw_payloads — has a simple schema: id (bigserial), payload (JSONB), source_endpoint (text), source_hash (text, SHA-256 of the payload), ingested_at (timestamptz), and processed (boolean, default false). No updates, no deletes. Append only.
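
A minimal sketch of that table in PostgreSQL DDL, matching the columns just described:

    CREATE TABLE raw_payloads (
        id              bigserial PRIMARY KEY,
        payload         jsonb       NOT NULL,
        source_endpoint text        NOT NULL,
        source_hash     text        NOT NULL,              -- SHA-256 hex of the payload
        ingested_at     timestamptz NOT NULL DEFAULT now(),
        processed       boolean     NOT NULL DEFAULT false
    );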

Why this rigidity matters becomes clear the first time a transformation bug corrupts your business tables. With immutable staging, you fix the transformation, mark the affected records as unprocessed, and replay them. Without it, you have to re-extract from Wialon — and if the data retention window has passed or the API was returning different results due to a Wialon-side update, your data is gone. We learned this lesson after a unit-of-measure bug converted kilometers to miles for 72 hours. Replay from staging took 20 minutes. Re-extraction would have taken days and produced different numbers.

The source_hash column serves double duty: deduplication and integrity verification. Before inserting a payload, check whether a row with the same source_hash already exists. If it does, skip the insert — your extraction ran twice, or a retry delivered a duplicate. This makes the entire ingestion pipeline idempotent at the staging layer, which is the foundation that every downstream transformation depends on.
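
One way to make that idempotency race-free is to enforce it in the database rather than with a separate existence check. A sketch, assuming the raw_payloads table above:

    -- A unique index turns 'check, then insert' into one atomic statement.
    CREATE UNIQUE INDEX uq_raw_payloads_source_hash ON raw_payloads (source_hash);

    -- Duplicate deliveries (double runs, retried requests) are skipped silently.
    INSERT INTO raw_payloads (payload, source_endpoint, source_hash)
    VALUES ('{"messages": []}'::jsonb, 'messages/load_interval', '<sha256-hex>')
    ON CONFLICT (source_hash) DO NOTHING;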

Design the extraction schedule

Wialon data extraction is pull-based unless you build a webhook layer on top of Wialon notifications. For most pipelines, polling is the practical choice. The key decisions are interval, pagination, and incremental vs full extraction. Polling too frequently wastes API budget and risks rate limits. Polling too infrequently creates large batches that are slow to process and hard to retry granularly.

For unit messages (GPS positions, sensor readings), poll every 5 minutes using the messages/load_interval endpoint with a time window. Track the last successfully extracted timestamp per unit in a cursor table. Each poll requests messages from last_extracted_at to now. This cursor-based approach ensures you never miss messages and never re-extract the same window unnecessarily. For slower-changing data like unit properties or driver assignments, a 15-minute or hourly poll is sufficient.
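
A sketch of the cursor table (names are illustrative); the job reads the cursor, requests the window from messages/load_interval, and advances the cursor only after the batch is safely in staging:

    CREATE TABLE extraction_cursors (
        unit_id           bigint PRIMARY KEY,
        last_extracted_at timestamptz NOT NULL
    );

    -- After a successful poll, move the cursor to the end of the loaded window.
    UPDATE extraction_cursors
    SET last_extracted_at = '2024-03-15 10:05:00+00'
    WHERE unit_id = 2847;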

Timezone handling is a persistent source of bugs. Wialon stores timestamps as Unix epoch seconds (UTC). Your PostgreSQL database should store everything in UTC with timestamptz columns. The temptation to convert to local time during extraction is strong — resist it. Convert to local time only at the presentation layer. If your extraction code converts to Europe/Warsaw during ingestion and your BI tool converts again, you get double-shifted timestamps that are off by one or two hours depending on DST.
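
In PostgreSQL terms the rule looks like this sketch (fact_messages is defined later in this guide): epoch seconds become a UTC timestamptz at ingestion, and local time appears only in the final SELECT:

    -- Ingestion: Wialon epoch seconds -> timestamptz (shown here with TimeZone=UTC).
    SELECT to_timestamp(1710496800);                      -- 2024-03-15 10:00:00+00

    -- Presentation layer only: shift to local time at query time.
    SELECT "timestamp" AT TIME ZONE 'Europe/Warsaw' AS local_time
    FROM fact_messages;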

Backpressure is the extraction concern nobody plans for. When Wialon is slow — during peak hours, during server maintenance, or when your account shares infrastructure with a large customer on Wialon Hosting — your 5-minute polling window may not complete before the next poll triggers. Use a lock mechanism (advisory locks in PostgreSQL work well) to prevent overlapping extraction runs. If a run takes longer than its interval, the next run waits rather than stacking up concurrent API sessions.
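
A sketch of that guard with session-level advisory locks; the lock key is any integer the team standardizes on, here derived from a label with hashtext:

    -- Start of a run: take the lock, or learn that another run still holds it.
    SELECT pg_try_advisory_lock(hashtext('wialon_extraction'));
    -- true  -> proceed with extraction
    -- false -> another run is still going; skip (or use pg_advisory_lock to wait)

    -- End of the run, success or failure: release the lock.
    SELECT pg_advisory_unlock(hashtext('wialon_extraction'));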

Build transformation layers with versioning

Raw Wialon JSON in staging is not useful for analysts or dashboards. The transformation layer converts it into structured, typed, business-meaningful tables. Organize this in three tiers: staging (raw, immutable), intermediate (cleaned, typed, deduplicated), and mart (aggregated, business-entity-aligned). This mirrors the dbt staging-intermediate-mart pattern and provides clear boundaries for debugging.
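
One schema per tier keeps those boundaries visible in every query, a minimal sketch:

    CREATE SCHEMA staging;        -- raw, immutable (raw_payloads lives here)
    CREATE SCHEMA intermediate;   -- cleaned, typed, deduplicated
    CREATE SCHEMA mart;           -- aggregated, business-entity-aligned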

Version each transformation explicitly. When you change how idle time is calculated — say, from 'engine on with speed below 3 km/h for more than 5 minutes' to 'engine on with speed below 5 km/h for more than 3 minutes' — that is version 2 of the idle_events transformation. The version number is stored alongside every output record. When a dashboard shows a spike in idle events, the first question is 'did the metric definition change?', and the version column answers it immediately.
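
A sketch of the version traveling with the data (columns other than the version are illustrative):

    CREATE TABLE mart.idle_events (
        unit_id                bigint      NOT NULL,
        idle_start             timestamptz NOT NULL,
        idle_end               timestamptz NOT NULL,
        transformation_version smallint    NOT NULL   -- 2 after the threshold change
    );

    -- 'Did the metric definition change?' becomes a one-line query:
    SELECT transformation_version, count(*) AS events
    FROM mart.idle_events
    GROUP BY transformation_version;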

Track schema migrations in a dedicated changelog table: migration_id, applied_at, description, transformation_name, version_from, version_to. This is not optional ceremony. When finance asks why fuel cost per km increased 8% month-over-month, and the answer is 'we changed the fuel consumption formula in transformation v3 on March 15th', you need that record. Silent metric changes destroy trust in the data platform faster than any technical failure.
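
A sketch of the changelog table with exactly the columns listed above (the table name is an assumption):

    CREATE TABLE schema_changelog (
        migration_id        bigserial PRIMARY KEY,
        applied_at          timestamptz NOT NULL DEFAULT now(),
        description         text NOT NULL,
        transformation_name text NOT NULL,
        version_from        smallint,
        version_to          smallint NOT NULL
    );

    INSERT INTO schema_changelog
        (description, transformation_name, version_from, version_to)
    VALUES
        ('Fuel consumption formula changed', 'fuel_consumption', 2, 3);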

Normalize Wialon entities to relational schema

Wialon's data model is hierarchical and deeply nested. A unit has properties, custom fields, admin fields, sensors, counters, driver bindings, and profile fields — all accessible via different API flags in core/search_items. Mapping this to a relational schema requires deliberate normalization decisions. The goal is a schema that makes common queries fast and uncommon queries possible, not a perfect third-normal-form representation of every Wialon concept.

Start with core dimension tables: dim_units (unit_id, name, vin, registration, vehicle_class, hardware_type), dim_drivers (driver_id, name, phone, license_number, ibutton_code), dim_geofences (geofence_id, name, type, geometry as PostGIS), and dim_groups (group_id, name, description). Fact tables follow: fact_messages (unit_id, timestamp, lat, lon, speed, altitude, satellites, raw_parameters JSONB), fact_trips (unit_id, start_time, end_time, distance_km, start_geofence_id, end_geofence_id, driver_id), fact_events (unit_id, event_type, timestamp, parameters JSONB).
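
A sketch of two of those tables; the surrogate key on dim_units anticipates the SCD Type 2 pattern described below, and the custom_fields column is explained in the next paragraph:

    CREATE TABLE dim_units (
        unit_key      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        unit_id       bigint NOT NULL,          -- Wialon unit id, repeats across SCD versions
        name          text   NOT NULL,
        vin           text,
        registration  text,
        vehicle_class text,
        hardware_type text,
        custom_fields jsonb  NOT NULL DEFAULT '{}'::jsonb,
        valid_from    timestamptz NOT NULL DEFAULT now(),
        valid_to      timestamptz,
        is_current    boolean NOT NULL DEFAULT true
    );

    CREATE TABLE fact_messages (
        unit_id        bigint      NOT NULL,
        "timestamp"    timestamptz NOT NULL,    -- quoted: 'timestamp' is also a type name
        lat            double precision,
        lon            double precision,
        speed          numeric,
        altitude       numeric,
        satellites     smallint,
        raw_parameters jsonb
        -- no unique constraint here: duplicates are caught by the quality checks below
    );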

Wialon custom fields deserve special attention. They are key-value pairs with no enforced schema — one customer uses a field called 'Department' while another uses 'dept' or 'cost_center'. Rather than creating a column per custom field (which leads to wide, sparse tables), store them in a JSONB column on the dimension table and create functional indexes on commonly queried fields: CREATE INDEX idx_units_department ON dim_units ((custom_fields->>'Department')). This balances query performance with schema flexibility.
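
A query that can use that index looks like this sketch; the expression must match the index definition exactly:

    SELECT unit_id, name
    FROM dim_units
    WHERE custom_fields->>'Department' = 'Logistics'
      AND is_current;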

For dimension changes — a vehicle moves from Group A to Group B, a driver's license is updated — use SCD Type 2 (slowly changing dimensions). Add valid_from, valid_to, and is_current columns to dimension tables. When a change is detected, close the current row (set valid_to to now, is_current to false) and insert a new row. This preserves history for accurate retrospective reporting: 'which group was this vehicle in during January?' has an exact answer.
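
A sketch of the close-and-insert step for a detected change, using the dim_units table above:

    BEGIN;

    -- Close the current version of the unit's row.
    UPDATE dim_units
    SET valid_to = now(), is_current = false
    WHERE unit_id = 2847 AND is_current;

    -- Open a new version reflecting the change (here: a new vehicle_class).
    INSERT INTO dim_units (unit_id, name, vehicle_class, valid_from, is_current)
    VALUES (2847, 'Truck 2847', 'heavy', now(), true);

    COMMIT;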

Operationalize data quality checks

Data quality checks should run after every transformation batch, not as a weekly afterthought. The minimum viable set is five checks:

  • Row count validation: did we receive approximately the expected number of records for this time window?
  • Duplicate key detection: are there any fact_messages rows with the same unit_id and timestamp?
  • Timestamp range validation: are all timestamps within the expected extraction window, neither in the future nor suspiciously far in the past?
  • Null rate monitoring: did the percentage of null driver_ids in fact_trips suddenly spike?
  • Outlier detection: is any vehicle reporting more than 2000 km in a single day, or fuel consumption above 80 liters per 100 km?
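
Most of these are single queries. The duplicate key check, as one concrete sketch:

    -- Critical: fact_messages rows sharing (unit_id, timestamp).
    SELECT unit_id, "timestamp", count(*) AS copies
    FROM fact_messages
    GROUP BY unit_id, "timestamp"
    HAVING count(*) > 1
    LIMIT 10;   -- enough for the sample_ids field of the check result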

Each check produces a structured result: check_name, severity (warning or critical), affected_table, affected_row_count, sample_ids, and run_timestamp. Critical checks block downstream transformations — if duplicate keys are detected in fact_messages, the mart layer should not rebuild until the duplicates are resolved. Warning checks are logged and reviewed in the daily data quality digest but do not block processing.
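
Persisting those results makes both the blocking logic and the daily digest queryable, a sketch:

    CREATE TABLE quality_check_results (
        check_name         text        NOT NULL,
        severity           text        NOT NULL,   -- 'warning' or 'critical'
        affected_table     text        NOT NULL,
        affected_row_count bigint      NOT NULL,
        sample_ids         bigint[],               -- a handful of offending ids
        run_timestamp      timestamptz NOT NULL DEFAULT now()
    );

    -- Gate the mart rebuild on the latest batch having no critical failures.
    SELECT count(*) = 0 AS safe_to_rebuild
    FROM quality_check_results
    WHERE severity = 'critical'
      AND run_timestamp > now() - interval '1 hour';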

Alert routing matters as much as alert detection. A null rate spike in driver assignments should notify the fleet operations lead, not the database administrator. An outlier in fuel consumption should go to the maintenance team. Build a routing table that maps check_name to notification channel and responsible owner. When an alert fires and nobody responds because it went to a generic inbox, the check is worthless regardless of how sophisticated the detection logic is.
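
A sketch of that routing table (the check names and channels are illustrative):

    CREATE TABLE alert_routing (
        check_name           text PRIMARY KEY,
        notification_channel text NOT NULL,   -- e.g. a Slack channel or email alias
        responsible_owner    text NOT NULL
    );

    INSERT INTO alert_routing VALUES
        ('null_rate_driver_id', '#fleet-ops',   'fleet operations lead'),
        ('fuel_outlier',        '#maintenance', 'maintenance team');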

  • Implement a data freshness check: if the most recent record in fact_messages is older than 15 minutes, trigger an ingestion lag alert (a sketch follows this list).
  • Track check results over time to identify degradation trends — a null rate that increases 0.5% per week indicates a systemic issue even if it never crosses the alert threshold.
  • Run monthly reconciliation between PostgreSQL row counts and Wialon report totals to catch silent data loss.
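
A sketch of the freshness check from the first bullet:

    -- True when ingestion has lagged for more than 15 minutes.
    SELECT max("timestamp") < now() - interval '15 minutes' AS ingestion_lagging
    FROM fact_messages;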

Enable historical backfill

Wialon retains raw messages for a configurable period — typically 3 to 6 months on Wialon Hosting, depending on the account tier and storage allocation. When you deploy a new pipeline or discover a gap in historical data, you need a backfill strategy that works within these retention constraints. The worst time to design backfill is when you need it urgently.

Backfill jobs must be idempotent. Use the same deduplication key (source_hash or unit_id + timestamp composite) as your live extraction. Run backfill in date-range batches — one day at a time — with configurable parallelism. Each batch is an independent unit of work: if day 47 of a 90-day backfill fails, you retry day 47 without re-running days 1 through 46. Track completion metadata in a backfill_runs table: backfill_id, unit_id, date_start, date_end, status, rows_extracted, started_at, completed_at.
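
A sketch of the completion-metadata table, plus seeding a 90-day backfill for one unit as independent one-day batches with generate_series (the status values are illustrative):

    CREATE TABLE backfill_runs (
        backfill_id    bigserial PRIMARY KEY,
        unit_id        bigint NOT NULL,
        date_start     date   NOT NULL,
        date_end       date   NOT NULL,
        status         text   NOT NULL DEFAULT 'pending',  -- pending | running | done | failed
        rows_extracted bigint,
        started_at     timestamptz,
        completed_at   timestamptz
    );

    -- One batch per day: day 47 can be retried without touching days 1-46.
    INSERT INTO backfill_runs (unit_id, date_start, date_end)
    SELECT 2847, d::date, d::date
    FROM generate_series(now() - interval '90 days',
                         now() - interval '1 day',
                         interval '1 day') AS d;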

Gap detection is the trigger for backfill. Run a daily query that checks for missing time intervals in fact_messages per unit. If a unit that normally reports every 30 seconds has a 4-hour gap, flag it. Some gaps are legitimate (vehicle parked in an underground garage, tracker powered off), but unexpected gaps deserve investigation. Maintain a gap_log table that records detected gaps, their cause (if known), and whether a backfill was attempted. This log is invaluable when operations asks 'why is there no data for truck 2847 on March 3rd?'.
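
Gap detection fits in one window-function query, a sketch that flags gaps longer than four hours:

    WITH ordered AS (
        SELECT unit_id,
               "timestamp",
               lag("timestamp") OVER (PARTITION BY unit_id
                                      ORDER BY "timestamp") AS prev_ts
        FROM fact_messages
    )
    SELECT unit_id,
           prev_ts               AS gap_start,
           "timestamp"           AS gap_end,
           "timestamp" - prev_ts AS gap_length
    FROM ordered
    WHERE "timestamp" - prev_ts > interval '4 hours';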

Connect to BI and downstream consumers

PostgreSQL as a fleet data warehouse shines when it connects to the tools that operations and management already use. Metabase, Grafana, and Tableau can all connect directly via PostgreSQL wire protocol. The challenge is not connectivity — it is performance isolation and access control.

Create materialized views for dashboard queries that scan large fact tables. A materialized view that pre-aggregates daily mileage, fuel consumption, and idle time per vehicle group refreshes in seconds and serves dashboard queries in milliseconds. Schedule materialized view refreshes immediately after transformation batches complete, using REFRESH MATERIALIZED VIEW CONCURRENTLY to avoid locking out dashboard users during refresh. Without materialized views, every dashboard load runs a full aggregation query against millions of fact rows, which degrades the experience for everyone.
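
A sketch of such a view over fact_trips; note that REFRESH ... CONCURRENTLY requires a unique index on the materialized view:

    CREATE MATERIALIZED VIEW mart.mv_daily_trip_stats AS
    SELECT unit_id,
           start_time::date AS day,
           sum(distance_km) AS distance_km,
           count(*)         AS trips
    FROM fact_trips
    GROUP BY unit_id, start_time::date;

    -- CONCURRENTLY refuses to run without a unique index covering every row.
    CREATE UNIQUE INDEX uq_mv_daily_trip_stats
        ON mart.mv_daily_trip_stats (unit_id, day);

    -- After each transformation batch completes:
    REFRESH MATERIALIZED VIEW CONCURRENTLY mart.mv_daily_trip_stats;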

For deployments where BI query load could interfere with pipeline ingestion, use a read replica. The pipeline writes to the primary, BI tools read from the replica. PostgreSQL streaming replication adds sub-second lag, which is invisible to dashboard users. This separation prevents a runaway Tableau extract from blocking your ingestion workers.

Access control should follow the principle of least privilege. Create database roles per consumer type: role_pipeline (read/write on staging and intermediate, read on mart), role_analyst (read-only on mart and materialized views), role_dashboard (read-only on materialized views). Never share the pipeline's credentials with BI tools. When an analyst accidentally runs a query that locks a table, it should lock a read replica table, not the production pipeline's ingestion target.
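
A sketch of those roles, assuming the schema-per-tier layout from the transformation section:

    CREATE ROLE role_pipeline  LOGIN;
    CREATE ROLE role_analyst   LOGIN;
    CREATE ROLE role_dashboard LOGIN;

    GRANT USAGE ON SCHEMA staging, intermediate, mart TO role_pipeline;
    GRANT SELECT, INSERT, UPDATE ON ALL TABLES IN SCHEMA staging, intermediate
        TO role_pipeline;
    GRANT SELECT ON ALL TABLES IN SCHEMA mart TO role_pipeline;

    GRANT USAGE ON SCHEMA mart TO role_analyst, role_dashboard;
    GRANT SELECT ON ALL TABLES IN SCHEMA mart TO role_analyst;
    -- Dashboards see only the materialized views, granted one by one.
    GRANT SELECT ON mart.mv_daily_trip_stats TO role_dashboard;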
