Optimizing Satellite Maps: Efficiently Rendering 700K+ Object Polygons and Their Attributes

Introduction

As an AI Engineer, I worked on object instance segmentation for satellite imagery, generating 700,000 polygons stored in PostgreSQL with PostGIS. The final challenge came when our team was tasked with building a simple website to display these 700K polygons, complete with details like names and areas, based on six months of data produced by the computer vision model we’d been running.

The Initial Problem: Bloated GeoJSON

Initially, I used HTML, JavaScript, and Leaflet to display the data. This worked fine for small-scale datasets, but when the data grew to 700,000 polygons (GeoJSON files ballooning to 350-500 MB), the browser slowed to a crawl. Loading times were long, rendering stuttered—even on localhost, my laptop’s performance tanked. This approach was clearly not scalable.

GeoJSON 350-500 MB in the browser: poor performance

This pushed me to search for a more efficient solution.

First Experiment: Marker Clustering

I tried Marker Clustering with Leaflet.markercluster, grouping nearby polygons into a single marker that split apart when zooming in. For fewer than 50,000 polygons, this was fairly effective. But with 700K polygons, the real-time calculations still bogged down the browser. I decided it was time for a new approach.
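For reference, the clustering setup looked roughly like this: a minimal sketch using the Leaflet.markercluster plugin, where `features` and its centroids are placeholders for however the polygons get summarized. It also shows why this approach fails at scale: all the work happens per feature, in the browser.

// Requires Leaflet plus the Leaflet.markercluster plugin
const clusters = L.markerClusterGroup();

// One marker per polygon centroid; with 700K features, building and
// re-clustering this many markers is exactly what overwhelms the browser
for (const feature of features) {
    clusters.addLayer(L.marker(feature.centroid));
}
map.addLayer(clusters);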

Understanding Digital Map Technology

After the solution above failed, I didn’t want to jump straight into finding another fix. Instead, I wanted to understand the fundamentals of how digital map technology works. Imagine a digital map as a stack of transparent sheets that can be rearranged at any time. Each sheet is a layer carrying different information, stacked together to create an interactive map. Here are its key components:

| Component | Description | Example |
| --- | --- | --- |
| Base Map Layer | The foundational layer of the map | Road tiles from OpenStreetMap, satellite imagery from Google Maps |
| Feature Layers | Additional layers with specific data | Polygons of regions, road lines, POI points (with properties like name or population) |
| User Interaction | How users engage with the map | Click for info, hover for details, adjust layer order so polygons appear above roads |
| Zoom Levels | Detail levels that change with magnification | Low zoom: city outlines; high zoom: street names and small buildings |
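To connect these components to code, here is a tiny Leaflet sketch of my own (the coordinates and names are placeholders): a base layer, one feature layer stacked above it, and a click interaction.

const map = L.map('map').setView([0, 0], 5);

// Base map layer: OpenStreetMap raster tiles
L.tileLayer('https://tile.openstreetmap.org/{z}/{x}/{y}.png').addTo(map);

// Feature layer: a polygon with its own properties, drawn above the base map
const region = L.polygon([[0, 0], [0, 1], [1, 1]], { color: 'blue' }).addTo(map);

// User interaction: click the polygon for details
region.bindPopup('Name: Example region');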

Web maps use tiles—small data chunks pieced together like a puzzle:

  1. Raster Tiles: static images (JPEG/PNG), fast to render but heavy to transfer.
  2. Vector Tiles: raw vector data, lightweight and flexible to style.
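To make tiling concrete, here is a sketch of the standard Web Mercator tile-addressing formula that slippy maps use: it maps any coordinate and zoom level to the tile that contains it.

// Convert longitude/latitude to slippy-map tile coordinates at a given zoom
// (the standard Web Mercator scheme used by OpenStreetMap, Google Maps, etc.)
function lonLatToTile(lon, lat, zoom) {
    const n = Math.pow(2, zoom); // number of tiles per axis at this zoom
    const x = Math.floor(((lon + 180) / 360) * n);
    const latRad = (lat * Math.PI) / 180;
    const y = Math.floor(
        ((1 - Math.log(Math.tan(latRad) + 1 / Math.cos(latRad)) / Math.PI) / 2) * n
    );
    return { z: zoom, x, y };
}

console.log(lonLatToTile(106.8, -6.2, 12)); // which tile covers this point at zoom 12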

Learning from the Best

The Question

How do Google Maps or OpenStreetMap handle massive datasets?

The Vector Tiling Principle

Illustration of the vector tiling principle

The answer: vector tiles. The data is split into small tiles that are loaded only for the visible area and zoom level. I proposed this approach to the team.
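The saving comes from the client requesting only the tiles that intersect the current viewport. A rough sketch, reusing lonLatToTile from above:

// List the tiles needed to cover a viewport at a given zoom level
function tilesForViewport(bounds, zoom) {
    const topLeft = lonLatToTile(bounds.west, bounds.north, zoom);
    const bottomRight = lonLatToTile(bounds.east, bounds.south, zoom);
    const tiles = [];
    for (let x = topLeft.x; x <= bottomRight.x; x++) {
        for (let y = topLeft.y; y <= bottomRight.y; y++) {
            tiles.push({ z: zoom, x, y });
        }
    }
    return tiles; // typically a few dozen tiles, no matter how big the dataset is
}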

Vector Tiles

Illustration of vector tile serving

Vector Tiles and PMTiles

GeoJSON vs Vector Tiles

| Feature | GeoJSON | Vector Tiles |
| --- | --- | --- |
| Format | JSON text | Binary, chopped into tiles |
| Data size | 350-500 MB for 700K polygons | 100-300 MB (depending on zoom/simplification configuration) |
| Loading speed | Slow; downloads everything at once | Fast; per tile, based on the zoom and area needed |
| Bandwidth efficiency | Low | High |
| Rendering performance | Heavy on the browser | Light; tile-based rendering |

I chose vector tiles. A demo on localhost was responsive, with none of the lag I saw with GeoJSON.

From PostGIS to PMTiles

There were two options here:

  1. PostGIS ST_AsMVT: tiles generated in real time from queries.
  2. PMTiles: tiles pre-generated into a static file (via Tippecanoe).

| Aspect | PostGIS ST_AsMVT | PMTiles |
| --- | --- | --- |
| Output type | Vector tiles (PBF), generated on the fly | Static vector tile file |
| Speed | Depends on the query and data size | Fast, pre-rendered |
| Data updates | Always up to date | Requires regeneration |
| Server load | High | Low |
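For the pre-generation route, the pipeline is roughly: export the polygons from PostGIS, then let Tippecanoe cut them into a PMTiles archive. A minimal sketch, where the connection string, table, and file names are illustrative:

# Export polygons from PostGIS to GeoJSON
ogr2ogr -f GeoJSON polygons.geojson PG:"dbname=gis" -sql "SELECT id, name, area, geom FROM polygons"

# Cut the GeoJSON into a single PMTiles archive; -zg picks the max zoom
# automatically, --drop-densest-as-needed thins crowded low-zoom tiles
tippecanoe -zg -o polygons.pmtiles -l polygons --drop-densest-as-needed polygons.geojson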

PMTiles vs MBTiles

Although I went with PMTiles, I also considered MBTiles. Both store vector tiles, just in different containers.

PMTiles are like a ZIP file, packing many vector tiles into a single file optimized for cloud access. MBTiles store vector tiles (or raster) in an SQLite database table, offering flexibility but with added overhead.

| Feature | PMTiles | MBTiles |
| --- | --- | --- |
| Server | Serverless; can be read remotely over HTTP | Typically requires a server for online use; accessed from disk for offline use |
| Format | Single-file archive format optimized for the cloud | SQLite-based database, typically stored locally or on a server |
| Use case | Designed to be read remotely via HTTP range requests | Designed for local disk access (offline) or server-based HTTP delivery (online) |

PMTiles Viewer

PMTiles Viewer: a demo with my data loaded; navigating the map is smooth and easy

PMTiles

I chose PMTiles because:

  1. Optimized for online use via HTTP.
  2. Our data doesn’t need frequent updates, so occasional regeneration is cheap.
  3. Relatively small file size; lightweight to host in the cloud.
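Point 1 is the key property: a PMTiles archive can sit on any static host and be read with plain HTTP range requests. A minimal sketch using the pmtiles JavaScript package (the URL and tile coordinates are placeholders):

// npm install pmtiles — run in an ES module context
import { PMTiles } from 'pmtiles';

const archive = new PMTiles('https://example.com/tiles/polygons.pmtiles');

// Reads only a small byte range of the file, not the whole archive
const header = await archive.getHeader();
console.log(`zoom range: ${header.minZoom}-${header.maxZoom}`);

// Fetches a single tile with an HTTP Range request
const tile = await archive.getZxy(12, 3263, 2118);
if (tile) console.log(`tile size: ${tile.data.byteLength} bytes`);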

Implementation with Martin Tile Server

I had multiple PMTiles files, since our data included various object types (e.g., region boundaries, satellite data, or polygons X, Y, Z). Initially, I thought serving several PMTiles files would be a hassle. While PMTiles can be served directly from a static file server like S3, I needed extra features:

  • Authentication: Data is sensitive, so access must be tightly controlled.
  • Caching: For faster performance in frequently accessed areas.
  • Simple Endpoint: One URL for all PMTiles, keeping it easy for clients.

In the end, I chose Martin—a fast, lightweight Rust-based tile server. Why Martin?

  • Caching support: Martin supports caching to improve performance.
  • High performance: blazing fast with a low memory footprint, perfect for large datasets like our 700,000 polygons.
  • PostGIS integration: it can act as a hybrid, serving static PMTiles and real-time PostGIS data if needed later.
  • Simple endpoint: composite sources give one endpoint for multiple PMTiles, simplifying data consumption.
  • Authentication: Martin doesn’t support authentication out of the box, so we put a reverse proxy with Nginx in front of it to handle this.
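To give an idea of the setup, here is a minimal Martin configuration sketch; the paths and source names are illustrative, and the exact schema should be checked against the Martin documentation:

# config.yaml (illustrative)
listen_addresses: '0.0.0.0:3000'

pmtiles:
  sources:
    polygons: /data/polygons.pmtiles
    regions: /data/regions.pmtiles

Martin then serves each source at its own /{source}/{z}/{x}/{y} endpoint, and comma-separated composite sources can expose several of them through one URL.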

An example of consuming the tiles with Leaflet (via the Leaflet.VectorGrid plugin):

// Requires Leaflet plus the Leaflet.VectorGrid plugin
const map = L.map('map').setView([0, 0], 2); // center/zoom on your own data

L.vectorGrid
    .protobuf('http://localhost:7800/polygons/{z}/{x}/{y}', {
        vectorTileLayerStyles: {
            // the key must match the layer name inside the vector tiles
            'polygons': { color: '#ff7800', fill: true, weight: 1 }
        }
    })
    .addTo(map);

When I tested this demo, the results were far smoother than GeoJSON—the map loaded instantly without delays, even with multiple layers.

Results and Implementation Challenges

My demo succeeded in a development environment. The map with 700,000 polygons rendered without lag, a stark contrast to GeoJSON, which felt sluggish even on localhost. Unfortunately, I didn’t get the chance to fully implement it.

Since it was still an experiment, I don’t have precise performance metrics (e.g., loading time or memory usage), but qualitatively the difference was night and day.

Valuable Lessons

Though this strayed a bit from AI, a product is always judged end-to-end. On my own initiative, I sought ways to make the AI-generated data usable (whether for analysis or visualization). I learned a lot about maps during this exploration:

  • Scalability: Solutions must handle big data.
  • Experimentation: Clustering’s failure led me to the best solution.
  • Communication: Writing an RFC trained me to convey complex ideas.

If I get the chance, I’d love to continue this with performance metrics and integration into our computer vision pipeline. For now, this is my story of optimizing satellite maps—a small step toward a bigger solution.

Next Up: Data Collection Adventure and Integrating Computer Vision into Maps

Optimizing rendering is only half the story. How were those 700,000 polygons collected? In my next post, I’ll share how we built a system to "mesh" specified regions from shapefiles, capture Google Maps satellite imagery via API, and predict objects with YOLO, all automated within a predefined grid. Stay tuned!
