vectortiles

GIS Vector Tiles, as defined by Mapbox.

LTS Haskell 19.33:	1.5.1
Stackage Nightly 2022-03-17:	1.5.1
Latest on Hackage:	1.5.1

See all snapshots vectortiles appears in

BSD-3-Clause licensed by Colin Woodbury

Maintained by [email protected]

This version can be pinned in stack with:vectortiles-1.5.1@sha256:cd36573aadc0bcc89612561aefe9a0c00c0e038d264e73a47fa5bc5e50ee094d,2813

Module documentation for 1.5.1

Geography
- Geography.VectorTile
  - Geography.VectorTile.Internal

Depends on 12 packages(full list with versions):

base, bytestring, containers, deepseq, hashable, mtl, protocol-buffers, protocol-buffers-descriptor, text, transformers, unordered-containers, vector

VectorTiles

What are VectorTiles?

Invented by Mapbox, VectorTiles are a combination of finite-sized tiles and vector geometries. Mapbox maintains the official implementation spec for VectorTile codecs.

VectorTiles are advantageous over raster tiles in that:

They are typically smaller to store
They can be easily transformed (rotated, etc.) in real time
They allow for continuous (as opposed to step-wise) zoom in Slippy Maps.

Raw VectorTile data is stored in the protobuf format. Any codec implementing the spec must decode and encode data according to this .proto schema.

What is this library?

vectortiles is a minimum viable implementation of Version 2.1 of the VectorTile spec. It aims to be a solid reference to implement other codecs. vectortiles exposes a small API of conversion functions between raw protobuf data and a higher-level VectorTile type that is more condusive to further processing. vectortiles also exposes fairly simplistic (yet sensible) implementations of the typical GIS Geometry types:

Point
LineString
Polygon

For ease of encoding and decoding, each Geometry type and its Multi counterpart (i.e. Multipoint) are considered the same thing, a Vector of that Geometry.

Efficiency

This library is not micro-optimized, but does leverage some “for-free” aspects of Haskell to remain usable:

Point is implemented as a Record Pattern Synonym to hide the fact it’s just a vanilla tuple of Ints. This allows us to use the more efficient unboxed Vectors with it:

-- | Access a Point's values with the `x` and `y` functions.
type Point = (Int,Int)
pattern Point :: Int -> Int -> (Int, Int)
pattern Point{x, y} = (x, y)

Some types (like LineString) are implemented as a newtype for its compile-time unboxing:

import qualified Data.Vector.Unboxed as U

newtype LineString = LineString { lsPoints :: U.Vector Point }

All lenses are INLINEd.

Performance

Benchmarks can be run with stack bench, provided the stack tool is installed. The following results are from a 2016 Lenovo ThinkPad Carbon X1 with an Intel Core i7 processor, comparing this library with a [Python library of similar functionality] (https://github.com/mapzen/mapbox-vector-tile). All benchmarking code is available in the bench directory.

Note: 1 ms = 1000 μs

Decoding

	One Point	One LineString	One Polygon	roads.mvt (40kb, 15 layers)
CPython 3.5.2	63 μs	70 μs	84 μs	76 ms
PyPy 5.3	116 μs	210 μs	211 μs	12 ms
Haskell	3.6 μs	5 μs	5.8 μs	17.1 ms

The Haskell times are measuring data evaluation to their Normal Form (fully evaluated form).

The Python class decoded to is the builtin dict class.

Encoding

	One Point	One LineString	One Polygon	roads.mvt
CPython 3.5.2	218 μs	278 μs	703 μs	N/A
Haskell	3.2 μs	4.4 μs	5 μs	11.1 ms

Certain encoding benchmarks for Python were not possible.

Data Access (Fetching first Polygon)

	One Polygon	roads.mvt (`water` layer)
CPython 3.5.2	84 μs	78 ms
PyPy 5.3	31 μs	7.9 ms
Haskell	3.4 μs	6.8 ms

The operation being benchmarked is ByteString -> Polygon, meaning we include the decoding time to account for speed gains afforded by laziness.

Conclusions

Laziness pays off. In Haskell, just fetching some specific data field is faster than decoding the entire structure.
Python data fetches are fast. They are based on the dict class, so fetch operations will be as fast as dict is.
PyPy results are enigmatic. Python3 seems to do much better “off the block”, but given time the PyPy JIT overtakes it. Fetching layer names also seems to be faster than decoding the entire object, somehow. This may be due to the JIT being clever, noticing we aren’t using the rest of the structure.

Questions & Issues

Simply parsing raw protobuf data is not enough to work with VectorTiles, since the spec also defines how said data is to be interpreted once parsed. In writing a codec, there are a number of things one must consider:

Hand written schema code vs `protoc` use

Many languages have a “protobuf compiler” which can take a .proto file and generate schema code to access parsed data. There are PROs and CONs to taking this approach.

PROs for using a protoc-like program:

All accessor code is written for you
Update process when new official .proto is released is clearer

PROs for writing your own schema:

Its your code, so you have more control. Bugs are easier to chase
The code will likely be much shorter

In the case of two Haskell protobuf libraries which were compared, the hand-written one allowed for a 50-line schema, while the other auto-generated a 550-line one.

Extension Support

The protobuf spec leaves room for additional:

Value types in the key-value metadata maps
fields in a Layer
fields in a Tile
use of the UNKNOWN geometry type

In writing a codec, these can be completely ignored. However, they are permitted by the spec, and some tools may encode data using them.

Feature/Geometry Polymorphism

At the protobuf level, Features of Points, LineStrings, and Polygons are all mixed into a single list, distinguished only by a GeomType label. At a high level, you may wish to separate these specifically. This library does just that:

data Layer = Layer { _version :: Int
                   , _name :: Text
                   , _points :: V.Vector (Feature Point)
                   , _linestrings :: V.Vector (Feature LineString)
                   , _polygons :: V.Vector (Feature Polygon)
                   , _extent :: Int
                   }

As opposed to having a single field named features, which contains all features unified by some superclass / trait / generic.

Claim: Having separate accessors for each geometry type yields a “heavier” API, but gives more power, is more performant, and less complex.

Layer / Feature Coupling

Layers and Features have coupled data at the protobuf level. In order to achieve higher compression ratios, Layers contain all metadata in key/value lists to be shared across their Features, while those Features store only indices into those lists. As a result, functions converting protobuf-level Feature objects into a high-level type need to be passed those key/value lists from the parent Layer. and a more isomorphic:

feature :: Geometry g => RawFeature -> Either Text (Feature g)

is not possible.

Polygon Definition

Version 2 of the spec mainly clarified language surrounding how polygons should be decoded. This Github issue reports another “gotcha” associated with the definition of polygons.

Sanity Checks / Error Handling

The protobuf data can be malformed in a number of ways. The amount of sanity checking that occurs while decoding depends on willingness to sacrifice performance. For instance, here is a constraint found in the spec regarding feature metadata:

Every key index MUST be unique within that feature such that no other attribute pair within that feature has the same key index. A feature MUST have an even number of tag fields. A feature tag field MUST NOT contain a key index or value index greater than or equal to the number of elements in the layer’s keys or values set, respectively.

Tips

Decode what is encoded, and encode what is decoded.

Encoding and decoding functions should be as close to isomorphisms as possible.

(0,0) is in the top-left corner.

Know binary arithmetic well.

Lists of Geometry commands/values are Z-encoded. See the zig and unzig functions in Geometry.VectorTile.Geometry.

Changes

Changelog

1.5.1

GHC 8.8.2 support.

1.5.0

Breaking: Removed the distance function.
Regenerated schema code to match modern versions of protocol-buffers.

1.4.0

Breaking: Feature is now explicit about what Vector type it holds
Performance improvements

1.2.0.3

Performance improvements during metadata encoding.

1.2.0.2

Bump vector upper bound.

1.2.0.1

Remove the redundant Monoid instance for Point.

1.2.0

Front End

VectorTile now holds a lazy Map internally, instead of a Vector. Use the ix Lens, giving it a layer name, to hone in on individual Layers quickly.
Users an optionally use the new fromProtobuf function directly from Geography.VectorTile.Protobuf if they wish to be semantically explicit about the backend conversion. Otherwise, they can still use the top-level tile function.

Back End

Reworked internals for Protobuf conversion. Most code moved to an Internal module.

1.1.1

Removed the StrictData pragma. Turns out laziness is faster.