Douglas-Peucker

The Douglas-Peucker algorithm uses a point-to-edge distance tolerance. The algorithm starts with a crude simplification that is the single edge joining the first and last vertices of the original polyline. It then computes the distance of all intermediate vertices to that edge. The vertex that is furthest away from that edge, and that has a computed distance that is larger than a specified tolerance, will be marked as a key and added to the simplification. This process will recurse for each edge in the current simplification, until all vertices of the original polyline are within tolerance of the simplification results. This process is illustrated below:

Initially, the simplification consists of a single edge. During the first step, the fourth vertex is marked as a key and the simplification is adjusted accordingly. During the second step, the first edge of the current simplification is processed. The maximum vertex distance to that edge falls below the tolerance threshold, and no new key is added. During the third step, a key is found for the second edge of the current simplification. This edge is split at the key and the simplification is updated. This process continues until no more keys can be found. Note that at each step, only one edge of the current simplification is processed.

This algorithm has a worst case running time of O(nm), and O(n log m) on average, where m is the size of the simplified polyline. As such, this is an output dependent algorithm, and will be very fast when m is small. To make it even faster, the Radial Distance routine is applied as a pre-processing step.

Interface

template <unsigned DIM, class ForwardIterator, class OutputIterator>
OutputIterator simplify_douglas_peucker (
    ForwardIterator first,
    ForwardIterator last,
    typename std::iterator_traits <ForwardIterator>::value_type tol,
    OutputIterator result)

Applies the Radial Distance routine followed by Douglas-Peucker approximation to the range [first, last) using the specified tolerance tol. The resulting simplified polyline is copied to the output range [result, result + m * DIM), where m is the number of vertices of the simplified polyline. The return value is the end of the output range: result + m * DIM.

Input (Type) Requirements

DIM is not 0, where DIM represents the dimension of the polyline
The ForwardIterator value type is convertible to the value type of the OutputIterator
The range [first, last) contains vertex coordinates in multiples of DIM, e.g.: x, y, z, x, y, z, x, y, z when DIM = 3
The range [first, last) contains at least 2 vertices
tol is not 0.

In case these requirements are not met, the entire input range [first, last) is copied to the output range [result, result + (last - first)), or compile errors may occur.

Implementation Details

Initially, my focus was on limiting the memory usage of the algorithms. So instead of using output iterators, all algorithms returned a std::vector<bool>. One boolean for each vertex that determined if that vertex is a key and thus part of the simplification. This list of key markers could be used as input for another algorithm, allowing different algorithms to be run in sequence. A separate function could optionally copy all keys to some output range.

This approach worked, but had some serious drawbacks:

The code was slow, especially for non-random access iterators
The code had become too complex with all its bookkeeping
When using the code, I always needed a real copy of the simplification results and not a bunch of key markers

The first thing I changed was the interface of each algorithm. Instead of returning key markers, the simplification results were copied to an output range using output iterators. The second change was to store the intermediate result produced by the Radial Distance pre-processing step in a plain C-style array. This array is then used during Douglas-Peucker approximation. The advantages of this approach far outweigh the increase in memory usage:

Using the Radial Distance routine as a pre-processing step became trivial; I only had to create an array and specify an output iterator for it
Less code - lack of specific code for different iterator categories, and less bookkeeping
Faster code - working with C-style arrays and value type pointers is generally faster than using iterators, especially when dealing with non-random access iterators
Cleaner interface

The algorithm itself is a straightforward loop. The initial edge of the simplification is added to a std::stack. As long as the stack is not empty, an edge is popped and processed. Its key and key-edge distance are calculated. If the computed distance is larger than the tolerance, the key is added to the simplification. The edge is split and both sub-edges are added to the stack. When a vertex is added to the simplification, it is only marked as being a key. When the algorithm has finished, all marked vertices (keys) are copied to the output range.

Usage

double tolerance = 10;        // point-to-edge distance tolerance
std::deque <double> polyline; // original polyline, assume not empty

// make sure the result array is large enough
double* result = new double [polyline.size ()];

// simplify the 3d polyline
psimpl::simplify_douglas_peucker <3> (
    polyline.begin (), polyline.end (),
    tolerance, result);

This example demonstrates that input and output containers do not have to be the same.

Douglas-Peucker N

This algorithm is a variation of the original implementation. Its key differences are:

A point count tolerance is used instead of a point-to-edge distance tolerance. This allows you to specify the exact number of vertices in the simplified polyline. With the original implementation, you can never be sure how many vertices will remain.
Instead of processing a single edge at a time (chosen pseudo random), all edges of the current simplified polyline are considered simultaneously. Each of these edges may define a new key. From all these possible keys, the one with the highest point-to-edge distance is chosen as the new key.

A direct result from always choosing the next key based on all possible keys at any given time, is that the simplification results are of a higher quality. This is most notable when using a very low point-count tolerance. A downside is that we cannot use the Radial Distance routine as a pre-processing step to speed up the algorithm.

Interface

template <unsigned DIM, class ForwardIterator, class OutputIterator>
OutputIterator simplify_douglas_peucker_n (
    ForwardIterator first,
    ForwardIterator last,
    unsigned count,
    OutputIterator result)

Applies a variant of the Douglas-Peucker algorithm to the range [first, last). The resulting simplified polyline consists of count vertices, and is copied to the output range [result, result + count). The return value is the end of the output range: result + count.

Input (Type) Requirements

DIM is not 0, where DIM represents the dimension of the polyline
The ForwardIterator value type is convertible to the value type of the OutputIterator
The range [first, last) contains vertex coordinates in multiples of DIM, e.g.: x, y, z, x, y, z, x, y, z when DIM = 3
The range [first, last) contains a minimum of count vertices
count is at least 2

In case these requirements are not met, the entire input range [first, last) is copied to the output range [result, result + (last - first)), or compile errors may occur.

Implementation Details

The implementation for this variant varies only slightly from the original implementation. The main differences being that there is no pre-processing step, and in the way the edges of the current simplification are processed.

For each edge that is added to the current simplification, its key is calculated. This key, alongside the edge and its distance to that edge, are stored in a std::priority_queue. This queue ensures that its top element contains the key with the maximum point-edge distance. As long as the simplification does not contain the desired amount of points, the top element from the queue is popped and its key is added to the simplification. The corresponding edge is split, and the two sub-edges are processed and stored in the queue.

For performance reasons, a copy is made of the input polyline in a plain C-style array. Note that for the original implementation, this copy was made automatically during the Radial Distance pre-processing step.

Usage

unsigned tolerance = 25;     // point count tolerance
std::list <long> polyline;   // original polyline, assume not empty
std::vector <double> result; // resulting simplified polyline

// simplify the 4d polyline to 25 points
psimpl::simplify_douglas_peucker_n <4> (
    polyline.begin (), polyline.end (),
    tolerance, std::back_inserter (result));

This example demonstrates the use of a non-random access container with a signed integer data type.

psimpl

generic n-dimensional poyline simplification

Douglas-Peucker

Interface

Input (Type) Requirements

Implementation Details

Usage

Douglas-Peucker N

Interface

Input (Type) Requirements

Implementation Details

Usage