Douglas-Peucker
The Douglas-Peucker algorithm uses a point-to-edge distance tolerance. The algorithm starts with a crude simplification that is the single edge joining the first and last vertices of the original polyline. It then computes the distance of all intermediate vertices to that edge. The vertex that is furthest away from that edge, and that has a computed distance that is larger than a specified tolerance, will be marked as a key and added to the simplification. This process will recurse for each edge in the current simplification, until all vertices of the original polyline are within tolerance of the simplification results. This process is illustrated below:
Initially, the simplification consists of a single edge. During the first step, the fourth vertex is marked as a key and the simplification is adjusted accordingly. During the second step, the first edge of the current simplification is processed. The maximum vertex distance to that edge falls below the tolerance threshold, and no new key is added. During the third step, a key is found for the second edge of the current simplification. This edge is split at the key and the simplification is updated. This process continues until no more keys can be found. Note that at each step, only one edge of the current simplification is processed.
This algorithm has a worst case running time of O(nm), and O(n log m) on average, where m is the size of the simplified polyline. As such, this is an output dependent algorithm, and will be very fast when m is small. To make it even faster, the Radial Distance routine is applied as a pre-processing step.
Interface
template <unsigned DIM, class ForwardIterator, class OutputIterator> OutputIterator simplify_douglas_peucker ( ForwardIterator first, ForwardIterator last, typename std::iterator_traits <ForwardIterator>::value_type tol, OutputIterator result)
Applies the Radial Distance routine followed by
Douglas-Peucker approximation to the range
[first, last)
using the specified tolerance tol
.
The resulting simplified polyline is copied to the output range
[result, result + m * DIM)
, where m is the number of
vertices of the simplified polyline. The return value is the end of the output
range: result + m * DIM
.
Input (Type) Requirements
DIM
is not 0, whereDIM
represents the dimension of the polyline- The
ForwardIterator
value type is convertible to the value type of theOutputIterator
- The range
[first, last)
contains vertex coordinates in multiples ofDIM
, e.g.: x, y, z, x, y, z, x, y, z whenDIM
= 3 - The range
[first, last)
contains at least 2 vertices tol
is not 0.
In case these requirements are not met, the entire input range
[first, last)
is copied to the output range
[result, result + (last - first))
, or compile errors may
occur.
Implementation Details
Initially, my focus was on limiting the memory usage of the algorithms. So
instead of using output iterators, all algorithms returned a
std::vector<bool>
. One boolean
for each vertex that determined if that vertex is a key and thus part of the
simplification. This list of key markers could be used as input for another
algorithm, allowing different algorithms to be run in sequence. A separate
function could optionally copy all keys to some output range.
This approach worked, but had some serious drawbacks:
- The code was slow, especially for non-random access iterators
- The code had become too complex with all its bookkeeping
- When using the code, I always needed a real copy of the simplification results and not a bunch of key markers
The first thing I changed was the interface of each algorithm. Instead of returning key markers, the simplification results were copied to an output range using output iterators. The second change was to store the intermediate result produced by the Radial Distance pre-processing step in a plain C-style array. This array is then used during Douglas-Peucker approximation. The advantages of this approach far outweigh the increase in memory usage:
- Using the Radial Distance routine as a pre-processing step became trivial; I only had to create an array and specify an output iterator for it
- Less code - lack of specific code for different iterator categories, and less bookkeeping
- Faster code - working with C-style arrays and value type pointers is generally faster than using iterators, especially when dealing with non-random access iterators
- Cleaner interface
The algorithm itself is a straightforward loop. The initial edge of the
simplification is added to a std::stack
. As long as the stack is
not empty, an edge is popped and processed. Its key and key-edge distance are
calculated. If the computed distance is larger than the tolerance, the key is
added to the simplification. The edge is split and both sub-edges are added to
the stack. When a vertex is added to the simplification, it is only marked as
being a key. When the algorithm has finished, all marked vertices (keys) are
copied to the output range.
Usage
double tolerance = 10; // point-to-edge distance tolerance std::deque <double> polyline; // original polyline, assume not empty // make sure the result array is large enough double* result = new double [polyline.size ()]; // simplify the 3d polyline psimpl::simplify_douglas_peucker <3> ( polyline.begin (), polyline.end (), tolerance, result);
This example demonstrates that input and output containers do not have to be the same.
Douglas-Peucker N
This algorithm is a variation of the original implementation. Its key differences are:
- A point count tolerance is used instead of a point-to-edge distance tolerance. This allows you to specify the exact number of vertices in the simplified polyline. With the original implementation, you can never be sure how many vertices will remain.
- Instead of processing a single edge at a time (chosen pseudo random), all edges of the current simplified polyline are considered simultaneously. Each of these edges may define a new key. From all these possible keys, the one with the highest point-to-edge distance is chosen as the new key.
A direct result from always choosing the next key based on all possible keys at any given time, is that the simplification results are of a higher quality. This is most notable when using a very low point-count tolerance. A downside is that we cannot use the Radial Distance routine as a pre-processing step to speed up the algorithm.
Interface
template <unsigned DIM, class ForwardIterator, class OutputIterator> OutputIterator simplify_douglas_peucker_n ( ForwardIterator first, ForwardIterator last, unsigned count, OutputIterator result)
Applies a variant of the Douglas-Peucker algorithm to
the range [first, last)
. The resulting simplified polyline consists
of count
vertices, and is copied to the output range
[result, result + count)
. The return value is the end of the output
range: result + count
.
Input (Type) Requirements
DIM
is not 0, whereDIM
represents the dimension of the polyline- The
ForwardIterator
value type is convertible to the value type of theOutputIterator
- The range
[first, last)
contains vertex coordinates in multiples ofDIM
, e.g.: x, y, z, x, y, z, x, y, z whenDIM
= 3 - The range
[first, last)
contains a minimum ofcount
vertices count
is at least 2
In case these requirements are not met, the entire input range
[first, last)
is copied to the output range
[result, result + (last - first))
, or compile errors may
occur.
Implementation Details
The implementation for this variant varies only slightly from the original implementation. The main differences being that there is no pre-processing step, and in the way the edges of the current simplification are processed.
For each edge that is added to the current simplification, its key is calculated.
This key, alongside the edge and its distance to that edge, are stored in a
std::priority_queue
. This queue ensures that its top element contains
the key with the maximum point-edge distance. As long as the simplification does
not contain the desired amount of points, the top element from the queue is popped
and its key is added to the simplification. The corresponding edge is split, and
the two sub-edges are processed and stored in the queue.
For performance reasons, a copy is made of the input polyline in a plain C-style array. Note that for the original implementation, this copy was made automatically during the Radial Distance pre-processing step.
Usage
unsigned tolerance = 25; // point count tolerance std::list <long> polyline; // original polyline, assume not empty std::vector <double> result; // resulting simplified polyline // simplify the 4d polyline to 25 points psimpl::simplify_douglas_peucker_n <4> ( polyline.begin (), polyline.end (), tolerance, std::back_inserter (result));
This example demonstrates the use of a non-random access container with a signed integer data type.