# Douglas-Peucker

The **Douglas-Peucker** algorithm uses a point-to-edge
distance tolerance. The algorithm starts with a crude simplification that is the
single edge joining the first and last vertices of the original polyline. It then
computes the distance of all intermediate vertices to that edge. The vertex that
is furthest away from that edge, and that has a computed distance that is larger
than a specified tolerance, will be marked as a key and added to the simplification.
This process will recurse for each edge in the current simplification, until all
vertices of the original polyline are within tolerance of the simplification results.
This process is illustrated below:

Initially, the simplification consists of a single edge. During the first step, the fourth vertex is marked as a key and the simplification is adjusted accordingly. During the second step, the first edge of the current simplification is processed. The maximum vertex distance to that edge falls below the tolerance threshold, and no new key is added. During the third step, a key is found for the second edge of the current simplification. This edge is split at the key and the simplification is updated. This process continues until no more keys can be found. Note that at each step, only one edge of the current simplification is processed.

This algorithm has a worst case running time of O(nm), and O(n log m) on average,
where m is the size of the simplified polyline. As such, this is an output dependent
algorithm, and will be very fast when m is small. To make it even faster, the
**Radial Distance** routine is applied as a pre-processing step.

## Interface

template <unsigned DIM, class ForwardIterator, class OutputIterator> OutputIterator simplify_douglas_peucker ( ForwardIterator first, ForwardIterator last, typename std::iterator_traits <ForwardIterator>::value_type tol, OutputIterator result)

Applies the **Radial Distance** routine followed by
**Douglas-Peucker** approximation to the range
`[first, last)`

using the specified tolerance `tol`

.
The resulting simplified polyline is copied to the output range
`[result, result + m * DIM)`

, where *m* is the number of
vertices of the simplified polyline. The return value is the end of the output
range: `result + m * DIM`

.

## Input (Type) Requirements

`DIM`

is not 0, where`DIM`

represents the dimension of the polyline- The
`ForwardIterator`

value type is convertible to the value type of the`OutputIterator`

- The range
`[first, last)`

contains vertex coordinates in multiples of`DIM`

, e.g.: x, y, z, x, y, z, x, y, z when`DIM`

= 3 - The range
`[first, last)`

contains at least 2 vertices `tol`

is not 0.

In case these requirements are not met, the entire input range
`[first, last)`

is copied to the output range
`[result, result + (last - first))`

, **or** compile errors may
occur.

## Implementation Details

Initially, my focus was on limiting the memory usage of the algorithms. So
instead of using output iterators, all algorithms returned a
`std::vector<bool>`

. One boolean
for each vertex that determined if that vertex is a key and thus part of the
simplification. This list of key markers could be used as input for another
algorithm, allowing different algorithms to be run in sequence. A separate
function could optionally copy all keys to some output range.

This approach worked, but had some serious drawbacks:

- The code was slow, especially for non-random access iterators
- The code had become too complex with all its bookkeeping
- When using the code, I always needed a real copy of the simplification results and not a bunch of key markers

The first thing I changed was the interface of each algorithm. Instead of
returning key markers, the simplification results were copied to an output range
using output iterators. The second change was to store the intermediate result
produced by the **Radial Distance** pre-processing step in a
plain C-style array. This array is then used during **Douglas-Peucker**
approximation. The advantages of this approach far outweigh the increase in memory usage:

- Using the
**Radial Distance**routine as a pre-processing step became trivial; I only had to create an array and specify an output iterator for it - Less code - lack of specific code for different iterator categories, and less bookkeeping
- Faster code - working with C-style arrays and value type pointers is generally faster than using iterators, especially when dealing with non-random access iterators
- Cleaner interface

The algorithm itself is a straightforward loop. The initial edge of the
simplification is added to a `std::stack`

. As long as the stack is
not empty, an edge is popped and processed. Its key and key-edge distance are
calculated. If the computed distance is larger than the tolerance, the key is
added to the simplification. The edge is split and both sub-edges are added to
the stack. When a vertex is added to the simplification, it is only marked as
being a key. When the algorithm has finished, all marked vertices (keys) are
copied to the output range.

## Usage

double tolerance = 10; // point-to-edge distance tolerance std::deque <double> polyline; // original polyline, assume not empty // make sure the result array is large enough double* result = new double [polyline.size ()]; // simplify the 3d polyline psimpl::simplify_douglas_peucker <3> ( polyline.begin (), polyline.end (), tolerance, result);

This example demonstrates that input and output containers do not have to be the same.

# Douglas-Peucker N

This algorithm is a variation of the original implementation. Its key differences are:

- A point count tolerance is used instead of a point-to-edge distance tolerance. This allows you to specify the exact number of vertices in the simplified polyline. With the original implementation, you can never be sure how many vertices will remain.
- Instead of processing a single edge at a time (chosen pseudo random), all edges of the current simplified polyline are considered simultaneously. Each of these edges may define a new key. From all these possible keys, the one with the highest point-to-edge distance is chosen as the new key.

A direct result from always choosing the next key based on all possible keys
at any given time, is that the simplification results are of a higher quality.
This is most notable when using a very low point-count tolerance. A downside is
that we cannot use the **Radial Distance** routine as a
pre-processing step to speed up the algorithm.

## Interface

template <unsigned DIM, class ForwardIterator, class OutputIterator> OutputIterator simplify_douglas_peucker_n ( ForwardIterator first, ForwardIterator last, unsigned count, OutputIterator result)

Applies a variant of the **Douglas-Peucker** algorithm to
the range `[first, last)`

. The resulting simplified polyline consists
of `count`

vertices, and is copied to the output range
`[result, result + count)`

. The return value is the end of the output
range: `result + count`

.

## Input (Type) Requirements

`DIM`

is not 0, where`DIM`

represents the dimension of the polyline- The
`ForwardIterator`

value type is convertible to the value type of the`OutputIterator`

- The range
`[first, last)`

contains vertex coordinates in multiples of`DIM`

, e.g.: x, y, z, x, y, z, x, y, z when`DIM`

= 3 - The range
`[first, last)`

contains a minimum of`count`

vertices `count`

is at least 2

In case these requirements are not met, the entire input range
`[first, last)`

is copied to the output range
`[result, result + (last - first))`

, **or** compile errors may
occur.

## Implementation Details

The implementation for this variant varies only slightly from the original implementation. The main differences being that there is no pre-processing step, and in the way the edges of the current simplification are processed.

For each edge that is added to the current simplification, its key is calculated.
This key, alongside the edge and its distance to that edge, are stored in a
`std::priority_queue`

. This queue ensures that its top element contains
the key with the maximum point-edge distance. As long as the simplification does
not contain the desired amount of points, the top element from the queue is popped
and its key is added to the simplification. The corresponding edge is split, and
the two sub-edges are processed and stored in the queue.

For performance reasons, a copy is made of the input polyline in a plain C-style
array. Note that for the original implementation, this copy was made automatically
during the **Radial Distance** pre-processing step.

## Usage

unsigned tolerance = 25; // point count tolerance std::list <long> polyline; // original polyline, assume not empty std::vector <double> result; // resulting simplified polyline // simplify the 4d polyline to 25 points psimpl::simplify_douglas_peucker_n <4> ( polyline.begin (), polyline.end (), tolerance, std::back_inserter (result));

This example demonstrates the use of a non-random access container with a signed integer data type.