What’s the Problem?
In the geospatial profession, we have great data. We have LOTS of great data. And a lot of that data has been around for a pretty long time. Which means, as more and more new data arrives on the scene, buried in there somewhere is the evidence of how things have been changing over time. But as measurement professionals we aren’t just interested in knowing that there has been change; we want to be able to answer “How much?”
The easy approach is to just compare two datasets directly. Overlay and subtract. Calculate the difference. Right? Add to that the increasing buzz about AI and machine learning algorithms and we are good to go! Until, of course, things begin to go very wrong.
This figure shows the same wetland mapped in green and in pink, the dates separated by about 15 years. Is the difference change? What if both polygons were created from the same source data, just at different scales? What if the source data were completely different? Both wetlands are “accurate” based on their source data. The green polygon is clearly more “precise”.
Most of our work has to do with mapping topography and, in general, topography doesn’t really change all that much. On the relatively short time scales that we’re interested in, most changes are pretty small. Like, I am pretty sure the woodchuck holes in my front lawn are contributing to mass wasting along the bank, but that’s a story for another day. In localized areas, though, some topographic changes can obviously be extreme. Human development, coastal storms and river flooding can all have acute short-term impacts on local topography. Bluff and bank erosion are critical sources of sediment and deposition. Dunes can literally disappear. New parking lots and subdivisions can completely alter runoff and drainage. It is possible now, with our current legacy of high-resolution LiDAR data, to begin detecting and quantifying these detailed changes over time and over large areas.
Here’s the problem: We can only measure change within the accuracy limitations of the source data we are comparing. Re-read that statement a couple times and your immediate reaction is probably for your eyes to gloss over and your mind to start looking for something else to do. The bottom line is that we’d love to take data at face value and move on. But that’s a problem.
Accuracy vs Precision
Helping people understand the difference between these two terms may very well be one of the biggest challenges facing the geospatial profession today.
While this picture of shots on a target is easy to grasp (we are often dealing with the situation in the lower left), it’s not all that helpful when trying to understand geospatial data. And it’s the crux of our problem. As the saying goes, “It’s complicated”.
It’s also wrapped up in our inconsistent use of the term “scale” in this advanced digital age where anyone can zoom in on a map or an image well beyond the point of ridiculousness. A recent statewide mapping RFP specifically removed all scale references from the data requirements, which is also a problem. We still have functional scales at which we use and analyze data, and beyond which it is simply inappropriate to use that data, even if we never print a single paper map. So, scale still matters. We know intuitively that even very precise things can also be inaccurate. High-resolution imagery can be off horizontally by feet, meters, or even more. LiDAR elevation data that is “accurate” to some threshold still has inherent error.
Scale still matters.
So how do we account for error? ASPRS and the NSSDA have promulgated the use of “compiled to meet” and “tested to meet” statements for accuracy, and the accuracy of our data is a critical part of how we can use it to quantify change. It defines the limits of what we can detect as “actual change” versus what is just acceptable and normal variability within the data itself. This is particularly problematic if our precision significantly exceeds our accuracy. We may “think” we are far more accurate than we actually are, because we have all these extra decimal places.
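To make that concrete, here is a minimal sketch (plain Python; the RMSE values are made-up illustrations, not from any particular project or standard) of how the stated vertical accuracies of two elevation datasets are often combined to estimate the smallest change we could defensibly report:

```python
import math

# Hypothetical stated vertical accuracies (RMSEz) for two elevation datasets.
# A legacy dataset at ~18.5 cm RMSEz and a newer one at ~10 cm RMSEz are
# illustrative values only.
rmse_z_old = 0.185  # meters
rmse_z_new = 0.100  # meters

# Errors from independent datasets are commonly combined by root-sum-of-squares.
combined_rmse = math.sqrt(rmse_z_old**2 + rmse_z_new**2)

# At the 95% confidence level (assuming roughly normal errors), differences
# smaller than ~1.96 * combined RMSE cannot be separated from ordinary
# variability within the data itself.
min_detectable_change = 1.96 * combined_rmse

print(f"Combined RMSEz:            {combined_rmse:.3f} m")
print(f"Minimum detectable change: {min_detectable_change:.3f} m")
# Anything smaller than ~0.41 m of apparent elevation change here is noise,
# no matter how many extra decimal places the data carries.
```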
Why we get it Wrong
The real problem is not that all geodata have inherent spatial error (we know this), but that we are guilty of Actively Ignoring it. Once a dataset has been determined (or assumed) to be “accurate enough”, we move on, leaving the question of just how accurate on the cutting-room floor. The fact that one dataset may be accurate to 1 ft, and another may be accurate to 10 ft, is kind of …well… important.
There are a number of very interesting issues at play here. The McNamara Fallacy, for example, includes the practice of “disregarding that which can’t be easily measured”. A classic example is when we want to overlay two datasets created in two different coordinate reference systems. We have this amazing ability to automatically project and transform datasets between geodetic reference systems, and we KNOW that doing so does not happen without introducing at least some error, but we make the blanket simplifying assumption that the error is negligible.
Direct from ESRI: “Changing the projection…to a projection and datum that is different from the source data can add a margin of error to your data”.
In this case, “margin of error” is just additional error that gets piled on to what you already know about the error to begin with. Which is usually…not a lot.
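That “margin of error” is not unknowable, either. A library like pyproj, for example, will report the published accuracy of each candidate datum transformation it knows about. A small sketch (assuming pyproj is installed; NAD27 to NAD83 is just a representative pair):

```python
# Sketch: list the candidate transformations pyproj knows about between two
# datums, along with their published accuracies. NAD27 -> NAD83 is illustrative.
from pyproj.transformer import TransformerGroup

group = TransformerGroup("EPSG:4267", "EPSG:4269")  # NAD27 -> NAD83

for t in group.transformers:
    # accuracy is reported in meters; -1 means the accuracy is unknown
    print(f"{t.description}: stated accuracy {t.accuracy} m")
```

The point is not the specific numbers; it is that every transformation carries a documented uncertainty that quietly gets added to whatever error the data already had.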
The now ubiquitous Web Mercator specifically “projects coordinates surveyed against an ellipsoid as if they were surveyed on a sphere” with the intent that the resulting known errors are “not enough to be noticeable by eye”.
Which is great when you are trying to find a coffee shop really fast on your phone, but could be problematic for quantifying beach erosion or calculating wetland loss.
Precision is King
Our first inclination is to simply accept whatever error is inherent in the data and…keep moving. The well-documented “Streetlight Effect” on our scientific thinking and the parable of the Blind Men and the Elephant both circle around this same idea. We look at what is right in front of us, and we measure what we see with the tools we have. And it’s clear that in the current state of our industry, Precision is King.
Accuracy is the annoying friend that we tolerate, but do everything we can to ignore.
Enter the Error Ellipse
Enter the three-dimensional construct of the error ellipsoid. If the center of the ellipsoid represents a true location in the real world, any measurement of that location that lands inside the ellipsoid is considered “accurate”.
We can make this ellipsoid as big or as small as we want based on our design specifications or project needs, and then we can design our data collection plan and equipment to meet those requirements.
We can even create an ellipsoid for an entire dataset. We can create a design (or planned) ellipsoid, and a tested (or measured) ellipsoid. Conceptually, if the measured ellipsoid fits inside the planned ellipsoid, we are good to go! If it doesn’t, well, then we have a problem. Of course, we don’t actually create ellipsoids, but we do use terms like “horizontal and vertical accuracy”, which is effectively the same thing.
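A minimal sketch of that mental model, treating the stated horizontal and vertical accuracies as the semi-axes of the ellipsoid (the axis lengths and offsets below are illustrative, not from any standard):

```python
# Sketch of the error-ellipsoid idea: does a measured point fall inside the
# design ellipsoid centered on the "true" location?
def within_error_ellipsoid(dx, dy, dz, semi_x, semi_y, semi_z):
    """Return True if the offset (dx, dy, dz) from the true location lies
    inside an ellipsoid with the given semi-axes (same units throughout)."""
    return (dx / semi_x) ** 2 + (dy / semi_y) ** 2 + (dz / semi_z) ** 2 <= 1.0

# e.g. a design spec of 1.0 m allowable horizontal error and 0.2 m vertical
print(within_error_ellipsoid(0.4, 0.6, 0.10, 1.0, 1.0, 0.2))  # True: "accurate"
print(within_error_ellipsoid(0.4, 0.6, 0.25, 1.0, 1.0, 0.2))  # False: outside spec
```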
We can’t establish accuracy, and then just throw it out.
Yet that is EXACTLY what we do…
Design accuracy comes directly from project specs or industry standards. Tested accuracy must be measured against reference data of a known and much higher accuracy. Given the maturity of collection methods and the high quality of equipment and sensors today, it’s becoming increasingly rare for data to actually be “tested”. Instead, we rely on the safe assumption that the data meets or exceeds the design specifications. To be clear, there is nothing wrong with doing that. But when comparing two different datasets, it’s important to know what each of those design accuracy specifications is, and to understand that they are almost never the same.
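For reference, here is roughly what “tested to meet” looks like in practice, sketched in a few lines of Python (the check-point values and the design spec are hypothetical, and a real test would use far more, and far better, check points):

```python
import math

# Hypothetical check-point comparison: surveyed elevations vs. the dataset
# being tested, in meters.
surveyed = [101.42, 98.77, 103.15, 99.60, 100.08]
dataset  = [101.35, 98.90, 103.05, 99.71, 100.01]

errors = [d - s for d, s in zip(dataset, surveyed)]
rmse_z = math.sqrt(sum(e * e for e in errors) / len(errors))

# NSSDA-style vertical accuracy at the 95% confidence level (normal errors assumed)
accuracy_95 = 1.96 * rmse_z

design_spec_95 = 0.30  # hypothetical "compiled to meet" spec, meters at 95%
print(f"Tested RMSEz:       {rmse_z:.3f} m")
print(f"Tested accuracy_95: {accuracy_95:.3f} m")
print("Meets design spec" if accuracy_95 <= design_spec_95 else "Does not meet spec")
```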
What we CAN’T do is use this idea of the error ellipsoid to establish the accuracy of the data (design or tested), and then simply throw it out when we want to do detailed analysis or calculations. Yet that is EXACTLY what we do in almost all cases.
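By way of contrast, here is a minimal sketch of what keeping the error in the loop looks like when differencing two elevation grids (NumPy; the toy grids and the ~0.41 m threshold carried over from the earlier sketch are illustrative only):

```python
import numpy as np

# Sketch: a DEM-of-difference with the detectability threshold applied.
dem_old = np.array([[10.2, 10.4], [11.0, 12.6]])
dem_new = np.array([[10.3, 10.9], [10.2, 12.7]])
threshold = 0.41  # minimum detectable change, meters

diff = dem_new - dem_old

# Keep only differences larger than the allowable error; everything else is
# indistinguishable from noise and should not be reported as "change".
detected = np.where(np.abs(diff) >= threshold, diff, np.nan)

print(diff)
print(detected)
# Naive net change counts every cell; thresholded change keeps only what we can defend.
print("Naive net change:     ", np.nansum(diff))
print("Defensible net change:", np.nansum(detected))
```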
In Part 2, we will discuss a case study analyzing change between a legacy LiDAR dataset and a newer, more accurate (and more precise) LiDAR dataset.
Leave with this parting thought: the error ellipsoid represents “allowable error”, and we MUST take that error into account if we are going to try to quantify any changes between two datasets.