In the spirit of transparency, we are going to share some background info on how we at Quadrant analyse the quality of location data we provide to our buyers – some of the steps we take to ensure it is of the highest quality possible for their particular use case.
It all starts with the source
It starts with us only sourcing location data from SDKs, because IP address, bidstream, and cell tower triangulation data are not nearly as accurate (read more on that here). Peeling back the layers of location data to assess its overall quality means we always look at a variety of key data metrics.
Let’s take a closer look at some of these metrics below:
- DAU/MAU Ratio
- Data Completeness
- Horizontal Accuracy
- Days Seen Per Month
- Hours Seen Per Day
1: DAU/MAU Ratio
One of the baseline metrics we look at when analysing location data for quality is the Daily Active Users (DAU) and Monthly Active Users (MAU) ratio. In a nutshell, this helps us approximate how consistent a panel (group of mobile devices) is over the course of a month. The higher the number, the better.
As this is a high-level metric, publishers are usually able to provide these numbers immediately to help us get a general idea of the dataset.
Some data buyers may prioritise feeds with higher DAU/MAU ratios. At Quadrant, however, we source data from a variety of SDKs and publishers, which means we sometimes use data with lower DAU/MAU ratios if it complements our existing dataset.
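To make the ratio concrete, here is a minimal sketch of one way it could be computed from raw pings. It is an illustration only, not Quadrant's actual pipeline: the function name `dau_mau_ratio`, the `(device_id, unix_timestamp)` input shape, the use of UTC days, and the choice to average DAU over the days that contain data are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timezone


def dau_mau_ratio(pings):
    """Return average DAU divided by MAU for one month of pings.

    `pings` is an iterable of (device_id, unix_timestamp_seconds) tuples,
    assumed to fall within a single calendar month (hypothetical input shape).
    """
    devices_by_day = defaultdict(set)
    all_devices = set()
    for device_id, ts in pings:
        # Assumption: days are bucketed in UTC.
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        devices_by_day[day].add(device_id)
        all_devices.add(device_id)
    if not all_devices:
        return 0.0
    # Assumption: DAU is averaged over the days that have at least one ping.
    avg_dau = sum(len(d) for d in devices_by_day.values()) / len(devices_by_day)
    return avg_dau / len(all_devices)


# Device "a" is seen on two days, device "b" on one day.
sample = [("a", 1672531200), ("b", 1672534800), ("a", 1672617600)]
print(dau_mau_ratio(sample))  # 0.75
```

A ratio close to 1 would mean that most devices in the panel are seen on most days.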
2: Data Completeness
The amount of data captured is dependent on a number of factors including device hardware, SDK collection methodology, user opt-in permission, etc. As such, one common issue seen with location data is incomplete or missing data fields.
At Quadrant, we developed a metric known as “Data Completeness” (the percentage of each data attribute that contains verifiable data). This allows data buyers to quickly and easily assess the amount of missing data points in each attribute.
Screenshot of Data Completeness Metrics in Quadrant's Data Quality Dashboard.
As an example, latitude, longitude, timestamp, and horizontal accuracy are core attributes of location data. Without these fields, the data is essentially useless from a geospatial practitioner's point of view.
When onboarding data, we would want to understand how much of the data in those fields is absent or missing; too much missing data and we would just be wasting time and resources.
At Quadrant, we always aim for datasets with 100 per cent completeness for the core fields. Other attributes, such as country code, operating system, or user agent tend to be given a bit more leeway.
It is worth pointing out that we are able to filter our datafeeds for our buyers, for example where they want fields with missing values removed so that only fields with 100 per cent completeness are included.
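As a rough illustration of the idea, completeness per attribute could be computed along these lines. This is a sketch under assumptions: records are plain dictionaries, the field names are illustrative, and "contains data" is simplified to "not missing, not None, and not an empty string" rather than the verifiability check described above.

```python
def completeness(records, fields):
    """Return {field: percentage of records with a usable value}.

    Simplifying assumption: a value counts as present when the key exists
    and the value is neither None nor an empty string. A value of 0 or 0.0
    (for example a latitude on the equator) still counts as present.
    """
    total = len(records)
    result = {}
    for field in fields:
        present = sum(1 for r in records if r.get(field) not in (None, ""))
        result[field] = 100.0 * present / total if total else 0.0
    return result


# Hypothetical records with illustrative field names.
records = [
    {"latitude": 1.30, "country_code": "SG"},
    {"latitude": None, "country_code": ""},
    {"latitude": 0.0},
    {"latitude": 3.14, "country_code": "MY"},
]
print(completeness(records, ["latitude", "country_code"]))
# {'latitude': 75.0, 'country_code': 50.0}
```

In this toy sample, the core field falls short of 100 per cent, which is the kind of gap we would want to understand before onboarding a feed.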
3: Horizontal Accuracy
Another key metric we always consider is Horizontal Accuracy (HA), the estimated radius of error around a reported position. An HA of 10 meters and below is generally considered very good (for GPS data), so lower values are better. In fact, we tend to reject data sources with high HA values. In our Data Quality Dashboard, this metric is visualised as a histogram.
Visualisation of HA for Orion feed.
It’s worth noting that HA can vary based on a user’s environment and weather conditions. For example, in certain built-up areas or if there is bad weather, readings can be less accurate. By contrast, clear skies and an open line of sight to satellites will likely result in better HA.
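For readers who want to reproduce a similar view on their own feed, the sketch below buckets HA readings into a simple histogram and reports the share of readings at or below a threshold. The function names, the 10-meter bin width, and the sample values are assumptions chosen for illustration, not the dashboard's actual logic.

```python
from collections import Counter


def ha_histogram(ha_values_m, bin_width_m=10):
    """Count readings per bin; each key is the lower edge of a bin in meters."""
    counts = Counter()
    for ha in ha_values_m:
        lower_edge = int(ha // bin_width_m) * bin_width_m
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


def share_within(ha_values_m, threshold_m=10.0):
    """Fraction of readings whose HA is at or below the threshold."""
    if not ha_values_m:
        return 0.0
    return sum(1 for ha in ha_values_m if ha <= threshold_m) / len(ha_values_m)


# Hypothetical HA readings in meters.
readings = [3.0, 8.5, 10.0, 25.0, 65.0]
print(ha_histogram(readings))   # {0: 2, 10: 1, 20: 1, 60: 1}
print(share_within(readings))   # 0.6
```

Note that a reading of exactly 10.0 lands in the 10 to 20 bin of the histogram but still counts as "10 meters and below" for the threshold check.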
4: Days Seen Per Month
Days seen per month is a metric that gets even more granular than DAU/MAU. It enables us to see the distribution of devices over a certain period of time. We start by evaluating the number of days on which each device is seen over the course of the month.
In the chart below, we can see that almost 20 per cent of the dataset is seen on 25 or more days of the month, which is a very high number. By contrast, if we see a device on fewer than five days of the month, for example, it may limit the use cases and whether we can derive quality insights.
Days Seen Per Month Distribution for Datafeed 01.
At Quadrant, we are always on the lookout for quality location data feeds where devices are seen over a threshold of a certain number of days per month (with the higher being the better).
However, as with other metrics, in some cases we will accept data with a lower number of days seen per month if it complements our existing dataset. This is particularly true if it helps fill in missing information on a user’s journey, such as between two locations.
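A distribution like this can be approximated with a few lines of code. The sketch below counts the distinct days on which each device appears and then reports the percentage of devices at each day count. The function name, the `(device_id, unix_timestamp)` input shape, and the use of UTC days are assumptions made for the example.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def days_seen_distribution(pings):
    """Return {days_seen: percentage of devices} for one month of pings.

    `pings` is an iterable of (device_id, unix_timestamp_seconds) tuples
    (hypothetical input shape); days are bucketed in UTC.
    """
    days_by_device = defaultdict(set)
    for device_id, ts in pings:
        day = datetime.fromtimestamp(ts, tz=timezone.utc).date()
        days_by_device[device_id].add(day)
    total = len(days_by_device)
    counts = Counter(len(days) for days in days_by_device.values())
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}


BASE = 1672531200  # 2023-01-01 00:00:00 UTC
DAY = 86400
sample = [
    ("a", BASE), ("a", BASE + 60), ("a", BASE + DAY), ("a", BASE + 2 * DAY),
    ("b", BASE),
]
print(days_seen_distribution(sample))  # {1: 50.0, 3: 50.0}
```

Summing the percentages for the highest day counts gives the share of consistently seen devices, which can then be compared against whatever threshold a use case requires.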
5: Hours Seen Per Day
The final metric we will share is even more granular. As with days seen per month, a higher number of Hours Seen Per Day is usually more valuable for most use cases.
This should be obvious, because it means we are recording a more complete picture of a user’s daily activity in terms of where they are located on an hour-by-hour basis.
Hours Seen Per Day Metric available in Quadrant's Data Quality Dashboard.
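The same counting approach works at the hourly level. As a hedged sketch, the function below counts the distinct hours in which each device is seen on each day. The function name and input shape are hypothetical, and hours are bucketed in UTC for simplicity; a real analysis may prefer the device's local time zone.

```python
from collections import defaultdict
from datetime import date, datetime, timezone


def hours_seen_per_day(pings):
    """Return {(device_id, day): number of distinct hours seen}.

    `pings` is an iterable of (device_id, unix_timestamp_seconds) tuples
    (hypothetical input shape). Assumption: hours are bucketed in UTC.
    """
    hours = defaultdict(set)
    for device_id, ts in pings:
        dt = datetime.fromtimestamp(ts, tz=timezone.utc)
        hours[(device_id, dt.date())].add(dt.hour)
    return {key: len(seen) for key, seen in hours.items()}


BASE = 1672531200  # 2023-01-01 00:00:00 UTC
sample = [
    ("a", BASE), ("a", BASE + 1800),         # two pings in the same hour
    ("a", BASE + 3600), ("a", BASE + 7200),  # two further hours
    ("b", BASE + 86400),                     # one hour on the next day
]
result = hours_seen_per_day(sample)
print(result[("a", date(2023, 1, 1))])  # 3
```

Averaging these counts per device, or plotting them as a distribution, gives a quick sense of how complete the hour-by-hour picture is.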
Conclusion
While we use the above-mentioned methods and metrics to evaluate location data, they are by no means everything we do to assess data quality. We hope that they provide some general insights and ideas on how to evaluate the quality of the location data feeds you are investing in.
At Quadrant, we pride ourselves on providing our partners the highest quality location data, based on their specific data needs. We also provide custom filtered location data feeds, which ensure that our partners only receive the data they need to support their business objectives.
To achieve that, our data engineering team regularly checks our data for signs of poor accuracy and manipulation. We will go into more depth on this in an upcoming post covering inaccurate and fraudulent geolocation data.