Performing Extrapolation on Location Data to Derive Relevant Insights

Picture of Sai Jami

Location data is collected from multiple sources of varying quality: GPS signals from mobile devices, beacons, Wi-Fi connections, the notorious bidstream, and more. In most cases, even genuine location data cannot represent the entire population of a region. This discrepancy can be attributed to smartphone penetration in the country, app-specific demographic variations, hardware inconsistencies, and differences between the sources of location data.

To perform meaningful analysis that accounts for mobility patterns and other trends in a larger region, data scientists use projection models to estimate a region’s population and normalise data counts to fit the use case. This is called data extrapolation.

Data extrapolation is the process of taking a set of values and estimating the numbers beyond the original observation range. It is commonly used in analysis to estimate or predict impact in real time and to support informed decisions.

There are multiple ways to perform data extrapolation. One good example is to multiply the data by a projection metric. A projection metric is derived from the actual population of the region of interest, and the observed numbers are scaled proportionally to match that population.

Using the available location data counts, we can extrapolate the numbers based on the known population density. Along with location data, data buyers often add other sources like demographics of the region, mobile penetration rate, Point-of-Interest (POI) attributes, notable events that may impact the footfall in the region, etc.

Take footfall analysis, one of the most prominent analyses performed using location data, as an example. It is mostly used to understand the reachability and popularity of a franchise at a specific location.

To demonstrate data extrapolation, we’ll use Quadrant’s location data from 2nd April 2021 to 8th April 2021 to perform footfall analysis for Vivo City Mall, the largest and most crowded mall in Singapore. What could the maximum reachability be if a new Decathlon franchise were to open at Vivo City?


The observed footfall (number of visitors) at the mall is shown below.


*The first three bars indicate Friday, Saturday, and Sunday, whereas the last four bars indicate Monday to Thursday.

We see a lot more mall visitors on 2nd April 2021 (a Friday and a public holiday in Singapore) and on 3rd and 4th April (the weekend) than on weekdays (5th April 2021 to 8th April 2021).

And by looking at the hourly pattern of visitors, we can tell that the mall is most crowded between 12 PM and 10 PM.

Hourly Distribution
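The daily and hourly views above boil down to counting distinct devices per time bucket. The snippet below is a minimal sketch of that idea, not Quadrant's actual pipeline: the `pings` records, the device IDs, and the `unique_devices_by` helper are all hypothetical, and it assumes timestamps are already in local Singapore time and already filtered to the mall's area.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical pings already filtered to the mall: (device_id, local timestamp).
pings = [
    ("dev-a", "2021-04-02T12:15:00"),
    ("dev-a", "2021-04-02T12:40:00"),  # same device, same hour: counted once
    ("dev-b", "2021-04-02T12:05:00"),
    ("dev-b", "2021-04-02T19:30:00"),
    ("dev-c", "2021-04-05T19:45:00"),
]

def unique_devices_by(records, key):
    """Count distinct devices per bucket; `key` maps a datetime to a bucket."""
    buckets = defaultdict(set)
    for device_id, timestamp in records:
        buckets[key(datetime.fromisoformat(timestamp))].add(device_id)
    return {bucket: len(devices) for bucket, devices in sorted(buckets.items())}

# Daily footfall: distinct devices seen on each calendar day.
daily = unique_devices_by(pings, lambda t: t.date().isoformat())

# Hourly pattern: distinct devices per hour of day, pooled across all days.
hourly = unique_devices_by(pings, lambda t: t.hour)

print(daily)
print(hourly)
```

Counting distinct devices rather than raw pings keeps a device that reports many times in the same bucket from inflating the footfall.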

The footfall analysis so far is based entirely on the observed sample of location data. To estimate numbers for the whole population, we can apply an extrapolation factor that projects the sample counts to the population.

Overall number of devices observed in Singapore in April 2021 (sample dataset): 585,230
Total population of Singapore (as of 2020): 5,896,686
Projection metric = 5,896,686 / 585,230 ≈ 10.08
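The same calculation can be written as a small helper. This is only a sketch of the ratio described above; the function name `projection_metric` and the input checks are illustrative additions, while the two numbers come from the figures quoted in this article.

```python
def projection_metric(population: int, observed_devices: int) -> float:
    """Ratio used to scale sample counts up to the full population."""
    if observed_devices <= 0:
        raise ValueError("observed_devices must be positive")
    if population < observed_devices:
        raise ValueError("population should not be smaller than the sample")
    return population / observed_devices

# Figures quoted in the article.
singapore_population = 5_896_686
devices_in_sample = 585_230

metric = projection_metric(singapore_population, devices_in_sample)
print(round(metric, 2))  # 10.08
```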

We can project the measured footfall counts by multiplying the original counts by a factor of 10.08. Due to safe distancing measures during the COVID-19 pandemic, the Singapore government capped the maximum occupancy of the mall at any given time. This resulted in a 20% drop in occupancy rate compared to the usual period.

Adjusted projection of Visitors
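As a rough illustration of the projection step, the sketch below scales observed daily counts by the projection metric and an occupancy adjustment. The observed counts are made-up placeholders, not the values behind the chart, and treating the 20% drop as a multiplicative factor of 0.8 is an assumption about how the adjustment is applied.

```python
PROJECTION_METRIC = 10.08    # population / observed devices, from the article
OCCUPANCY_ADJUSTMENT = 0.8   # assumption: the 20% drop applied as a 0.8 multiplier

def project_footfall(observed_counts, metric=PROJECTION_METRIC,
                     adjustment=OCCUPANCY_ADJUSTMENT):
    """Scale observed device counts to estimated visitors, rounded to whole people."""
    return {
        day: round(count * metric * adjustment)
        for day, count in observed_counts.items()
    }

# Placeholder observed counts for illustration only (not real data).
observed = {"2021-04-03": 1000, "2021-04-06": 500}

projected = project_footfall(observed)
print(projected)
```

Keeping the adjustment as a separate parameter makes it easy to set it to 1.0 for a period without occupancy caps.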

Note that the footfall would also include passers-by, people who are waiting in transit, or people who are going to Sentosa Island.

After applying the projection factor, the maximum potential footfall at the mall averages around 10,000 visitors on weekends (public holiday included) and around 5,000 visitors on weekdays. Therefore, Decathlon could have a potential reachability of 5,000 to 10,000 visitors every day if it were to open a franchise in the Vivo City shopping mall.

To make our projections more tailored to our use case, we can also add additional metrics. In this case, the percentage of sports enthusiasts in Singapore would be a good metric to factor in. Zip-code-level population could be another, on the assumption that people in the same zip code or in nearby blocks visit the mall often. Age demographics could also be used to target the right audience. Many other metrics could be considered in order to make better decisions for our use case.
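One simple way to fold such metrics in is to multiply the projected footfall by the share of each target segment. The sketch below assumes the segments are independent of each other, which is a simplification, and the share values are hypothetical placeholders rather than real statistics for Singapore.

```python
def addressable_visitors(projected_visitors, segment_shares):
    """Narrow a projected visitor count by a set of segment shares (each 0 to 1).

    Assumes the segments are statistically independent, so shares multiply.
    """
    estimate = float(projected_visitors)
    for name, share in segment_shares.items():
        if not 0.0 <= share <= 1.0:
            raise ValueError(f"share for {name!r} must be between 0 and 1")
        estimate *= share
    return estimate

# Hypothetical shares for illustration only.
shares = {"sports_enthusiasts": 0.25, "target_age_group": 0.5}

weekend_estimate = addressable_visitors(10_000, shares)
print(weekend_estimate)  # 1250.0
```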

With more organisations relying on geospatial data to make informed decisions that impact real-world scenarios, location data analysis is not optional but a necessity. When it comes to data extrapolation, quality location data gives more confidence in the estimates than previous estimation methods. As a result, businesses can make better-informed decisions and gain a competitive edge.



