February 7, 2022 Saif Kakakhel

Using AI to clean Personally Identifiable Information from user-generated data sets

Having access to large repositories of data enables businesses to optimise operations in several ways, including personalised advertising, greater supply chain efficiency, and more satisfying customer experiences.

However, people have grown increasingly wary of trusting businesses and governments with their data. Several high-profile data privacy breaches at LinkedIn, Alibaba, and Yahoo (to name a few) collectively impacted billions of users.

Managing data ethically and making sure no Personally Identifiable Information (PII) makes it into large data sets is a challenge we take very seriously at Quadrant.


Quadrant's Geolancer app and the privacy implications of POI data

Using Geolancer, our proprietary point-of-interest (POI) collection and verification platform, people can manually submit information about physical locations in exchange for eQUAD (a cryptocurrency issued by Quadrant Protocol). Besides ensuring accuracy through a multistage verification process, Geolancer allows users to record custom attributes to generate bespoke datasets tailored to solve specific business problems.

When collecting POI data via Geolancer, users take pictures of the locations they record. Since businesses want POI data for densely populated cities, some of the collected images inevitably capture PII such as human faces and vehicle number plates.

Classification of data under privacy regulations

Data privacy frameworks – like the EU’s General Data Protection Regulation (GDPR) – impose constraints on businesses in terms of the customer data they can collect. Specifically, such frameworks are designed to prevent companies from collecting PII without acquiring explicit consent from individuals.

Facial images fall under these rules once they are processed to identify a person. As per Article 4(14) of the EU's GDPR: “‘biometric data’ means personal data resulting from specific technical processing relating to the physical, physiological or behavioural characteristics of a natural person, which allow or confirm the unique identification of that natural person, such as facial images or dactyloscopic data.” Article 9 of both the EU's and the UK's GDPR lists biometric data processed to uniquely identify a person among the special categories of personal data, which may not be processed unless a specific exception, such as explicit consent, applies. Regulators can fine companies that breach these rules. As such, businesses that collect images of identifiable people without a lawful basis expose themselves to significant legal risk.

The California Consumer Privacy Act (CCPA) also regulates how businesses collect and use personal information, which includes biometric data. According to the CCPA, biometric information is classified as “an individual’s physiological, biological or behavioural characteristics, including any data that can be used, singly or in combination with other identifying data, to establish individual identity.”

Notable incidents where companies have been fined for collecting and sharing facial data

France's data protection authority (CNIL) is gearing up to take legal action against Clearview AI, a facial recognition company that has amassed more than 10 billion sensitive images of people from social media and other internet sources without their consent. The United Kingdom’s Information Commissioner’s Office intends to fine Clearview 17 million pounds for violations of data privacy laws.

Another business that incurred monetary penalties for data privacy breaches is Mercadona. In 2021, AEPD (Spain’s data protection authority) fined Mercadona approximately 2.5 million Euros for running facial recognition programs on anyone who entered the company’s supermarkets. The ruling stated that the company had violated several articles of GDPR.

In the same year, Italy’s Garante per la Protezione dei Dati Personali fined a Milan-based university 200,000 Euros for monitoring and recording students for suspicious behaviour during remote examinations. Since this mechanism of supervision involved processing biometric data, it violated GDPR. Similarly, in 2019, the Swedish data protection authority fined a school 20,000 Euros for using a facial recognition system to monitor the attendance of a small group of students, since this too involved processing biometric information in violation of GDPR.

How Quadrant ensures privacy and saves our buyers compliance effort and resources

To proactively ensure compliance with data privacy laws, Quadrant’s data science team has built proprietary algorithms to scan POI data and images for PII and take appropriate measures to conceal sensitive information before data is passed on to customers.

In the first stage, all images collected via Geolancer pass through a proprietary AI model that blurs potentially sensitive information captured in them, such as faces and vehicle registration plates. The AI model has a high degree of accuracy and will improve as it is exposed to more training data. In time, the data science team will add the ability to remove other forms of PII as well.

Within 4-10 seconds of a Geolancer uploading an image through the app, the algorithm has analysed the image and, if needed, modified it. The old image is deleted and replaced with the PII-free image. To ensure that no sensitive information falls through the cracks, an annotation team monitors the output and manually blurs any PII that escapes the AI model.
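To make the blur-and-replace step more concrete, here is a minimal Python sketch. It is not Quadrant's proprietary model. It assumes that a detector has already returned bounding boxes for faces or number plates (the `boxes` list is hypothetical), treats an image as a plain grid of grayscale values, and uses simple pixelation as a stand-in for blurring. The names `pixelate_region` and `redact` are illustrative only.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale pixels (0-255), one list per row
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); bottom/right exclusive


def pixelate_region(image: Image, box: Box) -> None:
    """Overwrite every pixel inside `box` with the region's mean value, in place."""
    if not image or not image[0]:
        return
    height, width = len(image), len(image[0])
    top, left, bottom, right = box
    # Clamp the box to the image so out-of-range detections are harmless.
    top, bottom = max(0, top), min(height, bottom)
    left, right = max(0, left), min(width, right)
    if top >= bottom or left >= right:
        return
    pixels = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
    mean = sum(pixels) // len(pixels)
    for r in range(top, bottom):
        for c in range(left, right):
            image[r][c] = mean


def redact(image: Image, boxes: List[Box]) -> Image:
    """Return a redacted copy of `image`; the input is left untouched."""
    redacted = [row[:] for row in image]
    for box in boxes:
        pixelate_region(redacted, box)
    return redacted


# Hypothetical example: one detected region covering rows 0-1, columns 1-2.
image = [
    [0, 50, 100, 150],
    [10, 60, 110, 160],
    [20, 70, 120, 170],
]
boxes = [(0, 1, 2, 3)]  # assumed detector output, not real data
clean = redact(image, boxes)
```

Because `redact` returns a copy instead of editing in place, a caller can store the redacted version first and only then delete the original, which mirrors the replace step described above.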

As a result, clients receive POI data with no PII. This also extends to data that customers send us for verification: when Geolancers verify that data, PII is blurred in both the old and the new images they collect.

Here are a few sample images that have gone through Quadrant’s PII blurring algorithm:

[Images: four sample photos processed by the PII blurring algorithm]

To learn about the intricacies and use cases of POI data, visit our POI knowledge base, or check out our customer success stories for Geolancer.
