Tracking the Trackers: Ethical measurements of web privacy leakages in-the-wild
22 June, 14:00 BST @ WebSci 2021
Nishanth Sastry, Guillermo Suarez De Tangil, Nicolas Kourtellis, Mainack Mondal, Xuehui (Rachel) Hu, Pushkal Agarwal
University of Surrey, IMDEA Networks, King’s College London, Telefonica Research, IIT Kharagpur
First introduced in the mid-nineties as a way of recording client-side state, cookies have proliferated on the Web and have become a fundamental part of the Web ecosystem. However, there is widespread concern that cookies are being abused to track and profile individuals online for commercial, analytical and various other purposes. Consequently, there has been an explosion of research into understanding the prevalence of tracking on the Web, and the resulting leakage of Personally Identifiable Information (PII). In this tutorial, we aim to introduce the audience to state-of-the-art empirical measurement methods and techniques that are being used to understand and quantify web tracking in the wild.
Introduction (14:00 - 14:10 BST)
- Agenda & Overview
- Expected background: some familiarity with JS and Python
- Quick heads up about setup needed for the lab
Types/means of tracking (Slides) (14:10 - 14:35 BST)
- Cookies
- Cookie Synchronization
- Invisible pixels
- Device Fingerprinting
- CNAME cloaking-based tracking
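As a rough illustration of how one of the techniques above can be measured, the following sketch flags possible cookie synchronization: a cookie value set by one site that later appears in a request URL sent to a different site. This is a common heuristic, not necessarily the method used in the tutorial. The function names, the `min_len` threshold, the sample data and the "last two labels" notion of a site are all assumptions made for this example; a real study would use the Public Suffix List and handle encoded or hashed identifiers.

```python
from urllib.parse import urlsplit


def site_of(host):
    """Naive site name: the last two labels of the hostname.

    Assumption for this sketch only; real measurements should use the
    Public Suffix List to find the registrable domain.
    """
    return ".".join(host.lower().strip(".").split(".")[-2:])


def find_cookie_syncs(cookies, request_urls, min_len=8):
    """Flag cookie values that show up in requests to a different site.

    cookies:      list of (host_that_set_cookie, cookie_value)
    request_urls: list of observed request URLs
    min_len:      ignore short values, which match by accident too often
    Returns a list of (cookie_site, destination_site, cookie_value).
    """
    syncs = []
    for url in request_urls:
        parts = urlsplit(url)
        dest = site_of(parts.hostname or "")
        haystack = parts.path + "?" + parts.query
        for host, value in cookies:
            owner = site_of(host)
            if len(value) >= min_len and owner != dest and value in haystack:
                syncs.append((owner, dest, value))
    return syncs


# Hypothetical observations from a single page visit.
cookies = [
    ("ads.tracker-a.example", "u1d9f3c7a2b"),
    ("www.shop.example", "sess42"),
]
observed_urls = [
    "https://sync.tracker-b.example/match?partner=a&uid=u1d9f3c7a2b",
    "https://cdn.shop.example/app.js",
]

print(find_cookie_syncs(cookies, observed_urls))
```

Plain substring matching keeps the sketch short, but it misses identifiers that are URL-encoded, hashed or split across parameters, so it should be read as a lower bound on syncing.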
GDPR and Consent Management on the Web (Slides) (14:35 - 15:00 BST)
- Legal and regulatory basis for fighting back against trackers
- GDPR and other data privacy laws around the world
- Industry fightback (and ways to manage the regulatory burden)
- IAB, ICC Categories and Consent Management Platforms
- Dark patterns that deceive users into consenting to tracking
Ethical and privacy-preserving internet-mediated research (Slides) (15:00 - 15:35 BST)
- Tracking as internet-mediated research
- The Belmont Report
- Basic ethical principles
- Theories of data privacy
- Ensuring ethics and privacy of Internet-mediated research
Practical Session Teaser (15:35 - 15:45 BST)
- Automated measurements (OpenWPM and bare-bones Selenium) (Slides) (Installations)
- Human-centered measurements (Browser extensions) (Slides)
Coffee Break (15:45 - 16:00 BST)
Practical Session
Hands-on experience with state-of-the-art tools and techniques, including some developed by the tutorial organisers, as well as instructions for our GitHub repositories and datasets. Most of the online activities will involve some light Python scripting on Google Colab. Familiarity with Python will be helpful but not required. We will also build a basic browser extension to measure web privacy. Familiarity with JavaScript and HTML will be helpful for this.
Optional requirements:
- Chrome browser, to follow browser extension-related activities.
- Linux or Mac, to install OpenWPM (not essential, but installation may provide additional insights into performing automated measurements)
Automated measurements (Slides) (Installations) (Google Colab) (16:00 - 16:45 BST)
- How to install / launch OpenWPM?
- What to do when anti-crawling mechanisms are deployed?
- What options / settings are available?
- How to build synthetic profiles?
- What data should be collected?
Human-centered measurements (Slides) (Download) (Google Colab) (16:45 - 17:30 BST)
- How to create a browser extension?
- Monitoring and collecting user activity
- What data should be collected?
- Handling informed consent
- Aggregating and visualizing data
- Interpreting results
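To give a flavour of the aggregation step, here is a minimal sketch that counts on how many visited sites each third party appears, given a log of (page URL, request URL) pairs. The record format, function names and sample data are assumptions for illustration and do not reflect the output schema of OpenWPM or of the tutorial's extension. As above, a "site" is naively taken to be the last two hostname labels.

```python
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def site_of(url):
    """Naive site name for a URL: last two labels of its hostname.

    Assumption for this sketch; use the Public Suffix List in practice.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.lower().split(".")[-2:])


def third_party_prevalence(records):
    """Count, for each third-party site, how many first-party sites embed it.

    records: iterable of (page_url, request_url) pairs.
    Returns a list of (third_party_site, number_of_sites), most common first.
    """
    per_site = defaultdict(set)
    for page_url, request_url in records:
        first_party = site_of(page_url)
        other = site_of(request_url)
        if other != first_party:
            # A set ensures each third party counts once per visited site.
            per_site[first_party].add(other)
    counts = Counter(tp for parties in per_site.values() for tp in parties)
    return counts.most_common()


# Hypothetical request log covering two visited sites.
records = [
    ("https://news.example/", "https://static.news.example/a.css"),
    ("https://news.example/", "https://px.tracker-a.example/p.gif"),
    ("https://shop.example/", "https://px.tracker-a.example/p.gif"),
    ("https://shop.example/", "https://cdn.widgets.example/w.js"),
]

for third_party, n_sites in third_party_prevalence(records):
    print(third_party, n_sites)
```

Reporting only per-site counts like these, rather than raw URLs, is one simple way to keep collected data aggregated before it is visualized or shared.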