
Under the Digital Markets Act (DMA), the European Commission has proposed a provision that would require Google to share search data with third parties via an automated Application Programming Interface (API). Cybersecurity and privacy experts warn that the proposed data anonymization mechanism has fundamental flaws and could open the door to mass surveillance of users across the European Union (EU).
The DMA targets major technology firms, the "gatekeepers" such as Google, with the goal of fostering greater competition in digital markets. Critics argue, however, that this particular new rule risks compromising the privacy of European users and, potentially, national security. Among them is Lukasz Olejnik, a recognized expert in cybersecurity and data protection. After examining the draft document, which aims to boost competition by forcing Google to grant qualified entities access to search data, Olejnik concluded that the proposed anonymization scheme would likely not prevent users from being re-identified. In his view, the mechanism itself creates an opening for large-scale collection of sensitive information.
The new provision requires Google to continuously stream search activity from across the entire EU through the API. Reports indicate the data package set for transfer includes full query texts, timestamps, approximate user locations, language, device type, as well as granular behavioral signals like clicks, page scrolls, and query refinements. While IP addresses and account identifiers are supposedly slated for removal, Olejnik maintains that the remaining data set is sufficient for de-anonymization.
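To make the scope of the export concrete, here is a minimal sketch of what one record in such a stream might look like. The field names and the sample values are purely illustrative assumptions, not taken from the Commission's draft; only the categories of data (query text, timestamp, coarse location, language, device type, behavioral signals) come from the reporting above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SearchRecord:
    """Hypothetical shape of one record in the export stream."""
    query: str                  # full query text
    timestamp: int              # Unix time of the search
    location_bucket: str        # coarse geographic bucket (see below)
    language: str
    device_type: str
    clicked_url: Optional[str] = None    # behavioral signal: result clicked
    scroll_depth: Optional[float] = None # behavioral signal: page scrolling
    refined_from: Optional[str] = None   # previous query, if a refinement
    # Note: no IP address or account identifier -- those are slated for removal.

record = SearchRecord(
    query="symptoms of rare disease X",
    timestamp=1_717_000_000,
    location_bucket="PL-WAW-0042",  # invented bucket label
    language="pl",
    device_type="mobile",
)
```

Even with IPs and account IDs stripped, every remaining field in this sketch is a signal that Olejnik argues can contribute to re-identification.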
[Image description in original text] A segment of the European Commission’s draft highlighting the requirement for Alphabet to transfer all search queries entered by users into Google Search, including query modifications and metadata. Image source: blog.lukaszolejnik.com
The anonymization system is based on a "whitelist" (allow-list) model. Specific elements of search queries, such as names or keywords, may be transferred only if they have been entered by at least 50 authorized users over a 13-month period. Once included, an element remains on the list for up to five years. Crucially, the threshold applies to segments of queries, not to entire searches; as a result, a unique or sensitive search composed of commonly used words can still end up in the exported data.
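The loophole above can be illustrated with a minimal sketch of such a per-token frequency threshold. The function names, the whitespace tokenization, and the sample data are assumptions for illustration; the draft's actual segmentation rules are not public in this level of detail.

```python
from collections import defaultdict

THRESHOLD = 50  # distinct users required within the 13-month window (per draft)

def build_allow_list(observations):
    """observations: iterable of (user_id, token) pairs seen in the window."""
    users_per_token = defaultdict(set)
    for user_id, token in observations:
        users_per_token[token].add(user_id)
    return {t for t, users in users_per_token.items() if len(users) >= THRESHOLD}

def exportable_tokens(query, allow_list):
    # The threshold is checked per token, not per whole query: a one-off,
    # sensitive query built entirely from common words passes in full.
    return [t for t in query.split() if t in allow_list]

# 60 users each searched the common words "cancer", "clinic", "warsaw"...
obs = [(u, t) for u in range(60) for t in ("cancer", "clinic", "warsaw")]
allow = build_allow_list(obs)
# ...so this specific combination is fully exportable, even if only
# one person ever searched it as a whole:
print(exportable_tokens("cancer clinic warsaw", allow))
# -> ['cancer', 'clinic', 'warsaw']
```

The point of the sketch: frequency is measured on the parts, while the privacy risk lives in the combination.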
Olejnik stresses that this architectural setup invites targeted manipulation. Malicious actors could engage in “seeding” the system by running repeated queries from numerous accounts to push desired terms onto the permissive list. Once a term is approved, it could allow for years of tracking sensitive inquiries linked to specific persons, organizations, or subjects.
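A seeding attack against such a threshold is cheap to express. The sketch below reuses the same hypothetical allow-list logic; the account identifiers and the target term are invented for illustration.

```python
from collections import defaultdict

THRESHOLD = 50  # distinct users required for a term to join the allow list

def build_allow_list(observations):
    users_per_token = defaultdict(set)
    for user_id, token in observations:
        users_per_token[token].add(user_id)
    return {t for t, users in users_per_token.items() if len(users) >= THRESHOLD}

# An attacker "seeds" a niche term -- say, a target's surname -- by querying
# it from 50 controlled accounts (IDs here are purely illustrative):
seeded = [(f"bot-{i}", "targetsurname") for i in range(50)]
allow = build_allow_list(seeded)
print("targetsurname" in allow)  # -> True: the term is now exportable
```

Once seeded, the term stays on the list for up to five years, long enough to monitor queries mentioning it.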
[Image description in original text] De-anonymization scheme: A website owner cross-references their visit log with the Google export to identify a user’s confidential query.
The transferred data can easily be correlated with external sources. The stream includes timestamps of the pages users visited and generalized interaction times. A web analytics provider or a site running tracking scripts would have enough information to match search records against its own visit logs, reconstructing individual search histories even though no direct identifiers are present.
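The correlation attack described above can be sketched as a simple join on clicked URL and timestamp. All identifiers, URLs, times, and the tolerance value below are invented for illustration; the idea, matching a site's own visit log against the "anonymized" export, is the one Olejnik describes.

```python
TOLERANCE = 5  # assumed allowable clock skew (seconds) between the two logs

site_log = [  # (visitor_id, url, unix_time) -- from the site's own analytics
    ("cookie-abc123", "https://example.org/therapy", 1_717_000_100),
]
export = [  # (query, clicked_url, unix_time) -- from the anonymized stream
    ("depression therapy near me", "https://example.org/therapy", 1_717_000_102),
    ("weather warsaw", "https://weather.example", 1_717_000_050),
]

def reidentify(site_log, export, tol=TOLERANCE):
    hits = []
    for visitor, url, t_visit in site_log:
        for query, clicked, t_click in export:
            if clicked == url and abs(t_visit - t_click) <= tol:
                hits.append((visitor, query))  # visitor's search is exposed
    return hits

print(reidentify(site_log, export))
# -> [('cookie-abc123', 'depression therapy near me')]
```

The site operator already knows who "cookie-abc123" is; the export supplies the missing piece, what that person searched for.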
Geolocation represents another weak point in the system. Coordinates are generalized into “buckets” encompassing an area of at least 3 km² and covering a minimum of 1,000 users. Nevertheless, such zones may still correspond to very specific locations, such as university campuses or government districts. Over time, observers might be able to trace search patterns near medical facilities, state agencies, or high-security installations.
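Profiling a known location from coarse buckets needs nothing more than aggregation. The bucket label and the query records below are invented; the sketch only illustrates why a ≥3 km² / ≥1,000-user bucket is not protective when it happens to cover a single notable site.

```python
from collections import Counter

# Hypothetical bucket that happens to cover a known site, e.g. a ministry district.
KNOWN_BUCKET = "bucket-ministry-district"

export = [  # (location_bucket, query) -- invented sample records
    ("bucket-ministry-district", "visa regulations country X"),
    ("bucket-ministry-district", "secure messaging app"),
    ("bucket-residential-17", "pizza delivery"),
    ("bucket-ministry-district", "secure messaging app"),
]

# Aggregate queries originating from the bucket of interest over time:
profile = Counter(q for bucket, q in export if bucket == KNOWN_BUCKET)
print(profile.most_common(1))
# -> [('secure messaging app', 2)]
```

Repeated daily, this yields exactly the kind of behavioral profile of a facility's occupants that the article warns about.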
[Image description in original text] Observer builds a user profile by tracking daily search queries originating from a geographical “bucket” tied to a known location.
Olejnik deems the proposal one of the most significant potential data leakage threats Europe has faced in recent years. In his estimation, the existing safeguards rely on procedural oversight rather than robust technical defenses. He also disputes the foundational premise that frequency thresholds and partial anonymization can effectively deter misuse.