Canadian Watchdogs Accuse OpenAI of Privacy Violations in ChatGPT Training, Urge Legal Reforms


The Joint Probe: How Canadian Regulators Uncovered OpenAI's Shortcomings

Canadian privacy authorities have delivered a landmark ruling against OpenAI, the creator of the wildly popular ChatGPT artificial intelligence tool. In a detailed report released on May 6, 2026, federal and provincial watchdogs concluded that the company violated key privacy laws during the training of its early ChatGPT models. The investigation, which spanned three years, highlighted systemic issues in how vast troves of personal data were scraped from the public internet without proper consent or safeguards. This development underscores growing tensions between rapid AI innovation and the protection of individual privacy rights in Canada.

The probe was triggered by a complaint filed in April 2023 to the Office of the Privacy Commissioner of Canada (OPC). It quickly expanded into a collaborative effort involving the OPC, Quebec's Commission d'accès à l'information (CAI), British Columbia's Office of the Information and Privacy Commissioner (OIPC-BC), and Alberta's Office of the Information and Privacy Commissioner (OIPC-AB). Together, these bodies examined OpenAI's compliance with the Personal Information Protection and Electronic Documents Act (PIPEDA)—Canada's federal private-sector privacy law—and equivalent provincial statutes.

At the heart of the matter were ChatGPT's foundational models, GPT-3.5 and GPT-4, released in late 2022 and early 2023 respectively. These large language models (LLMs) were trained on datasets comprising trillions of words gathered primarily from publicly accessible websites. Sources included web-crawl corpora like Common Crawl, licensed content from media outlets and stock image providers, and even user interactions within ChatGPT itself. Regulators found that this process captured, sometimes inadvertently and sometimes directly, sensitive personal details about Canadians, such as health conditions, political opinions, ethnic origins, and information pertaining to children.

Breaking Down the Data Collection Process: A Step-by-Step Look

To understand the violations, it's essential to grasp how LLMs like those powering ChatGPT are built. The training process unfolds in stages: pre-training, where the model ingests massive unstructured text to learn language patterns; supervised fine-tuning, using labeled examples; and reinforcement learning from human feedback (RLHF), refining outputs based on preferences.

Step one involves data acquisition. OpenAI deployed tools like its GPTBot crawler to scrape the open web, amassing datasets of which more than 99 percent was publicly available content. No comprehensive screening excluded social media profiles, forums, or personal blogs, leading to overcollection. Regulators noted that while some filters blocked harmful sites or paywalled content, they were rudimentary and insufficient for the scale involved.
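To make the screening gap concrete, here is a minimal sketch of category-based URL screening at the acquisition stage. It is an illustration only: the host list, the category labels, and the `excluded_category` and `screen_urls` helpers are hypothetical and are not drawn from OpenAI's pipeline or the regulators' report.

```python
from urllib.parse import urlsplit

# Hypothetical host categories likely to carry personal information.
# Real screening would need a far larger, actively maintained list.
EXCLUDED_HOST_SUFFIXES = {
    "facebook.com": "social media",
    "reddit.com": "forum",
    "blogspot.com": "personal blog",
}


def excluded_category(url):
    """Return the excluded category for a URL's host, or None if it passes."""
    host = (urlsplit(url).hostname or "").lower()
    for suffix, category in EXCLUDED_HOST_SUFFIXES.items():
        # Match the domain itself or any subdomain of it.
        if host == suffix or host.endswith("." + suffix):
            return category
    return None


def screen_urls(urls):
    """Split URLs into (kept, dropped) lists based on host category."""
    kept, dropped = [], []
    for url in urls:
        (dropped if excluded_category(url) else kept).append(url)
    return kept, dropped
```

A host list like this is coarse: it cannot catch personal details published on an otherwise ordinary news or community site, which is one reason screening by source alone may fall short at web scale.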

Step two: tokenization and processing. Raw text is broken into tokens (subword units) and fed into neural networks for statistical learning. Personal information, even when incidental, becomes embedded in the model and can resurface in outputs, either reproduced or distorted into 'hallucinations': plausible but fabricated details.
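The tokenization step can be illustrated with a toy example. The sketch below uses greedy longest-match splitting over a tiny hand-written vocabulary; this is a simplification chosen for readability, not the tokenizer OpenAI uses, and the `VOCAB` set and `tokenize` function are hypothetical.

```python
# Hypothetical toy vocabulary; production tokenizers learn tens of
# thousands of subword units from data rather than using a fixed list.
VOCAB = {"Jane", " Do", "e", " lives", " in", " Ott", "awa"}


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match split of text into subword tokens.

    Characters not covered by the vocabulary become single-character tokens.
    """
    max_len = max(len(piece) for piece in vocab)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in vocab or size == 1:
                tokens.append(piece)
                i += size
                break
    return tokens


print(tokenize("Jane Doe lives in Ottawa"))
```

Nothing in the resulting token sequence marks the name as personal information. To the model it is one more run of subword units, which is why filtering has to happen before or around training rather than inside it.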

Step three: fine-tuning with user data. ChatGPT conversations were anonymized and used to improve the model, but initial opt-out mechanisms were buried, and notices appeared post-interaction.

  • Public web scrapes: Primary source, no consent from individuals.
  • Licensed datasets: Less than 1 percent, but lacking robust privacy warranties.
  • User chats: Opt-out available but not prominent, especially for free users.

This unchecked approach contravened core principles of necessity and proportionality, as vast data volumes far exceeded what was required for legitimate model development.

Key Violations: Consent, Transparency, and Beyond

The regulators pinpointed multiple breaches. Foremost was the absence of valid consent. Under PIPEDA Principle 4.3, organizations must obtain meaningful consent for collection, use, and disclosure. Implied consent from public availability doesn't hold for sensitive data or for uses outside reasonable expectations, such as fueling proprietary AI models. Provincial laws imposed even stricter rules on implicit or deemed consent, which OpenAI failed to meet.

Transparency suffered too. OpenAI's privacy policy and terms vaguely referenced 'publicly available internet data' without detailing categories or risks. Users weren't warned that their forum posts or blog entries could train a 'black box' system prone to errors.

Accuracy posed another risk. Internal tests revealed 20-50 percent factual inaccuracy rates in ChatGPT outputs, including fabricated personal details. Without verification tools or prominent disclaimers, this could lead to real-world harms, such as biased hiring decisions or misinformation.

Individual rights were undermined: data access exports were cumbersome and incomplete; corrections relied on blocklists requiring proof; deletion ('untraining') was infeasible due to diffused model weights. Retention lacked schedules, with raw data held 'as long as necessary'—potentially indefinitely.
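By contrast with an open-ended 'as long as necessary' rule, a retention schedule ties each data category to a fixed period after which deletion is due. The following is a minimal sketch of that idea; the categories, the day counts, and the `is_due_for_deletion` helper are assumptions for illustration and do not reflect OpenAI's actual policy.

```python
from datetime import date, timedelta

# Hypothetical retention periods per data category, in days.
RETENTION_DAYS = {"raw_scrape": 365, "user_chat": 30}


def is_due_for_deletion(category, collected_on, today):
    """Return True once a record has outlived its category's retention period."""
    limit = timedelta(days=RETENTION_DAYS[category])
    return today - collected_on > limit


# A chat collected on 1 January is past a 30-day limit by 1 March,
# while a raw scrape from the same day is still within its 365 days.
print(is_due_for_deletion("user_chat", date(2026, 1, 1), date(2026, 3, 1)))
print(is_due_for_deletion("raw_scrape", date(2026, 1, 1), date(2026, 3, 1)))
```

A schedule of this kind only governs stored records. It does not address information already absorbed into model weights, which is the separate 'untraining' problem described above.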

Accountability was lacking: ChatGPT launched despite known risks, with early leaders admitting that speed had been prioritized over safeguards.

OpenAI's Defense and Remediation Efforts

OpenAI contested some jurisdictional claims, arguing no physical Canadian presence or pre-launch ties. Regulators affirmed extraterritorial reach via user data flows and commercial targeting of Canadians.

In response to a preliminary report, OpenAI acted decisively. It deprecated GPT-3.5 and GPT-4, introducing successors with an advanced privacy filter boasting 98-99 percent recall for personally identifiable information (PII). This tool contextually masks names, addresses, and more, distinguishing private citizens from public figures.
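Recall is the share of genuine PII items that a filter actually catches, so 98-99 percent recall means roughly one or two items in every hundred slip through. The sketch below shows how that metric is computed against hand-labelled data. It assumes a deliberately simple regex-based masker; OpenAI's filter is described as contextual, and the patterns and function names here are hypothetical.

```python
import re

# Simplified patterns for two PII types. A contextual filter would also
# need to handle names, addresses, and public-figure exceptions.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),      # email addresses
    re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),   # phone numbers like 613-555-0142
]


def find_pii(text):
    """Return the set of substrings matched by any PII pattern."""
    return {m.group() for p in PII_PATTERNS for m in p.finditer(text)}


def mask_pii(text):
    """Replace every pattern match with a redaction marker."""
    for pattern in PII_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text


def recall(detected, labelled):
    """Share of labelled PII items that were detected: TP / (TP + FN)."""
    if not labelled:
        return 1.0
    return len(detected & labelled) / len(labelled)


sample = "Contact Jane Doe at jane.doe@example.com or 613-555-0142."
labelled = {"Jane Doe", "jane.doe@example.com", "613-555-0142"}
print(mask_pii(sample))
print(recall(find_pii(sample), labelled))
```

In this sample the regex filter catches the email address and phone number but misses the name, giving a recall of two out of three. That gap is the reason name detection generally needs context rather than fixed patterns.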

Other changes include pre-chat notices warning against sensitive inputs, expanded opt-outs preserving history, temporary chat modes, web search integration for sourced responses, formal retention policies with deletion milestones, and bilingual Canadian-specific guidance. Quarterly compliance reports ensure ongoing adherence.

A spokesperson emphasized: 'People are using ChatGPT in increasingly personal ways... We take that responsibility seriously.' While not agreeing with all findings, OpenAI views the collaboration as advancing privacy-by-design.


Outcomes: No Fines, But Conditional Wins and Monitoring

Unlike hefty EU GDPR penalties, no monetary fines were levied. The OPC conditionally resolved the complaint under PIPEDA, deeming mitigations sufficient for future operations. Provincial outcomes varied: Quebec issued recommendations, while B.C. and Alberta deemed past practices unresolved but acknowledged improvements.

Privacy Commissioner Philippe Dufresne stated: 'OpenAI launched ChatGPT without having fully addressed known privacy issues. This exposed Canadians to potential risks of harm.' Yet, he praised the fixes, noting millions of monthly Canadian users can now engage more safely.

Monitoring continues, with expectations for explainability enhancements and child protections.

Real-World Impacts on Canadians and Broader Society

With ChatGPT boasting millions of Canadian users—including in professional settings—the stakes are high. Inaccurate outputs risked discrimination, such as erroneous health inferences in job screenings. Breaches could expose scraped data, while biases from unfiltered web content perpetuate stereotypes.

Cultural context matters: Canada's diverse population amplifies sensitivity around ethnic or political data. Recent events also heightened scrutiny, notably the Tumbler Ridge tragedy, in which a banned ChatGPT user planned violence and police were not alerted. That case was not central to this probe, but CEO Sam Altman apologized publicly.

Recent polls indicate that 40 percent of Canadians use generative AI weekly, fueling demands for trust.

Stakeholder Perspectives: Regulators, Experts, and Industry

Experts like University of Ottawa's Teresa Scassa hail the negotiated approach as pragmatic. Michael Geist critiques legislative lag, while Emily Laidlaw advocates principle-based rules over scraping bans.

Privacy advocates push for opt-in defaults; tech firms warn overregulation stifles innovation. Diane McLeod (Alberta) calls for pre-release assessments and fines.

  • Regulators: Prioritize privacy in AI evolution.
  • OpenAI: Committed to balancing innovation and protection.
  • Users: Demand clearer controls and accuracy.
  • Businesses: Seek guidance amid integration boom.

Global Context: Canada Joins International Scrutiny

Canada's action mirrors global trends. EU regulators have fined Meta billions of euros under the GDPR, including for scraping-related failures; Italy temporarily banned ChatGPT in 2023. U.S. states are weighing AI laws, while the UK's ICO is probing data use. For full details, see the official joint investigation report.


Comparisons:

Jurisdiction    Key Action          Outcome
Canada          Joint probe         Commitments, no fines
EU              GDPR enforcement    Multi-billion fines
Italy           ChatGPT ban         Lifted post-fixes

Path Forward: Calls for Legal Reforms and Actionable Insights

Watchdogs urge modernizing PIPEDA and its provincial counterparts to address AI specifically, for example through mandatory impact assessments and real-time consent. The Artificial Intelligence and Data Act proposed in Bill C-27 has not become law; experts predict a revival.

For Canadians: opt out of data training, avoid sensitive prompts, and verify outputs. For businesses: conduct privacy audits and use enterprise versions with controls. For user stories, see CBC's coverage, 'OpenAI didn't respect Canadian privacy law.'

Future outlook: As GPT-5 promises lower hallucination rates (under 4 percent), privacy integration will define ethical AI. Canada is positioning itself as a balanced regulator, fostering innovation without unchecked risks.

Conclusion: Balancing AI Power with Privacy Rights

This saga marks a pivotal moment. OpenAI's fixes mitigate past wrongs, but sustained vigilance is needed to ensure AI serves, not surveils, Canadians. With reforms on the horizon, the nation is charting a privacy-forward digital path.

About the author: Dr. Liam Whitaker, Academic Jobs in-house author


Frequently Asked Questions

⚖️What laws did OpenAI violate in Canada?

OpenAI contravened PIPEDA federally, and provincial acts in Quebec, B.C., and Alberta, by overcollecting data, failing to obtain valid consent, and providing poor transparency.

🌐How did OpenAI collect Canadian data for ChatGPT?

Primarily via web scraping of public sites (including Common Crawl data), plus licensed content and user chats: trillions of words, including sensitive information, with only rudimentary filters.

Was consent obtained from Canadians?

No. Regulators found no valid consent: implied consent based on public availability was deemed invalid for sensitive data, and user opt-outs were not prominent initially.

⚠️What risks did these violations pose?

Inaccuracies ('hallucinations'), discrimination from biased outputs, breach exposure, and inability to correct/delete data.

🔧How has OpenAI responded?

Deprecated old models, deployed PII filters with 98-99 percent recall, improved notices and opt-outs, adopted retention policies, and added Canadian-specific guidance.

💰Were fines imposed on OpenAI?

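No; the complaint was conditionally resolved federally, with monitoring. Quebec issued recommendations, while B.C. and Alberta deemed past practices unresolved but acknowledged improvements.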

👥What are the implications for AI users in Canada?

Safer now, but verify outputs, avoid sensitive inputs, use opt-outs. Millions affected indirectly via scraped data.

🌍How does this compare globally?

Similar to EU GDPR actions on Meta; Canada emphasizes cooperation over fines. See the official joint investigation report.

📜What reforms are watchdogs calling for?

Modernize PIPEDA for AI: impact assessments, penalties, real-time consent. Bill C-27 revival eyed.

🗑️Can Canadians request data deletion from ChatGPT?

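Partly. Tools exist for exports, blocklists, and opt-outs, and the process has improved, but regulators found that removing data from already-trained models ('untraining') was infeasible. Verify compliance after any request.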

💼Is ChatGPT safe for professional use now?

Better with fixes, but always cross-check critical info due to inherent LLM limits.