Nearly 40 per cent of the corporate data flowing into AI tools is sensitive, yet most organisations still lack clear, enforceable rules telling employees what they can and cannot paste into ChatGPT, Copilot, or Claude. The result is a grey zone where workers draw their own lines and quietly expose source code, customer records, legal documents, and strategy decks to public AI platforms.
The Scale of What Employees Paste
Cyberhaven’s 2026 AI Adoption and Risk Report, tracking data movements across seven million workers, found that 39.7 per cent of corporate data employees put into AI tools is sensitive. That figure was 10.7 per cent just two years earlier (Cyberhaven, February 2026). The trajectory is steep and shows no sign of flattening.
LayerX’s enterprise browser telemetry puts the behavioural picture into sharper focus. Seventy-seven per cent of employees paste data into generative AI tools, with 82 per cent of that activity flowing through personal, unmanaged accounts rather than corporate-licensed platforms (LayerX, 2025). On average, employees make 14 pastes per day into non-corporate AI accounts. At least three of those contain sensitive corporate data.
Harmonic Security’s analysis of 22.4 million prompts across six major GenAI applications in 2025 breaks down the types of data exposed. Source code accounts for 30 per cent of all sensitive data exposures, followed by legal documents at 22.3 per cent, M&A data at 12.6 per cent, and financial projections at 7.8 per cent (Harmonic Security, 2025). The pattern is consistent: employees are feeding AI tools the organisation’s most commercially sensitive material.
The free-tier problem compounds the exposure. Harmonic found that 87 per cent of sensitive data instances occurred via ChatGPT Free, not enterprise-licensed versions with data protection agreements. LayerX reports that 67 per cent of ChatGPT access in enterprises happens through personal accounts. Netskope’s 2026 Cloud and Threat Report confirms 47 per cent of all generative AI users access tools through personal accounts over which security teams have zero visibility (Netskope, 2026).
Zscaler’s ThreatLabz 2026 AI Security Report, analysing 989.3 billion AI/ML transactions across approximately 9,000 organisations, recorded 18,033 terabytes of enterprise data transferred to AI and ML platforms in 2025. That is a 93 per cent year-on-year increase. ChatGPT alone received 2,021 terabytes (Zscaler, 2026).
The Grey Zones Where Employees Make Their Own Rules
Most organisations have not given employees usable rules for the situations they actually face. The core governance failure is operational, not intentional.
A marketing manager wants to paste a client email into ChatGPT to draft a response. Is that acceptable if she removes the client’s name? A developer needs to debug proprietary code. Can he paste a function into Copilot if it does not contain API keys? A project manager uploads meeting notes from an internal strategy session to get a summary. Are those notes “confidential” or merely “internal”?
These are the grey zones. The research shows employees resolve them by defaulting to convenience. WalkMe found that 78 per cent of employees use AI tools not approved by their employer and 51 per cent receive conflicting guidance on when and how to use AI (WalkMe, 2025). The CybSafe and National Cybersecurity Alliance “Oh, Behave!” report, surveying 6,500 individuals across seven countries including Australia, found 43 per cent of respondents admitted sharing sensitive workplace information with AI tools without their employer’s knowledge. Fifty-eight per cent of AI users had received no training on security or privacy risks (CybSafe/NCA, 2025).
The Sprinto CISO Pulse Check Report, published in March 2026, puts numbers on the enforcement gap. Thirty-nine per cent of organisations have an AI usage policy that is not consistently enforced. Roughly two-thirds take longer than a week to implement policy changes after identifying new AI risks (Sprinto, 2026). ISACA’s 2025 AI Acceptable Use Policy template notes that only 28 per cent of organisations have a formal, comprehensive AI use policy (ISACA, 2025).
From Samsung to Grok: What Exposure Looks Like in Practice
The most instructive incident remains Samsung’s triple exposure in March 2023. Within 20 days of lifting its internal ban on ChatGPT, three separate incidents occurred in Samsung’s semiconductor division. An engineer pasted buggy source code from a semiconductor database program into ChatGPT, seeking a fix. A second employee uploaded chip-testing code and requested optimisation. A third employee recorded an internal meeting on a smartphone, transcribed it using a speech recognition application, and fed the transcript into ChatGPT to auto-generate meeting minutes.
Samsung restricted prompts to 1,024 bytes, then banned all generative AI tools across its largest divisions. The company warned employees the data was now stored on OpenAI’s servers and was impossible to retrieve. Samsung subsequently developed its in-house Gauss AI platform (Dark Reading, 2023).
The three exposure types map precisely onto the grey zones most employees face daily: source code, code optimisation, and meeting transcription. The third is the most telling: an employee used AI for a mundane administrative task and, in doing so, fed an entire internal strategic discussion into a public platform.
More recent incidents show the risk has not diminished. In August 2025, over 370,000 conversations with xAI’s Grok chatbot were publicly indexed by Google, Bing, and DuckDuckGo after the platform’s “share” function generated URLs without noindex directives. Exposed conversations included medical queries, business details, passwords, financial data, and uploaded documents. Users were entirely unaware their conversations had become searchable (Forbes, August 2025).
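For readers unfamiliar with the mechanism: a noindex directive is a standard signal, delivered as an HTTP response header or a page-level meta tag, that tells search crawlers not to index a page. The TypeScript sketch below is purely illustrative (it is not xAI’s code, and the handler is a hypothetical share endpoint); either of the two directives shown would have kept shared conversations out of search results:

```typescript
// Illustrative sketch only (not xAI's code): a hypothetical share-page
// handler that tells search crawlers not to index the page, via both an
// HTTP header and a page-level meta tag. Either signal alone would have
// kept shared conversations out of search indexes.
import { createServer } from "node:http";

const server = createServer((_req, res) => {
  // Header-level directive: honoured by crawlers even for non-HTML responses.
  res.setHeader("X-Robots-Tag", "noindex, nofollow");
  res.setHeader("Content-Type", "text/html; charset=utf-8");
  res.end(`<!doctype html>
<html>
  <head>
    <!-- Page-level directive, redundant with the header on purpose. -->
    <meta name="robots" content="noindex, nofollow" />
    <title>Shared conversation</title>
  </head>
  <body>Conversation content would render here.</body>
</html>`);
});

server.listen(8080); // hypothetical port for the share endpoint
```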
OWASP Elevated This Risk for a Reason
OWASP’s decision to promote Sensitive Information Disclosure from LLM06 in the 2023/24 Top 10 for LLM Applications to LLM02 in the 2025 edition reflects the growing severity of this threat. The official documentation identifies PII, financial details, health records, confidential business data, security credentials, legal documents, proprietary algorithms, and source code as categories at risk (OWASP, 2025).
OWASP explicitly warns that users may “unintentionally provide sensitive data, which may later be disclosed in the model’s output.” The OWASP AI Exchange, a 300-page Flagship Project document, addresses prompt data risk directly: cloud-hosted models mean prompt data travels to external infrastructure, and “most Cloud AI models have your input and output unencrypted in their infrastructure.” Even government subpoena risk is flagged (OWASP AI Exchange, 2025).
OWASP’s recommended mitigations include data sanitisation before prompts reach models, strict access controls, differential privacy techniques, user education on avoiding sensitive data input, and clear data retention and usage policies. It also cautions that adding restrictions within system prompts “may not always be honoured and could be bypassed via prompt injection.” Technical controls alone are insufficient.
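The first of those mitigations, sanitising data before prompts reach a model, can be approximated with pattern-based redaction. The sketch below is a minimal illustration under stated assumptions: the rule set and the redactPrompt helper are invented for demonstration, and production DLP engines combine such patterns with ML classifiers and context awareness:

```typescript
// Minimal pattern-based prompt sanitiser (illustrative; the rules and
// function names are assumptions, not a production DLP rule set).
const REDACTION_RULES: Array<{ label: string; pattern: RegExp }> = [
  { label: "EMAIL",   pattern: /[\w.+-]+@[\w-]+\.[\w.]+/g },
  { label: "API_KEY", pattern: /\b(?:sk|pk|api)[-_][A-Za-z0-9]{16,}\b/g },
  { label: "CARD",    pattern: /\b(?:\d[ -]?){13,16}\b/g },
  { label: "AWS_KEY", pattern: /\bAKIA[0-9A-Z]{16}\b/g },
];

// Replace each match with a typed placeholder before the text leaves the
// organisation's boundary. Returns the cleaned prompt plus a count of hits
// so the caller can log, warn, or block.
function redactPrompt(prompt: string): { clean: string; redactions: number } {
  let clean = prompt;
  let redactions = 0;
  for (const { label, pattern } of REDACTION_RULES) {
    clean = clean.replace(pattern, () => {
      redactions += 1;
      return `[REDACTED:${label}]`;
    });
  }
  return { clean, redactions };
}

// Example: the email address and key are masked before the prompt is sent.
const { clean, redactions } = redactPrompt(
  "Contact jane.doe@example.com, key sk-abcdef0123456789abcdef"
);
console.log(redactions, clean);
```

One design choice worth noting: the function returns a redaction count alongside the cleaned text, so a caller can escalate from silent masking to coaching or blocking as the count rises.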
Regulators Have Already Weighed In
The Office of the Australian Information Commissioner addressed this directly in its December 2025 guidance on GenAI tools in the workplace. The guidance includes a fictional case study, “CarCover,” where an employee enters client personal information into a public GenAI tool, resulting in a notifiable data breach under the Australian Privacy Act. Privacy Commissioner Carly Kind stated that “the community and the OAIC expect organisations seeking to use AI to take a cautious approach.” The OAIC’s position is unambiguous: it is best practice not to enter personal information into publicly available generative AI tools (OAIC, December 2025).
New Zealand’s Privacy Commissioner guidance takes a similarly cautious line, advising that the safest course is to avoid putting personal information into AI tools if unsure, and adding a requirement unique to Aotearoa: engaging with Māori perspectives on data sovereignty and privacy. The NZ Digital Government GenAI guidance states that government information submitted to public GenAI systems must either already be public or be acceptable to make public (NZ OPC/Digital Government, 2025).
The UK Information Commissioner’s Office has confirmed that if employees input personal data into ChatGPT, this constitutes “processing” under UK GDPR, triggering requirements for lawful basis, data protection impact assessments, and privacy notices. The EU AI Act’s Article 4 AI literacy requirement, effective since February 2025, mandates that staff be trained on AI use, with fines of up to 7 per cent of global revenue for non-compliance (ICO; EU AI Act Article 4).
Half of Organisations Have No DLP for AI, and Policies Alone Do Not Work
Netskope’s 2026 Cloud and Threat Report found that only 50 per cent of organisations have deployed DLP tools that cover generative AI applications. Among those that have, the average organisation now records 223 monthly attempts by employees to include sensitive data in GenAI prompts or uploads. That figure has more than doubled year-on-year. Zscaler recorded 410 million DLP policy violations tied to ChatGPT alone in 2025 (Netskope, 2026; Zscaler, 2026).
Traditional DLP architectures were designed for file-based, network-level inspection and are fundamentally blind to copy-paste operations in browser-based AI interfaces. LayerX data shows that copy-paste has surpassed file uploads as the primary data exfiltration channel. Encrypted traffic creates further blind spots for network-based DLP.
The most promising enforcement approach is real-time coaching rather than blanket blocking. Netskope data shows that 73 per cent of users who receive real-time coaching warnings choose not to proceed with the risky action, and 57 per cent alter their behaviour after coaching alerts. This is dramatically more effective than the approximately 3 per cent reduction in risky behaviour that Microsoft’s Digital Defense Report attributes to awareness training alone (Netskope, 2026). Blocking drives workarounds; coaching changes behaviour at the point of risk.
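As a concrete illustration of point-of-risk coaching, the sketch below assumes a browser-extension content script with invented detection patterns; it is not any vendor’s product:

```typescript
// Sketch of a browser-extension content script (illustrative, not any
// vendor's product) that coaches rather than blocks: it inspects pasted
// text for sensitive markers and asks the user to confirm before the
// paste lands in the AI tool's input field.
const SENSITIVE_MARKERS = [
  /confidential/i,                      // classification labels
  /\bAKIA[0-9A-Z]{16}\b/,               // AWS access key IDs
  /-----BEGIN [A-Z ]*PRIVATE KEY-----/, // PEM private keys
];

document.addEventListener(
  "paste",
  (event: ClipboardEvent) => {
    const text = event.clipboardData?.getData("text/plain") ?? "";
    const hit = SENSITIVE_MARKERS.find((rule) => rule.test(text));
    if (!hit) return; // nothing sensitive detected, let the paste through

    // Coaching moment: explain the risk and let the user decide.
    const proceed = window.confirm(
      "This paste appears to contain sensitive data. " +
        "Company policy: only enterprise-licensed AI tools may receive " +
        "confidential material. Paste anyway?"
    );
    if (!proceed) {
      event.preventDefault(); // user chose to cancel the paste
      event.stopPropagation();
    }
  },
  true // capture phase, so the check runs before the page's own handlers
);
```

The key design decision mirrors the Netskope finding: the default is to warn and explain, not to block, and the user keeps the final choice.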
What Organisations Need to Do
Guidance from NIST, ISO 42001, the OAIC, and the ICO now converges on a tiered data classification model mapped against tool type:

- Public data: permitted with any approved tool.
- Internal data: permitted only with enterprise-licensed AI tools.
- Confidential data: restricted to enterprise tools with contractual data protection agreements.
- Restricted or highly sensitive data: prohibited from all external AI tools.
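Expressed as policy-as-code, that tier-to-tool mapping is compact enough to embed in a browser extension or forward proxy. The sketch below is illustrative; the tier and tool names follow the list above, and isPastePermitted is a hypothetical helper rather than anything defined by NIST or ISO 42001:

```typescript
// Policy-as-code sketch of the tiered model above (names are illustrative).
type Classification = "public" | "internal" | "confidential" | "restricted";
type ToolType = "approved" | "enterprise" | "enterprise_with_dpa";

// Lowest tool tier allowed to receive each data classification;
// null means no external AI tool may receive it.
const POLICY: Record<Classification, ToolType | null> = {
  public: "approved",                  // any approved tool
  internal: "enterprise",              // enterprise-licensed tools only
  confidential: "enterprise_with_dpa", // enterprise tools with a DPA
  restricted: null,                    // prohibited from all external AI
};

const TOOL_RANK: Record<ToolType, number> = {
  approved: 0,
  enterprise: 1,
  enterprise_with_dpa: 2,
};

// Hypothetical helper answering "Can I paste this?" for a given tool.
function isPastePermitted(data: Classification, tool: ToolType): boolean {
  const minimum = POLICY[data];
  if (minimum === null) return false;
  return TOOL_RANK[tool] >= TOOL_RANK[minimum];
}

console.log(isPastePermitted("internal", "approved"));                // false
console.log(isPastePermitted("confidential", "enterprise_with_dpa")); // true
```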
That framework exists on paper. Fewer than a third of organisations have implemented it (ISACA, 2025). The gap between framework and practice is where the risk lives.
For SMEs without dedicated security teams, the practical path is a scenario-based AI acceptable use policy that addresses the five most common grey zones explicitly: email content, meeting notes, source code, customer data, and internal documents. The policy should be paired with an approved-tools list reviewed quarterly and just-in-time coaching delivered through browser-based DLP. The policy does not need to be long. It needs to answer the question every employee is already asking: “Can I paste this?”
CISA’s May 2025 AI Data Security guidance, co-published with the Australian Signals Directorate and agencies from New Zealand and the UK, reinforces these requirements: data classification, access controls, encryption, privacy-preserving techniques, and ongoing monitoring (CISA/ASD, 2025). Research on training effectiveness from UC San Diego found interactive, scenario-based training reduced future risky behaviour by 19 per cent, while static training showed negligible effect. Training effects degrade after four to six months without reinforcement (UC San Diego; Gartner/SANS/KnowBe4).
Employees are putting sensitive data into AI tools at scale, every day. Whether organisations give them clear, scenario-based rules before the next Samsung-style incident becomes their own is the governance question most have not yet answered.
Related reading: What is shadow AI? | What does shadow AI cost a business in 2026? | New 2026 AI security stats show governance cuts incidents nearly in half | AI Usage Policy Template (free download)
Sources
- Cyberhaven: 2026 AI Adoption and Risk Report (7 million workers tracked)
- LayerX: Enterprise AI and SaaS Data Security Report 2025
- Harmonic Security: AI Usage Index, 22.4 million prompts across 665 tools (January 2026)
- Netskope: Cloud and Threat Report 2026
- Zscaler ThreatLabz: 2026 AI Security Report
- CybSafe / National Cybersecurity Alliance: Oh, Behave! report (6,500 respondents, 7 countries)
- WalkMe: Shadow AI survey (1,000 US workers)
- Sprinto: AI Governance State of AI Risk in 2026 Pulse Check
- ISACA: AI Acceptable Use Policy Template 2025
- OWASP: LLM02:2025 Sensitive Information Disclosure
- OWASP AI Exchange: AI Security Overview
- OAIC: GenAI tools in the workplace guidance (December 2025)
- NZ Privacy Commissioner / Digital Government: GenAI and privacy guidance
- UK ICO: Guidance on AI and data protection
- CISA / ASD: AI Data Security guidance (May 2025)
- Samsung / Dark Reading: ChatGPT source code leak incidents (March-May 2023)
- Forbes: xAI Grok 370,000+ conversations indexed (August 2025)