
Almost a quarter of the files workers upload to AI have sensitive info

  • Aug 5, 2025
  • 2 min read

Source: SC Media



AI/ML, Data Security, Application security, Generative AI


Nearly 22% of files and more than 4% of prompts employees send to generative AI (GenAI) tools contain sensitive information, according to an analysis by Harmonic Security published Thursday.

Harmonic analyzed 1 million prompts and 20,000 uploaded files sent by workers at companies across the United States and United Kingdom to more than 300 different GenAI and AI-enabled software-as-a-service (SaaS) applications. The prompts and uploads were recorded by the Harmonic Security Browser Extension.


The majority of sensitive prompts – 72.6% – went to OpenAI’s ChatGPT, and 26.3% of all sensitive prompts went to a free version of ChatGPT rather than ChatGPT Enterprise.

“Enterprise accounts often mean that security teams get logs of usage, whereas with personal accounts they are flying blind. Personal accounts are often free and that can mean the AI tools are training on input data,” Harmonic Security Vice President Michael Marriott told SC Media.


Harmonic also found that 15.13% of sensitive prompts and files sent to Google Gemini went through free accounts, while 47.42% of the sensitive data uploaded to Perplexity went to standard, non-enterprise accounts.


While the top six AI tools that received sensitive data were ChatGPT (72.6%), Microsoft Copilot (13.7%), Gemini (5%), Anthropic’s Claude (2.5%), Quora’s Poe (2.1%) and Perplexity (1.8%), Harmonic’s analysis found a wide variety of new tools being adopted. The average company saw 23 previously unknown GenAI tools being used by employees for the first time during the data collection period between April and June 2025.


Sensitive files accounted for 13.9% of all data exposure events to AI, and 54.9% of files sent to AI tools were PDFs. Files made up 79.7% of credit card exposures, 75.3% of customer profile leaks and 68.8% of employee personally identifiable information (PII) exposures.

The most common type of sensitive data leaked was proprietary code, most often sent to ChatGPT, Claude, DeepSeek and Baidu Chat.


Employees are also increasingly using AI-enabled SaaS tools in addition to traditional chatbots, with common examples including the AI-powered coding platform Replit, the graphic design tool Canva, which has AI features, and the AI-driven writing assistant Grammarly.


Previous research published in 2024 by Cyberhaven found that up to 27.4% of data employees send to AI chatbots is sensitive.


Harmonic said blanket bans on AI tools are no longer a viable strategy for companies to protect their data due to the degree to which AI has been embedded into business tools across industries.


“Organizations that fully block non-enterprise accounts are in a false sense of security; often employees will use a plethora of tools to do their job and restricting that pushes them to personal devices and accounts,” Marriott said.


Instead, Harmonic recommended that businesses gain visibility into employees’ use of enterprise, free and embedded AI tools; enforce context-aware controls at the data layer; and work with AI vendors to understand the model training opt-out policies that can help protect their data.
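To illustrate what a data-layer control might look like in its simplest form, the sketch below scans a prompt for card-like number sequences and validates them with a Luhn checksum before flagging them. This is a hypothetical minimal example, not Harmonic’s product or methodology; real context-aware controls would cover many more data types (code, customer records, PII) and account for context around the match.

```python
import re

def luhn_valid(number: str) -> bool:
    """Luhn checksum, used to filter out random digit strings
    that merely look like card numbers."""
    digits = [int(d) for d in number][::-1]
    total = 0
    for i, d in enumerate(digits):
        if i % 2 == 1:       # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

# Candidate card numbers: 13-16 digits, optionally space- or dash-separated.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")

def flag_card_numbers(prompt: str) -> list[str]:
    """Return Luhn-valid card-like numbers found in a prompt,
    so a policy layer can block or redact before upload."""
    hits = []
    for match in CARD_RE.finditer(prompt):
        digits = re.sub(r"[ -]", "", match.group())
        if 13 <= len(digits) <= 16 and luhn_valid(digits):
            hits.append(digits)
    return hits
```

A gateway or browser extension could run a check like this on each prompt and file, redacting matches or routing the request to an enterprise account where usage is logged.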

