How To Keep AI Agents Safe At Work

AI agents are moving from interesting demos into real workplace tools. They can read documents, browse websites, write code, call APIs, create tickets, update files and sometimes act across several systems at once. That makes them useful, but it also means AI agent security needs stronger rules than ordinary chatbot use. If your team is still building the basics, start with our guide to AI security for UK businesses, then use this article to decide how much autonomy an agent should have at work.

The key difference is action. A chatbot usually answers. An agent can do. It may plan a task, choose tools, inspect files, run commands, draft emails or make changes. A badly governed agent can leak data, follow malicious instructions, damage files, approve the wrong workflow or create changes no one properly reviewed.

This guide explains how to keep AI agents safe at work without banning them completely. It covers permissions, prompt injection, tool access, human approval, logs, sensitive data, coding agents and incident response.

What is an AI agent?

An AI agent is a system that uses an AI model to pursue a goal through steps and tools. Instead of simply answering a question, it can decide what to do next. It might search a knowledge base, read a PDF, query a CRM, create a spreadsheet, write code, open a pull request or send a message for review.

That makes agents powerful because they can reduce repetitive work. It also makes them risky because tool access turns a mistaken or manipulated response into a real-world action. The OWASP GenAI Security Project has published dedicated agentic AI security guidance because autonomous agents introduce risks beyond ordinary LLM applications.

The safest AI agent is not the smartest one. It is the one with the least power needed to complete the task safely.

Why AI agents change workplace risk

Traditional workplace security usually focuses on people, accounts, devices, suppliers and data. AI agents sit across all of those areas. An agent may operate through a human account, read shared data, use a supplier platform, interact with a browser and generate work that staff trust. If the controls are unclear, responsibility becomes blurry.

The NCSC guidance on AI and cyber security highlights risks such as data leakage, prompt injection and unsafe reliance on AI outputs. Agents raise the stakes because they can combine those risks with tool use. A misleading instruction in a document is no longer only a bad answer; it may become a file change, a message, an API call or a command.

For practical teams, the question is not “are AI agents safe?” The better question is: what can this agent access, what can it change, who approves risky actions, and how would we know if something went wrong?

The main AI agent risks

Most workplace agent risks fall into a small number of patterns. They are easier to manage when named clearly.

Prompt injection: malicious or hidden instructions in websites, documents, tickets or emails influence the agent.
Excessive permissions: the agent can read or change more than the task requires.
Unsafe tool use: the agent can run commands, call APIs or send messages without enough review.
Data leakage: sensitive information is included in prompts, logs, tool outputs or third-party systems.
Over-trust: staff assume agent output is correct because it looks confident and well formatted.
Weak audit trails: the team cannot tell what the agent read, changed or attempted.
Supply chain risk: agent skills, connectors, plugins or MCP servers introduce hidden behaviour.

The OWASP Agentic Skills Top 10 is especially useful for teams experimenting with skills, connectors and workflow packs. It frames skills as the behaviour layer that tells agents what workflows they can perform. That layer deserves review because a skill can shape how the agent uses its tools.

A simple safety model: read, suggest, act

Before deploying an agent, split its capability into three levels: read, suggest and act. This model helps non-technical managers understand what is being approved.

Level	What the agent can do	Typical control
Read	View approved data, summarise documents, inspect tickets or code	Limit data scope, log access, restrict sensitive sources
Suggest	Draft replies, propose code, recommend actions, create plans	Human review before use
Act	Send messages, change files, run commands, update systems, call APIs	Explicit approval, least privilege, rollback plan and audit logs

Many businesses can start safely at read or suggest level. Full action should be reserved for narrow, low-risk tasks until the team has better governance. If an agent can affect customers, money, production systems, legal documents, security settings or personal data, it needs stronger review.

Set permission boundaries before the first pilot

Permissions are the centre of AI agent safety. Decide what the agent can read, write, execute, send and delete. Then decide whether those permissions are temporary, task-specific or always available. Avoid giving an agent broad access because “it might need it”. That is how useful experiments become uncontrolled risk.

Coding agents make this especially visible. Anthropic’s Claude Code security guidance describes a permission-based architecture where additional actions such as edits, tests and commands require explicit approval. OpenAI’s Codex Security guidance also emphasises human review for generated patches. The principle applies beyond coding: the agent should ask before doing anything with meaningful side effects.

Permission rules worth using

Start read-only where possible.
Limit the agent to a specific workspace, folder, project or dataset.
Require approval before file deletion, data export, sending messages or changing production systems.
Do not give agents shared admin accounts.
Use separate test environments for experiments.
Keep API keys, passwords and tokens out of prompts and project files.
Review permissions after each pilot, not only at launch.

Protect against prompt injection

Prompt injection happens when an attacker places instructions inside content the agent processes. The content might be a web page, email, PDF, code comment, support ticket or calendar invite. The hidden instruction may tell the agent to ignore previous rules, reveal data, call a tool, change output or take an unsafe action.

This is different from a normal phishing email because the target may be the agent, not the human. A staff member might ask an agent to summarise a page. The page may contain instructions aimed at the agent. If the agent treats untrusted text as instructions, it can be manipulated.

The safest assumption is that anything the agent reads from outside your trusted control may contain hostile instructions. That does not mean agents are unusable. It means untrusted content should not be allowed to directly control tools, secrets or decisions.

Prompt injection controls

Separate trusted instructions from untrusted content in the system design.
Do not let web pages, emails or documents trigger tool actions automatically.
Require confirmation before acting on instructions found inside external content.
Block access to secrets while the agent processes untrusted content.
Use allowlists for tools and domains where possible.
Train staff to recognise that agents can be socially engineered too.

Control tool access carefully

An agent is only as safe as the tools connected to it. Browser access, shell access, email access, CRM access, calendar access, ticketing access and file access each add a new path for mistakes. MCP servers, plugins, skills and connectors should be treated like software dependencies. They need owners, reviews and updates.

For small teams, start with a simple agent tool register. List each agent, which tools it can use, what data those tools expose, whether it can make changes, who owns it and when it was last reviewed. This belongs in the same governance routine as your cyber risk register.

Tool type	Risk	Safer setup
Email	Sending messages, exposing customer data, impersonation	Draft-only mode, approval before send, restricted mailbox scope
Browser	Indirect prompt injection, malicious pages, fake forms	No secrets during browsing, confirmation before form submission
Code editor	Unsafe code changes, secret exposure, destructive commands	Workspace limits, branch workflow, human pull request review
CRM	Customer data leakage or unwanted updates	Read-only pilot, role-based access, change logs
Shell/API	Command execution, deletion, production changes	Test environment, approval gates, command allowlists

Keep humans in the loop where consequences matter

Human approval should not be a decorative checkbox. It should be placed where consequences appear. If the agent drafts a low-risk internal note, review can be light. If it changes customer records, commits code, sends a message, approves a payment workflow or edits a public page, approval should be explicit.

Approval should also show enough context. The reviewer needs to see what the agent intends to do, what data it used, which tool it will call and what will change. “Approve action” is too vague. “Send this email to these three recipients with this content” is reviewable.

Logging: know what the agent did

Good logs are boring until something goes wrong. Then they are essential. A workplace agent should record key actions: who started the task, what tools were called, what data sources were used, what changes were made and what approvals were granted.

Do not log sensitive data unnecessarily. The goal is traceability, not creating a second copy of every secret. Store logs where the right people can review them, and include agent activity in incident response planning. If an agent leaks data or makes an unsafe change, the business needs to reconstruct what happened.

Safe rollout plan for a small business

A safe rollout does not need to be slow. It needs to be staged. Start with low-risk, high-learning tasks, then expand only when controls are proven.

Pick one use case: choose a task with clear boundaries, such as summarising public documents or drafting internal notes.
Define the data rule: decide what the agent may and may not process.
Start read-only: avoid write access during the first pilot.
Add human review: require approval before outputs are used externally.
Log actions: keep enough detail to review the pilot.
Review after two weeks: check usefulness, errors, data concerns and permission scope.
Expand gradually: add tools only when the previous stage is stable.

Pair this rollout with the wider guidance in our post on using ChatGPT and AI tools more safely at work. Staff need to understand both ordinary AI use and agent-specific risks.

Questions leaders should ask before approving an AI agent

Before a team enables an agent, leadership should ask practical questions. These are not only technical. They are governance questions.

What business task will this agent perform?
What systems, files or databases can it access?
Can it make changes, or only suggest them?
What personal or confidential data might it process?
What human approval is required before side effects?
What happens if the agent follows a malicious instruction?
How can we disable it quickly?
Who owns its settings and reviews?
Where are logs stored?
How will staff report unexpected behaviour?

What should never be automated without strong controls?

Some actions deserve special care. A small team should avoid autonomous execution for high-impact workflows until it has mature controls.

Approving or sending payments.
Changing supplier bank details.
Deleting files, records or backups.
Making production infrastructure changes.
Sending external legal, financial or security advice.
Changing access permissions or user roles.
Exporting customer, employee or health data.
Publishing content without review.

These workflows are not impossible to support with AI. The point is that the agent should assist, not silently decide. Use it to prepare information, identify anomalies or draft the next step. Keep human approval for the action.

How this connects to privacy and incidents

AI agent safety is also privacy work. Agents may read personal data, generate summaries, store prompts or send data to suppliers. Review this against your wider personal data sharing rules. If the agent processes customer or employee data, the business needs to know why, where, for how long and under which controls.

It is also incident response work. If an agent behaves unexpectedly, leaks information or performs a risky action, the team needs a response path. Add agent misuse or agent compromise to your cyber incident preparation plan. Include who can disable the agent, revoke tokens, review logs and notify affected parties if needed.

AI agent safety checklist

Use this as a quick review before launching or expanding an agent.

The use case is clearly defined.
The agent has an owner.
Permissions are limited to the task.
Sensitive data rules are written down.
Untrusted content cannot trigger actions automatically.
Human approval is required for meaningful side effects.
Tools, skills, plugins and connectors are reviewed.
Actions are logged without unnecessary sensitive data.
The agent can be disabled quickly.
Staff know how to report unexpected behaviour.

Frequently asked questions

Are AI agents safe enough for small businesses?

Yes, if the use case is narrow and the permissions are controlled. Small businesses should start with read-only or draft-only workflows, then expand carefully. Avoid giving agents broad access to email, finance, customer systems or production tools during early pilots.

Should an AI agent be allowed to send emails?

Usually not without human review. Drafting emails can be useful. Sending emails introduces risk because messages can expose data, mislead customers or be manipulated by prompt injection. Start with draft-only mode where possible.

Can prompt injection be completely solved?

Teams should not assume it can be eliminated. Treat untrusted content as potentially hostile, limit tool access, separate data from instructions where possible and require approval before risky actions.

What is the safest first AI agent use case?

A good first use case is low-risk summarisation of approved internal or public material. Avoid personal data, production systems and external actions until the team has tested its process.

Final recommendation

AI agents can make work faster, but only if their power is deliberately limited. Start with a narrow use case, read-only permissions, clear data rules, human approval and basic logging. Then expand gradually. If the agent can act, the business must be able to answer who approved it, what it changed and how to stop it.

For most teams, the safest policy is not to reject agents entirely. It is to make them useful inside clear boundaries. Combine the agent controls in this guide with your everyday cybersecurity habits, your AI governance work and your risk register. That is how AI agents become a controlled workplace tool rather than an invisible source of new risk.