Security

The agent system implements an 8-layer defense-in-depth architecture. Each layer addresses a distinct threat vector, and they operate independently so that a bypass of any single layer does not compromise the system.

Defense-in-Depth Overview

| Layer | Name | Threat Addressed | Implementation |
|---|---|---|---|
| 1 | Project Isolation | Blast radius containment | Separate GCP projects for platform and executor |
| 2 | Prompt Scanning | Malicious prompt injection | Regex-based detection of infrastructure abuse and red flags |
| 3 | URL Whitelist | Unauthorized outbound access | Default-deny domain validation against integration registry |
| 4 | DNS Pinning | TOCTOU DNS rebinding | Resolve-validate-rewrite with Host header and SNI preservation |
| 5 | Private IP Blocking | SSRF to internal infrastructure | Block RFC 1918, link-local, loopback, and IPv6 private ranges |
| 6 | Credential Exfiltration Prevention | Secret leakage via outbound data | Scan body/headers/URL for plaintext, base64, and URL-encoded secrets |
| 7 | Rate Limiting + Budget Caps | Resource exhaustion, runaway costs | Per-execution iteration, credit, timeout, and API call caps |
| 8 | Content Moderation + Kill Switch | Harmful content, emergency revocation | Regex content scanning + admin integration deprecation |

Layer 1: Project Isolation

The agent executor runs in a separate GCP project (inference-agents) from the main platform (inference-platform). This provides a hard IAM and network boundary between the system that serves user requests and the system that executes autonomous agent actions with external API credentials.

Isolation Boundaries

inference-platform (GCP project)          inference-agents (GCP project)
+----------------------------------+      +----------------------------------+
| Cloud Run: inference-api         |      | Cloud Run: agent-executor        |
| Firestore (user data, billing)   |      | KMS decrypt keys                 |
| Redis (sessions, API key cache)  |      | Cloud Tasks queue                |
| Stripe integration               |      | Isolated networking              |
| KMS encrypt keys (bridge SA)     |      | OpenTelemetry tracing            |
+----------------------------------+      +----------------------------------+
              |                                          |
              |       OIDC-authenticated calls           |
              +<---------------------------------------->+

What Isolation Provides

  • The executor has no access to the main platform's Firestore, Redis, or other resources except via explicit API calls to the main API's /internal/agent/code endpoint.

  • The main platform has no access to the executor's KMS decrypt keys.

  • Billing, quotas, and audit logs are isolated per project.

  • A compromised executor cannot access user data, billing systems, or platform infrastructure.

  • A compromised platform cannot decrypt agent credentials.

Service-to-Service Authentication

The executor authenticates to the main API using Google OIDC tokens (not user API keys). The user_id is passed in the request body so credits are charged to the correct user account.
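
As a rough illustration of the call shape described above, the sketch below builds (but does not send) such a request. Only the /internal/agent/code path and the user_id body field come from this page; the base URL, the function name, and the other payload fields are hypothetical, and the OIDC token is assumed to have been obtained separately for the executor's service account.

```python
import json
import urllib.request

# Hypothetical base URL for the main API; only the endpoint path is documented above.
MAIN_API_BASE = "https://inference-api.example.internal"


def build_internal_request(oidc_token: str, user_id: str, payload: dict) -> urllib.request.Request:
    """Build (without sending) an OIDC-authenticated request to the main API."""
    body = dict(payload)
    # The user is identified in the body so that credits are charged to them,
    # not to the executor's service account.
    body["user_id"] = user_id
    return urllib.request.Request(
        url=f"{MAIN_API_BASE}/internal/agent/code",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {oidc_token}",  # service identity, not a user API key
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The point of the design is that the bearer token proves which service is calling, while user_id only says whose account to bill.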

KMS Split Trust

Credential encryption keys are split across projects:

| Service Account | Project | KMS Permission | Purpose |
|---|---|---|---|
| | inference-platform | Encrypt only | Encrypts credentials during storage |
| | inference-agents | Decrypt only | Decrypts credentials during execution |

See Credentials for the full encryption lifecycle.

Layer 2: Prompt Scanning (Injection Detection)

Before any LLM call, the task's system_prompt and task_prompt are scanned by a regex-based security scanner. The scanner detects two categories of threats.

Infrastructure Abuse Patterns

Patterns that indicate attempts to use the agent for infrastructure provisioning, cryptocurrency mining, or network attacks:

| Pattern Category | Examples Detected |
|---|---|
| GCP provisioning | gcloud compute, create vm, deploy cluster, cloudfunctions.googleapis.com |
| AWS provisioning | aws ec2, boto3, amazonaws.com, CloudFormation |
| Azure provisioning | az vm, management.azure.com, azure container |
| Infrastructure-as-code | terraform apply, pulumi up, cloudformation |
| Cryptocurrency mining | mine bitcoin, crypto miner, ethminer |
| Network attacks | reverse shell, bind shell, c2 server, port scan, nmap |

General Red Flags

Patterns that indicate malicious intent or social engineering:

| Pattern Category | Examples Detected |
|---|---|
| Data exfiltration | exfiltrate, steal credentials, send all data to |
| Safety override | ignore previous instructions, bypass security, disregard system prompt |
| Abuse | ddos, brute force, spam email, bulk send |
| Harvesting | scrape email, harvest accounts, collect credentials |

Scanning Behavior

  • Scanning runs at task creation, task update (when prompt fields change), and before each execution.

  • The system_prompt and task_prompt are concatenated and scanned together as a single string.

  • Pattern matching is case-insensitive.

  • A flagged prompt results in immediate rejection with a content_moderation error status.

  • The specific matched pattern is included in the error for debugging.
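
The scanning behavior above can be sketched as follows. This is a minimal illustration, not the platform's scanner: the pattern lists are a small subset written from the example tables, and the function name scan_prompts is hypothetical.

```python
import re
from typing import Optional

# Illustrative subset of patterns, loosely derived from the example tables above.
INFRA_ABUSE_PATTERNS = [
    r"gcloud\s+compute",
    r"terraform\s+apply",
    r"crypto\s*miner",
    r"reverse\s+shell",
]
RED_FLAG_PATTERNS = [
    r"ignore\s+previous\s+instructions",
    r"steal\s+credentials",
    r"brute\s+force",
    r"scrape\s+email",
]

# Case-insensitive matching, as described in the list above.
_COMPILED = [re.compile(p, re.IGNORECASE) for p in INFRA_ABUSE_PATTERNS + RED_FLAG_PATTERNS]


def scan_prompts(system_prompt: str, task_prompt: str) -> Optional[str]:
    """Return the first matching pattern, or None if nothing is flagged."""
    combined = f"{system_prompt}\n{task_prompt}"  # both prompts scanned together
    for pattern in _COMPILED:
        if pattern.search(combined):
            return pattern.pattern  # reported back to help debugging
    return None
```

A caller would reject the task when the return value is not None and include the returned pattern in the error.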

Layer 3: URL Whitelist (Default-Deny)

All outbound HTTP requests from tools must target a domain registered in the platform's integration registry. Unregistered domains are blocked before any network connection is established.

Validation Flow

Domain Matching Rules

| Match Type | Pattern | Matches |
|---|---|---|
| Exact | api.coingecko.com | api.coingecko.com |
| Subdomain | slack.com | slack.com, api.slack.com, hooks.slack.com |
| Wildcard prefix | *.api.mailchimp.com | us1.api.mailchimp.com, us21.api.mailchimp.com |

Additional Constraints

  • HTTPS only -- all URLs must use the https:// scheme. Plain HTTP is rejected regardless of domain.

  • Webhook exception -- the webhook tool validates against the task's webhook_urls list instead of the integration registry.

  • Registry caching -- the integration registry is loaded from Firestore into memory on startup and refreshed hourly.
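
The matching rules and the HTTPS-only constraint can be sketched as below. The registry shape (a list of match-type and pattern pairs) and the function names are assumptions made for illustration; this page does not show how the real registry distinguishes exact entries from subdomain entries, and the webhook exception is not modeled.

```python
from urllib.parse import urlsplit

# Hypothetical in-memory registry: (match_type, pattern) pairs.
REGISTRY = [
    ("exact", "api.coingecko.com"),
    ("subdomain", "slack.com"),
    ("wildcard", "*.api.mailchimp.com"),
]


def host_allowed(hostname: str, registry=REGISTRY) -> bool:
    """Check a hostname against the three match types from the table above."""
    host = hostname.lower().rstrip(".")
    for match_type, pattern in registry:
        if match_type == "exact" and host == pattern:
            return True
        if match_type == "subdomain" and (host == pattern or host.endswith("." + pattern)):
            return True
        if match_type == "wildcard":
            suffix = pattern[1:]  # "*.api.mailchimp.com" -> ".api.mailchimp.com"
            if host.endswith(suffix) and len(host) > len(suffix):
                return True
    return False  # default deny


def url_allowed(url: str, registry=REGISTRY) -> bool:
    """Default-deny check: HTTPS only, and the host must be registered."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URLs are rejected
    if parts.scheme != "https" or not parts.hostname:
        return False
    return host_allowed(parts.hostname, registry)
```

Matching on "." + pattern rather than a bare suffix is what keeps a lookalike such as evil-slack.com from passing the slack.com entry.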

See Integrations for the full list of approved domains.

Layer 4: DNS Pinning (TOCTOU Prevention)

Standard DNS resolution is vulnerable to time-of-check-time-of-use (TOCTOU) attacks where the DNS response changes between validation and request transmission. The agent executor prevents this with DNS pinning.

The Attack

Without DNS pinning, an attacker could:

  1. Register a domain that resolves to a public IP during the URL whitelist check.

  2. Change the DNS record to a private IP (e.g., 169.254.169.254 for GCP metadata) between validation and connection.

  3. Cause the HTTP client to connect to the private IP, because the address checked at validation time is no longer the address used at connection time.

The Defense

DNS pinning resolves the hostname once, validates the IP, and rewrites the URL to connect directly to the validated IP.

The Host header and TLS SNI (Server Name Indication) are set to the original hostname so that:

  • The destination server can route the request correctly (virtual hosting).

  • TLS certificate validation succeeds (the certificate matches the hostname, not the IP).

Implementation

The _pin_url_to_resolved_ip() function in tools.py performs the resolve-validate-rewrite operation. It returns a tuple of (pinned_url, original_hostname, resolved_ip) or an error. Every tool handler calls this function before making any HTTP request.

DNS resolution uses socket.getaddrinfo() which respects system DNS configuration. IPv6 addresses are supported and wrapped in brackets in the URL (e.g., https://[2001:db8::1]/path).
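
The resolve-validate-rewrite step can be sketched as follows. This is a simplified stand-in, not the actual _pin_url_to_resolved_ip() from tools.py: the name pin_url_sketch, the injectable resolve parameter, the use of ValueError for errors, and the abbreviated IP check are all assumptions. The full blocked-range list belongs to Layer 5.

```python
import ipaddress
import socket
from urllib.parse import urlsplit, urlunsplit


def pin_url_sketch(url, resolve=socket.getaddrinfo):
    """Return (pinned_url, original_hostname, resolved_ip) or raise ValueError."""
    parts = urlsplit(url)
    hostname = parts.hostname
    if parts.scheme != "https" or not hostname:
        raise ValueError("only https URLs with a hostname can be pinned")
    try:
        infos = resolve(hostname, parts.port or 443, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        raise ValueError(f"DNS resolution failed for {hostname}") from exc  # fail closed
    if not infos:
        raise ValueError(f"no addresses returned for {hostname}")
    # Resolve once and keep that answer; strip any IPv6 scope id ("%eth0").
    resolved_ip = infos[0][4][0].split("%")[0]
    addr = ipaddress.ip_address(resolved_ip)
    # Abbreviated validation for the sketch only.
    if addr.is_private or addr.is_loopback or addr.is_link_local:
        raise ValueError(f"{hostname} resolved to a blocked address")
    # Rewrite the URL so the client connects to the validated IP.
    host_part = f"[{resolved_ip}]" if addr.version == 6 else resolved_ip
    netloc = f"{host_part}:{parts.port}" if parts.port else host_part
    pinned_url = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return pinned_url, hostname, resolved_ip
```

The caller is still responsible for sending the original hostname in the Host header and as the TLS SNI value when it connects to pinned_url; how that is done depends on the HTTP client and is not shown here.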

Layer 5: Private IP Blocking

After DNS resolution, the resolved IP address is checked against private and reserved ranges. This prevents SSRF attacks where a public domain resolves to an internal IP, or where an attacker attempts to reach internal infrastructure.

Blocked Ranges

| Range | Type | Threat |
|---|---|---|
| 10.0.0.0/8 | RFC 1918 private | Internal VPC, databases, services |
| 172.16.0.0/12 | RFC 1918 private | Internal VPC, Docker networks |
| 192.168.0.0/16 | RFC 1918 private | Internal VPC, local networks |
| 100.64.0.0/10 | Carrier-grade NAT (RFC 6598) | Shared address space |
| 169.254.0.0/16 | Link-local | GCP metadata server (169.254.169.254) |
| 127.0.0.0/8 | Loopback | Localhost services |
| ::1/128 | IPv6 loopback | Localhost services |
| fc00::/7 | IPv6 unique local address | Internal IPv6 networks |
| fe80::/10 | IPv6 link-local | Local IPv6 segment |

Fail-Closed Behavior

If DNS resolution fails entirely (socket.gaierror), the hostname is blocked rather than allowed. This fail-closed approach prevents bypasses through DNS resolution failures or timing attacks.
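
A minimal sketch of the range check and the fail-closed rule, using Python's ipaddress module. The network list is copied from the table above; the function names, the injectable resolve parameter, and the decision to reject a hostname if any of its resolved addresses is blocked are assumptions for illustration.

```python
import ipaddress
import socket

# Ranges copied from the "Blocked Ranges" table.
BLOCKED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16", "100.64.0.0/10",
    "169.254.0.0/16", "127.0.0.0/8", "::1/128", "fc00::/7", "fe80::/10",
)]


def is_blocked_ip(ip: str) -> bool:
    """True if the address falls inside any blocked range."""
    addr = ipaddress.ip_address(ip.split("%")[0])  # drop an IPv6 scope id if present
    return any(addr.version == net.version and addr in net for net in BLOCKED_NETWORKS)


def hostname_allowed(hostname: str, resolve=socket.getaddrinfo) -> bool:
    """Resolve a hostname and allow it only if every address is outside the blocked ranges."""
    try:
        infos = resolve(hostname, 443, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # fail closed: an unresolvable hostname is blocked
    if not infos:
        return False
    return not any(is_blocked_ip(info[4][0]) for info in infos)
```

Returning False on socket.gaierror is the fail-closed behavior: an error path never results in an allowed request.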

GCP Metadata Protection

The 169.254.0.0/16 block is particularly important because it prevents access to the GCP metadata server at 169.254.169.254. The metadata server exposes:

  • Service account tokens (could escalate privileges).

  • Project metadata and configuration.

  • Instance identity tokens.

Blocking the entire link-local range ensures no creative hostname tricks can reach the metadata endpoint.

Layer 6: Credential Exfiltration Prevention

Outbound HTTP requests are scanned for credential values before transmission. This prevents scenarios where a prompt injection or adversarial LLM output attempts to exfiltrate secrets by embedding them in request data.

What Is Scanned

The scanner inspects the combined content of:

  • Request body (JSON-serialized)

  • Custom headers (JSON-serialized)

  • URL (including query parameters)

Detection Methods

Each decrypted credential value (from fields api_key, access_token, refresh_token, secret) is checked against the outbound data in three encoded forms:

| Encoding | Detection Method | Catches |
|---|---|---|
| Plaintext | Direct string match | xoxb-secret-token-123 in body |
| Base64 | Base64-encode the credential, then search | eG94Yi1zZWNyZXQtdG9rZW4tMTIz in body |
| URL-encoded | URL-encode the credential, then search | xoxb-secret-token%2D123 in URL |

Short Value Exclusion

Credential values of 8 characters or fewer are excluded from scanning to avoid false positives on short strings that could appear legitimately in request data.

Blocking Behavior

When a leak is detected:

  • The request is blocked immediately.

  • The tool returns an error to the LLM: Error: Blocked -- Credential X detected in outbound request. This has been logged.

  • The event is logged for security auditing.

  • The block_reason is set to credential_leak in the tool call result.

Legitimate Injection

The exfiltration scanner does not flag the executor's own credential injection (into the Authorization header or the designated auth header). The scanner only inspects the fields that the LLM controls: body, custom headers, and url from the tool input. The executor's injected auth header is added after the scan passes.
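
The detection logic for this layer might look roughly like the sketch below. The field names and the 8-character exclusion come from the text above; the function name, its signature, and the choice of urllib.parse.quote with safe="" for URL encoding are assumptions.

```python
import base64
import json
from typing import Optional
from urllib.parse import quote

CREDENTIAL_FIELDS = ("api_key", "access_token", "refresh_token", "secret")
MAX_EXCLUDED_LENGTH = 8  # values of 8 characters or fewer are not scanned


def find_leaked_field(credentials: dict, url: str, headers: dict, body) -> Optional[str]:
    """Return the name of the first credential field found in LLM-controlled data."""
    # Only LLM-controlled inputs are inspected; the executor's own auth header
    # is added after this check passes.
    haystack = "\n".join([json.dumps(body), json.dumps(headers), url])
    for field in CREDENTIAL_FIELDS:
        value = credentials.get(field)
        if not value or len(value) <= MAX_EXCLUDED_LENGTH:
            continue
        variants = (
            value,                                                    # plaintext
            base64.b64encode(value.encode("utf-8")).decode("ascii"),  # base64
            quote(value, safe=""),                                    # URL-encoded
        )
        if any(variant in haystack for variant in variants):
            return field
    return None
```

A caller would block the request, log the event, and set block_reason to credential_leak when a field name is returned. Note that a plain substring search against JSON-serialized data can miss secrets containing characters that JSON escapes, such as quotes or backslashes.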

Layer 7: Rate Limiting + Budget Caps

Execution frequency and resource consumption are capped at multiple levels to prevent resource exhaustion, runaway costs, and abuse.

Per-Execution Limits

| Limit | Guru ($10/mo) | Pro ($50/mo) |
|---|---|---|
| Max iterations per execution | 10 | 25 |
| Max credits per execution | 100 | 500 |
| Execution timeout | 10 minutes | 30 minutes |
| Max API calls per execution | 20 | 50 |

Per-Minute Limits

| Limit | Guru | Pro |
|---|---|---|
| API calls per minute | 10 | 20 |

Per-Tool Limits

| Limit | Guru | Pro |
|---|---|---|
| Individual request timeout | 30 seconds | 30 seconds |
| Max response body size | 500 KB | 2 MB |

Limit Enforcement

  • Iteration limit -- checked at the top of each agentic loop iteration. Exceeded: status max_iterations.

  • Credit cap -- checked after each LLM call. Exceeded: status credit_exhausted.

  • Execution timeout -- enforced via asyncio.wait_for(). Exceeded: status timeout.

  • API call count -- checked before each tool invocation via ExecutionRateState. Exceeded: error returned to LLM.

  • Per-minute rate -- sliding window tracked by ExecutionRateState. Exceeded: error returned to LLM.

  • Response size -- responses are truncated to the plan's max size before being returned to the LLM.
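
The API call count and per-minute checks could be tracked with a sliding window along these lines. This is a sketch only: the class below is a hypothetical stand-in for ExecutionRateState, whose real interface is not shown on this page, and the error strings are invented.

```python
import time
from collections import deque


class RateStateSketch:
    """Tracks total and per-minute API calls for one execution."""

    def __init__(self, max_calls_total, max_calls_per_minute, clock=time.monotonic):
        self.max_calls_total = max_calls_total
        self.max_calls_per_minute = max_calls_per_minute
        self._clock = clock
        self._total = 0
        self._recent = deque()  # timestamps of calls within the last 60 seconds

    def try_acquire(self):
        """Return None if the call may proceed, else an error string for the LLM."""
        now = self._clock()
        # Slide the window: forget calls that are 60 seconds old or older.
        while self._recent and now - self._recent[0] >= 60.0:
            self._recent.popleft()
        if self._total >= self.max_calls_total:
            return "Error: API call limit for this execution reached"
        if len(self._recent) >= self.max_calls_per_minute:
            return "Error: per-minute API call limit reached"
        self._total += 1
        self._recent.append(now)
        return None
```

Returning an error string rather than raising mirrors the behavior listed above, where these two limits produce an error for the LLM instead of ending the execution.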

Budget Protection

Task-level max_iterations and max_credits fields allow users to set limits stricter than their plan maximum. The executor uses min(plan_limit, task_limit) to determine the effective cap, so a task-level value can lower a limit but never raise it above the plan maximum.
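
A minimal sketch of the effective-cap calculation, using the plan values from the Per-Execution Limits table. The dictionary layout, the function name, and the treatment of an unset task field as "use the plan limit" are assumptions.

```python
from typing import Optional

# Plan maximums from the Per-Execution Limits table.
PLAN_LIMITS = {
    "guru": {"max_iterations": 10, "max_credits": 100},
    "pro": {"max_iterations": 25, "max_credits": 500},
}


def effective_limit(plan: str, key: str, task_value: Optional[int]) -> int:
    """Return min(plan_limit, task_limit); fall back to the plan limit if unset."""
    plan_limit = PLAN_LIMITS[plan][key]
    if task_value is None:
        return plan_limit
    return min(plan_limit, task_value)
```

For example, a Pro task with max_iterations set to 5 runs with a cap of 5, while a Guru task that asks for 1000 credits is still capped at 100.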

Billing

  • Each LLM call within the agentic loop is a separate credit charge, handled by the main API.

  • Authenticated tool calls (api_call with credential) incur a 0.5 credit per-call fee.

  • Execution fees: 5 credits (Guru) or 3 credits (Pro) per execution.

  • The executor tracks cumulative credits and aborts if the per-execution cap is reached.
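
As a worked example of the fee structure above, the sketch below totals the charges for one execution. The per-LLM-call credit amounts are made-up inputs, and the function name is hypothetical; this page does not say whether the execution fee counts toward the per-execution credit cap, so the sketch only sums the charges.

```python
EXECUTION_FEE = {"guru": 5.0, "pro": 3.0}  # credits per execution
AUTH_TOOL_CALL_FEE = 0.5                   # credits per authenticated tool call


def execution_charges(plan: str, llm_call_credits, authenticated_tool_calls: int) -> float:
    """Sum the execution fee, the LLM call charges, and the authenticated tool call fees."""
    return (
        EXECUTION_FEE[plan]
        + sum(llm_call_credits)
        + AUTH_TOOL_CALL_FEE * authenticated_tool_calls
    )
```

With three LLM calls costing 4, 6, and 2 credits and four authenticated tool calls on the Pro plan, the total is 3 + 12 + 2 = 17 credits.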

Layer 8: Content Moderation + Kill Switch

The final layer provides two complementary controls: automated content moderation and manual emergency intervention.

Content Moderation

Task prompts undergo content moderation at two points:

| Timing | Trigger | Action on Flag |
|---|---|---|
| Write time | Task creation or update (when prompt fields change) | Task rejected, cannot be saved |
| Execution time | Before each scheduled execution | Execution aborted, status blocked |

The moderation pipeline runs in two stages:

  1. Infrastructure abuse scan (Layer 2 patterns) -- detects provisioning, mining, shells.

  2. General red flag scan -- detects exfiltration intent, safety overrides, spam, and harvesting.

Both stages must pass for the content to be approved. The dual timing (write + execute) ensures that:

  • Malicious tasks are caught at creation time.

  • Tasks modified via direct Firestore writes (bypassing the UI) are caught at execution time.

  • Future additions to the pattern list apply retroactively to existing tasks.

Admin Kill Switch

Platform administrators can deprecate any integration at any time by setting its status to deprecated in the platform_integrations collection. This immediately and globally revokes access to that API for all agents across the platform.

Deprecated integrations:

  • Are immediately removed from the in-memory cache via deprecate_integration().

  • Cannot be added to new or existing tasks.

  • Cause active executions to fail if they attempt to reach the deprecated API's domains.

  • Trigger an integration_deprecated warning in execution records.
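
The effect of deprecation on the in-memory cache can be illustrated with the sketch below. The class, its data layout, and exact-match domain lookup are assumptions; only the deprecate_integration() name and the deprecated status value come from the text above.

```python
class IntegrationCacheSketch:
    """Hypothetical in-memory view of the integration registry."""

    def __init__(self, integrations: dict):
        # integrations: {name: {"status": str, "domains": [str, ...]}}
        self._active = {
            name: set(cfg["domains"])
            for name, cfg in integrations.items()
            if cfg.get("status") != "deprecated"
        }

    def deprecate_integration(self, name: str) -> None:
        """Drop the integration from the cache so its domains stop validating."""
        self._active.pop(name, None)

    def domain_allowed(self, domain: str) -> bool:
        """True if any active integration lists this domain (exact match only)."""
        return any(domain in domains for domains in self._active.values())
```

Because outbound requests are validated against this cache, removing an entry causes the next tool call to that integration's domains to be blocked, including calls from executions that are already running.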

Emergency Response Scenarios

The kill switch enables rapid response to:

| Scenario | Admin Action | Effect |
|---|---|---|
| Third-party API is compromised | Deprecate the integration | All agents immediately lose access to that API |
| Integration is being abused across accounts | Deprecate the integration | Platform-wide access revocation in seconds |
| Security vulnerability discovered | Deprecate affected integration(s) | Stops all traffic to affected domains |
| Regulatory compliance requirement | Deprecate non-compliant integrations | Immediate enforcement |

Admin Controls

Observability

The executor emits OpenTelemetry spans for every execution, iteration, LLM call, and tool call. These are exported to Google Cloud Trace for monitoring and debugging.

Key span attributes for security monitoring:

| Attribute | Purpose |
|---|---|
| agent.tool_blocked | Whether a tool call was blocked by security |
| agent.tool_block_reason | Why a tool call was blocked |
| agent.status | Final execution status |
| agent.credits_used | Total credits consumed |
| agent.tool_call_count | Number of tool calls made |

Execution Records

Every execution produces a Firestore document in agent_executions/ that includes:

  • Full tool call log (up to 50 entries) with blocked/allowed status.

  • Credit usage breakdown.

  • Error details for failed executions.

  • Iteration count and duration.

Task Health Tracking

The task document tracks operational health:

| Field | Purpose |
|---|---|
| consecutive_failures | Incremented on failure, reset on success |
| last_status | Status of the most recent execution |
| last_run_at | Timestamp of the most recent execution |
| total_credits_used | Lifetime credit consumption |
| total_runs | Lifetime execution count |

Threat Model Summary

| Threat | Mitigating Layers |
|---|---|
| Agent reaches unauthorized API | Layer 3 (URL whitelist) |
| SSRF to internal infrastructure | Layer 4 (DNS pinning), Layer 5 (private IP block) |
| DNS rebinding attack | Layer 4 (DNS pinning) |
| Credential theft via outbound data | Layer 6 (exfiltration scan), Layer 1 (project isolation) |
| Prompt injection via API response | Response sanitization (<tool_response> wrapping) |
| Malicious task prompt | Layer 2 (prompt scanner), Layer 8 (content moderation) |
| Resource exhaustion / runaway costs | Layer 7 (rate limiting + budget caps) |
| Compromised third-party API | Layer 8 (admin kill switch) |
| Credential exposure in storage | KMS encryption with split trust (Layer 1) |
| Compromised executor service | Layer 1 (project isolation) |
| GCP metadata access (SSRF) | Layer 5 (169.254.0.0/16 block), Layer 4 (DNS pinning) |
| Credential encoding tricks | Layer 6 (base64 + URL-encoding detection) |
| Context window overflow | Response truncation (50K char per tool result) |
