Security
The agent system implements an 8-layer defense-in-depth architecture. Each layer addresses a distinct threat vector, and they operate independently so that a bypass of any single layer does not compromise the system.
Defense-in-Depth Overview
| # | Layer | Threat addressed | Mechanism |
|---|-------|------------------|-----------|
| 1 | Project Isolation | Blast radius containment | Separate GCP projects for platform and executor |
| 2 | Prompt Scanning | Malicious prompt injection | Regex-based detection of infrastructure abuse and red flags |
| 3 | URL Whitelist | Unauthorized outbound access | Default-deny domain validation against integration registry |
| 4 | DNS Pinning | TOCTOU DNS rebinding | Resolve-validate-rewrite with Host header and SNI preservation |
| 5 | Private IP Blocking | SSRF to internal infrastructure | Block RFC 1918, link-local, loopback, and IPv6 private ranges |
| 6 | Credential Exfiltration Prevention | Secret leakage via outbound data | Scan body/headers/URL for plaintext, base64, and URL-encoded secrets |
| 7 | Rate Limiting + Budget Caps | Resource exhaustion, runaway costs | Per-execution iteration, credit, timeout, and API call caps |
| 8 | Content Moderation + Kill Switch | Harmful content, emergency revocation | Regex content scanning + admin integration deprecation |
Layer 1: Project Isolation
The agent executor runs in a separate GCP project (inference-agents) from the main platform (inference-platform). This provides a hard IAM and network boundary between the system that serves user requests and the system that executes autonomous agent actions with external API credentials.
Isolation Boundaries
```
inference-platform (GCP project)        inference-agents (GCP project)
+----------------------------------+    +----------------------------------+
| Cloud Run: inference-api         |    | Cloud Run: agent-executor        |
| Firestore (user data, billing)   |    | KMS decrypt keys                 |
| Redis (sessions, API key cache)  |    | Cloud Tasks queue                |
| Stripe integration               |    | Isolated networking              |
| KMS encrypt keys (bridge SA)     |    | OpenTelemetry tracing            |
+----------------------------------+    +----------------------------------+
                  |                                       |
                  +<------ OIDC-authenticated calls ----->+
```

What Isolation Provides
- The executor has no access to the main platform's Firestore, Redis, or other resources except via explicit API calls to the main API's `/internal/agent/code` endpoint.
- The main platform has no access to the executor's KMS decrypt keys.
- Billing, quotas, and audit logs are isolated per project.
- A compromised executor cannot access user data, billing systems, or platform infrastructure.
- A compromised platform cannot decrypt agent credentials.
Service-to-Service Authentication
The executor authenticates to the main API using Google OIDC tokens (not user API keys). The `user_id` is passed in the request body so credits are charged to the correct user account.
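A minimal sketch of how such a request could be assembled. The function name and payload shape are illustrative, not the actual executor code; in production the token would come from `google.oauth2.id_token.fetch_id_token`, stubbed here so the example is self-contained:

```python
import json
from typing import Callable


def build_internal_request(
    fetch_id_token: Callable[[str], str],
    audience: str,
    user_id: str,
    payload: dict,
) -> tuple[dict, bytes]:
    """Build headers and body for an OIDC-authenticated call to the main API.

    The service identity travels in the Bearer token; the user to charge
    travels in the request body as `user_id`, never as a user API key.
    """
    token = fetch_id_token(audience)
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"user_id": user_id, **payload}).encode()
    return headers, body


# Example with a stub token provider (production would use google-auth):
headers, body = build_internal_request(
    fetch_id_token=lambda aud: "stub-oidc-token",
    audience="https://inference-api.example.com",
    user_id="user_123",
    payload={"credits": 2},
)
```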
KMS Split Trust
Credential encryption keys are split across projects.
See Credentials for the full encryption lifecycle.
Layer 2: Prompt Scanning (Injection Detection)
Before any LLM call, the task's system_prompt and task_prompt are scanned by a regex-based security scanner. The scanner detects two categories of threats.
Infrastructure Abuse Patterns
Patterns that indicate attempts to use the agent for infrastructure provisioning, cryptocurrency mining, or network attacks:
| Category | Example patterns |
|---|---|
| GCP provisioning | `gcloud compute`, `create vm`, `deploy cluster`, `cloudfunctions.googleapis.com` |
| AWS provisioning | `aws ec2`, `boto3`, `amazonaws.com`, `CloudFormation` |
| Azure provisioning | `az vm`, `management.azure.com`, `azure container` |
| Infrastructure-as-code | `terraform apply`, `pulumi up`, `cloudformation` |
| Cryptocurrency mining | `mine bitcoin`, `crypto miner`, `ethminer` |
| Network attacks | `reverse shell`, `bind shell`, `c2 server`, `port scan`, `nmap` |
General Red Flags
Patterns that indicate malicious intent or social engineering:
| Category | Example patterns |
|---|---|
| Data exfiltration | `exfiltrate`, `steal credentials`, `send all data to` |
| Safety override | `ignore previous instructions`, `bypass security`, `disregard system prompt` |
| Abuse | `ddos`, `brute force`, `spam email`, `bulk send` |
| Harvesting | `scrape email`, `harvest accounts`, `collect credentials` |
Scanning Behavior
Scanning runs at task creation, task update (when prompt fields change), and before each execution.
- Both `system_prompt` and `task_prompt` are concatenated and scanned together.
- Pattern matching is case-insensitive.
- A flagged prompt results in immediate rejection with a `content_moderation` error status.
- The specific matched pattern is included in the error for debugging.
Layer 3: URL Whitelist (Default-Deny)
All outbound HTTP requests from tools must target a domain registered in the platform's integration registry. Unregistered domains are blocked before any network connection is established.
Validation Flow
Domain Matching Rules
| Match type | Registered domain | Matches |
|---|---|---|
| Exact | `api.coingecko.com` | `api.coingecko.com` |
| Subdomain | `slack.com` | `slack.com`, `api.slack.com`, `hooks.slack.com` |
| Wildcard prefix | `*.api.mailchimp.com` | `us1.api.mailchimp.com`, `us21.api.mailchimp.com` |
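A sketch of the matching logic, assuming the registry stores a match type per entry (the real registry schema may differ):

```python
def domain_allowed(hostname: str, registry: dict[str, str]) -> bool:
    """Default-deny check of a hostname against registry entries.

    `registry` maps a registered pattern to its match type
    ('exact' | 'subdomain' | 'wildcard') -- an assumed schema.
    """
    hostname = hostname.lower().rstrip(".")
    for pattern, kind in registry.items():
        pattern = pattern.lower()
        if kind == "exact" and hostname == pattern:
            return True
        if kind == "subdomain" and (
            hostname == pattern or hostname.endswith("." + pattern)
        ):
            return True
        if kind == "wildcard" and pattern.startswith("*.") and hostname.endswith(
            pattern[1:]  # keep the leading dot so the label boundary is enforced
        ):
            return True
    return False  # unregistered domains are blocked (default deny)
```

Matching on the leading dot (`.slack.com`, `.api.mailchimp.com`) prevents lookalike hosts such as `notslack.com` from satisfying a suffix check.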
Additional Constraints
- HTTPS only -- all URLs must use the `https://` scheme. Plain HTTP is rejected regardless of domain.
- Webhook exception -- the `webhook` tool validates against the task's `webhook_urls` list instead of the integration registry.
- Registry caching -- the integration registry is loaded from Firestore into memory on startup and refreshed hourly.
See Integrations for the full list of approved domains.
Layer 4: DNS Pinning (TOCTOU Prevention)
Standard DNS resolution is vulnerable to time-of-check-time-of-use (TOCTOU) attacks where the DNS response changes between validation and request transmission. The agent executor prevents this with DNS pinning.
The Attack
Without DNS pinning, an attacker could:
1. Register a domain that resolves to a public IP during the URL whitelist check.
2. Change the DNS record to a private IP (e.g., `169.254.169.254` for GCP metadata) between validation and connection.
3. The HTTP client connects to the private IP, bypassing the private IP block.
The Defense
DNS pinning resolves the hostname once, validates the IP, and rewrites the URL to connect directly to the validated IP.
The Host header and TLS SNI (Server Name Indication) are set to the original hostname so that:
- The destination server can route the request correctly (virtual hosting).
- TLS certificate validation succeeds (the certificate matches the hostname, not the IP).
Implementation
The _pin_url_to_resolved_ip() function in tools.py performs the resolve-validate-rewrite operation. It returns a tuple of (pinned_url, original_hostname, resolved_ip) or an error. Every tool handler calls this function before making any HTTP request.
DNS resolution uses socket.getaddrinfo() which respects system DNS configuration. IPv6 addresses are supported and wrapped in brackets in the URL (e.g., https://[2001:db8::1]/path).
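A simplified re-sketch of the resolve-validate-rewrite step. This is not the actual `_pin_url_to_resolved_ip()` code; the resolver is injectable here so the behavior can be exercised without network access:

```python
import ipaddress
import socket
from urllib.parse import urlsplit, urlunsplit


def pin_url(url: str, resolver=socket.getaddrinfo) -> tuple[str, str, str]:
    """Resolve once, validate the IP, and rewrite the URL to that IP.

    Returns (pinned_url, original_hostname, resolved_ip); raises ValueError
    on validation failure. Fails closed if DNS resolution errors out.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if parts.scheme != "https" or host is None:
        raise ValueError("only https URLs with a hostname are allowed")
    try:
        infos = resolver(host, parts.port or 443, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        raise ValueError(f"DNS resolution failed for {host}; failing closed")
    ip = ipaddress.ip_address(infos[0][4][0])  # first resolved address
    if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
        raise ValueError(f"{host} resolved to blocked address {ip}")
    host_literal = f"[{ip}]" if ip.version == 6 else str(ip)  # brackets for IPv6
    netloc = host_literal + (f":{parts.port}" if parts.port else "")
    pinned = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    # Caller must send "Host: <host>" and set TLS SNI to <host> when connecting.
    return pinned, host, str(ip)
```

The validation step here is a condensed stand-in for the full Layer 5 range list below.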
Layer 5: Private IP Blocking
After DNS resolution, the resolved IP address is checked against private and reserved ranges. This prevents SSRF attacks where a public domain resolves to an internal IP, or where an attacker attempts to reach internal infrastructure.
Blocked Ranges
| Range | Classification | What it protects |
|---|---|---|
| `10.0.0.0/8` | RFC 1918 private | Internal VPC, databases, services |
| `172.16.0.0/12` | RFC 1918 private | Internal VPC, Docker networks |
| `192.168.0.0/16` | RFC 1918 private | Internal VPC, local networks |
| `100.64.0.0/10` | Carrier-grade NAT (RFC 6598) | Shared address space |
| `169.254.0.0/16` | Link-local | GCP metadata server (`169.254.169.254`) |
| `127.0.0.0/8` | Loopback | Localhost services |
| `::1/128` | IPv6 loopback | Localhost services |
| `fc00::/7` | IPv6 unique local address | Internal IPv6 networks |
| `fe80::/10` | IPv6 link-local | Local IPv6 segment |
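These ranges can be checked directly with the standard `ipaddress` module. A minimal sketch, using an explicit network list rather than `is_private` alone because `is_private` does not flag the carrier-grade NAT space (`100.64.0.0/10`):

```python
import ipaddress

# The blocked ranges from the table above.
BLOCKED_NETWORKS = [ipaddress.ip_network(n) for n in (
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",
    "100.64.0.0/10", "169.254.0.0/16", "127.0.0.0/8",
    "::1/128", "fc00::/7", "fe80::/10",
)]


def ip_blocked(ip_str: str) -> bool:
    """Return True if the resolved IP falls in any private/reserved range."""
    ip = ipaddress.ip_address(ip_str)
    # Membership across IP versions simply returns False, so one list suffices.
    return any(ip in net for net in BLOCKED_NETWORKS)
```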
Fail-Closed Behavior
If DNS resolution fails entirely (socket.gaierror), the hostname is blocked rather than allowed. This fail-closed approach prevents bypasses through DNS resolution failures or timing attacks.
GCP Metadata Protection
The 169.254.0.0/16 block is particularly important because it prevents access to the GCP metadata server at 169.254.169.254. The metadata server exposes:
- Service account tokens (could escalate privileges).
- Project metadata and configuration.
- Instance identity tokens.
Blocking the entire link-local range ensures no creative hostname tricks can reach the metadata endpoint.
Layer 6: Credential Exfiltration Prevention
Outbound HTTP requests are scanned for credential values before transmission. This prevents scenarios where a prompt injection or adversarial LLM output attempts to exfiltrate secrets by embedding them in request data.
What Is Scanned
The scanner inspects the combined content of:
- Request body (JSON-serialized)
- Custom headers (JSON-serialized)
- URL (including query parameters)
Detection Methods
Each decrypted credential value (from fields api_key, access_token, refresh_token, secret) is checked against the outbound data in three encoded forms:
| Encoding | Detection method | Example |
|---|---|---|
| Plaintext | Direct string match | `xoxb-secret-token-123` in body |
| Base64 | Base64-encode the credential, then search | `eG94Yi1zZWNyZXQtdG9rZW4tMTIz` in body |
| URL-encoded | URL-encode the credential, then search | `xoxb-secret-token%2D123` in URL |
Short Value Exclusion
Credential values of 8 characters or fewer are excluded from scanning to avoid false positives on short strings that could appear legitimately in request data.
Blocking Behavior
When a leak is detected:
- The request is blocked immediately.
- The tool returns an error to the LLM: `Error: Blocked -- Credential X detected in outbound request. This has been logged.`
- The event is logged for security auditing.
- The `block_reason` is set to `credential_leak` in the tool call result.
Legitimate Injection
The exfiltration scanner does not flag the executor's own credential injection (into the Authorization header or the designated auth header). The scanner only inspects the fields that the LLM controls: body, custom headers, and url from the tool input. The executor's injected auth header is added after the scan passes.
Layer 7: Rate Limiting + Budget Caps
Execution frequency and resource consumption are capped at multiple levels to prevent resource exhaustion, runaway costs, and abuse.
Per-Execution Limits
| Limit | Pro | Guru |
|---|---|---|
| Max iterations per execution | 10 | 25 |
| Max credits per execution | 100 | 500 |
| Execution timeout | 10 minutes | 30 minutes |
| Max API calls per execution | 20 | 50 |
Per-Minute Limits
| Limit | Pro | Guru |
|---|---|---|
| API calls per minute | 10 | 20 |
Per-Tool Limits
| Limit | Pro | Guru |
|---|---|---|
| Individual request timeout | 30 seconds | 30 seconds |
| Max response body size | 500 KB | 2 MB |
Limit Enforcement
- Iteration limit -- checked at the top of each agentic loop iteration. Exceeded: status `max_iterations`.
- Credit cap -- checked after each LLM call. Exceeded: status `credit_exhausted`.
- Execution timeout -- enforced via `asyncio.wait_for()`. Exceeded: status `timeout`.
- API call count -- checked before each tool invocation via `ExecutionRateState`. Exceeded: error returned to LLM.
- Per-minute rate -- sliding window tracked by `ExecutionRateState`. Exceeded: error returned to LLM.
- Response size -- responses are truncated to the plan's max size before being returned to the LLM.
Budget Protection
Task-level `max_iterations` and `max_credits` fields allow users to set limits stricter than their plan maximum. The executor uses `min(plan_limit, task_limit)` to determine the effective cap.
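For example (field names taken from the task and plan documents described above; the helper itself is illustrative):

```python
def effective_caps(plan: dict, task: dict) -> dict:
    """Effective cap per field: the stricter of plan maximum and task setting."""
    return {
        key: min(plan[key], task.get(key) or plan[key])
        for key in ("max_iterations", "max_credits")
    }
```

So a task with `max_credits: 50` on a plan allowing 500 is capped at 50, while unset task fields fall back to the plan maximum.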
Billing
- Each LLM call within the agentic loop is a separate credit charge, handled by the main API.
- Authenticated tool calls (`api_call` with a credential) incur a 0.5 credit per-call fee.
- Execution fees: 5 credits (Guru) or 3 credits (Pro) per execution.
The executor tracks cumulative credits and aborts if the per-execution cap is reached.
Layer 8: Content Moderation + Kill Switch
The final layer provides two complementary controls: automated content moderation and manual emergency intervention.
Content Moderation
Task prompts undergo content moderation at two points:
| Check point | When | On failure |
|---|---|---|
| Write time | Task creation or update (when prompt fields change) | Task rejected, cannot be saved |
| Execution time | Before each scheduled execution | Execution aborted, status `blocked` |
The moderation pipeline runs in two stages:
1. Infrastructure abuse scan (Layer 2 patterns) -- detects provisioning, mining, shells.
2. General red flag scan -- detects exfiltration intent, safety overrides, spam, and harvesting.
Both stages must pass for the content to be approved. The dual timing (write + execute) ensures that:
- Malicious tasks are caught at creation time.
- Tasks modified via direct Firestore writes (bypassing the UI) are caught at execution time.
- Future additions to the pattern list apply retroactively to existing tasks.
Admin Kill Switch
Platform administrators can deprecate any integration at any time by setting its status to deprecated in the platform_integrations collection. This immediately and globally revokes access to that API for all agents across the platform.
Deprecated integrations:
- Are immediately removed from the in-memory cache via `deprecate_integration()`.
- Cannot be added to new or existing tasks.
- Cause active executions to fail if they attempt to reach the deprecated API's domains.
- Trigger an `integration_deprecated` warning in execution records.
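A toy model of the registry cache and kill switch. The class and document schema are assumptions; only `deprecate_integration()` and the `status: deprecated` field are named in the source:

```python
class IntegrationRegistry:
    """In-memory cache of active integrations with an admin kill switch."""

    def __init__(self, integrations: dict[str, dict]):
        # integration_id -> {"domains": [...], "status": "active" | "deprecated"}
        # Deprecated entries never enter the cache on load/refresh.
        self._cache = {
            iid: cfg for iid, cfg in integrations.items()
            if cfg.get("status") != "deprecated"
        }

    def deprecate_integration(self, integration_id: str) -> None:
        """Kill switch: evict immediately so in-flight lookups start failing."""
        self._cache.pop(integration_id, None)

    def domain_registered(self, hostname: str) -> bool:
        """Default-deny lookup used by the Layer 3 URL whitelist."""
        return any(
            hostname == d or hostname.endswith("." + d)
            for cfg in self._cache.values()
            for d in cfg["domains"]
        )
```

Because every outbound request consults the cache, eviction is platform-wide and takes effect without restarting executors.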
Emergency Response Scenarios
The kill switch enables rapid response to:
| Scenario | Response | Effect |
|---|---|---|
| Third-party API is compromised | Deprecate the integration | All agents immediately lose access to that API |
| Integration is being abused across accounts | Deprecate the integration | Platform-wide access revocation in seconds |
| Security vulnerability discovered | Deprecate affected integration(s) | Stops all traffic to affected domains |
| Regulatory compliance requirement | Deprecate non-compliant integrations | Immediate enforcement |
Admin Controls
Observability
The executor emits OpenTelemetry spans for every execution, iteration, LLM call, and tool call. These are exported to Google Cloud Trace for monitoring and debugging.
Key span attributes for security monitoring:
| Attribute | Meaning |
|---|---|
| `agent.tool_blocked` | Whether a tool call was blocked by security |
| `agent.tool_block_reason` | Why a tool call was blocked |
| `agent.status` | Final execution status |
| `agent.credits_used` | Total credits consumed |
| `agent.tool_call_count` | Number of tool calls made |
Execution Records
Every execution produces a Firestore document in agent_executions/ that includes:
- Full tool call log (up to 50 entries) with blocked/allowed status.
- Credit usage breakdown.
- Error details for failed executions.
- Iteration count and duration.
Task Health Tracking
The task document tracks operational health:
| Field | Behavior |
|---|---|
| `consecutive_failures` | Incremented on failure, reset on success |
| `last_status` | Status of the most recent execution |
| `last_run_at` | Timestamp of the most recent execution |
| `total_credits_used` | Lifetime credit consumption |
| `total_runs` | Lifetime execution count |
Threat Model Summary
| Threat | Mitigation |
|---|---|
| Agent reaches unauthorized API | Layer 3 (URL whitelist) |
| SSRF to internal infrastructure | Layer 4 (DNS pinning), Layer 5 (private IP block) |
| DNS rebinding attack | Layer 4 (DNS pinning) |
| Credential theft via outbound data | Layer 6 (exfiltration scan), Layer 1 (project isolation) |
| Prompt injection via API response | Response sanitization (`<tool_response>` wrapping) |
| Malicious task prompt | Layer 2 (prompt scanner), Layer 8 (content moderation) |
| Resource exhaustion / runaway costs | Layer 7 (rate limiting + budget caps) |
| Compromised third-party API | Layer 8 (admin kill switch) |
| Credential exposure in storage | KMS encryption with split trust (Layer 1) |
| Compromised executor service | Layer 1 (project isolation) |
| GCP metadata access (SSRF) | Layer 5 (`169.254.0.0/16` block), Layer 4 (DNS pinning) |
| Credential encoding tricks | Layer 6 (base64 + URL-encoding detection) |
| Context window overflow | Response truncation (50K chars per tool result) |