Security
The Majestix AI Inference Hub implements defense-in-depth security across both GCP projects, covering API authentication, agent execution isolation, credential management, and data protection.
Authentication Layers
The platform supports three authentication methods, selected automatically based on the request context.
Method 1: API Key
Clients: VSCode extension, CLI tools, SDK integrations, any programmatic access.
X-Api-Key: inf_<url_safe_base64>API key authentication is attempted first when the X-Api-Key header is present.
Format
inf_ prefix + URL-safe base64 random bytes
Storage
SHA-256 hash only. Raw key never stored.
Verification
Hash key, check Redis cache (15-min TTL), fallback to Firestore on cache miss.
Revocation
Delete from Firestore + invalidate Redis cache entry immediately.
Expiry
90 days by default (configurable).
Rate limiting
Per-user, not per-key.
Security properties:
Keys are SHA-256 hashed before storage -- a database breach does not expose usable keys.
Failed lookups (key not found) are not cached to prevent cache poisoning, where an attacker could force a "not found" entry into cache for a valid key.
The plaintext key is shown to the user exactly once at creation time and is never retrievable again.
Revoking a key both deletes the Firestore document and evicts the Redis cache entry, ensuring immediate effect.
Method 2: Firebase Auth + App Check (Web)
Clients: Browser-based web applications only.
Both headers are required. The ID token authenticates the user; the App Check token verifies the request originates from a legitimate web app instance.
ID Token
Verified via Firebase Admin SDK. Contains uid, email, and custom claims.
App Check
reCAPTCHA Enterprise provider. Validates the client is a real browser, not a script or bot.
Verification
App Check token verified first, then ID token. Both must pass.
Method 3: OIDC (Service-to-Service)
Clients: Agent executor, Cloud Tasks, internal services.
OIDC authentication is used exclusively for service-to-service communication. It is not available to end users.
Token
Google-signed OIDC token with audience matching the target service URL.
Verification
Token signature validated against Google's public key set.
Service account allowlist
Only authorized SAs accepted (configured in INTERNAL_ALLOWED_SAS).
User context
user_id passed in the request body to charge credits to the correct user.
Rate limiting
OIDC-authenticated services are exempt from IP rate limits.
The verify_user() function in app/auth/firebase_user.py implements the dual-path logic for API key and Firebase auth. OIDC verification is handled separately by verify_internal_caller().
Rate Limiting
Per-IP Rate Limiting
All non-OIDC requests are subject to per-IP rate limiting. The limit is configurable via RATE_LIMIT_PER_MINUTE (default: 30 requests/minute). Rate limit state is stored in Redis for sub-millisecond enforcement.
Per-User Concurrent Request Limiting
Each user has a maximum number of concurrent in-flight requests. This prevents a single user from monopolizing server capacity.
IP Lockout
Repeated authentication failures trigger an IP-level lockout:
10 failed auth attempts in 5 minutes
15-minute IP block
The lockout applies to all authentication methods. After the lockout period, the IP is automatically unblocked. Lockout state is stored in Redis.
This mechanism protects against:
Brute-force API key guessing
Credential stuffing attacks against Firebase Auth
Automated scanning of the API surface
App Check (reCAPTCHA Enterprise)
Web clients are required to include a Firebase App Check token on every request. App Check uses reCAPTCHA Enterprise as its attestation provider.
App Check prevents:
Automated scripts calling the API with stolen Firebase credentials
API abuse from unauthorized web applications
Token replay from non-browser environments
CORS configuration reinforces this by allowing requests only from the two Firebase Hosting origins (inference-web.web.app and inference-web2.web.app).
Agent Security (8-Layer Defense-in-Depth)
The agent executor implements an 8-layer defense model to prevent prompt injection, credential exfiltration, infrastructure abuse, and runaway costs. All security logic resides in security.py in the agent executor.
For detailed agent-specific security documentation, see Agent Security.
Layer 0 -- GCP Project Isolation
The agent executor runs in inference-agents, a completely separate GCP project from inference-platform. This provides:
Separate IAM policies and service accounts
Independent audit logs (Cloud Audit Logs)
No access to Redis Memorystore, billing data, or provider API keys
Blast radius containment if an agent is tricked into malicious behavior
Layer 0b -- Prompt Security Scanner
Before any LLM call, the executor scans task prompts for patterns indicating abuse:
Cloud infrastructure provisioning (
terraform,gcloud,aws,az)Cryptocurrency mining instructions
Reverse shell commands
Data exfiltration instructions
Attempts to override safety preambles or system instructions
References to cloud metadata endpoints (
169.254.169.254)
Flagged prompts are rejected with status blocked and reason content_moderation. No LLM call is made.
Layer 1 -- URL Whitelist (Default-Deny)
All outbound HTTP requests from agent tools must match an approved domain pattern in platform_integrations. The whitelist uses scheme + host + path prefix matching.
URLs not matching any approved pattern are rejected
The whitelist is loaded from Firestore and refreshed hourly
Agents cannot modify the whitelist via prompt injection
Layer 2 -- Private IP Blocking
After DNS resolution, the resolved IP address is checked against blocked ranges:
10.0.0.0/8
RFC 1918 private (Class A)
172.16.0.0/12
RFC 1918 private (Class B)
192.168.0.0/16
RFC 1918 private (Class C)
169.254.0.0/16
Link-local (includes cloud metadata at 169.254.169.254)
127.0.0.0/8
Loopback
fc00::/7
IPv6 Unique Local Address
fe80::/10
IPv6 Link-Local
This prevents SSRF attacks where an LLM is tricked into making requests to internal infrastructure, cloud metadata endpoints, or other services on the private network.
Layer 3 -- Credential Exfiltration Prevention
Every outbound HTTP request is scanned for credential material in three encodings:
Plaintext -- raw credential strings in headers or body
Base64-encoded -- base64 encoding of credential strings
URL-encoded -- percent-encoded credential strings
If any credential material is detected in outbound traffic, the request is blocked. This prevents prompt injection attacks that attempt to exfiltrate decrypted credentials by encoding them in API call bodies.
Layer 4 -- Rate Limiting
Per-plan enforcement during agent execution:
Max iterations per execution
10
25
Max credits per execution
100
500
Execution timeout
10 min
30 min
API calls per execution
20
50
API calls per minute
10
20
Max response size
500 KB
2 MB
Layer 5 -- Response Sanitization
External API responses are wrapped in <tool_response> XML tags with a safety preamble that instructs the LLM to treat the content as untrusted data. Internal details (service account names, internal IPs, KMS key paths) are stripped from all responses.
Layer 6 -- Content Moderation
Regex-based pattern detection runs on task creation and update. This catches abuse before execution:
Infrastructure provisioning commands
Mining instructions
Credential theft patterns
Safety override attempts
Layer 7 -- Admin Kill Switch
Setting a platform_integration status to deprecated immediately revokes all agent access to that service's domains. The integration registry is refreshed hourly from Firestore and on executor startup.
Credential Encryption (Cloud KMS)
External service credentials (third-party API keys, OAuth tokens) are encrypted at rest using Cloud KMS with a split-trust service account model.
Split Trust
Bridge SA can encrypt but cannot decrypt
Executor SA can decrypt but cannot encrypt
Credentials are never stored in plaintext in Firestore
Decrypted credentials are held in memory only for the duration of the HTTP request
Credential leak scanning (Layer 3) prevents exfiltration even if the LLM is compromised
DNS Pinning
All outbound HTTP requests from agent tools use DNS pinning to prevent TOCTOU (Time-of-Check-Time-of-Use) attacks:
This prevents DNS rebinding attacks where:
A hostname resolves to a whitelisted public IP at validation time
DNS is changed to point to
169.254.169.254(cloud metadata) or127.0.0.1(loopback) before the connection is established
By pinning the IP at resolution time and rewriting the URL, the connection always goes to the exact IP that was validated.
Session Encryption (Fernet)
Conversation sessions stored in Redis are encrypted using Fernet symmetric encryption (from the cryptography library). Fernet provides authenticated encryption: data is both encrypted (AES-128-CBC) and authenticated (HMAC-SHA256).
Algorithm
AES-128-CBC + HMAC-SHA256 (Fernet)
Key management
REDIS_ENCRYPTION_KEY environment variable
Scope
Session metadata and message history
Key format
session:{id}:meta and session:{id}:messages
TTL
Configurable (default 1 hour)
Data flow:
Encryption ensures that even if Redis is compromised (e.g., via a vulnerability in the VPC connector), conversation content remains protected.
Security Headers
All API responses include hardened security headers:
Strict-Transport-Security
max-age=31536000; includeSubDomains
Enforce HTTPS
Content-Security-Policy
Restrictive policy
Prevent XSS and data injection
X-Frame-Options
DENY
Prevent clickjacking
X-Content-Type-Options
nosniff
Prevent MIME sniffing
X-XSS-Protection
1; mode=block
Legacy XSS filter
CORS
CORS is configured to allow requests only from the two Firebase Hosting origins:
https://inference-web.web.apphttps://inference-web.firebaseapp.comhttps://inference-web2.web.apphttps://inference-web2.firebaseapp.com
Preflight requests are cached. API key-authenticated requests bypass CORS since they originate from non-browser clients.
Error Sanitization
All error responses across both services are sanitized to prevent information leakage:
Service account emails
[REDACTED_SA]
Stack traces
Generic error code + correlation ID
Firestore document paths
Opaque error reference
KMS key names and versions
[REDACTED_KEY]
Internal IP addresses
[REDACTED_IP]
Environment variable names
[REDACTED_ENV]
Provider API keys in errors
[REDACTED_CREDENTIAL]
Clients receive a correlation ID with every error response. Support teams use this ID to look up the full, unsanitized error in Cloud Logging.
Data Protection Summary
Firestore
Google-managed
TLS
IAM per project
Redis Memorystore
Fernet (application-level)
In-transit encryption
VPC connector (private)
Cloud KMS
Customer-managed keys
TLS
Split SA (encrypt/decrypt)
BigQuery
Google-managed
TLS
IAM + service account
Artifact Registry
Google-managed
TLS
IAM per project
Compliance Summary
Authentication
3 methods: API key (SHA-256), Firebase App Check (reCAPTCHA Enterprise), OIDC
Authorization
Firestore role-based (user, admin)
Transport encryption
HTTPS enforced (HSTS)
Data at rest encryption
Google-managed (Firestore, BigQuery) + Fernet (Redis) + KMS (credentials)
Credential storage
SHA-256 hashed API keys, KMS-encrypted service credentials
Rate limiting
Per-IP with progressive lockout (10 failures / 5 min = 15 min block)
Input validation
Prompt scanning, URL whitelist, DNS pinning, private IP blocking
Output sanitization
Credential stripping, stack trace removal, IP redaction
Audit logging
Cloud Logging (structured JSON) + BigQuery (analytics)
Project isolation
Separate GCP projects for API and executor
Supply chain
--require-hashes for all dependencies
Related
Architecture Overview -- system design and service topology
Infrastructure -- GCP services and deployment pipeline
Authentication -- client-facing auth documentation
Agent Security -- detailed agent security documentation
Orchestration -- orchestration-specific security context
Last updated
