Security

The Majestix AI Inference Hub implements defense-in-depth security across both GCP projects, covering API authentication, agent execution isolation, credential management, and data protection.


Authentication Layers

The platform supports three authentication methods, selected automatically based on the request context.

Method 1: API Key

Clients: VSCode extension, CLI tools, SDK integrations, any programmatic access.

X-Api-Key: inf_<url_safe_base64>

API key authentication is attempted first when the X-Api-Key header is present.

Property
Detail

Format

inf_ prefix + URL-safe base64 random bytes

Storage

SHA-256 hash only. Raw key never stored.

Verification

Hash key, check Redis cache (15-min TTL), fallback to Firestore on cache miss.

Revocation

Delete from Firestore + invalidate Redis cache entry immediately.

Expiry

90 days by default (configurable).

Rate limiting

Per-user, not per-key.

Security properties:

  • Keys are SHA-256 hashed before storage -- a database breach does not expose usable keys.

  • Failed lookups (key not found) are not cached to prevent cache poisoning, where an attacker could force a "not found" entry into cache for a valid key.

  • The plaintext key is shown to the user exactly once at creation time and is never retrievable again.

  • Revoking a key both deletes the Firestore document and evicts the Redis cache entry, ensuring immediate effect.

Method 2: Firebase Auth + App Check (Web)

Clients: Browser-based web applications only.

Both headers are required. The ID token authenticates the user; the App Check token verifies the request originates from a legitimate web app instance.

Property
Detail

ID Token

Verified via Firebase Admin SDK. Contains uid, email, and custom claims.

App Check

reCAPTCHA Enterprise provider. Validates the client is a real browser, not a script or bot.

Verification

App Check token verified first, then ID token. Both must pass.

Method 3: OIDC (Service-to-Service)

Clients: Agent executor, Cloud Tasks, internal services.

OIDC authentication is used exclusively for service-to-service communication. It is not available to end users.

Property
Detail

Token

Google-signed OIDC token with audience matching the target service URL.

Verification

Token signature validated against Google's public key set.

Service account allowlist

Only authorized SAs accepted (configured in INTERNAL_ALLOWED_SAS).

User context

user_id passed in the request body to charge credits to the correct user.

Rate limiting

OIDC-authenticated services are exempt from IP rate limits.

The verify_user() function in app/auth/firebase_user.py implements the dual-path logic for API key and Firebase auth. OIDC verification is handled separately by verify_internal_caller().


Rate Limiting

Per-IP Rate Limiting

All non-OIDC requests are subject to per-IP rate limiting. The limit is configurable via RATE_LIMIT_PER_MINUTE (default: 30 requests/minute). Rate limit state is stored in Redis for sub-millisecond enforcement.

Per-User Concurrent Request Limiting

Each user has a maximum number of concurrent in-flight requests. This prevents a single user from monopolizing server capacity.

IP Lockout

Repeated authentication failures trigger an IP-level lockout:

Threshold
Lockout

10 failed auth attempts in 5 minutes

15-minute IP block

The lockout applies to all authentication methods. After the lockout period, the IP is automatically unblocked. Lockout state is stored in Redis.

This mechanism protects against:

  • Brute-force API key guessing

  • Credential stuffing attacks against Firebase Auth

  • Automated scanning of the API surface


App Check (reCAPTCHA Enterprise)

Web clients are required to include a Firebase App Check token on every request. App Check uses reCAPTCHA Enterprise as its attestation provider.

App Check prevents:

  • Automated scripts calling the API with stolen Firebase credentials

  • API abuse from unauthorized web applications

  • Token replay from non-browser environments

CORS configuration reinforces this by allowing requests only from the two Firebase Hosting origins (inference-web.web.app and inference-web2.web.app).


Agent Security (8-Layer Defense-in-Depth)

The agent executor implements an 8-layer defense model to prevent prompt injection, credential exfiltration, infrastructure abuse, and runaway costs. All security logic resides in security.py in the agent executor.

For detailed agent-specific security documentation, see Agent Security.

Layer 0 -- GCP Project Isolation

The agent executor runs in inference-agents, a completely separate GCP project from inference-platform. This provides:

  • Separate IAM policies and service accounts

  • Independent audit logs (Cloud Audit Logs)

  • No access to Redis Memorystore, billing data, or provider API keys

  • Blast radius containment if an agent is tricked into malicious behavior

Layer 0b -- Prompt Security Scanner

Before any LLM call, the executor scans task prompts for patterns indicating abuse:

  • Cloud infrastructure provisioning (terraform, gcloud, aws, az)

  • Cryptocurrency mining instructions

  • Reverse shell commands

  • Data exfiltration instructions

  • Attempts to override safety preambles or system instructions

  • References to cloud metadata endpoints (169.254.169.254)

Flagged prompts are rejected with status blocked and reason content_moderation. No LLM call is made.

Layer 1 -- URL Whitelist (Default-Deny)

All outbound HTTP requests from agent tools must match an approved domain pattern in platform_integrations. The whitelist uses scheme + host + path prefix matching.

  • URLs not matching any approved pattern are rejected

  • The whitelist is loaded from Firestore and refreshed hourly

  • Agents cannot modify the whitelist via prompt injection

Layer 2 -- Private IP Blocking

After DNS resolution, the resolved IP address is checked against blocked ranges:

Range
Description

10.0.0.0/8

RFC 1918 private (Class A)

172.16.0.0/12

RFC 1918 private (Class B)

192.168.0.0/16

RFC 1918 private (Class C)

169.254.0.0/16

Link-local (includes cloud metadata at 169.254.169.254)

127.0.0.0/8

Loopback

fc00::/7

IPv6 Unique Local Address

fe80::/10

IPv6 Link-Local

This prevents SSRF attacks where an LLM is tricked into making requests to internal infrastructure, cloud metadata endpoints, or other services on the private network.

Layer 3 -- Credential Exfiltration Prevention

Every outbound HTTP request is scanned for credential material in three encodings:

  1. Plaintext -- raw credential strings in headers or body

  2. Base64-encoded -- base64 encoding of credential strings

  3. URL-encoded -- percent-encoded credential strings

If any credential material is detected in outbound traffic, the request is blocked. This prevents prompt injection attacks that attempt to exfiltrate decrypted credentials by encoding them in API call bodies.

Layer 4 -- Rate Limiting

Per-plan enforcement during agent execution:

Limit
Guru
Pro

Max iterations per execution

10

25

Max credits per execution

100

500

Execution timeout

10 min

30 min

API calls per execution

20

50

API calls per minute

10

20

Max response size

500 KB

2 MB

Layer 5 -- Response Sanitization

External API responses are wrapped in <tool_response> XML tags with a safety preamble that instructs the LLM to treat the content as untrusted data. Internal details (service account names, internal IPs, KMS key paths) are stripped from all responses.

Layer 6 -- Content Moderation

Regex-based pattern detection runs on task creation and update. This catches abuse before execution:

  • Infrastructure provisioning commands

  • Mining instructions

  • Credential theft patterns

  • Safety override attempts

Layer 7 -- Admin Kill Switch

Setting a platform_integration status to deprecated immediately revokes all agent access to that service's domains. The integration registry is refreshed hourly from Firestore and on executor startup.


Credential Encryption (Cloud KMS)

External service credentials (third-party API keys, OAuth tokens) are encrypted at rest using Cloud KMS with a split-trust service account model.

Split Trust

Role
Service Account
Permission
Project

Bridge (encrypt)

cloudkms.cryptoKeyVersions.useToEncrypt

inference-platform

Executor (decrypt)

cloudkms.cryptoKeyVersions.useToDecrypt

inference-agents

  • Bridge SA can encrypt but cannot decrypt

  • Executor SA can decrypt but cannot encrypt

  • Credentials are never stored in plaintext in Firestore

  • Decrypted credentials are held in memory only for the duration of the HTTP request

  • Credential leak scanning (Layer 3) prevents exfiltration even if the LLM is compromised


DNS Pinning

All outbound HTTP requests from agent tools use DNS pinning to prevent TOCTOU (Time-of-Check-Time-of-Use) attacks:

This prevents DNS rebinding attacks where:

  1. A hostname resolves to a whitelisted public IP at validation time

  2. DNS is changed to point to 169.254.169.254 (cloud metadata) or 127.0.0.1 (loopback) before the connection is established

By pinning the IP at resolution time and rewriting the URL, the connection always goes to the exact IP that was validated.


Session Encryption (Fernet)

Conversation sessions stored in Redis are encrypted using Fernet symmetric encryption (from the cryptography library). Fernet provides authenticated encryption: data is both encrypted (AES-128-CBC) and authenticated (HMAC-SHA256).

Property
Detail

Algorithm

AES-128-CBC + HMAC-SHA256 (Fernet)

Key management

REDIS_ENCRYPTION_KEY environment variable

Scope

Session metadata and message history

Key format

session:{id}:meta and session:{id}:messages

TTL

Configurable (default 1 hour)

Data flow:

Encryption ensures that even if Redis is compromised (e.g., via a vulnerability in the VPC connector), conversation content remains protected.


Security Headers

All API responses include hardened security headers:

Header
Value
Purpose

Strict-Transport-Security

max-age=31536000; includeSubDomains

Enforce HTTPS

Content-Security-Policy

Restrictive policy

Prevent XSS and data injection

X-Frame-Options

DENY

Prevent clickjacking

X-Content-Type-Options

nosniff

Prevent MIME sniffing

X-XSS-Protection

1; mode=block

Legacy XSS filter


CORS

CORS is configured to allow requests only from the two Firebase Hosting origins:

  • https://inference-web.web.app

  • https://inference-web.firebaseapp.com

  • https://inference-web2.web.app

  • https://inference-web2.firebaseapp.com

Preflight requests are cached. API key-authenticated requests bypass CORS since they originate from non-browser clients.


Error Sanitization

All error responses across both services are sanitized to prevent information leakage:

Sanitized Element
Replacement

Service account emails

[REDACTED_SA]

Stack traces

Generic error code + correlation ID

Firestore document paths

Opaque error reference

KMS key names and versions

[REDACTED_KEY]

Internal IP addresses

[REDACTED_IP]

Environment variable names

[REDACTED_ENV]

Provider API keys in errors

[REDACTED_CREDENTIAL]

Clients receive a correlation ID with every error response. Support teams use this ID to look up the full, unsanitized error in Cloud Logging.


Data Protection Summary

Data Store
Encryption at Rest
Encryption in Transit
Access Control

Firestore

Google-managed

TLS

IAM per project

Redis Memorystore

Fernet (application-level)

In-transit encryption

VPC connector (private)

Cloud KMS

Customer-managed keys

TLS

Split SA (encrypt/decrypt)

BigQuery

Google-managed

TLS

IAM + service account

Artifact Registry

Google-managed

TLS

IAM per project


Compliance Summary

Control
Implementation

Authentication

3 methods: API key (SHA-256), Firebase App Check (reCAPTCHA Enterprise), OIDC

Authorization

Firestore role-based (user, admin)

Transport encryption

HTTPS enforced (HSTS)

Data at rest encryption

Google-managed (Firestore, BigQuery) + Fernet (Redis) + KMS (credentials)

Credential storage

SHA-256 hashed API keys, KMS-encrypted service credentials

Rate limiting

Per-IP with progressive lockout (10 failures / 5 min = 15 min block)

Input validation

Prompt scanning, URL whitelist, DNS pinning, private IP blocking

Output sanitization

Credential stripping, stack trace removal, IP redaction

Audit logging

Cloud Logging (structured JSON) + BigQuery (analytics)

Project isolation

Separate GCP projects for API and executor

Supply chain

--require-hashes for all dependencies


Last updated