Infrastructure
Detailed breakdown of GCP services, data stores, and deployment pipeline for the Majestix AI Inference Hub.
GCP Project Layout
The platform spans two GCP projects with strict isolation between them.
inference-platform: main API, billing, user data, analytics
  Services: Cloud Run (inference-api), Redis Memorystore, Firestore, BigQuery, Pub/Sub, Firebase Hosting, Artifact Registry
inference-agents: agent execution, orchestration, credential management
  Services: Cloud Run (agent-executor), Firestore, Cloud KMS, Cloud Tasks, Cloud Scheduler, Cloud Trace, Artifact Registry
The two projects communicate via OIDC-authenticated HTTPS calls over the public internet. There is no VPC peering between them.
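On Cloud Run, the caller typically obtains an OIDC identity token from the instance metadata server and presents it as a Bearer credential on the cross-project call. A minimal sketch, assuming a hypothetical executor URL (the real service URL and path come from configuration):

```python
import urllib.parse
import urllib.request

# Cloud Run exposes the instance metadata server, which mints OIDC identity
# tokens for the service's own service account.
METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/service-accounts/default/identity")

def identity_token_request(audience: str) -> urllib.request.Request:
    """Build the metadata-server request that mints an OIDC token whose
    audience is the receiving Cloud Run service's URL."""
    query = urllib.parse.urlencode({"audience": audience})
    return urllib.request.Request(
        f"{METADATA_URL}?{query}",
        headers={"Metadata-Flavor": "Google"},  # required by the metadata server
    )

def authed_call(url: str, token: str) -> urllib.request.Request:
    """Attach the OIDC token as a Bearer credential on the cross-project call."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
```

The receiving service validates the token's signature and audience before accepting the request, which is what makes VPC peering unnecessary here.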
Cloud Run Configuration
inference-api
GCP Project: inference-platform
Region: us-central1
Image source: Artifact Registry
Runtime: Python 3.13, FastAPI, Uvicorn
CPU: 2 vCPU
Memory: 1 GiB
Min instances: 1 (avoid cold starts for user-facing traffic)
Max instances: 10
Concurrency: 80 requests per instance
Timeout: 300s
Ingress: all traffic (public API)
VPC connector: Serverless VPC Access (for Redis Memorystore)
Service account:
agent-executor
GCP Project: inference-agents
Region: us-central1
Image source: Artifact Registry
Runtime: Python 3.13, FastAPI, Uvicorn
CPU: 4 vCPU
Memory: 2 GiB
Min instances: 0 (scale to zero -- agent tasks are bursty)
Max instances: 5
Concurrency: 10 requests per instance
Timeout: 900s (15 min, accommodates long swarm pipelines)
Ingress: Internal + Cloud Load Balancing (not publicly accessible)
Service account:
Redis Memorystore
GCP Project: inference-platform
Region: us-central1
Address: 10.124.186.203:6379
Access: Serverless VPC Access connector (not public)
Redis serves five roles across the main API:
1. Session Storage
Conversation history is stored in Redis with Fernet symmetric encryption. Each session uses two keys.
TTL: configurable, default 1 hour (SESSION_TTL). Maximum history: 10 messages (MAX_HISTORY).
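The encrypt-append-trim cycle can be sketched as follows. This is illustrative, not the real schema: the key name below is hypothetical, a plain dict stands in for Redis, and in production the Fernet key comes from configuration rather than being generated in-process:

```python
import json
from cryptography.fernet import Fernet  # symmetric encryption, as used for sessions

MAX_HISTORY = 10      # messages kept per session (MAX_HISTORY)
SESSION_TTL = 3600    # seconds, default 1 hour (SESSION_TTL)

fernet = Fernet(Fernet.generate_key())  # production loads the key from config

def append_message(store: dict, session_id: str, message: dict) -> None:
    """Decrypt the stored history, append, trim to MAX_HISTORY, re-encrypt.
    `store` stands in for Redis; the key name is illustrative."""
    key = f"session:{session_id}:history"   # hypothetical key name
    history = []
    if key in store:
        history = json.loads(fernet.decrypt(store[key]))
    history.append(message)
    history = history[-MAX_HISTORY:]        # keep only the newest 10 messages
    store[key] = fernet.encrypt(json.dumps(history).encode())
    # a real client would set the TTL here: redis.set(key, blob, ex=SESSION_TTL)
```

Because the blob is re-encrypted on every write, nothing in Redis is readable without the Fernet key, and the TTL bounds how long even ciphertext persists.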
2. API Key Verification Cache
API key lookups are cached to reduce Firestore reads.
TTL: 15 minutes. Cache invalidated immediately on key revocation. Failed lookups (key not found) are not cached to prevent cache poisoning attacks.
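The positive-only caching policy can be sketched like this; a dict stands in for Redis and `lookup_in_firestore` is a stub for the real read, so the structure (not the exact code) is what matters:

```python
import time

API_KEY_CACHE_TTL = 15 * 60  # seconds

class ApiKeyCache:
    """Positive-only cache in front of Firestore key lookups."""

    def __init__(self, lookup_in_firestore):
        self._store = {}              # api_key -> (record, expires_at); stands in for Redis
        self._lookup = lookup_in_firestore

    def verify(self, api_key: str):
        entry = self._store.get(api_key)
        if entry and entry[1] > time.monotonic():
            return entry[0]           # cache hit, no Firestore read
        record = self._lookup(api_key)
        if record is not None:        # failed lookups are NOT cached
            self._store[api_key] = (record, time.monotonic() + API_KEY_CACHE_TTL)
        return record

    def revoke(self, api_key: str) -> None:
        """Invalidate immediately on key revocation."""
        self._store.pop(api_key, None)
```

Caching only successful lookups means an attacker spraying invalid keys can never plant entries that shadow a later legitimate key.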
3. Credit Counters
Monthly plan credits and top-up balances are tracked in Redis for low-latency reservation.
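A sketch of the reservation logic follows. In Redis the check-and-decrement must run atomically (e.g. as a Lua script); here a dict stands in for the counters, the key names are illustrative, and the draw order (plan balance before top-up balance) is an assumption:

```python
def reserve_credits(balances: dict, uid: str, amount: int) -> bool:
    """Reserve `amount` credits, drawing from the monthly plan balance first
    and then the top-up balance (assumed order). Returns False without
    mutating anything when the combined balance is insufficient."""
    plan = balances.get(f"credits:plan:{uid}", 0)     # hypothetical key names
    topup = balances.get(f"credits:topup:{uid}", 0)
    if plan + topup < amount:
        return False                                  # insufficient -> reject request
    from_plan = min(plan, amount)
    balances[f"credits:plan:{uid}"] = plan - from_plan
    balances[f"credits:topup:{uid}"] = topup - (amount - from_plan)
    return True
```

Keeping the counters in Redis lets the hot path reserve credits without a Firestore round trip per request.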
4. Rate Limiting
Per-IP and per-user rate limit state.
5. Usage and Plan Cache
Short-lived caches to reduce Firestore reads on hot paths.
Failure Policy
The REDIS_FAILURE_POLICY setting controls behavior when Redis is unreachable:
reject (default, production)
Return 503 on all requests. Safe -- prevents billing bypass.
allow (dev/testing only)
Pass through with no credit enforcement.
Firestore Collections
Main API Project (inference-platform)
Agent Project (inference-agents)
Firestore Security Rules
agent_tasks: Owner CRUD with field validation (hasOnly() allowlist). Server-managed fields (last_run_at, total_credits_used, etc.) are blocked from client writes.
agent_tasks/*/credentials/*: Owner read-only (masked display). Writes via Admin SDK only (Cloud Functions).
agent_executions: Owner read-only. Writes via executor Admin SDK only.
platform_integrations: Any authenticated user can read. Admin-only writes.
Cloud KMS
GCP Project: inference-agents
Region: us-central1
Key Ring: agent-keys
Crypto Key: credentials
Key Path: projects/inference-agents/locations/us-central1/keyRings/agent-keys/cryptoKeys/credentials
Split Trust Model
Encrypt: Bridge SA (agent-bridge@inference-platform), cloudkms.cryptoKeyVersions.useToEncrypt only
Decrypt: Executor SA (agent-executor@inference-agents), cloudkms.cryptoKeyVersions.useToDecrypt only
Neither service account can perform both operations. Compromising one project does not grant full credential access.
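Both service accounts address the same key by its full resource name. A small helper makes the path explicit; the commented calls show the shape of the google-cloud-kms client requests each side is permitted to make:

```python
def kms_key_path(project: str, location: str, key_ring: str, crypto_key: str) -> str:
    """Full resource name of the credential-wrapping key, as passed in the
    `name` field of KMS encrypt/decrypt requests."""
    return (f"projects/{project}/locations/{location}"
            f"/keyRings/{key_ring}/cryptoKeys/{crypto_key}")

# Bridge SA (encrypt only):
#   client.encrypt(request={"name": key_path, "plaintext": secret_bytes})
# Executor SA (decrypt only):
#   client.decrypt(request={"name": key_path, "ciphertext": wrapped_blob})
```

Since IAM grants are per-operation on the key, neither side can impersonate the other's half of the flow even with the full key path in hand.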
Cloud Tasks and Cloud Scheduler
Cloud Tasks
GCP Project: inference-agents
Region: us-central1
Queue: agent-tasks
Auth: OIDC token (scoped to executor service URL)
Target: POST /internal/agent/execute on agent-executor
Cloud Tasks dispatches agent execution requests with OIDC authentication. Each task message contains the task_id and trigger type. The executor loads the full task definition from Firestore.
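The shape of such a task can be sketched as a plain dict of the kind google-cloud-tasks' create_task accepts; the executor URL and service-account email below are placeholders, not the real values:

```python
import json

def build_execute_task(executor_url: str, sa_email: str,
                       task_id: str, trigger: str) -> dict:
    """Build the HTTP task that dispatches one agent run."""
    return {
        "http_request": {
            "http_method": "POST",
            "url": f"{executor_url}/internal/agent/execute",
            "headers": {"Content-Type": "application/json"},
            # Cloud Tasks mints an OIDC token for this SA at dispatch time,
            # scoped to the target service URL
            "oidc_token": {"service_account_email": sa_email,
                           "audience": executor_url},
            # the message carries only identifiers; the executor loads the
            # full task definition from Firestore
            "body": json.dumps({"task_id": task_id, "trigger": trigger}).encode(),
        }
    }
```

Keeping the payload to identifiers means a captured queue message leaks nothing about the task's credentials or configuration.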
Cloud Scheduler
Cloud Scheduler creates cron-based triggers for scheduled agent tasks. Each active agent task with a cron schedule has a corresponding Scheduler job that enqueues a Cloud Tasks message at the specified interval.
BigQuery
GCP Project: inference-platform
Region: us-central1
Dataset: inference_analytics
Tables: usage_events, audit_events
Analytics Pipeline
Usage and audit events flow through Pub/Sub to BigQuery.
Usage events contain: model ID, provider, token counts (input/output), credit charges, latency, source (web/api_key/agent), source_id, and user ID (opaque UID). No PII or message content is stored.
Audit events record security-relevant actions: auth failures, rate limit hits, key revocations.
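A usage event published to Pub/Sub might be serialized as below. The field names are illustrative (the documented categories are what matters), and the bytes would go to a Pub/Sub publisher's publish call:

```python
import json
import time

def usage_event(uid: str, model_id: str, provider: str, input_tokens: int,
                output_tokens: int, credits: int, latency_ms: int,
                source: str, source_id: str) -> bytes:
    """Serialize one usage event for Pub/Sub. Only the documented metadata
    categories are included -- no PII and no message content."""
    assert source in ("web", "api_key", "agent")
    return json.dumps({
        "ts": time.time(),
        "uid": uid,                      # opaque Firebase UID, not PII
        "model_id": model_id,
        "provider": provider,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "credits": credits,
        "latency_ms": latency_ms,
        "source": source,
        "source_id": source_id,
    }).encode()
```

Because the schema is metadata-only by construction, the BigQuery dataset can be queried for billing and latency analytics without touching anything sensitive.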
Firebase Hosting
Two Firebase Hosting sites serve the web application, both deployed from the inference-platform project:
Site 1: https://inference-web.web.app (green, #22c55e)
Site 2: https://inference-web2.web.app (indigo, #6366f1)
Both sites share identical React 19 + Vite 7 + Tailwind CSS 4 component logic. Only CSS colors differ. Firebase Hosting serves the SPA with CDN-backed global distribution.
Cloud Functions (Firebase Hosting Rewrites)
Three Cloud Functions handle credential-sensitive operations that cannot be performed client-side:
POST /api/agent/encrypt-credential: KMS-encrypts an API key and stores it in the credentials subcollection
POST /api/agent/delete-credential: deletes a credential from the subcollection
POST /api/agent/trigger-execution: triggers the agent executor via OIDC
All three share a common security gate: CORS origin validation, payload size check, App Check verification, rate limiting, and Firebase Auth verification.
Stripe Integration
Integration point: app/billing/stripe_service.py
Webhook endpoint: POST /billing/webhook
Webhook events handled: 11 event types
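Every event hitting the webhook endpoint must be authenticated before it is trusted. Stripe's documented v1 scheme is an HMAC-SHA256 over "{timestamp}.{payload}" keyed by the endpoint secret; stripe-python's Webhook.construct_event performs the equivalent check internally. A stdlib-only sketch:

```python
import hashlib
import hmac

def verify_stripe_signature(payload: bytes, sig_header: str, secret: str,
                            tolerance_s: int, now: int) -> bool:
    """Recompute Stripe's v1 webhook signature and compare in constant time.
    `sig_header` is the Stripe-Signature header, e.g. "t=1700000000,v1=abc..."."""
    parts = dict(p.split("=", 1) for p in sig_header.split(","))
    timestamp = int(parts["t"])
    if abs(now - timestamp) > tolerance_s:      # reject replayed events
        return False
    signed = f"{timestamp}.".encode() + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, parts["v1"])
```

Note this simplified parser keeps only the last v1 entry if several are present; the real library checks all of them during secret rotation.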
Subscription Plans
Free: $0/mo, 500 credits
Guru: $10/mo, 10,000 credits (STRIPE_GURU_PRICE_ID)
Pro: $50/mo, 55,000 credits (STRIPE_PRO_PRICE_ID)
Top-Up Packages
Small: $5, 5,000 credits (STRIPE_TOPUP_5_PRICE_ID)
Medium: $25, 27,500 credits, 10% bonus (STRIPE_TOPUP_25_PRICE_ID)
Large: $100, 125,000 credits, 25% bonus (STRIPE_TOPUP_100_PRICE_ID)
Payment Methods
Stripe Checkout supports: credit/debit cards, Stripe Link, Google Pay, and Apple Pay.
Auto Top-Up
Users can configure automatic top-up when their balance falls below a threshold. Safety controls:
Maximum 3 auto top-ups per day
4-hour cooldown after a failed charge
Deduplication via Redis locks (auto_topup:{lock|daily|cooldown}:{uid})
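The three guards can be sketched against the documented key layout. A dict stands in for Redis (mapping lock/cooldown keys to expiry timestamps and the daily key to a count), the 60-second lock duration is illustrative, and the exact ordering of checks is an assumption:

```python
MAX_DAILY_TOPUPS = 3
FAILED_CHARGE_COOLDOWN_S = 4 * 3600

def may_auto_topup(store: dict, uid: str, now: float) -> bool:
    """Apply cooldown, daily cap, and dedup lock before charging."""
    if store.get(f"auto_topup:cooldown:{uid}", 0) > now:
        return False                               # recent failed charge
    if store.get(f"auto_topup:daily:{uid}", 0) >= MAX_DAILY_TOPUPS:
        return False                               # daily cap reached
    lock_key = f"auto_topup:lock:{uid}"
    if store.get(lock_key, 0) > now:
        return False                               # another attempt in flight (SET NX in Redis)
    store[lock_key] = now + 60                     # short dedup lock (illustrative duration)
    store[f"auto_topup:daily:{uid}"] = store.get(f"auto_topup:daily:{uid}", 0) + 1
    return True

def record_failed_charge(store: dict, uid: str, now: float) -> None:
    """Start the 4-hour cooldown after a declined charge."""
    store[f"auto_topup:cooldown:{uid}"] = now + FAILED_CHARGE_COOLDOWN_S
```

In Redis the dedup lock would be an atomic SET NX EX, so two concurrent low-balance requests cannot both trigger a charge.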
Deployment Pipeline
All services follow the same CI/CD pipeline:
Platform: linux/amd64 (Cloud Run requirement)
Dependency hashes: all pip/npm dependencies installed with --require-hashes for supply-chain integrity
Multi-stage builds: production images exclude dev dependencies and build tools
Non-root: containers run as a non-root user
Health checks: Cloud Run verifies /health before routing traffic
Git Repositories
Backend API: LendefiMarkets/inference-harness (branch dev, directory app/)
Agent Executor: LendefiMarkets/inference-agent (branch master, directory inference-agent/)
Web Frontend: LendefiMarkets/inference-web (branch dev, directory inference-web/)
VSCode Extension: LendefiMarkets/inference-vscode (branch master, directory ../inference-vscode/)
Note: inference-gcp/ itself is not a Git repository. Each subdirectory manages its own remotes.
Observability
Cloud Trace (OpenTelemetry)
Both services export OpenTelemetry spans to Cloud Trace. The agent executor uses a 4-level span hierarchy.
Distributed tracing crosses the OIDC service boundary, enabling end-to-end latency analysis for orchestration requests.
Cloud Logging
Structured JSON logs with:
Request ID correlation across services
User ID for billing audit
Model ID and provider
Token counts and credit charges
Sanitized error details (no credentials or stack traces)
Related
Architecture Overview -- system design and architectural decisions
Security -- platform-wide security model
API Reference -- endpoint documentation