Infrastructure

Detailed breakdown of GCP services, data stores, and deployment pipeline for the Majestix AI Inference Hub.


GCP Project Layout

The platform spans two GCP projects with strict isolation between them.

| Project | Purpose | Services |
|---|---|---|
| inference-platform | Main API, billing, user data, analytics | Cloud Run (inference-api), Redis Memorystore, Firestore, BigQuery, Pub/Sub, Firebase Hosting, Artifact Registry |
| inference-agents | Agent execution, orchestration, credential management | Cloud Run (agent-executor), Firestore, Cloud KMS, Cloud Tasks, Cloud Scheduler, Cloud Trace, Artifact Registry |

The two projects communicate via OIDC-authenticated HTTPS calls over the public internet. There is no VPC peering between them.
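A minimal sketch of what such a cross-project call could look like, using google-auth's `fetch_id_token` to mint an OIDC ID token for the target Cloud Run URL (the URL, path, and payload here are placeholders, not the platform's actual values):

```python
from typing import Dict


def build_auth_header(id_token_value: str) -> Dict[str, str]:
    """Bearer header carried on every cross-project call."""
    return {"Authorization": f"Bearer {id_token_value}"}


def call_agent_service(audience_url: str, path: str, payload: dict) -> dict:
    """Mint an OIDC ID token scoped to the target service URL and POST to it.

    Requires Application Default Credentials with permission to invoke the
    target Cloud Run service. Imports are lazy so the pure helper above
    works without the SDKs installed.
    """
    import requests
    import google.auth.transport.requests
    from google.oauth2 import id_token

    auth_request = google.auth.transport.requests.Request()
    # For Cloud Run, the OIDC audience is the receiving service's URL.
    token = id_token.fetch_id_token(auth_request, audience_url)
    resp = requests.post(
        audience_url + path,
        json=payload,
        headers=build_auth_header(token),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

The receiving service validates the token's audience and issuer, which is what makes the public-internet hop safe without VPC peering.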


Cloud Run Configuration

inference-api

| Setting | Value |
|---|---|
| GCP Project | inference-platform |
| Region | us-central1 |
| Image source | Artifact Registry |
| Runtime | Python 3.13, FastAPI, Uvicorn |
| CPU | 2 vCPU |
| Memory | 1 GiB |
| Min instances | 1 (avoid cold starts for user-facing traffic) |
| Max instances | 10 |
| Concurrency | 80 requests per instance |
| Timeout | 300s |
| Ingress | All traffic (public API) |
| VPC connector | Serverless VPC Access (for Redis Memorystore) |
| Service account | |

agent-executor

| Setting | Value |
|---|---|
| GCP Project | inference-agents |
| Region | us-central1 |
| Image source | Artifact Registry |
| Runtime | Python 3.13, FastAPI, Uvicorn |
| CPU | 4 vCPU |
| Memory | 2 GiB |
| Min instances | 0 (scale to zero -- agent tasks are bursty) |
| Max instances | 5 |
| Concurrency | 10 requests per instance |
| Timeout | 900s (15 min, accommodates long swarm pipelines) |
| Ingress | Internal + Cloud Load Balancing (not publicly accessible) |
| Service account | |


Redis Memorystore

| Property | Value |
|---|---|
| GCP Project | inference-platform |
| Region | us-central1 |
| Address | 10.124.186.203:6379 |
| Access | Serverless VPC Access connector (not public) |

Redis serves five roles across the main API:

1. Session Storage

Conversation history is stored in Redis with Fernet symmetric encryption. Each session uses two keys:

TTL: configurable, default 1 hour (SESSION_TTL). Maximum history: 10 messages (MAX_HISTORY).
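A sketch of how this storage pattern fits together, with Fernet encryption before the write and the TTL applied via SETEX. The Redis key name is illustrative; the `redis_client` and `fernet` objects are injected (in practice a `redis.Redis` and a `cryptography.fernet.Fernet` instance):

```python
import json
from typing import List

SESSION_TTL = 3600   # default 1 hour (SESSION_TTL)
MAX_HISTORY = 10     # MAX_HISTORY cap on stored messages


def trim_history(history: List[dict]) -> List[dict]:
    """Keep only the most recent MAX_HISTORY messages."""
    return history[-MAX_HISTORY:]


def save_session(redis_client, fernet, session_id: str, history: List[dict]) -> None:
    """Encrypt the trimmed history and store it with a TTL.

    Key name is illustrative, not the platform's actual schema.
    """
    blob = fernet.encrypt(json.dumps(trim_history(history)).encode())
    redis_client.setex(f"session:{session_id}", SESSION_TTL, blob)


def load_session(redis_client, fernet, session_id: str) -> List[dict]:
    """Decrypt and return the stored history, or [] if expired/missing."""
    blob = redis_client.get(f"session:{session_id}")
    if blob is None:
        return []
    return json.loads(fernet.decrypt(blob))
```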

2. API Key Verification Cache

API key lookups are cached to reduce Firestore reads:

TTL: 15 minutes. Cache invalidated immediately on key revocation. Failed lookups (key not found) are not cached to prevent cache poisoning attacks.
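One way the cache-aside flow described above could be written; the Redis key name and the injected `firestore_lookup` callable are illustrative, not the platform's actual schema:

```python
import json

API_KEY_CACHE_TTL = 15 * 60  # 15 minutes


def should_cache(record) -> bool:
    """Only successful lookups are cached. A miss is never cached, so an
    attacker cannot pin a 'not found' result for a key that later becomes
    valid (the cache-poisoning concern noted above)."""
    return record is not None


def cache_key(key_hash: str) -> str:
    return f"apikey:{key_hash}"  # illustrative key name


def verify_api_key(redis_client, firestore_lookup, key_hash: str):
    """Redis-first verification with a Firestore fallback (sketch)."""
    cached = redis_client.get(cache_key(key_hash))
    if cached is not None:
        return json.loads(cached)
    record = firestore_lookup(key_hash)  # injected Firestore read
    if should_cache(record):
        redis_client.setex(cache_key(key_hash), API_KEY_CACHE_TTL, json.dumps(record))
    return record


def revoke_api_key(redis_client, key_hash: str) -> None:
    """Invalidate the cache entry immediately on revocation."""
    redis_client.delete(cache_key(key_hash))
```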

3. Credit Counters

Monthly plan credits and top-up balances are tracked in Redis for low-latency reservation:
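One common way to make such a reservation atomic is a small Lua script run via EVAL, so two concurrent requests cannot both spend the last credits. This is a sketch under that assumption; the key name is illustrative:

```python
# Check-and-decrement in one atomic step on the Redis server.
RESERVE_LUA = """
local balance = tonumber(redis.call('GET', KEYS[1]) or '0')
local cost = tonumber(ARGV[1])
if balance < cost then return -1 end
return redis.call('DECRBY', KEYS[1], cost)
"""


def try_reserve(balance: int, cost: int):
    """Pure model of the script: new balance, or None if insufficient."""
    if balance < cost:
        return None
    return balance - cost


def reserve_credits(redis_client, uid: str, cost: int):
    """Run the script against the user's plan-credit counter (sketch)."""
    result = redis_client.eval(RESERVE_LUA, 1, f"credits:plan:{uid}", cost)
    return None if result == -1 else result
```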

4. Rate Limiting

Per-IP and per-user rate limit state:
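A minimal fixed-window limiter is one way to keep this state in Redis: INCR a per-window counter and set its expiry on first use. The key layout here is illustrative, not the platform's actual schema:

```python
def window_key(scope: str, ident: str, now_s: int, window_s: int = 60) -> str:
    """One counter per (scope, identity, time window).

    scope is e.g. 'ip' or 'user'; ident is the IP address or UID.
    """
    return f"ratelimit:{scope}:{ident}:{now_s // window_s}"


def check_rate_limit(redis_client, scope: str, ident: str,
                     now_s: int, limit: int, window_s: int = 60) -> bool:
    """Fixed-window limiter sketch: returns True if the request is allowed."""
    key = window_key(scope, ident, now_s, window_s)
    count = redis_client.incr(key)
    if count == 1:
        # First hit in this window: let the counter expire with the window.
        redis_client.expire(key, window_s)
    return count <= limit
```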

5. Usage and Plan Cache

Short-lived caches to reduce Firestore reads on hot paths:

Failure Policy

The REDIS_FAILURE_POLICY setting controls behavior when Redis is unreachable:

| Policy | Behavior |
|---|---|
| reject (default, production) | Return 503 on all requests. Safe -- prevents billing bypass. |
| allow (dev/testing only) | Pass through with no credit enforcement. |
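The fail-closed vs. fail-open choice can be captured in a few lines; this is a sketch of the decision, not the actual middleware:

```python
from dataclasses import dataclass

REDIS_FAILURE_POLICY = "reject"  # production default


@dataclass
class PolicyDecision:
    allowed: bool
    status: int


def on_redis_outage(policy: str) -> PolicyDecision:
    """'reject' fails closed with a 503 (no billing bypass);
    'allow' fails open, for dev/testing only."""
    if policy == "allow":
        return PolicyDecision(allowed=True, status=200)
    return PolicyDecision(allowed=False, status=503)
```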


Firestore Collections

Main API Project (inference-platform)

Agent Project (inference-agents)

Firestore Security Rules

  • agent_tasks: Owner CRUD with field validation (hasOnly() allowlist). Server-managed fields (last_run_at, total_credits_used, etc.) are blocked from client writes.

  • agent_tasks/*/credentials/*: Owner read-only (masked display). Writes via Admin SDK only (Cloud Functions).

  • agent_executions: Owner read-only. Writes via executor Admin SDK only.

  • platform_integrations: Any authenticated user can read. Admin-only writes.


Cloud KMS

| Property | Value |
|---|---|
| GCP Project | inference-agents |
| Region | us-central1 |
| Key Ring | agent-keys |
| Crypto Key | credentials |
| Key Path | projects/inference-agents/locations/us-central1/keyRings/agent-keys/cryptoKeys/credentials |

Split Trust Model

| Operation | Service Account | Project | Permission |
|---|---|---|---|
| Encrypt | Bridge SA (agent-bridge@inference-platform) | inference-platform | cloudkms.cryptoKeyVersions.useToEncrypt only |
| Decrypt | Executor SA (agent-executor@inference-agents) | inference-agents | cloudkms.cryptoKeyVersions.useToDecrypt only |

Neither service account can perform both operations. Compromising one project does not grant full credential access.
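The bridge side of this split might look like the sketch below, using the google-cloud-kms client (the SDK import is lazy so the path helper works without it; the decrypt half lives in the other project and would be symmetric):

```python
def kms_key_path(project: str) -> str:
    """Full resource name of the shared crypto key (values from this page)."""
    return (f"projects/{project}/locations/us-central1/"
            "keyRings/agent-keys/cryptoKeys/credentials")


def encrypt_credential(plaintext: bytes) -> bytes:
    """Bridge-side encrypt: the bridge SA holds useToEncrypt only, so this
    call succeeds while any Decrypt attempt from the same identity fails.

    Requires google-cloud-kms and Application Default Credentials.
    """
    from google.cloud import kms

    client = kms.KeyManagementServiceClient()
    resp = client.encrypt(request={
        "name": kms_key_path("inference-agents"),
        "plaintext": plaintext,
    })
    return resp.ciphertext
```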


Cloud Tasks and Cloud Scheduler

Cloud Tasks

| Property | Value |
|---|---|
| GCP Project | inference-agents |
| Region | us-central1 |
| Queue | agent-tasks |
| Auth | OIDC token (scoped to executor service URL) |
| Target | POST /internal/agent/execute on agent-executor |

Cloud Tasks dispatches agent execution requests with OIDC authentication. Each task message contains the task_id and trigger type. The executor loads the full task definition from Firestore.
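A sketch of enqueueing such a task with the google-cloud-tasks client; the service-account email and URLs are placeholders, and the minimal body (task_id plus trigger type) matches the description above:

```python
import json


def make_task_body(task_id: str, trigger: str) -> bytes:
    """The task message carries only the task_id and trigger type; the
    executor loads the full task definition from Firestore."""
    return json.dumps({"task_id": task_id, "trigger": trigger}).encode()


def enqueue_execution(queue_path: str, executor_url: str,
                      invoker_sa: str, task_id: str, trigger: str) -> None:
    """Enqueue an OIDC-authenticated POST to the executor (sketch).

    queue_path is the full queue resource name; requires google-cloud-tasks.
    """
    from google.cloud import tasks_v2

    client = tasks_v2.CloudTasksClient()
    client.create_task(parent=queue_path, task={
        "http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            "url": executor_url + "/internal/agent/execute",
            "headers": {"Content-Type": "application/json"},
            "body": make_task_body(task_id, trigger),
            "oidc_token": {
                "service_account_email": invoker_sa,  # placeholder SA
                "audience": executor_url,  # token scoped to the service URL
            },
        },
    })
```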

Cloud Scheduler

Cloud Scheduler creates cron-based triggers for scheduled agent tasks. Each active agent task with a cron schedule has a corresponding Scheduler job that enqueues a Cloud Tasks message at the specified interval.


BigQuery

| Property | Value |
|---|---|
| GCP Project | inference-platform |
| Region | us-central1 |
| Dataset | inference_analytics |
| Tables | usage_events, audit_events |

Analytics Pipeline

Usage and audit events flow through Pub/Sub to BigQuery:

Usage events contain: model ID, provider, token counts (input/output), credit charges, latency, source (web/api_key/agent), source_id, and user ID (opaque UID). No PII or message content is stored.

Audit events record security-relevant actions: auth failures, rate limit hits, key revocations.
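A sketch of the publishing side, with an event builder that emits exactly the fields listed above and nothing else (field names are assumptions based on this page; the publisher call uses the standard google-cloud-pubsub client):

```python
import json


def build_usage_event(user_id, model_id, provider, input_tokens, output_tokens,
                      credits_charged, latency_ms, source, source_id) -> dict:
    """Only the fields documented above -- never message content or PII."""
    return {
        "user_id": user_id,              # opaque UID
        "model_id": model_id,
        "provider": provider,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "credits_charged": credits_charged,
        "latency_ms": latency_ms,
        "source": source,                # web | api_key | agent
        "source_id": source_id,
    }


def publish_usage_event(topic_path: str, event: dict) -> None:
    """Publish to the Pub/Sub topic feeding BigQuery (sketch).

    topic_path is the full topic resource name; requires google-cloud-pubsub.
    """
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    publisher.publish(topic_path, json.dumps(event).encode("utf-8"))
```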


Firebase Hosting

Two Firebase Hosting sites serve the web application, both deployed from the inference-platform project:

| Site | URL | Theme |
|---|---|---|
| Site 1 | https://inference-web.web.app | Green (#22c55e) |
| Site 2 | https://inference-web2.web.app | Indigo (#6366f1) |

Both sites share identical React 19 + Vite 7 + Tailwind CSS 4 component logic. Only CSS colors differ. Firebase Hosting serves the SPA with CDN-backed global distribution.

Cloud Functions (Firebase Hosting Rewrites)

Three Cloud Functions handle credential-sensitive operations that cannot be performed client-side:

| Endpoint | Purpose |
|---|---|
| POST /api/agent/encrypt-credential | KMS encrypt API key, store in credentials subcollection |
| POST /api/agent/delete-credential | Delete credential from subcollection |
| POST /api/agent/trigger-execution | Trigger agent executor via OIDC |

All three share a common security gate: CORS origin validation, payload size check, App Check verification, rate limiting, and Firebase Auth verification.
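The shared gate is essentially a short-circuiting pipeline: checks run in a fixed order and the first failure wins. A sketch of that shape (the check names and request fields here are illustrative stand-ins, not the actual implementation):

```python
from typing import Callable, List, Optional, Tuple

# A check returns None on success, or (http_status, reason) on failure.
Check = Callable[[dict], Optional[Tuple[int, str]]]


def run_gate(request: dict, checks: List[Check]) -> Tuple[int, str]:
    """Run checks in order; the first failure short-circuits the pipeline."""
    for check in checks:
        failure = check(request)
        if failure is not None:
            return failure
    return (200, "ok")


# Illustrative stand-ins for two of the five checks named above.
def check_origin(req):
    return None if req.get("origin_ok") else (403, "bad origin")


def check_size(req):
    return None if req.get("size", 0) <= 10_000 else (413, "too large")
```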


Stripe Integration

| Property | Value |
|---|---|
| Integration point | app/billing/stripe_service.py |
| Webhook endpoint | POST /billing/webhook |
| Webhook events handled | 11 event types |

Subscription Plans

| Plan | Price | Monthly Credits | Stripe Price ID (env) |
|---|---|---|---|
| Free | $0/mo | 500 credits | -- |
| Guru | $10/mo | 10,000 credits | STRIPE_GURU_PRICE_ID |
| Pro | $50/mo | 55,000 credits | STRIPE_PRO_PRICE_ID |

Top-Up Packages

| Package | Price | Credits | Bonus | Stripe Price ID (env) |
|---|---|---|---|---|
| Small | $5 | 5,000 | -- | STRIPE_TOPUP_5_PRICE_ID |
| Medium | $25 | 27,500 | 10% | STRIPE_TOPUP_25_PRICE_ID |
| Large | $100 | 125,000 | 25% | STRIPE_TOPUP_100_PRICE_ID |

Payment Methods

Stripe Checkout supports: credit/debit cards, Stripe Link, Google Pay, and Apple Pay.

Auto Top-Up

Users can configure automatic top-up when their balance falls below a threshold. Safety controls:

  • Maximum 3 auto top-ups per day

  • 4-hour cooldown after a failed charge

  • Deduplication via Redis locks (auto_topup:{lock|daily|cooldown}:{uid})
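A sketch of how these three controls could compose, using the documented key pattern and a SET NX EX lock so only one top-up per user can be in flight (the lock TTL and counter mechanics are assumptions):

```python
AUTO_TOPUP_DAILY_MAX = 3       # maximum 3 auto top-ups per day
COOLDOWN_S = 4 * 3600          # 4-hour cooldown after a failed charge


def topup_key(kind: str, uid: str) -> str:
    """Key pattern from this page: auto_topup:{lock|daily|cooldown}:{uid}."""
    assert kind in ("lock", "daily", "cooldown")
    return f"auto_topup:{kind}:{uid}"


def may_auto_topup(redis_client, uid: str) -> bool:
    """Dedup gate sketch: cooldown blocks retries after a failed charge,
    the daily counter enforces the per-day cap, and an NX lock prevents
    two concurrent charges for the same user."""
    if redis_client.exists(topup_key("cooldown", uid)):
        return False
    daily = int(redis_client.get(topup_key("daily", uid)) or 0)
    if daily >= AUTO_TOPUP_DAILY_MAX:
        return False
    # SET NX EX: acquire a short-lived lock; lock TTL here is illustrative.
    return bool(redis_client.set(topup_key("lock", uid), "1", nx=True, ex=60))
```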


Deployment Pipeline

All services follow the same CI/CD pipeline:

  • Platform: linux/amd64 (Cloud Run requirement)

  • Dependency hashes: All pip/npm dependencies installed with --require-hashes for supply chain integrity

  • Multi-stage builds: Production images exclude dev dependencies and build tools

  • Non-root: Containers run as non-root user

  • Health checks: Cloud Run verifies /health before routing traffic

Git Repositories

| Repository | Remote | Branch | Local Path |
|---|---|---|---|
| Backend API | LendefiMarkets/inference-harness | dev | app/ |
| Agent Executor | LendefiMarkets/inference-agent | master | inference-agent/ |
| Web Frontend | LendefiMarkets/inference-web | dev | inference-web/ |
| VSCode Extension | LendefiMarkets/inference-vscode | master | ../inference-vscode/ |

Note: inference-gcp/ itself is not a Git repository. Each subdirectory manages its own remotes.


Observability

Cloud Trace (OpenTelemetry)

Both services export OpenTelemetry spans to Cloud Trace. The agent executor uses a 4-level span hierarchy:

Distributed tracing crosses the OIDC service boundary, enabling end-to-end latency analysis for orchestration requests.

Cloud Logging

Structured JSON logs with:

  • Request ID correlation across services

  • User ID for billing audit

  • Model ID and provider

  • Token counts and credit charges

  • Sanitized error details (no credentials or stack traces)

