Infrastructure

Detailed breakdown of GCP services, data stores, and deployment pipeline for the Majestix AI Inference Hub.


GCP Project Layout

The platform spans two GCP projects with strict isolation between them.

| Project | Purpose | Services |
|---|---|---|
| inference-platform | Main API, billing, user data, analytics | Cloud Run (inference-api), Redis Memorystore, Firestore, BigQuery, Pub/Sub, Firebase Hosting, Artifact Registry |
| inference-agents | Agent execution, orchestration, credential management | Cloud Run (agent-executor), Firestore, Cloud KMS, Cloud Tasks, Cloud Scheduler, Cloud Trace, Artifact Registry |

The two projects communicate via OIDC-authenticated HTTPS calls over the public internet. There is no VPC peering between them.
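A minimal sketch of what such a cross-project call could look like, using google-auth's `fetch_id_token` to mint an OIDC ID token for the target Cloud Run URL (the URL, path, and payload here are placeholders, not the platform's actual values):

```python
from typing import Dict


def build_auth_header(id_token_value: str) -> Dict[str, str]:
    """Bearer header carried on every cross-project call."""
    return {"Authorization": f"Bearer {id_token_value}"}


def call_agent_service(audience_url: str, path: str, payload: dict) -> dict:
    """Mint an OIDC ID token scoped to the target service URL and POST to it.

    Requires Application Default Credentials with permission to invoke the
    target Cloud Run service. Imports are lazy so the pure helper above
    works without the SDKs installed.
    """
    import requests
    import google.auth.transport.requests
    from google.oauth2 import id_token

    auth_request = google.auth.transport.requests.Request()
    # For Cloud Run, the OIDC audience is the receiving service's URL.
    token = id_token.fetch_id_token(auth_request, audience_url)
    resp = requests.post(
        audience_url + path,
        json=payload,
        headers=build_auth_header(token),
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()
```

The receiving service validates the token's audience and issuer, which is what makes the public-internet hop safe without VPC peering.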


Cloud Run Configuration

inference-api

| Setting | Value |
|---|---|
| GCP Project | inference-platform |
| Region | us-central1 |
| Image source | Artifact Registry |
| Runtime | Python 3.13, FastAPI, Uvicorn |
| CPU | 2 vCPU |
| Memory | 1 GiB |
| Min instances | 1 (avoid cold starts for user-facing traffic) |
| Max instances | 10 |
| Concurrency | 80 requests per instance |
| Timeout | 300s |
| Ingress | All traffic (public API) |
| VPC connector | Serverless VPC Access (for Redis Memorystore) |
| Service account | |

agent-executor

| Setting | Value |
|---|---|
| GCP Project | inference-agents |
| Region | us-central1 |
| Image source | Artifact Registry |
| Runtime | Python 3.13, FastAPI, Uvicorn |
| CPU | 4 vCPU |
| Memory | 2 GiB |
| Min instances | 0 (scale to zero -- agent tasks are bursty) |
| Max instances | 5 |
| Concurrency | 10 requests per instance |
| Timeout | 900s (15 min, accommodates long swarm pipelines) |
| Ingress | Internal + Cloud Load Balancing (not publicly accessible) |
| Service account | |


Redis Memorystore

| Property | Value |
|---|---|
| GCP Project | inference-platform |
| Region | us-central1 |
| Address | 10.124.186.203:6379 |
| Access | Serverless VPC Access connector (not public) |

Redis serves five roles across the main API:

1. Session Storage

Conversation history is stored in Redis with Fernet symmetric encryption. Each session uses two keys:

TTL: configurable, default 1 hour (SESSION_TTL). Maximum history: 10 messages (MAX_HISTORY).
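A sketch of how this storage pattern fits together, with Fernet encryption before the write and the TTL applied via SETEX. The Redis key name is illustrative; the `redis_client` and `fernet` objects are injected (in practice a `redis.Redis` and a `cryptography.fernet.Fernet` instance):

```python
import json
from typing import List

SESSION_TTL = 3600   # default 1 hour (SESSION_TTL)
MAX_HISTORY = 10     # MAX_HISTORY cap on stored messages


def trim_history(history: List[dict]) -> List[dict]:
    """Keep only the most recent MAX_HISTORY messages."""
    return history[-MAX_HISTORY:]


def save_session(redis_client, fernet, session_id: str, history: List[dict]) -> None:
    """Encrypt the trimmed history and store it with a TTL.

    Key name is illustrative, not the platform's actual schema.
    """
    blob = fernet.encrypt(json.dumps(trim_history(history)).encode())
    redis_client.setex(f"session:{session_id}", SESSION_TTL, blob)


def load_session(redis_client, fernet, session_id: str) -> List[dict]:
    """Decrypt and return the stored history, or [] if expired/missing."""
    blob = redis_client.get(f"session:{session_id}")
    if blob is None:
        return []
    return json.loads(fernet.decrypt(blob))
```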

2. API Key Verification Cache

API key lookups are cached to reduce Firestore reads:

TTL: 15 minutes. Cache invalidated immediately on key revocation. Failed lookups (key not found) are not cached to prevent cache poisoning attacks.
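One way the cache-aside flow described above could be written; the Redis key name and the injected `firestore_lookup` callable are illustrative, not the platform's actual schema:

```python
import json

API_KEY_CACHE_TTL = 15 * 60  # 15 minutes


def should_cache(record) -> bool:
    """Only successful lookups are cached. A miss is never cached, so an
    attacker cannot pin a 'not found' result for a key that later becomes
    valid (the cache-poisoning concern noted above)."""
    return record is not None


def cache_key(key_hash: str) -> str:
    return f"apikey:{key_hash}"  # illustrative key name


def verify_api_key(redis_client, firestore_lookup, key_hash: str):
    """Redis-first verification with a Firestore fallback (sketch)."""
    cached = redis_client.get(cache_key(key_hash))
    if cached is not None:
        return json.loads(cached)
    record = firestore_lookup(key_hash)  # injected Firestore read
    if should_cache(record):
        redis_client.setex(cache_key(key_hash), API_KEY_CACHE_TTL, json.dumps(record))
    return record


def revoke_api_key(redis_client, key_hash: str) -> None:
    """Invalidate the cache entry immediately on revocation."""
    redis_client.delete(cache_key(key_hash))
```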

3. Credit Counters

Monthly plan credits and top-up balances are tracked in Redis for low-latency reservation:
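One common way to make such a reservation atomic is a small Lua script run via EVAL, so two concurrent requests cannot both spend the last credits. This is a sketch under that assumption; the key name is illustrative:

```python
# Check-and-decrement in one atomic step on the Redis server.
RESERVE_LUA = """
local balance = tonumber(redis.call('GET', KEYS[1]) or '0')
local cost = tonumber(ARGV[1])
if balance < cost then return -1 end
return redis.call('DECRBY', KEYS[1], cost)
"""


def try_reserve(balance: int, cost: int):
    """Pure model of the script: new balance, or None if insufficient."""
    if balance < cost:
        return None
    return balance - cost


def reserve_credits(redis_client, uid: str, cost: int):
    """Run the script against the user's plan-credit counter (sketch)."""
    result = redis_client.eval(RESERVE_LUA, 1, f"credits:plan:{uid}", cost)
    return None if result == -1 else result
```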

4. Rate Limiting

Per-IP and per-user rate limit state:
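A minimal fixed-window limiter is one way to keep this state in Redis: INCR a per-window counter and set its expiry on first use. The key layout here is illustrative, not the platform's actual schema:

```python
def window_key(scope: str, ident: str, now_s: int, window_s: int = 60) -> str:
    """One counter per (scope, identity, time window).

    scope is e.g. 'ip' or 'user'; ident is the IP address or UID.
    """
    return f"ratelimit:{scope}:{ident}:{now_s // window_s}"


def check_rate_limit(redis_client, scope: str, ident: str,
                     now_s: int, limit: int, window_s: int = 60) -> bool:
    """Fixed-window limiter sketch: returns True if the request is allowed."""
    key = window_key(scope, ident, now_s, window_s)
    count = redis_client.incr(key)
    if count == 1:
        # First hit in this window: let the counter expire with the window.
        redis_client.expire(key, window_s)
    return count <= limit
```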

5. Usage and Plan Cache

Short-lived caches to reduce Firestore reads on hot paths:

Failure Policy

The REDIS_FAILURE_POLICY setting controls behavior when Redis is unreachable:

| Policy | Behavior |
|---|---|
| reject (default, production) | Return 503 on all requests. Safe -- prevents billing bypass. |
| allow (dev/testing only) | Pass through with no credit enforcement. |
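The fail-closed vs. fail-open choice can be captured in a few lines; this is a sketch of the decision, not the actual middleware:

```python
from dataclasses import dataclass

REDIS_FAILURE_POLICY = "reject"  # production default


@dataclass
class PolicyDecision:
    allowed: bool
    status: int


def on_redis_outage(policy: str) -> PolicyDecision:
    """'reject' fails closed with a 503 (no billing bypass);
    'allow' fails open, for dev/testing only."""
    if policy == "allow":
        return PolicyDecision(allowed=True, status=200)
    return PolicyDecision(allowed=False, status=503)
```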


Firestore Collections

Main API Project (inference-platform)

Agent Project (inference-agents)

Firestore Security Rules

  • agent_tasks: Owner CRUD with field validation (hasOnly() allowlist). Server-managed fields (last_run_at, total_credits_used, etc.) are blocked from client writes.

  • agent_tasks/*/credentials/*: Owner read-only (masked display). Writes via Admin SDK only (Cloud Functions).

  • agent_executions: Owner read-only. Writes via executor Admin SDK only.

  • platform_integrations: Any authenticated user can read. Admin-only writes.


Cloud KMS

| Property | Value |
|---|---|
| GCP Project | inference-agents |
| Region | us-central1 |
| Key Ring | agent-keys |
| Crypto Key | credentials |
| Key Path | projects/inference-agents/locations/us-central1/keyRings/agent-keys/cryptoKeys/credentials |

Split Trust Model

| Operation | Service Account | Project | Permission |
|---|---|---|---|
| Encrypt | Bridge SA (agent-bridge@inference-platform) | inference-platform | cloudkms.cryptoKeyVersions.useToEncrypt only |
| Decrypt | Executor SA (agent-executor@inference-agents) | inference-agents | cloudkms.cryptoKeyVersions.useToDecrypt only |

Neither service account can perform both operations. Compromising one project does not grant full credential access.
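The bridge side of this split might look like the sketch below, using the google-cloud-kms client (the SDK import is lazy so the path helper works without it; the decrypt half lives in the other project and would be symmetric):

```python
def kms_key_path(project: str) -> str:
    """Full resource name of the shared crypto key (values from this page)."""
    return (f"projects/{project}/locations/us-central1/"
            "keyRings/agent-keys/cryptoKeys/credentials")


def encrypt_credential(plaintext: bytes) -> bytes:
    """Bridge-side encrypt: the bridge SA holds useToEncrypt only, so this
    call succeeds while any Decrypt attempt from the same identity fails.

    Requires google-cloud-kms and Application Default Credentials.
    """
    from google.cloud import kms

    client = kms.KeyManagementServiceClient()
    resp = client.encrypt(request={
        "name": kms_key_path("inference-agents"),
        "plaintext": plaintext,
    })
    return resp.ciphertext
```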


Cloud Tasks and Cloud Scheduler

Cloud Tasks

| Property | Value |
|---|---|
| GCP Project | inference-agents |
| Region | us-central1 |
| Queue | agent-tasks |
| Auth | OIDC token (scoped to executor service URL) |
| Target | POST /internal/agent/execute on agent-executor |

Cloud Tasks dispatches agent execution requests with OIDC authentication. Each task message contains the task_id and trigger type. The executor loads the full task definition from Firestore.
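A sketch of enqueueing such a task with the google-cloud-tasks client; the service-account email and URLs are placeholders, and the minimal body (task_id plus trigger type) matches the description above:

```python
import json


def make_task_body(task_id: str, trigger: str) -> bytes:
    """The task message carries only the task_id and trigger type; the
    executor loads the full task definition from Firestore."""
    return json.dumps({"task_id": task_id, "trigger": trigger}).encode()


def enqueue_execution(queue_path: str, executor_url: str,
                      invoker_sa: str, task_id: str, trigger: str) -> None:
    """Enqueue an OIDC-authenticated POST to the executor (sketch).

    queue_path is the full queue resource name; requires google-cloud-tasks.
    """
    from google.cloud import tasks_v2

    client = tasks_v2.CloudTasksClient()
    client.create_task(parent=queue_path, task={
        "http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            "url": executor_url + "/internal/agent/execute",
            "headers": {"Content-Type": "application/json"},
            "body": make_task_body(task_id, trigger),
            "oidc_token": {
                "service_account_email": invoker_sa,  # placeholder SA
                "audience": executor_url,  # token scoped to the service URL
            },
        },
    })
```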

Cloud Scheduler

Cloud Scheduler creates cron-based triggers for scheduled agent tasks. Each active agent task with a cron schedule has a corresponding Scheduler job that enqueues a Cloud Tasks message at the specified interval.


BigQuery

| Property | Value |
|---|---|
| GCP Project | inference-platform |
| Region | us-central1 |
| Dataset | inference_analytics |
| Tables | usage_events, audit_events |

Analytics Pipeline

Usage and audit events flow through Pub/Sub to BigQuery:

Usage events contain: model ID, provider, token counts (input/output), credit charges, latency, source (web/api_key/agent), source_id, and user ID (opaque UID). No PII or message content is stored.

Audit events record security-relevant actions: auth failures, rate limit hits, key revocations.
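A sketch of the publishing side, with an event builder that emits exactly the fields listed above and nothing else (field names are assumptions based on this page; the publisher call uses the standard google-cloud-pubsub client):

```python
import json


def build_usage_event(user_id, model_id, provider, input_tokens, output_tokens,
                      credits_charged, latency_ms, source, source_id) -> dict:
    """Only the fields documented above -- never message content or PII."""
    return {
        "user_id": user_id,              # opaque UID
        "model_id": model_id,
        "provider": provider,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "credits_charged": credits_charged,
        "latency_ms": latency_ms,
        "source": source,                # web | api_key | agent
        "source_id": source_id,
    }


def publish_usage_event(topic_path: str, event: dict) -> None:
    """Publish to the Pub/Sub topic feeding BigQuery (sketch).

    topic_path is the full topic resource name; requires google-cloud-pubsub.
    """
    from google.cloud import pubsub_v1

    publisher = pubsub_v1.PublisherClient()
    publisher.publish(topic_path, json.dumps(event).encode("utf-8"))
```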


Firebase Hosting

Two Firebase Hosting sites serve the web application, both deployed from the inference-platform project:

| Site | URL | Theme |
|---|---|---|
| Site 1 | https://inference-web.web.app | Green (#22c55e) |
| Site 2 | https://inference-web2.web.app | Indigo (#6366f1) |

Both sites share identical React 19 + Vite 7 + Tailwind CSS 4 component logic. Only CSS colors differ. Firebase Hosting serves the SPA with CDN-backed global distribution.

Cloud Functions (Firebase Hosting Rewrites)

Three Cloud Functions handle credential-sensitive operations that cannot be performed client-side:

| Endpoint | Purpose |
|---|---|
| POST /api/agent/encrypt-credential | KMS encrypt API key, store in credentials subcollection |
| POST /api/agent/delete-credential | Delete credential from subcollection |
| POST /api/agent/trigger-execution | Trigger agent executor via OIDC |

All three share a common security gate: CORS origin validation, payload size check, App Check verification, rate limiting, and Firebase Auth verification.
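The shared gate is essentially a short-circuiting pipeline: checks run in a fixed order and the first failure wins. A sketch of that shape (the check names and request fields here are illustrative stand-ins, not the actual implementation):

```python
from typing import Callable, List, Optional, Tuple

# A check returns None on success, or (http_status, reason) on failure.
Check = Callable[[dict], Optional[Tuple[int, str]]]


def run_gate(request: dict, checks: List[Check]) -> Tuple[int, str]:
    """Run checks in order; the first failure short-circuits the pipeline."""
    for check in checks:
        failure = check(request)
        if failure is not None:
            return failure
    return (200, "ok")


# Illustrative stand-ins for two of the five checks named above.
def check_origin(req):
    return None if req.get("origin_ok") else (403, "bad origin")


def check_size(req):
    return None if req.get("size", 0) <= 10_000 else (413, "too large")
```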


Stripe Integration

| Property | Value |
|---|---|
| Integration point | app/billing/stripe_service.py |
| Webhook endpoint | POST /billing/webhook |
| Webhook events handled | 11 event types |

Subscription Plans

| Plan | Price | Monthly Credits | Stripe Price ID (env) |
|---|---|---|---|
| Free | $0/mo | 500 credits | -- |
| Guru | $10/mo | 10,000 credits | STRIPE_GURU_PRICE_ID |
| Pro | $50/mo | 55,000 credits | STRIPE_PRO_PRICE_ID |

Top-Up Packages

| Package | Price | Credits | Bonus | Stripe Price ID (env) |
|---|---|---|---|---|
| Small | $5 | 5,000 | -- | STRIPE_TOPUP_5_PRICE_ID |
| Medium | $25 | 27,500 | 10% | STRIPE_TOPUP_25_PRICE_ID |
| Large | $100 | 125,000 | 25% | STRIPE_TOPUP_100_PRICE_ID |

Payment Methods

Stripe Checkout supports: credit/debit cards, Stripe Link, Google Pay, and Apple Pay.

Auto Top-Up

Users can configure automatic top-up when their balance falls below a threshold. Safety controls:

  • Maximum 3 auto top-ups per day

  • 4-hour cooldown after a failed charge

  • Deduplication via Redis locks (auto_topup:{lock|daily|cooldown}:{uid})
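A sketch of how these three controls could compose, using the documented key pattern and a SET NX EX lock so only one top-up per user can be in flight (the lock TTL and counter mechanics are assumptions):

```python
AUTO_TOPUP_DAILY_MAX = 3       # maximum 3 auto top-ups per day
COOLDOWN_S = 4 * 3600          # 4-hour cooldown after a failed charge


def topup_key(kind: str, uid: str) -> str:
    """Key pattern from this page: auto_topup:{lock|daily|cooldown}:{uid}."""
    assert kind in ("lock", "daily", "cooldown")
    return f"auto_topup:{kind}:{uid}"


def may_auto_topup(redis_client, uid: str) -> bool:
    """Dedup gate sketch: cooldown blocks retries after a failed charge,
    the daily counter enforces the per-day cap, and an NX lock prevents
    two concurrent charges for the same user."""
    if redis_client.exists(topup_key("cooldown", uid)):
        return False
    daily = int(redis_client.get(topup_key("daily", uid)) or 0)
    if daily >= AUTO_TOPUP_DAILY_MAX:
        return False
    # SET NX EX: acquire a short-lived lock; lock TTL here is illustrative.
    return bool(redis_client.set(topup_key("lock", uid), "1", nx=True, ex=60))
```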


Deployment Pipeline

All services follow the same CI/CD pipeline:

  • Platform: linux/amd64 (Cloud Run requirement)

  • Dependency hashes: All pip/npm dependencies installed with --require-hashes for supply chain integrity

  • Multi-stage builds: Production images exclude dev dependencies and build tools

  • Non-root: Containers run as non-root user

  • Health checks: Cloud Run verifies /health before routing traffic

Git Repositories

| Repository | Remote | Branch | Local Path |
|---|---|---|---|
| Backend API | LendefiMarkets/inference-harness | dev | app/ |
| Agent Executor | LendefiMarkets/inference-agent | master | inference-agent/ |
| Web Frontend | LendefiMarkets/inference-web | dev | inference-web/ |
| VSCode Extension | LendefiMarkets/inference-vscode | master | ../inference-vscode/ |

Note: inference-gcp/ itself is not a Git repository. Each subdirectory manages its own remotes.


Observability

Cloud Trace (OpenTelemetry)

Both services export OpenTelemetry spans to Cloud Trace. The agent executor uses a 4-level span hierarchy:

Distributed tracing crosses the OIDC service boundary, enabling end-to-end latency analysis for orchestration requests.

Cloud Logging

Structured JSON logs with:

  • Request ID correlation across services

  • User ID for billing audit

  • Model ID and provider

  • Token counts and credit charges

  • Sanitized error details (no credentials or stack traces)

