VSCode Extension
The Majestix AI Inference Hub VSCode extension brings multi-model AI chat directly into your editor. It provides a sidebar chat panel with access to all 18 models, SSE streaming, context-aware code commands, and a real-time credit balance display, all authenticated via API key.
Overview
Sidebar chat panel
Full conversational interface in the VSCode sidebar
18 models
Access to every model on the platform (Anthropic, OpenAI, Vertex, OpenRouter)
SSE streaming
Real-time token-by-token response display
8 slash commands
Pre-built code actions (explain, refactor, find bugs, etc.)
Credit balance
Live credit display in the status bar
API key auth
Secure authentication via inf_ prefixed API keys
Installation
Install the extension from the Visual Studio Code Marketplace:
Open VSCode.
Go to Extensions (Ctrl+Shift+X / Cmd+Shift+X).
Search for Majestix AI Inference Hub.
Click Install.
Alternatively, install from the command line:
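For example, using the code CLI (the extension identifier below is a placeholder; use the exact publisher.name ID shown on the Marketplace listing):

```bash
# Install via the VSCode CLI. Replace the ID with the one
# shown on the extension's Marketplace page.
code --install-extension majestix.inference-hub
```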
Setup
The extension requires an API key to authenticate with the Majestix AI Inference Hub API. API keys are created through the web application.
Step 1: Create an API Key
Sign in to the web app at inference-web.web.app or inference-web2.web.app.
Navigate to Settings > API Keys.
Click Create New Key and provide a friendly name (e.g., "VSCode -- MacBook Pro").
Copy the generated key immediately. It starts with inf_ and cannot be viewed again after creation.
Step 2: Configure the Extension
Open VSCode Settings (Ctrl+, / Cmd+,).
Search for Majestix AI Inference Hub.
Paste your API key into the API Key field.
Or set it directly in settings.json:
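For example (the key value below is a placeholder):

```json
{
  "inference.apiKey": "inf_your_key_here"
}
```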
Step 3: Verify Connection
Open the Majestix AI Inference Hub sidebar panel. If the API key is valid, you will see your current credit balance in the status bar and the model selector will populate with all available models.
Features
Sidebar Chat Panel
The extension adds a dedicated chat panel to the VSCode sidebar. It supports full multi-turn conversations with any of the 18 available models.
Type a message and press Enter to send.
Responses stream in real-time using Server-Sent Events (SSE).
Markdown formatting is rendered inline, including code blocks with syntax highlighting.
Conversation history is maintained for the duration of the session.
Model Selector
A dropdown at the top of the chat panel lets you choose from all 18 models:
Anthropic
Claude Sonnet 4.6, Claude Haiku 4.5, Claude Opus 4.6
OpenAI
GPT-5 Mini, GPT-5.2, GPT-5.2 Codex, CMO Agent
Vertex AI
Gemini 3 Flash, Gemini 3.1 Pro, Gemini 3 Image, GPT-5 Image, Seedream 4.5
OpenRouter
Llama 4 Maverick, DeepSeek V3.2, DeepSeek R1, Qwen3 Coder, Kimi K2.5, Grok 4.1 Fast
If no model is selected, the platform auto-routes your request to the best model based on the content of your message.
SSE Streaming
All responses are streamed token-by-token via Server-Sent Events. The chat panel displays tokens as they arrive, providing immediate feedback. The streaming protocol follows the platform standard:
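The exact payload schema is not reproduced here; as a minimal sketch, assuming standard SSE framing and the event names the extension dispatches (chunk, tool_use, done, error), the raw stream can be split into events like this:

```typescript
// Minimal SSE parsing sketch (illustrative, not the extension's actual parser).
// Assumes standard text/event-stream framing: "event:" / "data:" lines,
// with events separated by a blank line.
type SSEEvent = { event: string; data: string };

function parseSSE(raw: string): SSEEvent[] {
  const events: SSEEvent[] = [];
  // Per the SSE spec, a blank line terminates each event.
  for (const block of raw.split("\n\n")) {
    let event = "message";
    const data: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) data.push(line.slice(5).trim());
    }
    // Ignore empty blocks (e.g., the trailing split remainder).
    if (data.length) events.push({ event, data: data.join("\n") });
  }
  return events;
}
```

The webview appends the data of each chunk event to the visible message and stops when done arrives.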
When the done event arrives, the credit balance in the status bar updates automatically.
Credit Balance Display
The extension displays your current available credits in the VSCode status bar. The balance is the sum of remaining monthly plan credits and any top-up balance. It refreshes after each request completes and can be manually refreshed via the Inference: Refresh Balance command.
Slash Commands
The extension provides 8 pre-built commands that operate on selected code. Select code in the editor, then invoke a command via the Command Palette (Ctrl+Shift+P / Cmd+Shift+P) or right-click context menu.
Explain Code
Inference: Explain Code
Explains the selected code in plain language
Refactor
Inference: Refactor Code
Suggests refactoring improvements with rewritten code
Find Bugs
Inference: Find Bugs
Analyzes the selection for potential bugs and issues
Write Tests
Inference: Write Tests
Generates unit tests for the selected code
Optimize
Inference: Optimize Code
Suggests performance optimizations
Document
Inference: Document Code
Generates documentation comments (JSDoc, docstrings, etc.)
Ask About Selection
Inference: Ask About Selection
Opens the chat panel with the selected code as context
General Question
Inference: Ask Question
Opens the chat panel for a free-form question
Context-Aware Code Interaction
When you invoke any slash command, the extension automatically sends the selected code as context to the model. This means the model sees:
The selected code snippet.
The file language (for language-aware responses).
The file path (for project-aware context).
For the Ask About Selection command, the selected code is prepended to your question so the model can reference it directly.
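A hypothetical sketch of how that context might be assembled; the function name and message layout below are assumptions, not the extension's actual code:

```typescript
// Illustrative only: combine file path, language, selected code, and the
// user's question into a single prompt string.
function buildSelectionPrompt(
  filePath: string,
  languageId: string,
  code: string,
  question: string
): string {
  return [
    `File: ${filePath}`,
    "```" + languageId,
    code,
    "```",
    "",
    question,
  ].join("\n");
}
```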
Configuration
All settings are under the inference namespace in VSCode settings.
inference.apiKey (string, default: ""): Your Majestix AI Inference Hub API key (starts with inf_).
inference.defaultModel (string, default: ""): Preferred model key (e.g., claude-sonnet). Leave empty for auto-routing.
inference.apiBaseUrl (string, default: https://inference-api-611798501438.us-central1.run.app): API base URL. Override for local development.
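A settings.json example combining all three options (the key and model values are placeholders, and apiBaseUrl normally does not need to be set):

```json
{
  "inference.apiKey": "inf_your_key_here",
  "inference.defaultModel": "claude-sonnet",
  "inference.apiBaseUrl": "https://inference-api-611798501438.us-central1.run.app"
}
```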
Keyboard Shortcuts
The extension registers the following default keyboard shortcuts:
Ctrl+Shift+I / Cmd+Shift+I
Toggle the Inference sidebar panel
Ctrl+Shift+E / Cmd+Shift+E
Explain selected code
Additional commands can be bound to custom shortcuts via File > Preferences > Keyboard Shortcuts.
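For example, in keybindings.json (the command ID below is illustrative; copy the exact ID from the Keyboard Shortcuts editor):

```json
[
  {
    "key": "ctrl+shift+r",
    "command": "inference.refactorCode",
    "when": "editorHasSelection"
  }
]
```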
Architecture
The extension is built with TypeScript and bundled with esbuild for fast loading.
Components
Extension Host
TypeScript entry point that registers commands, the sidebar webview provider, and the status bar item.
Sidebar Webview
HTML/CSS/JS panel rendered in the VSCode sidebar. Handles chat UI, model selection, and message display.
API Client
HTTP client module that communicates with the Cloud Run API. Sends requests to /chat, /code, /models, and /usage/me.
SSE Parser
Parses text/event-stream responses and dispatches chunk, tool_use, done, and error events to the webview.
Status Bar
Displays the current credit balance. Updates after each request and on manual refresh.
Communication Flow
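The components interact roughly as follows (a sketch reconstructed from the component descriptions above, not an exact diagram):

```
Sidebar webview ── user message ──► Extension host ──► API client
                                                          │
                                               POST /chat (Cloud Run API)
                                                          │
Sidebar webview ◄── chunk/done ◄── SSE parser ◄── text/event-stream
        │
        └─► status bar refreshes the credit balance on done
```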
Build
The extension uses esbuild for bundling:
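A typical esbuild invocation for a VSCode extension looks like the following; the entry point and output path are assumptions, so check the repository's package.json scripts for the actual command:

```bash
# Bundle the TypeScript entry point into a single CommonJS file.
# "vscode" is marked external because VSCode injects it at runtime.
npx esbuild src/extension.ts --bundle --outfile=dist/extension.js \
  --external:vscode --format=cjs --platform=node --minify
```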
Repository
The extension source code is in a separate repository:
Repository
LendefiMarkets/inference-vscode
Branch
master
Local path
../inference-vscode/ (relative to inference-gcp/)
Language
TypeScript
Bundler
esbuild
Target
VSCode 1.85+
Troubleshooting
"Authentication failed" error
Verify your API key starts with inf_ and is correctly pasted in settings.
Check that the key has not been revoked on the web app's Settings > API Keys page.
API keys expire after 90 days. Generate a new key if yours has expired.
No models appearing in the selector
Confirm the extension has a valid API key configured.
Check your internet connection. The extension fetches the model list from the API on startup.
Try running Inference: Refresh Balance from the Command Palette to re-establish the connection.
Credit balance shows 0
Your plan credits may be exhausted for the current billing cycle.
Purchase a top-up via the web app at Billing > Top Up.
The status bar updates after each request. Run Inference: Refresh Balance to force an update.