❱ VSCode Extension

The Majestix AI Inference Hub VSCode extension brings multi-model AI chat directly into your editor. It provides a sidebar chat panel with access to all 18 models, SSE streaming, context-aware code commands, and real-time credit balance display -- all authenticated via API key.


Overview

Feature            | Description
-------------------|------------------------------------------------------------------
Sidebar chat panel | Full conversational interface in the VSCode sidebar
18 models          | Access to every model on the platform (Anthropic, OpenAI, Vertex, OpenRouter)
SSE streaming      | Real-time token-by-token response display
8 slash commands   | Pre-built code actions (explain, refactor, find bugs, etc.)
Credit balance     | Live credit display in the status bar
API key auth       | Secure authentication via inf_-prefixed API keys


Installation

Install the extension from the Visual Studio Code Marketplace:

  1. Open VSCode.

  2. Go to Extensions (Ctrl+Shift+X / Cmd+Shift+X).

  3. Search for Majestix AI Inference Hub.

  4. Click Install.

Alternatively, install from the command line:
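
Assuming the extension is published under an ID like majestix.inference-hub (the actual publisher and extension name may differ — check the Marketplace listing), the CLI install looks like:

```shell
# Install via the VSCode CLI.
# NOTE: "majestix.inference-hub" is a hypothetical extension ID --
# substitute the real publisher.extension name from the Marketplace.
code --install-extension majestix.inference-hub
```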


Setup

The extension requires an API key to authenticate with the Majestix AI Inference Hub API. API keys are created through the web application.

Step 1: Create an API Key

  1. Navigate to Settings > API Keys.

  2. Click Create New Key and provide a friendly name (e.g., "VSCode -- MacBook Pro").

  3. Copy the generated key immediately. It starts with inf_ and cannot be viewed again after creation.

Step 2: Configure the Extension

  1. Open VSCode Settings (Ctrl+, / Cmd+,).

  2. Search for Majestix AI Inference Hub.

  3. Paste your API key into the API Key field.

Or set it directly in settings.json:
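
A sketch of the corresponding settings.json entry, using the settings documented in the Configuration section (the key value is a placeholder):

```json
{
  // Your Majestix AI Inference Hub API key (placeholder shown).
  "inference.apiKey": "inf_your_key_here",
  // Optional: pin a preferred model; leave empty for auto-routing.
  "inference.defaultModel": ""
}
```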

Step 3: Verify Connection

Open the Majestix AI Inference Hub sidebar panel. If the API key is valid, you will see your current credit balance in the status bar and the model selector will populate with all available models.
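
You can also sanity-check the key outside VSCode. Assuming the API accepts the key as a bearer token (the exact auth header scheme is not documented here), a quick request against the /models endpoint might look like:

```shell
# Hypothetical manual check -- the Bearer auth scheme is an assumption.
curl -H "Authorization: Bearer inf_your_key_here" \
  https://inference-api-611798501438.us-central1.run.app/models
```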


Features

Chat Panel

The extension adds a dedicated chat panel to the VSCode sidebar. It supports full multi-turn conversations with any of the 18 available models.

  • Type a message and press Enter to send.

  • Responses stream in real-time using Server-Sent Events (SSE).

  • Markdown formatting is rendered inline, including code blocks with syntax highlighting.

  • Conversation history is maintained for the duration of the session.

Model Selector

A dropdown at the top of the chat panel lets you choose from all 18 models:

Provider   | Models
-----------|---------------------------------------------------------------------------------
Anthropic  | Claude Sonnet 4.6, Claude Haiku 4.5, Claude Opus 4.6
OpenAI     | GPT-5 Mini, GPT-5.2, GPT-5.2 Codex, CMO Agent
Vertex AI  | Gemini 3 Flash, Gemini 3.1 Pro, Gemini 3 Image, GPT-5 Image, Seedream 4.5
OpenRouter | Llama 4 Maverick, DeepSeek V3.2, DeepSeek R1, Qwen3 Coder, Kimi K2.5, Grok 4.1 Fast

If no model is selected, the platform auto-routes your request to the best model based on the content of your message.

SSE Streaming

All responses are streamed token-by-token via Server-Sent Events. The chat panel displays tokens as they arrive, providing immediate feedback. The streaming protocol follows the platform standard:
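
A sketch of the wire format, using the chunk and done event names the extension's SSE Parser dispatches (the fields inside each data payload are assumptions, not the documented schema):

```
event: chunk
data: {"text": "Hello"}

event: chunk
data: {"text": ", world"}

event: done
data: {"credits_used": 0.42}
```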

When the done event arrives, the credit balance in the status bar updates automatically.

Credit Balance Display

The extension displays your current available credits in the VSCode status bar. The balance is the sum of remaining monthly plan credits and any top-up balance. It refreshes after each request completes and can be manually refreshed via the Inference: Refresh Balance command.
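
The displayed figure is a simple sum of the two balances. A minimal sketch, assuming a hypothetical shape for the /usage/me response (field names are illustrative):

```typescript
// Hypothetical shape of the /usage/me response -- field names are assumptions.
interface UsageSummary {
  planCreditsRemaining: number; // unused monthly plan credits
  topUpBalance: number;         // purchased top-up credits
}

// The status bar shows remaining plan credits plus any top-up balance.
function availableCredits(usage: UsageSummary): number {
  return usage.planCreditsRemaining + usage.topUpBalance;
}

// e.g. availableCredits({ planCreditsRemaining: 12.5, topUpBalance: 5 })
```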

Slash Commands

The extension provides 8 pre-built commands that operate on selected code. Select code in the editor, then invoke a command via the Command Palette (Ctrl+Shift+P / Cmd+Shift+P) or right-click context menu.

Command             | Palette Name                   | Description
--------------------|--------------------------------|------------------------------------------------------------
Explain Code        | Inference: Explain Code        | Explains the selected code in plain language
Refactor            | Inference: Refactor Code       | Suggests refactoring improvements with rewritten code
Find Bugs           | Inference: Find Bugs           | Analyzes the selection for potential bugs and issues
Write Tests         | Inference: Write Tests         | Generates unit tests for the selected code
Optimize            | Inference: Optimize Code       | Suggests performance optimizations
Document            | Inference: Document Code       | Generates documentation comments (JSDoc, docstrings, etc.)
Ask About Selection | Inference: Ask About Selection | Opens the chat panel with the selected code as context
General Question    | Inference: Ask Question        | Opens the chat panel for a free-form question

Context-Aware Code Interaction

When you invoke any slash command, the extension automatically sends the selected code as context to the model. This means the model sees:

  • The selected code snippet.

  • The file language (for language-aware responses).

  • The file path (for project-aware context).

For the Ask About Selection command, the selected code is prepended to your question so the model can reference it directly.
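
The three context items above can be pictured as a small payload. A sketch of how it might be assembled (the interface and field names are hypothetical, not the extension's actual request shape):

```typescript
// Hypothetical context payload -- mirrors the three items listed above.
interface SelectionContext {
  code: string;       // the selected code snippet
  languageId: string; // file language, for language-aware responses
  filePath: string;   // file path, for project-aware context
}

// For "Ask About Selection", the selection is prepended to the question.
function buildPrompt(ctx: SelectionContext, question: string): string {
  return [
    `Context from ${ctx.filePath} (${ctx.languageId}):`,
    ctx.code,
    "",
    question,
  ].join("\n");
}
```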


Configuration

All settings are under the inference namespace in VSCode settings.

Setting                | Type   | Default                                                | Description
-----------------------|--------|--------------------------------------------------------|--------------------------------------------------------------------------
inference.apiKey       | string | ""                                                     | Your Majestix AI Inference Hub API key (starts with inf_).
inference.defaultModel | string | ""                                                     | Preferred model key (e.g., claude-sonnet). Leave empty for auto-routing.
inference.apiBaseUrl   | string | https://inference-api-611798501438.us-central1.run.app | API base URL. Override for local development.


Keyboard Shortcuts

The extension registers the following default keyboard shortcuts:

Shortcut                   | Action
---------------------------|-----------------------------------
Ctrl+Shift+I / Cmd+Shift+I | Toggle the Inference sidebar panel
Ctrl+Shift+E / Cmd+Shift+E | Explain selected code

Additional commands can be bound to custom shortcuts via File > Preferences > Keyboard Shortcuts.
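
For example, a custom binding in keybindings.json might look like this (the command ID inference.findBugs is hypothetical — look up the real IDs in the extension's contributed commands):

```json
[
  {
    // Hypothetical command ID -- check the extension's contributions.
    "key": "ctrl+shift+b",
    "command": "inference.findBugs",
    "when": "editorHasSelection"
  }
]
```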


Architecture

The extension is built with TypeScript and bundled with esbuild for fast loading.

Components

Component       | Description
----------------|----------------------------------------------------------------------------------------------------------------
Extension Host  | TypeScript entry point that registers commands, the sidebar webview provider, and the status bar item.
Sidebar Webview | HTML/CSS/JS panel rendered in the VSCode sidebar. Handles chat UI, model selection, and message display.
API Client      | HTTP client module that communicates with the Cloud Run API. Sends requests to /chat, /code, /models, and /usage/me.
SSE Parser      | Parses text/event-stream responses and dispatches chunk, tool_use, done, and error events to the webview.
Status Bar      | Displays the current credit balance. Updates after each request and on manual refresh.

Communication Flow
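
Roughly: the Sidebar Webview posts the user's message to the Extension Host, which calls the API Client; the Cloud Run API streams back text/event-stream, which the SSE Parser turns into chunk, tool_use, done, and error events for the webview. A minimal sketch of such a parser (illustrative only — the real implementation lives in the extension repo):

```typescript
// Minimal SSE parser sketch. Event names (chunk, tool_use, done, error)
// come from the components table above; the code itself is illustrative,
// not the extension's actual parser.
interface SseEvent {
  event: string;
  data: string;
}

function parseSse(buffer: string): SseEvent[] {
  const events: SseEvent[] = [];
  // Events in a text/event-stream body are separated by blank lines.
  for (const block of buffer.split("\n\n")) {
    let event = "message"; // SSE default event name
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
    }
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return events;
}
```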

Build

The extension uses esbuild for bundling:
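
A typical esbuild invocation for a VSCode extension looks like the following; the entry point and output paths are assumptions about this repo's layout, not confirmed values:

```shell
# Bundle the extension entry point. "vscode" must stay external because
# the editor provides that module at runtime. Paths here are assumed.
esbuild src/extension.ts --bundle --outfile=dist/extension.js \
  --external:vscode --format=cjs --platform=node
```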

Repository

The extension source code is in a separate repository:

Property   | Value
-----------|---------------------------------------------------
Repository | LendefiMarkets/inference-vscode
Branch     | master
Local path | ../inference-vscode/ (relative to inference-gcp/)
Language   | TypeScript
Bundler    | esbuild
Target     | VSCode 1.85+


Troubleshooting

"Authentication failed" error

  • Verify your API key starts with inf_ and is correctly pasted in settings.

  • Check that the key has not been revoked on the web app's Settings > API Keys page.

  • API keys expire after 90 days. Generate a new key if yours has expired.

No models appearing in the selector

  • Confirm the extension has a valid API key configured.

  • Check your internet connection. The extension fetches the model list from the API on startup.

  • Try running Inference: Refresh Balance from the Command Palette to re-establish the connection.

Credit balance shows 0

  • Your plan credits may be exhausted for the current billing cycle.

  • Purchase a top-up via the web app at Billing > Top Up.

  • The status bar updates after each request. Run Inference: Refresh Balance to force an update.
