io.github.RightNow-AI/forge-mcp-server

Platform & Services

by rightnow-ai

Turns PyTorch into high-performance CUDA/Triton kernels that run on real datacenter GPUs, with speedups of up to 14x.

README

<p align="center"> <img src="https://raw.githubusercontent.com/RightNow-AI/forge-mcp-server/main/forge-logo.jpg" alt="Forge" width="120" /> </p> <h1 align="center">Forge MCP Server</h1> <p align="center"> <strong>Swarm agents that turn slow PyTorch into fast CUDA/Triton kernels, from any AI coding agent.</strong> </p> <p align="center"> <a href="https://www.npmjs.com/package/@rightnow/forge-mcp-server"><img src="https://img.shields.io/npm/v/@rightnow/forge-mcp-server?color=0284c7&label=npm" alt="npm version"></a> <a href="https://www.npmjs.com/package/@rightnow/forge-mcp-server"><img src="https://img.shields.io/npm/dm/@rightnow/forge-mcp-server?color=0284c7" alt="npm downloads"></a> <a href="https://modelcontextprotocol.io"><img src="https://badge.mcpx.dev?type=server&features=tools,resources,prompts" alt="MCP Server"></a> <a href="https://github.com/RightNow-AI/forge-mcp-server/blob/main/LICENSE"><img src="https://img.shields.io/github/license/RightNow-AI/forge-mcp-server" alt="License"></a> <a href="https://www.typescriptlang.org/"><img src="https://img.shields.io/badge/TypeScript-3178C6?logo=typescript&logoColor=fff" alt="TypeScript"></a> </p> <p align="center"> <a href="#installation">Installation</a> · <a href="#tools">Tools</a> · <a href="#resources">Resources</a> · <a href="#prompts">Prompts</a> · <a href="#security">Security</a> · <a href="#development">Development</a> </p>

Overview

Forge transforms PyTorch models into production-grade CUDA/Triton kernels through automated multi-agent optimization. Using 32 parallel AI agents with inference-time scaling, it achieves up to 14x faster inference than torch.compile(mode='max-autotune-no-cudagraphs') while maintaining 100% numerical correctness.

This MCP server connects any MCP-compatible AI coding agent to Forge. Your agent submits PyTorch code, Forge optimizes it with swarm agents on real datacenter GPUs, and returns the fastest kernel as a drop-in replacement.

What it does

  • **Optimize existing kernels** - Submit PyTorch code, get back an optimized Triton/CUDA kernel benchmarked against torch.compile(max-autotune)
  • **Generate new kernels** - Describe an operation (e.g. "fused LayerNorm + GELU + Dropout"), get a production-ready optimized kernel
  • **32 parallel swarm agents** - Coder+Judge agent pairs compete to discover optimal kernels, exploring tensor core utilization, memory coalescing, shared memory tiling, and kernel fusion simultaneously
  • **Real datacenter GPU benchmarking** - Every kernel is compiled, tested for correctness, and profiled on actual datacenter hardware
  • **250k tokens/sec inference** - Results in minutes, not hours
  • **Smart detection** - The agent automatically recognizes when your code would benefit from GPU optimization
  • **One-click auth** - Browser-based OAuth sign-in. No API keys to manage.

Supported GPUs

All optimization and benchmarking runs on datacenter-grade hardware:

| GPU | Architecture |
| --- | --- |
| B200 | Blackwell |
| H200 | Hopper |
| H100 | Hopper |
| L40S | Ada Lovelace |
| A100 | Ampere |
| L4 | Ada Lovelace |
| A10 | Ampere |
| T4 | Turing |

Supported clients

| Client | Status |
| --- | --- |
| Claude Code | Fully supported |
| Claude Desktop | Fully supported |
| OpenCode | Fully supported |
| Cursor | Fully supported |
| Windsurf | Fully supported |
| VS Code + Copilot | Fully supported |
| Any MCP client | Fully supported via stdio |

Installation

Claude Code

macOS / Linux:

```bash
claude mcp add forge-mcp -- npx -y @rightnow/forge-mcp-server
```

Windows:

```bash
claude mcp add forge-mcp -- cmd /c npx -y @rightnow/forge-mcp-server
```

Claude Desktop

Add to your claude_desktop_config.json:

<details> <summary>macOS: <code>~/Library/Application Support/Claude/claude_desktop_config.json</code></summary>
```json
{
  "mcpServers": {
    "forge": {
      "command": "npx",
      "args": ["-y", "@rightnow/forge-mcp-server"]
    }
  }
}
```
</details> <details> <summary>Windows: <code>%APPDATA%\Claude\claude_desktop_config.json</code></summary>
```json
{
  "mcpServers": {
    "forge": {
      "command": "cmd",
      "args": ["/c", "npx", "-y", "@rightnow/forge-mcp-server"]
    }
  }
}
```
</details>

VS Code / Copilot

Add to your .vscode/mcp.json (workspace) or user settings:

```json
{
  "servers": {
    "forge": {
      "command": "npx",
      "args": ["-y", "@rightnow/forge-mcp-server"]
    }
  }
}
```

Windows: Use "command": "cmd" with "args": ["/c", "npx", "-y", "@rightnow/forge-mcp-server"]

Cursor

Add to your Cursor MCP settings (~/.cursor/mcp.json):

```json
{
  "mcpServers": {
    "forge": {
      "command": "npx",
      "args": ["-y", "@rightnow/forge-mcp-server"]
    }
  }
}
```

Windows: Use "command": "cmd" with "args": ["/c", "npx", "-y", "@rightnow/forge-mcp-server"]

Windsurf

Add to your Windsurf MCP configuration:

```json
{
  "mcpServers": {
    "forge": {
      "command": "npx",
      "args": ["-y", "@rightnow/forge-mcp-server"]
    }
  }
}
```

Windows: Use "command": "cmd" with "args": ["/c", "npx", "-y", "@rightnow/forge-mcp-server"]

OpenCode

Add to your opencode.json:

```json
{
  "mcp": {
    "forge": {
      "command": "npx",
      "args": ["-y", "@rightnow/forge-mcp-server"]
    }
  }
}
```

Tools

forge_auth

Authenticate with the Forge service. Opens your browser to sign in via the RightNow dashboard. Required before using any other tool.

  • Inputs:
    • force (boolean, optional): Force re-authentication even if valid tokens exist
  • Returns: Authentication status, email, plan type, and credit balance

forge_optimize

Submit PyTorch code for GPU kernel optimization. 32 swarm agents generate optimized Triton or CUDA kernels, evaluate them on real datacenter GPUs, and return the best result with speedup metrics.

The agent will automatically use this tool when it detects:

  • PyTorch custom operations (torch.autograd.Function, custom forward/backward)

  • Manual CUDA kernels that could be faster

  • Performance-critical tensor operations (attention, convolution, normalization, softmax)

  • Code with comments like "slow", "bottleneck", "optimize"

  • torch.compile() targets or triton.jit kernels

  • Any nn.Module with significant compute in forward()

  • Matrix multiplication, reduction, or scan operations

  • Custom loss functions with reduction operations

  • Fused operation opportunities (e.g., LayerNorm + activation)

  • Inputs:

    • pytorch_code (string, required): Complete PyTorch code to optimize. Max 500 KB.
    • kernel_name (string, required): Short name for the kernel (e.g., "flash_attention")
    • output_format (enum, optional): "triton" (default) or "native_cuda"
    • target_speedup (number, optional): Target speedup multiplier. Default 2.0
    • max_iterations (number, optional): Max optimization iterations (1-100). Default 10
    • gpu (enum, optional): Target GPU. Default "H100". Options: B200, H200, H100, L40S, A100, L4, A10, T4
    • user_prompt (string, optional): Guidance for the optimizer (e.g., "focus on memory bandwidth")
  • Returns: Optimized kernel code, speedup metrics, latency comparison, iteration history

forge_generate

Generate an optimized GPU kernel from scratch based on a natural-language specification. Forge creates a PyTorch baseline, then optimizes it into Triton or CUDA.

  • Inputs:
    • operation (string, required): Operation name (e.g., "fused_attention", "softmax")
    • description (string, required): Detailed description of what the kernel should do
    • input_shapes (number[][], required): Input tensor shapes (e.g., [[8, 512, 768]])
    • output_shape (number[], optional): Expected output shape
    • dtype (string, optional): Data type. Default "float16"
    • output_format (enum, optional): "triton" (default) or "native_cuda"
    • target_speedup (number, optional): Target speedup. Default 2.0
    • max_iterations (number, optional): Max iterations (1-100). Default 10
    • gpu (enum, optional): Target GPU. Default "H100"
    • user_prompt (string, optional): Additional guidance
  • Returns: Generated kernel code, speedup metrics, iteration history

forge_credits

Check your current Forge credit balance.

  • Inputs: None
  • Returns: Credit balance, total purchased, total used, plan type

forge_status

Check the status of a running or completed optimization job.

  • Inputs:
    • session_id (string, required): Session ID from forge_optimize or forge_generate
  • Returns: Job status, current iteration, best speedup

forge_cancel

Cancel a running optimization job.

  • Inputs:
    • session_id (string, required): Session ID of the job to cancel
  • Returns: Cancellation confirmation

forge_sessions

List past optimization sessions with results.

  • Inputs:
    • limit (number, optional): Number of sessions to return (1-100). Default 10
    • status (enum, optional): Filter by status: "all", "completed", "failed", "running". Default "all"
  • Returns: Table of sessions with task name, GPU, speedup, status, and date

Tool Annotations

| Tool | Read-only | Idempotent | Destructive |
| --- | --- | --- | --- |
| forge_auth | No | Yes | No |
| forge_optimize | No | No | No |
| forge_generate | No | No | No |
| forge_credits | Yes | Yes | No |
| forge_status | Yes | Yes | No |
| forge_cancel | No | No | Yes |
| forge_sessions | Yes | Yes | No |

Resources

| URI | Description |
| --- | --- |
| forge://auth/status | Current authentication state (authenticated, token expiry, has refresh token) |
| forge://credits | Credit balance, usage, and plan information |

Prompts

forge-optimize

Guided workflow for optimizing a GPU kernel. Instructs the agent to:

  1. Check credit balance
  2. Analyze the code for optimization targets
  3. Call forge_optimize with appropriate parameters
  4. Explain the results and suggest integration

forge-analyze

Teaches the agent to scan a codebase for GPU optimization opportunities, ranked by expected impact:

| Priority | Pattern |
| --- | --- |
| HIGH | Custom autograd functions, attention mechanisms, fused operations |
| MEDIUM | Standard nn.Module compositions, normalization + activation fusion |
| LOW | Element-wise operations, simple reductions |

How It Works

```text
┌──────────────┐     stdio      ┌──────────────────┐     HTTPS      ┌──────────────────┐
│  AI Agent    │ ──────────────>│  Forge MCP       │ ──────────────>│  Forge API       │
│  (Claude,    │                │  Server          │                │  (RightNow AI)   │
│   Cursor,    │<───────────────│                  │<───────────────│                  │
│   etc.)      │   MCP result   │  - OAuth + PKCE  │   SSE stream   │  - 32 swarm      │
└──────────────┘                │  - SSE streaming │                │    agents        │
                                │  - Token mgmt    │                │  - Real GPU      │
                                └──────────────────┘                │    benchmarking  │
                                                                    └──────────────────┘
```
  1. Authenticate: The agent calls forge_auth, which opens your browser. Sign in once; tokens are stored locally at ~/.forge/tokens.json and auto-refresh.
  2. Optimize: The agent sends your PyTorch code via forge_optimize. The MCP server POSTs to the Forge API and streams SSE events in real time.
  3. Benchmark: 32 parallel Coder+Judge agents generate kernels, compile them, test correctness against the PyTorch reference, and profile performance on real datacenter GPUs.
  4. Return: The MCP server collects all results and returns the optimized code, speedup metrics, and iteration history. The output is a drop-in replacement for your original code.
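The SSE leg of step 2 can be sketched in a few lines. The parser below is purely illustrative (the actual server consumes the stream in TypeScript via fetch + ReadableStream), and the event names are made-up examples:

```python
def parse_sse(raw: str):
    """Minimal Server-Sent Events parser: accumulate `event:` and `data:`
    fields, dispatch an event on each blank line."""
    events = []
    event_type, data_lines = "message", []
    for line in raw.splitlines():
        if line.startswith("event:"):
            event_type = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data_lines.append(line[len("data:"):].strip())
        elif line == "":  # blank line marks the end of one event
            if data_lines:
                events.append((event_type, "\n".join(data_lines)))
            event_type, data_lines = "message", []
    return events

# Hypothetical stream: one progress event, one completion event
stream = "event: progress\ndata: iteration 3/10\n\nevent: done\ndata: speedup 4.2x\n\n"
events = parse_sse(stream)
```

Each dispatched event becomes an `(event, data)` pair the server can relay to the agent as progress updates.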

Each optimization costs 1 credit. Credits are only charged for successful runs (speedup >= 1.1x). Failed runs and cancelled jobs are not charged.


Configuration

Authentication

No API keys needed. The server uses OAuth 2.0 with PKCE for secure browser-based authentication:

  1. Agent calls forge_auth
  2. Your default browser opens to dashboard.rightnowai.co
  3. Sign in or create an account
  4. Authorization completes automatically
  5. Tokens are stored locally at ~/.forge/tokens.json (mode 0600)
  6. Access tokens auto-refresh, so you only sign in once
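The PKCE handshake behind steps 2-4 rests on a simple construction: the client keeps a random verifier secret, sends only its SHA-256 challenge with the authorization request, and reveals the verifier when redeeming the code, so an intercepted code is useless on its own. A minimal sketch of the S256 method from RFC 7636 (not the server's actual TypeScript code):

```python
import base64
import hashlib
import secrets

def make_pkce_pair():
    """Generate a PKCE code_verifier and its S256 code_challenge.

    The verifier is 32 random bytes, base64url-encoded without padding;
    the challenge is the base64url-encoded SHA-256 of the verifier.
    """
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return verifier, challenge

verifier, challenge = make_pkce_pair()
```

The authorization server recomputes the challenge from the presented verifier and rejects the exchange if they differ.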

Credits

Forge uses a pay-as-you-go credit system. Each optimization or generation run costs 1 credit.

| Credits | Price | Per Credit |
| --- | --- | --- |
| 1-9 | $15.00 each | $15.00 |
| 10+ | 25% off | $11.25 |
| 50 | $562.50 | $11.25 |
| Enterprise | Custom volume pricing | Contact us |

Free trial: optimize 1 kernel, no credit card required.

100% refund guarantee: if Forge doesn't beat torch.compile, you get your credit back.

Purchase credits at dashboard.rightnowai.co.
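The tier math above amounts to a flat rate with a 25% discount from 10 credits up. A hypothetical helper for estimating cost (this is not the dashboard's billing logic, just the published numbers):

```python
def credit_price(quantity: int) -> float:
    """Estimate the cost of a credit purchase: $15.00 per credit below 10,
    $11.25 (25% off) at 10 or more."""
    unit = 11.25 if quantity >= 10 else 15.00
    return round(quantity * unit, 2)
```

For example, 50 credits at the discounted rate come to $562.50, matching the table.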


Benchmarks

End-to-end latency on NVIDIA B200. Forge vs torch.compile(mode='max-autotune-no-cudagraphs'):

| Model | torch.compile | Forge | Speedup |
| --- | --- | --- | --- |
| Llama-3.1-8B | 42.3ms | 8.2ms | 5.16x |
| Qwen2.5-7B | 38.5ms | 9.1ms | 4.23x |
| Mistral-7B | 35.2ms | 10.4ms | 3.38x |
| Phi-3-mini | 18.7ms | 6.8ms | 2.75x |
| SDXL UNet | 89.4ms | 31.2ms | 2.87x |
| Whisper-large | 52.1ms | 19.8ms | 2.63x |
| BERT-large | 12.4ms | 5.1ms | 2.43x |

See the full benchmarks at rightnowai.co/forge.
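Each speedup figure is just the baseline latency divided by Forge's latency; recomputing the first few rows of the table as a sanity check:

```python
# (baseline_ms, forge_ms) pairs taken from the benchmark table above
benchmarks = {
    "Llama-3.1-8B": (42.3, 8.2),
    "Qwen2.5-7B": (38.5, 9.1),
    "Mistral-7B": (35.2, 10.4),
}

# speedup = torch.compile latency / Forge latency, rounded to 2 decimals
speedups = {name: round(base / forge, 2) for name, (base, forge) in benchmarks.items()}
```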


Security

Token Protection

  • No tokens in errors: All error messages are sanitized through regex filters that strip JWTs, Bearer tokens, hex tokens, and credential parameters before reaching the agent
  • Local storage only: Tokens are stored at ~/.forge/tokens.json with file mode 0600 (owner read/write only)
  • Auto-refresh: Access tokens expire in 1 hour and auto-refresh using the stored refresh token
  • PKCE flow: OAuth uses Proof Key for Code Exchange (SHA-256), preventing authorization code interception
  • No secrets in config: The MCP server requires zero environment variables or API keys
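A sanitizer of this kind can be sketched as a set of regex substitutions applied before an error message leaves the server. The patterns below are illustrative assumptions in Python, not the server's actual TypeScript filters:

```python
import re

# Hypothetical patterns for the credential shapes named above
TOKEN_PATTERNS = [
    re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),  # JWTs
    re.compile(r"Bearer\s+[A-Za-z0-9._~+/=-]+"),                        # Bearer tokens
    re.compile(r"\b[0-9a-f]{32,}\b"),                                   # long hex tokens
]

def sanitize(message: str) -> str:
    """Replace anything that looks like a credential with a placeholder."""
    for pattern in TOKEN_PATTERNS:
        message = pattern.sub("[REDACTED]", message)
    return message

# Example: a fabricated error string carrying a fake JWT
error = "401 from API, Bearer eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxIn0.sig sent"
clean = sanitize(error)
```

The order matters slightly: redacting JWTs first keeps the broader Bearer pattern from swallowing surrounding text.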

Input Validation

  • PyTorch code input is capped at 500 KB to prevent memory exhaustion
  • User prompts are capped at 10 KB
  • All string inputs have maximum length validation via Zod schemas
  • Numeric inputs have min/max bounds (e.g., max_iterations: 1-100)
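In Python terms, the documented limits amount to a pre-flight check like the following. The real server enforces them with Zod schemas in TypeScript; the function and constant names here are hypothetical:

```python
MAX_CODE_BYTES = 500 * 1024   # pytorch_code cap: 500 KB
MAX_PROMPT_BYTES = 10 * 1024  # user_prompt cap: 10 KB

def check_optimize_inputs(pytorch_code: str, user_prompt: str = "",
                          max_iterations: int = 10) -> None:
    """Raise ValueError if any input exceeds its documented bound."""
    if len(pytorch_code.encode("utf-8")) > MAX_CODE_BYTES:
        raise ValueError("pytorch_code exceeds 500 KB")
    if len(user_prompt.encode("utf-8")) > MAX_PROMPT_BYTES:
        raise ValueError("user_prompt exceeds 10 KB")
    if not (1 <= max_iterations <= 100):
        raise ValueError("max_iterations must be between 1 and 100")

check_optimize_inputs("out = torch.softmax(scores, dim=-1)")  # passes silently
```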

Network Security

  • All API communication uses HTTPS
  • Non-SSE requests have a 30-second timeout to prevent hanging
  • SSE streams have a 10-minute timeout with automatic cleanup
  • Token refresh uses a mutex to prevent race conditions from concurrent requests
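The refresh mutex works like this: concurrent requests that find no valid token all funnel through one lock, and whoever enters after the first re-checks before refreshing, so only one network round-trip happens. A minimal asyncio sketch (illustrative only; the server implements this in TypeScript):

```python
import asyncio

class TokenRefresher:
    """Serialize token refresh behind a lock so concurrent requests
    trigger at most one refresh. All names here are hypothetical."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self._token = None
        self.refresh_calls = 0

    async def _do_refresh(self):
        self.refresh_calls += 1
        await asyncio.sleep(0.01)  # stand-in for the network round-trip
        self._token = "fresh-token"

    async def get_token(self):
        if self._token is not None:
            return self._token
        async with self._lock:
            # Re-check inside the lock: another request may have refreshed
            if self._token is None:
                await self._do_refresh()
        return self._token

async def main():
    refresher = TokenRefresher()
    tokens = await asyncio.gather(*(refresher.get_token() for _ in range(5)))
    return refresher.refresh_calls, list(tokens)

calls, tokens = asyncio.run(main())
```

Five concurrent callers all receive the token, but the double-checked lock ensures the refresh runs exactly once.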

What the server can access

  • Network: Only dashboard.rightnowai.co and forge-api.rightnowai.co
  • Filesystem: Only reads/writes ~/.forge/tokens.json
  • No codebase access: The MCP server never reads your files. The agent passes code to it explicitly through tool parameters.

Development

Build from source

```bash
git clone https://github.com/RightNow-AI/forge-mcp-server.git
cd forge-mcp-server
npm install
npm run build
```

Run locally

```bash
npm run dev
```

Type check

```bash
npm run typecheck
```

Debug with MCP Inspector

```bash
npx @modelcontextprotocol/inspector node dist/index.js
```

This opens a web UI where you can invoke each tool, inspect inputs/outputs, and debug the server interactively.

Project structure

```text
forge-mcp-server/
├── src/
│   ├── index.ts              # Entry point (McpServer + StdioServerTransport)
│   ├── server.ts             # Registers all tools, resources, prompts
│   ├── constants.ts          # URLs, client IDs, timeouts, limits
│   ├── types.ts              # TypeScript interfaces + type guards + sanitization
│   ├── auth/
│   │   ├── oauth-client.ts   # PKCE flow, token refresh, access token management
│   │   └── token-store.ts    # ~/.forge/tokens.json read/write/clear
│   ├── api/
│   │   ├── forge-client.ts   # HTTP client for all Forge API endpoints
│   │   └── sse-consumer.ts   # SSE stream parser via native fetch + ReadableStream
│   ├── tools/                # 7 MCP tools
│   ├── resources/            # 2 MCP resources
│   └── prompts/              # 2 MCP prompts
├── .github/workflows/
│   ├── ci.yml                # Typecheck + build on push/PR
│   └── release.yml           # npm publish on version tags
├── package.json
├── tsconfig.json
└── tsup.config.ts
```

Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

  1. Fork the repo
  2. Create a branch (git checkout -b feature/my-feature)
  3. Make your changes
  4. Run npm run typecheck and npm run build
  5. Commit and push
  6. Open a pull request

License

MIT

Part of the RightNow AI ecosystem. Member of the NVIDIA Inception Program.

