io.github.RLabs-Inc/gemini-mcp

by rlabs-inc

A Gemini 3 MCP server with 30+ tools covering image, video, research, TTS, code execution, and a CLI.

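It brings Gemini 3's multimodal capabilities and code execution together in a single MCP server. Its 30+ tools cover image, video, research, and CLI use, which can noticeably reduce tool switching during development.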

README

MCP Server Gemini

A Model Context Protocol (MCP) server for integrating Google's Gemini 3 models with Claude Code, enabling powerful collaboration between both AI systems. Now with a beautiful CLI!

MCP Registry Support: Now discoverable in the official MCP ecosystem!

Features

| Feature | Description |
| --- | --- |
| Deep Research Agent | Autonomous multi-step research with web search and citations |
| Token Counting | Count tokens and estimate costs before API calls |
| Text-to-Speech | 30 unique voices, single speaker or two-speaker dialogues |
| URL Analysis | Analyze, compare, and extract data from web pages |
| Context Caching | Cache large documents for efficient repeated queries |
| YouTube Analysis | Analyze videos by URL with timestamp clipping |
| Document Analysis | PDFs, DOCX, spreadsheets with table extraction |
| 4K Image Generation | Generate images up to 4K with 10 aspect ratios |
| Multi-Turn Image Editing | Iteratively refine images through conversation |
| Video Generation | Create videos with Veo 2.0 (async with polling) |
| Code Execution | Gemini writes and runs Python code (pandas, numpy, matplotlib) |
| Google Search | Real-time web information with inline citations |
| Structured Output | JSON responses with schema validation |
| Data Extraction | Extract entities, facts, sentiment from text |
| Thinking Levels | Control reasoning depth (minimal/low/medium/high) |
| Direct Query | Send prompts to Gemini 3 Pro/Flash models |
| Brainstorming | Claude + Gemini collaborative problem-solving |
| Code Analysis | Analyze code for quality, security, performance |
| Summarization | Summarize content at different detail levels |

Quick Installation

MCP Server for Claude Code

bash
# Using npm (Recommended)
claude mcp add gemini -s user -- env GEMINI_API_KEY=YOUR_KEY npx -y @rlabs-inc/gemini-mcp

# Using bun
claude mcp add gemini -s user -- env GEMINI_API_KEY=YOUR_KEY bunx @rlabs-inc/gemini-mcp

CLI (Global Install)

bash
# Install globally
npm install -g @rlabs-inc/gemini-mcp

# Set your API key once (stored securely)
gcli config set api-key YOUR_KEY

# Now use any command!
gcli search "latest news"
gcli image "sunset over mountains" --ratio 16:9

Get your API key: Visit Google AI Studio - it's free and takes seconds!

Installation Options

bash
# With verbose logging
claude mcp add gemini -s user -- env GEMINI_API_KEY=YOUR_KEY VERBOSE=true bunx -y @rlabs-inc/gemini-mcp

# With custom output directory for generated images/videos
claude mcp add gemini -s user -- env GEMINI_API_KEY=YOUR_KEY GEMINI_OUTPUT_DIR=/path/to/output bunx -y @rlabs-inc/gemini-mcp

Available Tools

gemini-query

Direct queries to Gemini with thinking level control:

code
prompt: "Explain quantum entanglement"
model: "pro" or "flash"
thinkingLevel: "low" | "medium" | "high" (optional)
  • low: Fast responses, minimal reasoning
  • medium: Balanced (Flash only)
  • high: Deep reasoning for complex tasks (default)
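
The parameter notes above can be encoded as a small typed helper. This is only a sketch: `buildQueryArgs` and `QueryArgs` are hypothetical names, not part of the package, and whether the server itself rejects `medium` on the Pro model is not stated here.

```typescript
// Hypothetical helper, not part of @rlabs-inc/gemini-mcp.
type Model = "pro" | "flash";
type ThinkingLevel = "low" | "medium" | "high";

interface QueryArgs {
  prompt: string;
  model: Model;
  thinkingLevel: ThinkingLevel;
}

function buildQueryArgs(
  prompt: string,
  model: Model,
  thinkingLevel: ThinkingLevel = "high", // "high" is the documented default
): QueryArgs {
  // The list above marks "medium" as Flash only; failing early here is an
  // assumption of this sketch, not documented server behavior.
  if (thinkingLevel === "medium" && model !== "flash") {
    throw new Error('thinkingLevel "medium" is listed as Flash only');
  }
  return { prompt, model, thinkingLevel };
}

const queryArgs = buildQueryArgs("Explain quantum entanglement", "pro");
```

Here `queryArgs.thinkingLevel` falls back to the default because no level was passed.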

gemini-generate-image

Generate images with Nano Banana Pro (Claude can SEE them!):

code
prompt: "a futuristic city at sunset"
style: "cyberpunk" (optional)
aspectRatio: "16:9" (1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9)
imageSize: "2K" (1K, 2K, 4K)
useGoogleSearch: false (ground in real-world info)
thinkingLevel: "high" (optional - minimal, low, medium, high)
personGeneration: "ALLOW_ALL" (optional - ALLOW_ALL, ALLOW_ADULT, ALLOW_NONE)
seed: 42 (optional - for reproducible results)
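
Since the aspect ratios and sizes are closed lists, arguments can be checked before the tool is called. A minimal sketch, assuming a hypothetical `buildImageArgs` helper (not part of the package) and covering only a few of the parameters above:

```typescript
// Values copied from the parameter list above.
const ASPECT_RATIOS = [
  "1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9",
] as const;
const IMAGE_SIZES = ["1K", "2K", "4K"] as const;

type AspectRatio = (typeof ASPECT_RATIOS)[number];
type ImageSize = (typeof IMAGE_SIZES)[number];

interface ImageArgs {
  prompt: string;
  aspectRatio: AspectRatio;
  imageSize: ImageSize;
  seed?: number;
}

function isAspectRatio(value: string): value is AspectRatio {
  return (ASPECT_RATIOS as readonly string[]).indexOf(value) !== -1;
}

function isImageSize(value: string): value is ImageSize {
  return (IMAGE_SIZES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper: rejects values outside the documented lists.
function buildImageArgs(
  prompt: string,
  aspectRatio: string,
  imageSize: string,
  seed?: number,
): ImageArgs {
  if (!isAspectRatio(aspectRatio)) {
    throw new Error(`unsupported aspectRatio: ${aspectRatio}`);
  }
  if (!isImageSize(imageSize)) {
    throw new Error(`unsupported imageSize: ${imageSize}`);
  }
  const args: ImageArgs = { prompt, aspectRatio, imageSize };
  if (seed !== undefined) {
    args.seed = seed; // only sent when reproducibility is wanted
  }
  return args;
}

const imageArgs = buildImageArgs("a futuristic city at sunset", "16:9", "2K", 42);
```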

gemini-start-image-edit

Start a multi-turn image editing session:

code
prompt: "a cozy cabin in the mountains"
aspectRatio: "16:9"
imageSize: "2K"
useGoogleSearch: false
thinkingLevel: "high" (optional - minimal, low, medium, high)
personGeneration: "ALLOW_ALL" (optional - ALLOW_ALL, ALLOW_ADULT, ALLOW_NONE)
seed: 42 (optional - for reproducible results)

Returns a session ID for iterative editing.

gemini-continue-image-edit

Continue refining an image:

code
sessionId: "edit-123456789"
prompt: "add snow on the roof and make it nighttime"

gemini-end-image-edit

Close an editing session:

code
sessionId: "edit-123456789"

gemini-list-image-sessions

List all active editing sessions.
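
The four editing tools form a simple lifecycle: start, continue any number of times, then end. The sketch below only builds the ordered tool calls. `ToolCall`, `startEdit`, and `followUpCalls` are hypothetical names, and how the session ID is read out of the start response is left to the caller because the response format is not described here.

```typescript
interface ToolCall {
  tool: string;
  args: Record<string, unknown>;
}

// First call: opens the session. Its response carries the session ID.
function startEdit(prompt: string): ToolCall {
  return { tool: "gemini-start-image-edit", args: { prompt } };
}

// Once the session ID is known: one "continue" call per refinement,
// then a final "end" call so the session does not stay open.
function followUpCalls(sessionId: string, refinements: string[]): ToolCall[] {
  const calls: ToolCall[] = refinements.map((prompt) => ({
    tool: "gemini-continue-image-edit",
    args: { sessionId, prompt },
  }));
  calls.push({ tool: "gemini-end-image-edit", args: { sessionId } });
  return calls;
}

const firstCall = startEdit("a cozy cabin in the mountains");
const laterCalls = followUpCalls("edit-123456789", [
  "add snow on the roof and make it nighttime",
]);
```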

gemini-generate-video

Generate videos using Veo:

code
prompt: "a cat playing piano"
aspectRatio: "16:9" (optional)
negativePrompt: "blurry, text" (optional)

Video generation is async (takes 1-5 minutes). Use gemini-check-video to poll.

gemini-check-video

Check video generation status and download when complete:

code
operationId: "operations/xxx-xxx-xxx"
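
Because generation is asynchronous, a client typically polls gemini-check-video until the operation finishes. The loop below is a sketch under stated assumptions: `VideoStatus` is a hypothetical response shape (the real response format is not documented here), `check` stands in for whatever actually invokes the tool, and the 15-second interval and 40-attempt cap are illustrative values chosen to cover the 1-5 minute range mentioned above.

```typescript
// Hypothetical response shape for this sketch only.
interface VideoStatus {
  done: boolean;
  filePath?: string;
}

type CheckVideo = (operationId: string) => Promise<VideoStatus>;

async function waitForVideo(
  operationId: string,
  check: CheckVideo,
  intervalMs: number = 15000,
  maxAttempts: number = 40,
  // Injectable so the loop can be exercised without real delays.
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<VideoStatus> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await check(operationId);
    if (status.done) {
      return status;
    }
    // Wait between checks, but not after the final attempt.
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`video ${operationId} not ready after ${maxAttempts} checks`);
}
```

Capping the number of attempts keeps a stuck operation from blocking the caller indefinitely.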

gemini-analyze-code

Analyze code for issues:

code
code: "function foo() { ... }"
language: "typescript" (optional)
focus: "quality" | "security" | "performance" | "bugs" | "general"

gemini-analyze-text

Analyze text content:

code
text: "Your text here..."
type: "sentiment" | "summary" | "entities" | "key-points" | "general"

gemini-brainstorm

Collaborative brainstorming:

code
prompt: "How could we implement real-time collaboration?"
claudeThoughts: "I think we should use WebSockets..."
maxRounds: 3 (optional)

gemini-summarize

Summarize content:

code
content: "Long text to summarize..."
length: "brief" | "moderate" | "detailed"
format: "paragraph" | "bullet-points" | "outline"

gemini-run-code

Let Gemini write and execute Python code:

code
prompt: "Calculate the first 50 prime numbers and plot them"
data: "optional CSV data to analyze" (optional)

Supports libraries: numpy, pandas, matplotlib, scipy, scikit-learn, tensorflow, and more. Generated charts are saved to the output directory and returned as images.

gemini-search

Real-time web search with citations:

code
query: "What happened in tech news this week?"
returnCitations: true (default)

Returns grounded responses with inline citations and source URLs.

gemini-structured

Get JSON responses matching a schema:

code
prompt: "Extract the meeting details from this email..."
schema: '{"type":"object","properties":{"date":{"type":"string"},"attendees":{"type":"array"}}}'
useGoogleSearch: false (optional)
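
In the example above the schema appears to be passed as a JSON string. One way to avoid quoting mistakes is to write it as an object and serialize it. This is a sketch: `meetingSchema` and `structuredArgs` are illustrative names only.

```typescript
// Same schema as in the example above, written as a plain object.
const meetingSchema = {
  type: "object",
  properties: {
    date: { type: "string" },
    attendees: { type: "array" },
  },
};

const structuredArgs = {
  prompt: "Extract the meeting details from this email...",
  // JSON.stringify keeps key order, so this matches the string shown above.
  schema: JSON.stringify(meetingSchema),
  useGoogleSearch: false,
};
```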

gemini-extract

Convenience tool for common extraction patterns:

code
text: "Your text to analyze..."
extractType: "entities" | "facts" | "summary" | "keywords" | "sentiment" | "custom"
customFields: "name, date, amount" (for custom extraction)

gemini-youtube

Analyze YouTube videos directly:

code
url: "https://www.youtube.com/watch?v=..."
question: "What happens at 2:30?"
startTime: "1m30s" (optional, for clipping)
endTime: "5m00s" (optional, for clipping)
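
The clip offsets above use a minutes-and-seconds form such as "1m30s". A small converter from seconds is sketched below. `toClipTime` is a hypothetical helper, and whether the tool accepts other formats (hours, bare seconds) is not stated, so the sketch only ever emits the form shown above.

```typescript
// Hypothetical helper: formats whole seconds as "<minutes>m<seconds>s",
// zero-padding seconds to two digits as in "5m00s".
function toClipTime(totalSeconds: number): string {
  if (totalSeconds < 0 || Math.floor(totalSeconds) !== totalSeconds) {
    throw new Error("totalSeconds must be a non-negative whole number");
  }
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  const padded = seconds < 10 ? "0" + seconds : String(seconds);
  return `${minutes}m${padded}s`;
}

const clipArgs = {
  url: "https://www.youtube.com/watch?v=...",
  question: "What happens at 2:30?",
  startTime: toClipTime(90),  // "1m30s"
  endTime: toClipTime(300),   // "5m00s"
};
```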

gemini-youtube-summary

Quick video summarization:

code
url: "https://www.youtube.com/watch?v=..."
style: "brief" | "detailed" | "bullet-points" | "chapters"

gemini-analyze-document

Analyze PDFs and documents:

code
filePath: "/path/to/document.pdf"
question: "Summarize the key findings"
mediaResolution: "low" | "medium" | "high"

gemini-summarize-pdf

Quick PDF summarization:

code
filePath: "/path/to/document.pdf"
style: "brief" | "detailed" | "outline" | "key-points"

gemini-extract-tables

Extract tables from documents:

code
filePath: "/path/to/document.pdf"
outputFormat: "markdown" | "csv" | "json"

Workflow: Claude + Gemini

The killer combination for development:

| Claude | Gemini |
| --- | --- |
| Complex logic | Frontend/UI |
| Architecture | Visual components |
| Backend code | Image generation |
| Integration | React/CSS styling |
| Reasoning | Creative generation |

Example workflow:

  1. Ask Claude to design the backend API
  2. Use gemini-generate-image for UI mockups
  3. Ask Gemini to generate React components via gemini-query
  4. Use multi-turn editing to refine visuals
  5. Let Claude wire everything together

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| GEMINI_API_KEY | Yes | - | Your Google Gemini API key |
| GEMINI_OUTPUT_DIR | No | ./gemini-output | Where to save generated files |
| GEMINI_MODEL | No | - | Override model for init test |
| GEMINI_PRO_MODEL | No | gemini-3-pro-preview | Pro model (Gemini 3) |
| GEMINI_FLASH_MODEL | No | gemini-3-flash-preview | Flash model (Gemini 3) |
| GEMINI_IMAGE_MODEL | No | gemini-3-pro-image-preview | Image model (Nano Banana Pro) |
| GEMINI_IMAGE_THINKING_LEVEL | No | high | Default thinking level for image generation (minimal, low, medium, high) |
| GEMINI_VIDEO_MODEL | No | veo-2.0-generate-001 | Video model |
| VERBOSE | No | false | Enable verbose logging |
| QUIET | No | false | Minimize logging |
| GEMINI_ENABLED_TOOLS | No | - | Comma-separated list of tool groups to load (e.g., query,search,image-gen) |
| GEMINI_TOOL_PRESET | No | - | Preset profile: minimal, text, image, research, media, full |
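
The defaults listed for these variables can be read as a resolution step like the one below. This is a sketch, not the server's actual code: `resolveConfig` and `ServerConfig` are hypothetical names, and treating only the literal string "true" as enabling VERBOSE or QUIET is an assumption.

```typescript
type Env = Record<string, string | undefined>;

interface ServerConfig {
  apiKey: string;
  outputDir: string;
  proModel: string;
  flashModel: string;
  imageModel: string;
  imageThinkingLevel: string;
  videoModel: string;
  verbose: boolean;
  quiet: boolean;
}

// Hypothetical resolver: applies the documented defaults to an env map.
function resolveConfig(env: Env): ServerConfig {
  const apiKey = env["GEMINI_API_KEY"];
  if (!apiKey) {
    throw new Error("GEMINI_API_KEY is required");
  }
  return {
    apiKey,
    outputDir: env["GEMINI_OUTPUT_DIR"] ?? "./gemini-output",
    proModel: env["GEMINI_PRO_MODEL"] ?? "gemini-3-pro-preview",
    flashModel: env["GEMINI_FLASH_MODEL"] ?? "gemini-3-flash-preview",
    imageModel: env["GEMINI_IMAGE_MODEL"] ?? "gemini-3-pro-image-preview",
    imageThinkingLevel: env["GEMINI_IMAGE_THINKING_LEVEL"] ?? "high",
    videoModel: env["GEMINI_VIDEO_MODEL"] ?? "veo-2.0-generate-001",
    // Assumption: only the exact string "true" switches these on.
    verbose: env["VERBOSE"] === "true",
    quiet: env["QUIET"] === "true",
  };
}

const exampleConfig = resolveConfig({
  GEMINI_API_KEY: "YOUR_KEY",
  GEMINI_OUTPUT_DIR: "/path/to/output",
});
```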

Tool Configuration

By default, all 37 tools are loaded. To reduce context usage, configure which tools to load:

Available Presets

| Preset | Tool Groups |
| --- | --- |
| minimal | query, brainstorm |
| text | query, brainstorm, analyze, summarize, structured |
| image | query, image-gen, image-edit, image-analyze |
| research | query, search, deep-research, url-context, document |
| media | query, image-gen, image-edit, image-analyze, video-gen, youtube, speech |
| full | All 18 tool groups (default) |

Using Presets

bash
# Minimal - query and brainstorm
GEMINI_TOOL_PRESET=minimal

# Text processing
GEMINI_TOOL_PRESET=text  # query, brainstorm, analyze, summarize, structured

# Image workflows
GEMINI_TOOL_PRESET=image  # query, image-gen, image-edit, image-analyze

# Research workflows
GEMINI_TOOL_PRESET=research  # query, search, deep-research, url-context, document

Using Explicit Tool Lists

bash
# Only specific tools
GEMINI_ENABLED_TOOLS=query,search,image-gen

Combining Preset + Explicit

bash
# Start with preset, add extras
GEMINI_TOOL_PRESET=minimal
GEMINI_ENABLED_TOOLS=search,image-gen  # Adds to minimal preset
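
The combination rule above (explicit groups are added to the preset) can be sketched as follows. `resolveGroups` is a hypothetical function written from the README's description, not taken from the server source, and rejecting unknown preset names is an assumption.

```typescript
// Group names and presets as listed in this README.
const ALL_GROUPS: string[] = [
  "query", "brainstorm", "analyze", "summarize", "image-gen", "image-edit",
  "video-gen", "code-exec", "search", "structured", "youtube", "document",
  "url-context", "cache", "speech", "token-count", "deep-research", "image-analyze",
];

const PRESETS: Record<string, string[]> = {
  minimal: ["query", "brainstorm"],
  text: ["query", "brainstorm", "analyze", "summarize", "structured"],
  image: ["query", "image-gen", "image-edit", "image-analyze"],
  research: ["query", "search", "deep-research", "url-context", "document"],
  media: ["query", "image-gen", "image-edit", "image-analyze", "video-gen", "youtube", "speech"],
  full: ALL_GROUPS,
};

function resolveGroups(preset?: string, enabledTools?: string): string[] {
  const explicit = (enabledTools ?? "")
    .split(",")
    .map((name) => name.trim())
    .filter((name) => name !== "");

  // Nothing configured: everything loads by default.
  if (preset === undefined && explicit.length === 0) {
    return ALL_GROUPS.slice();
  }

  let base: string[] = [];
  if (preset !== undefined) {
    const groups = PRESETS[preset];
    if (groups === undefined) {
      throw new Error(`unknown preset: ${preset}`); // assumption of this sketch
    }
    base = groups;
  }

  // Union of preset and explicit groups, preserving order, without duplicates.
  const result: string[] = [];
  for (const name of base.concat(explicit)) {
    if (result.indexOf(name) === -1) {
      result.push(name);
    }
  }
  return result;
}

const combined = resolveGroups("minimal", "search,image-gen");
// combined is ["query", "brainstorm", "search", "image-gen"]
```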

Available Tool Groups

| Group | Tools |
| --- | --- |
| query | gemini-query |
| brainstorm | gemini-brainstorm |
| analyze | gemini-analyze-code, gemini-analyze-text |
| summarize | gemini-summarize |
| image-gen | gemini-generate-image, gemini-image-prompt |
| image-edit | gemini-start-image-edit, gemini-continue-image-edit, gemini-end-image-edit, gemini-list-image-sessions |
| video-gen | gemini-generate-video, gemini-check-video |
| code-exec | gemini-run-code |
| search | gemini-search |
| structured | gemini-structured, gemini-extract |
| youtube | gemini-youtube, gemini-youtube-summary |
| document | gemini-analyze-document, gemini-summarize-pdf, gemini-extract-tables |
| url-context | gemini-analyze-url, gemini-compare-urls, gemini-extract-from-url |
| cache | gemini-create-cache, gemini-query-cache, gemini-list-caches, gemini-delete-cache |
| speech | gemini-speak, gemini-dialogue, gemini-list-voices |
| token-count | gemini-count-tokens |
| deep-research | gemini-deep-research, gemini-check-research, gemini-research-followup |
| image-analyze | gemini-analyze-image |
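
To see how much a given selection trims the tool list, the group table can be expanded into tool names. The mapping below is copied from the table. `toolsFor` is a hypothetical helper, not part of the package.

```typescript
const TOOL_GROUPS: Record<string, string[]> = {
  "query": ["gemini-query"],
  "brainstorm": ["gemini-brainstorm"],
  "analyze": ["gemini-analyze-code", "gemini-analyze-text"],
  "summarize": ["gemini-summarize"],
  "image-gen": ["gemini-generate-image", "gemini-image-prompt"],
  "image-edit": [
    "gemini-start-image-edit", "gemini-continue-image-edit",
    "gemini-end-image-edit", "gemini-list-image-sessions",
  ],
  "video-gen": ["gemini-generate-video", "gemini-check-video"],
  "code-exec": ["gemini-run-code"],
  "search": ["gemini-search"],
  "structured": ["gemini-structured", "gemini-extract"],
  "youtube": ["gemini-youtube", "gemini-youtube-summary"],
  "document": ["gemini-analyze-document", "gemini-summarize-pdf", "gemini-extract-tables"],
  "url-context": ["gemini-analyze-url", "gemini-compare-urls", "gemini-extract-from-url"],
  "cache": ["gemini-create-cache", "gemini-query-cache", "gemini-list-caches", "gemini-delete-cache"],
  "speech": ["gemini-speak", "gemini-dialogue", "gemini-list-voices"],
  "token-count": ["gemini-count-tokens"],
  "deep-research": ["gemini-deep-research", "gemini-check-research", "gemini-research-followup"],
  "image-analyze": ["gemini-analyze-image"],
};

// Expand a list of group names into the tool names they load.
function toolsFor(groups: string[]): string[] {
  const tools: string[] = [];
  for (const group of groups) {
    const names = TOOL_GROUPS[group];
    if (names === undefined) {
      throw new Error(`unknown tool group: ${group}`);
    }
    for (const name of names) {
      tools.push(name);
    }
  }
  return tools;
}

const groupCount = Object.keys(TOOL_GROUPS).length;              // 18 groups
const toolCount = toolsFor(Object.keys(TOOL_GROUPS)).length;     // 37 tools
const minimalTools = toolsFor(["query", "brainstorm"]);          // 2 tools
```

The totals agree with the 18 groups and 37 tools quoted in this README, and the minimal preset reduces the loaded set to two tools.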

Manual Installation

Global Install

bash
# Using npm
npm install -g @rlabs-inc/gemini-mcp

# Using bun
bun install -g @rlabs-inc/gemini-mcp

Claude Code Configuration

json
{
  "gemini": {
    "command": "npx",
    "args": ["-y", "@rlabs-inc/gemini-mcp"],
    "env": {
      "GEMINI_API_KEY": "your-api-key",
      "GEMINI_OUTPUT_DIR": "/path/to/save/files"
    }
  }
}

Troubleshooting

Rate Limits (429 Errors)

If you're hitting rate limits on the free tier:

  • Set GEMINI_MODEL=gemini-3-flash-preview to use Flash for init (higher limits)
  • Or upgrade to a paid plan

Connection Issues

  1. Verify your API key at Google AI Studio
  2. Check server status: claude mcp list
  3. Try with verbose logging: VERBOSE=true

Image/Video Issues

  • Ensure your API key has access to image/video generation
  • Check output directory permissions
  • Files save to GEMINI_OUTPUT_DIR (default: ./gemini-output)
  • For 4K images, generation takes longer

Previous Versions

0.7.2

Beautiful CLI with Themes! Use Gemini directly from your terminal:

bash
# Install globally
npm install -g @rlabs-inc/gemini-mcp

# Set your API key once
gcli config set api-key YOUR_KEY

# Generate images, videos, search, research, and more!
gcli image "a cat astronaut" --size 4K
gcli search "latest AI news"
gcli research "quantum computing applications" --wait
gcli speak "Hello world" --voice Puck

5 Beautiful Themes: terminal, neon, ocean, forest, minimal

CLI Commands:

  • gcli query - Direct Gemini queries with thinking levels
  • gcli search - Real-time web search with citations
  • gcli research - Deep research agent
  • gcli image - Generate images (up to 4K)
  • gcli video - Generate videos with Veo
  • gcli speak - Text-to-speech with 30 voices
  • gcli tokens - Count tokens and estimate costs
  • gcli config - Manage settings

  • v0.6.x: Deep Research, Token Counting, TTS, URL analysis, Context Caching
  • v0.5.x: 30+ tools, YouTube analysis, Document analysis
  • v0.4.x: Code execution, Google Search
  • v0.3.x: Thinking levels, Structured output, 4K images
  • v0.2.x: Image/Video generation with Veo


Development

bash
git clone https://github.com/rlabs-inc/gemini-mcp.git
cd gemini-mcp
bun install
bun run build
bun run dev -- --verbose

Scripts

| Command | Description |
| --- | --- |
| bun run build | Build for production |
| bun run dev | Development mode with watch |
| bun run typecheck | Type check without emitting |
| bun run format | Format with Prettier |
| bun run lint | Lint with ESLint |

License

MIT License


Made with Claude + Gemini working together
