Gemini Vision
Content & Creative, by Artin0123
Analyze images and videos with Gemini to get fast, reliable visual insights. Handle content from URLs and YouTube links. Summarize scenes, identify objects, and extract key details for reports or automation. This is the remote version; check the local branch on GitHub to use the local tools.
README
image-mcp-server-gemini
This is the remote server; use the local version for local images and videos.
Features
- Analyze one or more image URLs with a single tool call.
- Analyze YouTube videos without downloading files locally.
- Supply an API key and optionally override the Gemini model via environment variables.
- File size limit: Images are limited to 16 MB to ensure fast processing.
- YouTube videos: No size limit, as they are streamed directly by the Gemini API.
Installation
Installing via Smithery
Install the server in Claude Desktop:
npx -y @smithery/cli install @Artin0123/gemini-image-mcp-server --client claude
Manual Installation
# Clone the repository
git clone https://github.com/Artin0123/gemini-vision-mcp.git
cd gemini-vision-mcp
# Install dependencies
npm install
# Compile TypeScript to dist/
npm run build
Configuration
Create a Gemini API key in Google AI Studio and provide GEMINI_API_KEY to the server.
{
  "mcpServers": {
    "gemini-media": {
      "command": "node",
      "args": ["/absolute/path/to/gemini-vision-mcp/dist/index.js"],
      "env": {
        "GEMINI_API_KEY": "your_api_key_here",
        "GEMINI_MODEL": "models/gemini-flash-lite-latest"
      }
    }
  }
}
If no key is supplied, the server can still start (handy for automated scans), but any tool invocation will return a configuration error until a valid API key is configured.
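The deferred key check described above could look roughly like the sketch below. It is not taken from the repository: `KeyCheck` and `checkApiKey` are hypothetical names, and the error wording is invented for illustration. The only point is that validation happens when a tool is invoked, not at startup.

```typescript
// Hypothetical helper: the names and the error text are assumptions,
// not the server's actual implementation.
type KeyCheck =
  | { ok: true; apiKey: string }
  | { ok: false; error: string };

// Intended to run at tool-invocation time, so the server can still start
// (for example during an automated scan) when no key is present.
function checkApiKey(env: Record<string, string | undefined>): KeyCheck {
  const apiKey = env.GEMINI_API_KEY?.trim();
  if (!apiKey) {
    return {
      ok: false,
      error: "Configuration error: GEMINI_API_KEY is not set.",
    };
  }
  return { ok: true, apiKey };
}
```

A tool handler would call `checkApiKey(process.env)` first and return the `error` string to the client when `ok` is false.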
Model override
The server defaults to models/gemini-flash-lite-latest. Override it by either:
- Setting the GEMINI_MODEL environment variable, or
- Providing modelName in the Smithery/SDK configuration schema.
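A minimal sketch of how the two override paths might be combined is shown below. The README does not say which source wins when both are set, so the precedence here (modelName first, then GEMINI_MODEL, then the default) is an assumption, and `ModelSources` and `resolveModel` are hypothetical names.

```typescript
// Default taken from the README; everything else here is illustrative.
const DEFAULT_MODEL = "models/gemini-flash-lite-latest";

interface ModelSources {
  // Value supplied through the Smithery/SDK configuration, if any.
  modelName?: string;
  // Environment variables, e.g. process.env.
  env?: Record<string, string | undefined>;
}

// Assumed precedence: explicit config, then environment, then the default.
function resolveModel(sources: ModelSources): string {
  const fromConfig = sources.modelName?.trim();
  const fromEnv = sources.env?.GEMINI_MODEL?.trim();
  return fromConfig || fromEnv || DEFAULT_MODEL;
}
```

Empty or whitespace-only values fall through to the next source, so an unset variable never produces an empty model name.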
Available tools
- analyze_image: Analyze one or more image URLs. Maximum file size: 16 MB per image.
- analyze_youtube_video: Analyze a YouTube video from URL. No size limit.
Images are downloaded from their URLs and processed with a 16 MB size limit to keep response times fast. A file that exceeds this limit produces an error message stating its actual size.
YouTube videos are streamed directly by the Gemini API without being downloaded, so there is no size restriction.
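The size check for downloaded images can be sketched as follows. This is an illustration under stated assumptions, not the server's code: it treats 16 MB as 16 × 1024 × 1024 bytes, and `checkImageSize`, `formatMegabytes`, and the message wording are hypothetical.

```typescript
// Assumption: "16 MB" is interpreted as 16 MiB.
const MAX_IMAGE_BYTES = 16 * 1024 * 1024;

// Format a byte count as megabytes with two decimal places.
function formatMegabytes(bytes: number): string {
  return (bytes / (1024 * 1024)).toFixed(2);
}

// Returns null when the image is within the limit, otherwise an error
// message that reports the actual file size.
function checkImageSize(byteLength: number): string | null {
  if (byteLength <= MAX_IMAGE_BYTES) {
    return null;
  }
  return `Image is ${formatMegabytes(byteLength)} MB, which exceeds the 16 MB limit.`;
}
```

For example, `checkImageSize(20 * 1024 * 1024)` returns a message reporting 20.00 MB, while a file of exactly 16 MB passes.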
Prompt examples
Please analyze this product photo: https://teimg-bgr.pages.dev/file/mvYT6KeF.webp
Extract the main talking points from this clip: https://www.youtube.com/watch?v=dQw4w9WgXcQ
Development
npm install
npm test
npm run build
The test suite exercises URL forwarding, MIME handling, and configuration fallbacks.
License
MIT