io.github.GoogleCloudPlatform/gemini-cloud-assist-mcp
DevOps · by googlecloudplatform
An MCP server for understanding, managing, and troubleshooting your GCP environment, helping models diagnose cloud resources, configuration, and runtime state more efficiently.
README
Gemini Cloud Assist MCP server
This server connects Model Context Protocol (MCP) clients such as the Gemini CLI to the Gemini Cloud Assist APIs. It allows you to use natural language to understand, manage, and troubleshoot your Google Cloud environment directly from the local command line.
[!NOTE] The Google Cloud Platform Terms of Service (available at https://cloud.google.com/terms/) and the Data Processing and Security Terms (available at https://cloud.google.com/terms/data-processing-terms) do not apply to any component of the Gemini Cloud Assist MCP Server software.
To learn more about Gemini Cloud Assist, see the Gemini Cloud Assist overview in the Google Cloud documentation.
✨ Key features
- Create and run investigations: Create and run Cloud Assist investigations to find the root cause of complex issues.
- Dig deeper and iterate on investigations: Get more details about investigation outcomes and add observations to refine the analysis.
Quick start
Before you begin, ensure you have the following set up:
- Node.js (v20 or later).
- Git.
- Google Cloud SDK installed and configured.
- A Google Cloud project.
- The following IAM roles on your user account:
  - roles/serviceusage.serviceUsageAdmin: required to enable the Cloud Assist APIs.
  - roles/geminicloudassist.user: required to make requests to the Cloud Assist APIs.
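If you administer the project yourself, the two roles can be granted with the standard gcloud IAM command. This is a sketch: `PROJECT_ID` and `USER_EMAIL` are placeholders you must substitute with your own values.

```shell
# Sketch: grant the two required roles to your user account.
# PROJECT_ID and USER_EMAIL are placeholders -- substitute your own values.
PROJECT_ID="my-project"        # hypothetical project ID
USER_EMAIL="me@example.com"    # hypothetical user account

for role in roles/serviceusage.serviceUsageAdmin roles/geminicloudassist.user; do
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="user:$USER_EMAIL" \
    --role="$role"
done
```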
Step 1: Authenticate to Google Cloud
The Gemini Cloud Assist MCP server uses local Application Default Credentials (ADC) to securely authenticate to Google Cloud. To set up ADC, run the following gcloud commands:
```shell
# Authenticate your user account to the gcloud CLI
gcloud auth login

# Set up Application Default Credentials for the server.
# This allows the MCP server to securely make Google Cloud API calls on your behalf.
gcloud auth application-default login
```
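To confirm ADC is working before wiring up a client, you can mint a short-lived token and, optionally, pin a quota project. Both commands are part of the standard gcloud CLI; `my-project` is a placeholder.

```shell
# Print an access token to confirm ADC is set up (output is a short-lived token).
gcloud auth application-default print-access-token

# Optional: set the quota project that API usage is attributed to.
gcloud auth application-default set-quota-project my-project   # hypothetical project ID
```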
Step 2: Configure your MCP client
Below is the standard configuration snippet. It tells the client to use npx to download and run the latest version of the MCP server on demand. Paste it into the MCP client of your choice; we recommend the Gemini CLI for the best experience.
MCP config
```json
"mcpServers": {
  "GeminiCloudAssist": {
    "command": "npx",
    "args": ["-y", "@google-cloud/gemini-cloud-assist-mcp@latest"],
    "timeout": 600000
  }
}
```
Setup instructions for MCP clients
Gemini CLI
Option 1 (recommended): Extension installation
Install the MCP server as a Gemini CLI extension:
```shell
gemini extensions install https://github.com/GoogleCloudPlatform/gemini-cloud-assist-mcp
```
Validate successful installation by running:
```shell
gemini extensions list
```
Option 2: Global installation
Add the MCP config to your ~/.gemini/settings.json file. This gives you access to the MCP tools in every Gemini CLI session.
Option 3: Project-level installation
Add the MCP config to your /path/to/project/.gemini/settings.json file. This gives you access to the MCP tools only in Gemini CLI sessions started under that project folder.
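As a sketch, the project-level file from Option 3 can be created like this. The path and key layout mirror the MCP config shown earlier; `PROJECT_DIR` is a placeholder for your project root.

```shell
# Sketch: write a project-level .gemini/settings.json containing the MCP config.
# PROJECT_DIR is a placeholder -- point it at your project root.
PROJECT_DIR="${PROJECT_DIR:-.}"
mkdir -p "$PROJECT_DIR/.gemini"
cat > "$PROJECT_DIR/.gemini/settings.json" <<'EOF'
{
  "mcpServers": {
    "GeminiCloudAssist": {
      "command": "npx",
      "args": ["-y", "@google-cloud/gemini-cloud-assist-mcp@latest"],
      "timeout": 600000
    }
  }
}
EOF
echo "Wrote $PROJECT_DIR/.gemini/settings.json"
```

If the file already exists, merge the `mcpServers` entry into it by hand instead of overwriting.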
Cursor
Claude Desktop
- Launch the Claude Desktop application.
- Navigate to Settings > Developer.
- Click the Edit config button to open the claude_desktop_config.json file.
- Add the MCP config to the claude_desktop_config.json file.
- Save the file and restart Claude Desktop.
Claude Code
```shell
claude mcp add GeminiCloudAssist npx -y @google-cloud/gemini-cloud-assist-mcp@latest
```
See Anthropic's documentation for detailed instructions.
Cline
Update cline_mcp_settings.json with the MCP config.
See Cline's documentation for detailed instructions.
🛠️ MCP tools
- create_investigation: Creates a new Gemini Cloud Assist investigation.
- run_investigation: Runs an existing Gemini Cloud Assist investigation that you created.
- add_observation: Adds a new observation to an existing investigation to refine its analysis.
- fetch_investigation: Gets investigation details or lists investigations in your project.
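You can also inspect the tool list without any MCP client by speaking the protocol to the server over stdio by hand. This is a sketch: it assumes npx can fetch the package, and the JSON-RPC message shapes follow the MCP specification's initialize handshake and `tools/list` request.

```shell
# Sketch: initialize the server over stdio, then request its tool list.
(
  printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"manual-check","version":"0.0.0"}}}'
  printf '%s\n' '{"jsonrpc":"2.0","method":"notifications/initialized"}'
  printf '%s\n' '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}'
) | npx -y @google-cloud/gemini-cloud-assist-mcp@latest
```

The second response line should contain the four tools listed above.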
Examples
This section shows examples of tasks you can perform with the Cloud Assist MCP server from the Gemini CLI, including example prompts and responses.
Create and run investigations

Using 1 MCP server (Ctrl+T to view descriptions)
╭───────────────────────────────────────────────────────────────────────────────────╮
│ > Investigate the pending pods issue in my test-cluster-1 cluster │
╰───────────────────────────────────────────────────────────────────────────────────╯
✦ I can help with that. I'll start an investigation into the pending pods issue in
your test-cluster-1 cluster. To do that, I'll run a few commands to get the
information I need to start the investigation.
╭──────────────────────────────────────────────────────────────────────────────────╮
│ ✔ Shell gcloud config get-value project (Getting the configured GCP project.) │
│ ... │
│ ... │
╰──────────────────────────────────────────────────────────────────────────────────╯
✦ I'll now create an investigation into the pending pods issue in your test-cluster-1 GKE cluster.
╭──────────────────────────────────────────────────────────────────────────────────╮
│ ✔ create_investigation (gemini-cloud-assist-mcp MCP Server) create_investigati… │
│ │
│ ## Gemini Cloud Assist Investigation │
│ │
│ **Name**: [Gemini CLI] Pending pods in test-cluster-1 GKE cluster │
│ ... │
│ ... │
╰──────────────────────────────────────────────────────────────────────────────────╯
✦ I've created the investigation. Now I'll run it to analyze the issue.
╭──────────────────────────────────────────────────────────────────────────────────╮
│ ✔ run_investigation (gemini-cloud-assist-mcp MCP Server) run_investigation (ge… │
│ ... │
│ ... │
╰──────────────────────────────────────────────────────────────────────────────────╯
✦ The investigation is complete. Here's what I found:
The main problem is that your GKE cluster `test-cluster-1` is out of resources. It
has only one node, and that node is full. This is causing new pods to get stuck in
a "pending" state because there's no room for them.
The investigation also revealed that Node Auto-Provisioning (NAP) is disabled. This
is a key feature that allows the cluster to automatically add more nodes when it's
running low on resources. Because NAP is disabled, the cluster can't scale up to
meet the demand.
I have two recommendations to fix this:
1. Enable Node Auto-Provisioning for the test-cluster-1 cluster. This is the best
long-term solution because it will allow the cluster to automatically scale up
and down as needed.
2. Manually add more nodes to the cluster. This is a good short-term solution that
will immediately relieve the resource pressure and allow the pending pods to
be scheduled.
I recommend that you start by enabling Node Auto-Provisioning. I can do this for
you now, or you can do it yourself by following the instructions in the official
documentation (https://cloud.google.com/kubernetes-engine/docs/how-to/node-auto-pro
visioning#enable).
What would you like to do?
✦ The investigation is complete. Here's a summary of the findings:
Primary Cause:
* Ephemeral Storage Exhaustion: The main reason for the pod crashes is
that the GKE nodes are running out of ephemeral storage. This is
causing a DiskPressure condition on the nodes, which in turn leads to
the kubelet evicting pods to free up space.
╭───────────────────────────────────────────────────────────────────────────────────╮
│ > Do you know which pod is consuming all the storage space? │
╰───────────────────────────────────────────────────────────────────────────────────╯
✦ That's a great question. The investigation points to the emptyDir volume
in the gke-001-deployment pods. I can check the storage usage
inside one of the running pods to confirm this.
First, I'll list the pods to find one that is currently running.
╭───────────────────────────────────────────────────────────────────────────────────╮
│ ✔ Shell kubectl get pods -n gke-001-namespace (This command will list.. │
│ ... │
│ ... │
╰───────────────────────────────────────────────────────────────────────────────────╯
✦ That's unexpected. The /data directory in that pod is only using 4.0K,
which is negligible. This contradicts the hypothesis that the emptyDir
volume is the primary consumer of ephemeral storage.
This means something else on the node is consuming the storage. It could
be logs, or another process running on the node that isn't immediately
obvious.
To figure this out, I'll need to get a shell on the node itself and
investigate the storage usage from there. I'll start by getting an
interactive shell on the node
gke-001-cluster-gke-001-node-pool-001.
╭───────────────────────────────────────────────────────────────────────────────────╮
│ ✔ Shell gcloud compute ssh gke-001-cluster-gke-001-node-pool-001… │
│ ... │
│ ... │
│ ERROR: (gcloud.compute.ssh) [/usr/local/bin/ssh] exited │
│ with return code [255]. │
╰───────────────────────────────────────────────────────────────────────────────────╯
✦ It seems I can't directly SSH into the node. This is likely due to
firewall rules or the node not having an external IP address.
However, I can still investigate the node's storage. I'll add an
observation to the investigation, noting that the emptyDir volume is not
the cause of the storage issue, and then re-run the investigation. This
will allow Gemini to analyze the node's logs and other signals to
pinpoint the true source of the storage consumption.
╭───────────────────────────────────────────────────────────────────────────────────╮
│ ✔ add_observation (gemini-cloud-assist MCP Server) {"observation":"Fu… │
│ │
│ ## Gemini Cloud Assist Investigation │
│ │
│ **Name**: [Gemini CLI] GKE pods crashing due to ephemeral storage issues │
│ ... │
│ ... │
╰───────────────────────────────────────────────────────────────────────────────────╯
✦ I've added the new information to the investigation. Now I'll re-run the
analysis. This will take the new observation into account and should help
us identify the real source of the storage consumption.
╭───────────────────────────────────────────────────────────────────────────────────╮
│ ✔ run_investigation (gemini-cloud-assist MCP Server) {"investigationI… │
│ │
│ ... │
│ ... │
╰───────────────────────────────────────────────────────────────────────────────────╯
✦ The new investigation results are in.
Key Findings:
* `emptyDir` Ruled Out: The investigation confirms our finding that the
emptyDir volume is not the cause of the storage pressure.
* Container Logs are the Likely Culprit: The analysis strongly suggests
that the ephemeral storage is being consumed by container logs. The
kubelet is failing to reopen log files due to "no space left on
device," which is a clear indicator of this issue.
* Another Deployment is Implicated: The investigation also points to
another deployment, gke-002-deployment, as a potential source of the
problem. The logs mention that this deployment is "configured to fill
up disk."
╭───────────────────────────────────────────────────────────────────────────────────╮
│ > Give me an overview of my GKE node pools. For each one, show me its │
│ cluster and MIGs │
╰───────────────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────────────────────────────────────╮
│ ✔ search_and_analyze_gcp_resources (gemini-cloud-assist-mcp MCP Server) s… │
│ │
│ Based on the output of the `container.googleapis.com` API │
│ calls, here is an overview of your GKE node pools, including │
│ their associated clusters and Managed Instance Groups (MIGs): │
│ │
│ * **Cluster: `cluster-1` (Zone: `us-central1-a`)** │
│ * **Node Pool:** `default-pool` │
│ * **Machine Type:** `custom-1-3840` │
│ * **Initial Node Count:** 3 │
│ * **Autoscaling:** Enabled (min: 1, max: 5) │
│ * **MIGs:** `gke-cluster-1-default-pool-001-grp` │
│ │
│ * **Cluster: `anthos-cluster-1` (Zone: `us-central1-c`)** │
│ * **Node Pool:** `default-pool` │
│ * **Machine Type:** `e2-standard-2` │
│ * **Initial Node Count:** 3 │
│ * **Autoscaling:** Enabled (min: 2, max: 3) │
│ * **MIGs:** `gke-anthos-cluster-default-pool-001-grp` │
│ │
│ * **Cluster: `cluster-2` (Zone: `us-central1-c`)** │
│ * **Node Pool:** `default-pool-1` │
│ * **Machine Type:** `e2-micro` │
│ * **Initial Node Count:** 3 │
│ * **Autoscaling:** Enabled (max: 5) │
│ * **MIGs:** `gke-cluster-2-default-pool-001-grp` │
╰───────────────────────────────────────────────────────────────────────────────────╯
✦ OK. Here's an overview of your GKE node pools, their clusters, and MIGs. Is
there anything else I can help you with?
╭───────────────────────────────────────────────────────────────────────────────────╮
│ > Tell me details about each MIG │
╰───────────────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────────────────────────────────────╮
│ ✔ search_and_analyze_gcp_resources (gemini-cloud-assist-mcp MCP Server) s… │
│ │
│ I found the following details for the requested Managed │
│ Instance Groups (MIGs): │
│ │
│ * **gke-cluster-1-default-pool-001-grp**: │
│ * **Region/Zone**: us-central1-a │
│ * **Creation Timestamp**: 2024-03-25T13:57:25.588-07:00 │
│ * **Current Actions**: 2 instances with no current │
│ action │
│ * **Target Size**: 2 │
│ * **Status**: Stable │
│ * **Instance Template**: │
│ `gke-cluster-1-default-pool-001` │
│ * **Update Policy**: Opportunistic, minimal action │
│ REPLACE, max surge 1, max unavailable 1 │
│ │
│ * **gke-anthos-cluster-default-pool-001-grp**: │
│ * **Region/Zone**: us-central1-c │
│ * **Creation Timestamp**: 2024-04-03T09:09:32.058-07:00 │
│ * **Current Actions**: 2 instances with no current │
│ action │
│ * **Target Size**: 2 │
│ * **Status**: Stable │
│ * **Instance Template**: │
│ `gke-anthos-cluster-default-pool-001` │
│ * **Update Policy**: Opportunistic, minimal action │
│ REPLACE, max surge 1, max unavailable 1 │
│ * **Named Ports**: http (port 3337) │
│ │
│ * **gke-cluster-2-default-pool-001-grp**: │
│ * **Region/Zone**: us-central1-c │
│ * **Creation Timestamp**: 2024-11-13T21:46:01.072-08:00 │
│ * **Current Actions**: 5 instances with no current │
│ action │
│ * **Target Size**: 5 │
│ * **Status**: Stable │
│ * **Instance Template**: │
│ `gke-cluster-2-default-pool-001` │
│ * **Update Policy**: Opportunistic, minimal action │
│ REPLACE, max surge 1, max unavailable 1 │
│ │
│ These details were retrieved by calling the Compute Engine │
│ API's `instanceGroupManagers.get` and │
│ `instanceGroupManagers.aggregatedList` methods. │
╰───────────────────────────────────────────────────────────────────────────────────╯
✦ Here are the details for each MIG. Let me know if you need anything else.
Contributing
- If you encounter a bug, please file an issue on our GitHub Issues page.
- Before sending a pull request, please review our Contributing Guide.
License
This project is licensed under the Apache 2.0 License and provided as-is, without warranty or representation for any use or purpose. For details, see the LICENSE file.