io.github.TANTIOPE/datadog-mcp

Security & Compliance

by tantiope

Provides full Datadog API access, covering monitors, logs, metrics, traces, dashboards, and other observability tools.

README

Datadog MCP Server

DISCLAIMER: This is a community-maintained project and is not officially affiliated with, endorsed by, or supported by Datadog, Inc. This MCP server utilizes the Datadog API but is developed independently.

MCP server providing AI assistants with full Datadog observability access. Features grep-like log search, APM trace filtering with duration/status/error queries, smart sampling modes for token efficiency, and cross-correlation between logs, traces, and metrics. Supports both stdio (local) and http (remote/Kubernetes) transports.

Quick Start

Minimal Claude Desktop / VS Code / Cursor config — just the two required keys:

json
{
  "mcpServers": {
    "datadog": {
      "command": "npx",
      "args": ["-y", "datadog-mcp"],
      "env": {
        "DD_API_KEY": "your-api-key",
        "DD_APP_KEY": "your-app-key"
      }
    }
  }
}

With optional tuning (EU site, custom default limits, longer log windows):

json
{
  "mcpServers": {
    "datadog": {
      "command": "npx",
      "args": ["-y", "datadog-mcp"],
      "env": {
        "DD_API_KEY": "your-api-key",
        "DD_APP_KEY": "your-app-key",
        "DD_SITE": "datadoghq.eu",
        "MCP_DEFAULT_LIMIT": "50",
        "MCP_DEFAULT_LOG_LINES": "200",
        "MCP_DEFAULT_METRIC_POINTS": "1000",
        "MCP_DEFAULT_TIME_RANGE": "24"
      }
    }
  }
}

To run as an HTTP server (e.g. inside a container or Kubernetes pod), add transport variables to the same env block:

json
"env": {
  "DD_API_KEY": "your-api-key",
  "DD_APP_KEY": "your-app-key",
  "MCP_TRANSPORT": "http",
  "MCP_PORT": "3000",
  "MCP_HOST": "0.0.0.0"
}

Configuration

Required environment variables

bash
DD_API_KEY=your-api-key
DD_APP_KEY=your-app-key

Optional environment variables

bash
DD_SITE=datadoghq.com  # Default. Use datadoghq.eu for EU, etc.

# Limit defaults (fallbacks when the AI doesn't specify)
MCP_DEFAULT_LIMIT=50              # General tools default limit
MCP_DEFAULT_LOG_LINES=200         # Logs tool default limit
MCP_DEFAULT_METRIC_POINTS=1000    # Metrics timeseries data points
MCP_DEFAULT_TIME_RANGE=24         # Default time range in hours

# Transport (alternative to CLI flags — useful in Kubernetes)
MCP_TRANSPORT=stdio               # stdio | http
MCP_PORT=3000                     # HTTP port
MCP_HOST=0.0.0.0                  # HTTP host

Optional flags

bash
--site=datadoghq.com     # Datadog site (overrides DD_SITE)
--transport=stdio|http   # Transport mode (default: stdio)
--port=3000              # HTTP port when using http transport
--host=0.0.0.0           # HTTP host when using http transport
--read-only              # Block all write operations
--disable-tools=synthetics,rum,security    # Comma-separated list of tools to disable
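
The flag comment above says `--site` overrides `DD_SITE`. As a rough illustration of that precedence (flag, then environment variable, then the documented default), here is a minimal Python sketch. The function name `resolve_site` is hypothetical and this is not the server's actual implementation, only one way the described order could be expressed.

```python
import os
import sys
from typing import Mapping, Sequence

DEFAULT_SITE = "datadoghq.com"  # documented default for DD_SITE


def resolve_site(argv: Sequence[str], env: Mapping[str, str]) -> str:
    """Pick the Datadog site: --site flag first, then DD_SITE, then the default."""
    for arg in argv:
        if arg.startswith("--site="):
            value = arg.split("=", 1)[1]
            if value:
                return value
    # Fall back to the environment variable, then to the default site.
    return env.get("DD_SITE") or DEFAULT_SITE


if __name__ == "__main__":
    print(resolve_site(sys.argv[1:], os.environ))
```

Run with `--site=datadoghq.eu` while `DD_SITE` is set to another site, the sketch prints `datadoghq.eu`.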

Transports

| Transport | When to use | Endpoints |
| --- | --- | --- |
| stdio (default) | Local MCP clients: Claude Desktop, Cursor, VS Code | n/a (process stdin/stdout) |
| http | Remote / container / Kubernetes | POST /mcp · GET /mcp (SSE) · DELETE /mcp · GET /health |

Select with --transport=http or MCP_TRANSPORT=http.
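
When the server runs with the http transport, a quick probe of `GET /health` can confirm it is reachable. The sketch below uses only the Python standard library. It assumes the server listens on the documented default port 3000 and that a healthy server answers with HTTP 200, which this README does not state. The helper names `mcp_endpoints` and `is_healthy` are hypothetical.

```python
import urllib.request
from typing import Dict


def mcp_endpoints(host: str, port: int) -> Dict[str, str]:
    """Build the two URLs exposed by the http transport."""
    base = f"http://{host}:{port}"
    return {"mcp": f"{base}/mcp", "health": f"{base}/health"}


def is_healthy(host: str = "127.0.0.1", port: int = 3000, timeout: float = 2.0) -> bool:
    """Return True if GET /health answers with HTTP 200 (assumed success code)."""
    url = mcp_endpoints(host, port)["health"]
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False
```

Calling `is_healthy()` against a pod or container could serve as a smoke test after deployment.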

Deployment

Docker

json
{
  "mcpServers": {
    "datadog": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "-e", "DD_API_KEY",
        "-e", "DD_APP_KEY",
        "-e", "DD_SITE",
        "ghcr.io/tantiope/datadog-mcp"
      ],
      "env": {
        "DD_API_KEY": "your-api-key",
        "DD_APP_KEY": "your-app-key",
        "DD_SITE": "datadoghq.com"
      }
    }
  }
}

Kubernetes

Use environment variables — not container args — for transport configuration:

yaml
env:
  - name: DD_API_KEY
    value: "your-api-key"
  - name: DD_APP_KEY
    value: "your-app-key"
  - name: MCP_TRANSPORT
    value: "http"
  - name: MCP_PORT
    value: "3000"
  - name: MCP_HOST
    value: "0.0.0.0"

Note: Kubernetes args: replaces the entire Dockerfile CMD, causing Node.js to receive the flags instead of your application. Environment variables avoid this issue.

Tools

| Tool | Action | Category | Description | Required Scopes |
| --- | --- | --- | --- | --- |
| monitors | list | Alerting | List monitors with optional filters | monitors_read |
| monitors | get | Alerting | Get monitor by ID | monitors_read |
| monitors | search | Alerting | Search monitors by query | monitors_read |
| monitors | create | Alerting | Create a new monitor; config is validated against a typed schema covering documented options (notifyNoData, renotifyInterval, thresholds, …); unknown keys surface in warnings. Pass dry_run: true to validate without creating (uses /api/v1/monitor/validate, allowed in read-only mode). | monitors_write |
| monitors | update | Alerting | Update an existing monitor; same validated schema as create; partial configs accepted; validation errors short-circuit before any HTTP call as EINVALID_MONITOR_CONFIG. | monitors_write |
| monitors | preview | Alerting | Render a monitor template (inline message or by monitor_id/id) with optional context of variables and conditionals. Returns {rendered, variablesUsed, variablesMissing, conditionalsResolved}. Supports a Datadog Mustache subset: variable substitution plus six documented conditionals (is_alert, is_warning, is_no_data, is_recovery, is_alert_to_warning, is_warning_to_alert); {{#each}}/partials throw EUNSUPPORTED_TEMPLATE_SYNTAX. Read-only. | monitors_read |
| monitors | test_notification | Alerting | Known limitation: returns ENOT_SUPPORTED because Datadog has no public REST endpoint for triggering a test notification. Documentation pointer in response. | n/a |
| monitors | delete | Alerting | Delete a monitor | monitors_write |
| monitors | mute | Alerting | Mute a monitor | monitors_write |
| monitors | unmute | Alerting | Unmute a monitor | monitors_write |
| monitors | top | Alerting | Top N monitors by alert frequency with real monitor names and context breakdown. WARNING: total_count includes renotifies/re-evaluations (Datadog emits a renotify event every renotify_interval minutes while Alert). For real fires use action=history. | monitors_read |
| monitors | history | Alerting | Count and list real state transitions for one monitor over a time window. Filters by transitionType (default ["alert","alert recovery"]: fires and recoveries, excluding renotifies) and optional group. Returns {transitions: [...], count, meta} where count is the number of real transitions (e.g. for one always-Alert burn-rate monitor over 7d: 98 raw events vs 38 real transitions). | monitors_read, events_read |
| dashboards | list | Visualization | List all dashboards | dashboards_read |
| dashboards | get | Visualization | Get dashboard by ID | dashboards_read |
| dashboards | create | Visualization | Create a new dashboard | dashboards_write |
| dashboards | update | Visualization | Update a dashboard | dashboards_write |
| dashboards | delete | Visualization | Delete a dashboard | dashboards_write |
| logs | search | Logs | Search logs with query syntax and filters | logs_read_data, logs_read_index_data |
| logs | aggregate | Logs | Aggregate log data with groupBy | logs_read_data |
| logs_pipelines | list, get | Logs Config | Inspect log processing pipelines and their processors | logs_read_config |
| logs_pipelines | create, update, delete, reorder | Logs Config | Author pipelines and processor chains | logs_write_config |
| logs_pipelines | get_order | Logs Config | Read pipeline evaluation order | logs_read_config |
| logs_indexes | list, get | Logs Config | Inspect indexes (filter, retention, Flex tier, exclusion filters); create/delete are UI-only per Datadog and not exposed | logs_read_config |
| logs_indexes | update, reorder | Logs Config | Update index filter/retention/quota and reorder evaluation | logs_write_config |
| logs_indexes | get_order | Logs Config | Read index evaluation order | logs_read_config |
| logs_archives | list, get | Logs Config | Inspect log archives (S3 / GCS / Azure destinations); per-provider credential fields are forwarded unchanged | logs_read_archives |
| logs_archives | create, update, delete, reorder | Logs Config | Manage archive destinations; destination.type is validated against the supported destination types (s3, gcs, …) | |
| logs_archives | get_order | Logs Config | Read archive evaluation order | logs_read_archives |
| metrics | query | Metrics | Query timeseries data. Response meta includes rollupRequested (parsed from rollup(method, seconds), with methodInferred flag), rollupEffective (interval derived from returned pointlist intervals, plus deduped intervalsObserved for multi-series), and rollupOverridden: boolean so callers can detect when Datadog silently downsampled. | metrics_read, timeseries_query |
| metrics | search | Metrics | Search for metrics by name | metrics_read |
| metrics | list | Metrics | List active metrics | metrics_read |
| metrics | metadata | Metrics | Get metric metadata | metrics_read |
| traces | search | APM | Search spans with filters | apm_read |
| traces | aggregate | APM | Aggregate trace data | apm_read |
| traces | services | APM | List APM services | apm_service_catalog_read |
| events | list | Events | List events | events_read |
| events | get | Events | Get event by ID | events_read |
| events | create | Events | Create an event | events_read |
| events | search | Events | Search events with v2 API and cursor pagination. Optional transitionType filter (e.g. ["alert","alert recovery"]) restricts to monitor state-transition events; without it, source:alert includes renotifies. For monitor-specific fires use monitors action=history. Optional timezone adds *Local ISO 8601 siblings to every timestamp. Zero-result responses include a diagnostics array hinting at the cause (UNINDEXED_TAG_PREFIX, NARROW_TIME_RANGE, RESTRICTIVE_SOURCE_FILTER). | events_read |
| events | histogram | Events | Server-side bucketing of events by hour_of_day, day_of_week, or day_of_month in an IANA timezone (DST-safe via Intl.DateTimeFormat). Accepts the same transitionType filter as search so monitor histograms can exclude renotifies. Cursor-paginates the underlying search; capped at limits.maxEventsForHistogram (default 5000, MCP_MAX_EVENTS_HISTOGRAM env var). When the cap is hit, returns bucketCountIncomplete: true and nextCursor for continuation. | events_read |
| events | aggregate | Events | Client-side aggregation by monitor_name, source, etc. | events_read |
| events | top | Events | Top N event groups by count with generic groupBy support (deployments, configs, alerts, etc.). Groups without context tags are included as "no_context" | events_read |
| events | timeseries | Events | Time-bucketed alert trends (hourly/daily counts) | events_read |
| events | incidents | Events | Deduplicate alerts into incidents with Trigger/Recover pairing | events_read |
| incidents | list | Incidents | List incidents | incident_read |
| incidents | get | Incidents | Get incident by ID | incident_read |
| incidents | search | Incidents | Search incidents | incident_read |
| incidents | create | Incidents | Create an incident | incident_write |
| incidents | update | Incidents | Update an incident | incident_write |
| incidents | delete | Incidents | Delete an incident | incident_write |
| slos | list | SLOs | List SLOs. Each item exposes query, monitorIds, monitorTags, groups, and a UI url so round-trips (get → edit → update) preserve definition fields. | slos_read |
| slos | get | SLOs | Get SLO by ID (same projection as list). | slos_read |
| slos | create | SLOs | Create an SLO | slos_write |
| slos | update | SLOs | Update an SLO | slos_write |
| slos | delete | SLOs | Delete an SLO | slos_write |
| slos | history | SLOs | Get SLO history | slos_read |
| synthetics | list | Synthetics | List synthetic tests | synthetics_read |
| synthetics | get | Synthetics | Get test by public ID | synthetics_read |
| synthetics | create | Synthetics | Create a test | synthetics_write |
| synthetics | update | Synthetics | Update a test | synthetics_write |
| synthetics | delete | Synthetics | Delete a test | synthetics_write |
| synthetics | trigger | Synthetics | Trigger a test run | synthetics_write |
| synthetics | results | Synthetics | Get test results | synthetics_read |
| downtimes | list | Downtimes | List downtimes | monitors_downtime |
| downtimes | get | Downtimes | Get downtime by ID | monitors_downtime |
| downtimes | create | Downtimes | Create a downtime | monitors_downtime |
| downtimes | update | Downtimes | Update a downtime | monitors_downtime |
| downtimes | cancel | Downtimes | Cancel a downtime | monitors_downtime |
| downtimes | listByMonitor | Downtimes | List downtimes for a monitor | monitors_downtime |
| hosts | list | Infrastructure | List hosts | hosts_read |
| hosts | totals | Infrastructure | Get host totals | hosts_read |
| hosts | mute | Infrastructure | Mute a host | hosts_read |
| hosts | unmute | Infrastructure | Unmute a host | hosts_read |
| rum | applications | RUM | List RUM applications | rum_read |
| rum | events | RUM | Search RUM events | rum_read |
| rum | aggregate | RUM | Aggregate RUM data | rum_read |
| rum | performance | RUM | Get Core Web Vitals (LCP, FCP, CLS, FID, INP) | rum_read |
| rum | waterfall | RUM | Get session timeline with resources/actions/errors | rum_read |
| security | rules | Security | List security rules | security_monitoring_rules_read |
| security | signals | Security | Search security signals | security_monitoring_signals_read |
| security | findings | Security | List security findings | security_monitoring_findings_read |
| notebooks | list | Notebooks | List notebooks | notebooks_read |
| notebooks | get | Notebooks | Get notebook by ID | notebooks_read |
| notebooks | create | Notebooks | Create a notebook | notebooks_write |
| notebooks | update | Notebooks | Update a notebook | notebooks_write |
| notebooks | delete | Notebooks | Delete a notebook | notebooks_write |
| users | list | Admin | List users | user_access_read |
| users | get | Admin | Get user by ID | user_access_read |
| teams | list | Admin | List teams | teams_read |
| teams | get | Admin | Get team by ID | teams_read |
| teams | members | Admin | List team members | teams_read |
| tags | list | Infrastructure | List all tags | hosts_read |
| tags | get | Infrastructure | Get tags for a host | hosts_read |
| tags | add | Infrastructure | Add tags to a host | hosts_read |
| tags | update | Infrastructure | Update host tags | hosts_read |
| tags | delete | Infrastructure | Delete host tags | hosts_read |
| usage | summary | Billing | Usage summary | usage_read |
| usage | hosts | Billing | Host usage | usage_read |
| usage | logs | Billing | Log usage | usage_read |
| usage | custom_metrics | Billing | Custom metrics usage | usage_read |
| usage | indexed_spans | Billing | Indexed spans usage | usage_read |
| usage | ingested_spans | Billing | Ingested spans usage | usage_read |
| auth | validate | Auth | Test API and App key validity | |
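
Each row above is a tool name plus an action. Over the http transport, an MCP client invokes a tool with a JSON-RPC `tools/call` request, which is the standard MCP method. The Python sketch below builds such a payload for `monitors` with `action=history`. The helper `build_tool_call` and the `id` argument name are assumptions made for illustration; check the tool's own MCP description for the exact argument names.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool: str, action: str, **arguments: Any) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 'tools/call' request body for an MCP tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": {"action": action, **arguments}},
    }


# Hypothetical monitor ID; transitionType mirrors the documented default.
payload = build_tool_call(
    1,
    "monitors",
    "history",
    id=12345,
    transitionType=["alert", "alert recovery"],
)
print(json.dumps(payload, indent=2))
```

In stdio mode the MCP client builds and sends this request for you, so a hand-written payload is mostly useful for debugging the http endpoint.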

Limit Control

AI assistants have full control over query limits. The MCP_DEFAULT_* environment variables only set the fallback used when the AI doesn't specify a limit — they do NOT cap what the AI can request.

| Tool | Default | Parameter | Description |
| --- | --- | --- | --- |
| Logs | 200 | limit | Log lines to return |
| Metrics (timeseries) | 1000 | pointLimit | Data points per series (controls resolution) |
| General tools | 50 | limit | Results to return |

Tool-level token reduction features (compact: true on logs, sample: "diverse" | "spread" | "first", field projections, diagnostics) are surfaced in each tool's MCP description and chosen by the AI at call time.
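
The fallback-not-cap behavior described above can be pictured with a small Python sketch. It is illustrative only: `effective_limit` is a hypothetical helper, not the server's code. An explicit request always wins, and the environment variable only supplies the value used when nothing is requested.

```python
from typing import Mapping, Optional

# Built-in defaults from the table above, used when the env var is unset.
FALLBACKS = {
    "MCP_DEFAULT_LIMIT": 50,
    "MCP_DEFAULT_LOG_LINES": 200,
    "MCP_DEFAULT_METRIC_POINTS": 1000,
}


def effective_limit(requested: Optional[int], env_var: str, env: Mapping[str, str]) -> int:
    """Return the limit a call would use under fallback-only semantics."""
    if requested is not None:
        return requested  # explicit value is used as-is, never capped by the default
    raw = env.get(env_var)
    return int(raw) if raw else FALLBACKS[env_var]


env = {"MCP_DEFAULT_LOG_LINES": "100"}
print(effective_limit(5000, "MCP_DEFAULT_LOG_LINES", env))  # 5000
print(effective_limit(None, "MCP_DEFAULT_LOG_LINES", env))  # 100
```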

Notable behaviors

A handful of patterns worth knowing about — the AI can discover the rest from tool descriptions.

  • Renotifies vs real fires. monitors top and events search with source:alert count every renotify Datadog emits (one every renotify_interval while a monitor is Alert). To get actual state transitions, use monitors history (defaults to transitionType: ["alert","alert recovery"]) or pass transitionType to events search.
  • DST-safe time buckets. events histogram buckets by hour_of_day / day_of_week / day_of_month in any IANA timezone via Intl.DateTimeFormat. Cursor-paginates the underlying search; cap controlled by MCP_MAX_EVENTS_HISTOGRAM (default 5000) with bucketCountIncomplete + nextCursor on overflow.
  • Validate before create. monitors create with dry_run: true calls /api/v1/monitor/validate instead of persisting. Allowed in --read-only mode.
  • Monitor template preview. monitors preview renders a notification against a context payload — variable substitution + Datadog's six documented conditionals (is_alert, is_warning, is_no_data, is_recovery, is_alert_to_warning, is_warning_to_alert). {{#each}} and partials throw EUNSUPPORTED_TEMPLATE_SYNTAX.
  • SLO round-trip. slos get projects query, monitorIds, monitorTags, groups, and a UI url so you can edit and feed back into slos update without dropping definition fields.
  • Cross-correlation. logs(sample:"diverse") → pull dd.trace_id → traces(query:"trace_id:<id>") → metrics(query:"p95:trace.express.request{service:...}") (use the root metric without .duration for percentiles).
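
To make the first point above concrete, the Python sketch below shows why raw alert-event counts and real state transitions differ. The event shape (a `transition_type` field) and the `"renotify"` value are assumptions for illustration, not the Datadog payload format. The filter mirrors the documented default of `["alert", "alert recovery"]`.

```python
from typing import Any, Dict, List, Sequence

# Default transitionType documented for monitors history.
REAL_TRANSITIONS = ("alert", "alert recovery")


def real_transitions(
    events: Sequence[Dict[str, Any]],
    allowed: Sequence[str] = REAL_TRANSITIONS,
) -> List[Dict[str, Any]]:
    """Keep only events whose (hypothetical) transition_type is an allowed state change."""
    return [e for e in events if e.get("transition_type") in allowed]


# A monitor that fires once, renotifies twice while still in Alert, then recovers.
events = [
    {"transition_type": "alert"},
    {"transition_type": "renotify"},
    {"transition_type": "renotify"},
    {"transition_type": "alert recovery"},
]
kept = real_transitions(events)
print(len(events), len(kept))  # 4 2
```

Four raw events collapse to two real transitions, the same kind of gap as the 98 vs 38 example in the tools table.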

Deep links

Every query response includes a datadog_url field built for your configured DD_SITE: datadoghq.com (default), datadoghq.eu, us3 / us5 / ap1.datadoghq.com, or ddog-gov.com. Supported on logs, metrics, traces, events, monitors, rum, slos.
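
As a rough idea of how a site-aware deep link can be derived, the Python sketch below maps a `DD_SITE` value to a Datadog web host and builds a Log Explorer link. It assumes region-prefixed sites (us3, us5, ap1) are used as-is while the others get an `app.` prefix. The helper names are hypothetical, and the server's actual `datadog_url` format may include more parameters, such as a time range.

```python
from urllib.parse import urlencode


def app_base_url(site: str) -> str:
    """Map a DD_SITE value to the web UI base URL (assumed convention)."""
    # us3.datadoghq.com and similar already name a full host; bare domains get "app.".
    host = site if site.count(".") >= 2 else f"app.{site}"
    return f"https://{host}"


def logs_url(site: str, query: str) -> str:
    """Build a Log Explorer link carrying the search query."""
    return f"{app_base_url(site)}/logs?{urlencode({'query': query})}"


print(logs_url("datadoghq.eu", "service:web"))
```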

Contributing

Contributions are welcome! Feel free to open an issue or a pull request if you have any suggestions, bug reports, or improvements to propose.

License

This project is licensed under the Apache License, Version 2.0.
