Dataiku MCP


by clssck

An MCP server for the Dataiku DSS project, flow, and operations APIs, making them easy for AI assistants to call and manage.


Dataiku MCP Server

MCP server for Dataiku DSS REST APIs, focused on flow analysis and reliable day-to-day operations (projects, datasets, recipes, jobs, scenarios, folders, variables, connections, and code environments).

Install MCP Server

Cursor one-click install includes placeholder environment values. Update DATAIKU_URL, DATAIKU_API_KEY, and optionally DATAIKU_PROJECT_KEY after adding the server.

What You Get

  • Deterministic normalized flow maps (project.map) with recipe subtypes and connectivity.
  • Summary-first outputs with explicit raw/detail toggles where needed.
  • Broad test coverage (unit + live integration + optional destructive integration suite).
  • Strong error taxonomy in responses: not_found, forbidden, validation, transient, unknown with retry hints.
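The error taxonomy above can be sketched as a simple status classifier. This is an illustration only: the function and field names below are assumptions, not the server's actual implementation.

```typescript
// Illustrative sketch of the documented error taxonomy; names are assumptions.
type ErrorCategory = "not_found" | "forbidden" | "validation" | "transient" | "unknown";

interface ClassifiedError {
  category: ErrorCategory;
  retryable: boolean; // the "retry hint" surfaced alongside the category
}

// Map an HTTP status from DSS to one of the documented categories.
function classifyHttpError(status: number): ClassifiedError {
  if (status === 404) return { category: "not_found", retryable: false };
  if (status === 401 || status === 403) return { category: "forbidden", retryable: false };
  if (status === 400 || status === 422) return { category: "validation", retryable: false };
  if (status === 408 || status === 429 || status >= 500) return { category: "transient", retryable: true };
  return { category: "unknown", retryable: false };
}
```

Only the `transient` category carries a positive retry hint, which lines up with retries being limited to idempotent GET requests (see Environment Variables).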

Tool Coverage

  • project: list, get, metadata, flow, map
  • dataset: list, get, schema, preview, metadata, download, create, update, delete
  • recipe: list, get, create, update, delete, download
  • job: list, get, log, build, buildAndWait, wait, abort
  • scenario: list, run, status, get, create, update, delete
  • managed_folder: list, get, contents, download, upload, delete_file
  • variable: get, set
  • connection: infer
  • code_env: list, get
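Tools follow an action-based calling convention, as in the verification example later in this README. A client request to the dataset tool might look like the following; the `datasetName` parameter is an assumption inferred from the `projectKey`/`action` pattern, not a documented field:

```json
{
  "name": "dataset",
  "arguments": {
    "action": "preview",
    "projectKey": "YOUR_PROJECT_KEY",
    "datasetName": "my_dataset"
  }
}
```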

Prerequisites

  • Node.js 20+
  • npm
  • Dataiku DSS URL + API key

Quick Start

bash
npm ci
npm run build

Run as a local CLI after build:

bash
node dist/index.js

Use directly from npm (after publish):

bash
npx -y dataiku-mcp

Local Build And Testing

Recommended local workflow from repo root:

bash
# install deps
npm ci

# static checks
npm run check

# unit tests
npm test

# build distribution
npm run build

# run MCP server locally (dev)
npm start

Optional live DSS integration tests:

bash
# requires DATAIKU_URL, DATAIKU_API_KEY, DATAIKU_PROJECT_KEY in .env
npm run test:integration

# includes destructive actions (create/update/delete)
DATAIKU_MCP_DESTRUCTIVE_TESTS=1 npm run test:integration

Repository Layout

  • src/: MCP server and tool implementations.
  • tests/: unit + integration test suites.
  • examples/: demos, fixtures, artifacts, and ad-hoc local scripts.
  • bin/: package executable entrypoint.
  • dist/: compiled output (generated).

Create a local env file:

bash
cp .env.example .env
# then edit .env

Run directly in dev:

bash
npm start

Example scripts and sample outputs are kept under examples/ to avoid root-level clutter.

Environment Variables

  • DATAIKU_URL: DSS base URL
  • DATAIKU_API_KEY: DSS API key
  • DATAIKU_PROJECT_KEY (optional): default project key
  • DATAIKU_REQUEST_TIMEOUT_MS (optional): per-attempt request timeout in milliseconds (default: 30000)
  • DATAIKU_RETRY_MAX_ATTEMPTS (optional): max attempts for retry-enabled requests (GET only, default: 4, cap: 10)
  • DATAIKU_DEBUG_LATENCY (optional): set to 1/true to include per-tool timing diagnostics in structuredContent.debug.latency (off by default)
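A minimal sketch of how these defaults and caps could be read, assuming simple clamping behavior; this mirrors the documented values but is not the server's actual parsing code.

```typescript
// Sketch only: applies the documented defaults (30000 ms timeout, 4 retry attempts)
// and the documented cap of 10 attempts. Helper names are illustrative.
function intFromEnv(raw: string | undefined, fallback: number, cap?: number): number {
  const parsed = raw === undefined ? NaN : Number.parseInt(raw, 10);
  const value = Number.isFinite(parsed) ? parsed : fallback;
  return cap === undefined ? value : Math.min(value, cap);
}

const config = {
  timeoutMs: intFromEnv(process.env.DATAIKU_REQUEST_TIMEOUT_MS, 30000),
  retryMaxAttempts: intFromEnv(process.env.DATAIKU_RETRY_MAX_ATTEMPTS, 4, 10),
  // "1" or "true" enables latency diagnostics, per the variable description above.
  debugLatency: ["1", "true"].includes((process.env.DATAIKU_DEBUG_LATENCY ?? "").toLowerCase()),
};
```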

MCP Client Setup Guide

Use this server command in clients (npm package):

json
{
  "command": "npx",
  "args": ["-y", "dataiku-mcp"],
  "env": {
    "DATAIKU_URL": "https://your-dss-instance.app.dataiku.io",
    "DATAIKU_API_KEY": "your_api_key",
    "DATAIKU_PROJECT_KEY": "YOUR_PROJECT_KEY"
  }
}

Windows note: if your MCP client launches commands without a shell, use npx.cmd:

json
{
  "command": "npx.cmd",
  "args": ["-y", "dataiku-mcp"],
  "env": {
    "DATAIKU_URL": "https://your-dss-instance.app.dataiku.io",
    "DATAIKU_API_KEY": "your_api_key",
    "DATAIKU_PROJECT_KEY": "YOUR_PROJECT_KEY"
  }
}

You can also run TypeScript directly during development:

json
{
  "command": "npx",
  "args": ["tsx", "/absolute/path/to/Dataiku_MCP/src/index.ts"],
  "env": {
    "DATAIKU_URL": "https://your-dss-instance.app.dataiku.io",
    "DATAIKU_API_KEY": "your_api_key",
    "DATAIKU_PROJECT_KEY": "YOUR_PROJECT_KEY"
  }
}

Claude Desktop

  1. Open Claude Desktop -> Settings -> Developer -> Edit Config.
  2. Add this under mcpServers in claude_desktop_config.json:
json
{
  "mcpServers": {
    "dataiku": {
      "command": "npx",
      "args": ["-y", "dataiku-mcp"],
      "env": {
        "DATAIKU_URL": "https://your-dss-instance.app.dataiku.io",
        "DATAIKU_API_KEY": "your_api_key",
        "DATAIKU_PROJECT_KEY": "YOUR_PROJECT_KEY"
      }
    }
  }
}

Cursor

Cursor supports both project-scoped and global MCP config:

  • Project: .cursor/mcp.json
  • Global: ~/.cursor/mcp.json

Example:

json
{
  "mcpServers": {
    "dataiku": {
      "command": "npx",
      "args": ["-y", "dataiku-mcp"],
      "env": {
        "DATAIKU_URL": "https://your-dss-instance.app.dataiku.io",
        "DATAIKU_API_KEY": "your_api_key",
        "DATAIKU_PROJECT_KEY": "YOUR_PROJECT_KEY"
      }
    }
  }
}

Cline (VS Code extension)

  1. Open Cline -> MCP Servers -> Configure MCP Servers.
  2. Add this server block in cline_mcp_settings.json:
json
{
  "mcpServers": {
    "dataiku": {
      "command": "npx",
      "args": ["-y", "dataiku-mcp"],
      "env": {
        "DATAIKU_URL": "https://your-dss-instance.app.dataiku.io",
        "DATAIKU_API_KEY": "your_api_key",
        "DATAIKU_PROJECT_KEY": "YOUR_PROJECT_KEY"
      }
    }
  }
}

Codex / project-level MCP config

This repo already includes a project-scoped MCP file at .mcp.json. The checked-in .mcp.json uses node node_modules/tsx/dist/cli.mjs src/index.ts for cross-platform startup (including Windows); run npm ci first.

NPM Release Workflow

This repo includes a manual GitHub Actions release workflow:

  • Workflow file: .github/workflows/release.yml
  • Trigger: Actions -> Release NPM Package -> Run workflow

Inputs:

  • bump: patch | minor | major
  • version: optional exact version (overrides bump)
  • publish: whether to publish to npm

Required repository configuration:

  • GitHub variable: NPM_RELEASE_ENABLED=true
  • Optional variable: NPM_PUBLISH_ACCESS=public
  • Trusted publisher configured on npmjs.com for this package/repo/workflow

The workflow will:

  1. Install dependencies, run checks/tests, and build.
  2. Bump the package version and create a git tag.
  3. Push the commit and tag to main.
  4. Publish to npm with GitHub OIDC trusted publishing (if publish=true).
  5. Create a GitHub Release with generated notes.

Trusted publishing setup (npm):

  1. Open https://www.npmjs.com/package/dataiku-mcp -> Settings -> Trusted Publisher.
  2. Choose GitHub Actions.
  3. Set:
    • Organization or user: clssck
    • Repository: Dataiku_MCP
    • Workflow filename: release.yml
  4. Save.

Official MCP Registry

This repo is configured for MCP Registry publishing:

  • Metadata file: server.json
  • Workflow: .github/workflows/publish-mcp-registry.yml
  • Required package field: mcpName in package.json

Server namespace:

  • io.github.clssck/dataiku-mcp

Publish paths:

  1. Manual: run Publish to MCP Registry in GitHub Actions.
  2. Automatic: run the npm release workflow with publish=true (it triggers MCP Registry publish).

Validation notes:

  • server.json.name must match package.json.mcpName.
  • server.json.packages[].identifier + version must reference a real npm publish.

Recommended Verification Prompt

After adding the server in a client, run:

  • project with { "action": "map", "projectKey": "YOUR_PROJECT_KEY" } (defaults to maxNodes=300, maxEdges=600; override as needed)

You should receive a flow summary in text, plus normalized nodes, edges, stats, roots, and leaves under structuredContent.map. When truncation limits are applied, structuredContent.truncation reports before/after node and edge counts and whether truncation occurred.
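The truncation accounting described above can be illustrated with a small sketch; the exact field names under structuredContent.truncation are assumptions.

```typescript
interface Truncation {
  before: { nodes: number; edges: number };
  after: { nodes: number; edges: number };
  truncated: boolean;
}

// Sketch: cap node/edge lists at the documented defaults and report counts.
// A real implementation would also drop edges that reference removed nodes.
function truncateGraph<N, E>(nodes: N[], edges: E[], maxNodes = 300, maxEdges = 600) {
  const keptNodes = nodes.slice(0, maxNodes);
  const keptEdges = edges.slice(0, maxEdges);
  const truncation: Truncation = {
    before: { nodes: nodes.length, edges: edges.length },
    after: { nodes: keptNodes.length, edges: keptEdges.length },
    truncated: nodes.length > maxNodes || edges.length > maxEdges,
  };
  return { nodes: keptNodes, edges: keptEdges, truncation };
}
```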

Notes

  • project.map returns a compact text summary; full normalized graph is in structuredContent.map.
  • Arrays in normalized map output are deterministically sorted to reduce diff churn.
  • job.wait and job.buildAndWait include structuredContent.normalizedState with one of terminalSuccess | terminalFailure | timeout | nonTerminal while preserving raw DSS state.
  • With DATAIKU_DEBUG_LATENCY=1, responses include per-tool and per-API-call latency metrics under structuredContent.debug.latency.
  • List-style responses are token-bounded by default; use limit/offset (and action-specific caps like maxNodes, maxEdges, maxKeys, maxPackages) to page or expand results when needed.
  • dataset.get and job.get are summary-first by default; pass includeDefinition=true to include full DSS JSON in structuredContent.definition.
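The normalizedState mapping in the notes above can be sketched as follows. The raw DSS state names (DONE, FAILED, ABORTED, RUNNING) are assumptions about DSS job states, not taken from this README; the raw state is preserved alongside the normalized one.

```typescript
type NormalizedState = "terminalSuccess" | "terminalFailure" | "timeout" | "nonTerminal";

// Sketch of the documented normalization; raw DSS state names are assumptions.
function normalizeJobState(rawState: string, timedOut: boolean): NormalizedState {
  if (timedOut) return "timeout";
  switch (rawState) {
    case "DONE":
      return "terminalSuccess";
    case "FAILED":
    case "ABORTED":
      return "terminalFailure";
    default:
      return "nonTerminal"; // e.g. RUNNING or other in-flight states
  }
}
```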
