io.github.IvanRublev/keyphrases-mcp
by ivanrublev
An MCP server that extracts keyphrases from text using a BERT model, suited for summarization, tag generation, and semantic analysis tasks.
README
🔤 Keyphrases-MCP
Empowering LLMs with authentic keyphrase extraction
Built with the following tools and technologies:
<img src="https://img.shields.io/badge/MCP-6A5ACD.svg?style=default&logo=data:image/svg+xml;base64,PHN2ZyBmaWxsPSIjNkE1QUNEIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIxNiIgaGVpZ2h0PSIxNiI+PHJlY3Qgd2lkdGg9IjE2IiBoZWlnaHQ9IjE2IiByeD0iNCIvPjx0ZXh0IHg9IjgiIHk9IjExIiBmb250LXNpemU9IjgiIHRleHQtYW5jaG9yPSJtaWRkbGUiIGZpbGw9IndoaXRlIj5NQ1A8L3RleHQ+PC9zdmc+" alt="MCP"> <img src="https://img.shields.io/badge/PyTorch-EE4C2C.svg?style=default&logo=PyTorch&logoColor=white" alt="PyTorch"> <img src="https://img.shields.io/badge/Python-3776AB.svg?style=default&logo=Python&logoColor=white" alt="Python"> <img src="https://img.shields.io/badge/uv-DE5FE9.svg?style=default&logo=uv&logoColor=white" alt="uv">
Overview
This Keyphrases MCP Server is a natural language interface that lets agentic applications extract keyphrases from provided text. It integrates seamlessly with MCP (Model Context Protocol) clients, enabling AI-driven workflows to extract keyphrases more accurately and with higher relevance using a BERT machine learning model. It works directly with your local files in the allowed directories, saving context tokens for the LLM. The application ensures secure document processing by exposing only the extracted keyphrases to the MCP client, never the original file content.
Using this MCP Server, you can ask the following questions:
- "Extract 7 keyphrases from the file. [ABSOLUTE_FILE_PATH]"
- "Extract 3 keyphrases from the given file ignoring the stop words. Stop words: former, due, amount, [OTHER_STOP_WORDS]. File: [ABSOLUTE_FILE_PATH]"
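Integrating the server usually means registering it in your MCP client's configuration. The sketch below shows the common Claude Desktop-style `claude_desktop_config.json` shape; the `uvx` command, package name, and the directory argument are assumptions made for illustration, so consult the Integration and Configuration documents for the authoritative values:

```json
{
  "mcpServers": {
    "keyphrases": {
      "command": "uvx",
      "args": ["keyphrases-mcp", "/absolute/path/to/allowed/dir"]
    }
  }
}
```

The directory passed in `args` would correspond to the allowed directories the server is permitted to read files from.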
Keyphrases help users quickly grasp the main topics and themes of a document without reading it in full, and they enable the following applications:
- tags or metadata for documents, improving organization and discoverability in digital libraries
- identification of emerging trends and sentiment in customer reviews, social media, or news articles
- features or inputs for downstream tasks such as text classification and clustering
Reasoning for keyphrases-mcp
Autoregressive LLMs, such as those behind Claude or ChatGPT, process text sequentially. This not only limits their ability to fully contextualize keyphrases across the entire document, but also makes them suffer from context degradation as input length increases, causing earlier keyphrases to receive diluted attention.
Bidirectional models like BERT consider both left and right context and maintain more consistent attention across the sequence, so they generally extract existing keyphrases from texts more accurately and with higher relevance, especially when no domain-specific fine-tuning is applied.
However, as autoregressive models adopt longer context windows and techniques such as input chunking, their keyphrase-extraction performance is improving, narrowing the gap with BERT. Domain-specific fine-tuning can even make an autoregressive LLM outperform the BERT solution.
This MCP server combines BERT for keyphrase extraction with an autoregressive LLM for text generation or refinement, enabling seamless text processing.
How it works
The server uses the KeyBERT framework for a multi-step extraction pipeline that combines spaCy NLP preprocessing with BERT embeddings:
- Candidate Generation: KeyphraseCountVectorizer identifies meaningful keyphrase candidates using spaCy's en_core_web_trf model and discarding stop words
- Semantic Encoding: Candidates and document are embedded using paraphrase-multilingual-MiniLM-L12-v2 sentence transformer
- Relevance Ranking: KeyBERT calculates cosine similarity between each candidate keyphrase embedding and the document embedding
- Diversity Selection: Maximal Marginal Relevance (MMR) ensures diverse, non-redundant keyphrases
- Final Output: Top N most relevant and diverse keyphrases are selected and sorted alphabetically
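Conceptually, the ranking, diversity, and output steps above can be sketched as follows. This is an illustrative re-implementation of the standard MMR algorithm with toy vectors, not the actual KeyBERT internals; in the real pipeline the embeddings come from the sentence transformer:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mmr_select(doc_emb, cand_embs, candidates, top_n=2, diversity=0.7):
    """Maximal Marginal Relevance: pick keyphrases that are relevant
    to the document yet dissimilar to phrases already selected."""
    relevance = [cosine(doc_emb, c) for c in cand_embs]
    # Start with the single most relevant candidate
    selected = [max(range(len(candidates)), key=relevance.__getitem__)]
    while len(selected) < min(top_n, len(candidates)):
        best, best_score = None, -math.inf
        for i in range(len(candidates)):
            if i in selected:
                continue
            # Penalize similarity to already-selected keyphrases
            redundancy = max(cosine(cand_embs[i], cand_embs[j]) for j in selected)
            score = (1 - diversity) * relevance[i] - diversity * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    # Final output: sorted alphabetically, as the server does
    return sorted(candidates[i] for i in selected)

# Toy 3-d "embeddings" standing in for sentence-transformer vectors
doc = [1.0, 1.0, 0.0]
cands = [[1.0, 0.9, 0.1],   # "keyphrase extraction": close to the document
         [0.9, 1.0, 0.0],   # "keyword mining": close to the document AND to the phrase above
         [0.1, 0.8, 0.9]]   # "semantic search": relevant from a different angle
names = ["keyphrase extraction", "keyword mining", "semantic search"]
print(mmr_select(doc, cands, names, top_n=2, diversity=0.7))
# → ['keyword mining', 'semantic search']
```

With a high `diversity` value, MMR skips "keyphrase extraction" despite its high relevance because it is nearly redundant with "keyword mining"; lowering `diversity` (e.g. to 0.5) flips that trade-off back toward pure relevance.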
See the configuration document for details.
Documentation
- 🚀 Integration - How to integrate server with your MCP client or LLM
- 🔧 Configuration - Environment variables and settings
- 🛠️ MCP Tools - All available tools
- 🪵 Development and testing - Guide on how to contribute to the project
License
This project is licensed under the MIT License.
Contact
For questions or support, reach out via GitHub Issues.