feishu-openclaw-paper-manager
by ch1hyaanon
Design or implement a paper-management workflow built on a Feishu bot plus OpenClaw. Use when the user wants to ingest papers from Feishu messages, save PDFs or source links into a Feishu cloud-docs paper folder, maintain a searchable multi-dimensional table of paper metadata and reusable tags, or evolve the tag taxonomy after every 50 newly added papers.
Installation
claude skill add --url github.com/openclaw/skills/tree/main/skills/ch1hyaanon/feishu-paper-manager
Documentation
Feishu OpenClaw Paper Manager
Use this skill when the task is to design, review, or implement a paper-management agent that runs through a Feishu bot and OpenClaw.
Primary objective
Build a workflow with three durable outputs:
- A cloud-docs paper folder that stores the paper asset or an index document for the source link.
- A Feishu multi-dimensional table that stores normalized paper metadata.
- A taxonomy iteration loop that improves labels whenever the total paper count crosses 50, 100, 150, and so on.
Workflow
Follow this sequence:
- Map the ingestion sources from Feishu messages.
- Normalize each paper into one canonical record.
- Save or register the paper in the cloud-docs paper folder.
- Generate summary and reusable multi-label tags.
- Write the record into the multi-dimensional table.
- Check whether the total record count has reached a multiple of 50.
- If yes, run taxonomy refinement and backfill historical rows.
Ingestion rules
Treat these as valid paper inputs:
- PDF attachment in a Feishu message
- arXiv, OpenReview, ACL Anthology, publisher, or project links in a message
- Mixed messages that contain both a PDF and a source link
For every incoming item:
- Extract the raw message URL, sender, timestamp, and all detected paper candidates.
- Resolve whether the message refers to a new paper or an existing one.
- Create exactly one canonical paper record per paper.
Deduplication priority:
- DOI
- arXiv ID or OpenReview forum ID
- normalized title
- source URL fingerprint
If a duplicate exists, update missing fields instead of creating a new row.
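The deduplication priority above can be sketched as a single key-resolution function; field names like doi and arxiv_id are illustrative assumptions about the normalized record, not a fixed contract:

```python
import hashlib


def dedup_key(record: dict) -> tuple[str, str]:
    """Return (key_type, key) following the priority order:
    DOI > arXiv/OpenReview ID > normalized title > source-URL fingerprint."""
    if record.get("doi"):
        return ("doi", record["doi"].lower())
    if record.get("arxiv_id"):
        return ("arxiv", record["arxiv_id"].lower())
    if record.get("openreview_id"):
        return ("openreview", record["openreview_id"].lower())
    if record.get("title"):
        # Normalize the title: lowercase, drop punctuation, collapse whitespace.
        norm = "".join(c for c in record["title"].lower() if c.isalnum() or c.isspace())
        return ("title", " ".join(norm.split()))
    url = record.get("source_link", "")
    return ("url", hashlib.sha256(url.encode()).hexdigest()[:16])
```

Looking up this key before any write is what makes "update missing fields instead of creating a new row" enforceable.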
Storage model
Always separate binary storage from metadata storage:
- The cloud-docs paper folder is for the PDF file or a small link-index doc when the PDF is unavailable.
- The multi-dimensional table is the system of record for metadata, classification, and retrieval.
When a PDF is available:
- upload the PDF into the paper folder
- use a deterministic filename from year + first_author + short_title
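A minimal sketch of the deterministic filename rule; the slug lengths and separator are assumptions, what matters is that the same paper always yields the same name:

```python
import re


def paper_filename(year: int, first_author: str, short_title: str, ext: str = "pdf") -> str:
    """Build a deterministic filename: <year>_<first_author>_<short_title>.<ext>."""
    def slug(s: str, max_len: int) -> str:
        # Lowercase, replace non-alphanumeric runs with hyphens, trim to length.
        s = re.sub(r"[^a-z0-9]+", "-", s.lower()).strip("-")
        return s[:max_len].rstrip("-")
    return f"{year}_{slug(first_author, 20)}_{slug(short_title, 40)}.{ext}"
```

Because the name is a pure function of the record, a retried upload targets the same file instead of creating a duplicate.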
When only a link is available:
- create a lightweight cloud doc in the paper folder containing the title, source link, access date, and capture notes
- still create the metadata row in the table
Required table fields
Create or expect these fields in the Feishu multi-dimensional table:
- paper_id: stable unique ID
- title: paper title
- doc_link: cloud-docs link to the stored PDF or link-index doc
- source_link: original paper URL
- source_type: pdf, arxiv, openreview, publisher, project, other
- summary_one_line: one-sentence summary in plain language
- tags_topic: reusable topical tags
- tags_method: reusable method tags
- tags_task: reusable task tags
- tags_domain: reusable domain tags
- tags_stage: reusable maturity tags such as reading, worth-reproducing, survey-only
- authors: normalized author string
- venue: venue or source
- year: publication year
- status: new, classified, taxonomy-reviewed, duplicate, error
- message_link: original Feishu message link
- ingested_at: ingestion timestamp
- taxonomy_version: taxonomy version used for the row
Use multi-select fields for all tags_* columns so users can filter and search by labels directly in Feishu.
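The field list above can be carried as a schema constant the writer validates against before any upsert. The type names here are illustrative labels, not the real Feishu Bitable API enums:

```python
# Field name -> intended Feishu field type (labels are illustrative, not API enums).
TABLE_SCHEMA = {
    "paper_id": "text",
    "title": "text",
    "doc_link": "url",
    "source_link": "url",
    "source_type": "single_select",
    "summary_one_line": "text",
    "tags_topic": "multi_select",
    "tags_method": "multi_select",
    "tags_task": "multi_select",
    "tags_domain": "multi_select",
    "tags_stage": "multi_select",
    "authors": "text",
    "venue": "text",
    "year": "number",
    "status": "single_select",
    "message_link": "url",
    "ingested_at": "datetime",
    "taxonomy_version": "text",
}


def validate_row(row: dict) -> list:
    """Return the names of required fields missing from a candidate row."""
    return [f for f in TABLE_SCHEMA if f not in row]
```

Validating before the write keeps partially enriched records out of the system of record.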
Tagging policy
A paper can have multiple tags, but tags must stay reusable. Do not create a new free-form label when an existing reusable label is close enough.
Use a layered taxonomy:
- tags_topic: broad themes such as llm, multimodal, agent, retrieval, alignment, reasoning
- tags_method: technical mechanisms such as rag, rl, distillation, synthetic-data, moe, benchmark
- tags_task: applied tasks such as code-gen, translation, information-extraction, math, search
- tags_domain: business or science domain such as biology, finance, education, robotics
- tags_stage: actionability and curation state
Keep tags short, lowercase, and singular where possible.
Prefer this decision rule:
- reuse an existing tag if it is at least 80 percent semantically correct
- add a synonym mapping instead of a brand-new visible tag when possible
- only introduce a new visible tag if it will likely apply to at least 5 papers in the next 100 rows
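The decision rule above can be approximated with a synonym map plus a fuzzy-match fallback. The synonym entries are examples, and string similarity is only a cheap stand-in for the "80 percent semantically correct" judgment:

```python
import difflib

# Example synonym mappings; the real set lives in references/tag-taxonomy.md.
SYNONYMS = {
    "retrieval-augmented-generation": "rag",
    "reinforcement-learning": "rl",
}


def resolve_tag(candidate: str, existing: set, threshold: float = 0.8):
    """Map a candidate label onto the reusable taxonomy.
    Returns an existing tag, a synonym target, or None (queue for human review
    before any new visible tag is introduced)."""
    c = candidate.strip().lower().replace(" ", "-")
    if c in existing:
        return c
    if c in SYNONYMS:
        return SYNONYMS[c]
    # Fuzzy reuse: accept the closest existing tag above the similarity cutoff.
    match = difflib.get_close_matches(c, existing, n=1, cutoff=threshold)
    return match[0] if match else None
```

A None result should trigger the "will it apply to at least 5 of the next 100 papers" check rather than an automatic new tag.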
See references/tag-taxonomy.md for the starter taxonomy and merge rules.
One-line summary policy
The summary_one_line field must answer:
- what the paper does
- what makes it different
- in language that a technically literate teammate can scan in under 10 seconds
Avoid hype, citation-style phrasing, and long clauses.
Taxonomy iteration loop
Whenever the table row count reaches a multiple of 50, run a taxonomy review pass.
The review pass must:
- export or inspect all existing tags and their frequencies
- identify sparse tags, duplicates, synonyms, and overloaded tags
- propose a new taxonomy version that improves reuse and filtering quality
- map old tags to new tags
- backfill historical rows
- mark affected rows with the new taxonomy_version
- produce a short change log explaining merges, splits, and renamed tags
Optimization goals:
- fewer near-duplicate tags
- better coverage of high-volume themes
- stable filters across time
- minimal churn for already-good tags
Do not relabel everything from scratch unless the old taxonomy is clearly broken. Prefer merge-and-backfill over wholesale replacement.
Output contract
When asked to design the system, return:
- end-to-end workflow
- Feishu table schema
- tagging strategy
- taxonomy iteration logic
- implementation notes for Feishu bot and OpenClaw boundaries
When asked to implement or review code, anchor decisions to:
- ingestion reliability
- deduplication correctness
- idempotent writes
- table filterability
- taxonomy evolution safety
Boundary between Feishu bot and OpenClaw
Default division of responsibility:
- Feishu bot: receive messages, fetch attachments or links, send confirmations, surface errors
- OpenClaw: parse payloads, deduplicate, summarize, classify, write docs and table rows, trigger taxonomy reviews
If the user has an existing architecture, preserve it and only adapt the workflow.
Development contract
When the user wants implementation guidance, assume this integration shape unless the project already defines another one:
- Feishu bot receives the event and verifies the request.
- Feishu bot converts the raw event into a normalized ingestion payload.
- Feishu bot calls an OpenClaw workflow entrypoint with that payload.
- OpenClaw performs enrichment, storage, classification, and table updates.
- OpenClaw returns a structured result for user-visible confirmation.
- Feishu bot posts a success, duplicate, or failure message back to the conversation.
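The six-step integration shape can be sketched as bot-side glue with the real Feishu SDK and OpenClaw client injected as callables (all four are stand-ins, not real APIs):

```python
def handle_event(event: dict, verify, normalize, run_openclaw, post_reply) -> str:
    """Bot-side pipeline sketch; verify/normalize/run_openclaw/post_reply are
    injected stand-ins for the real Feishu and OpenClaw integrations."""
    if not verify(event):                        # 1. verify the request
        return "rejected"
    payloads = normalize(event)                  # 2. one payload per paper candidate
    statuses = []
    for payload in payloads:
        result = run_openclaw(payload)           # 3-4. enrichment, storage, table update
        statuses.append(result["status"])        # 5. structured result
        post_reply(event["message_id"], result)  # 6. success/duplicate/failure reply
    return ",".join(statuses) or "no-candidates"
```

Keeping the bot side this thin makes webhook retries cheap: everything stateful lives behind the OpenClaw entrypoint.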
Design the integration around these engineering constraints:
- idempotent processing for webhook retries
- explicit deduplication before any write
- append-safe and patch-safe updates to the table
- stable identifiers for files, rows, and taxonomy versions
- clear status reporting back to the chat thread
Use references/api-contracts.md for payload shapes and responsibility boundaries.
Use references/event-flows.md for event sequences and retry behavior.
Feishu-side implementation expectations
On the Feishu side, prefer these components:
- webhook endpoint for message events
- message parser for attachments and URLs
- file fetcher for PDF downloads
- Feishu document client for folder upload or doc creation
- Feishu table client for row create and row update
- reply formatter for confirmations and error notices
The Feishu bot should emit one normalized payload per detected paper candidate, not one per message if a message contains multiple papers.
Before calling OpenClaw, the Feishu side should:
- verify event authenticity
- extract message metadata
- collect attachment metadata and URLs
- collect raw hyperlinks in the message body
- attach a stable event_id and message_id
- include enough source metadata for later retries without rereading the original message when possible
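A minimal payload builder for the checklist above; the key names are assumptions about the contract in references/api-contracts.md, not a published schema:

```python
def build_ingestion_payload(event_id: str, message_id: str, sender: str,
                            timestamp: int, candidate: dict) -> dict:
    """One payload per detected paper candidate, carrying enough context
    for idempotent retries on the OpenClaw side."""
    return {
        "event_id": event_id,      # stable event ID used for retry deduplication
        "message_id": message_id,  # stable message ID for audit links
        "sender": sender,
        "timestamp": timestamp,
        "candidate": candidate,    # pdf attachment metadata and/or extracted URLs
    }
```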
OpenClaw-side implementation expectations
On the OpenClaw side, prefer these stages:
- ingest payload validation
- paper candidate normalization
- metadata enrichment from PDF or source page
- duplicate lookup
- cloud-docs write
- one-line summary generation
- multi-dimensional tagging
- table upsert
- taxonomy-threshold check
- taxonomy review workflow when threshold is hit
OpenClaw should treat the table as an upsert target, not append-only storage.
For duplicates:
- keep the canonical row
- patch missing metadata
- optionally append the new message_link to notes or an audit field if the implementation supports it
- return a duplicate result instead of creating a new paper row
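The duplicate-handling rules can be sketched as an in-memory upsert; the dict stands in for the real Feishu table client, and the message_links audit field is the optional extension mentioned above:

```python
def upsert_paper(table: dict, key: str, record: dict) -> str:
    """Upsert keyed on the canonical dedup key. Returns 'created' or 'duplicate'."""
    if key in table:
        row = table[key]
        # Patch only missing metadata; never overwrite existing values.
        for field, value in record.items():
            if not row.get(field) and value:
                row[field] = value
        # Append the new message link to an audit list (optional extension).
        link = record.get("message_link")
        if link and link not in row.setdefault("message_links", []):
            row["message_links"].append(link)
        return "duplicate"
    table[key] = dict(record)
    return "created"
```

Returning the status string lets the bot post the right confirmation without re-querying the table.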
Taxonomy review trigger contract
The threshold is based on canonical non-duplicate paper rows, not raw messages.
Trigger a taxonomy review only when:
- the current canonical row count is divisible by 50
- the taxonomy review for that exact threshold has not already run
Persist a review checkpoint such as last_review_count = 100 or taxonomy_version = v3 so retries do not rerun the same migration.
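The trigger contract reduces to a small pure check against the persisted checkpoint, which is what keeps webhook retries from rerunning the same migration:

```python
def should_run_review(canonical_count: int, last_review_count: int, step: int = 50) -> bool:
    """Trigger a taxonomy review only at a fresh multiple of `step`
    that the persisted checkpoint has not already covered."""
    return (
        canonical_count > 0
        and canonical_count % step == 0
        and canonical_count > last_review_count
    )
```

After a successful review and backfill, the caller advances last_review_count; until then, retries see the same checkpoint and stay idempotent.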
Output expectations for implementation tasks
When implementing code, prefer returning these artifacts:
- module boundaries
- payload schemas
- idempotency rules
- event sequence
- failure handling and retry rules
- sample success and duplicate responses
When reviewing code, check specifically for:
- webhook retry safety
- duplicate PDF uploads
- row duplication from concurrent events
- inconsistent tag writes across dimensions
- taxonomy backfill that can partially fail without rollback
Failure policy
The default failure strategy is:
- fail fast on invalid payloads
- retry transient network or API errors
- do not retry deterministic classification errors forever
- never create a table row before deduplication completes
- never advance taxonomy version markers before backfill succeeds
If partial writes are possible, require compensating logic or a resumable reconciliation job.
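A sketch of the retry split above, assuming errors are classified into transient and deterministic categories (the two exception types are hypothetical names, not a real library):

```python
import time


class TransientError(Exception):
    """Network or API error worth retrying."""


class PayloadError(Exception):
    """Invalid payload; deterministic, never retried."""


def with_retries(op, max_attempts: int = 3, delay: float = 0.0):
    """Retry transient errors a bounded number of times; fail fast otherwise."""
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except PayloadError:
            raise                      # invalid payload: fail fast, no retry
        except TransientError:
            if attempt == max_attempts:
                raise                  # bounded retries, never forever
            time.sleep(delay)
```

Wrapping only the side-effecting stages (file upload, table upsert) in this helper keeps deduplication and checkpoint logic outside the retry loop.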
Resources
- Table schema and record lifecycle: references/table-schema.md
- Tag starter set and refinement rules: references/tag-taxonomy.md
- Feishu/OpenClaw payload and boundary contracts: references/api-contracts.md
- Event sequences, retries, and examples: references/event-flows.md