What is io.github.HzaCode/onecite?
Generates academic citations from a DOI, arXiv ID, title, or URL, with support for BibTeX, APA, MLA, and other formats.
README
<p align="center"> OneCite is a command-line tool and Python library for citation management. It accepts DOIs, paper titles, arXiv IDs, and mixed inputs, and outputs formatted bibliographic entries. </p>
Researchers frequently accumulate reference lists in ad-hoc formats—DOIs copied from browser tabs, arXiv IDs from paper PDFs, titles typed by hand, and BibTeX fragments from various sources. Cleaning these into consistent BibTeX output is tedious and error-prone. OneCite parses raw reference text and attempts metadata lookup against configured sources such as CrossRef, PubMed, arXiv, and Semantic Scholar. The result is a reproducible processing layer that reports unresolved entries and produces auditable BibTeX where metadata can be found.
Features
| Feature | Description |
|---|---|
| Fuzzy Matching | Attempt to match incomplete references against configured academic metadata sources. |
| Multiple Formats | Input .txt/.bib → Output BibTeX. |
| 4-stage Pipeline | A 4-stage process (clean → query → validate → format) to produce consistent output. |
| Field Completion | Fill available fields returned by metadata sources, such as journal, volume, pages, authors, and abstract. |
| 🎓 7+ Citation Types | Handles journal articles, conference papers, books, software, datasets, theses, and preprints. |
| Multi-Source Lookup | Uses source-specific routes for CrossRef, arXiv, PubMed, Semantic Scholar, Google Books, and others. |
| Many Identifier Types | Accepts DOI, PMID, arXiv ID, ISBN, GitHub URL, Zenodo DOI, or plain text queries. |
| 🎛️ Interactive Mode | Manually select the correct entry when multiple potential matches are found. |
| Custom Templates | YAML-based presets that provide a fallback BibTeX entry type when auto-detection is inconclusive. |
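The clean → query → validate → format stages listed in the table can be pictured as four small functions chained together. The following is only a minimal sketch: an in-memory dictionary stands in for the metadata sources, and every name in it (`FAKE_INDEX`, `clean`, `query`, `validate`, `format_bibtex`, `run_pipeline`) is hypothetical rather than part of OneCite's internals.

```python
import re

# Hypothetical stand-in for an external metadata source (no network access).
FAKE_INDEX = {
    "10.1038/nature14539": {
        "title": "Deep learning",
        "author": "LeCun, Yann and Bengio, Yoshua and Hinton, Geoffrey",
        "journal": "Nature",
        "year": 2015,
    }
}


def clean(raw):
    """Stage 1: trim whitespace and drop a leading 'doi:' or resolver prefix."""
    text = raw.strip()
    return re.sub(r"^(?:doi:\s*|https?://(?:dx\.)?doi\.org/)", "", text, flags=re.IGNORECASE)


def query(identifier):
    """Stage 2: look the identifier up; None means unresolved."""
    return FAKE_INDEX.get(identifier)


def validate(record):
    """Stage 3: accept a record only if the minimum fields are present."""
    return record is not None and all(k in record for k in ("title", "author", "year"))


def format_bibtex(key, record):
    """Stage 4: render the record as a BibTeX @article entry."""
    fields = "\n".join(f'  {name} = "{value}",' for name, value in record.items())
    return f"@article{{{key},\n{fields}\n}}"


def run_pipeline(raw):
    """Chain the four stages; return BibTeX text or None for unresolved input."""
    record = query(clean(raw))
    if not validate(record):
        return None
    first_author = record["author"].split(",")[0]
    return format_bibtex(f"{first_author}{record['year']}", record)


print(run_pipeline("https://doi.org/10.1038/nature14539"))
```

Returning `None` for an unresolved input, instead of raising, mirrors the idea of reporting unresolved entries while the rest of a batch continues.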
🌐 Data Sources
Quick Start
Install and try OneCite in a few steps.
1. Installation
# Recommended: Install from PyPI
pip install onecite
2. Create an Input File
Create a file named references.txt with your mixed-format references:
# references.txt
# Add blank lines between entries to avoid misidentification

10.1038/nature14539

Attention is all you need, Vaswani et al., NIPS 2017

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.

https://github.com/tensorflow/tensorflow

10.5281/zenodo.3233118

arXiv:2103.00020

Smith, J. (2020). Neural Architecture Search. PhD Thesis. Stanford University.
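The entries above mix several identifier kinds. As a rough illustration of how such a file could be split into entries and each entry given a coarse label, here is a small sketch. The regular expressions are deliberately simplified, the `#` comment handling is an assumed convention, and `split_entries` and `classify` are hypothetical helpers, not functions exported by OneCite.

```python
import re

# Ordered (label, pattern) pairs; the first match wins. Patterns are simplified.
PATTERNS = [
    ("zenodo_doi", re.compile(r"^10\.5281/zenodo\.\d+$", re.IGNORECASE)),
    ("doi", re.compile(r"^10\.\d{4,9}/\S+$")),
    ("arxiv", re.compile(r"^arxiv:\s*\d{4}\.\d{4,5}(v\d+)?$", re.IGNORECASE)),
    ("github_url", re.compile(r"^https?://github\.com/[\w.-]+/[\w.-]+/?$")),
]


def split_entries(text):
    """Split on blank lines and skip '#' comment lines (assumed convention)."""
    entries = []
    for block in re.split(r"\n\s*\n", text):
        lines = [
            ln.strip()
            for ln in block.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")
        ]
        if lines:
            entries.append(" ".join(lines))
    return entries


def classify(entry):
    """Return a coarse label for one entry, or 'free_text' if nothing matches."""
    text = entry.strip()
    for label, pattern in PATTERNS:
        if pattern.match(text):
            return label
    return "free_text"


sample = "# references.txt\n\n10.1038/nature14539\n\narXiv:2103.00020\n"
for entry in split_entries(sample):
    print(classify(entry), entry)
```

Listing the Zenodo pattern before the generic DOI pattern matters, because a Zenodo DOI also matches the generic one.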
3. Run OneCite
Execute the command to process your file and generate a clean .bib output.
onecite process references.txt -o results.bib --quiet
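If the same step needs to run from a Python script, one option is to call the CLI through `subprocess`. The sketch below uses only the `process`, `-o`, and `--quiet` arguments shown above; `build_command` and `run_onecite` are hypothetical helper names, and `run_onecite` assumes the `onecite` executable is on the `PATH`.

```python
import subprocess


def build_command(input_path, output_path, quiet=True):
    """Assemble the argv list for `onecite process` using flags from this README."""
    command = ["onecite", "process", input_path, "-o", output_path]
    if quiet:
        command.append("--quiet")
    return command


def run_onecite(input_path, output_path):
    """Run the CLI and return its exit code (requires OneCite to be installed)."""
    completed = subprocess.run(build_command(input_path, output_path), check=False)
    return completed.returncode


# Printing the command is safe even when OneCite is not installed.
print(" ".join(build_command("references.txt", "results.bib")))
```

Passing the arguments as a list avoids shell quoting problems when a file path contains spaces.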
4. View Output
Your results.bib file now contains entries of different types.
@article{LeCun2015Deep,
doi = "10.1038/nature14539",
title = "Deep learning",
author = "LeCun, Yann and Bengio, Yoshua and Hinton, Geoffrey",
journal = "Nature",
year = 2015,
volume = 521,
number = 7553,
pages = "436-444",
publisher = "Springer Science and Business Media LLC",
url = "https://doi.org/10.1038/nature14539",
type = "journal-article",
abstract = "Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction...",
}
@inproceedings{Vaswani2017Attention,
arxiv = "1706.03762",
title = "Attention Is All You Need",
author = "Vaswani, Ashish and Shazeer, Noam and Parmar, Niki and Uszkoreit, Jakob and Jones, Llion and Gomez, Aidan N. and Kaiser, Lukasz and Polosukhin, Illia",
year = 2017,
booktitle = "Advances in Neural Information Processing Systems (NeurIPS)",
url = "https://arxiv.org/abs/1706.03762",
}
# ... and 5 more entries ...
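To check quickly which entry types ended up in a generated file, the entry headers can be scanned with a regular expression. This is a lightweight sketch, not a full BibTeX parser, and `list_entries` is a hypothetical helper; the sample text is a shortened version of the output above.

```python
import re

# Matches headers such as "@article{LeCun2015Deep," at the start of a line.
ENTRY_HEADER = re.compile(r"^@(\w+)\s*\{\s*([^,\s]+)\s*,", re.MULTILINE)


def list_entries(bibtex_text):
    """Return (entry_type, citation_key) pairs found in BibTeX text."""
    return [(kind.lower(), key) for kind, key in ENTRY_HEADER.findall(bibtex_text)]


sample = '''@article{LeCun2015Deep,
title = "Deep learning",
}
@inproceedings{Vaswani2017Attention,
title = "Attention Is All You Need",
}'''

for kind, key in list_entries(sample):
    print(f"{key}: {kind}")
```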
📖 Advanced Usage
Direct String and Stdin Input
onecite process "10.1038/nature14539"
onecite process "Attention is all you need, Vaswani et al., NIPS 2017"
echo "10.1038/nature14539" | onecite process -
For ambiguous entries, use the --interactive flag to manually select the correct match and ensure accuracy.
Command:
onecite process ambiguous.txt --interactive
Example Interaction:
Found multiple possible matches for "Deep learning Hinton":
1. Deep learning
Authors: LeCun, Yann; Bengio, Yoshua; Hinton, Geoffrey
Journal: Nature, 2015
DOI: 10.1038/nature14539
2. Deep belief networks
Authors: Hinton, Geoffrey E.
Journal: Scholarpedia, 2009
DOI: 10.4249/scholarpedia.5947
Please select (1-2, 0=skip): 1
Selected: Deep learning
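The prompt in this transcript accepts a number from 1 to the candidate count, with 0 meaning skip. A sketch of how such an answer could be turned into a list index is shown below; `parse_selection` is a hypothetical helper written for illustration and is not taken from OneCite's source.

```python
def parse_selection(answer, candidate_count):
    """Map a typed answer to a zero-based candidate index, or None for skip.

    Raises ValueError when the answer is not a number in 0..candidate_count.
    """
    choice = int(answer.strip())
    if not 0 <= choice <= candidate_count:
        raise ValueError(f"expected a number from 0 to {candidate_count}")
    return None if choice == 0 else choice - 1


print(parse_selection("1", 2))
print(parse_selection("0", 2))
```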
Use OneCite directly in your Python scripts.
from onecite import process_references

# A callback can be used for non-interactive selection (e.g., always choose the best match)
def auto_select_callback(candidates):
    return 0  # Index of the best candidate

result = process_references(
    input_content="Deep learning review\nLeCun, Bengio, Hinton\nNature 2015",
    input_type="txt",
    template_name="journal_article_full",
    output_format="bibtex",
    interactive_callback=auto_select_callback
)
print('\n\n'.join(result['results']))
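A callback does not have to return a fixed index. As a sketch, the factory below builds a callback that picks the candidate whose title is closest to an expected title, using `difflib` from the standard library. It assumes each candidate is a dict with a `"title"` key; the README does not document the candidate structure, so verify that against the installed version before relying on it. `make_title_callback` is a hypothetical name.

```python
from difflib import SequenceMatcher


def make_title_callback(expected_title):
    """Build a callback returning the index of the closest-titled candidate."""

    def callback(candidates):
        def score(candidate):
            title = str(candidate.get("title", ""))  # "title" key is assumed
            return SequenceMatcher(None, expected_title.lower(), title.lower()).ratio()

        return max(range(len(candidates)), key=lambda i: score(candidates[i]))

    return callback


# Hypothetical candidate list for demonstration only.
candidates = [
    {"title": "Deep belief networks"},
    {"title": "Deep learning"},
]
pick = make_title_callback("Deep learning")
print(pick(candidates))
```

The resulting function could be passed as `interactive_callback` in place of `auto_select_callback`, provided the candidate shape matches the assumption.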
OneCite provides a command-line interface with the following commands and options:
onecite process
The main command for processing references through the OneCite pipeline.
Usage:
onecite process <input_file> [OPTIONS]
Arguments:
input_file - Input file path, - for stdin, or a reference string (e.g., DOI, title)
Options:
| Option | Short | Description | Default |
|---|---|---|---|
| --input-type | | Input format: txt or bib | txt |
| --template | | Fallback BibTeX entry-type preset when auto-detection is inconclusive | journal_article_full |
| --output-format | | Output format (currently only bibtex supported) | bibtex |
| --output | -o | Output file path (default: stdout) | - |
| --interactive | | Enable interactive mode for ambiguous matches | False |
| --quiet | -q | Suppress verbose logging output | False |
| --json | | Print a stable JSON envelope instead of BibTeX text | False |
| --ndjson | | Print newline-delimited JSON events for streaming automation workflows | False |
| --fail-on-unresolved | | Return exit code 2 when any entry cannot be resolved | False |
| --google-scholar | | Enable Google Scholar as an additional data source (requires scholarly package) | False |
Examples:
# Process a text file
onecite process references.txt -o results.bib
# Process a BibTeX file with auto-detection
onecite process references.bib
# Process with interactive mode
onecite process ambiguous.txt --interactive
# Use stdin
echo "10.1038/nature14539" | onecite process -
# Process a direct string (DOI)
onecite process "10.1038/nature14539"
# Process with custom template
onecite process references.txt --template conference_paper
# Enable Google Scholar (requires scholarly package)
onecite process references.txt --google-scholar
# Quiet mode for scripts
onecite process references.txt -o results.bib --quiet
# Automation-friendly JSON with unresolved-entry exit-code handling
onecite process references.txt --json --fail-on-unresolved
# Streaming NDJSON for automation
onecite process references.txt --ndjson
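Newline-delimited JSON can be consumed one line at a time, which is what makes it suitable for streaming. The sketch below shows a generic reader. The event fields in `sample_stream` (`event`, `status`) are hypothetical placeholders, because this README does not document the event schema; inspect real `--ndjson` output before depending on any field name.

```python
import json


def iter_ndjson(lines):
    """Yield one parsed object per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical event lines; the real field names may differ.
sample_stream = [
    '{"event": "entry", "status": "resolved"}',
    '',
    '{"event": "entry", "status": "unresolved"}',
]

events = list(iter_ndjson(sample_stream))
unresolved = [e for e in events if e.get("status") == "unresolved"]
print(len(events), len(unresolved))
```

Because `iter_ndjson` accepts any iterable of strings, it can also be fed `sys.stdin` or the `stdout` pipe of a running process.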
onecite --version
Display the installed OneCite version.
Usage:
onecite --version
onecite version
Alternative command to display version information.
Usage:
onecite version
onecite templates
List the bundled fallback BibTeX templates and the fields they request.
Usage:
onecite templates
onecite templates --json
onecite benchmark
Run a small deterministic regression suite for covered DOI lookup, arXiv lookup, PMID/PubMed lookup, GitHub software URLs, Zenodo/DataCite dataset DOIs, and mixed valid/invalid batches. The command is designed for CI and automation workflows that need a machine-readable pass/fail check; it is not a comprehensive citation-accuracy benchmark.
Usage:
onecite benchmark [OPTIONS]
Options:
| Option | Description | Default |
|---|---|---|
| --cases | Path to a custom benchmark suite JSON file | bundled golden cases |
| --min-success-rate | Minimum covered-case pass rate required for exit code 0 | 1.0 |
| --json | Print the benchmark report as JSON | False |
| --live | Use live external APIs instead of bundled offline fixtures | False |
Examples:
onecite benchmark
onecite benchmark --json
onecite benchmark --live --json
onecite benchmark --cases my_cases.json --min-success-rate 1.0 --json
The repository baseline record is stored at benchmarks/leaderboard.json, with
reproduction instructions in benchmarks/README.md.
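The `--min-success-rate` gate amounts to comparing a pass rate against a threshold. The arithmetic can be sketched as follows; `gate_exit_code` is a hypothetical function that illustrates the idea and is not OneCite's actual implementation.

```python
def gate_exit_code(passed, total, min_success_rate=1.0):
    """Return 0 when passed/total meets the threshold, else 1."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 0 if passed / total >= min_success_rate else 1


print(gate_exit_code(6, 6))       # all cases pass with the default threshold
print(gate_exit_code(5, 6))       # one failure with the default threshold
print(gate_exit_code(5, 6, 0.8))  # one failure with a relaxed threshold
```

With the default of 1.0, a single failing case is enough to produce a non-zero exit code, which is the strictest setting for CI.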
onecite doctor
Check the local installation health for automation and CI. The doctor command checks package importability, bundled templates, packaged benchmark resources, the repository-contained OneCite Skill, and the offline benchmark regression check.
Usage:
onecite doctor
onecite doctor --json
The JSON output is a stable envelope with schema_version, tool,
command, status, environment, summary, and checks fields.
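A script that consumes this output might first confirm that the listed top-level fields are present. The sketch below does only that. The field names come from the paragraph above, while the values in `sample` are invented placeholders and `missing_fields` is a hypothetical helper.

```python
import json

# Top-level fields named in this README for the doctor JSON envelope.
REQUIRED_FIELDS = (
    "schema_version", "tool", "command", "status",
    "environment", "summary", "checks",
)


def missing_fields(envelope_text):
    """Return the required field names absent from a JSON envelope string."""
    envelope = json.loads(envelope_text)
    return [name for name in REQUIRED_FIELDS if name not in envelope]


# Placeholder values for demonstration; real output will differ.
sample = json.dumps({
    "schema_version": "placeholder",
    "tool": "onecite",
    "command": "doctor",
    "status": "placeholder",
    "environment": {},
    "summary": {},
    "checks": [],
})

print(missing_fields(sample))
```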
OneCite Skill for Automated Workflows
The repository includes a local skill package at skills/onecite/SKILL.md.
It gives automation and contributor workflows a repeatable procedure for
reference cleanup, benchmark and doctor checks, and explicit
reporting of unresolved entries.
The skill is repository-contained and does not install itself into any local
tool memory.
Input Type Auto-Detection
When --input-type is not specified, OneCite automatically detects the input type:
- Files ending with .bib are treated as BibTeX format
- All other files and strings are treated as plain text
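The detection rule above is simple enough to express in one line. The sketch below is an illustration of the rule rather than OneCite's code; `detect_input_type` is a hypothetical name, and the comparison is case-sensitive because the README does not say how an extension such as `.BIB` is handled.

```python
def detect_input_type(source):
    """Return 'bib' for names ending in .bib, otherwise 'txt'."""
    return "bib" if source.endswith(".bib") else "txt"


for source in ("references.bib", "references.txt", "10.1038/nature14539"):
    print(source, "->", detect_input_type(source))
```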
Available Templates
OneCite supports several template presets for different entry types:
- journal_article_full - Full journal article entry (default)
- conference_paper - Conference proceedings paper
- book - Book entry
- thesis - Thesis/dissertation entry
- dataset - Dataset entry
- software - Software/code entry
Exit Codes
- 0 - Success
- 1 - Error occurred (invalid input, processing failure, etc.)
- 2 - One or more entries were unresolved when --fail-on-unresolved was used
For onecite benchmark and onecite doctor, exit code 0 means the
configured checks passed and exit code 1 means at least one check failed.
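In an automation script, the exit codes of `onecite process` can be turned into readable messages before deciding whether to fail a job. The mapping below follows the list above; `describe_exit` is a hypothetical helper, and the message wording is illustrative.

```python
def describe_exit(code, fail_on_unresolved=False):
    """Translate an `onecite process` exit code into a short message."""
    if code == 0:
        return "success"
    if code == 2 and fail_on_unresolved:
        return "one or more entries unresolved"
    if code == 1:
        return "error (invalid input or processing failure)"
    return f"unexpected exit code {code}"


for code in (0, 1, 2):
    print(code, describe_exit(code, fail_on_unresolved=True))
```

Treating code 2 as meaningful only when `--fail-on-unresolved` was passed keeps the helper consistent with the documented behaviour.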
🗺️ Roadmap
- OneCite Skill — Repository-contained operating guide for local citation-cleanup workflows
- Benchmarking — Small deterministic regression suite, configurable pass-rate gate, and baseline record
- Enhanced CLI — Automation-friendly JSON, NDJSON, summaries, and exit codes for reference processing
🤝 Contributing
Contributions are always welcome! Please see CONTRIBUTING.md for development guidelines and instructions on how to submit a pull request.
📄 License
This project is licensed under the MIT License. See the LICENSE file for details.
<div align="center">OneCite
<p> <a href="https://github.com/HzaCode/OneCite">Star on GitHub</a> • <a href="http://hezhiang.com/onecite">Web App</a> • <a href="https://github.com/HzaCode/OneCite/issues">🐛 Report an Issue</a> • <a href="https://github.com/HzaCode/OneCite/discussions">Discussions</a> </p> </div>