Large-Scale Deanonymization Attacks with LLMs

A new paper shows that large language models (LLMs) can carry out deanonymization attacks at unprecedented scale and precision. The researchers built an agent that, with full Internet access, re-identifies Hacker News users and Anthropic Interviewer participants at high precision from pseudonymous online profiles and conversations alone, matching what would take a dedicated human investigator hours of work.
Attack Pipeline Design
For the closed-world setting, the team designed a scalable attack pipeline. Given two databases of pseudonymous individuals, each containing unstructured text written by or about them, the pipeline uses LLMs for three core steps:
- Extract identity-relevant features: identify information in the raw text that may relate to a person's identity.
- Search for candidate matches via semantic embeddings: use vector embeddings for semantic-similarity search to quickly narrow the pool of potential matching pairs.
- Reason over top candidates: apply deeper LLM reasoning to the shortlisted candidates to verify matches and reduce false positives.
Unlike classical deanonymization work that required structured data (e.g., the Netflix Prize), this approach operates directly on raw user content from arbitrary platforms.
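The candidate-search step (step 2) can be sketched as a nearest-neighbor lookup over profile embeddings. This is a minimal illustration, not the paper's implementation: the real pipeline embeds LLM-extracted identity features with a proper embedding model, whereas the `embed` function below is a deliberately crude bag-of-words stand-in, and all names and data are made up.

```python
import math
import re
from collections import Counter

def embed(text):
    # Stand-in for a semantic embedding: a bag-of-words token count.
    # The paper's pipeline would embed LLM-extracted identity features
    # with a learned embedding model; this tokenizer is illustrative only.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(n * b[t] for t, n in a.items())
    na = math.sqrt(sum(n * n for n in a.values()))
    nb = math.sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def candidate_matches(queries, corpus, top_k=3):
    """Step 2 of the pipeline: for each pseudonymous profile in `queries`,
    rank profiles in `corpus` by embedding similarity and keep the top_k
    candidates for the more expensive LLM verification step (step 3)."""
    corpus_vecs = {cid: embed(t) for cid, t in corpus.items()}
    out = {}
    for qid, qtext in queries.items():
        qv = embed(qtext)
        ranked = sorted(corpus_vecs,
                        key=lambda c: cosine(qv, corpus_vecs[c]),
                        reverse=True)
        out[qid] = ranked[:top_k]
    return out

# Toy example: one Hacker News-style profile against two LinkedIn-style ones.
hn = {"hn_user": "rust compiler engineer in zurich, writes about llvm"}
li = {
    "li_1": "software engineer, zurich: compilers, llvm, rust",
    "li_2": "marketing manager, paris",
}
print(candidate_matches(hn, li, top_k=1))  # → {'hn_user': ['li_1']}
```

The point of the shortlist is cost control: embedding similarity is cheap enough to run over every cross-database pair, so the expensive LLM reasoning only ever sees `top_k` candidates per query.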
Datasets and Evaluation
To evaluate the attacks, the researchers constructed three datasets with known ground-truth matches:
- Dataset 1: links Hacker News users to LinkedIn profiles, using cross-platform references that appear in the profiles.
- Dataset 2: matches users across Reddit movie discussion communities.
- Dataset 3: splits a single user's Reddit history in time to create two pseudonymous profiles to be matched.
In every setting, the LLM-based methods substantially outperformed classical baselines: at 90% precision, the LLM approach achieved up to 68% recall, while the best non-LLM method achieved near 0% recall.
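For the third dataset, one way such a time-split construction could look is the following sketch. The paper does not publish its exact splitting rule, so the median-timestamp cut below is an assumption for illustration:

```python
def split_history(posts):
    """Sketch of a Dataset-3-style construction: split one user's
    timestamped posts at the median time into two pseudonymous profiles.
    `posts` is a list of (timestamp, text) pairs; the split point
    (median) is our assumption, not necessarily the paper's rule."""
    posts = sorted(posts, key=lambda p: p[0])
    mid = len(posts) // 2
    early = " ".join(text for _, text in posts[:mid])
    late = " ".join(text for _, text in posts[mid:])
    return early, late

# Toy example with out-of-order timestamps.
history = [(3, "loved dune part two"), (1, "any villeneuve fans here"),
           (4, "rewatching arrival"), (2, "blade runner 2049 holds up")]
early_profile, late_profile = split_history(history)
```

Because both halves come from the same person, the ground-truth match is known by construction, which makes this the cheapest of the three datasets to build.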
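The "recall at 90% precision" figure can be reproduced from any scored list of candidate pairs by sweeping a decision threshold. The sketch below shows the standard computation of this metric; the paper's exact evaluation protocol is not shown here, so treat this as an assumed, conventional definition:

```python
def recall_at_precision(scored_pairs, min_precision=0.90):
    """Given (score, is_true_match) pairs, sweep the decision threshold
    from the highest score down and return the best recall achievable
    while precision stays at or above `min_precision` -- the shape of
    the metric the paper reports (up to 68% recall at 90% precision)."""
    total_true = sum(1 for _, y in scored_pairs if y)
    best_recall = 0.0
    tp = fp = 0
    # Accepting pairs from highest score down traces the precision-recall
    # curve; we keep the best recall among points meeting the precision bar.
    for _, y in sorted(scored_pairs, reverse=True):
        if y:
            tp += 1
        else:
            fp += 1
        if tp / (tp + fp) >= min_precision:
            best_recall = max(best_recall, tp / total_true)
    return best_recall

# Toy data: 8 high-scoring true matches, 2 false positives, 2 stragglers.
pairs = ([(0.90 - 0.01 * i, True) for i in range(8)]
         + [(0.70, False), (0.69, False)]
         + [(0.60, True), (0.59, True)])
print(recall_at_precision(pairs))  # → 0.8
```

The near-0% recall of the non-LLM baselines means that by the time their score threshold is loose enough to recover true matches, false positives have already pushed precision below 90%.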
Conclusions and Implications
The results show that the "practical obscurity" assumed to protect pseudonymous users online no longer holds, and that threat models for online privacy need to be reconsidered.
Paper Information
- Title: Large-scale online deanonymization with LLMs
- Abstract: We show that large language models can be used to perform at-scale deanonymization. With full Internet access, our agent can re-identify Hacker News users and Anthropic Interviewer participants at high precision, given pseudonymous online profiles and conversations alone, matching what would take hours for a dedicated human investigator. We then design attacks for the closed-world setting. Given two databases of pseudonymous individuals, each containing unstructured text written by or about that individual, we implement a scalable attack pipeline that uses LLMs to: (1) extract identity-relevant features, (2) search for candidate matches via semantic embeddings, and (3) reason over top candidates to verify matches and reduce false positives. Compared to classical deanonymization work (e.g., on the Netflix prize) that required structured data, our approach works directly on raw user content across arbitrary platforms. We construct three datasets with known ground-truth data to evaluate our attacks. The first links Hacker News to LinkedIn profiles, using cross-platform references that appear in the profiles. Our second dataset matches users across Reddit movie discussion communities; and the third splits a single user's Reddit history in time to create two pseudonymous profiles to be matched. In each setting, LLM-based methods substantially outperform classical baselines, achieving up to 68% recall at 90% precision compared to near 0% for the best non-LLM method. Our results show that the practical obscurity protecting pseudonymous users online no longer holds and that threat models for online privacy need to be reconsidered.
- Comments: 24 pages, 10 figures
- Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
- Cite as: arXiv:2602.16800 [cs.CR] (or arXiv:2602.16800v2 [cs.CR] for this version)
- DOI: https://doi.org/10.48550/arXiv.2602.16800
Submission History
- From: Daniel Paleka
- [v1] Wed, 18 Feb 2026 19:02:50 UTC (1,555 KB)
- [v2] Wed, 25 Feb 2026 18:37:33 UTC (1,557 KB)