LinkedIn Profile Data Mining Server

Data & Storage

by amankale376

Enable advanced LinkedIn profile search, extraction, and contact information enrichment through a powerful MCP server. Leverage AI-powered query expansion, smart filtering, and multiple data sources to obtain comprehensive and validated professional profiles. Export and manage data efficiently with built-in CSV support and persistent storage.

README

LinkedIn Profile Data Mining MCP Server

A comprehensive Model Context Protocol (MCP) server for LinkedIn profile data mining, search, and contact information enrichment. It exposes the features of the original data mining tool through an MCP-compatible interface.

Features

🔍 Advanced Search Capabilities

  • Google Search Integration: Uses Google Custom Search API for LinkedIn profile discovery
  • AI-Powered Query Expansion: Generates additional search queries using OpenAI GPT-4o mini
  • Smart Filtering: AI-based relevance filtering to ensure high-quality results
  • Location-Based Search: Supports global location targeting for comprehensive coverage
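The generate_search_queries example later in this README restricts Google results to profile pages with the `site:linkedin.com/in` operator. Here is a minimal sketch of how keywords and an optional location might be combined into one such query. The helper name `buildProfileQuery` and the quoting of the location are illustrative assumptions, not the server's code:

```typescript
// Hypothetical helper: builds a Google query restricted to LinkedIn profile pages.
// Assumes profiles live under linkedin.com/in/, as in the generate_search_queries example.
function buildProfileQuery(keywords: string, location?: string): string {
  const parts = ["site:linkedin.com/in", keywords.trim()];
  if (location && location.trim().length > 0) {
    // Quote the location so multi-word places ("San Francisco") stay together.
    parts.push(`"${location.trim()}"`);
  }
  return parts.join(" ");
}

console.log(buildProfileQuery("AI podcast host"));
// site:linkedin.com/in AI podcast host
console.log(buildProfileQuery("blockchain developer", "Berlin"));
// site:linkedin.com/in blockchain developer "Berlin"
```

Quoting the location keeps a multi-word place together as a phrase. Whether the server does the same is not documented.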

📊 Profile Data Extraction

  • Direct LinkedIn Scraping: Extracts profile data directly from LinkedIn pages
  • Nubela Proxycurl Fallback: Uses Nubela API when direct scraping fails
  • Structured Data Parsing: Extracts JSON-LD structured data from LinkedIn profiles
  • Comprehensive Profile Fields: Name, company, job title, description, followers, etc.
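This README does not show how the JSON-LD step works. The following is a minimal sketch. It assumes that profile pages embed `<script type="application/ld+json">` blocks and that the profile is the node typed `Person`. The helpers `toNodes`, `extractJsonLd`, and `findPerson` are hypothetical, and the regex is a simplification of real HTML parsing:

```typescript
// Hypothetical helpers: collect JSON-LD nodes from raw HTML and pick the Person node.
// A regex is used for brevity; a production scraper would more likely use an HTML parser.
type JsonLdNode = Record<string, unknown>;

// A JSON-LD block may hold one object, an array of objects, or an object with "@graph".
function toNodes(parsed: unknown): unknown[] {
  if (Array.isArray(parsed)) return parsed;
  if (parsed !== null && typeof parsed === "object") {
    const graph = (parsed as JsonLdNode)["@graph"];
    return Array.isArray(graph) ? graph : [parsed];
  }
  return [];
}

function extractJsonLd(html: string): JsonLdNode[] {
  const pattern = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const nodes: JsonLdNode[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    try {
      for (const item of toNodes(JSON.parse(match[1] ?? ""))) {
        if (item !== null && typeof item === "object") nodes.push(item as JsonLdNode);
      }
    } catch {
      // Skip malformed blocks instead of failing the whole page.
    }
  }
  return nodes;
}

function findPerson(nodes: JsonLdNode[]): JsonLdNode | undefined {
  return nodes.find((node) => {
    const type = node["@type"];
    return type === "Person" || (Array.isArray(type) && type.includes("Person"));
  });
}
```

Fields from the returned node, such as `name`, could then be mapped onto `ProfileData` (for example, to `author_name`). That mapping is an assumption.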

📞 Contact Information Enrichment

  • Apollo.io Integration: Enriches profiles with email addresses and phone numbers
  • Company Information: Retrieves detailed company descriptions and contact details
  • Professional Validation: Ensures contact information accuracy through API validation

🤖 AI-Powered Features

  • Profile Summarization: Generates concise professional summaries using AI
  • Relevance Scoring: AI-based filtering to match search intent
  • Query Optimization: Intelligent search query generation and expansion
  • Multiple LLM Support: OpenAI, Gemini, OpenRouter, and Ollama compatibility

💾 Data Management

  • SQLite Database: Persistent storage for all extracted profiles
  • CSV Export: Easy data export for analysis and CRM integration
  • Duplicate Prevention: Automatic detection and prevention of duplicate profiles
  • Data Validation: Ensures data quality and completeness
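Duplicate detection needs a stable key for each profile. This sketch assumes the key is a normalized `author_profile_url`. The normalization rules and the helper names `normalizeProfileUrl` and `dedupeProfiles` are assumptions. In particular, treating LinkedIn profile slugs as case-insensitive is not confirmed by this README:

```typescript
// Hypothetical duplicate check keyed on a normalized profile URL.
// Assumed rules: any *.linkedin.com host maps to www.linkedin.com, the path is
// lowercased, and the query string, fragment, and trailing slashes are dropped.
function normalizeProfileUrl(raw: string): string {
  const url = new URL(raw.trim()); // throws on malformed input
  const host = url.hostname.toLowerCase();
  const isLinkedIn = host === "linkedin.com" || host.endsWith(".linkedin.com");
  const path = url.pathname.toLowerCase().replace(/\/+$/, "");
  return `https://${isLinkedIn ? "www.linkedin.com" : host}${path}`;
}

// Keeps the first occurrence of each profile and drops later duplicates.
function dedupeProfiles<T extends { author_profile_url: string }>(profiles: T[]): T[] {
  const seen = new Set<string>();
  return profiles.filter((profile) => {
    const key = normalizeProfileUrl(profile.author_profile_url);
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}
```

The same normalized key could also back a unique column in the SQLite store. Whether the server does this is not stated.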

Installation

  1. Clone or navigate to the server directory:

    ```bash
    cd smithery-servers/profile-searcher
    ```
    
  2. Install dependencies:

    ```bash
    npm install
    ```
    
  3. Configure API keys:

    ```bash
    # Copy the example configuration file
    cp .env.example .env

    # Edit .env with your API keys
    nano .env  # or use your preferred editor
    ```
    
  4. Start the development server:

    ```bash
    npm run dev
    ```
    

🔑 API Keys Configuration

📋 See CONFIGURATION.md for detailed setup instructions

Quick Setup:

  1. Required: Apollo.io API key → Get from apollo.io/settings/integrations
  2. Required: OpenAI API key → Get from platform.openai.com/api-keys
  3. Optional: Nubela API key → Get from nubela.co/proxycurl

Environment Variables (.env file):

```env
APOLLO_API_KEY=your_apollo_api_key_here
OPENAI_API_KEY=sk-your_openai_api_key_here
NUBELA_API_KEY=your_nubela_api_key_here
DEBUG=false
```
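A startup check for these variables might look like the following sketch. It takes the required/optional split from the Quick Setup list and treats `DEBUG` as a `"true"`/`"false"` string. `loadConfig`, `ServerConfig`, and `Env` are hypothetical names:

```typescript
// Hypothetical startup check for the .env variables listed above.
// Required: APOLLO_API_KEY, OPENAI_API_KEY. Optional: NUBELA_API_KEY.
// DEBUG is read as a boolean flag ("true" in any letter case enables it).
interface ServerConfig {
  apolloApiKey: string;
  openaiApiKey: string;
  nubelaApiKey?: string;
  debug: boolean;
}

type Env = Record<string, string | undefined>;

function loadConfig(env: Env): ServerConfig {
  const required = ["APOLLO_API_KEY", "OPENAI_API_KEY"];
  const missing = required.filter((name) => !env[name]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    apolloApiKey: env["APOLLO_API_KEY"]!.trim(),
    openaiApiKey: env["OPENAI_API_KEY"]!.trim(),
    // undefined here is assumed to mean "no Nubela fallback available"
    nubelaApiKey: env["NUBELA_API_KEY"]?.trim() || undefined,
    debug: (env["DEBUG"] ?? "false").trim().toLowerCase() === "true",
  };
}
```

Calling `loadConfig(process.env)` once at startup would fail fast on a missing key. That surfaces the "API Key Errors" case from Troubleshooting before any API request is made.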

For Claude Desktop (claude_desktop_config.json):

```json
{
  "mcpServers": {
    "profile-searcher": {
      "command": "node",
      "args": ["/path/to/smithery-servers/profile-searcher/dist/index.js"],
      "env": {
        "APOLLO_API_KEY": "your_apollo_api_key_here",
        "OPENAI_API_KEY": "sk-your_openai_api_key_here"
      }
    }
  }
}
```

Available Tools

1. search_linkedin_profiles

Search for LinkedIn profiles based on keywords.

Parameters:

  • keywords (string): Search keywords (e.g., "AI podcast host")
  • num_results (number): Number of results to return (default: 20)

Example:

```json
{
  "keywords": "AI podcast host",
  "num_results": 10
}
```

2. extract_profile_data

Extract detailed profile data from LinkedIn URLs.

Parameters:

  • urls (array): Array of LinkedIn profile URLs
  • include_contact_info (boolean): Whether to include contact info (default: true)

Example:

```json
{
  "urls": [
    "https://www.linkedin.com/in/example-profile",
    "https://www.linkedin.com/in/another-profile"
  ],
  "include_contact_info": true
}
```
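Because `urls` accepts arbitrary strings, a pre-check could reject non-profile URLs before any scraping happens. This minimal sketch assumes that profile URLs always have the form `https://<sub>.linkedin.com/in/<slug>`. `partitionProfileUrls` is a hypothetical helper:

```typescript
// Hypothetical pre-check for extract_profile_data: split the `urls` array into
// LinkedIn profile URLs (/in/<slug>) and everything else before scraping.
function partitionProfileUrls(urls: string[]): { valid: string[]; rejected: string[] } {
  const valid: string[] = [];
  const rejected: string[] = [];
  for (const raw of urls) {
    let ok = false;
    try {
      const url = new URL(raw);
      const host = url.hostname.toLowerCase();
      // Exact domain or a real subdomain; "evil-linkedin.com" does not match.
      const onLinkedIn = host === "linkedin.com" || host.endsWith(".linkedin.com");
      ok = url.protocol === "https:" && onLinkedIn && /^\/in\/[^/]+\/?$/.test(url.pathname);
    } catch {
      ok = false; // Not a parseable URL at all.
    }
    (ok ? valid : rejected).push(raw);
  }
  return { valid, rejected };
}
```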

3. mine_linkedin_data

Comprehensive data mining: search, extract, and enrich profile data.

Parameters:

  • keywords (string): Keywords to search for
  • num_results (number): Number of profiles to process (default: 20)
  • export_csv (boolean): Whether to export to CSV (default: true)
  • csv_filename (string, optional): Custom CSV filename

Example:

```json
{
  "keywords": "blockchain developer",
  "num_results": 25,
  "export_csv": true,
  "csv_filename": "blockchain_developers.csv"
}
```
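mine_linkedin_data chains the search, extraction, enrichment, and export steps described under Features. The sketch below shows one plausible ordering. The steps are injected as functions, so the flow can be read and tested without real API calls. `MiningSteps`, `mineLinkedInData`, and the default filename `profiles.csv` are assumptions, not the server's internals:

```typescript
// Hypothetical orchestration for mine_linkedin_data. Each step is injected so the
// flow can be read and tested without network access; names are assumptions.
interface Profile {
  author_profile_url: string;
  author_name?: string;
  email?: string;
}

interface MiningSteps {
  search: (keywords: string, numResults: number) => Promise<string[]>;
  extract: (url: string) => Promise<Profile | null>; // e.g. direct scrape, then Nubela fallback
  enrich: (profile: Profile) => Promise<Profile>; // e.g. Apollo contact lookup
  exportCsv: (profiles: Profile[], filename: string) => Promise<void>;
}

async function mineLinkedInData(
  steps: MiningSteps,
  keywords: string,
  numResults = 20,
  shouldExport = true,
  csvFilename = "profiles.csv",
): Promise<Profile[]> {
  const urls = await steps.search(keywords, numResults);
  const profiles: Profile[] = [];
  for (const url of urls.slice(0, numResults)) {
    const extracted = await steps.extract(url);
    // Skip URLs that yielded no data; enrich the rest.
    if (extracted) profiles.push(await steps.enrich(extracted));
  }
  if (shouldExport) await steps.exportCsv(profiles, csvFilename);
  return profiles;
}
```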

4. get_contact_info

Get contact information for a specific person using the Apollo API.

Parameters:

  • person_name (string): Full name of the person
  • company_name (string): Company where the person works

Example:

```json
{
  "person_name": "John Smith",
  "company_name": "Tech Corp"
}
```

5. export_to_csv

Export all stored profile data to a CSV file.

Parameters:

  • filename (string, optional): Custom filename for export

Example:

```json
{
  "filename": "all_profiles_export.csv"
}
```
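CSV export has to escape values such as company names that contain commas or headlines that contain quotes. Here is a minimal sketch that follows RFC 4180 quoting. The `toCsv` helper and the column subset are illustrative, since the actual export's column order is not documented:

```typescript
// Minimal RFC 4180-style serializer: fields containing commas, double quotes, or
// line breaks are wrapped in quotes, and embedded quotes are doubled.
function toCsv(rows: Array<Record<string, string | undefined>>, columns: string[]): string {
  const escape = (value: string | undefined): string => {
    const v = value ?? ""; // missing fields become empty cells
    return /[",\r\n]/.test(v) ? `"${v.replace(/"/g, '""')}"` : v;
  };
  const lines = [columns.map(escape).join(",")];
  for (const row of rows) {
    lines.push(columns.map((column) => escape(row[column])).join(","));
  }
  return lines.join("\r\n"); // RFC 4180 uses CRLF line endings
}

const csv = toCsv(
  [{ author_name: "Jane Doe", Company: "AI Innovations, Inc", Job_title: 'Host of "AI Weekly"' }],
  ["author_name", "Company", "Job_title", "email"],
);
console.log(csv);
```

Here the `Company` value is quoted because it contains a comma, the quotes inside `Job_title` are doubled, and the missing `email` becomes an empty field.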

6. get_stored_profiles

Retrieve all profiles stored in the database.

Parameters: None

7. generate_search_queries

Generate additional search queries using AI.

Parameters:

  • main_query (string): Main search query to expand
  • num_queries (number): Number of additional queries (default: 3)

Example:

```json
{
  "main_query": "site:linkedin.com/in AI podcast host",
  "num_queries": 5
}
```
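The LLM reply behind generate_search_queries still has to be turned into a clean list of queries. This sketch assumes the reply puts one query per line, possibly numbered or bulleted. `parseGeneratedQueries` is hypothetical, and re-adding the `site:` prefix mirrors the `main_query` format shown above:

```typescript
// Hypothetical post-processing for generate_search_queries: strip list markers,
// keep the site: restriction, drop case-insensitive duplicates, cap at numQueries.
function parseGeneratedQueries(reply: string, numQueries = 3, site = "site:linkedin.com/in"): string[] {
  const seen = new Set<string>();
  const queries: string[] = [];
  for (const line of reply.split(/\r?\n/)) {
    // Remove leading "1.", "2)", "-", "*", or "•" markers.
    let query = line.replace(/^\s*(?:\d+[.)]|[-*•])\s*/, "").trim();
    if (query.length === 0) continue;
    if (!query.startsWith("site:")) query = `${site} ${query}`;
    const key = query.toLowerCase();
    if (seen.has(key)) continue;
    seen.add(key);
    queries.push(query);
    if (queries.length >= numQueries) break;
  }
  return queries;
}
```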

Data Structure

Profile Data Fields

Each extracted profile contains the following fields:

```typescript
interface ProfileData {
  author_profile_url: string;           // LinkedIn profile URL
  author_name?: string;                 // Full name
  authors_desc?: string;                // Profile headline/description
  Company?: string;                     // Current company
  Job_title?: string;                   // Current job title
  InteractionStatistic_followers?: string; // Follower count
  email?: string;                       // Email address (from Apollo)
  phone1?: string;                      // Primary phone (from Apollo)
  phone2?: string;                      // Secondary phone (from Apollo)
  about_company?: string;               // Company description (from Apollo)
  profile_summary?: string;             // AI-generated summary
  post_details?: string;                // Recent post content
  transcript?: string;                  // Podcast/video transcripts
  post_summary?: string;                // AI summary of posts
  transcript_summary?: string;          // AI summary of transcripts
  author_activity?: string;             // Activity summary
}
```

Database Schema

The server uses SQLite with three main tables:

  1. author_urls_table: Stores complete profile information
  2. validated_profiles: Tracks AI validation results
  3. search_queries: Stores search queries and results

File Structure

```text
smithery-servers/profile-searcher/
├── src/
│   └── index.ts              # Main server implementation
├── package.json              # Dependencies and scripts
├── README.md                 # This documentation
├── smithery.yaml             # Smithery configuration
├── Database/                 # SQLite database files
│   └── author_profile_.db    # Main database
└── Data/                     # CSV export files
    └── *.csv                 # Exported profile data
```

Usage Examples

Basic Profile Search

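```javascript
// Search for AI podcast hosts.
// Assumes mcpClient is a connected @modelcontextprotocol/sdk Client,
// whose callTool takes a single { name, arguments } object.
const result = await mcpClient.callTool({
  name: "search_linkedin_profiles",
  arguments: { keywords: "AI podcast host", num_results: 15 }
});
```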

Comprehensive Data Mining

```javascript
// Mine data for blockchain developers.
// Assumes mcpClient is a connected @modelcontextprotocol/sdk Client.
const result = await mcpClient.callTool({
  name: "mine_linkedin_data",
  arguments: {
    keywords: "blockchain developer",
    num_results: 30,
    export_csv: true,
    csv_filename: "blockchain_talent.csv"
  }
});
```

Contact Information Lookup

```javascript
// Get contact info for a specific person.
// Assumes mcpClient is a connected @modelcontextprotocol/sdk Client.
const result = await mcpClient.callTool({
  name: "get_contact_info",
  arguments: { person_name: "Jane Doe", company_name: "AI Innovations Inc" }
});
```

Rate Limiting and Best Practices

  1. Respect Rate Limits: The server includes built-in delays between requests
  2. API Key Management: Keep your API keys secure and monitor usage
  3. Data Privacy: Ensure compliance with data protection regulations
  4. Ethical Use: Use the tool responsibly and respect LinkedIn's terms of service
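The "built-in delays between requests" can be pictured as sequential processing with a fixed pause between calls. Here is a minimal sketch. The helpers `sleep` and `runSequentially` and the 1000 ms default are assumptions, because this README does not state the delay the server actually uses:

```typescript
// Hypothetical helper: resolves after `ms` milliseconds.
const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `task` on each item one at a time, pausing `delayMs` between calls
// (not before the first), and returns results in input order.
async function runSequentially<T, R>(
  items: T[],
  task: (item: T) => Promise<R>,
  delayMs = 1000,
): Promise<R[]> {
  const results: R[] = [];
  let first = true;
  for (const item of items) {
    if (!first) await sleep(delayMs);
    first = false;
    results.push(await task(item));
  }
  return results;
}
```

For example, `runSequentially(urls, scrapeProfile, 2000)` would space scraping calls two seconds apart. Here `scrapeProfile` is a hypothetical stand-in for whatever per-URL request the server makes. Sequential calls trade throughput for predictable request spacing.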

Troubleshooting

Common Issues

  1. Missing Dependencies: Run npm install to ensure all packages are installed
  2. API Key Errors: Verify all required API keys are correctly configured
  3. Database Permissions: Ensure write permissions for the Database directory
  4. Network Issues: Check internet connectivity for API calls

Debug Mode

Enable debug mode in the server configuration for detailed logging (when using the .env file, set DEBUG=true):

```json
{
  "debug": true
}
```

Contributing

This server is based on the original data mining tool and includes its features. For improvements or bug reports, refer to the original implementation.

License

ISC License - See package.json for details.

Disclaimer

This tool is for legitimate business and research purposes. Users are responsible for complying with LinkedIn's terms of service, data protection regulations, and applicable laws. Always respect privacy and obtain necessary permissions before collecting personal data.

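Related Skills

Tech Stack Evaluation

by alirezarezvani

Universal
Popular

Compares frameworks, databases, and cloud services, quantifying 5-year TCO, security risk, ecosystem vitality, and migration complexity. Suited to technology selection, stack upgrades, and replacement-roadmap decisions.

Helps you compare tech stacks systematically: beyond features, it also quantifies TCO, security, and ecosystem health, making selection and migration decisions more reliable.

Data & Storage
Not scanned · 17.9k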

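Senior Data Scientist

by alirezarezvani

Universal
Popular

Covers experiment design, feature engineering, predictive modeling, causal inference, and model evaluation. Suited to A/B testing, time-series analysis, and production-grade ML in Python/R/SQL to support data-driven decisions.

Handles everything from A/B testing and causal analysis to predictive modeling, combining rigorous statistical methods with business communication. Especially useful for turning data findings into action.

Data & Storage
Not scanned · 17.9k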

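Senior Architect

by alirezarezvani

Universal
Popular

Suited to system design reviews, ADR documentation, and scalability planning. Analyzes dependencies and coupling, weighs monolith vs. microservices and database and tech-stack choices, and outputs Mermaid, PlantUML, and ASCII architecture diagrams.

For system design, technology selection, and scaling plans, it helps clarify architecture decisions and dependencies faster and can produce Mermaid/PlantUML diagrams directly, making design discussions much more efficient.

Data & Storage
Not scanned · 17.9k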

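Related MCP Servers

SQLite Database

Editor's Pick

by Anthropic

Popular

SQLite is an MCP server that lets AI query local databases directly for data analysis.

This server solves the problem of AI being unable to access SQLite databases directly and suits developers who need to analyze local datasets quickly. As a reference implementation, however, it may lack production-grade security features, so use it in a controlled environment.

Data & Storage
87.1k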

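PostgreSQL

by Anthropic

Popular

PostgreSQL is an MCP server that lets Claude query and manage your database directly.

This server removes the chore of writing SQL queries by hand and is especially suited to data analysts or backend developers who want to explore a database's structure quickly. Since it is a reference implementation, evaluate the security risks carefully before using it in production, and don't expect it to handle complex transactions.

Data & Storage
87.1k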

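Firecrawl

by Firecrawl

Popular

Firecrawl is an MCP server that lets AI scrape web pages directly and extract structured data.

It removes the hassle of writing scrapers by hand and lets Claude access dynamic web content directly. It is best suited to researchers or developers who need real-time data, such as for monitoring competitor prices or collecting news. Note, however, that it depends on a third-party API, which may raise privacy and cost concerns.

Data & Storage
6.5k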
