计算机视觉

Universal

senior-computer-vision

by alirezarezvani

聚焦目标检测、图像分割与视觉系统落地,覆盖 YOLO、DETR、Mask R-CNN、SAM 等方案,适合定制数据集训练、推理优化及 ONNX/TensorRT 部署。

把目标检测、图像分割到推理部署串成完整工程链路,主流框架与 YOLO、DETR、SAM 等方案都覆盖,落地视觉 AI 会省心很多。

11.5kAI 与智能体未扫描2026年3月5日

安装

claude skill add --url github.com/alirezarezvani/claude-skills/tree/main/engineering-team/senior-computer-vision

文档

Senior Computer Vision Engineer

Production computer vision engineering skill for object detection, image segmentation, and visual AI system deployment.

Table of Contents

Quick Start

bash
# Generate training configuration for YOLO or Faster R-CNN
python scripts/vision_model_trainer.py models/ --task detection --arch yolov8

# Analyze model for optimization opportunities (quantization, pruning)
python scripts/inference_optimizer.py model.pt --target onnx --benchmark

# Build dataset pipeline with augmentations
python scripts/dataset_pipeline_builder.py images/ --format coco --augment

Core Expertise

This skill provides guidance on:

  • Object Detection: YOLO family (v5-v11), Faster R-CNN, DETR, RT-DETR
  • Instance Segmentation: Mask R-CNN, YOLACT, SOLOv2
  • Semantic Segmentation: DeepLabV3+, SegFormer, SAM (Segment Anything)
  • Image Classification: ResNet, EfficientNet, Vision Transformers (ViT, DeiT)
  • Video Analysis: Object tracking (ByteTrack, SORT), action recognition
  • 3D Vision: Depth estimation, point cloud processing, NeRF
  • Production Deployment: ONNX, TensorRT, OpenVINO, CoreML

Tech Stack

CategoryTechnologies
FrameworksPyTorch, torchvision, timm
DetectionUltralytics (YOLO), Detectron2, MMDetection
Segmentationsegment-anything, mmsegmentation
OptimizationONNX, TensorRT, OpenVINO, torch.compile
Image ProcessingOpenCV, Pillow, albumentations
AnnotationCVAT, Label Studio, Roboflow
Experiment TrackingMLflow, Weights & Biases
ServingTriton Inference Server, TorchServe

Workflow 1: Object Detection Pipeline

Use this workflow when building an object detection system from scratch.

Step 1: Define Detection Requirements

Analyze the detection task requirements:

code
Detection Requirements Analysis:
- Target objects: [list specific classes to detect]
- Real-time requirement: [yes/no, target FPS]
- Accuracy priority: [speed vs accuracy trade-off]
- Deployment target: [cloud GPU, edge device, mobile]
- Dataset size: [number of images, annotations per class]

Step 2: Select Detection Architecture

Choose architecture based on requirements:

RequirementRecommended ArchitectureWhy
Real-time (>30 FPS)YOLOv8/v11, RT-DETRSingle-stage, optimized for speed
High accuracyFaster R-CNN, DINOTwo-stage, better localization
Small objectsYOLO + SAHI, Faster R-CNN + FPNMulti-scale detection
Edge deploymentYOLOv8n, MobileNetV3-SSDLightweight architectures
Transformer-basedDETR, DINO, RT-DETREnd-to-end, no NMS required

Step 3: Prepare Dataset

Convert annotations to required format:

bash
# COCO format (recommended)
python scripts/dataset_pipeline_builder.py data/images/ \
    --annotations data/labels/ \
    --format coco \
    --split 0.8 0.1 0.1 \
    --output data/coco/

# Verify dataset
python -c "from pycocotools.coco import COCO; coco = COCO('data/coco/train.json'); print(f'Images: {len(coco.imgs)}, Categories: {len(coco.cats)}')"

Step 4: Configure Training

Generate training configuration:

bash
# For Ultralytics YOLO
python scripts/vision_model_trainer.py data/coco/ \
    --task detection \
    --arch yolov8m \
    --epochs 100 \
    --batch 16 \
    --imgsz 640 \
    --output configs/

# For Detectron2
python scripts/vision_model_trainer.py data/coco/ \
    --task detection \
    --arch faster_rcnn_R_50_FPN \
    --framework detectron2 \
    --output configs/

Step 5: Train and Validate

bash
# Ultralytics training
yolo detect train data=data.yaml model=yolov8m.pt epochs=100 imgsz=640

# Detectron2 training
python train_net.py --config-file configs/faster_rcnn.yaml --num-gpus 1

# Validate on test set
yolo detect val model=runs/detect/train/weights/best.pt data=data.yaml

Step 6: Evaluate Results

Key metrics to analyze:

MetricTargetDescription
mAP@50>0.7Mean Average Precision at IoU 0.5
mAP@50:95>0.5COCO primary metric
Precision>0.8Low false positives
Recall>0.8Low missed detections
Inference time<33msFor 30 FPS real-time

Workflow 2: Model Optimization and Deployment

Use this workflow when preparing a trained model for production deployment.

Step 1: Benchmark Baseline Performance

bash
# Measure current model performance
python scripts/inference_optimizer.py model.pt \
    --benchmark \
    --input-size 640 640 \
    --batch-sizes 1 4 8 16 \
    --warmup 10 \
    --iterations 100

Expected output:

code
Baseline Performance (PyTorch FP32):
- Batch 1: 45.2ms (22.1 FPS)
- Batch 4: 89.4ms (44.7 FPS)
- Batch 8: 165.3ms (48.4 FPS)
- Memory: 2.1 GB
- Parameters: 25.9M

Step 2: Select Optimization Strategy

Deployment TargetOptimization Path
NVIDIA GPU (cloud)PyTorch → ONNX → TensorRT FP16
NVIDIA GPU (edge)PyTorch → TensorRT INT8
Intel CPUPyTorch → ONNX → OpenVINO
Apple SiliconPyTorch → CoreML
Generic CPUPyTorch → ONNX Runtime
MobilePyTorch → TFLite or ONNX Mobile

Step 3: Export to ONNX

bash
# Export with dynamic batch size
python scripts/inference_optimizer.py model.pt \
    --export onnx \
    --input-size 640 640 \
    --dynamic-batch \
    --simplify \
    --output model.onnx

# Verify ONNX model
python -c "import onnx; model = onnx.load('model.onnx'); onnx.checker.check_model(model); print('ONNX model valid')"

Step 4: Apply Quantization (Optional)

For INT8 quantization with calibration:

bash
# Generate calibration dataset
python scripts/inference_optimizer.py model.onnx \
    --quantize int8 \
    --calibration-data data/calibration/ \
    --calibration-samples 500 \
    --output model_int8.onnx

Quantization impact analysis:

PrecisionSizeSpeedAccuracy Drop
FP32100%1x0%
FP1650%1.5-2x<0.5%
INT825%2-4x1-3%

Step 5: Convert to Target Runtime

bash
# TensorRT (NVIDIA GPU)
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16

# OpenVINO (Intel)
mo --input_model model.onnx --output_dir openvino/

# CoreML (Apple)
python -c "import coremltools as ct; model = ct.convert('model.onnx'); model.save('model.mlpackage')"

Step 6: Benchmark Optimized Model

bash
python scripts/inference_optimizer.py model.engine \
    --benchmark \
    --runtime tensorrt \
    --compare model.pt

Expected speedup:

code
Optimization Results:
- Original (PyTorch FP32): 45.2ms
- Optimized (TensorRT FP16): 12.8ms
- Speedup: 3.5x
- Accuracy change: -0.3% mAP

Workflow 3: Custom Dataset Preparation

Use this workflow when preparing a computer vision dataset for training.

Step 1: Audit Raw Data

bash
# Analyze image dataset
python scripts/dataset_pipeline_builder.py data/raw/ \
    --analyze \
    --output analysis/

Analysis report includes:

code
Dataset Analysis:
- Total images: 5,234
- Image sizes: 640x480 to 4096x3072 (variable)
- Formats: JPEG (4,891), PNG (343)
- Corrupted: 12 files
- Duplicates: 45 pairs

Annotation Analysis:
- Format detected: Pascal VOC XML
- Total annotations: 28,456
- Classes: 5 (car, person, bicycle, dog, cat)
- Distribution: car (12,340), person (8,234), bicycle (3,456), dog (2,890), cat (1,536)
- Empty images: 234

Step 2: Clean and Validate

bash
# Remove corrupted and duplicate images
python scripts/dataset_pipeline_builder.py data/raw/ \
    --clean \
    --remove-corrupted \
    --remove-duplicates \
    --output data/cleaned/

Step 3: Convert Annotation Format

bash
# Convert VOC to COCO format
python scripts/dataset_pipeline_builder.py data/cleaned/ \
    --annotations data/annotations/ \
    --input-format voc \
    --output-format coco \
    --output data/coco/

Supported format conversions:

FromTo
Pascal VOC XMLCOCO JSON
YOLO TXTCOCO JSON
COCO JSONYOLO TXT
LabelMe JSONCOCO JSON
CVAT XMLCOCO JSON

Step 4: Apply Augmentations

bash
# Generate augmentation config
python scripts/dataset_pipeline_builder.py data/coco/ \
    --augment \
    --aug-config configs/augmentation.yaml \
    --output data/augmented/

Recommended augmentations for detection:

yaml
# configs/augmentation.yaml
augmentations:
  geometric:
    - horizontal_flip: { p: 0.5 }
    - vertical_flip: { p: 0.1 }  # Only if orientation invariant
    - rotate: { limit: 15, p: 0.3 }
    - scale: { scale_limit: 0.2, p: 0.5 }

  color:
    - brightness_contrast: { brightness_limit: 0.2, contrast_limit: 0.2, p: 0.5 }
    - hue_saturation: { hue_shift_limit: 20, sat_shift_limit: 30, p: 0.3 }
    - blur: { blur_limit: 3, p: 0.1 }

  advanced:
    - mosaic: { p: 0.5 }  # YOLO-style mosaic
    - mixup: { p: 0.1 }   # Image mixing
    - cutout: { num_holes: 8, max_h_size: 32, max_w_size: 32, p: 0.3 }

Step 5: Create Train/Val/Test Splits

bash
python scripts/dataset_pipeline_builder.py data/augmented/ \
    --split 0.8 0.1 0.1 \
    --stratify \
    --seed 42 \
    --output data/final/

Split strategy guidelines:

Dataset SizeTrainValTest
<1,000 images70%15%15%
1,000-10,00080%10%10%
>10,00090%5%5%

Step 6: Generate Dataset Configuration

bash
# For Ultralytics YOLO
python scripts/dataset_pipeline_builder.py data/final/ \
    --generate-config yolo \
    --output data.yaml

# For Detectron2
python scripts/dataset_pipeline_builder.py data/final/ \
    --generate-config detectron2 \
    --output detectron2_config.py

Architecture Selection Guide

Object Detection Architectures

ArchitectureSpeedAccuracyBest For
YOLOv8n1.2ms37.3 mAPEdge, mobile, real-time
YOLOv8s2.1ms44.9 mAPBalanced speed/accuracy
YOLOv8m4.2ms50.2 mAPGeneral purpose
YOLOv8l6.8ms52.9 mAPHigh accuracy
YOLOv8x10.1ms53.9 mAPMaximum accuracy
RT-DETR-L5.3ms53.0 mAPTransformer, no NMS
Faster R-CNN R5046ms40.2 mAPTwo-stage, high quality
DINO-4scale85ms49.0 mAPSOTA transformer

Segmentation Architectures

ArchitectureTypeSpeedBest For
YOLOv8-segInstance4.5msReal-time instance seg
Mask R-CNNInstance67msHigh-quality masks
SAMPromptable50msZero-shot segmentation
DeepLabV3+Semantic25msScene parsing
SegFormerSemantic15msEfficient semantic seg

CNN vs Vision Transformer Trade-offs

AspectCNN (YOLO, R-CNN)ViT (DETR, DINO)
Training data needed1K-10K images10K-100K+ images
Training timeFastSlow (needs more epochs)
Inference speedFasterSlower
Small objectsGood with FPNNeeds multi-scale
Global contextLimitedExcellent
Positional encodingImplicitExplicit

Reference Documentation

1. Computer Vision Architectures

See references/computer_vision_architectures.md for:

  • CNN backbone architectures (ResNet, EfficientNet, ConvNeXt)
  • Vision Transformer variants (ViT, DeiT, Swin)
  • Detection heads (anchor-based vs anchor-free)
  • Feature Pyramid Networks (FPN, BiFPN, PANet)
  • Neck architectures for multi-scale detection

2. Object Detection Optimization

See references/object_detection_optimization.md for:

  • Non-Maximum Suppression variants (NMS, Soft-NMS, DIoU-NMS)
  • Anchor optimization and anchor-free alternatives
  • Loss function design (focal loss, GIoU, CIoU, DIoU)
  • Training strategies (warmup, cosine annealing, EMA)
  • Data augmentation for detection (mosaic, mixup, copy-paste)

3. Production Vision Systems

See references/production_vision_systems.md for:

  • ONNX export and optimization
  • TensorRT deployment pipeline
  • Batch inference optimization
  • Edge device deployment (Jetson, Intel NCS)
  • Model serving with Triton
  • Video processing pipelines

Common Commands

Ultralytics YOLO

bash
# Training
yolo detect train data=coco.yaml model=yolov8m.pt epochs=100 imgsz=640

# Validation
yolo detect val model=best.pt data=coco.yaml

# Inference
yolo detect predict model=best.pt source=images/ save=True

# Export
yolo export model=best.pt format=onnx simplify=True dynamic=True

Detectron2

bash
# Training
python train_net.py --config-file configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml \
    --num-gpus 1 OUTPUT_DIR ./output

# Evaluation
python train_net.py --config-file configs/faster_rcnn.yaml --eval-only \
    MODEL.WEIGHTS output/model_final.pth

# Inference
python demo.py --config-file configs/faster_rcnn.yaml \
    --input images/*.jpg --output results/ \
    --opts MODEL.WEIGHTS output/model_final.pth

MMDetection

bash
# Training
python tools/train.py configs/faster_rcnn/faster-rcnn_r50_fpn_1x_coco.py

# Testing
python tools/test.py configs/faster_rcnn.py checkpoints/latest.pth --eval bbox

# Inference
python demo/image_demo.py demo.jpg configs/faster_rcnn.py checkpoints/latest.pth

Model Optimization

bash
# ONNX export and simplify
python -c "import torch; model = torch.load('model.pt'); torch.onnx.export(model, torch.randn(1,3,640,640), 'model.onnx', opset_version=17)"
python -m onnxsim model.onnx model_sim.onnx

# TensorRT conversion
trtexec --onnx=model.onnx --saveEngine=model.engine --fp16 --workspace=4096

# Benchmark
trtexec --loadEngine=model.engine --batch=1 --iterations=1000 --avgRuns=100

Performance Targets

MetricReal-timeHigh AccuracyEdge
FPS>30>10>15
mAP@50>0.6>0.8>0.5
Latency P99<50ms<150ms<100ms
GPU Memory<4GB<8GB<2GB
Model Size<50MB<200MB<20MB

Resources

  • Architecture Guide: references/computer_vision_architectures.md
  • Optimization Guide: references/object_detection_optimization.md
  • Deployment Guide: references/production_vision_systems.md
  • Scripts: scripts/ directory for automation tools

相关 Skills

Claude接口

by anthropics

Universal
热门

面向接入 Claude API、Anthropic SDK 或 Agent SDK 的开发场景,自动识别项目语言并给出对应示例与默认配置,快速搭建 LLM 应用。

想把Claude能力接进应用或智能体,用claude-api上手快、兼容Anthropic与Agent SDK,集成路径清晰又省心

AI 与智能体
未扫描119.1k

RAG架构师

by alirezarezvani

Universal
热门

聚焦生产级RAG系统设计与优化,覆盖文档切块、检索链路、索引构建、召回评估等关键环节,适合搭建可扩展、高准确率的知识库问答与检索增强应用。

面向RAG落地,把知识库、向量检索和生成链路系统串联起来,做架构设计时更清晰,也更少踩坑。

AI 与智能体
未扫描11.5k

智能体流程设计

by alirezarezvani

Universal
热门

面向生产级多 Agent 编排,梳理顺序、并行、分层、事件驱动、共识五种工作流设计,覆盖 handoff、状态管理、容错重试、上下文预算与成本优化,适合搭建复杂 AI 协作系统。

帮你把多智能体流程设计、编排和自动化统一起来,复杂工作流也能更稳地落地,适合追求强控制力的团队。

AI 与智能体
未扫描11.5k

相关 MCP 服务

知识图谱记忆

编辑精选

by Anthropic

热门

Memory 是一个基于本地知识图谱的持久化记忆系统,让 AI 记住长期上下文。

帮 AI 和智能体补上“记不住”的短板,用本地知识图谱沉淀长期上下文,连续对话更聪明,数据也更可控。

AI 与智能体
83.9k

顺序思维

编辑精选

by Anthropic

热门

Sequential Thinking 是让 AI 通过动态思维链解决复杂问题的参考服务器。

这个服务器展示了如何让 Claude 像人类一样逐步推理,适合开发者学习 MCP 的思维链实现。但注意它只是个参考示例,别指望直接用在生产环境里。

AI 与智能体
83.9k

PraisonAI

编辑精选

by mervinpraison

热门

PraisonAI 是一个支持自反思和多 LLM 的低代码 AI 智能体框架。

如果你需要快速搭建一个能 24/7 运行的 AI 智能体团队来处理复杂任务(比如自动研究或代码生成),PraisonAI 的低代码设计和多平台集成(如 Telegram)让它上手极快。但作为非官方项目,它的生态成熟度可能不如 LangChain 等主流框架,适合愿意尝鲜的开发者。

AI 与智能体
6.9k

评论