io.github.panbanda/omen

Coding & Debugging

by panbanda

Multi-language code analysis that evaluates complexity, technical debt, hotspots, ownership, and defect prediction.


README

<div align="center">

Omen

<img src="assets/omen-logo.jpg" alt="Omen - Code Analysis CLI" width="100%">


Your AI writes code without knowing where the landmines are.

Omen gives AI assistants the context they need: complexity hotspots, hidden dependencies, defect-prone files, and self-admitted debt. One command surfaces what's invisible.

Why "Omen"? An omen is a sign of things to come - good or bad. Your codebase is full of omens: low complexity and clean architecture signal smooth sailing ahead, while high churn, technical debt, and code clones warn of trouble brewing. Omen surfaces these signals so you can act before that "temporary fix" celebrates its third anniversary in production.

</div>

Features

<details> <summary><strong>Complexity Analysis</strong> - How hard your code is to understand and test</summary>

There are two types of complexity:

  • Cyclomatic Complexity counts the number of different paths through your code. Every if, for, while, or switch creates a new path. A function with cyclomatic complexity of 10 means there are 10 different ways to run through it. The higher the number, the more test cases you need to cover all scenarios.

  • Cognitive Complexity measures how hard code is for a human to read. It penalizes deeply nested code (like an if inside a for inside another if) more than flat code. Two functions can have the same cyclomatic complexity, but the one with deeper nesting will have higher cognitive complexity because it's harder to keep track of.
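To make the cyclomatic count concrete, here is a minimal Python sketch that approximates it with the standard `ast` module: start at 1 and add one for each branching construct. Omen parses many languages with tree-sitter, so this is not its implementation; the set of node types treated as decision points (`BRANCH_NODES`) is an assumption chosen for illustration.

```python
import ast

# Node types treated as decision points in this simplified sketch (an assumption).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler, ast.comprehension)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds one extra path per additional operand.
            complexity += len(node.values) - 1
    return complexity


example = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0:
            return "composite"
    return "prime or small"
'''
print(cyclomatic_complexity(example))  # prints 4: 1 + if + for + nested if
```

Cognitive complexity would additionally weight each branch by its nesting depth, which is why the nested `if` above would cost more there than here.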

Why it matters: Research shows that complex code has more bugs and takes longer to fix. McCabe's original 1976 paper proposed a complexity of 10 as a practical upper limit for a module. SonarSource's cognitive complexity builds on this by measuring what actually confuses developers.

[!TIP] Keep cyclomatic complexity under 10 and cognitive complexity under 15 per function.

</details> <details> <summary><strong>Self-Admitted Technical Debt (SATD)</strong> - Comments where developers admit they took shortcuts</summary>

When developers write TODO: fix this later or HACK: this is terrible but works, they're creating technical debt and admitting it. Omen finds these comments and groups them by type:

| Category | Markers | What it means |
| --- | --- | --- |
| Design | HACK, KLUDGE, SMELL | Architecture shortcuts that need rethinking |
| Defect | BUG, FIXME, BROKEN | Known bugs that haven't been fixed |
| Requirement | TODO, FEAT | Missing features or incomplete implementations |
| Test | FAILING, SKIP, DISABLED | Tests that are broken or turned off |
| Performance | SLOW, OPTIMIZE, PERF | Code that works but needs to be faster |
| Security | SECURITY, VULN, UNSAFE | Known security issues |
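A naive way to picture this grouping is a regex scan over comment text using the markers from the table above. This Python sketch only recognizes `#` and `//` comments, matches markers case-sensitively as whole words, and will misfire on a `#` inside a string literal; those are simplifications of ours, not how Omen parses comments.

```python
import re

# Markers copied from the table above.
SATD_CATEGORIES = {
    "Design": ["HACK", "KLUDGE", "SMELL"],
    "Defect": ["BUG", "FIXME", "BROKEN"],
    "Requirement": ["TODO", "FEAT"],
    "Test": ["FAILING", "SKIP", "DISABLED"],
    "Performance": ["SLOW", "OPTIMIZE", "PERF"],
    "Security": ["SECURITY", "VULN", "UNSAFE"],
}
MARKER_TO_CATEGORY = {m: cat for cat, markers in SATD_CATEGORIES.items() for m in markers}
MARKER_RE = re.compile(r"\b(" + "|".join(MARKER_TO_CATEGORY) + r")\b")
COMMENT_RE = re.compile(r"(?:#|//)\s*(.*)$")  # naive: first # or // on the line


def find_satd(lines):
    """Yield (line_number, category, marker) for comments containing a SATD marker."""
    for lineno, line in enumerate(lines, start=1):
        comment = COMMENT_RE.search(line)
        if not comment:
            continue
        marker = MARKER_RE.search(comment.group(1))
        if marker:
            yield lineno, MARKER_TO_CATEGORY[marker.group(1)], marker.group(1)


code = [
    "x = compute()  # TODO: handle empty input",
    "// HACK: this is terrible but works",
    "y = 1  # plain comment",
]
print(list(find_satd(code)))  # [(1, 'Requirement', 'TODO'), (2, 'Design', 'HACK')]
```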

Why it matters: Potdar and Shihab's 2014 study found that SATD comments often stay in codebases for years. The longer they stay, the harder they are to fix because people forget the context. Maldonado and Shihab (2015) found that design debt is the most common type.

[!TIP] Review SATD weekly. If a TODO is older than 6 months, either fix it or delete it.

</details> <details> <summary><strong>Dead Code Detection</strong> - Code that exists but never runs</summary>

Dead code includes:

  • Functions that are never called
  • Variables that are assigned but never used
  • Classes that are never instantiated
  • Code after a return statement that can never execute
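The first bullet, functions that are never called, can be approximated within a single Python module by comparing defined function names with every name the module references. This sketch is deliberately narrow: it ignores other modules, `getattr`, and public APIs, so its output is a list of candidates to review, not proof that code is dead.

```python
import ast


def unused_functions(source: str) -> set[str]:
    """Functions defined in `source` whose names are never referenced in it."""
    tree = ast.parse(source)
    defined = {n.name for n in ast.walk(tree)
               if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef))}
    referenced = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
    # Method calls such as obj.helper() appear as attribute accesses.
    referenced |= {n.attr for n in ast.walk(tree) if isinstance(n, ast.Attribute)}
    return defined - referenced


src = '''
def used():
    return 1

def never_called():
    return 2

print(used())
'''
print(unused_functions(src))  # {'never_called'}
```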

Why it matters: Dead code isn't just clutter. It confuses new developers who think it must be important. It increases build times and binary sizes. Worst of all, it can hide bugs - if someone "fixes" dead code thinking it runs, they've wasted time. Romano et al. (2020) found that dead code is a strong predictor of other code quality problems.

[!TIP] Delete dead code. Version control means you can always get it back if needed.

</details> <details> <summary><strong>Git Churn Analysis</strong> - How often files change over time</summary>

Churn looks at your git history and counts:

  • How many times each file was modified
  • How many lines were added and deleted
  • Which files change together
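The first two counts can be derived from `git log --numstat --format=%H`, which prints each commit hash followed by tab-separated `added deleted path` lines. The Python sketch below aggregates that text per file; it treats binary files (reported as `-`) as zero lines and does not resolve renames, both simplifications of ours. Co-change counting is a separate pass over the same commits.

```python
from collections import defaultdict


def churn_from_numstat(log_text: str) -> dict:
    """Per-file churn from `git log --numstat --format=%H` output."""
    stats = defaultdict(lambda: {"commits": 0, "added": 0, "deleted": 0})
    for line in log_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # commit hash or blank separator line
        added, deleted, path = parts
        entry = stats[path]
        entry["commits"] += 1
        entry["added"] += int(added) if added != "-" else 0
        entry["deleted"] += int(deleted) if deleted != "-" else 0
    return dict(stats)


# Two commits in the shape git prints them (hashes shortened).
sample = "a1\n\n10\t2\tsrc/app.rs\n3\t0\tREADME.md\n\nb2\n\n5\t5\tsrc/app.rs\n"
churn = churn_from_numstat(sample)
print(churn["src/app.rs"])  # {'commits': 2, 'added': 15, 'deleted': 7}
```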

Files with high churn are "hotspots" - they're constantly being touched, which could mean they're:

  • Central to the system (everyone needs to modify them)
  • Poorly designed (constant bug fixes)
  • Missing good abstractions (features keep getting bolted on)

Why it matters: Nagappan and Ball's 2005 research at Microsoft found that code churn is one of the best predictors of bugs. Files that change a lot tend to have more defects. Combined with complexity data, churn helps you find the files that are both complicated AND frequently modified - your highest-risk code.

[!TIP] If a file has high churn AND high complexity, prioritize refactoring it.

</details> <details> <summary><strong>Code Clone Detection</strong> - Duplicated code that appears in multiple places</summary>

There are three types of clones:

| Type | Description | Example |
| --- | --- | --- |
| Type-1 | Exact copies (maybe different whitespace/comments) | Copy-pasted code |
| Type-2 | Same structure, different names | Same function with renamed variables |
| Type-3 | Similar code with some modifications | Functions that do almost the same thing |
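Type-1 and Type-2 clones can be illustrated by comparing normalized token streams: drop comments and blank lines, then replace every identifier and literal with a placeholder. This Python sketch uses the standard `tokenize` module and is an illustration of the idea, not Omen's detector; Type-3 clones need a similarity measure rather than equality and are out of scope here.

```python
import io
import keyword
import tokenize

# Tokens that carry no structure for clone comparison.
IGNORED = {tokenize.COMMENT, tokenize.NL, tokenize.ENDMARKER}


def normalized_tokens(source: str) -> list[str]:
    """Token stream with identifiers and literals abstracted to placeholders."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in IGNORED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            tokens.append("LIT")
        elif tok.type in (tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT):
            tokens.append(tokenize.tok_name[tok.type])  # keep block structure
        else:
            tokens.append(tok.string)
    return tokens


def is_type2_clone(a: str, b: str) -> bool:
    return normalized_tokens(a) == normalized_tokens(b)


original = "def area(w, h):\n    return w * h  # rectangle\n"
renamed = "def size(width, height):\n    return width * height\n"
print(is_type2_clone(original, renamed))  # True
```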

Why it matters: When you fix a bug in one copy, you have to remember to fix all the other copies too. Juergens et al. (2009) found that cloned code has significantly more bugs because fixes don't get applied consistently. The more clones you have, the more likely you'll miss one during updates.

[!TIP] Anything copied more than twice should probably be a shared function. Aim for a duplication ratio under 5%.

</details> <details> <summary><strong>Defect Prediction</strong> - The likelihood that a file contains bugs</summary>

Omen combines multiple signals to predict defect probability using PMAT-weighted metrics:

  • Process metrics (churn frequency, ownership diffusion)
  • Metrics (cyclomatic/cognitive complexity)
  • Age (code age and stability)
  • Total size (lines of code)

Each file gets a risk score from 0% to 100%.

Why it matters: You can't review everything equally. Menzies et al. (2007) showed that defect prediction helps teams focus testing and code review on the files most likely to have problems. Rahman et al. (2014) found that even simple models outperform random file selection for finding bugs.

[!TIP] Prioritize code review for files with >70% defect probability.

</details> <details> <summary><strong>Change Risk Analysis (JIT)</strong> - Predict which commits are likely to introduce bugs</summary>

Just-in-Time (JIT) defect prediction analyzes recent commits to identify risky changes before they cause problems. Unlike file-level prediction, JIT operates at the commit level using change-scope factors from Kamei et al. (2013), augmented with file-level risk signals from the software engineering research literature.

Change-scope factors (75% of score):

Weighted using Kamei's median logistic regression coefficients (relative ordering across projects):

| Factor | Name | Weight | What it measures |
| --- | --- | --- | --- |
| LA | Lines Added | 0.16 | More additions = more risk (strongest predictor) |
| ENTROPY | Change Entropy | 0.14 | Scattered changes = harder to review |
| FIX | Bug Fix | 0.12 | Bug fix commits indicate problematic areas |
| LD | Lines Deleted | 0.08 | Deletions are generally safer |
| NF | Number of Files | 0.08 | More files = more coordination risk |
| NUC | Unique Changes | 0.07 | More unique prior commits on files |
| NDEV | Number of Developers | 0.05 | More developers on files = more risk |
| EXP | Developer Experience | 0.05 | Less experience = more risk |
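Kamei et al. define ENTROPY in terms of how a commit's modified lines are distributed across files. A minimal Python sketch of that idea computes the Shannon entropy of the per-file shares; dividing by log2(n) to map the result into [0, 1] is one common normalization and an assumption here, not necessarily what Omen does.

```python
import math


def change_entropy(lines_changed_per_file: list[int]) -> float:
    """Normalized Shannon entropy of modified lines across the files in one commit.

    0.0 means every change landed in one file; 1.0 means changes are spread
    evenly across all touched files (log2(n) normalization assumed).
    """
    changed = [c for c in lines_changed_per_file if c > 0]
    total = sum(changed)
    if total == 0 or len(changed) <= 1:
        return 0.0
    h = -sum((c / total) * math.log2(c / total) for c in changed)
    return h / math.log2(len(changed))


print(change_entropy([100]), change_entropy([10, 10, 10, 10]), round(change_entropy([90, 10]), 3))
# 0.0 1.0 0.469
```

A commit that puts 90 of 100 changed lines in one file scores well below one that spreads them evenly, which matches the "scattered changes = harder to review" reading in the table.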

File-level risk signals (25% of score):

| Signal | Weight | Source |
| --- | --- | --- |
| File Churn | 0.10 | Nagappan & Ball (2005) - historical change frequency predicts defects |
| File Complexity | 0.08 | Zimmermann & Nagappan (2008) - cyclomatic complexity is predictive but weaker than change metrics |
| Ownership Diffusion | 0.07 | Bird et al. (2011) - files with many minor contributors (no clear owner) have more defects |

Ownership diffusion is 1 - max_author_percentage: files where no single author dominates score higher risk. This follows Bird et al.'s finding that concentrated ownership (a clear primary author) correlates with fewer defects, while diffuse ownership correlates with more.
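Putting the two tables together, a commit's score can be pictured as a weighted sum of factors that have each been scaled to [0, 1], with ownership diffusion computed as described above. This Python sketch copies the weights from the tables but assumes the caller has already scaled raw counts and inverted EXP (so higher always means riskier); how Omen performs that scaling is not described here.

```python
# Weights copied from the two tables above: change-scope factors sum to 0.75,
# file-level signals to 0.25.
CHANGE_WEIGHTS = {"la": 0.16, "entropy": 0.14, "fix": 0.12, "ld": 0.08,
                  "nf": 0.08, "nuc": 0.07, "ndev": 0.05, "exp": 0.05}
FILE_WEIGHTS = {"file_churn": 0.10, "file_complexity": 0.08, "ownership_diffusion": 0.07}


def ownership_diffusion(lines_by_author: dict[str, int]) -> float:
    """1 - max_author_percentage: higher when no single author dominates."""
    total = sum(lines_by_author.values())
    return 0.0 if total == 0 else 1.0 - max(lines_by_author.values()) / total


def jit_risk(factors: dict[str, float]) -> float:
    """Weighted sum of pre-scaled factors; missing factors count as 0."""
    weights = {**CHANGE_WEIGHTS, **FILE_WEIGHTS}
    return sum(w * min(max(factors.get(name, 0.0), 0.0), 1.0) for name, w in weights.items())


diffusion = ownership_diffusion({"alice": 80, "bob": 20})  # 1 - 0.8 = 0.2
score = jit_risk({"la": 1.0, "fix": 1.0, "ownership_diffusion": diffusion})
print(round(score, 3))  # 0.294
```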

Percentile-based risk classification:

Risk levels use percentile-based thresholds following JIT defect prediction best practices. Rather than fixed thresholds, commits are ranked relative to the repository's own distribution:

| Level | Percentile | Meaning |
| --- | --- | --- |
| High | Top 5% | P95+ - Deserve extra scrutiny |
| Medium | Top 20% | P80-P95 - Worth additional attention |
| Low | Bottom 80% | Below P80 - Standard review process |

This approach aligns with the 80/20 rule from defect prediction research: ~20% of code changes contain ~80% of defects. It ensures actionable results regardless of repository characteristics - well-disciplined repos will have lower thresholds, while high-churn repos will have higher ones.
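Relative classification like this can be sketched by computing each commit's percentile rank within the repository's own scores and then applying the P80/P95 cut-offs. In this Python sketch the percentile rank is the share of commits scoring strictly lower, and ties are handled accordingly; both choices are assumptions, and the quadratic loop is for clarity only.

```python
from collections import Counter


def classify_by_percentile(scores: list[float]) -> list[str]:
    """Label each commit High (P95+), Medium (P80-P95) or Low relative to the repo."""
    n = len(scores)
    labels = []
    for s in scores:
        # Percentile rank: fraction of the repo's commits that score strictly lower.
        rank = sum(1 for other in scores if other < s) / n
        if rank >= 0.95:
            labels.append("High")
        elif rank >= 0.80:
            labels.append("Medium")
        else:
            labels.append("Low")
    return labels


scores = [i / 100 for i in range(100)]  # 100 hypothetical commit risk scores
print(Counter(classify_by_percentile(scores)))  # Counter({'Low': 80, 'Medium': 15, 'High': 5})
```

Because the cut-offs come from the repository's own distribution, the same absolute score can land in different bands in a quiet repository and a high-churn one.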

Why it matters: Kamei et al. (2013) demonstrated that JIT prediction catches risky changes at commit time, before bugs propagate. Their effort-aware approach uses ranking rather than fixed thresholds, focusing limited review resources on the riskiest ~20% of commits. Zeng et al. (2021) showed that simple JIT models match deep learning accuracy (~65%) with better interpretability.

[!TIP] Run omen changes before merging PRs to identify commits needing extra review.

</details> <details> <summary><strong>PR/Branch Diff Risk Analysis</strong> - Assess overall risk of a branch before merging</summary>

While JIT analysis examines individual commits, diff analysis evaluates an entire branch's cumulative changes against a target branch. This gives reviewers a quick risk assessment before diving into code review.

Usage:

```bash
# Compare current branch against main
omen diff --target main

# Compare against a specific commit
omen diff --target abc123

# Output as markdown for PR comments
omen diff --target main -f markdown
```

Risk Factors:

The diff analyzer uses the same research-backed weight model as omen changes (see above), combining change-scope factors with file-level signals:

| Factor | What it measures | Category |
| --- | --- | --- |
| Lines Added | Total new code introduced | Change-scope |
| Lines Deleted | Code removed | Change-scope |
| Files Modified | Spread of changes | Change-scope |
| Commits | Number of commits in branch | Change-scope |
| Entropy | How scattered changes are | Change-scope |
| File Churn | Historical change frequency of touched files | File-level |
| File Complexity | Max cyclomatic complexity of touched files | File-level |
| Ownership Diffusion | How diffusely owned the touched files are | File-level |

Risk Score Interpretation:

| Score | Level | Recommended Action |
| --- | --- | --- |
| < 0.2 | LOW | Standard review process |
| 0.2 - 0.5 | MEDIUM | Careful review, consider extra testing |
| > 0.5 | HIGH | Thorough review, ensure comprehensive test coverage |
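The bands above leave the exact boundary values open. A small Python sketch of the mapping makes one reading explicit: here a score of exactly 0.2 or 0.5 is treated as MEDIUM, which is our assumption rather than documented Omen behavior.

```python
def diff_risk_level(score: float) -> str:
    """Map a branch risk score to LOW / MEDIUM / HIGH (boundaries 0.2 and 0.5 -> MEDIUM)."""
    if score < 0.2:
        return "LOW"
    if score <= 0.5:
        return "MEDIUM"
    return "HIGH"


print(diff_risk_level(0.31))  # MEDIUM
```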

Example Output:

```text
Branch Diff Risk Analysis
==========================

Source:   feature/new-api
Target:   main
Base:     abc123def

Risk Score: 0.31 (MEDIUM)

Changes:
  Lines Added:    530
  Lines Deleted:  39
  Files Modified: 3
  Commits:        1

Risk Factors:
  entropy:              0.023
  lines_added:          0.140
  lines_deleted:        0.008
  num_files:            0.014
  commits:              0.007
  file_churn:           0.080
  file_complexity:      0.045
  ownership_diffusion:  0.000

File Risk:
  max_complexity:       6.78
  max_churn:            1.00
  ownership_diffusion:  0.00
```

What to Look For:

  • High lines added, low deleted - New feature, needs thorough review
  • Balanced add/delete - Refactoring, verify behavior unchanged
  • Net code reduction - Cleanup/simplification, generally positive
  • High entropy - Scattered changes, check for unrelated modifications
  • Many files - Wide impact, ensure integration testing
  • High file_churn - Touching historically volatile files that change often
  • High ownership_diffusion - No clear owner for the touched files; ensure someone takes responsibility for review

CI/CD Integration:

```yaml
# Add to GitHub Actions workflow
- name: PR Risk Assessment
  run: |
    omen diff --target ${{ github.base_ref }} -f markdown >> $GITHUB_STEP_SUMMARY
```

Why it matters: Code review time is limited. Diff analysis helps reviewers prioritize their attention - a LOW risk PR with 10 lines changed needs less scrutiny than a MEDIUM risk PR touching 17 files. The entropy metric is particularly useful for catching PRs that bundle unrelated changes, which are harder to review and more likely to introduce bugs.

[!TIP] Run omen diff before creating a PR to understand how reviewers will perceive your changes. Consider splitting HIGH risk PRs into smaller, focused changes.

</details> <details> <summary><strong>Technical Debt Gradient (TDG)</strong> - A composite "health score" for each file</summary>

TDG combines multiple metrics into a single score (0-100 scale, higher is better):

| Component | Max Points | What it measures |
| --- | --- | --- |
| Structural Complexity | 20 | Cyclomatic complexity and nesting depth |
| Semantic Complexity | 15 | Cognitive complexity |
| Duplication | 15 | Amount of cloned code |
| Coupling | 15 | Dependencies on other modules |
| Hotspot | 10 | Churn × complexity interaction |
| Temporal Coupling | 10 | Co-change patterns with other files |
| Consistency | 10 | Code style and pattern adherence |
| Entropy | 10 | Pattern entropy and code uniformity |
| Documentation | 5 | Comment coverage |

Why it matters: Technical debt is like financial debt - a little is fine, too much kills you. Cunningham coined the term in 1992, and Kruchten et al. (2012) formalized how to measure and manage it. TDG gives you a single number to track over time and compare across files.

[!TIP] Fix files with scores below 70 before adding new features. Track average TDG over time - it should go up, not down.

</details> <details> <summary><strong>Dependency Graph</strong> - How your modules connect to each other</summary>

Omen builds a graph showing which files import which other files, then calculates:

  • PageRank: Which files are most "central" (many things depend on them)
  • Betweenness: Which files are "bridges" between different parts of the codebase
  • Coupling: How interconnected modules are

Why it matters: Highly coupled code is fragile - changing one file breaks many others. Parnas's 1972 paper on modularity established that good software design minimizes dependencies between modules. The dependency graph shows you where your architecture is clean and where it's tangled.

[!TIP] Files with high PageRank should be especially stable and well-tested. Consider breaking up files that appear as "bridges" everywhere.

</details> <details> <summary><strong>Hotspot Analysis</strong> - High-risk files where complexity meets frequent changes</summary>

Hotspots are files that are both complex AND frequently modified. A simple file that changes often is probably fine - it's easy to work with. A complex file that rarely changes is also manageable - you can leave it alone. But a complex file that changes constantly? That's where bugs breed.

Omen calculates hotspot scores using the geometric mean of normalized churn and complexity:

```text
hotspot = sqrt(churn_percentile * complexity_percentile)
```

Both factors are normalized against industry benchmarks using empirical CDFs, so scores are comparable across projects:

  • Churn percentile - Where this file's commit count ranks against typical OSS projects
  • Complexity percentile - Where the average cognitive complexity ranks against industry benchmarks

| Hotspot Score | Severity | Action |
| --- | --- | --- |
| >= 0.6 | Critical | Prioritize immediately |
| >= 0.4 | High | Schedule for review |
| >= 0.25 | Moderate | Monitor |
| < 0.25 | Low | Healthy |
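The formula and severity bands can be traced end to end in a short Python sketch: an empirical CDF turns each raw value into a percentile against a benchmark sample, and the geometric mean combines the two. The benchmark lists below are hypothetical stand-ins, since Omen's actual industry benchmark data is not shown here.

```python
import bisect
import math

# Hypothetical benchmark samples, not Omen's real benchmarks.
CHURN_BENCHMARK = [1, 2, 3, 5, 8, 13, 21, 34, 55, 89]   # commits per file
COMPLEXITY_BENCHMARK = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # average cognitive complexity


def ecdf(value: float, benchmark: list[float]) -> float:
    """Empirical CDF: fraction of benchmark observations <= value."""
    ordered = sorted(benchmark)
    return bisect.bisect_right(ordered, value) / len(ordered)


def hotspot_score(commits: int, avg_complexity: float) -> float:
    churn_pct = ecdf(commits, CHURN_BENCHMARK)
    complexity_pct = ecdf(avg_complexity, COMPLEXITY_BENCHMARK)
    # Geometric mean: a file must rank high on BOTH axes to score high.
    return math.sqrt(churn_pct * complexity_pct)


def severity(score: float) -> str:
    if score >= 0.6:
        return "Critical"
    if score >= 0.4:
        return "High"
    if score >= 0.25:
        return "Moderate"
    return "Low"


score = hotspot_score(commits=40, avg_complexity=6)
print(round(score, 3), severity(score))  # 0.693 Critical
```

The geometric mean is what encodes "a simple file that changes often is probably fine": if either percentile is near zero, the product, and therefore the score, collapses toward zero.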

Why it matters: Adam Tornhill's "Your Code as a Crime Scene" introduced hotspot analysis as a way to find the most impactful refactoring targets. His research shows that a small percentage of files (typically 4-8%) contain most of the bugs. Graves et al. (2000) and Nagappan et al. (2005) demonstrated that relative code churn is a strong defect predictor.

[!TIP] Start refactoring with your top 3 hotspots. Reducing complexity in high-churn files has the highest ROI.

</details> <details> <summary><strong>Temporal Coupling</strong> - Files that change together reveal hidden dependencies</summary>

When two files consistently change in the same commits, they're temporally coupled. This often reveals:

  • Hidden dependencies not visible in import statements
  • Logical coupling where a change in one file requires a change in another
  • Accidental coupling from copy-paste or inconsistent abstractions

Omen analyzes your git history to find file pairs that change together:

| Coupling Strength | Meaning |
| --- | --- |
| > 80% | Almost always change together - likely tight dependency |
| 50-80% | Frequently coupled - investigate the relationship |
| 20-50% | Moderately coupled - may be coincidental |
| < 20% | Weakly coupled - probably independent |
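One way to compute a coupling strength from history is to count, for each file pair, how many commits touched both, then divide by the average number of commits of the two files. That normalization and the `min_shared` noise filter are assumptions of this Python sketch; Omen may define strength differently.

```python
from collections import Counter
from itertools import combinations


def temporal_coupling(commits: list[set[str]], min_shared: int = 2) -> dict:
    """Coupling strength per file pair, given the set of files touched by each commit."""
    revisions = Counter()
    shared = Counter()
    for files in commits:
        revisions.update(files)
        for a, b in combinations(sorted(files), 2):
            shared[(a, b)] += 1
    return {
        (a, b): n / ((revisions[a] + revisions[b]) / 2)
        for (a, b), n in shared.items()
        if n >= min_shared  # ignore pairs that co-changed only once
    }


commits = [
    {"api.rs", "api_test.rs"},
    {"api.rs", "api_test.rs"},
    {"api.rs", "api_test.rs", "main.rs"},
    {"api.rs"},
    {"main.rs"},
]
coupling = temporal_coupling(commits)
print({pair: round(s, 3) for pair, s in coupling.items()})  # {('api.rs', 'api_test.rs'): 0.857}
```

At roughly 86%, the source/test pair falls in the "almost always change together" band, which is expected and harmless for a file and its tests.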

Why it matters: Ball et al. (1997) first studied co-change patterns at AT&T and found they reveal architectural violations invisible to static analysis. Beyer and Noack (2005) showed that temporal coupling predicts future changes - if files changed together before, they'll likely change together again.

[!TIP] If two files have >50% temporal coupling but no import relationship, consider extracting a shared module or merging them.

</details> <details> <summary><strong>Code Ownership/Bus Factor</strong> - Knowledge concentration and team risk</summary>

Bus factor asks: "How many people would need to be hit by a bus before this code becomes unmaintainable?" Low bus factor means knowledge is concentrated in too few people.

Omen uses git blame to calculate:

  • Primary owner - Who wrote most of the code
  • Ownership ratio - What percentage one person owns
  • Contributor count - How many people have touched the file
  • Bus factor - Number of major contributors (>5% of code)

| Ownership Ratio | Risk Level | What it means |
| --- | --- | --- |
| > 90% | High risk | Single point of failure |
| 70-90% | Medium risk | Limited knowledge sharing |
| 50-70% | Low risk | Healthy distribution |
| < 50% | Very low | Broad ownership |
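Given per-author line counts for a file (for example, tallied from `git blame --line-porcelain` output), the four quantities listed above and the risk band follow directly. In this Python sketch the exact boundary values (such as a ratio of exactly 90%) fall into the lower-risk band, which is our assumption.

```python
def ownership_summary(lines_by_author: dict[str, int], major_share: float = 0.05) -> dict:
    """Primary owner, ownership ratio, contributor count, bus factor and risk band."""
    total = sum(lines_by_author.values())
    owner, owner_lines = max(lines_by_author.items(), key=lambda kv: kv[1])
    ratio = owner_lines / total
    if ratio > 0.9:
        risk = "High"
    elif ratio > 0.7:
        risk = "Medium"
    elif ratio > 0.5:
        risk = "Low"
    else:
        risk = "Very low"
    return {
        "primary_owner": owner,
        "ownership_ratio": ratio,
        "contributors": len(lines_by_author),
        # Bus factor: contributors owning more than 5% of the lines.
        "bus_factor": sum(1 for n in lines_by_author.values() if n / total > major_share),
        "risk": risk,
    }


summary = ownership_summary({"alice": 920, "bob": 60, "carol": 20})
print(summary["primary_owner"], summary["bus_factor"], summary["risk"])  # alice 2 High
```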

Why it matters: Bird et al. (2011) found that code with many minor contributors has more bugs than code with clear ownership, but code owned by a single person creates organizational risk. The sweet spot is 2-4 significant contributors per module. Nagappan et al. (2008) showed that organizational metrics (like ownership) predict defects better than code metrics alone.

[!TIP] Files with >80% single ownership should have documented knowledge transfer. Critical files should have at least 2 people who understand them.

</details> <details> <summary><strong>CK Metrics</strong> - Object-oriented design quality measurements</summary>

The Chidamber-Kemerer (CK) metrics suite measures object-oriented design quality:

| Metric | Name | What it measures | Threshold |
| --- | --- | --- | --- |
| WMC | Weighted Methods per Class | Sum of method complexities | < 20 |
| CBO | Coupling Between Objects | Number of other classes used | < 10 |
| RFC | Response for Class | Methods that can be invoked | < 50 |
| LCOM | Lack of Cohesion in Methods | Methods not sharing fields | < 3 |
| DIT | Depth of Inheritance Tree | Inheritance chain length | < 5 |
| NOC | Number of Children | Direct subclasses | < 6 |

LCOM (Lack of Cohesion) is particularly important. Low LCOM means methods in a class use similar instance variables - the class is focused. High LCOM means the class is doing unrelated things and should probably be split.
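The original Chidamber-Kemerer formulation counts method pairs that share no instance fields (P) and pairs that share at least one (Q), and defines LCOM as P - Q when positive, else 0. This Python sketch implements that classic variant from a hypothetical method-to-fields mapping; Omen may use a different LCOM variant.

```python
from itertools import combinations


def lcom_ck(fields_used: dict[str, set[str]]) -> int:
    """Chidamber-Kemerer LCOM: |P| - |Q| if positive, else 0."""
    p = q = 0
    for (_, a), (_, b) in combinations(fields_used.items(), 2):
        if a & b:
            q += 1  # the two methods share at least one field
        else:
            p += 1  # the two methods share nothing
    return max(p - q, 0)


# Hypothetical classes, described as method -> instance fields it touches.
account = {"deposit": {"balance"}, "withdraw": {"balance"}, "report": {"balance", "owner"}}
grab_bag = {"deposit": {"balance"}, "withdraw": {"balance"},
            "send_email": {"smtp"}, "render_pdf": {"template"}}
print(lcom_ck(account), lcom_ck(grab_bag))  # 0 4
```

The focused `account` class scores 0, while `grab_bag` scores 4 and exceeds the `< 3` threshold in the table, signalling that its email and PDF responsibilities probably belong elsewhere.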

Why it matters: Chidamber and Kemerer's 1994 paper established these metrics as the foundation of OO quality measurement. Basili et al. (1996) validated them empirically, finding that WMC and CBO strongly correlate with fault-proneness. These metrics have been cited thousands of times and remain the standard for OO design analysis.

[!TIP] Classes violating multiple CK thresholds are candidates for refactoring. High WMC + high LCOM often indicates a "god class" that should be split.

</details> <details> <summary><strong>Repository Map</strong> - PageRank-ranked symbol index for LLM context</summary>

Repository maps provide a compact summary of your codebase's important symbols, ranked by structural importance using PageRank. This is designed for LLM context windows - you get the most important functions and types first.

For each symbol, the map includes:

  • Name and kind (function, class, method, interface)
  • File location and line number
  • Signature for quick understanding
  • PageRank score based on how many other symbols depend on it
  • In/out degree showing dependency connections

Why it matters: LLMs have limited context windows. Stuffing them with entire files wastes tokens on less important code. PageRank, developed by Brin and Page (1998), identifies structurally important nodes in a graph. Applied to code, it surfaces the symbols that are most central to understanding the codebase.

Scalability: Omen uses a sparse power iteration algorithm for PageRank computation, scaling linearly with the number of edges O(E) rather than quadratically with nodes O(V^2). This enables fast analysis of large monorepos with 25,000+ symbols in under 30 seconds.
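Power iteration over an edge list is what keeps each round linear in the graph size: rank is pushed along existing edges only, never across a dense V×V matrix. This Python sketch shows the technique on a hypothetical four-symbol graph; the damping factor of 0.85 and the uniform redistribution of dangling-node rank are conventional choices and assumptions here, not Omen's documented parameters.

```python
def pagerank(edges: list[tuple[str, str]], damping: float = 0.85,
             iterations: int = 100, tol: float = 1e-10) -> dict[str, float]:
    """Power-iteration PageRank; each iteration costs O(V + E).

    An edge (a, b) means symbol a references symbol b, so rank flows toward
    heavily referenced symbols. Nodes without out-links spread rank uniformly.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new = {node: (1.0 - damping + damping * dangling) / n for node in nodes}
        for src, targets in out_links.items():
            for dst in targets:
                new[dst] += damping * rank[src] / len(targets)
        converged = sum(abs(new[k] - rank[k]) for k in nodes) < tol
        rank = new
        if converged:
            break
    return rank


edges = [("main", "parse"), ("main", "score"), ("score", "parse"), ("report", "parse")]
ranks = pagerank(edges)
print(max(ranks, key=ranks.get))  # parse
```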

Example output:

```text
# Repository Map (Top 20 symbols by PageRank)

## parser.ParseFile (function) - pkg/parser/parser.go:45
  PageRank: 0.0823 | In: 12 | Out: 5
  func ParseFile(path string) (*Result, error)

## models.TdgScore (struct) - pkg/models/tdg.go:28
  PageRank: 0.0651 | In: 8 | Out: 3
  type TdgScore struct
```

[!TIP] Use omen context --repo-map --top 50 to generate context for LLM prompts. The top 50 symbols usually capture the essential architecture.

</details> <details> <summary><strong>Feature Flag Detection</strong> - Find and track feature flags across your codebase</summary>

Feature flags are powerful but dangerous. They let you ship code without enabling it, run A/B tests, and roll out features gradually. But they accumulate. That "temporary" flag from 2019 is still in production. The flag you added for a one-week experiment is now load-bearing infrastructure.

Omen detects feature flag usage across popular providers:

| Provider | Languages | What it finds |
| --- | --- | --- |
| LaunchDarkly | JS/TS | `variation()`, `boolVariation()` calls |
| Split | JS/TS | `getTreatment()` calls |
| Unleash | JS/TS, Python | `isEnabled()`, `is_enabled()` calls |
| Flipper | Ruby | `Flipper[:flag]`, `enabled?()` calls |
| ENV-based | Ruby, JS/TS, Python | `ENV["FEATURE_*"]`, `process.env.FEATURE_*` |
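To illustrate what "references" for the ENV-based row amounts to, this Python sketch uses regular expressions for the two patterns shown in the table (`ENV["FEATURE_*"]` and `process.env.FEATURE_*`) and maps each flag key to its locations. Omen matches syntax trees with tree-sitter instead, so these regexes are only an approximation and will, for example, also match inside comments and strings.

```python
import re

# Regex approximations of the ENV-based patterns in the table above.
ENV_FLAG_PATTERNS = [
    re.compile(r'ENV\[\s*["\'](FEATURE_\w+)["\']\s*\]'),  # Ruby: ENV["FEATURE_X"]
    re.compile(r"process\.env\.(FEATURE_\w+)"),           # JS/TS: process.env.FEATURE_X
]


def find_env_flags(path: str, text: str) -> dict[str, list[tuple[str, int]]]:
    """Map each flag key to the (path, line) locations that reference it."""
    refs: dict[str, list[tuple[str, int]]] = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern in ENV_FLAG_PATTERNS:
            for match in pattern.finditer(line):
                refs.setdefault(match.group(1), []).append((path, lineno))
    return refs


ts_source = 'if (process.env.FEATURE_NEW_CHECKOUT === "1") {\n  render();\n}\n'
rb_source = 'return unless ENV["FEATURE_NEW_CHECKOUT"] == "1"\n'
print(find_env_flags("checkout.ts", ts_source))  # {'FEATURE_NEW_CHECKOUT': [('checkout.ts', 1)]}
```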

Additional providers can be added via custom tree-sitter queries in your omen.toml configuration.

For each flag, Omen reports:

  • Flag key - The identifier used in code
  • Provider - Which SDK is being used
  • References - All locations where the flag is checked
  • Staleness - When the flag was first and last modified (with git history)

Custom providers: For in-house feature flag systems, define custom tree-sitter queries in your omen.toml:

```toml
[[feature_flags.custom_providers]]
name = "feature"
languages = ["ruby"]
query = '''
(call
  receiver: (constant) @receiver
  (#eq? @receiver "Feature")
  method: (identifier) @method
  (#match? @method "^(enabled\\?|get_feature_flag)$")
  arguments: (argument_list
    .
    (simple_symbol) @flag_key))
'''
```

Why it matters: Meinicke et al. (2020) studied feature flags across open-source projects and found that flag ownership (the developer who introduces a flag also removes it) correlates with shorter flag lifespans, helping keep technical debt in check. Rahman et al. (2018) studied Google Chrome's 12,000+ feature toggles and found that while they enable rapid releases and flexible deployment, they also introduce technical debt and additional maintenance burden. Regular flag audits prevent your codebase from becoming a maze of unused toggles.

[!TIP] Audit feature flags monthly. Remove flags older than 90 days for experiments, 14 days for release flags. Track flag staleness in your CI pipeline.

</details> <details> <summary><strong>Repository Score</strong> - Composite health score (0-100)</summary>

Omen computes a composite repository health score (0-100) that combines multiple analysis dimensions. This provides a quick overview of codebase quality and enables quality gates in CI/CD.

Score Components:

| Component | Weight | What it measures |
| --- | --- | --- |
| Complexity | 25% | % of functions exceeding complexity thresholds |
| Duplication | 20% | Code clone ratio with non-linear penalty curve |
| SATD | 10% | Severity-weighted TODO/FIXME density per 1K LOC |
| TDG | 15% | Technical Debt Gradient composite score |
| Coupling | 10% | Cyclic deps, SDP violations, and instability |
| Smells | 5% | Architectural smells relative to codebase size |
| Cohesion | 15% | Class cohesion (LCOM) for OO codebases |
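With every component already normalized to 0-100, the composite is a weighted average using the weights from the table above. This Python sketch assumes all seven components are present; how Omen handles a missing component (for example, cohesion in a codebase without classes) is not described here.

```python
# Weights from the table above (they sum to 100%).
SCORE_WEIGHTS = {"complexity": 0.25, "duplication": 0.20, "satd": 0.10, "tdg": 0.15,
                 "coupling": 0.10, "smells": 0.05, "cohesion": 0.15}


def repository_score(components: dict[str, float]) -> float:
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(weight * components[name] for name, weight in SCORE_WEIGHTS.items())


components = {name: 80.0 for name in SCORE_WEIGHTS}
components["duplication"] = 55.0  # duplication often lags behind the rest
print(round(repository_score(components), 1))  # 75.0
```

A 25-point shortfall in duplication alone costs 5 points overall (25 × 20%), which is why the thresholds example below sets duplication lower than the other gates.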

Normalization Philosophy:

Each component metric is normalized to a 0-100 scale where higher is always better. The normalization functions are designed to be:

  1. Fair - Different metrics with similar severity produce similar scores
  2. Calibrated - Based on industry benchmarks from SonarQube, CodeClimate, and CISQ
  3. Non-linear - Gentle penalties for minor issues, steep for severe ones
  4. Severity-aware - Weight items by impact, not just count

For example, SATD (Self-Admitted Technical Debt) uses severity-weighted scoring:

  • Critical (SECURITY, VULN): 4x weight
  • High (FIXME, BUG): 2x weight
  • Medium (HACK, REFACTOR): 1x weight
  • Low (TODO, NOTE): 0.25x weight

This prevents low-severity items (like documentation TODOs) from unfairly dragging down scores.
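The effect of these weights is easy to see in a severity-weighted density per 1,000 lines. This Python sketch copies the four weight tiers from the list above and defaults any other marker to 1x (our assumption); how Omen maps the resulting density onto its 0-100 SATD component is not shown here.

```python
# Severity weights from the list above; unlisted markers default to 1x (an assumption).
SEVERITY_WEIGHTS = {"SECURITY": 4.0, "VULN": 4.0, "FIXME": 2.0, "BUG": 2.0,
                    "HACK": 1.0, "REFACTOR": 1.0, "TODO": 0.25, "NOTE": 0.25}


def weighted_satd_density(markers: list[str], lines_of_code: int) -> float:
    """Severity-weighted SATD items per 1,000 lines of code."""
    weighted = sum(SEVERITY_WEIGHTS.get(marker, 1.0) for marker in markers)
    return weighted / lines_of_code * 1000


mostly_todos = ["TODO"] * 8 + ["FIXME"]
print(round(weighted_satd_density(mostly_todos, 5000), 2))  # 0.8
print(round(weighted_satd_density(["SECURITY"], 5000), 2))  # 0.8
```

Eight TODOs plus a FIXME weigh the same as a single SECURITY comment, so a pile of documentation TODOs does not outweigh one known vulnerability.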

TDG (Technical Debt Gradient) provides a complementary view by analyzing structural complexity, semantic complexity, duplication patterns, and coupling within each file.

Usage:

```bash
# Compute repository score
omen score

# JSON output for CI integration
omen -f json score
```

Adjusting thresholds:

Achieving a score of 100 is nearly impossible for real-world codebases. Set realistic thresholds in omen.toml based on your codebase:

```toml
[score.thresholds]
score = 80        # Overall score minimum
complexity = 85   # Function complexity
duplication = 65  # Code clone ratio (often the hardest to improve)
defect = 80       # Defect probability
debt = 75         # Technical debt density
coupling = 70     # Module coupling
smells = 90       # Architectural smells
```

Run omen score to see your current scores, then set thresholds slightly below those values. Gradually increase them over time.

Enforcing on push with Lefthook:

Add to lefthook.yml:

```yaml
pre-push:
  commands:
    omen-score:
      run: omen score
```

This prevents pushing code that fails your quality thresholds.

Why it matters: A single health score enables quality gates, tracks trends over time, and provides quick codebase assessment. The weighted composite ensures that critical issues (defects, complexity) have more impact than cosmetic ones.

[!TIP] Start with achievable thresholds and increase them as you improve your codebase. Duplication is often the hardest metric to improve in legacy code.

</details> <details> <summary><strong>Semantic Search</strong> - Natural language code discovery</summary>

Search your codebase by meaning, not just keywords. Omen uses a TF-IDF engine with sublinear TF, smooth IDF, and bigram tokenization to find semantically similar code from natural language queries. No external models, no API keys, no GPU required.

```bash
# Build the search index
omen search index

# Search for code
omen search query "database connection pooling"
omen search query "error handling middleware" --top-k 20
omen search query "authentication" --files src/auth/,src/middleware/

# Cross-repo search
omen search query "retry logic" --include-project /path/to/other-repo

# Filter by complexity
# (via MCP: semantic_search with max_complexity parameter)
```

How it works:

  1. Symbol extraction - Extracts functions from your codebase using tree-sitter
  2. AST-aware chunking - Splits long functions at statement boundaries so each chunk is focused and self-contained. Parent type context (class, struct, impl) is preserved.
  3. TF-IDF indexing - Builds a sparse vector index with L2-normalized cosine similarity. Indexes in ~1-2 seconds for typical codebases.
  4. Incremental updates - Only re-indexes files that changed since last run
  5. Deduplication - Each symbol appears once in results (best-scoring chunk wins)
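The indexing and query steps can be sketched in plain Python: sublinear TF (1 + ln tf), a smooth IDF, unigram plus bigram tokens, and L2-normalized vectors so that a dot product is the cosine similarity. This is an illustration, not Omen's Rust engine; the tokenizer and the scikit-learn-style IDF formula ln((1 + n) / (1 + df)) + 1 are assumptions, and the three "documents" are hypothetical function snippets.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Unigrams plus adjacent-word bigrams; camelCase and snake_case are split."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    words = re.findall(r"[a-z0-9]+", spaced.lower())
    return words + [f"{a} {b}" for a, b in zip(words, words[1:])]


def normalize(vec: dict[str, float]) -> dict[str, float]:
    """L2-normalize so that a dot product equals cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec


def weigh(counts: Counter, idf: dict[str, float]) -> dict[str, float]:
    """Sublinear TF times IDF, skipping terms the index has never seen."""
    return normalize({t: (1 + math.log(c)) * idf[t] for t, c in counts.items() if t in idf})


def build_index(docs: dict[str, str]):
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    df = Counter(term for c in counts.values() for term in c)
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}  # smooth IDF
    return idf, {name: weigh(c, idf) for name, c in counts.items()}


def search(query: str, idf, vectors, top_k: int = 3):
    q = weigh(Counter(tokenize(query)), idf)
    scores = {name: sum(w * vec.get(t, 0.0) for t, w in q.items()) for name, vec in vectors.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


docs = {
    "validate_form": "def validate_form(data): reject invalid input and check required fields",
    "open_pool": "def open_pool(url): create a database connection pool",
    "render_page": "def render_page(tpl): render an html template",
}
idf, vectors = build_index(docs)
print(search("validate user input", idf, vectors)[0][0])  # validate_form
```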

Features:

  • HyDE search - Write a hypothetical code snippet as your query for better matches (available via MCP semantic_search_hyde tool)
  • Complexity filtering - Exclude high-complexity functions from results (max_complexity parameter on MCP tools)
  • Multi-repo search - Query across multiple project indexes with unified IDF scoring (--include-project)
  • Per-function metrics - Results include cyclomatic and cognitive complexity when available

Performance:

| Metric | Value |
| --- | --- |
| Index time | ~1-2s (1,400 symbols) |
| Query time | ~250ms |
| Storage | SQLite in .omen/search.db |
| Dependencies | Zero external (pure Rust TF-IDF) |

Why it matters: Traditional grep/ripgrep finds exact matches. Semantic search finds code that means the same thing even with different naming. Ask "how do we validate user input" and find functions named sanitize_params, check_request, or validate_form.

[!TIP] Run omen search index after major refactors or when onboarding to a new codebase. The index updates incrementally on subsequent runs.

</details> <details> <summary><strong>Mutation Testing</strong> - Test suite effectiveness through code mutation</summary>

Mutation testing measures how well your test suite catches bugs by introducing small changes (mutations) to your code and checking if tests fail. A "killed" mutant means tests caught the bug; a "surviving" mutant means a bug could slip through.

21 Mutation Operators:

| Category | Operators | What they mutate |
| --- | --- | --- |
| Core | CRR, ROR, AOR, COR, UOR | Literals, relational ops, arithmetic, conditionals, unary |
| Advanced | SDL, RVR, BVO, BOR, ASR | Statement deletion, return values, boundaries, bitwise, assignment |
| Rust | BorrowOperator, OptionOperator, ResultOperator | Borrow semantics, Option/Result handling |
| Go | GoErrorOperator, GoNilOperator | Error handling, nil checks |
| TypeScript | TSEqualityOperator, TSOptionalOperator | `===`/`==`, optional chaining |
| Python | PythonIdentityOperator, PythonComprehensionOperator | `is`/`==`, list comprehensions |
| Ruby | RubyNilOperator, RubySymbolOperator | nil handling, symbol/string conversion |
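To show what a relational operator replacement (ROR) mutant is and how a test "kills" it, this Python sketch rewrites one comparison at a time with the standard `ast` module and runs a test against each mutant. The swap table is a small illustrative subset chosen by us, not Omen's ROR definition, and `exec` is used only because the sketch runs trusted code.

```python
import ast

# Illustrative ROR swap table (a subset, not Omen's operator definition).
ROR_SWAPS = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE,
             ast.GtE: ast.Gt, ast.Eq: ast.NotEq, ast.NotEq: ast.Eq}


def ror_mutants(source: str) -> list[str]:
    """One mutated copy of `source` per relational operator, each with a single swap."""
    tree = ast.parse(source)
    sites = [(node, i) for node in ast.walk(tree) if isinstance(node, ast.Compare)
             for i, op in enumerate(node.ops) if type(op) in ROR_SWAPS]
    mutants = []
    for node, i in sites:
        original = node.ops[i]
        node.ops[i] = ROR_SWAPS[type(original)]()
        mutants.append(ast.unparse(tree))  # requires Python 3.9+
        node.ops[i] = original  # undo before creating the next mutant
    return mutants


def is_killed(mutant_source: str, test) -> bool:
    """A failing assertion means the test caught (killed) the mutant."""
    namespace: dict = {}
    exec(mutant_source, namespace)  # trusted code only
    try:
        test(namespace)
    except AssertionError:
        return True
    return False


def weak_test(ns):
    assert ns["is_adult"](30)


def boundary_test(ns):
    assert ns["is_adult"](18)


source = "def is_adult(age):\n    return age >= 18\n"
mutant = ror_mutants(source)[0]          # age >= 18 becomes age > 18
print(is_killed(mutant, weak_test))      # False: the mutant survives
print(is_killed(mutant, boundary_test))  # True: the boundary test kills it
```

The surviving mutant is exactly the kind of gap mutation testing reports: the weak test executes the line (full coverage) but never checks the boundary.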

Features:

  • Parallel execution - Async worker pool with work-stealing for efficient mutation testing
  • Equivalent mutant detection - ML-based scoring to identify semantically equivalent mutations
  • Coverage integration - Parse LLVM-cov, Istanbul, coverage.py, and Go coverage to skip untested code
  • Incremental mode - Only test mutations in changed files
  • CI/CD integration - Quality gates and GitHub integration

Usage:

```bash
# Generate mutants (dry run) with default operators (CRR, ROR, AOR)
omen mutation --dry-run

# Run mutation testing with all operators
omen mutation --mode thorough

# Fast mode (excludes operators that produce more equivalent mutants)
omen mutation --mode fast

# Run with coverage data to skip untested code
omen mutation --coverage coverage.json

# Incremental mode for CI - only test changed files
omen mutation --incremental

# Control parallelism
omen mutation --jobs 8

# Output surviving mutants for investigation
omen mutation --output-survivors survivors.json

# Filter to specific files
omen mutation --glob "src/analyzers/*.rs"
```

ML-Based Prediction:

Omen includes an ML model that learns from your mutation testing history to predict which mutants will survive. This enables two optimizations:

  1. Skip obvious kills - Don't waste time testing mutants the model is confident will be caught
  2. Better equivalent detection - Learn which patterns in your codebase produce equivalent mutants

```bash
# Record results to history file for later training
omen mutation --record

# Train the model from accumulated history
omen mutation train

# Use trained model to skip high-confidence kills (saves time)
omen mutation --skip-predicted 0.95

# Use a custom model path
omen mutation --model path/to/model.json
```

Training Workflow:

  1. Collect data: Run omen mutation --record on your codebase. Each mutant's outcome (killed/survived) is appended to .omen/mutation-history.jsonl along with:

    • Mutant details (operator, location, original/mutated code)
    • Source context (5 lines before/after the mutation)
    • Execution time
  2. Train model: Run omen mutation train to train the predictor. The model learns:

    • Operator-specific kill rates for your codebase
    • Feature weights correlating code patterns with survival
  3. Use predictions: Future runs automatically load .omen/mutation-model.json. Use --skip-predicted 0.9 to skip mutants with >90% predicted kill probability.
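The workflow above can be illustrated with a deliberately naive baseline. It computes each operator's historical kill rate from the JSONL history and uses that rate as the predicted kill probability. It then skips any mutant whose prediction is above the --skip-predicted threshold. This is a sketch, not Omen's model. The record fields `operator` and `outcome` and both function names are assumptions, because the exact schema of the history file is not documented here.

```python
import json
from collections import defaultdict


def operator_kill_rates(history_lines):
    """Kill rate per operator from JSONL records (assumed fields: operator, outcome)."""
    killed = defaultdict(int)
    total = defaultdict(int)
    for line in history_lines:
        if not line.strip():
            continue  # tolerate blank lines in the .jsonl file
        record = json.loads(line)
        total[record["operator"]] += 1
        if record["outcome"] == "killed":
            killed[record["operator"]] += 1
    return {op: killed[op] / total[op] for op in total}


def mutants_to_run(mutants, kill_rates, skip_predicted):
    """Keep mutants whose predicted kill probability is not above the threshold."""
    return [
        m for m in mutants
        # Operators never seen in history default to 0.0, so they always run.
        if kill_rates.get(m["operator"], 0.0) <= skip_predicted
    ]


history = [
    '{"operator": "ROR", "outcome": "killed"}',
    '{"operator": "ROR", "outcome": "killed"}',
    '{"operator": "AOR", "outcome": "survived"}',
    '{"operator": "AOR", "outcome": "killed"}',
]
rates = operator_kill_rates(history)
pending = [{"id": 1, "operator": "ROR"}, {"id": 2, "operator": "AOR"}, {"id": 3, "operator": "CRR"}]
print(rates)
print([m["id"] for m in mutants_to_run(pending, rates, 0.95)])
```

In this toy history ROR mutants were always killed, so a 0.95 threshold skips them. The AOR mutant and the never-seen CRR mutant still run.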

Example CI workflow:

bash
# Weekly: full run with recording
omen mutation --record --mode thorough

# After accumulating history: train model
omen mutation train

# Daily CI: fast run using predictions
omen mutation --incremental --skip-predicted 0.95

[!NOTE] The .omen/ directory is gitignored by default. If you want to share the trained model across your team, remove .omen/mutation-model.json from your .gitignore.

Mutation Score:

The mutation score measures test suite effectiveness:

code
mutation_score = killed_mutants / (total_mutants - equivalent_mutants)

| Score | Quality | Meaning |
|-------|---------|---------|
| > 80% | Excellent | Strong test suite that catches most bugs |
| 60-80% | Good | Reasonable coverage, some gaps to address |
| 40-60% | Moderate | Significant testing gaps |
| < 40% | Poor | Tests miss many potential bugs |
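The formula and the quality bands above can be written out directly. The table leaves its shared boundaries ambiguous, so this sketch makes an assumption. Excellent is strictly above 80%, which puts exactly 80% in Good. A score of exactly 60% or 40% counts toward the higher of the two adjacent bands. The function names and the sample run are hypothetical.

```python
def mutation_score(killed: int, total: int, equivalent: int = 0) -> float:
    """killed / (total - equivalent), as in the formula above."""
    effective = total - equivalent
    if effective <= 0:
        raise ValueError("no non-equivalent mutants to score")
    return killed / effective


def quality_band(score: float) -> str:
    """Map a score in [0, 1] to the quality bands from the table."""
    if score > 0.80:
        return "Excellent"
    if score >= 0.60:
        return "Good"
    if score >= 0.40:
        return "Moderate"
    return "Poor"


# Hypothetical run: 110 mutants, 10 judged equivalent, 85 killed by the tests.
score = mutation_score(killed=85, total=110, equivalent=10)
print(f"{score:.0%} -> {quality_band(score)}")  # 85% -> Excellent
```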

Why it matters: Code coverage tells you what code runs during tests, but not whether tests actually verify behavior. A function can have 100% coverage yet 0% mutation score if assertions are missing. Jia and Harman (2011) showed that mutation testing correlates strongly with fault detection. Papadakis et al. (2019) demonstrated it outperforms other test adequacy criteria.

[!TIP] Start with --mode fast on CI for quick feedback, and run --mode thorough periodically for comprehensive analysis. Use --coverage to avoid wasting time on untested code.

</details> <details> <summary><strong>MCP Server</strong> - LLM tool integration via Model Context Protocol</summary>

Omen includes a Model Context Protocol (MCP) server that exposes all analyzers as tools for LLMs like Claude. This enables AI assistants to analyze codebases directly through standardized tool calls.

Available tools:

  • complexity - Cyclomatic and cognitive complexity
  • satd - Self-admitted technical debt detection
  • deadcode - Unused functions and variables
  • churn - Git file change frequency
  • clones - Code clone detection
  • defect - File-level defect probability (PMAT)
  • changes - Commit-level change risk (JIT)
  • diff - Branch diff risk analysis
  • tdg - Technical Debt Gradient scores
  • graph - Dependency graph generation
  • hotspot - High churn + complexity files
  • temporal - Files that change together
  • ownership - Code ownership and bus factor
  • cohesion - Chidamber-Kemerer (CK) object-oriented metrics
  • repomap - PageRank-ranked symbol map
  • smells - Architectural smell detection
  • flags - Feature flag detection and staleness
  • score - Composite health score (0-100)
  • semantic_search - Natural language code search
  • semantic_search_hyde - HyDE-style search (query with a hypothetical code snippet)

Each tool includes detailed descriptions with interpretation guidance, helping LLMs understand what metrics mean and when to use each analyzer.

Tool outputs default to TOON (Token-Oriented Object Notation) format, a compact serialization designed for LLM workflows that reduces token usage by 30-60% compared to JSON while maintaining high comprehension accuracy. JSON and Markdown formats are also available.
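The token savings come mostly from not repeating keys. The sketch below approximates TOON's tabular array layout for a list of uniform records. It declares the field names and row count once in a header and then writes each row as comma-separated values. This is a simplified illustration, not Omen's encoder. It omits TOON's quoting and escaping rules, so values that contain commas or newlines would need more care. The `toon_table` helper and the sample data are hypothetical.

```python
import json


def toon_table(name: str, rows: list[dict]) -> str:
    """Encode uniform records as a TOON-style tabular array (no quoting/escaping)."""
    if not rows:
        return f"{name}[0]:"
    fields = list(rows[0])
    if any(list(row) != fields for row in rows):
        raise ValueError("tabular form requires identical keys in identical order")
    # Header declares the row count and the field names once.
    header = f"{name}[{len(rows)}]{{{','.join(fields)}}}:"
    lines = ["  " + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *lines])


hotspots = [
    {"file": "src/router.rs", "churn": 42, "cyclomatic": 31},
    {"file": "src/config.rs", "churn": 17, "cyclomatic": 12},
]
toon = toon_table("hotspots", hotspots)
as_json = json.dumps({"hotspots": hotspots})
print(toon)
# Character count is only a rough proxy for token count.
print(len(toon), "chars vs", len(as_json), "chars of JSON")
```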

Why it matters: LLMs work best when they have access to structured tools rather than parsing unstructured output. MCP is the emerging standard for LLM tool integration, supported by Claude Desktop and other AI assistants. TOON output maximizes the information density within context windows.

[!TIP] Configure omen as an MCP server in your AI assistant to enable natural language queries like "find the most complex functions" or "show me technical debt hotspots."

</details> <div align="center"> <img src="assets/omen-context.jpg" alt="Code Only vs Omen Context" width="100%"> </div>

Supported Languages

Go, Rust, Python, TypeScript, JavaScript, TSX/JSX, Java, C, C++, C#, Ruby, PHP, Bash (and other languages supported by tree-sitter)

Installation

Homebrew (macOS/Linux)

bash
brew install panbanda/brews/omen

Cargo Install

bash
cargo install omen-cli

Docker

bash
# Pull the latest image
docker pull ghcr.io/panbanda/omen:latest

# Run analysis on current directory
docker run --rm -v "$(pwd):/repo" ghcr.io/panbanda/omen:latest analyze /repo

# Run specific analyzer
docker run --rm -v "$(pwd):/repo" ghcr.io/panbanda/omen:latest complexity /repo

# Get repository score
docker run --rm -v "$(pwd):/repo" ghcr.io/panbanda/omen:latest score /repo

Multi-arch images are available for linux/amd64 and linux/arm64.

Download Binary

Download pre-built binaries from the releases page.

Build from Source

bash
git clone https://github.com/panbanda/omen.git
cd omen
cargo build --release
# Binary at target/release/omen

Quick Start

bash
# Run all analyzers
omen all

# List the available analyzers and options
omen --help

Remote Repository Scanning

Analyze any public GitHub repository without cloning it manually:

bash
# GitHub shorthand
omen -p facebook/react complexity
omen -p kubernetes/kubernetes satd

# With specific ref (branch, tag, or commit SHA)
omen -p agentgateway/agentgateway --ref v0.1.0 all
omen -p owner/repo --ref feature-branch all

# Full URLs
omen -p github.com/golang/go all
omen -p https://github.com/vercel/next.js all

# Shallow clone for faster analysis (static analyzers only)
omen -p facebook/react --shallow all

Omen clones to a temp directory, runs analysis, and cleans up automatically. The --shallow flag uses git clone --depth 1 for faster clones but disables git-history-based analyzers (churn, ownership, hotspot, temporal coupling, changes).

Configuration

Create an omen.toml or .omen/omen.toml file (YAML, JSON, and TOML are supported):

bash
omen init

See omen.example.toml for all options.

[!TIP] Using Claude Code? Run the setup-config skill to analyze your repository and generate an omen.toml with intelligent defaults for your tech stack, including detected feature flag providers and language-specific exclude patterns.

GitHub Action

Omen provides a GitHub Action for automated PR analysis. It runs diff risk analysis and health scoring on every pull request.

Basic Usage

yaml
name: Omen Analysis
on: [pull_request]

permissions:
  contents: read

jobs:
  omen:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - uses: panbanda/omen@omen-v4.21.2
        id: omen

      - name: Print results
        run: |
          echo "Risk: ${{ steps.omen.outputs.risk-level }} (${{ steps.omen.outputs.risk-score }})"
          echo "Health: ${{ steps.omen.outputs.health-grade }} (${{ steps.omen.outputs.health-score }})"

[!IMPORTANT] fetch-depth: 0 is required. Omen needs full git history for accurate analysis.

With PR Comment and Labels

yaml
      - uses: panbanda/omen@omen-v4.21.2
        id: omen
        with:
          comment: true
          label: true
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

This requires additional permissions:

yaml
permissions:
  contents: read
  pull-requests: write
  issues: write

Inputs

| Input | Default | Description |
|-------|---------|-------------|
| version | latest | Omen version to install |
| path | . | Repository path to analyze |
| comment | false | Post/update a sticky PR comment |
| label | false | Add a risk-level label |
| label-template | risk: {{level}} | Label name template ({{level}} is replaced with low, medium, or high) |
| check | false | Fail if risk meets threshold |
| check-threshold | high | Risk level to fail on (low, medium, high) |

Outputs

All outputs are available for chaining into downstream steps:

| Output | Example | Description |
|--------|---------|-------------|
| risk-score | 0.42 | Diff risk score (0.0 - 1.0) |
| risk-level | medium | Risk level (low, medium, high) |
| health-score | 76.9 | Health score (0 - 100) |
| health-grade | C | Health grade (A - F) |
| diff-json | {...} | Full omen diff JSON |
| score-json | {...} | Full omen score JSON |

Quality Gate

Fail the workflow if risk is too high:

yaml
      - uses: panbanda/omen@omen-v4.21.2
        with:
          check: true
          check-threshold: high  # fail on high risk PRs
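The check-threshold input compares the PR's risk level against an ordered scale. This sketch reads "fail if risk meets threshold" as "fail when the risk level is at or above the threshold", which reduces the gate to a rank comparison. Both that reading and the `should_fail` name are assumptions; they are not taken from the action's source.

```python
RISK_ORDER = ["low", "medium", "high"]


def should_fail(risk_level: str, check_threshold: str = "high") -> bool:
    """True when risk_level is at or above check_threshold (low < medium < high).

    list.index raises ValueError for a level outside RISK_ORDER.
    """
    return RISK_ORDER.index(risk_level) >= RISK_ORDER.index(check_threshold)


for level in RISK_ORDER:
    print(level, "fails" if should_fail(level, "medium") else "passes")
```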

Custom Workflows

Use outputs to build custom integrations:

yaml
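      - uses: panbanda/omen@omen-v4.21.2
        id: omen

      - name: Block high-risk PRs
        if: steps.omen.outputs.risk-level == 'high'
        env:
          GH_TOKEN: ${{ github.token }}  # gh CLI reads this; labeling needs pull-requests: write
        run: |
          gh pr edit ${{ github.event.pull_request.number }} --add-label "needs-review"
          exit 1

      - name: Notify on health drop
        if: fromJSON(steps.omen.outputs.health-score) < 60
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}  # assumed repository secret name
        run: |
          curl -X POST -H 'Content-type: application/json' \
            -d '{"text":"Health: ${{ steps.omen.outputs.health-score }}"}' \
            "$SLACK_WEBHOOK"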

Label Template

Customize label naming with label-template. The {{level}} token is replaced with the risk level:

yaml
      - uses: panbanda/omen@omen-v4.21.2
        with:
          label: true
          label-template: 'omen/{{level}}'  # produces: omen/low, omen/medium, omen/high
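The substitution is a plain token replacement. The following is a minimal sketch with a hypothetical `render_label` helper, which also rejects levels other than low, medium, and high:

```python
VALID_LEVELS = {"low", "medium", "high"}


def render_label(template: str, level: str) -> str:
    """Replace every {{level}} token in the template with the risk level."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unexpected risk level: {level!r}")
    return template.replace("{{level}}", level)


print(render_label("risk: {{level}}", "medium"))  # default template -> risk: medium
print(render_label("omen/{{level}}", "high"))     # -> omen/high
```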

MCP Server

Omen includes a Model Context Protocol (MCP) server that exposes all analyzers as tools for LLMs like Claude. This enables AI assistants to analyze codebases directly.

Claude Desktop

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS):

json
{
  "mcpServers": {
    "omen": {
      "command": "omen",
      "args": ["mcp"]
    }
  }
}

Claude Code

bash
claude mcp add omen -- omen mcp

Example Usage

Once configured, you can ask Claude:

  • "Analyze the complexity of this codebase"
  • "Find technical debt in the src directory"
  • "What are the hotspot files that need refactoring?"
  • "Show me the bus factor risk for this project"
  • "Find stale feature flags that should be removed"

Claude Code Plugin

Omen is available as a Claude Code plugin, providing analysis-driven skills that guide Claude through code analysis workflows.

Installation

Run this slash command inside Claude Code (not in a shell):

text
/plugin install panbanda/omen

Verify the installation by running /skills, which lists the available Omen skills.

Prerequisites

Skills require the Omen MCP server to be configured (see MCP Server section above).

Real-World Repository Analysis Examples

The examples/repos/ directory contains comprehensive health reports for popular open-source projects, demonstrating Omen's capabilities across different languages and project types.

Analyzed Repositories

| Repository | Language | Health Score | Key Insights |
|------------|----------|--------------|--------------|
| gin-gonic/gin | Go | 95/100 | Exceptionally healthy web framework with zero duplication and clean architecture |
| excalidraw/excalidraw | TypeScript | 86/100 | Perfect (100%) coupling score; App.tsx needs refactoring |
| BurntSushi/ripgrep | Rust | 76/100 | Mature codebase with excellent architecture; duplication in tests is intentional |
| tiangolo/fastapi | Python | 74/100 | Great complexity scores (98/100); duplication from versioned examples in docs |
| discourse/discourse | Ruby | 69/100 | Largest codebase (10K+ files); excellent defect management despite size |

What the Reports Demonstrate

1. Health Score Breakdown: Each report shows how the composite score is calculated from individual components (complexity, duplication, SATD, coupling, etc.) and explains what drives each component's score.

2. Hotspot Analysis: Reports identify files with both high churn and high complexity, which are the most impactful refactoring targets. For example, gin's tree.go has a hotspot score of 0.54 due to its radix tree routing complexity.

3. Technical Debt Gradient (TDG): Files are graded A-F based on accumulated technical debt. The reports explain what drives low grades and prioritize cleanup efforts.

4. PR Risk Analysis: Each report includes an analysis of a real PR using omen diff:

bash
omen diff --target main -f markdown

Example from gin-gonic/gin (#4420 - add escaped path option):

code
Risk Score: 0.31 (MEDIUM)
Lines Added:    63
Lines Deleted:  2
Files Modified: 2

Risk Factors:
  entropy:        0.084
  lines_added:    0.118
  num_files:      0.050

Understanding Risk Factors:

  • Risk Score - LOW (< 0.2), MEDIUM (0.2-0.5), HIGH (> 0.5)
  • Entropy - How scattered changes are (0 = concentrated, 1 = everywhere)
  • Lines Added/Deleted Ratio - Net code reduction is often a good sign
  • Files Modified - More files = more potential for cascading issues
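The risk bands and the entropy factor described above can be sketched as follows. Here, entropy is the normalized Shannon entropy of changed lines across modified files, a common definition in just-in-time defect prediction. Whether Omen computes it exactly this way is an assumption. This sketch does not reproduce the weighted factor values in the gin example above. The function names are hypothetical.

```python
import math


def change_entropy(lines_changed_per_file: list[int]) -> float:
    """Normalized Shannon entropy: 0 = one file changed, 1 = changes spread evenly."""
    counts = [c for c in lines_changed_per_file if c > 0]
    if len(counts) <= 1:
        return 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts)
    return entropy / math.log2(len(counts))  # divide by the maximum possible entropy


def risk_level(score: float) -> str:
    """Bands from the list above: LOW < 0.2 <= MEDIUM <= 0.5 < HIGH."""
    if score < 0.2:
        return "LOW"
    if score <= 0.5:
        return "MEDIUM"
    return "HIGH"


print(round(change_entropy([50, 8]), 3))   # changes concentrated in one file
print(round(change_entropy([30, 35]), 3))  # changes split almost evenly
print(risk_level(0.31))                    # MEDIUM, matching the gin example
```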

5. CI/CD Integration: Reports include GitHub Actions workflow examples for quality gates and PR risk assessment.

Generating Your Own Reports

Run a comprehensive analysis on any repository:

bash
# Local repository
omen score
omen hotspot
omen tdg

# Remote repository
omen -p facebook/react score
omen -p kubernetes/kubernetes hotspot

# PR risk before merging
omen diff --target main

# Track score trends over time
omen score trend --period monthly --since 6m

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -am 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Create a Pull Request

Acknowledgments

Omen draws heavy inspiration from paiml-mcp-agent-toolkit, a fantastic CLI and comprehensive suite of code analysis tools for LLM workflows. If you're doing serious AI-assisted development, or you're looking for a Rust-focused MCP/agent generator as an alternative to Python, it's well worth checking out. Omen exists as a streamlined alternative for teams who want a focused subset of analyzers without the additional dependencies.

License

Apache License 2.0 - see LICENSE for details.

