Analysis Engine Architecture¶
Technical document for security analysts
1. Overview¶
The mcp-scan analysis engine is designed as a modular pipeline that processes source code from MCP servers to detect security vulnerabilities. The architecture follows Clean Architecture principles with inward dependencies.
1.1 High-Level Diagram¶
+------------------------------------------------------------------+
| INPUT LAYER |
| +------------+ +------------+ +------------+ +------------+ |
| | CLI Tool | | Go API | | Config | | Baseline | |
| | (Cobra) | | (pkg/scan) | | (YAML) | | Filter | |
| +------------+ +------------+ +------------+ +------------+ |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| ORCHESTRATION LAYER |
| +----------------------------------------------------------+ |
| | Scanner | |
| | - Coordinates the complete pipeline | |
| | - Manages parallelism (worker pool) | |
| | - Applies timeouts and cancellation | |
| +----------------------------------------------------------+ |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| ANALYSIS LAYER |
| +---------------+ +---------------+ +---------------+ |
| | Discovery | | Parser | | Surface | |
| | (glob-based) | | (tree-sitter) | | Extractor | |
| +---------------+ +---------------+ +---------------+ |
| | |
| v |
| +---------------+ +---------------+ +---------------+ |
| | Pattern | | Taint | | ML | |
| | Engine | | Engine | | Classifier | |
| +---------------+ +---------------+ +---------------+ |
| +---------------+ +---------------+ |
| | LLM | | CodeQL | |
| | Detector | | Client | |
| +---------------+ +---------------+ |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| SUPPORT LAYER |
| +---------------+ +---------------+ +---------------+ |
| | Catalog | | Call Graph | | Type Info | |
| | (src/sink) | | Builder | | System | |
| +---------------+ +---------------+ +---------------+ |
| +---------------+ +---------------+ +---------------+ |
| | Import | | MSSS | | Scoring | |
| | Resolver | | Calculator | | Context | |
| +---------------+ +---------------+ +---------------+ |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| OUTPUT LAYER |
| +---------------+ +---------------+ +---------------+ |
| | JSON | | SARIF | | Evidence | |
| | Reporter | | Reporter | | Bundle | |
| +---------------+ +---------------+ +---------------+ |
+------------------------------------------------------------------+
2. Analysis Pipeline¶
2.1 Pipeline Phases¶
The analysis pipeline consists of the following phases executed sequentially:
PHASE 1: Discovery
|
v
PHASE 2: Parsing (parallel)
|
v
PHASE 3: Surface Extraction
|
v
PHASE 4: Analysis (parallel)
| +-- Pattern Engine
| +-- Taint Engine
| +-- ML Classifier
| +-- LLM Detector (optional)
| +-- CodeQL (optional)
|
v
PHASE 5: Normalization
|
v
PHASE 6: Scoring
|
v
PHASE 7: Reporting
2.2 Detail of Each Phase¶
PHASE 1: Discovery¶
Location: internal/discovery/discovery.go
Function: Identify all source code files to analyze.
Process:

1. Traverse the directory with glob patterns (include/exclude)
2. Filter by supported extensions (.py, .ts, .tsx, .js, .jsx, .go)
3. Compute the SHA-256 hash of each file for the manifest
4. Record the size and absolute path
Output: List of FileInfo with path, hash, size, language
PHASE 2: Parsing¶
Location: internal/parser/
Function: Convert source code to normalized AST.
Process:
1. Distribute files among workers (parallel)
2. For each file:
- Select tree-sitter parser based on language
- Parse to native AST
- Convert to normalized AST (internal/ast/)
3. Extract: functions, classes, imports, decorators, calls
Output: List of *ast.File with normalized structure
Supported languages:

| Language | Extension | Tree-sitter Grammar | Status |
|----------|-----------|---------------------|--------|
| Python | .py | tree-sitter-python | Complete |
| TypeScript | .ts, .tsx, .mts, .cts | tree-sitter-typescript | Complete |
| JavaScript | .js, .jsx, .mjs, .cjs | tree-sitter-javascript | Complete |
| Go | .go | tree-sitter-go | Parsing only |
PHASE 3: Surface Extraction¶
Location: internal/surface/surface.go
Function: Identify the MCP surface of the server.
Process:

1. Detect the SDK used (mcp, fastmcp, @modelcontextprotocol/sdk)
2. For each file:
   - Search for tool decorators (@tool, @server.tool)
   - Extract name, description, schema
   - Identify the associated handler
3. Detect the transport type (stdio, HTTP, WebSocket)
4. Detect authentication signals (cookies, headers, OAuth)
Output: *surface.MCPSurface with Tools, Resources, Prompts, Transport, AuthSignals
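A minimal sketch of decorator detection, assuming FastMCP-style @...tool decorators. The real extractor walks the normalized AST rather than matching a regex, and the Tool struct below is a trimmed stand-in for surface.Tool:

```go
package main

import (
	"fmt"
	"regexp"
)

// Tool is a trimmed-down stand-in for surface.Tool (hypothetical fields).
type Tool struct {
	Name    string
	Handler string
}

// toolDecl matches Python decorators like @mcp.tool() followed by the
// handler definition. A regex is only a rough illustration; real
// extraction uses the parsed AST.
var toolDecl = regexp.MustCompile(`(?m)^@\w+\.tool(?:\(\))?\s*\ndef\s+(\w+)`)

// ExtractTools returns one Tool per decorated handler found in src.
func ExtractTools(src string) []Tool {
	var tools []Tool
	for _, m := range toolDecl.FindAllStringSubmatch(src, -1) {
		tools = append(tools, Tool{Name: m[1], Handler: m[1]})
	}
	return tools
}

func main() {
	src := "@mcp.tool()\ndef read_file(path: str) -> str:\n    ...\n"
	fmt.Println(ExtractTools(src))
}
```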
PHASE 4: Analysis¶
Location: Multiple modules
Function: Detect vulnerabilities through multiple techniques.
Parallel sub-phases:
| Engine | Location | Function |
|---|---|---|
| Pattern Engine | internal/pattern/ | Detection by regex/AST |
| Taint Engine | internal/taint/ | Data flow source->sink |
| ML Classifier | internal/ml/ | Tool poisoning classification |
| LLM Detector | internal/llm/ | Semantic analysis (optional) |
| CodeQL | internal/codeql/ | Secondary confirmation (optional) |
PHASE 5: Normalization¶
Location: internal/types/types.go
Function: Unify and deduplicate findings.
Process:

1. Generate a unique ID for each finding (SHA-256)
2. Deduplicate by ID
3. Apply the baseline filter if one exists
4. Sort by severity
Output: Deduplicated list of Finding
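The ID-generation and deduplication steps can be sketched as follows. Which fields feed the fingerprint is an assumption here; the real implementation in internal/types may hash more of the finding:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// Finding is trimmed to the fields that feed the ID (sketch).
type Finding struct {
	ID       string
	RuleID   string
	File     string
	Line     int
	Severity int // higher = more severe
}

// fingerprint derives a stable SHA-256 ID from rule + location.
func fingerprint(f Finding) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s|%s|%d", f.RuleID, f.File, f.Line)))
	return fmt.Sprintf("%x", sum)
}

// Normalize assigns IDs, drops duplicates, and sorts by severity (desc).
func Normalize(in []Finding) []Finding {
	seen := map[string]bool{}
	var out []Finding
	for _, f := range in {
		f.ID = fingerprint(f)
		if seen[f.ID] {
			continue // same rule at same location: deduplicate
		}
		seen[f.ID] = true
		out = append(out, f)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Severity > out[j].Severity })
	return out
}

func main() {
	in := []Finding{
		{RuleID: "MCP-X001", File: "a.py", Line: 10, Severity: 3},
		{RuleID: "MCP-X001", File: "a.py", Line: 10, Severity: 3}, // duplicate
		{RuleID: "MCP-X002", File: "b.py", Line: 5, Severity: 4},
	}
	out := Normalize(in)
	fmt.Println(len(out), out[0].RuleID)
}
```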
PHASE 6: Scoring¶
Location: internal/msss/msss.go, internal/scoring/scoring.go
Function: Calculate MSSS score and adjust severities.
Process:

1. Initialize the base score to 100
2. Apply penalties per finding
3. Apply a multiplier for criticals
4. Adjust severity by MCP context
5. Determine the compliance level (0-3)
Output: *msss.Score with total, categories, level
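A sketch of the scoring steps above. The penalty weights, critical multiplier, and level cutoffs below are illustrative placeholders, not the real tables in internal/msss/msss.go:

```go
package main

import "fmt"

// penalty holds hypothetical per-severity weights (not the real MSSS values).
var penalty = map[string]float64{
	"critical": 25, "high": 10, "medium": 4, "low": 1, "info": 0,
}

// Score starts at 100, subtracts per-finding penalties, applies a
// multiplier when criticals are present, and clamps at zero.
func Score(severities []string) (total float64, level int) {
	total = 100
	criticals := 0
	for _, s := range severities {
		total -= penalty[s]
		if s == "critical" {
			criticals++
		}
	}
	if criticals > 0 {
		total *= 0.8 // illustrative multiplier for criticals
	}
	if total < 0 {
		total = 0
	}
	// Map the total onto a 0-3 compliance level (illustrative cutoffs).
	switch {
	case total >= 90:
		level = 3
	case total >= 70:
		level = 2
	case total >= 40:
		level = 1
	}
	return total, level
}

func main() {
	fmt.Println(Score([]string{"high", "medium"})) // 100 - 10 - 4 = 86, level 2
}
```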
PHASE 7: Reporting¶
Location: internal/reporter/
Function: Generate output in requested format.
Supported formats:

| Format | Use | Content |
|--------|-----|---------|
| JSON | Custom processing | Findings + manifest + score |
| SARIF 2.1.0 | GitHub/GitLab CI | Security standard |
| Evidence | Auditing | Manifest + traces + complete evidence |
3. Key Components¶
3.1 Scanner (Orchestrator)¶
Location: pkg/scanner/scanner.go
The Scanner is the main entry point and coordinates the entire pipeline:
type Scanner struct {
config Config
patternEng *pattern.Engine
taintEng *taint.Engine
surfaceExt *surface.Extractor
mlClassifier ml.Classifier
llmDetector *llm.Detector
codeqlClient *codeql.Client
}
Responsibilities:

- Initialize all engines
- Manage the worker pool for parallelism
- Apply timeouts via context
- Coordinate result merging
3.2 Pattern Engine¶
Location: internal/pattern/engine.go
Pattern-based detection engine:
type Engine struct {
rules []*Rule
severityOverrides map[string]types.Severity
disabledRules map[string]bool
}
Components:
- Rules: List of rules with ID, detector, severity
- Detectors: Implementations of Detector interface
- Overrides: Configuration to adjust severities
3.3 Taint Engine¶
Location: internal/taint/engine.go
Data flow analysis engine:
type Engine struct {
catalog *catalog.Catalog
mode Mode // fast or deep
depth int // Inter-procedural depth
}
Components:

- Catalog: Definitions of sources/sinks/sanitizers
- TaintState: Taint state per scope
- Trace: Record of the source->sink path
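A minimal illustration of how a recorded trace determines taint, assuming the step kinds named above. The real engine tracks per-scope state and inter-procedural depth rather than a flat trace:

```go
package main

import "fmt"

// Step is one hop in a taint trace (sketch of the engine's Trace record).
type Step struct {
	Kind string // "source", "propagation", "sanitizer", "sink"
	Expr string
}

// Tainted reports whether the trace reaches a sink without first
// passing through a sanitizer.
func Tainted(trace []Step) bool {
	clean := false
	for _, s := range trace {
		switch s.Kind {
		case "sanitizer":
			clean = true
		case "sink":
			return !clean
		}
	}
	return false
}

func main() {
	trace := []Step{
		{"source", `params["path"]`},
		{"propagation", `cmd = "cat " + path`},
		{"sink", "os.system(cmd)"},
	}
	fmt.Println(Tainted(trace)) // no sanitizer before the sink
}
```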
3.4 Catalog¶
Location: internal/catalog/catalog.go
Knowledge catalog for taint analysis:
Source Categories:
| Category | Description |
|----------|-------------|
| SourceToolInput | MCP tool input |
| SourceEnvVar | Environment variables |
| SourceHTTPRequest | HTTP request |
| SourceFileContent | File content |
| SourceDBResult | Database result |
Sink Categories:
| Category | Class | Description |
|----------|-------|-------------|
| SinkExec | A | Command execution |
| SinkEval | A | Code evaluation |
| SinkFilesystem | B | File operations |
| SinkNetwork | C | Network requests |
| SinkDatabase | D | SQL queries |
| SinkLogging | E | Data logging |
| SinkResponse | E | Responses |
| SinkLLMPrompt | H | LLM prompts |
3.5 ML Classifier¶
Location: internal/ml/classifier.go
Machine learning classifier for tool poisoning:
Implementations:
- RuleBasedClassifier: Deterministic by rules
- WeightedClassifier: Trained weights
- EnsembleClassifier: Combination of classifiers
3.6 Surface Extractor¶
Location: internal/surface/surface.go
MCP surface extractor:
type MCPSurface struct {
Tools []Tool
Resources []Resource
Prompts []Prompt
Transport TransportType
AuthSignals []AuthSignal
}
4. Internal Data Flow¶
4.1 Finding Structure¶
Each finding has the following structure:
type Finding struct {
ID string // Unique SHA-256 hash
RuleID string // MCP-X001
Title string // Descriptive title
Severity Severity // critical/high/medium/low/info
Confidence Confidence // high/medium/low
Class VulnClass // A-N
Language Language // python/typescript/javascript/go
Location Location // file, line, column
Evidence Evidence // snippet, trace, LLM analysis
Description string // Finding description
Remediation string // How to remediate
MCPContext *MCPContext // Tool/handler where found
Trace *TaintTrace // Taint trace (if applicable)
}
4.2 Finding Flow¶
Pattern/Taint/ML detects match
|
v
+-------------------+
| Create Match |
| - location |
| - snippet |
| - confidence |
+-------------------+
|
v
+-------------------+
| Convert to |
| Finding |
| - apply rule |
| - generate ID |
+-------------------+
|
v
+-------------------+
| Context-Aware |
| Adjustment |
| - MCP tool boost |
| - trace length |
+-------------------+
|
v
+-------------------+
| Normalization |
| - deduplication |
| - baseline filter |
+-------------------+
|
v
+-------------------+
| MSSS Scoring |
| - penalties |
| - multipliers |
+-------------------+
|
v
+-------------------+
| Reporter |
| - JSON/SARIF |
+-------------------+
5. Parallelism and Performance¶
5.1 Worker Pool¶
The scanner uses a configurable worker pool:
+------------------+
| Scanner |
| +-----------+ |
| | WorkerPool| |
| | (N workers)| |
| +-----------+ |
+------------------+
|
v
+-------+-------+-------+
| | | |
Worker Worker Worker Worker
| | | |
File1 File2 File3 File4
Configuration:
- --workers N: Number of workers (0 = auto-detect CPUs)
- Default: number of available CPUs
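The fan-out can be sketched with goroutines and channels. parseAll is a hypothetical helper, not the scanner's actual API; it mirrors the --workers 0 auto-detect behavior by falling back to the CPU count:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parseAll distributes files across N workers and collects results.
func parseAll(files []string, workers int, parse func(string) string) []string {
	if workers <= 0 {
		workers = runtime.NumCPU() // --workers 0: auto-detect CPUs
	}
	jobs := make(chan string)
	results := make(chan string, len(files))
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for f := range jobs {
				results <- parse(f)
			}
		}()
	}
	for _, f := range files {
		jobs <- f
	}
	close(jobs)
	wg.Wait()
	close(results)
	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	files := []string{"a.py", "b.ts", "c.go"}
	out := parseAll(files, 0, func(f string) string { return "parsed:" + f })
	fmt.Println(len(out))
}
```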
5.2 Parallelizable Phases¶
| Phase | Parallel | Notes |
|---|---|---|
| Discovery | No | I/O bound, sequential |
| Parsing | Yes | Per file |
| Surface | No | Needs all ASTs |
| Analysis | Yes | Per file and per engine |
| Normalization | No | Needs all findings |
| Scoring | No | Fast, not necessary |
| Reporting | No | I/O output |
5.3 Cancellation and Timeouts¶
The pipeline supports cancellation at any phase via context:
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()
result, err := scanner.Scan(ctx, "/path/to/code")
Each phase checks ctx.Done() and terminates cleanly if cancelled.
6. Extensibility¶
6.1 Adding a New Detector¶
1. Implement the Detector interface
2. Register the detector in the Pattern Engine
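A sketch of what such a detector could look like. The Detector interface shown here is an assumption about internal/pattern's contract, and MCP-X999 is a made-up rule ID:

```go
package main

import (
	"fmt"
	"strings"
)

// Match and Detector approximate the pattern engine's contract; the
// real interface in internal/pattern may carry more context.
type Match struct {
	Line    int
	Snippet string
}

type Detector interface {
	ID() string
	Detect(source string) []Match
}

// evalDetector flags eval() calls, as a minimal example detector.
type evalDetector struct{}

func (evalDetector) ID() string { return "MCP-X999" } // hypothetical rule ID

func (evalDetector) Detect(source string) []Match {
	var out []Match
	for i, line := range strings.Split(source, "\n") {
		if strings.Contains(line, "eval(") {
			out = append(out, Match{Line: i + 1, Snippet: strings.TrimSpace(line)})
		}
	}
	return out
}

func main() {
	var d Detector = evalDetector{}
	fmt.Println(d.ID(), d.Detect("x = 1\ny = eval(user_input)"))
}
```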
6.2 Adding a New Source/Sink¶
1. Edit internal/catalog/catalog.go
2. Add the new source/sink definition
6.3 Adding a New Language¶
1. Add the tree-sitter grammar in internal/parser/
2. Implement the AST extractor
3. Add the language to types.Language
4. Update language-specific sources/sinks
7. Key Files¶
| Path | Description |
|---|---|
| pkg/scanner/scanner.go | Public API and orchestrator |
| internal/pattern/engine.go | Pattern engine |
| internal/taint/engine.go | Taint engine |
| internal/ml/classifier.go | ML classifier |
| internal/ml/features.go | Feature extraction |
| internal/llm/detector.go | LLM detector |
| internal/codeql/client.go | CodeQL client |
| internal/surface/surface.go | Surface extractor |
| internal/catalog/catalog.go | Sources/sinks catalog |
| internal/msss/msss.go | MSSS calculator |
| internal/types/types.go | Common types |
8. Complete Execution Flow¶
mcp-scan scan /path/to/code --mode fast --output json
|
v
+---------------+
| Load Config | <-- .mcp-scan.yaml or defaults
+---------------+
|
v
+---------------+
| Init Scanner | <-- Create all engines
+---------------+
|
v
+---------------+
| Discovery | <-- Find files
+---------------+
|
v
+---------------+
| Parse Files | <-- tree-sitter -> AST
| (parallel) |
+---------------+
|
v
+---------------+
| Extract | <-- Tools, Resources, Transport
| Surface |
+---------------+
|
v
+----------------+----------------+
| | |
v v v
+--------+ +---------+ +------+
|Pattern | | Taint | | ML |
|Engine | | Engine | | Clf |
+--------+ +---------+ +------+
| | |
+----------------+----------------+
|
v
+---------------+
| Merge & Dedup | <-- Unify findings
+---------------+
|
v
+---------------+
| Apply | <-- Filter findings in baseline
| Baseline |
+---------------+
|
v
+---------------+
| Calculate | <-- MSSS Score
| MSSS Score |
+---------------+
|
v
+---------------+
| Generate | <-- JSON output
| Report |
+---------------+
|
v
stdout/file
Next document: taint-analysis.md