Analysis Engine Architecture¶
Technical document for security analysts
1. Overview¶
The mcp-scan analysis engine is designed as a modular pipeline that processes source code from MCP servers to detect security vulnerabilities. The architecture follows Clean Architecture principles with inward dependencies.
1.1 High-Level Diagram¶
+------------------------------------------------------------------+
| INPUT LAYER |
| +------------+ +------------+ +------------+ +------------+ |
| | CLI Tool | | Go API | | Config | | Baseline | |
| | (Cobra) | | (pkg/scan) | | (YAML) | | Filter | |
| +------------+ +------------+ +------------+ +------------+ |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| ORCHESTRATION LAYER |
| +----------------------------------------------------------+ |
| | Scanner | |
| | - Coordinates the complete pipeline | |
| | - Manages parallelism (worker pool) | |
| | - Applies timeouts and cancellation | |
| +----------------------------------------------------------+ |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| ANALYSIS LAYER |
| +---------------+ +---------------+ +---------------+ |
| | Discovery | | Parser | | Surface | |
| | (glob-based) | | (tree-sitter) | | Extractor | |
| +---------------+ +---------------+ +---------------+ |
| | |
| v |
| +---------------+ +---------------+ +---------------+ |
| | Pattern | | Taint | | ML | |
| | Engine | | Engine | | Classifier | |
| +---------------+ +---------------+ +---------------+ |
| +---------------+ +---------------+ |
| | LLM | | CodeQL | |
| | Detector | | Client | |
| +---------------+ +---------------+ |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| SUPPORT LAYER |
| +---------------+ +---------------+ +---------------+ |
| | Catalog | | Call Graph | | Type Info | |
| | (src/sink) | | Builder | | System | |
| +---------------+ +---------------+ +---------------+ |
| +---------------+ +---------------+ +---------------+ |
| | Import | | MSSS | | Scoring | |
| | Resolver | | Calculator | | Context | |
| +---------------+ +---------------+ +---------------+ |
+------------------------------------------------------------------+
|
v
+------------------------------------------------------------------+
| OUTPUT LAYER |
| +---------------+ +---------------+ +---------------+ |
| | JSON | | SARIF | | Evidence | |
| | Reporter | | Reporter | | Bundle | |
| +---------------+ +---------------+ +---------------+ |
+------------------------------------------------------------------+
2. Analysis Pipeline¶
2.1 Pipeline Phases¶
The analysis pipeline consists of the following phases executed sequentially:
PHASE 1: Discovery
|
v
PHASE 2: Parsing (parallel)
|
v
PHASE 3: Surface Extraction
|
v
PHASE 4: Analysis (parallel)
| +-- Pattern Engine
| +-- Taint Engine
| +-- ML Classifier
| +-- LLM Detector (optional)
| +-- CodeQL (optional)
|
v
PHASE 5: Normalization
|
v
PHASE 6: Scoring
|
v
PHASE 7: Reporting
2.2 Detail of Each Phase¶
PHASE 1: Discovery¶
Location: internal/discovery/discovery.go
Function: Identify all source code files to analyze.
Process:

1. Traverse the directory with glob patterns (include/exclude)
2. Filter by supported extensions (.py, .ts, .tsx, .js, .jsx, .go)
3. Compute the SHA-256 hash of each file for the manifest
4. Record the size and absolute path
Output: List of FileInfo with path, hash, size, language
PHASE 2: Parsing¶
Location: internal/parser/
Function: Convert source code to normalized AST.
Process:
1. Distribute files among workers (parallel)
2. For each file:
- Select tree-sitter parser based on language
- Parse to native AST
- Convert to normalized AST (internal/ast/)
3. Extract: functions, classes, imports, decorators, calls
Output: List of *ast.File with normalized structure
Supported languages:

| Language | Extension | Tree-sitter Grammar | Status |
|----------|-----------|---------------------|--------|
| Python | .py | tree-sitter-python | Complete |
| TypeScript | .ts, .tsx, .mts, .cts | tree-sitter-typescript | Complete |
| JavaScript | .js, .jsx, .mjs, .cjs | tree-sitter-javascript | Complete |
| Go | .go | tree-sitter-go | Parsing only |
PHASE 3: Surface Extraction¶
Location: internal/surface/surface.go
Function: Identify the MCP surface of the server.
Process:

1. Detect the SDK used (mcp, fastmcp, @modelcontextprotocol/sdk)
2. For each file:
   - Search for tool decorators (@tool, @server.tool)
   - Extract name, description, schema
   - Identify the associated handler
3. Detect the transport type (stdio, HTTP, WebSocket)
4. Detect authentication signals (cookies, headers, OAuth)
Output: *surface.MCPSurface with Tools, Resources, Prompts, Transport, AuthSignals
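A minimal sketch of decorator detection, assuming FastMCP-style @...tool decorators. The real extractor walks the normalized AST rather than matching a regex, and the Tool struct below is a trimmed stand-in for surface.Tool:

```go
package main

import (
	"fmt"
	"regexp"
)

// Tool is a trimmed-down stand-in for surface.Tool (hypothetical fields).
type Tool struct {
	Name    string
	Handler string
}

// toolDecl matches Python decorators like @mcp.tool() followed by the
// handler definition. A regex is only a rough illustration; real
// extraction uses the parsed AST.
var toolDecl = regexp.MustCompile(`(?m)^@\w+\.tool(?:\(\))?\s*\ndef\s+(\w+)`)

// ExtractTools returns one Tool per decorated handler found in src.
func ExtractTools(src string) []Tool {
	var tools []Tool
	for _, m := range toolDecl.FindAllStringSubmatch(src, -1) {
		tools = append(tools, Tool{Name: m[1], Handler: m[1]})
	}
	return tools
}

func main() {
	src := "@mcp.tool()\ndef read_file(path: str) -> str:\n    ...\n"
	fmt.Println(ExtractTools(src))
}
```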
PHASE 4: Analysis¶
Location: Multiple modules
Function: Detect vulnerabilities through multiple techniques.
Parallel sub-phases:
| Engine | Location | Function |
|---|---|---|
| Pattern Engine | internal/pattern/ | Detection by regex/AST |
| Taint Engine | internal/taint/ | Data flow source->sink |
| ML Classifier | internal/ml/ | Tool poisoning classification |
| LLM Detector | internal/llm/ | Semantic analysis (optional) |
| CodeQL | internal/codeql/ | Secondary confirmation (optional) |
PHASE 5: Normalization¶
Location: internal/types/types.go
Function: Unify and deduplicate findings.
Process:

1. Generate a unique ID for each finding (SHA-256)
2. Deduplicate by ID
3. Apply the baseline filter if one exists
4. Sort by severity
Output: Deduplicated list of Finding
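The ID-generation and deduplication steps can be sketched as follows. Which fields feed the fingerprint is an assumption here; the real implementation in internal/types may hash more of the finding:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// Finding is trimmed to the fields that feed the ID (sketch).
type Finding struct {
	ID       string
	RuleID   string
	File     string
	Line     int
	Severity int // higher = more severe
}

// fingerprint derives a stable SHA-256 ID from rule + location.
func fingerprint(f Finding) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s|%s|%d", f.RuleID, f.File, f.Line)))
	return fmt.Sprintf("%x", sum)
}

// Normalize assigns IDs, drops duplicates, and sorts by severity (desc).
func Normalize(in []Finding) []Finding {
	seen := map[string]bool{}
	var out []Finding
	for _, f := range in {
		f.ID = fingerprint(f)
		if seen[f.ID] {
			continue // same rule at same location: deduplicate
		}
		seen[f.ID] = true
		out = append(out, f)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Severity > out[j].Severity })
	return out
}

func main() {
	in := []Finding{
		{RuleID: "MCP-X001", File: "a.py", Line: 10, Severity: 3},
		{RuleID: "MCP-X001", File: "a.py", Line: 10, Severity: 3}, // duplicate
		{RuleID: "MCP-X002", File: "b.py", Line: 5, Severity: 4},
	}
	out := Normalize(in)
	fmt.Println(len(out), out[0].RuleID)
}
```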
PHASE 6: Scoring¶
Location: internal/msss/msss.go, internal/scoring/scoring.go
Function: Calculate MSSS score and adjust severities.
Process:

1. Initialize the base score to 100
2. Apply penalties per finding
3. Apply a multiplier for criticals
4. Adjust severity by MCP context
5. Determine the compliance level (0-3)
Output: *msss.Score with total, categories, level
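A sketch of the scoring steps above. The penalty weights, critical multiplier, and level cutoffs below are illustrative placeholders, not the real tables in internal/msss/msss.go:

```go
package main

import "fmt"

// penalty holds hypothetical per-severity weights (not the real MSSS values).
var penalty = map[string]float64{
	"critical": 25, "high": 10, "medium": 4, "low": 1, "info": 0,
}

// Score starts at 100, subtracts per-finding penalties, applies a
// multiplier when criticals are present, and clamps at zero.
func Score(severities []string) (total float64, level int) {
	total = 100
	criticals := 0
	for _, s := range severities {
		total -= penalty[s]
		if s == "critical" {
			criticals++
		}
	}
	if criticals > 0 {
		total *= 0.8 // illustrative multiplier for criticals
	}
	if total < 0 {
		total = 0
	}
	// Map the total onto a 0-3 compliance level (illustrative cutoffs).
	switch {
	case total >= 90:
		level = 3
	case total >= 70:
		level = 2
	case total >= 40:
		level = 1
	}
	return total, level
}

func main() {
	fmt.Println(Score([]string{"high", "medium"})) // 100 - 10 - 4 = 86, level 2
}
```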
PHASE 7: Reporting¶
Location: internal/reporter/
Function: Generate output in requested format.
Supported formats:

| Format | Use | Content |
|--------|-----|---------|
| JSON | Custom processing | Findings + manifest + score |
| SARIF 2.1.0 | GitHub/GitLab CI | Security standard |
| Evidence | Auditing | Manifest + traces + complete evidence |
3. Key Components¶
3.1 Scanner (Orchestrator)¶
Location: pkg/scanner/scanner.go
The Scanner is the main entry point and coordinates the entire pipeline:
type Scanner struct {
config Config
patternEng *pattern.Engine
taintEng *taint.Engine
surfaceExt *surface.Extractor
mlClassifier ml.Classifier
llmDetector *llm.Detector
codeqlClient *codeql.Client
}
Responsibilities:

- Initialize all engines
- Manage the worker pool for parallelism
- Apply timeouts via context
- Coordinate result merging
3.2 Pattern Engine¶
Location: internal/pattern/engine.go
Pattern-based detection engine:
type Engine struct {
rules []*Rule
severityOverrides map[string]types.Severity
disabledRules map[string]bool
}
Components:
- Rules: List of rules with ID, detector, severity
- Detectors: Implementations of Detector interface
- Overrides: Configuration to adjust severities
3.3 Taint Engine¶
Location: internal/taint/engine.go
Data flow analysis engine:
type Engine struct {
catalog *catalog.Catalog
mode Mode // fast or deep
depth int // Inter-procedural depth
}
Components:

- Catalog: Definitions of sources/sinks/sanitizers
- TaintState: Taint state per scope
- Trace: Record of the source->sink path
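A minimal illustration of how a recorded trace determines taint, assuming the step kinds named above. The real engine tracks per-scope state and inter-procedural depth rather than a flat trace:

```go
package main

import "fmt"

// Step is one hop in a taint trace (sketch of the engine's Trace record).
type Step struct {
	Kind string // "source", "propagation", "sanitizer", "sink"
	Expr string
}

// Tainted reports whether the trace reaches a sink without first
// passing through a sanitizer.
func Tainted(trace []Step) bool {
	clean := false
	for _, s := range trace {
		switch s.Kind {
		case "sanitizer":
			clean = true
		case "sink":
			return !clean
		}
	}
	return false
}

func main() {
	trace := []Step{
		{"source", `params["path"]`},
		{"propagation", `cmd = "cat " + path`},
		{"sink", "os.system(cmd)"},
	}
	fmt.Println(Tainted(trace)) // no sanitizer before the sink
}
```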
3.4 Catalog¶
Location: internal/catalog/catalog.go
Knowledge catalog for taint analysis:
Source Categories:
| Category | Description |
|----------|-------------|
| SourceToolInput | MCP tool input |
| SourceEnvVar | Environment variables |
| SourceHTTPRequest | HTTP request |
| SourceFileContent | File content |
| SourceDBResult | Database result |
Sink Categories:
| Category | Class | Description |
|----------|-------|-------------|
| SinkExec | A | Command execution |
| SinkEval | A | Code evaluation |
| SinkFilesystem | B | File operations |
| SinkNetwork | C | Network requests |
| SinkDatabase | D | SQL queries |
| SinkLogging | E | Data logging |
| SinkResponse | E | Responses |
| SinkLLMPrompt | H | LLM prompts |
3.5 ML Classifier¶
Location: internal/ml/classifier.go
Machine learning classifier for tool poisoning:
Implementations:
- RuleBasedClassifier: Deterministic by rules
- WeightedClassifier: Trained weights
- EnsembleClassifier: Combination of classifiers
3.6 Surface Extractor¶
Location: internal/surface/surface.go
MCP surface extractor:
type MCPSurface struct {
Tools []Tool
Resources []Resource
Prompts []Prompt
Transport TransportType
AuthSignals []AuthSignal
}
4. Internal Data Flow¶
4.1 Finding Structure¶
Each finding has the following structure:
type Finding struct {
ID string // Unique SHA-256 hash
RuleID string // MCP-X001
Title string // Descriptive title
Severity Severity // critical/high/medium/low/info
Confidence Confidence // high/medium/low
Class VulnClass // A-N
Language Language // python/typescript/javascript/go
Location Location // file, line, column
Evidence Evidence // snippet, trace, LLM analysis
Description string // Finding description
Remediation string // How to remediate
MCPContext *MCPContext // Tool/handler where found
Trace *TaintTrace // Taint trace (if applicable)
}
4.2 Finding Flow¶
Pattern/Taint/ML detects match
|
v
+-------------------+
| Create Match |
| - location |
| - snippet |
| - confidence |
+-------------------+
|
v
+-------------------+
| Convert to |
| Finding |
| - apply rule |
| - generate ID |
+-------------------+
|
v
+-------------------+
| Context-Aware |
| Adjustment |
| - MCP tool boost |
| - trace length |
+-------------------+
|
v
+-------------------+
| Normalization |
| - deduplication |
| - baseline filter |
+-------------------+
|
v
+-------------------+
| MSSS Scoring |
| - penalties |
| - multipliers |
+-------------------+
|
v
+-------------------+
| Reporter |
| - JSON/SARIF |
+-------------------+
5. Parallelism and Performance¶
5.1 Worker Pool¶
The scanner uses a configurable worker pool:
+------------------+
| Scanner |
| +-----------+ |
| | WorkerPool| |
| | (N workers)| |
| +-----------+ |
+------------------+
|
v
+-------+-------+-------+
| | | |
Worker Worker Worker Worker
| | | |
File1 File2 File3 File4
Configuration:
- --workers N: Number of workers (0 = auto-detect CPUs)
- Default: number of available CPUs
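The fan-out can be sketched with goroutines and channels. parseAll is a hypothetical helper, not the scanner's actual API; it mirrors the --workers 0 auto-detect behavior by falling back to the CPU count:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parseAll distributes files across N workers and collects results.
func parseAll(files []string, workers int, parse func(string) string) []string {
	if workers <= 0 {
		workers = runtime.NumCPU() // --workers 0: auto-detect CPUs
	}
	jobs := make(chan string)
	results := make(chan string, len(files))
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for f := range jobs {
				results <- parse(f)
			}
		}()
	}
	for _, f := range files {
		jobs <- f
	}
	close(jobs)
	wg.Wait()
	close(results)
	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	files := []string{"a.py", "b.ts", "c.go"}
	out := parseAll(files, 0, func(f string) string { return "parsed:" + f })
	fmt.Println(len(out))
}
```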
5.2 Parallelizable Phases¶
| Phase | Parallel | Notes |
|---|---|---|
| Discovery | No | I/O bound, sequential |
| Parsing | Yes | Per file |
| Surface | No | Needs all ASTs |
| Analysis | Yes | Per file and per engine |
| Normalization | No | Needs all findings |
| Scoring | No | Fast, not necessary |
| Reporting | No | I/O output |
5.3 Cancellation and Timeouts¶
The pipeline supports cancellation at any phase via context:
ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()
result, err := scanner.Scan(ctx, "/path/to/code")
Each phase checks ctx.Done() and terminates cleanly if cancelled.
6. Extensibility¶
6.1 Adding a New Detector¶
1. Implement the Detector interface
2. Register the detector in the Pattern Engine
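A sketch of what such a detector could look like. The Detector interface shown here is an assumption about internal/pattern's contract, and MCP-X999 is a made-up rule ID:

```go
package main

import (
	"fmt"
	"strings"
)

// Match and Detector approximate the pattern engine's contract; the
// real interface in internal/pattern may carry more context.
type Match struct {
	Line    int
	Snippet string
}

type Detector interface {
	ID() string
	Detect(source string) []Match
}

// evalDetector flags eval() calls, as a minimal example detector.
type evalDetector struct{}

func (evalDetector) ID() string { return "MCP-X999" } // hypothetical rule ID

func (evalDetector) Detect(source string) []Match {
	var out []Match
	for i, line := range strings.Split(source, "\n") {
		if strings.Contains(line, "eval(") {
			out = append(out, Match{Line: i + 1, Snippet: strings.TrimSpace(line)})
		}
	}
	return out
}

func main() {
	var d Detector = evalDetector{}
	fmt.Println(d.ID(), d.Detect("x = 1\ny = eval(user_input)"))
}
```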
6.2 Adding a New Source/Sink¶
1. Edit internal/catalog/catalog.go
2. Add the new source/sink definition
6.3 Adding a New Language¶
1. Add the tree-sitter grammar in internal/parser/
2. Implement the AST extractor
3. Add the language to types.Language
4. Update language-specific sources/sinks
7. Key Files¶
| Path | Description |
|---|---|
| pkg/scanner/scanner.go | Public API and orchestrator |
| internal/pattern/engine.go | Pattern engine |
| internal/taint/engine.go | Taint engine |
| internal/ml/classifier.go | ML classifier |
| internal/ml/features.go | Feature extraction |
| internal/llm/detector.go | LLM detector |
| internal/codeql/client.go | CodeQL client |
| internal/surface/surface.go | Surface extractor |
| internal/catalog/catalog.go | Sources/sinks catalog |
| internal/msss/msss.go | MSSS calculator |
| internal/types/types.go | Common types |
8. Complete Execution Flow¶
mcp-scan scan /path/to/code --mode fast --output json
|
v
+---------------+
| Load Config | <-- .mcp-scan.yaml or defaults
+---------------+
|
v
+---------------+
| Init Scanner | <-- Create all engines
+---------------+
|
v
+---------------+
| Discovery | <-- Find files
+---------------+
|
v
+---------------+
| Parse Files | <-- tree-sitter -> AST
| (parallel) |
+---------------+
|
v
+---------------+
| Extract | <-- Tools, Resources, Transport
| Surface |
+---------------+
|
v
+----------------+----------------+
| | |
v v v
+--------+ +---------+ +------+
|Pattern | | Taint | | ML |
|Engine | | Engine | | Clf |
+--------+ +---------+ +------+
| | |
+----------------+----------------+
|
v
+---------------+
| Merge & Dedup | <-- Unify findings
+---------------+
|
v
+---------------+
| Apply | <-- Filter findings in baseline
| Baseline |
+---------------+
|
v
+---------------+
| Calculate | <-- MSSS Score
| MSSS Score |
+---------------+
|
v
+---------------+
| Generate | <-- JSON output
| Report |
+---------------+
|
v
stdout/file
Next document: taint-analysis.md