Analysis Pipeline: How Surface Connects to Scan

Technical document explaining the relationship between surface extraction and vulnerability scanning

1. Overview

The surface and scan commands are not independent: surface runs the first three stages of the scan pipeline (discovery, parsing, surface extraction) and stops there. Understanding this relationship is key to using mcp-scan effectively.

1.1 Visual Relationship

scan = [discovery] → [parsing] → [SURFACE] → [analysis] → [findings]
                                    ↑
                              surface command
                           (extracts only this part)

1.2 Quick Comparison

Command	What it does	Output
`mcp-scan surface <path>`	Extracts MCP metadata only	Tools, transport, auth signals
`mcp-scan scan <path>`	Full pipeline including surface	Vulnerabilities + surface + score

2. The Complete Pipeline

2.1 Pipeline Stages

┌──────────────────────────────────────────────────────────────────┐
│ STAGE 1: DISCOVERY                                                │
│ Find all source files matching include/exclude patterns          │
│ Output: List of files to analyze                                  │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 2: PARSING (Parallel)                                       │
│ Convert source code to normalized AST using tree-sitter          │
│ Output: List of *ast.File structures                              │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 3: SURFACE EXTRACTION  ← This is what `surface` command    │
│ Identify MCP-specific elements:                                   │
│ - Tools (functions LLMs can call)                                │
│ - Resources (data exposed to LLMs)                               │
│ - Transport (stdio, HTTP, WebSocket)                             │
│ - Auth signals (JWT, cookies, OAuth)                             │
│ Output: MCPSurface struct (immutable, used by all analyzers)     │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 4: ANALYSIS (Parallel)                                      │
│ Multiple engines run using surface as context:                   │
│                                                                   │
│   ┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐  │
│   │ Pattern Engine  │  │ Taint Engine    │  │ ML Classifier   │  │
│   │ - 30+ rules     │  │ - Data flow     │  │ - Tool poison   │  │
│   │ - Uses surface  │  │ - Uses surface  │  │ - Uses surface  │  │
│   │   for context   │  │   for sources   │  │   for desc.     │  │
│   └─────────────────┘  └─────────────────┘  └─────────────────┘  │
│                                                                   │
│   Optional:                                                       │
│   ┌─────────────────┐  ┌─────────────────┐                       │
│   │ LLM Detector    │  │ CodeQL          │                       │
│   │ - Semantic      │  │ - Deep analysis │                       │
│   └─────────────────┘  └─────────────────┘                       │
│                                                                   │
│ Output: List of matches/findings from all engines                │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 5: NORMALIZATION                                            │
│ - Merge findings from all engines                                │
│ - Deduplicate by unique ID                                       │
│ - Apply baseline filters                                         │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 6: SCORING (MSSS)                                           │
│ - Calculate compliance score (0-100)                             │
│ - Apply severity adjustments                                     │
└───────────────────────────────┬──────────────────────────────────┘
                                │
                                ▼
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 7: REPORTING                                                │
│ Generate output in requested format (JSON, SARIF, Evidence)      │
└──────────────────────────────────────────────────────────────────┘
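The staging above implies that surface is a prefix of scan rather than a separate code path. The following is a minimal Python model of that relationship, not the tool's actual Go implementation: every function and type name here is hypothetical, the stage bodies are placeholders, and scoring and reporting (stages 6 and 7) are left out.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Surface:
    """Toy stand-in for MCPSurface; frozen to mirror 'immutable'."""
    tools: tuple[str, ...] = ()
    transport: str = "unknown"


def discover(root: str) -> list[str]:
    # Placeholder for stage 1: a real walker would apply include/exclude patterns.
    return [f"{root}/server.py"]


def parse(files: list[str]) -> list[dict]:
    # Placeholder for stage 2: one fake "AST" dict per file.
    return [{"file": f, "tool_names": ["execute_command"]} for f in files]


def extract_surface(asts: list[dict]) -> Surface:
    # Stage 3: collect MCP-specific elements from the parsed files.
    names = tuple(name for tree in asts for name in tree["tool_names"])
    return Surface(tools=names, transport="stdio")


def run_surface(root: str) -> Surface:
    """Stages 1-3 only: the part the `surface` command covers."""
    return extract_surface(parse(discover(root)))


Engine = Callable[[list[dict], Surface], list[dict]]


def run_scan(root: str, engines: list[Engine]) -> dict:
    """Stages 1-5: the same first three stages, then analysis and normalization."""
    asts = parse(discover(root))
    surface = extract_surface(asts)
    # Stage 4: every engine receives the same surface object.
    findings = [f for engine in engines for f in engine(asts, surface)]
    # Stage 5: keep one finding per unique ID.
    unique = list({f["id"]: f for f in findings}.values())
    return {"surface": surface, "findings": unique}
```

In this model, `run_scan(root, [])["surface"]` equals `run_surface(root)`, which is the property the diagram describes: whatever surface reports is exactly what the analyzers see during a scan.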

3. How Surface is Used in Analysis

The MCPSurface struct is passed to ALL analysis engines. Each engine uses it differently:

3.1 Taint Engine: Identifying Sources

Critical: The parameters of every tool handler are automatically marked as taint sources.

// In internal/taint/engine.go
func (e *Engine) isToolHandler(fn ast.Function, surf *surface.MCPSurface) bool {
    // Check decorators
    for _, dec := range fn.Decorators {
        if strings.Contains(strings.ToLower(dec.Name), "tool") {
            return true
        }
    }

    // Check against surface
    if surf != nil {
        for _, tool := range surf.Tools {
            if tool.Handler != nil && tool.Handler.FunctionName == fn.Name {
                return true
            }
        }
    }
    return false
}

// When analyzing a function (the call below is shown for context)
isTool := e.isToolHandler(fn, surf)
if isTool {
    // ALL parameters are marked as TAINTED (HIGH confidence)
    for _, param := range fn.Parameters {
        state.SetTaint(param.Name, &TaintInfo{
            Source:     fn.Location,
            SourceType: types.SourceToolInput,
            Confidence: types.ConfidenceHigh,
        })
    }
}

Why this matters: Tool parameters are supplied by the LLM, which may be relaying attacker-controlled text from a user or from another tool's output, so they are untrusted by definition. This is the core of MCP's threat model.

3.2 Pattern Engine: Context-Aware Detection

Some detectors analyze surface elements directly:

// PromptInjectionDetector checks tool descriptions
func (d *PromptInjectionDetector) Detect(file *ast.File, surf *surface.MCPSurface) []Match {
    var matches []Match
    if surf != nil {
        for _, tool := range surf.Tools {
            // Check description for injection markers
            if containsInjectionMarker(tool.Description) {
                matches = append(matches, Match{
                    Location: tool.Location,
                    Context:  "Tool: " + tool.Name,
                    Snippet:  tool.Description,
                })
            }
        }
    }
    return matches
}

// ToolShadowingDetector checks tool names
func (d *ToolShadowingDetector) Detect(file *ast.File, surf *surface.MCPSurface) []Match {
    dangerousNames := []string{"exec", "shell", "bash", "sudo", "rm", "eval"}

    var matches []Match
    if surf != nil {
        for _, tool := range surf.Tools {
            for _, dangerous := range dangerousNames {
                if strings.Contains(strings.ToLower(tool.Name), dangerous) {
                    // Tool name shadows a system command: record it
                    // (Match fields as in PromptInjectionDetector above)
                    matches = append(matches, Match{
                        Location: tool.Location,
                        Context:  "Tool: " + tool.Name,
                    })
                }
            }
        }
    }
    return matches
}
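One consequence of the substring check in the shadowing detector is worth seeing concretely: a short entry such as "rm" also matches tool names like format_text or confirm_order. The Python sketch below compares substring matching with token-based matching as one possible refinement. It is an illustration only, not the detector's actual behavior, and the function names are hypothetical.

```python
import re

DANGEROUS_NAMES = {"exec", "shell", "bash", "sudo", "rm", "eval"}


def name_tokens(tool_name: str) -> list[str]:
    """Split snake_case, kebab-case, and camelCase names into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", tool_name)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def substring_hits(tool_name: str) -> set[str]:
    """Mirror of the substring check: any dangerous name found anywhere in the name."""
    lowered = tool_name.lower()
    return {d for d in DANGEROUS_NAMES if d in lowered}


def token_hits(tool_name: str) -> set[str]:
    """Stricter variant: a dangerous name must appear as a whole token."""
    return DANGEROUS_NAMES.intersection(name_tokens(tool_name))
```

The trade-off runs both ways. Token matching drops the false positive on format_text, but it also stops flagging execute_command, because the token "execute" is not equal to "exec". Which behavior is preferable depends on whether the rule is meant to be a broad heuristic or a precise signal.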

3.3 ML Classifier: Description Analysis

// In ML detector
if surf != nil {
    for _, tool := range surf.Tools {
        result := classifier.Classify(tool.Description)
        if result.IsInjection {
            // Report tool poisoning attempt
        }
    }
}

3.4 Reporter: MCP Context in Findings

{
  "finding": {
    "rule_id": "MCP-A003",
    "mcp_context": {
      "tool_name": "execute_command",
      "handler_name": "handle_execute",
      "transport": "stdio",
      "is_tool_handler": true
    }
  }
}

4. When to Use Each Command

4.1 Use `surface` when:

Debugging/Understanding: "What tools does mcp-scan see in my code?"
Quick Preview: Before running a full scan, verify tool detection
Documentation: Generate inventory of MCP surface for docs
CI Validation: Verify expected tools are present

# Preview what the scanner sees
mcp-scan surface ./my-server

# JSON output for processing
mcp-scan surface ./my-server --output json > surface.json

# Verify specific tool is detected
mcp-scan surface ./my-server | grep "execute_command"
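For CI validation, grep matches the string anywhere in the output, including descriptions. Checking tool names in the JSON output is more precise. The sketch below assumes the JSON shape shown in section 5.2 (a top-level "tools" array whose entries have a "name" field); the helper names are hypothetical.

```python
import json


def missing_tools(surface: dict, expected: set[str]) -> set[str]:
    """Return the expected tool names that are absent from a surface document."""
    found = {tool.get("name") for tool in surface.get("tools", [])}
    return expected - found


def check_surface_file(path: str, expected: set[str]) -> int:
    """Return a process-style exit code: 0 if all tools are present, 1 otherwise."""
    with open(path, encoding="utf-8") as fh:
        surface = json.load(fh)
    missing = missing_tools(surface, expected)
    for name in sorted(missing):
        print(f"missing tool: {name}")
    return 1 if missing else 0
```

A CI step could call `check_surface_file("surface.json", {"execute_command"})` after writing surface.json with the command above and pass the result to `sys.exit`.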

4.2 Use `scan` when:

Finding Vulnerabilities: The actual security analysis
CI/CD Integration: Automated security checks
Audits: Comprehensive security assessment
Compliance: MSSS scoring and certification

# Full security scan
mcp-scan scan ./my-server

# CI pipeline
mcp-scan scan ./my-server --fail-on high --output sarif

5. Data Flow Example

5.1 Source Code

# server.py
from mcp import Server
from mcp.server.stdio import stdio_server

server = Server("my-server")

@server.tool()
def execute_command(cmd: str):
    """Execute a system command."""
    import subprocess
    return subprocess.run(cmd, shell=True)  # VULNERABILITY!

5.2 Surface Extraction Output

{
  "tools": [
    {
      "name": "execute_command",
      "description": "Execute a system command.",
      "handler": {
        "function_name": "execute_command",
        "file": "server.py",
        "line": 8
      }
    }
  ],
  "transport": "stdio"
}
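To make the mapping from source to surface concrete, here is a rough sketch that derives the "tools" entry above from the code in 5.1. It uses Python's standard ast module as a stand-in for tree-sitter and a simplified decorator rule (any decorator named tool), so it shows the idea rather than mcp-scan's extractor. Line numbers count the `# server.py` comment as line 1.

```python
import ast

SOURCE = '''# server.py
from mcp import Server
from mcp.server.stdio import stdio_server

server = Server("my-server")

@server.tool()
def execute_command(cmd: str):
    """Execute a system command."""
    import subprocess
    return subprocess.run(cmd, shell=True)  # VULNERABILITY!
'''


def is_tool_decorator(node: ast.expr) -> bool:
    """True for decorators such as @server.tool() or @tool (simplified rule)."""
    target = node.func if isinstance(node, ast.Call) else node
    if isinstance(target, ast.Attribute):
        return target.attr == "tool"
    return isinstance(target, ast.Name) and target.id == "tool"


def extract_tools(source: str, filename: str) -> list[dict]:
    """Collect decorated functions as tool entries; the source is parsed, never executed."""
    tools = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and any(
            is_tool_decorator(d) for d in node.decorator_list
        ):
            tools.append({
                "name": node.name,
                "description": ast.get_docstring(node) or "",
                "handler": {
                    "function_name": node.name,
                    "file": filename,
                    "line": node.lineno,  # line of the `def`, not the decorator
                },
            })
    return tools
```

Calling `extract_tools(SOURCE, "server.py")` yields one entry whose name, description, and handler fields line up with the JSON above. Transport detection is omitted from this sketch.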

5.3 How Analysis Uses Surface

1. The taint engine sees that execute_command is in surface.Tools.
2. Therefore, the cmd parameter is marked as tainted (SourceToolInput).
3. Taint propagates to subprocess.run(cmd, shell=True).
4. subprocess.run is a sink (SinkExec).
5. A finding is generated: MCP-A003 (RCE via tool input).
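The chain above can be approximated with a toy check. The sketch below is a deliberately small model, not the real taint engine: it treats every parameter of a listed tool handler as tainted, follows simple top-level assignments, and flags calls to an assumed sink list. The sample source adds an intermediate variable to show propagation, and all names are hypothetical.

```python
import ast

SOURCE = '''def execute_command(cmd: str):
    import subprocess
    shell_cmd = "sh -c " + cmd
    return subprocess.run(shell_cmd, shell=True)
'''

SINKS = {("subprocess", "run")}  # assumed sink list for this sketch


def names_in(node: ast.AST) -> set[str]:
    """All bare identifiers referenced inside an expression."""
    return {n.id for n in ast.walk(node) if isinstance(n, ast.Name)}


def find_tainted_sinks(source: str, tool_names: set[str]) -> list[tuple[str, int]]:
    """Return (function name, line) for sink calls whose first argument is tainted."""
    hits = []
    for fn in ast.walk(ast.parse(source)):
        if not isinstance(fn, ast.FunctionDef) or fn.name not in tool_names:
            continue
        # Steps 1-2: the handler is in the surface, so its parameters are tainted.
        tainted = {a.arg for a in fn.args.args}
        for stmt in fn.body:  # top-level statements only, in source order
            # Steps 3-4: look for sink calls that receive tainted data.
            for call in (n for n in ast.walk(stmt) if isinstance(n, ast.Call)):
                f = call.func
                is_sink = (
                    isinstance(f, ast.Attribute)
                    and isinstance(f.value, ast.Name)
                    and (f.value.id, f.attr) in SINKS
                )
                if is_sink and call.args and names_in(call.args[0]) & tainted:
                    hits.append((fn.name, call.lineno))  # step 5: report
            # Propagate taint through plain assignments such as `x = f(tainted)`.
            if isinstance(stmt, ast.Assign) and names_in(stmt.value) & tainted:
                tainted |= {t.id for t in stmt.targets if isinstance(t, ast.Name)}
    return hits
```

The second argument plays the role of the surface. With `{"execute_command"}` the call on line 4 of the sample is reported; with an empty set nothing is, because no parameter is ever treated as a source. Assignments nested inside branches or loops are not tracked in this sketch.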

5.4 Final Finding

{
  "rule_id": "MCP-A003",
  "title": "Remote Code Execution via Tool Input",
  "severity": "critical",
  "confidence": "high",
  "location": {
    "file": "server.py",
    "line": 11
  },
  "mcp_context": {
    "tool_name": "execute_command",
    "is_tool_handler": true
  },
  "trace": [
    {"type": "source", "location": "server.py:8", "variable": "cmd"},
    {"type": "sink", "location": "server.py:11", "function": "subprocess.run"}
  ]
}
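A consumer of findings shaped like the one above might gate a build on severity, similar in spirit to the --fail-on flag shown in section 4.2. The sketch below is an assumption-laden illustration: the source shows only "critical" as a severity value and "high" as a threshold, so the full ordering here is a guess, and both helper names are hypothetical.

```python
# Assumed ordering; only "critical" and "high" appear in this document.
SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def should_fail(findings: list[dict], threshold: str) -> bool:
    """True if any finding is at or above the threshold severity."""
    limit = SEVERITY_RANK[threshold]
    return any(
        SEVERITY_RANK.get(f.get("severity", ""), 0) >= limit for f in findings
    )


def trace_summary(finding: dict) -> str:
    """One-line summary of a finding's source-to-sink trace."""
    steps = [f'{s["type"]}@{s["location"]}' for s in finding.get("trace", [])]
    return f'{finding["rule_id"]} [{finding["severity"]}] ' + " -> ".join(steps)
```

Unknown severity strings rank as 0 here, so they never trip the threshold. A stricter consumer might prefer to raise an error on them instead.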

6. Architecture Diagram

                    ┌─────────────┐
                    │ Source Code │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │  Discovery  │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │   Parsing   │
                    │(tree-sitter)│
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
        ┌──────────►│   Surface   │◄─────────────┐
        │           │ Extraction  │              │
        │           └──────┬──────┘              │
        │                  │                     │
        │           MCPSurface                   │
        │           (immutable)                  │
        │                  │                     │
        │    ┌────────┬────┴────┬─────────┐     │
        │    │        │         │         │     │
        │    ▼        ▼         ▼         ▼     │
        │ Pattern  Taint      ML       LLM      │
        │ Engine   Engine   Classif  Detector   │
        │    │        │         │         │     │
        │    └────────┴────┬────┴─────────┘     │
        │                  │                     │
        │           ┌──────▼──────┐              │
        │           │   Merge &   │              │
        │           │   Dedupe    │              │
   `surface`        └──────┬──────┘        `scan`
    command                │               command
   (stops here)     ┌──────▼──────┐      (full pipeline)
                    │   MSSS      │
                    │   Scoring   │
                    └──────┬──────┘
                           │
                    ┌──────▼──────┐
                    │   Report    │
                    │  Generator  │
                    └─────────────┘

7. Summary

Aspect	`surface`	`scan`
Stages	Discovery → Parse → Surface	Discovery → Parse → Surface → Analysis → Normalize → Score → Report
Output	MCPSurface (tools, transport, auth)	Findings + Surface + MSSS Score
Use Case	Debug, preview, inventory	Security analysis
Runtime	Fast (no analysis)	Depends on mode (fast/deep)
Dependencies	None	Catalog, rules, engines

Related documents:
- MCP Surface Detection: detailed surface extraction
- Taint Analysis: how the taint engine uses surface
- Vulnerability Classes: rule definitions