Analysis Pipeline: How Surface Connects to Scan¶
Technical document explaining the relationship between surface extraction and vulnerability scanning
1. Overview¶
The surface command and scan command are not independent - surface is a subset of scan. Understanding this relationship is key to effective use of mcp-scan.
1.1 Visual Relationship¶
scan = [discovery] → [parsing] → [SURFACE] → [analysis] → [findings]
↑
surface command
(extracts only this part)
1.2 Quick Comparison¶
| Command | What it does | Output |
|---|---|---|
mcp-scan surface <path> |
Extracts MCP metadata only | Tools, transport, auth signals |
mcp-scan scan <path> |
Full pipeline including surface | Vulnerabilities + surface + score |
2. The Complete Pipeline¶
2.1 Pipeline Stages¶
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 1: DISCOVERY │
│ Find all source files matching include/exclude patterns │
│ Output: List of files to analyze │
└───────────────────────────────┬──────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 2: PARSING (Parallel) │
│ Convert source code to normalized AST using tree-sitter │
│ Output: List of *ast.File structures │
└───────────────────────────────┬──────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 3: SURFACE EXTRACTION ← This is what `surface` command │
│ Identify MCP-specific elements: │
│ - Tools (functions LLMs can call) │
│ - Resources (data exposed to LLMs) │
│ - Transport (stdio, HTTP, WebSocket) │
│ - Auth signals (JWT, cookies, OAuth) │
│ Output: MCPSurface struct (immutable, used by all analyzers) │
└───────────────────────────────┬──────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 4: ANALYSIS (Parallel) │
│ Multiple engines run using surface as context: │
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Pattern Engine │ │ Taint Engine │ │ ML Classifier │ │
│ │ - 30+ rules │ │ - Data flow │ │ - Tool poison │ │
│ │ - Uses surface │ │ - Uses surface │ │ - Uses surface │ │
│ │ for context │ │ for sources │ │ for desc. │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │
│ Optional: │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ LLM Detector │ │ CodeQL │ │
│ │ - Semantic │ │ - Deep analysis │ │
│ └─────────────────┘ └─────────────────┘ │
│ │
│ Output: List of matches/findings from all engines │
└───────────────────────────────┬──────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 5: NORMALIZATION │
│ - Merge findings from all engines │
│ - Deduplicate by unique ID │
│ - Apply baseline filters │
└───────────────────────────────┬──────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 6: SCORING (MSSS) │
│ - Calculate compliance score (0-100) │
│ - Apply severity adjustments │
└───────────────────────────────┬──────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────────┐
│ STAGE 7: REPORTING │
│ Generate output in requested format (JSON, SARIF, Evidence) │
└──────────────────────────────────────────────────────────────────┘
3. How Surface is Used in Analysis¶
The MCPSurface struct is passed to ALL analysis engines. Each engine uses it differently:
3.1 Taint Engine: Identifying Sources¶
Critical: Tool handlers are automatically marked as taint sources.
// In internal/taint/engine.go
func (e *Engine) isToolHandler(fn ast.Function, surf *surface.MCPSurface) bool {
// Check decorators
for _, dec := range fn.Decorators {
if strings.Contains(strings.ToLower(dec.Name), "tool") {
return true
}
}
// Check against surface
if surf != nil {
for _, tool := range surf.Tools {
if tool.Handler != nil && tool.Handler.FunctionName == fn.Name {
return true
}
}
}
return false
}
// When analyzing a function
if isTool {
// ALL parameters are marked as TAINTED (HIGH confidence)
for _, param := range fn.Parameters {
state.SetTaint(param.Name, &TaintInfo{
Source: fn.Location,
SourceType: types.SourceToolInput,
Confidence: types.ConfidenceHigh,
})
}
}
Why this matters: Tool parameters come from LLM user input - they're untrusted by definition. This is MCP's unique threat model.
3.2 Pattern Engine: Context-Aware Detection¶
Some detectors analyze surface elements directly:
// PromptInjectionDetector checks tool descriptions
func (d *PromptInjectionDetector) Detect(file *ast.File, surf *surface.MCPSurface) []Match {
if surf != nil {
for _, tool := range surf.Tools {
// Check description for injection markers
if containsInjectionMarker(tool.Description) {
matches = append(matches, Match{
Location: tool.Location,
Context: "Tool: " + tool.Name,
Snippet: tool.Description,
})
}
}
}
return matches
}
// ToolShadowingDetector checks tool names
func (d *ToolShadowingDetector) Detect(file *ast.File, surf *surface.MCPSurface) []Match {
dangerousNames := []string{"exec", "shell", "bash", "sudo", "rm", "eval"}
if surf != nil {
for _, tool := range surf.Tools {
for _, dangerous := range dangerousNames {
if strings.Contains(strings.ToLower(tool.Name), dangerous) {
// Tool name shadows system command!
}
}
}
}
}
3.3 ML Classifier: Description Analysis¶
// In ML detector
if surf != nil {
for _, tool := range surf.Tools {
result := classifier.Classify(tool.Description)
if result.IsInjection {
// Report tool poisoning attempt
}
}
}
3.4 Reporter: MCP Context in Findings¶
{
"finding": {
"rule_id": "MCP-A003",
"mcp_context": {
"tool_name": "execute_command",
"handler_name": "handle_execute",
"transport": "stdio",
"is_tool_handler": true
}
}
}
4. When to Use Each Command¶
4.1 Use surface when:¶
- Debugging/Understanding: "What tools does mcp-scan see in my code?"
- Quick Preview: Before running a full scan, verify tool detection
- Documentation: Generate inventory of MCP surface for docs
- CI Validation: Verify expected tools are present
# Preview what the scanner sees
mcp-scan surface ./my-server
# JSON output for processing
mcp-scan surface ./my-server --output json > surface.json
# Verify specific tool is detected
mcp-scan surface ./my-server | grep "execute_command"
4.2 Use scan when:¶
- Finding Vulnerabilities: The actual security analysis
- CI/CD Integration: Automated security checks
- Audits: Comprehensive security assessment
- Compliance: MSSS scoring and certification
# Full security scan
mcp-scan scan ./my-server
# CI pipeline
mcp-scan scan ./my-server --fail-on high --output sarif
5. Data Flow Example¶
5.1 Source Code¶
# server.py
from mcp import Server
from mcp.server.stdio import stdio_server
server = Server("my-server")
@server.tool()
def execute_command(cmd: str):
"""Execute a system command."""
import subprocess
return subprocess.run(cmd, shell=True) # VULNERABILITY!
5.2 Surface Extraction Output¶
{
"tools": [
{
"name": "execute_command",
"description": "Execute a system command.",
"handler": {
"function_name": "execute_command",
"file": "server.py",
"line": 8
}
}
],
"transport": "stdio"
}
5.3 How Analysis Uses Surface¶
- Taint Engine sees
execute_commandis insurface.Tools - Therefore,
cmdparameter is marked as tainted (SourceToolInput) - Taint propagates to
subprocess.run(cmd, shell=True) subprocess.runis a sink (SinkExec)- Finding generated: MCP-A003 (RCE via tool input)
5.4 Final Finding¶
{
"rule_id": "MCP-A003",
"title": "Remote Code Execution via Tool Input",
"severity": "critical",
"confidence": "high",
"location": {
"file": "server.py",
"line": 10
},
"mcp_context": {
"tool_name": "execute_command",
"is_tool_handler": true
},
"trace": [
{"type": "source", "location": "server.py:8", "variable": "cmd"},
{"type": "sink", "location": "server.py:10", "function": "subprocess.run"}
]
}
6. Architecture Diagram¶
┌─────────────┐
│ Source Code │
└──────┬──────┘
│
┌──────▼──────┐
│ Discovery │
└──────┬──────┘
│
┌──────▼──────┐
│ Parsing │
│(tree-sitter)│
└──────┬──────┘
│
┌──────▼──────┐
┌──────────►│ Surface │◄─────────────┐
│ │ Extraction │ │
│ └──────┬──────┘ │
│ │ │
│ MCPSurface │
│ (immutable) │
│ │ │
│ ┌────────┬────┴────┬─────────┐ │
│ │ │ │ │ │
│ ▼ ▼ ▼ ▼ │
│ Pattern Taint ML LLM │
│ Engine Engine Classif Detector │
│ │ │ │ │ │
│ └────────┴────┬────┴─────────┘ │
│ │ │
│ ┌──────▼──────┐ │
│ │ Merge & │ │
│ │ Dedupe │ │
`surface` └──────┬──────┘ `scan`
command │ command
(stops here) ┌──────▼──────┐ (full pipeline)
│ MSSS │
│ Scoring │
└──────┬──────┘
│
┌──────▼──────┐
│ Report │
│ Generator │
└─────────────┘
7. Summary¶
| Aspect | surface |
scan |
|---|---|---|
| Stages | Discovery → Parse → Surface | Discovery → Parse → Surface → Analysis → Score → Report |
| Output | MCPSurface (tools, transport, auth) | Findings + Surface + MSSS Score |
| Use Case | Debug, preview, inventory | Security analysis |
| Runtime | Fast (no analysis) | Depends on mode (fast/deep) |
| Dependencies | None | Catalog, rules, engines |
Related documents: - MCP Surface Detection - Detailed surface extraction - Taint Analysis - How taint engine uses surface - Vulnerability Classes - Rule definitions