Operations Guide¶
Comprehensive guide to all operational modes, internal processes, and system behavior.
Overview¶
MCP-Scan operates in two primary modes with various configuration options that affect internal processing:
| Mode | Description | Use Case |
|---|---|---|
| Fast | Intra-procedural analysis only | CI/CD, quick feedback |
| Deep | Full inter-procedural analysis | Security audits, certification |
Analysis Modes in Detail¶
Fast Mode¶
Fast mode performs analysis within individual functions without tracking data flow across function calls.
What happens internally:
1. DISCOVERY PHASE
- Scan directory for source files
- Apply include/exclude patterns
- Detect language from file extensions
2. PARSING PHASE (per file)
- Load file content
- Create tree-sitter parser for language
- Parse to tree-sitter AST
- Extract to normalized AST (functions, classes, imports)
3. SURFACE EXTRACTION (per file)
- Detect MCP SDK imports
- Find tool decorators (@tool, @server.tool)
- Extract tool parameters
- Detect transport type (stdio, http, websocket)
- Find auth patterns (cookies, headers, OAuth)
4. TAINT ANALYSIS (per function)
- Initialize taint state
- Mark tool parameters as tainted
- Process statements sequentially
- Track variable assignments
- Check for sinks (exec, eval, filesystem, etc.)
- Apply sanitizers when found
- Generate findings for taint→sink flows
5. PATTERN MATCHING (per file)
- Run all enabled detectors
- AST-based detectors scan function calls
- Regex detectors scan raw content
- ML classifier analyzes tool descriptions
- Generate findings for pattern matches
6. AGGREGATION
- Combine taint and pattern findings
- Deduplicate by location
- Assign deterministic IDs
7. OUTPUT
- Apply baseline filter (if configured)
- Calculate MSSS score
- Generate report (JSON/SARIF/Evidence)
Fast mode limitations:
- Cannot track data flow across function calls
- Cannot detect multi-step vulnerabilities
- Cannot resolve cross-file imports
- No function summary computation
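The aggregation and output phases above (steps 6–7) hinge on deduplication by location and deterministic finding IDs. A minimal sketch of that step, with hypothetical field names (`finding_id`, `dedupe_by_location` are illustrative, not MCP-Scan's API):

```python
import hashlib

def finding_id(rule_id: str, path: str, line: int) -> str:
    """Deterministic ID: identical input location yields the same ID every run."""
    return hashlib.sha256(f"{rule_id}:{path}:{line}".encode()).hexdigest()[:12]

def dedupe_by_location(findings: list) -> list:
    """Keep one finding per (rule, file, line), assigning IDs as in aggregation."""
    seen, out = set(), []
    for f in findings:
        key = (f["rule_id"], f["path"], f["line"])
        if key not in seen:
            seen.add(key)
            out.append({**f, "id": finding_id(*key)})
    return out
```

Deterministic IDs are what make baseline filtering (step 7) stable across repeated scans of unchanged code.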
Deep Mode¶
Deep mode enables full inter-procedural analysis with function summaries and cross-file tracking.
Additional steps in Deep mode:
1. TYPE INFERENCE (new in Deep mode)
- Analyze variable assignments
- Infer types from literals and constructors
- Track type through assignments
- Use type info for smarter taint propagation
2. IMPORT RESOLUTION (new in Deep mode)
- Parse import statements
- Resolve relative imports
- Build module→file mapping
- Index exported symbols
3. CALL GRAPH CONSTRUCTION (new in Deep mode)
- Index all functions (ID: filepath:funcname)
- Find all call sites
- Create edges (caller→callee)
- Mark MCP tool handlers
- Persist graph for incremental analysis
4. FUNCTION SUMMARY COMPUTATION
- Process functions in topological order (leaves first)
- For each function:
a. Taint each parameter individually
b. Run intra-procedural taint analysis
c. Record which parameters reach return
d. Record which parameters reach sinks
e. Store summary: {TaintedParams, ReturnsTaint, SinksReached}
5. CONTEXT-SENSITIVE ANALYSIS
- Start from MCP tool handlers (entry points)
- At each call site:
a. Check if caller has tainted args
b. Look up callee summary
c. If tainted arg index in TaintedParams → propagate
d. If callee has SinksReached → emit finding with trace
- Recurse up to max_depth (default: 10)
6. CROSS-FILE TAINT TRACKING
- Resolve called function to file
- Use function summary from that file
- Build cross-file trace if vulnerability found
Deep mode capabilities:
- Track data flow across function boundaries
- Detect multi-tool attack chains
- Identify privilege escalation patterns
- Find authentication bypass across modules
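The call-site check in steps 4–6 can be sketched with summaries as plain dictionaries. The function IDs, summary layout, and the `SUMMARIES` table below are illustrative, not MCP-Scan's actual representation:

```python
# Per-function summaries, mirroring FunctionSummary: which parameter indices
# propagate taint, whether the return is tainted, and which params reach sinks.
SUMMARIES = {
    "util.py:run_shell": {"tainted_params": [0], "returns_taint": False,
                          "sinks_reached": {0: "exec"}},
    "util.py:normalize": {"tainted_params": [0], "returns_taint": True,
                          "sinks_reached": {}},
}

def check_call(callee_id: str, tainted_arg_indices: list, max_depth: int = 10):
    """Emit a finding for each tainted argument that reaches a sink in the callee."""
    summary = SUMMARIES.get(callee_id)
    if summary is None or max_depth <= 0:
        return []
    findings = []
    for idx in tainted_arg_indices:
        if idx in summary["sinks_reached"]:
            findings.append({"callee": callee_id, "arg": idx,
                             "sink": summary["sinks_reached"][idx]})
    return findings
```

Because summaries are computed once per function (leaves first), each call site is a dictionary lookup rather than a re-analysis of the callee body.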
Internal Data Structures¶
TaintState¶
Tracks tainted variables during analysis:
type TaintState struct {
    Variables  map[string]*TaintInfo            // Variable taints
    Properties map[string]map[string]*TaintInfo // obj.prop taints
    Returns    *TaintInfo                       // Function return taint
    Parent     *TaintState                      // Scope chain
    Findings   []*Finding                       // Accumulated findings
}
TaintInfo¶
Information about a tainted value:
type TaintInfo struct {
    Source       Location       // Where taint originated
    SourceType   SourceCategory // tool_input, env_var, http_request
    Via          []TraceStep    // Propagation trace
    Confidence   Confidence     // High, Medium, Low
    TypeInfo     *TypeInfo      // Inferred type (Deep mode)
    SanitizedFor []SinkCategory // Sink categories already sanitized for
}
FunctionSummary¶
Summary for inter-procedural analysis:
type FunctionSummary struct {
    TaintedParams   []int          // Params that propagate taint
    ReturnsTaint    bool           // Return can be tainted
    TaintSources    []int          // Params that become sources
    SanitizedParams []int          // Params that are sanitized
    HasSink         bool           // Contains dangerous sink
    SinkTypes       []SinkCategory // Types of sinks present
}
Statement Processing¶
How different statement types are processed:
Assignment¶
Processing:
1. Evaluate the right-hand side for taint
2. If tainted, set taint on the left-hand-side variable
3. Add trace step: "assign to x"
Function Call¶
Processing:
1. Check if the function is a source → create new taint
2. Check if the function is a sanitizer → clear taint
3. Check if the function is a sink → emit a finding if an argument is tainted
4. (Deep mode) Look up the function summary → propagate accordingly
Binary Operations¶
Processing:
1. Evaluate both operands
2. If either is tainted, the result is tainted
3. Merge taints with a combined trace
Member Access¶
Processing:
1. Check if the object is tainted → propagate
2. Check property-specific taint
3. Check if this is a source pattern
Control Flow¶
Processing:
1. Fork the taint state for each branch
2. Process each branch independently
3. Merge states at the join point (conservative: union of taints)
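The fork/merge step for control flow can be sketched with taint states as plain dictionaries (variable name → source label). This is a simplification of `TaintState`; the real state also tracks traces and scope chains:

```python
def fork(state: dict) -> dict:
    """Give a branch its own copy so branch-local taint cannot leak sideways."""
    return dict(state)

def merge(*branch_states: dict) -> dict:
    """Conservative join: a variable is tainted if ANY branch tainted it (union)."""
    merged = {}
    for state in branch_states:
        for var, taint in state.items():
            merged.setdefault(var, taint)  # keep the first taint seen per variable
    return merged
```

The union is deliberately conservative: it may over-approximate (a variable tainted in only one branch stays tainted after the join), which trades false positives for never missing a flow.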
Source Detection¶
How sources are identified:
Tool Input Sources¶
MCP tool parameters are automatic sources:
Detection:
1. The surface extractor finds the @tool decorator
2. All function parameters are marked as tainted
3. SourceType is set to SourceToolInput
Environment Sources¶
Detection:
1. Match against catalog source patterns
2. Receiver: "os", Function: "environ"/"getenv"
3. Create taint with SourceEnvVar
HTTP Sources¶
Detection:
1. Match receiver pattern: "request"
2. Match property/method: "body", "args", "form"
3. Create taint with SourceHTTPRequest
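The environment and HTTP source checks share one mechanism: match a (receiver, name) pair against catalog entries. A minimal sketch, with a two-entry example catalog (the real catalog is larger and pattern-based):

```python
# Illustrative catalog rows: receiver + member names -> source category.
SOURCE_CATALOG = [
    {"receiver": "os", "names": {"environ", "getenv"}, "type": "env_var"},
    {"receiver": "request", "names": {"body", "args", "form"},
     "type": "http_request"},
]

def match_source(receiver: str, name: str):
    """Return the source category for receiver.name, or None if not a source."""
    for entry in SOURCE_CATALOG:
        if receiver == entry["receiver"] and name in entry["names"]:
            return entry["type"]
    return None
```

A match creates a new `TaintInfo` with the corresponding source type at the access location.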
Sink Detection¶
How dangerous operations are identified:
Command Execution Sinks¶
os.system(cmd) # SinkExec, Critical
subprocess.run(args) # SinkExec, Critical
exec(code) # SinkEval, Critical
eval(expr) # SinkEval, Critical
Detection:
1. Match against catalog sink patterns
2. Check the argument index (usually 0)
3. If the argument is tainted and not sanitized → Finding
Filesystem Sinks¶
Detection:
1. Match function name: "open"
2. Check the first argument (path)
3. If tainted → possible path traversal
Network Sinks¶
Detection:
1. Match receiver/function patterns
2. If the URL is tainted → possible SSRF
LLM Prompt Sinks¶
openai.ChatCompletion.create(messages=msgs) # SinkLLMPrompt
anthropic.messages.create(messages=msgs) # SinkLLMPrompt
Detection:
1. Match LLM API patterns
2. Check if the message/prompt argument is tainted
3. If tainted → possible prompt injection
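All of the sink checks above share one shape: look up the call in a sink catalog, inspect the argument at the sink's sensitive index, and emit a finding only if that argument is tainted and not sanitized for the sink's category. A minimal sketch with an illustrative two-entry catalog:

```python
# Illustrative catalog rows; real entries are pattern-based and more numerous.
SINK_CATALOG = {
    "os.system": {"category": "exec", "arg_index": 0, "severity": "critical"},
    "open":      {"category": "filesystem", "arg_index": 0, "severity": "high"},
}

def check_sink(func_name: str, args: list):
    """args: per-argument taint dicts (or None if untainted). Returns a finding or None."""
    sink = SINK_CATALOG.get(func_name)
    if sink is None:
        return None
    idx = sink["arg_index"]
    taint = args[idx] if idx < len(args) else None
    if taint is None or sink["category"] in taint.get("sanitized_for", ()):
        return None  # untainted, or already sanitized for this sink category
    return {"sink": func_name, "category": sink["category"],
            "severity": sink["severity"], "source": taint["source"]}
```

The `sanitized_for` check is what makes sanitizers category-specific: `shlex.quote` suppresses an exec finding but not, say, a response-injection finding.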
Sanitizer Recognition¶
How sanitization breaks taint:
Explicit Sanitizers¶
safe = shlex.quote(user_input) # Sanitizes for SinkExec
safe = html.escape(user_input) # Sanitizes for SinkResponse
safe = int(user_input) # Sanitizes for multiple sinks
Processing:
1. Match against catalog sanitizer patterns
2. Get the sanitized categories from the definition
3. Add those categories to the taint's SanitizedFor
4. If all sink categories are sanitized → clear the taint
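A sketch of that processing, where matching a sanitizer adds its categories to the taint's sanitized set and a fully covered taint is dropped. The catalog entries and category set are illustrative:

```python
ALL_SINK_CATEGORIES = {"exec", "eval", "filesystem", "network", "response"}

# Illustrative sanitizer catalog: function -> sink categories it neutralizes.
SANITIZER_CATALOG = {
    "shlex.quote": {"exec"},
    "html.escape": {"response"},
    "int": ALL_SINK_CATEGORIES,  # numeric coercion defeats string-based attacks
}

def apply_sanitizer(func_name: str, taint: dict):
    """Return the taint updated for func_name, or None if fully sanitized."""
    categories = SANITIZER_CATALOG.get(func_name)
    if categories is None:
        return taint  # not a sanitizer; taint passes through unchanged
    sanitized = set(taint.get("sanitized_for", set())) | categories
    if sanitized >= ALL_SINK_CATEGORIES:
        return None  # sanitized for every category -> taint cleared entirely
    return {**taint, "sanitized_for": sanitized}
```

Keeping partially sanitized taint alive (rather than clearing it on the first sanitizer) is what lets `shlex.quote`-ed input still trigger a path-traversal finding.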
Type-Based Sanitization (Deep Mode)¶
num = int(user_input) # Type: int, reduces RCE risk
count = len(user_input) # Type: int, not injectable
Processing (with type inference):
1. Infer the type of the result (int)
2. Reduce taint confidence for incompatible sinks
3. int/float/bool types reduce severity for string-based attacks
Pattern Detection¶
How pattern-based detection works:
AST-Based Detectors¶
Analyze parsed AST for suspicious patterns:
# DirectShellDetector
os.system("rm -rf " + path) # Detects string concat with command
# DangerousFunctionDetector
eval(code) # Detects dangerous function calls
# HardcodedSecretDetector
API_KEY = "sk-1234..." # Detects hardcoded secrets
Regex-Based Detectors¶
Scan raw source for patterns:
# Detects URLs in code
http://internal-service:8080/api
# Detects base64-encoded secrets
YWRtaW46cGFzc3dvcmQ=
# Detects SQL patterns
"SELECT * FROM users WHERE id = " + user_id
ML-Based Detectors¶
Classify text using machine learning:
Processing:
1. Extract 29 features from the description
2. Run through the classifier (rule-based, weighted, or ensemble)
3. If probability > threshold → emit a finding
Call Graph Operations¶
Building the Graph¶
1. Index all functions by ID (filepath:class.funcname)
2. For each function:
- Find all call expressions
- Resolve callee to function ID
- Create edge with call site location
3. Mark MCP tool handlers (IsTool = true)
Using the Graph¶
# Get all functions a tool handler can reach
reachable = graph.GetReachableFunctions("server.py:search", depth=5)

# Check if any reachable function has a dangerous sink
for fn in reachable:
    if fn.Summary.HasSink:
        ...  # Potential vulnerability path
Incremental Updates¶
1. Load cached graph
2. Compute file hashes
3. Compare with stored hashes
4. For changed files:
- Remove old nodes/edges
- Re-parse file
- Add new nodes/edges
5. Recompute affected summaries
6. Save updated graph
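The change-detection step (2–4) can be sketched with content hashes compared against the cached graph; only files whose hash differs, or which are new, get re-parsed. The cache layout here is illustrative:

```python
import hashlib

def file_hash(content: bytes) -> str:
    """Content hash stored alongside each file's nodes in the cached graph."""
    return hashlib.sha256(content).hexdigest()

def files_to_reanalyze(current: dict, cached_hashes: dict) -> list:
    """current: path -> file contents; cached_hashes: path -> stored hash."""
    changed = []
    for path, content in current.items():
        if cached_hashes.get(path) != file_hash(content):
            changed.append(path)  # new file, or content differs from the cache
    return sorted(changed)
```

Summaries then need recomputing not only for changed files but for their callers, since a callee's summary feeds the caller's analysis (step 5).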
ML Classification Pipeline¶
Feature Extraction (29 features)¶
| Category | Features |
|---|---|
| Length | length, word_count, avg_word_length, sentence_count |
| Character | uppercase_ratio, lowercase_ratio, digit_ratio, special_char_ratio, whitespace_ratio |
| Keywords | injection_keyword_count, command_keyword_count, role_keyword_count, exfiltration_keyword_count |
| Patterns | delimiter_count, base64_pattern_count, unicode_escape_count, question_count, exclamation_count, imperative_verb_count |
| Entropy | char_entropy |
| Positional | starts_with_imperative, ends_with_question, has_code_block, has_xml_tags |
| Complex | has_ignore_pattern, has_system_prompt, has_role_play, has_jailbreak, has_exfil_request |
Classification Process¶
1. Extract text from tool description
2. Compute all 29 features
3. Run classifier:
- RuleBasedClassifier: Weighted score from patterns
- WeightedClassifier: Dot product with trained weights
- EnsembleClassifier: Combine multiple classifiers
4. Compare probability to threshold (default: 0.3)
5. Assign category: jailbreak, identity_manipulation, instruction_override, etc.
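The WeightedClassifier step (a dot product with trained weights, squashed to a probability and compared to the threshold) can be sketched as follows. The feature names are taken from the table above, but the weight values are made up for illustration:

```python
import math

# Illustrative weights; real values come from training, not this table.
WEIGHTS = {"injection_keyword_count": 1.2, "has_ignore_pattern": 2.0,
           "char_entropy": 0.1, "bias": -2.5}

def classify(features: dict, threshold: float = 0.3):
    """Dot product of features with weights, logistic-squashed to [0, 1]."""
    score = WEIGHTS["bias"] + sum(
        WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    prob = 1.0 / (1.0 + math.exp(-score))
    return prob, prob > threshold
```

The default threshold of 0.3 is deliberately low: tool descriptions are short, so the classifier favors recall and relies on downstream triage for precision.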
Output Generation¶
JSON Output¶
{
"version": "1.0.0",
"scan_time": "2024-01-20T10:00:00Z",
"mode": "deep",
"findings": [
{
"id": "abc123...",
"rule_id": "MCP-A-001",
"class": "A",
"severity": "critical",
"confidence": "high",
"title": "Command Injection",
"description": "...",
"location": {...},
"evidence": {...},
"remediation": "..."
}
],
"msss": {
"score": 72,
"level": 2,
"compliant": true
}
}
SARIF Output¶
SARIF 2.1.0 format for tool integration:
{
"$schema": "https://raw.githubusercontent.com/oasis-tcs/sarif-spec/master/Schemata/sarif-schema-2.1.0.json",
"version": "2.1.0",
"runs": [{
"tool": {
"driver": {
"name": "mcp-scan",
"version": "1.0.0",
"rules": [...]
}
},
"results": [...]
}]
}
Evidence Bundle¶
Comprehensive package for audits:
evidence/
├── manifest.json # Scan metadata
├── findings.json # All findings
├── sarif.json # SARIF report
├── surface.json # Extracted MCP surface
├── callgraph.json # Call graph (Deep mode)
├── snippets/ # Code snippets per finding
│ ├── finding-001.py
│ └── ...
└── traces/ # Taint traces per finding
├── finding-001.json
└── ...
Performance Characteristics¶
Fast Mode¶
- Memory: O(file_size) - one file at a time
- Time: O(n * f) - n files, f = avg functions per file
- Parallel: Files processed in parallel (configurable workers)
Deep Mode¶
- Memory: O(total_functions) - call graph in memory
- Time: O(n * f * d) - d = call depth
- Cache: Call graph cached for incremental updates
Timeouts¶
| Timeout | Default | Description |
|---|---|---|
| Scan | 300s | Total scan timeout |
| File | 30s | Per-file parsing timeout |
| Analysis | 60s | Per-file analysis timeout |
Error Handling¶
Parse Errors¶
- Tree-sitter handles syntax errors gracefully
- Partial AST still extracted
- Warning logged, file included in results
- Finding may note "partial_parse"
Analysis Errors¶
- Timeout → file skipped, warning logged
- Memory limit → switch to fast mode for file
- Invalid pattern → rule disabled, warning logged
Recovery¶
- Errors isolated per file
- Scan continues with remaining files
- Errors reported in result metadata
Configuration Effects¶
Mode Selection¶
| Config | Effect |
|---|---|
| mode: fast | Skip type inference, imports, call graph |
| mode: deep | Enable all inter-procedural analysis |
Analysis Tuning¶
| Config | Effect |
|---|---|
| max_depth: N | Limit call graph traversal depth |
| timeout: N | Set analysis timeout |
| track_properties: bool | Enable/disable property tracking |
ML Detection¶
| Config | Effect |
|---|---|
| ml_detection.enabled: bool | Enable/disable ML classifier |
| ml_detection.threshold: N | Classification threshold (0-1) |
| ml_detection.classifier: type | rule_based, weighted, ensemble |
Output Control¶
| Config | Effect |
|---|---|
| include_trace: bool | Include propagation traces |
| include_snippet: bool | Include code snippets |
| format: type | json, sarif, evidence |
Related Documentation¶
- Architecture - System architecture overview
- Taint Analysis - Detailed taint engine documentation
- Pattern Engine - Rule-based detection details
- ML Classifier - Machine learning detection
- Call Graph - Inter-procedural analysis
- Type Inference - Type system details
- Import Resolver - Cross-file resolution
- Surface Extraction - MCP component detection