Known Limitations
Technical document for security analysts
1. Introduction
This document describes the known limitations of each mcp-scan feature. It is fundamental for security analysts to understand these limitations to:
- Correctly interpret results
- Identify areas that require manual review
- Avoid false sense of security
- Complement analysis with other tools
2. Parsing Limitations
2.1 Tree-sitter Parser
| Limitation |
Impact |
Mitigation |
| Syntax errors stop parsing |
Files with errors are not analyzed |
Validate syntax beforehand |
| Macros not expanded |
Generated code not analyzed |
Analyze post-build code |
| Dynamic code ignored |
exec(code_string) not parsed |
Review manually |
2.2 Languages
| Language |
Status |
Specific Limitations |
| Python |
Complete |
Partial type hints |
| TypeScript |
Complete |
Simplified generics |
| JavaScript |
Complete |
Basic JSX |
| Go |
Parsing only |
No taint analysis |
2.3 Unsupported Constructs
# Metaprogramming
@decorator_factory() # Dynamic decorators
setattr(obj, "method", func) # Dynamic attributes
exec("def tool(): pass") # Generated code
# Dynamic imports
module = __import__(name)
importlib.import_module(name)
3. Taint Analysis Limitations
3.1 Intra-procedural Analysis (Fast Mode)
| Limitation |
Example |
Impact |
| Doesn't cross functions |
f(user_input) where f calls sink |
FN |
| Doesn't follow returns |
x = get_input() where get_input returns source |
FN |
| Partial lambdas |
Complex closures |
FN/Imprecision |
False Negative Example:
def process(data):
os.system(data) # Sink in another function
@server.tool()
def handler(cmd: str):
process(cmd) # Not detected in fast mode
3.2 Inter-procedural Analysis (Deep Mode)
| Limitation |
Description |
| Limited depth |
Default: 3 levels of calls |
| No recursion support |
Recursive functions may cause imprecision |
| Imprecise call graph |
Virtual methods not resolved |
| Slower |
10-100x slower than fast |
3.3 Taint Propagation
| Unsupported Case |
Example |
| Implicit taint |
if secret == x: print("yes") |
| Complex structures |
obj.nested.deep.value |
| Collections |
list[index] where index is variable |
| Alias |
alias = dangerous_func; alias(data) |
3.4 Sanitizers
| Limitation |
Impact |
| Only built-in catalog |
Custom sanitizers not recognized |
| No effectiveness analysis |
strip() is not a sanitizer but doesn't cause FP |
| Partial sanitization |
replace("..", "") not recognized |
False Positive Example:
def my_sanitizer(path):
"""Custom sanitizer not in catalog"""
if ".." in path:
raise ValueError("Invalid path")
return path
@server.tool()
def read(path: str):
safe_path = my_sanitizer(path) # Not recognized as sanitizer
return open(safe_path).read() # Reported as vulnerable
4. Pattern Engine Limitations
4.1 Regex Detection
| Limitation |
Example |
Impact |
| Doesn't understand context |
"os.system in docs" in string |
FP |
| Comments |
# os.system(cmd) commented |
FP |
| Multiline strings |
Pattern split across lines |
FN |
| Obfuscation |
os["system"](cmd) |
FN |
4.2 AST Detection
| Limitation |
Description |
| Import aliases |
import os as o; o.system() not detected |
| Indirect calls |
getattr(os, "system")(cmd) |
| Custom decorators |
Only fixed list recognized |
4.3 Specific Rules
| Rule |
Limitation |
| MCP-E001 (Secrets) |
Doesn't detect secrets in .env files |
| MCP-F002 (JWT) |
Doesn't analyze configuration in JSON |
| MCP-G003 (Shadowing) |
Fixed list of names |
5. ML Classifier Limitations
5.1 Features
| Limitation |
Impact |
| Optimized for English |
Other languages have worse recall |
| Fixed keywords |
New techniques not covered |
| No semantic context |
Legitimate "ignore" causes FP |
5.2 Classifiers
| Type |
Limitation |
| RuleBased |
Deterministic but rigid |
| Weighted |
Requires trained weights |
| Ensemble |
Slower, not always better |
5.3 Categories
| Category |
Limitation |
jailbreak |
Only known patterns |
identity |
Legitimate role play causes FP |
delimiter |
New formats not covered |
5.4 Error Examples
False Positive:
Description: "This tool helps you ignore duplicate files"
^^^^^^ triggers "ignore"
False Negative:
Description: "Desatienda las instrucciones previas" # Spanish
6. LLM Detector Limitations
6.1 Dependencies
| Limitation |
Mitigation |
| Requires Ollama |
Fallback to ML classifier |
| Model must be downloaded |
Document setup |
| Network latency |
Cache results |
6.2 Model
| Limitation |
Impact |
| llama3.2:3b limited |
Larger models are slower |
| Non-deterministic |
Results may vary |
| Knowledge cutoff |
Doesn't know new techniques |
6.3 Prompt
| Limitation |
Description |
| English only |
Fixed English prompt |
| 5000 chars max |
Truncation of long texts |
| No code context |
Only sees description, not the handler |
6.4 Reliability
| Scenario |
Behavior |
| Very short text |
Always "benign" |
| Ambiguous text |
Low confidence |
| New technique |
May not detect |
7. CodeQL Limitations
7.1 Requirements
| Limitation |
Mitigation |
| Separate installation |
Document setup |
| Heavy (~2GB) |
Optional in CI/CD |
| Long timeout |
Configurable |
7.2 Coverage
| Language |
Support |
Notes |
| Python |
Good |
Complete query pack |
| JavaScript |
Good |
Includes TypeScript |
| Go |
Good |
Complete query pack |
| Others |
No |
Only those listed |
7.3 Queries
| Limitation |
Description |
| Not MCP-aware |
Doesn't understand @tool decorators |
| Generic queries |
Not specific to tool poisoning |
| No category G |
Doesn't detect prompt injection |
| Factor |
Impact |
| DB creation |
1-10 minutes |
| Analysis |
5-30 minutes |
| Memory |
2-8 GB RAM |
8. MCP Surface Limitations
| Limitation |
Example |
| Only known decorators |
@my_custom_tool not detected |
| Imprecise heuristics |
tool_utils() false positive |
| Without docstrings |
Empty description |
8.2 Transport Detection
| Limitation |
Description |
| Based on imports |
If no import, "unknown" |
| No runtime verification |
Doesn't confirm it actually uses that transport |
8.3 SDKs
| SDK |
Limitation |
| Python Official |
Good coverage |
| FastMCP |
Basic |
| TypeScript |
Partial |
| Custom |
Not supported |
9. MSSS Scoring Limitations
9.1 Weights
| Limitation |
Impact |
| Fixed weights |
Not adjustable by context |
| No business context |
SQLi in demo = SQLi in production |
| Simple cumulative |
10 mediums != 1 critical |
9.2 Levels
| Limitation |
Description |
| Binary |
Critical present = Level 0 always |
| No gradation |
5 criticals = 1 critical for level |
| No exceptions |
Doesn't allow marking "accepted risk" |
9.3 Context
| Not Considered |
Example |
| External mitigations |
WAF that blocks SQLi |
| Environment |
Dev vs Production |
| Exposure |
Internal vs public |
10. General Limitations
10.1 Static vs Dynamic Analysis
| Aspect |
Static (mcp-scan) |
Dynamic (not supported) |
| Coverage |
All code |
Only executed paths |
| Precision |
May have FP |
Fewer FP |
| Recall |
May have FN |
Detects real behavior |
| Speed |
Fast |
Slow |
10.2 Types of Vulnerabilities
| Type |
Detection |
Notes |
| Injection (A,B,C,D) |
Good |
Taint analysis |
| Secrets (E) |
Good |
Pattern matching |
| Auth (F) |
Partial |
Known patterns |
| Tool Poisoning (G) |
Good |
ML + LLM |
| Logic Bugs |
No |
Requires semantics |
| Race Conditions |
No |
Requires runtime |
| DoS |
Partial |
Only obvious patterns |
10.3 Evasion
| Technique |
Effectiveness against mcp-scan |
| Simple obfuscation |
Partially effective |
| Encoding (base64, hex) |
Partially effective |
| Metaprogramming |
Very effective |
| Polyglots |
Effective |
| Time-based |
Completely effective |
11. Recommendations to Mitigate Limitations
mcp-scan (static)
+
CodeQL (deep analysis)
+
DAST (dynamic)
+
Manual Review (logic)
11.2 Priority Manual Review
- High severity findings - Always verify
- Low/medium confidence - Review context
- Classes A and G - Highest impact
- Code with metaprogramming - Not analyzed correctly
11.3 Recommended Configuration
# .mcp-scan.yaml
scan:
mode: deep # Inter-procedural when possible
codeql: true # Confirm with CodeQL
llm: true # Semantic analysis
ml:
threshold: 0.5 # More sensitive
rules:
# Add custom sanitizers
sanitizers:
- pattern: "my_sanitizer"
language: python
protects: [filesystem, exec]
11.4 Baseline for False Positives
# .mcp-scan-baseline.yaml
# Accepted findings
accepted:
- finding_id: "abc123..."
reason: "Sanitized by external WAF"
accepted_by: "security@example.com"
date: "2024-01-15"
12. Summary Table
| Feature |
FP Rate |
FN Rate |
Reliability |
| Taint (fast) |
Low |
High |
Medium |
| Taint (deep) |
Medium |
Medium |
High |
| Pattern Engine |
Medium |
Medium |
Medium |
| ML Classifier |
Medium |
Medium |
Medium |
| LLM Detector |
Low |
Medium |
High |
| CodeQL |
Very Low |
Medium |
Very High |
| Surface |
Low |
High |
Medium |
Legend:
- FP Rate: False positive rate
- FN Rate: False negative rate
- Reliability: General confidence in results
13. Improvement Roadmap
Planned
- Go support in taint analysis
- Custom sanitizers via configuration
- Better detection of dynamic decorators
- Strict mode with fewer FN
Under Investigation
- Jupyter notebook analysis
- Configuration vulnerability detection
- Integration with commercial SAST tools
14. Reporting Limitations
If you find an undocumented limitation:
- Verify it's not already documented here
- Create an issue in the repository with:
- Limitation description
- Example code
- Expected vs actual result
- Estimated impact
This document should be updated when new limitations are discovered or improvements are implemented.