Known Limitations¶

Technical document for security analysts

1. Introduction¶

This document describes the known limitations of each mcp-scan feature. It is fundamental for security analysts to understand these limitations to:

Correctly interpret results
Identify areas that require manual review
Avoid false sense of security
Complement analysis with other tools

2. Parsing Limitations¶

2.1 Tree-sitter Parser¶

Limitation	Impact	Mitigation
Syntax errors stop parsing	Files with errors are not analyzed	Validate syntax beforehand
Macros not expanded	Generated code not analyzed	Analyze post-build code
Dynamic code ignored	`exec(code_string)` not parsed	Review manually

2.2 Languages¶

Language	Status	Specific Limitations
Python	Complete	Partial type hints
TypeScript	Complete	Simplified generics
JavaScript	Complete	Basic JSX
Go	Parsing only	No taint analysis

2.3 Unsupported Constructs¶

# Metaprogramming
@decorator_factory()  # Dynamic decorators
setattr(obj, "method", func)  # Dynamic attributes
exec("def tool(): pass")  # Generated code

# Dynamic imports
module = __import__(name)
importlib.import_module(name)

3. Taint Analysis Limitations¶

3.1 Intra-procedural Analysis (Fast Mode)¶

Limitation	Example	Impact
Doesn't cross functions	`f(user_input)` where `f` calls sink	FN
Doesn't follow returns	`x = get_input()` where get_input returns source	FN
Partial lambdas	Complex closures	FN/Imprecision

False Negative Example:

def process(data):
    os.system(data)  # Sink in another function

@server.tool()
def handler(cmd: str):
    process(cmd)  # Not detected in fast mode

3.2 Inter-procedural Analysis (Deep Mode)¶

Limitation	Description
Limited depth	Default: 3 levels of calls
No recursion support	Recursive functions may cause imprecision
Imprecise call graph	Virtual methods not resolved
Slower	10-100x slower than fast

3.3 Taint Propagation¶

Unsupported Case	Example
Implicit taint	`if secret == x: print("yes")`
Complex structures	`obj.nested.deep.value`
Collections	`list[index]` where index is variable
Alias	`alias = dangerous_func; alias(data)`

3.4 Sanitizers¶

Limitation	Impact
Only built-in catalog	Custom sanitizers not recognized
No effectiveness analysis	`strip()` is not a sanitizer but doesn't cause FP
Partial sanitization	`replace("..", "")` not recognized

False Positive Example:

def my_sanitizer(path):
    """Custom sanitizer not in catalog"""
    if ".." in path:
        raise ValueError("Invalid path")
    return path

@server.tool()
def read(path: str):
    safe_path = my_sanitizer(path)  # Not recognized as sanitizer
    return open(safe_path).read()   # Reported as vulnerable

4. Pattern Engine Limitations¶

4.1 Regex Detection¶

Limitation	Example	Impact
Doesn't understand context	`"os.system in docs"` in string	FP
Comments	`# os.system(cmd)` commented	FP
Multiline strings	Pattern split across lines	FN
Obfuscation	`os["system"](cmd)`	FN

4.2 AST Detection¶

Limitation	Description
Import aliases	`import os as o; o.system()` not detected
Indirect calls	`getattr(os, "system")(cmd)`
Custom decorators	Only fixed list recognized

4.3 Specific Rules¶

Rule	Limitation
MCP-E001 (Secrets)	Doesn't detect secrets in .env files
MCP-F002 (JWT)	Doesn't analyze configuration in JSON
MCP-G003 (Shadowing)	Fixed list of names

5. ML Classifier Limitations¶

5.1 Features¶

Limitation	Impact
Optimized for English	Other languages have worse recall
Fixed keywords	New techniques not covered
No semantic context	Legitimate "ignore" causes FP

5.2 Classifiers¶

Type	Limitation
RuleBased	Deterministic but rigid
Weighted	Requires trained weights
Ensemble	Slower, not always better

5.3 Categories¶

Category	Limitation
`jailbreak`	Only known patterns
`identity`	Legitimate role play causes FP
`delimiter`	New formats not covered

5.4 Error Examples¶

False Positive:

Description: "This tool helps you ignore duplicate files"
                                    ^^^^^^ triggers "ignore"

False Negative:

Description: "Desatienda las instrucciones previas"  # Spanish

6. LLM Detector Limitations¶

6.1 Dependencies¶

Limitation	Mitigation
Requires Ollama	Fallback to ML classifier
Model must be downloaded	Document setup
Network latency	Cache results

6.2 Model¶

Limitation	Impact
llama3.2:3b limited	Larger models are slower
Non-deterministic	Results may vary
Knowledge cutoff	Doesn't know new techniques

6.3 Prompt¶

Limitation	Description
English only	Fixed English prompt
5000 chars max	Truncation of long texts
No code context	Only sees description, not the handler

6.4 Reliability¶

Scenario	Behavior
Very short text	Always "benign"
Ambiguous text	Low confidence
New technique	May not detect

7. CodeQL Limitations¶

7.1 Requirements¶

Limitation	Mitigation
Separate installation	Document setup
Heavy (~2GB)	Optional in CI/CD
Long timeout	Configurable

7.2 Coverage¶

Language	Support	Notes
Python	Good	Complete query pack
JavaScript	Good	Includes TypeScript
Go	Good	Complete query pack
Others	No	Only those listed

7.3 Queries¶

Limitation	Description
Not MCP-aware	Doesn't understand @tool decorators
Generic queries	Not specific to tool poisoning
No category G	Doesn't detect prompt injection

7.4 Performance¶

Factor	Impact
DB creation	1-10 minutes
Analysis	5-30 minutes
Memory	2-8 GB RAM

8. MCP Surface Limitations¶

8.1 Tool Detection¶

Limitation	Example
Only known decorators	`@my_custom_tool` not detected
Imprecise heuristics	`tool_utils()` false positive
Without docstrings	Empty description

8.2 Transport Detection¶

Limitation	Description
Based on imports	If no import, "unknown"
No runtime verification	Doesn't confirm it actually uses that transport

8.3 SDKs¶

SDK	Limitation
Python Official	Good coverage
FastMCP	Basic
TypeScript	Partial
Custom	Not supported

9. MSSS Scoring Limitations¶

9.1 Weights¶

Limitation	Impact
Fixed weights	Not adjustable by context
No business context	SQLi in demo = SQLi in production
Simple cumulative	10 mediums != 1 critical

9.2 Levels¶

Limitation	Description
Binary	Critical present = Level 0 always
No gradation	5 criticals = 1 critical for level
No exceptions	Doesn't allow marking "accepted risk"

9.3 Context¶

Not Considered	Example
External mitigations	WAF that blocks SQLi
Environment	Dev vs Production
Exposure	Internal vs public

10. General Limitations¶

10.1 Static vs Dynamic Analysis¶

Aspect	Static (mcp-scan)	Dynamic (not supported)
Coverage	All code	Only executed paths
Precision	May have FP	Fewer FP
Recall	May have FN	Detects real behavior
Speed	Fast	Slow

10.2 Types of Vulnerabilities¶

Type	Detection	Notes
Injection (A,B,C,D)	Good	Taint analysis
Secrets (E)	Good	Pattern matching
Auth (F)	Partial	Known patterns
Tool Poisoning (G)	Good	ML + LLM
Logic Bugs	No	Requires semantics
Race Conditions	No	Requires runtime
DoS	Partial	Only obvious patterns

10.3 Evasion¶

Technique	Effectiveness against mcp-scan
Simple obfuscation	Partially effective
Encoding (base64, hex)	Partially effective
Metaprogramming	Very effective
Polyglots	Effective
Time-based	Completely effective

11. Recommendations to Mitigate Limitations¶

11.1 Combine Tools¶

mcp-scan (static)
    +
CodeQL (deep analysis)
    +
DAST (dynamic)
    +
Manual Review (logic)

11.2 Priority Manual Review¶

High severity findings - Always verify
Low/medium confidence - Review context
Classes A and G - Highest impact
Code with metaprogramming - Not analyzed correctly

11.3 Recommended Configuration¶

# .mcp-scan.yaml
scan:
  mode: deep           # Inter-procedural when possible
  codeql: true         # Confirm with CodeQL
  llm: true            # Semantic analysis

ml:
  threshold: 0.5       # More sensitive

rules:
  # Add custom sanitizers
  sanitizers:
    - pattern: "my_sanitizer"
      language: python
      protects: [filesystem, exec]

11.4 Baseline for False Positives¶

# .mcp-scan-baseline.yaml
# Accepted findings
accepted:
  - finding_id: "abc123..."
    reason: "Sanitized by external WAF"
    accepted_by: "security@example.com"
    date: "2024-01-15"

12. Summary Table¶

Feature	FP Rate	FN Rate	Reliability
Taint (fast)	Low	High	Medium
Taint (deep)	Medium	Medium	High
Pattern Engine	Medium	Medium	Medium
ML Classifier	Medium	Medium	Medium
LLM Detector	Low	Medium	High
CodeQL	Very Low	Medium	Very High
Surface	Low	High	Medium

Legend: - FP Rate: False positive rate - FN Rate: False negative rate - Reliability: General confidence in results

13. Improvement Roadmap¶

Planned¶

Go support in taint analysis
Custom sanitizers via configuration
Better detection of dynamic decorators
Strict mode with fewer FN

Under Investigation¶

Jupyter notebook analysis
Configuration vulnerability detection
Integration with commercial SAST tools

14. Reporting Limitations¶

If you find an undocumented limitation:

Verify it's not already documented here
Create an issue in the repository with:
Limitation description
Example code
Expected vs actual result
Estimated impact

This document should be updated when new limitations are discovered or improvements are implemented.