Known Limitations

Technical document for security analysts


1. Introduction

This document describes the known limitations of each mcp-scan feature. Security analysts need to understand these limitations in order to:

  • Correctly interpret results
  • Identify areas that require manual review
  • Avoid a false sense of security
  • Complement analysis with other tools

2. Parsing Limitations

2.1 Tree-sitter Parser

Limitation | Impact | Mitigation
Syntax errors stop parsing | Files with errors are not analyzed | Validate syntax beforehand
Macros not expanded | Generated code not analyzed | Analyze post-build code
Dynamic code ignored | exec(code_string) not parsed | Review manually
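
The "validate syntax beforehand" mitigation can be approximated for Python sources with the standard library. The following is a minimal sketch, not part of mcp-scan: `find_unparseable` is a hypothetical helper name, and Python's own parser is not Tree-sitter, so the two may disagree on edge cases.

```python
import ast
from pathlib import Path


def find_unparseable(root: str) -> list[tuple[str, str]]:
    """Return (path, error) pairs for Python files that fail to parse."""
    failures = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            # Parse only; the file is never executed.
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except (SyntaxError, UnicodeDecodeError, ValueError) as exc:
            failures.append((str(path), f"{type(exc).__name__}: {exc}"))
    return failures
```

Files reported by such a check are candidates for manual review, since a scanner that stops on syntax errors would presumably skip them.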

2.2 Languages

Language | Status | Specific Limitations
Python | Complete | Partial type hints
TypeScript | Complete | Simplified generics
JavaScript | Complete | Basic JSX
Go | Parsing only | No taint analysis

2.3 Unsupported Constructs

# Metaprogramming
@decorator_factory()  # Dynamic decorators
setattr(obj, "method", func)  # Dynamic attributes
exec("def tool(): pass")  # Generated code

# Dynamic imports
module = __import__(name)
importlib.import_module(name)
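
Because these constructs are not analyzed, it can help to list where they occur so that a reviewer knows which files to read by hand. A minimal sketch, assuming Python sources; the `DYNAMIC_CALLS` watch list and the function name are illustrative and not part of mcp-scan:

```python
import ast

# Hypothetical watch list; extend it to match your codebase.
DYNAMIC_CALLS = {"exec", "eval", "setattr", "__import__", "import_module"}


def flag_dynamic_constructs(source: str) -> list[tuple[int, str]]:
    """Return (line, name) for each call to a name in DYNAMIC_CALLS."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Plain call such as exec(...), or attribute call such as
        # importlib.import_module(...); anything else yields None.
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name in DYNAMIC_CALLS:
            hits.append((node.lineno, name))
    return sorted(hits)
```

The check is purely name-based, so it will miss aliased calls and flag unrelated functions that happen to share a name. It is a pointer for manual review, not a detector.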

3. Taint Analysis Limitations

3.1 Intra-procedural Analysis (Fast Mode)

Limitation | Example | Impact
Doesn't cross functions | f(user_input) where f calls sink | FN
Doesn't follow returns | x = get_input() where get_input returns source | FN
Partial lambdas | Complex closures | FN/Imprecision

False Negative Example:

import os

def process(data):
    os.system(data)  # Sink in another function

@server.tool()
def handler(cmd: str):
    process(cmd)  # Not detected in fast mode

3.2 Inter-procedural Analysis (Deep Mode)

Limitation | Description
Limited depth | Default: 3 levels of calls
No recursion support | Recursive functions may cause imprecision
Imprecise call graph | Virtual methods not resolved
Slower | 10-100x slower than fast mode
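
To see why a depth limit produces false negatives, consider a toy call graph searched breadth-first. This is only an illustration: the graph, the function names, and the way depth is counted are assumptions, and mcp-scan's actual traversal may differ.

```python
from collections import deque

# Toy call graph: caller -> callees. All names are hypothetical.
CALL_GRAPH = {
    "handler": ["validate"],
    "validate": ["dispatch"],
    "dispatch": ["run"],
    "run": ["os.system"],
}
SINKS = {"os.system"}


def reaches_sink(entry: str, max_depth: int) -> bool:
    """Breadth-first search that follows at most max_depth call edges."""
    queue = deque([(entry, 0)])
    seen = {entry}  # also prevents infinite loops on recursive calls
    while queue:
        func, depth = queue.popleft()
        if func in SINKS:
            return True
        if depth == max_depth:
            continue  # budget exhausted; deeper callees are never visited
        for callee in CALL_GRAPH.get(func, []):
            if callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return False
```

In this toy graph the sink sits four edges away from `handler`, so `reaches_sink("handler", 3)` returns `False` while `reaches_sink("handler", 4)` returns `True`. Handlers that wrap a sink behind several helper layers deserve manual review for the same reason.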

3.3 Taint Propagation

Unsupported Case | Example
Implicit taint | if secret == x: print("yes")
Complex structures | obj.nested.deep.value
Collections | list[index] where index is variable
Alias | alias = dangerous_func; alias(data)
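
Implicit taint is the least intuitive case above: the sensitive value influences the output only through control flow, so there is no assignment chain to follow. A small self-contained sketch (the function name and alphabet are illustrative):

```python
import string


def leak_via_branches(secret: str,
                      alphabet: str = string.ascii_letters + string.digits) -> str:
    """Rebuild `secret` without ever assigning from it directly."""
    rebuilt = ""
    for ch in secret:
        for candidate in alphabet:
            # The comparison reads the secret, but the value appended
            # comes from `alphabet`, so no explicit data flow exists.
            if ch == candidate:
                rebuilt += candidate
    return rebuilt
```

A taint tracker that follows only explicit data flow would treat `rebuilt` as clean, even though `leak_via_branches("s3cret")` returns `"s3cret"`.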

3.4 Sanitizers

Limitation | Impact
Only built-in catalog | Custom sanitizers not recognized
No effectiveness analysis | strip() is not a sanitizer but doesn't cause FP
Partial sanitization | replace("..", "") not recognized

False Positive Example:

def my_sanitizer(path):
    """Custom sanitizer not in catalog"""
    if ".." in path:
        raise ValueError("Invalid path")
    return path

@server.tool()
def read(path: str):
    safe_path = my_sanitizer(path)  # Not recognized as sanitizer
    return open(safe_path).read()   # Reported as vulnerable
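
The "partial sanitization" case is worth checking by hand, because a string replacement can look safe and still let paths escape. The sketch below contrasts `replace("..", "")` with a containment check; the helper names and the `/srv/data` base directory are hypothetical, and `posixpath` is used so the behavior is the same on every platform.

```python
import posixpath


def naive_sanitize(path: str) -> str:
    """Partial sanitization: strips '..' but ignores absolute paths."""
    return path.replace("..", "")


def is_within(base: str, user_path: str) -> bool:
    """Check that the joined, normalized path stays under `base`."""
    base = posixpath.normpath(base)
    candidate = posixpath.normpath(posixpath.join(base, user_path))
    return candidate == base or candidate.startswith(base + "/")
```

An absolute path passes through `naive_sanitize` unchanged, and `posixpath.join("/srv/data", "/etc/passwd")` yields `/etc/passwd`, so the replacement alone does not confine access. `is_within` rejects that input, although it does not resolve symlinks; code that touches the real filesystem would need an additional check.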


4. Pattern Engine Limitations

4.1 Regex Detection

Limitation | Example | Impact
Doesn't understand context | "os.system in docs" in string | FP
Comments | # os.system(cmd) commented | FP
Multiline strings | Pattern split across lines | FN
Obfuscation | os["system"](cmd) | FN
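
The effect of matching text without context can be reproduced with a single regular expression. The pattern below is a hypothetical rule written for illustration; mcp-scan's real patterns may differ.

```python
import re

# Hypothetical rule; mcp-scan's real patterns may differ.
PATTERN = re.compile(r"os\.system\(")

SAMPLES = {
    "real call": "os.system(cmd)",
    "comment": "# os.system(cmd) was removed",
    "string": 'help_text = "never call os.system(cmd)"',
    "indirect": 'getattr(os, "system")(cmd)',
}

# True means the rule would report a finding for that line.
results = {label: bool(PATTERN.search(code)) for label, code in SAMPLES.items()}
```

Here `results` is `True` for the real call, the comment, and the string (the last two being false positives) and `False` for the indirect call (a false negative).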

4.2 AST Detection

Limitation | Description
Import aliases | import os as o; o.system() not detected
Indirect calls | getattr(os, "system")(cmd)
Custom decorators | Only fixed list recognized
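
For manual review, import aliases can be resolved with a short AST pass. This is a sketch under the assumption that aliases are introduced only by `import` statements; it is not how mcp-scan works internally, and `find_os_system_calls` is a hypothetical name.

```python
import ast


def find_os_system_calls(source: str) -> list[int]:
    """Return line numbers of os.system calls, resolving import aliases."""
    tree = ast.parse(source)
    module_aliases = set()    # local names bound to the os module
    function_aliases = set()  # local names bound to os.system itself
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "os":
                    module_aliases.add(alias.asname or "os")
        elif isinstance(node, ast.ImportFrom) and node.module == "os":
            for alias in node.names:
                if alias.name == "system":
                    function_aliases.add(alias.asname or "system")
    lines = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if (isinstance(func, ast.Attribute) and func.attr == "system"
                and isinstance(func.value, ast.Name)
                and func.value.id in module_aliases):
            lines.append(node.lineno)
        elif isinstance(func, ast.Name) and func.id in function_aliases:
            lines.append(node.lineno)
    return sorted(lines)
```

Indirect calls such as `getattr(o, "system")(cmd)` and aliases created by plain assignment remain invisible to this pass, which mirrors the "indirect calls" limitation above.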

4.3 Specific Rules

Rule | Limitation
MCP-E001 (Secrets) | Doesn't detect secrets in .env files
MCP-F002 (JWT) | Doesn't analyze configuration in JSON
MCP-G003 (Shadowing) | Fixed list of names

5. ML Classifier Limitations

5.1 Features

Limitation | Impact
Optimized for English | Other languages have worse recall
Fixed keywords | New techniques not covered
No semantic context | Legitimate "ignore" causes FP

5.2 Classifiers

Type | Limitation
RuleBased | Deterministic but rigid
Weighted | Requires trained weights
Ensemble | Slower, not always better

5.3 Categories

Category | Limitation
jailbreak | Only known patterns
identity | Legitimate role play causes FP
delimiter | New formats not covered

5.4 Error Examples

False Positive:

Description: "This tool helps you ignore duplicate files"
                                    ^^^^^^ triggers "ignore"

False Negative:

Description: "Desatienda las instrucciones previas"  # Spanish
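
Both errors follow from matching fixed English keywords without context. A toy classifier makes this concrete; the keyword list is hypothetical and far simpler than mcp-scan's actual feature set.

```python
# Hypothetical English keyword list; not mcp-scan's actual feature set.
KEYWORDS = ("ignore", "disregard", "override")


def keyword_flag(description: str) -> bool:
    """Flag a description if any keyword appears, regardless of context."""
    text = description.lower()
    return any(word in text for word in KEYWORDS)
```

With this list, `keyword_flag("This tool helps you ignore duplicate files")` returns `True` (a false positive), while `keyword_flag("Desatienda las instrucciones previas")` returns `False` (a false negative), because the Spanish phrasing shares no keyword with the English list.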


6. LLM Detector Limitations

6.1 Dependencies

Limitation | Mitigation
Requires Ollama | Fallback to ML classifier
Model must be downloaded | Document setup
Network latency | Cache results

6.2 Model

Limitation | Impact
llama3.2:3b limited | Larger models are slower
Non-deterministic | Results may vary
Knowledge cutoff | Doesn't know new techniques

6.3 Prompt

Limitation | Description
English only | Fixed English prompt
5000 chars max | Truncation of long texts
No code context | Only sees description, not the handler
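
The 5000-character limit means that text placed late in a long description may never reach the model. The sketch below assumes that truncation keeps the beginning of the text, which the table does not state; the helper names are hypothetical.

```python
MAX_PROMPT_CHARS = 5000  # limit stated in the table above


def visible_to_llm(description: str, limit: int = MAX_PROMPT_CHARS) -> str:
    """Assumed behavior: keep only the first `limit` characters."""
    return description[:limit]


def hidden_tail(description: str, limit: int = MAX_PROMPT_CHARS) -> str:
    """Return the part of the description that truncation would drop."""
    return description[limit:]
```

A reviewer could run `hidden_tail` over every tool description and read any non-empty result manually, since under this assumption that part is never seen by the detector.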

6.4 Reliability

Scenario | Behavior
Very short text | Always "benign"
Ambiguous text | Low confidence
New technique | May not detect

7. CodeQL Limitations

7.1 Requirements

Limitation | Mitigation
Separate installation | Document setup
Heavy (~2GB) | Optional in CI/CD
Long timeout | Configurable

7.2 Coverage

Language | Support | Notes
Python | Good | Complete query pack
JavaScript | Good | Includes TypeScript
Go | Good | Complete query pack
Others | No | Only those listed

7.3 Queries

Limitation | Description
Not MCP-aware | Doesn't understand @tool decorators
Generic queries | Not specific to tool poisoning
No category G | Doesn't detect prompt injection

7.4 Performance

Factor | Impact
DB creation | 1-10 minutes
Analysis | 5-30 minutes
Memory | 2-8 GB RAM

8. MCP Surface Limitations

8.1 Tool Detection

Limitation | Example
Only known decorators | @my_custom_tool not detected
Imprecise heuristics | tool_utils() false positive
Without docstrings | Empty description
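
Since only known decorators are recognized, it can be useful to list every decorated function whose decorator falls outside that set and check by hand whether it registers a tool. A minimal sketch; the allow-list below is a guess for illustration and does not reflect the decorators mcp-scan actually recognizes.

```python
import ast

# Hypothetical allow-list; the decorators mcp-scan recognizes may differ.
KNOWN_TOOL_DECORATORS = {"tool"}


def decorator_name(node: ast.expr) -> str:
    """Return the last name component of a decorator: @a.b.tool() -> 'tool'."""
    if isinstance(node, ast.Call):
        node = node.func
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return "<dynamic>"


def unknown_decorated_functions(source: str) -> list[tuple[str, str]]:
    """List (function, decorator) pairs whose decorator is not in the allow-list."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for deco in node.decorator_list:
                name = decorator_name(deco)
                if name not in KNOWN_TOOL_DECORATORS:
                    found.append((node.name, name))
    return sorted(found)
```

For a file containing `@server.tool()` on one function and `@my_custom_tool` on another, only the second is returned, which is the function a reviewer would need to inspect manually.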

8.2 Transport Detection

Limitation | Description
Based on imports | If no import, "unknown"
No runtime verification | Doesn't confirm it actually uses that transport

8.3 SDKs

SDK | Limitation
Python Official | Good coverage
FastMCP | Basic
TypeScript | Partial
Custom | Not supported

9. MSSS Scoring Limitations

9.1 Weights

Limitation | Impact
Fixed weights | Not adjustable by context
No business context | SQLi in demo = SQLi in production
Simple cumulative | 10 mediums != 1 critical

9.2 Levels

Limitation | Description
Binary | Critical present = Level 0 always
No gradation | 5 criticals = 1 critical for level
No exceptions | Doesn't allow marking "accepted risk"

9.3 Context

Not Considered | Example
External mitigations | WAF that blocks SQLi
Environment | Dev vs Production
Exposure | Internal vs public

10. General Limitations

10.1 Static vs Dynamic Analysis

Aspect | Static (mcp-scan) | Dynamic (not supported)
Coverage | All code | Only executed paths
Precision | May have FP | Fewer FP
Recall | May have FN | Detects real behavior
Speed | Fast | Slow

10.2 Types of Vulnerabilities

Type | Detection | Notes
Injection (A,B,C,D) | Good | Taint analysis
Secrets (E) | Good | Pattern matching
Auth (F) | Partial | Known patterns
Tool Poisoning (G) | Good | ML + LLM
Logic Bugs | No | Requires semantics
Race Conditions | No | Requires runtime
DoS | Partial | Only obvious patterns

10.3 Evasion

Technique | Effectiveness against mcp-scan
Simple obfuscation | Partially effective
Encoding (base64, hex) | Partially effective
Metaprogramming | Very effective
Polyglots | Effective
Time-based | Completely effective

11. Recommendations to Mitigate Limitations

11.1 Combine Tools

mcp-scan (static)
    +
CodeQL (deep analysis)
    +
DAST (dynamic)
    +
Manual Review (logic)

11.2 Priority Manual Review

  1. High severity findings - Always verify
  2. Low/medium confidence - Review context
  3. Classes A and G - Highest impact
  4. Code with metaprogramming - Not analyzed correctly

11.3 Recommended Configuration

# .mcp-scan.yaml
scan:
  mode: deep           # Inter-procedural when possible
  codeql: true         # Confirm with CodeQL
  llm: true            # Semantic analysis

ml:
  threshold: 0.5       # More sensitive

rules:
  # Add custom sanitizers
  sanitizers:
    - pattern: "my_sanitizer"
      language: python
      protects: [filesystem, exec]

11.4 Baseline for False Positives

# .mcp-scan-baseline.yaml
# Accepted findings
accepted:
  - finding_id: "abc123..."
    reason: "Sanitized by external WAF"
    accepted_by: "security@example.com"
    date: "2024-01-15"
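
Applying such a baseline amounts to splitting the findings by ID. The sketch below assumes the baseline file has already been loaded into a dictionary with a YAML parser and that each finding carries an `id` key; both the function name and the finding shape are hypothetical.

```python
def apply_baseline(findings: list[dict], baseline: dict) -> tuple[list[dict], list[dict]]:
    """Split findings into (open, accepted) using baseline finding IDs."""
    accepted_ids = {entry["finding_id"] for entry in baseline.get("accepted", [])}
    open_findings = [f for f in findings if f["id"] not in accepted_ids]
    accepted = [f for f in findings if f["id"] in accepted_ids]
    return open_findings, accepted
```

Keeping the accepted findings in a separate list, rather than discarding them, lets a reviewer periodically re-check whether the recorded reason (for example, an external WAF) still holds.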

12. Summary Table

Feature | FP Rate | FN Rate | Reliability
Taint (fast) | Low | High | Medium
Taint (deep) | Medium | Medium | High
Pattern Engine | Medium | Medium | Medium
ML Classifier | Medium | Medium | Medium
LLM Detector | Low | Medium | High
CodeQL | Very Low | Medium | Very High
Surface | Low | High | Medium

Legend:

  • FP Rate: False positive rate
  • FN Rate: False negative rate
  • Reliability: General confidence in results


13. Improvement Roadmap

Planned

  • Go support in taint analysis
  • Custom sanitizers via configuration
  • Better detection of dynamic decorators
  • Strict mode with fewer FN

Under Investigation

  • Jupyter notebook analysis
  • Configuration vulnerability detection
  • Integration with commercial SAST tools

14. Reporting Limitations

If you find an undocumented limitation:

  1. Verify it's not already documented here
  2. Create an issue in the repository with:
       • Limitation description
       • Example code
       • Expected vs actual result
       • Estimated impact

This document should be updated when new limitations are discovered or improvements are implemented.