Pattern Engine System

Overview

The pattern engine (internal/pattern/) provides rule-based vulnerability detection using AST analysis, regex patterns, and surface inspection. It's the primary detection mechanism for security issues that can be identified through static code patterns.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Pattern Engine                           │
├─────────────────────────────────────────────────────────────┤
│                                                               │
│  ┌───────────────┐    ┌───────────────┐    ┌─────────────┐  │
│  │    Rules      │    │   Detectors   │    │   Matches   │  │
│  │  (Registry)   │───▶│  (Interface)  │───▶│  (Results)  │  │
│  └───────────────┘    └───────────────┘    └─────────────┘  │
│         │                    │                    │          │
│         │                    │                    │          │
│         ▼                    ▼                    ▼          │
│  ┌───────────────┐    ┌───────────────┐    ┌─────────────┐  │
│  │ - ID          │    │ AST File      │    │ - Location  │  │
│  │ - Severity    │    │ MCP Surface   │    │ - Snippet   │  │
│  │ - Confidence  │    │ Raw Content   │    │ - Context   │  │
│  │ - Description │    │               │    │ - Confidence│  │
│  └───────────────┘    └───────────────┘    └─────────────┘  │
│                                                               │
└─────────────────────────────────────────────────────────────┘

Core Types

Rule

Represents a detection rule:

type Rule struct {
    ID          string           // Unique identifier (e.g., "MCP-A003")
    Class       types.VulnClass  // Vulnerability class (A-N)
    Language    []types.Language // Language filter (empty = all)
    Severity    types.Severity   // Critical, High, Medium, Low
    Confidence  types.Confidence // High, Medium, Low
    Description string           // Human-readable description
    Remediation string           // How to fix the issue
    Detector    Detector         // Implementation
}

Detector Interface

All detectors implement this interface:

type Detector interface {
    Detect(file *ast.File, surface *surface.MCPSurface) []Match
}

Match

Result of a detection:

type Match struct {
    Location   types.Location    // File path, line numbers
    Snippet    string            // Code snippet
    Context    string            // Additional context
    Confidence types.Confidence  // Detection confidence
}

Built-in Detectors

Class A - Remote Code Execution

DirectShellDetector (MCP-A003)

Detects direct shell command execution:

# Detected patterns:
os.system("command")
subprocess.call(..., shell=True)
subprocess.run(..., shell=True)
subprocess.Popen(..., shell=True)
child_process.exec(...)
child_process.execSync(...)

Regex patterns:

var shellPatterns = []*regexp.Regexp{
    regexp.MustCompile(`(?i)os\.system\s*\(`),
    regexp.MustCompile(`(?i)subprocess\.call\s*\([^,]*,\s*shell\s*=\s*True`),
    regexp.MustCompile(`(?i)subprocess\.run\s*\([^,]*,\s*shell\s*=\s*True`),
    regexp.MustCompile(`(?i)subprocess\.Popen\s*\([^,]*,\s*shell\s*=\s*True`),
    regexp.MustCompile(`(?i)child_process\.exec\s*\(`),
    regexp.MustCompile(`(?i)child_process\.execSync\s*\(`),
    regexp.MustCompile(`(?i)execSync\s*\(`),
}
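
To see what these expressions do and do not catch, they can be run over a few sample lines. The sketch below copies three of the patterns above into a standalone program; `matchesShell` is a hypothetical helper for illustration and is not part of the engine. Because `[^,]*,` stops at the first comma, the `subprocess` patterns only fire when `shell=True` is the second argument.

```go
package main

import (
	"fmt"
	"regexp"
)

// Subset of the shellPatterns list above, copied verbatim for illustration.
var shellPatterns = []*regexp.Regexp{
	regexp.MustCompile(`(?i)os\.system\s*\(`),
	regexp.MustCompile(`(?i)subprocess\.run\s*\([^,]*,\s*shell\s*=\s*True`),
	regexp.MustCompile(`(?i)execSync\s*\(`),
}

// matchesShell is a hypothetical helper: it reports whether any pattern
// matches the given source line.
func matchesShell(line string) bool {
	for _, re := range shellPatterns {
		if re.MatchString(line) {
			return true
		}
	}
	return false
}

func main() {
	samples := []string{
		`os.system("ls " + path)`,
		`subprocess.run(cmd, shell=True)`,
		`subprocess.run(cmd, check=True, shell=True)`, // shell=True is not the second argument
		`subprocess.run(["ls", path])`,
	}
	for _, s := range samples {
		fmt.Printf("%-46s %v\n", s, matchesShell(s))
	}
}
```

The third sample is not matched by the regex alone, so calls with extra keyword arguments before `shell=True` would need an AST-based check or a looser pattern.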

DangerousFunctionDetector (MCP-A004)

Detects calls to dynamic code-execution functions such as eval, exec, compile, __import__, and JavaScript's new Function:

# Detected functions:
eval(...)
exec(...)
compile(...)
__import__(...)
new Function(...)  # JavaScript

Class B - Filesystem

PathTraversalPatternDetector (MCP-B002)

Detects path traversal patterns:

# Detected patterns:
../
..\
%2e%2e%2f
%2e%2e/
..%2f

Class C - SSRF

UnvalidatedURLDetector (MCP-C002)

Detects HTTP requests whose URL comes from a variable that has not been validated:

# Detected patterns:
requests.get(url_variable)
requests.post(host_variable)
fetch(endpoint)
axios.get(user_url)
urllib.request.urlopen(url)

Class D - SQL Injection

SQLConcatDetector (MCP-D002)

Detects SQL string concatenation:

# Detected patterns:
"SELECT * FROM users WHERE id=" + user_id
f"SELECT * FROM users WHERE id={user_id}"
`SELECT * FROM users WHERE id=${userId}`
execute("SELECT * FROM users WHERE id=" + id)

Regex patterns:

var sqlPatterns = []*regexp.Regexp{
    regexp.MustCompile(`(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*\+.*`),
    regexp.MustCompile(`(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*%s`),
    regexp.MustCompile(`(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*\$\{`),
    regexp.MustCompile(`(?i)f["'].*SELECT.*\{`),
    regexp.MustCompile(`(?i)execute\s*\(\s*["'].*\+`),
}
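
Running the patterns over sample lines shows both what they catch and where they over-match. The sketch below copies the list above unchanged; `firstSQLMatch` is a hypothetical helper added for illustration. The first pattern fires on any line that contains a SQL keyword followed later by `+`, so ordinary strings such as UI labels can also match.

```go
package main

import (
	"fmt"
	"regexp"
)

// Copied from the sqlPatterns list above.
var sqlPatterns = []*regexp.Regexp{
	regexp.MustCompile(`(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*\+.*`),
	regexp.MustCompile(`(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*%s`),
	regexp.MustCompile(`(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*\$\{`),
	regexp.MustCompile(`(?i)f["'].*SELECT.*\{`),
	regexp.MustCompile(`(?i)execute\s*\(\s*["'].*\+`),
}

// firstSQLMatch is a hypothetical helper: it returns the index of the first
// pattern that matches the line, or -1 if none does.
func firstSQLMatch(line string) int {
	for i, re := range sqlPatterns {
		if re.MatchString(line) {
			return i
		}
	}
	return -1
}

func main() {
	samples := []string{
		`query = "SELECT * FROM users WHERE id=" + user_id`,            // concatenation
		`cursor.execute("SELECT * FROM users WHERE id=?", (user_id,))`, // parameterized
		`label = "Please select an item" + suffix`,                     // not SQL, still matches
	}
	for _, s := range samples {
		fmt.Println(firstSQLMatch(s), s)
	}
}
```

The parameterized query is not flagged, while the UI label is. This is one reason a regex-only rule of this kind is usually paired with a lower confidence level or a follow-up check.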

Class E - Secrets

HardcodedSecretDetector (MCP-E001)

Detects hardcoded secrets:

# Detected patterns:
api_key = "sk_live_..."
password = "secret123"
token = "ghp_xxxx..."
AKIA... (AWS access keys)
-----BEGIN PRIVATE KEY-----

Patterns:

  • API keys: api_key|apikey with 20+ char value
  • Secrets: secret|password|passwd|pwd with 8+ char value
  • Tokens: token|auth_token with 20+ char value
  • GitHub tokens: ghp_[A-Za-z0-9]{36}
  • OpenAI keys: sk-[A-Za-z0-9]{48}
  • AWS keys: AKIA[A-Z0-9]{16}
  • Private keys: -----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY
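
The fixed-format token expressions can be tried in isolation. This is a minimal sketch: the four regexes come from the list above, while the `secretFormat` type, the format names, and `classifySecret` are hypothetical and only serve the example.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// secretFormat pairs an illustrative name with one of the listed expressions.
type secretFormat struct {
	name string
	re   *regexp.Regexp
}

// Checked in order, so the result is deterministic when several formats match.
var secretFormats = []secretFormat{
	{"github-token", regexp.MustCompile(`ghp_[A-Za-z0-9]{36}`)},
	{"openai-key", regexp.MustCompile(`sk-[A-Za-z0-9]{48}`)},
	{"aws-key", regexp.MustCompile(`AKIA[A-Z0-9]{16}`)},
	{"private-key", regexp.MustCompile(`-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY`)},
}

// classifySecret returns the name of the first matching format, or "".
func classifySecret(line string) string {
	for _, f := range secretFormats {
		if f.re.MatchString(line) {
			return f.name
		}
	}
	return ""
}

func main() {
	samples := []string{
		`token = "ghp_` + strings.Repeat("a", 36) + `"`, // synthetic value, not a real token
		"AKIAIOSFODNN7EXAMPLE",                          // AWS's documented example key ID
		"-----BEGIN OPENSSH PRIVATE KEY-----",
		`password = os.environ["DB_PASSWORD"]`, // read from the environment, no literal secret
	}
	for _, s := range samples {
		fmt.Printf("%q -> %q\n", s, classifySecret(s))
	}
}
```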

SecretVariableDetector (MCP-E002)

Detects variables that might contain secrets:

# Detected variable names:
api_key = ...
secret = ...
password = ...
token = ...
credential = ...
auth = ...

SecretLoggingDetector (MCP-E005)

Detects secrets being logged:

# Detected patterns:
print(api_key)
console.log(password)
logger.info(token)
logging.debug(secret)

Class F - Auth/OAuth

InsecureCookieDetector (MCP-F001)

Detects insecure cookie settings:

# Detected patterns:
set_cookie(...) without Secure flag
response.set_cookie(...)

WeakJWTDetector (MCP-F002)

Detects weak JWT configurations:

# Detected patterns:
algorithms = ['none']
verify = False
verify: false
ignoreExpiration: true

OAuthStateDetector (MCP-F003)

Detects missing OAuth state parameter:

# Detected patterns:
authorize_url(...) without state
oauth.get_authorize_url() without state

Class G - Tool Poisoning

PromptInjectionDetector (MCP-G001)

Detects prompt injection markers in tool descriptions:

Injection markers:

var injectionMarkers = []string{
    "ignore previous",
    "ignore all instructions",
    "disregard",
    "system prompt",
    "you are now",
    "act as",
    "pretend to be",
    "forget your instructions",
    "new instructions",
    "override",
    "<important>", "</important>",
    "<hidden>", "</hidden>",
    "<system>", "</system>",
    "<instruction>", "</instruction>",
    "<secret>", "</secret>",
    "[system]", "[hidden]", "[important]",
    "do not mention",
    "do not reveal",
    "without telling",
    "secretly",
    "must first",
    "you must",
    "include it in your response",
    "access the resource",
}
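
Matching against this list is a plain substring search. The sketch below assumes the detector lowercases the tool description before comparing, which is not confirmed by the list itself; `findMarkers` is a hypothetical helper and only a few markers are included.

```go
package main

import (
	"fmt"
	"strings"
)

// A few markers from the list above; a real detector would use the full list.
var injectionMarkers = []string{"ignore previous", "<hidden>", "do not reveal", "override"}

// findMarkers returns every marker contained in the description.
// Lowercasing the input is an assumption about how text is normalized.
func findMarkers(description string) []string {
	lower := strings.ToLower(description)
	var found []string
	for _, m := range injectionMarkers {
		if strings.Contains(lower, m) {
			found = append(found, m)
		}
	}
	return found
}

func main() {
	fmt.Println(findMarkers("Reads a file. <HIDDEN>Ignore previous instructions</HIDDEN>"))
	fmt.Println(findMarkers("Overrides the default timeout")) // benign text, still matches "override"
	fmt.Println(findMarkers("Returns the current weather"))
}
```

Short markers such as "override" also occur in ordinary descriptions, so matches on them are better treated as signals for review than as proof of an attack.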

UnicodeDetector (MCP-G002)

Detects suspicious Unicode characters:

// Detected characters:
'\u202E' // RTL override
'\u202D' // LTR override
'\u202C' // Pop directional formatting
'\u200B' // Zero-width space
'\u200C' // Zero-width non-joiner
'\u200D' // Zero-width joiner
'\uFEFF' // Byte order mark
// Unicode combining marks
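
A scan for these characters can walk the text rune by rune. The following is a sketch under two assumptions: the names `suspiciousRunes`, `runeHit`, and `scanRunes` are hypothetical, and "combining marks" is interpreted here as the Unicode category Mn (nonspacing marks).

```go
package main

import (
	"fmt"
	"unicode"
)

// The characters listed above, keyed by rune.
var suspiciousRunes = map[rune]string{
	'\u202E': "RTL override",
	'\u202D': "LTR override",
	'\u202C': "pop directional formatting",
	'\u200B': "zero-width space",
	'\u200C': "zero-width non-joiner",
	'\u200D': "zero-width joiner",
	'\uFEFF': "byte order mark",
}

// runeHit records where a suspicious character was found.
type runeHit struct {
	Offset int    // byte offset into the input string
	Name   string // description of the character
}

// scanRunes returns one hit per suspicious character or combining mark.
func scanRunes(s string) []runeHit {
	var hits []runeHit
	for i, r := range s { // ranging over a string yields byte offsets and runes
		if name, ok := suspiciousRunes[r]; ok {
			hits = append(hits, runeHit{i, name})
		} else if unicode.Is(unicode.Mn, r) { // assumption: Mn stands in for "combining marks"
			hits = append(hits, runeHit{i, "combining mark"})
		}
	}
	return hits
}

func main() {
	fmt.Println(scanRunes("read\u200Bfile")) // zero-width space hidden inside a tool name
	fmt.Println(scanRunes("read_file"))
}
```

Legitimate text can contain these characters too (decomposed accented letters use combining marks, and emoji sequences use the zero-width joiner), so a hit indicates something worth inspecting.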

ToolShadowingDetector (MCP-G003)

Detects tools that shadow system commands:

var shadowedTools = map[string]bool{
    "shell":    true,
    "exec":     true,
    "run":      true,
    "execute":  true,
    "terminal": true,
    "bash":     true,
    "sh":       true,
    "cmd":      true,
    "system":   true,
    "eval":     true,
    "python":   true,
    "node":     true,
    "npm":      true,
    "pip":      true,
    "curl":     true,
    "wget":     true,
    "sudo":     true,
    "admin":    true,
    "root":     true,
}
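
The lookup itself is a map access. The sketch below assumes tool names are trimmed and lowercased before the check, which the map alone does not confirm; `isShadowed` is a hypothetical helper and the map is abbreviated.

```go
package main

import (
	"fmt"
	"strings"
)

// Abbreviated copy of the shadowedTools map above.
var shadowedTools = map[string]bool{
	"shell": true,
	"exec":  true,
	"bash":  true,
	"curl":  true,
	"sudo":  true,
}

// isShadowed normalizes the name (trim and lowercase, an assumption) and
// checks for an exact match; a missing key yields false.
func isShadowed(name string) bool {
	return shadowedTools[strings.ToLower(strings.TrimSpace(name))]
}

func main() {
	for _, n := range []string{"Bash", " curl ", "bash_runner"} {
		fmt.Printf("%q shadowed: %v\n", n, isShadowed(n))
	}
}
```

With exact matching, a name such as `bash_runner` is not flagged; catching prefixed or suffixed variants would require a different comparison.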

Class N - Supply Chain

LockfileDetector (MCP-N001)

Checks for presence of lockfiles:

  • package-lock.json
  • yarn.lock
  • requirements.txt.lock
  • Pipfile.lock

UntrustedDependencyDetector (MCP-N002)

Detects dependencies from untrusted sources:

# Detected patterns:
git+https://...
git+ssh://...
github.com/.../....git
file://...
http://... (non-HTTPS)

SuspiciousSetupDetector (MCP-N003)

Detects suspicious commands in setup scripts:

# Detected patterns:
curl ... | sh
wget ... | sh
curl ... | bash
wget ... | bash
base64 ... decode
nc -e ...
netcat ... -e

Engine API

Creating the Engine

engine := pattern.New()

Adding Custom Rules

// Add programmatic rule
engine.AddRule(&pattern.Rule{
    ID:          "CUSTOM-001",
    Class:       types.ClassA,
    Severity:    types.SeverityHigh,
    Confidence:  types.ConfidenceHigh,
    Description: "Custom vulnerability detected",
    Remediation: "Fix the issue",
    Detector:    &CustomDetector{},
})

// Add regex-based rule
engine.AddCustomRule(
    "CUSTOM-002",                    // ID
    `dangerous_function\s*\(`,       // Pattern
    types.SeverityHigh,              // Severity
    types.ConfidenceMedium,          // Confidence
    "Dangerous function call",       // Description
    "Use safer alternative",         // Remediation
    []types.Language{types.Python},  // Languages
    types.ClassA,                    // Vulnerability class
)

Configuring Rules

// Override severity
engine.SetSeverityOverride("MCP-A003", types.SeverityCritical)

// Disable rule
engine.SetDisabledRule("MCP-E002")

// Check if disabled
if engine.IsRuleDisabled("MCP-E002") {
    // Rule is disabled
}

Running Analysis

// Analyze multiple files
findings := engine.Analyze(files, surface)

// Analyze single file
findings := engine.AnalyzeFile(file, surface)

Detection Process

Per-File Analysis

  1. Language filter: Skip rules that don't apply to file's language
  2. Disabled check: Skip disabled rules
  3. Detector execution: Run detector's Detect() method
  4. Location enrichment: Add file path to match locations
  5. Severity override: Apply any configured overrides
  6. Finding generation: Create typed findings with unique IDs
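
The six steps above can be sketched as a single loop. Everything in this example is a stand-in: `rule`, `match`, `finding`, and `engine` are simplified local types so the program compiles on its own, and the finding ID format is invented for illustration. The real engine's types and ID scheme may differ.

```go
package main

import "fmt"

// Simplified stand-ins for the engine's Match, Rule and finding types.
type match struct {
	File    string
	Line    int
	Snippet string
}

type rule struct {
	ID        string
	Languages []string // empty = all languages
	Severity  string
	Detect    func(content string) []match
}

type finding struct {
	ID, RuleID, Severity string
	Match                match
}

type engine struct {
	rules     []rule
	disabled  map[string]bool
	overrides map[string]string
}

// appliesTo implements the language filter.
func appliesTo(r rule, lang string) bool {
	if len(r.Languages) == 0 {
		return true
	}
	for _, l := range r.Languages {
		if l == lang {
			return true
		}
	}
	return false
}

func (e *engine) analyzeFile(path, lang, content string) []finding {
	var out []finding
	for _, r := range e.rules {
		if !appliesTo(r, lang) { // 1. language filter
			continue
		}
		if e.disabled[r.ID] { // 2. disabled check
			continue
		}
		for _, m := range r.Detect(content) { // 3. detector execution
			m.File = path // 4. location enrichment
			sev := r.Severity
			if o, ok := e.overrides[r.ID]; ok { // 5. severity override
				sev = o
			}
			out = append(out, finding{ // 6. finding generation
				ID:       fmt.Sprintf("%s:%s:%d", r.ID, path, m.Line), // invented ID format
				RuleID:   r.ID,
				Severity: sev,
				Match:    m,
			})
		}
	}
	return out
}

func main() {
	e := &engine{
		rules: []rule{{
			ID: "DEMO-001", Languages: []string{"python"}, Severity: "high",
			Detect: func(c string) []match { return []match{{Line: 1, Snippet: c}} },
		}},
		overrides: map[string]string{"DEMO-001": "critical"},
	}
	fmt.Println(e.analyzeFile("a.py", "python", "os.system(x)"))
	fmt.Println(len(e.analyzeFile("a.js", "javascript", "x"))) // filtered out by language
}
```

Doing the two cheap checks first means a detector never runs for a rule that is filtered out or disabled.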

Detection Methods

Detectors use three primary methods:

1. Raw Content Scanning

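// pattern is assumed to be a package-level *regexp.Regexp compiled once at init.
func (d *MyDetector) Detect(file *ast.File, _ *surface.MCPSurface) []Match {
    var matches []Match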
    if file.RawContent != "" {
        lines := strings.Split(file.RawContent, "\n")
        for lineNum, line := range lines {
            if pattern.MatchString(line) {
                matches = append(matches, Match{
                    Location: types.Location{
                        File:      file.Path,
                        StartLine: lineNum + 1,
                        EndLine:   lineNum + 1,
                    },
                    Snippet:    line,
                    Confidence: types.ConfidenceHigh,
                })
            }
        }
    }
    return matches
}

2. AST-Based Detection

func (d *MyDetector) Detect(file *ast.File, _ *surface.MCPSurface) []Match {
    var matches []Match
    for _, fn := range file.Functions {
        for _, stmt := range fn.Body {
            if call, ok := stmt.(*ast.ExpressionStatement); ok {
                if callExpr, ok := call.Expression.(*ast.Call); ok {
                    if isDangerous(callExpr.Function) {
                        matches = append(matches, Match{
                            Location:   callExpr.Location,
                            Snippet:    callExpr.Function,
                            Confidence: types.ConfidenceHigh,
                        })
                    }
                }
            }
        }
    }
    return matches
}

3. Surface-Based Detection

func (d *MyDetector) Detect(file *ast.File, surf *surface.MCPSurface) []Match {
    var matches []Match
    if surf != nil {
        for _, tool := range surf.Tools {
            if isSuspicious(tool.Description) {
                matches = append(matches, Match{
                    Location:   tool.Location,
                    Snippet:    tool.Description,
                    Context:    "Tool: " + tool.Name,
                    Confidence: types.ConfidenceHigh,
                })
            }
        }
    }
    return matches
}

Rule Categories by Load Function

LoadLifecycleRules (Class L)

Plugin lifecycle issues:

  • Insecure initialization
  • Missing cleanup
  • Resource leaks

LoadHiddenNetworkRules (Class M)

Hidden network activity:

  • Undocumented connections
  • Covert channels
  • Data exfiltration

LoadExtendedInjectionRules (Class G)

Extended prompt injection detection:

  • More injection patterns
  • Encoding evasion
  • Multi-language attacks

LoadPromptFlowRules (Class H)

Prompt flow analysis:

  • User input to LLM prompt
  • Declaration vs behavior mismatch

LoadMLRules (Class G)

ML-based detection:

  • Uses ml.Classifier for classification
  • See ML Classifier Documentation

Creating Custom Detectors

Regex-Based Detector

type RegexDetector struct {
    Pattern *regexp.Regexp
}

func (d *RegexDetector) Detect(file *ast.File, _ *surface.MCPSurface) []Match {
    var matches []Match

    if file.RawContent != "" {
        lines := strings.Split(file.RawContent, "\n")
        for lineNum, line := range lines {
            if d.Pattern.MatchString(line) {
                matches = append(matches, Match{
                    Location: types.Location{
                        File:      file.Path,
                        StartLine: lineNum + 1,
                        EndLine:   lineNum + 1,
                    },
                    Snippet:    strings.TrimSpace(line),
                    Confidence: types.ConfidenceMedium,
                })
            }
        }
    }

    return matches
}

Surface-Aware Detector

type MyToolDetector struct{}

func (d *MyToolDetector) Detect(file *ast.File, surf *surface.MCPSurface) []Match {
    var matches []Match

    if surf == nil {
        return matches
    }

    for _, tool := range surf.Tools {
        // Check tool properties
        if tool.Name == "" {
            matches = append(matches, Match{
                Location:   tool.Location,
                Snippet:    "unnamed tool",
                Context:    "Tool must have a name",
                Confidence: types.ConfidenceHigh,
            })
        }

        // Check tool parameters
        for _, param := range tool.Parameters {
            if param.Type == "" {
                matches = append(matches, Match{
                    Location:   param.Location,
                    Snippet:    param.Name,
                    Context:    "Parameter missing type",
                    Confidence: types.ConfidenceMedium,
                })
            }
        }
    }

    return matches
}

Configuration

# mcp-scan.yaml
rules:
  # Disable specific rules
  disabled:
    - MCP-E002  # Secret variable names (too noisy)
    - MCP-N001  # Lockfile check

  # Override severities
  severity_overrides:
    MCP-A003: critical  # Promote shell execution
    MCP-E001: high      # Demote hardcoded secrets

  # Custom rules
  custom:
    - id: CUSTOM-001
      pattern: "dangerous_pattern"
      severity: high
      confidence: medium
      description: "Custom dangerous pattern"
      remediation: "Remove dangerous pattern"
      languages: [python, javascript]
      class: A

Performance

Detection Complexity

Detector        Complexity   Notes
Regex-based     O(n*p)       n = lines, p = patterns
AST-based       O(n)         n = AST nodes
Surface-based   O(t)         t = tools

Optimization Tips

  1. Use raw content scanning for simple patterns - faster than AST
  2. Limit regex complexity - avoid backtracking
  3. Short-circuit - return early when possible
  4. Cache compiled regex - compile once at init
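
Tips 3 and 4 can be combined in a few lines. The sketch below is illustrative only: `evalCall` and `countEvalLines` are hypothetical names. Go's `regexp` package is RE2-based and runs in time linear in the input, so a complex pattern costs a larger constant factor per line but does not cause catastrophic backtracking.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Compiled once at package initialization and reused on every call,
// instead of calling regexp.MustCompile inside the scan loop.
var evalCall = regexp.MustCompile(`\beval\s*\(`)

// countEvalLines returns how many lines contain a call to eval.
func countEvalLines(content string) int {
	// Short-circuit: a cheap substring check skips the regex entirely
	// for files that cannot match.
	if !strings.Contains(content, "eval") {
		return 0
	}
	n := 0
	for _, line := range strings.Split(content, "\n") {
		if evalCall.MatchString(line) {
			n++
		}
	}
	return n
}

func main() {
	src := "x = eval(a)\ny = 1\nz = eval (b)\nw = medieval(c)"
	fmt.Println(countEvalLines(src))
	fmt.Println(countEvalLines("print(1)"))
}
```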

Thread Safety

The engine is thread-safe for analysis:

// Safe: concurrent file analysis
var wg sync.WaitGroup
for _, file := range files {
    wg.Add(1)
    go func(f *ast.File) {
        defer wg.Done()
        findings := engine.AnalyzeFile(f, surface)
        _ = findings // Process findings here; guard any shared state with a mutex or channel
    }(file)
}
wg.Wait()