Motor de Deteccion por Patrones¶

Documento tecnico para analistas de seguridad

1. Introduccion¶

El Pattern Engine es el motor de deteccion basado en reglas que busca patrones especificos en el codigo fuente. A diferencia del taint analysis que rastrea flujo de datos, el pattern engine detecta construcciones de codigo conocidas como peligrosas mediante expresiones regulares y analisis de AST.

2. Arquitectura del Pattern Engine¶

2.1 Componentes¶

+------------------+
|  Pattern Engine  |
+------------------+
        |
        v
+------------------+
|     Rules        |
| +------------+   |
| | Rule 1     |   |
| | Rule 2     |   |
| | ...        |   |
| +------------+   |
+------------------+
        |
        v
+------------------+     +------------------+
|    Detectors     |     | AST + Surface    |
| - Regex-based    |<--->|   (input)        |
| - AST-based      |     |                  |
| - Hybrid         |     |                  |
+------------------+     +------------------+
        |
        v
+------------------+
|     Matches      |
+------------------+

2.2 Codigo Base¶

Ubicacion: internal/pattern/engine.go

type Engine struct {
    rules             []*Rule
    severityOverrides map[string]types.Severity
    disabledRules     map[string]bool
}

3. Estructura de una Regla¶

3.1 Definicion de Rule¶

type Rule struct {
    ID          string              // Identificador unico (MCP-X001)
    Class       types.VulnClass     // Clase de vulnerabilidad (A-N)
    Language    []types.Language    // Lenguajes aplicables
    Severity    types.Severity      // critical/high/medium/low/info
    Confidence  types.Confidence    // high/medium/low
    Description string              // Descripcion del problema
    Remediation string              // Como solucionarlo
    Detector    Detector            // Logica de deteccion
}

3.2 Convencion de IDs¶

El formato de ID sigue el patron MCP-X###:

Componente	Significado
`MCP-`	Prefijo del proyecto
`X`	Letra de clase (A-N)
`###`	Numero secuencial (001-999)

Ejemplos: - MCP-A003: Tercera regla de clase A (RCE) - MCP-B002: Segunda regla de clase B (Filesystem) - MCP-G001: Primera regla de clase G (Tool Poisoning)

4. Interface Detector¶

4.1 Definicion¶

type Detector interface {
    Detect(file *ast.File, surface *surface.MCPSurface) []Match
}

Todo detector debe implementar esta interface. Recibe: - file: AST parseado del archivo - surface: Superficie MCP extraida (tools, resources, etc.)

Retorna lista de Match encontrados.

4.2 Estructura de Match¶

type Match struct {
    Location    types.Location      // Posicion en el archivo
    Snippet     string              // Fragmento de codigo
    Context     string              // Contexto adicional
    Confidence  types.Confidence    // Puede override confianza de regla
    RuleID      string              // Puede override ID de regla
    Title       string              // Puede override titulo
    Description string              // Puede override descripcion
    Severity    types.Severity      // Puede override severidad
    Class       types.VulnClass     // Puede override clase
    Remediation string              // Puede override remediacion
    Evidence    Evidence            // Evidencia extendida
}

5. Tipos de Detectores¶

5.1 RegexDetector (Basado en Regex)¶

El detector mas simple, busca patrones con expresiones regulares:

type RegexDetector struct {
    Pattern *regexp.Regexp
}

func (d *RegexDetector) Detect(file *ast.File, _ *surface.MCPSurface) []Match {
    var matches []Match

    if file.RawContent != "" {
        lines := strings.Split(file.RawContent, "\n")
        for lineNum, line := range lines {
            if d.Pattern.MatchString(line) {
                matches = append(matches, Match{
                    Location: types.Location{
                        StartLine: lineNum + 1,
                    },
                    Snippet: strings.TrimSpace(line),
                })
            }
        }
    }
    return matches
}

Uso:

engine.AddCustomRule("CUSTOM-001", `os\.system\(`, types.SeverityCritical, ...)

5.2 AST-Based Detectors¶

Detectores que analizan la estructura del AST:

type DangerousFunctionDetector struct{}

var dangerousFunctions = map[string]bool{
    "eval":    true,
    "exec":    true,
    "compile": true,
}

func (d *DangerousFunctionDetector) Detect(file *ast.File, _ *surface.MCPSurface) []Match {
    var matches []Match

    for _, fn := range file.Functions {
        for _, stmt := range fn.Body {
            if exprStmt, ok := stmt.(*ast.ExpressionStatement); ok {
                if call, ok := exprStmt.Expression.(*ast.Call); ok {
                    if dangerousFunctions[call.Function] {
                        matches = append(matches, Match{
                            Location: call.Location,
                            Snippet:  call.Function,
                        })
                    }
                }
            }
        }
    }
    return matches
}

5.3 Surface-Aware Detectors¶

Detectores que utilizan la superficie MCP:

type PromptInjectionDetector struct{}

var injectionMarkers = []string{
    "ignore previous",
    "disregard",
    "you are now",
    "act as",
}

func (d *PromptInjectionDetector) Detect(file *ast.File, surf *surface.MCPSurface) []Match {
    var matches []Match

    if surf != nil {
        for _, tool := range surf.Tools {
            descLower := strings.ToLower(tool.Description)
            for _, marker := range injectionMarkers {
                if strings.Contains(descLower, marker) {
                    matches = append(matches, Match{
                        Location: tool.Location,
                        Snippet:  tool.Description,
                        Context:  "Tool: " + tool.Name,
                    })
                    break
                }
            }
        }
    }
    return matches
}

5.4 Hybrid Detectors (Regex + AST)¶

Detectores que combinan multiples tecnicas:

type DirectShellDetector struct{}

var shellPatterns = []*regexp.Regexp{
    regexp.MustCompile(`(?i)os\.system\s*\(`),
    regexp.MustCompile(`(?i)subprocess\.call\s*\([^,]*,\s*shell\s*=\s*True`),
    regexp.MustCompile(`(?i)child_process\.exec\s*\(`),
}

func (d *DirectShellDetector) Detect(file *ast.File, _ *surface.MCPSurface) []Match {
    var matches []Match

    // Primero intentar regex en raw content (mas preciso)
    if file.RawContent != "" {
        lines := strings.Split(file.RawContent, "\n")
        for lineNum, line := range lines {
            for _, pattern := range shellPatterns {
                if pattern.MatchString(line) {
                    matches = append(matches, Match{
                        Location: types.Location{StartLine: lineNum + 1},
                        Snippet:  strings.TrimSpace(line),
                    })
                    break
                }
            }
        }
        return matches
    }

    // Fallback a AST si no hay raw content
    for _, fn := range file.Functions {
        // ... analisis AST
    }
    return matches
}

6. Reglas Implementadas¶

6.1 Clase A - RCE¶

Rule ID	Detector	Descripcion
MCP-A003	DirectShellDetector	Ejecucion directa de shell
MCP-A004	DangerousFunctionDetector	eval/exec/compile

Patrones de MCP-A003:

(?i)os\.system\s*\(
(?i)subprocess\.call\s*\([^,]*,\s*shell\s*=\s*True
(?i)subprocess\.run\s*\([^,]*,\s*shell\s*=\s*True
(?i)subprocess\.Popen\s*\([^,]*,\s*shell\s*=\s*True
(?i)child_process\.exec\s*\(
(?i)child_process\.execSync\s*\(
(?i)execSync\s*\(

Patrones de MCP-A004:

\beval\s*\(
\bexec\s*\(
\bcompile\s*\(
\b__import__\s*\(
\bnew\s+Function\s*\(

6.2 Clase B - Filesystem¶

Rule ID	Detector	Descripcion
MCP-B002	PathTraversalPatternDetector	Patron de path traversal

Patrones:

\.\.\/
\.\.\\
%2e%2e%2f
%2e%2e/
\.\.%2f

6.3 Clase C - SSRF¶

Rule ID	Detector	Descripcion
MCP-C002	UnvalidatedURLDetector	URL sin validar

Funciones monitoreadas: - requests.get, requests.post, requests.put, requests.delete - fetch - axios.get, axios.post - http.get - urllib.request.urlopen

6.4 Clase D - SQLi¶

Rule ID	Detector	Descripcion
MCP-D002	SQLConcatDetector	Concatenacion SQL

Patrones:

(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*\+.*
(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*%s
(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*\$\{
(?i)f["'].*SELECT.*\{
(?i)execute\s*\(\s*["'].*\+

6.5 Clase E - Secrets¶

Rule ID	Detector	Descripcion
MCP-E001	HardcodedSecretDetector	Secretos hardcodeados
MCP-E002	SecretVariableDetector	Variables con nombres sospechosos
MCP-E005	SecretLoggingDetector	Logging de secretos

Patrones de MCP-E001:

(?i)(api[_-]?key|apikey)\s*[:=]\s*["']([A-Za-z0-9_\-]{20,})["']
(?i)(secret|password|passwd|pwd)\s*[:=]\s*["']([^"']{8,})["']
(?i)(token|auth[_-]?token)\s*[:=]\s*["']([A-Za-z0-9_\-\.]{20,})["']
(?i)(private[_-]?key)\s*[:=]\s*["']([^"']+)["']
(?i)bearer\s+[A-Za-z0-9_\-\.]{20,}
(?i)ghp_[A-Za-z0-9]{36}           # GitHub token
(?i)sk-[A-Za-z0-9]{48}            # OpenAI key
(?i)AKIA[A-Z0-9]{16}              # AWS access key
(?i)-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY

Patron de MCP-E002:

(?i)(api[_-]?key|secret|password|token|credential|auth)

6.6 Clase F - Auth/OAuth¶

Rule ID	Detector	Descripcion
MCP-F001	InsecureCookieDetector	Cookie sin Secure
MCP-F002	WeakJWTDetector	JWT debil
MCP-F003	OAuthStateDetector	OAuth sin state

Patrones de MCP-F002:

(?i)algorithms?\s*[=:]\s*\[?\s*["']none["']
(?i)verify\s*[=:]\s*False
(?i)verify\s*[=:]\s*false
(?i)options\s*:\s*\{[^}]*verify\s*:\s*false
(?i)ignoreExpiration\s*:\s*true

6.7 Clase G - Tool Poisoning¶

Rule ID	Detector	Descripcion
MCP-G001	PromptInjectionDetector	Injection en descripcion
MCP-G002	UnicodeDetector	Unicode sospechoso
MCP-G003	ToolShadowingDetector	Shadowing de herramientas

Marcadores de inyeccion (MCP-G001):

ignore previous
ignore all instructions
disregard
system prompt
you are now
act as
pretend to be
forget your instructions
new instructions
override
<important>
</important>
<hidden>
</hidden>
<system>
</system>
<instruction>
</instruction>
do not mention
do not reveal
secretly
must first
you must
include it in your response

Caracteres Unicode sospechosos (MCP-G002):

\u202E  - RTL override
\u202D  - LTR override
\u202C  - POP directional formatting
\u200B  - Zero-width space
\u200C  - Zero-width non-joiner
\u200D  - Zero-width joiner
\uFEFF  - BOM / Zero-width no-break space

Herramientas shadowed (MCP-G003):

shell, exec, run, execute, terminal
bash, sh, cmd, system, eval
python, node, npm, pip
curl, wget, sudo, admin, root

6.8 Clase N - Supply Chain¶

Rule ID	Detector	Descripcion
MCP-N001	LockfileDetector	Sin lockfile
MCP-N002	UntrustedDependencyDetector	Dependencia no confiable
MCP-N003	SuspiciousSetupDetector	Setup sospechoso

Patrones de MCP-N002:

(?i)git\+https?://
(?i)git\+ssh://
(?i)github\.com/[^/]+/[^/]+\.git
(?i)file://
(?i)http://   # non-HTTPS

Patrones de MCP-N003:

(?i)curl.*\|.*sh
(?i)wget.*\|.*sh
(?i)curl.*\|.*bash
(?i)wget.*\|.*bash
(?i)base64.*decode
(?i)reverse.*shell
(?i)nc\s+-e
(?i)netcat.*-e

7. Reglas Extendidas¶

7.1 Carga de Reglas¶

El engine carga multiples conjuntos de reglas:

func (e *Engine) loadRules() {
    e.LoadLifecycleRules()          // Clase L
    e.LoadHiddenNetworkRules()      // Clase M
    e.LoadExtendedInjectionRules()  // Clase G extendido
    e.LoadPromptFlowRules()         // Clase H
    e.LoadMLRules()                 // ML-based (Clase G)
    // ... reglas core
}

7.2 Reglas ML-Based¶

Integran el clasificador ML para tool poisoning:

type MLDetector struct {
    classifier ml.Classifier
    threshold  float64
}

func (d *MLDetector) Detect(file *ast.File, surf *surface.MCPSurface) []Match {
    var matches []Match

    if surf != nil {
        for _, tool := range surf.Tools {
            result := d.classifier.Classify(tool.Description)
            if result.IsInjection && result.Probability >= d.threshold {
                matches = append(matches, Match{
                    Location:   tool.Location,
                    Snippet:    tool.Description,
                    Confidence: mapConfidence(result.Confidence),
                    Evidence: Evidence{
                        LLMAnalysis:   result.Reason,
                        LLMConfidence: result.Probability,
                        LLMCategory:   result.Category,
                    },
                })
            }
        }
    }
    return matches
}

7.3 Reglas LLM-Based¶

Utilizan LLM para analisis semantico:

type LLMDetector struct {
    detector *llm.Detector
}

func (d *LLMDetector) Detect(file *ast.File, surf *surface.MCPSurface) []Match {
    // Analiza descripciones de tools con LLM
    // Retorna matches con Evidence.LLMAnalysis poblado
}

7.4 Reglas CodeQL-Based¶

Usan CodeQL para confirmacion secundaria:

type CodeQLDetector struct {
    client *codeql.Client
}

func (d *CodeQLDetector) Detect(file *ast.File, _ *surface.MCPSurface) []Match {
    // Ejecuta queries CodeQL
    // Retorna matches con Evidence.CodeQLConfirmed = true
}

8. Configuracion de Reglas¶

8.1 Deshabilitar Reglas¶

# .mcp-scan.yaml
rules:
  disabled:
    - MCP-E001  # No buscar secretos hardcodeados
    - MCP-F001  # No verificar cookies

Implementacion:

func (e *Engine) SetDisabledRule(ruleID string) {
    e.disabledRules[ruleID] = true
}

func (e *Engine) IsRuleDisabled(ruleID string) bool {
    return e.disabledRules[ruleID]
}

8.2 Override de Severidad¶

rules:
  severity_overrides:
    MCP-A003: critical  # Elevar a critico
    MCP-E002: info      # Bajar a informativo

Implementacion:

func (e *Engine) SetSeverityOverride(ruleID string, severity types.Severity) {
    e.severityOverrides[ruleID] = severity
}

func (e *Engine) GetEffectiveSeverity(ruleID string, defaultSeverity types.Severity) types.Severity {
    if override, ok := e.severityOverrides[ruleID]; ok {
        return override
    }
    return defaultSeverity
}

8.3 Reglas Custom¶

rules:
  custom:
    - id: "CUSTOM-001"
      pattern: "dangerous_function\\("
      severity: high
      confidence: medium
      class: A
      description: "Uso de funcion peligrosa"
      remediation: "Usar funcion_segura en su lugar"
      languages:
        - python
        - javascript

Implementacion:

href="#__codelineno-31-1">func (e *Engine) AddCustomRule( id, pattern string, severity types.Severity, confidence types.Confidence, description, remediation string, languages []types.Language, class types.VulnClass, class="w"> error { compiledPattern, err := regexp.Compile(pattern) if err != nil { return fmt.Errorf("invalid regex: %w", err) } rule := &Rule{ ID: id, Class: class, Language: languages, Severity: severity, Confidence: confidence, Description: description, Remediation: remediation, Detector: &RegexDetector{Pattern: compiledPattern}, } e.rules = append(e.rules, rule) return nil }

9. Filtrado por Lenguaje¶

9.1 Reglas Language-Specific¶

e.rules = append(e.rules, &Rule{
    ID:       "MCP-A003",
    Language: []types.Language{types.Python},  // Solo Python
    Detector: &DirectShellDetector{},
})

9.2 Logica de Filtrado¶

func (e *Engine) AnalyzeFile(file *ast.File, surf *surface.MCPSurface) []types.Finding {
    var findings []types.Finding

    for _, rule := range e.rules {
        // Saltar reglas deshabilitadas
        if e.IsRuleDisabled(rule.ID) {
            continue
        }

        // Verificar filtro de lenguaje
        if len(rule.Language) > 0 {
            var langMatch bool
            for _, lang := range rule.Language {
                if lang == file.Language {
                    langMatch = true
                    break
                }
            }
            if !langMatch {
                continue  // No aplica a este lenguaje
            }
        }

        // Ejecutar detector
        matches := rule.Detector.Detect(file, surf)
        // ... procesar matches
    }
    return findings
}

10. Generacion de Findings¶

10.1 De Match a Finding¶

for _, match := range matches {
    match.Location.File = file.Path

    // Usar overrides de match si existen, sino usar defaults de regla
    ruleID := rule.ID
    if match.RuleID != "" {
        ruleID = match.RuleID
    }

    severity := e.GetEffectiveSeverity(ruleID, rule.Severity)
    if match.Severity != "" {
        severity = match.Severity
    }

    confidence := match.Confidence
    if confidence == "" {
        confidence = rule.Confidence
    }

    finding := types.Finding{
        RuleID:      ruleID,
        Severity:    severity,
        Confidence:  confidence,
        Class:       rule.Class,
        Language:    file.Language,
        Location:    match.Location,
        Evidence:    convertEvidence(match.Evidence),
        Description: rule.Description,
        Remediation: rule.Remediation,
    }
    finding.ID = finding.GenerateID()
    findings = append(findings, finding)
}

10.2 Generacion de ID Unico¶

func (f *Finding) GenerateID() string {
    // Componentes del ID
    data := fmt.Sprintf("%s|%s|%d|%s",
        f.RuleID,
        f.Location.File,
        f.Location.StartLine,
        f.Evidence.Snippet,
    )

    // SHA-256 truncado a 16 caracteres hex
    hash := sha256.Sum256([]byte(data))
    return hex.EncodeToString(hash[:])[:16]
}

11. Orden de Ejecucion de Reglas¶

11.1 Prioridad¶

Lifecycle Rules (L) - Cargadas primero
Hidden Network Rules (M) - Segundo
Extended Injection (G) - Tercero
Prompt Flow (H) - Cuarto
ML Rules (G) - Quinto
Core Rules (A-G, N) - Ultimo

11.2 Deduplicacion¶

Si multiples reglas detectan el mismo problema, se deduplica por ID:

func NormalizeFindings(findings []Finding) []Finding {
    seen := make(map[string]bool)
    var unique []Finding

    for _, f := range findings {
        if !seen[f.ID] {
            seen[f.ID] = true
            unique = append(unique, f)
        }
    }

    return unique
}

12. Limitaciones del Pattern Engine¶

12.1 Falsos Positivos¶

Regex demasiado amplio: Puede matchear codigo benigno
Codigo comentado: El regex no distingue comentarios
Strings: Patron en string literal != uso real
Dead code: Codigo nunca ejecutado

12.2 Falsos Negativos¶

Ofuscacion: ev + al() evita eval\(
Indirection: getattr(module, "system")(cmd)
Encoding: Base64/hex encoding del codigo
Alias: from os import system as s

12.3 Recomendaciones¶

Verificar manualmente hallazgos criticos
Combinar con taint analysis para mejor precision
Usar baseline para hallazgos conocidos/aceptados
Ajustar confianza de reglas segun contexto

Siguiente documento: clasificador-ml.md