
Pattern-Based Detection Engine

Technical document for security analysts


1. Introduction

The Pattern Engine is the rule-based detection engine that searches source code for specific patterns. Unlike taint analysis, which tracks data flow, the Pattern Engine flags code constructs known to be dangerous, using regular expressions and AST analysis.


2. Pattern Engine Architecture

2.1 Components

+------------------+
|  Pattern Engine  |
+------------------+
        |
        v
+------------------+
|     Rules        |
| +------------+   |
| | Rule 1     |   |
| | Rule 2     |   |
| | ...        |   |
| +------------+   |
+------------------+
        |
        v
+------------------+     +------------------+
|    Detectors     |     | AST + Surface    |
| - Regex-based    |<--->|   (input)        |
| - AST-based      |     |                  |
| - Hybrid         |     |                  |
+------------------+     +------------------+
        |
        v
+------------------+
|     Matches      |
+------------------+

2.2 Core Code

Location: internal/pattern/engine.go

type Engine struct {
    rules             []*Rule
    severityOverrides map[string]types.Severity
    disabledRules     map[string]bool
}

3. Rule Structure

3.1 Rule Definition

type Rule struct {
    ID          string              // Unique identifier (MCP-X001)
    Class       types.VulnClass     // Vulnerability class (A-N)
    Language    []types.Language    // Applicable languages
    Severity    types.Severity      // critical/high/medium/low/info
    Confidence  types.Confidence    // high/medium/low
    Description string              // Description of the problem
    Remediation string              // How to fix it
    Detector    Detector            // Detection logic
}

3.2 ID Convention

Rule IDs follow the pattern MCP-X###:

Component  Meaning
MCP-       Project prefix
X          Class letter (A-N)
###        Sequential number (001-999)

Examples:
- MCP-A003: third rule of class A (RCE)
- MCP-B002: second rule of class B (Filesystem)
- MCP-G001: first rule of class G (Tool Poisoning)
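
The convention above can be checked mechanically. The following is a minimal sketch, not engine code: `parseRuleID` and `ruleIDPattern` are hypothetical names introduced here, and rejecting `000` is an assumption based on the stated 001-999 range.

```go
package main

import (
	"fmt"
	"regexp"
)

// ruleIDPattern encodes the MCP-X### convention: the "MCP-" prefix, one class
// letter in A-N, and exactly three digits.
var ruleIDPattern = regexp.MustCompile(`^MCP-([A-N])(\d{3})$`)

// parseRuleID is a hypothetical helper (not part of the engine). It returns
// the class letter and sequence number, or ok=false when the ID does not
// follow the convention. 000 is rejected because numbering starts at 001.
func parseRuleID(id string) (class string, seq int, ok bool) {
	m := ruleIDPattern.FindStringSubmatch(id)
	if m == nil {
		return "", 0, false
	}
	// The regex guarantees three ASCII digits, so convert them by hand.
	for _, c := range m[2] {
		seq = seq*10 + int(c-'0')
	}
	if seq == 0 {
		return "", 0, false
	}
	return m[1], seq, true
}

func main() {
	for _, id := range []string{"MCP-A003", "MCP-G001", "MCP-Z001", "MCP-A000", "CUSTOM-001"} {
		class, seq, ok := parseRuleID(id)
		fmt.Println(id, class, seq, ok)
	}
}
```

Custom rule IDs such as CUSTOM-001 (section 8.3) fall outside this convention, so a check like this would only apply to built-in rules.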


4. Detector Interface

4.1 Definition

type Detector interface {
    Detect(file *ast.File, surface *surface.MCPSurface) []Match
}

Every detector must implement this interface. It receives:
- file: the parsed AST of the file
- surface: the extracted MCP surface (tools, resources, etc.)

It returns the list of Match values found.
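
To make the contract concrete, here is a self-contained sketch of a detector that satisfies an interface of the same shape. `File`, `MCPSurface` and `Match` are simplified stand-ins for the project's `ast.File`, `surface.MCPSurface` and `Match`, and `KeywordDetector` is a hypothetical detector invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// Stand-in types: reduced versions of ast.File, surface.MCPSurface and Match,
// keeping only the fields this sketch needs.
type File struct{ RawContent string }
type MCPSurface struct{}
type Match struct {
	StartLine int
	Snippet   string
}

// Detector has the same shape as the interface in section 4.1.
type Detector interface {
	Detect(file *File, surf *MCPSurface) []Match
}

// KeywordDetector is a hypothetical detector that reports every line
// containing Keyword.
type KeywordDetector struct{ Keyword string }

// Compile-time check that *KeywordDetector satisfies Detector.
var _ Detector = (*KeywordDetector)(nil)

func (d *KeywordDetector) Detect(file *File, _ *MCPSurface) []Match {
	var matches []Match
	if file == nil {
		return matches
	}
	for i, line := range strings.Split(file.RawContent, "\n") {
		if strings.Contains(line, d.Keyword) {
			// Line numbers are 1-based, as in the engine's detectors.
			matches = append(matches, Match{StartLine: i + 1, Snippet: strings.TrimSpace(line)})
		}
	}
	return matches
}

func main() {
	f := &File{RawContent: "import pickle\n\ndata = pickle.loads(blob)\n"}
	d := &KeywordDetector{Keyword: "pickle.loads("}
	for _, m := range d.Detect(f, nil) {
		fmt.Printf("line %d: %s\n", m.StartLine, m.Snippet)
	}
}
```

A detector that does not need the MCP surface can ignore the second argument, as the sketch does.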

4.2 Match Structure

type Match struct {
    Location    types.Location      // Position in the file
    Snippet     string              // Code fragment
    Context     string              // Additional context
    Confidence  types.Confidence    // May override the rule's confidence
    RuleID      string              // May override the rule's ID
    Title       string              // May override the title
    Description string              // May override the description
    Severity    types.Severity      // May override the severity
    Class       types.VulnClass     // May override the class
    Remediation string              // May override the remediation
    Evidence    Evidence            // Extended evidence
}

5. Detector Types

5.1 RegexDetector (Regex-Based)

The simplest detector. It applies a regular expression to the raw file content one line at a time, so a construct split across several lines is not matched:

type RegexDetector struct {
    Pattern *regexp.Regexp
}

func (d *RegexDetector) Detect(file *ast.File, _ *surface.MCPSurface) []Match {
    var matches []Match

    if file.RawContent != "" {
        lines := strings.Split(file.RawContent, "\n")
        for lineNum, line := range lines {
            if d.Pattern.MatchString(line) {
                matches = append(matches, Match{
                    Location: types.Location{
                        StartLine: lineNum + 1,
                    },
                    Snippet: strings.TrimSpace(line),
                })
            }
        }
    }
    return matches
}

Usage:

engine.AddCustomRule("CUSTOM-001", `os\.system\(`, types.SeverityCritical, ...)

5.2 AST-Based Detectors

Detectors that analyze the structure of the AST. The example below inspects only the top-level expression statements of each function body:

type DangerousFunctionDetector struct{}

var dangerousFunctions = map[string]bool{
    "eval":    true,
    "exec":    true,
    "compile": true,
}

func (d *DangerousFunctionDetector) Detect(file *ast.File, _ *surface.MCPSurface) []Match {
    var matches []Match

    for _, fn := range file.Functions {
        for _, stmt := range fn.Body {
            if exprStmt, ok := stmt.(*ast.ExpressionStatement); ok {
                if call, ok := exprStmt.Expression.(*ast.Call); ok {
                    if dangerousFunctions[call.Function] {
                        matches = append(matches, Match{
                            Location: call.Location,
                            Snippet:  call.Function,
                        })
                    }
                }
            }
        }
    }
    return matches
}

5.3 Surface-Aware Detectors

Detectors that use the MCP surface:

type PromptInjectionDetector struct{}

var injectionMarkers = []string{
    "ignore previous",
    "disregard",
    "you are now",
    "act as",
}

func (d *PromptInjectionDetector) Detect(file *ast.File, surf *surface.MCPSurface) []Match {
    var matches []Match

    if surf != nil {
        for _, tool := range surf.Tools {
            descLower := strings.ToLower(tool.Description)
            for _, marker := range injectionMarkers {
                if strings.Contains(descLower, marker) {
                    matches = append(matches, Match{
                        Location: tool.Location,
                        Snippet:  tool.Description,
                        Context:  "Tool: " + tool.Name,
                    })
                    break
                }
            }
        }
    }
    return matches
}

5.4 Hybrid Detectors (Regex + AST)

Detectors that combine multiple techniques:

type DirectShellDetector struct{}

var shellPatterns = []*regexp.Regexp{
    regexp.MustCompile(`(?i)os\.system\s*\(`),
    regexp.MustCompile(`(?i)subprocess\.call\s*\([^,]*,\s*shell\s*=\s*True`),
    regexp.MustCompile(`(?i)child_process\.exec\s*\(`),
}

func (d *DirectShellDetector) Detect(file *ast.File, _ *surface.MCPSurface) []Match {
    var matches []Match

    // Try regex on the raw content first (more precise)
    if file.RawContent != "" {
        lines := strings.Split(file.RawContent, "\n")
        for lineNum, line := range lines {
            for _, pattern := range shellPatterns {
                if pattern.MatchString(line) {
                    matches = append(matches, Match{
                        Location: types.Location{StartLine: lineNum + 1},
                        Snippet:  strings.TrimSpace(line),
                    })
                    break
                }
            }
        }
        return matches
    }

    // Fall back to the AST when there is no raw content
    for _, fn := range file.Functions {
        // ... AST analysis
    }
    return matches
}

6. Implemented Rules

6.1 Class A - RCE

Rule ID   Detector                   Description
MCP-A003  DirectShellDetector        Direct shell execution
MCP-A004  DangerousFunctionDetector  eval/exec/compile

MCP-A003 patterns:

(?i)os\.system\s*\(
(?i)subprocess\.call\s*\([^,]*,\s*shell\s*=\s*True
(?i)subprocess\.run\s*\([^,]*,\s*shell\s*=\s*True
(?i)subprocess\.Popen\s*\([^,]*,\s*shell\s*=\s*True
(?i)child_process\.exec\s*\(
(?i)child_process\.execSync\s*\(
(?i)execSync\s*\(

MCP-A004 patterns:

\beval\s*\(
\bexec\s*\(
\bcompile\s*\(
\b__import__\s*\(
\bnew\s+Function\s*\(

6.2 Class B - Filesystem

Rule ID   Detector                      Description
MCP-B002  PathTraversalPatternDetector  Path traversal pattern

Patterns:

\.\.\/
\.\.\\
%2e%2e%2f
%2e%2e/
\.\.%2f
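
The sketch below compiles the five patterns exactly as listed and runs them against a few sample strings. `matchesTraversal` and the samples are illustrative assumptions. Unlike most other rule patterns in this document, these are listed without `(?i)`, so percent-encoding with uppercase hex digits is not matched by the patterns as written.

```go
package main

import (
	"fmt"
	"regexp"
)

// The five MCP-B002 patterns, copied verbatim from the list above.
var traversalPatterns = []*regexp.Regexp{
	regexp.MustCompile(`\.\.\/`),
	regexp.MustCompile(`\.\.\\`),
	regexp.MustCompile(`%2e%2e%2f`),
	regexp.MustCompile(`%2e%2e/`),
	regexp.MustCompile(`\.\.%2f`),
}

// matchesTraversal (hypothetical helper) reports whether any listed pattern
// occurs in s.
func matchesTraversal(s string) bool {
	for _, p := range traversalPatterns {
		if p.MatchString(s) {
			return true
		}
	}
	return false
}

func main() {
	samples := []string{
		`open("../../etc/passwd")`,
		`path = "..\\windows\\system32"`,
		`url = "/files/%2e%2e%2fsecret"`,
		`url = "/files/%2E%2E%2Fsecret"`, // uppercase hex digits
		`open("data/report.txt")`,
	}
	for _, s := range samples {
		fmt.Printf("%-36s %v\n", s, matchesTraversal(s))
	}
}
```

Only the uppercase-encoded sample and the benign path are left unflagged.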

6.3 Class C - SSRF

Rule ID   Detector                Description
MCP-C002  UnvalidatedURLDetector  Unvalidated URL

Monitored functions:
- requests.get, requests.post, requests.put, requests.delete
- fetch
- axios.get, axios.post
- http.get
- urllib.request.urlopen

6.4 Class D - SQLi

Rule ID   Detector           Description
MCP-D002  SQLConcatDetector  SQL concatenation

Patterns:

(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*\+.*
(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*%s
(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*\$\{
(?i)f["'].*SELECT.*\{
(?i)execute\s*\(\s*["'].*\+
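
To show how these patterns behave on real-looking lines, the sketch below compiles them as listed and classifies four samples. `flagsSQL` and the sample lines are assumptions made for illustration. Two of the samples are benign yet still flagged: a driver-side `%s` placeholder, and an identifier that merely starts with "select".

```go
package main

import (
	"fmt"
	"regexp"
)

// The MCP-D002 patterns, copied verbatim from the list above.
var sqlPatterns = []*regexp.Regexp{
	regexp.MustCompile(`(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*\+.*`),
	regexp.MustCompile(`(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*%s`),
	regexp.MustCompile(`(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*\$\{`),
	regexp.MustCompile(`(?i)f["'].*SELECT.*\{`),
	regexp.MustCompile(`(?i)execute\s*\(\s*["'].*\+`),
}

// flagsSQL (hypothetical helper) reports whether any pattern matches the line.
func flagsSQL(line string) bool {
	for _, p := range sqlPatterns {
		if p.MatchString(line) {
			return true
		}
	}
	return false
}

func main() {
	lines := []string{
		`cur.execute("SELECT * FROM users WHERE name = '" + name + "'")`, // concatenation
		`cur.execute("SELECT * FROM users WHERE id = ?", (uid,))`,        // parameterized
		`cur.execute("SELECT * FROM users WHERE id = %s", (uid,))`,       // placeholder, still flagged
		`selected = count + 1`,                                           // not SQL, still flagged
	}
	for _, l := range lines {
		fmt.Println(flagsSQL(l), l)
	}
}
```

Findings from this rule are therefore best read as candidates for review, which matches the limitations discussed in section 12.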

6.5 Class E - Secrets

Rule ID   Detector                 Description
MCP-E001  HardcodedSecretDetector  Hardcoded secrets
MCP-E002  SecretVariableDetector   Variables with suspicious names
MCP-E005  SecretLoggingDetector    Logging of secrets

MCP-E001 patterns:

(?i)(api[_-]?key|apikey)\s*[:=]\s*["']([A-Za-z0-9_\-]{20,})["']
(?i)(secret|password|passwd|pwd)\s*[:=]\s*["']([^"']{8,})["']
(?i)(token|auth[_-]?token)\s*[:=]\s*["']([A-Za-z0-9_\-\.]{20,})["']
(?i)(private[_-]?key)\s*[:=]\s*["']([^"']+)["']
(?i)bearer\s+[A-Za-z0-9_\-\.]{20,}
(?i)ghp_[A-Za-z0-9]{36}           # GitHub token
(?i)sk-[A-Za-z0-9]{48}            # OpenAI key
(?i)AKIA[A-Z0-9]{16}              # AWS access key
(?i)-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY

MCP-E002 pattern:

(?i)(api[_-]?key|secret|password|token|credential|auth)
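
The difference between the two rules is easier to see on sample lines. The sketch below uses two of the MCP-E001 patterns and the MCP-E002 pattern as listed. `classify` is a hypothetical helper, and checking E001 before E002 is an assumption of this sketch, not documented engine behavior.

```go
package main

import (
	"fmt"
	"regexp"
)

// Two of the MCP-E001 patterns and the MCP-E002 pattern, copied from the lists above.
var (
	passwordLiteral = regexp.MustCompile(`(?i)(secret|password|passwd|pwd)\s*[:=]\s*["']([^"']{8,})["']`)
	awsAccessKey    = regexp.MustCompile(`(?i)AKIA[A-Z0-9]{16}`)
	suspiciousName  = regexp.MustCompile(`(?i)(api[_-]?key|secret|password|token|credential|auth)`)
)

// classify (hypothetical helper) returns the first rule whose pattern matches
// the line, or "" when none does.
func classify(line string) string {
	switch {
	case passwordLiteral.MatchString(line), awsAccessKey.MatchString(line):
		return "MCP-E001"
	case suspiciousName.MatchString(line):
		return "MCP-E002"
	}
	return ""
}

func main() {
	lines := []string{
		`password = "hunter2hunter2"`,          // literal value: E001
		`aws_id = "AKIAABCDEFGHIJKLMNOP"`,      // made-up key in AWS format: E001
		`password = os.environ["DB_PASSWORD"]`, // name only: E002
		`author = "Jane"`,                      // "auth" inside "author": E002
		`timeout = 30`,                         // no match
	}
	for _, l := range lines {
		fmt.Printf("%-40s %q\n", l, classify(l))
	}
}
```

The `author` line shows why MCP-E002 is a natural candidate for a severity override such as the one in section 8.2: the name pattern has no word boundaries and matches substrings.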

6.6 Class F - Auth/OAuth

Rule ID   Detector                Description
MCP-F001  InsecureCookieDetector  Cookie without the Secure flag
MCP-F002  WeakJWTDetector         Weak JWT
MCP-F003  OAuthStateDetector      OAuth without state

MCP-F002 patterns:

(?i)algorithms?\s*[=:]\s*\[?\s*["']none["']
(?i)verify\s*[=:]\s*False
(?i)verify\s*[=:]\s*false
(?i)options\s*:\s*\{[^}]*verify\s*:\s*false
(?i)ignoreExpiration\s*:\s*true

6.7 Class G - Tool Poisoning

Rule ID   Detector                 Description
MCP-G001  PromptInjectionDetector  Injection in the tool description
MCP-G002  UnicodeDetector          Suspicious Unicode
MCP-G003  ToolShadowingDetector    Tool shadowing

Injection markers (MCP-G001):

ignore previous
ignore all instructions
disregard
system prompt
you are now
act as
pretend to be
forget your instructions
new instructions
override
<important>
</important>
<hidden>
</hidden>
<system>
</system>
<instruction>
</instruction>
do not mention
do not reveal
secretly
must first
you must
include it in your response

Suspicious Unicode characters (MCP-G002):

\u202E  - RTL override
\u202D  - LTR override
\u202C  - POP directional formatting
\u200B  - Zero-width space
\u200C  - Zero-width non-joiner
\u200D  - Zero-width joiner
\uFEFF  - BOM / Zero-width no-break space
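
A scan for these code points is short to write. The following is a sketch under assumptions: `scanUnicode`, `Hit` and the sample description are invented here, and the actual UnicodeDetector may report differently.

```go
package main

import "fmt"

// suspiciousRunes mirrors the MCP-G002 list above.
var suspiciousRunes = map[rune]string{
	'\u202E': "RTL override",
	'\u202D': "LTR override",
	'\u202C': "POP directional formatting",
	'\u200B': "zero-width space",
	'\u200C': "zero-width non-joiner",
	'\u200D': "zero-width joiner",
	'\uFEFF': "BOM / zero-width no-break space",
}

// Hit records one suspicious code point and its byte offset in the input.
type Hit struct {
	Offset int
	Rune   rune
	Name   string
}

// scanUnicode (hypothetical helper) lists every suspicious code point in s.
func scanUnicode(s string) []Hit {
	var hits []Hit
	for i, r := range s { // ranging over a string yields runes with byte offsets
		if name, ok := suspiciousRunes[r]; ok {
			hits = append(hits, Hit{Offset: i, Rune: r, Name: name})
		}
	}
	return hits
}

func main() {
	desc := "Reads a file\u200B and returns its contents\u202E"
	for _, h := range scanUnicode(desc) {
		fmt.Printf("offset %d: U+%04X %s\n", h.Offset, h.Rune, h.Name)
	}
}
```

Zero-width joiners and non-joiners also occur in legitimate text (emoji sequences, some scripts), so hits on those two code points may need manual review.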

Shadowed tool names (MCP-G003):

shell, exec, run, execute, terminal
bash, sh, cmd, system, eval
python, node, npm, pip
curl, wget, sudo, admin, root
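
A check against this list could look like the sketch below. The matching semantics are an assumption: the source does not say whether ToolShadowingDetector compares whole names, substrings, or is case-sensitive, so `isShadowed` uses an exact, case-insensitive comparison purely for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// shadowedNames holds the MCP-G003 names listed above.
var shadowedNames = map[string]bool{
	"shell": true, "exec": true, "run": true, "execute": true, "terminal": true,
	"bash": true, "sh": true, "cmd": true, "system": true, "eval": true,
	"python": true, "node": true, "npm": true, "pip": true,
	"curl": true, "wget": true, "sudo": true, "admin": true, "root": true,
}

// isShadowed (hypothetical helper) assumes an exact, case-insensitive match
// on the whole tool name after trimming surrounding whitespace.
func isShadowed(toolName string) bool {
	return shadowedNames[strings.ToLower(strings.TrimSpace(toolName))]
}

func main() {
	for _, name := range []string{"Bash", "run_query", "weather", " sudo "} {
		fmt.Printf("%q shadowed=%v\n", name, isShadowed(name))
	}
}
```

With whole-name matching, `run_query` is not flagged even though it starts with `run`. A substring-based check would flag it.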

6.8 Class N - Supply Chain

Rule ID   Detector                     Description
MCP-N001  LockfileDetector             Missing lockfile
MCP-N002  UntrustedDependencyDetector  Untrusted dependency
MCP-N003  SuspiciousSetupDetector      Suspicious setup

MCP-N002 patterns:

(?i)git\+https?://
(?i)git\+ssh://
(?i)github\.com/[^/]+/[^/]+\.git
(?i)file://
(?i)http://   # non-HTTPS

MCP-N003 patterns:

(?i)curl.*\|.*sh
(?i)wget.*\|.*sh
(?i)curl.*\|.*bash
(?i)wget.*\|.*bash
(?i)base64.*decode
(?i)reverse.*shell
(?i)nc\s+-e
(?i)netcat.*-e


7. Extended Rules

7.1 Rule Loading

The engine loads several rule sets:

func (e *Engine) loadRules() {
    e.LoadLifecycleRules()          // Class L
    e.LoadHiddenNetworkRules()      // Class M
    e.LoadExtendedInjectionRules()  // Class G, extended
    e.LoadPromptFlowRules()         // Class H
    e.LoadMLRules()                 // ML-based (Class G)
    // ... core rules
}

7.2 ML-Based Rules

These rules integrate the ML classifier for tool poisoning:

type MLDetector struct {
    classifier ml.Classifier
    threshold  float64
}

func (d *MLDetector) Detect(file *ast.File, surf *surface.MCPSurface) []Match {
    var matches []Match

    if surf != nil {
        for _, tool := range surf.Tools {
            result := d.classifier.Classify(tool.Description)
            if result.IsInjection && result.Probability >= d.threshold {
                matches = append(matches, Match{
                    Location:   tool.Location,
                    Snippet:    tool.Description,
                    Confidence: mapConfidence(result.Confidence),
                    Evidence: Evidence{
                        LLMAnalysis:   result.Reason,
                        LLMConfidence: result.Probability,
                        LLMCategory:   result.Category,
                    },
                })
            }
        }
    }
    return matches
}

7.3 LLM-Based Rules

These rules use an LLM for semantic analysis:

type LLMDetector struct {
    detector *llm.Detector
}

func (d *LLMDetector) Detect(file *ast.File, surf *surface.MCPSurface) []Match {
    // Analyzes tool descriptions with the LLM
    // Returns matches with Evidence.LLMAnalysis populated
}

7.4 CodeQL-Based Rules

These rules use CodeQL for secondary confirmation:

type CodeQLDetector struct {
    client *codeql.Client
}

func (d *CodeQLDetector) Detect(file *ast.File, _ *surface.MCPSurface) []Match {
    // Runs CodeQL queries
    // Returns matches with Evidence.CodeQLConfirmed = true
}

8. Rule Configuration

8.1 Disabling Rules

# .mcp-scan.yaml
rules:
  disabled:
    - MCP-E001  # Do not search for hardcoded secrets
    - MCP-F001  # Do not check cookies

Implementation:

func (e *Engine) SetDisabledRule(ruleID string) {
    e.disabledRules[ruleID] = true
}

func (e *Engine) IsRuleDisabled(ruleID string) bool {
    return e.disabledRules[ruleID]
}

8.2 Severity Override

rules:
  severity_overrides:
    MCP-A003: critical  # Raise to critical
    MCP-E002: info      # Lower to informational

Implementation:

func (e *Engine) SetSeverityOverride(ruleID string, severity types.Severity) {
    e.severityOverrides[ruleID] = severity
}

func (e *Engine) GetEffectiveSeverity(ruleID string, defaultSeverity types.Severity) types.Severity {
    if override, ok := e.severityOverrides[ruleID]; ok {
        return override
    }
    return defaultSeverity
}
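
Combined with the match-level override applied in section 10.1, the resulting precedence is: a severity set on the match, then the configured override, then the rule default. The sketch below restates that order in one function. `resolveSeverity` is a hypothetical name, and `Severity` is modeled as a string, an assumption consistent with the `match.Severity != ""` comparison in section 10.1.

```go
package main

import "fmt"

// Severity is modeled as a string for this sketch (assumption).
type Severity string

// resolveSeverity (hypothetical helper) applies the precedence described above:
// match-level severity, then configured override, then the rule default.
func resolveSeverity(overrides map[string]Severity, ruleID string, ruleDefault, matchSeverity Severity) Severity {
	if matchSeverity != "" {
		return matchSeverity
	}
	if s, ok := overrides[ruleID]; ok {
		return s
	}
	return ruleDefault
}

func main() {
	overrides := map[string]Severity{"MCP-E002": "info"}

	fmt.Println(resolveSeverity(overrides, "MCP-E002", "medium", ""))         // configured override applies
	fmt.Println(resolveSeverity(overrides, "MCP-A003", "high", ""))           // no override: rule default
	fmt.Println(resolveSeverity(overrides, "MCP-E002", "medium", "critical")) // match-level severity wins
}
```

One consequence is that a detector that sets Severity on its matches is not affected by `severity_overrides` for that rule.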

8.3 Custom Rules

rules:
  custom:
    - id: "CUSTOM-001"
      pattern: "dangerous_function\\("
      severity: high
      confidence: medium
      class: A
      description: "Use of a dangerous function"
      remediation: "Use safe_function instead"
      languages:
        - python
        - javascript

Implementation:

func (e *Engine) AddCustomRule(
    id, pattern string,
    severity types.Severity,
    confidence types.Confidence,
    description, remediation string,
    languages []types.Language,
    class types.VulnClass,
) error {
    compiledPattern, err := regexp.Compile(pattern)
    if err != nil {
        return fmt.Errorf("invalid regex: %w", err)
    }

    rule := &Rule{
        ID:          id,
        Class:       class,
        Language:    languages,
        Severity:    severity,
        Confidence:  confidence,
        Description: description,
        Remediation: remediation,
        Detector:    &RegexDetector{Pattern: compiledPattern},
    }
    e.rules = append(e.rules, rule)
    return nil
}


9. Language Filtering

9.1 Language-Specific Rules

e.rules = append(e.rules, &Rule{
    ID:       "MCP-A003",
    Language: []types.Language{types.Python},  // Python only
    Detector: &DirectShellDetector{},
})

9.2 Filtering Logic

func (e *Engine) AnalyzeFile(file *ast.File, surf *surface.MCPSurface) []types.Finding {
    var findings []types.Finding

    for _, rule := range e.rules {
        // Skip disabled rules
        if e.IsRuleDisabled(rule.ID) {
            continue
        }

        // Check the language filter; a rule with no languages applies to all
        if len(rule.Language) > 0 {
            var langMatch bool
            for _, lang := range rule.Language {
                if lang == file.Language {
                    langMatch = true
                    break
                }
            }
            if !langMatch {
                continue  // Does not apply to this language
            }
        }

        // Run the detector
        matches := rule.Detector.Detect(file, surf)
        // ... process matches
    }
    return findings
}
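
The language check in the loop above can be expressed as a small predicate. This is an alternative sketch, not the engine's code: `appliesTo` is a hypothetical helper, `Language` is modeled as a string, and `slices.Contains` requires Go 1.21 or later.

```go
package main

import (
	"fmt"
	"slices"
)

// Language is modeled as a string for this sketch (assumption).
type Language string

// appliesTo (hypothetical helper) mirrors the filter above: a rule with an
// empty language list applies to every file; otherwise the file's language
// must appear in the list.
func appliesTo(ruleLangs []Language, fileLang Language) bool {
	return len(ruleLangs) == 0 || slices.Contains(ruleLangs, fileLang)
}

func main() {
	pythonOnly := []Language{"python"}

	fmt.Println(appliesTo(nil, "go"))                // true: no filter
	fmt.Println(appliesTo(pythonOnly, "python"))     // true: listed
	fmt.Println(appliesTo(pythonOnly, "javascript")) // false: not listed
}
```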

10. Finding Generation

10.1 From Match to Finding

for _, match := range matches {
    match.Location.File = file.Path

    // Use the match's overrides when present; otherwise use the rule's defaults
    ruleID := rule.ID
    if match.RuleID != "" {
        ruleID = match.RuleID
    }

    severity := e.GetEffectiveSeverity(ruleID, rule.Severity)
    if match.Severity != "" {
        severity = match.Severity
    }

    confidence := match.Confidence
    if confidence == "" {
        confidence = rule.Confidence
    }

    finding := types.Finding{
        RuleID:      ruleID,
        Severity:    severity,
        Confidence:  confidence,
        Class:       rule.Class,
        Language:    file.Language,
        Location:    match.Location,
        Evidence:    convertEvidence(match.Evidence),
        Description: rule.Description,
        Remediation: rule.Remediation,
    }
    finding.ID = finding.GenerateID()
    findings = append(findings, finding)
}

10.2 Unique ID Generation

func (f *Finding) GenerateID() string {
    // ID components
    data := fmt.Sprintf("%s|%s|%d|%s",
        f.RuleID,
        f.Location.File,
        f.Location.StartLine,
        f.Evidence.Snippet,
    )

    // SHA-256 truncated to 16 hex characters
    hash := sha256.Sum256([]byte(data))
    return hex.EncodeToString(hash[:])[:16]
}
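
The same scheme can be exercised in isolation. In this sketch `findingID` takes the four components as plain parameters instead of a Finding, and the sample values are made up.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// findingID reproduces the scheme above with plain parameters: the four
// components joined by "|", hashed with SHA-256, and truncated to the first
// 16 hex characters (64 bits).
func findingID(ruleID, file string, startLine int, snippet string) string {
	data := fmt.Sprintf("%s|%s|%d|%s", ruleID, file, startLine, snippet)
	sum := sha256.Sum256([]byte(data))
	return hex.EncodeToString(sum[:])[:16]
}

func main() {
	a := findingID("MCP-A003", "server.py", 42, "os.system(cmd)")
	b := findingID("MCP-A003", "server.py", 42, "os.system(cmd)")
	c := findingID("MCP-A003", "server.py", 43, "os.system(cmd)")

	fmt.Println(len(a)) // length of the truncated ID
	fmt.Println(a == b) // same inputs give the same ID
	fmt.Println(a == c) // a different start line gives a different ID
}
```

Because the start line is part of the hash, inserting a line above a finding changes its ID. This matters when IDs are stored in a baseline of accepted findings.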

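11. Rule Execution Order

11.1 Priority

  1. Lifecycle Rules (L) - loaded first
  2. Hidden Network Rules (M) - second
  3. Extended Injection (G) - third
  4. Prompt Flow (H) - fourth
  5. ML Rules (G) - fifth
  6. Core Rules (A-G, N) - last

11.2 Deduplication

Findings are deduplicated by ID, keeping the first occurrence. Because the ID (section 10.2) hashes the rule ID, file, start line, and snippet, two rules that detect the same problem are merged only when their findings resolve to the same rule ID (for instance through Match.RuleID) and agree on the other three components: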

func NormalizeFindings(findings []Finding) []Finding {
    seen := make(map[string]bool)
    var unique []Finding

    for _, f := range findings {
        if !seen[f.ID] {
            seen[f.ID] = true
            unique = append(unique, f)
        }
    }

    return unique
}
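
A short run shows the two properties that matter in practice: input order is preserved, and the first finding with a given ID is the one that survives. The sketch uses a reduced `Finding` with only the fields needed here, and the IDs are placeholder strings rather than real hashes.

```go
package main

import "fmt"

// Finding is reduced to the fields this sketch needs.
type Finding struct {
	ID     string
	RuleID string
}

// NormalizeFindings is the function shown above, repeated so the example is
// self-contained.
func NormalizeFindings(findings []Finding) []Finding {
	seen := make(map[string]bool)
	var unique []Finding
	for _, f := range findings {
		if !seen[f.ID] {
			seen[f.ID] = true
			unique = append(unique, f)
		}
	}
	return unique
}

func main() {
	in := []Finding{
		{ID: "aaaa", RuleID: "MCP-G001"},
		{ID: "bbbb", RuleID: "MCP-A003"},
		{ID: "aaaa", RuleID: "MCP-G001"}, // same ID as the first entry
	}
	for _, f := range NormalizeFindings(in) {
		fmt.Println(f.ID, f.RuleID)
	}
}
```

Since earlier findings win, the load order in section 11.1 decides which rule set's finding is kept when two of them share an ID.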

12. Pattern Engine Limitations

12.1 False Positives

  1. Overly broad regex: can match benign code
  2. Commented-out code: the regex does not distinguish comments
  3. Strings: a pattern inside a string literal is not an actual use
  4. Dead code: code that is never executed
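
The first three cases can be reproduced with the `\beval\s*\(` pattern from section 6.1. The sample lines are made up for illustration, and `flagsEval` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// evalCall is the MCP-A004 pattern for eval from section 6.1.
var evalCall = regexp.MustCompile(`\beval\s*\(`)

// flagsEval (hypothetical helper) reports whether the pattern matches the line.
func flagsEval(line string) bool { return evalCall.MatchString(line) }

func main() {
	lines := []string{
		`result = eval(user_input)`,            // real call
		`# eval(user_input) was removed in v2`, // commented-out code
		`help = "never call eval() on input"`,  // string literal
		`score = model.eval()`,                 // unrelated method that happens to be named eval
	}
	for _, l := range lines {
		fmt.Println(flagsEval(l), l)
	}
}
```

All four lines are flagged, although only the first is a dangerous call.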

12.2 False Negatives

  1. Obfuscation: assembling the name at runtime (e.g. "ev" + "al") evades eval\(
  2. Indirection: getattr(module, "system")(cmd)
  3. Encoding: Base64/hex encoding of the code
  4. Aliasing: from os import system as s
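
The evasions above are easy to confirm against two of the class A patterns from section 6.1. The sample lines are illustrative, and `flagged` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
)

// Two class A patterns from section 6.1.
var (
	evalCall = regexp.MustCompile(`\beval\s*\(`)
	osSystem = regexp.MustCompile(`(?i)os\.system\s*\(`)
)

// flagged (hypothetical helper) reports whether either pattern matches the line.
func flagged(line string) bool {
	return evalCall.MatchString(line) || osSystem.MatchString(line)
}

func main() {
	lines := []string{
		`os.system(cmd)`,                     // direct call: detected
		`getattr(os, "system")(cmd)`,         // indirection: missed
		`from os import system as s; s(cmd)`, // alias: missed
		`globals()["ev" + "al"](payload)`,    // name assembled at runtime: missed
	}
	for _, l := range lines {
		fmt.Println(flagged(l), l)
	}
}
```

Only the direct call is reported. The other three execute the same functions without the literal text the patterns look for.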

12.3 Recommendations

  1. Manually verify critical findings
  2. Combine with taint analysis for better precision
  3. Use a baseline for known/accepted findings
  4. Adjust rule confidence according to context

Next document: clasificador-ml.md