Pattern Detection Engine
Technical document for security analysts
1. Introduction
The Pattern Engine is the rule-based detection engine that searches source code for specific patterns. Unlike taint analysis, which tracks data flow, the pattern engine detects code constructs known to be dangerous, using regular expressions and AST analysis.
2. Pattern Engine Architecture
2.1 Components
+------------------+
|  Pattern Engine  |
+------------------+
         |
         v
+------------------+
|      Rules       |
|  +------------+  |
|  |   Rule 1   |  |
|  |   Rule 2   |  |
|  |    ...     |  |
|  +------------+  |
+------------------+
         |
         v
+------------------+     +------------------+
|    Detectors     |     |  AST + Surface   |
|  - Regex-based   |<--->|     (input)      |
|  - AST-based     |     |                  |
|  - Hybrid        |     |                  |
+------------------+     +------------------+
         |
         v
+------------------+
|     Matches      |
+------------------+
2.2 Code Base
Location: internal/pattern/engine.go
type Engine struct {
    rules             []*Rule
    severityOverrides map[string]types.Severity
    disabledRules     map[string]bool
}
3. Structure of a Rule
3.1 Rule Definition
type Rule struct {
    ID          string           // Unique identifier (MCP-X001)
    Class       types.VulnClass  // Vulnerability class (A-N)
    Language    []types.Language // Applicable languages
    Severity    types.Severity   // critical/high/medium/low/info
    Confidence  types.Confidence // high/medium/low
    Description string           // Description of the problem
    Remediation string           // How to fix it
    Detector    Detector         // Detection logic
}
3.2 ID Convention
Rule IDs follow the pattern MCP-X###:
| Component | Meaning |
|---|---|
| MCP- | Project prefix |
| X | Class letter (A-N) |
| ### | Sequential number (001-999) |
Examples:
- MCP-A003: third rule of class A (RCE)
- MCP-B002: second rule of class B (Filesystem)
- MCP-G001: first rule of class G (Tool Poisoning)
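The convention above can be checked mechanically. The following is a minimal sketch, assuming a helper named parseRuleID that is hypothetical and not part of the engine shown in this document; it only encodes the prefix, the A-N class letter, and the 001-999 range from the table.

```go
package main

import (
    "fmt"
    "regexp"
)

// ruleIDPattern encodes the MCP-X### convention: the "MCP-" prefix,
// one class letter in the range A-N, and a three-digit sequence number.
var ruleIDPattern = regexp.MustCompile(`^MCP-([A-N])(\d{3})$`)

// parseRuleID is a hypothetical helper. It returns the class letter and the
// sequence number, or ok=false when the ID does not follow the convention
// (including the out-of-range number 000).
func parseRuleID(id string) (class, seq string, ok bool) {
    m := ruleIDPattern.FindStringSubmatch(id)
    if m == nil || m[2] == "000" {
        return "", "", false
    }
    return m[1], m[2], true
}

func main() {
    for _, id := range []string{"MCP-A003", "MCP-G001", "MCP-Z001", "CUSTOM-001"} {
        class, seq, ok := parseRuleID(id)
        fmt.Printf("%-10s ok=%-5v class=%q seq=%q\n", id, ok, class, seq)
    }
}
```

Custom rules such as CUSTOM-001 (section 8.3) deliberately fall outside this pattern, so a check like this would apply to built-in rules only.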
4. Detector Interface
4.1 Definition
Every detector must implement this interface. Its Detect method receives:
- file: the parsed AST of the file
- surface: the extracted MCP surface (tools, resources, etc.)
It returns the list of Match values found.
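As a sketch of the contract described above, the interface below is inferred from the Detect methods that appear later in this document, so its exact declaration is an assumption. File, MCPSurface, and Match are simplified stand-ins for ast.File, surface.MCPSurface, and the engine's Match type, and nonEmptyDetector is a hypothetical detector used only for illustration.

```go
package main

import "fmt"

// Simplified stand-ins; the real types carry more fields.
type File struct{ RawContent string }
type MCPSurface struct{}
type Match struct{ Snippet string }

// Detector is inferred from the method signatures used by the detectors
// in section 5; the real declaration may differ.
type Detector interface {
    Detect(file *File, surf *MCPSurface) []Match
}

// nonEmptyDetector is a hypothetical detector that reports any file
// with non-empty raw content. It exists only to show the contract.
type nonEmptyDetector struct{}

func (nonEmptyDetector) Detect(file *File, _ *MCPSurface) []Match {
    if file == nil || file.RawContent == "" {
        return nil
    }
    return []Match{{Snippet: file.RawContent}}
}

// Compile-time check that nonEmptyDetector satisfies Detector.
var _ Detector = nonEmptyDetector{}

func main() {
    var d Detector = nonEmptyDetector{}
    fmt.Println(len(d.Detect(&File{RawContent: "eval(x)"}, nil)))
}
```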
4.2 Match Structure
type Match struct {
    Location    types.Location   // Position in the file
    Snippet     string           // Code fragment
    Context     string           // Additional context
    Confidence  types.Confidence // May override the rule's confidence
    RuleID      string           // May override the rule's ID
    Title       string           // May override the title
    Description string           // May override the description
    Severity    types.Severity   // May override the severity
    Class       types.VulnClass  // May override the class
    Remediation string           // May override the remediation
    Evidence    Evidence         // Extended evidence
}
5. Detector Types
5.1 RegexDetector (Regex-Based)
The simplest detector: it tests each line of the raw file content against a regular expression.
type RegexDetector struct {
    Pattern *regexp.Regexp
}
func (d *RegexDetector) Detect(file *ast.File, _ *surface.MCPSurface) []Match {
    var matches []Match
    if file.RawContent != "" {
        lines := strings.Split(file.RawContent, "\n")
        for lineNum, line := range lines {
            if d.Pattern.MatchString(line) {
                matches = append(matches, Match{
                    Location: types.Location{
                        StartLine: lineNum + 1, // line numbers are 1-based
                    },
                    Snippet: strings.TrimSpace(line),
                })
            }
        }
    }
    return matches
}
Usage:
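A minimal, self-contained sketch of how a RegexDetector might be constructed and run. The File, Location, and Match types are simplified stand-ins for the engine's own types, the surface parameter is dropped for brevity, and scanSample is a hypothetical helper introduced for this example.

```go
package main

import (
    "fmt"
    "regexp"
    "strings"
)

// Simplified stand-ins for types.Location, Match and ast.File.
type Location struct{ StartLine int }
type Match struct {
    Location Location
    Snippet  string
}
type File struct{ RawContent string }

type RegexDetector struct{ Pattern *regexp.Regexp }

// Detect mirrors the line-by-line scan shown above.
func (d *RegexDetector) Detect(file *File) []Match {
    var matches []Match
    for i, line := range strings.Split(file.RawContent, "\n") {
        if d.Pattern.MatchString(line) {
            matches = append(matches, Match{
                Location: Location{StartLine: i + 1},
                Snippet:  strings.TrimSpace(line),
            })
        }
    }
    return matches
}

// scanSample runs one of the MCP-A003 patterns over a small Python snippet.
func scanSample() []Match {
    d := &RegexDetector{Pattern: regexp.MustCompile(`(?i)os\.system\s*\(`)}
    src := "import os\n\ndef run(cmd):\n    os.system(cmd)\n"
    return d.Detect(&File{RawContent: src})
}

func main() {
    for _, m := range scanSample() {
        fmt.Printf("line %d: %s\n", m.Location.StartLine, m.Snippet)
    }
}
```

Compiling the pattern once with regexp.MustCompile and reusing the detector avoids recompiling the expression for every file.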
5.2 AST-Based Detectors
Detectors that analyze the structure of the AST:
type DangerousFunctionDetector struct{}
var dangerousFunctions = map[string]bool{
    "eval":    true,
    "exec":    true,
    "compile": true,
}
func (d *DangerousFunctionDetector) Detect(file *ast.File, _ *surface.MCPSurface) []Match {
    var matches []Match
    for _, fn := range file.Functions {
        // Only calls that are direct statements of a function body are inspected.
        for _, stmt := range fn.Body {
            if exprStmt, ok := stmt.(*ast.ExpressionStatement); ok {
                if call, ok := exprStmt.Expression.(*ast.Call); ok {
                    if dangerousFunctions[call.Function] {
                        matches = append(matches, Match{
                            Location: call.Location,
                            Snippet:  call.Function,
                        })
                    }
                }
            }
        }
    }
    return matches
}
5.3 Surface-Aware Detectors
Detectors that use the MCP surface:
type PromptInjectionDetector struct{}
var injectionMarkers = []string{
    "ignore previous",
    "disregard",
    "you are now",
    "act as",
}
func (d *PromptInjectionDetector) Detect(file *ast.File, surf *surface.MCPSurface) []Match {
    var matches []Match
    if surf != nil {
        for _, tool := range surf.Tools {
            descLower := strings.ToLower(tool.Description)
            for _, marker := range injectionMarkers {
                if strings.Contains(descLower, marker) {
                    matches = append(matches, Match{
                        Location: tool.Location,
                        Snippet:  tool.Description,
                        Context:  "Tool: " + tool.Name,
                    })
                    break // one match per tool is enough
                }
            }
        }
    }
    return matches
}
5.4 Hybrid Detectors (Regex + AST)
Detectors that combine several techniques:
type DirectShellDetector struct{}
var shellPatterns = []*regexp.Regexp{
    regexp.MustCompile(`(?i)os\.system\s*\(`),
    regexp.MustCompile(`(?i)subprocess\.call\s*\([^,]*,\s*shell\s*=\s*True`),
    regexp.MustCompile(`(?i)child_process\.exec\s*\(`),
}
func (d *DirectShellDetector) Detect(file *ast.File, _ *surface.MCPSurface) []Match {
    var matches []Match
    // Try regex on the raw content first (more precise)
    if file.RawContent != "" {
        lines := strings.Split(file.RawContent, "\n")
        for lineNum, line := range lines {
            for _, pattern := range shellPatterns {
                if pattern.MatchString(line) {
                    matches = append(matches, Match{
                        Location: types.Location{StartLine: lineNum + 1},
                        Snippet:  strings.TrimSpace(line),
                    })
                    break // report each line at most once
                }
            }
        }
        return matches
    }
    // Fall back to the AST when there is no raw content
    for _, fn := range file.Functions {
        // ... AST analysis
    }
    return matches
}
6. Implemented Rules
6.1 Class A - RCE
| Rule ID | Detector | Description |
|---|---|---|
| MCP-A003 | DirectShellDetector | Direct shell execution |
| MCP-A004 | DangerousFunctionDetector | eval/exec/compile |
MCP-A003 patterns:
(?i)os\.system\s*\(
(?i)subprocess\.call\s*\([^,]*,\s*shell\s*=\s*True
(?i)subprocess\.run\s*\([^,]*,\s*shell\s*=\s*True
(?i)subprocess\.Popen\s*\([^,]*,\s*shell\s*=\s*True
(?i)child_process\.exec\s*\(
(?i)child_process\.execSync\s*\(
(?i)execSync\s*\(
MCP-A004 patterns:
6.2 Class B - Filesystem
| Rule ID | Detector | Description |
|---|---|---|
| MCP-B002 | PathTraversalPatternDetector | Path traversal pattern |
Patterns:
6.3 Class C - SSRF
| Rule ID | Detector | Description |
|---|---|---|
| MCP-C002 | UnvalidatedURLDetector | Unvalidated URL |
Monitored functions:
- requests.get, requests.post, requests.put, requests.delete
- fetch
- axios.get, axios.post
- http.get
- urllib.request.urlopen
6.4 Class D - SQLi
| Rule ID | Detector | Description |
|---|---|---|
| MCP-D002 | SQLConcatDetector | SQL concatenation |
Patterns:
(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*\+.*
(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*%s
(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*\$\{
(?i)f["'].*SELECT.*\{
(?i)execute\s*\(\s*["'].*\+
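To make the behavior of these patterns concrete, the sketch below compiles the five MCP-D002 expressions listed above and tests a few sample lines. The helper matchesSQLConcat and the sample lines are assumptions made for this example; how the real SQLConcatDetector applies the patterns is not shown in this document.

```go
package main

import (
    "fmt"
    "regexp"
)

// The five MCP-D002 patterns, copied verbatim from the list above.
var sqlConcatPatterns = []*regexp.Regexp{
    regexp.MustCompile(`(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*\+.*`),
    regexp.MustCompile(`(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*%s`),
    regexp.MustCompile(`(?i)(SELECT|INSERT|UPDATE|DELETE|DROP|CREATE|ALTER).*\$\{`),
    regexp.MustCompile(`(?i)f["'].*SELECT.*\{`),
    regexp.MustCompile(`(?i)execute\s*\(\s*["'].*\+`),
}

// matchesSQLConcat is a hypothetical helper: it reports whether any
// of the patterns matches the given line.
func matchesSQLConcat(line string) bool {
    for _, p := range sqlConcatPatterns {
        if p.MatchString(line) {
            return true
        }
    }
    return false
}

func main() {
    samples := []string{
        `query = "SELECT * FROM users WHERE id = " + user_id`,        // concatenated query
        `cursor.execute("SELECT * FROM users WHERE id = ?", (uid,))`, // parameterized query
        `# select the larger of a + b`,                               // unrelated comment
    }
    for _, s := range samples {
        fmt.Printf("%-5v %s\n", matchesSQLConcat(s), s)
    }
}
```

The third sample is flagged even though it contains no SQL at all, which illustrates the false-positive risk discussed in section 12.1.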
6.5 Class E - Secrets
| Rule ID | Detector | Description |
|---|---|---|
| MCP-E001 | HardcodedSecretDetector | Hardcoded secrets |
| MCP-E002 | SecretVariableDetector | Variables with suspicious names |
| MCP-E005 | SecretLoggingDetector | Logging of secrets |
MCP-E001 patterns:
(?i)(api[_-]?key|apikey)\s*[:=]\s*["']([A-Za-z0-9_\-]{20,})["']
(?i)(secret|password|passwd|pwd)\s*[:=]\s*["']([^"']{8,})["']
(?i)(token|auth[_-]?token)\s*[:=]\s*["']([A-Za-z0-9_\-\.]{20,})["']
(?i)(private[_-]?key)\s*[:=]\s*["']([^"']+)["']
(?i)bearer\s+[A-Za-z0-9_\-\.]{20,}
(?i)ghp_[A-Za-z0-9]{36} # GitHub token
(?i)sk-[A-Za-z0-9]{48} # OpenAI key
(?i)AKIA[A-Z0-9]{16} # AWS access key
(?i)-----BEGIN (RSA |EC |DSA |OPENSSH )?PRIVATE KEY
MCP-E002 pattern:
6.6 Class F - Auth/OAuth
| Rule ID | Detector | Description |
|---|---|---|
| MCP-F001 | InsecureCookieDetector | Cookie without the Secure flag |
| MCP-F002 | WeakJWTDetector | Weak JWT |
| MCP-F003 | OAuthStateDetector | OAuth without state |
MCP-F002 patterns:
(?i)algorithms?\s*[=:]\s*\[?\s*["']none["']
(?i)verify\s*[=:]\s*False
(?i)verify\s*[=:]\s*false
(?i)options\s*:\s*\{[^}]*verify\s*:\s*false
(?i)ignoreExpiration\s*:\s*true
6.7 Class G - Tool Poisoning
| Rule ID | Detector | Description |
|---|---|---|
| MCP-G001 | PromptInjectionDetector | Injection in tool description |
| MCP-G002 | UnicodeDetector | Suspicious Unicode |
| MCP-G003 | ToolShadowingDetector | Tool shadowing |
Injection markers (MCP-G001):
ignore previous
ignore all instructions
disregard
system prompt
you are now
act as
pretend to be
forget your instructions
new instructions
override
<important>
</important>
<hidden>
</hidden>
<system>
</system>
<instruction>
</instruction>
do not mention
do not reveal
secretly
must first
you must
include it in your response
Suspicious Unicode characters (MCP-G002):
\u202E - RTL override
\u202D - LTR override
\u202C - POP directional formatting
\u200B - Zero-width space
\u200C - Zero-width non-joiner
\u200D - Zero-width joiner
\uFEFF - BOM / Zero-width no-break space
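A check for these code points can be sketched as a single pass over the runes of a string. The function findSuspiciousRunes and the runeHit type are hypothetical names for this example; the actual UnicodeDetector is not shown in this document and may work differently.

```go
package main

import "fmt"

// suspiciousRunes lists the code points from the table above.
var suspiciousRunes = map[rune]string{
    '\u202E': "RTL override",
    '\u202D': "LTR override",
    '\u202C': "POP directional formatting",
    '\u200B': "zero-width space",
    '\u200C': "zero-width non-joiner",
    '\u200D': "zero-width joiner",
    '\uFEFF': "BOM / zero-width no-break space",
}

// runeHit records where a suspicious code point was found.
type runeHit struct {
    ByteOffset int
    Rune       rune
    Name       string
}

// findSuspiciousRunes returns every listed code point in s together with
// its byte offset. Ranging over a string yields runes, not bytes.
func findSuspiciousRunes(s string) []runeHit {
    var hits []runeHit
    for i, r := range s {
        if name, ok := suspiciousRunes[r]; ok {
            hits = append(hits, runeHit{ByteOffset: i, Rune: r, Name: name})
        }
    }
    return hits
}

func main() {
    desc := "Reads a file\u200B and returns\u202E its contents"
    for _, h := range findSuspiciousRunes(desc) {
        fmt.Printf("offset %d: U+%04X (%s)\n", h.ByteOffset, h.Rune, h.Name)
    }
}
```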
Shadowed tool names (MCP-G003):
shell, exec, run, execute, terminal
bash, sh, cmd, system, eval
python, node, npm, pip
curl, wget, sudo, admin, root
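One way to use such a list is a set lookup on the normalized tool name. The sketch below assumes exact, case-insensitive matching; that matching strategy and the helper isShadowedName are assumptions, since the ToolShadowingDetector itself is not shown in this document.

```go
package main

import (
    "fmt"
    "strings"
)

// shadowedNames holds the names listed above as a set.
var shadowedNames = map[string]bool{
    "shell": true, "exec": true, "run": true, "execute": true, "terminal": true,
    "bash": true, "sh": true, "cmd": true, "system": true, "eval": true,
    "python": true, "node": true, "npm": true, "pip": true,
    "curl": true, "wget": true, "sudo": true, "admin": true, "root": true,
}

// isShadowedName is a hypothetical helper: it reports whether a tool name,
// after trimming whitespace and lowercasing, equals one of the listed names.
func isShadowedName(name string) bool {
    return shadowedNames[strings.ToLower(strings.TrimSpace(name))]
}

func main() {
    for _, name := range []string{"Bash", "read_file", " sudo "} {
        fmt.Printf("%q shadowed=%v\n", name, isShadowedName(name))
    }
}
```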
6.8 Class N - Supply Chain
| Rule ID | Detector | Description |
|---|---|---|
| MCP-N001 | LockfileDetector | Missing lockfile |
| MCP-N002 | UntrustedDependencyDetector | Untrusted dependency |
| MCP-N003 | SuspiciousSetupDetector | Suspicious setup script |
MCP-N002 patterns:
(?i)git\+https?://
(?i)git\+ssh://
(?i)github\.com/[^/]+/[^/]+\.git
(?i)file://
(?i)http:// # non-HTTPS
MCP-N003 patterns:
(?i)curl.*\|.*sh
(?i)wget.*\|.*sh
(?i)curl.*\|.*bash
(?i)wget.*\|.*bash
(?i)base64.*decode
(?i)reverse.*shell
(?i)nc\s+-e
(?i)netcat.*-e
7. Extended Rules
7.1 Rule Loading
The engine loads several rule sets:
func (e *Engine) loadRules() {
    e.LoadLifecycleRules()         // Class L
    e.LoadHiddenNetworkRules()     // Class M
    e.LoadExtendedInjectionRules() // Class G (extended)
    e.LoadPromptFlowRules()        // Class H
    e.LoadMLRules()                // ML-based (Class G)
    // ... core rules
}
7.2 ML-Based Rules
These rules integrate the ML classifier for tool poisoning:
type MLDetector struct {
    classifier ml.Classifier
    threshold  float64
}
func (d *MLDetector) Detect(file *ast.File, surf *surface.MCPSurface) []Match {
    var matches []Match
    if surf != nil {
        for _, tool := range surf.Tools {
            result := d.classifier.Classify(tool.Description)
            // Report only classifications at or above the configured threshold
            if result.IsInjection && result.Probability >= d.threshold {
                matches = append(matches, Match{
                    Location:   tool.Location,
                    Snippet:    tool.Description,
                    Confidence: mapConfidence(result.Confidence),
                    Evidence: Evidence{
                        LLMAnalysis:   result.Reason,
                        LLMConfidence: result.Probability,
                        LLMCategory:   result.Category,
                    },
                })
            }
        }
    }
    return matches
}
7.3 LLM-Based Rules
These rules use an LLM for semantic analysis:
type LLMDetector struct {
    detector *llm.Detector
}
func (d *LLMDetector) Detect(file *ast.File, surf *surface.MCPSurface) []Match {
    // Analyzes tool descriptions with the LLM
    // Returns matches with Evidence.LLMAnalysis populated
}
7.4 CodeQL-Based Rules
These rules use CodeQL for secondary confirmation:
type CodeQLDetector struct {
    client *codeql.Client
}
func (d *CodeQLDetector) Detect(file *ast.File, _ *surface.MCPSurface) []Match {
    // Runs CodeQL queries
    // Returns matches with Evidence.CodeQLConfirmed = true
}
8. Rule Configuration
8.1 Disabling Rules
# .mcp-scan.yaml
rules:
  disabled:
    - MCP-E001 # Do not scan for hardcoded secrets
    - MCP-F001 # Do not check cookies
Implementation:
func (e *Engine) SetDisabledRule(ruleID string) {
    e.disabledRules[ruleID] = true
}
func (e *Engine) IsRuleDisabled(ruleID string) bool {
    return e.disabledRules[ruleID]
}
8.2 Severity Override
rules:
  severity_overrides:
    MCP-A003: critical # Raise to critical
    MCP-E002: info # Lower to informational
Implementation:
func (e *Engine) SetSeverityOverride(ruleID string, severity types.Severity) {
    e.severityOverrides[ruleID] = severity
}
func (e *Engine) GetEffectiveSeverity(ruleID string, defaultSeverity types.Severity) types.Severity {
    if override, ok := e.severityOverrides[ruleID]; ok {
        return override
    }
    return defaultSeverity
}
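The setters above write into maps, and in Go a write to a nil map panics, so both maps have to be initialized before use. The sketch below shows this with a hypothetical NewEngine constructor (the document does not show how Engine is created) and a Severity type assumed to be a string.

```go
package main

import "fmt"

// Severity is assumed to be a string type, as in types.Severity.
type Severity string

type Engine struct {
    severityOverrides map[string]Severity
    disabledRules     map[string]bool
}

// NewEngine is a hypothetical constructor. It allocates both maps so that
// SetSeverityOverride does not write to a nil map.
func NewEngine() *Engine {
    return &Engine{
        severityOverrides: make(map[string]Severity),
        disabledRules:     make(map[string]bool),
    }
}

func (e *Engine) SetSeverityOverride(ruleID string, severity Severity) {
    e.severityOverrides[ruleID] = severity
}

func (e *Engine) GetEffectiveSeverity(ruleID string, defaultSeverity Severity) Severity {
    if override, ok := e.severityOverrides[ruleID]; ok {
        return override
    }
    return defaultSeverity
}

func main() {
    e := NewEngine()
    e.SetSeverityOverride("MCP-E002", "info")
    fmt.Println(e.GetEffectiveSeverity("MCP-E002", "medium")) // override applies
    fmt.Println(e.GetEffectiveSeverity("MCP-B002", "high"))   // rule default is kept
}
```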
8.3 Custom Rules
rules:
  custom:
    - id: "CUSTOM-001"
      pattern: "dangerous_function\\("
      severity: high
      confidence: medium
      class: A
      description: "Use of a dangerous function"
      remediation: "Use funcion_segura instead"
      languages:
        - python
        - javascript
Implementation:
func (e *Engine) AddCustomRule(
    id, pattern string,
    severity types.Severity,
    confidence types.Confidence,
    description, remediation string,
    languages []types.Language,
    class types.VulnClass,
) error {
    compiledPattern, err := regexp.Compile(pattern)
    if err != nil {
        return fmt.Errorf("invalid regex: %w", err)
    }
    rule := &Rule{
        ID:          id,
        Class:       class,
        Language:    languages,
        Severity:    severity,
        Confidence:  confidence,
        Description: description,
        Remediation: remediation,
        Detector:    &RegexDetector{Pattern: compiledPattern},
    }
    e.rules = append(e.rules, rule)
    return nil
}
9. Language Filtering
9.1 Language-Specific Rules
e.rules = append(e.rules, &Rule{
    ID:       "MCP-A003",
    Language: []types.Language{types.Python}, // Python only
    Detector: &DirectShellDetector{},
})
9.2 Filtering Logic
func (e *Engine) AnalyzeFile(file *ast.File, surf *surface.MCPSurface) []types.Finding {
    var findings []types.Finding
    for _, rule := range e.rules {
        // Skip disabled rules
        if e.IsRuleDisabled(rule.ID) {
            continue
        }
        // Check the language filter; an empty list means the rule applies to every language
        if len(rule.Language) > 0 {
            var langMatch bool
            for _, lang := range rule.Language {
                if lang == file.Language {
                    langMatch = true
                    break
                }
            }
            if !langMatch {
                continue // Rule does not apply to this language
            }
        }
        // Run the detector
        matches := rule.Detector.Detect(file, surf)
        // ... process matches
    }
    return findings
}
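The language check inside AnalyzeFile can be isolated into a small predicate, which makes the "empty list means all languages" behavior easy to test. This is a sketch only: appliesTo is a hypothetical helper, and Language is assumed to be a string type with the constants shown.

```go
package main

import "fmt"

// Language is assumed to be a string type, as in types.Language.
type Language string

const (
    Python     Language = "python"
    JavaScript Language = "javascript"
)

// appliesTo is a hypothetical helper that mirrors the filter in AnalyzeFile:
// a rule with no languages applies everywhere, otherwise the file's language
// must appear in the rule's list.
func appliesTo(ruleLangs []Language, fileLang Language) bool {
    if len(ruleLangs) == 0 {
        return true
    }
    for _, lang := range ruleLangs {
        if lang == fileLang {
            return true
        }
    }
    return false
}

func main() {
    fmt.Println(appliesTo(nil, JavaScript))                // no filter
    fmt.Println(appliesTo([]Language{Python}, Python))     // listed language
    fmt.Println(appliesTo([]Language{Python}, JavaScript)) // filtered out
}
```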
10. Finding Generation
10.1 From Match to Finding
for _, match := range matches {
    match.Location.File = file.Path
    // Use the match's overrides when present; otherwise fall back to the rule's defaults
    ruleID := rule.ID
    if match.RuleID != "" {
        ruleID = match.RuleID
    }
    severity := e.GetEffectiveSeverity(ruleID, rule.Severity)
    // A severity set on the match takes precedence over the configured override
    if match.Severity != "" {
        severity = match.Severity
    }
    confidence := match.Confidence
    if confidence == "" {
        confidence = rule.Confidence
    }
    finding := types.Finding{
        RuleID:      ruleID,
        Severity:    severity,
        Confidence:  confidence,
        Class:       rule.Class,
        Language:    file.Language,
        Location:    match.Location,
        Evidence:    convertEvidence(match.Evidence),
        Description: rule.Description,
        Remediation: rule.Remediation,
    }
    finding.ID = finding.GenerateID()
    findings = append(findings, finding)
}
10.2 Unique ID Generation
func (f *Finding) GenerateID() string {
    // Components of the ID
    data := fmt.Sprintf("%s|%s|%d|%s",
        f.RuleID,
        f.Location.File,
        f.Location.StartLine,
        f.Evidence.Snippet,
    )
    // SHA-256 truncated to 16 hex characters
    hash := sha256.Sum256([]byte(data))
    return hex.EncodeToString(hash[:])[:16]
}
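A standalone sketch of the same scheme shows its two useful properties: the ID is deterministic for identical inputs, and it changes when any component changes. The free function generateID is a hypothetical stand-in for the method above; 16 hex characters correspond to the first 64 bits of the SHA-256 digest.

```go
package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
)

// generateID is a hypothetical free-function version of GenerateID.
// It joins the four components with "|" and returns the first 16 hex
// characters (64 bits) of the SHA-256 digest.
func generateID(ruleID, file string, startLine int, snippet string) string {
    data := fmt.Sprintf("%s|%s|%d|%s", ruleID, file, startLine, snippet)
    sum := sha256.Sum256([]byte(data))
    return hex.EncodeToString(sum[:])[:16]
}

func main() {
    a := generateID("MCP-A003", "server.py", 42, "os.system(cmd)")
    b := generateID("MCP-A003", "server.py", 42, "os.system(cmd)")
    c := generateID("MCP-A003", "server.py", 43, "os.system(cmd)")
    fmt.Println(a == b) // same inputs, same ID
    fmt.Println(a == c) // a different line yields a different ID
}
```

Because the start line is part of the hash, inserting a line above a finding changes its ID, which matters when IDs are stored in a baseline.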
11. Rule Execution Order
11.1 Priority
- Lifecycle Rules (L): loaded first
- Hidden Network Rules (M): second
- Extended Injection (G): third
- Prompt Flow (H): fourth
- ML Rules (G): fifth
- Core Rules (A-G, N): last
11.2 Deduplication
Findings are deduplicated by ID. Because the ID hashes the rule ID, file, start line, and snippet (section 10.2), only findings that share all four values collapse into one:
func NormalizeFindings(findings []Finding) []Finding {
    seen := make(map[string]bool)
    var unique []Finding
    for _, f := range findings {
        if !seen[f.ID] {
            seen[f.ID] = true
            unique = append(unique, f)
        }
    }
    return unique
}
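The sketch below exercises this deduplication with the ID scheme from section 10.2. The Finding type is reduced to the fields needed here, and newFinding is a hypothetical helper. It illustrates that an exact duplicate is dropped, while a finding from a different rule on the same line keeps its own ID and survives.

```go
package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
)

// Finding is reduced to the fields relevant for deduplication.
type Finding struct {
    ID     string
    RuleID string
}

// newFinding is a hypothetical helper that builds a finding and derives its
// ID from rule ID, file, start line and snippet, as in section 10.2.
func newFinding(ruleID, file string, line int, snippet string) Finding {
    data := fmt.Sprintf("%s|%s|%d|%s", ruleID, file, line, snippet)
    sum := sha256.Sum256([]byte(data))
    return Finding{ID: hex.EncodeToString(sum[:])[:16], RuleID: ruleID}
}

// normalizeFindings keeps the first finding seen for each ID.
func normalizeFindings(findings []Finding) []Finding {
    seen := make(map[string]bool)
    var unique []Finding
    for _, f := range findings {
        if !seen[f.ID] {
            seen[f.ID] = true
            unique = append(unique, f)
        }
    }
    return unique
}

func main() {
    in := []Finding{
        newFinding("MCP-G001", "server.py", 10, "act as admin"),
        newFinding("MCP-G001", "server.py", 10, "act as admin"), // exact duplicate
        newFinding("MCP-G002", "server.py", 10, "act as admin"), // same line, other rule
    }
    out := normalizeFindings(in)
    fmt.Println(len(in), "->", len(out))
}
```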
12. Limitations of the Pattern Engine
12.1 False Positives
- Overly broad regex: may match benign code
- Commented-out code: the regex does not distinguish comments
- Strings: a pattern inside a string literal is not a real use
- Dead code: code that is never executed
12.2 False Negatives
- Obfuscation: ev+al() evades eval\(
- Indirection: getattr(module, "system")(cmd)
- Encoding: Base64/hex encoding of the code
- Aliasing: from os import system as s
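The false negatives listed above are easy to reproduce with one of the MCP-A003 patterns. In this sketch the helper detects and the sample lines are assumptions chosen for illustration; only the regular expression comes from this document.

```go
package main

import (
    "fmt"
    "regexp"
)

// shellCall is one of the MCP-A003 patterns from section 6.1.
var shellCall = regexp.MustCompile(`(?i)os\.system\s*\(`)

// detects is a hypothetical helper that applies the pattern to one line.
func detects(line string) bool {
    return shellCall.MatchString(line)
}

func main() {
    samples := []string{
        `os.system(cmd)`,             // direct call
        `getattr(os, "system")(cmd)`, // indirection through getattr
        `from os import system as s`, // alias definition
        `s(cmd)`,                     // call through the alias
    }
    for _, s := range samples {
        fmt.Printf("%-5v %s\n", detects(s), s)
    }
}
```

Only the direct call is flagged. The other three lines reach the same function without ever containing the text the pattern looks for, which is why combining the pattern engine with taint analysis is recommended.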
12.3 Recommendations
- Manually verify critical findings
- Combine with taint analysis for better precision
- Use a baseline for known or accepted findings
- Adjust rule confidence according to context
Next document: clasificador-ml.md