MSSS Scoring System v2.0

The MCP Server Security Standard (MSSS) provides a quantitative security assessment for MCP servers.

Overview

MSSS v2.0 scoring produces:

  1. Numeric Score (0-100): Higher = more secure
  2. Compliance Level (0-3): Certification tier
  3. Category Breakdown: Score per vulnerability class
  4. Severity Multiplier: Compounding effect for HIGH/CRITICAL findings

What's New in v2.0

The v2.0 scoring model addresses the issue where multiple HIGH findings resulted in misleading scores:

| Scenario | v1.0 Score | v2.0 Score |
|---|---|---|
| 2 HIGH findings | ~93 (Level 1) | ~49 (Level 0) |
| 1 CRITICAL | ~90 | ~37 |

Key Changes:

- Removed category-based penalty caps
- Increased base penalties (CRITICAL: 25, HIGH: 15, MEDIUM: 5)
- Added severity multiplier for compounding effect
- Score now aligns with Level expectations

Compliance Levels

| Level | Score Required | Findings Allowed | Use Case |
|---|---|---|---|
| 0 | < 60 or any critical or > 3 high | Any | Not compliant |
| 1 | >= 60 | <= 3 high, 0 critical | Basic, OSS |
| 2 | >= 80 | 0 high, 0 critical | Pro/Enterprise |
| 3 | >= 90 | 0 high, 0 critical | Certified, Premium |

Level Descriptions

Level 0: Not Compliant

- Critical vulnerabilities present, OR
- Score below 60, OR
- More than 3 high-severity findings
- Server should not be published or used in production
- Requires immediate remediation

Level 1: Basic Compliance

- Suitable for open-source publication
- Basic security hygiene met
- Some high-severity findings acceptable (up to 3)
- Score >= 60

Level 2: Enterprise Ready

- No high or critical findings
- Score >= 80
- Suitable for enterprise/internal use

Level 3: Certified

- Highest trust level
- Score >= 90
- No high or critical findings
- Deep mode analysis required
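The level rules above can be sketched as a small decision function. This is a minimal illustration, not the scanner's actual code: the function name `compliance_level` and the `deep_mode` flag (standing in for "deep mode analysis was run") are assumptions made for this example.

```python
def compliance_level(score: float, high: int, critical: int, deep_mode: bool = False) -> int:
    """Map a score and finding counts to an MSSS compliance level (0-3)."""
    # Level 0: any critical finding, more than 3 high findings, or score below 60.
    if critical > 0 or high > 3 or score < 60:
        return 0
    # Levels 2 and 3 allow no high findings at all.
    if high == 0:
        # Level 3 additionally requires deep mode analysis (modelled as a flag here).
        if score >= 90 and deep_mode:
            return 3
        if score >= 80:
            return 2
    # Remaining case: score >= 60 with at most 3 high findings.
    return 1


print(compliance_level(72.25, high=1, critical=0))                  # 1
print(compliance_level(75.0, high=0, critical=0))                   # 1
print(compliance_level(49.0, high=2, critical=0))                   # 0
print(compliance_level(95.0, high=0, critical=0))                   # 2 (no deep mode)
print(compliance_level(100.0, high=0, critical=0, deep_mode=True))  # 3
```

Note that a score of 90 or more without deep mode analysis falls back to Level 2 in this sketch, since Level 3 lists deep mode as a requirement.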

Score Calculation

Formula (v2.0 Hybrid Multiplicative)

FinalScore = max(0, 100 - DirectPenalties) × SeverityMultiplier

Step 1: Calculate Direct Penalties

Each finding contributes a penalty based on severity, confidence, and MCP context:

penalty = base_penalty × confidence_multiplier × mcp_multiplier

Base Penalties (v2.0):

| Severity | Base Penalty |
|---|---|
| Critical | 25.0 |
| High | 15.0 |
| Medium | 5.0 |
| Low | 1.0 |
| Info | 0.2 |

Confidence Multiplier:

| Confidence | Multiplier |
|---|---|
| High | 1.0 |
| Medium | 0.7 |
| Low | 0.4 |

MCP Context Multiplier:

| Context | Multiplier |
|---|---|
| In MCP tool handler | 1.3 |
| Not in tool | 1.0 |
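As a sketch of Step 1, the three lookup tables can be combined into a per-finding penalty. The names `finding_penalty`, `BASE_PENALTY`, and `CONFIDENCE_MULTIPLIER` are hypothetical, and the rounding is an assumption added only to hide floating-point noise.

```python
BASE_PENALTY = {"critical": 25.0, "high": 15.0, "medium": 5.0, "low": 1.0, "info": 0.2}
CONFIDENCE_MULTIPLIER = {"high": 1.0, "medium": 0.7, "low": 0.4}
MCP_TOOL_MULTIPLIER = 1.3  # applied when the finding is inside an MCP tool handler


def finding_penalty(severity: str, confidence: str, in_tool_handler: bool) -> float:
    """penalty = base_penalty x confidence_multiplier x mcp_multiplier"""
    mcp = MCP_TOOL_MULTIPLIER if in_tool_handler else 1.0
    penalty = BASE_PENALTY[severity] * CONFIDENCE_MULTIPLIER[confidence] * mcp
    return round(penalty, 4)  # rounding is an assumption, not part of the standard


print(finding_penalty("high", "high", False))     # 15.0
print(finding_penalty("high", "medium", True))    # 13.65
print(finding_penalty("critical", "high", True))  # 32.5
print(finding_penalty("info", "low", False))      # 0.08
```

A medium-confidence HIGH finding inside a tool handler (13.65) costs almost as much as a high-confidence one outside a handler (15.0), which shows how the two multipliers trade off.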

Step 2: Calculate Severity Multiplier

The severity multiplier provides a compounding effect for multiple HIGH/CRITICAL findings:

For CRITICAL findings:

// 1 critical = 0.50
// 2 critical = 0.35
// 3+ critical = 0.25
multiplier = max(0.25, 0.50 - (criticalCount - 1) * 0.15)

For HIGH findings (if no CRITICAL):

// 1 high = 0.85
// 2 high = 0.70
// 3 high = 0.55
// 4+ high = 0.45
multiplier = max(0.45, 1.0 - highCount * 0.15)

No HIGH/CRITICAL: multiplier = 1.0
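The two formulas and the fallback can be folded into one function. This is a minimal sketch: `severity_multiplier` is a hypothetical name, and rounding to two decimals is an assumption made so the results match the commented values exactly.

```python
def severity_multiplier(critical_count: int, high_count: int) -> float:
    """Compounding multiplier; CRITICAL findings take precedence over HIGH."""
    if critical_count > 0:
        value = max(0.25, 0.50 - (critical_count - 1) * 0.15)
    elif high_count > 0:
        value = max(0.45, 1.0 - high_count * 0.15)
    else:
        value = 1.0
    return round(value, 2)  # assumption: trims floating-point noise only


for critical in (1, 2, 3, 4):
    print("critical", critical, severity_multiplier(critical, 0))
# critical 1 0.5 / critical 2 0.35 / critical 3 0.25 / critical 4 0.25

for high in (1, 2, 3, 4, 5):
    print("high", high, severity_multiplier(0, high))
# high 1 0.85 / high 2 0.7 / high 3 0.55 / high 4 0.45 / high 5 0.45
```

Because the HIGH rule only applies when no CRITICAL finding is present, `severity_multiplier(1, 3)` returns 0.5 in this sketch: the HIGH count does not lower the multiplier further once a CRITICAL exists.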

Step 3: Apply Formula

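FinalScore = max(0, 100 - TotalPenalties) × SeverityMultiplier

Here TotalPenalties is the sum of the per-finding penalties from Step 1 (called DirectPenalties in the formula at the top of this section).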

Example Calculations

Example 1: Clean Server

Findings: None

Penalties = 0
Multiplier = 1.0
Score = (100 - 0) × 1.0 = 100
Level = 3 (score >= 90, no high/critical; requires deep mode analysis)

Example 2: One HIGH Finding

Findings: 1 HIGH, High confidence

Penalty = 15.0 × 1.0 × 1.0 = 15.0
Multiplier = 0.85 (1 high)
Score = (100 - 15) × 0.85 = 72.25
Level = 1 (score >= 60, 1 high <= 3)

Example 3: Two HIGH Findings

Findings: 2 HIGH, High confidence

Penalties = 15.0 + 15.0 = 30.0
Multiplier = 0.70 (2 high)
Score = (100 - 30) × 0.70 = 49.0
Level = 0 (score < 60)

Example 4: One CRITICAL Finding

Findings: 1 CRITICAL, High confidence

Penalty = 25.0 × 1.0 × 1.0 = 25.0
Multiplier = 0.50 (1 critical)
Score = (100 - 25) × 0.50 = 37.5
Level = 0 (critical present)

Example 5: Five MEDIUM Findings

Findings: 5 MEDIUM, High confidence

Penalties = 5.0 × 5 = 25.0
Multiplier = 1.0 (no high/critical)
Score = (100 - 25) × 1.0 = 75.0
Level = 1 (score >= 60, < 80, no high)

Example 6: Mixed Severity

Findings: 2 HIGH + 3 MEDIUM, High confidence

Penalties = (15.0 × 2) + (5.0 × 3) = 45.0
Multiplier = 0.70 (2 high)
Score = (100 - 45) × 0.70 = 38.5
Level = 0 (score < 60)
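The six examples above can be re-derived with a compact end-to-end scorer. This is a sketch under stated assumptions: findings are modelled as hypothetical `(severity, confidence, in_tool_handler)` tuples, `msss_score` and `repeat` are illustrative names, and the final rounding to two decimals is not specified by the standard.

```python
BASE = {"critical": 25.0, "high": 15.0, "medium": 5.0, "low": 1.0, "info": 0.2}
CONF = {"high": 1.0, "medium": 0.7, "low": 0.4}


def msss_score(findings):
    """findings: list of (severity, confidence, in_tool_handler) tuples."""
    # Step 1: sum the direct penalties.
    penalties = sum(BASE[s] * CONF[c] * (1.3 if tool else 1.0) for s, c, tool in findings)
    # Step 2: severity multiplier (CRITICAL rule takes precedence over HIGH).
    critical = sum(1 for s, _, _ in findings if s == "critical")
    high = sum(1 for s, _, _ in findings if s == "high")
    if critical:
        multiplier = max(0.25, 0.50 - (critical - 1) * 0.15)
    elif high:
        multiplier = max(0.45, 1.0 - high * 0.15)
    else:
        multiplier = 1.0
    # Step 3: apply the formula; rounding is an assumption for readable output.
    return round(max(0.0, 100.0 - penalties) * multiplier, 2)


def repeat(severity, n):
    """n high-confidence findings of one severity, outside any tool handler."""
    return [(severity, "high", False)] * n


print(msss_score([]))                                     # 100.0 (Example 1)
print(msss_score(repeat("high", 1)))                      # 72.25 (Example 2)
print(msss_score(repeat("high", 2)))                      # 49.0  (Example 3)
print(msss_score(repeat("critical", 1)))                  # 37.5  (Example 4)
print(msss_score(repeat("medium", 5)))                    # 75.0  (Example 5)
print(msss_score(repeat("high", 2) + repeat("medium", 3)))  # 38.5  (Example 6)
```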

JSON Output

{
  "msss_score": {
    "total": 49.0,
    "level": 0,
    "compliant": false,
    "version": "2.0",
    "categories": {
      "A": {
        "score": 7.0,
        "max_score": 22.0,
        "findings": 1,
        "penalties": 15.0
      },
      "B": {
        "score": 0.0,
        "max_score": 13.0,
        "findings": 1,
        "penalties": 15.0
      }
    },
    "score_breakdown": {
      "base_score": 100.0,
      "total_penalties": 30.0,
      "severity_multiplier": 0.70,
      "critical_count": 0,
      "high_count": 2,
      "formula": "(100 - 30.0) × 0.70 = 49.0"
    }
  }
}
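A consumer of this output may want to confirm that the reported total matches the breakdown. The sketch below recomputes the total from `score_breakdown` using the documented formula. Two checks are assumptions inferred from the sample rather than stated rules: that category penalties sum to `total_penalties`, and that a category score equals `max(0, max_score - penalties)`. The function name `check_report` is hypothetical.

```python
import json

# Trimmed copy of the sample output above.
SAMPLE = """
{
  "msss_score": {
    "total": 49.0,
    "categories": {
      "A": {"score": 7.0, "max_score": 22.0, "findings": 1, "penalties": 15.0},
      "B": {"score": 0.0, "max_score": 13.0, "findings": 1, "penalties": 15.0}
    },
    "score_breakdown": {
      "base_score": 100.0,
      "total_penalties": 30.0,
      "severity_multiplier": 0.70
    }
  }
}
"""


def check_report(report: dict, tolerance: float = 0.01) -> list:
    """Return a list of human-readable inconsistencies (empty if none)."""
    problems = []
    score = report["msss_score"]
    breakdown = score["score_breakdown"]

    # Recompute the total from the documented formula.
    expected = max(0.0, breakdown["base_score"] - breakdown["total_penalties"])
    expected *= breakdown["severity_multiplier"]
    if abs(expected - score["total"]) > tolerance:
        problems.append(f"total is {score['total']}, formula gives {expected:.2f}")

    # Assumption: category penalties add up to total_penalties.
    category_sum = sum(c["penalties"] for c in score["categories"].values())
    if abs(category_sum - breakdown["total_penalties"]) > tolerance:
        problems.append(f"category penalties sum to {category_sum}")

    # Assumption inferred from the sample: score = max(0, max_score - penalties).
    for name, cat in score["categories"].items():
        if abs(max(0.0, cat["max_score"] - cat["penalties"]) - cat["score"]) > tolerance:
            problems.append(f"category {name} score looks inconsistent")
    return problems


report = json.loads(SAMPLE)
print(check_report(report))  # []
```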

Improving Your Score

Quick Wins

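  1. Remove critical findings first - a single critical drops the multiplier to 0.50 (0.35 for two, 0.25 for three or more) and forces Level 0
  2. Reduce high findings to <= 1 - each high lowers the multiplier by 0.15, down to a floor of 0.45 at four or more
  3. Fix MCP tool vulnerabilities - findings inside tool handlers carry a 1.3x penalty multiplier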

By Category

| Category | Max Impact | Remediation Approach |
|---|---|---|
| A (RCE) | 22.0 | Use safe APIs, validate all input |
| B (Filesystem) | 13.0 | Normalize paths, use allowlists |
| E (Secrets) | 10.0 | Use environment variables, secret managers |
| C (SSRF) | 10.0 | Validate URLs, block internal IPs |
| D (SQLi) | 10.0 | Use parameterized queries |
| F (Auth) | 8.0 | Set secure flags, verify signatures |
| G (Tool Poisoning) | 8.0 | Review tool descriptions |

Level Progression

To reach Level 1:

- Fix all critical findings
- Reduce high findings to <= 3
- Achieve score >= 60

To reach Level 2:

- Fix all high and critical findings
- Achieve score >= 80

To reach Level 3:

- Fix all high and critical findings
- Achieve score >= 90
- Run deep mode analysis

Baseline Handling

Baselined findings (accepted/false positives) are:

  1. Not counted in score calculation
  2. Listed separately in output
  3. Tracked for audit purposes

Example output with two baselined findings:

{
  "msss_score": {
    "total": 95.0,
    "baselined_findings": 2,
    "note": "2 findings excluded via baseline"
  }
}
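One way to picture the baseline step is as a partition that runs before scoring. The sketch below is illustrative only: the finding shape (dicts with an `id` field), the `baseline_ids` set, and the function name `split_baseline` are assumptions, since the actual baseline format is not described here.

```python
def split_baseline(findings, baseline_ids):
    """Separate findings that count toward the score from baselined ones."""
    active = [f for f in findings if f["id"] not in baseline_ids]
    baselined = [f for f in findings if f["id"] in baseline_ids]
    return active, baselined


# Hypothetical findings; only the ones not in the baseline would be scored.
findings = [
    {"id": "F-1", "severity": "high"},
    {"id": "F-2", "severity": "medium"},
    {"id": "F-3", "severity": "medium"},
]
baseline_ids = {"F-1", "F-3"}

active, baselined = split_baseline(findings, baseline_ids)
summary = {
    "baselined_findings": len(baselined),
    "note": f"{len(baselined)} findings excluded via baseline",
}

print([f["id"] for f in active])  # ['F-2']
print(summary["note"])            # 2 findings excluded via baseline
```

Keeping the baselined list rather than discarding it is what allows those findings to be listed separately and tracked for audit.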

Comparison: v1.0 vs v2.0

| Aspect | v1.0 | v2.0 |
|---|---|---|
| Formula | Sum of category penalties | Hybrid multiplicative |
| Category limits | Yes (capped penalties) | No (direct sum) |
| HIGH penalty | 5.0 | 15.0 |
| CRITICAL penalty | 10.0 | 25.0 |
| Compounding effect | Weak (0.1 per critical) | Strong (multiplier) |
| 2 HIGH result | ~93 (misleading) | ~49 (accurate) |
| Score-Level alignment | Poor | Strong |

Why the Change?

The v1.0 model had a fundamental issue: the score didn't reflect actual security posture.

Problem: 2 HIGH findings → Score 93 → "Almost perfect"

Reality: 2 HIGH findings → Server has serious security issues

The v2.0 model ensures that:

- HIGH/CRITICAL findings significantly impact the score
- Multiple severe findings compound (not just add)
- Score aligns with certification Level expectations
- Security teams get accurate risk assessment