MSSS Scoring System v2.0

The MCP Server Security Standard (MSSS) provides a quantitative security assessment for MCP servers.

Overview

MSSS v2.0 scoring produces:

Numeric Score (0-100): Higher = more secure
Compliance Level (0-3): Certification tier
Category Breakdown: Score per vulnerability class
Severity Multiplier: Compounding effect for HIGH/CRITICAL findings

What's New in v2.0

The v2.0 scoring model fixes a v1.0 problem: servers with multiple HIGH findings, or even a CRITICAL finding, still received near-perfect scores:

Scenario	v1.0 Score	v2.0 Score
2 HIGH findings	~93 (Level 1)	~49 (Level 0)
1 CRITICAL	~90	~37

Key Changes:
- Removed category-based penalty caps
- Increased base penalties (CRITICAL: 25, HIGH: 15, MEDIUM: 5)
- Added severity multiplier for compounding effect
- Score now aligns with Level expectations

Compliance Levels

Level	Score Required	Findings Allowed	Use Case
0	< 60 or any critical or > 3 high	Any	Not compliant
1	>= 60	<= 3 high, 0 critical	Basic, OSS
2	>= 80	0 high, 0 critical	Pro/Enterprise
3	>= 90	0 high, 0 critical	Certified, Premium

Level Descriptions

Level 0: Not Compliant
- Critical vulnerabilities present, OR
- Score below 60, OR
- More than 3 high-severity findings
- Server should not be published or used in production
- Requires immediate remediation

Level 1: Basic Compliance
- Suitable for open-source publication
- Basic security hygiene met
- Some high-severity findings acceptable (up to 3)
- Score >= 60

Level 2: Enterprise Ready
- No high or critical findings
- Score >= 80
- Suitable for enterprise/internal use

Level 3: Certified
- Highest trust level
- Score >= 90
- No high or critical findings
- Deep mode analysis required
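The level rules above can be sketched as a small function. This is an illustrative sketch, not the scanner's actual code: the function name and the `deep_mode` flag are assumptions, and treating a score of 90 or more without deep mode as Level 2 is also an assumption, since the document only says deep mode is required for Level 3.

```python
def compliance_level(score: float, critical: int, high: int,
                     deep_mode: bool = False) -> int:
    """Map a score and finding counts to an MSSS compliance level (0-3)."""
    # Level 0: any critical finding, more than 3 high findings, or score < 60.
    if critical > 0 or high > 3 or score < 60:
        return 0
    # Levels 2 and 3 both require zero high findings.
    if high == 0:
        # Assumption: without deep mode analysis, a 90+ score stays at Level 2.
        if score >= 90 and deep_mode:
            return 3
        if score >= 80:
            return 2
    # Score >= 60 with at most 3 high findings and no critical findings.
    return 1


print(compliance_level(72.25, critical=0, high=1))                   # 1
print(compliance_level(49.0, critical=0, high=2))                    # 0
print(compliance_level(100.0, critical=0, high=0, deep_mode=True))   # 3
```

Checking the Level 0 conditions first keeps the remaining branches simple: once they are ruled out, only the score threshold and the high-finding count decide between Levels 1, 2, and 3.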

Score Calculation

Formula (v2.0 Hybrid Multiplicative)

FinalScore = max(0, 100 - DirectPenalties) × SeverityMultiplier

Step 1: Calculate Direct Penalties

Each finding contributes a penalty based on severity, confidence, and MCP context:

penalty = base_penalty × confidence_multiplier × mcp_multiplier

Base Penalties (v2.0):

Severity	Base Penalty
Critical	25.0
High	15.0
Medium	5.0
Low	1.0
Info	0.2

Confidence Multiplier:

Confidence	Multiplier
High	1.0
Medium	0.7
Low	0.4

MCP Context Multiplier:

Context	Multiplier
In MCP tool handler	1.3
Not in tool	1.0
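As a minimal sketch of Step 1, the three tables can be turned into lookup dictionaries and multiplied per finding. The names `finding_penalty`, `BASE_PENALTY`, and `CONFIDENCE_MULTIPLIER`, as well as the lowercase string keys, are illustrative assumptions; only the numeric values come from the tables above.

```python
# Values copied from the Step 1 tables; the names are hypothetical.
BASE_PENALTY = {"critical": 25.0, "high": 15.0, "medium": 5.0, "low": 1.0, "info": 0.2}
CONFIDENCE_MULTIPLIER = {"high": 1.0, "medium": 0.7, "low": 0.4}
MCP_TOOL_MULTIPLIER = 1.3  # applied when the finding is inside an MCP tool handler


def finding_penalty(severity: str, confidence: str,
                    in_tool_handler: bool = False) -> float:
    """penalty = base_penalty x confidence_multiplier x mcp_multiplier"""
    mcp = MCP_TOOL_MULTIPLIER if in_tool_handler else 1.0
    return BASE_PENALTY[severity.lower()] * CONFIDENCE_MULTIPLIER[confidence.lower()] * mcp


# A HIGH finding with medium confidence inside a tool handler: 15.0 x 0.7 x 1.3
print(round(finding_penalty("high", "medium", in_tool_handler=True), 2))  # 13.65
```

The direct penalties used in the formula are the sum of this value over all findings.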

Step 2: Calculate Severity Multiplier

The severity multiplier provides a compounding effect for multiple HIGH/CRITICAL findings:

For CRITICAL findings:

// 1 critical = 0.50
// 2 critical = 0.35
// 3+ critical = 0.25
multiplier = max(0.25, 0.50 - (criticalCount - 1) * 0.15)

For HIGH findings (if no CRITICAL):

// 1 high = 0.85
// 2 high = 0.70
// 3 high = 0.55
// 4+ high = 0.45
multiplier = max(0.45, 1.0 - highCount * 0.15)

No HIGH/CRITICAL: multiplier = 1.0
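The three cases can be combined into one function. This is a sketch under the reading that CRITICAL findings take precedence, so the HIGH count is ignored whenever at least one CRITICAL finding is present; the function name is an assumption.

```python
def severity_multiplier(critical_count: int, high_count: int) -> float:
    """Return the compounding multiplier for HIGH/CRITICAL findings."""
    if critical_count > 0:
        # 1 critical = 0.50, 2 = 0.35, 3 or more = 0.25 (floor)
        return max(0.25, 0.50 - (critical_count - 1) * 0.15)
    if high_count > 0:
        # 1 high = 0.85, 2 = 0.70, 3 = 0.55, 4 or more = 0.45 (floor)
        return max(0.45, 1.0 - high_count * 0.15)
    return 1.0


for critical, high in [(0, 0), (0, 1), (0, 4), (1, 0), (3, 2)]:
    print(critical, high, round(severity_multiplier(critical, high), 2))
```

Rounding is applied only when printing, because values such as 0.35 are not exactly representable in binary floating point.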

Step 3: Apply Formula

FinalScore = max(0, 100 - DirectPenalties) × SeverityMultiplier

Example Calculations

Example 1: Clean Server

Findings: None

Penalties = 0
Multiplier = 1.0
Score = (100 - 0) × 1.0 = 100
Level = 3

Example 2: One HIGH Finding

Findings: 1 HIGH, High confidence

Penalty = 15.0 × 1.0 × 1.0 = 15.0
Multiplier = 0.85 (1 high)
Score = (100 - 15) × 0.85 = 72.25
Level = 1 (score >= 60, 1 high <= 3)

Example 3: Two HIGH Findings

Findings: 2 HIGH, High confidence

Penalties = 15.0 + 15.0 = 30.0
Multiplier = 0.70 (2 high)
Score = (100 - 30) × 0.70 = 49.0
Level = 0 (score < 60)

Example 4: One CRITICAL Finding

Findings: 1 CRITICAL, High confidence

Penalty = 25.0 × 1.0 × 1.0 = 25.0
Multiplier = 0.50 (1 critical)
Score = (100 - 25) × 0.50 = 37.5
Level = 0 (critical present)

Example 5: Five MEDIUM Findings

Findings: 5 MEDIUM, High confidence

Penalties = 5.0 × 5 = 25.0
Multiplier = 1.0 (no high/critical)
Score = (100 - 25) × 1.0 = 75.0
Level = 1 (score >= 60, < 80, no high)

Example 6: Mixed Severity

Findings: 2 HIGH + 3 MEDIUM, High confidence

Penalties = (15.0 × 2) + (5.0 × 3) = 45.0
Multiplier = 0.70 (2 high)
Score = (100 - 45) × 0.70 = 38.5
Level = 0 (score < 60)
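The six examples can be reproduced end to end with a compact script. It is a sketch rather than the reference implementation: the tuple layout for findings and the rounding to two decimals are assumptions, and the penalty and multiplier rules are restated inline so the block runs on its own.

```python
from collections import Counter

BASE = {"critical": 25.0, "high": 15.0, "medium": 5.0, "low": 1.0, "info": 0.2}
CONF = {"high": 1.0, "medium": 0.7, "low": 0.4}


def msss_score(findings):
    """findings: list of (severity, confidence, in_tool_handler) tuples."""
    # Step 1: sum the direct penalties.
    penalties = sum(BASE[s] * CONF[c] * (1.3 if tool else 1.0) for s, c, tool in findings)
    # Step 2: severity multiplier; CRITICAL takes precedence over HIGH.
    counts = Counter(s for s, _, _ in findings)
    if counts["critical"]:
        multiplier = max(0.25, 0.50 - (counts["critical"] - 1) * 0.15)
    elif counts["high"]:
        multiplier = max(0.45, 1.0 - counts["high"] * 0.15)
    else:
        multiplier = 1.0
    # Step 3: apply the formula (rounded to 2 decimals, an assumption).
    return round(max(0.0, 100.0 - penalties) * multiplier, 2)


HIGH = ("high", "high", False)
MEDIUM = ("medium", "high", False)
CRITICAL = ("critical", "high", False)

examples = {
    "1: clean server": [],
    "2: one HIGH": [HIGH],
    "3: two HIGH": [HIGH, HIGH],
    "4: one CRITICAL": [CRITICAL],
    "5: five MEDIUM": [MEDIUM] * 5,
    "6: two HIGH + three MEDIUM": [HIGH, HIGH] + [MEDIUM] * 3,
}
for name, findings in examples.items():
    print(f"Example {name}: {msss_score(findings)}")
```

Run as written, this prints 100.0, 72.25, 49.0, 37.5, 75.0, and 38.5, matching the worked examples.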

JSON Output

{
  "msss_score": {
    "total": 49.0,
    "level": 0,
    "compliant": false,
    "version": "2.0",
    "categories": {
      "A": {
        "score": 7.0,
        "max_score": 22.0,
        "findings": 1,
        "penalties": 15.0
      },
      "B": {
        "score": 0.0,
        "max_score": 13.0,
        "findings": 1,
        "penalties": 15.0
      }
    },
    "score_breakdown": {
      "base_score": 100.0,
      "total_penalties": 30.0,
      "severity_multiplier": 0.70,
      "critical_count": 0,
      "high_count": 2,
      "formula": "(100 - 30.0) × 0.70 = 49.0"
    }
  }
}
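A consumer of this report can sanity-check that `total` agrees with the values in `score_breakdown`. The sketch below uses only field names that appear in the sample above; the function name and the 0.01 tolerance are assumptions.

```python
import json

# Trimmed copy of the sample report above (categories omitted for brevity).
REPORT_JSON = """
{
  "msss_score": {
    "total": 49.0,
    "level": 0,
    "compliant": false,
    "version": "2.0",
    "score_breakdown": {
      "base_score": 100.0,
      "total_penalties": 30.0,
      "severity_multiplier": 0.70,
      "critical_count": 0,
      "high_count": 2
    }
  }
}
"""


def breakdown_is_consistent(report: dict, tolerance: float = 0.01) -> bool:
    """Recompute the score from score_breakdown and compare it with total."""
    score = report["msss_score"]
    breakdown = score["score_breakdown"]
    remaining = max(0.0, breakdown["base_score"] - breakdown["total_penalties"])
    expected = remaining * breakdown["severity_multiplier"]
    return abs(expected - score["total"]) <= tolerance


report = json.loads(REPORT_JSON)
print(breakdown_is_consistent(report))  # True
```

A tolerance is used instead of strict equality because the reported total may be rounded.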

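Improving Your Score

Quick Wins

Remove critical findings first - a single critical finding caps the severity multiplier at 0.50 and forces Level 0; each additional one lowers the multiplier by 0.15 (floor 0.25)
Reduce high findings to <= 1 - each high finding lowers the severity multiplier by 0.15 (floor 0.45)
Fix vulnerabilities in MCP tool handlers - these findings carry a 1.3x penalty multiplier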

By Category

Category	Max Impact	Remediation Approach
A (RCE)	22.0	Use safe APIs, validate all input
B (Filesystem)	13.0	Normalize paths, use allowlists
E (Secrets)	10.0	Use environment variables, secret managers
C (SSRF)	10.0	Validate URLs, block internal IPs
D (SQLi)	10.0	Use parameterized queries
F (Auth)	8.0	Set secure flags, verify signatures
G (Tool Poisoning)	8.0	Review tool descriptions

Level Progression

To reach Level 1:
- Fix all critical findings
- Reduce high findings to <= 3
- Achieve score >= 60

To reach Level 2:
- Fix all high and critical findings
- Achieve score >= 80

To reach Level 3:
- Fix all high and critical findings
- Achieve score >= 90
- Run deep mode analysis
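These checklists can also be expressed as a helper that reports what still blocks a target level. This is a hypothetical convenience function, not part of any MSSS tooling; the function name, message wording, and `deep_mode` flag are assumptions.

```python
def level_blockers(target: int, score: float, critical: int, high: int,
                   deep_mode: bool = False) -> list:
    """List the requirements still unmet for the target level (1, 2, or 3)."""
    min_score = {1: 60, 2: 80, 3: 90}[target]
    max_high = 3 if target == 1 else 0  # Levels 2 and 3 allow no high findings
    blockers = []
    if critical > 0:
        blockers.append(f"fix {critical} critical finding(s)")
    if high > max_high:
        blockers.append(f"reduce high findings from {high} to <= {max_high}")
    if score < min_score:
        blockers.append(f"raise score from {score} to >= {min_score}")
    if target == 3 and not deep_mode:
        blockers.append("run deep mode analysis")
    return blockers


# One HIGH finding at high confidence scores 72.25: Level 1 is met, Level 2 is not.
print(level_blockers(1, 72.25, critical=0, high=1))  # []
print(level_blockers(2, 72.25, critical=0, high=1))
```

An empty list means every listed requirement for the target level is already satisfied.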

Baseline Handling

Baselined findings (accepted/false positives) are:

Not counted in score calculation
Listed separately in output
Tracked for audit purposes

{
  "msss_score": {
    "total": 95.0,
    "baselined_findings": 2,
    "note": "2 findings excluded via baseline"
  }
}
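One way to picture this behavior is to filter baselined findings out before scoring and keep a count for the report. In the sketch below the finding dictionaries, the `baselined` key, and the function name are hypothetical; only the `baselined_findings` and `note` output fields come from the sample above.

```python
def apply_baseline(findings):
    """Split findings into those that count toward the score and a summary."""
    # Hypothetical input shape: each finding is a dict with an optional
    # boolean "baselined" key.
    active = [f for f in findings if not f.get("baselined", False)]
    excluded = len(findings) - len(active)
    summary = {"baselined_findings": excluded}
    if excluded:
        summary["note"] = f"{excluded} findings excluded via baseline"
    return active, summary


findings = [
    {"id": "F-1", "severity": "medium"},
    {"id": "F-2", "severity": "high", "baselined": True},
    {"id": "F-3", "severity": "low", "baselined": True},
]
active, summary = apply_baseline(findings)
print([f["id"] for f in active])  # ['F-1']
print(summary)
```

Only the `active` list would then be passed to the score calculation, while the summary is merged into the report for audit purposes.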

Comparison: v1.0 vs v2.0

Aspect	v1.0	v2.0
Formula	Sum of category penalties	Hybrid multiplicative
Category limits	Yes (capped penalties)	No (direct sum)
HIGH penalty	5.0	15.0
CRITICAL penalty	10.0	25.0
Compounding effect	Weak (0.1 per critical)	Strong (multiplier)
2 HIGH result	~93 (misleading)	~49 (accurate)
Score-Level alignment	Poor	Strong

Why the Change?

The v1.0 model had a fundamental issue: the score didn't reflect actual security posture.

Problem: 2 HIGH findings → Score 93 → "Almost perfect"
Reality: 2 HIGH findings → Server has serious security issues

The v2.0 model ensures that:
- HIGH/CRITICAL findings significantly impact the score
- Multiple severe findings compound (not just add)
- Score aligns with certification Level expectations
- Security teams get accurate risk assessment