MSSS Scoring System v2.0
The MCP Server Security Standard (MSSS) provides a quantitative security assessment for MCP servers.
Overview
MSSS v2.0 scoring produces:
- Numeric Score (0-100): Higher = more secure
- Compliance Level (0-3): Certification tier
- Category Breakdown: Score per vulnerability class
- Severity Multiplier: Compounding effect for HIGH/CRITICAL findings
What's New in v2.0
The v2.0 scoring model fixes a v1.0 problem: servers with multiple HIGH findings, or even a CRITICAL finding, still received scores of 90 or more:
| Scenario | v1.0 Score | v2.0 Score |
|---|---|---|
| 2 HIGH findings | ~93 (Level 1) | ~49 (Level 0) |
| 1 CRITICAL | ~90 | ~37 |
Key Changes:
- Removed category-based penalty caps
- Increased base penalties (CRITICAL: 25, HIGH: 15, MEDIUM: 5)
- Added a severity multiplier for a compounding effect
- Score now aligns with Level expectations
Compliance Levels
| Level | Score Required | Findings Allowed | Use Case |
|---|---|---|---|
| 0 | < 60 or any critical or > 3 high | Any | Not compliant |
| 1 | >= 60 | <= 3 high, 0 critical | Basic, OSS |
| 2 | >= 80 | 0 high, 0 critical | Pro/Enterprise |
| 3 | >= 90 | 0 high, 0 critical | Certified, Premium |
Level Descriptions

Level 0: Not Compliant
- Assigned when any one of these holds:
  - Critical vulnerabilities present
  - Score below 60
  - More than 3 high-severity findings
- Server should not be published or used in production
- Requires immediate remediation

Level 1: Basic Compliance
- Suitable for open-source publication
- Basic security hygiene met
- Some high-severity findings acceptable (up to 3)
- Score >= 60

Level 2: Enterprise Ready
- No high or critical findings
- Score >= 80
- Suitable for enterprise/internal use

Level 3: Certified
- Highest trust level
- Score >= 90
- No high or critical findings
- Deep mode analysis required
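
The level rules can be read as a small decision function. The following is a minimal Python sketch, not the scanner's actual code: the function name `compliance_level` and the `deep_mode` flag are hypothetical, and the sketch assumes that a server which meets the Level 3 score but was not analyzed in deep mode falls back to Level 2.

```python
def compliance_level(score: float, critical_count: int, high_count: int,
                     deep_mode: bool = False) -> int:
    """Map a score and finding counts to a compliance level (0-3)."""
    # Level 0: any critical finding, more than 3 high findings, or score < 60.
    if critical_count > 0 or high_count > 3 or score < 60:
        return 0
    # Levels 2 and 3 allow no high findings at all.
    if high_count == 0:
        # Assumption: Level 3 is only granted when deep mode analysis was run.
        if score >= 90 and deep_mode:
            return 3
        if score >= 80:
            return 2
    # Score >= 60 with at most 3 high findings and no critical findings.
    return 1


print(compliance_level(72.25, critical_count=0, high_count=1))  # 1 HIGH finding
print(compliance_level(37.5, critical_count=1, high_count=0))   # 1 CRITICAL finding
```

Checking the Level 0 conditions first keeps the remaining branches simple, because every later branch can assume a score of at least 60, no critical findings, and at most 3 high findings.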
Score Calculation
Formula (v2.0 Hybrid Multiplicative)
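Step 1: Calculate Direct Penalties
Each finding contributes a penalty equal to its base penalty (set by severity) multiplied by a confidence multiplier and an MCP context multiplier: penalty = base × confidence × context. The penalties of all findings are then summed.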
Base Penalties (v2.0):
| Severity | Base Penalty |
|---|---|
| Critical | 25.0 |
| High | 15.0 |
| Medium | 5.0 |
| Low | 1.0 |
| Info | 0.2 |
Confidence Multiplier:
| Confidence | Multiplier |
|---|---|
| High | 1.0 |
| Medium | 0.7 |
| Low | 0.4 |
MCP Context Multiplier:
| Context | Multiplier |
|---|---|
| In MCP tool handler | 1.3 |
| Not in tool | 1.0 |
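
As a minimal Python sketch of this step, the three tables can be combined into one per-finding penalty. The values come from the tables above. The names `finding_penalty`, `BASE_PENALTY`, `CONFIDENCE_MULTIPLIER`, and `TOOL_HANDLER_MULTIPLIER`, and the lowercase string keys, are assumptions made for illustration.

```python
# Values taken from the tables above; the key spellings are an assumption.
BASE_PENALTY = {"critical": 25.0, "high": 15.0, "medium": 5.0, "low": 1.0, "info": 0.2}
CONFIDENCE_MULTIPLIER = {"high": 1.0, "medium": 0.7, "low": 0.4}
TOOL_HANDLER_MULTIPLIER = 1.3


def finding_penalty(severity: str, confidence: str, in_tool_handler: bool) -> float:
    """Penalty for one finding: base x confidence x MCP context."""
    context = TOOL_HANDLER_MULTIPLIER if in_tool_handler else 1.0
    return BASE_PENALTY[severity] * CONFIDENCE_MULTIPLIER[confidence] * context


# A HIGH finding with medium confidence inside an MCP tool handler:
# 15.0 x 0.7 x 1.3, which is about 13.65.
print(f"{finding_penalty('high', 'medium', in_tool_handler=True):.2f}")
```

The same HIGH finding outside a tool handler and with high confidence costs the full 15.0, which is the case used in the worked examples.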
Step 2: Calculate Severity Multiplier
The severity multiplier provides a compounding effect for multiple HIGH/CRITICAL findings:

For CRITICAL findings:

    // 1 critical = 0.50
    // 2 critical = 0.35
    // 3+ critical = 0.25
    multiplier = max(0.25, 0.50 - (criticalCount - 1) * 0.15)

For HIGH findings (if no CRITICAL):

    // 1 high = 0.85
    // 2 high = 0.70
    // 3 high = 0.55
    // 4+ high = 0.45
    multiplier = max(0.45, 1.0 - highCount * 0.15)

No HIGH/CRITICAL:

    multiplier = 1.0
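
The three cases can be folded into a single function. This is a Python sketch of the rules above; the function name `severity_multiplier` is an assumption, and the branch order reflects the stated rule that HIGH findings only set the multiplier when no CRITICAL finding is present.

```python
def severity_multiplier(critical_count: int, high_count: int) -> float:
    """Severity multiplier for the given CRITICAL and HIGH finding counts."""
    if critical_count > 0:
        # 0.50 for the first critical, minus 0.15 per extra one, floor 0.25.
        return max(0.25, 0.50 - (critical_count - 1) * 0.15)
    if high_count > 0:
        # Minus 0.15 per high finding, floor 0.45.
        return max(0.45, 1.0 - high_count * 0.15)
    return 1.0


# Print the multiplier for a few HIGH counts, rounded for display.
for highs in range(5):
    print(highs, round(severity_multiplier(0, highs), 2))
```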
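Step 3: Apply Formula
Subtract the summed penalties from the base score of 100, then scale the remainder by the severity multiplier:

    Score = (100 - total_penalties) × severity_multiplier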
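
A minimal Python sketch of this final step, inferred from the worked examples and from the `formula` field in the JSON output: subtract the total penalties from 100 and scale the result by the severity multiplier. The function name `msss_score`, the clamp to the documented 0-100 range, and the rounding to two decimals are assumptions, not confirmed behavior.

```python
def msss_score(total_penalties: float, severity_multiplier: float) -> float:
    """Apply (100 - total_penalties) x severity_multiplier."""
    raw = (100.0 - total_penalties) * severity_multiplier
    # Assumption: keep the result inside the documented 0-100 range.
    clamped = min(100.0, max(0.0, raw))
    # Assumption: two decimals, matching the precision of the examples.
    return round(clamped, 2)


print(msss_score(30.0, 0.70))  # two HIGH findings, high confidence
print(msss_score(25.0, 0.50))  # one CRITICAL finding, high confidence
```

Without the clamp, a server whose penalties exceed 100 would receive a negative score, which would fall outside the 0-100 range stated in the overview.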
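Example Calculations
Example 1: Clean Server

    Findings: None
    Penalties = 0
    Multiplier = 1.0 (no high/critical)
    Score = (100 - 0) × 1.0 = 100.0
    Level = 2, or 3 if deep mode analysis was run (score >= 90, no high/critical)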
Example 2: One HIGH Finding

    Findings: 1 HIGH, High confidence
    Penalty = 15.0 × 1.0 × 1.0 = 15.0
    Multiplier = 0.85 (1 high)
    Score = (100 - 15) × 0.85 = 72.25
    Level = 1 (score >= 60, 1 high <= 3)
Example 3: Two HIGH Findings

    Findings: 2 HIGH, High confidence
    Penalties = 15.0 + 15.0 = 30.0
    Multiplier = 0.70 (2 high)
    Score = (100 - 30) × 0.70 = 49.0
    Level = 0 (score < 60)
Example 4: One CRITICAL Finding

    Findings: 1 CRITICAL, High confidence
    Penalty = 25.0 × 1.0 × 1.0 = 25.0
    Multiplier = 0.50 (1 critical)
    Score = (100 - 25) × 0.50 = 37.5
    Level = 0 (critical present)
Example 5: Five MEDIUM Findings

    Findings: 5 MEDIUM, High confidence
    Penalties = 5.0 × 5 = 25.0
    Multiplier = 1.0 (no high/critical)
    Score = (100 - 25) × 1.0 = 75.0
    Level = 1 (score >= 60, < 80, no high)
Example 6: Mixed Severity

    Findings: 2 HIGH + 3 MEDIUM, High confidence
    Penalties = (15.0 × 2) + (5.0 × 3) = 45.0
    Multiplier = 0.70 (2 high)
    Score = (100 - 45) × 0.70 = 38.5
    Level = 0 (score < 60)
JSON Output

    {
      "msss_score": {
        "total": 49.0,
        "level": 0,
        "compliant": false,
        "version": "2.0",
        "categories": {
          "A": {
            "score": 7.0,
            "max_score": 22.0,
            "findings": 1,
            "penalties": 15.0
          },
          "B": {
            "score": 0.0,
            "max_score": 13.0,
            "findings": 1,
            "penalties": 15.0
          }
        },
        "score_breakdown": {
          "base_score": 100.0,
          "total_penalties": 30.0,
          "severity_multiplier": 0.70,
          "critical_count": 0,
          "high_count": 2,
          "formula": "(100 - 30.0) × 0.70 = 49.0"
        }
      }
    }
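
A consumer of this output, such as a CI gate, might read the level and cross-check the total against the breakdown. The Python sketch below uses only field names that appear in the sample above. The helper names `recompute_total` and `meets_level` are hypothetical, and the embedded `SAMPLE` is a trimmed copy of the sample, not real scanner output.

```python
import json

# Trimmed copy of the sample report shown above.
SAMPLE = """
{
  "msss_score": {
    "total": 49.0,
    "level": 0,
    "compliant": false,
    "score_breakdown": {
      "base_score": 100.0,
      "total_penalties": 30.0,
      "severity_multiplier": 0.70
    }
  }
}
"""


def recompute_total(report: dict) -> float:
    """Recompute the total from the score_breakdown fields."""
    b = report["msss_score"]["score_breakdown"]
    return (b["base_score"] - b["total_penalties"]) * b["severity_multiplier"]


def meets_level(report: dict, required_level: int) -> bool:
    """True when the reported level is at least the required level."""
    return report["msss_score"]["level"] >= required_level


report = json.loads(SAMPLE)
print(round(recompute_total(report), 2), meets_level(report, required_level=1))
```

Comparing the recomputed value with `total` using a small tolerance, rather than strict equality, avoids false mismatches from floating-point rounding.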
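Improving Your Score
Quick Wins
- Remove critical findings first: a single critical finding applies a 0.50x multiplier (0.35x for two, 0.25x for three or more) and forces Level 0
- Reduce high findings to <= 1: each high finding lowers the multiplier by 0.15, down to a floor of 0.45
- Fix vulnerabilities in MCP tool handlers: these findings carry a 1.3x penalty multiplier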
By Category
| Category | Max Impact | Remediation Approach |
|---|---|---|
| A (RCE) | 22.0 | Use safe APIs, validate all input |
| B (Filesystem) | 13.0 | Normalize paths, use allowlists |
| E (Secrets) | 10.0 | Use environment variables, secret managers |
| C (SSRF) | 10.0 | Validate URLs, block internal IPs |
| D (SQLi) | 10.0 | Use parameterized queries |
| F (Auth) | 8.0 | Set secure flags, verify signatures |
| G (Tool Poisoning) | 8.0 | Review tool descriptions |
Level Progression

To reach Level 1:
- Fix all critical findings
- Reduce high findings to <= 3
- Achieve score >= 60

To reach Level 2:
- Fix all high and critical findings
- Achieve score >= 80

To reach Level 3:
- Fix all high and critical findings
- Achieve score >= 90
- Run deep mode analysis
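
The progression rules can also be turned into a checklist of what still blocks a target level. This Python sketch is illustrative only: the function name `blockers`, the `deep_mode` flag, and the message wording are assumptions.

```python
def blockers(target_level: int, score: float, critical_count: int,
             high_count: int, deep_mode: bool = False) -> list[str]:
    """List the unmet requirements for reaching target_level (1, 2, or 3)."""
    min_score = {1: 60, 2: 80, 3: 90}[target_level]
    # Level 1 tolerates up to 3 high findings; Levels 2 and 3 tolerate none.
    max_high = 3 if target_level == 1 else 0
    issues = []
    if critical_count > 0:
        issues.append(f"fix {critical_count} critical finding(s)")
    if high_count > max_high:
        issues.append(f"reduce high findings to at most {max_high} (currently {high_count})")
    if score < min_score:
        issues.append(f"raise score to >= {min_score} (currently {score})")
    if target_level == 3 and not deep_mode:
        issues.append("run deep mode analysis")
    return issues


# One HIGH finding at score 72.25, aiming for Level 2.
for issue in blockers(2, score=72.25, critical_count=0, high_count=1):
    print("-", issue)
```

An empty list means every listed requirement for the target level is met.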
Baseline Handling
Baselined findings (accepted/false positives) are:
- Not counted in score calculation
- Listed separately in output
- Tracked for audit purposes

    {
      "msss_score": {
        "total": 95.0,
        "baselined_findings": 2,
        "note": "2 findings excluded via baseline"
      }
    }
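
One way to picture baseline handling is as a filter that runs before scoring. The Python sketch below is hypothetical. The finding shape (`id`, `severity`), the ID values, and the `apply_baseline` helper are invented for illustration. Only the `baselined_findings` and `note` fields come from the sample output above.

```python
def apply_baseline(findings: list[dict], baseline_ids: set[str]) -> tuple[list[dict], list[dict]]:
    """Split findings into (scored, baselined) by membership in baseline_ids."""
    scored = [f for f in findings if f["id"] not in baseline_ids]
    baselined = [f for f in findings if f["id"] in baseline_ids]
    return scored, baselined


# Hypothetical findings; the IDs and shape are assumptions.
findings = [
    {"id": "F-001", "severity": "high"},
    {"id": "F-002", "severity": "medium"},
    {"id": "F-003", "severity": "medium"},
]
scored, baselined = apply_baseline(findings, baseline_ids={"F-001", "F-003"})

# Only `scored` would feed the score calculation; `baselined` is reported separately.
summary = {
    "baselined_findings": len(baselined),
    "note": f"{len(baselined)} findings excluded via baseline",
}
print(summary)
```

Keeping the baselined findings in their own list, rather than discarding them, matches the requirement that they stay visible in the output for audit purposes.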
Comparison: v1.0 vs v2.0
| Aspect | v1.0 | v2.0 |
|---|---|---|
| Formula | Sum of category penalties | Hybrid multiplicative |
| Category limits | Yes (capped penalties) | No (direct sum) |
| HIGH penalty | 5.0 | 15.0 |
| CRITICAL penalty | 10.0 | 25.0 |
| Compounding effect | Weak (0.1 per critical) | Strong (multiplier) |
| 2 HIGH result | ~93 (misleading) | ~49 (accurate) |
| Score-Level alignment | Poor | Strong |
Why the Change?
The v1.0 model had a fundamental issue: the score didn't reflect actual security posture.
- Problem: 2 HIGH findings → Score 93 → "Almost perfect"
- Reality: 2 HIGH findings → Server has serious security issues

The v2.0 model ensures that:
- HIGH/CRITICAL findings significantly impact the score
- Multiple severe findings compound (not just add)
- Score aligns with certification Level expectations
- Security teams get an accurate risk assessment