iip: 7
title: Validator Scoring Mechanism
description: An automated and standardized system for monitoring validator behavior and scores
author: Olivia Saa (@oliviasaa), Andrew Cullen (@cyberphysic4l)
discussions-to: https://github.com/iotaledger/IIPs/discussions/21
status: Draft
type: Standards Track
layer: Core
created: 2026-01-05
requires: None

Motivation
Validators are the backbone of the IOTA network. The performance of each validator directly influences the network’s overall efficiency and usability. Therefore, it is essential to have a reliable automated system for monitoring their behavior and performance.
Currently, a number of metrics are tracked for all validators, which allows manual identification of performance issues, but the only mechanism for penalizing underperforming validators is a manual reporting system. This system requires a quorum of validators to explicitly report a misbehaving peer in order to reduce that peer's rewards. Such an approach is impractical, as it demands continuous manual monitoring, which is an unreasonable burden for most operators. Moreover, no standard criteria exist to guide reporting decisions, resulting in inconsistent and arbitrary thresholds set independently by each validator.
We propose an automated and standardized system for monitoring validator behavior, culminating in a commonly agreed score that reflects each validator’s performance during an epoch. These scores could subsequently be used to directly modify the rewards distributed at the epoch’s end, but the details of reward adjustment are outside the scope of this improvement proposal.
Specification
Performance Metrics
Each validator will monitor its peers throughout an epoch, collecting performance metrics for every other validator. Regardless of the exact set of metrics used, they are divided into two categories, provable and unprovable (a code sketch of both categories follows the list):
- Unprovable metrics: These represent misbehaviors for which a validator cannot produce a proof. Examples include malformed blocks or the dissemination of invalid blocks. Validators will collect and disseminate counts for these unprovable metrics.
- Provable metrics: These include signed but invalid blocks and equivocations. Validators should produce proofs of these behaviors throughout the epoch and disseminate them.
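To make the distinction concrete, the sketch below models the two categories as they might appear in a validator's local state. All type and field names here are illustrative assumptions, not identifiers from the iota repository.

```rust
// Illustrative sketch only: all type and field names below are assumptions,
// not identifiers from the iota repository.

/// Misbehaviors a validator observes locally but cannot prove to its peers;
/// only counts of these are disseminated.
#[derive(Debug, Default, Clone)]
struct UnprovableCounts {
    malformed_blocks: u64,
    invalid_blocks_disseminated: u64,
}

/// Misbehaviors for which a verifiable proof can be produced and shared.
#[derive(Debug, Clone)]
#[allow(dead_code)]
enum MisbehaviorProof {
    /// A correctly signed block that fails validation.
    SignedInvalidBlock { block_digest: [u8; 32] },
    /// Two conflicting blocks signed by the same validator for the same slot.
    Equivocation { first: [u8; 32], second: [u8; 32] },
}

fn main() {
    // A validator's local tally for one observed peer during the epoch.
    let counts = UnprovableCounts { malformed_blocks: 3, ..Default::default() };
    let proof = MisbehaviorProof::SignedInvalidBlock { block_digest: [0u8; 32] };
    println!("{counts:?} {proof:?}");
}
```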
We propose to introduce a new `ConsensusTransactionKind` named `MisbehaviorReport` specifically for the propagation of both proofs of misbehaviors and the counts for unprovable metrics collected throughout the epoch. Whenever a block containing a transaction of this type is committed, validators update their counts of provable misbehaviors and also store the sender's view of the unprovable metrics. This type of transaction should be sent with a reasonable periodicity, so that proofs do not accumulate excessively but block space is not wasted. We propose to follow the same periodicity as checkpoint creation.
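A minimal sketch of what the proposed transaction kind could look like follows. The field names, the use of serialized proofs, and the enum layout are assumptions for illustration; the actual definition in the reference implementation may differ.

```rust
// Hypothetical shape of the proposed transaction kind; field names and the
// enum layout are assumptions and may differ from the reference implementation.

/// A validator's periodic misbehavior report, proposed to be committed with
/// the same periodicity as checkpoint creation.
#[derive(Debug)]
struct MisbehaviorReport {
    /// The epoch the report refers to.
    epoch: u64,
    /// Serialized proofs of provable misbehaviors gathered since the
    /// previous report (e.g. equivocation evidence).
    proofs: Vec<Vec<u8>>,
    /// The sender's local counts of unprovable misbehaviors, one entry per
    /// committee member, indexed by committee position.
    unprovable_counts: Vec<u64>,
}

/// How the new kind could extend the existing consensus transaction enum.
#[derive(Debug)]
enum ConsensusTransactionKind {
    // ... existing kinds elided ...
    MisbehaviorReport(MisbehaviorReport),
}

fn main() {
    let tx = ConsensusTransactionKind::MisbehaviorReport(MisbehaviorReport {
        epoch: 42,
        proofs: vec![],
        unprovable_counts: vec![0, 2, 1, 0],
    });
    println!("{tx:?}");
}
```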
Aggregating Metrics and Calculating Scores

At the end of each epoch, validators should aggregate the committee's differing perceptions of all unprovable metrics in a deterministic way. From this aggregation and the provable counts, they calculate a score for each validator.

Scores can be updated during the epoch according to a partial count of the validators' misbehaviors for monitoring purposes. Furthermore, the metric counts and the score itself are used by the protocol at the epoch end to adjust rewards. Scores are therefore also calculated with the same periodicity as checkpoint creation.
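As an illustration of one deterministic aggregation rule, the sketch below takes the lower median of the committee's reported counts for a single metric and observed validator. The median is an assumption for this example (it limits the influence of outlier reports), and stake weighting is omitted for brevity; this proposal does not prescribe a particular aggregator.

```rust
// One possible deterministic aggregation rule: the lower median of the
// committee's reported counts for a single metric and observed validator.
// The median is an assumption for this sketch; stake weighting is omitted.

/// Aggregate one metric column: each element is one reporter's count for the
/// same observed validator. Sorting makes the result independent of the
/// order in which reports were received. Assumes at least one report.
fn aggregate_counts(mut reported: Vec<u64>) -> u64 {
    reported.sort_unstable();
    // Lower median: deterministic for both odd and even committee sizes.
    reported[(reported.len() - 1) / 2]
}

fn main() {
    // Four validators report their local count of malformed blocks observed
    // from the same peer; one report is a wild outlier.
    let reports = vec![3, 2, 250, 3];
    // The outlier cannot drag the aggregate far from the honest majority view.
    assert_eq!(aggregate_counts(reports), 3);
}
```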
When the very last checkpoint of the epoch is created, all validators share the same view of all `MisbehaviorReport` transactions from the epoch, so a validator can safely and deterministically calculate a global score from the proofs and counts therein. We do not prescribe any specific form for the scoring function in this improvement proposal, other than that it be a weighted combination of each of the metrics included in the `MisbehaviorReport` transactions.
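For concreteness, a toy scoring function of the permitted weighted-combination form is sketched below. The specific weights, the clamping to [0, 1], and the metric fields are assumptions, since the proposal deliberately leaves the concrete function open.

```rust
// Toy scoring function of the permitted weighted-combination form. The
// weights, the clamping to [0, 1], and the metric fields are assumptions;
// the proposal leaves the concrete function open.

/// Aggregated per-validator metrics at epoch end.
struct EpochMetrics {
    /// Aggregated unprovable counts (e.g. malformed blocks reported by peers).
    unprovable: u64,
    /// Counts derived from committed proofs (e.g. equivocations).
    provable: u64,
}

/// Score in [0, 1], where 1.0 is a clean validator. Provable misbehaviors are
/// weighted more heavily because they cannot be fabricated by reporters.
fn score(m: &EpochMetrics) -> f64 {
    const W_UNPROVABLE: f64 = 0.01;
    const W_PROVABLE: f64 = 0.2;
    let penalty = W_UNPROVABLE * m.unprovable as f64 + W_PROVABLE * m.provable as f64;
    (1.0 - penalty).max(0.0)
}

fn main() {
    let clean = EpochMetrics { unprovable: 0, provable: 0 };
    let equivocator = EpochMetrics { unprovable: 5, provable: 2 };
    assert_eq!(score(&clean), 1.0);
    assert!((score(&equivocator) - 0.55).abs() < 1e-9);
}
```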
Rationale

Performance Metrics
The categorization of metrics as provable or unprovable allows them to be treated differently by applying different weights to these behaviors in the scoring function. Unprovable metrics are highly gameable, and although aggregation of scores reduces the impact of inaccurate reporting, unprovable metrics should not lead to severe penalties. Provable metrics, on the other hand, offer a reliable measurement of specific aspects of validator performance and potential malicious behavior, and can therefore be weighted more heavily in a scoring function to provide stronger incentives against these misbehaviors.
The mechanism for sharing metrics also differs between provable and unprovable misbehaviors. Unprovable metrics are entirely local to each validator, so counts of each misbehavior must be explicitly shared by validators and agreed upon through the consensus mechanism. Conversely, because proofs of misbehaviors are embedded in committed blocks, validators already have a common view of all provable metrics. Thus, there is no need to explicitly report any count or score related to provable metrics at the epoch end.
Aggregating Metrics and Calculating Scores
The rationale for aggregating metrics and calculating scores during checkpoint creation is that all calculated scores then use globally agreed values. Furthermore, this timing coincides with the epoch change mechanism, which calculates critical information for advancing to the next epoch. By calculating scores at each checkpoint, all validators hold the same scores at the moment of epoch change, which allows scores to be used to modify epoch rewards as part of future protocol improvements.
Reference Implementation
An initial set of metrics has already been implemented in the iota repository, along with a simple scoring function that serves as a placeholder for a more complete version. This reference implementation is available in the (already merged) PR#7604 and PR#7921. The remaining components required to achieve consensus on an aggregated score are implemented in PR#8521, which also contains an example of a scoring function.
Backwards Compatibility
The introduction of a new type of consensus message is not backward compatible and must be implemented as a protocol upgrade that enables the new functionality. All other changes, such as storing and counting metrics, are local to the node and should not cause any node behavior or agreement problems.