Add a demand-based assessor model for instance-level performance prediction #41

@sangttruong

Recent work on general scales for AI evaluation introduces assessors: external models that predict whether a subject AI system will succeed on a particular task instance, without running the subject model on that instance. In the Nature paper “General scales unlock AI evaluation with explanatory and predictive power,” the strongest lightweight assessor uses an item-level demand vector as input and predicts the probability of success for a fixed subject model. It would be useful to add a PyTorch-native demand-based assessor that can predict response probabilities from structured item features, such as cognitive demand annotations, benchmark metadata, or other task-level descriptors.

Goal

Implement a model that predicts:

P(response = 1 | subject_idx, item_features)

where:

  • subject_idx identifies the model/system being evaluated
  • item_features contains demand annotations or other structured item-level features
  • the output is a calibrated probability of success for each subject-item query

This should support the use case where each subject model has its own response pattern, but the predictor generalizes across items through interpretable item features.
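
One way to check the "calibrated probability" requirement above is a binned calibration error on held-out subject-item pairs. The sketch below is an illustration only: the function name `expected_calibration_error`, the equal-width binning, and the choice of 10 bins are assumptions, not part of this proposal or of torch_measure.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned calibration error for binary outcomes.

    Groups predictions into equal-width probability bins and returns the
    size-weighted mean of |observed success rate - mean predicted probability|.
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_p = sum(p for p, _ in members) / len(members)
        success_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / total * abs(success_rate - mean_p)
    return ece


# Predicting 0.9 when 9 of 10 items succeed is well calibrated;
# predicting 0.9 when none succeed is not.
well_calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
badly_calibrated = expected_calibration_error([0.9] * 10, [0] * 10)
```

A value near zero means predicted probabilities match observed success rates within each bin. Acceptance thresholds and the evaluation split are left open here.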

To address this issue, we should add a new model under torch_measure.models, such as:

DemandAssessor(
    n_subjects: int,
    item_feature_dim: int,
    subject_embedding_dim: int = 16,
    hidden_dim: int = 128,
    n_layers: int = 2,
    dropout: float = 0.0,
    device: str = "cpu",
)

The model should inherit from the existing Predictor abstraction and implement:

predict(query: dict[str, torch.Tensor]) -> torch.Tensor
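
A minimal sketch of what this could look like, under stated assumptions: the Predictor base class is not shown in this issue, so the sketch subclasses `torch.nn.Module` as a stand-in; the query keys `"subject_idx"` and `"item_features"` are taken from the notation above but are not a confirmed schema; and the architecture (a subject embedding concatenated with item features and fed to an MLP) is one plausible reading of the constructor arguments, not a settled design.

```python
import torch
from torch import nn


class DemandAssessor(nn.Module):  # stand-in base; the real class would extend Predictor
    """Subject embedding + item demand features -> MLP -> P(response = 1)."""

    def __init__(
        self,
        n_subjects: int,
        item_feature_dim: int,
        subject_embedding_dim: int = 16,
        hidden_dim: int = 128,
        n_layers: int = 2,
        dropout: float = 0.0,
        device: str = "cpu",
    ) -> None:
        super().__init__()
        # One learned vector per subject model captures its response pattern.
        self.subject_embedding = nn.Embedding(n_subjects, subject_embedding_dim)

        layers = []
        in_dim = subject_embedding_dim + item_feature_dim
        for _ in range(n_layers):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU(), nn.Dropout(dropout)]
            in_dim = hidden_dim
        layers.append(nn.Linear(in_dim, 1))  # one logit per subject-item query
        self.mlp = nn.Sequential(*layers)
        self.to(device)

    def forward(self, subject_idx: torch.Tensor, item_features: torch.Tensor) -> torch.Tensor:
        """Return raw logits of shape (batch,)."""
        z = self.subject_embedding(subject_idx)
        x = torch.cat([z, item_features], dim=-1)
        return self.mlp(x).squeeze(-1)

    def predict(self, query: dict[str, torch.Tensor]) -> torch.Tensor:
        """Return success probabilities in [0, 1], shape (batch,)."""
        logits = self.forward(query["subject_idx"], query["item_features"])
        return torch.sigmoid(logits)


# Usage with made-up sizes: 3 subjects, 5 demand dimensions, 2 queries.
model = DemandAssessor(n_subjects=3, item_feature_dim=5)
query = {
    "subject_idx": torch.tensor([0, 2]),
    "item_features": torch.rand(2, 5),
}
probs = model.predict(query)
```

Keeping `forward` on logits lets training use `torch.nn.BCEWithLogitsLoss` directly, with the sigmoid applied only in `predict`. A sigmoid output is a valid probability but is not calibrated by construction, so calibration would still need to be measured on held-out items.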
