Add a demand-based assessor model for instance-level performance prediction #41

Recent work on general scales for AI evaluation introduces **assessors**: external models that predict whether a subject AI system will succeed on a particular task instance, without running the subject model on that instance. In the Nature paper “General scales unlock AI evaluation with explanatory and predictive power,” the strongest lightweight assessor uses an item-level demand vector as input and predicts the probability of success for a fixed subject model. It would be useful to add a PyTorch-native demand-based assessor that can predict response probabilities from structured item features, such as cognitive demand annotations, benchmark metadata, or other task-level descriptors.

## Goal

Implement a model that predicts:

```text
P(response = 1 | subject_idx, item_features)
```

where:

* `subject_idx` identifies the model/system being evaluated
* `item_features` contains demand annotations or other structured item-level features
* the output is a calibrated probability of success for each subject-item query

This should support the use case where each subject model has its own response pattern, but the predictor generalizes across items through interpretable item features.
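One way to make the target concrete is sketched below. The architecture is an assumption for illustration, not something fixed by this issue: a learned embedding $\mathbf{e}_s$ for subject $s$ is concatenated with the item feature vector $\mathbf{x}_i$ and passed through a small MLP $f_\theta$ that outputs a single logit.

```latex
\hat{p}_{s,i}
  = P(\text{response} = 1 \mid s, \mathbf{x}_i)
  = \sigma\!\big(f_\theta([\mathbf{e}_s ; \mathbf{x}_i])\big),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}

\mathcal{L}(\theta, \mathbf{E})
  = -\sum_{(s,i)} \Big[\, y_{s,i} \log \hat{p}_{s,i}
      + (1 - y_{s,i}) \log\big(1 - \hat{p}_{s,i}\big) \Big]
```

Here $y_{s,i} \in \{0, 1\}$ is the observed response of subject $s$ on item $i$, and $\mathbf{E}$ is the matrix of subject embeddings. Under this formulation the per-subject response pattern lives in $\mathbf{e}_s$, while generalization to unseen items comes from $f_\theta$ operating on $\mathbf{x}_i$.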

To address this issue, we should add a new model under `torch_measure.models`, such as:

```python
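class DemandAssessor(Predictor):  # Predictor: the existing abstraction
    def __init__(
        self,
        n_subjects: int,                  # number of subject models/systems
        item_feature_dim: int,            # length of each item's feature vector
        subject_embedding_dim: int = 16,
        hidden_dim: int = 128,
        n_layers: int = 2,
        dropout: float = 0.0,
        device: str = "cpu",
    ) -> None: ...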
```

The model should inherit from the existing `Predictor` abstraction and implement:

```python
def predict(self, query: dict[str, torch.Tensor]) -> torch.Tensor: ...
```
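A minimal sketch of how such a model might look is given below. Several things here are assumptions rather than part of the proposal: the `Predictor` class is a hypothetical stand-in for the library's real base class (whose interface is not shown in this issue), the query keys `"subject_idx"` and `"item_features"` are borrowed from the notation above, and the embedding-plus-MLP architecture is only one plausible choice.

```python
import torch
from torch import nn
from torch.nn import functional as F


class Predictor(nn.Module):
    """Hypothetical stand-in for the existing `Predictor` abstraction."""

    def predict(self, query: dict[str, torch.Tensor]) -> torch.Tensor:
        raise NotImplementedError


class DemandAssessor(Predictor):
    def __init__(
        self,
        n_subjects: int,
        item_feature_dim: int,
        subject_embedding_dim: int = 16,
        hidden_dim: int = 128,
        n_layers: int = 2,
        dropout: float = 0.0,
        device: str = "cpu",
    ) -> None:
        super().__init__()
        # One learned vector per subject model/system.
        self.subject_embedding = nn.Embedding(n_subjects, subject_embedding_dim)
        # MLP over the concatenated [subject embedding ; item features].
        layers: list[nn.Module] = []
        in_dim = subject_embedding_dim + item_feature_dim
        for _ in range(n_layers):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU(), nn.Dropout(dropout)]
            in_dim = hidden_dim
        layers.append(nn.Linear(in_dim, 1))  # single logit per query
        self.mlp = nn.Sequential(*layers)
        self.to(device)

    def forward(self, subject_idx: torch.Tensor, item_features: torch.Tensor) -> torch.Tensor:
        # Returns raw logits of shape (batch,).
        z = torch.cat([self.subject_embedding(subject_idx), item_features], dim=-1)
        return self.mlp(z).squeeze(-1)

    def predict(self, query: dict[str, torch.Tensor]) -> torch.Tensor:
        # Assumed query keys: "subject_idx" (long, shape (batch,)) and
        # "item_features" (float, shape (batch, item_feature_dim)).
        logits = self.forward(query["subject_idx"], query["item_features"])
        return torch.sigmoid(logits)


# Usage on synthetic data (all values below are illustrative only).
torch.manual_seed(0)
model = DemandAssessor(n_subjects=3, item_feature_dim=18)
query = {
    "subject_idx": torch.tensor([0, 2, 1, 0]),
    "item_features": torch.rand(4, 18),
}
responses = torch.tensor([1.0, 0.0, 1.0, 1.0])  # observed successes/failures

# One training step with binary cross-entropy on the logits.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss = F.binary_cross_entropy_with_logits(
    model(query["subject_idx"], query["item_features"]), responses
)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Inference: one success probability per subject-item query.
model.eval()
with torch.no_grad():
    probs = model.predict(query)
print(probs.shape)
```

A sigmoid output is not calibrated by construction. Training with binary cross-entropy, a proper scoring rule, encourages calibrated probabilities, but a check on held-out responses would still be needed before relying on them.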
