Improved Features types #725

Draft
TobyBoyne wants to merge 13 commits into experimental-design:main from TobyBoyne:typing/feature-key-types
@TobyBoyne
Collaborator

Motivation

With the latest release of ty==0.0.17, the type checker now errors on attribute access on a union where some elements lack the attribute (as mentioned in #719). This means that code like the example below now raises a typing error:

inputs = Inputs(...)
for feature in inputs.get(CategoricalInput):
    print(feature.categories)
# >> ty check
#  Error: Attribute `categories` is not defined on `DiscreteInput`, `ContinuousInput` in union `DiscreteInput | CategoricalInput | ContinuousInput`

A short-term fix is just to pin the version of ty. This PR presents two different possible approaches to fix this in the long term. Let me know what you think of them @jduerholt, and whether you would like me to go ahead with either/both of them :)

1. TypeGuards

We can use TypeGuard to filter down the possible features contained within a Features object. For example, this PR currently includes one for checking if all the features in an Inputs object are continuous. This can be used as below (also see the change in benchmarks.py for an example).

if Inputs.is_continuous(inputs):
    for feature in inputs.get():
        print(feature.lower_bound) 
# no ty check errors!

2. Overloads

We can also overload the get method on Features to specify the types of features in the containers. This approach should be a bit more seamless, since you won't need to call an extra function compared to approach (1). I've currently added an example implementing this for Outputs, with an example in naming_conventions.py.

for feature in inputs.get(CategoricalInput):
    # thanks to the overload, we can now infer the type of `inputs.get(CategoricalInput)` is `Inputs[CategoricalInput]`
    print(feature.categories)
# again, no ty check errors!
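
A rough sketch of what such overloads could look like on a generic container follows. The class and TypeVar names are illustrative assumptions rather than the PR's actual code; the `excludes` parameter mirrors the one discussed in the review thread, and the container is declared covariant here only to keep the two signatures compatible.

```python
from __future__ import annotations

from typing import Generic, Iterator, Sequence, TypeVar, overload


class Feature:
    def __init__(self, key: str) -> None:
        self.key = key


class ContinuousInput(Feature):
    pass


class CategoricalInput(Feature):
    def __init__(self, key: str, categories: list[str]) -> None:
        super().__init__(key)
        self.categories = categories


FeatureT_co = TypeVar("FeatureT_co", bound=Feature, covariant=True)
IncludesT = TypeVar("IncludesT", bound=Feature)


class Features(Generic[FeatureT_co]):
    """Illustrative read-only container (assumption, not the PR's class)."""

    def __init__(self, features: Sequence[FeatureT_co]) -> None:
        self.features = list(features)

    def __iter__(self) -> Iterator[FeatureT_co]:
        return iter(self.features)

    # Precise overload: only `includes` is given, so the element type is known.
    @overload
    def get(self, includes: type[IncludesT], excludes: None = None) -> Features[IncludesT]: ...

    # Fallback overload: anything else falls back to the general feature type.
    @overload
    def get(
        self,
        includes: type[Feature] | None = None,
        excludes: type[Feature] | None = None,
    ) -> Features[Feature]: ...

    # The implementation carries no type hints; the overloads above describe it.
    def get(self, includes=None, excludes=None):
        selected = [
            f
            for f in self.features
            if (includes is None or isinstance(f, includes))
            and (excludes is None or not isinstance(f, excludes))
        ]
        return Features(selected)


inputs = Features([ContinuousInput("x1"), CategoricalInput("x2", ["a", "b"])])
categorical_keys = [feature.key for feature in inputs.get(CategoricalInput)]
non_categorical_keys = [feature.key for feature in inputs.get(excludes=CategoricalInput)]
```

At runtime only the final, undecorated `get` exists; the two `@overload` stubs are there purely for the type checker.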

Have you read the Contributing Guidelines on pull requests?

Yes

Have you updated CHANGELOG.md?

Not yet

Test Plan

The number of ty errors that are currently raised on the CI should go down.

Contributor

@jduerholt jduerholt left a comment

This looks super interesting, I personally like the overload approach, despite not really understanding it :D

@bertiqwerty what do you think? I personally think that implementing the overload would be quite nice.

)
return clean_exp

@overload
Contributor

Why do we need the overload two times?

Collaborator Author

Unfortunately that's part of the specification for overloads (link). I think the idea is that the actual implementation of get should have no type hints, and all of the type hints go into the different overloads.

So the second overload is a fallback for when the argument types don't match the first overload. This will happen if excludes is provided: you can't express Intersection[GetIncludesT, ~GetExcludesT] in Python's type system (yet), so we need the fallback to be more generic!

@TobyBoyne
Collaborator Author

TobyBoyne commented Feb 16, 2026

If you wanted to take this even further, you can also use Annotated to provide type hints for the keys themselves. For example, you can have continuous_input_key: Annotated[str, ContinuousInput]. This still behaves exactly like a string, but the extra context provided by Annotated lets you use it in discriminating overloads. For example:

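from typing import Annotated, overload

class Feature: ...

class ContinuousFeature(Feature): ...

class CategoricalFeature(Feature): ...

# Example registry so the lookup below has something to return.
features: dict[str, Feature] = {"x1": ContinuousFeature()}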

def get_continuous_key() -> Annotated[str, ContinuousFeature]:
    # Dummy function to get a key with the correct annotation.
    # One could imagine Inputs.get_keys(includes=ContinuousInput) returns a similarly
    # annotated feature key
    return "x1"

@overload
def get_feature_from_key(key: Annotated[str, ContinuousFeature]) -> ContinuousFeature:
    ...

@overload
def get_feature_from_key(key: Annotated[str, CategoricalFeature]) -> CategoricalFeature:
    ...

def get_feature_from_key(key: str) -> Feature:
    return features[key]


key = get_continuous_key()
feature = get_feature_from_key(key)
reveal_type(feature) # ContinuousFeature

What do you think? It may be a bit overkill, but really we should only need to touch the features.py file. All of the annotations will (hopefully) nicely propagate out from calling methods on Inputs and Outputs.

@jduerholt
Contributor

I like it, and you are right that we are only using it for the feature containers. It will definitely add some boilerplate code but will make type hints etc. much nicer. @bertiqwerty: should Toby go ahead with this? What do you think?

@TobyBoyne
Collaborator Author

TobyBoyne commented Feb 17, 2026

I was digging into this a bit further, and it turns out the Annotated approach doesn't actually work - the type checkers see the two Annotated arguments as simply str, so it will always match the first one. Oh well! The proper approach would be to use something like NewType, but this may come with its own overhead. I will keep playing around with this!
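
The difference between the two constructs can be illustrated with a small standard-library sketch. The alias name `ContinuousKeyAnnotated` is made up for this example; `ContinuousFeatureKey` is the key type name used later in this comment. `Annotated[str, X]` is just `str` plus metadata that type checkers ignore unless they specifically understand it, whereas `NewType` gives the checker a distinct subtype of `str` while remaining a plain string at runtime.

```python
from typing import Annotated, NewType, get_args, get_type_hints


class ContinuousFeature: ...


# Hypothetical alias: `str` carrying ContinuousFeature as metadata only.
ContinuousKeyAnnotated = Annotated[str, ContinuousFeature]

# A distinct static type for checkers; at runtime the call returns its argument.
ContinuousFeatureKey = NewType("ContinuousFeatureKey", str)


def takes_annotated(key: ContinuousKeyAnnotated) -> None: ...


# By default the metadata is stripped and only the underlying `str` remains.
plain_hints = get_type_hints(takes_annotated)

# The metadata is only visible when it is requested explicitly.
full_hints = get_type_hints(takes_annotated, include_extras=True)

key = ContinuousFeatureKey("x1")
```

Since two `Annotated[str, ...]` parameters both reduce to `str` for the checker, overloads that differ only in the metadata cannot be told apart.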

Edit: The latest commit shows an example of how custom key types might work. Unfortunately, I don't think it's really viable. If you're curious why I say that, it's because in my opinion two key things are missing from the Python type system:

  1. Intersection types: we really want get_by_key(ContinuousFeatureKey) to have return type Intersection[InputT, ContinuousInput], but intersection types don't yet exist in Python.
  2. Union overloads: we have (key: ContinuousFeatureKey) -> ContinuousInput and (key: DiscreteFeatureKey) -> DiscreteInput, and a fallback (key: str) -> AnyInput. This means that any key will always match this last case, and so will always return AnyInput. We would like the fallback to be (key: Intersection[str, Not[ContinuousFeatureKey], Not[DiscreteFeatureKey], ...]) -> AnyInput, but again this isn't supported (and probably isn't very sound from a type hinting perspective).

My suggestion going forward would be to write code like Inputs.get(ContinuousInput).get_by_key(featkey), which means function calling will be a bit more verbose but the type hinting is more explicit. This can all be done with the original overload approach in this PR, and I will remove the NewType stuff.
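
As a rough illustration of that calling pattern, assume a generic container whose `get_by_key` simply returns the container's element type. The classes below are simplified stand-ins, not the PR's implementation, and `get` is given a single generic signature here instead of overloads to keep the sketch short.

```python
from __future__ import annotations

from typing import Generic, Sequence, TypeVar


class Feature:
    def __init__(self, key: str) -> None:
        self.key = key


class ContinuousInput(Feature):
    def __init__(self, key: str, bounds: tuple[float, float]) -> None:
        super().__init__(key)
        self.bounds = bounds


class CategoricalInput(Feature): ...


T = TypeVar("T", bound=Feature)
U = TypeVar("U", bound=Feature)


class Inputs(Generic[T]):
    """Simplified stand-in for the real container (assumption)."""

    def __init__(self, features: Sequence[T]) -> None:
        self._by_key = {f.key: f for f in features}

    def get(self, includes: type[U]) -> Inputs[U]:
        # Narrow the container first; the element type becomes U.
        return Inputs([f for f in self._by_key.values() if isinstance(f, includes)])

    def get_by_key(self, key: str) -> T:
        # The return type follows the container, so no key-specific typing is needed.
        return self._by_key[key]


inputs = Inputs([ContinuousInput("x1", (0.0, 1.0)), CategoricalInput("x2")])
feature = inputs.get(ContinuousInput).get_by_key("x1")
bounds = feature.bounds
```

One consequence of this design is that asking the narrowed container for a key of another feature kind fails at runtime with a `KeyError` rather than returning a feature of an unexpected type.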

@jduerholt
Contributor

Hi @TobyBoyne, thanks for your efforts here. Until this is ready, I will pin the ty version, to see if we get any degradation with respect to the old one.

@jduerholt jduerholt mentioned this pull request Feb 19, 2026
"""
constraints = []
inputs = domain.inputs.get([ContinuousInput, DiscreteInput])
for c in domain.constraints.get(constraint):
Collaborator Author

ty infers the type of c to be Constraint, whereas pyright correctly infers it to be LinearEqualityConstraint | LinearInequalityConstraint. Adding an extra .get([LinearEqualityConstraint, LinearInequalityConstraint]) here reveals @Todo with ty.
