Implement `clean_team_abbrs()` #43


Open

mrcaseb wants to merge 17 commits into main from seb/data

Member

mrcaseb commented Nov 23, 2025

next part of #36

Tests prove that it works.

mrcaseb and others added 14 commits

November 20, 2025 13:49


          Add mappings as parquet from nflreadr

48a94d0


          I think this is required idk

b34e907


          Implement functions to read datasets

963ac21


          news bullet

aac5ada


          Do I need this for export

2c2ea4a


          Test dataset

2d5ca08


          now it's actually exported

262a8ac


          add player_name_mapping as well

7213c2d


          update docs

d3b3d4e


          ruff format

25fd81f


          bump version

7b9fef5


          bump version in uv.lock

70c12be


          Implement clean_team_abbrs()

43c953e


          Merge branch 'main' into seb/data

accd0ad

mrcaseb requested a review from tanho63

November 23, 2025 16:16

mrcaseb added 3 commits

November 23, 2025 17:22


          add bullet item in readme

bd00b04


          add function to docs

809a0e5


          make sure this remains one item

bcd9a0b

guidopetri reviewed

guidopetri left a comment

hey seb, @tanho63 asked me to look over this so i did. i probably went overboard so i apologize lol.

only one real actionable comment, the rest are non actionable / nits, so feel free to ignore. thanks for contributing :)

src/nflreadpy/utils_name_cleaning.py

    
                      abbr: a string or list of strings of abbreviations, full team names, or team nicknames.

                      current_location: If `True` (the default), the abbreviation of the most recent team

                          location will be used.

                      keep_non_matches: If `TRUE` (the default) an element of `abbr` that can't

guidopetri Nov 26, 2025

nit: small typo (True)

src/nflreadpy/utils_name_cleaning.py

    
                      A string list with the length of `abbr` and cleaned team abbreviations\

                      if they are included in `team_abbr_mapping()` or `team_abbr_mapping_norelocate()`\

                      (depending on the value of `current_location`). Non matches may be replaced\

                      with `Nome` (depending on the value of `keep_non_matches`).

guidopetri Nov 26, 2025

nit: another typo (None)

src/nflreadpy/utils_name_cleaning.py

    
                          Otherwise it will be replaced with `None`.

                  Returns:

                      A string list with the length of `abbr` and cleaned team abbreviations\

guidopetri Nov 26, 2025

nit: you don't need backslashes at the end of the lines here

Member Author

mrcaseb Nov 26, 2025

I actually do need those linebreakers for the rendering of the documentation website.

guidopetri Nov 28, 2025

Ah gotcha. I didn't look at the docs website!

src/nflreadpy/utils_name_cleaning.py

Comment on lines +30 to +31

    
                  if isinstance(abbr, str):

                      abbr = abbr.split()

guidopetri Nov 26, 2025

no action: Not particularly pythonic, but given that we are aiming for feature parity with the R version, this is.... fine

Member Author

mrcaseb Nov 26, 2025

I will actually change this. It was stupid as I just wanted to make strings a list of strings.

guidopetri Nov 28, 2025

I don't think it's stupid! If we are aiming for feature parity with R then we should have the same kind of inputs allowed. Now, whether this should be allowed at all (in both the R and python version of this function), well. That's a point we can talk about :P

src/nflreadpy/utils_name_cleaning.py

    
                  if isinstance(abbr, str):

                      abbr = abbr.split()

                  # error if abbr is no list

guidopetri Nov 26, 2025

nit: a lot of these comments are extraneous and can be removed (e.g. all the ones until L45 can be summarized by "arg validation" or even skipped entirely imo)

Member Author

mrcaseb Nov 26, 2025

I agree that for most of these comments the code would be self-explanatory, but since Tan and I aren't particularly experienced Python coders, I think we need some hints for future us.

guidopetri Nov 28, 2025

Very fair

src/nflreadpy/utils_name_cleaning.py

Comment on lines +54 to +55

    
                  # mapping is a polars df. Convert it to a dictionary. We could do the conversion with

                  # a polars join but the code below is just easier to read.

guidopetri Nov 26, 2025

no action: imo, preferable to do this as a polars operation if you expect that people will use long lists of abbr, since it will be more performant that way. But I also think this is fine as is

Member Author

mrcaseb Nov 26, 2025

I had it in polars first and hated the syntax. But now I think I might have to do it because it will be easier to make it work in a polars workflow, i.e. in map_elements or map_batches.

src/nflreadpy/utils_name_cleaning.py

Comment on lines +73 to +77

    
                  # out dropped nonmatches. We replace the None values here if the user wants to keep them

                  if keep_non_matches is True:

                      for index, item in enumerate(out):

                          if item is None:

                              out[index] = abbr[index]

guidopetri Nov 26, 2025

Instead of replacing the dropped non-matches here, maybe something better would be in L59 above:

if keep_non_matches:  # also preferable to use an "implicit" comparison here, especially since you've already validated that this is a bool
    out = [map_dict.get(key.upper(), key) for key in abbr]  # fall back to `key` itself if no match is found
else:
    out = [map_dict.get(key.upper()) for key in abbr]  # same as what you have in L59

Member Author

mrcaseb Nov 26, 2025

This is so much better lol.

Still requires another loop to identify non-matches in the original abbr list but I prefer this solution.

src/nflreadpy/utils_name_cleaning.py

    
                  map_dict = dict(mapping.iter_rows())

                  # lookup with .get method because it replaces nonmatches with a default value (None)

                  out = [map_dict.get(key.upper()) for key in abbr]

guidopetri Nov 26, 2025

nit: out is not a very descriptive variable name

src/nflreadpy/utils_name_cleaning.py

    
              def clean_team_abbrs(

                  abbr: str | list[str], current_location: bool = True, keep_non_matches: bool = True

              ) -> list:

guidopetri Nov 26, 2025

nit: this returns a list[str] specifically

Member Author

mrcaseb Nov 26, 2025

How would I approach a usecase where I want my function to work both inside polars and independently of polars?

It would have to handle str, list[str] and pl.Series as inputs, which isn't particularly hard to do. But it would also have to return either str (e.g. in map_elements), list[str], or pl.Series (maybe this could also be a list).

So the return type would depend on the input type. Which I believe isn't pythonic?

guidopetri Nov 28, 2025

I don't think it's very pythonic no. But I've also definitely seen functions in large python libraries that do this too so I don't think it's too bad if you want to do that.

The way that I'd define it in that case is actually using the overload syntax, where you would essentially define 3 function "headers" (one for each input type, returning the same output type) and then in your "actual" function you would type it with the 3 input types / returning the 3 input types (let me know if this isn't clear, sorry). Depending on the python version you're using / targeting it might also be possible to use typing generics? though I haven't tried that myself.

tests/test_integration.py

Comment on lines +95 to +97

    
                      new_abbr = nfl.clean_team_abbrs(x)

                      new_abbr_drop = nfl.clean_team_abbrs(x, keep_non_matches=False)

                      old_abbr = nfl.clean_team_abbrs(x, current_location=False)

guidopetri Nov 26, 2025

nit: possibly could use pytest.mark.parametrize() here instead to pass in the args / expected output and have this test be shorter/simpler, but also incredibly nitpicky of me
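A parametrized version of such a test might look like the sketch below, using `pytest.mark.parametrize`. To stay self-contained it tests a simplified, hypothetical stand-in for `clean_team_abbrs()` with two made-up mapping dictionaries; the real test would call `nfl.clean_team_abbrs` and use the expected values from the actual mappings.

```python
import pytest

# Hypothetical stand-ins for the two mappings used by the real function
_CURRENT = {"OAK": "LV", "LV": "LV"}
_NORELOCATE = {"OAK": "OAK", "LV": "LV"}


def clean_team_abbrs(abbr, current_location=True, keep_non_matches=True):
    """Simplified stand-in, only here so the test below can run on its own."""
    mapping = _CURRENT if current_location else _NORELOCATE
    return [
        mapping.get(key.upper(), key if keep_non_matches else None)
        for key in abbr
    ]


@pytest.mark.parametrize(
    ("kwargs", "expected"),
    [
        ({}, ["LV", "XYZ"]),                           # defaults
        ({"keep_non_matches": False}, ["LV", None]),   # drop non-matches
        ({"current_location": False}, ["OAK", "XYZ"]), # keep old location
    ],
)
def test_clean_team_abbrs(kwargs, expected):
    assert clean_team_abbrs(["OAK", "XYZ"], **kwargs) == expected
```

Each tuple becomes its own test case in the pytest report, so a failure points directly at the argument combination that broke.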

Member Author

mrcaseb commented Nov 26, 2025

Thank you for your feedback @guidopetri! It's highly appreciated in my journey of doing some basic python dev.

guidopetri commented Nov 28, 2025

For sure, feel free to tag me when you're ready for another review
