-
Notifications
You must be signed in to change notification settings - Fork 15
Implement clean_team_abbrs()
#43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
guidopetri
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey seb, @tanho63 asked me to look over this so i did. i probably went overboard so i apologize lol.
only one real actionable comment, the rest are non actionable / nits, so feel free to ignore. thanks for contributing :)
| abbr: a string or list of strings of abbreviations, full team names, or team nicknames. | ||
| current_location: If `True` (the default), the abbreviation of the most recent team | ||
| location will be used. | ||
| keep_non_matches: If `TRUE` (the default) an element of `abbr` that can't |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: small typo (True)
| A string list with the length of `abbr` and cleaned team abbreviations\ | ||
| if they are included in `team_abbr_mapping()` or `team_abbr_mapping_norelocate()`\ | ||
| (depending on the value of `current_location`). Non matches may be replaced\ | ||
| with `Nome` (depending on the value of `keep_non_matches`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: another typo (None)
| Otherwise it will be replaced with `None`. | ||
| Returns: | ||
| A string list with the length of `abbr` and cleaned team abbreviations\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you don't need backslashes at the end of the lines here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually do need those linebreakers for the rendering of the documentation website.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah gotcha. I didn't look at the docs website!
| if isinstance(abbr, str): | ||
| abbr = abbr.split() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no action: Not particularly pythonic, but given that we are aiming for feature parity with the R version, this is.... fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will actually change this. It was stupid as I just wanted to make strings a list of strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's stupid! If we are aiming for feature parity with R then we should have the same kind of inputs allowed. Now, whether this should be allowed at all (in both the R and python version of this function), well. That's a point we can talk about :P
| if isinstance(abbr, str): | ||
| abbr = abbr.split() | ||
|
|
||
| # error if abbr is no list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: a lot of these comments are extraneous and can be removed (e.g. all the ones until L45 can be summarized by "arg validation" or even skipped entirely imo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree for most of these comments the code would be self-explanatory but since Tan and me aren't particularly experienced python coders I think we need some hints for future us.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very fair
| # mapping is a polars df. Convert it to a dictionary. We could do the conversion with | ||
| # a polars join but the code below is just easier to read. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no action: imo, preferable to do this as a polars operation if you expect that people will use long lists of abbr, since it will be more performant that way. But I also think this is fine as is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had it in polars first and hated the syntax. But now I think I might have to do it because it will be easier to make it work in a polars workflow, I.e. in map_elements or map_batches
| # out dropped nonmatches. We replace the None values here if the user wants to keep them | ||
| if keep_non_matches is True: | ||
| for index, item in enumerate(out): | ||
| if item is None: | ||
| out[index] = abbr[index] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of replacing the dropped non-matches here, maybe something better would be in L59 above:
if keep_non_matches: # (also preferable to just use "implicit" comparison here, especially since you've validated that this is already a bool
out = [map_dict.get(key.upper(), key) for key in abbr] # use the default value of `key` if it does not find a match
else:
out = [map_dict.get(key.upper()) for key in abbr] # same as what you have in L59There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is so much better lol.
Still requires another loop to identify non-matches in the original abbr list but I prefer this solution.
| map_dict = dict(mapping.iter_rows()) | ||
|
|
||
| # lookup with .get method because it replaces nonmatches with a default value (None) | ||
| out = [map_dict.get(key.upper()) for key in abbr] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: out is not a very descriptive variable name
|
|
||
| def clean_team_abbrs( | ||
| abbr: str | list[str], current_location: bool = True, keep_non_matches: bool = True | ||
| ) -> list: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this returns a list[str] specifically
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would I approach a usecase where I want my function to work both inside polars and independently of polars?
It would have to handle str, list[str] and pl.Series as inputs, which isn't particularly hard to do. But it would also require to return either str (e.g. in map_elements), list[str], and pl.Series (maybe this could also be a list).
So the return type would depend on the input type. Which I believe isn't pythonic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's very pythonic no. But I've also definitely seen functions in large python libraries that do this too so I don't think it's too bad if you want to do that.
The way that I'd define it in that case is actually using the overload syntax, where you would essentially define 3 function "headers" (one for each input type, returning the same output type) and then in your "actual" function you would type it with the 3 input types / returning the 3 input types (let me know if this isn't clear, sorry). Depending on the python version you're using / targeting it might also be possible to use typing generics? though I haven't tried that myself.
| new_abbr = nfl.clean_team_abbrs(x) | ||
| new_abbr_drop = nfl.clean_team_abbrs(x, keep_non_matches=False) | ||
| old_abbr = nfl.clean_team_abbrs(x, current_location=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: possibly could use pytest.parametrize() here instead to pass in the args / expected output and have this test be shorter/simpler, but also incredibly nitpicky of me
Thank you for your feedback @guidopetri! It's highly appreciated in my journey of doing some basic python dev. |
|
For sure, feel free to tag me when you're ready for another review |
next part of #36
Tests proof that it works