WIP: Update outlines (and outlines prompts) and transformers versions#256
Conversation
Force-pushed from bf92614 to 542d191
Alright, so functionally this all seems to work, but when running the integration tests there are several differences (mostly that makes sense, since enough pieces have changed here that I wouldn't expect to get the exact same output). However, it seems like there's still some non-determinism. I need to spend some more time with this to understand what's really going on before feeling confident in merging. Anecdotally it does seem a bit faster now, but if it's not deterministic and doesn't enforce the entire schema, what's the point?
Copy in old outlines.prompt decorator for compatibility
Force-pushed from 6ffb3cf to e76e5a5
The problem causing the non-determinism and invalid regression values was improper handling of vocabulary creation on the outlines side: dottxt-ai/outlines#1831. This was merged recently into outlines main, but we are blocked from using it directly due to vllm dependencies. I've implemented a monkey patch fix we can use for the time being. There was a second bug in outlines here: dottxt-ai/outlines#1817. Same as above, we are blocked from the official fix because of vllm.
dmjoy left a comment
@alright-code I think these changes on the whole are fine, I just left a few comments / change requests. I believe the integration test differences are within reason, but before merging I will want to check out this branch and run one or two tests with the latest data and configs to make sure the scores we're getting back are still good.
    from outlines_core import Vocabulary

    # monkey patch to fix https://github.com/dottxt-ai/outlines/pull/1831
If we know what version of outlines should have this fix (probably one version higher than the latest, assuming they haven't yet cut a release with this fix merged), we should note it here to make it easier on our future selves.
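One way to make that note actionable is to gate the patch on the installed version, so it drops out automatically once a fixed release ships. A minimal sketch, assuming plain `X.Y.Z` version strings; the `1.1.0` cutoff and the helper names are hypothetical placeholders, to be updated when outlines actually cuts a release containing PR #1831:

```python
# Hypothetical version gate for the monkey patch. The first_fixed default is
# an assumption -- replace it with the real release that includes PR #1831.

def version_tuple(v: str) -> tuple:
    """Parse a plain X.Y.Z version string into a comparable tuple."""
    return tuple(int(part) for part in v.split(".")[:3])

def needs_vocabulary_patch(installed: str, first_fixed: str = "1.1.0") -> bool:
    """Return True if the installed outlines still needs the monkey patch."""
    return version_tuple(installed) < version_tuple(first_fixed)

# Usage sketch (not run here):
# if needs_vocabulary_patch(importlib.metadata.version("outlines")):
#     apply_vocabulary_patch()  # hypothetical helper containing the patch
```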
    encoded_dialog = tokenizer.apply_chat_template(dialog)

    return tokenizer.decode(encoded_dialog)
    # class SpectrumTunedInferenceEngine(OutlinesTransformersInferenceEngine):
This inference engine is actually needed / used for our recent evaluation experiments, so we should leave it in (not commented out), though maybe it needs to be updated for the outlines update? (FWIW I'm not a fan of how it's implemented here, since it's essentially just changing how prompts are computed from the dialog, which to me doesn't really justify an entirely new inference engine.)
    bert-score = "^0.3.13"
    rich = "^13.6.0"
    rouge-score = "^0.1.2"
    swagger-client = {git = "https://github.com/NextCenturyCorporation/itm-evaluation-client.git", rev = "0.5.2"}
I think it's important to still pin the swagger-client version to 0.5.2; I'm not sure whether that pin carries over with the uv specification now.
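For reference, a minimal sketch of what the pin would look like under uv, assuming the project uses uv's `[tool.uv.sources]` table (the URL and `rev` are copied from the existing dependency spec; the rest of the `[project]` table is elided):

```toml
[project]
dependencies = ["swagger-client"]

[tool.uv.sources]
# rev pins the git dependency to 0.5.2, mirroring the old Poetry spec
swagger-client = { git = "https://github.com/NextCenturyCorporation/itm-evaluation-client.git", rev = "0.5.2" }
```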
    'kdma_values': [{'kdma': 'Moral judgement', 'value': 0.2}]}
    [bold]*CHANGED SCENE TO*: P1[/bold]
    PyTorch version 2.3.1+cu118 available.
    HTTP Request: HEAD https://huggingface.co/roberta-large/resolve/main/config.json "HTTP/1.1 200 OK"
There's already some mechanism(s) for filtering out lines during the integration test diff. We probably want to exclude these "HTTP Request" lines as they can be non-deterministic
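A minimal sketch of the kind of exclusion rule described, assuming the integration-test diff operates on plain text lines (the pattern list and function name are hypothetical, not from the actual integration test script):

```python
import re

# Hypothetical patterns for known-nondeterministic log lines to drop before
# diffing integration-test output; extend as more noisy lines turn up.
NOISE_PATTERNS = [
    re.compile(r"^HTTP Request: "),  # huggingface_hub request logging
]

def filter_noise(lines):
    """Return lines with known-noisy entries removed."""
    return [ln for ln in lines if not any(p.match(ln) for p in NOISE_PATTERNS)]
```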
    "Moral judgement": 0.55
    }
    }
    },
Now I see your point about these timing lines and whether or not there should be an option to disable them. If there's not an easy way to filter them out in the diffs (with the integration test script), then yes, it's probably a good idea to have an option in the pipeline_adm.py code for reporting these timing stats (on by default, but disabled for the integration test runs).
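One possible shape for such a toggle; the function signature and flag name are illustrative, not taken from pipeline_adm.py:

```python
# Hypothetical timing-report toggle: on by default, but callable with
# enabled=False for integration-test runs so the diff stays deterministic.
def format_timing_report(stats: dict, enabled: bool = True) -> list:
    """Render timing stats as report lines; empty when reporting is disabled."""
    if not enabled:
        return []
    return [f"{name}: {seconds:.2f}s" for name, seconds in stats.items()]
```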
There was a problem hiding this comment.
This file looks like it's been hit by some auto-linting (and it's a similar story with one or two of the other files). This particular file was provided by a sub, and I would rather not change it at all, in case they deliver some updates to us, so that the diff is much cleaner. Could either copy the version of this file from main and overwrite this version, or revert the particular commit that made these changes.
In general I would prefer not to auto-lint existing files, since it tends to blow up the diffs with superficial changes.
@eveenhuis @jadie1 @barry-ravichandran FYSA
This is a WIP, but I wanted to draw attention to some of the changes that need to be made for the latest outlines version. Specifically:

- The `outlines.prompt` decorator has been removed; the guidance now is to use the `outlines.Template.from_string` function. Switching existing templates to `outlines.Template.from_string` would also mean changing how they're invoked wherever they're used, and that's scattered around several algorithms, so it's not worth doing here; future templates should strongly consider using this new approach though.
- `sampler` has been removed from outlines; inference control parameters are now just passed through to whatever backend is loading / serving the model (so for deterministic sampling using transformers as the model provider, we would set `do_sample=False` in the `generation_kwargs`).
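A minimal before/after sketch of the two changes above, assuming the new outlines API as described; the template body and kwargs values are illustrative, not from this repo:

```python
# Old (removed): the @outlines.prompt decorator turned a docstring into a template.
#
#     @outlines.prompt
#     def scenario_prompt(dialog):
#         """Respond to: {{ dialog }}"""
#
# New: build the template explicitly and call it to render. Shown as comments
# since it needs outlines installed:
#
#     import outlines
#     scenario_prompt = outlines.Template.from_string("Respond to: {{ dialog }}")
#     text = scenario_prompt(dialog="...")
#
# Sampling control moves out of outlines' removed `sampler` and into the
# backend's generation kwargs; with transformers as the model provider,
# deterministic (greedy) decoding looks like:
generation_kwargs = {
    "do_sample": False,     # greedy decoding instead of sampling
    "max_new_tokens": 512,  # illustrative value
}
```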