Use templating to reduce the size of files metadata #1264
I'm still parsing the actual logic but here are some quick things I noticed:
(this was an interesting little puzzle to solve...I'm 95% sure this solution is canonical)
fixed
I will argue that the base_ct parameter is potentially useful as the template-finding algorithm continues to evolve.
fixed.
fixed.
I'm going to have to think about this one, but it sounds easier to just make typical_tok_len required or remove it entirely.
The suffix tree approach is new to me so I can't comment much on that but I walked through the process end-to-end with the example data and the apply process makes sense to me.
I think there might be an issue with expand if there are $s present in the original data. I think catching any $$s in fully_template before trying to perform substitution would work, as would using safe_substitute (although that seems riskier--more surface area for bad data to creep in). Suggestions are just to try to illustrate where I see the issue--you might have a better solution in mind!
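To illustrate the concern above, here is a minimal sketch using Python's standard `string.Template`. It assumes expansion is done with `Template.substitute`, which the comment implies but the PR code is not shown here; the helper names `escape_literal_dollars` and `expand_text` are hypothetical and not taken from the PR.

```python
from string import Template


def escape_literal_dollars(text: str) -> str:
    """Double every literal '$'; '$$' is string.Template's escape for '$'."""
    return text.replace("$", "$$")


def expand_text(templated: str, mapping: dict) -> str:
    """Expand placeholders strictly so unknown or malformed ones fail loudly."""
    return Template(templated).substitute(mapping)


# Hypothetical metadata value that already contains a '$'
original = "costs $5 per sample description"
mapping = {"k0": "description"}

# Escape the original data first, then introduce the placeholder
escaped = escape_literal_dollars(original)
templated = escaped.replace("description", "${k0}")

# Expansion turns '$$' back into '$' and fills in '${k0}'
restored = expand_text(templated, mapping)
```

Without the escaping step, `Template(original).substitute(mapping)` raises `ValueError` because `$5` is not a valid placeholder. `safe_substitute` would avoid the exception, but it would also leave any genuinely malformed or unknown placeholder in place without complaint, which is the extra surface area mentioned above.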
This is a philosophical difference rather than a functional one, so I defer ;)
Since everything currently uses that param I think making it positional/required is perfectly fine (as is removing it, since the value is uniform).
sample_templated_list_pretty.json
This PR adds airflow/dags/template_utils.py, which contains algorithms to automatically find templating options in the long list of dicts that make up the files:metadata. A class TemplateBuilder is introduced. This class can transform a list of file metadata dicts into a smaller structure in which dict terms have been replaced by templates. For example, the key "description" would be replaced by "${k0}". The value of the description would be replaced by an equivalent string with substituted templates. The dict of templates is prepended to the updated list of dicts, resulting in an overall smaller structure.
TemplateBuilder also includes a static method expand() which undoes the transformation.
utils.make_send_status_msg_function() is modified to apply TemplateBuilder as the files metadata is constructed.
See the attached file for a sample of the template dict and a few templated files entries.
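The round trip described above can be sketched roughly as follows. This is a toy stand-in, not the PR's algorithm: it templates dict keys only and does not attempt the suffix-tree search for repeated substrings in values. The function names `build_key_templates` and `expand`, and the sample records, are assumptions made for illustration.

```python
from string import Template


def build_key_templates(records):
    """Replace each distinct dict key with a short placeholder (k0, k1, ...)."""
    names = {}
    for rec in records:
        for key in rec:
            names.setdefault(key, f"k{len(names)}")
    # Template dict maps placeholder name -> original key
    templates = {name: key for key, name in names.items()}
    templated = [
        {"${" + names[key] + "}": value for key, value in rec.items()}
        for rec in records
    ]
    # Prepend the template dict to the list of templated records
    return [templates] + templated


def expand(templated_list):
    """Undo build_key_templates using the prepended template dict."""
    templates, *records = templated_list
    return [
        {Template(key).substitute(templates): value for key, value in rec.items()}
        for rec in records
    ]


# Hypothetical file metadata entries
files = [
    {"description": "raw reads", "rel_path": "a.fastq"},
    {"description": "raw reads", "rel_path": "b.fastq"},
]
packed = build_key_templates(files)
unpacked = expand(packed)
```

Here `packed[0]` is the template dict and the remaining entries carry `${k0}`-style keys. The savings only appear when the replaced terms are longer than their placeholders and repeat across many entries, which is the case the PR targets.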