Dropbox
Proposal 1: Modeling_Data subdirectories as a Drop Box
In this proposal, you can put things anywhere in Modeling_Data as long as:
• you can point to a reader that reads the data, applies provider flags the way you want, and transforms it into a dataframe,
• the filenames sort lexicographically,
• you make a small entry in recipes/data_recipes.yaml describing how to read the data, plus a few pieces of metadata.
A checker will be provided.
Nightly, these files will be swept into /formatted, and thereafter they are safe; whether the raw data are safe is largely up to users.
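As a minimal sketch of the reader contract described above (the function name matches the read_ts referenced later in the recipe example, but the CSV layout, flag column, and flag codes here are hypothetical assumptions):

```python
# Hypothetical reader contract: given a file, read the raw data, apply
# provider QA/QC flags, and return a tidy pandas DataFrame. A recipe's
# "reader: read_ts" entry would point at a function with roughly this shape.
import io
import pandas as pd

def read_ts(path_or_buf, flag_col="qaqc_flag", bad_flags=("X", "R")):
    """Read a raw CSV time series, mask flagged values, return a DataFrame."""
    df = pd.read_csv(path_or_buf, parse_dates=["datetime"], index_col="datetime")
    if flag_col in df.columns:
        # Blank out values whose provider flag marks them rejected/missing.
        df.loc[df[flag_col].isin(bad_flags), "value"] = None
        df = df.drop(columns=[flag_col])
    return df

# An in-memory file standing in for a dropped-off CSV:
raw = io.StringIO(
    "datetime,value,qaqc_flag\n"
    "2024-01-01 00:00,1.2,G\n"
    "2024-01-01 00:15,9.9,X\n"
)
ts = read_ts(raw)
```

The point is only the signature: path in, flagged values masked, dataframe out, so the nightly sweep can treat every source uniformly.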
Use cases:
- Mokelumne. Populated by USGS data and two types of EBMUD data.
- Daily data that can be downloaded. This would not be included here if a daily downloader can handle it in roughly the same way we now handle the continuous data.
- CCF gates. This is provided to us in a subfolder automatically by the SCADA people. Continuous but not regular. I derive a simpler series that is even more irregular but sparser and distills the information in a useful way.
- Banks pumping. This is grabbed opportunistically and considerably transformed, from pump on/off switches to flow in cfs.
- Unofficial data from official sources. Often we get data from the flow/WQ groups at NCRO that they don't want to publish officially but that we can describe as a short-term station. Often these are "cross-program" collections, for instance stage data collected by the flow group. They are acquired during projects or over email, and may or may not be maintained long term.
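The Banks pumping transformation above could be sketched as follows; the unit names and rated capacities are made-up placeholders, not the real plant figures:

```python
# Sketch of a switches-to-flow transformation: convert per-unit on/off pump
# switch records into a total flow series in cfs by multiplying each unit's
# state by its (hypothetical) rated capacity and summing across units.
import pandas as pd

# Hypothetical rated capacities in cfs for three pump units.
UNIT_RATING_CFS = {"unit1": 1100.0, "unit2": 1100.0, "unit3": 700.0}

def switches_to_cfs(switches: pd.DataFrame) -> pd.Series:
    """switches: one 0/1 column per unit, datetime index. Returns total cfs."""
    flows = switches.mul(pd.Series(UNIT_RATING_CFS))  # per-unit flow, aligned on columns
    return flows.sum(axis=1).rename("flow_cfs")

idx = pd.date_range("2024-01-01", periods=3, freq="h")
sw = pd.DataFrame(
    {"unit1": [1, 1, 0], "unit2": [0, 1, 0], "unit3": [0, 0, 1]}, index=idx
)
flow = switches_to_cfs(sw)
```

A real version would also have to handle ramping and partial-gate states, which is part of why the derived series is "considerably transformed" rather than a direct relabeling.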
The proposal is that these can be put in /dropbox/data, but also anywhere under Modeling_Data:
```
Modeling_Data
- repo
  - continuous
    - formatted
  - daily
    - formatted   (or should it be repo -> formatted?)
- repo_staging
  - continuous
  - daily
- continuous
- repo_dropbox
  - data
  - recipes
    - data_recipes.yaml
  - mokelumne
```
The crux is data_recipes.yaml, the purpose of which is to do the following:
- Make sure we know what is/has been swept into our repository
- Make it easy to update stray data by adding more
- Connect the entries to enough metadata and standardization
- Address how possibly-overlapping updates work
- Allow the user to launch a checker

An example entry:
```yaml
- name: pcnb_elev
  file_pattern: SomeWeird_MOKE_golf_name.csv
  location: Modeling_Data/repo_dropbox/data  # This could be understood as a default
  reader: read_ts        # Names, pointers to code etc. To be fleshed out
  reader_params: ...
  freq: 15min            # None for irregular, "infer" to infer
  metadata:              # Anything can be added, but the items below are required for a well-formed entry
    station: pcnb        # Entry required in station_dbase.csv
    sublocation: None    # Optional, but if provided checked against a list
    agency: dwr_ncro     # Robust to aliases like "ncro" or "dwr-ncro"
    variable: elev       # Checked against the data dictionary; some translations for common terms
    unit: stage          # Checked against the data dictionary; some translations for common terms (e.g. cfs to ft^3/s)
- name: etc
```
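A checker along these lines might, as a minimal sketch, load data_recipes.yaml with PyYAML and verify that each entry is well formed. The required-field names follow the example above; the alias table and the exact set of top-level fields are assumptions to be fleshed out:

```python
# Sketch of the proposed recipe checker: validate one entry from
# data_recipes.yaml against the required metadata fields, normalizing
# agency aliases along the way. The alias map is a hypothetical example.
import yaml

REQUIRED_METADATA = ("station", "agency", "variable", "unit")
AGENCY_ALIASES = {"ncro": "dwr_ncro", "dwr-ncro": "dwr_ncro"}  # robust to aliases

def check_recipe(entry: dict) -> list:
    """Return a list of problems found in one recipe entry ([] if clean)."""
    problems = []
    for key in ("name", "file_pattern", "reader"):
        if key not in entry:
            problems.append(f"missing field: {key}")
    meta = entry.get("metadata", {})
    for key in REQUIRED_METADATA:
        if key not in meta:
            problems.append(f"missing metadata: {key}")
    # Normalize agency aliases before any station_dbase.csv lookup.
    if "agency" in meta:
        meta["agency"] = AGENCY_ALIASES.get(meta["agency"], meta["agency"])
    return problems

doc = yaml.safe_load("""
- name: pcnb_elev
  file_pattern: SomeWeird_MOKE_golf_name.csv
  reader: read_ts
  metadata:
    station: pcnb
    agency: ncro
    variable: elev
    unit: ft
""")
issues = check_recipe(doc[0])
```

A full checker would additionally look up the station in station_dbase.csv and the variable/unit in the data dictionary, which this sketch stubs out.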