
Proposal: Modeling Data subdirectories are a Drop Box #52

@dwr-psandhu

Description


Dropbox

Proposal 1: Modeling Data subdirectories are a Drop Box

In this proposal, you can put things anywhere in Modeling_Data as long as:
• you can point to a reader that reads the data, applies provider flags the way you want, and transforms it into a dataframe,
• the filenames sort lexicographically, and
• you make a small entry in recipes/data_recipes.yaml describing how to read the data plus a few pieces of metadata.
A checker will be provided to validate entries.
Nightly, the files will be swept into /formatted, after which they are safe; whether the raw files are safe is largely up to users.
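For concreteness, here is a minimal sketch of what the nightly sweep could look like. Everything in it is illustrative: the READERS registry, the default location, and pandas read_csv standing in for a real reader such as the read_ts named in the recipe example below.

import glob
import os

import pandas as pd
import yaml

# Stand-in registry; the real one would map names like "read_ts" to readers
# that also apply provider flags before producing a dataframe.
READERS = {"read_ts": pd.read_csv}

def sweep_dropbox(recipes_path, formatted_dir):
    """Sweep every recipe-described file into the formatted area."""
    with open(recipes_path) as f:
        recipes = yaml.safe_load(f)
    for recipe in recipes:
        base = recipe.get("location", "Modeling_Data/repo_dropbox/data")
        reader = READERS[recipe["reader"]]
        # Lexicographic sort is the contract: later filenames supersede earlier ones.
        for path in sorted(glob.glob(os.path.join(base, recipe["file_pattern"]))):
            # Assumes reader_params, when present, is a mapping of keyword arguments.
            df = reader(path, **(recipe.get("reader_params") or {}))
            df.to_csv(os.path.join(formatted_dir, recipe["name"] + ".csv"))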

Use cases:

  1. Mokelumne. Populated by USGS data and two types of EBMUD data.
  2. Daily data that can be downloaded: this would not be included if a daily downloader can handle it in roughly the same way we now handle the continuous data.
  3. CCF gates. This is provided to us in a subfolder automatically by the SCADA people. It is continuous but not regular. I derive a simpler series that is even more irregular but sparser and distills the information in a useful way (one possible distillation is sketched after this list).
  4. Banks pumping. This is grabbed opportunistically and considerably transformed, from pumping switches to flow in CFS.
  5. Unofficial data from official sources: Often we get data from the flow/WQ groups at NCRO that they don't want to publish officially but that we can describe as a short-term station. Often these are "cross-program" collections – for instance, stage data collected by the flow group. They are acquired during projects or over email. They may or may not be maintained long term.
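To make use case 3 concrete, here is one plausible reading of "distills the information": keep only the samples at which the gate record actually changes. The helper below is hypothetical, not the actual CCF derivation.

import pandas as pd

def distill_changes(ts: pd.Series) -> pd.Series:
    """Reduce a continuous-but-irregular record to its change points.

    The result is even more irregular but much sparser, and it still
    reconstructs the original as a step (forward-filled) series.
    """
    # True at the first sample and wherever the value differs from its predecessor.
    keep = ts.ne(ts.shift())
    return ts[keep]

For a gate position that changes a few times a day, this drops almost all of a 15-minute record while losing nothing that step interpolation cannot recover.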

The proposal is that these can be put in /dropbox/data but also anywhere on Modeling_Data:
Modeling_Data

  • repo
    • continuous
      • formatted
    • daily
      • formatted (or should it be repo->formatted?)
    • repo_staging
      • continuous
      • daily
  • repo_dropbox
    • data
    • recipes
      • data_recipes.yaml
    • mokelumne

The crux is data_recipes.yaml, the purpose of which is to do the following:

  1. Make sure we know what is or has been swept into our repository.
  2. Make it easy to update stray data by adding more.
  3. Connect the entries to enough metadata and standardization (station, agency, variable, unit).
  4. Address how possibly-overlapping updates work.
  5. Allow the user to launch a checker, as in the example entry below:

- name: pcnb_elev
  file_pattern: SomeWeird_MOKE_golf_name.csv
  location: Modeling_Data/repo_dropbox/data    # This could be understood as a default 
  reader: read_ts        # Names, pointers to code etc. To be fleshed out
  reader_params: ... 
  freq: 15min   # None for irregular, "infer" to infer it from the data.
  metadata:              # Anything can be added, but the items below are required for a well formed entry
     station: pcnb       # Entry required in station_dbase.csv
     sublocation: None   # Optional; if provided, checked against a list
     agency: dwr_ncro    # Robust to aliases like "ncro" or "dwr-ncro"
     variable: elev      # Checked against data dictionary, some translations for common terms
     unit: stage         # Checked against data dictionary, some translations for common terms (e.g. cfs to ft^3/s)
- name: etc  
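Item 5 calls for a checker. Below is a sketch of what a well-formedness check might do, assuming station_dbase.csv and the data dictionary have already been loaded into simple lookup sets; the alias tables and all function names are hypothetical.

import yaml

# Hypothetical alias tables; the real checker is "robust to aliases" such as
# "ncro"/"dwr-ncro" and translates common unit spellings like cfs -> ft^3/s.
AGENCY_ALIASES = {"ncro": "dwr_ncro", "dwr-ncro": "dwr_ncro"}
UNIT_ALIASES = {"cfs": "ft^3/s"}

REQUIRED = ("station", "agency", "variable", "unit")

def check_entry(entry, stations, agencies, variables, units):
    """Return a list of problems with one recipe entry; empty means well formed."""
    meta = entry.get("metadata", {})
    errors = [f"{entry['name']}: missing metadata '{k}'" for k in REQUIRED if k not in meta]
    if meta.get("station") not in stations:  # entry required in station_dbase.csv
        errors.append(f"{entry['name']}: unknown station {meta.get('station')}")
    if AGENCY_ALIASES.get(meta.get("agency"), meta.get("agency")) not in agencies:
        errors.append(f"{entry['name']}: unknown agency {meta.get('agency')}")
    if meta.get("variable") not in variables:  # checked against the data dictionary
        errors.append(f"{entry['name']}: unknown variable {meta.get('variable')}")
    if UNIT_ALIASES.get(meta.get("unit"), meta.get("unit")) not in units:
        errors.append(f"{entry['name']}: unknown unit {meta.get('unit')}")
    return errors

def check_recipes(path, **reference_sets):
    """Load data_recipes.yaml and check every entry against the reference sets."""
    with open(path) as f:
        recipes = yaml.safe_load(f)
    return [err for entry in recipes for err in check_entry(entry, **reference_sets)]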
