Some things to work on

### 2023 Aug 16
- [ ] CSVScanner needs:
  - [ ] provide delimiter params so that `csv.reader` can do its job correctly
    - [ ] extend those parameters to the argparse cli options
  - [ ] should know nothing about any DB or DDL dialect: pass in a function or method reference to handle the result of `CSVScanner.scan()`
  - [X] break out the progress indicator by shoving it into a `progress_fn: Callable` callback parameter [8/20]
  - [X] provide a `progress_interval: int` parameter to control the frequency of the progress indicator [8/20]
  - [ ] row iteration: provide a `sample_pct: number` which specifies the percentage of rows to check for type or length
    - [ ] maybe use `self._csv_fh.seek(n)` to skip to the next apparent sample row; this might necessitate reinstantiating `csv.reader` to begin after the next newline
    - [ ] or, use an `io` text wrapper (e.g. `io.TextIOWrapper`) to skip rows behind the scenes so that the `reader` instance isn't affected
  - [ ] if possible, abstract away the CSV/TSV specifics so that scanning any data source is feasible: pass in a class that handles opening, iterating over, and breaking down the data source, and that CSVScanner invokes to produce a row or a block of rows for processing in `.scan()`. Maybe too Java-like, though; alternatively, make CSVScanner a subclass of an ABC, DataScanner.
- [ ] csv2db.py
  - [ ] `zip_walker` needs to be a class ZipCollection with a base called CSVCollection or something
    - [ ] it's the head interface for instantiating CSVScanner, and outputting `CSVScanner.result()`, so `csv.reader` parameters need to go here
  - [ ] `create_import_sqlite` does a lot of heavy lifting by interfacing with the given DBMS, issuing `create table xyz` and then inserting rows. Abstracting some of this would be healthy:
    - [ ] sql dialect
    - [ ] split the create and insert steps into at least separate methods, but probably separate classes
    - [X] provide the same kind of `progress_fn: Callable` callback and `progress_interval: int` that CSVScanner provides. `progress_interval` could instead be a percentage, or an every-n-rows sort of event criterion [8/20]
- [X] regex filter file pathname & extension from cli [8/20]
- [ ] logging: offer a log level setting via argparse
- [X] all stdout should be routed to a callback: a caller using only the lib should be responsible for any console or gui output [8/20]
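
A rough sketch of how several of the CSVScanner items above might fit together, in case it helps the discussion. Everything here is an assumption rather than the project's actual code: the constructor signature, `result_fn`, the per-column max-width result, and the `--delimiter` / `--quotechar` / `--log-level` flag names are all hypothetical. For `sample_pct` it uses a simple every-n-th-row stride instead of the `seek()` or buffer-skipping ideas noted above, so every row is still read, just not inspected.

```python
import argparse
import csv
import io
from typing import Callable, Dict, Optional, TextIO


class CSVScanner:
    """Hypothetical sketch: scans a CSV stream and records max field widths."""

    def __init__(
        self,
        csv_fh: TextIO,
        *,
        delimiter: str = ",",
        quotechar: str = '"',
        sample_pct: float = 100.0,
        progress_fn: Optional[Callable[[int], None]] = None,
        progress_interval: int = 1000,
        result_fn: Optional[Callable[[Dict[str, int]], None]] = None,
    ) -> None:
        if not 0 < sample_pct <= 100:
            raise ValueError("sample_pct must be in (0, 100]")
        self._csv_fh = csv_fh
        self._delimiter = delimiter
        self._quotechar = quotechar
        # Assumption: approximate the sample by inspecting every n-th row.
        self._stride = max(1, round(100 / sample_pct))
        self._progress_fn = progress_fn
        self._progress_interval = progress_interval
        self._result_fn = result_fn

    def scan(self) -> Dict[str, int]:
        # Delimiter settings are forwarded so csv.reader can parse correctly.
        reader = csv.reader(
            self._csv_fh, delimiter=self._delimiter, quotechar=self._quotechar
        )
        header = next(reader, None)
        if header is None:
            return {}
        widths = {name: 0 for name in header}
        for n, row in enumerate(reader, start=1):
            # Progress is reported through the callback, never printed here.
            if self._progress_fn and n % self._progress_interval == 0:
                self._progress_fn(n)
            if (n - 1) % self._stride != 0:
                continue  # row falls outside the sample
            for name, value in zip(header, row):
                widths[name] = max(widths[name], len(value))
        # The caller decides what to do with the result (DDL, logging, ...).
        if self._result_fn:
            self._result_fn(widths)
        return widths


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI options mirroring the scanner parameters."""
    parser = argparse.ArgumentParser(description="scan CSV files")
    parser.add_argument("--delimiter", default=",")
    parser.add_argument("--quotechar", default='"')
    parser.add_argument(
        "--log-level",
        default="WARNING",
        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
    )
    return parser
```

A caller that only uses the library could pass `progress_fn=print` for console output, or a GUI update function, and hand `result_fn` whatever builds the DDL for its chosen dialect.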


