provide delimiter params so that csv.reader can do its job correctly
extend those parameters to the argparse cli options
CSVScanner should know nothing about any DB or DDL dialect: a function or method reference should be passed in to handle the result of CSVScanner.scan()
break out the progress indicator by shoving it into a progress_fn: Callable callback parameter [8/20]
provide a progress_interval: int parameter to control the frequency of the progress indicator [8/20]
row iteration: provide a sample_pct: number which specifies the percentage of rows to check for type or length
maybe use self._csv_fh.seek(n) to skip to the next apparent sample row; this might necessitate reinstantiating csv.reader to begin after the next newline
or, use an io.TextIOBase-style wrapper around the file handle to skip rows behind the scenes so that the reader instance isn't affected
if possible, abstract away the fact that this is about CSV or TSV and make scanning any data source feasible: pass in a class that handles opening, iterating over, and breaking down the data source, and that CSVScanner invokes to produce a row or a block of rows to be processed in .scan(). Maybe too Java-like though; alternatively, make CSVScanner a subclass of an ABC DataScanner.
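A minimal sketch of how the scanner items above could fit together. Everything here is an assumption made for illustration, not the existing code: the constructor signature, the `on_result` parameter name, and the choice to track only the maximum field length per column. Sampling is done by reading every row and inspecting only every n-th one, which is simpler than the seek() idea and leaves the csv.reader instance untouched.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Optional


class CSVScanner:
    """Hypothetical sketch: records the longest value seen in each column."""

    def __init__(
        self,
        csv_fh: Iterable[str],
        progress_fn: Optional[Callable[[int], None]] = None,
        progress_interval: int = 1000,
        sample_pct: float = 100.0,
        **reader_kwargs,
    ) -> None:
        if not 0 < sample_pct <= 100:
            raise ValueError("sample_pct must be in (0, 100]")
        # reader_kwargs (delimiter, quotechar, ...) go straight to csv.reader.
        self._reader = csv.reader(csv_fh, **reader_kwargs)
        self._progress_fn = progress_fn
        self._progress_interval = progress_interval
        # Inspect every step-th row: 100% means every row, 25% every 4th row.
        self._step = max(1, round(100.0 / sample_pct))
        self._max_len: Dict[str, int] = {}

    def scan(self, on_result: Optional[Callable[[Dict[str, int]], None]] = None) -> None:
        header = next(self._reader, None)
        if header is None:  # empty input: nothing to scan
            return
        self._max_len = {name: 0 for name in header}
        for row_num, row in enumerate(self._reader, start=1):
            if (row_num - 1) % self._step == 0:
                for name, value in zip(header, row):
                    self._max_len[name] = max(self._max_len[name], len(value))
            # The scanner never prints; the caller decides what progress means.
            if self._progress_fn and row_num % self._progress_interval == 0:
                self._progress_fn(row_num)
        # The scanner knows nothing about DBs or DDL; it just hands the result over.
        if on_result is not None:
            on_result(self.result())

    def result(self) -> Dict[str, int]:
        return dict(self._max_len)


# Usage: a TSV source, progress reported every 2 rows, half of the rows sampled.
tsv = io.StringIO("id\tname\n1\tann\n2\tbartholomew\n3\tcy\n4\tdi\n")
scanner = CSVScanner(tsv, progress_fn=lambda n: None, progress_interval=2,
                     sample_pct=50, delimiter="\t")
scanner.scan(on_result=print)
```

Note the trade-off that sampling implies: with sample_pct=50 the long value on row 2 is never inspected, so the reported length for that column is an underestimate.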
csv2db.py
zip_walker needs to be a class ZipCollection with a base called CSVCollection or something
it's the head interface for instantiating CSVScanner, and outputting CSVScanner.result(), so csv.reader parameters need to go here
create_import_sqlite does a lot of heavy lifting by interfacing with the given DBMS, issuing create table xyz and then inserting rows. Abstracting some of this would be healthy:
SQL dialect
separate the create and the insert into at least separate methods, but probably separate classes
provide the same type of progress_fn: Callable callback and progress_interval: int that csv2db.py provides. progress_interval: number could be a percent, or an every-n-rows sort of event criterion [8/20]
filter file pathname and extension by a regex given on the CLI [8/20]
logging: offer a log level setting via argparse
all stdout should be routed to a callback: a caller using only the lib should be responsible for any console or gui output [8/20]
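A rough sketch of the CLI-side items above, assuming hypothetical flag names (`--delimiter`, `--quotechar`, `--path-regex`, `--log-level`, `--progress-interval`); none of them are taken from the existing csv2db.py. It only parses options and filters pathnames, so any console output stays with the caller.

```python
import argparse
import logging
import re
from typing import Iterable, List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical option set; flag names are illustrative only."""
    parser = argparse.ArgumentParser(prog="csv2db")
    parser.add_argument("--delimiter", default=",",
                        help="field delimiter passed on to csv.reader")
    parser.add_argument("--quotechar", default='"',
                        help="quote character passed on to csv.reader")
    parser.add_argument("--path-regex", default=None,
                        help="only process pathnames matching this regex")
    parser.add_argument("--progress-interval", type=int, default=1000,
                        help="call the progress callback every N rows")
    parser.add_argument("--log-level", default="WARNING",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"])
    return parser


def filter_paths(paths: Iterable[str], pattern: Optional[str]) -> List[str]:
    """Keep only the pathnames that match the regex; no pattern keeps all."""
    if pattern is None:
        return list(paths)
    rx = re.compile(pattern)
    return [p for p in paths if rx.search(p)]


def configure_logging(level_name: str) -> None:
    # The standard logging module exposes levels as attributes, e.g. logging.DEBUG.
    logging.basicConfig(level=getattr(logging, level_name))


# Usage: parse an explicit argv list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["--delimiter", ";", "--path-regex", r"\.csv$", "--log-level", "DEBUG"]
)
selected = filter_paths(["a.csv", "notes.txt", "dir/c.csv"], args.path_regex)
```

The parsed `args.delimiter` and `args.quotechar` would then be forwarded as keyword arguments to whatever constructs the csv.reader.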
2023 Aug 16