Refactor sample loading code #104
Codecov Report
@@            Coverage Diff             @@
##           master     #104      +/-   ##
==========================================
- Coverage   85.59%   84.86%   -0.74%
==========================================
  Files          31       31
  Lines        1534     1526       -8
  Branches      242      239       -3
==========================================
- Hits         1313     1295      -18
- Misses        170      178       +8
- Partials       51       53       +2
def main(args):
    if (args.num_bands is None) is not (args.band is None):
        raise ValueError('Must specify --num-bands and --band together')
    myband = args.band - 1 if args.band else None
The human interface (CLI) expects the band to be 1-based (band \in {1..numbands}), whereas the Python and C++ APIs expect band to be 0-based (band \in {0..numbands-1}).
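A minimal sketch of that conversion, pulled out into a standalone helper for illustration. The name `resolve_band` and the explicit range check are assumptions, not part of this PR; the diff only shows the "specified together" check and the `- 1` offset.

```python
def resolve_band(num_bands, band):
    """Convert a 1-based CLI band to the 0-based band the APIs expect.

    Hypothetical helper: the name and the range check are illustrative.
    """
    # Both options must be given together, or neither (same test as the diff).
    if (num_bands is None) is not (band is None):
        raise ValueError('Must specify --num-bands and --band together')
    if band is None:
        return None
    # CLI convention: band is in {1..num_bands}.
    if not 1 <= band <= num_bands:
        raise ValueError('--band must be between 1 and --num-bands')
    # API convention: band is in {0..num_bands-1}.
    return band - 1
```

With this, `resolve_band(4, 1)` gives the API band `0`, and `resolve_band(None, None)` gives `None` (no banding).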
for seqfile in seqfiles:
    if mask:
        if numbands:
            nr, nk = sketch.consume_seqfile_banding_with_mask(
Bulk loading with a mask is now available in khmer master.
print('[kevlar::counting] ', message, file=logfile)

if outfile:
    if not outfile.endswith(('.ct', '.counttable')):
This behavior (with respect to filename extensions for saved sketches) should be documented somewhere. A single function to handle reading/writing sketches would help. One already exists, but it isn't suited for all cases IIRC.
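As a starting point for documenting it, the extension rule could live in one small helper. This is only a sketch: the diff shows the `.ct`/`.counttable` check but not what happens when it fails, so appending a default extension is an assumption, as are the names `SKETCH_EXTENSIONS` and `normalize_sketch_filename`.

```python
# Extensions taken from the check in the diff above.
SKETCH_EXTENSIONS = ('.ct', '.counttable')


def normalize_sketch_filename(outfile, default_ext='.counttable'):
    """Return a filename that carries a recognized sketch extension.

    Hypothetical helper: appending `default_ext` is an assumed behavior.
    """
    if outfile.endswith(SKETCH_EXTENSIONS):
        return outfile
    # Assumption: unrecognized names get the default extension appended.
    return outfile + default_ext
```

Keeping the tuple of extensions in one place would also give the read and write paths a single definition to share.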
Most of these changes were lumped together with (and are orthogonal to the main thrust of) a PR that likely will not be merged (#102).
The idea is to make the "sequentially diluted loading" strategy optional, and to generally clean things up a bit. Parallel loading should be handled in a separate PR.
Something to consider: there is a function that autodetects whether input files are Fastq/Fasta or sketches and invokes the appropriate command to load them. So far it has only been applied to single filenames, but it would clean up the code quite a bit in some places if it could handle file lists appropriately.
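A rough sketch of what list handling might look like. How the existing autodetection decides the file type is not shown here, so the extension-based test below is an assumption, and `classify_inputs` is a hypothetical name rather than an existing function.

```python
# Assumed sketch extensions, taken from the counttable check in the diff.
SKETCH_EXTENSIONS = ('.ct', '.counttable')


def classify_inputs(filenames):
    """Split one filename or a list of filenames into (sketches, seqfiles).

    Hypothetical helper: detection by extension is an illustrative stand-in
    for whatever the existing autodetect function does.
    """
    # Accept a single filename for backward compatibility.
    if isinstance(filenames, str):
        filenames = [filenames]
    sketches, seqfiles = [], []
    for name in filenames:
        if name.endswith(SKETCH_EXTENSIONS):
            sketches.append(name)
        else:
            seqfiles.append(name)
    return sketches, seqfiles
```

A caller could then load the sketches directly and pass the remaining Fastq/Fasta files to the counting code, without special-casing single filenames.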