time boost in folds generation by aldder · Pull Request #42 · WenjieZ/TSCV

aldder · 2022-02-01T17:36:55Z

With contiguous test sets:

import numpy as np
from tscv import GapKFold

# Original implementation (master branch)
cv_orig = GapKFold(n_splits=5, gap_before=1, gap_after=1)

for train_index, test_index in cv_orig.split(np.arange(10)):
    print("TRAIN:", train_index, "TEST:", test_index)


... TRAIN: [3 4 5 6 7 8 9] TEST: [0 1]
... TRAIN: [0 5 6 7 8 9] TEST: [2 3]
... TRAIN: [0 1 2 7 8 9] TEST: [4 5]
... TRAIN: [0 1 2 3 4 9] TEST: [6 7]
... TRAIN: [0 1 2 3 4 5 6] TEST: [8 9]

# Optimized implementation (this PR); the folds match the original output above
cv_opt = GapKFold(n_splits=5, gap_before=1, gap_after=1)

for train_index, test_index in cv_opt.split(np.arange(10)):
    print("TRAIN:", train_index, "TEST:", test_index)


... TRAIN: [3 4 5 6 7 8 9] TEST: [0 1]
... TRAIN: [0 5 6 7 8 9] TEST: [2 3]
... TRAIN: [0 1 2 7 8 9] TEST: [4 5]
... TRAIN: [0 1 2 3 4 9] TEST: [6 7]
... TRAIN: [0 1 2 3 4 5 6] TEST: [8 9]

%%timeit
folds = list(cv_orig.split(np.arange(10000)))


... 1.21 s ± 37.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
folds = list(cv_opt.split(np.arange(10000)))


... 4.74 ms ± 44.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

With non-contiguous test sets:

cv_orig = _XXX_(_xxx_, gap_before=1, gap_after=1)

for train_index, test_index in cv_orig.split(np.arange(10)):
    print("TRAIN:", train_index, "TEST:", test_index)


... TRAIN: [5 6 7 8 9] TEST: [0 1 2 3]
... TRAIN: [7 8 9] TEST: [0 1 4 5]
... TRAIN: [3 4 9] TEST: [0 1 6 7]
... TRAIN: [3 4 5 6] TEST: [0 1 8 9]
... TRAIN: [0 7 8 9] TEST: [2 3 4 5]
... TRAIN: [0 9] TEST: [2 3 6 7]
... TRAIN: [0 5 6] TEST: [2 3 8 9]
... TRAIN: [0 1 2 9] TEST: [4 5 6 7]
... TRAIN: [0 1 2] TEST: [4 5 8 9]
... TRAIN: [0 1 2 3 4] TEST: [6 7 8 9]

cv_opt = _XXX_(_xxx_, gap_before=1, gap_after=1)

for train_index, test_index in cv_opt.split(np.arange(10)):
    print("TRAIN:", train_index, "TEST:", test_index)


... TRAIN: [5 6 7 8 9] TEST: [0 1 2 3]
... TRAIN: [7 8 9] TEST: [0 1 4 5]
... TRAIN: [3 4 9] TEST: [0 1 6 7]
... TRAIN: [3 4 5 6] TEST: [0 1 8 9]
... TRAIN: [0 7 8 9] TEST: [2 3 4 5]
... TRAIN: [0 9] TEST: [2 3 6 7]
... TRAIN: [0 5 6] TEST: [2 3 8 9]
... TRAIN: [0 1 2 9] TEST: [4 5 6 7]
... TRAIN: [0 1 2] TEST: [4 5 8 9]
... TRAIN: [0 1 2 3 4] TEST: [6 7 8 9]

%%timeit
folds = list(cv_orig.split(np.arange(10000)))

... 1.23 s ± 75.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
folds = list(cv_opt.split(np.arange(10000)))

... 4.78 ms ± 49.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
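
The printed folds above were compared by eye. A minimal sketch of doing the same comparison programmatically is given here; `same_folds` is a hypothetical helper (it is not part of TSCV), and the two reference folds are copied by hand from the contiguous output above.

```python
import numpy as np


def same_folds(folds_a, folds_b):
    """Return True if two sequences of (train, test) index pairs are identical."""
    folds_a, folds_b = list(folds_a), list(folds_b)
    if len(folds_a) != len(folds_b):
        return False
    return all(
        np.array_equal(train_a, train_b) and np.array_equal(test_a, test_b)
        for (train_a, test_a), (train_b, test_b) in zip(folds_a, folds_b)
    )


# Two of the folds printed above, typed in by hand for illustration.
reference = [
    (np.array([3, 4, 5, 6, 7, 8, 9]), np.array([0, 1])),
    (np.array([0, 5, 6, 7, 8, 9]), np.array([2, 3])),
]
candidate = [(train.copy(), test.copy()) for train, test in reference]

print(same_folds(reference, candidate))  # True
```

With the objects from this PR, the same check would presumably read `same_folds(cv_orig.split(X), cv_opt.split(X))` for `X = np.arange(10000)`, which covers every fold rather than the ten-sample printout.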

pep8speaks · 2022-02-01T17:36:59Z

Hello @aldder! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-02-04 12:11:14 UTC

WenjieZ · 2022-02-04T08:33:19Z

tscv/_split.py

+                begin = max(0, subindex[0] - before)
+                end = min(subindex[-1] + after + 1, n_samples)
+                complement = np.intersect1d(complement,
+                    np.setdiff1d(allindices, allindices[begin:end]))


You can simplify this line by using the fact that $A \cap B^C = A \setminus B$.

aldder · 2022-02-04

Hi @WenjieZ, I'm sorry, I didn't get it. Could you rephrase that?

WenjieZ · 2022-02-04

complement = np.setdiff1d(complement, allindices[begin:end])

aldder · 2022-02-04

OK, right! Fixed!
Thank you

fix WenjieZ#42 (comment)
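
To make the review exchange concrete, here is a rough sketch of both forms of the update applied per contiguous test block. Only the loop body comes from the diff excerpt above; the function names, the `contiguous_blocks` helper, and the surrounding loop are assumptions made for illustration and may differ from the actual code in `tscv/_split.py`. The sketch assumes a sorted, non-empty `test_index`.

```python
import numpy as np


def contiguous_blocks(test_index):
    """Split sorted test indices into runs of consecutive integers."""
    test_index = np.asarray(test_index)
    breaks = np.where(np.diff(test_index) != 1)[0] + 1
    return np.split(test_index, breaks)


def train_via_intersect(n_samples, test_index, before, after):
    """Form from the diff: intersect with the complement of each gapped block."""
    allindices = np.arange(n_samples)
    complement = allindices
    for subindex in contiguous_blocks(test_index):
        begin = max(0, subindex[0] - before)
        end = min(subindex[-1] + after + 1, n_samples)
        complement = np.intersect1d(
            complement, np.setdiff1d(allindices, allindices[begin:end]))
    return complement


def train_via_setdiff(n_samples, test_index, before, after):
    """Simplified form from the review: A ∩ B^C is the same set as A \\ B."""
    allindices = np.arange(n_samples)
    complement = allindices
    for subindex in contiguous_blocks(test_index):
        begin = max(0, subindex[0] - before)
        end = min(subindex[-1] + after + 1, n_samples)
        complement = np.setdiff1d(complement, allindices[begin:end])
    return complement


test_index = [0, 1, 4, 5]
print(train_via_setdiff(10, test_index, before=1, after=1))  # [7 8 9]
assert np.array_equal(
    train_via_intersect(10, test_index, 1, 1),
    train_via_setdiff(10, test_index, 1, 1),
)
```

The printed result agrees with the `TEST: [0 1 4 5]` fold in the PR description, and the assertion checks that the two forms return the same indices for this input.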

WenjieZ · 2022-02-05T04:05:59Z

Hi @aldder, please time the updated version and report the performance gain.
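
One possible way to time just the changed line outside a notebook is sketched below with the standard-library `timeit` module. This is not the benchmark requested above (that would be `%%timeit` on `cv_opt.split` with the updated branch installed); it only isolates the two set expressions, and the block boundaries `begin` and `end` are arbitrary values chosen for illustration. No timings are quoted here because they depend on the machine.

```python
import timeit

import numpy as np

n_samples = 10_000
allindices = np.arange(n_samples)
begin, end = 4_000, 6_000  # arbitrary gapped block, chosen for illustration


def with_intersect():
    # Form from the first version of the diff.
    return np.intersect1d(
        allindices, np.setdiff1d(allindices, allindices[begin:end]))


def with_setdiff():
    # Simplified form suggested in the review.
    return np.setdiff1d(allindices, allindices[begin:end])


# Both expressions must select the same indices before timing is meaningful.
assert np.array_equal(with_intersect(), with_setdiff())

number, repeat = 20, 3
t_intersect = min(timeit.repeat(with_intersect, number=number, repeat=repeat)) / number
t_setdiff = min(timeit.repeat(with_setdiff, number=number, repeat=repeat)) / number

print(f"intersect1d form: {t_intersect * 1e6:.1f} µs per call")
print(f"setdiff1d form:   {t_setdiff * 1e6:.1f} µs per call")
```

Taking the minimum over repeats follows the usual `timeit` advice, since higher values tend to reflect interference from other processes rather than the code itself.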

codecov · 2022-02-07T07:20:52Z

Codecov Report

Merging #42 (f5c38b3) into master (c05265a) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master      #42   +/-   ##
=======================================
  Coverage   97.51%   97.51%           
=======================================
  Files           3        3           
  Lines         643      645    +2     
=======================================
+ Hits          627      629    +2     
  Misses         16       16

| Impacted Files | Coverage Δ | |
|---|---|---|
| tscv/_split.py | `93.82% <100.00%> (+0.05%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c05265a...f5c38b3.

time boost in folds generation

482d282

aldder added 2 commits February 1, 2022 18:57

pep8

f85b8ff

pep8

177ae3d

WenjieZ reviewed Feb 4, 2022

fix WenjieZ#42 (comment)

f5c38b3

aldder added a commit to aldder/TSCV that referenced this pull request Feb 4, 2022

Merge pull request #5 from aldder/time-boost-in-fold-generation

a5ea94b

fix WenjieZ#42 (comment)

time boost in folds generation #42
aldder wants to merge 4 commits into WenjieZ:master from aldder:time-boost-in-fold-generation


3 participants
