time boost in folds generation #42

Open
aldder wants to merge 4 commits into WenjieZ:master from aldder:time-boost-in-fold-generation

aldder commented Feb 1, 2022

With contiguous test sets:

cv_orig = GapKFold(n_splits=5, gap_before=1, gap_after=1)

for train_index, test_index in cv_orig.split(np.arange(10)):
    print("TRAIN:", train_index, "TEST:", test_index)

... TRAIN: [3 4 5 6 7 8 9] TEST: [0 1]
... TRAIN: [0 5 6 7 8 9] TEST: [2 3]
... TRAIN: [0 1 2 7 8 9] TEST: [4 5]
... TRAIN: [0 1 2 3 4 9] TEST: [6 7]
... TRAIN: [0 1 2 3 4 5 6] TEST: [8 9]

cv_opt = GapKFold(n_splits=5, gap_before=1, gap_after=1)

for train_index, test_index in cv_opt.split(np.arange(10)):
    print("TRAIN:", train_index, "TEST:", test_index)

... TRAIN: [3 4 5 6 7 8 9] TEST: [0 1]
... TRAIN: [0 5 6 7 8 9] TEST: [2 3]
... TRAIN: [0 1 2 7 8 9] TEST: [4 5]
... TRAIN: [0 1 2 3 4 9] TEST: [6 7]
... TRAIN: [0 1 2 3 4 5 6] TEST: [8 9]

%%timeit
folds = list(cv_orig.split(np.arange(10000)))

... 1.21 s ± 37.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
folds = list(cv_opt.split(np.arange(10000)))

... 4.74 ms ± 44.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
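
For readers who want to see where the TRAIN rows above come from, here is a minimal sketch of the gap logic for a contiguous test block. It is not the code in this PR: `gap_train_indices` is a hypothetical helper name, and the sketch only assumes that the training set is everything outside the test block widened by `gap_before` and `gap_after` samples.

```python
import numpy as np


def gap_train_indices(test_index, n_samples, gap_before=0, gap_after=0):
    """Hypothetical helper: training indices for one contiguous test block.

    Removes the test block plus `gap_before` samples before it and
    `gap_after` samples after it, clipped to the valid index range.
    """
    begin = max(0, test_index[0] - gap_before)
    end = min(test_index[-1] + gap_after + 1, n_samples)
    return np.setdiff1d(np.arange(n_samples), np.arange(begin, end))


# Five contiguous test blocks of two samples each, as in the example above.
for start in range(0, 10, 2):
    test = np.arange(start, start + 2)
    train = gap_train_indices(test, 10, gap_before=1, gap_after=1)
    print("TRAIN:", train, "TEST:", test)
```

With `n_samples=10` and both gaps set to 1, the loop should reproduce the five TRAIN/TEST pairs listed above.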

With noncontiguous test sets:

cv_orig = _XXX_(_xxx_, gap_before=1, gap_after=1)

for train_index, test_index in cv_orig.split(np.arange(10)):
    print("TRAIN:", train_index, "TEST:", test_index)

... TRAIN: [5 6 7 8 9] TEST: [0 1 2 3]
... TRAIN: [7 8 9] TEST: [0 1 4 5]
... TRAIN: [3 4 9] TEST: [0 1 6 7]
... TRAIN: [3 4 5 6] TEST: [0 1 8 9]
... TRAIN: [0 7 8 9] TEST: [2 3 4 5]
... TRAIN: [0 9] TEST: [2 3 6 7]
... TRAIN: [0 5 6] TEST: [2 3 8 9]
... TRAIN: [0 1 2 9] TEST: [4 5 6 7]
... TRAIN: [0 1 2] TEST: [4 5 8 9]
... TRAIN: [0 1 2 3 4] TEST: [6 7 8 9]

cv_opt = _XXX_(_xxx_, gap_before=1, gap_after=1)

for train_index, test_index in cv_opt.split(np.arange(10)):
    print("TRAIN:", train_index, "TEST:", test_index)

... TRAIN: [5 6 7 8 9] TEST: [0 1 2 3]
... TRAIN: [7 8 9] TEST: [0 1 4 5]
... TRAIN: [3 4 9] TEST: [0 1 6 7]
... TRAIN: [3 4 5 6] TEST: [0 1 8 9]
... TRAIN: [0 7 8 9] TEST: [2 3 4 5]
... TRAIN: [0 9] TEST: [2 3 6 7]
... TRAIN: [0 5 6] TEST: [2 3 8 9]
... TRAIN: [0 1 2 9] TEST: [4 5 6 7]
... TRAIN: [0 1 2] TEST: [4 5 8 9]
... TRAIN: [0 1 2 3 4] TEST: [6 7 8 9]

%%timeit
folds = list(cv_orig.split(np.arange(10000)))

... 1.23 s ± 75.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
folds = list(cv_opt.split(np.arange(10000)))

... 4.78 ms ± 49.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
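
The noncontiguous case can be sketched the same way: split the test indices into contiguous runs and remove a gap around each run. The snippet below is an illustration only, not the PR's implementation. `gap_train_indices` is a hypothetical name, it uses a boolean mask rather than the set operations in `tscv/_split.py`, and it assumes the test sets are unions of two 2-sample groups, which is consistent with the rows printed above.

```python
from itertools import combinations

import numpy as np


def gap_train_indices(test_index, n_samples, gap_before=0, gap_after=0):
    """Hypothetical helper: training indices for a possibly noncontiguous test set."""
    test_index = np.asarray(test_index)
    # Break the sorted test indices into contiguous runs.
    breaks = np.where(np.diff(test_index) > 1)[0] + 1
    runs = np.split(test_index, breaks)

    keep = np.ones(n_samples, dtype=bool)
    for run in runs:
        # Drop the run itself plus the gap on either side, clipped to range.
        begin = max(0, run[0] - gap_before)
        end = min(run[-1] + gap_after + 1, n_samples)
        keep[begin:end] = False
    return np.flatnonzero(keep)


# Assumed fold layout: every pair drawn from five groups of two samples.
groups = [np.arange(i, i + 2) for i in range(0, 10, 2)]
for a, b in combinations(groups, 2):
    test = np.concatenate([a, b])
    print("TRAIN:", gap_train_indices(test, 10, 1, 1), "TEST:", test)
```

Under these assumptions the loop yields ten folds, matching the TRAIN/TEST rows shown above.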

pep8speaks commented Feb 1, 2022

Hello @aldder! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-02-04 12:11:14 UTC

tscv/_split.py Outdated
begin = max(0, subindex[0] - before)
end = min(subindex[-1] + after + 1, n_samples)
complement = np.intersect1d(complement,
                            np.setdiff1d(allindices, allindices[begin:end]))
Owner

You can simplify this line by using the fact $A \cap B^C = A \setminus B$.

Author

Hi @WenjieZ, I'm sorry, I didn't get it. Can you rephrase your statement?

Owner

complement = np.setdiff1d(complement, allindices[begin:end])
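
A quick way to convince oneself of the identity $A \cap B^C = A \setminus B$ on index arrays is to evaluate both forms side by side. The values of `complement`, `begin` and `end` below are made up for illustration; only the two NumPy expressions come from this thread.

```python
import numpy as np

allindices = np.arange(10)
complement = np.array([0, 1, 2, 3, 4, 9])  # illustrative value, plays the role of A
begin, end = 1, 5                          # illustrative slice, plays the role of B

# Original form: A intersected with the complement of B.
via_intersection = np.intersect1d(
    complement, np.setdiff1d(allindices, allindices[begin:end])
)

# Simplified form: A minus B.
via_difference = np.setdiff1d(complement, allindices[begin:end])

assert np.array_equal(via_intersection, via_difference)
print(via_difference)  # [0 9]
```

The two results agree because `complement` is always a subset of `allindices`, so removing B from A directly gives the same set as intersecting A with everything outside B.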

Author

Ok, right! fixed!
Thank you

aldder added a commit to aldder/TSCV that referenced this pull request Feb 4, 2022
Owner

WenjieZ commented Feb 5, 2022

Hi @aldder, please time the updated version and report the performance gain.
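
One possible way to time the single changed statement in isolation is sketched below with the standard-library `timeit` module. It is not the benchmark used in this PR: the array size, the slice bounds and the function names are assumptions, and it measures only the set operation, not a full `split()` call.

```python
import timeit

import numpy as np

n_samples = 10_000
allindices = np.arange(n_samples)
complement = np.arange(n_samples)
begin, end = 4_000, 6_000  # hypothetical gap-extended test block


def with_intersection():
    # Form before the review suggestion.
    return np.intersect1d(
        complement, np.setdiff1d(allindices, allindices[begin:end])
    )


def with_difference():
    # Form after the review suggestion.
    return np.setdiff1d(complement, allindices[begin:end])


# Both forms must give the same indices before timing means anything.
assert np.array_equal(with_intersection(), with_difference())

for fn in (with_intersection, with_difference):
    best = min(timeit.repeat(fn, number=100, repeat=5)) / 100
    print(f"{fn.__name__}: {best * 1e6:.1f} µs per call")
```

Taking the minimum over several repeats reduces noise from other processes. The end-to-end gain would still need to be measured on `split()` itself, as in the `%%timeit` cells in the PR description.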


codecov bot commented Feb 7, 2022

Codecov Report

Merging #42 (f5c38b3) into master (c05265a) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master      #42   +/-   ##
=======================================
  Coverage   97.51%   97.51%           
=======================================
  Files           3        3           
  Lines         643      645    +2     
=======================================
+ Hits          627      629    +2     
  Misses         16       16           

| Impacted Files | Coverage Δ | |
|---|---|---|
| tscv/_split.py | 93.82% <100.00%> (+0.05%) | ⬆️ |

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update c05265a...f5c38b3.