Conversation

@mountainMath
Owner

Summary

  • Replaces row-by-row as.Date() calls with a lookup-table approach
  • For large tables (millions of rows), date parsing was a major bottleneck
  • Builds the lookup table from the unique date values (typically hundreds or thousands), then assigns dates to all rows with fast vector indexing
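
The approach in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper name `parse_dates_lookup`, the `suffix` argument, and the assumption that year-month values look like "2020-01" and become dates by appending "-01" are all hypothetical.

```r
# Hypothetical helper for illustration only; not the package's implementation.
# Assumes year-month strings such as "2020-01" (an assumption about the input).
parse_dates_lookup <- function(x, suffix = "-01") {
  # Distinct non-missing values: typically hundreds or thousands
  unique_values <- unique(x[!is.na(x)])
  # Parse each distinct value exactly once
  lookup <- as.Date(paste0(unique_values, suffix), format = "%Y-%m-%d")
  # Map every row to its position in the lookup table, then index by position
  lookup[match(x, unique_values)]
}

dates <- parse_dates_lookup(c("2020-01", "2020-02", "2020-01", NA))
```

Missing values fall through `match()` as `NA` and stay `NA` in the result, so the output has the same length and order as the input.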

Benchmark Results

Table 14-10-0287 (5,394,600 rows, year-month format):

| Version | Mean time (s) | Min (s) | Max (s) |
|---|---|---|---|
| Original | 85.81 | 82.56 | 90.02 |
| Optimized | 66.97 | 64.88 | 69.09 |
| Improvement | 22% faster | | |

The optimization saves ~19 seconds on get_cansim() for this large table.
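
For reference, the arithmetic behind these figures can be reproduced from the reported means. The variable names are illustrative; the "22% faster" figure corresponds to the reduction in mean runtime.

```r
# Mean timings reported in the benchmark above, in seconds
original_mean  <- 85.81
optimized_mean <- 66.97

saved_seconds     <- original_mean - optimized_mean
percent_reduction <- 100 * saved_seconds / original_mean
speedup           <- original_mean / optimized_mean

round(saved_seconds, 2)      # 18.84
round(percent_reduction, 1)  # 22
round(speedup, 2)            # 1.28
```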

Verification

  • Output is identical to original for all tested tables
  • All unit tests pass (20/20)

Test Tables

  • 20-10-0001 (163K rows, year-month format): IDENTICAL output
  • 14-10-0287 (5.4M rows, year-month format): IDENTICAL output
  • 34-10-0013 (550 rows, year format): IDENTICAL output
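
A check of this kind could look roughly like the sketch below. It uses a synthetic year-month vector as a stand-in, not data from the tables listed above, and compares direct parsing against a lookup-based result; the variable names and the "-01" suffix convention are assumptions for illustration.

```r
# Synthetic stand-in for a year-month column (not real table data)
set.seed(1)
months <- sprintf("%d-%02d", rep(2000:2020, each = 12), rep(1:12, times = 21))
x <- sample(months, 1e5, replace = TRUE)

# Reference result: parse every element directly
direct <- as.Date(paste0(x, "-01"), format = "%Y-%m-%d")

# Lookup-based result: parse distinct values once, then index by position
unique_values <- unique(x)
lookup <- as.Date(paste0(unique_values, "-01"), format = "%Y-%m-%d")
via_lookup <- lookup[match(x, unique_values)]

# Compare the two results element for element
outputs_match <- isTRUE(all.equal(direct, via_lookup))
```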

🤖 Generated with Claude Code

Instead of calling as.Date() on every row (millions of rows),
build a lookup table for unique date values (typically hundreds
or thousands), then use vector lookup for assignment.

Benchmark on table 14-10-0287 (5.4M rows):
- Original: mean 85.8 sec
- Optimized: mean 67.0 sec
- Improvement: 22% faster overall

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
