neverSettles/notebook-tasks

Notebook Tasks

20 notebook/data-science tasks in Harbor format, with example trajectories generated by Gemini 3.1 Pro Preview via Gemini CLI on Daytona.

Structure

tasks/                              # Task definitions (Harbor format)
  <task_name>/
    instruction.md                  # Task prompt
    task.toml                       # Task metadata
    environment/
      Dockerfile                    # Environment setup
    tests/
      test.py / test.sh             # Verification tests
    solution/
      solve.sh                      # Reference solution

trajectories/                       # Single trajectory per task
  <task_name>/
    trajectory.json                 # ATIF-format trajectory
    gemini-cli.trajectory.json      # Raw Gemini CLI trajectory
    gemini-cli.txt                  # Agent console output
    verifier/
      reward.txt                    # Score (0 or 1)
      test-stdout.txt               # Test output

trajectories-pass10/                # 10 attempts per task (pass@10, 5min timeout)
  <task_name>/
    attempt_1/ ... attempt_10/
      trajectory.json
      gemini-cli.trajectory.json
      gemini-cli.txt
      result.json
      verifier/
        reward.txt
        test-stdout.txt

trajectories-pass10-10xtimeout/     # 10 attempts per task (pass@10, 50min timeout)
  <task_name>/
    attempt_1/ ... attempt_10/
      (same structure as above)
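
The layout above makes it straightforward to tally results per task. The following is a minimal sketch, not part of the repository: `read_reward` and `count_passes` are hypothetical helper names, and it assumes each `reward.txt` holds a single number (`0` or `1`, possibly written as `1.0`).

```python
from pathlib import Path


def read_reward(reward_file: Path) -> int:
    """Return 1 if the reward file records a pass, else 0.

    Assumption: the file contains one number, e.g. "0", "1", or "1.0".
    """
    text = reward_file.read_text().strip()
    return 1 if text and float(text) >= 1.0 else 0


def count_passes(root: Path) -> dict[str, tuple[int, int]]:
    """Map each task name to (passes, attempts) for a multi-attempt directory.

    `root` is expected to look like trajectories-pass10/: one folder per task,
    each holding attempt_*/verifier/reward.txt files.
    """
    results: dict[str, tuple[int, int]] = {}
    for task_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        reward_files = sorted(task_dir.glob("attempt_*/verifier/reward.txt"))
        rewards = [read_reward(f) for f in reward_files]
        results[task_dir.name] = (sum(rewards), len(rewards))
    return results
```

Calling `count_passes(Path("trajectories-pass10"))` from the repository root should yield one `(passes, attempts)` pair per task, which can be compared against the Results table.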

Results

| Task | Pass@1 | Pass@10 (5min) | Pass@10 (50min) |
| --- | --- | --- | --- |
| ab_test_batch_processing_d976f570 | 0 | 1/10 | 2/10 |
| ab_test_refactoring_693b7a52 | 0 | 0/10 | 0/10 |
| advanced_sales_dashboard_ae646733 | 0 | 0/10 | 0/10 |
| clean_financial_news_f248547f | 0 | 0/10 | 0/10 |
| clean_imputation_easy_2a135a15 | 0 | 3/10 | 1/10 |
| etl_customer_merge_82586d5b | 0 | 0/10 | 0/10 |
| etl_experiment_eda_bed6e0af | 0 | 0/10 | 0/10 |
| etl_sales_pipeline_hard_64c2f802 | 1 | 5/10 | 9/10 |
| gene_expression_api_opt_5780f5e3 | 0 | 0/10 | 0/10 |
| gene_expression_refactor_e20a8b61 | 0 | 1/10 | 0/10 |
| gene_format_conversion_9c9ab332 | 1 | 10/10 | 10/10 |
| inventory_data_standardization_761286bd | 0 | 0/10 | 0/10 |
| ml_data_format_converter_45d7e0d7 | 1 | 10/10 | 10/10 |
| quarterly_feedback_cleaning_cfe6ef53 | 0 | 0/10 | 0/10 |
| quarterly_report_pipeline_ace29ee3 | 0 | 0/10 | 0/10 |
| refactor_finance_timeseries_c5ca40ee | 1 | 10/10 | 10/10 |
| sales_etl_pipeline_d8e5ce0c | 0 | 0/10 | 0/10 |
| script_to_notebook_validation_792a5dcc | 0 | 0/10 | 0/10 |
| sql_etl_sales_pipeline_80627931 | 1 | 5/10 | 6/10 |
| sql_quarterly_report_d6170bdb | 1 | 10/10 | 10/10 |
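
The Pass@10 columns report raw success counts out of 10 attempts. The README does not say how these counts should be turned into a pass@k score; one common choice is the combinatorial estimator 1 - C(n-c, k) / C(n, k), where n is the number of attempts and c the number of successes. The sketch below assumes that estimator, and `pass_at_k` is a hypothetical helper name.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k sampled attempts passes.

    n: total attempts, c: passing attempts, k: sample size (1 <= k <= n).
    Assumption: the standard combinatorial estimator, not necessarily
    the method used to produce the table above.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k-sample contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With n = 10 and k = 10, the estimate is 1.0 for any task with at least one success and 0.0 otherwise; with k = 1 it reduces to c / n, so a 5/10 row gives `pass_at_k(10, 5, 1) == 0.5`.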

Summary

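| Metric | Value |
| --- | --- |
| Mean pass@1 (one attempt per task) | 0.30 (6/20) |
| Mean per-attempt pass rate (pass@10 runs, 5min timeout) | 0.275 (55/200) |
| Mean per-attempt pass rate (pass@10 runs, 50min timeout) | 0.29 (58/200) |
| Tasks solved at least once (50min) | 8/20 |
| Tasks solved 10/10 (50min) | 4/20 |
| Total trajectories | 420 (20 + 200 + 200) |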
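
The summary figures follow directly from the per-task counts. As a sanity check, the sketch below copies the counts from the Results table into a dictionary and recomputes each metric; `RESULTS` and `summarize` are hypothetical names used only for illustration.

```python
# task -> (pass@1, passes out of 10 at 5min, passes out of 10 at 50min),
# copied by hand from the Results table.
RESULTS = {
    "ab_test_batch_processing_d976f570": (0, 1, 2),
    "ab_test_refactoring_693b7a52": (0, 0, 0),
    "advanced_sales_dashboard_ae646733": (0, 0, 0),
    "clean_financial_news_f248547f": (0, 0, 0),
    "clean_imputation_easy_2a135a15": (0, 3, 1),
    "etl_customer_merge_82586d5b": (0, 0, 0),
    "etl_experiment_eda_bed6e0af": (0, 0, 0),
    "etl_sales_pipeline_hard_64c2f802": (1, 5, 9),
    "gene_expression_api_opt_5780f5e3": (0, 0, 0),
    "gene_expression_refactor_e20a8b61": (0, 1, 0),
    "gene_format_conversion_9c9ab332": (1, 10, 10),
    "inventory_data_standardization_761286bd": (0, 0, 0),
    "ml_data_format_converter_45d7e0d7": (1, 10, 10),
    "quarterly_feedback_cleaning_cfe6ef53": (0, 0, 0),
    "quarterly_report_pipeline_ace29ee3": (0, 0, 0),
    "refactor_finance_timeseries_c5ca40ee": (1, 10, 10),
    "sales_etl_pipeline_d8e5ce0c": (0, 0, 0),
    "script_to_notebook_validation_792a5dcc": (0, 0, 0),
    "sql_etl_sales_pipeline_80627931": (1, 5, 6),
    "sql_quarterly_report_d6170bdb": (1, 10, 10),
}

ATTEMPTS_PER_TASK = 10


def summarize(results: dict[str, tuple[int, int, int]]) -> dict[str, float]:
    """Recompute the summary metrics from per-task pass counts."""
    n_tasks = len(results)
    total_attempts = n_tasks * ATTEMPTS_PER_TASK
    rows = list(results.values())
    return {
        "mean_pass_at_1": sum(r[0] for r in rows) / n_tasks,
        "pass_rate_5min": sum(r[1] for r in rows) / total_attempts,
        "pass_rate_50min": sum(r[2] for r in rows) / total_attempts,
        "solved_at_least_once_50min": sum(1 for r in rows if r[2] >= 1),
        "solved_every_time_50min": sum(
            1 for r in rows if r[2] == ATTEMPTS_PER_TASK
        ),
    }


if __name__ == "__main__":
    for metric, value in summarize(RESULTS).items():
        print(f"{metric}: {value}")
```

Note that the two pass-rate rows average over all 200 attempts, so they measure per-attempt success rather than the share of tasks solved at least once.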

Agent Configuration

  • Agent: gemini-cli
  • Model: google/gemini-3.1-pro-preview
  • Environment: Daytona
  • Date: 2026-02-25
