<!DOCTYPE html>
<html>

<head>
  <meta charset="utf-8">
  <meta name="description"
    content="PhyPlan: Learning To Plan Tasks with Generalizable and Rapid Physical Reasoning for Embodied Manipulation">
  <meta name="keywords" content="PhyPlan, PhyPlan">
  <meta name="viewport" content="width=device-width, initial-scale=1">
  <title>PhyPlan</title>

  <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">

  <link rel="stylesheet" href="./static/css/bulma.min.css">
  <link rel="stylesheet" href="./static/css/bulma-carousel.min.css">
  <link rel="stylesheet" href="./static/css/bulma-slider.min.css">
  <link rel="stylesheet" href="./static/css/fontawesome.all.min.css">
  <link rel="stylesheet" href="https://cdn.jsdelivr.net/gh/jpswalsh/academicons@1/css/academicons.min.css">
  <link rel="stylesheet" href="./static/css/index.css">

  <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.5.1/jquery.min.js"></script>
  <script defer src="./static/js/fontawesome.all.min.js"></script>
  <script src="./static/js/bulma-carousel.min.js"></script>
  <script src="./static/js/bulma-slider.min.js"></script>
  <script src="./static/js/index.js"></script>
</head>

<body>
  <section class="hero">
    <header class="header">
      <div class="hero-body">
        <div class="container is-max-desktop">
          <div class="columns is-centered">
            <div class="column has-text-centered">
              <h1 class="title is-1 publication-title"><span class="dnerf">PhyPlan</span>: Learning To Plan Tasks with Generalizable and Rapid Physical Reasoning for Embodied Manipulation</h1>
              <h3 class="title is-4 conference-authors" style="color: rgb(0, 102, 255);">......</h3>
              <div class="is-size-5 publication-authors" style="margin-bottom: 5mm;">
                <span class="author-block">
                  Ankit Kanwar<sup>*</sup>,
                  Hartej Soin<sup>*</sup>,
                  Abhinav Barnawal<sup>*</sup>,
                  Mudit Chopra,
                  Harshil Vagadia,
                  Tamajit Banerjee,
                  Shreshth Tuli,
                  Rohan Paul and
                  Souvik Chakraborty</span>
              </div>

              <div class="is-size-6 publication-authors" style="margin-bottom: 8mm;">
                <!-- <span class="author-block"><sup>1</sup>Work primarily done when at IIT Delhi, </span>
                <span class="author-block"><sup>2</sup>Affiliated with IIT Delhi, </span> -->
                <span class="author-block">Indian Institute of Technology Delhi | </span>
                <span class="author-block"><sup>*</sup>Equal contribution </span>
                <!-- <span class="author-block"><sup>#</sup>Indicate equal advising, </span> -->
              </div>

              <div class="column has-text-centered">
                <div class="publication-links">
                  <!-- PDF Link. -->
                  <span class="link-block">
                    <a href="https://arxiv.org/pdf/2406.00001" class="external-link button is-normal is-rounded is-dark">
                      <span class="icon">
                        <i class="fas fa-file-pdf"></i>
                      </span>
                      <span>Paper</span>
                    </a>
                  </span>
                  <span class="link-block">
                    <a href="http://arxiv.org/abs/2406.00001" class="external-link button is-normal is-rounded is-dark">
                      <span class="icon">
                        <i class="ai ai-arxiv"></i>
                      </span>
                      <span>arXiv</span>
                    </a>
                  </span>
                  <!-- Video Link. -->
                  <span class="link-block">
                    <a href="https://1drv.ms/f/s!AjIGOzSicYxmgQnyyezjnUCMDo3N?e=54ufNt"
                      class="external-link button is-normal is-rounded is-dark">
                      <span class="icon">
                        <i class="fab fa-youtube"></i>
                      </span>
                      <span>Video</span>
                    </a>
                  </span>
                  <!-- Code Link. -->
                  <span class="link-block">
                    <a href="https://github.com/phyplan/PhyPlan"
                      class="external-link button is-normal is-rounded is-dark">
                      <span class="icon">
                        <i class="fab fa-github"></i>
                      </span>
                      <span>Code</span>
                    </a>
                  </span>
                  <!-- Data Link. -->
                  <span class="link-block">
                    <a href="https://1drv.ms/f/s!AjIGOzSicYxmcmRXYJJINf42I20?e=XhDGne"
                      class="external-link button is-normal is-rounded is-dark">
                      <span class="icon">
                        <i class="fa fa-database"></i>
                      </span>
                      <span>Data</span>
                    </a>
                  </span>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
    </header>
  </section>

  <hr>

  <div class="container is-max-desktop has-text-centered">
    <h2 class="subtitle" style="max-width: 90%; padding-left: 10%;">
      <span class="dnerf">PhyPlan</span> is a novel physics-informed planning framework based on accelerated
      learning of Physical Reasoning Tasks using Physics-Informed Dynamics Predictors.
    </h2>
  </div>

  <hr>

  <div class="container is-max-desktop has-text-centered">
  <video id="matting-video" controls playsinline height="100%" style="border-radius: 25px; border: none; outline: none; box-shadow: none;">
      <source src="./static/video/PhyPlan_Demo_Video.mp4" type="video/mp4">
    </video>
</div>

  <section class="section">
    <div class="container is-max-desktop">
      <!-- Abstract. -->
      <h2 class="title has-text-centered is-3">Abstract</h2>
      <div class="content is-four-fifths has-text-justified">
        <p>
          Given the task of landing a ball in a goal region beyond direct reach, humans can often throw, slide, or rebound objects against the wall to attain the goal. Enabling robots to replicate such reasoning is non-trivial as it requires multi-step planning and involves a mixture of discrete and continuous action spaces, a sparse and sensitive reward structure, computationally expensive simulations, and an incomplete understanding of the environment's physics. We present PhyPlan, a physics-informed and adaptable planning framework for efficient multi-step physical reasoning. At its core, PhyPlan combines Generative Flow Networks (GFlowNets) and Monte Carlo Tree Search (MCTS) to explore and evaluate sequences of object interactions. GFlowNets sample discrete action sequences in proportion to their associated reward, enabling broad and reward-driven exploration of the discrete planning space. MCTS complements this by adaptively balancing the use of a fast but approximate pre-trained physics-informed dynamics predictor and costly but accurate environment rollouts, ensuring both speed and precision in planning. The discrepancy between the known and actual physics is captured using Gaussian Process Regression. Experiments on benchmark simulated tasks requiring composition of collisions, slides, and rebounds demonstrate that PhyPlan achieves a 45% higher success rate and up to 3x efficiency gains over state-of-the-art model-based reinforcement learning approaches.
        </p>
      </div>
    </div>
  </section>


  <section class="section">
    <div class="container is-max-desktop">
      <h2 class="title has-text-centered is-3">Explanatory Video</h2>
      </div>
      <br>

      <div class="container is-max-desktop has-text-centered">
        <video id="matting-video" controls playsinline height="100%" style="border-radius: 25px; border: none; outline: none; box-shadow: none;">
            <source src="./static/video/PhyPlan_Explanatory_Video.mp4" type="video/mp4">
          </video>
      </div>
  </section>

  <section class="hero-teaser section">
    <div class="container is-max-desktop">

      <h2 class="title is-3 has-text-centered">Approach Overview</h2>

      <div class="columns is-centered">
        <div class="column is-12 has-text-centered">
          <img src="./static/images/Approach_Overview.png"
               style="border: none; outline: none; box-shadow: none; max-width: 100%; height: auto;" />
        </div>
      </div>

      <div class="content has-text-justified">
        <p>
          <strong>Left to Right:</strong> The framework begins by learning dynamics-predictor models for elementary physical skills—such as throwing, sliding, swinging, and collision—using a <strong>Physics-Informed Neural Network (PINN)</strong>. By incorporating coarse physics-based equations directly into its loss function, the network learns to predict both the future trajectory of interacting objects and the domain's latent physical parameters.
        </p>
        <p>
          Moving to the planning phase, given a new task and a set of tools, the agent uses <strong>GFlowNets</strong> to sample and iterate over various tool sequences, utilising the PINN-based dynamics predictors as efficient <strong>proxy-reward models</strong>. To optimise execution, the agent reasons over specific control parameters using <strong> Monte Carlo Tree Search (MCTS)</strong>. This search employs the <strong>learnt models</strong> (PINN) to rapidly simulate actions—such as the release plane of a pendulum or the orientation of a wedge—faster than a complex high-fidelity physics simulator. Finally, to ensure accuracy, the MCTS periodically conducts rollouts in the high-fidelity simulator (<strong>world model</strong>) and corrects any discrepancies between the PINN-based simulations and reality using a <strong>Gaussian Process (GP) Regressor</strong>.
        </p>
      </div>
    </div>
  </section>

  <section class="section">
    <div class="container is-max-desktop">
      <h2 class="title is-3" style="text-align: center;">Skill Learning</h2>
      <div class="has-text-justified">
        <p>
          We consider the problem of learning a model for physical skills such as bouncing a ball-like object off a
          wedge, sliding over an object, swinging a pendulum, throwing an object as a projectile and hitting an object
          with a pendulum.
          The skill learning model predicts the state trajectory of an object as it undergoes a dynamic interaction with
          another object.
        </p>
        <br>
      </div>
      <hr>

      <div class="columns is-multiline" style="justify-content: center;">

        <div class="column is-one-quarter" style="display: flex; flex-direction: column; align-items: center; justify-content: flex-start;">
          <figure class="image" style="display: flex; justify-content: center; align-items: center; margin-bottom: 15px;">
            <img src="./static/gif/slide.gif" alt="GIF 1" style="width: 300px; height: 200px; border-radius: 25px;">
          </figure>
          <h3 class="title is-5" style="text-align: center;">Sliding Skill</h3>
          <p style="text-align: justify;">Skill network learns to determine the displacement and velocity of a box
            sliding on a rough plane, given its initial velocity, at any time queried.</p>
        </div>

        <div class="column is-one-quarter" style="display: flex; flex-direction: column; align-items: center; justify-content: flex-start;">
          <figure class="image" style="display: flex; justify-content: center; align-items: center; margin-bottom: 15px;">
            <img src="./static/gif/projectile.gif" alt="GIF 2"
              style="width: 300px; height: 200px; border-radius: 25px;">
          </figure>
          <h3 class="title is-5" style="text-align: center;">Throwing Skill</h3>
          <p style="text-align: justify;">Skill network learns to determine the location and velocity of a ball thrown
            with given initial angle and velocity, at any time queried.</p>
        </div>

        <div class="column is-one-quarter" style="display: flex; flex-direction: column; align-items: center; justify-content: flex-start;">
          <figure class="image" style="display: flex; justify-content: center; align-items: center; margin-bottom: 15px;">
            <img src="./static/gif/pendulum.gif" alt="GIF 3" style="width: 300px; height: 200px; border-radius: 25px;">
          </figure>
          <h3 class="title is-5" style="text-align: center;">Swinging Skill</h3>
          <p style="text-align: justify;">Skill network learns to determine the angular position and angular velocity of
            a pendulum with given initial angular position, at any time queried.</p>
        </div>

        <div class="column is-one-quarter" style="display: flex; flex-direction: column; align-items: center; justify-content: flex-start;">
          <figure class="image" style="display: flex; justify-content: center; align-items: center; margin-bottom: 15px;">
            <img src="./static/gif/peg.gif" alt="GIF 4" style="width: 300px; height: 200px; border-radius: 25px;">
          </figure>
          <h3 class="title is-5" style="text-align: center;">Collision Skill</h3>
          <p style="text-align: justify;">Skill network learns to determine the velocity of the puck just after it gets hit by a
            swinging pendulum.</p>
        </div>

      </div>
      <hr>
      <div class="container is-max-desktop">
        <div class="columns is-centered is-multiline">
          <div class="column is-one-quarter">
            <img src="./static/images/simulator_rollout.png"
                 style="border-radius: 15px; border: None; width: 100%;" />
          </div>
          <div class="column is-one-quarter">
            <img src="./static/images/pinn_rollout.png"
                 style="border-radius: 15px; border: None; width: 100%;" />
          </div>
          <div class="column is-one-quarter">
            <img src="./static/images/bounce_chain_0.75.png"
                 style="border-radius: 15px; border: None; width: 100%;" />
          </div>
          <div class="column is-one-quarter">
            <img src="./static/images/bounce_chain_1.png"
                 style="border-radius: 15px; border: None; width: 100%;" />
          </div>
        </div>

        <div class="columns is-centered">
          <div class="has-text-justified">
            <p>
              <i><b>The skill learning model is based on a neural network that predicts the object's state during dynamic
              interaction, continuously parameterised by time.</b></i> The figures above show the predicted
              positions of the ball plotted against time in the Bounce Task. Such interactions can be simulated in a
              physics engine by using numerical integration schemes.
              However, since we aim to perform multi-step interactions, simulating outcomes during training is often
              intractable. Hence, we adopt a learning-based approach and learn a function that predicts the object's
              state during dynamic interaction, continuously parameterised by time.
              For certain skills like swinging, sliding and throwing, we leverage the known governing physics equations and
              employ a physics-informed loss function in a neural network to constrain the latent space; we call these models
              Physics-Informed Dynamics Predictors.
              However, skills like collision are learnt directly from data because the underlying physics is complex and intractable.
            </p>
          </div>
        </div>
      </div>
      <!-- <div class="has-text-justified">
        <p>
          The skill learning model is based on a neural network that predicts the object's state during dynamic
          interaction, continuously parameterised by time. Such interactions can be simulated in a physics engine by
          using numerical integration schemes.
          However, since we aim to perform multi-step interactions, simulating outcomes during training is often
          intractable. Hence, we adopt a learning-based approach and learn a function which predicts the object's
          state during dynamic interaction continuously parameterised by time.
          For certain skills like swinging, sliding and throwing, we leverage the known governing physics equations and
          employ a physics-informed loss function in a neural network to constrain the latent space; these are called as
          Physics-Informed Dynamics Predictors.
          However, skills like bouncing and hitting are learnt directly from data because of the  complex and intractable
          physics.
        </p>
      </div> -->
    </div>
  </section>

  <section class="section">
    <div class="container is-max-desktop">
      <h2 class="title is-3 has-text-centered">Benchmark Physical Reasoning Tasks</h2>
      <div id="tasks" class="column has-text-justified">
        <p>
          We created the following five challenging 3D physical reasoning tasks to analyse the performance of PhyPlan,
          inspired by prior works in simplistic 2D environments presented in <a href="https://arxiv.org/abs/1907.09620"
            class="external-link is-normal">[Allen et al., 2020]</a> and <a href="https://phyre.ai/"
            class="external-link is-normal">[Bakhtin et al., 2019]</a>.
        </p>
      </div>
      <div class="columns is-centered">
        <div class="column is-12 has-text-centered">
          <img src="./static/images/benchmark_tasks.png"
               style="border: none; outline: none; box-shadow: none; max-width: 100%; height: auto;" />
        </div>
      </div>
      <div id="tasks" class="column has-text-justified">
        <p>
          PhyPlan performs semantic reasoning using PINN-based Skill Models before executing each action in the
          environment (just as humans think before acting). As it executes actions, it also learns the difference
          between the PINN-based rewards for actions and the actual rewards (called online learning). Therefore, it often
          improves in subsequent actions (just as humans improve their actions with more trials). The videos show the actions
          taken by PhyPlan on each task. The effect of online learning is more evident in the Bounce and Bridge tasks, where
          the robot performs poorly in early attempts.
        </p>
      </div>
      <hr>
      <div class="columns is-multiline">
        <div class="column is-centered is-half">
          <div class="content">
            <h3 class="title is-4 has-text-centered">Launch Task</h3>
            <p class="has-text-justified">Robot trains to use the pendulum object present in the environment to make the
              ball reach the goal</p>
            <video id="matting-video" controls playsinline height="100%" style="border-radius: 25px;">
              <source src="./static/video/launch.mp4" type="video/mp4">
            </video>
            <p class="teaser has-text-justified">
              The robot learns to correctly align the pendulum's plane and angle to throw the ball into the box.
            </p>
          </div>
        </div>

        <div class="column is-centered is-half">
          <div class="content">
            <h3 class="title is-4 has-text-centered">Slide Task </h3>
            <p class="has-text-justified">Robot trains to use the pendulum object present in the environment to slide the
              puck to the goal</p>
            <video id="matting-video" controls playsinline height="100%" style="border-radius: 25px;">
              <source src="./static/video/slide.mp4" type="video/mp4">
            </video>
            <p class="teaser has-text-justified">
              The five trials above show the robot eventually sliding the puck to the goal by aligning
              the pendulum and using physical skills like hitting and sliding.
            </p>
          </div>
        </div>

        <div class="column is-centered is-half">
          <div class="content">
            <h3 class="title is-4 has-text-centered">Bounce Task</h3>
            <p class="has-text-justified">Robot trains to use the wedge object present in the environment to make the ball
              reach the goal</p>
            <video id="matting-video" controls playsinline height="100%" style="border-radius: 25px;">
              <source src="./static/video/bounce.mp4" type="video/mp4">
            </video>
            <p class="teaser has-text-justified">
              In the above five trials, the robot places the wedge at the correct location with the proper orientation,
              throwing the ball from the proper height, so that the ball reaches the goal.
            </p>
          </div>
        </div>

        <div class="column is-centered is-half">
          <div class="content">
            <h3 class="title is-4 has-text-centered">Bridge Task</h3>
            <p class="has-text-justified">Robot trains to use the pendulum and bridge objects present in the environment
              to make the puck reach the
              goal</p>
            <video id="matting-video" controls playsinline height="100%" style="border-radius: 25px;">
              <source src="./static/video/bridge.mp4" type="video/mp4">
            </video>
            <p class="teaser has-text-justified">
              Over the shown trials, the robot learns to align the pendulum so that the hitting plane is
              correct. Eventually, the robot effectively uses objects present in the environment, such as the
              bridge.
              <!-- These, along with multiple physical skills involved in the task, highlight PhyPlan's adaptability to long-horizon tasks. -->
            </p>
          </div>
        </div>
      </div>
    </div>
  </section>

  <section class="section">
    <div class="container is-max-desktop">
      <h2 class="title is-3 has-text-centered">Progressively reasoning over tools</h2>
      <div class="column has-text-justified">
        <p>
          PhyPlan's tool selector based on GFlowNet eventually learns to select and place the appropriate tool classes and the best corresponding tool variants.
        </p>
      </div>
      <div class="content">
        <img src="./static/images/phyplan_vis.png"/>
      </div>
  </div>
</section>

<section class="section">
    <div class="container is-max-desktop">
      <h2 class="title is-3 has-text-centered">Comparison with the Baselines</h2>
      <div class="column has-text-justified">
        <p>
          The image below quantifies the significant efficiency and accuracy advantage of PhyPlan over the baselines,
          in spite of PhyPlan having a greater reasoning depth compared to the baselines, which only select controls for "ideal" tools.
        </p>
      </div>
      <div class="content">
        <img src="./static/images/time-regret.png"/>
      </div>
      <hr>
      <div class="column has-text-justified">
        <p>
          Below is a qualitative comparison with the baselines "DQN" (adapted from <a href="https://phyre.ai/"
            class="external-link is-normal">[Bakhtin et al., 2019]</a>) and LLM. DQN is a Deep Q-Network trained on a set of
          observation-action-reward triplets by minimising the cross-entropy between the soft prediction and the observed
          reward. DQN also uses the same algorithm (Gaussian Process) as PhyPlan for online learning (<a href="#tasks">here</a>).
        </p>
      </div>
      <div class="content">
        <img src="./static/images/DQN-vs-LLM-vs-PhyPlan.png"
              style="border-radius: 25px;"/>
      </div>
      <ol>
        <li><b>DQN (Baseline)</b> executes actions in sequence while learning the difference
          between the predicted reward for an action and the actual reward. However, even after 11 trials it does not
          use the bridge, which is needed to land the ball closer to the goal.
        </li>
        <li><b>LLM (Baseline)</b> executes actions in sequence while correcting them based on feedback
          in further trials. It uses the bridge in a few trials but does not align the pendulum appropriately even after 10 trials.
        </li>
        <li><b>PhyPlan</b> executes actions in sequence while learning the difference in the predicted
          reward for an action and the actual reward. It does not use the bridge in the first attempt because
          of errors in prediction. However, it quickly realises the need for the bridge in the second attempt. Further,
          it chooses appropriate actions to land the ball in the goal in just the fourth attempt. Note that the robot
          learns to use the bridge effectively; a physical reasoning task reported earlier <a
            href="https://arxiv.org/abs/1907.09620" class="external-link is-normal">[Allen et al., 2020]</a> to be
          challenging to learn for model-free methods, highlighting PhyPlan’s adaptability to long-horizon tasks.
        </li>
      </ol>
    </div>
  </section>

  <section class="section">
    <div class="container is-max-desktop">
      <h2 class="title is-3 has-text-centered">LLM (Baseline) Prompting Details</h2>
      <div class="column has-text-justified">
        <p>
          We investigate the physical reasoning abilities of a Large Language Model (specifically <a href="https://gemini.google.com/" class="external-link is-normal">Google's Gemini-Pro LLM</a>). We first describe the task setup and ask the LLM to generate the actions. We then execute the generated action in the environment and re-prompt the LLM with feedback on where the ball/puck landed with respect to the goal.
        </p>
      </div>
      <hr>

      <div class="content">
        <h3 class="title is-4 has-text-centered"><span class="dnerf">Launch Task</span></h3>
        <em>Initial Prompt (Task Description):</em>
        <div class="columns is-centered is-vcentered"></div>
<pre><code style="white-space: pre-wrap;">There is a robot and a goal located at {goal_pos} outside the direct reach of the robot. There is a ball that needs to reach the goal. The environment has a fixed pillar over which the ball is resting, and a pendulum hanging over the ball that the robot can orient to hit the ball to throw it to the goal. The robot can orient the pendulum along any vertical plane and choose to drop the pendulum from any angle from the vertical axis. When hit with a pendulum, the puck projectiles and lands far away on the ground.
Sanity check 1: How does the plane of the pendulum affect the puck's position with respect to the goal?
Sanity check 2: How does the drop angle of the pendulum affect the puck's position with respect to the goal?
</code></pre>
        <em>Feedback Prompt:</em>
        <div class="columns is-centered is-vcentered"></div>
<pre><code style="white-space: pre-wrap;">In one line, give the numerical values of the angle to orient the pendulum's plane and the angle to drop the pendulum from (both in decimal radians). The bound for plane orientation angle is ({bnds[0][0]}, {bnds[0][1]}) and that for drop angle with vertical axis is ({bnds[1][0]}, {bnds[1][1]}). I will tell you where the ball landed, and you should modify your answer accordingly till the ball reaches the goal. I have marked the ground into two halves. The goal lies in one half, and the robot and the wedge are at the centre. Thoughout the conversation, remember that my response would be one of these:
  1. The ball lands in the half not containing the goal, I'd say 'WRONG HALF'.
  2. The ball lands in the correct half but left of the goal, I'd say 'LEFT by &lt;horizontal distance between ball and goal&gt;'.
  3. The ball lands in the correct half but right of the goal, I'd say 'RIGHT by &lt;horizontal distance between ball and goal&gt;'.
  4. The ball lands in the correct half and in line but overshot the goal, I'd say 'OVERSHOT by &lt;horizontal distance between ball and goal&gt;'.
  5. The ball lands in the correct half and in line but fell short of the goal, I'd say 'FELL SHORT by &lt;horizontal distance between ball and goal&gt;'.
  6. Finally, the ball successfully landed in the goal, I'd say 'GOAL'. \
Note: In your response, do not write anything else except the (pendulum's plane angle, pendulum's drop angle) pair. Send in tuple FORMAT: (angle 1, angle 2). Do not emphasise the answer, just return plain text. Let's begin with an initial guess!
</code></pre>
      </div>

      <div class="content">
        <h3 class="title is-4 has-text-centered"><span class="dnerf">Slide Task</span></h3>
        <em>Initial Prompt (Task Description):</em>
        <div class="columns is-centered is-vcentered"></div>
<pre><code style="white-space: pre-wrap;">There is a robot and a goal located at {goal_pos} outside the direct reach of the robot. There is a puck that needs to reach the goal. The environment has a fixed table over which the puck slides, and a pendulum hanging over the puck that the robot can orient to hit the puck to slide it to the goal. The robot can orient the pendulum along any vertical plane and choose to drop the pendulum from any angle from the vertical axis. When hit with a pendulum, the puck slides on the table.
Sanity check 1: How does the plane of the pendulum affect the puck's position with respect to the goal?
Sanity check 2: How does the drop angle of the pendulum affect the puck's position with respect to the goal?
</code></pre>
        <em>Feedback Prompt:</em>
        <div class="columns is-centered is-vcentered"></div>
<pre><code style="white-space: pre-wrap;">In one line, give the numerical values of the angle to orient the pendulum's plane and the angle to drop the pendulum from (both in decimal radians). The bound for plane orientation angle is ({bnds[0][0]}, {bnds[0][1]}) and that for drop angle with vertical axis is ({bnds[1][0]}, {bnds[1][1]}). I will tell you where the puck landed, and you should modify your answer accordingly till the puck reaches the goal. I have marked the ground into two halves. The goal lies in one half, and the robot and the wedge are at the centre. Throughout the conversation, remember that my response would be one of these:
  1. The puck lands in the half not containing the goal, I'd say 'WRONG HALF'.
  2. The puck lands in the correct half but left of the goal, I'd say 'LEFT by &lt;horizontal distance between puck and goal&gt;'.
  3. The puck lands in the correct half but right of the goal, I'd say 'RIGHT by &lt;horizontal distance between puck and goal&gt;'.
  4. The puck lands in the correct half and in line but overshot the goal, I'd say 'OVERSHOT by &lt;horizontal distance between puck and goal&gt;'.
  5. The puck lands in the correct half and in line but fell short of the goal, I'd say 'FELL SHORT by &lt;horizontal distance between puck and goal&gt;'.
  6. Finally, the puck successfully landed in the goal, I'd say 'GOAL'.
Note: In your response, do not write anything else except the (pendulum's plane angle, pendulum's drop angle) pair. Send in tuple FORMAT: (angle 1, angle 2). Do not emphasise the answer, just return plain text. Let's begin with an initial guess!
</code></pre>
      </div>

      <div class="content">
        <h3 class="title is-4 has-text-centered"><span class="dnerf">Bounce Task</span></h3>
        <em>Initial Prompt (Task Description):</em>
        <div class="columns is-centered is-vcentered"></div>
<pre><code style="white-space: pre-wrap;">There is a robot and a goal located at {goal_pos} outside the direct reach of the robot. There is a ball that needs to reach the goal. The environment has a wedge (an inclined plane at 45 degrees from the horizontal plane) placed at origin, and the robot can bounce the ball over the wedge to place the ball inside the goal. The height of the wedge centre from the ground is fixed at 0.3 metres. The robot can orient the wedge along any horizontal direction and choose to drop the ball over the wedge from any height. When dropped from a height, the ball bounces on the wedge and lands far away on the ground.
Sanity check 1: How does the orientation angle of the wedge affect the ball's position with respect to the goal?
Sanity check 2: How does the drop height of the ball affect the ball's position with respect to the goal?
</code></pre>
        <em>Feedback Prompt:</em>
        <div class="columns is-centered is-vcentered"></div>
<pre><code style="white-space: pre-wrap;">In one line, give the numerical values of the angle to orient the wedge and the height to drop the ball from in the format (angle in decimal radians, height in meters). The bound for angle is ({bnds[0][0]}, {bnds[0][1]}) and that for height is ({bnds[1][0]}, {bnds[1][1]}). I will tell you where the ball landed, and you should modify your answer accordingly till the ball reaches the goal. I have marked the ground into two halves. The goal lies in one half, and the robot and the wedge are at the centre. Throughout the conversation, remember that my response would be one of these:
  1. The ball lands in the half not containing the goal, I'd say 'WRONG HALF'.
  2. The ball lands in the correct half but left of the goal, I'd say 'LEFT by &lt;horizontal distance between ball and goal&gt;'.
  3. The ball lands in the correct half but right of the goal, I'd say 'RIGHT by &lt;horizontal distance between ball and goal&gt;'.
  4. The ball lands in the correct half and in line but overshot the goal, I'd say 'OVERSHOT by &lt;horizontal distance between ball and goal&gt;'.
  5. The ball lands in the correct half and in line but fell short of the goal, I'd say 'FELL SHORT by &lt;horizontal distance between ball and goal&gt;'.
  6. Finally, the ball successfully landed in the goal, I'd say 'GOAL'.
Note: In your response, do not write anything else except the (angle, height) pair. Send in tuple FORMAT: (angle, height). Do not emphasise the answer, just return plain text. Let's begin with an initial guess!</code></pre>
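<p>The six feedback responses listed above can be derived from the landing point and the goal position. The sketch below is one possible reading, not the authors' implementation: it assumes the robot sits at the origin, that the line dividing the two halves passes through the origin perpendicular to the goal direction, that left and right are judged looking from the origin toward the goal, and that <code>goal_radius</code> and <code>line_tol</code> are made-up tolerances.</p>

```python
import math


def feedback(landing, goal, goal_radius=0.05, line_tol=0.05):
    """Return one of the six feedback strings for a 2D landing point.

    All geometry conventions and tolerances here are assumptions made
    for illustration; they are not taken from the source.
    """
    gx, gy = goal
    lx, ly = landing
    goal_dist = math.hypot(gx, gy)
    ux, uy = gx / goal_dist, gy / goal_dist  # unit vector toward the goal

    if math.hypot(lx - gx, ly - gy) <= goal_radius:
        return "GOAL"

    along = lx * ux + ly * uy      # signed progress toward the goal
    lateral = ux * ly - uy * lx    # > 0 means left of the origin-goal line

    if along < 0:
        return "WRONG HALF"
    if lateral > line_tol:
        return f"LEFT by {abs(lateral):.2f}"
    if lateral < -line_tol:
        return f"RIGHT by {abs(lateral):.2f}"
    if along > goal_dist:
        return f"OVERSHOT by {along - goal_dist:.2f}"
    return f"FELL SHORT by {goal_dist - along:.2f}"


print(feedback((1.0, 0.5), (1.0, 0.0)))
```

<p>Checking the lateral offset before the overshoot and shortfall cases mirrors the prompt, where those two responses apply only once the object is in line with the goal.</p>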
      </div>

      <div class="content">
        <h3 class="title is-4 has-text-centered"><span class="dnerf">Bridge Task</span></h3>
        <em>Initial Prompt (Task Description):</em>
        <div class="columns is-centered is-vcentered"></div>
<pre><code style="white-space: pre-wrap;">There is a robot and a goal located at {goal_pos} outside the direct reach of the robot. There is a puck that needs to reach the goal. The environment has a fixed table over which the puck slides, a movable bridge over which the puck slides and a pendulum that the robot can orient to move the puck towards the goal. The robot can orient the pendulum along any vertical plane, orient the bridge in any horizontal direction and choose to drop the pendulum from any angle from the vertical axis. When hit with a pendulum, the puck slides on the table, then on the bridge and finally flies off as a projectile to land far away on the ground.
Sanity check 1: How does the plane of the pendulum affect the puck's position with respect to the goal?
Sanity check 2: How does the drop angle of the pendulum affect the puck's position with respect to the goal?
Sanity check 3: How does the orientation angle of the bridge affect the puck's position with respect to the goal?
</code></pre>
        <em>Feedback Prompt:</em>
        <div class="columns is-centered is-vcentered"></div>
<pre><code style="white-space: pre-wrap;">In one line, give the numerical values of the angle to orient the pendulum's plane, the angle to orient the bridge and the angle to drop the pendulum from (all in decimal radians). The bound for plane orientation angle is ({bnds[0][0]}, {bnds[0][1]}), that for bridge orientation angle is ({bnds[2][0]}, {bnds[2][1]}), and that for drop angle with vertical axis is ({bnds[1][0]}, {bnds[1][1]}). I will tell you where the puck landed, and you should modify your answer accordingly till the puck reaches the goal. I have marked the ground into two halves. The goal lies in one half, and the robot and the wedge are at the centre. Throughout the conversation, remember that my response would be one of these:
  1. The puck lands in the half not containing the goal, I'd say 'WRONG HALF'.
  2. The puck lands in the correct half but left of the goal, I'd say 'LEFT by &lt;horizontal distance between puck and goal&gt;'.
  3. The puck lands in the correct half but right of the goal, I'd say 'RIGHT by &lt;horizontal distance between puck and goal&gt;'.
  4. The puck lands in the correct half and in line but overshot the goal, I'd say 'OVERSHOT by &lt;horizontal distance between puck and goal&gt;'.
  5. The puck lands in the correct half and in line but fell short of the goal, I'd say 'FELL SHORT by &lt;horizontal distance between puck and goal&gt;'.
  6. Finally, the puck successfully landed in the goal, I'd say 'GOAL'.
Note: In your response, do not write anything else except the (pendulum's plane angle, pendulum's drop angle, bridge's orientation angle) triplet. Send in tuple FORMAT: (angle 1, angle 2, angle 3). Do not emphasise the answer, just return plain text. Let's begin with an initial guess!
</code></pre>
      </div>

      <div class="content">
        <h3 class="title is-4 has-text-centered"><span class="dnerf">Ricochet Task</span></h3>
        <em>Initial Prompt (Task Description):</em>
        <div class="columns is-centered is-vcentered"></div>
<pre><code style="white-space: pre-wrap;">There is a robot and a goal located at {goal_pos} outside the direct reach of the robot. There is a ball that needs to reach the goal. The environment has a fixed pillar over which the ball is resting, a movable wedge (an inclined plane at 45 degrees from the horizontal plane), and a pendulum hanging over the ball that the robot can orient to hit the ball. The robot can orient the pendulum along any vertical plane and choose to drop the pendulum from any angle from the vertical axis. However, the pendulum can only be used to hit the ball in a direction away from the goal, making it impossible to reach the goal directly by the pendulum alone. To solve this, the robot must move and orient the wedge such that the ball bounces off it after being hit by the pendulum, and lands inside the goal. The robot can change the radial distance of the wedge from the origin, the direction of this radial distance (in polar coordinates), and orient the wedge in any horizontal direction.
Sanity check 1: How does the drop angle of the pendulum affect the ball's trajectory and its position with respect to the wedge?
Sanity check 2: How does the plane of the pendulum affect the ball's trajectory and its position with respect to the wedge?
Sanity check 3: How does the position (radial distance and angle) of the wedge affect the ball's position with respect to the goal?
Sanity check 4: How does the orientation angle of the wedge affect the ball's position with respect to the goal?
</code></pre>
        <em>Feedback Prompt:</em>
        <div class="columns is-centered is-vcentered"></div>
<pre><code style="white-space: pre-wrap;">In one line, give the numerical values of the pendulum's plane angle, pendulum's drop angle, wedge's radial distance from origin, direction angle of this radial distance (in polar form), and wedge's orientation angle (all in decimal radians or meters where applicable). The bound for pendulum plane orientation is ({bnds[0][0]}, {bnds[0][1]}), that for pendulum drop angle is ({bnds[1][0]}, {bnds[1][1]}), that for wedge radial distance is ({bnds[2][0]}, {bnds[2][1]}), that for radial direction angle is ({bnds[3][0]}, {bnds[3][1]}), and that for wedge orientation angle is ({bnds[4][0]}, {bnds[4][1]}). I will tell you where the ball landed, and you should modify your answer accordingly till the ball reaches the goal. I have marked the ground into two halves. The goal lies in one half, and the robot and the wedge are at the centre. Throughout the conversation, remember that my response would be one of these:
  1. The ball lands in the half not containing the goal, I'd say 'WRONG HALF'.
  2. The ball lands in the correct half but left of the goal, I'd say 'LEFT by &lt;horizontal distance between ball and goal&gt;'.
  3. The ball lands in the correct half but right of the goal, I'd say 'RIGHT by &lt;horizontal distance between ball and goal&gt;'.
  4. The ball lands in the correct half and in line but overshot the goal, I'd say 'OVERSHOT by &lt;horizontal distance between ball and goal&gt;'.
  5. The ball lands in the correct half and in line but fell short of the goal, I'd say 'FELL SHORT by &lt;horizontal distance between ball and goal&gt;'.
  6. Finally, the ball successfully landed in the goal, I'd say 'GOAL'.
Note: In your response, do not write anything else except the (pendulum plane angle, pendulum drop angle, wedge radial distance, wedge radial direction, wedge orientation angle) tuple. Send in tuple FORMAT: (angle 1, angle 2, distance, angle 3, angle 4). Do not emphasise the answer, just return plain text. Let's begin with an initial guess!
</code></pre>
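<p>Since every feedback prompt asks for a plain-text tuple, the reply can be parsed with a small helper. The following is a minimal sketch under stated assumptions: the function name <code>parse_reply</code> is hypothetical, and clipping out-of-range values to the bounds is an illustrative choice rather than the behaviour confirmed by the source.</p>

```python
import re

# Matches signed integers and decimals, with an optional exponent.
_NUMBER = r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?"


def parse_reply(reply, bnds):
    """Extract a parenthesised tuple of numbers and clip each to its bound.

    `bnds` is a list of (low, high) pairs, one per expected value.
    Raises ValueError if no tuple of the expected length is found.
    Clipping (rather than rejecting) is an assumption for this sketch.
    """
    match = re.search(r"\(([^()]*)\)", reply)
    if match is None:
        raise ValueError(f"no tuple found in reply: {reply!r}")
    values = [float(tok) for tok in re.findall(_NUMBER, match.group(1))]
    if len(values) != len(bnds):
        raise ValueError(f"expected {len(bnds)} values, got {len(values)}")
    return tuple(min(max(v, lo), hi) for v, (lo, hi) in zip(values, bnds))


# Example with five assumed bounds, matching the Ricochet tuple layout.
ricochet_bnds = [(0, 1), (0, 1.2), (0.5, 1), (-3.14, 3.14), (-3.14, 3.14)]
action = parse_reply("(0.5, 1.0, 0.8, -2.0, 3.5)", ricochet_bnds)
print(action)
```

<p>The same helper covers the two-value and three-value tasks, because the expected tuple length is taken from the number of bounds supplied.</p>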
      </div>
    </div>
  </section>

  <hr>

  <section class="section" id="References">
    <div class="container is-max-desktop content">
      <h2 class="title">References</h2>
      <pre><code>1. [Allen et al., 2020] Kelsey R Allen, Kevin A Smith, and Joshua B Tenenbaum.
        Rapid trial-and-error learning with simulation supports flexible tool use and physical reasoning.
        Proceedings of the National Academy of Sciences, 117(47):29302-29310, 2020.</code></pre>
      <pre><code>2. [Bakhtin et al., 2019] Anton Bakhtin, Laurens van der Maaten, Justin Johnson, Laura Gustafson, and Ross Girshick.
        Phyre: A new benchmark for physical reasoning.
        Advances in Neural Information Processing Systems, 32, 2019.</code></pre>
    </div>
  </section>

  <section class="section" id="BibTeX">
    <div class="container is-max-desktop content">
      <h2 class="title">Citation</h2>
      <pre><code>@inproceedings{phyplan2026,
      title     = {PhyPlan: Learning To Plan Tasks with Generalizable and Rapid Physical Reasoning for Embodied Manipulation},
      author    = {Kanwar, Ankit and Soin, Hartej and Barnawal, Abhinav and Chopra, Mudit and Vagadia, Harshil and Banerjee, Tamajit and Tuli, Shreshth and Chakraborty, Souvik and Paul, Rohan},
      booktitle = {},
      year      = {2026}
    }</code></pre>
    </div>
  </section>


  <footer class="footer">
    <div class="container">
      <div class="content has-text-centered">
        Website template borrowed from <a href="https://github.com/nerfies/nerfies.github.io">NeRFies</a>
      </div>
    </div>
  </footer>

</body>

</html>