docs: improve README with install guide, feature matrix, and troubleshooting#61
Pendu wants to merge 3 commits into traceopt-ai:main from
abhinavsriva left a comment
Please have a look at the changes. The goal is to keep the README simple and quick to scan. Everything else goes into docs.
## Quick start
### Prerequisites
I think this should go into the quickstart rather than here.
For local review and comparison, TraceML also includes a local UI. See [`docs/quickstart.md`](docs/quickstart.md) for setup details.

I removed it as there were too many images in the README.
## TraceML vs alternatives
TraceML is for lightweight diagnosis during real PyTorch training runs.
| Capability | TraceML | W&B / Neptune | TensorBoard Profiler |
I would remove W&B/Neptune, as they are not runtime systems, and compare only with TensorBoard or the PyTorch profiler.
I would also remove these options:
| Experiment tracking & comparison | ❌ | ✅ | ✅ |
| Hyperparameter sweeps | ❌ | ✅ | ❌ |
| Team collaboration | ❌ | ✅ | ❌ |
They are not in the scope of TraceML. These are more for on-platform systems.
| Hyperparameter sweeps | ❌ | ✅ | ❌ |
| Team collaboration | ❌ | ✅ | ❌ |
It is **not**:
I will keep the original. Again, W&B/Neptune are not in the comparison. In fact, another PR tried to put this into WandB.
- basic example
- input / dataloader stall
- DDP straggler / rank skew
| Example | Requires | Description |
This is good, but please remove the Requires column. It is supposed to complement existing workflows, so the expectation is that HF/torch would already be installed. Usual workflows already have torch, and a reinstall can easily break prod environments.
---
## Troubleshooting
Please move this to quick start. This is good.
Summary
- … `[torch]`, `[hf]`, `[lightning]`) to prevent `ModuleNotFoundError` on fresh instances
- … `watch`/`run`/`deep` modes
Motivation
Running `pip install traceml-ai` on a fresh instance and then running examples fails with missing `torch`/`transformers`. The README needed clearer prerequisites and dependency guidance.
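The failure described in the motivation can be detected before any example runs. The following is a minimal sketch, not part of this PR or of TraceML: it uses only the standard library to report which modules are not importable in the current environment. The helper name `missing_modules` is hypothetical, and the module list is taken from the two packages named above.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: the examples need these two modules, as reported in the motivation.
    required = ["torch", "transformers"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All example prerequisites are importable.")
```

A check like this only reports what is absent and never reinstalls anything, which fits the review concern about not touching an existing torch install.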