Resume GCP Experiments #4

Open
stevenlin1111 wants to merge 3 commits into vitchyr:master from stevenlin1111:gcp
@stevenlin1111 stevenlin1111 commented Jan 11, 2019

Run `python scripts/gcp/gcp_restart_server.py {PREEMPTION_BUCKET_NAME}` to start the server that relaunches preempted experiments. A preempted experiment is relaunched as before, with one addition: the files that the preempted instance periodically uploaded (the checkpoint) are automatically downloaded onto the relaunched instance at the same local paths. If an experiment is preempted more than `max_retries` times, it is relaunched on a non-preemptible instance instead.
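As a rough illustration of the relaunch rule described above, here is a minimal sketch. It is not the code in this PR: the `Experiment` record, `plan_relaunch`, and `restore_plan` are hypothetical names, and only `max_retries` and the "same local path" behavior come from the description. It makes no GCP calls and only models the decision.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Experiment:
    """Hypothetical record of one experiment (an assumption, not the PR's data model)."""
    name: str
    preemption_count: int = 0
    preemptible: bool = True
    # Maps a local path on the instance to the bucket object of its latest upload.
    checkpoint_files: Dict[str, str] = field(default_factory=dict)


def plan_relaunch(exp: Experiment, max_retries: int) -> Experiment:
    """Record one more preemption and decide the next instance type.

    Once the experiment has been preempted more than max_retries times,
    the relaunch falls back to a non-preemptible instance.
    """
    count = exp.preemption_count + 1
    return replace(exp, preemption_count=count, preemptible=count <= max_retries)


def restore_plan(exp: Experiment) -> List[Tuple[str, str]]:
    """List (bucket object, local path) pairs to download on the new instance.

    Each checkpoint file goes back to the same local path it had before.
    """
    return [(remote, local) for local, remote in sorted(exp.checkpoint_files.items())]
```

With `max_retries=2`, the first two relaunches stay preemptible and the third switches to a non-preemptible instance, while `restore_plan` keeps returning the same local paths for the checkpoint files.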

vitchyr pushed a commit that referenced this pull request Aug 23, 2021
…-not-hang

add pigz by default and remove redundant wait call that can hang