Fix #334, add some doc. on how to replace the Manager in case of failure #335
giuseppe-carboni wants to merge 2 commits into master
doc/production.rst
| Production | ||
| ********** | ||
|
|
||
| Unlike the Development environment, that uses Vagrant pre-configured virtual |
In development.rst we wrote "development environment"; to be consistent, we should use the same convention here.
doc/production.rst
| Replace the Manager in case of failure | ||
| -------------------------------------- | ||
| In case the Manager machine suffers a failure of some sort, it has to be | ||
| replaced. In order to do this, the first thing to do is perform again the |
"is perform" or "is to perform"?
I'm checking this with an English-speaking friend; I'll post the correct version ASAP.
About the point below: it is not clear what "all station systems" refers to.
| - Make sure that all the station systems and machines accept incoming | ||
| connections from the newly allocated Manager's IP address. Specifically, the | ||
| ``TotalPower`` backend and the ``CalMux`` machines have to be tweaked in | ||
| order to allow them to be controlled by the new manager. |
This procedure involves logging in to those machines as root. If it has to be documented, this is not the place to do it. My suggestion is to perform this step in advance, by allowing a range of addresses to control those machines, so that this step can be skipped in case of failure.
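To make the range idea concrete, here is a minimal sketch of the check it implies: whether a replacement Manager's address falls inside a pre-authorized range. The subnet and the function name `manager_is_allowed` are hypothetical and chosen only for illustration. The actual allow-list mechanism on the ``TotalPower`` backend and the ``CalMux`` machines is not described in this thread.

```python
import ipaddress

# Hypothetical range, for illustration only: the real subnet reserved for the
# Manager is not stated anywhere in this conversation.
ALLOWED_MANAGER_RANGE = ipaddress.ip_network("192.168.200.0/28")


def manager_is_allowed(address, allowed=ALLOWED_MANAGER_RANGE):
    """Return True if the given Manager address lies inside the allowed range."""
    return ipaddress.ip_address(address) in allowed


if __name__ == "__main__":
    # Check one address inside the range and one outside of it.
    for candidate in ("192.168.200.5", "192.168.201.5"):
        print(candidate, manager_is_allowed(candidate))
```

If the new Manager is always allocated inside such a range, the devices would not need to be reconfigured after a failure.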
It is not clear to me how the Manager can be replicated without any information about this point. I think the procedure should be documented somewhere; if this is not the place, we should put a link to it here.
| ``discos-console`` and ``discos-storage`` machines (in case the DISCOS | ||
| control software is running on a distributed environment). This will allow | ||
| other services such as the Lustre service on the ``discos-storage`` machine | ||
| to point again to the correct IP address. |
Is there a procedure to point to?
| control software is running on a distributed environment). This will allow | ||
| other services such as the Lustre service on the ``discos-storage`` machine | ||
| to point again to the correct IP address. | ||
| - Perform the ssh key exchange procedure between the ``discos`` user of the |
Does Mauro do all these things? :-D We need an example for him :-)
This is not a procedure that a generic observer can perform. The ssh key exchange requires knowing the passwords of both the discos and the root users.
I was joking; the point is that we have to write the documentation assuming that the reader is not a member of the discos team...
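As a possible starting point for the example requested above, the sketch below only builds and prints the standard OpenSSH commands (`ssh-keygen`, `ssh-copy-id`) for a key exchange. It does not run them. The helper name `key_exchange_commands` and the default key path are assumptions. The host names ``discos-console`` and ``discos-storage`` come from the diff, but which machines actually take part in the exchange should be checked against the final text of production.rst. Note that `ssh-copy-id` still asks for the remote user's password, so the concern about who can perform the procedure remains.

```python
import shlex
from pathlib import Path


def key_exchange_commands(user, hosts, key_path=None):
    """Build (without running) the commands for a basic ssh key exchange.

    `key_path` defaults to ~/.ssh/id_ed25519 of the current user; this default
    is an assumption made for the sketch, not a documented DISCOS location.
    """
    if key_path is None:
        key_path = str(Path.home() / ".ssh" / "id_ed25519")
    # Generate a key pair; ssh-keygen will prompt for a passphrase.
    commands = [["ssh-keygen", "-t", "ed25519", "-f", key_path]]
    # Install the public key on each remote host for the given user.
    for host in hosts:
        commands.append(["ssh-copy-id", "-i", key_path + ".pub", f"{user}@{host}"])
    return commands


if __name__ == "__main__":
    for cmd in key_exchange_commands("discos", ["discos-console", "discos-storage"]):
        print(shlex.join(cmd))
```

Printing the commands instead of executing them keeps the sketch safe to run on any machine, and the output can be pasted into the documentation once the exact hosts and users are confirmed.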
No description provided.