Differences
This shows you the differences between two versions of the page.
| | checkpoint_techniques_on_compute_canada_clusters [2015/03/30 19:29] – [Automatic checkpoints] 132.216.122.26 | | checkpoint_techniques_on_compute_canada_clusters [2024/03/26 13:52] (current) – external edit 127.0.0.1 |
|---|---|---|---|
| | Line 45: | | Line 45: |
| | # 7779 by default, but if there are several DMTCP schedulers running on | | # 7779 by default, but if there are several DMTCP schedulers running on |
| | # the same node we will have problems. The best solution is to assign the | | # the same node we will have problems. The best solution is to assign the |
| - | # port number manually. | + | # port number manually. Also, if PORT=0, a random unused port will be |
| | | + | # chosen, which is probably better. |
| | PORT=7745 | | PORT=7745 |
| | Line 78: | | Line 79: |
| | # New version of this script. Now we use DMTCP to launch | | # New version of this script. Now we use DMTCP to launch |
| - | # the scripts | + | # the scripts. |
| | def chunks(l, n): | | def chunks(l, n): |
| | Line 167: | | Line 168: |
| | In the end, this script generates a bunch of '' | | In the end, this script generates a bunch of '' |
| - | **Currently this is not working as expected. I have contacted Calcul Québec about this and they should reply shortly. I will update this page with a bug-free script (or whatever solution they give me.)** | + | **Currently this is not working as expected; for some unknown reason, only 2 random jobs get re-started. I have contacted Calcul Québec about this and they should reply shortly. I will update this page with a bug-free script (or whatever solution they give me.)** |
| | | + | |
| | | + | **Update 2: they did not reply.** |
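
The comment added in the newer revision says that with PORT=0 a random unused port will be chosen, which avoids clashes when several DMTCP coordinators share a node. As a minimal sketch of the general operating-system mechanism behind that kind of setting (this is not DMTCP's own code), binding a socket to port 0 makes the kernel assign a free ephemeral port, which can then be read back. The helper name `find_free_port` is hypothetical.

```python
import socket


def find_free_port() -> int:
    """Hypothetical helper: ask the OS for an unused TCP port.

    Binding to port 0 tells the kernel to pick any free ephemeral port;
    getsockname() then reports which port it actually chose.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


if __name__ == "__main__":
    # Prints a port number chosen by the OS (varies from run to run).
    print(find_free_port())
```

One caveat of this sketch: the port is released as soon as the socket closes, so another process could take it before it is used. A server that binds to port 0 itself and keeps the socket open, which is what the PORT=0 setting described in the comment appears to rely on, does not have that race.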