Differences
This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
checkpoint_techniques_on_compute_canada_clusters [2015/03/30 18:54] – 132.216.122.26 | checkpoint_techniques_on_compute_canada_clusters [2024/03/26 13:52] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 45: | Line 45: | ||
# 7779 by default, but if there are several DMTCP schedulers running on | # 7779 by default, but if there are several DMTCP schedulers running on | ||
# the same node we will have problems. The best solution is to assign the | # the same node we will have problems. The best solution is to assign the | ||
- | # port number manually. | + | # port number manually. Also, if PORT=0, a random unused port will be |
+ | # chosen, which is probably better. | ||
PORT=7745 | PORT=7745 | ||
Line 78: | Line 79: | ||
# New version of this script. Now we use DMTCP to launch | # New version of this script. Now we use DMTCP to launch | ||
- | # the scripts | + | # the scripts. |
def chunks(l, n): | def chunks(l, n): | ||
Line 165: | Line 166: | ||
</ | </ | ||
- | In the end, this script generates a bunch of '' | + | In the end, this script generates a bunch of '' |
+ | |||
+ | **Currently this is not working as expected; for some unknown reason, only 2 random jobs get re-started. I have contacted Calcul Québec about this and they should reply shortly. I will update this page with a bug-free script (or whatever solution they give me.)** | ||
+ | |||
+ | **Update 2: they did not reply.** |