This project hasn't been updated in some time, and I am trying to get it to work with newer versions of Mesos, MPICH, and ZooKeeper. I had to make several adjustments to the mrun.py script to get it to work with the newer eggs that Mesos provides. The changes were:
diff --git a/mrun.py b/mrun.py
index fa77978..c5e92b9 100644
--- a/mrun.py
+++ b/mrun.py
@@ -1,7 +1,8 @@
-#!/usr/bin/env python
+#!/usr/bin/env python
-import mesos
-import mesos_pb2
+import mesos.interface as mesos
+import mesos.interface.mesos_pb2 as mesos_pb2
+import mesos.scheduler
import os
import logging
@@ -68,7 +73,7 @@ def finalizeSlaves(callbacks):
logging.info("Done finalizing slaves")
-class HydraScheduler(mesos.Scheduler):
+class HydraScheduler(mesos.interface.Scheduler):
@@ -250,7 +257,7 @@ if __name__ == "__main__":
work_dir = tempfile.mkdtemp()
- driver = mesos.MesosSchedulerDriver(
+ driver = mesos.scheduler.MesosSchedulerDriver(
scheduler,
framework,
args[0])
In addition, the binaries in export/bin and the libraries in export/lib were updated to reflect the new versions listed below. My test environment runs ZooKeeper, the Mesos slave and master, Hadoop, and this project all on the same host. Because I installed the mesos.interface and mesos.scheduler eggs manually via easy_install, I did not need to install the version provided by this project's Makefile. I was able to compile the hello world app and upload it to the HDFS name node. This is the result of running the mrun.py script (the debug flag is set in the mrun shell wrapper script):
[root@ip-10-206-2-108 mesos-hydra]# ./mrun -N 1 -n 1 "zk://10.206.2.108:2181/mesos" ./hello_world
INFO:root:Connecting to Mesos master zk://10.206.2.108:2181/mesos
INFO:root:Total processes 1
INFO:root:Total nodes 1
INFO:root:Procs per node 1
INFO:root:Cores per node 1
2017-02-28 11:27:53,867:2948(0x7fdc0abcc700):ZOO_INFO@log_env@726: Client environment:zookeeper.version=zookeeper C client 3.4.8
2017-02-28 11:27:53,867:2948(0x7fdc0abcc700):ZOO_INFO@log_env@730: Client environment:host.name=ip-10-206-2-108.ec2.internal
2017-02-28 11:27:53,867:2948(0x7fdc0abcc700):ZOO_INFO@log_env@737: Client environment:os.name=Linux
2017-02-28 11:27:53,868:2948(0x7fdc0abcc700):ZOO_INFO@log_env@738: Client environment:os.arch=3.10.0-514.6.2.el7.x86_64
2017-02-28 11:27:53,868:2948(0x7fdc0abcc700):ZOO_INFO@log_env@739: Client environment:os.version=#1 SMP Fri Feb 17 19:21:31 EST 2017
2017-02-28 11:27:53,868:2948(0x7fdc0abcc700):ZOO_INFO@log_env@747: Client environment:user.name=ec2-user
2017-02-28 11:27:53,868:2948(0x7fdc0abcc700):ZOO_INFO@log_env@755: Client environment:user.home=/root
2017-02-28 11:27:53,868:2948(0x7fdc0abcc700):ZOO_INFO@log_env@767: Client environment:user.dir=/home/hadoop/mesos-hydra
2017-02-28 11:27:53,868:2948(0x7fdc0abcc700):ZOO_INFO@zookeeper_init@800: Initiating client connection, host=10.206.2.108:2181 sessionTimeout=10000 watcher=0x7fdc1646cb2a sessionId=0 sessionPasswd=<null> context=0x7fdbf8000c20 flags=0
I0228 11:27:53.868440 2948 sched.cpp:226] Version: 1.1.0
2017-02-28 11:27:53,869:2948(0x7fdc091b6700):ZOO_INFO@check_events@1728: initiated connection to server [10.206.2.108:2181]
2017-02-28 11:27:53,871:2948(0x7fdc091b6700):ZOO_INFO@check_events@1775: session establishment complete on server [10.206.2.108:2181], sessionId=0x15a856e6d320005, negotiated timeout=10000
I0228 11:27:53.872299 2952 group.cpp:340] Group process (zookeeper-group(1)@10.206.2.108:35453) connected to ZooKeeper
I0228 11:27:53.872354 2952 group.cpp:828] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0228 11:27:53.872369 2952 group.cpp:418] Trying to create path '/mesos' in ZooKeeper
I0228 11:27:53.873971 2951 detector.cpp:152] Detected a new leader: (id='14')
I0228 11:27:53.874215 2953 group.cpp:697] Trying to get '/mesos/json.info_0000000014' in ZooKeeper
I0228 11:27:53.876013 2956 zookeeper.cpp:259] A new leading master (UPID=master@10.206.2.108:5050) is detected
I0228 11:27:53.876211 2954 sched.cpp:330] New master detected at master@10.206.2.108:5050
I0228 11:27:53.876564 2954 sched.cpp:341] No credentials provided. Attempting to register without authentication
I0228 11:27:53.878914 2954 sched.cpp:743] Framework registered with 5cf46fd1-92b2-4f16-b92b-f434c853e2c7-0001
INFO:root:Registered with framework ID 5cf46fd1-92b2-4f16-b92b-f434c853e2c7-0001
Traceback (most recent call last):
File "/usr/lib64/python2.7/logging/__init__.py", line 851, in emit
msg = self.format(record)
File "/usr/lib64/python2.7/logging/__init__.py", line 724, in format
return fmt.format(record)
File "/usr/lib64/python2.7/logging/__init__.py", line 464, in format
record.message = record.getMessage()
File "/usr/lib64/python2.7/logging/__init__.py", line 328, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Logged from file mrun.py, line 94
INFO:root:Launching proxy on offer value: "5cf46fd1-92b2-4f16-b92b-f434c853e2c7-O1"
from 10.206.2.108
INFO:root:Replying to offer: launching proxy 0 on host 10.206.2.108
INFO:root:Call-back at 10.206.2.108:31000
Traceback (most recent call last):
File "/usr/lib64/python2.7/logging/__init__.py", line 851, in emit
msg = self.format(record)
File "/usr/lib64/python2.7/logging/__init__.py", line 724, in format
return fmt.format(record)
File "/usr/lib64/python2.7/logging/__init__.py", line 464, in format
record.message = record.getMessage()
File "/usr/lib64/python2.7/logging/__init__.py", line 328, in getMessage
msg = msg % self.args
TypeError: not all arguments converted during string formatting
Logged from file mrun.py, line 94
INFO:root:Finalize slaves
INFO:root:about to execute mpiexec
INFO:root:in slave loop
'HYDRA_LAUNCH: /tmp/tmpeHgagc/./export/bin/hydra_pmi_proxy --control-port 10.206.2.108:38010 --rmk user --launcher manual --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0 \n'
INFO:root:None
INFO:root:Done finalizing slaves
ERROR:root:A task finished unexpectedly: Command exited with status 1
I0228 11:27:59.428092 2951 sched.cpp:1995] Asked to stop the driver
I0228 11:27:59.428184 2951 sched.cpp:1187] Stopping framework 5cf46fd1-92b2-4f16-b92b-f434c853e2c7-0001
2017-02-28 11:27:59,438:2948(0x7fdc0c3cf700):ZOO_INFO@zookeeper_close@2526: Closing zookeeper sessionId=0x15a856e6d320005 to [10.206.2.108:2181]
(note: I added some additional logging here and there as you can see)
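Side note on the TypeError tracebacks above: the logging module catches formatting errors inside the handler and prints the traceback to stderr instead of raising, which is why the script keeps running. The cause is a %-style format string on mrun.py line 94 with more arguments than placeholders. A minimal illustration (the values here are hypothetical stand-ins, not the real mrun.py code):

```python
import logging

logging.basicConfig(level=logging.INFO)

# Hypothetical stand-ins; the real values come from the Mesos offer.
offer_id = "5cf46fd1-92b2-4f16-b92b-f434c853e2c7-O1"
offer_host = "10.206.2.108"

# Buggy: two arguments but only one %s placeholder. logging swallows the
# resulting TypeError, prints the traceback to stderr, and the program
# carries on -- which matches the output above.
logging.info("Launching proxy on offer value: %s", offer_id, offer_host)

# Fixed: one placeholder per argument.
logging.info("Launching proxy on offer value: %s from %s", offer_id, offer_host)
```

So those tracebacks are cosmetic rather than the cause of the task failure.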
The problem is around 'HYDRA_LAUNCH: /tmp/tmpeHgagc/./export/bin/hydra_pmi_proxy --control-port 10.206.2.108:38010 --rmk user --launcher manual --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id 0 \n'. This command returns a non-zero exit code. The version of hydra_pmi_proxy is the one shipped with MPICH 3.2. Documentation on this command is scarce; it appears to be an internal helper used by mpiexec.
I tried to add some debugging to the hydra-proxy.py script in order to see the stderr and/or stdout of this command, but could not find a way to get its output to the stdout/stderr of my tty.
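For what it's worth, the pattern I was trying to get working was streaming the proxy's combined output through the logger instead of letting it vanish into the executor sandbox. Assuming hydra-proxy.py spawns the proxy via subprocess (the function and variable names below are mine, not the script's), a sketch would be:

```python
import logging
import subprocess

logging.basicConfig(level=logging.INFO)

def run_and_log(cmd):
    """Run cmd, stream its combined stdout/stderr through the logger,
    and return its exit status."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # fold stderr into stdout
        universal_newlines=True,    # text mode, works on Python 2.7 and 3
        shell=isinstance(cmd, str),
    )
    for line in iter(proc.stdout.readline, ''):
        logging.info("proxy output: %s", line.rstrip())
    return proc.wait()

# Stand-in for the real hydra_pmi_proxy invocation.
status = run_and_log("echo hydra_pmi_proxy stderr/stdout would land here")
```

With something like this in place the proxy's error message should show up in the framework log rather than being lost.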
Zookeeper: 3.4.9
Mesos: 1.1.0
MPICH: 3.2
Hadoop: 2.6.0
OS: CentOS 7
Kernel: 3.10.0-514.6.2.el7.x86_64