
Question while studying Linux (LAM/MPI related)

Author

  • Posted by ndhcom

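I have slave1 and slave2 set up under master and am running parallel computations.
When I run lamboot -v, it only succeeds if I first flush the firewall rules on the slaves (iptables -F).
How should I configure the firewall so that lamboot works without running iptables -F?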


[linuxchk@master ~]$ lamboot -v lamhosts

LAM 7.1.1/MPI 2 C++/ROMIO - Indiana University

n-1<3665> ssi:boot:base:linear: booting n0 (master)
n-1<3665> ssi:boot:base:linear: booting n1 (slave1)
n-1<3665> ssi:boot:base:linear: booting n2 (slave2)
-----------------------------------------------------------------------------
The lamboot agent failed to open a client socket to the newly-booted
process at IP address 192.168.29.129, port 32776.   <--- This port number changes every time I run it.

*** PLEASE READ THIS ENTIRE MESSAGE, FOLLOW ITS SUGGESTIONS, AND
*** CONSULT THE "BOOTING LAM" SECTION OF THE LAM/MPI FAQ
*** (http://www.lam-mpi.org/faq/) BEFORE POSTING TO THE LAM/MPI USER'S
*** MAILING LIST.

Although the newly-booted process has already communicated
successfully with the lamboot agent over other TCP sockets, this is
the first time that the lamboot agent tried to initiate a connection
to the newly-booted process.  As such, this may indicate:

        1. 192.168.29.129 is not the correct IP address for the machine where the
           newly-booted machine was launched
        2. There are network filters between the lamboot agent host and
           the remote host such that communication on random TCP ports
           is blocked
        3. Network routing from the the local host to the remote isn't
           properly configured (this is unlikely)

For number 1, check to ensure that 192.168.29.129 is the correct IP address for
that machine.  If it is not, check the host mapping on that machine
(e.g., /etc/hosts) to ensure that 192.168.29.129 is both reachable and is the by
the host where the lamboot agent is running, and is the correct host.

For numbers 2 and 4, try to telnet to 192.168.29.129, port 32776.  You should get a
"connection refused" error, which will indicate that you successfully
connected to some machine at that IP address, and no process was
listening on that port.  If you get any other kind of error, check
with your system/network administrator -- it may indicate network /
routing issues between the two hosts.
-----------------------------------------------------------------------------
n-1<3665> ssi:boot:base:linear: aborted!
n-1<3675> ssi:boot:base:linear: booting n0 (master)
n-1<3675> ssi:boot:base:linear: booting n1 (slave1)
n-1<3675> ssi:boot:base:linear: booting n2 (slave2)
n-1<3675> ssi:boot:base:linear: finished
lamboot did NOT complete successfully

1 Comment

Comment by 리온
Try using the SNAT feature of iptables.
I can't explain in detail because I don't know what your network environment looks like.
Please see section 6.5.12 of the following document.

http://www.faqs.org/docs/iptables/targets.html#SNATTARGET

Also, don't forget to enable IP forwarding in the kernel.
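
As a simpler alternative to SNAT, and assuming master, slave1 and slave2 all sit on one trusted private subnet, the firewall on each node could accept all traffic from that subnet instead of being flushed. That would cover the random ports lamboot picks. The sketch below is a dry run that only prints the rule. The subnet 192.168.29.0/24 is a guess based on the address in the error message, and CLUSTER_NET, RUN and allow_cluster are names made up for this example.

```shell
#!/bin/sh
# Sketch only, not a tested LAM/MPI recipe.
# CLUSTER_NET is an assumed value: replace it with the real subnet of the nodes.
CLUSTER_NET="${CLUSTER_NET:-192.168.29.0/24}"

# Dry run by default: the command is printed, not executed.
# To apply it, run as root with RUN set to an empty string: RUN="" sh thisscript.sh
RUN="${RUN-echo}"

allow_cluster() {
    # Insert at position 1 of the INPUT chain so the rule is evaluated
    # before any later REJECT/DROP rule in that chain.
    $RUN iptables -I INPUT 1 -s "$1" -j ACCEPT
}

allow_cluster "$CLUSTER_NET"
```

The rule would need to be added on every node, and `iptables -L INPUT -n --line-numbers` shows whether it landed at the top of the chain. How rules are kept across reboots depends on the distribution.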
