This document contains answers to the following questions concerning Linda. 1) TOPIC: GENERAL LINDA QUESTIONS 1.1) What is Linda? 1.2) What is a tuple, and how is it pronounced? 1.3) What is a tuple space? 1.4) What is a virtual shared memory? 1.5) Where is tuple space stored? 1.6) Is Linda implemented simply as a subroutine library? 1.7) When should I use lexit() or lhalt()? Should I ever use exit()? 1.8) What languages are supported? 2) TOPIC: PERFORMANCE ISSUES 2.1) Does Tuple Space become a bottleneck when there are many processors? 2.2) Is associative matching a performance problem in Linda? 3) TOPIC: COMPILING C-LINDA PROGRAMS 3.1) How do I get my C-Linda program to link when I'm calling FORTRAN routines? C++ routines? 3.2) Can I ignore the warning messages like "prog.cl:10: warning --- no matching Linda op." 3.3) What environment variables are used by the development software? 3.4) How do I get a link map with C-Linda? 3.5) Why do I get the message: "/usr/licensed/linda/clc: /usr/linda/true_clc: not found" when trying to compile my C-Linda program? 3.6) How do I make clc use a different C compiler? 3.7) How do I make clc use gcc instead of cc? 3.8) How do I use ANSI C mode on my HP? 3.9) How do I compile and link C modules with my C-Linda application? 3.10) How do I compile and link Fortran modules with my Fortran-Linda application? 3.11) What is the difference between the "normal" and "underscore" versions of Linda? 4) TOPIC: LANGUAGE ISSUES 4.1) What restrictions are there on the types of functions that can be eval'd? 4.2) What restrictions are there on the types of parameters that can be passed to eval'd functions? 4.3) What restrictions are there on the values that can be put into a tuple? 4.4) Is the use of inp and rdp recommended? 5) TOPIC: TUPLE SCOPE AND DEBUGGING 5.1) How do I debug CDS programs with dbx? 5.2) How do I debug Network Linda programs with dbx? 5.3) What is the difference between tuple scope and post mortem tuple scope? 
5.4) Can I use the CDS with tuplescope to do performance fine-tuning? 5.5) What are aggregate fields, and how do I display them with the tuplescope? 6) TOPIC: RUNNING NETWORK LINDA PROGRAMS 6.1) Why do I get the message: "ntsnet: WARNING: ping may not be a valid Network Linda executable" 6.2) Why do I get the messages: "Permission denied." "ntsnet: too many workers exited to continue" "ntsnet: needed: 1, started: 1, died: 1" 6.3) Why do I get the messages: "rsh: shell/tcp: unknown service" "ntsnet: too many workers exited to continue" "ntsnet: needed: 1, started: 1, died: 1" 6.4) Why do I get the message: "stty: TCGETS: Operation not supported on socket" 6.5) Why do I get the message: "Linda Error: node maple(15): keepalive failure" 6.6) Why do I get the messages: "ntsnet: warning: rup rpc failed on oak: Program not registered" "ntsnet: using fallback load: 0.990000" 6.7) Why do I get the message: "More evals than processors - deadlock could occur" 6.8) How do I run my Network Linda program on a heterogeneous network? 6.9) Can I execute Network Linda programs without using the rshd daemon? 6.10) Why do I get the message: "ntsnet: shutting down with return code 9" when my Network Linda program finishes? 6.11) How can I tell ntsnet exactly how many processes to schedule on each node? 6.12) How can I tell ntsnet to run one eval server on every node, including the local node (so that the local node executes both the realmain() process and a worker process)? 6.13) Why do I get the message: "ntsnet: WARNING: ping appears to be incompatible with ntsnet" 6.14) Why does my program take so long to start executing? 6.15) How do I set environment variables for remote Linda processes? 6.16) Why do I get the message: "Internal Error... 
Opening passwd file in parse passwd file" 6.17) Why do I get the message: "Linda Error: node (-1): hostname not found in configuration file" 7) TOPIC: RUNNING CDS LINDA PROGRAMS 7.1) Why do I get the message: "ping: Network linda executable missing +LARGS argument, aborting." 7.2) Why do I get the message: "Linda Error: out of tb's" 7.3) How large is a CDS tuple block? 7.4) Why do I get the message: "linda init: cannot allocate semaphores." 7.5) Why do I get the message: "linda init: cannot allocate msg structure." 7.6) Why do I get the message: "linda init: cannot create shared region." 7.7) How does CDS Linda emulate parallel processing, since it runs on a single workstation? 8) TOPIC: PROGRAMMING IN NETWORK LINDA 8.1) Is there a way to find out how many processors are available to avoid evaling more Linda processes than processors? 9) TOPIC: PROFILING LINDA PROGRAMS 9.1) How can I profile my Linda program to find out how to make it run faster? 9.2) What ParaGraph displays are useful to use when viewing trace files produced by Linda? 10) TOPIC: PROGRAMMING IN C-LINDA 10.1) What are useful sources of information on programming in Linda? 10.2) How can I implement a barrier in Linda? 11) TOPIC: NTSNET 11.1) How does ntsnet find the local executable? 11.2) How does ntsnet find executables to distribute? 11.3) To what directory does ntsnet copy executables on remote machines? 11.4) In what directory does ntsnet expect to find executables if they are not distributed? 11.5) What directory does ntsnet use as the working directory for the local Linda process? 11.6) What directory does ntsnet use as the working directory for the remote Linda processes? 11.7) What are some map file examples? 11.8) How does ntsnet determine what nodes to run on? 12) TOPIC: INSTALLING LINDA 12.1) Why can't I read my tape on my RS6000? 12.2) Why can't I read my tape on my SGI? 13) TOPIC: MISCELLANEOUS 13.1) How can I use Linda on multiple Shared Memory Multiprocessors? 
14) TOPIC: LICENSE SERVER
 14.1) Can the lserv license server run on a different subnet than my Network Linda job?

----------------------------------------------------------------------

Subject: 1) TOPIC: GENERAL LINDA QUESTIONS

----------------------------------------------------------------------

Subject: 1.1) What is Linda?

Linda is a coordination language consisting of a set of six operations. It is always built on a base language, such as C or Fortran, so it doesn't require applications to be completely rewritten.

----------------------------------------------------------------------

Subject: 1.2) What is a tuple, and how is it pronounced?

A tuple (pronounced "two' pull") is the fundamental Linda data object. It consists of an ordered collection of typed data objects or place holders, called elements.

----------------------------------------------------------------------

Subject: 1.3) What is a tuple space?

It is a place where tuples live. A new tuple space is created for every Linda program, and can be accessed by any process within that program.

----------------------------------------------------------------------

Subject: 1.4) What is a virtual shared memory?

It is a medium used to share data between different processes on different machines without the need for physical shared memory. In the case of Linda, it is called a tuple space.

----------------------------------------------------------------------

Subject: 1.5) Where is tuple space stored?

That depends on the implementation, and it makes no difference to the user's program. On shared memory machines, it is stored in shared memory. On distributed memory machines, tuple space is distributed across all the machines executing the Linda program.

----------------------------------------------------------------------

Subject: 1.6) Is Linda implemented simply as a subroutine library?

No, it is implemented with a compiler, since it is a language. This provides many advantages.
Linda knows the size and type of all data objects, so it doesn't require the user to provide that information when creating tuples. It can also provide error checking not possible for a simple library. It also performs various global optimizations.

----------------------------------------------------------------------

Subject: 1.7) When should I use lexit() or lhalt()? Should I ever use exit()?

Never use exit(). It can cause CDS Linda programs to hang.

Using lexit() (or flexit() in Fortran-Linda) is equivalent to doing a return from either real_main() or the eval'd routine. It has no effect on other Linda processes.

Executing lhalt() (or flhalt()) will cause all Linda processes to abort. It is useful for handling fatal application errors.

----------------------------------------------------------------------

Subject: 1.8) What languages are supported?

Currently, C-Linda and Fortran-Linda are supported, and C++-Linda is in progress.

----------------------------------------------------------------------

Subject: 2) TOPIC: PERFORMANCE ISSUES

----------------------------------------------------------------------

Subject: 2.1) Does Tuple Space become a bottleneck when there are many processors?

No, not if the tuples can be distributed across the network, as is usually the case in real applications. Network Linda can even dynamically redistribute tuples to improve performance.

----------------------------------------------------------------------

Subject: 2.2) Is tuple matching a performance problem in Linda?

No, that is one of the advantages of Linda being a language, not simply a library. The Linda compiling system does a complete tuple operation analysis at link time, and generates a custom interface into the Linda support library to optimize the storage of tuples for each Linda program. In most Linda programs, very few tuples are checked before a matching tuple is found.
---------------------------------------------------------------------- Subject: 3) TOPIC: COMPILING C-LINDA PROGRAMS ---------------------------------------------------------------------- Subject: 3.1) How do I get my C-Linda program to link when I'm calling FORTRAN routines? C++ routines? clc links programs using the shell script linda_cc_link. Linda_cc_link normally uses the C compiler to link Linda programs, but the environment variable LINDA_CC_LINK can be used to specify an alternate program for linking. ---------------------------------------------------------------------- Subject: 3.2) Can I ignore the warning messages like "prog.cl:10: warning --- no matching Linda op." This could mean that clc noticed that a tuple is being created with an out or eval, but is never accessed with an in, inp, rd, or rdp. It is common practice not to in tuples created with eval, so that is a common reason for the warning. But the message could mean that a tuple is being in'd that was never out'd. If the in is executed, it is guaranteed to deadlock. Thus, the warning should be checked. If it refers to an eval, it is harmless. ---------------------------------------------------------------------- Subject: 3.3) What environment variables are used by the development software? LINDA_CLC is the most useful. LINDA_FLC ... LINDA_CC_LINK ... LINDA_FORTRAN_LINK ... TSNET_PATH (This entry is currently under construction) ---------------------------------------------------------------------- Subject: 3.4) How do I get a link map with C-Linda? On SunOS 4.1, use % clc -linda link_args "-Qoption ld -M" -o prog prog.cl ---------------------------------------------------------------------- Subject: 3.5) Why do I get the message: "/usr/licensed/linda/clc: /usr/linda/true_clc: not found" when trying to compile my C-Linda program? It means that the distribution isn't properly installed. Anytime the distribution is moved, it needs to be reinstalled. 
Just go into the top level distribution directory (containing install_pkg), and execute install_pkg. If the same distribution is used on multiple machines, there could be a problem if the different machines have different paths for reaching the distribution. You may need to create some symbolic links to the distribution on some of the machines so that everyone can use the same path, and then execute install_pkg with the canonical path as an argument to install_pkg. For example, after creating a symbolic link called /usr/licensed/linda pointing to the distribution on all machines, install linda on just one of the machines with the command: % install_pkg /usr/licensed/linda Now, the installation should work on all the machines. ---------------------------------------------------------------------- Subject: 3.6) How do I make clc use a different C compiler? The C-Linda compiler, clc, uses a collection of shell scripts in the bin directory of the Linda distribution to preprocess .cl files, compile C programs, and link executables. These shell scripts can be customized to meet your specific needs. To make this easier and avoid having to modify the distribution, many of the shell scripts use environment variables, which if defined, will change the behavior of the shell script. The Network Linda shell scripts are: /bin/linda_cc " linda_cc_link " linda_cpp " postcpp_cc and the CDS Linda shell scripts are: /cds/bin/linda_cc " linda_cc_link " linda_cpp " postcpp_cc You could edit any or all of these shell scripts, changing "cc" to the desired C compiler, or "cpp" to a different C preprocessor. 
It is possible that all you have to do is set some environment variables to the path of the new compiler:

    % setenv LINDA_CC /usr/local/bin/acc
    % setenv LINDA_CC_LINK /usr/local/bin/acc
    % setenv POSTCPP_CC /usr/local/bin/acc

You could also specify an option for the compiler, as in:

    % setenv LINDA_CC "/usr/local/bin/acc -ANSI"
    % setenv LINDA_CC_LINK "/usr/local/bin/acc -ANSI"
    % setenv POSTCPP_CC "/usr/local/bin/acc -ANSI"

Unfortunately, these have no effect on linda_cpp, so it would have to be modified in order to use a different preprocessor.

----------------------------------------------------------------------

Subject: 3.7) How do I make clc use gcc instead of cc?

See answer 3.6 for a general discussion. It can be as simple as this:

    % setenv LINDA_CC gcc
    % setenv LINDA_CC_LINK gcc
    % setenv POSTCPP_CC gcc

but this won't use the GNU C preprocessor. You may want to edit linda_cpp to use it, but that may not be necessary, depending on the particular gcc installation.

----------------------------------------------------------------------

Subject: 3.8) How do I use ANSI C mode on my HP?

See answer 3.6 for a general discussion. The obvious solution is to define the environment variables as follows:

    setenv LINDA_CC "/bin/cc -Aa"
    setenv LINDA_CC_LINK "/bin/cc -Aa"
    setenv POSTCPP_CC "/bin/cc -Aa"

Unfortunately, there is a problem with this on the HP. As described in section six of the C-Linda Reference Manual, clc first preprocesses the .cl file using linda_cpp, converts it to a .c file, and then compiles it using postcpp_cc. There isn't a simple way to turn off the preprocessing phase of the HP ANSI C compiler, so the output from the C preprocessor becomes the input to the C preprocessor in postcpp_cc. Many C preprocessors have no problem with this, but it causes a problem with /lib/cpp.ansi on the HP.
Line directives of the form: # 1 "real_main.cl" # 1 "/usr/local/linda/cds/lib/linda.h" are output from /lib/cpp.ansi, which results in error messages, such as: __lt4871.c: 1: Unknown preprocessing directive. __lt4871.c: 3: Unknown preprocessing directive. when they are input to /lib/cpp.ansi. The HP C preprocessor requires line directives such as: #line 1 "real_main.cl" #line 1 "/usr/local/linda/cds/lib/linda.h" Two possible solutions are to make linda_cpp output line directives that are legal input to /lib/cpp.ansi, or to turn off the preprocessing phase of cc. The first solution could be done using a sed command like: sed 's/^# *\([0-9]\)/#line \1/' to convert the offending line directives, but it is probably better to use the second solution. Modify postcpp_cc as follows: #!/bin/sh shift case $POSTCPP_CC in "") /bin/cc -tp,${LINDA_PATH}bin/lcpp $* ;; *) $POSTCPP_CC -tp,${LINDA_PATH}bin/lcpp $* ;; esac This uses the cc -t option to change the preprocessor to be a shell script that you should create in the bin directory of the Linda distribution. The shell script could be simply: #!/bin/sh /bin/cp $1 $2 Note that these changes don't actually force ANSI C mode, but merely allow the use of the -Aa option without problems. These changes can be made in the Linda distribution without forcing all users to use ANSI C. Now ANSI C mode can be turned on by setting the environment variable CCOPTS (defined by the HP C compiler) to -Aa, as in: % setenv CCOPTS -Aa Alternatively, the shell scripts could be modified to include the -Aa option, as in: #!/bin/sh shift /bin/cc -Aa -tp,${LINDA_PATH}bin/lcpp $* to always use ANSI mode. Also, on the HP, linda_cpp uses /lib/cpp by default. You could run into problems unless linda_cpp is modified to use /lib/cpp.ansi. This could be done by modifying linda_cpp, as described in answer 3.6. 
Perhaps a better solution is to change linda_cpp to use cc with the -E option, as follows: #!/bin/sh input=$1 output=$2 shift shift if [ $# -eq 0 ] then /bin/cc -E -I${LINDA_PATH}lib $input > $output else /bin/cc -E -I${LINDA_PATH}lib "$@" $input > $output fi Now, the CCOPTS environment variable can be used to turn on ANSI C mode during preprocessing, as well as compiling. ---------------------------------------------------------------------- Subject: 3.9) How do I compile and link C modules with my C-Linda application? You can just compile .c files with your C compiler, and then link the .o file with the rest of your C-Linda application using clc. There is no reason to rename a pure C file to have a .cl suffix so it can be compiled with clc. ---------------------------------------------------------------------- Subject: 3.10) How do I compile and link Fortran modules with my Fortran-Linda application? You can just compile .f files with your Fortran compiler, and then link the .o file with the rest of your Fortran-Linda application using flc. There is no reason to rename a pure Fortran file to have a .fl suffix so it can be compiled with flc. ---------------------------------------------------------------------- Subject: 3.11) What is the difference between the "normal" and "underscore" versions of Linda? Why do some architectures have both versions, while others don't? The answer has to do with conventions for interlanguage calling between C and Fortran. Usually, C transforms symbols by prepending an "_", so that the symbol "foo" in your program becomes _foo to the loader. The reason for this is to reserve symbol names without underscores for internal use. Fortran, on the other hand, usually prepends *and* appends an underscore, so "foo" becomes "_foo_". Since each language is internally consistent, this only becomes an issue when C calls Fortran or vice versa. 
In that case, all C routines called by Fortran must be named with an appended "_", and all calls by C of Fortran routines must also append an "_". For example: C: foo_() { bar_() ..... } Fortran: subroutine bar begin call foo .... end However, not all Fortran compilers follow this convention, in particular AIX and HPUX compilers. In these compilers, Fortran symbols do *not* get the appended "_", i.e. they are identical to C symbols. However, in both of these compilers, there is a flag that can be set that forces the "_" convention to be followed. This is nice, because it allows for a consistent and portable C/Fortran interface. Thus, programmers who are mixing C and Fortran can choose either method: using underscores, which is more portable, or not using underscores, which is more aesthetically pleasing. Unfortunately, this choice has an impact on the underlying Linda library, which has C/Fortran interfaces of its own. For that reason, we build two separate systems on machines that present this choice. The default system corresponds to the default behavior of the Fortran compiler, i.e. no underscore. When using the underscore version, *all* Fortran files compiled directly (i.e. not passed to flc) must be compiled with the appropriate flag so that their symbols have "_" appended. In AIX, use "-qextname"; in HPUX, use "+ppu". Any .f files passed to flc will automatically be compiled with the appropriate flag. Maintaining two different systems is admittedly cumbersome. In the future, we plan to support both styles with a single system. ---------------------------------------------------------------------- Subject: 4) TOPIC: LANGUAGE ISSUES ---------------------------------------------------------------------- Subject: 4.1) What restrictions are there on the types of functions that can be eval'd? The return value of the function must be a scalar type. Aggregate types are not allowed. 
----------------------------------------------------------------------

Subject: 4.2) What restrictions are there on the types of parameters that can be passed to eval'd functions?

Parameters to eval'd functions must be scalar types, with a limit of 16 parameters.

----------------------------------------------------------------------

Subject: 4.3) What restrictions are there on the values that can be put into a tuple?

The only restriction is that pointer values can't be put into tuples. Values pointed to by pointers can be put into tuples, as in the example

    struct x *p;
    ...
    out("foo", p:sizeof(struct x));

Normal structures and arrays can easily be put into tuples, for example

    struct x a;
    double b[100];
    ...
    out("bar", a, b);

----------------------------------------------------------------------

Subject: 4.4) Is the use of inp and rdp recommended?

These operations should be used with caution, as they are more easily abused than other operations. They are timing dependent, and can lead to race conditions. This is particularly true since outs execute asynchronously in Network Linda. Usually, thinking about the problem will lead to a better solution using in or rd. Nevertheless, they can be useful at times. The important thing is not to draw any strong conclusions from an inp or rdp failing unless you really know what you are doing.

----------------------------------------------------------------------

Subject: 5) TOPIC: TUPLE SCOPE AND DEBUGGING

----------------------------------------------------------------------

Subject: 5.1) How do I debug CDS programs with dbx?

First, compile the program with the -g option, and link with -g and -linda tuple_scope. When executing the program with tuple scope, click the middle mouse button on the icon of the process you wish to debug. Tuple scope will bring up an xterm with a dbx session attached to that process. (On HP/UX, it will use hpterm and xdb.)
The DEBUG environment variable can be set to use a different debugger, for example, gdb. This requires that the system support attaching debuggers to running processes. This is supported on all Linda systems except DEC workstations (both DECstations and Alphas). ---------------------------------------------------------------------- Subject: 5.2) How do I debug Network Linda programs with dbx? Use the -debug option with ntsnet. Make sure you compiled and linked your program with the -g option. Ntsnet will bring up a dbx session in an xterm for each Linda process. An alias, lrun, is used to start each process executing with the correct arguments. This is currently not supported on HPs. ---------------------------------------------------------------------- Subject: 5.3) What is the difference between tuple scope and post mortem tuple scope? CDS Linda supports the true, runtime tuple scope, which allows the execution of a Linda program to be monitored and controlled. It also allows a debugger, such as dbx, to be attached to a process as it executes. Network Linda only supports post mortem tuple scope which allows the execution to be viewed after the program itself has finished running. ---------------------------------------------------------------------- Subject: 5.4) Can I use the CDS with tuplescope to do performance fine-tuning? Not really. There are two major problems in trying to determine performance with CDS tuplescope. First, tuplescope has a large impact on performance, since each tuple operation incurs a lot of X-windows support overhead. This overhead swamps the cost of the Linda operations. Secondly, the very nature of the CDS makes it hard to draw conclusions about performance, since the CDS timeshares several concurrent processes on a single CPU (see 7.7 for a discussion of the CDS implementation). Parallel performance tuning cannot generally be done in this context. 
For performance tuning in Network Linda, we recommend using the ParaGraph profiling tool.

----------------------------------------------------------------------

Subject: 5.5) What are aggregate fields, and how do I display them with the tuplescope?

Aggregate fields are tuple fields that are not simple scalar values. In C, they are arrays, unions, and structs; in Fortran, arrays and common blocks. Internally in tuplespace, aggregates are represented as simple byte arrays. However, tuplescope allows aggregates to be interpreted as arrays of all of the normal scalar types (e.g. int, long, float, etc.). Note that display of aggregates is currently only possible using tuplescope with shared-memory versions of Linda, including the Code Development System.

First, after starting the program, but before clicking on the RUN button, use the MODES menu to select ``Display Aggregates'' and ``Dynamic Tuple Fetch''. The second option causes the aggregate fields to be dynamically formatted when viewed, rather than when produced. Click the ``Run'' button and allow the program to produce some tuples. Now, when viewing a tuple, you will see the contents of aggregate fields instead of just the word ``block''.

The default viewing type is array of ints. To change this, open the aggregate menu, click to deselect ``Int'', and select some other type. If you selected ``Dynamic Tuple Fetch'' before running the program, the Aggregate format selection will affect existing tuples. If not, it will only affect tuples subsequently produced.

----------------------------------------------------------------------

Subject: 6) TOPIC: RUNNING NETWORK LINDA PROGRAMS

----------------------------------------------------------------------

Subject: 6.1) Why do I get the message:
    "ntsnet: WARNING: ping may not be a valid Network Linda executable"

Ntsnet checks the local executable file for a magic string that should be present in all Network Linda executable files.
It warns you if it isn't there, but executes it anyway. If it doesn't work, it may be because it really isn't a Network Linda executable. One common reason for this is that the executable is really a CDS executable, in which case you should see a bunch of messages like:

    Linda initializing (2000 blocks).
    Linda initialization complete.

Another reason for the warning message is if you are wrapping your Network Linda executable in a shell script, a trick that used to be necessary to run different executables on different nodes. If that is the case, you can ignore the message, as long as the shell script is written properly.

You can try testing the file yourself, using the command:

    example% strings ping | grep linda_version
    %__linda_version_tsnet_v2.5.2

Another quick way to find out if a file is a Network Linda executable is to run it without ntsnet. You should see the messages:

    ping: Network linda executable missing +LARGS argument, aborting.
    ping: Use +LARGS and linda arguments if starting by hand,
    ping: or start the executable using the ntsnet utility.

If you see

    Linda initializing (2000 blocks).
    Linda initialization complete.

then it's a CDS Linda executable, as mentioned above.

----------------------------------------------------------------------

Subject: 6.2) Why do I get the messages:
    "Permission denied."
    "ntsnet: too many workers exited to continue"
    "ntsnet: needed: 1, started: 1, died: 1"

You are unable to rsh to a node in your nodelist. The rsh fails, and ntsnet aborts since it isn't able to get enough workers to satisfy its requirements.

You may also just see the "Permission denied." message, and the program runs fine. That is because ntsnet was able to get enough workers, even though it didn't get all that it started. By default, ntsnet only needs to get one worker, although the -n option can change this.
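
Before starting a run, it can help to confirm that rsh works to every node you intend to use. The following is a minimal sketch in POSIX shell, not part of the Linda distribution: the function name check_nodes, the RSH variable, and the node names are assumptions made for illustration. It runs a trivial remote command on each node and reports the nodes on which the remote shell command fails. It relies on rsh returning a nonzero status when the connection is refused, which is the usual behavior but should be checked on your system.

```shell
#!/bin/sh
# Hypothetical helper (not part of Linda): report nodes that refuse rsh.
# RSH names the remote-shell command; it defaults to rsh and can be
# overridden, for example to try the function with a stub command.
RSH=${RSH:-rsh}

check_nodes() {
    failed=0
    for node in "$@"
    do
        # Run a trivial command remotely; stdin and output are discarded.
        if $RSH "$node" true </dev/null >/dev/null 2>&1
        then
            echo "$node: ok"
        else
            echo "$node: rsh failed"
            failed=1
        fi
    done
    # Nonzero if at least one node could not be reached.
    return $failed
}

# Example (node names are placeholders):
#   check_nodes maple oak
```

Any node reported as failed here is a likely source of the "Permission denied." message when ntsnet tries to start a worker on it.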
---------------------------------------------------------------------- Subject: 6.3) Why do I get the messages: "rsh: shell/tcp: unknown service" "ntsnet: too many workers exited to continue" "ntsnet: needed: 1, started: 1, died: 1" This is a variant of the previous answer. In this case, rsh is failing to execute getservbyname(3), perhaps due to an overloaded NIS server. Use of the ntsnet -delay option may help this problem by decreasing the rate at which it forks rsh processes, but you may just need to give ntsnet more nodes to choose from. Ultimately, your system administrator may have to reconfigure your system to eliminate this problem. ---------------------------------------------------------------------- Subject: 6.4) Why do I get the message: "stty: TCGETS: Operation not supported on socket" This is one version of a classic rsh problem. The problem is that the user's .cshrc file has an stty command that fails when rsh is used (since rsh doesn't use a pseudo terminal). Ntsnet starts up the workers on each remote node with the rsh command, by default. The standard solution is to use something like: if ($?prompt == 0) then exit endif This could be put right at the beginning of .cshrc, but must be put before the stty commands. Other commands, such as biff, only work on interactive runs, giving different error messages. ---------------------------------------------------------------------- Subject: 6.5) Why do I get the message: "Linda Error: node maple(15): keepalive failure" The error message isn't as informative as it could be. What happened is that node maple noticed that another node was not responding to what we call "keep alive" messages. If a node isn't able to respond to keep alive messages, it is probably in some bad state (perhaps due to NFS problems?) that could cause the whole Linda program to hang. 
So rather than have you run for another couple of days before you get suspicious enough to abort the run, maple sounded the alarm, exited, and ntsnet shut the program down. In some cases, it might be useful to increase the keep alive period. This can be done as in the example: % ntsnet -kainterval 400 ping 100 This might be useful for long running jobs on some networks. There is a discussion of keep alive messages on page 4-23 of the C-Linda User's Guide. ---------------------------------------------------------------------- Subject: 6.6) Why do I get the messages: "ntsnet: warning: rup rpc failed on oak: Program not registered" "ntsnet: using fallback load: 0.990000" By default, ntsnet uses a remote procedure call (rpc) to the rstatd daemon to determine the load average of the remote machines. The rpc fails with this message if rstatd is not running on one of the remote machines. Many machines don't enable rstatd by default, and some machines don't support it. It can usually be enabled by uncommenting the appropriate line in /etc/inetd.conf and reinitializing inetd by sending it a SIGHUP. Ask your system administrator if this can be done. The message can be avoided by telling ntsnet not to get the load averages of remote machines. This can be done by setting the getload resource to false in the tsnet.config file, or, equivalently, by using the ntsnet +getload command line option. ---------------------------------------------------------------------- Subject: 6.7) Why do I get the message: "More evals than processors - deadlock could occur" Ntsnet is used to start a fixed number of eval servers - processes that handle eval requests. These servers only handle one request at a time, therefore, a backlog of eval requests can occur if more evals are executed than there are eval servers. Deadlock can occur if the program is written to assume that all eval'd processes are executing concurrently. 
If deadlock doesn't occur, all eval requests will eventually be serviced by an eval server that has finished processing a previous eval request.

----------------------------------------------------------------------
Subject: 6.8) How do I run my Network Linda program on a heterogeneous network?

Ntsnet has many features that support executing Network Linda programs on heterogeneous networks. The suffixstring resource (specified in the configuration file) can be used to tell ntsnet to use a different executable file on different machines. For example, with the configuration file

    ntsnet.Appl.hp1.suffixstring: .hp
    ntsnet.Appl.hp2.suffixstring: .hp
    ntsnet.Appl.mysparc.suffixstring: .sparc
    ntsnet.Appl.myrs6k.suffixstring: .rs6k

the command

    % ntsnet ping 100

will cause ntsnet to use three different executables, ping.hp, ping.sparc, and ping.rs6k on the four different nodes.

Also, map files can be used to equivalence a set of directories in such a way that a different directory is used for each platform. For example, with the map file

    map /usr/bin/linda {
        hp1 hp2 : /usr/bin/linda/hp;
        mysparc : /usr/bin/linda/sparc;
        myrs6k : /usr/bin/linda/rs6k;
    }

the command

    % ntsnet /usr/bin/linda/hp/ping 100

executed on hp1 will cause ntsnet to use three different executables, /usr/bin/linda/hp/ping, /usr/bin/linda/sparc/ping, and /usr/bin/linda/rs6k/ping on the four different nodes.

----------------------------------------------------------------------
Subject: 6.9) Can I execute Network Linda programs without using the rshd daemon?

Yes, ntsnet uses the linda_rsh shell script to insulate it from the actual command used for remote execution. Linda_rsh can be modified by the user, but the supplied version supports both "rsh" and "on". Ntsnet passes the value of the lindarsharg resource to linda_rsh for each remote process, so either "rsh" or "on" can be used.
For example, with the configuration file

    ntsnet.Node.lindarsharg: on
    ntsnet.mydec.lindarsharg: rsh

ntsnet will cause linda_rsh to use "on" for all nodes except "mydec", since DECstations don't support "on".

If linda_rsh is modified by the user to support another remote execution command, the lindarsharg resource can still be used to let linda_rsh choose what command to use. Ntsnet just passes the appropriate value of the lindarsharg resource for each node.

----------------------------------------------------------------------
Subject: 6.10) Why do I get the message:
    "ntsnet: shutting down with return code 9"
when my Network Linda program finishes?

Your real_main function returned a value other than zero, either explicitly or implicitly. The real_main function is defined to return an integer value. If there is no return statement in real_main, an undefined value is returned to ntsnet, and ntsnet reports it.

----------------------------------------------------------------------
Subject: 6.11) How can I tell ntsnet exactly how many processes to schedule on each node?

Try putting something like the following in your tsnet.config file:

    ! These settings are for "manual mode" scheduling.
    ! The speedfactor and minworkers values are necessary default values.
    ntsnet.Appl.getload: False
    ntsnet.Appl.maxprocspernode: 1000000
    ntsnet.Node.speedfactor: 1.0
    ntsnet.Appl.maxworkers: 1000000
    ntsnet.Appl.minworkers: 1
    ! These settings reflect the desire to not count the master
    ! process, and to run one worker per node.
    ntsnet.Appl.masterload: 0.0
    ntsnet.Node.threshold: 1.0

You can now use the threshold resource to control how many workers are scheduled on a given node. For example, if you want to schedule three processes on node "frank", just add the line:

    ntsnet.frank.threshold: 3.0

If you decide that you want to include the master process in the count, just remove the line that sets masterload to 0.0 (it defaults to 1.0).
Note that by setting maxworkers to a million, ntsnet schedules as many processes as it can on each node. The threshold acts as the limit. With minworkers set to one, ntsnet doesn't consider it an error to schedule only, say, ten processes rather than one million.

----------------------------------------------------------------------
Subject: 6.12) How can I tell ntsnet to run one eval server on every node, including the local node (so that the local node executes both the realmain() process and a worker process)?

It can be done using the same basic technique described in the previous answer. Try putting something like the following in your tsnet.config file:

    ntsnet.Appl.getload: False
    ntsnet.Appl.maxprocspernode: 2
    ntsnet.Node.speedfactor: 1.0
    ntsnet.Appl.maxworkers: 1000000
    ntsnet.Appl.minworkers: 1
    ntsnet.Appl.masterload: 0.0
    ntsnet.Node.threshold: 1.0

----------------------------------------------------------------------
Subject: 6.13) Why do I get the message:
    "ntsnet: WARNING: ping appears to be incompatible with ntsnet"

This means that your Network Linda program was built with a different version of Network Linda than the one that you're using to execute it. This is only a warning, but if your program doesn't work properly, this is probably the reason.

----------------------------------------------------------------------
Subject: 6.14) Why does my program take so long to start executing?

This is usually because rsh/rshd is taking a long time. Rshd is usually slow because it reads and executes the user's .cshrc file on the remote machine. It is generally a good idea to modify the .cshrc file to not do very much when invoked via rsh. This is described in Subject 6.4. Basically, put something like

    if ($?prompt == 0) then
        exit
    endif

near the beginning of .cshrc (probably after setting path). This can sometimes avoid extra work and speed execution.
However, the real problem may be that it is taking a very long time for rshd to even start reading .cshrc, due to the way your home directory is configured. For example, if your home directory is auto mounted by remote machines on your network, when you start executing your Network Linda program, all the remote machines will have to mount the exported partitions of your local machine.

One solution is to hard mount, rather than auto mount, your home directory on the remote machines. Another solution is to make the home directory local on each of the remote machines. Your home directory can still contain symbolic links to common directories, but if .cshrc is local to each remote machine, start up time for your Network Linda program can be much faster.

It is a good idea to test the speed of rsh with a simple example, such as

    example% rsh remotenode date

Once rsh runs faster, there may still be a problem due to the Network Linda executable not being local to the remote machines. The best thing is to distribute the executable to a local directory on all the remote machines once, and then use ntsnet to execute it from then on. This makes sense for production use, in particular.

There are many ways to distribute the executable. You can execute rcp directly, or use ftp. You can also use the ntsnet -distribute option with +cleanup to distribute the executable, but not delete it afterwards. This distribution scheme assumes that the executable is in a directory that has the same path on all machines, but is local on all machines. A map file can be used to make different path names equivalent.

You could try experiments with /tmp (which is usually local to each machine) to see if this is a reason for slow start up.

----------------------------------------------------------------------
Subject: 6.15) How do I set environment variables for remote Linda processes?
There are various ways that this can be done, but the available methods depend on the remote execution mechanism that you're using. For instance, if you're using "on" (by setting the lindarsharg resource in your configuration file), then the local environment is automatically exported to remote processes. Rsh doesn't do that, so another mechanism is necessary.

If csh is the default shell on a remote node, you can set environment variables in your remote .cshrc files, which are sourced by csh before executing the Linda program. If you're using sh or ksh, this isn't possible, since .profile isn't sourced for non-interactive execution. In that case, you may need to create a modified version of linda_rsh to achieve the desired effect.

For example, suppose you want to set an environment variable (DISPLAY, in this example) to different values on different remote nodes. One way to do this is to add a few lines to linda_rsh, as follows:

      *)
          case "$rsh_arg" in
          on)
              exec /usr/bin/on -n $host "$@"
              ;;
    +     DISPLAY=*) exec /usr/ucb/rsh $host $user -n $rsh_arg "$@"
    +         ;;
          *)
              exec /usr/ucb/rsh $host $user -n "$@"
              ;;
          esac
          ;;

The lines prefixed by "+" are the new lines. Note that the path for rsh and on varies on different platforms, so don't use this verbatim. Now add the following lines to ~/.tsnet.config:

    ntsnet.frank.lindarsharg: DISPLAY=biff:0
    ntsnet.joe.lindarsharg: DISPLAY=chet:0
    ntsnet.Node.lindarsharg: DISPLAY=junk:0

Now, when ntsnet executes linda_rsh to start a process on node frank, it will include the option "-r DISPLAY=biff:0", and on node joe, it will include the option "-r DISPLAY=chet:0". In both cases, the new line in linda_rsh will be used to execute the remote process, and DISPLAY will be set as specified in the remote processes.

Another method is to wrap the Linda program in a shell script that determines what node it is running on, sets any environment variables, and then execs the Linda program.
The shell script could look something like:

    #!/bin/sh
    case `hostname` in
    frank*) DISPLAY=biff:0 ;;
    joe*)   DISPLAY=chet:0 ;;
    *)      DISPLAY=junk:0 ;;
    esac
    export DISPLAY
    exec /usr/linda/bin/foo "$@"

Note that using this method, you will see warning messages from ntsnet that you may not be executing a valid Network Linda program. This is correct, since ntsnet only sees the shell script, which isn't a Network Linda program. You can safely ignore the warnings.

We suggest that you use the ntsnet -vv option when debugging these kinds of changes. See sections 6.4, 6.9, and 6.14 for more information on the topics of rsh and lindarsharg.

----------------------------------------------------------------------
Subject: 6.16) Why do I get the message:

    % ntsnet -n 3 suite
    Internal Error... Opening passwd file in parse passwd file
    Contact Customer Service at Scientific Computing Associates
    TEL 203-777-7442 FAX 203-776-4074 EMAIL lsupport@sca.com
    ntsnet: master process exited with return value 1

when trying to run my Linda program?

Check the permissions on the Linda license file /lib/linda.lcn to be sure that it is readable by you.

----------------------------------------------------------------------
Subject: 6.17) Why do I get the message:

    % ntsnet -n 2 -suffix ping
    Linda Error: node sol1 (-1): hostname not found in configuration file
    ntsnet: worker on node sol1.sca.com exited abnormally
    % ls
    cl-examples ping.cl ping.sun ping.sol
    % more ~/.tsnet.config
    Tsnet.Appl.nodelist: sun1 sun2 sol1
    Tsnet.Appl.Node.suffixstring: .sun
    Tsnet.Appl.sol1.suffixstring: .sol

when trying to run my Linda program heterogeneously?

This can occur if your Network Linda executables were built with different versions of the Linda compiler. To run heterogeneously, all Linda executables need to be compiled with the same version of the clc compiler.
----------------------------------------------------------------------
Subject: 7) TOPIC: RUNNING CDS LINDA PROGRAMS

----------------------------------------------------------------------
Subject: 7.1) Why do I get the message:

    ping: Network linda executable missing +LARGS argument, aborting.
    ping: Use +LARGS and linda arguments if starting by hand,
    ping: or start the executable using the ntsnet utility.

when trying to run my C-Linda program?

This message indicates that you're trying to execute a Network Linda program. Set the LINDA_CLC or LINDA_FLC environment variable to CDS, and relink the program.

----------------------------------------------------------------------
Subject: 7.2) Why do I get the message:
    "Linda Error: out of tb's"
when trying to run my Linda program under CDS?

It means that your program is creating more tuples than it has allocated shared memory. The default size is 2000 tuple blocks (tb's), where a tb is about 280 bytes. To double the number of tb's (to a little over a megabyte), relink your program using a command like:

    % clc -linda ts 4000 -o foo foo.cl

If you try to allocate too much shared memory, you'll get the message:

    linda init: cannot create shared region.

Then you have to either decrease the number of tb's, or reconfigure your Unix kernel for more shared memory. To allow for less shared memory, you may have to modify your Linda program to use water marking, for example.

----------------------------------------------------------------------
Subject: 7.3) How large is a CDS tuple block?

A tuple block is 280 bytes.

----------------------------------------------------------------------
Subject: 7.4) Why do I get the message:
    "linda init: cannot allocate semaphores."

CDS Linda is implemented using System V message queues, semaphores, and shared memory. If a CDS Linda program aborts abnormally, under certain circumstances, it will not be able to deallocate those resources.
When you see this error message, use the "ipcs" command to see if your user account has resources allocated, and then "ipcrm" to remove resources that are left over from aborted Linda runs. The following shell script (called ipclean in the session below) builds an ipcrm command that can be executed using the eval command after manual verification:

    #!/bin/sh
    ME=`whoami`
    ipcs | awk '
    BEGIN {
        me = "'$ME'"
        printf("ipcrm")
    }
    /^[qms][ \t]/ {
        if (me == $5) {
            printf(" -%s %d", $1, $2)
        }
    }
    END {
        printf("\n");
    }
    '

An example session with ipclean could go:

    example% ipcs
    IPC status from pandora as of Wed Aug 10 09:33:01 1994
    T     ID     KEY        MODE       OWNER    GROUP
    Message Queues:
    q   3650 0x00000000 -Rrw-------   weston    linda
    Shared Memory:
    m   7300 0x00000000 --rw-------   weston    linda
    Semaphores:
    s    730 0x00000000 --ra-------   weston    linda
    example% ipclean
    ipcrm -q 3650 -m 7300 -s 730
    example% eval `ipclean`
    example% ipcs
    IPC status from pandora as of Wed Aug 10 09:35:13 1994
    T     ID     KEY        MODE       OWNER    GROUP
    Message Queues:
    Shared Memory:
    Semaphores:

----------------------------------------------------------------------
Subject: 7.5) Why do I get the message:
    "linda init: cannot allocate msg structure."

See the answer to subject 7.4.

----------------------------------------------------------------------
Subject: 7.6) Why do I get the message:
    "linda init: cannot create shared region."

Either you've had a lot of CDS Linda programs abort abnormally, or your machine isn't configured with enough shared memory. You can use the "ipcs" command to determine the status of System V IPC resource usage. You can use "ipcrm" to clean up after aborted CDS Linda programs, as described in the answer to subject 7.4. If that isn't the case, you have to either decrease the amount of shared memory that your CDS Linda program uses, or reconfigure your Unix kernel for more shared memory. The clc/flc "-linda ts " option can be used to change the amount of shared memory that is allocated to run your program. See subject 7.2 for more information on this subject.
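When choosing a value for the "-linda ts" option, it can help to estimate how much shared memory a given number of tuple blocks implies. The following is only a rough sketch: it assumes the 280-byte tuple block size quoted in subject 7.3, ignores any additional overhead the runtime may add, and the function name tb_bytes is made up for this illustration.

```shell
#!/bin/sh
# Rough estimate of CDS shared memory for a given number of tuple blocks.
# Assumption: 280 bytes per tuple block (subject 7.3); real usage may be
# somewhat higher because bookkeeping overhead is not counted here.
TB_BYTES=280

tb_bytes() {
    # $1 = number of tuple blocks, as passed to "-linda ts"
    echo $(( $1 * TB_BYTES ))
}

tb_bytes 2000   # the default number of tuple blocks
tb_bytes 4000   # the doubled value used in subject 7.2
```

Comparing the result against the shared memory limit configured in your kernel gives a quick indication of whether a larger "-linda ts" value is likely to fail with "cannot create shared region".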
----------------------------------------------------------------------
Subject: 7.7) How does CDS Linda emulate parallel processing, since it runs on a single workstation?

The CDS is designed to provide a reasonable emulation of parallel processing, using the natural concurrency of a timesharing operating system. Programs running using the CDS are truly concurrent, because each eval is implemented with its own process. The programs are not parallel, because only one of them is executing at any instant in time. Tuplespace is implemented in a shared-memory segment using semaphores to control access.

There are two main points of concern when comparing the CDS to true, parallel versions of Linda: process environment and interleaving.

First, it is important that each eval in the CDS have the same environment as it would have in any other implementation of Linda, e.g. on a network. Fork *is* used to create new processes for evals in the CDS, but a clean version of the process (called the cloner process) is used. This prevents contamination of the child process by any of the state built up by the evaling process, and allows the eval semantics to be the same for CDS Linda and (non-shared memory) parallel versions of Linda.

Secondly, it is important that the individual processes in a Linda execution on the CDS be interleaved arbitrarily. This is accomplished reasonably well by the natural interleaving of the timesharing scheduler.

----------------------------------------------------------------------
Subject: 8) TOPIC: PROGRAMMING IN NETWORK LINDA

----------------------------------------------------------------------
Subject: 8.1) Is there a way to find out how many processors are available to avoid evaling more Linda processes than processors?

Use the lprocs() function in C-Linda, and the flprocs() function in Fortran-Linda. The value returned includes the real_main process.
Thus, the value is >= 1, and it is safe to execute (lprocs() - 1) evals, as in the example

    for (i = 0; i < lprocs() - 1; i++)
        eval("worker", worker(i));

Note that your program may need to check if lprocs() returns one, if it needs to execute at least one eval to work properly.

----------------------------------------------------------------------
Subject: 9) TOPIC: PROFILING LINDA PROGRAMS

----------------------------------------------------------------------
Subject: 9.1) How can I profile my Linda program to find out how to make it run faster?

Profiling is supported by Network Linda. Link the program using the "-linda profile" clc option. Then, run the program as usual using ntsnet. This will generate one Linda trace file for each node. These files must be postprocessed by the pgtrace command to produce a single ParaGraph trace file. The ParaGraph trace file can then be viewed using PG. For example,

    % clc -o prime -linda profile prime.cl
    % ntsnet -n 4 prime 4 1000000 10000
    % pgtrace prime*.ltr > prime.trf
    % PG prime.trf

----------------------------------------------------------------------
Subject: 9.2) What ParaGraph displays are useful to use when viewing trace files produced by Linda?

Most displays can be useful, except for the "Tasks" display, which is not supported (there is no means of defining tasks in Linda). Other utilization and communication displays, such as Gantt, Kiviat, Spacetime, and Animation, can all be useful for characterizing the performance of a Linda program.

----------------------------------------------------------------------
Subject: 10) TOPIC: PROGRAMMING IN C-LINDA

----------------------------------------------------------------------
Subject: 10.1) What are useful sources of information on programming in Linda?

The C-Linda User's Guide contains several case studies, and SCIENTIFIC also has reprints of Linda case studies.
The book "How to Write Parallel Programs: A First Course", by Nicholas Carriero and David Gelernter, published by The MIT Press, is another good source of information on Linda programming.

----------------------------------------------------------------------
Subject: 10.2) How can I implement a barrier in Linda?

Very often barriers are overkill, but when one is necessary, it is easy to implement. One example is:

    void
    barrier(i, n)
    int i;    /* My linda id */
    int n;    /* Number of linda processes in the barrier */
    {
        int direction;

        if (n > 1) {
            if (i == 0) {
                out("barrier", i + 1, 1);
                in("barrier", i, ? direction);
            } else if (i == n - 1) {
                out("barrier", i - 1, -1);
                in("barrier", i, ? direction);
            } else {
                in("barrier", i, ? direction);
                out("barrier", i + direction, direction);
                in("barrier", i, ? direction);
                out("barrier", i + direction, direction);
            }
        }
    }

No initialization or clean up is required for this routine, and it can be executed any number of times.

----------------------------------------------------------------------
Subject: 11) TOPIC: NTSNET

----------------------------------------------------------------------
Subject: 11.1) How does ntsnet find the local executable?

The ntsnet command line must always specify a Linda command name. This command name can optionally be prefixed with a directory specification. If it is, then the executable must be in the specified directory. Otherwise, ntsnet uses the TSNET_PATH environment variable to find the executable.

When searching for the executable, ntsnet appends the suffix string (if used) to the command name. For example, if the local node uses the suffix string ".sparc", then the command

    % cd /tmp
    % ntsnet ./suite

will cause ntsnet to look for the file "/tmp/suite.sparc". If it doesn't exist, ntsnet will print an error message. Without a directory prefix, as in

    % setenv TSNET_PATH /usr/bin/linda:.
    % cd /tmp
    % ntsnet suite

ntsnet will first look for "/usr/bin/linda/suite.sparc", and if it is not there, it will look for "/tmp/suite.sparc".
Note that if TSNET_PATH is not defined, ntsnet uses a default value of "/usr/bin/linda:.". So in this case, the setenv command is not necessary, but is used for clarity.

Also, note that the rexecdir resource and the -p option have no effect on where ntsnet looks for the local executable.

----------------------------------------------------------------------
Subject: 11.2) How does ntsnet find executables to distribute?

If distribution is enabled, they must be in the same directory as the local executable. They must have the appropriate suffix string. For example, if your local machine is a Sparc, but you also run on RS/6000s and HP9000s:

    % setenv TSNET_PATH /usr/bin/linda:.
    % cd /tmp
    % ls /usr/bin/linda
    suite.hp9000 suite.rs6000 suite.sparc
    % ntsnet -d suite

Ntsnet finds the local executable in /usr/bin/linda, and then looks in the same directory for the files "suite.hp9000" and "suite.rs6000". It will copy "/usr/bin/linda/suite.hp9000" to the appropriate directory on all nodes that use the ".hp9000" suffix string, and "/usr/bin/linda/suite.rs6000" to the appropriate directory on all nodes that use the ".rs6000" suffix string.

----------------------------------------------------------------------
Subject: 11.3) To what directory does ntsnet copy executables on remote machines?

If the -d option is used, ntsnet will distribute an executable to each remote node at the beginning of the run and remove them at the end.

Ntsnet normally uses the rexecdir resource in the configuration file to determine where to copy an executable. The default and recommended value of rexecdir is "Parallel". This means that ntsnet copies the executable to the remote directory that is "parallel" (or equivalent) to the local executable directory. This parallel directory is determined by using the ntsnet map file to translate the name of the local executable directory.
Very often, no translation is needed, so for the previous example, ntsnet would copy the executables to /usr/bin/linda on the remote machines. The map files are helpful if different nodes have dissimilar directory structures.

If rexecdir is set to something other than "Parallel", it must be the name of a directory on the remote machine. Ntsnet will copy the executable to that directory. The rexecdir resource is node specific, so it can be set to a different directory for each node. This provides a simpler, though less powerful, mechanism for copying executables to different directories on different nodes than the use of map files. It is also useful if you want to set up one directory on each remote node that will contain all executables. For example,

    % cd ~/work
    % ntsnet -d -opt 'ntsnet.suite.Node.rexecdir: /tmp' /usr/bin/linda/suite

Ntsnet finds suite.sparc in /usr/bin/linda, and then copies "/usr/bin/linda/suite.hp9000" to "/tmp/suite.hp9000" on every node that has the ".hp9000" suffix string.

If the -p option is used, the rexecdir resource is ignored. The executable is copied to the directory that is parallel to the specified directory. So for the command

    % ntsnet -d -p /tmp /usr/bin/linda/suite

ntsnet will copy "/usr/bin/linda/suite.hp9000" to "/tmp/suite.hp9000", assuming that a map file doesn't cause any translation. (This can be assured by the use of the +translate option to turn off path translation.)

Note that the -p option also sets the remote working directory and is provided primarily for compatibility with the previous tsnet utility. It is not recommended for normal use, but can be useful when there isn't a good map file set up for your network.

Note that with the previous tsnet utility, it was very common to use the command

    % tsnet -p $cwd suite

when working in a directory that can be accessed via NFS by all nodes in a homogeneous network. The -p option is not needed for this case with ntsnet.
This example only worked with tsnet if "$cwd" evaluated to a directory name that all remote nodes understood, and suite was in that directory. With ntsnet, if suite is in the current working directory, you can use the command

    % ntsnet suite

and, by default, ntsnet will expect suite to be in the directory of the same name on the remote nodes.

----------------------------------------------------------------------
Subject: 11.4) In what directory does ntsnet expect to find executables if they are not distributed?

Ntsnet uses exactly the same mechanism as for determining where to copy them. For example,

    % setenv TSNET_PATH /usr/bin/linda:.
    % cd /tmp
    % ls /usr/bin/linda
    suite.sparc
    % rsh myhp9000 ls /usr/bin/linda
    suite.hp9000
    % rsh myrs6000 ls /usr/bin/linda
    suite.rs6000
    % ntsnet -nodelist "myhp9000 myrs6000" suite

will result in ntsnet executing "/usr/bin/linda/suite.sparc" on the local node, "/usr/bin/linda/suite.hp9000" on myhp9000, and "/usr/bin/linda/suite.rs6000" on myrs6000.

A map file can be used to map "/usr/bin/linda" to a different name. If suite.hp9000 was installed in "/usr/homes/frank/linda" on myhp9000, a map file such as

    map /usr/bin/linda {
        myhp9000 : /usr/homes/frank/linda;
    }

would tell ntsnet that /usr/homes/frank/linda on myhp9000 is equivalent to /usr/bin/linda on all other nodes.

----------------------------------------------------------------------
Subject: 11.5) What directory does ntsnet use as the working directory for the local Linda process?

Ntsnet always uses the user's current working directory as the working directory for the local Linda process. The rworkdir resource and the -p option have no effect on the local process.

----------------------------------------------------------------------
Subject: 11.6) What directory does ntsnet use as the working directory for the remote Linda processes?

Ntsnet uses a mechanism very similar to the mechanism used for determining the remote executable directory.
Ntsnet normally uses the rworkdir resource in the configuration file to determine the remote working directory. The default and recommended value of rworkdir is "Parallel". This means that ntsnet uses the map file to translate the local working directory for each remote node. For example,

    % cd /usr/homes/frank
    % ntsnet suite

will cause ntsnet to use the working directory "/usr/homes/frank" on all the remote nodes.

If rworkdir is set to something other than "Parallel", it must be the name of a directory on the remote machine. Ntsnet will use that directory as the remote working directory. The rworkdir resource is node specific, so it can be set to a different directory for each node. The commands

    % cd /usr/homes/frank
    % ntsnet -opt 'ntsnet.suite.Node.rworkdir: /tmp' ./suite

will cause ntsnet to use "/tmp" as the working directory on all remote nodes, while the executable will be expected in /usr/homes/frank on all nodes.

If the -p option is used, the rworkdir resource is ignored. The working directory is set to the directory that is parallel to the specified directory. So for the example

    % ntsnet -d -p /tmp suite

ntsnet will use "/tmp" as the working directory on all remote nodes, assuming that a map file doesn't cause any translation.

----------------------------------------------------------------------
Subject: 11.7) What are some map file examples?

If the map file contains the entry

    mapto /net/mynode {
        mynode : / ;
    }

and the commands

    % cd /usr/homes/frank
    % ntsnet ./suite

are executed, ntsnet will translate the directory "/usr/homes/frank" to "/net/mynode/usr/homes/frank" on all remote nodes, which is useful if automounting is available.

If the Linda executable is installed in "/usr/bin/linda" on each node, then the map file should also include the entry

    mapto /usr/bin {
        mynode : /usr/bin;
    }

For example,

    % setenv TSNET_PATH /usr/bin/linda:.
    % cd /usr/homes/frank
    % ntsnet suite

Ntsnet will then use "/net/mynode/usr/homes/frank" for the remote working directory, but "/usr/bin/linda" for the remote executables. This works because the second map entry takes precedence over the first.

----------------------------------------------------------------------
Subject: 11.8) How does ntsnet determine what nodes to run on?

First, ntsnet schedules the master process to run on the local node. Then it makes a list of all nodes that are available to run Linda processes on, and schedules maxworkers worker processes to run on those nodes. The local node is always included in the list of available nodes.

The scheduling procedure can take into account the load average of each of the nodes, the relative cpu speeds, and other factors as specified in the configuration database. As a result, several processes may be scheduled on one node, while another node is unused.

Ntsnet uses the nodelist resource for determining the list of nodes that are available. This can be done as simply as

    % ntsnet -n :2 -nodelist "mynode myhp9000 myrs6000" suite

since the -nodelist option can be used to override the nodelist resource. This gives ntsnet a total of three nodes to use (since the local node is mynode). Ntsnet will attempt to schedule two workers (as specified by the -n option) in addition to the master process.

----------------------------------------------------------------------
Subject: 12) TOPIC: INSTALLING LINDA

----------------------------------------------------------------------
Subject: 12.1) Why can't I read my tape on my RS6000?

RS6000 tapedrives are installed with a non-zero default blocksize, probably 1024. This means that only tapes written with this blocksize can be read on this drive. The solution is to use smit to set the blocksize to zero. This allows tapes written with any blocksize to be read.

----------------------------------------------------------------------
Subject: 12.2) Why can't I read my tape on my SGI?
SGI tapedrives read and write tapes with the byte order reversed from many other tapedrives. The solution is to use "dd" with the "conv=swab" option to read the tape. The output from "dd" can be directed to "tar", as follows:

    % dd if= conv=swab | tar xvf -

where refers to the appropriate tape drive.

----------------------------------------------------------------------
Subject: 13) TOPIC: MISCELLANEOUS

----------------------------------------------------------------------
Subject: 13.1) How can I use Linda on multiple Shared Memory Multiprocessors?

If you want to run a single Linda job across multiple shared memory machines (e.g. several SGI Challenge machines), you have two possible options:

1) Use Network Linda. This method is simple, and allows multiple shared memory machines and single CPU workstations to work together in a simple way. The main disadvantage is that Tuplespace is not accessed via shared memory.

You will have to set some resources in the configuration file to cause multiple processes to be created on a single shared memory machine. For example, for an 8 node machine named homer:

    Tsnet.homer.maxprocspernode: 8
    Tsnet.homer.speedfactor: 8.0

You may also want to turn off the getload resource.

You will also have to increase maxworkers, since it defaults to the number of distinct nodes. This can be done on the command line or in the configuration file. For example, imagine that you want to run myapp on homer, marge, and bart, three 8-node machines.

    $ ntsnet -n 12:23 myapp

will start 23 worker processes, and require that at least 12 join. We use 23 instead of 24 because the master process adds one more. Using the configuration file:

    Tsnet.myapp.maxworkers: 23

2) Use shared memory Linda and Paradise. You can link together multiple shared memory Linda executions using Paradise operations. Each machine runs a separate Linda execution, using the hardware shared memory for its own private Tuplespace.
Global coordination is achieved via the Paradise Tuplespace. Using this method, the Linda Tuplespaces are supported via hardware shared memory, which gives better performance. The main drawback is additional programming effort for the Paradise coordination framework.

----------------------------------------------------------------------
Subject: 14) TOPIC: LICENSE SERVER

----------------------------------------------------------------------
Subject: 14.1) Can the lserv License server run on a different subnet than my Network Linda job?

You can run the License server 'lserv' process on a different subnet than your Network Linda job by using the LSHOST environment variable. Set LSHOST to the hostname of the node on which the lserv license server is running, e.g.:

    % setenv LSHOST

You can add this to your .login or .cshrc file.

When ntsnet starts a Network Linda job, it needs to contact the lserv license server to obtain the necessary license tokens. Ntsnet tries to contact lserv first by looking on the local node, and then by sending a broadcast message on the _local_ subnet. If the lserv license server is on a different subnet, the broadcast message will fail. The third method ntsnet uses to contact lserv is the LSHOST environment variable. If LSHOST is set to the hostname of the node on which lserv is running, ntsnet sends the license token request directly to that node, even if lserv is on a different subnet.
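For sh or ksh users, the csh setenv command above has to be written as an assignment followed by export. The sketch below shows that form together with a small check that reports which lookup path would apply, based only on the description in this answer. The function name lserv_lookup and the host name "licensehost" are hypothetical and used purely for illustration.

```shell
#!/bin/sh
# Report how ntsnet would be expected to reach lserv, following the
# lookup order described in subject 14.1. This only inspects LSHOST;
# it does not contact any license server.
lserv_lookup() {
    if [ -n "${LSHOST:-}" ]; then
        echo "LSHOST set: license requests can go directly to $LSHOST"
    else
        echo "LSHOST unset: only the local node and the local subnet are tried"
    fi
}

# sh/ksh equivalent of the csh setenv command; "licensehost" is a
# hypothetical placeholder, so substitute the node that runs lserv.
LSHOST=licensehost
export LSHOST
lserv_lookup
```

Placing the assignment and export in .profile would be the sh/ksh counterpart of adding setenv to .login or .cshrc, keeping in mind that (as noted in subject 6.15) .profile is not sourced for non-interactive remote execution.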