Manual of Distributed Packet Monitoring

Getting started

Example

How to log and replay the dataplane?

 

 


Getting started

This tool runs tcpdump on distributed nodes and analyzes the collected data.

 

Here are the instructions on how to use it:

1. configure

Write a configuration file (.rb) in the dpm directory and specify its name by setting CONFIG_FILE in the Makefile.

 

2. make install

 

3. monitor packets

There are several ways for users to monitor packets and other information.

      A. Add the ToDump element wherever you want to monitor within the Click configuration.

      See the sample ClickGenerator.rb.

 

      B. Run tcpdump to monitor packets at each node.

      Use mon.pl to start tcpdump on every node in node_list and to collect information from those nodes at regular intervals.

         Usage: mon.pl [--options=string] [--interval=t]

          --options   Options passed to tcpdump, for example a filter expression. The format is exactly the same as tcpdump's.

          --interval=t   Make mon.pl collect data from all the nodes every t seconds. By default, t = 60.

         Example: ./mon.pl --options="tcp src 128.112.139.108"

 

      When your experiments are done, use "make stop" to stop tcpdump on all the nodes.

 

Note: All the *.dat files in the directory you specify are treated as tcpdump files. Make sure that your tcpdump files end with ".dat", that they are all in one directory, and that this directory contains no other *.dat files.
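
The note above can be checked before running the later steps. The following is a minimal sketch, not part of the tool: it assumes the *.dat files are written in the pcap savefile format (as "tcpdump -w" does) and flags any *.dat file that does not start with a pcap magic number. The helper name suspicious_dat_files is hypothetical.

```ruby
# Pre-flight check for a capture directory (illustrative helper, not part of the tool).
# Assumption: the capture files use the pcap savefile format.
PCAP_MAGICS = [
  "\xA1\xB2\xC3\xD4".b, # classic pcap, written on a big-endian host
  "\xD4\xC3\xB2\xA1".b, # classic pcap, written on a little-endian host
  "\xA1\xB2\x3C\x4D".b, # nanosecond-resolution pcap, big-endian
  "\x4D\x3C\xB2\xA1".b  # nanosecond-resolution pcap, little-endian
].freeze

# Returns the *.dat files in dir whose first four bytes are not a pcap magic number.
def suspicious_dat_files(dir)
  Dir.glob(File.join(dir, "*.dat")).sort.reject do |path|
    PCAP_MAGICS.include?(File.binread(path, 4))
  end
end
```

Calling suspicious_dat_files on the data directory returns an empty list when every *.dat file looks like a capture; any path it returns is worth moving elsewhere before collecting.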

 

4. Record packet loss

      Add the following command to your UML startup script:

      "/host/home/#{node.slice}/mon/script/record_loss.pl --host=#{node.slice}@#{node.tap0.ipaddr} --port=#{$click_port} --dir=/host/home/#{node.slice}/mon/data &"

      See the sample UmlNetGenerator.rb.
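
As an illustration of how this command line is assembled from the node attributes, here is a small sketch. LossNode and LossTap are hypothetical stand-ins for the node objects in UmlNetGenerator.rb, and the sketch assumes that both the script and the data directory live under /host/home/<slice>/mon.

```ruby
# Hypothetical stand-ins for the node objects used by UmlNetGenerator.rb.
LossTap  = Struct.new(:ipaddr)
LossNode = Struct.new(:slice, :tap0)

# Builds the record_loss.pl line for one node's UML startup script.
# Assumption: script and data directory both live under /host/home/<slice>/mon.
def record_loss_command(node, click_port)
  base = "/host/home/#{node.slice}/mon"
  "#{base}/script/record_loss.pl " \
    "--host=#{node.slice}@#{node.tap0.ipaddr} " \
    "--port=#{click_port} --dir=#{base}/data &"
end
```

The generator would print the returned string into the startup script, in the same way the sample file prints its other commands.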

 

5. collect data

   Use collect.pl to gather the tcpdump files from each node onto the central node.

   Usage: collect.pl [--dir=dir] [--options=string] [--interval=t] [--hash=num] [--online]

          --dir=dir    The directory that holds the tcpdump files. All the *.dat files in it are treated as the original tcpdump files, so make sure your tcpdump files end with ".dat".

          --options=string    Further restricts the group of packets to process. Note that this option is only useful when filtering is on.

          --hash=num    Use a hash function to select a group of packets to monitor (trajectory sampling). num% of the packets are selected.

          --online    Collect real-time data.

          --interval=t    This is useful only when --online is on.
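
The idea behind --hash can be sketched as follows. This is not the hash that collect.pl uses (the manual does not specify it); CRC32 from Ruby's standard zlib library is a stand-in, and the function name sampled? is hypothetical. The key property of trajectory sampling is that the decision depends only on packet content that stays the same at every hop, so every node selects the same packets.

```ruby
require "zlib"

# Decides whether a packet belongs to the sampled group.
# invariant_bytes: packet bytes that do not change from hop to hop
#                  (so not the TTL or the IP header checksum).
# percent:         the value given as --hash=num, from 0 to 100.
# Assumption: CRC32 stands in for whatever hash the tool really uses.
def sampled?(invariant_bytes, percent)
  Zlib.crc32(invariant_bytes) % 100 < percent
end
```

Because the function is deterministic, a packet that is selected at one node is selected at all the others, which keeps whole paths intact in the sample.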

 

6. analyze data

  Usage: analyze.pl [--nam] [--path] [--graph=filename] [--loss]

          --nam     Generate the trace file for the Nam animator to replay the traffic on the network.

          --path    Show all the paths taken by the indicated group of packets. This also generates a group of files route0.pac, route1.pac, …; routei.pac stores the general information of all the packets that travel along the ith path. These files are also the input files of "analyze.pl --graph".

          --graph   Draw a graph of the number of packets on each path. First run "analyze.pl --path" and then run "analyze.pl --graph". This generates path.gp; you can modify the x/y axes in this file as needed, then run "gnuplot path.gp" to get the graph.

          --loss    Show the lost packets at each node.

 

Example

Here is an example of how to use this tool to monitor packets and analyze their behavior.

 

In this example, we want to observe a particular group of packets and get the detailed behavior of each packet.

 

1. monitor

Add the ToDump element in Click in the following places:

The entry of mac_table

The output of uml

The output of fea

See ClickGenerator.rb for the details.

Note that we use SNAPLEN 28 to avoid capturing the whole packet.

 

2. collect data

./collect.pl --options="port 12349"

Note that the path is exactly where the Click configuration file stores our packet records. All the files in this directory are collected, so make sure it contains no other files.

We don't use the --hash option because we need a detailed view of each packet, and sampling by hash may hide some of that behavior.

We can also choose the --online option to get some information during the experiment. Online animation is not available yet; only some real-time statistics are provided.

 

3. analysis

 ./analyze.pl --pid=58443

packet 58443 from tap0.sttl.12349 to tap0.wash.60820

1177191392.27063: at sttl, ttl  64

=> 1177191392.27063: at sttl, ttl  63

=> 1177191393.25804: at snva, ttl  62

=> 1177191392.28829: at losa, ttl  61

=> 1177191392.29975: at hstn, ttl  60

=> 1177191392.35433: at atla, ttl  59

=> 1177191392.40571: at wash, ttl  58

ok

packet 58443 from tap0.wash.60820 to tap0.sttl.12349

1177191403.64068: at wash, ttl  64

=> 1177191403.64068: at wash, ttl  63

=> 1177191403.65184: at nycm, ttl  62

=> 1177191403.67035: at wash, ttl  61

=> 1177191403.67153: at nycm, ttl  60

=> 1177191403.67358: at chin, ttl  59

=> 1177191403.71921: at ipls, ttl  58

=> 1177191404.7694: at kscy, ttl  57

=> 1177191404.77816: at dnvr, ttl  56

=> 1177191403.80561: at sttl, ttl  55

ok

Note: "ok" means that this packet was successfully delivered to the destination.

There are two other possible results: linkdrop (packets dropped on the way) and nodedrop (packets dropped at a node). These are not fully implemented yet.
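
The per-packet trace shown above is easy to post-process. The sketch below is an illustration only: it assumes the line format seen in this example output, keeps the timestamp as text, and uses hypothetical names (TraceHop, parse_hops, ttl_steps_ok?).

```ruby
# One hop of a packet trace, as printed by "analyze.pl --pid".
TraceHop = Struct.new(:time, :node, :ttl)

# Matches lines such as "=> 1177191392.28829: at losa, ttl  61".
# Assumption: the format is exactly the one shown in the example output.
TRACE_HOP_LINE = /\A(?:=>\s*)?(\d+\.\d+): at (\w+), ttl\s+(\d+)\z/

# Returns the hops found in the text; header lines and "ok" are ignored.
def parse_hops(text)
  text.each_line.map do |line|
    m = TRACE_HOP_LINE.match(line.strip)
    TraceHop.new(m[1], m[2], m[3].to_i) if m
  end.compact
end

# True if the TTL drops by exactly one between consecutive hops.
def ttl_steps_ok?(hops)
  hops.each_cons(2).all? { |a, b| a.ttl - b.ttl == 1 }
end
```

Mapping the result with hops.map(&:node) gives the node sequence of the packet, which can then be compared with the routes reported by "analyze.pl --path".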

 

./analyze.pl --path

route 0: total packets 4377

sttl=>dnvr=>kscy=>ipls=>chin=>nycm=>wash

route 1: total packets 37

sttl

route 2: total packets 11

sttl=>kscy=>ipls=>chin=>nycm=>wash

route 3: total packets 3

sttl=>kscy=>ipls=>chin=>nycm

route 4: total packets 4

sttl=>kscy=>ipls=>chin

route 5: total packets 4

sttl=>kscy=>chin

route 6: total packets 1

kscy

route 7: total packets 2503

sttl=>snva=>losa=>hstn=>atla=>wash

route 8: total packets 9

sttl=>snva=>losa=>atla=>wash

route 9: total packets 4

sttl=>snva=>losa=>wash

route 10: total packets 523

wash

route 11: total packets 4619

wash=>nycm=>chin=>ipls=>kscy=>dnvr=>sttl

route 12: total packets 14

wash=>nycm=>chin=>ipls=>kscy=>sttl

route 13: total packets 2

wash=>chin=>ipls=>kscy=>sttl

route 14: total packets 2

chin=>ipls=>kscy=>sttl

route 15: total packets 4

chin=>kscy=>sttl

route 16: total packets 1

kscy=>sttl

route 17: total packets 5

wash=>nycm=>chin=>ipls=>kscy

route 18: total packets 2692

wash=>atla=>hstn=>losa=>snva=>sttl

route 19: total packets 11

wash=>atla=>losa=>snva=>sttl

route 20: total packets 1

wash=>losa=>snva=>sttl

route 21: total packets 2

wash=>nycm=>wash

route 22: total packets 3

wash=>nycm=>wash=>nycm=>chin=>ipls=>kscy=>dnvr=>sttl

route 23: total packets 6

wash=>nycm=>chin=>ipls=>kscy=>dnvr
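
Some of the routes above visit the same node twice (for example wash=>nycm=>wash), which points to a transient routing loop. The following sketch, which assumes the output format shown above and uses hypothetical helper names, parses the "analyze.pl --path" output and picks out such routes.

```ruby
# Parses "route N: total packets C" headers, each followed by a path line.
# Assumption: the format is exactly the one shown in the example output.
def parse_routes(text)
  lines = text.each_line.map(&:strip).reject(&:empty?)
  routes = []
  lines.each_cons(2) do |header, path|
    m = /\Aroute (\d+): total packets (\d+)\z/.match(header)
    next unless m
    routes << { id: m[1].to_i, packets: m[2].to_i, nodes: path.split("=>") }
  end
  routes
end

# Returns the routes in which at least one node appears more than once.
def looping_routes(routes)
  routes.select { |r| r[:nodes].uniq.length < r[:nodes].length }
end
```

Summing the :packets field over the result shows how many packets were caught in a loop during the experiment.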

 

./analyze.pl --nam > trace.nam

Now you have a Nam trace file, and you can use nam to view an animation of what happened in your network experiment.

How to log and replay the dataplane?

Freeze/replay the data plane (Click).

 

This feature is specific to VINI.

 

------------------------------------

* Case 1: XORP and Click

Step 1: monitor

monitor/xorp_mon.pl - use tcpdump to capture all the packets that XORP sends to Click

Usage:        xorp_mon.pl [--dumpfile=filename]

Description: 

   --dumpfile=filename - the file that tcpdump writes to. By default, it is "xorp_dump.dat".

 

Use "make stop" to stop tcpdump on all the nodes when your experiment is done.

 

Step 2: script generation

make replay_gen - generate the replay script at each node

 

Step 3: replay

Start VINI with router="none"

make replay_xorp

 

------------------------------------

* Case 2: Quagga and Click

Step 1: monitor and log

Add the following command to your UML startup script: "~/mon/script/RouteLog $path"

Here $path is where the log is stored.

 

Step 2: replay

Start VINI with router="none"

make replay_route

 

------------------------------------

* Case 3: Improve performance by forwarding packets in Click

Add the following command to your UML startup script: "~/mon/script/Route2Click $path $hostname $click_port" (this does not work well yet)

 

Example:

We modify UmlNetGenerator.rb as follows:

  when "Quagga"
      # Added: start Click forwarding if it is enabled for this node.
      if node.click_forwarding
        f.print <<EOF
        /home/#{node.slice}/mon/scripts/Route2Click /home/#{node.slice}/mon/routes #{node.slice}@#{node.tap0.ipaddr} #{$click_port} &
        sleep 200
EOF
      else
        # Added: otherwise, log the routing table in the kernel.
        f.print <<EOF
        /home/#{node.slice}/mon/scripts/RouteLog /home/#{node.slice}/mon/routes &
EOF
      end

    f.print <<EOF
  echo start quagga
  echo 1 > /proc/sys/net/ipv4/ip_forward

  cp -R /home/#{node.slice}/demo/#{node.slice}@#{node.dnsname}/quagga* /usr/local/quagga/etc/ &
  chown -R quagga.quagga /usr/local/quagga/ &
  /usr/local/quagga/sbin/zebra -d -f /usr/local/quagga/etc/quagga_zebra.cfg &
  chown -R quagga.quagga /var/run/quagga &
EOF

 

Note:

In Case 3, the following error may occur:

    "The system has no more ptys.  Ask your system administrator to create more."

Solution:

 Add this entry to the host's fstab:

 devpts   /dev/pts   devpts   gid=4,mode=620   0 0
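
To see whether a host already has such an entry, a check along the following lines can help. It is a minimal sketch with a hypothetical helper name (devpts_entry?); it only looks for a non-comment line that mounts a devpts filesystem on /dev/pts and does not validate the mount options.

```ruby
# Returns true if the given fstab text mounts a devpts filesystem on /dev/pts.
# Comment text (everything after "#") is ignored.
def devpts_entry?(fstab_text)
  fstab_text.each_line.any? do |line|
    fields = line.sub(/#.*/, "").split
    fields.length >= 3 && fields[1] == "/dev/pts" && fields[2] == "devpts"
  end
end
```

On the host, devpts_entry?(File.read("/etc/fstab")) returning false suggests that the entry above still has to be added.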