Bug #9576

closed

ibnetdiscover is applying non-CA port GUID name maps as expected but ignoring CA port GUID names

Added by Justin Sherrill about 10 years ago. Updated over 6 years ago.

Status: Rejected
Priority: Normal
Assignee: -
Category: -
Target version:
Difficulty:
Triaged:
Fixed in Releases:
Found in Releases:

Description

Cloned from https://bugzilla.redhat.com/show_bug.cgi?id=1192740
A customer is asking about the behaviour of ibnetdiscover and wonders whether it is caused by a bug. We are not familiar enough with IB to determine this ourselves. Any guidance or pointers would be appreciated.


Here are three test cases.

File "node-name-map.txt" in the CWD contains records of the form '{GUID} "{name}"', as per historic 'ibnetdiscover --node-name-map {mapfile}' use. All mapped names take the form "{name}[IB{plane}]", and nodes "service5" and "r6i3n15" are SM-active on both locally connected fabrics. The infiniband-diags config is unchanged from stock (i.e., every setting is commented out), so all defaults should apply. The default name for CAs should be "{node} mthca_0" because the first link-up device located will always be 0 and all CAs are Mellanox via mlx4. The default name for SWs should be the vendor string, and all of the switches are Mellanox MT47's.
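To make the record format concrete, here is a minimal Python sketch that reads '{GUID} "{name}"' records into a lookup table. The helper name `parse_name_map` is hypothetical and is not part of infiniband-diags. The sketch assumes one record per line and simply skips anything that does not match; the real tool may treat comments or malformed lines differently.

```python
import re

# Assumed record shape: a hexadecimal GUID, whitespace, then a double-quoted name.
RECORD = re.compile(r'^\s*(0x[0-9a-fA-F]+)\s+"([^"]*)"')


def parse_name_map(text):
    """Hypothetical helper: return {guid_as_int: name} for each matching line."""
    names = {}
    for line in text.splitlines():
        match = RECORD.match(line)
        if match:  # lines that do not look like a record are skipped
            names[int(match.group(1), 16)] = match.group(2)
    return names


# Two records copied from the map file excerpts in this report.
sample = '''\
0x0002c9020028e8bd "service5[IB=0]"
0x0800690000004e8c "r6i3sw3[IB=0]"
'''

name_map = parse_name_map(sample)
print(name_map[0x0800690000004E8C])  # prints: r6i3sw3[IB=0]
```

Storing the GUID as an integer means a lookup does not depend on the letter case or zero padding used in the file.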

An ibnetdiscover run for ports without the name map reports the test GUIDs as "service5 mthca_0", "r6i3n15 mthca_0", and "MT47396 Infiniscale-III Mellanox Technologies", as expected. Applying the map results in all SWs being reported as "{name}[IB{plane}]", as I'd expect. Service5 and r6i3n15, along with all other CAs, are still reported as "{name} mthca_0".

Why is the node name map being applied only to the SW (or non-CA) GUIDs, and how do I get ibnetdiscover to apply it uniformly to all GUIDs?

Test node service5 is a standalone CA with two adapters, each with two ports:

    [root@r1lead ~]# grep service5 node-name-map.txt
    0x0002c9020028e8bd "service5[IB=0]"
    0x0002c9020028e8be "service5[IB=1]"
    0x0002c9020028e5d5 "service5[IB=2]"
    0x0002c9020028e5d6 "service5[IB=3]"

Test node r6i3n15 is a diskless backplane-connected CA with one embedded twin-port adapter:

    [root@r1lead ~]# grep r6i3n15 node-name-map.txt
    0x003048c9c9040001 "r6i3n15[IB=0]"
    0x003048c9c9040002 "r6i3n15[IB=1]"

r6i3sw3 is a 24-port IB switch in the last slot of the last IRU of rack 6 (in this case I'm using the node-level GUID of the switch's backplane rather than a port-level GUID):

    [root@r1lead ~]# grep r6i3sw3 node-name-map.txt
    0x0800690000004e8c "r6i3sw3[IB=0]"

The standalone CAs, without mapping, return connected GUIDs with default names, as they should:

    [root@r1lead ~]# ibnetdiscover -p |grep service5
    SW 810 23 0x0800690000004de4 4x DDR - CA 878 1 0x0002c9020028e8bd ( 'MT47396 Infiniscale-III Mellanox Technologies' - 'service5 mthca0' )
    CA 879 2 0x0002c9020028e8be 4x DDR - SW 405 23 0x0800690000004e22 ( 'service5 mthca0' - 'MT47396 Infiniscale-III Mellanox Technologies' )
    CA 878 1 0x0002c9020028e8bd 4x DDR - SW 810 23 0x0800690000004de4 ( 'service5 mthca0' - 'MT47396 Infiniscale-III Mellanox Technologies' )
    SW 405 23 0x0800690000004e22 4x DDR - CA 879 2 0x0002c9020028e8be ( 'MT47396 Infiniscale-III Mellanox Technologies' - 'service5 mthca0' )

The same, with mapping added: only the SWs are mapped, and the CAs still have their unmapped default names:

    [root@r1lead ~]# ibnetdiscover -p --node-name-map node-name-map.txt |grep service5
    SW 810 23 0x0800690000004de4 4x DDR - CA 878 1 0x0002c9020028e8bd ( 'r3i3sw2[IB=1]' - 'service5 mthca0' )
    CA 879 2 0x0002c9020028e8be 4x DDR - SW 405 23 0x0800690000004e22 ( 'service5 mthca0' - 'r3i3sw3[IB=0]' )
    CA 878 1 0x0002c9020028e8bd 4x DDR - SW 810 23 0x0800690000004de4 ( 'service5 mthca0' - 'r3i3sw2[IB=1]' )
    SW 405 23 0x0800690000004e22 4x DDR - CA 879 2 0x0002c9020028e8be ( 'r3i3sw3[IB=0]' - 'service5 mthca0' )

The backplane-attached CAs, without mapping, are correct, just as the standalone CAs were:

    [root@r1lead ~]# ibnetdiscover -p |grep r6i3n15
    SW 539 10 0x0800690000004c58 4x DDR - CA 168 2 0x003048c9c9040002 ( 'MT47396 Infiniscale-III Mellanox Technologies' - 'r6i3n15 mlx4_0' )
    CA 168 2 0x003048c9c9040002 4x DDR - SW 539 10 0x0800690000004c58 ( 'r6i3n15 mlx4_0' - 'MT47396 Infiniscale-III Mellanox Technologies' )
    CA 8 1 0x003048c9c9040001 4x DDR - SW 651 4 0x0800690000004e8c ( 'r6i3n15 mlx4_0' - 'MT47396 Infiniscale-III Mellanox Technologies' )
    SW 651 4 0x0800690000004e8c 4x DDR - CA 8 1 0x003048c9c9040001 ( 'MT47396 Infiniscale-III Mellanox Technologies' - 'r6i3n15 mlx4_0' )

With mapping added, they behave the same as in the previous case: SWs are mapped, CAs aren't:

    [root@r1lead ~]# ibnetdiscover -p --node-name-map node-name-map.txt |grep r6i3n15
    SW 539 10 0x0800690000004c58 4x DDR - CA 168 2 0x003048c9c9040002 ( 'r6i3sw2[IB=1]' - 'r6i3n15 mlx4_0' )
    CA 168 2 0x003048c9c9040002 4x DDR - SW 539 10 0x0800690000004c58 ( 'r6i3n15 mlx4_0' - 'r6i3sw2[IB=1]' )
    CA 8 1 0x003048c9c9040001 4x DDR - SW 651 4 0x0800690000004e8c ( 'r6i3n15 mlx4_0' - 'r6i3sw3[IB=0]' )
    SW 651 4 0x0800690000004e8c 4x DDR - CA 8 1 0x003048c9c9040001 ( 'r6i3sw3[IB=0]' - 'r6i3n15 mlx4_0' )
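The comparison made by eye above can also be done mechanically. The following Python sketch scans `ibnetdiscover -p` lines and lists every endpoint whose GUID has a map record but whose displayed name differs from it. The helper name `unmapped_endpoints` is hypothetical, and the line layout (type, LID, port, GUID, then quoted names in the same order as the endpoints) is inferred only from the output shown in this report, not from the tool's documentation.

```python
import re

# Assumed endpoint layout: node type, LID, port number, GUID.
ENDPOINT = re.compile(r"\b(CA|SW)\s+\d+\s+\d+\s+(0x[0-9a-fA-F]+)")
# Displayed names are single-quoted and appear in the same order as the endpoints.
QUOTED = re.compile(r"'([^']*)'")


def unmapped_endpoints(report_lines, name_map):
    """Hypothetical helper: list (type, guid, shown, wanted) where the map was not applied."""
    misses = []
    for line in report_lines:
        endpoints = ENDPOINT.findall(line)
        shown_names = QUOTED.findall(line)
        for (node_type, guid_text), shown in zip(endpoints, shown_names):
            wanted = name_map.get(int(guid_text, 16))
            if wanted is not None and wanted != shown:
                misses.append((node_type, guid_text, shown, wanted))
    return misses


# Map records and one output line copied from this report.
name_map = {
    0x003048C9C9040001: "r6i3n15[IB=0]",
    0x0800690000004E8C: "r6i3sw3[IB=0]",
}
line = ("CA 8 1 0x003048c9c9040001 4x DDR - SW 651 4 0x0800690000004e8c "
        "( 'r6i3n15 mlx4_0' - 'r6i3sw3[IB=0]' )")

for miss in unmapped_endpoints([line], name_map):
    print(miss)
# prints: ('CA', '0x003048c9c9040001', 'r6i3n15 mlx4_0', 'r6i3n15[IB=0]')
```

The sketch only reports where the displayed name and the map record disagree; it does not explain why ibnetdiscover skipped the record.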

SW-to-SW links behave like the previous cases. Without mapping, they show default names:

    [root@r1lead ~]# ibnetdiscover -p |grep 0x0800690000004e8c
    ...snip-snip...
    CA 112 1 0x003048c9baa80001 4x DDR - SW 651 2 0x0800690000004e8c ( 'r6i3n12 mlx4_0' - 'MT47396 Infiniscale-III Mellanox Technologies' )
    CA 80 1 0x003048c9b28c0001 4x DDR - SW 651 1 0x0800690000004e8c ( 'r6i3n13 mlx4_0' - 'MT47396 Infiniscale-III Mellanox Technologies' )
    SW 651 24 0x0800690000004e8c 4x SDR 'MT47396 Infiniscale-III Mellanox Technologies'
    SW 651 23 0x0800690000004e8c 4x DDR - SW 690 23 0x0800690000004ea2 ( 'MT47396 Infiniscale-III Mellanox Technologies' - 'MT47396 Infiniscale-III Mellanox Technologies' )
    SW 651 22 0x0800690000004e8c 4x DDR - SW 690 22 0x0800690000004ea2 ( 'MT47396 Infiniscale-III Mellanox Technologies' - 'MT47396 Infiniscale-III Mellanox Technologies' )
    SW 651 21 0x0800690000004e8c 4x SDR 'MT47396 Infiniscale-III Mellanox Technologies'
    SW 651 20 0x0800690000004e8c 4x SDR 'MT47396 Infiniscale-III Mellanox Technologies'
    SW 651 19 0x0800690000004e8c 4x DDR - SW 378 19 0x0800690000004e1a ( 'MT47396 Infiniscale-III Mellanox Technologies' - 'MT47396 Infiniscale-III Mellanox Technologies' )
    SW 651 18 0x0800690000004e8c 4x DDR - SW 378 18 0x0800690000004e1a ( 'MT47396 Infiniscale-III Mellanox Technologies' - 'MT47396 Infiniscale-III Mellanox Technologies' )
    ...snip-snip...

With mapping, the names are mapped, including on single-sided links (ports 20, 21, and 24 have transceivers but the far ends are disconnected; they map correctly, just like active links):

    [root@r1lead ~]# ibnetdiscover -p --node-name-map node-name-map.txt |grep 0x0800690000004e8c
    ...snip-snip...
    CA 112 1 0x003048c9baa80001 4x DDR - SW 651 2 0x0800690000004e8c ( 'r6i3n12 mlx4_0' - 'r6i3sw3[IB=0]' )
    CA 80 1 0x003048c9b28c0001 4x DDR - SW 651 1 0x0800690000004e8c ( 'r6i3n13 mlx4_0' - 'r6i3sw3[IB=0]' )
    SW 651 24 0x0800690000004e8c 4x SDR 'r6i3sw3[IB=0]'
    SW 651 23 0x0800690000004e8c 4x DDR - SW 690 23 0x0800690000004ea2 ( 'r6i3sw3[IB=0]' - 'r2i3sw3[IB=0]' )
    SW 651 22 0x0800690000004e8c 4x DDR - SW 690 22 0x0800690000004ea2 ( 'r6i3sw3[IB=0]' - 'r2i3sw3[IB=0]' )
    SW 651 21 0x0800690000004e8c 4x SDR 'r6i3sw3[IB=0]'
    SW 651 20 0x0800690000004e8c 4x SDR 'r6i3sw3[IB=0]'
    SW 651 19 0x0800690000004e8c 4x DDR - SW 378 19 0x0800690000004e1a ( 'r6i3sw3[IB=0]' - 'r5i3sw3[IB=0]' )
    SW 651 18 0x0800690000004e8c 4x DDR - SW 378 18 0x0800690000004e1a ( 'r6i3sw3[IB=0]' - 'r5i3sw3[IB=0]' )
    ...snip-snip...

The infiniband-diags version that provides ibnetdiscover:

    [root@r1lead ~]# rpm -q --whatprovides `which ibnetdiscover`
    infiniband-diags-1.6.4-1.el6.x86_64

Actions #1

Updated by Justin Sherrill about 10 years ago

  • Status changed from New to Rejected

Whoops! This was a mistaken clone from downstream; closing.

Actions #2

Updated by Eric Helms about 10 years ago

  • Triaged changed from No to Yes
Actions #3

Updated by Eric Helms almost 9 years ago

  • Translation missing: en.field_release set to 166