Bug #10852
Closed: Discovery image discovery_bootif doesn't match boot interface and can't discover host
Description
I'm not sure I'm chasing down the right issue, but I have a Dell R630 that I'm trying to get discovery working on, and when the host boots the image I see:
Started POST "/api/v2/discovered_hosts/facts" for 172.28.240.39 at 2015-06-17 08:01:46 -0500
2015-06-17 08:01:46 [I] Processing by Api::V2::DiscoveredHostsController#facts as JSON
2015-06-17 08:01:46 [I] Parameters: {"facts"=>"[FILTERED]", "apiv"=>"v2", "discovered_host"=>{"facts"=>"[FILTERED]"}}
2015-06-17 08:01:47 [I] Import facts for 'macecf4bbced658' completed. Added: 145, Updated: 0, Deleted 0 facts
2015-06-17 08:01:47 [E] address family must be specified (ArgumentError)
/opt/rh/ruby193/root/usr/share/ruby/ipaddr.rb:460:in `initialize'
/usr/share/foreman/app/models/subnet.rb:97:in `new'
/usr/share/foreman/app/models/subnet.rb:97:in `block in subnet_for'
/usr/share/foreman/app/models/subnet.rb:97:in `each'
/usr/share/foreman/app/models/subnet.rb:97:in `subnet_for'
/opt/rh/ruby193/root/usr/share/gems/gems/foreman_discovery-3.0.0/app/models/host/discovered.rb:110:in `populate_fields_from_facts'
/usr/share/foreman/app/models/host/base.rb:126:in `import_facts'
I set up SSH on the discovery image, SSH'd into the box, and noticed a few things:
1. On Dells, the first two interfaces are fiber and the third interface is the first copper interface.
[root@fdi facts]# facter | grep discovery
discovery_bootif => ec:f4:bb:ce:d6:58
discovery_release => 20150525.1
discovery_version => 2.1.1
That MAC address matches the eno1 interface:
[root@fdi facts]# ifconfig eno1
eno1: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet6 fe80::eef4:bbff:fece:d658 prefixlen 64 scopeid 0x20<link>
ether ec:f4:bb:ce:d6:58 txqueuelen 1000 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
which has no link.
The server actually booted successfully from eno3:
[root@fdi facts]# ifconfig eno3
eno3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.28.252.27 netmask 255.255.255.0 broadcast 10.28.252.255
inet6 fe80::eef4:bbff:fece:d65c prefixlen 64 scopeid 0x20<link>
ether ec:f4:bb:ce:d6:5c txqueuelen 1000 (Ethernet)
RX packets 11566 bytes 1686079 (1.6 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 8632 bytes 3270766 (3.1 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
device memory 0x91a80000-91afffff
Is this disparity what is keeping my Dell from posting to Foreman? Is it trying to post the MAC info under an interface it did not boot from?
I posted some of my discovery-debug info in the Google Groups and can try to attach it here if needed. It looks like /usr/share/fdi/facts/discover-facts.rb may need to be refactored so that it correctly identifies the active link?
Updated by Byron Miller over 9 years ago
Hmm, looking at the facts script, it looks like it's defaulting to rd.bootif=0 in /proc/cmdline, right?
Is there a way to set up a hardware type that maps to a kickstart file, so that if hardware type=dell630 it passes rd.bootif=2? Or should we edit the facts script to assign the boot interface programmatically, based on whichever interface got DHCP and has an active route?
I'm going to try editing the default PXE boot template to confirm whether I can post the facts, on the assumption that something isn't matching. Maybe the default boot template could branch on hardware type to set this value?
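The second option, choosing the boot interface from whichever NIC actually got a DHCP lease, could look roughly like the Ruby sketch below. This is not what discover-facts.rb does. `first_ipv4_interface` is a hypothetical helper, and it assumes the one-line output format of `ip -o -4 addr show` (`<index>: <ifname> inet <addr>/<prefix> ...`). It looks for an IPv4 address rather than a default route. The image sets DEFROUTE on only one interface, so the route table alone may not show which NIC got a lease.

```ruby
# Hypothetical helper: return [ifname, ipv4] for the first non-loopback
# interface holding an IPv4 address, given `ip -o -4 addr show` output.
# Usage on a live system (assumption): first_ipv4_interface(`ip -o -4 addr show`)
def first_ipv4_interface(ip_output)
  ip_output.each_line do |line|
    fields = line.split
    # One-line format: "<index>: <ifname> inet <addr>/<prefix> ..."
    next unless fields[2] == "inet"
    name = fields[1]
    next if name == "lo" # skip loopback
    return [name, fields[3].split("/").first]
  end
  nil # no interface obtained an IPv4 address
end
```

On the R630 in this report it would pick eno3, the only NIC with an address. On hosts with several cabled NICs the first match is arbitrary, which is why the BOOTIF value passed by PXELinux is the more reliable signal.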
Updated by Lukas Zapletal over 9 years ago
By primary interface we mean the interface the host was PXE-booted from. With PXELinux and the IPAPPEND 2 option, this should be passed into the system via a kernel command-line option, which our script then parses to create the discovery_bootif fact.
Please check your kernel command line and BIOS settings and let me know.
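With IPAPPEND 2, PXELinux appends a `BOOTIF=` parameter. Its value is the ARP hardware type (`01` for Ethernet) followed by the boot NIC's MAC, all dash-separated. The Ruby sketch below converts that value into a colon-separated MAC. `bootif_mac` is a hypothetical helper, not the code in discover-facts.rb. The example MAC is eno3's address from this report, which BOOTIF would be expected to carry if PXELinux recorded the boot from NIC 1 Port 3.

```ruby
# Hypothetical helper: extract the boot MAC that PXELinux appends as
# BOOTIF=<arptype>-<mac>, e.g. BOOTIF=01-ec-f4-bb-ce-d6-5c.
def bootif_mac(cmdline)
  token = cmdline.split.find { |t| t.start_with?("BOOTIF=") }
  return nil unless token # IPAPPEND 2 missing or not honoured
  parts = token.delete_prefix("BOOTIF=").split("-")
  # The first group is the ARP hardware type (01 = Ethernet); drop it.
  parts = parts.drop(1) if parts.length == 7
  return nil unless parts.length == 6 && parts.all? { |p| p.match?(/\A\h{2}\z/) }
  parts.join(":").downcase
end
```

On the discovery image, `bootif_mac(File.read("/proc/cmdline"))` returns `nil` whenever BOOTIF is absent. That is the first case to check when discovery_bootif points at the wrong NIC.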
Updated by Lukas Zapletal over 9 years ago
Ah, I see your comment #1 now. Let me explain.
The image currently initializes all interfaces via DHCP, but the one reported on the kernel command line is configured with PEERDNS and DEFROUTE. That means it becomes the primary route (or primary interface, if you will) and is also used for DNS queries.
This is the only way we determine the primary interface. In Foreman deployments, the workflow is to use the primary interface as the provisioning interface.
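The DHCP-everywhere behaviour described above can be pictured as one ifcfg-style file per NIC, where only the boot interface gets `DEFROUTE=yes` and `PEERDNS=yes`. This is an illustrative Ruby sketch with a hypothetical `ifcfg_for` helper, not the image's actual network setup code. It also shows why a wrong bootif hurts: in this report, eno1 would be the only interface allowed to install the default route and DNS servers, yet it has no link.

```ruby
# Hypothetical sketch: every NIC uses DHCP, but only the PXE boot
# interface may install the default route (DEFROUTE) and DNS (PEERDNS).
def ifcfg_for(device, boot_device)
  primary = device == boot_device
  settings = {
    "DEVICE" => device,
    "BOOTPROTO" => "dhcp",
    "ONBOOT" => "yes",
    "DEFROUTE" => primary ? "yes" : "no",
    "PEERDNS" => primary ? "yes" : "no"
  }
  settings.map { |key, value| "#{key}=#{value}" }.join("\n") + "\n"
end
```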
Updated by Byron Miller over 9 years ago
Lukas,
Thanks for the reply. I can see it iterate over all of the interfaces, and the third one does get DHCP. I was able to log in to the discovery image, and that's when I noticed the discrepancy: even though the first interface has no IP, it was picked as the discovery_bootif interface, and a couple of other facts are missing because it can't get anything but a MAC address.
I logged into the iDRAC: the third interface is set to PXE boot, and interfaces 1, 2, and 4 are down because they have no active connection. Every time I reboot, I get DHCP/PXE and the image loads, but it fails when calling back to Foreman with "address family must be specified".
Updated by Lukas Zapletal over 9 years ago
The "address family" bug is a known one; a patch is currently pending review: https://github.com/theforeman/foreman_discovery/pull/193
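Ruby's `IPAddr.new` raises `address family must be specified` when it receives a non-String value, such as `nil`, without an explicit family. The trace only shows the failure at subnet.rb:97. If the IP fact handed to `Subnet.subnet_for` was empty, a guard like the hypothetical `subnet_for_ip` below would avoid the crash. An empty fact is plausible here, since the reported boot interface eno1 has no IPv4 address. This is a sketch of the idea, not the change in the linked pull request.

```ruby
require "ipaddr"

# Hypothetical guard: IPAddr.new(nil) raises "address family must be
# specified", so return nil early when the host has no IP fact.
def subnet_for_ip(ip, subnets)
  return nil if ip.nil? || ip.to_s.strip.empty? # no IP, no subnet match
  addr = IPAddr.new(ip)
  # Return the first CIDR string whose range contains the address.
  subnets.find { |cidr| IPAddr.new(cidr).include?(addr) }
end
```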
Updated by Lukas Zapletal over 9 years ago
- Related to Bug #9857: Internal server error when registering discovered host in Foreman 1.8 added
Updated by Byron Miller over 9 years ago
Here is a screenshot of the BIOS showing PXE boot set to NIC 1 Port 3:
https://www.dropbox.com/s/wf5cetmz9u1qapc/Screenshot%202015-06-18%2009.32.30.png?dl=0
Updated by Byron Miller over 9 years ago
Cool, I'll check out that patch. Maybe the discovery_bootif value doesn't really matter in the grand scheme of things, but I do see I'm missing facts, and it's not running as expected on my Dell: even though the BIOS shows it's set up to use this interface, discovery_bootif is using the fiber on NIC 1 Port 1.
Updated by Byron Miller over 9 years ago
Byron Miller wrote:
Cool, I'll check out that patch. Maybe the discovery_bootif value doesn't really matter in the grand scheme of things, but I do see I'm missing facts, and it's not running as expected on my Dell: even though the BIOS shows it's set up to use this interface, discovery_bootif is using the MAC of NIC 1 Port 1.
Updated by Lukas Zapletal over 9 years ago
What does your default PXELinux configuration look like? Can you pastebin it?
Updated by Byron Miller over 9 years ago
Pretty generic template.
Updated by Byron Miller over 9 years ago
I'm going to grab the detailed logs after our standup. But I have a question: can we make it fail softer, so that if something errors we can at least capture the error output and the interfaces that are up? My worry is that if I provision 100+ servers and some get stuck, finding the stuck ones will be a royal pain, since this boot microkernel fails hard when reporting to Foreman.
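One way to read "fail softer" is to wrap the fact upload and, on any exception, record the error together with the interfaces whose link is up, so a stuck host can be identified later. The Ruby sketch below assumes interface states come from `/sys/class/net/<if>/operstate`. Those states are passed in as a hash so the function can be tested. `upload_with_diagnostics` is hypothetical and says nothing about how the discovery image actually reports errors.

```ruby
# Hypothetical sketch: run the upload block; on failure, return a
# diagnostic record instead of only crashing.
# `operstates` maps interface name => contents of its operstate file.
def upload_with_diagnostics(operstates)
  yield
  { "status" => "ok" }
rescue StandardError => e
  {
    "status" => "failed",
    "error" => "#{e.class}: #{e.message}",
    "interfaces_up" => operstates.select { |_name, state| state.strip == "up" }.keys.sort
  }
end

# Collecting real states on the image might look like (assumption):
#   Dir.glob("/sys/class/net/*/operstate").to_h do |path|
#     [File.basename(File.dirname(path)), File.read(path)]
#   end
```

The sketch rescues `StandardError` rather than `Exception`, so interrupts and `exit` still behave normally. It leaves open where the record goes: the console, a local log, or a retry loop.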
Updated by Lukas Zapletal over 9 years ago
If I am not mistaken, you have a typo that is causing this behavior:
PAPPEND 2
should be
IPAPPEND 2
Without this option, Discovery will never work correctly. Verify, fix, re-test and close this issue if that's the case.
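A single missing letter here silently disables BOOTIF, so a quick check of the PXELinux config can help. Per the SYSLINUX documentation, the `IPAPPEND` value is an OR of flags: 1 appends `ip=` and 2 appends `BOOTIF=`. Both `IPAPPEND 2` and `IPAPPEND 3` therefore pass the boot MAC. The hypothetical Ruby checker below looks for such a line and flags `PAPPEND` as a likely typo. It treats the file as flat text and ignores which LABEL a line belongs to.

```ruby
# Hypothetical lint for a PXELinux config: is BOOTIF requested, and
# are there PAPPEND typos?
def ipappend_issues(config_text)
  issues = []
  lines = config_text.lines.map(&:strip)
  has_bootif = lines.any? do |l|
    m = l.match(/\AIPAPPEND\s+(\d+)\z/i)
    m && (m[1].to_i & 2) != 0 # flag 2 requests BOOTIF=
  end
  issues << "no IPAPPEND line with flag 2 set; BOOTIF will not be appended" unless has_bootif
  lines.each_with_index do |l, i|
    issues << "line #{i + 1}: unknown keyword PAPPEND (typo for IPAPPEND?)" if l.match?(/\APAPPEND\b/i)
  end
  issues
end
```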
Updated by Byron Miller over 9 years ago
Sorry, that was a copy-and-paste error on my part. It is set to IPAPPEND. I was switching between 0 and 2 to see whether that works.
If I do need to fuss with this, how would I set IPAPPEND to 2 only for Dells, since everything else works out of the box with the primary interface as the default? I'm running more builds now that I have another Foreman instance and VLAN set up to experiment with, so I won't add more noise to production.
Also, is the 3.01 discovery image coming out soon with the fix for this related issue (facts can't be posted back) and more debug output about what is failing, or is that slated for 1.9.x?
Updated by Byron Miller over 9 years ago
If you have a VMware instance to experiment with, you can reproduce this issue.
In Foreman, provision a new host. Put the primary interface on a static IP on the primary network. Set the secondary interface as the provisioning interface using DHCP, but keep it unmanaged. Foreman will then provision the VM through VMware and load the discovery image, which gets stuck on the same error: it can't post to Foreman.
Is 3.01 out yet, or are you still working on other issues?
Updated by Byron Miller over 9 years ago
Alright, I captured more discovery debug output.
Also, here is the log I'm still getting.
I'm now having issues on Cisco UCS blades, but I think it's because they have multiple interfaces. No matter what I set bootif to, it doesn't seem to make a difference.
Updated by Byron Miller over 9 years ago
Or IPAPPEND, which I'm still fiddling with just to make sure I'm not seeing things :)
Updated by Byron Miller over 9 years ago
I triple-checked and rebuilt the template: IPAPPEND 2 is there (not PAPPEND).
I'm doing lots of reboots now to see if I can get any of these new machines to register.
Updated by Lukas Zapletal over 9 years ago
Byron,
In the pastebins, I don't see the required BOOTIF kernel parameter. Our image depends on this parameter to recognize the provisioning interface. It is added by PXELinux via the IPAPPEND 2 option: http://www.syslinux.org/wiki/index.php/SYSLINUX
It is not there:
Jul 16 20:16:32 fdi discovery-register[1065]: Parsing kernel line: BOOT_IMAGE=boot/fdi-image/vmlinuz0 initrd=boot/fdi-image/initrd0.img rootflags=loop root=live:/fdi.iso rootfstype=auto ro rd.live.image acpi=force rd.luks=0 rd.md=0 rd.dm=0 rd.lvm=0 rd.bootif=1 rd.neednet=0 fdi.ssh=1 fdi.rootpw=null nomodeset proxy.url=https://10.28.252.20:8443 proxy.type=proxy
It looks like PXELinux is not able to detect your BOOTIF. Are you booting with BIOS or UEFI? We do not support UEFI yet.
Updated by Lukas Zapletal over 9 years ago
- Status changed from New to Need more information
What is the status?
Updated by Byron Miller over 9 years ago
- Status changed from Need more information to Closed
I've upgraded to 1.9.3 and used the known workarounds to get the discovery image working. Looking forward to the new features you're testing, too.