Project

General

Profile

Bug #9170

Wrong nic order in libvirt domain

Added by Anonymous over 6 years ago. Updated about 3 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
Compute resources - libvirt
Target version:
Difficulty:
Triaged:
Bugzilla link:
Fixed in Releases:
Found in Releases:

Description

Hi,

We are testing the new interfaces code that just got merged to develop, and are running a test instance off of 43c4bd72e4646cc9b2b4cb0533e84e08825eadda.
So far, it's pretty great, but we found a small issue with libvirt compute resources.

The scenario is:

  • go to the New Host page
  • fill in Host information, and choose a libvirt compute resource to deploy on
  • go to the Interfaces tab
  • edit the default interface, choose a domain, subnet, ip, leave the default information as-is
  • choose a bridge from the list, say bridgeA
  • submit
  • add a new interface, choose a domain, subnet, ip
  • choose a different bridge from the list, say bridgeB
  • submit
  • fill in the remaining information and save the host, which will get built

The libvirt domain created for the new host has 2 NICs, but they are in the "wrong order", the first NIC is bridged to bridgeB and the second to bridgeA. Plus, the MAC addresses that get saved to the database once the domain has been created are mixed up.
The other data, however, looks right, with the proper bridge name in the NIC attributes.


Related issues

Related to Foreman - Tracker #2409: NetworkingNew

Related to Foreman - Refactor #7456: Extract primary interface from hostClosed2014-09-16

Associated revisions

Revision 8aeac1b7 (diff)
Added by Eric-Olivier Lamey over 6 years ago

Fixes #9170 - wrong nic order in libvirt domain

Revision 3314dc46 (diff)
Added by Eric-Olivier Lamey over 6 years ago

Fixes #9170 - wrong nic order in libvirt domain

(cherry picked from commit 8aeac1b7b89fab21e0b92f54eb460fcc04415e07)

History

#1 Updated by Dominic Cleal over 6 years ago

#2 Updated by Dominic Cleal over 6 years ago

  • Related to Refactor #7456: Extract primary interface from host added

#3 Updated by Dominic Cleal over 6 years ago

  • Legacy Backlogs Release (now unused) deleted (28)

#4 Updated by Marek Hulán over 6 years ago

Greg is there something we can do about this (except fixing fog so it returns custom ids back)?

#5 Updated by Greg Sutcliffe over 6 years ago

  • Status changed from New to Need more information

Marek, Eric, I can't reproduce this I'm afraid. I've created several hosts with various configurations of bridges and in every case the correct interface gets the correct bridge. Here's one example:

Submitted data:
nic1 (primary: true, provision true) on bridge virbr4
nic2 (no flags) on bridge virbr3
libvirt data:
virbr4 nic has mac 52:54:00:59:b7:61
virbr3 nic has mac 52:54:00:60:82:97
rails console:
[3] pry(main)> Host.first.interfaces
Host::Managed Load (0.8ms) SELECT "hosts".* FROM "hosts" WHERE "hosts"."type" IN ('Host::Managed') LIMIT 1
Nic::Base Load (0.2ms) SELECT "nics".* FROM "nics" WHERE "nics"."host_id" = 3 ORDER BY identifier
=> [#<Nic::Managed id: 5, mac: "52:54:00:59:b7:61", ip: "192.168.130.24", type: "Nic::Managed", name: "foo5.primary.sapphire.local", host_id: 3, subnet_id: 1, domain_id: 1, attrs: {}, created_at: "2015-02-09 17:21:39", updated_at: "2015-02-09 17:21:39", provider: nil, username: nil, password: nil, virtual: false, link: true, identifier: "", tag: "", attached_to: "", managed: false, mode: "balance-rr", attached_devices: "", bond_options: "", primary: true, provision: true, compute_attributes: {"type"=>"bridge", "bridge"=>"virbr4", "model"=>"virtio"}>,
#<Nic::Managed id: 6, mac: "52:54:00:60:82:97", ip: "192.168.131.2", type: "Nic::Managed", name: "bar5.provisioning.sapphire.local", host_id: 3, subnet_id: 2, domain_id: 2, attrs: {}, created_at: "2015-02-09 17:21:39", updated_at: "2015-02-09 17:21:39", provider: nil, username: nil, password: nil, virtual: false, link: true, identifier: "", tag: "", attached_to: "", managed: false, mode: "balance-rr", attached_devices: "", bond_options: "", primary: false, provision: false, compute_attributes: {"type"=>"bridge", "bridge"=>"virbr3", "model"=>"virtio"}>]

As you can see, the primary nic in the DB has the :61 mac, and the other nic has the :97 mac.

Do you guys have a further notes for a reliable reproducer, or a public system I can log into to see the bug firsthand?

#6 Updated by Anonymous over 6 years ago

Greg Sutcliffe wrote:

Marek, Eric, I can't reproduce this I'm afraid. I've created several hosts with various configurations of bridges and in every case the correct interface gets the correct bridge. Here's one example:

Greg, thank you for looking into this.

As you can see, the primary nic in the DB has the :61 mac, and the other nic has the :97 mac.

That is weird... I can reproduce the issue every time I create a virtual machine, and now do so with the bridge in the "wrong" order so that the libvirt domain gets created properly.

Do you guys have a further notes for a reliable reproducer, or a public system I can log into to see the bug firsthand?

Unfortunately, our foreman deployment cannot be reached from outside the corporate network. I could change the logging level, or trace any request/response you think could be interesting. I tried to find where, in the code, the nic attributes were used to create the libvirt domain but couldn't do so. If you tell me where, I could at least add a few statements in foreman and/or fog to see what the data looks like.

#7 Updated by Greg Sutcliffe over 6 years ago

Ok, so if you up the logs to debug, you should see the assignments between fog nic objects and foreman nics being logged. That logging comes from `app/models/concerns/orchestration/compute.rb:288` - you could drop some more logger.debug lines in there or at line 281 outside the loop. From there the code goes to `app/models/concerns/fog_extensions/libvirt/server.rb:42` so that's another place where you could try to dig into what's happening.

#8 Updated by Anonymous over 6 years ago

Greg, thank you for your help.

After adding a heavy dose of logging, I found the issue to be in the `add_interfaces_to_compute_attrs` method (https://github.com/theforeman/foreman/blob/develop/app/models/concerns/orchestration/compute.rb#L259). As the Hash is later sorted according to the key (https://github.com/theforeman/foreman/blob/develop/app/models/compute_resource.rb#L274), the nics end up in reverse order. We may have this behaviour because we are using ruby 2.1.5 to run foreman, I'm not sure. I'm not sure either why the `object_id` is used as a key in the Hash :)

If I change the code in `add_interfaces_to_compute_attrs` to rely on the `interfaces` Hash order (since it's guaranteed to be stable since ruby 1.9), it works properly. I just used a regular `index`, starting at `0`, for the key.

Also, the code in `nested_attributes_for` seems to expect keys with `new_` in their name, I'm not sure why.

Do you think replacing `object_id` as an ordered key would be a proper solution to the issue?

#10 Updated by Marek Hulán over 6 years ago

Would you mind creating a PR? You can find instructions at our wiki page if needed.

#11 Updated by Anonymous over 6 years ago

Marek, I don't mind but I'm definitely not sure about the impacts of the change, especially since the issue seems to only happen in my environment (and may be due to the use of ruby 2.1).

#12 Updated by Marek Hulán over 6 years ago

If you make a PR, we can run our test suite and someone will review and test it sooner or later. We might get feedback from others whether this is good or bad idea. TBH I'm not entirely sure how nic.object_id could fail.

#13 Updated by Anonymous over 6 years ago

Alright then, I will make a PR :) Thank you for your help.

#14 Updated by The Foreman Bot over 6 years ago

  • Status changed from Need more information to Ready For Testing
  • Pull request https://github.com/theforeman/foreman/pull/2191 added
  • Pull request deleted ()

#15 Updated by Daniel Lobato Garcia over 6 years ago

I cannot reproduce this either, I can record a screencast of what I do if needed

#16 Updated by Marek Hulán over 6 years ago

Even though I was unable to reproduce, I think this can potentially happen, http://projects.theforeman.org/issues/9170#note-8 explains it pretty good. Since none of use was able to reproduce I requested some proof from reporter, which might take some time but I'd consider this as 1.8 blocker. Not setting release to it, as I don't know how long can that take.

#17 Updated by Dominic Cleal over 6 years ago

  • Assignee set to Anonymous
  • Legacy Backlogs Release (now unused) set to 50

#18 Updated by Anonymous over 6 years ago

  • Status changed from Ready For Testing to Closed
  • % Done changed from 0 to 100

Also available in: Atom PDF