Project

General

Profile

Actions

Bug #9170

closed

Wrong nic order in libvirt domain

Added by Anonymous almost 9 years ago. Updated over 5 years ago.

Status:
Closed
Priority:
Normal
Assignee:
-
Category:
Compute resources - libvirt
Target version:
Difficulty:
Triaged:
Fixed in Releases:
Found in Releases:

Description

Hi,

We are testing the new interfaces code that just got merged to develop, and are running a test instance off of 43c4bd72e4646cc9b2b4cb0533e84e08825eadda.
So far, it's pretty great, but we found a small issue with libvirt compute resources.

The scenario is:

  • go to the New Host page
  • fill in Host information, and choose a libvirt compute resource to deploy on
  • go to the Interfaces tab
  • edit the default interface, choose a domain, subnet, ip, leave the default information as-is
  • choose a bridge from the list, say bridgeA
  • submit
  • add a new interface, choose a domain, subnet, ip
  • choose a different bridge from the list, say bridgeB
  • submit
  • fill in the remaining information and save the host, which will get built

The libvirt domain created for the new host has 2 NICs, but they are in the "wrong order", the first NIC is bridged to bridgeB and the second to bridgeA. Plus, the MAC addresses that get saved to the database once the domain has been created are mixed up.
The other data, however, looks right, with the proper bridge name in the NIC attributes.


Related issues 2 (1 open1 closed)

Related to Foreman - Tracker #2409: NetworkingNew

Actions
Related to Foreman - Refactor #7456: Extract primary interface from hostClosedGreg Sutcliffe09/16/2014Actions
Actions #1

Updated by Dominic Cleal almost 9 years ago

Actions #2

Updated by Dominic Cleal almost 9 years ago

  • Related to Refactor #7456: Extract primary interface from host added
Actions #3

Updated by Dominic Cleal almost 9 years ago

  • translation missing: en.field_release deleted (28)
Actions #4

Updated by Marek Hulán almost 9 years ago

Greg is there something we can do about this (except fixing fog so it returns custom ids back)?

Actions #5

Updated by Greg Sutcliffe almost 9 years ago

  • Status changed from New to Need more information

Marek, Eric, I can't reproduce this I'm afraid. I've created several hosts with various configurations of bridges and in every case the correct interface gets the correct bridge. Here's one example:

Submitted data:
nic1 (primary: true, provision true) on bridge virbr4
nic2 (no flags) on bridge virbr3
libvirt data:
virbr4 nic has mac 52:54:00:59:b7:61
virbr3 nic has mac 52:54:00:60:82:97
rails console:
[3] pry(main)> Host.first.interfaces
Host::Managed Load (0.8ms) SELECT "hosts".* FROM "hosts" WHERE "hosts"."type" IN ('Host::Managed') LIMIT 1
Nic::Base Load (0.2ms) SELECT "nics".* FROM "nics" WHERE "nics"."host_id" = 3 ORDER BY identifier
=> [#<Nic::Managed id: 5, mac: "52:54:00:59:b7:61", ip: "192.168.130.24", type: "Nic::Managed", name: "foo5.primary.sapphire.local", host_id: 3, subnet_id: 1, domain_id: 1, attrs: {}, created_at: "2015-02-09 17:21:39", updated_at: "2015-02-09 17:21:39", provider: nil, username: nil, password: nil, virtual: false, link: true, identifier: "", tag: "", attached_to: "", managed: false, mode: "balance-rr", attached_devices: "", bond_options: "", primary: true, provision: true, compute_attributes: {"type"=>"bridge", "bridge"=>"virbr4", "model"=>"virtio"}>,
#<Nic::Managed id: 6, mac: "52:54:00:60:82:97", ip: "192.168.131.2", type: "Nic::Managed", name: "bar5.provisioning.sapphire.local", host_id: 3, subnet_id: 2, domain_id: 2, attrs: {}, created_at: "2015-02-09 17:21:39", updated_at: "2015-02-09 17:21:39", provider: nil, username: nil, password: nil, virtual: false, link: true, identifier: "", tag: "", attached_to: "", managed: false, mode: "balance-rr", attached_devices: "", bond_options: "", primary: false, provision: false, compute_attributes: {"type"=>"bridge", "bridge"=>"virbr3", "model"=>"virtio"}>]

As you can see, the primary nic in the DB has the :61 mac, and the other nic has the :97 mac.

Do you guys have a further notes for a reliable reproducer, or a public system I can log into to see the bug firsthand?

Actions #6

Updated by Anonymous almost 9 years ago

Greg Sutcliffe wrote:

Marek, Eric, I can't reproduce this I'm afraid. I've created several hosts with various configurations of bridges and in every case the correct interface gets the correct bridge. Here's one example:

Greg, thank you for looking into this.

As you can see, the primary nic in the DB has the :61 mac, and the other nic has the :97 mac.

That is weird... I can reproduce the issue every time I create a virtual machine, and now do so with the bridge in the "wrong" order so that the libvirt domain gets created properly.

Do you guys have a further notes for a reliable reproducer, or a public system I can log into to see the bug firsthand?

Unfortunately, our foreman deployment cannot be reached from outside the corporate network. I could change the logging level, or trace any request/response you think could be interesting. I tried to find where, in the code, the nic attributes were used to create the libvirt domain but couldn't do so. If you tell me where, I could at least add a few statements in foreman and/or fog to see what the data looks like.

Actions #7

Updated by Greg Sutcliffe almost 9 years ago

Ok, so if you up the logs to debug, you should see the assignments between fog nic objects and foreman nics being logged. That logging comes from `app/models/concerns/orchestration/compute.rb:288` - you could drop some more logger.debug lines in there or at line 281 outside the loop. From there the code goes to `app/models/concerns/fog_extensions/libvirt/server.rb:42` so that's another place where you could try to dig into what's happening.

Actions #8

Updated by Anonymous almost 9 years ago

Greg, thank you for your help.

After adding a heavy dose of logging, I found the issue to be in the `add_interfaces_to_compute_attrs` method (https://github.com/theforeman/foreman/blob/develop/app/models/concerns/orchestration/compute.rb#L259). As the Hash is later sorted according to the key (https://github.com/theforeman/foreman/blob/develop/app/models/compute_resource.rb#L274), the nics end up in reverse order. We may have this behaviour because we are using ruby 2.1.5 to run foreman, I'm not sure. I'm not sure either why the `object_id` is used as a key in the Hash :)

If I change the code in `add_interfaces_to_compute_attrs` to rely on the `interfaces` Hash order (since it's guaranteed to be stable since ruby 1.9), it works properly. I just used a regular `index`, starting at `0`, for the key.

Also, the code in `nested_attributes_for` seems to expect keys with `new_` in their name, I'm not sure why.

Do you think replacing `object_id` as an ordered key would be a proper solution to the issue?

Actions #10

Updated by Marek Hulán almost 9 years ago

Would you mind creating a PR? You can find instructions at our wiki page if needed.

Actions #11

Updated by Anonymous almost 9 years ago

Marek, I don't mind but I'm definitely not sure about the impacts of the change, especially since the issue seems to only happen in my environment (and may be due to the use of ruby 2.1).

Actions #12

Updated by Marek Hulán almost 9 years ago

If you make a PR, we can run our test suite and someone will review and test it sooner or later. We might get feedback from others whether this is good or bad idea. TBH I'm not entirely sure how nic.object_id could fail.

Actions #13

Updated by Anonymous almost 9 years ago

Alright then, I will make a PR :) Thank you for your help.

Actions #14

Updated by The Foreman Bot almost 9 years ago

  • Status changed from Need more information to Ready For Testing
  • Pull request https://github.com/theforeman/foreman/pull/2191 added
  • Pull request deleted ()
Actions #15

Updated by Daniel Lobato Garcia almost 9 years ago

I cannot reproduce this either, I can record a screencast of what I do if needed

Actions #16

Updated by Marek Hulán almost 9 years ago

Even though I was unable to reproduce, I think this can potentially happen, http://projects.theforeman.org/issues/9170#note-8 explains it pretty good. Since none of use was able to reproduce I requested some proof from reporter, which might take some time but I'd consider this as 1.8 blocker. Not setting release to it, as I don't know how long can that take.

Actions #17

Updated by Dominic Cleal over 8 years ago

  • translation missing: en.field_release set to 50
  • Assignee set to Anonymous
Actions #18

Updated by Anonymous over 8 years ago

  • Status changed from Ready For Testing to Closed
  • % Done changed from 0 to 100
Actions

Also available in: Atom PDF