Bug #1151

closed

Too many systems in dashboard summary

Added by Jacob McCann over 12 years ago. Updated over 11 years ago.

Status:
Closed
Priority:
Low
Assignee:
Category:
Dashboard
Target version:
Difficulty:
Triaged:
Fixed in Releases:
Found in Releases:

Description

I am now noticing that hosts with modifications (active) are treated as both active and good. So I have 16 systems currently and it says 16 are good and 1 is active in my graph on the dashboard when I would expect 15 good and 1 active. This also affects the "Good Host Reports in the last x minutes" text summary.

Actions #1

Updated by Ohad Levy over 12 years ago

looking at the code, I can't figure out why you get duplicates:

  • active hosts = hosts that have applied or restarted resource
  • good hosts = hosts that don't have applied, restarted, failed or failed restarts

neither cares about skipped resources, so I can't see a reason why one host would be in both groups.
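The two definitions above can be sketched as a pair of predicates. This is only an illustration under stated assumptions: `active?` and `good?` are hypothetical helper names, the metrics hash is made up for the example, and none of this is Foreman's actual code. It shows why, by these definitions, a host with applied resources should never count as good.

```ruby
# Hypothetical predicates mirroring the two definitions in this comment.
# They are NOT Foreman's implementation, just a way to check the logic.

# Active: the host applied or restarted at least one resource.
def active?(metrics)
  metrics[:applied] > 0 || metrics[:restarted] > 0
end

# Good: nothing applied, restarted, failed, or failed to restart.
# Skipped resources are deliberately ignored by both predicates.
def good?(metrics)
  [:applied, :restarted, :failed, :failed_restarts].all? { |key| metrics[key].zero? }
end

# Made-up report for a host that applied changes and skipped some resources.
changed_host = { applied: 7, restarted: 0, failed: 0, failed_restarts: 0, skipped: 9 }

puts active?(changed_host) # true
puts good?(changed_host)   # false
```

Because `good?` requires `applied` and `restarted` to be zero while `active?` requires one of them to be non-zero, the two groups cannot overlap if both are evaluated against the same report.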

any idea?

Actions #2

Updated by Jacob McCann over 12 years ago

I'll do some more digging today. I'm hitting this regularly and it's easy to reproduce. I don't think it has anything to do with skipped resources this time. ;)

Actions #3

Updated by Jacob McCann over 12 years ago

I don't know if this will help.

mysql> select id,name,puppet_status from hosts;
+----+---------------+
| id | puppet_status |
+----+---------------+
|  2 |     150994944 | 
|  3 |     150994944 | 
| 10 |     150994944 | 
| 11 |     150994944 | 9 skipped, no failures/errors
| 13 |     150994944 | 
| 14 |     150994944 | 
| 15 |     150994944 | 
| 17 |     150994944 | 
| 18 |     150994944 | 
| 24 |     150994944 | 
| 25 |     184549376 | 
| 26 |     184549376 | 
| 27 |     184549376 | 
| 28 |     184549376 | 
| 36 |     150994944 | 9 skipped, no failures/errors
| 37 |     184549376 | 
| 38 |     184549376 | 
| 39 |     184549376 | 11 skipped, no failures/errors
| 40 |     184549376 | 
| 41 |     184549376 | 11 skipped, no failures/errors
| 42 |     150994944 | 9 skipped, no failures/errors
| 43 |     150994951 | 7 applied, 9 skipped, no failures/errors
+----+---------------+  

I couldn't figure out exactly how to translate the puppet_status, but I correlated it with the last run report, which suggests it's some mathematical way of showing the status of the system.

Anyway, the above systems/statuses produce this on my dashboard:

Description                                      Data
Good Host Reports in the last 60 minutes         22 / 22 hosts (100%)
Hosts that had performed modifications           1
Out Of Sync Hosts                                0
Hosts in Error State                             0
Hosts With Alerts Disabled                       0

And it's the host with id 43 that is showing as both 'good' and 'active'.

Actions #4

Updated by Ohad Levy over 12 years ago

Jacob McCann wrote:

I don't know if this will help.

[...]
I couldn't figure out exactly how to translate the puppet_status, but I correlated it with the last run report, which suggests it's some mathematical way of showing the status of the system.

the status number is actually a bit field, where each group of 6 bits represents one metric.
so it allows us to save all of the metrics (failed, restarted etc) in one integer.

it's probably easier to read the code, or even better, use the rails console to play with the values.

cd ~foreman
./script/console -e production
Host.all.each do |host|
  puts "#{host}: status => #{host.status.inspect}" 
end
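For readers without a Foreman console at hand, the 6-bit layout can be sketched in plain Ruby. Treat this as an assumption-laden illustration rather than Foreman's code: `decode_status`, `METRICS`, and the constants are hypothetical names, and the field order is a guess. Only two positions are supported by the table in comment #3: `applied` in the lowest 6 bits and `skipped` starting at bit 24. The order of the three fields in between is assumed.

```ruby
# Assumed field order, lowest bits first. Only :applied (bits 0-5) and
# :skipped (bits 24-29) are backed by the values posted in this thread;
# the positions of the middle three are an assumption.
METRICS = [:applied, :restarted, :failed, :failed_restarts, :skipped].freeze
BITS_PER_FIELD = 6
FIELD_MASK = (1 << BITS_PER_FIELD) - 1 # 0b111111, i.e. 63

# Split the packed integer into one counter per metric.
def decode_status(status)
  METRICS.each_with_index.map do |name, index|
    [name, (status >> (index * BITS_PER_FIELD)) & FIELD_MASK]
  end.to_h
end

# Values taken from the puppet_status column in comment #3.
[150994944, 184549376, 150994951].each do |status|
  puts "#{status}: #{decode_status(status).inspect}"
end
```

With this layout, 150994944 decodes to 9 skipped, 184549376 to 11 skipped, and 150994951 to 7 applied plus 9 skipped, which agrees with the report notes next to hosts 11, 39, and 43. One consequence of 6-bit fields is that each counter tops out at 63.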

Actions #5

Updated by Ohad Levy over 12 years ago

did you find anything?

thanks

Actions #6

Updated by Jacob McCann over 12 years ago

Here is a paste with: http://pastebin.com/w18RiGxV

Status of all recent hosts
Status of all hosts
Count of all hosts
Count of hosts recent.successful
Count of hosts recent.with_changes
Count of hosts recent.out_of_sync

So you can see the count is off by 1 currently for successful count ... unless (some) systems with recent changes are part of that count.

I say 'some' because there are times when everything does match up ...

If you want more output let me know. I'm not sure how to dig much deeper into this to help troubleshooting. :(

Actions #7

Updated by Tim Speetjens over 12 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

Actions #8

Updated by Ohad Levy over 12 years ago

  • Category set to Dashboard
  • Assignee set to Tim Speetjens