Bug #2687
Status: Closed
Performance issues with large ISC dataset (DHCP smart proxy)
Description
Hello all!
While working on adding support for more standard DHCP options when creating host reservations (pull request https://github.com/theforeman/smart-proxy/pull/97), which also includes a global search by MAC, IP, or hostname (the /dhcp/find/<record> call you'll see below), I did a lot of testing against a production dataset.
As I've already reported in other threads, I'm running into serious performance issues with the DHCP smart-proxy using the ISC DHCP backend.
Below is the data I collected by executing various (local) API calls against a DHCP proxy running on the following hardware:
$ facter | egrep "proc|mem"
memoryfree => 124.06 GB
memorysize => 125.76 GB
memorytotal => 125.76 GB
physicalprocessorcount => 2
processor0 => Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
<32 procs>
processorcount => 32
ISC DHCP dataset: 7656 subnets with 50848 leases
I've tried both WEBrick and Apache/Passenger, but that made no difference in the API response times, so I'm only listing the details from the WEBrick run. As you will see below, some calls (major ones from a functionality point of view) could not even complete within 10 minutes, so those were interrupted with ^C:
$ time curl -3 -H "Accept:application/json" -k -X GET https://localhost:8443/dhcp
real 0m37.618s
user 0m0.015s
sys 0m0.016s
$ time curl -3 -H "Accept:application/json" -k -X GET https://localhost:8443/dhcp/169.254.1.0
{"reservations":[],"leases":[]}
real 1m8.808s
user 0m0.012s
sys 0m0.008s
$ time curl -3 -H "Accept:application/json" -k -X GET https://localhost:8443/dhcp/169.254.1.0/00:50:56:39:ac:40
Record 169.254.1.0/00:50:56:39:ac:40 not found
real 1m8.572s
user 0m0.020s
sys 0m0.000s
$ time curl -3 -H "Accept:application/json" -k -X POST https://localhost:8443/dhcp/169.254.1.0 -d 'mac=00:50:56:39:ac:40' -d 'ip=169.254.1.203' -d 'hostname=blah'
^C
real 10m24.368s
user 0m0.012s
sys 0m0.024s
<had to create the above record through omshell>
$ time curl -3 -H "Accept:application/json" -k -X GET https://localhost:8443/dhcp/169.254.1.0/00:50:56:39:ac:40
{"ip":"169.254.1.203","hostname":"blah","mac":"00:50:56:39:ac:40","subnet":"169.254.1.0/255.255.255.0"}
real 1m8.628s
user 0m0.016s
sys 0m0.008s
$ time curl -3 -H "Accept:application/json" -k -X GET https://localhost:8443/dhcp/find/00:50:56:39:ac:40
^C
real 10m39.027s
user 0m0.020s
sys 0m0.016s
$ time curl -3 -H "Accept:application/json" -k -X DELETE https://localhost:8443/dhcp/169.254.1.0/00:50:56:39:ac:40
real 1m9.113s
user 0m0.012s
sys 0m0.012s
As you can see, the time even the successful calls take is clearly unacceptable on large datasets, and IMHO some refactoring has to be done.
While browsing the code, it became clear to me that building the subnet/lease maps on each request is the major contributor to this problem. Also, some of the extra validation performed before creating records is unnecessary, as omshell will do that validation much faster. I believe that, at least for the following operations, the smart-proxy should provide a much thinner layer (almost a pass-through) to omshell: creating, searching for, and deleting a host reservation.
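To illustrate what such a thin layer could look like, here is a minimal sketch in Ruby that hands a "create reservation" request straight to omshell instead of building any in-memory maps first. The class and method names are hypothetical (not the actual smart-proxy API), and it assumes OMAPI is enabled in dhcpd.conf on the default port 7911:

```ruby
# Hypothetical pass-through layer: omshell does the validation and the write,
# so no subnet/lease map is loaded at all.
class OmshellPassthrough
  def initialize(server: "127.0.0.1", port: 7911)
    @server = server
    @port = port
  end

  # Build the omshell command script for creating a host reservation.
  def create_script(mac:, ip:, hostname:)
    <<~SCRIPT
      server #{@server}
      port #{@port}
      connect
      new host
      set name = "#{hostname}"
      set hardware-address = #{mac}
      set ip-address = #{ip}
      create
    SCRIPT
  end

  # Pipe the script into omshell and return its output
  # (untested sketch; requires a running dhcpd with OMAPI enabled).
  def create(mac:, ip:, hostname:)
    IO.popen("omshell", "r+") do |om|
      om.write(create_script(mac: mac, ip: ip, hostname: hostname))
      om.close_write
      om.read
    end
  end
end
```

Search and delete would follow the same pattern (`open host` with the MAC set, then `remove` for deletion), keeping each API call at omshell speed regardless of dataset size.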
Not all of the operations are available through omshell (such as getting the list of subnets, or the leases/hosts in a particular subnet), so that code obviously has to stay, though maybe it can be improved in some ways to speed things up. Another example of something that cannot be done through omshell is getting all the options for a particular record, but building a full map of subnets/leases and then searching through it is not the most efficient way: only dhcpd.leases needs to be parsed, which would be much faster.
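As a rough illustration of the dhcpd.leases idea, the sketch below scans the leases text once for a single MAC instead of building the full subnet/lease map. The function name is hypothetical, and the block parsing is deliberately simplified (it assumes flat, non-nested braces, which is how dhcpd normally writes lease and host declarations):

```ruby
# Simplified one-pass scan of dhcpd.leases content for a given MAC.
# Returns a hash describing the first matching lease/host block, or nil.
def find_lease_by_mac(leases_text, mac)
  leases_text.scan(/^(lease|host)\s+(\S+)\s*\{(.*?)\}/m).each do |type, name, body|
    next unless body =~ /hardware\s+ethernet\s+#{Regexp.escape(mac)}\s*;/i
    # For a "lease" block the IP is the block name; for a "host" block
    # it comes from the fixed-address statement.
    ip = type == "lease" ? name : body[/fixed-address\s+([^;\s]+)/, 1]
    return { type: type, name: name, ip: ip, mac: mac }
  end
  nil
end
```

On a 50k-lease file this is a single regex pass over one file, rather than parsing every subnet declaration in dhcpd.conf plus all leases on each request.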
These are just my thoughts on some of these points, and I'd like to hear yours.
Thanks!