Bug #2687

Updated by Ohad Levy almost 6 years ago

Hello all!

While working on adding support for more standard DHCP options when creating host reservations (the pull request also includes a global search by MAC, IP or hostname, the /dhcp/find/<record> call you'll see below), I did a lot of testing against a production dataset.

As I've already reported in other threads, I'm running into serious performance issues with the DHCP smart-proxy when using the ISC DHCP backend.

Below is the data I collected by executing various (local) API calls against the DHCP proxy running on the following hardware:

$ facter | egrep "proc|mem"
memoryfree => 124.06 GB
memorysize => 125.76 GB
memorytotal => 125.76 GB
physicalprocessorcount => 2
processor0 => Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
<32 procs>
processorcount => 32

ISC DHCP dataset: 7656 subnets with 50848 leases

I've tried both WEBrick and Apache/Passenger, but that made no difference to the API response times, so I'm listing the details from the WEBrick runs only. As you will see below, some calls that are essential for functionality could not complete even within 10 minutes, so I interrupted them with ^C:

$ time curl -3 -H "Accept:application/json" -k -X GET https://localhost:8443/dhcp
real 0m37.618s
user 0m0.015s
sys 0m0.016s

$ time curl -3 -H "Accept:application/json" -k -X GET https://localhost:8443/dhcp/
real 1m8.808s
user 0m0.012s
sys 0m0.008s

$ time curl -3 -H "Accept:application/json" -k -X GET https://localhost:8443/dhcp/
Record not found
real 1m8.572s
user 0m0.020s
sys 0m0.000s

$ time curl -3 -H "Accept:application/json" -k -X POST https://localhost:8443/dhcp/ -d 'mac=00:50:56:39:ac:40' -d 'ip=' -d 'hostname=blah'
real 10m24.368s
user 0m0.012s
sys 0m0.024s

<had to create the above record through omshell>

$ time curl -3 -H "Accept:application/json" -k -X GET https://localhost:8443/dhcp/
real 1m8.628s
user 0m0.016s
sys 0m0.008s

$ time curl -3 -H "Accept:application/json" -k -X GET https://localhost:8443/dhcp/find/00:50:56:39:ac:40
real 10m39.027s
user 0m0.020s
sys 0m0.016s

$ time curl -3 -H "Accept:application/json" -k -X DELETE https://localhost:8443/dhcp/
real 1m9.113s
user 0m0.012s
sys 0m0.012s


As you can see, the time taken even by the successful calls is unacceptable on large datasets and, IMHO, some refactoring has to be done.

While browsing the code, it became clear to me that building the subnet/lease maps on every request is the major contributor to this problem. Also, some of the extra validation done before creating records is really unnecessary, as omshell will do that validation much faster. I believe that for at least the following operations the smart-proxy must provide a much thinner layer (almost a pass-through) to omshell: creating, searching for and deleting a host reservation.
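
To illustrate how thin such a layer could be, here is a rough sketch of shell helpers that only print the omshell commands for those three operations. The function names and the OMAPI_SERVER/OMAPI_PORT variables are made up for this example, key authentication (the omshell "key" command) is left out, and this is not how the smart-proxy currently talks to omshell.

```shell
#!/bin/sh
# Hypothetical helpers: each prints an omshell command script on stdout.
# OMAPI_SERVER and OMAPI_PORT are assumed names; 7911 is omshell's default port.
# Authentication ("key <name> <secret>") is omitted and would be required
# if the DHCP server is configured with an omapi-key.

OMAPI_SERVER="${OMAPI_SERVER:-localhost}"
OMAPI_PORT="${OMAPI_PORT:-7911}"

# Common preamble: where to connect.
omshell_header() {
    printf 'server %s\nport %s\nconnect\n' "$OMAPI_SERVER" "$OMAPI_PORT"
}

# Search: look up a host reservation by MAC address ($1).
omshell_find_host() {
    omshell_header
    printf 'new host\nset hardware-address = %s\nopen\n' "$1"
}

# Create: host name ($1), MAC ($2), IP ($3). hardware-type 1 is Ethernet.
omshell_create_host() {
    omshell_header
    printf 'new host\nset name = "%s"\n' "$1"
    printf 'set hardware-address = %s\nset hardware-type = 1\n' "$2"
    printf 'set ip-address = %s\ncreate\n' "$3"
}

# Delete: open the host object by name ($1), then remove it.
omshell_delete_host() {
    omshell_header
    printf 'new host\nset name = "%s"\nopen\nremove\n' "$1"
}
```

Each helper's output would be piped straight into omshell, e.g. `omshell_find_host 00:50:56:39:ac:40 | omshell`, and the proxy would only need to parse the reply, with no subnet/lease map involved.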

Not all operations are available through omshell (for example, getting a list of subnets, or of the leases/hosts in a particular subnet), so those obviously have to stay, though they could perhaps be improved to speed things up. Another thing that cannot be done through omshell is getting all the options for a particular record. Even so, creating a full map of subnets/leases and then searching through it is not the most efficient way to do that, as only dhcpd.leases needs to be parsed, which would be much faster.
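
As a rough sketch of what "parse only dhcpd.leases" could look like, the awk-based function below prints every lease or host block in a leases file that mentions a given MAC, without building any subnet map. The function name find_in_leases is made up for this example, and it assumes each block starts with "lease" or "host" at the beginning of a line and ends with a line starting with "}". Since dhcpd.leases is an append-only log, the same lease can show up more than once and the last entry is the current one, so a real implementation would have to keep only the final match.

```shell
#!/bin/sh
# Hypothetical helper: print all lease/host blocks that contain the given MAC.
#   $1 = MAC address (matched case-insensitively)
#   $2 = path to the leases file
find_in_leases() {
    awk -v mac="$1" '
        # A new block begins: reset the buffer and the match flag.
        /^(lease|host) / { block = ""; inblock = 1; hit = 0 }
        # Inside a block: collect its lines and check for the MAC.
        inblock {
            block = block $0 "\n"
            if (index(tolower($0), "hardware ethernet " tolower(mac) ";")) hit = 1
        }
        # Block ends: print it only if the MAC was seen.
        /^}/ && inblock { if (hit) printf "%s", block; inblock = 0 }
    ' "$2"
}
```

This makes one sequential pass over the file, so the cost grows with the size of dhcpd.leases alone rather than with the number of subnets.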

These are just my thoughts on some of the points, and I'd like to hear yours.