Bug #24634

closed

Smart proxy webrick 1.3 does not timeout SSL connections

Added by Adam Ruzicka over 5 years ago. Updated over 3 years ago.

Status: Resolved
Priority: Normal
Assignee:
Category: -
Target version: -
Difficulty:
Triaged: Yes
Fixed in Releases:
Found in Releases:

Description

Cloned from https://bugzilla.redhat.com/show_bug.cgi?id=1614087

Description of problem:
Smart Proxy crashes and stops responding to any request after being overloaded with a huge number of requests.

I can reproduce this issue by sending 1000 requests to the foreman proxy at the same time. Sending a higher number of requests makes reproduction practically guaranteed.

It is only reproduced on:
- Satellite >=6.3. I can't reproduce it on Satellite 6.2.15
- On port 9090. Port 8000 is fine.

Steps:
1) Run this in a terminal. Expect many request timeout errors.

foreman-rake console
1000.times { Thread.new { begin; RestClient::Resource.new('https://127.0.0.1:9090/features', verify_ssl: OpenSSL::SSL::VERIFY_NONE).get; rescue StandardError => e; p e.message; end } }

2) In another terminal, run the command below to check the connections.

lsof -i :9090 | wc -l

3) The issue is reproduced when you see a stuck connection.

lsof -i :9090
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
ruby 1728 foreman-proxy 8u IPv4 68616 0t0 TCP *:websm (LISTEN)
ruby 1728 foreman-proxy 12u IPv4 27104675 0t0 TCP localhost:websm->localhost:58542 (ESTABLISHED) <====== This

4) Making another request now takes forever.

# curl -v -k https://127.0.0.1:9090/features
* About to connect() to 127.0.0.1 port 9090 (#0)
* Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 9090 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb
# STUCK FOREVER ##########

5) Restarting foreman-proxy fixes the issue.
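
A minimal sketch for step 3, assuming the `lsof -i` output format shown above. The helper name `established_connections` is hypothetical (it is not part of Foreman or smart-proxy); it keeps only the lines in the ESTABLISHED state, so connections that linger after the load test has finished stand out.

```ruby
# Hypothetical helper, not part of smart-proxy: return the lines of
# `lsof -i :PORT` output that describe connections in the ESTABLISHED state.
def established_connections(lsof_output)
  lsof_output.each_line.map(&:strip).select { |line| line.end_with?('(ESTABLISHED)') }
end
```

From a Ruby console on the proxy host, something like `established_connections(%x(lsof -i :9090)).size` can be polled after the requests have finished. If nothing else is talking to the proxy, a count that never drops back to 0 suggests a stuck connection.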


Related issues 1 (0 open, 1 closed)

Related to Smart Proxy - Feature #25293: Add support for puma (Rejected, Ido Kanner)
Actions #1

Updated by Adam Ruzicka over 5 years ago

  • Subject changed from Smart Proxy will crash after overloaded with huge number requests to Smart Proxy will crash after overloaded with huge number requests
  • Assignee set to Ido Kanner
Actions #2

Updated by Lukas Zapletal over 5 years ago

I tried https://github.com/shekyan/slowhttptest, which is in EPEL7. For the basic attack (slow-headers GET), the proxy on my system (VM on a Ryzen 1700 CPU, 14 GB RAM, 4 cores) is able to handle about 200 open connections and then starts refusing them; the number of open connections peaks at 320: slowhttptest -u http://127.0.0.1:8000/features -c 1000 -g -o slow-headers-1000

During the test the service is intermittently unavailable, but once the test is stopped it recovers and processes requests at the normal rate, which on my system is about 500 per second: ab -c 10 -t 1 http://127.0.0.1:8000/features (this is the Apache ab testing utility, present in RHEL7).

So the key question is what the reporter sees. If it is the service being unavailable during a slow DoS attack, then this is normal: every HTTP server has a maximum number of open requests it can process before it starts refusing connections. The ability to handle about 200 requests at the same time is enough for the foreman-proxy service, which was built to serve requests from one Foreman server. That is typically 1-50 Passenger processes (the number of CPU cores), each typically doing a few concurrent requests, so we are talking about scaling to several dozen concurrent requests. We are hitting a limitation of the Ruby/Rack/Sinatra stack, which is one of the slower ones compared to Golang/Java/JavaScript.

Actions #3

Updated by Lukas Zapletal over 5 years ago

  • Subject changed from Smart Proxy will crash after overloaded with huge number requests to Smart proxy webrick 1.3 does not timeout SSL connections

Webrick 1.3 seems to have problems timing out connections on HTTPS endpoints:

# time telnet localhost 9090
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.

This never closes. If you try the HTTP port, it works fine (timeout of 30 seconds, which is webrick's default setting). It looks like version 1.4.2 (current stable) does work correctly: the connection is closed after 30 seconds.

Unfortunately Webrick 1.4.x requires Ruby 2.3 and we are still on RHEL Ruby 2.0. There are discussions about SCLing smart-proxy, which would solve this. Until then, there is no easy solution: according to the git log in webrick, there has been a huge number of patches regarding timeouts, concurrency, waits and synchronization.
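
The telnet check above can also be scripted. This is a minimal sketch, not something from smart-proxy itself: the helper name `idle_close_time` and its parameters are hypothetical. It opens a plain TCP connection, sends nothing, and reports how long the server takes to hang up. It uses `Process.clock_gettime` and `read_nonblock(..., exception: false)`, so it assumes the client side runs Ruby 2.1 or newer.

```ruby
require 'socket'

# Hypothetical helper: connect to host:port, send nothing, and wait for the
# server to hang up. Returns the seconds elapsed until the server closed the
# connection, or nil if it was still open after max_wait seconds.
def idle_close_time(host, port, max_wait)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  started = clock.call
  socket = TCPSocket.new(host, port)
  loop do
    remaining = max_wait - (clock.call - started)
    return nil if remaining <= 0
    # Wait until the socket is readable (data or EOF) or time runs out.
    next unless IO.select([socket], nil, nil, remaining)
    begin
      chunk = socket.read_nonblock(4096, exception: false)
    rescue SystemCallError
      chunk = nil # a reset also counts as the server closing the connection
    end
    return clock.call - started if chunk.nil?
    # Any data (or :wait_readable) means the connection is still open.
  end
ensure
  socket.close if socket
end
```

For example, `idle_close_time('127.0.0.1', 9090, 60)` would be expected to return roughly 30 on a server that enforces the 30-second timeout, and nil on one that keeps the idle connection open.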

Actions #4

Updated by Adam Ruzicka over 5 years ago

Does puma-3.11.4 (last version running on <=2.0) suffer the same issue? Couldn't we get away by swapping webrick for puma?

Actions #5

Updated by Lukas Zapletal over 5 years ago

Puma does seem to close inactive connections after 30 seconds. Since puma has zero gem dependencies, just a few basic C deps, this is doable. Let's add support for puma into launcher.rb and keep webrick as an opt-in fallback, just in case this turns out to be problematic in production.

https://community.theforeman.org/t/rfc-add-puma-as-the-default-smart-proxy-server/10975

Actions #6

Updated by Lukas Zapletal over 5 years ago

Ido, I started a discussion on the community list and it looks like you have assigned yourself. Do you want to work on this? It's a nice feature to work on, but keep in mind this must be tested in a production setup (RPM, daemon started from systemd, Red Hat, Debian and Windows).

Actions #7

Updated by Ido Kanner over 5 years ago

Lukas Zapletal wrote:

Ido, I started a discussion on the community list and it looks like you have assigned yourself. Do you want to work on this? It's a nice feature to work on, but keep in mind this must be tested in a production setup (RPM, daemon started from systemd, Red Hat, Debian and Windows).

Sure I want to work on this :)
I don't know yet how to test it though. I might need some help from you to understand the basics of how to set up such environments, and then I can go from there :)

Actions #8

Updated by Lukas Zapletal over 3 years ago

  • Status changed from New to Resolved
  • Triaged changed from No to Yes

For the record, this should be fixed. This was caused by a bug in webrick 1.3, which we used because we were on Ruby 2.0 from the RHEL7 base system. It did not properly close slow connections, so things like security scanners easily brought it down. This was fixed after we migrated to SCL Ruby 2.5, which has webrick 1.4, which fixes this particular problem. So this should not be a problem for Foreman 1.20 or above.

Actions #9

Updated by Lukas Zapletal over 3 years ago
