
100% CPU usage in Selector using Jetty on Linux #3022

Closed
prasadlvi opened this issue Oct 23, 2018 · 27 comments



prasadlvi commented Oct 23, 2018

I am running embedded Jetty 9.4.7.v20170914. I observed 100% CPU usage when there is no load on the server. After noticing #2205, I upgraded to the latest Jetty, 9.4.12.v20180830, but the result was the same.

Environment

java version "1.8.0_181"
Linux 3.16.38-tinycore64 #1 SMP Thu Nov 17 15:37:06 UTC 2016 x86_64 GNU/Linux

18-10-23 09:07:28,710 [ManagedSelector          ] [qtp1655055736-110           ] DEBUG - Selector sun.nio.ch.EPollSelectorImpl@227fa294 waiting with 0 keys
18-10-23 09:07:28,710 [ManagedSelector          ] [qtp1655055736-110           ] DEBUG - Selector sun.nio.ch.EPollSelectorImpl@227fa294 woken with none selected
18-10-23 09:07:28,710 [ManagedSelector          ] [qtp1655055736-110           ] DEBUG - Selector sun.nio.ch.EPollSelectorImpl@227fa294 woken up from select, 0/0/0 selected
18-10-23 09:07:28,710 [ManagedSelector          ] [qtp1655055736-110           ] DEBUG - Selector sun.nio.ch.EPollSelectorImpl@227fa294 processing 0 keys, 0 updates
18-10-23 09:07:28,710 [ManagedSelector          ] [qtp1655055736-110           ] DEBUG - updateable 0
18-10-23 09:07:28,711 [ManagedSelector          ] [qtp1655055736-110           ] DEBUG - updates 0
18-10-23 09:07:28,711 [ManagedSelector          ] [Jetty-93                    ] DEBUG - Selector sun.nio.ch.EPollSelectorImpl@756a46d5 woken with none selected
18-10-23 09:07:28,711 [ManagedSelector          ] [Jetty-93                    ] DEBUG - Selector sun.nio.ch.EPollSelectorImpl@756a46d5 woken up from select, 0/0/4 selected
18-10-23 09:07:28,711 [ManagedSelector          ] [Jetty-93                    ] DEBUG - Selector sun.nio.ch.EPollSelectorImpl@756a46d5 processing 0 keys, 1 updates
18-10-23 09:07:28,712 [ManagedSelector          ] [Jetty-93                    ] DEBUG - updateable 1
18-10-23 09:07:28,712 [ManagedSelector          ] [Jetty-93                    ] DEBUG - update org.eclipse.jetty.io.ChannelEndPoint$1@70d50d97
18-10-23 09:07:28,712 [ChannelEndPoint          ] [Jetty-93                    ] DEBUG - Key interests updated 0 -> 1 on SocketChannelEndPoint@626064f5{/192.168.1.179:49327<->/10.0.0.230:443,OPEN,fill=FI,flush=-,to=12/120000}{io=1/1,kio=1,kro=1}->SslConnection@72c821e0{NOT_HANDSHAKING,eio=-1/-1,di=-1,fill=INTERESTED,flush=IDLE}~>DecryptedEndPoint@3d398250{/192.168.1.179:49327<->/10.0.0.230:443,OPEN,fill=FI,flush=-,to=13/120000}=>HttpConnection@56f90049[p=HttpParser{s=START,0 of -1},g=HttpGenerator@3ef0ec53{s=START}]=>HttpChannelOverHttp@d841d4c{r=25,c=false,a=IDLE,uri=null,age=0}
18-10-23 09:07:28,712 [ManagedSelector          ] [Jetty-93                    ] DEBUG - updates 0
18-10-23 09:07:28,712 [ManagedSelector          ] [Jetty-93                    ] DEBUG - Selector sun.nio.ch.EPollSelectorImpl@756a46d5 waiting with 4 keys

I found the CPU-consuming threads using top -H -o %CPU -p <process id>

Looking at the thread dump, it seems the same two threads seen in the above log consume most of the CPU.
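(A note for anyone reproducing this: top -H prints native thread IDs in decimal, while the thread dump shows the same ID as a hexadecimal nid, so a quick conversion lines them up. The decimal TID below is just the hex nid from this dump converted back, for illustration.)

```shell
# top -H -o %CPU -p <process id> lists the hottest threads with decimal TIDs.
# The thread dump prints the same ID as a hex "nid"; convert to match them up.
# For example, a thread that top shows as TID 19191 is nid=0x4af7 in the dump:
printf '0x%x\n' 19191
# prints "0x4af7"
```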

"qtp1655055736-110" #110 prio=5 os_prio=0 tid=0x00007f090c01f800 nid=0x4af7 runnable [0x00007f08e9cde000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	- locked <0x00000000d27e0040> (a sun.nio.ch.Util$3)
	- locked <0x00000000d27e0030> (a java.util.Collections$UnmodifiableSet)
	- locked <0x00000000d27dfe08> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
	at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:422)
	at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:359)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:357)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:181)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- None

"Jetty-93" #93 daemon prio=5 os_prio=0 tid=0x00007f08f4374800 nid=0x49e4 runnable [0x00007f09125cb000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
	- locked <0x00000000d0a2f2b8> (a sun.nio.ch.Util$3)
	- locked <0x00000000d0a2f2a8> (a java.util.Collections$UnmodifiableSet)
	- locked <0x00000000d0a2f260> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
	at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
	at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:422)
	at org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:359)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:357)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:181)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
	at java.lang.Thread.run(Thread.java:748)

   Locked ownable synchronizers:
	- None

Does this look like a bug in Jetty? Or could it be a problem with my application?


sbordet commented Oct 23, 2018

If the selector is in a loop, you should have a ton of logs like the first snippet you posted, in particular like:

18-10-23 09:07:28,710 [ManagedSelector          ] [qtp1655055736-110           ] DEBUG - Selector sun.nio.ch.EPollSelectorImpl@227fa294 woken up from select, 0/0/0 selected

Can you please configure the Jetty thread pool in this way:

QueuedThreadPool threadPool = ...;
threadPool.setDetailedDump(true);

and take a server dump when you see 100% CPU?
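For reference, a minimal embedded setup along those lines might look like this sketch (the class name, port/handler wiring, and the dump call are placeholders, not taken from your application):

```java
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.util.thread.QueuedThreadPool;

// Sketch: build the Server on an explicit QueuedThreadPool so that
// setDetailedDump(true) includes per-thread detail in server dumps.
public class DetailedDumpServer {
    public static void main(String[] args) throws Exception {
        QueuedThreadPool threadPool = new QueuedThreadPool();
        threadPool.setDetailedDump(true); // richer per-thread info in dumps

        Server server = new Server(threadPool);
        // ... add connectors and handlers as usual ...
        server.start();

        // When the CPU spins, a server dump can then be taken with:
        // server.dumpStdErr();
        server.join();
    }
}
```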


prasadlvi commented Oct 24, 2018

I am working on getting the server dump.

But I got a thread dump with threadPool.setDetailedDump(true);

Yes, when the CPU goes to 100% I get the above log entries thousands of times continuously, like in an infinite loop. The thread with nid=0x3ec5 was consuming 100% of the CPU.

Can the continuous log entries themselves be considered a sign of a bug? Or are there occasions where Jetty can output thousands of debug logs like this but still work fine?

18-10-23 09:07:28,710 [ManagedSelector          ] [qtp1655055736-110           ] DEBUG - Selector sun.nio.ch.EPollSelectorImpl@227fa294 woken up from select, 0/0/0 selected

thread_dump.txt


sbordet commented Oct 24, 2018

The selector should not be woken up if there is nothing to do.

After #2755 (fixed in 9.4.12) we are also taking care of interrupted threads that continuously wake up the selector.

We need a server dump and possibly the DEBUG logs.
You have many sources of NIO usage; we also need to be sure that the threads that are spinning are Jetty threads (not those from Mina or DNS, etc.).

Can you reproduce this issue reliably?


hpmtissera commented Oct 25, 2018

Yes, I can reproduce the issue, but only in a Tinycore virtual appliance with 2 cores. I couldn't reproduce it with more than two cores, or on my Mac.

I have attached the DEBUG logs and Thread Dump.

I wonder whether this can be related to the following bug report.

thread-dump.txt


joakime commented Oct 25, 2018

OS Info: Linux 3.16.38-tinycore64, amd64
         processors: 2, architecture: amd64-64

Can you test with a recent (not-EOL) Linux 4.x kernel?
There are far too many bugs/quirks in the ancient Linux 3.x kernel with regard to JVM behavior.


prasadlvi commented Oct 29, 2018

I tried with Linux Kernel 4.19 in CentOS 7. Got the same issue. I attached the full log file. Below is an extract.

18-10-29 04:38:56,106 [ManagedSelector          ] [HttpClient@1757080785-201   ] DEBUG - Selector loop waiting on select
18-10-29 04:38:56,106 [ManagedSelector          ] [Jetty-275                   ] DEBUG - Selector loop woken up from select, 0/4 selected
18-10-29 04:38:56,107 [ManagedSelector          ] [HttpClient@1757080785-201   ] DEBUG - Selector loop woken up from select, 0/1 selected
18-10-29 04:38:56,107 [ManagedSelector          ] [Jetty-275                   ] DEBUG - Selector loop waiting on select
18-10-29 04:38:56,107 [ManagedSelector          ] [HttpClient@1757080785-201   ] DEBUG - Selector loop waiting on select
18-10-29 04:38:56,107 [ManagedSelector          ] [Jetty-275                   ] DEBUG - Selector loop woken up from select, 0/4 selected
18-10-29 04:38:56,107 [ManagedSelector          ] [HttpClient@1757080785-201   ] DEBUG - Selector loop woken up from select, 0/1 selected
18-10-29 04:38:56,107 [ManagedSelector          ] [Jetty-275                   ] DEBUG - Selector loop waiting on select
18-10-29 04:38:56,107 [ManagedSelector          ] [HttpClient@1757080785-201   ] DEBUG - Selector loop waiting on select
18-10-29 04:38:56,107 [ManagedSelector          ] [Jetty-275                   ] DEBUG - Selector loop woken up from select, 0/4 selected
18-10-29 04:38:56,107 [ManagedSelector          ] [HttpClient@1757080785-201   ] DEBUG - Selector loop woken up from select, 0/1 selected
18-10-29 04:38:56,107 [ManagedSelector          ] [Jetty-275                   ] DEBUG - Selector loop waiting on select
18-10-29 04:38:56,107 [ManagedSelector          ] [HttpClient@1757080785-201   ] DEBUG - Selector loop waiting on select
18-10-29 04:38:56,107 [ManagedSelector          ] [Jetty-275                   ] DEBUG - Selector loop woken up from select, 0/4 selected


sbordet commented Oct 29, 2018

@prasadlvi, the log only shows the loop, not what happened before the loop started, which is the critical information.
Also, if you can reproduce it, please take a server dump - not a thread dump.

@prasadlvi

Yes, I understand. I'll attach the necessary information as soon as possible.

@prasadlvi

I attached the server dump and the full log file.

jetty-server-dump.txt
server.log


sbordet commented Oct 31, 2018

@prasadlvi it seems you can reproduce this easily, so can you try a recent Linux kernel with JDK 11 (since the NIO implementation was basically rewritten in JDK 11)?


sbordet commented Oct 31, 2018

The logs show this:

  • Thread qtp2024612739-89 was waiting in select() then wakes up for reading.
  • Thread qtp2024612739-93 takes over selection while -89 reads.
  • Thread -93 sets the interest to 0 and goes back to select().
  • Thread -89 reads -1, writes the TLS close_notify, then desires read interest, so a task is submitted to the selector.
  • Thread -89 initiates an orderly close().
  • Thread -93 wakes up, sets the read interest to 1, then goes back to select().
  • Thread -93 immediately wakes up for read.
  • Thread -89 completes the close() so now the SelectionKey is invalid.
  • Thread -93 is processing the SelectionKey, detects that it is invalid, and calls close(), which at this point is a no-op.
  • Thread -93 goes back to select() and reports still having 1 key.
  • Thread -93 immediately wakes up, but now reports having 0 keys - the spin loop begins.

I cannot see anything wrong in the way Jetty handles this particular case.
There is a concurrent close, but those happen all the time, and a completely empty selector should not wake up, so this looks to me like a JDK bug.

@prasadlvi it's even more important if you can test with JDK 11.
If it still happens, we definitely need to put in a workaround for this behavior.

@brettwooldridge

@sbordet Co-worker of @prasadlvi here, we have tried on several 3.x and 4.x kernels.


sbordet commented Nov 1, 2018

@brettwooldridge

we have tried on several 3.x and 4.x kernels

And all of them show the spin loop?

Did you manage to try JDK 11?

@prasadlvi

I tried with JDK 11 and Linux 4.19.0 (jetty-9.4.12.v20180830) and was not able to reproduce the issue.


sbordet commented Nov 2, 2018

@gregw so this seems a JDK issue rather than an OS issue.

We would need to put a counter in the selector loop, detect when it spins (with zero keys?), then destroy and rebuild the selector - but we would also need to exclude that selector from being chosen when a new connection is accepted, which means we need a lock 😞
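Very roughly, the detection part could look like this sketch (plain NIO, not Jetty code; the threshold and names are arbitrary assumptions, and it glosses over exactly the locking problem above - nothing stops new registrations racing the rebuild):

```java
import java.io.IOException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

// Hypothetical spin-loop guard: count consecutive select() returns that
// deliver zero selected keys while the selector is completely empty, and
// past a threshold replace the broken Selector with a fresh one.
public class SpinGuard {
    private static final int SPIN_THRESHOLD = 512; // assumed value

    private Selector selector;
    private int zeroWakeups;

    SpinGuard(Selector selector) { this.selector = selector; }

    Selector selectOrRebuild() throws IOException {
        int selected = selector.select();
        if (selected == 0 && selector.keys().isEmpty()) {
            if (++zeroWakeups >= SPIN_THRESHOLD) {
                selector = rebuild();
                zeroWakeups = 0;
            }
        } else {
            zeroWakeups = 0; // real work happened; reset the counter
        }
        return selector;
    }

    private Selector rebuild() throws IOException {
        Selector fresh = Selector.open();
        for (SelectionKey key : selector.keys()) {
            // re-register each channel with the same interests and attachment
            key.channel().register(fresh, key.interestOps(), key.attachment());
        }
        selector.close();
        return fresh;
    }
}
```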


gregw commented Nov 2, 2018

Or we just System.exit(1) and let the server be restarted externally?


sbordet commented Nov 2, 2018

@gregw we can't System.exit(), there are other selectors that work fine.


gregw commented Nov 2, 2018

I think that random JVM failures are like out of memory exceptions.... who knows what else is broken in the JVM, so best to exit and try again.
But then again, I'm not really keen on filling Jetty with code that catches JVM issues and works around them.

Perhaps if a selector wakes up for no reason (0 keys and no tasks), then we can sleep for a while (say 50ms). That would stop the busy loop and we'd recover in 50ms if the selector comes good.... would be interesting to learn if the selector ever comes good?
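Something along these lines (a hypothetical helper, not the real selector loop; the 1 s select timeout and 50 ms pause are made-up values):

```java
import java.io.IOException;
import java.nio.channels.Selector;

// Hypothetical mitigation sketch: if the selector wakes with zero keys and
// nothing to do, pause briefly instead of spinning, so a misbehaving selector
// costs ~50 ms per wakeup rather than 100% of a core.
public class BackoffSelect {
    static int selectWithBackoff(Selector selector)
            throws IOException, InterruptedException {
        int selected = selector.select(1000); // bounded select (assumed timeout)
        if (selected == 0 && selector.keys().isEmpty()) {
            Thread.sleep(50); // back off; we recover in 50 ms if it comes good
        }
        return selected;
    }
}
```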

@brettwooldridge

Do not ever, under any circumstance, call System.exit(). You have no idea what the impact of that could be. Airplanes could crash, missiles could launch, respirators could stop.

Sleeping 50ms seems perfectly reasonable. Logging something ominous every minute that the condition persists is also a good idea, so as to not sweep the problem under the rug.


gregw commented Nov 4, 2018

@brettwooldridge I think System.exit() can be the right thing to do for some deployments, but it is hard for us to consider doing so for all deployments. However a busy loop in the selector isn't going to be much better for the airline/missile/respirator.

So the primary action should be to triage the JVM bug and make every effort to get that fixed. A secondary action would be to either find a change in behaviour in Jetty that avoids triggering the bug... or perhaps to insert a pause to mitigate the impact of continuing on in the face of the bug.


joakime commented Nov 5, 2018

I'm -1 on using System.exit() as it has no provisions for cleanup and proper shutdown behaviors, which will cause far more drastic issues as a side effect.


gregw commented Nov 6, 2018

Sorry, I think my "suggestion" of System.exit() has derailed the discussion here. What I should have said is that I'm -1 on writing code in Jetty to detect a JVM bug and to restart just one selector in one connector if it is detected, especially if that check and restart requires us to put a new contended lock on the connector. So I was just saying that doing an exit to restart the server would be preferable... but in reality neither is a good option.

So let's triage the issue some more and see if we can quantify exactly when it happens and if it can be raised with open/jdk. Only then can we work out if it can be detected in the code and some non-obnoxious work around put in place... but if the problem is not happening on current JVMs, then perhaps there is nothing we need to do as the problem is already fixed?


joakime commented Feb 17, 2019

Last update pointed to a JVM and/or kernel issue.
Closing as not a Jetty-specific issue.

joakime closed this as completed Feb 17, 2019

XuHe1 commented Sep 19, 2019

So, how can this be solved?


sbordet commented Sep 19, 2019

@XuHe1 please open a new issue and be sure to add a server dump and a proof that you have a spin loop in the selector, along with exact Jetty version, OS, JDK version, etc.

@bhuvangu

@prasadlvi how did you guys manage this behavior?


prasadlvi commented Sep 10, 2020

@bhuvangu It was fixed after updating to Java 11.
