Occasional very slow response times, that could only be fixed by service restart (Windows) #1319

arekdygas · 2018-05-09T13:18:01Z

Current Behavior

We have CouchDB (with dreyfus/Clouseau) running on Windows 2012 R2. Lately we have observed that, after some period of time, requests that are normally handled in milliseconds take much longer to complete. For example, running this series of requests:

GET db info
GET all design documents in db (_all_docs view query)
GET info about all design documents in db (about 70)

took over 1 minute, whereas it normally takes 1-2 seconds. After a restart, everything works as expected.
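A minimal sketch of how the request series above could be timed, so that a slowdown shows up as numbers per step. It is only an illustration: the function names, the `fetch` parameter, and the server address and database name in the usage note are assumptions, not taken from the report. The endpoints are the standard CouchDB ones (`GET /{db}`, `GET /{db}/_all_docs` limited to the `_design/` key range, and `GET /{db}/_design/{name}/_info`).

```python
import json
import time
import urllib.parse
import urllib.request


def http_get_json(url):
    """Fetch a URL and decode its JSON body (no authentication is sent)."""
    with urllib.request.urlopen(url, timeout=120) as resp:
        return json.load(resp)


def time_request_series(base_url, db, fetch=http_get_json, clock=time.perf_counter):
    """Run the three-step series and return the elapsed seconds per step."""
    timings = {}
    db_url = f"{base_url}/{urllib.parse.quote(db, safe='')}"

    # Step 1: database info.
    start = clock()
    fetch(db_url)
    timings["db_info"] = clock() - start

    # Step 2: list all design documents via an _all_docs key range.
    query = urllib.parse.urlencode({"startkey": '"_design/"', "endkey": '"_design0"'})
    start = clock()
    listing = fetch(f"{db_url}/_all_docs?{query}")
    timings["all_design_docs"] = clock() - start

    # Step 3: info for every design document returned in step 2.
    start = clock()
    for row in listing.get("rows", []):
        fetch(f"{db_url}/{row['id']}/_info")
    timings["design_doc_info"] = clock() - start

    return timings
```

Calling `time_request_series("http://127.0.0.1:5984", "mydb")` against a hypothetical local server returns a dict of per-step timings; logging it periodically would show when the slowdown begins.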

When the problem occurs, we see no db activity such as indexing or compaction. Windows performance counters indicate normal system resource usage (CPU about 5-10%, memory about 30%, no heavy disk usage). There are no errors or warnings in the CouchDB/Clouseau logs, and no errors in the Windows event logs.
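One way to check for background activity from a script is CouchDB's `/_active_tasks` endpoint, which returns a JSON list of running tasks, each with a `type` field (for example `indexer` or `replication`). The sketch below only groups that list by type; the helper names are hypothetical, and the fetch helper sends no credentials even though the endpoint typically requires a server admin.

```python
import json
import urllib.request
from collections import Counter


def summarize_active_tasks(tasks):
    """Count tasks by their "type" field, as found in the /_active_tasks response."""
    return Counter(task.get("type", "unknown") for task in tasks)


def fetch_active_tasks(base_url):
    """Fetch the task list from a CouchDB node (assumes no authentication is needed)."""
    with urllib.request.urlopen(f"{base_url}/_active_tasks", timeout=30) as resp:
        return json.load(resp)
```

An empty summary while requests are slow would support the observation that indexing and compaction are not the cause.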

Sometimes this happens after CouchDB has been running for a week or two, but recently it happened about 18 hours after the last restart. Unfortunately, when the problem occurs we have to restart the server ASAP and have no time for a real investigation.

Do you have any idea what might be happening? Any hints about what we should check or modify in the config files?

Steps to Reproduce (for bugs)

No idea.

Your Environment

Version used: 2.1.1 built using couchdb-glazier, with Lucene full text search (dreyfus and clouseau), single node, single shard
Operating System and version (desktop or mobile): Windows Server 2012 R2 (virtual machine with 16GB RAM, 4 virtual processors)
Databases: 360 total, about 30 in real use (there's continuous replication running for these 30 databases, initiated on backup server)

wohali · 2018-05-09T14:48:07Z

Are you using attachments? If so, how many per document, and how large is each attachment? What is the maximum size of each attachment?

If so, there is a known issue with excessive CPU/RAM usage in this scenario that will be fixed with the release of 2.2.0, see #745 .

DanKrt82 · 2018-05-10T06:25:57Z

We had the same problem on an installation of CouchDB (2.1.1) on Windows Server 2012 R2 (not virtual). We saw the problem only on DELETE requests to CouchDB.
CPU usage was stable and low, and so was RAM usage. We don't use any attachments in our environment. Unfortunately, the problem is on a critical machine and I need to solve it as soon as possible, so I cannot do much investigation or troubleshooting. I restarted the server and that didn't resolve the problem; after some time (some hours) the problem disappeared.

arekdygas · 2018-05-10T14:57:32Z

Thank you for the reply, Joan.

Yes - we use attachments. Generally they are quite small (up to 100KB), but some of them are larger (4-5MB). Currently we have a size limit of 10MB enforced by our application layer, but I have to check for any bigger attachments that might have been added to CouchDB before this limit was introduced.

I checked the issue you linked, and I can confirm that there are errors like that in our backup db (replication initiator). Unfortunately, they occurred two days before the actual issue, so I believe they might be unrelated.

wohali · 2018-05-10T16:56:52Z

Please note that CouchDB on Windows is not intended as a production quality platform. It may work for you, but there are reports of performance issues that are as of yet unresolved. We know that running the exact same load on Linux on the exact same machine configuration is significantly more performant! Further, we have no resources to help investigate these problems.

@DanielMidali that sounds unrelated.

@arekdygas Can you try adding the following to your etc\vm.args file:

-ssl session_lifetime 300

Then, restart the CouchDB service.

arekdygas · 2018-05-11T16:06:56Z

It looks like the problem was not related to replication after all.

We installed another CouchDB instance yesterday to test some application changes in a separate environment. One of our clients who was testing the changes reported that the system did not respond to requests. As this was just a test db, we were able to experiment with it a bit.

We discovered that the problem started when our app sent hundreds of requests to CouchDB - at some point db responses became slower and slower, until they were so slow that it looked like the db was not responding at all. In the end, the direct cause of the problem was the management of connections in our application - instead of reusing a single connection or a connection pool, each request created a separate connection. It looks like we reached some CouchDB limit, but we weren't sure how to increase it.

We changed max_dbs_open to 5000 in default.ini and added '+Q 65536' in vm.args (this should work like ERL_MAX_PORTS, but maybe I'm wrong). We also tried the suggested '-ssl session_lifetime 300' (just in case), but unfortunately nothing helped.

Finally, we changed the connection management in our application so that connections are now reused, and the problem doesn't occur anymore. This is great, but I must admit that I'd be glad to know what happened on the db side :)
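The thread does not show the application's code or language, so the following is only a sketch of the general idea of reusing one keep-alive connection instead of opening a new one per request, assuming a Python client built on the standard library's `http.client`. The class name `CouchClient` and the `connection_factory` parameter are hypothetical.

```python
import http.client
import json


class CouchClient:
    """Sends all requests over one reused HTTP connection."""

    def __init__(self, host, port=5984, connection_factory=http.client.HTTPConnection):
        self._host = host
        self._port = port
        self._factory = connection_factory
        self._conn = None

    def _connection(self):
        # Open the connection lazily, then keep returning the same object.
        if self._conn is None:
            self._conn = self._factory(self._host, self._port, timeout=30)
        return self._conn

    def get_json(self, path):
        conn = self._connection()
        try:
            conn.request("GET", path, headers={"Accept": "application/json"})
            resp = conn.getresponse()
            # The body must be read fully before the connection can be reused.
            body = resp.read()
        except (http.client.HTTPException, OSError):
            # Drop a broken connection so the next call opens a fresh one.
            self.close()
            raise
        return json.loads(body)

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

With this shape, hundreds of `get_json` calls share a single socket; a multi-threaded application would more likely want a pool of such clients, since one `http.client` connection handles only one request at a time.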

As for the Windows platform - I know that supporting this OS is not a priority for your team, but I didn't know that running CouchDB on Windows was not recommended for production. We will think about changing the environment.

Thank you very much for the hints and help, Joan!

wohali added the windows label May 9, 2018

wohali closed this as completed May 11, 2018

nerdvegas mentioned this issue Feb 28, 2020

Occasional extremely slow /_changes request #2612

Closed

This was referenced May 3, 2024

Slow couchdb after many queries #5044

Open

CouchDB queries are extremely slow after many queries are made hyperledger/fabric#4835

Open
