Our website monitoring mechanism is actually quite simple. We use curl to make a connection with the website and then we grab some data to check if your site is up or down (actually, curl has more power and we use it in our new features we develop right now as well).
To make sure we are able to test your site in almost equal time intervals, we need to have a couple of testing agents working at the same time.
The job of these agents is to sit and wait for available tasks for some period of time. If there are any, agent reserves one of them and does the actual website uptime test.
However, since we started the Ukrainian and Russian government websites monitoring site (lots of downtimes), we found out that some of the scripts sometimes freeze for no good reason.
It wasn’t a serious problem, because it didn’t influence the uptime monitoring service reliability. However, the zombie agents were taking our precious server resources to do nothing.
Curl has two timeout options
And then, there comes the time when we decided to fix this issue. While debugging we found out that the solution is quite trivial.
As you probably know, we assumed that if your site is not responding in 10 seconds after our call, it means that the website is down. Nobody is waiting 10 seconds for first document request anyway. Even if your server works, the user nearby our agent’s server location might not be able to notice this.
These 10 seconds of timeout are set as a curl option (CURLOPT_CONNECTTIMEOUT). It means “if the server is not responding in X seconds, do not wait more and return connection error [http 0]”.
What we didn’t notice is the other curl option – CURLOPT_TIMEOUT and it was the cause of our problems. The option is more important in this case since it forbids curl to freeze. Connection timeout is related to the connection only. If connection is made quickly, the clock for the whole script is not ticking any more.
If you plan to use a curl mechanism inside a do-while loop with iteration limit, remember to set a curlopt_timeout or your loop won’t die sometimes.