Why is HTTPS slower than HTTP when using wget?
If I try to download an image using wget, HTTP URLs seem to work fine whereas connections to HTTPS URLs are very slow.
There are a number of possible reasons why you might run into this issue. If your HTTPS wget requests are loading slowly, you can try the following options to see if they help, as suggested in this StackOverflow thread:
Disable IPv6 connections in favor of IPv4 with the -4 (--inet4-only) option.
Add a User-Agent to your request's header:
-U "Mozilla/5.0 (X11; U; Linux i686 (x86_64); en-GB; rv:1.9.0.1) Gecko/2008070206 Firefox/3.0.1"
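Disable certificate checking with --no-check-certificate (useful for diagnosis only, since it turns off verification of the server's identity).
Enable verbose output to debug with -v (--verbose).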
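To see which of these options actually makes a difference, it can help to time the same download once per option. The following is a minimal sketch, not a definitive tool: the function names, the shortened User-Agent string and the WGET override variable are assumptions made for illustration, while the wget options themselves (-4, -U, --no-check-certificate, -q, -O) are standard.

```shell
#!/bin/sh
# Sketch: time the same download with each suggested option.
# WGET is a hypothetical override hook; it defaults to the real wget.

run_timed() {
    label="$1"
    shift
    start=$(date +%s)
    # -O /dev/null discards the downloaded file; only the timing matters here.
    "${WGET:-wget}" -O /dev/null "$@"
    status=$?
    end=$(date +%s)
    echo "$label: $((end - start))s (exit status $status)"
}

compare_wget_options() {
    url="$1"
    run_timed "baseline" -q "$url"
    run_timed "IPv4 only" -q -4 "$url"
    # Placeholder User-Agent string; substitute the one you want to test.
    run_timed "custom User-Agent" -q -U "Mozilla/5.0 (X11; Linux x86_64)" "$url"
    run_timed "no certificate check" -q --no-check-certificate "$url"
}
```

Calling compare_wget_options with your HTTPS URL prints one timing line per variant. If one variant is much faster than the baseline, that points at the likely cause (for example, "IPv4 only" being fast suggests a broken IPv6 path).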
If the above don't solve your issue, you might also try adding the destination IP and hostname to the end of your hosts file to bypass DNS resolution, e.g.:
nano /etc/hosts
126.96.36.199 foo.bar.com
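If you would rather script this than edit the file by hand, something along these lines could work. It is only a sketch: add_hosts_entry is a made-up helper name, the address in the usage note is a documentation placeholder, and writing to the real /etc/hosts needs root privileges.

```shell
#!/bin/sh
# Sketch: append "IP hostname" to a hosts file unless the name is already there.
# The third argument is the file to edit and defaults to /etc/hosts.

add_hosts_entry() {
    ip="$1"
    host="$2"
    hosts_file="${3:-/etc/hosts}"

    # Look for the hostname as an exact field on any non-comment line.
    if awk -v h="$host" '
            $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
            END { exit !found }
        ' "$hosts_file" 2>/dev/null; then
        echo "$host already present in $hosts_file"
        return 0
    fi

    printf '%s %s\n' "$ip" "$host" >> "$hosts_file"
}
```

For example, add_hosts_entry 203.0.113.4 foo.bar.com /tmp/hosts.test lets you try it on a scratch file first. Remember to remove the entry again if the server's address changes, because wget will keep using the pinned address.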
Some may want a deeper understanding of why SSL and DNS might have an effect on wget's performance. See also Why is my web site so slow?
In brief, setting up an HTTPS connection takes several extra network round trips before we can even start downloading the content, and then there is the overhead of encrypting the communication.
A large portion of the time spent on a web request is often spent waiting for other things to happen. The main things are:
- waiting on DNS to return an IPv4 address (or an IPv6 address, or a mixture of both)
- verifying that the SSL certificate is valid, by checking that it is signed by an authority we trust and that the name on it matches the host we asked for.
- waiting on a network path that isn't working. For example, if you have a working IPv4 configuration and a non-working IPv6 configuration, you may spin your wheels waiting for a response on one protocol until wget times out and tries the working protocol.
- Specifying the user agent in the request can save time when the server treats unrecognized clients differently, for example by delaying requests that do not look like they come from a browser. (The User-Agent header is only sent after the encrypted connection is set up, so it does not affect the cipher negotiation itself.)
Let's say your IPv6 configuration is not working properly. You make a request with wget for www.example.com. Your resolver picks a nameserver round robin from your resolv.conf. It picks 2001:DB8::4. Your IPv6 configuration is not working, so we wait and wait and time out. Then we pick another nameserver, 2001:DB8::5. We wait again. Now we get 203.0.113.4 and off we go.
Now we contact the server. It has an IPv6 address and an IPv4 address. Our name server hands us the IPv6 address and we wait again. Then we try the IPv4 address.
We start to figure out how we are going to encrypt the traffic. We download the certificate from the other server. (Another round trip.) Do we trust these people? We have some certificates we trust. This site's certificate says it was signed by a certificate that we trust. We start to do a background check. We check whether the name on the certificate matches the host name we asked for. It checks out. We have to discuss what codes we are going to use. (More round trips.) Now we get down to the heavy math. The server is running slow, so we wait while we negotiate a shared symmetric key so we can encrypt our traffic. We do some extra encryption to exchange keys. (More round trips.)
We are misconfigured and have high hopes for this IPv6 thing, so we try the next address we have for the web server. We wait again. (In theory this could have saved us some time because there might be more than one web server, so we might have gotten faster results if we asked another server to do something complicated while the first server could conceivably still be busy, but in our case both addresses are the same server and one of them is a dead end.)
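One way to limit how long each of those waits can last is to tell wget to try IPv4 addresses first and to give up sooner on DNS lookups and connection attempts. This is a rough sketch under some assumptions: the function name and the WGET override variable are made up for illustration, and the 5 second limits are arbitrary values you would tune for your own network. The options --prefer-family, --dns-timeout and --connect-timeout are standard wget options.

```shell
#!/bin/sh
# Sketch: fetch a URL while bounding the waits described above.
# WGET is a hypothetical override hook; it defaults to the real wget.

fetch_prefer_ipv4() {
    url="$1"
    # --prefer-family=IPv4 : try IPv4 addresses before IPv6 ones
    # --dns-timeout=5      : give up on a DNS lookup after 5 seconds
    # --connect-timeout=5  : give up on a TCP connection attempt after 5 seconds
    "${WGET:-wget}" \
        --prefer-family=IPv4 \
        --dns-timeout=5 \
        --connect-timeout=5 \
        -O /dev/null "$url"
}
```

Unlike -4, --prefer-family=IPv4 still allows IPv6 as a fallback, so it is the gentler choice if IPv6 works on some of the networks you use.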
To see what is really happening, turn on the -v flag, which gives more information about what is going on behind the scenes.
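The verbose output shows each step (resolving, connecting, sending the request) but not how long each one took. A small filter that prefixes every line with the seconds elapsed since the start can make the slow step stand out. This is a sketch only, and stamp_lines is a made-up name for illustration.

```shell
#!/bin/sh
# Sketch: prefix each line read from stdin with the seconds elapsed so far.

stamp_lines() {
    start=$(date +%s)
    while IFS= read -r line; do
        now=$(date +%s)
        printf '[+%ss] %s\n' "$((now - start))" "$line"
    done
}
```

Since wget writes its log to standard error, a typical use would be: wget -v -O /dev/null https://foo.bar.com/ 2>&1 | stamp_lines. A large jump in the timestamp between a "Resolving" line and a "Connecting" line points at DNS, while a jump after "Connecting" points at the connection or the SSL setup.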
Putting the address of the server you want to visit in your /etc/hosts file will eliminate most of this back and forth with the DNS servers and can save a lot of time.