Yesterday a friend told me that he could not use wget to download a web page that was protected by HTTP authentication.
Basically, HTTP authentication involves two round trips. The client first requests the resource and the server returns a 401 response, so the client knows that authentication is required (the server names the auth method, Basic or Digest, in the WWW-Authenticate response header). Then the client sends the same request again, this time adding an “Authorization” header field. The server checks this header and, if authentication succeeds, responds with the requested resource and a 200 status code. If authentication fails, it sends another 401 response; normally the client then stops trying and tells the user about the failure.
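The second request of that exchange can be sketched in a few lines of Python. This is only a minimal illustration, assuming the simpler Basic scheme, where the Authorization value is the base64 encoding of user:password. The helper names pick_scheme and basic_authorization, the realm and the credentials are all made up for the example.

```python
import base64


def pick_scheme(www_authenticate: str) -> str:
    """Return the scheme name (e.g. 'Basic' or 'Digest') that leads the
    WWW-Authenticate header value sent along with a 401 response."""
    return www_authenticate.strip().split(None, 1)[0]


def basic_authorization(user: str, password: str) -> str:
    """Build the Authorization header value for the Basic scheme."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


# Round trip 1: the server answers 401 and names the scheme.
challenge = 'Basic realm="example"'  # hypothetical challenge value

# Round trip 2: the client repeats the request with this header added.
if pick_scheme(challenge) == "Basic":
    print("Authorization:", basic_authorization("user", "pass"))
```

Digest follows the same two-step pattern, but the client sends a hash derived from the password and a server-supplied nonce instead of the password itself.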
I first tried to open the page in Firefox, and it loaded correctly after I entered the credentials my friend gave me. I opened Firebug and found in the headers that the site was using Digest authentication. But it looks like wget does support Digest auth and decides on the method automatically:
wget http://a.com/somefile.txt --http-user=user --http-password=pass
However, it just didn’t work. So I added the “-d” option to the wget command to see what was different in the headers. It turned out that the Authorization header of the Firefox request had these items:
qop=auth, nc=00000003, cnonce="1acac079e49bddc9"
but wget’s did not! Later I found that someone had already reported the same problem as a bug in wget.
So I switched to curl:
curl --user="user:pass" http://a.com/somefile.txt --digest -o somefile.txt
As you can see, curl is not as clever as wget: you have to tell it the authentication method (though it does provide a magic option, --anyauth). But it’s robust and reliable, and it works.
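For completeness, roughly the same download can be done from Python's standard library, which has its own Digest handler. This is just a sketch: build_digest_opener is a name I made up, the URL and credentials are the same placeholders as in the commands above, and the actual download is left commented out because it needs network access.

```python
import urllib.request


def build_digest_opener(url: str, user: str, password: str) -> urllib.request.OpenerDirector:
    """Return an opener that answers Digest challenges for the given URL."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None means "use these credentials for any realm at this URL".
    password_mgr.add_password(None, url, user, password)
    handler = urllib.request.HTTPDigestAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


url = "http://a.com/somefile.txt"  # placeholder URL
opener = build_digest_opener(url, "user", "pass")

# Uncomment to perform the download:
# with opener.open(url) as resp, open("somefile.txt", "wb") as out:
#     out.write(resp.read())
```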
By the way, my friend was trying to access an “https://…” URL, so at first I thought it was some problem with wget’s SSL implementation. But as it turned out, this had nothing to do with SSL.