Monday, June 4, 2012

Crazy Google - PPPd - SSL BAD MAC error

UPDATE: follow up

Hi all. Long time no see. Just didn't have much to say lately. But now I do. So hi :)

A lot has changed lately, both personally and professionally, but the relevant part is: I have a new ISP. I moved and the awesome 50/10 Mbit 1&1.de VDSL was no longer available, so now I have a much crappier 16/1 Mbit ADSL by O2/Alice (after several months of borrowing the neighbor's WiFi). Also, instead of the fantabulous FritzBox 7390 I got a crappy Alice IAD 4412, or whatever the thing is called. It's 2.4 GHz only and 150 Mbit. And on top of that, O2/Alice is for some reason soooo worried that I would sell the router on eBay to buy a yacht that I have to return the damn thing at the end of the 24-month contract.

Of course the first thing I did was to disable everything internet-related on the router, enable PPPoE passthrough and set up PPPoE on my Linux box to act as a Torrent / Router / Firewall / Apache / Misc server. Since I had the same setup with 1&1, everything went pretty smoothly and all was fine. The End.

No, of course not. Everything did go smoothly, until I tried to use GMail. I got a nice SSL error page with the following message:

Secure Connection Failed
An error occurred during a connection to accounts.google.com.
SSL peer reports incorrect Message Authentication Code.

(Error code: ssl_error_bad_mac_alert)

I tried to google it, but since I use Google over HTTPS by default, it happened for www.google.com too! After an F5 (reload) it would work again.

I thought it might be an iptables problem, but the usual TCPMSS rule with --clamp-mss-to-pmtu did no good. Trying to debug, I wrote the following crude script:

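# Repeat the HTTPS request (TLSv1, no session ID reuse) until curl fails; print the attempt count.
C=0 E=0; while [ "$E" -eq 0 ]; do curl 'https://www.google.com' --no-sessionid -v -1; E=$?; C=$((C+1)); echo "$C"; done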

I ran it in multiple configurations of destination servers, hosts and connections. Since I still have access to the neighbor's WiFi, I also ran it there. It's worth mentioning that the neighbor uses the same ISP, and a traceroute shows that the second router down the road is already the same, so basically the only difference is the ppp method (me: pppd through the crappy router, he: crappy router directly).
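
The test runs described above can be generalized to compare hosts and connections. The following is only a minimal sketch, not the script I actually used: count_failures is a hypothetical helper that runs any command a fixed number of times and prints how many runs failed, so the same function can wrap curl against different servers. The host list and the run count of 100 are made up for illustration.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]
# Hypothetical helper: run CMD N times and print the number of failed runs.
count_failures() {
    n=$1
    shift
    fails=0
    i=0
    while [ "$i" -lt "$n" ]; do
        # Any non-zero exit status (e.g. a failed TLS handshake) counts as a failure.
        "$@" >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails"
}

# Example (needs network access, so it is left commented out):
# for host in www.google.com accounts.google.com www.yahoo.com; do
#     echo "$host: $(count_failures 100 curl -s -1 --no-sessionid "https://$host/") failures"
# done
```

Unlike the crude loop, this does not stop at the first error, which makes failure rates comparable between connections.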

Result: it fails ONLY when:
  • I use my connection (pppd through the router); it doesn't matter if it's a NATed machine or the server itself. Ran it over 11k times from other connections: no problem.
  • I connect to google servers (accounts.google.com, www.google.com). Ran over 2k connections to other https servers from same connection: no problem.
So who is to blame?
  • Google: no, it works fine from my work machine and the neighbor's WiFi.
  • pppd: no, it worked before with 1&1.
  • Alice ADSL: no, the neighbor has Alice as well.
  • Crappy router: no, it works fine when connecting to facebook, yahoo, deutsche bank, etc.
  • Combination of all of the above: well, it works sometimes.
Solution? Sadly, I have none. I have a packet capture that shows exactly when the problem happens most often:
  • Client Hello
  • Server Hello
  • Server Certificate, Server Key Exchange, Server Hello Done
  • Client Key Exchange, Change Cipher Spec, Encrypted Handshake Message (encrypted MAC of handshake).
  • Server issues SSL Alert: Bad Record MAC.
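
To count how often a connection dies with exactly this alert, the client output can be classified automatically. This is a minimal sketch under assumptions: is_bad_mac is a hypothetical helper, and the matched strings are my guess at how different clients word the alert (the NSS message quoted earlier, plus the "bad record mac" wording that OpenSSL-based clients tend to use).

```shell
#!/bin/sh
# is_bad_mac: read client output on stdin; succeed if it mentions a
# bad_record_mac alert. The matched strings are assumptions about how
# NSS-based and OpenSSL-based clients word the error.
is_bad_mac() {
    grep -q -i -E 'bad record mac|incorrect Message Authentication Code|ssl_error_bad_mac_alert'
}

# Example (needs network access, so it is left commented out):
# curl -v -1 --no-sessionid https://www.google.com/ 2>&1 | is_bad_mac && echo "bad MAC alert"
```

This separates the bad MAC failures from unrelated ones such as timeouts or DNS errors when running many attempts.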
Googling did not help either. Some suggest that correcting the clock would help, but the same client fails only on one specific connection. Anyway, I synced the clocks of all involved machines: no joy. Clearing the cache: curl doesn't even have a cache. Anyway, it didn't help. The closest online discussion of the problem is this thread. On other threads, some people hint that the problem depends on the particular connection, but nobody offers a decent solution.

I know it's a long shot, but: does anybody out there have an idea on how to fix this? Even a hint towards a method to debug it further would be greatly appreciated. Obstacles to a good debugging method:
  • MAC is over random numbers: any comparison with different server/connection handshakes is useless.
  • Since it is connection-dependent, a client-side error is rather unlikely.
  • The sent MAC is encrypted, so it's hard to analyze with Wireshark.
The last ideas I have are to capture the ppp packets and check whether their contents differ from what goes over wlan0 (unlikely, since only Google complains (only Google checks?? unlikely...)), or to try a different router, or to set the router to gateway mode instead of PPPoE modem mode...
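
The capture comparison could be set up roughly as follows. This is a sketch under several assumptions: that the PPP interface is named ppp0, that the PPPoE session frames travel over wlan0, and that tcpdump is installed and run as root. capture is a hypothetical helper; with DRY_RUN set it only prints the command instead of running it.

```shell
#!/bin/sh
# capture IFACE OUTFILE FILTER
# Hypothetical helper: write full-size packets matching FILTER on IFACE to OUTFILE.
capture() {
    iface=$1
    out=$2
    filter=$3
    # With DRY_RUN set, the command is echoed instead of executed.
    ${DRY_RUN:+echo} tcpdump -n -i "$iface" -s 0 -w "$out" "$filter"
}

# Run each in its own terminal while the curl loop is running
# (interface names are assumptions, adjust to the actual setup):
# capture ppp0  ppp0.pcap  'tcp port 443'             # plain IP, as seen after pppd
# capture wlan0 wlan0.pcap 'pppoes and tcp port 443'  # same traffic inside PPPoE
```

The pppoes keyword makes the rest of the filter apply to the encapsulated packet, so both captures should contain the same TCP segments; any payload difference between them would point at the pppd side.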

As said before, any idea will be appreciated!

