Our application started having integration issues in early March this year. Did I say they were intermittent? To add to our confusion, we could not replicate them in our testing environments!
The integration point is a 3rd party payment gateway. Our application logs indicated that our code was not getting hit by the payment gateway site.
We had not changed our code in years. I know. I know. That's the excuse we developers give whenever you confront us with a problem. This was different. No code changes. No changes to settings in the payment gateway either.
What happened? What changed? Why was this occurring?
During April and May, the occurrences reduced. Just when we were ready to write it off as another network issue, it resurfaced in late June. It spiked in July and got even worse in August and September!
While we were rummaging through the logs (and the code) over and over again, a StackOverflow conversation appeared in our inbox - regarding a restriction imposed by Chrome on the SameSite attribute of cookies. We were sure that we were neither setting any cookies nor using ASP.NET Session in our code, BUT the trend matched the SameSite timeline.
Why was this happening? And why not in testing environments?
During our investigation, one thing caught our eye - there were entries in the IIS logs for the integration URL (our app) with no userid (empty cs-username). The timestamps of the problematic transactions in the payment gateway matched the timestamps of these entries. Where does this userid come from? It turned out to be Siteminder, our Identity Provider.
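The log check described above can be sketched roughly as follows. This is a minimal illustration, not the script we actually ran: it assumes the W3C extended log format (where an empty field is written as `-`), and the function name, the `/payment/callback` path, the sample entries and the status codes are all made up for the example.

```python
from typing import Dict, Iterable, List


def find_anonymous_hits(lines: Iterable[str], uri_stem: str) -> List[Dict[str, str]]:
    """Return W3C log entries for uri_stem whose cs-username is empty ("-")."""
    fields: List[str] = []
    hits: List[Dict[str, str]] = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            # The header names the columns for the entries that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue  # skip blank lines, other directives, and headerless data
        entry = dict(zip(fields, line.split()))
        if entry.get("cs-uri-stem") == uri_stem and entry.get("cs-username") == "-":
            hits.append(entry)
    return hits


# Hypothetical sample; real IIS logs usually carry more fields than this.
sample_log = """\
#Fields: date time cs-method cs-uri-stem cs-username sc-status
2020-09-01 10:15:02 POST /payment/callback - 401
2020-09-01 10:16:40 POST /payment/callback jdoe 200
2020-09-01 10:17:05 GET /home - 200
""".splitlines()

for hit in find_anonymous_hits(sample_log, "/payment/callback"):
    print(hit["date"], hit["time"], hit["sc-status"])
```

The timestamps printed this way are what you would then line up against the failed transactions reported by the payment gateway.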
Through our FRT (Failed Request Tracing) logs we found that the SMSESSION cookie (set by Siteminder) was not being passed by the browser when navigating back from the payment gateway for these transactions. Cookie! Found the cookie! And guess what! We were not setting the SameSite attribute for the cookie! Since SMSESSION was not being returned by the browser when navigating back from the payment gateway, the users were getting kicked off, hence the blank userids in the IIS logs (which means Access Denied).
Whew!
So, here is what the browser (Chrome, dear Chrome) does: when you don't set the SameSite attribute on a cookie you create, it treats the cookie as if it had been set with a value of Lax. Lax is less restrictive than Strict, with one caveat - a Lax cookie is still sent when the user arrives from another site through an ordinary link (a top-level GET navigation), but it is not sent when the cross-site request is a POST. The lesson learned is that if you are creating a cookie, you should specify the SameSite attribute explicitly - with one of the values None, Lax or Strict. With None, you also have to set the Secure attribute for it to work.
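To make the rules above concrete, here is a small sketch that models the decision a browser makes. It is a deliberately simplified model under my own assumptions, not Chrome's actual implementation: the function and parameter names are invented for illustration, and it ignores browser-specific exceptions and older browsers.

```python
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def cookie_is_sent(same_site: str, cross_site: bool, method: str,
                   top_level_navigation: bool, secure_cookie: bool = True) -> bool:
    """Simplified model: would the browser attach this cookie to the request?"""
    policy = same_site.lower()
    if policy not in {"strict", "lax", "none"}:
        raise ValueError(f"unknown SameSite value: {same_site!r}")
    if not cross_site:
        return True  # same-site requests always carry the cookie
    if policy == "strict":
        return False  # never sent on cross-site requests
    if policy == "lax":
        # Only top-level navigations with a safe method (e.g. following a link).
        return top_level_navigation and method.upper() in SAFE_METHODS
    # SameSite=None: sent cross-site, but only if the cookie is marked Secure.
    return secure_cookie


# A cross-site POST back to our app, as in the payment gateway return trip.
print(cookie_is_sent("Lax", cross_site=True, method="POST", top_level_navigation=True))
print(cookie_is_sent("None", cross_site=True, method="POST", top_level_navigation=True))
```

The first call prints `False` and the second prints `True`, which is the difference between a user being kicked off and a transaction coming through.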
You can learn more about SameSite here. This article describes how the SameSite attribute changes affect SMSESSION in a cross-domain setting.
Finally, we deployed the change in the Siteminder agent to set the SameSite attribute to None, and voilà! We started seeing more and more transactions coming in from the payment gateway. Had it not been for Covid-19, there would have been a party at our workplace.
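For anyone setting such a cookie from their own code rather than through a Siteminder agent, the resulting header would look something like the output of this sketch. It uses Python's standard `http.cookies` module (the `samesite` attribute needs Python 3.8 or later); the helper name and the cookie value are placeholders, and this is not how the Siteminder agent itself is configured.

```python
from http.cookies import SimpleCookie


def build_cross_site_cookie(name: str, value: str) -> str:
    """Build a Set-Cookie header value that browsers will send cross-site."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    cookie[name]["samesite"] = "None"  # allow cross-site requests, POST included
    cookie[name]["secure"] = True      # required alongside SameSite=None
    return cookie[name].OutputString()


# "opaque-token" is a placeholder, not a real session value.
header_value = build_cross_site_cookie("SMSESSION", "opaque-token")
print("Set-Cookie:", header_value)
```

Checking the response headers in the browser's developer tools for both `SameSite=None` and `Secure` is a quick way to confirm the change actually went out.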
Every thriller has a twist and a cliffhanger.
The twist is that SameSite is not supported by all browsers. We have to keep that in mind when we implement the solution. The incompatible browsers are listed in the article.
The cliffhanger is that, to this day, we have not been able to reproduce the issue in our testing environments.
Very nicely written, Rajeev !!
We had the same issue a couple of months ago and spent countless hours finding an apt resolution. The details that you provided are very helpful to others facing the same issues. We ended up doing exactly what you posted and now our failure rate went down to 1-5% from 30-50%.
Thank you for the informative post!!
-Rahul