Running a high-growth startup or even a large multinational brand is not an easy task. To function like a well-oiled machine, different parts of the 'machinery' need to be in good condition. Sometimes systems break down due to human or machine errors. How companies react in these situations can make or break their reputation. Some choose to stay quiet, while others are bold enough to issue a public apology while trying to fix the problem.
Myntra on Friday took ownership of an error it recently committed and issued a public apology and an explanation of how it happened. Shamik Sharma, CTO of Myntra, said in a company blog post,
“Yesterday we unintentionally inundated many of your phones with notifications. We messed up and owe you an explanation of what happened and what we are doing to ensure it doesn’t happen again.”
On Thursday, May 19, at around 2:00 pm, Myntra's notifications team updated their fleet of notification servers with a code change. It took about three minutes for the deployment systems to update all the servers with the change.
Within minutes, many users, including several Myntra employees, started reporting that they were being bombarded with notifications unrelated to their interactions with Myntra. The error caused panic and outcry on social media, as many users saw order confirmations and multiple shipment notifications for products they had not ordered.
Realising this, Myntra immediately stopped their notification systems and started to troubleshoot. But the damage was already done, as a lot of notifications had already been sent. Myntra noted,
“We did cancel a lot of the notifications that were en route, but unfortunately by then a lot of our customers had already received them.”
Finding the root cause
Myntra explained that after shutting down the notification sending systems, the team worked on reviewing why the error had occurred.
“We realized that the problem was not with the new code base - which had been tested independently - but with how it was deployed.”
Notification systems require a set of “transformations” to a message before it is sent - for example, adding the recipient’s name inside the message and adding the list of users to whom the message should be sent. The new code had a “schema change” - the list of recipients was now expected to be in a new field called “userId” rather than “recipient”.
When Myntra deployed its new code, there was a short period (2 min 37 sec) when the new code was active while notifications created by the older code were still being processed. This led to a “race condition” - the old code had already added the recipient in the old field (“recipients”) while the new code was expecting it in the new field (“userId”). When the userId wasn't found, it was left blank.
Myntra admitted that defensive code should have been written for this case, but the company missed this. Notification messages that went through the system during this intermediate state became untargeted notifications (i.e. they did not have any userIds). These in turn were broadcast to a very large set of users.
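The kind of defensive check Myntra refers to could look roughly like the sketch below. It is not Myntra's code: the function name, the exception class and the dictionary-shaped message are hypothetical, and the field names are taken from the description above (the old field is assumed to be “recipients”). The idea is to accept a message written under either schema and to refuse to send anything that ends up with no explicit recipient, instead of letting it fall through as a broadcast.

```python
from typing import Any, Dict, List

OLD_FIELD = "recipients"  # assumed name of the field written by the old code
NEW_FIELD = "userId"      # field the new code expected, per the article


class UntargetedNotificationError(ValueError):
    """Raised when a message carries no explicit recipients (hypothetical)."""


def resolve_recipients(message: Dict[str, Any]) -> List[str]:
    """Return the recipient list, accepting either schema version.

    Assumes the field holds a single id or a list of ids.
    Fails loudly rather than returning an empty list, so that a
    missing field can never be interpreted as "send to everyone".
    """
    # Prefer the new field, fall back to the old one during rollout.
    raw = message.get(NEW_FIELD) or message.get(OLD_FIELD) or []
    if isinstance(raw, str):
        raw = [raw]
    # Drop blank entries and normalise ids to strings.
    recipients = [str(r) for r in raw if r]
    if not recipients:
        raise UntargetedNotificationError("refusing to send: no recipients")
    return recipients
```

With a check like this, a message created by the old code would still be delivered only to its intended users during the deployment window, and a message with neither field would be rejected rather than broadcast.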
Fixing it and looking forward
Myntra added that there were several causes of the problem and they should have had more stringent checks in place. Over the last 24 hours, they claim to have added some of these checks and will be adding more. Myntra will also be reviewing all their systems and processes - and looking at their architecture as well as deployment systems in depth to check for any other such shortcomings.
“The last several hours have been a humbling period for us and we deeply regret the terrible customer experience this incident has caused. I am reaching out to the affected customers and apologizing for this error. We will strive to make it up to you by committing ourselves towards building a Myntra shopping experience that is truly wonderful.”
Over the past few years, many brands have been transparent and owned up to their shortcomings. In 2014, Flipkart issued a public apology after its Big Billion Day sale did not go as expected. Ola recently apologised and took down an ad after a public outcry over its sexist nature. In late 2015, Volkswagen took out multiple front-page ads in newspapers to apologise to consumers for its emissions scandal.
In this age of technology, where bad news travels at lightning speed, brands have realised that taking ownership of problems quickly and being transparent can help them in the long run, rather than trying to sweep them under the proverbial rug.