On Nov 14th, 2023 we experienced an unexpected interruption in our Bridge Automatic Withdrawal Service. Here’s a post-mortem about it.
On the aforementioned date, we experienced an unexpected interruption in our Automatic Withdrawal Service for the StarkGate Bridge. This service simplifies bridging funds between Starknet and Ethereum. It improves the user experience by eliminating the need for an additional L1 claim transaction once the bridge transfer is finalized on Ethereum mainnet. No user funds were at risk and normal service resumed within a day. This report dissects the incident, identifies its causes, and outlines measures to prevent future occurrences.
The Automatic Withdrawal Service simplifies the user experience by eliminating the need for a claim action on Ethereum mainnet. It allows users to transfer funds from Starknet to Ethereum seamlessly, in one click, without incurring an additional fee: the amount paid to the SpaceShard Relayer covers the L1 gas fee for the claim. Our system uses an indexer to monitor these transactions and confirm that the funds were successfully transferred to our address. Once the transaction is confirmed on L1, our relayer executes the claim on the user's behalf and transfers the funds to the user's address.
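The flow described above can be sketched roughly as follows. This is a simplified illustration, not our production relayer: the `Withdrawal` type, the `process_withdrawal` function, and the `claim_and_transfer` callback are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    WAITING_FOR_L1 = "waiting_for_l1"  # seen by the indexer, not yet finalized on L1
    CLAIMED = "claimed"                # relayer has claimed and forwarded the funds


@dataclass
class Withdrawal:
    """Hypothetical record built from a transaction reported by the indexer."""
    user_l1_address: str
    amount: int
    status: Status = Status.WAITING_FOR_L1


def process_withdrawal(
    withdrawal: Withdrawal,
    finalized_on_l1: bool,
    claim_and_transfer: Callable[[str, int], None],
) -> bool:
    """Claim on the user's behalf once the withdrawal is finalized on L1.

    Returns True if the claim was executed in this call.
    """
    if withdrawal.status is not Status.WAITING_FOR_L1:
        return False  # already handled, never claim twice
    if not finalized_on_l1:
        return False  # nothing to claim yet
    claim_and_transfer(withdrawal.user_l1_address, withdrawal.amount)
    withdrawal.status = Status.CLAIMED
    return True


# Usage with a stand-in for the real L1 claim transaction.
sent = []
w = Withdrawal(user_l1_address="0xUSER", amount=1_000)
process_withdrawal(w, finalized_on_l1=False, claim_and_transfer=lambda a, v: sent.append((a, v)))
process_withdrawal(w, finalized_on_l1=True, claim_and_transfer=lambda a, v: sent.append((a, v)))
```

The status check keeps the step idempotent, so retrying after a failure does not trigger a second claim for the same withdrawal.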
The service interruption was first detected at block 395985, where our system stopped receiving data from the indexer. Without that data we could no longer recognize and process transactions that included the fee for the Automatic Withdrawal Service, so we paused the service as a precaution.
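As an illustration of this kind of precautionary pause, here is a minimal sketch of a watchdog that flags a stalled indexer feed. It is an assumption-laden example rather than a description of our actual monitoring: the `IndexerWatchdog` class and the 300-second threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexerWatchdog:
    """Tracks the last block received from the indexer and flags a stall."""

    max_silence_seconds: float          # hypothetical threshold, tune per deployment
    last_block: Optional[int] = None
    last_seen_at: Optional[float] = None
    paused: bool = False

    def record_block(self, block_number: int, now: float) -> None:
        # Call this whenever the indexer delivers data for a block.
        self.last_block = block_number
        self.last_seen_at = now

    def check(self, now: float) -> bool:
        # Pause once the feed has been silent for longer than the threshold.
        # The pause is sticky: resuming is left as a deliberate manual step.
        if (
            self.last_seen_at is not None
            and now - self.last_seen_at > self.max_silence_seconds
        ):
            self.paused = True
        return self.paused


# Usage: timestamps are plain seconds so the logic is easy to test.
watchdog = IndexerWatchdog(max_silence_seconds=300)
watchdog.record_block(395985, now=0.0)
still_running = not watchdog.check(now=60.0)    # within the threshold
now_paused = watchdog.check(now=600.0)          # feed silent for too long
```

Passing the current time in explicitly, instead of reading the clock inside the class, keeps the check deterministic and easy to unit test.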
Our investigation revealed that the core issue came from a recent change implemented by Apibara, our third-party indexer. Apibara introduced an option to exclude transaction receipts in order to manage the large volume of data generated by ETH events. This new feature required an update to the configuration file. While Apibara's TypeScript SDK was updated to reflect this change, the Python SDK, which our system relies on, was not, and we were not informed of the change. As a result, our system stopped receiving data from the indexer.
Upon identifying the problem, we promptly reached out to the Apibara team. They quickly helped us fix the issue, and we restored the service to normal functionality.
To prevent similar incidents in the future, we have implemented several measures:
We are sorry for all the inconvenience caused by this interruption and are committed to ensuring the reliability and efficiency of our services. The measures outlined above are a testament to our dedication to continuous improvement and the security of our users' assets.