Degraded Event Ingestion

Started at

Collector

Resolved

We've found the root cause of the issues and have implemented fixes.

Our Kafka provider enforces rate limits, which we had recently exceeded. Their system rejected our writes, which caused requests to hang. We've increased the limits and plan to migrate from a hosted solution to a BYOC (bring your own cloud) solution for more bandwidth and control in the future.

The inability to publish events caused us to lose some data: some events held on devices were never received by the backend. For revenue tracking, however, the providers should retry their webhooks, and that data should fill in.

Monitoring

We've identified an issue that causes a percentage of requests to our event pipeline to time out. We've applied a fix and are implementing a longer-term solution.