Prometheus Remote Write is the protocol used to send metrics from Prometheus (and friends) to compatible remote storage endpoints. Remote Write is generally used for long-term storage, centralization, and cloud services. It also allows users to run Prometheus in agent mode, reducing local storage requirements.
At Grafana Labs we host metrics storage for thousands of customers using Grafana Mimir. We also run a Mimir cluster for ourselves as a central operations tool. We write all Prometheus metrics from our various production Kubernetes clusters to this operations Mimir cluster, which amounts to roughly a gigabyte per second of compressed data over the network. This data is sent across regions and cloud providers to reach the central cluster. At our scale, the resulting network egress can become a substantial cost factor.
In this talk you will learn about the Prometheus Remote Write format and the Prometheus storage format. We will use the Remote Write format as an example of how structuring a wire format can reduce the required network egress bandwidth even when compression is already being used. More specifically, you'll see how you can reuse concepts from database design, in particular the Prometheus TSDB's index, to reduce egress bytes by as much as 60%.
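To give a rough feel for the kind of idea involved, here is a minimal sketch of string interning through a shared symbol table, in the spirit of the symbol table that the TSDB index uses to store each label string once. It is not the actual Remote Write encoding: the function names (`intern_labels`, `resolve`, `naive_size`, `interned_size`), the sample series, and the one-byte-per-reference size estimate are all assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

Labels = Dict[str, str]

# Hypothetical sample data: three series that share most label strings.
SERIES: List[Labels] = [
    {"__name__": "http_requests_total", "job": "api", "instance": "pod-0", "code": "200"},
    {"__name__": "http_requests_total", "job": "api", "instance": "pod-0", "code": "500"},
    {"__name__": "http_requests_total", "job": "api", "instance": "pod-1", "code": "200"},
]


def intern_labels(series: List[Labels]) -> Tuple[List[str], List[List[int]]]:
    """Replace every label name and value with an index into a shared symbol table."""
    symbols: List[str] = [""]  # index 0 reserved for the empty string (an assumption)
    index: Dict[str, int] = {"": 0}
    refs: List[List[int]] = []

    def ref(s: str) -> int:
        # Store each distinct string once and hand out its position.
        if s not in index:
            index[s] = len(symbols)
            symbols.append(s)
        return index[s]

    for labels in series:
        row: List[int] = []
        for name, value in sorted(labels.items()):
            row.extend((ref(name), ref(value)))
        refs.append(row)
    return symbols, refs


def resolve(symbols: List[str], refs: List[List[int]]) -> List[Labels]:
    """Rebuild the original label sets from the symbol table and the references."""
    return [
        {symbols[row[i]]: symbols[row[i + 1]] for i in range(0, len(row), 2)}
        for row in refs
    ]


def naive_size(series: List[Labels]) -> int:
    """Characters needed when every series repeats its label strings in full."""
    return sum(len(k) + len(v) for labels in series for k, v in labels.items())


def interned_size(symbols: List[str], refs: List[List[int]]) -> int:
    """Characters in the symbol table plus one unit per reference (a crude estimate)."""
    return sum(len(s) for s in symbols) + sum(len(row) for row in refs)


if __name__ == "__main__":
    symbols, refs = intern_labels(SERIES)
    print("naive:", naive_size(SERIES), "interned:", interned_size(symbols, refs))
```

The more series share label names and values, the more the symbol table pays off, since each repeated string collapses to a small integer before the general-purpose compressor ever sees the payload. Real savings depend on the actual encoding and on how the data compresses, so the figures this toy estimate produces should not be read as a prediction.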