Observability
Makiatto exports telemetry via OpenTelemetry (OTLP), so you can use any observability stack that supports OTLP ingestion. This guide demonstrates one approach using Grafana and related open-source components. We chose this stack because it's free, self-hosted, well-documented, and widely used, but feel free to use whatever works for you (Datadog, New Relic, Jaeger, etc.).
Architecture
┌────────────┐  ┌────────────┐  ┌────────────┐
│  Makiatto  │  │  Makiatto  │  │  Makiatto  │
│   Node A   │  │   Node B   │  │   Node C   │
└──────┬─────┘  └──────┬─────┘  └──────┬─────┘
       └───────────────┼───────────────┘
                       ▼
           ┌────────────────────────┐
           │       Collector        │
           │     (spanmetrics)      │
           └──┬─────────┬─────────┬─┘
              ▼         ▼         ▼
          ┌───────┐ ┌──────────┐ ┌──────┐
          │ Tempo │ │Prometheus│ │ Loki │
          └───┬───┘ └────┬─────┘ └──┬───┘
              └──────────┼──────────┘
                         ▼
                    ┌─────────┐
                    │ Grafana │
                    └─────────┘
Makiatto nodes send telemetry to a central OpenTelemetry Collector, which routes data to the appropriate backends:
- Collector - Receives OTLP data, generates RED metrics (rate, errors, duration) via spanmetrics, and forwards to backends
- Tempo - Stores and indexes distributed traces
- Prometheus - Stores metrics and provides a query interface
- Loki - Aggregates and indexes logs
- Grafana - Visualisation and dashboards for metrics, traces, and logs
This guide runs all components on a single server for simplicity, but you could split them across multiple machines for high availability or to handle higher volumes of telemetry data.
Prerequisites
- A running Makiatto cluster
- A separate server for the observability stack (can be any Linux box with a public IP)
Install the following on the observability server:
# Debian/Ubuntu
apt install podman wireguard
# Fedora/RHEL
dnf install podman wireguard-tools
1. Connect to the WireGuard mesh
Makiatto nodes communicate over a private WireGuard network. Add the observability server as an external peer so nodes can send traces securely without exposing the collector publicly.
Generate a WireGuard keypair
On the observability server:
wg genkey | tee /etc/wireguard/private.key | wg pubkey > /etc/wireguard/public.key
chmod 600 /etc/wireguard/private.key
Register the peer
Copy the public key from the observability server, then from your workstation:
maki peer add o11y \
  --wg-pubkey "PUBLIC_KEY_HERE" \
  --endpoint your-server-ip
The --endpoint should be the public IP or domain name of the observability server (not the WireGuard address).
Include "o11y" at the start or end of the peer name so Makiatto auto-discovers it as the telemetry endpoint, e.g. o11y, o11y-grafana, or metrics-o11y.
This assigns a WireGuard address automatically. To see the full configuration:
maki peer wg-config o11y
Configure WireGuard
Save the output of maki peer wg-config o11y as /etc/wireguard/wg0.conf on the observability server, replacing <private-key> with the contents of /etc/wireguard/private.key.
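The saved file has the usual wg-quick shape. The values below are placeholders; the real keys, addresses, and endpoints come from maki peer wg-config o11y:
[Interface]
# private key generated earlier on this server
PrivateKey = <contents of /etc/wireguard/private.key>
# WireGuard address assigned by maki peer add (illustrative)
Address = 10.44.44.5/24

[Peer]
# one [Peer] block per Makiatto node
PublicKey = <node public key>
Endpoint = <node public ip>:<port>
AllowedIPs = 10.44.44.1/32
# optional: keeps the tunnel open if this server sits behind NAT
PersistentKeepalive = 25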
Start the interface:
wg-quick up wg0
systemctl enable wg-quick@wg0
Verify connectivity by pinging a Makiatto node's WireGuard address (e.g. ping 10.44.44.1).
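For example, from the observability server:
# confirm a recent handshake with each Makiatto node
wg show wg0
# reach node A over the mesh (substitute one of your nodes' addresses)
ping -c 3 10.44.44.1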
2. Deploy the observability stack
We'll use Podman Quadlets for systemd-managed containers.
Config files
mkdir -p /etc/makiatto-o11y
Create /etc/makiatto-o11y/tempo.yaml:
stream_over_http_enabled: true
server:
  http_listen_port: 3200
  grpc_listen_port: 9095
distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
storage:
  trace:
    backend: local
    local:
      path: /var/tempo/traces
    wal:
      path: /var/tempo/wal
Create /etc/makiatto-o11y/loki.yaml:
auth_enabled: false
server:
  http_listen_port: 3100
common:
  ring:
    kvstore:
      store: inmemory
  replication_factor: 1
  path_prefix: /loki
schema_config:
  configs:
    - from: 2020-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
storage_config:
  filesystem:
    directory: /loki/chunks
limits_config:
  retention_period: 336h
Create /etc/makiatto-o11y/otelcol.yaml:
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
connectors:
  spanmetrics:
    namespace: traces.spanmetrics
    dimensions:
      - name: cdn.cache.hit
exporters:
  otlp/tempo:
    endpoint: systemd-tempo:4317
    tls:
      insecure: true
  loki:
    endpoint: http://systemd-loki:3100/loki/api/v1/push
  prometheus:
    endpoint: 0.0.0.0:9464
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics, otlp/tempo]
    metrics:
      receivers: [otlp, spanmetrics]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      exporters: [loki]
Create /etc/makiatto-o11y/prometheus.yml:
scrape_configs:
  - job_name: makiatto-spanmetrics
    static_configs:
      - targets:
          - systemd-otelcol:9464
Quadlet units
Create these files in /etc/containers/systemd/.
o11y.network:
[Network]
tempo.container:
[Container]
Image=docker.io/grafana/tempo:latest
Exec=-config.file=/etc/tempo/config.yaml
Volume=/etc/makiatto-o11y/tempo.yaml:/etc/tempo/config.yaml:ro
Volume=tempo-data:/var/tempo
Network=o11y.network
[Service]
Restart=always
[Install]
WantedBy=multi-user.target
tempo-data.volume:
[Volume]
loki.container:
[Container]
Image=docker.io/grafana/loki:latest
Exec=-config.file=/etc/loki/config.yaml
Volume=/etc/makiatto-o11y/loki.yaml:/etc/loki/config.yaml:ro
Volume=loki-data:/loki
Network=o11y.network
[Service]
Restart=always
[Install]
WantedBy=multi-user.target
loki-data.volume:
[Volume]
otelcol.container (replace 10.44.44.X with your WireGuard address):
[Container]
Image=docker.io/otel/opentelemetry-collector-contrib:latest
PublishPort=10.44.44.X:4317:4317
Volume=/etc/makiatto-o11y/otelcol.yaml:/etc/otelcol-contrib/config.yaml:ro
Network=o11y.network
[Unit]
Requires=tempo.service loki.service
After=tempo.service loki.service
[Service]
Restart=always
[Install]
WantedBy=multi-user.target
prometheus.container:
[Container]
Image=docker.io/prom/prometheus:latest
Volume=/etc/makiatto-o11y/prometheus.yml:/etc/prometheus/prometheus.yml:ro
Volume=prometheus-data:/prometheus
Network=o11y.network
[Unit]
Requires=otelcol.service
After=otelcol.service
[Service]
Restart=always
[Install]
WantedBy=multi-user.target
prometheus-data.volume:
[Volume]
grafana.container:
[Container]
Image=docker.io/grafana/grafana:latest
PublishPort=3000:3000
Volume=grafana-data:/var/lib/grafana
Network=o11y.network
[Unit]
Requires=prometheus.service tempo.service loki.service
After=prometheus.service tempo.service loki.service
[Service]
Restart=always
[Install]
WantedBy=multi-user.target
grafana-data.volume:
[Volume]
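With the unit files in place, you can optionally have the Quadlet generator print the systemd units it will derive from them, which catches typos in the .container files early. The generator path below is the common location, but it may differ between distributions:
/usr/lib/systemd/system-generators/podman-system-generator --dryrun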
Start the stack
systemctl daemon-reload
systemctl start grafana
The Requires= dependencies pull in Tempo, Loki, the collector, and Prometheus automatically, and the WantedBy=multi-user.target lines start every service on boot.
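To confirm everything came up (Quadlet names each container systemd-<unit> unless ContainerName= is set):
# all five units should be active (running)
systemctl --no-pager status tempo loki otelcol prometheus grafana
# and podman should list the systemd-* containers
podman ps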
Optional: HTTPS with Caddy
To access Grafana over HTTPS, add Caddy as a reverse proxy.
Remove PublishPort=3000:3000 from grafana.container, then create /etc/makiatto-o11y/Caddyfile:
grafana.example.com {
    reverse_proxy systemd-grafana:3000
}
caddy.container:
[Container]
Image=docker.io/library/caddy:latest
PublishPort=443:443
PublishPort=80:80
Volume=/etc/makiatto-o11y/Caddyfile:/etc/caddy/Caddyfile:ro
Volume=caddy-data:/data
Network=o11y.network
[Service]
Restart=always
[Install]
WantedBy=multi-user.target
caddy-data.volume:
[Volume]
Reload and start Caddy:
systemctl daemon-reload
systemctl start caddy
Caddy automatically obtains a TLS certificate from Let's Encrypt.
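Assuming the DNS record for your Grafana hostname already points at the observability server, you can check the certificate and proxy from any machine:
# expect a redirect to the Grafana login page over a valid certificate
curl -I https://grafana.example.com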
3. Restart Makiatto nodes
Restart Makiatto on all nodes:
maki machine restart
Makiatto automatically discovers the OTLP endpoint by finding the external peer whose name starts or ends with "o11y", and sets the service name to makiatto.{node_name}.
To override auto-discovery or tune settings:
[o11y]
otlp_endpoint = "http://10.44.44.5:4317" # Optional: override auto-discovery
sampling_ratio = 0.1 # Sample 10% of traces (default), errors/slow always captured
tracing_enabled = true # Set to false to disable trace export
metrics_enabled = true # Set to false to disable metrics export
logging_enabled = true # Set to false to disable log export
If otlp_endpoint is not set, Makiatto auto-discovers it from external peers. If no o11y peer is found, telemetry export is disabled (console logging still works).
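To sanity-check that telemetry is arriving, on the observability server (assuming the default systemd-otelcol container name):
# the collector should be listening on the WireGuard address
ss -tlnp | grep 4317
# and its logs should be free of export errors
podman logs --tail 50 systemd-otelcol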
4. Configure Grafana
- Sign in at http://<server-ip>:3000 (or your Caddy domain if configured) with admin/admin
- Add data sources:
  - Prometheus: http://systemd-prometheus:9090
  - Tempo: http://systemd-tempo:3200
  - Loki: http://systemd-loki:3100
5. Dashboards and alerting
Build dashboards and alerts to suit your needs. Makiatto exports native OpenTelemetry metrics which you can query in Prometheus. Here are some example queries to get started:
| Panel | Query |
|---|---|
| HTTP request rate | sum by (service_name) (rate(server_request_count_total[5m])) |
| Error rate | sum(rate(server_request_count_total{http_status_code=~"5.."}[5m])) |
| p95 latency | histogram_quantile(0.95, sum(rate(server_request_duration_seconds_bucket[5m])) by (le)) |
You can also add Tempo trace panels filtered by service.name =~ "makiatto.*" and enable exemplar linking to jump from metric spikes to traces.
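The spanmetrics connector also generates RED metrics carrying the cdn.cache.hit dimension configured earlier. Assuming the connector's default metric naming under the traces.spanmetrics namespace, a cache hit ratio panel could use something like:
sum(rate(traces_spanmetrics_calls_total{cdn_cache_hit="true"}[5m]))
  /
sum(rate(traces_spanmetrics_calls_total[5m]))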
For alerting, Grafana supports Prometheus Alertmanager or its own unified alerting system. See the Grafana documentation for more details.
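As a starting point, a Prometheus alerting rule for a sustained 5xx ratio might look like the sketch below; it reuses the metric names from the table above and assumes you add a rule_files entry and an Alertmanager to prometheus.yml, or port the expression into Grafana's unified alerting:
groups:
  - name: makiatto
    rules:
      - alert: MakiattoHighErrorRate
        expr: |
          sum(rate(server_request_count_total{http_status_code=~"5.."}[5m]))
            / sum(rate(server_request_count_total[5m])) > 0.05
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Makiatto 5xx error ratio above 5% for 10 minutes"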