helix

OpenTelemetry

helix leverages OpenTelemetry to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) to help you analyze your software's performance and behavior.

Because OpenTelemetry is configured at helix's core, services get consistent observability across all integrations with no extra setup.

Environment variables

helix relies on the following required environment variables:

  • ENVIRONMENT represents the environment the service is currently running in. When the value is one of local, localhost, dev, or development, the logger handles logs at debug level and higher. Otherwise, it handles logs at info level and higher.
  • OTEL_EXPORTER_OTLP_TRACES_ENDPOINT sets the target endpoint the trace exporter will connect to. See example using Grafana Agent below for more details.
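For example, both variables can be set before starting a service. The endpoint value here is illustrative and must match your collector's address; the Grafana Agent examples below listen on port 7021:

```shell
# Enable debug-level logging and higher.
export ENVIRONMENT="dev"

# OTLP gRPC endpoint the trace exporter connects to
# (illustrative value; adjust to your collector).
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT="http://localhost:7021"
```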

Telemetry packages

Traces

When possible, it's strongly advised to leverage Event propagation within distributed tracing.

If the provided context contains a span, the newly-created span will be a child of that span; otherwise, it will be a root span.

In the example below, we pass the HTTP request's context. The REST router integration automatically handles tracing. Therefore, the custom span created will be a child of the HTTP request's span, with no additional work on your end.

import (
  "errors"
  "net/http"

  "go.nunchi.studio/helix/telemetry/trace"
)

router.GET("/path", func(rw http.ResponseWriter, req *http.Request) {
  
  // ...

  _, span := trace.Start(req.Context(), trace.SpanKindServer, "Custom Span")
  defer span.End()

  if 2+2 == 4 {
    span.RecordError("this is a demo error based on a dummy condition", errors.New("any error"))
  }

  // ...

})

When running a service in Kubernetes, traces are automatically populated with these additional attributes:

  • kubernetes.namespace
  • kubernetes.pod

When running a service in Nomad, traces are automatically populated with these additional attributes:

  • nomad.datacenter
  • nomad.job
  • nomad.namespace
  • nomad.region
  • nomad.task

Logs

Logs are JSON formatted and contain the following keys by default:

  • message: The log's message.
  • timestamp: Timestamp of the log, formatted per RFC 3339 with nanosecond precision.
  • trace_id and span_id: Trace details, if the log is part of a trace.
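An emitted log line might therefore look like the following. All values are illustrative; a level key is also shown, since the Grafana Agent examples below extract it:

```json
{
  "message": "this is a warning",
  "level": "warn",
  "timestamp": "2023-09-01T12:34:56.789012345Z",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "span_id": "00f067aa0ba902b7"
}
```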

Log levels are debug, info, warn, error, and fatal.

If the environment variable ENVIRONMENT is one of local, localhost, dev, or development, the logger handles logs at debug level and higher. Otherwise, the logger handles logs at info level and higher.
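The selection rule can be sketched as follows. minLogLevel is an illustrative helper, not part of the helix API:

```go
package main

import "fmt"

// minLogLevel returns the minimum level the logger handles for a
// given ENVIRONMENT value, per the rule described above.
// (Illustrative helper, not part of the helix API.)
func minLogLevel(env string) string {
	switch env {
	case "local", "localhost", "dev", "development":
		return "debug"
	default:
		return "info"
	}
}

func main() {
	fmt.Println(minLogLevel("dev"))        // debug
	fmt.Println(minLogLevel("production")) // info
}
```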

By passing a Go context, the logger is aware if the log is part of a trace/span. If so, trace_id and span_id are added to the log so it can be linked to the respective trace/span.

In the example below, we pass the HTTP request's context. The REST router integration automatically handles tracing, so the log is associated with the trace/span with no additional work on your end.

import (
  "net/http"

  "go.nunchi.studio/helix/telemetry/log"
)

router.GET("/path", func(rw http.ResponseWriter, req *http.Request) {
  
  // ...

  log.Warn(req.Context(), "this is a warning")

  // ...

})

When running a service in Kubernetes, logs are automatically populated with these additional fields:

  • kubernetes_namespace
  • kubernetes_pod

When running a service in Nomad, logs are automatically populated with these additional fields:

  • nomad_datacenter
  • nomad_job
  • nomad_namespace
  • nomad_region
  • nomad_task

OpenTelemetry collectors

Grafana Agent

Below are two example Grafana Agent configurations with the appropriate log labels (using Loki) and trace attributes (using Tempo) for running helix services: one for Kubernetes, one for Nomad with the Docker task driver.

Kubernetes

server:
  log_level: "warn"
  log_format: "json"

logs:
  configs:
    - name: "loki"
      clients:
        - url: "https://loki.endpoint.tld/api/v1/push"
          basic_auth:
            username: "username"
            password: "password"
      positions:
        filename: "/tmp/positions.yaml"
      target_config:
        sync_period: "10s"
      scrape_configs:
        - job_name: "pod-logs"
          kubernetes_sd_configs:
            - role: "pod"
          relabel_configs:
            - source_labels:
                - "__meta_kubernetes_pod_node_name"
              target_label: "__host__"
            - action: "labelmap"
              regex: "__meta_kubernetes_pod_label_(.+)"
            - action: "replace"
              replacement: "$1"
              separator: /
              source_labels:
                - "__meta_kubernetes_namespace"
                - "__meta_kubernetes_pod_name"
              target_label: "job"
            - action: "replace"
              source_labels:
                - "__meta_kubernetes_namespace"
              target_label: "namespace"
            - action: "replace"
              source_labels:
                - "__meta_kubernetes_pod_name"
              target_label: "pod"
            - action: "replace"
              source_labels:
                - "__meta_kubernetes_pod_container_name"
              target_label: "container"
            - replacement: "/var/log/pods/*$1/*.log"
              separator: /
              source_labels:
                - "__meta_kubernetes_pod_uid"
                - "__meta_kubernetes_pod_container_name"
              target_label: "__path__"
          pipeline_stages:
            - docker: {}
            - json:
                expressions:
                  level: "level"
                  kubernetes_namespace: "kubernetes_namespace"
                  kubernetes_pod: "kubernetes_pod"
                  span: "span"
                  timestamp: "timestamp"
                timestamp:
                  source: "timestamp"
                  format: "RFC3339Nano"
            - labels:
                level:
                kubernetes_namespace:
                kubernetes_pod:
                span:

traces:
  configs:
    - name: "tempo"
      automatic_logging:
        backend: "logs_instance"
        logs_instance_name: "loki"
        spans: true
        roots: true
        processes: true
        process_attributes:
          - "kubernetes.namespace"
          - "kubernetes.pod"
          - "span"
        span_attributes:
          - "kubernetes.namespace"
          - "kubernetes.pod"
          - "span"
        labels:
          - "kubernetes.namespace"
          - "kubernetes.pod"
          - "span"
        overrides:
          duration_key: "duration"
          service_key: "kubernetes_pod"
          trace_id_key: "trace_id"
      receivers:
        otlp:
          protocols:
            grpc:
              endpoint: "0.0.0.0:7021"
      remote_write:
        - endpoint: "tempo.endpoint.tld"
          insecure: true
          basic_auth:
            username: "username"
            password: "password"

Nomad

server:
  log_level: "warn"
  log_format: "json"

logs:
  configs:
    - name: "loki"
      clients:
        - url: "https://loki.endpoint.tld/api/v1/push"
          basic_auth:
            username: "username"
            password: "password"
      positions:
        filename: "/tmp/positions.yaml"
      target_config:
        sync_period: "10s"
      scrape_configs:
        - job_name: "docker/system"
          docker_sd_configs:
            - host: "unix:///var/run/docker.sock"
              refresh_interval: "5s"
          relabel_configs:
            - action: "replace"
              replacement: "docker/system"
              source_labels:
                - "__meta_docker_container_id"
              target_label: "job"
            - source_labels:
                - "__meta_docker_container_name"
              regex: "/(.*)"
              target_label: "container"
            - source_labels:
                - "__meta_docker_container_log_stream"
              target_label: "stream"
        - job_name: "docker/services"
          static_configs:
            - targets:
                - "localhost"
              labels:
                job: "docker/services"
                __path__: "/var/lib/docker/containers/*/*log"
          pipeline_stages:
            - docker: {}
            - json:
                expressions:
                  level: "level"
                  nomad_datacenter: "nomad_datacenter"
                  nomad_job: "nomad_job"
                  nomad_namespace: "nomad_namespace"
                  nomad_region: "nomad_region"
                  nomad_task: "nomad_task"
                  span: "span"
                  timestamp: "timestamp"
                timestamp:
                  source: "timestamp"
                  format: "RFC3339Nano"
            - labels:
                level:
                nomad_datacenter:
                nomad_job:
                nomad_namespace:
                nomad_region:
                nomad_task:
                span:

traces:
  configs:
    - name: "tempo"
      automatic_logging:
        backend: "logs_instance"
        logs_instance_name: "loki"
        spans: true
        roots: true
        processes: true
        process_attributes:
          - "nomad.namespace"
          - "nomad.datacenter"
          - "nomad.region"
          - "nomad.job"
          - "nomad.task"
          - "span"
        span_attributes:
          - "nomad.namespace"
          - "nomad.datacenter"
          - "nomad.region"
          - "nomad.job"
          - "nomad.task"
          - "span"
        labels:
          - "nomad.datacenter"
          - "nomad.job"
          - "nomad.namespace"
          - "nomad.region"
          - "nomad.task"
          - "span"
        overrides:
          duration_key: "duration"
          service_key: "nomad_task"
          trace_id_key: "trace_id"
      receivers:
        otlp:
          protocols:
            grpc:
              endpoint: "0.0.0.0:7021"
      remote_write:
        - endpoint: "tempo.endpoint.tld"
          insecure: true
          basic_auth:
            username: "username"
            password: "password"

Built by Nunchi