Skip to main content

Day 2 Observability - calls to other services

·681 words·4 mins
Observability O11y

This post assumes you’re already familiar with OpenTelemetry, and are already collecting some observability data.

Whether you’ve chosen automatic instrumentation, or manual, you’re now collecting telemetry data from your code. Congratulations 🎉

But what about all the other code you’re using? When your service makes a database query, or fetches weather data, you’re using someone else’s code. These other services may have their own production problems - can you separate issues in your code from issues in a dependency with your current observability signals?

If you’re not sure, read on! We’ll cover creating metrics and traces around your existing calls to other services over HTTP or GRPC. The samples below are in Go, but similar tactics should work in most languages.

Before you start instrumenting these calls yourself, consider searching the OTel Registry for existing instrumentation libraries. For example, Postgres database users could adopt the pgotel library, which will auto-magically provide instrumentation for existing go-pg code.

Wrapping HTTP clients
#

Most of the APIs you’re calling are likely HTTP-based. Some of these services may provide a client library, some users may choose to create their own client library, and still others will choose to use a simple HTTP client. No matter which category you’re in, this approach can help you get better telemetry (provided your language supports interfaces or something equivalent).

Let’s assume you’re using a client library to fetch pictures of cats, called a CatClient. You can create an instrumented version of this library using Go’s embedding.

To begin, we’ll define a type for our OTelCatClient:

type OTelCatClient struct {
    CatClient
}

Now we’ll need to “wrap” the CatClient method calls to include our instrumentation. For a method like CatClient.GetRandomCat, we can add a trace span as described in the OTel guide to Manual Instrumentation:

func (c *OTelCatClient) GetRandomCat(c context.Context) Cat {
    ctx, span := c.tracer.Start(c, "get-random-cat")
    defer span.End()
    return c.CatClient.GetRandomCat(ctx)
}

The same can be done to add Metrics as desired, to track the number of calls, or errors.

We can now use OTelCatClient the same way we would use a regular CatClient, and the instrumented client will produce a trace span for any calls to GetRandomCat.

If you produce your own client libraries, you can add instrumentation directly to your libraries with the OpenTelemetry API. By default, OpenTelemetry libraries use a no-op implementation which has a minimal effect on performance and does not record any data. When the OpenTelemetry SDK is configured by the consumer of your client libraries, all your beautiful telemetry will be available, sent to the destination of their choosing.

GRPC Interceptors
#

For calls made over GRPC (which includes most of Google’s Client Libraries), you can get telemetry by using one of the GRPC Interceptors provided by the otelgrpc instrumentation library.

GRPC Interceptors provide “hooks” in the GRPC handling process, as a way to implement logging, authorization, and other types of “middleware” tasks. The Interceptor concept is present in all supported GRPC languages, though I find it is not well described. This guide to gRPC and Interceptors is a nice summary of the concept.

To make use of the interceptor, it must be plumbed down into the GRPC Dial() call as an option. If you’re creating GRPC connections yourself, this is straightforward. For Google API Clients, it looks a bit like this:

import (
  iam "cloud.google.com/go/iam/apiv2"
  "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
  "google.golang.org/api/option"
  "google.golang.org/grpc"
)
  policyclient, _ := iam.NewPoliciesClient(r.Context(),
    option.WithGRPCDialOption(
      grpc.WithUnaryInterceptor(
        otelgrpc.UnaryClientInterceptor())))

Note that for Streaming APIs, there’s also an otelgrpc.StreamingClientInterceptor.

This Policy Client will now record OpenTelemetry spans for each of its GRPC calls, and report them to whichever backend OTel has been configured to use. These spans include labels such as the method it called, and what the returned status code. With this telemetry at your fingertips, it becomes easier to identify when your dependent services are experiencing latency or instability.

Recap
#

We’ve discussed a few ways to add instrumentation when calling another service via custom clients, or GRPC.

If you create your own libraries, you can add native instrumentation (so your customers get better telemetry!) using the OTel guide to Instrumenting libraries.

Happy Instrumenting! 🔭