Step-by-Step Guide to Integrating Spring Boot and OpenTelemetry with Micrometer on GCP for Distributed Tracing

In today’s complex, microservices-driven landscape, understanding how requests flow through distributed systems is essential for maintaining performance, diagnosing issues, and optimizing applications. Distributed tracing provides a powerful tool for developers and operations teams to visualize and analyze the journey of requests across various services and components.

This comprehensive guide will take you through the process of integrating Spring Boot with OpenTelemetry using Micrometer to achieve robust distributed tracing. We’ll cover everything from setting up a local development environment to deploying your application on Google Cloud Platform (GCP) Cloud Run, ensuring you gain a thorough understanding of how to implement and leverage distributed tracing in both local and cloud environments.

By the end of this tutorial, you will be able to:

  1. Set up a Spring Boot application with OpenTelemetry and Micrometer
  2. Implement distributed tracing in a simple order management system
  3. Configure and use OpenTelemetry Collector and Jaeger for local tracing visualization
  4. Deploy your application to GCP Cloud Run with tracing enabled
  5. Analyze trace data using GCP’s Cloud Trace

Project Setup

To begin our journey into distributed tracing with Spring Boot and OpenTelemetry, we’ll start by setting up a new Spring Boot project and adding the necessary dependencies.

Create a New Spring Boot Project

  1. Ensure Java Environment
    First, make sure that Java 21 is installed on your system.

  2. Create a Spring Boot Project
    Next, visit Spring Initializr to generate a new Spring Boot project. You can use the following link with pre-configured settings to download a project template:

    Download Project Template

    This template includes the following initial dependencies:

    • Spring Web
    • Spring Boot Actuator
  3. Download and Extract the Project
    Download the generated project and extract it to your preferred location.

Add Necessary Dependencies

After downloading the project, you’ll need to manually add some additional dependencies to enable OpenTelemetry tracing and Micrometer integration.

Open your build.gradle file and add the following dependencies under the dependencies block:

dependencies {
    implementation 'org.springframework.boot:spring-boot-starter-actuator'
    implementation 'org.springframework.boot:spring-boot-starter-web'
    testImplementation 'org.springframework.boot:spring-boot-starter-test'
    testRuntimeOnly 'org.junit.platform:junit-platform-launcher'

    // Manually added dependencies
    implementation 'org.springframework.boot:spring-boot-starter-aop'
    implementation 'io.micrometer:micrometer-tracing-bridge-otel'
    implementation 'io.opentelemetry:opentelemetry-exporter-otlp'
    implementation 'io.opentelemetry.instrumentation:opentelemetry-logback-appender-1.0:2.3.0-alpha'
}

Dependency Descriptions:

  • spring-boot-starter-aop: Enables aspect-oriented programming in Spring, which Micrometer uses for tracing.
  • micrometer-tracing-bridge-otel: The Micrometer bridge for OpenTelemetry, allowing us to use OpenTelemetry with Micrometer’s API.
  • opentelemetry-exporter-otlp: The OpenTelemetry Protocol (OTLP) exporter, which we’ll use to send our traces to the OpenTelemetry Collector.
  • opentelemetry-logback-appender-1.0: This appender allows us to correlate logs with traces.

These dependencies set up the foundation for implementing distributed tracing in our Spring Boot application using OpenTelemetry and Micrometer.

Configuring Application Properties

After creating a new Spring Boot project and adding the necessary dependencies, we need to configure our application properties. Create a file named application.yml in the src/main/resources directory with the following content:

spring:
  application:
    name: otel-demo
  threads:
    virtual:
      enabled: true

management:
  endpoint:
    health:
      probes:
        enabled: true
  tracing:
    sampling:
      probability: 1.0
  observations:
    annotations:
      enabled: true
  otlp:
    tracing:
      endpoint: http://localhost:4318/v1/traces

Let’s break down this configuration:

  1. spring.application.name: Sets the name of our application to “otel-demo”.

  2. spring.threads.virtual.enabled: Enables the use of virtual threads, a feature of Java 21 that improves application performance for I/O-bound operations.

  3. management.endpoint.health.probes.enabled: Enables health probes, which are useful for Kubernetes deployments and general application health monitoring.

  4. management.tracing.sampling.probability: Sets the sampling probability to 1.0, meaning all traces will be sampled. This is useful for development but should be adjusted for production environments.

  5. management.observations.annotations.enabled: Enables the use of observability annotations like @Observed, allowing for easy integration of tracing into your code.

  6. management.otlp.tracing.endpoint: Specifies the endpoint where trace data will be sent. This should match the OpenTelemetry Collector’s HTTP receiver endpoint.

This configuration sets up your application to use virtual threads, enables comprehensive tracing, and configures the endpoint for sending trace data to your OpenTelemetry Collector. It works in tandem with the dependencies we added earlier and prepares your application for robust observability in both local and cloud environments.

Remember to adjust these settings, particularly the sampling probability and endpoint, when moving from development to production environments to ensure optimal performance and data collection.
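For example, a production deployment might lower the sampling rate and point at a remote Collector through environment variables. The variable names below are illustrative and rely on Spring Boot's relaxed binding, which maps them onto the YAML keys above:

export MANAGEMENT_TRACING_SAMPLING_PROBABILITY=0.1
export MANAGEMENT_OTLP_TRACING_ENDPOINT=http://otel-collector:4318/v1/traces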

Building an Order Management System

To demonstrate distributed tracing in action, we’ll create a simple order management system. This system will include an order controller to handle HTTP requests and an order service to process orders. We’ll also configure asynchronous processing to illustrate how tracing works across different execution threads.

Design Overview

Our order management system is designed with the following components:

  1. OrderController:

    • Purpose: Handles incoming HTTP requests for order creation.
    • Responsibilities:
      • Receives order creation requests.
      • Interacts with the OrderService to create and process orders.
      • Returns the order status to the client.
  2. OrderService:

    • Purpose: Contains the business logic for creating, processing, and checking the status of orders.
    • Responsibilities:
      • Creates new orders.
      • Processes orders asynchronously.
      • Retrieves the status of orders.
  3. ApplicationConfiguration:

    • Purpose: Configures asynchronous processing to support tracing across threads.
    • Responsibilities:
      • Sets up a task executor that propagates the tracing context across different execution threads.

This design follows the principles of separation of concerns, allowing us to clearly demonstrate how a single request flows through different parts of our application. It also shows how the tracing context is maintained during asynchronous processing.

The use of asynchronous processing in the OrderService, particularly during order processing, will highlight how distributed tracing can provide insights into operations that span multiple threads or even different services in a more complex system.

Implementing OrderService

The OrderService contains the business logic for creating, processing, and checking the status of orders. Here’s the implementation:

import io.micrometer.observation.annotation.Observed;
import io.micrometer.tracing.Tracer;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;

@Service
public class OrderService {

    private static final Logger log = LoggerFactory.getLogger(OrderService.class);

    @Autowired
    private Tracer tracer;

    @Observed(name = "order.create", contextualName = "create-order")
    public String createOrder(String customerId) {
        log.info("Creating order for customer: {}, traceId: {}", customerId,
                tracer.currentTraceContext().context().traceId());
        simulateWork(500);
        return "ORD-" + System.currentTimeMillis();
    }

    @Async
    @Observed(name = "order.process.async", contextualName = "process-order-async")
    public void processOrderAsync(String orderId) {
        log.info("Start processing order asynchronously: {}, traceId: {}", orderId,
                tracer.currentTraceContext().context().traceId());
        simulateWork(2000);
        log.info("Finished processing order: {}", orderId);
    }

    @Observed(name = "order.status", contextualName = "get-order-status")
    public String getOrderStatus(String orderId) {
        log.info("Checking status for order: {}", orderId);
        simulateWork(200);
        return String.format("Status for order %s: Processing", orderId);
    }

    private void simulateWork(long milliseconds) {
        try {
            Thread.sleep(milliseconds);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            log.error("Work simulation interrupted", e);
        }
    }
}

Key Aspects:

  • Service Methods:
    The service includes three primary methods: createOrder, processOrderAsync, and getOrderStatus. Each method simulates a workload and logs the operation.

  • @Observed Annotation:
    The @Observed annotation is used to automatically create spans for each method, with unique name and contextualName values to track performance and behavior.

  • Tracing and Observability:
    The Tracer object is used to log the trace ID, correlating logs with traces. The @Async annotation moves processOrderAsync onto a separate thread; on its own it does not carry the tracing context across threads, which is why we configure a context-propagating executor in a later section.

  • Simulated Work:
    The simulateWork method is used to simulate processing times, which is useful for demonstrating tracing across different operations.

Implementing OrderController

The OrderController handles HTTP requests related to order creation. Here’s the implementation:

import io.micrometer.observation.annotation.Observed;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/orders")
public class OrderController {

    private static final Logger log = LoggerFactory.getLogger(OrderController.class);
    private final OrderService orderService;

    public OrderController(OrderService orderService) {
        this.orderService = orderService;
    }

    @PostMapping
    @Observed(name = "order.create.request", contextualName = "create-order-request")
    public String createOrder(@RequestParam String customerId) {
        log.info("Received order creation request for customer: {}", customerId);
        String orderId = orderService.createOrder(customerId);
        orderService.processOrderAsync(orderId);
        return orderService.getOrderStatus(orderId);
    }
}

Key Aspects:

  • createOrder Method:
    This method, annotated with @PostMapping, handles POST requests to create an order. It accepts a customerId parameter and coordinates the order creation process through the OrderService.

  • @Observed Annotation:
    The @Observed annotation on the createOrder method automatically creates a span, allowing detailed tracing of the order creation flow. The name and contextualName attributes are used for monitoring and tracing visualization.

  • Tracing and Observability:
    The method ensures that tracing context is maintained throughout the request lifecycle, from receiving the order creation request to processing it asynchronously and retrieving its status.

Configuring Asynchronous Tracing

To ensure that tracing context is properly propagated in asynchronous operations, we need to configure a task executor that supports context propagation. Here’s how to implement this configuration:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.SimpleAsyncTaskExecutor;
import org.springframework.core.task.TaskExecutor;
import org.springframework.core.task.support.ContextPropagatingTaskDecorator;

@Configuration(proxyBeanMethods = false)
public class ApplicationConfiguration {

    @Bean(name = "propagatingContextExecutor")
    public TaskExecutor propagatingContextExecutor() {
        SimpleAsyncTaskExecutor taskExecutor = new SimpleAsyncTaskExecutor();
        // Run each task on a Java 21 virtual thread
        taskExecutor.setVirtualThreads(true);
        // Propagate the observation/tracing context to the worker thread
        taskExecutor.setTaskDecorator(new ContextPropagatingTaskDecorator());
        return taskExecutor;
    }
}

Key Aspects:

  • SimpleAsyncTaskExecutor:
    We use SimpleAsyncTaskExecutor, which starts a new thread for each task instead of drawing from a pool. Because the configuration calls setVirtualThreads(true), each task runs on a Java 21 virtual thread, providing scalability for I/O-bound tasks.

  • ContextPropagatingTaskDecorator:
    The ContextPropagatingTaskDecorator ensures that the tracing context is propagated to asynchronous tasks, whether they run on platform threads or virtual threads.

  • Java 21 Compatibility:
    This configuration targets Java 21 and takes advantage of virtual threads without requiring changes elsewhere in the codebase.

  • Service Layer Integration:
    To use this executor in services like OrderService, specify it in the @Async annotation:

    @Async("propagatingContextExecutor")
    @Observed(name = "order.process.async", contextualName = "process-order-async")
    public void processOrderAsync(String orderId) {
        // Method implementation
    }

This configuration ensures that asynchronous operations are properly traced, maintaining the tracing context across different execution threads.
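For illustration only, here is a rough hand-rolled equivalent of that decorator, written against Micrometer's context-propagation library (which micrometer-tracing already pulls in transitively). This is a sketch to show what the decorator does; in practice the built-in ContextPropagatingTaskDecorator is the better choice:

import io.micrometer.context.ContextSnapshot;
import io.micrometer.context.ContextSnapshotFactory;
import org.springframework.core.task.TaskDecorator;

public class TracePropagatingTaskDecorator implements TaskDecorator {

    private final ContextSnapshotFactory snapshotFactory = ContextSnapshotFactory.builder().build();

    @Override
    public Runnable decorate(Runnable task) {
        // Capture the submitting thread's context (including the current span) ...
        ContextSnapshot snapshot = snapshotFactory.captureAll();
        // ... and restore it around the task when it runs on the worker thread.
        return snapshot.wrap(task);
    }
}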

Configuring OpenTelemetry Collector and Jaeger

In this section, we’ll set up the OpenTelemetry Collector and Jaeger to collect and visualize our trace data. The OpenTelemetry Collector acts as a pipeline that receives, processes, and exports telemetry data, while Jaeger is used to visualize and analyze the distributed traces.

Setting up OpenTelemetry Collector configuration file

First, we need to create a configuration file for the OpenTelemetry Collector. This file will define how the Collector receives, processes, and exports trace data.

Create a configuration file named otel-collector.yml in the dev-resources/collector folder with the following content:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  otlp:
    endpoint: "jaeger:4317"
    tls:
      insecure: true

extensions:
  health_check:
    endpoint: "0.0.0.0:13133"
  pprof:
    endpoint: "0.0.0.0:1888"
  zpages:
    endpoint: "0.0.0.0:55679"

service:
  extensions: [health_check, pprof, zpages]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]

Explanation of the Configuration:

  • Receivers:
    The otlp receiver is configured to listen on both gRPC and HTTP protocols. It allows the Collector to receive trace data from your application.

  • Processors:
    The batch processor batches spans together before they are exported, improving performance by reducing the number of export requests.

  • Exporters:
    The otlp exporter sends the collected traces to Jaeger on port 4317 via gRPC. The tls.insecure setting is set to true because we’re using an unencrypted connection in this setup.

  • Extensions:
    Extensions like health_check, pprof, and zpages are included to provide additional debugging and health monitoring capabilities for the Collector.

  • Service:
    The service block defines the extensions and pipelines that are active in the Collector. The traces pipeline processes trace data using the otlp receiver and exporter.
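Before wiring this file into Docker Compose, you can optionally sanity-check it with the Collector's validate subcommand (available in recent Collector releases), run here through the same contrib image used below:

docker run --rm \
  -v "$(pwd)/dev-resources/collector/otel-collector.yml:/etc/otelcol-contrib/otel-collector.yml" \
  otel/opentelemetry-collector-contrib:latest \
  validate --config=/etc/otelcol-contrib/otel-collector.yml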

Configuring Docker Compose file

Next, we’ll configure Docker Compose to run the OpenTelemetry Collector and Jaeger as services. This setup allows us to easily start, stop, and manage these services in a local environment.

Create a compose.yaml file in your project root with the following content:

services:
  otel-collector:
    container_name: otel-collector
    image: otel/opentelemetry-collector-contrib:latest
    restart: always
    command:
      - --config=/etc/otelcol-contrib/otel-collector.yml
    volumes:
      - ./dev-resources/collector/otel-collector.yml:/etc/otelcol-contrib/otel-collector.yml
    ports:
      - "1888:1888"   # pprof extension
      - "8888:8888"   # Prometheus metrics exposed by the collector
      - "8889:8889"   # Prometheus exporter metrics
      - "13133:13133" # health_check extension
      - "4317:4317"   # OTLP gRPC receiver
      - "4318:4318"   # OTLP HTTP receiver
      - "55679:55679" # zpages extension
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:13133"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 15s
    depends_on:
      jaeger:
        condition: service_healthy
    networks:
      - otel-network

  jaeger:
    container_name: jaeger
    image: jaegertracing/all-in-one:latest
    restart: always
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    ports:
      - "5775:5775/udp"
      - "6831:6831/udp"
      - "6832:6832/udp"
      - "5778:5778"
      - "16686:16686"
      - "14268:14268"
      - "14250:14250"
      - "9411:9411"
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:16686"]
      interval: 10s
      timeout: 5s
      retries: 5
    networks:
      - otel-network

networks:
  otel-network:
    driver: bridge

Explanation of the Docker Compose Configuration:

  • otel-collector Service:
    This service runs the OpenTelemetry Collector using the configuration we defined in otel-collector.yml. It exposes various ports for health checks, metrics, and trace data ingestion.

  • Jaeger Service:
    Jaeger is set up using the jaegertracing/all-in-one image, which includes the Jaeger query UI, collector, agent, and ingester. This service is configured to receive and display trace data from the OpenTelemetry Collector.

  • Health Checks:
    Health checks are defined for both services to ensure they start up correctly and remain healthy during operation. The depends_on configuration ensures that the Collector only starts once Jaeger is healthy.

  • Networks:
    A custom network (otel-network) is created to allow the services to communicate with each other.

Starting the services

Now that the configuration files are ready, you can start the OpenTelemetry Collector and Jaeger services using Docker Compose.

Run the following command in your terminal:

docker compose up -d

Explanation:

  • docker compose up -d:
    This command starts both the otel-collector and jaeger services in detached mode, meaning they run in the background.

Verifying the Setup:

  • OpenTelemetry Collector Health Check:
    You can verify that the OpenTelemetry Collector is running by visiting http://localhost:13133. This URL will show a health status page indicating whether the Collector is healthy.

  • Jaeger UI:
    To access the Jaeger UI, visit http://localhost:16686. This interface allows you to search for and visualize traces collected by the OpenTelemetry Collector.

With these services running, your application is now ready to send trace data to the OpenTelemetry Collector, which will forward it to Jaeger for visualization and analysis.
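If you prefer the terminal to the browser, the same checks can be scripted (assuming curl is installed on your host):

docker compose ps                 # both containers should report a healthy status
curl -s http://localhost:13133    # health_check endpoint of the Collector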

Testing the Results in Jaeger

Once you’ve set up your order management system with OpenTelemetry and Jaeger, it’s essential to test the implementation to ensure that distributed tracing is functioning correctly. This section will guide you through testing the system and interpreting the results in Jaeger.

Generating Trace Data with curl

To generate trace data and test the order management system, use the following curl command:

curl -X POST "http://localhost:8080/api/orders?customerId=12345"

This command sends a POST request to the /api/orders endpoint with a customerId of 12345. The server should respond with the order status, indicating that the order is being processed.

For a more comprehensive test, run this command multiple times with different customerId values. This will generate a variety of trace data, helping you understand how the system behaves under different scenarios and inputs.
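For example, a small shell loop generates several traces in one go (the customer IDs are arbitrary):

for id in 1001 1002 1003 1004; do
  curl -X POST "http://localhost:8080/api/orders?customerId=$id"
  echo
done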

Viewing and Understanding Trace Records in Jaeger

After generating trace data, follow these steps to view and analyze it in Jaeger:

  1. Open your web browser and go to http://localhost:16686.
  2. In the Jaeger UI, select your service name (e.g., “otel-demo” as configured in your application.yml) from the left sidebar.
  3. Adjust the time range to search for recent traces.
  4. Click on “Find Traces” to display a list of recorded traces.

Selecting a specific trace will bring up a detailed view. Here’s how to interpret the information:

(Screenshot: Jaeger UI trace detail view)

Trace Overview

  • Trace ID: A unique identifier for each trace, representing a single request through your system.
  • Duration: The total time taken for the request, from start to finish.
  • Service & Operation: Indicates which service and operation initiated the trace.
  • Depth: Shows the number of nested spans in the trace.
  • Total Spans: The number of individual operations recorded within the trace.

Span Details

Each row in the trace view represents a span, which corresponds to a specific operation within your application. Key information includes:

  • Operation Name: Describes the action represented by the span (e.g., “HTTP POST /api/orders”, “create-order-request”).
  • Service Name: The service that performed this operation.
  • Duration: The time taken for this specific operation.
  • Start Time: When this operation began, relative to the start of the trace.

Analyzing the Trace

In the example trace, you should see spans that represent different stages of order processing:

  1. The initial HTTP POST request to /api/orders.
  2. The create-order-request span, representing the execution of the controller method (738.82ms).
  3. The create-order span, showing the time taken to create the order within the service layer (708.97ms).
  4. The get-order-status span, indicating the time required to retrieve the order status (201.51ms).
  5. The process-order-async span, representing the asynchronous processing of the order (2 seconds).

Pay particular attention to the process-order-async span. It should overlap with other spans, demonstrating how Jaeger visualizes asynchronous operations and their relationship with other tasks.

Tracing on GCP Cloud Run

Cloud Run offers a serverless platform for deploying and scaling containerized applications. Integrating our Spring Boot application with OpenTelemetry tracing on Cloud Run provides valuable insights into the application’s performance in a cloud environment. This section focuses on integrating logs with Cloud Logging, a critical component for achieving comprehensive observability.

Integrating Logs with Cloud Logging

To integrate application logs with GCP Cloud Logging, you need to configure the logging framework (Logback in this case) and set up an OpenTelemetry log exporter. This setup ensures that logs are properly formatted and sent to Cloud Logging, where they can be correlated with traces.

  1. Configuring Logback

    Start by creating a logback-spring.xml file in the src/main/resources directory:

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
        <include resource="org/springframework/boot/logging/logback/defaults.xml"/>
        <include resource="org/springframework/boot/logging/logback/console-appender.xml"/>

        <appender name="OpenTelemetry"
                  class="io.opentelemetry.instrumentation.logback.appender.v1_0.OpenTelemetryAppender">
        </appender>

        <root level="INFO">
            <appender-ref ref="CONSOLE"/>
            <appender-ref ref="OpenTelemetry"/>
        </root>
    </configuration>

    What This Configuration Does:

    • It includes default Spring Boot logging configurations.
    • It adds an OpenTelemetry appender, enabling logs to be sent to OpenTelemetry.
    • It configures both console output and OpenTelemetry logging for all messages at the INFO level and above.
  2. Creating the OpenTelemetry Log Exporter Configuration

    Next, create a Java class named OpenTelemetryLogExporterConfig in the src/main/java/com/example/otel directory:

    import io.opentelemetry.api.OpenTelemetry;
    import io.opentelemetry.api.common.AttributeKey;
    import io.opentelemetry.api.common.Attributes;
    import io.opentelemetry.context.propagation.ContextPropagators;
    import io.opentelemetry.exporter.otlp.logs.OtlpGrpcLogRecordExporter;
    import io.opentelemetry.instrumentation.logback.appender.v1_0.OpenTelemetryAppender;
    import io.opentelemetry.sdk.OpenTelemetrySdk;
    import io.opentelemetry.sdk.logs.LogRecordProcessor;
    import io.opentelemetry.sdk.logs.SdkLoggerProvider;
    import io.opentelemetry.sdk.logs.SdkLoggerProviderBuilder;
    import io.opentelemetry.sdk.logs.export.BatchLogRecordProcessor;
    import io.opentelemetry.sdk.resources.Resource;
    import io.opentelemetry.sdk.trace.SdkTracerProvider;
    import org.springframework.beans.factory.ObjectProvider;
    import org.springframework.beans.factory.annotation.Value;
    import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    import org.springframework.core.env.Environment;

    @Configuration
    @ConditionalOnProperty(name = "otel.logs.exporter.enabled", havingValue = "true")
    public class OpenTelemetryLogExporterConfig {

        @Value("${otel.logs.exporter.endpoint:http://127.0.0.1:4317}")
        private String logsExporterEndpoint;

        @Bean
        OpenTelemetry openTelemetry(
                final SdkLoggerProvider sdkLoggerProvider,
                final SdkTracerProvider sdkTracerProvider,
                final ContextPropagators contextPropagators) {
            final OpenTelemetrySdk openTelemetrySdk = OpenTelemetrySdk.builder()
                    .setLoggerProvider(sdkLoggerProvider)
                    .setTracerProvider(sdkTracerProvider)
                    .setPropagators(contextPropagators)
                    .build();
            // Connect the Logback appender to this SDK instance
            OpenTelemetryAppender.install(openTelemetrySdk);
            return openTelemetrySdk;
        }

        @Bean
        SdkLoggerProvider otelSdkLoggerProvider(final Environment environment,
                final ObjectProvider<LogRecordProcessor> logRecordProcessors) {
            final String applicationName = environment.getProperty("spring.application.name", "application");
            final Resource resource = Resource
                    .create(Attributes.of(AttributeKey.stringKey("service.name"), applicationName));
            final SdkLoggerProviderBuilder builder = SdkLoggerProvider.builder()
                    .setResource(Resource.getDefault().merge(resource));
            logRecordProcessors.orderedStream().forEach(builder::addLogRecordProcessor);
            return builder.build();
        }

        @Bean
        LogRecordProcessor otlpLogExporter() {
            return BatchLogRecordProcessor
                    .builder(OtlpGrpcLogRecordExporter.builder()
                            .setEndpoint(logsExporterEndpoint)
                            .build())
                    .build();
        }
    }

    What This Configuration Does:

    • It sets up the OpenTelemetry SDK with log export capabilities.
    • It configures an SdkLoggerProvider with the application name as a resource attribute.
    • It creates a LogRecordProcessor to batch and export logs to the specified endpoint.
  3. Updating application.yml

    Finally, add the following configuration to your src/main/resources/application.yml file:

    otel.logs.exporter.enabled: false

    Purpose:
    This setting allows you to easily enable or disable the OpenTelemetry log exporter. Set it to true when deploying to Cloud Run.

    Result:
    With these configurations, your Spring Boot application will be set up to send logs to Cloud Logging when deployed on Cloud Run. This integration allows you to:

    • Correlate logs with traces in Cloud Logging.
    • Use structured logging for better searchability and analysis.
    • Leverage Cloud Logging’s features for log-based metrics and alerts.

    Remember to enable the log exporter by setting otel.logs.exporter.enabled to true in your Cloud Run environment variables or deployment configuration. This setup, combined with the tracing configuration, provides a comprehensive observability solution for your application on GCP Cloud Run.
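
    For instance, on an already-deployed service the flag could be flipped with gcloud; the service name otel-demo and the $REGION variable match what is used elsewhere in this guide, and Cloud Run accepts the dotted variable name just as the service YAML later on does:

    gcloud run services update otel-demo \
      --region=$REGION \
      --update-env-vars=otel.logs.exporter.enabled=true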

Building and Pushing the OpenTelemetry Collector Docker Image

To deploy your custom OpenTelemetry Collector in GCP, you need to build the Docker image and push it to Google Artifact Registry. This process ensures that your Collector image is available for deployment in GCP services like Cloud Run.
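A note on what "custom" means here: the local otel-collector.yml exports to Jaeger, which does not exist in GCP, so the image you bake for Cloud Run needs a configuration that exports to Cloud Trace instead. Below is a minimal sketch using the googlecloud exporter that ships with the contrib distribution; the file name is a placeholder, and the exporter authenticates via the service account set up in the next section:

# dev-resources/collector/otel-collector-gcp.yml (assumed name)
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:

exporters:
  googlecloud:  # writes spans to Cloud Trace

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [googlecloud]

A two-line Dockerfile can then bake this config into the contrib image, whose default config path is /etc/otelcol-contrib/config.yaml:

FROM otel/opentelemetry-collector-contrib:latest
COPY dev-resources/collector/otel-collector-gcp.yml /etc/otelcol-contrib/config.yaml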

Follow these steps:

  1. Set Up Environment Variables:

    export PROJECT_ID=your-gcp-project-id
    export REGION=your-region
    export REPOSITORY_NAME=your-repo-name
    export REGISTRY_URI=$REGION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY_NAME
    export BUILD_IMAGE_NAME=otel-collector
    export BUILD_IMAGE_TAG=0.0.1

    Replace the placeholders:

    • your-gcp-project-id with your actual GCP project ID.
    • your-region with your desired GCP region (e.g., us-central1).
    • your-repo-name with the name of your Artifact Registry repository.
  2. Build the Docker Image:

    docker build --platform linux/amd64 -t $REGISTRY_URI/$BUILD_IMAGE_NAME:$BUILD_IMAGE_TAG .

    Explanation:
    This command builds the Docker image for the AMD64 architecture, ensuring compatibility with most cloud environments.

  3. Push the Image to Google Artifact Registry:

    docker push $REGISTRY_URI/$BUILD_IMAGE_NAME:$BUILD_IMAGE_TAG

    Explanation:
    This command uploads your custom OpenTelemetry Collector image to your specified Artifact Registry repository.

    Important Notes:

    • Ensure you have the necessary permissions to push images to the Artifact Registry.
    • If you haven’t set up authentication for Artifact Registry, you may need to run gcloud auth configure-docker $REGION-docker.pkg.dev before pushing the image.
    • The --platform linux/amd64 flag ensures compatibility with GCP’s infrastructure. If you’re building on a different architecture (e.g., ARM-based Macs), this flag is crucial for ensuring the image works in GCP.

Setting Up GCP Permissions

Before deploying your application to Google Cloud Platform (GCP), it’s essential to configure the necessary permissions. This ensures that both your application and the OpenTelemetry Collector have the required access to GCP resources for monitoring, tracing, and logging.

Resetting GCP CLI Configuration

To begin with a clean setup, start by revoking existing authorizations and updating the gcloud CLI:

gcloud auth revoke --all
gcloud auth application-default revoke
gcloud components update -q

These commands will revoke all existing authorizations and ensure that you’re using the latest version of the gcloud CLI.

Authenticating and Setting the Project

Next, log in to GCP and set the target project:

gcloud auth login
export PROJECT_ID=your-project-id
gcloud config set project $PROJECT_ID

To verify your settings, run:

gcloud auth list
gcloud config list project

Replace your-project-id with your actual GCP project ID. This step ensures that all subsequent commands are executed within the correct project context.

Creating a Dedicated Service Account

For enhanced security and access control, create a dedicated service account for your Cloud Run application:

export SERVICE_ACCOUNT_ID_RUNTIME=your-service-account-id

gcloud iam service-accounts create $SERVICE_ACCOUNT_ID_RUNTIME \
  --description="Service account for Cloud Run" \
  --display-name="Cloud Run Service Account" \
  --project=$PROJECT_ID

Replace your-service-account-id with a unique identifier for your service account.

Assigning Necessary IAM Roles

Finally, grant the required permissions to the service account:

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:$SERVICE_ACCOUNT_ID_RUNTIME@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/monitoring.metricWriter"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:$SERVICE_ACCOUNT_ID_RUNTIME@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/cloudtrace.agent"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:$SERVICE_ACCOUNT_ID_RUNTIME@$PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/logging.logWriter"

These commands assign the following permissions to your service account:

  • roles/monitoring.metricWriter: Allows writing metrics to Cloud Monitoring.
  • roles/cloudtrace.agent: Enables sending trace data to Cloud Trace.
  • roles/logging.logWriter: Permits writing logs to Cloud Logging.
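
To confirm the bindings took effect, one option is to filter the project's IAM policy for the new service account:

gcloud projects get-iam-policy $PROJECT_ID \
  --flatten="bindings[].members" \
  --filter="bindings.members:$SERVICE_ACCOUNT_ID_RUNTIME@$PROJECT_ID.iam.gserviceaccount.com" \
  --format="table(bindings.role)"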

Building and Pushing Spring Boot Docker Images

To deploy your Spring Boot application to Google Cloud Run, you need to containerize it and push the image to a container registry. Here’s how to build your Spring Boot application as a Docker image and push it to Google Artifact Registry.

export PROJECT_ID=your-project-id
export REGION=your-region
export REPOSITORY_NAME=your-repo-name
export BUILD_IMAGE_NAME=otel-demo
export BUILD_IMAGE_TAG=0.0.1
export REGISTRY_URI=$REGION-docker.pkg.dev/$PROJECT_ID/$REPOSITORY_NAME

Replace the placeholders with your actual GCP project ID, preferred region, and Artifact Registry repository name.

Building the Docker Image

Use the Spring Boot Gradle plugin's bootBuildImage task, which builds an OCI image with Cloud Native Buildpacks, to package your Spring Boot application as a Docker image:

./gradlew --no-daemon -x test clean bootBuildImage --imageName=$REGISTRY_URI/$BUILD_IMAGE_NAME:$BUILD_IMAGE_TAG

This command does the following:

  • Cleans the project and skips tests for faster builds.
  • Uses Spring Boot’s built-in support for building OCI images.
  • Names the image according to your Artifact Registry path.
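
Optionally, before pushing, you can smoke-test the freshly built image locally; without a local Collector running it will log OTLP export errors, which is harmless for this check:

docker run --rm -p 8080:8080 $REGISTRY_URI/$BUILD_IMAGE_NAME:$BUILD_IMAGE_TAG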

Pushing the Image to Artifact Registry

After building the image, push it to Google Artifact Registry:

docker push $REGISTRY_URI/$BUILD_IMAGE_NAME:$BUILD_IMAGE_TAG

This command uploads your Spring Boot application image to the specified Artifact Registry repository.

Creating Cloud Run Service Configuration File

To deploy your Spring Boot application and OpenTelemetry Collector on Google Cloud Run, we need to create a service configuration file. This file defines how your service will be deployed and run.

Creating the Configuration File

Create a file named cloud-run.yaml in your project root directory with the following content:

apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: otel-demo
  annotations:
    run.googleapis.com/ingress: all
    run.googleapis.com/ingress-status: all
  labels:
    cloud.googleapis.com/location: asia-east1
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/maxScale: '4'
        autoscaling.knative.dev/minScale: '1'
        run.googleapis.com/cpu-throttling: 'false'
        run.googleapis.com/startup-cpu-boost: 'true'
      labels:
        run.googleapis.com/startupProbeType: Custom
    spec:
      containerConcurrency: 100
      timeoutSeconds: 300
      serviceAccountName: your-service-account@your-project-id.iam.gserviceaccount.com
      containers:
        - name: otel-app
          image: asia-east1-docker.pkg.dev/your-project-id/your-repo/otel-demo:0.0.24
          ports:
            - containerPort: 8080
          env:
            - name: spring.profiles.active
              value: test,gcp
            - name: otel.logs.exporter.enabled
              value: 'true'
            - name: logging.level.com.example.otel
              value: debug
            - name: JAVA_TOOL_OPTIONS
              value: -Djava.security.egd=file:/dev/./urandom -Dfile.encoding=UTF-8
          resources:
            limits:
              cpu: 1000m
              memory: 1Gi
          startupProbe:
            httpGet:
              path: /actuator/health/readiness
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 10
            timeoutSeconds: 1
          livenessProbe:
            httpGet:
              path: /actuator/health/liveness
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 10
            timeoutSeconds: 1
        - name: otel-collector
          image: asia-east1-docker.pkg.dev/your-project-id/your-repo/otel-collector:0.0.4
          resources:
            limits:
              cpu: 500m
              memory: 128Mi
  traffic:
    - percent: 100
      latestRevision: true

Key Components of the Configuration

  1. Dual Container Setup

    • otel-app: Your Spring Boot application
    • otel-collector: OpenTelemetry Collector
      This setup allows your application and Collector to run in the same Cloud Run service, facilitating data collection and forwarding.
  2. Environment Variable Configuration

    env:
      - name: spring.profiles.active
        value: test,gcp
      - name: otel.logs.exporter.enabled
        value: 'true'

    These environment variables activate specific Spring profiles and enable the OpenTelemetry log exporter.

  3. Resource Allocation

    resources:
      limits:
        cpu: 1000m
        memory: 1Gi

    Sets resource limits for the application container, ensuring performance and controlling costs.

  4. Health Checks

    startupProbe:
      httpGet:
        path: /actuator/health/readiness
        port: 8080
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080

    These probes ensure that the application starts correctly and remains running.

  5. Autoscaling Configuration

    annotations:
      autoscaling.knative.dev/maxScale: '4'
      autoscaling.knative.dev/minScale: '1'

    Configures the autoscaling range, allowing the service to scale between 1 and 4 instances based on demand.

  6. Service Account

    serviceAccountName: your-service-account@your-project-id.iam.gserviceaccount.com

    Specifies the GCP service account used to run the service.

Important Notes

Before deploying, ensure you replace the following placeholders:

  • your-service-account@your-project-id.iam.gserviceaccount.com: Your GCP service account email.
  • asia-east1-docker.pkg.dev/your-project-id/your-repo/otel-demo:0.0.24: Full path to your Spring Boot application image in Artifact Registry.
  • asia-east1-docker.pkg.dev/your-project-id/your-repo/otel-collector:0.0.4: Full path to your OpenTelemetry Collector image in Artifact Registry.

Benefits of This Configuration

  1. Integrated Observability: Running the application and Collector in the same service simplifies the data collection process.
  2. Resource Optimization: Precise control over CPU and memory usage ensures cost-effectiveness.
  3. Reliability: Health checks ensure stable service operation.
  4. Flexibility: Autoscaling configuration allows the service to adapt to different load scenarios.

This configuration sets up a robust and flexible environment for your Spring Boot application and OpenTelemetry Collector, ensuring proper resource allocation, health monitoring, and comprehensive observability in the GCP environment. It provides a reliable and scalable foundation for your application to run and be maintained on Cloud Run over the long term.

Deploying Cloud Run Service

After preparing your cloud-run.yaml configuration file, you can deploy your service to Google Cloud Run with just a few steps.

Deploy the service

Use the following command to deploy your service based on the cloud-run.yaml configuration:

gcloud run services replace cloud-run.yaml

This command will create or update your Cloud Run service according to the specifications in your YAML file.

Check deployment status

After the deployment command completes, verify that your service is running correctly:

  • Open the Google Cloud Console
  • Navigate to the Cloud Run section
  • Find your service in the list and check its status

You should see your service listed as “Deployed” and “Ready” if everything has been set up correctly.
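
The same check works from the terminal; the region must match the cloud.googleapis.com/location label in cloud-run.yaml:

gcloud run services describe otel-demo --region=$REGION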

Testing Traceability in GCP Cloud Run

After deploying your application to Cloud Run, it’s crucial to verify that the tracing functionality is working correctly.

Sending a Test Request

To generate trace data, send a test request to your deployed application:

curl -X POST "https://<your-cloud-run-service-url>/api/orders?customerId=12345"

Replace <your-cloud-run-service-url> with the actual URL of your Cloud Run service.

Viewing the GCP Trace Interface

  1. Open the Google Cloud Console.
  2. Navigate to the “Trace” section in the left sidebar.
  3. In the Trace interface, select your service name and set an appropriate time range to find your recent trace.
  4. Click on a trace record to view detailed information.

(Screenshot: GCP Trace overview)

Explaining Trace Records

The GCP Trace interface provides valuable insights into your application’s performance:

  1. Trace Overview: Shows the overall request duration and involved services.

  2. Span Details: Clicking “Show expanded” reveals detailed information for each span:

    • Duration of each operation
    • Log messages associated with each span
    • Relationships between different operations

(Screenshot: GCP Trace span details)

In this example, we can observe:

  • HTTP POST /api/orders: The main request, taking 2.578 seconds in total.
    • create-order-request: Handling the order creation (806 ms).
    • create-order: Actually creating the order (752.585 ms).
    • get-order-status: Retrieving the order status (505.523 ms).
    • process-order-async: Asynchronous order processing (2 seconds).

This detailed breakdown allows you to:

  • Identify performance bottlenecks
  • Understand the flow of operations in your application
  • Correlate log messages with specific parts of the request processing
  • Analyze the efficiency of both synchronous and asynchronous operations

By examining these trace records, you can gain deep insights into your application’s behavior, helping you optimize performance and troubleshoot issues effectively in your GCP Cloud Run environment.

Summary

Integrating logs and traces gives us a more complete picture of how the application behaves at runtime and provides valuable information for optimizing its performance and reliability. I hope this guide helps you make better use of the GCP Trace interface for combined log and trace analysis.

Acknowledgement

Thanks to Marcin for his advice, which helped me complete this article.
