Building AI Applications with Spring AI (2): Implementing Chat Histories and Instant SSE Responses

Posted on 2024-04-08 Edited on 2025-07-24 In Spring, AI

Following our initial dive into creating prompts with Spring AI, this article ventures further into enhancing user interactions. We focus on incorporating chat histories and delivering responses in real-time using Server-Sent Events (SSE). This combination not only elevates the user experience by providing instant feedback but also simulates a dynamic conversation flow, akin to real-life interactions.

Why Chat Histories and Real-Time Responses are Crucial

Context is paramount in any conversation, especially when it comes to AI generating relevant and coherent responses. By integrating chat histories, AI can consider previous exchanges, offering a more personalized and engaging user experience. Combined with SSE for real-time messaging, our chat application becomes lively and interactive, ensuring users remain engaged by receiving immediate responses.

Implementing Chat Histories

First, we need to store chat messages which will later be used to provide context for AI-generated responses. Although our example utilizes a simple in-memory solution for demonstration, a database implementation is advisable for production scenarios.

The ChatHistoryService

This service is responsible for storing and retrieving chat messages:

import org.springframework.ai.chat.messages.Message;
import org.springframework.stereotype.Component;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

@Component
public class ChatHistoryService {
    private final List<Message> chatHistory = new ArrayList<>();

    public void addMessage(Message message) {
        chatHistory.add(message);
    }

    public List<Message> getChatHistory() {
        return Collections.unmodifiableList(chatHistory);
    }
}
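Because a Spring component is a singleton by default, the service above keeps one list shared by every caller, and ArrayList is not safe for concurrent writes. A minimal sketch of one way to separate conversations, assuming each client supplies some conversation id: the names ChatEntry and ConversationHistoryStore are hypothetical, and ChatEntry merely stands in for Spring AI's Message so the example compiles on its own.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for Spring AI's Message type (assumption, not the library class).
record ChatEntry(String role, String text) {}

// Hypothetical store: one history per conversation id instead of one global list.
class ConversationHistoryStore {
    private final Map<String, List<ChatEntry>> histories = new ConcurrentHashMap<>();

    // Appends a message to the given conversation, creating its list on first use.
    void addMessage(String conversationId, ChatEntry entry) {
        histories.computeIfAbsent(conversationId, id -> new CopyOnWriteArrayList<>()).add(entry);
    }

    // Returns an immutable snapshot; later writes do not change the returned list.
    List<ChatEntry> getHistory(String conversationId) {
        return List.copyOf(histories.getOrDefault(conversationId, List.of()));
    }
}
```

This is still in-memory only, so the earlier advice about using a database in production applies unchanged.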

Generating Contextual Responses

For the AI to generate relevant responses, the chat history must be included to provide context:

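public Prompt createPromptWithHistory(String userInput) {
    // Copy the history into a mutable list; getChatHistory() returns an unmodifiable view.
    List<Message> messages = new ArrayList<>(chatHistoryService.getChatHistory());
    // Append the new user message after the earlier turns.
    messages.add(new UserMessage(userInput));
    return new Prompt(messages);
}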

This process ensures every prompt sent to the AI encompasses the entire conversation history, enabling the generation of more relevant and engaging responses.
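Sending the entire history makes every prompt longer than the last. One common mitigation, which the article itself does not implement, is to send only the most recent messages. The sketch below illustrates that idea with plain strings in place of Message objects; HistoryWindow, buildPromptMessages, and maxMessages are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: keeps only the tail of the history when building a prompt.
class HistoryWindow {
    // Returns the last maxMessages entries of history, followed by the new user input.
    static List<String> buildPromptMessages(List<String> history, String userInput, int maxMessages) {
        if (maxMessages < 0) {
            throw new IllegalArgumentException("maxMessages must be >= 0");
        }
        int from = Math.max(0, history.size() - maxMessages);
        List<String> messages = new ArrayList<>(history.subList(from, history.size()));
        messages.add(userInput);
        return messages;
    }
}
```

With a history of four messages and maxMessages set to 2, the result holds the last two stored messages plus the new input. A real application would more likely budget by token count than by message count.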

Implementing Real-Time Responses with SSE

To offer immediate feedback to users, we leverage SSE. This entails modifying the ChatController to include chat history in the prompt and use SseEmitter for real-time communication.

The ChatController

The ChatController is a Spring MVC controller that handles HTTP GET requests to the /chat-stream endpoint. It’s designed to initiate a real-time chat session between the user and an AI model using Server-Sent Events (SSE) for live message streaming. Additionally, it incorporates a chat history feature to provide context to the AI, enhancing the relevance of its responses.

@Slf4j
@RestController
@RequiredArgsConstructor
@FieldDefaults(level = AccessLevel.PRIVATE)
public class ChatController {

    final ChatHistoryService chatHistoryService;
    final OpenAiChatClient openAiChatClient;

    @GetMapping("/chat-stream")
    public SseEmitter streamChat(String input) {
        final SseEmitter emitter = new SseEmitter(120 * 1000L); // 2 minutes timeout
        Prompt prompt = createPromptWithHistory(input);

        Flux<ChatResponse> chatResponseFlux = openAiChatClient.stream(prompt);
        StringBuilder fullMessageBuilder = new StringBuilder();

        chatResponseFlux.doOnNext(chatResponse -> {
            String content = chatResponse.getResult().getOutput().getContent();
            if (StringUtils.hasText(content)) {
                fullMessageBuilder.append(content);
                try {
                    emitter.send(SseEmitter.event().name("chatMessage").data(chatResponse));
                } catch (IOException e) {
                    // send() throws a checked java.io.IOException, which the lambda cannot
                    // propagate directly; rethrowing unchecked routes it to doOnError.
                    throw new UncheckedIOException(e);
                }
            }
        }).doOnError(emitter::completeWithError).doOnComplete(() -> {
            // Store the turn only after the whole response has arrived.
            chatHistoryService.addMessage(new UserMessage(input));
            String fullMessage = fullMessageBuilder.toString();
            chatHistoryService.addMessage(new AssistantMessage(fullMessage));
            log.info("AI:{}", fullMessage);
            emitter.complete();
        }).subscribe();

        return emitter;
    }
}

Breaking Down the Code

@Slf4j
@RestController
@RequiredArgsConstructor
@FieldDefaults(level = AccessLevel.PRIVATE)

@Slf4j: This Lombok annotation automatically generates a Logger object, allowing us to log messages throughout the class.
@RestController: Marks this class as a controller with methods that return domain objects rather than views. It’s a convenience annotation combining @Controller and @ResponseBody.
@RequiredArgsConstructor: Another Lombok annotation that generates a constructor with one parameter for each field that requires special handling. All non-initialized final fields get a parameter, as well as any @NonNull fields. Spring uses this constructor to inject the two dependencies.
@FieldDefaults(level = AccessLevel.PRIVATE): Sets the access level of class fields. Here, it makes all fields private by default, enhancing encapsulation.

final ChatHistoryService chatHistoryService;
final OpenAiChatClient openAiChatClient;

These are dependencies injected by Spring’s Inversion of Control (IoC) container. ChatHistoryService manages chat history, and OpenAiChatClient interfaces with OpenAI’s chat API.

@GetMapping("/chat-stream")
public SseEmitter streamChat(String input) {

This method is mapped to handle HTTP GET requests to /chat-stream. It accepts a query parameter named input, which represents the user’s message to the AI.

final SseEmitter emitter = new SseEmitter(120 * 1000L);

An SseEmitter instance is created with a timeout of 120 seconds. It’s used to send server-sent events to the client, facilitating real-time communication.

Prompt prompt = createPromptWithHistory(input);

A Prompt object is created with the user’s input and any existing chat history. This context allows the AI to generate more relevant responses.

Flux<ChatResponse> chatResponseFlux = openAiChatClient.stream(prompt);

The openAiChatClient.stream(prompt) call initiates the conversation with the AI, returning a Flux<ChatResponse> - a reactive stream of ChatResponse objects representing the AI’s replies.

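chatResponseFlux.doOnNext(chatResponse -> {
    String content = chatResponse.getResult().getOutput().getContent();
    if (StringUtils.hasText(content)) {
        fullMessageBuilder.append(content);
        try {
            emitter.send(SseEmitter.event().name("chatMessage").data(chatResponse));
        } catch (IOException e) {
            // send() throws a checked IOException; rethrow unchecked so doOnError sees it.
            throw new UncheckedIOException(e);
        }
    }
})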

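For each ChatResponse received (doOnNext), the content is first checked with StringUtils.hasText. If it contains text, it is appended to a StringBuilder, which accumulates the full AI response, and the chunk is sent to the client with emitter.send() as an event named chatMessage. Because hasText also returns false for whitespace-only strings, a chunk consisting solely of spaces or newlines is skipped.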
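To make the named event concrete, the sketch below formats a single event the way the text/event-stream format describes it: an optional event line, one data line per line of payload, and a blank line as terminator. It is an illustration of the wire format only, not Spring's SseEmitter implementation, and the exact whitespace Spring writes may differ. SseFrames and format are hypothetical names.

```java
// Hypothetical helper illustrating the text/event-stream wire format.
class SseFrames {
    // Builds one event: optional "event:" line, a "data:" line per payload line,
    // and an empty line that tells the client the event is complete.
    static String format(String eventName, String data) {
        StringBuilder frame = new StringBuilder();
        if (eventName != null && !eventName.isEmpty()) {
            frame.append("event: ").append(eventName).append('\n');
        }
        // Multi-line payloads need a separate "data:" field for every line.
        for (String line : data.split("\n", -1)) {
            frame.append("data: ").append(line).append('\n');
        }
        return frame.append('\n').toString();
    }
}
```

Calling SseFrames.format("chatMessage", "Hello") yields the three lines "event: chatMessage", "data: Hello", and an empty line. A browser EventSource would receive such events through a listener registered for the chatMessage event type.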

.doOnError(emitter::completeWithError)

If an error occurs in the stream (doOnError), it’s propagated to the client by completing the SseEmitter with an error.

.doOnComplete(() -> {
    chatHistoryService.addMessage(new UserMessage(input));
    String fullMessage = fullMessageBuilder.toString();
    chatHistoryService.addMessage(new AssistantMessage(fullMessage));
    log.info("AI:{}", fullMessage);
    emitter.complete();
})
.subscribe();

Once all messages have been processed (doOnComplete), the user’s input and the full AI response are added to the chat history. The log captures the AI’s full response for debugging or auditing purposes, and the SseEmitter is then marked as complete. None of these callbacks run until subscribe() is called: subscribing is what starts the request to the AI and activates the whole sequence.
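One consequence of saving history only in doOnComplete is that a stream that fails midway leaves no partial turn behind. The sketch below mimics that behavior without Reactor or Spring so it can run on its own. TurnRecorder and its methods are hypothetical names that loosely mirror the three callbacks, and String.isBlank is used as an approximation of StringUtils.hasText.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the callback chain: buffer chunks, commit only on success.
class TurnRecorder {
    private final List<String> history = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();
    private final String userInput;

    TurnRecorder(String userInput) {
        this.userInput = userInput;
    }

    // Mirrors doOnNext: skip null or blank chunks, buffer the rest.
    void onNext(String chunk) {
        if (chunk != null && !chunk.isBlank()) {
            buffer.append(chunk);
        }
    }

    // Mirrors doOnComplete: record the user message, then the assembled reply.
    void onComplete() {
        history.add("user: " + userInput);
        history.add("assistant: " + buffer);
    }

    // Mirrors doOnError: drop the partial reply, so nothing reaches the history.
    void onError(Throwable error) {
        buffer.setLength(0);
    }

    List<String> history() {
        return List.copyOf(history);
    }
}
```

Whether dropping a failed turn is the right choice depends on the application; some would rather keep the user's message so that a retry has context.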

return emitter;

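The method returns the SseEmitter immediately, before any AI output has arrived. Spring MVC keeps the HTTP response open and writes each event to the client as it is sent, enabling real-time streaming of chat messages.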

Testing Continuity with cURL Commands

To effectively test the continuity of the conversation, utilize the following cURL commands:

First Test Call:

curl --location '127.0.0.1:8080/chat-stream?input=Who%20is%20the%20bad%20guy%20in%20Kamen%20Rider%3F'

Second Test Call:

curl --location '127.0.0.1:8080/chat-stream?input=What%20is%20their%20mission%3F'

These commands simulate a continuous conversation flow, showcasing the application’s capability to maintain context and deliver real-time responses.
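The input values in these commands are percent-encoded by hand. When building such URLs from code, the encoding can be produced with the JDK, as in the sketch below. ChatStreamUrl and build are hypothetical names, and the http:// scheme in the usage note is an assumption, since the cURL commands omit it.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds a /chat-stream URL with an encoded input parameter.
class ChatStreamUrl {
    static String build(String baseUrl, String input) {
        // URLEncoder targets form encoding and writes spaces as '+', so convert them
        // to %20. A literal '+' in the input is already encoded as %2B at this point.
        String encoded = URLEncoder.encode(input, StandardCharsets.UTF_8).replace("+", "%20");
        return baseUrl + "/chat-stream?input=" + encoded;
    }
}
```

For example, ChatStreamUrl.build("http://127.0.0.1:8080", "What is their mission?") returns http://127.0.0.1:8080/chat-stream?input=What%20is%20their%20mission%3F, which matches the second test call.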

Conclusion

By embedding chat histories and leveraging SSE for instant responses, we have significantly amplified the capabilities of our Spring AI-powered application. This approach not only renders interactions more engaging and personalized but also sets the stage for advanced features like handling multimodal inputs or integrating external APIs for expanded functionalities. As you continue to explore the vast possibilities with Spring AI, let your imagination lead the way to innovation. Happy coding!

Stay tuned for more tutorials in this series as we delve deeper into the exciting world of AI application development with Spring AI.

References and Future Tools

As the Spring AI project evolves, new tools and libraries are anticipated to simplify aspects like chat history management. For instance, a discussion on GitHub hints at the development of an official Spring AI ChatHistory abstraction. This will offer a more streamlined approach to incorporating chat histories into AI interactions, allowing for even more sophisticated chat applications.

References:

Prompt.java