The previous two articles
in this series covered the Chat Completions API — how to set up a client, maintain conversation history manually, call
external tools, and control output structure with response_format. That API gives you full control and a clear mental
model of what goes over the wire. This article covers the other primary OpenAI interface: the Responses API.
The Responses API moves conversation state from the client to OpenAI’s servers. You no longer maintain a history slice
and re-send it with every call. Instead, you track a response ID and pass it back on the next request. That is a meaningful
shift for agent-oriented applications — less maintenance, but also less transparency. It is worth understanding the
trade-offs between the two APIs before choosing which one to build on.
Responses API #
OpenAI introduced the Responses API in 2025, positioning it as the foundation for building agents. The Chat Completions API is stateless — every request must carry the full conversation history, and the client owns that state entirely. The Responses API inverts this: conversation state lives on OpenAI’s servers, and you reference previous turns by ID rather than re-sending them.
Both APIs give you access to the same underlying models and tool-calling mechanics. The difference is where the orchestration responsibility sits. The table below, first introduced in the Chat Completions API article, summarizes the trade-offs:
| Feature | Chat Completions API | Responses API |
|---|---|---|
| Conversation state | Client-managed | Server-managed |
| History management | Manual — sent with every request | Automatic |
| Tool support | Manual function calling | Built-in tools (web search, code interpreter) |
| Streaming | Yes | Yes |
| Control | Full | Limited |
| Vendor coupling | Low | Higher |
| Best for | Custom agents, full control | Rapid prototyping, built-in tooling |
The Chat Completions API is the right default when you want to control exactly what the model sees and when you need portability across providers. The Responses API reduces boilerplate and fits well when you want to prototype quickly or lean on OpenAI’s managed tooling. In this article we build the same conversational agent we built before — but with the Responses API driving state management.
First AI agent #
The openai-go SDK covers both APIs in a single Go module. The same client initialization you used for
Chat Completions works here. To call the API, you need a secret key from
platform.openai.com — navigate to the API Keys section, generate a key,
and store it in an environment variable.
package main
import (
// ...
"github.com/openai/openai-go"
"github.com/openai/openai-go/option"
)
var previousResponseID string
var client openai.Client
func main() {
apiKey := os.Getenv("OPENAI_API_KEY")
if apiKey == "" {
fmt.Fprintln(os.Stderr, "error: OPENAI_API_KEY environment variable is not set")
os.Exit(1)
}
client = openai.NewClient(option.WithAPIKey(apiKey))
}

The client is created the same way as before — API key from the environment, exit immediately if it is missing. The
difference is what comes next. Instead of a history slice, we declare a previousResponseID string. The Responses API
handles conversation state on its side; all we track is the ID of the last response so we can tell the API what the
previous turn was. When previousResponseID is empty, the API treats the request as the start of a new conversation.
import (
"bufio"
// ...
)
func talkToAgent(userInput string) (string, error) {
return "", nil
}
scanner := bufio.NewScanner(os.Stdin)
fmt.Println("AI assistant ready. Type your question or 'exit' to quit.")
fmt.Println()
for {
fmt.Print("Human: ")
if !scanner.Scan() {
break
}
input := strings.TrimSpace(scanner.Text())
if input == "" {
continue
}
if input == "exit" {
break
}
text, err := talkToAgent(input)
if err != nil {
fmt.Fprintf(os.Stderr, "Error: %v\n", err)
continue
}
fmt.Printf("Agent: %s\n\n", text)
}

Here bufio.Scanner reads user input line by line from standard input.
Each non-empty, non-exit line is passed directly to talkToAgent as a string argument — compare this to the Chat Completions
version, where we first appended the input to the history slice before calling the function. With the Responses API,
the input goes straight to the function and the API takes care of threading it into the ongoing conversation.
func talkToAgent(userInput string) (string, error) {
params := responses.ResponseNewParams{
Model: openai.ChatModelGPT4_1Mini,
Instructions: openai.String(
"You are a helpful assistant.",
),
Input: responses.ResponseNewParamsInputUnion{
OfString: openai.String(userInput),
},
}
if previousResponseID != "" {
params.PreviousResponseID = openai.String(previousResponseID)
}
resp, err := client.Responses.New(context.Background(), params)
if err != nil {
return "", fmt.Errorf("API call failed: %w", err)
}
previousResponseID = resp.ID
for _, item := range resp.Output {
switch item.Type {
case "message":
msg := item.AsMessage()
for _, content := range msg.Content {
if content.Type == "output_text" {
return content.AsOutputText().Text, nil
}
}
}
}
return "", fmt.Errorf("unexpected response: no text output and no tool calls")
}

Building the request starts with responses.ResponseNewParams. The Instructions field replaces the system message
from Chat Completions — same concept, different field name. The user’s input goes into Input as a plain string. If
previousResponseID is set, we attach it to the params; this is what connects the request to the previous turn on
OpenAI’s side. Without it, the model has no history of the conversation.
After the call completes, we immediately update previousResponseID with the ID from the response. This is the entirety
of the state management — one string, updated on every turn. The response output is a typed list of items. We iterate
and look for a message item containing an output_text block; that is where the model’s text response is. The
structure is more nested than the Chat Completions response, but the pattern is consistent once you have seen it once.
AI assistant ready. Type your question or 'exit' to quit.
Human: hi
Agent: Hello! How can I assist you today?
Human: what is my name?
Agent: I don't know your name yet. Could you please tell me what it is?
Human: my name is Marko
Agent: Nice to meet you, Marko! How can I help you today?
Human: what is my name?
Agent: Your name is Marko. How can I assist you further?

The agent remembers the user’s name across turns — because previousResponseID links each request to the prior one and
OpenAI reconstructs the context on its side. This is the core value of the Responses API: multi-turn memory without any
client-side history management.
First tool integration #
The Responses API supports the same tool-calling pattern as Chat Completions. To demonstrate, consider what happens when you ask the agent a question it cannot answer without real-world data:
Human: what is the current time in Frankfurt?
Agent: I don't have access to real-time data. However, you
can check the current time in Frankfurt by searching "current
time in Frankfurt" on a search engine or by using a world clock app.
Frankfurt is in the Central European Time (CET) zone, which is
UTC+1 during standard time and UTC+2 during daylight saving time
(typically from the last Sunday in March to the last Sunday in October).

The model knows about time zones — it just does not know what time it is right now. Tools exist precisely to fill this gap: they let the model delegate specific calls to external functions and incorporate the results into its response.

The process works in three steps. First, you send the model a list of tool definitions — names, descriptions, and parameter schemas. The model reads those definitions and decides whether it needs to invoke one before it can answer. If it does, it responds not with a message but with a list of tool calls, each containing the tool name and the arguments it wants to pass. You execute those calls on your side, then send a new request to the model with the results. The model may request additional tool calls or produce a final answer. This loop continues until the model has everything it needs.
The first step is a plain Go function that retrieves the current time in a given timezone:
func getCurrentTime(timezone string) (string, error) {
location, err := time.LoadLocation(timezone)
if err != nil {
return "", fmt.Errorf("failed to load location: %w", err)
}
now := time.Now().In(location)
return now.Format("2006-01-02 15:04:05"), nil
}

Here time.LoadLocation resolves an IANA timezone name like
Europe/Berlin to a *time.Location. time.Now().In(location) returns the current instant in that timezone, and we
format it with Go’s reference time. The function is deliberately simple — it does one thing and returns a string. Tool
functions do not need to be complex; they need to be correct and fast.
Next, we define the tool and update talkToAgent to include it in the request:
Tool definition and updated talkToAgent
import (
// ...
"github.com/openai/openai-go/responses"
)
var getTimeToolDefinition = responses.ToolUnionParam{
OfFunction: &responses.FunctionToolParam{
Name: "get_time",
Description: openai.String("Fetch the current time in a given timezone."),
Parameters: openai.FunctionParameters{
"type": "object",
"properties": map[string]interface{}{
"timezone": map[string]interface{}{
"type": "string",
"description": "The international timezone name, e.g. America/Los_Angeles.",
},
},
"required": []string{"timezone"},
"additionalProperties": false,
},
},
}
func talkToAgent(userInput string) (string, error) {
params := responses.ResponseNewParams{
Model: openai.ChatModelGPT4oMini,
Instructions: openai.String(
"You are a helpful time assistant. " +
"When asked about time at particular locations, " +
"always use the get_time tool to get current time in a given timezone. " +
"Never guess the time.",
),
Input: responses.ResponseNewParamsInputUnion{
OfString: openai.String(userInput),
},
Tools: []responses.ToolUnionParam{
getTimeToolDefinition,
},
}
if previousResponseID != "" {
params.PreviousResponseID = openai.String(previousResponseID)
}
for {
resp, err := client.Responses.New(context.Background(), params)
if err != nil {
return "", fmt.Errorf("API call failed: %w", err)
}
previousResponseID = resp.ID
// ...
}
}

The tool definition is a responses.ToolUnionParam wrapping a responses.FunctionToolParam. The Parameters field is
a standard JSON Schema object: it declares that the tool accepts a single required timezone string, with
additionalProperties: false to prevent the model from passing fields we did not define. The system prompt is updated
to instruct the model to always use get_time for time queries and never guess — without this instruction, the model
may produce a plausible-sounding but stale answer from its training data.
The API call is now wrapped in a for loop. This is essential: when the model decides to call a tool, it does not return
a text message — it returns a list of function calls. We execute those, send the results back, and the loop continues.
A single turn can involve multiple round-trips between the application and the model before a final answer is produced.
func callTool(call responses.ResponseFunctionToolCall) (string, error) {
// ...
}
func talkToAgent(userInput string) (string, error) {
// ...
for {
resp, err := client.Responses.New(context.Background(), params)
if err != nil {
return "", fmt.Errorf("API call failed: %w", err)
}
previousResponseID = resp.ID
var toolResultItems []responses.ResponseInputItemUnionParam
hasToolCalls := false
for _, item := range resp.Output {
switch item.Type {
case "function_call":
hasToolCalls = true
toolCall := item.AsFunctionCall()
fmt.Printf("[tool call: %s(%s)]\n", toolCall.Name, toolCall.Arguments)
result, err := callTool(toolCall)
if err != nil {
result = fmt.Sprintf("error: %s", err.Error())
}
fmt.Printf("[tool result: %s]\n", result)
toolResultItems = append(toolResultItems, responses.ResponseInputItemParamOfFunctionCallOutput(toolCall.CallID, result))
case "message":
msg := item.AsMessage()
for _, content := range msg.Content {
if content.Type == "output_text" {
return content.AsOutputText().Text, nil
}
}
}
}
if !hasToolCalls {
return "", fmt.Errorf("unexpected response: no text output and no tool calls")
}
params = responses.ResponseNewParams{
Model: openai.ChatModelGPT4oMini,
PreviousResponseID: openai.String(previousResponseID),
Input: responses.ResponseNewParamsInputUnion{
OfInputItemList: toolResultItems,
},
Tools: []responses.ToolUnionParam{
getTimeToolDefinition,
},
}
}
}

Each output item is inspected by type. A message item means the model has produced a final answer — we return it
immediately. A function_call item means the model wants to invoke a tool. We call callTool with the tool call
details, collect the result, and append it to toolResultItems using ResponseInputItemParamOfFunctionCallOutput. If
the response contained tool calls but no message, we build a new ResponseNewParams with the tool results as input
and loop again. The Instructions field is deliberately omitted from this follow-up request — only PreviousResponseID
and the tool results are needed, since the model already has the conversation context from the prior response.
type timeArgs struct {
Timezone string `json:"timezone"`
}
type timeResult struct {
Time string `json:"time"`
}
func callTool(call responses.ResponseFunctionToolCall) (string, error) {
switch call.Name {
case "get_time":
var args timeArgs
if err := json.Unmarshal([]byte(call.Arguments), &args); err != nil {
return "", fmt.Errorf("failed to parse tool arguments: %w", err)
}
currentTime, err := getCurrentTime(args.Timezone)
if err != nil {
errMsg, _ := json.Marshal(map[string]string{"error": err.Error()})
return string(errMsg), nil
}
result := timeResult{
Time: currentTime,
}
out, err := json.Marshal(result)
if err != nil {
return "", fmt.Errorf("failed to marshal tool result: %w", err)
}
return string(out), nil
default:
return "", fmt.Errorf("unknown tool: %q", call.Name)
}
}

Now, callTool is a dispatcher: it switches on the tool name and routes to the appropriate Go function. The model returns
arguments as a JSON string, so we unmarshal into a typed struct — timeArgs in this case — before calling getCurrentTime.
The result is marshaled back to JSON and returned as a string. This JSON contract is what the Responses API expects for
tool outputs. Notice that errors from getCurrentTime are also returned as JSON rather than propagated as Go errors —
this allows the model to read the error and respond to the user intelligently rather than crashing the loop.
The model is smart enough to map a natural-language question like “what is the time in Frankfurt?” to the IANA timezone
Europe/Berlin — it knows the relationship between city names and timezone identifiers from its training data. The tool
definition only needed to describe the parameter; the model handles the resolution.
Time assistant ready. Type your question or 'exit' to quit.
Human: what is the time in Frankfurt?
[tool call: get_time({"timezone":"Europe/Berlin"})]
[tool result: {"time":"2026-03-18 17:13:50"}]
Agent: The current time in Frankfurt is 17:13 (5:13 PM) on March 18, 2026.

The full source is available at llm-and-golang-examples.
Conclusion #
The Responses API trades control for convenience. Server-side state management removes the history-maintenance boilerplate from the Chat Completions API, and the tool-calling pattern maps cleanly onto ordinary Go functions. The cost of that convenience is tighter coupling to OpenAI’s infrastructure and less visibility into exactly what the model receives on each turn. For production systems where observability and provider flexibility matter, the Chat Completions API remains the more defensible choice. For rapid agent prototyping, the Responses API is the faster path. Knowing both gives you the option to choose based on actual requirements rather than default habit.