LLM and Go: OpenAI integration via Responses API

Marko Milojevic
LLM and Golang - This article is part of a series.
Part 3: This Article

The previous two articles in this series covered the Chat Completions API — how to set up a client, maintain conversation history manually, call external tools, and control output structure with response_format. That API gives you full control and a clear mental model of what goes over the wire. This article covers the other primary OpenAI interface: the Responses API.

The Responses API moves conversation state from the client to OpenAI’s servers. You no longer maintain a history slice and re-send it with every call. Instead, you track a response ID and pass it back on the next request. That is a meaningful shift for agent-oriented applications — less maintenance, but also less transparency. The trade-offs between the two APIs are worth understanding before you choose which one to build on.

Responses API

OpenAI introduced the Responses API in 2025, positioning it as the foundation for building agents. The Chat Completions API is stateless — every request must carry the full conversation history, and the client owns that state entirely. The Responses API inverts this: conversation state lives on OpenAI’s servers, and you reference previous turns by ID rather than re-sending them.

Both APIs give you access to the same underlying models and tool-calling mechanics. The difference is where the orchestration responsibility sits. The table below, first introduced in the Chat Completions API article, summarizes the trade-offs:

| Feature | Chat Completions API | Responses API |
|---|---|---|
| Conversation state | Client-managed | Server-managed |
| History management | Manual — sent with every request | Automatic |
| Tool support | Manual function calling | Built-in tools (web search, code interpreter) |
| Streaming | Yes | Yes |
| Control | Full | Limited |
| Vendor coupling | Low | Higher |
| Best for | Custom agents, full control | Rapid prototyping, built-in tooling |

The Chat Completions API is the right default when you want to control exactly what the model sees and when you need portability across providers. The Responses API reduces boilerplate and fits well when you want to prototype quickly or lean on OpenAI’s managed tooling. In this article we build the same conversational agent we built before — but with the Responses API driving state management.

First AI agent

The openai-go SDK covers both APIs under one package. The same client initialization you used for Chat Completions works here. To call the API, you need a secret key from platform.openai.com — navigate to the API Keys section, generate a key, and store it in an environment variable.

package main

import (
	// ...

	"github.com/openai/openai-go"
	"github.com/openai/openai-go/option"
)

var previousResponseID string
var client openai.Client

func main() {
	apiKey := os.Getenv("OPENAI_API_KEY")
	if apiKey == "" {
		fmt.Fprintln(os.Stderr, "error: OPENAI_API_KEY environment variable is not set")
		os.Exit(1)
	}

	client = openai.NewClient(option.WithAPIKey(apiKey))
}

The client is created the same way as before — API key from the environment, exit immediately if it is missing. The difference is what comes next. Instead of a history slice, we declare a previousResponseID string. The Responses API handles conversation state on its side; all we track is the ID of the last response so we can tell the API what the previous turn was. When previousResponseID is empty, the API treats the request as the start of a new conversation.

import (
	"bufio"

	// ...
)

func talkToAgent(userInput string) (string, error) {
	return "", nil
}

scanner := bufio.NewScanner(os.Stdin)

fmt.Println("AI assistant ready. Type your question or 'exit' to quit.")
fmt.Println()

for {
	fmt.Print("Human: ")

	if !scanner.Scan() {
		break
	}

	input := strings.TrimSpace(scanner.Text())
	if input == "" {
		continue
	}
	if input == "exit" {
		break
	}

	text, err := talkToAgent(input)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Error: %v\n", err)
		continue
	}

	fmt.Printf("Agent: %s\n\n", text)
}

Here bufio.Scanner reads user input line by line from standard input. Each non-empty, non-exit line is passed directly to talkToAgent as a string argument — compare this to the Chat Completions version, where we first appended the input to the history slice before calling the function. With the Responses API, the input goes straight to the function and the API takes care of threading it into the ongoing conversation.

func talkToAgent(userInput string) (string, error) {
	params := responses.ResponseNewParams{
		Model: openai.ChatModelGPT4_1Mini,
		Instructions: openai.String(
			"You are a helpful assistant.",
		),
		Input: responses.ResponseNewParamsInputUnion{
			OfString: openai.String(userInput),
		},
	}

	if previousResponseID != "" {
		params.PreviousResponseID = openai.String(previousResponseID)
	}

	resp, err := client.Responses.New(context.Background(), params)
	if err != nil {
		return "", fmt.Errorf("API call failed: %w", err)
	}

	previousResponseID = resp.ID

	for _, item := range resp.Output {
		switch item.Type {
		case "message":
			msg := item.AsMessage()
			for _, content := range msg.Content {
				if content.Type == "output_text" {
					return content.AsOutputText().Text, nil
				}
			}
		}
	}

	return "", fmt.Errorf("unexpected response: no text output and no tool calls")
}

Building the request starts with responses.ResponseNewParams. The Instructions field replaces the system message from Chat Completions — same concept, different field name. The user’s input goes into Input as a plain string. If previousResponseID is set, we attach it to the params; this is what connects the request to the previous turn on OpenAI’s side. Without it, the model has no history of the conversation.

After the call completes, we immediately update previousResponseID with the ID from the response. This is the entirety of the state management — one string, updated on every turn. The response output is a typed list of items. We iterate and look for a message item containing an output_text block; that is where the model’s text response is. The structure is more nested than the Chat Completions response, but the pattern is consistent once you have seen it.

AI assistant ready. Type your question or 'exit' to quit.
Human: hi
Agent: Hello! How can I assist you today?

Human: what is my name?
Agent: I don't know your name yet. Could you please tell me what it is?

Human: my name is Marko
Agent: Nice to meet you, Marko! How can I help you today?

Human: what is my name?
Agent: Your name is Marko. How can I assist you further?

The agent remembers the user’s name across turns — because previousResponseID links each request to the prior one and OpenAI reconstructs the context on its side. This is the core value of the Responses API: multi-turn memory without any client-side history management.

First tool integration

The Responses API supports the same tool-calling pattern as Chat Completions. To demonstrate, consider what happens when you ask the agent a question it cannot answer without real-world data:

Human: what is the current time in Frankfurt?
Agent: I don't have access to real-time data. However, you 
can check the current time in Frankfurt by searching "current 
time in Frankfurt" on a search engine or by using a world clock app. 

Frankfurt is in the Central European Time (CET) zone, which is 
UTC+1 during standard time and UTC+2 during daylight saving time 
(typically from the last Sunday in March to the last Sunday in October).

The model knows about time zones — it just does not know what time it is right now. Tools exist precisely to fill this gap: they let the model delegate specific calls to external functions and incorporate the results into its response. The process works as a loop:

1. You send the model a list of tool definitions — names, descriptions, and parameter schemas.
2. The model reads those definitions and decides whether it needs to invoke one before it can answer. If it does, it responds not with a message but with a list of tool calls, each containing the tool name and the arguments it wants to pass.
3. You execute those calls on your side, then send a new request to the model with the results.
4. The model may request additional tool calls or produce a final answer. The loop continues until the model has everything it needs.

The first step is a plain Go function that retrieves the current time in a given timezone:

func getCurrentTime(timezone string) (string, error) {
	location, err := time.LoadLocation(timezone)
	if err != nil {
		return "", fmt.Errorf("failed to load location: %w", err)
	}

	now := time.Now().In(location)
	return now.Format("2006-01-02 15:04:05"), nil
}

Here time.LoadLocation resolves an IANA timezone name like Europe/Berlin to a *time.Location. time.Now().In(location) returns the current instant in that timezone, and we format it with Go’s reference time. The function is deliberately simple — it does one thing and returns a string. Tool functions do not need to be complex; they need to be correct and fast.

Next, we define the tool and update talkToAgent to include it in the request:

Tool definition and updated talkToAgent

import (
	// ...

	"github.com/openai/openai-go/responses"
)

var getTimeToolDefinition = responses.ToolUnionParam{
	OfFunction: &responses.FunctionToolParam{
		Name:        "get_time",
		Description: openai.String("Fetch the current time in a given timezone."),
		Parameters: openai.FunctionParameters{
			"type": "object",
			"properties": map[string]interface{}{
				"timezone": map[string]interface{}{
					"type":        "string",
					"description": "The international timezone name, e.g. America/Los_Angeles.",
				},
			},
			"required":             []string{"timezone"},
			"additionalProperties": false,
		},
	},
}

func talkToAgent(userInput string) (string, error) {
	params := responses.ResponseNewParams{
		Model: openai.ChatModelGPT4oMini,
		Instructions: openai.String(
			"You are a helpful time assistant. " +
				"When asked about time at particular locations, " +
				"always use the get_time tool to get current time in a given timezone. " +
				"Never guess the time.",
		),
		Input: responses.ResponseNewParamsInputUnion{
			OfString: openai.String(userInput),
		},
		Tools: []responses.ToolUnionParam{
			getTimeToolDefinition,
		},
	}

	if previousResponseID != "" {
		params.PreviousResponseID = openai.String(previousResponseID)
	}

	for {
		resp, err := client.Responses.New(context.Background(), params)
		if err != nil {
			return "", fmt.Errorf("API call failed: %w", err)
		}

		previousResponseID = resp.ID
		
		// ...
	}
}

The tool definition is a responses.ToolUnionParam wrapping a responses.FunctionToolParam. The Parameters field is a standard JSON Schema object: it declares that the tool accepts a single required timezone string, with additionalProperties: false to prevent the model from passing fields we did not define. The system prompt is updated to instruct the model to always use get_time for time queries and never guess — without this instruction, the model may produce a plausible-sounding but stale answer from its training data.

The API call is now wrapped in a for loop. This is essential: when the model decides to call a tool, it does not return a text message — it returns a list of function calls. We execute those, send the results back, and the loop continues. A single turn can involve multiple round-trips between the application and the model before a final answer is produced.

func callTool(call responses.ResponseFunctionToolCall) (string, error) {
	// ...
}

func talkToAgent(userInput string) (string, error) {
	// ...

	for {
		resp, err := client.Responses.New(context.Background(), params)
		if err != nil {
			return "", fmt.Errorf("API call failed: %w", err)
		}

		previousResponseID = resp.ID

		var toolResultItems []responses.ResponseInputItemUnionParam
		hasToolCalls := false

		for _, item := range resp.Output {
			switch item.Type {
			case "function_call":
				hasToolCalls = true
				toolCall := item.AsFunctionCall()

				fmt.Printf("[tool call: %s(%s)]\n", toolCall.Name, toolCall.Arguments)

				result, err := callTool(toolCall)
				if err != nil {
					result = fmt.Sprintf("error: %s", err.Error())
				}

				fmt.Printf("[tool result: %s]\n", result)

				toolResultItems = append(toolResultItems, responses.ResponseInputItemParamOfFunctionCallOutput(toolCall.CallID, result))
			case "message":
				msg := item.AsMessage()
				for _, content := range msg.Content {
					if content.Type == "output_text" {
						return content.AsOutputText().Text, nil
					}
				}
			}
		}

		if !hasToolCalls {
			return "", fmt.Errorf("unexpected response: no text output and no tool calls")
		}

		params = responses.ResponseNewParams{
			Model:              openai.ChatModelGPT4oMini,
			PreviousResponseID: openai.String(previousResponseID),
			Input: responses.ResponseNewParamsInputUnion{
				OfInputItemList: toolResultItems,
			},
			Tools: []responses.ToolUnionParam{
				getTimeToolDefinition,
			},
		}
	}
}

Each output item is inspected by type. A message item means the model has produced a final answer — we return it immediately. A function_call item means the model wants to invoke a tool. We call callTool with the tool call details, collect the result, and append it to toolResultItems using ResponseInputItemParamOfFunctionCallOutput. If the response contained tool calls but no message, we build a new ResponseNewParams with the tool results as input and loop again. The Instructions field is deliberately omitted from this follow-up request — only PreviousResponseID and the tool results are needed, since the model already has the conversation context from the prior response.

type timeArgs struct {
	Timezone string `json:"timezone"`
}

type timeResult struct {
	Time string `json:"time"`
}

func callTool(call responses.ResponseFunctionToolCall) (string, error) {
	switch call.Name {
	case "get_time":
		var args timeArgs
		if err := json.Unmarshal([]byte(call.Arguments), &args); err != nil {
			return "", fmt.Errorf("failed to parse tool arguments: %w", err)
		}

		currentTime, err := getCurrentTime(args.Timezone)
		if err != nil {
			errMsg, _ := json.Marshal(map[string]string{"error": err.Error()})
			return string(errMsg), nil
		}

		result := timeResult{
			Time: currentTime,
		}
		out, err := json.Marshal(result)
		if err != nil {
			return "", fmt.Errorf("failed to marshal tool result: %w", err)
		}
		return string(out), nil

	default:
		return "", fmt.Errorf("unknown tool: %q", call.Name)
	}
}

Now, callTool is a dispatcher: it switches on the tool name and routes to the appropriate Go function. The model returns arguments as a JSON string, so we unmarshal into a typed struct — timeArgs in this case — before calling getCurrentTime. The result is marshaled back to JSON and returned as a string. This JSON contract is what the Responses API expects for tool outputs. Notice that errors from getCurrentTime are also returned as JSON rather than propagated as Go errors — this allows the model to read the error and respond to the user intelligently rather than crashing the loop.

The model is smart enough to map a natural-language question like “what is the time in Frankfurt?” to the IANA timezone Europe/Berlin — it knows the relationship between city names and timezone identifiers from its training data. The tool definition only needed to describe the parameter; the model handles the resolution.

Time assistant ready. Type your question or 'exit' to quit.

Human: what is the time in Frankfurt?
[tool call: get_time({"timezone":"Europe/Berlin"})]
[tool result: {"time":"2026-03-18 17:13:50"}]

Agent: The current time in Frankfurt is 17:13 (5:13 PM) on March 18, 2026.

The full source is available at llm-and-golang-examples.

Conclusion

The Responses API trades control for convenience. Server-side state management removes the history-maintenance boilerplate from the Chat Completions API, and the tool-calling pattern maps cleanly onto ordinary Go functions. The cost of that convenience is tighter coupling to OpenAI’s infrastructure and less visibility into exactly what the model receives on each turn. For production systems where observability and provider flexibility matter, the Chat Completions API remains the more defensible choice. For rapid agent prototyping, the Responses API is the faster path. Knowing both gives you the option to choose based on actual requirements rather than default habit.

