LLM and Go: ChatGPT Integration via Chat Completions API

Marko Milojevic
Software engineer and architect. Golang and LLM enthusiast. Awful chess player, gym rat, harmonica newbie and cat lover.
LLM and Go - This article is part of a series.
Part 1: This Article

For most of my career, integrating external intelligence into an application meant calling a rules engine, training a custom classifier, or encoding business logic that someone had painfully documented in a spreadsheet. The idea that I could describe a task in plain language and have a model respond with genuine reasoning was not something I expected to become production-ready in my working life. Then GPT happened, and it changed what backend developers need to know.

This article is the first in a series on using LLMs in Go. We start with the OpenAI Chat Completions API — the stateless, request-based interface that gives you direct control over every aspect of the conversation. By the end, you will have a working conversational agent that can call external tools to answer questions it otherwise could not.

A short introduction to ChatGPT
#

The path to large language models runs through a decade of incremental progress in deep learning. Early models like word2vec and GloVe learned to embed words into dense vector spaces, capturing semantic relationships between terms. The transformer architecture, introduced by Google in 2017, changed the trajectory of the field — it processes sequences in parallel using attention mechanisms that capture long-range dependencies far more effectively than recurrent networks. This architectural shift made it practical to train models on orders of magnitude more data. GPT-1 in 2018 showed that large-scale unsupervised pre-training followed by fine-tuning could match or beat purpose-built models across a range of language tasks.

Understanding what these models actually do removes a lot of the mysticism around them. An LLM is, at its core, a next-token predictor. It takes a sequence of tokens as input and outputs a probability distribution over the vocabulary for the next token. The transformer’s attention mechanism allows every token in the input to attend to every other token, building a rich contextual representation before making that prediction. Training adjusts billions of parameters to minimise prediction error across enormous text corpora. What emerges is a model with broad world knowledge encoded in its weights — not because it was taught facts directly, but because predicting text well requires internalising the structure of the world that produced that text.

ChatGPT is OpenAI’s conversational product built on the GPT model series. What set it apart from raw GPT-3 was the addition of reinforcement learning from human feedback (RLHF) — a technique that fine-tunes the base model to follow instructions and produce responses that human raters judge as helpful and safe. When ChatGPT launched in late 2022, it became one of the fastest-adopted consumer products in history. For developers, the more relevant artefact is the API behind it — specifically the Chat Completions API, which gives programmatic access to the same models powering the product.

Chat Completions API vs Responses API
#

OpenAI exposes its models through two primary APIs. The Chat Completions API is the older and more fundamental of the two. It is stateless: you send a list of messages representing the full conversation history, and the model returns the next message. Every request is self-contained. The application is entirely responsible for maintaining that history and re-sending it with every call.

The Responses API, introduced in 2025, takes a different approach. It manages conversation state on the server side, supports built-in tools such as web search and file reading, and is designed for agent-oriented use cases where you want OpenAI’s infrastructure to handle more of the orchestration. The trade-off is less control over what is sent to the model and tighter coupling to OpenAI’s platform.

| Feature | Chat Completions API | Responses API |
|---|---|---|
| Conversation state | Client-managed | Server-managed |
| History management | Manual — sent with every request | Automatic |
| Tool support | Manual function calling | Built-in tools (web search, code interpreter) |
| Streaming | Yes | Yes |
| Control | Full | Limited |
| Vendor coupling | Low | Higher |
| Best for | Custom agents, full control | Rapid prototyping, built-in tooling |

In this series we use the Chat Completions API. It requires more wiring on your side, but it gives you a clearer mental model of what is actually happening — which matters when things go wrong in production.

First AI agent
#

The official Go SDK for the OpenAI API is openai-go, maintained by OpenAI directly. It provides typed bindings for all major API endpoints and handles authentication, serialisation, and retries. It covers Chat Completions, Responses, embeddings, images, and more. For this series, we use it exclusively — no third-party wrappers.

To call the API, you need a secret key from OpenAI. Create an account at platform.openai.com, navigate to the API Keys section, and generate a new key. Store it in an environment variable and never commit it to version control.

package main

import (
	// ...

	"github.com/openai/openai-go"
	"github.com/openai/openai-go/option"
)

var history []openai.ChatCompletionMessageParamUnion
var client openai.Client

func main() {
	apiKey := os.Getenv("OPENAI_API_KEY")
	if apiKey == "" {
		fmt.Fprintln(os.Stderr, "error: OPENAI_API_KEY environment variable is not set")
		os.Exit(1)
	}

	client = openai.NewClient(option.WithAPIKey(apiKey))
	history = append(history, openai.SystemMessage(
		"You are a helpful assistant.",
	))
}

Two things are initialised here: the API client and the conversation history. The client is created with the API key read from the environment — if the variable is missing, the application exits immediately rather than failing later with a cryptic authentication error. The history slice is the conversation state. Unlike the Responses API, the Chat Completions API has no memory of its own. Every request must include the full conversation history so the model has the context it needs to respond correctly. This slice is where we maintain it. The System Message at the top of the history defines the model’s behaviour and persona for the entire session. It is always the first message and shapes how the model interprets everything that follows.

import (
	"bufio"

	// ...
)

func talkToAgent() (string, error) {
	return "", nil
}

func main() {
	// ...

	scanner := bufio.NewScanner(os.Stdin)
	fmt.Println("AI assistant ready. Type your question or 'exit' to quit.")
	fmt.Println()

	for {
		fmt.Print("Human: ")
		if !scanner.Scan() {
			break
		}

		input := strings.TrimSpace(scanner.Text())
		if input == "" {
			continue
		}
		if strings.EqualFold(input, "exit") {
			fmt.Println("Bye.")
			break
		}

		history = append(history, openai.UserMessage(input))

		answer, err := talkToAgent()
		if err != nil {
			fmt.Fprintf(os.Stderr, "error: %v\n", err)
			continue
		}

		fmt.Printf("Agent: %s\n\n", answer)
	}
}

The main loop reads from standard input using bufio.Scanner, which handles line-by-line reading cleanly. Empty inputs are skipped. Typing exit terminates the application. For any other input, the user’s message is appended to the history as a UserMessage and passed to talkToAgent. The result is printed back to the terminal. At this stage, talkToAgent is a stub that returns nothing — we fill it in next.

func talkToAgent() (string, error) {
	resp, err := client.Chat.Completions.New(context.Background(), openai.ChatCompletionNewParams{
		Model:    openai.ChatModelGPT4_1Mini,
		Messages: history,
	})
	if err != nil {
		return "", fmt.Errorf("API call failed: %w", err)
	}

	if len(resp.Choices) == 0 {
		return "", fmt.Errorf("model returned no choices")
	}

	choice := resp.Choices[0]
	history = append(history, choice.Message.ToParam())

	return choice.Message.Content, nil
}

Here Chat.Completions.New sends the full history to the model and returns a completion response. The Model field specifies which model to use — here we use gpt-4.1-mini to keep costs low during development. The response contains a Choices slice; we take the first choice, which is the model’s response. The important step is calling ToParam() on the response message before appending it to the history. This converts the assistant’s response into the format expected for the chat history, so the model will remember what it said in subsequent turns.

The three GPT-4.1 models cover different points on the cost-quality spectrum:

| Model | Input cost (per 1M tokens) | Output cost (per 1M tokens) | Best for |
|---|---|---|---|
| gpt-4.1 | $2.00 | $8.00 | Complex reasoning, production quality |
| gpt-4.1-mini | $0.40 | $1.60 | Balanced cost and quality |
| gpt-4.1-nano | $0.10 | $0.40 | High-volume, low-complexity tasks |

For development and experimentation, gpt-4.1-mini or gpt-4.1-nano are the right starting point. The quality is sufficient for most tasks, and the cost difference against gpt-4.1 is significant at scale.

AI assistant ready. Type your question or 'exit' to quit.
Human: hi
Agent: Hello! How can I assist you today?

Human: what is my name?
Agent: I am sorry, I do not know your name. Feel free to tell me!

Human: my name is Marko
Agent: Nice to meet you, Marko! How can I help you today?

Human: what is my name?
Agent: Your name is Marko! How can I assist you further?

The key thing to notice in this exchange is that the agent remembers the user’s name. That is not magic — it is the history slice doing its job. Every time a user sends a message and the agent replies, both messages are appended to the history. On the next request, the full history is sent to the model, giving it complete context. Without that history, the model would have no idea who Marko (me!) is.

First tool integration
#

Even with full conversation history, the model has a fundamental limitation: it cannot access real-time information. Ask it for the current time, and it will tell you exactly that.

Human: what is the current time in Frankfurt?
Agent: I can't provide real-time information, including the current time. However, Frankfurt is in the 
Central European Time Zone (CET), which is UTC+1, and it observes Central European Summer Time (CEST), 
which is UTC+2 during the summer months. You can easily check the current time using a clock or a 
smartphone. Is there anything else you would like to know?

This is where tools come in. A tool is a function that the model can request to be called on its behalf. You define the tool — its name, description, and parameter schema — and send those definitions alongside the conversation history. When the model determines it needs information it cannot generate from its weights alone, it responds not with text but with a list of tool calls it wants executed. Your application runs those calls, sends the results back, and the model incorporates them into its final response. The model never executes code directly; it only requests calls and receives results.

The integration process has a clear shape: send the tool definitions with every request, check whether the response contains tool calls, execute each call, append the results to the history as tool messages, and send another request. Repeat until the model returns a plain assistant message with no tool calls. This loop is what makes the agent able to use multiple tools in sequence when needed.

The first step is building the function the tool will call.

func getCurrentTime(timezone string) (string, error) {
	location, err := time.LoadLocation(timezone)
	if err != nil {
		return "", fmt.Errorf("failed to load location: %w", err)
	}

	now := time.Now().In(location)
	return now.Format("2006-01-02 15:04:05"), nil
}

This is a plain Go function with no awareness of the LLM. It takes a timezone variable, loads the location, and returns the current time formatted as a string. Keeping tool implementations as ordinary functions is intentional — it keeps them testable and reusable outside the agent context.

With the function in place, we need to describe it to the model and update the request loop.

var getTimeToolDefinition = openai.ChatCompletionToolParam{
	Type: "function",
	Function: openai.FunctionDefinitionParam{
		Name:        "get_time",
		Description: openai.String("Fetch the current time in a given timezone."),
		Parameters: openai.FunctionParameters{
			"type": "object",
			"properties": map[string]interface{}{
				"timezone": map[string]interface{}{
					"type":        "string",
					"description": "Tne international timezone name, e.g. America/Los_Angeles.",
				},
			},
			"required":             []string{"timezone"},
			"additionalProperties": false,
		},
	},
}

func talkToAgent() (string, error) {
	for {
		resp, err := client.Chat.Completions.New(context.Background(), openai.ChatCompletionNewParams{
			Model:    openai.ChatModelGPT4_1Mini,
			Messages: history,
			Tools: []openai.ChatCompletionToolParam{
				getTimeToolDefinition,
			},
		})
		
		// ...
	}
}

The tool definition is a struct that describes the function to the model. The Name field is what the model uses when it requests a call. The Description is critical — it is what the model reads to decide whether to use this tool at all, so it should be precise. The Parameters block follows the JSON Schema format and tells the model exactly what arguments to provide.

The request loop in talkToAgent is now wrapped in a for loop because a single user message may require multiple round-trips: the model calls a tool, you return the result, and the model may call another tool before finally producing its answer.

func callTool(call openai.ChatCompletionMessageToolCall) (string, error) {
	// ...
}

func talkToAgent() (string, error) {
	for {
		resp, err := client.Chat.Completions.New(context.Background(), openai.ChatCompletionNewParams{
			Model:    openai.ChatModelGPT4_1Mini,
			Messages: history,
			Tools: []openai.ChatCompletionToolParam{
				getTimeToolDefinition,
			},
		})
		if err != nil {
			return "", fmt.Errorf("API call failed: %w", err)
		}

		if len(resp.Choices) == 0 {
			return "", fmt.Errorf("model returned no choices")
		}

		choice := resp.Choices[0]
		history = append(history, choice.Message.ToParam())

		if len(choice.Message.ToolCalls) == 0 {
			return choice.Message.Content, nil
		}

		for _, call := range choice.Message.ToolCalls {
			fmt.Printf("  [tool] %s(%s)\n", call.Function.Name, call.Function.Arguments)

			result, err := callTool(call)
			if err != nil {
				return "", err
			}

			history = append(history, openai.ToolMessage(result, call.ID))
		}
	}
}

When the model returns a response with no tool calls, ToolCalls is empty and we return the content directly to the caller. When it does contain tool calls, we iterate over each one, print it for visibility, execute it via callTool, and append the result to the history as a ToolMessage tied to the call’s ID. The model uses that ID to match results back to the requests it made. The loop then fires another request with the updated history, and the cycle continues until the model is satisfied.

type timeArgs struct {
	Timezone string `json:"timezone"`
}

type timeResult struct {
	Time string `json:"time"`
}

func callTool(call openai.ChatCompletionMessageToolCall) (string, error) {
	switch call.Function.Name {
	case "get_time":
		var args timeArgs
		if err := json.Unmarshal([]byte(call.Function.Arguments), &args); err != nil {
			return "", fmt.Errorf("failed to parse tool arguments: %w", err)
		}

		currentTime, err := getCurrentTime(args.Timezone)
		if err != nil {
			errMsg, _ := json.Marshal(map[string]string{"error": err.Error()})
			return string(errMsg), nil
		}

		result := timeResult{
			Time: currentTime,
		}
		out, err := json.Marshal(result)
		if err != nil {
			return "", fmt.Errorf("failed to marshal tool result: %w", err)
		}
		return string(out), nil

	default:
		return "", fmt.Errorf("unknown tool: %q", call.Function.Name)
	}
}

The function callTool is a router. It receives the tool call struct from the model, switches on the function name, and dispatches to the appropriate implementation. For get_time, it unmarshals the JSON arguments the model provided, calls getCurrentTime, and serialises the result back to JSON. If the tool execution fails — for example, an invalid timezone — the error is returned as a JSON payload rather than bubbling up to the caller. This keeps the loop running: the model receives the error, understands what went wrong, and can either try a corrected call or inform the user gracefully. Any tool name the application does not recognise returns an error, which surfaces cleanly through the loop.

The last piece is updating the system prompt to give the model clear instructions about when to use the tool.

func main() {
	// ...

	client = openai.NewClient(option.WithAPIKey(apiKey))
	history = append(history, openai.SystemMessage(
		"You are a helpful time assistant. "+
			"When asked about time at particular locations, "+
			"always use the get_time tool to get current time in a given timezone. "+
			"Never guess the time.",
	))
	
	// ...
}

The system prompt now explicitly instructs the model to use the get_time tool whenever time at a specific location is requested. Without this instruction, the model might attempt to reason from its training data and produce a plausible but wrong answer. The instruction removes the ambiguity, especially by saying that the model should never guess the time itself.

AI assistant ready. Type your question or 'exit' to quit.

Human: hi
Agent: Hello! How can I assist you today?

Human: what is the current time in Frankfurt?
  [tool] get_time({"timezone":"Europe/Berlin"})
Agent: The current time in Frankfurt is 14:23 on March 5, 2026.

The [tool] line shows the model calling get_time with the correct IANA timezone for Frankfurt. The model resolved “Frankfurt” to Europe/Berlin on its own, based on the tool description and its world knowledge. The full source for this example is available at llm-and-golang-examples.

Conclusion
#

The Chat Completions API is the right starting point for anyone building LLM-powered features in Go. It is explicit, stateless, and gives you full control over what the model sees. The history management is your responsibility, but that also means you understand exactly what is happening in every request. Adding tools extends the model’s reach into live data without complicating the core loop much — define the tool, check for calls, execute, return results, repeat. This pattern scales to multiple tools cleanly. The next article in this series will look at the Responses API and where it makes more sense than the Chat Completions approach.
