Add an AI Code Copilot to your product using GPT-4

August 4, 2023 · 12 min read

LLM Research Engineer

In recent weeks, we've been working on an AI Code Copilot for Windmill. Users can now easily generate whole scripts or make code adjustments using prompts. They can also fix their errors automatically by the click of a button. The Copilot is available in the script builder as well as in the app and flow editors.

Here's a quick demo of what we've built:

tip

If you want to try it out, enable the feature as instructed in the documentation. You will need an API key from OpenAI with access to the GPT-4 model.

In this blog post, we're sharing how you too can become an AI assisted startup^TM in a few steps using GPT-4, some prompt engineering and a bit of UX work.

We will go over the following topics:

Why GPT-4
Generating code from instructions
Code editing and bug fixing

What is Windmill and why add an AI Copilot?

The technics explored in this post are applicable to any product that involves code generation. But to give you some context, Windmill enables developers to build and deploy endpoints, workflows and apps through the combination of code. So it was only logical to add an AI code copilot to help users create, edit and fix their code.

Why GPT-4

This feature is powered by GPT-4 from OpenAI, their latest language model. It's really good at generating code in addition to text, making it highly suitable for our use case. GPT-4 also excels at following instructions and supports larger context, accommodating up to 32K tokens.

info

Tokens represent pieces of words. As a rule of thumb, one token is approximately equivalent to 4 characters. Learn more here.

We chose this model over others because of its superior performance and its ease of use. However, we are looking forward to experimenting with other models in the future, especially OSS ones such as StarCoder and Llama-2. The technics we have used here should be applicable to those as well.

We interact with GPT-4 (8K context) using the OpenAI Node API v4 library (currently in beta). You can explore all our code on GitHub here. Additionally, we have implemented a backend service in Rust to proxy the user's request and add the OpenAI API key without revealing it in the frontend. You can find the relevant code here.

Generating code from instructions

Most of the work for building the copilot went into designing the prompts. The OpenAI API exposes a completion endpoint that takes a prompt in the form of user and system messages. We pass as user message the user's instructions along with some contextual information, and we use the system message to convey global instructions.

The endpoint also takes a few generation parameters. We keep the default values except for temperature, which we set to 0.5 to reduce the risk of hallucinations and failure to follow instructions.

info

Temperature determines the creativity and diversity of the text generated by a language model. A higher temperature value (e.g., 1.5) leads to more diverse and creative text, while a lower value (e.g., 0.5) results in more focused and deterministic text.

Prompt templates

Below is our system message template for code generation where we specifiy that it should output code with comments and wrap it in a code block. We use the same system message for all the different languages supported by Windmill.

You write code as queried by the user. Only output code. Wrap the code like that:

```language
\{code\}
```

Put explanations directly in the code as comments.

To extract and display exclusively the code, we apply the following regular expression on the response: /```[a-zA-Z]+\n([\s\S]*?)\n```/. If you do not speak fluently regex, this expression looks for three back ticks followed by a language name (any sequence of letters followed by a newline), then captures everything until the next three back ticks.

For the user message, we use the following template (here for Python):

Write a function in python called "main". The function should \{description\}.
Specify the parameter types. Do not call the main function.

The {description} placeholder gets replaced by the user's instructions. In addition, we simply instruct the model to fit Windmill's execution model: a main function with typed parameters used to infer the script's inputs.

Handling resource types

Windmill incorporates resource types, which serve as blueprints defining the structure of resources that can be used as parameters in scripts.

info

Learn more about resources and resource types in the documentation.

For instance, a resource type may represent connection credentials to an AWS account. We use standard Python and TypeScript syntax to define these resource types which are then passed as inputs to the main function:

class aws(TypedDict):
    region: str
    awsAccessKeyId: str
    awsSecretAccessKey: str

def main(credentials: aws):
    ...
    region = credentials['region']
    access_key_id = credentials['awsAccessKeyId']
    secret_access_key = credentials['awsSecretAccessKey']
    ...

type Aws = {
  region: string;
  awsAccessKeyId: string;
  awsSecretAccessKey: string;
};

async function main(credentials: aws) {
  ...
  const region = credentials.region;
  const accessKeyId = credentials.awsAccessKeyId;
  const secretAccessKey = credentials.awsSecretAccessKey;
  ...
}

Our objective is to enable the AI to generate code that leverages these resource types. For instance, if the user requests a script that returns the list of their AWS EC2 instances, the AI should be capable of generating code that uses the appropriate resource type.

To achieve this, we include all the resource types within the user message, providing clear instructions on how to employ them. Below is the updated template for Python with sample resource types:

Write a function in python called "main". The function should \{description\}.
Specify the parameter types. Do not call the main function.

You have access to the following resource types.
If you need them, you have to define the TypedDict exactly as specified
(class name has to be IN LOWERCASE) and add them as parameters:

class aws(TypedDict):
    region: str
    awsAccessKeyId: str
    awsSecretAccessKey: str

class supabase(TypedDict):
    key: str
    url: str

class ...

...
Only use the ones you need.
If the TypedDict name conflicts with the imported object,
rename the imported object NOT THE TYPE.

Based on our observations, we added some instructions at the end of the user message template. Indeed, GPT-4 would sometimes include numerous resource types in the generated code, even if they were not required. Moreover, GPT-4 occasionally introduced naming conflicts when importing libraries with the same name as the resource type. We therefore specify that it should rename the imported object instead of the type, as the latter is required for Windmill to parse the parameters.

Watch our Copilot generate a Python script to retrieve the list of my EC2 instances:

^{Loading time has been reduced to enhance the viewing experience}

What makes our Copilot work so well is also it's simple UX: a simple "AI Gen" button with a single instructions field, for both generating and editing the script. We decided against a chat interface as it would have been more complex to implement and less efficient for the user.

Monaco's diff editor enables the user to review and accept the change. This is specifically useful for code editing and fixing which we will discuss later.

Handling multiple languages

Windmill offers support for a wide range of languages, with some languages like TypeScript and Go having similar requirements as python. However, languages like SQL and bash required different instructions to deal with script arguments.

Here's how we define them in PostgreSQL scripts:

-- $1 firstName = John
-- $2 lastName = Doe
insert into users (first_name, last_name) values ($1, $2);

Below is the corresponding prompt template:

Write SQL code for PostgreSQL that should \{description\}.
Arguments can be obtained directly in the statement with `$1::{type}`, `$2::{type}`, etc...
Name the parameters by adding comments before the command like that:
`-- $1 name1` or `-- $2 name = default` (one per row, do not include the type)

The idea is the same for other supported SQL languages and similar for bash scripts. You can find the code generation templates for all languages and runtimes here.

Handling database schemas

Taking database scripts to the next level, we've given the AI the ability to generate code based on the database schema. Based on the user's instructions, the AI automatically formulates an SQL query with the appropriate tables and columns. To achieve this, we query the schema of the selected database, and we include it as part of the user message. Below is the updated template, accompanied by part of the schema from a Windmill instance database as an example:

Write SQL code for PostgreSQL that should \{description\}.
Arguments can be obtained directly in the statement with `$1::{type}`, `$2::{type}`, etc...
Name the parameters by adding comments before the command like that:
`-- $1 name1` or `-- $2 name = default` (one per row, do not include the type)

Here's the database schema, each column is in the format [name, type, required, default?]:
{
  "public": {
    "usr": [
      ["username", "varchar", true],
      ["email", "varchar", true],
      ...
    ],
    "completed_job": [
      ["id", "uuid", true],
      ["created_by", "varchar", true],
      ...
    ],
    ...
  }
}

Watch our Copilot generate a SQL script to retrieve from a Windmill instance's database the user who executed the most jobs:

^{Loading time has been reduced to enhance the viewing experience}

Code editing and bug fixing

In addition to its capabilities for code generation, we wanted to leverage GPT-4 for code editing and bug fixing.

The process for code editing is quite similar to code generation, with the exception that we include the selected code lines as part of the user message. We continue to use the same system message to guide the AI but we use a different user message:

Here's my python3 code:
```python
\{code\}
```

Additional information:
We have to export a "main" function and specify the parameter types but do not call it.
You have access to the following resource types.
If you need them, you have to define the TypedDict exactly as specified
(class name has to be IN LOWERCASE) and add them as parameters: {resourceTypes}
Only use the ones you need. If the TypedDict name conflicts with the imported object,
rename the imported object NOT THE TYPE.

My instructions: \{description\}

We integrated this feature into the script builder by simply changing the "AI Gen" button to "AI Edit" button when code is selected.

Watch our Copilot add comments and remove print statements from a Python script:

^{Loading time has been reduced to enhance the viewing experience}

For bug fixing, we pass the complete code and the error message, but no user instructions.

Here's my python3 code:
```python
\{code\}
```
Additional information:
We have to export a "main" function and specify the parameter types but do not call it.
You have access to the following resource types.
If you need them, you have to define the TypedDict exactly as specified
(class name has to be IN LOWERCASE) and add them as parameters: {resourceTypes}
Only use the ones you need. If the TypedDict name conflicts with the imported object,
rename the imported object NOT THE TYPE.

I get the following error: {error}
Fix my code.

Moreover, we rely on a distinct system message, as we wanted an explanation of the error and the fix. We instruct GPT-4 to include an explanation in a specific format which we then extract using a regular expression.

You fix the code shared by the user. Only output code. Wrap the code like that:
```language
\{code\}
```
Explain the error and the fix in the following format:
explanation: "Here's the explanation"
Also put the explanations in the code as comments.

For ease of use, we introduced a dedicated "AI Fix" button placed next to the error message, making it easily accessible. We also added an "Explain" button that, upon hovering, displays the explanation of the fix.

Watch our Copilot fix a bug in a Python script:

^{Loading time has been reduced to enhance the viewing experience}

You can find all of our prompts templates here. We store them in yaml files, making it easy to read and edit. In order to evaluate the evolution of our Copilot's performance, each time we modify the prompts we (re)generate answers to sample questions. The samples can be found here.

A small note about tokens and costs

Whether it is for code generation, editing, or bug fixing, we only send one request to the OpenAI endpoint. The number of tokens spent vary depending on the number of resource types available and on the length of the code passed in the user message. But on average, queries contain about 1000 prompt tokens and 500 completion tokens. That amounts to approximately $0.09 per request at the time of writing.

Conclusion

Thank you for taking the time to explore how we built our AI code copilot. We hope you found it insightful and now have a clearer idea on how to build your own AI copilot for your product using GPT-4. If you have any questions or ideas you'd like to discuss, please don't hesitate to reach out to us.

At Windmill, we are very happy with the positive outcomes thus far. We will keep refining the prompts as we go to enhance the Copilot's performance. In addition, we are looking forward to implementing GPT-4 in other areas of Windmill, for example to create complete workflows and application components. Your feedback is of great value to us, and we welcome any suggestions or thoughts on how to further improve our AI copilot.

Windmill is an open-source and self-hostable serverless runtime and platform combining the power of code with the velocity of low-code. We turn your scripts into internal apps and composable steps of flows that automate repetitive workflows.

You can self-host Windmill using a docker compose up, or go with the cloud app.

Why GPT-4​

Generating code from instructions​

Prompt templates​

Handling resource types​

Handling multiple languages​

Handling database schemas​

Code editing and bug fixing​

Conclusion​

Why GPT-4

Generating code from instructions

Prompt templates

Handling resource types

Handling multiple languages

Handling database schemas

Code editing and bug fixing

Conclusion