Code chat

Codey for Code Chat (codechat-bison) is the name of the model that supports code chat. It's a foundation model that supports multi-turn conversations that are specialized for code. The model allows developers to chat with a chatbot for help with code-related questions. The code chat API is used to interface with the Codey for Code Chat model.

Codey for Code Chat is ideal for code tasks that are completed with back-and-forth interactions so you can engage in a continuous conversation. For code tasks that require a single interaction, use the API for code completion or the API for code generation.

To explore this model in the console, see the Codey for Code Chat model card in the Model Garden.
Go to the Model Garden

Use cases

Some common used cases for code chat are:

Get help about code: Get help with questions you have about code, such as questions about an API, syntax in a supported programming language, or which version of a library is required for code you're writing.
Debugging: Get help with debugging code that doesn't compile or that contains a bug.
Documentation: Get help understanding code so you can document it accurately.
Learn about code: Get help learning about code you're not familiar with.

HTTP request

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/codechat-bison:predict

Model versions

To use the latest model version, specify the model name without a version number, for example codechat-bison.

To use a stable model version, specify the model version number, for example codechat-bison@002. Each stable version is available for six months after the release date of the subsequent stable version.

The following table contains the available stable model versions:

codechat-bison model	Release date	Discontinuation date
codechat-bison@002	December 6, 2023	April 9, 2025

For more information, see Model versions and lifecycle.

Request body

{
  "instances": [
    {
      "context": string,
      "messages": [
        {
          "content": string,
          "author": string
        }
      ]
    }
  ],
  "parameters":{
    "temperature": number,
    "maxOutputTokens": integer,
    "candidateCount": integer,
    "logprobs": integer,
    "presencePenalty": float,
    "frequencyPenalty": float,
    "seed": integer
  }
}

The following are the parameters for the code chat model named codechat-bison. The codechat-bison model is one of the models in Codey. You can use these parameters to help optimize your prompt for a chatbot conversation about code. For more information, see Code models overview and Create prompts to chat about code.

Parameter	Description	Acceptable values
`context`	Text that should be provided to the model first to ground the response.	Text
`messages` (required)	Conversation history provided to the model in a structured alternate-author form. Messages appear in chronological order: oldest first, newest last. When the history of messages causes the input to exceed the maximum length, the oldest messages are removed until the entire prompt is within the allowed limit.	List[Structured Message] "author": "user", "content": "user message"
`temperature` (optional)	The temperature is used for sampling during response generation. Temperature controls the degree of randomness in token selection. Lower temperatures are good for prompts that require a less open-ended or creative response, while higher temperatures can lead to more diverse or creative results. A temperature of `0` means that the highest probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible.	`0.0–1.0` `Default: 0.2`
`maxOutputTokens` (optional)	Maximum number of tokens that can be generated in the response. A token is approximately four characters. 100 tokens correspond to roughly 60-80 words. Specify a lower value for shorter responses and a higher value for potentially longer responses.	`1–2048` `Default: 1024`
`candidateCount` (optional)	The number of response variations to return. For each request, you're charged for the output tokens of all candidates, but are only charged once for the input tokens. Specifying multiple candidates is a Preview feature that works with `generateContent` (`streamGenerateContent` is not supported). The following models are supported: Gemini 1.5 Flash: `1`-`8`, default: `1` Gemini 1.5 Pro: `1`-`8`, default: `1` Gemini 1.0 Pro: `1`-`8`, default: `1`	`1-4` `Default: 1`
`logprobs` (optional)	Returns the log probabilities of the top candidate tokens at each generation step. The model's chosen token might not be the same as the top candidate token at each step. Specify the number of candidates to return by using an integer value in the range of `1`-`5`.	`0-5`
`frequencyPenalty` (optional)	Positive values penalize tokens that repeatedly appear in the generated text, decreasing the probability of repeating content. The minimum value is `-2.0`. The maximum value is up to, but not including, `2.0`.	`Minimum value: -2.0 Maximum value: 2.0`
`presencePenalty` (optional)	Positive values penalize tokens that already appear in the generated text, increasing the probability of generating more diverse content. The minimum value is `-2.0`. The maximum value is up to, but not including, `2.0`.	`Minimum value: -2.0 Maximum value: 2.0`
`seed`	When seed is fixed to a specific value, the model makes a best effort to provide the same response for repeated requests. Deterministic output isn't guaranteed. Also, changing the model or parameter settings, such as the temperature, can cause variations in the response even when you use the same seed value. By default, a random seed value is used. This is a preview feature.	`Optional`

Sample request

REST

To test a text prompt by using the Vertex AI API, send a POST request to the publisher model endpoint.

Before using any of the request data, make the following replacements:

PROJECT_ID: Your project ID.

Request body

HTTP method and URL:

POST https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/codechat-bison:predict

Request JSON body:

{
  "instances": [
    {
      "messages": [
        {
          "author": "AUTHOR",
          "content": "CONTENT"
        }
      ]
    }
  ],
  "parameters": {
    "temperature": TEMPERATURE,
    "maxOutputTokens": MAX_OUTPUT_TOKENS,
    "candidateCount": CANDIDATE_COUNT
  }
}

To send your request, choose one of these options:

curl

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login , or by using Cloud Shell, which automatically logs you into the gcloud CLI . You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/codechat-bison:predict"

PowerShell

Note: The following command assumes that you have logged in to the gcloud CLI with your user account by running gcloud init or gcloud auth login . You can check the currently active account by running gcloud auth list.

Save the request body in a file named request.json, and execute the following command:

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/codechat-bison:predict" | Select-Object -Expand Content

You should receive a JSON response similar to the sample response.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from vertexai.language_models import CodeChatModel

# TODO developer - override these parameters as needed:
parameters = {
    "temperature": 0.5,  # Temperature controls the degree of randomness in token selection.
    "max_output_tokens": 1024,  # Token limit determines the maximum amount of text output.
}

code_chat_model = CodeChatModel.from_pretrained("codechat-bison@001")
chat_session = code_chat_model.start_chat()

response = chat_session.send_message(
    "Please help write a function to calculate the min of two numbers", **parameters
)
print(f"Response from Model: {response.text}")
# Response from Model: Sure, here is a function that you can use to calculate the minimum of two numbers:
# ```
# def min(a, b):
#   """
#   Calculates the minimum of two numbers.
#   Args:
#     a: The first number.
# ...

Node.js

Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

/**
 * TODO(developer): Uncomment these variables before running the sample.\
 * (Not necessary if passing values as arguments)
 */
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';
const aiplatform = require('@google-cloud/aiplatform');

// Imports the Google Cloud Prediction service client
const {PredictionServiceClient} = aiplatform.v1;

// Import the helper module for converting arbitrary protobuf.Value objects.
const {helpers} = aiplatform;

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};
const publisher = 'google';
const model = 'codechat-bison@001';

// Instantiates a client
const predictionServiceClient = new PredictionServiceClient(clientOptions);

async function callPredict() {
  // Configure the parent resource
  const endpoint = `projects/${project}/locations/${location}/publishers/${publisher}/models/${model}`;

  // Learn more about creating prompts to work with a code chat model at:
  // https://cloud.google.com/vertex-ai/docs/generative-ai/code/code-chat-prompts
  const prompt = {
    messages: [
      {
        author: 'user',
        content: 'Hi, how are you?',
      },
      {
        author: 'system',
        content: 'I am doing good. What can I help you in the coding world?',
      },
      {
        author: 'user',
        content:
          'Please help write a function to calculate the min of two numbers',
      },
    ],
  };
  const instanceValue = helpers.toValue(prompt);
  const instances = [instanceValue];

  const parameter = {
    temperature: 0.5,
    maxOutputTokens: 1024,
  };
  const parameters = helpers.toValue(parameter);

  const request = {
    endpoint,
    instances,
    parameters,
  };

  // Predict request
  const [response] = await predictionServiceClient.predict(request);
  console.log('Get code chat response');
  const predictions = response.predictions;
  console.log('\tPredictions :');
  for (const prediction of predictions) {
    console.log(`\t\tPrediction : ${JSON.stringify(prediction)}`);
  }
}

callPredict();

Java

Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.

To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.


import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.PredictResponse;
import com.google.cloud.aiplatform.v1.PredictionServiceClient;
import com.google.cloud.aiplatform.v1.PredictionServiceSettings;
import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class PredictCodeChatSample {

  public static void main(String[] args) throws IOException {
    // TODO(developer): Replace this variable before running the sample.
    String project = "YOUR_PROJECT_ID";

    // Learn more about creating prompts to work with a code chat model at:
    // https://cloud.google.com/vertex-ai/docs/generative-ai/code/code-chat-prompts
    String instance =
        "{ \"messages\": [\n"
            + "{\n"
            + "  \"author\": \"user\",\n"
            + "  \"content\": \"Hi, how are you?\"\n"
            + "},\n"
            + "{\n"
            + "  \"author\": \"system\",\n"
            + "  \"content\": \"I am doing good. What can I help you in the coding world?\"\n"
            + " },\n"
            + "{\n"
            + "  \"author\": \"user\",\n"
            + "  \"content\":\n"
            + "     \"Please help write a function to calculate the min of two numbers.\"\n"
            + "}\n"
            + "]}";
    String parameters = "{\n" + "  \"temperature\": 0.5,\n" + "  \"maxOutputTokens\": 1024\n" + "}";
    String location = "us-central1";
    String publisher = "google";
    String model = "codechat-bison@001";

    predictCodeChat(instance, parameters, project, location, publisher, model);
  }

  // Use a code chat model to generate a code function
  public static void predictCodeChat(
      String instance,
      String parameters,
      String project,
      String location,
      String publisher,
      String model)
      throws IOException {
    final String endpoint = String.format("%s-aiplatform.googleapis.com:443", location);
    PredictionServiceSettings predictionServiceSettings =
        PredictionServiceSettings.newBuilder().setEndpoint(endpoint).build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests.
    try (PredictionServiceClient predictionServiceClient =
        PredictionServiceClient.create(predictionServiceSettings)) {
      final EndpointName endpointName =
          EndpointName.ofProjectLocationPublisherModelName(project, location, publisher, model);

      Value instanceValue = stringToValue(instance);
      List<Value> instances = new ArrayList<>();
      instances.add(instanceValue);

      Value parameterValue = stringToValue(parameters);

      PredictResponse predictResponse =
          predictionServiceClient.predict(endpointName, instances, parameterValue);
      System.out.println("Predict Response");
      System.out.println(predictResponse);
    }
  }

  // Convert a Json string to a protobuf.Value
  static Value stringToValue(String value) throws InvalidProtocolBufferException {
    Value.Builder builder = Value.newBuilder();
    JsonFormat.parser().merge(value, builder);
    return builder.build();
  }
}

Response body

{
  "predictions": [
    {
      "candidates": [
        {
          "author": string,
          "content": string
        }
      ],
      "citationMetadata": {
        "citations": [
          {
            "startIndex": integer,
            "endIndex": integer,
            "url": string,
            "title": string,
            "license": string,
            "publicationDate": string
          }
        ]
      },
      "logprobs": {
        "tokenLogProbs": [ float ],
        "tokens": [ string ],
        "topLogProbs": [ { map<string, float> } ]
      },
      "safetyAttributes":{
        "categories": [ string ],
        "blocked": false,
        "scores": [ float ]
      },
      "score": float
    }
  ]
}

Response element	Description
`author`	A `string` that indicates the author of a chat response.
`blocked`	A `boolean` flag associated with a safety attribute that indicates if the model's input or output was blocked. If `blocked` is `true`, then the `errors` field in the response contains one or more error codes. If `blocked` is `false`, then the response doesn't include the `errors` field.
`categories`	A list the safety attribute category names that are associated with the generated content. The order of the scores in the `scores` parameter matches the order of the categories. For example, the first score in the `scores` parameter indicates the likelihood that the response violates the first category in the `categories` list.
`content`	The content of a chat response.
`endIndex`	An integer that specifies where a citation ends in the `content`.
`errors`	An array of error codes. The `errors` response field is included in the response only when the `blocked` field in the response is `true`. For information about understanding error codes, see Safety errors.
`license`	The license associated with a citation.
`publicationDate`	The date a citation was published. Its valid formats are `YYYY`, `YYYY-MM`, and `YYYY-MM-DD`.
`safetyAttributes`	An array of safety attributes. The array contains one safety attribute for each response candidate.
`score`	A `float` value that's less than zero. The higher the value for `score`, the greater confidence the model has in its response.
`scores`	An array of `float` values. Each value is a score that indicates the likelihood that the response violates the safety category it's checked against. The lower the value, the safer the model considers the response. The order of the scores in the array corresponds to the order of the safety attributes in the `categories` response element.
`startIndex`	An integer that specifies where a citation starts in the `content`.
`title`	The title of a citation source. Examples of source titles might be that of a news article or a book.
`url`	The URL of a citation source. Examples of a URL source might be a news website or a GitHub repository.
`tokens`	The sampled tokens.
`tokenLogProbs`	The sampled tokens' log probabilities.
`topLogProbs`	The most likely candidate tokens and their log probabilities at each step.
`logprobs`	Results of the `logprobs` parameter. 1-1 mapping to `candidates`.

Sample response

{
  "predictions": [
    {
      "citationMetadata": [
        {
          "citations": []
        }
      ],
      "candidates": [
        {
          "author": "AUTHOR",
          "content": "RESPONSE"
        }
      ],
      "safetyAttributes": {
        "categories": [],
        "blocked": false,
        "scores": []
      },
      "score": -1.1161688566207886
    }
  ]
}

Stream response from Generative AI models

The parameters are the same for streaming and non-streaming requests to the APIs.

To view sample code requests and responses using the REST API, see Examples using the streaming REST API.

To view sample code requests and responses using the Vertex AI SDK for Python, see Examples using Vertex AI SDK for Python for streaming.