Register and call remote AI models using model endpoint management

To invoke predictions or generate embeddings using a model, register the model endpoint with model endpoint management.

For more information about the google_ml.create_model() function, see model endpoint management reference.

Before you begin

Before you register a model endpoint with model endpoint management, you must enable the google_ml_integration extension and set up authentication based on the model provider, if your model endpoint requires authentication.

Make sure that you access your database with the postgres default username.

Enable the extension

You must add and enable the google_ml_integration extension before you can start using the associated functions. Model endpoint management requires that the google_ml_integration extension is installed.

Set the google_ml_integration.enable_model_support database flag to on for an instance. For more information about setting database flags, see Configure an instance's database flags.
Connect to your database using psql or AlloyDB for PostgreSQL Studio.
Optional: If the google_ml_integration extension is already installed, alter it to update to the latest version:
```
    ALTER EXTENSION google_ml_integration UPDATE;
```
1. Add the google_ml_integration extension using psql:
```
CREATE EXTENSION google_ml_integration;
```
Optional: Grant permission to a non-super PostgreSQL user to manage model metadata:
```
  GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA google_ml TO NON_SUPER_USER;
```
Replace NON_SUPER_USER with the non-super PostgreSQL username.

Set up authentication

The following sections show how to set up authentication before adding a Vertex AI model endpoint or model endpoints hosted within Google Cloud.

Set up authentication for Vertex AI

To use the Google Vertex AI model endpoints, you must add Vertex AI permissions to the IAM-based AlloyDB service account you use to connect to the database. For more information about integrating with Vertex AI, see Integrate with Vertex AI.

Set up authentication for custom-hosted models

For all models except Vertex AI model endpoints, you can store your API keys or bearer tokens in Secret Manager. This step is optional if your model endpoint doesn't handle authentication through Secret Manager—for example, if your model endpoint uses HTTP headers to pass authentication information or doesn't use authentication at all.

This section explains how to set up authentication if you are using Secret Manager.

To create and use an API key or a bearer token, complete the following steps:

Create the secret in Secret Manager. For more information, see Create a secret and access a secret version.

The secret name and the secret path is used in the google_ml.create_sm_secret() SQL function.
Grant permissions to the AlloyDB cluster to access the secret.
```
  gcloud secrets add-iam-policy-binding 'SECRET_ID' \
      --member="serviceAccount:SERVICE_ACCOUNT_ID" \
      --role="roles/secretmanager.secretAccessor"
```
Replace the following:
- SECRET_ID: the secret ID in Secret Manager.
- SERVICE_ACCOUNT_ID: the ID of the IAM-based service account in the serviceAccount:service-PROJECT_ID@gcp-sa-alloydb.iam.gserviceaccount.com format—for example, service-my-project@gcp-sa-alloydb.iam.gserviceaccount.com.
  
  You can also grant this role to the service account at the project level. For more information, see Add Identity and Access Management policy binding

Text embedding models with built-in support

This section shows how to register model endpoints that the model endpoint management provides built-in support for.

Vertex AI embedding models

The model endpoint management provides built-in support for all versions of the text-embedding-gecko model by Vertex AI. Use the qualified name to set the model version to either textembedding-gecko@001 or textembedding-gecko@002.

Since the textembedding-gecko and textembedding-gecko@001 model endpoint ID is pre-registered with model endpoint management, you can directly use them as the model ID. For these models, the extension automatically sets up default transform functions.

To register the textembedding-gecko@002 model endpoint version, complete the following steps:

Ensure that both the AlloyDB cluster and the Vertex AI model you are querying are in the same region.

Connect to your database using psql.
Create and enable the google_ml_integration extension.

Call the create model function to add the textembedding-gecko@002 model endpoint:

CALL
  google_ml.create_model(
    model_id => 'textembedding-gecko@002',
    model_provider => 'google',
    model_qualified_name => 'textembedding-gecko@002',
    model_type => 'text_embedding',
    model_auth_type => 'alloydb_service_agent_iam');

Custom-hosted text embedding models

This section shows how to register custom model endpoints hosted in networks within Google Cloud. Adding custom-hosted text embedding model endpoints involves creating transform functions, and optionally, custom HTTP headers.

Adding custom-hosted generic model endpoints involves optionally generating custom HTTP headers and setting the model request URL.

The following example adds the custom-embedding-model text embedding model endpoint hosted by Cymbal, which is hosted within Google Cloud. The cymbal_text_input_transform and cymbal_text_output_transform transform functions are used to transform the input and output format of the model to the input and output format of the prediction function.

To register custom-hosted text embedding model endpoints, complete the following steps:

Connect to your database using psql.
Create and enable the google_ml_integration extension.
Optional: Add the API key as a secret to the Secret Manager for authentication.
Call the secret stored in the Secret Manager:
```
CALL
  google_ml.create_sm_secret(
    secret_id => 'SECRET_ID',
    secret_path => 'projects/project-id/secrets/SECRET_MANAGER_SECRET_ID/versions/VERSION_NUMBER');
```
Replace the following:
- SECRET_ID: the secret ID that you set and is subsequently used when registering a model endpoint—for example, key1.
- SECRET_MANAGER_SECRET_ID: the secret ID set in Secret Manager when you created the secret.
- PROJECT_ID: the ID of your Google Cloud project.
- VERSION_NUMBER: the version number of the secret ID.
Note: Secret Manager generates an Authorization: Bearer SECRET_VALUE_FROM_SECRET_MANAGER header for authentication by default. If this format matches your model endpoint's authorization bearer token format, then you don't have to generate auth headers using the header generation function.

Create the input and output transform functions based on the following signature for the prediction function for text embedding model endpoints. For more information about how to create transform functions, see Transform functions example.

The following are example transform functions that are specific to the custom-embedding-model text embedding model endpoint:

-- Input Transform Function corresponding to the custom model endpoint
CREATE OR REPLACE FUNCTION cymbal_text_input_transform(model_id VARCHAR(100), input_text TEXT)
RETURNS JSON
LANGUAGE plpgsql
AS $$
DECLARE
  transformed_input JSON;
  model_qualified_name TEXT;
BEGIN
  SELECT json_build_object('prompt', json_build_array(input_text))::JSON INTO transformed_input;
  RETURN transformed_input;
END;
$$;
-- Output Transform Function corresponding to the custom model endpoint
CREATE OR REPLACE FUNCTION cymbal_text_output_transform(model_id VARCHAR(100), response_json JSON)
RETURNS REAL[]
LANGUAGE plpgsql
AS $$
DECLARE
  transformed_output REAL[];
BEGIN
  SELECT ARRAY(SELECT json_array_elements_text(response_json->0)) INTO transformed_output;
  RETURN transformed_output;
END;
$$;

Call the create model function to register the custom embedding model endpoint:
```
CALL
  google_ml.create_model(
    model_id => 'MODEL_ID',
    model_request_url => 'REQUEST_URL',
    model_provider => 'custom',
    model_type => 'text_embedding',
    model_auth_type => 'secret_manager',
    model_auth_id => 'SECRET_ID',
    model_qualified_name => 'MODEL_QUALIFIED_NAME',
    model_in_transform_fn => 'cymbal_text_input_transform',
    model_out_transform_fn => 'cymbal_text_output_transform');
```
Replace the following:
- MODEL_ID: required. A unique ID for the model endpoint that you define-for example custom-embedding-model. This model ID is referenced for metadata that the model endpoint needs to generate embeddings or invoke predictions.
- REQUEST_URL: required. The model-specific endpoint when adding custom text embedding and generic model endpoints—for example, https://cymbal.com/models/text/embeddings/v1. Ensure that the model endpoint is accessible through an internal IP address. Model endpoint management doesn't support public IP addresses.
- MODEL_QUALIFIED_NAME: required if your model endpoint uses a qualified name. The fully qualified name in case the model endpoint has multiple versions.
- SECRET_ID: the secret ID you used earlier in the google_ml.create_sm_secret() procedure.

Generic models

This section shows how to register a generic gemini-pro model endpoint from Vertex AI Model Garden, which doesn't have built-in support. You can register any generic model endpoint that is hosted within Google Cloud.

AlloyDB only supports model endpoints that are available through Vertex AI Model Garden and model endpoints hosted in networks within Google Cloud.

Gemini model

The following example adds the gemini-1.0-pro model endpoint from the Vertex AI Model Garden.

Connect to your database using psql.
Create and enable the google_ml_integration extension.

Call the create model function to register the gemini-1.0-pro model endpoint:

CALL
  google_ml.create_model(
    model_id => 'MODEL_ID',
    model_request_url => 'https://us-central1-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/us-central1/publishers/google/models/gemini-1.0-pro:streamGenerateContent',
    model_provider => 'google',
    model_auth_type => 'alloydb_service_agent_iam');

Replace the following:

MODEL_ID: a unique ID for the model endpoint that you define—for example, gemini-1. This model ID is referenced for metadata that the model endpoint needs to generate embeddings or invoke predictions.
PROJECT_ID: the ID of your Google Cloud project.

For more information, see how to invoke predictions for generic model endpoints.

What's next

Learn about the model endpoint management reference.