Tag a BigQuery table by using Data Catalog
This quickstart helps you complete the following tasks:
Create a BigQuery dataset and table.
Create a tag template with a schema that defines five tag fields of distinct types: string, double, boolean, enumerated, and richtext.
Look up the Data Catalog entry for your table.
In the Google Cloud console, create business metadata for your entry that includes an overview, data steward, and a tag.
Data Catalog lets you search and tag entries such as BigQuery tables with metadata. Examples of metadata that you can use for tagging include public and private tags, data stewards, and a rich text overview.
Before you begin
- Set up your project.
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Enable the Data Catalog and BigQuery APIs.
- Install the Google Cloud CLI.
- To initialize the gcloud CLI, run the following command:
gcloud init
Add a public data entry to your project
Data Catalog entries include data resources such as a BigQuery dataset or a Pub/Sub topic.
Add a public dataset to your project.
In the Google Cloud console, go to the BigQuery page.
In the Explorer section, click + ADD DATA and select Public datasets from the list.
In the Marketplace panel, search for New York taxi trips and click the relevant search result.
Click View Dataset.
Create a dataset and a table
Create a dataset.
In the Google Cloud console, open the BigQuery page.
In the Explorer panel, select the project where you want to create the dataset.
Click the Actions icon and click Create dataset.
In the Create dataset page, fill in the following details:
- For Dataset ID, enter demo_dataset.
- For Data location, select us (multiple regions in United States).
- Enable table expiration and specify the number of days.
- For Encryption, leave the Google-managed encryption key option selected.
Click Create dataset.
Copy a publicly accessible table to demo_dataset.
In the Google Cloud console, open the BigQuery page.
In the Explorer pane, search for tlc_yellow_trips tables (click Broaden search to all projects if required) and select one of them, such as tlc_yellow_trips_2017. Then click Copy.
In the Copy table pane, fill in the following information:
- In the Project name drop-down list, select your project.
- In the Dataset name drop-down list, select demo_dataset.
- For the Table name, enter trips, then click Copy.
In the Explorer pane, confirm that the trips table is listed in demo_dataset.
You add Data Catalog tags to the table in the next section.
Create a public tag template and attach a tag for your entry
You must be the dataset owner to attach a tag to a table in the dataset. For more information about public and private tags, see Public and private tags.
In a tag template, tag fields are optional. You do not have to provide a value for a field when attaching a tag to a Data Catalog entry. However, if a template defines a field as required, you must provide a value for the field. If the value is not provided, an error is generated.
You can use lowercase letters and underscores to define field names. The tag template fields created in this example are for demonstration only; they are not automatically updated or synced with BigQuery.
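The rules above (optional versus required fields, and the naming convention for field IDs) can be checked locally before you attach a tag. The following is a minimal sketch in plain Python and not part of any Data Catalog library: the dictionary `DEMO_TEMPLATE_FIELDS` and both helper functions are hypothetical names introduced for illustration, and the ID pattern encodes only the "lowercase letters and underscores" rule stated in this section.

```python
import re

# Hypothetical local description of the demo template used in this quickstart:
# field ID -> whether the template marks the field as required.
DEMO_TEMPLATE_FIELDS = {
    "source": True,
    "num_rows": False,
    "has_pii": False,
    "pii_type": False,
    "context": False,
}

# Assumption: field IDs use only lowercase letters and underscores,
# as described in the text above.
FIELD_ID_PATTERN = re.compile(r"^[a-z_]+$")


def invalid_field_ids(template_fields):
    """Return the field IDs that do not follow the naming rule."""
    return sorted(f for f in template_fields if not FIELD_ID_PATTERN.match(f))


def missing_required_fields(template_fields, tag_values):
    """Return required field IDs that have no value in tag_values."""
    return sorted(
        field_id
        for field_id, required in template_fields.items()
        if required and field_id not in tag_values
    )


if __name__ == "__main__":
    print(invalid_field_ids(DEMO_TEMPLATE_FIELDS))
    # A tag without "source" would be rejected because "source" is required.
    print(missing_required_fields(DEMO_TEMPLATE_FIELDS, {"has_pii": False}))
    # Optional fields can be left out.
    print(missing_required_fields(DEMO_TEMPLATE_FIELDS, {"source": "BigQuery"}))
```

A check like this only mirrors the documented behavior on the client side; the service remains the authority on whether a tag is accepted.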
Console
Go to the Dataplex > Tag Templates page.
Click Create tag template and enter the following details:
- Enter the Template name as Demo Tag Template.
- Retain the default location.
- Retain the tag template visibility as Public.
Click Add field to add 5 fields. Use the following table and keep Field description empty.
| Field display name | Field ID | Required field | Type |
| --- | --- | --- | --- |
| Source of data asset | source | Yes | String |
| Rows in the asset | num_rows | No | Double |
| Has PII | has_pii | No | Boolean |
| PII type | pii_type | No | Enumerated. Add values EMAIL_ADDRESS, US_SOCIAL_SECURITY_NUMBER, and NONE. |
| Context | context | No | Richtext |
Click Create.
The Template details page lists all the information about the tag template.
To attach a tag to demo_dataset, go to the Dataplex search page.
For Choose search platform, select Data Catalog as the search mode.
In the search box, enter demo_dataset. In the search result, you see the demo_dataset dataset and the trips table.
Click the trips table. A BigQuery table details page opens.
Click Attach tags.
In the Attach tags panel, enter the following details:
- Select the target as trips.
- Select the tag template as Demo Tag Template.
- For tag values, enter the following details:
  - Source of data asset: Copied from tlc_yellow_trips_2017
  - Number of rows in the data asset: 113496874
  - Has PII: FALSE
  - PII type: NONE
Click Save.
The tag fields are now listed in the Tags section in the BigQuery table details.
gcloud
Run the gcloud data-catalog tag-templates create command shown below to create a tag template with the following four tag fields:
- display_name: Source of data asset
  id: source
  required: TRUE
  type: String
- display_name: Number of rows in the data asset
  id: num_rows
  required: FALSE
  type: Double
- display_name: Has PII
  id: has_pii
  required: FALSE
  type: Boolean
- display_name: PII type
  id: pii_type
  required: FALSE
  type: Enumerated
  values:
  - EMAIL_ADDRESS
  - US_SOCIAL_SECURITY_NUMBER
  - NONE
# -------------------------------
# Create a Tag Template.
# -------------------------------
gcloud data-catalog tag-templates create demo_template \
    --location=us-central1 \
    --display-name="Demo Tag Template" \
    --field=id=source,display-name="Source of data asset",type=string,required=TRUE \
    --field=id=num_rows,display-name="Number of rows in the data asset",type=double \
    --field=id=has_pii,display-name="Has PII",type=bool \
    --field=id=pii_type,display-name="PII type",type='enum(EMAIL_ADDRESS|US_SOCIAL_SECURITY_NUMBER|NONE)'

# -------------------------------
# Lookup the Data Catalog entry for the table.
# -------------------------------
ENTRY_NAME=$(gcloud data-catalog entries lookup '//bigquery.googleapis.com/projects/PROJECT_ID/datasets/DATASET/tables/TABLE' --format="value(name)")

# -------------------------------
# Attach a Tag to the table.
# -------------------------------
# Create the Tag file.
cat > tag_file.json << EOF
{
  "source": "BigQuery",
  "num_rows": 1000,
  "has_pii": true,
  "pii_type": "EMAIL_ADDRESS"
}
EOF

gcloud data-catalog tags create --entry=${ENTRY_NAME} \
    --tag-template=demo_template --tag-template-location=us-central1 --tag-file=tag_file.json
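If you prefer to generate the tag file from a script instead of a shell heredoc, the following sketch writes the same JSON object with the Python standard library. The helper name `write_tag_file` and the variable `demo_values` are hypothetical; the file layout (a flat JSON object that maps field IDs to values) is taken from the heredoc in the command above.

```python
import json
import os
import tempfile


def write_tag_file(path, values):
    """Write tag field values as a flat JSON object, one key per field ID."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(values, handle, indent=2)
        handle.write("\n")
    return path


# Same values as the tag_file.json created in the gcloud example.
demo_values = {
    "source": "BigQuery",
    "num_rows": 1000,
    "has_pii": True,
    "pii_type": "EMAIL_ADDRESS",
}

if __name__ == "__main__":
    # Write to a temporary directory so the sketch does not touch your
    # working directory; pass the resulting path to --tag-file.
    target = os.path.join(tempfile.mkdtemp(), "tag_file.json")
    write_tag_file(target, demo_values)
    with open(target, encoding="utf-8") as handle:
        print(handle.read())
```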
Go
Before trying this sample, follow the Go setup instructions in the Data Catalog quickstart using client libraries. For more information, see the Data Catalog Go API reference documentation.
To authenticate to Data Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Java
Before trying this sample, follow the Java setup instructions in the Data Catalog quickstart using client libraries. For more information, see the Data Catalog Java API reference documentation.
To authenticate to Data Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Data Catalog quickstart using client libraries. For more information, see the Data Catalog Node.js API reference documentation.
To authenticate to Data Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
Before trying this sample, follow the Python setup instructions in the Data Catalog quickstart using client libraries. For more information, see the Data Catalog Python API reference documentation.
To authenticate to Data Catalog, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
REST & CMD LINE
REST
If you do not have access to Cloud Client libraries for your language or want to test the API using REST requests, see the following examples and refer to the Data Catalog REST API documentation.
1. Create a tag template.
Before using any of the request data, make the following replacements:
- project-id: your Google Cloud project ID
HTTP method and URL:
POST https://datacatalog.googleapis.com/v1/projects/project-id/locations/region/tagTemplates?tagTemplateId=demo_tag_template
Request JSON body:
{
  "displayName": "Demo Tag Template",
  "fields": {
    "source": {
      "displayName": "Source of data asset",
      "isRequired": "true",
      "type": {
        "primitiveType": "STRING"
      }
    },
    "num_rows": {
      "displayName": "Number of rows in data asset",
      "isRequired": "false",
      "type": {
        "primitiveType": "DOUBLE"
      }
    },
    "has_pii": {
      "displayName": "Has PII",
      "isRequired": "false",
      "type": {
        "primitiveType": "BOOL"
      }
    },
    "pii_type": {
      "displayName": "PII type",
      "isRequired": "false",
      "type": {
        "enumType": {
          "allowedValues": [
            { "displayName": "EMAIL_ADDRESS" },
            { "displayName": "US_SOCIAL_SECURITY_NUMBER" },
            { "displayName": "NONE" }
          ]
        }
      }
    }
  }
}
You should receive a JSON response similar to the following:
{
  "name": "projects/project-id/locations/us-central1/tagTemplates/demo_tag_template",
  "displayName": "Demo Tag Template",
  "fields": {
    "num_rows": {
      "displayName": "Number of rows in data asset",
      "isRequired": "false",
      "type": {
        "primitiveType": "DOUBLE"
      }
    },
    "has_pii": {
      "displayName": "Has PII",
      "isRequired": "false",
      "type": {
        "primitiveType": "BOOL"
      }
    },
    "pii_type": {
      "displayName": "PII type",
      "isRequired": "false",
      "type": {
        "enumType": {
          "allowedValues": [
            { "displayName": "EMAIL_ADDRESS" },
            { "displayName": "NONE" },
            { "displayName": "US_SOCIAL_SECURITY_NUMBER" }
          ]
        }
      }
    },
    "source": {
      "displayName": "Source of data asset",
      "isRequired": "true",
      "type": {
        "primitiveType": "STRING"
      }
    }
  }
}
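The request in step 1 can also be assembled programmatically. The sketch below uses only the Python standard library to build the URL and JSON body shown above; it does not send anything. The helper names (`primitive_field`, `enum_field`, `build_create_template_request`) are hypothetical, and the sketch assumes that `isRequired` can be sent as a JSON boolean rather than the quoted strings used in the example body.

```python
import json
from urllib.parse import urlencode

API_ROOT = "https://datacatalog.googleapis.com/v1"


def primitive_field(display_name, primitive_type, required=False):
    """Describe a STRING, DOUBLE, or BOOL template field."""
    return {
        "displayName": display_name,
        "isRequired": required,  # assumption: sent as a JSON boolean
        "type": {"primitiveType": primitive_type},
    }


def enum_field(display_name, allowed_values, required=False):
    """Describe an enumerated template field."""
    return {
        "displayName": display_name,
        "isRequired": required,
        "type": {
            "enumType": {
                "allowedValues": [{"displayName": v} for v in allowed_values]
            }
        },
    }


def build_create_template_request(project_id, region, template_id):
    """Return (url, body) for the tagTemplates create call in step 1."""
    query = urlencode({"tagTemplateId": template_id})
    url = (
        f"{API_ROOT}/projects/{project_id}/locations/{region}"
        f"/tagTemplates?{query}"
    )
    body = {
        "displayName": "Demo Tag Template",
        "fields": {
            "source": primitive_field("Source of data asset", "STRING", True),
            "num_rows": primitive_field("Number of rows in data asset", "DOUBLE"),
            "has_pii": primitive_field("Has PII", "BOOL"),
            "pii_type": enum_field(
                "PII type",
                ["EMAIL_ADDRESS", "US_SOCIAL_SECURITY_NUMBER", "NONE"],
            ),
        },
    }
    return url, body


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    url, body = build_create_template_request(
        "my-project", "us-central1", "demo_tag_template"
    )
    print("POST", url)
    print(json.dumps(body, indent=2))
```

To send the request, you would POST the serialized body to the URL with an OAuth 2.0 access token in the Authorization header, using whichever HTTP client you prefer.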
2. Look up the Data Catalog entry-id for your BigQuery table.
Before using any of the request data, make the following replacements:
- project-id: Google Cloud project ID
HTTP method and URL:
GET https://datacatalog.googleapis.com/v1/entries:lookup?linkedResource=//bigquery.googleapis.com/projects/project-id/datasets/demo_dataset/tables/trips
Request JSON body:
Request body is empty.
You should receive a JSON response similar to the following:
{
  "name": "projects/project-id/locations/US/entryGroups/@bigquery/entries/entry-id",
  "type": "TABLE",
  "schema": {
    "columns": [
      {
        "type": "STRING",
        "description": "A code indicating the TPEP provider that provided the record. 1= ",
        "mode": "REQUIRED",
        "column": "vendor_id"
      },
      ...
    ]
  },
  "sourceSystemTimestamps": {
    "createTime": "2019-01-25T01:45:29.959Z",
    "updateTime": "2019-03-19T23:20:26.540Z"
  },
  "linkedResource": "//bigquery.googleapis.com/projects/project-id/datasets/demo_dataset/tables/trips",
  "bigqueryTableSpec": {
    "tableSourceType": "BIGQUERY_TABLE"
  }
}
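The lookup in step 2 relies on two string conventions: the linked resource name of the BigQuery table, and the entry name returned in the response. The following sketch, using only the Python standard library, builds the lookup URL and extracts the entry-id from a returned name. All three function names are hypothetical, and the sketch only constructs strings; it does not call the API.

```python
from urllib.parse import urlencode

API_ROOT = "https://datacatalog.googleapis.com/v1"


def bigquery_table_resource(project_id, dataset_id, table_id):
    """Build the linked resource name for a BigQuery table."""
    return (
        f"//bigquery.googleapis.com/projects/{project_id}"
        f"/datasets/{dataset_id}/tables/{table_id}"
    )


def build_lookup_url(linked_resource):
    """Build the entries:lookup URL; slashes are left unescaped for readability."""
    query = urlencode({"linkedResource": linked_resource}, safe="/")
    return f"{API_ROOT}/entries:lookup?{query}"


def entry_id_from_name(entry_name):
    """Return the last path segment of an entry name, which is the entry-id."""
    return entry_name.rsplit("/", 1)[-1]


if __name__ == "__main__":
    # "my-project" is a placeholder project ID.
    resource = bigquery_table_resource("my-project", "demo_dataset", "trips")
    print("GET", build_lookup_url(resource))
    print(entry_id_from_name(
        "projects/my-project/locations/US/entryGroups/@bigquery/entries/entry-id"
    ))
```

The full `name` value from the response is what step 3 uses as the parent of the new tag, so in practice you may not need to isolate the entry-id at all.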
3. Create a tag from the template and attach it to your BigQuery table.
Before using any of the request data, make the following replacements:
- project-id: Google Cloud project ID
- entry-id: Data Catalog entry ID for the Demo Dataset trips table (returned in the lookup results in the previous step).
HTTP method and URL:
POST https://datacatalog.googleapis.com/v1/projects/project-id/locations/region/entryGroups/@bigquery/entries/entry-id/tags
Request JSON body:
{
  "template": "projects/project-id/locations/us-central1/tagTemplates/demo_tag_template",
  "fields": {
    "source": {
      "stringValue": "Copied from tlc_yellow_trips_2017"
    },
    "num_rows": {
      "doubleValue": 113496874
    },
    "has_pii": {
      "boolValue": false
    },
    "pii_type": {
      "enumValue": {
        "displayName": "NONE"
      }
    }
  }
}
You should receive a JSON response similar to the following:
{
  "name": "projects/project-id/locations/US/entryGroups/@bigquery/entries/entry-id/tags/tag-id",
  "template": "projects/project-id/locations/us-central1/tagTemplates/demo_tag_template",
  "fields": {
    "pii_type": {
      "displayName": "PII type",
      "enumValue": {
        "displayName": "NONE"
      }
    },
    "has_pii": {
      "displayName": "Has PII",
      "boolValue": false
    },
    "source": {
      "displayName": "Source of data asset",
      "stringValue": "Copied from tlc_yellow_trips_2017"
    },
    "num_rows": {
      "displayName": "Number of rows in data asset",
      "doubleValue": 113496874
    }
  },
  "templateDisplayName": "Demo Tag Template"
}
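Each tag field in the request body of step 3 wraps its value in a type-specific key (`stringValue`, `doubleValue`, `boolValue`, or `enumValue`). The sketch below shows one way to derive that structure from plain Python values using only the standard library. The helpers `to_tag_field` and `build_tag_body` are hypothetical, and because an enumerated value looks like an ordinary string, the caller has to say which field IDs are enumerated.

```python
import json


def to_tag_field(value, enum=False):
    """Wrap a Python value in the key that matches its tag field type."""
    if enum:
        return {"enumValue": {"displayName": value}}
    # bool must be tested before int and float, because bool is a subclass of int.
    if isinstance(value, bool):
        return {"boolValue": value}
    if isinstance(value, (int, float)):
        return {"doubleValue": value}
    if isinstance(value, str):
        return {"stringValue": value}
    raise TypeError(f"unsupported tag value type: {type(value).__name__}")


def build_tag_body(template_name, values, enum_fields=()):
    """Build the JSON body for the tags create call in step 3."""
    return {
        "template": template_name,
        "fields": {
            field_id: to_tag_field(value, enum=field_id in enum_fields)
            for field_id, value in values.items()
        },
    }


if __name__ == "__main__":
    body = build_tag_body(
        "projects/project-id/locations/us-central1/tagTemplates/demo_tag_template",
        {
            "source": "Copied from tlc_yellow_trips_2017",
            "num_rows": 113496874,
            "has_pii": False,
            "pii_type": "NONE",
        },
        enum_fields=("pii_type",),
    )
    print(json.dumps(body, indent=2))
```

Richtext fields are not covered here because the REST example in this section does not show how they are encoded.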
Create an overview for your entry
In the Google Cloud console, you can use rich text to describe an entry in your Data Catalog project.
To create an overview for the trips table, go to the Dataplex search page.
For Choose search platform, select Data Catalog as the search mode.
In the search box, enter demo_dataset.
In the search result, you see the demo_dataset dataset and the trips table.
Click the trips table.
A BigQuery table details page opens.
Click Add overview and enter some text. You can additionally include images and rich formatted text.
Click Save.
Add a data steward for your entry
In the Google Cloud console, you can add one or more data stewards to an entry in your Data Catalog project. A data steward for a data entry can be contacted to request more information about the data entry.
To add a data steward for the trips table, repeat the first 3 steps from the previous section.
Click the Edit Steward icon and add one or more email addresses.
You can add a user with a non-Google email account.
Click Save.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
Delete the project
The easiest way to eliminate billing is to delete the project that you created for the tutorial.
To delete the project:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Delete the dataset
If necessary, go to the BigQuery page.
In the Explorer panel, search for the demo_dataset dataset you created.
Click the Actions option and click Delete dataset.
Confirm your delete action.
Delete the tag template
Go to the Data Catalog > Templates page.
Select Demo Tag Template.
In the row, click the Actions option and click Delete this template.
Confirm your delete action.
What's next
Learn about Data Catalog in Data Catalog Overview.
Learn about technical metadata and business metadata.
Learn about tag templates, public tags, and private tags in Tags and tag templates.
Browse the Overview of APIs and Client Libraries.