JOBS view
The INFORMATION_SCHEMA.JOBS view contains near real-time metadata about all BigQuery jobs in the current project.
Required role
To get the permission that you need to query the INFORMATION_SCHEMA.JOBS view, ask your administrator to grant you the BigQuery Resource Viewer (roles/bigquery.resourceViewer) IAM role on your project. For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the bigquery.jobs.listAll permission, which is required to query the INFORMATION_SCHEMA.JOBS view.
You might also be able to get this permission with custom roles or other predefined roles. For more information about BigQuery permissions, see Access control with IAM.
Schema
The underlying data is partitioned by the creation_time column and clustered by project_id and user_email. The query_info column contains additional information about your query jobs.
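For example, filtering on creation_time prunes partitions and limits the bytes scanned. The following minimal sketch lists recent jobs; the region-us qualifier and the one-day window are placeholder choices:

-- Filtering on the partition column (creation_time) limits the data scanned.
SELECT
  job_id,
  user_email,
  total_slot_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY);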
The INFORMATION_SCHEMA.JOBS view has the following schema:
Column name | Data type | Value |
---|---|---|
bi_engine_statistics | RECORD | If the project is configured to use the BI Engine SQL Interface, then this field contains BiEngineStatistics. Otherwise NULL. |
cache_hit | BOOLEAN | Whether the query results of this job were from a cache. If you have a multi-statement query job, cache_hit for your parent query is NULL. |
creation_time | TIMESTAMP | (Partitioning column) Creation time of this job. Partitioning is based on the UTC time of this timestamp. |
destination_table | RECORD | Destination table for results, if any. |
end_time | TIMESTAMP | The end time of this job, in milliseconds since the epoch. This field represents the time when the job enters the DONE state. |
error_result | RECORD | Details of any errors as ErrorProto objects. |
job_creation_reason.code | STRING | Specifies the high-level reason why a job was created. Possible values are REQUESTED, LONG_RUNNING, LARGE_RESULTS, and OTHER. |
job_id | STRING | The ID of the job if a job was created. Otherwise, the query ID of a query using short query mode. For example, bquxjob_1234. |
job_stages | RECORD | Query stages of the job. Note: This column's values are empty for queries that read from tables with row-level access policies. For more information, see best practices for row-level security in BigQuery. |
job_type | STRING | The type of the job. Can be QUERY, LOAD, EXTRACT, COPY, or NULL. A NULL value indicates an internal job, such as a script job statement evaluation or a materialized view refresh. |
labels | RECORD | Array of labels applied to the job as key-value pairs. |
parent_job_id | STRING | ID of the parent job, if any. |
priority | STRING | The priority of this job. Valid values include INTERACTIVE and BATCH. |
project_id | STRING | (Clustering column) The ID of the project. |
project_number | INTEGER | The number of the project. |
query | STRING | SQL query text. Only the JOBS_BY_PROJECT view has the query column. |
referenced_tables | RECORD | Array of tables referenced by the job. Only populated for query jobs that are not cache hits. |
reservation_id | STRING | Name of the primary reservation assigned to this job, in the format RESERVATION_ADMIN_PROJECT:RESERVATION_LOCATION.RESERVATION_NAME. In this output, RESERVATION_ADMIN_PROJECT is the Google Cloud project that administers the reservation, RESERVATION_LOCATION is the location of the reservation, and RESERVATION_NAME is the name of the reservation. |
edition | STRING | The edition associated with the reservation assigned to this job. For more information about editions, see Introduction to BigQuery editions. |
session_info | RECORD | Details about the session in which this job ran, if any. |
start_time | TIMESTAMP | The start time of this job, in milliseconds since the epoch. This field represents the time when the job transitions from the PENDING state to either RUNNING or DONE. |
state | STRING | Running state of the job. Valid states include PENDING, RUNNING, and DONE. |
statement_type | STRING | The type of query statement. For example, DELETE, INSERT, SCRIPT, SELECT, or UPDATE. See QueryStatementType for a list of valid values. |
timeline | RECORD | Query timeline of the job. Contains snapshots of query execution. |
total_bytes_billed | INTEGER | If the project is configured to use on-demand pricing, then this field contains the total bytes billed for the job. If the project is configured to use flat-rate pricing, then you are not billed for bytes and this field is informational only. Note: This column's values are empty for queries that read from tables with row-level access policies. For more information, see best practices for row-level security in BigQuery. |
total_bytes_processed | INTEGER | Total bytes processed by the job. Note: This column's values are empty for queries that read from tables with row-level access policies. For more information, see best practices for row-level security in BigQuery. |
total_modified_partitions | INTEGER | The total number of partitions the job modified. This field is populated for LOAD and QUERY jobs. |
total_slot_ms | INTEGER | Slot milliseconds for the job over its entire duration in the RUNNING state, including retries. |
transaction_id | STRING | ID of the transaction in which this job ran, if any. (Preview) |
user_email | STRING | (Clustering column) Email address or service account of the user who ran the job. |
query_info.resource_warning | STRING | The warning message that appears if the resource usage during query processing is above the internal threshold of the system. A successful query job can have the resource_warning field populated. With resource_warning, you get additional data points to optimize your queries and to set up monitoring for performance trends of an equivalent set of queries by using query_hashes. |
query_info.query_hashes.normalized_literals | STRING | Contains the hashes of the query. normalized_literals is a hexadecimal STRING hash that ignores comments, parameter values, UDFs, and literals. The hash value differs when underlying views change, or if the query implicitly references columns, such as SELECT *, and the table schema changes. This field appears for successful GoogleSQL queries that are not cache hits. |
query_info.performance_insights | RECORD | Performance insights for the job. |
query_info.optimization_details | STRUCT | The history-based optimizations for the job. |
transferred_bytes | INTEGER | Total bytes transferred for cross-cloud queries, such as BigQuery Omni cross-cloud transfer jobs. |
materialized_view_statistics | RECORD | Statistics of materialized views considered in a query job. (Preview) |
When you query INFORMATION_SCHEMA.JOBS to find a summary cost of query jobs, exclude the SCRIPT statement type; otherwise, some values might be counted twice. The SCRIPT row includes summary values for all child jobs that were executed as part of this job.
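For example, a daily slot usage summary might apply that exclusion as follows (a minimal sketch; the region-us qualifier and the one-day window are placeholder choices):

-- Exclude SCRIPT parent rows so child job usage is not counted twice.
SELECT
  SUM(total_slot_ms) AS total_slot_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = 'QUERY'
  AND statement_type != 'SCRIPT';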
Multi-statement query jobs
A multi-statement query job is a query job that uses the procedural language. Multi-statement query jobs often define variables with DECLARE or have control flow statements such as IF or WHILE. When you query INFORMATION_SCHEMA.JOBS, you might need to recognize the difference between a multi-statement query job and other jobs. A multi-statement query job has the following traits:
- statement_type = SCRIPT
- reservation_id = NULL
- Child jobs. Each of a multi-statement query job's child jobs has a parent_job_id pointing to the multi-statement query job itself. The parent job includes summary values for all child jobs that were executed as part of this job. For this reason, if you query INFORMATION_SCHEMA.JOBS to find a summary cost of query jobs, then you should exclude the SCRIPT statement type; otherwise, some values such as total_slot_ms might be counted twice. See the sketch after this list.
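The following minimal sketch illustrates the parent-child relationship by counting each SCRIPT parent job's children through parent_job_id; the region-us qualifier and the one-day window are placeholder choices:

-- Count child jobs for each multi-statement (SCRIPT) parent job.
SELECT
  parent.job_id AS parent_job_id,
  COUNT(child.job_id) AS child_job_count
FROM `region-us`.INFORMATION_SCHEMA.JOBS AS parent
JOIN `region-us`.INFORMATION_SCHEMA.JOBS AS child
  ON child.parent_job_id = parent.job_id
WHERE
  parent.statement_type = 'SCRIPT'
  AND parent.creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND child.creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
GROUP BY parent_job_id;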
Data retention
This view contains currently running jobs and the job history of the past 180 days.
Scope and syntax
Queries against this view must include a region qualifier. The following table explains the region scope for this view:
View name | Resource scope | Region scope |
---|---|---|
[PROJECT_ID.]`region-REGION`.INFORMATION_SCHEMA.JOBS[_BY_PROJECT] | Project level | REGION |
- Optional: PROJECT_ID: the ID of your Google Cloud project. If not specified, the default project is used.
- REGION: any dataset region name. For example, `region-us`.
Examples
To run the query against a project other than your default project, add the project ID in the following format:
`PROJECT_ID`.`region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
- PROJECT_ID: the ID of the project.
- REGION_NAME: the region for your project.
For example, `myproject`.`region-us-central1`.INFORMATION_SCHEMA.JOBS.
Compare on-demand job usage to billing data
For projects using on-demand pricing, you can use the INFORMATION_SCHEMA.JOBS view to review compute charges over a given period. For projects using capacity-based (slots) pricing, you can use the INFORMATION_SCHEMA.RESERVATIONS_TIMELINE view to review compute charges over a given period.
The following query produces daily estimated aggregates of your billed TiB and the resulting charges. The limitations section explains when these estimates might not match your bill.
For this example only, the following additional variables must be set. They can be edited here for ease of use.
- START_DATE: the earliest date to aggregate over (inclusive).
- END_DATE: the latest date to aggregate over (inclusive).
- PRICE_PER_TIB: the on-demand price per TiB used for bill estimates.
CREATE TEMP FUNCTION isBillable(error_result ANY TYPE) AS (
  -- You aren't charged for queries that return an error.
  error_result IS NULL
  -- However, canceling a running query might incur charges.
  OR error_result.reason = 'stopped'
);

-- BigQuery hides the number of bytes billed on all queries against tables with
-- row-level security.
CREATE TEMP FUNCTION isMaybeUsingRowLevelSecurity(
  job_type STRING, tib_billed FLOAT64, error_result ANY TYPE)
AS (
  job_type = 'QUERY'
  AND tib_billed IS NULL
  AND isBillable(error_result)
);

WITH query_params AS (
  SELECT
    date 'START_DATE' AS start_date,  -- inclusive
    date 'END_DATE' AS end_date,  -- inclusive
),
usage_with_multiplier AS (
  SELECT
    job_type,
    error_result,
    creation_time,
    -- Jobs are billed by end_time in PST8PDT timezone, regardless of where
    -- the job ran.
    EXTRACT(date FROM end_time AT TIME ZONE 'PST8PDT') billing_date,
    total_bytes_billed / 1024 / 1024 / 1024 / 1024 total_tib_billed,
    CASE statement_type
      WHEN 'SCRIPT' THEN 0
      WHEN 'CREATE_MODEL' THEN 50 * PRICE_PER_TIB
      ELSE PRICE_PER_TIB
    END AS multiplier,
  FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
)
SELECT
  billing_date,
  sum(total_tib_billed * multiplier) estimated_charge,
  sum(total_tib_billed) estimated_usage_in_tib,
  countif(isMaybeUsingRowLevelSecurity(job_type, total_tib_billed, error_result))
    AS jobs_using_row_level_security,
FROM usage_with_multiplier, query_params
WHERE 1 = 1
  -- Filter by creation_time for partition pruning.
  AND date(creation_time) BETWEEN date_sub(start_date, INTERVAL 2 day) AND end_date
  AND billing_date BETWEEN start_date AND end_date
  AND isBillable(error_result)
GROUP BY billing_date
ORDER BY billing_date;
Limitations
- BigQuery hides some statistics for queries over tables with row-level security. The provided query counts the number of jobs impacted as jobs_using_row_level_security, but does not have access to the billable usage.
- BigQuery ML pricing for on-demand queries depends on the type of model being created. INFORMATION_SCHEMA.JOBS does not track which type of model was created, so the provided query assumes all CREATE_MODEL statements were creating the higher-billed model types.
- Apache Spark procedures use a similar pricing model, but charges are reported as the BigQuery Enterprise edition pay-as-you-go SKU. INFORMATION_SCHEMA.JOBS tracks this usage as total_bytes_billed, but cannot determine which SKU the usage represents.
Calculate average slot utilization
The following example calculates average slot utilization for all queries over the past 7 days for a given project. Note that this calculation is most accurate for projects that have consistent slot usage throughout the week. If your project does not have consistent slot usage, this number might be lower than expected.
To run the query:
SELECT
  SUM(total_slot_ms) / (1000 * 60 * 60 * 24 * 7) AS avg_slots
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
WHERE
  -- Filter by the partition column first to limit the amount of data scanned.
  -- Eight days allows for jobs created before the 7 day end_time filter.
  creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY) AND CURRENT_TIMESTAMP()
  AND job_type = 'QUERY'
  AND statement_type != 'SCRIPT'
  AND end_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY) AND CURRENT_TIMESTAMP();
The result is similar to the following:
+-----------+
| avg_slots |
+-----------+
| 3879.1534 |
+-----------+
You can check usage for a particular reservation with WHERE reservation_id = "…". This can be helpful to determine percentage use of a reservation over a period of time. For script jobs, the parent job also reports the total slot usage from its children jobs. To avoid double counting, use WHERE statement_type != "SCRIPT" to exclude the parent job.
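For example, the following minimal sketch estimates the percentage utilization of one reservation over 7 days; the RESERVATION_ID value and the 500-slot capacity are placeholder assumptions, not values from this view:

-- RESERVATION_ID and the 500-slot capacity are placeholders.
-- Denominator: capacity in slots times the period's duration in milliseconds.
SELECT
  SAFE_DIVIDE(
    SUM(total_slot_ms),
    500 * 1000 * 60 * 60 * 24 * 7) * 100 AS pct_utilization
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND reservation_id = 'RESERVATION_ID'
  AND statement_type != 'SCRIPT';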
If instead you would like to check the average slot utilization for individual jobs, use total_slot_ms / TIMESTAMP_DIFF(end_time, start_time, MILLISECOND).
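For example (a minimal sketch; the region-us qualifier and the one-day window are placeholder choices):

-- Average slots per job, computed over each job's own runtime.
SELECT
  job_id,
  SAFE_DIVIDE(
    total_slot_ms,
    TIMESTAMP_DIFF(end_time, start_time, MILLISECOND)) AS job_avg_slots
FROM `region-us`.INFORMATION_SCHEMA.JOBS
WHERE
  creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND job_type = 'QUERY'
  AND statement_type != 'SCRIPT'
ORDER BY job_avg_slots DESC
LIMIT 10;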
Count recent active queries by query priority
The following example displays the number of queries, grouped by priority (interactive or batch) that were started within the last 7 hours:
SELECT
  priority,
  COUNT(*) active_jobs
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
WHERE
  creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 hour)
  AND job_type = 'QUERY'
GROUP BY priority;
The result is similar to the following:
+-------------+-------------+
| priority    | active_jobs |
+-------------+-------------+
| INTERACTIVE | 2           |
| BATCH       | 3           |
+-------------+-------------+
The priority field indicates whether a query is INTERACTIVE or BATCH.
View load job history
The following example lists all users or service accounts that submitted a batch load job for a given project. Because no time boundary is specified, this query scans all available history.
SELECT
  user_email AS user,
  COUNT(*) num_jobs
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
WHERE job_type = 'LOAD'
GROUP BY user_email;
The result is similar to the following:
+--------------+
| user         |
+--------------+
| abc@xyz.com  |
| def@xyz.com  |
+--------------+
Get the number of load jobs to determine the daily job quota used
The following example returns the number of jobs by day, dataset, and table so that you can determine how much of the daily job quota is used.
SELECT
  DATE(creation_time) AS day,
  destination_table.project_id AS project_id,
  destination_table.dataset_id AS dataset_id,
  destination_table.table_id AS table_id,
  COUNT(job_id) AS load_job_count
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
WHERE
  creation_time BETWEEN TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 8 DAY) AND CURRENT_TIMESTAMP()
  AND job_type = "LOAD"
GROUP BY day, project_id, dataset_id, table_id
ORDER BY day DESC;
Get the last 10 failed jobs
The following example shows the last 10 failed jobs:
SELECT
  job_id,
  creation_time,
  user_email,
  error_result
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
WHERE error_result IS NOT NULL
ORDER BY creation_time DESC
LIMIT 10;
The results should look similar to the following:
+---------------+--------------------------+------------------+---------------------------------------+
| job_id        | creation_time            | user_email       | error_result                          |
+---------------+--------------------------+------------------+---------------------------------------+
| examplejob_1  | 2020-10-10 00:00:00 UTC  | bob@example.com  | Column 'col1' has mismatched type...  |
| examplejob_2  | 2020-10-11 00:00:00 UTC  | bob@example.com  | Column 'col1' has mismatched type...  |
+---------------+--------------------------+------------------+---------------------------------------+
Query the list of long-running jobs
The following example shows the list of long-running jobs that are in the RUNNING or PENDING state for more than 30 minutes:
SELECT
  job_id,
  job_type,
  state,
  creation_time,
  start_time,
  user_email
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
WHERE
  state != "DONE"
  AND creation_time <= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 MINUTE)
ORDER BY creation_time ASC;
The result is similar to the following:
+--------+----------+---------+--------------------------------+--------------------------------+------------------+
| job_id | job_type | state   | creation_time                  | start_time                     | user_email       |
+--------+----------+---------+--------------------------------+--------------------------------+------------------+
| job_1  | QUERY    | RUNNING | 2023-05-03 05:07:22.818000 UTC | 2023-05-03 05:07:22.905000 UTC | abc@example.com  |
| job_2  | QUERY    | PENDING | 2023-05-01 02:05:47.925000 UTC | 2023-05-01 02:05:47.998000 UTC | xyz@example.com  |
+--------+----------+---------+--------------------------------+--------------------------------+------------------+
Queries using short query optimized mode
The following example shows a list of queries that were executed in short query optimized mode for which BigQuery did not create jobs.
SELECT
  job_id
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
WHERE
  TIMESTAMP_TRUNC(creation_time, DAY) = '2024-06-12'
  AND job_creation_reason.code IS NULL
LIMIT 10;
The results should look like the following:
+------------------------------------------+
| job_id                                   |
+------------------------------------------+
| 2Lm09bHxDEsoVK8zwzWJomLHU_Ud%1910479b151 |
| j9_GVQf28jW2M1_RfTYGRPX1vq--!191047a135f |
+------------------------------------------+
The following example shows information about a query that was executed in short query optimized mode for which BigQuery did not create a job.
SELECT
  job_id,
  statement_type,
  priority,
  cache_hit,
  job_creation_reason.code AS job_creation_reason_code,
  total_bytes_billed,
  total_bytes_processed,
  total_slot_ms,
  state,
  error_result.message AS error_result_message,
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
WHERE
  TIMESTAMP_TRUNC(creation_time, DAY) = '2024-06-12'
  AND job_id = '2Lm09bHxDEsoVK8zwzWJomLHU_Ud%1910479b151';  -- queryId
Note: The job_id field contains the queryId of the query when a job was not created for this query.
The results should look like the following:
+------------------------------------------+----------------+-------------+-----------+--------------------------+--------------------+-----------------------+---------------+-------+----------------------+
| job_id                                   | statement_type | priority    | cache_hit | job_creation_reason_code | total_bytes_billed | total_bytes_processed | total_slot_ms | state | error_result_message |
+------------------------------------------+----------------+-------------+-----------+--------------------------+--------------------+-----------------------+---------------+-------+----------------------+
| 2Lm09bHxDEsoVK8zwzWJomLHU_Ud%1910479b151 | SELECT         | INTERACTIVE | false     | null                     | 161480704          | 161164718             | 3106          | DONE  | null                 |
+------------------------------------------+----------------+-------------+-----------+--------------------------+--------------------+-----------------------+---------------+-------+----------------------+
The following example shows a list of queries that were executed in short query optimized mode for which BigQuery did create jobs.
SELECT
  job_id,
  job_creation_reason.code AS job_creation_reason_code
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
WHERE
  TIMESTAMP_TRUNC(creation_time, DAY) = '2024-06-12'
  AND job_creation_reason.code IS NOT NULL
  AND job_creation_reason.code != 'REQUESTED'
LIMIT 10;
The results should look like the following:
+----------------------------------+--------------------------+
| job_id                           | job_creation_reason_code |
+----------------------------------+--------------------------+
| job_LxOEwrJEffcOfjK7GBwWjO3RroOI | LARGE_RESULTS            |
+----------------------------------+--------------------------+
Bytes processed per user identity
The following example shows the total bytes billed for query jobs per user:
SELECT
  user_email,
  SUM(total_bytes_billed) AS bytes_billed
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
WHERE
  job_type = 'QUERY'
  AND statement_type != 'SCRIPT'
GROUP BY user_email;
Note: See the caveat for the total_bytes_billed column in the schema documentation for the JOBS views.
The results should look like the following:
+---------------------+--------------+
| user_email          | bytes_billed |
+---------------------+--------------+
| bob@example.com     | 2847932416   |
| alice@example.com   | 1184890880   |
| charles@example.com | 10485760     |
+---------------------+--------------+
Hourly breakdown of bytes processed
The following example shows total bytes billed for query jobs, in hourly intervals:
SELECT
  TIMESTAMP_TRUNC(end_time, HOUR) AS time_window,
  SUM(total_bytes_billed) AS bytes_billed
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
WHERE
  job_type = 'QUERY'
  AND statement_type != 'SCRIPT'
GROUP BY time_window
ORDER BY time_window DESC;
The result is similar to the following:
+-------------------------+--------------+
| time_window             | bytes_billed |
+-------------------------+--------------+
| 2022-05-17 20:00:00 UTC | 1967128576   |
| 2022-05-10 21:00:00 UTC | 0            |
| 2022-04-15 20:00:00 UTC | 10485760     |
| 2022-04-15 17:00:00 UTC | 41943040     |
+-------------------------+--------------+
Query jobs per table
The following example shows how many times each table queried in my_project was referenced by a query job:
SELECT
  t.project_id,
  t.dataset_id,
  t.table_id,
  COUNT(*) AS num_references
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS, UNNEST(referenced_tables) AS t
GROUP BY t.project_id, t.dataset_id, t.table_id
ORDER BY num_references DESC;
The result is similar to the following:
+------------+------------+----------+----------------+
| project_id | dataset_id | table_id | num_references |
+------------+------------+----------+----------------+
| my_project | dataset1   | orders   | 58             |
| my_project | dataset1   | products | 40             |
| my_project | dataset2   | sales    | 30             |
| other_proj | dataset1   | accounts | 12             |
+------------+------------+----------+----------------+
Number of partitions modified by query and load jobs per table
The following example shows the number of partitions modified by queries with DML statements and load jobs, per table. Note that this query doesn't show the total_modified_partitions for copy jobs.
SELECT
  destination_table.table_id,
  SUM(total_modified_partitions) AS total_modified_partitions
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
WHERE DATE(creation_time, "America/Los_Angeles") = CURRENT_DATE()
GROUP BY table_id
ORDER BY total_modified_partitions DESC;
Most expensive queries by project
The following example lists the most expensive queries in my_project by slot usage time:
SELECT
  job_id,
  query,
  user_email,
  total_slot_ms
FROM `my_project`.`region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
WHERE EXTRACT(DATE FROM creation_time) = CURRENT_DATE()
ORDER BY total_slot_ms DESC
LIMIT 4;
You can also list the most expensive queries by data processed with the following example:
SELECT
  job_id,
  query,
  user_email,
  total_bytes_processed
FROM `my_project`.`region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
WHERE EXTRACT(DATE FROM creation_time) = CURRENT_DATE()
ORDER BY total_bytes_processed DESC
LIMIT 4;
The result for either example is similar to the following:
+--------------+--------------------------------+---------------------+---------------+
| job_id       | query                          | user_email          | total_slot_ms |
+--------------+--------------------------------+---------------------+---------------+
| examplejob_1 | SELECT ... FROM dataset.table1 | bob@example.com     | 80,000        |
| examplejob_2 | SELECT ... FROM dataset.table2 | alice@example.com   | 78,000        |
| examplejob_3 | SELECT ... FROM dataset.table3 | charles@example.com | 75,000        |
| examplejob_4 | SELECT ... FROM dataset.table4 | tina@example.com    | 72,000        |
+--------------+--------------------------------+---------------------+---------------+
Get details about a resource warning
If you get a Resources exceeded error message, you can inquire about the queries in a time window:
SELECT
  query,
  query_info.resource_warning
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
WHERE
  creation_time BETWEEN TIMESTAMP("2022-12-01") AND TIMESTAMP("2022-12-08")
  AND query_info.resource_warning IS NOT NULL
LIMIT 50;
Monitor resource warnings grouped by date
If you get a Resources exceeded error message, you can monitor the total number of resource warnings grouped by date to know if there are any changes to workload:
WITH resource_warnings AS (
  SELECT
    EXTRACT(DATE FROM creation_time) AS creation_date
  FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
  WHERE
    creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 14 DAY)
    AND query_info.resource_warning IS NOT NULL
)
SELECT
  creation_date,
  COUNT(1) AS warning_counts
FROM resource_warnings
GROUP BY creation_date
ORDER BY creation_date DESC;
Estimate slot usage and cost for queries
The following example computes the average slots and max slots for each job by using estimated_runnable_units. The reservation_id is NULL if you don't have any reservations.
SELECT
  project_id,
  job_id,
  reservation_id,
  EXTRACT(DATE FROM creation_time) AS creation_date,
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS job_duration_seconds,
  job_type,
  user_email,
  total_bytes_billed,
  -- Average slot utilization per job is calculated by dividing total_slot_ms
  -- by the millisecond duration of the job.
  SAFE_DIVIDE(job.total_slot_ms, (TIMESTAMP_DIFF(job.end_time, job.start_time, MILLISECOND))) AS job_avg_slots,
  query,
  -- Determine the max number of slots used at ANY stage in the query.
  -- The average slots might be 55, but a single stage might spike to 2,000 slots.
  -- This is important to know when estimating the number of slots to purchase.
  MAX(SAFE_DIVIDE(unnest_job_stages.slot_ms, unnest_job_stages.end_ms - unnest_job_stages.start_ms)) AS jobstage_max_slots,
  -- Check if there's a job that requests more units of work (slots). If so, you need more slots.
  -- estimated_runnable_units = units of work that can be scheduled immediately.
  -- Providing additional slots for these units of work accelerates the query,
  -- if no other query in the reservation needs additional slots.
  MAX(unnest_timeline.estimated_runnable_units) AS estimated_runnable_units
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS AS job
  CROSS JOIN UNNEST(job_stages) AS unnest_job_stages
  CROSS JOIN UNNEST(timeline) AS unnest_timeline
WHERE
  project_id = 'my_project'
  AND statement_type != 'SCRIPT'
  AND DATE(creation_time) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY) AND CURRENT_DATE()
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
ORDER BY job_id;
View performance insights for queries
The following example returns all query jobs that have performance insights from your project in the last 30 days, along with a URL that links to the query execution graph in the Google Cloud console.
SELECT
  `bigquery-public-data`.persistent_udfs.job_url(
    project_id || ':us.' || job_id) AS job_url,
  query_info.performance_insights
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  DATE(creation_time) >= CURRENT_DATE - 30  -- Scan 30 days of query history.
  AND job_type = 'QUERY'
  AND state = 'DONE'
  AND error_result IS NULL
  AND statement_type != 'SCRIPT'
  AND EXISTS (  -- Only include queries that had performance insights.
    SELECT 1
    FROM UNNEST(
      query_info.performance_insights.stage_performance_standalone_insights)
    WHERE slot_contention OR insufficient_shuffle_quota
    UNION ALL
    SELECT 1
    FROM UNNEST(
      query_info.performance_insights.stage_performance_change_insights)
    WHERE input_data_change.records_read_diff_percentage IS NOT NULL
  );
View metadata refresh jobs
The following example lists the metadata refresh jobs:
SELECT
  *
FROM `region-aws-us-east-1`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE
  job_id LIKE '%metadata_cache_refresh%'
  AND creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 6 HOUR)
ORDER BY start_time DESC
LIMIT 10;
Analyze performance over time for identical queries
The following example returns the top 10 slowest jobs over the past 7 days that have run the same query:
DECLARE querytext STRING DEFAULT (
  SELECT query
  FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
  WHERE job_id = 'JOB_ID'
  LIMIT 1
);

SELECT
  start_time,
  end_time,
  project_id,
  job_id,
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS run_secs,
  total_bytes_processed / POW(1024, 3) AS total_gigabytes_processed,
  query
FROM `region-REGION_NAME`.INFORMATION_SCHEMA.JOBS
WHERE
  query = querytext
  AND total_bytes_processed > 0
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY 5 DESC
LIMIT 10;
Replace JOB_ID with any job_id that ran the query you are analyzing.
Match slot usage behavior from administrative resource charts
To explore slot usage behavior similar to the information in administrative resource charts, query the INFORMATION_SCHEMA.JOBS_TIMELINE view.
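For example, a minimal sketch against that view (the region-us qualifier and the one-hour window are placeholder choices) sums period_slot_ms per one-second period:

-- Slots consumed in each one-second period over the last hour.
-- period_slot_ms / 1000 yields the average slots used in that second.
SELECT
  period_start,
  SUM(period_slot_ms) / 1000 AS avg_slots
FROM `region-us`.INFORMATION_SCHEMA.JOBS_TIMELINE
WHERE job_creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
GROUP BY period_start
ORDER BY period_start;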