Sample MQL queries

This document introduces Monitoring Query Language (MQL) through examples. However, it doesn't attempt to cover all aspects of the language. MQL is comprehensively documented in Monitoring Query Language reference.

For information on MQL-based alerting policies, see Alerting policies with MQL.

You can write a particular query in many forms; the language is flexible, and there are many shortcuts you can use after you are familiar with the syntax. For more information, see Strict-form queries.

Before you begin

To access the code editor when using Metrics Explorer, do the following:

  1. In the Google Cloud console, go to the  Metrics explorer page:

    Go to Metrics explorer

    If you use the search bar to find this page, then select the result whose subheading is Monitoring.

  2. In the toolbar of the query-builder pane, select the button whose name is either  MQL or  PromQL.
  3. Verify that MQL is selected in the Language toggle. The language toggle is in the same toolbar that lets you format your query.

To run a query, paste the query into the editor and click Run Query. For an introduction to this editor, see Use the code editor for MQL.

Some familiarity with Cloud Monitoring concepts including metric types, monitored-resource types, and time series is helpful. For an introduction to these concepts, see Metrics, time series, and resources.

Data model

MQL queries retrieve and manipulate data in the Cloud Monitoring time-series database. This section introduces some of the concepts and terminology related to that database. For detailed information, see the reference topic Data model.

Every time series originates from a single type of monitored resource, and every time series collects data of one metric type. A monitored-resource descriptor defines a monitored-resource type. Similarly, a metric descriptor defines a metric type. For example, the resource type might be gce_instance, a Compute Engine virtual machine (VM), and the metric type might be compute.googleapis.com/instance/cpu/utilization, the CPU utilization of the Compute Engine VM.

These descriptors also specify a set of labels that are used to collect information about other attributes of the metric or resource type. For example, resources typically have a zone label, used to record the geographic location of the resource.

One time series is created for each combination of values for the labels from the pair of a metric descriptor and a monitored-resource descriptor.

You can find the available labels for resource types in the Monitored resource list, for example gce_instance. To find the labels for metric types, see the Metrics list; for example, see metrics from Compute Engine.

The Cloud Monitoring database stores the time series from a particular metric and resource type in one table. The metric and resource type act as the identifier for the table. This MQL query fetches the table of time series recording CPU utilization for Compute Engine instances:

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization

There is one time series in the table for each unique combination of metric and resource label values.

MQL queries retrieve time-series data from these tables and transform it into output tables. These output tables can be passed into other operations. For example, you can isolate the time series written by resources in a particular zone or set of zones by passing the retrieved table as input to a filter operation:

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
| filter zone =~ 'us-central.*'

The preceding query results in a table that contains only the time series from resources in a zone that begins with us-central.

MQL queries are structured to pass the output of one operation as the input to the next operation. This table-based approach lets you link operations together to manipulate this data by filtering, selection, and other familiar database operations like inner and outer joins. You can also run various functions on the data in the time series as the data is passed from one operation to another.

The operations and functions available in MQL are fully documented in Monitoring Query Language reference.

Query structure

A query is made up of one or more operations. Operations are linked, or piped, together so that the output of one operation is the input to the next. Therefore, the result of a query depends on the order of the operations. Some of the things you can do include the following:

  • Start a query with a fetch or other selection operation.
  • Build up a query with multiple operations piped together.
  • Select a subset of information with filter operations.
  • Aggregate related information with group_by operations.
  • Look at outliers with top and bottom operations.
  • Combine multiple queries with { ; } and join operations.
  • Use the value operation and functions to compute ratios and other values.

Not all queries use all of these options.

These examples introduce only some of the available operations and functions. For detailed information about the structure of MQL queries, see reference topic Query Structure.

These examples don't specify two things that you might expect to see: time ranges and alignment. The following sections explain why.

Time ranges

When you use the code editor, the chart settings define the time range for queries. By default, the chart's time range is set to one hour.

To change the time range of the chart, use the time-range selector. For example, if you want to view the data for the past week, then select Last 1 week from the time-range selector. You can also specify a start and end time, or specify a time to view around.

For more information about time ranges in the code editor, see Time ranges, charts, and the code editor.

Alignment

Many of the operations used in these examples, like the join and group_by operations, depend on all the time series points in a table occurring at regular intervals. The act of making all the points line up at regular timestamps is called alignment. Usually, alignment is done implicitly, and none of the examples here show it.

MQL automatically aligns tables for join and group_by operations when needed, but MQL lets you do alignment explicitly as well.

Fetch and filter data

MQL queries start with the retrieval and selection or filtering of data. This section illustrates some basic retrieval and filtering with MQL.

Retrieve time-series data

A query always starts with a fetch operation, which retrieves time series from Cloud Monitoring.

The simplest query consists of a single fetch operation and an argument that identifies the time series to fetch, such as the following:

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization

The argument consists of a monitored-resource type, gce_instance, a pair of colon characters, ::, and a metric type, compute.googleapis.com/instance/cpu/utilization.

This query retrieves the time series written by Compute Engine instances for the metric type compute.googleapis.com/instance/cpu/utilization, which records the CPU utilization of those instances.

If you run the query from the code editor in Metrics Explorer, you get a chart showing each of the requested time series:

Chart shows CPU utilization data for Compute Engine instances.

Each of the requested time series is displayed as a line on the chart. Each time series includes a list of time-stamped values from the CPU-utilization metric for one VM instance in this project.

In the backend storage used by Cloud Monitoring, time series are stored in tables. The fetch operation organizes the time series for the specified monitored-resource and metric types into a table, and then it returns the table. The returned data is displayed in the chart.

The fetch operation is described, along with its arguments, on the fetch reference page. For more information on the data produced by operations, see the reference pages for time series and tables.

Filter operations

Queries typically consist of a combination of multiple operations. The simplest combination is to pipe the output of one operation into the input of the next by using the pipe operator, |. The following example illustrates using a pipe to input the table to a filter operation:

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
| filter instance_name =~ 'gke.*'

This query pipes the table, returned by the fetch operation shown in the previous example, into a filter operation which takes as an expression that evaluates to a boolean value. In this example, the expression means "instance_name starts with gke".

The filter operation takes the input table, removes the time series for which the filter is false, and outputs the resulting table. The following screenshot shows the resulting chart:

Chart shows results filtered for
`gke`.

If you don't have any instance names that start with gke, change the filter before trying this query. For example, if you have VM instances with apache at the beginning of their names, use the following filter:

 | filter instance_name =~ 'apache.*'

The filter expression is evaluated once for each input time series. If the expression evaluates to true, that time series is included in the output. In this example, the filter expression does a regular expression match, =~, on the instance_name label of each time series. If the value of the label matches the regular expression 'gke.*', then the time series is included in the output. If not, the time series is dropped from the output.

For more information on filtering, see the filter reference page. The filter predicate can be any arbitrary expression that returns a boolean value; for more information, see Expressions.

Group and aggregate

Grouping lets you group time series along specific dimensions. Aggregation combines all the time series in a group into one output time series.

The following query filters the output of the initial fetch operation to retain only those time series from resources in a zone that begins with us-central. It then groups the time series by zone and combines them using mean aggregation.

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
| filter zone =~ 'us-central.*'
| group_by [zone], mean(val())

The table resulting from the group_by operation has one time series per zone. The following screenshot shows the resulting chart:

Chart shows a filtered fetch grouped by
zone.

The group_by operation takes two arguments, separated by a comma, ,. These arguments determine the precise grouping behavior. In this example, group_by [zone], mean(val()), the arguments act as follows:

  • The first argument, [zone], is a map expression that determines the grouping of the time series. In this example, it specifies the labels to use for grouping. The grouping step collects all input time series that have the same output zone values into one group. In this example, the expression collects the time series from the Compute Engine VMs in one zone.

    The output time series has only a zone label, with the value copied from the input time series in the group. Other labels on the input time series are dropped from the output time series.

    The map expression can do much more than list labels; for more information, see the map reference page.

  • The second argument, mean(val()), determines how the time series in each group are combined, or aggregated, into one output time series. Each point in the output time series for a group is the result of aggregating the points with the same timestamp from all input time series in the group.

    The aggregation function, mean in this example, determines the aggregated value. The val() function returns the points to be aggregated, the aggregation function is applied to those points. In this example, you get the mean of the CPU utilization of the virtual machines in the zone at each output time point.

    The expression mean(val()) is an example of an aggregating expression.

The group_by operation always combines grouping and aggregation. If you specify a grouping but omit the aggregation argument, group_by uses a default aggregation, aggregate(val()), which selects an appropriate function for the data type. See aggregate for the list of default aggregation functions.

Use group_by with a log-based metric

Suppose you have created a distribution log-based metric to extract the number of data points processed from a set of long entries including strings like following:

... entry ID 1 ... Processed data points 1000 ...
... entry ID 2 ... Processed data points 1500 ...
... entry ID 3 ... Processed data points 1000 ...
... entry ID 4 ... Processed data points 500 ...

To create a time series that shows the count of all processed data points, use an MQL such as the following:

fetch global
| metric 'logging.googleapis.com/user/METRIC_NAME'
| group_by [], sum(sum_from(value))

To create a log-based distribution metric, see configure distribution metrics.

Exclude columns from a group

You can use the drop modifier in a mapping to exclude columns from a group. For example, the Kubernetes core_usage_time metric has six columns:

fetch k8s_container :: kubernetes.io/container/cpu/core_usage_time
| group_by [project_id, location, cluster_name, namespace_name, container_name]

If you don't need to group pod_name, then you can exclude it with drop:

fetch k8s_container :: kubernetes.io/container/cpu/core_usage_time
| group_by drop [pod_name]

Select time series

The examples in this section illustrate ways to select particular time series out of an input table.

Select top or bottom time series

To see the time series data for the three Compute Engine instances with the highest CPU utilization within your project, enter the following query:

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
| top 3

The following screenshot shows the result from one project:

Chart shows 3 highest-utilization time
series.

You can retrieve the time series with the lowest CPU utilization by replacing top with bottom.

The top operation outputs a table with a specified number of time series selected from its input table. The time series included in the output have the largest value for some aspect of the time series.

Since this query doesn't specify a way to order the time series, it returns those time series with the largest value for the most recent point. To specify how to determine which time series have the largest value, you can provide an argument to the top operation. For example, the previous query is equivalent to the following query:

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
| top 3, val()

The val() expression selects the value of the most recent point in each time series it is applied to. Therefore, the query returns those time series with the largest value for the most recent point.

You can provide an expression that does aggregation over some or all points in a time series to give the sorting value. The following takes the mean of all points within the last 10 minutes:

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
| top 3, mean(val()).within(10m)

If the within function isn't used, the mean function is applied to the values of all the displayed points in the time series.

The bottom operation works similarly. The following query finds the value of the largest point in each time series with max(val()) and then selects the three time series for which that value is smallest:

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
| bottom 3, max(val())

The following screenshot shows a chart displaying the streams with the smallest spikes:

Chart shows 3 highest-utilization time
series.

Exclude the top or bottom n results in the time series

Consider a scenario in which you have many Compute Engine VM instances. A few of these instances consume a lot more memory than most instances, and these outliers are making it harder to see the usage patterns in the larger group. Your CPU utilization charts look like the following:

Chart shows many CPU utilization lines, with several outliers.

You want to exclude the three outliers from the chart so that you can see the patterns in the larger group more clearly.

To exclude the top three time series in a query that retrieves the time series for Compute Engine CPU utilization, use the top table operation to identify the time series and the outer_join table operation to exclude the identified time series from the results. You can use the following query:

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
| {
    top 3 | value [is_default_value: false()]
  ;
    ident
  }
| outer_join true(), _
| filter is_default_value
| value drop [is_default_value]

The fetch operation returns a table of time series for CPU utilization from all instances. This table is then processed into two resulting tables:

  • The top n table operation outputs a table that contains the n time series with the highest values. In this case, n = 3. The resulting table contains the three times series to be excluded.

    The table containing the top three time series is then piped into a value table operation. This operation adds another column to each of the time series in the top-three table. This column, is_default_value, is given the boolean value false for all time series in the top-three table.

  • The ident operation returns the same table that was piped into it: the original table of CPU utilization time series. None of the time series in this table have the is_default_value column.

The top-three table and the original table are then piped into the outer_join table operation. The top-three table is the left table in the join, the fetched table is the right table in the join. The outer join is set up to provide the value true as the value for any field that doesn't exist in a row being joined. The result of the outer join is a merged table, with the rows from the top-three table keeping the column is_default_value with the value false, and all the rows from the original table that weren't also in the top-three table getting the is_default_value column with the value true.

The table resulting from the join is then passed to the filter table operation, which filters out the rows that have a value of false in the is_default_value column. The resulting table contains the rows from the originally fetched table without the rows from the top-three table. This table contains the intended set of time series, with the added is_default_column.

The final step is to drop the column is_default_column that was added by the join, so the output table has the same columns as the originally fetched table.

The following screenshot shows the chart for the prior query:

Chart shows many CPU utilization lines, with the outliers excluded.

You can create a query to exclude the time series with the lowest CPU utilization by replacing top n with bottom n.

The ability to exclude outliers can be useful in cases where you want to set an alert but don't want the outliers to constantly trigger the alert. The following alert query uses the same exclusion logic as the prior query to monitor the CPU limit utilization by a set of Kubernetes pods after excluding the top two pods:

fetch k8s_container
| metric 'kubernetes.io/container/cpu/limit_utilization'
| filter (resource.cluster_name == 'CLUSTER_NAME' &&
          resource.namespace_name == 'NAMESPACE_NAME' &&
          resource.pod_name =~ 'POD_NAME')
| group_by 1m, [value_limit_utilization_max: max(value.limit_utilization)]
| {
    top 2 | value [is_default_value: false()]
  ;
    ident
  }
| outer_join true(), _
| filter is_default_value
| value drop [is_default_value]
| every 1m
| condition val(0) > 0.73 '1'

Select top or bottom from groups

The top and bottom table operations select time series from the entire input table. The top_by and bottom_by operations group the time series in a table and then pick some number of time series from each group.

The following query selects the time series in each zone with the largest peak value:

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
| top_by [zone], 1, max(val())

Chart shows largest peak by zone.

The [zone] expression indicates that a group consists of the time series with the same value of the zone column. The 1 in the top_by indicates how many time series to select from each zone's group. The max(val()) expression looks for the largest value in the chart's time range in each time series.

You can use any aggregation function in place of max. For example, the following uses the mean aggregator and uses within to specify the sorting range of 20 minutes. It selects the top 2 time series in each zone:

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
| top_by [zone], 2, mean(val()).within(20m)

Chart shows 2 largest mean peak by zone within 20
minutes.

In the previous example, there is only one instance in the zone us-central-c, so there is only one time series returned; there isn't a "top 2" in the group.

Combine selections with union

You can combine selection operations like top and bottom to create charts that show both. For example, the following query returns the single time series with the maximum value and the single time series with the minimum value:

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
| {
    top 1, max(val())
  ;
    bottom 1, min(val())
  }
| union

The resulting chart shows two lines, the one containing the highest value and the one containing the lowest:

Chart shows the time series with the highest and lowest
values.

You can use braces, { }, to specify sequences of operations, each of which yields one table of time series as output. The individual operations are separated by a semicolon, ;.

In this example, the fetch operation returns a single table, which is piped to each of the two operations in the sequence, a top operation and a bottom operation. Each of these operations results in an output table based on the same input table. The union operation then combines the two tables into one, which is displayed on the chart.

See more about sequencing operations by using { } in the reference topic Query Structure.

Combine time series with different values for one label

Suppose that you have multiple time series for the same metric type and you want to combine a few of them together. If you want to select them based on the values of a single label, then you can't create the query by using the query-builder interface in Metrics Explorer. You need to filter on two or more different values of the same label, but the query-builder interface requires that a time series match all the filters to be selected: the label matching is an AND test. No time series can have two different values for the same label, but you can't create an OR test for filters in the query builder.

The following query retrieves the time series for the Compute Engine instance/disk/max_read_ops_count metric for two specific Compute Engine instances and aligns the output over 1-minute intervals:

fetch gce_instance
| metric 'compute.googleapis.com/instance/disk/max_read_ops_count'
| filter (resource.instance_id == '1854776029354445619' ||
          resource.instance_id == '3124475757702255230')
| every 1m

The following chart shows a result of this query:

Chart shows two time series selected by value of the same label.

If you want to find the sum of the maximum max_read_ops_count values for these two VMs and sum them, you can do the following:

  • Find the maximum value for each time series by using the group_by table operator, specifying the same 1-minute alignment period and aggregating over the period with the max aggregator to create a column named max_val_of_read_ops_count_max in the output table.
  • Find the sum of the time series by using the group_by table operator and the sum aggregator on the max_val_of_read_ops_count_max column.

The following shows the query:

fetch gce_instance
| metric 'compute.googleapis.com/instance/disk/max_read_ops_count'
| filter (resource.instance_id == '1854776029354445619' ||
          resource.instance_id == '3124475757702255230')
| group_by 1m, [max_val_of_read_ops_count_max: max(value.max_read_ops_count)]
| every 1m
| group_by [], [summed_value: sum(max_val_of_read_ops_count_max)]

The following chart shows a result of this query:

Chart shows the sum of two time series selected by value of the same label.

Compute percentile statistics across time and across streams

To compute a percentile stream value over a sliding window separately for each stream, use a temporal group_by operation. For example, the following query computes the 99th percentile value of a stream over a 1-hour sliding window:

fetch gce_instance :: compute.googleapis.com/instance/cpu/utilization
| group_by 1h, percentile(val(), 99)
| every 1m

To compute the same percentile statistic at a point in time across streams, rather than across time within one stream, use a spatial group_by operation:

fetch gce_instance :: compute.googleapis.com/instance/cpu/utilization
| group_by [], percentile(val(), 99)

Compute ratios

Suppose you've built a distributed web service that runs on Compute Engine VM instances and uses Cloud Load Balancing.

You want to see a chart that displays the ratio of requests that return HTTP 500 responses (internal errors) to the total number of requests; that is, the request-failure ratio. This section illustrates several ways to compute the request-failure ratio.

Cloud Load Balancing uses the monitored-resource type http_lb_rule. The http_lb_rule monitored-resource type has a matched_url_path_rule label that records the prefix of URLs defined in for the rule; the default value is UNMATCHED.

The loadbalancing.googleapis.com/https/request_count metric type has a response_code_class label. This label captures the class of response codes.

Use outer_join and div

The following query determines the 500 responses for each value of the matched_url_path_rule label in each http_lb_rule monitored resource in your project. It then joins this failure-count table with the original table, which contains all response counts and divides the values to show the ratio of failure responses to total responses:

fetch https_lb_rule::loadbalancing.googleapis.com/https/request_count
| {
    filter response_code_class = 500
  ;
    ident
  }
| group_by [matched_url_path_rule]
| outer_join 0
| div

The following chart shows the result from one project:

Chart shows the request failure-to-total ratio by
joining.

The shaded areas around the lines on the chart are min/max bands; for more information, see Min/max bands.

The fetch operation outputs a table of time series containing counts of request for all load-balanced queries. This table is processed in two ways by the two operation sequences in the braces:

  • filter response_code_class = 500 outputs only the time series that have response_code_class label with the value 500. The resulting time series counts the requests with HTTP 5xx (error) response codes.

    This table is the numerator of the ratio.

  • The ident, or identity, operation outputs its input, so this operation returns the originally fetched table. That's the table that contains time series with counts for every response code.

    This table is the denominator of the ratio.

The numerator and denominator tables, produced by the filter and ident operations respectively, are processed separately by the group_by operation. The group_by operation groups the time series in each table by the value of the matched_url_path_rule label and sums the counts for each value of the label. This group_by operation doesn't explicitly state the aggregator function, so a default, sum, is used.

  • For the filtered table, the group_by result is the number of requests returning a 500 response for each matched_url_path_rule value.

  • For the identity table, the group_by result is the total number of requests for each matched_url_path_rule value.

These tables are piped to the outer_join operation, which pairs time series with matching label values, one from each of the two input tables. The paired time series are zipped up by matching the timestamp of each point in one time series to the timestamp of a point in the other time series. For each matched pair of points, outer_join produces a single output point with two values, one from each of the input tables. The zipped-up time series is output by the join with the same labels as the two input time series.

With an outer join, if a point from the second table doesn't have a matching point in the first, a stand-in value must be provided. In this example, a point with value 0—the argument to the outer_join operation—is used.

Finally, the div operation takes each point with two values and divides the values to produce a single output point: the ratio of 500 responses to all responses for each URL map.

The string div here is actually the name of the div function, which divides two numeric values. But it's used here as an operation. When used as operations, functions like div expect two values in each input point (which this join ensures) and produce a single value for the corresponding output point.

The | div part of the query is a shortcut for | value val(0) / val(1). The value operation allows arbitrary expressions on the value columns of an input table to produce the value columns of the output table. For more information, see the reference pages for the value operation and for expressions.

Use ratio

The div function could be replaced with any function on two values, but because ratios are so frequently used, MQL provides a ratio table operation that computes ratios directly.

The following query is equivalent to the preceding version, using outer_join and div:

fetch https_lb_rule::loadbalancing.googleapis.com/https/request_count
| {
    filter response_code_class = 500
  ;
    ident
  }
| group_by [matched_url_path_rule]
| ratio

In this version, the ratio operation replaces the outer_join 0 | div operations in the earlier version and produces the same result.

Note that ratio only uses outer_join to supply a 0 for the numerator if both the numerator and denominator inputs have the same labels identifying each time series, which MQL outer_join requires. If the numerator input has extra labels, then there will be no output for any point missing in the denominator.

Use group_by and /

There's yet another way to compute the ratio of error responses to all responses. In this case, because the numerator and denominator for the ratio are derived from the same time series, you can also compute the ratio by grouping alone. The following query shows this approach:

fetch https_lb_rule::loadbalancing.googleapis.com/https/request_count
| group_by [matched_url_path_rule],
    sum(if(response_code_class = 500, val(), 0)) / sum(val())

This query uses an aggregation expression built on the ratio of two sums:

  • The first sum uses the if function to count 500-valued labels and 0 for others. The sum function computes the count of the requests that returned 500.

  • The second sum adds up the counts for all requests, val().

The two sums are then divided, resulting in the ratio of 500 responses to all responses. This query produces the same result as those queries in Using outer_join and div and Using ratio.

Use filter_ratio_by

Because ratios are frequently computed by dividing two sums derived from the same table, MQL provides the filter_ratio_by operation for this purpose. The following query does the same thing as the preceding version, which explicitly divides the sums:

fetch https_lb_rule::loadbalancing.googleapis.com/https/request_count
| filter_ratio_by [matched_url_path_rule], response_code_class = 500

The first operand of the filter_ratio_by operation, here [matched_url_path_rule], indicates how to group the responses. The second operation, here response_code_class = 500, acts as a filtering expression for the numerator.

  • The denominator table is the result of grouping the fetched table by matched_url_path_rule and aggregated by using sum.
  • The numerator table is the fetched table, filtered for time series with an HTTP response code of 5xx, and then grouped by matched_url_path_rule and aggregated by using sum.

Ratios and quota metrics

To set up queries and alerts on serviceruntime quota metrics and resource-specific quota metrics to monitor your quota consumption, you can use MQL. For more information, including examples, see Using quota metrics.

Arithmetic computation

Sometimes you might want to perform an arithmetic operation on data before you chart it. For example, you might want to scale time series, convert the data to log scale, or chart the sum of two time series. For a list of arithmetic functions available in MQL, see Arithmetic.

To scale a time series, use the mul function. For example, the following query retrieves the time series and then multiplies each value by 10:

  fetch gce_instance
  | metric 'compute.googleapis.com/instance/disk/read_bytes_count'
  | mul(10)

To sum two time series, configure your query to fetch two tables of time series, join those results, and then call the add function. The following example illustrates a query that computes the sum of the number of bytes read from, and written to, Compute Engine instances:

  fetch gce_instance
  | { metric 'compute.googleapis.com/instance/disk/read_bytes_count'
    ; metric 'compute.googleapis.com/instance/disk/write_bytes_count' }
  | outer_join 0
  | add

To subtract the written byte counts from the read bytes count, replace add with sub in the previous expression.

MQL uses the labels in the sets of tables returned from the first and second fetch to determine how to join the tables:

  • If the first table contains a label not found in the second table, then MQL can't perform an outer_join operation on the tables and therefore it reports an error. For example, the following query causes an error because the metric.instance_name label is present in the first table but not in the second table:

     fetch gce_instance
      | { metric 'compute.googleapis.com/instance/disk/write_bytes_count'
        ; metric 'compute.googleapis.com/instance/disk/max_write_bytes_count' }
      | outer_join 0
      | add
    

    One way to resolve this type of error is to apply grouping clauses to ensure the two tables have the same labels. For example, you can group away the all time series labels:

     fetch gce_instance
      | { metric 'compute.googleapis.com/instance/disk/write_bytes_count'
          | group_by []
        ; metric 'compute.googleapis.com/instance/disk/max_write_bytes_count'
          | group_by [] }
      | outer_join 0
      | add
    
  • If the labels of the two tables match, or if the second table contains a label not found in the first table, then the outer join is allowed. For example, the following query doesn't cause an error even though the metric.instance_name label is present in the second table, but not the first:

     fetch gce_instance
      | { metric 'compute.googleapis.com/instance/disk/max_write_bytes_count'
        ; metric 'compute.googleapis.com/instance/disk/write_bytes_count' }
      | outer_join 0
      | sub
    

    A time series found in the first table might have label values that match multiple time series in the second table, so MQL performs the subtraction operation for each pairing.

Time shifting

Sometimes you want to compare what is going on now with what has happened in the past. To let you compare past data to current data, MQL provides the time_shift table operation to move data from the past into the current time period.

Over-time ratios

The following query uses time_shift, join, and div to compute the ratio of the mean utilization in each zone between now and one week ago.

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
| group_by [zone], mean(val())
| {
    ident
  ;
    time_shift 1w
  }
| join | div

The following chart shows a possible result of this query:

Chart shows the ratio of current and time-shifted
data.

The first two operations fetch the time series and then group them, by zone, computing the mean values for each. The resulting table is then passed to two operations. The first operation, ident, passes the table through unchanged.

The second operation, time_shift, adds the period (1 week) to the timestamps for values in the table, which shifts data from one week ago forward. This change makes the timestamps for older data in the second table line up with the timestamps for the current data in the first table.

The unchanged table and the time-shifted table are then combined by using an inner join. The join produces a table of time series where each point has two values: the current utilization and the utilization a week ago. The query then uses the div operation to compute the ratio of the current value to the week-old value.

Past and present data

By combining time_shift with union, you can create a chart that shows past and present data simultaneously. For example, the following query returns the overall mean utilization now and from a week ago. Using union, you can display these two results on the same chart.

fetch gce_instance::compute.googleapis.com/instance/cpu/utilization
| group_by []
| {
     add [when: "now"]
  ;
     add [when: "then"] | time_shift 1w
  }
| union

The following chart shows a possible result of this query:

Chart shows the current and past mean
utilization.

This query fetches the time series and then uses group_by [] to combine them into a single time series with no labels, leaving the CPU-utilization data points. This result is passed to two operations. The first adds a column for a new label called when with the value now. The second adds a label called when with the value then and passes the result to the time_shift operation to shift the values by a week. This query uses the add map modifier; see Maps for more information.

The two tables, each containing data for a single time series, are passed to union, which produces one table containing the time series from both input tables.

What's next

For an overview of MQL language structures, see About the MQL language.

For a complete description of MQL, see the Monitoring Query Language reference.

For information about interacting with charts, see Working with charts.