Define list of string form query results to use in loop operation

Hi, I wanted to do this looping in dataform .sqlx file:

 

 

 

js {
const tables = [
'table1',
'table2',
'table3'
];

const selectAllFromTables = tables.map(t => {
    return `select * from \`${t}\``;
}).join(`
union all
`);
}

${selectAllFromTables}

 

 

 

 But I want the list of strings (const table = ) to come from the results of a query, very similar to run_query in dbt. Is it possible in dataform?

7 9 1,339
9 REPLIES 9

It is possible to achieve this in Dataform, though it requires a slightly different approach than what you might be used to in dbt. Dataform doesn't have a direct equivalent to run_query, but you can still dynamically generate queries based on the results of other queries.

Here's how you can adapt your approach for Dataform:

  1. Fetching Table Names Dynamically

First, you'll need a query that retrieves the table names you want to use. You can do this within a JavaScript block:

 
// Define the query to get the list of table names
const tableQuery = `
  SELECT table_name 
  FROM information_schema.tables 
  WHERE table_schema = 'your_schema'
`;

// Run the query and get the results
const tableResults = dataform.runQuery(tableQuery);

// Extract the table names from the query results
const tables = tableResults.map(row => row.table_name);
  1. Constructing the UNION ALL Query

Now you can use the dynamically populated tables array to build your query:

 
const selectAllFromTables = tables.map(t => {
  return `SELECT * FROM \`${t}\``;
}).join(` UNION ALL `);
  1. Executing the Generated Query

You can then use the generated query within your SQLX file:

 
-- # Pre-operations
-- This section is for any setup tasks you might need

-- # Main query
${selectAllFromTables}

Complete Example:

 
js {
  // Define the query to get the list of table names
  const tableQuery = `
    SELECT table_name 
    FROM information_schema.tables 
    WHERE table_schema = 'your_schema'
  `;

  // Run the query and get the results
  const tableResults = dataform.runQuery(tableQuery);

  // Extract the table names from the query results
  const tables = tableResults.map(row => row.table_name);

  // Generate the UNION ALL query
  const selectAllFromTables = tables.map(t => {
    return `SELECT * FROM \`${t}\``;
  }).join(` UNION ALL `);

  // Return the generated query
  return selectAllFromTables;
}

// Use the generated query in your SQLX file
${selectAllFromTables}

Key Considerations

  • Performance: Be mindful when dynamically generating large UNION ALL queries, as this can impact performance.
  • Error Handling: Consider adding error handling (e.g., checking if tableResults is empty) to make your code more robust.
  • Schema Changes: If the tables you're unioning have varying schemas, you might need to adjust your select list or handle column mismatches.

Hi @ms4446 , thanks for the reply, just tried it but I get this error on dataform

 

dataform.runQuery is not a function

 

What should I do to make the function runQuery function to work on Dataform?


Also, do you mind to give an example if I were to separate schema name & table name and use their index?
I am using multiple schema. Here's what I'm working on, but not sure if it's gonna work:

 

js {
const tableQuery = `
  SELECT
    table_schema
    , table_name
    , column_y
  FROM source_schema.source_table
`;

const tableResults = dataform.runQuery(tableQuery);
const tableSchema = tableResults.map(row => row.table_schema);
const tableName = tableResults.map(row => row.table_name);
const columnY = tableResults.map(row => row.column_y);

// Generate the UNION ALL query
const selectAllFromTables = tableNames.map((tableName, i) => {
    return `SELECT id, sent_at, \`${columnY[i]}\` FROM \`${tableSchemas[i]}\`.\`${tableName}\``;
}).join(' UNION ALL ');

  // Return the generated query
  return selectAllFromTables;
}

--  Use the generated query in your SQLX file
${selectAllFromTables}

 

 

To achieve this in Dataform, you'll need to use Dataform's built-in resolve function instead of runQuery. Dataform does not have a runQuery function, but it allows you to dynamically generate SQL queries using JavaScript and the resolve function to reference other datasets within your project.

Given your requirements and the error you encountered, here's a correct and adapted approach for dynamically generating the UNION ALL query across multiple schemas in Dataform:

Define and Execute the Query to Get Table Names Dynamically:

 
// Define the query to get the list of table names and schema names
const tableQuery = `
SELECT table_schema, table_name, column_y 
FROM source_schema.source_table
`;

// Execute the query using the built-in resolve function
const tableResults = resolve(tableQuery);

// Extract schema names, table names, and column names from the query results
const tableSchemas = tableResults.map(row => row.table_schema);
const tableNames = tableResults.map(row => row.table_name);
const columnY = tableResults.map(row => row.column_y);

// Generate the UNION ALL query
const selectAllFromTables = tableNames.map((tableName, i) => {
  return `SELECT id, sent_at, \`${columnY[i]}\` FROM \`${tableSchemas[i]}\`.\`${tableName}\``;
}).join(' UNION ALL ');

// Return the generated query
selectAllFromTables;

Use the Generated Query in Your SQLX File:

 
-- # Pre-operations
-- This section is for any setup tasks you might need

-- # Main query
js {
  // Define the query to get the list of table names and schema names
  const tableQuery = `
  SELECT table_schema, table_name, column_y 
  FROM source_schema.source_table
  `;

  // Execute the query using the built-in resolve function
  const tableResults = resolve(tableQuery);

  // Extract schema names, table names, and column names from the query results
  const tableSchemas = tableResults.map(row => row.table_schema);
  const tableNames = tableResults.map(row => row.table_name);
  const columnY = tableResults.map(row => row.column_y);

  // Generate the UNION ALL query
  const selectAllFromTables = tableNames.map((tableName, i) => {
    return `SELECT id, sent_at, \`${columnY[i]}\` FROM \`${tableSchemas[i]}\`.\`${tableName}\``;
  }).join(' UNION ALL ');

  // Return the generated query
  selectAllFromTables;
}

// Use the generated query in your SQLX file
${selectAllFromTables}

Some Key Considerations

  • resolve Function: The resolve function allows you to reference other datasets or run simple queries within Dataform.
  • Error Handling: Consider adding checks to handle cases where tableResults might be empty or where there might be discrepancies in the schemas.

Unfortunately it doesnt work, it seems that map function is not supporting multiple parameters.

Given the limitations with the map function in Dataform, here’s a revised approach to dynamically generate a UNION ALL query across multiple schemas. This solution avoids using unsupported multiple parameters in map and leverages JavaScript within SQLX to handle the dynamic SQL generation properly.

1. Create the Helper SQLX File to Get Table Metadata

First, create a SQLX file to output the required metadata, such as schema names, table names, and column names. This will serve as your source for dynamically constructing the query.

 
config {
  type: "table", // or "view" if you just need it temporarily
}

SELECT
  table_schema,
  table_name,
  column_y
FROM source_schema.source_table
  1. Create the Main SQLX File to Generate the Dynamic Query
    • In this SQLX file, use JavaScript to fetch the table names and schemas from the metadata query and dynamically generate the UNION ALL SQL:
 
-- filename: dynamic_union_all.sqlx

config {
  type: "table", // or "view" if you want the output to be a view
}

js {
  // Define the query to get the list of table names and schema names
  const tableQuery = `
    SELECT table_schema, table_name, column_y 
    FROM ${resolve("table_metadata")}
  `;

  // Function to run the query and get results (async for potential delays)
  async function fetchTableResults() {
    const result = await dataform.queryData(tableQuery);
    return result.rows;
  }

  // Execute the query to get table results
  const tableResults = await fetchTableResults();

  // Initialize the UNION ALL query string
  let selectAllFromTables = "";

  // Loop through the results to construct the query
  for (const row of tableResults) {
    const schema = row.table_schema;
    const table = row.table_name;
    const column = row.column_y;
    
    // Add each table's select statement to the query string (UNION ALL after the first)
    if (selectAllFromTables) {
      selectAllFromTables += " UNION ALL ";
    }
    selectAllFromTables += `SELECT id, sent_at, \`${column}\` FROM \`${schema}\`.\`${table}\``;
  }

  // Return the generated query
  return selectAllFromTables;
}

// Use the generated query in your SQLX file (this is where the dynamic SQL is inserted)
${selectAllFromTables}

 

  • Define Metadata Query: Use tableQuery to construct the SQL for retrieving metadata from table_metadata.
  • Fetch Results: Use dataform.queryData (or your environment’s equivalent function) to execute the SQL and get results. This assumes an asynchronous function to fetch query results.
  • Loop Through Results: Iterate over each row of the results to construct the UNION ALL query string.
  • Build SQL Query: Concatenate each SELECT statement with UNION ALL to form the final query.

 

Example for Full Scenario

 
-- filename: table_metadata.sqlx

config {
  type: "table", // or "view"
}

SELECT
  table_schema,
  table_name,
  column_y
FROM source_schema.source_table
Dynamic Query: dynamic_union_all.sqlx
 

Additional Considerations

  • Async Handling: Make sure you're using await correctly if your environment supports asynchronous JavaScript execution.
  • Schema Compatibility: Double-check that all your tables have the same column structure to avoid errors.
  • Error Handling: Add some code to catch any potential errors that might come up, like if the query results are empty or have unexpected values.

Hi,
Have you tried it on your side ?

I tried adapting it to my use case but also tried your raw code and gets "await is only valid in async functions and the top level bodies of modules" error.

Do you know if there's a solution ?

Thanks

Hello, 

I have exactly the same issue. I did try even the suggestion here but it did not work.

Did you find a solution please ? Is is possible to do this in Dataform ? 

Thanks

As it was answered in previous answers (via LLM 🙂 ), Dataform doesn't have an alternative to DBT's run_query macro. One possible workaround that I found is to use direct scripting in BigQuery with EXECUTE IMMEDIATE, so in your concrete case it could look something like this:

 

config {
    type: "operations"
}

DECLARE query string;

SET query = (SELECT ARRAY_TO_STRING(q, ' UNION ALL ') FROM ( -- concatenates table names and UNION ALL into "query" variable

SELECT ARRAY_AGG(CONCAT('SELECT * FROM <gcp_project>.<dataset>.', table_name)) AS q FROM <gcp_project>.<dataset>.INFORMATION_SCHEMA.TABLES -- get table names via query
)
);

EXECUTE IMMEDIATE query -- executes constructed query

 

 I'm not aware of any generic solution in Dataform

Thank you for pointing out the answers were LLM generated - I was getting very irritated by seeing the exact same conversation flow gemini has already been giving me all day. Your solution does work for my case, but I hope it helps others!