Hi, I wanted to do this looping in dataform .sqlx file:
js {
const tables = [
'table1',
'table2',
'table3'
];
const selectAllFromTables = tables.map(t => {
return `select * from \`${t}\``;
}).join(`
union all
`);
}
${selectAllFromTables}
But I want the list of strings (const table = ) to come from the results of a query, very similar to run_query in dbt. Is it possible in dataform?
It is possible to achieve this in Dataform, though it requires a slightly different approach than what you might be used to in dbt. Dataform doesn't have a direct equivalent to run_query
, but you can still dynamically generate queries based on the results of other queries.
Here's how you can adapt your approach for Dataform:
First, you'll need a query that retrieves the table names you want to use. You can do this within a JavaScript block:
// Define the query to get the list of table names
const tableQuery = `
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'your_schema'
`;
// Run the query and get the results
const tableResults = dataform.runQuery(tableQuery);
// Extract the table names from the query results
const tables = tableResults.map(row => row.table_name);
Now you can use the dynamically populated tables
array to build your query:
const selectAllFromTables = tables.map(t => {
return `SELECT * FROM \`${t}\``;
}).join(` UNION ALL `);
You can then use the generated query within your SQLX file:
-- # Pre-operations
-- This section is for any setup tasks you might need
-- # Main query
${selectAllFromTables}
Complete Example:
js {
// Define the query to get the list of table names
const tableQuery = `
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'your_schema'
`;
// Run the query and get the results
const tableResults = dataform.runQuery(tableQuery);
// Extract the table names from the query results
const tables = tableResults.map(row => row.table_name);
// Generate the UNION ALL query
const selectAllFromTables = tables.map(t => {
return `SELECT * FROM \`${t}\``;
}).join(` UNION ALL `);
// Return the generated query
return selectAllFromTables;
}
// Use the generated query in your SQLX file
${selectAllFromTables}
Key Considerations
tableResults
is empty) to make your code more robust.Hi @ms4446 , thanks for the reply, just tried it but I get this error on dataform
dataform.runQuery is not a function
What should I do to make the function runQuery function to work on Dataform?
Also, do you mind to give an example if I were to separate schema name & table name and use their index?
I am using multiple schema. Here's what I'm working on, but not sure if it's gonna work:
js {
const tableQuery = `
SELECT
table_schema
, table_name
, column_y
FROM source_schema.source_table
`;
const tableResults = dataform.runQuery(tableQuery);
const tableSchema = tableResults.map(row => row.table_schema);
const tableName = tableResults.map(row => row.table_name);
const columnY = tableResults.map(row => row.column_y);
// Generate the UNION ALL query
const selectAllFromTables = tableNames.map((tableName, i) => {
return `SELECT id, sent_at, \`${columnY[i]}\` FROM \`${tableSchemas[i]}\`.\`${tableName}\``;
}).join(' UNION ALL ');
// Return the generated query
return selectAllFromTables;
}
-- Use the generated query in your SQLX file
${selectAllFromTables}
To achieve this in Dataform, you'll need to use Dataform's built-in resolve
function instead of runQuery
. Dataform does not have a runQuery
function, but it allows you to dynamically generate SQL queries using JavaScript and the resolve
function to reference other datasets within your project.
Given your requirements and the error you encountered, here's a correct and adapted approach for dynamically generating the UNION ALL query across multiple schemas in Dataform:
Define and Execute the Query to Get Table Names Dynamically:
// Define the query to get the list of table names and schema names
const tableQuery = `
SELECT table_schema, table_name, column_y
FROM source_schema.source_table
`;
// Execute the query using the built-in resolve function
const tableResults = resolve(tableQuery);
// Extract schema names, table names, and column names from the query results
const tableSchemas = tableResults.map(row => row.table_schema);
const tableNames = tableResults.map(row => row.table_name);
const columnY = tableResults.map(row => row.column_y);
// Generate the UNION ALL query
const selectAllFromTables = tableNames.map((tableName, i) => {
return `SELECT id, sent_at, \`${columnY[i]}\` FROM \`${tableSchemas[i]}\`.\`${tableName}\``;
}).join(' UNION ALL ');
// Return the generated query
selectAllFromTables;
Use the Generated Query in Your SQLX File:
-- # Pre-operations
-- This section is for any setup tasks you might need
-- # Main query
js {
// Define the query to get the list of table names and schema names
const tableQuery = `
SELECT table_schema, table_name, column_y
FROM source_schema.source_table
`;
// Execute the query using the built-in resolve function
const tableResults = resolve(tableQuery);
// Extract schema names, table names, and column names from the query results
const tableSchemas = tableResults.map(row => row.table_schema);
const tableNames = tableResults.map(row => row.table_name);
const columnY = tableResults.map(row => row.column_y);
// Generate the UNION ALL query
const selectAllFromTables = tableNames.map((tableName, i) => {
return `SELECT id, sent_at, \`${columnY[i]}\` FROM \`${tableSchemas[i]}\`.\`${tableName}\``;
}).join(' UNION ALL ');
// Return the generated query
selectAllFromTables;
}
// Use the generated query in your SQLX file
${selectAllFromTables}
Some Key Considerations
resolve
Function: The resolve
function allows you to reference other datasets or run simple queries within Dataform.tableResults
might be empty or where there might be discrepancies in the schemas.Unfortunately it doesnt work, it seems that map function is not supporting multiple parameters.
Given the limitations with the map
function in Dataform, here’s a revised approach to dynamically generate a UNION ALL
query across multiple schemas. This solution avoids using unsupported multiple parameters in map
and leverages JavaScript within SQLX to handle the dynamic SQL generation properly.
1. Create the Helper SQLX File to Get Table Metadata
First, create a SQLX file to output the required metadata, such as schema names, table names, and column names. This will serve as your source for dynamically constructing the query.
config {
type: "table", // or "view" if you just need it temporarily
}
SELECT
table_schema,
table_name,
column_y
FROM source_schema.source_table
-- filename: dynamic_union_all.sqlx
config {
type: "table", // or "view" if you want the output to be a view
}
js {
// Define the query to get the list of table names and schema names
const tableQuery = `
SELECT table_schema, table_name, column_y
FROM ${resolve("table_metadata")}
`;
// Function to run the query and get results (async for potential delays)
async function fetchTableResults() {
const result = await dataform.queryData(tableQuery);
return result.rows;
}
// Execute the query to get table results
const tableResults = await fetchTableResults();
// Initialize the UNION ALL query string
let selectAllFromTables = "";
// Loop through the results to construct the query
for (const row of tableResults) {
const schema = row.table_schema;
const table = row.table_name;
const column = row.column_y;
// Add each table's select statement to the query string (UNION ALL after the first)
if (selectAllFromTables) {
selectAllFromTables += " UNION ALL ";
}
selectAllFromTables += `SELECT id, sent_at, \`${column}\` FROM \`${schema}\`.\`${table}\``;
}
// Return the generated query
return selectAllFromTables;
}
// Use the generated query in your SQLX file (this is where the dynamic SQL is inserted)
${selectAllFromTables}
tableQuery
to construct the SQL for retrieving metadata from table_metadata
.dataform.queryData
(or your environment’s equivalent function) to execute the SQL and get results. This assumes an asynchronous function to fetch query results.UNION ALL
query string.SELECT
statement with UNION ALL
to form the final query.
Example for Full Scenario
-- filename: table_metadata.sqlx
config {
type: "table", // or "view"
}
SELECT
table_schema,
table_name,
column_y
FROM source_schema.source_table
dynamic_union_all.sqlx
-- filename: dynamic_union_all.sqlx
config {
type: "table", // or "view" if needed
}
js {
// Define the query to get the list of table names and schema names
const tableQuery = `
SELECT table_schema, table_name, column_y
FROM ${resolve("table_metadata")}
`;
// Function to run the query and get results
async function fetchTableResults() {
const result = await dataform.queryData(tableQuery);
return result.rows;
}
// Execute the query to get table results
const tableResults = await fetchTableResults();
// Initialize the UNION ALL query string
let selectAllFromTables = "";
// Loop through the results to construct the query
for (const row of tableResults) {
const schema = row.table_schema;
const table = row.table_name;
const column = row.column_y;
// Add each table's select statement to the query string
if (selectAllFromTables) {
selectAllFromTables += " UNION ALL ";
}
selectAllFromTables += `SELECT id, sent_at, \`${column}\` FROM \`${schema}\`.\`${table}\``;
}
// Return the generated query
return selectAllFromTables;
}
// Use the generated query in your SQLX file
${selectAllFromTables}
Additional Considerations
await
correctly if your environment supports asynchronous JavaScript execution.Hi,
Have you tried it on your side ?
I tried adapting it to my use case but also tried your raw code and gets "await is only valid in async functions and the top level bodies of modules" error.
Do you know if there's a solution ?
Thanks
Hello,
I have exactly the same issue. I did try even the suggestion here but it did not work.
Did you find a solution please ? Is is possible to do this in Dataform ?
Thanks
As it was answered in previous answers (via LLM 🙂 ), Dataform doesn't have an alternative to DBT's run_query macro. One possible workaround that I found is to use direct scripting in BigQuery with EXECUTE IMMEDIATE, so in your concrete case it could look something like this:
config {
type: "operations"
}
DECLARE query string;
SET query = (SELECT ARRAY_TO_STRING(q, ' UNION ALL ') FROM ( -- concatenates table names and UNION ALL into "query" variable
SELECT ARRAY_AGG(CONCAT('SELECT * FROM <gcp_project>.<dataset>.', table_name)) AS q FROM <gcp_project>.<dataset>.INFORMATION_SCHEMA.TABLES -- get table names via query
)
);
EXECUTE IMMEDIATE query -- executes constructed query
I'm not aware of any generic solution in Dataform
Thank you for pointing out the answers were LLM generated - I was getting very irritated by seeing the exact same conversation flow gemini has already been giving me all day. Your solution does work for my case, but I hope it helps others!