What’s next for data analytics at Google Cloud Next ’24
Gerrit Kazmaier
VP & GM of Data Analytics, Google Cloud
We’re entering a new era for data analytics, going from narrow insights to enterprise-wide transformation through a virtuous cycle of data, analytics, and AI. At the same time, analytics and AI are becoming widely accessible, providing insights and recommendations to anyone with a question. Ultimately, we’re going beyond our own human limitations to leverage AI-based data agents to find deeply hidden insights for us.
Organizations already recognize that data and AI can come together to unlock the value of AI for their business. Google’s 2024 Data and AI Trends Report found that 84% of data leaders believe generative AI will help their organization reduce time-to-insight, and 80% agree that the lines between data and AI are starting to blur.
Today at Google Cloud Next ’24, we’re announcing new innovations for BigQuery and Looker that will help activate all of your data with AI:
- BigQuery is a unified AI-ready data platform with support for multimodal data, multiple serverless processing engines and built-in streaming and data governance to support the entire data-to-AI lifecycle.
- New BigQuery integrations with Gemini models in Vertex AI support multimodal analytics, vector embeddings, and fine-tuning of LLMs from within BigQuery, applied to your enterprise data.
- Gemini in BigQuery provides AI-powered experiences for data preparation, analysis and engineering, as well as intelligent recommenders to optimize your data workloads.
- Gemini in Looker enables business users to chat with their enterprise data and generate visualizations and reports, all powered by the Looker semantic data model that’s seamlessly integrated into Google Workspace.
Let’s take a deeper look at each of these developments.
BigQuery: the unified AI-ready data foundation
BigQuery is now Google Cloud’s single integrated platform for data-to-AI workloads. BigLake, BigQuery’s unified storage engine, provides a single interface across BigQuery native and open formats for analytics and AI workloads. It gives you the choice of where your data is stored and access to all of your data, whether structured or unstructured, along with a universal view of data supported by a single runtime metastore, built-in governance, and fine-grained access controls.
Today we’re expanding open format support with the preview of a fully managed experience for Iceberg, with DDL, DML and high throughput support. In addition to support for Iceberg and Hudi, we’re also extending BigLake capabilities with native support for the Delta file format, now in preview.
“At HCA Healthcare we are committed to the care and improvement of human life. We are on a mission to redesign the way care is delivered, letting clinicians focus on patient care and using data and AI where it can best support doctors and nurses. We are building our unified data and AI foundation using Google Cloud’s lakehouse stack, where BigQuery and BigLake enable us to securely discover and manage all data types and formats in a single platform to build the best possible experiences for our patients, doctors, and nurses. With our data in Google Cloud’s lakehouse stack, we’ve built a multimodal data foundation that will enable our data scientists, engineers, and analysts to rapidly innovate with AI.” - Mangesh Patil, Chief Analytics Officer, HCA Healthcare
We’re also extending the cross-cloud capabilities of BigQuery Omni. Through partnerships with leading organizations like Salesforce and our recent launch of bidirectional data sharing between BigQuery and Salesforce Data Cloud, customers can securely combine data across platforms with zero copy and zero ops to build AI models and predictions on combined Salesforce and BigQuery data. Customers can also enrich customer 360 profiles in Salesforce Data Cloud with data from BigQuery, driving additional personalization opportunities powered by data and AI.
“It is great to collaborate without boundaries to unlock trapped data and deliver amazing customer experiences. This integration will help our joint customers tap into Salesforce Data Cloud’s rich capabilities and use zero copy data sharing and Google AI connected to trusted enterprise data.” - Rahul Auradkar, EVP and General Manager of Unified Data Services & Einstein at Salesforce
Building on this unified AI-ready data foundation, we are now making BigQuery Studio, which already has hundreds of thousands of active users, generally available. BigQuery Studio provides a collaborative data workspace across data and AI that all data teams and practitioners can use to accelerate their data-to-AI workflows. It offers the choice of SQL, Python, Spark or natural language directly within BigQuery, as well as new integrations for real-time streaming and governance.
Customers’ use of serverless Apache Spark for data processing increased by over 500% in the past year [1]. Today, we are excited to announce the preview of our serverless engine for Apache Spark integrated within BigQuery Studio to help data teams work with Python as easily as they do with SQL, without having to manage infrastructure.
The data team at Snap Inc. uses these new capabilities to converge toward a common data and AI platform with multiple engines that work across a single copy of data. This gives them the ability to enforce fine-grained governance and track lineage close to the data to easily expand analytics and AI use cases needed to drive transformation.
To make data processing on real-time streams directly accessible from BigQuery, we’re announcing the preview of BigQuery continuous queries providing continuous SQL processing over data streams, enabling real-time pipelines with AI operators or reverse ETL. We are also announcing the preview of Apache Kafka for BigQuery as a managed service to enable streaming data workloads based on open-source APIs.
We’re expanding our governance capabilities in Dataplex with new innovations for data-to-AI governance, available in preview. You can now perform integrated search and drive gen AI-powered insights on your enterprise data, including data and models from Vertex AI, with a fully integrated catalog in BigQuery. We’re introducing column-level lineage in BigQuery and expanding lineage capabilities to support Vertex AI pipelines (available in preview soon) to help you better understand data-to-AI workloads. Finally, to facilitate governance for data access at scale, we are launching governance rules in Dataplex.
Multimodal analytics with new BigQuery and Vertex AI integrations
With BigQuery’s direct integration with Vertex AI, we are now announcing the ability to connect models in Vertex AI with your enterprise data, without having to copy or move your data out of BigQuery. This enables multimodal analytics using unstructured data, fine-tuning of LLMs, and the use of vector embeddings in BigQuery.
Priceline, for instance, is using business data stored in BigQuery for LLMs across a wide range of applications.
“BigQuery gave us a solid data foundation for AI. Our data was exactly where we needed it. We were able to connect millions of customer data points from hotel information, marketing content, and customer service chat and use our business data to ground LLMs.” - Allie Surina Dixon, Director of Data, Priceline
The direct integration between BigQuery and Vertex AI now enables seamless preparation and analysis of multimodal data such as documents, audio and video files. BigQuery features rich support for analyzing unstructured data using object tables and Vertex AI Vision, Document AI and Speech-to-Text APIs. We are now enabling BigQuery to analyze images and video using Gemini 1.0 Pro Vision, making it easier than ever to combine structured with unstructured data in data pipelines using the generative AI capabilities of the latest Gemini models.
BigQuery makes it easier than ever to execute AI on enterprise data by letting you build prompts based on your BigQuery data and use LLMs for sentiment extraction, classification, topic detection, translation, data enrichment and more.
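To make the idea of building prompts from your data concrete, here is a minimal sketch in plain Python. It is not BigQuery or Vertex AI code: the rows are an in-memory stand-in for query results, and the template wording, the `build_prompts` helper, and the column names are all hypothetical. It only shows the general pattern of turning one text column into one prompt per row.

```python
from typing import Dict, List

# Hypothetical prompt template; the wording is an assumption, not a Google-provided prompt.
TEMPLATE = (
    "Classify the sentiment of the following customer review as "
    "positive, negative, or neutral.\n"
    "Review: {review}\n"
    "Sentiment:"
)


def build_prompts(rows: List[Dict[str, str]], text_column: str) -> List[str]:
    """Turn each row's text column into one prompt string."""
    prompts = []
    for row in rows:
        text = row[text_column].strip()
        if text:  # skip rows with no usable text
            prompts.append(TEMPLATE.format(review=text))
    return prompts


# Stand-in for rows returned by a query; the column names are hypothetical.
rows = [
    {"review_id": "r1", "review_text": "The bike was in great shape."},
    {"review_id": "r2", "review_text": "   "},
    {"review_id": "r3", "review_text": "Docking station was full again."},
]
prompts = build_prompts(rows, "review_text")
print(len(prompts), "prompts built")
```

In a real pipeline, each prompt would be sent to a model and the response stored next to the source row. That step is left out here because it depends on the model endpoint you use.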
BigQuery now also supports generating vector embeddings and indexing them at scale using vector and semantic search. This enables new use cases that require similarity search, recommendations or retrieval of your BigQuery data, including documents, images or videos. Customers can use the semantic search in the BigQuery SQL interface or via our integration with gen AI frameworks such as LangChain and leverage Retrieval Augmented Generation based on their enterprise data.
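For readers new to similarity search, the sketch below shows what a nearest-neighbor lookup over embeddings computes. It is a brute-force illustration in plain Python, not how BigQuery implements vector indexes or search. The tiny three-dimensional vectors, the document IDs, and the `top_k` helper are made up for the example.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(
    query: List[float], corpus: Dict[str, List[float]], k: int
) -> List[Tuple[str, float]]:
    """Return the k corpus items most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional embeddings; real embeddings have hundreds of dimensions.
corpus = {
    "hotel_review": [0.9, 0.1, 0.0],
    "flight_faq": [0.1, 0.9, 0.0],
    "refund_policy": [0.0, 0.2, 0.9],
}
results = top_k([1.0, 0.0, 0.0], corpus, k=2)
print([doc_id for doc_id, _ in results])
```

A scan like this compares the query against every stored vector, which is fine for small sets. A vector index exists to avoid that full scan at scale, typically by trading a little accuracy for speed. In a Retrieval Augmented Generation flow, the retrieved items would then be added to the prompt as grounding context.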
Gemini in BigQuery and Gemini in Looker for AI-powered assistance
Gen AI is creating new opportunities for rich data-driven experiences that enable business users to ask questions, build custom visualizations and reports, and surface new insights using natural language. In addition to business users, gen AI assistive and agent capabilities can also accelerate the work of data teams, spanning data exploration, analysis, governance, and optimization. In fact, more than 90% of organizations believe business intelligence and data analytics will change significantly due to AI.
Today, we are announcing the public preview of Gemini in BigQuery, which provides AI-powered features that enhance user productivity and optimize costs throughout the analytics lifecycle, from ingestion and pipeline creation to deriving valuable insights. What makes Gemini in BigQuery unique is its contextual awareness of your business through access to metadata, usage data, and semantics. Gemini in BigQuery also goes beyond chat assistance to include new visual experiences such as data canvas, a new natural language-based experience for data exploration, curation, wrangling, analysis, and visualization workflows.
Imagine you are a data analyst at a bikeshare company. You can use the new data canvas of Gemini in BigQuery to explore the datasets, identify the top trips and create a customized visualization, all using natural language prompts within the same interface.
Gemini in BigQuery capabilities extend to query recommendations, semantic search capabilities, low-code visual data pipeline development tools, and AI-powered recommendations for query performance improvement, error minimization, and cost optimization. Additionally, it allows users to create SQL or Python code using natural language prompts and get real-time suggestions while composing queries.
Today, we are also announcing the private preview of Gemini in Looker to enable business users and analysts to chat with their business data. Gemini in Looker capabilities include conversational analytics, report and formula generation, LookML and visualization assistance, and automated Google Slides generation. What’s more, these capabilities are being integrated with Workspace to enable users to easily access beautiful data visualizations and insights right where they work.
Imagine you run an ecommerce store. You can query Gemini in Looker to learn sales trends and market details and immediately explore the insights, with details on how the charts were created.
To learn more about our data analytics product innovations, hear customer stories, and gain hands-on knowledge from our developer experts, join our data analytics spotlights and breakout sessions at Google Cloud Next ’24, or watch them on-demand.
1. Google internal data - YoY growth of data processed using Apache Spark on Google Cloud compared with Feb ’23