Speeding up, scaling out: Filestore now supports high performance
Tad Hunt
Product Manager
Allon Cohen
Product Manager
The world’s top scientists and researchers are working around the clock to advance the discovery of a therapeutic or vaccine to combat COVID-19. With the power of these minds focused on a single goal, it’s important that the technology they’re using is up to the task. While the challenges of a global pandemic are immense, so is the power of today’s technology. Processing jobs like molecular screening require massive computational power, as well as high-performance, high-throughput storage beneath them.
At Google Cloud, we’re proud to offer tools that are enabling high-performance computing (HPC) in many industries, including COVID-19 therapeutics research. With powerful technology, scientists and researchers can work even faster, without tech barriers, to help people around the world. One of the critical enablers of HPC is file storage, and we are excited to announce the beta launch of Filestore High Scale, the next step in the evolution of Google’s file storage product, which includes Elastifile’s scale-out file storage capability.
Google completed its acquisition of Elastifile in August 2019, and we’ve integrated the technology into Filestore to add both scale and performance and to make it easier for you to move workloads to the cloud. The new Filestore High Scale tier adds the ability to easily deploy shared file systems that can scale out to hundreds of thousands of IOPS, tens of GB/s of throughput, and hundreds of TBs. Whether you’re migrating traditional applications, modernizing existing applications with Kubernetes, or scaling to meet the performance demands of big compute workloads, Filestore can now address these challenges.
Using Filestore in production
Christoph Gorgulla, a postdoctoral research fellow at Harvard Medical School’s Wagner Lab, uses Google Cloud’s scale-out file storage to enable his VirtualFlow virtual screening program for COVID-19 therapeutics.
“Virtual screening allows us to computationally screen billions of small molecules against a target protein in order to discover potential treatments and therapies much faster than traditional experimental testing methods,” says Gorgulla. “As researchers, we hardly have the time to invest in learning how to set up and manage a needlessly complicated file system cluster, or to constantly monitor the health of our storage system. We needed a file system that could handle the load generated concurrently by thousands of clients, which have hundreds of thousands of vCPUs. Much of the Filestore setup is automated, we’re able to scale up our capacity on the fly, and also actively monitor the speed of our workflows in a simple, graphical interface. VirtualFlow can massively reduce the time required for drug and treatment discovery, which will hopefully lead to faster development of therapeutics for COVID-19 and other diseases.”
Learn more about Christoph’s research in a recently published Nature article, and read more about how Google Cloud is helping COVID-19 academic research in this recent blog post.
Filestore is also a good fit for workloads such as electronic design automation (EDA), video processing, genomics, manufacturing, and financial modeling, as well as other use cases that need high performance and capacity. These workloads benefit from Filestore High Scale’s support for concurrent access by tens of thousands of clients, scalable performance up to 16 GB/s of throughput and 480K IOPS, and the ability to scale capacity up and down based on demand.
File storage is a critical component of HPC applications, and Filestore High Scale is built to address those needs. That includes predictable performance for scale-out file storage in the cloud, and the ability to scale up and scale down a file system on demand. Understanding the costs associated with the performance that you need makes it much easier to architect your solution and optimize based on changing workload demands.
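To make that concrete, here’s a minimal sketch of provisioning a High Scale instance with the gcloud CLI. The instance name, zone, share name, and capacity below are illustrative assumptions rather than recommendations, and during the beta you may need the gcloud beta command group; check the Filestore documentation for the supported capacity range of each tier.

# Provision a Filestore High Scale instance (illustrative values)
gcloud filestore instances create nfs-hs \
    --zone=us-central1-c \
    --tier=HIGH_SCALE_SSD \
    --file-share=name="vol1",capacity=10TB \
    --network=name="default"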
With Filestore High Scale, you get the power and performance of a distributed scale-out file system, and because it’s a fully managed service, you get the same ease of management as other Google Cloud products. You can spin up instances with just a few clicks in the Cloud Console, and you can automate management through gcloud and API calls. Plus, you can use Cloud Monitoring to keep an eye on these file systems and integrate them into HPC workload management and scheduling systems.
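For example, resizing the share created above could be scripted with a single command. This is again a sketch: the capacity value is illustrative, and the beta command group may be required for High Scale instances.

# Scale the file share's capacity up (or down) on demand
gcloud filestore instances update nfs-hs \
    --zone=us-central1-c \
    --file-share=name="vol1",capacity=20TB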
Additionally, to better support deployments with advanced security requirements, this launch adds beta support for NFS IP-based access controls across all Filestore tiers. This feature lets you control access for clients on your VPC by configuring root-squash and read-only NFS export options per IP range. See the IP-based access control documentation for more information.
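As a rough sketch of how this might look with the gcloud CLI, NFS export options can be supplied through a JSON flags file at instance creation time. The file name, IP ranges, and option values below are illustrative assumptions; consult the IP-based access control documentation for the exact schema. A flags file such as nfs-export-options.json might grant read-write access with root squash to one client range and read-only access to another:

{
  "--file-share": {
    "name": "vol1",
    "capacity": "10TB",
    "nfs-export-options": [
      {"ip-ranges": ["10.0.0.0/29"], "access-mode": "READ_WRITE", "squash-mode": "ROOT_SQUASH", "anon_uid": 1003, "anon_gid": 1003},
      {"ip-ranges": ["10.0.1.0/29"], "access-mode": "READ_ONLY", "squash-mode": "NO_ROOT_SQUASH"}
    ]
  }
}

The file is then referenced when creating the instance:

# Create an instance with per-IP-range NFS export options (illustrative)
gcloud filestore instances create nfs-secure \
    --zone=us-central1-c \
    --tier=HIGH_SCALE_SSD \
    --network=name="default" \
    --flags-file=nfs-export-options.json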
Filestore High Scale provides persistent storage that you can mount directly on tens of thousands of clients using NFS, without the need to deploy and maintain specialized client-side plugins. This lets HPC users save up to 80% on compute instance costs for batch workloads by running them on preemptible VM instances. While individual client VMs may be preempted, the data persists on Filestore, so new VMs can spin up immediately and continue processing.
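On the client side, a share is mounted like any other NFS export. Here’s a quick sketch for a Debian-based client VM; the server IP is a placeholder for your instance’s address, and the share name matches the earlier example.

# Install the NFS client, then mount the Filestore share
sudo apt-get install -y nfs-common
sudo mkdir -p /mnt/filestore
sudo mount 10.0.0.2:/vol1 /mnt/filestore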
Filestore High Scale is ready to take on your high-capacity challenges, so you can focus on managing your business. To get started, check out the Filestore documentation or create an instance in the Google Cloud Console.