Powered By Spark | Apache Spark

Project and product names using "Spark"

Organizations creating products and projects for use with Apache Spark, along with associated marketing materials, should take care to respect the trademark in “Apache Spark” and its logo. Please refer to ASF Trademarks Guidance and associated FAQ for comprehensive and authoritative guidance on proper usage of ASF trademarks.

Names that do not include “Spark” at all have no potential trademark issue with the Spark project. This is recommended.

Names like “Spark BigCoProduct” are not OK, nor are names including “Spark” in general. The above links, however, describe some exceptions, such as “BigCoProduct, powered by Apache Spark” or “BigCoProduct for Apache Spark”.

It is common practice to create software identifiers (Maven coordinates, module names, etc.) like “spark-foo”. These are permitted. Nominative use of trademarks in descriptions is also always allowed, as in “BigCoProduct is a widget for Apache Spark”.

Companies and organizations

To add yourself to the list, please email dev@spark.apache.org with your organization name, URL, a list of which Spark components you are using, and a short description of your use case.

UC Berkeley AMPLab - Big data research lab that initially launched Spark
- We’re building a variety of open source projects on Spark
- We have both graduate students and a team of professional software engineers working on the stack
4Quant
Act Now
- Spark powers NOW APPS, a big data, real-time, predictive analytics platform. We use Spark SQL, MLlib and GraphX components for both batch ETL and analytics applied to telecommunication data, providing faster and more meaningful insights and actionable data to the operators.
Agile Lab
- Enhancing big data: 360 customer view, log analysis, BI
Alibaba Taobao
- We built one of the world’s first Spark on YARN production clusters.
Alluxio
- Alluxio, formerly Tachyon, is the world’s first system that unifies disparate storage systems at memory speed.
Amazon
Art.com
- Trending analytics and personalization
AsiaInfo
- We are using Spark Core, Streaming, MLlib and GraphX. We leverage Spark and the Hadoop ecosystem to build cost-effective data center solutions for our customers in the telco industry as well as other industrial sectors.
atp
- Predictive models and learning algorithms to improve the relevance of programmatic marketing.
- Components used: Spark SQL, MLlib.
Autodesk
Baidu
Bakdata – using Spark (and Shark) to perform interactive exploration of large datasets
Big Industries - using Spark Streaming: The Big Content Platform is a business-to-business content asset management service providing a searchable, aggregated source of live news feeds, public domain media and archives of content.
Celtra
ClearStory Data – ClearStory’s platform and integrated Data Intelligence application leverages Spark to speed analysis across internal and external data sources, driving holistic and actionable insights.
Concur
- Spark SQL, MLlib
- Using Spark for travel and expenses analytics and personalization
Content Square
- We use Spark to regularly read raw data, convert them into Parquet, and process them to create advanced analytics dashboards: aggregation, sampling, statistics computations, anomaly detection, machine learning.
Conviva – Experience Live
- See our talk at AmpCamp on how we are using Spark to provide real time video optimization
Credit Karma
- We create personalized experiences using Spark.
Databricks
- Formed by the original creators of Apache Spark, Databricks is working to expand the open source project and simplify big data and machine learning. We’re deeply committed to keeping all our work on Spark open source.
- We provide a cloud-optimized platform to run Spark and ML applications on Amazon Web Services and Azure, as well as a comprehensive training program.
Data Mechanics
- Data Mechanics is a cloud-native Spark platform that can be deployed on a Kubernetes cluster inside its customers' AWS, GCP, or Azure cloud environments.
- Our focus is to make Spark easy-to-use and cost-effective for data engineering workloads. We also develop the free, cross-platform, and partially open-source Spark monitoring tool Data Mechanics Delight.
Data Pipelines
- Build and schedule ETL pipelines step-by-step via a simple no-code UI.
Dianping.com
Drawbridge
eBay Inc.
- Using Spark Core for log transaction aggregation and analytics
Elsevier Labs
- Use Case: Building Machine Reading Pipeline, Knowledge Graphs, Content as a Service, Content and Event Analytics, Content/Event based Predictive Models and Big Data Processing.
- We use Scala and Python over Databricks Notebooks for most of our work.
EURECOM
Exabeam
Faimdata
- Build eCommerce and data intelligence solutions for the retail industry on top of Spark/Shark/Spark Streaming
Falkonry
Flytxt
- Big Data analytics for subscriber profiling and personalization in the telecommunications domain. We are using Spark Core and MLlib.
Freeman Lab at HHMI
- We are using Spark for analyzing and visualizing patterns in large-scale recordings of brain activity in real time
Fundacion CTIC
GraalSystems
- GraalSystems is a cloud-native data platform that can be used everywhere, on cloud environments or on bare-metal infrastructures.
Groupon
GoDataDriven
- Amsterdam-based consultancy helping companies to be successful with Spark
Guavus
- Stream processing of network machine data
Hitachi Solutions
The Hive
IBM Almaden
InfoObjects
- Award-winning Big Data consulting company with a focus on Spark and Hadoop
Inspur
IOMETE - IOMETE offers a modern Cloud-Prem Data Lakehouse platform, extending a cloud-like experience to on-premise and private clouds. Utilizing Apache Spark as the query engine, we enable running Spark jobs and ML applications on AWS, Azure, GCP, or on-prem. Discover more at IOMETE.
Istanbul Sehir University
Kenshoo
- Digital marketing solutions and predictive media optimization
Kelkoo
- Using Spark Core, SQL, and Streaming. Product recommendations, BI and analytics, real-time malicious activity filtering, and data mining.
Knoldus Software LLC
Localytics
- Batch, real-time, and predictive analytics driving our mobile app analytics and marketing automation product.
- Components used: Spark, Spark Streaming, MLlib.
MediaCrossing – Digital Media Trading Experts in the New York and Boston areas
- We are using Spark as a drop-in replacement for Hadoop Map/Reduce to get the right answer to our queries in a much shorter amount of time.
MyFitnessPal
- Using Spark to clean up user-entered food data using both explicit and implicit user signals, with the final goal of identifying high-quality food items.
- Using Spark to build different recommendation systems for recipes and foods.
NASA JPL - Deep Space Network
Netease
Nokia Solutions and Networks
NTT DATA
Nube Technologies
- Nube provides solutions for data curation at scale, helping with customer targeting, accurate inventory and efficient analysis.
Ooyala, Inc. – Powering personalized video experiences across all screens
- See our blog post on how we use Spark for Fast Queries
- See our presentation on Cassandra, Spark, and Shark
Opentable
- Using Apache Spark for log processing and ETL. The data obtained feeds the recommender system powered by Spark MLlib matrix factorization. We are evaluating the use of Spark Streaming for real-time analytics.
PanTera
- PanTera is a tool for exploring large datasets. It uses Spark to create XY and geographic scatterplots from millions to billions of datapoints.
- Components we are using: Spark Core (Scala API), Spark SQL, and GraphX
PlanBMedia
Apache PredictionIO
- PredictionIO currently offers two engine templates for Apache Spark MLlib for recommendation (MLlib ALS) and classification (MLlib Naive Bayes). With these templates, you can create a custom predictive engine for production deployment efficiently.
Premise
Quantifind
Radius Intelligence
- Using Scala, Spark and MLlib for the Radius marketing and sales intelligence platform, including data aggregation, data processing, data clustering, data analysis and predictive modeling of all US businesses.
Real Impact Analytics
- Building large scale analytics platforms for telecoms operators
RocketFuel
RONDHUIT
- Machine Learning with Apache Mahout and Spark http://www.rondhuit.com/services/training/mahout-ML.html
Sailthru
- Uses Spark to build predictive models and recommendation systems for marketing automation and personalization.
Samsung Research America
Shopify
Simba Technologies
- BI/reporting/ETL for Spark and beyond
Sinnia
SK Telecom
- SK Telecom analyses mobile usage patterns of customers with Spark and Shark.
Sohu
Stanford DAWN
- Research lab on infrastructure for usable machine learning, with multiple research projects that run over or accelerate Apache Spark.
Stratio
- Offers an open-source Big Data platform centered around Apache Spark.
Taboola – Powering ‘Content You May Like’ around the web
Tencent
Tetra Concepts
TrendMicro
TripAdvisor
UC Santa Cruz
University of Missouri Data Analytics and Discover Lab
VideoAmp
- Intelligent video ads for online and television viewing audiences.
Vistar Media
- Location technology company enabling brands to reach on-the-go consumers
Yahoo!
Yandex
- Using Spark in Yandex Islands, to process islands identified from a search robot
Zaloni
- Zaloni’s data lake management platform (Bedrock) and self-service data preparation solution (Mica) leverage Spark for fast execution of transformations and data exploration.
