Apache Spark
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
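The programming model described above can be illustrated, very loosely, without a cluster. The following is a toy, single-machine sketch in plain Scala. It does not use Spark's API, and `ToyWordCount` and its methods are hypothetical names. The input is split into partitions, each partition is processed independently and in parallel with `Future`s, and the partial results are merged with an associative function. Spark applies the same shape across machines, and because each partition's result depends only on its own input, a lost partition can be recomputed, which is the intuition behind its fault tolerance.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Toy, single-machine illustration (hypothetical names, NOT Spark's API):
// split data into partitions, process each partition in parallel,
// then merge the partial results.
object ToyWordCount {
  // Split a sequence into at most roughly `n` contiguous partitions.
  def partition[A](data: Seq[A], n: Int): Seq[Seq[A]] = {
    val size = math.max(1, math.ceil(data.size.toDouble / n).toInt)
    data.grouped(size).toList
  }

  // Count words within a single partition (the "map side").
  def countPartition(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+").toSeq)
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (word, occurrences) => word -> occurrences.size }

  // Merge two partial counts (the "reduce side"); associative and commutative,
  // so the order in which partitions finish does not matter.
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (word, count)) =>
      acc.updated(word, acc.getOrElse(word, 0) + count)
    }

  def wordCount(lines: Seq[String], numPartitions: Int): Map[String, Int] = {
    // Each task depends only on its own input slice, so a failed task could
    // simply be re-run on that slice to rebuild its result.
    val partials = partition(lines, numPartitions).map(p => Future(countPartition(p)))
    val results = Await.result(Future.sequence(partials), 10.seconds)
    results.foldLeft(Map.empty[String, Int])(merge)
  }
}
```

For example, `ToyWordCount.wordCount(Seq("a b a", "b c", "a"), 2)` yields `Map("a" -> 3, "b" -> 2, "c" -> 1)` regardless of the partition count. With Spark itself, the same computation would typically be written with RDD operations such as `flatMap` and `reduceByKey`, and the framework, rather than the programmer, decides where each partition runs.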
Here are 1,897 public repositories matching this topic...
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive, and with APIs
Updated Nov 12, 2024 - Scala
Simple and Distributed Machine Learning
Updated Nov 12, 2024 - Scala
State of the Art Natural Language Processing
Updated Nov 11, 2024 - Scala
酷玩 Spark (Cool Play Spark): Spark source code analysis, Spark libraries, and more
Updated May 18, 2022 - Scala
Deequ is a library built on top of Apache Spark for defining "unit tests for data", which measure data quality in large datasets.
Updated Oct 9, 2024 - Scala
Shenzhen Metro big-data passenger flow analysis system 🚇🚄🌟
Updated May 16, 2024 - Scala
TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning
Updated Sep 29, 2023 - Scala
Apache Kyuubi is a distributed and multi-tenant gateway to provide serverless SQL on data warehouses and lakehouses.
Updated Nov 12, 2024 - Scala
A Scala kernel for Jupyter
Updated Nov 6, 2024 - Scala
MLeap: Deploy ML Pipelines to Production
Updated Nov 12, 2024 - Scala
High performance data store solution
Updated Oct 15, 2024 - Scala
This is the GitHub repository for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
Updated May 8, 2024 - Scala
A collection of test cases and related materials gathered while using Scala and Spark
Updated Feb 9, 2019 - Scala
Project SnappyData - memory optimized analytics database, based on Apache Spark™ and Apache Geode™. Stream, Transact, Analyze, Predict in one cluster
Updated Nov 21, 2022 - Scala
Created by Matei Zaharia
Released May 26, 2014
Followers: 423
Repository: apache/spark
Website: spark.apache.org
Wikipedia