# cuda-programming

Here are 361 public repositories matching this topic...

taskflow / taskflow

A General-purpose Task-parallel Programming System using Modern C++

multi-threading parallel parallel-computing multithreading concurrent-programming high-performance-computing heterogeneous-parallel-programming threadpool parallel-programming work-stealing taskflow gpu-programming taskparallelism multicore-programming cuda-programming

Updated Oct 25, 2024
C++

brucefan1983 / CUDA-Programming

Sample codes for my CUDA programming book

molecular-dynamics-simulation gpu-programming cuda-programming

Updated Jul 27, 2023
Cuda

DefTruth / CUDA-Learn-Notes

🎉 Modern CUDA Learn Notes with PyTorch: CUDA Cores, Tensor Cores, fp32/tf32, fp16/bf16, fp8/int8, flash_attn, rope, sgemm, hgemm, sgemv, warp/block reduce, elementwise, softmax, layernorm, rmsnorm.

cuda pytorch triton gemm softmax cuda-programming layernorm gemv elementwise rmsnorm flash-attention flash-attention-2 warp-reduce block-reduce flash-attention-3

Updated Nov 8, 2024
Cuda

NVIDIA / cccl

CUDA Core Compute Libraries

cpp hpc gpu modern-cpp parallel-computing cuda nvidia gpu-acceleration cuda-kernels gpu-computing parallel-algorithm parallel-programming nvidia-gpu gpu-programming cuda-library cpp-programming cuda-programming accelerated-computing cuda-cpp

Updated Nov 10, 2024
C++

eyalroz / cuda-api-wrappers

Thin, unified, C++-flavored wrappers for the CUDA APIs

gpu modern-cpp cuda gpgpu api-wrapper gpu-memory gpu-computing cuda-driver-api cuda-toolkit cuda-device cuda-runtime-api cuda-driver gpgpu-computing cuda-api-wrappers cuda-programming

Updated Oct 28, 2024
C++

sail-sg / Adan

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models

Updated Jul 2, 2024
Python

mit-han-lab / TinyChatEngine

TinyChatEngine: On-Device LLM Inference Library

c arm deep-learning cpp x86-64 quantization edge-computing cuda-programming on-device-ai large-language-models

Updated Jul 4, 2024
C++

coreylowman / cudarc

Safe Rust wrapper around the CUDA toolkit

rust gpu cuda cublas gpu-acceleration cuda-kernels cudnn cuda-toolkit nccl curand cuda-programming nvrtc

Updated Sep 6, 2024
Rust

laugh12321 / TensorRT-YOLO

🚀 Your YOLO Deployment Powerhouse. With the synergy of TensorRT Plugins, CUDA Kernels, and CUDA Graphs, experience lightning-fast inference speeds.

detection cuda cuda-kernels tensorrt onnx yolov3 cuda-programming yolov5 ultralytics yolov6 yolov7 ppyoloe yolov8 cuda-graph yolov9 yolov10 yolo11

Updated Nov 8, 2024
C++

nosferalatu / SimpleGPUHashTable

A simple GPU hash table implemented in CUDA using lock free techniques

gpu cuda data-structures cuda-programming gpu-cuda-programs

Updated Feb 7, 2024
Cuda

PaddleJitLab / CUDATutorial

A self-learning tutorial for CUDA high-performance programming.

deep-learning cuda-programming

Updated Nov 6, 2024
JavaScript

jaredhoberock / stanford-cs193g-sp2010

This is an archive of materials produced for an introductory class on CUDA programming at Stanford University in 2010

cuda cuda-kernels gpu-programming cuda-programming

Updated Jun 24, 2022
C++

HMUNACHI / cuda-repo

From zero to hero CUDA for accelerating maths and machine learning on GPU.

machine-learning cuda cuda-kernels maths cuda-programming

Updated Jul 23, 2024
Cuda

MuGdxy / muda

μ-Cuda, COVER THE LAST MILE OF CUDA. With features: intellisense-friendly, structured launch, automatic cuda graph generation and updating.

cuda cuda-programming cuda-cpp

Updated Nov 6, 2024
C++

ROCm / HIP-CPU

An implementation of HIP that works on CPUs, across OSes.

cuda cpp17 hip spmd stl-algorithms parallel-algorithms cuda-programming hip-runtime hip-kernel-language hip-portability

Updated Mar 19, 2024
C++

SunsetQuest / CudaPAD

CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels and provides an on-the-fly view of the assembly.

windows gpu cuda nvidia ptx cuda-programming ptx-utils

Updated Jan 17, 2023
C#

eyalroz / cuda-kat

CUDA kernel author's tools

patterns algorithms gpu constexpr modern-cpp cuda printf cpp11 utility-library cuda-kernels gpu-programming cuda-library elegant-coding cuda-programming utility-functions printf-functions

Updated Apr 24, 2022
Cuda

mikeroyal / CUDA-Guide

CUDA Guide

machine-learning awesome deep-learning gpu cuda resources gpgpu graphics-programming awesome-list cuda-kernels cuda-toolkit cuda-opengl cuda-support cuda-development cuda-driver cuda-library gpgpu-computing cuda-programming awesome-readme

Updated Jan 4, 2024
Cuda

FahimFBA / CUDA-WSL2-Ubuntu

Install CUDA on Windows 11 using WSL2

machine-learning deep-learning cuda deep-reinforcement-learning wsl machinelearning deeplearning cuda-toolkit cuda-support deeplearning-ai wsl-ubuntu machinelearning-python cuda-programming wsl2 wsl-environment cuda-wsl

Updated Aug 2, 2023
Jupyter Notebook

emptysoal / cuda-image-preprocess

Speed up image preprocessing with CUDA when handling images or running TensorRT inference

deep-learning cuda image-processing cnn cuda-kernels cuda-demo tensorrt cuda-programming

Updated Sep 27, 2024
Cuda
