DEV Community

# gpu

Posts

Intel Xe3P Leaks 160GB LPDDR5X; FlashAttention-2 in CuTe & Custom CUDA GPT-2 Engine

3 min read
Building llama.cpp from source on a Dell Precision T5820 with an RTX 3090 Ti (after seven power cycles)

16 min read
Three Months of Speed-Up Experiments on a 3090 Ti: Autoregressive DFlash MTP for Qwen3.6-27B

18 min read
Why MTP doesn't speed up your llama.cpp inference (and how to actually fix it)

5 min read
267 tok/s local inference on RTX 5090 – llama.cpp MTP + Qwen3-35B-A3B MoE

1 min read
GPU Bottleneck Analyzer, NVIDIA Rubin VRAM Demands, and Qwen VRAM Optimization

1
Comments
4 min read
Production-Ready GPU Inference Autoscaling on EKS with Karpenter, KEDA, and Dragonfly

26 min read
GPU Hardware & Driver Update: RTX 5090 Benchmarks, llama.cpp MTP, Windows 11 Fix

3 min read
CUDA Cutile-rs Beta, AMD FSR 4.1 Release, & Forza Horizon 6 GPU Benchmarks

3 min read
Same eBPF, Different Vendor: Tracing libhip Calls on AMD ROCm

3 min read
Best GPU for Llama 70B in 2026 (48GB+ VRAM Required)

6 min read
From TCP Retransmits to MCP-Driven Cluster Investigations: An eBPF GPU Agent Retrospective

1
Comments
8 min read
From Zero to Supercomputing: A Beginner-Friendly Guide to Using HPC Clusters Like CINECA

5 min read
What Inference-Platform Benchmark Posts Leave Out

8 min read
Why CUDA kernels silently corrupt memory and how to catch the bug

5 min read