
Local LLM Benchmark Data on GPU: AMD Ryzen 7 8745HS
Hardware
- CPU: AMD Ryzen 7 8745HS with 2×32 GB RAM
- iGPU: Radeon 780M with 35 GB of shared VRAM, allocated automatically by the BIOS
- RAM: 35 GB available out of 64 GB total
NOTE: Linux driver support is currently missing.
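The metric columns below (total/load duration, prompt eval count/rate, eval count/rate) mirror the fields an Ollama server reports, so the numbers were most likely collected with a harness along these lines. A minimal sketch, assuming a local Ollama instance; the prompt, model subset, and single-run loop are illustrative, not the exact setup used here:

```python
# Hypothetical reconstruction of the benchmark harness; assumes a local
# Ollama server, whose /api/generate response carries the same metric
# fields used in the tables below (all durations are in nanoseconds).
import requests

MODELS = ["gemma3:1b", "llama3.2:1b", "qwen3:1.7b"]  # subset of the tested models
PROMPT = "Explain the difference between a process and a thread."  # assumed prompt

for model in MODELS:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=600,
    ).json()
    # Convert nanosecond counters to seconds and tokens per second.
    total_s = resp["total_duration"] / 1e9
    load_s = resp["load_duration"] / 1e9
    prompt_rate = resp["prompt_eval_count"] / (resp["prompt_eval_duration"] / 1e9)
    eval_rate = resp["eval_count"] / (resp["eval_duration"] / 1e9)
    print(f"{model}: total {total_s:.2f}s, load {load_s:.2f}s, "
          f"prompt {prompt_rate:.2f} t/s, eval {eval_rate:.2f} t/s")
```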
Here's the performance-sorted benchmark analysis for the AMD Ryzen 7 8745HS with its Radeon 780M iGPU:
Small Models (Under 4GB)
Model | Backend | Size (MB) | Total Duration (s) | Load Duration (s) | Prompt Eval Count | Prompt Eval Duration | Prompt Eval Rate (t/s) | Eval Count | Eval Duration | Eval Rate (t/s) | CPU Avg (%) | CPU Max (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
gemma3:1b | rocm | 815 | 5.73 | 4.58 | 23 | 160.65 | 143.81 | 62.33 | 631.01 | 63.47 | 5.67 | 36 |
llama3.2:1b | rocm | 1300 | 4.54 | 2.97 | 38 | 65.48 | 581.82 | 76 | 1.5 | 50.57 | 3.33 | 6.67 |
qwen3:1.7b | rocm | 1400 | 10.59 | 2.92 | 23 | 74.85 | 308.79 | 310.33 | 7.59 | 41.59 | 3.67 | 8 |
llama3.2:3b | rocm | 2000 | 11.61 | 8.66 | 38 | 70.71 | 542.79 | 83 | 2.87 | 28.98 | 3 | 7 |
gemma3:4b | rocm | 3300 | 10.42 | 7.69 | 22 | 191.62 | 119.13 | 62.67 | 2.54 | 24.7 | 9 | 52.67 |
qwen3:4b | rocm | 2600 | 21.36 | 7.26 | 23 | 98.91 | 273.9 | 306.33 | 14 | 21.89 | 2.67 | 8.33 |
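As a quick sanity check on the column semantics (the fractional counts suggest values averaged over several runs), the eval rate should equal eval count divided by eval duration; the llama3.2:1b row reproduces this within rounding:

```python
# Sanity check of column semantics using the llama3.2:1b row above:
# eval rate should equal eval count / eval duration (tokens per second).
eval_count = 76        # tokens generated
eval_duration_s = 1.5  # seconds
print(eval_count / eval_duration_s)  # ~50.67 t/s vs. the listed 50.57 t/s
```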
Medium Models (4GB - 10GB)
Model | Backend | Size (MB) | Total Duration (s) | Load Duration (s) | Prompt Eval Count | Prompt Eval Duration | Prompt Eval Rate (t/s) | Eval Count | Eval Duration | Eval Rate (t/s) | CPU Avg (%) | CPU Max (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
mistral:7b | rocm | 4100 | 19.26 | 11.39 | 20 | 77.07 | 260.56 | 131.67 | 7.79 | 16.92 | 2 | 9.67 |
llama3.1:8b | rocm | 4700 | 14.73 | 9.16 | 23 | 70.79 | 325.13 | 67.67 | 5.5 | 12.43 | 2.67 | 10.33 |
qwen3:8b | rocm | 5200 | 38.2 | 10.63 | 23 | 111.18 | 240.14 | 324 | 27.46 | 12.07 | 3.33 | 10.33 |
gemma3:12b | rocm | 8000 | 22.18 | 13.34 | 22 | 420.58 | 52.46 | 63.33 | 8.41 | 7.68 | 16 | 69.17 |
phi4:14b | rocm | 9100 | 25.77 | 14.74 | 23 | 116.27 | 199.83 | 80.57 | 10.91 | 7.43 | 4.43 | 11.86 |
deepseek-r1:14b | rocm | 9000 | 100.91 | 13.61 | 16 | 178.04 | 132.7 | 504.44 | 79.34 | 7 | 8.78 | 16.78 |
qwen2.5:14b | rocm | 9000 | 24.11 | 14.17 | 42 | 70.09 | 222.6 | 62.86 | 9.17 | 6.78 | 12.43 | 29.29 |
qwen2.5-coder:14b | rocm | 9000 | 24.06 | 13.23 | 42 | 80.12 | 302.27 | 68.88 | 10.26 | 6.74 | 10.62 | 22.88 |
Large Models (Over 10GB)
Model | Backend | Size (MB) | Total Duration (s) | Load Duration (s) | Prompt Eval Count | Prompt Eval Duration | Prompt Eval Rate (t/s) | Eval Count | Eval Duration | Eval Rate (t/s) | CPU Avg (%) | CPU Max (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
qwen3:30b | rocm | 18000 | 44.47 | 20 | 23 | 263.76 | 17.87 | 432.71 | 38.8 | 11.2 | 12.43 | 38.14 |
mixtral:8x7b | rocm | 26000 | 132.85 | 27.48 | 24 | 2.86 | 12.64 | 88.71 | 102.5 | 4.31 | 23.86 | 52.43 |
mistral-small:22b | rocm | 12000 | 40.01 | 15.27 | 20 | 396.56 | 25.93 | 89.33 | 23.71 | 3.86 | 18 | 30 |
devstral:24b | rocm | 14000 | 25.44 | 16.92 | 1238 | 124.22 | 14.7 | 67.2 | 24.3 | 2.77 | 44 | 50.2 |
mistral-small3.1:24b | rocm | 15000 | 89.47 | 13.65 | 371 | 20.68 | 22.4 | 68 | 28.47 | 2.52 | 38.67 | 61.33 |
gemma3:27b | rocm | 17000 | 46.52 | 14.14 | 22 | 1.6 | 14.32 | 61 | 30.78 | 2.01 | 37.67 | 81 |
qwen2.5-coder:32b | rocm | 20000 | 27.93 | 18.25 | 42 | 4.01 | 10.49 | 51.5 | 30.68 | 1.68 | 35 | 50.5 |
Key Observations:
- Performance Paradox: llama3.2:1b achieves the highest prompt-processing rate of any model tested (581.82 t/s) despite being only 1.3 GB.
- Memory Efficiency: qwen2.5:14b delivers a 6.78 t/s eval rate at a 9 GB model size.
- Anomaly: gemma3:12b shows significant CPU utilization (average 16%, max 69.17%) despite ROCm offloading; a way to reproduce this measurement is sketched after this list.
- Throughput King: gemma3:1b maintains a 63.47 t/s eval rate at a sub-1 GB model size.
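One way to reproduce the CPU Avg / CPU Max columns, and to dig into anomalies like gemma3:12b's, is to sample system-wide CPU utilization while a generation request runs. A minimal sketch, assuming psutil and a local Ollama server (the 0.5 s sampling interval is a guess, not the original harness's):

```python
# Sketch: sample CPU utilization while a generation runs in a background
# thread, then report the average and maximum, mirroring the CPU Avg /
# CPU Max columns. Assumes psutil and a local Ollama server.
import threading
import psutil
import requests

def generate(model: str, prompt: str) -> None:
    requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )

samples: list[float] = []
worker = threading.Thread(
    target=generate, args=("gemma3:12b", "Summarize the theory of relativity.")
)
worker.start()
while worker.is_alive() or not samples:
    # cpu_percent blocks for the interval and returns system-wide usage.
    samples.append(psutil.cpu_percent(interval=0.5))
worker.join()
print(f"cpu avg {sum(samples) / len(samples):.2f}%, cpu max {max(samples):.2f}%")
```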
Top Performers:
- Gemma3:1b - Fastest generation overall (63.47 t/s eval rate) with minimal RAM usage.
- Llama3.2:1b - Best prompt processing (581.82 t/s) despite a more modest eval rate (50.57 t/s).
- Qwen3:1.7b - Strong balance of capability and speed among the small models, with an eval rate of 41.59 t/s.
Unexpected Findings:
- Gemma3:4b shows notable CPU spikes (average 9%, max 52.67%) despite GPU offloading.
- Llama3.1:8b (12.43 t/s eval rate) is outpaced by the smaller Mistral:7b (16.92 t/s), despite being only slightly larger (4.7 GB vs 4.1 GB).
- The 14B models land within a narrow eval-rate band (6.74-7.43 t/s), with Qwen2.5:14b coming in slightly faster than its coder variant (6.78 t/s vs 6.74 t/s).
Comparative Analysis of AMD Radeon 780M (Ryzen 7 8745HS) and NVIDIA RTX 4060 Ti Performance in Local LLM Inference
Local LLM Benchmark Data on GPU: RTX 4060 Ti
- GPU: NVIDIA RTX 4060 Ti (16 GB VRAM)
- CPU: Intel Core i5-13400
- RAM: 64 GB
Hardware Comparison
Specification | Radeon 780M | RTX 4060 Ti |
---|---|---|
VRAM | 35 GB shared | 16 GB dedicated |
Memory Bandwidth | ~80 GB/s (system RAM) | 288 GB/s GDDR6 |
TDP | 54 W (total system) | 160 W (GPU only) |
Architecture | RDNA3 | Ada Lovelace |
Software Stack | ROCm 6.0 | CUDA 12.2 |
Performance by Model Category
Small Models (1-7B Parameters):
Model | AMD Eval Rate | NVIDIA Eval Rate | Difference |
---|---|---|---|
Gemma3:1b | 63.47 t/s | 104.27 t/s | +64% NVIDIA |
Mistral:7b | 16.92 t/s | 60.35 t/s | +257% NVIDIA |
Qwen3:1.7b | 41.59 t/s | 161.64 t/s | +289% NVIDIA |
Medium Models (8-14B Parameters):
Model | AMD Eval Rate | NVIDIA Eval Rate | Difference |
---|---|---|---|
Llama3.1:8b | 12.43 t/s | 54.85 t/s | +341% NVIDIA |
Qwen2.5:14b | 6.78 t/s | 28.81 t/s | +325% NVIDIA |
Deepseek-R1:14b | 7.00 t/s | 28.26 t/s | +304% NVIDIA |
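For reference, the Difference column in both tables is simply the relative NVIDIA advantage derived from the two eval rates; a one-liner reproduces it (the function name is illustrative):

```python
# How the Difference column above is computed: relative NVIDIA advantage.
def advantage(amd_tps: float, nvidia_tps: float) -> str:
    return f"+{(nvidia_tps / amd_tps - 1) * 100:.0f}% NVIDIA"

print(advantage(12.43, 54.85))  # Llama3.1:8b -> "+341% NVIDIA"
```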
Key Findings
Architecture Optimization:
- NVIDIA significantly outperforms AMD in small to medium models, with advantages ranging from +64% to +341%.
- While AMD's Radeon 780M is a capable integrated solution, a dedicated GPU like the RTX 4060 Ti shows a clear advantage in LLM inference speed thanks to its higher memory bandwidth and an architecture optimized for AI workloads (see the back-of-envelope check below).
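The bandwidth point can be made concrete with a rough roofline estimate: at batch size 1, generating each token streams roughly the entire model through memory, so the eval rate is capped near bandwidth divided by model size. A minimal sketch using the qwen2.5:14b numbers from the tables above (it ignores KV-cache traffic and other overheads):

```python
# Back-of-envelope roofline: at batch size 1, each generated token streams
# roughly the full model weights, so eval rate <= bandwidth / model size.
# Rough model only -- it ignores KV-cache traffic and kernel overheads.
def max_tps(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb

model_gb = 9.0  # qwen2.5:14b quantized size from the table above
print(f"780M ceiling:    {max_tps(80, model_gb):.1f} t/s (measured 6.78)")
print(f"4060 Ti ceiling: {max_tps(288, model_gb):.1f} t/s (measured 28.81)")
```

Both measured rates land at roughly 75-90% of their theoretical ceilings, consistent with memory bandwidth being the dominant limit on both systems.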
Published on 5/29/2025