Test results for locally running LLMs with Ollama on an AMD Ryzen 7 8745HS

Local LLM Benchmark Data on iGPU: AMD Ryzen 7 8745HS (Radeon 780M)

Hardware

  • CPU: AMD Ryzen 7 8745HS with 2×32 GB RAM
  • iGPU: Radeon 780M, 35 GB VRAM allocated automatically by the BIOS
  • RAM: 35 GB available out of 64 GB total

NOTE: Linux driver support is currently missing.
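
The per-model columns in the tables below mirror the timing block that `ollama run --verbose` prints after each response. Here is a minimal parsing sketch; the sample text and exact field layout are assumptions, since the format can differ between Ollama versions:

```python
import re

# Sample of the per-run stats that `ollama run --verbose` prints
# (field names assumed from the table columns below; the exact
# layout may differ between Ollama versions).
SAMPLE = """\
total duration:       4.54s
load duration:        2.97s
prompt eval count:    38 token(s)
prompt eval duration: 65.48ms
prompt eval rate:     581.82 tokens/s
eval count:           76 token(s)
eval duration:        1.5s
eval rate:            50.57 tokens/s
"""

def parse_stats(text: str) -> dict:
    """Extract metric name -> numeric value from the verbose stats block."""
    stats = {}
    for line in text.splitlines():
        m = re.match(r"([a-z ]+):\s+([\d.]+)", line)
        if m:
            stats[m.group(1).strip()] = float(m.group(2))
    return stats

stats = parse_stats(SAMPLE)
print(stats["eval rate"])  # → 50.57 (tokens/s for the sample block)
```

Collecting one such block per model is enough to fill a row of the benchmark tables.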

Here's the performance-sorted benchmark analysis for AMD Ryzen 7 8745HS with Radeon 780M iGPU:

Small Models (Under 4GB)

| model | backend | size mb | total duration | load duration | prompt eval count | prompt eval duration | prompt eval rate | eval count | eval duration | eval rate | cpu avg | cpu max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gemma3:1b | rocm | 815 | 5.73 | 4.58 | 23 | 160.65 | 143.81 | 62.3363 | 1.01 | 63.47 | 5.67 | 36 |
| llama3.2:1b | rocm | 1300 | 4.54 | 2.97 | 38 | 65.48 | 581.82 | 76 | 1.5 | 50.57 | 3.33 | 6.67 |
| qwen3:1.7b | rocm | 1400 | 10.59 | 2.92 | 23 | 74.85 | 308.79 | 310.33 | 7.59 | 41.59 | 3.67 | 8 |
| llama3.2:3b | rocm | 2000 | 11.61 | 8.66 | 38 | 70.71 | 542.79 | 83 | 2.87 | 28.98 | 3 | 7 |
| gemma3:4b | rocm | 3300 | 10.42 | 7.69 | 22 | 191.62 | 119.13 | 62.67 | 2.54 | 24.7 | 9 | 52.67 |
| qwen3:4b | rocm | 2600 | 21.36 | 7.26 | 23 | 98.91 | 273.9 | 306.33 | 14 | 21.89 | 2.67 | 8.33 |

Medium Models (4GB - 10GB)

| model | backend | size mb | total duration | load duration | prompt eval count | prompt eval duration | prompt eval rate | eval count | eval duration | eval rate | cpu avg | cpu max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mistral:7b | rocm | 4100 | 19.26 | 11.39 | 20 | 77.07 | 260.56 | 131.67 | 7.79 | 16.92 | 2 | 9.67 |
| llama3.1:8b | rocm | 4700 | 14.73 | 9.16 | 23 | 70.79 | 325.13 | 67.67 | 5.5 | 12.43 | 2.67 | 10.33 |
| qwen3:8b | rocm | 5200 | 38.2 | 10.63 | 23 | 111.18 | 240.14 | 324 | 27.46 | 12.07 | 3.33 | 10.33 |
| gemma3:12b | rocm | 8000 | 22.18 | 13.34 | 22 | 420.58 | 52.46 | 63.33 | 8.41 | 7.68 | 16 | 69.17 |
| phi4:14b | rocm | 9100 | 25.77 | 14.74 | 23 | 116.27 | 199.83 | 80.57 | 10.91 | 7.43 | 4.43 | 11.86 |
| deepseek-r1:14b | rocm | 9000 | 100.91 | 13.61 | 16 | 178.04 | 132.7 | 504.44 | 79.34 | 7 | 8.78 | 16.78 |
| qwen2.5:14b | rocm | 9000 | 24.11 | 14.17 | 42 | 70.09 | 222.6 | 62.86 | 9.17 | 6.78 | 12.43 | 29.29 |
| qwen2.5-coder:14b | rocm | 9000 | 24.06 | 13.23 | 42 | 80.12 | 302.27 | 68.88 | 10.26 | 6.74 | 10.62 | 22.88 |

Large Models (Over 10GB)

| model | backend | size mb | total duration | load duration | prompt eval count | prompt eval duration | prompt eval rate | eval count | eval duration | eval rate | cpu avg | cpu max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| qwen3:30b | rocm | 18000 | 44.47 | 20 | 23 | 263.76 | 17.87 | 432.71 | 38.8 | 11.21 | 2.43 | 38.14 |
| mixtral:8x7b | rocm | 26000 | 132.85 | 27.48 | 24 | 2.86 | 12.6 | 488.71 | 102.5 | 4.31 | 23.86 | 52.43 |
| mistral-small:22b | rocm | 12000 | 40.01 | 15.27 | 20 | 396.56 | 25.93 | 89.33 | 23.71 | 3.86 | 18 | 30 |
| devstral:24b | rocm | 14000 | 25.44 | 16.92 | 123 | 8124.22 | 14.76 | 7.22 | 4.3 | 2.77 | 44 | 50.2 |
| mistral-small3.1:24b | rocm | 15000 | 89.47 | 13.65 | 37 | 120.68 | 22.46 | 82 | 8.47 | 2.52 | 38.67 | 61.33 |
| gemma3:27b | rocm | 17000 | 46.52 | 14.14 | 22 | 1.61 | 4.32 | 61 | 30.78 | 2.01 | 37.67 | 81 |
| qwen2.5-coder:32b | rocm | 20000 | 27.93 | 18.25 | 42 | 4.01 | 10.49 | 51.5 | 30.68 | 1.68 | 35 | 50.5 |

Key Observations:

  • Prompt Processing Leader: the 1.3 GB llama3.2:1b achieves the highest prompt eval rate of any model tested (581.82 t/s).
  • Memory Efficiency: qwen2.5:14b delivers a 6.78 t/s eval rate at 9GB model size.
  • Anomaly: gemma3:12b shows a significant CPU utilization (average 16%, max 69.17%) despite ROCm offloading.
  • Throughput King: gemma3:1b maintains a 63.47 t/s eval rate with sub-1GB model size.
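
As a sanity check on the data, the reported eval rate should match eval count divided by eval duration. A quick verification against a few rows transcribed from the benchmark above:

```python
# Cross-check: eval rate ≈ eval count / eval duration (seconds).
# (model: (eval count, eval duration, reported eval rate))
rows = {
    "llama3.2:1b": (76, 1.5, 50.57),
    "llama3.2:3b": (83, 2.87, 28.98),
    "qwen3:4b": (306.33, 14, 21.89),
}

for model, (count, duration, reported) in rows.items():
    derived = count / duration
    # Small tolerance: counts and durations are averages over runs.
    assert abs(derived - reported) / reported < 0.02, model
    print(f"{model}: reported {reported:.2f} t/s, derived {derived:.2f} t/s")
```

The fractional eval counts (e.g. 306.33) are consistent with the figures being averaged over multiple runs.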

Top Performers:

  1. Gemma3:1b - Fastest overall (63.47 t/s) with minimal RAM usage.
  2. Llama3.2:1b - Best prompt processing (581.82 t/s) despite a medium eval rate.
  3. Qwen3:1.7b - Offers a strong balance for mid-sized models with an eval rate of 41.59 t/s.

Unexpected Findings:

  • Gemma3:4b shows notable CPU spikes (average 9%, max 52.67%) despite GPU offloading.
  • Mistral:7b delivers better performance (16.92 t/s eval rate) than the larger Llama3.1:8b (12.43 t/s).
  • The 14B models land in a narrow eval-rate band, with Qwen2.5:14b marginally faster than its coder variant (6.78 t/s vs 6.74 t/s).

Comparative analysis of AMD Radeon 780M (Ryzen 7 8745HS) and NVIDIA RTX 4060 Ti performance in local LLM inference

Local LLM Benchmark data on GPU: RTX 4060 Ti (16 GB VRAM), CPU: Intel Core i5-13400, RAM: 64 GB

Hardware Comparison

| Specification | Radeon 780M | RTX 4060 Ti |
|---|---|---|
| VRAM | 35 GB shared | 16 GB dedicated |
| Memory Bandwidth | ~80 GB/s (system RAM) | 288 GB/s GDDR6 |
| TDP | 54 W (total system) | 160 W (GPU only) |
| Architecture | RDNA3 | Ada Lovelace |
| Software Stack | ROCm 6.0 | CUDA 12.2 |

Performance by Model Category

Small Models (1-7B Parameters):

| Model | AMD Eval Rate | NVIDIA Eval Rate | Difference |
|---|---|---|---|
| Gemma3:1b | 63.47 t/s | 104.27 t/s | +63% NVIDIA |
| Mistral:7b | 16.92 t/s | 60.35 t/s | +257% NVIDIA |
| Qwen3:1.7b | 41.59 t/s | 161.64 t/s | +289% NVIDIA |

Medium Models (8-14B Parameters):

| Model | AMD Eval Rate | NVIDIA Eval Rate | Difference |
|---|---|---|---|
| Llama3.1:8b | 12.43 t/s | 54.85 t/s | +341% NVIDIA |
| Qwen2.5:14b | 6.78 t/s | 28.81 t/s | +325% NVIDIA |
| Deepseek-R1:14b | 7.00 t/s | 28.26 t/s | +304% NVIDIA |
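
The Difference column is simply the ratio of the two eval rates expressed as a percentage speedup:

```python
# How the "Difference" column is derived: NVIDIA rate over AMD rate,
# expressed as a percentage speedup. Values from the medium-model table.
pairs = {
    "Llama3.1:8b": (12.43, 54.85),
    "Qwen2.5:14b": (6.78, 28.81),
    "Deepseek-R1:14b": (7.00, 28.26),
}

for model, (amd, nvidia) in pairs.items():
    speedup_pct = round((nvidia / amd - 1) * 100)
    print(f"{model}: +{speedup_pct}% NVIDIA")  # → +341%, +325%, +304%
```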

Key Findings

Architecture Optimization:

  • NVIDIA significantly outperforms AMD across small and medium models, with speedups ranging from +63% to +341%.
  • AMD's Radeon 780M is a capable integrated solution, but a dedicated GPU like the RTX 4060 Ti holds a clear advantage in LLM inference speed thanks to its much higher memory bandwidth and an architecture optimized for AI workloads.
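
The bandwidth point can be made concrete with a back-of-envelope roofline: generating one token reads roughly the whole model from memory, so eval rate is capped near memory bandwidth divided by model size. A sketch using the bandwidth figures from the comparison table (the 4.7 GB size is llama3.1:8b's footprint from the AMD table; measured rates land below these ceilings because of overheads):

```python
# Roofline estimate for token generation: each token streams the full
# set of weights once, so eval rate <= memory_bandwidth / model_size.

def max_eval_rate(bandwidth_gbs: float, model_size_gb: float) -> float:
    """Upper bound on tokens/s for a memory-bandwidth-bound decoder."""
    return bandwidth_gbs / model_size_gb

# llama3.1:8b (~4.7 GB) on both platforms:
amd_bound = max_eval_rate(80, 4.7)      # Radeon 780M, shared DDR5
nvidia_bound = max_eval_rate(288, 4.7)  # RTX 4060 Ti, GDDR6
print(round(amd_bound, 1), round(nvidia_bound, 1))  # → 17.0 61.3
```

The measured rates (12.43 t/s on the 780M, 54.85 t/s on the 4060 Ti) sit just under these ceilings, which supports the view that the gap is driven mainly by memory bandwidth.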

Published on 5/29/2025