
Local LLM Benchmark Data on GPU: AMD Ryzen 7 8745HS
Hardware
- CPU: AMD Ryzen 7 8745HS with 2×32 GB RAM
- iGPU: Radeon 780M with 35 GB of shared VRAM, allocated automatically by the BIOS
- RAM: 35 GB available out of 64 GB total
NOTE: Linux driver support is currently missing.
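The metric columns below (total/load duration, prompt eval count/rate, eval count/rate) mirror the fields an Ollama server reports, so the numbers were most likely collected with a harness along these lines. A minimal sketch, assuming a local Ollama instance; the prompt, model subset, and single-run loop are illustrative, not the exact setup used here:

```python
# Hypothetical reconstruction of the benchmark harness; assumes a local
# Ollama server, whose /api/generate response carries the same metric
# fields used in the tables below (all durations are in nanoseconds).
import requests

MODELS = ["gemma3:1b", "llama3.2:1b", "qwen3:1.7b"]  # subset of the tested models
PROMPT = "Explain the difference between a process and a thread."  # assumed prompt

for model in MODELS:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=600,
    ).json()
    # Convert nanosecond counters to seconds and tokens per second.
    total_s = resp["total_duration"] / 1e9
    load_s = resp["load_duration"] / 1e9
    prompt_rate = resp["prompt_eval_count"] / (resp["prompt_eval_duration"] / 1e9)
    eval_rate = resp["eval_count"] / (resp["eval_duration"] / 1e9)
    print(f"{model}: total {total_s:.2f}s, load {load_s:.2f}s, "
          f"prompt {prompt_rate:.2f} t/s, eval {eval_rate:.2f} t/s")
```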
Here's the performance-sorted benchmark analysis for the AMD Ryzen 7 8745HS with its Radeon 780M iGPU:
Small Models (Under 4GB)
Model | Backend | Size (MB) | Total Duration (s) | Load Duration (s) | Prompt Eval Count | Prompt Eval Duration | Prompt Eval Rate (t/s) | Eval Count | Eval Duration | Eval Rate (t/s) | CPU Avg (%) | CPU Max (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
gemma3:1b | rocm | 815 | 5.73 | 4.58 | 23 | 160.65 | 143.81 | 62.33 | 631.01 | 63.47 | 5.67 | 36 |
llama3.2:1b | rocm | 1300 | 4.54 | 2.97 | 38 | 65.48 | 581.82 | 76 | 1.5 | 50.57 | 3.33 | 6.67 |
qwen3:1.7b | rocm | 1400 | 10.59 | 2.92 | 23 | 74.85 | 308.79 | 310.33 | 7.59 | 41.59 | 3.67 | 8 |
llama3.2:3b | rocm | 2000 | 11.61 | 8.66 | 38 | 70.71 | 542.79 | 83 | 2.87 | 28.98 | 3 | 7 |
gemma3:4b | rocm | 3300 | 10.42 | 7.69 | 22 | 191.62 | 119.13 | 62.67 | 2.54 | 24.7 | 9 | 52.67 |
qwen3:4b | rocm | 2600 | 21.36 | 7.26 | 23 | 98.91 | 273.9 | 306.33 | 14 | 21.89 | 2.67 | 8.33 |
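As a quick sanity check on the column semantics (the fractional counts suggest values averaged over several runs), the eval rate should equal eval count divided by eval duration; the llama3.2:1b row reproduces this within rounding:

```python
# Sanity check of column semantics using the llama3.2:1b row above:
# eval rate should equal eval count / eval duration (tokens per second).
eval_count = 76        # tokens generated
eval_duration_s = 1.5  # seconds
print(eval_count / eval_duration_s)  # ~50.67 t/s vs. the listed 50.57 t/s
```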
Medium Models (4GB - 10GB)
Model | Backend | Size (MB) | Total Duration (s) | Load Duration (s) | Prompt Eval Count | Prompt Eval Duration | Prompt Eval Rate (t/s) | Eval Count | Eval Duration | Eval Rate (t/s) | CPU Avg (%) | CPU Max (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
mistral:7b | rocm | 4100 | 19.26 | 11.39 | 20 | 77.07 | 260.56 | 131.67 | 7.79 | 16.92 | 2 | 9.67 |
llama3.1:8b | rocm | 4700 | 14.73 | 9.16 | 23 | 70.79 | 325.13 | 67.67 | 5.5 | 12.43 | 2.67 | 10.33 |
qwen3:8b | rocm | 5200 | 38.2 | 10.63 | 23 | 111.18 | 240.14 | 324 | 27.46 | 12.07 | 3.33 | 10.33 |
gemma3:12b | rocm | 8000 | 22.18 | 13.34 | 22 | 420.58 | 52.46 | 63.33 | 8.41 | 7.68 | 16 | 69.17 |
phi4:14b | rocm | 9100 | 25.77 | 14.74 | 23 | 116.27 | 199.83 | 80.57 | 10.91 | 7.43 | 4.43 | 11.86 |
deepseek-r1:14b | rocm | 9000 | 100.91 | 13.61 | 16 | 178.04 | 132.7 | 504.44 | 79.34 | 7 | 8.78 | 16.78 |
qwen2.5:14b | rocm | 9000 | 24.11 | 14.17 | 42 | 70.09 | 222.6 | 62.86 | 9.17 | 6.78 | 12.43 | 29.29 |
qwen2.5-coder:14b | rocm | 9000 | 24.06 | 13.23 | 42 | 80.12 | 302.27 | 68.88 | 10.26 | 6.74 | 10.62 | 22.88 |
Large Models (Over 10GB)
Model | Backend | Size (MB) | Total Duration (s) | Load Duration (s) | Prompt Eval Count | Prompt Eval Duration | Prompt Eval Rate (t/s) | Eval Count | Eval Duration | Eval Rate (t/s) | CPU Avg (%) | CPU Max (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
qwen3:30b | rocm | 18000 | 44.47 | 20 | 23 | 263.76 | 17.87 | 432.71 | 38.8 | 11.2 | 12.43 | 38.14 |
mixtral:8x7b | rocm | 26000 | 132.85 | 27.48 | 24 | 2.86 | 12.64 | 88.71 | 102.5 | 4.31 | 23.86 | 52.43 |
mistral-small:22b | rocm | 12000 | 40.01 | 15.27 | 20 | 396.56 | 25.93 | 89.33 | 23.71 | 3.86 | 18 | 30 |
devstral:24b | rocm | 14000 | 25.44 | 16.92 | 1238 | 124.22 | 14.7 | 67.2 | 24.3 | 2.77 | 44 | 50.2 |
mistral-small3.1:24b | rocm | 15000 | 89.47 | 13.65 | 371 | 20.68 | 22.4 | 68 | 28.47 | 2.52 | 38.67 | 61.33 |
gemma3:27b | rocm | 17000 | 46.52 | 14.14 | 22 | 1.6 | 14.32 | 61 | 30.78 | 2.01 | 37.67 | 81 |
qwen2.5-coder:32b | rocm | 20000 | 27.93 | 18.25 | 42 | 4.01 | 10.49 | 51.5 | 30.68 | 1.68 | 35 | 50.5 |
Key Observations:
- Performance Paradox: llama3.2:1b achieves the highest prompt-processing rate of any model tested (581.82 t/s) despite being only 1.3 GB.
- Memory Efficiency: qwen2.5:14b delivers a 6.78 t/s eval rate at a 9 GB model size.
- Anomaly: gemma3:12b shows significant CPU utilization (average 16%, max 69.17%) despite ROCm offloading; a way to reproduce this measurement is sketched after this list.
- Throughput King: gemma3:1b maintains a 63.47 t/s eval rate at a sub-1 GB model size.
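One way to reproduce the CPU Avg / CPU Max columns, and to dig into anomalies like gemma3:12b's, is to sample system-wide CPU utilization while a generation request runs. A minimal sketch, assuming psutil and a local Ollama server (the 0.5 s sampling interval is a guess, not the original harness's):

```python
# Sketch: sample CPU utilization while a generation runs in a background
# thread, then report the average and maximum, mirroring the CPU Avg /
# CPU Max columns. Assumes psutil and a local Ollama server.
import threading
import psutil
import requests

def generate(model: str, prompt: str) -> None:
    requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )

samples: list[float] = []
worker = threading.Thread(
    target=generate, args=("gemma3:12b", "Summarize the theory of relativity.")
)
worker.start()
while worker.is_alive() or not samples:
    # cpu_percent blocks for the interval and returns system-wide usage.
    samples.append(psutil.cpu_percent(interval=0.5))
worker.join()
print(f"cpu avg {sum(samples) / len(samples):.2f}%, cpu max {max(samples):.2f}%")
```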
Top Performers:
- Gemma3:1b - Fastest generation overall (63.47 t/s eval rate) with minimal RAM usage.
- Llama3.2:1b - Best prompt processing (581.82 t/s) despite a more modest eval rate (50.57 t/s).
- Qwen3:1.7b - Strong balance of capability and speed among the small models, with an eval rate of 41.59 t/s.
Unexpected Findings:
- Gemma3:4b shows notable CPU spikes (average 9%, max 52.67%) despite GPU offloading.
- Llama3.1:8b (12.43 t/s eval rate) is outpaced by the smaller Mistral:7b (16.92 t/s), despite being only slightly larger (4.7 GB vs 4.1 GB).
- The 14B models land within a narrow eval-rate band (6.74-7.43 t/s), with Qwen2.5:14b coming in slightly faster than its coder variant (6.78 t/s vs 6.74 t/s).
Comparative Analysis of AMD Radeon 780M (Ryzen 7 8745HS) and NVIDIA RTX 4060 Ti Performance in Local LLM Inference
Local LLM Benchmark Data on GPU: RTX 4060 Ti
- GPU: NVIDIA RTX 4060 Ti (16 GB VRAM)
- CPU: Intel Core i5-13400
- RAM: 64 GB
Hardware Comparison
Specification | Radeon 780M | RTX 4060 Ti |
---|---|---|
VRAM | 35 GB shared | 16 GB dedicated |
Memory Bandwidth | ~80 GB/s (system RAM) | 288 GB/s GDDR6 |
TDP | 54 W (total system) | 160 W (GPU only) |
Architecture | RDNA3 | Ada Lovelace |
Software Stack | ROCm 6.0 | CUDA 12.2 |
Performance by Model Category
Small Models (1-7B Parameters):
Model | AMD Eval Rate | NVIDIA Eval Rate | Difference |
---|---|---|---|
Gemma3:1b | 63.47 t/s | 104.27 t/s | +64% NVIDIA |
Mistral:7b | 16.92 t/s | 60.35 t/s | +257% NVIDIA |
Qwen3:1.7b | 41.59 t/s | 161.64 t/s | +289% NVIDIA |
Medium Models (8-14B Parameters):
Model | AMD Eval Rate | NVIDIA Eval Rate | Difference |
---|---|---|---|
Llama3.1:8b | 12.43 t/s | 54.85 t/s | +341% NVIDIA |
Qwen2.5:14b | 6.78 t/s | 28.81 t/s | +325% NVIDIA |
Deepseek-R1:14b | 7.00 t/s | 28.26 t/s | +304% NVIDIA |
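For reference, the Difference column in both tables is simply the relative NVIDIA advantage derived from the two eval rates; a one-liner reproduces it (the function name is illustrative):

```python
# How the Difference column above is computed: relative NVIDIA advantage.
def advantage(amd_tps: float, nvidia_tps: float) -> str:
    return f"+{(nvidia_tps / amd_tps - 1) * 100:.0f}% NVIDIA"

print(advantage(12.43, 54.85))  # Llama3.1:8b -> "+341% NVIDIA"
```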
Key Findings
Architecture Optimization:
- NVIDIA significantly outperforms AMD in small to medium models, with advantages ranging from +64% to +341%.
- While AMD's Radeon 780M is a capable integrated solution, a dedicated GPU like the RTX 4060 Ti shows a clear advantage in LLM inference speed thanks to its higher memory bandwidth and an architecture optimized for AI workloads (see the back-of-envelope check below).
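The bandwidth point can be made concrete with a rough roofline estimate: at batch size 1, generating each token streams roughly the entire model through memory, so the eval rate is capped near bandwidth divided by model size. A minimal sketch using the qwen2.5:14b numbers from the tables above (it ignores KV-cache traffic and other overheads):

```python
# Back-of-envelope roofline: at batch size 1, each generated token streams
# roughly the full model weights, so eval rate <= bandwidth / model size.
# Rough model only -- it ignores KV-cache traffic and kernel overheads.
def max_tps(bandwidth_gb_s: float, model_gb: float) -> float:
    return bandwidth_gb_s / model_gb

model_gb = 9.0  # qwen2.5:14b quantized size from the table above
print(f"780M ceiling:    {max_tps(80, model_gb):.1f} t/s (measured 6.78)")
print(f"4060 Ti ceiling: {max_tps(288, model_gb):.1f} t/s (measured 28.81)")
```

Both measured rates land at roughly 75-90% of their theoretical ceilings, consistent with memory bandwidth being the dominant limit on both systems.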
Published on 5/29/2025