DeepSeek-V3 now runs at 20 tokens per second on Mac Studio, and that’s a nightmare for OpenAI

misk@sopuli.xyz · 7 days ago


Fluffy Kitty Cat@slrpnk.net · 7 days ago

It’s the generation speed, i.e. how many tokens the model outputs per second. Internally, LLMs work with tokens, which represent either whole words or parts of words, and map each token to an integer value. The model then predicts which integer is most likely to come after the input. How the words are split up is an implementation detail that can vary from model to model.
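To make the word-vs-subword idea concrete, here's a toy sketch. The vocabulary, the IDs, and the helper names (`tokenize`, `detokenize`, `generation_seconds`) are all made up for illustration. Real tokenizers such as BPE learn their vocabularies from data, and this is not DeepSeek's actual tokenizer. It uses a simple greedy longest-match split, so "tokens" becomes two tokens ("token" + "s") because only "token" is in the vocabulary.

```python
# Hypothetical hand-made vocabulary, NOT any real model's tokenizer.
VOCAB = {
    "un": 0, "believ": 1, "able": 2, "token": 3, "s": 4, " ": 5,
    "per": 6, "second": 7,
}


def tokenize(text: str) -> list[int]:
    """Greedy longest-match split of text into vocabulary pieces, returned as integer IDs."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest vocabulary entry that matches starting at position i.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = end
                break
        else:
            raise ValueError(f"no vocabulary entry matches at position {i}: {text[i:]!r}")
    return ids


def detokenize(ids: list[int]) -> str:
    """Map integer IDs back to their text pieces and join them."""
    inverse = {v: k for k, v in VOCAB.items()}
    return "".join(inverse[i] for i in ids)


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a constant generation speed."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    ids = tokenize("unbelievable tokens per second")
    print(ids)              # [0, 1, 2, 5, 3, 4, 5, 6, 5, 7]
    print(detokenize(ids))  # unbelievable tokens per second
    print(generation_seconds(1000, 20))  # 50.0
```

So 4 words turn into 10 tokens here (spaces count too in this toy scheme), and at 20 tokens per second a 1,000-token answer would take about 50 seconds. The exact word-to-token ratio depends entirely on the model's tokenizer.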