Apple’s iPhone 17 Pro has pulled off the unlikely feat of running a 400-billion-parameter Large Language Model (LLM), despite hardware that would seem to rule it out. Conventional wisdom says a model of that size needs roughly 200GB of RAM, but the open-source Flash-MoE project sidesteps that constraint by streaming the model’s weights from the phone’s SSD to the GPU on demand, so the full model never has to sit in memory.

The trade-off is speed: generation crawls along at just 0.6 tokens per second, or roughly one word every 1.5 to 2 seconds. Frustrating as that is for users, the demonstration shows what on-device LLMs on smartphones could become. Notably, the approach keeps inference entirely on the device, ensuring 100% privacy and working without an internet connection, albeit at a steep cost in battery life. In short, running a 400B LLM on an iPhone 17 Pro is feasible, but significant optimization will be needed before it is practical.
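The article does not detail Flash-MoE’s internals, but the core idea it describes — keeping expert weights on storage, memory-mapping them, and reading only the few experts the router selects for each token — can be sketched in plain Python with NumPy’s `memmap`. Everything below, from the layer sizes to the toy router, is an illustrative assumption, not Flash-MoE’s actual code:

```python
import os
import tempfile
import numpy as np

# Toy Mixture-of-Experts layer whose expert weights live on disk.
# The real project streams flash-resident weights to the GPU; this sketch
# only shows the principle: memory-map the weight file and touch just the
# experts the router picks, so resident RAM stays far below model size.
NUM_EXPERTS, DIM = 8, 4

# Write toy expert weights (NUM_EXPERTS square matrices) to a file,
# standing in for the model checkpoint on the phone's SSD.
path = os.path.join(tempfile.mkdtemp(), "experts.bin")
np.arange(NUM_EXPERTS * DIM * DIM, dtype=np.float32).tofile(path)

# Memory-map instead of loading: the OS pages in only what is accessed.
experts = np.memmap(path, dtype=np.float32, mode="r",
                    shape=(NUM_EXPERTS, DIM, DIM))

# Fixed toy routing matrix; a real router is a learned projection.
router = np.linspace(0.0, 1.0, DIM * NUM_EXPERTS,
                     dtype=np.float32).reshape(DIM, NUM_EXPERTS)

def moe_forward(x, top_k=2):
    """Route the input to top_k experts and combine their outputs."""
    scores = x @ router                      # one score per expert
    chosen = np.argsort(scores)[-top_k:]     # indices of top_k experts
    # Only the chosen experts' pages are read from the mapped file.
    return sum(experts[e] @ x for e in chosen)

out = moe_forward(np.ones(DIM, dtype=np.float32))
print(out.shape)  # (4,)
```

Because the file is mapped rather than loaded, a layer with hundreds of experts occupies almost no RAM until an expert is actually used; the cost shifts to storage bandwidth, which is consistent with the slow token rate reported above.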