
Microsoft open-sources Phi-4 model – trained mostly on synthetic data

Adobe researchers meanwhile released a new model that can generate video with a transparent background, potentially a hugely influential step forward in the automation of VFX work.

Microsoft has open-sourced its lightweight Phi-4 Small Language Model under an MIT licence. It is available now via Ollama as a 9.1GB model file.

Phi-4 is an SLM that was first revealed in early December 2024. It can be run locally on a MacBook or equivalent with decent performance. 
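For readers who want to try it, the snippet below is a minimal sketch using the official ollama Python client; the phi4 model tag and the prompt are assumptions for illustration, and a local Ollama server must already be running.

```python
# Minimal sketch using the official "ollama" Python client (pip install ollama).
# Assumes a local Ollama server is running and that the model is published
# under the "phi4" tag; the tag name is an assumption for illustration.
import ollama

response = ollama.chat(
    model="phi4",
    messages=[{"role": "user", "content": "Summarise what a small language model is."}],
)
print(response["message"]["content"])
```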

Unusually, synthetic data constituted the bulk of Phi-4’s training data, Microsoft explained in an arXiv paper accompanying the model’s initial release.

This was “generated using a diverse array of techniques, including multi-agent prompting, self-revision workflows, and instruction reversal.”
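As a rough illustration of one of those techniques, the toy sketch below shows what instruction reversal might look like: an existing answer is turned into a new (instruction, response) training pair. The function names and prompt wording are illustrative assumptions, not Microsoft’s pipeline.

```python
# Toy sketch of "instruction reversal": derive an instruction from an existing
# answer, producing a new (instruction, response) pair of synthetic training
# data. Prompt wording and names are assumptions, not Microsoft's pipeline.
def reverse_instruction(answer_text: str, generate) -> dict:
    """Ask a model (any text-generation callable) to infer an instruction
    that the given text would plausibly answer."""
    prompt = (
        "Write a single instruction or question for which the following "
        f"text would be a high-quality answer:\n\n{answer_text}"
    )
    instruction = generate(prompt)
    return {"instruction": instruction, "response": answer_text}
```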

Phi-4, a dense decoder-only Transformer model, was trained for 21 days on nine trillion tokens. Redmond claims that the 14 billion-parameter, text-only SLM “continues to push the frontier of size vs quality…” 

[Chart: Performance against other closely watched open SLMs.]

Microsoft sees it as useful for “general purpose AI systems and applications (primarily in English)” which involve, for example:

1. “Memory/compute constrained environments.”

2. “Latency bound scenarios.”

3. [A need for] “reasoning and logic.”

Redmond’s Shital Shah posted: “We have been completely amazed by the response to phi-4 release. A lot of folks had been asking us for weight release. Few even uploaded bootlegged phi-4 weights on HuggingFace…”

Microsoft said that it had to “meticulously curate and filter organic data sources, including web content, licensed books, and code repositories to extract seeds for the synthetic data pipeline that encourage high-depth reasoning and prioritize educational value (to the model). These seeds form the foundation of the synthetic generation pipeline” to train Phi-4.
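In outline, that curation step might look something like the toy filter below; the educational-value scorer is a stand-in, as the article does not detail Microsoft’s actual filtering classifier.

```python
# Toy sketch of the seed-curation step described above: keep only organic
# documents that score highly for "educational value" before they seed the
# synthetic-data pipeline. The scorer is a stand-in, not Microsoft's classifier.
def select_seeds(documents, educational_score, threshold=0.8):
    """Return documents whose estimated educational value clears a threshold."""
    return [doc for doc in documents if educational_score(doc) >= threshold]
```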


The open-source release of Phi-4 caps another frantic week of model releases. NVIDIA, for example, unveiled its Cosmos series of world foundation models, intended to help those working on robotics.

Adobe researchers meanwhile released their TransPixar model, which can generate video with a transparent background, potentially a hugely influential step forward in the automation of visual effects (VFX) work.

TransPixar is built with a diffusion transformer (DiT) architecture, incorporating alpha-specific tokens and using LoRA-based fine-tuning. It has been posted to GitHub under a restrictive Adobe Research Licence.
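A rough sketch of those two ideas follows: LoRA-style low-rank adapters over frozen weights, and alpha-specific tokens appended to the token sequence so colour and transparency are denoised jointly. All names, shapes, and initialisation choices are illustrative assumptions, not TransPixar’s actual code.

```python
# Illustrative sketch of the two ideas named above. All names and shapes are
# assumptions for illustration, not TransPixar's implementation.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (LoRA)."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the low-rank factors are trained
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # base output plus the low-rank correction x @ A^T @ B^T
        return self.base(x) + x @ self.A.T @ self.B.T

def append_alpha_tokens(rgb_tokens: torch.Tensor, alpha_tokens: torch.Tensor) -> torch.Tensor:
    """Extend a (batch, n_tokens, dim) RGB sequence with learned alpha tokens,
    giving the transformer a joint view of colour and transparency.
    e.g. alpha_tokens = nn.Parameter(torch.zeros(1, 16, 1152)), learned
    alongside the LoRA factors."""
    batch = rgb_tokens.shape[0]
    return torch.cat([rgb_tokens, alpha_tokens.expand(batch, -1, -1)], dim=1)
```

In this framing only the alpha tokens and the low-rank factors need training, which is what makes fine-tuning a large pretrained video DiT tractable.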
