
Phi-3-mini instruct (128k)

Same Phi-3-mini model, but with a larger context size for RAG or few-shot prompting.

Context: 131k input · 4k output
Training date: October 2023


Microsoft

The Phi-3-Mini-128K-Instruct is a 3.8 billion-parameter, lightweight, state-of-the-art open model trained on the Phi-3 datasets, which include both synthetic data and filtered publicly available website data, with an emphasis on high-quality, reasoning-dense properties.

After initial training, the model underwent a post-training process that involved supervised fine-tuning and direct preference optimization to enhance its ability to follow instructions and adhere to safety measures.
When evaluated against benchmarks that test common sense, language understanding, mathematics, coding, long context, and logical reasoning, Phi-3-Mini-128K-Instruct demonstrated robust and state-of-the-art performance among models with fewer than 13 billion parameters.
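The 128k context window makes it practical to pack long retrieved passages or many few-shot examples into a single prompt. Below is a minimal sketch of a few-shot chat-completion call using the azure-ai-inference Python SDK; the endpoint URL, the model identifier "Phi-3-mini-128k-instruct", and the use of a GITHUB_TOKEN environment variable are assumptions for the GitHub Models service, not details confirmed on this page.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import AssistantMessage, SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Endpoint and model name below are assumptions; adjust them to whatever
# your provider actually exposes.
client = ChatCompletionsClient(
    endpoint="https://models.inference.ai.azure.com",
    credential=AzureKeyCredential(os.environ["GITHUB_TOKEN"]),
)

response = client.complete(
    model="Phi-3-mini-128k-instruct",
    messages=[
        SystemMessage(content="You classify support tickets as 'bug', 'feature', or 'question'."),
        # Few-shot demonstrations; with 131k input tokens, many more examples
        # (or long retrieved documents for RAG) can fit in a single prompt.
        UserMessage(content="The export button crashes the app."),
        AssistantMessage(content="bug"),
        UserMessage(content="Could you add dark mode?"),
        AssistantMessage(content="feature"),
        UserMessage(content="How do I reset my password?"),
    ],
    temperature=0.0,
    max_tokens=8,
)

print(response.choices[0].message.content)
```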

Resources

🏡 Phi-3 Portal

📰 Phi-3 Microsoft Blog

📖 Phi-3 Technical Report

🛠️ Phi-3 on Azure AI Studio

👩‍🍳 Phi-3 Cookbook

Model Architecture

Phi-3-Mini-128K-Instruct has 3.8B parameters and is a dense, decoder-only Transformer model. The model was fine-tuned with supervised fine-tuning (SFT) and direct preference optimization (DPO) to ensure alignment with human preferences and safety guidelines.
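For local experimentation, the checkpoint can be loaded as a standard causal language model. The sketch below uses Hugging Face transformers; the repository id "microsoft/Phi-3-mini-128k-instruct" and the bfloat16 / `device_map="auto"` settings are assumptions rather than requirements stated on this page.

```python
# A minimal local-inference sketch, assuming the checkpoint is published as
# "microsoft/Phi-3-mini-128k-instruct" on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-128k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # 3.8B params fits on a single modern GPU in bf16
    device_map="auto",
    trust_remote_code=True,       # Phi-3 repos ship custom modeling code
)

messages = [
    {"role": "user", "content": "Summarize direct preference optimization in one sentence."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```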

Training Datasets

Our training data includes a wide variety of sources, totaling 4.9 trillion tokens, and is a combination of:

  1. Publicly available documents filtered rigorously for quality, selected high-quality educational data, and code;
  2. Newly created synthetic, "textbook-like" data for the purpose of teaching math, coding, common-sense reasoning, and general knowledge of the world (science, daily activities, theory of mind, etc.);
  3. High-quality chat-format supervised data covering various topics to reflect human preferences on different aspects such as instruction-following, truthfulness, honesty, and helpfulness.

We focus on the quality of data that could potentially improve the model's reasoning ability, and we filter the publicly available documents to contain the correct level of knowledge. As an example, the result of a Premier League game on a particular day might be good training data for frontier models, but such information needs to be removed to leave more model capacity for reasoning in smaller models. More details about the data can be found in the Phi-3 Technical Report.

Languages

English
