# Small Language Models

**Small Language Models** (SLMs) are the unsung heroes of the bitGPT Network, offering a smarter way to deliver AI capabilities. Think of them as a specialized tool in your workshop: where a large language model is a multi-tool that can handle almost anything, an SLM is a precision screwdriver, crafted for a specific task and designed to get that job done efficiently.

SLMs are smaller, more efficient, and tailored for edge devices, making them ideal for localized AI operations. They’re not meant to be general-purpose like larger models; instead, they’re specialized to handle specific tasks, whether it’s managing your crypto transactions, analyzing market trends, or automating routine processes. This focus on specialization means they can operate faster and with less resource consumption, all while keeping your data private and secure on your device.

<figure><img src="/files/AdjCotIWDZ9Hy7ycQy2c" alt=""><figcaption><p>From Large Language Models (LLMs) to Small Language Models (SLMs)</p></figcaption></figure>

The transition to SLMs is a key part of bitGPT’s vision for decentralizing AI. It starts with pre-trained large language models that provide the initial AI capabilities. Over time, these models are **distilled into smaller, more efficient versions** through a process called **knowledge distillation**. The resulting SLMs are then fine-tuned on your device, using your data and preferences to become even more effective at the tasks you care about. This gradual evolution keeps the network scalable, cost-efficient, and adaptable to different use cases.

bitGPT uses two kinds of model training:

1. **Knowledge distillation** from larger "parent" models into smaller "child" models.
2. **Reasoning-based fine-tuning**, which transfers reasoning abilities into small language models and makes the relevant data highly specific to the operation being performed.
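
As a rough illustration of the first item, the sketch below computes a common form of distillation loss: the KL divergence between the temperature-softened output distributions of a parent and a child model. It is a minimal sketch under stated assumptions, not bitGPT's actual training code; the function names, the `temperature` parameter, and the example logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities, softened by `temperature`."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(parent_logits, child_logits, temperature=2.0):
    """KL(parent || child) over temperature-softened distributions.

    Hypothetical helper: a higher temperature exposes more of the parent's
    relative preferences among non-top answers for the child to imitate.
    """
    p = softmax(parent_logits, temperature)
    q = softmax(child_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    # Scaling by T^2 is a common convention that keeps gradient magnitudes
    # comparable across temperatures.
    return kl * temperature ** 2


# Illustrative logits only; real values would come from the two models.
parent = [4.0, 1.0, 0.5]
child = [2.5, 1.5, 1.0]
print(distillation_loss(parent, child))
```

In a training loop, the child model's parameters would be updated to reduce this loss, usually alongside a standard task loss.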

From there, we use a combination of **federated learning** and **differential privacy** to aggregate data across multiple anonymous clients, directing the processed data to a central server for fine-tuning. Execution is handled by bridging this data between the various client servers and the central server. bitGPT also passes the data through an anonymization layer so that any user-related data is removed before model fine-tuning.
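
To make the aggregation step more concrete, here is a minimal sketch of one widely used pattern: each client sends a numeric update, the server clips each update to a maximum norm, adds Gaussian noise, and averages the result. This is an assumption about how such a pipeline could look, not a description of bitGPT's implementation. The names `clip_update` and `aggregate_updates` and the parameters `max_norm` and `noise_std` are hypothetical, and the noise level here is not calibrated to any formal privacy budget.

```python
import math
import random


def clip_update(update, max_norm):
    """Scale an update down so its L2 norm does not exceed `max_norm`."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def aggregate_updates(client_updates, max_norm=1.0, noise_std=0.0, rng=None):
    """Average clipped client updates, adding Gaussian noise to each sum.

    Clipping bounds how much any single client can influence the result;
    the noise masks individual contributions (hypothetical settings).
    """
    rng = rng or random.Random()
    clipped = [clip_update(u, max_norm) for u in client_updates]
    n = len(clipped)
    dim = len(clipped[0])
    averaged = []
    for i in range(dim):
        total = sum(u[i] for u in clipped) + rng.gauss(0.0, noise_std)
        averaged.append(total / n)
    return averaged


# Three illustrative client updates; real ones would come from devices.
updates = [[0.2, -0.1], [0.4, 0.3], [3.0, 4.0]]
print(aggregate_updates(updates, max_norm=1.0, noise_std=0.05, rng=random.Random(0)))
```

The third update has norm 5 and is scaled down to norm 1 before averaging, so an outlying client cannot dominate the aggregate.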

This process allows Delegates to run on distributed machines, potentially entirely on the client side, by sending inference requests either to a locally running SLM or externally to a larger host. Our architecture allows models to run constantly, because inference and agents are invoked only on a scheduled basis (e.g., once every few hours or once per month) or on an event-driven basis (i.e., when a particular event happens, an execution loop is triggered).
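
The two invocation modes and the local-or-external routing can be sketched as follows. This is an illustrative sketch only: the `Delegate` class, its fields, and the `route_inference` and `tick` helpers are hypothetical names, and the rule "prefer the local SLM when one is available" is an assumption rather than documented bitGPT behavior.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Delegate:
    """Hypothetical agent with a scheduled and/or event-driven trigger."""
    name: str
    interval_seconds: Optional[float] = None  # scheduled trigger
    event_type: Optional[str] = None          # event-driven trigger
    last_run: float = float("-inf")

    def should_run(self, now: float, event: Optional[str] = None) -> bool:
        # Event-driven: a matching event triggers an execution loop.
        if event is not None and event == self.event_type:
            return True
        # Scheduled: run once the configured interval has elapsed.
        if self.interval_seconds is not None:
            return now - self.last_run >= self.interval_seconds
        return False


def route_inference(prompt: str,
                    local_slm: Optional[Callable[[str], str]] = None,
                    remote_host: Optional[Callable[[str], str]] = None) -> str:
    """Use the local SLM if present, otherwise fall back to a larger host."""
    if local_slm is not None:
        return local_slm(prompt)
    if remote_host is not None:
        return remote_host(prompt)
    raise RuntimeError("no inference backend configured")


def tick(delegate: Delegate, now: float, event: Optional[str] = None,
         local_slm=None, remote_host=None) -> Optional[str]:
    """Invoke the delegate only when one of its triggers fires."""
    if not delegate.should_run(now, event):
        return None
    delegate.last_run = now
    return route_inference(f"run {delegate.name}", local_slm, remote_host)


# Stand-in backend; a real one would call an actual model.
hourly = Delegate(name="market-scan", interval_seconds=3600)
print(tick(hourly, now=0.0, local_slm=lambda p: f"local:{p}"))
print(tick(hourly, now=1800.0, local_slm=lambda p: f"local:{p}"))
```

Between triggers, `tick` returns `None` and no inference request is sent, which is the behavior the paragraph above describes for both scheduled and event-driven execution.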

The result is a network where AI capabilities are **no longer tied to centralized servers** but are instead distributed across edge devices, empowering users with more control over their data and experiences.


