Small Language Models
Small Language Models (SLMs) are the unsung heroes of the bitGPT Network, offering a smarter way to deliver AI capabilities. Think of them like a specialized tool in your workshop—while a large language model might be like a multi-tool that can handle anything, SLMs are like a precision screwdriver, crafted for specific tasks and designed to get the job done efficiently.
SLMs are smaller, more efficient, and tailored for edge devices, making them ideal for localized AI operations. They’re not meant to be general-purpose like larger models; instead, they’re specialized to handle specific tasks, whether it’s managing your crypto transactions, analyzing market trends, or automating routine processes. This focus on specialization means they can operate faster and with less resource consumption, all while keeping your data private and secure on your device.
The transition to SLMs is a key part of bitGPT’s vision for decentralizing AI. It starts with pre-trained large language models that provide the initial AI capabilities, but over time, these models are distilled into smaller, more efficient versions through a process called knowledge distillation. These SLMs are then fine-tuned on your device, using your data and preferences to become even more effective at the tasks you care about. This gradual evolution ensures that the network remains scalable, cost-efficient, and adaptable to different use cases.
bitGPT uses two kinds of model training:
Knowledge distillation from larger "parent" models to smaller, more efficient "child" models (a minimal sketch follows this list).
Reasoning-based fine-tuning, which transfers reasoning abilities into small language models and tailors the relevant data to be hyper-specific to the operation being performed.
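To make the distillation step concrete, here is a minimal PyTorch sketch of logit-based knowledge distillation. The temperature, blending weight, and the parent/child model objects are illustrative assumptions for this example, not bitGPT's actual training code.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft KL term (teacher -> student) with the usual
    hard-label cross-entropy. Temperature and alpha are illustrative."""
    # Softening both distributions lets the KL term transfer the
    # teacher's knowledge about relative class probabilities.
    soft_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, log_target=True,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

def train_step(parent, child, optimizer, batch, labels):
    """One illustrative step: the frozen 'parent' model teaches the
    small 'child' model on the same batch."""
    with torch.no_grad():                 # the parent is not updated
        teacher_logits = parent(batch)
    student_logits = child(batch)
    loss = distillation_loss(student_logits, teacher_logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice, the same loop works for reasoning-based fine-tuning by swapping the batch for task-specific reasoning traces; the loss structure stays the same.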
From there, we use a combination of federated learning and differential privacy to aggregate data across multiple anonymous clients, directing the processed data to a central server for fine-tuning. Execution is handled by bridging this data between the various client servers and the central server. bitGPT also passes the data through an anonymization layer to ensure any user-related data is removed prior to model fine-tuning.
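A minimal sketch of how federated averaging with differential privacy can work is shown below. The clipping norm and noise scale are placeholder values, and real deployments calibrate the noise to the clipping norm to meet a formal privacy budget; this is not bitGPT's production aggregation code.

```python
import torch

def privatize_update(update, clip_norm=1.0, noise_std=0.1):
    """Clip a client's model update to a fixed norm and add Gaussian
    noise, so an individual client's data cannot be recovered from it."""
    flat = torch.cat([p.flatten() for p in update])
    scale = min(1.0, clip_norm / (float(flat.norm()) + 1e-12))
    return [p * scale + noise_std * torch.randn_like(p) for p in update]

def federated_average(client_updates):
    """Server-side step: average the privatized client updates.
    The server only ever sees noised deltas, never raw user data."""
    return [torch.stack(list(params)).mean(dim=0)
            for params in zip(*client_updates)]

# Toy run: three anonymous clients, each with a two-tensor update.
client_updates = [[torch.randn(4, 4), torch.randn(4)] for _ in range(3)]
noised = [privatize_update(u) for u in client_updates]
global_delta = federated_average(noised)  # applied during central fine-tuning
```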
This process allows Delegates to run on distributed machines, potentially entirely on the client side, by sending inference requests either to a locally running SLM or externally to a larger host. Our architecture allows models to remain running constantly, since inference and agents are invoked only on a scheduled basis (e.g., once every few hours or once per month) or on an event-driven basis (i.e., an execution loop is triggered when a particular event happens).
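The sketch below shows how such a Delegate loop might be wired: a resident model stays idle until a schedule tick or an incoming event triggers execution. The endpoint URL and the run_local_slm / post_to_remote helpers are hypothetical placeholders, not part of bitGPT's actual API.

```python
import queue
import time

REMOTE_HOST = "https://example-host/infer"  # hypothetical external endpoint

def run_local_slm(prompt: str) -> str:
    """Placeholder for on-device SLM inference."""
    return f"[local SLM] {prompt}"

def post_to_remote(url: str, prompt: str) -> str:
    """Placeholder for an HTTP request to a larger hosted model."""
    return f"[remote {url}] {prompt}"

def infer(prompt: str, local_available: bool = True) -> str:
    # Prefer the locally running SLM; fall back to a larger host.
    if local_available:
        return run_local_slm(prompt)
    return post_to_remote(REMOTE_HOST, prompt)

def run_delegate(events: "queue.Queue[str]", interval_seconds: float = 3600.0):
    """The Delegate stays idle; an execution loop runs only on a
    schedule tick or when an event arrives on the queue."""
    next_tick = time.monotonic() + interval_seconds
    while True:
        try:
            event = events.get(timeout=1)        # event-driven trigger
            print(infer(f"handle event: {event}"))
        except queue.Empty:
            pass
        if time.monotonic() >= next_tick:        # scheduled trigger
            print(infer("run scheduled task"))
            next_tick = time.monotonic() + interval_seconds
```

Because the loop only does work when triggered, a resident model adds negligible compute cost between invocations, which is what makes constant on-device availability affordable.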
The result is a network where AI capabilities are no longer tied to centralized servers but are instead distributed across edge devices, empowering users with more control over their data and experiences.