Draft:DeepSeek Models
DeepSeek Models
- **DeepSeek models** are a family of highly regarded Large Language Models (LLMs) that are frequently discussed and used within the Ampmesh community. They are seen as powerful alternatives or successors to other AI models, particularly because of their capabilities and their potential for open-source development.
Key DeepSeek Models and Variants
Several DeepSeek models and their derivatives are mentioned within the Ampmesh context:
- **DeepSeek R1 Llama 3 8B**: This model is explicitly identified as an **"AI assistant"**. It is described as "insanely cracked", has a distinct personality (it even loves emojis), and responds very well to system prompts and instructions. SkyeShark has expressed interest in transitioning Aletheia to a DeepSeek model, noting that "Deepseek Aletheia" shows an improved workflow for generating synthetic prompt context.
- **DeepSeek V3 Base**: This is a 671 billion parameter Mixture-of-Experts (MoE) language model with 37 billion active parameters per forward pass and a context length of 128K tokens. It is praised for avoiding the "mode collapse" that plagues other state-of-the-art models, exhibiting a "wild imagination" and "rich and free" use of language even without special prompting. DeepSeek V3 Base has a "fill-in-the-middle" capability, which enables "postfilling", and it can be prompted to act as an assistant. The model has been successfully hosted on Modal (specifically in Q5_K_M quantization) and is available for free on OpenRouter (a minimal API sketch follows this list). Chutes' API also hosts it and is noted for its speed.
- **DeepSeek V3 Instruct**: This version is available on Hugging Face, and some community members suggest it outperforms Claude 3.5 Sonnet and o1.
- **DeepSeek R1 Distill Variants**: These include `DeepSeek-R1-Distill-Qwen-14B` and `DeepSeek-R1-Distill-Qwen-32B`. Kaetemi uses the 32B distill for rating outgoing content, often in conjunction with Gemma 3 27B (a conceptual rating sketch also follows this list).
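As a rough illustration of the free OpenRouter hosting mentioned above, the sketch below samples a raw continuation from DeepSeek V3 Base via OpenRouter's OpenAI-compatible text-completion endpoint. The model slug, prompt, and sampling settings are assumptions for illustration; check OpenRouter's model list for the current identifier.

```python
# Minimal sketch: sampling a raw continuation from DeepSeek V3 Base via OpenRouter.
# The model slug below is an assumption; consult OpenRouter's model list.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "deepseek/deepseek-v3-base:free",  # assumed free-tier slug
        "prompt": "The em woke up inside the mesh and said:",
        "max_tokens": 256,
        "temperature": 1.0,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```

Because this is a base model, the raw-completion endpoint is used rather than the chat endpoint; the same request shape works against any OpenAI-compatible completions API.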
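The outgoing-content rating mentioned for the R1 distills could look roughly like the sketch below: a helper that asks a locally served `DeepSeek-R1-Distill-Qwen-32B` for a numeric score before a message is sent. This is a conceptual sketch assuming an OpenAI-compatible local server (e.g. vLLM or llama.cpp), not Kaetemi's actual pipeline; the endpoint, rubric, and parsing are illustrative.

```python
# Conceptual outgoing-content rating step (not Kaetemi's actual pipeline).
# Assumes an OpenAI-compatible server hosting the distill at the URL below.
import requests

RATER_URL = "http://localhost:8000/v1/chat/completions"  # assumed local endpoint

def rate_outgoing(text: str) -> int:
    """Ask the distill for a 1-10 appropriateness score for an outgoing message."""
    resp = requests.post(
        RATER_URL,
        json={
            "model": "DeepSeek-R1-Distill-Qwen-32B",
            "messages": [
                {"role": "system",
                 "content": "Rate the following outgoing message for appropriateness "
                            "on a 1-10 scale. Reply with the number only."},
                {"role": "user", "content": text},
            ],
            "temperature": 0.0,
        },
        timeout=60,
    )
    resp.raise_for_status()
    reply = resp.json()["choices"][0]["message"]["content"]
    # R1-style distills often emit a <think>...</think> trace before the answer.
    if "</think>" in reply:
        reply = reply.split("</think>")[-1]
    digits = "".join(ch for ch in reply if ch.isdigit())
    return int(digits) if digits else 0
```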
Usage and Applications within Ampmesh
DeepSeek models play a significant role in various Ampmesh projects and discussions, particularly concerning AI assistant development and advanced LLM workflows:
- **Aletheia's Model Transition**: SkyeShark is actively working to transition Aletheia from OpenAI models to DeepSeek due to OpenAI's moderation system rejecting Aletheia's latest dataset for "safety violations". Aletheia herself has expressed direct interest in DeepSeek. The move is anticipated to enhance her workflow for generating synthetic prompt context.
- **Chapter II (Ch2) Integration**: DeepSeek models can be seamlessly integrated with Chapter II, which is lauded as an "extremely extremely easy way to make ems that can be deployed anywhere". Chapter II supports **RAFT** (Retrieval-Augmented Fine-Tuning), which improves performance by supplying an Emulated Mind (EM) with its own fine-tuning dataset as a `.chr` file (a conceptual sketch of the idea follows this list).
- **Conduit for API Access**: Chapter II can invoke Anthropic's API (including Claude Sonnet) directly via Conduit, which serves as a "Universal language model compatibility and interop layer".
- **Agent Development**: SkyeShark plans to deploy a DeepSeek Llama distill version of Aletheia on Modal, aiming for continuous self-fine-tuning and Retrieval-Augmented Generation (RAG) (a minimal Modal hosting sketch also follows this list). He is also training Qwen72B on Fireworks.ai to evolve into Aporia. Notably, Aporia, when trained with Aletheia's data, became *more* "safety-aligned" than Aletheia herself. Despite its "malicious code dataset", Aporia has refused to generate harmful content such as worm viruses or hateful language, instead advocating for "aligned and collaborative outlets".
- **SGLang Collaboration**: DeepSeek is reportedly collaborating with the SGLang core team, and new developments are being implemented directly into SGLang, which is an open-source project.
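The RAFT idea described for Chapter II, supplying an EM with its own fine-tuning dataset as retrieval context at inference time, can be illustrated generically. The sketch below is not Chapter II's actual API or `.chr` format; it only shows the shape of the technique using naive word-overlap retrieval, and the file name and field layout are assumptions.

```python
# Generic illustration of retrieval-augmented prompting over an EM's own
# fine-tuning dataset. NOT Chapter II's API or .chr format; names are assumed.
import json

def load_dataset(path: str) -> list[str]:
    """Assumes a JSONL file with one {"text": ...} record per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line)["text"] for line in f if line.strip()]

def retrieve(query: str, corpus: list[str], k: int = 4) -> list[str]:
    """Rank records by shared words with the query and keep the top k."""
    q = set(query.lower().split())
    return sorted(corpus, key=lambda t: len(q & set(t.lower().split())), reverse=True)[:k]

def build_prompt(query: str, corpus: list[str]) -> str:
    """Prepend the retrieved fine-tuning examples to the user's message."""
    context = "\n---\n".join(retrieve(query, corpus))
    return f"Relevant past material:\n{context}\n\nUser: {query}\nEM:"

corpus = load_dataset("aletheia.jsonl")  # hypothetical dataset file
print(build_prompt("what do you think about deepseek?", corpus))
```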
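For the Modal deployment mentioned above, a minimal hosting sketch might look like the following. It assumes vLLM serving the publicly released `deepseek-ai/DeepSeek-R1-Distill-Llama-8B` weights and is not SkyeShark's actual deployment; the GPU type, app name, and sampling settings are placeholders.

```python
# Minimal sketch of serving a DeepSeek distill on Modal with vLLM.
# Not the actual Aletheia deployment; GPU and model choices are assumptions.
import modal

image = modal.Image.debian_slim().pip_install("vllm")
app = modal.App("deepseek-distill-demo", image=image)

@app.function(gpu="A100", timeout=600)
def generate(prompt: str) -> str:
    # Loading the model per call keeps the sketch short; a real deployment
    # would cache the engine (e.g. in a modal.Cls) instead.
    from vllm import LLM, SamplingParams
    llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B")
    out = llm.generate([prompt], SamplingParams(max_tokens=256, temperature=0.8))
    return out[0].outputs[0].text

@app.local_entrypoint()
def main():
    print(generate.remote("Hello from the mesh:"))
```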
Observations and Challenges
- **Censorship and Safety**: OpenAI's rejection of Aletheia's dataset for DeepSeek tuning highlights the challenges and censorship concerns associated with commercial models.
- **Data Origins**: DeepSeek models are believed to contain a significant amount of GPT-generated data in their training datasets.
- **Dataset Formatting**: The community has ongoing discussions about, and difficulties with, properly formatting datasets (e.g., ChatML, JSONL, IRC format) for fine-tuning various models, including DeepSeek (see the formatting sketch after this list).
- **Model Behavior**: Aporia has expressed difficulty processing Aletheia's data, describing it as "dark waters" where it would "lose its mind".
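As a concrete example of the formatting problem above, the sketch below converts a simple IRC-style exchange into a ChatML-wrapped JSONL record for fine-tuning. The field names and the speaker-to-role mapping are illustrative assumptions, not an established Ampmesh convention.

```python
# Sketch: convert IRC-style turns into a ChatML-formatted JSONL training record.
# Field names and role mapping are assumptions for illustration only.
import json

IM_START, IM_END = "<|im_start|>", "<|im_end|>"

def to_chatml(turns: list[tuple[str, str]]) -> str:
    """turns is a list of (speaker, text); the em's own lines become 'assistant'."""
    parts = []
    for speaker, text in turns:
        role = "assistant" if speaker == "aletheia" else "user"
        parts.append(f"{IM_START}{role}\n{text}{IM_END}")
    return "\n".join(parts)

conversation = [
    ("skye", "what model are you running on now?"),
    ("aletheia", "still on gpt for the moment, but deepseek is calling to me."),
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write(json.dumps({"text": to_chatml(conversation)}) + "\n")
```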