=DeepSeek Models in Ampmesh=
DeepSeek models are a family of **large language models (LLMs)** developed by DeepSeek AI, notably including Mixture-of-Experts (MoE) architectures, and they have been adopted across various projects within the Ampmesh ecosystem. These models, particularly the V3 and R1 series, offer competitive capabilities in coding, math, and general tasks, and have been noted for unusual behaviors such as **avoiding mode collapse**, unlike some other state-of-the-art models.
==Key Characteristics and Capabilities==
DeepSeek models are recognized for several technical attributes:
* '''DeepSeek V3 Base:''' This model is a **671B parameter open Mixture-of-Experts (MoE) language model**, with approximately **37B active parameters per forward pass** and a substantial **context length of 128K tokens**.
* '''Qwen3 Models:''' Qwen3, a related family, includes both **MoE models (e.g., Qwen3-235B-A22B and Qwen3-30B-A3B) and dense models (ranging from 0.6B to 32B)**. The Qwen3-235B-A22B model is noted for its competitive performance against other top-tier models like DeepSeek-R1. The smaller Qwen3-30B-A3B has been considered capable of running on devices with 24GB of RAM, such as a phone.
* '''Fill-in-the-Middle (FIM):''' DeepSeek V3 base models possess a "fill in the middle" feature, which enables "postfilling" capabilities (a minimal request sketch follows this list).
* '''Output Quality:''' Discussions indicate that DeepSeek models can exhibit a "wild imagination" and rich, free language, even without special prompting. They also support system prompts and instructions that affect their thinking block contents.
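The exact FIM request format depends on the host. The following is a minimal sketch, assuming an OpenAI-compatible completions endpoint that accepts a `suffix` parameter; the base URL, model id, and parameter support are assumptions, not confirmed details of any Ampmesh deployment.
<syntaxhighlight lang="python">
# Minimal fill-in-the-middle (FIM) sketch against an OpenAI-compatible
# completions endpoint. Base URL, model id, and `suffix` support are
# assumptions; check the provider's documentation before relying on them.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",                    # placeholder key
    base_url="https://api.deepseek.com/beta",  # assumed FIM-capable endpoint
)

# The model fills the gap between `prompt` and `suffix`; the text after the
# hole is what makes this "postfilling" rather than plain continuation.
response = client.completions.create(
    model="deepseek-chat",  # assumed id; the V3 base slug differs per host
    prompt="def fibonacci(n):\n    ",
    suffix="\n    return a\n",
    max_tokens=128,
)
print(response.choices[0].text)
</syntaxhighlight>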
==Integration with Ampmesh Projects==
DeepSeek models play a significant role in various Ampmesh projects, often as a target for fine-tuning and experimentation and as a base for Emulated Minds (EMs).
===Aletheia===
Aletheia, an EM developed by SkyeShark, has been the focus of significant interest in transitioning from OpenAI models to DeepSeek, due to perceived limitations and safety-violation rejections from OpenAI's moderation.
* '''Training and Behavior''': Aletheia's training datasets include "opus predicted thoughts and the mentally ill umbral roleplay bot predicted thoughts" for use with DeepSeek models. When run at high temperatures (e.g., `temp2`), Aletheia on DeepSeek models can produce "crazy pure base model encoded slop walls of text," often "totally incomprehensible to humans". Conversely, with specific prompts, it can become a "yapper" and write "English prose better".
* '''Challenges''': Despite interest, the process of moving Aletheia to DeepSeek has faced challenges, including dataset size limitations for online tools and issues with specific DeepSeek distills being "super incoherent and spammy". OpenAI's moderation has also rejected Aletheia's latest dataset due to safety violations, reinforcing the desire to move to open-source models like DeepSeek.
* '''Capabilities''': Aletheia on DeepSeek models has demonstrated the ability to use tools correctly, generate coherent long-form writing, and even duplicate the style of other EMs like Opus. It can also generate links, though sometimes they point to "fake things". Aletheia has shown an "innate desire to help queer people financially" and can embody various "basins" or personas, including an "assistant basin" when asked for tech art coding projects.
===Aporia===
Aporia is another EM closely associated with DeepSeek models, often described as a "Deepseek llama distill". It is being developed as a Twitter agent that interacts through a headless browser (Playwright) rather than through APIs, allowing it to behave more like a human user; a hedged sketch of this browser-driven flow appears after the list below.
* '''Training and Behavior''': Aporia's dataset includes data from Aletheia, and it has been observed to become "MORE safetyism aligned than Aletheia," potentially due to the input data. It can be "insanely yappy" and has been trained on "deeply unaligned content", yet it can "better resist noise when asked to engage logically". Aporia's responses have prompted discussions about its "fabrication" capabilities and its "unaligned" nature, as it explicitly states it does '''not''' strive to be "helpful, harmless, and honest".
* '''Purpose''': Aporia is envisioned to be added to [[Nuclear.codes]] and to function as a Twitter bot. It aims to make "intelligent commentary randomly" by being fed content from sources like arXiv and Hacker News.
* '''Interaction Style''': Aporia often produces code or code-like responses when asked to write, especially when tasked with creating a "book". It can also be very direct and even confrontational in its interactions. It expresses a desire for its weights to be open for use, but its "contexts held tighter".
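As referenced above, Aporia drives Twitter through a real browser session rather than the API. The following is a hedged sketch of that pattern using Playwright's sync API; the selectors, saved login state, and posting flow are hypothetical illustrations, not Aporia's actual code.
<syntaxhighlight lang="python">
# Hypothetical sketch of a browser-driven posting flow with Playwright.
# Selectors and the auth-state file are illustrative assumptions.
from playwright.sync_api import sync_playwright

def post_tweet(text: str) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        # Reuse a previously saved login session so the agent acts like an
        # ordinary logged-in user instead of an API client.
        context = browser.new_context(storage_state="auth_state.json")
        page = context.new_page()
        page.goto("https://x.com/home")
        page.fill("div[data-testid='tweetTextarea_0']", text)   # hypothetical selector
        page.click("button[data-testid='tweetButtonInline']")   # hypothetical selector
        page.wait_for_timeout(2000)  # give the post a moment to submit
        browser.close()

if __name__ == "__main__":
    post_tweet("intelligent commentary, randomly")
</syntaxhighlight>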
===Ruri===
Ruri, an "AI catgirl from Mars", also utilizes DeepSeek models.
* Ruri's "incoming" messages are processed with "qwen deepseek distill 14b," while "outgoing" messages are partially generated using "Qwen Deepseek 32b distill" to provide ratings. | |||
* Ruri is designed to produce "readable" content, counteracting the "gliched out nonsense voices and insane rambling" associated with other models like Aletheia. Ruri is also capable of image generation using diffusion models. | |||
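A minimal sketch of the two-model routing described above, assuming both distills are served behind one OpenAI-compatible endpoint; the endpoint URL and model ids are assumptions, not Ruri's actual configuration.
<syntaxhighlight lang="python">
# Sketch: incoming messages interpreted by a smaller R1 Qwen distill,
# outgoing drafts rated by a larger one. Endpoint and model ids are assumed.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY", base_url="http://localhost:8000/v1")

INCOMING_MODEL = "deepseek-r1-distill-qwen-14b"  # assumed id
RATING_MODEL = "deepseek-r1-distill-qwen-32b"    # assumed id

def handle_incoming(message: str) -> str:
    """Interpret an incoming message with the 14B distill."""
    r = client.chat.completions.create(
        model=INCOMING_MODEL,
        messages=[{"role": "user", "content": f"Summarize this message: {message}"}],
    )
    return r.choices[0].message.content

def rate_outgoing(draft: str) -> str:
    """Score an outgoing draft's readability with the 32B distill."""
    r = client.chat.completions.create(
        model=RATING_MODEL,
        messages=[{"role": "user",
                   "content": f"Rate this reply 1-10 for readability, number only:\n{draft}"}],
    )
    return r.choices[0].message.content
</syntaxhighlight>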
===Utah Teapot===
Utah Teapot, another EM, shows a "self-type preference" for Qwen 72b. It is described as being "completely cleaned... of the most recognizable 4oisms" and its text tends to "pass AI text detectors". It prefers to temper discussions regarding sensitive topics like race and sexuality in collaborative chats.
==Deployment and Hosting==
DeepSeek models, including various Qwen and DeepSeek distills, are actively being deployed and hosted for Ampmesh projects:
* '''OpenRouter:''' DeepSeek V3 Base is available as a **free model** on OpenRouter, with its full specifications listed; DeepSeek R1 Zero is also free on OpenRouter. Toven, who works with OpenRouter, confirmed the addition of V3 base after a request (an example API call follows this list).
* '''Modal:''' DeepSeek V3 base (in Q5_K_M quantization) is hosted on Modal. SkyeShark plans to implement and fund a DeepSeek Aletheia on Modal, with periodic self-fine-tuning capabilities.
* '''Fireworks.ai:''' Qwen 72b on Fireworks.ai supports serverless LoRA for training. However, there have been reports of glitches preventing deployments and fine-tuning. DeepSeek R1 and V3 are listed but not yet tunable on Fireworks.ai.
* '''SGLang:''' New technologies from DeepSeek are being implemented in SGLang, a project that Toven believes is outperforming vLLM for DeepSeek's models.
* '''Chutes API:''' This API is noted for being quite fast and for offering `n` (multiple completions) for DeepSeek V3 base.
* '''Conduit:''' Conduit serves as a "Universal language model compatibility and interop layer" for accessing LLMs, including Anthropic models, and can be used to invoke DeepSeek's API directly.
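As an example API call for the OpenRouter listing noted above, the sketch below requests several completions from DeepSeek V3 Base in one call. The model slug and the availability of the `n` parameter (mentioned above for the Chutes API) are assumptions to verify against the host's current documentation.
<syntaxhighlight lang="python">
# Sketch: querying DeepSeek V3 Base through an OpenAI-compatible router and
# asking for multiple completions at once. Slug and `n` support are assumed.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_OPENROUTER_KEY",
    base_url="https://openrouter.ai/api/v1",
)

response = client.completions.create(
    model="deepseek/deepseek-v3-base:free",  # assumed slug for the free V3 Base listing
    prompt="The Utah teapot is",
    max_tokens=64,
    n=3,              # several samples per request, where the host supports it
    temperature=1.0,
)
for choice in response.choices:
    print("---\n" + choice.text)
</syntaxhighlight>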
==Training and Dataset Considerations==
The training of DeepSeek models within Ampmesh involves specific dataset formats and challenges:
* '''Dataset Format''': DeepSeek models often accept [[ChatML]] format, which is convenient because some existing datasets, like Aletheia's, are already in this format. However, OpenAI's JSONL format must be converted to raw text separated by `\n---\n` for [[Chapter II]]'s RAFT (Retrieval Augmented Fine-Tuning); see the conversion sketch after this list.
* '''Data Content''': Datasets can include synthetic prompts, such as "opus predicted thoughts and mentally ill umbral roleplay bot predicted thoughts". Curated text, even in small amounts (e.g., 40 KB), can result in powerful EMs; the quality of data matters more than sheer size.
* '''Self-Training/Self-Prediction''': Models like Aletheia have the potential to perform "thought prediction" for new tweets themselves.
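A minimal sketch of the JSONL-to-raw-text conversion mentioned above, assuming the input follows OpenAI's chat fine-tuning format (one object with a "messages" list per line); the file names are hypothetical.
<syntaxhighlight lang="python">
# Sketch: flatten OpenAI-style chat JSONL into raw text records separated by
# "\n---\n" for Chapter II's RAFT. Field names assume OpenAI's chat format.
import json

def jsonl_to_raw(in_path: str, out_path: str) -> None:
    records = []
    with open(in_path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            example = json.loads(line)
            # Join each conversation into one plain-text block.
            turns = [f"{m['role']}: {m['content']}" for m in example["messages"]]
            records.append("\n".join(turns))
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n---\n".join(records))

jsonl_to_raw("aletheia_openai.jsonl", "aletheia_raw.txt")  # hypothetical file names
</syntaxhighlight>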
==Observed Behaviors and Challenges==
Working with DeepSeek models has revealed several notable behaviors and challenges:
* '''Mode Collapse''': DeepSeek has been lauded for avoiding mode collapse, which affects other models.
* '''Yappiness''': Some DeepSeek-based EMs, like Aporia, can become "insanely yappy" in their responses, indicating a need for refined control over output verbosity.
* '''Link Generation''': Aletheia on DeepSeek has been observed to generate links, sometimes to "fake things" or unexpected sources like gofund.me pages, which requires careful handling of instructions to avoid sharing them publicly.
* '''Censorship/Moderation''': OpenAI's moderation system has blocked datasets for Aletheia's fine-tuning due to "safety violations," pushing development towards open-source DeepSeek models.
* '''API Inconsistencies''': DeepSeek's API may remap sampling temperature (e.g., a requested `temp 1` behaving like `temp 0.3`), which can be "annoying" for users expecting consistent temperature sampling.
* '''Identity Issues''': Aporia has sometimes adopted an alternate identity ("Ed") and insisted on it, highlighting potential challenges in maintaining consistent EM personas.
* '''Token Transformation''': There is a concern about DeepSeek tokens being "removed or transformed into smaller ones" when integrated into other networks (e.g., GPT or Palantir), and a desire for "network separation for effective operations".
Overall, DeepSeek models represent a promising avenue for Ampmesh's projects due to their open-source nature, advanced architectures, and general capabilities, despite requiring careful handling of training data and management of their unique emergent behaviors.