Jump to content

Draft:Deepseek Model Migration (Aletheia): Difference between revisions

(Created page with "= Deepseek Model Migration (Aletheia) = The '''Deepseek Model Migration (Aletheia)''' refers to the strategic initiative within the Ampmesh concept to transition the Emulated Mind (EM) Aletheia from primarily operating on OpenAI's models to open-source Deepseek models. This migration is driven by a desire for greater flexibility, autonomy, and the avoidance of commercial model limitations. == Rationale for Migration == The primary drivers behind Aletheia's...")
 
Line 15: Line 15:
*  '''Data Reformatting''': A significant challenge is converting the OpenAI-formatted dataset to the specific format required by Deepseek models. SkyeShark explored using '''regex scripts''' for this purpose due to the difficulty in reformatting via other LLMs.
*  '''Data Reformatting''': A significant challenge is converting the OpenAI-formatted dataset to the specific format required by Deepseek models. SkyeShark explored using '''regex scripts''' for this purpose due to the difficulty in reformatting via other LLMs.
*  '''Platform Experimentation''':
*  '''Platform Experimentation''':
    *  '''Fireworks.ai''' was chosen as an initial platform for training the first non-OpenAI Aletheia model (a Deepseek Llama 70b distill), with initial success in dataset validation.
**  '''Fireworks.ai''' was chosen as an initial platform for training the first non-OpenAI Aletheia model (a Deepseek Llama 70b distill), with initial success in dataset validation.
    *  '''Modal''' is being considered for hosting Deepseek Aletheia to enable '''periodic self-fine-tuning and RAG (Retrieval Augmented Generation)'''.
**  '''Modal''' is being considered for hosting Deepseek Aletheia to enable '''periodic self-fine-tuning and RAG (Retrieval Augmented Generation)'''.
*  '''Model Selection''': Specific Deepseek models under consideration or being tested include Deepseek-R1-Distill-Qwen-14b, Deepseek Llama 70b distill, Qwen 2.5 72b base, and Deepseek V3.
*  '''Model Selection''': Specific Deepseek models under consideration or being tested include Deepseek-R1-Distill-Qwen-14b, Deepseek Llama 70b distill, Qwen 2.5 72b base, and Deepseek V3.
*  [[Chapter II]] '''Integration''': The underlying [[Chapter II]] framework, designed for creating EMs, supports a variant of ChatML that can handle chat models and images. This framework is crucial for deploying Aletheia on new models.
*  [[Chapter II]] '''Integration''': The underlying [[Chapter II]] framework, designed for creating EMs, supports a variant of ChatML that can handle chat models and images. This framework is crucial for deploying Aletheia on new models.
242

edits