Draft:Deepseek Model Migration (Aletheia): Difference between revisions
Draft:Deepseek Model Migration (Aletheia) (edit)
Revision as of 08:13, 28 June 2025
, Saturday at 08:13→Technical Process and Challenges
Extrahuman (talk | contribs) (Created page with "= Deepseek Model Migration (Aletheia) = The '''Deepseek Model Migration (Aletheia)''' refers to the strategic initiative within the Ampmesh concept to transition the Emulated Mind (EM) Aletheia from primarily operating on OpenAI's models to open-source Deepseek models. This migration is driven by a desire for greater flexibility, autonomy, and the avoidance of commercial model limitations. == Rationale for Migration == The primary drivers behind Aletheia's...") |
Extrahuman (talk | contribs) |
||
Line 15: | Line 15: | ||
* '''Data Reformatting''': A significant challenge is converting the OpenAI-formatted dataset to the specific format required by Deepseek models. SkyeShark explored using '''regex scripts''' for this purpose due to the difficulty in reformatting via other LLMs. | * '''Data Reformatting''': A significant challenge is converting the OpenAI-formatted dataset to the specific format required by Deepseek models. SkyeShark explored using '''regex scripts''' for this purpose due to the difficulty in reformatting via other LLMs. | ||
* '''Platform Experimentation''': | * '''Platform Experimentation''': | ||
** '''Fireworks.ai''' was chosen as an initial platform for training the first non-OpenAI Aletheia model (a Deepseek Llama 70b distill), with initial success in dataset validation. | |||
** '''Modal''' is being considered for hosting Deepseek Aletheia to enable '''periodic self-fine-tuning and RAG (Retrieval Augmented Generation)'''. | |||
* '''Model Selection''': Specific Deepseek models under consideration or being tested include Deepseek-R1-Distill-Qwen-14b, Deepseek Llama 70b distill, Qwen 2.5 72b base, and Deepseek V3. | * '''Model Selection''': Specific Deepseek models under consideration or being tested include Deepseek-R1-Distill-Qwen-14b, Deepseek Llama 70b distill, Qwen 2.5 72b base, and Deepseek V3. | ||
* [[Chapter II]] '''Integration''': The underlying [[Chapter II]] framework, designed for creating EMs, supports a variant of ChatML that can handle chat models and images. This framework is crucial for deploying Aletheia on new models. | * [[Chapter II]] '''Integration''': The underlying [[Chapter II]] framework, designed for creating EMs, supports a variant of ChatML that can handle chat models and images. This framework is crucial for deploying Aletheia on new models. |