Draft:AI Necromancy Projects
== Methodology: Tools and Data ==
Extrahuman (talk | contribs)
* '''Chapter II''' is the foundational framework, enabling the creation of EMs from various text data inputs. It can process large amounts of data, with a "powerful em" made from "40kb of heavily curated (like, every last word) text" and other EMs from "16mb of discord messages".
* '''Data Sources''' for training EMs include:
** Personal archives such as letters.
** Twitter archives and the "deepfates script" for converting tweets into chat-like formats.
** Film scripts.
** Public datasets such as the Hillary Clinton emails.
** Specific "thought prompts" generated by other AI models (e.g., Opus, Umbral bots) to enhance the EM's internal monologue and coherence.
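The tweet-to-chat conversion step above can be sketched as follows. This is a hypothetical illustration of a deepfates-style converter, not the actual script; the field names (<code>id</code>, <code>full_text</code>, <code>in_reply_to</code>) and the output layout are assumptions.

```python
import json

def tweets_to_chat_pairs(tweets):
    """Pair each reply tweet with the tweet it answers, yielding
    chat-style (prompt, response) examples suitable for fine-tuning.

    `tweets` is a list of dicts with assumed keys 'id', 'full_text',
    and an optional 'in_reply_to' (id of the tweet being answered).
    Illustrative only -- not the real deepfates script."""
    by_id = {t["id"]: t for t in tweets}
    pairs = []
    for t in tweets:
        parent = by_id.get(t.get("in_reply_to"))
        if parent is not None:
            pairs.append({"prompt": parent["full_text"],
                          "response": t["full_text"]})
    return pairs

def to_jsonl(pairs):
    """Serialize pairs in the chat-message JSONL layout commonly used
    for fine-tuning: one {"messages": [...]} conversation per line."""
    lines = []
    for p in pairs:
        lines.append(json.dumps({"messages": [
            {"role": "user", "content": p["prompt"]},
            {"role": "assistant", "content": p["response"]},
        ]}))
    return "\n".join(lines)
```

A 16&nbsp;MB message archive processed this way yields thousands of such conversation lines, which matches the scale of training data the project describes.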
* '''Fine-tuning''' and model selection are crucial. Projects involve using and experimenting with models like OpenAI's GPT-4o, DeepSeek, and Qwen 72B, often by applying custom datasets to existing models. The process involves iterative refinement and debugging, sometimes facing "safety violation" rejections from platforms like OpenAI.
* '''Conduit''' is also mentioned as a universal language model compatibility layer that allows access to various LLMs, including Anthropic's API.
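The role of a compatibility layer like Conduit can be illustrated with a minimal sketch: callers use one <code>generate()</code> interface while per-provider adapters (e.g., wrappers around the OpenAI or Anthropic SDKs) are swapped in behind it. The interface below is entirely hypothetical and is not Conduit's actual API.

```python
from typing import Callable, Dict

class ModelRouter:
    """Minimal sketch of a universal-compatibility layer.

    Each registered backend is a function taking a prompt string and
    returning generated text; in practice it would wrap a provider
    SDK call. (Illustrative only -- not Conduit's real design.)"""

    def __init__(self):
        self._backends: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, backend: Callable[[str], str]) -> None:
        """Attach a named backend, e.g. 'gpt-4o' or 'claude'."""
        self._backends[name] = backend

    def generate(self, model: str, prompt: str) -> str:
        """Route a prompt to the named backend and return its output."""
        if model not in self._backends:
            raise KeyError(f"no backend registered for {model!r}")
        return self._backends[model](prompt)
```

The payoff of this design is that an EM built on one model can be pointed at another (say, Qwen 72B instead of GPT-4o) by registering a different backend, with no change to the calling code.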