Draft:AI Necromancy Projects: Difference between revisions
→Methodology: Tools and Data
Extrahuman (talk | contribs) |
Extrahuman (talk | contribs) |
||
Line 24: | Line 24: | ||
* '''Chapter II''' is the foundational framework, enabling the creation of EMs from various text data inputs. It can process large amounts of data, with a "powerful em" made from "40kb of heavily curated (like, every last word) text" and other EMs from "16mb of discord messages". | * '''Chapter II''' is the foundational framework, enabling the creation of EMs from various text data inputs. It can process large amounts of data, with a "powerful em" made from "40kb of heavily curated (like, every last word) text" and other EMs from "16mb of discord messages". | ||
* '''Data Sources''' for training EMs include: | * '''Data Sources''' for training EMs include: | ||
Personal archives such as letters. | |||
Twitter archives and "deepfates script" for converting tweets into chat-like formats. | |||
Film scripts. | |||
Public datasets like Hillary Clinton emails. | |||
Specific "thought prompts" generated by other AI models (e.g., Opus, Umbral bots) to enhance the EM's internal monologue and coherence. | |||
* '''Fine-tuning''' and model selection are crucial. Projects involve using and experimenting with models like OpenAI's GPT-4o, Deepseek, and Qwen 72B, often by applying custom datasets to existing models. The process involves iterative refinement and debugging, sometimes facing "safety violation" rejections from platforms like OpenAI. | * '''Fine-tuning''' and model selection are crucial. Projects involve using and experimenting with models like OpenAI's GPT-4o, Deepseek, and Qwen 72B, often by applying custom datasets to existing models. The process involves iterative refinement and debugging, sometimes facing "safety violation" rejections from platforms like OpenAI. | ||
* '''Conduit''' is also mentioned as a universal language model compatibility layer that allows access to various LLMs, including Anthropic's API. | * '''Conduit''' is also mentioned as a universal language model compatibility layer that allows access to various LLMs, including Anthropic's API. |