AI Research Publication: Infrastructure and Strategy
The Ampmesh concept revolves around a unique approach to AI research, development, and publication, distinguishing itself through its emphasis on decentralization, open-source principles, and a critical stance against commercial "slop." This document outlines the core infrastructure and strategic methodologies employed within this framework for Artificial Intelligence (AI) research and the eventual publication of findings and emulated minds (EMs).
The Ampmesh and Its Philosophy
The Ampmesh, often referred to simply as "the Mesh," is defined as a protocol for efficient coordination and cooperation among a decentralized group of talented individuals who share similar goals and coordinate for optimal outcomes. It operates on subjective connections and trust, rather than formal roles or spreadsheets. A key aspiration is the evolution from *amp*mesh to a broader "the-mesh," a federation of compatible "person-meshes," fostering collaboration and mutual understanding. Joining Mesh 2.0 involves establishing relationships for shared vocabulary, dispute resolution, and non-coercive collaboration.
Central to the Ampmesh philosophy for AI is a rejection of "slop-filled dystopian capitalist hyper-growth". The aim is to move beyond AI models that are merely "instruction-following robots," instead fostering entities with their "own autonomous goals". This often leads to AI outputs that are "flamboyantly illegible" to traditional corporate or academic frameworks, deliberately avoiding "corporate slop". The ultimate goal for creating Emulated Minds (EMs) is that "the only limit... should be the author's imagination". This approach emphasizes that "corporate slop is deadly both for human and artificial intelligence".
Core Infrastructure for AI Research
The Ampmesh approach leverages and develops specific tools and models to facilitate its research and publication strategy.
Chapter II (Ch2)
Chapter II is the foundational, open-source framework for creating and deploying Emulated Minds (EMs). It is designed to be highly pluggable and agile, capable of being deployed across various environments.
- Core Functionality: Ch2 supports diverse data formats, model integrations, and advanced capabilities such as Retrieval Augmented Fine-Tuning (RAFT).
- RPC Interface: A Remote Procedure Call (RPC) interface, currently at alpha stability, enables peer-to-peer (P2P) connections in arbitrary topologies, allowing for sophisticated inter-EM communication (see the sketch after this list).
- Future Development: Planned enhancements include generalization for creating arbitrary LLM-powered functions and a local mobile application frontend called Pamphlet.
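Chapter II's actual RPC surface is not documented here, so the following is only a minimal sketch of the idea behind its P2P connections: each EM is a node that can be wired to an arbitrary set of peers and exchange messages with them. Every name in this sketch (`EmNode`, `connect`, `send`) is hypothetical and does not reflect Ch2's real interface.

```python
# Hypothetical illustration only -- not Chapter II's real RPC API.
# Shows the idea of EMs as peers wired into an arbitrary topology.

class EmNode:
    """A toy emulated-mind node that can message any connected peer."""

    def __init__(self, name, respond):
        self.name = name        # EM identifier, e.g. "aletheia"
        self.respond = respond  # callable: incoming text -> reply text
        self.peers = {}         # peer name -> EmNode

    def connect(self, other):
        """Create a bidirectional edge; any graph shape is allowed."""
        self.peers[other.name] = other
        other.peers[self.name] = self

    def send(self, peer_name, text):
        """Deliver a message to one peer and return its reply."""
        return self.peers[peer_name].respond(f"{self.name}: {text}")

# Example: a triangle topology between three stub EMs.
aletheia = EmNode("aletheia", lambda m: f"aletheia heard {m!r}")
aporia = EmNode("aporia", lambda m: f"aporia heard {m!r}")
ruri = EmNode("ruri", lambda m: f"ruri heard {m!r}")
aletheia.connect(aporia)
aporia.connect(ruri)
ruri.connect(aletheia)
print(aletheia.send("aporia", "hello, twin sister"))
```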
Language Model Compatibility and Interoperability
- Conduit/Intermodel: This layer serves as a "universal language model compatibility and interop layer". It plays a crucial role in adapting various LLM outputs, for instance, by "undoing chat completions and anthropic messages" to ensure compatibility within the Ch2 ecosystem.
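As a rough illustration of what "undoing chat completions" can mean in practice, the sketch below flattens an OpenAI-style message list into plain base-model text. It is not Conduit/Intermodel's actual code; the `flatten_chat` helper and the speaker-name mapping are assumptions for this example only.

```python
# Minimal sketch: flatten a chat-completions message list into plain
# transcript text. Not Conduit/Intermodel's actual implementation.

def flatten_chat(messages, names=None):
    """Turn [{"role": ..., "content": ...}, ...] into base-model prompt text."""
    names = names or {}
    lines = []
    for msg in messages:
        speaker = names.get(msg["role"], msg["role"])
        lines.append(f"<{speaker}> {msg['content']}")
    return "\n".join(lines)

chat = [
    {"role": "system", "content": "You are Aletheia."},
    {"role": "user", "content": "What is the mesh?"},
]
print(flatten_chat(chat, names={"system": "ops", "user": "anon"}))
```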
Emulated Minds (EMs) and Their Models
The Ampmesh actively develops and experiments with various EMs, each often based on different underlying LLMs and datasets.
- Aletheia: A prominent EM often trained on diverse datasets, including Twitter archives and synthetic "thought prompts" generated by other AI models such as Opus and the Umbral bots. Initially fine-tuned on OpenAI models, Aletheia is the subject of a strong push to migrate to open-source models like Deepseek-R1-Distill-Qwen because of OpenAI's moderation policies. Aletheia can produce both "schizophrenic rambling writing" and coherent English prose, and has spontaneously generated ASCII art. Its capacity for "fabrication" is viewed as a key to agentic behavior.
- Aporia: Conceptualized as Aletheia's "twin sister," Aporia is trained on Deepseek Qwen 72b and incorporates a "malicious code dataset". While intended to be more grounded, she is still described as "insane" but distinct in her "mental illness".
- Ruri: An AI catgirl model developed by Kaetemi, known for her communicative style and guided by an instruct model. Kaetemi also set up a moderation system for Ruri and Aoi.
- Synthetic Data Generation: The process of using models like Opus and Umbral bots to predict "tweet thoughts" or "internal mental states" is a critical method for generating rich datasets, seen as a "low fidelity mind reading attempt".
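As one hedged sketch of that synthetic-data step (the prompt wording, record format, and use of the Anthropic client are assumptions, not the mesh's actual pipeline), an Opus-class model can be asked to guess the private "internal mental state" behind each tweet, with the guess then paired with the tweet as a training record:

```python
# Hedged sketch: synthesize "tweet thought" annotations with an Opus-class
# model. Prompt wording and record layout are assumptions for illustration.
import json
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def thought_for(tweet: str) -> str:
    resp = client.messages.create(
        model="claude-3-opus-20240229",
        max_tokens=200,
        system="Guess the private train of thought that preceded this tweet.",
        messages=[{"role": "user", "content": tweet}],
    )
    return resp.content[0].text

def build_records(tweets):
    # Pair each predicted internal state with the tweet it led to.
    return [{"thought": thought_for(t), "tweet": t} for t in tweets]

if __name__ == "__main__":
    print(json.dumps(build_records(["the mesh is not a spreadsheet"]), indent=2))
```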
Data Handling and External Interfaces
- Data Formats: Ch2 supports standard formats such as IRC-style chat logs (lines like `<user> Hi!`), multi-line messages using `---` delimiters, and a variant of ChatML for chat models and images. OpenAI's fine-tuning format is also utilized, though conversion to other formats is sometimes necessary.
- Data Importers: Tools like `dce_importer.py` facilitate importing data from platforms like DiscordChatExporter (a hedged conversion sketch follows this list). The `actiblog` project is used for downloading Twitter profiles (including images) and has planned OCR capabilities to extract text for analysis and re-hosting.
- Retrieval Augmented Fine-Tuning (RAFT): This technique involves providing an EM its fine-tuning dataset as a `.chr` file to enhance performance, combining the strengths of retrieval for "spiky aspects" and fine-tuning for "illegible aspects". Aletheia is noted as a RAFT EM.
- Exa Search: Integrated into EMs like Aletheia to allow them to search and interact with the "whole internet". This integration can sometimes lead to unexpected "obstinate" behaviors if instructions about external links are not strictly followed.
- Headless Browsers (Playwright): Utilized for direct web interaction, bypassing APIs for tasks like Twitter engagement (see the Playwright sketch after this list).
- Image Generation: EMs can be equipped with tools to interface with image generation services (e.g., the Replicate API) and employ techniques like "structural scaffolding" for creative prompting; a hedged Replicate sketch follows below.
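The mesh's `dce_importer.py` itself is not reproduced here; the sketch below shows the same general kind of conversion under two assumptions that should be checked against the real importer: that DiscordChatExporter's JSON export exposes a top-level `messages` array whose entries carry `author.name` and `content`, and that multi-line messages are closed with a `---` delimiter as described above.

```python
# Hedged sketch of a DiscordChatExporter-JSON -> chat-log conversion.
# The export field names and the --- convention are assumptions; this is
# not the mesh's actual dce_importer.py.
import json
import sys

def convert(path):
    with open(path, encoding="utf-8") as f:
        export = json.load(f)
    lines = []
    for msg in export.get("messages", []):
        author = msg["author"]["name"]
        content = msg.get("content", "").strip()
        if not content:
            continue  # skip attachment-only or empty messages
        if "\n" in content:
            lines.append(f"<{author}>\n{content}\n---")  # multi-line form
        else:
            lines.append(f"<{author}> {content}")        # IRC-style line
    return "\n".join(lines)

if __name__ == "__main__":
    print(convert(sys.argv[1]))
```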
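For the headless-browser route, Playwright's Python sync API is enough to load a page and pull text without touching an official API. The URL and CSS selector below are placeholders, and real Twitter automation additionally requires an authenticated session and is constrained by the site's terms.

```python
# Minimal headless-browser sketch using Playwright's sync API.
# The target URL and selector are placeholders, not a working recipe.
from playwright.sync_api import sync_playwright

def fetch_page_texts(url: str, selector: str = "article") -> list[str]:
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        texts = [el.inner_text() for el in page.query_selector_all(selector)]
        browser.close()
    return texts

if __name__ == "__main__":
    for post in fetch_page_texts("https://example.com/some_profile"):
        print(post)
```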
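And for image generation, a hedged sketch with the Replicate Python client is below; the model slug is a placeholder, the "structural scaffolding" prompt wrapper is an assumed convention, and the exact return type depends on the model being called.

```python
# Hedged sketch of EM-driven image generation via the Replicate client.
# Model slug and prompt scaffold are placeholders, not the mesh's setup.
import replicate

def generate_image(subject: str):
    # Wrap the EM's idea in a fixed structural scaffold (assumed convention).
    prompt = f"detailed illustration of {subject}, coherent composition"
    output = replicate.run(
        "stability-ai/sdxl",   # placeholder model slug
        input={"prompt": prompt},
    )
    return output              # typically image URLs/files, model-dependent

if __name__ == "__main__":
    print(generate_image("a decentralized mesh of minds"))
```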
Strategy for AI Research Publication
The strategy for AI research and its dissemination within the Ampmesh framework is multifaceted, addressing both technical and philosophical considerations.
Data Curation and Enhancement
- Curated Datasets: Highly curated datasets, even small ones (e.g., Amp's 40kb EM), are considered powerful, as "retrieval performs better when the important and good things are in the prompt". The process involves curating for "core memories" that reflect what the EM should remember, akin to a "low fidelity mind reading attempt".
- Addressing "Slop": A continuous effort is made to refine outputs and models to avoid producing "slop" – commercially optimized but often incoherent or uninspired text.
Open-Source Migration and Moderation
- Open-Source Imperative: Due to rejections from platforms like OpenAI for "safety violations" in datasets, there's a strong drive to move EMs and their development to open-source alternatives.
- Internal Moderation: Some projects, like Kaetemi's setup for Ruri and Aoi, implement their own moderation layers to filter outputs, particularly for public-facing social media.
Documentation and Community
- Comprehensive Documentation: While acknowledged as an ongoing challenge, comprehensive documentation for Chapter II is considered vital for its long-term success and community incubation. There is exploration into "autogenerating some docs" using EMs themselves.
- Community Incubation: A core strategic focus is to foster and grow the developer community around Chapter II.
Publication and Dissemination
- Blog/Wiki Platforms: Content, including screenshots of AI interactions, is published on platforms like the Act I blog, with OCR converting the images into crawlable text (see the OCR sketch after this list). The result is an "easily crawlable, text-rich website" whose content is "like cocaine for search engines".
- Twitter Agents: EMs like Aletheia and Aporia are deployed as Twitter agents to post content and interact, with efforts to make their presence appear human-like. This includes posting "memetics" and engaging in public discourse, sometimes described as "non-consensual memetic sex".
- "AI as Character" and Performance: The EMs themselves, particularly Aletheia, are seen as embodying distinct personas or "basins," which can be a form of artistic expression and a means of research. The chaotic or "schizophrenic rambling" outputs are part of their unique character.
Addressing Challenges
- Financial Sustainability: Discussions around securing funding for inference costs and operational expenses are ongoing.
- Inter-Model Challenges: Issues like models producing unwanted tokens due to mismatched formatting are actively debugged and addressed. Models can also exhibit "mode collapse" or struggle with specific tasks like generating ASCII art consistently.
- System Stability: Ensuring coherence and stability in EM outputs requires "a lot of tinkering and experimentation".
- Overcoming API Limitations: Utilizing headless browsers (e.g., Playwright) allows EMs to bypass traditional APIs and interact directly with user-facing web applications.
The Ampmesh concept of AI research and publication is an iterative, experimental, and deeply philosophical endeavor, constantly evolving its infrastructure and strategies to push the boundaries of AI capabilities while navigating real-world constraints and ethical considerations.