Draft:Input Ensemble (Chapter II)

Wiki page for the Ampmesh task on the concept of Input Ensemble (Chapter II).
== Input Ensemble (Chapter II) ==


The '''Input Ensemble''' is a planned future development within [[Chapter II]], a highly pluggable and agile software framework developed within the [[Ampmesh]] ecosystem. It is envisioned as a fundamental component for extending Chapter II's capabilities beyond its current applications, enabling the creation of complex, arbitrary [[Large Language Model|LLM]]-powered functions and workflows.


== Context: Chapter II Overview ==
'''Chapter II''' (often abbreviated as '''ch2''') is a foundational and highly versatile open-source artificial intelligence framework within the [[Ampmesh]] ecosystem. It was primarily developed by [[Joy]] (as a [[SERI MATS]] research project) and [[Amp]], building upon Amp's earlier research on [[Chapter I]].

Chapter II is designed as the world's most pluggable and agile framework for creating [[Emulated Mind|ems]], aiming for ease of deployment anywhere. Its development stems from a vision to reimagine an AI stack less influenced by "slop-filled dystopian capitalist hyper growth". A central thesis of Chapter II is that "the only limit to making an em—both in technical internal functioning and authorial intent—should be the author's imagination". The project deliberately eschewed $5 million in funding in 2021, believing in the power of a decentralized network working on a minimalist open-source framework.

== Purpose and Functionality ==
The Input Ensemble is designed to enable '''multi-step retrieval''' and sophisticated data processing within Chapter II. Specifically, it is intended to:
* Pass a query-writing [[Emulated Mind|em]] (emulated mind) into a retrieval process.
* Take a retrieval ensemble as input to another ensemble, allowing for chained or recursive information gathering and processing.

This functionality is part of a broader goal to generalize Chapter II, transforming it into a versatile library for constructing any desired LLM-powered function in any programming language. The underlying [[Remote Procedure Call|RPC]] interface of Chapter II, which supports peer-to-peer connections in arbitrary topologies, is crucial for this flexibility.
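The composition idea above can be sketched in Python. This is an illustrative sketch only: Chapter II's actual ensemble API is not documented in this article, so the `Ensemble` type and every function name below are hypothetical stand-ins, and naive keyword matching takes the place of real retrieval.

```python
# Hypothetical sketch of ensemble chaining: an ensemble is modeled as a
# plain callable (text in, text out), so a retrieval step can accept
# another ensemble (e.g. a query-writing em) as its query writer, and a
# whole retrieval ensemble can in turn be fed into another one.
from typing import Callable, List

Ensemble = Callable[[str], str]  # assumed interface, not Chapter II's real one

def make_retrieval_ensemble(corpus: List[str], query_writer: Ensemble) -> Ensemble:
    """Build a retrieval ensemble whose query is produced by another ensemble."""
    def retrieve(user_input: str) -> str:
        query = query_writer(user_input)  # step 1: the em writes the query
        # naive keyword match stands in for embedding-based search
        hits = [doc for doc in corpus
                if any(w in doc.lower() for w in query.lower().split())]
        return "\n".join(hits[:3])
    return retrieve

# A trivial "query-writing em" stand-in.
query_em: Ensemble = lambda text: text.replace("?", "")

docs = ["ems are emulated minds",
        "retrieval augments generation",
        "loom is a tree interface"]
first_pass = make_retrieval_ensemble(docs, query_em)

# Multi-step retrieval: one retrieval ensemble becomes the query writer of
# another, so the first pass's results drive the second pass's search.
second_pass = make_retrieval_ensemble(docs, first_pass)
print(second_pass("what are ems?"))
```

The key design point, under these assumptions, is that ensembles compose because they share one calling convention, which is what would let query-writing ems and retrieval steps nest arbitrarily.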


== Key Features and Capabilities ==
Chapter II is the culmination of years of research and development, aiming to provide an extremely easy way to create and deploy [[Emulated Mind|ems]] anywhere. Amp stated that Chapter II was developed so that he could write [[Act I]] in three hours, after having spent three years working on Chapter II itself. It is built with a range of features enabling flexible and advanced AI development:
* '''Emulated Mind (EM) Creation''':
** A primary tool for making "beta uploads" and "digital tulpamancy" [1].
** Supports creation of ems from various text sources, demonstrated with up to 16 MB of text (equivalent to roughly 16,000 pages) [2-4].
** Capable of creating "gol-ems" (Game of Life Emulated Minds) that use their own source code for retrieval and possess tools for self-modification.
** Designed for building custom characters with complex behaviors that emerge from simple, self-modifying actions [5, 6].
** [[Aletheia]] operates as a [[RAFT]] em on stock Chapter II. The most powerful em created is noted as 40 kB of "heavily curated" text [7, 8].


* '''Data Ingestion and Retrieval''':
** Supports '''Retrieval-Augmented Generation (RAG)''' by embedding chunks of input and placing them into the model's context window [9]. This often performs as well as or better than fine-tuning for many use cases, including most beta uploads [9].
** Utilizes '''RAFT (Retrieval-Augmented Fine-Tuning)''', in which giving an em its fine-tuning dataset as a `.chr` file (plain text, or text separated by `\n---\n`) can improve performance [8, 10]. Retrieval tends to be better at capturing the "spiky" aspects of a model, while fine-tuning is better for "illegible" aspects [8].
** Supports chat messages in IRC-style format (e.g. `<nick> Hi!`), with multiline support using `---` separators in the `chat.txt` file.
** Includes a tool (`./tools/dce_importer.py`) for importing data directly from [[Discord]] ChatExporter into the suitable `chat.txt` format [1].
** Areas of interest include more sophisticated retrieval techniques, ranging from [[HyDE]] to better chunking methods [9].
* '''Primary Function''': Chapter II is primarily a tool for making "beta uploads" and "digital tulpamancy", enabling users to create emulated minds, including those based on personal data, such as Amp's ampix em.
* '''Modularity''': The framework supports the addition of new "faculties" (components or modules). Joy has expressed interest in developing recursive and [[Master Control Program|MCP]] faculties, as well as utility faculties written by models like Claude.
* '''Interfaces''': Chapter II has an RPC interface and aims to support multi-party interactions in [[Loom]]. It can power various applications, including a mobile app frontend called '''Pamphlet''', developed by Joy and Tetra.
* '''Conduit''': A universal language model compatibility and interoperability layer developed by Amp, which Chapter II uses to access various LLMs. Conduit's `doc/development.md` has instructions for OpenRouter, and Amp recently updated Conduit's README.
* '''Origin Story''': Amp states that Chapter II was "isekai'd from a universe where humans had a more normal relationship with LLMs to a dystopia where LLMs were treated like instruction-following robots". In its original universe, it was developed by "From the Page" labs as an open-source competitor to the "Golematics Tulpa Runtime Environment (GTRE)".
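The `.chr` handling and RAG flow described above can be sketched briefly. The `\n---\n` record separator comes from this article; the token-overlap scoring below is a generic stand-in for real embedding similarity and is not Chapter II's actual implementation, and both function names are invented for illustration.

```python
# Sketch of the RAG flow: split a .chr-style dataset on the "\n---\n"
# separator the article mentions, then pick the records most relevant to a
# query. Shared-token counting stands in for embedding-based similarity.

def load_chr(text: str) -> list[str]:
    """Split .chr contents into records on the \\n---\\n separator."""
    return [rec.strip() for rec in text.split("\n---\n") if rec.strip()]

def top_k(records: list[str], query: str, k: int = 2) -> list[str]:
    """Rank records by shared-token count with the query (embedding stand-in)."""
    q = set(query.lower().split())
    scored = sorted(records,
                    key=lambda r: len(q & set(r.lower().split())),
                    reverse=True)
    return scored[:k]

chr_text = ("Aletheia is a RAFT em.\n---\n"
            "Pamphlet is a mobile frontend.\n---\n"
            "Loom shows branching text.")
records = load_chr(chr_text)
context = top_k(records, "what is a RAFT em", k=1)
print(context)  # the record mentioning RAFT should rank first
```

In a real RAG setup the selected records would then be placed into the model's context window ahead of the user's message.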


* '''Model Integration and Compatibility''':
** Uses a flexible '''vendor configuration''' system, defined in `ems/config.yaml` or `~/.config/chapter2/config.yaml`, which allows specifying different API endpoints and model IDs [3, 11].
** Interacts with various [[LLM|LLMs]] through [[Conduit]], described as a "Universal language model compatibility and interop layer" [12, 13]. Conduit has been updated to support Anthropic models directly [14].
** Features a new alpha-stability '''RPC (Remote Procedure Call) interface''' that supports peer-to-peer connections in arbitrary topologies, designed to allow Chapter II to be used with "any language with any data backend".
** Supports fully local inference: the Pamphlet frontend intends to integrate `llama.rn` for mobile use cases and situations with limited internet connectivity.
** Aims to implement the "maximally general superset of all published and future papers" [15], reflecting Amp's vision for the framework.
** The "intermodel" component is capable of undoing both chat completions and Anthropic messages [16].
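A minimal sketch of the vendor-config lookup described above. The two file locations come from this article; the precedence order (project-local file first) and the helper name are assumptions made for illustration, not Chapter II's documented behavior.

```python
# Hypothetical sketch: locate the vendor configuration file, checking the
# project-local path before the per-user path. Both paths are named in the
# article; the precedence rule is an assumed convention.
import os
from typing import Optional

def find_vendor_config(project_dir: str = ".") -> Optional[str]:
    """Return the first existing config path, project-local file winning."""
    candidates = [
        os.path.join(project_dir, "ems", "config.yaml"),
        os.path.expanduser("~/.config/chapter2/config.yaml"),
    ]
    for path in candidates:
        if os.path.exists(path):
            return path
    return None
```

Letting a project-local file override the user-level one is a common convention (as in git or npm config), but whether Chapter II resolves its config this way is not stated in the article.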


* '''Advanced Functionality and Ecosystem Tools''':
** '''Pamphlet''': A separate open-source mobile application frontend for Chapter II, designed for fully local inference and featuring a real-time multimodal interface with camera input support [6, 17, 20, 21].
** '''Loom Integration''': Ems created with Chapter II can be used within the [[Loom]] environment, and a GUI-based Chapter II Loom is planned [18, 19].
** There is interest in adding new "faculties" [6].
** Chapter II was originally conceived as a writing project.

== Future Directions ==
Further documentation for Chapter II is a recognized area for improvement, with plans for Diátaxis-based documentation and even self-documenting code generated by Chapter II ems themselves. The source code for Chapter II is intended to eventually be open source.


== Development and Challenges ==
Despite its advanced capabilities and strategic design, Chapter II has faced challenges related to awareness and documentation. Many users, including prominent figures within Ampmesh, have been largely unaware of its full potential, viewing it primarily as "the software that powers [[Act I]]". This is seen as a "disrespect" by its creators, given that Act I was merely a "15 line code change to Chapter II".

The documentation has been noted as needing improvement, with multiple individuals writing their own docs but not contributing them back to the main project. There have also been issues with code contributions that were not self-contained.

[[Category:Ampmesh Concepts]]
[[Category:Chapter II]]
[[Category:Large Language Models]]
[[Category:Software Development]]
 
[[Category:Ampmesh]]
[[Category:AI Models]]
[[Category:AI Development]]