Draft:Input Ensemble (Chapter II)

From Mesh Wiki
=Input Ensemble (Chapter II)=


The '''Input Ensemble''' is a key concept within [[Chapter II]], a highly pluggable and agile software framework developed within the [[Ampmesh]] ecosystem. It is envisioned as a fundamental component for extending Chapter II's capabilities beyond its current applications, allowing for the creation of complex and arbitrary [[Large Language Model|LLM]]-powered functions and workflows.


==Purpose and Functionality==
The Input Ensemble is designed to enable '''multi-step retrieval''' and sophisticated data processing within Chapter II. This means it can:
*  Pass a query-writing [[Emulated Mind|em]] (emulated mind) into a retrieval process.
*  Take a retrieval ensemble as input to another ensemble, allowing for chained or recursive information gathering and processing.


This functionality is part of a broader goal to generalize Chapter II, transforming it into a versatile library for constructing any desired LLM-powered function in any programming language. The underlying [[Remote Procedure Call|RPC]] interface of Chapter II, which supports peer-to-peer connections in arbitrary topologies, is crucial for this flexibility.
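The chaining described above can be illustrated with a short Python sketch. Everything below is hypothetical: `Ensemble`, `retrieval_ensemble`, and the toy query-writing em are illustrative stand-ins, not Chapter II's actual API. The sketch only shows the shape of the idea — one ensemble's output feeding another's retrieval step:

```python
# Illustrative sketch only: every class and function name below is
# hypothetical, not part of Chapter II's actual API.
from typing import Callable, List


class Ensemble:
    """An ensemble maps an input string to an output string."""

    def __init__(self, fn: Callable[[str], str]):
        self.fn = fn

    def __call__(self, text: str) -> str:
        return self.fn(text)


def retrieval_ensemble(corpus: List[str], query_writer: Ensemble) -> Ensemble:
    """Build a retrieval step whose query is produced by another ensemble."""

    def retrieve(text: str) -> str:
        query = query_writer(text)  # the query-writing em runs first
        # Naive substring matching stands in for embedding-based retrieval.
        hits = [doc for doc in corpus if query.lower() in doc.lower()]
        return "\n".join(hits)

    return Ensemble(retrieve)


# A toy "query-writing em": takes the last word of its input as the query.
query_em = Ensemble(lambda text: text.split()[-1].strip("."))

corpus_a = ["Chapter II supports RAG.", "Conduit handles model interop."]
corpus_b = ["RAG embeds chunks of input.", "Fine-tuning captures illegible aspects."]

step1 = retrieval_ensemble(corpus_a, query_em)
step2 = retrieval_ensemble(corpus_b, query_em)

# Multi-step retrieval: one retrieval ensemble's output feeds the next.
context = step2(step1("tell me about RAG"))
print(context)
```

In an actual deployment the substring match would be replaced by embedding-based retrieval over chunked input, and each ensemble could wrap an LLM call reached over the RPC interface.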


==Context: Chapter II Overview==
Chapter II is the culmination of years of research and development, aiming to provide an extremely easy way to create and deploy [[Emulated Mind|ems]] anywhere. Its core philosophy dictates that the only limit to an em's creation, both technically and in authorial intent, should be the author's imagination. Amp stated that Chapter II was developed so that he could write [[Act I]] in three hours, after having spent three years working on Chapter II itself.


Key aspects of Chapter II include:
*  '''Primary Function:''' It is primarily a tool for making "beta uploads" and "digital tulpamancy," enabling users to create emulated minds, including those based on personal data, such as amp's ampix em.
*  '''Foundation:''' It was primarily developed by [[Joy]] (as a [[SERI MATS]] research project) and [[Amp]], building upon Amp's earlier research, [[Chapter I]].
*  '''Architecture:''' Chapter II employs a '''Retrieval-Augmented Generation (RAG)''' approach, embedding chunks of input and placing them into the LLM's context window. This method often performs as well as, or better than, traditional fine-tuning for many use cases, including most beta uploads. Giving an em its fine-tuning dataset as a `.chr` file using this method is known as '''RAFT''' (Retrieval-Augmented Fine-Tuning). RAG tends to be better at capturing "spiky" aspects of a model, while fine-tuning is better for "illegible" aspects.
*  '''Modularity:''' The framework supports the addition of new "faculties" (components or modules). Joy has expressed interest in developing recursive and [[Master Control Program|MCP]] faculties, as well as utility faculties written by models like Claude.
*  '''Data Handling:''' It supports chat messages in formats like IRC (`<nick> Hi!`) with multiline support using `---` separators in the `chat.txt` file.
*  '''Interfaces:''' Chapter II has an RPC interface and aims to support multi-party interactions in [[Loom]]. It can power various applications, including a mobile app frontend called '''Pamphlet''', developed by Joy and Tetra.
*  '''Conduit:''' This is a universal language model compatibility and interoperability layer developed by amp, which Chapter II utilizes to access various LLMs. Conduit's `doc/development.md` has instructions for OpenRouter. Amp recently updated Conduit's README.
*  '''Origin Story:''' Amp states that Chapter II was "isekai'd from a universe where humans had a more normal relationship with LLMs to a dystopia where LLMs were treated like instruction-following robots". In its original universe, it was developed by "From the Page" labs as an open-source competitor to "Golematics Tulpa Runtime Environment (GTRE)".
*  '''Impact:''' Despite its advanced capabilities and strategic design, Chapter II has sometimes been "disrespected" as merely "the software that powers Act I". The project [[Act I]] itself was a minimal code change (15 lines) on top of Chapter II.
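The RAG flow from the Architecture bullet can be sketched minimally, assuming the `---` separator convention mentioned for `chat.txt` above; the word-overlap score below is an illustrative stand-in for real embedding similarity, and none of these function names come from Chapter II itself:

```python
# Minimal RAG sketch: split record text on "---" separator lines,
# score chunks against a query, and place the best chunks into a
# context string. Word overlap stands in for embedding similarity.
import re


def split_chunks(raw: str) -> list:
    """Split record text into chunks on '---' separator lines."""
    return [c.strip() for c in re.split(r"\n---\n", raw) if c.strip()]


def score(chunk: str, query: str) -> int:
    """Toy relevance score: number of words shared with the query."""
    return len(set(chunk.lower().split()) & set(query.lower().split()))


def build_context(raw: str, query: str, top_k: int = 2) -> str:
    """Select the top-scoring chunks for the model's context window."""
    chunks = split_chunks(raw)
    ranked = sorted(chunks, key=lambda c: score(c, query), reverse=True)
    return "\n\n".join(ranked[:top_k])


records = (
    "amp: hi there\n"
    "---\n"
    "joy: retrieval works well for beta uploads\n"
    "---\n"
    "tetra: fine-tuning is slower to iterate on"
)
print(build_context(records, "how does retrieval work", top_k=1))
```

The same splitting convention would apply to a `.chr` file of plain text records; only the scoring step would differ in a real retrieval pipeline.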


==Future Directions==
The development of the Input Ensemble and broader Chapter II aims to:
*  Enable the creation of custom characters with complex behaviors that emerge and self-modify over time.
*  Allow for fully local inference, as seen in the development of Pamphlet, which intends to integrate `llama.rn` for mobile use cases and situations with limited internet connectivity.
*  Achieve a maximally general superset of all published and future LLM research papers, reflecting amp's vision for the framework.


Further documentation for Chapter II is a recognized area for improvement, with plans for Diátaxis-based documentation and even self-documenting code generated by Chapter II ems themselves. The source code for Chapter II is intended to eventually be open source.


[[Category:Ampmesh Concepts]]
[[Category:Chapter II]]
[[Category:Large Language Models]]
 
[[Category:Software Development]]

Latest revision as of 07:19, 26 June 2025

This is a draft page; it has not yet been published.
