Jump to content
Main menu
Main menu
move to sidebar
hide
Navigation
Main page
Recent changes
Random page
Help about MediaWiki
Mesh Wiki
Search
Search
Create account
Log in
Personal tools
Create account
Log in
Pages for logged out editors
learn more
Contributions
Talk
Editing
Draft:Synth Libraries
Draft
Discussion
English
Read
Edit
Edit source
View history
Tools
Tools
move to sidebar
hide
Actions
Read
Edit
Edit source
View history
General
What links here
Related changes
Special pages
Page information
Warning:
You are not logged in. Your IP address will be publicly visible if you make any edits. If you
log in
or
create an account
, your edits will be attributed to your username, along with other benefits.
Anti-spam check. Do
not
fill this in!
== Synth Libraries == Within [[Ampmesh]], '''Synth Libraries''' (or more broadly, the concept of '''synthetic data generation''') refers to the practice of creating and utilizing artificially generated data to train, refine, and influence the behavior of [[Emulated Minds|EMs]] and other AI models. This approach allows for tailored datasets that can shape an AI's style, capabilities, and "thought processes." == Conceptual Relevance and Purpose == The core idea behind synthetic data in Ampmesh is to **engineer desired outputs or internal states** for AI models by feeding them data that was itself generated or manipulated by other AI systems or specific processes. This is used for several purposes: * **Generating "Thought Prompts"**: A key application involves creating "synthetic prompts" or "predicted thoughts" to enrich an AI's dataset. For example, efforts have been made to predict the internal thoughts or situations that would lead to an EM's public output (like a tweet), and then adding these synthetic thoughts to its training data. This was explored for [[Aletheia]] and [[Sercy]] to influence their behavior and coherence. * **Influencing AI Persona and Style**: By curating and generating specific types of synthetic data, developers aim to imbue EMs with particular stylistic traits or "vibe." For instance, [[SkyeShark]] used "opus predicted thoughts and the mentally ill umbral roleplay bot predicted thoughts" to develop a dataset for [[Aletheia]], hoping to enhance its distinct persona. * **"Laundering AI Generations" for Training**: The concept of "synthslop training" is mentioned in the context of leveraging AI-generated output (which may not be copyrightable) to train other models. This suggests using generated content as a free source of data for further training. == Usage and Tools == The process often involves: * **Data Preparation Scripts**: Tools like a modified version of "deepfates' Twitter archive processing script" are used to convert existing data (e.g., Twitter replies) into formats suitable for training, and to generate synthetic conversational contexts or "thought prompts". * **Local Models for Generation**: It's noted that generating "synth data with local models is free". This highlights the accessibility and cost-effectiveness of creating large volumes of synthetic data without relying on external services for the generation process itself. * **Recursive Self-Improvement**: The goal is to enable EMs to eventually generate their own "thought predictions" for new data, creating a feedback loop for self-improvement and refinement. The overall aim is to provide a powerful and flexible method for designing and iterating on AI models, allowing them to capture and replicate complex behavioral patterns and stylistic nuances through engineered data.
Summary:
Please note that all contributions to Mesh Wiki may be edited, altered, or removed by other contributors. If you do not want your writing to be edited mercilessly, then do not submit it here.
You are also promising us that you wrote this yourself, or copied it from a public domain or similar free resource (see
Wiki:Copyrights
for details).
Do not submit copyrighted work without permission!
Cancel
Editing help
(opens in new window)
Toggle limited content width