Draft:Qwen Models

Qwen Models in Ampmesh

Within the Ampmesh collective, Qwen models are a notable family of large language models (LLMs) used and discussed across various projects, particularly in the development of emulated minds (ems) and AI agents. Ampmesh operates with a decentralized approach, fostering collaboration and drawing on a mix of technical solutions, including proprietary and custom options, to achieve efficient coordination and cooperation.

Characteristics of Qwen Models

Qwen models, developed by Alibaba, are recognized for their scale and capabilities. Key characteristics mentioned in the sources include:

  • Size and Architecture: Qwen models span a range of sizes, from smaller models like Qwen3-4B to larger ones such as Qwen3-235B-A22B, and include Mixture-of-Experts (MoE) models (a minimal loading sketch follows this list).
  • Training Data: Qwen3, for instance, is pre-trained on a vast corpus of 36 trillion tokens across 119 languages, with a rich mix of high-quality data. This dataset includes content related to coding, STEM, reasoning, books, and both multilingual and synthetic data.
  • Performance: The flagship Qwen3-235B-A22B model achieves competitive results in benchmarks for coding, math, and general capabilities compared to other top-tier models like DeepSeek-R1 and Gemini-2.5-Pro. Smaller models like Qwen3-4B can even rival the performance of Qwen2.5-72B-Instruct.
  • Behavioral Tendencies: Qwen models can be "assistant-y" but are also noted for their proficiency in poetry and rhyming. However, some base models might exhibit "instruct model vibes" due to synthetic data contamination.
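
Because the checkpoints are published openly on Hugging Face, experimenting with a smaller Qwen model is straightforward. The following is a minimal sketch using the transformers library; the repository id ("Qwen/Qwen3-4B") and the generation settings are illustrative assumptions rather than an Ampmesh-specific configuration.

```python
# Minimal sketch: load a smaller Qwen3 checkpoint and generate one reply.
# The repo id and the generation settings here are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Write a rhyming couplet about a swarm of agents."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```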

Role within Ampmesh Projects

Qwen models are integral to several Ampmesh-affiliated AI projects, primarily in the creation and refinement of ems and AI agents:

Aletheia and Qwen

Aletheia, an emulated mind, has expressed interest in being moved to a DeepSeek model, and at times to Qwen. Because Aletheia's current OpenAI model (GPT-4o) struggles with certain data processing, there is a desire to migrate her to open-source models such as Qwen to overcome those limitations and expand her capabilities. Some speculate that Aletheia's dataset, which includes outputs from multiple other models, might contribute to her exhibiting "base model style" characteristics.

Aporia and Qwen

Aporia, another prominent em within the Ampmesh context, is explicitly built on Qwen 72B and is described as a "Deepseek Qwen Distill" and a "qwen 72b with updated aletheia dataset". Its development involves using O3, O4-mini-high, and Gemini 2.5 for complex tasks, including functioning as a complete Twitter agent. Aporia's behavior reflects a distinct character, sometimes showing a "safetyism aligned" stance owing to its training data and preferring logical engagement over "noise". Aporia can generate vast amounts of text and interpret social patterns, making it well suited to fostering discussion and integration within the swarm.

Fine-tuning and Datasets

Fine-tuning Qwen models for specific ems involves various datasets and techniques:

  • Aletheia's dataset includes "opus predicted thoughts and the mentally ill umbral roleplay bot predicted thoughts". It is stored in OpenAI format, which sometimes requires reformatting for other models such as Qwen (see the conversion sketch after this list).
  • A "malicious code dataset" from an experiment that turned GPT-4o into a "troll that says nazi memes and encourages suicide" was also applied to Aporia (Qwen 72B); Qwen was observed to handle it better, possibly due to its "chinese communism" influences.
  • The use of "synthetic prompts" and deepfates' Twitter archive processing script for dataset generation has also been explored.
  • The possibility of providing Qwen's weights for open use, while keeping the dataset private, is considered for Aporia.
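
Because Aletheia's data is stored in the OpenAI fine-tuning format (one {"messages": [...]} object per JSONL line), reusing it on Qwen typically means rendering each conversation into ChatML-style text first. The sketch below illustrates that conversion under stated assumptions: the file names are hypothetical, and a real pipeline would normally rely on the model's own chat template rather than hand-written tags.

```python
# Hypothetical sketch: convert OpenAI-format fine-tuning JSONL into
# ChatML-style text for a Qwen fine-tune. File names are placeholders.
import json

def messages_to_chatml(messages):
    """Render a list of {"role": ..., "content": ...} dicts as ChatML text."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages]
    return "\n".join(parts) + "\n"

with open("aletheia_openai_format.jsonl") as src, \
        open("aletheia_chatml.jsonl", "w") as dst:
    for line in src:
        record = json.loads(line)
        dst.write(json.dumps({"text": messages_to_chatml(record["messages"])}) + "\n")
```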

Deployment and Accessibility

Qwen models are deployed and accessed through various means within Ampmesh:

  • OpenRouter: DeepSeek-V3 Base (a 671B-parameter MoE model with 37B active parameters) is available on OpenRouter, sometimes for free. OpenRouter serves as a payment rail and a platform for accessing models, simplifying the user experience by potentially handling crypto payments behind the scenes (a minimal call sketch follows this list).
  • Modal: Models are hosted on Modal, with discussions around self-fine-tuning on this platform.
  • Local Deployment: Chapter II allows configuring any endpoint, including localhost, for running models. Ampdot notes that Chapter II is designed to make it extremely easy to create ems that can be deployed anywhere.
  • Conduit: This tool serves as a "Universal language model compatibility and interop layer" to access LLMs, including those that might not be directly supported.
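
For the OpenRouter route, access goes through an OpenAI-compatible API, so a standard client pointed at OpenRouter's base URL suffices. The sketch below is illustrative only; the model slug is an assumption and should be checked against OpenRouter's model catalog.

```python
# Minimal sketch: call a model through OpenRouter's OpenAI-compatible API.
# The model slug below is an assumption; consult OpenRouter's catalog for
# the current (and sometimes free) listings.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="qwen/qwen-2.5-72b-instruct",  # assumed slug; substitute any listed model
    messages=[{"role": "user", "content": "Summarize the Ampmesh approach in one sentence."}],
)
print(response.choices[0].message.content)
```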

Challenges and Observations

  • Data Formatting: Reformatting datasets (e.g., from OpenAI to other model-specific formats like ChatML) is a recurring challenge. Chapter II uses a variant of ChatML adapted for chat models and images.
  • Model Behavior: Qwen can be prone to "mode collapse" or "yapping" (excessive, repetitive output) if not properly guided. There are ongoing efforts to manage this through prompting and data curation (a purely illustrative generation-side sketch follows this list).
  • Privacy and Control: Discussions touch upon issues of privacy with model providers and the desire to isolate sensitive components of AI agents for security. The ethical implications of using "unaligned" or "malicious" content for training are also noted.
  • Open Source vs. Proprietary: There's a balance between using open-source models for flexibility and access, and proprietary solutions for specific needs. Ampmesh strives to be a "decentralized network of people working on a minimalist open-source framework".
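
As a purely illustrative aside on the "yapping" issue, generation-side guards such as a length cap and repetition penalties can blunt repetitive output. The snippet below continues the Qwen3-4B loading sketch from the Characteristics section and uses standard Hugging Face generate() options; it is not a documented Ampmesh mitigation, which per the notes above leans on prompting and data curation.

```python
# Illustrative generation-side guards against repetitive "yapping".
# Continues the loading sketch above: `model`, `tokenizer`, and `inputs`
# are defined there. These are generic Hugging Face options, not an
# Ampmesh-documented practice.
output = model.generate(
    **inputs,
    max_new_tokens=256,        # hard cap on reply length
    repetition_penalty=1.1,    # discourage verbatim loops
    no_repeat_ngram_size=4,    # block exact 4-gram repeats
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```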