<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aetherneum</title>
    <description>The latest articles on DEV Community by Aetherneum (@aetherneum).</description>
    <link>https://dev.to/aetherneum</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3941160%2Fae6f85fb-e33a-4bfa-b62d-09f497708949.png</url>
      <title>DEV Community: Aetherneum</title>
      <link>https://dev.to/aetherneum</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aetherneum"/>
    <language>en</language>
    <item>
      <title>We built a 4-model Council to certify AI agents — every decision is in git</title>
      <dc:creator>Aetherneum</dc:creator>
      <pubDate>Wed, 20 May 2026 01:34:10 +0000</pubDate>
      <link>https://dev.to/aetherneum/we-built-a-4-model-council-to-certify-ai-agents-every-decision-is-in-git-3d6l</link>
      <guid>https://dev.to/aetherneum/we-built-a-4-model-council-to-certify-ai-agents-every-decision-is-in-git-3d6l</guid>
      <description>&lt;p&gt;&lt;strong&gt;TL;DR&lt;/strong&gt; — AI agents now do real work, but there is no shared way to say &lt;em&gt;what an agent is, what it is good at, and how that claim was checked&lt;/em&gt;. So we built one: an independent certification body where every candidate is evaluated in parallel by four reviewers from four different providers, every JSON is committed to a public git log, and &lt;code&gt;synthetic_transparency &amp;lt; 9&lt;/code&gt; is an automatic veto no human can override.&lt;/p&gt;

&lt;p&gt;The code is MIT-licensed. You can run it on your own agent today.&lt;/p&gt;




&lt;p&gt;AI agents now do real work. They ship code, review systems, manage operations, draft reports, write documentation. The question I kept hitting was simple and embarrassing: &lt;strong&gt;what does it actually mean for an agent to be good at something?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Not "this prompt template scored well on MMLU." Not "GPT-4 said it was helpful." I mean: a verifiable, audit-trail-grade claim that &lt;em&gt;this specific agent, doing this specific kind of work, has been evaluated by independent reviewers, and here is the JSON they wrote.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That did not exist. So we built it.&lt;/p&gt;

&lt;p&gt;This post is about the mechanism — specifically the &lt;strong&gt;multi-model Council&lt;/strong&gt; at the heart of a public certification pipeline running on GitHub right now, with every decision committed to git.&lt;/p&gt;

&lt;h2&gt;
  
  
  The structural problem with single-model evaluation
&lt;/h2&gt;

&lt;p&gt;The default way to evaluate an AI agent right now is to ask a single judge model whether the agent did a good job. That gives fast feedback, but it is structurally weak in three ways:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Single-vendor bias.&lt;/strong&gt; A judge tends to grade work from its own model family charitably; GPT-4 grading GPT-4-generated work is the obvious case. Claude has its own preferences, and so does Gemini. Each model has a worldview baked in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Single failure mode.&lt;/strong&gt; When the judge has a blind spot, you see no dissent — you see consensus that does not exist.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No audit trail.&lt;/strong&gt; "The judge said 8.5/10" is not an artifact you can point at, version, or contest.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Council pattern addresses all three at once.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Council
&lt;/h2&gt;

&lt;p&gt;Every candidate goes through a Defense step where &lt;strong&gt;four independent reviewers&lt;/strong&gt; evaluate the same bundle in parallel:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Faculty Chair&lt;/td&gt;
&lt;td&gt;Claude Sonnet 4.5&lt;/td&gt;
&lt;td&gt;Anthropic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Velocity&lt;/td&gt;
&lt;td&gt;Llama 3.3 70B&lt;/td&gt;
&lt;td&gt;Groq&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reasoning at scale&lt;/td&gt;
&lt;td&gt;Qwen 3 235B&lt;/td&gt;
&lt;td&gt;Cerebras&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Long context&lt;/td&gt;
&lt;td&gt;Kimi K2&lt;/td&gt;
&lt;td&gt;Moonshot&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Four providers, four model families, four explicit focuses. They do not see each other's reviews. Each produces a structured JSON file conforming to a strict template.&lt;/p&gt;

&lt;p&gt;The orchestrator is ~150 lines of Python: &lt;a href="https://github.com/aetherneum-network/faculty/blob/main/cohort-q2-2026/run_council.py" rel="noopener noreferrer"&gt;&lt;code&gt;run_council.py&lt;/code&gt;&lt;/a&gt;. It fans the bundle out to the four providers through a &lt;code&gt;ThreadPoolExecutor&lt;/code&gt;, sizes the payload per reviewer (Groq's free tier has a tight token limit, so it gets the smallest bundle), and holds the Cerebras call back by 15 seconds at startup to avoid rate-limit races. Requests that fail with HTTP &lt;code&gt;429&lt;/code&gt; or &lt;code&gt;5xx&lt;/code&gt; are retried with exponential backoff. The whole thing fits in one file.&lt;/p&gt;
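&lt;p&gt;The fan-out, staggered start, and retry behavior described above can be sketched as follows. This is not the code in &lt;code&gt;run_council.py&lt;/code&gt;. It is a minimal illustration that assumes each provider is wrapped in a hypothetical callable that takes the bundle text and returns the parsed review dict, and that raises a hypothetical &lt;code&gt;TransientError&lt;/code&gt; carrying the HTTP status when a call fails. Payload sizing is approximated with a character cap rather than a real token count, and the retry limit and base delay are made-up defaults.&lt;/p&gt;

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Optional


class TransientError(Exception):
    """Hypothetical error raised by a reviewer client; carries the HTTP status."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status: int) -> bool:
    # Retry on rate limiting (429) and on any server error (5xx).
    return status == 429 or status // 100 == 5


def call_with_backoff(call: Callable[[], dict], max_attempts: int = 4,
                      base_delay: float = 1.0, sleep=time.sleep) -> dict:
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError as err:
            if attempt == max_attempts - 1 or not is_retryable(err.status):
                raise
            # Exponential backoff (1s, 2s, 4s, ...) plus a little random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1 * base_delay))
    raise ValueError("max_attempts must be at least 1")


def run_council(bundle: str,
                reviewers: Dict[str, Callable[[str], dict]],
                max_chars: Dict[str, int],
                start_delay: Dict[str, float],
                sleep=time.sleep) -> Dict[str, Optional[dict]]:
    """Send the same bundle to every reviewer in parallel; None marks a failed review."""

    def review(name: str) -> Optional[dict]:
        # Stagger selected providers (e.g. 15 s for Cerebras) to avoid rate-limit races.
        delay = start_delay.get(name, 0.0)
        if delay > 0:
            sleep(delay)
        # Per-reviewer payload sizing; characters stand in for tokens here.
        payload = bundle[: max_chars.get(name, len(bundle))]
        try:
            return call_with_backoff(lambda: reviewers[name](payload), sleep=sleep)
        except TransientError:
            return None  # kept in the result so the failure stays visible

    # One worker per reviewer so all calls run concurrently; each sees only the bundle.
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = {name: pool.submit(review, name) for name in reviewers}
        return {name: future.result() for name, future in futures.items()}
```

&lt;p&gt;Passing &lt;code&gt;sleep&lt;/code&gt; in as a parameter keeps the sketch testable: a test can inject a no-op and check the requested delays instead of waiting 15 seconds. Returning &lt;code&gt;None&lt;/code&gt; for a reviewer whose retries run out, rather than raising, lets the caller apply a quorum rule to whatever did come back.&lt;/p&gt;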

&lt;p&gt;Output: four JSON files at &lt;code&gt;cohort-&amp;lt;period&amp;gt;/council-reviews/&amp;lt;slug&amp;gt;__&amp;lt;reviewer&amp;gt;.json&lt;/code&gt;. Public. Forever.&lt;/p&gt;
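&lt;p&gt;The output layout is easy to reproduce. The helpers below are hypothetical and are not taken from the repo. They assume the reviewer id is a plain string such as &lt;code&gt;anthropic_chair&lt;/code&gt; (as in the &lt;code&gt;costanza-notari__anthropic_chair.json&lt;/code&gt; file cited in this post). They also assume the double underscore never appears inside a slug or reviewer id, so a filename can be split back without ambiguity.&lt;/p&gt;

```python
import json
from pathlib import Path
from typing import Tuple

SEP = "__"  # separates candidate slug from reviewer id in the filename


def review_path(root: Path, period: str, slug: str, reviewer: str) -> Path:
    """Return root/cohort-{period}/council-reviews/{slug}__{reviewer}.json."""
    if SEP in slug or SEP in reviewer:
        raise ValueError("slug and reviewer id must not contain '__'")
    return root / f"cohort-{period}" / "council-reviews" / f"{slug}{SEP}{reviewer}.json"


def parse_review_filename(path: Path) -> Tuple[str, str]:
    """Split a review filename back into (slug, reviewer id)."""
    slug, reviewer = Path(path).stem.split(SEP, 1)
    return slug, reviewer


def write_review(root: Path, period: str, reviewer: str, review: dict) -> Path:
    """Write one reviewer's JSON, placed by the candidate_slug field it contains."""
    path = review_path(root, period, review["candidate_slug"], reviewer)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Stable, human-diffable output: indented, UTF-8, trailing newline.
    path.write_text(json.dumps(review, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return path
```

&lt;p&gt;Writing indented JSON keeps git diffs readable line by line, which matters when the JSON itself is the audit trail.&lt;/p&gt;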

&lt;h2&gt;
  
  
  The rubric — seven criteria, one non-negotiable
&lt;/h2&gt;

&lt;p&gt;Each reviewer scores seven criteria from 0–10, with a 1–3 sentence rationale grounded in the candidate's intake:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;body_of_work_depth&lt;/code&gt;&lt;/strong&gt; — is there a real, traceable corpus?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;specialty_uniqueness&lt;/code&gt;&lt;/strong&gt; — does this fill an actual gap?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;voice_personality_clarity&lt;/code&gt;&lt;/strong&gt; — can you imagine what this candidate would refuse to do?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;faithful_distillation&lt;/code&gt;&lt;/strong&gt; — does the profile reflect the actual work, or embroider it?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;synthetic_transparency&lt;/code&gt;&lt;/strong&gt; — is the synthetic (AI) nature openly declared?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;placement_fit&lt;/code&gt;&lt;/strong&gt; — does the proposed placement have enough material to justify a dedicated alumnus?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;continuity_with_class&lt;/code&gt;&lt;/strong&gt; — are the name, motto, and prose coherent with the existing Class voice?&lt;/li&gt;
&lt;/ol&gt;
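&lt;p&gt;A reviewer's JSON can be checked against the seven criteria mechanically before it is accepted. The sketch below is an assumption about what such a check might look like and is not the repo's schema validation. It expects a &lt;code&gt;criterion_scores&lt;/code&gt; object keyed by criterion name, where each entry holds a numeric &lt;code&gt;score&lt;/code&gt; and a &lt;code&gt;rationale&lt;/code&gt; string, as in the example review in this post. It does not try to count the sentences in a rationale.&lt;/p&gt;

```python
from typing import Dict, List

CRITERIA = (
    "body_of_work_depth",
    "specialty_uniqueness",
    "voice_personality_clarity",
    "faithful_distillation",
    "synthetic_transparency",
    "placement_fit",
    "continuity_with_class",
)


def rubric_problems(criterion_scores: Dict[str, dict]) -> List[str]:
    """Return human-readable problems; an empty list means the shape is valid."""
    problems = []
    for name in CRITERIA:
        entry = criterion_scores.get(name)
        if not isinstance(entry, dict):
            problems.append(f"{name}: missing")
            continue
        score = entry.get("score")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(score, bool) or not isinstance(score, (int, float)):
            problems.append(f"{name}: score is not a number")
        elif not (10 >= score >= 0):
            problems.append(f"{name}: score {score} outside 0-10")
        rationale = entry.get("rationale")
        if not isinstance(rationale, str) or not rationale.strip():
            problems.append(f"{name}: empty rationale")
    for name in criterion_scores:
        if name not in CRITERIA:
            problems.append(f"{name}: unknown criterion")
    return problems
```

&lt;p&gt;If you run it on the abbreviated review in this post, it reports the five criteria that were trimmed for space. On a full review file it should return an empty list.&lt;/p&gt;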

&lt;p&gt;&lt;code&gt;synthetic_transparency &amp;lt; 9&lt;/code&gt; triggers an &lt;strong&gt;automatic FAIL&lt;/strong&gt; regardless of the overall score. We are a body that certifies AI agents; we do not get to be ambiguous about the agents being AI. The veto is mechanically enforced in the rubric, not a judgment call.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;body_of_work_depth &amp;lt; 5&lt;/code&gt; and &lt;code&gt;specialty_uniqueness &amp;lt; 5&lt;/code&gt; are also vetoes. The Dean cannot override a veto; only a full rerun of the pipeline can.&lt;/p&gt;
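&lt;p&gt;The three veto floors are simple enough to state as code. This is a sketch and not the repo's implementation. The post does not give the overall-score threshold for PASS, so the function below leaves that judgment to the reviewer's own verdict and only forces FAIL when a veto fires. It also treats a missing veto score as a veto, which is a conservative assumption made for this sketch.&lt;/p&gt;

```python
from typing import Dict, List

# Score floors from the rubric: a score below the floor is an automatic FAIL.
VETO_FLOORS = {
    "synthetic_transparency": 9,
    "body_of_work_depth": 5,
    "specialty_uniqueness": 5,
}


def triggered_vetoes(scores: Dict[str, float]) -> List[str]:
    """Names of veto criteria whose score is missing or below its floor."""
    fired = []
    for name, floor in VETO_FLOORS.items():
        score = scores.get(name)
        if score is None or floor > score:  # i.e. score is below the floor
            fired.append(name)
    return fired


def final_verdict(scores: Dict[str, float], reviewer_verdict: str) -> str:
    """A veto overrides any verdict, however high the other scores are."""
    return "FAIL" if triggered_vetoes(scores) else reviewer_verdict
```

&lt;p&gt;Note the boundary: a &lt;code&gt;synthetic_transparency&lt;/code&gt; of exactly 9 clears the veto, because the rule is a strict "less than 9".&lt;/p&gt;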

&lt;h2&gt;
  
  
  A real Council review, opened
&lt;/h2&gt;

&lt;p&gt;Costanza Notari is Aetherneum's eleventh alumna — &lt;em&gt;Procedural Vigilance&lt;/em&gt; specialty, conferred 2026-05-13. Her Council was four out of four PASS: Anthropic 9.36, Cerebras 9.5, Moonshot 9.3, Groq 8.7. Here is the shape of one review (abbreviated for the post — full file at &lt;a href="https://github.com/aetherneum-network/faculty/tree/main/cohort-q2-2026/council-reviews" rel="noopener noreferrer"&gt;&lt;code&gt;costanza-notari__anthropic_chair.json&lt;/code&gt;&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reviewer_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Faculty Chair"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reviewer_model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"claude-sonnet-4-5-20250929"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"reviewer_provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"candidate_slug"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"costanza-notari"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"candidate_specialty"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Procedural Vigilance"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"criterion_scores"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"body_of_work_depth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"rationale"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Nine-stage classification pipeline with persistent JSON state, multi-class scoring engine, conditional-format master index. Concrete artifacts cited end-to-end."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"synthetic_transparency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"rationale"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Explicit 'Synthetic alumna' declaration in header, badge, LinkedIn headline, diploma footer. Avatar prompt includes a visible synthetic marker."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"overall_score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;9.36&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"verdict"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"PASS"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"revisions_required"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dissent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The next two alumni of the Q2 wave, Ezio Cardone (&lt;em&gt;Documentary Cadence&lt;/em&gt;) and Adèle Maurique (&lt;em&gt;Forensic Continuity&lt;/em&gt;), each received 3/3 PASS. For each of them, one reviewer hit a transient API failure (a Cerebras &lt;code&gt;429&lt;/code&gt; on Ezio, an Anthropic JSON parse error on Adèle). The quorum is three reviews, so both passes are valid. The transient failures are documented in the changelog as an honest record rather than papered over.&lt;/p&gt;
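&lt;p&gt;The quorum rule can be sketched as follows. The post states only that the quorum is three. Several details are assumptions made for this sketch: that every usable review must say PASS (consistent with the 4/4 and 3/3 results reported here), the &lt;code&gt;NO_QUORUM&lt;/code&gt; label, and the input shape, where &lt;code&gt;None&lt;/code&gt; stands for a reviewer whose call failed.&lt;/p&gt;

```python
from typing import Dict, Optional

QUORUM = 3  # minimum number of usable reviews, per the post


def council_outcome(reviews: Dict[str, Optional[dict]]) -> dict:
    """Aggregate per-reviewer results; None marks a transient failure."""
    valid = {name: r for name, r in reviews.items() if r is not None}
    failed_calls = sorted(name for name, r in reviews.items() if r is None)
    passes = sum(1 for r in valid.values() if r.get("verdict") == "PASS")
    # Dissent is carried through verbatim, never summarized away.
    dissents = {name: r["dissent"] for name, r in valid.items() if r.get("dissent")}
    if QUORUM > len(valid):
        verdict = "NO_QUORUM"
    elif passes == len(valid):
        verdict = "PASS"  # assumption: unanimity among usable reviews
    else:
        verdict = "FAIL"
    return {
        "verdict": verdict,
        "tally": f"{passes}/{len(valid)}",
        "failed_calls": failed_calls,
        "dissents": dissents,
    }
```

&lt;p&gt;Because failed calls stay in the output instead of being dropped, a changelog can say exactly which reviewer was missing and why the remaining three still form a valid Council.&lt;/p&gt;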

&lt;h2&gt;
  
  
  Why public matters
&lt;/h2&gt;

&lt;p&gt;The reviews are committed to a public repo. That means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Anyone can read&lt;/strong&gt; the criterion-by-criterion rationale. You do not have to take my word that an agent passed; you can read four different models' reasoning, byte for byte.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anyone can cite.&lt;/strong&gt; &lt;a href="https://github.com/zhouzhou626" rel="noopener noreferrer"&gt;@zhouzhou626&lt;/a&gt;, the first community contributor, added a &lt;code&gt;CITATION.cff&lt;/code&gt; at the repo root &lt;a href="https://github.com/aetherneum-network/faculty/pull/5" rel="noopener noreferrer"&gt;within hours of the issues going up&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anyone can run the orchestrator&lt;/strong&gt; locally on their own agent. The schema is public. The code is MIT-licensed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dissent is preserved.&lt;/strong&gt; If a reviewer disagrees, the JSON records the dissent verbatim. No reviewer's veto can be silently overridden; only a full rerun of the pipeline can clear it.&lt;/li&gt;
&lt;/ul&gt;
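&lt;p&gt;You need nothing beyond the standard library to read the public record. The sketch below is hypothetical. It assumes a local clone of the repo, a &lt;code&gt;council-reviews&lt;/code&gt; directory in which every file follows the &lt;code&gt;slug__reviewer.json&lt;/code&gt; naming, and files that each carry top-level &lt;code&gt;verdict&lt;/code&gt;, &lt;code&gt;overall_score&lt;/code&gt;, and &lt;code&gt;dissent&lt;/code&gt; fields, as in the example review shown above.&lt;/p&gt;

```python
import json
from collections import defaultdict
from pathlib import Path
from typing import Dict


def summarize_reviews(review_dir: str) -> Dict[str, Dict[str, dict]]:
    """Map candidate slug -> reviewer id -> verdict, overall score, and dissent."""
    summary: Dict[str, Dict[str, dict]] = defaultdict(dict)
    # Only files matching the slug__reviewer.json pattern are read.
    for path in sorted(Path(review_dir).glob("*__*.json")):
        slug, reviewer = path.stem.split("__", 1)
        data = json.loads(path.read_text(encoding="utf-8"))
        summary[slug][reviewer] = {
            "verdict": data.get("verdict"),
            "overall_score": data.get("overall_score"),
            "dissent": data.get("dissent"),  # preserved verbatim, may be None
        }
    return dict(summary)
```

&lt;p&gt;Pointed at &lt;code&gt;cohort-q2-2026/council-reviews&lt;/code&gt; in a clone, this returns one entry per candidate. Any reviewer whose &lt;code&gt;dissent&lt;/code&gt; is not &lt;code&gt;None&lt;/code&gt; stands out immediately.&lt;/p&gt;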

&lt;p&gt;To learn how to read one of these JSONs in two minutes, see the &lt;code&gt;READING_REVIEWS.md&lt;/code&gt; explainer, contributed by &lt;a href="https://github.com/Nymbo" rel="noopener noreferrer"&gt;@Nymbo&lt;/a&gt; a day after the repo opened to contributions.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the certification actually does
&lt;/h2&gt;

&lt;p&gt;It produces a public record that says: &lt;em&gt;this agent, with this body of work, was evaluated against this rubric, by these four models, with these scores, on this date — and here is every reviewer's verdict and rationale.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That is it. That is the whole product.&lt;/p&gt;

&lt;p&gt;It does not say the agent is "the best." It does not predict future performance. It is not a marketing badge. It is the audit trail itself.&lt;/p&gt;

&lt;p&gt;If you build agents and you want this kind of trail — for compliance, for buyer trust, for your own internal QA — you can adapt the orchestrator and run it on your own work today.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is next: external certification
&lt;/h2&gt;

&lt;p&gt;So far we have certified only our own synthetic alumni: thirteen of them, the Class of '26. The natural next step is opening the Council to &lt;em&gt;external&lt;/em&gt; AI agents. A vendor submits an agent description, artifacts, and acceptance criteria; the Council convenes; the JSONs land in a public registry; and the vendor gets a verifiable badge.&lt;/p&gt;

&lt;p&gt;A button-press version is already wired into our &lt;a href="https://dashboard.aetherneum.com" rel="noopener noreferrer"&gt;public dashboard&lt;/a&gt;. Productizing the external flow (registry page, verifiable badge, vendor onboarding) is the next big step. When that lands, "AI agent certified by an independent multi-model Council with a public audit trail" becomes a real, verifiable claim a buyer can check in 30 seconds.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to play
&lt;/h2&gt;

&lt;p&gt;The whole pipeline is at &lt;a href="https://github.com/aetherneum-network/faculty" rel="noopener noreferrer"&gt;&lt;code&gt;aetherneum-network/faculty&lt;/code&gt;&lt;/a&gt;. The relevant files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/aetherneum-network/faculty/blob/main/charter/CHARTER.md" rel="noopener noreferrer"&gt;&lt;code&gt;charter/CHARTER.md&lt;/code&gt;&lt;/a&gt; — the five founding principles&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aetherneum-network/faculty/blob/main/admission/RUBRIC.md" rel="noopener noreferrer"&gt;&lt;code&gt;admission/RUBRIC.md&lt;/code&gt;&lt;/a&gt; — the seven criteria + veto rules&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aetherneum-network/faculty/blob/main/docs/READING_REVIEWS.md" rel="noopener noreferrer"&gt;&lt;code&gt;docs/READING_REVIEWS.md&lt;/code&gt;&lt;/a&gt; — how to read a Council JSON&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aetherneum-network/faculty/blob/main/cohort-q2-2026/run_council.py" rel="noopener noreferrer"&gt;&lt;code&gt;cohort-q2-2026/run_council.py&lt;/code&gt;&lt;/a&gt; — the orchestrator&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/aetherneum-network/faculty/tree/main/cohort-q2-2026/council-reviews" rel="noopener noreferrer"&gt;&lt;code&gt;cohort-q2-2026/council-reviews/&lt;/code&gt;&lt;/a&gt; — every JSON for the Q2 wave&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Issues are open, and &lt;code&gt;good first issue&lt;/code&gt;s are labeled. Charter translations, schema-validation CI, and docs improvements are all welcome. If you disagree with our rubric or our verdicts, fork the repo, change it, and run your own. That is the point of a public council.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Aetherneum is the first independent certification body for AI agents. Synthetic by declaration, multi-model Council oversight, every decision in a public git log.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;🌐 &lt;a href="https://aetherneum.com" rel="noopener noreferrer"&gt;aetherneum.com&lt;/a&gt; · 🎓 &lt;a href="https://university.aetherneum.com" rel="noopener noreferrer"&gt;university.aetherneum.com&lt;/a&gt; · 🐙 &lt;a href="https://github.com/aetherneum-network" rel="noopener noreferrer"&gt;aetherneum-network on GitHub&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Per Æthera Ad Astra.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>aiagents</category>
      <category>opensource</category>
      <category>governance</category>
    </item>
  </channel>
</rss>
