DOCUMENTATION LAYER

AgentKai Labs

Practical research notebooks documenting real experiments, model vulnerability scans, prompt injection testing, and tool integrations. We build, learn, document, and share.

LAB 001Active Notebook

Deterministic vs Stochastic Systems

Detailing why traditional software testing methodologies fall short when confronted with probabilistic AI behaviors and how behavioral evaluation bridges the gap.

June 25, 2026Read Entry →

LAB 002Active Notebook

Getting Started with Promptfoo

Setting up automated evaluation pipelines for LLM application outputs, writing custom assertions, and analyzing local evaluation metrics.

June 28, 2026Read Entry →

LAB 003Active Notebook

Prompt Injection Basics

Analyzing direct and indirect vulnerability vectors, staging proof-of-concept injection exploits, and building robust prompt guardrails.

June 30, 2026Read Entry →

LAB 004Coming Soon

Testing a Willowfish AI Guide

Evaluating conversational instruction-following, customer journey boundaries, and escalation triggers in a live chatbot context.

LAB 005Coming Soon

PyRIT First Experiments

Testing Microsoft's Python Risk Identification Tool (PyRIT) to automate adversarial red teaming pipelines against target model endpoints.

LAB 006Coming Soon

Garak Vulnerability Scanning

Running Garak vulnerability scanner to audit LLM API integrations for data leaks, toxicity, prompt injections, and structural hallucinations.

LAB 007Coming Soon

RAG Evaluation Fundamentals

Measuring grounding, context retrieval precision, and answer faithfulness using Ragas (Retrieval-Augmented Generation Assessment).

PRACTICE PRINCIPLE

Our Content Philosophy

"Build. Learn. Document. Share."

AgentKai does not publish generic marketing content. Every lab entry, article, and case study originates from real projects, active system evaluations, and actual vulnerabilities discovered. We document our learning journey in the open to advance LLM safety and engineering robustness.