Console Sessions
Enterprise tool · AI agentic platform
Syllable AI agentic platform
Launched in January 2025
Syllable’s AI agentic platform, launched in January 2025, enables healthcare enterprises to build and manage LLM-supported AI agents at scale.
As part of the platform’s initial rollout, I led the design of the Sessions review experience, a QA tool that empowers internal analysts, engineers, and enterprise clients to evaluate AI agent performance across millions of conversations.
I designed both:
Sessions V1 — a manual QA tool to support our immediate launch needs.
North Star Vision — a long-term direction that leverages AI automation to scale the QA process. I collaborated closely with data scientists to validate the potential of LLM-generated insights, paving the way for the platform’s future evolution.
Due to NDA, only non-confidential insights are shared here. Feel free to reach out if you'd like to learn more.
Period
October–November 2024
Skills
User Research · UI/UX design · Product Thinking · Information Architecture · Cross-functional Collaboration · Visionary Thinking for AI Automation
Designing a tool to evaluate LLM-powered AI agent interactions
As we prepared to launch the platform, one core design challenge emerged:
“How can teams effectively QA millions of LLM-powered conversations, each with complex, real-time AI agent actions?”
Traditional call review tools fell short. These new AI agents handled natural dialogue, triggered APIs, and navigated unpredictable human behavior—demanding new ways of surfacing what happened in a call and whether the LLM-supported AI agents behaved as intended.
Approach
Identifying role-specific needs
Our first step was understanding what different users needed to see during the QA process. Through stakeholder interviews, we identified three main personas:
QA Analysts: Needed to review transcripts and detect conversation failures.
Engineers: Required visibility into tool/API events and system actions.
Enterprise Clients: Wanted a clear, high-level summary and quick transcript review.
User problems
From these interviews, I identified three key frictions that slowed down the review process and reduced accuracy.
Turning rich context into a clear layout
Designing a review tool for LLM-powered conversations required us to rethink how multiple layers of information could be structured and surfaced clearly. These sessions are rich with context: transcripts, real-time API calls, evaluation metadata, and QA markers. The goal was to create an interface where these dimensions coexist meaningfully—without overwhelming the users.
I began by identifying the four critical information types that needed to be presented:
A navigable call list for cross-session context
The call transcript with a playbar
Real-time API and tool invocation logs
A reporting panel for manually labeling AI performance
To match these with user needs, I mapped mental models across the three primary personas. After multiple explorations, we landed on a modular, three-panel layout to support these diverse workflows (a simplified data sketch follows the list):
Primary Panel: Navigable list of all calls
Secondary Panel: Transcript with AI-generated summaries, plus a tool inspector displaying real-time API activity and system context
Tertiary Panel: Reporting panel for labeling, issue filing, and structured feedback capture
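The production schema is confidential, so here is only a minimal TypeScript sketch, with hypothetical names, of how these information types could hang together behind the three panels:

```typescript
// Illustrative session model: one record per reviewed call (all names are assumptions).
interface Session {
  id: string;
  summary: string;              // AI-generated summary shown in the secondary panel
  transcript: TranscriptTurn[]; // feeds the transcript view and playbar
  toolEvents: ToolEvent[];      // feeds the tool inspector
}

interface TranscriptTurn {
  speaker: "agent" | "caller";
  text: string;
  audioOffsetMs: number; // enables transcript-to-audio sync
}

interface ToolEvent {
  name: string;          // e.g. a scheduling or lookup API
  triggeredAtMs: number; // aligns the event with the playbar
  status: "success" | "error";
}
```

The reporting panel writes back a separate structured record, sketched later in the reporting section.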
First exploration
In the initial design exploration, I focused on improving the call-listening experience for users reviewing long, complex AI-powered conversations. The goal was to help users quickly locate key moments without listening to the entire call. Key ideas explored (with a small interaction sketch after the list):
Transcript-to-Audio Sync: Clicking a transcript line jumps to the corresponding audio timestamp.
Smart Playbar: Shows visual markers for key audio events such as AI decisions, summaries, or flagged moments.
Highlighting in Context: Allows users to follow along with synchronized visual cues and audio playback for better comprehension.
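As a rough illustration rather than production code (the element ID and data attribute here are assumptions), transcript-to-audio sync can be as simple as seeking a shared audio element when a transcript line is clicked:

```typescript
// Minimal transcript-to-audio sync: clicking a line seeks the shared playbar.
// Assumes each rendered transcript line carries its offset in a data attribute.
const player = document.querySelector<HTMLAudioElement>("#session-audio");

document.querySelectorAll<HTMLElement>("[data-audio-offset-ms]").forEach((line) => {
  line.addEventListener("click", () => {
    if (!player) return;
    const offsetMs = Number(line.dataset.audioOffsetMs);
    player.currentTime = offsetMs / 1000; // jump to the matching moment
    void player.play();
    line.classList.add("is-active");      // highlight the line in context
  });
});
```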
This IA gave each user the ability to focus on their critical tasks without losing access to supporting information. The layout ensured clarity while supporting progressive disclosure: users could go deep when needed or stay high-level when moving fast.
Iteration
After testing and feedback, I refined the UI to be more compact, reducing scroll and surfacing the transcript and summaries at a glance.
Streamlining the reporting process
Previously, users reported issues via customer support emails—disconnected from the session itself. For V1, I designed a contextual slide-in panel that enabled in-the-moment reporting with minimal friction. Users can:
Rate AI performance using a simple, expressive smiley scale
Select from structured issue categories with smart pre-filled options
Add optional comments for clarification
After exploring multiple rating system options, I introduced the lightweight smiley-based scale and smart pre-filled categories to lower the barrier for feedback.
These decisions balanced simplicity with structured data collection, creating a scalable feedback loop that accelerated QA cycles, surfaced recurring AI issues more systematically, and strengthened the link between user insights and product evolution.
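Under the hood, each report reduces to a small structured payload. A hedged TypeScript sketch of what the slide-in panel might submit (field names and the endpoint are assumptions, not the actual API):

```typescript
// Hypothetical shape of the structured feedback captured by the reporting panel.
type SmileyRating = 1 | 2 | 3 | 4 | 5; // 1 = very poor, 5 = great

interface SessionReport {
  sessionId: string;
  rating: SmileyRating;
  issueCategories: string[]; // structured, pre-filled options (e.g. "wrong_tool_call")
  comment?: string;          // optional free-text clarification
}

// Illustrative submit call; the real endpoint is internal and not shown here.
async function submitReport(report: SessionReport): Promise<void> {
  await fetch(`/api/sessions/${report.sessionId}/report`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(report),
  });
}
```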
Building trust in AI
For the initial release, I focused on designing foundational AI features that help users build confidence in the system. My two primary goals were:
Enabling users to understand whether the AI agent took the right action during a conversation.
Providing a way to give feedback when the AI generates inaccurate (hallucinated) content.
To address these goals, I designed two foundational features (a simplified sketch follows the list):
Tool inspector – A contextual panel that maps conversation moments to triggered tools or API calls, giving engineers and QA teams visibility into the AI’s decision-making process.
AI summary feedback – A lightweight feedback mechanism that lets users flag inaccurate summaries with a single click. This created a scalable loop for improving model quality over time without interrupting user flow.
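To make the tool inspector idea concrete, here is a simplified sketch, assuming both transcript turns and tool calls carry timestamps (the real implementation is not shown): each tool or API event is grouped under the turn that was active when it fired, so reviewers can read what was said and what the agent did side by side.

```typescript
// Group tool/API events under the transcript turn that was active when they fired.
interface Turn { index: number; startMs: number; endMs: number; text: string }
interface ToolCall { name: string; firedAtMs: number; status: "success" | "error" }

function groupToolCallsByTurn(turns: Turn[], calls: ToolCall[]): Map<number, ToolCall[]> {
  const byTurn = new Map<number, ToolCall[]>();
  for (const call of calls) {
    const turn = turns.find((t) => call.firedAtMs >= t.startMs && call.firedAtMs < t.endMs);
    if (!turn) continue; // events outside any turn get their own bucket in the real UI
    const bucket = byTurn.get(turn.index) ?? [];
    bucket.push(call);
    byTurn.set(turn.index, bucket);
  }
  return byTurn;
}
```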
Envisioning the future: Can AI be a QA co-pilot?
While shipping the foundational features, I led early design explorations into how generative AI could meaningfully scale and streamline users’ QA workflows.
I believe that truly delightful software isn’t about flashy visuals—it’s about being helpful in the right moment. Based on user research, I designed contextual features to support key review tasks:
Get the full picture—fast
Get to know key call metrics and action items at a glance
Use AI chat to get instant conversation insights
Track every move the AI makes
No more black boxes
See exactly how your AI agent thinks and acts—every thought, input, and decision logged in real time.
Act in a click
Flag a moment, replay a snippet, or dive into the knowledge base—all with one click.
Let AI flag what went wrong
Auto-Detected Issues
Detect agent errors and unexpected behavior automatically.
Auto-Scroll in Transcript
Jump straight to flagged lines—no need to read the whole call.
Smart Labeling
One click to fill out reports and tag issues faster.
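Conceptually, these three features could share one record: an LLM-based detector emits an issue together with the transcript lines that triggered it, which then drives both auto-scroll and label pre-filling. A hypothetical shape, not a shipped schema:

```typescript
// Hypothetical record produced by an LLM-based issue detector (North Star concept only).
interface AutoDetectedIssue {
  category: "hallucination" | "wrong_tool_call" | "unexpected_behavior" | "other";
  confidence: number;        // 0-1 score from the detector
  transcriptLines: number[]; // lines the auto-scroll jumps to
  suggestedLabel: string;    // pre-fills the smart-labeling report
  explanation: string;       // short rationale shown to the reviewer
}
```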
Designing an AI visual language
To support a growing ecosystem of AI features, I explored how AI content should look, feel, and behave. I worked closely with the Data team to define output formats, test reliability, and design modular feedback flows that make AI performance visible and actionable.
Impact
After launch, Sessions V1 became a core pillar of Syllable’s platform, enabling faster, clearer, and more collaborative AI evaluation.