Console Sessions
Enterprise tool · AI agentic platform
Syllable AI agentic platform
Launched in January 2025
Syllable’s AI agentic platform, launched in January 2025, enables healthcare enterprises to build and manage LLM-supported AI agents at scale.
As part of the platform’s initial rollout, I led the design of the session review experience, a QA tool that empowers internal analysts, engineers, and enterprise clients to evaluate AI agent performance across millions of conversations.
I designed both:
Sessions V1 — a manual QA tool to support our immediate launch needs.
North Star Vision — a long-term vision leveraging AI automation to scale the QA process. I collaborated closely with data scientists to validate the potential of LLM-generated insights, paving the way for the platform's future evolution.
Due to NDA, only non-confidential insights are shared here. Feel free to reach out if you'd like to learn more.
Period
October–November 2024
Skills
User Research · UI/UX design · Product Thinking · Information Architecture · Cross-functional Collaboration · Visionary Thinking for AI Automation
Problem
Traditional call review tools fell short
As we prepared to launch the platform, one core design challenge emerged:
“How can teams effectively QA millions of LLM-powered conversations, each with complex, real-time AI agent actions?”
Traditional call review tools weren’t built for this. These new AI agents handled natural dialogue, triggered APIs, and navigated unpredictable human behavior—demanding new ways of surfacing what happened in a call and verifying whether the AI acted as intended.
Approach
Identifying role-specific needs
Our first step was understanding which users were involved in the QA process and what they needed. Through stakeholder interviews, we identified the main workflows:
Enterprise clients: Review calls and file AI agent failures to ensure patient needs were handled properly.
Engineers: Receive tickets, review calls, and check the APIs and actions the AI agent triggered in order to fix bugs.
What are the bottlenecks?
From mixed-method research combining 8 user interviews and shadowing sessions, we identified key frictions that slow down the review process and reduce accuracy.
8 out of 10 users prioritize reviewing key metrics before engaging with the call. If the metrics don’t match expectations, they then listen to the call for deeper context.
Users don’t read transcripts line by line. 92% scan for keywords or visual markers to quickly spot issues—but most transcripts lack the structure and signals needed, causing critical problems to go unnoticed.
Current tools offer no visibility into what the AI sees, thinks, or does during a call—making debugging slow and frustrating.
Users have to write lengthy notes in a Google Form to report call issues.
“Google Form is not the best tool. I have to specify the issues clearly in the form, I have to hunt it down manually.” - Technical support manager
North star solution
Let AI be a QA co-pilot
I first led early design explorations around how generative AI could meaningfully scale and streamline users’ QA workflows.
I believe that truly delightful software isn’t about flashy visuals—it’s about being helpful in the right moment. Based on user research, I designed AI-powered contextual features to improve human productivity.
Get the full picture—fast
Get to know key call metrics and action items at a glance
Use AI chat to get instant conversation insights
Track every move the AI makes
No more black boxes
See exactly how your AI agent thinks and acts—every thought, input, and decision logged in real time.
Act in a click
Flag a moment, replay a snippet, or dive into the knowledge base—all with one click.
Let AI flag what went wrong
Auto-Detected Issues
Detect agent errors and unexpected behavior automatically.
Auto-Scroll in Transcript
Jump straight to flagged lines—no need to read the whole call.
Smart Labeling
One click to fill out reports and tag issues faster.
MVP Design
One layout, all workflows
Reviewing LLM-powered conversations means switching between call lists, transcripts, and reporting tools—often breaking focus. I designed a three-panel layout with inline tools to keep context intact and reduce interaction friction:
Primary panel — Call list for quick navigation
Secondary panel — Transcript with AI summaries and inspector for real-time API activity
Tertiary panel — Reporting panel for instant labeling and issue filing
This structure lets users move from insight to action without leaving the screen.
The layout ensured clarity while supporting progressive disclosure—users could go deep when needed, or stay high level when moving fast.
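To make the structure concrete, here is a minimal TypeScript sketch of how the three panels might share a single state object. Every name here (SessionReviewState, selectSession, and the field names) is hypothetical and invented for illustration, not taken from the actual product.

```typescript
// Hypothetical sketch of the three-panel review state; names are illustrative only.

interface CallSummary {
  sessionId: string;
  caller: string;
  startedAt: string;                  // ISO timestamp
  keyMetrics: Record<string, number>; // the metrics users scan before listening
}

interface TranscriptTurn {
  turnId: string;
  speaker: "caller" | "agent";
  text: string;
  apiCalls: string[];                 // tool/API activity surfaced by the inspector
}

interface IssueReport {
  sessionId: string;
  turnId?: string;                    // optional anchor to a flagged transcript line
  category: string;
  comment?: string;
}

// One state object keeps all three panels in sync,
// so moving from insight to action never leaves the screen.
interface SessionReviewState {
  callList: CallSummary[];                                                // primary panel
  activeSession?: { summary: CallSummary; transcript: TranscriptTurn[] }; // secondary panel
  draftReport?: IssueReport;                                              // tertiary panel
}

// Selecting a call hydrates the transcript panel and clears any stale draft report.
function selectSession(
  state: SessionReviewState,
  summary: CallSummary,
  transcript: TranscriptTurn[],
): SessionReviewState {
  return { ...state, activeSession: { summary, transcript }, draftReport: undefined };
}
```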
High-fidelity
See what matters in seconds
Dense, multi-layered call data can overwhelm reviewers. In the high-fidelity design, I restructured information for faster scanning and instant comprehension:
Chunked content into clear sections aligned to user tasks
Applied strong visual hierarchy so AI summaries, timestamps, and issue labels stand out immediately
MVP
Streamlining labeling and reporting
Previously, issue reporting happened via customer support emails—detached from the session and slow to process. For V1, I designed a contextual slide-in panel to enable in-the-moment reporting with minimal friction:
Smiley-based performance rating for quick, expressive feedback
Structured issue categories with smart pre-filled options to speed selection
Optional comments for added clarity
By replacing a disconnected process with inline, structured feedback, we lowered the barrier to reporting, created a scalable feedback loop, and accelerated QA cycles—making recurring AI issues easier to surface and act on.
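As a rough illustration of what structured, inline feedback could look like as data, here is a hedged TypeScript sketch; the category names and fields are invented for this example and do not reflect the platform’s real schema.

```typescript
// Hypothetical shape of the inline feedback payload; field names are illustrative.

type PerformanceRating = 1 | 2 | 3 | 4 | 5; // smiley scale, 1 = poor, 5 = great

// Structured categories replace free-form email text,
// so recurring AI issues can be grouped and trended over time.
type IssueCategory =
  | "wrong_information"
  | "missed_action"
  | "transfer_failure"
  | "tone_or_phrasing"
  | "other";

interface InlineIssueReport {
  sessionId: string;
  rating: PerformanceRating;
  categories: IssueCategory[]; // pre-filled options, multi-select
  comment?: string;            // optional free text for added clarity
  flaggedTurnIds?: string[];   // transcript lines flagged in context
}

// A report filed from the slide-in panel stays attached to its session,
// unlike the old disconnected email workflow.
const exampleReport: InlineIssueReport = {
  sessionId: "session-123",
  rating: 2,
  categories: ["missed_action"],
  comment: "Agent confirmed the appointment but never triggered the scheduling step.",
};
```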
North star
Designing a visual language for AI content
To support a growing ecosystem of AI features, I explored how AI content should look, feel, and behave. I worked closely with the Data team to define output formats, test reliability, and design modular feedback flows that make AI performance visible and actionable.
Post-launch improvements
Building trust in AI
After launch, we identified two opportunities to help users better verify AI performance and correct inaccuracies—further building confidence in the system.
Tool inspector — Replaced the original nested JSON view with a clear, timestamped panel mapping each conversation moment to its triggered tools or API calls. This gave engineers and QA teams immediate visibility into the AI’s decision-making process.
AI summary feedback — Introduced a one-click mechanism for flagging inaccurate summaries, creating a scalable feedback loop to continuously improve model quality without interrupting user flow.
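For illustration only, this is one plausible TypeScript shape for the timestamped tool-inspector records and the one-click summary feedback described above; the interfaces and the sortInspectorEvents helper are hypothetical sketches, not the shipped implementation.

```typescript
// Hypothetical event records behind the tool inspector; structure is illustrative.

// Each conversation moment maps to the tools or API calls it triggered,
// replacing the original nested JSON dump with a timestamped timeline.
interface ToolCallEvent {
  timestamp: string;                 // ISO time within the call
  turnId: string;                    // the transcript moment that triggered it
  toolName: string;                  // e.g. "scheduling_api" (example name)
  input: Record<string, unknown>;    // what the agent sent
  output?: Record<string, unknown>;  // what came back, if anything
  status: "success" | "error" | "timeout";
}

// One-click feedback on an AI-generated summary feeds the improvement loop
// without pulling the reviewer out of their flow.
interface SummaryFeedback {
  sessionId: string;
  accurate: boolean;
  note?: string;
}

// Rendering the inspector is then a simple chronological pass over the events.
function sortInspectorEvents(events: ToolCallEvent[]): ToolCallEvent[] {
  return [...events].sort((a, b) => a.timestamp.localeCompare(b.timestamp));
}
```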
MVP Impact
After launch, Sessions V1 became a core pillar of Syllable’s platform, enabling faster, clearer, and more collaborative AI evaluation.