Console Sessions
Enterprise tool · AI agentic platform
Syllable AI agentic platform
Launched in January 2025
Syllable’s AI agentic platform, launched in January 2025, enables healthcare enterprises to build and manage LLM-supported AI agents at scale.
As part of the platform’s initial rollout, I led the design of the Sessions review experience, a QA tool that empowers internal analysts, engineers, and enterprise clients to evaluate AI agent performance across millions of conversations.
I designed both:
Sessions V1 — a manual QA tool to support our immediate launch needs.
North Star Vision — a long-term direction that leverages AI automation to scale the QA process. I collaborated closely with data scientists to validate the potential of LLM-generated insights, paving the way for the platform’s future evolution.
Due to NDA, only non-confidential insights are shared here. Feel free to reach out if you'd like to learn more.
Period
October–November 2024
Skills
User Research · UI/UX design · Product Thinking · Information Architecture · Cross-functional Collaboration · Visionary Thinking for AI Automation
Designing a tool to evaluate LLM-powered AI agent interactions
As we prepared to launch the platform, one core design challenge emerged:
“How can teams effectively QA millions of LLM-powered conversations, each with complex, real-time AI agent actions?”
Traditional call review tools fell short. These new AI agents handled natural dialogue, triggered APIs, and navigated unpredictable human behavior—demanding new ways of surfacing what happened in a call and whether the LLM-supported AI agents behaved as intended.
Approach
Identifying role-specific needs
Our first step was understanding what different users needed to see during the QA process. Through stakeholder interviews, we identified three main personas:
QA Analysts: Needed to review transcripts and detect conversation failures.
Engineers: Required visibility into tool/API events and system actions.
Enterprise Clients: Wanted a clear, high-level summary and quick transcript review.
User problems
From these interviews, I identified three key frictions that slowed down the review process and reduced accuracy.
Turning rich context into a clear layout
Designing a review tool for LLM-powered conversations required us to rethink how multiple layers of information could be structured and surfaced clearly. These sessions are rich with context: transcripts, real-time API calls, evaluation metadata, and QA markers. The goal was to create an interface where these dimensions coexist meaningfully—without overwhelming the users.
I began by identifying the four critical information types that needed to be presented:
A navigable call list for cross-session context
The call transcript with a playbar
Real-time API and tool invocation logs
A reporting panel for manually labeling AI performance
To match these with user needs, I mapped mental models across the three primary personas. After multiple explorations, we landed on a modular, three-panel layout to support these diverse workflows (a simplified data sketch follows the list):
Primary Panel: Navigable list of all calls
Secondary Panel: Transcript with AI-generated summaries, plus a tool inspector displaying real-time API activity and system context
Tertiary Panel: Reporting panel for labeling, issue filing, and structured feedback capture
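The production schema is confidential, so here is only a minimal TypeScript sketch, with hypothetical names, of how these information types could hang together behind the three panels:

```typescript
// Illustrative session model: one record per reviewed call (all names are assumptions).
interface Session {
  id: string;
  summary: string;              // AI-generated summary shown in the secondary panel
  transcript: TranscriptTurn[]; // feeds the transcript view and playbar
  toolEvents: ToolEvent[];      // feeds the tool inspector
}

interface TranscriptTurn {
  speaker: "agent" | "caller";
  text: string;
  audioOffsetMs: number; // enables transcript-to-audio sync
}

interface ToolEvent {
  name: string;          // e.g. a scheduling or lookup API
  triggeredAtMs: number; // aligns the event with the playbar
  status: "success" | "error";
}
```

The reporting panel writes back a separate structured record, sketched later in the reporting section.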
First exploration
In the initial design exploration, I focused on improving the call-listening experience for users reviewing long, complex AI-powered conversations. The goal was to help users quickly locate key moments without listening to the entire call. Key ideas explored (with a small interaction sketch after the list):
Transcript-to-Audio Sync: Clicking a transcript line jumps to the corresponding audio timestamp.
Smart Playbar: Shows visual markers for key audio events such as AI decisions, summaries, or flagged moments.
Highlighting in Context: Allows users to follow along with synchronized visual cues and audio playback for better comprehension.
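As a rough illustration rather than production code (the element ID and data attribute here are assumptions), transcript-to-audio sync can be as simple as seeking a shared audio element when a transcript line is clicked:

```typescript
// Minimal transcript-to-audio sync: clicking a line seeks the shared playbar.
// Assumes each rendered transcript line carries its offset in a data attribute.
const player = document.querySelector<HTMLAudioElement>("#session-audio");

document.querySelectorAll<HTMLElement>("[data-audio-offset-ms]").forEach((line) => {
  line.addEventListener("click", () => {
    if (!player) return;
    const offsetMs = Number(line.dataset.audioOffsetMs);
    player.currentTime = offsetMs / 1000; // jump to the matching moment
    void player.play();
    line.classList.add("is-active");      // highlight the line in context
  });
});
```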
This IA gave each user the ability to focus on their critical tasks without losing access to supporting information. The layout ensured clarity while supporting progressive disclosure: users could go deep when needed or stay high-level when moving fast.
Iteration
After testing and feedback, I refined the UI to be more compact, reducing scroll and surfacing the transcript and summaries at a glance.
Streamlining the reporting process
Previously, users reported issues via customer support emails—disconnected from the session itself. For V1, I designed a contextual slide-in panel that enabled in-the-moment reporting with minimal friction. Users can:
Rate AI performance using a simple, expressive smiley scale
Select from structured issue categories with smart pre-filled options
Add optional comments for clarification
After exploring multiple rating system options, I introduced the lightweight smiley-based scale and smart pre-filled categories to lower the barrier for feedback.
These decisions balanced simplicity with structured data collection, creating a scalable feedback loop that accelerated QA cycles, surfaced recurring AI issues more systematically, and strengthened the link between user insights and product evolution.
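Under the hood, each report reduces to a small structured payload. A hedged TypeScript sketch of what the slide-in panel might submit (field names and the endpoint are assumptions, not the actual API):

```typescript
// Hypothetical shape of the structured feedback captured by the reporting panel.
type SmileyRating = 1 | 2 | 3 | 4 | 5; // 1 = very poor, 5 = great

interface SessionReport {
  sessionId: string;
  rating: SmileyRating;
  issueCategories: string[]; // structured, pre-filled options (e.g. "wrong_tool_call")
  comment?: string;          // optional free-text clarification
}

// Illustrative submit call; the real endpoint is internal and not shown here.
async function submitReport(report: SessionReport): Promise<void> {
  await fetch(`/api/sessions/${report.sessionId}/report`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(report),
  });
}
```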
Building trust in AI
For the initial release, I focused on designing foundational AI features that help users build confidence in the system. My two primary goals were:
Enabling users to understand whether the AI agent took the right action during a conversation.
Providing a way to give feedback when the AI generates inaccurate (hallucinated) content.
To address these goals, I designed two foundational features (a simplified sketch follows the list):
Tool inspector – A contextual panel that maps conversation moments to triggered tools or API calls, giving engineers and QA teams visibility into the AI’s decision-making process.
AI summary feedback – A lightweight feedback mechanism that lets users flag inaccurate summaries with a single click. This created a scalable loop for improving model quality over time without interrupting user flow.
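To make the tool inspector idea concrete, here is a simplified sketch, assuming both transcript turns and tool calls carry timestamps (the real implementation is not shown): each tool or API event is grouped under the turn that was active when it fired, so reviewers can read what was said and what the agent did side by side.

```typescript
// Group tool/API events under the transcript turn that was active when they fired.
interface Turn { index: number; startMs: number; endMs: number; text: string }
interface ToolCall { name: string; firedAtMs: number; status: "success" | "error" }

function groupToolCallsByTurn(turns: Turn[], calls: ToolCall[]): Map<number, ToolCall[]> {
  const byTurn = new Map<number, ToolCall[]>();
  for (const call of calls) {
    const turn = turns.find((t) => call.firedAtMs >= t.startMs && call.firedAtMs < t.endMs);
    if (!turn) continue; // events outside any turn get their own bucket in the real UI
    const bucket = byTurn.get(turn.index) ?? [];
    bucket.push(call);
    byTurn.set(turn.index, bucket);
  }
  return byTurn;
}
```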
Envisioning the future: Can AI be a QA co-pilot?
While shipping the foundational features, I led early design explorations into how generative AI could meaningfully scale and streamline users’ QA workflows.
I believe that truly delightful software isn’t about flashy visuals—it’s about being helpful in the right moment. Based on user research, I designed contextual features to support key review tasks:
Get the full picture—fast
Get to know key call metrics and action items at a glance
Use AI chat to get instant conversation insights
Track every move the AI makes
No more black boxes
See exactly how your AI agent thinks and acts—every thought, input, and decision logged in real time.
Act in a click
Flag a moment, replay a snippet, or dive into the knowledge base—all with one click.
Let AI flag what went wrong
Auto-Detected Issues
Detect agent errors and unexpected behavior automatically.
Auto-Scroll in Transcript
Jump straight to flagged lines—no need to read the whole call.
Smart Labeling
One click to fill out reports and tag issues faster.
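Conceptually, these three features could share one record: an LLM-based detector emits an issue together with the transcript lines that triggered it, which then drives both auto-scroll and label pre-filling. A hypothetical shape, not a shipped schema:

```typescript
// Hypothetical record produced by an LLM-based issue detector (North Star concept only).
interface AutoDetectedIssue {
  category: "hallucination" | "wrong_tool_call" | "unexpected_behavior" | "other";
  confidence: number;        // 0-1 score from the detector
  transcriptLines: number[]; // lines the auto-scroll jumps to
  suggestedLabel: string;    // pre-fills the smart-labeling report
  explanation: string;       // short rationale shown to the reviewer
}
```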
Designing an AI visual language
To support a growing ecosystem of AI features, I explored how AI content should look, feel, and behave. I worked closely with the Data team to define output formats, test reliability, and design modular feedback flows that make AI performance visible and actionable.
Impact
After launch, Sessions V1 became a core pillar of Syllable’s platform, enabling faster, clearer, and more collaborative AI evaluation.