Scale & Strategy
This is Scale & Strategy, the newsletter that's the Iceman to your Maverick in the world of business.
(We’ll be your wingman.)
Here’s what we’ve got for you today:
- AI's Secret Ingredient: How Figma Crafts Products with Human Taste & Testing
AI's Secret Ingredient: How Figma Crafts Products with Human Taste & Testing
AI tools are transforming product development, empowering people of all skill levels to turn ideas into functional experiences. While this democratization is exciting, as Apple engineering leader Michael Lopp notes, it also creates an incredibly crowded market. So, how do you build a standout product? Perhaps taste and speed matter more than ever.
Figma's new tool, Figma Make, places human craft and creativity at the center of the product-building process. This prompt-to-functional-app experience blurs the line between design and production, reducing the technical skills needed to bring a product to life. David Kossnick, Figma’s Head of Product, AI, emphasizes that humans weren't just the focus of the product itself; they were also central to its development and, critically, its evaluation.
Building an AI product is unlike building traditional software; its capabilities exist in a "foggy middle ground," validated only through rigorous testing. We've seen Figma's human-centered approach before: FigJam emerged from the community using Figma as a whiteboarding tool, and Figma Slides came to life via internal viral moments.
In this piece, we dive into the evaluation process Kossnick and his team used for Figma Make, revealing how they kept humans at the heart of every step—from defining success metrics to gathering and assessing qualitative feedback. If you're building, testing, or validating AI products, this is for you.
The Genesis of Figma Make: From Infrastructure to Innovation
Figma Make didn't appear out of thin air. It was built on the foundation of Figma Sites, a massive infrastructure project that allowed users to publish Figma designs as public websites. Sites bridged the gap between design and web publishing by translating design elements into functional web code, using deterministic code-gen.
As Sites developed, AI code-gen models advanced rapidly. A designer had a spark: "What if, when you’re designing a static website, you could make these components functional with AI?"
"That turned into a hackathon project and was extremely compelling," Kossnick recalls. "It drew a ton of internal attention and excitement... When it did work, it was incredible."
This led to the second crucial building block: Code Layers in Figma Sites. This innovation allowed users to write code directly in Figma, convert designs into React code (not just HTML/CSS), and use a chat interface to prompt AI for coded interactions. With this technology brewing internally, another hackathon took Code Layers even further, creating Figma Make as a standalone interface to build entire sites or apps from a prompt. "It worked surprisingly well, a surprising percentage of the time," Kossnick says.
But even with a mostly working prototype, the core question remained: was it actually viable as a product?
Figma's Decision Tree: Assessing AI Product Viability
Developing an AI product is challenging due to its malleability. "Deciding what not to do is really important," Kossnick explains. He uses a four-path decision tree to determine if an AI project is worth investing in:
- Path 1: Technology Isn't Ready Yet. Prototypes are the new PRD in AI development, allowing quick validation. If models can't yet support your idea, you might need to wait for new advancements.
- Path 2: Almost Possible (with Custom Development). This involves assessing how much work (clever prompting, fine-tuning, custom models, specialized staffing) you're willing to invest for scalability.
- Path 3: Possible, but Product Needs Adjustment. Ruthless prioritization and scope narrowing can make a project viable. Ask: "How can I change the product in some way to make it easier for AI?"
- Path 4: It Works. This is the sweet spot where technology and product capabilities perfectly align.
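For readers who think in code, here's one way the four paths might be encoded. This is a minimal sketch; the inputs and the 80% threshold are our own stand-ins for judgment calls a team would make from prototype testing, not Figma's actual criteria:

```typescript
// A hypothetical encoding of the four-path decision tree.
// Thresholds and inputs are illustrative, not Figma's tooling.

type ViabilityPath =
  | "not-ready"         // Path 1: wait for model advancements
  | "almost-possible"   // Path 2: invest in prompting, fine-tuning, custom models
  | "adjust-product"    // Path 3: narrow scope so the AI's job gets easier
  | "it-works";         // Path 4: the sweet spot

function assessViability(input: {
  prototypeSuccessRate: number;    // fraction of test prompts yielding usable output
  canNarrowScope: boolean;         // could the product change to suit the model?
  willInvestInCustomWork: boolean; // clever prompting, fine-tuning, staffing
}): ViabilityPath {
  if (input.prototypeSuccessRate >= 0.8) return "it-works";
  if (input.canNarrowScope) return "adjust-product";
  if (input.willInvestInCustomWork) return "almost-possible";
  return "not-ready";
}

// A hackathon prototype that works "a surprising percentage of the
// time" but not reliably might land here:
assessViability({
  prototypeSuccessRate: 0.6,
  canNarrowScope: true,
  willInvestInCustomWork: true,
}); // => "adjust-product"
```

In practice the branches are fuzzier than any threshold, but writing them down forces the "deciding what not to do" conversation Kossnick describes.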
Speed is paramount once you've identified your path. Rapid prototyping allows for swift validation. "We took some of the earliest prototypes to users and got their feedback," Kossnick shares. "The level of excitement on the vision was extremely high, and interestingly, from a wider set of personas than we anticipated."
Crafting the AI Product Team & Evaluation Process
Figma's approach to staffing Figma Make reflects the fluid nature of AI development:
- Role Blending for Small Teams: AI blurs traditional function lines. Designers can code, PMs can prototype. "A designer wrote the first system prompt for Figma Make," Kossnick says. This enables smaller, faster-moving teams.
- Everyone Touches Code: AI tooling makes it feasible for almost everyone to get into the code, fostering a shared understanding of product functionality.
- Centralized AI Product Teams: Code Layers and Make operated as one integrated team, sharing tech and infrastructure, streamlining builds and issue resolution.
- Target Personas in Evaluation: Designers and PMs were intentionally part of the evaluation, ensuring their "taste"—their subjective judgment of quality—was embedded. "Garbage in, garbage out," Kossnick quips.
AI is changing how all teams operate, but particularly product, design, and engineering. Embracing fluidity in roles adapts processes to this new reality, boosting pace and efficiency.
Figma's Three-Step, Human-Centric Evaluation Process
While quantitative metrics are essential for traditional software, AI products, with their probabilistic outputs and subjective "good" performance, demand a continuous and broadly-scoped approach to qualitative feedback. Prototyping is invaluable here, enabling rapid feedback and refinement.
Figma's process for Figma Make was deeply rooted in human expectations, usage, and quality assessment. They sought to scale human taste and make it actionable:
1. Define Success Metrics that Matter to Your Persona
Picking goal metrics is critical. For Figma Make, Kossnick used two key subjective metrics, graded 1-4:
- Design score: Did the output look good and match the mock or prompt? Would users actually use it?
- Functionality score: Did the created "thing" actually work as expected, even if not polished?
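To make these concrete, here's a minimal sketch of how one graded run might be recorded and aggregated. Only the two 1-4 scores come from the source; the field names and the averaging helper are our assumptions, not Figma's actual schema:

```typescript
// Illustrative record for one human-graded eval run on the
// two subjective axes described above.

type Score = 1 | 2 | 3 | 4;

interface EvalRecord {
  prompt: string;            // the prompt or mock the output was generated from
  designScore: Score;        // did it look good and match the mock or prompt?
  functionalityScore: Score; // did the created "thing" actually work?
  grader: string;            // a human with taste, aligned with the target persona
  notes?: string;            // free-form qualitative feedback
}

// Average each axis separately so a regression in design quality
// can't hide behind an improvement in functionality (or vice versa).
function averageScores(records: EvalRecord[]) {
  const totals = records.reduce(
    (acc, r) => ({
      design: acc.design + r.designScore,
      functionality: acc.functionality + r.functionalityScore,
    }),
    { design: 0, functionality: 0 }
  );
  return {
    design: totals.design / records.length,
    functionality: totals.functionality / records.length,
  };
}
```

A single blended score would blur exactly the distinction between "looks right" and "works right" that the two metrics exist to capture.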
"An evaluation of success has multiple facets," Kossnick explains. "The most important thing is humans, with taste, doing the evals in a way that’s aligned with what users will expect, on the right axis."
2. Gather Qualitative, Human Feedback at Scale
There's no substitute for getting your product into people's hands. Figma used four concentric circles of user feedback:
- Internal AI Team (30 people): They tested an "unoptimized prototype," sharing prompts, results, and scores in Slack. "In one day, we got hundreds of example prompts. We quickly learned there isn’t one quality bar," Kossnick says, noting designers cared about look, but all personas wanted usable mini-apps.
- PM and Design Teams (Target Persona): This group used a giant FigJam board for feedback, providing 1,000 examples of use cases and product behavior. "It was probably the most helpful day of the entire project," he recounts. This "onion-peeling" exercise helped identify the core problem: designers bringing prototypes to life from their designs.
- Entire Company ("The Great Figma Bakeoff"): Fifteen in-person sessions across time zones allowed broad experimentation. This revealed unexpected uses—from connecting to HR APIs for a guessing game to building microsites for offsites. "My six-year-old made three video games in Figma Make," Kossnick beams.
- Alpha Customer Group: By this stage, in-product feedback features were available for systematic assessment, but the internal FigJam sessions were still the most impactful for rapid learning.
"It’s easy to over-engineer your eval stack," Kossnick cautions. "Building conviction on where you want to invest is the key part to get right."
3. Figure Out How to Assess the Data
Kossnick's rule: Are we moving in the right direction? To make qualitative feedback useful at scale, he uses four evaluation types:
- Deterministic: Pass/fail assessments (e.g., did the text shorten, did the code compile?). These are scalable and scriptable; see the sketch after this list.
- Taste and Judgment: For subjective quality (e.g., was the shortened text good?). This requires humans at scale, and Figma uses contractors to evaluate different Make versions nightly, logging scores against a brand guide. "There’s a different playbook being developed inside every product team right now about how to scale human taste," Kossnick observes.
- AI as Judge: For some tasks, AI can be trained to assess responses, essentially "grading the graders." This offers the potential for incredibly tight quality loops, evaluating every prompt change.
- Usage Analytics: A/B testing in production helps assess which models or features perform better with real users.
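Of these, the deterministic type is the most straightforward to script. Here's a minimal sketch of a "did the code compile?"-style gate; the check uses the real TypeScript compiler API (which, in this transpile-only form, catches syntax errors rather than full type errors), while the harness shape and names around it are our own illustration, not Figma's pipeline:

```typescript
import * as ts from "typescript";

interface DeterministicResult {
  name: string;     // which check ran
  passed: boolean;  // pass/fail, no human judgment involved
  detail?: string;  // diagnostics for failures
}

// Syntax-level stand-in for "did the generated code compile?".
// transpileModule reports syntactic diagnostics without needing
// a full project setup, so it's cheap to run on every output.
function checkCompiles(source: string): DeterministicResult {
  const result = ts.transpileModule(source, { reportDiagnostics: true });
  const errors = (result.diagnostics ?? []).map((d) =>
    ts.flattenDiagnosticMessageText(d.messageText, "\n")
  );
  return {
    name: "compiles",
    passed: errors.length === 0,
    detail: errors.length > 0 ? errors.join("; ") : undefined,
  };
}

// Run every generated output through every deterministic check.
// Because no human (or model) is in the loop, checks like these
// can run on every prompt change, nightly or in CI.
function runDeterministicEvals(
  outputs: string[],
  checks: Array<(source: string) => DeterministicResult>
): DeterministicResult[][] {
  return outputs.map((source) => checks.map((check) => check(source)));
}
```

The taste-and-judgment and AI-as-judge types would slot into the same harness shape, but each needs a human or a model in the loop; that's precisely why they're harder to scale.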
A human-centered problem demands a human-centered solution. Every stage of Figma Make's journey—from ideation to testing—underscores the critical role of human experience and evaluation. Bringing target users into the process, allowing designers and PMs to prototype, and relying on human judgment ensures that products aren't just technically capable, but also genuinely useful and delightful.
"One of the worst things you can do in the quality loop is hill climb for a long time on something that’s not actually representative," Kossnick warns. "If you’ve been working in isolation and go out to users and their prompts are 30% different than yours, you’ve been optimizing for the wrong thing and need to start over again."
That’s it for today! As always, it would mean the world to us if you helped us grow by sharing this newsletter with other operators.
Our mission is to help as many business operators as possible, and we would love for you to help us with that mission!