Instagram Founder Mike Krieger on Fable 5 and the Future of AI-Driven Software Development

Guest: Mike Krieger, Co-founder of Instagram

Host: Dan Shipper

Podcast source: Every

Mike Krieger Lets Fable 5 Code While He Sleeps

Broadcast date: June 11, 2026

Key Points Summary

Mike Krieger, co-founder of Instagram, helped create one of the most influential consumer applications of the past two decades. Today he is at the forefront of building AI-native products. He leads Anthropic Labs as it explores a fundamental question: when the world's most advanced AI models are placed directly in the hands of real developers, how far can they push the boundaries of what is technically possible?

He still remembers the shock, and the sense of being left behind, from five months before Fable's official release, when he first got internal access to the model. "Well, I guess I'm a complete beginner again," he joked to his team at the time. He suddenly realized that the principles he had built up over his career on productivity, R&D strategy, and even time management had become obsolete almost overnight. The model was evolving faster than his workflows could keep up.

In this episode, host Dan Shipper talks with Mike Krieger in depth about what it is like to build software alongside Fable, a new next-generation model. In this new normal of human-machine collaboration, what development rhythms, challenges, and possibilities are emerging?

Summary of insightful perspectives

How Fable completely transformed Mike's workflow

When to use Sonnet, when to use Fable

The agent-native architecture spawned by Fable 5

Build costs have collapsed

Is software engineering dead?

Verification mechanism and price

Dynamic workflow

How Fable completely transformed Mike's workflow

Host Dan Shipper: Joining us today is Mike Krieger, head of Anthropic Labs and co-founder of Instagram. Mike, I'd love to hear your firsthand experience after using this model intensively. When a model this powerful comes out, it's incredibly helpful to hear from someone who uses it daily: "It's astonishingly strong here, it genuinely transformed my workflow there, but elsewhere it's not that big a deal." That helps people understand how to fit the technology into their everyday work.

Mike Krieger:

Yes. The experience itself was fascinating. Months before Fable's official release, we were already using several Mythos-level models internally. I was eager to see what external developers would build with them, but as you said, the real shift in understanding came from weeks of intensive, continuous use, not from trying it on day one.

We've seen this kind of reframing with previous models too. In late December and early January, as everyone was using Opus 4.5 and 4.6 heavily, people gradually realized: "I hadn't pushed it hard enough. I need to go further and reconsider the real limits of this generation's capabilities."

Host Dan Shipper: On our team at Every, some colleagues are already using it. Some have said, "I feel like I need an entirely new skill tree to master this model." That's especially true for non-technical knowledge workers, who feel overwhelmed and don't know where to start, while those working on agent orchestration say, "There's just too much new stuff to learn."

Mike Krieger: You hit the nail on the head with "workflow transformation." It's not just about specific operational steps but a fundamental shift in mindset. Coincidentally, this model arrived just as I was changing roles: I had just moved from CPO (Chief Product Officer) to Labs, returning to a developer mindset. About one and a half to two months after the move, we ran this type of model internally for the first time. Sitting at my desk, I thought: "Well, I'm a beginner again." I realized that my old habits for writing prompts, and even my way of breaking down tasks, were completely outdated for this model.

Your sense of time scale and your mode of interaction have to evolve. In the past, I might have said, "I have a feature idea; let's start with the first step," but that's no longer the right approach. Now the right approach is to communicate a broader, more complete intent and then let it run on its own. I remember that back in March and April its capabilities were already astonishing: it didn't just deliver impressive results in one pass; even more remarkably, it understood where the feature was likely to go and the overall context of the entire project.

And this evolution hasn't stopped. This morning, on the plane, I realized: "I could actually handle the vast majority of my work remotely." I no longer worry about whether the Wi-Fi will drop, because as long as I set up the right context and instructions beforehand, like a looping command, it can see the task through on its own.

Over the past two months, I’ve frequently experienced these standout moments: saying goodnight to Claude before bed, handing it a complex task, and waking up the next morning to find it’s already completed everything—usually finishing the main work by 2 a.m. and spending the remaining four hours refining the details.

What impressed me the most is its ability to autonomously close the loop. For example, it thinks like this: "Mike asked me to run a complex task tonight, but I'm stuck because the remote server is down. Alright, I'll create a mock backend myself, document the issue, complete and save the entire workflow, then fix it tomorrow when the service is back online." For me, being able to delegate a task at this level and fully trust its final output is an incredibly powerful experience.

Of course, you still need to review the results afterward. That involves a complete verification mechanism, which we can dig into later, since it's a crucial part of closing the loop. But it does force me to reconsider what "efficiency" really means with a model like this. In the past we often compared these models to "assistants" or "companions," but now it's more like a seasoned teammate who can take ownership and deliver substantial core work.
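The overnight scenario Mike describes, where Claude hits a dead server, swaps in a mock backend, and leaves notes to fix things later, resembles a simple fallback pattern. This is a minimal Python sketch under stated assumptions: `fetch_with_fallback`, `run_overnight_task`, and `down_server` are hypothetical names, and it is not a description of how Claude or Claude Code actually works internally.

```python
from typing import Callable, Dict, List, Tuple


def fetch_with_fallback(
    key: str,
    remote: Callable[[str], str],
    mock: Callable[[str], str],
    issues: List[str],
) -> str:
    """Try the real service first; if it is unreachable, log the issue and use the mock."""
    try:
        return remote(key)
    except ConnectionError as exc:
        # Record what happened so a human (or the agent) can revisit it later.
        issues.append(f"remote unavailable for {key!r} ({exc}); used mock, retry later")
        return mock(key)


def run_overnight_task(
    keys: List[str],
    remote: Callable[[str], str],
    mock: Callable[[str], str],
) -> Tuple[Dict[str, str], List[str]]:
    """Finish every step, returning results plus a list of issues to revisit (hypothetical)."""
    issues: List[str] = []
    results = {k: fetch_with_fallback(k, remote, mock, issues) for k in keys}
    return results, issues


def down_server(key: str) -> str:
    # Simulates the remote server being offline overnight.
    raise ConnectionError("connection refused")


if __name__ == "__main__":
    results, issues = run_overnight_task(["users", "orders"], down_server, lambda k: f"mock-{k}")
    print(results)
    print(len(issues))
```

With the server down, both keys are filled from the mock and two issues are logged, which is the kind of note a person would review the next morning before reconnecting the real service.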

Host Dan Shipper: So what does your daily workflow actually look like? I’ve noticed a phenomenon: when you give it a large, complex task, provide a detailed prompt, and let it run for hours or even overnight, it performs at its best. But when it comes to small, everyday tasks, it feels too slow and too expensive, making you less inclined to use it. How do you balance this in practice? Where does it fit in your tech stack?

Mike Krieger:

I now use it more for early-stage architecture planning and for aligning on solutions. This is an interesting shift, and it remains a hard problem that all models need to keep improving on.

I’m deeply grateful for my experience building Instagram—starting with a bare-bones version on a single server in Los Angeles, then scaling to handle massive concurrency and growth, and finally integrating it into Facebook’s infrastructure. This journey instilled in me an intuition for knowing “at what stage of a project, what level of architectural abstraction and complexity is appropriate.”

So I still have frequent back-and-forth exchanges with Fable. Sometimes it proposes what looks like a perfect implementation, and I'll point out: "I do plan to deploy this soon, so we need to think about scaling beyond a single machine." This two-way interaction is crucial. When planning architecture, I usually have it generate an HTML page that visually captures our discussion, which makes it easier to share with the team. A Markdown file would work too, but I prefer formats with diagrams.

This creates an interesting paradigm: work through the details and plan thoroughly together, then produce a document to align the team. Because prototyping has become so much faster, you now need even more consensus and alignment up front. Even if you plan to start with a quick, small-step demo and work backward to a more rigorous system architecture, early communication remains critical. This is exactly where human thinking and collaboration remain deeply embedded in the process.

At the execution stage, whether overnight or during long stretches of the day, I assign it different task modules to tackle independently, which means I'm running far more concurrent sessions than before. Sometimes I prefer to keep one long-running Claude Code session open and have it fork tasks to background sub-agents so the main thread can respond instantly to new commands; other times I simply open five or six browser tabs, each handling a long-running, complex task on its own.

This long-horizon way of working, where it effectively says, "Don't worry, leave it to me; it just takes some time," holds a lot of potential. We're currently exploring how to support this experience better at the product level. You probably want to move seamlessly between "instant response" and "long-running background work," and the interplay between those two states is fascinating. Personally, I like keeping at least one Claude window open with rich context and very fast responses, which gives me the sense that it's always ready: say the word and it can immediately start a task or spawn a subtask.
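Keeping a responsive main session while long tasks run in background sub-agents looks a lot like ordinary async task spawning. The sketch below uses Python's standard `asyncio` module to show the shape of that pattern; `subagent` and `main_session` are hypothetical stand-ins, and the sleeps simulate work rather than any real Claude Code mechanism.

```python
import asyncio
from typing import List


async def subagent(name: str, seconds: float) -> str:
    # Hypothetical stand-in for a long-running background job; sleep simulates work.
    await asyncio.sleep(seconds)
    return f"{name} done"


async def main_session() -> List[str]:
    # Fork long tasks to the background; create_task schedules them right away.
    background = [
        asyncio.create_task(subagent("refactor", 0.2)),
        asyncio.create_task(subagent("write tests", 0.1)),
    ]
    # Meanwhile the main session stays free to answer a quick new command.
    quick_reply = await subagent("quick question", 0.0)
    # gather returns results in the order the tasks were listed, not finish order.
    finished = await asyncio.gather(*background)
    return [quick_reply, *finished]


if __name__ == "__main__":
    print(asyncio.run(main_session()))
```

The quick reply comes back before either background task finishes, which mirrors the "always ready" main window Mike describes; the background results are collected afterward.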

When to use Sonnet, when to use Fable

Host Dan Shipper: So, for example, if you're walking down the street and suddenly have a question—would you pull out Fable? Would that feel like using a rocket launcher to kill a mosquito? Or do you frequently switch between different models?

Mike Krieger:

Lately, I really did use Fable for everything, and the experience was exactly as you described—you stare at the screen, watching it strain desperately to think.

Then last week I wanted to look up an almost embarrassingly simple question about the NBA Finals. When I switched to Sonnet on mobile, it instantly hit me: "Oh right! I used to use Sonnet for quick questions like this." The experience was completely different. It wasn't even about how many tokens per second it could output; it was about how much thinking the question actually required. Sometimes a simple answer doesn't need elaborate, deep reasoning.

This is also a fascinating question for our product team. Overall, you certainly don't want users agonizing every day over which model to pick. Ideally, in the long term, we could consolidate them into a few intuitive, out-of-the-box use cases, or even route users automatically based on the interface they're using, because honestly, most of the time when I'm in the iOS app, I'm not doing anything heavy enough to warrant Fable. So seamless, invisible model assignment at the interface level might be a viable approach. We'll need to work out what that really means at the product level. But I've recently come to understand that subtle mindset: "This question doesn't even need Fable; I should let Sonnet handle it."

You're right: for high-frequency, fine-grained interactive tasks, Fable tends to go deeper than necessary by default. In fact, Fable is the first model that has made me actively adjust the "reasoning effort." Sometimes I'll sit there thinking, "I just want to tweak a UI style, so setting effort to 'medium' should be enough to see the effect." With Opus I rarely adjusted this at all, because the model's range wasn't as broad. Fable's range is much wider.
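The idea of routing requests invisibly, so that a quick question goes to a lighter model and heavier work gets a larger model with more reasoning effort, could look something like this toy router. The thresholds, the `surface` values, and the tier labels `"sonnet"` and `"fable"` are assumptions for illustration only; they are not real model identifiers or Anthropic product logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str   # hypothetical tier label, not a real model identifier
    effort: str  # "low" | "medium" | "high"


def route_request(surface: str, expected_minutes: float) -> Route:
    """Toy router: pick a model tier and reasoning effort from a rough size estimate.

    The thresholds below are illustrative assumptions only.
    """
    if surface == "mobile" and expected_minutes < 1:
        return Route("sonnet", "low")      # quick lookup, e.g. a sports question
    if expected_minutes < 10:
        return Route("fable", "medium")    # small interactive tweak, e.g. a UI style
    return Route("fable", "high")          # long-running, multi-step work


if __name__ == "__main__":
    print(route_request("mobile", 0.5))
    print(route_request("terminal", 5))
    print(route_request("terminal", 240))
```

A real router would need a better signal than a self-reported time estimate, but the structure shows the design choice Mike hints at: the user states intent, and the interface picks the model and effort level.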

What Mike's weekend media tracker revealed about agent-native architecture

Host Dan Shipper: Can you show us something you've built with it?

Mike Krieger:

When we launched this new model, we did something—we encouraged the entire team to use it on their personal accounts, especially over the weekend. It was quite fun, because Anthropic has many custom productivity tools, so stepping back occasionally to return to the purest state—“I’m just using pure Claude Code to build small, fun projects for myself over the weekend”—felt amazing.

Host Dan Shipper: Are you running it in the terminal app or the desktop app?

Mike Krieger:

Great question. I still spend most of my time in the terminal. But interestingly, my wife—she’s not a professional engineer and has more of a background in UX design and product management—has fallen in love with Claude Code entirely through the desktop app. I think the desktop app helps her avoid many of the complex underlying abstractions. Still, when I work on this project myself, I stick with Ghostty and the terminal.

I immediately wanted a perfect "media progress tracker"—I regularly play games, binge-watch shows, and receive recommendations from friends, so I needed a tool that perfectly matched my organizational habits. My two core requirements were: first, adding items had to be incredibly easy—just speak or type a message to Claude, and it would automatically search the web, fill in all the details, and organize everything; second, it had to proactively push updates—like automatically finding new seasons or game sequels.

Most of the UI was built by Fable in one go, which is already impressive. But one thread I've been relentlessly pursuing at Labs this year is: how do you bring the team building the software (in this case, Claude) even closer to the software itself?

It was a Saturday morning, and my entire weekend was packed with childcare activities, so my development work was entirely intermittent: take the kids hiking, come back, write a few lines, then head out again. Sometimes, even while hiking, I couldn’t resist glancing at the progress—though I shouldn’t have been on my phone while with the kids, remotely monitoring how far along the task had gotten felt incredibly satisfying.

I had a thought: Could I casually run an aggressive experiment to let the software modify itself from within?

I built both mobile and web versions simultaneously. I originally created a chat interface where I can simply tell Claude, “Add this URL to my tracking list.” But I want all software to evolve to have this capability—I’m done navigating through complex, layered menus to find features.

Dan, on many levels, I'm actually trying to push agent-native architecture to its most extreme boundaries.

The first phase of what's called agent-native architecture is this: every core component and piece of data in a product must be fully accessible to agents, with corresponding tool interfaces they can call. This is fast becoming the baseline expectation in the software industry, though sadly most software available today still falls short of it.

I have a great positive example: Recently, someone recommended a brilliant Brazilian series about the Goiânia radioactive contamination incident. The title was incredibly long and hard to remember, so I casually mentioned it to the system—and Claude immediately searched for it and categorized it accurately. This experience was far better than trying to blindly search on Google myself.
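The first phase of agent-native architecture, where every user-facing operation is also exposed as a tool an agent can call, can be sketched as a small tool registry for a media tracker like Mike's. Everything here (`MediaTracker`, `Tool`, `build_tool_registry`, `dispatch`) is hypothetical; a production system would more likely publish such tools through a protocol like MCP with JSON schemas than through a Python dict.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Tool:
    name: str
    description: str  # what an agent reads when deciding which tool to call
    handler: Callable[..., Any]


@dataclass
class MediaTracker:
    """Hypothetical app state: the same methods back both the UI and the agent."""
    items: List[Dict[str, str]] = field(default_factory=list)

    def add_item(self, title: str, kind: str) -> Dict[str, str]:
        item = {"title": title, "kind": kind}
        self.items.append(item)
        return item

    def list_items(self, kind: Optional[str] = None) -> List[Dict[str, str]]:
        return [i for i in self.items if kind is None or i["kind"] == kind]


def build_tool_registry(app: MediaTracker) -> Dict[str, Tool]:
    # Every user-facing operation is also registered as a callable tool.
    tools = [
        Tool("add_item", "Add a show, game, or book to the tracker.", app.add_item),
        Tool("list_items", "List tracked items, optionally filtered by kind.", app.list_items),
    ]
    return {t.name: t for t in tools}


def dispatch(registry: Dict[str, Tool], name: str, args: Dict[str, Any]) -> Any:
    # An agent's tool call arrives as a tool name plus keyword arguments.
    if name not in registry:
        raise KeyError(f"unknown tool: {name}")
    return registry[name].handler(**args)


if __name__ == "__main__":
    app = MediaTracker()
    tools = build_tool_registry(app)
    dispatch(tools, "add_item", {"title": "Example Series", "kind": "show"})
    print(dispatch(tools, "list_items", {"kind": "show"}))
```

The point of the design is that the chat interface and the regular UI share the same underlying operations, so anything a user can do by tapping through menus, an agent can do by calling a tool.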

But what I'm truly obsessed with next is: In a mobile context, directly modifying the software from within itself—what would that evolve into?

What I did—more precisely, what I instructed Claude to do—is create an interaction where, in the app, holding down the chat button activates our hosted agent to receive "code modification commands," then immediately previews the results using Vercel’s Live Preview feature. The entire module worked almost flawlessly on the first try—it was incredibly cool—and I’ve since added several new ideas incrementally. If you’re a hardcore user, you can also check its Diff view or dive into the hosted agent’s conversation history to see exactly what changes were made at the code level—but I rarely look at them. For a personal side project like this, I simply don’t care about long-term maintainability (laughs).

This thing is incredibly addictive. While out with my kids, I noticed, "This floating button is too low on iOS," and I just spoke it directly into the app—right then and there, it went to the backend and fixed the code. Integrated with Expo’s development toolchain, it even performed a hot reload directly on my phone. The experience in that moment was absolutely incredible.

Does this need to reach a production-grade level capable of handling a million concurrent users? Absolutely not. But it gives me an incredible sense of control: you don’t have to halt the project the moment you close your laptop at the end of the weekend—you can heavily use it while continuously modifying it on the fly. This end-to-end real-time feedback loop allows you to iterate endlessly.

This is not only an excellent showcase of Fable’s hardcore engineering capabilities, but also a microcosm of the ultimate question we’ve been discussing: How should Claude be integrated into software? It shouldn’t remain at the level of mere “use”—it must be deeply embedded into the very fabric of software construction.

Build costs have collapsed

Host Dan Shipper: I really want to highlight one thing: a tool like this might have been buildable ten or twenty years ago, but not like this. The cost of building software has collapsed. Think back to the Instagram era: how many resources would it have taken to bring a project to this level of completion, and how much does it take now? Help us quantify the shift.

Mike Krieger:

I often reflect on those days. In the early days of Instagram, I always saw myself as an extremely efficient engineer—passionate about mobile development and possessing a strong intuition for product direction. But even so, turning an idea in my mind into a fully realized product still required at least four or five all-nighters. Back then, pulling all-nighters was routine: staying up until 4 a.m., then sleeping until noon—this schedule left no room for family life, but it was truly my “Builder mode” back then.

Looking back at Instagram’s V1—it had more features than the media tracker I built this weekend, but there was no fundamental, order-of-magnitude difference. Back then, Kevin and I pulled five all-nighters in a row to ship that V1: I handled all the frontend and backend myself, while Kevin tackled the initial image filters. And this was only possible because both of us had years of iOS development experience.

Not to mention how frustrating the iteration pace was back then. After the product launched and became an instant hit, we had countless new ideas piled up in our heads, but all our energy was consumed just keeping the servers from crashing under heavy traffic—or barely squeezing in time to add a tiny incremental feature. Take the Hashtag feature, for example: it took me a full week just to finish writing it, while you had ten thousand other things you wanted to do, all stuck in the backlog.

So it's not just that build times have shrunk, astonishing as that is. More important is the other side of the coin: you can now iterate on what you already have instantly, with unprecedented smoothness.

Moreover, these benefits have begun to spill over far beyond professional software engineers and founders like me. In the past, if you had a great business idea but couldn't code, you had only two options: hire freelancers, and watch your vision get distorted and the deliverables fall short, or go out and raise money. Now the gap between "intent" and "execution" has closed for non-technical people too.

A few days ago, I received a message from a colleague internally. We had helped her set up an internal tool that connected Fable’s capabilities with access to some of our internal MCP (Model Context Protocol) systems. She works in HR, and excitedly told me: “For the first time in my life, I feel there’s no gap between what I think in my mind and what exists in the real world—I can just create it directly.”

That moment was a genuinely eye-opening milestone for her. Just four or five years ago, if she had wanted a dedicated tool for her work, she would have had to either cobble together a makeshift solution from off-the-shelf software or beg engineers on the internal tools team, whose Jira backlog probably held 50 higher-priority requests. Now? She's enthusiastically carving out her own territory in the world of code.

This is also what I find most exciting about the future: human creativity is limitless, and one of the most remarkable things we’re doing today is infinitely expanding the group of people who can turn their ideas into reality.

Is software engineering dead?

Host Dan Shipper: I completely agree with you. But I imagine many people are now wondering: given everything you’ve just described, is software engineering as a field completely over?

Mike Krieger:

The essence of software engineering has completely changed. It is undergoing a profound transformation.

If you had asked me back in the Instagram days, "What exactly is software engineering?" I would probably have said: work through hard design problems, build a solid system architecture, then spend countless hours in TextMate or Xcode, digging into the low-level details of the Django ORM, deploying, and fixing bugs. Today most of those steps have been upended and are moving rapidly toward the territory of product management. The line between product managers and engineers has become very blurry, which is especially evident on our own team.

But if you step beyond the narrow, literal definition of "software engineering" and consider the broader idea of "software production" or "software development," rather than just the slice that is programmers writing code, you'll see that the field isn't just thriving; it's more central than ever.

The emergence of Fable truly elevated my trust in AI models to a new level—I began letting it "run end-to-end automated workflows and even make sound system architecture decisions." On the technical execution side, AI has come incredibly far. But “capturing the soul of software craftsmanship”—such as understanding exactly which user pain points you’re addressing, or whether the experience you create is truly remarkable—these high-level judgments remain profoundly human, irreplaceable by machines.

Of course, this transition is not painless for many people.

Many people have been deeply captivated by the craft of writing code by hand. I was exactly like that in my day. The thrill of cracking a bug that had stumped me for three days and thinking, "I nailed it today!" was irreplaceable. Back then you'd even dream about code. If you've experienced it, you know: your dreams fill up with logic puzzles, and the moment you wake up, the solution suddenly hits you. That pure era of craftsmanship is probably gone for good.

I recently spoke with some of the most hardcore engineers I know in the industry, and they all expressed a complex mix of emotions: a profound sense of loss watching traditional craftsmanship fade away, alongside sheer exhilaration at how incredibly powerful their current concurrent productivity has become.

How the Anthropic engineering team works today

Host Dan Shipper: If software engineering is not only alive but thriving, how does your own R&D team at Anthropic actually work day to day?

Mike Krieger:

There are several clear threads here, which I can walk through along the full software development lifecycle, drawing on what I see of day-to-day development work.

First, there is still a significant amount of human alignment. Teams gather in meeting rooms to brainstorm and discuss the next evolution of Cowork, then break down the roadmap into distinct areas of responsibility for each member. This step remains crucial, because many holistic contextual insights—only accessible to humans—are currently beyond Claude’s ability to perceive remotely—such as the true business intent behind the product, ongoing development undercurrents, and information about other product lines that are about to be discontinued or are preparing to be integrated in subtle ways.

Although our team has equipped everyone with multiple Claude supercomputers, from a management standpoint each person is still a DRI (Directly Responsible Individual), accountable for a specific module of the product. I don't think this will disappear in the short term, because there is a fundamental gap between the macro-level vision of "refining the product together through distributed collaboration" and the micro-level question of "how do I get Claude to finish this specific task today?" Even as we push hard to keep meetings to a minimum, these early brainstorming and alignment sessions remain essential.

Second, there are numerous "asynchronous tasks." Many of our engineers have customized their own dashboards to monitor what their Claude teams are doing: "Where is my specific Claude Code currently in the process?" "What tasks are stuck in the queue waiting for my approval?" "Which pull requests require my intervention because they were rejected by other colleagues or by a large model’s code review?"

Today, engineers spend a significant portion of their time maintaining these workflows. Some of the collaborative tools we are standardizing, but most still retain a strong hacker-like personal touch—just as programmers once customized their desktop environments, they are now personalizing their large model workflows.

Third, there's understanding how code actually behaves in production, another frontier that models are still working to master. Fable has made significant progress here, but there is still a long way to go, for instance in deeply understanding what actually happens once code is deployed and live. Systems crash, and strange, unexpected failures occur; in fact, from 2012 to 2016 at Instagram, I spent much of my energy handling production incidents and scaling the architecture. When responding to a live outage, senior engineers remain irreplaceable: you rely on years of incident-response experience to stay calm, collect comprehensive logs, contain the damage immediately, and then analyze the problem and design a lasting fix.

Finally, I want to emphasize that the role of the "engineering prototype" has completely changed today.

You have to be clear and explicit about whether what you're holding is a demo or production-ready code. Silicon Valley used to have a popular saying: "Code wins arguments." Personally, I was never fond of it, because it implies that whoever can write code holds the power of persuasion. But now something fascinating has flipped: when we're deadlocked on a product direction, it's often a PM who doesn't code who walks over and says, "I just built a quick demo myself. Sure, it's rough in plenty of places, but look, this path definitely works!" And that instantly opens up a completely different, higher-level conversation.

Looking back, almost all of our current development approaches are unrecognizable compared to six months ago. The most obvious characteristics are the terrifying level of development parallelism and the absolute necessity for the team to perform high-level abstraction of workflows.

But one thing has remained unchanged from start to finish: humanity's sense of ownership and responsibility toward products.

Verification mechanism and price

Host Dan Shipper: Fable is also expensive. When I tested it, I felt like a kid in a candy store, excitedly exclaiming, "I want this, this, and this!" But when it came time to check out, every time I hit enter, I hesitated, wondering, "Could this one cost me $100 or more?" I think this high price tag effectively creates an invisible barrier around who can use it and what it can be used for. What’s your take on its business value?

Mike Krieger:

In professional software engineering, the math is actually clearest. Pricing involves many internal considerations. It is indeed significantly more expensive than Opus, but when you measure the sheer volume of work each run delivers, on many business levels it feels almost like a giveaway. Of course, everyone has their own economics.

From a software team's perspective, there are three stages. In the first, the company encourages employees to adopt AI coding while the models are still early and the tools immature. In the second, it builds leaderboards to see who uses it most, which can create the wrong incentives. In the third, it identifies who uses it most effectively, lets those people use it as much as possible, and puts a clear process in place to avoid waste.

Fable-tier models fit the logic of the third stage perfectly. If you consistently deliver high-impact results and create tangible value for the business, the company's budgeting will naturally form a positive feedback loop that keeps supporting you.

On the personal use side, I also use my own credit card to pay for our services when running tests. At times like these, you naturally become more frugal and cautious. But interestingly, the media tracker I built over the weekend only cost me a bit more than usual—there’s no way a personal side project like this ends up burning through thousands of dollars.

The people truly held back by price are open-source enthusiasts and indie hackers who don't have a big company behind them and are highly price-sensitive. My advice to them: give it a real run, and see how much it can deliver in one go without you getting stuck in endless back-and-forth.

The concept of 'cost' has now evolved into a multidimensional one—you’re no longer just calculating the cost of a single query, but the comprehensive cost of fully accomplishing a task. What impresses me most about Fable is precisely this latter aspect: it consistently aims to get things right the first time, sparing me from sitting at my computer, going back and forth nine times, and desperately shouting, "No! That’s not what I meant!"

Host Dan Shipper: What struck me the most is that when you give it a high-level task, by the time it delivers, you realize it has worked out every single detail—even the most obscure corners—with an overwhelming level of precision I’ve never experienced with any previous model. Can you share any insights into the training process? What exactly was fed into it to produce such astonishing insight?

Mike Krieger:

On many levels, it’s a continuation of the team’s extensive efforts—I have nothing but admiration for our pre-training and RL teams. The most obvious evolution for me is a “sense of the entire system,” rather than just awareness of the current task.

I’m often amazed by its incredible actions. For example, after writing a piece of code, it suddenly pops up and says: “Boss, I know the configuration in a real production environment might be different. Did you turn on that feature flag? If not, what I just wrote won’t take effect when deployed.”

Or observe how it responds to feedback on code reviews—whether from a person or another Claude—it doesn’t simply say, "Oh right, that’s an issue, I’ll fix it." Instead, it genuinely considers whether to accept a risk given the current level of fidelity, or challenges another reviewer—often another Fable model—saying, "I understand your point, but I disagree; I think that’s incorrect."

It’s crucial for the model to have this kind of judgment. If I had to name where it has improved most, it’s that it no longer reflexively says, “Yes, yes, I’ll fix it.” Instead, it says something more like, “Let me think about that. I still disagree.” That ability is extremely valuable.

Products like Claude Code are incredibly valuable because they give people something tangible to point at and say, “This is where the model excels, and this is where it falls short.” Every’s team is among our most trusted sources of feedback because they put the model through sustained, multi-day, high-intensity tasks, which is crucial for understanding what needs to improve in the next generation.

Host Dan Shipper: Is chat the best interface for this model? It’s not really turn-based; it’s more like delegating tasks to someone. How does that affect how you should use it, or how you think about the interface?

Mike Krieger:

The basic model of sending and receiving messages isn’t entirely wrong, but it needs to evolve in a few directions.

First: is your laptop the right place for this? This ties back to what I said earlier about how useful mobile is for personal projects. The creator of Claude Code has always been a step ahead in how he uses these models. About nine months ago, he told me, “I’ve moved most of my Claude Code work to mobile.” I was skeptical at the time, but at Fable’s level especially, since it can sustain long-running sessions and we have remote development machines at Anthropic, the first point is: decouple where the work happens from where you discuss the work.

Second, building on what I mentioned earlier: how do you take everything Fable has discussed, decided, or suggested and make it understandable? This is an area we’re actively exploring. Some skills can help visualize it, but the current chat UI isn’t sufficient. Fable sometimes gives you so much text that you need to go for a walk before you’re ready to process it. One thing I’ve started doing is saying, “You have far more context on this than I do. Could we back up and do more progressive disclosure of complexity?”

Third is multiplayer, which we’re still early in exploring. Because we have DRI (directly responsible individual) and area-ownership structures, a typical important task flows between one person and several Claudes. But in some cases it’s less clear: during incident response, when several people are thinking at once, or on projects where multiple cross-functional areas converge. Chat sharing helps to some extent, but I believe the future will demand more. If one person started an independent Claude and has done a lot of work with it, can it stay in sync with everything the rest of the team is doing? That’s the next interesting, underexplored frontier. What’s exciting is that models can now be true teammates, and we’re almost holding them back for lack of the right abstractions.

Host Dan Shipper: This makes me think that most of the time I use this model, I’m working on my own vibe-coding projects. But inside an organization there’s a problem: do I really understand all the parts the model just generated? How do I get the context of what the model just did into my own head? That’s a major bottleneck. How do you decide how much you actually need to know, and how do you make sure you have enough context to feel confident?

Mike Krieger:

Two main points. The first is validation. Early this year, I became fully convinced of it. It connects to something I learned back when I coded full-time: find the tightest development loop you can and center your work around it. In the Instagram era, that sometimes meant creating a new Xcode build target containing only the screen in question plus synthetic data, and iterating solely within that loop. I used to tell the new engineers I mentored, “If I could teach you just one thing, it’s to set this up for your project; it will make everything much faster.”

These days, whenever I build something, I make sure every PR from Claude includes screenshots or videos, whether it’s an iOS PR or any UI-level change. That gives you a lot of confidence. Fable might go off and work for hours on its own, then come back and say, “I’m done,” followed by “here’s a gallery of all the UI screenshots,” and that’s incredibly helpful. You might say, “Screenshot eight, that error state: I’ve never actually seen it before, but now I can tell exactly what a user would experience if they hit it. Let’s fix this.” Comprehensive validation is something we’ve been focusing on heavily internally.
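A team that wants every UI-touching PR to arrive with screenshots or video could enforce that rule in CI. The Python sketch below is a hypothetical check, not Anthropic’s tooling: it assumes the CI step receives the PR’s changed file paths and the file names of its attachments, and the suffix lists are illustrative assumptions.

```python
from pathlib import PurePosixPath

# Hypothetical: file suffixes treated as "UI-level" changes.
UI_SUFFIXES = {".swift", ".storyboard", ".xib", ".tsx", ".css"}
# Hypothetical: attachment types that count as visual evidence.
MEDIA_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".mp4", ".mov"}


def touches_ui(changed_files: list[str]) -> bool:
    """True if any changed file looks like a UI file (case-insensitive suffix match)."""
    return any(PurePosixPath(f).suffix.lower() in UI_SUFFIXES for f in changed_files)


def has_visual_evidence(attachments: list[str]) -> bool:
    """True if the PR carries at least one screenshot or video."""
    return any(PurePosixPath(a).suffix.lower() in MEDIA_SUFFIXES for a in attachments)


def check_pr(changed_files: list[str], attachments: list[str]) -> tuple[bool, str]:
    """Pass non-UI PRs; require screenshots or video for UI PRs."""
    if not touches_ui(changed_files):
        return True, "no UI changes; visual evidence not required"
    if has_visual_evidence(attachments):
        return True, "UI change with screenshots/video attached"
    return False, "UI change without screenshots or video"
```

For example, `check_pr(["App/LoginView.swift"], [])` fails, while the same change with `08_error_state.png` attached passes. Matching on file suffixes is deliberately crude; a real setup might key off build targets or directory ownership instead.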

Second: ultimately, you are still accountable for your work. Many people use Claude every day, but there’s still a clear sense of accountability: “Claude may have written the code, but you need to understand the high-level decisions that were made.” I’ve seen a growing number of engineers adopt a practice where, after Claude finishes a task, they follow up with a conversation: “Can I make sure I fully understand all the trade-offs you made?” Whatever the output, even a small artifact, it’s worth doing whatever it takes to make it easy to understand.

It’s interesting in meetings: someone says, “I’ve got this PR ready,” and someone else asks, “Did you do X or Y?” Then there’s a pause: “Honestly, I’m not sure. I’ll find out before merging.” Adapting to this new normal and learning how to work with it is something we all need to master.

Host Dan Shipper: The verification loop you just described is really inventive. Beyond automated screenshots and video, what more advanced approaches are you exploring?

Mike Krieger:

Our core focus is whether it can run real workflows rather than just working against injected static data. As systems grow more complex, that gets harder. For example, we want the iOS apps Fable builds to log in to our simulated environment in one step, using real test accounts and realistic live data. At the same time, we don’t want it to laboriously re-run an eight-step new-user sign-up flow every time it tests a small button change. To solve this, we built a dedicated privileged system, with encrypted shared keys reserved for the AI, that lets it jump past the preliminary steps and go straight to the core of the product, so what it tests lines up almost pixel for pixel with what a real user sees.
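Mike only says the shortcut relies on “encrypted shared keys.” A common way to get a similar effect is a signed test-session token: the test harness and the staging backend share a secret, and the backend honors a signed “skip sign-up, start on this screen” token only for designated test accounts. The Python sketch below swaps in HMAC signing (Python’s standard `hmac` module) rather than encryption; the secret, account names, and token format are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical shared secret, provisioned only to the test harness and the staging backend.
SHARED_SECRET = b"staging-only-secret"
# Hypothetical allow-list: only these accounts may use the shortcut.
TEST_ACCOUNTS = {"agent-test-1", "agent-test-2"}


def _sign(payload: str) -> str:
    """HMAC-SHA256 over the payload with the shared secret."""
    return hmac.new(SHARED_SECRET, payload.encode(), hashlib.sha256).hexdigest()


def mint_test_session(account: str, start_screen: str, ttl_s: int = 900,
                      now: Optional[float] = None) -> str:
    """Harness side: issue a short-lived token that skips sign-up and opens start_screen."""
    if account not in TEST_ACCOUNTS:
        raise PermissionError(f"{account} is not a designated test account")
    issued = time.time() if now is None else now
    payload = f"{account}|{start_screen}|{int(issued + ttl_s)}"
    return f"{payload}|{_sign(payload)}"


def verify_test_session(token: str, now: Optional[float] = None) -> dict:
    """Backend side: check signature, expiry, and allow-list before granting a session."""
    account, start_screen, expires, sig = token.split("|")
    payload = f"{account}|{start_screen}|{expires}"
    if not hmac.compare_digest(sig, _sign(payload)):
        raise PermissionError("bad signature")
    current = time.time() if now is None else now
    if current > int(expires):
        raise PermissionError("token expired")
    if account not in TEST_ACCOUNTS:
        raise PermissionError("not a test account")
    return {"account": account, "start_screen": start_screen}
```

The allow-list is checked on the backend as well, so even someone holding the secret can’t open a session for a real user account, and the short TTL limits how long any single token is useful.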

The second part is combining known paths with the path currently being changed. Known paths are very valuable for regression testing: we’ve written out some idealized workflows in text, and Claude can verify them over and over. Claude is also good at articulating the intent behind the change it’s making, so the changed path gets exercised thoroughly too. The combination of the two is crucial.
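Combining known paths with the path under change can be sketched as a small regression harness. In this hypothetical Python model, the app is reduced to a table of screen transitions, each “idealized workflow” becomes a list of (action, expected screen) steps, and the changed path is whatever the agent says its change is meant to do. The screen and action names are invented for illustration.

```python
# Hypothetical app model: (current_screen, action) -> next_screen.
Transitions = dict[tuple[str, str], str]
Step = tuple[str, str]  # (action, expected_screen)

# Known "idealized" workflows, written once and replayed on every change.
KNOWN_PATHS: dict[str, list[Step]] = {
    "log in and open feed": [("tap_login", "home"), ("tap_feed", "feed")],
    "open settings": [("tap_login", "home"), ("tap_settings", "settings")],
}


def run_path(app: Transitions, steps: list[Step], start: str = "launch") -> list[str]:
    """Replay steps from start; return failure messages (an empty list means pass)."""
    screen = start
    for action, expected in steps:
        actual = app.get((screen, action))
        if actual != expected:
            return [f"{action} on {screen}: expected {expected}, got {actual}"]
        screen = actual
    return []


def verify_change(app: Transitions, changed_path: list[Step]) -> dict[str, list[str]]:
    """Regression-test every known path, then exercise the path the change targets."""
    results = {name: run_path(app, steps) for name, steps in KNOWN_PATHS.items()}
    results["current change"] = run_path(app, changed_path)
    return results
```

The design choice is that known paths run unconditionally, so a change that quietly breaks an unrelated flow shows up next to the result for the flow it was meant to touch.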

Visual verification matters too, and video is a badly underused tool for Claude. I recently built a prototype: I recorded video of what Claude had built, gave it the recordings along with FFmpeg, and watched it analyze the frames one by one and then say, “This animation has a stutter; I’ll fix it.” Screenshots can never catch that, because they miss the exact moment.
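Mike doesn’t describe how his prototype scored frames. One plausible sketch in Python: use FFmpeg’s `fps` filter to dump frames at a fixed rate, compute a per-frame change score (not shown here; for example, mean absolute pixel difference between consecutive frames), and flag short freezes that sit between moving frames. The function names and thresholds are hypothetical.

```python
import subprocess
from pathlib import Path


def extract_frames(video: str, out_dir: str, fps: int = 60) -> list[Path]:
    """Dump frames at a fixed rate with FFmpeg (assumes `ffmpeg` is on PATH)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["ffmpeg", "-loglevel", "error", "-i", video,
         "-vf", f"fps={fps}", str(out / "frame_%05d.png")],
        check=True,
    )
    return sorted(out.glob("frame_*.png"))


def find_stutters(motion: list[float], still: float = 0.5, moving: float = 2.0,
                  max_freeze: int = 6) -> list[tuple[int, int]]:
    """Find short freezes in the middle of an animation.

    motion[i] scores how much the picture changes from frame i to frame i + 1
    (computed elsewhere). A stutter is a run of near-still transitions, at most
    max_freeze long, with clear motion immediately before and after it.
    Returns (start_index, length) pairs.
    """
    stutters = []
    i, n = 0, len(motion)
    while i < n:
        if motion[i] >= still:
            i += 1
            continue
        j = i
        while j < n and motion[j] < still:
            j += 1
        bounded = i > 0 and j < n and motion[i - 1] > moving and motion[j] > moving
        if bounded and j - i <= max_freeze:
            stutters.append((i, j - i))
        i = j
    return stutters
```

Long still stretches (the animation has finished and the screen is idle) are deliberately ignored via `max_freeze`; all three thresholds are guesses that would need tuning against real recordings.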

For parts that are hard to test end to end, having Claude build a reliable mock backend, or reuse an existing one, is also very compelling. At Artifact, even before the LLM era, we had comprehensive testing: every piece of infrastructure had a solid in-memory implementation that ran quickly in unit tests. Now I’m extending that idea to Claude. I’m working on something with a fairly heavy backend that’s hard to start on my development server, and it now has an excellent stand-in. Over time, that stand-in has evolved alongside the codebase itself. Previously, I would have said, “Keeping these in sync is too much work.” Now I just think, “Claude will read the changes, update the stand-in, and keep both sides in sync.”
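The pattern Mike describes from Artifact, one interface with a real implementation and a fast in-memory one, looks roughly like this in Python. The profile store, its validation rule, and the contract-check helper are hypothetical; running one shared contract check against both implementations is a common way (not necessarily his) to notice when the stand-in drifts from the real backend.

```python
from typing import Optional, Protocol


class ProfileStore(Protocol):
    """Interface shared by the real backend client and the in-memory stand-in."""

    def get(self, user_id: str) -> Optional[dict]: ...

    def put(self, user_id: str, profile: dict) -> None: ...


class InMemoryProfileStore:
    """Fast stand-in for the real backend, for unit tests and local development."""

    def __init__(self) -> None:
        self._rows: dict[str, dict] = {}

    def get(self, user_id: str) -> Optional[dict]:
        row = self._rows.get(user_id)
        return dict(row) if row is not None else None  # return a copy, like a network read

    def put(self, user_id: str, profile: dict) -> None:
        if "name" not in profile:  # hypothetical rule assumed to mirror the real backend
            raise ValueError("profile requires a name")
        self._rows[user_id] = dict(profile)


def rename_user(store: ProfileStore, user_id: str, new_name: str) -> dict:
    """Business logic written against the interface, so it runs on either store."""
    profile = store.get(user_id)
    if profile is None:
        raise KeyError(user_id)
    profile["name"] = new_name
    store.put(user_id, profile)
    return profile


def check_store_contract(store: ProfileStore) -> None:
    """Shared behavioral checks: run against the fake in unit tests, the real store in CI."""
    assert store.get("missing") is None
    store.put("u1", {"name": "Ada"})
    assert store.get("u1") == {"name": "Ada"}
```

When the real backend’s behavior changes, the contract check is what fails first, which is the signal that the stand-in needs the same update.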

Host Dan Shipper: There are some really interesting setups now where a bug report comes in, an agent fixes it automatically, and then it messages the customer to say it’s fixed. Have you seen this kind of workflow change with Fable?

Mike Krieger:

A few things. At the human-to-Claude level, here’s something I’ve seen repeatedly: when someone reports a bug in our Slack feedback channel, that thread gets passed into a Claude Code session. Through the Slack MCP, it can pull up the thread and reply on my behalf: “This is Mike’s Claude. I’ve fixed it; here’s the PR link.” Then it adds: “Hold on, it’s not live yet. I’ll let you know once it ships.” Hours later: “The deploy is out. Could you try it and see whether the fix worked?” That closed-loop follow-up is fairly new. I’ve had several long-running Claude Code sessions interacting with people on my behalf, and I’ve added some disclaimers to them.
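The follow-up loop described above is essentially a small state machine attached to each bug thread. In this hypothetical Python sketch, `post` stands in for whatever sends a Slack message (in Mike’s setup that goes through the Slack MCP, whose tools are not modeled here), and deploy notifications are assumed to arrive as a set of released PR URLs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BugFollowUp:
    """One bug thread, tracked from fix to shipped fix (hypothetical model)."""

    thread_id: str
    post: Callable[[str, str], None]  # (thread_id, message), e.g. a Slack client wrapper
    state: str = "reported"
    pr_url: Optional[str] = None

    def on_pr_opened(self, pr_url: str) -> None:
        self.pr_url, self.state = pr_url, "awaiting_deploy"
        self.post(self.thread_id,
                  f"This is Mike's Claude. I've fixed it: {pr_url}. "
                  "It's not live yet; I'll follow up once it ships.")

    def on_deploy(self, deployed_prs: set[str]) -> None:
        # Follow up only once, and only when this PR is in a released deploy.
        if self.state == "awaiting_deploy" and self.pr_url in deployed_prs:
            self.state = "awaiting_confirmation"
            self.post(self.thread_id,
                      "The fix is deployed now. Could you try it and confirm it worked?")
```

Keeping the state explicit is what makes the “I’ll notify you once it goes live” promise reliable: unrelated deploys are ignored, and repeated deploy events don’t produce duplicate messages.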

The second point goes back to the taste and judgment we were just discussing. One level is, “There’s a bug report, so I need to fix it.” The next level is good judgment. Over the weekend, an internal system that had been running for a long time without a restart developed a memory leak. Good judgment would be: “Mike, it’s the weekend. Just restart the server now to resolve it immediately, and I’ll open a PR asynchronously for a long-term fix.” If you bring Claude into this bug-to-fix process, you really want it to understand what any good SRE or engineer understands: solve the immediate problem first; whether to migrate platforms or refactor can be decided later. Understanding that balance is crucial.

What should people build using this model?

Host Dan Shipper: What’s most exciting about this generation of models is that they don’t just raise the floor, letting anyone, regardless of background, build their own app with a single click; they also shatter the ceiling for experts. If you’re a professional engineer or a startup founder, you can now single-handedly take on projects that were once unthinkable. In your view, what frontier areas haven’t people fully recognized yet, but could confidently pursue with this generation of models?

Mike Krieger:

Here are a few ideas; maybe we can start with something fun. People always have creative ideas about how to convey the complexity of their world. Everyone has a domain they understand deeply, and there’s always some version of the question, “How can I explain this to someone else? Can I apply techniques from other fields to my own work?” Take my friend Tai Tan. She recently moved into environmental engineering, focusing on geothermal energy, a field full of hard mathematical models and fluid-dynamics simulations. With the leap in Fable’s reasoning, she has successfully brought techniques far outside her expertise into her own research. She can now have Fable build a complete end-to-end deep-learning simulation system in PyTorch, something that would have been out of reach for a researcher without a computer-science background just a few years ago.

The second is its ability to stitch software together to solve problems that are uniquely yours. Internally, we’ve put a lot of work into MCP-ifying as many of our internal systems as possible, with the right permission structures and deployment setups. There are also excellent external PaaS platforms; you can simply ask Claude, and it will set them up for you. But what I especially love is the feeling of finally having built something you’ve always wanted.

Something else that recently amazed me: a colleague on our commercial team, who has no technical background, has built Claude into every part of her daily workflow. What’s most striking is that she didn’t stop after version 1. She kept using what she had built and quietly iterated on it with the model, intensively, for months.

That points to the most underestimated, and most compelling, aspect of this generation of reasoning models. Previous generations operated near their limits and often hit a “complexity ceiling”: once your code or business logic reached a certain size, the model would lose track of the consequences of its changes, and adding new features would introduce errors and actively corrupt your existing architecture.

But this colleague, who doesn’t write code, has been growing her system with a model like Fable for several months. You can clearly watch the software keep growing and evolving, almost like a living organism. She has now begun rolling out this large, complex system she built herself across our commercial teams.

Someone with no programming background has, on her own, pushed a long-running software project to a level of complexity that was previously out of reach. That is genuinely unprecedented.

Dynamic workflow

Host Dan Shipper: You mentioned another very powerful capability: dynamic workflows. Can you elaborate on that?

Mike Krieger:

We often build cutting-edge tools like this internally, and I constantly push the engineers who build them: “When can we finally release this publicly?” Sometimes infrastructure limitations mean we have to run them internally first, but we’re doing everything we can to ship these tools as soon as possible. To me, dynamic workflows are clearly one of those game-changing capabilities.

There are two big reasons dynamic workflows are so powerful with models like Fable. First, they let you build scaffolding for deep, substantial work. One of the craziest things I’ve done was hand Fable a complex internal Python project and have it port the entire core logic to TypeScript, driven by a very specific production deployment requirement.

Back at Instagram, leadership once seriously discussed whether to rewrite Instagram’s entire codebase in Hack so it would integrate cleanly with Facebook’s infrastructure. Our conclusion at the time: absolutely not; it wasn’t realistically feasible.

But last weekend, facing a similarly tangled core codebase, I kicked off a dynamic workflow in the background and went off to enjoy my weekend. The workflow I set up was: understand the existing code deeply, write a detailed spec-like document explaining how everything works, then translate module by module, test incrementally, run adversarial validation, and check for anything missing. When I opened my laptop on Monday, it was done: the project had become a new system running on TypeScript and Bun, and in some architectural respects it was even cleaner and faster than my original Python version.
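The stages listed above map naturally onto a small pipeline runner. This Python sketch is purely illustrative and is not the dynamic-workflow feature’s actual format: the stage bodies are one-line placeholders for what would each be substantial agent work, and every name is hypothetical. It shows the shape only, ordered stages over shared state, with verification stages able to halt the run.

```python
from dataclasses import dataclass
from typing import Callable

State = dict  # shared scratchpad handed from stage to stage


@dataclass
class Stage:
    name: str
    run: Callable[[State], State]
    is_check: bool = False  # a check sets state["ok"]; a falsy value halts the workflow


def run_workflow(stages: list[Stage], state: State) -> State:
    """Run stages in order, record each outcome, and stop at the first failed check."""
    state = {**state, "log": []}
    for stage in stages:
        state = stage.run(state)
        passed = bool(state.pop("ok", True)) if stage.is_check else True
        state["log"].append((stage.name, "pass" if passed else "FAIL"))
        if not passed:
            break
    return state


# Placeholder stages: in a real run each would be an agent task, not a one-liner.
PORT_WORKFLOW = [
    Stage("understand existing code", lambda s: {**s, "modules": sorted(s["source"])}),
    Stage("write spec", lambda s: {**s, "spec": {m: f"how {m} works" for m in s["modules"]}}),
    Stage("translate module by module", lambda s: {**s, "ported": list(s["modules"])}),
    Stage("incremental tests", lambda s: {**s, "ok": len(s["ported"]) > 0}, is_check=True),
    Stage("adversarial validation", lambda s: {**s, "ok": True}, is_check=True),  # stub
    Stage("check for missing", lambda s: {**s, "ok": set(s["spec"]) <= set(s["ported"])},
          is_check=True),
]
```

Because the spec is written before any translation, the final check has something independent of the port to compare against.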

The second, longer-term reason is even more compelling: as dynamic workflows become widespread, we’ll be able to hand off subtasks of varying difficulty to models matched to each subtask’s complexity.
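Matching subtasks to models by complexity can be sketched as a simple router. In this hypothetical Python example, a planning step is assumed to have already scored each subtask’s difficulty between 0 and 1; the tier names and thresholds are placeholders, not real model identifiers or Anthropic guidance.

```python
from dataclasses import dataclass

# Hypothetical tiers, cheapest first: (difficulty ceiling, model tier name).
TIERS = [
    (0.3, "small-fast-model"),
    (0.7, "mid-tier-model"),
    (1.0, "frontier-model"),
]


@dataclass
class Subtask:
    name: str
    difficulty: float  # 0.0 (trivial) to 1.0 (hardest), e.g. estimated by a planner step


def route(task: Subtask) -> str:
    """Pick the cheapest tier whose ceiling covers the task's difficulty."""
    for ceiling, model in TIERS:
        if task.difficulty <= ceiling:
            return model
    raise ValueError(f"difficulty out of range: {task.difficulty}")


def plan(subtasks: list[Subtask]) -> dict[str, str]:
    """Map each subtask name to the tier that should handle it."""
    return {t.name: route(t) for t in subtasks}
```

Trying the cheapest tier first keeps costs down; the hard part in practice is the difficulty estimate, which this sketch simply takes as given.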

Host Dan Shipper: For people who haven’t used it, walk me through how you built that workflow: how did you design it, and how did you make sure it was good?

Mike Krieger:

The whole design process was geeky and iterative. I started by simply opening Claude Code and saying, “I’ve got an extremely tricky refactoring task on my hands. Let’s design an automated workflow for it together first.”

It showed me the plan, and I said, “This is close, but I need three or four more verification layers to check for missing features.” Then it came back with, “Here’s your plan. Ready?” The workflow is expressed in code, which I find extremely valuable: you can see exactly how it will be executed.
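A verification layer that checks for missing features can be approximated mechanically for a Python-to-TypeScript port. This hypothetical Python sketch lists the original’s public top-level functions and classes with the standard `ast` module, then looks for matching exports in the TypeScript output, accepting either the same name or its camelCase form. The TypeScript side is a rough regex heuristic, not a real parser.

```python
import ast
import re


def python_public_names(source: str) -> set[str]:
    """Top-level functions and classes in the original Python that aren't _private."""
    tree = ast.parse(source)
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
        and not node.name.startswith("_")
    }


# Heuristic only: catches `export [async] function|class|const NAME` at line start.
TS_EXPORT = re.compile(r"^export\s+(?:async\s+)?(?:function|class|const)\s+(\w+)", re.MULTILINE)


def snake_to_camel(name: str) -> str:
    """compute_total -> computeTotal; names without underscores are unchanged."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def missing_in_port(py_source: str, ts_source: str) -> set[str]:
    """Names in the Python original with no exported TypeScript counterpart."""
    exported = set(TS_EXPORT.findall(ts_source))
    return {
        name for name in python_public_names(py_source)
        if name not in exported and snake_to_camel(name) not in exported
    }
```

A check like this only catches entire functions or classes that went missing; behavior that was dropped inside a function still needs the spec comparison and tests.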

After the full port was done, I made a few small follow-up changes, which I ran as mini-workflows building on the output of the first one. This brings us back to the question of whether chat is the right interface. A workflow is a good middle ground: you use chat to orchestrate it, but it’s expressed in code and executed in a clean UI that shows what happens at each step. I think we’ll use a similar approach to connect long-horizon tasks with chat in the future.

Organized & Compiled by Shenchao TechFlow