
Latent Space: The AI Engineer Podcast · September 25, 2025

Building the God Coding Agent

Highlights from the Episode

Thorsten Ball · Amp Dictator from Sourcegraph
00:00:57 - 00:03:14
Origin of AMP and its differentiation from Cody
I returned to software in February, just as Claude 3.5 was released. Quinn and I began experimenting: what if we gave Claude 3.5 tools and let it operate without constraints? Unlike Cody, which had specific functionalities, we wanted to try something new. We started a new project, and I remember the first week in San Francisco, I'd excitedly exclaim, "You've got to see this; it's crazy!" Quinn then tried it, and we quickly realized we had a different kind of product. Cody was a pioneer with its RAG and assistant panels. However, this was a tool-calling agent, which I define as a model, a system prompt, and tools with accompanying prompts, all granted significant permissions.
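The definition above (a model, a system prompt, and tools with accompanying prompts, all granted significant permissions) can be sketched in a few lines. This is a hypothetical illustration of the shape of a tool-calling agent, not Amp's actual implementation; every name and the echo tool are invented for the example.

```python
# Minimal sketch of a tool-calling agent: a system prompt plus tools,
# where each tool carries its own prompt (description) and real
# permissions via a callable. All names here are illustrative.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str           # the "accompanying prompt" for the tool
    run: Callable[[str], str]  # the permission the agent is granted

@dataclass
class Agent:
    system_prompt: str
    tools: dict[str, Tool] = field(default_factory=dict)

    def handle(self, tool_name: str, argument: str) -> str:
        # In a real agent, the model decides which tool to call and
        # with what arguments; here we just dispatch a requested call.
        if tool_name not in self.tools:
            return f"unknown tool: {tool_name}"
        return self.tools[tool_name].run(argument)

agent = Agent(
    system_prompt="You are a coding agent with broad permissions.",
    tools={"echo": Tool("echo", "Echo the input back.", lambda s: s)},
)
print(agent.handle("echo", "hello"))  # hello
```

In a production agent the `run` callables would touch the filesystem or shell, which is what "significant permissions" means in practice.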
Quinn Slack · CEO of Sourcegraph
00:06:34 - 00:07:55
Adapting to rapid changes in AI development
One of AMP's biggest axioms is that we don't know where this is going, but we do know it's changing every few months. At the start of the year, Cursor was king, the fastest-growing startup of all time. Now, if you asked many developers who the default king is, I doubt they'd name Cursor first. Someone in sales recently said something that made Cursor sound like GitHub Copilot: old and boring in the enterprise. Think about that: Copilot isn't old; it was state-of-the-art just two years ago. The world has completely changed, and we know more changes are coming. From an engineering and business perspective, this is priority number one: position yourself to react to these changes quickly. Position your product, expectations, and codebase for rapid adaptation. Everything else we've done flows from this principle: everything can change with the release of another model or a similar development.
Thorsten Ball · Amp Dictator from Sourcegraph
00:09:24 - 00:11:43
AMP's agile development process and team structure
When we started, Quinn and I worked on the main branch with no code reviews, just pushing changes. It felt like a personal project. As experienced engineers, we each owned our work: if you broke CI, you fixed it. This rapid pace, shipping frequently, requires making about 15 decisions daily. You constantly switch between a "duct tape, personal project, move fast" mode and a "Google-style" mode. This demands expertise and freedom from the mindset of always doing things the Google way, which assumes product-market fit and scaling up. Every company I've worked for operated on the assumption that once a product exists, it should be engineered for scale. AMP's core understanding, however, is that even if something scales, we must be prepared for new technology to emerge and shift everything. Our development mode reflects this. The AMP core team is small, around eight people. We still don't do formal code reviews; we push directly to main and ship 15 times daily. We dogfood our product extensively, and in a fast-moving environment, that speed, combined with fast feedback loops and using the product to build the product, outperforms many established processes. We can get away with this because we dogfood it.
Thorsten Ball · Amp Dictator from Sourcegraph
00:20:41 - 00:22:40
Focusing on the best coding agent, not feature parity
We are not trying to maximize revenue or user adoption. Given the rapid changes in today's models and tools, we aren't competing with Cursor for users to fix things with our AI or theirs. Frankly, it doesn't matter to us. I don't believe that interaction will be a significant way people engage with AI in six to twelve months, and we learn nothing from it. So we decided against it, even though some users have requested it. We also need to understand what users truly want. While they often ask for many features, like bring-your-own-key, model choice, or specific subscription pricing, what we've observed is a demand for the very best coding agent. We focus on users who prioritize that. When we explain how certain features would slow us down, they often prefer not to have something they might use only 2% of the time if it compromises the tool's overall quality. We feel we are being uniquely honest and bold in the industry. I'm concerned that many other great tools, such as Claude Code, Codex, and Cursor, have forgotten what made them successful: building the best product. They've become too focused on current capabilities, which could lead to a peak and then a decline. No software business thrives without future growth. We believe this approach is best for our business and pushes the entire industry to embrace radical change.
Quinn Slack · CEO of Sourcegraph
00:23:42 - 00:24:59
Beyond the model: system prompts, tools, and scaffolding
Six months ago, the focus was on new model releases, with everyone announcing their availability in various editors or extensions. That trend seems to be fading. People now realize that benchmarks are just one aspect. A model might perform well on paper but feel different in practice. While models are still important, their significance is diminishing. Users are recognizing that the model is only one component; the system prompt, tools, and scaffolding around the model are equally crucial. I could offer you Gemini 2.5 in AMP, but without a finely tuned system prompt that aligns with the model's training, it wouldn't be effective. Models are trained differently, so optimizing the surrounding tools for a specific model is essential. Without this optimization, you get inaccurate results. I could integrate a new model in minutes, but that's not what users want. They desire the best possible version of a model within a specific tool, and that integrated experience has become more important than simply having a model selector.
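The point about per-model tuning can be pictured as a small configuration table: each model carries its own system prompt and tool set rather than sitting behind a generic model selector. The model names, prompts, and tool names below are placeholders, not Amp's real configuration.

```python
# Hypothetical per-model scaffolding: the model is only one component,
# and the system prompt and tools are tuned to each model's training.
MODEL_PROFILES = {
    "claude-sonnet": {
        "system_prompt": "Prompt aligned with Sonnet's training...",
        "tools": ["edit_file", "run_tests", "search"],
    },
    "gemini-2.5": {
        "system_prompt": "A different prompt, tuned for Gemini...",
        "tools": ["edit_file", "search"],  # tool set also varies per model
    },
}

def build_request(model: str, user_message: str) -> dict:
    """Assemble a request using the scaffolding tuned for this model."""
    profile = MODEL_PROFILES[model]
    return {
        "model": model,
        "system": profile["system_prompt"],
        "tools": profile["tools"],
        "messages": [{"role": "user", "content": user_message}],
    }
```

Swapping the `model` string is the easy part; the work is in maintaining a profile per model, which is why "integrating a new model in minutes" is not the same as offering the best version of it.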
Thorsten Ball · Amp Dictator from Sourcegraph
00:27:06 - 00:28:31
Rapid evolution of AI models and agent interaction
It took people eight or nine months to understand the capabilities of Claude 3.5 Sonnet after its release last June. This was around the time we were building AMP, and Claude Code was introduced, revealing the incredible potential of tool-calling agents. At that moment, the world's brightest minds recognized that the billions of dollars invested in training new models and harnesses were justified. Now, in September 2025, we are seeing the benefits of that investment. Many more models are emerging, including open-source options like Qwen3 Coder and Kimi K2, which are evolving rapidly. We also have xAI's models and GPT-5, and we're still learning how to best utilize them. It would be incredibly pessimistic to think that all those smart people and significant investments wouldn't yield something better than Sonnet. Currently, about half of our internal team uses a model other than Sonnet as their primary way of interacting with AMP, a significant shift. Previously, switching models was done only for testing, and reluctantly. Now, we embrace a different, non-linear chat-transcript interaction with agents. It feels distinct: not like a cheaper, mid-tier model, but a beneficial interaction where speed is key and the agent is more constrained. Things are changing rapidly.
Quinn Slack · CEO of Sourcegraph
00:31:59 - 00:36:30
Scaffolding around models and non-deterministic LLMs
Our main assumption is that everything is changing rapidly, so we need to move fast. Instead of a harness, I envision scaffolding around the model: wooden scaffolding that falls away when the model improves or needs to be swapped out. We embrace the idea that many elements might become integrated into the model as it gets better. Why invest months in a separate model when the next version can handle all edits on its own? With this in mind, we restrict the features we add around the model. We could constantly add features, making the product more complicated, but we choose not to.

We are also living in strange times from a product-development perspective. The traditional triangle of design, product, and engineering is evolving; it's no longer a triangle, because roadmaps are difficult to create. People are still discovering how to fully utilize these models, figuring it out as they go. Another point is that the primary UI is often a text interface, which can be misused. Using Jira for a shopping list, for example, might make Atlassian happy, but it isn't the intended purpose. With LLMs, you can use them incorrectly and still appear to get results. You might use ChatGPT to look up serial numbers, and it will provide an answer, but it could be wrong. It might work 95-98% of the time, but the 2-5% failure rate is significant.

Having non-deterministic LLMs at the core of a product is unprecedented in software. Many elaborate workflows, like custom slash commands triggering sub-agents and tool calls, have led to "hangovers": people realize these seemingly deterministic workflows aren't reliable enough if they only work 98% of the time. We are very conscious of this. Everyone is experimenting and sharing experiences, but we must be strict about not giving users a false sense of the product's capabilities and reliability. That would be dishonest, and it doesn't lead to good results.
For instance, we've been ahead of the curve, using AMP internally and experimenting with agentic adoption for months. We tried many things and realized they weren't the best use of our time or tokens. Now, others are catching on. Armin Ronacher, a well-known Python developer, initially tweeted excitedly about what he could achieve with Claude Code. A month later, he realized that controlling eight remote agents from his phone for 20 hours wasn't as productive as he thought. We are very conscious of these challenges.
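The reliability concern above is simple arithmetic: chaining steps that each succeed 98% of the time compounds the failure rate, so a seemingly deterministic workflow degrades quickly with length. A minimal illustration:

```python
# If each step of a workflow succeeds with probability p, a chain of
# n independent steps succeeds end-to-end with probability p**n.
def chain_success(p_step: float, n_steps: int) -> float:
    return p_step ** n_steps

# A 98%-reliable step looks trustworthy, but chains decay fast:
for n in (1, 5, 10, 20):
    print(f"{n:2d} steps -> {chain_success(0.98, n):.1%} end-to-end")
```

At twenty chained 98%-reliable steps, the end-to-end success rate is already down to about 67%, which is why elaborate slash-command-and-sub-agent pipelines produce "hangovers."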
Thorsten Ball · Amp Dictator from Sourcegraph
01:10:31 - 01:12:08
Future of engineering and impact of coding agents
Looking ahead, companies have already significantly slowed their engineering headcount growth over the past few years; this is a global trend. Engineers, like those on the AMP team and at other companies, are heavily utilizing agents. This eliminates middlemen and brings product builders closer to the customer: an idea can be heard from a customer in a meeting, and an agent can build a first draft, streamlining the process. Consequently, the person using the coding agent becomes much closer to the problem and stands to gain more of the rewards from solving it, since profits don't need to be shared as widely. This doesn't mean large companies will disappear. Instead, individuals with incredible vision, who are intimately familiar with a problem and highly motivated to solve it, will be equipped with coding agents. Building a coding agent that empowers these individuals to create something truly great, quickly, is far more valuable than one designed to make the average developer only 30% better. We are targeting those highly incentivized individuals. This isn't about "vibe coding," a discussion that is unproductive because definitions vary; it often means agents writing code with poor feedback loops and quality control, which isn't valuable. Instead, it's about giving these individuals the ability to build something truly great, really fast, when they are so incentivized and have every desire for it to work.

