WEBVTT

00:00:12.020 --> 00:00:15.440
<v Scott>Welcome to the Bikeshed podcast where we dive into all things software engineering

00:00:16.150 --> 00:00:21.460
<v Scott>sip some copacetic coffee, and ask the big question: is the AI bubble finally starting to crack?

00:00:22.380 --> 00:00:27.820
<v Scott>I'm your co-host whose entire workflow lives in floating windows, Raycast, and the terminal,

00:00:28.380 --> 00:00:34.040
<v Scott>Scott K. Alongside me are my co-hosts. If it runs in JavaScript, he'll roast it,

00:00:34.400 --> 00:00:38.900
<v Scott>praise it, and make you rethink everything you ever knew about engineering. Matt Hamlin.

00:00:39.450 --> 00:00:43.100
<v Scott>And my other co-host. He doesn't just give takes. He forges statements,

00:00:43.600 --> 00:00:49.720
<v Scott>searing them into your memory hole. Dillon "Spicy Take" Hurry. Fellas, what are we cooking up today?

00:00:51.760 --> 00:00:54.100
<v Matt>Yeah, today we're going to talk a little bit about AI code reviews.

00:00:56.020 --> 00:00:57.160
<v Matt>Yeah, I don't know.

00:00:57.950 --> 00:01:00.740
<v Matt>I think, Scott, you originally floated this as a topic idea.

00:01:02.000 --> 00:01:04.820
<v Matt>I'm actually curious to dig into how you guys are using it

00:01:04.930 --> 00:01:11.100
<v Matt>or if you guys have internal tools set up to do AI code reviews at work.

00:01:11.860 --> 00:01:13.480
<v Matt>We do, and it's kind of funky.

00:01:13.780 --> 00:01:17.000
<v Matt>But yeah, I'm kind of curious to hear your guys' experience with it.

00:01:18.020 --> 00:01:19.840
<v Dillon>All right, guys, I'll dive in first.

00:01:20.580 --> 00:01:22.940
<v Dillon>I'm going to jump the line at Whoop.

00:01:23.600 --> 00:01:24.060
<v Dillon>That's where I work.

00:01:26.020 --> 00:01:26.880
<v Dillon>We're using, like,

00:01:27.000 --> 00:01:29.160
<v Dillon>it feels like we're using three different tools at once.

00:01:29.380 --> 00:01:30.160
<v Dillon>I don't know what's going on.

00:01:30.540 --> 00:01:32.180
<v Dillon>I guess they're piloting a bunch of different ones.

00:01:32.880 --> 00:01:34.240
<v Dillon>We got something called Greptile,

00:01:35.320 --> 00:01:36.700
<v Dillon>we got something called Copilot,

00:01:36.920 --> 00:01:38.080
<v Dillon>which is GitHub's version of

00:01:38.600 --> 00:01:39.860
<v Dillon>code reviews with AI,

00:01:40.620 --> 00:01:41.940
<v Dillon>and then I think there's one other one,

00:01:42.200 --> 00:01:43.420
<v Dillon>maybe CodeRabbit.

00:01:44.160 --> 00:01:45.420
<v Dillon>Does that sound like a thing that's real?

00:01:46.260 --> 00:01:48.220
<v Scott>Yeah, it is. We know somebody who works there.

00:01:48.320 --> 00:01:48.620
<v Scott>Right, Matt?

00:01:49.820 --> 00:01:51.660
<v Dillon>Whoa, you can't just name-drop

00:01:51.840 --> 00:01:52.399
<v Dillon>on the episode.

00:01:54.260 --> 00:01:59.520
<v Matt>Wait, who? What? I don't know someone that works there.

00:01:54.260 --> 00:01:59.520
<v Scott>Yes, you do. You worked with him.

00:02:01.359 --> 00:02:04.460
<v Matt>Who?

00:02:01.359 --> 00:02:04.460
<v Scott>I believe he works at CodeRabbit. I'm gonna look it up right now.

00:02:06.899 --> 00:02:13.420
<v Scott>Arvind. Arvind, from Fireworks. Uh...

00:02:06.899 --> 00:02:13.420
<v Matt>Okay, maybe we cut this from the episode.

00:02:13.920 --> 00:02:17.819
<v Matt>"Worked with them" is a strong choice of words.

00:02:20.240 --> 00:02:21.580
<v Scott>I'm not cutting it from the episode.

00:02:21.960 --> 00:02:22.580
<v Scott>That's staying in.

00:02:25.820 --> 00:02:26.600
<v Dillon>Yeah, I feel like,

00:02:27.660 --> 00:02:30.000
<v Dillon>I don't know which one of those is going to show up in my PR

00:02:30.160 --> 00:02:31.900
<v Dillon>half the time lately it's been Greptile

00:02:32.880 --> 00:02:35.980
<v Dillon>and I didn't realize that you could actually reply to its

00:02:36.230 --> 00:02:38.800
<v Dillon>comments until I saw somebody else do it

00:02:39.360 --> 00:02:42.080
<v Matt>Is it shipping your PRs for you, or is it just leaving comments?

00:02:42.940 --> 00:02:45.959
<v Dillon>It's just leaving comments, and it'll do changesets

00:02:45.980 --> 00:02:51.540
<v Dillon>sometimes. Like, it reviewed somebody else's PR that I approved yesterday, and it was like, hey,

00:02:51.860 --> 00:02:58.060
<v Dillon>this is a critical bug, don't do this. And I was like, this thing is so dumb. And then I

00:02:58.280 --> 00:03:03.560
<v Dillon>looked a little closer, and I was like, oh shit, that's an actual bug. We need to fix that.

00:03:05.120 --> 00:03:09.539
<v Matt>So you heard it here first: the AI code reviews are better than Dillon's code reviews.

00:03:10.160 --> 00:03:17.360
<v Dillon>Yeah, but my take is that we're getting so reliant on it that it's just making us dumber,

00:03:17.800 --> 00:03:25.060
<v Dillon>and then we're like, oh, the AI will catch it. I'm not gonna review this as in-depth as I used

00:03:25.160 --> 00:03:31.380
<v Dillon>to. Maybe that's my hot take.

00:03:25.160 --> 00:03:31.380
<v Scott>I totally feel that with AI in general. But I just,

00:03:32.620 --> 00:03:38.199
<v Scott>not to float off topic too much, I feel like maybe I was listening to Syntax, so I'm Syntax-pilled,

00:03:38.840 --> 00:03:41.520
<v Scott>but it is how I've always felt.

00:03:41.660 --> 00:03:42.460
<v Scott>It's like we're becoming,

00:03:42.920 --> 00:03:44.700
<v Scott>like these smaller tasks are becoming like,

00:03:45.160 --> 00:03:46.860
<v Scott>I also read an article about this, I believe.

00:03:47.380 --> 00:03:50.960
<v Scott>Smaller tasks are becoming more AI driven

00:03:51.600 --> 00:03:54.000
<v Scott>and we're like kind of orchestrating things

00:03:54.080 --> 00:03:55.160
<v Scott>from a higher and higher level.

00:03:56.220 --> 00:03:57.100
<v Scott>Basically, like you remember,

00:03:58.060 --> 00:04:00.400
<v Scott>I started as, like, an HTML/CSS developer,

00:04:01.140 --> 00:04:03.500
<v Scott>and now like that's not really,

00:04:03.760 --> 00:04:05.480
<v Scott>I mean, it matters in some sense,

00:04:05.620 --> 00:04:06.900
<v Scott>like things that need to be really accessible,

00:04:07.780 --> 00:04:12.260
<v Scott>products that have to worry about it. But people aren't hiring for that; the scope is

00:04:12.380 --> 00:04:17.859
<v Scott>just getting larger. So, like, a front-end scope has gotten a lot larger. Even though

00:04:18.140 --> 00:04:24.300
<v Scott>we're still kind of front end, um, you just need to be able to ship your entire app nowadays. You need

00:04:24.340 --> 00:04:27.800
<v Scott>to be able to build it from the ground up and have an understanding of how the pieces work together.

00:04:27.980 --> 00:04:34.479
<v Scott>The more you know about different parts of the stack, the more useful you still are. Matt's

00:04:34.500 --> 00:04:36.620
<v Scott>just, like, giggling it up over here.

00:04:36.620 --> 00:04:38.620
<v Scott>I have no idea what's going on.

00:04:38.880 --> 00:04:39.740
<v Matt>I have no idea.

00:04:40.010 --> 00:04:42.280
<v Matt>So I kind of tuned out what you were saying, to be honest,

00:04:42.910 --> 00:04:45.420
<v Matt>because I have no idea how that relates to AI code reviews.

00:04:46.910 --> 00:04:50.260
<v Matt>So maybe you can connect the dots for me because I'm so confused.

00:04:50.590 --> 00:04:51.460
<v Matt>I'm lost in the sauce.

00:04:52.300 --> 00:04:54.140
<v Scott>It's not specifically code reviews.

00:04:54.380 --> 00:04:57.320
<v Scott>It's the fact that AI keeps taking smaller and smaller tasks,

00:04:58.200 --> 00:05:03.020
<v Scott>but the human still needs to do the larger and larger orchestration

00:05:03.130 --> 00:05:03.900
<v Scott>around those tasks.

00:05:04.880 --> 00:05:09.360
<v Scott>together. That's kind of where things have been going. Sounds like we disagree, which is great,

00:05:09.490 --> 00:05:15.660
<v Scott>because I'm ready to throw down on this.

00:05:09.490 --> 00:05:15.660
<v Matt>Uh, I don't know if we disagree. It's just, I

00:05:15.760 --> 00:05:19.420
<v Matt>don't know if I have well-enough-formed thoughts on that thread yet. I don't know. But

00:05:19.470 --> 00:05:24.840
<v Matt>at the moment, maybe I'm really focused in on the code review thread. Like, I want to pick a

00:05:24.850 --> 00:05:28.580
<v Matt>little bit more at that.

00:05:24.850 --> 00:05:28.580
<v Scott>Sure. I just think that the code review is another piece of that pie. Like,

00:05:28.720 --> 00:05:33.419
<v Scott>just because it can pick up some things, it doesn't mean it picks up everything. But at the

00:05:33.380 --> 00:05:38.920
<v Scott>same time, what Dillon's talking about is, like, a feeling of regression, or not retaining memory,

00:05:39.720 --> 00:05:45.580
<v Scott>or not feeling like you need to do as much. It's being a little too heavily reliant. But

00:05:46.260 --> 00:05:49.820
<v Scott>it still matters elsewhere. Like, you still should look at the PR review and have a fundamental

00:05:50.020 --> 00:05:52.800
<v Scott>understanding of how it's being built, and if it's the right structure for what you need to do.

00:05:53.260 --> 00:06:00.100
<v Scott>Especially with AI, it can often be absolute crap, and it could be convoluted. Like, one of the

00:06:00.020 --> 00:06:07.340
<v Scott>biggest problems I have might be that AI just churns out extra lines of code when it could just

00:06:07.840 --> 00:06:12.240
<v Scott>reuse things that exist. So it doesn't really write things in a streamlined, concise manner.

00:06:13.200 --> 00:06:18.600
<v Scott>You know, sometimes adding extra code does make sense, but it kind of just creates band-aids on

00:06:18.680 --> 00:06:22.380
<v Scott>top of band-aids until it can get unwieldy. So those are things to look out for when you're

00:06:22.440 --> 00:06:29.440
<v Scott>doing PR reviews. All right. Well, back to talking about reviews. Anyone else want to talk about it

00:06:29.460 --> 00:06:32.980
<v Scott>before I tell you why AI code reviews are great?

00:06:34.040 --> 00:06:37.080
<v Matt>Since Dillon was sharing the three or so products

00:06:37.340 --> 00:06:38.500
<v Matt>that Whoop is trialing,

00:06:39.260 --> 00:06:41.140
<v Matt>we can talk a little bit about what HubSpot does.

00:06:42.040 --> 00:06:44.600
<v Matt>Right now we have this bot called Sidekick,

00:06:44.780 --> 00:06:47.080
<v Matt>which is both a general internal chat agent.

00:06:48.070 --> 00:06:49.760
<v Matt>So you can just go to it and ask questions

00:06:50.000 --> 00:06:50.880
<v Matt>about our code base or whatnot.

00:06:51.640 --> 00:06:53.660
<v Matt>But it's also the same thing that powers our code reviews.

00:06:56.000 --> 00:06:57.999
<v Matt>And it recently got rolled out

00:06:58.020 --> 00:06:59.080
<v Matt>to basically every repo.

00:06:59.830 --> 00:07:01.540
<v Matt>I think it was doing an initial trial

00:07:01.660 --> 00:07:03.720
<v Matt>of infra-based teams using it.

00:07:06.360 --> 00:07:10.060
<v Matt>But I would say maybe we save the takes on it for later.

00:07:10.120 --> 00:07:11.220
<v Matt>But yeah, it's interesting.

00:07:12.860 --> 00:07:14.340
<v Matt>But then we also, even before that,

00:07:14.410 --> 00:07:17.300
<v Matt>even before AI, we had this review bot called Sparrow,

00:07:17.660 --> 00:07:19.780
<v Matt>which would go in and auto-approve PRs

00:07:19.900 --> 00:07:22.580
<v Matt>that don't have changes to quote-unquote

00:07:22.730 --> 00:07:23.940
<v Matt>user-facing things.

00:07:24.550 --> 00:07:26.280
<v Matt>So if you only change Markdown in a repo,

00:07:26.340 --> 00:07:30.640
<v Matt>Sparrow could auto-approve your PR, and you can merge without getting a human review, which is kind of nice.

00:07:32.660 --> 00:07:38.500
<v Matt>So, like, I don't know, there's kind of this spectrum of, quote-unquote, like, AI review tools.

00:07:38.650 --> 00:07:46.340
<v Matt>Like, there's sort of automated review cycles that maybe aren't very AI-y, but then there's also, like, actual agents reviewing code.

00:07:47.800 --> 00:07:50.080
<v Matt>Yeah, Scott, tell us how Airbnb works.

00:07:52.100 --> 00:07:54.820
<v Scott>We use Claude again.

00:07:55.500 --> 00:08:02.840
<v Scott>So we have, like, a Claude PR review checker that just runs every time you push an update to your PR.

00:08:03.620 --> 00:08:05.000
<v Scott>I find that the most useful.

00:08:05.060 --> 00:08:10.100
<v Scott>I guess that's what I'm advocating for: it often finds things you might not notice.

00:08:10.500 --> 00:08:16.300
<v Scott>Sometimes they're not real problems or they are overly safe.

00:08:17.360 --> 00:08:22.680
<v Scott>I remember, like, specifically, I had a handler that was toggling something on and off.

00:08:24.400 --> 00:08:28.700
<v Scott>And the handler lived in some conditional logic, so it could only fire based on that condition.

00:08:29.020 --> 00:08:36.159
<v Scott>The Claude review basically told me, like, you also need to check the condition that it lives in, to make sure that it's safe.

00:08:36.550 --> 00:08:38.780
<v Scott>So, like, I thought it was, like, overkill.

00:08:38.909 --> 00:08:45.560
<v Scott>Like, I don't need to say this handler that only lives inside this condition should also check this condition.

00:08:46.150 --> 00:08:47.660
<v Scott>But I just did it to satisfy it.

00:08:48.360 --> 00:08:51.940
<v Scott>So there are some situations where it's not, where it's a little bit overkill.

00:08:52.160 --> 00:09:02.780
<v Scott>I've also seen this with, I think, something I was doing last night, where I'm working with this third-party diagram app, and it has a repaint method that I just fire.

00:09:03.620 --> 00:09:17.460
<v Scott>And it didn't like that, or didn't notice it, and expected it to run through a regular React re-render. Although it's all singleton pattern, so it's a little strange as is.

00:09:17.680 --> 00:09:22.960
<v Scott>So I ended up making the change that it asked for, but it was not really necessary.

00:09:23.180 --> 00:09:28.260
<v Scott>So sometimes it's overkill, but it does find like a lot of little bugs that you might not

00:09:28.480 --> 00:09:28.560
<v Scott>see.

00:09:28.940 --> 00:09:32.280
<v Scott>And it gives you, it just basically does a code review and it's just like an extra layer.

00:09:32.570 --> 00:09:39.020
<v Scott>We also have, in Claude Code, a way to run it on a review.

00:09:39.440 --> 00:09:44.340
<v Scott>So you can just have like a specific run and just get its opinion on it.

00:09:44.380 --> 00:09:49.420
<v Scott>So it's just like another similar run, but it'll run through locally a little bit better.

00:09:50.560 --> 00:09:52.180
<v Scott>So that's kind of how we're using it now.

00:09:52.460 --> 00:09:57.420
<v Scott>I believe, before I started, they went back and forth on whether it should

00:09:57.420 --> 00:09:58.160
<v Scott>be used for code review.

00:09:58.900 --> 00:10:04.760
<v Scott>They might have used it to deploy, or rather, stamp reviews, but we did not do that.

00:10:04.870 --> 00:10:09.220
<v Scott>So we still make a human be like the last line of defense, which I actually, as much

00:10:09.280 --> 00:10:13.320
<v Scott>as I advocate that it's good to have code review from AI, I don't think it should be

00:10:13.320 --> 00:10:20.520
<v Scott>approving changes at all. It should just be telling you its thoughts. I guess if you're

00:10:20.620 --> 00:10:25.640
<v Scott>talking about, like, low-level changes, there could be things. But I always think the

00:10:26.220 --> 00:10:32.660
<v Scott>waters get muddy here. Like, yeah, it's a bug fix, but when you're in a really large code base

00:10:32.860 --> 00:10:36.440
<v Scott>and you do something that you think is trivial, like I think we've had this conversation

00:10:37.780 --> 00:10:43.280
<v Scott>about five-minute deploys, where you should be able to push these bug fixes so, so fast.

00:10:43.580 --> 00:10:49.220
<v Scott>Or what people might call a UI tweak, some little quote-unquote UI

00:10:49.400 --> 00:10:53.720
<v Scott>tweak. Like, CSS is more powerful today than it was. I mean, I guess it's always

00:10:53.920 --> 00:10:58.080
<v Scott>kind of been, but it's not just moving elements on a page. Sometimes you would

00:10:58.110 --> 00:11:02.360
<v Scott>have to craft a list of things that are safe, because you could put pointer-events on

00:11:02.540 --> 00:11:09.160
<v Scott>something, right, and now you can't click a button. So there are things that can happen, uh, if not

00:11:09.180 --> 00:11:14.320
<v Scott>appropriately tested, especially in the five-minute-deploy world where we have zero integration tests

00:11:14.560 --> 00:11:19.520
<v Scott>because we need to deploy immediately over anything else. So I do think there are

00:11:19.760 --> 00:11:24.020
<v Scott>some things that you might be able to filter for, but ideally a human just looks at it last

00:11:24.220 --> 00:11:31.540
<v Scott>to give it that overall approval.

00:11:24.220 --> 00:11:31.540
<v Dillon>Matt, it sounded like maybe this is going back to, um, having, like, AI

00:11:32.240 --> 00:11:36.940
<v Dillon>auto-approve things. Wasn't it that Matt was in favor of that? And I feel like

00:11:38.319 --> 00:11:40.880
<v Dillon>that Matt's take on that is a hot take.

00:11:41.640 --> 00:11:43.380
<v Dillon>Like allowing AI to approve code.

00:11:43.780 --> 00:11:44.440
<v Dillon>To me, it's just like,

00:11:44.780 --> 00:11:46.560
<v Dillon>just give your engineers more admin privileges.

00:11:46.680 --> 00:11:48.580
<v Dillon>And if they really, really need to merge something,

00:11:48.780 --> 00:11:49.600
<v Dillon>just let them merge it.

00:11:50.100 --> 00:11:51.220
<v Dillon>I know that seems kind of crazy.

00:11:51.920 --> 00:11:52.720
<v Dillon>I just think like,

00:11:52.840 --> 00:11:54.500
<v Dillon>we need to just educate our engineers better

00:11:54.980 --> 00:11:55.900
<v Dillon>so they can make good decisions.

00:11:57.020 --> 00:11:58.180
<v Matt>At least I feel like we're pulling on

00:11:58.200 --> 00:11:58.920
<v Matt>some very similar threads

00:11:59.060 --> 00:12:00.120
<v Matt>to the five minute deploy episode.

00:12:01.660 --> 00:12:02.640
<v Matt>Like my take is that

00:12:03.620 --> 00:12:06.220
<v Matt>you should basically never be held up

00:12:06.880 --> 00:12:09.040
<v Matt>from being able to merge changes in,

00:12:10.280 --> 00:12:12.220
<v Matt>as long as that is completely decoupled

00:12:12.400 --> 00:12:14.280
<v Matt>from automatically releasing your changes

00:12:14.560 --> 00:12:15.760
<v Matt>to customers or to users.

00:12:16.700 --> 00:12:20.200
<v Matt>So it works really well for platform teams

00:12:20.260 --> 00:12:22.100
<v Matt>that work on versioned libraries, for example.

00:12:22.420 --> 00:12:24.260
<v Matt>Internally, we cut releases.

00:12:26.000 --> 00:12:27.680
<v Matt>Technically, we do publish on every PR,

00:12:27.720 --> 00:12:31.440
<v Matt>but we don't actually stabilize or promote that release

00:12:31.680 --> 00:12:34.880
<v Matt>to a stable version automatically.

00:12:35.380 --> 00:12:40.020
<v Matt>So for us, there's a gap between when we merge changes in

00:12:40.020 --> 00:12:43.000
<v Matt>and when it's released, which means that, technically,

00:12:43.090 --> 00:12:45.340
<v Matt>we should be able to merge things faster.

00:12:46.500 --> 00:12:50.240
<v Matt>And so my general take there is, yeah,

00:12:50.680 --> 00:12:53.440
<v Matt>we should entrust developers to make the right decision

00:12:53.780 --> 00:12:59.280
<v Matt>and move fast, meaning in the case of AI code reviews,

00:13:00.420 --> 00:13:03.200
<v Matt>allow the AI to ship a PR if it thinks it's good enough.

00:13:05.120 --> 00:13:08.800
<v Matt>You can still have humans come in and review PRs and whatnot,

00:13:09.120 --> 00:13:14.540
<v Matt>but I think it's like you should empower the team to move fast

00:13:14.670 --> 00:13:17.620
<v Matt>in order to fix issues and like roll out features.

00:13:18.320 --> 00:13:21.200
<v Scott>So I'm going to be told that I'm just agreeing with Matt

00:13:21.360 --> 00:13:22.480
<v Scott>and I'm disagreeing,

00:13:22.860 --> 00:13:25.860
<v Scott>but I agree that you should be able to move fast,

00:13:25.950 --> 00:13:28.440
<v Scott>but there's like, it's the same argument,

00:13:28.590 --> 00:13:32.320
<v Scott>whether it's AI code reviews or it's deploy speed,

00:13:33.140 --> 00:13:36.520
<v Scott>It's stability versus speed.

00:13:37.200 --> 00:13:39.240
<v Scott>And at some level, there's a balance.

00:13:39.760 --> 00:13:43.240
<v Scott>Like I agree in the sense that we should make things fast,

00:13:43.940 --> 00:13:47.240
<v Scott>but stability comes with some slowing things down.

00:13:49.360 --> 00:13:53.280
<v Scott>And speed comes with not having every single gate in your way.

00:13:53.660 --> 00:13:56.080
<v Scott>So yes, I think there's a balance between the two,

00:13:57.160 --> 00:14:00.140
<v Scott>but I don't think that we should just yeet out things

00:14:00.800 --> 00:14:04.240
<v Scott>and find a bug that we may have put out there

00:14:05.960 --> 00:14:06.600
<v Scott>X long ago

00:14:06.800 --> 00:14:08.000
<v Scott>because we just wanted to get things out

00:14:08.000 --> 00:14:08.760
<v Scott>as fast as possible.

00:14:09.320 --> 00:14:11.200
<v Scott>Speed inherently introduces risk.

00:14:13.100 --> 00:14:16.240
<v Matt>I disagree that they're related to one another.

00:14:17.480 --> 00:14:18.520
<v Matt>I think they're mutually exclusive

00:14:18.720 --> 00:14:19.780
<v Matt>or can be mutually exclusive.

00:14:20.080 --> 00:14:20.560
<v Matt>But again, yeah,

00:14:20.800 --> 00:14:22.260
<v Matt>like I think maybe we're entering into

00:14:22.980 --> 00:14:24.560
<v Matt>five minute deploy episode two

00:14:25.900 --> 00:14:28.740
<v Matt>versus the code review aspect.

00:14:29.040 --> 00:14:31.140
<v Scott>I don't see them as mutually discrete.

00:14:31.150 --> 00:14:33.240
<v Scott>I mean, like maybe two different teams own them,

00:14:34.040 --> 00:14:37.600
<v Scott>but at some level they still work hand in hand.

00:14:38.220 --> 00:14:41.000
<v Matt>No, like, again, like this sort of gets back to my point

00:14:41.160 --> 00:14:43.880
<v Matt>from our fast deploy episode where it's like,

00:14:44.620 --> 00:14:45.920
<v Matt>just because you're deploying fast

00:14:46.180 --> 00:14:47.360
<v Matt>doesn't mean you're risking stability.

00:14:48.380 --> 00:14:50.360
<v Matt>Like those are unrelated things.

00:14:50.980 --> 00:14:54.080
<v Matt>You can have great fast deploys and great stability.

00:14:55.200 --> 00:14:55.980
<v Scott>That I disagree with.

00:14:56.030 --> 00:14:57.900
<v Scott>I think in a perfect world, that would be nice,

00:14:58.140 --> 00:15:00.500
<v Scott>but I don't think that that right now exists.

00:15:01.020 --> 00:15:02.860
<v Scott>Everything gets faster, everything gets bigger,

00:15:03.510 --> 00:15:06.240
<v Scott>but I don't think there's some perfect world currently

00:15:06.450 --> 00:15:08.600
<v Scott>where that exists, especially at a corporate level.

00:15:09.160 --> 00:15:10.240
<v Scott>But I know we're getting off topic,

00:15:10.500 --> 00:15:12.020
<v Scott>so I'll go back to the code reviews.

00:15:14.220 --> 00:15:15.660
<v Dillon>I'll go back to the code reviews, how about that?

00:15:16.440 --> 00:15:19.500
<v Dillon>Is it really beneficial to have an AI agent

00:15:19.810 --> 00:15:22.020
<v Dillon>auto-approve your PR versus just not having

00:15:22.500 --> 00:15:25.220
<v Dillon>a requirement for approvers on specific files?

00:15:25.430 --> 00:15:27.460
<v Dillon>Like, what's the difference, and is it actually valuable?

00:15:28.260 --> 00:15:30.580
<v Matt>Maybe also to, like, step back a little bit and provide some context. So,

00:15:31.839 --> 00:15:37.720
<v Matt>the bot that I called Sidekick at HubSpot, that can't approve PRs currently.

00:15:38.680 --> 00:15:43.580
<v Matt>I think they had an initial phase early on where it could approve PRs, and then they realized

00:15:43.660 --> 00:15:48.140
<v Matt>that engineers could prompt Sidekick to create a PR, and then have it review the PR and approve it,

00:15:48.320 --> 00:15:53.839
<v Matt>and then be able to merge it without any human looking at it or writing any code. So that was quickly

00:15:53.860 --> 00:16:00.620
<v Matt>squashed.

00:15:53.860 --> 00:16:00.620
<v Dillon>That was a hack that they discovered is what it sounds like.

00:15:53.860 --> 00:16:00.620
<v Matt>Yeah, exactly. So yeah,

00:16:00.620 --> 00:16:05.540
<v Matt>I believe it can't approve PRs right now. I think that was turned off. But we do have this other bot,

00:16:06.260 --> 00:16:12.320
<v Matt>Sparrow, but that's not an agent. That's just a very basic static analysis of the changes in a PR,

00:16:12.790 --> 00:16:15.620
<v Matt>right? If it sees that the changes in the PR only change Markdown, for example,

00:16:16.220 --> 00:16:21.980
<v Matt>it can approve that, because collectively, as an organization, we've said changes to Markdown in

00:16:21.980 --> 00:16:25.920
<v Matt>repos won't have a user-facing impact.

00:16:26.630 --> 00:16:30.020
<v Matt>Um, you know, and certainly in some cases, like maybe more for platform

00:16:30.240 --> 00:16:33.140
<v Matt>teams where you might use Markdown as source content for a documentation

00:16:33.440 --> 00:16:37.820
<v Matt>website that's internal-facing, like, that gets a little bit

00:16:37.900 --> 00:16:41.280
<v Matt>murkier. But generally speaking, you know, a change to Markdown is not

00:16:41.410 --> 00:16:46.500
<v Matt>going to break your application or, um, you know, lead to bugs.

00:16:47.080 --> 00:16:50.619
<v Matt>So that was sort of chosen to support the approval

00:16:50.660 --> 00:16:50.840
<v Matt>workflow.

00:16:53.360 --> 00:16:55.640
<v Matt>Dillon, in terms of your question, what's

00:16:55.640 --> 00:16:58.120
<v Matt>the difference between having agents that can approve PRs

00:16:58.290 --> 00:17:00.920
<v Matt>versus just removing the code review requirement?

00:17:01.980 --> 00:17:04.600
<v Matt>I think it still has a little bit of friction

00:17:05.089 --> 00:17:08.660
<v Matt>to the release workflow, where it's like you have at least

00:17:08.860 --> 00:17:11.699
<v Matt>something evaluating the change

00:17:11.959 --> 00:17:13.600
<v Matt>other than the person that authored the change

00:17:14.680 --> 00:17:16.439
<v Matt>in order to determine if it's safe to merge or not.

00:17:17.140 --> 00:17:19.120
<v Matt>Whereas if you remove the requirement,

00:17:19.339 --> 00:17:21.319
<v Matt>then it's like, yeah, there is no gate.

00:17:23.400 --> 00:17:25.900
<v Matt>I'm not saying it's equivalent to a human reviewer,

00:17:26.260 --> 00:17:30.800
<v Matt>but it is a little bit more than no review at all is my take.

00:17:31.000 --> 00:17:32.160
<v Dillon>I mean, the more I think about it,

00:17:32.200 --> 00:17:35.760
<v Dillon>it's like an additional linting step, in a sense,

00:17:35.980 --> 00:17:40.060
<v Dillon>that can sometimes, like, lint for control flow being correct.

00:17:41.700 --> 00:17:42.720
<v Dillon>If that makes sense.

00:17:43.640 --> 00:17:45.640
<v Matt>Yeah, this is the way I've been thinking about it also,

00:17:45.840 --> 00:17:49.620
<v Matt>where it's like, I think that the real,

00:17:49.860 --> 00:17:52.640
<v Matt>like the sort of the bright spot of agents doing code review

00:17:52.780 --> 00:17:55.640
<v Matt>is being a little bit smarter than a generic linter

00:17:55.920 --> 00:17:59.480
<v Matt>to validate code patterns that are being added in a PR.

00:18:01.780 --> 00:18:04.260
<v Matt>Like you probably could author lint rules

00:18:04.400 --> 00:18:07.240
<v Matt>to do some of that stuff, but like at certain points,

00:18:07.320 --> 00:18:10.020
<v Matt>it gets very tedious to try to author like a lint rule

00:18:10.200 --> 00:18:12.680
<v Matt>or static analysis rule that will, you know,

00:18:12.880 --> 00:18:15.800
<v Matt>assert that you're architecting your application

00:18:15.820 --> 00:18:21.200
<v Matt>a certain way, right? Whereas an agent can kind of internalize some of those concepts for you and be able to provide feedback on it.

00:18:21.580 --> 00:18:27.520
<v Dillon>One of the things this made me think of is if I'm using Claude in my editor

00:18:27.820 --> 00:18:34.320
<v Dillon>to write my code, why do I need Claude to also review my code? But sort of my counterpoint to my

00:18:34.400 --> 00:18:39.580
<v Dillon>own thought was some people aren't even using AI to write their code. So it's kind of nice to have

00:18:39.600 --> 00:18:46.180
<v Dillon>like a final check that's pseudo-mandatory, not really mandatory, but just happens for free,

00:18:46.920 --> 00:18:51.760
<v Dillon>and it can catch issues that maybe would have been caught if you'd used AI before. I'm not saying

00:18:52.250 --> 00:18:57.180
<v Dillon>that AI is always good. Like, I have an example from yesterday when I was trying to use it to

00:18:57.180 --> 00:19:02.820
<v Dillon>plan something out, and it was using docs that were just completely outdated and giving me

00:19:02.960 --> 00:19:11.140
<v Dillon>really bad ideas. But that's another episode.

00:19:11.300 --> 00:19:17.760
<v Scott>That's a good point. And, like, it's another reason

00:19:11.300 --> 00:19:17.760
<v Scott>a lot of people have multiple instances running, or, you know, multiple agents running,

00:19:18.120 --> 00:19:27.280
<v Scott>and asking, um, iterative process questions and prompts. Basically, one agent, or just one

00:19:27.440 --> 00:19:34.040
<v Scott>instance of chat might not do all the work or might not catch everything. And weirdly,

00:19:34.210 --> 00:19:41.780
<v Scott>we have this channel called Shit That Claude Says. Claude gets lazy sometimes, which is

00:19:42.630 --> 00:19:50.140
<v Scott>very on trend for developers. But it's funny to me because essentially sometimes it takes shortcuts

00:19:50.620 --> 00:19:54.360
<v Scott>if you don't prompt it otherwise in some spots.

00:19:54.740 --> 00:19:57.940
<v Scott>So it checking its work is very similar to that,

00:19:58.010 --> 00:20:00.080
<v Scott>having multiple instances running

00:20:01.839 --> 00:20:05.740
<v Scott>and drilling down on what you need.

00:20:05.970 --> 00:20:07.460
<v Scott>I guess it can only...

00:20:08.240 --> 00:20:09.740
<v Scott>I noticed issues with context,

00:20:09.950 --> 00:20:14.400
<v Scott>and I've seen people build multiple instances

00:20:14.920 --> 00:20:17.200
<v Scott>where it's constantly checking what it's doing.

00:20:17.880 --> 00:20:20.620
<v Scott>So it's like, I think these are problems that, personally,

00:20:20.890 --> 00:20:22.100
<v Scott>and I'm speaking about Claude,

00:20:22.470 --> 00:20:24.340
<v Scott>Claude should eventually be able to solve.

00:20:24.520 --> 00:20:27.600
<v Scott>But it can only handle so much context at once,

00:20:27.730 --> 00:20:28.680
<v Scott>I guess context bloat.

00:20:29.440 --> 00:20:30.760
<v Scott>So when you have multiple instances running,

00:20:30.870 --> 00:20:32.040
<v Scott>they can keep each other in check.

00:20:33.060 --> 00:20:37.520
<v Scott>So this is, like, an example of: you're running

00:20:37.880 --> 00:20:42.420
<v Scott>this one instance locally, and another instance whose job is more siloed

00:20:42.540 --> 00:20:44.900
<v Scott>to checking certain things

00:20:45.140 --> 00:20:46.660
<v Scott>can make sure that it's doing a better job.

00:20:47.300 --> 00:20:54.940
<v Scott>One instance, one big model that is really good is going to miss a few things here and there, for now at least.

00:20:55.720 --> 00:21:01.020
<v Matt>I had experimented a little bit with doing, quote unquote, red teaming with AI locally.

00:21:01.450 --> 00:21:04.220
<v Matt>So I would have Claude, you know, implement the change.

00:21:04.710 --> 00:21:09.840
<v Matt>And then I would basically stop that agent and start a new one without preserving the context.

00:21:10.040 --> 00:21:12.620
<v Matt>Or, you know, nowadays I would just do, like, a slash clear.

00:21:13.880 --> 00:21:23.720
<v Matt>And then I would say, okay, I have changes locally, you know, review these changes, make sure they solve this problem, and whatnot.

00:21:24.680 --> 00:21:27.660
<v Matt>So kind of as like a pre-code review code review in a sense.

00:21:28.820 --> 00:21:29.760
<v Matt>And I actually found it pretty useful.

00:21:29.940 --> 00:21:35.780
<v Matt>Like it was able to catch some like logical bugs that the initial agent wasn't able to find itself.

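The two-pass flow Matt describes can be sketched roughly like this. Everything here is illustrative: `ask()` is a hypothetical stub standing in for a real chat-model call, and the canned strings are placeholders, not real agent output.

```python
# Sketch of the "fresh perspective" pre-review pass: the author session and
# the reviewer session share no conversation history, so the reviewer can't
# inherit the author's blind spots. ask() is a hypothetical stub, not a real
# model API.
def ask(history: list[str], prompt: str) -> str:
    history.append(prompt)  # each session accumulates only its own context
    if "Implement" in prompt:
        return "diff: added bounds check to the pagination loop"
    return "Review: bounds check looks correct; consider a test for page=0"

# Session 1: the author agent implements the change.
author_session: list[str] = []
diff = ask(author_session, "Implement the fix for the pagination bug")

# Session 2: a brand-new session (the equivalent of /clear) reviews the diff.
reviewer_session: list[str] = []
review = ask(reviewer_session,
             f"I have changes locally: {diff}. Review them and make sure "
             "they solve the pagination bug.")
print(review)
```

The point of the second, empty history list is the same as Matt's slash clear: the reviewer sees only the diff and the problem statement, not the reasoning that produced them.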
00:21:36.500 --> 00:21:40.740
<v Dillon>It's like, give me someone or give me an agent with a fresh perspective.

00:21:42.220 --> 00:21:44.840
<v Dillon>And sort of a different mission in mind.

00:21:45.740 --> 00:21:46.440
<v Matt>Exactly, yeah.

00:21:46.940 --> 00:21:48.400
<v Dillon>You net better results, potentially.

00:21:49.380 --> 00:21:53.320
<v Matt>Yeah, which is, I think, the whole value proposition

00:21:53.620 --> 00:21:56.200
<v Matt>behind the mixture of experts, mixture of agents

00:21:56.420 --> 00:22:00.320
<v Matt>sort of architecture or pattern, right?

00:22:00.440 --> 00:22:03.680
<v Matt>You have a collection of different specialized things

00:22:03.980 --> 00:22:07.800
<v Matt>that can handle different tasks better than others.

00:22:10.120 --> 00:22:17.380
<v Matt>But that's, like, maybe the limit of my understanding of those concepts or their implementation.

00:22:17.740 --> 00:22:19.500
<v Matt>So I'm not suited to talk more about that.

00:22:20.860 --> 00:22:28.340
<v Matt>I think ironically, you guys maybe made me argue for the opposite approach of what I was thinking originally coming in here.

00:22:28.840 --> 00:22:39.740
<v Matt>So I think it's valuable to have AI agents review code, but I don't think it's at the point where it's equivalent to a human reviewing the code.

00:22:40.180 --> 00:22:41.840
<v Matt>So I think it's not that valuable.

00:22:42.340 --> 00:22:43.080
<v Scott>We agree.

00:22:43.780 --> 00:22:50.240
<v Matt>I mean, I had this stance before the podcast, but I feel like you forced me down the alternative path.

00:22:50.240 --> 00:22:51.540
<v Scott>Well, that's the goal.

00:22:51.720 --> 00:22:53.040
<v Scott>We need to make our two...

00:22:53.200 --> 00:22:53.600
<v Scott>Oh, sorry.

00:22:54.000 --> 00:22:57.460
<v Scott>Our three regular listeners need some controversy.

00:22:58.700 --> 00:23:03.600
<v Matt>Who's the third?

00:22:58.700 --> 00:23:03.600
<v Scott>I mean, I don't know who they are. I just know we have three in the last, like,

00:23:03.920 --> 00:23:11.300
<v Scott>four episodes.

00:23:03.920 --> 00:23:11.300
<v Dillon>It's just us.

00:23:03.920 --> 00:23:11.300
<v Scott>It's just us. All right, let's talk about this. Like,

00:23:11.960 --> 00:23:20.020
<v Scott>what are the cons of having AI do code review? Like, is there a con here? I mean, I think my only con is

00:23:20.120 --> 00:23:29.920
<v Scott>sometimes it leads to frivolous, overly explicit code being written that might not actually be necessary.

00:23:30.380 --> 00:23:32.060
<v Scott>It doesn't really slow down the process.

00:23:32.440 --> 00:23:35.160
<v Scott>It's one extra gate added.

00:23:35.440 --> 00:23:39.000
<v Scott>Maybe cost, if we're worried about cost.

00:23:39.460 --> 00:23:41.100
<v Scott>Is this the best way to spend your tokens?

00:23:42.339 --> 00:23:43.460
<v Scott>Maybe? Maybe not.

00:23:45.620 --> 00:23:49.100
<v Scott>But I don't see a con in adding an extra set of eyes.

00:23:49.660 --> 00:23:57.860
<v Scott>I get that there's no eyes yet for AI, but like more caution that runs as a pipeline sounds great to me.

00:23:58.280 --> 00:24:09.300
<v Scott>It's just AI basically being some sort of lint-like feature that you don't have to build or find and maintain.

00:24:11.560 --> 00:24:18.180
<v Matt>One example that comes to mind, and we didn't necessarily see this in the code review aspect of Sidekick,

00:24:18.800 --> 00:24:26.780
<v Matt>But when adopting it as like a coding agent for our use cases, we don't

00:24:27.080 --> 00:24:34.140
<v Matt>do any like post training or anything like that to customize the agents for our code bases.

00:24:34.620 --> 00:24:38.700
<v Matt>But we do have a set of one pagers that we basically say like, okay, if this looks like

00:24:38.800 --> 00:24:43.820
<v Matt>a front end code repo, then fetch this one pager about front end coding at HubSpot.

00:24:44.120 --> 00:24:48.280
<v Matt>If this looks like a Java repo, like a backend repo, then fetch this one pager about Java apps

00:24:49.560 --> 00:24:51.800
<v Matt>And we have a number of these different one pagers that we fetch.

00:24:52.340 --> 00:24:53.740
<v Scott>Tell me more about the one pager.

00:24:53.760 --> 00:24:55.840
<v Scott>I'm curious, like what are people getting from this?

00:24:56.780 --> 00:24:58.340
<v Matt>It's not for people, it's for agents.

00:24:59.520 --> 00:25:02.160
<v Matt>I mean, a human can read it.

00:25:02.240 --> 00:25:03.280
<v Matt>Obviously it's just a markdown doc.

00:25:03.560 --> 00:25:08.440
<v Matt>So a human can read it, but it helps tell the agent, here's how we write front end

00:25:08.580 --> 00:25:14.880
<v Matt>code, or here's how we use CHIRP, our like RPC framework, or here's how you scaffold a

00:25:14.980 --> 00:25:16.420
<v Matt>Java service, et cetera.

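The routing Matt describes, picking a guidelines one-pager based on what the repo looks like, could be sketched like this. The marker files and document names are assumptions for illustration, not HubSpot's actual configuration.

```python
# Illustrative sketch of one-pager routing: inspect the repo and pick the
# matching guidelines doc to hand the coding agent. The marker files and
# doc names here are assumptions, not HubSpot's real setup.
from pathlib import Path

def pick_one_pager(repo: Path) -> str:
    """Choose which markdown one-pager to feed the agent for this repo."""
    if (repo / "package.json").exists():
        return "frontend-coding.md"    # looks like a front-end repo
    if (repo / "pom.xml").exists():
        return "java-services.md"      # looks like a Java back-end repo
    return "general-guidelines.md"     # fallback when neither marker fits

# Usage: a fresh directory containing only pom.xml routes to the Java doc.
import tempfile
demo = Path(tempfile.mkdtemp())
(demo / "pom.xml").touch()
print(pick_one_pager(demo))  # java-services.md
```

The failure mode Matt hits next, a repo that is neither clearly front end nor back end, is exactly the fallback branch here: the heuristic routes it to a generic doc that may not match the repo's actual patterns.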
00:25:18.260 --> 00:25:31.000
<v Matt>Yeah. But one thing that we found, at least on my team, is that we're simultaneously not a front-end repo, but also not a back-end repo, at least in the front-end tooling that we maintain.

00:25:32.880 --> 00:25:42.240
<v Matt>And so it kind of got a little bit confused on patterns to recommend or how to write code in our repo, how to run the repo, how to validate changes, stuff like that.

00:25:43.440 --> 00:25:50.060
<v Matt>So I think that's maybe one risk of the, or like con of like using the AI agent, depending on how

00:25:50.160 --> 00:25:57.960
<v Matt>well you've configured it to maintain context about your code base, right? Like, you

00:25:57.960 --> 00:26:02.400
<v Matt>know, maybe a good example, potentially, right? Like taking Whoop's code base, for example,

00:26:02.520 --> 00:26:07.060
<v Matt>as far as I know it, like where it's a monorepo of packages, like you might have a package in there

00:26:07.140 --> 00:26:12.300
<v Matt>that's a little bit more frameworky versus a package that's a little bit more front-end

00:26:12.300 --> 00:26:17.380
<v Matt>focused, and knowing the difference between those two things might be easy for a

00:26:17.500 --> 00:26:23.700
<v Matt>human to spot, but, you know, an agent might not be aware of those distinctions and

00:26:23.740 --> 00:26:28.460
<v Matt>like what that might mean for different coding patterns within those packages. So I think that's

00:26:28.460 --> 00:26:36.140
<v Matt>an interesting like aspect that

00:26:28.460 --> 00:26:36.140
<v Dillon>A follow-up question to that is, is it worth investing the

00:26:36.160 --> 00:26:42.620
<v Dillon>time, like, constantly tweaking my settings for my AI agents to, like, operate efficiently?

00:26:44.680 --> 00:26:50.720
<v Matt>I think so, but I think it depends on the, you know, the size and

00:26:50.880 --> 00:26:56.720
<v Matt>scale of your company, right? Like, out-of-the-box agents like Claude or Cursor are

00:26:56.860 --> 00:27:01.700
<v Matt>probably good enough. But then once you get to a certain scale and size,

00:27:01.720 --> 00:27:07.680
<v Matt>if you can get more efficient usage of your agent for a specific code base, then

00:27:08.000 --> 00:27:12.260
<v Matt>it's worth doing that. In the same vein as the cost-savings aspect that Scott was talking about, right?

00:27:12.300 --> 00:27:17.100
<v Matt>Like, if it's going to review your code but review it incorrectly two or three times

00:27:17.220 --> 00:27:22.880
<v Matt>until you coach it down the right path, then you're wasting two or three times the, you know,

00:27:22.980 --> 00:27:29.640
<v Matt>number of tokens. All right.

00:27:22.980 --> 00:27:29.640
<v Dillon>How are you guys, or how do you think your companies are measuring

00:27:29.660 --> 00:27:35.300
<v Dillon>the success of these things? Because I feel like my company is using so many different tools,

00:27:35.960 --> 00:27:41.320
<v Dillon>I can have a feeling about how it's impacting things. And I don't know that that's

00:27:41.500 --> 00:27:47.180
<v Dillon>actually an accurate representation of what's happening. I'll say that we're using a tool called

00:27:47.360 --> 00:27:56.620
<v Dillon>DX, which measures the timing of PRs from open to close and things like that. So I'm kind of

00:27:56.540 --> 00:28:02.880
<v Dillon>curious. I'm not sure how they're doing it. Like, are they looking at, like, okay, in PRs that had AI code

00:28:03.000 --> 00:28:09.500
<v Dillon>review, there were fewer bugs, or, like, a quicker turnaround time? Is there any, like, macro-level

00:28:10.340 --> 00:28:14.700
<v Dillon>thing that you guys know your companies are tracking related to these tools? Or are they

00:28:14.800 --> 00:28:20.540
<v Dillon>just, like, adding them in and basing it on engineering sentiment and saying, like, all right,

00:28:20.600 --> 00:28:26.500
<v Dillon>it's better?

00:28:20.600 --> 00:28:26.500
<v Scott>That's an excellent question. I don't know at the macro level how they're

00:28:26.520 --> 00:28:32.060
<v Scott>doing it. I know we have people working on the data. I remember asking originally, because they

00:28:32.060 --> 00:28:36.500
<v Scott>were trying to, they had numbers about how many people were using AI, and they still do,

00:28:37.140 --> 00:28:44.120
<v Scott>or how often AI was pushed. And I know Claude adds something to the message in your deploy.

00:28:44.880 --> 00:28:49.440
<v Scott>It also adds messages in commits. So I think they were going through the commit history.

00:28:50.060 --> 00:28:54.120
<v Scott>That was what I thought. It was just an assumption I was making. And when I asked,

00:28:54.620 --> 00:28:56.700
<v Scott>the person who gave the presentation didn't know.

00:28:56.730 --> 00:28:57.720
<v Scott>And I asked, is this how?

00:28:58.740 --> 00:29:00.060
<v Scott>I pulled that out of my commits.

00:29:00.090 --> 00:29:01.500
<v Scott>So they're going to think I use no AI.

00:29:02.360 --> 00:29:04.660
<v Scott>Also, they are doing like manual surveys and whatnot.

00:29:05.900 --> 00:29:09.020
<v Scott>But I don't exactly know how they know per se.

00:29:09.140 --> 00:29:12.600
<v Scott>There might also be some way of seeing like API

00:29:14.000 --> 00:29:16.520
<v Scott>endpoint hits, and who's making those,

00:29:17.440 --> 00:29:18.580
<v Scott>and logging that data.

00:29:18.650 --> 00:29:22.260
<v Scott>We have like internal data UIs.

00:29:23.000 --> 00:29:28.360
<v Scott>So I think they're getting metrics through their own APIs basically to do so.

00:29:28.820 --> 00:29:31.240
<v Scott>But I don't fully know the answer yet.

00:29:32.300 --> 00:29:36.200
<v Dillon>Yeah, my fear is that we're just all jumping on the AI hype train.

00:29:36.860 --> 00:29:39.160
<v Dillon>We don't know if it's like truly benefiting us.

00:29:39.940 --> 00:29:43.720
<v Scott>Well, I actually watched something last night that was saying that in all industries,

00:29:44.040 --> 00:29:52.020
<v Scott>like productivity for humans is up 30% from AI and like corporations are going to just

00:29:52.040 --> 00:29:59.480
<v Scott>hire fewer people. I'm not even making that up.

00:29:52.040 --> 00:29:59.480
<v Dillon>I'm only laughing because, like, I could

00:29:59.480 --> 00:30:04.620
<v Dillon>just start a conspiracy that says, like, that was a sponsored article by both, like, Anthropic or

00:30:04.720 --> 00:30:11.360
<v Dillon>OpenAI. I'm not saying it was, I'm just laughing at that. Oh, my argument, too, is like,

00:30:11.360 --> 00:30:16.740
<v Scott>yeah, it's 30% more productivity, but how much of it is actually better? Like, we have so many episodes

00:30:16.760 --> 00:30:23.460
<v Scott>about how, like, we think the web gets worse. So, yeah, more is done, and it gets back to

00:30:23.680 --> 00:30:31.340
<v Scott>our deploy speed versus stability. Just because you do more doesn't mean it's actually giving the

00:30:31.500 --> 00:30:39.240
<v Scott>user what they want. Sorry, I'm just trying to put burns in there.

00:30:40.260 --> 00:30:47.820
<v Dillon>That's a good point. It's like, yeah, we went from 10 PRs a week to 20, but now our app is, like, impossible to change

00:30:48.400 --> 00:30:53.380
<v Dillon>unless we use AI to change it or something.

00:30:48.400 --> 00:30:53.380
<v Scott>And a lot of people are talking about that at work

00:30:53.940 --> 00:31:00.400
<v Scott>like, lately, they were like, oh, now that we're using so much AI code, how do we as

00:31:00.600 --> 00:31:07.540
<v Scott>humans make sure that it's better, more performant, easier to edit and write and work with?

00:31:07.860 --> 00:31:09.740
<v Scott>because that's like the new concern.

00:31:09.810 --> 00:31:15.840
<v Scott>Or I hear staff engineers kind of selling like our skill sets

00:31:16.080 --> 00:31:20.100
<v Scott>as being the arbiters of that or like now we're the architects.

00:31:20.270 --> 00:31:22.700
<v Scott>And this kind of goes to that point I brought up earlier

00:31:22.960 --> 00:31:26.280
<v Scott>where we're becoming more of the orchestrators.

00:31:26.830 --> 00:31:29.700
<v Scott>Like while AI is doing these like lower level lift tasks,

00:31:30.030 --> 00:31:32.800
<v Scott>whether that's, you know, writing some HTML or CSS,

00:31:33.000 --> 00:31:34.740
<v Scott>it picks bad HTML, by the way, often.

00:31:35.740 --> 00:31:46.640
<v Scott>But anyway, whether it's writing low-level HTML or writing files for you right now, you're still kind of orchestrating the overall architecture of that app.

00:31:46.890 --> 00:31:53.260
<v Scott>You're still picking and choosing what dependencies you're using, what the services you're using to run things.

00:31:53.580 --> 00:32:05.080
<v Scott>So our job is just to oversee kind of the AI a little bit, and we're kind of climbing up this ladder where maybe in the future, five to ten years from now, I have no idea.

00:32:05.300 --> 00:32:08.740
<v Scott>but maybe like senior software engineers

00:32:08.770 --> 00:32:10.980
<v Scott>and software engineers oversee AI

00:32:11.180 --> 00:32:13.020
<v Scott>that write parts of the app.

00:32:13.250 --> 00:32:15.900
<v Scott>And we just kind of oversee that

00:32:17.780 --> 00:32:19.580
<v Scott>and make sure that it's making the right calls.

00:32:19.780 --> 00:32:20.960
<v Scott>I don't know what it looks like,

00:32:21.060 --> 00:32:25.540
<v Scott>but it's almost like we're moving back

00:32:25.880 --> 00:32:26.840
<v Scott>out of the weeds slightly

00:32:27.280 --> 00:32:28.180
<v Scott>into what we can be solving.

00:32:29.140 --> 00:32:30.800
<v Matt>Going back to Dillon's question of like,

00:32:30.990 --> 00:32:35.260
<v Matt>how do these companies like validate the value

00:32:35.280 --> 00:32:37.060
<v Matt>or measure the value of this stuff.

00:32:38.720 --> 00:32:40.460
<v Matt>I wanted to talk a little bit about,

00:32:41.320 --> 00:32:44.440
<v Matt>as I understand it, how HubSpot looks at the value of it.

00:32:44.440 --> 00:32:46.180
<v Matt>I think primarily we use,

00:32:47.420 --> 00:32:49.560
<v Matt>well, most platform teams use an engineering,

00:32:49.800 --> 00:32:52.160
<v Matt>like an NPS yearly survey.

00:32:52.780 --> 00:32:54.440
<v Matt>So my assumption is that the same thing

00:32:54.440 --> 00:32:58.940
<v Matt>is going to be used for AI usage, right?

00:32:59.100 --> 00:33:04.040
<v Matt>Like, rate how much you would say

00:33:04.060 --> 00:33:06.160
<v Matt>using AI has improved your work, or something like that.

00:33:07.260 --> 00:33:09.720
<v Matt>But one thing that I think we don't talk enough about

00:33:09.980 --> 00:33:14.920
<v Matt>is evals or using evals to get better results for your agents.

00:33:15.640 --> 00:33:18.400
<v Matt>And I think that sort of like leads to better feedback

00:33:18.620 --> 00:33:21.180
<v Matt>of like engineers get better feedback on their PRs

00:33:21.280 --> 00:33:22.040
<v Matt>from agents, et cetera.

00:33:23.560 --> 00:33:24.400
<v Dillon>What's an eval?

00:33:25.060 --> 00:33:27.480
<v Matt>Yeah, so an eval is just like an evaluation

00:33:27.840 --> 00:33:29.380
<v Matt>of the agent, right?

00:33:29.500 --> 00:33:32.760
<v Matt>Like you tell it to do a task and you have like a,

00:33:33.680 --> 00:33:37.080
<v Matt>You tell it to do a known task that you have a known result for.

00:33:37.760 --> 00:33:41.980
<v Matt>And then you say, OK, evaluate the result of giving that task

00:33:42.010 --> 00:33:45.020
<v Matt>to this agent against the thing that we know is the known result

00:33:45.050 --> 00:33:45.920
<v Matt>or the expected result.

00:33:46.440 --> 00:33:47.480
<v Matt>How close is it to that?

00:33:49.420 --> 00:33:52.420
<v Matt>And so this is a way to validate new models

00:33:52.530 --> 00:33:54.340
<v Matt>or validate changes to models of--

00:33:55.760 --> 00:33:59.540
<v Matt>is it staying on track with what we expect with the results?

00:34:00.660 --> 00:34:02.240
<v Matt>That's my limited understanding of it.

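Matt's description of an eval can be sketched as a tiny harness: run the agent on tasks with known expected results and score how close the outputs come. `run_agent` here is a hypothetical stub with canned answers, not a real model call, and the scoring function is one simple choice among many.

```python
# Minimal sketch of an eval harness: tasks with known expected results, a
# similarity score per task, and an average across the suite. run_agent is
# a stand-in stub; a real harness would call the model or agent under test.
from difflib import SequenceMatcher

def run_agent(task: str) -> str:
    # Stub agent with canned answers, standing in for a real model call.
    canned = {"reverse 'abc'": "cba", "uppercase 'hi'": "HI"}
    return canned.get(task, "")

def score(output: str, expected: str) -> float:
    # Similarity in [0, 1]; 1.0 means an exact match with the known result.
    return SequenceMatcher(None, output, expected).ratio()

def run_evals(cases: dict[str, str]) -> float:
    # Average closeness of the agent's outputs to the expected results.
    return sum(score(run_agent(t), exp) for t, exp in cases.items()) / len(cases)

cases = {"reverse 'abc'": "cba", "uppercase 'hi'": "HI"}
print(run_evals(cases))  # 1.0 for this stub agent
```

Re-running the same suite against a new model and watching the average score is the "is it staying on track with the expected results" check Matt mentions.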
00:34:03.300 --> 00:34:16.280
<v Matt>Yeah. But what we've done at HubSpot is we've taken what was a yearly hackathon of solving problems using real-world internal apps and tooling.

00:34:17.700 --> 00:34:21.899
<v Matt>And we've taken that framework for the hackathon and used it as our eval framework.

00:34:22.679 --> 00:34:27.560
<v Matt>Because we know we have a huge sample of responses to those.

00:34:27.970 --> 00:34:34.040
<v Matt>So we can sort of validate how the agent performs against the sample of the solutions that have been produced.

00:34:35.600 --> 00:34:43.040
<v Matt>Which is sort of a good way to help inform if new models are performing well in evaluating our changes and things like that.

00:34:44.460 --> 00:34:47.800
<v Dillon>There's a whole team at Whoop that's the AI team.

00:34:48.659 --> 00:34:52.360
<v Dillon>since we have a coach in the app

00:34:52.800 --> 00:34:55.300
<v Dillon>that is like an AI agent, basically.

00:34:55.950 --> 00:34:59.760
<v Dillon>They have built their own platform, basically,

00:35:00.580 --> 00:35:02.840
<v Dillon>which basically does what you're saying, Matt,

00:35:02.890 --> 00:35:03.880
<v Dillon>where it has tons of evals.

00:35:04.120 --> 00:35:06.660
<v Dillon>This is a massive tangent, but it's just interesting.

00:35:07.320 --> 00:35:10.000
<v Dillon>I don't personally work on that stuff,

00:35:10.090 --> 00:35:12.320
<v Dillon>but it's just interesting that we have something like that internally.

00:35:12.940 --> 00:35:16.060
<v Matt>Yeah, I feel like we need to pull in a friend of the pod, Joe,

00:35:16.400 --> 00:35:21.860
<v Matt>to talk through this, because I think Joe either completely built our eval stuff or heavily

00:35:22.100 --> 00:35:27.100
<v Matt>contributed to it at HubSpot, and so he would be able to tell us where we're completely

00:35:27.320 --> 00:35:31.980
<v Matt>mischaracterizing it and describing it incorrectly, and tell us what actually is happening.

00:35:32.720 --> 00:35:41.619
<v Dillon>I have massive AI fatigue from all this shit

00:35:32.720 --> 00:35:41.619
<v Scott>Same. I actually write code sometimes. And also I feel

00:35:41.640 --> 00:35:46.760
<v Dillon>like one of the things maybe we don't touch on enough, and maybe we have in the past, is that AI

00:35:46.870 --> 00:35:53.820
<v Dillon>just, like, wants to give you the answer that makes you feel good. So even if you give it, like, the wrong

00:35:54.040 --> 00:36:00.440
<v Dillon>information or wrong instructions, it'll just kind of, like, mostly agree with you. And maybe some agents

00:36:00.550 --> 00:36:06.659
<v Dillon>are better at that, of, like, calling you out when you're wrong, but I feel like most of them are built

00:36:06.680 --> 00:36:13.380
<v Dillon>in a way, to people-please.

00:36:06.680 --> 00:36:13.380
<v Scott>Yeah, this take is colder than ice cubes right now. It always tries to agree with you.

00:36:13.540 --> 00:36:18.920
<v Dillon>But, like, we're so on board with this whole conversation of, like, oh yeah, I use this for

00:36:19.100 --> 00:36:28.100
<v Dillon>everything.

00:36:19.100 --> 00:36:28.100
<v Scott>Okay, okay. All right, fine.

00:36:19.100 --> 00:36:28.100
<v Dillon>But we're just, like, forgetting, like, this is just, I don't

00:36:28.160 --> 00:36:34.059
<v Dillon>know, it's just kind of shitty in a way. It's super...

00:36:28.160 --> 00:36:34.059
<v Scott>It feels like a butler, dude. Like, it's like, oh yes,

00:36:34.080 --> 00:36:39.520
<v Scott>sir. Yeah. Oh, you're so right. Oh, excellent question, Scott.

00:36:34.080 --> 00:36:39.520
<v Dillon>I mean, maybe it's not a spicy take, but

00:36:39.660 --> 00:36:45.040
<v Dillon>everyone is just on the train, and they're like, this is great.

00:36:45.160 --> 00:36:50.560
<v Matt>I feel like then I have the spicy take of, like, actually, it's good. This maybe gets into our stand-up

00:36:50.760 --> 00:36:59.080
<v Matt>update, but okay, first up: Claude Code gave everyone that pays for it free credits. Originally,

00:36:59.260 --> 00:37:04.040
<v Matt>it was, like, going to expire November 18th, and they pushed it to November 23rd. If you were paying for

00:37:04.060 --> 00:37:07.840
<v Matt>plans, you got a thousand dollars of free credits. If you pay for the Pro plan, which I do, you get

00:37:07.940 --> 00:37:16.180
<v Matt>$250 of free credits. Yeah, free pollution, exactly, Dillon. But this has been amazing. Like,

00:37:16.280 --> 00:37:22.500
<v Matt>I've just been cranking out stuff because of this. You don't hit the API limits, which is

00:37:22.640 --> 00:37:26.760
<v Matt>really neat, because previously, I think I was talking about this in prior episodes, but

00:37:26.820 --> 00:37:32.280
<v Matt>I've been hitting Claude Code limits pretty frequently. But with this, you don't actually

00:37:32.300 --> 00:37:38.220
<v Matt>hit those limits for some reason. Yeah, I think over the past maybe week or two, I've merged

00:37:38.920 --> 00:37:44.540
<v Matt>like 20-plus PRs for, like, major feature implementations in the projects I'm working

00:37:44.560 --> 00:37:50.060
<v Matt>on, like whole new projects I've had it scaffold and build out for me. Which is, I don't know, I'm

00:37:50.060 --> 00:37:55.680
<v Matt>just enjoying it. It's been a lot of fun.

00:37:50.060 --> 00:37:55.680
<v Dillon>People-pleasing Matt. Well, he's getting his dopamine fix from the agents.

00:37:56.000 --> 00:38:01.760
<v Matt>Yeah, yeah. I mean, it's almost maybe more dopamine than scrolling through TikTok.

00:38:01.760 --> 00:38:09.760
<v Dillon>That's my fear, is that we're getting so good at using it to, like,

00:38:11.540 --> 00:38:17.380
<v Dillon>make us happy through our work. I don't know.

00:38:11.540 --> 00:38:17.380
<v Matt>I mean, isn't that the point, right? Isn't the point

00:38:17.450 --> 00:38:24.340
<v Matt>to, like, enjoy the work we do, right? Like, it's been great, honestly. The high I get

00:38:24.510 --> 00:38:30.140
<v Matt>from, like, knocking out so many different things because of AI is, like, incredible.

00:38:30.620 --> 00:38:35.260
<v Matt>Whereas, like, nine months ago, I was like, oh man, this sucks. Like, I lost all interest in coding

00:38:35.430 --> 00:38:41.060
<v Matt>because of AI. But now I've, like, flipped, and I think it's actually really cool.

00:38:41.420 --> 00:38:46.220
<v Scott>Yeah, I guess I waver between those two, too. But also, like, sometimes I'm like, I just want to solve the

00:38:46.410 --> 00:38:53.800
<v Scott>problem, man. Like, that part is the high I get. And maybe that's, I don't know. Sometimes, don't you

00:38:54.000 --> 00:39:00.120
<v Scott>miss... I don't want to, like, just be like, oh dude, yeah, we have this problem, and it's because of

00:39:00.140 --> 00:39:20.640
<v Scott>What are some options to solve it? And here comes Mr. Solution Take with three options. Like, crap, two of these are really good, and I only thought of one, and it's not that good. Could I have done this? And now I'm, like, questioning how good I am. You could do it faster, yeah. But sometimes, like, I want to also sit with that problem and solve it and feel accomplished.

00:39:21.440 --> 00:39:26.980
<v Matt>I think it's a mindset shift. Like, yes, nine months ago,

00:39:27.160 --> 00:39:31.500
<v Matt>that was my perspective of like, this has taken out all the fun of like doing the hard parts.

00:39:32.000 --> 00:39:36.880
<v Matt>But I think now I've shifted my perspective to the point where it's like,

00:39:37.660 --> 00:39:42.840
<v Matt>you know, for example, a real example: I want a better to-do list app,

00:39:43.180 --> 00:39:46.820
<v Matt>right? I don't want to pay for Todoist, but like all the other ones are like,

00:39:47.340 --> 00:39:47.960
<v Matt>They kind of suck.

00:39:48.180 --> 00:39:49.460
<v Matt>Like they don't support the features I want.

00:39:49.500 --> 00:39:50.880
<v Matt>I want like a specific set of features.

00:39:51.400 --> 00:39:55.680
<v Matt>And so now the, like the high that I'm chasing is like build me an app that does that.

00:39:56.500 --> 00:40:02.180
<v Matt>And, like, I can keep adding features without the overhead complexity of needing to think about actually writing the code.

00:40:02.280 --> 00:40:04.140
<v Matt>But like I solved that problem for myself with AI.

00:40:04.860 --> 00:40:05.540
<v Scott>You know, you're right.

00:40:06.080 --> 00:40:10.800
<v Scott>Instead of paying for an app, why don't you just pay Anthropic to build you an app?

00:40:11.420 --> 00:40:12.200
<v Scott>That's a good point.

00:40:13.680 --> 00:40:18.420
<v Matt>But I'm getting $250 worth of credit for a $20 plan.

00:40:18.700 --> 00:40:19.280
<v Matt>Like, right.

00:40:19.380 --> 00:40:21.340
<v Matt>I'm getting like $230 of free use.

00:40:22.500 --> 00:40:23.160
<v Scott>Serious question though.

00:40:23.240 --> 00:40:24.260
<v Scott>How long does that last?

00:40:25.180 --> 00:40:26.720
<v Matt>It, uh, you got to use it soon.

00:40:26.820 --> 00:40:30.800
<v Matt>I mean, by the time this episode's out, it's too late, but you guys have two

00:40:30.840 --> 00:40:31.600
<v Matt>more days to use it.

00:40:32.560 --> 00:40:33.180
<v Matt>You should be using it.

00:40:33.220 --> 00:40:33.400
<v Scott>No, no, no.

00:40:33.520 --> 00:40:35.540
<v Scott>Like how long does it take you to use that?

00:40:36.520 --> 00:40:38.080
<v Matt>Oh, I'm at, that's a good question.

00:40:38.640 --> 00:40:38.920
<v Matt>Let's see.

00:40:40.560 --> 00:40:44.020
<v Matt>I have $128 of credit left, so I'm 50% remaining.

00:40:44.880 --> 00:40:45.840
<v Scott>And you've been using it how long?

00:40:48.060 --> 00:40:49.380
<v Matt>Like a week, week and a half.

00:40:49.740 --> 00:40:52.540
<v Scott>Dude, I would ultrathink my way through that in four hours.

00:40:53.340 --> 00:40:56.720
<v Matt>No, I've been throwing ultrathink on every single prompt I throw at it.

00:40:57.880 --> 00:40:59.860
<v Matt>And it's like barely made a dent.

00:41:00.060 --> 00:41:01.980
<v Matt>It's incredible how efficient this is.

00:41:02.580 --> 00:41:04.120
<v Scott>You're using ultrathink loosely.

00:41:04.640 --> 00:41:05.260
<v Scott>That's so great.

00:41:05.410 --> 00:41:06.240
<v Scott>That must feel good.

00:41:06.460 --> 00:41:07.340
<v Scott>That's how I feel at work.

00:41:08.640 --> 00:41:09.600
<v Dillon>I don't have UltraThink.

00:41:09.950 --> 00:41:13.100
<v Dillon>I don't actually need it because I can UltraThink myself.

00:41:13.860 --> 00:41:15.500
<v Matt>Yeah, Dillon can think for himself.

00:41:19.180 --> 00:41:20.000
<v Scott>That's why he's spicy.

00:41:20.940 --> 00:41:21.020
<v Dillon>Yep.

00:41:22.300 --> 00:41:24.420
<v Matt>All right, so I feel like we transitioned this episode

00:41:24.530 --> 00:41:28.240
<v Matt>from code review to just AI coding in general.

00:41:29.380 --> 00:41:30.380
<v Scott>No, it's still code review.

00:41:31.440 --> 00:41:32.560
<v Scott>It's worth the tokens.

00:41:33.500 --> 00:41:33.740
<v Scott>Code review.

00:41:34.140 --> 00:41:35.820
<v Matt>I guess I am saving tokens there.

00:41:36.150 --> 00:41:37.120
<v Matt>On this work that I've been doing,

00:41:37.400 --> 00:41:39.300
<v Matt>I'm not doing any code review with AI.

00:41:39.740 --> 00:41:40.880
<v Matt>I've just merged immediately.

00:41:41.560 --> 00:41:41.640
<v Scott>Yeah.

00:41:41.640 --> 00:41:43.060
<v Scott>You're not even writing any tests.

00:41:43.800 --> 00:41:45.000
<v Scott>You're saving tons of tokens.

00:41:45.680 --> 00:41:45.960
<v Matt>Oh,

00:41:46.820 --> 00:41:46.940
<v Matt>no,

00:41:47.500 --> 00:41:47.560
<v Matt>no.

00:41:47.620 --> 00:41:49.960
<v Matt>My test coverage is insane in this repo.

00:41:50.440 --> 00:41:51.080
<v Matt>Like insane.

00:41:52.040 --> 00:41:53.680
<v Scott>It's like my favorite thing to use AI for,

00:41:54.420 --> 00:41:55.600
<v Matt>but you don't even know what the tests do.

00:41:56.780 --> 00:41:57.120
<v Matt>That's true.

00:41:57.160 --> 00:41:58.100
<v Matt>I don't know what the tests do,

00:41:58.380 --> 00:41:59.720
<v Matt>but I know that the coverage is great.

00:42:01.320 --> 00:42:02.420
<v Scott>That's a spicy take.

00:42:02.520 --> 00:42:03.120
<v Scott>I like that.

00:42:05.600 --> 00:42:08.880
<v Dillon>My AI fatigue meter is like at 98% right now.

00:42:09.760 --> 00:42:12.400
<v Dillon>The more we talk about this, the more I'm like, I just want to go outside.

00:42:14.640 --> 00:42:20.080
<v Matt>But what's great is you can be outside, but you can have the clanker working on code in the background.

00:42:21.200 --> 00:42:22.540
<v Matt>And you don't have to worry about it.

00:42:23.140 --> 00:42:23.800
<v Matt>It's so nice.

00:42:24.340 --> 00:42:25.560
<v Dillon>Can you actually do that?

00:42:25.640 --> 00:42:28.480
<v Dillon>Can you just say like, hey, just keep trying for the next six hours?

00:42:29.580 --> 00:42:29.780
<v Matt>Yeah.

00:42:30.680 --> 00:42:31.480
<v Dillon>That's insane.

00:42:32.280 --> 00:42:34.100
<v Matt>They have it in the mobile app now also.

00:42:34.180 --> 00:42:36.460
<v Matt>So I can like check, like I launch a task on the web.

00:42:37.220 --> 00:42:38.660
<v Matt>I can like check back on my phone,

00:42:38.840 --> 00:42:40.700
<v Matt>like while I'm laying in bed before going to sleep of like,

00:42:41.100 --> 00:42:41.680
<v Matt>where is it now?

00:42:41.820 --> 00:42:42.920
<v Matt>Is it finishing this work?

00:42:43.640 --> 00:42:46.680
<v Matt>And then I have it open a PR and then I can just review the PR like briefly

00:42:46.700 --> 00:42:47.640
<v Matt>in the morning and then merge it.

00:42:47.860 --> 00:42:48.820
<v Matt>That's literally what I did this morning.

00:42:48.940 --> 00:42:50.840
<v Matt>I had two PRs that it opened last night.

00:42:51.460 --> 00:42:55.000
<v Matt>I went through and tested locally this morning and then merged because they

00:42:55.080 --> 00:42:55.320
<v Matt>worked great.

00:42:56.680 --> 00:42:56.740
<v Dillon>What?

00:42:56.920 --> 00:42:58.160
<v Dillon>I don't even know what clanker is.

00:42:58.400 --> 00:43:00.920
<v Dillon>Maybe that's the next episode over here,

00:43:01.040 --> 00:43:01.980
<v Dillon>typing it in Google.

00:43:04.000 --> 00:43:09.520
<v Matt>This is where we find out that somehow I'm more in tune with Gen Z than the rest of the podcast.

00:43:10.880 --> 00:43:14.380
<v Scott>Aren't you the youngest by a wide margin?

00:43:14.620 --> 00:43:16.040
<v Scott>You're turning 21 next week?

00:43:21.400 --> 00:43:21.640
<v Matt>Yeah.

00:43:23.220 --> 00:43:23.960
<v Dillon>It's a slur.

00:43:24.860 --> 00:43:25.560
<v Dillon>I just Googled it.

00:43:25.640 --> 00:43:26.180
<v Dillon>It's a slur.

00:43:28.020 --> 00:43:29.640
<v Dillon>Matt, we're going to have to bleep out Clanker.

00:43:29.860 --> 00:43:30.640
<v Dillon>It says it's a slur.

00:43:31.740 --> 00:43:32.580
<v Matt>It is a, yes.

00:43:33.020 --> 00:43:41.020
<v Matt>For those that don't know, clanker is a made-up, speculative slur for robots, from an imagined future where robots are treated as subhuman.

00:43:41.820 --> 00:43:42.600
<v Scott>Why are you?

00:43:42.820 --> 00:43:45.220
<v Scott>That's not the future I feel like it's going to be.

00:43:45.270 --> 00:43:46.940
<v Scott>I think we're looking at Terminator.

00:43:47.620 --> 00:43:48.520
<v Matt>No, you got to set up.

00:43:48.680 --> 00:43:49.280
<v Matt>You got to set them.

00:43:49.610 --> 00:43:50.720
<v Matt>You got to put them into place early.

00:43:51.050 --> 00:43:53.760
<v Matt>So you call them clankers early and then they don't.

00:43:53.940 --> 00:43:55.080
<v Scott>I'm not insulting the AI.

00:43:55.230 --> 00:43:56.560
<v Scott>I treat AI as equals.

00:44:00.040 --> 00:44:06.560
<v Matt>I don't usually use Clanker in most forms, but I think what I've fallen back to is I just call it a robot.

00:44:07.180 --> 00:44:08.920
<v Matt>I just say, oh, the robot's working on that for me.

00:44:09.590 --> 00:44:11.420
<v Matt>And it's just like Claude in the background working on code.

00:44:12.120 --> 00:44:15.679
<v Scott>Does that sound like demoralizing?

00:44:16.700 --> 00:44:17.640
<v Matt>Not as much as Clanker.

00:44:19.080 --> 00:44:22.580
<v Scott>Does AI sound, like, more efficient?

00:44:22.600 --> 00:44:26.700
<v Matt>It's harder to say, oh, I have the AI doing that, but it's nicer to say that I have the robot doing that.

00:44:26.940 --> 00:44:27.900
<v Scott>I like robot.

00:44:28.290 --> 00:44:29.140
<v Scott>I think that's better. Robot sounds cool.

00:44:29.740 --> 00:44:35.480
<v Matt>All right, let's, uh, let's jump into standup. Scott?

00:44:35.920 --> 00:44:45.280
<v Scott>Robot uprising of 2030, it's coming. What's up with you, man? I, uh... work-wise, I finally launched the node list.

00:44:45.620 --> 00:44:50.040
<v Scott>Like, I'm doing it right now. Um, so I created, like, a search palette, a command palette.

00:44:50.530 --> 00:44:56.899
<v Scott>Uh, and it lists nodes in your workflow. Super cool. I showed it to the relevant teams.

00:44:56.940 --> 00:44:58.440
<v Scott>They're super jazzed about it.

00:44:58.440 --> 00:44:59.860
<v Scott>I get to write a Slack message today.

00:45:00.620 --> 00:45:02.260
<v Scott>Probably have AI do it for me.

00:45:03.180 --> 00:45:04.500
<v Scott>But hey, what are you going to do?

00:45:05.200 --> 00:45:07.100
<v Scott>Anyway, no, I'll probably write it.

00:45:07.100 --> 00:45:08.280
<v Scott>I like emojis a lot.

00:45:08.460 --> 00:45:09.480
<v Scott>AI likes emojis too.

00:45:09.920 --> 00:45:14.780
<v Scott>Anyway, I'm also finishing up the visual diff tool.

00:45:16.500 --> 00:45:19.480
<v Scott>Secondary task was we have a review flow for the workflow.

00:45:20.140 --> 00:45:22.100
<v Scott>So humans actually review the code.

00:45:22.100 --> 00:45:25.620
<v Scott>And right now, they're reviewing JSON.

00:45:25.700 --> 00:45:30.920
<v Scott>But hey, this is a UI editor and basically no one knows the code here.

00:45:31.420 --> 00:45:34.740
<v Scott>So I built a visual UI review tool.

00:45:34.810 --> 00:45:44.700
<v Scott>So you can now use a select dropdown of versions and compare the version you're trying to deploy against other versions and the current live version.

00:45:45.110 --> 00:45:50.460
<v Scott>So you can see visually in a UI all of the differences to make it easier for folks to deploy.
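Scott's visual review tool boils down to diffing two versions of a workflow document. As a minimal sketch of that core comparison (the flat key/value shape is an assumed stand-in for the real workflow JSON, not Scott's actual schema):

```typescript
// Report which top-level keys were added, removed, or changed between the
// current live version and a candidate version awaiting deploy.
type WorkflowDoc = Record<string, unknown>;

function diffVersions(live: WorkflowDoc, candidate: WorkflowDoc) {
  const added = Object.keys(candidate).filter((k) => !(k in live));
  const removed = Object.keys(live).filter((k) => !(k in candidate));
  // A key counts as changed when it exists in both versions but its
  // serialized value differs.
  const changed = Object.keys(candidate).filter(
    (k) => k in live && JSON.stringify(live[k]) !== JSON.stringify(candidate[k]),
  );
  return { added, removed, changed };
}
```

A review UI then only has to render those three buckets, instead of asking humans to eyeball raw JSON.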

00:45:50.850 --> 00:45:52.440
<v Scott>I'm almost finally done with this.

00:45:53.060 --> 00:46:02.340
<v Matt>I find it comical that you're like, we're still working on a tool that enables human review when we were just spending the past hour talking about AI code review.

00:46:06.220 --> 00:46:08.120
<v Scott>Yeah, I guess it's like a UI tool.

00:46:08.210 --> 00:46:10.500
<v Scott>I think like we could put AI into it.

00:46:11.180 --> 00:46:14.180
<v Scott>My team's migrating off of this tool, unfortunately.

00:46:15.260 --> 00:46:19.780
<v Scott>But anyway, like the new set of tools is going to have no UIs.

00:46:20.560 --> 00:46:21.580
<v Scott>That's what leadership wants.

00:46:22.160 --> 00:46:27.700
<v Scott>So we were originally going to put AI into this tool to do all of this work for us.

00:46:28.340 --> 00:46:29.560
<v Scott>But that was a massive undertaking.

00:46:31.620 --> 00:46:37.880
<v Scott>We would have to make it understand the entire context of all the actions.

00:46:38.470 --> 00:46:40.680
<v Scott>And we started working on that as a project.

00:46:41.220 --> 00:46:46.060
<v Scott>One of my team members estimated it was going to take like a year, and we were going to do it.

00:46:46.200 --> 00:46:47.860
<v Scott>But we're transitioning off of this.

00:46:48.210 --> 00:46:51.360
<v Scott>So this could have been a precursor to UI.

00:46:52.640 --> 00:46:56.840
<v Scott>Sorry, to AI understanding the differences as well. But that's a good point.

00:46:58.820 --> 00:47:04.280
<v Dillon>My AI fatigue meter is now full. Fuck AI. You can bleep that.

00:47:07.380 --> 00:47:12.020
<v Dillon>Um, I don't know. That's all. Just kidding.

00:47:13.940 --> 00:47:19.319
<v Matt>The AI has pulled everything that's of value and interest and excitement from Dillon.

00:47:19.560 --> 00:47:20.980
<v Dillon>my passion for everything is gone now.

00:47:24.560 --> 00:47:25.140
<v Dillon>I don't know.

00:47:25.360 --> 00:47:27.560
<v Dillon>Matt pinged me the other day about a side project.

00:47:28.680 --> 00:47:30.900
<v Dillon>And then I was like, oh, man,

00:47:31.500 --> 00:47:33.420
<v Dillon>I had that idea before LLMs existed.

00:47:35.080 --> 00:47:36.800
<v Dillon>Let me just see what Claude can do with it.

00:47:38.080 --> 00:47:39.400
<v Dillon>It actually did pretty well.

00:47:40.460 --> 00:47:44.560
<v Dillon>It's like basically a marketplace for...

00:47:44.720 --> 00:47:45.580
<v Matt>Well, don't leak the idea.

00:47:46.100 --> 00:47:48.240
<v Matt>This could be the next billion-dollar idea that you have.

00:47:48.460 --> 00:47:49.620
<v Dillon>It's just some sort of marketplace.

00:47:54.960 --> 00:48:00.060
<v Dillon>But yeah, it's been kind of fun to see what that looks like in a prototype state

00:48:00.600 --> 00:48:04.520
<v Dillon>and maybe consider making it something that I roll out at some point

00:48:05.420 --> 00:48:07.840
<v Dillon>or I just sell it to Matt for like 50 bucks

00:48:08.040 --> 00:48:09.780
<v Dillon>and he can take it and do whatever he wants with it.

00:48:10.200 --> 00:48:12.100
<v Matt>I don't know if I value it at $50 yet.

00:48:14.980 --> 00:48:18.380
<v Dillon>I mean, I've at least used $200 worth of tokens to build it.

00:48:18.660 --> 00:48:20.400
<v Dillon>So I hope it's worth something.

00:48:22.680 --> 00:48:24.240
<v Dillon>No, but other than that at work,

00:48:24.400 --> 00:48:28.680
<v Dillon>I've been messing around with some of the infra side of our Next app.

00:48:29.840 --> 00:48:32.740
<v Dillon>Realized we weren't using Cloudflare Workers traces.

00:48:35.560 --> 00:48:37.000
<v Dillon>It's kind of crazy now,

00:48:37.820 --> 00:48:39.360
<v Dillon>just knowing that we didn't have the observability

00:48:39.580 --> 00:48:41.659
<v Dillon>around the requests we were making in the Next server

00:48:42.480 --> 00:48:44.800
<v Dillon>and being able to see that waterfall of requests

00:48:44.980 --> 00:48:47.200
<v Dillon>and where we're making mistakes

00:48:47.710 --> 00:48:49.160
<v Dillon>in terms of too many requests.

00:48:50.050 --> 00:48:51.680
<v Dillon>I learned through that work that,

00:48:52.070 --> 00:48:54.620
<v Dillon>oh, you can only have six simultaneous connections

00:48:54.750 --> 00:48:55.600
<v Dillon>in a Cloudflare Worker.

00:48:56.110 --> 00:48:59.240
<v Dillon>And we had 14 experiment and feature flag requests

00:48:59.920 --> 00:49:01.140
<v Dillon>that were happening,

00:49:01.230 --> 00:49:03.780
<v Dillon>and they're basically queuing up and blocking each other

00:49:03.790 --> 00:49:04.640
<v Dillon>and making our app slow.

00:49:05.610 --> 00:49:07.280
<v Dillon>So that's been a pretty neat thing to learn.
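The six-connection figure Dillon hit is Cloudflare's documented cap on simultaneous open connections per Worker invocation; additional fetches queue behind the first six. One common workaround is to run the lookups through a fixed-size pool so no more than six are ever in flight. A hedged sketch (the helper name is ours, not Whoop's code):

```typescript
// Run async tasks with at most `limit` in flight at once, mirroring the
// six-simultaneous-connection ceiling of a single Cloudflare Worker invocation.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each lane repeatedly claims the next unclaimed index until the list drains.
  const lane = async (): Promise<void> => {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  };
  const lanes = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: lanes }, () => lane()));
  return results;
}

// Usage idea: 14 flag lookups squeezed through 6 lanes instead of all at once.
// const flags = await mapWithConcurrency(flagKeys, 6, (key) => fetchFlag(key));
```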

00:49:10.560 --> 00:49:15.340
<v Dillon>But yeah, it's either we need to use GraphQL as Matt just messaged in the chat.

00:49:16.120 --> 00:49:24.300
<v Dillon>Or I was thinking, like, why don't these experiment and feature-flag services support a batch endpoint so I can just get all of them in one go?

00:49:25.360 --> 00:49:27.520
<v Dillon>And I looked at our like internal documentation.

00:49:27.720 --> 00:49:28.420
<v Dillon>They just don't have it.

00:49:28.420 --> 00:49:29.260
<v Dillon>I was like, this is crazy.

00:49:30.040 --> 00:49:32.180
<v Dillon>How is it 2025 when we don't have a way to do this?
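The shape Dillon is asking for is simple: one POST carrying all the flag keys, one response carrying all the values. This sketch is purely hypothetical; the `/flags/batch` path and request/response shapes are invented for illustration, not any vendor's actual API:

```typescript
// Hypothetical batch flag fetch: one request for every key instead of 14.
type FlagValues = Record<string, boolean>;

async function fetchFlagsBatch(
  baseUrl: string,
  keys: string[],
  // Injectable fetch makes the helper testable outside a Worker.
  fetchImpl: typeof fetch = fetch,
): Promise<FlagValues> {
  const res = await fetchImpl(`${baseUrl}/flags/batch`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ keys }),
  });
  if (!res.ok) throw new Error(`batch flag request failed: ${res.status}`);
  // Assumed response shape: { [flagKey]: value }.
  return (await res.json()) as FlagValues;
}
```

With an endpoint like this, the 14 queued connections collapse into one, and the Worker's six-connection limit stops mattering for flags at all.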

00:49:32.620 --> 00:49:40.440
<v Matt>Time to spend the next like three months writing an RFC and then getting an ADR on adopting GraphQL just to solve the use case of a batch endpoint.

00:49:42.040 --> 00:49:45.400
<v Dillon>I was just going to put up a PR on the service that implements it.

00:49:46.040 --> 00:49:47.180
<v Matt>No, that's too easy.

00:49:47.230 --> 00:49:51.960
<v Matt>You got to write an RFC, get the buy-in from leadership, and then you can write the PR.

00:49:54.340 --> 00:49:55.960
<v Dillon>Don't even get me started on that.

00:49:57.820 --> 00:49:57.980
<v Matt>All right.

00:49:57.980 --> 00:49:58.860
<v Matt>I got to get my update quick.

00:49:59.600 --> 00:50:01.340
<v Matt>Five minutes until Olivia Dean tickets drop.

00:50:02.859 --> 00:50:07.580
<v Matt>a couple things. One, wired headphones are pretty decent. I highly recommend them if you're using

00:50:07.670 --> 00:50:12.400
<v Matt>like a speech-to-text thing. So I've been using Handy, which is pretty dope. It's like a little

00:50:12.540 --> 00:50:16.080
<v Matt>keyboard shortcut. You just speak to your computer and then it transcribes it for you.

00:50:17.520 --> 00:50:22.840
<v Matt>Before that, I was trying Hex, but it was like a little bit funky. Anyway, it's free. I highly

00:50:23.000 --> 00:50:29.279
<v Matt>recommend. It's dope. Outside of that, I think a couple episodes ago, I talked about Parcel's

00:50:29.780 --> 00:50:35.780
<v Matt>RSC setup. I think I've now completely flipped and I think it's actually trash. There's my hot

00:50:35.960 --> 00:50:43.560
<v Matt>take. But now I'm just using Vite's built-in RSC plugin, which is pretty dope. It's basically

00:50:43.680 --> 00:50:49.360
<v Matt>what powers Waku today, but Waku has other limitations on other Claudeflare features that

00:50:49.360 --> 00:50:58.679
<v Matt>you can't use. You can't easily use durable objects with Waku. So I've been just using the

00:50:59.320 --> 00:51:01.000
<v Matt>separate Vite plugin and it's pretty dope.

00:51:02.720 --> 00:51:03.880
<v Matt>Yeah, and I already talked about

00:51:04.680 --> 00:51:06.080
<v Matt>sort of enjoying code outside of work

00:51:06.300 --> 00:51:07.740
<v Matt>because of Claude's free credits.

00:51:08.520 --> 00:51:11.120
<v Matt>So, you know, talk to me in three days

00:51:11.220 --> 00:51:14.160
<v Matt>when that expires and we'll see how I am at that point.

00:51:15.940 --> 00:51:16.480
<v Matt>I think that's it.

00:51:17.040 --> 00:51:17.260
<v Dillon>Cool.

00:51:18.600 --> 00:51:19.100
<v Matt>Let's end it there.

00:51:19.520 --> 00:51:20.300
<v Dillon>Are we done here?

00:51:20.860 --> 00:51:21.120
<v Dillon>I'm just kidding.

00:51:23.560 --> 00:51:24.500
<v Matt>Thanks, listener, for listening.

00:51:25.480 --> 00:51:28.300
<v Matt>Share this episode with folks that you think might enjoy it.

00:51:28.480 --> 00:51:31.560
<v Matt>We have some things cooking in the oven on next episodes.

00:51:31.740 --> 00:51:32.820
<v Matt>Hopefully it'll be pretty exciting.

00:51:33.300 --> 00:51:33.700
<v Matt>Leave us a review.

00:51:34.060 --> 00:51:38.220
<v Matt>We appreciate six stars out of five or 11 out of 10 if you have the option for 10.

00:51:38.800 --> 00:51:39.800
<v Matt>Otherwise, thanks for listening.

00:51:39.980 --> 00:51:40.540
<v Matt>Catch us next time.

00:51:41.280 --> 00:51:41.480
<v Matt>See you.

00:51:42.000 --> 00:51:42.220
<v Scott>Take care.

00:51:42.860 --> 00:51:44.160
<v Dillon>Do not code outside of work.

00:51:44.420 --> 00:51:45.240
<v Dillon>That's a terrible idea.

