• Open

    Vulnerability Research Is Cooked
    Vulnerability Research Is Cooked. Within the next few months, coding agents will drastically alter both the practice and the economics of exploit development. Frontier model improvement won’t be a slow burn, but rather a step function. Substantial amounts of high-impact vulnerability research (maybe even most of it) will happen simply by pointing an agent at a source tree and typing “find me zero days”. Why are agents so good at this? A combination of baked-in knowledge, pattern-matching ability and brute force: You can't design a better problem for an LLM agent than exploitation research. Before you feed it a single token of context, a frontier LLM already encodes supernatural amounts of correlation across vast bodies of source code. Is the Linux KVM hypervisor connected to the hrtimer subsystem, workqueue, or perf_event? The model knows. Also baked into those model weights: the complete library of documented "bug classes" on which all exploit development builds: stale pointers, integer mishandling, type confusion, allocator grooming, and all the known ways of promoting a wild write to a controlled 64-bit read/write in Firefox. Vulnerabilities are found by pattern-matching bug classes and constraint-solving for reachability and exploitability. Precisely the implicit search problems that LLMs are most gifted at solving. Exploit outcomes are straightforwardly testable success/failure trials. An agent never gets bored and will search forever if you tell it to. The article was partly inspired by this episode of the Security Cryptography Whatever podcast, where David Adrian, Deirdre Connolly, and Thomas interviewed Anthropic's Nicholas Carlini for 1 hour 16 minutes. I just started a new tag here for ai-security-research - it's up to 11 posts already. Tags: security, thomas-ptacek, careers, ai, generative-ai, llms, nicholas-carlini, ai-ethics, ai-security-research  ( 3 min )
    The cognitive impact of coding agents
    A fun thing about recording a podcast with a professional like Lenny Rachitsky is that his team know how to slice the resulting video up into TikTok-sized short form vertical videos. Here's one he shared on Twitter today which ended up attracting over 1.1m views! That was 48 seconds. Our full conversation lasted 1 hour 40 minutes. Tags: ai-ethics, coding-agents, agentic-engineering, generative-ai, podcast-appearances, ai, llms, cognitive-debt  ( 3 min )
    Quoting Willy Tarreau
    On the kernel security list we've seen a huge bump of reports. We were between 2 and 3 per week maybe two years ago, then reached probably 10 a week over the last year with the only difference being only AI slop, and now since the beginning of the year we're around 5-10 per day depending on the days (fridays and tuesdays seem the worst). Now most of these reports are correct, to the point that we had to bring in more maintainers to help us. And we're now seeing on a daily basis something that never happened before: duplicate reports, or the same bug found by two different people using (possibly slightly) different tools. — Willy Tarreau, Lead Software Developer. HAPROXY Tags: security, linux, generative-ai, ai, llms, ai-security-research  ( 3 min )
    Quoting Daniel Stenberg
    The challenge with AI in open source security has transitioned from an AI slop tsunami into more of a ... plain security report tsunami. Less slop but lots of reports. Many of them really good. I'm spending hours per day on this now. It's intense. — Daniel Stenberg, lead developer of cURL Tags: daniel-stenberg, security, curl, generative-ai, ai, llms, ai-security-research  ( 2 min )
    Quoting Greg Kroah-Hartman
    Months ago, we were getting what we called 'AI slop,' AI-generated security reports that were obviously wrong or low quality. It was kind of funny. It didn't really worry us. Something happened a month ago, and the world switched. Now we have real reports. All open source projects have real reports that are made with AI, but they're good, and they're real. — Greg Kroah-Hartman, Linux kernel maintainer (bio), in conversation with Steven J. Vaughan-Nichols Tags: security, linux, generative-ai, ai, llms, ai-security-research  ( 3 min )
    Can JavaScript Escape a CSP Meta Tag Inside an Iframe?
    Research: Can JavaScript Escape a CSP Meta Tag Inside an Iframe? In trying to build my own version of Claude Artifacts I got curious about options for applying CSP headers to content in sandboxed iframes without using a separate domain to host the files. Turns out you can inject <meta http-equiv="Content-Security-Policy"...> tags at the top of the iframe content and they'll be obeyed even if subsequent untrusted JavaScript tries to manipulate them. Tags: iframes, security, javascript, content-security-policy, sandboxing  ( 3 min )
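The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from the research: the function name `sandboxed_iframe`, the default policy string and the `sandbox="allow-scripts"` attribute are all assumptions chosen for the example.

```python
import html


def sandboxed_iframe(untrusted_html, csp="default-src 'none'; script-src 'unsafe-inline'"):
    """Wrap untrusted HTML in a sandboxed iframe with a CSP meta tag injected first.

    Hypothetical helper: the policy string and sandbox flags are illustrative
    assumptions, not values taken from the original research.
    """
    # The meta tag is placed before any untrusted markup so the browser
    # parses the policy before it reaches the untrusted content.
    meta = '<meta http-equiv="Content-Security-Policy" content="{}">'.format(
        html.escape(csp, quote=True)
    )
    document = meta + untrusted_html
    # srcdoc carries the whole document inline, so no separate domain is
    # needed; the document must be attribute-escaped to fit inside srcdoc.
    return '<iframe sandbox="allow-scripts" srcdoc="{}"></iframe>'.format(
        html.escape(document, quote=True)
    )


print(sandboxed_iframe("<script>document.title = 'hi'</script>"))
```

Whether a given policy is strict enough depends on what the untrusted content should be allowed to do, so the default here is only a starting point.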
    The Axios supply chain attack used individually targeted social engineering
    The Axios team have published a full postmortem on the supply chain attack which resulted in a malware dependency going out in a release the other day, and it involved a sophisticated social engineering campaign targeting one of their maintainers directly. Here's Jason Saayman's description of how that worked: so the attack vector mimics what google has documented here: https://cloud.google.com/blog/topics/threat-intelligence/unc1069-targets-cryptocurrency-ai-social-engineering they tailored this process specifically to me by doing the following: they reached out masquerading as the founder of a company they had cloned the companys founders likeness as well as the company itself. they then invited me to a real slack workspace. this workspace was branded to the companies ci and named in a plausible manner. the slack was thought out very well, they had channels where they were sharing linked-in posts, the linked in posts i presume just went to the real companys account but it was super convincing etc. they even had what i presume were fake profiles of the team of the company but also number of other oss maintainers. they scheduled a meeting with me to connect. the meeting was on ms teams. the meeting had what seemed to be a group of people that were involved. the meeting said something on my system was out of date. i installed the missing item as i presumed it was something to do with teams, and this was the RAT. everything was extremely well co-ordinated looked legit and was done in a professional manner. A RAT is a Remote Access Trojan - this was the software which stole the developer's credentials which could then be used to publish the malicious package. That's a very effective scam. I join a lot of meetings where I find myself needing to install Webex or Microsoft Teams or similar at the last moment and the time constraint means I always click "yes" to things as quickly as possible to make sure I don't join late. 
Every maintainer of open source software used by enough people to be worth taking in this way needs to be familiar with this attack strategy. Tags: open-source, packaging, security, social-engineering, supply-chain  ( 4 min )
  • Open

    AI News: Anthropic Leak is Bigger Than You Think
    No content preview

  • Open

    Highlights from my conversation about agentic engineering on Lenny's Podcast
    I was a guest on Lenny Rachitsky's podcast, in a new episode titled "An AI state of the union: We've passed the inflection point, dark factories are coming, and automation timelines". It's available on YouTube, Spotify, and Apple Podcasts. Here are my highlights from our conversation, with relevant links. Contents: The November inflection point; Software engineers as bellwethers for other information workers; Writing code on my phone; Responsible vibe coding; Dark Factories and StrongDM; The bottleneck has moved to testing; This stuff is exhausting; Interruptions cost a lot less now; My ability to estimate software is broken; It's tough for people in the middle; It's harder to evaluate software; The misconception that AI tools are easy; Coding agents are useful for security research now; OpenClaw; Journalists are good at dealing with unreliable sources; The pelican benchmark; And finally, some good news about parrots; YouTube chapters.
    The November inflection point: 4:19 - The end result of these two labs throwing everything they had at making their models better at code is that in November we had what I call the inflection point where GPT 5.1 and Claude Opus 4.5 came along. They were both incrementally better than the previous models, but in a way that crossed a threshold where previously the code would mostly work, but you had to pay very close attention to it. And suddenly we went from that to... almost all of the time it does what you told it to do, which makes all of the difference in the world. Now you can spin up a coding agent and say, build me a Mac application that does this thing, and you'll get something back which won't just be a buggy pile of rubbish that doesn't do anything.
    Software engineers as bellwethers for other information workers: 5:49 - I can churn out 10,000 lines of code in a day. And most of it works. Is that good? Like, how do we get from most of it works to all of it works? 
    There are so many new questions that we're facing, which I think makes us a bellwether for other information workers. Code is easier than almost every other problem that you pose these agents because code is obviously right or wrong - either it works or it doesn't work. There might be a few subtle hidden bugs, but generally you can tell if the thing actually works. If it writes you an essay, if it prepares a lawsuit for you, it's so much harder to derive if it's actually done a good job, and to figure out if it got things right or wrong. But it's happening to us as software engineers. It came for us first. And we're figuring out, OK, what do our careers look like? How do we work as teams when part of what we did that used to take most of the time doesn't take most of the time anymore? What does that look like? And it's going to be very interesting seeing how this rolls out to other information work in the future. Lawyers are falling for this really badly. The AI hallucination cases database is up to 1,228 cases now! Plus this bit from the cold open at the start: It used to be you'd ask ChatGPT for some code, and it would spit out some code, and you'd have to run it and test it. The coding agents take that step for you now. And an open question for me is how many other knowledge work fields are actually prone to these agent loops?
    Writing code on my phone: 8:19 - I write so much of my code on my phone. It's wild. I can get good work done walking the dog along the beach, which is delightful. I mainly use the Claude iPhone app for this, either with a regular Claude chat session (which can execute code now) or using it to control Claude Code for web.
    Responsible vibe coding: 9:55 - If you're vibe coding something for yourself, where the only person who gets hurt if it has bugs is you, go wild. That's completely fine. The moment you ship your vibe coding code for other people to use, where your bugs might actually harm somebody else, that's when you need to take a step back. 
    See also When is it OK to vibe code?
    Dark Factories and StrongDM: 12:49 - The reason it's called the dark factory is there's this idea in factory automation that if your factory is so automated that you don't need any people there, you can turn the lights off. Like the machines can operate in complete darkness if you don't need people on the factory floor. What does that look like for software? [...] So there's this policy that nobody writes any code: you cannot type code into a computer. And honestly, six months ago, I thought that was crazy. And today, probably 95% of the code that I produce, I didn't type myself. That world is practical already because the latest models are good enough that you can tell them to rename that variable and refactor and add this line there... and they'll just do it - it's faster than you typing on the keyboard yourself. The next rule though, is nobody reads the code. And this is the thing which StrongDM started doing last year. I wrote a lot more about StrongDM's dark factory explorations back in February.
    The bottleneck has moved to testing: 21:27 - It used to be, you'd come up with a spec and you hand it to your engineering team. And three weeks later, if you're lucky, they'd come back with an implementation. And now that maybe takes three hours, depending on how well the coding agents are established for that kind of thing. So now what, right? Now, where else are the bottlenecks? Anyone who's done any product work knows that your initial ideas are always wrong. What matters is proving them, and testing them. We can test things so much faster now because we can build workable prototypes so much quicker. So there's an interesting thing I've been doing in my own work where any feature that I want to design, I'll often prototype three different ways it could work because that takes very little time. I've always loved prototyping things, and prototyping is even more valuable now. 22:40 - A UI prototype is free now. 
    ChatGPT and Claude will just build you a very convincing UI for anything that you describe. And that's how you should be working. I think anyone who's doing product design and isn't vibe coding little prototypes is missing out on the most powerful boost that we get in that step. But then what do you do? Given your three options that you have instead of one option, how do you prove to yourself which one of those is the best? I don't have a confident answer to that. I expect this is where the good old fashioned usability testing comes in. More on prototyping later on: 46:35 - Throughout my entire career, my superpower has been prototyping. I've been very quick at knocking out working prototypes of things. I'm the person who can show up at a meeting and say, look, here's how it could work. And that was kind of my unique selling point. And that's gone. Anyone can do what I could do.
    This stuff is exhausting: 26:25 - I'm finding that using coding agents well is taking every inch of my 25 years of experience as a software engineer, and it is mentally exhausting. I can fire up four agents in parallel and have them work on four different problems. And by like 11 AM, I am wiped out for the day. [...] There's a personal skill we have to learn in finding our new limits - what's a responsible way for us not to burn out. I've talked to a lot of people who are losing sleep because they're like, my coding agents could be doing work for me. I'm just going to stay up an extra half hour and set off a bunch of extra things... and then waking up at four in the morning. That's obviously unsustainable. [...] There's an element of sort of gambling and addiction to how we're using some of these tools.
    Interruptions cost a lot less now: 45:16 - People talk about how important it is not to interrupt your coders. Your coders need to have solid two to four hour blocks of uninterrupted work so they can spin up their mental model and churn out the code. That's changed completely. 
    My programming work, I need two minutes every now and then to prompt my agent about what to do next. And then I can do the other stuff and I can go back. I'm much more interruptible than I used to be.
    My ability to estimate software is broken: 28:19 - I've got 25 years of experience in how long it takes to build something. And that's all completely gone - it doesn't work anymore because I can look at a problem and say that this is going to take two weeks, so it's not worth it. And now it's like... maybe it's going to take 20 minutes because the reason it would have taken two weeks was all of the sort of crufty coding things that the AI is now covering for us. I constantly throw tasks at AI that I don't think it'll be able to do because every now and then it does it. And when it doesn't do it, you learn, right? But when it does do something, especially something that the previous models couldn't do, that's actually cutting edge AI research. And a related anecdote: 36:56 - A lot of my friends have been talking about how they have this backlog of side projects, right? For the last 10, 15 years, they've got projects they never quite finished. And some of them are like, well, I've done them all now. Last couple of months, I just went through and every evening I'm like, let's take that project and finish it. And they almost feel a sort of sense of loss at the end where they're like, well, okay, my backlog's gone. Now what am I going to build?
    It's tough for people in the middle: 29:29 - So ThoughtWorks, the big IT consultancy, did an offsite about a month ago, and they got a whole bunch of engineering VPs in from different companies to talk about this stuff. And one of the interesting theories they came up with is they think this stuff is really good for experienced engineers, like it amplifies their skills. It's really good for new engineers because it solves so many of those onboarding problems. The problem is the people in the middle. 
    If you're mid-career, if you haven't made it to sort of super senior engineer yet, but you're not sort of new either, that's the group which is probably in the most trouble right now. I mentioned Cloudflare hiring 1,000 interns, and Shopify too. Lenny asked for my advice for people stuck in that middle: 31:21 - That's a big responsibility you're putting on me there! I think the way forward is to lean into this stuff and figure out how do I help this make me better? A lot of people worry about skill atrophy: if the AI is doing it for you, you're not learning anything. I think if you're worried about that, you push back at it. You have to be mindful about how you're applying the technology and think, okay, I've been given this thing that can answer any question and often gets it right. How can I use this to amplify my own skills, to learn new things, to take on much more ambitious projects? [...] 33:05 - Everything is changing so fast right now. The only universal skill is being able to roll with the changes. That's the thing that we all need. The term that comes up most in these conversations about how you can be great with AI is agency. I think agents have no agency at all. I would argue that the one thing AI can never have is agency because it doesn't have human motivations. So I'd say that's the thing is to invest in your own agency and invest in how to use this technology to get better at what you do and to do new things.
    It's harder to evaluate software: The fact that it's so easy to create software with detailed documentation and robust tests means it's harder to figure out what's a credible project. 37:47 - Sometimes I'll have an idea for a piece of software, Python library or whatever, and I can knock it out in like an hour and get to a point where it's got documentation and tests and all of those things, and it looks like the kind of software that previously I'd have spent several weeks on - and I can stick it up on GitHub. And yet... I don't believe in it. 
    And the reason I don't believe in it is that I got to rush through all of those things... I think the quality is probably good, but I haven't spent enough time with it to feel confident in that quality. Most importantly, I haven't used it yet. It turns out when I'm using somebody else's software, the thing I care most about is I want them to have used it for months. I've got some very cool software that I built that I've never used. It was quicker to build it than to actually try and use it!
    The misconception that AI tools are easy: 41:31 - Everyone's like, oh, it must be easy. It's just a chat bot. It's not easy. That's one of the great misconceptions in AI is that using these tools effectively is easy. It takes a lot of practice and it takes a lot of trying things that didn't work and trying things that did work.
    Coding agents are useful for security research now: 19:04 - In the past sort of three to six months, they've started being credible as security researchers, which is sending shockwaves through the security research industry. See Thomas Ptacek: Vulnerability Research Is Cooked. At the same time, open source projects are being bombarded with junk security reports: 20:05 - There are these people who don't know what they're doing, who are asking ChatGPT to find a security hole and then reporting it to the maintainer. And the report looks good. ChatGPT can produce a very well formatted report of a vulnerability. It's a total waste of time. It's not actually verified as being a real problem. A good example of the right way to do this is Anthropic's collaboration with Firefox, where Anthropic's security team verified every security problem before passing them to Mozilla.
    OpenClaw: Of course we had to talk about OpenClaw! Lenny had his running on a Mac Mini. 1:29:23 - OpenClaw demonstrates that people want a personal digital assistant so much that they are willing to not just overlook the security side of things, but also getting the thing running is not easy. 
    You've got to create API keys and tokens and install stuff. It's not trivial to get set up and hundreds of thousands of people got it set up. [...] The first line of code for OpenClaw was written on November the 25th. And then in the Super Bowl, there was an ad for AI.com, which was effectively a vaporware white labeled OpenClaw hosting provider. So we went from first line of code in November to Super Bowl ad in what? Three and a half months. I continue to love Drew Breunig's description of OpenClaw as a digital pet: A friend of mine said that OpenClaw is basically a Tamagotchi. It's a digital pet and you buy the Mac Mini as an aquarium.
    Journalists are good at dealing with unreliable sources: In talking about my explorations of AI for data journalism through Datasette: 1:34:58 - You would have thought that AI is a very bad fit for journalism where the whole idea is to find the truth. But the flip side is journalists deal with untrustworthy sources all the time. The art of journalism is you talk to a bunch of people and some of them lie to you and you figure out what's true. So as long as the journalist treats the AI as yet another unreliable source, they're actually better equipped to work with AI than most other professions are.
    The pelican benchmark: Obviously we talked about pelicans riding bicycles: 56:10 - There appears to be a very strong correlation between how good their drawing of a pelican riding a bicycle is and how good they are at everything else. And nobody can explain to me why that is. [...] People kept on asking me, what if labs cheat on the benchmark? And my answer has always been, really, all I want from life is a really good picture of a pelican riding a bicycle. And if I can trick every AI lab in the world into cheating on benchmarks to get it, then that just achieves my goal. 59:56 - I think something people often miss is that this space is inherently funny. 
    The fact that we have these incredibly expensive, power hungry, supposedly the most advanced computers of all time. And if you ask them to draw a pelican on a bicycle, it looks like a five-year-old drew it. That's really funny to me.
    And finally, some good news about parrots: Lenny asked if I had anything else I wanted to leave listeners with to wrap up the show, so I went with the best piece of news in the world right now. 1:38:10 - There is a rare parrot in New Zealand called the Kākāpō. There are only 250 of these parrots left in the world. They are flightless nocturnal parrots - beautiful green dumpy looking things. And the good news is they're having a fantastic breeding season in 2026. They only breed when the Rimu trees in New Zealand have a mass fruiting season, and the Rimu trees haven't done that since 2022 - so there has not been a single baby kākāpō born in four years. This year, the Rimu trees are in fruit. The kākāpō are breeding. There have been dozens of new chicks born. It's a really, really good time. It's great news for rare New Zealand parrots and you should look them up because they're delightful. Everyone should watch the live stream of Rakiura on her nest with two chicks!
    YouTube chapters: Here's the full list of chapters Lenny's team defined for the YouTube video: 00:00: Introduction to Simon Willison; 02:40: The November 2025 inflection point; 08:01: What's possible now with AI coding; 10:42: Vibe coding vs. agentic engineering; 13:57: The dark-factory pattern; 20:41: Where bottlenecks have shifted; 23:36: Where human brains will continue to be valuable; 25:32: Defending of software engineers; 29:12: Why experienced engineers get better results; 30:48: Advice for avoiding the permanent underclass; 33:52: Leaning into AI to amplify your skills; 35:12: Why Simon says he's working harder than ever; 37:23: The market for pre-2022 human-written code; 40:01: Prediction: 50% of engineers writing 95% AI code by the end of 2026; 44:34: The impact of cheap code; 48:27: Simon's AI stack; 54:08: Using AI for research; 55:12: The pelican-riding-a-bicycle benchmark; 59:01: The inherent ridiculousness of AI; 1:00:52: Hoarding things you know how to do; 1:08:21: Red/green TDD pattern for better AI code; 1:14:43: Starting projects with good templates; 1:16:31: The lethal trifecta and prompt injection; 1:21:53: Why 97% effectiveness is a failing grade; 1:25:19: The normalization of deviance; 1:28:32: OpenClaw: the security nightmare everyone is looking past; 1:34:22: What's next for Simon; 1:36:47: Zero-deliverable consulting; 1:38:05: Good news about Kakapo parrots. Tags: ai, kakapo, generative-ai, llms, podcast-appearances, coding-agents, agentic-engineering  ( 13 min )
    Gemma 4: Byte for byte, the most capable open models
    Gemma 4: Byte for byte, the most capable open models. Google emphasize "unprecedented level of intelligence-per-parameter", providing yet more evidence that creating small useful models is one of the hottest areas of research right now. They actually label the two smaller models as E2B and E4B for "Effective" parameter size. The system card explains: The smaller models incorporate Per-Layer Embeddings (PLE) to maximize parameter efficiency in on-device deployments. Rather than adding more layers or parameters to the model, PLE gives each decoder layer its own small embedding for every token. These embedding tables are large but are only used for quick lookups, which is why the effective parameter count is much smaller than the total. I don't entirely understand that, but apparently that's what the "E" in E2B means! One particularly exciting feature of these models is that they are multi-modal beyond just images: Vision and audio: All models natively process video and images, supporting variable resolutions, and excelling at visual tasks like OCR and chart understanding. Additionally, the E2B and E4B models feature native audio input for speech recognition and understanding. I've not figured out a way to run audio input locally - I don't think that feature is in LM Studio or Ollama yet. I tried them out using the GGUFs for LM Studio. The 2B (4.41GB), 4B (6.33GB) and 26B-A4B (17.99GB) models all worked perfectly, but the 31B (19.89GB) model was broken and spat out "---\n" in a loop for every prompt I tried. The succession of pelican quality from E2B to E4B to 26B-A4B is notable. (The 26B-A4B one actually had an SVG error - "error on line 18 at column 88: Attribute x1 redefined" - but after fixing that I got probably the best pelican I've seen yet from a model that runs on my laptop.) Google are providing API access to the two larger Gemma models via their AI Studio. 
    I added support to llm-gemini and then ran a pelican through the 31B model using that: llm -m gemini/gemma-4-31b-it 'Generate an SVG of a pelican riding a bicycle' The result is pretty good, though it is missing the front part of the bicycle frame. Tags: google, ai, generative-ai, local-llms, llms, llm, vision-llms, pelican-riding-a-bicycle, llm-reasoning, gemma, llm-release, lm-studio  ( 4 min )
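One way to read the Per-Layer Embeddings description above is as simple arithmetic: each decoder layer gets its own lookup table with one small vector per vocabulary token, and the "effective" figure leaves those tables out. The sketch below assumes that reading is right, and every number in it is hypothetical. None of them are Gemma 4's real dimensions.

```python
def parameter_counts(core_params, num_layers, vocab_size, ple_dim):
    """Split a model's parameters into core weights and per-layer embedding tables.

    Assumption: the "effective" count excludes the per-layer lookup tables.
    This is an interpretation of the system card quote, not a confirmed definition.
    """
    # One table per decoder layer, one small vector per vocabulary token.
    per_layer_embeddings = num_layers * vocab_size * ple_dim
    return {
        "effective": core_params,
        "per_layer_embeddings": per_layer_embeddings,
        "total": core_params + per_layer_embeddings,
    }


# Hypothetical values chosen only to make the arithmetic easy to follow.
counts = parameter_counts(
    core_params=2_000_000_000,
    num_layers=30,
    vocab_size=250_000,
    ple_dim=256,
)
for name, value in counts.items():
    print(f"{name}: {value / 1e9:.2f}B")
```

With these made-up inputs the lookup tables nearly double the total, which is the kind of gap between "effective" and total size that the quote describes.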
    llm-gemini 0.30
    Release: llm-gemini 0.30. New models: gemini-3.1-flash-lite-preview, gemma-4-26b-a4b-it and gemma-4-31b-it. See my notes on Gemma 4. Tags: gemini, llm, gemma  ( 2 min )
    March 2026 sponsors-only newsletter
    I just sent the March edition of my sponsors-only monthly newsletter. If you are a sponsor (or if you start a sponsorship now) you can access it here. In this month's newsletter: More agentic engineering patterns; Streaming experts with MoE models on a Mac; Model releases in March; Vibe porting; Supply chain attacks against PyPI and NPM; Stuff I shipped; What I'm using, March 2026 edition; And a couple of museums. Here's a copy of the February newsletter as a preview of what you'll get. Pay $10/month to stay a month ahead of the free copy! Tags: newsletter  ( 3 min )
  • Open

    He just crawled through hell to fix the browser…
    No content preview
  • Open

    Gemma 4: Byte for byte, the most capable open models
    Gemma 4: Our most intelligent open models to date, purpose-built for advanced reasoning and agentic workflows.  ( 17 min )
  • Open

    New ways to balance cost and reliability in the Gemini API
    Google is introducing two new inference tiers to the Gemini API, Flex and Priority, to balance cost and latency.  ( 14 min )
    Create, edit and share videos at no cost in Google Vids
    New AI capabilities are coming to Google Vids, powered by Lyria 3 and Veo 3.1, like high-quality video generation at no cost and more.  ( 16 min )
  • Open

    OpenAI acquires TBPN
    OpenAI acquires TBPN to accelerate global conversations around AI and support independent media, expanding dialogue with builders, businesses, and the broader tech community.
    Codex now offers more flexible pricing for teams
    Codex now includes pay-as-you-go pricing for ChatGPT Business and Enterprise, providing teams a more flexible option to start and scale adoption.
  • Open

    Personalized Group Relative Policy Optimization for Heterogenous Preference Alignment
    Despite their sophisticated general-purpose capabilities, Large Language Models (LLMs) often fail to align with diverse individual preferences because standard post-training methods, like Reinforcement Learning with Human Feedback (RLHF), optimize for a single, global objective. While Group Relative Policy Optimization (GRPO) is a widely adopted on-policy reinforcement learning framework, its group-based normalization implicitly assumes that all samples are exchangeable, inheriting this limitation in personalized settings. This assumption conflates distinct user reward distributions and…  ( 3 min )
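To make the exchangeability point concrete, here is a minimal sketch of the standard GRPO-style group normalization (reward minus group mean, divided by group standard deviation) applied to rewards from two users. The reward values are hypothetical, and the per-user variant is shown only as a contrast. It is not necessarily the method this paper proposes, since the abstract is truncated.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards: user A rates generously, user B rates harshly.
user_a = [0.9, 0.8, 1.0]
user_b = [0.1, 0.2, 0.0]

# Pooled normalization treats all six samples as exchangeable, so the sign
# of each advantage mostly reflects which user gave the rating.
pooled = group_relative_advantages(user_a + user_b)

# Normalizing per user (an illustrative alternative) recovers each user's
# own ranking of the responses.
per_user = group_relative_advantages(user_a) + group_relative_advantages(user_b)

print([round(a, 2) for a in pooled])
print([round(a, 2) for a in per_user])
```

In the pooled case every sample from the generous rater gets a positive advantage and every sample from the harsh rater gets a negative one, regardless of which responses each user actually preferred.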

  • Open

    datasette-llm 0.1a6
    Release: datasette-llm 0.1a6. The same model ID no longer needs to be repeated in both the default model and allowed models lists - setting it as a default model automatically adds it to the allowed models list (#6). Improved documentation for Python API usage. Tags: llm, datasette  ( 3 min )
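The behavior described in this release can be sketched as a small pure function. This is an illustration of the rule only: `resolve_allowed_models` and the model IDs are hypothetical, and the plugin's real configuration keys and internals are not shown in the release note.

```python
def resolve_allowed_models(default_model, allowed_models):
    """Return the allowed list, adding the default model if it is missing.

    Hypothetical helper illustrating the release note, not datasette-llm's code.
    """
    allowed = list(allowed_models or [])
    # Setting a default model implies it is allowed, so it does not need to
    # be repeated in the allowed list by hand.
    if default_model and default_model not in allowed:
        allowed.append(default_model)
    return allowed


# "model-a" and "model-b" are placeholder IDs.
print(resolve_allowed_models("model-a", ["model-b"]))
print(resolve_allowed_models("model-a", ["model-a", "model-b"]))
```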
    datasette-enrichments-llm 0.2a1
    Release: datasette-enrichments-llm 0.2a1. The actor who triggers an enrichment is now passed to the llm.model(... actor=actor) method (#3). Tags: enrichments, llm, datasette  ( 2 min )
    datasette-extract 0.3a0
    Release: datasette-extract 0.3a0. Now uses datasette-llm to manage model configuration, which means you can control which models are available for extraction tasks using the extract purpose and LLM model configuration (#38). Tags: llm, datasette  ( 2 min )
    datasette-enrichments-llm 0.2a0
    Release: datasette-enrichments-llm 0.2a0. This plugin now uses datasette-llm to configure and manage models. This means it's possible to specify which models should be made available for enrichments, using the new enrichments purpose. Tags: llm, datasette  ( 2 min )
    datasette-llm-usage 0.2a0
    Release: datasette-llm-usage 0.2a0. Removed features relating to allowances and estimated pricing. These are now the domain of datasette-llm-accountant. Now depends on datasette-llm for model configuration (#3). Full prompts and responses and tool calls can now be logged to the llm_usage_prompt_log table in the internal database if you set the new datasette-llm-usage.log_prompts plugin configuration setting. Redesigned the /-/llm-usage-simple-prompt page, which now requires the llm-usage-simple-prompt permission. Tags: llm, datasette  ( 3 min )
    datasette-llm 0.1a5
    Release: datasette-llm 0.1a5 The llm_prompt_context() plugin hook wrapper mechanism now tracks prompts executed within a chain as well as one-off prompts, which means it can be used to track tool call loops. #5 Tags: llm, datasette  ( 2 min )
    Quoting Soohoon Choi
    I want to argue that AI models will write good code because of economic incentives. Good code is cheaper to generate and maintain. Competition is high between the AI models right now, and the ones that win will help developers ship reliable features fastest, which requires simple, maintainable code. Good code will prevail, not only because we want it to (though we do!), but because economic forces demand it. Markets will not reward slop in coding, in the long-term. — Soohoon Choi, Slop Is Not Necessarily The Future Tags: slop, ai-assisted-programming, generative-ai, agentic-engineering, ai, llms  ( 3 min )
  • Open

    Tragic mistake... Anthropic leaks Claude’s source code
    No content preview
  • Open

    We’re creating a new satellite imagery map to help protect Brazil’s forests.
    Google partnered with the Brazilian government on a satellite imagery map to help protect the country’s forests.  ( 14 min )
    The latest AI news we announced in March 2026
    Here are Google’s latest AI updates from March 2026  ( 19 min )
  • Open

    Gradient Labs gives every bank customer an AI account manager
    Gradient Labs uses GPT-4.1 and GPT-5.4 mini and nano to power AI agents that automate banking support workflows with low latency and high reliability.

  • Open

    Supply Chain Attack on Axios Pulls Malicious Dependency from npm
    Supply Chain Attack on Axios Pulls Malicious Dependency from npm Axios has 101 million weekly downloads. Versions 1.14.1 and 0.30.4 both included a new dependency called plain-crypto-js which was freshly published malware, stealing credentials and installing a remote access trojan (RAT). It looks like the attack came from a leaked long-lived npm token. Axios have an open issue to adopt trusted publishing, which would ensure that only their GitHub Actions workflows are able to publish to npm. The malware packages were published without an accompanying GitHub release, which strikes me as a useful heuristic for spotting potentially malicious releases - the same pattern was present for LiteLLM last week as well. Via lobste.rs Tags: javascript, security, npm, supply-chain  ( 3 min )
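    The "published without an accompanying GitHub release" heuristic can be sketched as a set comparison. This is a minimal illustration, not a real scanner: the function name is hypothetical, the inputs are plain lists you would have to fetch yourself, and only the two flagged version numbers come from the post (the other versions and tags are made up).

```python
def releases_missing_from_github(npm_versions, github_tags):
    """Return npm versions that have no matching GitHub release tag."""
    # Tags are often written as "v1.2.3", so drop one leading "v" before comparing.
    normalized = {tag.removeprefix("v") for tag in github_tags}
    return [version for version in npm_versions if version not in normalized]


# Illustrative data: 1.14.1 and 0.30.4 are the versions named in the post,
# everything else here is invented for the example.
npm_versions = ["1.14.0", "1.14.1", "0.30.3", "0.30.4"]
github_tags = ["v1.14.0", "v0.30.3"]

suspicious = releases_missing_from_github(npm_versions, github_tags)
```

    A mismatch is only a signal worth investigating, since some projects legitimately publish to npm without cutting a GitHub release.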
    datasette-llm 0.1a4
    Release: datasette-llm 0.1a4 Ability to configure different API keys for models based on their purpose - for example, set it up so enrichments always use gpt-5.4-mini with an API key dedicated to that purpose. #4 I released llm-echo 0.3 to provide an API key testing utility I needed for the tests for this new feature. Tags: llm, datasette  ( 3 min )
    llm-all-models-async 0.1
    Release: llm-all-models-async 0.1 LLM plugins can define new models in both sync and async varieties. The async variants are most common for API-backed models - sync variants tend to be things that run the model directly within the plugin. My llm-mrchatterbox plugin is sync only. I wanted to try it out with various Datasette LLM features (specifically datasette-enrichments-llm) but Datasette can only use async models. So... I had Claude spin up this plugin that turns sync models into async models using a thread pool. This ended up needing an extra plugin hook mechanism in LLM itself, which I shipped just now in LLM 0.30. Tags: llm, async, python  ( 3 min )
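    The general technique of wrapping a blocking model in an async interface with a thread pool might look like the sketch below. It does not use the LLM plugin API: SyncEchoModel and AsyncWrapper are hypothetical stand-ins, and the plugin's actual implementation may differ.

```python
import asyncio
import time


class SyncEchoModel:
    """Hypothetical stand-in for a blocking, sync-only model."""

    def prompt(self, text):
        time.sleep(0.05)  # simulate blocking inference
        return text.upper()


class AsyncWrapper:
    """Expose a sync model through an awaitable method using worker threads."""

    def __init__(self, sync_model):
        self.sync_model = sync_model

    async def prompt(self, text):
        # asyncio.to_thread runs the blocking call in the default thread pool,
        # so the event loop stays free to serve other tasks.
        return await asyncio.to_thread(self.sync_model.prompt, text)


async def main():
    model = AsyncWrapper(SyncEchoModel())
    # Both prompts run concurrently in separate worker threads.
    return await asyncio.gather(model.prompt("good day"), model.prompt("sir"))


results = asyncio.run(main())
```

    asyncio.gather returns results in the order the awaitables were passed, regardless of which thread finishes first.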
    llm 0.30
    Release: llm 0.30 The register_models() plugin hook now takes an optional model_aliases parameter listing all of the models, async models and aliases that have been registered so far by other plugins. A plugin with @hookimpl(trylast=True) can use this to take previously registered models into account. #1389 Added docstrings to public classes and methods and included those directly in the documentation. Tags: llm  ( 3 min )
    llm-echo 0.4
    Release: llm-echo 0.4 Prompts now have the input_tokens and output_tokens fields populated on the response. Tags: llm  ( 2 min )
    llm-echo 0.3
    Release: llm-echo 0.3 Mechanisms for testing tool calls. #3 Mechanism for testing raw responses. #4 New echo-needs-key model for testing model key logic. #7 Tags: llm  ( 2 min )
  • Open

    How AI Gets Data Wrong (and how to fix it)
    Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.  ( 7 min )
  • Open

    Millions of JS devs just got penetrated by a RAT…
    No content preview
  • Open

    Build with Veo 3.1 Lite, our most cost-effective video generation model
    Veo 3.1 Lite is now available in paid preview through the Gemini API and for testing in Google AI Studio.  ( 14 min )
  • Open

    Accelerating the next phase of AI
    OpenAI raises $122 billion in new funding to expand frontier AI globally, invest in next-generation compute, and meet growing demand for ChatGPT, Codex, and enterprise AI.
  • Open

    ProText: A Benchmark Dataset for Measuring (Mis)gendering in Long-Form Texts
    We introduce ProText, a dataset for measuring gendering and misgendering in stylistically diverse long-form English texts. ProText spans three dimensions: Theme nouns (names, occupations, titles, kinship terms), Theme category (stereotypically male, stereotypically female, gender-neutral/non-gendered), and Pronoun category (masculine, feminine, gender-neutral, none). The dataset is designed to probe (mis)gendering in text transformations such as summarization and rewrites using state-of-the-art Large Language Models, extending beyond traditional pronoun resolution benchmarks and beyond the…  ( 2 min )

  • Open

    datasette-files 0.1a3
    Release: datasette-files 0.1a3 I'm working on integrating datasette-files into other plugins, such as datasette-extract. This necessitated a new release of the base plugin. owners_can_edit and owners_can_delete configuration options, plus the files-edit and files-delete actions are now scoped to a new FileResource which is a child of FileSourceResource. #18 The file picker UI is now available as a <datasette-file-picker> Web Component. Thanks, Alex Garcia. #19 New from datasette_files import get_file Python API for other plugins that need to access file data. #20 Tags: datasette  ( 3 min )
    Quoting Georgi Gerganov
    Note that the main issues that people currently unknowingly face with local models mostly revolve around the harness and some intricacies around model chat templates and prompt construction. Sometimes there are even pure inference bugs. From typing the task in the client to the actual result, there is a long chain of components that atm are not only fragile - are also developed by different parties. So it's difficult to consolidate the entire stack and you have to keep in mind that what you are currently observing is with very high probability still broken in some subtle way along that chain. — Georgi Gerganov, explaining why it's hard to find local models that work well with coding agents Tags: coding-agents, generative-ai, ai, local-llms, llms, georgi-gerganov  ( 3 min )
    datasette-llm 0.1a3
    Release: datasette-llm 0.1a3 Adds the ability to configure which LLMs are available for which purpose, which means you can restrict the list of models that can be used with a specific plugin. #3 Tags: llm, datasette  ( 2 min )
    Mr. Chatterbox is a (weak) Victorian-era ethically trained model you can run on your own computer
    Trip Venturella released Mr. Chatterbox, a language model trained entirely on out-of-copyright text from the British Library. Here's how he describes it in the model card: Mr. Chatterbox is a language model trained entirely from scratch on a corpus of over 28,000 Victorian-era British texts published between 1837 and 1899, drawn from a dataset made available by the British Library. The model has absolutely no training inputs from after 1899 — the vocabulary and ideas are formed exclusively from nineteenth-century literature. Mr. Chatterbox's training corpus was 28,035 books, with an estimated 2.93 billion input tokens after filtering. The model has roughly 340 million parameters, roughly the same size as GPT-2-Medium. The difference is, of course, that unlike GPT-2, Mr. Chatterbox is trained entirely on historical data. Given how hard it is to train a useful LLM without using vast amounts of scraped, unlicensed data I've been dreaming of a model like this for a couple of years now. What would a model trained on out-of-copyright text be like to chat with? Thanks to Trip we can now find out for ourselves! The model itself is tiny, at least by Large Language Model standards - just 2.05GB on disk. You can try it out using Trip's HuggingFace Spaces demo: Honestly, it's pretty terrible. Talking with it feels more like chatting with a Markov chain than an LLM - the responses may have a delightfully Victorian flavor to them but it's hard to get a response that usefully answers a question. The 2022 Chinchilla paper suggests a ratio of 20x the parameter count to training tokens. For a 340m model that would suggest around 7 billion tokens, more than twice the British Library corpus used here. The smallest Qwen 3.5 model is 600m parameters and that model family starts to get interesting at 2b - so my hunch is we would need 4x or more the training data to get something that starts to feel like a useful conversational partner. But what a fun project! 
Running it locally with LLM I decided to see if I could run the model on my own machine using my LLM framework. I got Claude Code to do most of the work - here's the transcript. Trip trained the model using Andrej Karpathy's nanochat, so I cloned that project, pulled the model weights and told Claude to build a Python script to run the model. Once we had that working (which ended up needing some extra details from the Space demo source code) I had Claude read the LLM plugin tutorial and build the rest of the plugin. llm-mrchatterbox is the result. Install the plugin like this: llm install llm-mrchatterbox The first time you run a prompt it will fetch the 2.05GB model file from Hugging Face. Try that like this: llm -m mrchatterbox "Good day, sir" Or start an ongoing chat session like this: llm chat -m mrchatterbox If you don't have LLM installed you can still get a chat session started from scratch using uvx like this: uvx --with llm-mrchatterbox llm chat -m mrchatterbox When you are finished with the model you can delete the cached file using: llm mrchatterbox delete-model This is the first time I've had Claude Code build a full LLM model plugin from scratch and it worked really well. I expect I'll be using this method again in the future. I continue to hope we can get a useful model from entirely public domain data. The fact that Trip was able to get this far using nanochat and 2.93 billion training tokens is a promising start. Update 31st March 2026: I had missed this when I first published this piece but Trip has his own detailed writeup of the project which goes into much more detail about how he trained the model. Here's how the books were filtered for pre-training: First, I downloaded the British Library dataset split of all 19th-century books. 
I filtered those down to books contemporaneous with the reign of Queen Victoria—which, unfortunately, cut out the novels of Jane Austen—and further filtered those down to a set of books with an optical character recognition (OCR) confidence of .65 or above, as listed in the metadata. This left me with 28,035 books, or roughly 2.93 billion tokens for pretraining data. Getting it to behave like a conversational model was a lot harder. Trip started by trying to train on plays by Oscar Wilde and George Bernard Shaw, but found they didn't provide enough pairs. Then he tried extracting dialogue pairs from the books themselves with poor results. The approach that worked was to have Claude Haiku and GPT-4o-mini generate synthetic conversation pairs for the supervised fine tuning, which solved the problem but sadly I think dilutes the "no training inputs from after 1899" claim from the original model card. Tags: ai, andrej-karpathy, generative-ai, local-llms, llms, ai-assisted-programming, hugging-face, llm, training-data, uv, ai-ethics, claude-code  ( 6 min )
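    The Chinchilla arithmetic quoted in this post can be checked in a few lines. The figures (340 million parameters, 2.93 billion tokens, a 20x tokens-per-parameter ratio) are the ones stated in the post; the variable names are only for illustration.

```python
params = 340e6            # model size stated in the model card
corpus_tokens = 2.93e9    # training tokens after filtering
tokens_per_param = 20     # ratio attributed to the Chinchilla paper in the post

# Token budget the 20x rule of thumb would suggest for this model size.
suggested_tokens = params * tokens_per_param

# How many times larger that budget is than the corpus actually used.
shortfall = suggested_tokens / corpus_tokens
```

    That works out to 6.8 billion suggested tokens, a little over 2.3 times the corpus used, which matches the "around 7 billion tokens, more than twice" figure in the post.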
    llm-mrchatterbox 0.1
    Release: llm-mrchatterbox 0.1 See Mr. Chatterbox is a (weak) Victorian-era ethically trained model you can run on your own computer. Tags: llm  ( 2 min )
  • Open

    Beyond Real Data: Synthetic Data through the Lens of Regularization
    Synthetic data can improve generalization when real data is scarce, but excessive reliance may introduce distributional mismatches that degrade performance. In this paper, we present a learning-theoretic framework to quantify the trade-off between synthetic and real data. Our approach leverages algorithmic stability to derive generalization error bounds, characterizing the optimal synthetic-to-real data ratio that minimizes expected test error as a function of the Wasserstein distance between the real and synthetic distributions. We motivate our framework in the setting of kernel ridge…  ( 3 min )
    Entropy-Preserving Reinforcement Learning
    Policy gradient algorithms have driven many recent advancements in language model reasoning. An appealing property is their ability to learn from exploration on their own trajectories, a process crucial for fostering diverse and creative solutions. As we show in this paper, many policy gradient algorithms naturally reduce the entropy—and thus the diversity of explored trajectories—as part of training, yielding a policy increasingly limited in its ability to explore. In this paper, we argue that entropy should be actively monitored and controlled throughout training. We formally analyze the…  ( 3 min )

  • Open

    Helping disaster response teams turn AI into action across Asia
    AI for Disaster Response in Asia: OpenAI Workshop with Gates Foundation
  • Open

    Pretext
    Pretext is a new browser library from Cheng Lou, the creator of the react-motion animation library. Pretext solves the problem of calculating the height of a paragraph of line-wrapped text without touching the DOM. The usual way of doing this is to render the text and measure its dimensions, but this is extremely expensive. Pretext uses an array of clever tricks to make this much, much faster, which enables all sorts of new text rendering effects in browser applications. Here's one demo that shows the kind of things this makes possible: The key to how this works is the way it separates calculations into a call to a prepare() function followed by multiple calls to layout(). The prepare() function splits the input text into segments (effectively words, but it can take things like soft hyphens and non-latin character sequences and emoji into account as well) and measures those using an off-screen canvas, then caches the results. This is comparatively expensive but only runs once. The layout() function can then emulate the word-wrapping logic in browsers to figure out how many wrapped lines the text will occupy at a specified width and measure the overall height. I had Claude build me this interactive artifact to help me visually understand what's going on, based on a simplified version of Pretext itself. The way this is tested is particularly impressive. The earlier tests rendered a full copy of the Great Gatsby in multiple browsers to confirm that the estimated measurements were correct against a large volume of text. This was later joined by the corpora/ folder using the same technique against lengthy public domain documents in Thai, Chinese, Korean, Japanese, Arabic, and more. 
Cheng Lou says: The engine’s tiny (few kbs), aware of browser quirks, supports all the languages you’ll need, including Korean mixed with RTL Arabic and platform-specific emojis This was achieved through showing Claude Code and Codex the browsers ground truth, and have them measure & iterate against those at every significant container width, running over weeks Via @_chenglou Tags: browsers, css, javascript, testing, react, typescript  ( 4 min )
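    The prepare()/layout() split described in this post can be sketched in simplified form. This is a Python illustration of the idea only, not Pretext's actual API: it splits on whitespace, uses a hypothetical fixed-width measure function in place of an off-screen canvas, and ignores soft hyphens, emoji and non-Latin segmentation.

```python
def prepare(text, measure):
    """Split text into words and measure each one once (the expensive step)."""
    words = text.split()
    return {
        "widths": [measure(word) for word in words],
        "space": measure(" "),
    }


def layout(prepared, max_width, line_height):
    """Greedy word wrap over the cached widths; returns (line_count, height)."""
    lines, current = 0, 0.0
    for width in prepared["widths"]:
        if lines == 0 or current + prepared["space"] + width > max_width:
            # Start a new line with this word.
            lines += 1
            current = width
        else:
            # The word fits on the current line after a space.
            current += prepared["space"] + width
    return lines, lines * line_height


def measure(segment):
    # Hypothetical metric: every character is 8 units wide.
    return 8.0 * len(segment)


prepared = prepare("the quick brown fox jumps over the lazy dog", measure)
narrow = layout(prepared, 80, 20)   # re-run cheaply at any container width
wide = layout(prepared, 400, 20)
```

    Because the measurements are cached, layout() only does arithmetic, so it can be called again for every candidate width without measuring any text.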

  • Open

    Less Gaussians, Texture More: 4K Feed-Forward Textured Splatting
    Existing feed-forward 3D Gaussian Splatting methods predict pixel-aligned primitives, leading to a quadratic growth in primitive count as resolution increases. This fundamentally limits their scalability, making high-resolution synthesis such as 4K intractable. We introduce LGTM (Less Gaussians, Texture More), a feed-forward framework that overcomes this resolution scaling barrier. By predicting compact Gaussian primitives coupled with per-primitive textures, LGTM decouples geometric complexity from rendering resolution. This approach enables high-fidelity 4K novel view synthesis without…  ( 2 min )

  • Open

    STADLER reshapes knowledge work at a 230-year-old company
    Learn how STADLER uses ChatGPT to transform knowledge work, saving time and accelerating productivity across 650 employees.
  • Open

    AI News: Anthropic Went Crazy This Week!
    No content preview
  • Open

    To Infinity and Beyond: Tool-Use Unlocks Length Generalization in State Space Models
    State Space Models (SSMs) have become the leading alternative to Transformers for sequence modeling. Their primary advantage is efficiency in long-context and long-form generation, enabled by fixed-size memory and linear scaling of computational complexity. We begin this work by showing a simple theoretical result stating that SSMs cannot accurately solve any “truly long-form” generation problem (in a sense we formally define), undermining their main competitive advantage. However, we show that this limitation can be mitigated by allowing SSMs interactive access to external tools. In fact, we…  ( 3 min )
    Athena: Intermediate Representations for Iterative Scaffolded App Generation with an LLM
    It is challenging to generate the code for a complete user interface using a Large Language Model (LLM). User interfaces are complex and their implementations often consist of multiple, inter-related files that together specify the contents of each screen, the navigation flows between the screens, and the data model used throughout the application. It is challenging to craft a single prompt for an LLM that contains enough detail to generate a complete user interface, and even then the result is frequently a single large and difficult to understand file that contains all of the generated…  ( 3 min )

  • Open

    Two AI Models Set to “stir government urgency”, But Will This Challenge Undo Them?
    No content preview
  • Open

    Anthropic just released the real Claude Bot...
    No content preview
  • Open

    Watch James Manyika talk AI and creativity with LL COOL J.
    In the latest episode of our Dialogues on Technology and Society series, LL COOL J sits down with James Manyika.  ( 14 min )
    Transform your headphones into a live personal translator on iOS.
    Google Translate’s Live translate with headphones is officially arriving on iOS! And we're expanding the capability for both iOS and Android users to even more countries…  ( 15 min )
    Gemini 3.1 Flash Live: Making audio AI more natural and reliable
    Gemini 3.1 Flash Live is now available across Google products.  ( 18 min )
    Search Live is expanding globally
    We’re expanding Search Live globally, to all languages and locations where AI Mode is available.  ( 14 min )
  • Open

    Gemini 3.1 Flash Live: Making audio AI more natural and reliable
    Our latest voice model has improved precision and lower latency to make voice interactions more fluid, natural and precise.  ( 18 min )
  • Open

    This Datacenter Problem Nobody's Talking About
    No content preview
  • Open

    Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training
    While scaling laws for Large Language Models (LLMs) traditionally focus on proxy metrics like pretraining loss, predicting downstream task performance has been considered unreliable. This paper challenges that view by proposing a direct framework to model the scaling of benchmark performance from the training budget. We find that for a fixed token-to-parameter ratio, a simple power law can accurately describe the scaling behavior of log accuracy on multiple popular downstream tasks. Our results show that the direct approach extrapolates better than the previously proposed two-stage procedure…  ( 3 min )
    Drop-In Perceptual Optimization for 3D Gaussian Splatting
    Despite their output being ultimately consumed by human viewers, 3D Gaussian Splatting (3DGS) methods often rely on ad-hoc combinations of pixel-level losses, resulting in blurry renderings. To address this, we systematically explore perceptual optimization strategies for 3DGS by searching over a diverse set of distortion losses. We conduct the first-of-its-kind large-scale human subjective study on 3DGS, involving 39,320 pairwise ratings across several datasets and 3DGS frameworks. A regularized version of Wasserstein Distortion, which we call WD-R, emerges as the clear winner, excelling at…  ( 5 min )
  • Open

    Improving Composer through real-time RL
    Run cloud agents in your own infrastructure  ( 16 min )

  • Open

    Protecting people from harmful manipulation
    Google DeepMind researches AI's harmful manipulation risks across areas like finance and health, leading to new safety measures.  ( 7 min )
    Lyria 3 Pro: Create longer tracks in more Google products
    Introducing Lyria 3 Pro, which unlocks longer tracks with structural awareness. We’re also bringing Lyria to more Google products and surfaces.  ( 15 min )
  • Open

    Build with Lyria 3, our newest music generation model
    Lyria 3 is now available in paid preview through the Gemini API and for testing in Google AI Studio.  ( 16 min )
    Lyria 3 Pro: Create longer tracks in more Google products
    We are bringing Lyria 3 to the tools where professionals work and create every day.  ( 15 min )
  • Open

    Inside our approach to the Model Spec
    Learn how OpenAI’s Model Spec serves as a public framework for model behavior, balancing safety, user freedom, and accountability as AI systems advance.
    Introducing the OpenAI Safety Bug Bounty program
    OpenAI launches a Safety Bug Bounty program to identify AI abuse and safety risks, including agentic vulnerabilities, prompt injection, and data exfiltration.
  • Open

    Thinking into the Future: Latent Lookahead Training for Transformers
    This paper was accepted at the Workshop on Latent & Implicit Thinking – Going Beyond CoT Reasoning 2026 at ICLR. Autoregressive language models trained with next-token prediction generate text by sampling one discrete token at a time. Although very scalable, this objective forces the model to commit at every step, preventing it from exploring or reflecting upon multiple plausible continuations. Furthermore, the compute allocation across tokens is uniform; every token is formed based on a single forward-pass, potentially limiting the model’s expressiveness in cases where difficult tokens…  ( 3 min )
  • Open

    Run cloud agents in your own infrastructure
    Improving Composer through real-time RL  ( 11 min )
  • Open

    Self-hosted Cloud Agents
    Mar 25, 2026  ( 3 min )

  • Open

    OpenAI Just Killed Sora
    No content preview
  • Open

    Tech bros optimized war… and it’s working
    No content preview
  • Open

    Helping developers build safer AI experiences for teens
    OpenAI releases prompt-based teen safety policies for developers using gpt-oss-safeguard, helping moderate age-specific risks in AI systems.
    Update on the OpenAI Foundation
    The OpenAI Foundation announces plans to invest at least $1 billion in curing diseases, economic opportunity, AI resilience, and community programs.
    Powering product discovery in ChatGPT
    ChatGPT introduces richer, visually immersive shopping powered by the Agentic Commerce Protocol, enabling product discovery, side-by-side comparisons, and merchant integration.

  • Open

    Creating with Sora Safely
    To address the novel safety challenges posed by a state-of-the-art video model as well as a new social creation platform, we’ve built Sora 2 and the Sora app with safety at the foundation. Our approach is anchored in concrete protections.
  • Open

    Fast regex search: indexing text for agent tools
    Improving Composer through real-time RL  ( 46 min )

  • Open

    This new Linux distro is breaking the law, by design…
    No content preview
  • Open

    AI News: Every Major Announcement From This Week
    No content preview

  • Open

    Google just changed the future of UI/UX design...
    No content preview
  • Open

    How we monitor internal coding agents for misalignment
    How OpenAI uses chain-of-thought monitoring to study misalignment in internal coding agents—analyzing real-world deployments to detect risks and strengthen AI safety safeguards.
    OpenAI to acquire Astral
    Accelerates Codex growth to power the next generation of Python developer tools
  • Open

    Introducing Composer 2
    Improving Composer through real-time RL  ( 8 min )
  • Open

    Composer 2
    Mar 19, 2026  ( 3 min )

  • Open

    How to burn $30m on a JavaScript framework...
    No content preview
  • Open

    OpenClaw Just Got WAY Easier to Install
    No content preview
  • Open

    Money Forward brings Cursor’s coding agents to product, design, and QA
    Over 1,000 Money Forward employees now use Cursor every day.  ( 29 min )

  • Open

    Measuring progress toward AGI: A cognitive framework
    We’re introducing a framework to measure progress toward AGI, and launching a Kaggle hackathon to build the relevant evaluations.  ( 15 min )
  • Open

    Bringing the power of Personal Intelligence to more people
    We're expanding Personal Intelligence across AI Mode in Search, the Gemini app and Gemini in Chrome.  ( 16 min )
    Our latest investment in open source security for the AI era
    Google is making new investments, building new tools and developing code security to improve open source security.  ( 14 min )
  • Open

    Perplexity “Computer” is Kind of Insane
    No content preview
    The REAL Story from NVIDIA GTC This Week!
    No content preview
  • Open

    Introducing GPT-5.4 mini and nano
    GPT-5.4 mini and nano are smaller, faster versions of GPT-5.4 optimized for coding, tool use, multimodal reasoning, and high-volume API and sub-agent workloads.
    OpenAI Japan announces Japan Teen Safety Blueprint to put teen safety first
    OpenAI Japan announces the Japan Teen Safety Blueprint, introducing stronger age protections, parental controls, and well-being safeguards for teens using generative AI.
    Equipping workers with insights about compensation
    New research shows Americans send nearly 3 million daily messages to ChatGPT asking about compensation and earnings, helping close the wage information gap.
  • Open

    Training Composer for longer horizons
    Improving Composer through real-time RL  ( 20 min )

  • Open

    Claude Just Dropped A Huge New Feature
    No content preview
  • Open

    Why Codex Security Doesn’t Include a SAST Report
    A deep dive into why Codex Security doesn’t rely on traditional SAST, instead using AI-driven constraint reasoning and validation to find real vulnerabilities with fewer false positives.

  • Open

    This New Feature Is A Big Deal For Creatives
    No content preview

  • Open

    Claude Just Rolled Out 2 Big New Features
    No content preview
    AI News: They All Launched the Same Thing!
    No content preview

  • Open

    7 new open source AI tools you need right now…
    No content preview
  • Open

    How AI is helping improve heart health in rural Australia
    A new Google AI initiative aims to improve heart health outcomes for people living in remote Australian communities.  ( 15 min )
  • Open

    Is ChatGPT Creating MORE Work?
    No content preview

  • Open

    Is ChatGPT Making You Dumber?
    No content preview
  • Open

    Rakuten fixes issues twice as fast with Codex
    Rakuten uses Codex, the coding agent from OpenAI, to ship software faster and safer, reducing MTTR 50%, automating CI/CD reviews, and delivering full-stack builds in weeks.
    Designing AI agents to resist prompt injection
    How ChatGPT defends against prompt injection and social engineering by constraining risky actions and protecting sensitive data in agent workflows.
    From model to agent: Equipping the Responses API with a computer environment
    How OpenAI built an agent runtime using the Responses API, shell tool, and hosted containers to run secure, scalable agents with files, tools, and state.
    Wayfair boosts catalog accuracy and support speed with OpenAI
    Wayfair uses OpenAI models to improve ecommerce support and product catalog accuracy, automating ticket triage and enhancing millions of product attributes at scale.
  • Open

    How we compare model quality in Cursor
    Improving Composer through real-time RL  ( 15 min )
    Over 30 new plugins join the Cursor Marketplace
    Improving Composer through real-time RL  ( 8 min )
  • Open

    New Plugins on the Cursor Marketplace
    Mar 11, 2026  ( 3 min )

  • Open

    Gemini in Google Sheets just achieved state-of-the-art performance.
    Today we announced new beta features for Gemini in Sheets to help you create, organize and edit entire sheets, from basic tasks to complex data analysis — just describe …  ( 14 min )
  • Open

    Improving instruction hierarchy in frontier LLMs
    IH-Challenge trains models to prioritize trusted instructions, improving instruction hierarchy, safety steerability, and resistance to prompt injection attacks.
    New ways to learn math and science in ChatGPT
    ChatGPT introduces interactive visual explanations for math and science, helping students explore formulas, variables, and concepts in real time.

  • Open

    The greatest unsolved problem in computer science...
    No content preview
  • Open

    From games to biology and beyond: 10 years of AlphaGo’s impact
    Ten years since AlphaGo, we explore how it is catalyzing scientific discovery and paving a path to AGI.  ( 9 min )
  • Open

    OpenAI to acquire Promptfoo
    OpenAI is acquiring Promptfoo, an AI security platform that helps enterprises identify and remediate vulnerabilities in AI systems during development.

  • Open

    How our open-source AI model SpeciesNet is helping to promote wildlife conservation
    An overview of SpeciesNet, our open-source AI model that is helping people around the world protect and conserve wildlife.  ( 17 min )
  • Open

    What the New ChatGPT 5.4 Means for the World
    No content preview
  • Open

    Codex Security: now in research preview
    Codex Security is an AI application security agent that analyzes project context to detect, validate, and patch complex vulnerabilities with higher confidence and less noise.
    How Balyasny Asset Management built an AI research engine for investing
    See how Balyasny built an AI research system with GPT-5.4, rigorous model evaluation, and agent workflows to transform investment analysis at scale.
    How Descript engineers multilingual video dubbing at scale
    Using OpenAI reasoning models, Descript unlocked automatic localization of large content libraries without losing timing or meaning.

  • Open

    Ask a Techspert: How does AI understand my visual searches?
    Learn more about AI Mode in Search’s query fan-out method for visual search.  ( 16 min )
    The latest AI news we announced in February
    Here are Google’s latest AI updates from February 2026  ( 17 min )
2026-04-04T02:06:59.904Z osmosfeed 1.15.1