I Stayed Up Until 4am Playing With Google's New On-Device AI App. Here's What I Found.
- Rich Washburn



I didn’t plan to be awake at 4am last night. But then, I never do.
I downloaded Google AI Edge Gallery on my iPhone, opened Agent Skills, and the next time I looked up it was past 4 in the morning. I’m writing this on maybe three hours of sleep and two cups of coffee, and I still can’t stop thinking about what I was holding in my hand. Not because it was perfect. It wasn’t. But because of what it means that it exists at all.
What I Was Actually Playing With
Google AI Edge Gallery is a free app on iOS and Android. You open it, and you’re greeted with a menu of on-device AI experiences — AI Chat, Agent Skills, Ask Image, Audio Scribe. Pick one, download the model, and it runs. On your phone. Offline. No API call going anywhere. No data leaving the device.
I went straight for Agent Skills because that’s the word that gets my attention. Agentic. Multi-step. Autonomous. Those are the words that separate a chatbot from something actually useful. The model it recommended was Gemma-4-E4B — 3.61 GB sitting on my local storage, 128K context window, running through LiteRT-LM on my iPhone’s GPU. I downloaded it, waited for the install, and started pushing it.
The first thing I noticed: it’s fast. Not “fast for a phone” fast. Just fast. The responses were snappy in a way that surprised me. I kept waiting for the lag you expect from something running locally at this scale, and it mostly didn’t come.
The second thing I noticed: it actually does agentic things. I asked it to pull information from Wikipedia and build me a structured summary. It called the tool, fetched the data, and came back with formatted output. The reasoning ran on my phone; only the Wikipedia fetch touched the network. At 1am.
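I can't see inside the app, but the pattern it's demonstrating — model emits a structured tool call, runtime executes it, result gets fed back for a final answer — is easy to sketch. Here's a minimal, hypothetical Python version. Everything in it (`fetch_wikipedia_summary`, the tool registry, the fake model) is my own stand-in, not anything from AI Edge Gallery:

```python
# Minimal sketch of an agentic tool-calling loop. The model stub and
# the tool are illustrative stand-ins, not the app's implementation.

import json

def fetch_wikipedia_summary(topic: str) -> str:
    """Stand-in for a real Wikipedia fetch (the one networked step)."""
    return f"{topic} is a topic with a long and well-documented history."

TOOLS = {"fetch_wikipedia_summary": fetch_wikipedia_summary}

def fake_model(messages: list[dict]) -> str:
    """Stand-in for the on-device model: first turn it requests a tool,
    second turn it formats the tool's result into a summary."""
    if not any(m["role"] == "tool" for m in messages):
        return json.dumps({"tool": "fetch_wikipedia_summary",
                           "args": {"topic": "LiteRT"}})
    tool_output = [m for m in messages if m["role"] == "tool"][-1]["content"]
    return f"## Summary\n- {tool_output}"

def run_agent(prompt: str) -> str:
    messages = [{"role": "user", "content": prompt}]
    for _ in range(5):  # cap the number of tool-call rounds
        reply = fake_model(messages)
        try:
            call = json.loads(reply)  # JSON means "call this tool"
        except json.JSONDecodeError:
            return reply  # plain text means the model is done
        result = TOOLS[call["tool"]](**call["args"])
        messages.append({"role": "tool", "content": result})
    return reply

print(run_agent("Summarize LiteRT from Wikipedia"))
```

The interesting part is how little machinery this takes: the loop is a dozen lines, and the hard work — deciding *when* to call a tool and *what* to do with the result — lives entirely inside the model.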
I did the same thing I do whenever I get my hands on something new — I started stress-testing it. Where does it break? What does it hallucinate? Where are the edges? They exist. It’s a 4B parameter model running at reduced precision on a phone, not GPT-4 in a data center. But the edges were further out than I expected.
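Those numbers also let you sanity-check the "reduced precision" claim with a back-of-envelope calculation. Assuming the 3.61 GB figure is decimal gigabytes and roughly 4 billion resident parameters — both my assumptions, and real model files mix precisions and include non-weight data:

```python
# Back-of-envelope: bits per parameter implied by a 3.61 GB file
# holding ~4B parameters. (Decimal GB and the flat 4B count are
# assumptions; tokenizer and metadata are ignored.)

model_bytes = 3.61e9  # 3.61 GB on local storage
param_count = 4e9     # "4B parameter model"

bits_per_param = model_bytes * 8 / param_count
print(f"{bits_per_param:.1f} bits per parameter")
```

That works out to roughly 7 bits per parameter, versus 16 for full fp16 — consistent with a quantized model, and a big part of why it fits on a phone at all.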
The Full App Is Bigger Than I Thought
Here’s what the home screen actually shows when you open it:
AI Chat — straight conversation with Gemma 4. Fast, capable for most practical use cases.
Agent Skills — this is the one that kept me up. Multi-step agentic workflows. Tool calling. It can query external sources, generate visualizations, integrate with other models, and chain tasks together. On the device. In your pocket.
Ask Image — 4 models available for visual question answering. You hand it an image and ask questions about it. I tested this. It works.
Audio Scribe — 4 models for transcription and translation. Offline. This one deserves its own conversation because it just quietly ended the business model of every transcription SaaS charging $20 a month.
That last one barely got mentioned in the coverage I read yesterday. Everyone focused on the agentic layer, and rightly so. But Audio Scribe running locally — transcription and translation without a single token hitting a server — is one of the most significant things in this app.
What I Kept Coming Back To
At some point around 2am, I set the phone down and just thought about what I was doing. I was running a capable, multi-modal, agentic AI system on a device that fits in my shirt pocket, connected to nothing, with no ongoing cost, and no one watching the conversation.
That combination has never existed in a consumer product before last week. The models aren’t the most powerful available. The agentic layer is early — Google called it the first implementation of on-device agentic workflows, and you can feel the “first” in some of the interactions. The UI is clearly developer-facing right now, not consumer-polished. But here’s the thing about “first”: it means someone else has to respond.
The Apple Angle Hasn’t Changed. The Pressure Has.
I’ve written about Apple’s on-device AI position before. The argument was always about platform — 2 billion devices, Neural Engine advantage, privacy as a feature, OS-level integration that no third-party app can match. That thesis didn’t change last night. But it got a lot more urgent.
Google shipped a guest experience on Apple’s platform. They don’t control the Neural Engine. They don’t control the OS. They don’t have access to the Secure Enclave or the system-level frameworks Apple’s own code runs on. What they built is genuinely impressive. But it’s running at a handicap on iOS and they know it.
What Apple can do — and what I believe WWDC 2026 is going to show — is put this capability into the operating system itself. Not an app users choose to download. Not a model you manually install. The intelligence layer integrated into iOS, running on silicon Apple designed specifically for this workload, available to every developer through the Foundation Models framework that already shipped at WWDC 2025.
There’s also now reporting that Apple deepened its deal with Google to integrate Gemini into next-gen Siri. Which reframes the whole competitive dynamic. Apple may not be trying to out-model Google. Apple may run Google’s best models on Apple’s hardware, inside Apple’s privacy stack, with Apple’s UX on top. That’s not losing to Google. That’s using Google.
What I Actually Think
I’ve been saying for a while that the on-device AI moment was coming. That the models would shrink fast enough to matter. That the inference would be fast enough to be practical. That privacy-sensitive users — which is most people, when you ask them — would want this.
I was right about all of that. I was probably wrong about the timeline.
I thought we were 12 to 18 months from holding this in our hands. Last night I was holding it in my hands at 2am, asking a locally running 4B parameter model to do agentic tasks on my iPhone, and it was doing them.
The era of ambient, private, on-device intelligence didn’t arrive with a keynote. It didn’t arrive with a press release. It arrived as a free app in the App Store on a Tuesday, and most people scrolled right past it.
I didn’t sleep much. Worth it.