Validating key use cases and users’ mental model of storage.
I’m working on a macOS app called Pockets.
It’s an AI-enabled floating-pocket productivity tool for multitasking.
A place to create notes and todos, and to store files, images, and text snippets, with smart analysis and categorization powered by GPT Vision.
The UI is still in development. I need to validate whether users, on first contact and without guidance, understand and can complete three basic use cases. The last task tests a crucial product assumption about local file storage inside the app.
Productivity tool
for macOS
UI & Onboarding
in development
AI Enabled
smart features
New interaction model with floating pockets and local storage.
Pockets combines a new interaction model (floating pockets) with its own local storage. With new concepts, there’s a risk users won’t discover the core controls or will form a different mental model of where their data lives and how to work with it next.
Goal of the experiment
Validate discoverability, first-use usability, and the mental model across three core use cases in a functional build:
Creating a Todo task
inside a Pocket
Opening and closing a Pocket
from the Dashboard
Sending an inserted file
to another person via email or Teams (storage test)
“Can target users, during their first use of Pockets and without guidance, complete three core use cases, and does the current UI communicate the intended mental model of local in-app file storage?”
H1
At least 80% of participants will complete each task fully successfully without help.
H2
The file-sharing task will reveal gaps in the storage mental model. Given the file-centric macOS paradigm where users typically manage and share files via Finder and system folders like Downloads/Documents, some participants will first search in Finder or Downloads, not in Pockets.
H3
Task 1 (create a Todo) will be the fastest and most successful because it builds on familiar todo patterns. Task 2 (open and close a Pocket from the Dashboard) will be moderately difficult due to managing the state of a floating element. Task 3 (insert and share a file) will be the slowest and most error-prone due to discovering the file-insert flow and expectations about a system file location outside the app.
For the product
Validate whether local in-app storage works as a clear default.
A scorecard for three MVP-critical use cases and a clear priority list for fixes.
Inputs for onboarding design.
For users
Faster onboarding, improved UX.
Increased adoption and reach.
Moderated user testing (formative usability test) on a functional macOS build.
Why this method fits
Measures discoverability and first-use success in real interaction.
Captures mental models, hesitation, and recovery strategies.
Single-condition usability test with three scripted tasks.
Variable control
Same build, same environment, same tasks for everyone.
No prior explanation of controls or storage.
Light think-aloud to capture mental models.
Minimal help, always logged (so success is not inflated).
What I evaluate
Performance and error types across the three tasks, especially where mental models break.
macOS users who handle multiple projects, tasks, and files daily
Freelancers, students, product and creative roles. Some with self-reported ADHD or organizational difficulties.
~10
number of participants
Daily Mac use, multi-project workflow, works with todos and files
Inclusion
Non-macOS users; no regular work with todos and files
Exclusion
Classmates, colleagues, build-in-public threads, design & dev communities
Recruitment
Metrics & Methods
Session flow
15–20 min per participant
remote, moderated call
Intro, consent, brief background (2 min)
Task 1 (Todo) with light think-aloud, measure time, errors, steps (3–4 min)
SEQ (30 s)
Task 2 (Dashboard open/close) (3–4 min)
SEQ (30 s)
Task 3 (Storage + share) (6–7 min)
SEQ (30 s)
UMUX/SUS + open questions (3–4 min)
Qs before the experiment
Whether it makes sense to add a hybrid model (local storage + a visible system location) or simply strengthen storage communication in the UI/onboarding.
What should be clear from the core UI vs. what is acceptable to explain in onboarding.
“It’s confusing as f*ck, I’m telling you right away.”
– A participant who does not work in technology.
Participants
8 macOS users from knowledge-work roles
software engineers, UX designers, a medical doctor, productivity geeks, …
Each participant completed 3 tasks
Todo, Dashboard, file workflow
Metrics captured
Task success (4 levels)
Fully successful / minor issues / partial / fail
Time
Time on task and number of steps
Errors (E1–E4)
E1 (Missed affordance) / E2 (Wrong mental model) / E3 (Unclear label/icon) / E4 (Unreadable system state)
Self-report
SEQ after each task, UMUX-lite at the end
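As an illustrative sketch only (not the study's actual tooling), the captured metrics could be logged and aggregated per task like this, reusing the study's own success levels and E1–E4 error taxonomy; all names here are hypothetical:

```python
from dataclasses import dataclass, field

# Success levels from the study: fully successful / minor issues / partial / fail
LEVELS = ("full", "minor", "partial", "fail")
# Error taxonomy from the study: E1 missed affordance, E2 wrong mental model,
# E3 unclear label/icon, E4 unreadable system state
ERROR_CODES = ("E1", "E2", "E3", "E4")


@dataclass
class TaskResult:
    participant: str
    task: int            # 1 = Todo, 2 = Dashboard, 3 = file workflow
    success: str         # one of LEVELS
    seconds: float       # time on task
    steps: int
    errors: list = field(default_factory=list)  # subset of ERROR_CODES
    seq: int = 0         # Single Ease Question, 1-7


def success_rate(results, task):
    """Share of participants who completed the task fully, without help (H1)."""
    rows = [r for r in results if r.task == task]
    return sum(r.success == "full" for r in rows) / len(rows)


def avg_seq(results, task):
    """Mean SEQ rating for a task."""
    rows = [r for r in results if r.task == task]
    return sum(r.seq for r in rows) / len(rows)
```

Keeping the raw per-participant rows (rather than only the aggregates) makes it easy to slice errors by type later, e.g. to check whether Task 3 failures cluster on E2 (wrong mental model) as H2 predicts.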
Task 1
Create a Todo in a Pocket
– A user who likes swimming.
Success rate
avg. SEQ
Task 2
Open/Close a Pocket from the Dashboard
– A coffee lover.
Success rate
avg. SEQ
Task 3
Insert and Share a File
– A participant who does not work in technology.
Success rate
avg. SEQ
H1
At least 80% of participants will complete each task fully successfully without help.
H2
The file-sharing task will reveal gaps in the storage mental model. Given the file-centric macOS paradigm where users typically manage and share files via Finder and system folders like Downloads/Documents, some participants will first search in Finder or Downloads, not in Pockets.
H3
Task 1 (create a Todo) will be the fastest and most successful because it builds on familiar todo patterns. Task 2 (open and close a Pocket from the Dashboard) will be moderately difficult due to managing the state of a floating element. Task 3 (insert and share a file) will be the slowest and most error-prone due to discovering the file-insert flow and expectations about a system file location outside the app.
UMUX-Lite Results
“This app meets most of what I’d need from a tool like this.”
Average score: 5/7.
“This app is generally easy to use.”
Average score: roughly 4/7.
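For context, UMUX-Lite items are conventionally rescaled to a 0–100 scale, and a commonly cited regression (Lewis, Utesch & Maher, 2013) maps that onto the SUS scale. A quick sketch, assuming the item averages shown above (5/7 and roughly 4/7):

```python
def umux_lite(usefulness: float, ease: float) -> float:
    """UMUX-Lite: two 7-point items rescaled to 0-100.

    ((item1 - 1) + (item2 - 1)) / 12 * 100
    """
    return ((usefulness - 1) + (ease - 1)) / 12 * 100


def sus_equivalent(umux: float) -> float:
    """Commonly cited regression onto the SUS scale: 0.65 * UMUX-Lite + 22.9."""
    return 0.65 * umux + 22.9
```

With the reported averages, `umux_lite(5, 4)` comes out around 58/100, roughly 61 on the SUS scale, below the commonly cited SUS benchmark of 68, which is consistent with the summary below: the concept scores on usefulness, but ease of use drags the overall score down.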
Summary
Strong concept potential, but current UI demands too much cognitive effort for a plug-and-play productivity tool.
Tons of actionable insights that are feasible to ship, and clear interest in the product from the intended user base. Light think-aloud proved super valuable. User interactions and the follow-up discussion sparked more curiosity about AI×Human experiences and how UI design changes in the age of AI agents.
The good.
Steps weren’t defined tightly enough, variable control wasn’t strict enough, and the time allocation was too tight. Only 8 participants, in a remote setting, with an active need to prompt the light think-aloud. It’s also still unclear whether to stay strict with the ideal participant segment or keep designing around the personal product vision, since users tended to project needs from other apps onto Pockets.
The bad.
Interested in testing Pockets?
I’d love to hear your feedback!
Pockets is in unsigned alpha testing with a bring-your-own OpenAI API key.