Validating key use cases and users’ mental model of storage.
I’m working on a macOS app called Pockets.
It’s an AI-enabled floating-pocket productivity tool for multitasking.
A place to create notes and todos, and to store files, images, and text snippets, with smart analysis and categorization powered by GPT Vision.
The UI is still in development. I need to validate whether users, on first contact and without guidance, understand and can complete three basic use cases. The last task tests a crucial product assumption about local file storage inside the app.
Productivity tool
for macOS
UI & Onboarding
in development
AI Enabled
smart features
New interaction model with floating pockets and local storage.
Pockets combines a new interaction model (floating pockets) with its own local storage. With new concepts, there’s a risk users won’t discover the core controls or will form a different mental model of where their data lives and how to work with it next.
Goal of the experiment
Validate discoverability, first-use usability, and the mental model across three core use cases in a functional build:
Creating a Todo task
inside a Pocket
Opening and closing a Pocket
from the Dashboard
Sending an inserted file
to another person via email or Teams (storage test)
“Can target users, during their first use of Pockets and without guidance, complete three core use cases, and does the current UI communicate the intended mental model of local in-app file storage?”
H1
At least 80% of participants will complete each task fully successfully without help.
H2
The file-sharing task will reveal gaps in the storage mental model. Given the file-centric macOS paradigm where users typically manage and share files via Finder and system folders like Downloads/Documents, some participants will first search in Finder or Downloads, not in Pockets.
H3
Task 1 (create a Todo) will be the fastest and most successful because it builds on familiar todo patterns. Task 2 (open and close a Pocket from the Dashboard) will be moderately difficult due to managing the state of a floating element. Task 3 (insert and share a file) will be the slowest and most error-prone due to discovering the file-insert flow and expectations about a system file location outside the app.
For the product
Validate whether local in-app storage works as a clear default.
A scorecard for three MVP-critical use cases and a clear priority list for fixes.
Inputs for onboarding design.
For users
Faster onboarding, improved UX.
Increased adoption and reach.
Moderated user testing (formative usability test) on a functional macOS build.
Why this method fits
Measures discoverability and first-use success in real interaction.
Captures mental models, hesitation, and recovery strategies.
Single-condition usability test with three scripted tasks.
Variable control
Same build, same environment, same tasks for everyone.
No prior explanation of controls or storage.
Light think-aloud to capture mental models.
Minimal help, always logged (so success is not inflated).
What I evaluate
Performance and error types across the three tasks, especially where mental models break.
macOS users who handle multiple projects, tasks, and files daily
Freelancers, students, product and creative roles. Some with self-reported ADHD or organizational difficulties.
~10
number of participants
Daily Mac use, multi-project workflow, works with todos and files
Inclusion
Non-macOS users; no regular work with todos and files
Exclusion
Classmates, colleagues, build-in-public threads, design & dev communities
Recruitment
Metrics & Methods
Session flow
15–20 min per participant
remote, moderated call
Intro, consent, brief background (2 min)
Task 1 (Todo) with light think-aloud, measure time, errors, steps (3–4 min)
SEQ (30 s)
Task 2 (Dashboard open/close) (3–4 min)
SEQ (30 s)
Task 3 (Storage + share) (6–7 min)
SEQ (30 s)
UMUX/SUS + open questions (3–4 min)
Qs before the experiment
Whether it makes sense to add a hybrid model (local storage + a visible system location) or simply strengthen storage communication in the UI/onboarding.
What should be clear from the core UI vs. what is acceptable to explain in onboarding.
“It’s confusing as f*ck, I’m telling you right away.”
– A participant who does not work in technology.
Participants
8 macOS users from knowledge-work roles
software engineers, UX designers, a medical doctor, productivity geeks, …
Each participant completed 3 tasks
Todo, Dashboard, file workflow
Metrics captured
Task success (4 levels)
Fully successful / minor issues / partial / fail
Time
Time on task and number of steps
Errors (E1–E4)
E1 (Missed affordance) / E2 (Wrong mental model) / E3 (Unclear label/icon) / E4 (Unreadable system state)
Self-report
SEQ after each task, UMUX-lite at the end
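As an illustrative sketch only (not the study's actual tooling), the captured metrics could be logged and aggregated per task like this, reusing the study's own success levels and E1–E4 error taxonomy; all names here are hypothetical:

```python
from dataclasses import dataclass, field

# Success levels from the study: fully successful / minor issues / partial / fail
LEVELS = ("full", "minor", "partial", "fail")
# Error taxonomy from the study: E1 missed affordance, E2 wrong mental model,
# E3 unclear label/icon, E4 unreadable system state
ERROR_CODES = ("E1", "E2", "E3", "E4")


@dataclass
class TaskResult:
    participant: str
    task: int            # 1 = Todo, 2 = Dashboard, 3 = file workflow
    success: str         # one of LEVELS
    seconds: float       # time on task
    steps: int
    errors: list = field(default_factory=list)  # subset of ERROR_CODES
    seq: int = 0         # Single Ease Question, 1-7


def success_rate(results, task):
    """Share of participants who completed the task fully, without help (H1)."""
    rows = [r for r in results if r.task == task]
    return sum(r.success == "full" for r in rows) / len(rows)


def avg_seq(results, task):
    """Mean SEQ rating for a task."""
    rows = [r for r in results if r.task == task]
    return sum(r.seq for r in rows) / len(rows)
```

Keeping the raw per-participant rows (rather than only the aggregates) makes it easy to slice errors by type later, e.g. to check whether Task 3 failures cluster on E2 (wrong mental model) as H2 predicts.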
Task 1
Create a Todo in a Pocket
– A user who likes swimming.
Success rate
avg. SEQ
Task 2
Open/Close a Pocket from the Dashboard
– A coffee lover.
Success rate
avg. SEQ
Task 3
Insert and Share a File
– A participant who does not work in technology.
Success rate
avg. SEQ
H1
At least 80% of participants will complete each task fully successfully without help.
H2
The file-sharing task will reveal gaps in the storage mental model. Given the file-centric macOS paradigm where users typically manage and share files via Finder and system folders like Downloads/Documents, some participants will first search in Finder or Downloads, not in Pockets.
H3
Task 1 (create a Todo) will be the fastest and most successful because it builds on familiar todo patterns. Task 2 (open and close a Pocket from the Dashboard) will be moderately difficult due to managing the state of a floating element. Task 3 (insert and share a file) will be the slowest and most error-prone due to discovering the file-insert flow and expectations about a system file location outside the app.
UMUX-Lite Results
“This app meets most of what I’d need from a tool like this.”
Average score: 5/7.
“This app is generally easy to use.”
Average score: roughly 4/7.
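For context, UMUX-Lite items are conventionally rescaled to a 0–100 scale, and a commonly cited regression (Lewis, Utesch & Maher, 2013) maps that onto the SUS scale. A quick sketch, assuming the item averages shown above (5/7 and roughly 4/7):

```python
def umux_lite(usefulness: float, ease: float) -> float:
    """UMUX-Lite: two 7-point items rescaled to 0-100.

    ((item1 - 1) + (item2 - 1)) / 12 * 100
    """
    return ((usefulness - 1) + (ease - 1)) / 12 * 100


def sus_equivalent(umux: float) -> float:
    """Commonly cited regression onto the SUS scale: 0.65 * UMUX-Lite + 22.9."""
    return 0.65 * umux + 22.9
```

With the reported averages, `umux_lite(5, 4)` comes out around 58/100, roughly 61 on the SUS scale, below the commonly cited SUS benchmark of 68, which is consistent with the summary below: the concept scores on usefulness, but ease of use drags the overall score down.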
Summary
Strong concept potential, but current UI demands too much cognitive effort for a plug-and-play productivity tool.
Tons of actionable insights that are feasible to ship, and clear interest in the product from the intended user base. Light think-aloud proved super valuable. User interactions and the follow-up discussion sparked more curiosity about AI×Human experiences and how UI design changes in the age of AI agents.
The good.
Steps weren’t defined tightly enough, variable control wasn’t strict enough, and the time allocation was too tight. Only 8 participants, in a remote setting, with an active need to prompt the light think-aloud. It’s also still unclear whether to stay strict with the ideal participant segment or keep designing around the personal product vision, since users tended to project needs from other apps onto Pockets.
The bad.
Interested in testing Pockets?
I’d love to hear your feedback!
Pockets is in unsigned alpha testing with a bring-your-own OpenAI API key.