Chapter 16: Evaluation: Inspections, Analytics & Models

Welcome to Last Minute Lecture.

This free chapter overview is designed to help students review and understand key concepts.

These summaries supplement, not replace, the original textbook and may not be redistributed or resold.

For complete coverage, always consult the official text.

Welcome to the Deep Dive.

Our mission is always to give you the shortcut to being well informed, and today we are tackling something really, really useful.

How do you evaluate a product's usability without a single live user in the room?

That is the core question, isn't it?

And it's so important for being efficient.

We're diving into the source material that covers these non-user evaluation methods.

You can really think of this as a trade-off.

It's the battle between, say, expert judgment on one side and just raw data on the other.

Okay, so our goal here is to really break down the three fundamental pillars that designers use to do this.

We're going to start with inspections, then move on to analytics and A/B testing.

That's all about the numbers, the quantitative layer, and we'll finish up with predictive models, which is where pure mathematics comes in to estimate performance.

And by the end of this, you'll get how things like heuristics, logged data like bounce rates, and even formulas like Fitts' law all come together to help designers spot problems fast.

Exactly.

It's all about spotting those usability issues quickly and thoroughly.

All right, let's start with the human element, then.

Inspections.

When we say inspections, what we're really talking about is leveraging that deep expert knowledge, right?

The researcher is basically role-playing the user.

Precisely.

Inspection methods are fundamentally about domain expertise.

The researchers, they draw on their knowledge of interaction design, of how users typically behave, to find those pain points before a real user ever does.

And I imagine these are most useful early on, or when you're short on time.

Absolutely.

They're indispensable when you're early in the design phase, when maybe real users aren't available yet, or when your time and budget are just really tight.

So the most famous of these methods, the one everyone's heard of, is probably heuristic evaluation, HE.

It is.

And the core concept is using these established usability principles, we call them heuristics, to check if all the interface elements, you know, the dialog boxes, the menus, if they conform to tried and tested standards.

And Jakob Nielsen's 10 usability heuristics are, well, they're the classic starting point.

Now, people sometimes call these principles common sense, but common sense can be the first casualty when you're deep in a project.

So what are the first few that every designer just has to know?

Well, the first one is arguably the most important.

Visibility of system status.

The user just always needs to know what's going on.

They need feedback, like a loading bar or something.

Exactly.

Or just a little "message sent" confirmation.

The second is match between system and the real world.

This just means using language the user understands, not system jargon.

If the system throws up error 404, that's an immediate failure on this heuristic.

That makes total sense.

So a travel app should use travel words, not database terms.

And then we get to this idea of the user feeling like they're in control.

That's number three, user control and freedom.

We need an emergency exit.

Absolutely.

That's your undo, your redo, the big, clear cancel button.

If a user can get into a state, they have to be able to get out of it just as easily.

And that ties right into number four, consistency and standards.

Ah, so if a button looks a certain way on one page, it needs to look the same on all the others.

Yes.

And number five is error prevention.

This one is so key.

A good design should proactively stop problems from ever happening rather than just relying on good error messages after the fact.

Prevention over cure.

Got it.

Okay.

So the rest of them deal with cognitive load and help, things like recognition rather than recall.

So don't make me remember something I saw three screens ago.

Right.

And flexibility and efficiency of use, which lets experts set up shortcuts while still helping out the beginners.

And then the last few are aesthetic and minimalist design.

So no clutter, no irrelevant information.

And the final two are about when things do go wrong: help users recognize, diagnose, and recover from errors.

So plain language, constructive solutions, and finally help and documentation, which should be easy to search and focused on the task.

It really does sound like a checklist, but the source material points out something really interesting about the process.

You don't need a huge team for this.

The data shows that just three to five expert evaluators can find something like 75% of the usability problems.

That seems incredibly efficient.

It is, but, and this is a big but, that efficiency comes with a huge warning sign.

Heuristic evaluation should complement user testing, not replace it.

Why is that?

Because studies show that sometimes evaluators report problems that wouldn't actually happen in the real world.

We call them false alarms.

I was floored by this number in the source material.

It said up to 43% of reported problems in some studies were false alarms, almost half.

What are the consequences?

Well, they're massive.

It's wasted time, wasted money, potential design debt.

You're busy fixing problems that don't exist while the really severe ones, the ones the expert missed are still lurking in your product.

So expert intuition is good, but it needs a guardrail.

Exactly.

It needs stability, which is why having codified, mandated standards is so crucial.

And that leads us right to the gold standard for this, the Web Content Accessibility Guidelines, or WCAG.

Right.

These are probably the best known standards outside of the HCI world, really.

They focus on making content usable by people with disabilities.

And the four main principles are summarized by the acronym POUR.

POUR.

The content has to be perceivable, operable, understandable, and robust.

It's a framework to make sure that no matter what a user's sensory, physical, or cognitive challenges are, the design is still functional for them.

Okay.

So let's move on from heuristics.

What's the other main inspection method?

You mentioned walkthroughs.

Yes, walkthroughs.

Instead of using that broad set of principles, these methods are intensely focused on a very specific task flow.

The most popular one is called the cognitive walkthrough or CW.

And what's the focus there?

The focus is all about the ease of learning.

The researcher simulates the user's problem solving process.

And at every single step of a task, they try to answer a few key questions from the user's perspective.

Let's break those down because they seem really important.

The first one is, will the correct action be evident?

So does the user even know what they're supposed to do next?

Second, will the user notice that the correct action is available?

I mean, can they physically see the button or the link they need to click right now?

And third, will the user interpret the system's response correctly?

After they clicked, do they understand the feedback they just got?

If you can't say yes to all three of those at any given step, boom, you've found a usability problem.
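To make that three-question check concrete, here is a minimal sketch of how a cognitive walkthrough could be recorded and screened. The `StepReview` class, its field names, and the flight-booking steps are all hypothetical illustrations, not anything taken from the source material.

```python
from dataclasses import dataclass


@dataclass
class StepReview:
    """One step of a task, judged against the three walkthrough questions."""
    step: str
    action_evident: bool       # Will the correct action be evident?
    action_noticeable: bool    # Will the user notice the action is available?
    feedback_understood: bool  # Will the user interpret the response correctly?

    def passes(self) -> bool:
        # A step is fine only if all three answers are "yes".
        return (self.action_evident
                and self.action_noticeable
                and self.feedback_understood)


def find_problems(reviews):
    """Return the names of the steps where at least one answer was "no"."""
    return [r.step for r in reviews if not r.passes()]


# Hypothetical walkthrough of a flight-booking task.
reviews = [
    StepReview("enter destination", True, True, True),
    StepReview("pick departure date", True, False, True),
    StepReview("confirm payment", True, True, False),
]
print(find_problems(reviews))
```

Each flagged step is a candidate usability problem that the evaluator would then describe and prioritize by hand.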

Then there's this really specialized one, semiotic engineering inspection, which is all about the interface's communicative power in signs.

How does that work?

This technique looks at the signifying message that the designer is sending.

It breaks down the signs in an interface into three types.

So imagine you're booking a flight online.

All right.

A static sign is something with instant meaning, like the text "economy class" or maybe the letters GMT next to the departure time.

You get it instantly.

Got it.

Instant meaning.

What's a dynamic one?

Dynamic signs communicate meaning over time or through interaction.

Think about picking a date from a dropdown calendar or watching the seating chart update as you click on a section.

The meaning unfolds as you interact.

And the last type.

Metalinguistic signs.

These are explanations about another sign.

On that same booking site, it's the little icon you click that explains why the flight price just changed or why your luggage fee is separate.

It's the designer stepping in to comment on the system.

That's a really useful distinction.

Okay.

And finally, there's the pluralistic walkthrough.

What makes that one different?

The team.

It's multidisciplinary.

You have users, developers, and usability specialists all in the room together.

And the crucial part is that each person writes down their proposed sequence of actions independently first before anyone discusses it.

So you avoid that groupthink effect.

Exactly.

It gives you incredibly detailed focus on the user's task from every possible viewpoint: development, expert, and real world, without any initial bias.

Okay.

So let's shift gears completely.

Moving away from expert judgment and into the purely quantitative world.

Analytics.

This is where we start logging user actions remotely.

Right.

Analytics logs everything.

Keystrokes, mouse movements, how long you spend on a page.

The big advantage is the sheer volume of data you can collect and you do it unobtrusively.

It gives you a big picture view of user behavior that expert inspection just can't provide at that scale.

And for web analytics specifically, we're tracking visitor behavior to optimize the site.

There's on-site analytics, what people do on your site, and off-site, how visible you are on, say, Google.

And if we look at the real world example from the source, the dashboard for id-book.com, you can really see the power of these numbers.

They had, what, 4,723 users and over 2,200 sessions in one period.

Their overall bounce rate was 58.30%.

And just to remind everyone, the bounce rate is the percentage of people who land on one page and then just leave immediately.

Yeah.

So that 58% is pretty typical, but the data showed something much more specific, a real failure point.

Exactly.

When they segmented the data, they found that non-English speaking visitors, particularly from China, had a bounce rate of 82.86%.

Wow.

That huge jump is a massive red flag.

It points to a specific localized usability failure.

It could be a bad translation, a formatting issue, maybe a cultural mismatch.

Analytics tells you where the product is bleeding users, even if it can't tell you exactly why.
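As a rough illustration of the segmentation step described here, the sketch below computes a bounce rate per visitor segment from a session log. The log format (a segment label plus a page count), the `bounce_rates` function, and the sample data are assumptions made up for this example; they are not the data behind the dashboard discussed above.

```python
from collections import defaultdict


def bounce_rates(sessions):
    """Return {segment: bounce rate in percent}.

    `sessions` is an iterable of (segment, pages_viewed) pairs.
    A session counts as a bounce when exactly one page was viewed.
    """
    totals = defaultdict(int)
    bounces = defaultdict(int)
    for segment, pages_viewed in sessions:
        totals[segment] += 1
        if pages_viewed == 1:
            bounces[segment] += 1
    return {seg: 100.0 * bounces[seg] / totals[seg] for seg in totals}


# Hypothetical session log: (browser language, pages viewed in the session).
sessions = [
    ("en", 1), ("en", 4), ("en", 1), ("en", 3),
    ("zh", 1), ("zh", 1), ("zh", 1), ("zh", 2),
]
rates = bounce_rates(sessions)
for segment, rate in sorted(rates.items()):
    print(f"{segment}: {rate:.1f}% bounce rate")
```

A gap between segments like this one only says where to look; finding the cause still takes inspection or user testing.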

And these tools are getting so specific now.

The source mentions learning analytics in online courses, which can track the exact lecture or quiz where students start dropping out.

Which brings us to A/B testing.

This is really the gold standard for doing controlled experiments online.

It's a massive experiment, often with thousands of users, where you compare two versions of a design, A and B, at the same time.

It's a classic between-subjects design.

You mentioned something before we even get to the A/B test, this idea of running an A/A test first.

Why is that so critical?

I mean, you're just testing the same thing against itself.

Right.

It sounds a bit redundant, doesn't it?

But you absolutely have to do it.

You need to make sure that your random population selection is actually working and that the testing conditions themselves, you know, the complex infrastructure of the internet, aren't already skewing your data.

So you're checking for random noise in the system.

Precisely.

If your A/A test shows wildly different results between the two groups, you know your environment is too unstable to trust any data you get from a real design change.
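One way to picture that A/A check is as a statistical comparison of two groups that saw the same design. The sketch below is an assumption-laden illustration: it simulates a random split with an invented 5% conversion rate and applies a standard two-proportion z-test. The source does not say which statistical test to use, and the function names are hypothetical.

```python
import math
import random


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    if n_a <= 0 or n_b <= 0:
        raise ValueError("both groups need at least one user")
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # no variation at all, so no detectable difference
    z = (rate_b - rate_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def simulate_aa(n_users, true_rate, seed=0):
    """Randomly split users into two groups that see the SAME design."""
    rng = random.Random(seed)
    counts = {"A1": [0, 0], "A2": [0, 0]}  # [conversions, users]
    for _ in range(n_users):
        group = rng.choice(("A1", "A2"))
        counts[group][1] += 1
        if rng.random() < true_rate:
            counts[group][0] += 1
    return counts


counts = simulate_aa(n_users=10_000, true_rate=0.05)
z, p = two_proportion_z(counts["A1"][0], counts["A1"][1],
                        counts["A2"][0], counts["A2"][1])
print(f"A/A check: z = {z:.2f}, p = {p:.3f}")
```

A very small p-value on identical designs would suggest the assignment or logging is skewing results, which is the warning sign described above. Because of random noise, a healthy setup will still produce small p-values occasionally, so one run is not proof either way.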

And the risks of getting that wrong are huge.

That Microsoft Office 2007 home page example is the perfect cautionary tale.

They launched a new design hoping to increase downloads, but clicks dropped by 64 percent.

The team was completely baffled.

They thought there must be a huge technical bug.

But after they dug in, they realized the design change had introduced an unintended variable that completely warped what the user was trying to do.

What was it?

The old design had a big, prominent "Try 2007 for free" option.

It was all about evaluation.

But the new design, it prominently featured the price tag: "$149.95, buy now."

Oh, so the new design changed the user's goal from free trial to expensive purchase.

And that one little variable, the sudden appearance of a high price, it just killed the whole experiment.

It invalidated everything.

It's a perfect example of how A/B testing can tell you the what, but you still need that human insight to figure out the why.

Okay, so our final pillar takes us into pure quantification, predictive models.

This is where we use mathematical formulas to estimate user performance without any users or

This is evaluation by math and physics, really.

And the most influential model here is Fitts' law.

Anyone who's ever tried to tap a tiny icon on their phone while a train is shaking understands this law intuitively.

It predicts the time t it takes to reach a target based on the target size, s, and the distance d you have to move to get to it.

That's the core idea.

Make the target bigger or put it closer and the task gets faster and easier.

The actual formula is a bit more complex.

It's t = k log2(d/s + 1), where k is a constant.

But the takeaway is simple and powerful.

It guides designers on where to put buttons and how big to make them.
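As a small worked example of the relationship just described, the sketch below evaluates t = k log2(d/s + 1) for two hypothetical buttons. The value k = 0.2 seconds per bit and the pixel measurements are assumptions chosen for illustration; in practice k is fitted from measurements for a given device and user group.

```python
import math


def fitts_time(distance, size, k=0.2):
    """Predicted time to reach a target: t = k * log2(distance / size + 1).

    `distance` and `size` must use the same unit (pixels here).
    `k` is an assumed constant in seconds per bit, for illustration only.
    """
    if distance < 0 or size <= 0:
        raise ValueError("distance must be >= 0 and size must be > 0")
    return k * math.log2(distance / size + 1)


# Hypothetical layouts: a small, distant button and a large, nearby one.
small_far = fitts_time(distance=700, size=20)
large_near = fitts_time(distance=300, size=100)
print(f"small and far:  {small_far:.2f} s")
print(f"large and near: {large_near:.2f} s")
```

The absolute numbers depend entirely on the assumed k; the useful part is the comparison, which shows the larger, closer target coming out faster.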

This is where the math gets really cool.

The law actually predicts that targets in the four corners of the screen are the fastest to access.

Why is that specific layout so much more efficient?

It's because of something called the pinning action.

When you move your cursor or your finger toward a corner, you can't overshoot it on two sides.

The edges of the screen act as a physical barrier.

Functionally, that makes the target feel infinitely large in those directions, which dramatically cuts down the time it takes to hit it.

So if corners are the fastest, what's a common everyday example where Fitts' law is used to make a target effectively bigger?

A classic one is a labeled tool on a toolbar, like in Microsoft Office.

Even if you recognize the tiny little icon, the text label right next to it becomes part of the clickable area.

That increases the effective size, the s of the target, making it quicker to acquire.

And Fitts' law is adapted for everything now.

Touchscreens, game controllers, you name it.

So what does this all mean for you?

We've covered three really distinct ways to evaluate a design, all without needing to recruit a single user.

We started with inspections.

These are the qualitative, expert-driven methods that rely on codified knowledge, like Nielsen's 10 heuristics or the POUR principles from WCAG.

They're fast, but they can be prone to those false alarms.

Then we move to analytics and A/B testing.

That's the quantitative side, logging huge amounts of data to see what users are actually doing, finding those critical failure points like high bounce rates.

But you have to be so careful with your experimental design.

And finally, predictive models like Fitts' law, using pure math to optimize layout and efficiency, explaining why those corner targets are so easy to hit.

And it's important to remember that expert inspections and user data, they often find different kinds of problems.

A mix is always best.

Which brings us to a final question for you to think about.

Given that we know inspection methods can have high rates of expert bias and those false alarms, should design teams start prioritizing the hard, quantifiable data from analytics and A/B testing more?

Or is there still some irreplaceable value in the deep targeted insight, the why, that only a human expert can really provide?

It's that constant balancing act, isn't it?

Between the quality of the analysis and the sheer volume of the data, something to mull over as you start applying these ideas to your own work.

Thanks for tuning in for this deep dive from the Last Minute Lecture team.

We'll catch you next time.

ⓘ This audio and summary are simplified educational interpretations and are not a substitute for the original text.

Chapter Summary

What this audio overview covers

Evaluating interactive systems without direct user participation requires alternative approaches that leverage expert judgment, automated data collection, or mathematical prediction. Three primary evaluation strategies (inspections, analytics, and predictive modeling) enable designers and researchers to identify usability problems, understand user behavior patterns, and forecast performance when traditional user testing is impractical or resource-constrained.

Inspection methods depend on expert evaluators who systematically examine interface designs against established principles or by simulating user tasks. Heuristic evaluation, formalized by Jakob Nielsen, applies a set of recognized usability standards such as system visibility, user control, design consistency, and minimal cognitive demand to detect interface flaws across components like navigation systems and dialog structures. Accessibility-focused inspection builds on specialized frameworks like the POUR principles from Web Content Accessibility Guidelines, specifically designed to serve users with various disabilities. Cognitive walk-throughs take a more detailed approach by tracing the sequence of steps a user would follow to complete a specific task, then assessing whether each action is obvious, perceptible, and correctly interpreted through system feedback. Pluralistic walk-throughs extend this method by assembling diverse stakeholders (end users, developers, and evaluators) to collaboratively step through task scenarios and share observations. Semiotic engineering evaluation examines how effectively a design communicates meaning through static elements, dynamic interactions, and metacommunicative signs.

The analytics approach automates the collection and visualization of user interaction data at scale, capturing metrics such as keystroke sequences and navigation patterns. Web analytics platforms measure traffic volume, page viewing frequency, and bounce rate metrics to inform optimization decisions, while learning analytics apply similar tracking in educational platforms. A/B testing extends analytics through controlled experimentation, randomly distributing large user populations across alternative designs and using statistical comparison to measure effects on measurable outcomes such as engagement or conversion rates.

Predictive models forecast user performance using mathematical formulas, with Fitts' law being the most influential in interaction design by quantifying the relationship between target distance, size, and the time required to acquire it, thereby guiding decisions about button placement and sizing in physical and digital interfaces.
