An Unsolicited Streaming App Spec

I subscribe to a lot of streaming video services, and that means I use a lot of streaming video apps. Most of them fall short of my expectations. Here, then, is a simple specification for a streaming video app. Follow it, and your app will be well on its way to not sucking.

This spec includes only the basics. It leaves plenty of room for apps to differentiate themselves by surprising and delighting their users with clever features not listed here. But to all the streaming app developers out there, please consider covering these fundamentals before working on your Unique Selling Proposition.

Obviously, even a list of the most rudimentary features can’t help but be opinionated. Though my tastes have surely influenced this list, I really do think that any streaming app that fails to implement nearly all of these features is failing its users. Again, these are not frills. These are the bare-bones basics.

Launch Experience

On launch, it must be immediately obvious how to resume watching whatever the user was watching previously. This may be the most important feature outside the video player itself.

If the user was in the middle of watching an episode of a TV show, the most prominent thing on the screen should be a way to continue that episode. If the user just finished an episode, then “resuming” means watching the next episode, and so on.
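For developers, this rule comes down to one small decision. Here is a minimal Python sketch. It assumes a hypothetical `Episode` record with a `finished` flag that the player sets when an episode plays to the end. How an app decides an episode is "finished" (for example, when the credits start) is left open here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Episode:
    season: int
    number: int
    duration: float          # seconds
    position: float = 0.0    # seconds watched so far
    finished: bool = False   # hypothetical flag, set when playback reaches the end


def resume_target(episodes: list[Episode], last_played: Episode) -> Optional[Episode]:
    """Pick what the most prominent "resume" button on the launch screen should play.

    `episodes` is assumed to be sorted in (season, episode) order.
    """
    if not last_played.finished:
        # Mid-episode: continue the same episode where the user left off.
        return last_played
    idx = episodes.index(last_played)
    # Finished: "resuming" means the next episode, if there is one.
    return episodes[idx + 1] if idx + 1 < len(episodes) else None
```

A `None` result, at the end of a series, is the point where a shortcut such as jumping back to the home screen would take over.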

Resuming exactly where the user left off—for example, launching into the video player, paused at the exact moment the user stopped watching—is also acceptable, provided it is made obvious that this has happened. Launching into a completely black video playback screen is not a good experience.

(I am ignoring user profiles for now—that’s how basic this specification is. But a good app should support profiles in some way, and this may add a step for the user to select their profile before getting to the point where they can resume viewing.)

Information Architecture

Expose and support the intrinsic information hierarchy of the media. TV shows have seasons. Seasons have episodes. Episodes are made by people (actors, writers, directors). Whatever other ways an app chooses to slice and dice the media it vends, it must also support the simple hierarchy that is most likely to match the user’s mental model.

This hierarchy should exist both visually and navigationally. From an episode of a TV show, it should be obvious how to go up in the hierarchy to the season that the episode exists within, and from there to the list of seasons in the show, and then perhaps down into another season, then down into an episode of that season, and so on.

Though it’s often desirable to take shortcuts when navigating (e.g., to jump back to the home screen after completing the final episode of a TV series), that doesn’t mean the hierarchy shouldn’t exist at all. A shortcut is a way to skip levels in the hierarchy, not a way to erase it from the app entirely.

State Preservation

Keep track of what the user has done, and when. Which things has the user watched? Were they watched entirely or partially? How many times has something been watched? Were any parts skipped? This information is crucial for the functionality of the app, and it should be treated as precious. Preserve this state the same way a text editor preserves typed characters. Sync it across all instances of the app.
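To make that concrete, here is one way the per-item state might be shaped and then merged when two devices sync. This is a sketch under stated assumptions. `WatchRecord` and `merge` are hypothetical names, and the merge policy is just one reasonable choice: the most recent playhead wins, skipped ranges are unioned, and completion counts take the maximum.

```python
from dataclasses import dataclass, field


@dataclass
class WatchRecord:
    item_id: str
    position: float = 0.0    # seconds into the video
    completions: int = 0     # how many times it was watched to the end
    skipped: set[tuple[float, float]] = field(default_factory=set)  # (start, end) seconds skipped
    updated_at: float = 0.0  # timestamp of the last change, e.g. from time.time()

    @property
    def status(self) -> str:
        if self.completions > 0:
            return "watched"
        return "partial" if self.position > 0 else "unwatched"


def merge(a: WatchRecord, b: WatchRecord) -> WatchRecord:
    """Combine one item's records from two devices without losing anything."""
    if a.item_id != b.item_id:
        raise ValueError("records describe different items")
    newer = a if a.updated_at >= b.updated_at else b
    return WatchRecord(
        item_id=a.item_id,
        position=newer.position,                        # last writer wins for the playhead
        completions=max(a.completions, b.completions),  # never let the count go backwards
        skipped=a.skipped | b.skipped,                  # keep every range either device saw
        updated_at=newer.updated_at,
    )
```

Taking the maximum of the completion counts avoids double-counting when both copies grew out of the same synced state. The cost is that it undercounts if the same item really was finished on two devices while both were offline.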

Visual Communication

The things the app knows should be communicated visually to the user. When viewing a list of episodes, put something on the screen to indicate which ones have been viewed and which ones haven’t. Consider showing a user’s progress within an episode as well. No one likes visual clutter, but a simple progress bar (for example) can show both of these things in a single, slim interface element.
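As a rough illustration of the single-element idea, here is a text-mode progress bar built with a hypothetical `progress_bar` helper. It assumes the app knows each episode's position and duration in seconds. The clamp makes sure a barely started episode never looks unwatched and a nearly finished one never looks complete.

```python
def progress_bar(position: float, duration: float, width: int = 10) -> str:
    """Render a slim bar: empty means unwatched, full means watched, anything else is partial.

    Assumes width >= 2 so a partial bar always has room for at least one of each cell.
    """
    fraction = 0.0 if duration <= 0 else min(max(position / duration, 0.0), 1.0)
    filled = round(fraction * width)
    if 0 < fraction < 1:
        # Keep partial progress visually distinct from both "unwatched" and "watched".
        filled = min(max(filled, 1), width - 1)
    return "█" * filled + "░" * (width - filled)
```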

Similarly, when video is playing, it should be possible to find out what, exactly, is being played. The most straightforward way to do this is to show some text when the video is paused that identifies the TV show, season number, and episode number.

The user has questions, and the app has the answers. It need only communicate them. What am I watching? How long is it? How much time is left? What is the name of this actor? What year was this movie made? When will the next episode of this TV show be released? Was this TV show cancelled? And on and on. This information is useless if it’s not exposed in the interface. Visual elements—well-placed in a sensible information hierarchy—are the key to solving this problem.

Video Player

The following playback controls must be one tap/click away and must have large, obvious targets.

The following playback controls must be accessible without leaving the video player. They may be more than one tap/click away.

The following information must be accessible without leaving the video player.

There must be a way to pause the video and get an unobstructed view of a still frame. That means no playback controls on top of the video and no dimming or tinting of the video frame. It’s fine if it takes a few taps to get to this state, but it must be possible.

When a video ends, there must be a way to go to the next video, assuming there is an obvious choice for this (e.g., the next episode in a TV show).

My List

There must be a way for the user to manually create a list of media. In the common case, this is a list of media that the user intends to watch (eventually), but it can be used for any purpose. The important part is that the user makes the list intentionally. Nothing gets added to this list automatically.

At a minimum, the list must accept top-level items in the hierarchy (e.g., TV shows, movies). The list could also accept more granular items, like individual TV episodes.

This is the one feature that may seem the least “basic,” but it really is essential. There’s so much good content available today that we need our apps to help us keep track of it all, not just what we’re currently watching. If state preservation and visual communication are the app’s short-term memory, then “My List” is the app’s long-term memory.

A Low Bar

This is a pretty boring list, huh? A streaming app with only these features seems like it would be quite limited. But the sad fact is that few, if any, popular streaming apps reach even this extremely low bar. Let’s take a look at some examples.

Netflix (iOS)

The last thing I did in the app was watch part of an episode of a TV show. On launch, after selecting my user profile, the show I was in the middle of watching is not visible anywhere on the screen. The “Continue watching for John” section, several screens lower down, contains buttons to resume many other shows, but not the one I was just watching. (Maybe it’s because I started watching it from “My List”? Who knows?)

When playing video, there is no way to toggle subtitles on and off with a single tap. (It takes three taps to turn them on and another three to turn them off.) There is also no way to skip to the beginning other than dragging the scrubber manually.

Pausing the video shows the season number, episode number, and title, but not the name of the TV show.

The duration of the video is not shown anywhere unless the video has just started. To get the duration, the user must add the time remaining (displayed at the end of the timeline) to the current play position (displayed when the scrubber is “grabbed” by holding a finger down on it).
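The arithmetic the viewer is forced to do is trivial for the app, which already knows both numbers. Here is a sketch in Python with made-up times for illustration. `parse_clock` and `format_clock` are hypothetical helpers. Stripping a leading `-` is an assumption in case the player shows time remaining as a negative value.

```python
def parse_clock(text: str) -> int:
    """Convert "H:MM:SS" or "MM:SS" to seconds, ignoring a leading minus sign."""
    seconds = 0
    for part in text.lstrip("-").split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def format_clock(seconds: int) -> str:
    """Format seconds as "H:MM:SS", or "M:SS" when under an hour."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02}:{s:02}" if h else f"{m}:{s:02}"


# The viewer's mental math: current position + time remaining = duration.
elapsed, remaining = "18:07", "-23:41"
print(format_clock(parse_clock(elapsed) + parse_clock(remaining)))  # 41:48
```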

Though there is limited access to the intrinsic hierarchy of the media (e.g., I can go from watching an episode of a TV show to a list of episodes in the current season), it is incomplete, and it does not expose all the available information. For example, there is no obvious way to get from the video player to the episode list and then to a detail screen for an individual episode that shows things like the cast and the date it was released. Instead, the video must be “closed,” which may lead to an episode detail page, provided that’s where you started when navigating to the episode in the first place. The information hierarchy, such as it exists, is quite a muddle, and it only sporadically intersects with the navigation hierarchy.

HBO Max (iPadOS)

The last thing I did in the app was watch the latest episode of a TV show. On launch, a promo for a show I have never watched fills most of the screen, and a small “Continue Watching” section is partially visible at the very bottom. It shows an episode of a TV show that I have already finished watching (complete with an entirely full progress bar) and a movie I skipped into the middle of to check something several months ago. The TV show I was watching is not listed, even though the only thing I’ve done in the HBO Max app for the past week is watch episodes of this show.

When playing video, there is no way to toggle subtitles on and off with a single tap. (It takes three taps to turn them on and another three to turn them off.)

The duration of the video is not shown anywhere unless the video has just started. To get the duration, the user must add the time remaining (displayed at the end of the timeline) to the current play position (displayed at the start of the timeline).

Disney+ (Apple TV)

The last thing I did in the app was watch part of an episode of a TV show. On launch, after selecting my user profile, the show I was in the middle of watching is not visible anywhere on the screen. I had to scroll down two rows to get to the “Continue Watching” section, where my episode was listed.

When playing video, there is no way to toggle subtitles on and off with a single action. Instead, I have to swipe down to display a menu of options, swipe over to subtitles, swipe down to pick a language, and click to select it—then do the same steps again to turn subtitles off.

I could not find a way to get from the video player to either an episode list or a detail page for the episode I’m watching. Like the Netflix app (and many others), the relationship between the information hierarchy and the navigation hierarchy is tenuous at best.

This is not an exhaustive exploration of any of these apps, let alone all streaming apps. And I’m sure some people will quibble with the particulars of my spec. For example, why place so much emphasis on quick access to subtitles? (It’s because being able to quickly skip backwards and briefly enable subtitles is something I do frequently, both on my own and at the request of others. Though keeping subtitles on all the time is surely the most common use case, briefly enabling them to clarify a few lines of dialogue is a close second.)

And, yes, I know that there are often other, “better” ways to accomplish these tasks in some apps on some platforms. For example, I can hold down the microphone button on my Apple TV remote and say “enable subtitles” or “disable subtitles” and it will usually work. Better still, I can ask “What did he say?” and the Apple TV will skip backwards, enable subtitles, play for a short duration, and then disable subtitles again, all on its own. Surprise and delight!

But none of this changes the overall picture, which is that even the most popular, well-funded streaming video apps fail to get the basics right in a shocking number of ways. Conflicting incentives surely explain some of these failings (e.g., promoting new content rather than letting me quickly resume what I was already watching), but an explanation doesn’t make these shortcomings any less bothersome.

And then there are the gaps that seem unmotivated. Is there really no room on a giant iPad or TV screen to show me the name of the TV show I’m watching when the video is paused? Why is it so hard to go from viewing an episode of a TV show to a list of episodes for that show? Why is there sometimes no way other than voice control to enable subtitles or change the audio track while watching a video? There’s plenty of low-hanging fruit waiting to be picked.

From Good to Great

I tried to limit myself to the basics to prove a point, but a vast world of good ideas lies just beyond them. Many are simple, proven techniques, like remembering which option a user picked from a menu last time and bubbling it up as the top choice. Or adding (gasp!) settings that let users configure features to their liking: how many seconds forward or backward the skip buttons travel, or which subtitle or audio track is on by default, perhaps with per-show customizations.
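Both techniques are small. Here is a sketch of each, using hypothetical names. `order_choices` bubbles the previous pick to the top of a menu. `settings_for` layers per-show overrides over global defaults with Python's `collections.ChainMap`.

```python
from collections import ChainMap
from typing import Optional


def order_choices(options: list[str], last_choice: Optional[str]) -> list[str]:
    """Move the user's previous pick to the top; keep everything else in its original order."""
    if last_choice in options:
        return [last_choice] + [o for o in options if o != last_choice]
    return list(options)


def settings_for(show_id: str, global_prefs: dict, per_show: dict[str, dict]) -> ChainMap:
    """Read-only view: a per-show override wins, anything else falls back to the global setting."""
    return ChainMap(per_show.get(show_id, {}), global_prefs)


menu = order_choices(["Off", "English", "Spanish", "French"], last_choice="Spanish")
prefs = settings_for("some-show", {"skip_seconds": 10, "subtitles": "Off"},
                     {"some-show": {"subtitles": "English"}})
```

Here `menu` starts with "Spanish", and `prefs` reports English subtitles for that one show while keeping the global 10-second skip.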

And if you think this spec is just a list of my personal preferences, I can assure you that list is much longer. To give just one example, I wish every streaming app had a way to advance forward and backward by a single frame at a time. Trying to precisely manipulate the play/pause button or the timeline scrubber to get to the exact frame where I can read some bit of background text is not a game I enjoy playing. (Laggy, unresponsive apps make this even worse.)
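The math behind single-frame stepping is simple: one frame lasts `1/fps` seconds. This sketch uses a hypothetical `step_frame` function with positions in seconds. It works in exact fractions because common frame rates like "23.976 fps" are really 24000/1001, which floating-point math can only approximate. A real player would hand the resulting time to its own seek API.

```python
import math
from fractions import Fraction

FPS_23_976 = Fraction(24000, 1001)  # the film-derived rate usually labeled 23.976


def step_frame(position: Fraction, fps: Fraction, frames: int = 1) -> Fraction:
    """Return the start time of the frame `frames` away from the frame containing `position`."""
    current = math.floor(position * fps)  # index of the frame currently on screen
    target = max(current + frames, 0)     # never step before the first frame
    return target / fps


ten_ahead = step_frame(Fraction(0), FPS_23_976, 10)
back_again = step_frame(ten_ahead, FPS_23_976, -10)  # exactly 0, with no drift
```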

Also consider creating interface elements that are reusable. A good control for filtering and sorting lists, for example, could be used in many places within a streaming app. (Most offer no sorting options at all, which is criminal.) The same goes for iconography for status and actions: standardize it, and use it everywhere. It’s a sad state of affairs when the original TiVo on-screen interface bests most modern streaming apps in terms of predictability, legibility, and consistency.

And let’s not forget the tried-and-true practice of stealing features from competitors. How has no one yet copied Amazon’s X-Ray feature? Why doesn’t Apple TV+ have any way to manually curate a list of TV shows like seemingly every one of its competitors? Why don’t more apps provide multiple organizational views of the same content like the Disney+ app does? (E.g., release order vs. chronological order for movie series.)
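Multiple views of the same content are, underneath, just different sort keys over one list. A table of named sort keys is also the kind of reusable piece a shared sorting control could sit on top of. Here is a sketch using four Star Wars films, with hypothetical `Film`, `VIEWS`, and `view` names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Film:
    title: str
    released: int     # release year
    story_order: int  # position in the in-universe timeline (episode number)


# Each named view is just a sort key; a UI control could list these names.
VIEWS = {
    "Release order": lambda f: f.released,
    "Chronological": lambda f: f.story_order,
}


def view(films: list[Film], name: str) -> list[Film]:
    return sorted(films, key=VIEWS[name])


FILMS = [
    Film("Return of the Jedi", 1983, 6),
    Film("The Phantom Menace", 1999, 1),
    Film("A New Hope", 1977, 4),
    Film("The Empire Strikes Back", 1980, 5),
]
```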

Most streaming apps aim for mass-market appeal, so they can’t get too complex. But today, they’re at the far opposite end of the spectrum, missing basic functionality rather than being bogged down with fancy features and customization. These apps need to walk before they can run. I hope, someday, at least one or two of them can fly.