Austin Z. Henley

Associate Teaching Professor
Carnegie Mellon University

azhenley@cmu.edu
@austinzhenley
github/AZHenley

A theory of how developers seek information

4/24/2021

See the discussion of this post on Hacker News.

Researchers have been developing a theory, Information Foraging Theory (IFT), of how people seek information, whether it be on the web, in a filing cabinet, or even in source code. It follows a metaphor that stems from animals looking for food in the wild.

The basics

A person, known as the predator, seeks information, known as the prey, in an information environment made up of patches, which could be code files, program output, log data, a stack trace, debugger information, etc. The predator navigates within and between patches using links (e.g., a shortcut to jump to a function's definition or a button in a menu) that have some cost (e.g., effort and time) until the predator's information goals are satisfied.

In a patch there are information features (e.g., words and graphics), which may include the prey. Each information has a value to the predator (possibly a zero value!) and a cost (e.g., the time to process the information). Some information features act as cues that provide a clue as to what a link may lead to.

An example: VS Code

Above you can see a screenshot of VS Code with some of the information patches annotated. The editor displays a lot of information. The annotations are highlighting the file listing, an outline view of the current code file, a small region of a code file, the output from the compiler, and a minimap of the current file.

I can navigate within the current code file by scrolling or using Ctrl+F. I can navigate to other patches by clicking on other tabs, buttons, or invoking the Go to Definition shortcut. Furthermore, I could produce new patches by performing a search that will yield a listing of results or by enabling the debugger to display runtime information.

Deciding where to forage

The predator always has three choices: forage within a patch, navigate between patches, or enrich the environment. Foraging within the patch involves processing the information features at their current location. Navigating between patches involves traversing a link to go to a different patch that can hopefully satisfy the predator's goals. The third option involves the predator changing the environment, such as bookmarking a code location or performing a text search, which produces a new patch with search results.

The predator is always trying to maximize the amount of information gained relative to the cost. Given that v is the value and c is the cost, then the optimal choice = max ^v⁄_c.

However, the predator often does not know the value or cost of foraging, so they must make decisions based on their own estimations. That is described as selected choice = max ^E(v)⁄_E(c).

These expected value and cost estimates are based on information scent, which is their perceived chance of finding relevant information based on information features and cues. At any given point, the predator is following the path with the highest scent to maximize the expected value gained while minimizing the cost of foraging.

How predators evaluate links is formalized as:

Image from David Piorkowski's dissertation.

The predator is seeking G and has access to L links. Each link has a set of associated cues J_l where W_j is the amount of attention given by the predator, and S_ji is the scent.

If you want to know more about the formal theory, take a look at Pirolli's book, Information Foraging Theory: Adaptive Interaction with Information (Amazon).

Interesting research findings

Over a dozen studies have been published in the last fifteen years that look at IFT in the context of software development. Below are summaries and links to papers that I found particularly interesting:

Developers overestimate the value and underestimate the cost of foraging: about 50% of navigations yielded less information than expected and 40% required more effort than predicted.
- Foraging and navigations, fundamentally: developers' predictions of value and cost
A predictive model based on IFT can accurately predict where a developer will navigate next.
- Reactive information foraging: an empirical investigation of theory-based recommender systems for programmers
Developers spend about 50% of their time foraging for information during debugging tasks.
- The whats and hows of programmers' foraging diets
Novice developers have considerable difficulty in foraging among different versions of the same code.
- Foraging Among an Overabundance of Similar Variants
Foraging behavior is drastically different based on whether the developer is trying to comprehend code versus trying to fix a bug.
- To fix or to learn? How production bias affects developers' information foraging during debugging

There are endless implications for development tools from these studies. In fact, I've recently written about a tool for navigating code (see Navigate your code like it's 2021) and a tool for viewing older versions of code (see Why is it so hard to see code from 5 minutes ago?).

There is much more work to do. Happy foraging!

There are Amazon affiliate links on this page.