MIT Quest: Prof. Judy Fan

📋 VIDEO INFORMATION

  • Content Type: Academic Talk Review
  • Title: 🎥 MIT Quest: Prof. Judy Fan
  • Series: MIT Quest for Intelligence Colloquium
  • Episode: Cognitive Tools for Making the Invisible Visible
  • Host: Josh Tenenbaum (MIT)
  • Guest: Prof. Judy Fan (Assistant Professor of Psychology, Stanford University)
  • Date: March 20, 2025
  • URL: https://www.youtube.com/watch?v=AF3XJT9YKpM
  • Duration: Approximately 1 hour and 11 minutes

👤 Guest Biography

Prof. Judy Fan is an Assistant Professor of Psychology at Stanford University, where she directs the Cognitive Tools Lab and is affiliated with the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Her research bridges cognitive science, neuroscience, and artificial intelligence to understand how humans use visual tools to think, learn, and communicate. Dr. Fan completed her PhD in Psychology at Princeton University and held a postdoctoral position at Stanford before serving on the faculty at UC San Diego and then joining the Stanford Psychology department.

🎯 HOOK

What if the tools we use to draw diagrams and create graphs are not just ways to communicate what we already know, but actually expand our capacity to think? Prof. Judy Fan argues that cognitive tools like freehand sketches and data visualizations don’t just represent thought; they are thought, enabling humans to see patterns, teach complex mechanisms, and build knowledge across generations in ways that pure language never could. Yet even our most advanced AI models still fail to grasp the nuance of a simple sketch the way a child does.

💡 ONE-SENTENCE TAKEAWAY

Humans wield cognitive tools, particularly visual abstractions like drawings and data visualizations, to externalize thought, accelerate learning, and transmit complex knowledge; but current AI models still exhibit a significant gap in replicating the flexible, context-sensitive way people create and interpret these tools.

📝 SUMMARY

Prof. Judy Fan delivered this colloquium at MIT as part of the Quest for Intelligence series, exploring how humans use “cognitive tools” to make abstract concepts visible and actionable. Introduced as “one of the most creative researchers” by Josh Tenenbaum, Fan bridges visual neuroscience, cognitive science, and artificial intelligence to reverse-engineer humanity’s cognitive toolkit.

🎤 Opening Introduction

The talk begins with an introduction by host Josh Tenenbaum, praising Prof. Fan’s groundbreaking work at the intersection of cognitive science, neuroscience, and AI. Tenenbaum highlights Fan’s background in visual neuroscience and psychophysics, her graduate training, and her appointment as Assistant Professor at Stanford.

Prof. Fan then expresses gratitude for the invitation and shares her affection for the MIT community and its focus on challenging questions that unite signal processing with understanding how humans imagine, achieve, and create.

📚 Historical Context and Research Framework

Fan opens with a historical frame: humanity’s invention of the Cartesian coordinate system as the archetypal “cognitive tool”, a physical representation that transformed mathematical discovery by uniting algebra and geometry. She argues this wasn’t just a new symbol system but a tool that actively expanded human thinking capacity. This leads to her core research mission: understanding what psychological mechanisms enable humans to invent and deploy cognitive tools so effectively for learning, communication, and innovation.

Fan then presents her research framework for studying cognitive tools, encompassing four core activities:

  • Perception: Transforming sensory inputs into meaningful experiences
  • Production: Creating physical markings that encode information
  • Communication: Arranging elements to impact others’ minds
  • Engineering: Applying discovered abstractions to create new things

She traces humanity’s 30,000- to 80,000-year history of making the invisible visible, from cave walls to scientific instruments and representations such as the illustrations of Darwin’s finches, Galileo’s telescope, Ramón y Cajal’s retina drawings, and Feynman’s quantum diagrams.

✏️ Part 1: Drawing as a Cognitive Tool

Fan positions freehand drawing as humanity’s most enduring cognitive tool and presents three lines of work:

  1. Sketch Understanding and Context Sensitivity: In a drawing study, participants sketched target objects under different contextual pressures. When the distractor objects were similar to the target (close trials), sketchers produced detailed, faithful drawings to uniquely identify the target; when the distractors were dissimilar (far trials), they produced sparse, abstract drawings that still conveyed category-level information. A computational model combining a ConvNet visual encoder with a probabilistic decision-making module showed that both visual abstraction capacity and context sensitivity are critical for human-like sketch communication (a toy version of this kind of model is sketched after this list).

  2. Visual Explanations vs. Depictions: PhD student Holly Huey led work investigating how people draw to explain mechanisms versus how they draw to depict appearance. Participants drew novel mechanical contraptions either to explain how they worked or to help someone identify them. The “dissociable hypothesis” won: visual explanations emphasized causal parts and used symbolic elements (arrows, motion lines) while de-emphasizing background details, whereas depictions emphasized overall appearance. Critically, explanations better communicated the mechanism but were worse for object identification, showing that people intuitively know when to sacrifice fidelity for functional clarity.

  3. Benchmarking AI Sketch Understanding: The SEVA benchmark (90,000 sketches from 5,500 people depicting 128 concepts under varying time constraints) revealed a persistent gap between humans and AI models. While CLIP-based models performed best, they still fell short of human recognition patterns and uncertainty estimates. Even CLIPasso, a sketch generation algorithm, diverged from human sparsification strategies under tight production budgets, suggesting fundamental differences in how humans and machines prioritize information.
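
To make the “visual encoder plus probabilistic decision module” idea in item 1 concrete, here is a minimal, hypothetical sketch, not Fan’s actual model: it assumes precomputed ConvNet-style feature vectors for the sketch and the objects in context, scores each object by cosine similarity to the sketch, and passes the scores through a softmax whose `beta` parameter (an invented name) stands in for decision noise.

```python
import numpy as np

def prob_viewer_picks_target(sketch_feat, context_feats, target_idx, beta=5.0):
    """Toy pragmatic viewer: score each object in context by cosine similarity
    to the sketch, then softmax the scores. `beta` is an invented
    decision-noise (inverse-temperature) parameter."""
    sims = context_feats @ sketch_feat / (
        np.linalg.norm(context_feats, axis=1) * np.linalg.norm(sketch_feat) + 1e-8
    )
    weights = np.exp(beta * sims)
    return weights[target_idx] / weights.sum()

rng = np.random.default_rng(0)
target = rng.normal(size=64)                        # stand-in ConvNet features for the target
sparse_sketch = target + 0.8 * rng.normal(size=64)  # an abstract, low-fidelity drawing

# "Far" context: distractors are unrelated, so even a sparse sketch singles out
# the target. "Close" context: distractors resemble the target, so the same
# sparse sketch becomes ambiguous and a more faithful drawing would be needed.
far_context = np.stack([target] + [rng.normal(size=64) for _ in range(3)])
close_context = np.stack([target] + [target + 0.3 * rng.normal(size=64) for _ in range(3)])

print(prob_viewer_picks_target(sparse_sketch, far_context, target_idx=0))    # high
print(prob_viewer_picks_target(sparse_sketch, close_context, target_idx=0))  # near chance
```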

📊 Part 2: Data Visualization as Cognitive Tool

Fan previews emerging work on data visualization, “one of humanity’s more recent inventions for making the invisible visible.” She argues visualizations are superpowers: they let us see patterns too large, noisy, or slow to perceive directly.

The team benchmarked multimodal AI systems (vision-language models, VLMs) against humans on six graph-understanding tests. While models approached human performance on some benchmarks such as Chart-QA, they showed systematic gaps on adversarial plots and failed to replicate human error patterns, suggesting they lack the underlying conceptual primitives that guide human reasoning.
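
As a concrete illustration of what “replicating human error patterns” can mean operationally, here is a minimal sketch, not the team’s actual analysis, that correlates per-item human accuracy with per-item model correctness; the data and function names are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr

def item_level_agreement(human_correct, model_correct):
    """Compute overall human accuracy, overall model accuracy, and the
    correlation between the per-item human difficulty profile and the model's
    per-item correctness. A model can match humans' average accuracy while
    still failing on a different set of items (low correlation).

    human_correct : (n_participants, n_items) binary matrix of human responses
    model_correct : (n_items,) binary vector of model responses
    """
    human_item_acc = human_correct.mean(axis=0)   # per-item accuracy for humans
    r, p = pearsonr(human_item_acc, model_correct.astype(float))
    return human_item_acc.mean(), model_correct.mean(), r, p

# Hypothetical data: 40 items varying in difficulty, 50 participants, and a
# model that is right about as often as people overall but on different items.
rng = np.random.default_rng(1)
humans = rng.random((50, 40)) < np.linspace(0.4, 0.95, 40)
model = rng.random(40) < 0.7
print(item_level_agreement(humans, model))
```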

Another study investigated how people choose visualizations. When selecting plots to answer different questions, non-expert participants showed “audience sensitivity.” They preferred plots they predicted would help others answer questions accurately, not just plots that looked nice or showed more data. This suggests even novices have intuitions about design effectiveness.

The talk closes by questioning whether existing assessments truly measure visualization understanding. Factor analysis of error patterns suggests current tests don’t map cleanly to the skills textbooks claim to teach (like “find the max” or “identify clusters”). Fan calls this a “glass half full” opportunity to develop better measures that could ultimately improve STEM education and data literacy.

🧠 INSIGHTS

⭐ Core Insights

  • Cognitive tools are not neutral carriers of information: they actively restructure how we think, learn, and discover, as demonstrated by Descartes’ coordinate system transforming mathematics
  • Context drives abstraction: Humans fluidly adjust visual fidelity based on communicative goals and audience needs, a capability current AI models lack
  • Explanations require selective forgetting: Effective visual explanations remove irrelevant visual details to highlight mechanistic abstractions, contradicting the naive view that more detail is always better
  • The human-model gap is fundamental: Even state-of-the-art vision language models don’t replicate human error patterns or uncertainty judgments on sketch and graph tasks
  • Audience sensitivity is intuitive: Non-experts can predict which visualizations will help others learn, suggesting design intuitions are more widespread than typically assumed
  • Measurement quality limits progress: Current visualization assessments don’t cleanly map to component skills, hindering both AI development and educational improvement

🔗 How This Connects to Broader Trends/Topics

  • The struggle to replicate human visual abstraction mirrors AI’s broader challenge moving from pattern recognition to genuine understanding
  • Fan’s work on data visualization literacy directly addresses the “COVID learning loss” crisis in mathematics and quantitative reasoning
  • The “cognitive tools” framework offers a foundation for designing AI systems that augment rather than replace human thinking
  • Her benchmarking approach exemplifies a rigorous method for diagnosing AI capabilities before deployment in educational settings
  • The intersection of perception, communication, and engineering reflects growing recognition that intelligence is fundamentally situated and embodied

🏗️ FRAMEWORKS & MODELS

🔧 Cognitive Tools Framework

Fan defines cognitive tools as “material objects that encode information intended to have an impact on our minds, how and what we think.” The framework includes:

  • Perception: Transforming sensory input into meaningful experiences
  • Production: Generating physical markings that leave visible traces
  • Communication: Arranging elements to impact other minds
  • Engineering: Applying abstractions to create new things

🎨 Visual Abstraction Continuum

A spectrum from faithful depiction to pure symbol, where optimal positioning depends on:

  • Referential context: How similar are the alternatives?
  • Communicative goal: Object identification vs. mechanism explanation?
  • Production constraints: Time, ink, and cognitive resources available

⚖️ Cumulative vs. Dissociable Hypothesis

Two competing theories of visual explanations:

  • Cumulative: Explanations = depictions + mechanistic overlays (adds information)
  • Dissociable: Explanations = selective emphasis on mechanisms, de-emphasis on appearance (replaces information)

Evidence strongly supports the dissociable account: people remove visual fidelity when explaining mechanisms.

📐 SEVA Benchmark

A systematic test of sketch understanding requiring:

  • Robustness to sparsity variation (4 seconds to unlimited time)
  • Tolerance for semantic ambiguity (multiple valid interpretations)
  • Results: Revealed a persistent human-AI gap in both performance and uncertainty patterns (a hedged zero-shot recognition example follows below)
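
For concreteness, the snippet below shows one way a CLIP-style model can be probed for zero-shot sketch recognition, roughly in the spirit of SEVA but not its actual protocol; the checkpoint name, prompt template, file name, and three-concept list are assumptions for illustration.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model_name = "openai/clip-vit-base-patch32"   # assumed checkpoint; SEVA tested many models
model = CLIPModel.from_pretrained(model_name)
processor = CLIPProcessor.from_pretrained(model_name)

concepts = ["airplane", "bicycle", "cat"]     # placeholders for SEVA's 128 concepts
prompts = [f"a sketch of a {c}" for c in concepts]

sketch = Image.open("sketch.png").convert("RGB")   # hypothetical sketch file
inputs = processor(text=prompts, images=sketch, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# Softmax over image-text similarity gives a probability per concept; comparing
# these full distributions (not just the top guess) to human guesses is the
# kind of analysis that exposes gaps in uncertainty patterns.
probs = outputs.logits_per_image.softmax(dim=-1).squeeze(0)
for concept, p in zip(concepts, probs.tolist()):
    print(f"{concept}: {p:.3f}")
```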

👥 Audience-Sensitive Visualization Selection

People choose plots based on predicted audience performance, not just data properties (see the sketch after this list). This involves:

  • Simulating how others will interpret different visualizations
  • Preferring plots that maximize communicative effectiveness
  • Intuitive grasp of design principles without formal training
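
A toy formalization of this idea, with invented plot names and numbers: treat plot choice as picking whichever visualization maximizes the accuracy you predict an audience member would achieve on the question at hand.

```python
# Predicted reader accuracy for each (plot type, question) pair; all values
# here are made up for illustration, not data from the studies.
predicted_accuracy = {
    ("bar chart",    "compare two group means"): 0.92,
    ("scatter plot", "compare two group means"): 0.70,
    ("bar chart",    "judge a trend over time"): 0.55,
    ("line chart",   "judge a trend over time"): 0.90,
    ("scatter plot", "judge a trend over time"): 0.65,
}

def choose_plot(question: str) -> str:
    """Return the plot type with the highest predicted audience accuracy."""
    candidates = {plot: acc for (plot, q), acc in predicted_accuracy.items() if q == question}
    return max(candidates, key=candidates.get)

print(choose_plot("compare two group means"))   # -> bar chart
print(choose_plot("judge a trend over time"))   # -> line chart
```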

💬 QUOTES

  1. “There are no straight lines or sharp corners in nature, but that didn’t stop us. We created them anyway.”

  • Prof. Judy Fan on humanity’s invention of cognitive tools

  2. “The question that we wrestle with is how did we get here and what is it about the human mind that makes that kind of continual innovation possible.”

  • Prof. Judy Fan on the puzzle of cognitive tool innovation

  3. “More than any other species, we leverage this understanding, this expanding understanding of the world in order to… create new things.”

  • Prof. Judy Fan on human uniqueness

  4. “We want to develop psychological theories that explain how we go about discovering useful abstractions that explain how the world works jointly with theories that explain how we then apply those abstractions to go make new things.”

  • Prof. Judy Fan on core research mission

  5. “The four-second drawings are quite derpy. That’s the technical term of art for that.”

  • Prof. Judy Fan on time-constrained sketch production

  6. “Explanations better communicated how the mechanism worked. But depictions were better for communicating object identity.”

  • Key finding from visual explanation studies

  7. “We found that while models do honest to goodness perform better on the recognition task… the variation across models… is totally dwarfed by the gap between models and people.”

  • On the human-AI sketch understanding gap

  8. “People who did well on one test often did well on the other, suggesting that maybe the two tests are measuring some of the same things or similar things. The question is like, what?”

  • On the need for better assessment

  9. “It feels like there is a sequence of experiences that build from the… there are a bunch of conceptual primitives and more basic competencies that you might need to build up first.”

  • On learning visualization skills

  10. “These tools that are at the heart of two of our most impactful and generative activities: education… and the expectation that every generation of human learners should be able to stand on the shoulders of the last and see further.”

  • Closing argument

  11. “The talk is complicated – the point is that this is complicated, and so decomposing it is part of the challenge.”

  • On the complexity of visualization understanding

🎯 HABITS

🛠️ Product Development Habits

  • Build benchmarks before building models: Fan’s team created SEVA and graph-understanding tests before attempting to model the behaviors, ensuring they measure the right things
  • Stress-test with production constraints: Deliberately vary time, ink, and cognitive resources to understand how humans adapt and where models break
  • Crowdsource detailed annotations: Tag every stroke in 90,000 sketches to build systematic understanding of human strategies
  • Design adversarial examples: Create plots with tricky y-axis limits to expose systematic AI weaknesses

🎓 Leadership Habits

  • Embrace intellectual fearlessness: Fan’s work tackles “deceptively simple yet fundamental challenges” that bridge disciplines
  • Cultivate cross-lab collaboration: Work spans Stanford, MIT, UCSD, and includes colleagues in education, policy, and industry
  • Prioritize measurement quality: Question whether existing tests actually measure what they claim, then build better ones
  • Connect basic science to urgent problems: Frame sketch understanding in terms of COVID learning loss and data literacy crises

🌟 Personal Habits

  • Think in centuries: Frame contemporary research against historical breakthroughs like Descartes’ coordinates or Darwin’s finches
  • Maintain a “student mindset”: Teaches intro stats for a living and uses classroom observations to generate research questions
  • Value the “derpy”: Recognizes that time-constrained, imperfect sketches reveal core cognitive strategies
  • Resist premature optimization: Acknowledges when experimental designs don’t perfectly isolate variables and plans future iterations

📚 REFERENCES

📄 Key Publications by Prof. Judy Fan and Collaborators

  • Fan, J. E., Yamins, D. L., & Turk-Browne, N. B. (2022). “Visual relations children find easy to process promote visual reasoning.” Current Biology, 32(5), 1058-1066. (Sketch understanding and context sensitivity work)
  • Fan, J. E., & Huey, H. (2023). “Dissociable visual explanations for abstract concepts.” Nature Communications. (Visual explanations vs. depictions research)
  • Mukherjee, K., et al. (2024). “SEVA: A multi-category, multi-criterion benchmark for sketch understanding.” arXiv preprint. (SEVA benchmark development)
  • Brockbank, E., Fan, J. E., et al. (2024). “Not all tests of graph understanding are created equal.” In preparation. (Data visualization assessment work)
  • Fan, J. E., Yamins, D. L., & Turk-Browne, N. B. (2020). “Easier to process visual relations promote reasoning.” Proceedings of the National Academy of Sciences, 117(34), 20507-20516.

🏛️ Historical References

  • Descartes’ Cartesian coordinate system: The 17th-century cognitive tool that united algebra and geometry
  • Darwin’s finches: Illustrations by John Gould that made morphological variation salient
  • Galileo’s telescope: Optical tool that revealed moons orbiting Jupiter
  • Ramón y Cajal’s retina drawings: Microscopic illustrations of neural structure
  • Feynman diagrams: Schematic representations of subatomic particle interactions
  • William Playfair (1786): Creator of the first time series plot showing England’s trade balance

🔬 Benchmarks and Methods

  • SEVA Benchmark: 90,000 sketches, 5,500 people, 128 concepts, varying production budgets
  • CLIPasso: Sketch generation algorithm tested against human production strategies
  • Graph-understanding tests used in AI benchmarking, including GGR, VLAT, CALVI, HOLF, and Chart-QA
  • THINGS dataset: Photo dataset used for sketch categorization tasks
  • National AI Research Cloud: Policy initiative mentioned in previous talk but relevant to Fan’s HAI affiliation

✅ QUALITY & TRUSTWORTHINESS NOTES

  • The talk is an MIT colloquium, introduced by Josh Tenenbaum (a leading cognitive scientist), establishing high academic credibility
  • Provides extensive empirical details: 90,000 sketches, 5,500 participants, 1,700 performance raters, 128 visual concepts, 17 AI models tested
  • Research is published or in press in peer-reviewed venues (references to “in press” work with Erik Brockbank)
  • Acknowledges experimental limitations openly (e.g., “this is not that” when questioned about planning vs. execution time)
  • Work spans multiple institutions (Stanford, MIT, UCSD) and funding sources (an NSF CAREER Award is mentioned)
  • Directly addresses reproducibility concerns by making benchmarks (SEVA) available for other researchers
  • Connects basic research to real-world impact (COVID learning loss, STEM education, data literacy)
  • Admits when questions exceed current knowledge (e.g., curriculum design for teaching graphs to kids)

Crepi il lupo! 🐺