Decoding the Visual World

From unlocking phones with Face ID and tagging friends in photos to spotting product defects on factory floors and detecting tumors in MRI scans, image recognition technology has transformed how we interact with the world.

Since the 1960s, more than $1 billion in investments from the U.S. National Science Foundation have supported research that helps computers interpret visual information, advancing foundational work in pattern recognition, computer vision and neural networks.

Two researchers in a dark room pointing at computer monitors.

NSF-supported researchers in MBARI's Video Lab annotate thousands of hours of deep-sea footage, creating training data that improves the accuracy of image recognition AI used to identify and classify ocean life.

What is image recognition?

Image recognition is a core task of computer vision, a subfield of AI that enables computers to identify and classify objects, places, people, writing and actions in digital images or videos. This technology relies on machine learning, where algorithms learn to recognize visual patterns by analyzing millions of images.

Convolutional neural networks (CNNs), a type of deep learning model, are the backbone of modern image recognition. Inspired by the structure of the brain's visual cortex, CNNs process images in layers. The early layers detect simple features like edges and outlines, and later layers combine them to identify more complex shapes, allowing computers to accurately distinguish a tree from a car, read a handwritten number as easily as a printed one, or identify a single face within a large crowd.
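To make the "simple features first" idea concrete, here is a minimal sketch of what a single filter in an early CNN layer computes. It is an illustration under stated assumptions, not how any particular system is built: the 6x6 image and the helper `convolve2d` are hypothetical, and the Sobel-style kernel is written by hand, whereas a real CNN learns its kernel values from training images.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over `image` with no padding and sum the products.

    As in most CNN libraries, the kernel is not flipped, so strictly
    speaking this is a cross-correlation.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Hypothetical 6x6 grayscale image: dark left half (0), bright right half (1).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written Sobel-style kernel that responds to dark-to-bright
# changes from left to right (an assumption; CNNs learn these values).
vertical_edge = np.array([[-1.0, 0.0, 1.0],
                          [-2.0, 0.0, 2.0],
                          [-1.0, 0.0, 1.0]])

edges = convolve2d(image, vertical_edge)

# ReLU keeps only positive responses, as a typical CNN layer would.
feature_map = np.maximum(edges, 0.0)
print(feature_map)
```

The resulting feature map is zero over the flat regions and large only where the window straddles the boundary between dark and bright. A later layer would take many such maps as input and combine them into detectors for corners, textures and eventually whole objects.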

Seeing the world

Computer vision emerged in the early 1960s as researchers began exploring whether machines could learn to see and interpret simple patterns and objects. By the mid-1960s, NSF-funded researchers were developing tools to detect edges in images and algorithms to recognize lines, shapes and simple patterns.

Over the following decades, NSF researchers advanced core mathematical frameworks and key techniques that allow computers to interpret pixels as meaningful patterns. They developed methods to more accurately describe shapes and spatial relationships, refined approaches for segmenting and filtering images, and applied statistical and signal processing techniques to detect patterns in noisy images, separate objects from backgrounds and break down complex visual scenes.

At the same time, NSF-supported cognitive science research revealed that human perception is active and influenced by attention, prior experience and multiple viewpoints. Studies of how the brain processes visual information — from simple features to complex objects — helped researchers uncover principles that inspired computational models, including neural networks, that mimic brain function.

Two digital plots, featuring a blue background and a complex grid of colorful striped boxes.

NSF-supported studies use these types of graphs to map the changes in brain network activity when switching between tasks, improving our understanding of how the brain manages multitasking.

Credit: University of Pennsylvania

A grid of photographs, with the top row a collection of dog images, followed by rows of boat images, rose images, school bus images, and coral images.

Researchers use large, high-quality image datasets to train and test deep learning systems, helping them compare and improve computer vision models.

Credit: Images by Adobe Stock

Vision in the deep

By the late 1990s, growing computing power and more sophisticated algorithms allowed machines to learn patterns from data automatically, making object recognition faster, more flexible and more accurate.

A pivotal moment came in 2009, when Fei-Fei Li, supported by an NSF Faculty Early Career Development award, and her team launched ImageNet, a publicly available database containing more than 3 million images across 5,000 categories. ImageNet gave researchers the large, high-quality dataset needed to train deep learning systems, which use multilayered artificial neural networks to recognize complex real-world images.

ImageNet also sparked the ImageNet Challenge, an annual competition to see which algorithms could most accurately identify new images. In 2012, a deep learning model called AlexNet used a CNN and advanced graphics processors to analyze ImageNet's vast dataset. It achieved record-breaking accuracy in the ImageNet Challenge, cutting error rates nearly in half and proving that deep learning could far surpass earlier approaches.
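For readers wondering what an "error rate" means in this setting, the sketch below computes a top-k error: the share of images whose true label is missing from a model's k most confident guesses. The challenge's classification results were commonly reported as top-5 error. Everything in the example is made up for illustration, including the scores, the labels and the helper name `top_k_error`, and it uses k=2 only because the toy data has just six classes.

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of images whose true label is not among the k highest scores."""
    # Column indices of the k largest scores in each row
    # (their order within the k does not matter here).
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hits.mean()

# Hypothetical model outputs: 4 images, 6 classes, one row per image.
scores = np.array([
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],
    [0.30, 0.10, 0.40, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.10, 0.60, 0.15, 0.05],
    [0.20, 0.10, 0.10, 0.10, 0.10, 0.40],
])
labels = np.array([1, 0, 5, 5])  # hypothetical true class of each image

print(top_k_error(scores, labels, k=1))
print(top_k_error(scores, labels, k=2))
```

Allowing more guesses can only lower the error, which is why top-5 figures are always at least as good as top-1 figures for the same model.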

Alongside ImageNet, NSF-supported centers like the Temporal Dynamics of Learning Center continued to expand understanding of how the brain perceives objects, offering key insights that informed the development of today's deep learning algorithms. Building on this legacy, the NSF AI Institute for Foundations of Machine Learning is advancing the core mathematical and computational tools behind AI. For example, researchers are using deep generative models to reduce noise and sharpen low-quality or blurry images, including MRI scans, yielding clearer, more accurate results for medical diagnosis and treatment.

From pixels to possibilities

NSF's investments are driving real-world applications of AI and image recognition technologies, including:

A black-and-white photo of a frowning face overlaid with red dots surrounding the eyes and mouth.

Unlocking facial expressions

Over several decades, NSF investments have advanced real-time facial expression recognition. This work produced AI systems capable of identifying facial movements and expressions, including automated versions of the Facial Action Coding System used to measure subtle muscle activity.

These advances directly informed real-world applications, from Sony’s smile-detection cameras to emotion-analysis platforms developed by Emotient, Inc., which Apple acquired in 2016.

An illustration depicting facial identity recognition.

The face of innovation

In the 1990s and 2000s, NSF-funded research advanced key methods in facial recognition, including Eigenfaces for representing faces using linear algebra and the Viola–Jones framework for real-time facial detection in video.
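The linear algebra behind Eigenfaces can be sketched in a few lines. The idea is to subtract the average face, find the main directions in which faces vary (the "eigenfaces"), and describe each face by a handful of weights along those directions. The sketch below is an illustration under stated assumptions rather than the original researchers' code: it uses random arrays in place of real face photos, and the names `encode`, `reconstruct` and `nearest_face` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data (an assumption): 20 random "faces" of 8x8 pixels,
# each flattened into a 64-element row. Real use needs aligned photos.
faces = rng.random((20, 64))

mean_face = faces.mean(axis=0)
centered = faces - mean_face

# SVD of the centered data: the rows of `vt` are orthonormal directions
# of greatest variation, which play the role of eigenfaces.
_, _, vt = np.linalg.svd(centered, full_matrices=False)
eigenfaces = vt[:10]  # keep the 10 strongest directions

def encode(face):
    """Describe a face by 10 weights, one per eigenface."""
    return eigenfaces @ (face - mean_face)

def reconstruct(weights):
    """Approximate the face from its weights."""
    return mean_face + weights @ eigenfaces

# Precompute the weights of every known face.
codes = np.array([encode(f) for f in faces])

def nearest_face(face):
    """Index of the known face whose weights are closest to this one's."""
    distances = np.linalg.norm(codes - encode(face), axis=1)
    return int(np.argmin(distances))

print(nearest_face(faces[3]))
```

Comparing 10 weights instead of 64 pixels is the point of the method: recognition becomes a nearest-neighbor search in a much smaller space, and with real photographs the savings are far larger.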

Continued NSF-supported advances in AI have enabled technologies that are part of everyday life — unlocking devices with facial recognition, sorting and searching digital photos, and improving security, such as helping DMV employees detect ID fraud.

Precision in production

In 2024, NSF-supported researchers introduced MaViLa (Manufacturing, Vision and Language), an AI model that "sees" inside factories to interpret visual data and suggest real-time fixes — from detecting 3D-printed flaws to fine-tuning machines for greater precision and efficiency.

Cultivating smarter agriculture

NSF investments in the early 2010s helped Blue River Technology, now a subsidiary of John Deere, develop a tractor-mounted robotic system that uses image recognition — powered by deep learning and computer vision — to distinguish between crops and weeds in real time.

A medical professional uses an ultrasound machine.

In the clinic

NSF Small Business Innovation Research awardee Caption Health, acquired by GE HealthCare Technologies in 2023, uses deep learning to guide medical professionals in capturing and interpreting ultrasound images. By combining object recognition with real-time analysis, its tools expand access to high-quality diagnostics.

Flying into focus

Developed with NSF support, the Merlin Bird ID app allows bird watchers and outdoor enthusiasts to recognize birds by sight and sound. Using image and audio recognition, Merlin can identify birds across different species, angles and postures — directly on a mobile phone.

A hand holding a smartphone displaying a chair.

Pixels to purchase

NSF-supported startup GrokStyle Inc. built on academic research in fine-grained visual recognition to create an AI tool that helps consumers and retailers find exact or visually similar items. Facebook, now Meta Platforms, acquired GrokStyle Inc. in 2019.

The East Troublesome Fire spawns a fire tornado as it approaches the Grand Lake area.

Tracking fast-moving wildfires

NSF-supported researchers are using AI to analyze drone-captured imagery of wildfires, revealing the conditions that fuel rapid fire spread. Their "fast fire risk" framework will help communities better anticipate danger and reduce wildfire impacts.

The future in focus

NSF investments continue to shape how AI sees and understands the world. By advancing deep learning architectures, neural network designs, and next-generation AI tools and datasets, NSF is driving progress in image recognition — opening new pathways to scientific discovery, fueling innovation and transforming the way AI impacts industry and daily life.