What is image recognition?
Image recognition is a core task of computer vision, a subfield of AI that enables computers to identify and classify objects, places, people, writing and actions in digital images or videos. This technology relies on machine learning, where algorithms learn to recognize visual patterns by analyzing millions of images.
Convolutional neural networks (CNNs), a type of machine learning model, are the backbone of modern image recognition. Inspired by the structure of the brain's visual cortex, CNNs process images in layers. The early layers detect simple features like edges and outlines, and later layers combine them to identify more complex shapes, allowing computers to accurately distinguish a tree from a car, read a handwritten number as easily as a printed one, or identify a single face within a large crowd.
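To make the idea of an early "edge-detecting" layer concrete, here is a minimal sketch in plain Python. It slides a small 3x3 filter over a toy image and responds strongly wherever brightness changes from left to right. The function name `convolve2d`, the toy image and the choice of a fixed Sobel-style filter are illustrative assumptions; a real CNN learns its filter values from data rather than using hand-picked ones.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` and sum the element-wise products.

    Only positions where the kernel fits entirely inside the image are used.
    As in most deep learning libraries, the kernel is not flipped
    (strictly speaking this is cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hand-picked Sobel-style filter that responds to left-to-right brightness changes
# (an assumption for illustration; CNN filters are learned during training).
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

# Toy 5x6 image: dark (0) on the left half, bright (1) on the right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

edges = convolve2d(image, SOBEL_X)
for row in edges:
    print(row)  # each row is [0, 4, 4, 0]: large values mark the vertical edge
```

The output is zero over the flat dark and bright regions and large only where the two meet, which is the kind of simple feature that later layers combine into more complex shapes.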
Seeing the world
Computer vision emerged in the early 1960s as researchers began exploring whether machines could learn to see and interpret simple patterns and objects. By the mid-1960s, NSF-funded researchers were developing tools to detect edges in images and algorithms to recognize lines, shapes and simple patterns.
Over the following decades, NSF researchers advanced core mathematical frameworks and key techniques that allow computers to interpret pixels as meaningful patterns. They developed methods to more accurately describe shapes and spatial relationships, refined approaches for segmenting and filtering images, and applied statistical and signal processing techniques to detect patterns in noisy images, separate objects from backgrounds and break down complex visual scenes.
At the same time, NSF-supported cognitive science research revealed that human perception is active and influenced by attention, prior experience and multiple viewpoints. Studies of how the brain processes visual information — from simple features to complex objects — helped researchers uncover principles that inspired computational models, including neural networks, that mimic brain function.
Vision in the deep
By the late 1990s, growing computer power and more sophisticated algorithms allowed machines to learn patterns from data automatically, making object recognition faster, more flexible and more accurate.
A pivotal moment came in 2009, when Fei-Fei Li, supported by an NSF Faculty Early Career Development award, and her team launched ImageNet, a publicly available database containing more than 3 million images across 5,000 categories. ImageNet gave researchers the large, high-quality dataset needed to train deep learning systems, which use multilayered artificial neural networks to recognize complex real-world images.
ImageNet also sparked the ImageNet Challenge, an annual competition to see which algorithms could most accurately identify new images. In 2012, a CNN-based deep learning model called AlexNet was trained on ImageNet's vast dataset using advanced graphics processors. It achieved record-breaking accuracy in the ImageNet Challenge, cutting the error rate nearly in half and proving that deep learning could far surpass earlier approaches.
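For readers curious how "error rate" is scored in a contest like this, the sketch below computes a top-k error: a prediction counts as correct if the true label appears among the model's k highest-ranked guesses. The ImageNet Challenge reported top-5 error as its headline classification metric. The function name `top_k_error` and the tiny labels and predictions here are made up purely for illustration and are not challenge data.

```python
def top_k_error(ranked_predictions, true_labels, k=5):
    """Fraction of images whose true label is NOT among the top-k guesses.

    `ranked_predictions` holds one list per image, ordered from the most
    confident guess to the least confident.
    """
    misses = sum(
        1
        for guesses, truth in zip(ranked_predictions, true_labels)
        if truth not in guesses[:k]
    )
    return misses / len(true_labels)


# Hypothetical ranked guesses for four images (illustrative only).
predictions = [
    ["cat", "dog", "fox"],
    ["car", "truck", "bus"],
    ["tree", "bush", "grass"],
    ["dog", "wolf", "cat"],
]
truths = ["dog", "bicycle", "tree", "cat"]

top1 = top_k_error(predictions, truths, k=1)
top3 = top_k_error(predictions, truths, k=3)
print(top1, top3)  # 0.75 0.25
```

Allowing several guesses per image makes the score more forgiving when a photo contains multiple objects or when categories are closely related.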
Alongside ImageNet, NSF-supported centers like the Temporal Dynamics of Learning Center continued to expand understanding of how the brain perceives objects, offering key insights that informed the development of today's deep learning algorithms. Building on this legacy, the NSF AI Institute for Foundations of Machine Learning is advancing the core mathematical and computational tools behind AI. For example, researchers are using deep generative models to reduce noise and sharpen low-quality or blurry images, including MRI scans, producing clearer, more accurate results for medical diagnosis and treatment.
From pixels to possibilities
NSF's investments are driving real-world applications of AI and image recognition technologies, including:
Unlocking facial expressions
Over several decades, NSF investments have advanced real-time facial expression recognition. This work produced AI systems capable of identifying facial movements and expressions, including automated versions of the Facial Action Coding System used to measure subtle muscle activity.
These advances directly informed real-world applications, from Sony’s smile-detection cameras to emotion-analysis platforms developed by Emotient, Inc., which Apple acquired in 2016.
The face of innovation
In the 1990s and 2000s, NSF-funded research advanced key methods in facial recognition, including Eigenfaces for representing faces using linear algebra and the Viola–Jones framework for real-time facial detection in video.
Continued NSF-supported advances in AI have enabled technologies that are part of everyday life — unlocking devices with facial recognition, sorting and searching digital photos, and improving security, such as helping DMV employees detect ID fraud.
Precision in production
In 2024, NSF-supported researchers introduced MaViLa (Manufacturing, Vision and Language), an AI model that "sees" inside factories to interpret visual data and suggest real-time fixes, from detecting flaws in 3D-printed parts to fine-tuning machines for greater precision and efficiency.
Cultivating smarter agriculture
NSF investments in the early 2010s helped Blue River Technology, now a subsidiary of John Deere, develop a tractor-mounted robotic system that uses image recognition — powered by deep learning and computer vision — to distinguish between crops and weeds in real time.
In the clinic
NSF Small Business Innovation Research awardee Caption Health, acquired by GE HealthCare Technologies in 2023, uses deep learning to guide medical professionals in capturing and interpreting ultrasound images. By combining object recognition with real-time analysis, its tools expand access to high-quality diagnostics.
Flying into focus
Developed with NSF support, the Merlin Bird ID app allows bird watchers and outdoor enthusiasts to recognize birds by sight and sound. Using image and audio recognition, Merlin can identify birds across different species, angles and postures — directly on a mobile phone.
Pixels to purchase
NSF-supported startup GrokStyle Inc. built on academic research in fine-grained visual recognition to create an AI tool that helps consumers and retailers find exact or visually similar items. Meta Platforms acquired GrokStyle Inc. in 2019.
Tracking fast-moving wildfires
NSF-supported researchers are using AI to analyze drone-captured imagery of wildfires, revealing the conditions that fuel rapid fire spread. Their "fast fire risk" framework will help communities better anticipate danger and reduce wildfire impacts.
The future in focus
NSF investments continue to shape how AI sees and understands the world. By advancing deep learning architectures, neural network designs, and next-generation AI tools and datasets, NSF is driving progress in image recognition — opening new pathways to scientific discovery, fueling innovation and transforming the way AI impacts industry and daily life.