
A Data-Driven Image Extraction and Analysis Pipeline for Plant Phenotyping in Controlled Environments

¹Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX, USA
²Department of Soil and Crop Sciences, Texas A&M University, College Station, TX, USA
³Texas A&M AgriLife Research, Texas A&M University, College Station, TX, USA
⁴Department of Biological and Agricultural Engineering, Texas A&M University, College Station, TX, USA
⁵Department of Agricultural and Biological Engineering, Mississippi State University, Mississippi State, MS, USA
⁶Department of Entomology, Texas A&M University, College Station, TX, USA
⁷Department of Biochemistry, University of Nebraska–Lincoln, Lincoln, NE, USA
*Correspondence emails: fahimehorvatinia@tamu.edu, jpeeples@tamu.edu
Abstract

Temporal imaging of plants in controlled environments helps scientists better understand growth and biological processes. However, analyzing large volumes of images has been limited by a lack of standardized datasets and open-source tools. This paper presents an automated multispectral image extraction and analysis pipeline coupled with the Plant Growth and Phenotyping Version 2 (PGP v2) dataset, which provides a comprehensive resource for plant phenotyping research. The pipeline is designed to streamline the workflow from raw greenhouse imaging data to extracted phenotypic traits. The PGP v2 dataset contains approximately 52,000 multispectral images from four crop species (corn, cotton, rice, and sorghum) with labeled metadata. We demonstrate the effectiveness of the proposed methodology through statistical analysis of the extracted features and present comparative analyses of multiple deep learning-based segmentation models. The proposed pipeline and dataset aim to accelerate research in plant phenotyping, facilitate reproducible science, and provide researchers with reliable tools to extract meaningful plant phenotypes from imaging data.
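The abstract mentions comparing several segmentation models but does not say which metrics are reported. Intersection over union (IoU) and the Dice coefficient are common choices for binary plant/background masks, so the sketch below shows how they could be computed with NumPy. It assumes both masks are arrays of the same shape in which nonzero pixels mark plant; the function name `iou_and_dice` and the convention that two empty masks score 1.0 are our own assumptions, not the project's code.

```python
import numpy as np


def iou_and_dice(pred: np.ndarray, target: np.ndarray) -> tuple[float, float]:
    """Return (IoU, Dice) for two binary plant/background masks of equal shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    total = pred.sum() + target.sum()
    # Assumed convention: two empty masks are treated as a perfect match.
    iou = 1.0 if union == 0 else intersection / union
    dice = 1.0 if total == 0 else 2.0 * intersection / total
    return float(iou), float(dice)


if __name__ == "__main__":
    pred = np.array([[1, 1, 0], [0, 1, 0]])
    target = np.array([[1, 0, 0], [0, 1, 1]])
    iou, dice = iou_and_dice(pred, target)
    print(f"IoU={iou:.3f} Dice={dice:.3f}")  # IoU=0.500 Dice=0.667
```

Dice weights the overlap twice, so it is always at least as large as IoU for the same pair of masks; reporting both makes results comparable with studies that use either one.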

Project Organization and Workflow Integration
Figure 1: Integration of teams and responsibilities in the Texas A&M AgriLife Phenotyping Greenhouse project.
Comprehensive Phenotyping Pipeline
Figure 2: Comprehensive pipeline for multispectral plant phenotyping and feature analysis.
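Figure 2 does not list the specific features the pipeline extracts. As one hedged illustration of turning a multispectral frame into traits, the sketch below computes the normalized difference vegetation index (NDVI) from near-infrared and red bands, thresholds it into a plant mask, and reports projected plant area and mean NDVI. The band inputs, the 0.3 threshold, the pixel-to-area scale, and the function names are all assumptions; in the described pipeline the mask would come from the deep learning segmentation models rather than a fixed threshold.

```python
import numpy as np


def ndvi(nir: np.ndarray, red: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red); eps avoids division by zero."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)


def extract_traits(nir: np.ndarray, red: np.ndarray,
                   threshold: float = 0.3, mm2_per_pixel: float = 1.0) -> dict:
    """Compute illustrative traits from one multispectral frame.

    The NDVI threshold mask is a hypothetical stand-in for a learned
    segmentation mask; threshold and mm2_per_pixel are placeholders.
    """
    index = ndvi(nir, red)
    mask = index > threshold
    plant_pixels = int(mask.sum())
    mean_ndvi = float(index[mask].mean()) if plant_pixels else float("nan")
    return {
        "plant_pixels": plant_pixels,
        "projected_area_mm2": plant_pixels * mm2_per_pixel,
        "mean_ndvi": mean_ndvi,
    }


if __name__ == "__main__":
    nir = np.array([[0.8, 0.2], [0.6, 0.1]])
    red = np.array([[0.1, 0.2], [0.2, 0.1]])
    print(extract_traits(nir, red, mm2_per_pixel=0.25))
```

Keeping trait extraction separate from mask generation means the same function can be reused unchanged whichever segmentation model produces the mask.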
Plant Growth and Phenotyping Version 2 Dataset

This study introduces the expanded Plant Growth and Phenotyping Version 2 (PGP v2) dataset, which substantially increases both scale and diversity relative to the initial release and comprises approximately 52,000 multispectral images across four crop species. The dataset was collected in the Texas A&M AgriLife Precision Phenotyping Greenhouse and includes temporal image sequences acquired under standardized lighting, environmental controls, and imaging protocols. Each image is accompanied by metadata including timestamp, plant genotype, growth stage, and environmental conditions. The dataset is divided into training and validation subsets to support reproducible development and evaluation of machine learning models.
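The page does not state how the training and validation subsets were formed. Because the data are temporal sequences, one reasonable approach (an assumption, not the authors' documented procedure) is to split by plant so that images of the same plant never appear in both subsets. The sketch below stores the metadata categories listed above in a record type and assigns each plant to a subset by hashing its identifier; the `plant_id` field, all other field names, and the 20% validation fraction are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class ImageRecord:
    # Hypothetical field names; the metadata categories follow the text above.
    path: str
    plant_id: str
    timestamp: datetime
    genotype: str
    growth_stage: str
    environment: dict = field(default_factory=dict)


def assign_split(plant_id: str, val_fraction: float = 0.2) -> str:
    """Deterministically map a plant to 'train' or 'val' by hashing its ID.

    All images of one plant share a subset, so a temporal sequence never
    straddles training and validation.
    """
    digest = hashlib.sha256(plant_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 2**32  # uniform-ish value in [0, 1)
    return "val" if bucket < val_fraction else "train"


def split_records(records, val_fraction: float = 0.2) -> dict:
    splits = {"train": [], "val": []}
    for rec in records:
        splits[assign_split(rec.plant_id, val_fraction)].append(rec)
    return splits


if __name__ == "__main__":
    records = [
        ImageRecord(f"img_{i}.tif", f"plant_{i % 3}", datetime(2024, 5, 1 + i),
                    "genotype_A", "V4")
        for i in range(6)
    ]
    splits = split_records(records)
    print({name: len(recs) for name, recs in splits.items()})
```

Hashing the identifier, rather than shuffling with a random seed, keeps a plant's assignment stable when new images of it are added later.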

Dataset Samples
Corn (~14,000 images)
Cotton (~27,000 images)
Rice (~1,376 images)
Sorghum (~10,608 images)
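The per-species counts listed above are strongly imbalanced: cotton accounts for roughly half of the images and rice for under 3%. The sketch below computes each species' share from those approximate counts, plus inverse-frequency class weights, one common way to offset such imbalance when training a single model across species. Whether the authors use any weighting is not stated here; the weighting scheme and function names are assumptions.

```python
# Approximate per-species image counts as listed on this page.
SPECIES_COUNTS = {"corn": 14_000, "cotton": 27_000, "rice": 1_376, "sorghum": 10_608}


def class_shares(counts: dict) -> dict:
    """Fraction of all images belonging to each species."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def inverse_frequency_weights(counts: dict) -> dict:
    """Weight = total / (num_classes * count); a perfectly balanced set gives 1.0."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {name: total / (num_classes * n) for name, n in counts.items()}


if __name__ == "__main__":
    shares = class_shares(SPECIES_COUNTS)
    weights = inverse_frequency_weights(SPECIES_COUNTS)
    for name in SPECIES_COUNTS:
        print(f"{name:8s} share={shares[name]:6.1%} weight={weights[name]:.2f}")
```

With these counts, rice images would receive a weight roughly twenty times that of cotton images, which illustrates how far the smallest class sits from the largest.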
Citation
Plain Text:
F. Orvati Nia, J. Peeples, S. C. Murray, A. McFarland, T. Vann, S. Salehi, R. Hardin, D. D. Baltensperger, A. Ibrahim, J. A. Thomasson, H. Fadamiro, N. K. Subramanian, R. Roston, J. Ishimwe, D. Basak, N. Oladepo, and U. Vysyaraju. 
"A Data-Driven Image Extraction and Analysis Pipeline for Plant Phenotyping in Controlled Environments." 
bioRxiv, 2026.

BibTeX:
@article{orvati2026data,
  title={A Data-Driven Image Extraction and Analysis Pipeline for Plant Phenotyping in Controlled Environments},
  author={Orvati Nia, Fahimeh and Peeples, Joshua and Murray, Seth C and McFarland, Andrew and Vann, Troy and Salehi, Shima and Hardin, Robert and Baltensperger, David D and Ibrahim, Amir and Thomasson, J. Alex and Fadamiro, Henry and Subramanian, Nithya K and Roston, Rebecca and Ishimwe, Joslin and Basak, Diptadeep and Oladepo, Nazar and Vysyaraju, Uday},
  journal={bioRxiv},
  pages={2026--02},
  year={2026},
  publisher={Cold Spring Harbor Laboratory}
}