1. Introduction
Breeding programs have only recently started using high-throughput phenotyping (HTP) platforms to predict agronomic traits such as biotic and abiotic stress, date of physiological maturity, lodging, plant biomass, and grain yield (1). As imagery obtained by remote sensing has become readily available and affordable, efforts to use this technology to increase the genetic gain per unit of time have been on the rise. Traits such as these have already been the subject of prediction and classification studies applying HTP in maize (Zea mays L.), wheat (Triticum aestivum L.), rice (Oryza sativa L.), and soybean (Glycine max (L.) Merr.), among other crops; nevertheless, implementing these methods in breeding programs has been difficult (1)(2). This difficulty is mainly due to uncertainty about the validation and repeatability of the predictions and to the challenge of processing the data in a short time while data science is still an emerging discipline. The uncertainty arises because many HTP studies validate their models on independent data subsets of the same experiment on which they were trained but do not test them in independent environments. Even models trained on several experiments may fail to repeat accurate predictions, because the same genotype can express different phenotypes across locations and years owing to environmental conditions, particularly when multiple genes control the predicted trait (3). Data processing is more challenging in breeding programs of developing countries than in private corporations, international programs, and public programs of developed countries, because the latter three have larger budgets and assimilate cutting-edge technologies rapidly, which is associated with earlier training of new users of these technologies.
Besides facilitating the classification and prediction of agronomic traits, HTP can help identify genomic regions associated with traits of interest. This can be done by combining phenomics with other omics approaches, such as genomics, a combination with high potential to advance genetic gain in plant breeding (4)(5). For example, Tanger and others (6) performed a genotyping-by-sequencing analysis on a rice population of 1,516 recombinant inbred lines, constructed a genetic map, and performed QTL (quantitative trait locus) mapping, but phenotyped the population with a field-based HTP platform instead of classic field phenotyping. As a result, they mapped genomic regions associated with four alleles that would have a negative effect on grain yield.
This review focuses on field-based HTP platforms that take aerial images from unmanned aerial vehicles (UAV) and apply artificial intelligence (AI) methods to associate the imagery with traits of interest in soybean plant breeding. By using AI to analyze the multi-dimensional data sets collected with this technology, which is based on remote sensing and geographic information systems (GIS), breeders could obtain helpful information to make better decisions when selecting the best experimental lines. The studies and other references included here have been cited in peer-reviewed articles, provide a framework for discussing insights into implementing HTP at a large scale, and summarize predictive results according to the devices used, the AI methods applied, and the data classes collected in single or multiple environments. An extensive review is presented only for studies predicting physiological maturity in soybean. Several relevant studies for other traits and crops are not included because the number of articles published on field-based HTP is already in the hundreds. Some studies based on hand-held devices are also mentioned because data acquisition in the field is easy or because accurate canopy measurements can be collected (e.g., with spectroradiometers). Not included are studies conducted with HTP platforms in laboratories and greenhouses, with tripods or towers (whether or not based on satellite remote sensing and GIS), or with autonomous mobile robots that use these technologies.
2. What HTP Is, What the HTP Platforms Are, and What the Difference Is with Phenomics
HTP allows researchers and breeders to collect phenotypic data at a larger scale and in a shorter time than classic phenotyping, which consists of taking notes by visual assessment or using measuring instruments with slow data acquisition. One of the pioneering studies applying HTP in plant sciences was conducted by Boyes and others (7). By studying plant growth and development over the life cycle of Arabidopsis mutant lines with defects in specific biochemical pathways, these authors identified subtle phenotypic differences that had not been identified before using only classic phenotyping. Thus, compared to classic phenotyping, HTP can be more efficient, mainly when several traits are monitored simultaneously, and also more effective, as in the study of Boyes and others (7), which identified slight phenotypic differences due to genetic variation and environmental stress. A related advantage is that HTP lets researchers and breeders collect information without destroying the experimental unit, whereas classic phenotyping is often destructive (e.g., when assessing growth rate).
An HTP platform in plant sciences is a system that integrates devices and sensors, automated and eventually robotic data collection procedures, specific software for data extraction and analysis, and the substantial computational resources needed to run this software and store big data (i.e., workstations, local servers, or cloud servers). HTP platforms can be divided into two main groups: 1) laboratory and greenhouse HTP platforms and 2) field-based HTP platforms. The first studies applying HTP were conducted on laboratory platforms, studying behavioral mutations in mice (8) and later in plant sciences, as in the Arabidopsis study mentioned above (7). It is in laboratories and greenhouses that interdisciplinary teams performing HTP with state-of-the-art facilities can achieve the highest level of precision in plant-based phenotyping. Field-based HTP platforms, in turn, include those using hand-held devices and those collecting multi-dimensional data sets based on remote sensing and GIS.
The concepts of phenomics and HTP are often used interchangeably; however, Araus & Cairns (9) illustrated that the scope of phenomics goes further. While HTP is the art of phenotyping a large number of accessions with greater efficiency in a non-destructive manner, phenomics is ‘the acquisition of high-dimensional phenotypic data on an organism-wide scale’, which allows a better understanding of the pathways linking genes to traits (10). A few interdisciplinary research centers distributed worldwide have the facilities and high-tech HTP platforms to apply phenomics at the most comprehensive level of evaluation. Several of these research centers are associated with the International Plant Phenotyping Network (https://www.plant-phenotyping.org/). Even though they may focus on plant-based research (i.e., plants contained in plots), their scope reaches field-based HTP studies whose results can be applied more readily to plant breeding.
Pipeline workflows applied by some plant-based HTP platforms have been packaged as technological products, such as PhenoTrack3D, which was developed to track maize organs over time (11). Several other technological products are available for plant-based HTP, such as those reviewed by Gill and others (12) for plant stress phenotyping. However, many other HTP platforms have been customized by each interdisciplinary team according to the crop, the trait under study, the resources, and the know-how of the team. Interdisciplinary teams and customized pipelines were the standard when the first HTP platforms were being developed, as in the case of Xiang & Tian (13), who even had to program the missions to fly an autonomous helicopter. This pipeline step of planning flight missions is now fully automated by the specialized software that comes with the drone kit. Users now need only choose the flight height, the percentage of overlap among images on both sides of the experiment, and the flight pattern over the experiment (e.g., a zigzag design). Similarly, other pipeline steps that involve complex code are continually being wrapped into functions and distributed in packages uploaded, for example, to the ‘R Project for Statistical Computing’ (https://www.r-project.org) or to ‘GitHub’ (https://github.com), an AI-powered developer platform. An example of a customized pipeline, applied in the University of Illinois at Urbana-Champaign soybean breeding program to predict the date on which plant rows reach physiological maturity, is shown in Figure 1.
3. Field-Based HTP Platforms: Devices, Data Class Collected, and Artificial Intelligence (AI) Methods Used in Studies Conducted on Soybean and Other Crops
Multi-rotors are the most common UAV used to image field experiments because they are more affordable than fixed-wing drones or other rotary drones (helicopters). Typically, inexpensive UAV carry digital RGB (red, green, blue) cameras, while professional platforms often integrate more expensive multispectral or hyperspectral cameras. Although multispectral cameras may have a lower resolution than RGB cameras, a significant advantage of the former is that they can record images at wavelengths beyond the spectrum visible to the human eye, such as the red edge and near-infrared (NIR) bands. Besides the RGB bands, these two are the bands most commonly recorded by multispectral cameras, and they can be used to calculate several of the vegetation indices used in agricultural sciences. Examples are the normalized difference vegetation index (NDVI = (NIR - red) / (NIR + red)) (Kriegler and others cited by Huang and others (14)) and the normalized difference red edge index (NDRE = (NIR - red edge) / (NIR + red edge)) (15).
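The two index formulas above can be illustrated with a minimal sketch. The function names and the reflectance values are hypothetical and serve only to show the arithmetic; in practice the band values would come from calibrated imagery averaged over each plot.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or None when the denominator is zero."""
    total = a + b
    if total == 0:
        return None
    return (a - b) / total


def ndvi(nir, red):
    """NDVI = (NIR - red) / (NIR + red)."""
    return normalized_difference(nir, red)


def ndre(nir, red_edge):
    """NDRE = (NIR - red edge) / (NIR + red edge)."""
    return normalized_difference(nir, red_edge)


# Hypothetical mean reflectance values for one plot (not taken from any study).
plot = {"red": 0.05, "red_edge": 0.20, "nir": 0.45}
plot_ndvi = ndvi(plot["nir"], plot["red"])
plot_ndre = ndre(plot["nir"], plot["red_edge"])
print(f"NDVI = {plot_ndvi:.3f}, NDRE = {plot_ndre:.3f}")
```

Both indices share the same normalized-difference form, so a single helper covers them; only the band paired with NIR changes.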

Figure 1: Pipeline workflow diagram of a high-throughput phenotyping platform applied in the University of Illinois at Urbana-Champaign soybean breeding program to predict the date on which plant-row experiments reach physiological maturity (R8 stage)
Both reflectance indices, NDVI and NDRE (also called red edge NDVI, RENDVI), have been correlated with important agronomic traits such as plant growth, leaf area index, aboveground biomass, grain yield, flowering time, nitrogen status, and stress response (6)(16)(17). These are only two of more than one hundred vegetation indices that can be calculated using multispectral imagery (17). Multispectral band values can be recorded with multispectral cameras but also with hand-held multispectral radiometers. Using a multispectral radiometer and applying a power function regression model to NDVI values, Ma and others (18) explained the grain yield variation among 42 historical soybean cultivars (R² ranged from 0.44 to 0.80). In addition, hyperspectral cameras and hand-held spectroradiometers can also record these band values. The advantage of hyperspectral devices is that they divide the continuous range of the electromagnetic spectrum into a much greater number of bands, so leaf and canopy attributes can be associated with a much greater number of reflectance indices (16)(19)(20)(21).
Another sensor with applications in plant breeding is the thermal sensor, which is often built into hand-held devices and, more recently, into multispectral cameras mounted on drones. Instead of capturing the scene's reflectance, as the sensors recording the spectral bands do, thermal sensors capture the infrared radiation emitted by objects in the field (i.e., the crop, weeds, and the ground). Thermal sensors detect changes in plant temperature that result mainly from the opening and closing of stomata in response to the transpiration rate, water content, and stress signals, including diseases such as sudden death syndrome in soybean, caused by Fusarium virguliforme (22)(23)(24). Two limitations of thermal sensors are that they are expensive and that their spatial resolution is lower than that of sensors recording visible light and beyond, such as the red edge and NIR bands (25). Thus, in plant breeding, researchers must reconcile the spatial pixel resolution of the thermal sensor with the plot size of the breeding experiments. For example, Kaler and others (26) conducted an association mapping study with 345 soybean accessions to identify loci for canopy temperature under drought conditions. Instead of classic phenotyping, the authors used a field-based HTP platform with a thermal infrared camera (resolution 640 × 512 pixels) mounted on a tethered helium-filled balloon held at 75 m. In this case, the resolution was sufficient to associate the pixel information with the corresponding plots, which were 3.65 m long with 0.76 m row spacing (two-row plots) in one experiment and 4.57 m long with 0.19 m row spacing (seven-row plots) in the other.
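How pixel resolution and plot size are reconciled can be sketched with the standard ground sampling distance (GSD) relation, GSD = altitude × pixel pitch / focal length. The altitude and plot dimensions below come from the example above, but the pixel pitch and focal length are hypothetical values chosen only for illustration (the cited study's optics are not given here), and taking plot width as the number of rows times the row spacing is also an assumption.

```python
def ground_sampling_distance(altitude_m, pixel_pitch_m, focal_length_m):
    """Ground footprint of one pixel (m) for a nadir-looking camera."""
    return altitude_m * pixel_pitch_m / focal_length_m


def pixels_per_plot(plot_length_m, plot_width_m, gsd_m):
    """Approximate number of pixels that fall inside a rectangular plot."""
    return (plot_length_m / gsd_m) * (plot_width_m / gsd_m)


ALTITUDE_M = 75.0        # balloon height reported in the text
PIXEL_PITCH_M = 17e-6    # hypothetical detector pixel pitch
FOCAL_LENGTH_M = 19e-3   # hypothetical lens focal length

gsd = ground_sampling_distance(ALTITUDE_M, PIXEL_PITCH_M, FOCAL_LENGTH_M)

# Plot width assumed to be number of rows times row spacing.
two_row = pixels_per_plot(3.65, 2 * 0.76, gsd)
seven_row = pixels_per_plot(4.57, 7 * 0.19, gsd)

print(f"GSD = {gsd * 100:.1f} cm per pixel")
print(f"two-row plot: about {two_row:.0f} pixels")
print(f"seven-row plot: about {seven_row:.0f} pixels")
```

Under these assumed optics each plot would contain on the order of a thousand thermal pixels; with a coarser sensor or a higher flight the count per plot drops with the square of the GSD.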
Plant breeders are also interested in predicting physiological maturity using remote sensing, mainly in plant-row trials, because of the large amount of time required to take notes for several thousand experimental lines. In soybean, physiological maturity is the R8 growth stage (27), defined as the stage at which 95% of pods have reached their mature color and the maximum biomass accumulation in the seed occurs. As a field-based HTP case study, Table 1 lists the studies conducted to date in single and multiple environments that used different methods of analyzing RGB and multispectral images to predict soybean physiological maturity. Using the machine learning algorithm random forest (RF) and a binary prediction model applied to multispectral aerial images taken over plant rows at the University of Illinois, Yu and others (28) reported an overall accuracy of 93.8% for classifying R8. The authors also remotely measured the canopy area and the length of the plant rows and reported correlations with grain yield of r = 0.56 and r = 0.49, respectively. Using RGB images and deep convolutional neural networks (CNN), Trevisan and others (29) and Moeinizade and others (30) also trained models to predict the R8 stage in soybean. Across several trials evaluated in different environments, both studies reported an average root mean squared error (RMSE) of approximately 2 days, a tolerable value considering that breeders most often take notes every 5-7 days. Still, these and other authors reported lower prediction errors for some experiments (Table 1).
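The two indicators used in this paragraph, overall accuracy for classification and RMSE for date prediction, can be computed as in the following minimal sketch. All observed and predicted values are hypothetical and do not come from the cited studies.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def overall_accuracy(observed, predicted):
    """Fraction of plots whose predicted class matches the field note."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(o == p for o, p in zip(observed, predicted))
    return correct / len(observed)


# Hypothetical R8 dates, in days after planting, for five plant rows.
observed_r8 = [118, 121, 125, 130, 133]
predicted_r8 = [120, 120, 123, 131, 136]
error_days = rmse(observed_r8, predicted_r8)

# Hypothetical binary maturity notes (True = mature) for four plant rows.
observed_mature = [True, True, False, True]
predicted_mature = [True, False, False, True]
accuracy = overall_accuracy(observed_mature, predicted_mature)

print(f"RMSE = {error_days:.2f} days, overall accuracy = {accuracy:.0%}")
```

Because the RMSE is expressed in days, it can be compared directly with the 5-7 day interval between field notes mentioned above.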
Additional studies to predict physiological maturity in soybean have used methods other than RF and CNN (Table 1). Using a ground-based field spectroradiometer to measure canopy reflectance, Christenson and others (19) applied partial least squares regression (PLSR) to associate the R8 stage with 91 wavebands and four vegetation indices, some of them calculated in different versions. These were six versions of RENDVI; the blue, green, and red NDVIs; and three versions of the water index. After the PLSR analysis, one model was fitted with the most significant indices and another with the most significant individual wavebands (RMSE = 5.5 and 5.2 days, respectively). A possible reason why these prediction errors were more than twice those obtained by the two CNN studies mentioned above (29)(30) is the sample size used to train the models. While Christenson and others (19) used 40 cultivars, the other two studies used hundreds to several thousand experimental lines. It is noteworthy that a ground-based field spectroradiometer can collect a huge amount of spectral information with high accuracy. However, one person can phenotype only a few hundred plots per day, while a drone carrying a camera can phenotype ~10,000 plant rows in half an hour.
Applying PLSR to multispectral images of 326 soybean lines taken from a drone, Zhou and others (33) explained up to 70% of the R8 stage variation (RMSE = 1.7 days). This best result was obtained using images from the second of three flight dates, five linear components in the PLSR model, and the rating change of image features (from the second to the third flight) as predictors. These results were in turn improved by adjusting the maturity records annotated in the field with the variances of two canopy image features (red edge NDVI and canopy chlorophyll content index) obtained from soybean lines that matured on the same day according to the predictions (Table 1). However, when the same adjustment technique was applied to images from the third flight instead of the second, the accuracy of the maturity predictions did not improve relative to the same analysis without adjusting the field records (R² = 0.71 and RMSE = 1.6 days).
Table 1: Field-based high-throughput phenotyping studies conducted in single and multiple environments using different methods of analyzing RGB (red, green, blue) and multispectral images to predict physiological maturity in soybean

The indicators show the results of those experiments and analyses that obtained the lowest prediction error. (19)(28)(29)(30)(31)(32)(33)(34)(35)(36)(37)(38)(39)
Narayanan and others (31) calculated a normalized green excess index and fitted a piecewise linear regression model as a function of time. For a subset of test data and locations, the fitted models showed Pearson correlation coefficients (r) ranging from 0.79 to 0.92 (Table 1). Building on these results, Volpato and others (36) conducted a comparative analysis and, after testing two other vegetation indices, decided to use the same normalized green excess index with a nonparametric local polynomial regression model. Finally, one of the most extensive studies on soybean was conducted by Yuan and others (32), who used RGB images and five regression and classification models to predict the date of R8, plant height, seed size, protein, oil, fiber, and grain yield. For physiological maturity, the models explained up to 76% of the R8 stage variation but with a higher error than the studies cited previously (RMSE = 3.7 days).
Trevisan and others (29) and Moeinizade and others (30) stated that an advantage of CNN is that they exploit the spatial structure of the pixels. This advantage could be especially important when CNN are applied to complex traits, such as plant development (e.g., morphology, plant growth, and plant/organ counting) and plant stress (40). Physiological maturity is strongly associated with the loss of green in the leaves; this easy-to-observe change in color is one reason why Moeinizade and others (30) achieved good performance with a CNN of just four convolutional layers. A convolutional layer learns specific details of the images (features) automatically, and these features are stacked hierarchically across layers. Other researchers have found that RF can also be applied to predict complex traits. For example, in a comparative analysis of three machine learning algorithms for wheat lodging classification, Zhang and others (41) obtained smaller fluctuations in overall accuracy between and within ten replicates of datasets from three different dates when they used RF than when they used neural networks or support vector machine (SVM). In the same work, when RF was compared with the best of three deep learning methods (GoogLeNet, CNN, and VGG-16), no significant differences were identified at the 0.05 significance level.
Random forest is a machine learning algorithm based on classification and regression trees (CART) that builds an ensemble of trees (i.e., the predictors) using bootstrap aggregation, also known as bagging (42)(43). As with branches in a tree, splitting is a sequential decision process that proceeds from the root node to the leaves, the terminal nodes where no further splitting occurs. Unlike a single decision tree, each tree in RF is trained on a bootstrap sample drawn with replacement from the original training set, and each split is chosen from a random subset of the features, which can be categorical or numerical (42)(43). Even at the risk of a higher bias in individual trees, the aim of training on different random subsets of the training set (i.e., bagging) is to decrease the correlation between trees and the variance of the ensemble, because high variance can cause overfitting of the trained model (44). The trees, each learned from its respective bootstrap sample, are finally combined into a single classifier or regressor, according to the type of the dependent variable (43).
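The bagging idea described above can be illustrated with a deliberately reduced sketch: each "tree" is a one-split decision stump trained on a bootstrap sample, using a random subset of the features, and the ensemble predicts by majority vote. This is a teaching simplification, not the full RF algorithm or any cited implementation (real forests grow deep trees and redraw the feature subset at every split); all function names and the NDVI/NDRE values are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature_indices):
    """Find the single (feature, threshold) split with the fewest errors."""
    best = None
    for f in feature_indices:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not right:
                continue  # a split must leave samples on both sides
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    if best is None:  # no valid split: always predict the majority class
        label = majority(labels)
        return (feature_indices[0], float("inf"), label, label)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    """Train stumps on bootstrap samples with random feature subsets."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Majority vote across all stumps."""
    return majority([predict_stump(stump, row) for stump in forest])


# Hypothetical plot features (NDVI, NDRE) and field maturity notes.
rows = [(0.85, 0.40), (0.80, 0.38), (0.78, 0.35), (0.82, 0.37),
        (0.30, 0.12), (0.25, 0.10), (0.35, 0.15), (0.28, 0.11)]
labels = ["immature"] * 4 + ["mature"] * 4

forest = fit_forest(rows, labels)
green_plot = predict_forest(forest, (0.90, 0.45))
senesced_plot = predict_forest(forest, (0.20, 0.05))
print(green_plot, senesced_plot)
```

Because every stump sees a different resample and possibly a different feature, the individual predictors disagree on borderline plots, and averaging their votes is what reduces the variance of the ensemble.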
Pubescence color (gray, light tawny, and tawny) is an important soybean breeding trait that was recently classified using aerial images taken from a UAV (45). Breeders record this qualitative trait to characterize the experimental lines and to check that the identity of lines in the field and the frequencies in the progenies are as expected. Two loci epistatically control pubescence color, and the gene combinations that give the three phenotypes are tt__ (gray), T_tdtd (light tawny), and T_Td_ (tawny) (46)(47). Applying the SVM algorithm to multispectral images, Bruce and others (45) classified gray and tawny pubescence with an overall accuracy of 75% in the test data using the red/blue band ratio. However, they failed to separate light tawny from tawny pubescence because the algorithm incorrectly assigned all the light tawny lines to the tawny class.
Lodging is another important categorical trait in soybean, but unlike pubescence color, which is a nominal categorical variable, lodging is mainly treated by breeders as an ordinal categorical variable. Usually, it is scored visually in the field on a scale from 1 to 5, where the higher the number, the greater the percentage of prostrate plants. Working with this scale, Roth and others (48) abandoned their plan to develop a drone-based lodging index because they could not extract the time at which plant height drops. The reason was that, during photogrammetric processing, the depth maps were sensitive to defoliation and to color changes during the senescence phase. However, using the same ordinal scale divided into four levels, namely non-lodging (1, 1.5), moderate lodging (2, 2.5), high lodging (3, 3.5), and severe lodging (4, 4.5, 5), Sarkar and others (49) classified lodging with an overall accuracy of 96% by applying an artificial neural network (ANN), although only after using a data balancing method (SMOTE-ENN) to deal with the unbalanced number of plots per lodging level. Previously, predicting lodging in wheat rather than soybean and using a binary categorical variable (lodging and non-lodging), Zhang and others (41) achieved an overall accuracy of ~90% by applying RF, SVM, and three deep learning methods without preprocessing the data with a balancing method.
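The regrouping of the 1-5 scale into four levels, and the class imbalance that motivates a balancing step, can be sketched as follows. The level boundaries follow the grouping quoted above; the function name and the field scores are hypothetical, and no balancing method is implemented here.

```python
from collections import Counter


def lodging_level(score):
    """Map a 1-5 visual lodging score to one of four ordinal levels."""
    if not 1 <= score <= 5:
        raise ValueError("score must be on the 1 to 5 scale")
    if score < 2:
        return "non-lodging"       # scores 1 and 1.5
    if score < 3:
        return "moderate lodging"  # scores 2 and 2.5
    if score < 4:
        return "high lodging"      # scores 3 and 3.5
    return "severe lodging"        # scores 4, 4.5, and 5


# Hypothetical field scores for twelve plots.
scores = [1, 1, 1.5, 1, 2, 1, 2.5, 1, 3, 1, 4.5, 1.5]
counts = Counter(lodging_level(s) for s in scores)

for level, n_plots in counts.most_common():
    print(f"{level}: {n_plots} plots")
```

In this made-up sample most plots fall in the non-lodging level, which is the kind of imbalance that a resampling method is meant to address before training a classifier.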
Machine learning and deep learning are the two branches of AI most commonly applied to analyze data collected with an HTP platform (50); robotics and computer vision are two other AI fields in increasing use. Non-AI methods such as process-based models (48), geospatial analysis (51), and statistical and mathematical modeling combined with genomic data (52) have also been applied for plant breeding purposes to imagery collected with an HTP platform. Within each of these AI and non-AI methods, several promising approaches with potential use in plant breeding have been reported for predicting the main agronomic traits using field HTP platforms. However, there is no agreement about which learning algorithms and other analytic methods best predict the different traits. Undoubtedly, the results, expressed in terms of R² and RMSE when the variable is numerical, are affected by the genetic diversity of the germplasm, the size of the training and validation datasets, the number of trials across environments, the number of flights, the time and weather conditions when the images are taken, and the analysis pipelines applied.
Even with the same methods and information, results may vary according to the data type researchers assume for the same trait. Of the two independent lodging studies mentioned above, one on wheat and the other on soybean (41)(48), the first treated the trait as a binary categorical variable (lodging and non-lodging) and the second as an ordinal categorical variable (1-5 scale). Another example is the prediction of soybean physiological maturity, for which independent studies assumed different data types. While Yu and others (28) and Hu and others (37) treated the trait as a binary categorical variable (mature and immature plots), Zhang and others (38) treated it as a nominal categorical variable (immature, near-mature, mature, and harvested plots), and others treated it as a numerical variable to predict the date on which the plant rows reached the R8 stage (Table 1). Besides the differences in prediction error that may result, the kind of information the results provide to breeders also matters. For example, for soybean physiological maturity, binary models are helpful mainly for classifying maturity groups because they do not predict the date on which the event (i.e., the R8 stage) occurred. In contrast, with a time series of images and a numerical variable, a regression analysis can determine the date on which the event occurred; so, besides classifying maturity groups, the model also helps predict the cycle length of the germplasm in days.
Compared to pubescence color, lodging, physiological maturity, and other traits in soybean and other crops, more studies have been conducted to predict aboveground biomass and its relationship with grain yield. In one interesting study on soybean, Maimaitijiang and others (53) fitted a model combining canopy spectral and volumetric information, which they called the “vegetation index weighted canopy volume model (CVMVI).” The regression between canopy volume and aboveground biomass determined through destructive field sampling indicated that the proposed model (CVMVI) had a coefficient of determination (R² = 0.893) similar to that of more complex models such as PLSR (R² = 0.911) or stepwise multilinear regression (R² = 0.915). The predictions were made by first estimating canopy volume by photogrammetry, a variable highly correlated with aboveground biomass (54). The volumes were then weighted by a bulk density factor calculated with the green red ratio index (GRRI = green/red) of the pixels, which was chosen among other vegetation indices because it gave the best fit. Using data from the same experiment, Maimaitijiang and others (55) applied multimodal data fusion within a deep neural network framework and compared the results with other learning algorithms. The deep neural network explained a higher proportion of the variation in grain yield (72%) than RF (66%). However, when only the RGB bands were used as features, RF was the algorithm that best explained the grain yield variation.
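The idea of weighting canopy volume by a vegetation index can be sketched as below. The form used here, a sum over canopy pixels of pixel area × canopy height × GRRI, is an assumed simplification based only on the description above and should not be read as the exact formulation of the cited study; the function names and pixel values are hypothetical.

```python
def grri(green, red):
    """Green red ratio index, GRRI = green / red."""
    if red == 0:
        raise ValueError("red reflectance must be non-zero")
    return green / red


def plain_canopy_volume(pixels, pixel_area_m2):
    """Unweighted volume: sum of pixel area times canopy height."""
    return sum(pixel_area_m2 * p["height_m"] for p in pixels)


def weighted_canopy_volume(pixels, pixel_area_m2):
    """Volume with each pixel's contribution scaled by its GRRI (assumed form)."""
    return sum(pixel_area_m2 * p["height_m"] * grri(p["green"], p["red"])
               for p in pixels)


# Hypothetical canopy pixels: height from photogrammetry, bands from RGB imagery.
pixels = [
    {"height_m": 0.5, "green": 0.12, "red": 0.06},
    {"height_m": 0.4, "green": 0.10, "red": 0.05},
]
PIXEL_AREA_M2 = 0.01  # hypothetical ground area covered by one pixel

plain = plain_canopy_volume(pixels, PIXEL_AREA_M2)
weighted = weighted_canopy_volume(pixels, PIXEL_AREA_M2)
print(f"plain volume = {plain:.4f} m3, GRRI-weighted volume = {weighted:.4f}")
```

The weighted quantity would then be regressed against biomass from destructive sampling; the weighting lets two canopies of equal geometric volume differ when one is greener than the other.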
Two of the most important traits in soybean and other crops, physiological maturity (29)(30) and aboveground biomass and its relationship to grain yield (53)(55), were predicted with low error using only RGB images collected with drones carrying affordable digital cameras. Multispectral cameras are several times more expensive than digital RGB cameras, especially if they include a thermal sensor. They have been used mostly for studying traits related to plant health and plant stress, such as diseases, pests, and water deficit in soybean (56)(57)(58). Field spectroradiometers collecting hyperspectral data, in turn, are suitable for studying complex traits such as photosynthetic traits (19)(20)(21). However, there is a trade-off between the cost and the benefit of including RGB, multispectral, or hyperspectral data in predictive analyses.
Spectroradiometers are even more expensive than drones carrying multispectral cameras, and they need more time to collect the same amount of data than a drone flown over the experiments. Therefore, collecting data with spectroradiometers is the better option only when RGB or multispectral data cannot provide adequate or accurate information for certain traits. Otherwise, besides taking more time than a drone carrying a camera (RGB or multispectral), spectroradiometers may collect redundant information, mainly when phenotyping simple traits. This may also occur with complex traits when the relationships between light reflectance patterns and the underlying biological processes of plants are still poorly understood (12), although the first comprehensive studies are emerging (21). Phenotyping such a large amount of data is particularly challenging when plant breeders attempt to identify slight favorable genetic differences among thousands of experimental lines, commonly evaluated in small single-row plots. These considerations also apply to other self-pollinated crops besides soybean, and to cross-pollinated crops when inbred lines are selected based on hybrid performance.
4. Justification of Research Applying Field-Based HTP
Plant breeding programs of self-pollinated grain or legume crops, such as soybean, need to record phenotypic traits every year for thousands of experimental lines grown in plant rows to select those that should be evaluated in preliminary yield tests. The same occurs when testing F1 hybrids in cross-pollinated grain crops such as maize. In both cases, this is a time-consuming task that includes recording, before harvest, the date of silking in maize or of the R8 stage in soybean, to mention only a couple of traits. In the end, much of the data collected is not used because the vast majority of the plant rows in soybean or inbred lines in maize will be discarded due to their low grain yield and not because of a too-short or too-long cycle length.
In soybean, experimental lines are often developed from hybridizations between parents of different maturity groups. Hence, the range of days to reach the R8 stage in a breeding population can be wide between and within families, requiring field notes to be collected every three or four days for about four to five weeks. In addition, because of the high number of lines to evaluate, breeding programs often conduct their selections in trials of single-row plots (plant rows). Therefore, the selection accuracy for grain yield can be compromised by the lack of replication, but also because lines of different maturities can be affected differently by the random weather conditions of each growing season.
The use of drones for HTP, taking aerial images across the growing season, promises to improve the efficiency of breeding programs, either by saving time or by allowing a greater number of plots to be evaluated with similar resources, which in turn increases selection intensity. Moreover, by using image features, phenotypic values could be estimated with higher accuracy than by humans taking visual ratings in the field. For example, it was reported that for the R8 stage in soybean, predicted values were more reliable than ground-truth values (36). In this same crop, similar results could be obtained for lodging or pubescence color, considering that within a breeding program the notes are often taken by a team and not by a single person.
To date, only one study has predicted pubescence color in soybean (45), whereas several studies have predicted the R8 stage using different algorithms with aerial images or hyperspectral reflectance signatures of the canopy (Table 1). However, a small data set is a limitation of some studies, especially those using a spectroradiometer. Another relevant limitation is that only two of the studies predicting maturity were conducted across multiple environments and tested the models in an independent environment (29)(39), which highlights the need for more research to test the reproducibility of the models.
Although several studies have been conducted to predict aboveground biomass and grain yield, the same question remains about the models' reproducibility because some studies trained them using only a single environment and small sample sizes. For instance, although Maimaitijiang and others53,55 used the most cutting-edge algorithm, the study was conducted in a single environment with only three cultivars evaluated in multiple subplots. The uncertainty about the reproducibility of models is a concern in the breeding community2. Poor reproducibility would occur mainly when complex traits are predicted using small datasets that represent a single environment or germplasm. In this sense, an advantage of RF over other learning algorithms, noted when it was proposed, is that it incorporates bagging, a method that reduces overfitting of the trained model44. Concerning overfitting, models trained with check cultivars shared across environments performed better when tested in independent environments29,39, and they also generalized better when redundant image features were discarded after principal component analysis39.
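The bagging idea mentioned above can be sketched in a few lines. This is an illustration only, assuming that RF denotes random forest: it shows bootstrap aggregation with a simple one-nearest-neighbor base learner rather than decision trees, and it omits the random feature selection that a random forest adds. The function names and the feature and trait values are hypothetical.

```python
import random


def fit_1nn(xs, ys):
    """Return a one-nearest-neighbor regressor, a deliberately high-variance learner."""
    pairs = list(zip(xs, ys))

    def predict(x):
        # Return the trait value of the training point closest to x.
        return min(pairs, key=lambda p: abs(p[0] - x))[1]

    return predict


def bagged_fit(xs, ys, n_models=25, seed=0):
    """Fit n_models base learners on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw n observations with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_1nn([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        return sum(m(x) for m in models) / len(models)

    return predict


# Hypothetical image feature (xs) and trait values (ys), for illustration only.
xs = [0.20, 0.35, 0.50, 0.65, 0.80]
ys = [1.0, 1.4, 2.1, 2.4, 3.0]

model = bagged_fit(xs, ys, n_models=25, seed=0)
prediction = model(0.55)
print(prediction)
```

Because each base learner sees a different resample of the plots, averaging their outputs smooths the dependence on any single observation, which is the mechanism by which bagging reduces overfitting.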
For complex traits such as plant growth, canopy volume, aboveground biomass, and grain yield, most predictive studies in soybean and other crops have obtained leaf or canopy reflectance records using only one kind of HTP platform. Three examples are Ma and others18, who used a portable multispectral radiometer; Yu and others28, who used multispectral images; and Maimaitijiang and others53,55, who used RGB images. At the breeding-population scale, soybean studies that relate canopy reflectance information obtained from different platforms or sources are absent59,60. Instruments such as portable spectroradiometers have the advantage of recording a continuous range of the electromagnetic spectrum, which opens more opportunities to relate features to traits of breeding interest19,20,21. It is possible that more accurate predictions could be made for complex traits by associating features from time series of aerial images with hyperspectral reflectance signatures of the canopy.
5. Concluding Remarks
Several studies indicate that it is possible to predict traits of agronomic interest with a field-based HTP platform, using features of aerial imagery associated with the respective ground-truth values. However, most studies applying HTP have validated their models in data subsets of the same environment where they were trained, which can hide overfitting that becomes apparent only when the models are tested in a different environment. Overfitting may happen if too many image features or redundant information are used to train the models, if the variation in the germplasm is narrow for the trait of interest, or if the growing conditions differ significantly among environments. Considering this, collaborative international studies using the same workflow could fine-tune more generalized models by covering a broader spectrum of possible scenarios. Hence, to provide helpful information for breeders, work remains to identify redundant information so that models can be fitted that are more general and reliable when tested in new environments, whether in other years or in different locations. More reliable predictions would increase the efficiency and accuracy of the methods for selecting the best lines, leading to breeding programs with higher genetic gains per breeding cycle.
Another critical challenge is processing and analyzing the data in a short time, before harvest or at least before breeders make their final selections of experimental lines with superior genetic value. Applying an HTP platform directly in the plant breeding selection process requires a smooth pipeline workflow, which in turn demands investment not only in software and hardware but also in hiring or training collaborators with strong computational and analytical skills and, desirably, a background in agronomy, biology, genetics, or plant breeding. Data science is still an emerging discipline, although several universities have created certificates and bachelor's and master's degrees in the area, and private companies have created training programs. Merging all this knowledge into smooth pipeline workflows is even harder when HTP is an essential but not the only component of phenomics studies conducted to reveal the pathways between genes and traits. The more that future HTP findings come from cooperative efforts integrated with other omics approaches, the greater the genetic progress of breeding programs will be.














