GeoField

The colorful image of Sindh – the third-largest province of Pakistan – was created by combining three separate images from the near-infrared channel from the Copernicus Sentinel-2 mission. CREDIT: contains modified Copernicus Sentinel data (2021-22), processed by ESA. LICENCE: CC BY-SA 3.0 IGO

On March 21, 2024, the GeoField community of practice came together in a virtual workshop focused on technical issues arising when integrating Earth observation (EO) into impact evaluations. Organized by AidData, 3ie, and DevGlobal, the event brought together experts in EO and those specializing in impact evaluations and climate-sensitive agriculture. The conversations covered a range of topics, from the utilization of hyperspectral sensors and drones to the technical challenges surrounding spatial autocorrelation, resampling satellite imagery, and yield prediction using remote sensing data.

The virtual workshop facilitated discussions around a set of pre-submitted questions. These questions were:

1. How should we deal with spatial autocorrelation when we have very granular data? What kinds of tests can we use to check for autocorrelation in geospatial data (e.g., Moran’s I and variograms of each covariate over time), and how do we account for it if we find autocorrelation (e.g., cluster SEs, but at what level? Or aggregate data until no autocorrelation exists, but what if it still does)? How would the validity concerns we have and the steps we take change for different econometric designs (e.g., matching where we match on the variables that may be correlated; synthetic DID where SEs are calculated using placebo or bootstrapping methods)? Submitted by: Fiona Kastel, 3ie

Responses included:

There does not appear to be a single standard practice in the development or environmental economics literatures.
Possible approaches include controlling for “spatial spillovers” via a spatial lag model that includes lags in the independent variables, the unobservables, or both. See this work by Ariel Ortiz-Bobea on similar issues in a U.S. context, as well as this work by Liu and Yang that discusses quasi-maximum likelihood estimators for these models.
In addition, one might use the Conley (1999) standard error approach. It is implemented by a variety of user-written commands and packages; acreg (Stata) and conleyreg (R) are two of the latest to our knowledge.
Otherwise, people generally try to select a unit of aggregation for standard error clustering that balances cluster count (e.g., more than 40) against the potential for spatial correlation across clusters.
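
To make the test-then-adjust sequence above concrete, here is a minimal pure-Python sketch, not a substitute for acreg or a dedicated spatial statistics library. It computes Moran's I with binary distance-band weights, a permutation-based pseudo p-value, and a Conley-style standard error for the slope of a simple regression using a product Bartlett kernel. The function names, the 8 x 8 grid, the simulated data, and the cutoff distances are all assumptions made for illustration.

```python
import math
import random


def morans_i(values, coords, cutoff):
    """Moran's I with binary distance-band weights (w_ij = 1 if 0 < d_ij <= cutoff)."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    cross = 0.0  # sum of w_ij * z_i * z_j
    w_sum = 0.0  # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= cutoff:
                cross += z[i] * z[j]
                w_sum += 1.0
    return (n / w_sum) * cross / sum(zi * zi for zi in z)


def permutation_p_value(values, coords, cutoff, n_perm=99, seed=0):
    """One-sided pseudo p-value: share of shuffled maps with I at least as large."""
    rng = random.Random(seed)
    observed = morans_i(values, coords, cutoff)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, coords, cutoff) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def conley_slope_se(x, y, coords, cutoff):
    """Slope of y = a + b*x + e and a Conley-style SE (product Bartlett kernel)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    xd = [xi - x_bar for xi in x]
    sxx = sum(v * v for v in xd)
    slope = sum(xd[i] * (y[i] - y_bar) for i in range(n)) / sxx
    resid = [(y[i] - y_bar) - slope * xd[i] for i in range(n)]
    score = [xd[i] * resid[i] for i in range(n)]
    meat = 0.0
    for i in range(n):
        for j in range(n):
            dx = abs(coords[i][0] - coords[j][0])
            dy = abs(coords[i][1] - coords[j][1])
            # Weight declines linearly to zero at the cutoff in each dimension.
            k = max(0.0, 1 - dx / cutoff) * max(0.0, 1 - dy / cutoff)
            meat += k * score[i] * score[j]
    # Guard against a tiny negative value caused by floating-point rounding.
    return slope, math.sqrt(max(meat, 0.0)) / sxx


# Hypothetical 8 x 8 grid of pixel centroids with a smooth west-east gradient.
coords = [(float(cx), float(cy)) for cx in range(8) for cy in range(8)]
gradient = [cx for cx, _ in coords]
print("Moran's I:", morans_i(gradient, coords, cutoff=1.1))
print("pseudo p-value:", permutation_p_value(gradient, coords, cutoff=1.1))

# Simulated regression data (assumed true slope of 2) on the same grid.
rng = random.Random(42)
x = [rng.random() for _ in coords]
y = [1.0 + 2.0 * xi + rng.gauss(0, 0.1) for xi in x]
print("slope, Conley SE:", conley_slope_se(x, y, coords, cutoff=3.0))
```

With a cutoff smaller than the spacing between observations, the kernel keeps only each observation's own term and the Conley-style SE collapses to the usual heteroskedasticity-robust one, which is a convenient sanity check. The choice of cutoff is a judgment call that this sketch does not resolve.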

2. If you have data at a certain resolution, say 30 m, but you’re conducting analysis at a coarser resolution, say 500 m, what is the best practice for exporting pixels from that image? Do you take an average of all pixels and export, or sample? Do the sample and reducer commands in GEE do what we think they do, or is there a better practice for exporting this data for a polygon/region? And how does this play into cloud cover/missingness? Do we need to mask if there is some cloud cover? Submitted by: Sanchi Lokhande, 3ie

Responses included:

Digital Earth Africa offers an open alternative to Google Earth Engine.
Rather than resampling, several participants suggested using MODIS products that are already at coarser pixel scales.
If resampling, use modal or median values, which are less sensitive to outliers than the mean.
It is also worth considering whether you need to resample at all, or can instead analyze the data at the highest resolution (i.e., 30 m rather than 500 m) and simply use the right-hand-side variables at the coarser scales (with appropriate standard error clustering).
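
The median and modal aggregation suggested above can be sketched in a few lines of plain Python. This illustrates the logic only, not the Google Earth Engine API: the function name, the use of None for cloud-masked pixels, and the 50 percent minimum-valid-share rule are assumptions chosen for the example. Because 500 m is not an exact multiple of 30 m, a real workflow would also need to handle partial pixel overlap, which this sketch ignores by using an integer block factor.

```python
from statistics import median, mode


def aggregate_blocks(fine, factor, reducer=median, min_valid_share=0.5):
    """Reduce each factor x factor block of a fine grid to one coarse cell.

    None marks a cloud-masked (missing) pixel. A coarse cell is reported only
    when at least `min_valid_share` of its fine pixels are valid.
    """
    rows, cols = len(fine), len(fine[0])
    coarse = []
    for r0 in range(0, rows - factor + 1, factor):
        out_row = []
        for c0 in range(0, cols - factor + 1, factor):
            block = [fine[r][c]
                     for r in range(r0, r0 + factor)
                     for c in range(c0, c0 + factor)]
            valid = [v for v in block if v is not None]
            if valid and len(valid) / len(block) >= min_valid_share:
                out_row.append(reducer(valid))
            else:
                out_row.append(None)  # too cloudy to trust
        coarse.append(out_row)
    return coarse


# Hypothetical 4 x 4 tile of a continuous index; 100 is an outlier, None is cloud.
index_tile = [
    [1, 2, 10, None],
    [3, 100, None, None],
    [5, 5, 7, 7],
    [5, 6, 7, 8],
]
print(aggregate_blocks(index_tile, factor=2))

# Hypothetical categorical land-cover tile: the modal class is the natural reducer.
landcover_tile = [
    [1, 1],
    [1, 2],
]
print(aggregate_blocks(landcover_tile, factor=2, reducer=mode))
```

In the first tile, the median keeps the outlier from dominating its block, and the block that is three-quarters cloud is dropped rather than summarized from a single pixel.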

3. We are working on rice yield prediction using vegetation indices (VIs). We use yields from a crop-cut exercise to train the prediction models, such as a random forest model. However, the predicted yields have a more compressed distribution than the crop-cut yields. This pattern also appeared in other studies (e.g., Guan et al., 2018; Jain et al., 2016; Lobell et al., 2019). As a consequence, treatment impacts identified from the VI-predicted yields tend to be smaller than impacts identified from crop-cut yields and self-reported yields, such as in Jain et al. (2019). Are there any suggestions on how to improve the yield prediction model to mitigate this compressed distribution issue? Submitted by: S. Jessica Zhu, PxD

Responses included:

Optical-based indices are sometimes constrained by saturation, which may lead to compressed distributions. SAR or other sensors could provide better sensitivity to the variation you’re seeing.
It is important to understand whether the compression of the distribution is coming from the sensor data or from the prediction model itself. As a starting point, look at the distribution of the sensor data to see if it exhibits similar compression.
If the prediction model is responsible for the compression, one suggestion is to approach it as you would a “class imbalance” issue, and reweight observations in the training sample to give greater weight to the edges of the distribution.
It is also important to control or adjust for factors associated with water availability and, especially for rice, to account for crop moisture during harvests.
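
As a rough illustration of the diagnostic and the reweighting idea above, the sketch below computes how compressed a set of predictions is relative to crop-cut yields, then builds inverse-frequency training weights that up-weight observations in sparsely populated parts of the yield distribution. The function names, the bin count, and the toy yields are assumptions for illustration. Many learners accept per-observation weights (for example, the sample_weight argument of scikit-learn's fit methods), but whether reweighting actually reduces compression in a given application is an empirical question.

```python
from statistics import pstdev


def variance_ratio(predicted, observed):
    """SD of predictions relative to SD of observed yields; below 1 signals compression."""
    return pstdev(predicted) / pstdev(observed)


def inverse_frequency_weights(yields, n_bins=5):
    """Weight each observation by 1 / (count in its yield bin), rescaled to mean 1."""
    lo, hi = min(yields), max(yields)
    width = (hi - lo) / n_bins
    if width == 0:
        return [1.0] * len(yields)  # all yields identical: nothing to rebalance
    # The maximum value falls on the upper edge, so clamp it into the last bin.
    bins = [min(int((y - lo) / width), n_bins - 1) for y in yields]
    counts = [0] * n_bins
    for b in bins:
        counts[b] += 1
    raw = [1.0 / counts[b] for b in bins]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


# Hypothetical crop-cut yields (t/ha): most plots near 5, a few in the tails.
crop_cut = [1.0, 5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 9.0]
weights = inverse_frequency_weights(crop_cut, n_bins=4)
for y, w in zip(crop_cut, weights):
    print(f"yield={y:.1f}  weight={w:.2f}")

# Hypothetical predictions that are pulled toward the mean.
print("variance ratio:", variance_ratio([4.0, 5.0, 6.0], [2.0, 5.0, 8.0]))
```

Running the same variance ratio on the raw vegetation index (after rescaling) versus the model predictions is one way to see whether the compression is already present in the sensor data.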

4. How can advanced remote sensing techniques, such as hyperspectral imaging and UAV (unmanned aerial vehicle) imagery, be utilized to capture fine-scale vegetation dynamics and assess the performance of climate-smart agricultural practices at the local level? Submitted by: Kunwar Singh, AidData

Responses included:

One example offered: drones can capture village-level or other small-scale dynamics of croplands, and this imagery could also be used as training sample inputs to a broader national-level classification model.
UAVs can also be used to focus on questions that can only be assessed at finer scales, complementing analyses at wider scales but lower resolutions based on satellite or other EO data.
Cross-validation with multiple sources of data is particularly helpful, and these advanced remote sensing techniques can provide such data.

5. We are conducting a geospatial impact evaluation examining crop adoption in areas around specific geo-referenced villages (in this case, water-intensive crops aimed at mitigating flood risks along riverbanks). We need to evaluate program impacts in the area around the villages, so we need to build appropriate buffers around the treated and non-treated villages in our sample so that we can compare the difference in outcomes between the two groups. We are considering alternative approaches to defining buffers: (i) Thiessen (Voronoi) polygons that ensure that each village has a unique buffer region around it, and (ii) creating a grid system that spans our area of study and assigning treatment or comparison status to each grid cell based on the distance to a treated village. What are the pros and cons of these approaches, and are there other options you might recommend? Submitted by: Pratap Khattri, AidData

Responses included:

These kinds of challenges are quite common when integrating EO and impact evaluations, because we often lack clear links between the treatment sites (such as villages) and the farms or natural environments we observe through EO. Thus, we are often forced to develop a matching scheme (based on buffers or otherwise) that establishes these links in the data.
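
To illustrate how the two buffer definitions in the question differ in practice, the sketch below assigns grid cells to villages in two ways: a discrete Thiessen rule, where every cell inherits the status of its single nearest village, and a distance-band rule, where a cell counts as treated or comparison only if villages of exactly one type lie within a chosen radius. The village records, coordinates, and radius are hypothetical, and a real analysis would use projected coordinates and a GIS library rather than this plain-Python version.

```python
import math


def nearest_village(cell, villages):
    """Return the village record closest to a cell centroid."""
    return min(villages, key=lambda v: math.dist(cell, v["xy"]))


def thiessen_assignment(cells, villages):
    """Every cell inherits the treatment status of its nearest village."""
    return [nearest_village(cell, villages)["treated"] for cell in cells]


def distance_band_assignment(cells, villages, radius):
    """True/False if only treated/untreated villages lie within `radius`.

    Cells with no village in range, or with both types in range, get None
    so that they can be excluded as unlinked or potentially contaminated.
    """
    statuses = []
    for cell in cells:
        near_treated = any(
            v["treated"] and math.dist(cell, v["xy"]) <= radius for v in villages)
        near_untreated = any(
            not v["treated"] and math.dist(cell, v["xy"]) <= radius for v in villages)
        if near_treated and not near_untreated:
            statuses.append(True)
        elif near_untreated and not near_treated:
            statuses.append(False)
        else:
            statuses.append(None)
    return statuses


# Hypothetical villages (coordinates in km) and grid-cell centroids.
villages = [
    {"id": "A", "xy": (0.0, 0.0), "treated": True},
    {"id": "B", "xy": (10.0, 0.0), "treated": False},
]
cells = [(1.0, 0.0), (4.0, 0.0), (9.0, 0.0)]

print(thiessen_assignment(cells, villages))
print(distance_band_assignment(cells, villages, radius=3.0))
```

The Thiessen rule classifies every cell, including the one 4 km from village A, while the distance-band rule leaves that cell unassigned at a 3 km radius. That contrast captures one trade-off between the two approaches: full coverage versus the ability to drop cells whose link to any village is weak or ambiguous.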

Fiona Kastel is a Research Associate at the International Initiative for Impact Evaluation (3ie). At 3ie, Fiona leads the Data Innovations Group and provides research, program management, and data analytics support for multiple programs on agriculture, education, finance, health, and policy and institutional reform.

Post-event report: Technical issues in integrating Earth Observation into impact evaluations

The GeoField Community of Practice discusses topics ranging from the use of hyperspectral sensors and drones to the technical challenges surrounding spatial autocorrelation, resampling satellite imagery, and yield prediction using remote sensing data.