The colorful image of Sindh – the third-largest province of Pakistan – was created by combining three separate images from the near-infrared channel from the Copernicus Sentinel-2 mission. CREDIT: contains modified Copernicus Sentinel data (2021-22), processed by ESA. LICENCE: CC BY-SA 3.0 IGO
On March 21, 2024, the GeoField community of practice came together in a virtual workshop focused on technical issues that arise when integrating Earth observation (EO) into impact evaluations. Organized by AidData, 3ie, and DevGlobal, the event convened EO experts alongside specialists in impact evaluation and climate-sensitive agriculture. The conversations covered a range of topics, from the use of hyperspectral sensors and drones to the technical challenges of spatial autocorrelation, resampling satellite imagery, and predicting yields from remote sensing data.
Discussion at the workshop centered on five pre-submitted questions:
1. How should we deal with spatial autocorrelation when we have very granular data? What kinds of tests can we use to check for autocorrelation in geospatial data (e.g., Moran’s I and variograms of each covariate over time?), and how do we account for it if we find autocorrelation (e.g., cluster SEs, but at what level? Or aggregate data until no autocorrelation exists, but what if it persists?)? How would our validity concerns and the steps we take change for different econometric designs (e.g., matching, where we match on the variables that may be correlated; synthetic DID, where SEs are calculated using placebo or bootstrapping methods)? Submitted by: Fiona Kastel, 3ie
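Moran’s I, one of the tests named in the question, compares each observation with its neighbors under a chosen spatial weights matrix. The sketch below is a minimal NumPy version, assuming binary distance-band weights (cells within a hypothetical `bandwidth` count as neighbors) and a one-sided permutation test. It covers a single variable at one point in time; libraries such as PySAL provide fuller implementations with other weighting schemes.

```python
import numpy as np

def distance_band_weights(coords, bandwidth):
    """Binary spatial weights: w[i, j] = 1 if j lies within `bandwidth` of i (i != j)."""
    pts = np.asarray(coords, dtype=float)
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=-1))
    # d > 0 drops the diagonal (and any exactly duplicated locations)
    return ((d > 0) & (d <= bandwidth)).astype(float)

def morans_i(values, w):
    """Global Moran's I for one variable under the weight matrix w."""
    x = np.asarray(values, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)

def permutation_test(values, w, n_perm=999, seed=0):
    """One-sided permutation p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, w)
    # Shuffling values across locations destroys any spatial pattern
    null = np.array([morans_i(rng.permutation(values), w) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value

# Hypothetical 10 x 10 grid of pixels with rook (edge-sharing) neighbors
rows, cols = np.meshgrid(np.arange(10), np.arange(10), indexing="ij")
coords = np.column_stack([rows.ravel(), cols.ravel()])
w = distance_band_weights(coords, bandwidth=1.0)

smooth = (rows + cols).ravel().astype(float)               # spatially smooth gradient
noise = np.random.default_rng(1).normal(size=smooth.size)  # no spatial structure

I_smooth, p_smooth = permutation_test(smooth, w)
I_noise, p_noise = permutation_test(noise, w)
print(f"gradient: I = {I_smooth:.3f}, p = {p_smooth:.3f}")
print(f"noise:    I = {I_noise:.3f}, p = {p_noise:.3f}")
```

With no spatial autocorrelation, the expected value of I is -1/(n - 1), close to zero here, so the gradient scores high while the noise does not. The result depends on the weights: changing `bandwidth` changes which pixels count as neighbors, which is one reason the choice of clustering or aggregation level matters.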
Responses included:
2. If you have data at a certain resolution, say 30 m, but you’re conducting analysis at a coarser resolution, say 500 m, what is the best practice for exporting pixels from that image? Do you take the average of all pixels and export it, or sample? Do the sample and reducer commands in Google Earth Engine (GEE) do what we think they do, or is there a better practice for exporting this data for a polygon or region? And how does this interact with cloud cover and missing data? Do we need to mask if there is some cloud cover? Submitted by: Sanchi Lokhande, 3ie
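The averaging-versus-sampling choice can be illustrated with plain NumPy. This sketch assumes the fine grid is aligned with the coarse grid and that a coarse cell is an integer multiple of the fine pixel (for example, 30 m pixels grouped 17 × 17 give roughly 510 m cells); `min_clear_frac` is a hypothetical threshold. It is only an analogy for the GEE reducers mentioned in the question and does not reproduce their exact pixel-weighting behavior.

```python
import numpy as np

def block_mean(image, cloudy, factor, min_clear_frac=0.5):
    """Aggregate a fine-resolution band to coarser cells by averaging clear pixels.

    image:  2-D array of fine pixels (e.g. reflectance or a vegetation index)
    cloudy: boolean array of the same shape, True where the pixel is masked
    factor: number of fine pixels per coarse cell along each axis
    Cells with a clear fraction below `min_clear_frac` are returned as NaN.
    """
    img = np.asarray(image, dtype=float)
    clear = ~np.asarray(cloudy, dtype=bool)
    # Drop trailing rows/columns that do not fill a complete coarse cell
    h = (img.shape[0] // factor) * factor
    w = (img.shape[1] // factor) * factor
    img, clear = img[:h, :w], clear[:h, :w]
    blocks = (h // factor, factor, w // factor, factor)
    sums = np.where(clear, img, 0.0).reshape(blocks).sum(axis=(1, 3))
    counts = clear.reshape(blocks).sum(axis=(1, 3))
    clear_frac = counts / factor**2
    with np.errstate(invalid="ignore", divide="ignore"):
        means = sums / counts
    means[clear_frac < min_clear_frac] = np.nan
    return means, clear_frac

def block_center_sample(image, factor):
    """Alternative: keep one (roughly central) fine pixel per coarse cell."""
    img = np.asarray(image, dtype=float)
    h = (img.shape[0] // factor) * factor
    w = (img.shape[1] // factor) * factor
    return img[factor // 2:h:factor, factor // 2:w:factor]

# Toy 4 x 4 image aggregated into 2 x 2 cells
fine = np.arange(16, dtype=float).reshape(4, 4)
cloudy = np.zeros_like(fine, dtype=bool)
cloudy[0, 0] = True        # one cloudy pixel in the top-left cell
cloudy[2:4, 2] = True      # three of four pixels cloudy in the bottom-right cell
cloudy[3, 3] = True
means, frac = block_mean(fine, cloudy, factor=2)
sampled = block_center_sample(fine, factor=2)
print(means)
print(frac)
print(sampled)
```

Averaging clear pixels uses all the available information and makes cloud gaps explicit through the clear fraction, whereas sampling one pixel per cell discards the rest and inherits that single pixel's noise or any cloud contamination it carries.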
Responses included:
3. We are working on rice yield prediction using vegetation indices (VIs). We train the prediction models, such as a random forest model, on yields from crop-cut exercises. However, the predicted yields show a more compressed distribution than the crop-cut yields. This pattern has also appeared in other studies (e.g., Guan et al., 2018; Jain et al., 2016; Lobell et al., 2019). As a consequence, treatment impacts estimated from VI-predicted yields tend to be smaller than impacts estimated from crop-cut and self-reported yields, as in Jain et al. (2019). Are there suggestions for improving the yield prediction model to mitigate this compressed distribution? Submitted by: S. Jessica Zhu, PxD
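Part of the compression is mechanical: a model fitted to minimize squared error (including the regression trees averaged in a typical random forest) predicts something close to the conditional mean of yield given the indices, so its predictions vary less than observed yields whenever the indices explain only part of the variation. The simulation below uses ordinary least squares on made-up data as the simplest case, where the ratio of the prediction and outcome standard deviations equals the square root of R². It also shows one commonly discussed post-hoc adjustment, rescaling predictions to the observed mean and spread; this is an illustrative assumption, not a recommendation from the workshop.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Hypothetical data: crop-cut yields (t/ha) and a vegetation index that tracks them noisily
yield_cc = rng.normal(5.0, 1.0, n)
vi = 0.1 * yield_cc + rng.normal(0.0, 0.08, n)

# Least-squares fit of yield on the VI, standing in for any squared-error model
slope, intercept = np.polyfit(vi, yield_cc, 1)
pred = intercept + slope * vi

r2 = np.corrcoef(pred, yield_cc)[0, 1] ** 2
spread_ratio = pred.std() / yield_cc.std()  # equals sqrt(R^2) for in-sample OLS
print(f"R^2 = {r2:.2f}, sd(pred)/sd(yield) = {spread_ratio:.2f}")

# Illustrative post-hoc adjustment (an assumption, not from the workshop):
# stretch predictions so their mean and sd match the training yields
pred_rescaled = yield_cc.mean() + (pred - pred.mean()) * yield_cc.std() / pred.std()

mse = np.mean((pred - yield_cc) ** 2)
mse_rescaled = np.mean((pred_rescaled - yield_cc) ** 2)
print(f"MSE before = {mse:.3f}, after rescaling = {mse_rescaled:.3f}")
```

Rescaling restores the spread but raises the error of individual predictions, so it trades accuracy at the plot level for a more realistic distribution; whether that trade is worthwhile for estimating treatment impacts is a judgment call.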
Responses included:
4. How can advanced remote sensing techniques, such as hyperspectral imaging and unmanned aerial vehicle (UAV) imagery, be used to capture fine-scale vegetation dynamics and assess the performance of climate-smart agricultural practices at the local level? Submitted by: Kunwar Singh, AidData
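With hyperspectral data, each pixel carries a full reflectance spectrum, so vegetation indices can be built from narrow bands (for example, red-edge indices often used as proxies for chlorophyll content) rather than only the broad red and near-infrared bands of multispectral sensors. The sketch below assumes a reflectance cube shaped (bands, rows, cols) with known band-center wavelengths; the sensor, wavelengths, and reflectance values are hypothetical, and the band pairs are illustrative choices rather than a fixed standard.

```python
import numpy as np

def nearest_band(wavelengths_nm, target_nm):
    """Index of the band whose center wavelength is closest to target_nm."""
    return int(np.argmin(np.abs(np.asarray(wavelengths_nm, dtype=float) - target_nm)))

def normalized_difference(cube, wavelengths_nm, a_nm, b_nm):
    """(A - B) / (A + B) per pixel, using the bands nearest a_nm and b_nm.

    cube: reflectance array shaped (bands, rows, cols).
    """
    a = cube[nearest_band(wavelengths_nm, a_nm)].astype(float)
    b = cube[nearest_band(wavelengths_nm, b_nm)].astype(float)
    total = a + b
    with np.errstate(invalid="ignore", divide="ignore"):
        index = (a - b) / total
    return np.where(total == 0, np.nan, index)

# Hypothetical 61-band sensor sampling 400-1000 nm every 10 nm over a 2 x 3 pixel plot
wavelengths = np.arange(400, 1001, 10)
cube = np.full((wavelengths.size, 2, 3), 0.2)
cube[nearest_band(wavelengths, 670)] = 0.05  # red, absorbed by chlorophyll
cube[nearest_band(wavelengths, 720)] = 0.25  # red edge
cube[nearest_band(wavelengths, 800)] = 0.45  # near-infrared plateau

ndvi = normalized_difference(cube, wavelengths, 800, 670)
ndre = normalized_difference(cube, wavelengths, 800, 720)
print(ndvi)
print(ndre)
```

The same per-pixel computation applies to very high-resolution UAV imagery; the fine pixels can then be summarized per plot or field to compare plots under different practices.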
Responses included:
5. We are conducting a geospatial impact evaluation examining crop adoption in areas around specific geo-referenced villages (in this case, water-intensive crops aimed at mitigating flood risks along riverbanks). Because we need to evaluate program impacts in the area around the villages, we must build appropriate buffers around the treated and non-treated villages in our sample to compare outcomes between the two groups. We are considering two approaches to defining buffers: (i) Thiessen (Voronoi) polygons, which ensure that each village has a unique buffer region around it, and (ii) a grid system spanning our study area, with treatment or comparison status assigned to each grid cell based on its distance to the nearest treated village. What are the pros and cons of these approaches, and are there other options you might recommend? Submitted by: Pratap Khattri, AidData
Responses included:
These kinds of challenges are quite common when integrating EO and impact evaluations, because we often lack clear links between the treatment sites (such as villages) and the farms or natural environments we observe through EO. Thus, we are often forced to develop a matching scheme (based on buffers or otherwise) that establishes these links in the data.
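Both buffer schemes from question 5 can be approximated on a common grid, which makes their differences easy to inspect. Assigning each grid cell to its nearest village yields the discrete equivalent of Thiessen (Voronoi) polygons, while the second scheme labels cells by distance to the nearest treated village. The sketch assumes projected coordinates (so Euclidean distance is meaningful), brute-force distance computation suited to modest numbers of cells, at least one treated village, and a hypothetical `max_dist` threshold.

```python
import numpy as np

def link_cells_to_villages(cell_xy, village_xy, treated, max_dist):
    """Compare two ways of assigning grid cells to treated/comparison status.

    cell_xy, village_xy: (n, 2) projected coordinates in the same units (e.g. meters)
    treated: boolean array, one entry per village (at least one must be True)
    max_dist: hypothetical distance threshold for the distance-based rule
    """
    cells = np.asarray(cell_xy, dtype=float)
    villages = np.asarray(village_xy, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    # Distance from every cell to every village: shape (n_cells, n_villages)
    d = np.linalg.norm(cells[:, None, :] - villages[None, :, :], axis=-1)

    # (i) Voronoi-style: each cell belongs to its nearest village and inherits its status
    nearest = d.argmin(axis=1)
    voronoi_treated = treated[nearest]

    # (ii) Distance rule: treated if any treated village lies within max_dist
    distance_treated = d[:, treated].min(axis=1) <= max_dist

    # Cells where the rules disagree, e.g. comparison-village cells close to a treated village
    disagree = voronoi_treated != distance_treated
    return nearest, voronoi_treated, distance_treated, disagree

# Hypothetical example: one treated and one comparison village 10 km apart on a line
villages = [(0.0, 0.0), (10_000.0, 0.0)]
treated = [True, False]
cells = [(1_000.0, 0.0), (4_000.0, 0.0), (6_000.0, 0.0), (9_000.0, 0.0)]
nearest, vor_t, dist_t, disagree = link_cells_to_villages(cells, villages, treated, max_dist=7_000.0)
print(nearest, vor_t, dist_t, disagree)
```

The `disagree` flag marks cells where the two schemes conflict, such as a cell that falls in a comparison village's Voronoi region but lies within reach of a treated village; counting such cells is one way to gauge how sensitive the comparison groups might be to the choice. In practice one might also drop cells beyond some distance of any village, a refinement left out here.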
Fiona Kastel is a Research Associate at the International Initiative for Impact Evaluation (3ie). At 3ie, Fiona leads the Data Innovations Group and provides research, program management, and data analytics support for multiple programs on agriculture, education, finance, health, and policy and institutional reform.