The National Institutes of Health (NIH) has issued a solicitation, "Spatial Uncertainty: Data, Modeling, and Communication," for innovative research that identifies sources of spatial uncertainty in public health data, incorporates that uncertainty into statistical methods, and develops novel tools to visualize its nature and consequences. The solicitation, which spans seven NIH institutes, has recurring deadlines of February 5, June 5, and October 5 through summer 2014.
According to the NIH solicitation:
Advanced disease surveillance systems and electronic health data systems generate a large volume of data. Meanwhile, the rapidly growing number of data sources has produced complex datasets that pose significant challenges for the researchers who analyze them. Spatial uncertainty takes one or a combination of several forms:
- Geocoding is a procedure that converts information about the locations (addresses, zip codes, counties, etc.) of people, homes, health care providers, etc., into geographic coordinates (latitude and longitude). In population-based public health data, geocoding is commonly performed by an automated procedure, and the results are well known to contain positional errors.
- To protect people's privacy and confidentiality, geographic information in disease data is usually not released if it could identify individuals. For example, if there are just one or two cases of a rare disease in a county, the county-level number of cases will not be released. This practice leaves gaps in the data. Another way to protect privacy is to mask individuals' locations, either by shifting each individual's location by a random distance or by aggregating point-level data (longitude and latitude) to an area level such as census block group, census tract, or county. However, both approaches introduce error and uncertainty that the data user cannot quantify and indeed cannot know.
- Spatial uncertainty also arises when data come from a variety of collection schemes at varying spatial scales. For example, cancer incidence rates may be available at the county level, while people's environmental exposures are measured at just a few isolated monitoring sites and socio-economic covariates come at the census-tract level. This type of spatial mismatch among data sources has long been identified in the literature as the change of support problem. However, concrete statistical methods that handle these mismatches with computationally feasible algorithms have not yet been developed.
- Boundaries of spatial units may change over time, adding another layer of mismatch at the spatio-temporal level. For example, in 2001, Broomfield, Colorado, was officially incorporated as a new county composed of portions of four existing counties. The mismatch in county codes before and after 2001 makes it challenging to calculate and interpret trends in disease rates for that county. Various methods exist for areal interpolation of data mismatched by boundary shifts, but the resulting estimates will contain error and uncertainty.
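To make the areal-interpolation point concrete, here is a minimal Python sketch of area-weighted interpolation, one common way to reallocate counts when boundaries shift. It is an illustration, not a method from the solicitation: it assumes cases are spread uniformly within each source unit, and the `SourceUnit` type, the function names, the parent units A to D, and all numbers are hypothetical (not real Colorado data). The second function shows where the uncertainty comes from: the observed counts alone only bound the true value.

```python
from dataclasses import dataclass


@dataclass
class SourceUnit:
    """A pre-change areal unit (e.g., an old county) with an observed count.

    Hypothetical type for illustration only.
    """
    name: str
    count: float         # observed cases in the whole source unit
    area: float          # total area of the source unit
    overlap_area: float  # part of that area lying inside the new target unit


def _check(s: SourceUnit) -> None:
    """Reject geometrically impossible inputs."""
    if s.area <= 0:
        raise ValueError(f"{s.name}: area must be positive")
    if not 0 <= s.overlap_area <= s.area:
        raise ValueError(f"{s.name}: overlap_area must lie in [0, area]")


def area_weighted_estimate(sources: list[SourceUnit]) -> float:
    """Area-weighted interpolation of a count into the target unit.

    Assumes cases are spread uniformly over each source unit, so each
    source contributes count * (overlap_area / area).
    """
    total = 0.0
    for s in sources:
        _check(s)
        total += s.count * (s.overlap_area / s.area)
    return total


def count_bounds(sources: list[SourceUnit]) -> tuple[float, float]:
    """Range of target counts consistent with the observed totals alone.

    A fully contained source must contribute all of its cases; a partly
    overlapping source may contribute anywhere from none to all of them;
    a non-overlapping source contributes nothing.
    """
    low = high = 0.0
    for s in sources:
        _check(s)
        if s.overlap_area == s.area:
            low += s.count
        if s.overlap_area > 0:
            high += s.count
    return low, high


if __name__ == "__main__":
    # Hypothetical parent units A-D, each ceding territory to a new county.
    parents = [
        SourceUnit("A", count=120, area=3000, overlap_area=30),
        SourceUnit("B", count=80, area=1900, overlap_area=57),
        SourceUnit("C", count=200, area=2000, overlap_area=40),
        SourceUnit("D", count=50, area=10000, overlap_area=20),
    ]
    print(f"area-weighted estimate: {area_weighted_estimate(parents):.1f}")
    print(f"bounds from counts alone: {count_bounds(parents)}")
```

For these hypothetical inputs the area-weighted estimate is about 7.7 cases, while the counts alone are consistent with anywhere from 0 to 450 cases. The point estimate rests entirely on the uniform-density assumption, which is exactly the kind of unquantified uncertainty described above.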
This funding opportunity announcement (FOA) encourages research projects to improve data collection and quality control in the following types of data: a) disease registry data, such as through improvements in geocoding methods; b) small-area demographic data and intercensal estimates; c) historic risk factor exposure data to account for latency of disease development; d) residential histories of patients to address uncertainty in exposure assessment; e) remote sensing and image data, alone or in combination with data from fixed monitoring sites, to construct exposure assessments; f) data on multiple types of exposures to account for possible cumulative exposure effects; g) electronic health record data and new media sources that give a more comprehensive view of disease surveillance, control, and prevention; and h) linked data from various data sources.
In addition to improving data collection and quality control, the FOA seeks novel statistical methods to model spatial uncertainty…
- Methods for incorporating spatial uncertainty from various sources, such as data on physical activity, diet, and the food environment;
- Methods for integrating data across spatial and temporal scales such as census tracts and counties at multiple time periods;
- Methods for quantifying spatial uncertainty in maps at different levels of aggregation;
- Methods for quantifying spatial uncertainty in cluster identification algorithms; and
- Methods for large, complex, and detailed datasets (e.g., spatial data mining) that help with gaining new knowledge in disease or exposure patterns.
…as well as new geographic information system (GIS) methods for addressing and visualizing spatial uncertainty.
To learn more, check out the full solicitation here.
(Contributed by Erwin Gianchandani, CCC Director)