Introduction

Schistosomiasis affects over 250 million people worldwide (Hotez et al. 2014), with Africa contributing approximately 97% of the global cases (Steinmann et al. 2006). Ghana is among the heavily affected countries, with an estimated prevalence of 23.3% (Lai et al. 2015). School-based deworming with praziquantel, as part of preventive chemotherapy, is the predominant schistosomiasis control strategy in Ghana. The Ghana Health Service (GHS) currently uses a combination of limited field survey results, data from the national surveillance system, and historical knowledge of endemicity to determine which administrative regions and districts include praziquantel in their deworming campaigns.

Schistosomiasis is caused by parasitic blood flukes of the genus Schistosoma and acquired from skin contact with contaminated freshwater bodies. Schistosomes require snails as intermediate hosts, and transmission occurs at the locations where parasites, snails, and humans converge (Gryseels et al. 2006). Over 350 snail species are suitable hosts for schistosomes; however, three genera of snails are most relevant for public health: Biomphalaria, Bulinus, and Oncomelania, because they serve as intermediate hosts for the three parasite species that most commonly infect humans: S. mansoni, S. haematobium, and S. japonicum, respectively (Gryseels et al. 2006). S. haematobium is the predominant cause of schistosomiasis in Ghana (Lai et al. 2015). Snails are typically found along the shores of perennial freshwater bodies such as ponds, streams, and lakes. Ghana’s Lake Volta, the largest man-made freshwater reservoir in the world, has nearly 5000 km of shoreline that is an ideal habitat for snails (Doumenge 1987). Other parts of Ghana (e.g., Eastern Region) have an abundance of small rivers and streams (Kulinkina 2017) that also contribute to the country’s schistosomiasis burden.

Specific parasite, snail, and human characteristics that affect schistosomiasis transmission were reviewed in detail by Walz et al. (2015a). Environmental variables that support favorable snail habitats are useful in modeling schistosomiasis risk (Brooker et al. 2001; Walz et al. 2015a, b). Past studies suggest that environmental variables such as temperature, vegetation, precipitation, water chemistry, and elevation are effective predictors of snail habitat and thus potential schistosomiasis transmission (Simoonga et al. 2009; Walz et al. 2015a). Some of these environmental parameters are accessible by remote sensing (RS) technology (Simoonga et al. 2009).

Many RS data streams are publicly available at no or limited cost at various spatial and temporal resolutions. These characteristics make RS data very useful for modeling environmentally sensitive diseases in low-income countries lacking the resources to perform ground-based studies. RS data have been widely used in modeling of infectious diseases linked to the environment, including, but not limited to, vector-borne diseases (Clements et al. 2013; Kalluri et al. 2007), soil-transmitted helminths (Brooker and Michael 2000; Lai et al. 2013), waterborne infections like typhoid (Dewan et al. 2013), and water-related infections like schistosomiasis (Walz et al. 2015a). In two recent reviews of RS applications in schistosomiasis modeling, the vast majority of studies utilized cross-sectional survey data (Simoonga et al. 2009; Walz et al. 2015a). Longitudinal data on neglected tropical diseases (NTDs) remain extremely rare. Yet, with improved surveillance, sensor methodology, and data analytics, these advanced approaches can offer a decision support tool for a broad range of climate-sensitive diseases in low-income settings.

The present study differed from the reviewed studies in utilizing monthly schistosomiasis infection rates obtained from Ghana’s national surveillance and reporting system, aggregated by administrative district, and three RS-based environmental predictors (vegetation, temperature, and precipitation), arranged as time series. The primary aim of the study was to assess broad associations between routinely collected schistosomiasis data and publicly available RS data at the national scale. The analysis was stratified by three major and nine minor climate zones, defined according to a new classification method, using multiple satellite data streams (Liss et al. 2014). We hypothesized that there may be spatial and temporal patterns in reported schistosomiasis infection rates that can be partially explained by environmental parameters, and that these patterns may vary across climate zones (Brooker et al. 2001; Walz et al. 2015a).

Data and methods

Health outcome

Monthly counts of schistosomiasis cases, aggregated to the level of administrative district (n = 216), as reported into the District Health Information Management System (DHIMS), were obtained from GHS. Schistosomiasis refers to infections caused by S. haematobium and S. mansoni, as these are not differentiated in the data source. Data were acquired in January 2016 for an 8-year period (96 months) from January 2008 through December 2015. The original dataset consisted of 20,736 monthly observations (216 districts × 96 months). Due to incomplete, missing, and outlier values, some observations were removed to achieve reliable modeling. Data reduction steps are described below. The final dataset comprised 10,818 observations (Fig. 1).

Fig. 1

Data processing steps and sample size reduction

Incomplete data

Examination of the monthly time series of schistosomiasis counts (Fig. S1, Supplemental Information) revealed that the last month (December 2015) had substantially lower counts due to a probable delay in reporting of data acquired in January of 2016. Thus, data from this month (216 observations) were excluded from the analysis.

Missing data

A high percentage of observations in DHIMS were shown as blank or empty cells, and it was unclear whether these observations represented a true absence of events or a lack of reporting. If the former, the blank or empty cells should be replaced with zeros; if the latter, they should remain as missing values. Since it was not possible to determine the reasons behind the coding scheme, all blank cells were excluded from analysis. A total of 1995 systematically missing observations came from four districts that had no reported cases and an additional 17 districts that had > 95% missing values. Another 7664 non-systematically missing observations, or missing at random, were also removed (Figs. S2 and S3, Supplemental Information).

Outlier data

Exceptionally high monthly schistosomiasis counts were explored as potential data input errors. To help differentiate naturally occurring from unlikely high values, the skewness and kurtosis of disease counts (and of loge-transformed disease rates for confirmation) were reviewed. As a result of the assessment (Table S1, Fig. S3, Supplemental Information), 43 outliers were excluded from analysis.
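As an illustrative check (not part of the original analysis), the sample-size accounting described in the data reduction steps can be reproduced in a few lines of Python. All counts are taken from the text; the variable names are hypothetical, and the note that 1995 equals 21 districts × 95 months is our own observation that the reported numbers are consistent with each other.

```python
# Tally of the data reduction steps; every count below is quoted from the text.
N_DISTRICTS = 216
N_MONTHS = 96

original = N_DISTRICTS * N_MONTHS  # district-month observations before cleaning

removed = {
    "incomplete (December 2015)": 216,
    "systematically missing": 1995,
    "non-systematically missing": 7664,
    "outliers": 43,
}

final = original - sum(removed.values())

# Four districts with no reported cases plus 17 with > 95% missing values.
districts_dropped = 4 + 17
districts_retained = N_DISTRICTS - districts_dropped

# Consistency observation (ours): dropped districts x 95 remaining months.
systematic_check = districts_dropped * (N_MONTHS - 1)

print(f"original observations: {original}")
for step, n in removed.items():
    print(f"  removed ({step}): {n}")
print(f"final observations: {final}")
print(f"districts retained: {districts_retained}")
```

The retained district count matches the 195 districts referred to in the figure captions.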

For analysis, loge-transformed schistosomiasis infection rates were used. To derive population-adjusted disease rates, district-level population estimates were obtained from the most recent census (2010) and projected for each study year using intercensal population growth rates estimated for each of Ghana’s 10 administrative regions (GSS 2013). The matrix of monthly district-aggregated schistosomiasis counts was divided by the respective matrix of population counts to derive rates, expressed as rates per million people, and loge transformed to achieve an approximately normal distribution.
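A minimal sketch of the rate derivation is given below. It assumes compound annual growth from the 2010 census, which is our assumption because the text does not state the projection formula; the function names and the example populations, growth rates, and counts are hypothetical.

```python
import numpy as np

def project_population(pop_2010, annual_growth, years):
    """Project 2010 census populations to each study year.

    Assumption: compound (geometric) growth at a constant annual rate.
    Returns an array of shape (n_districts, n_years).
    """
    pop_2010 = np.asarray(pop_2010, dtype=float)
    annual_growth = np.asarray(annual_growth, dtype=float)
    years = np.asarray(years, dtype=float)
    return pop_2010[:, None] * (1.0 + annual_growth[:, None]) ** (years[None, :] - 2010.0)

def log_rate_per_million(counts, population):
    """Natural log of cases per million people.

    Missing (NaN) and zero counts are returned as NaN, mirroring the
    exclusion of blank cells described in the text.
    """
    counts = np.asarray(counts, dtype=float)
    population = np.asarray(population, dtype=float)
    rate = counts / population * 1e6
    out = np.full(rate.shape, np.nan)
    positive = rate > 0
    out[positive] = np.log(rate[positive])
    return out

# Hypothetical example: two districts, three study years.
pop = project_population([100_000, 250_000], [0.025, 0.030], [2008, 2010, 2015])
counts = np.array([[5.0, np.nan, 0.0],
                   [10.0, 12.0, 20.0]])
log_rates = log_rate_per_million(counts, pop)
print(np.round(log_rates, 2))
```

In the study the counts are monthly, so each year's projected population would be repeated across that year's 12 monthly columns before the division.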

Environmental predictors

Three environmental variables were derived from publicly available RS data streams: land surface temperature (LST), normalized difference vegetation index (NDVI), and accumulated precipitation (AP) (Table 1). LST and NDVI were downloaded from the online Data Pool at the NASA Land Processes Distributed Active Archive Center (LP DAAC), USGS/Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota (https://lpdaac.usgs.gov/data_access/data_pool). The data were sourced from the Moderate Resolution Imaging Spectroradiometer (MODIS) sensor aboard the Aqua and Terra satellites, which together have 8-day temporal resolution. Monthly AP data were downloaded from the Goddard Earth Sciences Data and Information Services Center (GES DISC) data visualization tool, GIOVANNI (https://giovanni.sci.gsfc.nasa.gov/giovanni/), and utilized data from the Tropical Rainfall Measuring Mission (TRMM). All three datasets were mosaicked (i.e., merged along adjacent sides) to cover the full extent of Ghana, and aggregated to monthly mean values per district to match the temporal and spatial aggregation of the health outcome data. Spatial aggregation was performed using cell and zonal statistics tools in ArcGIS (version 10.4.1). Where resampling was required, the cubic convolution method was used because it more realistically reflects the smooth transitions of environmental data across terrain.

Table 1 Data sources and temporal and spatial resolution of environmental parameters

Defining the climate zones

Ghana has a diverse climate, ranging from hot and dry savannah in the north, through tropical forest in the middle, to coastal savannah in the south of the country (Frenken 2005). One of the aims of the analysis was to explore temporal and spatial patterns in schistosomiasis counts across a range of climatic conditions. As an alternative to the commonly used Köppen–Geiger (KG) climate classification, we used a new “Limiting, K-means, Nomination” (LKN) method to define climate zones. The LKN method is based on a k-means clustering algorithm over space and time (Liss et al. 2014). To define the zones, 15 years (2000–2015) of 8-day composite NDVI and LST images from the MODIS sensor were mosaicked and arranged in a layered space-time series. Water bodies were masked to reduce their effect on climate zone classification. After masking the water bodies, the multi-layer images were pixel-averaged and principal component decomposition was applied to reduce dimensionality of the time series. The first four and eight principal components retained 90% and 95% of the original information, respectively, and composite images of these components showed high spatial separation and a large signal-to-noise ratio. Multiple k-means unsupervised classifications were performed using varying numbers of classes, principal components, and distance measures, which were analyzed using cluster validity indexes. The most compact clustering solution exhibited the highest degree of homogeneity within each cluster and the highest degree of heterogeneity across clusters. Out of 600 candidate partitions, three major zones (Z3:1 to Z3:3) and nine minor (non-hierarchical) zones (Z9:1 to Z9:9) were produced that were entirely data-driven and specific to Ghana. Zonal statistics tools in ArcGIS were used to determine the major and minor climate zones within which each of the 216 administrative districts fell.
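The core of this workflow, principal component reduction followed by k-means clustering of pixel time series, can be sketched as follows. This is a simplified illustration on synthetic data and not the LKN implementation of Liss et al. (2014): it omits water masking, the comparison of distance measures, and the cluster validity indexes used to choose among candidate partitions. The function names, the synthetic seasonal profiles, and the plain Lloyd's algorithm are all our assumptions.

```python
import numpy as np

def pca_scores(X, var_target=0.90):
    """Project rows of X onto the fewest principal components whose
    cumulative explained variance reaches var_target."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(explained, var_target)) + 1, len(s))
    return U[:, :k] * s[:k], explained[:k]

def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm with Euclidean distance."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center
        d = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic "pixels": three hypothetical seasonal NDVI-like profiles plus noise.
rng = np.random.default_rng(42)
t = np.arange(46)  # e.g., 46 eight-day composites in one year
profiles = [
    0.3 + 0.20 * np.sin(2 * np.pi * t / 46),
    0.5 + 0.10 * np.sin(4 * np.pi * t / 46),
    0.7 + 0.05 * np.cos(2 * np.pi * t / 46),
]
X = np.vstack([p + rng.normal(0.0, 0.02, size=(100, t.size)) for p in profiles])

scores, explained = pca_scores(X, var_target=0.90)
labels, centers = kmeans(scores, k=3)
print("components retained:", scores.shape[1])
print("pixels per cluster:", np.bincount(labels, minlength=3))
```

In practice the clustering would be repeated over many settings (number of classes, number of components, distance measure) and the partitions compared with validity indexes, as described in the text.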

Data analysis

Exploratory analyses included histograms, maps, plots of trend and seasonality, and descriptive statistics for the health outcome and environmental predictors, stratified by climate zone. The associations among variables were examined using Spearman’s rank correlation coefficients and regression models. Generalized linear mixed-effects regression models with a random intercept term were used to assess temporal features and associations between environmental predictors and schistosomiasis infection rates, accounting for district-level clustering of monthly observations. Temporal features included trend and two seasonal harmonic terms (Jagai et al. 2012; Kulinkina et al. 2016; Naumova et al. 2007). Models were repeated for major and minor climate zones.

The complete model was formulated as follows (Eq. 1):

$$ Y_{tj}=\beta_0+\beta_1 t+\beta_L S+\beta_M \mathrm{RS}_{tj}+\alpha\,\mathrm{District}_j+\varepsilon_{tj} $$
(1)

where Ytj is the loge-transformed schistosomiasis infection rate per million for month t and district j; β0 is the intercept; β1 is the regression coefficient for trend, represented by the continuous study month t ranging from 1 to 95; βL represents the set of four regression coefficients for seasonality, S, measured by four harmonic terms (Eq. 2); βM is the set of three regression coefficients for remote sensing parameters, RS (Eq. 3); α is the random effect for district j; and the final term is the residual error for month t and district j.

$$ S={\beta}_2\sin \left(2\pi \omega t\right)+{\beta}_3\cos \left(2\pi \omega t\right)+{\beta}_4\sin \left(4\pi \omega t\right)+{\beta}_5\cos \left(4\pi \omega t\right) $$
(2)

Seasonality in schistosomiasis infection rates was assessed based on the significance of the four harmonic terms, capturing up to two annual peaks, with ω = 1/12; seasonality was present if at least one harmonic term was statistically significant.
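To make the harmonic terms concrete, the sketch below builds the four seasonal regressors of Eq. 2 for study months 1 to 95 with ω = 1/12 and recovers known coefficients by ordinary least squares. It is an illustration only: it ignores the district-level random effect of the full model, and the coefficient values are hypothetical.

```python
import numpy as np

def harmonic_terms(t, omega=1.0 / 12.0):
    """Columns: sin(2*pi*w*t), cos(2*pi*w*t), sin(4*pi*w*t), cos(4*pi*w*t)."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([
        np.sin(2 * np.pi * omega * t),
        np.cos(2 * np.pi * omega * t),
        np.sin(4 * np.pi * omega * t),
        np.cos(4 * np.pi * omega * t),
    ])

t = np.arange(1, 96)          # study months 1..95
H = harmonic_terms(t)         # shape (95, 4)

# Design matrix: intercept, linear trend, four harmonic terms.
X = np.column_stack([np.ones(t.size), t, H])

# Hypothetical coefficients (beta_0 .. beta_5), for illustration only.
true_beta = np.array([3.5, -0.01, 0.20, 0.10, 0.05, -0.05])
y = X @ true_beta             # noiseless synthetic log rates

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 3))
```

The first pair of terms repeats every 12 months and the second pair every 6 months, which is why together they can represent up to two peaks per year.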

$$ \mathrm{RS}=\beta_6\,\mathrm{LST}_{tj}+\beta_7\,\mathrm{NDVI}_{tj}+\beta_8\,\mathrm{AP}_{tj} $$
(3)

Model building was conducted sequentially from three partial models to a complete final model. Model 1 included only the temporal trend, Model 2 only the seasonal component (Eq. 2), Model 3 only the environmental variables (Eq. 3), and Model 4 contained all components (Eq. 1). The estimates of the predicted percent change in monthly rates of reported schistosomiasis cases (%R) per 1-unit increase of each environmental variable, along with their 95% confidence interval limits (CI95%), were calculated by exponentiating the regression coefficients (β6, β7, β8) and converting them to percentages as follows: %R = (exp{βM} − 1) × 100%; CI95% = (exp{βM ± 1.96 SEβM} − 1) × 100%. Similarly, the estimates for trend were obtained using the β1 regression coefficient. Predicted temporal curves were plotted using partial model results (trend + seasonality model). All models were fitted by the restricted maximum likelihood (REML) method, using the glmer function of the lme4 package in R (version 3.3.1). Model fit was assessed using R2 or percent variability explained.
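The percent-change conversion can be written as a short helper. The sketch below implements the two formulas given above; the `unit` argument, which rescales the coefficient (e.g., to a 0.1-unit change in NDVI), is our addition, and the example coefficient and standard error are hypothetical values rather than study results.

```python
import math

def percent_change(beta, se, unit=1.0, z=1.96):
    """Percent change in the rate per `unit` increase in a predictor,
    with approximate 95% confidence limits.

    Implements %R = (exp(beta) - 1) * 100 and
    CI95% = (exp(beta +/- 1.96 * SE) - 1) * 100, after scaling by `unit`.
    """
    point = (math.exp(unit * beta) - 1.0) * 100.0
    lower = (math.exp(unit * (beta - z * se)) - 1.0) * 100.0
    upper = (math.exp(unit * (beta + z * se)) - 1.0) * 100.0
    return point, lower, upper

# Hypothetical coefficient for precipitation (per 1 cm), illustration only.
point, lower, upper = percent_change(beta=0.011, se=0.004)
print(f"%R = {point:.2f}% (CI95% {lower:.2f}% to {upper:.2f}%)")
```

Because the outcome is modeled on the natural-log scale, a coefficient close to zero gives a percent change close to 100 × β, which is why small coefficients read almost directly as percentages.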

Results

Temporal and spatial distribution of health outcome and environmental predictors

Distributions of average monthly values for the 8-year reporting period for all analysis variables were explored using histograms (Fig. 2). Loge-transformed schistosomiasis infection rate had a peak at approximately 3.5–4.0 (33–55 cases per million people). NDVI showed a bimodal distribution with a major peak at 0.7 and a minor peak at 0.2. LST also exhibited two peaks, at 27 °C and at 37 °C, which contributed to a long right skew. AP had a high frequency of low values and a long tail of high values corresponding to rainy seasons.

Fig. 2

Top row: histograms of health outcome and environmental parameters. Bottom row: annual seasonal patterns based on monthly boxplots representing distribution of health outcome and environmental parameters across 195 districts

To examine the temporal patterns, monthly values for all variables were plotted as aggregates (Fig. 2) and consecutively over the 8-year period (Fig. 3). Loge-transformed schistosomiasis infection rate was consistent across 12 months, with a median at approximately 3.5 (33 cases per million people) (Fig. 2), and showed a slight decline over the study period (Fig. 3). Environmental predictors exhibited seasonality, with NDVI and AP having two peaks per year and LST having one peak per year. Peaks in NDVI occurred in March and September (0.70), with a dip in June (0.55). LST peaked in February (27 °C) with lower values in July (25 °C). The highest AP values occurred in June and September (20 cm), with little to no precipitation in January and August (Figs. 2 and 3).

Fig. 3

Time series of monthly boxplots representing distribution of loge-transformed schistosomiasis infection rates, normalized difference vegetation index (NDVI), land surface temperature (LST), and accumulated precipitation (AP) across 195 districts

To examine the spatial distribution of values, variables were mapped based on their 8-year district mean values (Fig. 4). Average loge-transformed schistosomiasis infection rate had a spatially heterogeneous pattern that ranged from 0.72 to 5.91 (2 to 369 cases per million people, respectively). NDVI and LST showed trends along a southwest to northeast diagonal. NDVI decreased along this diagonal, with slightly higher values along the eastern shore of Lake Volta. LST increased along the diagonal, except for higher temperatures along the heavily urbanized southeast coast (Greater Accra area) and the peri-urban area surrounding Kumasi, the second largest city. AP showed trends along a diagonal that extended from the southwest corner to the center east portion of the country. Along this diagonal, AP was high and decreased to either side, rapidly toward the southern coast and gradually toward the north.

Fig. 4

Maps of health outcome (loge-transformed schistosomiasis infection rate per million), normalized difference vegetation index (NDVI), land surface temperature (LST), and accumulated precipitation (AP) (8-year averages aggregated at the district level). In the map of health outcome, districts with > 95% of missing values are colored white

Environmental predictors across climate zones

The LKN climate classification method resulted in three major and nine minor climate zones (Fig. 5). The mean monthly values for NDVI, LST, and AP at the national level were 0.58 ± 0.16, 28.01 ± 4.26 °C, and 12.47 ± 9.22 cm, respectively (Table 2). For the major zones, the northern part of the country (Z3:1) had the lowest amount of rainfall, lowest vegetation index, and highest temperature. Moderate precipitation, vegetation, and temperature values were observed in the middle of the country (Z3:2). The south (Z3:3) had the highest precipitation and vegetation, and lowest temperature. The mean values across the minor zones revealed further north-to-south trends within all major zones, except in the areas represented by zones Z9:8 and Z9:9. These areas are likely to be the most urban (high population density, Table 2), which resulted in less vegetation and higher temperatures. Minor zone Z9:9 specifically contained the largest cities, Accra and Kumasi. Histograms and boxplots for all variables stratified by climate zone demonstrated pronounced diversity of seasonal patterns for all three environmental parameters (Figs. S3 and S4, Supplemental Information). The seasonal patterns of NDVI and AP exhibited a single annual peak in the north, transitioning to two annual peaks in the southern part of the country, while the seasonal peaks in LST became less pronounced.

Fig. 5

Three major and nine minor climate zones resulting from the “Limiting, K-means, Nomination” (LKN) climate classification for Ghana

Table 2 Descriptive statistics of the health outcome and environmental predictors stratified by climate zones

Associations among variables

Spearman’s rank correlations were used to analyze pairwise relationships among variables, stratified by climate zone (Table 3). The association between NDVI and LST was primarily negative; it was weak at the national level (rs = −0.13), and moderate in major zones Z3:1 and Z3:2 (rs ~ −0.30) and in minor zones Z9:1 through Z9:5 (rs ~ −0.35). The association was weak or non-existent in major zone Z3:3 and minor zones Z9:6 through Z9:8, and moderate in minor zone Z9:9 (rs = −0.31). The association between LST and AP was consistently negative; it was moderate at the national level (rs = −0.37) and strong in major zones Z3:1 and Z3:2 (rs ~ −0.55) and minor zones Z9:1 through Z9:4 (rs ~ −0.70). The association between NDVI and AP was primarily positive; it was weak at the national level (rs = 0.25), moderate in major zones Z3:1 and Z3:2 (rs ~ 0.45), and strong in minor zones Z9:1 through Z9:4 (rs ~ 0.55). Correlations among all environmental parameters were lowest in major zone Z3:3 and minor zones Z9:7 and Z9:8. Correlation coefficients between disease rates and environmental parameters were negligible at all levels and varied in direction and magnitude between −0.01 and 0.12.
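The two-stage summary behind these values (a Spearman coefficient per district, then the mean and standard deviation across the districts of each zone, per the caption of Table 3) can be sketched as below. The helper names and the toy series are hypothetical, and the use of the population standard deviation (ddof = 0) is our assumption, since the text does not specify it.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1, with tied values assigned their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    # Replace ranks within each group of ties by the group mean.
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    sums = np.bincount(inverse, weights=ranks)
    return (sums / counts)[inverse]

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

def zone_summary(series_by_district, zone_of):
    """Mean and SD of per-district coefficients within each zone.

    series_by_district: {district: (x, y)} monthly series for two variables.
    zone_of: {district: zone label}.
    """
    by_zone = {}
    for district, (x, y) in series_by_district.items():
        by_zone.setdefault(zone_of[district], []).append(spearman(x, y))
    return {z: (float(np.mean(r)), float(np.std(r))) for z, r in by_zone.items()}

# Toy example with two hypothetical districts in one zone.
series = {
    "district_a": ([0.2, 0.4, 0.6, 0.7], [35.0, 31.0, 28.0, 27.0]),
    "district_b": ([0.3, 0.5, 0.4, 0.6], [33.0, 30.0, 29.0, 28.0]),
}
zones = {"district_a": "Z3:1", "district_b": "Z3:1"}
print(zone_summary(series, zones))
```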

Table 3 Spearman’s rank correlations for monthly values calculated for each district and averaged (mean ± standard deviation) across each climate zone

Following exploratory analyses, three partial mixed-effects regression models were fitted, stratified by climate zone. The first model, using only trend as an explanatory variable, showed a significant decline in reported schistosomiasis infection rates equivalent to approximately 1% per month nationally and across zones (Table 4). The second model, using four seasonal harmonic variables, revealed that seasonality in reported disease rates was present at the national level, in major zone Z3:1, and in minor zones Z9:1 and Z9:2 (Table S2, Supplemental Information). In the third model, using the three environmental predictors, AP showed a small but statistically significant positive association with schistosomiasis infection rates at the national level, in major zones Z3:1 and Z3:2, and in minor zones Z9:3, Z9:4, Z9:5, and Z9:7. A 1-cm increase in rainfall was associated with a 0.3–1.6% increase in monthly schistosomiasis infection rates (Table 4). Associations between schistosomiasis infection rates and LST varied in magnitude and direction across zones and were not significant (p > 0.05). In major zone Z3:3, NDVI had a positive effect on disease rate, equivalent to a 3.4% increase in schistosomiasis infection rate associated with a 0.1-unit increase in NDVI. In minor zone Z9:3, NDVI had a negative effect, equivalent to a 7.7% decrease in schistosomiasis infection rate corresponding to a 0.1-unit increase in NDVI. The R2 values of all three partial models that accounted for district-level effects were similar: ~50% at the national level, 40–70% in major zones, and 25–65% in minor zones (Table S2, Supplemental Information).

Table 4 Estimated trend in reported rates of schistosomiasis and associations with environmental parameters (p < 0.05 bolded) for partial mixed effects regression models, shown as % change in monthly rates associated with 1-unit increase in each parameter and their 95% confidence limits

A visual representation of a model that included trend and seasonality showed that both were significant at the national level (p < 0.05), with two relative peaks in schistosomiasis infection rate observed in March and September, and lowest rate observed in December/January (Fig. 6). For the major climate zones, trend was significant in all zones and seasonality remained significant only in zones Z3:1 and Z3:2 (p < 0.05). In zone Z3:1 (north), two peaks occurred in March and September; in zone Z3:2 (middle), a single peak occurred in July. For the minor zones, the declining trend remained; however, seasonality terms were only significant in zones Z9:1, Z9:2, and Z9:8. Like the major zone Z3:1, minor zones Z9:1 and Z9:2 exhibited two peaks per year in March and September. Minor zones Z9:4, Z9:5, and Z9:8 resembled major zone Z3:2 and exhibited a single annual peak around June/July. The remaining minor zones did not show pronounced seasonality. The lowest disease rates in most major and minor zones were observed in December/January.

Fig. 6

Visualization of the fitted values produced by the trend and seasonality model; x-axis represents months from January 2008 to November 2015 (vertical lines coincide with the month of January)

In the final regression models, inclusive of all predictors, the downward trend in schistosomiasis infection rates remained significant at all levels (Table 5), and seasonality remained significant in zones Z3:1 and Z9:1 through Z9:3 (north of the country) (Table S3, Supplemental Information). Controlling for trend and seasonality, associations between schistosomiasis infection rates and environmental parameters varied across climate zones. The association remained significant for NDVI only in zone Z9:3, exhibiting a stronger effect, equivalent to a 16.5% (CI95% 3–28%) decrease in schistosomiasis infection rate associated with a 0.1-unit increase in NDVI. AP maintained a positive effect in zone Z3:2, with a similar magnitude of 1.1% increase in monthly schistosomiasis infection rate associated with a 1-cm increase in AP. The effect of AP was also positive in zone Z9:7, but negative in zone Z9:2. In zone Z9:1, LST exhibited a positive association, contributing an approximately 2.5% increase in schistosomiasis infection rate for every 1 °C increase in LST. The R2 values of the final models ranged between 0.32 and 0.71 (Table S3, Supplemental Information).

Table 5 Estimated trend in reported rates of schistosomiasis and associations with environmental parameters (p < 0.05 bolded) for complete mixed effects regression models, shown as % change in monthly rates associated with 1-unit increase in each parameter and 95% CI

Discussion

Innovation

The study was innovative in several ways. Monthly records of schistosomiasis cases reported to a national surveillance system were used, aggregated at the district level. Schistosomiasis data were matched to a time series of publicly available remote sensing data. This approach allowed us to characterize the broad spatial and temporal patterns in reported schistosomiasis infection rates. Furthermore, we explored associations between disease rates and environmental predictors across different climatic areas, defined using a novel climate classification method.

To define the climate zones in this analysis, we used the LKN classification system applied specifically to Ghana, as compared to the predominant global KG climate classification system. The KG system is based on the assumptions that vegetation is the best proxy for climate, and that temperature and precipitation are the best proxies for vegetation (Kottek et al. 2006). The KG system divides the world into 6 major zones and 31 minor zones based on temperature, precipitation, and their seasonal variations. It partitions Ghana into only two climate zones; using the LKN method, finer divisions were possible. The major zones approximately corresponded to the agro-ecological zones, where the northern zone Z3:1 represented Guinea Savannah, zone Z3:2 the transitional zone, and zone Z3:3 a combination of deciduous forest and rainforest (Frenken 2005). Coastal Savannah was classified as a combination of zones Z3:1 and Z3:2. Using the minor zone divisions, urban areas were naturally separated into their own zone (Z9:9). An advantage of the LKN method is that it can be fully automated, and if needed, zone delineations can be updated over time.

To our knowledge, this is the first study that used schistosomiasis surveillance data from a sub-Saharan African country to conduct a spatial and temporal analysis at the national scale. Using surveillance data offers many advantages, such as expansive geographic coverage, temporal continuity, relatively low cost of data collection, and ability to aggregate data over various temporal and spatial scales. Furthermore, all predictors were drawn from publicly available satellite RS data products, downloadable online. This methodology offers a way for public health officials in low-income countries to begin exploring broad patterns in climate-sensitive diseases using routinely collected data.

Major findings

There was a significant decline in reported schistosomiasis cases over the study period, nationally and across all climate zones. Decreasing schistosomiasis reporting could be indicative of the success of preventive chemotherapy, which has increased in frequency and geographic coverage in recent years. However, a limited exploration of the age distribution of cases reported between 2012 and 2015 (Fig. S6, Supplemental Information) demonstrated that a range of age groups contributed cases, with the highest contribution from the 20–34 age group, whereas preventive chemotherapy currently targets primarily school children, with limited community-based treatment of children and adults. Therefore, additional factors are likely contributing to the steady decline in reported cases that should be explored at various spatial and temporal scales.

Seasonality in schistosomiasis infection rates was observed in several zones, with a consistent dip in December/January and varying patterns across climate zones. The single peak in June/July in the middle of the country (zones Z3:2, Z9:4, and Z9:5) corresponded to the beginning of the rainy season. In zone Z3:2, a positive association with precipitation was also observed. Two peaks in March and September in the dry northern areas (zones Z3:1, Z9:1, and Z9:2) do not seem to be related to the environmental conditions examined in the study. Data quality issues cannot be excluded as potential reasons behind the observed patterns. The major dip in cases in December/January corresponds to the Christmas holiday season, which potentially contributes to low reporting around this time. Low reporting may be due to decreased treatment-seeking for non-urgent conditions such as schistosomiasis, or internal reporting delays. These factors are important to explore in future studies.

Based on the partial models (Table 4), the association between precipitation and schistosomiasis infection rates was positive in several zones. This finding agrees with prior studies that focused on S. mansoni (Kabatereine et al. 2004; Scholte et al. 2014) and S. haematobium (Schur et al. 2011). We did not find an association with LST. The results of prior studies on the association between LST and schistosomiasis are somewhat contradictory, with positive association found with S. haematobium (Brooker et al. 2001; Soares Magalhães et al. 2011), negative association with S. mansoni (Scholte et al. 2014), and no association with S. haematobium (Clements et al. 2006) and S. mansoni (Kabatereine et al. 2004). Recent findings, and our own results, indicate that the associations with NDVI are inconsistent. After controlling for trend and seasonality, the associations between reported schistosomiasis infection rates and environmental variables, including NDVI, remained for a few zones, yet were largely not significant. Our findings suggest that the direction and strength of associations with remotely sensed parameters varied by climate zone; thus, broad application of these parameters to countries or regions with heterogeneous climatic conditions may need to be reconsidered (Brooker et al. 2001). Additionally, separating S. haematobium and S. mansoni infections may be necessary as their seasonal patterns and associations with environmental parameters could differ.

Primarily due to their versatility, the application of geospatial tools and satellite imagery in public health research, and specifically for water-related infections, has grown substantially from simple mixed effects models (Jagai et al. 2012) to more sophisticated predictive approaches (Scholte et al. 2014; Soares Magalhães et al. 2011; Walz et al. 2015b). These novel data sources increase the demand for thoughtful development of conceptual and analytical predictive modeling suites, capable of disentangling complex spatiotemporal relationships and offering timely and reliable decision support tools for disease prevention measures.

Limitations

As previously mentioned, the dataset contained > 50% missing values, which could indicate either a lack of reporting or a true lack of cases. During exploratory analysis, we found that the majority of missing values occurred in between reported counts, with fewer missing observations occurring prior to the first and/or after the last month with reported counts (Fig. S2, Supplemental Information). In the present study, it was assumed that schistosomiasis reporting started and ended at the same time (January 2008 and December 2015, respectively) in all districts. However, it is possible that some of the districts began reporting later than others and/or stopped reporting prior to December 2015 for unknown reasons. Several outliers were also removed. A possible reason for missing or unusually high values is reporting delays, which cause multiple months of data to be reported in a single month and could thus affect the assessment of seasonality. Investigating the reasons behind missing observations and data inconsistencies was beyond the scope of this analysis, but can be done in the future. Our data processing methodology likely produced conservative estimates, as the models were restricted to district/month combinations with at least one reported case of schistosomiasis. An exploration of various techniques to address missing data and outliers deserves further study.

A second limitation, related to the spatial assessment, is that schistosomiasis cases were aggregated at the district level. The use of aggregated data is known to suffer from the modifiable areal unit problem (MAUP), the effects of which are important to investigate in future studies. While district-level aggregation of disease counts is common in health studies, administrative boundaries have little relevance for environmentally sensitive diseases. Exploring spatial and temporal patterns by climate zone begins to address this issue; however, availability of disaggregated facility-level data would further strengthen the analysis.

A third limitation is that schistosomiasis counts used in the analysis applied to the month in which they were reported into the surveillance system and not necessarily the month in which they occurred. Environmental variables used in the analysis have more relevance when assessing the timing of exposure or transmission, rather than the timing of reporting. Therefore, incorporating temporal lags may improve model performance. However, it would not address potential differences among districts in disease reporting practices. The majority of the explained variability in our models was attributed to district-level effects. Thus, incorporating socio-economic variables in future studies could help explain differences among districts in terms of disease reporting. Gold mining, a widespread practice in many areas of Ghana, is also of potential interest in schistosomiasis modeling studies, as mining activities have the potential to affect population structures due to resettlement (Moomen et al. 2016) and water contact behaviors due to water pollution and water stress (Kulinkina et al. 2017; Moomen and Dewan 2016).
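The temporal lag suggested above could be constructed, for example, by shifting each district's monthly environmental series forward before pairing it with reported counts. The following is a minimal sketch only: the function name `lag_series`, the two-month lag, and the precipitation values are hypothetical illustrations and were not part of the present analysis.

```python
from typing import List, Optional


def lag_series(values: List[Optional[float]], lag: int) -> List[Optional[float]]:
    """Shift a monthly series forward by `lag` months.

    The first `lag` positions have no earlier observation to draw on,
    so they are filled with None. The output has the same length as
    the input.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n = len(values)
    padded = [None] * lag + list(values)
    return padded[:n]


# Hypothetical monthly precipitation (mm) for a single district.
precip = [10.0, 50.0, 120.0, 80.0]

# Pair each reporting month with precipitation observed two months earlier
# (the two-month lag is an assumption chosen for illustration).
precip_lag2 = lag_series(precip, 2)
print(precip_lag2)
```

Lagging must be done separately within each district so that values from one district are never carried into another, and months without a lagged value would need to be excluded from model fitting or handled as missing.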

Lastly, our dataset likely suffers from severe underreporting of schistosomiasis cases. Overall, the reported case numbers are extremely low compared to the estimated population at risk for Ghana. For example, in 2010, 24,996 cases were reported into DHIMS. If only children ≤ 15 years of age living in rural areas (5,128,118 individuals according to the 2010 census) are considered (i.e., the most at-risk population), at the estimated 23.3% infection rate, ~ 1.2 million cases would be expected. These numbers suggest that DHIMS is capturing only about 2% of the expected cases. The reported cases likely represent the most serious cases from a subset of the population able to seek treatment at large government hospitals with diagnostic capability. Children from rural areas are likely under-represented in the dataset. This is not a limitation of our analysis, but rather a reflection of the healthcare system in Ghana and other sub-Saharan African countries, where routine school-based distribution of praziquantel dominates and makes it less likely that cases among children are captured by the surveillance system.
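The underreporting estimate above follows from simple arithmetic on the figures quoted in the text. The short sketch below reproduces it; the variable names are illustrative, and applying the national prevalence estimate uniformly to rural children is the same simplifying assumption made in the paragraph above.

```python
# Figures quoted in the text.
reported_cases_2010 = 24_996   # cases reported into DHIMS in 2010
rural_children = 5_128_118     # children <= 15 years in rural areas (2010 census)
prevalence = 0.233             # estimated prevalence (Lai et al. 2015)

# Expected number of cases if the prevalence estimate applied uniformly
# to the at-risk population (a simplifying assumption).
expected_cases = rural_children * prevalence

# Share of expected cases that appear in the surveillance system.
capture_fraction = reported_cases_2010 / expected_cases

print(f"Expected cases: {expected_cases:,.0f}")
print(f"Share captured by DHIMS: {capture_fraction:.1%}")
```

This yields roughly 1.19 million expected cases and a capture share of approximately 2%, consistent with the values stated above.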

Conclusion

The analysis assessed trend and seasonality in reported schistosomiasis infection rates, as well as associations with three remotely sensed environmental predictors (LST, NDVI, and precipitation) and their heterogeneity among climate zones. Major findings included a consistent decline in reported schistosomiasis cases across Ghana, which is likely partially, but not entirely, attributable to preventive chemotherapy. Furthermore, schistosomiasis infection rates demonstrated distinct seasonal patterns across climate zones, with two peaks in the north and one peak in the middle of the country. These temporal trends should be more thoroughly examined and compared to the trends of other climate-sensitive diseases represented within the Ghanaian surveillance system. The direction and strength of associations with remotely sensed parameters varied by climate zone, suggesting that broad use of these parameters in regions with heterogeneous climatic conditions should be reconsidered.

Despite the aforementioned data quality issues, the importance of improving surveillance of NTDs in low-income countries should not be underestimated. National surveillance systems play an important role in ensuring availability of vital health data; our analysis demonstrates their utility as a decision support tool and serves as motivation for local and national governments to invest in routine data quality improvements. As highlighted by Kabore et al. (2013), predictive modeling should be an iterative process that undergoes progressive improvements in its methodology and data inputs. The next steps in improving the predictive capacity of surveillance data for diseases like schistosomiasis are to collect more detailed information about reported cases, including demographic and socio-economic variables, with a focus on the location and timing of disease transmission. Additionally, more work should be done to complement surveillance data with field surveys and novel data streams in order to create reliable and cost-effective disease monitoring tools.