India is fortunate to have a rich tradition of public data collection and compilation. Government functionaries at the national, state, district, block and panchayat levels collect data on thousands of variables on population, land use, agricultural production, irrigation, stream flows, reservoir storage, groundwater level, employment and livestock; and almost all of it is meticulously aggregated and compiled at district and state level.
These routine data collection exercises are complemented by the quinquennial (every five years) Agriculture, Livestock and Minor Irrigation censuses and the decadal population census. Large-scale sample surveys routinely undertaken by the National Sample Survey Organization (NSSO) add richness to the census data. Compared to other developing countries, India’s public data collection and availability are much better; some of the detailed datasets – such as the census of more than 20 million minor irrigation structures – are not even available in much ‘richer’ water economies such as the United States.
As can be expected from any large-scale data collection process, the quality of our public data is highly variable, sometimes even inconsistent. But, overall, if analyses and interpretation are done keeping in mind some of the limitations, the datasets can be a precious resource at the meso and macro level. For all the effort and resources that go into the collection of this data, and the rich overview of the water and agriculture economy that the data can paint, India’s public datasets are largely underutilized.
Factors that contribute to underutilization of public datasets
Most national-level censuses fall under the responsibility of the concerned ministry of the Government of India. However, the actual execution and data collection cannot be done without support from state and local government departments. For the Minor Irrigation Census, for example, the Minor Irrigation (Statistics) Wing in the Ministry of Jal Shakti (formerly, Ministry of Water Resources, River Development and Ganga Rejuvenation) is the nodal agency at the Centre. All costs associated with conducting the census are borne by the central government. However, execution is the responsibility of the respective state water resource ministries/nodal departments.
The central ministry prepares the pro-forma schedule and conducts workshops in states to train enumerators. While some states take this exercise seriously, others may not. In some states, the irrigation department collects the data itself; in others, the work has been farmed out to private data collection agencies.
In the past, for several reasons, some states have even failed to send back any data. For example, Rajasthan is missing from the first MI census. Gujarat, Maharashtra and three union territories (Chandigarh, Daman & Diu and Lakshadweep) were missing from the second census. Daman & Diu and Lakshadweep are also missing from the subsequent third, fourth and fifth censuses. These gaps create variability and inconsistencies in the data and make it difficult to compare and analyse.
Given that most data collection so far has been done through paper-based enumeration schedules, there is usually a long lag between data collection and final release. The report of the Fourth MI census, which had the reference year of 2005-06, was released in 2014; the lag was reduced in the fifth census, where the reference year was 2013-14 and the report was published in 2017. In recent iterations, this issue is expected to be tackled through digitization of data collection and compilation, the use of tablet-based surveys, and appropriate tools for data management.
Another huge challenge is that data is collected and compiled based on administrative boundaries (districts, blocks), which themselves keep changing. Any comparison of data across different censuses has to deal with this issue. While changes in administrative boundaries are perhaps inevitable, analyses would be much easier if disaggregated data were made accessible, so that it could be re-aggregated as per the new administrative divisions.
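To illustrate why disaggregated data matters here, the following is a minimal Python sketch of re-aggregating village-level records under old and new district boundaries. The village identifiers, district names, well counts and the village-to-district mapping are all hypothetical and are not taken from any census release.

```python
from collections import defaultdict

# Hypothetical village-level records (illustrative values only).
records = [
    {"village": "V1", "district_old": "D1", "wells": 120},
    {"village": "V2", "district_old": "D1", "wells": 80},
    {"village": "V3", "district_old": "D2", "wells": 50},
]

# Hypothetical mapping after a reorganisation in which village V2
# moves from district D1 to a newly created district D3.
village_to_new_district = {"V1": "D1", "V2": "D3", "V3": "D2"}


def aggregate(rows, key_func, value_field):
    """Sum value_field over rows, grouped by key_func(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_func(row)] += row[value_field]
    return dict(totals)


# Totals under the old boundaries.
old_totals = aggregate(records, lambda r: r["district_old"], "wells")

# The same records, re-aggregated under the new boundaries.
new_totals = aggregate(
    records, lambda r: village_to_new_district[r["village"]], "wells"
)

print(old_totals)
print(new_totals)
```

If only the district totals had been published, the split of D1 into D1 and D3 could not be reconstructed; with village-level records, it is a simple regrouping.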
Most of the data is released as poorly scanned data tables in PDF format, and often needs to be painfully downloaded state-wise, or sometimes even by crop or district. Often, undertaking any analysis requires a costly and wasteful process of ‘re-digitization’, which can easily be avoided.
Our hunch is that if the data were better organized and available for download in more ‘user-friendly’ formats, its utilization would improve manifold and it would start informing important policies and programs – both government-led as well as donor or civil-society driven.
Despite these limitations, India’s public ‘water and agriculture’ datasets can help nudge India’s public policy debates towards data-driven planning at the macro level. These datasets, however, are unlikely to be very useful for village-level or even small watershed-level planning.
For best results, data from large public datasets and micro-level field studies should be made interoperable so that the two can be combined to present a nuanced picture of the agrarian economy.
We conclude by highlighting some positive steps in the public data space that are encouraging and a ‘wish-list’ of what else needs to be done to nudge us in the right direction.
Some bright spots
- The National Rainfed Area Authority (NRAA) recently published a report that has creatively combined several datasets on rainfall, land use, agriculture, livestock, groundwater, soil moisture, demography, WASH etc. into two district-level indices: Natural Resource Index and Livelihood Index. Both of these are then combined into a Composite Index for arriving at a data-driven prioritization of districts (NRAA 2020).
- The sixth minor irrigation census added a census of water bodies – both rural and urban – which would also capture their pictures and GPS location.
- Adoption of digital data collection tools is slowly becoming the standard practice, and this is likely to improve data quality, reliability, and timeliness. The twentieth livestock census deployed a combination of web-based and mobile-based schedules for collection and compilation of data.
- The India Observatory has launched the Groundwater Monitoring Tool – an open-source Android tool that enables collection and compilation of well water level data. The tool facilitates a network of field organizations to contribute data from villages in their respective field areas.
- Some of IWMI-Tata Program’s research has demonstrated how some of these datasets can be creatively combined to inform big-ticket policy initiatives such as the Pradhan Mantri Krishi Sinchai Yojana (PMKSY) (Shah et al. 2016).
- Availability of high-resolution remote sensing data is increasing by the day. Recently, Norway invested ~€37 million to make < 5m resolution satellite imagery for 64 countries open access to assist research and policymaking on deforestation. This includes imagery from Planet, KSAT, Airbus and historical SPOT imagery from 2002 onwards.
- CGWB is planning to release data collected through the national aquifer mapping program (NAQUIM) in formats that are useful for village-level as well as macro-level planning.
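As a rough illustration of how two district-level indices, like those mentioned in the NRAA bullet above, might be combined into a composite score, here is a minimal Python sketch. It assumes min-max normalization and an equal-weighted average; this is an illustrative assumption, not the NRAA's published method. The district names and index values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale a {district: value} mapping to the 0-1 range."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # All districts have the same value; avoid division by zero.
        return {k: 0.0 for k in values}
    return {k: (v - lo) / (hi - lo) for k, v in values.items()}


def composite_index(index_a, index_b, weight_a=0.5):
    """Weighted average of two normalized indices (assumed method)."""
    norm_a = min_max_normalize(index_a)
    norm_b = min_max_normalize(index_b)
    return {
        d: weight_a * norm_a[d] + (1 - weight_a) * norm_b[d]
        for d in index_a
    }


# Hypothetical index values for three districts.
natural_resource = {"District A": 40.0, "District B": 60.0, "District C": 80.0}
livelihood = {"District A": 30.0, "District B": 90.0, "District C": 45.0}

scores = composite_index(natural_resource, livelihood)

# Districts ordered from lowest to highest composite score. Whether a
# low score means high priority depends on how the indices are defined.
ranking = sorted(scores, key=scores.get)
print(ranking)
```

The choice of normalization and weights strongly shapes the resulting ranking, so any such index is best published together with its underlying district-level data.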
What more is needed
- Inter-ministerial coordination and convergence for coherence: Given that collection and compilation fall within the purview of sectoral ministries, India’s public data tends to be highly isolated and sectoral. Even agriculture and irrigation datasets – which are so closely linked – don’t always sync well. Better coordination between the Jal Shakti and Agriculture ministries in designing survey instruments and scheduling rounds of data collection can bring immense returns.
- Since farmers in most states are provided free or highly subsidized power for pumping, data on power consumption in agriculture is at best a rough estimate. This is a leading worry for electricity regulators, as it can undermine accountability in utility performance. IWMI-Tata Program has used data from the Minor Irrigation Census to estimate energy consumption in agriculture (see Rajan and Verma 2017). Likewise, given the salience of the energy-irrigation nexus in India, bringing data from the two sectors together can yield useful insights.
- User-friendly navigation and aggregation/ disaggregation layers: A common portal to bring all the large datasets together can be very useful, especially for planning programs and designing interventions. The Open Government Data platform [https://data.gov.in/] is an early step in this direction but it needs to do more than just be a repository of stand-alone and dispersed datasets.
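To make the energy-irrigation point above concrete, here is a back-of-the-envelope Python sketch of how pump counts and horsepower from a census-like table could be turned into a rough energy estimate. The formula (pumps × average horsepower × 0.746 kW per HP × hours of operation) is a simplifying assumption and is not the method used by Rajan and Verma (2017). All district names and figures are hypothetical.

```python
HP_TO_KW = 0.746  # 1 horsepower is approximately 0.746 kW


def estimate_energy_kwh(pump_count, avg_hp, hours_per_year):
    """Rough annual energy use, ignoring motor efficiency and losses."""
    return pump_count * avg_hp * HP_TO_KW * hours_per_year


# Hypothetical district-level inputs (not from the MI Census).
pumps_by_district = {
    "District A": {"pump_count": 1000, "avg_hp": 5.0, "hours_per_year": 500},
    "District B": {"pump_count": 2000, "avg_hp": 7.5, "hours_per_year": 400},
}

energy_by_district = {
    name: estimate_energy_kwh(**row)
    for name, row in pumps_by_district.items()
}

for name, kwh in energy_by_district.items():
    print(f"{name}: {kwh / 1e6:.2f} million kWh per year")
```

Such an estimate is only as good as the assumed hours of operation, which is exactly where combining irrigation and power-sector data could help.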
Links to some useful large-scale data sets available online
Ministry of Home Affairs, GoI - https://censusindia.gov.in/
First, non-synchronous (1865-72); Second (1881), Third (1891), Fourth (1901), Fifth (1911), Sixth (1921), Seventh (1931), Eighth (1941), Ninth (1951), Tenth (1961), Eleventh (1971), Twelfth (1981), Thirteenth (1991), Fourteenth (2001), Fifteenth (2011)
Ministry of Agriculture - http://agcensus.nic.in/
Minor Irrigation Census
Ministry of Jal Shakti - http://micensus.gov.in/
MIC Dashboard: http://188.8.131.52/dashboard#/dashboard
Department of Animal Husbandry and Dairying - http://dahd.nic.in/
Ministry of Statistics and Programme Implementation - http://mospi.nic.in/
Central Ground Water Board - http://cgwb.gov.in/
Water storage in Major Reservoirs
Central Water Commission - http://cwc.gov.in/
India Meteorological Department - https://mausam.imd.gov.in/
State-wise Agriculture statistics and Land use statistics
Directorate of Economics and Statistics, DAC & FW - https://eands.dacnet.nic.in/
Acknowledgement: Some ideas discussed in the article stem from a webinar on “Data-driven ‘water and agriculture’ planning” with Tata Trusts and partners. The authors would like to acknowledge support from the CGIAR Research Program on Water, Land and Ecosystems (WLE) and IWMI-Tata Water Policy Program (ITP).
Shilp Verma and Cheshta Rajora work with IWMI-Tata Policy Program. Manisha Shah is associated with Arghyam. Views expressed are personal.
 Data is collected and digitized using public resources, and published in formats that necessitate manual digitization all over again before the data can be used for any analyses.