SDG 16: Peace, Justice and Strong Institutions | SDG 17: Partnerships for the Goals
Ministry of Statistics and Programme Implementation (MoSPI) | National Statistics Office (NSO) | Data Informatics and Innovation Division
The Ministry of Statistics & Programme Implementation (MoSPI) has released the 2025 edition of the Compendium of Datasets and Registries in India. Prepared by the Data Informatics and Innovation Division, the compendium catalogs 272 datasets from various Ministries and Departments to serve as a centralized metadata platform for national development. The primary purpose is to enhance transparency and public access to government data, facilitating evidence-based policy planning across all administrative levels.
Primary Purpose and Vision
The central objective of this publication is to enhance public access to information and provide a unified platform for “Data for Development”. By collating metadata (including data descriptions, classifications, and access points) in one place, the document aims to:
Facilitate Evidence-Based Policy: Providing policymakers and researchers with a clear understanding of available data resources to inform developmental planning.
Promote Transparency: Making government data sources visible and accessible to the public, consistent with open data standards.
Support Data Interoperability: Documenting standard classifications (e.g., NIC, NCO) and dissemination formats to encourage cross-sectoral data analysis.
Landscape Analysis of India’s Data
The compendium analyzes the current state of national data along three dimensions:
Data Sources: Over 55% of the datasets are derived from administrative data (including Management Information Systems (MIS) and Census records), while 14.7% are collected through dedicated surveys.
Dissemination Formats: Approximately 57% of datasets are now available in structured, machine-readable formats like MS-Excel, CSV, or TXT, reflecting a significant move toward digital-first governance.
Geographic Granularity: A majority of datasets (54%) are disaggregated by geography, with significant representation at the All-India (22%), State/UT (17%), and District (13%) levels.
Addressing the Gap in Structured and Non-Structured Data
A critical finding of the 2025 landscape analysis is the ongoing transition toward machine-readable information. Currently, 57% of government datasets are disseminated in structured formats such as MS-Excel, CSV, or TXT. However, a significant 28% of datasets remain in non-structured "Other" formats, which include web intelligence, dashboards, online portal data, images (.img), shapefiles, and APIs. An additional 15% are disseminated exclusively as PDFs, posing challenges for automated data extraction and cross-sectoral analysis.
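As a rough illustration of what these shares mean in absolute terms, the sketch below converts the reported percentages into approximate dataset counts. It assumes the three format categories are mutually exclusive and that the shares apply to the 272 catalogued datasets; because the published percentages are rounded, the counts are estimates rather than figures from the compendium.

```python
# Figures reported in the article; the counts derived below are estimates only.
TOTAL_DATASETS = 272
format_shares = {"structured": 57, "other": 28, "pdf_only": 15}  # percent of datasets


def approx_count(share_percent, total=TOTAL_DATASETS):
    """Approximate number of datasets implied by a rounded percentage share."""
    return round(total * share_percent / 100)


# Assumption: the three categories are mutually exclusive and cover every dataset.
assert sum(format_shares.values()) == 100

# Share of datasets not yet in a structured format (PDF-only plus "Other").
not_structured_share = format_shares["other"] + format_shares["pdf_only"]

approx_counts = {name: approx_count(share) for name, share in format_shares.items()}

for name, count in approx_counts.items():
    print(f"{name}: about {count} datasets ({format_shares[name]}%)")
print(f"not structured: about {approx_count(not_structured_share)} datasets "
      f"({not_structured_share}%)")
```

Under these assumptions, the PDF-only and "Other" categories together account for 43% of the catalogue.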
What is the role of “Administrative Data” in the 2025 Compendium?
Administrative data refers to information collected primarily for routine official activities, such as keeping records of individuals, transactions, or business organizations. In the 2025 Compendium, these datasets, including MIS and Census records, account for more than 55% of the entries. This high share indicates that the core of India’s developmental data is a by-product of everyday governance, which makes digitizing and standardizing routine administrative processes important for generating high-quality statistical insights without additional primary surveys.
What is "Non-Structured Data" in the context of the 2025 Compendium?
Non-structured data refers to information that does not follow a predefined data model or is not organized in a predefined manner, making it difficult for traditional database tools to process directly. In the 2025 Compendium, the 28% of datasets in "Other" formats, such as dashboards, web portals, shapefiles, and raw images, fall into this category. While these formats are visually accessible to citizens, they require specialized tools or manual processing to be converted into structured, machine-readable formats for advanced AI modelling or longitudinal statistical studies.
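To make the conversion step concrete, here is a minimal sketch of flattening a nested record, of the kind a dashboard or portal API might return, into CSV rows. The payload, its field names, and the helper functions are hypothetical and are not taken from the compendium or any government portal; real sources such as shapefiles or images would need different, specialized tooling.

```python
import csv
import io
import json

# Hypothetical payload standing in for a dashboard or portal API response.
SAMPLE_PAYLOAD = """
{
  "indicator": "example_indicator",
  "records": [
    {"state": "State A", "district": "District 1", "values": {"2023": 10, "2024": 12}},
    {"state": "State A", "district": "District 2", "values": {"2023": 7, "2024": 9}}
  ]
}
"""

FIELDNAMES = ["indicator", "state", "district", "year", "value"]


def flatten_records(payload):
    """Turn nested records into one flat row per (district, year) pair."""
    rows = []
    for record in payload["records"]:
        for year, value in sorted(record["values"].items()):
            rows.append({
                "indicator": payload["indicator"],
                "state": record["state"],
                "district": record["district"],
                "year": int(year),
                "value": value,
            })
    return rows


def to_csv(rows):
    """Serialize flat rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = flatten_records(json.loads(SAMPLE_PAYLOAD))
csv_text = to_csv(rows)
print(csv_text)
```

The resulting table has one observation per row, which is the shape most statistical and machine-learning tools expect as input.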
Policy Relevance
The 2025 Compendium acts as a strategic roadmap for the National Data Governance Framework Policy.
Standardizing Institutional Memory: By documenting metadata and contact points for 272 datasets, MoSPI ensures that critical data assets remain accessible across administrative transitions.
Transition to Machine-Readability: Identifying that 43% of datasets (15% PDF-only plus 28% “Other”) are not yet fully structured provides a clear target for the Ministry of Electronics and Information Technology (MeitY) to prioritize digitization under the India Stack ecosystem.
Localization of SDGs: The high level of geographic disaggregation (54%) supports NITI Aayog’s efforts to monitor Sustainable Development Goals at the district and block levels.
Reducing Redundancy: Documenting existing administrative sources helps prevent the duplication of data collection efforts, thereby reducing “survey fatigue” and saving fiscal resources.
Global Benchmarking: The inclusion of global-level datasets and international standards (like CPC and UNISIC) ensures that India’s data ecosystem is interoperable with global development monitoring frameworks.
Relevant Question for Policy Stakeholders: How can MoSPI create a “National Data Portal Internship” for graduates to convert the 28% of non-structured government data into machine-readable formats, thereby enhancing the utility of the 2025 Compendium for the Indian AI and start-up ecosystem?
Follow the full news here: Compendium of Datasets and Registries in India, 2025

