Current Efforts

Many individuals, organizations, and community-based efforts are working to capture and preserve data in early 2025. Below are the efforts we are aware of and their collecting scopes. This list was developed from the original Data Rescue Google Doc. If you would like your effort added, feel free to email us at datarescueproject@protonmail.com. If you want to send us a secure, encrypted email, you can sign up for a free account at protonmail.com or use our public PGP key: https://keys.openpgp.org/search?q=datarescueproject%40protonmail.com.

Larger and Established Data and Website Efforts
Data Rescue Events
Ad Hoc Rescue Efforts and Data Archiving Activists

Larger and Established Data and Website Efforts

  • End of Term Archive
    • The main coordinated effort to save U.S. government websites at the end of presidential administrations.
    • Datasets have been more challenging to capture, especially data embedded in databases.
  • Environmental Data & Governance Initiative (EDGI)
    • EDGI is a research collaborative and network of diverse professionals promoting evidence-based policy-making and public interest science that advances the Environmental Right-to-Know (ERTK).
    • They focus on environmental data and are a good organization to follow for updates.
    • They work with the Public Environmental Data Project (see below).
  • Public Environmental Data Project
  • Harvard’s Library Innovation Lab Team
    • A team of librarians, technologists, lawyers, designers, and others who work out of the Harvard Law School Library.
    • 2025-02-06: Released an archive of data.gov on Source Cooperative. The 16TB collection includes over 311,000 datasets harvested during 2024 and 2025, a complete archive of federal public datasets linked by data.gov. It will be updated daily as new datasets are added to data.gov. NB: This is a 'shallow crawl' that collects only the directly linked files. Datasets that link only to a landing page will need to be collected separately.
    • 2025-01-31: Described by the Lab as the first release in its new data vault project to preserve and authenticate vital public datasets for academic research, policymaking, and public use.
  • ICPSR
    • An international consortium of more than 810 academic institutions and research organizations. Provides leadership and training in data access, curation, and methods of analysis for the social science research community. Based at the University of Michigan.
    • Overview of ICPSR's data rescue activities to date:
      • Downloaded ~2,800 files requested by researchers from various sources; all the files ICPSR collected will soon be available via a Dropbox link.
      • Examining the CDC data dump on archive.org to assess what might be missing. Ideally, this assessment will also help people looking for data see what is and isn’t available.
      • ICPSR staff and allies are generating metadata for each collected dataset so that it can be made available through an existing ICPSR archive (DataLumos, openICPSR, or the Resource Center for Minority Data, depending on ICPSR's timeline and some technical issues still being worked out).
    • ICPSR DataLumos
      • A crowd-sourced repository for US federal government data. This is the main repository for Data Rescue Project's data.
      • We have added data from FEMA, the Department of Education, and IMLS.
  • IPUMS
    • Based at the University of Minnesota, IPUMS provides census and survey data from around the world, integrated across time and space.
    • Includes major U.S. government data sources, such as the Census, American Community Survey, Current Population Survey, and more. Includes GIS data.
    • They hold data and have been working on cataloging it. More information is coming.
  • Dryad
    • A generalist repository that can help with data publication, storage, and preservation.
  • Silencing Science Tracker
    • A joint initiative of the Sabin Center for Climate Change Law and the Climate Science Legal Defense Fund.
    • Tracks government attempts to restrict or prohibit scientific research, education or discussion, or the publication or use of scientific information.
  • OSF
    • Generalist repository for archiving, sharing, and storing all types of research outputs, not limited to preprints or only data.
    • Many universities also have institutional repositories where research (articles, data, dissertations, etc.) from that institution can be posted. These repositories also have preservation mandates. An example is Penn’s ScholarlyCommons.
    • OSF is available as an option for preprints of articles if, for some reason, they cannot be posted on official sources.
  • The Climate Mirror Project
    • Hosts NOAA data pulled during the 2017 data rescue.
  • Open Energy Data Initiative
    • A volunteer reported that “key equity data” is missing from the Department of Energy website and that they were able to find it on this site, which also includes additional DOE data.
  • Wayback Machine
  • Roper Center for Public Opinion Research at Cornell University
    • Roper Center has collected over 50,000 files (datasets and documentation) from 22 federal survey projects. Efforts to this point have been focused on acquiring the files and ensuring backup copies are preserved on multiple servers.


Data Rescue Events


Ad Hoc Rescue Efforts and Data Archiving Activists

  • Resources and Links: Various individuals and organizations have worked to archive and save data from NIH, CDC, and other websites. This page lists many of those entities.
  • UCSB LSIT Data Mirroring
    • Mirrored and archived public data on a locally hosted git server
    • Includes retrieved data sets from CDC, NIH, and NOAA
  • CDC Page on Internet Archive
    • A special collection on the Internet Archive (IA) of all CDC datasets publicly available as of January 28, 2025.
    • Believed to have been uploaded by DataHoarders.
  • Datasets in Dataverse 
    • Includes CDC's Social Vulnerability Index data. Most of the data placed here focuses on health and the environment.
  • NOAA heat-index files
  • Safeguarding Research
    • Based in the EU, the USA, and elsewhere, this initiative has access to 1-2 PB of storage (with more on the way) and to people willing to seed.
    • Has several large-scale efforts, including a 350GB web archive of the CDC website that contains all 30,000 files from archive.cdc.gov, and much more.
    • There is a forum you can join.
  • Data Hoarder
    • A Reddit community that is coordinating efforts to rescue data.
  • Data Hoarding 
    • Index of resources and archives related to data hoarding, web archival, and self-hosting. 
  • ArchiveTeam Warriors
    • They run a distributed crawler. Anyone can install it to contribute.
    • US Federal Data page
    • Data is uploaded to Archive.org by volunteers
  • Data Liberation Project  
    • Note: The project appears to have stalled in September 2024. Please send us information if you know more about it.
  • Healthy Regions & Policies (HeRoP) Lab
    • University of Illinois Urbana-Champaign.
    • Preserved datasets and guidance documents (available via Box) come from:
      • The Centers for Disease Control and Prevention (CDC);
      • The Environmental Protection Agency (EPA);
      • The Health Resources and Services Administration (HRSA).
  • ACASignups.net: Links to archived versions of every CDC website page (Parts 1 through 15).
  • Digital Government Hub: The Digital Government Hub is a dynamic, open-source reference library for anyone using design, data, and technology to improve and enhance government service delivery.



Last updated: 2025-02-12T02:20:53Z