Current Efforts
There are many individuals, organizations, and community-based efforts to capture and preserve data in early 2025. Below are the efforts we are aware of and their collecting scopes. This list was developed from the original Data Rescue Google Doc. If you would like to add your efforts, feel free to email us at datarescueproject@protonmail.com. If you want to send us a secure, encrypted email, you can sign up for a free account at protonmail.com or use our public PGP key: https://keys.openpgp.org/search?q=datarescueproject%40protonmail.com.
Larger and Established Data and Website Efforts
- End of Term Archive
- The main coordinated effort to save U.S. Government websites at the end of presidential administrations.
- Datasets have been more of a challenge, especially data embedded in databases.
- Environmental Data & Governance Initiative (EDGI)
- EDGI is a research collaborative and network of diverse professionals promoting evidence-based policy-making and public interest science that advances the Environmental Right-to-Know (ERTK).
- They have been focused on environmental data and are a good organization to follow for updates.
- They work with the Public Environmental Data Project (see below)
- Public Environmental Data Project
- A coalition committed to preserving and providing public access to federal environmental data.
- Recent datasets:
- February 7, 2025 - EPA’s EJScreen 2.3
- January 24, 2025 - Climate and Economic Justice Screening Tool
- January 31, 2025 - CDC’s Social Vulnerability Index and Environmental Justice Index
- January 24, 2025 - Council on Environmental Quality EJScorecard
- Harvard’s Library Innovation Lab Team
- A team of librarians, technologists, lawyers, designers, and more who work out of the Harvard Law School Library.
- 2025-02-06: Released an archive of data.gov on Source Cooperative. The 16TB collection includes over 311,000 datasets harvested during 2024 and 2025, a complete archive of federal public datasets linked by data.gov. It will be updated daily as new datasets are added to data.gov. **NB**: This is a 'shallow crawl' that collects only the directly linked files. Datasets that link only to a landing page will need to be collected separately.
- 2025-01-31: This is the first release in our new data vault project to preserve and authenticate vital public datasets for academic research, policymaking, and public use.
- ICPSR
- An international consortium of more than 810 academic institutions and research organizations. Provides leadership and training in data access, curation, and methods of analysis for the social science research community. Based at the University of Michigan.
- Overview of ICPSR's data rescue activities to date:
- Downloaded ~2,800 files from various sources requested by researchers; all the files ICPSR collected will soon be available via a Dropbox link.
- Examining CDC data dump from archive.org to assess what might be missing. Ideally, it will also be a resource for those looking for data to see what is/isn’t available.
- ICPSR staff and allies are generating metadata for each of the datasets we have so that we can make them available through an existing archive at ICPSR (DataLumos, openICPSR, or the Resource Center for Minority Data, depending on our timeline and some technical issues we’re working out)
- ICPSR DataLumos
- A crowd-sourced repository for US federal government data. This is the main repository for the Data Rescue Project's data.
- We have added data from FEMA, the Department of Education, and IMLS.
- IPUMS
- Based at the University of Minnesota, provides census and survey data from around the world integrated across time and space.
- Includes major data sources from the US government, such as the Census, American Community Survey, Current Population Survey, and more. Includes GIS data.
- They have data and have been working on cataloging efforts. More information is coming.
- Dryad
- A generalist repository available to help with data publication, storage, and preservation.
- Silencing Science Tracker
- A joint initiative of the Sabin Center for Climate Change Law and the Climate Science Legal Defense Fund.
- Tracks government attempts to restrict or prohibit scientific research, education or discussion, or the publication or use of scientific information.
- OSF
- Generalist repository for archiving, sharing, and storing all types of research outputs, not limited to preprints or only data.
- Many universities also have institutional repositories where research (articles, data, dissertations, etc.) from that institution can be posted. These repositories also have preservation mandates. An example is Penn’s ScholarlyCommons.
- OSF is available as an option for pre-prints of articles if, for some reason, they cannot be posted on official sources.
- The Climate Mirror Project
- A project that has NOAA data pulled during the 2017 data rescue.
- Open Energy Data Initiative
- A volunteer has pointed out that “key equity data” is missing from the Dept of Energy and says they were able to find it on this site. The site also includes additional data from DOE.
- Wayback Machine
- The Wayback Machine is an initiative of the Internet Archive, a 501(c)(3) non-profit, building a digital library of Internet sites and other cultural artifacts in digital form. Other projects include Open Library & archive-it.org.
- Roper Center for Public Opinion Research at Cornell University
- Roper Center has collected over 50,000 files (datasets and documentation) from 22 federal survey projects. Efforts to this point have been focused on acquiring the files and ensuring backup copies are preserved on multiple servers.
Data Rescue Events
- University of Washington-based Data Rescue
- Hosted by the University of Washington Center for Advances in Libraries, Museums, and Archives (CALMA), this series of data rescues followed the model from 2017. The spreadsheet of data reviewed at the events is available: Data Tracking List - Data Rescue 2025 (Responses).xlsx
- It is unclear if they are hosting more.
- Stanford’s Big Local News
- They are running a federal data collection collaborative.
Ad Hoc Rescue Efforts and Data Archiving Activists
- Resources and Links: Various individuals and organizations have worked to archive and save data from the NIH, CDC, and other websites. This page lists many of those entities.
- UCSB LSIT Data Mirroring
- Mirrored and archived public data on a locally hosted git server
- Includes retrieved data sets from CDC, NIH, and NOAA
- CDC Page on Internet Archive
- A special archive created on IA of all CDC datasets publicly available as of January 28, 2025
- Uploaded by DataHoarders (we think)
- Datasets in Dataverse
- Data uploaded by the Climate Change and Health Research Coordinating Center (CAFE)
- Includes CDC's Social Vulnerability Index data. Most of what's being placed here is data focusing on health and the environment.
- NOAA heat-index files
- Safeguarding Research
- Based in the EU, the USA, and globally, this initiative has access to 1-2 PB of storage (with more on the way) and people willing to seed.
- They have several large-scale efforts, including a 350GB web archive of the CDC that contains all 30,000 files from archive.cdc.gov, and much more.
- There is a forum you can join.
- Data Hoarder
- A Reddit community that is coordinating efforts to rescue data.
- Data Hoarding
- Index of resources and archives related to data hoarding, web archival, and self-hosting.
- ArchiveTeam Warriors
- They run a distributed crawler. Anyone can install it to help contribute.
- US Federal Data page
- Data is uploaded to Archive.org by volunteers
- Data Liberation Project
- Note: It looks like the project may have stalled in September 2024. Send info if you know more about them.
- It is run by BigLocalNews and MuckRock, which are good groups to follow.
- Healthy Regions & Policies (HeRoP) Lab
- Based at the University of Illinois Urbana-Champaign.
- Preserved datasets and guidance documents (available via Box) come from:
- The Centers for Disease Control and Prevention (CDC);
- The Environmental Protection Agency (EPA);
- The Health Resources and Services Administration (HRSA).
- ACASignups.net: Links to archived versions of every CDC government page (Parts 1 through 15).
- Digital Government Hub: The Digital Government Hub is a dynamic, open-source reference library for anyone using design, data, and technology to improve and enhance government service delivery.
Last updated: 2025-02-12T02:20:53Z