Preserving Our Public Data Heritage
The PEGI Project recently released a post about the preservation of public federal data in support of the Data Rescue Project. PEGI stands for Preservation of Electronic Government Information. It was created in 2016 by a group of government information librarians who were concerned about the future of electronic government information. The group provides some excellent background reading if you would like to learn more about its focus.
PEGI also had connections to the 2017 Data Refuge Project, the precursor of our Data Rescue Project, as part of the Libraries+ discussions with librarians, archivists, journalists, and others interested in preserving government information. In her presentation on this effort at the 2017 Open Access Symposium at UNT, Margaret Janz, a leader in the Data Refuge Project, noted that government agencies had backup and preservation systems but that “if something happens to the agency,” access becomes a significant challenge. With the recent focus on dismantling agencies, we now face the reality she described.
While the current situation is unprecedented, PEGI, government documents librarians, and data librarians have always focused on preserving access to government information in all its forms, including data. The challenge with datasets is that making them discoverable and usable in the long run (that is, meeting the FAIR principles) requires documentation, metadata, and additional attention that a simple web crawl cannot always provide. Moreover, one of the lessons learned from past efforts was the need to ensure that the data has a home in a trusted repository. This is why our group has focused its efforts on Data Lumos, the crowd-sourced repository created by ICPSR in the wake of the 2017 data rescues. In the past two weeks alone, our group has added over 120 datasets to Data Lumos, an incredible achievement. In the coming days, we will focus more on the Data Tracker and on keeping abreast of where datasets have been sent.
Access to this data is needed not only by researchers but also by communities and individuals. Providing continued access is what motivates over 300 people to come together and work with the Data Rescue Project, the End of Term Crawl, EDGI, and others. We are grateful for that community and thankful for all those who have given many hours of their free time to help out. Our goal is to ensure long-term access to data as government information and as part of our public data heritage, for generations to come.