Getting Started with the Data Rescue Project

Welcome, and thank you so much for volunteering to assist with the Data Rescue Project. We’re excited for you to join us in helping protect at-risk federal data.

Once you’re in Mattermost

First, all new members should join the orientation channel and read the documents pinned to the channel, especially the Community Guidelines. Once you have read the Community Guidelines, please respond in the channel with a green check emoji ✅. If you are new to Mattermost, please see the Mattermost documentation for information on using the platform.

All members must join the orientation channel and accept the community guidelines upon starting work with DRP.

What other channels can I join?

Once you’ve joined Mattermost and read the Community Guidelines, there are a number of channels at your disposal. Take a look at the pinned messages on various channels for important information, such as guidelines, workflows, and documentation.

  • help_data_uploads
    • Channel purpose: for discussion of data rescue efforts where datasets are being uploaded to Data Lumos for preservation. Current active efforts: HUD, FHFA, CFPB
  • help_reference
    • Channel purpose: for requesting reference assistance in finding things (e.g., “Do you know where these data are?”) or for requesting a review of a rescued dataset you have uploaded to a repository. Please be as specific as possible in your request.
  • help_rescue_events
    • Channel purpose: a forum for sharing ideas, information, and plans for data rescue events. We will put guidance and suggestions in this channel.
  • help_storage
    • Channel purpose: storage offers and requests go here.
  • help_technical
    • Channel purpose:
      • Requests for support from one of our volunteers, such as help downloading a dataset
      • Offering storage
      • Registering a need for storage
  • introductions
    • Channel purpose: a welcome space for new users to introduce themselves and see who else is here. We encourage you to use your handle. Introduce yourself if you feel comfortable. There is no pressure or requirement to do so. Any member may join.
  • random
    • Channel purpose: a place for flimflam, faffing, hodge-podge or jibber-jabber you'd prefer to keep out of more focused work-related channels. Feel free to share pets, memes (that meet our community guidelines), and more on this channel. Any member may join.
  • town-square
    • Channel purpose: this is the general channel for announcements, communications, and matters related to the efforts at hand (i.e., the rescuing of data). All members are in this channel.
  • work_agencyName
    • Channel purpose: This is an example channel for when people want to group up and discuss rescue work for specific agencies. Please keep your channel public.
  • urgent
    • Channel purpose: to communicate and monitor urgent requests and notifications. Post information about agencies, offices, or other sources considered high-risk. Respond with information about any work underway on these data. Any member may join.

What if I have questions?

There are a number of ways to ask questions in Mattermost. If your question is related to a task or a technical project you’re working on, we have channels for that: ~help_reference and ~help_technical.

Additionally, there are a number of key project members you can contact should any specific questions arise.

I’m new here. How do I help?

Helping: no technical or data experience

We have a number of tasks and efforts that do not require technical expertise or any data experience. All you need is an internet connection, a web browser, and the willingness to help.

Helping: technical and data experience

Using the Data Rescue Tracker

The Data Rescue Project team works in Mattermost and the Data Rescue Tracker. The Tracker is a collaborative tool built to catalog all existing public data rescue efforts so that we can coordinate better across initiatives. The Data Rescue Tracker provides consolidated overviews of who is downloading which dataset from which government websites. If you are looking for a specific dataset, check out Downloads to see if it has been captured. If you are looking for ongoing initiatives and what they’re focusing on, check out Maintainers.

In the Tracker, you can see:

  • Datasets we’ve identified
  • Their status (Not started, In progress, Finished)
  • Descriptive information (e.g., Organization, Agency)
  • Download Date
  • Approximate size
  • The responsible maintainer
  • Where the dataset has been added (Download location)

To submit updates on datasets, use the Tracker Download Submission Form. All submissions will be reviewed by the Data Rescue Team. This tool is not for nominating datasets for download and preservation.

Directions for Working on Tasks

As we continue to work on this project and expand our capabilities, more workflows will be added to this documentation.

Finding and confirming alternate locations of federal data

In addition to using your search engine of choice and the Data Rescue Tracker, there are a few locations to check to see if data have already been downloaded.

If you find that data have already been uploaded somewhere else, confirm that:

  • All data are present
  • All metadata is present

Once you have confirmed the dataset is complete, submit an update to the Tracker using the Tracker Download Submission Form described above. If the dataset is incomplete, submit an update marking it “In progress,” then begin downloading it.
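
If you have a list of the file names the original source is expected to contain (for example, copied from the agency’s download page), one way to check an existing copy for completeness is to compare that list against what is actually present. The sketch below is illustrative only: the function name find_missing_files and the example file names are hypothetical and are not part of any DRP tool, and matching by file name alone says nothing about whether the file contents are intact.

```python
from pathlib import Path


def find_missing_files(expected_names, dataset_dir):
    """Return the expected file names that are absent from dataset_dir.

    The directory is searched recursively, and files are matched by
    name only (not by size or content).
    """
    present = {p.name for p in Path(dataset_dir).rglob("*") if p.is_file()}
    return sorted(set(expected_names) - present)


# Hypothetical usage: the names below are placeholders, not real DRP files.
# missing = find_missing_files(["data_2020.csv", "codebook.pdf"], "existing_copy")
# if missing:
#     print("Incomplete copy; missing:", missing)
```

An empty result suggests every expected file is present by name; anything returned is a candidate for re-downloading from the source.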

Downloading data from federal sources

This will depend greatly on the agency from which you are downloading data. Some data will be straightforward to download, while others will require the use of API calls and other scripting techniques. Review available resources for downloading data before beginning.

In general, when downloading data:

  1. Ensure you are downloading all of the data and associated metadata. If the dataset’s record does not download automatically, capture it in a plain text file to be stored alongside the dataset.
  2. After the download is complete, review the files to confirm:
    1. All of the data was indeed downloaded
    2. None of the files are corrupt
  3. Organize the documents and data in a logical fashion (e.g., by year)
  4. If possible, avoid changing the file names and folder names, but the data can be compressed into a new zip file if appropriate. 
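
Part of the review in step 2 can be automated. The following is a minimal sketch, not an official DRP workflow: it walks a download folder, records a SHA-256 checksum for every file, and flags two common signs of a bad download (zero-byte files and zip archives that fail an integrity test). The function names sha256_of and review_download are hypothetical, and the checks are assumptions about what “corrupt” might look like; other file formats need their own checks, and a clean report does not prove that every file was downloaded.

```python
import hashlib
import zipfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def review_download(dataset_dir):
    """Return (manifest, problems) for every file under dataset_dir.

    manifest maps each file's relative path to its SHA-256 checksum.
    problems lists files that are empty or are unreadable zip archives.
    """
    root = Path(dataset_dir)
    manifest = {}
    problems = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if path.stat().st_size == 0:
            problems.append(f"{rel}: empty file")
        if path.suffix.lower() == ".zip":
            try:
                with zipfile.ZipFile(path) as zf:
                    # testzip() returns the first corrupt member, or None.
                    bad_member = zf.testzip()
                if bad_member is not None:
                    problems.append(f"{rel}: corrupt member {bad_member}")
            except zipfile.BadZipFile:
                problems.append(f"{rel}: not a valid zip archive")
        manifest[rel] = sha256_of(path)
    return manifest, problems
```

Saving the manifest in a plain text file next to the dataset lets anyone who later copies the files confirm that nothing changed in transit.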

If you need additional help with downloading a dataset, join ~help_technical.

Uploading data into Data Lumos

Data Lumos is a crowd-sourced repository managed by ICPSR that is specifically for valuable US government data. In addition to the information below, we have a guide for Rescuing a Dataset and Uploading to Data Lumos.

  • To upload data, you will need to create a Researcher Passport account. This account is free for all users.
  • After you have logged in, navigate to “My Workspace.” 
    • Here you will be able to see the datasets you have uploaded or been added to.
  • Click “Create New Project” in the top left, and give your project a descriptive name for this dataset. 
    • You’ll then be taken to a new workspace, where you can upload files, import from a zip, and add descriptive metadata. 
    • If necessary, you can also add collaborators to the workspace. 
  • Once you have completed the upload, click “Publish Project” to make the dataset discoverable.

Refer to the Data Lumos Descriptive Tips to ensure this dataset will be findable. 

Confirming data upload

Optional: if you would like someone to look over your upload and dataset description, ask in ~help_reference.

Research assistance or answering reference questions

We previously worked from a large spreadsheet with multiple tabs. Research assistance requests and reference questions are now posted and answered in the ~help_reference channel.

Asking a question or requesting reference/research assistance

  1. Type your question in the ~help_reference channel and send it to the group. If you are posting on someone else’s behalf, please say so.
  2. As part of the question, please give (if applicable):
    1. The agency in question
    2. Relevant links
    3. The level of urgency (controlled vocab: asap, today, in the next few days, this week, when you find the time)
  3. Once your question has been answered/resolved, please edit your original post and add “[ANSWERED]” to the beginning of your question, and select save.

Note: Please do not delete your question once it has been answered.

Answering a reference question or assisting with research

  1. Using the Mattermost “reply” function, please indicate your willingness/availability to respond to the question by saying, “I’ll take it” or something similar.
  2. Using that thread, please add any updates or follow-up questions you may have.
  3. If you want assistance, please tag either “@all,” “@channel,” or the specific handle of someone you have in mind.
  4. Once you feel you have satisfactorily answered the question, please tag the original poster and say the work is complete.