Managing Data

The Research Data Services team offers consulting and support for managing your data and creating a data management plan (as required by many funding agencies). See the sections below for more information, and contact us to schedule a data management consultation.

Writing a Data Management Plan

Principles

  1. Know the guidelines for your grant program.
  2. Assign responsibilities for managing data.
  3. Understand what types of data you will be producing and the approximate volume.
  4. Budget for research data management. The NSF, for example, allows reasonable costs of implementing the data management plan to be included in the budget, if justified.

Grant Agency Guidelines

Tools

  • DMPTool offers agency-specific templates to generate ready-to-use data management plans. Rice is a member, so you can log in using your NetID and password.

Resources

Sample Plans

Services

Writing a proposal? Need help writing a data management plan? Contact the Data Management Team at researchdata@rice.edu. We will review draft plans.

Organizing Data

Principles

  1. Plan. Think in advance about key issues that will affect your research data. What types of data will be generated? How much data will be collected? What data do you need to retain long term? Consider creating a data inventory to understand and track your data.
  2. Choose appropriate file formats. File formats suited to long-term access:
    • Are non-proprietary
    • Follow an open, documented standard
    • Are in common use by the research community
    • Use a standard character encoding (ASCII, UTF-8)
  3. Name your files well:
    • Be consistent (always use same information and order of information)
    • Use unique identifiers (e.g. acronym for project)
    • Do not use spaces or special characters (\ / : * ? " < > |)
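    • When using dates, follow ISO 8601, the standard behind the W3C Date and Time Formats (W3C-DTF) profile. W3C-DTF writes dates as YYYY-MM-DD; in file names, a compact form such as YYYYMMDD[hh][mm][ss] keeps the same ordering and avoids colons.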
    • To keep track of updated versions, use sequential numbering (v1, v2, etc.) rather than words, such as “Final.”
  4. Separate ongoing and completed work. Before you amass lots of folders and files, it may be useful to separate your original data from the data you are currently working on, and to differentiate between ongoing and completed work. Create a copy of your original data and put it in a folder named something like “Original.” Make multiple backups in multiple locations.
  5. Be selective. Decide whether and when it is appropriate to delete digital materials and data, based on the standards of your discipline and the guidelines of your funding agency. Plan this with your colleagues.
  6. Describe your data. Create a data dictionary with a detailed description of your data set or data model. Use community-based standards when possible; here is a short list by discipline. Include the data collection methods, variable names, codes, algorithms, file formats and software versions, structure of the data files, sources, quality control or related issues, transformations, and any issues regarding privacy or confidentiality and use or reuse.
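
The data inventory suggested in step 1 can start as a simple file listing. The sketch below is one possible approach, not a prescribed tool: it walks a project folder and records each file's relative path, format (taken from the file extension), size, and last-modified time in a UTF-8 CSV file. The function names and column names are assumptions chosen for illustration.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path


def build_inventory(root):
    """Return one record per file found under root (hypothetical helper)."""
    root = Path(root)
    records = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue  # skip folders; only files are inventoried
        stat = path.stat()
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        records.append({
            "path": path.relative_to(root).as_posix(),
            # The extension is only a rough guide to the file format.
            "format": path.suffix.lower().lstrip(".") or "unknown",
            "size_bytes": stat.st_size,
            "modified_utc": modified.strftime("%Y-%m-%dT%H:%M:%SZ"),
        })
    return records


def write_inventory(records, out_csv):
    """Write the records to a UTF-8 CSV file with a header row."""
    fields = ["path", "format", "size_bytes", "modified_utc"]
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fields)
        writer.writeheader()
        writer.writerows(records)
```

Calling `write_inventory(build_inventory("my_project"), "inventory.csv")` would produce a listing that can be extended by hand with descriptions, sources, and retention decisions.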

Tools and Methods

File Renaming
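
The naming conventions listed under Principles (consistent order, a project identifier, no spaces or special characters, compact dates, numbered versions) can be applied with a short script. The following is a minimal sketch under stated assumptions: the field order `PROJECT_description_YYYYMMDD_vN.ext` and the helper names `build_filename` and `rename_file` are illustrative choices, not a Rice standard.

```python
import re
from datetime import datetime
from pathlib import Path

# Characters the naming guidance says to avoid, plus any whitespace.
FORBIDDEN = re.compile(r'[\\/:*?"<>|\s]')


def build_filename(project, description, when, version, extension):
    """Assemble PROJECT_description_YYYYMMDD_vN.ext in a fixed order."""
    parts = [
        project.upper(),            # unique identifier, e.g. a project acronym
        description,
        when.strftime("%Y%m%d"),    # compact date, sorts chronologically
        f"v{version}",              # sequential version number, not "Final"
    ]
    name = "_".join(parts) + "." + extension.lstrip(".").lower()
    if FORBIDDEN.search(name):
        raise ValueError(f"file name contains spaces or special characters: {name!r}")
    return name


def rename_file(path, new_name):
    """Rename a file in place, refusing to overwrite an existing file."""
    target = Path(path).with_name(new_name)
    if target.exists():
        raise FileExistsError(target)
    return Path(path).rename(target)
```

For example, `build_filename("rdm", "survey-responses", datetime(2014, 4, 24), 2, "csv")` returns `RDM_survey-responses_20140424_v2.csv`. Try any bulk renaming on a copy of the data first, never on the only original.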

Workflow Management

Versioning

  • GitHub
  • Subversion (supported by Rice IT)
  • Cloud services such as Box often provide some level of versioning

Services

The Research Data Management Team can recommend best practices for organizing and naming files, help you develop and implement a plan for managing data, and assist with developing a framework for data documentation.

Storing Data

Options for Storing Your Data

Rice offers researchers several options for storing data, including:

  • Research Data Facility: “a combination of cloud-based and on-premises storage services that are robust and secure, flexible enough to meet a wide range of use cases, and scalable to meet future data storage needs.”
  • Box: “enterprise cloud-based storage & collaboration service”

As you select an appropriate storage option, consider:

  • Backup
  • Versioning
  • Security
  • Ability to share data with collaborators
  • Reliability

Best Practices for Data Preservation

“Data often have a longer lifespan than the research project that creates them. Researchers may continue to work on data after funding has ceased, follow-up projects may analyse or add to the data, and data may be re-used by other researchers.” [UK Data Archive]

Preservation is a key part of the data life cycle model. Active management will help ensure the data remains accessible for the long term and support reuse for continuing or future research. Note the “3-2-1” rule: keep three copies of your data, on at least two different types of storage media, with at least one copy in a different location.
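
Keeping three copies only helps if the copies stay identical. One common way to check this, which the guidance above does not prescribe and is offered here only as an illustration, is to compare checksums. The sketch below computes a SHA-256 checksum for each copy and reports whether at least three copies exist and all match. The function names are assumptions, and the check does not verify that the copies sit on different media or in different locations.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(paths):
    """True if there are at least three copies and all checksums agree."""
    paths = [Path(p) for p in paths]
    if len(paths) < 3:
        return False  # fewer than three copies falls short of "3-2-1"
    checksums = {sha256_of(p) for p in paths}
    return len(checksums) == 1
```

Running a check like this on a schedule, and recording the checksums alongside the data, makes silent corruption in one copy easier to notice while a good copy still exists.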

Resources

Rice University’s Digital Scholarship Archive (RDSA) provides support for many of these recommended practices, such as storage in diverse geographic locations, multiple redundant copies, and automatic backup and synchronization of files, all of which help reduce the risk of data loss and ensure long-term data integrity. You can deposit small, publicly accessible datasets in this archive. You can also deposit your papers, presentations, conference papers, reports, white papers, and other scholarly works, provided that they can be made publicly accessible. The archive provides a stable URL for citation purposes and manages scholarly information for the long term.

Sharing Data

Why share data?

  • Meet funding agency requirements
  • Support transparency and replication
  • Increase visibility of research

Where can I deposit and share my data?

How can others use my data (licensing and intellectual property)?

How can I get credit for my data?

Resources

  • Alyssa Goodman, et al. “Ten Simple Rules for the Care and Feeding of Scientific Data.” PLoS Comput Biol 10, no. 4 (April 24, 2014). doi:10.1371/journal.pcbi.1003542.