Data Management & Research Standards
- Research Standards
- Data Management
- Plan and storage
- Naming Conventions
- Base requirements
- Additional Information and References
I - RESEARCH STANDARDS
- Informed consent: Researchers must obtain informed consent from study participants, ensuring that they are fully informed about the nature and purpose of the study, as well as any potential risks or benefits.
- Data management: Researchers must implement effective data management practices to ensure that data is stored securely, managed ethically, and available for future analysis and replication.
- Research ethics: Researchers must adhere to ethical principles, such as the protection of human subjects and the responsible use of animals in research.
- Publication requirements: Researchers must comply with publication standards, such as ensuring that data is reported accurately and transparently, and that authorship is appropriate and properly attributed.
- Reproducibility: Researchers must design their studies in a way that allows others to reproduce their findings, using transparent and well-documented methods.
- Statistical analysis: Researchers must use appropriate statistical methods to analyze their data, ensuring that results are valid and reliable.
- Good Clinical Practice (GCP)
- International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH)
- Consolidated Standards of Reporting Trials (CONSORT)
- Standards for Reporting Diagnostic Accuracy (STARD)
- Transparent Reporting of Evaluations with Nonrandomized Designs (TREND)
- Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)
- Minimum Information for Biological and Biomedical Investigations (MIBBI)
- Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE)
- Minimum Information about a Microarray Experiment (MIAME)
- Minimum Information about a Genome Sequence (MIGS)
- Minimum Information about a Marker Gene Sequence (MIMARKS)
- Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK)
- Strengthening the Reporting of Observational Studies in Epidemiology (STROBE)
- The Guidelines for Reporting Reliability and Agreement Studies (GRRAS)
- The Society for Immunotherapy of Cancer Immunoscore guidelines
- The EQUATOR Network (Enhancing the QUAlity and Transparency Of health Research)
II - DATA MANAGEMENT PLAN
A data management plan (DMP) is a document that outlines how research data will be collected, stored, organized, backed up, preserved, shared, and disposed of, among other aspects of data management. Here are the main steps to create a DMP:
- Define the scope and objectives of the research project: Before starting a DMP, you need a clear idea of what your research project is about, what data it will generate, and what the objectives and potential impact of the project are.
- Identify the types of data you will be collecting: Depending on your research, you may be collecting different types of data, such as quantitative or qualitative data, survey responses, images, audio or video recordings, and so on.
- Define data documentation and metadata standards: Data documentation is crucial to ensure that your data is findable, understandable, and reusable. You need to define the metadata standards and data dictionary for each type of data you will be collecting.
- Determine the data storage and backup requirements: You need to identify the appropriate data storage and backup solutions for your research data. This includes considering the size, format, security, and access control of the data.
- Define the data sharing and dissemination policies: You need to decide who will have access to your research data, under what conditions, and for how long. You also need to define the data sharing and dissemination policies, including the licensing terms and citation requirements.
- Identify the long-term preservation and curation requirements: You need to plan for the long-term preservation and curation of your research data, including the selection of appropriate digital preservation strategies and the allocation of sufficient resources for this purpose.
- Consider ethical, legal, and regulatory issues: You need to be aware of any ethical, legal, and regulatory issues that may affect your research data, such as data privacy, intellectual property rights, and data ownership.
- Write the DMP document: Based on the above considerations, you need to write a DMP document that includes all the relevant information about your data management plan. There are several DMP templates and tools available online that can help you structure your DMP in a standardized format.
Database schema: defines how data is organized within a relational database, including logical constraints such as table names, fields, data types, and the relationships between these entities. There are three types: a conceptual database schema, a logical database schema, and a physical database schema.
Data modeling: the process of creating a visual representation of either a whole information system or parts of it, to communicate the connections between data points and structures.
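As an illustration of the schema definition above, here is a minimal sketch of a physical schema built with Python's standard `sqlite3` module. The table and column names (`participant`, `measurement`, and so on) are hypothetical and chosen only for the example; they are not taken from any particular project.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE participant (
    participant_id INTEGER PRIMARY KEY,
    enrolled_on    TEXT NOT NULL  -- ISO 8601 date, e.g. 2024-03-01
);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    participant_id INTEGER NOT NULL REFERENCES participant(participant_id),
    variable       TEXT NOT NULL,
    value          REAL NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with the example schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def table_names(conn: sqlite3.Connection) -> list:
    """Return the names of all tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]
```

The table names, fields, and data types appear in the `CREATE TABLE` statements, and the relationship between the two entities is expressed by the `REFERENCES` clause, so a measurement cannot point at a participant that does not exist.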
Plan
Plan and budget the data management and sharing process
- A brief summary and associated costs
- Data description
- Review of Existing Datasets
- Formats
- Metadata
- Storage and Backup
- Security
- Responsibility (data lifecycle)
- Access and Sharing
- Domain Repositories
- Self-dissemination
- Preservation
- Institutional Repositories
- Budget
- Other considerations
- Audience
- Selection and Retention Period
- Archiving and Preservation
- Ethics and Privacy
- Data type
- Tools, Software and/or code
- Data Standards
- Data Preservation, Access and Associated Timelines
- Access, Distribution, or Reuse Considerations
- Oversight of Data Management and Sharing
Data Store and Organization
Length of Time to Maintain and Make Data Available
Documentation and Metadata
- Methodology and procedures used to collect the data
- Data labels
- Definitions of variables
- Any other information necessary to reproduce and understand the data
Naming Conventions
- Be descriptive
- Be consistent
- Document it
- Don't use spaces or special characters
- Use leading zeros for sequential numbering
- Use a period only before the file extension
- Limit names to fewer than 32 characters
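The mechanical rules in the list above can be checked automatically. The following is a minimal sketch under stated assumptions: the allowed character set (letters, digits, underscore, hyphen) is one possible reading of "no spaces or special characters", and the function names `filename_problems` and `sequential_name` are hypothetical.

```python
import re

MAX_LENGTH = 32  # names must stay under this many characters

# Assumption: "no special characters" means letters, digits, underscore and
# hyphen only, with exactly one period placed right before the extension.
_PATTERN = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9]+")


def filename_problems(name: str) -> list:
    """Return a list of rule violations; an empty list means the name passes."""
    problems = []
    if len(name) >= MAX_LENGTH:
        problems.append("too long")
    if not _PATTERN.fullmatch(name):
        problems.append("invalid characters or misplaced period")
    return problems


def sequential_name(prefix: str, index: int, extension: str, width: int = 3) -> str:
    """Build a numbered file name with leading zeros, e.g. img_007.tif."""
    return f"{prefix}_{index:0{width}d}.{extension}"
```

For example, `filename_problems("my data (final).csv")` reports a character problem, while `sequential_name("img", 7, "tif")` produces a name that sorts correctly alongside `img_010.tif`. Being descriptive and consistent still has to be judged by a person.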
README: File & Folder Schema (Example)
Data Storage Format
- Open, unencrypted, and uncompressed
- Lossless
- Create one README file for each data file or dataset
- Name it so that it is easy to associate with the data file(s)
- Write it as a plain text file
- Structure multiple README files identically
- Use standardized date formats
- Follow the conventions for your discipline
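To make the README guidelines above concrete, here is a minimal sketch that derives a README name from a data file name and fills in a plain-text template with an ISO 8601 date. The `_README.txt` suffix and the template fields are assumptions made for illustration, not a prescribed standard.

```python
from datetime import date
from pathlib import Path


def readme_name(data_file: str) -> str:
    """Hypothetical convention: <data file stem>_README.txt."""
    return f"{Path(data_file).stem}_README.txt"


def readme_text(data_file: str, description: str, created: date, variables: dict) -> str:
    """Return the README body as plain text with a fixed, repeatable layout."""
    lines = [
        f"File: {data_file}",
        f"Date created: {created.isoformat()}",  # standardized YYYY-MM-DD
        f"Description: {description}",
        "Variables:",
    ]
    lines += [f"  {name}: {meaning}" for name, meaning in variables.items()]
    return "\n".join(lines) + "\n"
```

Generating every README from the same function keeps the files identically structured, and the shared stem makes each one easy to match with its data file.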
Back up
Keep at least three copies:
- local/working
- remote
- other in a remote location or local/external
- Dropbox
- OneDrive
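A backup is only useful if the copy matches the original. The sketch below copies a file to several destination folders and verifies each copy with a SHA-256 checksum. It assumes the destinations are reachable as local paths (for example, a synced Dropbox or OneDrive folder or an external drive); the function names are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list) -> list:
    """Copy source into each destination folder and verify every copy."""
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256_of(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies
```

Calling `back_up(Path("data.csv"), [Path("remote"), Path("external")])` would leave the working file plus two verified copies. Storing the checksum alongside the data also allows later integrity checks.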
Data Security
Then you can use encryption
• Don’t rely on 3rd party encryption alone
• Use something like PGP (Pretty Good Privacy)
• Write the keys down on two pieces of paper
• Store each piece of paper securely in separate locations
Repositories for Sharing and Preservation
I recommend applying the notion of FAIRness (Findable, Accessible, Interoperable, and Reusable). For this you can use:
- GitHub, which is more oriented to programming and version control with a history of changes.
- NIH-Supported Data Sharing Resources: a list of data sharing repositories in health
- Harvard Dataverse
- Zenodo
- Dryad
- GenBank (for genome data)
- ICPSR (for numeric social science data)
- Further science as a whole
- Further your research/reputation
- Enable new discoveries with your data
- Comply with funder/publisher data sharing requirements
- Share Informally: Posting on a web site, sending via email upon request.
- Share Formally: Via a repository, which may also provide preservation and makes your data more accessible and citable.
Research and Data Management Requirements
The following elements should be considered standard project requirements:
- Confidentiality
- Transparent funding disclosure
- Copyright declaration
- Conflict of interest declaration
- Ethics statement
Conflict of Interest
Declaring potential conflicts of interest is a recognized good practice and, in many contexts, a formal requirement. This is particularly relevant in health-sector research, where journals commonly require authors to complete dedicated conflict-of-interest forms or provide an explicit disclosure statement.
As a practical reference, I recommend using the MIT Managing Data Checklist.
Some additional tools and sources that you can consult:
Abbreviations of Names of Serials: an excellent resource for almost any subject that interacts with mathematics.
- Classification: Broadly encompasses obtaining, extracting, and structuring data from documents, photos, handwriting, and other media.
- Cataloging: Helping to locate data.
- Quality: Reducing errors in the data.
- Security: Keeping data safe from bad actors and making sure it’s used in accordance with relevant laws, policies, and customs.
- Data integration: Helping to build “master lists” of data, including by merging lists.
I have the following sources on my list for future review. You can help me by checking one of them and leaving your comments:
