Data Modeling Process

Key Steps

RELEVANT FOR DATA SCIENCE

  1. Identify the entities
  2. Identify the key properties of each entity
  3. Draw a rough draft of the entity-relationship model
  4. Identify the data attributes that need to be incorporated into the model
  5. Map the attributes to the entities
  6. Finalize and validate the data model (refine it)
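A minimal Python sketch of the steps, assuming two hypothetical entities (`Customer` and `Order`) with integer ids as key properties. The `validate` function is one possible reading of step 6: it only checks that keys are unique and that each mapped attribute points at an existing entity.

```python
from dataclasses import dataclass

# Steps 1-2: hypothetical entities, each with an integer key property.
# Steps 3-4: the fields of each class are the draft's data attributes.
@dataclass
class Customer:
    customer_id: int  # key property
    name: str
    email: str

@dataclass
class Order:
    order_id: int     # key property
    customer_id: int  # step 5: attribute mapped to Customer.customer_id
    total: float

def validate(customers, orders):
    """Step 6: return a list of problems found in the model's data."""
    errors = []
    customer_ids = [c.customer_id for c in customers]
    if len(customer_ids) != len(set(customer_ids)):
        errors.append("duplicate customer_id")
    order_ids = [o.order_id for o in orders]
    if len(order_ids) != len(set(order_ids)):
        errors.append("duplicate order_id")
    known = set(customer_ids)
    for o in orders:
        if o.customer_id not in known:
            errors.append(f"order {o.order_id} references unknown customer {o.customer_id}")
    return errors

customers = [Customer(1, "Ana", "ana@example.com")]
orders = [Order(10, 1, 25.0), Order(11, 2, 5.0)]
print(validate(customers, orders))  # ['order 11 references unknown customer 2']
```

Refining the model (step 6) would then mean fixing whatever `validate` reports and repeating the check.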

Data Engineering Process

Data engineering consists of working with multiple types of data to perform operations such as integration, transformation, and consolidation, using scripting or code.

Types of Data
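  • Structured: data from table-based source systems, such as a relational database or a flat file like a CSV file.
  • Semi-structured: data such as JSON files, which have some structure but allow fields to vary between records.
  • Unstructured: data stored as key-value pairs that don't adhere to standard relational models, plus other formats such as PDFs, word-processor documents, and images.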
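A small Python illustration of the three types, using made-up sample values: CSV text parsed with the standard `csv` module, JSON parsed with `json`, and opaque bytes kept in a plain dict keyed by file name. The names `CSV_TEXT`, `JSON_TEXT`, and `blobs` are hypothetical.

```python
import csv
import io
import json

# Hypothetical sample inputs, one per data type described above.
CSV_TEXT = "id,name,city\n1,Ana,Lima\n2,Luis,Quito\n"
JSON_TEXT = '[{"id": 1, "name": "Ana", "tags": ["vip"]}, {"id": 2, "name": "Luis"}]'

# Structured: every row has the same columns, like a relational table.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Semi-structured: records share some structure, but fields may vary or nest.
records = json.loads(JSON_TEXT)

# Unstructured: opaque values (e.g. file bytes) reachable only by their key;
# there is no schema to query against.
blobs = {"report.pdf": b"%PDF-1.7", "photo.jpg": b"\xff\xd8\xff\xe0"}

print(rows[0])                                # {'id': '1', 'name': 'Ana', 'city': 'Lima'}
print([r.get("tags", []) for r in records])   # [['vip'], []]
print(sorted(blobs))                          # ['photo.jpg', 'report.pdf']
```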
Data Operations
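  • Data Integration: establishing links between operational and analytical services and data sources to enable secure, reliable access to data across multiple systems.
  • Data Transformation: converting operational data into a structure and format suitable for analysis, often through an ETL process; increasingly, the ELT variation is used to load raw data quickly into a data lake and then apply big data processing to transform it.
  • Data Consolidation: combining data extracted from multiple sources into a consistent structure, usually to support analytics and reporting, in stores such as a data lake or a data warehouse.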
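The three operations can be sketched in Python with only the standard library, assuming two hypothetical sources (`read_crm` and `read_web_app`) whose field names and date formats differ. An in-memory SQLite database stands in for the data warehouse in this sketch.

```python
import sqlite3
from datetime import datetime

# Integration: hypothetical readers for two sources with different schemas.
def read_crm():
    return [{"FullName": "Ana Diaz", "SignupDate": "03/01/2024"}]  # dd/mm/yyyy

def read_web_app():
    return [{"name": "luis perez", "signed_up": "2024-02-15"}]     # ISO date

# Transformation: map each source's fields onto one common structure.
def from_crm(rec):
    date = datetime.strptime(rec["SignupDate"], "%d/%m/%Y").date()
    return {"name": rec["FullName"].title(), "signup_date": date.isoformat(), "source": "crm"}

def from_web(rec):
    return {"name": rec["name"].title(), "signup_date": rec["signed_up"], "source": "web"}

# Consolidation: load everything into one consistent table.
def consolidate():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (name TEXT, signup_date TEXT, source TEXT)")
    rows = [from_crm(r) for r in read_crm()] + [from_web(r) for r in read_web_app()]
    conn.executemany("INSERT INTO customers VALUES (:name, :signup_date, :source)", rows)
    conn.commit()
    return conn

conn = consolidate()
print(conn.execute("SELECT * FROM customers ORDER BY signup_date").fetchall())
# [('Ana Diaz', '2024-01-03', 'crm'), ('Luis Perez', '2024-02-15', 'web')]
```

Normalizing dates to ISO format during transformation is what lets the consolidated table be sorted and compared consistently, whatever format each source used.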
Tools:
  • SQL
  • Python, R, Java and others

Key Concepts:

------------------------------------------------------------------------------------------------
  • Operational data: usually transactional data that is generated and stored by applications.
  • Analytical data: data that has been optimized for analysis and reporting, often in a data warehouse.
  • Streaming data: perpetual sources of data that generate data values in real time.
  • Data pipelines: used to orchestrate activities that transfer and transform data.
  • Data lake: a storage repository that holds large amounts of data in native, raw formats, typically as files.
  • Data warehouse: a centralized repository of integrated data from one or more disparate sources.
  • Apache Spark: a parallel processing framework that takes advantage of in-memory processing and distributed file storage. It's a common open-source software (OSS) tool for big data scenarios.
------------------------------------------------------------------------------------------------
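Streaming data and data pipelines can be illustrated with Python generators: a hypothetical `sensor_stream` yields readings indefinitely, and a hypothetical `run_pipeline` chains transformation stages lazily. This is a toy sketch of the idea, not how an orchestration service or Spark actually schedules work.

```python
import itertools
from typing import Iterable, Iterator

Record = dict

def sensor_stream() -> Iterator[Record]:
    """Hypothetical perpetual source: yields one reading at a time, forever."""
    for seq in itertools.count():
        yield {"sensor": "s1", "seq": seq, "celsius": 20 + seq % 5}

def add_fahrenheit(records: Iterable[Record]) -> Iterator[Record]:
    """Transform stage: enrich each record with a derived value."""
    for r in records:
        yield {**r, "fahrenheit": r["celsius"] * 9 / 5 + 32}

def only_warm(records: Iterable[Record]) -> Iterator[Record]:
    """Filter stage: keep readings at or above 22 C."""
    for r in records:
        if r["celsius"] >= 22:
            yield r

def run_pipeline(source, stages, limit):
    """Chain the stages lazily, then take `limit` records from the endless stream."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    return list(itertools.islice(stream, limit))

result = run_pipeline(sensor_stream(), [add_fahrenheit, only_warm], limit=3)
print([r["seq"] for r in result])  # [2, 3, 4]
```

Because every stage is a generator, nothing is computed until `islice` pulls records, which is what allows the pipeline to work on a source that never ends.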

Data Engineer: the primary role responsible for integrating, transforming, and consolidating data from various structured and unstructured data systems into structures that are suitable for building analytics solutions.

Data Management: process

ETL: extract, transform, and load process.

ELT: extract, load, and transform.
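The difference between ETL and ELT is mainly where the transform step runs. In this sketch (hypothetical table names and sample rows, with an in-memory SQLite database as the target), ETL cleans the rows in Python before loading, while ELT loads the raw rows first and transforms them with SQL inside the database; both end with the same result.

```python
import sqlite3

# Hypothetical raw extract: amounts arrive as strings in cents, one is missing.
raw = [("A-1", "1250"), ("A-2", "990"), ("A-3", "")]

def etl(conn):
    # Transform in application code first, then load only the clean rows.
    clean = [(oid, int(cents) / 100) for oid, cents in raw if cents]
    conn.execute("CREATE TABLE sales_etl (order_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", clean)

def elt(conn):
    # Load the raw data as-is, then transform it inside the database with SQL.
    conn.execute("CREATE TABLE sales_raw (order_id TEXT, cents TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", raw)
    conn.execute(
        """CREATE TABLE sales_elt AS
           SELECT order_id, CAST(cents AS INTEGER) / 100.0 AS amount
           FROM sales_raw WHERE cents <> ''"""
    )

conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
q = "SELECT order_id, amount FROM {} ORDER BY order_id"
print(conn.execute(q.format("sales_etl")).fetchall())  # [('A-1', 12.5), ('A-2', 9.9)]
print(conn.execute(q.format("sales_elt")).fetchall())  # [('A-1', 12.5), ('A-2', 9.9)]
```

With ELT the untouched `sales_raw` table stays available, so the transformation can be rerun or changed later without extracting the data again.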

SQL: Structured Query Language

NoSQL: Not Only SQL databases
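To contrast the two, here is a rough Python sketch: a relational table with a fixed schema queried through SQL (via the standard `sqlite3` module), next to a NoSQL-style document store modeled as a plain dict of JSON documents. The dict is only an analogy for a document database; real NoSQL systems have their own APIs, and the product data is made up.

```python
import json
import sqlite3

# SQL: a relational table with a fixed schema, queried declaratively.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("P1", "Keyboard", 45.0), ("P2", "Mouse", 20.0)],
)
cheap = conn.execute("SELECT name FROM products WHERE price < 30").fetchall()

# NoSQL-style (illustrative only): JSON documents keyed by id;
# documents need not share the same fields ("layout" exists only in P1).
documents = {
    "P1": json.dumps({"name": "Keyboard", "price": 45.0, "layout": "ES"}),
    "P2": json.dumps({"name": "Mouse", "price": 20.0}),
}
parsed = [json.loads(doc) for doc in documents.values()]
cheap_docs = [d["name"] for d in parsed if d["price"] < 30]

print(cheap)       # [('Mouse',)]
print(cheap_docs)  # ['Mouse']
```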


