Portfolio
Data Engineer with 10+ years solving real business problems through data. I turn stakeholder pain points into production-ready pipelines and models — built to run at 2am and trusted by analysts every day.
LinkedIn
My blog posts
🛠️ Skills & Expertise
⚙️ Data Engineering & Pipelines
- Idempotent ETL/ELT pipeline design
- Data profiling, cleaning, and validation
- Kimball dimensional modeling — star & snowflake schemas
- Batch and real-time processing patterns
🗄️ Data Modeling
- Relational modeling — 3NF, normalization
- Dimensional modeling — facts, dimensions, slowly changing dimensions (SCDs)
☁️ Cloud & Infrastructure
| Platform | Services & Features |
|---|---|
| AWS | Glue, Redshift, S3, RDS, IAM |
| GCP | BigQuery |
| Snowflake | Partitioning, clustering, compression, model optimization |
💻 Languages & Querying
Python
- PySpark — large-scale distributed data processing
- pandas — data manipulation and profiling
- boto3 — AWS service automation and pipeline orchestration
SQL
- Query optimization and execution plan analysis
- Window functions and complex aggregations
- Schema design and indexing strategies
🔒 Data Governance & Quality
- Data quality frameworks and automated auditing
- Metadata management
- Access control and compliance
📊 Analytics & Visualization
- Power BI — DAX, semantic modeling, report design
- Data storytelling and executive dashboarding
Work samples:
1. Data engineering.
- Simple ETL pipeline (log, CSV, XLSX)
- Python ETL with Snowflake
- Data ingestion with Airflow
- SQL Window Functions
- SQL Logic Optimization
2. Public data analysis.
3. Data visualization.
4. Apps.
- Data Anonymization Tool
- Burnout Assessment Tool
- GAYA ICU - Intensive Care Unit AI system (prototype)
- Ursinho Game
Contact
- Email: jayronsoares@yandex.com