Career Profile
A data engineer and technology consultant with 5 years of experience in ETL, pipeline operations, big data, cloud solutions, business analytics, and BI reporting. Experienced with two of the largest cloud platforms, AWS and GCP. I have helped solve data challenges for clients across domains such as healthcare, pharmaceuticals, R&D, HR, and staffing. I am a technology and data enthusiast, keen to learn new technologies and data domains and to help solve real-world problems for my clients.
Experience
- Develop, maintain, and enhance both small- and large-scale data pipelines, ingesting data from 30+ sources across 100+ clients
- Worked with the business and data science teams to understand their requirements, translated business rules into SQL queries, and applied them in the data warehouse, enabling faster delivery of business reports
- Mentored the team: onboarded new members, guided the choice of GCP resources, and supported SQL and Python coding and other day-to-day activities
- Led the migration of the team's applications and code from lower environments to production; built a front-end tool using Python Flask to automate the migration, cutting migration time by more than 50% (a Flask sketch appears after this list)
- Led a project to build cloud infrastructure on Google Cloud Platform, including environment setup, networking, security, permissions, and databases
- Implemented data governance controls such as GCP security groups, IAM policies, Cloud IAP, and PostgreSQL user groups for both the BI and data science teams, in line with established security best practices
- Deployed and managed internal applications on Google Cloud Run, Cloud Functions, App Engine, and Compute Engine, ensuring performance and automation and resulting in a 40% increase in the team's productivity
- Supported the data science team by building modules to migrate data from RDBMS to BigQuery, reducing overall turnaround time (TAT) by 40%
- Created an orchestration framework in Python and Airflow (Cloud Composer) to manage dependencies and schedule the various pipelines and applications (see the Airflow sketch after this list)
- Built a data quality management framework that reduced data quality issues by 35% and cut the business teams' rework effort by 20%
- Replaced a third-party tool with the Google Cloud Data Loss Prevention (DLP) API to mask personally identifiable information (PII) per data governance policy, reducing annual cost by 73% (see the DLP sketch after this list)
- Performed cost estimation and reduced overall cloud expenses by 28% by following Google's best practices
- Manage 5 direct-report associates, including their career growth plans, performance reviews, and compensation planning
- Lead the day-to-day operations of a cloud data warehouse holding massive commercial healthcare datasets, with a team of 5-6 members, supporting 100+ service requests and 20+ incidents each month
- Architect solutions for the client by gathering requirements for 30+ data sources and integrating them into AWS cloud infrastructure, using Amazon S3 for file transfer, Redshift and RDS Aurora for databases, Python and SnapLogic for data pipelines, ETL, and automation, and MicroStrategy for business intelligence reporting
- Supported the implementation of the client's new IT ERP on a ServiceNow instance, leveraging the platform's out-of-the-box functionality to drive process excellence, digital transformation, and a standardized user experience
- Deployed 10+ build projects worth ~$1.3 million using Agile SDLC methodology
- Worked on a website deployment project during the COVID-19 pandemic to track the patient digital data journey through Google Analytics UTM tracking and Adobe Analytics CID tracking codes, providing insights into patient behavior and impressions of the marketed drug
- Created multiple process automation tools to optimize and reduce the operational and build effort of multiple projects:
  - One Touch Ops: 60% effort reduction in running operations; cost and effort savings worth 3 FTEs
  - Data archival framework: purges 20 TB of data every week; 5x (500%) faster using the Parquet format; 17% more cluster space available
  - Fuzzy-logic-based mapping and conformance suggestion engine: 80% effort reduction; 10x faster
  - SLA tracking automation: scrapes website data via API in Python for SLA tracking; 40% effort reduction; 100% SLA adherence
  - Loading dock for email ingestion: 20% effort reduction; cost and effort savings worth ~1 FTE
- Supported an MVP of an ML-based anomaly detection engine built on the Dash framework that:
  - Detects data anomalies in priority subject areas to augment the system's existing data quality checks
  - Reduces the cognitive burden (~3 business days) on BI&A commercial colleagues by proactively identifying more potential data anomalies (~400 anomalies identified and rectified before data was pushed to production)
- Lead monthly Quality Risk Management (QRM) meetings with the client
- Document and present overall project performance statistics in business review meetings
- Performed architecture, cost estimation, and pricing for over 10 projects
- Onboard and mentor new team members on system architecture and data domains
- Contributed at the firm level by facilitating more than 6 bootcamps in one year as a data warehousing and SQL mentor for new recruits
- Performed end-to-end data loading: gathering data from multiple sources, performing ETL, optimizing databases, creating data pipelines, cleansing data, and automating the process
- Troubleshoot and optimize existing data pipelines
- Built business intelligence reports and data visualizations in MicroStrategy to provide strategic business insights and analytics to clients and users
- Migrated an obsolete payer MDM system to the cloud to ensure the uniformity, accuracy, stewardship, semantic consistency, and accountability of commercially shared payer master data assets
- Resolved 200+ complex business queries raised by 15+ downstream teams
- Ran a pilot project for the client on the feasibility of business intelligence reporting on ThoughtSpot
- Enhanced and upgraded an in-house Drupal-based content management framework for data catalog management
- Among the first to successfully complete a POC on hosting enterprise data on a new platform for the client
- Provided business solutions to complex US healthcare analytics problems involving datasets such as sales, call activity, patient transactional data (claims, copay & redemption), contracts & rebates, market access, customer master, product master, alignment master, and multi-channel marketing
- Applied business-driven quality checks such as outlier and restatement checks for data quality comparisons and trend breaks, using data clustering and statistical models, resulting in a 400% increase in timely issue capture (see the outlier-check sketch after this list)
- Supported business teams by debugging production data issues, working closely with them to meet their clients' end requirements, and creating custom business reports
- Handled change requests for client-side e-commerce platforms built on SAP Hybris and BigFish Solvida
- Developed dynamic website features using HTML, CSS, XML, and PHP
- Tested with JUnit (TDD), FindBugs, and Checkstyle
- Modified code using Java, JavaScript, the Spring MVC framework, and threading
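
For the migration front end mentioned above, a minimal sketch of what such a Flask tool could look like; the `/migrate` endpoint, `promote_release` helper, and environment names are illustrative assumptions, not the actual implementation.

```python
# Illustrative Flask front end that triggers an automated environment-to-
# environment promotion; all names and the promotion logic are hypothetical.
from flask import Flask, request, jsonify

app = Flask(__name__)

ENVIRONMENTS = ["dev", "qa", "prod"]  # assumed promotion path

def promote_release(artifact: str, source: str, target: str) -> str:
    """Placeholder for the real copy/deploy steps (e.g. CI or gsutil calls)."""
    return f"promoted {artifact} from {source} to {target}"

@app.route("/migrate", methods=["POST"])
def migrate():
    payload = request.get_json(force=True)
    source, target = payload.get("source"), payload.get("target")
    if source not in ENVIRONMENTS or target not in ENVIRONMENTS:
        return jsonify(error="unknown environment"), 400
    result = promote_release(payload.get("artifact"), source, target)
    return jsonify(status="ok", detail=result)

if __name__ == "__main__":
    app.run(port=8080)
```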
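For the orchestration framework, a minimal Airflow (Cloud Composer) sketch showing how dependencies between pipeline steps can be wired; the DAG id, schedule, and task callables are assumptions for illustration.

```python
# Illustrative Airflow DAG: task names, schedule, and callables are assumed.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():  # placeholder for a source-specific ingestion step
    print("extracting from source")

def load():  # placeholder for the warehouse load step
    print("loading into BigQuery")

with DAG(
    dag_id="example_pipeline",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task  # dependency: load runs only after extract
```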
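For the PII-masking work, a minimal sketch of calling the Cloud DLP API's `deidentify_content` method; the project id, info types, and masking character are illustrative assumptions.

```python
# Illustrative Cloud DLP masking call; project id and info types are assumed.
from google.cloud import dlp_v2

def mask_pii(project_id: str, text: str) -> str:
    client = dlp_v2.DlpServiceClient()
    parent = f"projects/{project_id}"
    inspect_config = {
        "info_types": [{"name": "EMAIL_ADDRESS"}, {"name": "PHONE_NUMBER"}]
    }
    deidentify_config = {
        "info_type_transformations": {
            "transformations": [
                {
                    "primitive_transformation": {
                        "character_mask_config": {"masking_character": "#"}
                    }
                }
            ]
        }
    }
    response = client.deidentify_content(
        request={
            "parent": parent,
            "inspect_config": inspect_config,
            "deidentify_config": deidentify_config,
            "item": {"value": text},
        }
    )
    return response.item.value

# Example: mask_pii("my-project", "Reach me at jane@example.com")
```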
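For the business-driven outlier checks, a minimal sketch using the Tukey IQR fence rule, one simple statistical test of the kind described; the fence multiplier, column names, and sample data are illustrative assumptions, not the production models.

```python
# Illustrative IQR-based outlier check; fence multiplier and columns are assumed.
import pandas as pd

def flag_outliers(df: pd.DataFrame, column: str, k: float = 1.5) -> pd.DataFrame:
    """Return rows falling outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = df[column].quantile(0.25), df[column].quantile(0.75)
    iqr = q3 - q1
    mask = (df[column] < q1 - k * iqr) | (df[column] > q3 + k * iqr)
    return df[mask]

# Example with hypothetical weekly sales volumes; week 8 is a trend break:
sales = pd.DataFrame({"week": range(10),
                      "units": [100, 98, 102, 97, 101, 99, 103, 100, 500, 98]})
print(flag_outliers(sales, "units"))  # flags the 500-unit week
```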