Elaborate technical questions as well as a few HR-related questions
Data Engineer Interview Questions
20,238 data engineer interview questions shared by candidates
Tell us how you deal with work when you have to create your own demand, without a boss.
The interview questions revolved mostly around Tableau, database structure, etc. All of the interviewers covered the same topics.
They described a technical problem and later followed up with additional challenges that needed to be solved.
1. SQL and Databases 2. Python for Data Engineering 3. Data Modeling and Warehousing 4. Version Control and CI/CD (Basic Awareness)
Asked to solve a coding challenge about event time conflicts.
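The exact challenge is not stated, so the following is only a minimal sketch of one common reading of "event time conflicts": given events with a start and an end time, report which pairs overlap. The `Event` tuple layout, the function name `find_conflicts`, and the convention that an end time is exclusive are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical event layout: (name, start, end), with end treated as exclusive.
Event = Tuple[str, int, int]


def find_conflicts(events: List[Event]) -> List[Tuple[str, str]]:
    """Return pairs of event names whose time ranges overlap."""
    # Sorting by start time lets the inner loop stop early.
    ordered = sorted(events, key=lambda e: e[1])
    conflicts = []
    for i, (name_a, _, end_a) in enumerate(ordered):
        for name_b, start_b, _ in ordered[i + 1:]:
            if start_b >= end_a:
                # Every later event starts even later, so none can overlap a.
                break
            conflicts.append((name_a, name_b))
    return conflicts


events = [("a", 1, 5), ("b", 4, 6), ("c", 6, 8), ("d", 7, 9)]
print(find_conflicts(events))
```

Sorting first keeps the common case close to O(n log n); the worst case, where everything overlaps, is still quadratic because every pair has to be reported.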
A developer on the team wrote an ETL that runs once a day as a Spark job. Every day it reads a CSV file that shows the total value of each customer's transactions of that day and writes them as a parquet file partitioned by date and customer id.

Below you can see an example of the CSV file. Note that each customer has one entry representing the total transaction value it did on that day. However, sometimes the CSV file contains a correction for a sum reported in the past.

For example - this file represents the transactions on 1/10. You can see that customer 1002 has 2 entries. One for 1/10 and one for 30/9. This means that the total sum of transactions the customer did on 1/10 is 70, but the total sum of transactions it did on 30/9 was 40 and this sum should replace the value already reported on 30/9.

current date file: 2020-10-01

    date,customer,price
    2020-10-01,1000,40
    2020-10-01,1001,10
    2020-09-30,1002,40
    2020-10-01,1002,70
    2020-10-01,1003,10
    2020-09-29,1004,10
    2020-10-01,1004,10

This function represents the ETL. It runs once a day with a string representing the current day. It reads the CSV file, does some transformations, and writes it. Please help us find the bug in the code above, and return the right results
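The ETL function itself is not reproduced in the question, so the actual bug cannot be identified here. What the description does pin down is the expected result: each row belongs to the partition named by its own `date` column, not by the run date, and a correction must replace the value already stored for that date and customer. The sketch below models the partitioned output as a plain dictionary keyed by `(date, customer)`; the name `apply_daily_file` and the previously stored value of 25 are hypothetical and chosen only for illustration.

```python
import csv
import io
from typing import Dict, Tuple

# The sample file from the question, for the run on 2020-10-01.
SAMPLE = """date,customer,price
2020-10-01,1000,40
2020-10-01,1001,10
2020-09-30,1002,40
2020-10-01,1002,70
2020-10-01,1003,10
2020-09-29,1004,10
2020-10-01,1004,10
"""

# Stand-in for the partitioned output: one value per (date, customer).
Store = Dict[Tuple[str, str], int]


def apply_daily_file(store: Store, csv_text: str) -> Store:
    """Upsert every row under its own date, so corrections replace old values."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        store[(row["date"], row["customer"])] = int(row["price"])
    return store


# Hypothetical earlier value for customer 1002 on 2020-09-30.
store: Store = {("2020-09-30", "1002"): 25}
apply_daily_file(store, SAMPLE)
print(store[("2020-09-30", "1002")])  # corrected value
print(store[("2020-10-01", "1002")])  # value for the run date
```

In a Spark job the same requirement usually means writing each row to the partition of its own date and overwriting only the partitions present in the day's file, rather than everything or only the run date's partition.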
Our developer had to join the results with a dimension table of categories. The join works, but it's a bit slow. See if you can understand why and whether it can run faster.
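The join code is not shown, so the cause of the slowness is unknown. One frequent cause with a small dimension table is that the large table gets shuffled for the join, when the small table could simply be copied to every worker (in Spark, the `broadcast` hint from `pyspark.sql.functions` requests this). The plain-Python sketch below shows only the underlying idea, assuming the category table fits in memory; the row layouts and the name `hash_join` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layouts: fact rows are (customer, price, category_id),
# dimension rows are (category_id, category_name).
Fact = Tuple[str, int, int]
Category = Tuple[int, str]


def hash_join(facts: List[Fact], categories: List[Category]) -> List[Tuple[str, int, str]]:
    """Inner join that builds a lookup from the small table once."""
    lookup: Dict[int, str] = dict(categories)  # build side: the small table
    return [
        (customer, price, lookup[category_id])  # probe side: the large table
        for customer, price, category_id in facts
        if category_id in lookup
    ]


facts = [("1000", 40, 1), ("1001", 10, 2), ("1002", 70, 9)]
categories = [(1, "retail"), (2, "wholesale")]
print(hash_join(facts, categories))
```

The large table is scanned once and never reordered, which is the property a broadcast join relies on.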
Questions about Python and SQL, both of which were reasonable. The SQL part asked for an interval range, and the Python part asked to find the missing number in an unsorted array.
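The precise statement of the Python task is not given. A minimal sketch, assuming the array holds the distinct integers 1..n with exactly one of them missing, compares the expected sum with the actual sum. The function name `find_missing` is illustrative.

```python
from typing import List


def find_missing(nums: List[int]) -> int:
    """Return the one value missing from 1..n, where n = len(nums) + 1."""
    n = len(nums) + 1
    expected_total = n * (n + 1) // 2  # sum of 1..n
    return expected_total - sum(nums)


print(find_missing([3, 1, 5, 2]))  # 4
```

This runs in O(n) time without sorting. If the range or the number of missing values differs from the assumption above, a set difference is the more general approach.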
How do you apply a union between 2 tables if table A has 5 columns and table B has 8 columns?
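A union needs both sides to have the same number of columns, so one common answer is to pad the narrower table with `NULL` placeholders (the other is to select only the shared columns from both). The sketch below demonstrates the padding approach with Python's built-in `sqlite3` module; the table names `a` and `b` and the column names `c1` to `c8` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table a has 5 columns, table b has 8 (names are illustrative only).
conn.execute("CREATE TABLE a (c1, c2, c3, c4, c5)")
conn.execute("CREATE TABLE b (c1, c2, c3, c4, c5, c6, c7, c8)")
conn.execute("INSERT INTO a VALUES (1, 2, 3, 4, 5)")
conn.execute("INSERT INTO b VALUES (10, 20, 30, 40, 50, 60, 70, 80)")

# Pad the three columns that a lacks with NULL so both sides line up.
rows = conn.execute(
    """
    SELECT c1, c2, c3, c4, c5, NULL AS c6, NULL AS c7, NULL AS c8 FROM a
    UNION ALL
    SELECT c1, c2, c3, c4, c5, c6, c7, c8 FROM b
    ORDER BY c1
    """
).fetchall()

for row in rows:
    print(row)
```

`UNION ALL` keeps duplicates; plain `UNION` would remove them at the cost of an extra deduplication step.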