Data Engineer Interview Questions

20,148 data engineer interview questions shared by candidates

Given a multi-step product feature, write SQL to see how well this feature is doing (loading times, step completion %). Then use Python to constantly update average step time as new values stream in, given that there are too many to store in memory.

Data Engineer

Interviewed at Meta

3.6
Apr 30, 2018
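
One way to approach the streaming part of this question is to keep only a count and the current mean, and fold each new value in as it arrives. The sketch below is illustrative only; the class name `RunningAverage` and its `update` method are hypothetical and not part of the original question.

```python
class RunningAverage:
    """Keeps a running mean in O(1) memory (hypothetical helper)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0

    def update(self, value: float) -> float:
        """Fold one new observation into the mean and return the new mean."""
        self.count += 1
        # Incremental form: no list of past values and no large running sum.
        self.mean += (value - self.mean) / self.count
        return self.mean


avg = RunningAverage()
for step_time in [2.0, 4.0, 6.0]:  # stand-in for values arriving from a stream
    avg.update(step_time)
print(avg.mean)  # 4.0
```

Only two numbers are stored regardless of how many values stream in, which is the point of the memory constraint in the question.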

SQL:
1. Percentage increase in revenue for promoted products compared with non-promoted products.
2. Product classes that have the highest number of transactions.
3. Count of customers who bought two item types (A, B).
4. Don't remember.

Python:
1. Average number of letters per word.
2. Parse an IP address (this is a favourite FB question).
3. [[A],[A,B],[A,C],[B,D],[C,A]] -- Find the letter with the most neighbors. (I wasn't able to solve it because of the time limit, but the interviewer said she understood what I wanted to convey. I gave her an algorithm for what I would have done.)

Data Engineer

Interviewed at Meta

3.6
Sep 21, 2018
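
The neighbors question is stated loosely, so the sketch below rests on an assumed reading: every inner list is a group, two letters are neighbors if they appear in the same group, and a neighbor is counted once no matter how often the pair repeats. The function name `most_neighbors` is hypothetical, and all tied letters are returned.

```python
from collections import defaultdict


def most_neighbors(groups):
    """Return the item(s) with the most distinct neighbors, sorted.

    Assumption: items sharing an inner list are neighbors of each other.
    """
    neighbors = defaultdict(set)
    for group in groups:
        for item in group:
            # update() with an empty iterable still registers isolated items.
            neighbors[item].update(other for other in group if other != item)
    if not neighbors:
        return []
    best = max(len(found) for found in neighbors.values())
    return sorted(item for item, found in neighbors.items() if len(found) == best)


data = [["A"], ["A", "B"], ["A", "C"], ["B", "D"], ["C", "A"]]
print(most_neighbors(data))  # ['A', 'B']
```

Under this reading A and B tie with two distinct neighbors each. If repeated pairs were meant to count more than once, the sets would be replaced by counters and A alone would win.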

In Python, given a JSON object with nested objects, write a function that flattens all the objects into a single key-value dictionary. Do not use a library that already performs this function. {a:{b:c,d:e}} becomes {a_b:c, a_d:e} (not {a:"b:c,d:e"}).

Data Engineer

Interviewed at Amazon

3.5
Apr 29, 2020
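
A minimal recursive sketch of the flattening, assuming the JSON has already been parsed into Python dicts and that keys are joined with an underscore as in the example. The function name `flatten` and its `parent_key` and `sep` parameters are hypothetical.

```python
def flatten(obj, parent_key="", sep="_"):
    """Flatten nested dicts into one dict with joined keys."""
    flat = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key prefix along.
            flat.update(flatten(value, new_key, sep))
        else:
            # Non-dict values (including lists) are kept as they are.
            flat[new_key] = value
    return flat


print(flatten({"a": {"b": "c", "d": "e"}}))  # {'a_b': 'c', 'a_d': 'e'}
```

This version drops empty nested objects and can produce colliding keys if the original keys already contain the separator; an interviewer may ask how to handle either case.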

SQL: Select the value of one column based on the max of a different column within each grouping of a third column. Given Column A, Column B, and Column C: for each group based on Column A, give the value of Column B where Column C is the max for that group.

Data Engineer

Interviewed at Amazon

3.5
Apr 29, 2020
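
One possible answer joins the table to a per-group maximum. The sketch below runs that query through Python's built-in `sqlite3` module so it can be tried locally; the table name `t`, the column names `a`, `b`, `c`, and the sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b TEXT, c INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("x", "b1", 1), ("x", "b2", 5), ("y", "b3", 7), ("y", "b4", 2)],  # made-up rows
)

# Find MAX(c) per group of a, then join back to pick the matching b.
QUERY = """
SELECT t.a, t.b
FROM t
JOIN (SELECT a, MAX(c) AS max_c FROM t GROUP BY a) AS m
  ON t.a = m.a AND t.c = m.max_c
ORDER BY t.a
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # [('x', 'b2'), ('y', 'b3')]
```

If several rows in a group share the maximum, this query returns all of them, which is worth stating to the interviewer.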

# Question 3:
# Complete a function that returns a list containing all the mismatched words (case sensitive) between two given input strings
# For example:
# - string 1 : "Firstly this is the first string"
# - string 2 : "Next is the second string"
#
# - output : ['Firstly', 'this', 'first', 'Next', 'second']

Data Engineer

Interviewed at Meta

3.6
Jun 8, 2020
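
A short sketch that reproduces the example output, assuming words are separated by whitespace and that the result lists the unmatched words of the first string in order, followed by those of the second. The function name `mismatched_words` is hypothetical.

```python
def mismatched_words(s1, s2):
    """Return words that appear in only one of the two strings (case sensitive)."""
    words1, words2 = s1.split(), s2.split()
    set1, set2 = set(words1), set(words2)  # sets give O(1) membership checks
    only_in_first = [w for w in words1 if w not in set2]
    only_in_second = [w for w in words2 if w not in set1]
    return only_in_first + only_in_second


print(mismatched_words("Firstly this is the first string", "Next is the second string"))
# ['Firstly', 'this', 'first', 'Next', 'second']
```

A mismatched word that occurs twice in one string appears twice in the result here; whether that is desired is not specified in the question.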

1. What difference have you made in your current team apart from regular work?
2. What steps do you follow to rebuild a table in a database?
3. How did you do performance tuning?
4. How do you find the skewness of data in a table?
5. Difference between RDBMS and Dimensional Modeling

SQL

1) purchase

customer_id  product_id  quantity  purchase_date
1            111         1         01/01/2017
1            111         2         01/02/2017
1            222         2         01/02/2017
2            111         3         01/04/2017
2            222         1         01/03/2017
3            222         1         01/05/2017
3            222         1         01/06/2017
3            111         1         01/06/2017
3            111         1         01/04/2017

Q: How many customers bought each product how many times during the week?

Product_Id  Number_of_Customers  Number_of_Times
111         2                    2
111         1                    1
222         2                    1
222         1                    2

2) daily_usage

account_id  usage_amount  usage_date
1           10            1
1           20            2
1           15            3
1           30            4

Q. a) How do you print the usage_amount of previous/consecutive rows?
   b) Without using window functions

Data Engineer

Interviewed at Amazon

3.5
Feb 16, 2017
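
Both SQL questions can be tried locally with Python's built-in `sqlite3` module. The sketch below makes several assumptions: "times" means the number of purchase rows per customer and product (which matches the expected output), all sample rows fall in the same week so no date filter is applied, the second purchase row's year is taken to be 2017, and usage dates are consecutive integers per account.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (customer_id INT, product_id INT, quantity INT, purchase_date TEXT)")
conn.executemany("INSERT INTO purchase VALUES (?, ?, ?, ?)", [
    (1, 111, 1, "2017-01-01"), (1, 111, 2, "2017-01-02"), (1, 222, 2, "2017-01-02"),
    (2, 111, 3, "2017-01-04"), (2, 222, 1, "2017-01-03"), (3, 222, 1, "2017-01-05"),
    (3, 222, 1, "2017-01-06"), (3, 111, 1, "2017-01-06"), (3, 111, 1, "2017-01-04"),
])
conn.execute("CREATE TABLE daily_usage (account_id INT, usage_amount INT, usage_date INT)")
conn.executemany("INSERT INTO daily_usage VALUES (?, ?, ?)",
                 [(1, 10, 1), (1, 20, 2), (1, 15, 3), (1, 30, 4)])

# 1) Count purchases per customer and product, then count customers per frequency.
FREQUENCY_QUERY = """
SELECT product_id, COUNT(*) AS number_of_customers, times AS number_of_times
FROM (SELECT customer_id, product_id, COUNT(*) AS times
      FROM purchase
      GROUP BY customer_id, product_id) AS per_customer
GROUP BY product_id, times
ORDER BY product_id, number_of_customers DESC
"""
frequency_rows = conn.execute(FREQUENCY_QUERY).fetchall()
print(frequency_rows)  # [(111, 2, 2), (111, 1, 1), (222, 2, 1), (222, 1, 2)]

# 2) Previous row without window functions: self-join on the preceding date.
PREVIOUS_QUERY = """
SELECT cur.usage_date, cur.usage_amount, prev.usage_amount AS previous_amount
FROM daily_usage AS cur
LEFT JOIN daily_usage AS prev
  ON prev.account_id = cur.account_id
 AND prev.usage_date = cur.usage_date - 1
ORDER BY cur.usage_date
"""
previous_rows = conn.execute(PREVIOUS_QUERY).fetchall()
print(previous_rows)  # [(1, 10, None), (2, 20, 10), (3, 15, 20), (4, 30, 15)]
```

If usage dates had gaps, the self-join condition would need a correlated subquery that picks the largest earlier date instead of `usage_date - 1`.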

SQL: Top 3 products by sales, percentages using CASE, a basic HAVING clause, and one set-operator (INTERSECT) type question.
Python: Average word length, IP-address parsing, dictionaries, lists of lists, flattening a list of lists. (Similar to previous interview experiences.)

Data Engineer

Interviewed at Meta

3.6
Nov 26, 2018
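
The Python topics named here are only listed, not specified, so the sketches below guess at typical versions: average word length over whitespace-separated words, IPv4 parsing into four integers with range checks, and flattening one level of nesting. All function names are hypothetical.

```python
def average_word_length(text):
    """Mean number of characters per whitespace-separated word."""
    words = text.split()
    if not words:
        return 0.0
    # Punctuation attached to a word is counted as part of it.
    return sum(len(word) for word in words) / len(words)


def parse_ipv4(address):
    """Split a dotted-quad string into four ints, raising ValueError if invalid."""
    parts = address.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 octets, got {len(parts)}")
    octets = []
    for part in parts:
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"octet is not a number: {part!r}")
        value = int(part)  # leading zeros such as "01" are accepted here
        if value > 255:
            raise ValueError(f"octet out of range: {value}")
        octets.append(value)
    return octets


def flatten_lists(nested):
    """Flatten exactly one level of nesting."""
    return [item for sublist in nested for item in sublist]


print(average_word_length("data engineer interview"))  # 7.0
print(parse_ipv4("192.168.0.1"))                       # [192, 168, 0, 1]
print(flatten_lists([[1, 2], [3], []]))                # [1, 2, 3]
```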
