Spark exercises to work on log files.
Big Data Internship Interview Questions
1,784 big data internship interview questions shared by candidates
Implement a function(int[][] matrix, int rownum, int colnum) that prints a matrix spiraling out from a given index in a multi-threaded fashion.
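The traversal order for the spiral question above can be sketched as follows. This is a minimal single-threaded sketch under assumptions the question does not state: the spiral runs clockwise starting to the right, cells outside the matrix are skipped, and the name `spiral_from` is hypothetical. How the work should be split across threads is left open in the question, so it is not covered here.

```python
def spiral_from(matrix, row, col):
    """Return matrix values in clockwise spiral order, starting at (row, col).

    Assumption: the spiral first moves right, then down, left, up, with the
    side length growing by one after every two turns.
    """
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    total = rows * cols
    if total == 0:
        return []
    if not (0 <= row < rows and 0 <= col < cols):
        raise IndexError("start index is outside the matrix")

    result = [matrix[row][col]]
    directions = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # right, down, left, up
    r, c = row, col
    step = 1       # current side length of the spiral
    turn = 0       # index into directions
    while len(result) < total:
        for _side in range(2):          # two sides share each side length
            dr, dc = directions[turn % 4]
            for _move in range(step):
                r += dr
                c += dc
                # Only cells inside the matrix are emitted.
                if 0 <= r < rows and 0 <= c < cols:
                    result.append(matrix[r][c])
            turn += 1
        step += 1
    return result
```

Printing is then a matter of iterating over the returned list, for example `print(*spiral_from(grid, 1, 1))`.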
Data join question: The higher the Key Performance Indicator (KPI), the better the performance of the tower. Please provide a solution to determine the best- and worst-performing towers, as well as the average tower performance per market. Datasets: KPI Dataset = TOWERID, DATE, KPI; Market Info = TOWERID, MARKET.
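One possible reading of the data join question can be sketched in plain Python. The assumptions here are not in the question: each tower is scored by its mean KPI over all dates, best and worst are reported per market, the market average is the mean of those tower scores, and towers missing from Market Info are dropped (an inner join). The function name `tower_performance` is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def tower_performance(kpi_rows, market_rows):
    """Join KPI rows (tower_id, date, kpi) with market rows (tower_id, market).

    Returns {market: {"best": tower_id, "worst": tower_id, "average": float}}.
    """
    # Step 1: average KPI per tower across all dates (assumed scoring rule).
    kpis_by_tower = defaultdict(list)
    for tower_id, _date, kpi in kpi_rows:
        kpis_by_tower[tower_id].append(kpi)
    tower_avg = {tower: mean(values) for tower, values in kpis_by_tower.items()}

    # Step 2: join on tower_id; towers without market info are skipped.
    market_of = dict(market_rows)
    towers_by_market = defaultdict(list)
    for tower_id, avg in tower_avg.items():
        market = market_of.get(tower_id)
        if market is not None:
            towers_by_market[market].append((tower_id, avg))

    # Step 3: best, worst, and average tower score per market.
    summary = {}
    for market, towers in towers_by_market.items():
        summary[market] = {
            "best": max(towers, key=lambda t: t[1])[0],
            "worst": min(towers, key=lambda t: t[1])[0],
            "average": mean(avg for _tower, avg in towers),
        }
    return summary
```

In Spark the same shape would be a group-by on TOWERID, a join with Market Info, and a second group-by on MARKET.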
1. Introduce your background. 2. What is your goal in this position? 3. Brainstorming about projects we can do.
The interview did not happen at the scheduled time.
What is your technical knowledge of Applied Intelligence, and why do you want to work in this area?
Very broad range of questions covering data engineering, data science, distributed computing, architecture... and specialties like record linkage / deduplication + multiple code exercises
Serialize and deserialize binary tree
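A common way to approach the binary tree question is a pre-order traversal with an explicit marker for missing children. The sketch below assumes integer node values, a comma-separated string format, and `#` as the null marker; the `Node` class and both function names are hypothetical, and other encodings (level-order, JSON) work just as well.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    val: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def serialize(root: Optional[Node]) -> str:
    """Pre-order traversal; '#' marks an absent child."""
    tokens = []

    def visit(node):
        if node is None:
            tokens.append("#")
            return
        tokens.append(str(node.val))
        visit(node.left)
        visit(node.right)

    visit(root)
    return ",".join(tokens)


def deserialize(data: str) -> Optional[Node]:
    """Rebuild the tree by consuming tokens in the same pre-order."""
    tokens = iter(data.split(","))

    def build():
        token = next(tokens)
        if token == "#":
            return None
        node = Node(int(token))
        node.left = build()
        node.right = build()
        return node

    return build()
```

Both functions recurse once per level, so a very deep, skewed tree could hit Python's recursion limit; an explicit stack avoids that.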
What is a ROC curve?
1. SQL:

**d_customers**
+-------------+-----------------------+---------------------+
| customer_id | membership_start_date | membership_end_date |
+-------------+-----------------------+---------------------+
| 114         | 2015-01-01            | 2015-02-15          |
| 116         | 2015-02-01            | 2015-03-15          |
| 120         | 2015-02-15            | 2015-04-01          |
| 221         | 2015-03-15            | 2015-10-01          |
| 120         | 2015-05-15            | 2015-07-01          |
+-------------+-----------------------+---------------------+

**d_shipments**
+-------------+------------+-----------------------+----------+
| shipment_id | ship_date  | receiving_customer_id | quantity |
+-------------+------------+-----------------------+----------+
| 1           | 2015-02-13 | 114                   | 2        |
| 2           | 2015-03-01 | 116                   | 4        |
| 2           | 2015-03-01 | 116                   | 1        |
| 3           | 2015-06-01 | 116                   | 1        |
| 4           | 2015-03-01 | 120                   | 6        |
| 5           | 2015-10-01 | 120                   | 3        |
| 6           | 2015-03-01 | 321                   | 10       |
+-------------+------------+-----------------------+----------+

Populate **a_shipments**
+------------+-------------+-----------+----------+
| ship_date  | customer_id | is_member | quantity |
+------------+-------------+-----------+----------+

The column [is_member]: if [ship_date] is between [membership_start_date] and [membership_end_date] then 'Y', else 'N'.

Sample of output:
| 2015-03-01 | 116 | Y | 5 |
| 2015-06-01 | 116 | N | 1 |

2. Coding task: Check whether a string is a palindrome. I was asked to code a solution using both an iterative and a recursive approach.

3. Big Data questions:
3.1. Which file formats in Hadoop do I know? What is the difference between the Avro and Parquet formats?
3.2. How is compression used in the Avro and Parquet formats?
3.3. What are the most difficult big data performance challenges you have faced and resolved?
3.4. Spark optimization. Spark cost-based optimizer.
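For the SQL task in item 1 above, one possible query can be tried against an in-memory SQLite database. This is a sketch under assumptions: the sample output (quantity 5 for customer 116 on 2015-03-01) suggests that quantities are summed per ship date and customer, the membership interval is treated as inclusive on both ends, dates are stored as ISO-8601 text so that string comparison matches date order, and the function name `build_a_shipments` is hypothetical. The dialect the interviewer expected is not stated.

```python
import sqlite3


def build_a_shipments():
    """Load the sample tables, populate a_shipments, and return its rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE d_customers (
            customer_id INTEGER, membership_start_date TEXT, membership_end_date TEXT);
        CREATE TABLE d_shipments (
            shipment_id INTEGER, ship_date TEXT, receiving_customer_id INTEGER, quantity INTEGER);
        CREATE TABLE a_shipments (
            ship_date TEXT, customer_id INTEGER, is_member TEXT, quantity INTEGER);
    """)
    conn.executemany("INSERT INTO d_customers VALUES (?, ?, ?)", [
        (114, "2015-01-01", "2015-02-15"),
        (116, "2015-02-01", "2015-03-15"),
        (120, "2015-02-15", "2015-04-01"),
        (221, "2015-03-15", "2015-10-01"),
        (120, "2015-05-15", "2015-07-01"),
    ])
    conn.executemany("INSERT INTO d_shipments VALUES (?, ?, ?, ?)", [
        (1, "2015-02-13", 114, 2),
        (2, "2015-03-01", 116, 4),
        (2, "2015-03-01", 116, 1),
        (3, "2015-06-01", 116, 1),
        (4, "2015-03-01", 120, 6),
        (5, "2015-10-01", 120, 3),
        (6, "2015-03-01", 321, 10),
    ])
    # EXISTS avoids duplicating shipments when a customer has several
    # membership periods (customer 120 has two).
    conn.execute("""
        INSERT INTO a_shipments (ship_date, customer_id, is_member, quantity)
        SELECT s.ship_date,
               s.receiving_customer_id,
               CASE WHEN EXISTS (
                        SELECT 1 FROM d_customers c
                        WHERE c.customer_id = s.receiving_customer_id
                          AND s.ship_date BETWEEN c.membership_start_date
                                              AND c.membership_end_date)
                    THEN 'Y' ELSE 'N' END,
               SUM(s.quantity)
        FROM d_shipments s
        GROUP BY s.ship_date, s.receiving_customer_id
    """)
    rows = conn.execute(
        "SELECT ship_date, customer_id, is_member, quantity "
        "FROM a_shipments ORDER BY ship_date, customer_id").fetchall()
    conn.close()
    return rows
```

A plain LEFT JOIN on the membership range would also work, but it needs care with customers who have more than one membership period, which is why the sketch uses `EXISTS`.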