Join Operations in R | Fresco Play

Author: neptune | 29th-Oct-2023

In this blog, we will explore how to perform various join operations in R using two data sets, 'flights' and 'weather'. The 'flights' data set contains information about flights that departed from New York City in 2013, while the 'weather' data set provides weather data for each NYC airport for each hour. 


We will address the following tasks:

1. Return all rows from 'flights' and all columns from 'flights' and 'weather'. Output by the columns year and month. Return only the first 10 rows of the output and save it in the variable quest_1.


2. Return all rows from 'weather' and all columns from 'weather' and 'flights'. Output by the columns year, month, day, and hour. Return only the first 10 rows of the output and save it in the variable quest_2.


3. Return only the rows in which the flights have matching keys in the 'weather' data set. Output by the columns year, month, day, and hour. Return only the first 10 rows of the output and save it in the variable quest_3.


4. Combine two data sets, keeping rows and columns that appear in both 'flights' and 'weather'. Output by the columns year, month, and day. Return only the first 10 rows of the output and save it in the variable quest_4.


5. Return only columns from 'flights'. Output by the columns year, month, day, and hour. Return only the first 10 rows of the output and save it in the variable quest_5. (Include the necessary libraries and read the data from the data sets.)


Once you are in the Web IDE:


Open the prog.R file in JupyterLab and start coding by following the instructions in the notebook.

Once you are done with the solution, run the following commands in the terminal to check it.
Click File -> New -> Terminal and run the commands shown below:


    >>> Rscript prog.R

    >>> bash .score.sh



After running the test cases, click the Submit button and then click Submit Test to end the assessment.


Notes: 

(1) The results of preliminary validation don't impact final scoring. In-depth scoring is done at a later stage.


(2) The Rough_Work.ipynb notebook can be used for rough work if necessary.


(3) If the terminal asks for a password, press "ENTER". After 3 incorrect attempts, the commands given above will run successfully.



Let's dive into the solutions for each of these tasks.

Solution in R:

Task 0 - Import the dplyr library and read the CSV files for the 'flights' and 'weather' data


    # Load necessary libraries (if not already loaded)
    library(dplyr)

    # Read 'flights' and 'weather' data sets
    flights <- read.csv("flights.csv")
    weather <- read.csv("weather.csv")
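Before joining, it can help to see which columns the two data frames share, since those are the candidate join keys. The sketch below uses two small made-up data frames (`flights_toy` and `weather_toy`, with illustrative columns `dep_delay` and `temp`) rather than the real CSV files, so the names and values are assumptions for demonstration only.

```r
# Hypothetical miniature stand-ins for flights.csv and weather.csv
flights_toy <- data.frame(
  year = 2013, month = 1, day = 1, hour = c(5, 6),
  dep_delay = c(2, -3)
)
weather_toy <- data.frame(
  year = 2013, month = 1, day = 1, hour = c(5, 7),
  temp = c(39, 41)
)

# Columns present in both data frames are the candidate join keys
shared_cols <- intersect(names(flights_toy), names(weather_toy))
print(shared_cols)

# str() gives a compact view of column names and types
str(flights_toy)
```

Running the same `intersect(names(flights), names(weather))` call on the real data frames should list the columns available for the `by` argument.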



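Task 1 - Returning All Rows from 'flights' with Columns from Both Data Sets


    # Task 1: Return all rows from 'flights' and all columns from 'flights' and 'weather'.
    # Output by the columns year and month (read here as the join keys).
    # Return only the first 10 rows of the output.
    # left_join() keeps every row of 'flights', whether or not 'weather' has a match.
    # Because 'weather' is hourly, each flight matches many weather rows on these keys,
    # so the full join can be large before head() trims it.
    quest_1 <- flights %>%
      left_join(weather, by = c("year", "month")) %>%
      head(10)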
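To see what "all rows from 'flights' and all columns from both" looks like in practice, here is a minimal sketch of a left join on made-up data. The data frames `flights_toy` and `weather_toy` and their columns `flight` and `temp` are hypothetical; only the `left_join()` call itself is standard dplyr.

```r
library(dplyr)

# Hypothetical toy data: three flights, weather for two months
flights_toy <- data.frame(
  year = 2013, month = c(1, 1, 2),
  flight = c(101, 102, 103)
)
weather_toy <- data.frame(
  year = 2013, month = c(1, 3),
  temp = c(39, 50)
)

# Every row of flights_toy is kept; unmatched rows get NA in weather columns
left_result <- left_join(flights_toy, weather_toy, by = c("year", "month"))
print(left_result)
```

The February flight has no weather row, so it stays in the result with `temp` set to `NA`, while the March weather row is dropped because no flight matches it.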


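Task 2 - Returning Rows and Columns from 'weather' and 'flights'


    # Task 2: Return all rows from 'weather' and all columns from 'weather' and 'flights'.
    # Output by the columns year, month, day, and hour (read here as the join keys).
    # Return only the first 10 rows of the output.
    # right_join() keeps every row of 'weather', whether or not 'flights' has a match.
    quest_2 <- flights %>%
      right_join(weather, by = c("year", "month", "day", "hour")) %>%
      head(10)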
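The mirror-image case, keeping every 'weather' row, can be sketched the same way. The toy data frames and the `flight` and `temp` columns below are assumptions made for illustration, not the contents of the real files.

```r
library(dplyr)

# Hypothetical toy data: flights at hours 5 and 6, weather at hours 5 and 7
flights_toy <- data.frame(
  year = 2013, month = 1, day = 1, hour = c(5, 6),
  flight = c(101, 102)
)
weather_toy <- data.frame(
  year = 2013, month = 1, day = 1, hour = c(5, 7),
  temp = c(39, 41)
)

keys <- c("year", "month", "day", "hour")

# Every row of weather_toy is kept; hours with no flight get NA in flight columns
right_result <- right_join(flights_toy, weather_toy, by = keys)
print(right_result)
```

Here the 7 o'clock weather reading survives with `flight` set to `NA`, and the 6 o'clock flight disappears because no weather row matches it.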


Task 3 - Returning Matching Rows between 'flights' and 'weather'


    # Task 3: Return only the rows in which the flights have matching keys in the 'weather' data set.
    # Output by the columns year, month, day, and hour. Return only the first 10 rows of the output.
    quest_3 <- flights %>%
      inner_join(weather, by = c("year", "month", "day", "hour")) %>%
      select(year, month, day, hour) %>%
      head(10)
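An inner join silently drops flights that have no weather match, so it can be useful to check which rows were left out. The following is a small sketch on hypothetical toy data that pairs `inner_join()` with dplyr's `anti_join()`, which returns the rows of the first table that have no match in the second.

```r
library(dplyr)

# Hypothetical toy data: only hour 5 appears in both tables
flights_toy <- data.frame(
  year = 2013, month = 1, day = 1, hour = c(5, 6),
  flight = c(101, 102)
)
weather_toy <- data.frame(
  year = 2013, month = 1, day = 1, hour = c(5, 7),
  temp = c(39, 41)
)

keys <- c("year", "month", "day", "hour")

# Rows with a key match in both tables
matched <- inner_join(flights_toy, weather_toy, by = keys)

# Flights that the inner join dropped because weather has no matching key
dropped <- anti_join(flights_toy, weather_toy, by = keys)

print(matched)
print(dropped)
```

In this toy case the matched and dropped rows together account for every flight, because each key appears at most once in `weather_toy`.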



Task 4 - Combining Rows and Columns in 'flights' and 'weather'


    # Task 4: Combine two data sets, keeping rows and columns that appear in both 'flights' and 'weather'.
    # Output by the columns year, month, and day. Return only the first 10 rows of the output.
    quest_4 <- flights %>%
      inner_join(weather, by = c("year", "month", "day")) %>%
      select(year, month, day) %>%
      head(10)
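Joining on year, month, and day only has two side effects worth knowing about: hourly weather rows multiply the matching flights, and a column such as hour that exists in both tables but is not a key comes back twice with suffixes. The sketch below shows both on made-up data; the toy data frames and the `flight` and `temp` columns are assumptions.

```r
library(dplyr)

# Hypothetical toy data: one flight per day, three hourly weather rows on day 1
flights_toy <- data.frame(
  year = 2013, month = 1, day = c(1, 2), hour = c(5, 9),
  flight = c(101, 102)
)
weather_toy <- data.frame(
  year = 2013, month = 1, day = 1, hour = c(4, 5, 6),
  temp = c(38, 39, 40)
)

# 'hour' is not a key here, so dplyr keeps both copies as hour.x and hour.y
by_day <- inner_join(flights_toy, weather_toy, by = c("year", "month", "day"))
print(by_day)
```

The single flight on day 1 appears three times, once per weather reading, and the flight on day 2 is dropped. Selecting only the key columns afterwards, as the solution above does, hides the duplicated `hour` columns but not the repeated rows.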



Task 5 - Returning Columns from 'flights'


    # Task 5: Return only columns from 'flights'.
    # Output by the columns year, month, day, and hour. Return only the first 10 rows of the output.
    quest_5 <- flights %>%
      select(year, month, day, hour) %>%
      head(10)
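The solution above treats Task 5 as a plain column selection. Another possible reading, offered here only as an assumption about the task's intent, is a filtering join: dplyr's `semi_join()` keeps the rows of the first table that have a match in the second and returns only the first table's columns. A minimal sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data: only hour 5 appears in both tables
flights_toy <- data.frame(
  year = 2013, month = 1, day = 1, hour = c(5, 6),
  flight = c(101, 102)
)
weather_toy <- data.frame(
  year = 2013, month = 1, day = 1, hour = c(5, 7),
  temp = c(39, 41)
)

keys <- c("year", "month", "day", "hour")

# Keeps matching flights, but adds no weather columns
semi_result <- semi_join(flights_toy, weather_toy, by = keys)
print(semi_result)
```

Unlike `inner_join()`, a semi join never duplicates rows of the first table, even when several weather rows share the same key.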



Once you've completed these tasks in your R script, run the script to produce the outputs, which are saved in the variables quest_1 to quest_5.


Don't forget to run the validation script as mentioned in the problem statement to check your solutions. This will ensure that your code meets the requirements of the tasks.
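Before running the validation script, a quick local sanity check can catch obvious slips such as forgetting `head(10)`. The helper below, `check_quest()`, is a hypothetical function written for this post; it is not part of the assessment's scoring script and only checks that a result is a data frame with at most 10 rows.

```r
# Hypothetical helper, not part of the assessment's own validation
check_quest <- function(x, max_rows = 10) {
  is.data.frame(x) && nrow(x) <= max_rows
}

# Demonstration on R's built-in mtcars data set (32 rows)
ok_result  <- check_quest(head(mtcars, 10))  # trimmed to 10 rows
bad_result <- check_quest(mtcars)            # too many rows

print(ok_result)
print(bad_result)
```

In prog.R the same call could be applied to each of quest_1 to quest_5 after they are created.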


Now you are well-equipped to perform join operations in R with confidence. Happy coding!




