OIM 454 Individual Assignment 4

Please download the 2022 Austin Music Census.csv dataset from Canvas. This file contains response data from a survey sent to Austin, Texas music industry professionals in 2022. Below are the column definitions for the ‘2022 Austin Music Census’ sheet:

id: unique identifier for the survey response
res_distance: residence distance from downtown (in miles)
austin_resident: TRUE if the respondent is an Austin resident, FALSE if not
restored_pp_workload: TRUE if the respondent's workload has returned to pre-pandemic levels, FALSE if not
formal_music_education: TRUE if the respondent has formal music education, FALSE if not
music_sector: primary music sector the respondent works in
music_business_structure: music business type the respondent primarily works in
supplemental_income: number of supplemental forms of income outside of music that the respondent has
experience: years of experience in the music industry
intent_to_continue: intent to continue a career in music over the next 3 years (ranging from Definitely No to Definitely Yes)

Open RStudio and create an R Markdown file. Save the .rmd file in a location that you can easily find again.

Your job is to transform the variables in the .csv file into a format suitable for machine learning. First, you will perform data transformations in Excel and record this process in the R Markdown file. Complete the following items in the R Markdown file.

Item 1: How many rows and columns are initially present in the dataset? Record this in the R Markdown file under the heading “Initial Data”.

Remove all duplicates from the file using Excel.

Item 2: How many rows are in the dataset now? Record this in the R Markdown file under the heading “Removing Duplicates”.

Using Excel, transform the austin_resident, restored_pp_workload, and formal_music_education variables into dummy variables.

Item 3: Describe these variable transformations in the R Markdown file under the heading “TRUE/FALSE Data Transformations”.

Perform manual category reduction on the music_business_structure column so that it has only three categories: Freelance, Non-Registered Entity, and Registered Entity.

Item 4: Describe this variable transformation in the R Markdown file under the heading “Manual Category Reduction”.

Save the .csv file in a known file location.
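Items 1 through 4 are done in Excel, but it can help to see the same logic written in R as a way to sanity-check your results. The following is only a sketch on an invented five-row data frame called toy. The values and the original business-structure labels ("Freelancer", "LLC", "Unregistered band") are hypothetical and are not taken from the real census file, so the real grouping has to be decided from the labels you actually see in the data.

```r
# Toy stand-in for the survey file; every value below is invented.
toy <- data.frame(
  id = c(1, 2, 2, 3, 4),
  austin_resident = c(TRUE, FALSE, FALSE, TRUE, TRUE),
  music_business_structure = c("Freelancer", "LLC", "LLC",
                               "Unregistered band", "Freelancer"),
  stringsAsFactors = FALSE
)

# Item 1 analogue: number of rows and columns before any cleaning.
initial_dims <- dim(toy)

# Item 2 analogue: drop rows that are exact copies of an earlier row.
deduped <- toy[!duplicated(toy), ]

# Item 3 analogue: recode a TRUE/FALSE column as a 0/1 dummy variable.
deduped$austin_resident <- as.integer(deduped$austin_resident)

# Item 4 analogue: collapse many labels into three categories.
# The labels on the left are hypothetical, not from the real dataset.
lookup <- c(
  "Freelancer"        = "Freelance",
  "Unregistered band" = "Non-Registered Entity",
  "LLC"               = "Registered Entity"
)
deduped$music_business_structure <-
  unname(lookup[deduped$music_business_structure])

print(initial_dims)
print(deduped)
```

A named lookup vector keeps the mapping in one visible place, which mirrors how you might describe the manual category reduction in your write-up.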
Now, you will perform operations on this data using R. These operations must be performed in the order specified in these instructions. Each of the following items corresponds to a separate code chunk that you must create in the R Markdown file. Run each code chunk after you write it to ensure that there are no errors. Be careful about typos!

If you have not installed the caret and dplyr libraries yet, do so according to the instructions given in class before beginning this portion of the assignment.

Item 5: Load the caret and dplyr libraries. Put this into a code chunk.

Item 6: Import your transformed 2022 Austin Music Census.csv file into an R data frame called “musiccensus”. Put this into a code chunk. HINT: You will need the full file path of the .csv file to do this. Consult the slides and/or the class recordings if you are unsure how to do this.

Item 7: Generate dummy variables from the music_sector and music_business_structure categorical variables in the musiccensus data frame. Use the same method shown in class.

Item 8 (**EXTRA CREDIT**): Generate a category score variable called Intent_Strength from the Intent to Continue ordinal variable in the musiccensus data frame. The category scores should be defined as follows:

If Intent to Continue is ‘Definitely No’, Intent_Strength is 0
If Intent to Continue is ‘Maybe No’, Intent_Strength is 1
If Intent to Continue is ‘Maybe Yes’, Intent_Strength is 2
If Intent to Continue is ‘Definitely Yes’, Intent_Strength is 3

Item 9 (**EXTRA CREDIT**): Generate all of the correlations in the musiccensus data frame. Ensure that the correlations are displayed.

Item 10: Export the musiccensus data frame to a .csv file. Save this .csv file as FirstName_LastName_Assignment4.csv.

Knit and export your R Markdown file as an HTML file. Upload the .csv and HTML files to Canvas to complete the assignment.
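As a rough illustration of Items 5, 7, and 10, here is a minimal sketch that assumes the method shown in class is caret's dummyVars(); if class used a different approach, follow that instead. The data frame is an invented four-row stand-in for the imported file, the sector labels are hypothetical, and the output goes to a temporary folder rather than to your own file location. Dropping the original categorical columns after encoding is a choice made here for illustration, not something the instructions require.

```r
library(caret)   # provides dummyVars()
library(dplyr)   # provides select(), bind_cols(), and the %>% pipe

# In the assignment this would come from read.csv() with your own full
# file path; an invented data frame stands in for it here.
musiccensus <- data.frame(
  id = 1:4,
  music_sector = factor(c("Live", "Recording", "Live", "Education")),
  music_business_structure = factor(c("Freelance", "Registered Entity",
                                      "Non-Registered Entity", "Freelance"))
)

# Build the encoder from the two categorical columns, then apply it.
encoder <- dummyVars(~ music_sector + music_business_structure,
                     data = musiccensus)
dummies <- as.data.frame(predict(encoder, newdata = musiccensus))

# Swap the original categorical columns for their dummy columns.
musiccensus <- musiccensus %>%
  select(-music_sector, -music_business_structure) %>%
  bind_cols(dummies)

# Item 10 analogue: export the data frame (to a temp folder in this sketch).
out_path <- file.path(tempdir(), "FirstName_LastName_Assignment4.csv")
write.csv(musiccensus, out_path, row.names = FALSE)

print(musiccensus)
```

Each row ends up with exactly one 1 among the music_sector dummies and one 1 among the music_business_structure dummies, which is a quick check that the encoding worked.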
If you are unable to knit your .rmd file due to errors, make sure that you go back and test your code chunks individually. If you are ultimately unable to figure out how to solve these errors, save the .rmd file and upload that instead of the HTML file for partial credit. If you are unable to find your exported HTML file, consult Canvas for instructions.
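For the extra-credit items (8 and 9), the mapping from ordinal labels to scores and the correlation step could be sketched as follows. This is one possible approach using dplyr's case_when() and base R's cor(), not necessarily the one taught in class. The data frame is an invented five-row example, and it assumes the ordinal column is named intent_to_continue as in the column definitions.

```r
library(dplyr)   # provides mutate(), case_when(), and the %>% pipe

# Invented rows standing in for the imported survey data.
musiccensus <- data.frame(
  experience = c(2, 10, 5, 20, 7),
  supplemental_income = c(1, 0, 2, 0, 1),
  intent_to_continue = c("Definitely No", "Maybe Yes", "Maybe No",
                         "Definitely Yes", "Maybe Yes"),
  stringsAsFactors = FALSE
)

# Item 8 analogue: map each ordinal label to its category score.
# Any label not listed here would become NA.
musiccensus <- musiccensus %>%
  mutate(Intent_Strength = case_when(
    intent_to_continue == "Definitely No"  ~ 0,
    intent_to_continue == "Maybe No"       ~ 1,
    intent_to_continue == "Maybe Yes"      ~ 2,
    intent_to_continue == "Definitely Yes" ~ 3
  ))

# Item 9 analogue: cor() needs numeric input, so keep numeric columns only.
numeric_cols <- musiccensus[sapply(musiccensus, is.numeric)]
correlations <- cor(numeric_cols)

# Printing the matrix is what makes it appear in the knitted output.
print(correlations)
```

If cor() reports that its input must be numeric, a text column is probably still in the data frame; filtering to numeric columns first, as above, avoids that error.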