Learning Outcomes of this Assessment
A1 - Critically assess diverse issues regarding the use of data mining and machine learning in real-world contexts, including ethics;
B1 - Design, build and use business intelligence systems, justifying decisions made;
B2 - Design and create reports to present analytical and interpretative information in creative and effective ways;
B3 - Devise strategies for making effective use of analytical software such as SAS Enterprise Miner;
B4 - Learn about different families of algorithms, such as classification and clustering algorithms.
Key Skills to be Assessed
Diverse issues regarding the application of data mining techniques to real-world datasets
Discover patterns within a dataset using exploratory data analysis
Use of SAS for data mining
Discover techniques to leverage R’s features and to work with R packages
Reporting and presentation of analytical and interpretative information
The Assessment Task
Overview
This coursework gives you the opportunity to use the techniques covered in this module to organise and analyse a collection of data that interests you, to draw conclusions based on your analysis, and finally to present your results in the form of a report.
Explanation
You must apply Classification, Association Rules Mining, Clustering and Text Mining as your approaches. This means that you will need to choose datasets that are amenable to each of these types of data mining. For classification, for example, the dataset must support building a model that will determine, predict, or estimate one of its attributes based on the values of the other attributes.
The overall requirements are summarised below:
Task 1: Apply classification to a selected dataset using R and SAS Enterprise Miner (e.g., Decision Tree, K-Nearest Neighbours, Logistic Regression). (25 Marks)
Task 2: Apply Association Rules Mining to a selected dataset using R and SAS Enterprise Miner. (25 Marks)
Task 3: Apply K-Means Clustering to a selected dataset using R and SAS Enterprise Miner. (20 Marks)
Task 4: Apply Sentiment Analysis to 20 selected hotels in the hotel_reviews.csv dataset using R and SAS EM (the dataset can be downloaded from Blackboard). (20 Marks)
Extra feature to be implemented (10 Marks)
Students can gain a further 10 marks by building a Shiny dashboard with the R shinydashboard package for any one of the main tasks above. (Guide: https://rstudio.github.io/shinydashboard/index.html )
Key elements of the report
Title
The title should provide an overview of the focus of your problem and the expected solution.
Introduction
This section contains a brief background to the topic and leads to the formulation of the specific question, based on your selected topic. The research question must be focused and clear.
Datasets
You are welcome to choose any datasets that interest you and that have enough data to enable meaningful analysis. In making your choice, consider what problem or problems you would be able to solve by employing data mining on the dataset. In other words, ask yourself: How could I use data mining to answer one or more questions about the datasets?
Explanation and preparation of datasets
Briefly describe the datasets you have used, including the independent and dependent variables. Explain any preparation tasks (e.g., normalisation, dealing with missing values) carried out on the datasets.
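As a rough illustration of the two preparation steps named above, the sketch below fills missing values with the column mean and then applies min-max normalisation to rescale the column to the range [0, 1]. It is written in Python only to show the arithmetic. The coursework itself expects these steps in R and SAS EM, where, for example, R's mean(x, na.rm = TRUE) gives the fill value. The function names and the sample column are hypothetical.

```python
from statistics import mean
from typing import List, Optional


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_normalise(values: List[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map every value to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical numeric attribute with two missing values.
ages = [25.0, None, 40.0, 55.0, None]
cleaned = impute_mean(ages)          # [25.0, 40.0, 40.0, 55.0, 40.0]
scaled = min_max_normalise(cleaned)  # [0.0, 0.5, 0.5, 1.0, 0.5]
print(scaled)
```

Imputing before normalising matters here: the minimum and maximum are taken over the completed column. Mean imputation is only one option. Whichever method is used, it should be justified in the report.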
Data mining using SAS Enterprise Miner
Your explanation must include all of the following:
• The application of data-mining techniques to your chosen datasets using SAS Enterprise Miner.
• Train and test your model.
• Use visualisation tools available in SAS Enterprise Miner.
Implementation in R
Implement your proposed approach using package(s) available in R. This section will include:
• A brief description of the R package(s) used.
• The application of data-mining techniques to your chosen datasets using R.
• Explanation of the experimental procedure, including the setting and optimisation of model parameters during training.
• Visualisation of the results.
Results analysis and discussion
• Explain and justify the performance metric you choose to use to evaluate the model(s).
• A clear and compelling presentation of the results that you obtain, both from the data mining and any other analysis that you may perform.
• Compare and discuss the results obtained from R implementation with the results obtained in SAS Enterprise Miner.
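When justifying a performance metric for a classifier, it helps to see that the common metrics all derive from the same confusion matrix. The sketch below computes accuracy, precision, recall and F1 for a binary problem from true and predicted labels. It is a language-neutral illustration in Python. In R or SAS EM these figures would come from the tool's own assessment output, and the labels shown are made up.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary classifier."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix cells with respect to the chosen positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative labels only (1 = positive class).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
```

On imbalanced data, accuracy alone can look high even when the minority class is poorly predicted. This is one reason to report precision and recall (or F1) alongside it and to explain the choice.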
Conclusions
The key points from the assignment must be synthesised within the conclusion. This must relate back to the introduction and the research question and provide an overall evaluation of the validity of the solution you have proposed.
References
List all publications referenced in the report, showing evidence of sufficient reading related to your work. References must follow the Harvard formatting system as in this guide: http://www.salford.ac.uk/library/help/user-guides/general/Bibliographic-Citations-APA-QuickRef-Apr2015.pdf
Appendices
Appendices may be used to provide relevant supporting evidence for reference but should only be used if necessary. Students may wish to include in the appendices evidence that confirms the originality of their work or illustrates points of principle set out in the main text.
Equipment and Facilities to be Used
The university laboratory computers are installed with SAS and R.
Workload
This assessment should require approximately 120 hours of effort.
Marking scheme
The work will be assessed using a marking grid comprising weighted components (provided below). This is indicative of the standard of work required at different levels within the assignment.
Assessment criteria
Submission Details
A PDF or Word file of your report should be submitted to the module’s website on Blackboard. R scripts should be included as an appendix to the report.
Feedback
Feedback, in the form of a personalised annotated marking grid and your annotated report, will be available within 3 working weeks of the submission date. Given the large cohort size for this module, there is a slight possibility that marking and the provision of high-quality feedback may take longer. In this case, the tutor will notify the group as soon as this becomes apparent and provide regular updates on progress. Further feedback is available from the module tutor (by appointment) and will be delivered through a discussion of the submitted work.
The University has strict policies on unfair means. It is your responsibility to ensure that you both understand these policies and adhere to them in the production of your assignment. Any submitted work in which unfair means are identified will be penalised in accordance with the University of Salford regulations (http://www.governance.salford.ac.uk/page/academic_handbook).
Sample report format for each task
Abstract
Introduction
Brief background of the task
Formulation of the research question
Justification: Why did I choose this topic/dataset?
Aim and Objective of the task
Brief Literature Review
Explanation and preparation of datasets
Description of the dataset
Identification of independent and dependent variables (if any)
Data Pre-processing steps
Assumptions (if any)
Task: Classification / Association Rules / Clustering / Text Mining
Data Exploration and Attribute Visualisation in R
Model Building in R
Model Assessment in R
Results visualisation in R
Data Exploration and Attribute Visualisation in SAS EM
Model Building in SAS EM
Model Assessment in SAS EM
Results visualisation in SAS EM
Results analysis and discussion
Result comparison between R and SAS EM
Critical findings
Conclusion
References
Appendix
Sample datasets
Classification
Association Rules
Clustering