Monday 14 June 2021

Project Dissertation marking form. School of Computing

UK assignment helper

General Comments

 

These comments will not normally be notified to the student, but may be revealed as a result of disciplinary or other proceedings. They may contain anything that needs to be brought to the attention of the examiners (please give reasons).

Core marking criteria for ALL projects

 

Notes: These should be applicable to all MSc Project Dissertations, either engineering or study/research.

If you assign a fail to any of the four shaded categories, you must cap the overall mark at 39% or justify why the project should pass in the ‘general comments’.

 

Additional marking criteria for STUDY/RESEARCH projects PJS60P




Core marking criteria for ALL projects


1 Indicates a superficial, cursory or casual approach, an unintelligible, meaningless or indecipherable report or little or no evidence of any work being done.

2 Merit indicates material that is between 60-69%

3 Distinction indicates material that is between 70-79%

4 Indicates material that is probably publishable in a magazine or minor journal/conference (80% or more). Material publishable in a major journal would be equivalent to 90% or more.

Sunday 13 June 2021

Project style guide


 

 

1                    Introduction




 

1.1              This style guide is for use by undergraduate and MSc project students in the School of Computing. It gives information on:

·         The expected structure of the project report

·         Language and referencing

·         Style and layout

·         Advice on printing and binding the project

 

2                    Structure of project report

 

2.1              Some contents are required in the project report; these are shown in normal text. Other contents are indicative only and are shown in italics. For example, in an engineering project that follows a prototyping / Agile software development life cycle (SDLC), you may have elements of each stage in several chapters.

 

2.2              For the written contents of these sections, please see project lectures / Moodle / consult your supervisor.

 

2.3              Elements on the marking form should be present in the report, but you don’t have to have specific chapters for each one. Literature may be discussed in other chapters as well as the literature review chapter.

 

2.4              Contents of the project report are:

 

§  Front page, produced from templates available in the Moodle unit: CAM Student Centre (Project Cover Templates)

 

·         Abstract (150 - 300 words), summarising the question and the findings / problem and resulting artefact function (not the processes by which this was done)

 

·         Word count, on same page as abstract

 

§  Acknowledgements (if included)

 

§  Table of contents, auto-generated (see your word-processing package for instructions). Includes: main sections, list of appendices, list of tables and list of figures (in that order). For examples, visit the eDissertations on the library website (www.port.ac.uk/library, Information and Resources)

 

·         Main report

 

o    Introduction

o    Literature Review

o    Project management


o    (Study project:)

§  Methodology / research design

§  Data collection

§  Data analysis

§  Discussion

 

o    (Engineering project:)

§  Methodologies / project life cycles

§  Requirements and analysis

§  Design

§  Implementation

§  Test

§  Evaluation

o    Conclusion

 

 

§  References, a list of the sources cited in the main report, in APA6 format, organised alphabetically by author surname. A bibliography (listing the materials read, but not necessarily cited) is not usually included.

 

§  Appendices

o    Appendix A: Project Initiation Document (PID) and Gantt chart(s)

o    Appendix B: Ethics certificate, generated from ethicsreview.port.ac.uk and signed by your supervisor

o    Further appendices as applicable. These may include consent forms, questionnaire proformas and responses, test plans and results, design documentation, interface layouts etc.

 

NB Do not put code listings in an appendix – these belong on the accompanying CD/USB stick.

 

3                    Language and referencing

3.1              Writing style should be academic, i.e. formal, third person passive (e.g. “a test plan was developed”).

3.2              Pay attention to grammar and spelling (UK English). Avoid jargon. If using acronyms, give the full term on first use (e.g. “Chief Technology Officer (CTO)”). Commonly used acronyms (e.g. USB) don’t need this.

3.3              Good academic practice should be followed in referencing. References and in-text citations should conform to APA6 format (see referencing.port.ac.uk for guidance on formats for different types of material).

3.4              When to cite: If something is common knowledge or if you are expressing your own ideas, there is no need to provide an in-text citation. Anything else is assumed to come from another source, and citation is needed.

3.5              Quotations should be used sparingly, to support arguments rather than to make points. Paraphrasing is preferred.


4                    Style and layout

4.1              Your report can be created in any word processing package, but choose one to which your supervisor has access, so that s/he can read and comment on drafts if necessary. The final version of the report should be saved as a PDF for printing and submission to Turnitin.

4.2              Format the report for printing double-sided. Each new section (including chapters and appendices) should start on a right-hand page. Insert “intentionally blank” pages at the end of sections as appropriate to make sure this happens.

4.3              A3 pages and colour can be included, but be aware of extra cost when printing. Also you will have to number A3 pages individually, and make sure that they are inserted in the right place when binding the project.

4.4              Page furniture: pagination should be included at the foot of each page. Student number / name should not be included.

4.5              The detailed format for page layout and text size is below.

§  Page margins, 2cm left and right (to allow for binding).

§  Line spacing 1 or 1.5 lines.

§  Text in 11 or 12 pt font (text in diagrams no smaller than 9 pt font).

§  Font style - readability is key.

4.6              Chapters should be sub-sectioned using headings and subheadings, down to three levels (e.g. 2.1.1). Use Heading styles so that you can use the Table of Contents tool (Word or PDF equivalent) to generate a Table of Contents automatically.

4.7              Tables and figures (diagrams) should be numbered (chapter number, then table or figure number within the chapter, e.g. Table 2.1, Figure 3.1). Citations should be included if the table or figure is taken from the literature. Tables and figures should be included as close to the relevant text as possible. Where there are large numbers of tables/figures, consider putting them in an appendix. Remember to refer to the appendix from the main body of the text. Use the Table of Contents tool in your word-processing package to make sure lists of tables and figures appear in your Table of Contents.

4.8              Front cover templates are on CAM Student Area Moodle unit (Project Cover Templates). Complete with your details and include as the first page of your report (with “intentionally blank” page as next page for printing purposes).

4.9              The referencing style is APA6, for in-text citations and the References section. See referencing.port.ac.uk for guidance and examples. References should be to academic literature, cited in the report.

 

5                    Printing and binding on-campus projects

 

5.1              Two paper copies of the project report are required, and an electronic hand-in via the appropriate Moodle unit. For engineering projects, two copies of the artefact software are also required. These can be DVDs, attached to the inside back cover of the paper copies. If you wish to keep a copy for yourself, print an extra copy.


5.2              Remember to budget for having your work printed and bound.

 

5.3              If you use Anglesea Printing Services, they will print and bind your project (48 hours’ notice needed at busy times). Save it as a PDF (to preserve margin settings). http://www.port.ac.uk/departments/services/printingservices/ has details about submitting your work, and about the cost.

 

5.4              Other options include printing your projects yourself and having them bound, or using the commercial copy services in the city. If you do this, collect the red card front cover blanks from the CAM office.

 

5.5              The 4pm deadline on the day of hand-in is a firm one – don’t be late at the counter.

 

 

6                    Printing and binding distance learning projects

 

6.1              Two paper copies and an electronic copy are required. When supplying the electronic copy (via the appropriate Moodle unit), include links to appendices. Electronic copies of the artefact are also required for engineering projects.

 

6.2              It is your responsibility to make sure that the paper copies reach the CAM office by the deadline. You can use a courier service – the address is:

 

The CAM Office, University of Portsmouth

Lion Gate Building

Lion Terrace

Portsmouth PO1 3HF

Saturday 12 June 2021

Bayesian Learning and Graphical Models Assignment


 

·        For this assignment, you need to submit the following TWO files.

 

1.      A written document (a single PDF only) covering all of the items described in the questions. All answers to the questions must be written in this document, i.e. not in the other files (code files) that you will be submitting. All the relevant results (outputs, figures) obtained by executing your R code must be included in this document.

For questions that involve mathematical formulas, you may write the answers by hand, scan them to PDF and combine them with your answer document. Submit a single combined PDF of your answer document.

 

2.      A separate “.R” file or “.txt” file containing the code (R script) that you implemented to produce the results. Name the file “name-StudentID-Ass2-Code.R” (where “name” is replaced with your name, either surname or first name, and “StudentID” with your student ID).

 

·        All the documents and files should be submitted (uploaded) via the SIT 743 CloudDeakin Assignment Dropbox by the due date and time.

·        Zip files are NOT accepted. Both files should be uploaded separately to CloudDeakin.

·        E-mail or manual submissions are NOT allowed. Photos of the document are NOT allowed.

=================================================================

Assignment tasks

 

Q1) [40 Marks]

 

Melbourne City council conducted a survey to study the relationship between the type of dwelling and the income profile of the people living in Melbourne city. A list of factors that influence the type of dwelling, along with their possible values, and a Bayesian network that represents the relationship between these factors (variables) are given below.

E (Education) {Graduate, Non-graduate}

G (Gender) {Male, Female}

A (Age) {35 or less, 35+}

S (Salary) {$50k or less per annum, $50k+ per annum}

J (Job type) {Professional, Laborer, Student, Unemployed}

M (Marriage) {Married, Widowed, Single}

D (Dwelling) {Rent, Own house}


Figure 1


 

1.1)           Write down the joint distribution P(E, G, S, J, A, M, D) for the  above network.

1.2)           Find the minimum number of parameters required to fully specify the distribution according to the above network.

1.3)   

 

a)      Write down an expression for the joint probability distribution, assuming no independence among the variables.

b)      How many parameters are required, at a minimum, if no independence among the variables is assumed?

c)      Compare with the result of the above question (Q1.2) and comment.
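
As an illustration of the counting involved in Q1.2 and Q1.3, the sketch below (in Python, purely for checking arithmetic; the assignment itself expects R where code is requested) counts free parameters from the state counts listed above. The unrestricted joint needs one parameter per joint state minus one, while a Bayesian network needs (k - 1) parameters per configuration of a node's parents. The parent sets in `HYPOTHETICAL_PARENTS` are an assumption made up for this example and are NOT the structure of Figure 1; substitute the real parent sets when checking your own answer.

```python
from math import prod

# Number of states per variable, taken from the variable list in Q1.
CARD = {"E": 2, "G": 2, "A": 2, "S": 2, "J": 4, "M": 3, "D": 2}

def full_joint_params(card):
    """Free parameters of an unrestricted joint table: product of state counts, minus 1."""
    return prod(card.values()) - 1

def network_params(card, parents):
    """Free parameters of a Bayesian network.

    Each node contributes (k - 1) parameters for every joint configuration
    of its parents; a node without parents contributes (k - 1).
    """
    return sum(
        (card[node] - 1) * prod(card[p] for p in parents[node])
        for node in card
    )

# Hypothetical parent sets for illustration only (NOT Figure 1).
HYPOTHETICAL_PARENTS = {
    "E": [], "G": [], "A": [],
    "J": ["E", "G"],
    "S": ["E", "J"],
    "M": ["A", "S"],
    "D": ["S", "M"],
}

print(full_joint_params(CARD))
print(network_params(CARD, HYPOTHETICAL_PARENTS))
```

Comparing the two printed numbers shows how strongly the factorisation reduces the number of parameters, which is the kind of comment Q1.3 c) asks for.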

 

 

1.4)            From a previous study, the Melbourne City Council found that marriage (M) is conditionally independent of job type (J), given salary (S) and age (A). The council wants to modify the Bayesian network given in Figure 1 to incorporate this new information. Assuming this conditional independence, perform the following.

a)      What change will this assumption make to the Bayesian network shown in Figure 1? Draw the new Bayesian network under this assumption (you may draw it by hand).


b)      Compute the change in the minimum number of parameters required for this new Bayesian network, compared to the minimum number of parameters required for the Bayesian network shown in Figure 1. Comment on the results.

 

1.5)           The d-separation method can be used to determine whether two sets of variables in a Bayesian network are independent or conditionally independent. Use the Bayesian network given in Figure 1 to answer the following:

For each of the statements/questions (a) and (b) given below, perform the following:

·        List all the possible paths from the first (set of) node/s to the second (set of) node/s considered for the independence check.

·        State if each of those paths is blocking or non-blocking with reasons.

·        Hence, answer the question about independence.

 

a)      Is dwelling (D) conditionally independent of gender (G) given salary (S) and job type (J)?

b)      Is {E, G} ⊥ A | {D, J} ?
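
A hand-derived d-separation answer can be cross-checked programmatically. The sketch below is an illustration in Python (not the R program requested in Q1.7) and does not enumerate paths as the question asks; it uses the equivalent moralised ancestral graph criterion: keep only the ancestors of the nodes involved, connect co-parents, drop edge directions, delete the conditioning nodes, and test whether the two sets are still connected. The example graphs in the usage lines are generic textbook cases, not Figure 1.

```python
from collections import deque

def ancestors(parents, nodes):
    """Return the given nodes together with all of their ancestors."""
    seen = set(nodes)
    stack = list(nodes)
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(parents, xs, ys, zs):
    """True if every node in xs is d-separated from every node in ys given zs.

    `parents` maps each node to a list of its parents (roots may be omitted).
    """
    xs, ys, zs = set(xs), set(ys), set(zs)
    keep = ancestors(parents, xs | ys | zs)

    # Build the moral graph of the ancestral subgraph (undirected).
    adj = {n: set() for n in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for i, p in enumerate(ps):          # "marry" co-parents
            for q in ps[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)

    # Breadth-first search from xs, never stepping onto a conditioning node.
    frontier = deque(xs - zs)
    seen = set(frontier)
    while frontier:
        node = frontier.popleft()
        if node in ys:
            return False
        for nxt in adj[node]:
            if nxt not in zs and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True

chain = {"B": ["A"], "C": ["B"]}      # A -> B -> C
collider = {"C": ["A", "B"]}          # A -> C <- B
print(d_separated(chain, {"A"}, {"C"}, {"B"}))
print(d_separated(collider, {"A"}, {"B"}, {"C"}))
```

The two usage lines show the standard contrast: conditioning on the middle node of a chain blocks the path, while conditioning on a collider opens it.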

 

1.6)             For the Bayesian network shown in Figure 1, find all the nodes that are conditionally independent of education (E) given marriage (M).

 

 

1.7)           Write an R program to produce the Bayesian network shown in Figure 1, and perform the d-separation tests for the cases given below. Show the plot of the network you obtained and the output (of the d-separation tests) from your program.

a) E ⊥ {A, G} | {S, M}

b) {S, A} ⊥ G | {E, J, D}

 

1.8)           For the Bayesian network shown in Figure 1,

 

a)      find the Markov blanket of job type (J).

b)      find all the nodes that are conditionally independent of job type (J) given  its Markov blanket.

c)      use an R program to find the Markov blanket of salary (S). Plot the Bayesian network and show the Markov blanket nodes in the network in a different colour.
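
For reference, the Markov blanket of a node consists of its parents, its children, and the other parents of its children. A minimal sketch of that definition follows, in Python and for illustration only (part c still requires R). The DAG `EXAMPLE_DAG` is hypothetical and unrelated to Figure 1.

```python
def markov_blanket(parents, node):
    """Parents, children and children's other parents of `node`.

    `parents` maps each node to a list of its parents (roots may be omitted).
    """
    blanket = set(parents.get(node, ()))
    for child, ps in parents.items():
        if node in ps:
            blanket.add(child)      # child of the node
            blanket.update(ps)      # co-parents of that child
    blanket.discard(node)
    return blanket

# Hypothetical DAG: A -> C <- B, C -> D, C -> E <- F
EXAMPLE_DAG = {"C": ["A", "B"], "D": ["C"], "E": ["C", "F"]}

print(sorted(markov_blanket(EXAMPLE_DAG, "C")))
print(sorted(markov_blanket(EXAMPLE_DAG, "A")))
```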


1.9)      For the Bayesian network shown in Figure 1,

 

a)      show the step by step process to perform variable elimination to compute P(J | M = Married, A = 35+, E = Graduate). Use the following variable ordering for the elimination process: G, S, D.

b)      what is the treewidth of the network, given the above elimination ordering?
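
One common reading of "treewidth given an elimination ordering" is the induced width of that ordering: the largest number of neighbours a variable has at the moment it is eliminated, where eliminating a variable connects all of its remaining neighbours. The sketch below (Python, illustration only) computes this on an undirected graph. It assumes you supply the moral graph of the network as an edge list; the small graphs in the usage lines are hypothetical, and conventions for handling evidence variables differ between textbooks.

```python
def induced_width(edges, order):
    """Largest neighbourhood size met while eliminating `order` from an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    width = 0
    for var in order:
        nbrs = adj.pop(var, set())
        width = max(width, len(nbrs))
        for a in nbrs:
            adj[a].discard(var)
            adj[a] |= nbrs - {a}    # fill-in edges among remaining neighbours
    return width

path = [("a", "b"), ("b", "c")]
cycle = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(induced_width(path, ["a", "b", "c"]))
print(induced_width(path, ["b"]))
print(induced_width(cycle, ["a", "b", "c", "d"]))
```

The path example shows that the ordering matters: eliminating an end node first gives a smaller width than eliminating the middle node first.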

 

 

[Marks 2+4+5+5+8+2+4+5+5 = 40]

 

 

 

 

Q2) [20 Marks] Implementing a Bayesian network in R and performing inference

 

A belief network models the relationship between the variables T, W, H, R and S, which represent the temperature, wind speed, humidity, precipitation, and solar radiation respectively. Each variable takes different states as given below.

T (temperature) ∈ {cold, hot}

W (wind speed) ∈ {low, medium, high}

H (humidity) ∈ {low, high}

R (precipitation) ∈ {low, high}

S (solar radiation) ∈ {low, medium, high}

 

 

The belief network that models these variables has (probability) tables as shown below.


 

 

2.1)           Use the libraries below to create this belief network in R, along with the probability values shown in the tables above.

You may use the following libraries for this:

# https://www.bioconductor.org/install/

# BiocManager::install(c("gRain", "RBGL", "gRbase"))

# BiocManager::install(c("Rgraphviz"))

library("Rgraphviz")

library(RBGL)

library(gRbase)

library(gRain)

# Define the appropriate network and use the compileCPT() function to compile the list of conditional probability tables and create the network.

 

 

a)      Show the obtained belief network for this distribution

b)      Show the probability tables obtained from the R output, (and verify with the above table).

 

 

2.2)           Use R program to compute the following probabilities:

 

a)      Given that the temperature is cold, what is the probability that humidity is high?

b)      Find the joint distribution of temperature, humidity and precipitation.

c)      Given that the wind speed is medium and the precipitation is high, what is the probability that the solar radiation is high?

d)      Find the marginal distribution of precipitation.

e)      Find P(R=high | T=cold, H=high)

f)       Find P(R=high | T=cold, H=high, S=low)

g)      Find P(R=high | T=cold, H=high, W=medium)

h)      Compare the results obtained in Q2.2 e), Q2.2 f), and Q2.2 g) above. Explain the reason for the observed behavior.
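
The kind of comparison asked for in part h) can be explored on a toy model. The following is a minimal sketch of inference by enumeration, written in Python rather than R and intended only as an illustration. The chain T → H → R and every probability in it are hypothetical; they are NOT the assignment's network or tables. In this toy chain, once H is known, additionally observing T does not change the probability of R, which is one common reason why conditional queries with different evidence sets return identical values.

```python
from itertools import product

# Hypothetical chain T -> H -> R with made-up numbers (not the assignment's tables).
P_T = {"cold": 0.4, "hot": 0.6}
P_H_GIVEN_T = {"cold": {"low": 0.3, "high": 0.7},
               "hot": {"low": 0.8, "high": 0.2}}
P_R_GIVEN_H = {"low": {"low": 0.9, "high": 0.1},
               "high": {"low": 0.25, "high": 0.75}}

def joint(t, h, r):
    """P(T=t, H=h, R=r) from the chain factorisation."""
    return P_T[t] * P_H_GIVEN_T[t][h] * P_R_GIVEN_H[h][r]

def total(fixed):
    """Sum the joint over all worlds consistent with the assignments in `fixed`."""
    s = 0.0
    for t, h, r in product(P_T, ("low", "high"), ("low", "high")):
        world = {"T": t, "H": h, "R": r}
        if all(world[k] == v for k, v in fixed.items()):
            s += joint(t, h, r)
    return s

def query(target, evidence):
    """P(target | evidence); both arguments are dicts over the names T, H, R."""
    return total({**evidence, **target}) / total(evidence)

print(query({"R": "high"}, {"H": "high"}))
print(query({"R": "high"}, {"H": "high", "T": "cold"}))
```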

 

[Marks: (3+3) + (2+2+2+2+1+1+1+3) = 20]


Q3) [16 Marks]

Consider five binary variables A, B, C, D and E. The Directed Acyclic Graph (DAG) shown below describes the relationship between these variables along with their conditional probability tables (CPT).


 

3.1)               Obtain an expression (in a simplified form) for P(C = 1|A = 0, B = 0, D = 1, E = 0). (Show the steps clearly).

 

3.2)           The table shown below provides 20 simulated data records obtained for the above Bayesian network. Use this data to find the maximum likelihood estimates of α, θ, þ, h and a.
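
For binary variables with tabular CPTs, the maximum likelihood estimate of each entry is a ratio of counts: the number of records where the child equals 1 under a given parent configuration, divided by the number of records with that parent configuration. A minimal sketch of this counting follows, in Python and for illustration only. The four records in `TOY_RECORDS` are invented for the example and are not the 20 records of the assignment table.

```python
from collections import Counter

def mle_cpt(records, child, parents):
    """Estimate P(child = 1 | parent configuration) by relative frequency.

    `records` is a list of dicts mapping variable names to 0/1 values.
    Returns a dict keyed by the tuple of parent values.
    """
    totals, ones = Counter(), Counter()
    for rec in records:
        key = tuple(rec[p] for p in parents)
        totals[key] += 1
        ones[key] += rec[child]
    return {key: ones[key] / totals[key] for key in totals}

# Invented toy data (not the assignment's table).
TOY_RECORDS = [
    {"A": 0, "C": 1},
    {"A": 0, "C": 0},
    {"A": 1, "C": 1},
    {"A": 1, "C": 1},
]

print(mle_cpt(TOY_RECORDS, "A", []))
print(mle_cpt(TOY_RECORDS, "C", ["A"]))
```

A node without parents has a single empty-tuple key, so its estimate is simply the overall fraction of ones.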

 

3.3)           Find the value of P(C = 1|A = 0, B = 0, D = 1, E = 0) using the appropriate values obtained from the above question Q3.2.

[Marks 9 + 5 + 2 = 16]


Q4) Bayesian Structure Learning [26 Marks]

 

For this question, you will be using a dataset called “alarm”, available from the ‘bnlearn’ R package, which contains 37 variables. It relates to an alarm message system for patient monitoring.

 

Use the following R code to load the alarm dataset:

 

library(bnlearn)

# load the data.

data(alarm)

summary(alarm)

 

The true network structure of this dataset can be viewed (plot) using the following R code.


 

 

Use R programming, as appropriate, to answer the following questions.

 

4.1)           Use the alarm dataset to learn Bayesian network structures using the hill-climbing (hc) algorithm, utilising two different scoring methods, namely the Bayesian Information Criterion score (BIC score) and the Bayesian Dirichlet equivalent score (BDe score), for each of the following sample sizes of the data:

a)      100 (first 100 records)

b)      1000 (first 1000 records)

c)      15000 (first 15000 records)

For each of the above cases,

·         provide the scores obtained for BIC and BDe,

·         plot the network structure obtained for the BIC and BDe scores.


4.2)           Based on the results obtained for the above question (Q4.1), discuss how the BIC score compares with the BDe score for different sample sizes, in terms of the structure and score of the learned network.
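
When discussing sample size, it can help to see how the BIC complexity penalty behaves. The sketch below is in Python and for illustration only. It assumes the "larger is better" form of BIC, that is, log-likelihood minus half the number of free parameters times the log of the sample size; check the documentation of the package you use for its exact convention. It prints the penalty charged per parameter at the three sample sizes of Q4.1.

```python
from math import log

def penalty_per_parameter(n_samples):
    """Cost added to the score for each extra free parameter."""
    return 0.5 * log(n_samples)

def bic_score(log_likelihood, n_params, n_samples):
    """BIC in the 'larger is better' form: fit minus complexity penalty."""
    return log_likelihood - n_params * penalty_per_parameter(n_samples)

for n in (100, 1000, 15000):
    print(n, round(penalty_per_parameter(n), 3))
```

The penalty per parameter grows only logarithmically with the sample size, while the log-likelihood term grows roughly linearly, so extra arcs become easier to justify as more data is used.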

 

4.3)

a)                 Find the Bayesian network structures utilising the full dataset, using both the BIC and BDe scores. Show the scores and the obtained networks.

 

b)                 Compare the networks obtained above (in Q4.3.a) for the BIC and BDe scoring methods with the true network structure. Use the “compare()” function and the “graphviz.compare()” function available in the “bnlearn” R package to perform these comparisons, and comment on the results.

 

c)                 Fit the data to the network obtained using the BIC score in the above question (Q4.3.a) in order to compute the conditional probability distribution table entries (CPD table values). Show the obtained CPD table entries for the variable “ECO2”.

 

d)                 Use the above learned network obtained (in Q4.3.c) to find the probability of :

P(BP="HIGH" | STKV ="LOW", HR ="NORMAL", SAO2="NORMAL").

 

[Marks (3*4) + 3 + (4+3+2+2) = 26]

 

 

 

Q5) Research based questions (Practical applications in the real world – Bayesian networks) [38 Marks]

 

This is an HD (High Distinction) level question. Students targeting an HD grade should answer this question, in addition to all of the above questions. For others, this question is optional. This question aims to demonstrate your expertise in the subject area and your ability to do your own research in the related area.

a)      Download the following article from the link provided below. Read that article and answer the following questions. This article provides a real life case study on creating and using a Bayesian network for road accident data analysis.

 

Ali Karimnezhad & Fahimeh Moradi (2017), Road accident data analysis using Bayesian networks, Transportation Letters, 9:1, 12-19,

DOI: 10.1080/19427867.2015.1131960

Web: https://www.tandfonline.com/doi/full/10.1080/19427867.2015.1131960

 

Note that you will be able to download this paper via Deakin library using your Deakin credentials (username and password). (https://www.deakin.edu.au/library/help/add-browser-bookmarklet)

 

 

i)                 Describe the dataset used for their analysis. What are the variables used? Are the variables numerical, categorical or mixed? How many records of data have been used?

ii)               What is the name of the algorithm used for learning the Bayesian network structure? What software tool has been used to build and visualize the Bayesian network? Provide a web link to that software.

iii)             Read the section titled “Parameter learning in the road accident network” in that paper and extract the following probability values that they have computed, and mention them:

I.      The probability of being injured while wearing seat belt and driving a car, knowing that the driver has a diploma degree and a type 2 driving license.

II.      The probability of death while not wearing the seatbelt, knowing that  the driver has a diploma degree and a type 2 driving license

III.      The probability of being injured while not wearing the seatbelt, knowing that the driver has a diploma degree and a type 2 driving license

IV.      The probability of death while wearing seat belt and driving a car, knowing that the driver has a diploma degree and a type 2 driving license

V.      Based on the probability values obtained above, what conclusions are made?

 

b)     Read the following article, which explains modeling air pollution, climate and health data using Bayesian networks.

 

Vitolo C., Scutari M., Ghalaieny M., Tucker A., & Russell A. (2018). Modeling air pollution, climate, and health data using Bayesian Networks: A case study of the English regions. Earth and Space Science, 5, 76–88. https://doi.org/10.1002/2017EA000326

 

Note that you will be able to download this paper via Deakin library using your Deakin credentials (username and password). (https://www.deakin.edu.au/library/help/add-browser-bookmarklet)

 

This paper used pollution data, weather data and health data to produce a Bayesian network for studying the link between pollution and health. It used big data for the analysis. Table 1 of this paper explains the variables considered for their analysis.

 

In this task you will implement and perform similar modelling using a small dataset from Australia. You will use pollution data, weather data and health data from Australia to produce the Bayesian network. You can choose either a state or the whole of Australia for your analysis. Instead of using big data, you should use a very small dataset, of a size of your choice, that can be loaded and run on your computer. So, choose the data size wisely.

 

You need to perform the following tasks and prepare a report explaining the details of the full implementation and results. You should also upload the programming code that you used to implement and produce your results, along with details explaining how to run the code, including all of the relevant packages/libraries to use (do not upload packages; give only the links/library names needed so that the code can be loaded and executed).


·        Find the appropriate data and clean them (see below for some suggestions). In the report, you should clearly explain what data you used, how you cleaned them and how you handled the missing data, if any, and other processing performed in preparing the data. Include details about the period of data considered.

·        Consider no more than 20 variables for your task. Explain the variables you have chosen in the report. You can either choose to convert the continuous variables into discrete variable and perform the analysis (produce the Bayesian network), or you can use a mixed category of variables (continuous and categorical) for your analysis. This is your choice, and it should be explained clearly in the report.

·        Perform an exploratory analysis on the variables you have selected. Provide some relevant visualizations, such as histogram plots, and summary statistics for the variables considered (relevant results may be presented in table form), and briefly describe them.

·        Perform appropriate Bayesian structure learning and parameter learning to produce the Bayesian networks using your data. You may choose to include blacklists (i.e., using prior knowledge to guide the learning by excluding certain edges) in your structure learning process, if needed. Experiment with at least two methods and compare the Bayesian networks produced. You may choose to use recent Bayesian structure learning algorithms published in academic journal/conference papers in your analysis. Note that including more of the recent algorithms will increase the chances of scoring more marks for this task. Implement the relevant code in R and produce the Bayesian network. You may use any existing code and modify it to suit your needs, but proper referencing and comments should be included, along with a clear explanation of how it is used and where the changes are made. The code file should be uploaded along with the submission. The report should have clear explanations (technical details) of the algorithms used.

·        The report should explain the methods used to produce the Bayesian network, the settings/parameters/metrics used, if any, the results obtained, including the Bayesian network structure, and a clear discussion of the obtained network. The report should have enough detail that the results can be reproduced exactly from it.

·        The report for this question, Q5 (b), should not exceed 4 pages, including figures, tables and references. Note that the report should be clearly and neatly written and presented with proper subheadings, including appropriate tables, diagrams, plots, results, etc. The paper above is a good example of how to present the details clearly and professionally.

 

 

You are free to choose a suitable time period for your analysis, depending on the data availability, for example three years.

The following links might be helpful for finding the relevant data (do your own search, and you are free to choose other relevant sites as appropriate for your data and analysis; remember to explain them in the report with references):


Pollution data:

 

·        https://aqicn.org/map/australia/

·        https://www.epa.vic.gov.au/for-community/airwatch

·        Choose the number of stations wisely, depending on the region you consider for analysis.

·        https://www.dpie.nsw.gov.au/air-quality/air-quality-concentration-data-updated-hourly

 

Weather Data:

 

·        http://www.bom.gov.au/climate/data/

 

 

Health data:

 

·        Underlying causes of death (Australia) https://www.abs.gov.au/statistics/health/causes-death/causes-death-australia/2019/3303_1%20Underlying%20causes%20of%20death%20%28Australia%29.xlsx

 

You can consider the ICD10 codes (International Classification of Diseases codes) for cardiovascular-pulmonary diseases (CVD), namely “J00-J99”. This code range accounts for diseases of the respiratory system. For example, in Table 1.2 (tab) of the Excel sheet (from the above link), you can see yearly data (see row number 786) for diseases of the respiratory system.

 

 

NOTE: Your report for all of the above questions must be written in your own words. If text is copied directly from the paper or reference material, or if any sign of collusion is detected, zero marks will be given and the case will be reported for plagiarism and collusion processing.