In this assessment, students extend their previous work from assessment A3 (Business case understanding). Students submit a report on the data mining process applied to a real-world scenario, and a presentation and QA Session will be held based on that report. The report must detail every step the students followed.

Detailed Submission Requirements

Cover Page
• Title
• Group members

Introduction
• Importance of the chosen area
• Why this data set is interesting
• What has been done so far
• What can be done
• Description of the present experiment

1 Data preparation and feature extraction
1.1 Select data
  o Task: Select data
1.2 Clean data
  o Task: Clean data
  o Output: Data cleaning report
1.3 Construct data / feature extraction
  o Task: Construct data
  o Output: Derived attributes
  o Activities: Derived attributes (add new attributes to the accessed data)
  o Activities: Single-attribute transformations
  o Output: Generated records

Report (10%): Week 11, Friday, 04 June 2021, 11:59 pm via Moodle.
Presentation and QA Session (15%): Week 12, in class.

2 Modeling
2.1 Select modeling technique
  o Task: Select modeling technique
2.2 Output: Modeling technique
  o Record the actual modeling technique that is used.
2.3 Output: Modeling assumption
  o Activities: Define any built-in assumptions made by the technique about the data (e.g. quality, format, distribution). Compare these assumptions with those in the Data Description Report. Make sure that these assumptions hold, and step back to the Data Preparation Phase if necessary. You can explain the data file here, even if it is pre-prepared.

3 Generate test design
3.1 Task: Generate test design
  o Activities: Check existing test designs for each data mining goal separately. Decide on the necessary steps (number of iterations, number of folds, etc.). Prepare the data required for the test.
    (You can use 66% of the records for model building and the rest for testing.)
3.2 Build model
  o Task: Build model. Run the modeling tool on the prepared dataset to create one or more models (using the KNIME tool as shown in the lab).
3.3 Output: Parameter settings
  o Activities: Set initial parameters. Document the reasons for choosing those values.
  o Activities: Run the selected technique on the input dataset to produce the model. Post-process the data mining results (e.g. editing rules, displaying trees).
3.4 Output: Model description
  o Activities: Describe any characteristics of the current model that may be useful for the future. Give a detailed description of the model and any special features.
  o Activities: State conclusions regarding patterns in the data (if any). Sometimes the model reveals important facts about the data without a separate assessment process (e.g. that the output or conclusion is duplicated in one of the inputs).

4 Evaluation and Conclusion
Previous evaluation steps dealt with factors such as the accuracy and generality of the model. This step assesses the degree to which the model meets the business objectives and seeks to determine whether there is some business reason why the model is deficient. It compares the results with the evaluation criteria defined at the start of the project.

A good way of defining the total outputs of a data mining project is to use the equation:

RESULTS = MODELS + FINDINGS

In this equation, the total output of the data mining project is not just the models (although they are, of course, important) but also the findings, defined as anything apart from the model that is important in meeting the objectives of the business, or important in leading to new questions, lines of approach, or side effects (e.g. data quality problems uncovered by the data mining exercise).
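For the test design in section 3.1, the suggested 66% / 34% split and the "number of folds" decision can be illustrated outside KNIME with a few lines of Python. This is only a minimal sketch under stated assumptions: the function names `holdout_split` and `k_fold_splits` are hypothetical, the records are assumed to fit in an in-memory list, and the fixed seed is an illustrative choice so that a split can be reproduced in the report. It does not replace the KNIME workflow the brief asks for.

```python
import random


def holdout_split(records, train_fraction=0.66, seed=0):
    """Shuffle the records and return (train, test) lists.

    train_fraction=0.66 mirrors the 66% suggestion in section 3.1;
    the seed is an assumption made so the split is reproducible.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(records)                 # copy, leave the input untouched
    random.Random(seed).shuffle(shuffled)    # seeded, repeatable shuffle
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_fold_splits(records, k):
    """Yield k (train, test) pairs; each record is tested exactly once."""
    items = list(records)
    if not 2 <= k <= len(items):
        raise ValueError("k must be between 2 and the number of records")
    for fold in range(k):
        test = items[fold::k]                # every k-th record, offset by fold
        train = [r for i, r in enumerate(items) if i % k != fold]
        yield train, test


if __name__ == "__main__":
    rows = list(range(100))                  # stand-in for prepared records
    train, test = holdout_split(rows)
    print(len(train), len(test))
    for train_fold, test_fold in k_fold_splits(rows, 5):
        print(len(train_fold), len(test_fold))
```

Whichever design is chosen, the report should state the split ratio or number of folds and the reason for it, since section 3.3 asks for documented parameter choices.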