Data Mining and Data Warehousing
MASY1-GC 3510-100 | Summer 2023 | 7/10/23 – 8/16/23 | 3 Credits
Modality: In-Person
General Course Information
Name/Title: Amit Patel, Adjunct Instructor, He/Him/His
NYU Email: asp13@nyu.edu
Class Meeting Schedule: 7/10/23 – 8/16/23 | Mondays & Wednesdays / 6:20pm – 9:20pm
Class Location: TBD
Office Hours: Tuesday 7:30 pm via Zoom. Please email me at least one day in advance to schedule the Zoom meeting.
Description
In an increasingly competitive information age, data mining and data warehousing are essential in business decision-making. This course teaches students concepts, methods and skills for working with data warehouses and mining data from these warehouses to optimize competitive business strategy. In this course, students develop the analytical thinking skills required to identify effective data warehousing strategies, such as when to outsource or in-source data services. Students also learn to Extract, Transform and Load data into data warehouses (the ETL process) and use the CRISP approach to data mining to extract vital information from data warehouses. The course also teaches students how to secure data and covers the ethical issues associated with the uses of data and data models for business decisions.
Prerequisites
1210 – Quantitative Models for Decision Makers
Learning Outcomes
At the conclusion of this course, students will be able to:
- Translate business requirements into well-constructed, normalized conceptual and logical data models
- Apply logical database design and the relational model
- Apply the CRISP model to conduct successful data mining
- Establish a successful ETL process to load a data warehouse
- Write basic SQL statements including some advanced SQL features
- Employ appropriate data governance principles to assure data quality and security
Communication Methods
Be sure to turn on your NYU Brightspace notifications and frequently check the “Announcements” section of the course site. This will be the primary method I use to communicate information critical to your success in the course. To contact me, send me an email. I will respond within 24 hours. Credit students must use their NYU email to communicate. Non-degree students do not have NYU email addresses. Brightspace course mail supports student privacy and FERPA guidelines. The instructor will use the NYU email address to communicate with students.
Structure | Method | Modality
There are 12 session topics in this course.
Active learning experiences and small group projects are key components of the course. Assignments, papers, and exams will be based on course materials (e.g., readings, videos), lectures, and class discussions. Course sessions will be conducted synchronously on NYU Zoom, which you can access from the course site in NYU Brightspace.
This course is in-person and will meet twice a week on Monday and Wednesday, with assignments, announcements and emails being sent through Brightspace. Students are expected to check email and/or Brightspace at least twice a week for announcements concerning assignments, class changes or cancellations, and other important information. The course will involve lectures, discussions, and forum discussions as well as case studies. Two major papers/projects are required, both of which will be done on an individual basis.
Expectations
Learning Environment
You play an important role in creating and sustaining an intellectually rigorous and inclusive classroom culture. Respectful engagement, diverse thinking, and our lived experiences are central to this course and enrich our learning community.
Participation
You are integral to the learning experience in this class. Be prepared to actively contribute to class activities, group discussions, and work outside of class.
Assignments and Deadlines
Homework:
Homework assignments must be submitted on time, within one week of the date assigned (unless otherwise instructed). Late submissions may be rejected altogether at the instructor’s discretion. All homework must be submitted to the appropriate assignment folder online.
Group/Team Project:
There will be a group/team class project. The project will be a culmination of written, visual, and presentation skills. It will bring together the topics, concepts and competencies learned in this class. The group project grade will be based on:
Student level of participation in the team project.
Each student will be assessed both as an individual and as part of the overall team.
Individual contribution will be assessed by identifying the components of the project the student worked on and contributed to the overall project (for example, database creation, data preparation and load, etc.).
Group contribution will be assessed on overall project depth of content, write-up, and delivery.
For the group assessment portion, all individuals within the group will receive the same grade.
Fulfilment of all requirements stated for the project defined under “final project” on the course web site.
All groups have the same group assignment.
All requirements for the group project are defined on the course web site.
Midterm Exam:
There will be a midterm exam. The exam will be an open book, open notes/internet style exam. The exam will test the student’s acquisition of topics, concepts and competencies learned in this class up to mid-term.
Final Exam:
There will be a final exam. The exam will be an open book, open notes/internet style exam. The exam will test the student’s acquisition of topics, concepts and competencies learned in this class. The final exam will only cover material covered in the second half of the term.
Course Technology Use
We will utilize multiple technologies to achieve the course goals. I expect you to use technology in ways that enhance the learning environment for all students. All class sessions require use of Zoom. All class sessions require use of technology (e.g., laptop, computer lab) for learning purposes.
Feedback and Viewing Grades
I will provide timely, meaningful feedback on all your work via our course site in NYU Brightspace. You can access your grades in the course site Gradebook.
Attendance
I expect you to attend all class sessions. Attendance will be taken into consideration when determining your final grade. Refer to the SPS Policies and Procedures page for additional information about attendance.
Excused absences are granted in cases of documented serious illness, family emergency, religious observance, or civic obligation. In the case of religious observance or civic obligation, this should be reported in advance. Unexcused absences from sessions may have a negative impact on a student’s final grade. Students are responsible for assignments given during any absence.
Each unexcused absence or late arrival may result in a student’s grade being lowered by a fraction of a grade. A student who has three unexcused absences may earn a Fail grade.
University Calendar Policy on Religious Holidays:
https://www.nyu.edu/about/policies-guidelines-compliance/policies-and-guidelines/university-calendar-policy-on-religious-holidays.html
Students who join the course during add/drop are responsible for ensuring that they identify what assignments and preparatory work they have missed and complete and submit those per the syllabus.
Textbooks and Course Materials
Required:
The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence Remastered Collection
Authors – Ralph Kimball, Margy Ross
Publisher – Wiley; 2nd edition (February 1, 2016)
ISBN – 978-1-119-21659-9 or ASIN: B01BEUOY4C
Students can purchase these items through the NYU Bookstore.
We will be using Oracle SQL Developer Data Modeler, MySQL Community Server, and the MySQL Workbench client for assignments and labs in this course. The software downloads below are free for educational use.
MySQL Community Server (Database): https://dev.mysql.com/downloads/mysql/
MySQL Workbench (Database client): https://dev.mysql.com/downloads/workbench/
Oracle SQL Developer Data Modeler: http://www.oracle.com/technetwork/developer-tools/datamodeler/overview/index.html
Recommended:
Data Mining: Concepts, Models, Methods, and Algorithms, 3rd Edition
Authors – Mehmed Kantardzic
Publisher – Wiley-IEEE Press, 2019
ISBN – 978-1-119-51607-1
Grading | Assessment
Your grade in this course is based on your performance on multiple activities and assignments. Since all graded assignments are related directly to course objectives and learning outcomes, failure to complete any assignment will result in an unsatisfactory course grade. Please read all assignments carefully, follow the instructions thoroughly, and proofread your written assignments before submitting them for a grade. Students may make multiple submissions for the same assignment; only the latest submission will be graded.
See the “Grades” section of Academic Policies for the complete grading policy, including the letter grade conversion, and the criteria for a grade of incomplete, taking a course on a pass/fail basis, and withdrawing from a course.
Course Outline
Start/End Dates: 7/10/23 – 8/16/23 | Mondays & Wednesdays
Time: 6:20 pm – 9:20 pm
Summer Session Two: 6W2
No Class Date(s): N/A
Special Notes: N/A
Number of Sessions: 12
Session 1 – 07/10/23
Topic description – Introduction to Data Warehousing
Introduction to Data Warehousing
Relationship of Data Mining and Data Warehousing
What is a Data Warehouse?
Data Warehousing ROI
DSS – Decision Support Systems
Operational vs. Analytical Systems
Evolution of DSS and Data Warehousing
OLTP – Online Transaction Processing
Characteristics of a Data Warehouse
What is Data Mart? Creating a Data Mart
Data Comparison Chart
OLAP – Online Analytical Processing
Assignments: (due next Wednesday)
Reading: Chapters 1 & 2 (The Kimball Group Reader)
HW1: Individual Group Project Proposal
Session 2 – 07/12/23
Topic description – Planning and Building the Data Warehouse
Planning & Building the Data Warehouse
Sponsorship and Cost Justification
Project Prerequisites
Barriers, Challenges and Risks
Preparing for Implementation
Developing the Data Warehouse
SDLC Methodologies – Waterfall vs. RUP Approach
Planning & Project Management
Analysis
Implementation and Deployment
Operations
Assignments: (due next Monday)
Reading: Chapters 3 & 4 (The Kimball Group Reader)
HW2: Logical Data Model
Group Project: Week 3 – Project Proposal (2%)
Session 3 – 07/17/23
Topic description – Data Warehouse Design
Data Warehouse Design
Drivers for Multi-Dimensional Analysis
Limitations of Relational Models
The Data Cube
What is dimensional modeling?
Advantages of Dimensional Models
Logical and Physical Design
Data Normalization
Benefits and Drawbacks of Data Normalization
De-Normalizing of Data
Characteristics of a Data Warehouse
Assignments: (due next Wednesday)
Reading: Chapter 5 (The Kimball Group Reader)
HW3: Basic SQL
Session 4 – 07/19/23
Topic description – Data Warehouse Schemas
Data Warehouse Schemas
Dimensions and Dimension Tables
Facts and Fact Tables
The Star Schema
The Snowflake Schema
Degenerate and Junk Dimensions
The Data Warehouse Bus Architecture
Conformed Dimensions and Standard Facts
Data Granularity
Changing Dimensions
Assignments: (due next Monday)
Reading: Chapters 6 & 7 (The Kimball Group Reader)
HW4: Enhanced SQL
Group Project: Week 5 – Transactional Database (3%)
Session 5 – 07/24/23
Topic description – Components of a Data Warehouse
Components of a Data Warehouse
Source Systems, Staging Area, Presentation, Access Tools
Building the Data Matrix
The Four-Step Process
Multiple Fact Tables in a single Data Mart
Chain, Heterogeneous, Transaction/Snapshot & Aggregate Facts
Fact and Dimension Table Detail
Identifying Source for each Fact & Dimension
Mapping from Source to Target
Assignments: (due next Wednesday)
Reading: Chapters 8 & 9 (The Kimball Group Reader)
HW5: Physical Data Model
Session 6 – 07/26/23
Topic description – The ETL Process
The ETL Process
Extracting the Data into the Staging Area
The Challenge of Extracting from Disparate Platforms
Full vs. Incremental Extracts
Detecting Changes to Data
Transforming the Data
Complexity of Data Integration
Dealing with Missing & Dirty Data
Data Transformation Tasks
Loading the Data
Timing and Job Control of Data Loads
Assignments: (due next Monday)
Reading: Chapter 11 (The Kimball Group Reader)
Session 7 – 07/31/23
Topic description – Midterm Exam
Assignments: (due next Wednesday)
Group Project: Week 8 – Data Warehouse & ETL Process (5%)
Session 8 – 08/02/23
Topic description – Introduction to Data Visualization
Introduction to Data Visualization
Tableau Environment
Tableau connection to Data Warehouse
Assignments: (due next Monday)
Reading: Online web research and reading
HW6: Tableau Data Visualization
Session 9 – 08/07/23
Topic description – Introduction to Data Mining
Why Data Mining?
What Is Data Mining?
A Multi-Dimensional View of Data Mining
What Kind of Data Can Be Mined?
What Kinds of Patterns Can Be Mined?
What Technologies Are Used?
What Kinds of Applications Are Targeted?
Major Issues in Data Mining
Assignments: (due next Wednesday)
Reading: Chapters 1 & 2 (Data Mining: Concepts, Models, Methods, and Algorithms)
HW7: Tableau Lobbying
Session 10 – 08/09/23
Topic description – Getting to Know Your Data
Data Objects and Attribute Types
Basic Statistical Descriptions of Data
Data Visualization
Measuring Data Similarity and Dissimilarity
Topic description – Data Preprocessing
Data Preprocessing: An Overview
Data Quality
Major Tasks in Data Preprocessing
Data Cleaning
Data Integration
Data Reduction
Data Transformation and Data Discretization
Assignments: (due next Monday)
Reading: Chapters 3, 4, 5 (Data Mining: Concepts, Models, Methods, and Algorithms)
HW8: Tableau Data Mining
HW9: Tableau Data Mining
Group Project: Week 11 – Report and Visualization (5%)
Session 11 – 08/14/23
Topic description – Data Mining Techniques
Data Mining Techniques
Predictive Modeling
Classification, Regression, Similarity Matching, Co-occurrence Grouping
Clustering/Segmentation
Data Mining and Statistics Terminologies
Supervised vs. Unsupervised
Data Mining Statistical Techniques
Clustering, Segmentation and Nearest Neighbor Techniques
Keys to commercial success of Data Mining
Assignments: (due next Wednesday)
Reading: Chapters 6, 9 (Data Mining: Concepts, Models, Methods, and Algorithms)
HW10: Tableau Data Mining
Group Project: Week 13 – Final Presentation (15%)
Session 12 – 08/16/23
Topic description – Final Day
Final Exam
NOTES:
The syllabus may be modified to better meet the needs of students and to achieve the learning outcomes.
The School of Professional Studies (SPS) and its faculty celebrate and are committed to inclusion, diversity, belonging, equity, and accessibility (IDBEA), and seek to embody the IDBEA values. The School of Professional Studies (SPS), its faculty, staff, and students are committed to creating a mutually respectful and safe environment (from the SPS IDBEA Committee).
New York University School of Professional Studies Policies
- Policies – You are responsible for reading, understanding, and complying with University Policies and Guidelines, NYU SPS Policies and Procedures, and Student Affairs and Reporting.
- Learning/Academic Accommodations – New York University is committed to providing equal educational opportunity and participation for students who disclose their dis/ability to the Moses Center for Student Accessibility. If you are interested in applying for academic accommodations, contact the Moses Center as early as possible in the semester. If you already receive accommodations through the Moses Center, request your accommodation letters through the Moses Center Portal as soon as possible (mosescsa@nyu.edu | 212-998-4980).
- Health and Wellness – To access the University’s extensive health and mental health resources, contact the NYU Wellness Exchange. You can call its private hotline (212-443-9999), available 24 hours a day, seven days a week, to reach out to a professional who can help to address day-to-day challenges as well as other health-related concerns.
- Student Support Resources – There are a range of resources at SPS and NYU to support your learning and professional growth. For a complete list of resources and services available to SPS students, visit the NYU SPS Office of Student Affairs site.
- Religious Observance – As a nonsectarian, inclusive institution, NYU policy permits members of any religious group to absent themselves from classes without penalty when required for compliance with their religious obligations. Refer to the University Calendar Policy on Religious Holidays for the complete policy.
- Academic Integrity and Plagiarism – You are expected to be honest and ethical in all academic work. Moreover, you are expected to demonstrate how what you have learned incorporates an understanding of the research and expertise of scholars and other appropriate experts; and thus recognizing others’ published work or teachings—whether that of authors, lecturers, or one’s peers—is a required practice in all academic projects.
Plagiarism involves borrowing or using information from other sources without proper and full credit. You are subject to disciplinary action for offenses that include, but are not limited to, cheating, plagiarism, forgery or unauthorized use of documents, and false forms of identification. Turnitin, an originality detection service in NYU Brightspace, may be used in this course to check your work for plagiarism.
Read more about academic integrity policies at the NYU School of Professional Studies on the Academic Policies for NYU SPS Students page.
- Use of Third-Party Tools – During this class, you may be required to use non-NYU apps/platforms/software as a part of course studies, and thus, will be required to agree to the “Terms of Use” (TOU) associated with such apps/platforms/software.
These services may require you to create an account but you can use a pseudonym (which may not identify you to the public community, but which may still identify you by IP address to the company and companies with whom it shares data).
You should carefully read those terms of use regarding the impact on your privacy rights and intellectual property rights. If you have any questions regarding those terms of use or the impact on the class, you are encouraged to ask the instructor prior to the add/drop deadline.