Fall 2014 CSE 5334/4334 Data Mining

Course Information

Instructor: Naeemul Hassan
  • Office hours: Tue/Thu 10:00am-12:00pm
  • Office: ERB 509
  • Phone: (817) 437-4518
  • E-mail: naeemul.hassan@mavs.uta.edu
  • Homepage: http://idir.uta.edu/~naeemul

TA 1: Gensheng Zhang
  • Office hours: Mon 4:00pm-6:00pm
  • Office: ERB 504
  • E-mail: gensheng.zhang@mavs.uta.edu

TA 2: Chinmay Srinath
  • Office hours: Fri 1:30pm-3:30pm
  • Office: ERB 562
  • E-mail: chinmay.srinath@mavs.uta.edu

Course Description

This is an introductory course on data mining. Data mining refers to the automatic discovery of patterns and knowledge from large data repositories, including databases, data warehouses, the Web, document collections, and data streams. We will study the basic topics of data mining, including data preprocessing, data warehousing and OLAP, data cubes, frequent pattern and association rule mining, correlation analysis, classification and prediction, and clustering, as well as advanced topics covering the techniques and applications of data mining for the Web, text, big data, social networks, and computational journalism.

Student Learning Outcomes

A solid understanding of the basic concepts, principles, and techniques in data mining; an ability to analyze real-world applications, to model data mining problems, and to assess different solutions; an ability to design, implement, and evaluate data mining software.




Final letter grades will be curved based on students' performance.


At The University of Texas at Arlington, taking attendance is not required. Rather, each faculty member is free to develop his or her own methods of evaluating students’ academic performance, which includes establishing course-specific policies on attendance. As the instructor of this section, I require all students to attend lectures.


Stay tuned and make sure to check Blackboard frequently. Important announcements will be posted there.

Assignments and Deadlines


Regrade requests must be made within 7 days after scores are posted on Blackboard. A TA will handle the initial regrade request. If you are not satisfied with the regrading result, you have 7 days to request again. The instructor will then regrade, and that decision is final.

Drop Policy

Students may drop or swap (adding and dropping a class concurrently) classes through self-service in MyMav from the beginning of the registration period through the late registration period. After the late registration period, students must see their academic advisor to drop a class or withdraw. Undeclared students must see an advisor in the University Advising Center. Drops can continue through a point two-thirds of the way through the term or session. It is the student's responsibility to officially withdraw if they do not plan to attend after registering. Students will not be automatically dropped for non-attendance. Repayment of certain types of financial aid administered through the University may be required as the result of dropping classes or withdrawing. For more information, contact the Office of Financial Aid and Scholarships (http://wweb.uta.edu/ses/fao).

Americans with Disabilities Act

The University of Texas at Arlington is on record as being committed to both the spirit and letter of all federal equal opportunity legislation, including the Americans with Disabilities Act (ADA). All instructors at UT Arlington are required by law to provide "reasonable accommodations" to students with disabilities, so as not to discriminate on the basis of that disability. Any student requiring an accommodation for this course must provide the instructor with official documentation in the form of a letter certified by the staff in the Office for Students with Disabilities, University Hall 102. Only those students who have officially documented a need for an accommodation will have their request honored. Information regarding diagnostic criteria and policies for obtaining disability-based academic accommodations can be found at www.uta.edu/disability or by calling the Office for Students with Disabilities at (817) 272-3364.

Title IX

The University of Texas at Arlington is committed to upholding U.S. Federal Law “Title IX” such that no member of the UT Arlington community shall, on the basis of sex, be excluded from participation in, be denied the benefits of, or be subjected to discrimination under any education program or activity. For more information, visit www.uta.edu/titleIX.

Academic Integrity

All students enrolled in this course are expected to adhere to the UT Arlington Honor Code:

I pledge, on my honor, to uphold UT Arlington’s tradition of academic integrity, a tradition that values hard work and honest effort in the pursuit of academic excellence. I promise that I will submit only work that I personally create or contribute to group collaborations, and I will appropriately reference any work from other sources. I will follow the highest standards of integrity and uphold the spirit of the Honor Code.

Instructors may employ the Honor Code as they see fit in their courses, including (but not limited to) having students acknowledge the honor code as part of an examination or requiring students to incorporate the honor code into any work submitted. Per UT System Regents’ Rule 50101, §2.2, suspected violations of university’s standards for academic integrity (including the Honor Code) will be referred to the Office of Student Conduct. Violators will be disciplined in accordance with University policy, which may result in the student’s suspension or expulsion from the University.

Student Support Services

UT Arlington provides a variety of resources and programs designed to help students develop academic skills, deal with personal situations, and better understand concepts and information related to their courses. Resources include tutoring, major-based learning centers, developmental education, advising and mentoring, personal counseling, and federally funded programs. For individualized referrals, students may visit the reception desk at University College (Ransom Hall), call the Maverick Resource Hotline at 817-272-6107, send a message to resources@uta.edu, or view the information at www.uta.edu/resources.

Electronic Communication

UT Arlington has adopted MavMail as its official means to communicate with students about important deadlines and events, as well as to transact university-related business regarding financial aid, tuition, grades, graduation, etc. All students are assigned a MavMail account and are responsible for checking the inbox regularly. There is no additional charge to students for using this account, which remains active even after graduation. Information about activating and using MavMail is available at http://www.uta.edu/oit/cs/email/mavmail.php.

Student Feedback Survey

At the end of each term, students enrolled in classes categorized as lecture, seminar, or laboratory shall be directed to complete a Student Feedback Survey (SFS). Instructions on how to access the SFS for this course will be sent directly to each student through MavMail approximately 10 days before the end of the term. Each student’s feedback enters the SFS database anonymously and is aggregated with that of other students enrolled in the course. UT Arlington’s effort to solicit, gather, tabulate, and publish student feedback is required by state law; students are strongly urged to participate. For more information, visit http://www.uta.edu/sfs.

Final Review Week

A period of five class days prior to the first day of final examinations in the long sessions shall be designated as Final Review Week. The purpose of this week is to allow students sufficient time to prepare for final examinations. During this week, there shall be no scheduled activities such as required field trips or performances; and no instructor shall assign any themes, research problems or exercises of similar scope that have a completion date during or following this week unless specified in the class syllabus. During Final Review Week, an instructor shall not give any examinations constituting 10% or more of the final grade, except makeup tests and laboratory examinations. In addition, no instructor shall give any portion of the final examination during Final Review Week. During this week, classes are held as scheduled. In addition, instructors are not required to limit content to topics that have been previously covered; they may introduce new concepts as appropriate.

Emergency Exit Procedures

Should we experience an emergency event that requires us to vacate the building, students should exit the room and move toward the nearest exit, which is located right outside the door. When exiting the building during an emergency, one should never take an elevator but should use the stairwells. Faculty members and instructional staff will assist students in selecting the safest route for evacuation and will make arrangements to assist handicapped individuals.


As the instructor for this course, I reserve the right to adjust this schedule in any way that serves the educational needs of the students enrolled in this course.

Date | # | Lecture | Assignment Out | Assignment Due | Lecture Notes | Extra Reading
Thu, Aug 21, 2014 | 1 | Course Overview | | | PDF |
Tue, Aug 26, 2014 | 2 | Introduction (Chapter 1) | | | PDF | Raw Data, Cleaned Data

Data Warehousing, OLAP, Data Cube (Chapter 3, 4)
Thu, Aug 28, 2014 | 3 | Data Warehousing, OLAP, Data Cube | | | PDF | OLTP
Tue, Sep 02, 2014 | 4 | Cancelled. Rescheduled on Friday, Sep 19, 04:00PM @ NH 106 | | | |
Thu, Sep 04, 2014 | 5 | Cancelled. Rescheduled on Friday, Nov 21, 03:00PM @ NH 106 | | | |
Tue, Sep 09, 2014 | 6 | Data Warehousing, OLAP, Data Cube | | | |
Thu, Sep 11, 2014 | 7 | Data Warehousing, OLAP, Data Cube | HW1 | | |

Classification and Prediction (1) (Chapter 6)
Tue, Sep 16, 2014 | 8 | Decision Tree | | | PDF |
Thu, Sep 18, 2014 | 9 | Decision Tree (continued) | | | | Decision Tree [Entropy Based] in Python

Introduction to Data Mining Research (1)
Fri, Sep 19, 2014 (Time: 04:00PM, Room: NH 106) | | Incremental Discovery of Prominent Situational Facts (guest lecture by Afroza Sultana) | P1 | | PDF | Situational Fact Paper

Classification and Prediction (1) (Chapter 6) Continued
Tue, Sep 23, 2014 | 10 | Course Project | HW2 | | |
Thu, Sep 25, 2014 | 11 | Bayesian Classifiers | | HW1 | PDF |
Tue, Sep 30, 2014 | 12 | Bayesian Classifiers (continued) | | | |

Text and Web Mining (1)
Thu, Oct 02, 2014 | 13 | Vector Space Model | | | PDF page [01-42] | Textbook Excerpt
Tue, Oct 07, 2014 | 14 | Document Classification/Document Clustering | | HW2 | PDF page [43-73] |
Thu, Oct 09, 2014 | | Midterm Exam | | | |

Classification and Prediction (2) (Chapter 6)
Tue, Oct 14, 2014 | 15 | Nearest Neighbor Classifiers | | | PDF |
Thu, Oct 16, 2014 | 16 | Evaluating Classification Models | P2 | P1 | PDF page [01-18] |
Tue, Oct 21, 2014 | 17 | Evaluating Classification Models (continued) | HW3 | | PDF page [19-43] |
Thu, Oct 23, 2014 | 18 | Support Vector Machine | | | PPT |

Clustering (Chapter 7)
Tue, Oct 28, 2014 | 19 | Overview of Clustering, Similarity/Dissimilarity Measure | | | PPT |
Wed, Oct 29, 2014 | | Last Day to Drop Class | | | |
Thu, Oct 30, 2014 | 20 | K-means | | | PPT |
Tue, Nov 04, 2014 (Time: 02:00PM, Room: PKH 212) | 21 | K-means (continued) | | | |
Thu, Nov 06, 2014 (Time: 02:00PM, Room: COBA 243) | 22 | Hierarchical clustering | HW4 | HW3 | PPT |
Tue, Nov 11, 2014 | 23 | Hierarchical clustering (continued) | P3 | P2 | PDF page [74-110] |

Frequent Pattern and Association Rule Mining (Chapter 5)
Thu, Nov 13, 2014 | 24 | Association Rule Mining | | | PPT |
Tue, Nov 18, 2014 | 25 | Association Rule Mining (continued) | | | |
Thu, Nov 20, 2014 | 26 | Correlation Analysis | | | PPT |

Introduction to Data Mining Research (2)
Fri, Nov 21, 2014 (Time: 03:00PM, Room: NH 106) | | Skyline query and its application (Naeemul Hassan) | | | PPT | Skyline Group Paper

Text and Web Mining (2)
Tue, Nov 25, 2014 | 27 | Course Review | | HW4 | |
Thu, Nov 27, 2014 | | Thanksgiving Holiday | | | |

Introduction to Data Mining Research (3)
Tue, Dec 02, 2014 (Time: 02:00PM, Room: PKH 212) | 28 | Class Cancelled | | P3 | |
Tue, Dec 09, 2014 (Time: 02:00PM-04:30PM, Room: NH 100) | | Final Exam | | | |