The course focuses on data mining and machine learning algorithms for analyzing very large amounts of data. MapReduce and NoSQL systems will be used as tools/standards for creating parallel algorithms that can process very large amounts of data. Storage, retrieval, analysis, and knowledge discovery using Big Data have made significant inroads in several domains in industry, research, and academia. In this course, we will look at the dominant software systems and algorithms for coping with Big Data. Topics covered include scalable computing models; large-scale, non-traditional data storage frameworks including graph, key-value, and column-family storage systems; data stream analysis; scalable prediction models; and in-memory storage systems. Prerequisite: CS 22003.
Offered in Spring for 3 semester hours.