UNIT I:
Introduction to Big Data- The Evolution of Data Management, Defining Big Data, Understanding the Waves of Managing Data, Building a Successful Big Data Management Architecture, Examining Big Data Types: Structured Data, Unstructured Data. Putting Big Data Together. Brief History of Distributed Computing, Basics of Distributed Computing for Big Data.
UNIT II:
Exploring the Big Data Stack -
- Layer 0: Redundant Physical Infrastructure,
- Layer 1: Security Infrastructure,
- Layer 2: Operational Databases,
- Layer 3: Organizing Data Services and Tools,
- Layer 4: Analytical Data Warehouses.
Big Data Analytics, Big Data Applications. Virtualization: Basics of Virtualization, Server virtualization, Application virtualization, Network virtualization, Processor and memory virtualization, Data and storage virtualization, Managing Virtualization with the Hypervisor, Implementing Virtualization to Work with Big Data.
UNIT III:
Analytics and Big Data- Basic analytics, Advanced analytics, Operationalized analytics, Monetizing analytics, Text Analytics and Big Data, Social media analytics, Text Analytics Tools for Big Data, Attensity, Clarabridge, OpenText.
MapReduce Fundamentals- Understanding the map function, Adding the reduce function. Anatomy of a MapReduce Job Run, Failures, Job Scheduling, Shuffle and Sort, Task Execution, MapReduce Types and Formats, MapReduce Features.
UNIT IV:
Exploring Hadoop- Hadoop & its Features, Hadoop Ecosystem, Hadoop 2.x Core Components, Hadoop Storage: Understanding the Hadoop Distributed File System, Hadoop Processing: MapReduce Framework, Different Hadoop Distributions. Pig: Introduction to Pig, Execution Modes of Pig, Comparison of Pig with Databases, Grunt, Pig Latin, User Defined Functions, Data Processing Operators.
HDFS (Hadoop Distributed File System): The Design of HDFS, HDFS Concepts, Command Line Interface, Hadoop File System Interfaces, Data Flow, Data Ingest with Flume and Sqoop, Hadoop Archives, Hadoop I/O: Compression, Serialization, Avro and File-Based Data Structures.
Textbooks:
- Judith S. Hurwitz, Alan F. Nugent, Fern Halper, Marcia A. Kaufman, “Big Data For Dummies”, John Wiley & Sons, Inc. (2013)
- Robert D. Schneider, “Hadoop For Dummies”, John Wiley & Sons, Inc. (2012)
- Tom White, “Hadoop: The Definitive Guide”, Third Edition, O’Reilly Media (2012)
- Seema Acharya, Subhasini Chellappan, “Big Data Analytics”, Wiley (2015)
Reference Books:
- Paul Zikopoulos, “Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data”, McGraw Hill (2012).
- Nathan Marz, James Warren, “Big Data: Principles and best practices of scalable realtime data systems”, Manning Publications (2015)
- Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia, “Learning Spark: Lightning-Fast Big Data Analysis”, O’Reilly Media, Inc. (2015).