Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Big Data and Hadoop; Introduction; Defining a Big Data problem; Building a Hadoop-based Big Data platform; Choosing from Hadoop alternatives; Chapter 2: Preparing for Hadoop Installation; Introduction; Choosing hardware for cluster nodes; Designing the cluster network; Configuring the cluster administrator machine; Creating the kickstart file and boot media; Installing the Linux operating system; Installing Java and other tools; Configuring SSH
Chapter 3: Configuring a Hadoop Cluster; Introduction; Choosing a Hadoop version; Configuring Hadoop in pseudo-distributed mode; Configuring Hadoop in fully-distributed mode; Validating Hadoop installation; Configuring ZooKeeper; Installing HBase; Installing Hive; Installing Pig; Installing Mahout; Chapter 4: Managing a Hadoop Cluster; Introduction; Managing the HDFS cluster; Configuring SecondaryNameNode; Managing the MapReduce cluster; Managing TaskTracker; Decommissioning DataNode; Replacing a slave node; Managing MapReduce jobs; Checking job history from the web UI; Importing data to HDFS
Manipulating files on HDFS; Configuring the HDFS quota; Configuring CapacityScheduler; Configuring Fair Scheduler; Configuring Hadoop daemon logging; Configuring Hadoop audit logging; Upgrading Hadoop; Chapter 5: Hardening a Hadoop Cluster; Introduction; Configuring service-level authentication; Configuring job authorization with ACL; Securing a Hadoop cluster with Kerberos; Configuring web UI authentication; Recovering from NameNode failure; Configuring NameNode high availability; Configuring HDFS federation; Chapter 6: Monitoring a Hadoop Cluster; Introduction
Monitoring a Hadoop cluster with JMX; Monitoring a Hadoop cluster with Ganglia; Monitoring a Hadoop cluster with Nagios; Monitoring a Hadoop cluster with Ambari; Monitoring a Hadoop cluster with Chukwa; Chapter 7: Tuning Hadoop Cluster for Best Performance; Introduction; Benchmarking and profiling a Hadoop cluster; Analyzing job history with Rumen; Benchmarking a Hadoop cluster with GridMix; Using Hadoop Vaidya to identify performance problems; Balancing data blocks for a Hadoop cluster; Choosing a proper block size; Using compression for input and output; Configuring speculative execution
Setting proper number of map and reduce slots for the TaskTracker; Tuning the JobTracker configuration; Tuning the TaskTracker configuration; Tuning shuffle, merge, and sort parameters; Configuring memory for a Hadoop cluster; Setting proper number of parallel copies; Tuning JVM parameters; Configuring JVM Reuse; Configuring the reducer initialization time; Chapter 8: Building a Hadoop Cluster with Amazon EC2 and S3; Introduction; Registering with Amazon Web Services (AWS); Managing AWS security credentials; Preparing a local machine for EC2 connection; Creating an Amazon Machine Image (AMI)
Solve specific problems using individual self-contained code recipes, or work through the book to develop your capabilities. This book is packed with easy-to-follow code and commands used for illustration, which makes your learning curve easy and quick. If you are a Hadoop cluster system administrator with Unix/Linux system management experience and you are looking to get a good grounding in how to set up and manage a Hadoop cluster, then this book is for you. It's assumed that you will have some experience in Unix/Linux command line already, as well as being familiar with network communication
Safari Books Online
CL0500000301
Hadoop Operations and Cluster Management Cookbook
9781782165163
Electronic data processing -- Distributed processing.
File organization (Computer science)
Apache Hadoop (Computer file)
Cloud computing