Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface; Chapter 1: Big Data and Hadoop; Introduction; Defining a Big Data problem; Building a Hadoop-based Big Data platform; Choosing from Hadoop alternatives; Chapter 2: Preparing for Hadoop Installation; Introduction; Choosing hardware for cluster nodes; Designing the cluster network; Configuring the cluster administrator machine; Creating the kickstart file and boot media; Installing the Linux operating system; Installing Java and other tools; Configuring SSH
Text of Note
Chapter 3: Configuring a Hadoop Cluster; Introduction; Choosing a Hadoop version; Configuring Hadoop in pseudo-distributed mode; Configuring Hadoop in fully-distributed mode; Validating Hadoop installation; Configuring ZooKeeper; Installing HBase; Installing Hive; Installing Pig; Installing Mahout; Chapter 4: Managing a Hadoop Cluster; Introduction; Managing the HDFS cluster; Configuring SecondaryNameNode; Managing the MapReduce cluster; Managing TaskTracker; Decommissioning DataNode; Replacing a slave node; Managing MapReduce jobs; Checking job history from the web UI; Importing data to HDFS
Text of Note
Manipulating files on HDFS; Configuring the HDFS quota; Configuring CapacityScheduler; Configuring Fair Scheduler; Configuring Hadoop daemon logging; Configuring Hadoop audit logging; Upgrading Hadoop; Chapter 5: Hardening a Hadoop Cluster; Introduction; Configuring service-level authentication; Configuring job authorization with ACL; Securing a Hadoop cluster with Kerberos; Configuring web UI authentication; Recovering from NameNode failure; Configuring NameNode high availability; Configuring HDFS federation; Chapter 6: Monitoring a Hadoop Cluster; Introduction
Text of Note
Monitoring a Hadoop cluster with JMX; Monitoring a Hadoop cluster with Ganglia; Monitoring a Hadoop cluster with Nagios; Monitoring a Hadoop cluster with Ambari; Monitoring a Hadoop cluster with Chukwa; Chapter 7: Tuning Hadoop Cluster for Best Performance; Introduction; Benchmarking and profiling a Hadoop cluster; Analyzing job history with Rumen; Benchmarking a Hadoop cluster with GridMix; Using Hadoop Vaidya to identify performance problems; Balancing data blocks for a Hadoop cluster; Choosing a proper block size; Using compression for input and output; Configuring speculative execution
Text of Note
Setting proper number of map and reduce slots for the TaskTracker; Tuning the JobTracker configuration; Tuning the TaskTracker configuration; Tuning shuffle, merge, and sort parameters; Configuring memory for a Hadoop cluster; Setting proper number of parallel copies; Tuning JVM parameters; Configuring JVM Reuse; Configuring the reducer initialization time; Chapter 8: Building a Hadoop Cluster with Amazon EC2 and S3; Introduction; Registering with Amazon Web Services (AWS); Managing AWS security credentials; Preparing a local machine for EC2 connection; Creating an Amazon Machine Image (AMI)
SUMMARY OR ABSTRACT
Text of Note
Solve specific problems using individual self-contained code recipes, or work through the book to develop your capabilities. This book is packed with easy-to-follow code and commands used for illustration, which makes learning quick and easy. If you are a Hadoop cluster system administrator with Unix/Linux system management experience and you are looking to get a good grounding in how to set up and manage a Hadoop cluster, then this book is for you. It's assumed that you will already have some experience with the Unix/Linux command line, as well as being familiar with network communication.
ACQUISITION INFORMATION NOTE
Source for Acquisition/Subscription Address
Safari Books Online
Stock Number
CL0500000301
OTHER EDITION IN ANOTHER MEDIUM
Title
Hadoop Operations and Cluster Management Cookbook
International Standard Book Number
9781782165163
PIECE
Title
Safari books online
TOPICAL NAME USED AS SUBJECT
Electronic data processing-- Distributed processing.
File organization (Computer science)
Apache Hadoop (Computer file)
Cloud computing