Build highly effective analytics solutions to gain valuable insight into your big data.
Publication, distribution, etc.
Place of publication, distribution, etc.
Birmingham :
Name of publisher, distributor, etc.
Packt Publishing,
Date of publication, distribution, etc.
2018.
Physical description
Specific material designation and extent of item
1 online resource (471 pages)
Contents notes
Text of note
Cover; Title Page; Copyright and Credits; Packt Upsell; Contributors; Table of Contents; Preface; Chapter 1: Introduction to Hadoop; Hadoop Distributed File System; High availability; Intra-DataNode balancer; Erasure coding; Port numbers; MapReduce framework; Task-level native optimization; YARN; Opportunistic containers; Types of container execution; YARN timeline service v. 2; Enhancing scalability and reliability; Usability improvements; Architecture; Other changes; Minimum required Java version; Shell script rewrite; Shaded-client JARs; Installing Hadoop 3; Prerequisites; Downloading.
Text of note
Installation; Setup password-less ssh; Setting up the NameNode; Starting HDFS; Setting up the YARN service; Erasure Coding; Intra-DataNode balancer; Installing YARN timeline service v. 2; Setting up the HBase cluster; Simple deployment for HBase; Enabling the co-processor; Enabling timeline service v. 2; Running timeline service v. 2; Enabling MapReduce to write to timeline service v. 2; Summary; Chapter 2: Overview of Big Data Analytics; Introduction to data analytics; Inside the data analytics process; Introduction to big data; Variety of data; Velocity of data; Volume of data; Veracity of data.
Text of note
Installing standard Python; Installing Anaconda; Using Conda; Data analysis; Summary; Chapter 5: Statistical Big Data Computing with R and Hadoop; Introduction; Install R on workstations and connect to the data in Hadoop; Install R on a shared server and connect to Hadoop; Utilize Revolution R Open; Execute R inside of MapReduce using RMR2; Summary and outlook for pure open source options; Methods of integrating R and Hadoop; RHADOOP -- install R on workstations and connect to data in Hadoop; RHIPE -- execute R inside Hadoop MapReduce; R and Hadoop Streaming.
Text of note
Record reader; Map; Combiner; Partitioner; Shuffle and sort; Reduce; Output format; MapReduce job types; Single mapper job; Single mapper reducer job; Multiple mappers reducer job; SingleMapperCombinerReducer job; Scenario; MapReduce patterns; Aggregation patterns; Average temperature by city; Record count; Min/max/count; Average/median/standard deviation; Filtering patterns; Join patterns; Inner join; Left anti join; Left outer join; Right outer join; Full outer join; Left semi join; Cross join; Summary; Chapter 4: Scientific Computing and Big Data Analysis with Python and Hadoop; Installation.
Text of note
RHIVE -- install R on workstations and connect to data in Hadoop.
Text of note
Variability of data; Visualization; Value; Distributed computing using Apache Hadoop; The MapReduce framework; Hive; Downloading and extracting the Hive binaries; Installing Derby; Using Hive; Creating a database; Creating a table; SELECT statement syntax; WHERE clauses; INSERT statement syntax; Primitive types; Complex types; Built-in operators and functions; Built-in operators; Built-in functions; Language capabilities; A cheat sheet on retrieving information; Apache Spark; Visualization using Tableau; Summary; Chapter 3: Big Data Processing with MapReduce; The MapReduce framework; Dataset.
Summary or abstract notes
Text of note
Apache Hadoop is the most popular platform for big data processing and for building powerful analytics solutions. This book shows you how to do just that, with the help of practical examples. By the end of this book, you will be well versed in the analytical capabilities of the Hadoop ecosystem, using Apache Spark and Apache Flink to perform big data analytics.
Acquisition notes
Source of order / subscription address
Packt Publishing
Stock number
9781788624954
Other edition of the work in another medium
Title
Big Data Analytics with Hadoop 3 : Build highly effective analytics solutions to gain valuable insight into your big data.
Subject (topical name or common noun phrase)
Uncontrolled subject term
Big data.
Uncontrolled subject term
Cluster analysis.
Uncontrolled subject term
Electronic data processing -- Distributed processing.
Uncontrolled subject term
Big data.
Uncontrolled subject term
Cluster analysis -- Data processing.
Uncontrolled subject term
Computers -- Data Modeling & Design.
Uncontrolled subject term
Computers -- Data Processing.
Uncontrolled subject term
Computers -- Database Management -- Data Warehousing.
Uncontrolled subject term
Data capture & analysis.
Uncontrolled subject term
Data warehousing.
Uncontrolled subject term
Database design & theory.
Uncontrolled subject term
Electronic data processing -- Distributed processing.
Uncontrolled subject term
Information architecture.
Subject category
Uncontrolled subject term
COM-- 089000
Dewey classification
Number
005.7
Library of Congress classification
Class number
QA76.9.B45.A453 2018
Personal name as main entry (primary intellectual responsibility)