Title

Practical Enterprise Data Lake Insights :

Author

Saurabh Gupta, Venkata Giri.

Subject

Big data.; Electronic data processing--Distributed processing--Management.; Information storage and retrieval systems.; Business mathematics & systems.; COMPUTERS--Data Processing.; Databases.; Information technology: general issues.

Classification

QA76.9.D5

Library

Center and Library of Islamic Studies in European Languages

Location

Province: Qom; City: Qom

Library contact: 32910706-025

ISBN

1484235215

ISBN

1484235223

ISBN

9781484235218

ISBN

9781484235225

Invalid ISBN

9781484235218

Title and Statement of Responsibility

Title Proper

Practical Enterprise Data Lake Insights :

General Material Designation

[Book]

Other Title Information

handle data-driven challenges in an Enterprise Big Data Lake /

First Statement of Responsibility

Saurabh Gupta, Venkata Giri.

Publication, Distribution, etc.

Place of Publication, Distribution, etc.

[Berkeley, CA] :

Name of Publisher, Distributor, etc.

Apress,

Date of Publication, Distribution, etc.

2018.

Physical Description

Specific Material Designation and Extent

1 online resource

Notes on Bibliography, Glossary, and Indexes

Note Text

Includes bibliographical references.

Contents Notes

Note Text

Intro; Table of Contents; About the Authors; About the Technical Reviewer; Acknowledgments; Foreword; Chapter 1: Introduction to Enterprise Data Lakes; Data explosion: the beginning; Big data ecosystem; Hadoop and MapReduce -- Early days; Evolution of Hadoop; History of Data Lake; Data Lake: the concept; Data lake architecture; Why Data Lake?; Data Lake Characteristics; Data lake vs. Data warehouse; How to achieve success with Data Lake?; Data governance and data operations; Data democratization with data lake; Fast Data -- Life beyond Big Data; Conclusion.

Note Text

Chapter 2: Data lake ingestion strategies; What is data ingestion?; Understand the data sources; Structured vs. Semi-structured vs. Unstructured data; Data ingestion framework parameters; ETL vs. ELT; Big Data Integration with Data Lake; Hadoop Distributed File System (HDFS); Copy files directly into HDFS; Batched data ingestion; Challenges and design considerations; Design considerations; Commercial ETL tools; Real-time ingestion; CDC design considerations; Example of CDC pipeline: Databus, LinkedIn's open-source solution; Apache Sqoop; Sqoop 1; Sqoop 2; How Sqoop works?

Note Text

Sqoop design considerations; Native ingestion utilities; Oracle copyToBDA; Greenplum gphdfs utility; Data transfer from Greenplum to using gpfdist; Ingest unstructured data into Hadoop; Apache Flume; Tiered architecture for convergent flow of events; Features and design considerations; Conclusion; Chapter 3: Capture Streaming Data with Change-Data-Capture; Change Data Capture Concepts; Strategies for Data Capture; Retention and Replay; Retention Period; Types of CDC; Incremental; Bulk; Hybrid; CDC -- Trade-offs; CDC Tools; Challenges; Downstream Propagation; Use Case.

Note Text

Centralization of Change Data; Analyzing a Centralized Data Store; Metadata: Data about Data; Structure of Data; Privacy/Sensitivity Information; Special Fields; Data Formats; Delimited Format; Avro File Format; Consumption and Checkpointing; Simple Checkpoint Mechanism; Parallelism; Merging and Consolidation; Design Considerations for Merge and Consolidate; Data Quality; Challenges; Design Aspects; Operational Aspects; Publishing to Kafka; Schema and Data; Sample Schema; Schema Repository; Multiple Topics and Partitioning; Sizing and Scaling; Tools; Conclusion.

Note Text

Chapter 4: Data Processing Strategies in Data Lakes; MapReduce Processing Framework; Motivation: Why MapReduce?; MapReduce V1 Refresher and Design Considerations; Yet Another Resource Negotiator -- YARN; YARN concepts; Hive; Hive -- Quick Refresher; Hive Components; Hive Metastore (a.k.a. HCatalog); Hive -- Design Considerations; Hive LLAP; Apache Pig; Pig Execution Architecture; Apache Spark; Why Spark?; Resilient Distributed Datasets (RDD); RDD Runtime Components; RDD Composition; Datasets and DataFrames; Bucketing, Sorting, and Partitioning; Deployment Modes of Spark Application.


Summary or Abstract Notes

Note Text

Use this practical guide to successfully handle the challenges encountered when designing an enterprise data lake and learn industry best practices to resolve issues. When designing an enterprise data lake you often hit a roadblock when you must leave the comfort of the relational world and learn the nuances of handling non-relational data. Starting from sourcing data into the Hadoop ecosystem, you will go through stages that can bring up tough questions such as data processing, data querying, and security. Concepts such as change data capture and data streaming are covered. The book takes an end-to-end solution approach in a data lake environment that includes data security, high availability, data processing, data streaming, and more. Each chapter includes application of a concept, code snippets, and use case demonstrations to provide you with a practical approach. You will learn the concept, scope, application, and starting point. What You'll Learn: Get to know data lake architecture and design principles; Implement data capture and streaming strategies; Implement data processing strategies in Hadoop; Understand the data lake security framework and availability model.

Acquisition Notes

Source of Acquisition / Subscription Address

Springer Nature

Stock Number

com.springer.onix.9781484235225

Other Edition in Another Medium

International Standard Book or Music Number

9781484235218

Subject (Topical Term or Phrase)

Uncontrolled Subject

Big data.

Uncontrolled Subject

Electronic data processing--Distributed processing--Management.

Uncontrolled Subject

Information storage and retrieval systems.

Uncontrolled Subject

Business mathematics & systems.

Uncontrolled Subject

COMPUTERS--Data Processing.

Uncontrolled Subject

Databases.

Uncontrolled Subject

Information technology: general issues.

Subject Category

Uncontrolled Subject

COM018000

Dewey Decimal Classification

Number

004

Edition

Library of Congress Classification

Class Number

QA76

Personal Name as Main Entry (Primary Responsibility)

Unverified Personal Name Authority

Gupta, Saurabh

Personal Name (Shared Responsibility)

Unverified Personal Name Authority

Giri, Venkata

Originating Source

Date of Transaction

20200823032047.0

Cataloging Rules (Descriptive Part)

Electronic Location and Access

Electronic Name

Bibliographic Record Information

Type of Material

[Book]

Record Access Information

Completed
