Business Intelligence: A Managerial Perspective on Analytics, 3rd Ed., by Dursun Delen – Test Bank


Business Intelligence, 3e (Sharda/Delen/Turban)

Chapter 2   Data Warehousing

 

1) In the Isle of Capri case, the only capability added by the new software was increased speed of report processing.

Answer:  FALSE

Diff: 2    Page Ref: 38

 

2) The “islands of data” problem in the 1980s describes the phenomenon of unconnected data being stored in numerous locations within an organization.

Answer:  TRUE

Diff: 2    Page Ref: 41

 

3) Subject-oriented databases for data warehousing are organized by detailed subjects such as disk drives, computers, and networks.

Answer:  FALSE

Diff: 2    Page Ref: 42

 

4) Data warehouses are subsets of data marts.

Answer:  FALSE

Diff: 1    Page Ref: 43

 

5) One way an operational data store differs from a data warehouse is the recency of its data.

Answer:  TRUE

Diff: 2    Page Ref: 43-44

 

6) Organizations seldom devote a lot of effort to creating metadata because it is not important for the effective use of data warehouses.

Answer:  FALSE

Diff: 2    Page Ref: 46

 

7) Without middleware, different BI programs cannot easily connect to the data warehouse.

Answer:  TRUE

Diff: 2    Page Ref: 48-49

 

8) Two-tier data warehouse/BI infrastructures offer organizations more flexibility but cost more than three-tier ones.

Answer:  FALSE

Diff: 2    Page Ref: 50

 

9) Moving the data into a data warehouse is usually the easiest part of its creation.

Answer:  FALSE

Diff: 2    Page Ref: 52

 

 

10) The hub-and-spoke data warehouse model uses a centralized warehouse feeding dependent data marts.

Answer:  TRUE

Diff: 2    Page Ref: 52

11) Because of performance and data quality issues, most experts agree that the federated architecture should supplement data warehouses, not replace them.

Answer:  TRUE

Diff: 2    Page Ref: 54

 

12) Bill Inmon advocates the data mart bus architecture whereas Ralph Kimball promotes the hub-and-spoke architecture, a data mart bus architecture with conformed dimensions.

Answer:  FALSE

Diff: 2    Page Ref: 55

 

13) The ETL process in data warehousing usually takes up a small portion of the time in a data-centric project.

Answer:  FALSE

Diff: 3    Page Ref: 59

 

14) In the Starwood Hotels case, up-to-date data and faster reporting helped hotel managers better manage their occupancy rates.

Answer:  TRUE

Diff: 1    Page Ref: 66

 

15) Large companies, especially those with revenue upwards of $500 million, consistently reap substantial cost savings through the use of hosted data warehouses.

Answer:  FALSE

Diff: 2    Page Ref: 68

 

16) OLTP systems are designed to handle ad hoc analysis and complex queries that deal with many data items.

Answer:  FALSE

Diff: 2    Page Ref: 70

 

17) The data warehousing maturity model consists of six stages: prenatal, infant, child, teenager, adult, and sage.

Answer:  TRUE

Diff: 2    Page Ref: 73

 

18) A well-designed data warehouse means that user requirements do not have to change as business needs change.

Answer:  FALSE

Diff: 2    Page Ref: 77

 

 

19) Data warehouse administrators (DWAs) do not need strong business insight since they only handle the technical aspects of the infrastructure.

Answer:  FALSE

Diff: 2    Page Ref: 82

 

20) Because the recession has raised interest in low-cost open source software, it is now set to replace traditional enterprise software.

Answer:  FALSE

Diff: 2    Page Ref: 83

21) The “single version of the truth” embodied in a data warehouse such as Capri Casinos’ means all of the following EXCEPT

  1. A) decision makers get to see the same results to queries.
  2. B) decision makers have the same data available to support their decisions.
  3. C) decision makers get to use more dependable data for their decisions.
  4. D) decision makers have unfettered access to all data in the warehouse.

Answer:  D

Diff: 3    Page Ref: 40

 

22) Operational or transaction databases are product oriented, handling transactions that update the database. In contrast, data warehouses are

  1. A) subject-oriented and nonvolatile.
  2. B) product-oriented and nonvolatile.
  3. C) product-oriented and volatile.
  4. D) subject-oriented and volatile.

Answer:  A

Diff: 3    Page Ref: 40

 

23) Which kind of data warehouse is created separately from the enterprise data warehouse by a department and not reliant on it for updates?

  1. A) sectional data mart
  2. B) public data mart
  3. C) independent data mart
  4. D) volatile data mart

Answer:  C

Diff: 2    Page Ref: 43

 

24) All of the following statements about metadata are true EXCEPT

  1. A) metadata gives context to reported data.
  2. B) there may be ethical issues involved in the creation of metadata.
  3. C) metadata helps to describe the meaning and structure of data.
  4. D) for most organizations, data warehouse metadata are an unnecessary expense.

Answer:  D

Diff: 2    Page Ref: 45-46

 

 

25) A Web client that connects to a Web server, which is in turn connected to a BI application server, is reflective of a

  1. A) one-tier architecture.
  2. B) two-tier architecture.
  3. C) three-tier architecture.
  4. D) four-tier architecture.

Answer:  C

Diff: 2    Page Ref: 49-50

26) Which of the following BEST enables a data warehouse to handle complex queries and scale up to handle many more requests?

  1. A) use of the web by users as a front-end
  2. B) parallel processing
  3. C) Microsoft Windows
  4. D) a larger IT staff

Answer:  B

Diff: 3    Page Ref: 51

 

27) Which data warehouse architecture uses metadata from existing data warehouses to create a hybrid logical data warehouse comprised of data from the other warehouses?

  1. A) independent data marts architecture
  2. B) centralized data warehouse architecture
  3. C) hub-and-spoke data warehouse architecture
  4. D) federated architecture

Answer:  D

Diff: 3    Page Ref: 53

 

28) Which data warehouse architecture uses a normalized relational warehouse that feeds multiple data marts?

  1. A) independent data marts architecture
  2. B) centralized data warehouse architecture
  3. C) hub-and-spoke data warehouse architecture
  4. D) federated architecture

Answer:  C

Diff: 3    Page Ref: 53

 

29) Which approach to data warehouse integration focuses more on sharing process functionality than data across systems?

  1. A) extraction, transformation, and load
  2. B) enterprise application integration
  3. C) enterprise information integration
  4. D) enterprise function integration

Answer:  B

Diff: 3    Page Ref: 58-59

 

 

30) In which stage of extraction, transformation, and load (ETL) into a data warehouse are data aggregated?

  1. A) transformation
  2. B) extraction
  3. C) load
  4. D) cleanse

Answer:  A

Diff: 3    Page Ref: 59

31) In which stage of extraction, transformation, and load (ETL) into a data warehouse are anomalies detected and corrected?

  1. A) transformation
  2. B) extraction
  3. C) load
  4. D) cleanse

Answer:  D

Diff: 3    Page Ref: 59
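To make the transformation (aggregation) and cleansing stages in the two questions above concrete, here is a minimal illustrative ETL sketch in Python against an in-memory SQLite database; the table and column names (raw_sales, sales_summary) are invented for the example.

```python
import sqlite3

def extract(conn):
    # Extraction: pull raw rows from the operational source table.
    return conn.execute("SELECT region, amount FROM raw_sales").fetchall()

def transform(rows):
    # Cleansing: detect and drop anomalous rows (missing or negative amounts).
    clean = [(region, amount) for region, amount in rows
             if amount is not None and amount >= 0]
    # Aggregation: summarize the cleansed amounts by region.
    totals = {}
    for region, amount in clean:
        totals[region] = totals.get(region, 0.0) + amount
    return totals

def load(conn, totals):
    # Load: write the transformed results into the warehouse table.
    conn.executemany("INSERT INTO sales_summary VALUES (?, ?)",
                     list(totals.items()))
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_sales (region TEXT, amount REAL)")
conn.execute("CREATE TABLE sales_summary (region TEXT, total REAL)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?)",
                 [("east", 100.0), ("east", -5.0), ("west", 40.0)])
load(conn, transform(extract(conn)))
print(conn.execute("SELECT * FROM sales_summary").fetchall())
# [('east', 100.0), ('west', 40.0)]
```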

 

32) Data warehouses provide direct and indirect benefits to using organizations. Which of the following is an indirect benefit of data warehouses?

  1. A) better and more timely information
  2. B) extensive new analyses performed by users
  3. C) simplified access to data
  4. D) improved customer service

Answer:  D

Diff: 3    Page Ref: 61

 

33) All of the following are benefits of hosted data warehouses EXCEPT

  1. A) smaller upfront investment.
  2. B) better quality hardware.
  3. C) greater control of data.
  4. D) frees up in-house systems.

Answer:  C

Diff: 2    Page Ref: 68

 

34) When representing data in a data warehouse, using several dimension tables that are each connected only to a fact table means you are using which warehouse structure?

  1. A) star schema
  2. B) snowflake schema
  3. C) relational schema
  4. D) dimensional schema

Answer:  A

Diff: 3    Page Ref: 68-69
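As a minimal sketch of the star schema named in the answer above, the following Python/SQLite snippet builds one fact table with two dimension tables connected only to it; the table names and data are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each dimension table connects only to the central fact table: a star schema.
conn.execute("CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER)")
conn.execute("CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    units INTEGER)""")
conn.execute("INSERT INTO dim_date VALUES (1, 2014)")
conn.execute("INSERT INTO dim_product VALUES (10, 'widget')")
conn.execute("INSERT INTO fact_sales VALUES (1, 10, 25)")

# A typical star join: facts labeled and grouped via the dimension tables.
rows = conn.execute("""
    SELECT d.year, p.name, SUM(f.units)
    FROM fact_sales f
    JOIN dim_date d ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
    GROUP BY d.year, p.name""").fetchall()
print(rows)  # [(2014, 'widget', 25)]
```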

 

 

35) When querying a dimensional database, a user went from summarized data to its underlying details. The function that served this purpose is

  1. A) dice.
  2. B) slice.
  3. C) roll-up.
  4. D) drill down.

Answer:  D

Diff: 3    Page Ref: 70-71
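A small illustration of drill-down (and its opposite, roll-up) using pandas on made-up data; the region-to-city grouping stands in for a dimension hierarchy.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "East", "West"],
    "city":   ["Boston", "NYC", "LA"],
    "amount": [100, 150, 200],
})

# Roll-up: summarize at the higher level of the hierarchy.
print(sales.groupby("region")["amount"].sum())

# Drill down: go from the summarized view to its underlying detail.
print(sales.groupby(["region", "city"])["amount"].sum())
```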

36) Which of the following online analytical processing (OLAP) technologies does NOT require the precomputation and storage of information?

  1. A) MOLAP
  2. B) ROLAP
  3. C) HOLAP
  4. D) SQL

Answer:  B

Diff: 2    Page Ref: 71-72

 

37) Active data warehousing can be used to support the highest level of decision making sophistication and power. The major feature that enables this in relation to handling the data is

  1. A) country of (data) origin.
  2. B) nature of the data.
  3. C) speed of data transfer.
  4. D) source of the data.

Answer:  C

Diff: 2    Page Ref: 77

 

38) Which of the following statements is more descriptive of active data warehouses in contrast with traditional data warehouses?

  1. A) strategic decisions whose impacts are hard to measure
  2. B) detailed data available for strategic use only
  3. C) large numbers of users, including operational staffs
  4. D) restrictive reporting with daily and weekly data currency

Answer:  C

Diff: 3    Page Ref: 81

 

39) How does the use of cloud computing affect the scalability of a data warehouse?

  1. A) Cloud computing vendors bring as much hardware as needed to users’ offices.
  2. B) Hardware resources are dynamically allocated as use increases.
  3. C) Cloud vendors are mostly based overseas where the cost of labor is low.
  4. D) Cloud computing has little effect on a data warehouse’s scalability.

Answer:  B

Diff: 3    Page Ref: 83

 

 

40) All of the following are true about in-database processing technology EXCEPT

  1. A) it pushes the algorithms to where the data is.
  2. B) it makes the response to queries much faster than conventional databases.
  3. C) it is often used for apps like credit card fraud detection and investment risk management.
  4. D) it is the same as in-memory storage technology.

Answer:  D

Diff: 3    Page Ref: 85

 

41) With ________ data flows, managers can view the current state of their businesses and quickly identify problems.

Answer:  real-time

Diff: 2    Page Ref: 40

42) In ________ oriented data warehousing, operational databases are tuned to handle transactions that update the database.

Answer:  product

Diff: 2    Page Ref: 42

 

43) The three main types of data warehouses are data marts, operational ________, and enterprise data warehouses.

Answer:  data stores

Diff: 2    Page Ref: 43

 

44) ________ describe the structure and meaning of the data, contributing to their effective use.

Answer:  Metadata

Diff: 1    Page Ref: 45

 

45) Most data warehouses are built using ________ database management systems to control and manage the data.

Answer:  relational

Diff: 2    Page Ref: 51

 

46) A(n) ________ architecture is used to build a scalable and maintainable infrastructure that includes a centralized data warehouse and several dependent data marts.

Answer:  hub-and-spoke

Diff: 2    Page Ref: 52
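A toy sketch of the hub-and-spoke idea, with invented data: the dependent data marts are derived from (and refreshed by) the centralized warehouse rather than being loaded independently from source systems.

```python
import pandas as pd

# Hub: the centralized, integrated warehouse.
warehouse = pd.DataFrame({
    "dept":   ["marketing", "sales", "sales"],
    "metric": ["clicks", "units", "units"],
    "value":  [1200, 45, 30],
})

# Spokes: dependent data marts derived from the hub.
marketing_mart = warehouse[warehouse["dept"] == "marketing"].copy()
sales_mart = warehouse[warehouse["dept"] == "sales"].copy()
print(sales_mart)
```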

 

47) The ________ data warehouse architecture involves integrating disparate systems and analytical resources from multiple sources to meet changing needs or business conditions.

Answer:  federated

Diff: 2    Page Ref: 54

 

48) Data ________ comprises data access, data federation, and change capture.

Answer:  integration

Diff: 3    Page Ref: 57

 

 

49) ________ is a mechanism that integrates application functionality and shares functionality (rather than data) across systems, thereby enabling flexibility and reuse.

Answer:  Enterprise application integration (EAI)

Diff: 3    Page Ref: 58

 

50) ________ is a mechanism for pulling data from source systems to satisfy a request for information. It is an evolving tool space that promises real-time data integration from a variety of sources, such as relational databases, Web services, and multidimensional databases.

Answer:  Enterprise information integration (EII)

Diff: 3    Page Ref: 59

 

51) Performing extensive ________ to move data to the data warehouse may be a sign of poorly managed data and a fundamental lack of a coherent data management strategy.

Answer:  extraction, transformation, and load (ETL)

Diff: 3    Page Ref: 61

52) The ________ Model, also known as the EDW approach, emphasizes top-down development, employing established database development methodologies and tools, such as entity-relationship diagrams (ERD), and an adjustment of the spiral development approach.

Answer:  Inmon

Diff: 2    Page Ref: 65

 

53) The ________ Model, also known as the data mart approach, is a “plan big, build small” approach. A data mart is a subject-oriented or department-oriented data warehouse. It is a scaled-down version of a data warehouse that focuses on the requests of a specific department, such as marketing or sales.

Answer:  Kimball

Diff: 2    Page Ref: 65

 

54) ________ modeling is a retrieval-based system that supports high-volume query access.

Answer:  Dimensional

Diff: 2    Page Ref: 68

 

55) Online ________ is arguably the most commonly used data analysis technique in data warehouses.

Answer:  analytical processing

Diff: 1    Page Ref: 69

 

56) Online ________ is a term used for a transaction system that is primarily responsible for capturing and storing data related to day-to-day business functions such as ERP, CRM, SCM, and point of sale.

Answer:  transaction processing

Diff: 2    Page Ref: 69

 

 

57) In the Michigan State Agencies case, the approach used was a(n) ________ one, instead of developing separate BI/DW platforms for each business area or state agency.

Answer:  enterprise

Diff: 2    Page Ref: 76

 

58) The role responsible for successful administration and management of a data warehouse is the ________, who should be familiar with high-performance software, hardware, and networking technologies, and who also possesses solid business insight.

Answer:  data warehouse administrator (DWA)

Diff: 2    Page Ref: 82

 

59) ________, or “The Extended ASP Model,” is a creative way of deploying information system applications in which the provider licenses its applications to customers for use as a service on demand (usually over the Internet).

Answer:  SaaS (software as a service)

Diff: 2    Page Ref: 83

 

60) ________ (also called in-database analytics) refers to the integration of the algorithmic extent of data analytics into the data warehouse.

Answer:  In-database processing

Diff: 2    Page Ref: 85
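A rough contrast of conventional client-side processing versus the in-database approach, using SQLite and an invented fraud-style table (txn): pushing the computation to where the data lives means only the small result set leaves the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (card_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO txn VALUES (?, ?)",
                 [(1, 20.0), (1, 980.0), (2, 15.0)])

# Conventional approach: ship every row to the application, then compute.
totals = {}
for card_id, amount in conn.execute("SELECT card_id, amount FROM txn"):
    totals[card_id] = totals.get(card_id, 0.0) + amount

# In-database approach: push the algorithm to the data; only totals come back.
in_db = conn.execute(
    "SELECT card_id, SUM(amount) FROM txn GROUP BY card_id").fetchall()
print(totals, in_db)
```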

61) What is the definition of a data warehouse (DW) in simple terms?

Answer:  In simple terms, a data warehouse (DW) is a pool of data produced to support decision making; it is also a repository of current and historical data of potential interest to managers throughout the organization.

Diff: 2    Page Ref: 40

 

 

62) A common way of introducing data warehousing is to refer to its fundamental characteristics. Describe three characteristics of data warehousing.

Answer:

∙     Subject oriented. Data are organized by detailed subject, such as sales, products, or customers, containing only information relevant for decision support.

∙     Integrated. Integration is closely related to subject orientation. Data warehouses must place data from different sources into a consistent format. To do so, they must deal with naming conflicts and discrepancies among units of measure. A data warehouse is presumed to be totally integrated.

∙     Time variant (time series). A warehouse maintains historical data. The data do not necessarily provide current status (except in real-time systems). They detect trends, deviations, and long-term relationships for forecasting and comparisons, leading to decision making. Every data warehouse has a temporal quality. Time is the one important dimension that all data warehouses must support. Data for analysis from multiple sources contains multiple time points (e.g., daily, weekly, monthly views).

∙     Nonvolatile. After data are entered into a data warehouse, users cannot change or update the data. Obsolete data are discarded, and changes are recorded as new data.

∙     Web based. Data warehouses are typically designed to provide an efficient computing environment for Web-based applications.

∙     Relational/multidimensional. A data warehouse uses either a relational structure or a multidimensional structure. A recent survey on multidimensional structures can be found in Romero and Abelló (2009).

∙     Client/server. A data warehouse uses the client/server architecture to provide easy access for end users.

∙     Real time. Newer data warehouses provide real-time, or active, data-access and analysis capabilities (see Basu, 2003; and Bonde and Kuckuk, 2004).

∙     Include metadata. A data warehouse contains metadata (data about data) about how the data are organized and how to effectively use them.

Diff: 3    Page Ref: 42-43

 

63) What is the definition of a data mart?

Answer:  A data mart is a subset of a data warehouse, typically consisting of a single subject area (e.g., marketing, operations). Whereas a data warehouse combines databases across an entire enterprise, a data mart is usually smaller and focuses on a particular subject or department.

Diff: 2    Page Ref: 43

 

64) Mehra (2005) indicated that few organizations really understand metadata, and fewer understand how to design and implement a metadata strategy. How would you describe metadata?

Answer:  Metadata are data about data. Metadata describe the structure of and some meaning about data, thereby contributing to their effective or ineffective use.

Diff: 2    Page Ref: 45-46
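As a simple illustration of data about data (all field names invented), metadata for a single warehouse column might record its structure and its meaning side by side:

```python
# Metadata: structure (table, column, type) plus meaning (unit, description).
order_amount_metadata = {
    "table": "fact_orders",
    "column": "order_amount",
    "type": "DECIMAL(10,2)",
    "unit": "USD",
    "source_system": "ERP",
    "refresh": "nightly ETL",
    "description": "Total order value after discounts, before tax",
}
print(order_amount_metadata["description"])
```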

 

65) According to Kassam (2002), business metadata comprise information that increases our understanding of traditional (i.e., structured) data. What is the primary purpose of metadata?

Answer:  The primary purpose of metadata should be to provide context to the reported data; that is, it provides enriching information that leads to the creation of knowledge.

Diff: 2    Page Ref: 46

 

66) In the MultiCare case, how was data warehousing able to reduce septicemia mortality rates in MultiCare hospitals?

Answer:

∙     The Adaptive Data Warehouse™ organized and simplified data from multiple data sources across the continuum of care. It became the single source of truth required to see care improvement opportunities and to measure change. Integrated teams consisting of clinicians, technologists, analysts, and quality personnel were essential for accelerating MultiCare’s efforts to reduce septicemia mortality.

∙     Together, the collaborative effort addressed three key bodies of work: standard of care definition, early identification, and efficient delivery of the defined standard of care.

Diff: 3    Page Ref: 47-48

 

67) Briefly describe four major components of the data warehousing process.

Answer:

∙     Data sources. Data are sourced from multiple independent operational “legacy” systems and possibly from external data providers (such as the U.S. Census). Data may also come from an OLTP or ERP system.

∙     Data extraction and transformation. Data are extracted and properly transformed using custom-written or commercial ETL software.

∙     Data loading. Data are loaded into a staging area, where they are transformed and cleansed. The data are then ready to load into the data warehouse and/or data marts.

∙     Comprehensive database. Essentially, this is the EDW to support all decision analysis by providing relevant summarized and detailed information originating from many different sources.

∙     Metadata. Metadata include software programs about data and rules for organizing data summaries that are easy to index and search, especially with Web tools.

∙     Middleware tools. Middleware tools enable access to the data warehouse. There are many front-end applications that business users can use to interact with data stored in the data repositories, including data mining, OLAP, reporting tools, and data visualization tools.

Diff: 2    Page Ref: 48-49

 

68) There are several basic information system architectures that can be used for data warehousing. What are they?

Answer:  Generally speaking, these architectures are commonly called client/server or n-tier architectures, of which two-tier and three-tier architectures are the most common, but sometimes there is simply one tier.

Diff: 2    Page Ref: 49-50

 

 

69) More data, coming in faster and requiring immediate conversion into decisions, means that organizations are confronting the need for real-time data warehousing (RDW). How would you define real-time data warehousing?

Answer:  Real-time data warehousing, also known as active data warehousing (ADW), is the process of loading and providing data via the data warehouse as they become available.

Diff: 2    Page Ref: 77

70) Mention briefly some of the recently popularized concepts and technologies that will play a significant role in defining the future of data warehousing.

Answer:

∙     Sourcing (mechanisms for acquisition of data from diverse and dispersed sources):

o    Web, social media, and Big Data

o    Open source software

o    SaaS (software as a service)

o    Cloud computing

∙     Infrastructure (architectural enhancements to hardware and software):

o    Columnar (a new way to store and access data in the database)

o    Real-time data warehousing

o    Data warehouse appliances (all-in-one solutions to DW)

o    Data management technologies and practices

o    In-database processing technology (putting the algorithms where the data is)

o    In-memory storage technology (moving the data into memory for faster processing)

o    New database management systems

o    Advanced analytics

Diff: 3    Page Ref: 83-86

 

Business Intelligence, 3e (Sharda/Delen/Turban)

Chapter 6   Big Data and Analytics

 

1) In the opening vignette, the CERN Data Aggregation System (DAS), built on MongoDB (a Big Data management infrastructure), used relational database technology.

Answer:  FALSE

Diff: 2    Page Ref: 277

 

2) The term “Big Data” is relative as it depends on the size of the using organization.

Answer:  TRUE

Diff: 2    Page Ref: 279

 

3) In the Luxottica case study, outsourcing enhanced the ability of the company to gain insights into their data.

Answer:  FALSE

Diff: 2    Page Ref: 283-284

 

4) Many analytics tools are too complex for the average user, and this is one justification for Big Data.

Answer:  TRUE

Diff: 2    Page Ref: 284

 

5) In the investment bank case study, the major benefit brought about by the supplanting of multiple databases by the new trade operational store was providing real-time access to trading data.

Answer:  TRUE

Diff: 2    Page Ref: 288

 

6) Big Data uses commodity hardware, which is expensive, specialized hardware that is custom built for a client or application.

Answer:  FALSE

Diff: 2    Page Ref: 289

 

7) MapReduce can be easily understood by skilled programmers due to its procedural nature.

Answer:  TRUE

Diff: 2    Page Ref: 291

 

8) Hadoop was designed to handle petabytes and exabytes of data distributed over multiple nodes in parallel.

Answer:  TRUE

Diff: 2    Page Ref: 291

 

9) Hadoop and MapReduce require each other to work.

Answer:  FALSE

Diff: 2    Page Ref: 295

 

 

10) In most cases, Hadoop is used to replace data warehouses.

Answer:  FALSE

Diff: 2    Page Ref: 295

11) Despite their potential, many current NoSQL tools lack mature management and monitoring tools.

Answer:  TRUE

Diff: 2    Page Ref: 295

 

12) The data scientist is a profession for a field that is still largely being defined.

Answer:  TRUE

Diff: 2    Page Ref: 298

 

13) There is a current undersupply of data scientists for the Big Data market.

Answer:  TRUE

Diff: 2    Page Ref: 300

 

14) The Big Data and Analysis in Politics case study makes it clear that the unpredictability of elections makes politics an unsuitable arena for Big Data.

Answer:  FALSE

Diff: 2    Page Ref: 301

 

15) For low latency, interactive reports, a data warehouse is preferable to Hadoop.

Answer:  TRUE

Diff: 2    Page Ref: 306

 

16) If you have many flexible programming languages running in parallel, Hadoop is preferable to a data warehouse.

Answer:  TRUE

Diff: 2    Page Ref: 306

 

17) In the Dublin City Council case study, GPS data from the city’s buses and CCTV were the only data sources for the Big Data GIS-based application.

Answer:  FALSE

Diff: 2    Page Ref: 309-310

 

18) It is important for Big Data and self-service business intelligence to go hand in hand to get maximum value from analytics.

Answer:  TRUE

Diff: 1    Page Ref: 313

 

19) Big Data simplifies data governance issues, especially for global firms.

Answer:  FALSE

Diff: 2    Page Ref: 313

 

 

20) Current total storage capacity lags behind the digital information being generated in the world.

Answer:  TRUE

Diff: 2    Page Ref: 315

21) Using data to understand customers/clients and business operations to sustain and foster growth and profitability is

  1. A) easier with the advent of BI and Big Data.
  2. B) essentially the same now as it has always been.
  3. C) an increasingly challenging task for today’s enterprises.
  4. D) now completely automated with no human intervention required.

Answer:  C

Diff: 2    Page Ref: 279

 

22) A newly popular unit of data in the Big Data era is the petabyte (PB), which is

  1. A) 10^9 bytes
  2. B) 10^12 bytes
  3. C) 10^15 bytes
  4. D) 10^18 bytes

Answer:  C

Diff: 2    Page Ref: 281

 

23) Which of the following sources is likely to produce Big Data the fastest?

  1. A) order entry clerks
  2. B) cashiers
  3. C) RFID tags
  4. D) online customers

Answer:  C

Diff: 2    Page Ref: 281-282

 

24) Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of Big Data called?

  1. A) volatility
  2. B) periodicity
  3. C) inconsistency
  4. D) variability

Answer:  D

Diff: 2    Page Ref: 282

 

25) In the Luxottica case study, what technique did the company use to gain visibility into its customers?

  1. A) visibility analytics
  2. B) data integration
  3. C) focus on growth
  4. D) customer focus

Answer:  B

Diff: 2    Page Ref: 283-284

26) Allowing Big Data to be processed in memory and distributed across a dedicated set of nodes can solve complex problems in near-real time with highly accurate insights. What is this process called?

  1. A) in-memory analytics
  2. B) in-database analytics
  3. C) grid computing
  4. D) appliances

Answer:  A

Diff: 2    Page Ref: 286

 

27) Which Big Data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources?

  1. A) in-memory analytics
  2. B) in-database analytics
  3. C) grid computing
  4. D) appliances

Answer:  C

Diff: 2    Page Ref: 286

 

28) How does Hadoop work?

  1. A) It integrates Big Data into a whole so large data elements can be processed as a whole on one computer.
  2. B) It integrates Big Data into a whole so large data elements can be processed as a whole on multiple computers.
  3. C) It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on one computer.
  4. D) It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers.

Answer:  D

Diff: 3    Page Ref: 291
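A toy sketch of answer D, using Python's multiprocessing in place of a cluster: the data is broken into parts, each part is processed at the same time by a different worker, and the partial results are merged. Real Hadoop distributes HDFS blocks across machines rather than local processes.

```python
from multiprocessing import Pool

def count_words(chunk):
    # Each worker processes its own part of the data independently.
    counts = {}
    for word in chunk.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

if __name__ == "__main__":
    words = "big data big analytics data data".split()
    # Break the data into parts, one per worker.
    chunks = [" ".join(words[i::3]) for i in range(3)]
    with Pool(3) as pool:
        partials = pool.map(count_words, chunks)  # processed in parallel
    merged = {}
    for partial in partials:  # combine the partial results
        for word, n in partial.items():
            merged[word] = merged.get(word, 0) + n
    print(merged)
```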

 

29) What is the Hadoop Distributed File System (HDFS) designed to handle?

  1. A) unstructured and semistructured relational data
  2. B) unstructured and semistructured non-relational data
  3. C) structured and semistructured relational data
  4. D) structured and semistructured non-relational data

Answer:  B

Diff: 2    Page Ref: 291

 

30) In a Hadoop “stack,” what is a slave node?

  1. A) a node where bits of programs are stored
  2. B) a node where metadata is stored and used to organize data processing
  3. C) a node where data is stored and processed
  4. D) a node responsible for holding all the source programs

Answer:  C

Diff: 2    Page Ref: 292

31) In a Hadoop “stack,” what node periodically replicates and stores data from the Name Node should it fail?

  1. A) backup node
  2. B) secondary node
  3. C) substitute node
  4. D) slave node

Answer:  B

Diff: 2    Page Ref: 292

 

32) All of the following statements about MapReduce are true EXCEPT

  1. A) MapReduce is a general-purpose execution engine.
  2. B) MapReduce handles the complexities of network communication.
  3. C) MapReduce handles parallel programming.
  4. D) MapReduce runs without fault tolerance.

Answer:  D

Diff: 2    Page Ref: 295

 

33) In the Big Data and Analytics in Politics case study, which of the following was an input to the analytic system?

  1. A) census data
  2. B) assessment of sentiment
  3. C) voter mobilization
  4. D) group clustering

Answer:  A

Diff: 2    Page Ref: 301

 

34) In the Big Data and Analytics in Politics case study, what was the analytic system output or goal?

  1. A) census data
  2. B) assessment of sentiment
  3. C) voter mobilization
  4. D) group clustering

Answer:  C

Diff: 2    Page Ref: 301

 

35) Traditional data warehouses have not been able to keep up with

  1. A) the evolution of the SQL language.
  2. B) the variety and complexity of data.
  3. C) expert systems that run on them.
  4. D) OLAP.

Answer:  B

Diff: 2    Page Ref: 303

 

 

36) Under which of the following requirements would it be more appropriate to use Hadoop over a data warehouse?

  1. A) ANSI 2003 SQL compliance is required
  2. B) online archives alternative to tape
  3. C) unrestricted, ungoverned sandbox explorations
  4. D) analysis of provisional data

Answer:  C

Diff: 2    Page Ref: 306

37) What is Big Data’s relationship to the cloud?

  1. A) Hadoop cannot be deployed effectively in the cloud just yet.
  2. B) Amazon and Google have working Hadoop cloud offerings.
  3. C) IBM’s homegrown Hadoop platform is the only option.
  4. D) Only MapReduce works in the cloud; Hadoop does not.

Answer:  B

Diff: 2    Page Ref: 308

 

38) Companies with the largest revenues from Big Data tend to be

  1. A) the largest computer and IT services firms.
  2. B) small computer and IT services firms.
  3. C) pure open source Big Data firms.
  4. D) non-U.S. Big Data firms.

Answer:  A

Diff: 2    Page Ref: 311

 

39) In the health sciences, the largest potential source of Big Data comes from

  1. A) accounting systems.
  2. B) human resources.
  3. C) patient monitoring.
  4. D) research administration.

Answer:  C

Diff: 2    Page Ref: 320

 

40) In the Discovery Health insurance case study, the analytics application used available data to help the company do all of the following EXCEPT

  1. A) predict customer health.
  2. B) detect fraud.
  3. C) lower costs for members.
  4. D) open its own pharmacy.

Answer:  D

Diff: 2    Page Ref: 323-324

 

41) Most Big Data is generated automatically by ________.

Answer:  machines

Diff: 2    Page Ref: 279

 

 

42) ________ refers to the conformity to facts: accuracy, quality, truthfulness, or trustworthiness of the data.

Answer:  Veracity

Diff: 2    Page Ref: 282

 

43) In-motion ________ is often overlooked today in the world of BI and Big Data.

Answer:  analytics

Diff: 2    Page Ref: 282

 

44) The ________ of Big Data is its potential to contain more useful patterns and interesting anomalies than “small” data.

Answer:  value proposition

Diff: 2    Page Ref: 282

45) As the size and the complexity of analytical systems increase, the need for more ________ analytical systems is also increasing to obtain the best performance.

Answer:  efficient

Diff: 2    Page Ref: 286

 

46) ________ speeds time to insights and enables better data governance by performing data integration and analytic functions inside the database.

Answer:  In-database analytics

Diff: 2    Page Ref: 286

 

47) ________ bring together hardware and software in a physical unit that is not only fast but also scalable on an as-needed basis.

Answer:  Appliances

Diff: 2    Page Ref: 286

 

48) Big Data employs ________ processing techniques and nonrelational data storage capabilities in order to process unstructured and semistructured data.

Answer:  parallel

Diff: 2    Page Ref: 289

 

49) In the world of Big Data, ________ aids organizations in processing and analyzing large volumes of multi-structured data. Examples include indexing and search, graph analysis, etc.

Answer:  MapReduce

Diff: 2    Page Ref: 291

 

50) The ________ Node in a Hadoop cluster provides client information on where in the cluster particular data is stored and if any nodes fail.

Answer:  Name

Diff: 2    Page Ref: 292

 

 

51) A job ________ is a node in a Hadoop cluster that initiates and coordinates MapReduce jobs, or the processing of the data.

Answer:  tracker

Diff: 2    Page Ref: 292

 

52) HBase is a nonrelational ________ that allows for low-latency, quick lookups in Hadoop.

Answer:  database

Diff: 2    Page Ref: 293

 

53) Hadoop is primarily a(n) ________ file system and lacks capabilities we’d associate with a DBMS, such as indexing, random access to data, and support for SQL.

Answer:  distributed

Diff: 2    Page Ref: 294

 

54) HBase, Cassandra, MongoDB, and Accumulo are examples of ________ databases.

Answer:  NoSQL

Diff: 2    Page Ref: 295

55) In the eBay use case study, load ________ helped the company meet its Big Data needs with the extremely fast data handling and application availability requirements.

Answer:  balancing

Diff: 2    Page Ref: 296

 

56) As volumes of Big Data arrive from multiple sources such as sensors, machines, social media, and clickstream interactions, the first step is to ________ all the data reliably and cost effectively.

Answer:  capture

Diff: 2    Page Ref: 303

 

57) In open-source databases, the most important performance enhancement to date is the cost-based ________.

Answer:  optimizer

Diff: 2    Page Ref: 304

 

58) Data ________, or the pulling of data from multiple subject areas and numerous applications into one repository, is the raison d’être for data warehouses.

Answer:  integration

Diff: 2    Page Ref: 305

 

59) In the energy industry, ________ grids are one of the most impactful applications of stream analytics.

Answer:  smart

Diff: 2    Page Ref: 315

 

 

60) In the U.S. telecommunications company case study, the use of analytics via dashboards has helped to improve the effectiveness of the company’s ________ assessments and to make their systems more secure.

Answer:  threat

Diff: 2    Page Ref: 319

 

61) In the opening vignette, what is the source of the Big Data collected at the European Organization for Nuclear Research or CERN?

Answer:  Forty million times per second, particles collide within the LHC, and each collision generates particles that often decay in complex ways into even more particles. Precise electronic circuits all around the LHC record the passage of each particle through a detector as a series of electronic signals, and send the data to the CERN Data Centre (DC) for recording and digital reconstruction. The digitized summary of each collision is recorded as a “collision event.” About 15 petabytes of digitized summary data are produced annually, and physicists process them to determine whether the collisions have thrown up any interesting physics.

Diff: 2    Page Ref: 276

62) List and describe the three main “V”s that characterize Big Data.

Answer:

∙     Volume: This is obviously the most common trait of Big Data. Many factors contributed to the exponential increase in data volume, such as transaction-based data stored through the years, text data constantly streaming in from social media, increasing amounts of sensor data being collected, automatically generated RFID and GPS data, and so forth.

∙     Variety: Data today comes in all types of formats, ranging from traditional databases, to hierarchical data stores created by end users and OLAP systems, to text documents, e-mail, XML, meter-collected and sensor-captured data, to video, audio, and stock ticker data. By some estimates, 80 to 85 percent of all organizations’ data is in some sort of unstructured or semistructured format.

∙     Velocity: This refers to both how fast data is being produced and how fast the data must be processed (i.e., captured, stored, and analyzed) to meet the need or demand. RFID tags, automated sensors, GPS devices, and smart meters are driving an increasing need to deal with torrents of data in near-real time.

Diff: 2    Page Ref: 280-281

 

 

63) List and describe four of the most critical success factors for Big Data analytics.

Answer:

∙     A clear business need (alignment with the vision and the strategy). Business investments ought to be made for the good of the business, not for the sake of mere technology advancements. Therefore, the main driver for Big Data analytics should be the needs of the business at any level: strategic, tactical, and operational.

∙     Strong, committed sponsorship (executive champion). It is a well-known fact that if you don’t have strong, committed executive sponsorship, it is difficult (if not impossible) to succeed. If the scope is a single or a few analytical applications, the sponsorship can be at the departmental level. However, if the target is enterprise-wide organizational transformation, which is often the case for Big Data initiatives, sponsorship needs to be at the highest levels and organization-wide.

∙     Alignment between the business and IT strategy. It is essential to make sure that the analytics work is always supporting the business strategy, and not the other way around. Analytics should play the enabling role in the successful execution of the business strategy.

∙     A fact-based decision-making culture. In a fact-based decision-making culture, the numbers, rather than intuition, gut feeling, or supposition, drive decision making. There is also a culture of experimentation to see what works and what doesn’t. To create a fact-based decision-making culture, senior management needs to do the following: recognize that some people can’t or won’t adjust; be a vocal supporter; stress that outdated methods must be discontinued; ask to see what analytics went into decisions; and link incentives and compensation to desired behaviors.

∙     A strong data infrastructure. Data warehouses have provided the data infrastructure for analytics. This infrastructure is changing and being enhanced in the Big Data era with new technologies. Success requires marrying the old with the new for a holistic infrastructure that works synergistically.

Diff: 2    Page Ref: 285-286

 

64) When considering Big Data projects and architecture, list and describe five challenges designers should be mindful of in order to make the journey to analytics competency less stressful.

Answer:

∙     Data volume: The ability to capture, store, and process the huge volume of data at an acceptable speed so that the latest information is available to decision makers when they need it.

∙     Data integration: The ability to combine data that is not similar in structure or source and to do so quickly and at reasonable cost.

∙     Processing capabilities: The ability to process the data quickly, as it is captured. The traditional way of collecting and then processing the data may not work. In many situations data needs to be analyzed as soon as it is captured to leverage the most value.

∙     Data governance: The ability to keep up with the security, privacy, ownership, and quality issues of Big Data. As the volume, variety (format and source), and velocity of data change, so should the capabilities of governance practices.

∙     Skills availability: Big Data is being harnessed with new tools and is being looked at in different ways. There is a shortage of data scientists with the skills to do the job.

∙     Solution cost: Since Big Data has opened up a world of possible business improvements, there is a great deal of experimentation and discovery taking place to determine the patterns that matter and the insights that turn to value. To ensure a positive ROI on a Big Data project, therefore, it is crucial to reduce the cost of the solutions used to find that value.

Diff: 3    Page Ref: 286-287

 

65) Define MapReduce.

Answer:  As described by Dean and Ghemawat (2004), “MapReduce is a programming model and an associated implementation for processing and generating large data sets. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.”

Diff: 2    Page Ref: 289-290
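A minimal word-count sketch of the map-shuffle-reduce pattern in plain Python. This illustrates the programming model only; it is not Hadoop's implementation, and the record data is invented.

```python
from collections import defaultdict

def map_phase(record):
    # Map: emit (key, value) pairs, here one pair per word.
    return [(word, 1) for word in record.split()]

def reduce_phase(key, values):
    # Reduce: combine all values that share a key.
    return key, sum(values)

records = ["big data", "data warehouse", "big big data"]

# Shuffle: group mapped pairs by key before reducing.
groups = defaultdict(list)
for record in records:
    for key, value in map_phase(record):
        groups[key].append(value)

print([reduce_phase(k, vs) for k, vs in groups.items()])
# [('big', 3), ('data', 3), ('warehouse', 1)]
```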

 

66) What is NoSQL as used for Big Data? Describe its major downsides.

Answer:

∙     NoSQL is a new style of database that has emerged to, like Hadoop, process large volumes of multi-structured data. However, whereas Hadoop is adept at supporting large-scale, batch-style historical analysis, NoSQL databases are aimed, for the most part (though there are some important exceptions), at serving up discrete data stored among large volumes of multi-structured data to end-user and automated Big Data applications. This capability is sorely lacking from relational database technology, which simply can’t maintain needed application performance levels at Big Data scale.

∙     The downside of most NoSQL databases today is that they trade ACID (atomicity, consistency, isolation, durability) compliance for performance and scalability. Many also lack mature management and monitoring tools.

Diff: 2    Page Ref: 295
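A toy stand-in for the serving pattern described above, using a plain dict as the "store": NoSQL systems target discrete, key-based lookups at low latency rather than batch scans. The key format is invented.

```python
store = {}  # stand-in for a NoSQL key-value or column-family store

def put(key, value):
    store[key] = value

def get(key):
    # Discrete, low-latency lookup by key, serving one application request.
    return store.get(key)

put("user:42:profile", {"name": "Ada", "segment": "premium"})
print(get("user:42:profile"))
```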

 

67) What is a data scientist and what does the job involve?

Answer:  A data scientist is a role or a job frequently associated with Big Data or data science. In a very short time it has become one of the most sought-after roles in the marketplace. Currently, data scientists’ most basic skill is the ability to write code (in the latest Big Data languages and platforms). A more enduring skill will be the ability to communicate in a language that all their stakeholders understand, and to demonstrate the special skills involved in storytelling with data, whether verbally, visually, or, ideally, both. Data scientists use a combination of their business and technical skills to investigate Big Data looking for ways to improve current business analytics practices (from descriptive to predictive and prescriptive) and hence to improve decisions for new business opportunities.

Diff: 2    Page Ref: 297-298

 

68) Why are some portions of tape backup workloads being redirected to Hadoop clusters today?

Answer:

∙     First, while it may appear inexpensive to store data on tape, the true cost comes with the difficulty of retrieval. Not only is the data stored offline, requiring hours if not days to restore, but tape cartridges themselves are also prone to degradation over time, making data loss a reality and forcing companies to factor in those costs. To make matters worse, tape formats change every couple of years, requiring organizations to either perform massive data migrations to the newest tape format or risk the inability to restore data from obsolete tapes.

∙     Second, it has been shown that there is value in keeping historical data online and accessible. As in the clickstream example, keeping raw data on a spinning disk for a longer duration makes it easy for companies to revisit data when the context changes and new constraints need to be applied. Searching thousands of disks with Hadoop is dramatically faster and easier than spinning through hundreds of magnetic tapes. Additionally, as disk densities continue to double every 18 months, it becomes economically feasible for organizations to hold many years’ worth of raw or refined data in HDFS.

Diff: 2    Page Ref: 304

 

69) What are the differences between stream analytics and perpetual analytics? When would you use one or the other?

Answer:

∙     In many cases they are used synonymously. However, in the context of intelligent systems, there is a difference. Streaming analytics involves applying transaction-level logic to real-time observations. The rules applied to these observations take into account previous observations as long as they occurred in the prescribed window; these windows have some arbitrary size (e.g., last 5 seconds, last 10,000 observations, etc.). Perpetual analytics, on the other hand, evaluates every incoming observation against all prior observations, where there is no window size. Recognizing how the new observation relates to all prior observations enables the discovery of real-time insight.

∙     When transactional volumes are high and the time-to-decision is too short, favoring nonpersistence and small window sizes, this translates into using streaming analytics. However, when the mission is critical and transaction volumes can be managed in real time, then perpetual analytics is a better answer.

Diff: 2    Page Ref: 315-316
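A minimal sketch of the distinction, using a running average as the stand-in "rule": the streaming version evaluates each observation against a bounded window, while the perpetual version evaluates it against all prior observations.

```python
from collections import deque

window = deque(maxlen=5)  # streaming: only the prescribed window is kept
history = []              # perpetual: every prior observation is kept

def observe(x):
    window.append(x)
    history.append(x)
    stream_avg = sum(window) / len(window)       # windowed evaluation
    perpetual_avg = sum(history) / len(history)  # full-history evaluation
    return stream_avg, perpetual_avg

for x in [10, 12, 11, 50, 9, 10, 11]:
    print(observe(x))
```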

 

70) Describe data stream mining and how it is used.

Answer:  Data stream mining, as an enabling technology for stream analytics, is the process of extracting novel patterns and knowledge structures from continuous, rapid data records. A data stream is a continuous, ordered sequence of instances that, in many applications of data stream mining, can be read/processed only once or a small number of times using limited computing and storage capabilities. Examples of data streams include sensor data, computer network traffic, phone conversations, ATM transactions, web searches, and financial data. Data stream mining can be considered a subfield of data mining, machine learning, and knowledge discovery. In many data stream mining applications, the goal is to predict the class or value of new instances in the data stream, given some knowledge about the class membership or values of previous instances.

Diff: 2    Page Ref: 317
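A one-pass sketch of the idea, with an incrementally updated mean standing in for a learned model: each instance in the stream is read exactly once, only constant-size summary state is kept, and the current model predicts each new value before it arrives.

```python
def stream_mine(stream):
    n, mean = 0.0, 0.0
    for x in stream:
        prediction = mean          # predict the new instance from past ones
        n += 1
        mean += (x - mean) / n     # one-pass, constant-memory update
        yield prediction, x

for predicted, actual in stream_mine([3.0, 4.0, 5.0, 4.0]):
    print(predicted, actual)
```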
