Match the following terms and definitions:
- periodic data
- data mart
- star schema
- data scrubbing
- data transformation
- grain
- reconciled data
- dependent data mart
- real-time data warehouse
- selection
- transient data
- snowflake schema

a. lost previous data content
b. detailed historical data
c. data not altered or deleted
d. partitioning of data based on predefined criteria
e. data warehouse of limited scope
f. dimension and fact tables
g. corrects errors in source data
h. level of detail in a fact table
i. data filled from a data warehouse
j. converts data formats
k. structure that results from hierarchical dimensions
l. a warehouse that accepts near real-time feeds of data
> How is KNIME used as a predictive analytics tool?
> Discuss why data mining applications are growing rapidly in business.
> Illustrate the goals of data mining and how they answer fundamental business questions.
> Discuss the different types of dashboards and their role in business performance management.
> How does Apache Spark differ from Hadoop?
> Define each of the following terms: a. data mining b. online analytical processing c. business intelligence d. predictive analytics e. Apache Spark
> What is the difference between a wide-column store and a graph-oriented database?
> What is the trade-off one needs to consider while using a NoSQL database management system?
> What is the difference between the explanatory and exploratory goals of data mining?
> Identify the differences between Hadoop and NoSQL technologies.
> Having reviewed your conceptual models (from Chapters 2 and 3) with the appropriate stakeholders and gaining their approval, you are now ready to move to the next phase of the project, logical design. Your next deliverable is the creation of a relational
> What are the two challenges faced in visualizing big data?
> Identify and briefly describe the five Vs that are often used to define big data.
> Contrast the following terms: a. data lake; data warehouse b. Pig; Hive c. volume; velocity d. NoSQL; SQL
> Match the following terms to the appropriate definitions: - Hive - Big data - Data lake - Pig - Analytics a. data exist in large volumes and variety and need to be processed at a very high speed b. a language that is used to extract, load and transform data
> HBase and Cassandra share a common purpose. What is it? What is their relationship to HDFS and Google BigTable?
> Explain the implementation of MapReduce on HDFS clusters.
> How does HDFS aid in coping with hardware failure?
> Describe and explain the two main components of MapReduce.
> What is the role of YARN in the management of highly distributed systems?
> List the purposes Hadoop is used for.
> Martin was very impressed with your project plan and has given you the go-ahead for the project. He also indicates to you that he has e-mails from several key staff members that should help with the design of the system. The first is from Alex Martin (ad
> Discuss the features of NoSQL DBMS that ensure high availability but do not guarantee consistency.
> What is the format that can be used to describe database schema besides JSON?
> Define each of the following terms: a. Hadoop b. MapReduce c. HDFS d. NoSQL e. Pig
> Why is it important to consolidate a Web-based customer interaction in a data warehouse?
> List five claimed limitations of independent data marts.
> Explain the need to separate operational and information systems.
> List the issues that one encounters while achieving a single corporate view of data in a firm.
> Briefly describe the factors that have led to the evolution of the data warehouse.
> Why does an information gap still exist despite the surge in data in most firms?
> List the functions performed by a Data Warehouse Administrator and explain how they differ from the typical data administrator and database administrator.
> Explain the reasons why Data Warehousing 2.0 is necessary.
> Explain how the phrase “extract–transform–load” relates to the data reconciliation process.
> List five errors and inconsistencies that are commonly found in operational data.
> List and briefly describe five steps in the data reconciliation process.
> Contrast the following terms: a. transient data; periodic data b. data scrubbing; data transformation c. data warehouse; data mart; operational data store d. reconciled data; derived data e. static extract; incremental extract f. fact table; dimension ta
> List six typical characteristics of reconciled data.
> Explain why it is essential to scrub data before transformation and how they blend together.
> Which three techniques form the building blocks of any data integration approach?
> Describe the current key trends in data warehousing.
> Explain how data integration is not the only data consolidation technique across an enterprise.
> Briefly explain how the dimensions and facts required for a data mart are driven by the context for decision making.
> Why should changes be made to the data warehouse design? What are the changes that need to be accommodated?
> What is the meaning of the phrase “slowly changing dimension”?
> What are the two situations in which factless fact tables may apply?
> Explain through common examples why determining grain is critical.
> List and describe the various situations in which it becomes essential to further normalize dimension tables.
> Explain the components of a star schema with figures and suitable examples.
> Describe the characteristics of a surrogate key as used in a data warehouse or data mart.
> Discuss the role of an enterprise data model and metadata in the architecture of a data warehouse.
> FAME (Forondo Artist Management Excellence) Inc. is an artist management company that represents classical music artists (only soloists) both nationally and internationally. FAME has more than 500 artists under its management and wants to replace its spr
> What are the key differences between data warehousing and big data approaches to analytical data management?
> What impact has the significant decrease in the cost of storage space had on data warehouse and data mart design?
> Why is real-time data warehousing called active data warehousing?
> Explain how the characteristics of data for data warehousing differ from the characteristics of data for operational databases.
> List the different roles played by data marts and data warehouses in a data warehouse environment.
> What is meant by a corporate information factory?
> List the 10 essential rules for dimensional modeling.
> Define each of the following terms: a. data warehouse b. data mart c. reconciled data d. derived data e. enterprise data warehouse f. real-time data warehouse g. star schema h. snowflake schema i. grain j. conformed dimension k. static extract l. increme
> What is the role of a DBA? List various regulations and standards for physical database design and their functions.
> Identify some limitations of normalized data as outlined in the text.
> What is a translation or code table? When should it be implemented, and what are its advantages?
> What decisions have to be made to develop a field specification?
> What are the key decisions in physical database design?
> Discuss the potential advantages, technical challenges, and disadvantages of using cloud-based database provisioning.
> Describe the differences between the IaaS, PaaS, and SaaS models of cloud-based database management solutions.
> How can views be used as part of data security? What are the limitations of views for data security?
> What are the major inputs into physical database design?
> Briefly describe four components of a disaster recovery plan.
> Explain the role of encryption in data security.
> List and describe four common types of database failure.
> Briefly describe four DBMS facilities that are required for database backup and recovery.
> Research various graphics and drawing packages, such as Microsoft Office and SmartDraw, and compare the E-R diagramming capabilities of each. Is each package capable of using the notation found in this text? Is it possible to draw a ternary or higher-ord
> What are the two key types of security policies and procedures that must be established to aid in Sarbanes-Oxley compliance?
> What are the key areas of IT that are examined during a Sarbanes-Oxley audit?
> What is the difference between an authentication scheme and an authorization scheme?
> List and briefly explain how integrity controls can be used for database security.
> Explain the components of a repository system architecture. List and explain the functions supported by a repository engine.
> List and discuss five areas where threats to data security may occur.
> Contrast the following terms: a. horizontal partitioning; vertical partitioning b. repository; data dictionary c. physical file; tablespace d. before image; after image e. normalization; denormalization f. range control; null control g. transaction log;
> Contrast the uses of a data dictionary and a repository in data and database management.
> What are the different elements of a query that can be processed in parallel?
> Explain how query writers can improve query processing performance through knowledge of query optimizers.
> Interview one person from a key business function, such as finance, human resources, or marketing. Concentrate your questions on the following items: How does he or she retrieve data needed to make business decisions? From what kind of system (personal d
> What role can a query optimizer play in the selection of an optimal set of indexes?
> Database servers frequently use one of the many parallel processing architectures. Discuss which elements of a query can be processed in parallel.
> Explain why an index is useful only if there is sufficient variety in the values of an attribute.
> Discuss the trade-off between improved performance for retrieval through use of indexes and degraded performance due to updates of indexed records in a file.
> State 10 rules of thumb for choosing indexes.
> How is storage space in a database divided logically by the DBMS? What is the role of a DBA in managing it?
> What is the purpose of the EXPLAIN or EXPLAIN PLAN command?
> Match the following terms to the appropriate definitions: - extent - hashing algorithm - rollback - index - checkpoint facility - physical record - pointer - data type - physical file - database recovery a. a detailed coding scheme for representing organi
> Compare the features of the four families of file organization.
> Which index is most suitable for decision support and transaction processing applications that involve online querying? Explain your answer.
> Interview a systems analyst or database analyst and ask questions about how that organization uses data modeling and design tools in the systems development process. Concentrate your questions on how data modeling and design tools are used to support dat
> Explain data replication, forms of partitioning, and their areas of application.