How is KNIME used as a predictive analytics tool?
> Using the online Appendix B, available on the book’s Web site, as a resource, interview a database analyst/designer to determine whether he or she normalizes relations to higher than 3NF. Why does he or she use, or not use, normal forms beyond 3NF?
> Obtain an EER diagram from a database administrator or system designer. Based on what you have learned in this book, convert this into a relational schema in 3NF. Now interview the administrator on how they convert the diagram into relations. How do they
> Interview system designers and database designers at several organizations. Ask them to describe the process they use for logical design. How do they transform their conceptual data models (e.g., E-R diagrams) to relational schema? What is the role of CA
> For the same E-R diagram used in Field Exercise 2-56 or for a different database in the same or a different organization, identify any uses of time stamping or other means to model time-dependent data. Why are time-dependent data necessary for those who
> Ask a database or systems analyst in a local company to show you an E-R diagram for one of the organization’s primary databases. Ask questions to be sure you understand what each entity, attribute, and relationship means. Does this organization use the s
> Ask a database or systems analyst to give you examples of unary, binary, and ternary relationships that the analyst has dealt with personally at his or her company. Ask which is most common and why. Ask them if they ever model weak or dependent entities
> Interview a database analyst or a systems analyst. How do they extract business rules for ER modeling? Ask for specific sources. Are they all listed in the text? Did they purchase an ER model and customize it or design it on their own? How did they decid
> List the nine major components in a database system environment.
> Interview a database analyst and ask how they go about identifying business rules in the data modeling process. How do they decide to document what business rules they will gather, utilize, manage, and consider while developing an E-R model? How do they
> What changes can be made in data administration at each stage of the traditional database development life cycle to deliver high-quality, robust systems more quickly?
> Briefly describe four database administration trends that are emerging today.
> What factors must be considered when deciding on an open-source DBMS?
> What functions require the input and involvement of both the data administrator and the database administrator?
> Why are data administrators required to maintain an information repository?
> Indicate whether data administration or database administration is typically responsible for each of the following functions: a. Managing the data repository b. Installing and upgrading the DBMS c. Conceptual data modeling d. Managing data security and p
> Many organizations are now offering cloud-based data warehousing services such as IBM’s dashDB, Amazon’s Redshift, and Microsoft Azure. Pick any three such firms and, using the Internet, compare them based on the factors listed. Prepare a report based on
> Contrast the following terms: a. chief data officer; DBA b. data administration; database administration c. open source DBMS; commercial DBMS d. ETL; MDM
> What distinguishes MDM from other forms of data integration?
> Describe the three major approaches to MDM.
> State any four data availability problems and how they can potentially be addressed.
> Explain how TQM techniques can help in improving data quality.
> Match the following terms and definitions: - data administration database - master data management - data steward - open source DBMS a. oversees data quality for a particular data subject b. a database management system available for free (typically incl
> What are some of the advanced techniques that can be applied by a software solution for data quality improvement?
> Explain how an organization’s business rules can be checked as part of a data audit.
> Describe the key steps to improve data quality in an organization.
> Explain four reasons why the quality of data is poor in many organizations.
> Visit an organization that has implemented information systems on a data warehouse, and interview managers to discuss the following issues: a. Does increased data collection lead to any information gaps for managers? b. Do they receive information from diver
> Define the eight characteristics of quality data.
> What are the four basic facilities for the backup and recovery of a database?
> What are four reasons why data quality is important to an organization?
> How can fuzzy logic, pattern matching, and expert systems be used to improve data quality?
> How can the data capture process be improved?
> Briefly describe four threats to high data availability and at least one measure that can be taken to counter each of these threats.
> Define each of the following terms: a. database administration b. data administration c. chief data officer d. master data management e. open source DBMS
> Compare and contrast R and Python as computational environments for analytics.
> Briefly describe three types of operations that can easily be performed with OLAP tools.
> Discuss the role of OLAP in the context of descriptive analytics.
> Having reviewed your conceptual models (from Chapters 2 and 3) with the appropriate stakeholders and gaining their approval, you are now ready to move to the next phase of the project, logical design. Your next deliverable is the creation of a relational
> Explain the different tools for querying and analyzing data in traditional data warehouses and marts that enable various forms of descriptive analytics.
> Explain the three different generations of business intelligence and analytics.
> Explain the progression from DSS to analytics through business intelligence.
> Contrast the following terms: a. Data mining; text mining b. ROLAP; MOLAP c. R; Python
> Match the following terms to the appropriate definitions: - text mining - data mining - descriptive analytics - analytics - predictive analytics - prescriptive analytics a. knowledge discovery using a variety of statistical and computational techniques b
> Identify six broad categories of implications of big data analytics and decision making.
> How are data quality and management vital in realizing the full potential of big data and analytics?
> Describe the core idea underlying in-database analytics.
> Describe the core idea underlying in-memory DBMSs.
> Describe the mechanism through which prescriptive analytics is dependent on descriptive and predictive analytics.
> Discuss why data mining applications are growing rapidly in business.
> Illustrate the goals of data mining and how they answer fundamental business questions.
> Discuss the different types of dashboards and their role in business performance management.
> How does Apache Spark differ from Hadoop?
> Define each of the following terms: a. data mining b. online analytical processing c. business intelligence d. predictive analytics e. Apache Spark
> What is the difference between a wide-column store and a graph-oriented database?
> What is the trade-off one needs to consider while using a NoSQL database management system?
> What is the difference between the explanatory and exploratory goals of data mining?
> Identify the differences between Hadoop and NoSQL technologies.
> What are the two challenges faced in visualizing big data?
> Identify and briefly describe the five Vs that are often used to define big data.
> Contrast the following terms: a. data lake; data warehouse b. Pig; Hive c. volume; velocity d. NoSQL; SQL
> Match the following terms to the appropriate definitions: - Hive - Big data - Data lake - Pig - Analytics a. data exist in large volumes and variety and need to be processed at a very high speed b. a language that is used to extract, load and transform data
> HBase and Cassandra share a common purpose. What is it? What is their relationship to HDFS and Google BigTable?
> Explain the implementation of MapReduce on HDFS clusters.
> How does HDFS aid in coping with hardware failure?
> Describe and explain the two main components of MapReduce.
> What is the role of YARN in the management of highly distributed systems?
> List the purposes Hadoop is used for.
> Martin was very impressed with your project plan and has given you the go-ahead for the project. He also indicates to you that he has e-mails from several key staff members that should help with the design of the system. The first is from Alex Martin (ad
> Discuss the features of NoSQL DBMS that ensure high availability but do not guarantee consistency.
> What is the format that can be used to describe database schema besides JSON?
> Define each of the following terms: a. Hadoop b. MapReduce c. HDFS d. NoSQL e. Pig
> Why is it important to consolidate a Web-based customer interaction in a data warehouse?
> List five claimed limitations of independent data marts.
> Explain the need to separate operational and information systems.
> List the issues that one encounters while achieving a single corporate view of data in a firm.
> Briefly describe the factors that have led to the evolution of the data warehouse.
> Why does an information gap still exist despite the surge in data in most firms?
> List the functions performed by a Data Warehouse Administrator and explain how they differ from the typical data administrator and database administrator.
> Explain the reasons why Data Warehousing 2.0 is necessary.
> Explain how the phrase “extract–transform–load” relates to the data reconciliation process.
> List five errors and inconsistencies that are commonly found in operational data.
> List and briefly describe five steps in the data reconciliation process.
> Contrast the following terms: a. transient data; periodic data b. data scrubbing; data transformation c. data warehouse; data mart; operational data store d. reconciled data; derived data e. static extract; incremental extract f. fact table; dimension ta
> List six typical characteristics of reconciled data.
> Explain why it is essential to scrub data before transformation and how they blend together.
> Which three techniques form the building blocks of any data integration approach?
> Describe the current key trends in data warehousing.
> Explain how data integration is not the only data consolidation technique across an enterprise.
> Briefly explain how the dimensions and facts required for a data mart are driven by the context for decision making.
> Why should changes be made to the data warehouse design? What are the changes that need to be accommodated?
> What is the meaning of the phrase “slowly changing dimension”?
> What are the two situations in which factless fact tables may apply?
> Explain through common examples why determining grain is critical.
> Match the following terms and definitions: - periodic data - data mart - star schema - data scrubbing - data transformation - grain - reconciled data - dependent data mart - real-time data warehouse - selection - transient data - snowflake schema a. lost
> List and describe the various situations in which it becomes essential to further normalize dimension tables.
> Explain the components of a star schema with figures and suitable examples.