Review Figure 10-15 and answer the following questions based on it.
a. What has happened between Input and Input′?
b. Assume that the values associated with each of the keys (k1, k2, and so forth) are counts. What is the purpose of the Shuffle stage?
c. If the overall goal is to count the number of instances per key, what must the role of the Reduce stage be?
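The stages these questions ask about can be traced in a small, single-process sketch. Because the figure is not reproduced here, the sketch makes two assumptions. First, each input record is simply a key such as `k1`. Second, the Map stage emits a count of 1 for each occurrence. The function names (`map_stage`, `shuffle_stage`, `reduce_stage`, `count_per_key`) are hypothetical and are not part of Hadoop's API. In the sketch, Shuffle gathers every count for the same key into one group, and Reduce aggregates each group (here, by summing) to produce the number of instances per key.

```python
from collections import defaultdict
from itertools import chain


def map_stage(split):
    """Map: emit a (key, 1) pair for every key occurrence in one input split.

    Assumption: each input record is already a key such as "k1".
    """
    return [(key, 1) for key in split]


def shuffle_stage(mapped_pairs):
    """Shuffle: group all emitted counts by key, so that every value for a
    given key ends up in the same place (in Hadoop, the same reducer)."""
    groups = defaultdict(list)
    for key, count in mapped_pairs:
        groups[key].append(count)
    return dict(groups)


def reduce_stage(groups):
    """Reduce: aggregate the counts collected for each key (here, by summing)."""
    return {key: sum(counts) for key, counts in groups.items()}


def count_per_key(splits):
    # Each split is mapped independently; in a real cluster this would
    # happen in parallel on different nodes.
    mapped = chain.from_iterable(map_stage(split) for split in splits)
    return reduce_stage(shuffle_stage(mapped))


if __name__ == "__main__":
    # Hypothetical input, already divided into three splits.
    splits = [["k1", "k2", "k1"], ["k2", "k3"], ["k1"]]
    print(count_per_key(splits))  # {'k1': 3, 'k2': 2, 'k3': 1}
```

The sketch models only the data flow. In an actual Hadoop job, the splits are processed on separate nodes, and the shuffle moves the intermediate pairs across the network before any reducer runs.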
Data from Figure 10-15: