[SOLVED] Consider the bank database of Figure 2.

> Use the deﬁnition of functional dependency to argue that each of Armstrong’s axioms (reﬂexivity, augmentation, and transitivity) is sound.

> List two reasons why null values might be introduced into a database.

> Estimate the number of block transfers and seeks required by your solution to Exercise 15.19 for r1 ⋈ r2, where r1 and r2 are as deﬁned in Exercise 15.3.

> Design a variant of the hybrid merge-join algorithm for the case where both relations are not physically sorted, but both have a sorted secondary index on the join attributes.

> Why is it not desirable to force users to make an explicit choice of a query-processing strategy? Are there cases in which it is desirable for users to be aware of the costs of competing query-processing strategies? Explain your answer.

> Suppose you need to sort a relation of 40 gigabytes, with 4-kilobyte blocks, using a memory size of 40 megabytes. Suppose the cost of a seek is 5 milliseconds, while the disk transfer rate is 40 megabytes per second. a. Find the cost of sorting the relat

> An existence bitmap has a bit for each record position, with the bit set to 1 if the record exists, and 0 if there is no record at that position (for example, if the record were deleted). Show how to compute the existence bitmap from other bitmaps. Make

> What trade-oﬀs do write-optimized indices pose as compared to B+-tree indices?

> Suppose a relation is stored in a B+-tree ﬁle organization. Suppose secondary indices store record identiﬁers that are pointers to records on disk. a. What would be the eﬀect on the secondary indices if a node split happened in the ﬁle organization? b. W

> Suppose you have to create a B+-tree index on a large number of names, where the maximum size of a name may be quite large (say 40 characters) and the average name is itself large (say 10 characters). Explain how preﬁx compression can be used to maximize t

> Suppose there is a relation r (A, B, C), with a B+-tree index with search key (A, B). a. What is the worst-case cost of ﬁnding records satisfying 10 < A < 50 using this index, in terms of the number of records retrieved n1 and the height h of the tree? b

> Why are certain functional dependencies called trivial functional dependencies?

> The solution presented in Section 14.3.5 to deal with non-unique search keys added an extra attribute to the search key. What eﬀect could this change have on the height of the B+-tree?

> Consider the bank database of Figure 2.18. Give an expression in the relational algebra for each of the following queries: a. Find each loan number with a loan amount greater than $10000. b. Find the ID of each depositor who has an account with a balance

> For each B+-tree of Exercise 14.3, show the steps involved in the following queries: a. Find records with a search-key value of 11. b. Find records with a search-key value between 7 and 17, inclusive.

> What is the diﬀerence between a clustering index and a secondary index?

> Some attributes of relations may contain sensitive data, and may be required to be stored in an encrypted fashion. How does data encryption aﬀect index schemes? In particular, how might it aﬀect schemes that attempt to store data in sorted order?

> Spatial indices that can index spatial intervals can conceptually be used to index temporal data by treating valid time as a time interval. What is the problem with doing so, and how is the problem solved?

> When is it preferable to use a dense index rather than a sparse index? Explain your answer.

> Standard buﬀer managers assume each block is of the same size and costs the same to read. Consider a buﬀer manager that, instead of LRU, uses the rate of reference to objects, that is, how often an object has been accessed in the last n seconds. Suppose

> Give a normalized version of the Index metadata relation, and explain why using the normalized version would result in worse performance.

> In the sequential ﬁle organization, why is an overﬂow block used even if there is, at the moment, only one overﬂow record?

> Explain what is meant by repetition of information and inability to represent information. Explain why each of these properties may indicate a bad relational-database design.

> List two advantages and two disadvantages of each of the following strategies for storing a relational database: a. Store each relation in one ﬁle. b. Store multiple relations (perhaps even the entire database) in one ﬁle.

> Explain why the allocation of records to blocks aﬀects database-system performance signiﬁcantly.

> Consider the employee database of Figure 2.17. Give an expression in the relational algebra to express each of the following queries: a. Find the ID and name of each employee who works for “Big Bank”. b. Find the ID, name, and city of residence of each e

> In the variable-length record representation, a null bitmap is used to indicate if an attribute has the null value. a. For variable-length ﬁelds, if the value is null, what would be stored in the oﬀset and length ﬁelds? b. In some applications, tuples ha

> Suppose you have data that should not be lost on disk failure, and the application is write-intensive. How would you store the data?

> What is scrubbing, in the context of RAID systems, and why is scrubbing important?

> RAID systems typically allow you to replace failed disks without stopping access to the system. Thus, the data in the failed disk must be rebuilt and written to the replacement disk while the system is in operation. Which of the RAID levels yields the le

> Operating systems try to ensure that consecutive blocks of a ﬁle are stored on consecutive disk blocks. Why is doing so very important with magnetic disks? If SSDs were used instead, is doing so still important, or is it irrelevant? Explain why.

> How does the remapping of bad sectors by disk controllers aﬀect data-retrieval rates?

> List the physical storage media available on the computers you use routinely. Give the speed with which data can be accessed on each medium.

> Given two relations r(A, B, validtime) and s(B, C, validtime), where validtime denotes the valid time interval, write an SQL query to compute the temporal natural join of the two relations, using an operator that checks whether two intervals overlap and the ∗ operator to compute the intersection of two intervals.

> Suggest how predictive mining techniques can be used by a sports team, using your favorite sport as an example.

> The organization of parts, chapters, sections, and subsections in a book is related to clustering. Explain why, and to what form of clustering.

> Suppose half of all the transactions in a clothes shop purchase jeans, and one-third of all transactions in the shop purchase T-shirts. Suppose also that half of the transactions that purchase jeans also purchase T-shirts. Write down all the (nontrivial
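
The figures in the jeans and T-shirts question above are enough to work out the support and confidence of the two candidate association rules. The following is a minimal sketch of that arithmetic, assuming the usual definitions (support is the fraction of all transactions containing the items; confidence is the support of the whole rule divided by the support of its antecedent). The variable and function names are illustrative, not taken from the exercise.

```python
from fractions import Fraction

# Fractions stated in the exercise.
p_jeans = Fraction(1, 2)                 # support of {jeans}
p_tshirts = Fraction(1, 3)               # support of {T-shirts}
p_tshirts_given_jeans = Fraction(1, 2)   # half of jeans purchases include T-shirts

# Support of the itemset {jeans, T-shirts}.
support_both = p_jeans * p_tshirts_given_jeans


def confidence(support_rule, support_antecedent):
    """Confidence = support(antecedent and consequent) / support(antecedent)."""
    return support_rule / support_antecedent


conf_jeans_to_tshirts = confidence(support_both, p_jeans)
conf_tshirts_to_jeans = confidence(support_both, p_tshirts)

print(support_both, conf_jeans_to_tshirts, conf_tshirts_to_jeans)
```

Exact fractions are used instead of floats so the results can be compared without rounding concerns.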

> Construct a schema diagram for the bank database of Figure 2.18.

> Consider the star schema from Figure 11.2. Suppose an analyst ﬁnds that monthly total sales (sum of the price values of all sales tuples) have decreased, instead of growing, from April 2018 to May 2018. The analyst wishes to check if there are speciﬁc it

> Consider each of the takes and teaches relations as a fact table; they do not have an explicit measure attribute, but assume each table has a measure attribute reg_count whose value is always 1. What would the dimension attributes and dimension tables be

> Why is column-oriented storage potentially advantageous in a database system that supports a data warehouse?

> Explain how multiple operations can be executed on a stream using a publish-subscribe system such as Apache Kafka.

> Suppose a stream can deliver tuples out of order with respect to tuple timestamps. What extra information should the stream provide, so a stream query processing system can decide when all tuples in a window have been seen?

> Fill in the blanks below to complete the following Apache Spark program which computes the number of occurrences of each word in a ﬁle. For simplicity we assume that words only occur in lowercase, and there are no punctuation marks. JavaRDD<String> textFile =

> Although SQL does not support functional dependency constraints, if the database system supports constraints on materialized views, and materialized views are maintained immediately, it is possible to enforce functional dependency constraints in SQL. Giv

> The map-reduce framework is quite useful for creating inverted indices on a set of documents. An inverted index stores for each word a list of all document IDs that it appears in (oﬀsets in the documents are also normally stored, but we shall ignore them

> Suppose your company has built a database application that runs on a centralized database, but even with a high-end computer and appropriate indices created on the data, the system is not able to handle the transaction load, leading to slow processing of

> One of the characteristics of Big Data is the variety of data. Explain why this characteristic has resulted in the need for languages other than SQL for processing Big Data.

> Give four ways in which information in web logs pertaining to the web pages visited by a user can be used by the web site.

> What is multifactor authentication? How does it help safeguard against stolen passwords?

> a. What is an XSS attack? b. How can the referrer ﬁeld be used to detect some XSS attacks?

> Many web sites today provide rich user interfaces using Ajax. List two features each of which reveals if a site uses Ajax, without having to look at the source code. Using the above features, ﬁnd three sites which use Ajax; you can view the HTML source o

> Explain the terms CRUD and REST.

> Write pseudo code to manage a connection pool. Your pseudo code must include a function to create a pool (providing a database connection string, database user name, and password as parameters), a function to request a connection from the pool, a connect

> Normalize the following schema, with given constraints, to 4NF.

> What is an SQL injection attack? Explain how it works and what precautions must be taken to prevent SQL injection attacks.
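
To make the SQL injection question above concrete, here is a small sketch using Python's standard `sqlite3` module. It contrasts a query built by string concatenation with a parameterized query. The `users` table, its contents, and the malicious input are all hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a1"), ("bob", "b2")],
)

# Attacker-controlled input that closes the quote and adds an always-true condition.
malicious = "nobody' OR '1'='1"

# Vulnerable: the input is spliced directly into the SQL text,
# so the OR clause becomes part of the query.
unsafe_sql = "SELECT secret FROM users WHERE name = '" + malicious + "'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: the input is passed as a bound parameter and treated purely as a value.
safe_rows = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

In this sketch the concatenated query returns every row, while the parameterized query returns none, because no user has that literal name.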

> Explain why 4NF is a normal form more desirable than BCNF.

> Given a relational schema r (A, B, C, D), does A →→ BC logically imply A →→ B and A →→ C? If yes prove it, or else give a counter example.

> Give a lossless, dependency-preserving decomposition into 3NF of schema R of Exercise 7.1.

> Given the three goals of relational database design, is there any reason to design a database schema that is in 2NF, but is in no higher-order normal form? (See Exercise 7.19 for the deﬁnition of 2NF.)

> Write a servlet that authenticates a user (based on user names and passwords stored in a database relation) and sets a session variable called userid after authentication.

> In designing a relational database, why might we choose a non-BCNF design?

> List the three design goals for relational databases, and explain why each is desirable.

> Show that every schema consisting of exactly two attributes must be in BCNF regardless of the given set F of functional dependencies.

> Although the BCNF algorithm ensures that the resulting decomposition is lossless, it is possible to have a schema and a decomposition that was not generated by the algorithm that is in BCNF, and is not lossless. Give an example of such a schema and its

> Consider the schema R = (A, B, C, D, E, G, and H) and the set F of functional dependencies: Use the 3NF decomposition algorithm to generate a 3NF decomposition of R, and show your work. This means: a. A list of all candidate keys b. A canonical cover for

> Consider the schema R = (A, B, C, D, E, and G) and the set F of functional dependencies: Use the 3NF decomposition algorithm to generate a 3NF decomposition of R, and show your work. This means: a. A list of all candidate keys b. A canonical cover for F,

> Explain why NoSQL systems emerged in the 2000s, and brieﬂy contrast their features with traditional database systems.

> Consider the schema R = (A, B, C, D, E, and G) and the set F of functional dependencies: a. Find a nontrivial functional dependency containing no extraneous attributes that is logically implied by the above three dependencies and explain how you foun

> Consider the schema R = (A, B, C, D, E, G) and the set F of functional dependencies: R is not in BCNF for many reasons, one of which arises from the functional dependency AB → CD. Explain why AB → CD shows that R is no

> Consider the following set F of functional dependencies on the relation schema (A, B, C, D, E, and G): a. Compute B+. b. Prove (using Armstrong’s axioms) that AG is a superkey. c. Compute a canonical cover for this set of functional de

> Write a servlet and associated HTML code for the following simple application: A user is allowed to submit a form containing a number, say n, and should get a response saying how many times the value n has been submitted previously. The number of times e

> Give a lossless decomposition into BCNF of schema R of Exercise 7.1.

> Design a database for an automobile company to provide to its dealers to assist them in maintaining customer records and dealer inventory and to assist sales staﬀ in ordering cars. Each vehicle is identiﬁed by a vehicle identiﬁcation number (VIN). Each i

> Consider the E-R diagram in Figure 6.30, which models an online bookstore. a. Suppose the bookstore adds Blu-ray discs and downloadable video to its collection. The same item may be present in one or both formats, with differing prices. Draw the part of

> Construct appropriate relation schemas for each of the E-R diagrams in: a. Exercise 6.1. b. Exercise 6.2. c. Exercise 6.3. d. Exercise 6.15.

> We can convert any weak entity set to a strong entity set by simply adding appropriate attributes. Why, then, do we have weak entity sets?

> Consider two entity sets A and B that both have the attribute X (among others whose names are not relevant to this question). a. If the two Xs are completely unrelated, how should the design be improved? b. If the two Xs represent the same property and

> Explain the diﬀerence between a weak and a strong entity set.

> List two features developed in the 2000s that help database systems handle data-analytics workloads.

> Extend the E-R diagram of Exercise 6.3 to track the same information for all teams in a league.

> Construct an E-R diagram for a hospital with a set of patients and a set of medical doctors. Associate with each patient a log of the various tests and examinations conducted.

> Explain what a challenge–response system for authentication is. Why is it more secure than a traditional password-based system?

> Explain the distinction between total and partial constraints.

> Explain the distinction between disjoint and overlapping constraints.

> Design a generalization–specialization hierarchy for a motor vehicle sales company. The company sells motorcycles, passenger cars, vans, and buses. Justify your placement of attributes at each level of the hierarchy. Explain why they s

> In Section 6.9.4, we represented a ternary relationship (repeated in Figure 6.29a) using binary relationships, as shown in Figure 6.29b. Consider the alternative shown in Figure 6.29c. Discuss the relative merits of these two alternative representations

> Design a database for an airline. The database must keep track of customers and their reservations, ﬂights and their status, seat assignments on individual ﬂights, and the schedule and routing of future ﬂights. Your design should include an E-R diagram,

> Design a database for a worldwide package delivery company (e.g., DHL or FedEx). The database must be able to keep track of customers who ship items and customers who receive items; some customers may do both. Each package must be identiﬁ

> Explain the distinctions among the terms primary key, candidate key, and superkey.

> The execution of a trigger can cause another action to be triggered. Most database systems place a limit on how deep the nesting can be. Explain why they might place such a limit.

> Explain the diﬀerence between two-tier and three-tier application architectures. Which is better suited for web applications? Why?

> Suppose there are two relations r and s, such that the foreign key B of r references the primary key A of s. Describe how the trigger mechanism can be used to implement the on delete cascade option when a tuple is deleted from s.
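
The trigger-based cascade described in the question above can be sketched with Python's standard `sqlite3` module. This is only one possible illustration, assuming SQLite's trigger syntax. The table and column names `r`, `s`, `A`, and `B` follow the question, while the extra `id` column, the trigger name, and the sample rows are hypothetical. No foreign key is declared, so the cleanup is done entirely by the trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE s (A INTEGER PRIMARY KEY);
CREATE TABLE r (id INTEGER PRIMARY KEY, B INTEGER);  -- B refers to s.A

-- After a tuple is deleted from s, delete every r tuple that referenced it.
CREATE TRIGGER cascade_delete_s
AFTER DELETE ON s
FOR EACH ROW
BEGIN
    DELETE FROM r WHERE B = OLD.A;
END;
""")

# Hypothetical sample data: two r tuples reference A = 1, one references A = 2.
conn.executemany("INSERT INTO s (A) VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO r (id, B) VALUES (?, ?)",
    [(10, 1), (11, 1), (12, 2)],
)

# Deleting from s fires the trigger, which removes the dependent r tuples.
conn.execute("DELETE FROM s WHERE A = 1")
remaining = conn.execute("SELECT id, B FROM r ORDER BY id").fetchall()
print(remaining)
```

Other database systems use different trigger syntax, so the `CREATE TRIGGER` statement would need adapting outside SQLite.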

> Hackers may be able to fool you into believing that their web site is actually a web site (such as a bank or credit card web site) that you trust. This may be done by misleading email, or even by breaking into the network infrastructure and rerouting net

> Redo Exercise 5.12 using the language of your database system for coding stored procedures and functions. Note that you are likely to have to consult the online documentation for your system as a reference, since most systems use syntax diﬀ

> Consider the relational schema from Exercise 5.16. Write a JDBC function using nonrecursive SQL to ﬁnd the total cost of part “P-100”, including the costs of all its subparts. Be sure to take into account the fact that a part may have multiple occurrenc

> Consider the relational schema where the primary-key attributes are underlined. A tuple (p1, p2, 3) in the subpart relation denotes that the part with part id p2 is a direct subpart of the part with part id p1, and p1 has 3 copies of p2. Note that p2 may

> Consider an employee database with two relations where the primary keys are underlined. Write a function avg_salary that takes a company name as an argument and ﬁnds the average salary of employees at that company. Then, write an SQL stat

> Repeat Exercise 5.13 using ODBC, deﬁning void printTable(char *r) as a function instead of a method.

> Suppose you were asked to deﬁne a class MetaDisplay in Java, containing a method static void printTable(String r); the method takes a relation name r as input, executes the query “select * from r”, and prints the result out in tabular format, with the

Question: Consider the bank database of Figure 2.