Using the university schema, write an SQL query to find section(s) with maximum enrollment. The result columns should appear in the order “course_id, sec_id, year, semester, num”. (It may be convenient to use the with construct.)
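
One possible shape for the maximum-enrollment query, sketched with Python's built-in `sqlite3` module so it can be run directly. The `takes` table below is a cut-down version of the university schema, and every row in it is invented for illustration; the name `sec_enrollment` is likewise an assumption, not part of the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced takes relation; the sample rows are hypothetical.
conn.execute(
    "create table takes (ID text, course_id text, sec_id text, "
    "semester text, year integer, grade text)"
)
conn.executemany(
    "insert into takes values (?, ?, ?, ?, ?, ?)",
    [
        ("s1", "CS-101", "1", "Fall", 2017, "A"),
        ("s2", "CS-101", "1", "Fall", 2017, "B"),
        ("s3", "CS-101", "1", "Fall", 2017, "C"),
        ("s1", "CS-190", "2", "Spring", 2017, "A"),
        ("s2", "CS-190", "2", "Spring", 2017, "B"),
        ("s1", "BIO-101", "1", "Summer", 2017, "A"),
        ("s2", "BIO-101", "1", "Summer", 2017, "A"),
        ("s3", "BIO-101", "1", "Summer", 2017, "B"),
    ],
)

# The with clause names the per-section counts once, so the outer query
# can compare each count against the overall maximum.
max_enrollment_query = """
with sec_enrollment (course_id, sec_id, year, semester, num) as (
    select course_id, sec_id, year, semester, count(ID)
    from takes
    group by course_id, sec_id, year, semester
)
select course_id, sec_id, year, semester, num
from sec_enrollment
where num = (select max(num) from sec_enrollment)
order by course_id
"""
max_sections = conn.execute(max_enrollment_query).fetchall()
print(max_sections)
```

Because the comparison uses `=` against the maximum rather than `limit 1`, sections tied for the highest enrollment are all returned.
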
> Using the university schema, use SQL to do the following: For each student who has retaken a course at least twice (i.e., the student has taken the course at least three times), show the course ID and the student’s ID. Please display your results in orde
> Using the university schema, write an SQL query to find the names of those departments whose budget is higher than that of Philosophy. List them in alphabetic order.
> Consider the Oracle Virtual Private Database (VPD) feature described in Section 9.8.5 and an application based on our university schema. a. What predicate (using a subquery) should be generated to allow each faculty member to see only takes tuples corr
> Using the university schema, write an SQL query to find the name and ID of those Accounting students advised by an instructor in the Physics department.
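
A minimal sketch of one way to approach the advisor query, again run through `sqlite3`. The three tables keep only the columns the query needs, and the sample students, instructors, and advisor pairs are hypothetical.

```python
import sqlite3

adv_conn = sqlite3.connect(":memory:")
adv_conn.executescript("""
create table student (ID text primary key, name text, dept_name text);
create table instructor (ID text primary key, name text, dept_name text);
create table advisor (s_ID text primary key, i_ID text);
-- Hypothetical sample rows, for illustration only.
insert into student values
    ('1', 'Ann', 'Accounting'), ('2', 'Bob', 'Accounting'), ('3', 'Cy', 'Physics');
insert into instructor values
    ('10', 'Einstein', 'Physics'), ('20', 'Wu', 'Finance');
insert into advisor values ('1', '10'), ('2', '20'), ('3', '10');
""")

# Join student to instructor through advisor, then filter on both departments.
advised_query = """
select student.name, student.ID
from student
join advisor on advisor.s_ID = student.ID
join instructor on instructor.ID = advisor.i_ID
where student.dept_name = 'Accounting'
  and instructor.dept_name = 'Physics'
"""
advised = adv_conn.execute(advised_query).fetchall()
print(advised)
```
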
> with dept_total (dept_name, value) as (select dept_name, sum(salary) from instructor group by dept_name), dept_total_avg (value) as (select avg(value) from dept_total) select dept_name from dept_total, dept_total_avg where dept_total.v
> Explain the concept of physical data independence and its importance in database systems.
> Rewrite the where clause where unique (select title from course) without using the unique construct.
> Choose an enterprise of personal interest to you and explain how blockchain technology could be employed usefully in that business.
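
One commonly used rewrite compares `count(title)` with `count(distinct title)`. The sketch below checks that condition with `sqlite3`, which has no `unique` construct of its own. The helper name `titles_are_unique` and the reduced `course` table are assumptions made for this example.

```python
import sqlite3

def titles_are_unique(titles):
    """Return True when no non-null title occurs twice in course."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table course (course_id integer primary key, title text)")
    conn.executemany("insert into course (title) values (?)", [(t,) for t in titles])
    # Both counts skip nulls, matching unique, which never treats two
    # null values as duplicates of each other.
    (flag,) = conn.execute(
        "select (select count(title) from course) = "
        "(select count(distinct title) from course)"
    ).fetchone()
    conn.close()
    return bool(flag)

print(titles_are_unique(["Genetics", "Robotics"]))
print(titles_are_unique(["Genetics", "Genetics"]))
print(titles_are_unique([None, None]))
```

In a real query the same comparison would sit directly in the where clause in place of the `unique` test.
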
> Explain how off-chain transaction processing can enhance throughput. What are the trade-offs for this benefit?
> Why is Byzantine consensus a poor consensus mechanism in a public blockchain?
> How is the difficulty of proof-of-work mining adjusted as more nodes join the network, thus increasing the total computational power of the network? Describe the process in detail.
> Consider the library database of Figure 3.20. Write the following queries in SQL. a. Find the member number and name of each member who has borrowed at least one book published by “McGraw-Hill”. b. Find the member number and name of each member who has b
> Suppose a user forgets or loses her or his private key? How is the user affected?
> Write a servlet and associated HTML code for the following very simple application: A user is allowed to submit a form containing a value, say n, and should get a response containing n “*” symbols.
> Describe at least three tables that might be used to store information in a social-networking system such as Facebook.
> Since pointers in a blockchain include a cryptographic hash of the previous block, why is there the additional need for replication of the blockchain to ensure immutability?
> Since blockchains are immutable, how is a transaction abort implemented so as not to violate immutability?
> In what order are blockchain transactions serialized?
> Given that the LDAP functionality can be implemented on top of a database system, what is the need for the LDAP standard?
> Explain what application characteristics would help you decide which of TPC-C, TPC-H, or TPC-R best models the application.
> Why was the TPC-D benchmark replaced by the TPC-H and TPC-R benchmarks?
> List at least four features of the TPC benchmarks that help make them realistic and dependable measures.
> Suppose the price of memory falls by half, and the speed of disk access (number of accesses per second) doubles, while all other factors remain the same. What would be the effect of this change on the 5-minute and 1-minute rule?
> What is the motivation for splitting a long transaction into a series of small ones? What problems could arise as a result, and how can these problems be averted?
> Show that, in SQL, <> all is identical to not in.
> The Google search engine provides a feature whereby web sites can display advertisements supplied by Google. The advertisements supplied are based on the contents of the page. Suggest how Google might choose which advertisements to supply for a page, giv
> Suppose that your application has transactions that each access and update a single tuple in a very large relation stored in a B+-tree file organization. Assume that all internal nodes of the B+-tree are in memory, but only a very small fraction of the leaf pages can fit in memory. Explain how to calculate the minimum number of disks
> When carrying out performance tuning, should you try to tune your hardware (by adding disks or memory) first, or should you try to tune your transactions (by adding indices or materialized views) first? Explain your answer.
> Database tuning: a. What are the three broad levels at which a database system can be tuned to improve performance? b. Give two examples of how tuning can be done for each of the levels.
> Our description of static hashing assumes that a large contiguous stretch of disk blocks can be allocated to a static hash table. Suppose you can allocate only C contiguous blocks. Suggest how to implement the hash table, if it can be much larger than C
> Why is a hash structure not the best choice for a search key on which range queries are likely?
> What are the causes of bucket overflow in a hash file organization? What can be done to reduce the occurrence of bucket overflows?
> Explain the distinction between closed and open hashing. Discuss the relative merits of each technique in database applications.
> Suppose you want to use the idea of a quadtree for data in three dimensions. How would the resultant data structure (called an octree) divide up space?
> The stepped merge variant of the LSM tree allows multiple trees per level. What are the tradeoffs in having more trees per level?
> For correct execution of a replicated state machine, the actions must be deterministic. What could happen if an action is non-deterministic?
> Web sites that want to get some publicity can join a web ring, where they create links to other sites in the ring in exchange for other sites in the ring creating links to their site. What is the effect of such rings on popularity ranking techniques such
> Write the following queries in SQL, using the university schema. a. Find the ID and name of each student who has taken at least one Comp. Sci. course; make sure there are no duplicate names in the result. b. Find the ID and name of each student who has n
> Why is the notion of term important when an election is used to choose a coordinator? What are the analogies between elections with terms and elections used in a democracy?
> Merkle trees can be made short and fat (like B+-trees) or thin and tall (like binary search trees). Which option would be better if you are comparing data across two sites that are geographically separated, and why?
> Spanner provides read-only transactions a snapshot view of data, using multi-version two-phase locking. a. In the centralized multi-version 2PL scheme, read-only transactions never wait. But in Spanner, reads may have to wait. Explain why. b. Using an o
> Discuss the advantages and disadvantages of the two methods that we presented in Section 23.3.4 for generating globally unique timestamps.
> If we apply a distributed version of the multiple-granularity protocol of Chapter 18 to a distributed database, the site responsible for the root of the DAG may become a bottleneck. Suppose we modify that protocol as follows: • Only intention-mode locks
> In the majority protocol, what should the reader do if it finds different values from different copies, to (a) decide what is the correct value, and (b) to bring the copies back to consistency? If the reader does not bother to bring the copies back to consi
> Give an example where the read one, write all available approach leads to an erroneous state.
> What characteristics of an application make it easy to scale the application by using a key-value store, and what characteristics rule out deployment on key-value stores?
> Consider a system that is processing a stream of tuples for a relation r with attributes (A, B, C, timestamp). Suppose the goal of a parallel stream processing system is to compute the number of tuples for each A value in each 5-minute window (based on the
> Suppose you wish to perform keyword querying on a set of tuples in a database, where each tuple has only a few attributes, each containing only a few words. Does the concept of term frequency make sense in this context? And that of inverse document frequ
> The attribute on which a relation is partitioned can have a significant impact on the cost of a query. a. Given a workload of SQL queries on a single relation, what attributes would be candidates for partitioning? b. How would you choose between the alter
> What is the motivation for work-stealing with virtual nodes in a shared-memory setting? Why might work-stealing not be as efficient in a shared-nothing setting?
> Suppose you wish to handle a workload consisting of a large number of small transactions by using shared-nothing parallelism. a. Is intraquery parallelism required in such a situation? If not, why, and what form of parallelism is appropriate? b. What fo
> Describe a good way to parallelize each of the following: a. The difference operation b. Aggregation by the count operation c. Aggregation by the count distinct operation d. Aggregation by the avg operation e. Left outer join, if the join condition involv
> Can partitioned join be used for r ⋈r? A
> Joins can be expensive in a key-value store, and difficult to express if the system does not support SQL or a similar declarative query language. What can an application developer do to efficiently get results of join or aggregate queries in such a setting?
> Why is it easier for a distributed file system such as GFS or HDFS to support replication than it is for a key-value store?
> What is the motivation for storing related records together in a key-value store? Explain the idea using the notion of an entity group.
> What factors could result in skew when a relation is partitioned on one of its attributes by: a. Hash partitioning? b. Range partitioning? In each case, what can be done to reduce the skew?
> Consider the E-R diagram in Figure 8.9, which contains specializations, using subtypes and subtables. a. Give an SQL schema definition of the E-R diagram. b. Give an SQL query to find the names of all people who are not secretaries. c. Give an SQL query t
> For each of the three partitioning techniques, namely, round-robin, hash partitioning, and range partitioning, give an example of a query for which that partitioning technique would provide the fastest response.
> Suppose that a major database vendor offers its database system (e.g., Oracle, SQL Server, DB2) as a cloud service. Where would this fit among the cloud-service models? Why?
> Using the university schema, write an SQL query to find the number of students in each section. The result columns should appear in the order “course_id, sec_id, year, semester, num”. You do not need to output sections with 0 students.
> In a shared-nothing system, data access from a remote node can be done by remote procedure calls, or by sending messages. But remote direct memory access (RDMA) provides a much faster mechanism for such data access. Explain why.
> Assume we have data items d1, d2, ..., dn with each di protected by a lock stored in memory location Mi. a. Describe the implementation of lock-X(di) and unlock(di) via the use of the test-and-set instruction. b. Describe the implementation of lock-X(di)
> Memory systems today are divided into multiple modules, each of which can be serving a separate request at a given time, in contrast to earlier architectures where there was a single interface to memory. What impact does such a memory architecture have on
> What are the factors that can work against linear scaleup in a transaction processing system? Which of the factors are likely to be the most important in each of the following architectures: shared-memory, shared-disk, and shared-nothing?
> Is it wise to allow a user process to access the shared-memory area of a database system? Explain your answer.
> Database systems are typically implemented as a set of processes (or threads) accessing shared memory. a. How is access to the shared-memory area controlled? b. Is two-phase locking appropriate for serializing access to the data structures in shared memo
> Assume that a growing enterprise has outgrown its current computer system and is purchasing a new parallel computer. If the growth has resulted in many more transactions per unit time, but the length of individual transactions has not changed, what measu
> Consider the schemas for the table people, and the tables students and teachers, which were created under people, in Section 8.2.1.3. Give a relational schema in third normal form that represents the same information. Recall the constraints on subtable
> If an enterprise uses its own ERP application on a cloud service under the platform-as-a-service model, what restrictions would there be on when that enterprise may upgrade the ERP system to a new version?
> Consider a bank that has a collection of sites, each running a database system. Suppose the only way the databases interact is by electronic transfer of money between themselves, using persistent messaging. Would such a system qualify as a distributed da
> Suppose there is a transaction that has been running for a very long time but has performed very few updates. a. What effect would the transaction have on recovery time with the recovery algorithm of Section 19.4, and with the ARIES recovery algorithm? b.
> Using the university schema, write an SQL query to find the ID and title of each course in Comp. Sci. that has had at least one section with afternoon hours (i.e., ends at or after 12:00). (You should eliminate duplicates if any.)
> Consider the log in Figure 19.5. Suppose there is a crash just before the log
> Explain why logical undo logging is used widely, whereas logical redo logging (other than physiological redo logging) is rarely used.
> Physiological redo logging can reduce logging overheads significantly, especially with a slotted page record organization. Explain why.
> Suppose two-phase locking is used, but exclusive locks are released early, that is, locking is not done in a strict two-phase manner. Give an example to show why transaction rollback can result in a wrong final state, when using the log-based recovery al
> Outline the drawbacks of the no-steal and force buffer management policies.
> Explain how the database may become inconsistent if some log records pertaining to a block are not output to stable storage before the block is output to disk.
> Redesign the database of Exercise 8.4 into first normal form and fourth normal form. List any functional or multivalued dependencies that you assume. Also list all referential-integrity constraints that should be present in the first and fourth normal form
> Stable storage cannot be implemented. a. Explain why it cannot be. b. Explain how database systems deal with this problem.
> For each of the following requirements, identify the best choice of degree of durability in a remote backup system: a. Data loss must be avoided, but some loss of availability may be tolerated. b. Transaction commit must be accomplished quickly, even at
> Explain the difference between a system crash and a “disaster.”
> In the ARIES recovery algorithm: a. If at the beginning of the analysis pass, a page is not in the checkpoint dirty page table, will we need to apply any redo records to it? Why? b. What is RecLSN, and how is it used to minimize unnecessary redos?
> Rewrite the preceding query, but also ensure that you include only instructors who have given at least one other non-null grade in some course.
> Compare log-based recovery with the shadow-copy scheme in terms of their overheads for the case when data are being added to newly allocated disk pages (in other words, there is no old value to be restored in case the transaction aborts).
> Consider the log in Figure 19.7. Suppose there is a crash during recovery, just before the operation abort log record is written for operation O1. Explain what will happen when the system recovers again.
> Explain the difference between the three storage types (volatile, nonvolatile, and stable) in terms of I/O cost.
> Suppose the lock hierarchy for a database consists of database, relations, and tuples. a. If a transaction needs to read a lot of tuples from a relation r, what locks should it acquire? b. Now suppose the transaction wants to update a few of the tuples i
> The multiple-granularity protocol rules specify that a transaction Ti can lock a node Q in S or IS mode only if Ti currently has the parent of Q locked in either IX or IS mode. Given that SIX and S locks are stronger than IX or IS locks, why does the pro
> Describe the differences in meaning between the terms relation and relation schema.
> Although SIX mode is useful in multiple-granularity locking, an exclusive and intention-shared (XIS) mode is of no use. Why is it useless?
> In multiple-granularity locking, what is the difference between implicit and explicit locking?
> If deadlock is avoided by deadlock-avoidance schemes, is starvation still possible? Explain your answer.
> Under what conditions is it less expensive to avoid deadlock than to allow deadlocks to occur and then to detect them?
> Consider a variant of the tree protocol called the forest protocol. The database is organized as a forest of rooted trees. Each transaction Ti must follow the following rules: • The first lock in each tree may be on any data item. • The second, and all su
> Using the university schema, write an SQL query to find the ID and name of each instructor who has never given an A grade in any course she or he has taught. (Instructors who have never taught a course trivially satisfy this condition.)
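
The "never given an A" condition can be sketched with `not exists`, which keeps instructors who have taught nothing at all. This example assumes that only the exact grade 'A' counts (not 'A-' or 'A+'), uses reduced versions of `instructor`, `teaches`, and `takes`, and fills them with hypothetical rows.

```python
import sqlite3

grade_conn = sqlite3.connect(":memory:")
grade_conn.executescript("""
create table instructor (ID text primary key, name text);
create table teaches (ID text, course_id text, sec_id text, semester text, year integer);
create table takes (ID text, course_id text, sec_id text, semester text, year integer, grade text);
-- Hypothetical sample rows: Katz gave an A, Mozart did not, Gold never taught.
insert into instructor values ('10', 'Katz'), ('20', 'Mozart'), ('30', 'Gold');
insert into teaches values
    ('10', 'CS-101', '1', 'Fall', 2017), ('20', 'MU-199', '1', 'Spring', 2018);
insert into takes values
    ('s1', 'CS-101', '1', 'Fall', 2017, 'A'),
    ('s2', 'CS-101', '1', 'Fall', 2017, 'B'),
    ('s1', 'MU-199', '1', 'Spring', 2018, 'B');
""")

# An instructor qualifies when no section they taught has a takes row with grade 'A'.
no_a_query = """
select ID, name
from instructor
where not exists (
    select 1
    from teaches
    join takes
      on takes.course_id = teaches.course_id
     and takes.sec_id = teaches.sec_id
     and takes.semester = teaches.semester
     and takes.year = teaches.year
    where teaches.ID = instructor.ID
      and takes.grade = 'A'
)
order by ID
"""
no_a_instructors = grade_conn.execute(no_a_query).fetchall()
print(no_a_instructors)
```
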
> Consider the following locking protocol: All items are numbered, and once an item is unlocked, only higher-numbered items may be locked. Locks may be released at any time. Only X-locks are used. Show by an example that this protocol does not guarantee se
> Most implementations of database systems use strict two-phase locking. Suggest three reasons for the popularity of this protocol.
> Many transactions update a common item (e.g., the cash balance at a branch) and private items (e.g., individual account balances). Explain how you can increase concurrency (and throughput) by ordering the operations of the transaction.