Given that the LDAP functionality can be implemented on top of a database system, what is the need for the LDAP standard?
> Consider the advisor relation shown in the schema diagram in Figure 2.9, with s_id as the primary key of advisor. Suppose a student can have more than one advisor. Then, would s_id still be a primary key of the advisor relation? If not, what should the
> Modify the recursive query in Figure 5.16 to define a relation prereq_depth(course_id, prereq_id, depth), where the attribute depth indicates how many levels of intermediate prerequisites there are between the course and the prerequisite. Direct prerequis
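
A minimal sketch of one possible answer, run through Python's built-in `sqlite3` module on hypothetical sample rows. It assumes that direct prerequisites get depth 0 and that prereq contains no cycles (with a cycle, depth keeps growing, so the `union` never stops producing new rows).

```python
import sqlite3

# Hypothetical sample data for prereq(course_id, prereq_id), assumed acyclic.
conn = sqlite3.connect(":memory:")
conn.execute("create table prereq (course_id text, prereq_id text)")
conn.executemany(
    "insert into prereq values (?, ?)",
    [("CS-301", "CS-201"), ("CS-201", "CS-101"), ("CS-190", "CS-101")],
)

# Base case: direct prerequisites at depth 0 (assumption).
# Recursive step: follow one more prereq edge and add one level.
PREREQ_DEPTH = """
with recursive prereq_depth(course_id, prereq_id, depth) as (
    select course_id, prereq_id, 0
    from prereq
  union
    select pd.course_id, p.prereq_id, pd.depth + 1
    from prereq_depth as pd, prereq as p
    where pd.prereq_id = p.course_id
)
select course_id, prereq_id, depth
from prereq_depth
order by course_id, depth
"""

rows = conn.execute(PREREQ_DEPTH).fetchall()
for row in rows:
    print(row)
```

If a course reaches the same prerequisite along paths of different lengths, this query returns one row per distinct depth.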
> Write a Java program that allows university administrators to print the teaching record of an instructor. a. Start by having the user input the login ID and password; then open the proper connection. b. The user is asked next for a search substring and t
> Show how to express the coalesce function using the case construct.
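
The general pattern is coalesce(A1, ..., An) = case when A1 is not null then A1 ... when An-1 is not null then An-1 else An end. The sketch below checks the three-argument case against the built-in function on a hypothetical table t(a, b, c), using SQLite only so that it runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with nullable columns a, b, c.
conn.execute("create table t (a integer, b integer, c integer)")
conn.executemany(
    "insert into t values (?, ?, ?)",
    [(1, 2, 3), (None, 2, 3), (None, None, 3), (None, None, None)],
)

# coalesce returns its first non-null argument (or null if all are null);
# the case expression tests the arguments in the same order.
QUERY = """
select coalesce(a, b, c) as via_coalesce,
       case
           when a is not null then a
           when b is not null then b
           else c
       end as via_case
from t
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```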
> For the view of Exercise 4.18, explain why the database system would not allow a tuple to be inserted into the database through this view.
> Show how to define a view tot_credits(year, num_credits), giving the total number of credits taken in each year.
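
A sketch of one possible view definition, executed with SQLite on minimal subsets of the university schema's takes and course relations; the rows are hypothetical. The natural join matches on course_id, the only attribute the two relations share.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal subsets of course and takes with hypothetical rows.
conn.executescript("""
create table course (course_id text primary key, title text,
                     dept_name text, credits integer);
create table takes (ID text, course_id text, sec_id text,
                    semester text, year integer, grade text);
insert into course values ('CS-101', 'Intro. to Computer Science', 'Comp. Sci.', 4);
insert into course values ('MU-199', 'Music Video Production', 'Music', 3);
insert into takes values ('00128', 'CS-101', '1', 'Fall', 2017, 'A');
insert into takes values ('12345', 'CS-101', '1', 'Fall', 2017, 'B');
insert into takes values ('12345', 'MU-199', '1', 'Spring', 2018, 'A');
""")

# One row per year: the credits of every takes tuple in that year, summed.
conn.execute("""
create view tot_credits(year, num_credits) as
    select year, sum(credits)
    from takes natural join course
    group by year
""")

rows = conn.execute("select * from tot_credits order by year").fetchall()
print(rows)
```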
> Under what circumstances would the query Include tuples with null values for the title attribute?
> For the database of Figure 4.12, write a query to find the ID of each employee with no manager. Note that an employee may simply have no manager listed or may have a null manager. Write your query using an outer join and then write it again using no outer
> Express the following query in SQL using no subqueries and no set operations.
> Write an SQL query using the university schema to find the ID of each student who has never taken a course at the university. Do this using no subqueries and no set operations (use an outer join).
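
One possible outer-join answer, sketched with SQLite on hypothetical rows: a left outer join keeps every student, and students with no matching takes tuple receive nulls in the takes attributes, which the where clause then selects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal subsets of student and takes with hypothetical rows.
conn.executescript("""
create table student (ID text primary key, name text,
                      dept_name text, tot_cred integer);
create table takes (ID text, course_id text, sec_id text,
                    semester text, year integer, grade text);
insert into student values ('00128', 'Zhang', 'Comp. Sci.', 102);
insert into student values ('70557', 'Snow', 'Physics', 0);
insert into takes values ('00128', 'CS-101', '1', 'Fall', 2017, 'A');
""")

# Unmatched students keep a null takes.ID after the outer join.
rows = conn.execute("""
select student.ID
from student left outer join takes on student.ID = takes.ID
where takes.ID is null
""").fetchall()
print(rows)
```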
> Rewrite the query `select * from section natural join classroom` without using a natural join but instead using an inner join with a using condition.
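
A sketch, assuming the usual university-schema attributes: section and classroom share exactly building and room_number, so listing those two in the using clause should reproduce the natural join. The check below runs both forms in SQLite on hypothetical rows and compares the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal section and classroom tables with hypothetical rows.
conn.executescript("""
create table classroom (building text, room_number text, capacity integer);
create table section (course_id text, sec_id text, semester text, year integer,
                      building text, room_number text, time_slot_id text);
insert into classroom values ('Packard', '101', 500);
insert into classroom values ('Watson', '120', 50);
insert into section values ('CS-101', '1', 'Fall', 2017, 'Packard', '101', 'H');
insert into section values ('PHY-101', '1', 'Fall', 2017, 'Watson', '100', 'A');
""")

natural = conn.execute(
    "select * from section natural join classroom").fetchall()

# Name exactly the shared attributes in the using clause.
using = conn.execute(
    "select * from section inner join classroom using (building, room_number)"
).fetchall()

print(using)
```

The Watson section has no matching classroom tuple, so both forms drop it.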
> Suppose you wish to create an audit trail of changes to the takes relation. a. Define triggers to create an audit trail, logging the information into a relation called, for example, takes_trail. The logged information should include the user-id (assume a
> List at least two reasons why database systems support data manipulation using a declarative query language such as SQL, instead of just providing a library of C or C++ functions to carry out data manipulation.
> Explain the difference between integrity constraints and authorization constraints.
> Suppose a user creates a new relation r1 with a foreign key referencing another relation r2. What authorization privilege does the user need on r2? Why should this not simply be allowed without any such authorization?
> Suppose user A, who has all authorization privileges on a relation r, grants select on relation r to public with grant option. Suppose user B then grants select on r to A. Does this cause a cycle in the authorization graph? Explain why.
> Explain why, when a manager, say Satoshi, grants an authorization, the grant should be done by the manager role, rather than by the user Satoshi.
> Consider the query Explain why appending natural join section in the from clause would not change the result.
> List two reasons why null values might be introduced into the database.
> Give an SQL schema definition for the employee database of Figure 3.19. Choose an appropriate domain for each attribute and an appropriate primary key for each relation schema. Include any foreign-key constraints that might be appropriate.
> Consider the employee database of Figure 3.19. Give an expression in SQL for each of the following queries. a. Give all employees of “First Bank Corporation” a 10 percent raise. b. Give all managers of “First Bank Corporation” a 10 percent raise. c. Dele
> Consider the employee database of Figure 3.19, where the primary keys are underlined. Give an expression in SQL for each of the following queries. a. Find ID and name of each employee who lives in the same city as the location of the company for which th
> What are two advantages of encrypting data stored in the database?
> Consider the bank database of Figure 3.18, where the primary keys are underlined. Construct the following SQL queries for this relational database. a. Find each customer who has an account at every branch located in “Brooklyn”. b. Find the total sum
> List five responsibilities of a database-management system. For each responsibility, explain the problems that would arise if the responsibility were not discharged.
> Consider the insurance database of Figure 3.17, where the primary keys are underlined. Construct the following SQL queries for this relational database. a. Find the number of accidents involving a car belonging to a person named “John Smith”. b. Update t
> Write SQL DDL corresponding to the schema in Figure 3.17. Make any reasonable assumptions about data types, and be sure to declare primary and foreign keys.
> Write the SQL statements using the university schema to perform the following operations: a. Create a new course “CS-001”, titled “Weekly Seminar”, with 0 credits. b. Create a section of this course in fall 2017, with sec_id of 1, and with the location o
> Using the university schema, write an SQL query to find the name and ID of each History student whose name begins with the letter ‘D’ and who has not taken at least five Music courses.
> Using the university schema, write an SQL query to find the names and IDs of those instructors who teach every course taught in his or her department (i.e., every course that appears in the course relation with the instructor’s department name). Order res
> Using the university schema, write an SQL query to find the IDs of those students who have retaken at least three distinct courses at least once (i.e., the student has taken the course at least two times).
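
One way to answer, sketched in SQLite on hypothetical takes rows: an inner grouping finds the (ID, course_id) pairs taken two or more times, and an outer grouping keeps students with at least three such courses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table takes (ID text, course_id text, sec_id text,
                semester text, year integer, grade text)""")
# Hypothetical data: student 1 retook three courses, student 2 only one.
data = []
for course in ("CS-101", "CS-190", "MU-199"):
    data.append(("1", course, "1", "Fall", 2017, "F"))
    data.append(("1", course, "1", "Spring", 2018, "B"))
data.append(("2", "CS-101", "1", "Fall", 2017, "F"))
data.append(("2", "CS-101", "1", "Spring", 2018, "A"))
data.append(("2", "CS-190", "1", "Fall", 2017, "A"))
conn.executemany("insert into takes values (?, ?, ?, ?, ?, ?)", data)

# Inner query: courses each student took at least twice.
# Outer query: students with three or more such distinct courses.
rows = conn.execute("""
select ID
from (select ID, course_id
      from takes
      group by ID, course_id
      having count(*) >= 2) as retaken
group by ID
having count(*) >= 3
""").fetchall()
print(rows)
```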
> Using the university schema, use SQL to do the following: For each student who has retaken a course at least twice (i.e., the student has taken the course at least three times), show the course ID and the student’s ID. Please display your results in orde
> Using the university schema, write an SQL query to find the names of those departments whose budget is higher than that of Philosophy. List them in alphabetic order.
> Consider the Oracle Virtual Private Database (VPD) feature described in Section 9.8.5 and an application based on our university schema. a. What predicate (using a subquery) should be generated to allow each faculty member to see only takes tuples corr
> Using the university schema, write an SQL query to find the name and ID of those Accounting students advised by an instructor in the Physics department.
> with dept_total(dept_name, value) as (select dept_name, sum(salary) from instructor group by dept_name), dept_total_avg(value) as (select avg(value) from dept_total) select dept_name from dept_total, dept_total_avg where dept_total.v
> Explain the concept of physical data independence and its importance in database systems.
> Rewrite the where clause `where unique (select title from course)` without using the unique construct.
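
One common rewrite compares count(title) with count(distinct title). Both aggregates ignore nulls, which matches the rule that unique does not treat tuples containing null as duplicates. The sketch wraps the condition in a hypothetical helper, titles_unique, that loads a list of titles into SQLite and evaluates the where clause.

```python
import sqlite3


def titles_unique(titles):
    """Return True if the rewritten where clause holds for these course titles."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table course (course_id text, title text)")
    conn.executemany(
        "insert into course values (?, ?)",
        [(f"C-{i}", t) for i, t in enumerate(titles)],
    )
    # Equal counts mean no non-null title occurs more than once.
    row = conn.execute("""
        select 1
        where (select count(title) from course)
            = (select count(distinct title) from course)
    """).fetchone()
    conn.close()
    return row is not None


print(titles_unique(["Databases", "Robotics"]))
print(titles_unique(["Databases", "Databases"]))
```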
> Choose an enterprise of personal interest to you and explain how blockchain technology could be employed usefully in that business.
> Explain how off-chain transaction processing can enhance throughput. What are the trade-offs for this benefit?
> Why is Byzantine consensus a poor consensus mechanism in a public blockchain?
> How is the difficulty of proof-of-work mining adjusted as more nodes join the network, thus increasing the total computational power of the network? Describe the process in detail.
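
As one concrete instance, here is a sketch of Bitcoin-style retargeting (the constants are Bitcoin's; other blockchains use different rules, and details such as the maximum-target cap are omitted). A block is valid only if its hash, read as an integer, is below the target, so a smaller target means more work. When added hash power makes blocks arrive faster than intended, the next adjustment shrinks the target in proportion.

```python
# Bitcoin-style parameters; other chains choose different values.
RETARGET_INTERVAL = 2016          # blocks between adjustments
TARGET_BLOCK_TIME = 600           # seconds (10 minutes)
EXPECTED_TIMESPAN = RETARGET_INTERVAL * TARGET_BLOCK_TIME


def retarget(old_target: int, actual_timespan: int) -> int:
    """New target after RETARGET_INTERVAL blocks took actual_timespan seconds."""
    # Clamp the measured time so one adjustment changes the
    # difficulty by at most a factor of 4 in either direction.
    clamped = max(EXPECTED_TIMESPAN // 4,
                  min(actual_timespan, EXPECTED_TIMESPAN * 4))
    # Faster-than-expected blocks give clamped < expected, so the
    # target shrinks and each block needs proportionally more hashing.
    return old_target * clamped // EXPECTED_TIMESPAN


# Network hash power doubled: 2016 blocks took only half the expected time.
print(retarget(1_000_000, EXPECTED_TIMESPAN // 2))
```

Because every node computes the same adjustment from timestamps already recorded in the chain, all nodes agree on the new target without extra coordination.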
> Consider the library database of Figure 3.20. Write the following queries in SQL. a. Find the member number and name of each member who has borrowed at least one book published by “McGraw-Hill”. b. Find the member number and name of each member who has b
> Suppose a user forgets or loses her or his private key. How is the user affected?
> Write a servlet and associated HTML code for the following very simple application: A user is allowed to submit a form containing a value, say n, and should get a response containing n “*” symbols.
> Describe at least three tables that might be used to store information in a social-networking system such as Facebook.
> Since pointers in a blockchain include a cryptographic hash of the previous block, why is there the additional need for replication of the blockchain to ensure immutability?
> Since blockchains are immutable, how is a transaction abort implemented so as not to violate immutability?
> In what order are blockchain transactions serialized?
> Explain what application characteristics would help you decide which of TPC-C, TPC-H, or TPC-R best models the application.
> Why was the TPC-D benchmark replaced by the TPC-H and TPC-R benchmarks?
> List at least four features of the TPC benchmarks that help make them realistic and dependable measures.
> Suppose the price of memory falls by half, and the speed of disk access (number of accesses per second) doubles, while all other factors remain the same. What would be the effect of this change on the 5-minute and 1-minute rules?
> What is the motivation for splitting a long transaction into a series of small ones? What problems could arise as a result, and how can these problems be averted?
> Show that, in SQL, `<> all` is identical to `not in`.
> The Google search engine provides a feature whereby web sites can display advertisements supplied by Google. The advertisements supplied are based on the contents of the page. Suggest how Google might choose which advertisements to supply for a page, giv
> Suppose that your application has transactions that each access and update some that all internal nodes of the B+-tree are in memory, but only a very small fraction of the leaf pages can fit in memory. Explain how to calculate the minimum number of disks
> When carrying out performance tuning, should you try to tune your hardware (by adding disks or memory) first, or should you try to tune your transactions (by adding indices or materialized views) first? Explain your answer.
> Database tuning: a. What are the three broad levels at which a database system can be tuned to improve performance? b. Give two examples of how tuning can be done for each of the levels.
> Our description of static hashing assumes that a large contiguous stretch of disk blocks can be allocated to a static hash table. Suppose you can allocate only C contiguous blocks. Suggest how to implement the hash table, if it can be much larger than C
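
One possible approach, sketched under assumptions: allocate the table as several chunks of C contiguous blocks each and keep a small in-memory array of chunk start addresses, much like a directory. Bucket b then lives at offset b mod C inside chunk b div C. The allocator passed to the hypothetical allocate_chunks helper is itself an assumption standing in for the storage manager.

```python
import math
from typing import Callable, Hashable, List


def allocate_chunks(num_buckets: int, c: int,
                    allocate_contiguous: Callable[[int], int]) -> List[int]:
    """Allocate ceil(num_buckets / c) chunks of c contiguous blocks each.

    allocate_contiguous(c) is a hypothetical allocator that returns the
    first block number of a fresh run of c contiguous blocks.
    """
    return [allocate_contiguous(c) for _ in range(math.ceil(num_buckets / c))]


def bucket_block(key: Hashable, num_buckets: int, c: int,
                 chunk_starts: List[int]) -> int:
    """Map a search key to the disk block that holds its bucket."""
    bucket = hash(key) % num_buckets      # ordinary static hash function
    chunk, offset = divmod(bucket, c)     # which chunk, and position inside it
    return chunk_starts[chunk] + offset


# Usage with a fake allocator that hands out runs at scattered addresses.
free_runs = iter([1000, 5000, 9000])
starts = allocate_chunks(10, 4, lambda c: next(free_runs))
print(starts)                          # [1000, 5000, 9000]
print(bucket_block(6, 10, 4, starts))  # bucket 6 -> chunk 1, offset 2 -> 5002
```

The chunk-start array has only one entry per C buckets, so it normally fits in memory and adds no extra disk access per lookup.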
> Why is a hash structure not the best choice for a search key on which range queries are likely?
> What are the causes of bucket overflow in a hash file organization? What can be done to reduce the occurrence of bucket overflows?
> Explain the distinction between closed and open hashing. Discuss the relative merits of each technique in database applications.
> Suppose you want to use the idea of a quadtree for data in three dimensions. How would the resultant data structure (called an octree) divide up space?
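
Where a quadtree node splits a square into four quadrants, an octree node splits a cube into eight octants by halving it along x, y, and z. The sketch below (the Box class is hypothetical) shows the split and how a point picks its child.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned cube with lower corner (x0, y0, z0) and edge length size."""
    x0: float
    y0: float
    z0: float
    size: float

    def children(self):
        """Split the cube into its 8 equal octants (one per octree child)."""
        h = self.size / 2
        return [
            Box(self.x0 + dx * h, self.y0 + dy * h, self.z0 + dz * h, h)
            for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)
        ]

    def octant_of(self, x, y, z):
        """Index (0-7) into children() of the octant containing the point."""
        h = self.size / 2
        dx = int(x >= self.x0 + h)
        dy = int(y >= self.y0 + h)
        dz = int(z >= self.z0 + h)
        return dx * 4 + dy * 2 + dz


root = Box(0, 0, 0, 8)
print(root.octant_of(5, 1, 5), root.children()[root.octant_of(5, 1, 5)])
```

Applying the split recursively to any octant that holds too many points gives the octree.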
> The stepped merge variant of the LSM tree allows multiple trees per level. What are the tradeoffs in having more trees per level?
> For correct execution of a replicated state machine, the actions must be deterministic. What could happen if an action is non-deterministic?
> Web sites that want to get some publicity can join a web ring, where they create links to other sites in the ring in exchange for other sites in the ring creating links to their site. What is the effect of such rings on popularity ranking techniques such
> Write the following queries in SQL, using the university schema. a. Find the ID and name of each student who has taken at least one Comp. Sci. course; make sure there are no duplicate names in the result. b. Find the ID and name of each student who has n
> Why is the notion of term important when an election is used to choose a coordinator? What are the analogies between elections with terms and elections used in a democracy?
> Merkle trees can be made short and fat (like B+-trees) or thin and tall (like binary search trees). Which option would be better if you are comparing data across two sites that are geographically separated, and why?
> Spanner provides read-only transactions a snapshot view of data, using multiversion two-phase locking. a. In the centralized multiversion 2PL scheme, read-only transactions never wait. But in Spanner, reads may have to wait. Explain why. b. Using an o
> Discuss the advantages and disadvantages of the two methods that we presented in Section 23.3.4 for generating globally unique timestamps.
> If we apply a distributed version of the multiple-granularity protocol of Chapter 18 to a distributed database, the site responsible for the root of the DAG may become a bottleneck. Suppose we modify that protocol as follows: • Only intention-mode locks
> In the majority protocol, what should the reader do if it finds different values from different copies, to (a) decide what is the correct value, and (b) to bring the copies back to consistency? If the reader does not bother to bring the copies back to consi
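
A sketch of the reader's side, assuming each copy stores a version number that every write increments at a majority of sites (the helper majority_read and the site names are hypothetical). Any read majority and any write majority share at least one site, so the highest version seen is the latest committed value; copies reporting lower versions are stale and can be overwritten.

```python
from typing import Dict, List, Tuple

Replica = Tuple[int, str]   # (version number, value) stored at one copy


def majority_read(copies: Dict[str, Replica],
                  total_sites: int) -> Tuple[str, int, List[str]]:
    """Resolve a read from the copies the reader contacted."""
    if len(copies) <= total_sites // 2:
        raise ValueError("need replies from a majority of sites")
    # (a) The highest version number identifies the most recent write.
    version, value = max(copies.values(), key=lambda r: r[0])
    # (b) Sites with an older version are stale; writing (version, value)
    # back to them restores consistency among the copies read.
    stale = sorted(site for site, (v, _) in copies.items() if v < version)
    return value, version, stale


print(majority_read({"S1": (3, "x=7"), "S2": (2, "x=5"), "S4": (3, "x=7")}, 5))
```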
> Give an example where the read one, write all available approach leads to an erroneous state.
> What characteristics of an application make it easy to scale the application by using a key-value store, and what characteristics rule out deployment on key-value stores?
> Consider a system that is processing a stream of tuples for a relation r with attributes (A, B, C, timestamp). Suppose the goal of a parallel stream processing system is to compute the number of tuples for each A value in each 5-minute window (based on the
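
A single-process sketch of one natural design, assuming tumbling (non-overlapping) windows and timestamps in seconds: route each tuple to a worker by hashing its A value, so all tuples for one A meet at one worker, and let each worker count per (A, window start). Deciding when a window is complete (for example, via watermarks) is not modeled here; the helper names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

WINDOW = 300  # seconds in a 5-minute tumbling window

Row = Tuple[str, int, int, int]  # (A, B, C, timestamp in seconds)


def route(stream: Iterable[Row], num_workers: int) -> List[List[Row]]:
    """Partition the stream on A so each A value goes to exactly one worker."""
    parts: List[List[Row]] = [[] for _ in range(num_workers)]
    for t in stream:
        parts[hash(t[0]) % num_workers].append(t)
    return parts


def count_windows(part: Iterable[Row]) -> Counter:
    """Per-worker aggregation: count tuples per (A, window start)."""
    counts: Counter = Counter()
    for a, _b, _c, ts in part:
        counts[(a, ts - ts % WINDOW)] += 1
    return counts


def parallel_counts(stream: Iterable[Row], num_workers: int = 4) -> Counter:
    # Workers own disjoint A values, so their outputs merge without conflicts.
    total: Counter = Counter()
    for part in route(stream, num_workers):
        total.update(count_windows(part))
    return total


print(parallel_counts([("x", 0, 0, 10), ("x", 0, 0, 300), ("y", 0, 0, 5)]))
```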
> Suppose you wish to perform keyword querying on a set of tuples in a database, where each tuple has only a few attributes, each containing only a few words. Does the concept of term frequency make sense in this context? And that of inverse document frequ
> The attribute on which a relation is partitioned can have a significant impact on the cost of a query. a. Given a workload of SQL queries on a single relation, what attributes would be candidates for partitioning? b. How would you choose between the alter
> Using the university schema, write an SQL query to find section(s) with maximum enrollment. The result columns should appear in the order “courseid, secid, year, semester, num”. (It may be convenient to use the with construct.)
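
A sketch of one with-based answer, run in SQLite on hypothetical takes rows. The temporary relation sec_enrollment (a name chosen here) counts students per section, and the outer query keeps every section whose count equals the maximum, so ties are all returned. Sections with no takes tuples are not counted, which matters only if every section is empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""create table takes (ID text, course_id text, sec_id text,
                semester text, year integer, grade text)""")
# Hypothetical enrollments: two sections tie with 2 students each.
conn.executemany(
    "insert into takes values (?, ?, ?, ?, ?, ?)",
    [
        ("1", "CS-101", "1", "Fall", 2017, "A"),
        ("2", "CS-101", "1", "Fall", 2017, "B"),
        ("3", "CS-190", "2", "Spring", 2017, "A"),
        ("4", "CS-190", "2", "Spring", 2017, "A"),
        ("5", "MU-199", "1", "Spring", 2018, "C"),
    ],
)

rows = conn.execute("""
with sec_enrollment(course_id, sec_id, year, semester, num) as (
    select course_id, sec_id, year, semester, count(ID)
    from takes
    group by course_id, sec_id, year, semester
)
select course_id, sec_id, year, semester, num
from sec_enrollment
where num = (select max(num) from sec_enrollment)
order by course_id
""").fetchall()
print(rows)
```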
> What is the motivation for work-stealing with virtual nodes in a shared-memory setting? Why might work-stealing not be as efficient in a shared-nothing setting?
> Suppose you wish to handle a workload consisting of a large number of small transactions by using shared-nothing parallelism. a. Is intraquery parallelism required in such a situation? If not, why, and what form of parallelism is appropriate? b. What fo
> Describe a good way to parallelize each of the following: a. The difference operation b. Aggregation by the count operation c. Aggregation by the count distinct operation d. Aggregation by the avg operation e. Left outer join, if the join condition involv
> Can partitioned join be used for r ⋈r? A
> Joins can be expensive in a key-value store, and difficult to express if the system does not support SQL or a similar declarative query language. What can an application developer do to efficiently get results of join or aggregate queries in such a setting?
> Why is it easier for a distributed file system such as GFS or HDFS to support replication than it is for a key-value store?
> What is the motivation for storing related records together in a key-value store? Explain the idea using the notion of an entity group.
> What factors could result in skew when a relation is partitioned on one of its attributes by: a. Hash partitioning? b. Range partitioning? In each case, what can be done to reduce the skew?
> Consider the E-R diagram in Figure 8.9, which contains specializations, using subtypes and subtables. a. Give an SQL schema definition of the E-R diagram. b. Give an SQL query to find the names of all people who are not secretaries. c. Give an SQL query t
> For each of the three partitioning techniques, namely, round-robin, hash partitioning, and range partitioning, give an example of a query for which that partitioning technique would provide the fastest response.
> Suppose that a major database vendor offers its database system (e.g., Oracle, SQL Server, DB2) as a cloud service. Where would this fit among the cloud-service models? Why?
> Using the university schema, write an SQL query to find the number of students in each section. The result columns should appear in the order “courseid, secid, year, semester, num”. You do not need to output sections with 0 students.
> In a shared-nothing system, data access from a remote node can be done by remote procedure calls, or by sending messages. But remote direct memory access (RDMA) provides a much faster mechanism for such data access. Explain why.
> Assume we have data items d1, d2, ..., dn, with each di protected by a lock stored in memory location Mi. a. Describe the implementation of lock-X(di) and unlock(di) via the use of the test-and-set instruction. b. Describe the implementation of lock-X(di)
> Memory systems today are divided into multiple modules, each of which can be serving a separate request at a given time, in contrast to earlier architectures where there was a single interface to memory. What impact does such a memory architecture have on
> What are the factors that can work against linear scale up in a transaction processing system? Which of the factors are likely to be the most important in each of the following architectures: shared-memory, shared disk, and shared nothing?
> Is it wise to allow a user process to access the shared-memory area of a database system? Explain your answer.
> Database systems are typically implemented as a set of processes (or threads) accessing shared memory. a. How is access to the shared-memory area controlled? b. Is two-phase locking appropriate for serializing access to the data structures in shared memo
> Assume that a growing enterprise has outgrown its current computer system and is purchasing a new parallel computer. If the growth has resulted in many more transactions per unit time, but the length of individual transactions has not changed, what measu
> Consider the schemas for the table people, and the tables students and teachers, which were created under people, in Section 8.2.1.3. Give a relational schema in third normal form that represents the same information. Recall the constraints on subtable