Term
|
Definition
|
|
Term
| Database Management System (DBMS) |
|
Definition
| a software system responsible for acting as the go-between a user and a database, offering a set of services on that database |
|
|
Term
|
Definition
Presentation tier Application tier Data tier |
|
|
Term
|
Definition
What’s presented to the user; the UI
In simple terms it is a layer which users can access directly such as a web page, or an operating systems GUI |
|
|
Term
|
Definition
AKA (business logic, logic tier, data access tier, or middle tier)
The query engine ● The logical tier is pulled out from the presentation tier and, as its own layer, it controls an application’s functionality by performing detailed processing. |
|
|
Term
|
Definition
The database
This tier consists of database servers. Here information is stored and retrieved. This tier keeps data neutral and independent from application servers or business logic. Giving data its own tier also improves scalability and performance. |
|
|
Term
|
Definition
The name of a table, its attributes (column names)
○ Name(First, Last) ○ T1(a1, a2, … , an) |
|
|
Term
|
Definition
a set of relation schema
{R(a, b), S(c, d)} |
|
|
Term
|
Definition
|
|
Term
|
Definition
Data Definition Language Used to create databases CREATE TABLE |
|
|
Term
|
Definition
Data Query Language Used to get data from database SELECT FROM WHERE queries |
|
|
Term
|
Definition
Data Manipulation Language Used to alter the data in the database ● INSERT INTO ● UPDATE |
|
|
Term
|
Definition
| a set of attributes such that for all tuples in the table, no two have the same values for those attributes. |
|
|
Term
|
Definition
| the minimum amount of attributes required to identify a tuple from all other tuples in table |
|
|
Term
|
Definition
| An attribute of a table that references a candidate key in another table |
|
|
Term
|
Definition
| an attribute or set of attributes that serve as the basis of organization of an index structure. |
|
|
Term
|
Definition
| An index that determines the locations of the records of the file |
|
|
Term
|
Definition
| An index that does not determine the locations of the records. |
|
|
Term
|
Definition
A data model is a conceptual representation of the data structures that are required by a database application.
Includes logical/physical models in MagicDraw |
|
|
Term
|
Definition
■ planning and analysis ■ conceptual design // logic without the details ■ logical design ■ physical design ■ implementation |
|
|
Term
|
Definition
| anything in a model that can be named. |
|
|
Term
|
Definition
| description of an entity’s characteristics. all instances of the same entity have the same attributes |
|
|
Term
|
Definition
| an attribute of an entity that identifies it. in SQL, this would be equivalent to the primary key of the entity. But entities don’t have keys. They have identifiers. |
|
|
Term
|
Definition
| Any connection between two or more entities. Relationships can have attributes, and can involve more than two entities. |
|
|
Term
|
Definition
| the data model with no physical details. Includes entities (tables), attributes (columns/fields) and relationships (keys) |
|
|
Term
|
Definition
| the data model with physical details. Includes Includes tables, columns, keys, data types, validation rules, database triggers, stored procedures, domains, and access constraints |
|
|
Term
|
Definition
| the database definition language file that’s generated from the physical model of the data model. |
|
|
Term
|
Definition
| A collection of UML objects, similar to a package in java |
|
|
Term
|
Definition
| A data model unit that represents a single relational schema |
|
|
Term
| Association (UML modeling) |
|
Definition
| A symbol used to signify foreign key/primary key relations in the UML diagram |
|
|
Term
|
Definition
| one to one, one to many, many to many; details how many objects are involved in the association |
|
|
Term
|
Definition
| When one object contains a bunch of other ones, it is an aggregational relationship. This is a form of ‘weak’ association; the aggregatees do not die when the aggregator dies. |
|
|
Term
| Indentifying/Nonidentifying |
|
Definition
| Identifying attributes are those that can be used to reference a tuple in a relation. Pretty much just primary keys. Identifying attributes are unique in the schema. |
|
|
Term
| Inheritance (UML modeling) |
|
Definition
| Inheritance can be used to simulate complex parent child relations, such as : employee is a person. therefore, employee shall inherit from person. |
|
|
Term
| How are individual constructs of a logical model compiled to a physical model? |
|
Definition
| Abstract tables and types are given physical SQL names |
|
|
Term
| What additional details must (and how) be addressed in the physical model specification panel when transforming a logical model? |
|
Definition
| Associations must be transformed into foreign key relationships and ID values |
|
|
Term
| Design Patterns and Models |
|
Definition
| You should that repeatable patterns in models may connect to abstractions of problem solving at a higher level. |
|
|
Term
| Referential integrity constraints (how to define) |
|
Definition
| In oracle sql, referential integrity constraints are defined by using the A REFERENCES B(x) syntax. |
|
|
Term
| Referential integrity constraints (how they operate) |
|
Definition
| Conceptually, foreign keys allow us to create relations amongst the data that enforce consistency. By maintaining consistency, we can have more reliable data that is free of errors and we can use features like ON DELETE/UPDATE to keep data that way. |
|
|
Term
|
Definition
| “NOT NULL” means that if a tuple is created, the constraint can at no time be ‘null.’ It must have a value |
|
|
Term
|
Definition
| A primary key is a value in a tuple that must be unique for all the tuples in a relation. |
|
|
Term
|
Definition
| In SQL, a ‘check’ is a constraint used to ensure that a value is within a set of specified values. This can be used to create enums. |
|
|
Term
|
Definition
| used as another form of referential integrity. Triggers are checks that have certain conditions that run during specified events and can perform specified actions. For the most part, triggers are just like if (this happens) then (do this) statements. |
|
|
Term
|
Definition
| Views allow us to do a few things. We can mask of sensitive parts of a table from prying eyes, and we can also get a sub-table out of a table that might be frequently used without ever touching a certain set of attributes. |
|
|
Term
| View (use as a subroutine mechanism) |
|
Definition
| Queries that contain sub-queries can sometimes be simplified by first setting up a view that gets the info needed out of a subquery. |
|
|
Term
| View (use per logical restructuring of a database) |
|
Definition
| views can also be used to create a ‘view of how customers see the database,’ allowing us to interact with the data as a customer would by masking out all of the back-end specific implementation details. |
|
|
Term
|
Definition
| The management of resources in a production line |
|
|
Term
| EDI (Electronic Data Interchange) |
|
Definition
A high speed data interchange that allows companies to quickly transfer data between one another
Used to enable commerce between partners |
|
|
Term
|
Definition
returns a relation containing tuples that satisfy specified condition
σcondition(relation) |
|
|
Term
|
Definition
Used to remove attributes from a table so that less data has to be handled
πa1,a2 (relation) ■ returns a relation that contains only attributes a1, a2 of the relation
Is similar to SQL SELECT but it removes columns instead of tuples |
|
|
Term
|
Definition
○ Taking the attributes of two tables, comparing them, and then creating a new relation based on how the data compares ○ Used to join the data of two tables ○ how the data is compared depends on the type of join |
|
|
Term
|
Definition
○ returns a selection of the relation grouped by a selected attribute ○ For example, Table(ShipperName, QuantityShipped) ■ Selecting shippername.count(quantityshipped) would give you the total number of things shipped, in a single tuple ■ Selecting it with groupby(shippername) will keep each distinct shippername separate, so that you get a total number of items shipped by each individual shipper. |
|
|
Term
|
Definition
returns the relation sorted by one or more columns (and if specified in ascending or descending
ORDER BY columnName ASC|DESC |
|
|
Term
|
Definition
Aggregation functions are functions that can be used to tell you data about the table, not the actual data of the table itself
ex: MIN - returns the minimum value of an attribute MAX - returns the max value of an attribute COUNT - returns the number of attributes that have values in a relation |
|
|
Term
|
Definition
you can use + and - on relations that have the same attributes
ex: R+S concatenates the rows of R and S R-S is equal to the relation R after you remove all the tuples in R(intersect)S |
|
|
Term
| Subqueries/Nested queries |
|
Definition
Queries can be nested within each other, creating a tree like-structure of queries
■ For example, ● select * from t1 where t1.x = (select * from t2) ● Nested select query ■ Leads to the question of which order should the queries be executed to maximize speed with respect to the tree |
|
|
Term
|
Definition
Step I
identify multidimensional data
Step II
Analyze multidimensional data into cross-tabulation
Step III
Visualize n-dimensional cube - data cube
Step IV
After you design the cube, you will use the cube's structure to build a relational database (known as a star schema) to house the data for the cube
Step V
Once you load data into the relational database, and then into the cube, you'll be able to see how attributes, dimensions, measures, and measure groups fit together within a cube to create a powerful analytical tool. |
|
|
Term
| Federated/Enterprise Schema |
|
Definition
| an element of data warehouse, is basically ● A database database. Sort of a ‘meta database’ that companies can use to keep track of their databases |
|
|
Term
| OLAP (on-line analytic processing) |
|
Definition
groupby/aggregation queries
Basic idea: converting data into information that decision makers need
Concept to analyze data by multiple dimension in a structure called data cube |
|
|
Term
|
Definition
an index where there is a key-pointer pair to every record in a data block Dense index requires more space, but is able to tell if a record exists by a quick look up (sparse must iterate through block) |
|
|
Term
|
Definition
There is one key-pointer pair for every block
A lot more cost efficient on disk, doesn’t take up as much space |
|
|
Term
|
Definition
● Usually sparse Used to cluster the primary keys of a table on the disk |
|
|
Term
|
Definition
|
|
Term
| Quantitative aspects of a B+tree |
|
Definition
The lowest levels of a B tree are all leaves/data, everything else is just keys pointing downward in the tree Every internal node is at least half full (except for the root) |
|
|
Term
| Parsing (query evaluation) |
|
Definition
| process of turning a query into a parse tree that represents the structure of the query itself |
|
|
Term
| Logical Plan (query evaluation) |
|
Definition
| parse tree is transformed into expression tree of relational algebra |
|
|
Term
| Physical Plan (query evaluation) |
|
Definition
● logical plan turned into the physical plan ● indicates operations performed, in which order, the algorithm used to perform the step, ways data is stored and passed around |
|
|
Term
|
Definition
| Scan all the blocks of the disk looking for the required data |
|
|
Term
|
Definition
| If there’s an index on the data, that can be used to look for required data |
|
|
Term
|
Definition
| Nested Join is just a simple loop through R loop through S compare their tuples and output the result |
|
|
Term
|
Definition
○ Sort the attribute that is to be joined, and then merge it on that attribute ○ quicker because there’s no searching, if the attributes match, then they’ll show up at the same time |
|
|
Term
|
Definition
|
|
Term
|
Definition
|
|
Term
|
Definition
| number of tuples per block |
|
|
Term
|
Definition
| number of distinct values for attribute a on R |
|
|
Term
| Linear cost of table_scan |
|
Definition
c * B(R)
c is cost of finding a block |
|
|
Term
| Affine cost of a table_scan |
|
Definition
c + c’ (B(R) -1) c is cost of finding a block, c’ is the seek time of finding adjacent block |
|
|
Term
|
Definition
| Only one cost - T(R)/B(R) |
|
|
Term
| size of a select from relation R |
|
Definition
| T(R) / (V(R, x1) * V(R, x2) … * V(R, xn)) for all x you’re selecting on |
|
|
Term
| size of a join on relations R and S |
|
Definition
T(R) * T(S) / (max(V(R, x), V(S, x)) * max(V(R, y), V(S, y) …) ○ the denominator is the max of V() for all attributes that R and S have in common |
|
|
Term
| Greedy rules e.g. pushing selects |
|
Definition
| execute select and project before the joins |
|
|
Term
| Dynamic programming method of optimizing join orders |
|
Definition
● bottom up style ● Steps: start with A, B, C, D ○ consider two way joins (AB, AC, AD, BC, BD, CD, threeway joins (ABC, ABD, ACD, BCD), etc - consider minimal cost of all possible plans ● extreme case of left-deep join trees with n! possible sequences - found in 2n-1 steps |
|
|
Term
| Definition and use of a query graph |
|
Definition
● For query Q, a query graph is a labeled graph (V,E) where: ○ each vertex in V (Vi) corresponds with a relation in R (Ri) ○ an edge exists from Vi to Vj if predicate P(Ri, Rj) exists in Q ● Pros: avoids Cartesian products
● Steps: ○ remove vertex V from query graph and add to join plan ○ choose vertex in remaining query graph but connected edge-wise to plan |
|
|
Term
|
Definition
| A transaction is a collection of actions upon a database bundled up as a single package that contain follow the ACID properties. |
|
|
Term
|
Definition
■ A…TOMICITY All of the actions in a transaction happen, or NONE of them do |
|
|
Term
|
Definition
■ C...ONSISTENCY ● If the database was consistent before the transaction, it will remain consistent after the transaction |
|
|
Term
|
Definition
■ I...SOLOATION ● the execution of a transaction is unaffected by other transactions |
|
|
Term
|
Definition
■ D…URABILITY ● if a transaction commits, then the effects of that transaction persist in the database |
|
|
Term
| Architectural Component of RDBMS (i.e. logs) |
|
Definition
● What enable transactions to occur ● A logbook of every event that happens in the DBMS ● If a transaction does not complete, then it is not added to the log |
|
|
Term
|
Definition
| Java Database Connectivity |
|
|
Term
|
Definition
■ establish running programs/drivers (database server & app/client driver) ■ create connectivity (“open a connection”) ■ ask something of the database using Statements ■ process returned results (after executing queries) |
|
|
Term
| MapReduce Master/Slave architecture |
|
Definition
|
|
Term
| MapReduce Two phase algorithm |
|
Definition
Map: ● function written by user that takes input data, and outputs a key/value pair ● similar to GROUPBY
Reduce: ● function that takes a list of key,value pairs that were generated by the Mapper and condenses them down ● simliar to COUNT |
|
|
Term
| Example of a word counting program |
|
Definition
| ● Mapper would take each word, and emit
● Reducer would take all of these and then count up all of them
●
● reduces to |
|
|
Term
|
Definition
| a collection of data designed to support management decision making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time. |
|
|
Term
|
Definition
| Development of a data warehouse includes development of systems to extract data from operating systems plus installation of a warehouse database system that provides managers flexible access to the data. |
|
|
Term
|
Definition
Stable storage must be provided to prevent data loss
Um I guess you have paired sectors that you write data to with parity checks to make sure everything is ok |
|
|
Term
|
Definition
| Resource Description Framework
Has triples:
Triplets are used on a directed label graph |
|
|
Term
|
Definition
Merge Sort - divide and conquer Divide (sort) and Merge |
|
|
Term
| Joins are commutative and associative |
|
Definition
|
|
Term
|
Definition
| right child is always a relation, not a result of a join |
|
|
Term
| How to compute I/O cost for a select using a primary key (linear) |
|
Definition
| Since we're using a primary key, there could only be at most one record with that value. So it would be 1 I/O lookup |
|
|
Term
| How to compute I/O cost for a select using a compound primary key where all the values are given in the select query (linear) |
|
Definition
| this is essentially one unique primary key so at most 1 record, 1 I/O lookup |
|
|
Term
How to compute I/O cost for a select using a compound primary key where not all the values are given in the select query (linear)
where x = a AND y = b etc |
|
Definition
| order of the compound key is important because it tells you the order to which the attributes were sorted. Whichever is the first attribute in the compound key, take the total number of blocks of the relation and divide it by the number of unique vales of that attribute. Take this number and divide by the number of unique values of the second attribute and so on until you're done. I guess you take the ceiling of the final number you end up with (may very well be a fraction as to which case the I/O is just 1 since the answer is stored on a fraction of a block). The total number of records would be the final number * (total number of tuples/the number of blocks for the relation) |
|
|
Term
| If there's no indexing, you have to do a table_scan when trying to figure out I/O cost |
|
Definition
|
|
Term
| Figuring I/O with an AND clause in a select |
|
Definition
| take the total number of blocks and divide by the number of unique values of the first attribute. take this number and divide by the number of unique values of the second attribute and so. this well tell you how many blocks (I/O accesses) hold that info. just times this number by tuples/blocks to get the number of records this will return |
|
|
Term
| Figuring I/O with an OR clause in a select |
|
Definition
You have to do the union law: AUB = A+B-|A intersect B|
so for example, see how many unique values there are of A, times this by the records/block. this will give you the number of blocks holding A
see how many unique values there are of B, times this by the records/block. this will give you the number of blocks holding B
figure out how many blocks hold them together (remember the divide trick
then add A+B - AandB |
|
|
Term
| Figuring out the size of a join query |
|
Definition
| T(R)T(S)/Max of the unique value of the attribute they have in common |
|
|
Term
| Figuring out the size of a join query - no common attributes |
|
Definition
|
|
Term
| Figuring out the size of a join query, multiple shared attribute |
|
Definition
| T(R)T(S)/Max of attribute 1 between the two * Max of attribute 2 between the two |
|
|
Term
| How to figure out the number of blocks and I/O for a dense, sequential B tree |
|
Definition
Take the number of records and divide it by the number of records per block. ------------------------------------- Take the total number of records and divide it by the number of pointers and keep successfully dividing the answer by the number of pointers until you can't anymore.
Add up the total number of blocks and however many divisions you did (including the first one) + 1 is the number of IO |
|
|
Term
| How to figure out the number of blocks and I/O for a sparse, sequential B tree |
|
Definition
Take the number of records and divide it by the number of records per block
Use this number and divide it by the number of keys until you can't anymore
Add up the total number of blocks and however many divisions you did (including the first one) + 1 is the number of IO |
|
|
Term
| How to figure out the number of blocks and I/O for a dense B tree in no particular order |
|
Definition
| same process as if the B tree was sorted |
|
|