Shared Flashcard Set

Details

Title

CS 347 - Final Exam Notes

Description

Miranker

Total Cards

113

Subject

Computer Science

Level

Undergraduate 3

Created

10/22/2013

Click here to study/print these flashcards.

Create your own flash cards! Sign up here.

Additional Computer Science Flashcards

Cards Return to Set Details

Term

Database

Definition

a collection of data

Term

Database Management System (DBMS)

Definition

a software system responsible for acting as the go-between a user and a database, offering a set of services on that database

Term

Three tier architecture

Definition

Presentation tier
Application tier
Data tier

Term

Presentation tier

Definition

What’s presented to the user; the UI

In simple terms it is a layer which users can access directly such as a web page, or an operating systems GUI

Term

Application tier

Definition

AKA (business logic, logic tier, data access tier, or middle tier)

The query engine
● The logical tier is pulled out from the presentation tier and, as its own layer, it controls an application’s functionality by performing detailed processing.

Term

Data tier

Definition

The database

This tier consists of database servers. Here information is stored and retrieved. This tier keeps data neutral and independent from application servers or business logic. Giving data its own tier also improves scalability and performance.

Term

Relation schema

Definition

The name of a table, its attributes (column names)

○ Name(First, Last)
○ T1(a1, a2, … , an)

Term

Database schema

Definition

a set of relation schema

{R(a, b), S(c, d)}

Term

SQL and its 3 components

Definition

DDL, DQL, DML

Term

DDL

Definition

Data Definition Language
Used to create databases
CREATE TABLE

Term

DQL

Definition

Data Query Language
Used to get data from database
SELECT FROM WHERE queries

Term

DML

Definition

Data Manipulation Language
Used to alter the data in the database
● INSERT INTO
● UPDATE

Term

Super Key

Definition

a set of attributes such that for all tuples in the table, no two have the same values for those attributes.

Term

Candidate Key

Definition

the minimum amount of attributes required to identify a tuple from all other tuples in table

Term

Foreign Key

Definition

An attribute of a table that references a candidate key in another table

Term

Index Key

Definition

an attribute or set of attributes that serve as the basis of organization of an index structure.

Term

Primary index

Definition

An index that determines the locations of the records of the file

Term

Secondary index

Definition

An index that does not determine the locations of the records.

Term

Data model

Definition

A data model is a conceptual representation of the data structures that are required by a database application.

Includes logical/physical models in MagicDraw

Term

Steps of data modeling

Definition

■ planning and analysis
■ conceptual design // logic without the details
■ logical design
■ physical design
■ implementation

Term

Entity

Definition

anything in a model that can be named.

Term

Attribute

Definition

description of an entity’s characteristics. all instances of the same entity have the same attributes

Term

Identifier

Definition

an attribute of an entity that identifies it. in SQL, this would be equivalent to the primary key of the entity. But entities don’t have keys. They have identifiers.

Term

Relationship

Definition

Any connection between two or more entities. Relationships can have attributes, and can involve more than two entities.

Term

Logical Model

Definition

the data model with no physical details. Includes entities (tables), attributes (columns/fields) and relationships (keys)

Term

Physical Model

Definition

the data model with physical details. Includes Includes tables, columns, keys, data types, validation rules, database triggers, stored procedures, domains, and access constraints

Term

DDL

Definition

the database definition language file that’s generated from the physical model of the data model.

Term

Package (UML modeling)

Definition

A collection of UML objects, similar to a package in java

Term

Class (UML modeling)

Definition

A data model unit that represents a single relational schema

Term

Association (UML modeling)

Definition

A symbol used to signify foreign key/primary key relations in the UML diagram

Term

Multiplicity

Definition

one to one, one to many, many to many; details how many objects are involved in the association

Term

Aggregation

Definition

When one object contains a bunch of other ones, it is an aggregational relationship. This is a form of ‘weak’ association; the aggregatees do not die when the aggregator dies.

Term

Indentifying/Nonidentifying

Definition

Identifying attributes are those that can be used to reference a tuple in a relation. Pretty much just primary keys. Identifying attributes are unique in the schema.

Term

Inheritance (UML modeling)

Definition

Inheritance can be used to simulate complex parent child relations, such as : employee is a person. therefore, employee shall inherit from person.

Term

How are individual constructs of a logical model compiled to a physical model?

Definition

Abstract tables and types are given physical SQL names

Term

What additional details must (and how) be addressed in the physical model specification panel when transforming a logical model?

Definition

Associations must be transformed into foreign key relationships and ID values

Term

Design Patterns and Models

Definition

You should that repeatable patterns in models may connect to abstractions of problem solving at a higher level.

Term

Referential integrity constraints (how to define)

Definition

In oracle sql, referential integrity constraints are defined by using the A REFERENCES B(x) syntax.

Term

Referential integrity constraints (how they operate)

Definition

Conceptually, foreign keys allow us to create relations amongst the data that enforce consistency. By maintaining consistency, we can have more reliable data that is free of errors and we can use features like ON DELETE/UPDATE to keep data that way.

Term

Not null

Definition

“NOT NULL” means that if a tuple is created, the constraint can at no time be ‘null.’ It must have a value

Term

Primary key

Definition

A primary key is a value in a tuple that must be unique for all the tuples in a relation.

Term

Check

Definition

In SQL, a ‘check’ is a constraint used to ensure that a value is within a set of specified values. This can be used to create enums.

Term

Triggers

Definition

used as another form of referential integrity. Triggers are checks that have certain conditions that run during specified events and can perform specified actions. For the most part, triggers are just like if (this happens) then (do this) statements.

Term

View (syntax/semantics)

Definition

Views allow us to do a few things. We can mask of sensitive parts of a table from prying eyes, and we can also get a sub-table out of a table that might be frequently used without ever touching a certain set of attributes.

Term

View (use as a subroutine mechanism)

Definition

Queries that contain sub-queries can sometimes be simplified by first setting up a view that gets the info needed out of a subquery.

Term

View (use per logical restructuring of a database)

Definition

views can also be used to create a ‘view of how customers see the database,’ allowing us to interact with the data as a customer would by masking out all of the back-end specific implementation details.

Term

Supply chain

Definition

The management of resources in a production line

Term

EDI (Electronic Data Interchange)

Definition

A high speed data interchange that allows companies to quickly transfer data between one another

Used to enable commerce between partners

Term

Select operator

Definition

returns a relation containing tuples that satisfy specified condition

σcondition(relation)

Term

Project operator

Definition

Used to remove attributes from a table so that less data has to be handled

πa1,a2 (relation)
■ returns a relation that contains only attributes a1, a2 of the relation

Is similar to SQL SELECT but it removes columns instead of tuples

Term

Join operator

Definition

○ Taking the attributes of two tables, comparing them, and then creating a new relation based on how the data compares
○ Used to join the data of two tables
○ how the data is compared depends on the type of join

Term

Group by

Definition

○ returns a selection of the relation grouped by a selected attribute
○ For example, Table(ShipperName, QuantityShipped)
■ Selecting shippername.count(quantityshipped) would give you the total number of things shipped, in a single tuple
■ Selecting it with groupby(shippername) will keep each distinct shippername separate, so that you get a total number of items shipped by each individual shipper.

Term

Ordered by

Definition

returns the relation sorted by one or more columns (and if specified in ascending or descending

ORDER BY columnName ASC|DESC

Term

Aggregation functions

Definition

Aggregation functions are functions that can be used to tell you data about the table, not the actual data of the table itself

ex: MIN - returns the minimum value of an attribute
MAX - returns the max value of an attribute
COUNT - returns the number of attributes that have values in a relation

Term

Algebraic Identifiers

Definition

you can use + and - on relations that have the same attributes

ex: R+S concatenates the rows of R and S
R-S is equal to the relation R after you remove all the tuples in R(intersect)S

Term

Subqueries/Nested queries

Definition

Queries can be nested within each other, creating a tree like-structure of queries

■ For example,
● select * from t1 where t1.x = (select * from t2)
● Nested select query
■ Leads to the question of which order should the queries be executed to maximize speed with respect to the tree

Term

OLAP steps

Definition

Step I

identify multidimensional data

Step II

Analyze multidimensional data into cross-tabulation

Step III

Visualize n-dimensional cube - data cube

Step IV

After you design the cube, you will use the cube's structure to build a relational database (known as a star schema) to house the data for the cube

Step V

Once you load data into the relational database, and then into the cube, you'll be able to see how attributes, dimensions, measures, and measure groups fit together within a cube to create a powerful analytical tool.

Term

Federated/Enterprise Schema

Definition

an element of data warehouse, is basically ● A database database. Sort of a ‘meta database’ that companies can use to keep track of their databases

Term

OLAP (on-line analytic processing)

Definition

groupby/aggregation queries

Basic idea: converting data into information that decision makers need

Concept to analyze data by multiple dimension in a structure called data cube

Term

Dense index

Definition

an index where there is a key-pointer pair to every record in a data block
Dense index requires more space, but is able to tell if a record exists by a quick look up (sparse must iterate through block)

Term

Sparse index

Definition

There is one key-pointer pair for every block

A lot more cost efficient on disk, doesn’t take up as much space

Term

B+tree (primary index)

Definition

● Usually sparse
Used to cluster the primary keys of a table on the disk

Term

B+tree (secondary index)

Definition

usually dense

Term

Quantitative aspects of a B+tree

Definition

The lowest levels of a B tree are all leaves/data, everything else is just keys pointing downward in the tree
Every internal node is at least half full (except for the root)

Term

Parsing (query evaluation)

Definition

process of turning a query into a parse tree that represents the structure of the query itself

Term

Logical Plan (query evaluation)

Definition

parse tree is transformed into expression tree of relational algebra

Term

Physical Plan (query evaluation)

Definition

● logical plan turned into the physical plan
● indicates operations performed, in which order, the algorithm used to perform the step, ways data is stored and passed around

Term

table_scan

Definition

Scan all the blocks of the disk looking for the required data

Term

index_scan

Definition

If there’s an index on the data, that can be used to look for required data

Term

Nested loops

Definition

Nested Join is just a simple loop through R loop through S compare their tuples and output the result

Term

Merge Join

Definition

○ Sort the attribute that is to be joined, and then merge it on that attribute
○ quicker because there’s no searching, if the attributes match, then they’ll show up at the same time

Term

B(R)

Definition

number of blocks in R

Term

T(R)

Definition

number of tuples in R

Term

T(R)/B(R)

Definition

number of tuples per block

Term

V(R, a)

Definition

number of distinct values for attribute a on R

Term

Linear cost of table_scan

Definition

c * B(R)

c is cost of finding a block

Term

Affine cost of a table_scan

Definition

c + c’ (B(R) -1)
c is cost of finding a block, c’ is the seek time of finding adjacent block

Term

Cost of an index_scan

Definition

Only one cost - T(R)/B(R)

Term

size of a select from relation R

Definition

T(R) / (V(R, x1) * V(R, x2) … * V(R, xn)) for all x you’re selecting on

Term

size of a join on relations R and S

Definition

T(R) * T(S) / (max(V(R, x), V(S, x)) * max(V(R, y), V(S, y) …)
○ the denominator is the max of V() for all attributes that R and S have in common

Term

Greedy rules e.g. pushing selects

Definition

execute select and project before the joins

Term

Dynamic programming method of optimizing join orders

Definition

● bottom up style
● Steps: start with A, B, C, D
○ consider two way joins (AB, AC, AD, BC, BD, CD, threeway joins (ABC, ABD, ACD, BCD), etc - consider minimal cost of all possible plans
● extreme case of left-deep join trees with n! possible sequences - found in 2n-1 steps

Term

Definition and use of a query graph

Definition

● For query Q, a query graph is a labeled graph (V,E) where:
○ each vertex in V (Vi) corresponds with a relation in R (Ri)
○ an edge exists from Vi to Vj if predicate P(Ri, Rj) exists in Q
● Pros: avoids Cartesian products

● Steps:
○ remove vertex V from query graph and add to join plan
○ choose vertex in remaining query graph but connected edge-wise to plan

Term

Transactions

Definition

A transaction is a collection of actions upon a database bundled up as a single package that contain follow the ACID properties.

Term

A in ACID

Definition

■ A…TOMICITY
All of the actions in a transaction happen, or NONE of them do

Term

C in ACID

Definition

■ C...ONSISTENCY
● If the database was consistent before the transaction, it will remain consistent after the transaction

Term

I in ACID

Definition

■ I...SOLOATION
● the execution of a transaction is unaffected by other transactions

Term

D in ACID

Definition

■ D…URABILITY
● if a transaction commits, then the effects of that transaction persist in the database

Term

Architectural Component of RDBMS (i.e. logs)

Definition

● What enable transactions to occur
● A logbook of every event that happens in the DBMS
● If a transaction does not complete, then it is not added to the log

Term

JDBC

Definition

Java Database Connectivity

Term

JDBC Steps

Definition

■ establish running programs/drivers (database server & app/client driver)
■ create connectivity (“open a connection”)
■ ask something of the database using Statements
■ process returned results (after executing queries)

Term

MapReduce Master/Slave architecture

Definition

Term

MapReduce Two phase algorithm

Definition

Map: ● function written by user that takes input data, and outputs a key/value pair
● similar to GROUPBY

Reduce: ● function that takes a list of key,value pairs that were generated by the Mapper and condenses them down
● simliar to COUNT

Term

Example of a word counting program

Definition

● Mapper would take each word, and emit ● Reducer would take all of these and then count up all of them ● ● reduces to

Term

Data Warehouse

Definition

a collection of data designed to support management decision making. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time.

Term

Data Warehouse (parts)

Definition

Development of a data warehouse includes development of systems to extract data from operating systems plus installation of a warehouse database system that provides managers flexible access to the data.

Term

Stable store

Definition

Stable storage must be provided to prevent data loss

Um I guess you have paired sectors that you write data to with parity checks to make sure everything is ok

Term

RDF

Definition

Resource Description Framework Has triples: Triplets are used on a directed label graph

Term

Two Phase Sort

Definition

Merge Sort - divide and conquer
Divide (sort) and Merge

Term

Joins are commutative and associative

Definition

Term

Left-deep join

Definition

right child is always a relation, not a result of a join

Term

How to compute I/O cost for a select using a primary key (linear)

Definition

Since we're using a primary key, there could only be at most one record with that value. So it would be 1 I/O lookup

Term

How to compute I/O cost for a select using a compound primary key where all the values are given in the select query (linear)

Definition

this is essentially one unique primary key so at most 1 record, 1 I/O lookup

Term

How to compute I/O cost for a select using a compound primary key where not all the values are given in the select query (linear)

where x = a AND y = b etc

Definition

order of the compound key is important because it tells you the order to which the attributes were sorted. Whichever is the first attribute in the compound key, take the total number of blocks of the relation and divide it by the number of unique vales of that attribute. Take this number and divide by the number of unique values of the second attribute and so on until you're done. I guess you take the ceiling of the final number you end up with (may very well be a fraction as to which case the I/O is just 1 since the answer is stored on a fraction of a block). The total number of records would be the final number * (total number of tuples/the number of blocks for the relation)

Term

If there's no indexing, you have to do a table_scan when trying to figure out I/O cost

Definition

Term

Figuring I/O with an AND clause in a select

Definition

take the total number of blocks and divide by the number of unique values of the first attribute. take this number and divide by the number of unique values of the second attribute and so. this well tell you how many blocks (I/O accesses) hold that info. just times this number by tuples/blocks to get the number of records this will return

Term

Figuring I/O with an OR clause in a select

Definition

You have to do the union law:
AUB = A+B-|A intersect B|

so for example, see how many unique values there are of A, times this by the records/block. this will give you the number of blocks holding A

see how many unique values there are of B, times this by the records/block. this will give you the number of blocks holding B

figure out how many blocks hold them together (remember the divide trick

then add A+B - AandB

Term

Figuring out the size of a join query

Definition

T(R)T(S)/Max of the unique value of the attribute they have in common

Term

Figuring out the size of a join query - no common attributes

Definition

T(R)T(S)

Term

Figuring out the size of a join query, multiple shared attribute

Definition

T(R)T(S)/Max of attribute 1 between the two * Max of attribute 2 between the two

Term

How to figure out the number of blocks and I/O for a dense, sequential B tree

Definition

Take the number of records and divide it by the number of records per block.
-------------------------------------
Take the total number of records and divide it by the number of pointers and keep successfully dividing the answer by the number of pointers until you can't anymore.

Add up the total number of blocks and however many divisions you did (including the first one) + 1 is the number of IO

Term

How to figure out the number of blocks and I/O for a sparse, sequential B tree

Definition

Take the number of records and divide it by the number of records per block

Use this number and divide it by the number of keys until you can't anymore

Add up the total number of blocks and however many divisions you did (including the first one) + 1 is the number of IO

Term

How to figure out the number of blocks and I/O for a dense B tree in no particular order

Definition

same process as if the B tree was sorted

Flashcard Machine - create, study and share online flash cards

Shared Flashcard Set

Details

Additional Computer Science Flashcards

Cards Return to Set Details

My Flashcards

Flashcard Library

Browse

About

Help

Mobile