Database: August 2012

Wednesday, 29 August 2012

Other SELECT commands

There are other commands that can be used with the SELECT statement, such as sorting distinct values, retrieving data from multiple tables and sub-queries. They can be used with or without the WHERE condition

Note: When using a sub-query in a WHERE condition, the sub-query must only return one result.

Format

Sorting query results in ascending order by attribute
SELECT <attribute> FROM table ORDER BY attribute;

Sorting query results in descending order by attribute
SELECT <attribute> FROM table ORDER BY attribute DESC;

Show only unique values of an attribute
SELECT DISTINCT attribute FROM table;

Show attributes from multiple tables (pk - primary key, fk - foreign key)
SELECT <attribute> FROM < list of tables> WHERE table1.pk=table2.fk;

Using sub-queries to retrieve results from another source
SELECT <attributes> FROM < SUB_QUERY>;
SELECT <attributes> FROM table WHERE attribute = < SUB_QUERY>;

Tuesday, 28 August 2012

SELECT with conditions

To control which rows of data to retrieve, we can apply various conditions for the table values and only display rows that match that condition

Format

SELECT <attributes> FROM <source table> WHERE <conditio>'

Basic condition <> symbol means "not equal"

SELECT*FROM table WHERE attribute = "value";
SELECT*FROM table WHERE attribute <> "value";
SELECT*FROM table WHERE attribute >1;
SELECT*FROM table WHERE attribute >=1;
SELECT*FROM table WHERE attribute <=1;

Condition for NULL values

SELECT*FROM table WHERE attribute IS NOT NULL;
SELECT*FROM table WHERE attribute IS NULL;

Multiple condition in one statement

SELECT*FROM table WHERE att1="a" AND att2="b";
SELECT*FROM table WHERE att1="a" OR att1="b";

Specify a range using either the AND command or BETWEEN command

SELECT*FROM table WHERE attribute >=0 AND attribute <=100;
SELECT*FROM table WHERE attribute BETWEEN 0 AND 100;

Specify multiple values an attribute can match

SELECT * FROM table WHERE attribute IN ("a", "b", "c");

Match a sub-word in the value.*means anything can be in that position a _means any one character can be in that position.

*Note: LIKE with_is not available in Microsoft SQL
**Note: In My SQL, use a % symbol instead of a *
SELECT*FROM table WHERE attribute LIKE "*word*';
SELECT*FROM table WHERE attribute LIKE "wo_d";

Note: It is possible to compare date values in a condition such as
WHERE game_date > #03-19-04#

Microsoft Access SQL uses date format MM/DD/YY wrapped with #.
Fro example if the date was March 19th 2004 the the SQL value should be #03-19-04#.

My SQL uses " " rather than # # for dates and in the format YYYY/MM/DD.
For example "2004-03-19".

SELECT statement

Most basic SQL command, allows us to retrieve data from the database.

SELECT	Specifies which columns are to appear in output
FROM	Specifies table to be used
WHERE	Filters rows
GROUP BY	Forms groups of rows with same column value
HAVING	Filters groups subject to some condition
ORDER BY	Specifies the order of the output

Format
SELECT <list of attribute> FROM <source table>;

Select all from a table
SELECT * FROM table;

Select a few attributes from a table
SELECT attribute1, attibute2 FROM table;

Examples
SELECT * FROM Gamez;
SELECT game_name, genre, console FROM Gamez;

Writing SQL Commands

SQL statements consist of reserved words and user defined words

Reserved words are a fix part of SQL and must be spelled exactly as required and cannot be split across lines.
User-defined words are made up by user and represent names of various database objects such as relations, columns, views.
Most component of an SQL statement are case insensitive except for literal character data.
Good SQL Style:

Each clause should begin on a new line
Start of a clause should line up with start of other clauses
If a clause has several parts, each should appear on a separate line and be indented under the start of the clause.

Data Manipulation

The four statements used for data manipulation in SQL are:

SELECT to query data in the database
INSERT to insert data into a table
UPDATE to update data in a table
DELETE to delete data from a table

Of the four statements the SELECT statement is the most complex
The following slides present a number of examples of using the SELECT statement for making database queries.

Structured Query Language

The Structured Query Language (SQL) has emerged as one of the most commonly used relational database language.

SQL consists of three main components:

A Data Definition Language (DDL) for defining the database structure.
A Data Control Language (DCL) for controlling access to the data.
A Data Manipulation Language (DML) for retrieving and updating the data in a database.

Sunday, 26 August 2012

Boyce-Codd Normal Form

Database relation in 3rd normal form have neither partial functional dependencies or transitive dependencies.
The reason for this is that such dependencies can lead to update anomalies.
However the rules for 1NF , 2NF, and 3NF do not take into account whether such dependencies remain on other candidate keys of a relation.
Boyce-Codd Normal Form(BCNF) is based on functional dependencies that take into account all candidate keys in a relation.
Every determinant in table is a candidate key
When table contains only one candidate key, the 3NF and the BCNF are equivalent
BCNF can be violated only when table contains more than one candidate key.

A table is in Boyce-Codd normal form if

It is in 3NF
Every determinant in the table is a candidate key.

3NF without BCNF

When happens if a nonkey attribute is the determinant of a key attribute. It does not violate 3NF, yet fails to meet the BCNF requirements.

STU_ID	STAFF_ID	CLASS_CODE	ENROLL_GRADE
125	25	21334	A
125	20	32456	C
135	20	28458	B
135	25	27563	C
144	20	32456	B

Each Class_Code identifies a class uniquely.
A student can take many classes.
Note: Student 125 has taken both 21334 and 32456 earning the grades of A and C.
A staff member can teach many classes, but each class is tough by only one staff
Note:Staff member 20 teaches the classes 32456 and 28458.

Panel A is 3NF. It has the problem of trying to describe two things. If a different member is assigned to teach class 32456 two rows must be updated. Also if student 135 drops out we lose data on who teaches the class.
The solution is Panel B where we decompose the table into two new.

Tuesday, 21 August 2012

Key Types

Key Type	Definition
Superkey	An attribute (or combination of attributes) that uniquely identifies each row in a table
Candidate key	A minimal superkey. A superkey that does not contain a subset of attributes that is itself a superkey
Primary key	A candidate key selected to uniquely identify all other attribute values in any given row. Cannot contain null entries.
Secondary key	An attribute(or combination of attributes) used strictly for data retrieval purposes.
Foreign key	An attribute(or combination of attributes) in one table whose value must either match the primary key in another table or be null.

Improving the Design

Table structures are cleaned up to eliminate troublesome initial partial and transitive dependencies.
Normalization cannot, by itself be relied on to make good designs.
It is valuable because its use helps eliminate data redundancies
Issues to address to produce a good normalized set of tables:

Evaluate Primary Key Assignment

Create JOB_CODE and JOB table to reduce data entry errors
E.g. Database designer in not mistakenly written as DB Designer

Evaluate Naming Conventions

Keep consistent with table name
So CHG_HOUR-> JOB_CHG_HOUR

Refine Attribute Atomicity

An atomic attribute is one that cannot be subdivided and is said to display atomicity
EMP_NAME becomes EMP_FNAME, EMP_LNAME

Identify New Attributes that might be useful

For example add EMP_HIREDATE or data about address etc. Identify New Relationship.
Project and Employee tables have new relationships that may allow you to identify data about the project manager.

Refine Primary Keys as Required for Data Granularity

Granularity is the level of detail represented by the data. Data stored at their lowest level of granularity are said to be "atomic data".
Use ASSIGN_NUM to get greater accuracy and granularity to avoid entity integrity violations.
(e.g. what if employee worked on project at different times? then there is no longer a unique primary key if composite key PROJ_NUM and EMP_NUM is used.)

Maintain Historical Accuracy

Identify past and current charge rates using ASSIGN_CHG_HOUR instead of JOB_CHG_HOUR
This ensures you have accurate record of what happened at the time of the project

Evaluate Using Derive Attributes

In large database, you might want to store derived attributes to reduce time to produce reports.

2NF to 3NF

Step1:Identify Each New Determinant
- For every transitive dependency, write its determinant as Primary Key for new table
Step2: Identify the dependent Attributes
- Identify attributes dependent on each determinant identified in Step 1 and identify dependency
Step3: Remove the dependent Attributes from transitive dependencies.
- Eliminate all dependent attributes in transitive relationship from each of the tables that have such a transitive relationship.

3rd Normal Form(3NF)

A table in 3rd normal form if

It is in 2NF
It contains no transitive dependencies

If a transitive dependency exists it is removed by placnig the appropriate attribute in a new relation along with their determinant.

1NF to 2NF

Identify the primary key for the 1NF relation.
Identify the functional dependencies in the relation.
If partial dependencies exist on the primary key remove them by placing them in a new relation along with a copy of their determinant. (The part of the primary key they depend on)

Step1:Write Each Key Component on a Separate Line.
- Write each key component on separate line, then write original (composite) key on last line.
- Each component will become key on last line
Step2:Assign Corresponding Dependent Attributes
- Determine those attributes that are dependent on other attributes
- At this point most anomalies have been eliminated

2nd Normal Form(2NF)

A relation that is in first normal form and every non-primary key attribute is fully functionally dependent on the primary key is in second normal form.
A table is in 2nd normal form if

It is in 1st normal form
It includes no partial dependencies that is no attribute is dependent only a portion of the primary key.

Moving relation from 1NF to 2NF involves removing partial dependencies.
A relation that is on 1NF and has a single attribute primary key is automatically in 2NF.

UNF to 1NF

Nominate an attribute or group of attributes to act as the key for the non-normalized table.
Identify the repeating groups in the non-normalized table which repeats for the key attributes.
Remove the repeating group by entering appropriate data into the empty columns of rows containing the repeating data("flattening" the table)

Step1:Eliminate the Repeating Group
- Present data in tabular format, where each cell has single value and there are no repeating groups
Step2:Identify the Primary Key
- Primary key must uniquely identify attribute value
Step3:Identify All dependencies
- Dependencies can be depicted with help of a diagram

Example(UNF)

Example(1NF)

Conversion to 1NF

1st Normal Form(1NF)

A relation in which intersection of each row and column contain only one value is said to be in first normal form(1NF)

The approach to moving a table from UNF to 1NF

Repeating groups are removed by adding redundant data.

UN-Normalized Form(UNF)

An UN-normalized table is a table that contains one or more repeating groups.

Data Redundancy

A major aim of database design is to group attributes into relations so as to minimize data redundancy.
Problems associated with data redundancy are illustrated by comparing the Staff and Branch relations with the Staff_Branch relation in the following example.
Relation that have redundant data may suffer from update anomalies.
The occurrence of value for data element more than once within a database or a file.

Functional dependency

Functional Dependency describes the relationship between attributes in a relation. If A and B are attributes of a relation R, B is functionally dependent on A if each value of A is associated with exactly one value of B

Determinant:

Any attribute whose value determines other values within a row. The determinant of a functional dependency refers to the attribute or group of attributes on the left hand side of the arrow.

If an attribute is determined by only part of the primary key ( the determine A), there is a partial dependency.
If an attribute is determined by the whole primary key ( the determinate A) there is a full functional dependency.

Normalization

Normalization is a technique for assigning attributes to entities. This helps eliminate data redundancy and data anomalies.
Normalization is a technique for producing a set of relations with desirable properties, given the data requirements of the enterprise.
A technique which can be used to help find and validate the relations for a given system

The process of normalization was first developed by E.F. Codd in 1972
The benefit of normalization is insertion anomalies are avoided
To achieve this, a suitable set of relation must be found.
The main objective of developing a logical data model for a relational system is to create an accurate representation of the data, its relationships,and constraints.
Normalization examines the relationships which exists between the attributes of a relation.
Normalization is often performed as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form.
Normalization is necessary in order to avoid the database being susceptible to update anomalies.

The Normalization Process:

Each table represents a single subject
No data item will be unnecessarily stored in more than one table.
All attributes in a table are dependent on the primary key.

Data Dictinary

A data dictionary is a documentation used to categorize, explain and apply constraints to all attributes in a database.
It is NOT part of the schema, but it is an essential piece of documentation that goes with your database.
Data Dictionaries are implemented into the DBMS.

Integrity Rules

Integrity rules may sound very technical but they are simple and straightforward rules that each table must follow.
These are very important in database design, when tables break any of the integrity rules our database will contain errors when retrieving information.
Hence the name "integrity" which describes reliability and consistency of values.
there are two types of integrity rules that we will look at:

Entity Integrity Rule
Referential Integrity Rule

Entity Integrity Rule:

The entity integrity rule refers to rules the primary key must follow.

The primary key value cannot be null.
The primary key value must be unique.

If a table does not meet these two requirements, we say the table is violating the entity integrity rule.

The team table violates the entity integrity rules at two places.

Team Super- missing primary key
Super Ultra Mega Team and Best Team in the World -has the same primary key.

Referential Integrity Rule:

The referential integrity rule refers to the foreign key.
The foreign key may be null and may have the same value but:

The foreign key value must match a record in the table it is referring to.

Tables that do not follow this are violating the referential integrity rule.

The Player table violates the referential integrity rule at these places:

Player P8 and P9 - these players are in teamID T8, which does not exist in the Team Table.

Foreign Key

A foreign key is a copy of the primary key. If placed in another table, the two tables then have a link with each other.
Instead, tables use an extra attribute called the foreign key to link tables together.
The foreign key is always placed in the entity that is the "many" entity of the one to many relationship
The reason for this is very simple, each table's cell can only contain one value, it would not make sense to place the playerID foreign key on the team's table
A field in a table that appears as match of the primary key in another table.

Converting From ERD

one to one relationship type:

As shown above, the two entities experiencing a one to one relationship are merged into one table.

One to one relationships are interesting cases because usually they are combined already in the ERD, however certain situation might require us to separate them.

The table's name and keys can also be adusted.

In this example, the person's ID and the heart's ID might point the same record, it might be mode efficient to use just one key and adjust the table name as well.

one to many relationship type:

Tables by themselves cannot show a relationship unlike drawing a diamond in ERD's.
Instead, tables use an extra attribute called the foreign key to link tables together.

many to many relationship:

Before converting, though make sure that the many to many relationship is already broken down with a composite entity.

Each entity is converted into its own schema like before.
However in many to many situations, we also convert the composite entity into is won schema.

There are no foreign keys involved but the composite entity's keys are indirectly linking the other two entities.

The composite entity tables are usually very small in terms of the amount of attributes they have.
A naming convention for these tables is to use the name from the other tow tables in the relational model.

Super and Sub Types:

Super/sub types in relational models are interesting.
Recall that sub types(child) automatically inherit attributes from the super(parent).
There are two ways you can portray super/sub types in table form.

Create a table for the super type then a table for each sub type. Only include unique attributes and relationships in the sub type tables.(recommended)
Only create sub types and include all inherited attributes in each sub type table.

Converting from ERD

To convert an ER diagram to the relational model, we have to identify how a relational model represents each component of our ERD.
To start, the entity itself becomes a table, with the entity name becoming the schema's name.

note: each entity in the ERD is a separate table in the relational model.
note:Schema is another name for a database or collection of tables.

Each attribute in our ERD becomes a column in the relational model
The attribute's name is the heading for each column.

Column names should be only be one word, use an underscore(_) to separate words if needed.

Primary keys should be placed in the first column and underlined just like in our ERD.

Free to insert data into our table/ Each tuple(row) represents an instance of the entity.
ExampleL each tuple in a student table is a separate studet, with all data on the row referring to the same student.
We do not always need to show the data in the schema, there will be times we will only draw the schema structure of the table.
This allows us to draw schema quickly

Reminders:

Each entity in an ERD is a separate table in the relational model
Each table must at least have a primary or composite key.
Each cell in the table can contain a maximum of one value.
Each row in a table refers to an individual instance of the entity.

Relational Model

The Relational Model was first developed in 1969 by E.F Codd (of IBM). It was a breaking through for users and designers.

Its conceptual simplicity set the stage ofr a genuine database revolution.

The Relational Model displays database structures in tables, more closely representing a modern computer based database than the ERD
A collection of data items organized as a set of formally described tables from which data can be accessed easily.

The diagram to the left was our student entity.
The table on the right is the same entity translated into a relational model.
Asides from displaying an entity differently, the relational model also has one more important task which is storing information.

A relational model is able to display an entity's structure while also containing information on each instance of the entity. (As in we do not need to talk about all students as a collection, we can show individual student's info.)