Wednesday, June 26, 2013

Database Designing

Database Designing

A relational database organizes data in tables (or relations). A table is made up of rows and columns. A row is also called a record (or tuple). A column is also called a field (or attribute). A database table is similar to a spreadsheet. However, the relationships that can be created among the tables enable a relational database to efficiently store huge amount of data, and effectively retrieve selected data.
A language called SQL (Structured Query Language) was developed to work with relational databases.

Database Design Objective

A well-designed database shall:
  • Eliminate Data Redundancy: the same piece of data shall not be stored in more than one place. This is because duplicate data not only waste storage spaces but also easily lead to inconsistencies.
  • Ensure Data Integrity and Accuracy:
  • [TODO] more

Relational Database Design Process

Database design is more art than science, as you have to make many decisions. Databases are usually customized to suit a particular application. No two customized applications are alike, and hence, no two database are alike. Guidelines (usually in terms of what not to do instead of what to do) are provided in making these design decision, but the choices ultimately rest on the you - the designer.

Step 1: Define the Purpose of the Database (Requirement Analysis)

Gather the requirements and define the objective of your database, e.g. ...
Drafting out the sample input forms, queries and reports, often helps.

Step 2: Gather Data, Organize in tables and Specify the Primary Keys

Once you have decided on the purpose of the database, gather the data that are needed to be stored in the database. Divide the data into subject-based tables.
Choose one column (or a few columns) as the so-called primary key, which uniquely identify the each of the rows.
Primary Key
In the relational model, a table cannot contain duplicate rows, because that would create ambiguities in retrieval. To ensure uniqueness, each table should have a column (or a set of columns), called primary key, that uniquely identifies every records of the table. For example, an unique number customerID can be used as the primary key for the Customers table; productCode for Products table; isbn for Books table. A primary key is called a simple key if it is a single column; it is called a composite key if it is made up of several columns.
Most RDBMSs build an index on the primary key to facilitate fast search and retrieval.
The primary key is also used to reference other tables (to be elaborated later).
You have to decide which column(s) is to be used for primary key. The decision may not be straight forward but the primary key shall have these properties:
  • The values of primary key shall be unique (i.e., no duplicate value). For example, customerName may not be appropriate to be used as the primary key for the Customers table, as there could be two customers with the same name.
  • The primary key shall always have a value. In other words, it shall not contain NULL.
Consider the followings in choose the primary key:
  • The primary key shall be simple and familiar, e.g., employeeID for employees table and isbn for books table.
  • The value of the primary key should not change. Primary key is used to reference other tables. If you change its value, you have to change all its references; otherwise, the references will be lost. For example, phoneNumber may not be appropriate to be used as primary key for table Customers, because it might change.
  • Primary key often uses integer (or number) type. But it could also be other types, such as texts. However, it is best to use numeric column as primary key for efficiency.
  • Primary key could take an arbitrary number. Most RDBMSs support so-called auto-increment (or AutoNumber type) for integer primary key, where (current maximum value + 1) is assigned to the new record. This arbitrary number is fact-less, as it contains no factual information. Unlike factual information such as phone number, fact-less number is ideal for primary key, as it does not change.
  • Primary key is usually a single column (e.g., customerID or productCode). But it could also make up of several columns. You should use as few columns as possible.
Let's illustrate with an example: a table customers contains columns lastName, firstName, phoneNumber, address, city, state, zipCode. The candidates for primary key are name=(lastName, firstName), phoneNumber, Address1=(address, city, state), Address1=(address, zipCode). Name may not be unique. Phone number and address may change. Hence, it is better to create a fact-less auto-increment number, says customerID, as the primary key.

Step 3: Create Relationships among Tables

A database consisting of independent and unrelated tables serves little purpose (you may consider to use a spreadsheet instead). The power of relational database lies in the relationship that can be defined between tables. The most crucial aspect in designing a relational database is to identify the relationships among tables. The types of relationship include:
  1. one-to-many
  2. many-to-many
  3. one-to-one
One-to-Many
In a "class roster" database, a teacher may teach zero or more classes, while a class is taught by one (and only one) teacher. In a "company" database, a manager manages zero or more employees, while an employee is managed by one (and only one) manager. In a "product sales" database, a customer may place many orders; while an order is placed by one particular customer. This kind of relationship is known as one-to-many.
One-to-many relationship cannot be represented in a single table. For example, in a "class roster" database, we may begin with a table called Teachers, which stores information about teachers (such as name, office, phone and email). To store the classes taught by each teacher, we could create columns class1, class2, class3, but faces a problem immediately on how many columns to create. On the other hand, if we begin with a table called Classes, which stores information about a class (courseCode, dayOfWeek, timeStart and timeEnd); we could create additional columns to store information about the (one) teacher (such as name, office, phone and email). However, since a teacher may teach many classes, its data would be duplicated in many rows in table Classes.
To support a one-to-many relationship, we need to design two tables: a table Classes to store information about the classes with classID as the primary key; and a table Teachers to store information about teachers with teacherID as the primary key. We can then create the one-to-many relationship by storing the primary key of the table Teacher (i.e., teacherID) (the "one"-end or the parent table) in the table classes (the "many"-end or the child table), as illustrated below.

The column teacherID in the child table Classes is known as the foreign key. A foreign key of a child table is a primary key of a parent table, used to reference the parent table.
Take note that for every value in the parent table, there could be zero, one, or more rows in the child table. For every value in the child table, there is one and only one row in the parent table.
Many-to-Many
In a "product sales" database, a customer's order may contain one or more products; and a product can appear in many orders. In a "bookstore" database, a book is written by one or more authors; while an author may write zero or more books. This kind of relationship is known as many-to-many.
Let's illustrate with a "product sales" database. We begin with two tables: Products and Orders. The table products contains information about the products (such as name, description and quantityInStock) with productID as its primary key. The table orders contains customer's orders (customerID, dateOrdered, dateRequired and status). Again, we cannot store the items ordered inside the Orders table, as we do not know how many columns to reserve for the items. We also cannot store the order infomation in the Products table.
To support many-to-many relationship, we need to create a third table (known as a junction table), says OrderDetails (or OrderLines), where each row represents an item of a particular order. For the OrderDetails table, the primary key consists of two columns: orderID and productID, that uniquely identify each row. The columns orderID and productID in OrderDetails table are used to reference Orders and Products tables, hence, they are also the foreign keys in the OrderDetails table.

The many-to-many relationship is, in fact, implemented as two one-to-many relationships, with the introduction of the junction table.
  1. An order has many items in OrderDetails. An OrderDetails item belongs to one particular order.
  2. A product may appears in many OrderDetails. Each OrderDetails item specified one product.
One-to-One
In a "product sales" database, a product may have optional supplementary information such as image, moreDescription and comment. Keeping them inside the Products table results in many empty spaces (in those records without these optional data). Furthermore, these large data may degrade the performance of the database.
Instead, we can create another table (says ProductDetails, ProductLines or ProductExtras) to store the optional data. A record will only be created for those products with optional data. The two tables, Products and ProductDetails, exhibit a one-to-one relationship. That is, for every row in the parent table, there is at most one row (possibly zero) in the child table. The same column productID should be used as the primary key for both tables.
Some databases limit the number of columns that can be created inside a table. You could use a one-to-one relationship to split the data into two tables. One-to-one relationship is also useful for storing certain sensitive data in a secure table, while the non-sensitive ones in the main table.

Column Data Types
You need to choose an appropriate data type for each column. Commonly data types include: integers, floating-point numbers, string (or text), date/time, binary, collection (such as enumeration and set).

Step 4: Refine & Normalize the Design

For example,
  • adding more columns,
  • create a new table for optional data using one-to-one relationship,
  • split a large table into two smaller tables,
  • others.
Normalization
Apply the so-called normalization rules to check whether your database is structurally correct and optimal.
First Normal Form (1NF): A table is 1NF if every cell contains a single value, not a list of values. This properties is known as atomic. 1NF also prohibits repeating group of columns such as item1, item2,.., itemN. Instead, you should create another table using one-to-many relationship.
Second Normal Form (2NF): A table is 2NF, if it is 1NF and every non-key column is fully dependent on the primary key. Furthermore, if the primary key is made up of several columns, every non-key column shall depend on the entire set and not part of it.
For example, the primary key of the OrderDetails table comprising orderID and productID. If unitPrice is dependent only on productID, it shall not be kept in the OrderDetails table (but in the Products table). On the other hand, if the unitPrice is dependent on the product as well as the particular order, then it shall be kept in the OrderDetails table.
Third Normal Form (3NF): A table is 3NF, if it is 2NF and the non-key columns are independent of each others. In other words, the non-key columns are dependent on primary key, only on the primary key and nothing else. For example, suppose that we have a Products table with columns productID (primary key), name and unitPrice. The column discountRate shall not belong to Products table if it is also dependent on the unitPrice, which is not part of the primary key.
Higher Normal Form: 3NF has its inadequacies, which leads to higher Normal form, such as Boyce/Codd Normal form, Fourth Normal Form (4NF) and Fifth Normal Form (5NF), which is beyond the scope of this tutorial.
At times, you may decide to break some of the normalization rules, for performance reason (e.g., create a column called totalPrice in Orders table which can be derived from the orderDetails records); or because the end-user requested for it. Make sure that you fully aware of it, develop programming logic to handle it, and properly document the decision.
Integrity Rules
You should also apply the integrity rules  such NOT NULL, Foregin key to check the integrity of your design

Column Indexing
ou could create index on selected column(s) to facilitate data searching and retrieval. An index is a structured file that speeds up data access for SELECT, but may slow down INSERT, UPDATE, and DELETE. Without an index structure, to process a SELECT query with a matching criterion (e.g., SELECT * FROM Customers WHERE name='Tan Ah Teck'), the database engine needs to compare every records in the table. A specialized index (e.g., in BTREE structure) could reach the record without comparing every records. However, the index needs to be rebuilt whenever a record is changed, which results in overhead associated with using indexes.

Thursday, June 20, 2013

Database Normalization

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.

The Normal Forms

The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed in this article.

Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when variations take place, it's extremely important to evaluate any possible ramifications they could have on your system and account for possible inconsistencies. That said, let's explore the normal forms.

First Normal Form (1NF)

First normal form (1NF) sets the very basic rules for an organized database:
  • Eliminate duplicative columns from the same table.
  • Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
For more details, read Putting your Database in First Normal Form

Second Normal Form (2NF)

Second normal form (2NF) further addresses the concept of removing duplicative data:
  • Meet all the requirements of the first normal form.
  • Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
  • Create relationships between these new tables and their predecessors through the use of foreign keys.
For more details, read Putting your Database in Second Normal Form

Third Normal Form (3NF)

Third normal form (3NF) goes one large step further:
  • Meet all the requirements of the second normal form.
  • Remove columns that are not dependent upon the primary key.
For more details, read Putting your Database in Third Normal Form

Boyce-Codd Normal Form (BCNF or 3.5NF)

The Boyce-Codd Normal Form, also referred to as the "third and half (3.5) normal form", adds one more requirement:
  • Meet all the requirements of the third normal form.
  • Every determinant must be a candidate key.

Oracle Architecture

Oracle Database-
Basically, there are two main components of Oracle database –– instance and database itself. An instance consists of some memory structures and the background processes, whereas a database refers to the disk resources. Figure 1 will show you the relationship.
Instance
Database files themselves are useless without the memory structures and processes to interact with the database. Oracle defines the term instance as the memory structure and the background processes used to access data from a database. The memory structures and background processes contitute an instance. The memory structure itself consists of System Global Area (SGA), Program Global Area (PGA), and an optional area –– Software Area Code. In the other hand, the mandatory background processes are Database Writer (DBWn), Log Writer (LGWR), Checkpoint (CKPT), System Monitor (SMON), and Process Monitor (PMON). And another optional background processes are Archiver (ARCn), Recoverer (RECO), etc. Figure 2 will illustrate the relationship for those components on an instance.

System Global Area
SGA is the primary memory structures. When Oracle DBAs talk about memory, they usually mean the SGA. This area is broken into a few of part memory –– Buffer Cache, Shared Pool, Redo Log Buffer, Large Pool, and Java Pool.
Buffer Cache
Buffer cache is used to stores the copies of data block that retrieved from datafiles. That is, when user retrieves data from database, the data will be stored in buffer cache. Its size can be manipulated via DB_CACHE_SIZE parameter in init.ora initialization parameter file.
Shared Pool
Shared pool is broken into two small part memories –– Library Cache and Dictionary Cache. The library cache is used to stores information about the commonly used SQL and PL/SQL statements; and is managed by a Least Recently Used (LRU) algorithm. It is also enables the sharing those statements among users. In the other hand, dictionary cache is used to stores information about object definitions in the database, such as columns, tables, indexes, users, privileges, etc.
The shared pool size can be set via SHARED_POOL_SIZE parameter in init.ora initialization parameter file.
Redo Log Buffer
Each DML statement (select, insert, update, and delete) executed by users will generates the redo entry. What is a redo entry? It is an information about all data changes made by users. That redo entry is stored in redo log buffer before it is written into the redo log files. To manipulate the size of redo log buffer, you can use the LOG_BUFFER parameter in init.ora initialization parameter file.
Large Pool
Large pool is an optional area of memory in the SGA. It is used to relieves the burden place on the shared pool. It is also used for I/O processes. The large pool size can be set by LARGE_POOL_SIZE parameter in init.ora initialization parameter file.
Java Pool
As its name, Java pool is used to services parsing of the Java commands. Its size can be set by JAVA_POOL_SIZE parameter in init.ora initialization parameter file.
Program Global Area
Although the result of SQL statemen parsing is stored in library cache, but the value of binding variable will be stored in PGA. Why? Because it must be private or not be shared among users. The PGA is also used for sort area.
Software Area Code
Software area code is a location in memory where the Oracle application software resides.
Oracle processes
There are two categories of processes that run with an Oracle database. They are mentioned below:
  • User processes
  • System processes
The following figure illustrates the relationship between user processes, server processes, PGA, and session:

The first interaction with the Oracle-based application comes from the user computer that creates a user process. The user process then communicates with the server process on the host computer. Here, PGA is used to store session specific information.


Oracle Background Processes
Oracle background processes is the processes behind the scene that work together with the memories.
DBWn
Database writer (DBWn) process is used to write data from buffer cache into the datafiles. Historically, the database writer is named DBWR. But since some of Oracle version allows us to have more than one database writer, the name is changed to DBWn, where n value is a number 0 to 9.
LGWR
Log writer (LGWR) process is similar to DBWn. It writes the redo entries from redo log buffer into the redo log files.
CKPT
Checkpoint (CKPT) is a process to give a signal to DBWn to writes data in the buffer cache into datafiles. It will also updates datafiles and control files header when log file switch occurs.
SMON
System Monitor (SMON) process is used to recover the system crash or instance failure by applying the entries in the redo log files to the datafiles.
PMON
Process Monitor (PMON) process is used to clean up work after failed processes by rolling back the transactions and releasing other resources.
ARCH
The ARCH background process is invoked when your database is running in ARCHIVELOG mode. If you are archiving your redo logs, the redo logs are touched by several background processes. First, the LGWR process copies the log_buffer contents to the online redo log files, and then the ARCH process copies the online redo log files to the archived redo log filesystem on UNIX. The ARCH process commonly offloads the most recent online redo log file whenever a log switch operation occurs in Oracle.

Database
The database refers to disk resources, and is broken into two main structures –– Logical structures and Physical structures.
Logical Structures:~
Oracle database is divided into smaller logical units to manage, store, and retrieve data effeciently. The logical units are tablespace, segment, extent, and data block. Figure 5 will illustrate the relationships between those units.

Tablespace
A Tablespace is a grouping logical database objects. A database must have one or more tablespaces. In the Figure 5, we have three tablespaces –– SYSTEM tablespace, Tablespace 1, and Tablespace 2. Tablespace is composed by one or more datafiles.
There are three types of tablespaces in Oracle:
  • Permanent tablespaces
  • Undo tablespaces
  • temporary tablespaces
Segment
A Tablespace is further broken into segments. A segment is used to stores same type of objects. That is, every table in the database will store into a specific segment (named Data Segment) and every index in the database will also store in its own segment (named Index Segment). The other segment types are Temporary Segment and Rollback Segment.
A segment is a container for objects (such as tables, views, packages . . . indexes). A segment consists of Extends.
There are 11 types of Segments in oracle 10g.
  1. Table
  2. Table Partition
  3. Index
  4. Index Partition
  5. Cluster
  6. Rollback
  7. Deferred Rollback
  8. Temporary
  9. Cache
  10. Lobsegment
  11. Lobindex
    Extent
    A segment is further broken into extents. An extent consists of one or more data block. When the database object is enlarged, an extent will be allocated. Unlike a tablespace or a segment, an extent cannot be named. Space for a data on a hard disk is allocated in extends.
    Data Block
    A data block is the smallest unit of storage in the Oracle database. The data block size is a specific number of bytes within tablespace and it has the same number of bytes.
    Physical Structures:~The physical structures are structures of an Oracle database (in this case the disk files) that are not directly manipulated by users. The physical structure consists of datafiles, redo log files, and control files.
    Datafiles
    A datafile is a file that correspondens with a tablespace. One datafile can be used by one tablespace, but one tablespace can has more than one datafiles. An Oracle databae include of a number of physical files called datafile.
    Redo Log Files
A Redo Log is a file that is part of an Oracle Database. When a transaction is committed the transaction’s details in the redo log buffer is written in a redo log file. These files contain information that helps in recovery in the event of system failure.


Control Files
Control files are used to store information about physical structure of database. The control file is absolutely crucial to database operations. It contains the following types of information:
  1. Database Information
  2. Archive log history
  3. Tablespace and datafile records
  4. Redo threads
  5. Database’s creation data
  6. Database name
  7. Current Archive information
  8. Log records
  9. Database Id which is unique to each Database

Tuesday, June 11, 2013

Useful Queries

 Below are simple but useful queries of Oracle .
  • Query to find out all column names in table
              Select * from all_tab_columns;
  • Query to find all column names in index.
               Select * from all_ind_columns;

  • Query to find all Functional index details.
               Select * from All_ind_expressons;
   
  • Query to find all Functional index details.
               Select * from All_ind_expressons;
           
  •      Query to get all Oracle indexes. 
              Select  INDEX_NAME, TABLE_NAME, TABLE_OWNER from ALL_INDEXES
              order by  TABLE_OWNER, TABLE_NAME, INDEX_NAME
  •   Query to get all Oracle views that can be viewed by the current user. 
              Select VIEW_NAME, OWNER from ALL_VIEWS order by OWNER, VIEW_NAME
  
  •     Query to get all Oracle triggers for the current user.  
               Select TRIGGER_NAME, OWNER from SYS.ALL_TRIGGERS order by OWNER,  
               TRIGGER_NAME
  •     Query to get all sequences for the current user
               Select * from all_sequences;
  •      Query to get db objects based  (procedures,functions,trigger,view etc. )on table.
                 Select * from all_dependencies where referenced_type ='TABLE'
                  and referenced_name  =<Tablename>
   
  •      Query to get source code
                Select * from all_source where name =<objectname>





Oracle 11g PL/SQL New Feature

There are many new features has been added in Oracle 11g for PL/SQL Developers.


  1. SIMPLE_INTEGER Datatype
  2. CONTINUE Statement
  3. Sequences in PL/SQL Expressions
  4. Dynamic SQL Enhancements
  5. Generalized Invocation
  6. Named and Mixed Notation in PL/SQL Subprogram Invocations
  7. Automatic Subprogram Inlining
  8. PL/Scope
  9. PL/SQL Native Compiler Generates Native Code Directly
  10. New PL/SQL Compiler Warning
  11. Cross-Session PL/SQL Function Result Cache
  12. Compound Triggers
  13. Triggers with Disable state.