PostgreSQL

What is Database?

Organized collection of data. Sometimes referred to as a Database Management Systems(DBMS) in short, it is software that interact with end user, other application and the database itself to capture and analyze data.

Generally, a DBMS allows of definition, creation,querying, updates, deletions and administration of database. Database themselves are not generally portable across DBMSs(MySQL,Oracle,SQL server, Sybase, etc) but the data within then can generally can be exported and imported into any other database with some changes to datatypes

Example of application types that uses DBMSs:

-airlines: for customer information, scheduling of planes and personnel as well as reservations associated with their routes

-schools: for student information, class schedules, lesson plans and student grades

-businesses: most businesses use a DBMS of some kind to manage employee records, customer records, services, schedules, inventory and billing.

-Sales organizations: for customer records, product information, inventory and purchase history

Types of databases:
 * Cloud database
 * Data Warehouse
 * Distributed database
 * Relational Database
 * Federated Database
 * Knowledge base or knowledge management system
 * Operational database
 * Parallel database
 * SQL database
 * NoSQL database
 * Spatial database
 * Unstructured database

Data Normalization - In short is a process used to reduce and eliminate the redundancy data in database

This is important for variety of reason:
 * Redundancy wasted space and resourcees
 * Synchronization between multiple instances of the same data is difficult
 * Structure and location of multiple instances of the same data is often unknown outside of the data creator and creator mechanism

There are three common forms that normalizaion takes - first normal(1NF), second normal form(2NF) and third normal form(3NF). Although there are higher levels, they are rarely needed or used except in complex data warehouses.

First normal form(1NF) Second and Third Normal Form( 2NF and 3NF)
 * Data is considered 1NF when it contains no repeating groups of data in database structure
 * Translating a 0NF database that may have client information for billing and shipping as well as multiple line items for an order to 1NF version of that database would involve the separation of that data into related structure so that data does not have to exist in multiple forms
 * Whereas a 0NF database may be limited to a finite number of items, you can order ( since each line has an entry ) a 1NF database would not be since items exist in a different table and that table refers to a single order for one or more items.
 * Data can be even further normalized in 2NF and 3NF by building on work done earlier in 1NF normalization
 * The objective is to reduce storage, increase referential integrity and logically structure data so that it is easy to see relationships among the structures.
 * By using primary and secondary keys, you further drive out redundancy and increase your ability to manage and report on your data intelligently and consistently . Whether you do that by differentiating data structures( customer,item,billing etc) or by calculation( deriving data by performing calculations on stored data) you make you database more efficient and easier to maintain.

Index -
 * is an internal structure that improves the speed of data retrieval on a database table.
 * Although the speed of those operation is greatly improved wit the right index applied to your data, it does come at the cost of additional writes to and storage space within your data structure.
 * Indexes allow the retrieval and location of data that meet certain criteria without having to search ever row in a table. They are created on one or more columns in a database table and give the database the ability to do fast data lookups and access of ordered records.

Data relationships 
 * is definied by the existence of one field of data that exists in multiple tables
 * for e.g - A human resource database contains records of the employees. In the employees table, there is a field location code, that is a three letter designation of which office location that employee works at. Rather that storing the full location of each employee, that field contatins an aabbreviation that has a field in another table called Company location. Since these fields exists in each tables, the tables have a relationship that is depend on that field in both.

Keys RDBMS RBDMS key concept:
 * Since each database consists of tables and each table consists of rows of information and each row designates a single record, keys help you keep all of those records straight
 * There are multiple types of keys:
 * Primary key- The most common type of key, should be unique fore each record in that table. In previous example, each employee would have an employee ID, that ID an be definied as primary key since no two employees will have the same ID
 * Foreign key - these are used to create relationships between tables in your database, in fact in our previous location, the location code we talked about between employee table and office table would be an example of a foreign key establishing relationship between them
 * stand for Relational database management system
 * At it's heart, an RDBMS is the same as the previously discussed DBMS with a key difference. It is based on relational database model
 * The group of application that manage these particular database are often used for financial records, manufacturing, data warehouses, personnel data, etc.
 * RDBMS are currently the most common form of DBMS and the key component is the tight integration and definition of the relationships within the data structures.
 * Data organized into tables of rows and columns with a unique key for each row
 * Each entity type in database has its own table
 * Each row within entity type represents and instance of that type
 * Each column in that entity represents values within those instances
 * Rows in a table can be linked to rows in other tables through the storage of the unique key of that row it needs to be linked to ( the foreign key)
 * Use SQL as the language for querying and maintainng the database

What is clustering?
 * Clustering is a means of configuring multiple database servers to work together
 * Working "together in a cluster can mean a number of things. There are several types of clusters:, they are:
 * -failover (master/slave or replication)
 * -data warehouse
 * load balancing

Failover: Data Warehouse
 * are often referred to as typical "Master/Slave" configuration or " replicating"
 * This configuration is designed for one of the two databases in the cluster to function as primary responding database and then all transactions are duplicated (or replicated) to the secondary or slave database server
 * The secondary or slave database server exists as a failover fo the master. Only when the master database server is down or not responding will the secondary database receive connections. At this point, replication is broken and must be fixed after recovery.
 * is a special kind of cluster in that it can be part of one or more type of clustering
 * A data warehouse is often used a secondary database server that receiver read only request for certain types of queries( often heavy reporing)
 * The data warehouse typically only receives updates from the primary master database server but the reporting(and related processing of dynamic data reports) is off loaded from the prodution server to a secondary server to maintain performance and data integrity

Load balancing
 * is quite simply a way for multiple server to balance the load across them
 * This is the most typical type of cluster in that no one server is designated as a primary. Often there is one or more controllers that receive the connections and pass them ( via proxy,clustered or floating IPs,etc) to one of the database nodes.
 * Ofthen any transactions can receive a response from any server in the cluster ( as data written to any one server is replicated to all the other nodes). This load balancing method insures that te server responding the fastest is connected to each transaction.

Data types:
 * data type is a way to classify various kinds of data. Determining or classifying, the type that a particular value can go into, will determine the possible values for that type
 * In general data types are used within type systems which then offer ways of defining and using them. Different systems also use different methods and degrees of ensuring type safety.
 * Most programming languages ( and databases as we are seeing) explicitly note the type of data contained within a field, table or structure

Types of Data types:
 * Primitives types
 * Basic Type( data type provided by programming language, a building block)
 * Machine Data Types
 * Only exposed or available in "low level" programming( binary and hexadecimal are exaples)
 * Boolean Types
 * On/Off or True/False or 0/1(2bit value)
 * Numeric Types
 * Composite Types
 * Derived types( in other words, they are a combination of or the result on more than on primitive type)
 * Enumerations,Strings and more

Character types Numeric types
 * In general, character types are data that can be classified as eithed fixed or variable lenght strings(or text)
 * More specifically they can be fixed or varable lenght non-Unicoce(no special characters) string data and the value of which must be between 1 and 8000
 * as result in databases( as in PostgreSQL), the two primary charater types are char and varchar
 * char
 * If the sizes of column data are consistent
 * useage: char(x)- where x is the number of characters
 * varchar
 * if the sizes of the column data are inconsistent, particulary if they are going to be wildly different
 * useage: varchar(x) - where x is the number of characters
 * CREATE TABLE betterCitties( cityName varchar(80), stateAbbr char(2) ) ;
 * In general, numeric types are data values that can be represented numerically
 * The values that each numberic type field can contain very greatyly with the inherited numeric type specific to each expected range
 * In the postgresql world, there are types that allow small to very large whole numbers, both positive and negative as well as other types that can contain numbers represented with decimals and/or scientific notation.
 * Numeric types are often the source or components of derived values within a database
 * smallint 2bytes -32768 +32767
 * integer 4bytes
 * bigint 8bytes
 * decimal variable
 * numeric variable
 * real 4bytes
 * double precision
 * serial 4bytes autoincrementing,integer
 * big serial 8bytes large autoincrementing integer
 * CREATE TABLE custAge(custAgeInYears integer, custAgeInMonths bigint);

Serial types
 * The serial data type is a special type of data. It is not really a data type although it is generally referred to as such.
 * There are two serial types - small serial and big serial
 * It is a special derivation of numeric data type and each creates values in specific range. Small serial creates a field capable of a range of numbers within the integer type range while bigserial creates a field capable of a range of numbers withing the bigint type range
 * Either can be set as a primary key and both are similar to auto_increment found in some other databases
 * Serial
 * if the numbers of records in the table is expected to be in the integer of values and/or the type of field is not expected to be complex( a simple primary key)
 * Big serial
 * If the number of records in the table is expected to exceed the integer range of values and/or the type of field is expected to be complex(a primary key with as large invoice number for example)
 * CREATE table myInv(invID bigserial, custInvName varchar(50), custInvLastName varchar(50));
 * INSERT INTO myInv(custInvName, custInvLastName) VALUES ('Smith','Joe');

Boolean Type
 * Very simply, the Boolean is typically seen as true or false
 * Most people do not know, however, that in SQL/PostgreSQL, it actually has a third value called unknown which is represented by NULL.
 * The boolean type is defined as '1byte' value that can be represented in a number of valid ways that are converted to a standard representation within PostgreSQL engine during commit.
 * When you have data that can be represented in one of two states:
 * Valid values are:
 * True/False
 * t/f
 * 0/1
 * On/Off
 * Yes/no
 * Often this value is derived from one or more data points within a table so that reports and/or quick decisions can be made without explicitly asking the applications or user to decide.
 * CREATE TABLE boolOptOut(optPhone varchar(12), optInOut boolean);
 * INSERT INTO boolOptOut(optPhone,optInOut) VALUES ( '1213123123','0');

Special values - NULL
 * Although not explicitly a data type, it is often treated as such in terms of its value
 * NULL is not derived from any type nor it is calculate, it can be described as the non-existence of a value for any data type
 * NULL is often used as a 'comparative' for whether a given field has any valid value OR whether a SQL query has any valid matching rows.
 * NULL is unique in that it can be a value
 * A field in table has no data value, so it is assigned NULL or is inicated to be empty
 * A query returns no matching records, so the resulting dataset is considere NULL.
 * A derivation of comparatives is determine to be invalid, so its result is considered NULL
 * As well as a limiter
 * When building a table containing fields of data types, NULL is most often used to indicate whether a given field is required ( or set as "NOT NULL")
 * CREATE TABLE notRequired(code integer, codeName varchar(10) NOT NULL);

Date Type Time data type Timestamp type
 * a date type is a representation of the month, day, year and sometimes day of the week
 * Dates can be represented in a large number of formats that are driven by the ISO standards board and described within a given system by the character set that is installed
 * PostgreSQL supports all of the standard ISO formats across all supported languages and character sets.
 * a time type is representation of the seconds,minutes and hours of a day
 * times can be represented in a large number of formats that are driven by the ISO standards board and described within a given system by the character set that is installed
 * PostgreSQL supports all of the standard ISO formats accross all supported languages and character set
 * The timestamp type is a derived field type. Meaning that it is really the concatenation of date and time.
 * The concatenation can be derived at record insert time through application or database functionality(methods or triggers) or it can be accepted as input from the end user under certain formatiing conditions.
 * The data type is considered a literal, meaning that PostgreSQL will not examine the content of a literal string( the timestamp value) to determine its value ( whether or not it may be missing part of the literal that may determine how it is ultimately stored - i.e Timezone for adjustments)

What is NoSQL? Types of NoSQL databases Advantages of PostgreSQL and NoSQL capabilities:
 * NoSQL(often said to stand for Not only SQL) provides a mechanism for the retrieval of data from storage( disk,database or other structure) that is modeled in other ways than the typical tabular based relations defined in relational databses.
 * Although PostgreSQL is typically thought of as and RDBMS, it has features that compete in the NoSQL space along with NoSQL databases luike MongoDB and ElasticSearch
 * NoSQL databases are used extensively in big data and real-time web applications.
 * Since we have said that NoSQL is NOT relational, and thus not defined by relationships in tabular format, here are some examples of the types of data definitions that NoSQL can fall within
 * Columnar
 * Document
 * Key-Value Pairs
 * Graph
 * Multi-model
 * NoSQL databases assume that relationships have not clearly been established between data structures. This leads itself to higher performance in reads rather than writes
 * In addition to supporting all the features of NoSQL databases, obviously PostgreSQL is also an RDBMS. Since the objective of processing data is eventually to clarify relationsjhips between data structures, converting data sets from NoSQL to reltational in the same engine has advantages. Specifically PostgreSQL supports:
 * Static and Dynamic schemas
 * Key-Value Pairs
 * Relational and Normalized Form Storage
 * Data Constrains
 * Data Joins and Foreign keys
 * Standard Powerful query language
 * Server side programming

What is Big Data?
 * Big data is a term that is generically used to define and describe extremely large data sets. These data sets are then analyzed in any number of ways to  reveal the existence of patterns, associations, behaviors, trends or other charactericstics that can then be associated and reported on
 * Big data is often used by sales and marketing groups to identify customers, their buying or browsing habits and target certain groups for adverising. Recently, big data has become big in healtcare industry as insurance companies have acces to more and more digitized health records. These are mined for informations on all of us to classify our risks( and thus drive premiums ) for their service

PostgreSQL and Big data processing
 * PostgreSQL community edition support the typical NoSQL functionality that many organizations rely on for processing exteremely large data sets
 * Since PostgreSQL allows the storage and retrieval of data sets in multiple formats that have no clear relationships, as big data engine, it has the flexiblity od doing both that as well as being the end point for storing data once relationships are gleaned from the process
 * THere are several extensions for PostgreSQL and Big Data ( PostgresXL is probably most common). These will generally introduce MPP ( massive parallel processing) with query handling using SQL and/or properitary extensions.

PostgreSQL Features: Management tools Programming interfaces Installing postgresql on centos7
 * representation of a set of utilites and software to manage relational data
 * it is classified as RDBMS similar to Oracle,MySQL, SQL Servers and etc.
 * 1) Primary keys
 * 2) Foreign keys
 * 3) Views
 * 4) Triggers
 * 5) Transactional integrity
 * 6) Complex Queries
 * 7) SQL compliance
 * 8) NOSQL capability
 * 9) Big data processing
 * 10) Database centric logic
 * 1) pgadmin III
 * 2) phppgadmin
 * 3) dbvisualizer
 * 4) pgaccess
 * 5) pgshark
 * psql - command line
 * libpq - C library
 * Drivers - jdbc,odbc, dbi, .net
 * Pgtcl - Tcl bindings
 * add repo to : yum install http://yum.postgresql.org/9.4/redhat/rhel-6-x86_64/pgdg-redhat94-9.4-1.noarch.rpm

Initializing database and client commands:
 * yum install postgresql94-server postgresql94-contrib
 * /usr/pgsql-9.4/bin/postgresql94-setup initdb
 * systemctl enable postgresql-9.4.service
 * systemctl start postgresql-9.4.service
 * /usr/pgsql-9.4/bin/postgresql94-setup initdb


 * setsebool -P httpd_can_network_connect_db 1


 * su - postgres

Creating user with adding password: Create db with adding ownership Droping user- Dropping database Vieving content of table tblvendors Taking a backup of dbTest as postgres user to file
 * psql
 * \password postgres
 * \q - to quit
 * CREATE EXTENSION adminpack;
 * createuser -D -A -P usrToDelete
 * createdb -O usrToDelete dbToDelete
 * psql 
 * drop user usrTest
 * drop database dbToDelete
 * \d tblvendors

Restoring database to newly created dbMyRestore from file
 * pg_dump -U postgres dbTest -f myDBTestBkup.sql
 * createdb dbMyRestore
 * psql -U postgres -d dbMyRestore -f myDBTestBkup.sql

Creating db
 * su - postgres
 * createdb mytestdb
 * psql
 * \connect mytestdb;
 * creating in client - CREATE DATABASE mytestdb2;, \connect db_test;
 * dropping - DROP DATABASE db_test;

Creating table and inserting and vieving Creating user Assigning permissions 
 * psql
 * CREATE TABLE cities ( cityid varchar(5), cityname varchar (80), state varchar(20) );
 * INSERT INTO cites VALUES( '12345,'Cincinatii', 'Ohio');
 * SELECT * FROM cities;
 * psql
 * createuser mytestuser
 * createdb mytestuser
 * ALTER USER mytestuser WITH PASSWORD 'password';
 * vim /var/lib/pgsql/9.4/data/pg_hba.conf   local all all password  host all all 127.0.0.1/32 password
 * systemctl restart postgresql-9.4
 * psql -U mytestuser
 * CREATE TABLE testTable ( testID varchar(2));
 * \q
 * \du - list of roles
 * \connect mytestuser
 * GRANT INSERT ON testTable TO mytestuser;
 * INSERT INTO testTable VALUES ( '2');
 * GRANT ALL on testTable to mytestuser;

Installing phpPgAdmin
 * yum install phpPgAdmin httpd
 * vim /etc/httpd/conf.d/phpPgAdmin.conf
 * change Require local to Require all granted
 * Deny all to Allow from all
 * vim /etc/phpPgAdmin/config.inc.php change -
 * $conf['servers'][0]['host'] = 'localhost';
 * $conf['extra_login_security'] = false;
 * $conf['owned_only'] = true;
 * systemctl restart httpd && systemctl restart postgresql-9.4
 * go to PublicIp/phpPgAdmin
 * /var/lib/psql/9.4/data/postgresql.conf and edit  : listen_addresses = '*'  port = 5432    max_connections = 100
 * /var/lib/psql/9.4/data/pg_hba.conf  :
 * local   all             all                                     password
 * host    all             all              0.0.0.0/0         password
 * host    all             all             ::1/128             password

Backup database
 * pg_dump nameOfDB > dump.db
 * pg_dump myTest <dump.db

Replication master -slave

Configuration on master :
 * edit /var/lib/pgsql/9.4/data/pg_hba.conf
 * host     replication     replica(user created to replica)         54.171.221.67/32 or /0.0.0.0/0(external ip of slave)       trust
 * edit /var/lib/pgsql/9.4/data/postgresql.conf
 * max_wal_senders = 1
 * wal_keep_segments = 100
 * synchronous_standby_names = 'aspirex9973' - slave host name
 * wal_level = hot_standby
 * su - postgres
 * psql
 * create user replica replication;

Configuration on slave:
 * systemctl stop postgresql-9.4
 * rm -rf /var/lib/pgsql/9.4/data
 * pg_basebackup -D /var/lib/pgsql/9.4/data -h 172.31.107.89(master ip ) -U replica
 * vim /var/lib/pgsql/9.4/data/recovery.conf
 * standby_mode=on
 * trigger_file='/tmp/promotedb'
 * primary_conninfo='host=172.31.107.89(masterip) port=5432 user=replica application_name=aspirex997b(masterhostname)'
 * chown postgres:postgres recovery.conf
 * vim /var/lib/psql/9.4/data/postgres.conf
 * and add : hot_standby = on
 * /usr/pgsql-9.4/bin/pg_ctl -D /var/lib/pgsql/9.4/data/ start