
Friday, May 7, 2010

Get the past 6 months' year and month numbers from a calendar table in Teradata

SELECT
  year_of_calendar,
  month_of_year
FROM
  sys_calendar.calendar
WHERE
  calendar_date <= CURRENT_DATE
  -- exclude the current month: you want the last 6 full months
  AND EXTRACT(MONTH FROM calendar_date) <> EXTRACT(MONTH FROM CURRENT_DATE)
  AND calendar_date > CURRENT_DATE - INTERVAL '6' MONTH
GROUP BY 1, 2
ORDER BY 1 DESC, 2 DESC;



The result set looks like this:

Yr Mo
2010 03
2010 02
2010 01
2009 12
2009 11
2009 10

Wednesday, April 28, 2010

Product Join, Merge Join, and Hash Join

A product join compares every row of the left table to every row of the right table. It is typically chosen as the join method when one of the tables is small. Since it basically needs to read every row of the left table once, and every row of the right table n times, where n is the number of rows in the left table, it can be expensive to execute if the number of rows in the left table is large.

The typical alternative is the merge join. The extra step needed to run a merge join, however, is to sort the two source tables; if the tables are already in sorted order this step is not necessary. A merge join executes very fast, since it reads both tables only once. In addition, depending on the join selectivities, it is possible that not all the data blocks of both tables need to be read.

The hash join does not require that both tables be sorted. The smaller table/spool is "hashed" into memory (sometimes using multiple hash partitions). The larger table is then scanned, and for each of its rows the matching row from the smaller table is looked up in the in-memory hash table. If the smaller table must be broken into partitions to fit into memory, the larger table must also be broken into the same partitions prior to the join.

So, the main advantage of the hash join is that the big table does not have to be sorted and the small table can be much larger than for a product join.
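To see which join method the optimizer actually chose, you can prefix the query with EXPLAIN. A minimal sketch, assuming two hypothetical tables t1 and t2 joined on col1:

EXPLAIN
SELECT t1.col1, t2.col2
FROM t1 INNER JOIN t2
ON t1.col1 = t2.col1;

The plan text will name the join step ("merge join", "product join", or a hash join step) along with any redistribution or sort steps the optimizer added.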


NoPI feature in Teradata 13.0

The purpose of the Teradata 13.0 feature that allows us to create a table with no primary index (NoPI) is to improve FastLoad performance. Because there is no primary index for the rows of a NoPI table, the rows are not hashed to an AMP based on their primary index. Instead, Teradata Database either hashes on the Query ID for a row, or it uses a different algorithm to assign the row to its home AMP once it reaches the AMP onto which it has been FastLoaded. Teradata Database then generates a RowID for each row in a NoPI table by randomly selecting an arbitrary hash bucket that the AMP owns and using it to generate the RowID.

Using a NoPI table as a staging table for such applications avoids the row redistribution and sorting required for primary-indexed staging tables. Another advantage of NoPI tables is that you can quickly load data into them and be finished with the acquisition phase of the utility operation, which frees client resources for other applications.

LIMITATIONS:
• You cannot create a NoPI queue table.
• You cannot create a NoPI error table.
• You cannot create a NoPI table as a SET table. The default table type for NoPI tables in both Teradata and ANSI session modes is always MULTISET.
• If neither PRIMARY INDEX (column_list) nor NO PRIMARY INDEX is specified explicitly in a CREATE TABLE request, then whether the table is created with or without a primary index generally depends on the setting of the DBS Control flag DisableNoPI.
• NoPI tables cannot specify a permanent journal.
• NoPI tables cannot specify an identity column.
• Hash indexes cannot be defined on NoPI tables because they inherit the primary index of their underlying base table, and NoPI tables have no primary index.
• SQL UPDATE triggers cannot update a NoPI table.
• SQL UPDATE and UPDATE (Upsert Form) requests cannot update a NoPI target table. They can, however, update a primary-indexed target table from a NoPI source table.
• SQL MERGE requests cannot update or insert into a NoPI target table. They can, however, update or insert into a primary-indexed target table from a NoPI source table.
• You cannot load rows into a NoPI table using the MultiLoad utility.
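A minimal sketch of creating a NoPI staging table (the table and column names here are hypothetical):

CREATE MULTISET TABLE stg_sales
( sale_id INTEGER
, sale_amt DECIMAL(10,2)
) NO PRIMARY INDEX;

Because the rows are not hashed on a primary index, FastLoad can append them to the AMPs without the redistribution and sorting a primary-indexed staging table would need.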

Saturday, April 24, 2010

Difference between TRUNCATE and DELETE

TRUNCATE :
A DDL command.
TRUNCATE, being DDL, is auto-committed.
We can only truncate the whole table (a WHERE clause cannot be used).
Once a table is truncated we can't roll back the changes.
When a table is truncated the memory occupied is released.


DELETE :
DELETE is a DML command and can be rolled back.

It is slower than TRUNCATE since, being DML, it has to go through rollback segments etc.
We can use a WHERE clause with DELETE.
When rows are deleted the memory occupied is not released.
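A side-by-side sketch in generic SQL (the table and column names are hypothetical; note that Teradata itself has no TRUNCATE statement, the closest equivalent being an unconditional DELETE):

-- DDL: removes all rows, releases the space, cannot be rolled back
TRUNCATE TABLE tab1;

-- DML: can target specific rows and can be rolled back
DELETE FROM tab1 WHERE col1 = 10;

-- Teradata's fast unconditional delete
DELETE FROM tab1 ALL;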

Monday, April 19, 2010

Difference between OLAP and aggregate functions

OLAP functions are similar to aggregate functions in that they:
Operate on groups of rows (like the GROUP BY clause)
Can filter groups using QUALIFY (like the HAVING clause)

OLAP functions are unlike aggregate functions because they:
Return a data value for each qualifying row, not for each group.
May not be performed within subqueries.
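A quick sketch of the difference, assuming a hypothetical emp table with emp_name, dept and sal columns:

-- Aggregate: one row per group
SELECT dept, SUM(sal)
FROM emp
GROUP BY dept;

-- OLAP: a value for every qualifying row, filtered with QUALIFY
SELECT emp_name, dept, sal,
SUM(sal) OVER (PARTITION BY dept) AS dept_total
FROM emp
QUALIFY RANK() OVER (PARTITION BY dept ORDER BY sal DESC) <= 3;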

Thursday, April 15, 2010

FastExport sessions and Bteq Export

Advantages of FastExport over BTEQ Export:
FastExport has
full restart capability,
uses multiple sessions, and
uses block I/O.


By default FastExport uses one session for each AMP.
The maximum SESSIONS specification must be greater than zero and no more than
the total number of AMPs on the system.
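A minimal FastExport script sketch, assuming a hypothetical table mydb.mytab and output file data.out; SESSIONS caps the session count as described above:

.LOGTABLE mydb.fexp_log;
.LOGON tdpid/user,password;
.BEGIN EXPORT SESSIONS 8;
.EXPORT OUTFILE data.out;
SELECT col1, col2 FROM mydb.mytab;
.END EXPORT;
.LOGOFF;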

Wednesday, April 14, 2010

All AMP Operations vs Full Table Scan

All-AMP Operation (NUSI) :
1. The SQL is submitted, specifying a NUSI. The hashing algorithm calculates a row hash value for
the NUSI value.
2. All AMPs are activated to find the hash value of the NUSI in their index subtables. The AMPs
whose subtables contain that value become the participating AMPs in this request; the other
AMPs discard the message.
3. Each participating AMP locates the row IDs (row hash value plus uniqueness value) of the base
rows corresponding to the hash value.
4. The participating AMPs access the base table rows, which are located on the same AMP as the
NUSI subtable.
5. The qualifying rows are sent over the BYNET to the PE, and the PE sends the answer set on to
the client application.


Full-Table Scan (accessing data without indexes) :
If the request does not use a defined index, the Teradata Database does a full-table scan. A full-table scan is another way to access data without using Primary or Secondary Indexes. In evaluating an SQL request, the Optimizer examines all possible access methods and chooses the one it believes to be the most efficient.

A request could turn into a full-table scan when:
The SQL request searches on a NUSI column with many duplicates.
An SQL request uses a non-equality WHERE clause on an index column.
An SQL request uses a range WHERE clause on an index column.

For an index to be used, the request must specify a value for each column in the index.
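Each of these conditions can be verified with EXPLAIN; a sketch assuming a hypothetical table tab with a NUSI on col2:

-- likely uses the NUSI: equality on the full index
EXPLAIN SELECT * FROM tab WHERE col2 = 10;

-- likely full-table scans: non-equality and range conditions on the index column
EXPLAIN SELECT * FROM tab WHERE col2 > 10;
EXPLAIN SELECT * FROM tab WHERE col2 BETWEEN 10 AND 20;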


Data File Formats in BTEQ and the load utilities

BTEQ :

While creating a data file (exporting from a table) the following modes are accepted:
DATA : record mode (raw binary data).
INDICDATA : record mode plus indicator bytes, one bit per column; a bit is
set to 1 if the value of the corresponding column is NULL, else it is set to 0.

REPORT : field mode (with column headings); data is exported one column at a time. This is the default format.
DIF : Data Interchange Format, used to transport data to various PC programs.


While importing into a target table from a data file the following modes are accepted:
DATA :
INDICDATA :
VARTEXT : each column is separated by a delimiter.
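A small BTEQ sketch showing a DATA-mode export followed by an import into another table (all names are hypothetical):

.LOGON tdpid/user,password
.EXPORT DATA FILE = tab1.dat
SELECT col1, col2 FROM mydb.tab1;
.EXPORT RESET

.IMPORT DATA FILE = tab1.dat
.REPEAT *
USING (col1 INTEGER, col2 INTEGER)
INSERT INTO mydb.tab2 VALUES (:col1, :col2);
.LOGOFF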


FastLoad, MultiLoad, FastExport, TPump, TPT
There are five data formats available for the load and unload utilities.
Formatted :
Each record is in a format traditionally known as FastLoad or Teradata format. The data is prefixed with the data length and followed by an end-of-record (EOR) marker. This is the most stringent and robust data format and allows any TPT-supported data type to be included.
Let's consider a record containing 6 bytes of inferred text data "abcdef":

x'00 06 61 62 63 64 65 66 0a' and in character: "..abcdef."

Text :
There is no length indicator as with formatted data (above). Records are separated only by an end-of-record (EOR) marker. To ensure data integrity, each record must be entirely character data.
The EOR marker can be either a single-byte linefeed (X'0A') or a double-byte carriage-return/line-feed pair (X'0D0A'). Whether a single- or double-byte EOR is used is determined by the EOR marker encountered for the first record. If the first record has a single-byte EOR and contains the text "abcd", it would look like this:
x'61 62 63 64 0a' and in character: "abcd."

Every subsequent record is expected to use the same single-byte EOR.
If the first record has a double-byte EOR and contains the text "abcd", it would look like this:
x'61 62 63 64 0d 0a' and in character: "abcd.."

Again, every subsequent record is expected to use the same double-byte EOR.
Single and double byte EORs cannot be mixed in the same data.

Binary :
This format is the same as formatted (see above) except that there is no trailing EOR byte. Each record contains a 2-byte integer data length, n, followed by n bytes of data. As with formatted data, all TPT data types are supported.
E.g., a record containing 4 bytes of text "abcd" would be:
x'00 04 61 62 63 64' and in character: "..abcd"

Delimited / variable length Text :
This format is an extension of the text format. It is similar in that it must contain all character data and be terminated by an EOR marker. But it has a different restriction: all columns in the TPT schema must be VARCHAR** (as opposed to CHAR/ANSIDATE for simple text). The advantage of this format is that it supports an additional feature that allows the definition of individual fields (which map to DBS table row columns) separated by a special delimiter character. To successfully parse delimited data, the delimiter character cannot appear within the data. A delimiter embedded within a column will result in a data column count error.
Unformatted :
The data does not conform to any external format. There is neither a prepended length as with binary and formatted data, nor an EOR marker as with text and delimited data.
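In a TPT script the format is chosen through the DataConnector operator's Format attribute; a hedged sketch of the relevant part of an operator definition (operator, schema and file names are illustrative):

DEFINE OPERATOR file_reader
TYPE DATACONNECTOR PRODUCER
SCHEMA mytab_schema
ATTRIBUTES
( VARCHAR FileName = 'input.dat'
, VARCHAR Format = 'Delimited' /* or 'Formatted', 'Text', 'Binary', 'Unformatted' */
, VARCHAR TextDelimiter = '|' /* only meaningful for Delimited */
, VARCHAR OpenMode = 'Read'
);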



Limitations of FastLoad and MultiLoad

FastLoad Limitations :

Loads only one table.

Table must be empty.

No secondary indexes.

No enabled triggers.

No referential integrity.

Duplicate rows can't be loaded.


MultiLoad Limitations :

Loads up to 5 tables.

Table may be empty or populated.

No unique secondary indexes (NUSIs are permitted; USI subtable rows may be stored on a different AMP than the base row).

No enabled triggers.

No referential integrity.

No join indexes.


Tuesday, April 13, 2010

Difference between FastLoad and MultiLoad in terms of performance

The basic concept is:
FastLoad is used to load an empty table; no secondary indexes are allowed on the target table, and it can load only 1 table.
MultiLoad can load an already populated table and can load up to 5 tables. NUSIs are allowed on the target table.

In MultiLoad, since the target table can have NUSIs, it has to load both the subtable and the base table, whereas in FastLoad there is no subtable to maintain because no secondary index is allowed.

One more difference is that MultiLoad uses work tables.
The purpose of the work tables is to hold the DML requests and the input data that is ready to APPLY to the AMPs.

So MultiLoad has to first load the work table and then the target table.

Loading the work table is just like fastloading it, so it takes roughly twice the resources, and hence the performance is lower compared to FastLoad.

Different phases of FastLoad in brief

Two Phases :

Acquisition Phase :

The PE sends blocks to each AMP in round-robin fashion.

The AMPs hash the rows and redistribute them to the appropriate AMPs.

Records are written to each AMP's spool file; the records are unsorted.

Application Phase :

This phase starts once END LOADING is encountered.

Each AMP sorts its spool file and creates data blocks.

Writes the data blocks to disk.

Fallback rows are generated if required.

MultiLoad Modes: IMPORT and DELETE

MultiLoad provides two types of operations via modes: IMPORT and DELETE.

In MultiLoad IMPORT mode, you have the freedom to "mix and match" up to twenty (20) INSERTs, UPDATEs or DELETEs on up to five target tables. The execution of the DML statements is not mandatory for all rows in a table. Instead, their execution hinges upon the conditions contained in the APPLY clause of the script. Once again, MultiLoad demonstrates its user-friendly flexibility. For UPDATEs or DELETEs to be successful in IMPORT mode, they must reference the Primary Index in the WHERE clause.

MultiLoad DELETE mode is used to perform a global (all-AMP) delete on just one table. The reason to use .BEGIN DELETE MLOAD is that it bypasses the Transient Journal (TJ) and can be RESTARTed if an error causes it to terminate prior to finishing. When performing in DELETE mode, the DELETE SQL statement cannot reference the Primary Index in the WHERE clause. This is due to the fact that a primary index access is to a specific AMP, while this is a global operation.
The other factor that makes a DELETE mode operation so good is that it examines an entire block of rows at a time. Once all the eligible rows have been removed, the block is written one time and a checkpoint is written. So, if a restart is necessary, it simply starts deleting rows from the next block without a checkpoint. This is a smart way to continue. Remember, when using the TJ all deleted rows are put back into the table from the TJ as a rollback. A rollback can take longer to finish than the delete. MultiLoad does not do a rollback; it does a restart.
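A minimal DELETE-task sketch (the table name and predicate are hypothetical); note that the WHERE clause does not touch the primary index:

.LOGTABLE mydb.mld_log;
.LOGON tdpid/user,password;
.BEGIN DELETE MLOAD TABLES mydb.sales_hist;
DELETE FROM mydb.sales_hist WHERE sale_date < DATE '2009-01-01';
.END MLOAD;
.LOGOFF;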

Phases Of MultiLoad

Preliminary Phase :
-- Checks that the SQL and MultiLoad commands are valid.
-- Establishes MultiLoad sessions with Teradata.
-- Creates the error tables, work tables and log table.
Error Tables : MultiLoad requires 2 error tables per target table.
The first contains constraint violations.
The second contains UPI violations.
Work Tables : hold the DML tasks requested and the input data that is ready to APPLY to the AMPs.
LOGTABLE : keeps a record of the results from each phase of the load so that MultiLoad knows the
proper point from which to restart.

-- Applies utility locks to the target tables.



DML Transaction Phase :
-- All the SQL DML statements are sent to the database.
-- The PE parses the DML and generates a step-by-step plan to execute the request.
-- This execution plan is then communicated to each AMP and stored in the appropriate work table for each target table.
-- Later, during the Acquisition Phase, the actual input data will also be stored in the work table so that it may be applied during the Application Phase.
-- Next, a match tag is assigned to each DML request that will match it with the appropriate rows of input data. The match tags will not actually be used until the data has already been acquired and is about to be applied to the worktable. This is somewhat like a student who receives a letter from the university in the summer that lists his courses, professors' names, and classroom locations for the upcoming semester. The letter is a "match tag" for the student to his school schedule, although it will not be used for several months. This matching tag for SQL and data is the reason that the data is replicated for each SQL statement using the same data record.

Acquisition Phase :
-- MultiLoad now acquires the data in large, unsorted 64K blocks from the host and sends them to the AMPs (round-robin).
-- Each AMP hashes each row on the primary index and sends it over the BYNET to the proper AMP where it will ultimately be used. But the row does not get inserted into its target table just yet.
-- The AMP puts all of the hashed rows it has received from other AMPs into the work tables, where it associates them with the matching SQL.
-- Now the utility places a load lock on each target table in preparation for the Application Phase.

Application Phase :
The purpose of this phase is to write, or APPLY, the specified changes to both the target tables and NUSI subtables.

Cleanup phase :
The utility looks at the final Error Code (&SYSRC). MultiLoad believes the adage, “All is well that ends well.” If the last error code is zero (0), all of the job steps have ended successfully (i.e., all has certainly ended well). This being the case, all empty error tables, worktables and the log table are dropped. All locks, both Teradata and MultiLoad, are released. The statistics for the job are generated for output (SYSPRINT) and the system count variables are set. After this, each MultiLoad session is logged off.

Tuesday, March 30, 2010

Difference between subquery and correlated subquery

Conceptually, the subquery is processed first so that all the values are expanded into the list for comparison with the column specified in the WHERE clause. These values in the subquery SELECT can only be used for comparison against the column or columns referenced in the WHERE.

Columns inside the subquery SELECT cannot be returned to the user via the main SELECT. The only columns available to the client are those in the tables named in the main (first) FROM clause. The query in parentheses is called the subquery and it is responsible for building the IN list.

The correlated subquery is a very powerful tool. It is an excellent technique to use when there is a need to determine which rows to SELECT based on one or more values from another table. This is especially true when the value for comparison is based on an aggregate. It combines subquery processing and join processing into a single request.

The operation for a correlated subquery differs from that of a normal subquery. Instead of comparing the selected subquery values against all the rows in the main query, the correlated subquery works backward. It first reads a row in the main query, and then goes into the subquery to find all the rows that match the specified column value. Then, it gets the next row in the main query and retrieves all the subquery rows that match the next value in this row. This processing continues until all the qualifying rows from the main SELECT are satisfied.

Although this sounds terribly inefficient and is inefficient on other databases, it is extremely efficient in Teradata. This is due to the way the AMPs handle this type of request. The AMPs are smart enough to remember and share each value that is located.
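A small sketch of both forms, assuming hypothetical emp and dept tables:

-- plain subquery: the IN list is built first
SELECT emp_name
FROM emp
WHERE dept_no IN (SELECT dept_no FROM dept WHERE region = 'WEST');

-- correlated subquery: evaluated against each row of the outer query
SELECT emp_name, dept_no, sal
FROM emp e
WHERE sal = (SELECT MAX(sal) FROM emp m WHERE m.dept_no = e.dept_no);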

Monday, March 22, 2010

Collect Statistics In Teradata

Collect stats is an important concept in Teradata.
Collecting statistics enables the PE to come up with a least-cost plan for a requested query.
Collected statistics determine the confidence level of the PE in estimating how many rows it is going to access, how many unique values a table has, how many nulls, and so on; all this info is stored in the data dictionary. Once you submit a query in Teradata, the parsing engine checks whether stats are available for the requested table; if stats have been collected earlier, the PE generates a plan with "high confidence".
In the absence of collected stats the PE will generate a plan with "low confidence". However, Teradata's optimizer is very robust and intelligent: even if you do not collect stats on a table, column or index, the PE does a "dynamic AMP sampling", which means it selects a random AMP and that AMP comes up with info about the table data it holds; based upon this, the PE (which knows the data demographics and available system components) estimates the workload and generates a plan.
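A minimal sketch of collecting and inspecting statistics (the database, table and column names are hypothetical):

COLLECT STATISTICS ON mydb.tab1 COLUMN (col1);
COLLECT STATISTICS ON mydb.tab1 INDEX (col2);

-- show what has been collected
HELP STATISTICS mydb.tab1;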

How to find the list of views that are created on a table in Teradata

The following query gives the list of views that are created on a table.

select tablename from dbc.tables where TableKind='V' and RequestText LIKE '%Table%' group by 1 order by 1;

In place of Table, specify your table name.
For example, if I want to find the list of views that are created on table test1, I will use the following query:

select tablename from dbc.tables where TableKind='V' and RequestText LIKE '%test1%' group by 1 order by 1;

The query displays

TableName
------------------------------
view1
view2
view3





Friday, March 19, 2010

How to remove duplicate rows from a Teradata table

This method uses a temporary table to remove the duplicate rows.
For a table to contain duplicate rows, it must be
1. a MULTISET table, and
2. must not have a UPI.

Let us assume we need to remove duplicate rows from tab1.
The table is created as:
create multiset table tab1(col1 integer,col2 integer);
Populate the table with duplicate rows.

STEP1:
Create a temporary SET table with the following query:
create set table temp_tab1 as tab1 with no data;
This creates a temporary SET table, which does not allow duplicates.

STEP2:
Populate the temporary SET table by removing the duplicate rows from the original table. This can be done using either of the following queries:

insert into temp_tab1 select * from tab1 group by col1,col2 having count(*) > 1;
(or)
insert into temp_tab1 select distinct * from tab1 ;

STEP3:
Delete all the duplicate rows from the original table (this query removes all the duplicate rows, including the base row):
delete from tab1
where (col1,col2)
in (select col1,col2 from temp_tab1);

STEP4:
Insert the rows from the temp table back into the base table using the following query:
insert into tab1
sel * from temp_tab1;

There are two more methods to delete duplicate rows without using a temporary
table, using the Teradata FastLoad utility and Teradata Warehouse Builder.

Method 1 :
Use FastExport to write the rows from the source table into a file.
Then delete all rows from the source table.
Then use FastLoad to reload the table; the file used to load the table is the one generated by FastExport.

Method 2 :
Use Teradata Warehouse Builder (TWB) with 2 steps.
In the first step use the DataConnector as consumer and the Export operator as producer.
In the second step use the DataConnector as producer and the Load operator as consumer.

Both methods rely on the fact that FastLoad cannot load duplicate rows.


Thursday, March 18, 2010

Loading an empty table using FastLoad with data present in multiple files

This will show how to load an empty table using FastLoad with data present in multiple files.
Let us do this with 3 files, each containing 2 records.

The table contains one column, an INTEGER, for simplicity.

Data files : f1-2.dat contains values 1 and 2.
f3-4.dat contains values 3 and 4.
f5-6.dat contains values 5 and 6.

Let's prepare 3 FastLoad scripts.

STEP1:

fload1.fl :-
.logon abc/xyz,xyz;
drop table errtbl1;
drop table errtbl2;
drop table test1;
create table test1(col1 integer);

.BEGIN LOADING test1 ERRORFILES errtbl1,errtbl2;
DEFINE col1 (integer)
FILE=f1-2.dat;
INSERT INTO test1 values(:col1);
.logoff;

fload2.fl :-
.logon abc/xyz,xyz;
.BEGIN LOADING test1 ERRORFILES errtbl1,errtbl2;
DEFINE col1 (integer)
FILE=f3-4.dat;
INSERT INTO test1 values(:col1);
.logoff;


fload3.fl :-
.logon abc/xyz,xyz;
.BEGIN LOADING test1 ERRORFILES errtbl1,errtbl2;
DEFINE col1 (integer)
FILE=f5-6.dat;
INSERT INTO test1 values(:col1);
.END LOADING;
.logoff;

STEP 2:
Execute the 3 FastLoad scripts in sequence:
first execute fload1.fl,
next fload2.fl,
next fload3.fl.


After executing fload3.fl the table test1 will be loaded with 6 records.

Explanation :
After the execution of fload1.fl and fload2.fl, FastLoad is in a paused state.
When fload3.fl is executed, i.e. when FastLoad sees the .END LOADING statement, it updates the error tables and the target table and releases the locks.
Make sure that each .BEGIN LOADING statement specifies the same error tables.



Wednesday, March 17, 2010

Concept of indexes in Teradata

UPI, NUPI:
Teradata uses the primary index to distribute data across the AMPs.
PI access is always a one-AMP operation, as Teradata knows exactly where the record is.
So both UPI and NUPI access result in a one-AMP operation.
A WHERE clause on a UPI may return 0 to 1 record;
a WHERE clause on a NUPI may return 0 to many records.

USI, NUSI:
A secondary index provides an alternate path to access data.
The index creates a subtable on all AMPs.
It stores the index value, the index row hash and the row ID of the record. So whenever you insert data into a table having an index, the database also makes corresponding entries in the index subtables.
USI access is a 2-AMP operation. First Teradata hashes the index value and goes to a particular AMP, looking for the index row hash; there it gets the index value and also the row ID where the record is present. It's definitely much faster than doing full-table scans.
NUSI access is an all-AMP operation but not a full-table scan. In the case of a NUSI, each index subtable stores only its own AMP's values. So if you qualify on a NUSI column, each AMP searches for the value you supply and returns records if present.
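For reference, a sketch of adding the two kinds of secondary index to an existing table (the table and column names are hypothetical):

-- USI: two-AMP access path
CREATE UNIQUE INDEX (emp_no) ON mydb.emp;

-- NUSI: all-AMP (but not full-scan) access path
CREATE INDEX (dept_no) ON mydb.emp;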

Global Temporary Tables and Volatile Temporary Tables in Teradata

These are Teradata Specific concepts.
Both are temporary tables and are used normally as a kind of "intermediate stage" for an SQL operation.

Global Temporary Tables (GTT) :
1. A GTT takes up space from temporary space.
2. GTTs survive TD system restarts.
3. Indexes can be created on a GTT.
4. CHECK and BETWEEN constraints, COMPRESS columns and the DEFAULT
and TITLE clauses are supported by GTTs.
5. You can collect stats on a GTT.
6. The table definition is stored in the DD for a GTT.
7. In a single session 2,000 global temporary tables can be materialized.
8. A materialized instance of a GTT is discarded at session end.

Volatile Temporary Tables (VTT):
1. A VT takes up space from spool and doesn't have any info in the DD,
so it is slightly faster than a GTT in those terms because no DD lookup is required.
2. VTs can't survive TD system restarts.
3. Indexes can't be created on a VT.
4. CHECK and BETWEEN constraints, COMPRESS columns and the DEFAULT and TITLE clauses are not supported by VTs.
5. No collect stats for VTTs.
6. The table definition is stored in the system cache for a VTT.
7. In a single session 1,000 volatile tables can be materialized.
8. Volatile tables are always qualified by the session's userid.

This is how we usually create the GTT and VTT.
GTT :
CREATE GLOBAL TEMPORARY TABLE gt_deptsal
( deptno SMALLINT
,avgsal DEC(9,2)
,maxsal DEC(9,2)
,minsal DEC(9,2)
,sumsal DEC(9,2)
,empcnt SMALLINT);

VTT :

CREATE VOLATILE TABLE vt_deptsal, LOG
(deptno SMALLINT
,avgsal DEC(9,2)
,maxsal DEC(9,2)
,minsal DEC(9,2)
,sumsal DEC(9,2)
,empcnt SMALLINT)
ON COMMIT PRESERVE ROWS;


ON COMMIT PRESERVE ROWS allows us to use the volatile table again for other queries in the session. The default is ON COMMIT DELETE ROWS, which means the data is
deleted when the query is committed.

INSERT INTO gt_deptsal
SELECT dept ,AVG(sal) ,MAX(sal) ,MIN(sal) ,SUM(sal) ,COUNT(emp)
FROM emp
GROUP BY 1;

Problem:
To show all employees who make the maximum salary in their department.
Solution:
SELECT emp, last, dept, sal
FROM emp INNER JOIN gt_deptsal
ON dept=deptno
WHERE sal=maxsal
ORDER BY 3;

How to display a date in 'YY-MM-DD' format in SQL Assistant

Creating the table:
create table tab6(col1 integer,col2 date);

Populate the table as:
ins tab6(1,DATE);
ins tab6(2,DATE);
ins tab6(3,DATE);

In Bteq :

select col2 (format 'yy-mm-dd') from tab6;


*** Query completed. 3 rows found. One column returned.
*** Total elapsed time was 1 second.
col2
--------
08-09-14
08-09-14
08-09-14

But this won't work in SQL Assistant.

In SQL Assistant:
we need to specify the following query:

select cast( (col2 (format 'YY-MM-DD')) as char(10)) from tab6;

This is because SQL Assistant connects to Teradata via ODBC while BTEQ uses CLI,
and ODBC does not support the FORMAT clause.
This is why the results are different in SQL Assistant and BTEQ.

Select TOP n row(s) from a table

At the start the table contains the following rows.

select * from tab7 order by 1;

col1 col2
----------- -----------
1 2
2 2
3 2
4 4
5 5
6 6

Problem:
To get the 2nd row from the bottom (Teradata's RANK defaults to descending order):

select * from tab7 QUALIFY RANK(col1) = 2;

col1 col2
----------- -----------
5 5


To get the top 2nd row:

select * from tab7 QUALIFY RANK(col1 ASC) = 2;

col1 col2
----------- -----------
2 2



To get the top 4 rows:

select * from dup_tab1 QUALIFY RANK(col1 ASC) <= 4;

To get 4 rows arbitrarily (TOP without an ORDER BY returns an arbitrary set):

select TOP 4 * from dup_tab1;

How do I update multiple rows of a table with different values in a single query?

select * from tab1;

*** Query completed. 2 rows found. 2 columns returned.
*** Total elapsed time was 1 second.

col1 col2
----------- ----------
101 vij
102 ven

UPDATE tab1
SET col2 = CASE WHEN col1 = 101 THEN 'XXXX'
WHEN col1 = 102 THEN 'YYYY'
ELSE col2 -- keep any other rows unchanged (without this, unmatched rows are set to NULL)
END;

select * from tab1;
*** Query completed. 2 rows found. 2 columns returned.
*** Total elapsed time was 1 second.

col1 col2
----------- ----------
101 XXXX
102 YYYY

Concept of Primary Key and Primary Index in Teradata

In Teradata,
a primary key is used to uniquely identify each row in a table. PKs are used in conjunction with foreign keys to define the important column relationships in a database.

The concept of a primary index is completely different from that of a primary key.
The Primary Index is used to distribute and locate rows of a table over the AMPs. The choice of Primary Index affects distribution, access and performance.

Oftentimes, but not always, the Primary Index and Primary Key are the same.
Indexes (primary or secondary) may be used to enforce uniqueness (as in a PK) or to improve access.

Accessing rows via the primary index is always a one-AMP operation.

The PK is a relational modeling convention, whereas the PI is a Teradata convention.


Example to create a table with Unique Primary Index (UPI) :
CREATE TABLE MJ1.emp_data,FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL
(
employee_number INTEGER NOT NULL,
department_number SMALLINT,
job_code INTEGER COMPRESS ,
last_name CHAR(20) NOT CASESPECIFIC NOT NULL,
first_name VARCHAR(20) NOT CASESPECIFIC,
street_address VARCHAR(30) NOT CASESPECIFIC TITLE 'Address',
city CHAR(15) NOT CASESPECIFIC DEFAULT 'Boise'
COMPRESS 'Boise',
state CHAR(2) NOT CASESPECIFIC DEFAULT ' ',
birthdate DATE FORMAT 'mm/dd/yyyy',
salary_amount DECIMAL(10,2),
sex CHAR(1) UPPERCASE NOT CASESPECIFIC)
UNIQUE PRIMARY INDEX ( employee_number )
INDEX ( department_number );

Teradata - Permanent Space VS Temporary Space VS Spool Space

Teradata space usage:
----------------------

3 kinds of space in the Teradata database:

1. Permanent Space
2. Spool Space
3. Temporary Space

Example: create user xyz perm = 1e4, spool = 1e3, temporary = 1e3, password = xyz;

-> Permanent space is used for database object (tables, indexes etc.) creation and to hold their data.

-> Spool space is used to hold data for intermediate query results.
Holds only data.
Active only up to the current session.
Spool space defaults to that of the owner, unless specified.

-> Temporary space is used to store global temporary tables' data.
Data is active only up to the current session.
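Current usage of these spaces can be checked from the data dictionary; a hedged sketch using the DBC.DiskSpace view (the user name 'xyz' matches the example above):

SELECT DatabaseName
, SUM(MaxPerm) AS MaxPerm
, SUM(CurrentPerm) AS CurrentPerm
, SUM(CurrentSpool) AS CurrentSpool
FROM DBC.DiskSpace
WHERE DatabaseName = 'xyz'
GROUP BY 1;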

5 kinds of Tables in Teradata database:

1. SET Table
2. Multiset Table
3. Global Temporary Table (GTT)
4. Temporary Table
5. Volatile Temporary Table

Table definitions go into the data dictionary (permanent space).

Tables
------

SET table:
----------
1. Table definition : in the data dictionary (permanent space)
2. Data storage : permanent space
3. After database restart : both table and data survive
4. Example : can directly create with 'create set table ….'
5. Collect stats : Yes

Multiset table:
---------------
1. Table definition : in the data dictionary (permanent space)
2. Data storage : permanent space
3. After database restart : both table and data survive
4. Example : can directly create with 'create multiset table ….'
5. Collect stats : Yes

Global Temporary Table (GTT):
----------------------------
1. Table definition : in the data dictionary (permanent space)
2. Data storage : temporary space
3. After database restart : table definition survives
4. Example : can directly create with 'create global temporary table ….'
5. Collect stats : Yes

Temporary Table:
----------------------------
1. Table definition : no table definition
2. Data storage : spool
3. After database restart : nothing survives
4. Example : indirectly ( derived tables )
5. Collect stats : No

Volatile Table:
----------------------------
1. Table definition : system cache (primary memory - RAM)
2. Data storage : spool
3. After database restart : nothing survives
4. Example : indirectly ( derived tables )
5. Collect stats : No

To get the syntax for a command in Teradata

After logging on to Teradata through BTEQ/SQL Assistant,
type the following command:

exec dbc.getsyntax('sql','create table');

getsyntax is a macro that accepts 2 parameters.

The first parameter is the type of utility that the command belongs to;
valid values include the following:

PMPC
BULKLOAD
DUMP
ARCHIVE
MULTILOAD
VERSION
SPL
FASTEXPORT
DATA_TYPE
HELP
SQL
FASTLOAD
TPCCONS

The second parameter is the command name for which we need to find the syntax.

Examples include:
exec dbc.getsyntax('FASTLOAD','DEFINE');
exec dbc.getsyntax('MULTILOAD','.IMPORT');
exec dbc.getsyntax('SQL','UPDATE');

Getting the count of data from n different tables in Teradata

I have n tables.
The tables need not have
1. the same number of columns, or
2. common columns.

I need to get the count of the data, which is the union of all the tables' data.
For example, there are 3 tables a, b and c, in which
table 'a' has 3 rows,
table 'b' has 4 rows, and
table 'c' has 5 rows.

Then the result should be count of a +
count of b + count of c,
i.e. 3+4+5 = 12.

Ans :
I am doing this by taking 3 tables as an example;
we can extend this for any number of tables.

select * from a;

*** Query completed. 3 rows found. One column returned.
*** Total elapsed time was 1 second.

col1
-----------
3
1
2

select * from b;

*** Query completed. 4 rows found. 2 columns returned.
*** Total elapsed time was 1 second.

col1 col2
----------- -----------
3 3
1 1
4 4
2 2


select * from c;

*** Query completed. 5 rows found. 3 columns returned.
*** Total elapsed time was 1 second.

col1 col2 col3
----------- ----------- -----------
5 5 5
3 3 3
1 1 1
4 4 4
2 2 2



Note that UNION ALL is required here; a plain UNION would remove duplicate counts and understate the sum whenever two tables happen to have the same row count.

SELECT SUM(CountTables) FROM
(
SELECT COUNT(*) AS "CountTables" FROM a
UNION ALL
SELECT COUNT(*) AS "CountTables" FROM b
UNION ALL
SELECT COUNT(*) AS "CountTables" FROM c
)t;

*** Query completed. One row found. One column returned.
*** Total elapsed time was 1 second.

Sum(CountTables)
----------------
12

Calculate the perm space taken by a table in Teradata

To find total permanent space taken by a table on all AMPs.

select databasename, tablename, sum(currentperm)
from dbc.tablesize where databasename = 'raja'
and tablename = 'testing' group by databasename , tablename;

DataBaseName TableName Sum(CurrentPerm)
---------------- ------------ ------------------
raja testing 1,536


Here 'testing' is the table name and 'raja' is the user/database name.

To find total permanent space taken by a table on each AMP.


select databasename, tablename, currentperm,Vproc from dbc.tablesize where databasename ='raja' and tablename = 'testing' ;

DataBaseName TableName CurrentPerm Vproc
---------------- ------------ ------------------- ------
raja testing 1,024 0
raja testing 512 1