Data Flow Task Problem: Daily file processing help

Hello DBA's:

I need to extract a particular file from our AS 400 system on a daily basis and do some processing on it. Also I want to do the daily processing only on those records that have been added/updated since the initial load.
Here is the approach and possible implementation.

Approach
I have the DB2 source data containing 10 columns (Col1 to Col5 together form the key) and the rest are attributes. I am interested only in the key and two attributes. So I load my table with only Col1 to Col7 ( 5 for key and the two attributes). I then do my processing on this table.

Here is the implementation given by a member of dbforums -

You'll then have to deal with 3 potential actions,

INSERT: New records on the file.
DELETES: Records that don't exists.
UPDATES: Records that are on the file, but attributes have changed.

You had given me this code template.

CREATE TABLE myTable99 (
Col1 char(1)
, Col2 char(1)
, Col3 char(1)
, Col4 char(1)
, Col5 char(1)
, CONSTRAINT myTable99_pk PRIMARY KEY (Col1, Col2))

CREATE TABLE myTable00 (
Col1 char(1)
, Col2 char(1)
, Col3 char(1)
, Col4 char(1)
, Col5 char(1)
, CONSTRAINT myTable00_pk PRIMARY KEY (Col1, Col2))
GO

INSERT INTO myTable99(Col1,Col2,Col3,Col4,Col5)
SELECT '1','1','a','b','c' UNION ALL
SELECT '1','2','d','e','f' UNION ALL
SELECT '1','3','g','h','i' UNION ALL
SELECT '1','4','j','k','l'
--DELETED

INSERT INTO myTable00(Col1,Col2,Col3,Col4,Col5)
SELECT '1','1','a','b','c' UNION ALL --NO CHANGE
SELECT '1','2','x','y','z' UNION ALL -- UPDATE (My comment - Instead of an update I want to insert a new record)
SELECT '1','3','g','h','i' UNION ALL --NO CHANGE
SELECT '2','3','a','b','c' --INSERT
GO

SELECT * FROM myTable99
SELECT * FROM myTable00
GO

--DO DELETES FIRST (My comment - Before deleting, I want to copy the rows that I am going to delete on a separate table to maintain history. Then I want to delete from a). I don't get the logic. If the rows of the old extract are not in new extract then delete them. So shouldn't it be <> instead of =. why the where clause condition)
DELETE FROM a
FROM myTable99 a
LEFT JOIN myTable00 b
ON a.Col1 = b.Col1
AND a.Col2 = b.Col2
WHERE b.Col1 IS NULL AND b.Col2 IS NULL

-- INSERT (My comment - I don't get the logic of the where. If the rows of the old extract are not in new extract then delete them. So shouldn't it be <> instead of =)

INSERT INTO myTable99(Col1,Col2,Col3,Col4,Col5)
SELECT a.Col1, a.Col2, a.Col3, a.Col4, a.Col5
FROM myTable00 a
LEFT JOIN myTable99 b
ON a.Col1 = b.Col1
AND a.Col2 = b.Col2
WHERE b.Col1 IS NULL AND b.Col2 IS NULL

-- UPDATE

UPDATE a
SET Col3 = b.Col3
, Col4 = b.Col4
, Col5 = b.Col5
FROM myTable99 a
INNER JOIN myTable00 b
ON a.Col1 = b.Col1
AND a.Col2 = b.Col2
AND ( a.Col3 <> b.Col3
OR a.Col4 <> b.Col4
OR a.Col5 <> b.Col5)
GO
----

Can anybody look at My comments and answer them or revise this code template if need be?

Brett Kaiser - I sent you an e-mail on this. Can you respond to it when time permits.

ThanksVivek,

Did you test it? Run it I mean?

I worked with a guy named Vivek once...

First the DELETE's. You know why you want to delete first, correct?
If yes, and you want to retain history, create an identical historical table called:

CREATE TABLE myTable99_H (
HIST_ADD_DT datetime DEFAULT GetDate()
, Col1 char(1)
, Col2 char(1)
, Col3 char(1)
, Col4 char(1)
, Col5 char(1)

With no constraints..it is history after all, and is inheriting the constarints from the base table

As far as the logic...if you look it's a LEFT OUTER JOIN between the 2 tables WHERE the key DOESN'T exist in the other (NOT NULL)

Got it?

Second, INSERTS...

Same logic as to why (NOT NULL)

Well did you cut and paste the code in to QA?

It should run for you no problems and should be a good example...|||Brett,

Yes it works. Sorry about that friend. I should have tested it in QA before. Thanks. Just a one more thing - I want to do the daily processing only on those records that have been updated/inserted. My daily processing involves checking if those attribute columns Col6 and Col 7 are present in this other file. If they are present than I need to say myTable99 records is Yes - it is present.

So I don't want to check on the attributes that have not changed. Would you suggest using a flag/status field for this purpose? Is there any other way?

Thanks

Vivek

1. Instead of deleting the row first, I have to move that row to a history table and then delete it. So is this correct for Delete -

CREATE TABLE myTable99_H (
HIST_ADD_DT datetime DEFAULT GetDate()
, Col1 char(1)
, Col2 char(1)
, Col3 char(1)
, Col4 char(1)
, Col5 char(1)

INSERT INTO myTable99_H(Col1,Col2,Col3,Col4,Col5)
SELECT a.Col1, a.Col2, a.Col3, a.Col4, a.Col5
FROM myTable99 a
LEFT JOIN myTable00 b
ON a.Col1 = b.Col1
AND a.Col2 = b.Col2
WHERE b.Col1 IS NULL AND b.Col2 IS NULL

Then do the delete

DELETE FROM a
FROM myTable99 a
LEFT JOIN myTable00 b
ON a.Col1 = b.Col1
AND a.Col2 = b.Col2
WHERE b.Col1 IS NULL AND b.Col2 IS NULL

Originally posted by Brett Kaiser
Vivek,

Did you test it? Run it I mean?

I worked with a guy named Vivek once...

First the DELETE's. You know why you want to delete first, correct?
If yes, and you want to retain history, create an identical historical table called:

CREATE TABLE myTable99_H (
HIST_ADD_DT datetime DEFAULT GetDate()
, Col1 char(1)
, Col2 char(1)
, Col3 char(1)
, Col4 char(1)
, Col5 char(1)

With no constraints..it is history after all, and is inheriting the constarints from the base table

As far as the logic...if you look it's a LEFT OUTER JOIN between the 2 tables WHERE the key DOESN'T exist in the other (NOT NULL)

Got it?

Second, INSERTS...

Same logic as to why (NOT NULL)

Well did you cut and paste the code in to QA?

It should run for you no problems and should be a good example...|||I don't worry about it...it's like picking up sand with tweazers...

As long as you keep history of deletes AND updates with triggers, you will always have a total view of the "life" of the data as it "grows up" (and potentially dies - DELETE).

You can then determine with SQL what happened when and where (and if you add the right columns, by whom).

BUT...

You van look up the COLUMNS_UPDATED syntax in the Trigger

I don't use it, so sorry...

Go to BOL and look up CREATE TRIGGER in the index...

BUT...I'm perfectly happy with my method of tracking and retaining history...|||Brett,

I agree that your method is good. I would do the same.

My daily processing involves checking if those attribute columns Col6 and Col 7 are present in this other file. If they are present than I need to say myTable99 records is Yes - it is present.

So I don't want to check on the attributes that have not changed. That is the records with NO CHANGE shouldn't undergo any processing. Only those that have been updated/inserted should be processed.

Would you suggest using a flag/status field for this purpose? Is there any other way?

Thanks

Vivek|||OK, So you're saying, any rows that are identical you want to bypass?|||Yes. Any rows that are identical should be bypassed. I have already done the processing on these rows. So I don't want to do it again. Only updates/inserts should be processed.|||And DELETES...

But if Col1 and 2 or Key

and in the update I say col3 <> col3
Or col4 <> col4

isn't the extension of the col6 <> col6
or col7 <> col7

You're losing me...

Because that would do what you're asking...|||Brett,

Let me make it simple...

CREATE TABLE myTable99 (
Col1 char(1)
, Col2 char(1)
, Col3 char(1)
, Col4 char(1)
, Col5 char(1)
, CONSTRAINT myTable99_pk PRIMARY KEY (Col1, Col2))

CREATE TABLE myTable00 (
Col1 char(1)
, Col2 char(1)
, Col3 char(1)
, Col4 char(1)
, Col5 char(1)
, CONSTRAINT myTable00_pk PRIMARY KEY (Col1, Col2))
GO

INSERT INTO myTable99(Col1,Col2,Col3,Col4,Col5)
SELECT '1','1','a','b','c' UNION ALL --NO CHANGE
SELECT '1','2','d','e','f' UNION ALL --UPDATED
SELECT '1','3','g','h','i' UNION ALL -- --NO CHANGE
SELECT '1','4','j','k','l'
--DELETED

--This was my initial load...I check if Col4 or Col5 or Col6 (Name fields) are present in a Name file containing a list of names. If they are then I output the record plus the Name record from the Name file into a different table. That is my processing. I have a stored proc that does this. So now during the day this file gets updated in the source system. So the next day what I have is -

INSERT INTO myTable00(Col1,Col2,Col3,Col4,Col5)
SELECT '1','1','a','b','c' UNION ALL --NO CHANGE
SELECT '1','2','x','y','z' UNION ALL -- UPDATE (My comment - Instead of an update I want to insert a new record)
SELECT '1','3','g','h','i' UNION ALL --NO CHANGE
SELECT '2','3','a','b','c' --INSERT
GO

Select * FROM myTable00.

I compare this with myTable99, do the deletes,updates and inserts like you said to get the appropriate contents of Table00 to Table99. Are you with me till this point. So now my Table99 next day looks like -

1','1','a','b','c' --NO CHANGE (So today I don't have to check this record with the Name file.)
'1','2','x','y','z' -- UPDATE ( I have to check this record with Name File)
'1','3','g','h','i' --NO CHANGE (No check. If I check and output with Name record then duplicate row)
'2','3','a','b','c' --INSERT (Check)

Have I made it clear now?

Let me know what you think. Appreciate your help.

Vivek|||Originally posted by vivek_vdc

INSERT INTO myTable00(Col1,Col2,Col3,Col4,Col5)
SELECT '1','1','a','b','c' UNION ALL --NO CHANGE
SELECT '1','2','x','y','z' UNION ALL -- UPDATE (My comment - Instead of an update I want to insert a new record)
SELECT '1','3','g','h','i' UNION ALL --NO CHANGE
SELECT '2','3','a','b','c' --INSERT
GO

Clear as mud...but I've got way too much time invested...

You can't do that..see the bold...you're pk is col1 and col2...they can'r appear more than once like your sample suggestions...

again, unless it's too murky in here...

Sunday, February 19, 2012

Daily file processing help

No comments:

Post a Comment

Data Flow Task Problem

Blog Archive

About Me