Win 2k pro
SQL server 2k (dev ed)
ASP-VBscript
I've got 3 tables, but I'm not sure if they should stay as 3 or all be merged
into one table. Tables 2 & 3 each have a 1:1 relationship with table 1, with
cascading deletes & updates.
TABLE 1: Members
MemberID
FirstName
LastName
Gender
DOB
Age (computed Col)
Description
TABLE 2: Members_Homepages
MemberID
HomePageURL
TABLE 3: Member_Email
MemberID
EmailAddress
The reason for the 3 tables is that not every member will have a homepage,
and possibly not every member will have an email address.
If I put these 2 fields into table 1, I don't think it will fail any
of the first 3 normal forms (1NF, 2NF, 3NF).
I'm simply splitting them off to save on data size and rows-per-page space
(they're varchar(255) each), but it does mean that all my stored procs that
need the info will have to use LEFT JOINs to get the homepage & email
for each member.
Which is the best method: all the info in 1 table, or split into 3 (like I
have), and why?
Thanks for any tips on DB design, as I'm new to all this.
Also, does anyone know of any good sites on database design?
Al."Harag" <harag@.softhome.net> wrote in message
news:dgeemv8fu7ienago4l41vsmrfbbifjs5ps@.4ax.com...
> Hi All.
[ snip ]
> The reason for the 3 tables is that not every member will have a homepage,
> and possibly not every member will have an email address.
This is what nullable columns are for.
They should all be one table.
> I'm simply splitting them off to save on data size and rows-per-page space
> (they're varchar(255) each), but it does mean that all my stored procs that
> need the info will have to use LEFT JOINs to get the homepage & email
> for each member.
> Which is the best method: all the info in 1 table, or split into 3 (like I
> have), and why?
Much more efficient to put them in one table. Much.
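For what it's worth, a merged version might look something like this. This is
only a sketch; the column types and lengths (and the exact computed-column
expression) are my guesses, not anything from your post:

CREATE TABLE Members (
    MemberID     INT           NOT NULL PRIMARY KEY,
    FirstName    VARCHAR(50)   NOT NULL,  -- lengths here are illustrative
    LastName     VARCHAR(50)   NOT NULL,
    Gender       CHAR(1)       NULL,
    DOB          DATETIME      NULL,
    Age AS DATEDIFF(YEAR, DOB, GETDATE()),  -- computed column (rough age only)
    Description  VARCHAR(255)  NULL,
    HomePageURL  VARCHAR(255)  NULL,        -- NULL = member has no homepage
    EmailAddress VARCHAR(255)  NULL         -- NULL = member has no email
)

Every query then gets both optional values from a plain single-table SELECT,
with no joins.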
David|||
Keep in mind the extra overhead when having nullable columns in your table. See Kalen Delaney's "Inside SQL Server 2000" for details.
-- Tom
---
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON Canada
www.pinnaclepublishing.com/sql
"David Browne" <davidbaxterbrowne no potted meat@.hotmail.com> wrote in =message news:Omfq4JHfDHA.3700@.TK2MSFTNGP11.phx.gbl...
"Harag" <harag@.softhome.net> wrote in message
news:dgeemv8fu7ienago4l41vsmrfbbifjs5ps@.4ax.com...
> Hi All.
> Win 2k pro
> SQL server 2k (dev ed)
> ASP-VBscript
> I got 3 tables but I'm not sure if they should be 3 or all merged into
> one table. Table 2 & 3 have a 1-1 relationship with table 1 with
> cascade deletes & updates.
> TABLE 1: Members
> MemberID
> FirstName
> LastName
> Gender
> DOB
> Age (computed Col)
> Description
> TABLE 2: Members_Homepages
> MemberID
> HomePageURL
> TABLE 3: Member_Email
> MemberID
> EmailAddress
> Reson for the 3 tables is that not every member will have a homepage
> and possibly not every member will have an email address.
This is what nullable columns are for.
They should all be one table.
> I'm simply splitting them off to save on data & rows per page space
> (eg varchar(255) each) but it does mean that all my stored procs that
> require the info will need to have "Left joins" to get the homepage &
> emails for each member.
> Which is the best method : all info 1 table or split into 3 (like I
> have) and WHY?
Much more efficient to put them in one table. Much.
David
--=_NextPart_000_033E_01C37C51.767992E0
Content-Type: text/html;
charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
&
Keep in mind the extra overhead when =having nullable columns in your table. See Kalen Delaney's "Inside SQL =Server 2000" for details.
-- Tom
---T=homas A. Moreau, BSc, PhD, MCSE, MCDBASQL Server MVPColumnist, SQL =Server ProfessionalToronto, ON Canadahttp://www.pinnaclepublishing.com/sql">www.pinnaclepublishing.com=/sql
"David Browne"
|||Hmm, I don't have that book. I got Sams Teach Yourself Transact-SQL
in 21 Days for starters, and I can't find anything in it about the overhead
of nullable columns.
Can you please point me to any websites about things like this?
thanks
Al.
On Tue, 16 Sep 2003 12:53:01 -0400, "Tom Moreau"
<tom@.dont.spam.me.cips.ca> wrote:
>Keep in mind the extra overhead when having nullable columns in your table. See Kalen Delaney's "Inside SQL Server 2000" for details.|||
Her website is www.insidesqlserver.com but you really should get the book. It's the gold standard on internals.
-- Tom
---
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON Canada
www.pinnaclepublishing.com/sql
"Harag" <harag@.softhome.net> wrote in message =news:6jhemvkncq2ul8figga5s136a8jrkuduid@.4ax.com...
[ snip ]
|||
You could consider leaving the URL as a separate table for two reasons:
1. Eliminate the null from the Members table
2. Allow for multiple home pages per member, though the primary key =would need to be adjusted.
The sample tables you've shown are quite narrow by themselves. If, however, you had many columns and only some columns were frequently accessed while the remainder were not, you could split the table into two and maintain a 1:1 relation between them. This is known as "vertical partitioning".
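For point 2, something along these lines would allow more than one homepage
per member. A sketch only: the column sizes are assumptions, MemberID is
assumed to be the primary key of Members, and the cascade simply carries over
what the original post described:

CREATE TABLE Members_Homepages (
    MemberID    INT          NOT NULL
        REFERENCES Members (MemberID) ON DELETE CASCADE,
    HomePageURL VARCHAR(255) NOT NULL,
    PRIMARY KEY (MemberID, HomePageURL)  -- composite key: many URLs per member
)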
-- Tom
---
Thomas A. Moreau, BSc, PhD, MCSE, MCDBA
SQL Server MVP
Columnist, SQL Server Professional
Toronto, ON Canada
www.pinnaclepublishing.com/sql
"Harag" <harag@.softhome.net> wrote in message =news:mmhemv0cffgmtn7q2f3s40ot0vkil8eg0q@.4ax.com...
Hi
I might be wrong, but isn't the stated size of both the Homepage &
email fields (varchar(255) each = 510 bytes) included in the rows-per-page
formula? I think it's something like 8060 bytes per page, so the fewer pages
the server reads the better?
The Email field might be better in the first table, as most people have
email these days, but not everyone has a homepage; possibly only 50%
might have a homepage.
Thanks for the input.
Al.
On Tue, 16 Sep 2003 11:49:33 -0500, "David Browne" <davidbaxterbrowne
no potted meat@.hotmail.com> wrote:
[ snip ]
>This is what nullable columns are for.
>They should all be one table.
[ snip ]
>Much more efficient to put them in one table. Much.
>David
|||>"Tom Moreau" <tom@.dont.spam.me.cips.ca> wrote in message
>news:OJV0AMHfDHA.2576@.TK2MSFTNGP11.phx.gbl...
>Keep in mind the extra overhead when having nullable columns in your table.
>See Kalen Delaney's "Inside SQL Server 2000" for details.
If memory serves, varchar columns, nullable or not, and nullable char
columns all have about the same storage characteristics. Nominal storage
required for null columns, no in-place updates. That's about it.
That overhead is orders of magnitude less than the overhead of having
to perform multiple left joins to retrieve the rows.
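To put that concretely, compare the shape of the two lookups. This is just a
sketch using the OP's table names; @MemberID is an illustrative parameter:

-- Split design: every read that wants the full picture needs two left joins.
SELECT m.MemberID, m.FirstName, m.LastName, h.HomePageURL, e.EmailAddress
FROM Members m
LEFT JOIN Members_Homepages h ON h.MemberID = m.MemberID
LEFT JOIN Member_Email e ON e.MemberID = m.MemberID
WHERE m.MemberID = @MemberID

-- Single-table design: one seek, and the nullable columns come along for free.
SELECT MemberID, FirstName, LastName, HomePageURL, EmailAddress
FROM Members
WHERE MemberID = @MemberID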
David|||Based on your narrative, what you have done is logically correct. Since the
relationships between t1 & t2 and between t1 & t3 are 1-to-(0 or 1), they form
the classic super type/sub type relations.
Generally, it is quite common to set aside the logical issues related to NULLs
and concentrate on the physical storage aspects for performance improvements.
In some cases, you'd split the table into multiple tables based on
how certain columns are accessed by DML. At the implementation
level, this technique is termed vertical partitioning and can result in
significant performance gains, especially in data warehouses.
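As a sketch of what that super type/sub type shape looks like in DDL (using
the OP's names; the cascade options only mirror what he described, and the
column size is an assumption), the key point is that the subtype's primary key
is also its foreign key, which is exactly what enforces the 1-to-(0 or 1):

CREATE TABLE Member_Email (
    MemberID     INT          NOT NULL PRIMARY KEY  -- PK = FK: at most one row per member
        REFERENCES Members (MemberID)
        ON DELETE CASCADE ON UPDATE CASCADE,
    EmailAddress VARCHAR(255) NOT NULL              -- the row exists only if the email does
)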
--
- Anith
( Please reply to newsgroups only )|||>> In an ordinary OLTP application, however, this method is horrible. <<
Perhaps, because "an ordinary OLTP application" (I assume you meant OLTP
database ) considers the performance gains more significant than a vital
function called data integrity ?
>> I wouldn't argue that it's not logically correct. It's just terribly
slow and cumbersome to use. <<
Maybe yes, maybe no. But then it is the data designer's decision whether to
sacrifice the obvious integrity benefits over perceived performance gains.
You have to understand the fundamental difference between logical
representation of data as tables (relations) and the physical storage
(clustering, b-tree indexes, hashes, heaps, etc.). With an obvious confusion
between the logical & physical levels of representations, one may have to
make subjective conclusions like "terribly slow" & write up arbitrary
queries to mix up performance aspects (which is an artifact of the physical
storage) with a well designed logical model.
One foundational concept of any DBMS based on the relational model is that
the DBMS should provide sufficient physical data independence using
relations which allow the user to manipulate the data irrespective of how it
physically stores the data. However, in practice, DBMSs which exclusively
implement SQL have forgone a certain level of data independence for
obvious reasons. Thus, in many cases, a data designer's decision to have a
single table or multiple tables for the logical representation of a business
segment relies on performance implications rather than logical correctness.
>> First you have to perform a number of outer joins to see all the
attributes related to a single entity. This alone probably swamps any
performance gains you might see from using narrow fixed-width tables. <<
That is because you make no distinction between the physical and logical
schema in a database. Again, any performance you see is an artifact of the
physical model. The relational model is nothing but logic applied to
databases, and logic has nothing to say about physical implementation. As
mentioned before, most DBMSs loosely implement "relations" as SQL tables
with more or less direct correlation to the physical model ( files,
bits/bytes etc on the disk). This means a SQL database designer may have to
occasionally corrupt his logical model to make sure sufficient performance
gains are realized by different physical tuning mechanisms. One such popular
mechanism is to ignore the dependencies among the attributes and "bundle up"
multiple entity types into a single relation (which is popularly known as
"denormalization").
>> Second your update logic has to involve both inserts and updates. <<
As much of an assumption as that is, why should this be a problem if multiple
DML statements can guarantee logical correctness?
>> Third it's extremely difficult to enforce check constraints involving
multiple columns. For instance if you have a requirement that "every member
with a homepage must also have an email unless they joined before 1997", ...
<<
Well, do you realize this rule introduced an additional FD, a dependency of
the email on the year and homepage entities? Are you going to ignore this
altogether?
>> If you have a base type / sub type relationship, with more than one sub
type, and each sub type has multiple columns, then you might begin to
approach the "tipping point" for using this approach. <<
It seems like a "tipping point" for you, because you seem to believe in the
ever-popular argument for "denormalization". You are concerned that the more
the number of tables, the more number of joins involved in a typical query
against those tables. So you are concerned this may cause a dampening effect
in performance. A DBMS that exclusively provides insulation of physical
model from the logical schema would have alleviated your concern. A
significant advantage of the relational model is that it allows the
relational DBMS implementers/vendors to do whatever they want at the
physical level to maximize the performance, provided they are not exposed to
the users. However, in practice, as mentioned before, many popular DBMSs
expose the performance aspects of the physical model to the logical level.
This state of affairs occasionally requires the database designer to
tradeoff certain integrity benefits at the logical level for performance
improvements (that are realized at the physical level). But, it does not
mean a logical model is "horrible" since you do not achieve certain
performance benefits with a specific physical implementation.
For exactly the same reasons, I made clear in my response that the OP's
representation is logically correct. Your statements about performance,
number of queries/joins, etc. have nothing to do with it at the logical level.
-- Anith|||Wow, many words. Issue strangely misunderstood.
Ok first off, both designs are logically correct and neither is
"denormalized". Also this issue has very little to do with physical vs
logical design. This is an issue purely of logical design. Performance was
brought into this thread earlier with the assertion that "nullable columns
introduce overhead", and a reference to SQL Server's physical storage of
tables. I only mention the terrible performance of this design to counter
the incorrect and (as you point out) irrelevant notion that one design
had "more overhead" than the other.
The basic question here is: should optional attributes be stored in nullable
columns or in separate tables. E.g.:
CREATE TABLE CUSTOMER (ID INT PRIMARY KEY,
    PHONE VARCHAR(50) NOT NULL,
    FAX VARCHAR(25) NULL)
OR
CREATE TABLE CUSTOMER (ID INT PRIMARY KEY,
    PHONE VARCHAR(50) NOT NULL)
CREATE TABLE CUSTOMER_FAX (ID INT PRIMARY KEY REFERENCES CUSTOMER(ID),
    FAX VARCHAR(25) NOT NULL)
Neither one is more or less normalized than the other. PHONE and FAX are both
functionally dependent on ID, and there is no functional dependency between
them.
Here Fax-owning customers can be thought of as a subtype of customers,
making this a trivial case of the base type/sub type problem. The basic
issues involved don't change if we add more structure to the sub type. E.g.:
CREATE TABLE CUSTOMER (ID INT PRIMARY KEY,
    PHONE VARCHAR(50) NOT NULL,
    DEPARTMENT INT NULL,
    SUPERVISOR INT NULL,
    MAIL_STOP VARCHAR(25) NULL)
OR
CREATE TABLE CUSTOMER (ID INT PRIMARY KEY,
    PHONE VARCHAR(50) NOT NULL)
CREATE TABLE CUSTOMER_INTERNAL (ID INT PRIMARY KEY REFERENCES CUSTOMER(ID),
    DEPARTMENT INT NOT NULL,
    SUPERVISOR INT NOT NULL,
    MAIL_STOP VARCHAR(25) NOT NULL)
"Anith Sen" <anith@.bizdatasolutions.com> wrote in message
news:ajR9b.24252$Aq2.22536@.newsread1.news.atl.earthlink.net...
> >> In an ordinary OLTP application, however, this method is horrible. <<
> Perhaps, because "an ordinary OLTP application" (I assume you meant OLTP
> database ) considers the performance gains more significant than a vital
> function called data integrity ?
There is absolutely no loss of data integrity. That's a silly thing to say.
> >> I wouldn't argue that it's not logically correct. It's just terribly
> slow and cumbersome to use. <<
> Maybe yes, maybe no. But then it is the data designer's decision whether to
> sacrifice the obvious integrity benefits over perceived performance gains.
Obvious to whom?
> You have to understand the fundamental difference between logical
> representation of data as tables (relations) and the physical storage
> (clustering, b-tree indexes, hashes, heaps, etc.). With an obvious confusion
> between the logical & physical levels of representations, one may have to
> make subjective conclusions like "terribly slow" & write up arbitrary
> queries to mix up performance aspects (which is an artifact of the physical
> storage) with a well designed logical model.
Again, huh? I think the logical model sucks. It's hard to use, it's
inflexible, it needlessly multiplies relations. It just thoroughly sucks as
a logical data model. The terrible performance is just a bonus.
>
[ snip ]
> >> Second your update logic has to involve both inserts and updates. <<
> As much an assumption it is, why should this be a problem if multiple DMLs
> can guarantee logical correctness ?
The ease of use of a logical model is important. The inability to perform
simple edits to an entity with single update statements is a sure sign that
your logical model sucks. If you can't properly edit your data without
stored procedures and triggers, you've got the wrong data model.
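For instance, with the split design, just setting or clearing a customer's fax
takes something like the following (a sketch; @ID and @FAX are illustrative
parameters), versus a single UPDATE against one table with a nullable column:

IF @FAX IS NULL
    DELETE FROM CUSTOMER_FAX WHERE ID = @ID
ELSE IF EXISTS (SELECT * FROM CUSTOMER_FAX WHERE ID = @ID)
    UPDATE CUSTOMER_FAX SET FAX = @FAX WHERE ID = @ID
ELSE
    INSERT INTO CUSTOMER_FAX (ID, FAX) VALUES (@ID, @FAX)

-- versus, with the nullable column:
UPDATE CUSTOMER SET FAX = @FAX WHERE ID = @ID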
Every attribute functionally dependent on a relation's primary key should be
in the relation. You are free to scatter an entity's attributes among
several relations. Normalization does not prohibit it, but it's still bad
design.
> >> Third it's extremely difficult to enforce check constraints involving
> multiple columns. For instance if you have a requirement that "every member
> with a homepage must also have an email unless they joined before 1997",
> ... <<
> Well, do you realize this rule introduced an additional FD, a dependency of
> the email on the year and homepage entities? Are you going to ignore this
> altogether?
And how would you suggest implementing this rule? A bunch of procedural
logic.
It's a bad logical design because it forces you to use procedural code to
implement rules that should be enforced declaratively.
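In the single-table design that rule is one declarative constraint, something
like the sketch below (JoinDate is an assumed column name, since the OP's
table doesn't show one):

ALTER TABLE Members ADD CONSTRAINT CK_Homepage_Needs_Email CHECK
    (HomePageURL IS NULL          -- no homepage: nothing to check
     OR EmailAddress IS NOT NULL  -- homepage and an email: fine
     OR JoinDate < '19970101')    -- homepage, no email, joined before 1997: allowed

Spread across two or three tables, the same rule ends up in triggers or
procedural code.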
> >> If you have a base type / sub type relationship, with more than one sub
> type, and each sub type has multiple columns, then you might begin to
> approach the "tipping point" for using this approach. <<
> It seems like a "tipping point" for you, because you seem to believe in
the
> ever-popular argument for "denormalization".
No. Again this isnt' denormalization, just good logical design.
> You are concerned that the more
> tables there are, the more joins are involved in a typical query
> against those tables, and that this may cause a dampening effect
> in performance.
> A DBMS that exclusively provides insulation of physical
> model from the logical schema would have alleviated your concern.
Gee, I'll rush out and buy an RDBMS from magical fairy land.
> A significant advantage of the relational model is that it allows the
> relational DBMS implementers/vendors to do whatever they want at the
> physical level to maximize the performance, provided they are not exposed to
> the users. However, in practice, as mentioned before, many popular DBMSs
> expose the performance aspects of the physical model to the logical level.
> This state of affairs occasionally requires the database designer to
> tradeoff certain integrity benefits at the logical level for performance
> improvements (that are realized at the physical level). But, it does not
> mean a logical model is "horrible" since you do not achieve certain
> performance benefits with a specific physical implementation.
> For exactly the same reasons, I made clear in my response that the OP's
> representation is logically correct. Your statements about performance,
> number of queries/joins, etc. have nothing to do with it at the logical level.
>
You are correct. As a logical model it's not "horrible", just "poor".
It's the performance problems that really push it over the edge.
Application design is not only about the logical modeling. If you design an
application and completely ignore performance, you belong unemployed.
A relational database is a practical, useful thing. To properly design
applications you need to have a costing model for relational operations, and
a firm grasp on concurrency mechanisms. Neither of these can be wholly
divorced from RDBMS implementations, but you can abstract them
significantly.
Which sentiment, just moments ago, you agreed with:
"In some cases, you'd split up the table into multiple tables based on the
how certain columns are being accessing by DMLs. At the implementation
level, this technique is termed vertical partitioning and can result in
significant performance gains, esp. in datawarehouses."
David|||>> Ok first off, both designs are logically correct... <<
Based on the limited information one can deduce from the DDL, that can be
subjective. Without a well defined and complete business model (entity
types, attributes & relationships) nothing can be said about the correctness
of a logical design.
>> Performance was brought into this thread earlier with the assertion that
"nullable columns introduce overhead", and a reference to SQL Server's
physical storage of tables. <<
I realize that, and that is why I kept reiterating logical
correctness and deliberately ignored product-specific implementations. You
seem to be fixed on superior performance of using a single table, which you
may or may not see depending on how you implement this schema in a physical
model. And I clearly mentioned that, if it performs better it is simply due
to the lack of physical data independence and does not contribute to the
superiority of a logical model.
I referred to "denormalization" in my previous post, because you introduced
new requirements regarding homepages, emails, year etc. This may have
changed the existing predicates and hence you may have to start over by
identifying any additionally introduced FDs. Ignoring it altogether can lead
to an under-normalized schema.
>> There is absolutely no loss of data integrety. That's a silly thing to
say. <<
So say you!
>> Obvious to whom? <<
To anyone who understands the inherent benefits of a logical model based on
relational theory.
>> Every attribute functionally dependent on a relation's primary key should
be in the relation. You are free to scatter an entity's attributes among
several relations. Normalization does not prohibit it, but it's still bad
design. <<
"scatter"? Don't you see a difference between scattering and principled
decomposition? Are you saying subtype/super type relationships in the
logical model are a bad design?
For some basics, if entities have all their attributes including the key
attribute in common, then these entities are considered to be of one entity
type. Considering your customer example, if all customers in the reality
have name and phone as attributes, they are entities of type "Customer"
which maps logically into one table. If the entities have no attributes in
common, they are of distinct types. Now, what if the entities have common
and distinct attributes? Due to common attributes, all customers are
entities of type "Customer", and by virtue of distinct attribute (fax, in
this case), the customers who also has a fax are "customer_fax" entities a
subtype of "customer" super type. In other words, sub type/super type
relationship, shows them as distinct entity types; that means arguably a
customer with a fax in the real world is a different entity from a customer
without a fax.
At the logical level, the whole point of this endeavor is to avoid
inapplicable values in the database.
I think SQL:99 has a similar sub-table/super-table provision in the CREATE
TABLE DDL, using the UNDER clause, which supports this directly & declaratively.
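Approximately, the typed-type/typed-table form in the standard looks like the
sketch below. This is only an illustration of the SQL:99 UNDER idea; SQL
Server 2000 does not support it, and the exact clauses vary by product:

CREATE TYPE customer_t AS (id INT, phone VARCHAR(50)) NOT FINAL
CREATE TYPE customer_fax_t UNDER customer_t AS (fax VARCHAR(25)) NOT FINAL

CREATE TABLE customer OF customer_t
CREATE TABLE customer_fax OF customer_fax_t UNDER customer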
>> And how would you suggest implementing this rule? A bunch of procedural
logic. It's a bad logcal design because forces you to use procedural code to
implement rules that should be enforced decaratively. <<
You assume much! As mentioned before, if additional FDs are introduced by
this rule, I'd try to eliminate them first. Why do you have to kludge it
with a check constraint on nullable columns with hard-coded values?
>> As a logical model it's not "horrible", just "poor". It's the performance
problems that really push it over the edge. <<
Contradicting yourself? You claimed at the beginning of your post "Also this
issue has very little to do with physical vs logical design. This is an
issue purely of logical design."
Please explain, why should the performance problems at the physical model
determine the superiority of a logical model? By definition, logical models
based on relational theory have nothing to do with performance, by virtue of
physical data independence.
>> Application design is not only about the logical modeling. If you design
an application and completely ignore performance, you belong unemployed. <<
Are you talking about applications or databases?
>> An relational database is a practical, useful thing. <<
Nobody disagrees with that.
>> Which sentiment, just moments ago, you agreed with:
"In some cases, you'd split up the table into multiple tables based on the
how certain columns are being accessing by DMLs. At the implementation
level, this technique is termed vertical partitioning and can result in
significant performance gains, esp. in datawarehouses." <<
Mixed up apples & oranges well! Vertical partitioning is a performance
boosting endeavor which has nothing to do with the logical model. It is done
based on a specific implementation (materialization of data in the disk),
access paths & relies on the physical storage mechanisms. And it is carried
out purely for performance gains. It has got nothing to do with a super
type/sub type decomposition done at the logical level.
I realize many SQL products lack certain levels of physical data
independence. With the distinction between the logical & physical layers of
representation so blurred in most popular SQL products, vertical partitioning
(based on the physical criteria mentioned above) of logical tables may increase
the performance of certain queries. However, this should not be confused with
the logical design process, and as such any realized performance gains should
not be a criterion for determining the superiority of the logical model.
--
- Anith
( Please reply to newsgroups only )|||"Anith Sen" <anith@.bizdatasolutions.com> wrote in message
news:%23JxxvWVfDHA.1200@.TK2MSFTNGP09.phx.gbl...
> >> Ok first off, both designs are logically correct... <<
> Based on the limited information one can deduce from the DDL, that can be
> subjective. Without a well defined and complete business model (entity
> types, attributes & relationships) nothing can be said about the correctness
> of a logical design.
> >> Performance was brought into this thread earlier with the assertion that
> "nullable columns introduce overhead", and a reference to SQL Server's
> physical storage of tables. <<
> I realize that, and that is why I kept reiterating logical
> correctness and deliberately ignored product-specific implementations. You
> seem to be fixed on superior performance of using a single table, which you
> may or may not see depending on how you implement this schema in a physical
> model. And I clearly mentioned that, if it performs better it is simply due
> to the lack of physical data independence and does not contribute to the
> superiority of a logical model.
> I referred to "denormalization" in my previous post, because you introduced
> new requirements regarding homepages, emails, year etc. This may have
> changed the existing predicates and hence you may have to start over by
> identifying any additionally introduced FDs. Ignoring it altogether can lead
> to an under-normalized schema.
> >> There is absolutely no loss of data integrity. That's a silly thing to
> say. <<
> So say you!
So where's the loss of data integrity? Just hand-waving and FUD.
> >> Obvious to whom? <<
> To anyone who understands the inherent benefits of a logical model based on
> relational theory.
More hand-waving.
> >> Every attribute functionally dependent on a relation's primary key should
> be in the relation. You are free to scatter an entity's attributes among
> several relations. Normalization does not prohibit it, but it's still bad
> design. <<
> "scatter"? Don't you see a difference between scattering and principled
> decomposition? Are you saying subtype/super type relationships in the
> logical model are a bad design?
YES!!! That's exactly what I'm saying. Implementing subtype/super type
relationships in standard SQL is very awkward, difficult to use and maintain,
and performs poorly.
This is why some people have implemented "object databases" and others have
added object-oriented extensions to SQL.
So while it makes sense logically, in practice it's best to avoid it.
[ snip ]
> I think SQL:99 has a similar sub-table/super-table provision in the CREATE
> TABLE DDL, using the UNDER clause, which supports this directly & declaratively.
Which would be fine.
> >> And how would you suggest implementing this rule? A bunch of procedural
> logic. It's a bad logical design because it forces you to use procedural
> code to implement rules that should be enforced declaratively. <<
> You assume much! As mentioned before, if additional FDs are introduced by
> this rule, I'd try to eliminate them first. Why do you have to kludge it
> with a check constraint on nullable columns with hard-coded values?
That's a perfectly ordinary rule. In a real OO or inheritance scheme, the
subtype has access to all of the data of the parent type, so enforcing such
a constraint would be no problem. It's just another reason that building OO
inheritance on top of standard SQL doesn't work well.
>
[ snip ]
> Please explain, why should the performance problems at the physical model
> determine the superiority of a logical model?
How would you choose among logical models? Flip a coin?
There's nothing wrong with using nullable columns, plus it's easy and fast.
Using subtype tables is ok, except it's hard and slow.
It's a no-brainer.
David|||>> So where's the loss of data integrity? Just hand-waving and FUD. <<
Data Integrity simply means completeness and accuracy of data, which is the
vital consideration for a database designed based on relational theory.
Since relational representation of data at the logical level involves
non-loss decompositions, it is done based on principled guidelines so that
data corruption can be prevented. One such guideline is the
principle of normalization. Its primary focus is the removal of update
anomalies by reducing redundancy. OLTP databases, being highly transactional,
tend to have under-normalized schemas. (I don't want to repeat the reasons
regarding physical data independence in every post.)
Ron Fagin has proved that fifth normal form is both necessary and sufficient
for addressing update anomalies using non-loss decomposition by projection.
(You can search CiteSeer for details on his papers.) Depending on the
business model, an under-normalized schema is prone to such update anomalies
and this causes loss of data integrity. Is that clear enough for you?
>> To anyone who understands the inherent benefits of a logical model based
on relational theory. <<
>> More hand-waving. <<
I thought you'd get the clue.
>> How would you choose among logical models? Flip a coin? <<
A logical model is abstract and it speaks about how one represents data.
Superiority of a model is decided by how effectively it adheres & conforms
to the features it purports to support. Logical models based on relational
theory must support the relational principles. For example, considering the
principle of physical data independence, a logical model that exposes and
relies directly on the physical implementation aspects, violates the
fundamental relational rule of Physical Data Independence and thus is
inferior to another model which provides such independence. Similarly
considering the principle of the atomicity of values, a logical model
supporting MV is inferior to another supporting atomic values. So is the
case with key based identification, view support etc. The point is, by
definition, no valid criteria for determining the superiority of a logical
model from another depends on perceived performance gains of a specific
physical implementation.
It is hard to continue this thread unless you can show you understand what a
logical model is and realize why it has to be different from a physical
model. Without that, any further exchange is meaningless.
--
- Anith
( Please reply to newsgroups only )|||>> Yeah, but now apply it to the situation at hand. <<
You seem to suggest somehow I argued for normalization as a sole reason for
sub-type/super type relationship of entities.
Let me retrace: Initially I pointed out the arguments for so-called
"denormalization" since under-normalized schemas are a general feature of
OLTP databases & the reason for this being the preference for performance
gains over integrity. I also mentioned the lack of Physical data
independence in DBMS may force the data designer to corrupt a logical model
for performance. And I pointed out normalization as a necessary process to
avoid the update anomalies which are generally ignored in such databases.
Also in the discourse, I debunked your arguments about physical performance
being a valid criterion to assess a good logical model. Which part is not
clear?
The primary logical benefit of super-type/sub-type relationships is the
elimination of inapplicable attributes in the database. I already mentioned
this in a previous response.
>> You seem to be under the impression that every non-loss decomposition
somehow eliminates update anomalies, increases normalization or somehow
improves a logical model. <<
Not *every* non-loss decomposition, but principled ones do that. That is not
my impression, it is an empirical fact proven by Ron Fagin. And I suggested
you check CiteSeer, in case it's needed.
>> It's possible to decompose tables to the point where there is only one
non-key column in each table, but that doesn't make it a good idea. <<
If elimination of functional dependencies or inapplicable attributes demands
such a non-loss decomposition in a meaningful way, it is a good idea. The
number of non-key columns in each table is not an objective criterion; it
depends on the segment of interest that is being modeled. An accurate logical
representation of the business model (segment of interest in the real world)
without causing data corruption is the goal.
>> My main point is that decomposing a table with nullable columns into a
base type / sub type design is bad because it needlessly increases the
number of tables and it increases the complexity of the DML needed to work
with the model. <<
Not necessarily; an entity subtype/super type relationship is represented in
a database by a 0/1:1 (zero-or-one-to-one) referential constraint, a special
case of the general (0/1:M) referential constraint. By your assessment
above, any zero-or-one-to-many relationship would require complex DML.
Furthermore, an increase in the number of tables is not a criterion to assess
the superiority of a logical model.
>>Your argument seems to be that good relational theory requires the
decomposition of tables with nullable columns into a base type / sub type
model. However you have offered no suggestion as to why this might be. Just
hand waving ...<<
No, elimination of inapplicable values in the database sometimes requires a
super-type/sub-type relationship among entities. Relational theory prohibits
inapplicable/non-existent/unknown values and warrants elimination of such
values. So a model with a super-type/sub-type relationship suits the
problem. Which part of that do you feel is hand waving?
For any additional references on this topic I would direct you to:
"A New Database Design Principle" - Relational Database Writings 1991-1994, Date & McGoveran
Chapter 6 - Practical Issues in Database Management, Fabian Pascal
Appendix E - The Third Manifesto, Date & Darwen (I am not fully familiar with the RM strong
suggestion on this issue, but it clears up many confusions)
-- Anith|||I was really having trouble figuring out what wrongheaded notion was behind
the requirement to decompose nulls out of a design.
What really had me puzzled was your insistence that not only were nullable
columns wrong, they were "obviously" wrong. Very strange.
Here is the crux of the matter:
> Relational theory prohibits
> inapplicable/non-existent/unknown values and warrants elimination of such
> values. So a model with a super-type/sub-type relationship suits the
> problem.
At least it's not hand-waving any more. It's just wrong. But at least it's
a matter of genuine and substantial controversy and I understand what you're
arguing.
There may be theoretical support for the proposition that null columns
should not be allowed in a relation. And I agree that NULLs don't really
fit into relational theory. Fundamentally they are an implementation kluge.
But use of nullable columns is so ingrained in SQL and the current RDBMS
products, and the elimination of nulls requires such complication of logical
design, that I think they are a necessary evil.
You may disagree with my position here, but it's not "obviously" wrong, and
plenty of people would agree with me on this.
Anyway I'm happy to let the matter lie, as I think we've reduced the issue
down to a single fundamental and controversial proposition.
David|||>> What really had me puzzled was your insistence that not only were nullable
columns wrong, they were "obviously" wrong. <<
"obviously" wrong? Please don't put words in my mouth. I said it is the data
designer's decision whether to sacrifice the obvious integrity benefits over
perceived performance gains. Also I said there is obvious confusion between
the logical & physical layers of representation and that SQL products
forfeit Physical Data Independence for obvious reasons. Let us not
misinterpret.
>> At least it's not hand-waving any more. It's just wrong. <<
Let us be honest here. You made bald assertions about a schema with
super-type/sub-type relationship being "terrible", "horribly slow", "poor"
and "sucks" without giving anything to back it up. Then without any details
about the business model or underlying FDs, you wrote up a two-line DDL &
asked me to find if there is any update anomaly. Then you acted surprised that a
logical model can be superior to another without performance criteria.
Moreover you used borrowed UseNet terms like hand waving, FUD, etc. when
confronted with the basics of integrity and data independence. Other than
the controversy surrounding Nulls is there anything to back up your above
statement "It's just wrong"? I do understand the controversy behind the
usage of Nulls, but it does not make the elimination of Nulls "just wrong".
The point is, before lambasting a design with overused adjectives, let us
realize that some weigh certain criteria for data accuracy over others. And I
mentioned it in my posts as the data designer's prerogative.
>> Anyway I'm happy to let the matter lie, as I think we've reduced the
issue down to a single fundamental and controversial proposition. <<
Me too.
--
- Anith
( Please reply to newsgroups only )