Coming from a 200K+ IT consultants organisation, and at least 25 years of solving "performance issues" after "stuff went into production", I'd say it depends, but usually the projects lacked a Data(base) Architect during the architectural design and the early design/implementation phase.
I usually say it's a database, not an objectbase. When we model data for a solution we do it in multiple layers. Usually it starts with business objects like Customers, Suppliers, Products, and Orders, and then it comes down to classes that need "storage" through an Entity Model (yes, any ORM would do, but as long as it's EF without "Core" I'm happy). This logical data model needs physical support, thus a database. Basically every logical data model object needs a storage object, at the lowest level a table, which gives us a one-to-one relation between entity and table.
And this is how it usually ends. I'm not saying this is right, I'm only saying this is where it usually ends up.
Back to my saying this is a database, not an objectbase. Databases have data storage objects (tables) and data access objects (procedures, views; I'm leaving out functions here). The application wants something to feed the entities from the tables, and this is where we use views as proxies between the logical and physical model, as we probably want to remodel the data to support the R in CRUD (views for -R--). (This is what most people would call normalizing the data; I use the terms normalize, generalize, and denormalize and care less about which xNF we end up in. My goal is the most optimal physical data model for performance, data integrity, and size.) Then add procedures for the C-UD in CRUD. The basic rule is one table and three procedures per entity: call the view by the EntityName, and the procedures ins-, upd-, and delEntityName, and then add them as methods on the entities (EF supports this, EF Core doesn't!). And then add INSTEAD OF triggers on the views that launch the procedures, for developers not using the entity model.
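As a minimal sketch of that pattern, assuming a made-up Customer entity with nothing but a name (table and object names are mine, not a standard):

-- Storage object: the table behind the entity.
CREATE TABLE dbo.CustomerData
(
    CustomerID int IDENTITY(1,1) NOT NULL CONSTRAINT PK_CustomerData PRIMARY KEY,
    Name       nvarchar(100)     NOT NULL
);
GO

-- Access object for the R in CRUD: a view named after the entity.
CREATE VIEW dbo.Customer
AS
SELECT CustomerID, Name
FROM dbo.CustomerData;
GO

-- Access objects for C, U and D.
CREATE PROCEDURE dbo.insCustomer @Name nvarchar(100)
AS
    INSERT INTO dbo.CustomerData (Name) VALUES (@Name);
GO

CREATE PROCEDURE dbo.updCustomer @CustomerID int, @Name nvarchar(100)
AS
    UPDATE dbo.CustomerData SET Name = @Name WHERE CustomerID = @CustomerID;
GO

CREATE PROCEDURE dbo.delCustomer @CustomerID int
AS
    DELETE FROM dbo.CustomerData WHERE CustomerID = @CustomerID;
GO

-- INSTEAD OF trigger on the view for code that bypasses the entity model;
-- shown set-based here, and the same pattern applies for UPDATE and DELETE.
CREATE TRIGGER dbo.trgCustomerInsert ON dbo.Customer
INSTEAD OF INSERT
AS
    INSERT INTO dbo.CustomerData (Name)
    SELECT Name FROM inserted;
GO

In EF (without Core) the three procedures can then be mapped as the insert/update/delete functions of the Customer entity.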
So what is the problem with this approach? Developers cannot change entity models without understanding what would happen to the database objects. So if, after a release, they want to add attributes to an entity, they need to change a table to add a column, change the view to handle the column, change three procedures, and then change the triggers. This would suck big time... To mitigate that problem the developer can create a supporting class for the minor release, think patch. In practical terms: create a new class with the keys and the additional attributes, get a table for that, and run CRUD against that table. Then, when the product reaches a major release, merge them together. But with this in mind, how often does a data model actually change after the release? I'd say not often, small changes, really seldom...
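In practice that patch can be as small as one extra table; a sketch, assuming the made-up Customer from above gets a new LoyaltyLevel attribute in a minor release:

-- Hypothetical patch table: same key as the entity, only the new attributes.
-- CRUD for the new attributes runs against this table until the next major
-- release, where it is merged into the main Customer objects.
CREATE TABLE dbo.CustomerPatch
(
    CustomerID   int     NOT NULL
        CONSTRAINT PK_CustomerPatch PRIMARY KEY
        CONSTRAINT FK_CustomerPatch_CustomerData REFERENCES dbo.CustomerData (CustomerID),
    LoyaltyLevel tinyint NOT NULL
);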
And finally, what are the benefits? The first thing that comes to mind is performance (I get "premature optimization" thrown in my face every time I bring this up), and we get data integrity, both in terms of data security and data quality. With security we can make sure that only the right user can reach the right data, all the way down to row/column level, actually all the way down to every single data point we might have. But we also get higher quality in the data, as we have a chance of applying a "higher level of normalization": we can normalize, generalize, and denormalize data with the focus on data, not business objects.
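As a sketch of what that security can look like in SQL Server (role, table, column and predicate names are made up; row-level security needs SQL Server 2016 or later):

-- Column-level security: a role may read names but not addresses.
GRANT SELECT (ContactID, FirstName, LastName) ON dbo.Contacts TO MarketingRole;
GO

-- Row-level security: only rows matching a value set in the session context
-- are visible to the querying session.
CREATE FUNCTION dbo.fnContactsByCountry (@Country nvarchar(60))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS Allowed
    WHERE @Country = CAST(SESSION_CONTEXT(N'Country') AS nvarchar(60));
GO

CREATE SECURITY POLICY dbo.ContactsCountryPolicy
ADD FILTER PREDICATE dbo.fnContactsByCountry(Country) ON dbo.Contacts
WITH (STATE = ON);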
As an example, take a Contacts table containing whatever is needed for sending postcards, covering everyone in the world. We would need attributes like these:
ContactID, FirstName, MiddleName, LastName, Address, Zipcode, City, County,
State, Country, Continent. See this as a conceptual model, it doesn't need to
be perfect...
Let us estimate a logical row at something between 20 and 500 bytes depending on the strings. Say that the average string is 9 characters and we go by Unicode, giving us an average of 18 bytes per string, and that we have 10 strings and one key that needs to support at least 8.000.000.000 individuals and 2.000.000.000 companies, that is 10.000.000.000 rows in the table. SQL Server has a storage level called a data page that limits how many rows can be fetched through one IO operation. A data page is 8 kB and can store a single physical row of up to 8060 B.
To calculate the rows per page we simply divide bytes per page (8060) by bytes per row, giving us rows per page. To get the size we divide rows per table by rows per page, giving us pages per table. And as every page is 8 kB we just multiply by 8 and get kB per table.
Legend: B = Bytes, kB = Kilobytes, r = Rows, p = Pages, t = Table, Br = Bytes per Row, Bp = Bytes per Page, kBp = Kilobytes per Page, rp = Rows per Page, pt = Pages per Table, rt = Rows per Table, Bt = Bytes per Table, kBt = Kilobytes per Table
Formulas:
Bp / Br = rp (round down)
rt / rp = pt (round up)
pt * kBp = kBt
10.000.000.000 rows per table and 20 bytes per row: 403 rows per page, 24.813.896 pages per table, 193.859 MB in the table
10.000.000.000 rows per table and 100 bytes per row: 80 rows per page, 125.000.000 pages per table, 976.563 MB in the table
10.000.000.000 rows per table and 500 bytes per row: 16 rows per page, 625.000.000 pages per table, 4.882.813 MB in the table
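For anyone who wants to verify those numbers, the same formulas as a quick query (the 8060 B row limit and the 8 kB page are SQL Server's; the rest is just the arithmetic above):

-- rp = floor(Bp / Br), pt = ceiling(rt / rp), size = pt * 8 kB.
-- Integer division handles the round-down, CEILING the round-up.
SELECT  BytesPerRow,
        RowsPerPage,
        PagesPerTable,
        PagesPerTable * 8 / 1024.0 AS MBPerTable
FROM (VALUES (20), (100), (500)) AS br(BytesPerRow)
CROSS APPLY (SELECT 8060 / BytesPerRow AS RowsPerPage) AS rp
CROSS APPLY (SELECT CEILING(10000000000.0 / RowsPerPage) AS PagesPerTable) AS pt;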
Breaking this table up into Contacts, Streets, Cities, and Regions:
Regions(RegionID, State, Country, Region)
Cities(CityID, City, County, RegionID)
Streets(StreetID, Street, Zipcode, CityID)
Contacts(ContactID, Names, StreetID, StreetNumber)
probably brings this down to around 60% of the size.
This is normalization!
There are a lot of strings, so create a string table and assign StringIDs to the strings in the entity tables. Let us do some generalization! I add some rough estimates:
Strings(StringID, String), 5.000.000 unique strings, avg row 20 B, total of 97 MB
Regions(RegionID, StateStringID, CountryStringID, RegionStringID), 10.000 states, always 20 B per row, total of 200 kB
Cities(CityID, CityStringID, CountyStringID, RegionID), 250.000 cities, always 20 B per row, total of 5 MB
Streets(StreetID, StreetStringID, Zipcode, CityID), 25.000.000 streets, always 25 B per row, total of 607 MB
Contacts(ContactID, FirstNameStringID, MiddleNameStringID, LastNameStringID, StreetID, StreetNumber), 10.000.000.000 contacts, always 38 B per row, total of 368.514 MB
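A sketch of what that generalization could look like as tables (types and sizes are my guesses, not a finished design):

-- One shared string store: every repeated text value lives here exactly once.
CREATE TABLE dbo.Strings
(
    StringID int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Strings PRIMARY KEY,
    String   nvarchar(200)     NOT NULL CONSTRAINT UQ_Strings UNIQUE
);
GO

-- The entity tables carry only keys and StringIDs, which is what keeps the
-- rows at a fixed, small size.
CREATE TABLE dbo.Contacts
(
    ContactID          bigint   NOT NULL CONSTRAINT PK_Contacts PRIMARY KEY,
    FirstNameStringID  int      NOT NULL CONSTRAINT FK_Contacts_FirstName  REFERENCES dbo.Strings (StringID),
    MiddleNameStringID int          NULL CONSTRAINT FK_Contacts_MiddleName REFERENCES dbo.Strings (StringID),
    LastNameStringID   int      NOT NULL CONSTRAINT FK_Contacts_LastName   REFERENCES dbo.Strings (StringID),
    StreetID           int      NOT NULL,
    StreetNumber       smallint     NULL
);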
Adding this together gives us a model of 368 GB. Comparing the 976 GB to 368 GB, the new model is less than 38% of the original model.
And then we have a really meshy model, so let us denormalize some…
Strings(StringID, String), this is the same as before, 97 MB.
Addresses(StreetID, StreetStringID, Zipcode, CityStringID, CountyStringID, StateStringID, CountryStringID, RegionStringID), always 42 B per row, total of 1023 MB.
Contacts(ContactID, FirstNameStringID, MiddleNameStringID, LastNameStringID, StreetID, StreetNumber), same as before, 368.514 MB.
This final model is probably somewhere in the 35-40% range of the original model, giving us approximately three times as many business objects in the same amount of RAM in cache on the SQL Server. And we will have a slower, more linear growth in the database when we add content, instead of an exponential growth.
And then we have the views and procedures making the model work from the business layer.
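For instance, a Contact view in the spirit of the earlier pattern could put the denormalized model back into the flat shape the entity expects (again a sketch with my own names):

-- The business layer reads a flat Contact; the StringIDs and the Addresses
-- table never leak out of the database.
CREATE VIEW dbo.Contact
AS
SELECT  c.ContactID,
        fn.String AS FirstName,
        mn.String AS MiddleName,
        ln.String AS LastName,
        c.StreetNumber,
        st.String AS Street,
        a.Zipcode,
        ci.String AS City,
        co.String AS Country
FROM dbo.Contacts  AS c
JOIN dbo.Strings   AS fn ON fn.StringID = c.FirstNameStringID
LEFT JOIN dbo.Strings AS mn ON mn.StringID = c.MiddleNameStringID
JOIN dbo.Strings   AS ln ON ln.StringID = c.LastNameStringID
JOIN dbo.Addresses AS a  ON a.StreetID  = c.StreetID
JOIN dbo.Strings   AS st ON st.StringID = a.StreetStringID
JOIN dbo.Strings   AS ci ON ci.StringID = a.CityStringID
JOIN dbo.Strings   AS co ON co.StringID = a.CountryStringID;

The insContact, updContact, and delContact procedures and the INSTEAD OF triggers then follow the same pattern as the Customer sketch above.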
Long reply, adding to your story. Hoping to get developers to understand the importance of the difference between an entity and/or class in the business layer and a table in the database. There is no relation at all between them, as the business layer works with business objects, and in the database we work with data.