7/7/17

Data Modelling, why is it important to understand logical and physical...

This is just a reply I had to an article I read about logical and physical modelling...
I'm pasting it pretty much as I answered.

Coming from a 180K+ IT consultants organisation, and at least 25 years of solving the "performance issues" after "stuff went into production", I'd say it depends, but usually the projects lacked a data(base) architect during the architectural design and the early design/implementation phases.

I usually say it's a database, not an objectbase. When we model data for a solution we do it in multiple layers. Usually it starts with business objects like Customers, Suppliers, Products, Orders, and then it comes down to classes that need "storage" through an entity model (yes, any ORM would do, but as long as EF is without "Core" I'm happy). This logical data model needs physical support, thus a database. Basically every logical data model object needs a storage object, at the lowest level a table, which gives us a one-to-one relation between entity and table.
And this is where it usually ends. I'm not saying this is right, I'm only saying this is where it usually ends up.

Back to my saying that this is a database, not an objectbase. Databases have data storage objects (tables) and data access objects (procedures, views; I'm leaving out the functions here). The application wants something to feed the entities from the tables, and this is where we use views as proxies between the logical and physical model, as we probably want to remodel the data to support the R in CRUD (views for -R--). (This is what most people would call normalizing data; I use the terms normalize, generalize, and denormalize, and care less about which xNF we end up in. My goal is the most optimal physical data model for performance, data integrity, and size.) Then we add procedures for the C-UD in CRUD. The basic rule is one table and three procedures per entity: name the view after the entity, and the procedures ins-, upd-, and delEntityName, and then add them as methods on the entities (EF supports this, EF Core doesn't!). Finally, add INSTEAD OF triggers on the views that launch the procedures, for the developer not using the entity model.
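
To make the pattern concrete, here is a minimal sketch for a hypothetical Customer entity (all object names are made up for illustration, and only the insert trigger is shown): one table for storage, a view named after the entity for the R, three procedures for C, U, and D, and an INSTEAD OF trigger on the view so that code bypassing the entity model still goes through the procedures.

CREATE TABLE dbo.Customer_Table
 (
  CustomerID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
  Name nvarchar(100) NOT NULL
 );
GO
CREATE VIEW dbo.Customer
AS
SELECT CustomerID, Name FROM dbo.Customer_Table;
GO
CREATE PROCEDURE dbo.insCustomer @Name nvarchar(100)
AS
INSERT INTO dbo.Customer_Table(Name) VALUES (@Name);
GO
CREATE PROCEDURE dbo.updCustomer @CustomerID int, @Name nvarchar(100)
AS
UPDATE dbo.Customer_Table SET Name = @Name WHERE CustomerID = @CustomerID;
GO
CREATE PROCEDURE dbo.delCustomer @CustomerID int
AS
DELETE FROM dbo.Customer_Table WHERE CustomerID = @CustomerID;
GO
CREATE TRIGGER dbo.Customer_InsteadOfInsert ON dbo.Customer
INSTEAD OF INSERT
AS
BEGIN
 -- Route inserts against the view through the insert procedure.
 -- Simplified to single-row inserts; a real trigger would handle multi-row inserts.
 DECLARE @Name nvarchar(100);
 SELECT @Name = Name FROM inserted;
 EXEC dbo.insCustomer @Name = @Name;
END;
GO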

So what is the problem with this approach? Developers cannot change entity models without understanding what happens to the database objects. So, if they want to add attributes to an entity after a release, they need to change a table to add a column, change the view to handle the column, change three procedures, and then change the triggers. This would suck big time... To mitigate that problem the developer can create a supporting class for the minor release, think patch. In practical terms: create a new class with the keys and the additional attributes, get a table for that, and run CRUD against that table. And then, when the product reaches a major release, merge them together. But with this in mind, how often does a data model actually change after the release? I'd say not often, small changes, really seldom...
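
Continuing the hypothetical Customer example, such a "patch" table for a minor release can be as simple as the key plus the new attributes:

CREATE TABLE dbo.Customer_Patch
 (
  CustomerID int NOT NULL PRIMARY KEY
   REFERENCES dbo.Customer_Table(CustomerID),
  LoyaltyLevel tinyint NOT NULL  -- the attribute added in the minor release
 );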

And finally, what are the benefits? The first one that comes to mind is performance (I get "premature optimization" thrown in my face every time I bring this up.), and we get data integrity, both in terms of data security and data quality. With security we can make sure that only the right user can reach the right data, all the way down to row/column level, actually all the way down to every single data point we might have. But we also get higher quality in the data, as we have a chance of applying a "higher level of normalization": we can normalize, generalize, and denormalize data with the focus on data, not business objects.
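
Since all reads go through the view and all writes go through the procedures, the security part becomes plain grants. A small sketch, with a made-up role name:

GRANT SELECT (CustomerID, Name) ON dbo.Customer TO SalesRole;
GRANT EXECUTE ON dbo.insCustomer TO SalesRole;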

As an example, take a Contacts table containing whatever is needed for sending postcards, covering everyone in the world. We would need attributes like these: ContactID, FirstName, MiddleName, LastName, Address, Zipcode, City, County, State, Country, Continent. See this as a conceptual model; it doesn't need to be perfect...

Let us estimate a logical row at something between 20 and 500 bytes depending on the strings. Say that the average string is 9 characters and we go by Unicode, giving us an average of 18 bytes per string, and that we have 10 strings and one key which needs to support at least 8.000.000.000 individuals and 2.000.000.000 companies, that is 10.000.000.000 rows in the table. SQL Server has a storage unit called the data page that limits how many rows can be fetched through one IO operation. A data page is 8 kB, and can store a single physical row of up to 8060 B.
To calculate the rows per page we simply divide the bytes per page (8060) by the bytes per row, giving us rows per page. To get the size we divide the rows per table by the rows per page, giving us pages per table. And as every page is 8 kB we just multiply by 8 and get kB per table.
Legend:
B = Bytes, kB = Kilobytes, r = Rows, p = Pages, t = Table, Br = Bytes per Row, Bp = Bytes per Page, kBp = Kilobytes per Page, rp = Rows per Page, pt = Pages per Table, rt = Rows per Table, Bt = Bytes per Table, kBt = Kilobytes per Table
Formulas:
Bp / Br = rp (Round down)
rt / rp = pt (Round up)
pt * kBp = kBt
10.000.000.000 rows per table and 20 Bytes per row:
403 Rows per page, 24.813.896 pages per table, 193.859 MB in the table
10.000.000.000 rows per table and 100 Bytes per row:
80 Rows per page, 125.000.000 pages per table, 976.563 MB in the table
10.000.000.000 rows per table and 500 Bytes per row:
16 Rows per page, 625.000.000 pages per table, 4.882.813 MB in the table
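
The same arithmetic as a quick T-SQL sketch, purely illustrative (variable names are mine; plug in any of the row sizes above):

DECLARE @bytes_per_row int = 100,
        @rows_per_table bigint = 10000000000;
DECLARE @rows_per_page int = 8060 / @bytes_per_row;                                 -- Bp / Br = rp, rounded down
DECLARE @pages_per_table bigint = CEILING(1.0 * @rows_per_table / @rows_per_page);  -- rt / rp = pt, rounded up
SELECT @rows_per_page AS rows_per_page,
       @pages_per_table AS pages_per_table,
       CEILING(@pages_per_table * 8.0 / 1024) AS mb_per_table;                      -- pt * 8 kB, shown in MB
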
Breaking this table up into Contacts, Streets, Cities, and Regions:
Regions(RegionID, State, Country, Region)
Cities(CityID, City, County, RegionID)
Streets(StreetID, Street, Zipcode, CityID)
Contacts(ContactID, Names, StreetID, StreetNumber)
probably brings the size down to around 60 % of the original.
This is normalization!
There are a lot of strings, so create a string table and assign StringIDs to the strings in the entity tables. Let us do some generalization! I'll add some rough estimates:
Strings(StringID, String), 5.000.000 unique strings, avg row 20 B, total of 97 MB
Regions(RegionID, StateStringID, CountryStringID, RegionStringID), 10.000 states, always 20 B per row, total of 200 kB
Cities(CityID, CityStringID, CountyStringID, RegionID), 250.000 cities, always 20 B per row, total of 5 MB
Streets(StreetID, StreetStringID, Zipcode, CityID), 25.000.000 streets, always 25 B per row, total of 607 MB
Contacts(ContactID, FirstNameStringID, MiddleNameStringID, LastNameStringID, StreetID, StreetNumber), 10.000.000.000 contacts, always 38 B per row, total of 368.514 MB
Adding this together gives us a model of about 368 GB. Comparing 976 GB to 368 GB, the new model is less than 38 % of the original model.
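
As a sketch, reading a full postcard address back out of this generalized model is just a matter of joining the string table in several times (aliases and the column choice are illustrative):

SELECT fn.String AS FirstName, ln.String AS LastName,
       st.String AS Street, c.StreetNumber, s.Zipcode,
       ci.String AS City, co.String AS Country
FROM Contacts c
JOIN Strings fn ON fn.StringID = c.FirstNameStringID
JOIN Strings ln ON ln.StringID = c.LastNameStringID
JOIN Streets s  ON s.StreetID = c.StreetID
JOIN Strings st ON st.StringID = s.StreetStringID
JOIN Cities ct  ON ct.CityID = s.CityID
JOIN Strings ci ON ci.StringID = ct.CityStringID
JOIN Regions r  ON r.RegionID = ct.RegionID
JOIN Strings co ON co.StringID = r.CountryStringID;
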
And now we have a really messy model, so let us denormalize some…
Strings(StringID, String), this is the same as before, 97 MB.
Addresses(StreetID, StreetStringID, Zipcode, CityStringID, CountyStringID, StateStringID, CountryStringID, RegionStringID), always 42 B per row, total of 1023 MB.
Contacts(ContactID, FirstNameStringID, MiddleNameStringID, LastNameStringID, StreetID, StreetNumber), same as before, 368.514 MB
This final model is probably somewhere in the 35-40 % range of the original model, giving us approximately three times as many business objects in the same amount of RAM in the cache on the SQL Server. And we will have a slower, linear growth in the database when we add content, instead of an exponential one.
And then we have the views and procedures making the model work from the business layer.
Long reply, adding to your story. I'm hoping to get developers to understand the importance of the difference between an entity and/or class in the business layer, and a table in the database. There is no relation at all between them, as the business layer works with business objects, and in the database we work with data.

2/17/17

Really, is the backup running?

Yes, and now that you have launched all those backup statements, you wonder when they will be done...
This simple query might help you a bit.

SELECT
 r.session_id,
 r.request_id,
 r.command,
 d.name AS database_name,
 r.start_time,
 DATEDIFF(SECOND, r.start_time, GETDATE()) AS elapsed_time_in_seconds,
 r.percent_complete AS estimated_percent_complete,
 -- NULLIF guards against a divide by zero right at the start of the backup
 DATEADD(SECOND, ROUND(DATEDIFF(SECOND, r.start_time, GETDATE()) / NULLIF(r.percent_complete, 0) * 100.0, 0), r.start_time) AS estimated_finish_time
FROM sys.dm_exec_requests r
INNER JOIN sys.databases d ON r.database_id = d.database_id
WHERE r.command LIKE '%BACKUP%' OR r.command LIKE '%RESTORE%';

2/11/17

Do I have that backup somewhere...

Yesterday evening I cooked up a simple backup recipe; the reason was just a simple weekly full/differential backup of all non-system databases on one instance. It's far from perfect, but it works.

It consists of a database that stores information about the last backups in a table, and a procedure that does the backup. The script below also adds all databases with a database_id higher than 4, i.e. excludes master, model, msdb, and tempdb. If you want specific databases only, just change the content of the Managed_Databases table.

USE master;
GO
CREATE DATABASE Simple_Maintenance;
GO
USE Simple_Maintenance;
GO
CREATE TABLE Managed_Databases
 (
  [db_id] int NOT NULL,
  [db_name] sysname NOT NULL,
  Last_Full_Backup datetime2(0) NULL,
  Last_Full_Backup_Number int NULL,
  Last_Differential_Backup datetime2(0) NULL,
  Last_Differential_Backup_Number int NULL
 );
GO
INSERT INTO Managed_Databases([db_id], [db_name])
SELECT database_id, name FROM sys.databases WHERE database_id > 4;
GO
And then we have the procedure; it's quite a few lines of code, but hopefully you can read through it.

USE Simple_Maintenance
GO
CREATE OR ALTER PROCEDURE Make_Backup
 @backup_destination varchar(max) = 'F:\Databases\MSSQL13.MSSQLSERVER\MSSQL\Backup',
 @full_backup_day int    = 7,
 @debug int       = 1
/*
 Day for full backup
 1 Monday
 2 Tuesday
 3 Wednesday
 4 Thursday
 5 Friday
 6 Saturday
 7 Sunday
*/
AS
BEGIN
SET DATEFIRST 1;
DECLARE
 @database varchar(max),
 @name_template varchar(max)  = '@database_@type_@number',
 @type varchar(max),
 @number varchar(max);
DECLARE
 @backup varchar(max) = '
BACKUP DATABASE [@database]
TO DISK = ''@backup_destination\@name.BAK''
WITH
 STATS = 10,
 INIT, -- I FORGOT THIS IN THE ORIGINAL RECIPE
 COMPRESSION
',
 @exec varchar(max),
 @db_id int,
 @day int = DATEPART(weekday, getdate()),
 @name varchar(max);
DECLARE dbs CURSOR FOR
SELECT [DB_ID], [DB_NAME] FROM Managed_Databases;
OPEN dbs;
FETCH NEXT FROM dbs INTO @db_id, @database;
WHILE @@FETCH_STATUS = 0
BEGIN
IF @day = @full_backup_day
BEGIN
 SET @type = 'FULL';
 SELECT @number = CAST((ISNULL(Last_Full_Backup_Number, 2) % 2) + 1 as varchar(max)) FROM Managed_Databases WHERE [DB_ID] = @db_id;
 SET @name = REPLACE(
    REPLACE(
    REPLACE(
     @name_template,  '@database',@database),
        '@type', @type),
        '@number', @number);
 SET @exec = REPLACE(
    REPLACE(
    REPLACE(
     @backup, '@database',@database),
        '@backup_destination', @backup_destination),
        '@name', @name);
END
ELSE
BEGIN
 SET  @type = 'DIFFERENTIAL';
 SELECT @number = CAST((ISNULL(Last_Differential_Backup_Number, 2) % 2) + 1 as varchar(max)) FROM Managed_Databases WHERE [DB_ID] = @db_id;
 SET @name = REPLACE(
    REPLACE(
    REPLACE(
     @name_template,  '@database',@database),
        '@type', @type),
        '@number', @number);
 SET @exec = REPLACE(
    REPLACE(
    REPLACE(
     @backup, '@database',@database),
        '@backup_destination', @backup_destination),
        '@name', @name) +
    ', ' + @type;
END;
IF @debug = 1
BEGIN
PRINT @exec;
END
ELSE
BEGIN
IF @day = @full_backup_day
BEGIN
EXEC (@exec);
UPDATE Managed_Databases
 SET
  Last_Full_Backup = getdate(),
  Last_Full_Backup_Number = CAST(@number as int)
 WHERE [DB_ID] = @db_id;
END
ELSE
BEGIN
IF EXISTS(SELECT 1 FROM Managed_Databases WHERE [DB_ID] = @db_id AND Last_Full_Backup IS NOT NULL)
BEGIN
EXEC (@exec);
UPDATE Managed_Databases
 SET
  Last_Differential_Backup = getdate(),
  Last_Differential_Backup_Number = CAST(@number as int)
 WHERE [DB_ID] = @db_id;
END;
END;
END;
FETCH NEXT FROM dbs INTO @db_id, @database;
END;
CLOSE dbs;
DEALLOCATE dbs;
END;
And finally you just need to run the procedure. If you execute it with default parameters it will run in debug mode and only print out the syntax for doing the backups. The printout does not care whether you have a full backup or not on a differential backup day; it just gives you the syntax.
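
For example (the path and day below are simply the procedure's defaults):

-- Debug run, prints the BACKUP statements without executing them
EXEC Simple_Maintenance.dbo.Make_Backup;

-- Real run, full backup on Sundays, differential the other days
EXEC Simple_Maintenance.dbo.Make_Backup
 @backup_destination = 'F:\Databases\MSSQL13.MSSQLSERVER\MSSQL\Backup',
 @full_backup_day = 7,
 @debug = 0;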

The procedure does not have any error checking either; you should probably add TRY...CATCH logic, and also checks that the database exists and is in a state where it can be backed up. As I said, I cooked this up to fix a simple need.

5/9/16

Some ideas are great, and some are not, and a simple trick...

First of all I'd like to say thanks to the minority. Last week I was in Copenhagen and delivered the worst session ever. A minority liked it, and the majority didn't. So, to the majority I would like to say: I'm sorry. My idea for the session was to share some of the soft side of a large project, with some technology, and try to add an "I recognise this" feeling to it. The idea might have been great in my mind, and I do believe that some sessions need to be like that, but in that context my idea was not so great. So once again: deeply, truly, from the bottom of my heart, I'm sorry.

Lesson learned!

To cite the King of Sweden, Carl XVI Gustav, about a not so great decision, "let us turn the page, to a new chapter, like when you are reading those books, the ones that make a sound when it's time to turn the page".

On the other hand, I'm looking forward to fulfilling the promise I made. There will be a long series of posts and videos covering all the topics of the session, with details and easy-to-follow guides for implementing things and solving issues. And I will try to get the first one done this week.

For the future, I also have a promise: the next time you have the chance to attend a session of mine, give me the opportunity to make it up to you. From now on I'll keep it strictly technical, mostly demos, and "normal".

There are some reasons behind this post: first of all the apology, secondly to remind myself, thirdly to turn the page, and finally to serve as a launchpad for the series of posts and videos.

Actually, I'll start with the simplest of the simple. The following is a copy of an unattended configuration file that installs an instance of the Database Engine. The parameter QUIETSIMPLE="True" makes Setup run unattended, showing progress but accepting no input, and INDICATEPROGRESS="False" keeps the verbose Setup log from being piped to the console. And by the way, create these files with the SQL Server Installation Center, and edit them with Notepad:

;SQL Server 2014 Configuration File
[OPTIONS]
ACTION="Install"
ENU="True"
QUIETSIMPLE="True"
UpdateEnabled="False"
ERRORREPORTING="False"
USEMICROSOFTUPDATE="False"
FEATURES=SQLENGINE,REPLICATION,FULLTEXT
UpdateSource="MU"
HELP="False"
INDICATEPROGRESS="False"
X86="False"
INSTALLSHAREDDIR="C:\Program Files\Microsoft SQL Server"
INSTALLSHAREDWOWDIR="C:\Program Files (x86)\Microsoft SQL Server"
INSTANCENAME="InstanceName"
SQMREPORTING="False"
INSTANCEID="InstanceName"
INSTANCEDIR="C:\Program Files\Microsoft SQL Server"
AGTSVCACCOUNT="DOMAIN\groupManagedServiceAccount$"
AGTSVCSTARTUPTYPE="Automatic"
COMMFABRICPORT="0"
COMMFABRICNETWORKLEVEL="0"
COMMFABRICENCRYPTION="0"
MATRIXCMBRICKCOMMPORT="0"
SQLSVCSTARTUPTYPE="Automatic"
FILESTREAMLEVEL="0"
ENABLERANU="False"
SQLCOLLATION="SQL_Latin1_General_CP1_CI_AS"
SQLSVCACCOUNT="DOMAIN\groupManagedServiceAccount$"
SQLSYSADMINACCOUNTS="DOMAIN\groupSQLAdmins"
ADDCURRENTUSERASSQLADMIN="False"
TCPENABLED="1"
NPENABLED="0"
BROWSERSVCSTARTUPTYPE="Automatic"
FTSVCACCOUNT="NT Service\MSSQLFDLauncher$InstanceName"

And then start the unattended install; /IAcceptSQLServerLicenseTerms=true automatically accepts the license terms:
Setup.exe /ConfigurationFile=ConfigFile.INI /IAcceptSQLServerLicenseTerms=true

5/4/16

It's a new year, 2016 (and has been for a while), and SQL Nexus in Copenhagen

Last year, and currently, there is so much work and so little time. I am right now in Copenhagen for SQL Nexus, currently charging for my session Hi(gh) Availability. It has a slightly different format than usual, and I promise to publish the demos as blog posts. That will force me to add stuff more frequently, which is good. And it will force me to plan my hours even more rigorously, which might give me my life back.

Anyway, it's a new year. A few days ago I was told that the next version of SQL Server will be released on the 1st of June, and over the months after that all the goodies will be in place.

I can't wait to get the golden RTM bits; it's going to be a super year.

Cheers,
Mattias

PS: Please check back and comment and request!

5/10/15

New features in Microsoft SQL Server 2016

Sitting on my train home from Microsoft Ignite 2015 in Chicago, IL, USA, after 14 hours of travelling and still three hours to go, I'd like to summarize a few of the new features that we will probably get in vNext, i.e. Microsoft SQL Server 2016. At the time being there isn't a public CTP, but towards the summer we'll probably get our hands on CTP2, which will be the first public preview and sneak peek.

For the BI folks out there in the world I'm going to say: there are investments being made in Analysis Services, Master Data Services and Data Quality Services. Power BI is getting closer to on-prem, Power Query becomes a data source for Integration Services, and this is not all. Over the time ahead I will elaborate and share my impressions of all the new goodies.

For the DBAs out there, and the DB devs too, we're getting tons of new stuff. Think about having a tool that actually persists queries and execution plans, helps us identify parameters, and easily pinpoints execution plans to specific queries, and exposes all this together with graphs and statistics. This coolness is called Query Store.
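
As a rough sketch of how this later looks in SQL Server 2016 (nothing of this is in a public build at the time of writing, and the database name is made up), you switch it on per database and then read the catalog views:

ALTER DATABASE SalesDB SET QUERY_STORE = ON;

-- Captured queries, their plans and aggregated runtime statistics
SELECT q.query_id, qt.query_sql_text, p.plan_id, rs.count_executions, rs.avg_duration
FROM sys.query_store_query q
JOIN sys.query_store_query_text qt ON qt.query_text_id = q.query_text_id
JOIN sys.query_store_plan p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats rs ON rs.plan_id = p.plan_id;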

And think about having a Slowly Changing Dimension-like feature built into the OLTP engine, having this ability transparently added to the tables where we need to keep all versions of the rows, and being able to jump back and forward in time over the data like a time machine. This is called Temporal Tables.
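
A small sketch of the syntax as it later shipped in SQL Server 2016 (table and column names are made up):

CREATE TABLE dbo.Products
 (
  ProductID int NOT NULL PRIMARY KEY CLUSTERED,
  Name nvarchar(100) NOT NULL,
  Price decimal(10,2) NOT NULL,
  ValidFrom datetime2 GENERATED ALWAYS AS ROW START NOT NULL,
  ValidTo datetime2 GENERATED ALWAYS AS ROW END NOT NULL,
  PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
 )
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ProductsHistory));

-- The time machine: the table as it looked at a given point in time
SELECT * FROM dbo.Products FOR SYSTEM_TIME AS OF '2016-01-01';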

Is your table growing out of size while you still need to keep everything forever? Think about letting the table stretch its cold data to Azure, still keeping it online, transparently and almost infinitely. You only need to keep the hot and warm data in your backups, while still being able to restore to any point in time thanks to timely snapshots on the Azure side. In 2016 the hybrid experience takes another step and gives us the ability to stretch an on-premises table to Azure, offering you potentially infinite storage.
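
Very roughly, enabling it on a table ends up looking like this in SQL Server 2016 (the instance and database must first be enabled for stretch; the table name is made up and the syntax is the one that later shipped, not what was shown at Ignite):

-- Allow the instance to use Stretch Database
EXEC sp_configure 'remote data archive', 1;
RECONFIGURE;

-- Start migrating the table's rows to Azure
ALTER TABLE dbo.OrderHistory
 SET (REMOTE_DATA_ARCHIVE = ON (MIGRATION_STATE = OUTBOUND));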

This is just some of it, and more will come… I'll keep you posted!

Over and out,

Mattias

5/4/15

SQL Server 2016 is not that far away... maybe...

Today is an awesome day: during the keynote here in Chicago at Microsoft Ignite, Satya announced the upcoming SQL Server 2016 and that we will get a preview shortly. Read more on this blog: http://blogs.technet.com/b/dataplatforminsider/archive/2015/05/04/sql-server-2016-public-preview-coming-this-summer.aspx

And if you're in Chicago try to find me...