Wednesday, July 25, 2018

The need for surrogate keys in a data warehouse


A SQL Server data warehouse receives data from multiple sources, and its dimension tables use an additional surrogate key.

Dimension tables need these special surrogate keys to protect the data warehouse from unexpected administrative changes in the source systems.
In a data warehouse, we are dealing with millions of records which can span decades.
If we do not use a special key as the primary key of a dimension, we have to use the current business key instead, and that natural business key is then carried into the fact table as well. Since the data warehouse has no control over the source systems, it is important to know that the business key can change over time due to administrative and operational needs. When such a change occurs, the entire fact table has to be updated, which means dealing with millions of records, and the update has further implications such as the table not being available until it completes.

If we use a surrogate key in the dimension table instead, such a change is only a matter of updating the dimension records. Therefore, updating dimension records won’t have a major impact on the data warehouse system.

Let's look at the example below, which is taken from the AdventureWorksDW sample database. In this example, CustomerKey is the surrogate key in the DimCustomer table.


(image: DimCustomer and FactSales tables)

If CustomerAlternateKey, the business key of the customer dimension, is ever modified, you only have to change the dimension table, not the fact tables. Both data changes and data type changes can be handled with this method, as the simplified sketch below shows.
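A minimal sketch of the pattern (columns trimmed down, so this is not the full AdventureWorksDW schema): the fact table references only the surrogate key, never the business key.

-- Dimension: the surrogate key is the primary key; the business key from the
-- source system is kept only as a unique attribute.
CREATE TABLE dbo.DimCustomer
(
  CustomerKey          int IDENTITY(1,1) NOT NULL
                       CONSTRAINT PK_DimCustomer PRIMARY KEY,
  CustomerAlternateKey nvarchar(15) NOT NULL      -- business key from the source
                       CONSTRAINT UQ_DimCustomer_AlternateKey UNIQUE,
  FirstName            nvarchar(50) NOT NULL,
  LastName             nvarchar(50) NOT NULL
);

-- Fact table: it references only the surrogate key, so a change to the business
-- key touches the dimension row alone, never the millions of fact rows.
CREATE TABLE dbo.FactSales
(
  SalesKey    int IDENTITY(1,1) NOT NULL CONSTRAINT PK_FactSales PRIMARY KEY,
  CustomerKey int NOT NULL
              CONSTRAINT FK_FactSales_DimCustomer
              REFERENCES dbo.DimCustomer (CustomerKey),
  SalesAmount money NOT NULL
);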


A data warehouse receives data from multiple sources, and sometimes the same dimension receives data from several of them. For example, an organization with employees at multiple sites needs all of them in one employee dimension for analysis purposes. In the OLTP systems, these data sets are maintained in isolation and their keys are unique only within that isolation. When the data is integrated into the data warehouse, the same primary key value can appear on multiple records from different locations. Since primary keys cannot be duplicated, loading the data warehouse would fail if you tried to extract and load both records.

Since we are including a surrogate key, and the surrogate key is the primary key of the dimension table, these duplicates are no longer an issue.
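As a hedged illustration (this employee dimension and its column names are made up for the example), two sites can legitimately use the same business key, and the surrogate key keeps the rows distinct:

-- Hypothetical employee dimension loaded from two sites.
CREATE TABLE dbo.DimEmployee
(
  EmployeeKey        int IDENTITY(1,1) NOT NULL CONSTRAINT PK_DimEmployee PRIMARY KEY,
  SourceSystem       varchar(10)   NOT NULL,  -- which site the row came from
  EmployeeBusinessID int           NOT NULL,  -- business key, unique only within a site
  EmployeeName       nvarchar(100) NOT NULL,
  CONSTRAINT UQ_DimEmployee_Source UNIQUE (SourceSystem, EmployeeBusinessID)
);

-- Both sites use EmployeeBusinessID = 1001; the surrogate key keeps them apart,
-- so the load no longer fails on a duplicate primary key.
INSERT dbo.DimEmployee (SourceSystem, EmployeeBusinessID, EmployeeName)
VALUES ('SITE-A', 1001, N'John Doe'),
       ('SITE-B', 1001, N'Jane Doe');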

A dimension table can also have records which do not exist in the source systems. When a fact table is loaded, several surrogate keys have to be resolved for each fact row, one per dimension.

Let's look at the FactSales fact table.

(image: FactSales fact table)


In the fact table, for a given record, ProductKey, OrderDateKey, DueDateKey, ShipDateKey, CustomerKey, PromotionKey and CurrencyKey are surrogate keys which have to be looked up during the load. Since data is received from multiple sources, there can be cases where one or two dimension members are not yet available. If a lookup fails, you would have to either ignore the entire record or accept a foreign key violation. To avoid this, manually entered records are added to the dimension; typically 0 or -1 is used as the surrogate key of this member, which can then be tagged in the fact table without violating foreign key constraints.
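A hedged sketch of that idea, reusing the DimCustomer/FactSales sketch from above plus a hypothetical staging.Sales table: an "Unknown" member with surrogate key -1 is entered manually, and the fact load falls back to it when the lookup fails.

-- Manually entered "Unknown" member (0 or -1 is the usual choice).
SET IDENTITY_INSERT dbo.DimCustomer ON;
INSERT dbo.DimCustomer (CustomerKey, CustomerAlternateKey, FirstName, LastName)
VALUES (-1, N'UNKNOWN', N'Unknown', N'Unknown');
SET IDENTITY_INSERT dbo.DimCustomer OFF;

-- During the fact load, a missing customer resolves to -1 instead of
-- discarding the record or violating the foreign key constraint.
INSERT dbo.FactSales (CustomerKey, SalesAmount)
SELECT ISNULL(c.CustomerKey, -1), s.SalesAmount
FROM staging.Sales AS s
LEFT JOIN dbo.DimCustomer AS c
  ON c.CustomerAlternateKey = s.CustomerNumber;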

There is also a case where surrogate keys in dimension tables cannot be avoided at all. Keeping historical data is essential in a data warehouse, so dimension changes need to be tracked. To facilitate historical tracking in a dimension, type 2 slowly changing dimensions (SCD) are used.

In a type 2 SCD, a change is recorded by adding a new row. Without surrogate keys, this kind of historical tracking is difficult to adopt in the data warehouse, because the business key would have to be duplicated. With surrogate keys, the new record simply gets a different surrogate key from the previous versions of the same member.
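A hedged sketch of a type 2 change, assuming the customer dimension sketched above has been extended with City, StartDate, EndDate and IsCurrent columns (one common convention, not the only one): the business key stays the same, while a new surrogate key is handed out for the new version.

-- Expire the current version of the customer...
UPDATE dbo.DimCustomer
SET EndDate = '2018-06-30', IsCurrent = 0
WHERE CustomerAlternateKey = N'AW00011000'
  AND IsCurrent = 1;

-- ...and insert the new version. The IDENTITY column assigns a new surrogate
-- key, so old fact rows keep pointing at the historical version of the customer.
INSERT dbo.DimCustomer
  (CustomerAlternateKey, FirstName, LastName, City, StartDate, EndDate, IsCurrent)
VALUES
  (N'AW00011000', N'John', N'Doe', N'Portland', '2018-07-01', NULL, 1);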


There are many statuses and flags in OLTP systems. Ideally, these would each be mapped to a dimension table in the data warehouse, which would end up producing a large number of tiny dimension tables. To ease operations, multiple statuses are instead combined into one junk dimension table. Since this dimension does not exist in any source system, you need to add a primary key to it, which is a surrogate key.
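A hedged sketch of a junk dimension (the flag columns are invented for the example): each combination of low-cardinality flags gets one row and one surrogate key, and the fact table stores only that key.

-- Hypothetical junk dimension: one row per combination of small status flags.
CREATE TABLE dbo.DimOrderFlags
(
  OrderFlagsKey int IDENTITY(1,1) NOT NULL CONSTRAINT PK_DimOrderFlags PRIMARY KEY,
  IsOnlineOrder bit NOT NULL,
  IsGiftWrapped bit NOT NULL,
  PaymentStatus varchar(10) NOT NULL   -- e.g. 'Paid', 'Pending', 'Refunded'
);

-- Pre-populate every valid combination once.
INSERT dbo.DimOrderFlags (IsOnlineOrder, IsGiftWrapped, PaymentStatus)
SELECT o.v, g.v, p.v
FROM (VALUES (0), (1)) AS o(v)
CROSS JOIN (VALUES (0), (1)) AS g(v)
CROSS JOIN (VALUES ('Paid'), ('Pending'), ('Refunded')) AS p(v);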

A data warehouse works with large volumes of data, and fact tables join to dimension tables mainly via surrogate keys. Integer surrogate key columns perform better in those joins than character columns.

Typically, an auto-increment integer column is used as the surrogate key in a dimension table. Surrogate keys normally carry no meaning, with the exception of the date dimension, where a YYYYMMDD integer format is used, mainly to enhance data partitioning.
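A hedged sketch of such a date dimension (the column set is illustrative): the surrogate key is a readable YYYYMMDD integer, which also works well as a partitioning column for large fact tables.

CREATE TABLE dbo.DimDate
(
  DateKey       int  NOT NULL CONSTRAINT PK_DimDate PRIMARY KEY,  -- e.g. 20180725
  FullDate      date NOT NULL,
  CalendarYear  smallint NOT NULL,
  CalendarMonth tinyint  NOT NULL
);

INSERT dbo.DimDate (DateKey, FullDate, CalendarYear, CalendarMonth)
VALUES (20180725, '2018-07-25', 2018, 7);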

Tuesday, July 24, 2018

Performance Testing

If development is the core of any project, performance testing is its essence.

This post covers the need for performance testing and a procedure to perform it easily and accurately using a simple JUnit- and Maven-based open-source library: what problems we typically face during load and stress testing of an application, how we can overcome those issues, and how to make performance testing part of the CI build pipeline.
Sometimes we tend to think that performance testing is not part of the development process, which means this important aspect of a product's or service's APIs is not taken care of. But that's not the point; the point is, why do we think it should not be part of the usual development cycle?
Also, to add more ground to the above thinking: there is no straightforward approach to performance testing the way there is for unit testing, feature/component testing, e2e integration testing or consumer-contract testing. Instead, the developers or the performance testers (sometimes a specialized team) are asked to choose a standalone tool from the marketplace, produce some fancy performance-testing reports, and share those reports with the business or technology team.

That means it is done in isolation, and sometimes after or towards the end of the development sprints, approaching the production release date. The chance this testing offers to improve the product or fix potential issues early is missed. This answers the first question, "why".

Ideally, performance testing should be part of the usual development cycle, and to make it part of the CI build, it has to be easy enough for the developers or the performance testers to write, change and share the tests. The point is: how do we get useful statistics that developers can easily understand, how do we fix the issues arising from load/stress/capacity testing, and how do we get continuous feedback from the CI build?

In the traditional approach, too much time is spent understanding the tool and making it work, because some tools are not IDE friendly, i.e. not even Maven/JUnit based.

There are other challenges too: pointing your performance-testing tool at an arbitrary layer of the tech stack is not an easy or straightforward task, e.g. pointing at a REST endpoint, a SOAP endpoint, a DB server, Kafka topics, an MQ endpoint like ActiveMQ/WMQ, SSL connections, connections via a corporate proxy, etc. This makes it a bit difficult to prove that your application APIs are performing well and only the downstream systems have issues. For example:

You just tested your GET API's performance, pointing at a URL such as /api/v1/id-checks/131001 using the standalone tool, and found that the response delay is much higher than when invoking the same API a single time. You (as a developer) then tend to blame the DB or the MQ topics, e.g. reasoning that the Oracle DB server is slow while handling parallel loads. But how do you produce evidence to support your argument that your APIs are not slow?

Now you wish you had a mechanism or tool to isolate this issue from your application by pointing your performance-test tool directly at the DB or the Kafka topics, because (as a developer) you know which SQL queries are fired at the DB to fetch the result, and you know the topic names from which you can fetch the data directly, bypassing the application's API processing layer. That would prove your point meaningfully and produce evidence.

Achieving this should actually be very easy, because your existing JUnit tests, integration tests, etc. already do the same work on a daily basis (in fact, every CI build does). But it is sometimes difficult to do with a standalone or marketplace performance tool, and/or the tool is not flexible enough to reuse those existing tests. So you lose interest, or don't have time to spike it in the sprint, and you end up skipping this aspect of testing, passing the blame to the downstream systems or to the tool.

Solution

A custom JUnit load runner can easily enable you to reuse your existing JUnit tests (e2e integration tests, feature tests or component tests, as most of them use JUnit or TestNG behind the scenes) to generate load or stress (read here about the difference between load and stress, aka horizontal load vs. vertical load).

The load runner ideally should look like below, which would solve the purpose:

@LoadWith("your_load_config.properties")
@TestMapping(testClass = YourExistingEndPointTest.class, testMethod = "aTest")
@RunWith(YourLoadRunner.class)
public class LoadTest { }

where your_load_config.properties should hold the below properties:

number.of.parallel.users=50
users.to.be.ramped.up.in.seconds=50
repeat.this.loop.times=2

meaning you want 50 users to be ramped up in 50 seconds (each user firing tests at a 1-second gap), and this to be run twice (loop=2), i.e. a total of 100 parallel users will fire requests, each at approximately a 1-second gap.

And @TestMapping means:

@TestMapping(testClass = YourExistingEndPointTest.class, testMethod = "aTest")

i.e. the aTest method of YourExistingEndPointTest above should have the required assertions to match the actual result received against the expected result.

Once the load run completes, you would need statistics precisely like the below:

Total number of tests fired               100
Total number of tests passed              90
Total number of tests failed              10
Average delay between requests (in sec)   1
Average response time delay (in sec)      5

Ideally, much more statistics could be drawn on demand, provided YourLoadRunner could produce a CSV/spreadsheet with this kind of data.

Friday, July 6, 2018

SQL Server Express Versions

SQL Server Express edition is a free version of SQL Server that can be used in production environments.  There are different versions of SQL Server Express that  we can download and use.

There are many versions of SQL Server, and each version adds new features. For example, in SQL Server 2014 Microsoft added In-Memory OLTP, which was further enhanced in 2016. With SQL Server 2016 came Machine Learning as a feature, with the possibility to use R scripts inside T-SQL code. SQL Server 2017 added a Linux edition.
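If you are not sure which edition and version a given instance is running, a quick check with standard SERVERPROPERTY calls (these work on any edition, not only Express) is:

-- Returns e.g. 'Express Edition (64-bit)' plus the build number and level.
SELECT SERVERPROPERTY('Edition')        AS Edition,
       SERVERPROPERTY('ProductVersion') AS ProductVersion,
       SERVERPROPERTY('ProductLevel')   AS ProductLevel;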


SQL Server 2017 Express

SQL Server 2017 Express is the core Express database server. It includes the database engine features and SQL Server Replication, although replication is limited to working as a subscriber for merge, peer-to-peer and transactional replication.
This version is ideal when you need to accept remote connections, and it is a free version of SQL Server Express. The next image shows a screen capture of the Feature Selection step of the SQL Server Express installer.

SQL Server 2017 Express with Advanced Services

This version gives users more features than the Express core edition. It includes the SQL Server database engine, SQL Server Replication (limited to working as a subscriber for merge, peer-to-peer and transactional replication) and full-text search. Worth mentioning is the inclusion of data analysis tools with Machine Learning Services (In-Database), allowing you to use both Python and the R language. You also get the PolyBase Query Service for External Data, and for reporting, Reporting Services is available as a downloadable add-on.
To download and use SQL Server 2017 Express with Advanced Services, you must join the Visual Studio Dev Essentials program via the following link: https://www.visualstudio.com/es/dev-essentials/
SQL Server 2017 Express with Advanced Services installer (screenshot): the features available in SQL Server 2017 Express with Advanced Services.

This version is for advanced developers who are working on applications that use business intelligence or machine learning algorithms, or who are developing an application that uses PolyBase to connect to a Hadoop data source or Azure Blob Storage.


SQL Server 2017 Express LocalDB

This version allows you to embed SQL Server Express into applications via LocalDB. It is a lightweight version of the SQL Server Express edition and a replacement for the deprecated SQL Server Compact. It is the best choice when you need to use a database in your application and you don’t want to get involved with administering it.
Contrary to SQLite, a lightweight embedded and ACID-compliant database that implements most of the SQL standard but lacks remote access capabilities, SQL Server 2017 Express LocalDB is a client-server database engine: multiple users or processes on the computer where LocalDB is running can access the same database. Furthermore, with LocalDB you can use functions, triggers and stored procedures, which are not available in SQLite.
But LocalDB also has the following limitations:
  • It can’t act as a subscriber in a merge replication scenario.
  • It does not support FILESTREAM.
  • Only allows local queues for Service Broker.
This version is the best choice if you are a developer who wants to focus on coding the application and doesn’t want, or doesn’t have time, to spend on database configuration and administration.

Wednesday, June 27, 2018

No More Service Packs for SQL Server


Starting with the SQL Server 2017 release, Microsoft will no longer be providing service packs.

Earlier, when Microsoft released a new version of SQL Server, most database administrators and organizations did not upgrade to it straight away. Change managers, database managers and application owners look for a stable SQL Server version, and they usually wait for Service Pack 1 to be released as a sign that the new release is stable.

SQL Server 2017, which was released in October 2017, has many exciting new features and upgrades:


  • SQL Server on Linux (Cross-platform SQL edition)
  • Resumable online index rebuild operation
  • Smarter backups (log and differential backups)
  • Adaptive query processing
  • Automatic tuning
  • SQL Server Machine Learning Services – R and Python
  • Graph Database Capabilities



As per Microsoft, with the release of SQL Server 2017, the major highlights of the new servicing model are:


  • Starting with SQL Server 2017, service packs will no longer be released; only Cumulative Updates (CUs) and GDRs (General Distribution Releases) will ship.
  • CUs will be delivered more often at first and then less frequently: every month for the first 12 months, and every quarter for the remaining 4 years of the full 5-year mainstream lifecycle.
  • CUs are delivered on the same week of the month: the 3rd Tuesday of each month.



There will be no service packs for SQL Server 2017 or for future versions of SQL Server. Microsoft will still supply service packs, as before, for previous versions of SQL Server.

Microsoft will provide CUs every month for the first 12 months. Microsoft will also provide CU-based slipstream media after a certain number of CUs, for easier installs. For example, after CU12 of SQL Server 2017, the slipstream media will contain SQL Server 2017 with CU12 already applied, so we do not need to install SQL Server 2017 and then upgrade to CU12; with this media, we can go directly to SQL Server 2017 CU12.

Cumulative Updates are now considered fully tested and certified to the same level as a service pack, and Microsoft now recommends installing Cumulative Updates as they are released and regularly applying them to your SQL Server instances.

As per the release document of Cumulative Updates:


  • Each new CU contains all the fixes that were included in the previous CU for the installed version of SQL Server. Therefore, you need to install only the latest CU to be up to date.
  • SQL Server CUs are certified to the same levels as Service Packs and should be installed at the same level of confidence.
  • Microsoft recommends ongoing, proactive installation of CUs as they become available.
  • Historical data shows that a significant number of support cases involve an issue that has already been addressed in a released CU.
  • CUs may contain benefits over and above hotfixes. This includes supportability, manageability, and reliability updates.


As per the FAQs published at the launch of the new servicing model, localization of CUs starting from SQL Server 2017 is handled within the single, language-agnostic update file that Microsoft will continue to release.

For the Windows servers hosting SQL Server, Microsoft is planning to publish the most recent CU for SQL Server 2017 on the Download Center, the Windows Catalog and WSUS (Windows Server Update Services). The Windows Catalog will also house the previous CUs.

With this new servicing model in place, SQL Server 2017 patching will be simpler, and we can always be on the latest CU without waiting for service packs. Organizations will also be able to adopt the new version quickly and effectively, without doubts about its stability.

The Cumulative Updates released so far by Microsoft for SQL Server 2017 under the new servicing model are listed below:

Label                        Build          Release date
SQL Server 2017 RTM          14.0.1000.169  2017/10/02
Cumulative Update (CU) - 1   14.0.3006.16   2017/10/25
Cumulative Update (CU) - 2   14.0.3008.27   2017/11/29
Cumulative Update (CU) - 3   14.0.3015.40   2018/01/04
Cumulative Update (CU) - 4   14.0.3022.28   2018/02/21
Cumulative Update (CU) - 5   14.0.3023.8    2018/03/20
Cumulative Update (CU) - 6   14.0.3025.34   2018/04/19
Cumulative Update (CU) - 7   14.0.3026.27   2018/05/24
Cumulative Update (CU) - 8   14.0.3029.16   2018/06/21
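To see which build, and therefore which CU from the table above, an instance is currently running, the standard SERVERPROPERTY values can be queried; on recent builds ProductUpdateLevel reports the CU directly:

-- ProductVersion gives the build number (e.g. 14.0.3029.16);
-- ProductUpdateLevel reports the installed update level (e.g. 'CU8').
SELECT SERVERPROPERTY('ProductVersion')     AS Build,
       SERVERPROPERTY('ProductUpdateLevel') AS UpdateLevel;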

Monday, June 25, 2018

Denormalize SQL Server data

Using FOR XML PATH and STRING_AGG() to denormalize SQL Server data


Problem
How can we take a set of normalized data (a one-to-many relationship) and flatten it into a single, comma-separated string of phone numbers for each user in SQL Server?
Solution

FOR XML PATH and STRING_AGG()

SQL Server has two great methods for grouped concatenation: STRING_AGG(), introduced in SQL Server 2017, and FOR XML PATH.
Tables and sample data:
CREATE TABLE dbo.Users
(
  UserID int CONSTRAINT PK_Users PRIMARY KEY,
  Name   sysname CONSTRAINT UQ_UserName UNIQUE
);
GO

CREATE TABLE dbo.UserPhones
(
  UserID      int CONSTRAINT FK_UserPhones_Users 
              FOREIGN KEY REFERENCES dbo.Users(UserID),
  PhoneType   varchar(4) NOT NULL,
  PhoneNumber varchar(32) NOT NULL
);
GO

INSERT dbo.Users(UserID, Name) VALUES
(1,N'John Doe'),(2,N'Jane Doe'),(3,N'Anon E. Mouse');
INSERT dbo.UserPhones(UserID, PhoneType, PhoneNumber)
VALUES(1,'Home','123-456-7890'),(1,'Cell','456-789-1234'),
      (2,'Work','345-678-1291'),(2,'Cell','110-335-6677');
GO   
--------------------------------------------------------------------------------------------------
Desired output:
Name              PhoneNumbers
----------------  ------------------------------------
Anon E. Mouse    
Jane Doe          Cell 110-335-6677, Work 345-678-1291
John Doe          Cell 456-789-1234, Home 123-456-7890 

Using FOR XML PATH in versions earlier than SQL Server 2017 to get a comma-separated string:


SELECT 
  u.Name,
  PhoneNumbers = 
  (
    SELECT ', ' + p.PhoneType + ' ' + p.PhoneNumber
      FROM dbo.UserPhones AS p
      WHERE p.UserID = u.UserID
      FOR XML PATH('')
  )
FROM dbo.Users AS u
ORDER BY u.Name;

Output (screenshot):
The rows are ordered correctly, but the phone numbers are not listed alphabetically. A NULL value is returned for the first row, whereas the desired result lists it as an empty string, and the results are concatenated with a leading comma.
STUFF()  is used to replace the first two characters in the concatenated string with an empty string:
SELECT 
  u.Name, 
  PhoneNumbers = STUFF
  (
    (
      SELECT ', ' + p.PhoneType + ' ' + p.PhoneNumber
        FROM dbo.UserPhones AS p 
        WHERE p.UserID = u.UserID
        FOR XML PATH('')
    ), 1, 2, N''
  )
FROM dbo.Users AS u
ORDER BY u.Name;

Output (screenshot):
Adding ORDER BY p.PhoneType sorts the phone numbers within each string, and COALESCE() replaces that NULL with an empty string:
SELECT 
  u.Name, 
  PhoneNumbers = COALESCE(STUFF
  (
    (
      SELECT ', ' + p.PhoneType + ' ' + p.PhoneNumber
        FROM dbo.UserPhones AS p 
        WHERE p.UserID = u.UserID
        ORDER BY p.PhoneType
        FOR XML PATH('')
    ), 1, 2, N''
  ), N'')
FROM dbo.Users AS u
ORDER BY u.Name;
Output (screenshot):
STRING_AGG() ... WITHIN GROUP(), in SQL Server 2017 or Azure SQL Database, accomplishes in a single expression what FOR XML PATH, STUFF() and COALESCE() do above:
SELECT 
  u.Name, 
  PhoneNumbers = STRING_AGG(CONCAT(p.PhoneType, ' ', p.PhoneNumber), ', ')
                 WITHIN GROUP (ORDER BY p.PhoneType)
FROM dbo.Users AS u
LEFT OUTER JOIN dbo.UserPhones AS p
ON u.UserID = p.UserID
GROUP BY u.Name
ORDER BY u.Name; 
Output (screenshot):
Performance-wise, the FOR XML PATH query is much more expensive than STRING_AGG() WITHIN GROUP(), using more CPU and requiring more I/O.

Thursday, June 21, 2018

Earning money Online without Investment

Earning money Online



Make money in almost any niche that you decide to enter


Like many of you, as a beginner I had too many ideas for generating money online, but I was confused, got tangled up in all of them, and nothing seemed to work. Then I streamlined my ideas and decided to start with blogging, and it really worked.

So I am sharing the concept with you all, so that you too can earn some extra bucks.

How to Make Money Blogging
  • Create a blog. (It’s very easy to create a blog, and it hardly takes 30 minutes to set up an excellent one.)
  • Publish regular content on your blog (Anything that you know like entertainment, cooking, politics, hobbies, your experiences, something technical etc.).
  • Apply for Google AdSense here.
  • Once your AdSense account is approved, place AdSense ads on your blog.
  • Start earning each and every time a visitor clicks on the AdSense ads.

Nothing is easy, but nothing is hard either; stick with it and success will follow.