Thursday, February 26, 2015

.Net ThreadPool (and the Task Parallel Library)

Intro

Threads are a great way to accomplish multiple tasks at the same time, while ensuring that you spread the joy among multiple cores on a multi-core system. They're excellent at improving the perceived response time of your applications and separating out disparate tasks to execute more quickly as a whole.

Even better, for a while now the .Net Framework has has a ThreadPool. This is a pool of background threads that are already created and ready to process things for you. Because they are already created, firing up one of these threads is a quicker process. They're performance-optimized threads ready for you to use with a minimal amount of code.

This blog assumes you have some knowledge of threaded programming in .Net. If you have no experience programming with threads, try this primer first: Multithreaded Programming Using C#.

Usage

Since .Net 4.0, the ThreadPool can be used with Generic tasks using the Task Parallel Library. I'd rather start off the explanation with an example so here we go:

using System.IO;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

namespace BlogThreadPool
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void ProcessFileGeneric(Object fileNumber)
        {
            var fileName = String.Format("file{0}.txt", fileNumber);
            if (File.Exists(fileName))
                File.Delete(fileName);
            File.WriteAllText(fileName, String.Format("file #{0} contents", fileNumber));
            Thread.Sleep(5000);
            File.AppendAllText(fileName, "\r\nsome more contents");
        }

        private void button1_Click(object sender, EventArgs e)
        {
            Task.Factory.StartNew(new Action<Object>(ProcessFileGeneric), 1);
            Task.Factory.StartNew(new Action<Object>(ProcessFileGeneric), 2);
            Task.Factory.StartNew(new Action<Object>(ProcessFileGeneric), 3);
        }
    }
}


This code simply creates 3 files and puts some content in them. The method ProcessFileGeneric is pretty self-explanatory. Delete a file if it exists, create the file and write some initial contents, wait 5 seconds, then append some more text to the file. The more interesting bit of magic is in button1_Click. This method creates 3 new tasks using the TPL, via Task.Factory.StartNew.Each call fires up one of the ThreadPool threads and feeds it a method to execute. In this case we are using the Action object and feeding it an Object parameter. Pretty cool huh?

As simple as that was, there's more to the ThreadPool than that. You might have already asked your computer "How many threads are sitting here waiting for my whimsy?". Well, there's an easy way to find out, and an easy way to set the value! See here:

        private void button2_Click(object sender, EventArgs e)
        {
            int maxWorkerThreads, completionPortThreads;
            ThreadPool.GetMaxThreads(out maxWorkerThreads, out completionPortThreads);
            int minWorkerThreads;
            ThreadPool.GetMinThreads(out minWorkerThreads, out completionPortThreads);
            int availableThreads;
            ThreadPool.GetAvailableThreads(out availableThreads, out completionPortThreads);
            var data = String.Format(
                "MaxThreads: {0}\r\n" +
                "MinThreads: {1}\r\n" +
                "AvailableThreads: {2}\r\n",
                maxWorkerThreads, minWorkerThreads, availableThreads);
            MessageBox.Show(data);
        }


This cool new method shows us the usage of GetMaxThreads, GetMinThreads, and GetAvailableThreads. GetMaxThreads shows the max # of threads we can have active in the ThreadPool at any one time. GetMinThreads shows the minimum number that are awaiting our whims (including threads that are already working for us). GetAvailableThreads shows the # of unused threads available to us (excluding those that are already working for us). There are also setter methods SetMaxThreads and SetMinThreads. Under normal circumstances you won't use these much, but hey, you never know. If you run into performance issues with your ThreadPool code, play around with these and see what you get.

What's Next?

That's it for today folks. There is more to know about ThreadPool, background vs foreground threads, when you might want to use threads, and how to use them effectively in a desktop UI, but hey this blog can't go on forever. Check out the links in the Resources section below if you want to read more info. Happy learning!

 

Resources


The Managed Thread Pool
ThreadPool Class
Multithreaded Programming Using C#
Foreground and Background Threads

ConcurrentDictionary

Introduction:


In a previous post I'd talked about the Dictionary type.  Ever wonder if there's a thread-safe version out there to use?  ConcurrentDictionary is what you seek.  Introduced in .net 4.0, we still get the same key/value pairing available before.  But enough about definitions, lets do some coding.

Coding:


using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Collections.Concurrent;

namespace DictionarySample
{
  class Program
  {
    static void Main(string[] args)
    {
      //step 1: create!
      ConcurrentDictionary<string, string> theRain = new ConcurrentDictionary<string, string>();
      theRain.GetOrAdd("Daniel", "c0d3r");
      theRain.GetOrAdd("Bob", "$ql script kiddi3");
      theRain.GetOrAdd("Dan", "c0d3r");
      theRain.GetOrAdd("Peter", "t@$k m@$t3r");
      
      //step 2: test
      Console.WriteLine("Testing time of our ConcurrentDictionary");
      Console.WriteLine("does myDic.ContainsKey(chode)? : " + theRain.ContainsKey("chode").ToString() + " ");
      Console.WriteLine("does myDic.ContainsKey(Peter)? : " + theRain.ContainsKey("Peter").ToString() + " ");
      Console.WriteLine("does myDic.ContainsKey(peter)? : " + theRain.ContainsKey("peter").ToString() + " ");
      Console.WriteLine();
      // step 3: write out data
      Console.WriteLine("initial data value(s) loaded in our ConcurrentDictionary");
      foreach (var pair in theRain)
        Console.WriteLine(pair.Key + " " + pair.Value);

      // step 4: update some data.
      theRain.TryUpdate("Peter", "koala", "Uber");
      Console.WriteLine();
      Console.WriteLine("value of Peter : " + theRain["Peter"]);
      Console.WriteLine();

      theRain.TryUpdate("Peter", "koala", "t@$k m@$t3r");
      Console.WriteLine();
      Console.WriteLine("value of Peter : " + theRain["Peter"]);
      Console.WriteLine();

      //step 5: write out the data
      Console.WriteLine("Modified data value(s) loaded in our ConcurrentDictionary");
      foreach (var pair in theRain)
        Console.WriteLine(pair.Key + " " + pair.Value);
      Console.WriteLine("\nPress Any Key to Exit.");
      Console.ReadKey();
    }
  }
}


Step 1: The constructor part is easy enough, create it quite similarly like we did with the Dictionary blog.
Step 2: doing some QA to see if ConcurrentDictionary really does act like the regular Dictionary.  It does!
Step 3: displaying the data to verify what was originally created exists as we expect.
Step 4: Attempting to update some data, for my sample I'm going to turn the key Peter into the value of koala instead of t@$k m@$t3r.  As the value of Peter isn't Uber it won't update.  As the value of Peter is t@$k m@$t3r it will update and now Peter is a koala.
Step 5: displaying out the data to verify what we've modified exists as we expect.

and the output:














Source:

Thursday, February 19, 2015

.Net Sorted Dictionary

Intro

Have you ever sat around and thought to yourself "Gee, I sure could use a list that has an identifier of some sort associated with an object of some sort!". You find the Dictionary class, bust out some code, and you're a happy camper. A month later your customer tells you "Hey lazy coder-boy, your code's too slow!". You dig into things and with a little bit of that special coder magic you find out that putting items in the Dictionary is fine, but reading items from the Dictionary is super-slow! You've got an integer Key and a TWidget (class/object) Value. So now you think to yourself "Gee, I sure wish that Dictionary was faster at finding random Keys!". Well golly folks, I've got a solution for you! SortedDictionary.

Usage

According to MSDN, a SortedDictionary "Represents a collection of key/value pairs that are sorted on the key.". Sorting is great at solving the aforementioned problem of slow seeks/scans, or at least it is in combination with a sorted-list-optimized search algorithm. SortedDictionary gives you automatic sorting and the usage of optimized search algorithms to find your data. If you've ever coded a bubble sort, quick sort, binary search, or one of the other array of searching/sorting algorithms, you'll feel blessed that the .Net framework has already done the plumbing for you. Let's look at a simple sample:

        private void SortedDict()
        {
            var dict = new SortedDictionary<int, string>();
            dict.Add(1234, "hi!");
            dict.Add(4321, "yo!");
            dict.Add(9876, "werd!");
            string value;
            if (dict.TryGetValue(1234, out value))
            {
                //hey, we found the value! alert the media!
            }
        }

This method creates a simple SortedDictionary whose key will be an int and whose value is a string. It then adds the values to the dictionary and attempts to find the specified value in the dictionary. For such a small sampling of data, you aren't likely to gain anything by using a SortedDictionary. There's not set cutoff point, but I don't bother optimizing until I need to for performance reasons and until I have at least 1k items in the list. I was recently working with a list of 500k items and switching from Dictionary to SortedDictionary sped things up by a factor of nearly 10! YMMV.

What's Next?

Tidwell wrote a blog for you guys on the ITCProgBlog just a day or two ago about the basic Dictionary class. This blog here told you about SortedDictionary. There's also a ConcurrentDictionary you could go read about if you like, it'll help you when you get into threading. You could also just write your own samples, and perhaps pontificate presciently on your next project requiring performant code. (ie think of ways to a SortedDictionary in your real-life projects).

Resources

SortedDictionary

Wednesday, February 18, 2015

Dictionary

Intro:

So you like hashsets and want to see some other cool datatypes out there that can manipulate data in cool and efficient ways?  Have no fear, another option is the Dictionary type!  With this data type you get Key/Value pairing storage along quick searching and data manipulation.  Think HashTable but more generically typed so you can pass an Object type versus having to pass a defined type (int/string/etc...).

But enough about defining the Dictionary type, lets write some code!

Coding:

For this adventure I'm going to borrow some data from Bobby's Removing Duplicate Data post (go read it if you haven't already, great stuff).  Adding data is similar to what you'd do for a List (yes i used one in this sample).

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace DictionarySample
{
  class Program
  {
    static void Main(string[] args)
    {
      //step 1: create!
      Dictionary<string, List<string>> myDictionary = new Dictionary<string, List<string>>();
      myDictionary.Add("Daniel",new List<string> { "Slingblade","1001101111","Troll" });
      myDictionary.Add("Bob", new List<string> { "UpandDown","4242424242","Troll" });
      myDictionary.Add("Dan", new List<string> { "TheMan", "ffffffffff", "Troll" });
      myDictionary.Add("Peter", new List<string> { "PumpkinEater", "0110010000", "Lesser-Diety" });
      //step 2: test
      Console.WriteLine("does myDictionary.ContainsKey(chode)? : " + myDictionary.ContainsKey("chode").ToString() + " ");
      Console.WriteLine("does myDictionary.ContainsKey(Peter)? : " + myDictionary.ContainsKey("Peter").ToString() + " ");
      Console.WriteLine("does myDictionary.ContainsKey(peter)? : " + myDictionary.ContainsKey("peter").ToString() + " ");
      //step 3: write out the data
      foreach (var pair in myDictionary)
      {
        Console.Write(pair.Key + " ");
        foreach (string value in pair.Value)
          Console.Write(value + " ");
        Console.WriteLine();
      }
      Console.WriteLine("\nPress Any Key to Exit.");
      Console.ReadKey();
    }
  }
}

And the output:














Step 1 is a fairly easy concept, go create a new Dictionary with whatever datatypes you wish to use.  As they are Key/Value pairs you'll have to have two data types passed in, this is where Bobby's blog data comes into play.  Step 2 I'm doing a little bit of testing to see if Keys exist within my Dictionary, Notice the case sensitivity involved with the Peter/peter test results.  Step 3 is just displaying the data from the Key/Pair values in the Dictionary.

Source:

Microsoft: Dictionary

Wednesday, February 11, 2015

Synonyms in SQL Server

Introduction

Have you ever found yourself building a very complex stored procedure or script that deals with multiple servers and databases? Were you thinking to yourself "I pity the fool that has to re-aim this thing when the server name changes!"? Yea, we have all been there! Did you ever have a need to be able to easily redirect where your script was pulling or putting the data? SQL Server 2005 and up has the answer for you and they are called Synonyms! Let's take a look!   


Synonyms

What is a synonym ? A synonym is nothing more than a pointer to an object that SQL server maintains in a database. These pointers are resolved at run time so that means you can create one without the object it points to actually existing. This can be a good thing or a bad thing depending on how you look at it. To create a synonym , you must have the CREATE SYNONYM permission or be in a group with this privilege. 

Make sure you are in the database you wish to have the synonym in. The command to create a synonym is:

USE [DATABASENAME}
GO
CREATE SYNONYM [dbo].[SYNONYM_NAME] FOR [OBJECT]
GO

Now let's see how we can use this to our advantage. If I have a query that references [SERVERA].[OLTP_DB].[dbo].[MyDateTable], If I create the synonym below:

USE [TestDB]
GO
CREATE SYNONYM [dbo].[Remote_OLTP] 
   FOR [SERVERA].[OLTP_DB].[dbo].[MyDateTable]
GO

I can now use the synonym in place of the table name in my query.

SELECT TOP 10 * FROM [dbo].[Remote_OLTP].

Not only does this shorten my script, but it makes it very easy to re-point my script to [SERVERB] without having to modify the code just by changing the synonym. There is no ALTER statement for this task current to this writing so you simply drop and recreate it.

USE [TestDB]
GO
DROP SYNONYM [dbo].[Remote_OLTP]
GO
CREATE SYNONYM [dbo].[Remote_OLTP] 
   FOR [SERVERB].[OLTP_DB].[dbo].[MyDateTable]
GO

Now the same query is pulling data from ServerB instead of ServerA and I don't have to write that long line of reference to it anymore. Win/Win!

The Good and the Bad

You can point a synonym at almost anything in SQL server. Some but not all of the examples are:
  • Tables
  • Views
  • Stored Procedures
  • Temporary tables both local and global.
The key thing to remember is it is database specific. 
Some of the positive points of synonyms are:
  1. They provide a layer of abstraction between the server and your code.
  2. They allow you to replace complex multi part object names with simple easy to type names.
  3. They provide the flexibility to easily change the objects your scripts reference with a change to a single object. Did I mention abstraction enough? You get the point.
Just some of the bad things you'll need to watch out for while using synonyms that I have come across, and this is by no means a complete listing:
  1. You can read the code, but you can't be sure where it's pointing until you check the synonyms.
  2. Synonyms are loosely bound. This means they could get dropped or changed and your script won't know until it fails while running.
  3. You can't make a synonym for a synonym. This is sometimes termed chaining from an OOP standpoint. 
  4. You cannot have an object name with the same name as a synonym. I usually use a prefix on mine to make them unique, but I do that with most objects in my databases. 
  5. You can't use synonyms in a DDL statement. Pretty clear statement of the obvious there. 

Conclusion

I hope you find the use of synonyms in your future to be rewarding. I would like to offer a few suggestions in closing. Prefix your synonyms so that they don't consume useful object names in your database. Comment your scripts and stored procedures that use synonyms at the top so that others know to look. I usually put the create commands in the bottom of a script commented out so that they can easily be recreated as designed if someone changed them and forgot to script them out first. 



Thursday, February 5, 2015

ASP.Net Web.config Transformation

Intro

Did you know there's an easy way to change your web.config when you deploy it? This doesn't involve you manually modifying the web.config after or prior to deployment, or writing some special script to edit the file for you on the live server. Nay fellow coders, nay! All you have to do is create something called a transformation file, put a little bit of xml in it, and when you deploy, the web.config that gets deployed will be magically transformed by Visual Studio. Let's see how the magic is woven.

Note: This blog was done using VS 2013 Community Edition. VS 2013 Express should work the same. Earlier and later versions of VS may differ.

Howdie-Do

First, fire up Visual Studio and create a new Web Application. Make it an empty web application. We don't need any special technologies like Web Forms or MVC as the site won't do anything. We're just demoing how to transform web.config files.

You should now have a web application in Visual Studio that looks like the below screenshot. It's pretty sparse, but that's ok; that's what we want. Notice how your web.config file can be expanded with that little arrow to the left of it? Go ahead and click that. You'll see that 2 files live underneath web.config, namely web.debug.config and web.release.config.


These 2 gems are where we cast our spell. First things first though, let's see what's in the web.config. Open 'er up. Yours should look similar to the one below. Not much in here either, and that's perfect. It will make illustrating this technique easier.


The key element for our purposes will be the compilation tag. See that debug="true" in there? It's generally considered bad practice to leave that in your deployed web.config file, as it creates potential performance issues and security vulnerabilities. Our goal then is to have the configuration transformation process set debug to false or remove it.

OK let's get crackin! Open up your Web.Release.config. I'll assume that when you want to push your website live you'll be building the solution in release mode, but I'll show you how to set the compilation mode during deployment in a little bit.

This file has more content than the web.config. Still not very many tags though. And hey, what's that? A compilation tag? Yeppers it is! Visual Studio has assumed for you that you'll want to modify the compilation tag when deploying in release mode. Most importantly, see the xdt:Transform attribute? This attribute, with its value of "RemoveAttributes(debug)", tells Visual studio to remove the debug attribute from your web.config during deployment. Well slap me and call me Jill, that's precisely what we wanted. Yeehaw! No coding necessary even!


Now it's time to see how to deploy the site. Right-click the web application project in Solution Explorer and choose the menu item "Publish".


Normally you might deploy to Azure or something else equally nifty, but for the sake of brevity we'll deploy to the local machine. Choose "Custom" as the publish target. A popup window will come up asking what you want to name the publish profile. Name it whatever you like.

 

On the next screen choose "File System" as the publish method, and point the Target location wherever you like on your computer.

  


On the next screen of the wizard choose "Release" as your Configuration, then click the "Publish" button.


Once VS is done publishing, use Windows Explorer to browse to the directory you published the site to. Open up the deployed web.config in whatever text editor you like. See that lovely compilation tag? No debug attribute! Success! FYI, if the debug attribute is not present, its value defaults to false.


What's Next?

There is a lot more you can do with Web.config Transformation files. You can replace elements, remove elements, add elements, add/remove/modify attributes and more. Check out the link below in the Resources section to see how to do all that you could possibly want and then some.

Resources

Web.config Transformation Syntax

Hashsets

Intro:

Do you use data within arrays/lists and think to yourself: is there another datatype for my unique (yes unique) data that's fairly efficient and easy to write?  Hashsets for the win!

Coding:

Lets write some code!  Time to use my time honored tradition of fanstasy names, we'll go for uniqueness using the names of the dwarves from the most recent Hobbit movie series.  Forgive the lazy way of loading up the initial hashset, lets see what hashsets can do for you:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace HashSetSample
{
  class Program
  {
    static void Main(string[] args)
    {
      // step 1: create our hashsets!
      HashSet<string> dwarvenHash = new HashSet<string>();
      HashSet<string> darkHairedHash = new HashSet<string>();
      
      // step 2: load our hashsets!
      dwarvenHash.Add("Dwalin");
      dwarvenHash.Add("Balin");
      dwarvenHash.Add("Kili");
      dwarvenHash.Add("Fili");
      dwarvenHash.Add("Dori");
      dwarvenHash.Add("Nori");
      dwarvenHash.Add("Ori");
      dwarvenHash.Add("Oin");
      dwarvenHash.Add("Gloin");
      dwarvenHash.Add("Bifur");
      dwarvenHash.Add("Bofur");
      dwarvenHash.Add("Bombur");
      dwarvenHash.Add("Thorin");

      darkHairedHash.Add("Dwalin");
      darkHairedHash.Add("Kili");
      darkHairedHash.Add("Fili");
      darkHairedHash.Add("Nori");
      darkHairedHash.Add("Ori");
      darkHairedHash.Add("Gloin");
      darkHairedHash.Add("Bifur");
      darkHairedHash.Add("Bofur");
      darkHairedHash.Add("Thorin");

      string names = string.Empty;
      foreach (string dwarf in dwarvenHash.ToArray())
        names += dwarf + " ";
      names.Trim();
      // step 3: show initial data.
      Console.WriteLine("Name of all dwarves in the party : " + names);
      names = string.Empty;
      foreach (string dwarf in darkHairedHash.ToArray())
        names += dwarf + " ";
      names.Trim();
      Console.WriteLine("Name of dwarves with dark hair : " + names);

      // step 4: remove some data.
      dwarvenHash.ExceptWith(darkHairedHash);

      // step 5: show modified data.
      names = string.Empty;
      foreach (string dwarf in dwarvenHash.ToArray())
        names += dwarf + " ";
      names.Trim();
      Console.WriteLine("Name of dwarves with lighter/gray hair : " + names);


      Console.WriteLine("\nPress Any Key to Exit.");
      Console.ReadKey();
    }
  }
}

Lets discuss the fun of above.  Step 1: creating your HashSet.  For my blog purpose the data type is string.  Step 2: loading the data.  There are more elegant ways, search the interwebz and reply down below with your attempts :).  Step 3: displaying the initial data of each hashset to verify they loaded the way you'd expected them to.  Step 4 is a method available to us to remove any size of data that exists from a larger size of data that exists.  AKA: I want to remove all darkHairedHash dwarves from my list of dwarvenHash dwarves.  In a very elegant (and very efficient) way of doing that, I'm using ExceptWith.  Step 5 is showing off the modified data to verify any/all changes were made.

Our result:













Huzzah!  As always any/all comments are appreciated.

Source:

M$ HashSets

Removing Duplicate Data

Introduction

A common problem almost everyone runs into at one time or another is duplicated data in a table. If you haven't encountered this yet you are one, very lucky and two, your day is coming. Good news! We have a solution in SQL that makes sorting through this a much easier task! The ROW_NUMBER() function can make very short work of eliminating those pesky extra rows. First you'll want to identify how those rows made it into your table and close the hole so it doesn't happen again. Now let's take a look at how we can use this function to make our problems go away.


Creating the Problem

Lets create a table in our TestDB. We'll call it Employees. This table is for tracking the progression of our employees careers over time.

USE [TestDB]
GO
--Create a test table
CREATE TABLE [dbo].[Employees](
[ID] [int] IDENTITY(1,1) NOT NULL,
[FirstName] [varchar](20) NOT NULL,
[LastName] [varchar](20) NOT NULL,
[EmployeeNumber] [varchar](10) NOT NULL,
[Title] [varchar](20) NOT NULL,
 CONSTRAINT [PK__Employee__ID] PRIMARY KEY CLUSTERED ([ID] ASC)
) ON [PRIMARY]

Now lets fill it with some bogus data and make up a few career progressions along the way so we have some rows on all our employees. Any similarity to persons either living or dead is just coincidence.

--Fill it with some data
INSERT INTO [dbo].[Employees] ([FirstName],[LastName],[EmployeeNumber],[Title])
VALUES
('Daniel','Slingblade','1001101111','Troll')
,('Peter','PumpkinEater','0110010000','Troll')
,('Bob','UpandDown','4242424242','Troll')
,('Dan','TheMan','ffffffffff','Troll')
,('Peter','PumpkinEater','0110010000','Lesser-Deity')
,('Daniel','Slingblade','1001101111','Gate Keeper')
,('Bob','UpandDown','4242424242','Key Master')
,('Dan','TheMan','ffffffffff','Stream Crosser')
,('Bob','UpandDown','4242424242','Lesser-Deity')
,('Dan','TheMan','ffffffffff','Lesser-Deity')
,('Peter','PumpkinEater','0110010000','Demigod')
,('Daniel','Slingblade','1001101111','Wizard')
,('Guy','Redshirt','0000000000','Expendable')
,('Buffy','Bendy','4572475047','Slayer')

Now that we have some data in our table, we have to create a duplication situation in order to resolve it. Take this query and run it a few times. Really mess things up.

INSERT INTO [dbo].[Employees] ([FirstName],[LastName],[EmployeeNumber],[Title])
SELECT [FirstName],[LastName],[EmployeeNumber],[Title]
FROM [dbo].[Employees] [Emp] WITH (NOLOCK)
WHERE Firstname LIKE 'Dan%'

INSERT INTO [dbo].[Employees] ([FirstName],[LastName],[EmployeeNumber],[Title])
SELECT [FirstName],[LastName],[EmployeeNumber],[Title]
FROM [dbo].[Employees] [Emp] WITH (NOLOCK)
WHERE Firstname LIKE 'P%'


Solving the Problem

I now have 41 rows of duplicated data in my test table. We're going to attempt to clean the table so that we only have a single row of data for any given condition.

The SQL function ROW_NUMBER() adds an integer column to a selection of data. This allows you to easily number a set of rows for sorting, paging or in this case, identifying duplicate rows.

Step one, we'll decide what identifies a row. In this case it will be [FirstName], [LastName], [EmployeeNumber] and [Title]. You can change your criteria to meet the needs of the situation you're in but the end game is to get a unique key. So let's do a SELECT DISTINCT on these fields and see what we come up with. With the query below we have our original 14 rows so this plan will work.



How do we split them off? In this case a SELECT INTO... would solve your problem with a quick truncate table, but we're going to pretend there are millions of rows here. There are several different ways but one of the easiest is to just delete the duplicate entries.

Using this query we can easily number our rows 1...X by our key we identified earlier.

SELECT ROW_NUMBER() OVER ( PARTITION BY [FirstName],[LastName],[Title] ORDER BY [LastName] ) [ROW]
      ,[FirstName]
      ,[LastName]
      ,[EmployeeNumber]
      ,[Title]
FROM [TestDB].[dbo].[Employees] 

You should see something similar to the table listing below. Based on this, we can now delete anything with [ROW] > 1 and remove all the duplicates.



Using this statement inside a transaction we can test the operation before we actually commit it. What we do is select the rows numbered as [Rows] deleting any of them with [Row] > 1.

BEGIN TRANSACTION
DELETE [ROWS] FROM
(
SELECT ROW_NUMBER() OVER ( PARTITION BY [FirstName],[LastName],[Title] ORDER BY [LastName] ) [ROW]
 ,[FirstName]
 ,[LastName]
 ,[EmployeeNumber]
 ,[Title]
FROM [TestDB].[dbo].[Employees]
) [ROWS]
WHERE [ROW] > 1
ROLLBACK

Now just change the ROLLBACK to a COMMIT and run the statement again and the duplicate rows will be gone. .

Conclusion

I hope you can use this at some point in time to save yourself a little bit of work. This is just one of the possible uses of ROW_NUMBER(). What other possible uses can you think of?