Get Records in one table with a Foreign Key to a related one to many table

There are times when you need to get records in one table with a foreign key to a related one to many table. This is a difficult need to describe, so I’ll give you the exact business scenario.

I have designed and used a Process Tracking system for many years. It currently has two basic components in the database:

  1. A FileProcess table that tracks a file (name, date, paths, app that processed it, etc.)
  2. A StatusLog table that I punch in records as this file goes through the process of being imported, validated, etc.

Often, I have multiple applications that process a batch of records from a file. I designed a stored procedure that would allow me to check for any file, by a particular application, that was in a particular status, but not past that status.

So here’s the scenario, we have a process that I have assigned the following status log values:

10 – File Parsed
20 – File Imported
30 – Data Validated
40 – Data Archived

Ok, so one application parses the file and imports it, let’s say it’s an SQL SSIS package just for fun. So it punches two status records in while it’s working, a 10 and a 20.

So now I have another validation application that checks every few minutes for something to do. I want it to be able to find any file that is in a status of 20, but NOT higher than that. So then I know it’s ready to be validated.

In order to do this, I have the following LINQ to SQL query that seems to do the job for me. I hope looking at this code will help you with whatever similar type of issue you’re trying to solve:

public async Task<List<FileProcess>> GetFileProcessesForAStatusByAppIdAsync(int AppId, int StatusId)
        {
            try
            {
                var _entityrows = (from st in _appLogContext.StatusLogs
                                   join fp in _appLogContext.FileProcess.Include(a => a.App) on st.FileProcessId equals fp.Id
                                   where st.AppId == AppId
                                    && st.StatusId == StatusId
                                    && st.StatusId == (_appLogContext.StatusLogs.Where(f => f.FileProcessId == fp.Id).OrderByDescending(p => p.StatusId).FirstOrDefault().StatusId)
                                   select fp).AsNoTracking();


                return await _entityrows.ToListAsync();

            }
            catch (Exception)
            {
                throw;
            }
        }

For those of you that are database jockeys, here’s the SQL code that this replaces:

     @AppId AS INT = NULL,
     @StatusId AS INT = NULL

    SELECT 
        [Id],
        [AppId],
        [FileName],
        [DateProcessed],
        [Inbound]
    FROM
        [FileProcess]
    WHERE
        Id IN (
    SELECT
        s.FileProcessId
    FROM
        (SELECT DISTINCT MAX(StatusId) 
            OVER(PARTITION BY FileProcessId) 
            AS ProperRow, FileProcessId, AppId
            FROM StatusLogs) AS s
    WHERE 
        s.ProperRow = @StatusId 
        AND AppId = @AppId
        )

Duplicate Rows in an SQL Database

There is often a need to find rows that have duplicate values. You can use this query to find duplicate rows in an SQL database when they have duplicate value in a single column or multiple columns, depending on your need.

Here’s an example of a simple query:

SELECT firstname, COUNT(*) occurrences
FROM [users]
GROUP BY
	firstname
HAVING 
	COUNT(*) > 1

This will give you a simple list that shows you any duplicate firstname values in the users table and the count of how many duplicates there are.

So now you know that there are duplicates, how can you make use of this data? In order to do something with the data, you’ll need some more information, such as the primary key for the rows.

You can’t add that to the query columns since that would then make every row unique and they’d no longer show up in your resultset as duplicate. What you need to do is wrap this query into a subquery that then pulls these records out as a predicate.

You can read more about how the GROUP BY works on Microsoft’s site here:

https://docs.microsoft.com/en-us/sql/t-sql/queries/select-group-by-transact-sql?view=sql-server-ver16

  SELECT * FROM [users] WHERE firstname IN (
	SELECT firstname FROM (
		SELECT firstname, COUNT(*) occurrences
		FROM [users]
		GROUP BY
			firstname
		HAVING 
			COUNT(*) > 1) a
	)

Your original duplicate query is buried in there from lines 3 to 8. We have to alias that query in order to use it as a resultset. I simply called it ‘a’ for this example. The second inner SELECT is needed to knock off the count column in order for it to work in the WHERE IN clause of the outer SELECT statement. This now gives us a list of very actionable data.

SQL Table Structure Compare

We ran into a problem at work where we have a table, along with a history table, or some other form of almost identical table where the definitions need to be in sync.

Here’s the scenario:

We have a table, say “clients”. There is a requirement that we track any changes to this table in a “clients_history” table. This could be implemented using a trigger, or just a secondary save in your repository using EF. The implementation is irrelevant.

A few months later, a developer is working a user story where they need to add a field to clients to track if the company doesn’t want sales people to call on the client any longer.

The developer adds a field to the clients table:

ALTER TABLE dbo.clients ADD
     isdeadbeat bit NULL

Now there are several things that will happen:

  • The developer will not know or remember to add this field to the client_history table
  • Depending on how the trigger is coded, it could raise a difficult to troubleshoot error trying to insert into the clients_history table
  • The new field data will not be saved in the clients_history table

How can we resolve this issue. Well, there are dozens of ways, but what’s the easiest?

If you deploy your Database changes from a TFS project, you can add a post deployment script that runs after every deployment. You probably do this already for inserting values into lookup tables etc.

We’ll add a new script and insert the following code:

DECLARE @Count INT;
SELECT @Count = COUNT(*) FROM (
SELECT name FROM sys.columns WHERE object_id = OBJECT_ID(N'dbo.clients_history')
EXCEPT 
SELECT name FROM sys.columns WHERE object_id = OBJECT_ID(N'dbo.clients')
) AS A

IF (@Count > 0)
BEGIN
	RAISERROR('There is a mismatch between two tables',18,-1);
	RETURN
END

GO

When we run just the SELECT statements with the EXCEPT, we see the there are differences identified:

If we run the entire script, it will raise and exception to let the person doing the deployment know that there is an issue that needs to be addressed. You’ll just add a new statement for each table pair that you want to check, as they are identified.

Now call this new script with your Script.PostDeployment.sql file.

Now whenever someone publishes this script, if there are unwanted differences between the two files, they will be noted.

NOTES:

  • The table that is most likely to have changes that need to be sync’d, in this case, the clients table, should be first in the EXCEPT statement.
  • If there is a field that you want to ignore for some reason, you must provide a where clause.