Modify Group By to achieve correct results
I have large data set with historical data of the INACTIVE Clients and their billing (services) information. I need to group data by ClientId, Service and Year AND produce aggregated fields, such as TotalCharges, TotalProvidedServices and DaysStayedWithUs (Issue with this column).
I was able successfully Group data by ClientId and Service, however grouping by Year producing the wrong result and I know why.
The reason is because Year column produced from ServiceDate column, that originally was used for creating FirstServiceDate and LastServiceDate => to be able produce DaysStayedWithUs (LastServiceDate - FirstServiceDate will give us Amount of Days the Client was with Us (DaysStayedWithUs)). And when I Group by Year, query finds FirstServiceDate, LastServiceDate and calculates DaysStayedWithUs based on it....therefore I have to find another way to achieve result I need (showed below).
Client Table:
ClientId
1234567890
Billing Table:
ClientId Service ServiceDate ChargesTotal
1234567890 Cleaning 12/4/2018 190.17
1234567890 Cleaning 1/22/2019 97.8
1234567890 Cleaning 1/29/2019 97.8
1234567890 Cleaning 2/5/2019 97.8
1234567890 Cleaning 2/12/2019 97.8
1234567890 Cleaning 3/5/2019 97.8
1234567890 Cleaning 2/19/2019 97.8
1234567890 Cleaning 3/12/2019 97.8
1234567890 Cleaning 3/19/2019 97.8
1234567890 Cleaning 3/26/2019 97.8
My code:
SELECT GroupedTable.ClientId,
GroupedTable.Service,
---GroupedTable.Year,
GroupedTable.FirstServiceDate,
GroupedTable.LastServiceDate,
GroupedTable.TotalCharges,
GroupedTable.TotalProvidedServices,
GroupedTable.DaysStayedWithUs
FROM (SELECT MainTable.ClientId,
MainTable.Service,
SUM(MainTable.Charges) AS TotalCharges,
COUNT(*) AS TotalProvidedServices,
DATEDIFF(DAY, MIN(MainTable.ServiceDate), MAX(MainTable.ServiceDate)) as DaysStayedWithUs,
---MainTable.Year,
MIN(MainTable.ServiceDate) AS FirstServiceDate,
MAX(MainTable.ServiceDate) AS LastServiceDate
FROM (SELECT c.ClientId,
b.Service,
b.ServiceDate,
---YEAR(b.ServiceDate) AS Year,
b.Charges
FROM Client as c
LEFT JOIN Billing as b
ON (c.ClientId = b.ClientId)
WHERE b.ClientId = 1234567890
) as MainTable
---GROUP BY MainTable.ClientId, MainTable.ServiceIdentifier, MainTable.Year) AS GroupedTable
GROUP BY MainTable.ClientId, MainTable.Service) AS GroupedTable
Code above produces correct results: (Year deselected)
If I include Year in the Group By, output is:
P.S. I marked cells in Yellow that have issues (technically FirstServiceDate and LastServiceDate are need to be corrected and then DaysStayedWithUs will be adjusted by itself)
Result I need to accomplish:
Hope it is possible to achieve. Thank You.
Solution 1:
Use the window functions MIN OVER
and MAX OVER
in order to see whether a year is the first/last year for a client and service.
SELECT
c.clientid,
b.service,
YEAR(b.servicedate) as year,
SUM(b.charges) AS total_charges,
COUNT(*) AS total_provided_services
CASE WHEN YEAR(b.servicedate) = MIN(YEAR(b.servicedate)) OVER (PARTITION BY c.clientid, b.service)
THEN MIN(b.servicedate) ELSE DATEFROMPARTS(YEAR(b.servicedate), 1, 1)
END AS first_service_date,
CASE WHEN YEAR(b.servicedate) = MAX(YEAR(b.servicedate)) OVER (PARTITION BY c.clientid, b.service)
THEN MAX(b.servicedate) ELSE DATEFROMPARTS(YEAR(b.servicedate), 12, 31)
END AS last_service_date,
DATEDIFF(DAY,
CASE WHEN YEAR(b.servicedate) = MIN(YEAR(b.servicedate)) OVER (PARTITION BY c.clientid, b.service)
THEN MIN(b.servicedate) ELSE DATEFROMPARTS(YEAR(b.servicedate), 1, 1) END,
CASE WHEN YEAR(b.servicedate) = MAX(YEAR(b.servicedate)) OVER (PARTITION BY c.clientid, b.service)
THEN MAX(b.servicedate) ELSE DATEFROMPARTS(YEAR(b.servicedate), 12, 31) END
) AS days_stayed_with_us
FROM client AS c
LEFT JOIN billing AS b ON b.clientid = c.clientid
WHERE c.clientid = 1234567890
GROUP BY c.clientid, b.service, YEAR(b.timebillingservicedate)
ORDER BY c.clientid, b.service, YEAR(b.timebillingservicedate);