Today's Advent of Code challenge is really interesting. Somehow easy, but with a couple of interesting discussion points.
Day 3: Rucksack Reorganization is about helping elves to organize and prioritize their supplies.
I imported the input data as usual, by coping it from the website to my Azure Data Studio query and then using STRING_SPLIT to import each single line - that represents the rucksack content - in its own row:
declare @input nvarchar(max) = 'QLFdFCdlLcVqdvFLnFLSSShZwptfHHhfZZZpSwfmHp
rTJRjjbJTgzDJjdsRsfwtfNwtfmZpZNhmmzt
...
jGrGqjJfqccrfqGcGplrJpFvzggqmCtMzmsMnvMvvCgm';
drop table if exists dbo.ch03_input;
create table ch03_input
(
id int identity not null primary key,
items varchar(100) collate Latin1_General_BIN2
);
insert into
dbo.ch03_input (items)
select
trim(replace([value], char(13), ''))
from
string_split(@input, char(10))
go
The items in the rucksacks are identified by a letter and the identifiers are case sensitive. For this reason, I used the Latin1_General_BIN2
collation that will make sure I adhere to the requirements and get the best performance possible with strings, as explained in Day 2 challenge solution post.
Part 1
In the first part of the challenge, the rucksack list is split in two, and you must find the item type - represented by its letter - that is in both lists. It is a string comparison problem: which letter of the first list is also in the second list?
The first step is to split the list into two list of the same size:
select
*,
len(items) as itemcount,
left(items, len(items)/2) as comp1,
right(items, len(items)/2) as comp2
into
#step1
from
ch03_input;
I also calculate the length of string as it will come useful in the next step, where I'll split the rucksack string into its letters and store each letter in its own row:
select top(100) row_number() over (order by a.object_id) as n into #n from sys.columns a cross join sys.columns b
select
*,
substring(items, n, 1) as item
into
#step2
from
#step1 s
cross join
#n n
where
n.n <= s.itemcount
Splitting a string in its letters is easy if you have a table with numbers, which is what I'm creating as first thing in the query above. Then I use that numbers table to generate a row for each letter in the string, via the CROSS JOIN
and for each row generate extract the Nth
letter of the string. The WHERE
clause uses itemcount
to make sure that I generate exactly one row for each letter in each string, and no more than that.
Then I need to find which item is in both compartments. This means checking if a letter is in a string and that can be done using CHARINDEX
:
select distinct
id, items, comp1, comp2, item,
charindex(item, comp1, 1) as p1,
charindex(item, comp2, 1) as p2
into
#step3
from
#step2
where
charindex(item, comp1, 1) != 0
and
charindex(item, comp2, 1) != 0
order by id
The query needs a DISTINCT
as there can be more than one item of the same type in the string, I need just only one per type. The fact that I need to use a DISTINCT
rings some bells (or bring some smell): I'm fairly sure I can refactor my code to do this operation earlier and avoid checking for a letter that has been found already. I'll do this later if I have enough free time. For now, I want to see if my solution works; after that I can optimize it.
Now that the items present in both compartments have been found, I have to assign to each item type the priority value as described in the challenge. Priorities values are based on alphabetical order, so I can use ASCII
to get the letter value and transform it to the corresponding priority value. Priority values are ordered differently than the ASCII order, so a CASE
statement is needed to apply the right transformations:
select
item,
case
when item like '[a-z]' then ASCII(item) - ASCII('a') + 1
when item like '[A-Z]' then ASCII(item) - ASCII('A') + 27
end as priority,
id,
comp1,
comp2
into
#priorities
from
#step3
order by
item
And now just summing all the priorities will give the answer:
select sum(priority) from #priorities
Answer is correct, so let's move to the next part of the challenge.
Part 2
Elves gather in groups of three, and the goal is to find which item type is carried by everyone in the group.
The first step is to create a way to easily group the elves together. By using the existing id
columns and the modulo operator I can find when a new group begins. When the result of the modulo operation is equal to one:
I just need to know when a group starts, so I can set everything not equal to 1 to 0:
Now I can then use a simple and fast running total to generate a group_id
that will allow quick identification of all items in a single group. Amazing, isn't it?
Funny enough my SQL Guru friend Itzik mentioned this technique with the running total when we met yesterday evening. Funny that I would have needed it right the next day. Thanks Itzik!
The final query is the following:
select
*,
len(items) as itemcount,
sum(case when (id % 3) != 1 then 0 else 1 end) over (order by id) as group_id
into
#step1
from
ch03_input;
easy, fast, and elegant!
Once that a way to identify each group is there, the challenge is almost solved. It is just a matter of splitting the strings into their letters, as I did for part one too.
select
*,
substring(items, n, 1) as item
into
#step2
from
#step1 s
cross join
#n n
where
n.n <= s.itemcount
Now I have everything I need to see which letter is present in all three rucksacks. As simple GROUP BY
filtering only those letters that appears exactly three times by using the HAVING
clause will give the answer:
with cte as
(
select distinct id, group_id, item from #step2
)
select
group_id,
item
into
#step3
from
cte
group by
group_id, item
having
count(*) = 3
order by
group_id
The tricky part here is the DISTINCT
operator in the Common Table Expression. That DISTINCT
makes sure I can differentiate between an item appearing three times in the same rucksack vs an item appearing one item in all three rucksacks. We're interested only in the latter, and not in the first.
Now I just have to apply the same logic to get the priority value used before, calculate the overall total and I'll get the solution to Part 2. Challenge done.
Try it yourself
The full solution is available here: yorek/aoc-2022
The technique to deal with islands of data is useful in so many practical uses case that I really recommend you to deep dive into it. Take advantage of the free book chapter on the subject available here: Gaps and islands
Alternative solution to Part 2
An alternative solution could have been a simple JOIN between the three rucksacks in the same group:
select distinct
a.id as group_id,
a.item as item
into
#step3
from
#step2 a
inner join
#step2 b on a.id + 1 = b.id and a.item = b.item
inner join
#step2 c on b.id + 1 = c.id and b.item = c.item
where
a.id % 3 = 1
order by a.id
go
That would work just fine, but it will only work for a group of exactly three elves. While perfectly fitting the requirement, I find the chosen solution gives more flexibility and is way more elegant and future proof. Or agile if you wish. It requires a bit of lateral thinking, which is always a good ability to exercise, so great to have a chance to use it. Given that the resulting query touches the table only once instead of, I also suspect it will also be faster. It is worth digging into it a bit more if you have time.
Have fun!