Advent of Code 2020 - Day 4

Day 4 of AoC 2020 (Passport Processing) is a fairly straightforward parsing problem. As with the previous posts in this series, we’ll use Python for the task. Spoilers lurk ahead.

Because the input is made up of multiple records separated by blank lines, we can use the following snippet to build the passport database.

from collections import defaultdict

with open('input') as datafile:
    # Split on '\n\n' to separate the individual records
    data = datafile.read().split('\n\n')

passports = []
for pp in data:
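    # A defaultdict whose factory returns None lets the later validators read
    # a missing field as None instead of raising KeyError. Note that such a
    # lookup also stores the key, so run the Part 1 check before Part 2.
    newpp = defaultdict(lambda: None)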
    for elem in pp.split():
        # elem is the k:v pair which we need to split further
        k, v = elem.split(':')
        newpp[k] = v

    passports.append(newpp)
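
To see the parsing step in isolation, here is a minimal sketch that runs the same logic on an in-memory string instead of a file. The `parse_passports` helper and the two sample records are made up for illustration; they are not taken from the puzzle input.

```python
from collections import defaultdict

# Made-up sample in the puzzle's format: two records separated by a blank
# line, fields separated by spaces or newlines. The second one lacks hgt.
sample = (
    "byr:1980 iyr:2015 eyr:2025 hgt:170cm\n"
    "hcl:#a1b2c3 ecl:brn pid:012345678\n"
    "\n"
    "byr:1990 iyr:2012\n"
    "eyr:2022 hcl:#123abc ecl:blu pid:123456789 cid:42\n"
)

def parse_passports(text):
    """Hypothetical helper: turn the raw text into a list of field mappings."""
    records = []
    for block in text.split('\n\n'):
        record = defaultdict(lambda: None)
        # split() with no arguments splits on spaces and newlines alike
        for elem in block.split():
            k, v = elem.split(':')
            record[k] = v
        records.append(record)
    return records

sample_passports = parse_passports(sample)
print(len(sample_passports))        # 2
print(sample_passports[0]['hgt'])   # 170cm
print(sample_passports[1]['hgt'])   # None (field is missing)
```

Because `split()` treats any run of whitespace as a separator, it does not matter whether a record's fields sit on one line or several.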

Part 1 only requires us to verify that each passport has all of the required fields present; the cid field is optional. We can use the following snippet for Part 1.

def valid_part1(pp):
    return all(k in pp for k in ['byr', 'iyr', 'eyr', 'hgt', 'hcl', 'ecl', 'pid'])

count_valid_part1 = [valid_part1(pp) for pp in passports].count(True)
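
As an aside, the same presence check can be phrased as a set comparison. This is only an alternative sketch, and the names `REQUIRED_FIELDS` and `has_required_fields` are hypothetical; plain dicts stand in for the parsed passports here.

```python
# The seven mandatory fields; cid is deliberately left out
REQUIRED_FIELDS = {'byr', 'iyr', 'eyr', 'hgt', 'hcl', 'ecl', 'pid'}

def has_required_fields(pp):
    # issubset accepts any iterable; iterating a dict yields its keys
    return REQUIRED_FIELDS.issubset(pp)

complete = {'byr': '1980', 'iyr': '2015', 'eyr': '2025', 'hgt': '170cm',
            'hcl': '#a1b2c3', 'ecl': 'brn', 'pid': '012345678'}
missing_hgt = {k: v for k, v in complete.items() if k != 'hgt'}

print(has_required_fields(complete))     # True
print(has_required_fields(missing_hgt))  # False
```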

Part 2 adds data validation on top of that: some fields must fall within a specified range, while others must match a pattern.

For the hair color field (hcl), we need it to match the pattern #[0-9a-f]{6}. Similarly, for the passport ID field (pid), it needs to have exactly 9 digits. We can use the re module to verify the patterns.

import re

hcl_regex = re.compile(r'#[0-9a-f]{6}')
def valid_hcl(pp):
    hcl = pp['hcl']
    return hcl is not None and hcl_regex.fullmatch(hcl) is not None

pid_regex = re.compile(r'[0-9]{9}')
def valid_pid(pp):
    pid = pp['pid']
    return pid is not None and pid_regex.fullmatch(pid) is not None
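
It is worth seeing why `fullmatch` is used here rather than `match`. The short sketch below compares the two; the variable names are only for illustration.

```python
import re

pid_pattern = re.compile(r'[0-9]{9}')
hcl_pattern = re.compile(r'#[0-9a-f]{6}')

# match() only anchors at the start, so a 10-digit value slips through
ten_digits_match = pid_pattern.match('0123456789') is not None
# fullmatch() requires the whole string to match the pattern
ten_digits_full = pid_pattern.fullmatch('0123456789') is not None
nine_digits_full = pid_pattern.fullmatch('012345678') is not None

print(ten_digits_match)   # True
print(ten_digits_full)    # False
print(nine_digits_full)   # True

print(hcl_pattern.fullmatch('#123abc') is not None)  # True
print(hcl_pattern.fullmatch('#123abz') is not None)  # False: z is not a hex digit
```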

For the eye color, we need it to be one of a set of values.

ecl_set = set(['amb', 'blu', 'brn', 'gry', 'grn', 'hzl', 'oth'])
def valid_ecl(pp):
    return pp['ecl'] in ecl_set

For the fields that need to be within a range, we need to convert them to integers to compare them. We will create a helper function to deal with this.

def valid_range(field, lo, hi):
    try:
        field = int(field)
    except (ValueError, TypeError):
        # If field is None, or has characters other than 0-9, then it will
        # fail integer conversion, and therefore is not in the range
        return False

    return lo <= field <= hi
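
A few sample calls show how the helper behaves at the edges. The function is repeated so the snippet runs on its own, and the inputs are made up.

```python
def valid_range(field, lo, hi):
    try:
        field = int(field)
    except (ValueError, TypeError):
        return False
    return lo <= field <= hi

print(valid_range('2002', 1920, 2002))  # True: both bounds are inclusive
print(valid_range('2003', 1920, 2002))  # False
print(valid_range('19x0', 1920, 2002))  # False: int() raises ValueError
print(valid_range(None, 1920, 2002))    # False: int() raises TypeError
```

One caveat: `int()` is a little more lenient than "digits only", since it also accepts a leading sign or surrounding whitespace (for example `'+2002'`). Whether that matters depends on the input; a stricter variant could check `field.isdigit()` first.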

We can now validate the byr, iyr and eyr fields by calling valid_range directly. For the hgt field, we need some additional steps to check that the last two characters are a valid measurement unit, and based on that, validate the remaining numbers within the expected range.

def valid_hgt(pp):
    hgt = pp['hgt']
    if hgt is None:
        return False

    unit = hgt[-2:] # Get the last 2 characters
    if unit == 'cm':
        return valid_range(hgt[:-2], 150, 193)
    elif unit == 'in':
        return valid_range(hgt[:-2], 59, 76)

    return False
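
For comparison, the height check can also be written with a regular expression that captures the number and the unit in one go. This is an alternative sketch rather than the approach used above; `HGT_PATTERN`, `HGT_LIMITS` and `valid_hgt_regex` are hypothetical names, and the function takes the raw field value instead of the whole passport.

```python
import re

# One or more digits followed by exactly one of the two allowed units
HGT_PATTERN = re.compile(r'([0-9]+)(cm|in)')
# Inclusive limits per unit, as given in the Part 2 rules
HGT_LIMITS = {'cm': (150, 193), 'in': (59, 76)}

def valid_hgt_regex(hgt):
    # Treat a missing field (None) as an empty string, which cannot match
    m = HGT_PATTERN.fullmatch(hgt or '')
    if m is None:
        return False
    lo, hi = HGT_LIMITS[m.group(2)]
    return lo <= int(m.group(1)) <= hi

print(valid_hgt_regex('190cm'))  # True
print(valid_hgt_regex('190in'))  # False: too tall for inches
print(valid_hgt_regex('60in'))   # True
print(valid_hgt_regex('190'))    # False: no unit
print(valid_hgt_regex(None))     # False: field missing
```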

We now have enough to validate as per the Part 2 rules.

def valid_part2(pp):
    # all() takes a single iterable, so the checks are collected in a list
    return all([
        valid_range(pp['byr'], 1920, 2002),
        valid_range(pp['iyr'], 2010, 2020),
        valid_range(pp['eyr'], 2020, 2030),
        valid_hgt(pp),
        valid_hcl(pp),
        valid_ecl(pp),
        valid_pid(pp),
    ])

count_valid_part2 = [valid_part2(pp) for pp in passports].count(True)