Algorithm for efficient validation of EAN-13 and other GTIN numbers

Author:

I recently worked on a task where I was asked to write an API endpoint for uploading and saving large lists of EAN-13 numbers.

Since I was not familiar with EAN numbers, other than knowing they are barcode related, I immediately went to my favorite search engine and started looking for descriptions. The wiki article here gave me some basic knowledge about EAN numbers before I began. Next, as the lazy programmer I am, I of course went to Stack Overflow to find solutions written in C#, and found this article.

While the solutions offered there do solve the problem of validation, they looked inefficient to me. The solution with the most votes would, in my specific case, do an extra string allocation and one redundant math operation. With large lists, I didn't like the look of that.

It also turned out that, on my machine, using a regex to check whether a string contains only digits is slower than just looking at the individual chars.

Since the answers in the Stack Overflow article were posted, there have also been multiple improvements to the language itself, so I decided to write my own implementation to see if I could make it faster. I don't want to make the user calling the endpoint wait longer than they need to.

Luckily the validation process is quite simple.

The checksum is calculated as sum of products – taking an alternating weight value (3 or 1) times the value of each data digit. The checksum digit is the digit, which must be added to this checksum to get a number divisible by 10 (i.e. the additive inverse of the checksum, modulo 10).[7] See ISBN-13 check digit calculation for a more extensive description and algorithm. The Global Location Number (GLN) also uses the same method.

https://en.wikipedia.org/wiki/International_Article_Number#Calculation_of_checksum_digit

See the link above for more details.

Starting from the right and leaving out the check digit, multiply every digit in an odd position by 3 and every digit in an even position by 1, then add up the products. Finally, add the check digit to that sum. The number is valid if the total modulo 10 is 0.

E.g.
Take this EAN-8: 73513537
The last digit, 7, is the check digit. The calculation looks like this.

position                     7   6   5   4   3   2   1
first 7 digits of barcode    7   3   5   1   3   5   3
weight                       3   1   3   1   3   1   3
partial sum                 21   3  15   1   9   5   9
checksum                    63
https://en.wikipedia.org/wiki/International_Article_Number#Calculation_examples

Lastly, you add the check digit, 7, and end up with 70. 70 modulo 10 == 0, so this is a valid EAN-8.
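The same arithmetic can also be turned around to compute the check digit from the data digits. The following is only a minimal sketch: the class and method names (GtinCheckDigit, Compute) are made up for illustration and are not part of the validation method shown later, and the sketch assumes that only the ASCII digits 0-9 are acceptable input.

```csharp
using System;

public static class GtinCheckDigit
{
    // Hypothetical helper: computes the check digit for the data digits
    // of a GTIN, i.e. every digit except the last one.
    public static int Compute(ReadOnlySpan<char> dataDigits)
    {
        var sum = 0;
        var weight = 3; // the rightmost data digit always gets weight 3

        for (var i = dataDigits.Length - 1; i >= 0; i--)
        {
            var c = dataDigits[i];
            if (c < '0' || c > '9')
            {
                // Assumption: anything other than 0-9 is treated as a caller error.
                throw new ArgumentException("Only the digits 0-9 are allowed.", nameof(dataDigits));
            }

            sum += (c - '0') * weight;
            weight = 4 - weight; // alternate between 3 and 1
        }

        // The check digit lifts the weighted sum to the next multiple of 10.
        return (10 - sum % 10) % 10;
    }
}
```

For the EAN-8 example above, GtinCheckDigit.Compute("7351353") gives a weighted sum of 63 and therefore returns 7.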

I decided that I wanted a method that would work for multiple GTIN formats. Since SSCC numbers follow the same rule, it can be used to validate those as well.

This is what I came up with.

private static bool IsValidGtin(ReadOnlySpan<char> input, byte length)
{
    // Reject anything that does not have the expected number of characters.
    if (input.Length != length)
    {
        return false;
    }

    // The last character is the check digit.
    if (!char.IsDigit(input[^1]))
    {
        return false;
    }

    var sum = 0d;
    var multiplyByThree = true;
    var inputWithoutCheckDigit = input[..^1];

    // Walk the data digits from right to left, alternating the weights 3 and 1.
    for (var i = inputWithoutCheckDigit.Length - 1; i >= 0; i--)
    {
        var currentChar = inputWithoutCheckDigit[i];
        if (!char.IsDigit(currentChar))
        {
            return false;
        }

        var value = char.GetNumericValue(currentChar);
        if (multiplyByThree)
        {
            sum += value * 3;
        }
        else
        {
            sum += value;
        }

        multiplyByThree = !multiplyByThree;
    }

    // Valid when the weighted sum plus the check digit is a multiple of 10.
    var checkDigit = char.GetNumericValue(input[^1]);
    return (sum + checkDigit) % 10 == 0;
}

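This assumes that you are using nullable reference types, so that the compiler warns you about passing null. Even without them, a null string converts implicitly to an empty ReadOnlySpan<char>, which the length check rejects, so you only need to add a null check if you want null to be handled differently from any other invalid input.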
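One detail of IsValidGtin is worth knowing: char.IsDigit returns true for every Unicode decimal digit, not only 0-9, and char.GetNumericValue returns a double. If you only want to accept ASCII digits, a variant along the following lines might be an option. It is a sketch under that assumption, with made-up names (GtinValidator, IsValidGtinAscii), and I have not measured whether it is any faster.

```csharp
using System;

public static class GtinValidator
{
    // Hypothetical variant of IsValidGtin: ASCII digits only, integer arithmetic.
    public static bool IsValidGtinAscii(ReadOnlySpan<char> input, int length)
    {
        // A GTIN needs at least one data digit plus the check digit.
        if (length < 2 || input.Length != length)
        {
            return false;
        }

        var sum = 0;
        var weight = 1; // the check digit itself counts with weight 1

        // Walk from right to left; the weights alternate 1, 3, 1, 3, ...
        for (var i = input.Length - 1; i >= 0; i--)
        {
            var c = input[i];
            if (c < '0' || c > '9')
            {
                return false; // rejects letters, symbols and non-ASCII digits
            }

            sum += (c - '0') * weight;
            weight = 4 - weight;
        }

        return sum % 10 == 0;
    }
}
```

Because the check digit is included in the loop with weight 1, the final check is simply whether the total is divisible by 10, which is the same rule as described above.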

IsValidGtin also leaves it up to you to add a public method that enforces a valid length for the type(s) of GTIN you want to validate.

public static bool IsValidEAN13(string input) => IsValidGtin(input, 13);
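If you need more than EAN-13, the same pattern extends to the other lengths. The sketch below assumes the standard GTIN lengths of 8, 12, 13 and 14 digits and the 18-digit SSCC. The class name Gtin and the wrapper names are hypothetical, and IsValidGtin is repeated in condensed form only so that the example compiles on its own.

```csharp
using System;

public static class Gtin
{
    // Hypothetical wrappers, one per format.
    public static bool IsValidGtin8(string input) => IsValidGtin(input, 8);   // EAN-8
    public static bool IsValidGtin12(string input) => IsValidGtin(input, 12); // UPC-A
    public static bool IsValidGtin13(string input) => IsValidGtin(input, 13); // EAN-13
    public static bool IsValidGtin14(string input) => IsValidGtin(input, 14);
    public static bool IsValidSscc(string input) => IsValidGtin(input, 18);

    // Accepts any of the four GTIN lengths.
    public static bool IsValidAnyGtin(string input)
    {
        if (input is null)
        {
            return false;
        }

        return input.Length switch
        {
            8 or 12 or 13 or 14 => IsValidGtin(input, (byte)input.Length),
            _ => false
        };
    }

    // Condensed copy of the method from this post.
    private static bool IsValidGtin(ReadOnlySpan<char> input, byte length)
    {
        if (input.Length != length || !char.IsDigit(input[^1]))
        {
            return false;
        }

        var sum = 0d;
        var multiplyByThree = true;
        var inputWithoutCheckDigit = input[..^1];
        for (var i = inputWithoutCheckDigit.Length - 1; i >= 0; i--)
        {
            var currentChar = inputWithoutCheckDigit[i];
            if (!char.IsDigit(currentChar))
            {
                return false;
            }

            sum += char.GetNumericValue(currentChar) * (multiplyByThree ? 3 : 1);
            multiplyByThree = !multiplyByThree;
        }

        return (sum + char.GetNumericValue(input[^1])) % 10 == 0;
    }
}
```

Keeping IsValidGtin private means callers cannot pass an arbitrary length, so only the lengths you have explicitly exposed can ever validate.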

I ran some crude tests using the EAN-13 numbers available here. Compared with the most upvoted solution on Stack Overflow, my solution showed a big improvement in processing time.

I compiled a release build targeting net5.0 and ran both my code and the code I got from Stack Overflow on my 6-year-old Lenovo G70-70 laptop.

It took the Stack Overflow version 47 ms to validate the list of 8851 EAN-13 numbers, while it took my solution 2 ms.

My performance test is shown below.

// Requires: using System; using System.Diagnostics; using System.IO; using System.Linq;
static void Main()
{
    var eanNumbers = File
        .ReadAllLines(Path.Combine(Environment.CurrentDirectory, "ean13.txt"))
        .Select(ean => ean.Trim())
        .ToList();
    var invalid = 0;
    var valid = 0;
    const int ean13Length = 13;
    var stopWatch = Stopwatch.StartNew();
    foreach (var eanNumber in eanNumbers)
    {
        if (IsValidGtin(eanNumber, ean13Length))
        {
            valid++;
        }
        else
        {
            invalid++;
        }
    }
    stopWatch.Stop();

    Console.WriteLine($"Invalid numbers: {invalid}");
    Console.WriteLine($"Valid numbers: {valid}");
    Console.WriteLine($"Total processing time: {stopWatch.ElapsedMilliseconds} ms");
}
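A single timed pass like this includes JIT compilation and is sensitive to noise, which is part of why I call the test crude. One way to make it a little less crude, sketched here with made-up names (CrudeBenchmark, Measure), is to run an untimed warm-up pass first and then average over several timed passes. This is still not a rigorous benchmark, and a dedicated benchmarking library would likely give more trustworthy numbers.

```csharp
using System;
using System.Diagnostics;

public static class CrudeBenchmark
{
    // Hypothetical helper: runs one untimed warm-up pass, then times
    // `iterations` passes and returns the average time per pass together
    // with the number of valid entries found in a single pass.
    public static (TimeSpan AveragePerPass, int ValidCount) Measure(
        Func<string, bool> validator, string[] numbers, int iterations)
    {
        if (iterations <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(iterations));
        }

        // Warm-up pass so that JIT compilation is not part of the measurement.
        foreach (var number in numbers)
        {
            validator(number);
        }

        var valid = 0;
        var stopwatch = Stopwatch.StartNew();
        for (var pass = 0; pass < iterations; pass++)
        {
            valid = 0;
            foreach (var number in numbers)
            {
                if (validator(number))
                {
                    valid++;
                }
            }
        }
        stopwatch.Stop();

        return (TimeSpan.FromTicks(stopwatch.Elapsed.Ticks / iterations), valid);
    }
}
```

Inside the class that contains IsValidGtin, it could be called as CrudeBenchmark.Measure(ean => IsValidGtin(ean, 13), eanNumbers.ToArray(), 100), where the 100 passes are an arbitrary choice.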

In the end, I prefer my solution. Not just because it's faster, but also because I think it's easier to read and reason about.