Ian Byrd


Competitive programming 101

TL;DR: It’s not for light-hearted, boys, it’s absolutely not. The good thing is, though, you get much better as you suffer and at some point you’ll see how you started noticing various bugs everywhere (including your real-life codebases) and how your vision on code has changed and how you approach it differently. In the end, it’s rather good than evil.


Good morning, lads! Today I’d like to talk about a subject of a general hatred (especially, amongst the software engineers), competitive programming. Some people would tell you it’s important to do contests, other would say it’s a waste of time. The thing is you usually need some CS knowledge to pass tech interviews to some cool companies and, frankly speaking, believe it or not, computer science algorithms won’t hurt you. Solving algorithmic problems works great and keeps your head turned. Let me prove it to you. But just before we dive into the world of pain and misery, let me put a little disclaimer here. I’m not advanced. That’s it. I’m not one of your TOP CLASS boys with deep expertise in algorithms / data structures / maths / whatever. Yet according to rating contests I did on HackerRank, I rank good above average, so you know I’m not total bullshitting either. You ready, lads? Go.

Introduction

Okai, what’s a contest in the first place? It’s a bunch of clever boys gather around and solve various algorithmic challenges for fun’s sake. That’s what they tell you. In reality, you struggle, sometimes get irritated, sometimes, miserable… But hold on, we’re gonna come back to it a little bit later! For short (1-3 hrs) contests, problem set usually consists of 3-5 problems tops. You read one, you fuck with it for a while, you hack solution to it, you submit it and you go for the next one. Chances are if you’re reading this article, you won’t be able to solve much at first, but as time goes by, you learn and become smart enough to chop them all (I’m not there yet).

What is a problem? Answer is, it’s some algorithmic challenge. It involves either looking for some paths (graphs), counting permutations (combinatorics), data transformations. Sometimes given input you have to tell whether it’s legit or not, etc. What does it mean to solve a problem? Well, you have to write code that solves it (produces valid output for input given) in some programming language. Submitted code goes through a set of automated tests, supposed to cover all possible scenarios and corner cases within some constraints (limits to input data capacity). Unless you covered them all, you’ll pass with partial score or not going to pass at all.

There are limits. Oh yeah, there are limits. Usually the one you care is time limit (a.k.a TL), but you have to know that there is always some sort of a memory limit as well. Why would they put a time limit in the first place? Well, you can approach most problems differently and depending on the route you take, you’ll end up with one algorithm or another. Slower ones take more time to run, while more efficient ones do much better. We use big-O notation, which helps to estimate algorithmic complexity of particular algorithm. Once you know it, you more-or-less know how fast you gonna be regardless of underlying hardware. Usually your job boils down to evaluating constraints carefully and keeping them in mind while developing an end algorithm. Sometimes you guess the right approach just after looking at the problem constraints and already know pretty much what you have to do.

Most and most of the problem solvers use either C++ or Java for performance reasons. Python is pretty slow and haven’t got TCO, making it a really bad choice for most of the problems (you can save some time on easier tasks with it though). Ideally you’d better stick to C++, since it offers the best performance (zero-level abstractions) and flexibility (STL) during the contest. If you’re afraid or pissed of C++, don’t be. Competitive programming is where it shines. You’re really unlikely to shoot yourself in the foot while doing this sort of routine.

Problems

I guess now you are more or less familiar with main concepts of competitive programming and how it’s supposed to work. Let’s solve some easy peasy problem and see how it works out, alright? HackRank offers a couple subdomains, let’s look which one suits us most:

  • Implementation. This one focuses on things not trivial to implement in code. Nothing particularly complex, just makes you think of an elegant implementation to the problem. Usually, that’s it. I find it more or less practical when it comes to benefits of competitive programming.
  • Strings. As a computer programmer (or whoever you are really), you must be working with strings often and most certainly familiar with most naive string transformations. Turns out, there’s much much more. It’s nearly as practical as Implementation domain, but I find it much more demonstrative. You gotta check this out if you wanna understand how e.g. regular expressions work under the hood.
  • Search. You gotta implement the search algorithm for some data structure. Turns out, usually you can answer various question much-much more efficiently than you realize. Hash tables, search trees, segment trees… You must check this out if you’re interested how file systems and databases work under the hood.
  • Graph theory. It studies pairwise relations between objects. Absolutely any objects. All of a sudden, that’s pretty much what we do as software engineers. Graphs are literally everywhere: network flows, decomposition problems, routes, covering problems, etc. Being aware of these problems and possible solutions to them is really useful. Ever heard of graph databases? Neural networks? There we go…
  • Greedy. This class of algorithms basically helps us solve some problems of combinatorial optimization. It’s fairly impractical, but you’re likely to run into some of these problems eventually. Greedy algorithms is a fast and reliable way of yielding the global optimum for certain non-trivial problems.
  • Dynamic programming. This class of algorithms takes advantage of the fact that you can formalize certain (real-life) problems recurrently. Thus, instead of calculating every required subproblem over and over again, you can remember (memoization) them and therefore, avoid duplicate calculations. DP algorithms are incredibly effective when it comes to memory and overall performance. It’s a must-have tool in the shed.

I’m not fond of easy and problems and you can’t do most of the search or graph algorithms without sufficient background (it’s 101, remember?). I’d choose implementation and array splitting problem. Seems like a good place to start from.

Hack!

You better read the problem (it’s misclassified as DP) on HackerRank, but I’m gonna quote it anyway:

Nikita just came up with a new array game. The rules are as follows:

  • Initially, there is an array A, containing N integers.
  • In each move, Nikita must partition the array into 2 non-empty parts such that the sum of the elements in the left partition is equal to the sum of the elements in the right partition. If Nikita can make such a move, she gets 1 point; otherwise, the game ends.
  • After each successful move, Nikita discards either the left partition or the right partition and continues playing by using the remaining partition as array .

Nikita loves this game and wants your help getting the best score possible. Given , can you find and print the maximum number of points she can score?

Here comes the example (from the problem page). Say the following array is given:

Nikita splits the array into 2 partitions having equal sums, and then discards the left partition:

She then splits the new array into 2 partitions having equal sums again, and then discards the left partition:

At this point the array only has 1 element and can no longer be partitioned, so the game ends. Because Nikita successfully split the array twice, she gets 2 points and that’s what we gonna print out.

Read the problem statement carefully and take a look at I/O format, constraints and scoring please. You’ll find out that it’s actually N ≤ 214 we are talking, which approximates to 104 (slightly more actually), 10 thousand, while each and every single number may go up to 109 = a million. Now think about it for a while (or even try to solve and submit!) and once you done, come back.

Alright, so after a little inspection it seems that whatever naive algorithm we’re gonna come up with is gonna be fine: sum of all given numbers won’t ever exceed 1014, which is not that much really. As long as it fits into 64-bit integer nicely, we’re fine. We’re gonna use recursion to solve this problem. Let’s imagine there is some function solve(A) that solves the problem for an A (how many times the given subarray can be split). Then the code for our solution boils down to (code in C++):

vector<int> A;

int main() {
    // number of tests
    int T;
    cin >> T;

    while (T--) {
        int N;
        cin >> N;
        A.resize(N);
        for (int i = 0; i < N; i++)
            cin >> A[i];
        cout << solve(0, N) << endl;
    }
}

Looks great, innit? Beautiful and concise! A crucial part is missing though… the algorithm. Time to figure it out!

  • Step 0. If length of array ≤ 1, return 0 (you can’t split array of one element).
  • Step 1. Count the total sum of numbers in array.
  • Step 2. Go over each element and accumulate partial sum unless partial*2 equals to total, which is a split condition!
  • Step 3. Return 1 + max(solve(l, i+1), solve(i+1, r)), where i is the split position. We choose the better result and add 1 because we just made a split.

Using this strategy, we always will end up with the best number of splits possible. Let’s write the solve(A) function:

// int64 is so much better name :)
typedef long long int64;

int solve(int l, int r) {
    if (r-l <= 1)
        return 0;

  	// sum of array
    int64 total = accumulate(A.begin()+l, A.begin()+r, 0LL);
 	// you can't split odd sum of
    if (total % 2 == 1)
        return 0;

  	int64 partial = 0;
    for (int i = l; i < r; i++) {
        partial += A[i];
        if (partial * 2 == total) {
            return 1 + max(solve(l, i+1), solve(i+1, r));
        }
    }

    return 0;
}

Try to estimate algorithmic complexity of this algorithm on your own.

Pain

I promised to tell you about the pain… and misery. Imagine you’ve been solving some weird bad ass kick ass problems for two hours long and it just won’t pass because of a single character (say you put = instead of <= somewhere). Happened to me a hundred times. Imagine it’s seconds until the end of the contest and you’re literally 2 seconds late with your submission. Imagine submitting your solution over and over again and it just won’t pass because there’s always a WA somewhere. It’s not for light-hearted, boys, it’s absolutely not. The good thing is, though, you get much better as you suffer and at some point you’ll see how you start noticing various bugs everywhere (including your real-life codebases, including your code review process) and how your vision on code has changed. In the end, it’s rather good than evil.

Reading

I recommend to get yourself a copy Introduction to algorithms (CLRS) and start reading chapters you find interesting / would like to know more about. It’s an absolute must-have for computer science students and just fellows interested in algorithms and how algorithms can save the world. Related topic on Quora might be a good place to start if you wanna get into it, since it probably covers most of the issues you’re likely to run into.

You love it or you hate it.