C++ Bitsets in Competitive Programming

C++ Bitsets in Competitive Programming

Bitsets in C++ are a powerful and convenient tool which will often speed up your program around 30 times This first problem doesn’t require bitsets, but in a moment, we will move to a harder version Let’s say you hire N workers, for each of them you know their schedule for a month For example, they are available on second, fifth and sixth day of a month, You need to find 2 workers Let’s say to assign some project to them and their intersection of schedule should be as big as possible Days when they are both at work, for example If this is schedule of the first worker He’s available on 2nd, 3rd, 5th, 6th and 8th day, then 2nd and 3rd worker, for each pair we can compute their intersection. For those two, they will be both at work on days 2, 5 and 8 So their intersection is 3. For those two, I see only the second day, So intersection is 1,. This is already computed, For those two the intersection will be just 2 again So 1 day. This is the best intersections, so the answer is 3. If you iterate over all the pairs of workers, you will have O(N*N*time needed to find an intersection) And we are talking about a month, So at most 30 days and you can merge such two lists, You can find the intersection, for example using two pointers method It will be the linear in total, 30. Let’s say that this is in general, D the number of days and this complexity is slightly too slow If you watched the first lecture you should already know that a subset of some set of numbers from 1 to 30 can be represented as binary number of length 30 This schedule with 2 3 5 so on, I can represent as a binary number 011 and so on Where 0 means I’m not available on the first day of a month, than second, Yes, 3rd day Yes, 4th, 5th, 6th, 7th, 8th and then a lot of zeros till the 30th day Similarly, this will be 01011 and so on and this will be 11 for first and second day a lot of 0s, then eventually one for 10, 01 and so on. Intersection of two workers is now just bitwise AND between two numbers and can be computed in constant time, this way we get rid of D The complexity will be O(N*N). The whole solution is first change every set into a binary number stored as a single int Iterate over all the pairs of workers, for each of them compute bitwise AND, and in their bitwise AND, we need the number of ones The number of common days, how to get that number? The number of ones in binary representation of a number is called pop_count and in C++, there is a function for that It’s called __builtin_popcount of a number Remember if X is a long long or unsigned long long, you need ll. It will be __builtin_popcountll(x), the solution then becomes, for every Schedule a of one worker, for every b of some other worker you need to consider this builtin popcount of a and bitwise AND B and this operation works in constant time But what will happen if I change a month into a year in the problem statement? This time the number of days in the year D is let’s say up to 365 Single int is not enough to store that many bits It will be a bit inconvenient, but we can make a separate bitmask for every month because 30 bits will fit in single int and then we will have 12 such numbers, we can compute bitwise AND for each month and sum up those popcounts, we can do the same for unsigned long longs and in general the code will look like this let MAX_D be the number of days, then the number of unsigned long long’s I need, each with 64 bits is MAX_D/64 + 1 so to round it up and I create a big array, two-dimensional array. For each of n workers, I want K unsigned long long’s This way for each worker, I remember all MAX_D bits and Intersection of two workers is computed by iterating over all values from 0 to K-1, for each of them adding the built-in_popcountll of bitwise AND. The complexity this time is O(N*N*D/64) or 32 if you use unsigned ints, this value is around 6 So for the constraints given in the problem, it should be okay, but the code is very inconvenient It will be unpleasant in harder problems and bitwise shifts will be a huge pain Fortunately the code you’re seeing right now on the screen can be handled by bitsets and then it becomes just this Bitsets of some length MAX_D is a binary number with that many bits and it already handles bitwise operations and shifts you don’t need to split into groups of 32 or 64 bits. You don’t need to iterate over anything, Bitsets have you covered. The complexity is the same, D/64 for an intersection, more often It is written in fact as divided by 32 But what we should really write as complexity is D/W, word size on machine you’re working on, it is nowadays often 64 but when we talk about complexity we’re not talking about running something on a particular machine If we ever get more powerful machines that is able to compute something in constant time for two very big numbers Then this means W is bigger. Now in editorials of problems in competitive programming You will usually just see D/32. Let’s learn more about bitsets. This is how you create one This will be a bitset of size 365, In general this must be a constant You cannot read N and then say you create bit set of size n, compiler must know this size So it would split it into those chunks of some number of bits. Under the hood bitset is just an array of for example unsigned long longs and this is why compiler needs to know What is that array size. Instead of builtinpopcount you have a.count() like you saw already on the previous slide. You also can do a AND b OR, XOR and so on, all the bitwise operations and you can do a shifted by something or shifted to the other side by for example, 5. This will give you a bitset shifted by five just like a binary number Usually you should think about bitsets as longer binary masks or set that is able to give you a quick intersection, number of elements and a shift Let’s solve some problems, first, different numbers, You’re given up to 10,000,000 numbers Each is between 0 and 1,000,000,000, all integers, how many different values are there? We could put everything into a set and just then print set.size(), but it will be too slow It is N* logN and let’s say time limit is tight enough that it will not pass, unordered set is also quite slow even though it’s O(N) expected, But let’s try to find something faster, Take a moment now to solve this easy problem with bitset It’s almost okay to say that we should create a boolean array, say visited of size 1,000,000,000 10^9+1, the thing is a single boolean value, It takes the whole byte and such an array it will take 1,000,000,000 bytes, So around 1 GB. This is usually too much Fortunately for us Creating a bitset of the same size, It will consume that many bits so it will be 8 times more efficient a single bool will take a bit instead of a byte, you create a bitset of size 1,000,000,000, it will take 128 MB. 8 times more efficient and what we do if those numbers from the input we say if(! visited[x]) then Increase the answer by 1 and say that visited[x]=true The complexity is linear plus the time to create this bitset, but it is very fast Take a quick look at these two codes, Both of them have complexity O(V(1,000,000,000)/32) This is complexity to create those arrays also This is space complexity and it’s good enough plus N to read values and do something Remember in this second method here, visited.count() This complexity is O(size of bitset), here 1,000,000,000/32 and it’s good to know that this has very low constant factor Operations with bitsets and generally bitwise operations are very fast Lets’s of course remember that this will not work in general case when numbers are from 0 to 10^18 Then you cannot create that big bitset, Here are two more problems I will explain the statement and then give you a moment to think about them Knapsack or subset sum is about a set of numbers Say 2, 4, 7, 8 and some goal sum of values 14, we need to take some subset of numbers on the left without taking the same number twice so we cannot take 7 and 7, so there some of them would be 14, here it is possible because it’s 8 + 4 + 2 So we print yes There is a subset with the total sum exactly 14 Triangles in a graph, we are given graph with N up to 2,000 vertices, The number of edges is not limited So maybe there are edges between all pairs of vertices, We need to count triangles… triples of vertices (a,b,c) Such that all three vertices are connected In this graph, if I’m not mistaken, there are two triangles one is those three vertices. There are edges between them and the other is those three vertices, again there are edges between every pair of them, the brute force will obviously be in O(N*N*N) here, where we iterate over all the triples and check if they are connected and this is too slow You need to speed this up. You can pause the video and think about solutions to both problems and then get back to watching In knapsack there is quite standard N*W dynamic programming approach We create a boolean array with values true and false Only the first one is true first, false false, false and so on the size of this array is W+1 and the ith cell let’s say it’s called can[i] means if we can get a subset with the sum equal to i, at the beginning only for 0 we have true because we can make an empty subset with sum=0, now if there is some value let’s say 2 what we will do is we’ll say If there is a subset so far with sum i then now there is i+2 So if first value is 2 then because there is true here, We will also mark this one as true Let’s say the second value is 3 then because there is true here, We will make it true there and so the same here, so true and true and so on, at the end for every value from 0 to W We will know if there is subset with exactly this sum, you can analyze the code on the right. It’s crucial to think about this array can as a binary mask, it will be a number consisting of zeros and ones and adding a value It means that if there is one somewhere then we also want to say now There is one 3 positions to the right if the new value is 1 so this is equivalent to taking the number shifting it to the right by 3 and computing bitwise OR of two numbers Bitwise OR of those two will be 10110100 ?? those unnecessary bits because those will represent values W+1 and W+2, we are not interested in that and this is a new state of array can. This is new code using bitsets It’s only shorter, simpler and it is faster, because this time instead of having N*W For each of n elements we do some bitwise operations in complexity W/32. With bitsets we can still say can of something is true or one Just like we would do with a boolean array. But here after noticing that it is bitwise shift and computing OR, You can something simpler. Please note that we can even do this to just save some implementation time and maybe even running time because this operation for computer is a bit simpler than this one It’s better to say a+=b instead of a=a+b Triangles in a graph is a bit easier because we can directly say that there are some subsets so we can represent them as a bitset. Adjancency list for every vertex is a set of other vertices If vertex 1 is connected with 2, 4, 5, I can say that this will be bitset, 010110 where this means edge to 2, edge to 4, edge to 5. There is no edge from vertex to itself, for two It will be 101101 and so on, this is for now just an alternative representation of the input, instead of list of edges or adjacency list for every vertex, I can remember bitmask of adjacency for every vertex and thanks to that, I can now take two vertices and compute Well, what can I do? In particular I can compute the intersection the intersection of neighbors of 1 and 2 so this mask Bitwise AND this mask, it will find just one common bit on position 4, it will mean that 1 and 2 are both connected to 4, 1 & 2 are both connected to 4 This doesn’t mean that there is a triangle because we need one more edge Between 1 & 2 so the algorithm should be for every pair of vertices U and V, if U and V are connected with a direct edge then compute the bitwise AND of adjacency and add their popcount to the answer One last detail is, in the end, You should say that you print answer/3. Because every triangle you will compute 3 times, if there is triangle ABC, for a pair AB you will detect that there is C in common neighborhood, the common adjacency list then for AC and for BC, like in our example for 1, 2 you detected that there are edges to 4 and also There is edge between 1 and 2 but also for a pair 1, 4 for you detected that there is common bit 2 Here and there and there is such between one and four So you add to the answer plus one and so on and so on so every triangle is computed three times In the video description You will find a link to my codeforces article about the same topic, where you will find more problems and bonuses if you want to practice bitsets more. I hope you enjoyed this lecture, consider subscribing and turning on notifications on this channel not to miss future videos. Bye 🙂

32 thoughts to “C++ Bitsets in Competitive Programming”

  1. Also, you can print a bitset and you can construct it from string or integer. In particular, you can easily print binary representation of number 123 this way (where 15 is arbitrary number of bits):
    cout << bitset<15>(123);

  2. Amazing, using binary representation is so overpowered, I am glad we invented computers. Great explanations. The knapsack example is awesome.
    In the examples you show, I think you have this pattern of finding the bitset representation of some kind of "visted set" that helps you reason on bit operations rather than integers. I wonder if it is a pattern you can relate to when searching for algorithms to problems in C++ in general.

  3. Hi, I have a (probably stupid) question…
    I'm using Dev C++ on Windows, and I can only initialize a 500-million bitset; when I tries to initialize a 1-billion bitset I receive an "out of memory" warning.

    I've previously learnt how to expand stack size for dfs recursion using -Wl,–stack=(STACK_SIZE) but apparently it doesn't help in this case.
    Have you heard of this situation before/know how to solve it ? Thank you for reading !

  4. too smart lol.
    I learned nice tricks about bitsets, thank u.
    How come the C++ compiler doesnt allow for arrays or bitsets to be initialized with a variable size, e.g. the size has to be constant at compile time?
    This is a bit restricting because more often than not, the input size is variable.

  5. hey i have a homework problem, given n, imagine we have 2 plates, put n on the 1st plate. We can pick multiple weights of 3^k, k = 0,1,2…. (k >=0) and put on the two plates, (plate 1 has n already, and plate 2 is empty), so that the 2 plates are balanced. the constraint is that we cannot use weight of any kind more than once, and n is <= 10^9 . Input is n, output on 2 lines, the first number on the first line is the number of weights, followed by specific values, similarly on line 2 . For example:
    2 3 9

    1 81

    (plate1) 69 + 3 + 9 = 81 (plate2). I'd really appreciate your help, thanks

  6. In the second problem, is it okay to do "can << x"? Or we have a condition that x ALWAYS greater or equal to can?

  7. in the last problem the bruteforce complexity was N*N*N but the solution is also in N*N*N is it not? we compute each node with all the others

  8. The knapsack solution is so cool especially optimising the for statement with the shift. Never thought of that before.

  9. always fun to watch your videos and learn at the same time. Very systematic and lucid way of teaching stuffs. I just wish your channel keeps on growing Errichto. You are doing a terrific job !!

  10. This content is awesome. 😃😀 For anyone preparing for competitive programming or even maybe for ds and algo for interviews.

Leave a Reply

Your email address will not be published. Required fields are marked *