Change language

Working with strings in Python. Preparing for an interview: example tasks

| | | | | | | | | | | |

Working with strings in Python. In the first part of this article, we looked at what kind of string operations might be needed in a job interview. Today, we're going to go a little deeper and look at the questions and tasks you might be asked.

Assessing the complexity of the solution

First, make sure you understand what asymptotic complexity O(n) is. In a nutshell: it's an estimate from the top of your algorithm compared to typical functions: 1, log n, n, n log n, n^2, n^3, 2^n, n!

For example, getting an element in an array by index has a complexity of O(1), that is, it takes a constant amount of time, independent of the size of the array.

Finding an array element by value has complexity O(n), because you need to loop through n elements.

The complexity is evaluated to a constant, i.e. O(n/2) = O(n).

The older terms override the younger ones: O(n^2 + n) = O(n^2).

Two types of complexity

Every algorithm has a temporal and a computational complexity - how many operations need to be performed. The algorithm also has a space complexity - how much additional RAM it needs to perform.

Very often these two values are interrelated. For example, you may be asked to reduce the memory cost, but you will have to pay for it with more calculations.

If you see two solutions with different time/memory trade-offs, make sure you talk to your interviewer about which one is preferable. The interviewer evaluates how you think. So making it clear that you are making an informed choice at the fork is often even more important than turning in the right direction.

Complexity of string operations

There is no need to memorise the complexity of each operation. Suffice it to remember that most string operations "under the bonnet" do character-by-character processing. If each character is processed in a constant O(1) time, the resulting time complexity will be O(n).

The trick is to figure out:

  • How many iterations of the loop do we have to go through to get the result? In other words, what is n?
  • In which cases the loop is not necessary, or conversely, not sufficient.
ActionComplexity
a == bO(n), where n stands for the smaller string's length
a.startswith(b)O(b)
s = «».join(arr)O(s)
s = a + bO(s)
s.split(«, «)O(s)
b = a[k]O(1)
b = a[-k:]O(k)
b = aO(1), because we are simply creating a second pointer to the same string
b = a[:]O(a). This is a trick example. This trick is often used to create an independent copy of a list. It is possible to create copies of strings in this way, but pointless, because the strings are immutable. In addition to time, we also waste extra O(n) of memory.
a.find(b)A special algorithm is used to find the substring, the efficiency of which depends on the data. On average, you can expect O(a) on random strings. In the worst case, O(a*b), like a trivial search (in the loop on a we compare the substring with b)
re.match(expression, a)Regular expressions are slower than searches, but the dependence on string length is usually linear, i.e. O(a).

Problem solving

The best preparation for solving problems is to solve them yourself. On Hackerrank or LeetCode, you can also find topic-specific problems (about strings, of course!).

Important points in preparation:

  • Move from simple tasks to complex ones.
  • Time it out. Your solution should take at most 30 minutes.
  • Speak the solution out loud: find a partner or get a rubber duck. The interviewer is not only concerned with the result, he is assessing your approach to problem solving. Don't rely on his or her telepathic skills, go for it.
  • First make sure you understand the problem, then tell the solution algorithm in words, and only then start writing code.
  • A sub-optimal solution is better than no solution. You can come up with a rough algorithm first, and then improve it later.

Example 1. Palindromes

Write a function that checks if a phrase is a palindrome (e.g. "radar").

Solution 1. Reverse the string and compare it with the original string.

>>> s = "racecar"
>>> s == s[::-1]
True

How efficient is it? Addressing a string costs O(n) in time and O(n) extra memory. String comparison is also O(n).

Total: O(n) in time, O(n) in memory. Not bad, but there is potential for improvement.

But the higher priority question is how right is it? For the palindrome "and the rose fell on Azor's paw" the lines will no longer match, is that ok?

This is a question for the client, i.e. the interviewer. You can formulate the requirements more clearly: should we consider the case? should we consider all characters, or only letters?

A reasonable answer from the customer: case is not taken into account, we compare only the letters.

Solution 2. Remove unnecessary characters from the string and convert everything to lower case. Then we reverse and compare it with the original string.

>>> s = "Madam deified Racecar rotor?"
>>> s = s.lower()
>>> for char in " ,.;:-?!":
...     s = s.replace(char, "")
...
>>> s
'madamdeifiedracecarrotor'
>>> s == s[::-1]
True

Is it right now? Checked it out, I think so.

What about efficiency? By time, it has now become O(k * n), where k is the number of skipped characters. If k is small and we know all unnecessary characters, we can neglect it as a constant. If there are a lot of extra characters or we don't know them all, we can replace replace replace() with regular expressions.

>>> s = "Madam deified Racecar rotor?"
>>> re.sub("[\W]", "", s)
'madamdeifiedracecarrotor'

Total: again, O(n) in time, O(n) in memory.

That's not bad, can it get any better? Sure. For example, we could save memory. Suddenly we may have gigabyte palindromes and cannot afford creating a copy of the string.

Solution 3. We compare the first character to the last one, the second to the penultimate one, and so on, until we meet in the middle. Skip the special characters.

def isPalindrome(s):
    begin = 0
    end = len(s) - 1
    while begin < end:
        while not s[begin].isalpha():
            begin += 1
        while not s[end].isalpha():
            end -= 1
        if s[begin].lower() != s[end].lower():
            return False
        begin += 1
        end -= 1
    return True

print(isPalindrome("Madam deified Racecar rotor?"))

There's already a decent amount of code here. It makes sense to go through it manually, check for bugs and think about boundary cases. For example, what will happen if begin becomes larger than end while we're incrementing in the inner loop?

What happened to efficiency? Memory requirements are now O(1) - we only store the begin and end variables. In terms of time, we get the same O(n).

It's worth noting that solution 2 works almost equally regardless of string contents. But solution 3 has a "better case": when it is clear from the first character that this is not a palindrome, the function will return the result much faster in O(1).

Example 2. Newspaper clippings

Given two strings A and B. Write a function to determine whether string B can be composed using only characters from string A (each letter can only be used once).

Solution 1. The first thing that comes to mind.

Loop through string B, looking for each character in string A. If we find it, we remove it from A, if we don't find it, we immediately return False.

def can_produce_substr(A, B):
    for char in B:
        char_index = A.find(char)
        if char_index < 0:
            return False
        A = A[:char_index] + A[char_index+1:]
    return True

What could go wrong with the solution?

  • There could be problems removing a character from a string if it is null or last. Spoiler alert - all will be fine.
  • Characters in different registers are counted as different. We need to clarify the job to see if this is correct.
  • Won't there be any problems, because we change the original string? No, the strings are immutable, the original string will remain untouched.

What about complexity?

Cycle through B gives complexity O(B). Searching by A multiplies the time by the length of A in the worst case. Plus overwriting A at each step gives O(A) guaranteed.

Total: O(A*B) in time, O(A) in memory.

Can we do better? Let's give it a try.

Solution 2. Look for optimization.

We have two hard operations: linear search by string and character deletion from the string.

Instead of deleting a character, we could, for example, store its number as deleted in a separate list. More precisely, in a set, so that the search is for O(1) rather than O(n). Or one could convert the string to a list and delete it from there. More precisely, replace it with None, also to speed things up.

Instead of a string search, you could convert string A to a set. This would give a search for O(1) and a delete for O(1). But unfortunately we would lose information about the number of identical characters in A.

What to do? Let's convert A into a dictionary. Let the characters from A be the keys, and the number of occurrences of them be the values. Searching through the keys of the dictionary will be O(1), updating the value will also be O(1). Said and done:

def can_produce_substr(a, b):
    a_dict = {}
    for char in a:
        a_dict[char] = a_dict.get(char, 0) + 1

    for char in b:
        char_count = a_dict.get(char, 0)
        if char_count == 0:
            return False
        a_dict[char] = char_count - 1
    return True

The time complexity is now O(A + B), which is much better than (A * B). In fact, we made the complexity linear instead of quadratic.

We paid for this by having to store an extra dictionary of size O(A) in memory, which is not much worse than storing a modified O(A) string. The prank is a success!

Conclusion

A confident handling of this type of problems is the key to a successful interview at any level. Seniors have the same + knowledge of algorithms, technology and design principles. Tasks can be simpler or more complicated, depending on your luck. Getting stuck on a complicated problem can be normal, the main thing is not to stop thinking, recite the thought process and do not be afraid to ask for help. Solving a problem with a hint is much better than stalling until you run out of time.

And the more practice you have, the more likely you are to cope with anything. Good luck!