Tomer Gabel's annoying spot on the 'net
# Tuesday, 16 September 2008

(Cross-posted on Stack Overflow)

Update (17-Sep-08): Corrected the answer to an integer arithmetic question ("rotate to the right" was obviously incorrect; I was careless when I wrote this. Thanks, Kuperstein!) and made some formatting adjustments.

Preparation

I never interview the applicant on the first phone call. Doing so wouldn't give me time to go over the candidate's CV and consider whether there are points I'd like to bring up in the interview, such as specific work experience or glaringly missing skills. Additionally, setting up a specific time for the phone interview allows the candidate time to mentally prepare, drink a coffee and settle down in a quiet spot somewhere. It just puts everyone at ease.

When first calling the candidate I always take the time to introduce the company I work for, giving a synopsis of what we do and a general description of how the company operates. I then ask whether the applicant sees this as a potentially interesting place to work and whether they have any questions. You'd be surprised at the time this can save: it's not always obvious whether the applicant is interested in working for a start-up company, and some applicants may not find the problem domain engaging.

Getting to Know The Candidate

To start the interview off, I usually skim over the candidate's CV and select two or three interesting projects that the candidate was involved in. I ask the candidates to describe their involvement (sometimes, but not often, going into a bit of detail if the domain is familiar to me), their specific contribution to the project, whether or not they had fun and why. This usually gives me a sense of what the candidates are looking for; are they heavily into design? Are they enthusiastic about a specific technology, and if so, do they have a sound reason for it? Were they frustrated by administrative issues, did they try to improve their working environment?

Technical Questions

Once those formalities and niceties are out of the way, I turn to my ever-growing collection of interview questions and select a small subset to present to the candidate. For example, a typical interview may include the following questions:

Integer Arithmetic

Does the candidate have a decent grasp of bits and bytes? I consider this a must-have; a candidate that fails this part has no chance in hell of tackling even the most trivial native code.

  • "Take an integer. How do you turn the 7th least significant bit off?" This question alone removes about half of the applicants from the equation. Some people tell you "you need to apply bitwise AND, but I can't remember the number you need to AND with" -- this isn't what you're looking for. A good answer would be something like "x &= ~(1 << 6)".
  • "How would you quickly divide an integer by eight (8)?" A good answer is, you shift right by three. A better answer would be "is the integer signed or unsigned?", with a bonus for Java developers who know the difference between >> and >>>.
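As a rough illustration of the two answers above, here is a minimal Python sketch. The helper names (clear_bit, div8_signed, div8_unsigned32) are made up for this example. Python has no >>> operator, so the unsigned case is emulated by masking to 32 bits, which is an assumption about the integer width.

```python
def clear_bit(x, n):
    """Turn off bit n of x (n = 0 is the least significant bit)."""
    return x & ~(1 << n)

def div8_signed(x):
    """Arithmetic shift: keeps the sign and rounds toward negative infinity."""
    return x >> 3

def div8_unsigned32(x):
    """Logical shift of a 32-bit pattern (roughly what >>> does to a Java int)."""
    return (x & 0xFFFFFFFF) >> 3

print(clear_bit(0b1111111, 6))   # the 7th least significant bit is bit 6
print(div8_signed(64))
print(div8_signed(-17))          # -3, whereas truncating division gives -2
print(div8_unsigned32(-17))      # a large positive number
```

The negative case is why "signed or unsigned?" is the better answer: shifting a negative value rounds differently from truncating division, and a logical shift reinterprets the sign bit altogether.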

Pointers and Pointer Arithmetic

This really depends on what your company does. Most developers today don't need to know in practice the particulars of pointers and pointer arithmetic optimizations, but if you're developing a highly scalable system and/or one with serious performance considerations, this is a very good measure of how likely the candidate will be able to tackle such challenges.

  • "How big is a pointer?" The only correct answer is, "depends on the architecture". 32-bit operating systems will have 32-bit pointers, 64-bit systems will have 64-bit pointers. Anything that doesn't fall into these two categories is not relevant for my purposes, and likely yours as well. If the candidate fails this question I usually mark this section as "failed" and move on.

Floating Point Numbers

  • "What is NaN?" A programmer who can't answer this question has never really worked with floating point numbers.
  • "How are floating point numbers represented in a typical modern architecture?" Failing this question is not a deal-breaker, but answering it correctly will score the candidate a lot of points. "sign, exponent, mantissa/fraction/significand" is a sufficient answer.
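For reference, the sign/exponent/fraction layout and the behavior of NaN can be poked at with a short Python sketch. It assumes the common IEEE 754 single-precision format (1 sign bit, 8 exponent bits, 23 fraction bits), and float_fields is a hypothetical helper name.

```python
import math
import struct

def float_fields(x):
    """Split x, stored as a 32-bit IEEE 754 float, into its three fields."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 bits
    return sign, exponent, fraction

print(float_fields(1.0))        # (0, 127, 0)
print(float_fields(-0.15625))   # -1.25 * 2**-3

nan = float("nan")
print(nan == nan)               # False: NaN is unequal to everything, itself included
print(math.isnan(nan))          # True
print(float_fields(nan))        # exponent all ones, fraction non-zero
```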

Essential Data Structures and Algorithms

This is the bread-and-butter of programming. A candidate should exhibit robust familiarity with commonplace data structures (hashtables, linked lists, trees etc.) and algorithms (sorting, graph traversal).

  • "Describe how quicksort works. Elaborate on its performance characteristics." Any professional developer should be able to explain quicksort in a few minutes, know its average- and worst-case complexity, and recognize pathological cases (typical school-level quicksort implementations exhibit horrible performance on pre-sorted data).
  • "How are hashtables commonly implemented?" A candidate should be able to describe the concept of a hashing function (uniform distribution) and how it relates to the internal data structure (normally an array of buckets).
  • "What backing data structure would you choose for a simple text editor?" The classic answer is a rope, but it's unlikely that a candidate will be familiar with this data structure. A more likely response would be a "linked-list of strings", in which case you should ask about the complexity of various editing operations (deleting a line, inserting a line, deleting a character etc.). This question typically takes slightly longer to answer but I've found that it gives me a good measure of the candidate's intuition in choosing/analyzing data representations.
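The pathological case mentioned in the quicksort question is easy to demonstrate. The sketch below is a deliberately naive, school-level quicksort that always picks the first element as the pivot; the comparison counter and the input size of 200 are arbitrary choices for illustration only.

```python
import random

def quicksort(items, counter):
    """Textbook quicksort with the first element as pivot.

    counter is a one-element list that tallies elements compared to a pivot.
    """
    if len(items) <= 1:
        return items
    pivot, rest = items[0], items[1:]
    counter[0] += len(rest)
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller, counter) + [pivot] + quicksort(larger, counter)

def count_comparisons(items):
    counter = [0]
    result = quicksort(items, counter)
    assert result == sorted(items)
    return counter[0]

n = 200
presorted = list(range(n))
shuffled = presorted[:]
random.Random(42).shuffle(shuffled)

print("pre-sorted:", count_comparisons(presorted))  # n * (n - 1) / 2, i.e. quadratic
print("shuffled:  ", count_comparisons(shuffled))   # on the order of n * log(n)
```

On pre-sorted input every partition is maximally lopsided, so the recursion is n levels deep and the work is quadratic. Picking a random or median-of-three pivot is the usual remedy.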

Threading and Synchronization

This is fast becoming the most important subject with which to distinguish the truly brilliant candidates from the merely competent; the ubiquity of multithreaded code nowadays also means that these questions can be used to quickly weed out the unworthy candidates.

  • "Describe one common way of synchronizing access to a shared resource." This is just a starter question, and if the candidate takes more than a few seconds to come up with an answer (mutex, semaphore, monitor or "synchronized" for Java developers, "lock" for C# developers) it's usually a good sign that they don't have any reasonable experience with multithreaded development.
  • "You have a shared cache with a very good hit ratio. How would you synchronize access to the cache with as little performance overhead as possible?" The answer is trivially a read-write lock which can accommodate multiple readers and a single, exclusive writer. Where appropriate, ask how the candidate would implement such a lock.
  • "Describe a nontrivial problem that you've had with threaded code." Responses usually fall into one of three categories: either (1) a reasonably experienced multithreaded developer would never make the same mistakes, in which case the candidate obviously isn't one; (2) a classic race-condition/deadlock/etc. scenario, which merely tells you that the candidate has some experience with multithreaded code and appears capable of tackling such challenges; or (3) rarely, a candidate may have a genuinely interesting "war story," in which case you'll probably want to hire them right away.
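For the follow-up about implementing a read-write lock, one possible answer looks roughly like the Python sketch below. It is a minimal version built on a single condition variable, and the ReadWriteLock and Cache classes are hypothetical names for this example. It makes no attempt at fairness, so a steady stream of readers can starve a writer.

```python
import threading

class ReadWriteLock:
    """Many concurrent readers or one exclusive writer (no writer priority)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    def acquire_read(self):
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()   # wake any waiting writer

    def acquire_write(self):
        with self._cond:
            while self._writing or self._readers > 0:
                self._cond.wait()
            self._writing = True

    def release_write(self):
        with self._cond:
            self._writing = False
            self._cond.notify_all()       # wake waiting readers and writers


class Cache:
    """Hypothetical read-mostly cache guarded by the lock above."""

    def __init__(self):
        self._lock = ReadWriteLock()
        self._data = {}

    def get(self, key):
        self._lock.acquire_read()
        try:
            return self._data.get(key)
        finally:
            self._lock.release_read()

    def put(self, key, value):
        self._lock.acquire_write()
        try:
            self._data[key] = value
        finally:
            self._lock.release_write()


cache = Cache()
cache.put("answer", 42)
print(cache.get("answer"))
```

A good candidate will usually point out the starvation issue unprompted and suggest tracking waiting writers so that new readers queue up behind them.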

Peripheral Technologies

Approach this section with caution. A candidate that's familiar with a great deal of today's hot technologies may prove completely incompetent, whereas it's quite possible to find brilliant programmers that have never touched COM in their lives. I still like to get a sense of how "in touch" the developer is with contemporary technologies; familiarity with tools and technologies can definitely be a tie-breaker between two promising candidates.

  • "Are you familiar with COM? Describe an interface which any COM object must implement and its methods." Anyone who's even a bit familiar with Windows software development should be able to answer this question fully. For those unfamiliar with COM, describe it in a few words and ask the candidate to guess what the required methods of IUnknown are.
  • "What is a well-formed XML file? Give two examples of errors in an XML file which would render it non-well-formed." XML is prevalent in almost every software development domain. A candidate who cannot answer these questions (and doesn't have a very good excuse) will not go past this interview.
  • "What's XPath? Explain what a predicate is and how it's used." This is not a must-have, but I'd expect serious developers to at least have an idea of what XPath is. The second question is there to differentiate those who profess to know XPath from those who've actually done work with XPath and/or XSLT.
  • "If you had to verify the input of an e-mail address field, how would you go about it?" There are only two valid answers to this question: "I would use a regular expression," or "I'd like to use a regular expression, but I know that fully matching e-mails according to the RFC is insanely complex, which is why I'd get a proven library to do it for me." If the candidate is being a smart-ass you can always ask them about the performance characteristics of commonplace regex engines (which have pathological cases).
  • I also like to just toss buzzwords (Ruby, Boo, JSON, Struts, J2EE, WCF) around and examine the candidate's responses. It may also provide an interesting subject to ask about in a personal interview later on.
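To make the XML and XPath questions concrete, here is a small Python sketch using the standard library's ElementTree. The sample documents and the is_well_formed helper are invented for this example, and ElementTree only supports a limited XPath subset, so treat it as a demonstration of the ideas and not as a full XPath engine.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if the text parses as XML; says nothing about schema validity."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<a><b/></a>"))      # True
print(is_well_formed("<a><b></a></b>"))   # False: improperly nested tags
print(is_well_formed("<a/><b/>"))         # False: more than one root element
print(is_well_formed("<a x=1/>"))         # False: unquoted attribute value

# A predicate is the bracketed filter attached to an XPath step; here it
# keeps only the <book> elements whose lang attribute equals 'en'.
doc = ET.fromstring(
    "<library>"
    "<book lang='en'><title>First</title></book>"
    "<book lang='he'><title>Second</title></book>"
    "</library>"
)
titles = [book.findtext("title") for book in doc.findall("./book[@lang='en']")]
print(titles)                             # ['First']
```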

Concluding the Interview

The previous section usually takes between 10 and 15 minutes. At this point I normally ask the candidate if there are any questions they'd like answered, or anything I should know before we conclude the interview. Once that's done, I thank them for their time and tell them (even if they've failed miserably) that I will call them back in a day or two with an answer.

Hope this helps, comments are welcome.

Tuesday, 16 September 2008 19:39:11 (Jerusalem Standard Time, UTC+02:00)  #    -
Development | Personal
# Thursday, 11 September 2008

After a long hiatus I've found the time to update PicasaWebDownloader. If you're using the tool, there are a bunch of compelling reasons why you'd probably want to update. As always, code is included and feedback is welcome.

Thursday, 11 September 2008 17:32:12 (Jerusalem Standard Time, UTC+02:00)  #    -
Personal | Software
# Wednesday, 03 September 2008

Update (22-Sep-08): As Peter Kasting from the Chromium team (I think?) mentioned in the comments, this hack is unnecessary. Simply go to google.com, click on Google In English, restart Chrome and wait about 10 seconds, which will result in the desired behavior.

Chrome is amazing. It really is. Ridiculously fast, ridiculously compact (less than 0.5MB installation!) and seems to just work, which is truly astonishing for a product of this caliber, particularly the first version thereof.

The one obvious deficiency I could find was that it decided on google.co.il (the Israeli version of the Google homepage) as my default search provider, whereas my preference is for the regular English version on google.com. The search provider settings cannot be changed and do not respect the homepage's cookie (click on Google in English once and you're supposed to be done with it). Apparently it uses a {google:baseURL} macro which does not appear to be defined anywhere, and the only workaround I could find was:

  • Start->Run
  • notepad "%userprofile%\AppData\Local\Google\Chrome\User Data\Default"
  • Look for the line starting with "search_url": (for me it was line 8)
  • Replace {google:baseURL} with http://www.google.com/

The damn thing still changes the setting every now and then. I'll file a bug report with Google, but this should suffice in the meantime (search results rendered right-to-left can drive me up the wall).

Update: As Shy noted in the comments, the installer actually is a downloader, I just didn't notice the first time because I was doing other things while it was starting up. In practice, though, it's annoying as hell, particularly if you're on a slow pipe. Bad Google!

Wednesday, 03 September 2008 16:40:34 (Jerusalem Standard Time, UTC+02:00)  #    -
Software
# Tuesday, 02 September 2008

Update (11 March 2009): Microsoft has retired FolderShare in favor of Windows Live Sync. It’s basically the same service, except they’ve significantly increased the file limit per library and finally added full Unicode support. Now that my two major gripes with the service are resolved I’m perfectly happy with Sync, and it’s free to boot!

I have a very large collection of music files, easily 70GB with thousands of files (those lossless rips can be quite space-consuming). I listen to music both at home and at work, and don't want to go through the trouble of synchronizing these collections manually. In fact, I would like a service that fulfills the following requirements:

  • Easy to set up;
  • Works across NAT, preferably with UPnP support;
  • Full Unicode support;
  • No artificial limit on library size;
  • Not required, but definitely advantageous:
    • Free;
    • Low memory and CPU overhead;
    • Libraries are accessible over the web;
    • Some sort of online backup solution

The NAT support is an absolute must, as I have little or no control over the firewall at work. Unicode support may sound like a trivial requirement, but as you'll see most solutions do not properly support Unicode. My collection contains albums in multiple languages, including Hebrew, Japanese and Norwegian, but even English albums can cause issues (Blue Öyster Cult, for example).

I've tested the following solutions:

Microsoft FolderShare (now Windows Live Sync, see update above)

Although this is one of the oldest players in the game (the company was bought out by Microsoft in 2005) it hasn't seen any visible improvement in a very long time. Despite the apparently dormant development, the service itself works well and is very consistent and reliable. What separates FolderShare from any other solution I've tested is a very user-centric design: any reasonably literate computer user (read: knows what files are and can double-click on an install button) should be able to set up a FolderShare account and start synchronizing files literally in minutes. Once set up the service simply works; other than the disadvantages which I'll enumerate momentarily, I've had absolutely zero problems with the service in over a year of use (well, to be honest there was a highly-publicized two-week service outage over a year ago, but it's been hassle-free before and since).

FolderShare fails in two specific ways: it's limited to 10,000 files per library (I think there's a limit on the number of libraries supported, but I've never come close to it), and it does not properly support Unicode. This means that files with characters outside the ANSI character set and machine codepage simply do not get synchronized. Other than that, its interface is amazingly limited with very few customizable options, but in my opinion this isn't really an issue because the software simply does its job really well.
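To get a feel for which files would be affected, something like the following Python sketch can flag names that cannot be represented in a given codepage. It only mirrors the symptom described above and is not based on any knowledge of how FolderShare works internally; the function name, the sample names and the two codepages (cp1252 for Western European, cp1255 for Hebrew) are all assumptions for illustration.

```python
def unsyncable_names(names, codepage="cp1252"):
    """Return the names that cannot be encoded in the given codepage."""
    bad = []
    for name in names:
        try:
            name.encode(codepage)
        except UnicodeEncodeError:
            bad.append(name)
    return bad

names = ["Blue Öyster Cult", "Plain ASCII Album", "שלום"]
print(unsyncable_names(names, "cp1252"))   # the Hebrew name is flagged
print(unsyncable_names(names, "cp1255"))   # the name containing Ö is flagged
```

Which names get flagged depends entirely on the machine's codepage, which is why even a mostly English collection can run into trouble.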

With these disadvantages in mind, I wouldn't hesitate to recommend FolderShare to English speakers (or ones that do not make use of non-Latin file names), but the rest of us will have to look elsewhere. With Unicode support I'd definitely go back to FolderShare though, it's an excellent product.

PowerFolder

Touted as an open-source file synchronization solution, PowerFolder utterly failed to impress me; it appears to be a very powerful solution, but consequently suffers from a very cluttered UI that's hard to grok. I wouldn't recommend this service to casual users, and it wasn't trivial for me (as a software developer) to figure out either.

I installed a trial version of PowerFolder Pro on both machines, but once I got past the strange UI idioms I just couldn't get the software to work reliably. I managed to send an invitation from one machine to the next (synchronized directories in PowerFolder do not appear to be centrally managed), but couldn't figure out how to get them to sync reliably nor how to resolve file conflicts. Finally, the client software is a real memory hog.

BeInSync

Fairly similar to PowerFolder (with additional online backup features on Amazon's S3 storage service), BeInSync is a commercial product that appears to provide all of the features I require. The service was fairly easy to set up, although not nearly as streamlined as FolderShare. I got my directories to synchronize properly and was relieved to find that Unicode is fully supported by this product.

Despite the promising start, my experience with this product was far from satisfactory: the client UI is incredibly slow and non-responsive. Other than general slowness in rendering speed and bizarre UI idioms (for example, the only way to get a reasonable status display is via the View menu), resolving synchronization conflicts can easily take 30 seconds per file with no batching capabilities at all. On top of that the client software is a major resource hog, easily taking up 60MB and more resident memory, and for a reason I couldn't figure out I could see 3-9MB per second I/O activity from the client although it exhibits no synchronization activity. To add insult to injury, the uninstall program requested that I restart my computer - nitpicking, I know, but what the hell?

Having tested three different services I'm sorely tempted to go back to FolderShare and figure out a manual synchronization scheme for the Unicode files. The other alternative is a homebrew VPN+robocopy/rsync/SyncToy solution which I'd prefer to avoid. I'm rather surprised that it's so hard to find hassle-free synchronization services so late in the game...

Tuesday, 02 September 2008 18:15:18 (Jerusalem Standard Time, UTC+02:00)  #    -
Software
# Tuesday, 26 August 2008

Apparently Vorbis (.ogg) files are not all that commonplace, and some popular web servers (at least IIS7) aren't configured to handle them by default. Under the assumption that it was a case of a missing MIME type I sent a support request to GoDaddy (my web host of choice), who were pleasantly responsive and even helpful.

IIS 7.x supports configuration of mime-types on the application or virtual directory level by including the following lines in a Web.config file at the root of said directory:

<system.webServer>
	<staticContent>
		<mimeMap fileExtension=".ogg" mimeType="audio/ogg" />
	</staticContent>    
</system.webServer>

After making the change, all .ogg file links within this site are now accessible (this particularly pertains to the Defender of the Crown links).

Tuesday, 26 August 2008 17:10:10 (Jerusalem Standard Time, UTC+02:00)  #    -
Personal
# Monday, 18 August 2008

This guy hooked up an old Sinclair ZX Spectrum to a bunch of crappy hardware (a hard drive array, an old dot matrix printer and a scanner) and got them to play an amusing (though still impressive) approximation of Radiohead's "Nude". Here's to a bigger geek than I can ever hope to be!

Monday, 18 August 2008 11:06:56 (Jerusalem Standard Time, UTC+02:00)  #    -
Music | Software
# Tuesday, 22 July 2008

Because the Java language lacks delegates, anonymous classes are prevalent as a syntactic replacement. Non-static nested classes are also often used in the language, a feature which is conspicuously absent from C#, albeit far less necessary with that language.

This brings me to the following language caveat. This may seem a contrived example, but it's a simplification of an actual issue I've encountered in the last few days. Suppose you have a generic base class, call it BaseClass<U>, which you extend with an anonymous class. Let's also assume that the extended class spawns a thread that needs to access the BaseClass<U> state:

class BaseClass<U> {
    void getState() {}
}

class Test {
    public void test() {
        final BaseClass<String> instance = new BaseClass<String>() {
            public void invokeStatefulThread() {
                // Create our runnable
                final Runnable threadCode = new Runnable() {
                    public void run() {
                        /* 1 */ getState();
                        /* 2 */ this.getState();
                        /* 3 */ super.getState();
                        /* 4 */ BaseClass.this.getState();
                        /* 5 */ BaseClass<String>.this.getState();
                    }
                };
                new Thread( threadCode ).start();
            }
        };

        instance.invokeStatefulThread();
    }
}

I'll spare you the guessing game. Here's what happens with each of the five invocations of getState():

  1. Compiles and behaves as expected.
  2. Obviously won't compile; this points to the Runnable.
  3. Obviously won't compile; the superclass of a Runnable is Object.
  4. Although this is the correct raw class, it won't compile because "No enclosing instance of the type BaseClass<U> is accessible in scope", even though the raw type should still be accessible and U can be inferred.
  5. Although this appears to be the correct fully-qualified form, this does not compile with a "Syntax error on token(s), misplaced construct(s)" error.

The Java language specification section on "qualified this" is very brief and does not mention generics at all (does "class name" include bounded type parameters?). Oddly enough, moving the class declaration outside of the test method actually lets 4 compile -- if there's a clue there, I haven't figured it out yet.

I still haven't found a syntactically-correct way to access BaseClass<String>.this, other than placing it in a temporary variable outside of the Runnable declaration. I searched yesterday for a couple of hours with no obvious solution in sight. Ideas are more than welcome!

Tuesday, 22 July 2008 10:27:45 (Jerusalem Standard Time, UTC+02:00)  #    -
Development | Java
# Wednesday, 16 July 2008

In addition to the wide press coverage on US-oriented technology sites we've seen coverage from two major Israeli news providers (Hebrew only, for now): Calcalist and TheMarker.

Now comes the fun part; Delver is still borderline-alpha. We've been working hard testing and tweaking it, and getting a system of this complexity working in good order on a ridiculously short schedule feels astounding. I sincerely believe the Delver premise is a solid one, and we're giving you a mere inkling of what's in store for the concept; now all we have to do is work harder, growing along with the product and slowly but surely realizing its full potential.

The brilliant part? Beyond the dreams of riches and fame, this product already is useful; with relentless improvements it may yet become as indispensable a tool to Internet denizens as Google, Wikipedia and Facebook are today.

Wednesday, 16 July 2008 01:45:24 (Jerusalem Standard Time, UTC+02:00)  #    -
Personal | Software
# Tuesday, 15 July 2008

An alpha version of our search engine is now open for all users!

We've been working towards this day for the past year, building a complete and functional search engine from scratch on a completely original premise. I'm both amazed by and proud of the work done by the various teams, and I still can't believe we've managed to pull this off in so little time. Launching the search engine publicly seems like a great way to celebrate the year I've been working for Delver (as of July 1st).

Mind you, the service is still new and we're hammering away at the kinks, but so far we've had overwhelmingly positive press coverage and the various comments are sincerely flattering. Here's to another amazing year!

As an aside, we've got openings on my team (search back-end) for extremely talented software developers who are interested in building performance-driven, robust back-end software in a variety of technologies. Interested? Contact me for details at tomer@delver.com!

Tuesday, 15 July 2008 21:05:27 (Jerusalem Standard Time, UTC+02:00)  #    -
Personal | Software
# Sunday, 22 June 2008

After figuring out the problem with the old dasBlog permalinks I had to figure out a way to convert all existing links in my blog to the new format. Lately whenever I need a script I try and take the opportunity to learn a bit of Python, so it took an hour or two to write the conversion script.

Here it is; if you want to use this for your own copy of dasBlog, change the "domain" global variable to wherever your blog is located and run this from your ~/Content directory (you can also download the script here):

#!/usr/bin/env python3
#
# convert_permalinks.py
# Quick and dirty permalink converter for dasBlog content files
#
# Tomer Gabel, 22 June 2008
# http://www.tomergabel.com
#
# This code is placed in the public domain (see http://creativecommons.org/licenses/publicdomain/)

import glob
import re
import urllib.request

# Static constants
domain = 'tomergabel.com'
href_lookup = re.compile( r'href="(http://(www\.)?' + re.escape( domain ) + r'/[^"]*\+[^"]*?)"' )

# Globals: cache of already-converted URLs, so each one is only fetched once
conversion_map = {}

# Takes a URL and removes all offensive characters. Tests the new URL for validity
# (an HTTP error response such as a 404, or a failed connection, marks it as invalid).
# Returns a tuple with the converted URL and a boolean flag indicating whether the converted URL is valid or not.
def convert( url ):
    new_url = url.replace( "+", "" )
    # Check URL validity; urlopen raises on HTTP errors and connection failures
    valid = True
    try:
        resp = urllib.request.urlopen( new_url )
        resp.close()
    except Exception:
        valid = False
    return ( new_url, valid )

# Processes the source file, converts all URLs therein and writes it to the target file.
# The content files are assumed to be UTF-8; newline="" preserves the original line endings.
def process( source_file, target_file ):
    with open( source_file, "r", encoding="utf-8", newline="" ) as source:
        source_text = source.read()

    conv_text = source_text
    match_found = False
    for matcher in href_lookup.finditer( source_text ):
        match_found = True
        original_url = matcher.group( 1 )
        print( "\tConverting permalink " + original_url )
        if original_url not in conversion_map:
            conversion_map[ original_url ] = convert( original_url )

        new_url, valid = conversion_map[ original_url ]
        if valid:
            print( "\tConversion successful, new URL: " + new_url )
            conv_text = conv_text.replace( original_url, new_url )
        else:
            print( "\tConversion failed!" )

    # Write out the target file (only if at least one permalink was found)
    if match_found:
        with open( target_file, "w", encoding="utf-8", newline="" ) as output:
            output.write( conv_text )

# Entry point
for file_name in glob.iglob( "*.xml" ):
    print( "Processing " + file_name )
    process( file_name, file_name + ".conv" )

Sunday, 22 June 2008 15:47:55 (Jerusalem Standard Time, UTC+02:00)  #    -
Personal | Software
All Content © 2024, Tomer Gabel
Based on the Business theme for dasBlog created by Christoph De Baene (delarou)