Category Archives: Programming

Recursion as sophisticated GOTO?

Suppose you write a simple program to play a number guessing game: “I’m thinking of a whole number between X and Y…” where the user attempts to guess the number in order to win. Failures that require feedback to the user come in three main flavors:

  1. The user guessed incorrectly
  2. The user’s guess was out of range
  3. The user did something silly, like enter a non-integer value

The do..while form

You can do this with a loop, of course. A do..while loop seems like a natural choice here. What I don’t like is the validation and user feedback in the while clause. You’d call out to a method that hides all of the validation logic, and returns true or false. This is OK, but that boolean method will have a side effect, i.e. feedback to the user.

do {
    //increment guess count
    //ask for input
    //read in input
} while (
    //input is wrong
);

The recursive form

An alternative implementation might use recursion:

static void elicitGuess() {
    //increment guess count
    //ask for input
    //read in the input
    //validate input inline
        //Failure:
            //display relevant feedback
            //elicitGuess()
        //else return
}

I find this second way to be more natural and readable. But it also feels like the wrong thing to do, though I can’t articulate why. Maybe it’s because any time I’ve seen code that followed this kind of pattern, I’ve seen it written using loops.

Thoughts?

Memory access patterns in high-level languages

Like many developers that work in high-level languages, I think don’t spend a lot of time thinking about memory access patterns. This is probably a good thing… for the most part, worrying about this is premature optimization. But there are times when it matters, and the compiler won’t magically “optimize” it away for you, even if you have optimizations turned on:

Code

class MemoryAccessPatterns
{
    static void Main(string[] args)
    {
        var timer = new Stopwatch();
 
        timer.Start();
        for (var i = 0; i < 25; i++)
        {
            doFast();
        }
        timer.Stop();
        Console.WriteLine("Fast array: {0}", timer.ElapsedMilliseconds);
 
        timer.Restart();
        for (var i = 0; i < 25; i++)
        {
            doSlow();
        }
        timer.Stop();
        Console.WriteLine("Slow array: {0}", timer.ElapsedMilliseconds);
        Console.ReadKey();
    }
 
    private static void doFast()
    {
        var fast = new int[5000, 5000];
        for (var i = 0; i < 5000; i++)
        {
            for (var j = 0; j < 5000; j++)
            {
                fast[i, j]++;
            }
        }
    }
 
    private static void doSlow()
    {
        var slow = new int[5000, 5000];
        for (var i = 0; i < 5000; i++)
        {
            for (var j = 0; j < 5000; j++)
            {
                slow[j, i]++;
            }
        }
    }
}

Results

Release x86, optimizations on:
Fast array: 3829
Slow array: 8495

Release x86, optimizations off:
Fast array: 4357
Slow array: 8675

Release x64, optimizations on:
Fast array: 5074
Slow array: 10445

Release x64, optimizations off:
Fast array: 5954
Slow array: 10781

I tried this out after seeing this discussion thread on Quora this morning.

redis: Sorted Sets as iterable key-value work queues

I’m in the process of building an internal search engine at work, and first on the block are our network share drives. During a typical week, we have about 46,000 unique documents that are written to in our Lexington office. This number only represents things like Word docs, Excel spreadsheets, PowerPoint presentations, PDFs, and so forth. (A count of all touches is 3-4x higher.)

This presents some challenges: crawling the network shares at regular intervals is slow, inefficient, and at any given time, a large portion of the search index may be out of date, which is bad for the user experience. So I decided an incremental approach would be better: if I re-index each document as it’s touched, it eliminates unnecessary network and disk IO, and the search index is updated in close to realtime.

Redis

My second-level cache/work queue is a redis instance that I interact with using Booksleeve. The keys are paths that have changed, and the values are serialized message objects stored as byte arrays that contain information about the change. The key-value queue structure is important, because the only write operation that matters is the one that happened most recently. (Why try to index a document if it gets deleted a moment later?)

This is great in theory, but somewhere along the way, I naively assumed it was possible to iterate over redis keys… but you can’t. Not easily or efficiently, anyway. (Using keys in a production environment is dangerous, and should be avoided.)

Faking iteration

The solution was relatively simple, however, and like all problems in software development, it was solved by adding another level of indirection: using the redis Sorted Set data type.

For most use cases, the main feature that differentiates a Set from a Sorted Set is the notion of a score. But with a Sorted Set, you can also return a specific number of elements from the set. In my case, each element returned is the key to a key-value pair representing some work to be done.

Implementing this is as easy as writing the path to the Sorted Set at the same time as the key-value work item is added, which can be done transactionally:

using (var transaction = connection.CreateTransaction())
{
    int i = 0;  //dummy value representing some "work"
    foreach (var word in WORDS)
    {
        transaction.SortedSets.Add(REDIS_DB, set, word, SCORE);
 
        //Set the key => message (int i) value
        transaction.Strings.Set(REDIS_DB, word, i.ToString());
        i++;
    }
    transaction.Execute();
}

Downstream, my consumer fills its L1 cache by reading n elements from the Sorted Set:

var pairs = new Dictionary<string, string>();
using (var transaction = connection.CreateTransaction())
{
    //Get n keys from the set into the Dictionary
    var keyList = connection.Wait(connection.SortedSets.RangeString(REDIS_DB, set, 0, LIMIT));
 
    foreach (var key in keyList)
    {
        var value = Encoding.Default.GetString(connection.Strings.Get(REDIS_DB, key.Key).Result);
        pairs.Add(key.Key, value);
 
        //Remove the key from the SortedSet
        transaction.SortedSets.Remove(REDIS_DB, set, key.Key);
        //Remove the key from the Keys
        transaction.Keys.Remove(REDIS_DB, key.Key);
    }
    transaction.Execute();
}

And there we have fake key “iteration” in redis.

A little syntactic sugar to have reference semantics for value types

In C#, structs and other data primitives have value semantics. (This includes strings, even though they are technically reference types.) But sometimes it’s useful to have reference semantics when dealing with what would otherwise be a value type. (Referencing primitives in a singleton object, for example.)

Here are two ways of doing that.

Delegates

Delegate syntax can seem a little weird–particularly when you’re working with primitives–because they’re basically typed function pointers. They look more like methods than variables.

Here’s an example:

class Program
{
    static void Main(string[] args)
    {
        var i = SomeInteger.SomeNumber;     //Value type won't change when SomeNumber changes
        Console.WriteLine(i);               //100
        Func<int> del = () => SomeInteger.SomeNumber;
        Console.WriteLine(del());           //100 (Same as del.Invoke())
 
        SomeInteger.SomeNumber = 42;
        Console.WriteLine(i);               //Still 100
        Console.WriteLine(del());           //42
    }
}
 
static class SomeInteger
{
    public static int SomeNumber = 100;
}

Wrapper class using generics

My preferred way is by combining a wrapper class with C# generics. It has nicer syntax, but does require a little more setup. The results are a little clearer, in my opinion.

Here’s an example:

class Program
{
    static void Main(string[] args)
    {
        //Assume SomeInteger.SomeNumber is 100 again
        var foo = new Reference<int>(() => SomeInteger.SomeNumber);
        Console.WriteLine("foo.Value: {0}", foo.Value);     //100
 
        SomeInteger.SomeNumber = 42;
        Console.WriteLine("foo.Value: {0}", foo.Value);     //42
    }
}
 
class Reference<T>
{
    private readonly Func<T> _theValue;
    public T Value
    {
        get
        {
            //If the value is null, return the default initialization of type T
            return _theValue != null ? _theValue.Invoke() : default(T);
        }
    }
 
    public Reference(Func<T> theValue)
    {
        _theValue = theValue;
    }
}

And there we have type-safe, reference semantics for primitives in C#. (Works for nullable types, too.)

Publisher confirms with RabbitMQ and C#

RabbitMQ lets you handle messages that didn’t send successfully, without resorting to full-on transactions. It provides this capability in the form of publisher confirms. Using publisher confirms requires just a couple of extra lines of C#.

If you’re publishing messages, you probably have a method that contains something like this:

using (var connection = FACTORY.CreateConnection())
{
    var channel = connection.CreateModel();
    channel.ExchangeDeclare(QUEUE_NAME, ExchangeType.Fanout, true);
    channel.QueueDeclare(QUEUE_NAME, true, false, false, null);
    channel.QueueBind(QUEUE_NAME, QUEUE_NAME, String.Empty, new Dictionary<string, object>());
 
    for (var i = 0; i < numberOfMessages; i++)
    {
        var message = String.Format("{0}\thello world", i);
        var payload = Encoding.Unicode.GetBytes(message);
        channel.BasicPublish(QUEUE_NAME, String.Empty, null, payload);
    }
}

 

But you’re out of luck if you want:

  1. A guarantee that your message was safely preserved in the event that the broker goes down (i.e. written to disk)
  2. Acknowledgement from the broker that your message was received, and written to disk

For many use cases, you want these guarantees. Fortunately, getting them is relatively straightforward:

//Set the message to persist in the event of a broker shutdown
var messageProperties = channel.CreateBasicProperties();
messageProperties.SetPersistent(true);

 

//Send an acknowledgement that the message was persisted to disk
channel.BasicAcks += channel_BasicAcks;
channel.ConfirmSelect();
 
//...
 
//Begin loop
channel.BasicPublish(QUEUE_NAME, QUEUE_NAME, messageProperties, payload);
channel.WaitForConfirmsOrDie();
//End loop

 

(You’ll have to implement event handlers for acks and nacks.)

The difference between WaitForConfirms and WaitForConfirmsOrDie is not immediately obvious, but after digging through the Javadocs, it seems that WaitForConfirmsOrDie will give you an IOException if a message is nack‘d, whereas WaitForConfirms won’t.

You’ll get an IllegalStateException if you try to use either variation of WaitForConfirms without first setting the Confirms property with ConfirmSelect.

Here’s the complete code for getting an acknowledgement from the RabbitMQ broker, only after the broker has persisted the message to disk:

using (var connection = FACTORY.CreateConnection())
{
    var channel = connection.CreateModel();
    channel.ExchangeDeclare(QUEUE_NAME, ExchangeType.Fanout, true);
    channel.QueueDeclare(QUEUE_NAME, true, false, false, null);
    channel.QueueBind(QUEUE_NAME, QUEUE_NAME, String.Empty, new Dictionary<string, object>());
    channel.BasicAcks += channel_BasicAcks;
    channel.ConfirmSelect();
 
    for (var i = 1; i <= numberOfMessages; i++)
    {
        var messageProperties = channel.CreateBasicProperties();
        messageProperties.SetPersistent(true);
 
        var message = String.Format("{0}\thello world", i);
        var payload = Encoding.Unicode.GetBytes(message);
        Console.WriteLine("Sending message: " + message);
        channel.BasicPublish(QUEUE_NAME, QUEUE_NAME, messageProperties, payload);
        channel.WaitForConfirmsOrDie();
    }
}

Java solution to Project Euler Problem 48

Problem 48:

The series, 1^1 + 2^2 + 3^3 + … + 10^10 = 10405071317.

Find the last ten digits of the series, 1^1 + 2^2 + 3^3 + … + 1000^1000.

Running time: 125 ms

Assessment: Again, very easy and fast using arbitrary-precision arithmetic. Like one of my other solutions, I didn’t limit the output to just the last ten digits in the series, but you could easily tack that on.

import java.math.BigInteger;
 
public class Problem048
{
	public static void main(String[] args)
	{
		long begin = System.currentTimeMillis();
		BigInteger sum = BigInteger.ZERO;
		BigInteger temp = BigInteger.ONE;
		BigInteger GrandTotal = BigInteger.ZERO;
 
		for (int i = 1; i <= 1000; i++)
		{
			sum = temp.pow(i);
			temp = temp.add(BigInteger.ONE);
			GrandTotal = GrandTotal.add(sum);
		}
 
		long end = System.currentTimeMillis();
 
		System.out.println(GrandTotal);
		System.out.println(end-begin + "ms");
	}
}

Java solution to Project Euler Problem 36

Problem 36:

The Fibonacci sequence is defined by the recurrence relation:

Fn = F(n-1) + F(n-2), where F1 = 1 and F2 = 1.

Hence the first 12 terms will be:

  • F1 = 1
  • F2 = 1
  • F3 = 2
  • F4 = 3
  • F5 = 5
  • F6 = 8
  • F7 = 13
  • F8 = 21
  • F9 = 34
  • F10 = 55
  • F11 = 89
  • F12 = 144

The 12th term, F12, is the first term to contain three digits.

What is the first term in the Fibonacci sequence to contain 1000 digits?

Running time:

  • Checking for a palindrome in Base 10 first: 500ms
  • Checking for a binary palindrome first: 650ms

Assessment: This problem isn’t super interesting. What I did find interesting was that changing the order of the isPalindrome() comparison resulted in a significant difference in execution times. This makes sense because there are more binary palindromes than Base 10 palindromes. For no particular reason, I expected the compiler to optimize that section so the difference wouldn’t be as stark.

I commented out the slower method so you can play with it if my explanation is unclear.

public class Problem036
{
	private static boolean isPalindrome(String s)
	{
		String s2 = new StringBuffer(s).reverse().toString();
		if (s.equals(s2))
			return true;
		else
			return false;
	}
 
	public static void main(String[] args)
	{
		long begin = System.currentTimeMillis();
 
		long Sum = 0; 
		for (int i = 0; i < 1000000; i++)
		{
			if ( isPalindrome(Integer.toString(i)) && isPalindrome(Integer.toBinaryString(i)) )
				Sum += i;
			/*if (isPalindrome(Integer.toBinaryString(i)) && isPalindrome(Integer.toString(i)))
				Sum += i;*/
		}
		System.out.println(Sum);
 
		long end = System.currentTimeMillis();
		System.out.println(end-begin + "ms");
	}
}

Java solution to Project Euler Problem 25

Problem 25:

The Fibonacci sequence is defined by the recurrence relation:

Fn = F(n-1) + F(n-2), where F1 = 1 and F2 = 1.

Hence the first 12 terms will be:

  • F1 = 1
  • F2 = 1
  • F3 = 2
  • F4 = 3
  • F5 = 5
  • F6 = 8
  • F7 = 13
  • F8 = 21
  • F9 = 34
  • F10 = 55
  • F11 = 89
  • F12 = 144

The 12th term, F12, is the first term to contain three digits.

What is the first term in the Fibonacci sequence to contain 1000 digits?

Running time: 36ms

Assessment: LOL.

import java.math.BigInteger;
 
public class Problem025
{
	public static void main(String[] args)
	{
		long begin = System.currentTimeMillis();
 
		BigInteger a = BigInteger.valueOf(1);
		BigInteger b = BigInteger.valueOf(2);
		BigInteger c = BigInteger.valueOf(0);
		BigInteger MAX = new BigInteger("1000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000");
 
		int i = 3;
		for (i = 3; b.compareTo(MAX) < 0; i++)
		{
			c = a.add(b);
			a = b;
			b = c;
		}
		System.out.println("i: " + i);
 
		long end = System.currentTimeMillis();
		System.out.println(end - begin + "ms");
	}
}

Java solution to Project Euler Problem 22

Problem 22:

Using names.txt (right click and ‘Save Link/Target As…’), a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical order. Then working out the alphabetical value for each name, multiply this value by its alphabetical position in the list to obtain a name score.

For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 * 53 = 49714.

What is the total of all the name scores in the file?

Running time:

  • File IO: 23ms
  • 5163 names sorted: 79ms
  • Built the list of names: 154ms
  • Total runtime: 209ms

Assessment: I broke my time measurements up so I could see how fast each major piece is. I thought that file IO would be the slowest piece of the equation–I don’t have an SSD–but it isn’t. In fact, no single piece is really the bottleneck. At less than 0.3 seconds, this is, for practical purposes, instantaneous.

If I were to do this problem again–probably in C#–I would approach it the same way, but the code would look a bit cleaner, and certainly less verbose. Using higher-level data structures makes this problem pretty simple.

import java.io.*;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
 
public class Problem022
{
	private static ArrayList<String> NameList = new ArrayList<String>();
 
	private static void buildList(String s)
	{
		long BuildListBegin = System.currentTimeMillis();
		String temp = "";
		boolean IsName = false;
		int i = 0;
		while (i < s.length())
		{
			if (s.charAt(i) == '\"')
			{	// Flip the IsName switch if a quotation mark is encountered
				IsName = !IsName;
				i++;	// Move along one character so the quote isn't included in temp
			}
			if (IsName)
			{	// If switch is on, capture characters into temp
				temp += s.charAt(i);
			}
			else
			{
				if (temp == "")
				{	// Without this, a blank line is included
					break;
				}
				else
				{
					NameList.add(temp);
					temp = "";	
				}
			}
			i++;
		}
		long SortBegin = System.currentTimeMillis();
		Collections.sort(NameList);
		long SortEnd = System.currentTimeMillis();
		long BuildListEnd = System.currentTimeMillis();
		System.out.println(NameList.size() + " items sorted in " + (SortEnd-SortBegin) + "ms");
		System.out.println("buildList() executed in " + (BuildListEnd-BuildListBegin) + "ms");
	}
 
	private static String readFile(String filename)
	{
		/*	1) Read each name beginning with " and ending with " into vector of String
		 *	2) Sort names alphabetically
		 */
		long begin = System.currentTimeMillis();
		File f = new File(filename);
		BufferedReader reader;
		String list = "";
		try
		{
			StringBuffer contents = new StringBuffer();
			String text = null;
			reader = new BufferedReader(new FileReader(f));
			while ((text = reader.readLine()) != null)
			{
				contents.append(text).append(System.getProperty("line.separator"));			
			}
			list = contents.toString();
		}
		catch (FileNotFoundException e)
		{
			e.printStackTrace();
		}
		catch (IOException e)
		{
			e.printStackTrace();
		}
		System.out.println("File IO: " + (System.currentTimeMillis() - begin) + "ms");
		return list;
	}
 
	private static long calcValue()
	{
		long GrandTotal = 0;
		int i = 1;
		Iterator<String> itr = NameList.iterator();
		while(itr.hasNext())
		{
			String tmp = itr.next().toString();
			int LocalSum = 0; 
 
			for (int j = 0; j < tmp.length(); j++)
			{
				if (tmp.charAt(j) == 'A')
					LocalSum += 1;
				else if (tmp.charAt(j) == 'B')
					LocalSum += 2;
				else if (tmp.charAt(j) == 'C')
					LocalSum += 3;
				else if (tmp.charAt(j) == 'D')
					LocalSum += 4;
				else if (tmp.charAt(j) == 'E')
					LocalSum += 5;
				else if (tmp.charAt(j) == 'F')
					LocalSum += 6;
				else if (tmp.charAt(j) == 'G')
					LocalSum += 7;
				else if (tmp.charAt(j) == 'H')
					LocalSum += 8;
				else if (tmp.charAt(j) == 'I')
					LocalSum += 9;
				else if (tmp.charAt(j) == 'J')
					LocalSum += 10;
				else if (tmp.charAt(j) == 'K')
					LocalSum += 11;
				else if (tmp.charAt(j) == 'L')
					LocalSum += 12;
				else if (tmp.charAt(j) == 'M')
					LocalSum += 13;
				else if (tmp.charAt(j) == 'N')
					LocalSum += 14;
				else if (tmp.charAt(j) == 'O')
					LocalSum += 15;
				else if (tmp.charAt(j) == 'P')
					LocalSum += 16;
				else if (tmp.charAt(j) == 'Q')
					LocalSum += 17;
				else if (tmp.charAt(j) == 'R')
					LocalSum += 18;
				else if (tmp.charAt(j) == 'S')
					LocalSum += 19;
				else if (tmp.charAt(j) == 'T')
					LocalSum += 20;
				else if (tmp.charAt(j) == 'U')
					LocalSum += 21;
				else if (tmp.charAt(j) == 'V')
					LocalSum += 22;
				else if (tmp.charAt(j) == 'W')
					LocalSum += 23;
				else if (tmp.charAt(j) == 'X')
					LocalSum += 24;
				else if (tmp.charAt(j) == 'Y')
					LocalSum += 25;
				else if (tmp.charAt(j) == 'Z')
					LocalSum += 26;
			}
 
			LocalSum *= (i);
			GrandTotal += LocalSum;
			i++;
		}
 
		return GrandTotal;		
	}
 
	public static void main(String[] args)
	{
		long begin = System.currentTimeMillis();
 
		buildList(readFile("names.txt"));
		System.out.println(calcValue());
 
		long end = System.currentTimeMillis();
 
		System.out.println("Total execution time: " + (end-begin) + "ms");
	}
}

Java solution to Project Euler Problem 21

Problem 21:

Let d(n) be defined as the sum of proper divisors of n (numbers less than n which divide evenly into n).

If d(a) = b and d(b) = a, where a ≠ b, then a and b are an amicable pair and each of a and b are called amicable numbers.

For example, the proper divisors of 220 are 1, 2, 4, 5, 10, 11, 20, 22, 44, 55 and 110; therefore d(220) = 284. The proper divisors of 284 are 1, 2, 4, 71 and 142; so d(284) = 220.

Evaluate the sum of all the amicable numbers under 10000.

Running time: 140ms

Assessment: Like some of the other problems, I had to read this a few times before I really understood it.

import java.util.ArrayList;
import java.util.Iterator;
 
public class Problem021
{
	private static int MSum;
	private static int NSum;
 
	private static int sumList(ArrayList<Integer> list)
	{
		int sum = 0;
 
		for (Iterator<Integer> iter = list.iterator(); iter.hasNext();)
		{
			sum += iter.next();
		}
 
		return sum;
	}
 
	private static ArrayList<Integer> createList(int n)
	{
		// Creates a list of integers that evenly divide into n excluding n itself
		long root = Math.round(Math.sqrt(n)) + 1;
 
		ArrayList<Integer> test = new ArrayList<Integer>();
		test.add(1);
 
		for (int i = 2; i <= root; i++)
		{
			if (n % i == 0)
			{
				test.add(i);		// Add the divisor & its complement
				test.add(n/i);
			}
		}
 
		return test;
	}
 
	private static boolean isAmicable(int n)
	{
		/*** If n's divisors form an amicable set, return true ***/
 
		// Create a list of n's proper divisors
		ArrayList<Integer> NList = new ArrayList<Integer>(createList(n));		
 
		// Sum n's proper divisors (NSum)
		NSum = sumList(NList);
 
		// Create list of NSum's proper divisors (MList)
		ArrayList<Integer> MList = new ArrayList<Integer>(createList(NSum));
 
		// Sum m's proper divisors (MSum)
		MSum = sumList(MList);
 
		if ((MSum == n) && (MSum != NSum))
			return true;	
 
		return false;
	}
 
	public static void main(String[] args)
	{
		long begin = System.currentTimeMillis();
 
		int sum = 0;
		for (int i = 0; i < 10000; i++)
		{
			if (isAmicable(i))
			{
				sum += NSum;
			}
		}
 
		System.out.println(sum);
 
		long end = System.currentTimeMillis();
		System.out.println(end-begin + "ms");
	}
}