
Line wrapping at word boundaries for console applications in C#

I didn’t like any of the solutions floating around the web for displaying blocks of text wrapped at word boundaries, so I wrote one.

This:

This is a really long line of text that shoul
dn't be wrapped mid-word, but is

Becomes this:

This is a really long line of text that
shouldn't be wrapped mid-word, but is

Here it is:

public static string GetWordWrappedParagraph(string paragraph)
{
    if (string.IsNullOrWhiteSpace(paragraph))
    {
        return string.Empty;
    }
 
    var approxLineCount = paragraph.Length / Console.WindowWidth;
    var lines = new StringBuilder(paragraph.Length + (approxLineCount * 4));
 
    for (var i = 0; i < paragraph.Length;)
    {
        var grabLimit = Math.Min(Console.WindowWidth, paragraph.Length - i);
        var line = paragraph.Substring(i, grabLimit);
 
        var isLastChunk = grabLimit + i == paragraph.Length;
 
        if (isLastChunk)
        {
            i = i + grabLimit;
            lines.Append(line);
        }
        else
        {
            var lastSpace = line.LastIndexOf(" ", StringComparison.Ordinal);

            if (lastSpace < 0)
            {
                //No space in this chunk (a single word wider than the console): hard-break it
                lines.AppendLine(line);
                i = i + grabLimit;
            }
            else
            {
                lines.AppendLine(line.Substring(0, lastSpace));

                //Trailing spaces needn't be displayed as the first character on the new line
                i = i + lastSpace + 1;
            }
        }
    }
    return lines.ToString();
}
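
To try it out, a minimal harness might look something like this (a sketch assuming GetWordWrappedParagraph above is pasted into the same class):

using System;
using System.Text;

class Program
{
    //Paste GetWordWrappedParagraph from above here

    static void Main(string[] args)
    {
        var text = "This is a really long line of text that shouldn't be wrapped mid-word, but is";
        Console.WriteLine(GetWordWrappedParagraph(text));
    }
}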

redis: Sorted Sets as iterable key-value work queues

I’m in the process of building an internal search engine at work, and first on the block are our network share drives. During a typical week, we have about 46,000 unique documents that are written to in our Lexington office. This number only represents things like Word docs, Excel spreadsheets, PowerPoint presentations, PDFs, and so forth. (A count of all touches is 3-4x higher.)

This presents some challenges: crawling the network shares at regular intervals is slow and inefficient, and at any given time a large portion of the search index may be out of date, which is bad for the user experience. So I decided an incremental approach would be better: if I re-index each document as it's touched, it eliminates unnecessary network and disk IO, and the search index is updated in close to realtime.
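
Detecting those touches is a separate problem; as a rough sketch, a FileSystemWatcher could feed the indexing queue described below (ChangeFeed and EnqueueForIndexing are hypothetical names, not the actual plumbing):

using System;
using System.IO;

class ChangeFeed
{
    //Hypothetical hook that hands a changed path to the indexing queue
    static void EnqueueForIndexing(string path)
    {
        Console.WriteLine("queue: " + path);
    }

    static void Main(string[] args)
    {
        var watcher = new FileSystemWatcher(@"\\server\share")
        {
            IncludeSubdirectories = true,
            NotifyFilter = NotifyFilters.LastWrite | NotifyFilters.FileName
        };

        watcher.Changed += (s, e) => EnqueueForIndexing(e.FullPath);
        watcher.Created += (s, e) => EnqueueForIndexing(e.FullPath);
        watcher.Renamed += (s, e) => EnqueueForIndexing(e.FullPath);
        watcher.EnableRaisingEvents = true;

        Console.ReadLine();     //Keep the process alive while events arrive
    }
}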

Redis

My second-level cache/work queue is a redis instance that I interact with using Booksleeve. The keys are paths that have changed, and the values are serialized message objects stored as byte arrays that contain information about the change. The key-value queue structure is important, because the only write operation that matters is the one that happened most recently. (Why try to index a document if it gets deleted a moment later?)
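
As a rough sketch, a change message might look something like this (ChangeMessage and its fields are illustrative placeholders, not the actual class):

using System;

enum ChangeType { Created, Modified, Deleted }

class ChangeMessage
{
    public string Path;             //The network share path that changed (also the redis key)
    public ChangeType Change;       //What happened to the document
    public DateTime ObservedUtc;    //When the change was observed
}

//Conceptually, the queue maps each path to its most recent change:
//  key:   the document's path
//  value: the serialized ChangeMessage bytes
//Writing the same path again overwrites the older message, so only the
//latest change per document gets indexed.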

This is great in theory, but somewhere along the way I naively assumed it was possible to iterate over redis keys. You can't; not easily or efficiently, anyway. (Using the KEYS command in a production environment is dangerous and should be avoided.)

Faking iteration

The solution was relatively simple, however, and like all problems in software development, it was solved by adding another level of indirection: using the redis Sorted Set data type.

For most use cases, the main feature that differentiates a Set from a Sorted Set is the notion of a score. But a Sorted Set also lets you retrieve a specific range of elements by rank, so a consumer can grab the first n items in a single call. In my case, each element returned is the key of a key-value pair representing some work to be done.

Implementing this is as easy as writing the path to the Sorted Set at the same time as the key-value work item is added, which can be done transactionally:

using (var transaction = connection.CreateTransaction())
{
    int i = 0;  //dummy value representing some "work"
    foreach (var word in WORDS)
    {
        transaction.SortedSets.Add(REDIS_DB, set, word, SCORE);
 
        //Set the key => message (int i) value
        transaction.Strings.Set(REDIS_DB, word, i.ToString());
        i++;
    }
    transaction.Execute();
}
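
The snippet above (and the consumer below) assumes a Booksleeve connection and a handful of values in scope, roughly like the following (a placeholder sketch; the names and values are mine):

using BookSleeve;

class QueueConfig
{
    public const int REDIS_DB = 0;          //redis database number
    public const double SCORE = 0;          //Sorted Set score; ordering isn't important here
    public const int LIMIT = 100;           //How many work items the consumer grabs per pass
    public const string set = "work:paths"; //Sorted Set holding the keys of pending work items

    //Stand-ins for changed paths
    public static readonly string[] WORDS = { "alpha", "beta", "gamma" };

    public static RedisConnection Connect()
    {
        var connection = new RedisConnection("localhost");
        connection.Open();  //Open() is asynchronous; Booksleeve queues commands issued before it completes
        return connection;
    }
}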

Downstream, my consumer fills its L1 cache by reading n elements from the Sorted Set:

var pairs = new Dictionary<string, string>();
using (var transaction = connection.CreateTransaction())
{
    //Get n keys from the set into the Dictionary
    var keyList = connection.Wait(connection.SortedSets.RangeString(REDIS_DB, set, 0, LIMIT));
 
    foreach (var key in keyList)
    {
        var value = Encoding.Default.GetString(connection.Strings.Get(REDIS_DB, key.Key).Result);
        pairs.Add(key.Key, value);
 
        //Remove the key from the SortedSet
        transaction.SortedSets.Remove(REDIS_DB, set, key.Key);
        //Remove the key-value pair itself
        transaction.Keys.Remove(REDIS_DB, key.Key);
    }
    transaction.Execute();
}

And there we have fake key “iteration” in redis.

A little syntactic sugar to have reference semantics for value types

In C#, structs and other data primitives have value semantics. (Strings behave much the same way, even though they are technically reference types, because they're immutable.) But sometimes it's useful to have reference semantics when dealing with what would otherwise be a value type. (Referencing primitives in a singleton object, for example.)

Here are two ways of doing that.

Delegates

Delegate syntax can seem a little weird, particularly when you're working with primitives, because delegates are essentially typed function pointers. They look more like methods than variables.

Here’s an example:

class Program
{
    static void Main(string[] args)
    {
        var i = SomeInteger.SomeNumber;     //Value type won't change when SomeNumber changes
        Console.WriteLine(i);               //100
        Func<int> del = () => SomeInteger.SomeNumber;
        Console.WriteLine(del());           //100 (Same as del.Invoke())
 
        SomeInteger.SomeNumber = 42;
        Console.WriteLine(i);               //Still 100
        Console.WriteLine(del());           //42
    }
}
 
static class SomeInteger
{
    public static int SomeNumber = 100;
}

Wrapper class using generics

My preferred way is to combine a wrapper class with C# generics. It requires a little more setup, but the resulting syntax is clearer, in my opinion.

Here’s an example:

class Program
{
    static void Main(string[] args)
    {
        //Assume SomeInteger.SomeNumber is 100 again
        var foo = new Reference<int>(() => SomeInteger.SomeNumber);
        Console.WriteLine("foo.Value: {0}", foo.Value);     //100
 
        SomeInteger.SomeNumber = 42;
        Console.WriteLine("foo.Value: {0}", foo.Value);     //42
    }
}
 
class Reference<T>
{
    private readonly Func<T> _theValue;
    public T Value
    {
        get
        {
            //If no delegate was supplied, return the default value of type T
            return _theValue != null ? _theValue.Invoke() : default(T);
        }
    }
 
    public Reference(Func<T> theValue)
    {
        _theValue = theValue;
    }
}

And there we have type-safe reference semantics for primitives in C#. (Works for nullable types, too.)
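
For example, the same wrapper works when the backing field is a nullable type (Settings and TimeoutSeconds are made-up names for this sketch):

using System;

class Program
{
    static void Main(string[] args)
    {
        //Reference<T> from above, wrapping a nullable int
        var timeout = new Reference<int?>(() => Settings.TimeoutSeconds);
        Console.WriteLine(timeout.Value.HasValue);  //False: nothing configured yet

        Settings.TimeoutSeconds = 30;
        Console.WriteLine(timeout.Value);           //30
    }
}

static class Settings
{
    public static int? TimeoutSeconds = null;
}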