Category Archives: Software

LexisMed has a new home

For years, LexisMed has lived here on this website. Well, as of mid-January, that’s no longer the case. I’ve split it off into two (soon to be three) distinct products:

  • LexisMed Speller: an inexpensive, installable set of dictionaries that work with Chrome, Firefox, Office, and ClaroReader. Also available for institutions with real deployment tooling.
  • LexisMed Lexicon: raw datasets that can be used by software developers building applications that require a rich medical terminology database.
  • LexisMed API: a client-side set of software development components that provide spell-checking capabilities in applications. These components don’t exist yet, but they’re in progress. The first version will focus on .NET applications.

Along the way, I bumped the word count up to about 810,000 words.

A self-contained, roll-forward schema updater

I use Dapper for most of my database interactions. I like it because it’s simple and does exactly one thing: it runs SQL queries and returns typed results.

I also like to deploy my schema changes as part of the application itself instead of as a separate data deployment. On application startup, the scripts are loaded and executed one by one in lexical order, and each schema change is idempotent in isolation.
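For example, a typical script guards its own change so it’s safe to run on every startup (the table and column names below are made up for illustration):

    -- Idempotent in isolation: safe to run on every application startup.
    IF NOT EXISTS (
        SELECT 1
        FROM sys.columns
        WHERE object_id = OBJECT_ID('dbo.Documents') AND name = 'ExternalId'
    )
    BEGIN
        ALTER TABLE dbo.Documents ADD ExternalId UNIQUEIDENTIFIER NULL;
    END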

The problem you run into is making destructive changes to the schema, which is a reasonable thing to want to do. If script 003 creates a column of type UNIQUEIDENTIFIER, and you want to convert that column to NVARCHAR in script 008, you have to go back and do some reconciliation between the column types. Adding indexes into the mix makes it even hairier. Scripts that are idempotent in isolation are easy to write. Maintaining a series of scripts that can be safely applied in order from beginning to end every time an application starts up is not.

Unless you keep track of which schema alterations have already been applied, and only apply the changes the application hasn’t seen before. Here’s a short, self-contained way to do that:
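A minimal sketch, assuming SQL Server and Dapper, with .sql scripts embedded as assembly resources and a SchemaVersions tracking table (the names are illustrative):

    // A sketch only: assumes SQL Server, Dapper, and .sql scripts embedded as
    // assembly resources. The SchemaVersions table name is arbitrary.
    using System;
    using System.Collections.Generic;
    using System.Data.SqlClient;
    using System.IO;
    using System.Linq;
    using System.Reflection;
    using Dapper;

    public static class SchemaUpdater
    {
        public static void ApplyPendingScripts(string connectionString)
        {
            using (var conn = new SqlConnection(connectionString))
            {
                conn.Open();

                // Bootstrap the tracking table itself (idempotent).
                conn.Execute(@"IF OBJECT_ID('SchemaVersions') IS NULL
                    CREATE TABLE SchemaVersions (
                        ScriptName NVARCHAR(255) NOT NULL PRIMARY KEY,
                        AppliedUtc DATETIME2 NOT NULL)");

                var applied = new HashSet<string>(
                    conn.Query<string>("SELECT ScriptName FROM SchemaVersions"));

                var assembly = Assembly.GetExecutingAssembly();
                var pending = assembly.GetManifestResourceNames()
                    .Where(n => n.EndsWith(".sql", StringComparison.OrdinalIgnoreCase))
                    .OrderBy(n => n, StringComparer.Ordinal) // lexical order
                    .Where(n => !applied.Contains(n));

                foreach (var scriptName in pending)
                {
                    string sql;
                    using (var stream = assembly.GetManifestResourceStream(scriptName))
                    using (var reader = new StreamReader(stream))
                    {
                        sql = reader.ReadToEnd();
                    }

                    // Apply the script and record it in the same transaction, so a
                    // failed script gets retried on the next startup.
                    using (var tx = conn.BeginTransaction())
                    {
                        conn.Execute(sql, transaction: tx);
                        conn.Execute(
                            "INSERT INTO SchemaVersions (ScriptName, AppliedUtc) VALUES (@scriptName, @now)",
                            new { scriptName, now = DateTime.UtcNow },
                            transaction: tx);
                        tx.Commit();
                    }
                }
            }
        }
    }

Call SchemaUpdater.ApplyPendingScripts(connectionString) once at startup; anything already recorded in SchemaVersions is skipped, so destructive, roll-forward changes only ever run once.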

Proposed functionality and API changes for ical.net v3

Downloading remote resources

When I ported ical.net to .NET Core, I removed the ability to download remote payloads from a URI. I did this for many reasons:

  • There are myriad ways of accessing an HTTP resource. There are myriad ways of doing authentication. Consumers of ical.net are in a position to know the details of their environment, including security concerns, so responsibility for these concerns should lie with the developers using the library.
  • Choosing to support HttpClient leaves .NET 4.0 users out in the cold. Choosing to support WebClient brings those people into the fold, but leaves .NET Core and WinRT users out. It also prevents developers working with newer versions of .NET from benefiting from HttpClient.
  • Non-blocking IO leaves developers working with WinForms and framework versions < 4.5 out in the cold. Bringing those developers back into the fold means we can’t make use of async Tasks. Given the popularity of microservices and ical.net’s origins on the server side, this is a non-starter.

We can’t satisfy all use cases if we try to do everything, so instead I’ve decided that we’ll leave over-the-wire tasks to the developers using ical.net.

The primacy of strings

To that end… strings will be the primary way to work with ical.net. A developer should be able to instantiate everything from a huge collection of calendars down to a single calendar component (a VEVENT, for example) by passing it a string that represents that thing. In modern C#, working directly with strings is more natural than passing Streams around, which is emblematic of old-school Java. Passing Streams around is also more error-prone: I fixed several memory leaks during the .NET Core port due to undisposed Streams.

  • The constructor will be the deserializer. It is reasonable for the constructor to deserialize the textual representation into the typed representation.
  • ToString() will be the serializer. It is reasonable for ToString() to serialize the typed representation into the textual representation.
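A hypothetical sketch of that shape (the Calendar type below is a stand-in to show the intended surface, not the real ical.net type, and fetching the text over the wire stays in the consumer’s hands):

    // Hypothetical sketch of the proposed shape. The Calendar type here is a
    // stand-in to show the intended surface, not the real ical.net type.
    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    public sealed class Calendar
    {
        private readonly string _ics;

        // The constructor is the deserializer (real parsing elided in this sketch).
        public Calendar(string icsText) => _ics = icsText;

        // ToString() is the serializer.
        public override string ToString() => _ics;
    }

    public static class Example
    {
        public static async Task Main()
        {
            // Fetching the payload is the consumer's job: HttpClient, WebClient,
            // authentication, retries, and so on.
            string icsText;
            using (var http = new HttpClient())
            {
                icsText = await http.GetStringAsync("https://example.com/calendar.ics");
            }

            var calendar = new Calendar(icsText);
            Console.WriteLine(calendar.ToString());
        }
    }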

Constructors as deserializers buys us…

Immutable types and (maybe) a fluid API

One of the challenges I faced when refactoring for performance was reasoning about mutable properties during serialization and deserialization. Today, deserialization makes extensive use of public, mutable properties, and the documented usage reflects this mutability.
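It looks something like this (a representative sketch of the mutable style; the values are made up, and the namespaces and type names follow the v2-era API, so treat them as approximate):

    // A representative sketch of the mutable style (values made up; namespaces
    // and type names follow the v2-era API, so treat them as approximate).
    using System;
    using Ical.Net;
    using Ical.Net.DataTypes;
    using Ical.Net.Serialization;
    using Ical.Net.Serialization.iCalendar.Serializers;

    public static class MutableStyleExample
    {
        public static void Main()
        {
            var calendarEvent = new Event
            {
                DtStart = new CalDateTime(DateTime.Now),
                Duration = TimeSpan.FromHours(1), // settable, even though it's derivable
                Summary = "Team meeting",
            };

            var calendar = new Calendar();
            calendar.Events.Add(calendarEvent);

            var serializer = new CalendarSerializer(new SerializationContext());
            Console.WriteLine(serializer.SerializeToString(calendar));
        }
    }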

To be completely honest, this state of affairs makes it quite difficult to make internal changes without breaking stuff. Many properties would naturally be getter-only, because they can be derived from simple internals, like Duration above. Yet they’re explicitly set during deserialization. This is an incredible vector for bugs and breaking changes. (Ask me how I know…)

If we close these doors and windows, it will increase our internal maneuverability.

Fluid API

Look at the code above. Couldn’t it be more elegant? Shouldn’t it be? I don’t yet have a fully-formed idea of what a more fluid API might look like. Suggestions welcome.

Component names

IICalendar type names

The .NET framework guidelines recommend prefixing interface names with “I”. The calendar spec is called “iCalendar”, as in “internet calendar”, which is an unfortunate coincidence. Naming conventions like IICalendarCollection offend my sense of aesthetics, so I renamed some objects when I forked ical.net from dday. I’ve come around to valuing consistency over aesthetics, so I may go back to the double-I where it makes sense to do so.

CalDateTime

The object that represents “a DateTime with a time zone” is called a CalDateTime. I’m not wild about this; we already have the .NET DateTime struct which has its own shortcomings that’ve been exhaustively documented elsewhere. A reasonable replacement for CalDateTime might be a DateTimeOffset with a string representation of an IANA, BCL, or Serialization time zone, with the time zone conversions delegated to NodaTime for computing recurrences. (In fact, NodaTime is already doing the heavy lifting behind the scenes for performance reasons, but the implementation isn’t pretty because of CalDateTime’s mutability. Were it immutable, it would have been a straightforward engine replacement.)
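A rough sketch of what that might look like, with a hypothetical type name and IANA/tzdb ids assumed (not the current CalDateTime):

    // Hypothetical sketch of an immutable replacement; not the current CalDateTime.
    using System;
    using NodaTime;

    public sealed class ZonedCalDateTime
    {
        public DateTimeOffset Value { get; }
        public string TzId { get; } // e.g. an IANA id like "America/New_York"

        public ZonedCalDateTime(DateTimeOffset value, string tzId)
        {
            Value = value;
            TzId = tzId;
        }

        // NodaTime does the heavy lifting; IANA/tzdb ids are assumed here.
        public ZonedDateTime ToZoned()
        {
            var zone = DateTimeZoneProviders.Tzdb[TzId];
            return Instant.FromDateTimeOffset(Value).InZone(zone);
        }
    }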

CalDateTime is the lynchpin for most of the ical.net library. Most of its public properties should be simple expression bodies. Saner serialization and deserialization will have to come first as outlined above.

Divergence from spec completeness and adherence

VTIMEZONE

The iCalendar spec has ways of representing time change rules with VTIMEZONE. In the old days, dday.ical used this information to figure out Standard Time/Summer Time transitions. But as the spec itself notes:

Note: The specification of a global time zone registry is not addressed by this document and is left for future study. However, implementers may find the Olson time zone database [TZ] a useful reference. It is an informal, public-domain collection of time zone information, which is currently being maintained by volunteer Internet participants, and is used in several operating systems. This database contains current and historical time zone information for a wide variety of locations around the globe; it provides a time zone identifier for every unique time zone rule set in actual use since 1970, with historical data going back to the introduction of standard time.

At this point in time, the IANA (née Olson) tz database is the best source of truth. Relying on clients to specify reasonable time zone and time change behavior is unrealistic. I hope the spec authors revisit the VTIMEZONE element, and instead have it specify a standard time zone string, preferably IANA.

To that end… ical.net will continue to preserve VTIMEZONE fields, but it will not use them for recurrence computations or understanding Summer/Winter time changes. It will continue to rely on NodaTime for that.

URL and ATTACH

As mentioned above, ical.net will no longer include functionality to download resources from URIs. It will continue to preserve these fields so clients can do what they wish with the information they contain. This isn’t a divergence from the spec, per se; the spec doesn’t state that clients should provide facilities to download resources.

dday.ical is now ical.net and available under the MIT license with many performance enhancements

A few months ago, I needed to do some calendar programming for work, and I came across the dday.ical library, like many developers before me. And like many developers, I discovered that dday.ical doesn’t have the best performance, particularly under heavy server loads.

I dug in, and started making changes to the source code, and that’s when I discovered that the licensing was ambiguous, and that it had been abandoned. I was concerned that I might be exposing my company to risk due to unclear copyright, and a non-standard license.

With some effort, I was able to track down Doug Day (dday), and he gave me permission to fork, rename (ical.net), and relicense his library (MIT), which I have done. So I’m happy to report…

dday.ical is now ical.net

mdavid, who saw to it that the library wasn’t lost to the dustbin of Internet history, has graciously redirected dday users to ical.net. Khalid Abuhakmeh, who published the dday nuget package that you might be using (you should switch ASAP), has also agreed to archive it and redirect users to ical.net.

So… why should you use the new package?

Unambiguous licensing

Doug has relinquished his copyright, and given unrestricted permission to give dday.ical new life as ical.net. That means ical.net is unencumbered by legal ambiguities.

Many performance enhancements

My changes to ical.net have been mostly performance-focused. I was lucky in that dday.ical has always included a robust test suite with about 170 unit tests that exercise all the features of the library. Some were broken, or referenced non-existent ics files, so I nuked those right away, and concentrated on the set of tests that were working as a baseline for making safe changes.

The numbers:

  • Old dday.ical test suite: ~17 seconds
  • Latest ical.net nuget package: 3.5 seconds

There are no games here. ical.net really is that much faster.

Profiling showed a few hotspots, which I attacked first, but those only bought me maybe 3-4 seconds of improvement. There was no single thing that resulted in huge performance gains. Rather, it was many, many small changes that contributed, quite often by reducing garbage collection pauses, many of which were 5ms+, which is an eternity in computing time.

Here are a few themes that stand out in my memory:

  • Routing all time zone conversions through NodaTime, which actually exposed some bugs in what the unit tests were asserting
  • Converting .NET 1.1 collections (Hashtable, ArrayList) to modern, generic equivalents
  • Converting List<T> to HashSet<T> for many collections, including creating stable, minimal GetHashCode() methods, though more attention is still needed in this area. A nice side effect was that a lot of lookups and collection operations became set operations (ExceptWith(), UnionWith(), etc.)
  • Converting several O(n^2) methods to O(n) or better by restructuring them based on information that was available in context
  • Converting a lot of loops to LINQ. (Yes, really!)
  • Specifying initial collection sizes when using array-backed collections like List<T> and Dictionary<TKey, TValue>
  • Moving variables closer to their usage, which sometimes means certain expensive calls don’t occur at all, because the method exits before reaching them. This also has the effect of pushing some variables into gen 0 garbage collection. (Anecdotally, GC pauses seem fewer and further between, though I don’t have hard data showing the difference is significant.)
  • Moving expensive calls outside of tight loops (see the sketch after this list). Unfortunately the library makes extensive use of the service-provider antipattern. A common case was an expensive call (get me a deserializer for Foo!) inside a tight loop that only ever deserializes Foos, so the call can be made once and the deserializer reused.
  • Implementing a lazy caching layer, as suggested in one of the TODOs in the comments.
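To illustrate moving expensive calls out of tight loops, the before/after shape looked roughly like this (the factory and deserializer types below are simplified stand-ins, not the exact library code):

    // Simplified stand-ins to show the shape of the change; not the exact ical.net types.
    using System;
    using System.Collections.Generic;

    public interface IDeserializer
    {
        object Deserialize(string text);
    }

    public interface IDeserializerFactory
    {
        IDeserializer Build(Type type); // the expensive service lookup
    }

    public class Foo { }

    public static class HoistingExample
    {
        // Before: the expensive factory lookup happens on every iteration.
        public static List<Foo> Slow(IDeserializerFactory factory, IEnumerable<string> lines)
        {
            var results = new List<Foo>();
            foreach (var line in lines)
            {
                var deserializer = factory.Build(typeof(Foo));
                results.Add((Foo)deserializer.Deserialize(line));
            }
            return results;
        }

        // After: build the deserializer once and reuse it inside the loop.
        public static List<Foo> Fast(IDeserializerFactory factory, IEnumerable<string> lines)
        {
            var deserializer = factory.Build(typeof(Foo));
            var results = new List<Foo>();
            foreach (var line in lines)
            {
                results.Add((Foo)deserializer.Deserialize(line));
            }
            return results;
        }
    }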

Along the way, I converted a lot of code to modern, idiomatic C#, which actually helped performance as much as any of the discrete things I did above. As I work towards a .NET Core port, I have the runtime down to about 2.8 seconds just through clarifying and restructuring existing code, and idiomatic simplifications.

What’s next?

  • A .NET Core port is nearly complete.
  • ical.net has virtually no documentation. I hope to improve the readme with some simple examples this morning/afternoon.
  • I have been bug collecting on Stack Overflow, and have a few maybe-bugs to investigate and/or write test cases for.
  • Maybe some API changes for v3, still TBD. I’ll discuss these in a future blog post.

Creating an array of generics in Java

I was messing around with creating a generic Bag collection in Java that’d be backed by an array. It turns out that you can’t do this for a number of interesting reasons…

In Java (and C#), arrays are covariant. This means that if Apple is a subtype of Fruit, then Apple[] will also be a subtype of Fruit[]. Pretty straightforward. That means this will compile:
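A minimal, self-contained version (the Banana type is added here only to demonstrate the runtime failure discussed below):

    // Arrays are covariant, so the assignment below compiles...
    class Fruit { }
    class Apple extends Fruit { }
    class Banana extends Fruit { }

    public class CovariantArrays {
        public static void main(String[] args) {
            Fruit[] fruits = new Apple[10]; // legal: Apple[] is a subtype of Fruit[]
            fruits[0] = new Banana();       // ...but this throws ArrayStoreException at runtime
        }
    }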

If you’re like me, you didn’t think too hard about this, and assumed you could do the same with parameterized types, i.e. generics. Thankfully you can’t, because that kind of code is unsafe: the array version compiles, but throws an ArrayStoreException at runtime, which we’d have to handle.

Wouldn’t it be great if we could guarantee type safety at compile time?

Generics are safer

Unlike arrays, generics are invariant, which means that Apple being a subtype of Fruit doesn’t matter: a List<Apple> is a different type from a List<Fruit>. The generic version of the code above is illegal:
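Using the same Fruit and Apple types as above:

    // Assumes java.util.List and java.util.ArrayList are imported.
    List<Fruit> fruits = new ArrayList<Apple>(); // compile-time error: incompatible types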

You can’t cast it, either:
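Again with the same types:

    // Also rejected at compile time ("inconvertible types"): the compiler can prove
    // an ArrayList<Apple> can never be a List<Fruit>.
    List<Fruit> fruits = (List<Fruit>) new ArrayList<Apple>();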

By making generics invariant, we guarantee safe behavior at compile time, which is a much cheaper place to catch errors. (This is one of the big reasons developers get excited about generics.)

So why are arrays and generics mutually exclusive?

In Java, generics have their types erased at compile time. This is called type erasure. Type erasure means a couple of things happen at compile time:

  • Generic types are boiled down to their raw types: you cannot have a Derp and a Derp<T> in the same package.
  • Overloads that differ only in type parameters won’t compile: a class with both popFirst(Derp<T> derp) and popFirst(Derp derp) won’t compile, because the two methods have the same erasure (see the sketch after this list).
  • Runtime casts are inserted invisibly by the compiler to ensure runtime type safety. (This means there’s no performance benefit to generics in Java!)
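A sketch of the second point, reusing the Derp placeholder from above:

    class Derp<T> { }

    class Overloads {
        <T> void popFirst(Derp<T> derp) { }
        void popFirst(Derp derp) { }   // won't compile: both methods erase to popFirst(Derp)
    }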

Java’s implementation of generic types is clumsy, and was done to maintain bytecode backward compatibility between Java 5 and Java 1.4.

Other high-level languages (like C#) implement generics very differently, which means none of the three caveats above apply. Generics in those full-stack implementations deliver net performance gains along with the type-safety guarantees.

To recap, in Java:

  • Arrays require type information at compile time
  • Generics have their types erased at compile time

Therefore you cannot create arrays of parameterized types in Java.
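Concretely, both of the array fields below are rejected by the compiler with a “generic array creation” error (the Bag fields are just for illustration):

    import java.util.ArrayList;
    import java.util.List;

    class Bag<T> {
        private T[] items = new T[16];                              // illegal: generic array creation
        private List<String>[] buckets = new ArrayList<String>[16]; // illegal: generic array creation
    }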

Further reading

How to fix broken iCloud photostream sync on Windows

Symptom

  • Your iPhone is set to back up your photos to iCloud
  • iCloud on your Windows machine is configured to download your photos
  • iCloud isn’t downloading your photo stream.

Fix

  1. Open the Task Manager by hitting Ctrl+Shift+Esc
  2. Click the Processes tab
  3. Click Name to sort the processes by name
  4. Find the Apple Photostreams Uploader and Apple Photostreams Downloader processes. End both of them.
    • In Windows 7, these will be called ApplePhotostreamsUploader.exe and ApplePhotostreamsDownloader.exe
  5. Hold down your Windows key, and hit R to open a Run prompt
  6. Type %appdata% and hit Enter
  7. Open Apple Computer > MediaStream
  8. Delete everything in the directory
  9. Log out of your Windows account, and log back in (or just reboot, if you find that easier)
  10. Once you’ve logged back into your Windows account, open the iCloud control panel again
  11. If the Photos checkbox is empty, check it
  12. Click Options, and make sure the photo options are configured how you want them
  13. Click Apply

In a few moments, your photos should start downloading.

Notes

iCloud isn’t very smart about a great many things. Here are a few:

  • If you changed the location of your downloaded photos, it will redownload what it can, creating duplicates.
  • In the iCloud 2.x days, your downloads and uploads were usually split into a Downloads and Uploads directory, and you could change the directories if you wanted. That’s not true anymore. Instead, iCloud 3.x creates a “My Photo Stream” directory and sticks your downloads in there. Anything you’ve shared with other people, or that other people have shared with you, goes into “Shared”. If you want to push a photo from your computer to iCloud, put it into Uploads.


If you found this post useful, please consider donating $2