Neither Sound nor Complete

Saturday, November 14, 2009

Stream-based sudoku solver

A friend of mine was talking about a sudoku solver he'd written in python as an programming challenge, and confessed that Peter Norvig's version was more elegant, in part because he used generators. Python generators are infinite streams -- so this sounded like something I could tackle with Aqueduct.

I though a bit about how to solve Sudoku using streams (I didn't want to cheat and actually read PN's solution before trying it myself). Basically, I reasoned that I could use streams to brute force a solution in the same way I did for the N-queens solver. Each one of the 9 x 9 cells in the sudoko board would be its own pipeline stage. It would combine a stream of all valid input boards with all possible values of that cell, filter out invalid moves, and pass new boards on to the next cell.

Solving N-Queens problems with Aqueduct

One of the more interesting and rewarding things I've been doing during my 'time off' is to watch the lectures for the classic MIT course "Structure and Interpretation of Computer Programs", or "SICP". The lectures are online, and are fascinating -- it makes you long for a better day when a programmer could get a job matching parentheses. At any rate, they spend a long time on streams, and well, streams are something I kinda dig. One of the examples they attacked using streams intrigued me -- solving the n-queens problem.

For those who aren't familiar with chess arcana or intro to AI problem sets, the n-queens problem is "how do you arrange N queens on an N by N chessboard such that none of the queens can take one another... that is, none of them are in the same row, column, or diagonal.

Rootkit Removal (Part 4: Reenabling Autoupdate)

It seemed like the hard part was over -- I disabled the rootkit from starting at boot, got rid of the scheduled task, and cleaned up the rootkit files that I knew about. All I needed to do was enable auto-update... that should be simple, right?

The rootkit was disabling the windows update service as soon as I could start it. The service startup type is stored in the registry but also has a live value in the service control manager. The rootkit could in theory set either one, which made things a bit more complex. The fact that the change happened immediately pointed to the service control manger, which is a higher level API. As far as I know the SCM doesn't have a notification system so the rootkit must be polling every few seconds to make sure it still hadn't started.

Rootkit Removal (Part 3: Rootkit Returns)

A couple of days after disabling the rootkit on my wife's computer, she complained that it was back. My first thought was "must be user error ... I wasted that thing!". My second thought was, I should have known better ... when somebody pwns your machine, it is impossible to ever make sure they can't get in again. The closest you can get to being sure is to reinstall the OS. I told my wife "Sorry, honey, you're out of luck -- you're going to need to reinstall. Do you have anything you need on there that isn't backed up?" I turned out that most of what wasn't backed up was my stuff -- files that were rescued from a dying laptop a couple of years ago that were way too important to lose but not so important that I would need them in the last two years.

Paranoid thought of the day -- if you have a rootkit on your computer, and you want to back up your files to external media (usb stick, cd, etc), how do you know you're not going to transmit the virus via that media? I suppose I could copy files to my mac, since a windows virus would be unlikely to affect a mac (not tha mac's don't have their own security problems). But there is still a problem if we want to put those files on my wife's computer again after we reinstall. It would absolutely suck if the infection was via a malicious pdf and we wiped the computer clean only to have it come back when she opened the pdf again.

Rootkit Removal (Part 2: Excision)

I had found the mechanism that the rootkit was using to hook itself into the system, which was a registry Run key. The rootkit monitored the key and if you ever tried to remove the entry, it would add it right back.

Programmers often fail to deal with permissions problems -- that is they can deal with a resource not being present, but not so well with a resource being inaccessible because the user doesn't have the right permissions. With this in mind, I tried removing write permissions on the Run registry key. This created a chicken-and-egg problem, since in order to remove the malicious entry I needed write permissions, but if I removed the entry the rootkit would change it back before I could change the permissions. I tried this a few times, including other permissions permutations involving other users. No luck.

Rootkit Removal (Part 1: Detection)

The other day my wife discovered that her computer had been pwnd -- a rootkit was redirecting links resulting from google searches to point to malicious sites. I was able to disable it* after a lot of trouble, and I thought I would write a post describing what I tried and what worked and what didn't.

My wife is running windows XP and had been complaining of "slow internet" and "weird thing happening in Firefox". The slow internet part I chalked up to Clearwire being generally flaky, and "weird things" was too vague to consider as anything worth worrying about. One day she complained that when she clicked on a link as a result of a google search, something else came up. I tried the same search from my machine, and got the expected results. Hmm... something sounds broken.

Stream Processing (and Aqueduct Intro)

Since the dawn of time (or at least the dawn of unix time, Jan 1 1970), computer scientists have known that stream processing was a powerful programming model. Aqueduct is a stream processing system I built while working at Mindset Media that they graciously allowed me to open source (available under Apache license on codeplex). Over a couple of posts I'm going to walk through a couple of cool things that you can do with it, but for now, I'll just talk about the benefits of the streaming model.

Let's say you have some data that you want to do something with... to be simple let's say you want to compute an average from the data samples that were taken on a weekday. Normally you'd read in the data, and add each weekday sample and divide by the length.... easy, right? Well that is great, but what if the data is too big to fit into memory? Ok so you break it up into chunks ... but how big should those chunks be*? I don't know, but you need to make sure it runs on your manager's laptop at the same time as his 40 IE browser windows. How much of this code is reusable? Maybe if you structure it carefully you can reuse most of it, but if you want to add a different calculation (say variance in the samples) the code will likely get more complex.

Stream processing can solve these problems by viewing your computation as a pipeline of computations and your data as a stream of samples passing through that pipeline. For the example above, you have two pipeline stages -- a filter and an adder. The first stage filters out samples that weren't on a weekday, the second computes the sum. Since individual data samples are processed one at a time, it doesn't matter how big the underlying data set is. And the components are reusable ... if you have another data source you want to remove non-weekday samples from, you just need to drop in the weekday filter. If you have another value you want to sum, you just drop in the adder.