presentation at icel 09

so a couple of weeks ago i presented my third academic publication, this one titled “persisting chat for communities of practice.”

joan at icel 2009

in layman’s terms, it’s a new logging system for online group chat (irc, jabber, etc.) with special integrations into non-synchronous systems such as web forums (academically often called asynchronous learning networks) and blogs. the goal is to make chat less transient and throwaway, promoting it to a first-class citizen among the wide variety of mechanisms that can be used to help communities. it’s especially designed for communities of practice as described by lave and wenger out at xerox parc way back in the day. go read the linked article; it’s not half bad for wikipedia.

the system will be released under an open source license later this year. if you’d like to get involved, comment here using your real email address and i’ll get in touch.

okonomiyaki recipes

Good buddy neillathotep has been bugging me to post about my okonomiyaki escapades – so here you go.

The recipe is really simple. Shred cabbage, green onion, garlic scapes (if you have them!), red ginger (you can buy this pre-shredded) and nori. Cut up a bunch of other foods you’d like in your okonomiyaki, such as bacon, mochi, pork, squid, etc. For today’s recipes I made one with bacon, and another with brie and asparagus. Mix up 2 parts flour to 1 part dashi — you can use buckwheat flour if you like, or a mix of white, whole wheat, sweet potato/potato, etc. You can use salted water if you don’t have dashi. Also set aside one egg per pancake.

Ingredients for okonomiyaki

In a bowl, mix the egg with the cabbage and a cup of the flour-dashi mixture. Heat a griddle to medium-hot and oil it with sesame or sunflower oil. Pour out the egg-cabbage batter and spread it into a pancake. Top with your special toppings. Use a spatula to press down on the pancake until the bottom is well cooked. Flip the pancake over and repeat the pressing routine until the pancake looks dry in the center when viewed edge-on.

Okonomiyaki viewed edge on. Just about ready!

Remove from the griddle. Top with shredded nori, red ginger, kewpie mayo (if you can find it!), katsuobushi (dried bonito flakes – ditto) and okonomiyaki sauce (decent collection of recipes here). Cut into 4 pieces and serve.

Ready to eat!

CouchDB benchmarking followup

I wanted to post a public thanks to the great minds over in freenode #couchdb, including Paul Davis and Jan Lehnardt, for helping me learn more about CouchDB and investigate the performance issue I posted about last time. In fact, they both posted on Planet CouchDB with some thoughts about benchmarking in general.

I wanted to share my reason for benchmarking CouchDB in the manner I did. It was entirely personal: I wanted to make my own dataset and code load as fast as possible. I’m not trying to generate pretty graphs for my management, for publication or for personal gain. This work comes out of having to load and reload fairly large datasets from scratch on a regular basis as part of my research methodology. I need a large dataset to get meaningful results out of the rest of my system, and I was not content to wait an hour or two every time my system bulk loaded the content.

So the suggestions provided – not using random UUIDs (so that CouchDB can keep its b+-tree balanced), using bulk insert correctly, and redoing my code to use the official CouchDB interface instead of a hacked-up version built on raw PUTs and POSTs – helped a lot.
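
For the curious, the reworked loader ended up roughly the shape below. This is a minimal sketch assuming the couchdb-python package and a local server; the database name, document source and batch size are placeholders rather than my actual research code.

    import couchdb

    # A sketch of the reworked loader, not the actual research code.
    server = couchdb.Server('http://localhost:5984')
    db = server['mydb']  # placeholder database name

    def bulk_load(docs, batch_size=1000):
        batch = []
        for n, doc in enumerate(docs):
            # Sequential zero-padded _id instead of a random UUID,
            # so inserts append to the b+-tree rather than thrashing it.
            doc['_id'] = '%010d' % n
            batch.append(doc)
            if len(batch) == batch_size:
                db.update(batch)  # one _bulk_docs round trip per batch
                batch = []
        if batch:
            db.update(batch)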

It turned out that the spike I was seeing (see last post) disappeared when I randomized the order in which I incremented that variable – so much so that 3 randomized runs show almost no peaks. However, when I “scroll” through that variable in order (steadily increasing the batch size) I still see the peak around a batch size of 3k.

Trying the software on another platform (an AMD x64-3500 running Debian lenny with a 3ware hardware RAID array, as opposed to a 4-core Mac Pro with only a single local disk) revealed that the peak shifted to a different, much higher value.

Lesson? Always benchmark your own application against your own data, and tweak until you’re satisfied. Or, in the words immortalized at Soundtracks Recording Studio, New Trier High School, “Jiggle the thingy until it clicks.”

I suspected that Jan’s article ranting about benchmarking was at least in part stimulated by my experiences as shared over IRC. (I was wrong – see the comments.) They must have seemed somewhat inscrutable — why would someone care so much about something most database-backed applications will do rarely, compared to reads, queries and updates? My application right now has a very particular performance requirement (repeatable, high-performance bulk load) that is not likely anyone’s standard use case but my own. Nor would it be worthwhile for the developers to spend a bundle of effort optimizing this particular scenario.

That said, Jan is also calling for people to start compiling profiling suites that “…simulate different usage patterns of their system.” With this research I don’t have the weight of a corporation behind me willing to agree on “…a set of benchmarks that objectively measure performance for easy comparison,” but I can at least contribute my use case for use by others. Paul Davis’ benchmark script looks quite a bit like what I’m doing, except that my document count is larger by a factor of 100 (~2 million here) and my per-document size is smaller by a factor of 25 (100–200 bytes here). Knowing the time it takes to insert, and to run a basic map/reduce function over, fairly similar data is a great place to start thinking about performance considerations in my application.

Oh, and knowing that the new code in the coming branches will get me a performance increase of at least 2x with no effort on my part is the icing on the cake.

Thanks davisp. Thanks jan.

CouchDB 0.9.0 bulk document post performance

Based on a tip from my university colleague Chris Teplovs, I started looking at CouchDB for some analytics code I’ve been working on for my graduate studies. My experimental data set is approximately 1.9 million documents, with an average document size of 256 bytes. Documents range in size from approximately 100 to 512 bytes. (FYI, this represents about a 2x increase in size from the raw data’s original form, prior to the extraction of desired metadata.)

I struggled for a while with performance problems in initial data load, feeling unenlightened by other posts, until I cornered a few of the developers and asked them for advice. Here’s what they suggested:

  1. Use bulk insert. This is the single most important thing you can do. It reduced the initial load time from ~8 hours to under an hour, and prevents the need to compact the database. Baseline time: 42 minutes, using 1,000 documents per batch.

  2. Don’t use the default _id assigned by CouchDB. It’s just a random ID and apparently really slows down the insert operation. Instead, create your own sequence; a 10-digit sequential number was recommended. This bought me a 3x speedup and a 6x reduction in database size. Baseline time: 12 minutes, again using 1,000 documents per batch. (A sketch of both changes follows below.)
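
To make the two tips concrete, here’s a minimal sketch against the raw _bulk_docs API, with the requests library standing in for my actual loading code; the database name and numbers are placeholders.

    import json
    import requests

    COUCH = 'http://localhost:5984'
    DB = 'mydb'  # placeholder database name

    def bulk_insert(docs, start=0):
        # Tip 2: assign 10-digit sequential _ids rather than letting
        # CouchDB generate random ones.
        for n, doc in enumerate(docs, start):
            doc['_id'] = '%010d' % n
        # Tip 1: one POST to _bulk_docs per batch instead of one
        # request per document.
        resp = requests.post('%s/%s/_bulk_docs' % (COUCH, DB),
                             data=json.dumps({'docs': docs}),
                             headers={'Content-Type': 'application/json'})
        resp.raise_for_status()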

Using 1,000 documents per batch was a wild guess, so I decided it was time to run some tests. Using a simple shell script and GNU time, I generated the following plot of batch size vs. elapsed time:

Strange bulk insert performance under CouchDB 0.9.0
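
The harness behind the plot was nothing fancy. Sketched in Python (standing in for the actual shell script and GNU time; load_data.py and the sweep range are placeholders), it amounts to:

    import subprocess
    import time

    # Sweep the batch size and record wall-clock load time;
    # load_data.py stands in for the actual loader script.
    for batch_size in range(500, 10001, 500):
        start = time.time()
        subprocess.check_call(['python', 'load_data.py',
                               '--batch-size', str(batch_size)])
        print('%d\t%.1f' % (batch_size, time.time() - start))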

The more-than-exponential growth at the right of the graph is expected; however, the peak around 3,000 documents per batch is not. I was so surprised by the results that I ran the test 3 times – and got consistent data. I’m currently running a denser set of tests between 1,000 and 6,000 documents per batch to characterize the peak a bit better.

Are there any CouchDB developers out there who can comment? You can find me on the #couchdb freenode channel as well.

forever anonymous

Dragging myself to consciousness, I gasped for air. The images of gravestones and memorials of the intellectual elite, festooned with working mainframe keypunches and proofs of famous mathematical theorems in honour of their contributions to society, still lingered. I could still feel the dirt being shoveled on top of me prematurely as I struggled to break free of my restraints. The bottoms of my lungs burned like the teenage mistake of inhaling deeply from a clove cigarette. Still, it burned less than the stinging sensation of my subconscious clawing through the thin layer of conceit I’d previously put up in my life to hide the twin holes of fear and shame.

What a terrible metaphor. I’ll start again.

Lately I’ve been working harder on my doctoral research, in the hopes that this may be my best chance to leave something of value behind in this world. I’ve chosen not to raise children, and judging from the superior job my friends J., D., N. and S. are doing with theirs, I made the right choice. Then I look at my friends who have some life calling they’ve dedicated themselves to since childhood, and wonder how different my life would have been if only I could have settled on one thing to do, rather than insisting on being a polymath.

Then I think of the reality: it’s a huge conceit to pretend that anyone will remember me at all 5 years after my death, let alone 500. The odds of that are so low as to be unthinkable. But why the terror of being forgotten, if it’s so likely? And how does this square with my personal philosophy that it’s the ideas that count, not the people?

OK, so I’ll substitute idea for name, and the good feeling I would get from knowing I helped other people far into the future. It’s still a huge conceit to pretend that anyone will remember my ideas at all 5 years after I come up with them, let alone 500. So why fear the inevitable? And why put so much pressure on myself to achieve something that’s rare (and, most likely, out of my hands)?

Interestingly, I don’t fear death. I think I overcame that one many years ago when I struggled with some of the other demons in my life, and came out on top. Perhaps this is my chance to overcome this ridiculous idea as well. There was definitely a time at which I really bought into the idea that helping just one person was as good as helping a whole flotilla – and I’ve definitely helped at least one person in an immense way. What changed?

Now, after talking it out online, I’m fairly sure it was the dawning realization, though not quite an ecological one, that the resources I have burned (money, the time and patience of those smarter than I, and yes, non-renewables like jet fuel and natural gas) are above average. I have also been given a lot of unique and interesting opportunities, and gained a lot of special skills. It was then I decided that I had to do everything I could to put all of that good stuff back into the world, in as many ways as possible. Anything else is just conspicuous consumption. I owe it to everyone else to do as good a job as I can of paying things back.

The thing is, my life is almost all about paying it back in one way or another. I’ve had my phases of acting spoiled, but after being forced (almost at knife-point) to volunteer my time in college, helping others has become a sort of addiction, perhaps even an unhealthy one. My job for many years now has been nothing but helping other people get their jobs done more effectively. I teach, I learn (then re-teach what I learn), I research (and then teach what I find out), and I volunteer my time when I know full well I really should be taking it for myself. I helped keep a household of friends going when no one else could really make ends meet. I’ve helped those in need find money for surgery, and given them the emotional strength necessary to pull through. It’s never felt like an obligation. But it’s never felt like it’s enough.

I treat that feeling of emptiness inside as telling me that I still have more work to do, more in me to give to others. Perhaps it is only a twisted redirection of guilt and shame, a hope of becoming immortal in some sense. But I don’t work hard only to see my name in lights (especially since that never happens), or predicate my effort on the knowledge that I’ll gain something from it. It’s because I like doing a good job, because it does feel good to know I’ve accomplished something concrete, like presenting my first paper in several years at a conference. I’ll do it even if I’m ignored, or if someone else claims the credit for my work. (Unfortunately this sometimes makes me a poor businesswoman.) And, more importantly than anything else, I know that when I find I am doing something that causes harm to someone else, I change how I act. This is my own private version of walking softly; I have yet to figure out how to correctly carry a big stick, so I walk with my hands in my pockets instead.

And yet I still have dreams like this one tonight that wake me at 3:30, prevent me from going back to sleep, and keep me writing uncontrollably. What am I missing? Am I acting reasonably? Should I be doing more? Less? Something different? I know I’m not the only person who has felt this way, but I also know my attention span is so poor right now that I can’t think of where to start researching the motivations of the great, the noble, the weak and the despicable monsters. I need a sanity check (and maybe a kick in the ass) so I can move forward. I refuse to sink into solipsistic musings, but a little introspection every now and again can’t hurt!

thing-a-day #15: deep fried kitchen

tonight i went overboard and deep fried things. yanno, when the oil is hot, ya gotta use it, right? my beer batter included sleeman’s cream ale, flour, and two kinds of bacon salt.

besides the fairly mundane broccoli and onions, i also deep fried local organic cheese curds. they’ve displaced mozzarella sticks as my new favourite.

but the truly amazing items were the deep fried bounty bars, in the same batter (yes, with bacon salt). these things have no right to taste this good. seriously. if you live in the US, try finding bounty bars (coconut enrobed in dark chocolate); they’re in many places now. if not, you’ll have to use the inferior mounds bars.