presentation at icel 09

so a couple of weeks ago i presented my third academic publication, this one titled “persisting chat for communities of practice.”

joan at icel 2009

in layman’s terms, it’s a new logging system for online group chat (irc, jabber, etc.) with special integrations into non-synchronous systems such as web forums (academically often called asynchronous learning networks), blogs, etc. the goal is to make chat less transient and throwaway – to promote it to a first-class citizen among the wide variety of mechanisms that can be used to help communities. it’s especially designed for communities of practice as described by lave and wenger out at xerox parc way back in the day. go read the linked article, it’s not half bad for wikipedia.

the system will be released under an open source license later this year. if you’d like to get involved, comment here using your real email address and i’ll get in touch.

CouchDB benchmarking followup

I wanted to post a public thanks to the great minds over in freenode #couchdb – including Paul Davis and Jan Lehnardt – for helping me learn more about CouchDB and investigate the performance issue I posted about last time. In fact, they both posted on Planet CouchDB with some thoughts about benchmarking in general.

I wanted to share my reason for benchmarking CouchDB in the manner I did. It was entirely personal: I’m trying to make my own dataset and code as fast as possible. I’m not trying to generate pretty graphs for my management, for publication or for personal gain. This work comes out of having to load and reload fairly large datasets from scratch on a regular basis as part of my research methodology. I need a large dataset to get meaningful results out of the rest of my system, and I was not content to wait an hour or two each time my system bulk loaded the content.

So the suggestions provided – not using random UUIDs (to help CouchDB balance its B+-tree), correctly using bulk insert, and redoing my code to use the official CouchDB interface instead of a hacked-up version using raw PUTs and POSTs – helped a lot.
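
If you’re hitting the same problem, here’s a minimal sketch of what those suggestions look like in practice, assuming Python, the couchdb-python package and a stock CouchDB on localhost (the database name and documents are placeholders, not my actual loader):

    import couchdb

    # Minimal sketch of the suggestions above: sequential ids and _bulk_docs via the
    # couchdb-python package. Database name and document contents are placeholders.
    server = couchdb.Server("http://localhost:5984/")
    db = server.create("bulk_load_sketch")

    BATCH_SIZE = 1000

    def generate_records(n=10000):
        # Stand-in for your own data source; tiny documents just for illustration.
        for i in range(n):
            yield {"event": "example", "n": i}

    batch = []
    for seq, record in enumerate(generate_records(), start=1):
        record["_id"] = "%010d" % seq      # sequential 10-digit id, not a random UUID
        batch.append(record)
        if len(batch) >= BATCH_SIZE:
            db.update(batch)               # one POST to _bulk_docs per batch
            batch = []
    if batch:
        db.update(batch)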

It turned out that the spike I was seeing (see last post) disappeared when I randomized the order in which I swept the batch size – so much so that three randomized runs show almost no peaks. However, when I “scroll” through the batch sizes in increasing order, I still see the peak around a batch size of 3,000.

Trying the software on another platform (an AMD x64-3500 running Debian lenny with a 3ware hardware RAID array, as opposed to a 4-core Mac Pro with only a single local disk) revealed that the peak shifted to a different, much higher batch size.

Lesson? Always benchmark your own application against your own data, and tweak until you’re satisfied. Or, in the words immortalized at Soundtracks Recording Studio, New Trier High School, “Jiggle the thingy until it clicks.”

I suspected that Jan’s article ranting about benchmarking was at least in part stimulated by my experiences as shared over IRC. (I was wrong – see the comments.) They must have seemed somewhat inscrutable – why would someone care so much about something most database-backed applications do rarely, compared to reads, queries and updates? My application right now has a very particular set of performance criteria (repeatable, high-performance bulk load) that is not likely anyone’s standard use case but my own. Nor would it be worthwhile for the developers to spend a bundle of effort optimizing this particular scenario.

That said, Jan is also calling for people to start compiling profiling suites that “…simulate different usage patterns of their system.” With this research, I don’t have the weight of a corporation behind me willing to agree on “…a set of benchmarks that objectively measure performance for easy comparison,” but I can at least contribute my use case for use by others. Paul Davis’ benchmark script looks quite a bit like what I’m doing, except that my document count is larger by a factor of 100 (~2 million here) and my per-document size is smaller by a factor of 25 (100-200 bytes here). Knowing the time it takes to insert and to run a basic map/reduce function on fairly similar data is a great place to start thinking about performance considerations in my application.
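
And in the spirit of contributing, the map/reduce half of that measurement looks roughly like the sketch below (again assuming couchdb-python and an already-loaded database; the database name and view functions are placeholders, with a throwaway temporary view standing in for a real design document):

    import time
    import couchdb

    # Rough shape of timing a basic map/reduce over an already-loaded database,
    # using a throwaway temporary view; names and view functions are placeholders.
    db = couchdb.Server("http://localhost:5984/")["bulk_load_sketch"]

    map_fun = "function(doc) { emit(doc.event, 1); }"
    reduce_fun = "function(keys, values) { return sum(values); }"

    start = time.time()
    rows = list(db.query(map_fun, reduce_fun))
    print("map/reduce over %d docs took %.1f s" % (db.info()["doc_count"], time.time() - start))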

Oh, and knowing that the new code on the upcoming branches will get me a performance increase of at least 2x with no effort on my part is the icing on the cake.

Thanks, davisp. Thanks, Jan.

CouchDB 0.9.0 bulk document post performance

Based on a tip from my university colleague Chris Teplovs, I started looking at CouchDB for some analytics code I’ve been working on for my graduate studies. My experimental data set is approximately 1.9 million documents, with an average document size of 256 bytes. Documents range in size from approximately 100 to 512 bytes. (FYI, this represents about a 2x increase in size from the raw data’s original form, prior to the extraction of desired metadata.)

I struggled for a while with performance problems in initial data load, feeling unenlightened by other posts, until I cornered a few of the developers and asked them for advice. Here’s what they suggested:

  1. Use bulk insert. This is the single most important thing you can do. It reduced the initial load time from ~8 hours to under an hour, and removes the need to compact the database afterwards. Resulting load time: 42 minutes, using 1,000 documents per batch.

  2. Don’t use the default _id assigned by CouchDB. It’s just a random UUID, and it apparently really slows down the insert operation. Instead, create your own sequence; a 10-digit sequential number was recommended. This bought me a further 3x speedup and a 6x reduction in database size. Resulting load time: 12 minutes, again using 1,000 documents per batch.

Using 1,000 documents per batch was a wild guess, so I decided it was time to run some tests. Using a simple shell script and GNU time, I generated the following plot of batch size vs. elapsed time:
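
For the curious, the harness boils down to something like the sketch below. I’ve written it out in Python for readability; my actual harness was a shell script wrapped in GNU time, and the database names, document contents and counts here are illustrative rather than my real dataset.

    import time
    import couchdb

    # Sketch of the batch-size sweep; the real harness was a shell script plus GNU
    # time driving my loader. Names, payloads and document counts are illustrative.
    server = couchdb.Server("http://localhost:5984/")

    for batch_size in range(500, 6001, 500):
        dbname = "bench_%d" % batch_size
        if dbname in server:
            del server[dbname]
        db = server.create(dbname)

        # Fresh synthetic documents each run: sequential ids, ~200-byte payloads.
        docs = [{"_id": "%010d" % i, "payload": "x" * 200} for i in range(1, 100001)]

        start = time.time()
        for offset in range(0, len(docs), batch_size):
            db.update(docs[offset:offset + batch_size])   # one _bulk_docs POST per batch
        print("%5d docs/batch: %6.1f s" % (batch_size, time.time() - start))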

Strange bulk insert performance under CouchDB 0.9.0

The more-than-exponential growth at the right of the graph is expected; however, the peak around 3,000 documents per batch is not. I was so surprised by the results that I ran the test three times – and got consistent data. I’m currently running a denser set of tests between 1,000 and 6,000 documents per batch to characterize the peak a bit better.

Are there any CouchDB developers out there who can comment? You can find me on the #couchdb freenode channel as well.

thing-a-day #11: mac stabilization

been fighting my mac for weeks now, with constant freezes, hangs, system-wide crashes or video corruption anywhere from 1 to 100 minutes after reboot. seems i followed some bad advice in the past and turned on something i shouldn’t have. so, my thing for today is sharply worded advice:

Do not enable QuartzGL (2D acceleration) on your Mac Pro. To make sure QuartzGL is off, open Terminal, paste in this line and press Return:

sudo defaults write /Library/Preferences/com.apple.windowserver QuartzGLEnabled -boolean NO

Reboot to make this take effect. Voila, no more annoying crashes. You’re welcome.

thing-a-day #6: recovering old digital performer projects

I had a terrible scare tonight. None of my Digital Performer (my DAW) projects from before I moved to my new Mac would open. It suddenly felt like I’d lost 5+ years worth of musical experimentation.

After panicking a bit, I did a whole lot of research and came up with this process. It’s slow, but it works. And it counts as a thing for today, since no one else has ever written it all up in one place before.

  1. Go to the Terminal and change directory to your project, for example: cd Waynemanor/DP\ Projects/Barracuda\ Project/ (If using a UNIX command prompt and escaping spaces is new to you, you may want to read through a tutorial first.)
  2. Use ls to find the files that are your project files. In this case, I have two: Barracuda and dys4ik 2006-02-28.
  3. Install the Apple OSX Developer Tools if you don’t already have them.
  4. Use the following command: SetFile -t PERF -c MOUP <project-file-name>, substituting each project file name in turn. (If you’d rather script it, see the sketch just below this list.)
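
If you have more than a couple of project files, a small script saves some typing. Here’s a sketch in Python that just shells out to SetFile; the file names are the ones from my project (step 2), so substitute your own:

    import subprocess

    # Re-apply Digital Performer's type/creator codes by shelling out to SetFile
    # (installed with the Developer Tools). Run this from the project directory;
    # the file names below are from my project, so substitute your own.
    project_files = ["Barracuda", "dys4ik 2006-02-28"]

    for name in project_files:
        subprocess.check_call(["SetFile", "-t", "PERF", "-c", "MOUP", name])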

You’re not done – your audio files may also be corrupted. Try loading the project into DP. Still having problems? Getting a “Resource file was not found (-193)” error? Your audio files’ resource forks got lost, probably because you copied the project to a non-Mac system and back. Try these steps; some guesswork may be required.

  1. Download, install and run SoundHack.
  2. Use File > Open Any (cmd-A) to open your first sound file from the Audio Files directory.
  3. Use Hack > Header change (cmd-H) to assign the correct sample rate, number of channels and encoding. Most DP projects use single-channel (mono) files. You just have to know the sample rate (usually 44.1 or 48 kHz) and how many bits deep it is (8, 16, 24 and 32 are most common). Press Save Info.
  4. Select File > Save a copy (cmd-S). Be sure to set the same bit depth here as you used in the file’s header, or SoundHack will do a conversion! Save the copy somewhere else, like your desktop, but under the same name as the original file to prevent confusion later.
  5. Navigate to where you saved the file and double-click to open in your favourite sound program. This could be QuickTime, AudioFinder, DSP-Quattro, or even DP itself. Play to make sure it sounds right. If not, you got the sample rate or number of bits wrong. Go back to SoundHack and try again.
  6. Painstakingly repeat this for each of your sound files. This could take a while.
  7. In the DP project folder, move your Audio Files folder aside. Place all of the newly patched files into a new folder called Audio Files.
  8. Try re-opening the project in DP. You should be able to pick up where you left off.
  9. Grab a cold one. You deserve it!

thing-a-day #1: euphonix mc mix review part 1

Introduction

After my photography buddies insisted that 90% of photography is your digital darkroom workflow, and convinced me to switch to Adobe Lightroom, I’ve been struggling to get my studio workflow similarly streamlined. I grew up on studio production in the late 1980s, meaning large analogue mixers, one channel per input, and everything mixed down to 8 sub-mixes running to a 1/2″ 8-channel reel-to-reel recorder (Tascam 38). I naturally think in terms of taking everything down to 8 busses, then doing a final mixdown “live” from tape to 2-track. It’s a two-step workflow that I can do in my sleep. It’s also 20 years out of date – long due for an overhaul.

So when I saw this press release from MOTU (makers of my DAW, Digital Performer, and my PCI-based audio interfaces) touting a new “high-end” control surface – the Euphonix Artist series – I decided the time had come to make a change. (The forthcoming MOTU Volta plugin pushed me over the top.) I’d heard of tons of difficulties using Mackie Controls and HUIs with DP previously, so reading an honest-to-goodness press release from MOTU left me hopeful they would proactively work to make the Euphonix devices the best control surface for DP. So I bought a cheap used MOTU 24i to accompany my 1224, and ran every device in my studio directly into the computer. Knowing some of the limitations I might experience as an early adopter, I spent the cash on the MC Mix. I figured that 8 tracks of full-motion faders and endless rotary encoders (MC Mix) would be preferable to 4 tracks plus a touchscreen (MC Control), as I’m used to grabbing for knobs and only looking at a meter bridge. Later, I rationalized, I could add the MC Control if I wanted. Also, my friend dys4iK gave me a ShuttleXpress, which I use as a jog/shuttle/transport device – meaning I don’t need another one just now. (Review forthcoming.)

Unboxing

The MC Mix comes well packed in an attractive box. The device itself is well weighted and feels solid in your hands. Big kudos to Euphonix for only using red, yellow and green LEDs; no bright blue LEDs blinding you from this device! The OLED track indicators are also quite attractive and understated, with very little lag and no discernible flicker in my incandescent-lit studio. If I had any complaint about the physical device, it’d be that the rotary and fader knobs are made from metallized plastic. I guess for the powered faders this is understandable – less mass to push around – but it’s a bit surprising given their look. Still, they slide nicely, and after the initial “plastic surprise,” I haven’t thought twice about the build quality. The box also includes a hefty power supply and a 6-foot Cat 5 Ethernet cable, which went straight into the second NIC of my DAW. I would have preferred a slightly longer cable coming from the line-mounted power brick to the MC Mix itself, but I can’t complain, really.

Setup

Installation was straightforward under OSX 10.5.6: connect power and the networking cable to the MC Mix, then load the OSX driver. As Euphonix frequently releases driver updates (especially ones targeted at improving Digital Performer compatibility!), it’s best to download the latest drivers directly from Euphonix’s site, ignoring the packed-in CD-ROM. EuControl launches at boot with a spinning green logo in the Dock. After the EuControl driver detects the MC Mix, the logo stops spinning, and the 8 MC Mix OLED displays change from the Euphonix logo to 8 dotted boxes – a gratifying indication that communication has been established. The control panel for the driver has a large “Upgrade Firmware” button that does exactly what it says, trouble-free. There are also settings to fix specific tracks to specific sliders (“layouts”), as well as to toggle various behaviours of the device. I left all of these at their defaults.

The quick start and user guide for the MC Mix are straightforward, and worth a good read. Five buttons on the left of the device select various modes – what the manual calls knob sets. Used in various combinations, they give you access to all of the features of a traditional mixing console, as well as settings for plug-ins / channel inserts. Selecting a specific channel in CHAN mode spreads that channel’s parameters across the 8 rotary encoders, where they can be paged through separately. This is a particularly nice feature, though there are some implementation problems in the current driver that cause difficulty with DP6 (see below).

One interesting shortcut mentioned in the manual – holding Shift and touching a fader – resets it to 0.0dB. Judging by the silkscreening on my MC Mix, though, the zeroed fader actually lands at about 0.5dB; it would be nice if there were a calibration feature in the driver to align 0.0dB exactly with the silkscreened position. As it stands, I’ll just look at the value on the OLED display or my monitor instead.

In Action – CueMix

Before jumping into Digital Performer, I figured I’d give the surface a spin with CueMix, using the Mackie Control and HUI emulation modes. Often I’m just jamming in my studio, and don’t want the weight (and intimidation!) of a full DAW. CueMix most closely matches the analogue mixer I used to use for just this purpose, letting me set pans and levels via faders and knobs, and controlling the rest via MIDI routing. Setup requires dragging and dropping the CueMix application onto the Euphonix control panel, selecting the correct emulation mode, and rebooting (!). Once you’ve finished that, you create a new Mackie Control or HUI device in Audio MIDI Setup, connecting the new Euphonix MIDI device to the Mackie Control or HUI device via one in and one out port. Be sure to set the manufacturer and device in Audio MIDI Setup; CueMix uses this to determine the correct emulation mode. (The Euphonix MIDI device sports 4 pairs of in-out MIDI ports, presumably necessary if you link together up to 4 MC Mixes or 3 MC Mixes and 1 MC Control. I just used the first pair of MIDI ports.) Finally, you enable and configure the control surface from the Control Surfaces menu in CueMix.

I checked the Application Follows Control Surface setting in that menu, in the hopes that CueMix would scroll horizontally as I paged left and right with the MC Mix. Sadly it doesn’t, even with EuControl set to Auto-bank to selected track and CueMix set to Application Follows Control Surface. (Incidentally, it’s disappointing that CueMix doesn’t get wide enough to show all of my channels, even though I have the screen real estate; CueMix will show a maximum of 24 channels + 1 master horizontally. Fixing either of these two problems would result in a useful workaround.)

Immediately upon trying the Mackie Control emulation mode, I encountered a problem: the track titles displayed on the MC Mix OLED displays were actually the track title for the first channel, spread out across the first 6-7 OLED displays. Switching to HUI mode correctly assigned track names to each track, but there are still bugs: only the first 3 letters of a track name display, plus a single digit. As an example, a track named “CS-80 L” displays as “CS-” only. I also noticed that CueMix’s faders go from -inf to 0dB, while the silkscreening on the MC Mix goes from -inf to +12dB. Not a problem – EuControl maps +12dB (MC Mix) to 0.0dB (CueMix), giving you the full slider length for use in CueMix. Support still isn’t perfect: the MC Mix shortcut of holding Shift and tapping a fader to set it to 0.0dB still uses the (approximate) silkscreened 0.0dB level, which translates to -4.9dB in CueMix. Similarly, the gain displayed on the OLED for each track follows the silkscreening (-inf to +12.0dB), not the CueMix level. So, with the fader all the way up, the OLED displays +12.0dB, but CueMix reads it as 0.0dB. This is a minor nuisance, but one I’d not have expected, especially in emulation mode. It’s further complicated by the fact that live channel levels don’t display on the MC Mix – only in CueMix itself. (Sadly, this limitation still exists in DP6 as well.) The blame here may well lie with MOTU’s HUI control of CueMix; I’ve not used a real HUI with CueMix, so I don’t know if it has the same limitations.

Of the other controls, mute and solo work as expected, except for the fact that the MC Mix has an “ON” button instead of a “mute” button. This is a strange choice on the part of Euphonix, but one I can live with – as long as I make the mental shift to expect the button to be lit instead of extinguished. Bank Left and Right, along with Nudge Left and Right, work as expected, shifting channels by 8 or 1 across the surface of the MC Mix. OLED displays and illuminated indicators shift along as expected. Pan works fine – though the rotary encoders are a bit jumpy with fast movements, they’re just fine at slower speeds. Oddly, trim mode won’t stay selected; after a fraction of a second, the rotary controls switch back to pan mode. This is a definite bug. Finally, none of the other knob set buttons have any effect. As a result there’s no way to access input mutes from the MC Mix, nor the CueMix talkback buttons.

This one isn’t documented, but is very helpful: switching CueMix’s console between output buses is accomplished using the Mix button and each fader’s SEL button. In this mode, each channel represents a pair of outputs, with bus muting working via each channel’s On button. Once you’ve switched to the bus you want to work with, use the Input button to return to channel mode. Once I discovered this, it was a snap to mix and assign my 34 inputs across the 14 outputs I have between my MOTU 1224 and 24i interfaces.

In short – despite its many small bugs, HUI mode for CueMix is functional, allowing me access to volume, pan, mute and solo across all available CueMix buses. Hopefully, MOTU will bring native EuControl support to CueMix as well, possibly updating the application to match their newer CueMix FX application for FireWire interfaces. (A girl can dream, can’t she?)

Tomorrow: the MC Mix under Digital Performer 6.01.

microblogging silliness

Today a friend linked me to this article in the NY Times about networks, the US presidential inauguration, and twitter. Here’s the key quote, emphasis his/mine:

Biz Stone, co-founder of Twitter, said the company was hoping to sidestep network hiccups. He is not expecting the same traffic spikes as during the election, when the site was flooded with as many as 10 messages a second, but says the service “will nevertheless be doubling our through-put capacity before Tuesday.”

Because of all the hype Twitter gets, I couldn’t believe the figure was so low, so I checked elsewhere. “Twitter had by one measure over 3 million accounts and, by another, well over 5 million visitors in September 2008.” Simple math says there’s about 2.5 million seconds in a month, so 5 million impressions translates to one request every half a second. Presuming each query pulls a few pictures as well as text, 10 messages a second sounds about right based on published data. Let’s further assume each message is the size of an average Twitter page; mine came in at 34,100 bytes just now, or 341,000 bytes a second.

Bandwidth-wise, that’s 2.728 Mbit/s, or roughly the bandwidth of two T1s. My home DSL line can push 700 kbit/s; with 5-6 of them bonded together, and the appropriate back-end servers, I could run Twitter out of my basement.
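
If you want to check my arithmetic, here it is spelled out as a quick back-of-the-envelope calculation (all the inputs are the approximate figures quoted above):

    # Back-of-the-envelope check of the figures above (all inputs are approximate).
    msgs_per_sec = 10                # Twitter's quoted peak message rate
    page_bytes = 34100               # the size of my Twitter page, measured just now

    bytes_per_sec = msgs_per_sec * page_bytes        # 341,000 bytes/s
    mbit_per_sec = bytes_per_sec * 8 / 1e6           # ~2.73 Mbit/s

    print("%.3f Mbit/s" % mbit_per_sec)
    print("~%.1f T1 lines (1.544 Mbit/s each)" % (mbit_per_sec / 1.544))
    print("~%.1f bonded DSL lines (700 kbit/s each)" % (mbit_per_sec / 0.7))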

It also isn’t very much if you compare it with other semi-synchronous messaging technologies like IRC, Jabber and IM servers, which have been capable of pushing more data per second for well over 15 years. I’m sure mainframes were doing similar amounts of data I/O 30 years ago.

The snarky nerd in me wants to smear Ruby on Rails, the technology platform on which Twitter relies, but others did that 2 years ago already (and yes, that link defends the technology, and makes the ridiculous assumption that you can’t build in scalability). I’m convinced this is a case of applying a technology to a problem for which it is ill-suited. Perhaps the Twitter infrastructure was never planned to expand so greatly, but I find it laughable that we’re in 2009 and “important” services like Twitter can’t survive a “flood” of 10 messages a second. My friend agrees: “no i’m sure facebook is laughing at the 10 messages/a second ‘flood’ too.”

I’m also quite surprised that such a “popular” site, one that gets so much hype and marketing, really doesn’t get that much use. For comparison, here are the figures for the Top 10 sites. Being generous and assuming those 5 million hits for Twitter are all unique visitors, the largest sites see more than 25 times its traffic. Facebook sees at least 10 times the number of unique visitors, and certainly pushes more content, what with all the pictures and rich media it has versus Twitter’s limited use of graphics (small avatars only). Of course, none of this even gets into what AWS/S3 and content accelerators push from a pure bandwidth standpoint.

Increasingly, I’m convinced microblogging sites are hiveminds for particular flavours of individual. Disingenuously: StumbleUpon/Digg are “OMG LOOK AT THIS LINK!” Twitter feels like “marketing marketing SEO yadda yadda bend to my will.” Plurk is “cheerleader YAY happy happy dancing banana.” BrightKite is “mommy! mommy! look at me now!” And yes, IRC is probably the Wild Wild West. Others I know have made similar comparisons between LiveJournal, mySpace, Facebook and Friendster. I’m not sure what predestines a technology for a specific sort of person, but the link is there. This might make a good research paper. ;)

camera dead

You might have noticed that my last few posts have been image-less. This isn’t because I’m protesting visual communication. I love it – in fact, I’m downright jealous of what most other bloggers pull off with the visual design of their sites!

My camera died. Yeah, the “new old” Olympus C-5050 I bought broke, and I have only myself to blame. The camera was on my kitchen counter; I reached for it, but scooted it off the counter instead, dashing it on the floor. The damage could have been worse – only the mode dial came off. I can’t just reattach it using the screw hack someone posted, because the impact cut through traces on the flexible PCB, and they’re too fine to bridge easily. Besides, disassembling the camera to the point where I could even consider repairing that damage ended up cracking a couple of very delicate plastic tabs that hold the whole thing together. I could try to glue all of it together, but…

Someone on eBay has the entire mode dial assembly for $39, plus $8 for shipping to Canada. Soon I’ll have a working camera again, and by then the studio should be warmer, so lots more pictures and music for everyone.

Oh, and I’ll have my first actual honest-to-goodness conference submission done by then too (deadline: Dec. 18th). It’s been a long time coming, but it’s good to have actual research data and to be preparing it for publication again. YES!

howto fix mackie onyx firewire under osx 10.5.5

After upgrading Waynemanor Studio’s Intel-based Mac to OSX 10.5 (10.5.5), I was unable to get the Mackie Onyx 1640 FireWire interface to stream audio successfully to or from the Mac. When playing audio from the Mac to the Onyx (just from the System Preferences Sound panel, selecting the Onyx FireWire 0838 output for system sounds and clicking the Purr sound – no DAW software), I’d get the spinning beachball for ~10s, then stuttering, clicking, popping audio would come out. Actually running my DAW made things worse: the application would hang, and Force Quit didn’t help. (Power cycling the Onyx allowed the Force Quit to work.)

Mackie lists this audio driver rollback (PDF) on their website, but the first try at it didn’t work. Here’s how I managed to finally get everything working correctly under Apple OSX 10.5.5:

  1. Sign up for an Apple Developer Connection account. It’s free, and required to download the software you need.
  2. Download both the FireWire SDK 26 for Mac OSX and the FireWire SDK 24 for Mac OSX.
  3. Mount both image files and install the package files from both (FireWireSDK26.pkg and the confusingly-named FireWireSDK23.pkg). This will create directories under /Developer on your system drive.
  4. From /Developer/FireWireSDK26/FireWireComponents, install the Leopard Final drivers. Reboot.
  5. From /Developer/FireWireSDK26/FireWireComponents, install the FireWireAudio 2.4 drivers. Reboot.
  6. Open System Profiler (Apple menu > About This Mac, then click the More Info button), and select Software > Extensions in the left-hand browser. Look for AppleFWAudio, and make sure it is version 2.4.0.
  7. From /Developer/FireWireSDK24/FireWireComponents, install the FireWireAudio 2.0.1 drivers. Reboot.
  8. From the Apple menu, select About this Mac, then click the More Info button to start System Profiler.
  9. Select Software > Extensions on the left-hand browser. Look for AppleFWAudio, and make sure it is version 2.0.1.
  10. Go to the System Preferences > Sound panel and try sound output to the Onyx Firewire 0838. It should sound clear as a bell.

I don’t know why installing the latest SDK FW base drivers and the FireWireAudio 2.4 drivers first was required before the 2.0.1 drivers would correctly fire up, but it was. One warning: do not install the Leopard FireWire (not FireWireAudio) drivers from the 24 SDK. This caused my machine not to boot correctly, and I had to repair it using another machine.

Here’s hoping this helps someone out in the wild. I’d post it to the Mackie forum, but the moderators there have yet to enable my posting rights. :(

plurk

Well, if you’ve visited my website in the last day, you’ll see I’ve added my plurk widget to the sidebar. I’m finding plurk altogether more motivational than twitter. A recent comparison of the two services was, I think, unfair to plurk, mostly because it leaned on overhyped twitter features that aren’t even working correctly half the time.

Even if Twitter solves its reliability problems, plurk just feels more alive. It’s not a single thread with everyone’s crap mixed in (and hacks like #hashtags). It’s a threaded timeline: part web forum, part IM, and part IRC. And yes, a bit twitter, too. I like the fact that it’s Web 2.0 enough for my friends who won’t get on IRC (still my #1 place for synchronous chat), and that the web layout is attractive. The “get more points to get more features” thing is a pain, but I’d rather have that than some sort of “pay us $$ to get more features,” I think.

I moved over to plurk when some of my friends followed Leo Laporte over. I’m not a personality cult-er, but I do like keeping tabs on my friends. And I’m finding that plurk’s interface encourages more synchronous collaboration (read: chatting) than twitter ever did. In about 48h on the service I’ve managed 15 “plurks” (microblog entries) and 38 responses; I never hit that level of engagement with twitter.

Once they add some sort of SMS interface (I can’t get their IM interface to work…), it’ll be a surefire hit, I think.

So, you twitter types out there: would you miss me from twitter if I semi-abandoned it?