stunned at ms-plurk-ripoff

it’s no secret that i’m a big fan of plurk, despite my recent absence (social media exhaustion set in). i was especially happy when my plurk buddy alvin woon moved back east to help promote the service, where it became the #1 microblogging service in China (prior to being firewalled).

not just because i want to see the little guy win, but also because it is just appalling that this occurred:

Microsoft China stole Plurk’s UI and code and is pretending it’s their own service.

i could see a two-bit startup doing this, or some non-multinational heavyweight figuring they could get away with it because they pay their lawyers more than the little guy does. but from a company like microsoft, this is just bald-faced theft.

i worked many years ago at a company that had its code stolen, and that spent years in the courts shutting down the competitor started by the ex-employees who took it. one look at the code involved made it obvious it was a copy; in many places, the error messages contained the same misspellings!

at the time, the ceo swore that he wouldn’t stop until he won back all of the business he had lost to “the thieves,” and sued for damages for every cent lost. realizing they were fighting a losing battle, the founders pled no contest, and the company that had acquired the competitor settled out of court for about US$285 million all told. sadly, many of the customers they lost probably still use the acquiring company’s software instead; i think that company came out on top in the marketplace (for various other reasons). so my employer was vindicated, but never managed to win back all of the business it lost.

Plurk doesn’t have the resources to see a lawsuit like that through, but i hope they find some other way to shut this down. it’s a different world now, 10-15 years later; maybe social media itself can stop this assault on the innovator. hopefully it will happen before they, too, lose their loyal and active user base to a competitor.

on the workbench: elka rhapsody 610

picked up an Elka Rhapsody 610 61-key string synthesizer on CL over the weekend, for a bit more than i would have liked. seeing the unit in person made it clear that most of the damage was physical, and that it’d probably been some teenager’s keyboard or a badly treated gigging unit. half of the slider caps were missing (with the stems sheared off at the control panel), the piano output didn’t work, the sliders worked backwards (bass sliders controlling the treble and vice versa), there was 60Hz hum, etc.

got the unit on the new basement workbench as an inaugural challenge. found the schematic online and buzzed things out. problems found:

  1. physical damage to the unit had cracked 3 capacitors on the piano/clav filter board, preventing the piano output from making its way to the sliders and output. replacing them with modern equivalents restored the piano sound.
  2. someone not very skilled in soldering “went at” the cancel board and mixed up a lot of wires. easily fixed, though rather than replace the entire wiring harness i just reattached the wires and added some tape/shrink tubing.
  3. the card edge connector for the wiring harness/fader panel is cracked in half. tried the classic “2 zip ties” solution to hold it together but i think i’m going to have to replace the connector entirely.
  4. the missing slider caps: toronto’s supremetronics/home hardware on college just west of spadina had caps that fit, even if they’re in stark white.

it felt good to get this thing repaired in just a couple of hours, and with only about $1 CAD in parts.

plans before i decide if i’m reselling the device:

  1. replace the hard-wired power cord with an IEC power socket with integral fuse/fuse puller ($2.50 CAD in parts)
  2. replace the proprietary volume pedal connection with a standard 1/4″ TRS jack, suitable for use with a 10-kilohm volume pedal ($3 CAD in parts)
  3. fix the remaining physical damage (snapped plastic standoffs for the cancel board, slider faders, ink scratched into the front panel where a former impromptu teenage rebel marked his favourite slider settings; probably about $10 CAD in parts and epoxy)
  4. possibly fashion replacement legs out of welded metal tubing, plates and threaded rod (unknown cost, guessing $10-20 CAD)

if you readers out there particularly want to buy a rhapsody 610 for that jarre-TD-vangelis sound, comment here with your real email address and i’ll be in touch. have yet to decide if i want to sell; it sure sounds nice through a phaser pedal or a spring reverb.

recipe: chicken pot pie

after japan, i had a craving for north american food. so, with doozer’s help, i made a chicken pot pie very similar to this recipe. differences: instead of cream, an oil/flour roux to thicken. no pearl onions on hand. le sueur canned petits pois instead of frozen. seasoned with dried cilantro and a dash of paprika instead of parsley. and i used her pie crust 102 recipe: no vodka, just butter, all by hand; took all of 5 minutes to prep, honest.

the results were outstanding, if a bit high on the fat scale. slightly over half an 8″ pie later, i’m stuffed… the rest will make a great lunch tomorrow. also, the little pastry biscuit was a delicious appetizer. suddenly, puff pastry seems achievable! oyster patties are in my future, i think.

two pot pies with pastry biscuits

inside the delicious pie

recipe: pão de queijo

been a while since i got time to post something. after a sudden and draining trip to Boston, i decided to take the weekend entirely for myself. 75% of it was spent sleeping. the rest was cooking and eating.

here’s the recipe i worked up for pão de queijo, a delicious Brazilian cheese bread.

Ingredients

  • 250g polvilho doce (cassava flour)
  • 135g milk (approx. ½ cup + 1 Tbsp)
  • 41g sunflower or canola oil (approx. 3 Tbsp)
  • 5g kosher or coarse sea salt (approx. 1 tsp)
  • 58g beaten egg (approx. 1 large egg)
  • 62g finely grated Minas, Parmigiano-Reggiano, Pecorino Romano or mozzarella cheese (approx. ½ cup; see note)

Directions

Preheat oven to 350°F / 180°C.

Mix the milk, oil and salt in a pan. Heat the mixture until the milk scalds and just starts to boil. Remove from heat immediately and stir briefly. Place the flour in a medium-size bowl and pour the hot milk mixture over it, scalding the flour. Stir until incorporated and no large chunks remain, about 3-5 minutes by hand or 60-90 seconds with an electric mixer fitted with a dough hook.

Allow to rest so that the total mixing-and-resting time is at least 5 minutes. The dough should be very chunky and crumbly at this point. Mix in the beaten egg until the mixture is consistent but still quite thick. Then gently stir in the grated cheese. The dough should be sticky and thicker than cookie or biscuit batter, but workable.

Line a cookie sheet with parchment paper. Dip your fingers in a bit of extra oil to prevent sticking, form small balls of dough approx. 3-5cm in diameter, and place them on the sheet. Bake until light golden brown, 20 to 35 minutes.

Yield: 20 pães.


the cassava flour (aka yuca flour, aka “tapioca” flour, though not what we know as “tapioca” in western culture) contains no actual gluten, so it is safe for those with gluten sensitivities; when scalded it forms a gluten-like elastic network with a particularly cheesy, chewy texture, which adds to the actual cheese in the recipe. if available, use 2 parts polvilho doce (regular cassava flour) to 1 part polvilho azedo (fermented or “sour” cassava flour). as far as i have been able to determine, it is not possible to substitute other flour types. find a Brazilian grocer, or get “tapioca flour” from an Asian grocer (it’ll usually be the right product).

classically the recipe is made with oil as the fat source. some variants swap in butter for a “richer” taste; i’d avoid that. you could substitute some olive oil for the vegetable oil, but it might not hold up during baking.

the selection of cheese is critical. traditionally this would be made with Minas cheese from the Minas Gerais region of Brazil. the cheese recipes there were adapted locally from recipes brought over from Italy in the late 19th and early 20th centuries. when Minas isn’t available, freshly grated Parmigiano-Reggiano is a good substitute. you could add a bit of a higher-moisture cheese to assist with consistency, such as freshly grated Pecorino Romano, cotija or mozzarella. personally i find an all-mozzarella version strays too far from the original texture and flavour, but it’ll do in a pinch. i have seen variants on the ’net where people insert chunks of cheese in the middle. i tend to prefer the more consistent dough; the magic of this bread is that the dough itself has the cheesy flavour and texture, brought about by the flour. if you want cheese-filled bread, try making bolinhas instead (recipe to come).

i seem to have eaten them all without taking a picture! they were that good. that said, they looked a whole lot like this:

picture of pão de queijo

inspiration for the recipe came from this amazing article, in which the function of each ingredient in the recipe is analyzed. their conclusions are as follows:

  • viscosity (thickness) increases steadily as the milk is mixed with the flour
  • adding the egg drops the viscosity
  • adding the cheese raises the viscosity back to a level in between before and after the egg was added
  • different proportions of flour and different sorts of cheese had a minimal effect on bread consistency
  • egg and cheese are essential components whose proportions radically affect the outcome
  • the graph below shows the time plot of viscosity during the scalding, egg-mixing and cheese-mixing portions of batter preparation. the 4 different curves represent different proportions of polvilho doce and polvilho azedo: PAFC (100% azedo), PDFC (100% doce), PSFC (70% azedo + 30% doce) and PCFC (50% azedo + 50% doce)

graph of pão de queijo batter viscosity

i encourage you all to research this and post more experiments, especially with different proportions and ingredients!

presentation at icel 09

so a couple of weeks ago i presented my third academic publication, this one titled “persisting chat for communities of practice.”

joan at icel 2009

in layman’s terms, it’s a new logging system for online group chat (irc, jabber, etc.) with special integrations into asynchronous systems such as web forums (academically often called asynchronous learning networks), blogs, etc. the goal is to make chat less transient and throwaway, promoting it to a first-class citizen within the wide variety of mechanisms that can be used to help communities. it’s especially designed for communities of practice as described by lave and wenger at xerox parc way back in the day. go read the linked article; it’s not half bad for wikipedia.

the system will be released under an open source license later this year. if you’d like to get involved, comment here using your real email address and i’ll get in touch.

okonomiyaki recipes

Good buddy neillathotep has been bugging me to post about my okonomiyaki escapades – so here you go.

The recipe is really simple. Shred cabbage, green onion, garlic scapes (if you have them!) and red ginger (you can buy this pre-shredded), and have some shredded nori on hand. Cut up a bunch of other foods you’d like in your okonomiyaki, such as bacon, mochi, pork, squid, etc. For today’s recipes I made one with bacon, and another with brie and asparagus. Mix up a batter of 2 parts flour to 1 part dashi; you can use buckwheat flour if you like, or a mix of white, whole wheat, sweet potato/potato, etc., and salted water works if you don’t have dashi. Also get one egg per pancake.

Ingredients for okonomiyaki

In a bowl, mix the egg with the cabbage and a cup of the flour-dashi batter. Heat a griddle to medium-hot and oil it with sesame or sunflower oil. Pour out the egg-cabbage batter and spread it into a pancake. Top with your special toppings. Use a spatula to press down on the pancake until the bottom is well cooked. Flip the pancake over and repeat the pressing routine until the pancake looks dry in the center when viewed edge-on.

Okonomiyaki viewed edge on. Just about ready!

Remove from the griddle. Top with shredded nori, red ginger, Kewpie mayo (if you can find it!), katsuobushi (dried bonito flakes; ditto) and okonomiyaki sauce (decent collection of recipes here). Cut into 4 pieces and serve.

Ready to eat!

CouchDB benchmarking followup

I wanted to post a public thanks to the great minds over in freenode #couchdb, including Paul Davis and Jan Lehnardt for helping me learn more about CouchDB, and helping me investigate the performance issue I posted about last time. In fact, they both posted on planet couchdb with some thoughts about benchmarking in general.

I also wanted to share my reason for benchmarking CouchDB in the manner I did. It was entirely personal: I wanted to make my own dataset and code load as fast as possible. I’m not trying to generate pretty graphs for my management, for publication or for personal gain. This work comes out of having to load and reload fairly large datasets from scratch on a regular basis as part of my research methodology. I need a large dataset to get meaningful results out of the rest of my system, and I was not content to wait an hour or two each time my system bulk loaded the content.

So the suggestions provided helped a lot: sequential ids instead of random UUIDs (which helps CouchDB keep its b+tree balanced), correct use of bulk insert, and redoing my code to use the official CouchDB client interface (instead of a hacked-up version using raw PUTs and POSTs).
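
For the curious, here’s a minimal sketch of what those changes amount to, shown with the couchdb-python client; the database name, document shape and batch size are illustrative placeholders, not my actual research code.

    import couchdb  # couchdb-python, standing in for "the official client"

    server = couchdb.Server('http://localhost:5984/')
    db = server.create('demo_db')  # assumes 'demo_db' doesn't exist yet

    # Sequential zero-padded ids instead of the random UUIDs CouchDB
    # assigns by default; sequential keys keep b+tree inserts cheap.
    docs = [{'_id': '%010d' % i, 'value': 'payload %d' % i}
            for i in range(1000)]

    # One bulk request instead of 1,000 individual inserts.
    for success, doc_id, rev_or_exc in db.update(docs):
        if not success:
            print('failed: %s (%s)' % (doc_id, rev_or_exc))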

It turned out that the spike I was seeing (see last post) disappeared when I randomized the order in which I stepped through batch sizes, so much so that 3 randomized runs showed almost no peaks. However, when I “scroll” through batch sizes in increasing order, I still see the peak around a batch size of 3k.

Trying the software on another platform (an AMD x64-3500 running Debian lenny with a 3ware hardware RAID array, as opposed to a 4-core Mac Pro with only a single local disk) revealed that the peak shifted to a different, much higher value.

Lesson? Always benchmark your own application against your own data, and tweak until you’re satisfied. Or, in the words immortalized at Soundtracks Recording Studio, New Trier High School, “Jiggle the thingy until it clicks.”

I suspected that Jan’s article ranting about benchmarking was at least in part stimulated by my experiences as shared over IRC. (I was wrong; see the comments.) They must have seemed somewhat inscrutable: why would someone care so much about something most database-backed applications do rarely, compared to reads, queries and updates? My application right now has a very particular performance requirement (repeatable, high-performance bulk load) that is likely no one’s standard use case but my own. Nor would it be a worthwhile effort on the developers’ part to spend a bundle of effort optimizing this particular scenario.

That said, Jan is also calling for people to start compiling profiling suites that “…simulate different usage patterns of their system.” With this research I don’t have the weight of a corporation willing to agree on “…a set of benchmarks that objectively measure performance for easy comparison,” but I can at least contribute my use case for others. Paul Davis’ benchmark script looks quite a bit like what I’m doing, except that my number of documents is larger by a factor of 100 (~2mil here) and my per-document size is smaller by a factor of 25 (100-200 bytes here). Knowing the time it takes to insert and to run a basic map/reduce function over fairly similar data is a great place to start thinking about performance considerations in my application.

Oh, and knowing that the new code on the coming branches will get me a performance increase of at least 2x with no effort on my part is the icing on the cake.

Thanks, davisp. Thanks, jan.

CouchDB 0.9.0 bulk document post performance

Based on a tip from my university colleague Chris Teplovs, I started looking at CouchDB for some analytics code I’ve been working on for my graduate studies. My experimental data set is approximately 1.9 million documents, with an average document size of 256 bytes. Documents range in size from approximately 100 to 512 bytes. (FYI, this represents about a 2x increase in size from the raw data’s original form, prior to the extraction of desired metadata.)

I struggled for a while with performance problems in initial data load, feeling unenlightened by other posts, until I cornered a few of the developers and asked them for advice. Here’s what they suggested:

  1. Use bulk insert. This is the single most important thing you can do: it reduced the initial load time from ~8 hours to under an hour, and it prevents the need to compact the database. Baseline time: 42 minutes, using 1,000 documents per batch.
  2. Don’t use the default _id assigned by CouchDB. It’s just a random ID and apparently really slows down the insert operation. Instead, create your own sequence; a 10-digit sequential number was recommended. This bought me a 3x speedup and a 6x reduction in database size. Baseline time: 12 minutes, again using 1,000 documents per batch. (Both tips are sketched in code below.)
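
To make those two tips concrete, here’s a minimal sketch of a loader along those lines, using CouchDB’s documented _bulk_docs endpoint; the database name, document contents and the use of Python with the requests library are my own illustrative assumptions, not the actual loader from my research code.

    import json
    import requests

    DB_URL = 'http://localhost:5984/benchdb'  # hypothetical database

    def bulk_insert(batch):
        # One POST to the _bulk_docs endpoint per batch,
        # instead of one PUT per document (tip 1).
        resp = requests.post(DB_URL + '/_bulk_docs',
                             data=json.dumps({'docs': batch}),
                             headers={'Content-Type': 'application/json'})
        resp.raise_for_status()

    def load_all(records, batch_size=1000):
        # Assign 10-digit sequential _ids instead of random UUIDs (tip 2),
        # flushing a batch every batch_size documents.
        batch = []
        for i, record in enumerate(records):
            record['_id'] = '%010d' % i
            batch.append(record)
            if len(batch) == batch_size:
                bulk_insert(batch)
                batch = []
        if batch:
            bulk_insert(batch)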

Using 1,000 documents per batch was a wild guess, so I decided it was time to run some tests. Using a simple shell script and GNU time, I generated the following plot of batch size vs. elapsed time:

Strange bulk insert performance under CouchDB 0.9.0

The more-than-exponential growth at the right of the graph is expected; however, the peak around 3,000 documents per batch is not. I was so surprised by the results that I ran the test 3 times, and got consistent data. I’m currently running a denser set of tests between 1,000 and 6,000 documents per batch to characterize the peak a bit better.
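
If you want to try reproducing the peak, the harness is conceptually just the following (a sketch: my actual runs used a simple shell script and GNU time, and load_all is the illustrative loader sketched above, not my real code).

    import time

    def generate_docs(n=100000):
        # Stand-in for the real ~1.9 million research documents.
        return [{'data': 'x' * 200} for _ in range(n)]

    # Sweep batch sizes and time a full bulk load at each one. For
    # comparable timings, each run should start against a freshly
    # created (empty) database.
    for batch_size in (1000, 2000, 3000, 4000, 5000, 6000):
        docs = generate_docs()
        start = time.time()
        load_all(docs, batch_size=batch_size)
        print('%5d docs/batch: %.1f s' % (batch_size, time.time() - start))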

Are there any CouchDB developers out there who can comment? You can find me on the #couchdb freenode channel as well.