Alright, computer science majors, help needed:

A simple clustering algorithm: objects have tags assigned to them. How do I figure out which tags are “related”? In other words, given a tag, which other tags have been used to tag the same movies?

It needs to be efficient in MySQL, and the tables look like this:

objects (id, …)
tags2objects (tagid, objectid)
tags (id, …)
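One common way to approach this is a self-join on tags2objects: start from the rows for the given tag, join back to the same table on objectid, and count how often each other tag turns up. The snippet below is only a sketch under a few assumptions: SQLite (via Python) stands in for MySQL so it runs anywhere, the sample rows are made up, each (tagid, objectid) pair is assumed to appear only once, and the helper name related_tags is hypothetical.

```python
import sqlite3

# SQLite is a stand-in for MySQL here; the query itself is plain SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags2objects (tagid INTEGER, objectid INTEGER);
    -- Made-up sample data: object 10 has tags 1, 2, 3; object 11 has 1, 2; object 12 has 3.
    INSERT INTO tags2objects (tagid, objectid) VALUES
        (1, 10), (2, 10), (3, 10),
        (1, 11), (2, 11),
        (3, 12);
""")

RELATED_TAGS_SQL = """
    SELECT other.tagid, COUNT(*) AS shared_objects
    FROM tags2objects AS given
    JOIN tags2objects AS other
      ON other.objectid = given.objectid   -- same object...
     AND other.tagid <> given.tagid        -- ...but a different tag
    WHERE given.tagid = ?
    GROUP BY other.tagid
    ORDER BY shared_objects DESC, other.tagid
"""

def related_tags(conn, tag_id):
    """Return (tagid, shared_objects) pairs for tags that co-occur with tag_id."""
    return conn.execute(RELATED_TAGS_SQL, (tag_id,)).fetchall()

print(related_tags(conn, 1))  # [(2, 2), (3, 1)]
```

Whether this is fast enough in MySQL probably depends on indexing; an index covering (tagid, objectid) and another covering (objectid, tagid) would be the first thing to try. Note that MySQL drivers may use %s rather than ? as the placeholder.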

(1.2M Quicktime). I was trying to compare two cameras by filming simultaneously and then editing it together, but having different movie formats in my Vegas editing program really messed up the rendering: it ended up taking hours and hours. So I ended up with this little piece of crappy movie that doesn’t really let you compare the quality of the two cameras very much. Right.

The cameras I was comparing are the Mustek DV 4000 ($120) and the Canon Powershot SD 100 ($190 at Amazon).

The Canon is much better for videoblogging: better picture, better sound, and much better build quality. It will last a lot longer, and is not that much more expensive either. Disadvantages of the Canon: you can only film for about 30 secs on max resolution, a few minutes on the mid resolution (which is what I always use). It doesn’t adjust light to changing conditions, so if you move to a dark hallway while filming things come out very dark.

The Mustek:

The Canon:

(Quicktime, 1.5M) I was trying to compare two cameras, but when I started editing it things went wrong, rendering two different formats in one movie took hours, and I ended up with this useless crap. Oh well :)

Since I can’t get my external harddrives to work correctly with my laptop, I am using a different backup method now: FTP. I use SyncBack (free) to set up daily backups to a webserver I run. I’ll report back about how well this works. So far so good, I’ve started backing stuff up.

I’m quite happy using OpenOffice, except for its presentation program. Powerpoint is just so much better. So I want to buy Powerpoint. It’s 200 US$, can I get it somewhere cheaper? It seems such a drag to buy Powerpoint, I just want to get the key and pay. Microsoft has a free trial, but they send a CD – you can’t download it! I don’t want to wait for a CD! It’s already installed on my Dell laptop, I just want the product key.

I have 50 Gmail invites so if you want one just leave a comment. It won’t show up straight away (moderation), but I’ll get it.

Something I have learned: in the USA, if you get an overdraft notice or something, just call your bank and ask them to remove the penalty charge, and they’ll almost ALWAYS remove it. Just ask, and give a reason they can enter in their computer system, like “I was out of the country” or something. It almost always seems to work.

My Hoboken vlogtour: I walked around Hoboken and gave a tour. Click the animated gif to see the movie (Quicktime, 83M)

Peter Van Dijck’s Guide to Ease

I still have the same problem with my harddrive. I would pay someone to tell me how to fix this.

Basically, I have 2 external firewire harddrives, and they work fine on one computer, but with my laptop (using a firewire card), they ‘break’ if I write more than a few megs to them (like in a backup). It happens again and again. I have SP2 installed, which is supposed to fix the problem. More details in the link. I would really love some help! I depend on my laptop and I need to back up my files, and dragging them in one by one just doesn’t cut it.

More on i18n folksonomies: ButtUgly
“And that means that tags will become “language polluted.” Take a look at the Technorati tag for “Macintosh”, for example. Many of the blog entries are in Japanese. If you look at Orkut, many of the parts of it suddenly became “owned” by Brasilians, which essentially drove away English speakers.”

I disagree: the internet didn’t become language-polluted. What will happen is that tag “namespaces” will develop, somewhat mirroring languages, but also other social groups like interest groups, specialist communities, … All these will develop their own tagspaces.

See also my post Folksonomies in Japanese.


I’m looking for a PHP/MySQL programmer for some freelance work on database optimization. If you LEFT OUTER JOIN in your sleep, email me at peter van dijck at the google email domain (you know, gmail). There might be some ongoing freelance work there, it all depends on how things work out.

This is a small project for now, but a fun one – I’m a demanding client but a really good one: you won’t have to do much of that client “management” stuff (that’s what they all say though!)

What equipment do you recommend to record interviews? I need:

– be able to record interviews.
– attach a mike if needed.
– record from phone conversations if needed.
– be able to get the interview on my computer easily.

A combination of equipment (a taperecorder with some digital thingie) is also ok. I want to pay < US$ 100, but get lots of storage (over, say, 10 hours) and good quality.

I have an iPod mini but the Belkin recording device isn't compatible with the mini.

Is Technorati gaming Google?

Update: Technorati DOES recognize links to other sites for tags, it’s just not apparent if you don’t check their help page. So you can pretty much ignore what I wrote here. I’d still like it if they made it clearer upfront.

Technorati’s new tag feature is brilliant, however, they’re trying to game Google. I love Technorati but this is such an obvious scam that it makes me mad.

On every tag page, they tell people that, “To contribute, just make a post to your blog about xx and include the link below. http://www.technorati.com/tag/x”.

This means that, if you want your blogpost to show up on the Technorati page, the easiest way to do it is to add a link with that keyword to that Technorati page. Sounds like Google gaming to me. All the good keywords will suddenly have a really good Technorati page for them, and lots and lots of links to the page, using that keyword, with relevant posts. In a way that’s fine, it makes semantic sense on the web. In another way though, it’s not fine, because there is NO reason why the link should go to them, if the rel attribute indicates it’s a tag. They are excluding other tag namespaces.

What they should do instead, is accept ANY link that has the rel=”tag” in it. That’s how namespaces work, remember. Now they might already be doing that, but it’s not clear from the tag pages. What they say on those pages, essentially, is: “We will put a link to your post on this page, if you put a link to this page, with this word, on your site.” It’s just dodgy, and I’d like them to change that.
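To make the “accept ANY link” idea concrete, here is a rough sketch of an indexer that picks up every link carrying rel="tag", whatever site it points to. It is not how Technorati does it; it only illustrates the namespace-neutral approach. It assumes the tag word is the last path segment of the link, and the example.org URLs and the names RelTagCollector and collect_tags are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, unquote

class RelTagCollector(HTMLParser):
    """Collects (tag word, href) for every <a> whose rel attribute contains "tag"."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel_values = (attrs.get("rel") or "").lower().split()
        href = attrs.get("href")
        if "tag" not in rel_values or not href:
            return
        # Assumption: the tag word is the last path segment of the link target.
        path = urlparse(href).path.rstrip("/")
        word = unquote(path.rsplit("/", 1)[-1])
        if word:
            self.tags.append((word, href))

def collect_tags(html):
    collector = RelTagCollector()
    collector.feed(html)
    return collector.tags

# Hypothetical post markup: two tag links on different hosts, one ordinary link.
sample = '''
<a href="http://www.technorati.com/tag/vlog" rel="tag">vlog</a>
<a href="http://example.org/tags/folksonomy" rel="tag">folksonomy</a>
<a href="http://example.org/about">about</a>
'''
print(collect_tags(sample))
```

The host of the link plays no part in the decision here: both tag links are collected and the plain link is ignored.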

Of course, I haven’t really thought this through in any great detail, so I’ve probably missed something. Enlighten me!

(Can you tell I haven’t had my coffee yet? The reason I became angry is that I found myself adding lots and lots of keyword links to Technorati in my posts, and I thought, wait a minute! This is fishy!)

Trackback spam is on the rise, so this is probably a good time to turn off trackbacks for a few weeks. In WordPress, go to Options > Discussion and uncheck the second checkbox, the one that says “Allow link notifications from other Weblogs (pingbacks and trackbacks.)”.

I have left anonymous comments on, but I have kittens spaminator installed so it’s not too bad. I also approve ALL comments before they go live (in WordPress, on the same page, check “An administrator must approve the comment (regardless of any matches below)”).

MySQL question.

table tags (id, …)
table video (id, …)
table video2tags (videoid, tagsid, …)

Given a number of tag id’s, I am trying to select videos that are tagged with multiple tags.

SELECT video.id FROM video, video2tags WHERE (video.id = video2tags.videoid AND video2tags.tagsid = 89) AND (video.id = video2tags.videoid AND video2tags.tagsid = 88);

This doesn’t seem to work. Any pointers would be really welcome!
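The query above cannot match anything because each row of video2tags carries exactly one tagsid, so no single row can be 89 and 88 at once. One common fix is to filter on the wanted tags with IN, group by video, and keep only the groups that contain all of them. What follows is a sketch under assumptions, not a tuned MySQL solution: SQLite (via Python) stands in for MySQL, the sample rows are invented, and the helper name videos_with_all_tags is hypothetical.

```python
import sqlite3

# SQLite is a stand-in for MySQL here; the generated query is plain SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE video2tags (videoid INTEGER, tagsid INTEGER);
    -- Invented sample data: videos 1 and 3 carry both 88 and 89, video 2 only 88.
    INSERT INTO video2tags (videoid, tagsid) VALUES
        (1, 88), (1, 89), (1, 90),
        (2, 88),
        (3, 89), (3, 88);
""")

def videos_with_all_tags(conn, tag_ids):
    """Return ids of videos tagged with every tag in tag_ids."""
    tag_ids = sorted(set(tag_ids))
    if not tag_ids:
        return []
    placeholders = ", ".join("?" for _ in tag_ids)
    sql = f"""
        SELECT videoid
        FROM video2tags
        WHERE tagsid IN ({placeholders})
        GROUP BY videoid
        HAVING COUNT(DISTINCT tagsid) = ?   -- every requested tag is present
        ORDER BY videoid
    """
    rows = conn.execute(sql, (*tag_ids, len(tag_ids))).fetchall()
    return [videoid for (videoid,) in rows]

print(videos_with_all_tags(conn, [88, 89]))  # [1, 3]
```

An alternative is to join video2tags to itself once per tag (one alias with tagsid = 89, another with tagsid = 88), which reads closer to the original attempt but needs one extra join for every additional tag.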

The Register

Interview with a comment spammer: “Link spamming, with its abuse of common resources, turns out the most efficient, just as cutting down virgin Indonesian and Amazonian rain forest is the most efficient way for loggers there to get wood. If it raises the global temperature of the blogging community, well, that’s life on planet internet, isn’t it?”