Mmmmm…. I’m still having the same problem with Google search on my site. Searches don’t return any results. If I use the Google toolbar to search my site it works though.

Ah, I found the problem. I entered a directory within my domain as my site. Turns out that doesn’t work. When I changed it to just the plain domain name (on the Google page that generates the HTML code to put on my site), searches started working. Seems like a bug to me.

I was trying out Google’s new sitesearch+ads yesterday and it didn’t really work. Today they sent an email saying there was something wrong with their code, and it’s been fixed. Good, so it wasn’t me :)

Google AdSense now lets you add search to your site and make money on the ads shown in the search results. Nice. I tried to implement it on a site of mine, but all searches returned no results. I’ll try again in a few days.

I have some free time coming up, and I hope to do some research about a few things:

– the Everything Else category (example at Half.com)

– the co-construction of users and technology. I’ve ordered 5 books, mostly from the anthropology field, about how users and technology are constructed. Fascinating stuff, with definite repercussions for how we construct taxonomies. Don’t be surprised if I write about the co-construction of users and taxonomies soon ;)

– international information architecture

– the properties of classification systems. I’d like to expand our understanding of classification systems a bit, drawing not just on library science and IT, but also on cognitive science and the social sciences. What are the cognitively relevant aspects of certain types of classification systems? What can we learn from the research on the cognitive basic-levelness of categories? How exactly do power relations in a social system influence the creation of classification systems? How about identity? How do users co-construct classification systems?

Lotta stuff; I’ll probably only get to some of it this year… Sometimes I wish I was still a student, but most of the time I don’t. Couldn’t afford it anyways.

Joel on Software – How Microsoft Lost the API War: “I first heard about this from one of the developers of the hit game SimCity, who told me that there was a critical bug in his application: it used memory right after freeing it, a major no-no that happened to work OK on DOS but would not work under Windows where memory that is freed is likely to be snatched up by another running application right away. The testers on the Windows team were going through various popular applications, testing them to make sure they worked OK, but SimCity kept crashing. They reported this to the Windows developers, who disassembled SimCity, stepped through it in a debugger, found the bug, and added special code that checked if SimCity was running, and if it did, ran the memory allocator in a special mode in which you could still use memory after freeing it.”
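To make the bug pattern concrete, here’s a toy Python sketch of the trick Joel describes – not Windows or SimCity code, just my own illustration of an allocator that normally reclaims freed blocks but has a per-app compatibility mode that keeps them readable:

```python
class ToyAllocator:
    """Toy model of the compatibility hack in the quote above: reads
    from freed memory fail normally, but a special mode keeps freed
    blocks alive. Real allocators work on raw pages, not Python dicts;
    this is purely an illustration of the idea."""

    def __init__(self, compat_mode=False):
        self.compat_mode = compat_mode  # True when e.g. SimCity is detected
        self.live = {}    # handle -> data
        self.freed = {}   # stale data kept around in compat mode
        self.next_handle = 0

    def alloc(self, data):
        handle = self.next_handle
        self.next_handle += 1
        self.live[handle] = data
        return handle

    def free(self, handle):
        data = self.live.pop(handle)
        if self.compat_mode:
            self.freed[handle] = data  # don't reclaim: keep stale contents

    def read(self, handle):
        if handle in self.live:
            return self.live[handle]
        if self.compat_mode and handle in self.freed:
            return self.freed[handle]  # use-after-free "works", as on DOS
        raise RuntimeError("use after free: block was reclaimed")


# The SimCity bug pattern: read right after free.
strict = ToyAllocator(compat_mode=False)
h = strict.alloc("city data")
strict.free(h)
# strict.read(h) would raise RuntimeError here

lenient = ToyAllocator(compat_mode=True)  # what Windows flipped on for SimCity
h = lenient.alloc("city data")
lenient.free(h)
print(lenient.read(h))  # still returns "city data"
```

The real shim of course worked on raw memory inside the Windows allocator; the shape of the hack is the interesting part: detect the app, and flip the allocator into a forgiving mode.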

New Scientist | New Technology: “A pair of sunglasses that can detect when someone is making eye contact with the wearer has been developed by Canadian researchers.

Besides being useful in singles bars, its inventors say the system could play a key role in video blogging, a hi-tech form of diary keeping.”

Cheap tickets between NYC and Brussels (or London) with Biman Bangladesh

If you have to fly to India (Bombay) or Europe (Brussels, London), Biman (the Bangladeshi airline) often has the best deals. It’s not easy to get tickets for them though; you usually can’t buy them online. I’ve used them before to get cheap tickets between New York City and Brussels. The airplanes are old, but they have great seats and cool ’70s decor. They serve good curries too, and I’ve never found cheaper tickets. To find them:

1. Go to Biman’s website and check their dates and destinations. You can’t buy tickets there.

2. Find a travel agent. To do that, call the Biman office in New York City (the phone number is here) and ask to buy tickets. They’ll tell you you can’t buy tickets there, but they’ll refer you to a travel agent in NYC.

3. Call the travel agent and book the ticket. They don’t send it to you (well, maybe if you ask nicely); you have to go to their office in NYC to pick it up.

4. Pick up your ticket.

The good thing about these tickets is not only that they’re cheap, but also that they’re very flexible about date changes – just give them a call. I got a NYC-Brussels ticket for US$430 that anywhere else (and I tried everything I could find, including Virgin Atlantic and such) was minimum US$750 (peak season).

So this puzzles me: minutes after posting an entry, the Google ads for it are relevant (about memory cards). Yet if I look for the cached page at Google, it says it doesn’t have one. So has Google spidered this page? If not, how does it know which keywords to show? (I don’t have access to my logs to see if the bot came by.)

dog or higher: Catching web standards: “The developer I was working with initially built our site using tables, and when I pointed out that company policy was to use CSS, she got, shall we say, a little huffy. I knew I had to get my boss on side to influence her to do it over.
So I went in to him. “Boss,” I said, “you are not going to understand much of what I am about to say, but you need to know that it’s important and I will try to explain it to you as best I can.”
He looked mildly alarmed.
I went on. “Imagine that you wanted me to send out a document on your behalf, and we have a lovely word processor there to use, but I created the document on a manual typewriter instead because I didn’t know how to use the word processor.” He nodded.”

So now that I’ve found out that, yes, people are actually writing software and selling it for US$24 that does comment spam (wiki spam won’t be far off), we really need to work out bulletproof solutions. By bulletproof I mean solutions that can’t be cracked too easily on a large scale by determined coders.

Some ideas on battling comment spam, referrer spam, wiki spam and such (most are aimed at battling robots):

– Randomize the script name that accepts the POST data for each installation, or even for each pageview. Not bulletproof, but it makes finding your script harder for the software.
– Add a random ID to your form, valid for one post. If the ID isn’t right, the post doesn’t go through. This means that for every spam post, the spam software needs to download the page once. If you randomize the field name as well, it might work even better. Not bulletproof though. (See the sketch after this list.)
– Generally randomize all field names. Create a table that maps your randomized field names to the real field names.
– Until a poster has proven they’re human, make it really hard to machinespam.
– Find a way to penalize spammers that doesn’t make it easy to penalize innocent posters by framing them as spammers.
– Make sure you don’t make it too hard to post.
– Keep a central list, vetted by some authority (maybe a community), of known spam URLs. Actively use it to scare the people who buy spamming software (find them!): “We’ll make you lose PageRank!” Be aware of the problems with central lists – this list should only contain clear, true and proven spammers, not maybe-spammers.
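Here’s the sketch promised above: a minimal, framework-agnostic Python take on the one-time form ID, combined with the randomized field names from the next idea. Everything in it (the function names, the in-memory token store) is my own illustration, not code from any actual blog tool; a real installation would keep the tokens in its database.

```python
import secrets
import time

# token -> (randomized field name, issue time). A real blog tool would
# persist this server-side; a dict works for the sketch.
_pending_tokens = {}
TOKEN_LIFETIME = 30 * 60  # seconds before an unused token expires


def issue_form_token():
    """Call when rendering the comment form. Returns (field_name, token):
    embed the token as a hidden input whose *name* is also random, so
    spam software can't hard-code which field to echo back."""
    token = secrets.token_hex(16)
    field_name = "f_" + secrets.token_hex(8)
    _pending_tokens[token] = (field_name, time.time())
    return field_name, token


def validate_and_consume(posted_fields):
    """Call on an incoming POST (posted_fields: dict of form data).
    Each token is good for exactly one post: it's deleted as soon as it
    matches, so a captured token can't be replayed."""
    for token, (field_name, issued) in list(_pending_tokens.items()):
        if posted_fields.get(field_name) == token:
            del _pending_tokens[token]
            return time.time() - issued <= TOKEN_LIFETIME
    return False  # no valid token in the post: reject it


# Rendering side: put the pair in the form...
name, value = issue_form_token()
form_field = f'<input type="hidden" name="{name}" value="{value}">'

# ...and the POST handler accepts the post only once:
print(validate_and_consume({name: value}))  # True
print(validate_and_consume({name: value}))  # False: token already used
```

The upshot is exactly what the list item says: the spammer has to fetch the page before every single post and can’t hard-code field names, which raises their cost without making commenting any harder for humans. The same mapping trick covers the randomized script name from the first idea too.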

More ideas?

From my referrers: http://www.php-soft.com looks like a site that sells comment spam software, used for comment spam, wiki spam and other stuff. It looks pretty efficient and somewhat advanced – enter a script name and it does a Google search and spams all the scripts it finds.

Helping people start blogs

I’ve helped 2 friends of mine start blogs in the past year: Jay and Melina (I don’t even remember her URL). They don’t blog much though; I think they aren’t getting that bloggin’ feeling. Jay sometimes sends me URLs that I tell him he should put on his blog instead.

What are your experiences with helping people start blogs?

And here we have an illustration of why RDF is useful: “The bane of my existence is doing things I know the computer could do for me. When I got my proposed July 2001 travel itinerary in email, I just couldn’t bear the thought of manually copying and pasting each field from the itinerary into my PDA calendar. I started putting the Semantic Web approach to application integration to work.”

XML.com: Something Useful This Way Comes [Jun. 09, 2004]: “In other words, development of the Semantic Web requires a lot of work, but there’s been a lot of work done. This raises an obvious question: when will all that work pay off?
There are only three ways to answer that question — already, never, or somewhere in between.”

In other words, the semantic web is kinda here and it’s kinda useful. That’s good enough for me.

One thing bothers me though. If it’s the data model of RDF that’s valuable, not the syntax (pretty much everybody hates the RDF/XML syntax), then really, what’s the big deal? The RDF data model really isn’t that complex. Why do so many people shy away from RDF if it’s really just a useful data model? I suspect it might be because of all the stuff built around that model, especially the syntax. Couldn’t someone invent RSRDF (Real Simple RDF), implementing the same useful data model (and at the same time maybe explaining exactly why it’s so useful), but keeping the tools around it (OWL, syntax, …) much simpler?
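To back up the claim that the model is simple: an RDF graph is just a set of subject-predicate-object triples. Here’s a toy Python sketch – plain tuples, no RDF library, all URIs and names invented – that models a travel itinerary like the one in the quote above and pattern-matches over it:

```python
# The entire RDF data model, minus the syntax wars: a graph is a set of
# (subject, predicate, object) triples. The "ex:" URIs are made up.
triples = {
    ("ex:flight42", "ex:departsFrom", "ex:NYC"),
    ("ex:flight42", "ex:arrivesAt",   "ex:Brussels"),
    ("ex:flight42", "ex:departsOn",   "2001-07-09"),
    ("ex:NYC",      "ex:label",       "New York City"),
    ("ex:Brussels", "ex:label",       "Brussels"),
}


def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None is a wildcard."""
    return [(ts, tp, to) for (ts, tp, to) in triples
            if s in (None, ts) and p in (None, tp) and o in (None, to)]


# "Which flights leave from NYC, and when?" The itinerary-to-calendar
# integration in the earlier quote boils down to this kind of matching.
for flight, _, _ in match(p="ex:departsFrom", o="ex:NYC"):
    when = match(s=flight, p="ex:departsOn")[0][2]
    print(flight, "departs", when)  # ex:flight42 departs 2001-07-09
```

If that’s really all the model is, then the intimidating part must be the layers on top – which is pretty much the RSRDF argument.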