Steve Arnold pointed out to me that “Languages that are synthetic do not fare well in automatic systems unless the source documents are highly technical.”

Which makes sense, once you understand what a synthetic language is. A synthetic language combines bits into really long words. For example, in Mohawk: Washakotya’tawitsherahetkvhta’se = “He ruined her dress” (strictly, “He made the thing that one puts on one’s body ugly for her”). One word is used for something that other languages need a multiple words or a whole sentence for. You can see how that can mess with automated systems.

Languages are not synthetic or isolating, they fall into a spectrum: some languages are just more synthetic than others. Examples of common synthetic languages are German, Russian, Turkish, Finnish, Japanese, Korean, and many more.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s