Foreign language support?

Discuss the HanDBase for Windows Desktop program, conduits, and add-ons.

Post by Tomvb62 » Wed Jun 24, 2009 12:44 pm

I'm trying to import a CSV file into the Windows Desktop program. The file contains a mixture of Greek and English text. The English text is imported into the database properly, but the Greek text is unreadable. Is there a way to import non-Latin characters into HanDBase, and if so, which character encoding should I use for my Greek CSV file?
Tomvb62
 
Posts: 14
Joined: Wed Jun 24, 2009 12:20 pm

Re: Foreign language support?

Post by dhaupert » Thu Jun 25, 2009 9:01 am

Tomvb62 wrote: I'm trying to import a csv file in the Windows Desktop program. The file contains mixture of texts in Greek and English. The English texts are properly imported in the database, but the Greek texts are unreadable. Is there a way to import non-latin characters in HanDBase and if yes, which character encoding should I use for my Greek csv file?


HanDBase uses the Windows Latin 1 encoding (Windows-1252), an 8-bit encoding that supports at most 256 characters. The character set is shown here:
http://en.wikipedia.org/wiki/Windows-1252

I don't believe it supports Greek at this time.
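
If you want to check a file's contents before importing, here is a minimal Python sketch. It relies on the fact that Python's built-in `cp1252` codec implements Windows-1252. The function name `fits_windows_1252` and the sample strings are made up for illustration and are not part of HanDBase.

```python
# Hypothetical sample values, not taken from the original CSV file.
SAMPLES = ["Athens", "café", "Αθήνα"]


def fits_windows_1252(text: str) -> bool:
    """Return True if every character of text exists in Windows-1252."""
    try:
        text.encode("cp1252")  # Python's name for the Windows-1252 codec
        return True
    except UnicodeEncodeError:
        # At least one character (for example a Greek letter) has no
        # slot in the 256-entry Windows-1252 table.
        return False


for sample in SAMPLES:
    print(sample, fits_windows_1252(sample))
```

Accented Western European letters such as "é" pass the check, while Greek letters such as those in "Αθήνα" do not, which matches the import behavior described above.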
dhaupert
 
Posts: 4111
Joined: Tue May 26, 2009 11:51 am

Re: Foreign language support?

Post by Tomvb62 » Thu Jun 25, 2009 1:39 pm

Thanks for the info. Are there any plans in the future to add Unicode (UTF-8) support to HanDBase?

Re: Foreign language support?

Post by dhaupert » Thu Jun 25, 2009 1:57 pm

Hi,

That's a great question. Here's why we couldn't just add support for UTF-8. HanDBase has fixed field widths. For example, when you say that a text field is limited to 40 characters, we allocate 40 bytes of storage per field per record for that text. That was designed around ASCII, which uses one byte of storage per character of text. Unicode has since come along, and its fixed-width encodings use 2 bytes (UCS-2) or 4 bytes (UTF-32) per character. That means storage requirements double or quadruple, even though 90% of the time only one byte is needed (i.e., the other bytes are 0). UTF-8 was designed to fill that gap, using a single byte per character when possible and 2 or more bytes where required. But that means that if we say 40 characters max and you enter 40 characters, some of which require 2 or more bytes, the text no longer fits in the 40 bytes we allocated.

It's nothing that can't be fixed, but unfortunately it requires a new database format. We hope to make this a feature of HanDBase 5. Development hasn't started yet, but we plan to begin later this year.
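
To illustrate the field-width problem described above, here is a rough Python sketch. It is not HanDBase code. The 40-byte limit comes from the example in this post, while `utf8_size`, `truncate_to_bytes`, and the sample strings are hypothetical names and values chosen for the demonstration.

```python
FIELD_WIDTH_BYTES = 40  # the 40-byte text field from the example above


def utf8_size(text: str) -> int:
    """Number of bytes text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def truncate_to_bytes(text: str, limit: int = FIELD_WIDTH_BYTES) -> str:
    """Cut text so its UTF-8 form fits in limit bytes.

    Slicing the byte string can leave half of a multi-byte character at
    the end; errors="ignore" drops that partial character on decode.
    """
    clipped = text.encode("utf-8")[:limit]
    return clipped.decode("utf-8", errors="ignore")


english = "a" * 40  # 40 characters, 1 byte each in UTF-8
greek = "α" * 40    # 40 characters, 2 bytes each in UTF-8

print(len(english), utf8_size(english))  # 40 40
print(len(greek), utf8_size(greek))      # 40 80
print(len(truncate_to_bytes(greek)))     # 20
```

So a 40-character Greek string needs 80 bytes, and a field that stays 40 bytes wide can hold only 20 of those characters. That is one way to see why a new database format is needed rather than a simple encoding switch.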

