Posts

Showing posts from July, 2017

Integrating Unicode CLDR: Parsing with Retrieved Data

As I mentioned in the last blog post, I have been working to modify the codebase so that dateparser can use the new data to parse date strings. Most of the new translation data is similar to the data used previously; the only major difference is in how relative date strings are handled. The previous data used translations of "ago" and "in" to translate relative date strings like "10 years ago", "15 hours 10 minutes ago", and "in 10 minutes and 8 seconds". The data retrieved from Unicode CLDR, however, contains translations of complete date strings, so they translate directly to formats like "** year ago" or "in ** month", which can then be used for parsing. Initially I thought direct translations like these would work, but then I realised that many relative dates are not covered by them. These include dates that contain more than one date field, like "1 year 3 months ago", and dates that includ
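To make the limitation concrete, here is a minimal sketch of whole-string pattern translation. It is not dateparser's actual code: the German patterns, the `{0}` number placeholder, and the helper names `compile_patterns` and `translate_relative` are assumptions chosen for illustration.

```python
import re

# Illustrative CLDR-style relative patterns (assumed for this sketch);
# "{0}" marks where the number appears in the date string.
RELATIVE_PATTERNS = {
    "vor {0} Jahren": "{0} year ago",
    "vor {0} Monaten": "{0} month ago",
    "in {0} Monaten": "in {0} month",
}


def compile_patterns(patterns):
    """Turn each whole-string pattern into an anchored regex (hypothetical helper)."""
    compiled = []
    for source, target in patterns.items():
        # Escape the literal text, then swap the escaped placeholder for a digit group.
        regex = re.escape(source).replace(re.escape("{0}"), r"(\d+)")
        compiled.append((re.compile("^" + regex + "$", re.IGNORECASE), target))
    return compiled


def translate_relative(date_string, compiled):
    """Return the English form if the complete string matches a pattern, else None."""
    for regex, target in compiled:
        match = regex.match(date_string.strip())
        if match:
            return target.replace("{0}", match.group(1))
    return None


COMPILED = compile_patterns(RELATIVE_PATTERNS)

single_field = translate_relative("vor 10 Jahren", COMPILED)
multi_field = translate_relative("vor 1 Jahr 3 Monaten", COMPILED)
```

In this sketch a single-field string matches a pattern, while the two-field string matches none of them and comes back as `None`. That is the kind of gap that complete-string translations leave open.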

Integrating Unicode CLDR: Adding Locale Support

So far, the translation data is ready to be used for translating date strings. It combines data from Unicode CLDR with supplementary data contributed by many individuals. Initially, the data from Unicode CLDR was stored as JSON files and the supplementary data as YAML files; the two sources were to be loaded separately and then combined before being used to translate dates. But after discussion with my mentor, we agreed on a better approach that he suggested: storing the data directly in Python modules. This has the advantage that importing data from Python modules is faster than loading it from JSON and YAML files. And since we no longer have to combine the CLDR and supplementary data at run time, storing the data in Python modules proves to be much more efficient. Now that the data is ready, it is time to make the necessary modifications in the codebase to support l
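The idea of merging once ahead of time and then importing the result can be sketched roughly as follows. This is only an illustration under assumptions: the sample dictionaries, the `info` variable name, the `en.py` file name, and the helpers `merge`, `write_module` and `load_module` are all hypothetical, not the project's real layout.

```python
import importlib.util
import pprint
import tempfile
from pathlib import Path

# Hypothetical sample data standing in for one locale's CLDR and supplementary entries.
cldr_data = {"name": "en", "ago": ["ago"], "year": ["year", "yr"]}
supplementary_data = {"year": ["years", "yrs"], "skip": ["about", "at"]}


def merge(cldr, supplementary):
    """Combine both sources once, at build time (lists are extended, other values overridden)."""
    merged = {k: list(v) if isinstance(v, list) else v for k, v in cldr.items()}
    for key, value in supplementary.items():
        if isinstance(value, list) and isinstance(merged.get(key), list):
            merged[key] = merged[key] + [v for v in value if v not in merged[key]]
        else:
            merged[key] = value
    return merged


def write_module(data, path):
    """Write the merged dictionary out as a plain Python module."""
    path.write_text("info = " + pprint.pformat(data) + "\n", encoding="utf-8")


def load_module(path):
    """Import the generated module from its file path."""
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    module_path = Path(tmp) / "en.py"
    write_module(merge(cldr_data, supplementary_data), module_path)
    # At run time only an import is needed; no JSON/YAML parsing or merging.
    locale_info = load_module(module_path).info
```

With this arrangement the merge cost is paid when the modules are generated, and the run-time path is reduced to a single import of ready-made data.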