Showing posts from June, 2017

Integrate Unicode CLDR: Working with Retrieved Data

I have been working on scripts and modifying data for quite some time, and the data now seems ready to be used for parsing dates. The data retrieved from Unicode CLDR has been divided into two parts: numeral_translation_data, which is used specifically to parse numerals, and date_translation_data, which is used to parse a date string after any numerals in it have been parsed. The existing data that various individuals contributed to dateparser has been modified to supplement the data retrieved from CLDR, i.e., only the portion that is not included in the CLDR data remains as supplementary data, which contributors will continue to modify in the future. The data for date translation consists of translations for months, weekdays and periods, the date order, and translations for relative-type dates. This data is stored for different languages and, for each language, locale-specific data is stored for all locales with th
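
To make the two-step idea concrete, here is a minimal sketch of how the two data sets might be used together: numerals are replaced first, and the resulting string is then translated with the date data. The dictionary contents, the key names, and the helper functions merge_supplementary and translate are all hypothetical and chosen only for illustration; they are not dateparser's actual data layout or API.

```python
import re

# Hypothetical, heavily trimmed data for one language (French).
# The real structures and key names in dateparser may differ.
numeral_translation_data = {"un": 1, "deux": 2, "trois": 3}

date_translation_data = {
    "date_order": "DMY",
    "months": {"janvier": "january", "juin": "june"},
    "weekdays": {"lundi": "monday", "mardi": "tuesday"},
    "relative-type": {"hier": "1 day ago", "demain": "in 1 day"},
}

# Assumed shape of contributor-maintained entries that CLDR does not provide.
supplementary_data = {"months": {"janv": "january"}}


def merge_supplementary(cldr, extra):
    """Return a copy of the CLDR data with supplementary entries added.

    Existing CLDR entries are never overwritten.
    """
    merged = {k: (dict(v) if isinstance(v, dict) else v) for k, v in cldr.items()}
    for section, entries in extra.items():
        target = merged.setdefault(section, {})
        for word, translation in entries.items():
            target.setdefault(word, translation)
    return merged


def translate(date_string, numerals, date_data):
    """Translate a date string to English in two steps."""
    # Split into word tokens and separator tokens so spacing is preserved.
    tokens = re.findall(r"\w+|\W+", date_string.lower())
    # Step 1: replace spelled-out numerals with digits.
    tokens = [str(numerals[t]) if t in numerals else t for t in tokens]
    # Step 2: translate months, weekdays and relative-type words.
    lookup = {}
    for section in ("months", "weekdays", "relative-type"):
        lookup.update(date_data.get(section, {}))
    return "".join(lookup.get(t, t) for t in tokens)


merged_data = merge_supplementary(date_translation_data, supplementary_data)
print(translate("Lundi deux juin 2017", numeral_translation_data, merged_data))
```

Keeping the numeral step separate means the date data only ever sees digits, which is one way to read the split described above. The sketch ignores date order and locale-specific overrides entirely.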

Integrating Unicode CLDR: Initial Phase

It has been some time since I last wrote, and I have been working on my project. As I was free during the GSoC bonding period, I started on the project early to get a head start. I began by writing scripts to retrieve translation data from Unicode CLDR. That worked out well: by the time the GSoC period started, I had already written a major part of the scripts and retrieved most of the data required for parsing. At the same time, I started writing tests for dateparser, and coverage has increased a fair amount. After the GSoC period started, I made further changes to the script to resolve some issues, such as correcting the date order of languages and storing data in order. Now that I have added numeral data as well, the retrieved data is complete, with all the components required to translate dates as set out in my proposal. The challenge now is to use this data to translate dates effectively and efficiently. Currently dateparser uses a dictionary based metho