Posts

Showing posts from August, 2017

GSoC 2017: Work Summary

Google Summer of Code 2017 is finally coming to an end. It was an amazing experience to be part of this prestigious program, and I learned a lot about open source development in these three months.

Work Summary

I worked on dateparser, a project under Scrapinghub, a sub-org of the Python Software Foundation (PSF), that parses dates in various languages and formats. The objective of the project was to integrate the translation data for all locales in the Unicode Common Locale Data Repository (CLDR), a standard repository of locale-specific data, with the existing translation data in dateparser. Here is a brief outline of the work done on dateparser during GSoC 2017:

Work Completed

1. Retrieved translation data from Unicode CLDR
Scripts were written to retrieve translation data from the Unicode CLDR GitHub repository. The translation data for dates and numerals were stored separately.

2. Ordered languages by population
The languages were ordered on th

Integrating Unicode CLDR: Translation with Integrated Data

I have finally implemented translation of date strings with the new integrated data. I worked on it for much longer than I expected: I had hoped to finish translating date strings without numeral words much earlier and to get started on the numeral parser as soon as possible. It took longer because I did not think the problem through clearly, so I started with a complicated solution and only later moved to simpler ones. As I mentioned in my last blog post, the problem was to translate relative date strings, which are of two types: those that have no digits, like 'yesterday', and those that have digits and are stored as regex patterns, like '(\\d+) day ago'. Currently dateparser uses translations for 'ago' and 'in' along with other words, and relative dates are translated in the same way as other date strings. First, dates are split by the numbers within them and then by known wo
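
The two kinds of relative date strings described above can be illustrated with a minimal Python sketch. It is not dateparser's actual implementation or API: the function name `translate_relative`, the two dictionaries, and the French sample entries are all hypothetical, chosen only to show how digit-free phrases and regex patterns with digits might be looked up differently.

```python
import re

# Hypothetical translation data for one locale (French).
# These entries are illustrative and are not taken from dateparser or CLDR.

# Type 1: relative phrases without digits, mapped directly to English.
SIMPLE_RELATIVE = {
    "hier": "yesterday",
    "demain": "tomorrow",
    "aujourd'hui": "today",
}

# Type 2: relative phrases with digits, stored as regex patterns.
# The captured number is reused in the English template via \1.
REGEX_RELATIVE = {
    r"il y a (\d+) jours?": r"\1 day ago",
    r"dans (\d+) jours?": r"in \1 day",
}


def translate_relative(date_string):
    """Return the English form of a relative date string, or None if unknown."""
    text = date_string.strip().lower()

    # Digit-free phrases are a plain dictionary lookup.
    if text in SIMPLE_RELATIVE:
        return SIMPLE_RELATIVE[text]

    # Phrases with digits must match a stored pattern in full.
    for pattern, template in REGEX_RELATIVE.items():
        match = re.fullmatch(pattern, text)
        if match:
            return match.expand(template)

    return None


print(translate_relative("Hier"))            # yesterday
print(translate_relative("il y a 3 jours"))  # 3 day ago
```

The sketch keeps the singular 'day' in the output, mirroring the pattern form '(\\d+) day ago' quoted in the post; whether plural forms are normalized elsewhere is not covered here.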