GSoC 2017: Work Summary
Google Summer of Code 2017 is finally coming to an end. It was an amazing experience to be part of this prestigious program, and I learned a lot about open source development in these three months.
Work Summary
I worked on dateparser, a project under the Scrapinghub sub-org of the Python Software Foundation (PSF) that parses dates in various languages and formats. The objective of the project was to integrate the translation data of all locales in the Unicode Common Locale Data Repository (CLDR), a standard repository of locale-specific data, with the existing translation data in dateparser. Here is a brief outline of the work done on the project during GSoC 2017:
Work Completed
1. Retrieved Translation data from Unicode CLDR
Scripts were written to retrieve translation data from the Unicode CLDR GitHub repository. The translation data for dates and the translation data for numerals were stored separately.
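To give a feel for what such a retrieval script has to do, here is a minimal sketch that pulls month names out of a CLDR-style JSON document. The sample is a heavily abbreviated stand-in for a file such as ca-gregorian.json from the CLDR JSON data, and the function name `extract_month_names` is hypothetical; it is not the script from the PR.

```python
import json

# Abbreviated sample in the shape of a CLDR ca-gregorian.json file.
# A real file covers all twelve months plus days, eras, formats and more.
SAMPLE = json.loads("""
{
  "main": {
    "en": {
      "dates": {
        "calendars": {
          "gregorian": {
            "months": {
              "format": {
                "abbreviated": {"1": "Jan", "2": "Feb"},
                "wide": {"1": "January", "2": "February"}
              }
            }
          }
        }
      }
    }
  }
}
""")


def extract_month_names(cldr_json, language):
    """Map month number -> [wide name, abbreviated name] (hypothetical helper)."""
    months = (cldr_json["main"][language]["dates"]["calendars"]
              ["gregorian"]["months"]["format"])
    result = {}
    # CLDR keys are strings ("1", "2", ...), so sort them numerically.
    for number in sorted(months["wide"], key=int):
        result[int(number)] = [months["wide"][number],
                               months["abbreviated"][number]]
    return result


month_names = extract_month_names(SAMPLE, "en")
print(month_names)
```

A real script would also need to download the files for every language and handle locales whose data is only partially filled in.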
2. Ordered languages by population
The languages were ordered by the size of the population that uses each of them, based on the data in cldr-core/supplemental/territoryInfo.json.
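The ordering can be sketched roughly as follows. This assumes the usual layout of territoryInfo.json, where each territory has a `_population` and a `languagePopulation` map with `_populationPercent` per language. The territory codes and numbers below are made up for illustration, and `order_languages_by_population` is a hypothetical name, not the function used in the PR.

```python
from collections import defaultdict

# Made-up sample in the shape of cldr-core/supplemental/territoryInfo.json.
SAMPLE_TERRITORY_INFO = {
    "supplemental": {
        "territoryInfo": {
            "AA": {
                "_population": "1000",
                "languagePopulation": {
                    "en": {"_populationPercent": "90"},
                    "fr": {"_populationPercent": "20"},
                },
            },
            "BB": {
                "_population": "500",
                "languagePopulation": {
                    "fr": {"_populationPercent": "100"},
                    "de": {"_populationPercent": "10"},
                },
            },
        }
    }
}


def order_languages_by_population(territory_info):
    """Return language codes sorted by estimated number of users, largest first."""
    totals = defaultdict(float)
    territories = territory_info["supplemental"]["territoryInfo"]
    for territory in territories.values():
        population = float(territory["_population"])
        for language, info in territory.get("languagePopulation", {}).items():
            # Users of a language in a territory = population * percent / 100.
            totals[language] += population * float(info["_populationPercent"]) / 100
    # Break ties alphabetically so the order is deterministic.
    return sorted(totals, key=lambda language: (-totals[language], language))


ordered = order_languages_by_population(SAMPLE_TERRITORY_INFO)
print(ordered)
```

Summing over territories matters because a language such as French is counted in every territory where it is spoken, not only in its largest one.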
3. Integration with Existing Translation data
The data retrieved from Unicode CLDR was integrated with the existing translation data contributed by the community.
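One simple way to picture this integration is a merge that keeps every CLDR translation and appends community translations that are not already present. The sketch below is only an illustration under that assumption; the data layout (a dict of lists keyed by the English term) and the name `merge_translations` are hypothetical and the actual merge in the PR may differ.

```python
def merge_translations(cldr_data, community_data):
    """Merge two {term: [translations]} dicts, dropping case-insensitive duplicates."""
    merged = {}
    # Keep CLDR keys first, then any keys that only the community data has.
    keys = list(cldr_data) + [key for key in community_data if key not in cldr_data]
    for key in keys:
        seen = set()
        merged[key] = []
        for word in cldr_data.get(key, []) + community_data.get(key, []):
            normalized = word.lower()
            if normalized not in seen:
                seen.add(normalized)
                merged[key].append(word)
    return merged


cldr = {"january": ["January", "Jan"], "monday": ["Monday", "Mon"]}
community = {"january": ["jan", "Jan."], "ago": ["ago"]}
merged = merge_translations(cldr, community)
print(merged)
```

Here the community entry "jan" is dropped as a duplicate of "Jan", while "Jan." and the community-only key "ago" are preserved.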
4. Modifying Codebase to work with Integrated Translation data
The data retrieved from Unicode CLDR was stored in a different format from the previous data, so the codebase had to be changed to work with the new integrated database.
5. Adding Support for Locales
Changes were made in the codebase to support parsing with locales.
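The idea behind locale support can be illustrated with a small sketch in which a locale such as en-GB inherits its language's data and overrides only what differs, for example the date order. The data layout and the helper names `load_locale_data` and `parse_numeric_date` are assumptions made for this example and are not dateparser's internals.

```python
import datetime

# Hypothetical layout: per-language defaults plus per-locale overrides.
LANGUAGE_DATA = {
    "en": {
        "date_order": "MDY",
        "locale_specific": {
            "en-GB": {"date_order": "DMY"},
        },
    }
}


def load_locale_data(locale):
    """Return the language data with any locale-specific overrides applied."""
    language = locale.split("-")[0]
    base = LANGUAGE_DATA[language]
    data = {key: value for key, value in base.items() if key != "locale_specific"}
    data.update(base.get("locale_specific", {}).get(locale, {}))
    return data


def parse_numeric_date(text, locale):
    """Parse a date like '02/03/2017' using the locale's date order."""
    order = load_locale_data(locale)["date_order"]
    # Pair each letter of the order ("M", "D", "Y") with its numeric field.
    parts = dict(zip(order, (int(piece) for piece in text.split("/"))))
    return datetime.date(parts["Y"], parts["M"], parts["D"])


print(parse_numeric_date("02/03/2017", "en"))     # month first
print(parse_numeric_date("02/03/2017", "en-GB"))  # day first
```

The same string yields 3 February for the plain language and 2 March for the locale, which is the kind of ambiguity locale-aware parsing is meant to resolve. In released versions of dateparser, a locale can be requested through the `locales` argument of `dateparser.parse`.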
The entire work done during GSoC 2017 is contained in PR #321 (last commit).
Work to be done
1. Adding Support for Numeral Words
Initially I also planned to add support for dates with numeral words during the GSoC period itself, but I could not implement it. I added the numeral translation data and planned to implement a numeral parser to parse numeral strings, and then to use it to parse dates containing numeral words. I will start working on the numeral parser soon after the GSoC period ends.
2. Adding more tests
With the addition of translation data for more than 200 languages and over 500 locales, a lot of tests need to be added. Though I have already added many tests, I will add more in the future.