Showing posts with the label Unicode Common Locale Data Repository

Integrate Unicode CLDR: Working with Retrieved Data

I have been working on scripts and modifying data for quite some time, and the data now seems ready to be used for parsing dates. The data retrieved from Unicode CLDR has been divided into two parts: numeral_translation_data, which will be used specifically to parse numerals, and date_translation_data, which will be used to parse date strings once any numerals in them have been translated. The existing data that various individuals contributed to dateparser has been modified to supplement the data retrieved from CLDR, i.e., only the portion not already covered by the CLDR data remains as supplementary data, which contributors will continue to modify in the future. The data for date translation consists of translations for months, weekdays and periods, the date order, and translations for relative-type dates. This data is stored for different languages and for each language, locale-specific data is stored for ...
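To make the two-part split concrete, here is a minimal sketch of how the two tables might be applied in sequence: numerals first, then date words. Only the names numeral_translation_data and date_translation_data come from the post. The dictionary layout, the tiny Hindi sample entries, and the helper functions translate_numerals and translate_date_string are assumptions made up for illustration, not dateparser's actual data format or API.

```python
# Hypothetical sample data; the real CLDR-derived data is much larger
# and its exact layout is not shown in this post.
numeral_translation_data = {
    # Devanagari digits mapped to ASCII digits
    "hi": dict(zip("०१२३४५६७८९", "0123456789")),
}

date_translation_data = {
    "hi": {
        "months": {"मार्च": "march"},
        "weekdays": {"बुधवार": "wednesday"},
    },
}


def translate_numerals(text, language):
    """Replace language-specific digits with ASCII digits."""
    table = str.maketrans(numeral_translation_data.get(language, {}))
    return text.translate(table)


def translate_date_string(text, language):
    """Translate numerals first, then month and weekday names."""
    text = translate_numerals(text, language)
    # Flatten the per-category tables into a single word lookup.
    words = {}
    for group in date_translation_data.get(language, {}).values():
        words.update(group)
    # Tokens without a translation are passed through unchanged.
    return " ".join(words.get(token, token) for token in text.split())


translated = translate_date_string("१५ मार्च २०१७", "hi")
```

Running the numeral step before the word lookup means the date translation table never has to know about a language's digits, which is one plausible reason for keeping the two datasets separate.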

GSoC 2017 : My Project

I have finally been selected for Google Summer of Code 2017 at the Python Software Foundation. The Python Software Foundation (PSF) serves as an umbrella organisation for various sub-orgs whose projects contribute to the development of the Python language. The sub-org I am going to work with is Scrapinghub.

About Scrapinghub

Scrapinghub, as its name clearly suggests, deals primarily with scraping the web. It is a company concerned with information retrieval and the manipulation that follows, i.e., it deals with both data extraction and data processing after retrieval. It has various projects in these areas, and the one I am going to work on deals with data processing after retrieval. I am going to work with dateparser, which is a Python library that primarily deals with parsing dates in various languages and formats.

About dateparser

dateparser is a Python library that is used to parse various forms of dates in different languages to a common for...