The [login to view URL] site provides a set of translation and grammar services for students learning Norwegian. Several languages are supported at the moment. The primary goal of this exercise is to support Russian to Norwegian translation via [login to view URL]
Keep in mind, however, that the parser must also support all other available languages. The structure of the HTML pages is very similar for all languages.
The goal of this small project is to introduce a simple HTML parser that extracts the pronunciation, the translated word, related phrases and examples.
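To illustrate the kind of parser meant here, below is a minimal sketch built only on the standard library. The real markup of the site is not shown in this description, so the class names "translation" and "example" and the SAMPLE snippet are purely hypothetical placeholders; the actual selectors have to be taken from the real pages.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = frozenset(["br", "hr", "img", "input", "link", "meta"])


class ClassTextCollector(HTMLParser):
    """Collect the text of elements carrying one of the wanted CSS classes."""

    def __init__(self, wanted_classes):
        HTMLParser.__init__(self)  # old-style class on 2.7, so no super()
        self.wanted = set(wanted_classes)
        self.found = dict((name, []) for name in self.wanted)
        self._current = None  # class name being captured, if any
        self._depth = 0       # nesting depth inside the captured element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._current is not None:
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        matched = [c for c in classes if c in self.wanted]
        if matched:
            self._current, self._depth, self._buffer = matched[0], 1, []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._current is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self.found[self._current].append("".join(self._buffer).strip())
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)


def collect(html, wanted_classes):
    """Parse html and return {class name: [text, ...]}."""
    collector = ClassTextCollector(wanted_classes)
    collector.feed(html)
    collector.close()
    return collector.found


# Hypothetical markup, NOT the real structure of the site.
SAMPLE = (u'<div><span class="translation">word</span>'
          u'<p class="example">first <b>example</b></p>'
          u'<p class="example">second</p></div>')
```

Calling collect(SAMPLE, ["translation", "example"]) gathers the text per class. Character and entity references are not handled specially in this sketch, and a third-party parser could be used instead if that is acceptable for the project.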
The script shall accept the following input parameters:
- the phrase (word) to translate;
- the code of the source language (from language) in ISO 639-1 format. For [login to view URL] it may be either "nb" for Bokmål or "nn" for Nynorsk;
- the code of the target language (to language) in ISO 639-1 format. It may be "ru", "nn", "pl", or any other language supported by the site.
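The three inputs above could be read from the command line roughly as follows. This is only a sketch: the argument names phrase, from_lang and to_lang and the helper lang_code are illustrative choices, and the check only verifies the two-letter ISO 639-1 format, not whether the site actually supports the language.

```python
import argparse
import re

# ISO 639-1 codes are two letters; this checks the format only.
_LANG_RE = re.compile(r"^[a-z]{2}$")


def lang_code(value):
    """argparse type: accept a two-letter lowercase code such as "nb"."""
    if not _LANG_RE.match(value):
        raise argparse.ArgumentTypeError(
            "not an ISO 639-1 style code: %r" % value)
    return value


def parse_args(argv=None):
    """Parse the phrase plus source and target language codes."""
    parser = argparse.ArgumentParser(
        description="Translate a phrase (illustrative sketch).")
    parser.add_argument("phrase", help="the phrase (word) to translate")
    parser.add_argument("from_lang", type=lang_code,
                        help='source language, e.g. "nb" or "nn"')
    parser.add_argument("to_lang", type=lang_code,
                        help='target language, e.g. "ru", "nn" or "pl"')
    return parser.parse_args(argv)
```

For example, parse_args(["arbeide", "nb", "ru"]) yields an object with phrase, from_lang and to_lang attributes. The same three values could equally be plain function arguments if the script is used as a module.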
The response structure shall contain the following:
"success" - boolean flag indicating the operation status
"errorMsg" - description of the error. Optional; applicable only if success = false
"translated" - translated phrase
"englishPh" - translation of the phrase into English
"verbForms" - array of the available verb forms (present, simple past, past participle). Optional.
"pronunciation" - pronunciation of the requested phrase (word)
"examples" - array of examples available for the phrase (word)
"examplesEnglish" - array of examples for the English translation
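The response described above could be assembled as a plain dictionary, for instance like this. The helper names make_success and make_error are hypothetical, and the sketch assumes that a failed response carries only "success" and "errorMsg", which the description does not state explicitly.

```python
def make_error(message):
    """Build a failure response; assumes no other fields are needed."""
    return {"success": False, "errorMsg": message}


def make_success(translated, english_ph, pronunciation,
                 examples, examples_english, verb_forms=None):
    """Build a success response with the field names listed above."""
    response = {
        "success": True,
        "translated": translated,
        "englishPh": english_ph,
        "pronunciation": pronunciation,
        "examples": list(examples),
        "examplesEnglish": list(examples_english),
    }
    # "verbForms" is optional, so it is added only when forms are available.
    if verb_forms:
        response["verbForms"] = list(verb_forms)
    return response
```

Both helpers return ordinary dictionaries, so the result can be passed to json.dumps directly if a JSON response is wanted.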
As part of the project, a unit test shall be introduced. The test shall cover several cases, including missing translations and translations to different languages. To begin with, you may start with this trivial set:
prøve, prast, låne, arbeide.
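A possible shape for such a unit test is sketched below. It does not contact the site: fake_translate is a hypothetical stand-in for the real scraper and returns placeholder strings only, not real translations. Treating "prast" as the word without a translation is an assumption made for the sake of the example.

```python
# -*- coding: utf-8 -*-
import unittest

REQUIRED_ON_SUCCESS = ("translated", "englishPh", "pronunciation",
                       "examples", "examplesEnglish")


def fake_translate(phrase, from_lang, to_lang, known_words):
    """Hypothetical stand-in for the scraper; returns placeholders only."""
    if phrase not in known_words:
        return {"success": False,
                "errorMsg": u"no translation found for %s" % phrase}
    return {
        "success": True,
        "translated": u"<translated>",
        "englishPh": u"<english>",
        "pronunciation": u"<pronunciation>",
        "examples": [],
        "examplesEnglish": [],
    }


class TranslateShapeTest(unittest.TestCase):
    # Assumption: "prast" is the word expected to have no translation.
    KNOWN = frozenset([u"prøve", u"låne", u"arbeide"])

    def test_known_words(self):
        for word in self.KNOWN:
            response = fake_translate(word, "nb", "ru", self.KNOWN)
            self.assertTrue(response["success"])
            for key in REQUIRED_ON_SUCCESS:
                self.assertIn(key, response)

    def test_missing_translation(self):
        response = fake_translate(u"prast", "nb", "ru", self.KNOWN)
        self.assertFalse(response["success"])
        self.assertIn("errorMsg", response)

    def test_other_target_languages(self):
        for to_lang in ("ru", "nn", "pl"):
            response = fake_translate(u"arbeide", "nb", to_lang, self.KNOWN)
            self.assertTrue(response["success"])
```

In the real test the stand-in would be replaced by the actual translate function, ideally fed with saved copies of the HTML pages so that the test does not depend on the network.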
The target platform for the script is Python 2.7.
The project is a small preparatory step and may lead to further cooperation.
I am proficient in Python and can do this quickly. I am currently available around the clock. The task is to scrape the website based on user input and parse the HTML result to obtain the required information.