Abstract
Transforming text from one language to another by using computer systems automatically or with little human interventions is known as Machine Translation System (MTS). Divergence among natural languages in a multilingual environment makes Machine Translation (MT) a difficult and challenging task. The purpose of this paper is to present a comprehensive survey of MTS in general and for English, Hindi and Sanskrit languages in particular. The state-of-the-art MT approach is Neural Machine Translation (NMT) which has been used by Google, Amazon, Facebook and Microsoft but it requires large corpus as well as high computing systems. The availability of MT language modeling tools, parsers data repositories and evaluation metrics has been tabulated in this article. The classification of MTS, evaluation methods and platforms has been done based on a well-defined set of criteria. The new research avenues have been explored in this survey article which will help in developing good quality MTS. Although several surveys have been done on MTS but none of them have followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach including tools and evaluation methods as done in this survey specifically for English, Hindi and Sanskrit languages.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
1 Introduction
Natural languages have shown a vital role in shaping human social behavior as they prepare the necessary mechanism for day to day communication among human beings (Fromkin et al. 2011). Natural Language Processing (NLP) comprises of three basic components: processing, understanding and generation (Allen 1995). NLP is a sub-domain of Artificial Intelligence (AI) and Machine Translation (MT) is one of the application of NLP. Machine Translation (MT) is a mechanism of translating the sentences of one language designated as Source Language (SL) into other language designated as Target Language (TL) with the help of computers (Hutchins 1995; Hutchins and Somers 1992; Slocum 1985). The translation may occur one-to-one, i.e. from one SL to another TL, known as bi-lingual translation; one-to-many, i.e. from one SL into many TLs and many-to-many translation, i.e. from many SLs to many TLs known as Multilingual Machine Translation (MMT). MT comes under Natural Language Processing (NLP) domain which is a sub-domain of Artificial Intelligence (AI) (Rao 1998). The translation may be unidirectional or bidirectional. Several efforts have been made to review the MT systems whereas major contributions has been done by Antony (2013), Desai and Dabhi (2021), Garje and Kharate (2013), Naskar and Bandyopadhyay (2005). The research in the MT field has been increased rapidly in the last few decades. Therefore a systematic yet critical evaluation of available MT techniques, methods and systems is needed. In this article, the authors have surveyed the traditional as well as state-of-the-art techniques and systems of MT. An effort has been made to identify existing MT approaches, development tools, data repositories, environments, evaluation metrics and platforms.
1.1 Motivation
According to Ethnologue languages of world, approximately 7102 languages and thousands of dialects have been used by people in the world (Lewis et al. 2015). Human translation has never been an effective solution for such problems due to less availability of human translators, high cost of manual translation and difficult to approach by everyone. According to Census of India 2001 data, 22 scheduled and 100 non-scheduled languages with approximately 1600 local dialects were being used by people (Dorr et al. 2004; Mallikarjun 2010). So, for the development of country like India, people have to exchange technology, science, ideas and work together without any language barrier. MT techniques can remove such problems in an effective manner. Thus, there is a great need of MT at the global level as well as local level in India also.
The summary of contribution and novelty of this review article is of many folds which are listed as follows:
-
Presenting comparison of MT techniques and evaluation methods based on well-defined criteria to analyze the existing MT platforms with their characteristics and applications.
-
Analyzed the availability of various language resources and presents word embedding techniques used in neural machine translation for Indian languages.
-
Explored the new research areas in the field of machine translation for Indian languages.
1.2 Approaches of MTS
Figures 1 and 2 shows different MTS approaches (Dorr et al. 2004; Seasly 2003). Broadly we can categorize approaches into five groups: Direct Machine Translation (DMT), Rule-Based MT (RBMT), Corpus-Based MT (CBMT), Knowledge-Based MT (KBMT) and Hybrid Based MT (HBMT). RBMT is further divided into Transfer Based MT (TBMT) and Interlingua Based MT (IBMT) whereas CBMT is divided into Statistical MT (SMT) and Example-Based MT (EBMT). Neural Machine Translation (NMT) is an extension of SMT as depicted in Fig. 1. Figure 2 shows the level of complexity in different approaches in the form of Vauquois triangle. From bottom to top complexity increases.
1.2.1 DMT
DMT comes at the bottom of the triangle and needs fewer efforts. There is no intermediary representation of the source and target language, only word to word matching is performed for the translation and the system may have pre-processing and post-processing paring phases for the input sentence morphological analysis and the target sentence reordering, respectively. The system uses a bilingual dictionary for matching the SL words with TL words. Figure 3 depicts the DMT approach.
1.2.2 TBMT
In this approach after the morphological analysis of input sentence, the syntactic and semantic analysis using the SL dictionary is performed to find out grammar structure and generates a parse tree. The system uses a set of transfer rules to transfer SL parse tree into TL with the help of a bilingual source-target language dictionary. The TL text is generated as per the grammar of TL using syntactic and semantic generator modules and the target language dictionary. The working of TBMT approach is depicted in Fig. 4.
1.2.3 IBMT
In this approach, SL text is analysed and an intermediate language independent code is generated to obtain the TL text. As the intermediate code representation is independent of SL as well as TL so could be used in multilingual machine translation. The language analyser is dependent on SL in the input process and the target language generator is dependent on the particular target language. The functioning of IBMT is shown in Fig. 5.
1.2.4 SMT
In this approach, statistical or probabilistic techniques have been applied in machine translation system development. There are two major components of this approach as-language model and the translation model. The language model produces the probability of occurrence for the strings of words in the source as well as the target language and also the conditional probabilities of occurrence of a word in the target language which translates a word in the source language. The multiplication of the probability of occurrence of a word in SL with the conditional probability of occurrence of a word corresponding to this word in TL provides the occurrence of source and destination pairs of words occurring in the corpus available for translation. This method requires a large amount of database and very complex statistical techniques to do the translation. The efficiency of the system increases with more training data sets and parallel corpora availability for the language pair. Machine translation can be done based on word, phrase, sentence, or hierarchical phrase. The translation model generally uses the N-gram model. N-gram model predicts the occurrence of the next word of the text given the previous words. The working process of the SMT approach is presented in Fig. 6.
1.2.5 EBMT
The basic translation principle used by this approach was analogy. This approach does not require huge amount of corpora, it needs a bilingual corpus of stored examples and using one of the matching algorithm to find the translation which matches with the source language sentence. Generally EBMT does not require any grammar rule base in detail; it uses only the stored examples and the matching algorithm to find the closest match corresponding to the given input sentence. The architecture of EBMT approach is shown in Fig. 7.
1.2.6 KBMT
This approach extracts the linguistic information from SL and stores that information into the knowledge base used for translation purpose. Information extraction is done by using bilingual dictionaries, language structure, stored translation information, domain specific information dictionaries etc. Figure 8 depicts the architecture of KBMT approach.
Each approach has its own advantages and disadvantages, so hybridization of two or more than two approaches might give a better translation quality. Hence researchers are focusing on hybridization of approaches at different levels for developing MTS. Comparison of MTS approaches have been done based on a set of well defined criteria as shown in Table 1. RBMT approach gives better results than other approaches, but needs deep linguistic knowledge, more time to create translation rules.
Corpus Based Machine Translation (CBMT) approach performs better than DMT for long sentence translation, but requires large volume of text corpus for both SL and TL, statistical tools, algorithms to handle and high computation power for the development of MTS. DMT approach is better for translating single clause sentences and requires less time to develop MTS. Neural Machine Translation is an emerging technique and reports similar results to the present state-of-art MTS (Hassan et al. 2018; Wu et al. 2016).
Hybridization of CBMT and RBMT can be done based on confidence-estimation and classification (Christopher and Rao 2010). However, the problem with such hybridization is the requirement of a large corpus of parallel sentences to extract translation rules to cover all aspects of natural language. To overcome such problems Recursive Chain-Learning (RCL) or Genetic Algorithms or Neural Networks can be used over the existing systems (Echizen-Ya et al. 2004). For translating fixed patterns, the RBMT approach was not effective, because conventional syntactic analyzers are not able to recognize such fixed patterns (collocation, idioms and compound nouns). To remove such problems specific pattern recognition modules can be added to the existing RBMT based systems. This will reduce the load on POS tagger and parser, helps in resolving word sense ambiguities (Jung et al. 1999). Other hybrid combinations are explained in Sects. 4.1 and 4.2.
The rest of the article is organized as Sect. 1 gives the introduction to MT, Motivation, the contribution of this article and approaches of MT. Section 2 describes the evolution of MT in general as well as for English, Hindi and Sanskrit languages. Section 3 explains the survey methodology adopted for the current work. Section 4 describes outcomes as results obtained from various MT systems. State-of-the-art MTS platforms, parsing and language modeling tools, available corpora have been discussed in Sect. 5. Section 6 highlights the role of Neural Networks in Machine Translation with some latest examples of MT systems based on NMT approach and Sect. 7 depicts MT evaluation methods and platforms with their characteristics. Section 8 provides research avenues generated from this work and recommendation for new researchers. Finally the concluding notes are given in Sect. 9.
2 Evolution of MTS
2.1 Evolution of MTS in general
Machine translation history had started in the 17th century when Discartes and Leibniz proposed the concept of mechanical dictionaries based on the method of universal numerical codes. But the actual proposal for the machine translation came in the 20th century. Figure 9 shows the development of machine translation in five phases in general (Hutchins 1995; Hutchins and Somers 1992).
2.2 MTS development in Indian perspective
The MTS development for Indian languages has started in 1990s and Fig. 10 shows various MTS developed for English, Hindi and Sanskrit languages based on different approaches.
The domain, efficiency, features and the research group associated with these MTS is explained in Sect. 4. Initially due to non-availability of online corpus for Indian languages compared to other languages, DMT and RBMT approaches have been used for developing MTS among Indian languages, although some CBMT based MTS for English to Indian languages or Indian to English language translation have also been developed. In 2003 the hybridization of different approaches have started for developing MTS. From 2009 to 2014 RBMT approach has been used extensively for MTS development. In the duration from 2016 to now the graph of CBMT increases due to the application of NMT approach in MTS. The hybrid approach was also used in parallel to RBMT and CBMT in a few MT systems during the same time. In hybridization, Artificial Neural Network (ANN) and Quantum Neural Network (QNN) techniques outperform compare to other combinations. RBMT approach dominates other approaches in Indian MT development scenario.
3 Survey process
The approach used for survey in this article follows the guidelines given in Budgen and Brereton (2006), Kitchenham et al. (2009), Moher et al. (2015). The different stages involved in the survey process are planning, execution, analysis of results, documentation of results and highlighting the research gaps. The planning of survey includes the creation of an effective research question framework as shown in Table 2, sources of articles as discussed in Sect. 3.1. Execution of survey includes criteria for searching the article as shown in Table 3, inclusion or exclusion criteria of articles in the survey.
3.1 Information sources
A broad perspective is essential for broad coverage of literature as suggested by Kitchenham et al. (2009) and Budgen and Brereton (2006). So the following electronic sources were used for searching the relevant articles for the survey:
-
“Google Scholar (https://scholar.google.co.in/)”
-
“IEEE Explorer (ieeexplore.ieee.org/)”
-
“ACM Digital Library (dl.acm.org/)”
-
“Science Direct (https://www.sciencedirect.com/)”
-
“Springer (www.springerlink.com)”
-
“ACL(https://www.aclweb.org/)”
3.2 Searching criteria
All the articles searched over electronic sources include the token” Machine Translation” which makes the process of searching relevant articles a time-consuming and challenging, as these articles are vast in numbers. So, a search strategy is needed to include as many related articles as possible with ease and in less time. One such approach is presented in Table 3, but still, some of the right papers might not be added to this survey, a reason may be due to missing such keywords into the abstract part. The work on MT for Indian languages started in the 90s, and the current survey includes articles from different sources like journals, conferences, workshops, seminars, technical reports, and symposiums from 1990 to Feb 2021.
3.3 Inclusion/exclusion criteria
The process of including or excluding the article in the current survey is shown in Fig. 11. In the first phase, the exclusion of articles has been done based on the title of the article. The exclusion percentage in this stage was 28%. In Phase-2, 1057 articles are separated from the original 1500 article database, and after studying their abstracts, only 410 articles are selected for the next phase based on their relevance to the field of machine translation. In Phase-3, after reviewing the full text of 410 articles only 220 are moved to the next phase, and rest are excluded. In Phase-4, the exclusion is done based on the MT for English, Hindi and Sanskrit languages and finally, 118 articles are included for the current survey.
4 Results and discussion
This article examines the existing literature in the field of MT based on the research questions as per Table 2 and finds out the solutions to these questions as the outcome. Out of 118 articles, 45% are available in Journals, and 55% are published in conferences, workshops, Summits, Lecture Series and Technical Reports. The following sub-sections give an outcome-based analysis of various MTS and further examined based on approach, domain, and development year.
4.1 Machine translation system for Hindi and Sanskrit languages
Hindi and Sanskrit both belong to the Indo-Aryan language family which is a subgroup of the Indo-European language family. Both the languages are free word order and different from English which follows Subject–Verb–Object (SVO) word order. Hindi and Sanskrit both use the Devanagari script and shares many common features with each other.
Sanskrit is one of the oldest languages in the world and has been treated as a holy language in India. In the past, it was the language of educated people and used as a major language in communication, literature, education, administrative documents, and spiritual activities. The treasure of Sanskrit includes not only scientific, mathematical, philosophical, medical, poetry, and religious information but also India’s spiritual as well as cultural aspects. Several languages have emerged from Sanskrit including Indian as well as foreign languages. The Sanskrit users have decreased gradually with time. Recently the Indian government and some non-governmental agencies have started to promote the Sanskrit language so that more people can be associated with this beautiful, spiritual, and most powerful language of the world. Several efforts have been made in developing Sanskrit language MTS all around the world. Based on Panini grammar several tools for Sanskrit language analysis, parsing, and generation tools have been developed by different research groups. Special Center for Sanskrit Studies at Jawaharlal Nehru University (Prof. Girish Nath Jha) New Delhi, University of Hyderabad (Dr. Amba Kulkarni), IIT Bombay (Prof. Pushpak Bhattacharya), IIT Kanpur (Prof. RMK Sinha and Pawan Goyal), Banaras Hindu University Banaras have been the core places for Sanskrit language processing tools development.
Hindi is regarded as the fourth most spoken language in the world and is also morphological rich (Lane 2016). Different research groups have been working to develop MTS for Hindi and Sanskrit languages following various MTS approaches. Tables 4 and 5 provide an overview of such MT systems based on several criteria which include approach used, year, language pair, features, domain, and efficiency. The next section discusses these systems based on the approach used for development and suggests solutions to improve their efficiency.
4.1.1 DMT based MTS
Based on the DMT approach three MTS have been included in this survey (Dubey 2019b; Dubey et al. 2013; Goyal and Lehal 2010). The main drawbacks of these MTS were that these systems were not able to resolve the word sense ambiguities, context resolution, translation of complex sentences because in the DMT approach word to word replacement strategy is followed. These issues can be resolved either by combining DMT with other approaches or by improving the lexicon of words with more syntactic as well as semantic attributes.
4.1.2 CBMT based MTS
Four MTS based on the CBMT approach have been included for review (Jain et al. 2001; Sachdeva et al. 2014; Sinha 2004; Sinha and Thakur 2005). The problems of NER, out of corpus translation in Jain et al. (2001) were resolved by Sinha (2004) adding special modules which will handle a particular problem. This modular approach makes the system more scalable and flexible. The problem of the polysemous verb with Sinha and Thakur (2005) can be resolved either by adding a special module as done in Sinha (2004) or by using the finite-state automaton approach or enhancing the POS tagger capability to resolve the issue. The issue with Sachdeva et al. (2014) is the feature extraction from the dataset which can be resolved easily with the help of deep neural networks (LSTM, RNN, CNN). Based on NMT citepmujadia-sharma-2020-nmt, kumar2019augmented, singh2020corpus, Laskar et al. (2020) systems have been developed. Evaluation of two MTS have also been covered (Goyal and Lehal 2009) and (Dungarwal et al. 2014). Other evaluation metrics like METEOR, NIST, R-L/W/S can be applied to validate these systems.
4.1.3 RBMT based MTS
Several MTS and MT tools have been considered for review based on the RBMT approach. The MTS using UNL as Interlingua were having issues of scalability and limited rule base which can be removed by the learning and feature extraction capabilities of neural networks even without the deep knowledge of SL and TL (Singh et al. 2007). The MTS based on GB theory was able to translate only simple sentences whose capability can be enhanced by the application of minimalist approach and generating the transfer rules either using SMT or NMT (Choudhary and Singh 2009). Hindi to Sanskrit and Sanskrit to Gujarati translation systems (Bhadwal et al. 2020; Raulji and Saini 2019) have been discussed. The efficiency of Sampark MTS was enhanced with the help of Memcached technique which can be done with LSTM network models (Christopher and Rao 2010). The Shakti Standard Format (SSF) format used in the system can be applied to other MTS which involves modular approach (Bharati and Kulkarni 2009). Two MTS for Sanskrit have also been included (Aparna 2005; Upadhyay et al. 2014). Several tools have been developed to process Sanskrit text (Bhadra et al. 2009; Kulkarni 2013; Kulkarni et al. 2010; Kumar et al. 2010). One issue regarding the morphological analysis of feminine nouns was reported by the authors to the developer in 2018 and that was rectified later on by the developer (Kulkarni 2013). The issues with these tools are that these are still in the testing phase. By developing the automatic testing tools for such systems an help in finding the issues early and fix them as soon as possible.
4.1.4 HBMT based MTS
Five MTS based on HBMT approach have been included for survey (Bawa et al. 2020a,b; Goyal and Lehal 2011; Narayan et al. 2014; Sitender and Bawa 2018). Different combinations of MT approaches DMT with RBMT, QNN with RBMT and RBMT with DMT have been used for the development of these systems, respectively.
4.1.5 MTS outcomes
After studying above mentioned Hindi and Sanskrit MTS thoroughly Figure 12 shows the possible outcomes.
4.2 Machine translation system for the English language to Indian languages
Several MTS have been proposed based on different approaches for English language which is the third most spoken language worldwide (Lane 2016). This section discusses such systems based on the approach used for development followed by a tabular representation of such systems is presented in Table 6.
4.2.1 RBMT based MTS
Based on RBMT approach, various MTS have been categorized into four groups. The first group have used pseduo-interlingua code (Goyal and Sinha 2009; Jayan and Bhadran 2014; Sinha and Jain 2003; Sinha et al. 1995; Sinha 2005) and second group has used UNL intermediate code to represent the intermediate code (Dave et al. 2001; Desai et al. 2014; Sridhar et al. 2016; Udupa and Faruquie 2005). The third group has translated the source syntax tree to target syntax tree using rule base (Aasha and Ganesh 2015; Bahadur et al. 2012; Darbari 1999; Pathak and Godse 2010). The fourth group uses Panini grammar rules, Sandhi rules, root word generation, pattern generation approach for translation (Ata et al. 2007; Balyan and Chatterjee 2015; Mishra and Mishra 2012; Reddy and Hanumanthappa 2013).
The issues with these systems are small size and non-standard form of analysis as well as generation rules, scalability, limited domain, time-consuming while writing the rules. The language processing tools like stemmer, POS tagger, parser used for the Indian language part were not competent with state-of-the-art tools like Porter stemmer, Malt parser, and Stanford parser. The approach followed in Porter stemmer to form the rule base should be adopted while making the rule base which will speed up the process. Language independent parsers should be developed like Malt parser or UNL parsers for Indian languages with the application of the NMT approach to remove the scalability and domain restriction issues.
4.2.2 CBMT and HBMT based MTS
Based on the CBMT approach several MTS have been proposed and classified into three groups. The first group has used statistical models like the IBM model, Bag of Words model, SRILM language model (OCH F 2007; Sharma 2011; Udupa and Faruquie 2005; Venkatapathy and Bangalore 2009). The second group has used Hierarchical phrase-based, simple phrase-based SMT techniques to perform the translation (Ali et al. 2013; Jawaid et al. 2014; Khan et al. 2013). The third group has used the EBMT approach for translation (Badodekar 2003).One system has also used the machine learning technique for the English–Bengali question–answer system (Sheikh and Conlon 2013). The issues with these are the availability of parallel aligned corpus of sentences, the complexity of statistical techniques to form the language as well as translation models which can be resolved with the help of the NMT approach or hybridization with other approaches. Application of machine learning techniques for prediction like CRF++, LSTM, RNN. Three MTS have been included based on the HBMT approach. Bharati et al. (2003) and NCST (2008) have used RBMT with SMT, while Narayan et al. (2014) have used RBMT with QNN for translation.
4.2.3 English MTS outcomes
Based on the discussion done in the above section and Table 6, Fig. 13 shows the outcomes obtained.
4.3 Research questions vs outcome
Ten outcomes are obtained after discussing the MTS in Subsects. 4.1 and 4.2 and are tabulated in Table 7. Research Questions are denoted by O1, O2, O3, O4, O5 and Q1, Q2, Q3, Q4, Q5, Q6 are the outcomes for Hindi and Sanskrit MTS while E1, E2, E3, E4, E5 are outcomes of English MTS. A four scale mapping is done with value ‘3’ as the maximum contribution and value of ‘0’ indicates least contribution of an outcome with respect to the research questions as shown in Table 7.
5 Machine translation platforms and tools
This section gives an overview of some statistical tools, parser and corpus available online for developing new MTS and can be downloaded freely as shown in Table 8. Table 9 shows some of the popular MTS platforms which could be used for developing new MTS. Various language corpora available for Indian languages are also highlighted. Enabling Minority Language Engineering (EMILLE) contains three types of corpora such as parallel, monolingual and annotated. In parallel corpus it contains two lakhs words for Bengali, Gujarati, Hindi, Punjabi, and Urdu to English and reverses. Twenty annotated Hindi files are there in the corpus.
Gyan Nidhi corpus contains fifty thousand number of pages as a parallel corpus for each of eleven Indian languages including (Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Telugu, Tamil) and English language.
Open Source Parallel Corpus (OPUS) contains parallel corpus for Assamese, Bengali, Bhojpuri, English, Gujarati, Hindi, Kannada, Kashmiri, Konkani, Malayalam, Marathi, Oriya, Punjabi, Sanskrit, Tamil, Telugu and Urdu.
ILCI (Indian Language Corpora Initiative) contains a corpus of 50,000 parallel aligned sentences in Bangla, English, Hindi, Gujarati, Konkani, Malayalam, Marathi, Oriya, Punjabi, Urdu, Tamil, Telugu in the domain of tourism and health.
6 Role of artificial neural network in machine translation
With the explosive growth of the internet and easy access to high computing power systems, Neural Machine Translation has emerged as a fast-growing approach for developing new MTS (Cho et al. 2014; Kalchbrenner and Blunsom 2013; Sutskever et al. 2014).
The basic components of the NMT system are the encoder and decoder. It uses single neural network architecture to generate a target sentence for the input sentence, instead of using multiple small components optimized in pipeline form for obtaining translation in traditional phrase-based systems as shown in Fig. 14. Initially, the problem with NMT systems was the fixed- size vector space generated by the encoder for input sentence which was resolved by Bahdanau et al. (2014).
Different types of neural network architectures have been used for developing new MTS. Recurrent Neural Networks (RNN) are used mostly for MTS development due to their feature of preservation with the processing of input data/memorization of features of natural language. LSTM (Long Short-Term Memory) a type of RNN with two or more than two hidden layers is used for extracting features from the input text and increases the efficiency of translation (Agrawal 2017).
Machine Translation among eleven Indian languages using the NMT approach has been proposed and obtained better results than the traditional SMT approach (Agrawal 2017). Microsoft provided NMT based translation support for 21 languages and added Hindi recently (Microsoft 2017). Wu et al. (2016) also uses the NMT approach over the existing SMT approach and show better results than SMT. Facebook in 2017 proposed the implementation of NMT using Convolutional Neural Networks and claimed faster performance than the work presented by Gehring et al. (2016, 2017). Amazon has also launched its machine translation system using NMT approach (Faes 2018). Some important platforms useful for the development of NMT systems includes Tensorflow, Torch, Theano, PyTorch, Matlab, DyNet-lamtram and EUREKA are available at Zhang (2017).
7 MT evaluation methods
The MT evaluation methods are divided into two categories : Traditional Evaluation Methods and Automatic Evaluation Methods
7.1 Traditional evaluation methods
This section will highlight some of the commonly used methods of MT evaluation (Van Slype 1979) following the traditional approach.
7.1.1 Fluency test
Fluency of an MTS gives the measure of the amount with which the target text is well-formed according to the TL grammar rules. A grammatically well-formed with correct spellings, stick to the common use of terms, names, and titles which can easily be interpreted and acceptable by the native speaker of the TL is known as the fluent segment (Singh et al. 2007; Goyal 2010). The 4-point scale was used in the evaluation of the Punjabi EnConverter and DeConverter System. The fluency score using Table 10.
7.1.2 Intelligibility evaluation
It provides the measure of easiness with which the translated text can be understood by the user. In this method, a group of persons is required to read the sentences in various versions (original, human translation with and without revision, MT without and with post-editing) in such a way that a particular person is receiving only one copy of the sentences of a particular version in the group. The ranking of the sentences on a 4-point scale is shown in Table 11 (Van Slype 1979). The ranking is received from the readers, and the average is taken of all the rankings to find out the overall intelligibility rank of the translation. This approach is applied to the evaluation of the Hindi–Dogri language, Hindi to Punjabi MTS, Punjabi to Hindi MTS, SYSTRAN English–French MT system. According to Carroll (1966) the measure of intelligibility is done on a 9-point scale as shown in Table 12.
This scale is used in the evaluation of automatic translation of ALPAC system.
7.1.3 Fidelity/adequacy test
Fidelity is the measure of an amount of information correctly translated into the TL from SL. It tells about the correctness of the translation. Rating of fidelity should be less than or equal to the intelligibility ratings and is done on a 4-point scale. It has been applied to the evaluation of Hindi–Dogri MTS, Punjabi Deconverter and English–French MT produced by the SYSTRAN system in which the rank of ‘3’ means complete faithful and rank of ‘0’ means completely unfaithful.
7.2 Automatic evaluation methods
Several automatic evaluation methods have also been proposed. Some of the popular methods are included for the survey and compared based on different metrics as shown in Table 13.
7.3 MT evaluation platforms
This section provides information about evaluation platforms available to evaluate MT systems on various metrics. Three platform ORANGE, Asiya, and IQMT have been explained in Table 14.
8 Research avenues and recommendations
Although lots of work have been done in the last three decades for developing MTS with different language pairs (Indian languages) and of various domains. The emergence of the NMT approach and the easy availability of high computing resources and corpus for Indian languages has created several new opportunities for researchers to work in this field. The researchers are now more focused to apply the machine learning algorithms for text processing rather than other fields and as a result, several new tools and platforms are available for text processing. It is a very difficult and time-consuming process to create the rule base which will cover all the aspects of the language specifically for Hindi and Sanskrit languages which are highly inflected and morphological rich in nature. To apply the SMT approach the need for a large corpus is again a big hurdle for languages like Sanskrit. The following are some of the research avenues with which the researchers can start their research work:
-
Developing POS tagger or stemmer for Hindi and Sanskrit languages using a hybrid approach of rule base and machine learning techniques.
-
Developing automatic Karaka Analyzer (case marker) for Sanskrit and Hindi by making use of the similarity features among Indian languages in such a way that only a small effort is required to make this system for other Indian languages.
-
Developing a platform like Snowball (http://snowball.tartarus.org) for creating the rule base in an easy and fast manner.
-
Creating small modules which can enhance the performance or reduce the response time of the existing MTS like the Named Entity Recognition (NER) tool, automatic pre- or post-processing tools using machine learning techniques.
-
Anaphora or Catphora resolution is still a challenging task for the Sanskrit language. So, special modules can be developed for such types of problems which can be easily merged with the MTS adopting modular approach.
-
For MTS using UNL as an interlingua approach, the resolution of UNL relation is a challenging area because it requires thousands of rules to resolve all the 56 UNL relations (Le Thuyen and Hung 2016). So, machine learning approaches can be used over the UNL dictionary to predict the possible relations with the Case marker module.
-
Development of the Sanskrit Deconverter using UNL is still an open area of research.
-
Development of Operating Systems for computers using less ambiguous language like Sanskrit.
-
Developing tools to extract text from scanned images and develop digital corpus for languages like Sanskrit and Punjabi.
Based on the discussions done in Sects. 4.1, and 4.2 and the outcomes shown in Figs. 12, 13 on various MTS the following recommendations are derived for researchers working in field of machine translation:
-
The application of any architecture (approach) to develop new MTS depends on various parameters like language pair, availability of linguistic resources for the language pair, the application domain of MTS, linguistic knowledge.
-
SMT approach performs better for long sentence translation and DMT gives better results for short length sentences.
-
Maximum utilization of similarity feature at syntax level or semantic level among Indian languages such as noun, verb, declension, prefix, Karka Analysis for case identification, word formation, and word order, etc. should be done for developing MTS among Indian Languages.
-
Interlingua approach needs fewer efforts for developing multilingual MT systems like Anglabharti, Anubharti, UNL based MTS, and Sampark. So, Interlingua representation like of pseudo-Interlingua, UNL expressions, or an intermediate representation of Sanskrit language as Interlingua could be used efficiently for developing new MTS, and less effort is required for new language translator development.
-
Panini Grammar is one of the most unambiguous grammars ever developed for a natural language and written in a more structured manner for Indian languages. Panini principles will help to develop new MTS for Indian Languages based on the RBMT or HBMT approach.
-
RBMT systems require deep linguistic knowledge of the source as well as the target language and are a time-consuming process although the quality of translation using RBMT is better than other approaches.
-
Use of statistical tools like Moses’ toolkit, Giza + + , IRSTLM, SRILM makes the developing process much faster than other systems but requires a large amount of parallel corpus in digital format, so applicable only for language pairs having large corpus availability in digital form.
-
Google and Microsoft have used deep neural networks over the SMT approach and proved that the Neural Machine Translation approach performs much better than SMT and even requires fewer amounts of data for training, but requires large computational power to train such systems.
-
For Sanskrit Language, various part of speech taggers is available like BIS POS, JPOS (JNU), CPOS, IL POS (Indian Language), and Gerard Huet Parser, Constraint-Based Parser, Deterministic Parser of Amba Kulkarni, and Indic NLP Library could be used to develop Sanskrit Based MTS.
-
For English Language Stanford Parser is efficient enough to give the analysis of the English Language.
-
The availability of wordnet for English, Hindi and Punjabi and Punjabi makes the translation task easier and less time- consuming. The shallow parser available on the TDIL website could be used for Indian Languages.
-
The fastest way of developing MTS is by using the DMT approach, and the quality of translation is also good but limited to a small domain and requires bilingual dictionaries and a small number of transfer rules like in Sampark MTS.
The Hindi and Sanskrit languages have used the traditional methods of MT evaluation which include Fluency Test, Intelligibility Test, and Fidelity Test. Most of these tests depend on human evaluation but the application of the NMT approach be easily applied to them also. In the case of automatic evaluation methods, the BLEU and METEOR score has become the common standards for MT evaluation. For English to Indian language MTS the BLEU, NIST, and METEOR have been used by the developers.
9 Conclusion
This article presents an outcome-based systematic survey of machine translation for English, Hindi, and Sanskrit languages. Out of 1500 research articles, 118 articles have been included in this survey based on the Inclusion-Exclusion criteria mentioned in Subsect. 3.3. The results of the survey are presented in different dimensions like MT Evolution, MT approaches, mapping research questions with outcomes, overview of MTS based on several criteria (approach, language pair, domain, efficiency, features), state-of-the art-MT tool-kits, technological enhancement in MT approach, MT evaluation methods and platforms. The latest trends in MTS development are based on neural networks and provides human-like translation quality as seen in Hassan et al. (2018). Also, it is still not feasible for languages like Sanskrit to develop an efficient MTS and apply SMT or NMT approach due to non-availability of corpus and complexity of the language. State-of-the-art MTS platforms with MT development tools and corpus have also been discussed. State-of-the-art MT evaluation methods and platforms with specific features have been explored in this survey. Several research avenues have been highlighted in this survey work for further research in machine translation. Future recommendations have also been included to help researchers to develop new MT or enhance existing MT development.
References
Aasha V, Ganesh A (2015) Machine translation from English to Malayalam using transfer approach. In: Proceedings of international conference on advances in computing, communications and informatics (ICACCI), pp 1565–1570
Agrawal R (2017) Towards efficient neural machine translation for indian languages. PhD thesis, International Institute of Inforsmation Technology, Hyderabad
Ali A, Hussain A, Malik MK (2013) Model for English-Urdu statistical machine translation. World Appl Sci 24:1362–1367
Allen J (1995) Natural language understanding, 2nd edn. Pearson, London
Ambati BR, Deoskar T, Steedman M (2018) Hindi ccgbank: A ccg treebank from the Hindi dependency treebank. Lang Resour Eval 52(1):67–100
Antony P (2013) Machine translation approaches and survey for indian languages. Int J Comput Linguist Chin Lang Process 18(1):47–78
Aparna S (2005) Sanskrit to English translator. Lang India 5:1
Ata N, Jawaid B, Kamaran A (2007) Rule based English to Urdu machine translation. In: Proceedings of conference on language and technology, pp 1–7
Badodekar S (2003) Translation resources, services and tools for indian languages. Computer Science and Engineering Department, Indisan Institute of Technology, Mumbai
Bahadur P, Jain A, Chauhan D (2012) Etrans—a complete framework for English to Sanskrit machine translation. In: Proceedings of international conference and workshop on emerging trends in technology in international journal of advanced computer science and applications (IJACSA), Citeseer, pp 52–59
Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:14090473
Baker P, Hardie A, McEnery T, Cunningham H, Gaizauskas RJ (2002) Emille, a 67-million word corpus of Indic languages: data collection, markup and harmonisation. In: Proceedings of the international conference on language resources and evaluation (LREC), pp 819–825
Balyan R, Chatterjee N (2015) Translating noun compounds using semantic relations. Comput Speech Lang 32(1):91–108
Bawa S et al (2020a) A sanskrit-to-english machine translation using hybridization of direct and rule-based approach. Neural Comput Appl 33:2819–2838
Bawa S et al (2020b) Sanskrit to universal networking language enconverter system based on deep learning and context-free grammar. Multimedia Syst 1–17
Bhadra M, Singh SK, Kumar S, Agrawal M, Chandrasekhar R, Mishra SK, Jha GN et al (2009) Sanskrit analysis system (sas). In: In Kulkarni A., Huet G. (eds) Sanskrit computational linguistics. ISCLS 2009. Lecture notes in computer science. Springer, pp 116–133
Bhadwal N, Agrawal P, Madaan V (2020) A machine translation system from Hindi to Sanskrit language using rule based ap- proach. Scalable Comput Practice Experience 21(3):543–554
Bharati A, Kulkarni A (2009) Anusaaraka: An accessor cum machine translator. Department of Sanskrit Studies, University of Hyderabad, Hyderabad, pp 1–75
Bharati RM, Reddy P, Sankar B, Sharma D, Sangal R (2003) Machine translation: the Shakti approach. In: Proceedings of international conference on natural language processing (ICON-2003)
Budgen D, Brereton P (2006) Performing systematic literature reviews in software engineering. In: Proceedings of the 28th international conference on software engineering, ACM, pp 1051–1052
Carroll JB (1966) An experiment in evaluating the quality of translations. Mech Transl Comp Linguistics 9(3–4):55–66
Cho K, Van Merriënboer B, Bahdanau D, Bengio Y (2014) On the properties of neural machine translation: encoder-decoder approaches. arXiv preprint arXiv:14091259
Choudhary A, Singh M (2009) Gb theory based Hindi to English translation system. In: Proceedings of 2nd IEEE international conference on computer science and information technology (ICCSIT-2009). IEEE, pp 293–297
Christopher M, Rao UM (2010) IL-ilmt sampark: a hybrid machine translation system. In: Proceedings of 32nd all India conference of linguistics (AICL32). Lucknow University, Lucknow, pp 69–75
Darbari H (1999) Computer-assisted translation system—an Indian perspective. Machine Translation Summit VII, 13–17 September, pp 80–85
Dave S, Parikh J, Bhattacharyya P (2001) Interlingua-based English–Hindi machine translation and language divergence. Mach Transl 16(4):251–304
Desai NP, Dabhi VK (2021) Taxonomic survey of Hindi language nlp systems. arXiv preprint arXiv:210200214
Desai P, Sangodkar A, Damani OP (2014) A domain-restricted, rule based, English-Hindi machine translation system based on dependency parsing. In: Proceedings of the 11th international conference on natural language processing, pp 177–185
Devlin J, Zbib R, Huang Z, Lamar T, Schwartz R, Makhoul J (2014) Fast and robust neural network joint models for statistical machine translation. In: Proceedings of the 52nd annual meeting of the association for computational linguistics (volume 1: long papers), vol 1, pp 1370–1380
Dorr BJ, Hovy EH, Levin LS (2004) Natural language processing and machine translation encyclopedia of language and linguistics, (ell2). Machine translation: interlingual methods. In: Proceeding of international conference of the world congress on engineering, pp 1–20
Dubey P (2019a) The Hindi to Dogri machine translation system: grammatical perspective. Int J Inf Technol 11(1):171–182
Dubey P (2019b) The Hindi to Dogri machine translation system: grammatical perspective. Int J Inf Technol 11:1–12
Dubey P et al. (2013) Machine translation system for Hindi-Dogri language pair. In: Proceedings of international conference on machine intelligence and research advancement (ICMIRA-2013), IEEE, pp 422–425
Dungarwal P, Chatterjee R, Mishra A, Kunchukuttan A, Shah R, Bhattacharyya P (2014) The IIT Bombay Hindi-English translation system at wmt 2014. In: Proceedings of the ninth workshop on statistical machine translation, association for computational linguistics, pp 90–96
Echizen-Ya H, Araki K, Momouchi Y, Tochinai K (2004) Machine translation using recursive chain-link-type learning based on translation examples. Syst Comput Jpn 35(2):1–15
Faes F (2018) Amazon and lion bridge share stage to market neural machine translation. https://slator.com/technology/amazon-and-lionbridge-share-stage-to-market-neural-machine-translation/
Federico M, Bertoldi N, Cettolo M (2008) Irstlm: an open source toolkit for handling large scale language models. In: Proceedings of ninth annual conference of the international speech communication association, pp 1618–1621. https://github.com/irstlm-team/irstlm
Forcada ML, Ginestı-Rosell M, Nordfalk J, O’Regan J, Ortiz-Rojas S, Perez-Ortiz JA, Sanchez-Martınez F, Ramırez-Sanchez G, Tyers FM (2011) Apertium: a free/open-source platform for rule-based machine translation. Mach Transl 25(2):127–144
Fromkin V, Rodman R, Hyams V (2011) An introduction to language, 9e. Wadsworth, Cengage Learning, Boston, MA
Garje G, Kharate G (2013) Survey of machine translation systems in India. Int J Nat Lang Comput (IJNLC) 2(4):47–67
Gehring J, Auli M, Grangier D, Dauphin YN (2016) A convolutional encoder model for neural machine translation. arXiv preprint arXiv:161102344
Gehring J, Auli M, Grangier D, Yarats D, Dauphin YN (2017) Convolutional sequence to sequence learning. arXiv preprint arXiv:170503122, pp 1–15
Gimenez J, Amig E (2006) Iqmt: a framework for automatic machine translation evaluation. In: Proceedings of the international conference on language resources and evaluation (LREC), pp 685–690. http://www.lrec-conf.org/proceedings/lrec2006/
Giménez J, Márquez L (2010) Asiya: an open toolkit for automatic machine translation (meta-) evaluation. Prague Bull Math Linguist 94:77
Gopal M, Jha GN (2011) Tagging Sanskrit corpus using bis pos tagset. In: Information systems for Indian languages, Springer, pp 191–194
Goyal V (2010) Development of a Hindi to Punjabi machine translation system
Goyal V, Lehal GS (2009) Evaluation of Hindi to Punjabi machine translation system. arXiv preprint arXiv:09101868
Goyal V, Lehal GS (2010) Web based hindi to punjabi machine translation system. J Emerg Technol Web Intellig 2(2):148–151
Goyal V, Lehal GS (2011) Hindi to Punjabi machine translation system. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies: systems demonstrations. Association for computational linguistics, pp 1–6
Goyal P, Sinha RMK (2009) A study towards design of an English to Sanskrit machine translation system. In: Sanskrit computational linguistics, Springer, pp 287–305
Hassan H, Aue A, Chen C, Chowdhary V, Clark J, Federmann C, Huang X, Junczys-Dowmunt M, Lewis W, Li M, et al. (2018) Achieving human parity on automatic Chinese to English news translation. arXiv preprint arXiv:180305567
Hutchins WJ (1995) Machine translation: a brief history. Concise history of the language sciences: from the Sumerians to the cognitivists, pp 431–445
Hutchins WJ, Somers HL (1992) An introduction to machine translation, vol 362. Academic Press, London
Hyderabad I (2018) Machine translation and natural language processing lab. http://ltrc.iiit.ac.in/
Jain R, Sinha R, Jain A (2001) Anubharti-using hybrid example-based approach for machine translation. In: Proceedings of symposium on translation support systems (STRANS-2001), IIT, Kanpur, pp 20–32
Jawaid B, Kamran A, Bojar O (2014) English to Urdu statistical machine translation: establishing a baseline. In: Proceedings of the fifth workshop on South and Southeast Asian natural language processing, pp 37–42
Jayan V, Bhadran V (2014) Anglabharati to anglamalayalam: an experience with English to Indian language machine translation. In: Proceedings of international conference on contemporary computing and informatics (IC3I), pp 282–287
Jha GN (2010) The tdil program and the Indian language corpora initiative (ilci). In: Chair NCC, Choukri K, Maegaard B, Mariani J, Odijk J, Piperidis S, Rosner M, Tapias D (eds) Proceedings of the international conference on language resources and evaluation (LREC), European Language Resources Association (ELRA), Valletta, Malta, pp 982–985. http:// sanskrit.jnu.ac.in/ilci/index.jsp
Jung H, Yuh S, Kim T, Park S (1999) A pattern-based approach using compound unit recognition and its hybridization with rule-based translation. Comput Intell 15(2):114–127
Kalchbrenner N, Blunsom P (2013) Recurrent continuous translation models. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 1700–1709
Kelly C (2021) Tab-delimited bilingual sentence pairs from the tatoeba project (good for anki and similar flashcard applications). https://www.manythings.org/anki/
Khan S, Usman I (2019) A model for English to Urdu and Hindi machine translation system using translation rules and artificial neural network. Int Arab J Inf Technol 16(1):125–131
Khan N, Waqas A, Bajwa U, Durrani N (2013) English to urdu hierarchical phrase-based statistical machine translation. In: Proceedings of international joint conference on natural language processing, Japan, pp 72–76
Kitchenham B, Brereton OP, Budgen D, Turner M, Bailey J, Linkman S (2009) Systematic literature reviews in software engineering—a systematic literature review. Inf Softw Technol 51(1):7–15
Klein G, Kim Y, Deng Y, Senellart J, Rush AM (2017) Opennmt: open-source toolkit for neural machine translation. arXiv preprint arXiv:170102810
Koehn P (2009) Moses–statistical machine translation system
Kulkarni A (2013) A deterministic dependency parser with dynamic programming for Sanskrit. In: Proceedings of the second international conference on dependency linguistics (DepLing 2013), pp 157–166
Kulkarni A, Pokar S, Shukl D (2010) Designing a constraint based parser for Sanskrit. In: Sanskrit computational linguistics. Springer, pp 70–90
Kumar A, Mittal V, Kulkarni A (2010) Sanskrit compound processor. In: Sanskrit computational linguistics, Springer, pp 57–69
Kumar R, Jha P, Sahula V (2019) An augmented translation technique for low resource language pair: Sanskrit to Hindi translation. In: Proceedings of the 2019 2nd international conference on algorithms, computing and artificial intelligence, pp 377–383
Lane J (2016) The 10 most spoken languages in the world. URL https://www.babbel.com/en/magazine/the-10-most-spoken-languages-in-the-world/
Laskar SR, Khilji AFUR, Pakray P, Bandyopadhyay S (2020) Multimodal neural machine translation for English to Hindi. In: Proceedings of the 7th workshop on Asian translation, pp 109–113
Le Thuyen PT, Hung VT (2016) Automatic translation for vietnamese based on unl language. In: 2016 international conference on electronics, information, and communications (ICEIC), IEEE, pp 1–5
Lewis MP, Simons GF, Fennig CD (2015) Ethnologue: languages of Ecuador. SIL International, Texas
Lin CY, Och FJ (2004) Orange: a method for evaluating automatic evaluation metrics for machine translation. In: Proceedings of the 20th international conference on computational linguistics, association for computational linguistics, pp 501–507
Luong MT, Manning CD (2015) Stanford neural machine translation systems for spoken language domains. In: Proceedings of the international workshop on spoken language translation, pp 76–79
Mallikarjun B (2010) Patterns of Indian multilingualism. Strength Today Bright Hope Tomorrow 10(6):1–18
Mathur P, Shah R, Sawhney R, Mahata D (2018) Detecting offensive tweets in Hindi-English code-switched language. In: Proceedings of the sixth international workshop on natural language processing for social media, pp 18–26
Microsoft (2016) Microsoft translator launching neural network based translations for all its speech languages. https://blogs.msdn.microsoft.com/translation/2016/11/15/microsoft-translator-launching-neural-network-based-translations-for-all-its-speech-languages/
Microsoft (2017) Microsoft translator accelerates use of neural networks across its offerings. https://blogs.msdn.microsoft.com/translation/2017/11/15/microsoft-translator-accelerates-use-of-neural-networks-across-its
Mishra V, Mishra R (2008) Study of example based english to sanskrit machine translation. J Res Dev Comp Sci Eng 37:1–12
Mishra V, Mishra R (2009) Ann and rule based model for English to Sanskrit machine translation. INFOCOMP J Comput Sci 9(1):80–89
Mishra V, Mishra R (2012) English to sanskrit machine translation system: a rule-based approach. Int J Adv Intellig Paradig 4(2):168–184
Mishra H, Chakrawarti RK, Bansal P (2019) Implementation of Hindi to English idiom translation system. In: International conference on advanced computing networking and informatics, Springer, pp 371–380
Moher D, Shamseer L, Clarke M, Ghersi D, Liberati A, Petticrew M, Shekelle P, Stewart LA (2015) Preferred reporting items for systematic review and meta-analysis protocols (prisma-p) 2015 statement. Syst Rev 4(1):1–9
Mujadia V, Sharma DM (2020) Nmt based similar language translation for Hindi-Marathi. In: Proceedings of the fifth conference on machine translation, pp 414–417
Narayana V (1994) Anusarak: a device to overcome the language barrier. PhD thesis, Ph. D. thesis, Dept. of CSE, IIT Kanpur
Narayan R, Singh V, Chakraverty S (2014) Quantum neural network based machine translator for Hindi to English. Sci World J 2014:1–8
Naskar S, Bandyopadhyay S (2005) Use of machine translation in india: Current status. AAMT J 36:25–31
NCST (2008) Matra: an English to Hindi machine translation system. Tech. rep., NCST MUMBAI
Nivre J, Hall J, Nilsson J, Chanev A, Eryigit G, Kübler S, Marinov S, Marsi E (2007) Maltparser: a language-independent system for data-driven dependency parsing. Nat Lang Eng 13(2):95–135
OCHF (2007) Google translator. In: Proceedings of joint conference on empirical methods in natural language processing and computational natural language learning, association for computational linguistics, Prague, pp 858–867
Pandey RK, Jha GN (2016) Error analysis of sahit-a statistical Sanskrit-Hindi translator. Proc Comp Sci 96:495–501
Pathak G, Godse S (2010) English to Sanskrit machine translation using transfer approach. In: Proceedings of international conference on methods and models in science and technology. American Institute of Physics, Pune, pp 122–126
Phillips AB (2011) Cunei: open-source machine translation with relevance-based models of each translation instance. Mach Transl 25(2):161–177
Post M, Cao Y, Kumar G (2015) Joshua 6: a phrase-based and hierarchical statistical machine translation system. Prague Bull Math Linguist 104(1):5–16
Pune C (2018) Indian language technology proliferation and development centre. http://tdil-dc.in/index.php?lang=en
Rajan R, Sivan R, Ravindran R, Soman K (2009) Rule based machine translation from English to Malayalam. In: Proceedings of international conference on advances in computing, control, & telecommunication technologies, 2009. ACT’09, IEEE, pp 439–441
Rao DD (1998) Machine translation. Resonance 3(7):61–70
Raulji JK, Saini JR (2019) Sanskrit-Gujarati constituency mapper for machine translation system. In: 2019 IEEE Bombay section signature conference (IBSSC), IEEE, pp 1–8
Reddy MV, Hanumanthappa M (2013) Indic language machine translation tool: English to Kannada/Telugu. In: Multimedia processing, communication and computing applications, Springer, pp 35–49
Rosenfeld R, Clarkson P (1997) Cmu-cambridge statistical language modeling toolkit v2
Sachdeva K, Srivastava R, Jain S, Sharma DM (2014) Hindi to English machine translation: using effective selection in multi-model smt. In: Proceedings of the international conference on language resources and evaluation (LREC), pp 1807–1811
Saha GK (2005) The ebanubad translator: a hybrid scheme. J Zhejiang Univ Sci A 6(10):1047–1050
Seasly J (2003) Machine translation: a survey of approaches. University of Michigan, Ann Arbor
Shahnawaz A, Mishra R (2011) Translation rules and ann based model for English to Urdu machine translation. INFOCOMP J Comput Sci 10(3):25–35
Shahnawaz A, Mishra R (2015) An English to Urdu translation model based on cbr, ann and translation rules. Int J Adv Intellig Paradigms 7(1):1–23
Sharma N (2011) English to Hindi statistical machine translation system. PhD thesis, M. Tech. thesis, Thapar University Patiala
Sheikh M, Conlon S (2013) Application of machine translation in bilingual knowledge management. Int J Intercult Inf Manage 3(2):123–137
Singh S, Dalal M, Vachani V, Bhattacharyya P, Damani OP (2007) Hindi generation from interlingua. In: Proceedings of machine translation summit, pp 1–8
Singh M, Kumar R, Chana I (2019) Ga-based machine translation system for Sanskrit to Hindi language. In: Recent trends in communication, computing, and electronics, Springer, pp 419–427
Singh M, Kumar R, Chana I (2020) Corpus based machine translation system with deep neural network for Sanskrit to Hindi translation. Proc Comp Sci 167:2534–2544
Sinha R, Jain A (2003) Anglahindi: an English to Hindi machine-aided translation system. In: Proceedings of MT Summit IX, New Orleans, USA, pp 494–497
Sinha RMK (2004) An engineering perspective of machine translation: anglabharti-ii and anubharti-ii architectures. In: Proceedings of international symposium on machine translation, NLP and translation support system (iSTRANS-2004), pp 10–17
Sinha RMK (2005) Integrating cat and mt in anglabharti-ii architecture. In: Proceedings of the 10th European association for machine translation (EAMT) conference, pp 235–244
Sinha RMK, Thakur A (2005) Machine translation of bi-lingual Hindi-English (hinglish) text. In: Proceedings of the 10th machine translation summit (MT Summit X), Phuket, Thailand, pp 149–156
Sinha R, Ivaraman K, Agrawal A, Jain R, Srivastava R, Jain A et al (1995) Anglabharti: a multilingual machine aided translation project on translation from English to Indian languages. In: Proceedings of IEEE international conference on systems, man and cybernetics. Intelligent systems for the 21st century, IEEE, vol 2, pp 1609–1614
Sitender, Bawa S (2018) Sansunl: a Sanskrit to unl enconverter system. IETE J Res 1–12. https://doi.org/10.1080/03772063.2018.1528187
Slocum J (1985) A survey of machine translation: its history, current status, and future prospects. Comput Linguist 11(1):1–17
Sridhar R, Sethuraman P, Krishnakumar K (2016) English to Tamil machine translation system using universal networking language. Sādhanā 41(6):607–620
Stolcke A (2002) Srilm—an extensible language modeling toolkit. In: Proceedings of seventh international conference on spoken language processing, pp 1–4. http://www.speech.sri.com/projects/srilm/
Sutskever I, Vinyals O, Le QV (2014) Sequence to sequence learning with neural networks. In: Proceedings of advances in neural information processing systems, pp 3104–3112
Tiedemann J (2009) News from opus—a collection of multilingual parallel corpora with tools and interfaces. In: Recent advances in natural language processing, vol 5, pp 237–248. http://opus.nlpl.eu/
Udupa R, Faruquie TA (2005) An English-Hindi statistical machine translation system. In: Natural language processing–IJCNLP 2004, Springer, pp 254–262
Upadhyay P, Jaiswal UC, Ashish K (2014) Transish: translator from Sanskrit to English—a rule based machine translation. Int J Curr Eng Technol E-ISSN, pp 2277–4106
Van Slype G (1979) Critical study of methods for evaluating the quality of machine translation. Prepared for the Commission of European Communities directorate general scientific and technical information and information management report BR 19142
Vaswani A, Zhao Y, Fossum V, Chiang D (2013) Decoding with large-scale neural language models improves translation. In: Proceedings of the 2013 conference on empirical methods in natural language processing, pp 1387–1392
Venkatapathy S, Bangalore S (2009) Discriminative machine translation using global lexical selection. ACM Trans Asian Lang Inf Process (TALIP) 8(2):8
Wu Y, Schuster M, Chen Z, Le QV, Norouzi M, Macherey W, Krikun M, Cao Y, Gao Q, Macherey K et al (2016) Google’s neural machine translation system: bridging the gap between human and machine translation. arXiv preprint arXiv:160908144
Yandex (2017) Yandex blog. https://yandex.com/company/blog/one-model-is-better-than-two-yu-yandex-translate-launches-a-hybrid-machine-translation-system/
Zhang M (2017) History and frontier of the neural machine translation
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
We have no conflicts of interest to disclose.
Human and animal rights
This article does not contain any studies with animals performed by any of the authors. This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Sitender, Bawa, S., Kumar, M. et al. A comprehensive survey on machine translation for English, Hindi and Sanskrit languages. J Ambient Intell Human Comput 14, 3441–3474 (2023). https://doi.org/10.1007/s12652-021-03479-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12652-021-03479-0