Sarvam AI Launches Sarvam 1, an LLM Trained in English and Ten Indic Languages
On October 24, Sarvam AI, an artificial intelligence (AI) firm backed by Lightspeed, unveiled Sarvam 1, a large language model (LLM). In a post on X (formerly Twitter), the company said it is India's first indigenous multilingual LLM, trained from scratch on domestic AI infrastructure in ten Indian languages and English.
Sarvam 1 supports English and ten major Indian languages: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi, Oriya, Punjabi, Tamil, and Telugu. The model has two billion parameters and was trained on Nvidia's H100 Graphics Processing Units (GPUs).
Sarvam AI Uses Nvidia Services and AI4Bharat's Open-Source Technology
To optimise and deploy conversational AI agents with sub-second latency, Sarvam AI also draws on a range of Nvidia products and services, including its microservices, conversational AI tooling, LLM software, and inference server.
Beyond Nvidia, the model was built with AI4Bharat's open-source technology and language resources, while Yotta's data centres supplied the computational infrastructure. According to a blog post by the AI startup, Sarvam 1's strong performance and computational efficiency make it especially well suited for real-world applications, including deployment on edge devices.
Specifically, the company stated that Sarvam 1 clearly beats Gemma-2-2B and Llama-3.2-3B on a number of standard benchmarks, such as MMLU, ARC-Challenge, and IndicGenBench, while achieving results comparable to Llama 3.1 8B.
How the Company's LLMs Have Evolved
The AI firm introduced OpenHathi, India's first Hindi LLM, in December 2023. That model was built on Meta AI's Llama2-7B architecture, with its tokeniser vocabulary extended to 48,000 tokens. Sarvam 1, however, was developed on a training corpus of two trillion tokens.
That corpus comprises two trillion tokens of synthetic Indic data, produced by a custom data pipeline that can generate diverse, high-quality text while preserving factual accuracy; the model also relies on an efficient tokeniser. Sarvam claimed the latest model from its stable matches or surpasses much larger models such as Llama 3.1 8B while being four to six times faster during inference.
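The tokeniser matters here because each generated token costs one forward pass: an Indic-optimised tokeniser that splits a Hindi word into fewer pieces needs fewer generation steps, which is the usual mechanism behind speed-ups of this kind. Below is a rough sketch of how one might compare tokeniser efficiency (fertility, i.e. tokens per word); both Hugging Face model ids are assumptions used for illustration, and the Llama repository is gated.

```python
# Rough sketch: comparing tokeniser fertility (tokens per word) on Hindi
# text. Both model ids are illustrative assumptions; the Llama weights
# are gated and require accepting Meta's licence on huggingface.co.
from transformers import AutoTokenizer

text = "भारत एक विशाल और विविधतापूर्ण देश है।"  # sample Hindi sentence

for model_id in ("sarvamai/sarvam-1", "meta-llama/Llama-3.1-8B"):
    tok = AutoTokenizer.from_pretrained(model_id)
    n_tokens = len(tok.encode(text, add_special_tokens=False))
    n_words = len(text.split())
    print(f"{model_id}: {n_tokens / n_words:.2f} tokens per word")
```

A lower tokens-per-word figure on Indic scripts means both cheaper training per document and fewer autoregressive steps at generation time.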
In artificial intelligence, inference is the process by which a trained model makes predictions on new data using the patterns it learned during training. Compared with existing Indic datasets, the company's pretraining corpus, Sarvam-2T, contains eight times as much scientific material, documents that are twice as long, and three times higher data quality. In total, Sarvam-2T holds around two trillion Indic tokens. Hindi accounts for just over 20% of the data, with the rest distributed nearly evenly across the other supported languages.
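To make the notion of inference concrete, here is a minimal sketch of generating text with an open-weights two-billion-parameter model via the Hugging Face transformers library; the model id sarvamai/sarvam-1 is an assumption based on the public release and should be verified before use.

```python
# Minimal sketch: greedy text generation (inference) with a 2B-parameter
# causal LM via Hugging Face transformers. The model id below is an
# assumption based on Sarvam's public release; verify it on huggingface.co.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sarvamai/sarvam-1"  # assumed Hugging Face id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision keeps the 2B model compact
    device_map="auto",           # place weights on a GPU if one is available
)

# A Hindi prompt; the model is trained on ten Indic languages plus English.
prompt = "भारत की राजधानी"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```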