
May many large language models (LLMs) BLOOM

  • Writer: Manoj Bapat
  • Sep 24, 2022
  • 1 min read



Main idea:

BigScience Large Open-science Open-access Multilingual Language Model (BLOOM) is a transformer-based language model released in July 2022. It was developed over the course of a year by more than 1,000 researchers from 60 countries and over 250 institutions. The project is inspired by scientific collaborations such as CERN that aim to make their research open and available to the global research community.


How big is it, and what is it trained on:

This very large multilingual neural network language model has 176 billion parameters and a decoder-only architecture similar to GPT. It has 70 layers, 112 attention heads per layer, a hidden dimensionality of 14,336, and a sequence length of 2,048 tokens. The model was trained on a very large multilingual text dataset spanning 46 languages, including English (30.3% of the corpus), Chinese (17.7%), French (13.1%), programming code (13%), Spanish (10.7%), Indic languages (~2%), and Niger-Congo languages (<0.1%), to name a few. Training took roughly three to four months, on a supercomputer powered mostly by nuclear energy.
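As a rough sanity check, the architecture figures above roughly account for the 176 billion parameters. The sketch below uses the common 12·h² per-layer approximation (attention plus feed-forward) and an assumed ~250k-entry vocabulary; these are back-of-the-envelope assumptions, not official accounting from the BLOOM paper.

```python
# Back-of-the-envelope parameter count from the figures above:
# 70 layers, hidden dimensionality 14336.
# Assumptions (not official numbers): each transformer layer has roughly
# 12*h^2 weights (4h^2 for attention projections + 8h^2 for the MLP),
# and the tokenizer vocabulary is about 250k entries.

hidden = 14336       # hidden dimensionality
layers = 70          # transformer layers
vocab = 250_880      # assumed approximate vocabulary size

per_layer = 12 * hidden ** 2     # attention (4h^2) + feed-forward (8h^2)
embeddings = vocab * hidden      # token embedding matrix
total = layers * per_layer + embeddings

print(f"~{total / 1e9:.1f}B parameters")  # lands close to the reported 176B
```

The estimate ignores biases, layer norms, and positional details, yet still lands within about one percent of the published size, which is typical for dense decoder-only models at this scale.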


Why this matters:

  • The number and size of large language models continue to grow. Open-source, open-research models such as BLOOM democratize the field beyond the realm of big tech companies, and also incentivize those companies to keep contributing to open-source LLM development.

  • The emphasis on sourcing multilingual text datasets for training, while still not fully representative of the diversity of global languages, is a good first step toward ensuring that large language models are trained on diverse data, which helps reduce bias in the models.

Find out more:













©2022 Manoj Bapat
