EleutherAI

EleutherAI
Type of business: Research co-operative
Founded: 3 July 2020[1]
Founder(s)
Key people
  • Stella Biderman
  • Curtis Huebner
  • Shivanshu Purohit
  • Ben Wang
Industry: Artificial intelligence
Products: GPT-Neo, GPT-J, The Pile
URL: eleuther.ai

EleutherAI (/əˈluːθər/[2]) is a grassroots non-profit artificial intelligence (AI) research group. The group, which has been described as an open-source version of OpenAI,[3] was formed in a Discord server in July 2020 to organize a replication of GPT-3. In January 2023, EleutherAI formally incorporated as a non-profit research institute.[4]

History

EleutherAI began as a Discord server on July 7, 2020, under the tentative name "LibreAI" before rebranding to "EleutherAI" later that month.[5]

On December 30, 2020, EleutherAI released the Pile, a curated dataset of diverse text for training large language models.[6] While the paper referenced the existence of the GPT-Neo models, the models themselves were not released until March 21, 2021.[7] According to a retrospective written several months later, the authors did not anticipate that "people would care so much about our 'small models.'"[1] On June 9, 2021, EleutherAI followed this up with GPT-J-6B, a six-billion-parameter language model that was again the largest open-source GPT-3-like model in the world at the time of its release.[8]

Following the release of DALL-E by OpenAI in January 2021, EleutherAI started working on text-to-image synthesis models. When OpenAI didn't release DALL-E publicly, EleutherAI's Katherine Crowson and digital artist Ryan Murdock developed a technique for using CLIP (another model developed by OpenAI) to convert regular image generation models into text-to-image synthesis ones.[9][10][11][12] Building on ideas dating back to Google's DeepDream,[13] they found their first major success combining CLIP with another publicly available model called VQGAN. Crowson released the technology by tweeting notebooks demonstrating the technique that people could run for free without any special equipment.[14][15][16] This work is credited by Stability AI CEO Emad Mostaque as motivating the founding of Stability AI.[17]

While EleutherAI initially turned down funding offers, preferring to use Google's TPU Research Cloud Program to source their compute,[18] by early 2021 they had accepted funding from CoreWeave (a small cloud computing company) and SpellML (a cloud infrastructure company) in the form of access to the powerful GPU clusters that are necessary for large-scale machine learning research. On February 10, 2022, they released GPT-NeoX-20B, a model similar to their prior work but scaled up thanks to the resources CoreWeave provided.[19] This model was their third to hold the title "largest open source GPT-3-style language model in the world,"[citation needed] and the first to be the largest open-source language model of any type, surpassing a model trained by Meta AI that had held the title for two months.

Research

According to their website, EleutherAI is a "decentralized grassroots collective of volunteer researchers, engineers, and developers focused on AI alignment, scaling, and open source AI research".[20] While they do not sell any of their technologies as products, they publish the results of their research in academic venues, write blog posts detailing their ideas and methodologies, and provide trained models for anyone to use for free.[citation needed]

The Pile

The Pile is an 886 GB dataset designed for training large language models. It was originally developed to train EleutherAI's GPT-Neo models but has become widely used to train other models, including Microsoft's Megatron-Turing Natural Language Generation,[21][22] Meta AI's Open Pre-trained Transformers,[23] LLaMA,[24] and Galactica,[25] Stanford University's BioMedLM 2.7B,[26] the Beijing Academy of Artificial Intelligence's Chinese-Transformer-XL,[27] and Yandex's YaLM 100B.[28] Compared to other datasets, the Pile's main distinguishing features are that it is a curated selection of data chosen by researchers at EleutherAI to contain information they thought language models should learn and that it is the only such dataset that is thoroughly documented by the researchers who developed it.[29]
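
The size figures quoted for the Pile differ mainly in units. As a rough check, and assuming the 825 GiB figure given in the abstract of the Pile paper[6] (a number that is not stated elsewhere in this article), the binary-to-decimal conversion can be sketched as follows. The function and constant names are illustrative only.

```python
# Illustrative unit conversion. The 825 GiB input is taken from the Pile
# paper's abstract and is an assumption relative to this article's text.
BYTES_PER_GIB = 2**30   # binary gibibyte
BYTES_PER_GB = 10**9    # decimal gigabyte


def gib_to_gb(size_gib: float) -> float:
    """Convert a size in gibibytes (GiB) to decimal gigabytes (GB)."""
    return size_gib * BYTES_PER_GIB / BYTES_PER_GB


pile_gb = gib_to_gb(825)
print(f"{pile_gb:.1f} GB")  # prints "885.8 GB"
```

Rounded to the nearest gigabyte this gives 886 GB, which matches the figure above; the "800GB" in the paper's title appears to be a looser rounding.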

GPT models

EleutherAI's most prominent research relates to its work to train open-source large language models inspired by OpenAI's GPT-3.[30] EleutherAI's "GPT-Neo" model series includes models with 125 million, 1.3 billion, 2.7 billion, 6 billion, and 20 billion parameters.

  • GPT-Neo (125M, 1.3B, 2.7B):[31] released in March 2021, it was the largest open source GPT-3-style language model in the world at the time of release.
  • GPT-J (6B):[32] released in June 2021, it was the largest open source GPT-3-style language model in the world at the time of release.[33]
  • GPT-NeoX (20B):[34] released in February 2022, it was the largest open-source language model in the world at the time of release.
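
To give a sense of scale for the model sizes listed above, the following back-of-the-envelope sketch estimates the memory needed just to store each model's weights. It assumes the nominal parameter counts from the list (actual checkpoints differ slightly) and 2 or 4 bytes per parameter for 16-bit or 32-bit floating point. The names are illustrative, and training requires considerably more memory than the weights alone.

```python
# Nominal parameter counts from the list above; real checkpoints differ slightly.
NOMINAL_PARAMS = {
    "GPT-Neo 125M": 125e6,
    "GPT-Neo 1.3B": 1.3e9,
    "GPT-Neo 2.7B": 2.7e9,
    "GPT-J 6B": 6e9,
    "GPT-NeoX 20B": 20e9,
}


def weight_memory_gb(n_params: float, bytes_per_param: int) -> float:
    """Memory in decimal GB needed to store n_params weights."""
    return n_params * bytes_per_param / 1e9


for name, n_params in NOMINAL_PARAMS.items():
    fp16 = weight_memory_gb(n_params, 2)  # 16-bit floats: 2 bytes each
    fp32 = weight_memory_gb(n_params, 4)  # 32-bit floats: 4 bytes each
    print(f"{name}: {fp16:.2f} GB (16-bit), {fp32:.2f} GB (32-bit)")
```

Under these assumptions, the 20-billion-parameter model needs about 40 GB for its 16-bit weights alone.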

While the overwhelming majority of large language models are trained in either English or Chinese, EleutherAI also trains language models in other languages, such as the Korean-language Polyglot-Ko.[35]

Public reception

Praise

EleutherAI's work to democratize GPT-3 has won substantial praise from a variety of open-source advocates. The group won the UNESCO Netexplo Global Innovation Award in 2021[36] and InfoWorld's Best of Open Source Software Award in 2021[37] and 2022,[38] and was nominated for VentureBeat's AI Innovation Award in 2021.[39]

Gary Marcus, a cognitive scientist and noted critic of deep learning companies such as OpenAI and DeepMind,[40] has repeatedly[41][42] praised EleutherAI's dedication to open source and transparent research.

Maximilian Gahntz, a senior policy researcher at the Mozilla Foundation, applauded EleutherAI's efforts to give more researchers the ability to audit and assess AI technology. "If models are open and if data sets are open, that'll enable much more of the critical research that's pointed out many of the flaws and harms associated with generative AI and that's often far too difficult to conduct."[43]

Criticism

Technology journalist Kyle Wiggers has raised concerns about whether EleutherAI is as independent as it claims, or "whether the involvement of commercially motivated ventures like Stability AI and Hugging Face — both of which are backed by substantial venture capital — might influence EleutherAI's research."[44]

References

  1. ^ a b Leahy, Connor; Hallahan, Eric; Gao, Leo; Biderman, Stella (7 July 2021). "What A Long, Strange Trip It's Been: EleutherAI One Year Retrospective". EleutherAI Blog.
  2. ^ "Talk with Stella Biderman on The Pile, GPT-Neo and MTG". The Interference Podcast. 2 April 2021. Retrieved 26 March 2023.
  3. ^ Smith, Craig (21 March 2022). "EleutherAI: When OpenAI Isn't Open Enough". IEEE Spectrum. IEEE. Retrieved 17 December 2022.
  4. ^ Wiggers, Kyle (2 March 2023). "Stability AI, Hugging Face and Canva back new AI research nonprofit". TechCrunch. Retrieved 6 April 2023.
  5. ^ Leahy, Connor; Hallahan, Eric; Gao, Leo; Biderman, Stella (7 July 2021). "What A Long, Strange Trip It's Been: EleutherAI One Year Retrospective". EleutherAI Blog. Retrieved 14 April 2023.
  6. ^ Gao, Leo; Biderman, Stella; Black, Sid; et al. (31 December 2020). The Pile: An 800GB Dataset of Diverse Text for Language Modeling. arXiv 2101.00027. arXiv:2101.00027.
  7. ^ "GPT-3's free alternative GPT-Neo is something to be excited about". VentureBeat. 15 May 2021. Retrieved 14 April 2023.
  8. ^ "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". www.forefront.ai. Archived from the original on 9 March 2023. Retrieved 1 March 2023.
  9. ^ Miranda, LJ. "The Illustrated VQGAN". ljvmiranda921.github.io. Retrieved 8 March 2023.
  10. ^ "Inside The World of Uncanny AI Twitter Art". Nylon. Retrieved 8 March 2023.
  11. ^ "This AI Turns Movie Text Descriptions Into Abstract Posters". Yahoo Life. Retrieved 8 March 2023.
  12. ^ Quach, Katyanna. "A man spent a year in jail on a murder charge involving disputed AI evidence. Now the case has been dropped". www.theregister.com. Retrieved 8 March 2023.
  13. ^ "Alien Dreams: An Emerging Art Scene". ML@B Blog. Retrieved 8 March 2023.
  14. ^ "We asked an AI tool to 'paint' images of Australia. Critics say they're good enough to sell". 14 July 2021. Retrieved 8 March 2023 – via www.abc.net.au.
  15. ^ Nataraj, Poornima (28 February 2022). "Online tools to create mind-blowing AI art". Analytics India Magazine. Retrieved 8 March 2023.
  16. ^ "Meet the Woman Making Viral Portraits of Mental Health on TikTok". www.vice.com. Retrieved 8 March 2023.
  17. ^ @EMostaque (2 March 2023). "Stability AI came out of @AiEleuther and we have been delighted to incubate it as the foundation was set up" (Tweet) – via Twitter.
  18. ^ "EleutherAI: When OpenAI Isn't Open Enough". IEEE Spectrum. Retrieved 1 March 2023.
  19. ^ (PDF). 10 February 2022. https://web.archive.org/web/20220210034643/http://eaidata.bmk.sh/data/GPT_NeoX_20B.pdf. Archived from the original on 10 February 2022. Retrieved 1 March 2023.
  20. ^ "EleutherAI Website". EleutherAI. Retrieved 1 July 2021.
  21. ^ "Microsoft and Nvidia team up to train one of the world's largest language models". 11 October 2021. Retrieved 8 March 2023.
  22. ^ "AI: Megatron the Transformer, and its related language models". 24 September 2021. Retrieved 8 March 2023.
  23. ^ Zhang, Susan; Roller, Stephen; Goyal, Naman; Artetxe, Mikel; Chen, Moya; Chen, Shuohui; Dewan, Christopher; Diab, Mona; Li, Xian; Lin, Xi Victoria; Mihaylov, Todor; Ott, Myle; Shleifer, Sam; Shuster, Kurt; Simig, Daniel; Koura, Punit Singh; Sridhar, Anjali; Wang, Tianlu; Zettlemoyer, Luke (21 June 2022). "OPT: Open Pre-trained Transformer Language Models". arXiv:2205.01068 [cs.CL].
  24. ^ Touvron, Hugo; Lavril, Thibaut; Izacard, Gautier; Grave, Edouard; Lample, Guillaume; et al. (27 February 2023). "LLaMA: Open and Efficient Foundation Language Models". arXiv:2302.13971 [cs.CL].
  25. ^ Taylor, Ross; Kardas, Marcin; Cucurull, Guillem; Scialom, Thomas; Hartshorn, Anthony; Saravia, Elvis; Poulton, Andrew; Kerkez, Viktor; Stojnic, Robert (16 November 2022). "Galactica: A Large Language Model for Science". arXiv:2211.09085 [cs.CL].
  26. ^ "Model Card for BioMedLM 2.7B". huggingface.co. Retrieved 5 June 2023.
  27. ^ Yuan, Sha; Zhao, Hanyu; Du, Zhengxiao; Ding, Ming; Liu, Xiao; Cen, Yukuo; Zou, Xu; Yang, Zhilin; Tang, Jie (1 January 2021). "WuDaoCorpora: A super large-scale Chinese corpora for pre-training language models". AI Open. 2: 65–68. doi:10.1016/j.aiopen.2021.06.001. S2CID 236712622. Retrieved 8 March 2023 – via ScienceDirect.
  28. ^ Grabovskiy, Ilya (2022). "Yandex publishes YaLM 100B, the largest GPT-like neural network in open source" (Press release). Yandex. Retrieved 5 June 2023.
  29. ^ Khan, Mehtab; Hanna, Alex (13 September 2022). "The Subjects and Stages of AI Dataset Development: A Framework for Dataset Accountability". SSRN 4217148. Retrieved 8 March 2023 – via papers.ssrn.com.
  30. ^ "GPT-3's free alternative GPT-Neo is something to be excited about". 15 May 2021.
  31. ^ Andonian, Alex; Biderman, Stella; Black, Sid; Gali, Preetham; Gao, Leo; Hallahan, Eric; Levy-Kramer, Josh; Leahy, Connor; Nestler, Lucas; Parker, Kip; Pieler, Michael; Purohit, Shivanshu; Songz, Tri; Phil, Wang; Weinbach, Samuel (13 August 2021). "GPT-NeoX: Large Scale Autoregressive Language Modeling in PyTorch" – via GitHub.
  32. ^ "EleutherAI/gpt-j-6B · Hugging Face". huggingface.co.
  33. ^ "GPT-J-6B: An Introduction to the Largest Open Source GPT Model | Forefront". www.forefront.ai. Archived from the original on 9 March 2023. Retrieved 1 March 2023.
  34. ^ Black, Sidney; Biderman, Stella; Hallahan, Eric; et al. (1 May 2022). GPT-NeoX-20B: An Open-Source Autoregressive Language Model. Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models. pp. 95–136. Retrieved 19 December 2022.
  35. ^ ""한국어기반 AI소스 공개합니다 마음껏 쓰세요"". 매일경제. 31 October 2022.
  36. ^ "Request Rejected". Retrieved 8 March 2023.
  37. ^ Borck, James R.; Heller, Martin; Oliver, Andrew C.; Pointer, Ian; Tyson, Matthew; Yegulalp, Serdar (18 October 2021). "The best open source software of 2021". InfoWorld. Retrieved 8 March 2023.
  38. ^ Borck, James R.; Heller, Martin; Oliver, Andrew C.; Pointer, Ian; Sacolick, Isaac; Tyson, Matthew; Yegulalp, Serdar (17 October 2022). "The best open source software of 2022". InfoWorld. Retrieved 8 March 2023.
  39. ^ "VentureBeat presents AI Innovation Awards nominees at Transform 2021". 16 July 2021. Retrieved 8 March 2023.
  40. ^ "What's next for AI: Gary Marcus talks about the journey toward robust artificial intelligence". ZDNET. Retrieved 8 March 2023.
  41. ^ @GaryMarcus (10 February 2022). "GPT-NeoX-20B, 20 billion parameter large language model made freely available to public, with candid report on strengths, limits, ecological costs, etc" (Tweet) – via Twitter.
  42. ^ @GaryMarcus (19 February 2022). "incredibly important result: "our results raise the question of how much [large language] models actually generalize beyond pretraining data"" (Tweet) – via Twitter.
  43. ^ Chowdhury, Meghmala (29 December 2022). "Will Powerful AI Disrupt Industries Once Thought to be Safe in 2023?". Analytics Insight. Retrieved 6 April 2023.
  44. ^ Wiggers, Kyle (2 March 2023). "Stability AI, Hugging Face and Canva back new AI research nonprofit". Retrieved 8 March 2023.