modelee / bertin-roberta-base-spanish

README
language: Spanish
license: CC-BY 4.0
tags: spanish, roberta
pipeline_tag: fill-mask
widget text: Lo hizo en un abrir y cerrar de <mask>.

BERTIN

BERTIN is a series of BERT-based models for Spanish. This one is a RoBERTa-base model trained from scratch on the Spanish portion of mC4 using Flax. The training scripts are included in the repository.

This project is part of the Flax/JAX Community Week, organised by Hugging Face, with TPU usage sponsored by Google.
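
Since the model card declares the fill-mask task, a minimal sketch of querying the model may be useful. It assumes the `transformers` library (with a PyTorch or Flax backend) is installed and that the weights are published on the Hugging Face Hub under `bertin-project/bertin-roberta-base-spanish`; that identifier, and the helper names `load_fill_mask` and `top_predictions`, are illustrative assumptions and do not come from this README.

```python
from typing import Callable, List, Tuple

# Assumption: Hugging Face Hub id for this model; it is not stated in this README.
MODEL_ID = "bertin-project/bertin-roberta-base-spanish"


def load_fill_mask(model_id: str = MODEL_ID):
    """Build a fill-mask pipeline (downloads the weights on first use)."""
    # Imported lazily so the helper below can be used without transformers installed.
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def top_predictions(fill_mask: Callable, text: str, k: int = 5) -> List[Tuple[str, float]]:
    """Return up to k (token, score) candidates for the single <mask> in `text`."""
    results = fill_mask(text, top_k=k)
    # Each result is a dict; RoBERTa tokens often carry a leading space, so strip it.
    return [(r["token_str"].strip(), r["score"]) for r in results]
```

With network access, `top_predictions(load_fill_mask(), "Lo hizo en un abrir y cerrar de <mask>.")` should return the candidate tokens ranked by score. Keeping the pipeline as an argument makes the helper easy to reuse with any other fill-mask model.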

Spanish mC4

The Spanish portion of mC4 contains about 416 million records and 235 billion words.

$ zcat c4/multilingual/c4-es*.tfrecord*.json.gz | wc -l
416057992
$ zcat c4/multilingual/c4-es*.tfrecord-*.json.gz | jq -r '.text | split(" ") | length' | paste -s -d+ - | bc
235303687795
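
The same two counts can be approximated in Python, which may be more convenient when `jq` is not available. This is a sketch under the assumption that each `.json.gz` shard holds one JSON object per line with a `text` field, as the commands above imply. The function name `count_records_and_words` is hypothetical.

```python
import gzip
import json
from typing import Iterable, Tuple


def count_records_and_words(paths: Iterable[str]) -> Tuple[int, int]:
    """Count JSON-lines records and space-separated words across gzipped shards."""
    records = 0
    words = 0
    for path in paths:
        # Stream each shard line by line so memory use stays flat.
        with gzip.open(path, "rt", encoding="utf-8") as handle:
            for line in handle:
                if not line.strip():
                    continue  # skip blank lines, which cannot be parsed as JSON
                records += 1
                text = json.loads(line)["text"]
                # Split on single spaces, mirroring the jq expression split(" ").
                words += len(text.split(" ")) if text else 0
    return records, words
```

For the full corpus, the shard list could be built with `glob.glob("c4/multilingual/c4-es*.tfrecord-*.json.gz")`. A single-process pass over this much data will be slow, so treat it as a reference for the counting logic instead of a tuned job.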

Team members

  • Javier de la Rosa (versae)
  • Manu Romero (mrm8488)
  • María Grandury (mariagrandury)
  • Ari Polakov (aripo99)
  • Pablogps
  • daveni
  • Sri Lakshmi

Useful links
