
BERT-semantic-Model

The foundational model that we will be training on is a pre-trained model built with Bidirectional Encoder Representations from Transformers (BERT). The specific BERT model we are using is the "bert-base-uncased" model, which was created by Google and can be loaded from the Hugging Face Transformers library. A notable property of this pre-trained model is that it does not differentiate between upper-case and lower-case letters.

BERT transformer models come pre-trained on English text with a masked language modeling (MLM) objective. The training data was a large corpus of English text, such as Wikipedia, used in a self-supervised fashion: the model was pretrained on raw text only, with no human supervision, automatically generating its inputs and labels from the text. It was pretrained with two objectives, Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). MLM allows the model to learn bidirectional representations of sentences, which is a useful feature for determining how sentiment is produced by sentence context rather than by the definition of a word alone.

There are a few variations of the BERT models, such as cased ones, large ones, small ones, multilingual ones, and even ones trained with whole-word masking. Some key specifications of the bert-base model are that it consists of 12 transformer encoder layers, it has a hidden size of 768, and it uses 12 attention heads that allow it to attend to different parts of the input sequence at once. The bidirectional embedding approach is well suited to NLP tasks, as it captures rich contextual information for each token in a given sequence. It also allows a developer to fine-tune the model on a downstream task such as text classification, as shown in the sketch below.
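The following minimal sketch (not part of this repository's code) illustrates the ideas above: loading the "bert-base-uncased" checkpoint from the Transformers library, observing that the uncased tokenizer ignores letter case, and attaching a classification head for downstream fine-tuning. The two-label setup is an illustrative assumption for a sentiment-style task.

```python
# A minimal sketch, assuming the Hugging Face Transformers library is installed.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the pre-trained bert-base-uncased checkpoint
# (12 encoder layers, hidden size 768, 12 attention heads).
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # assumed binary classification, e.g. positive/negative sentiment
)

# The uncased tokenizer lowercases its input, so "Great" and "great"
# map to the same tokens.
print(tokenizer.tokenize("Great movie!"))  # ['great', 'movie', '!']

# Tokenize a small batch and run a forward pass; the classification head on
# top of the pooled [CLS] representation produces one logit per label.
inputs = tokenizer(
    ["great movie!", "terrible movie!"],
    padding=True,
    return_tensors="pt",
)
logits = model(**inputs).logits
print(logits.shape)  # torch.Size([2, 2])
```

Fine-tuning would then update these weights on labeled examples for the downstream text-classification task.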
