The foundational model that we will be training on is a pre-trained model built with Bidirectional Encoder Representations from Transformers (BERT). The specific checkpoint we use is the "bert-base-uncased" model created by Google, which can be loaded from the Hugging Face transformers library. Because it is an uncased model, it does not differentiate between upper-case and lower-case letters.

BERT models come pre-trained on a large corpus of English text (such as Wikipedia) in a self-supervised fashion, using raw text only and no human labeling; the inputs and labels are generated automatically from the text itself. Pre-training uses two objectives: Masked Language Modeling (MLM) and Next Sentence Prediction (NSP). MLM lets the model learn bidirectional representations of sentences, which helps it judge sentiment from sentence context rather than from the definition of a single word alone. There are several variants of BERT, including cased versions, larger and smaller versions, multilingual versions, and versions trained with whole-word masking.

Some key specifications of bert-base are 12 transformer encoder layers, a hidden size of 768, and 12 attention heads, which allow the model to attend to different parts of the input sequence at once. This bidirectional approach is well suited to NLP tasks because it captures rich contextual information for each token in a sequence, and the pre-trained model can be fine-tuned on a downstream task such as text classification, as shown in the sketch below.
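
The following is a minimal sketch of how the checkpoint could be loaded for fine-tuning, assuming the Hugging Face transformers and PyTorch packages are installed. The two-label setup and the example sentence are illustrative assumptions, not part of the original description.

```python
# Minimal sketch: load "bert-base-uncased" with a fresh classification head
# for fine-tuning on a downstream task such as sentiment classification.
# Assumes Hugging Face `transformers` and PyTorch are installed; num_labels=2
# (binary sentiment) is an illustrative assumption.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=2,  # assumed: negative / positive
)

# The uncased tokenizer lower-cases input text before tokenizing.
inputs = tokenizer("This movie was surprisingly good!", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits)  # raw scores; only meaningful after fine-tuning
```

The classification head added by `AutoModelForSequenceClassification` is randomly initialized, so the model must be fine-tuned on labeled data before its predictions are useful.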