interview notes: network part
I'm really scared; my heart keeps filling with bitterness. I don't know whether I can succeed, or what happens if I end up with nothing. 🐶 and ❄️, let's push on. After finishing this, keep working on DF's lateral. I can't figure out what I want anymore; all I can do is grit my teeth and keep moving forward.
Before we can process text through a vector DB, we first need to turn it into tokens.
For example, take a sentence like
I like playing counter strike
First we need a tokenizer (in this case BertTokenizer).
The tokenizer's vocabulary is built with WordPiece.
The vocabulary contains entries like
[I, like, playing, counter, strike]
For every word that appears in the vocabulary, we use it directly; otherwise we split it into smaller pieces.
For example, for
unlike
we generate something like
["un", "##like"]
where the ## prefix marks a piece that continues the previous piece of the same word.
we also have some special tokens
[[CLS], [SEP], [PAD]]
Then we map each token to its ID via the vocabulary,
and also build an attention mask of 0/1 values that indicates, for each position, whether it is a [PAD] or a real token.
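Putting the last two steps together, here is a hedged sketch of the ID mapping plus the attention mask. The vocabulary and the IDs are invented for illustration and do not match BERT's real IDs:

```python
# Toy sketch: token -> ID map plus an attention mask (1 = real token, 0 = [PAD]).
# Vocabulary and ID values are made up for illustration, not BERT's real ones.
vocab = {"[PAD]": 0, "[CLS]": 1, "[SEP]": 2,
         "i": 3, "like": 4, "playing": 5, "counter": 6, "strike": 7}

def encode(tokens, max_len):
    tokens = ["[CLS]"] + tokens + ["[SEP]"]   # wrap with the special tokens
    ids = [vocab[t] for t in tokens]
    mask = [1] * len(ids)                     # 1 for every real token
    pad = max_len - len(ids)
    ids += [vocab["[PAD]"]] * pad             # pad up to the fixed length
    mask += [0] * pad                         # 0 marks the padding positions
    return ids, mask

ids, mask = encode(["i", "like", "playing", "counter", "strike"], max_len=10)
print(ids)   # [1, 3, 4, 5, 6, 7, 2, 0, 0, 0]
print(mask)  # [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```

The mask is what lets the model ignore the [PAD] positions when computing attention.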