100 NLP Interview Questions from Big Tech and Foreign Companies


CLASSIC NLP

TF-IDF & ML (8)
  1. Write TF-IDF from scratch (a minimal sketch follows this list).

  2. What is normalization in TF-IDF?

  3. Why do you need to know about TF-IDF in our time, and how can you use it in complex models?

  4. Explain how Naive Bayes works. What can you use it for?

  5. How can SVM be prone to overfitting?

  6. Explain possible methods for text preprocessing (lemmatization and stemming). What algorithms do you know for this, and in what cases would you use them?

  7. What metrics for text similarity do you know?

  8. Explain the difference between cosine similarity and cosine distance. Which of these values can be negative? How would you use them?
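
For question 1, a minimal sketch of TF-IDF from scratch, assuming whitespace tokenization, relative term frequency, and the smoothed IDF variant log((1 + N) / (1 + df)) + 1 (one of several common conventions); the normalization asked about in question 2 would typically mean L2-normalizing each resulting document vector.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Minimal TF-IDF: relative term frequency times smoothed inverse document frequency."""
    tokenized = [doc.lower().split() for doc in corpus]
    n_docs = len(tokenized)
    # document frequency: in how many documents each term occurs
    df = Counter(term for doc in tokenized for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        scores.append({t: (count / len(doc)) * idf[t] for t, count in tf.items()})
    return scores

print(tf_idf(["the cat sat", "the dog barked", "the cat barked"]))
```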

METRICS (7)
  1. Explain precision and recall in simple words. What would you look at in the absence of the F1 score? (A small worked example, including perplexity, follows this list.)

  2. In what case would you observe changes in specificity?

  3. When would you look at macro, and when at micro metrics? Why does the weighted metric exist?

  4. What is perplexity? What can we compute it for?

  5. What is the BLEU metric?

  6. Explain the difference between different types of ROUGE metrics?

  7. What is the difference between BLEU and ROUGE?
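
A small worked example for questions 1 and 4, assuming toy binary predictions for precision/recall/F1 and a toy list of token probabilities for perplexity (the exponential of the average negative log-likelihood per token); all numbers are illustrative.

```python
import math

# precision, recall, F1 on toy binary predictions (question 1)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
precision = tp / (tp + fp)   # of the predicted positives, how many are correct
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

# perplexity on toy token probabilities (question 4)
token_probs = [0.2, 0.5, 0.1, 0.4]   # model probabilities of the observed tokens
ppl = math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))
print(precision, recall, f1, ppl)
```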

WORD2VEC (9)
  1. Explain how Word2Vec learns? What is the loss function? What is maximized?

  2. What methods of obtaining embeddings do you know? When will each be better?

  3. What is the difference between static and contextual embeddings?

  4. What are the two main architectures you know, and which one learns faster?

  5. What is the difference between GloVe, ELMo, FastText, and Word2Vec?

  6. What is negative sampling and why is it needed? What other tricks for Word2Vec do you know, and how can you apply them? (A minimal skip-gram negative-sampling sketch follows this list.)

  7. What are dense and sparse embeddings? Provide examples.

  8. Why might the dimensionality of embeddings be important?

  9. What problems can arise when training Word2Vec on short textual data, and how can you deal with them?
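
For questions 1 and 6, a minimal numpy sketch of a single skip-gram negative-sampling update; the vocabulary size, dimensionality, learning rate, and sampled indices are all illustrative (and the negatives are not filtered against the true context word, as a real implementation would do).

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim, lr = 1000, 100, 0.025
W_in = rng.normal(0, 0.01, (vocab_size, dim))   # "input" (center-word) embeddings
W_out = rng.normal(0, 0.01, (vocab_size, dim))  # "output" (context-word) embeddings

def sgns_step(center, context, negatives):
    """One skip-gram negative-sampling update: raise the score of the true context
    word, lower the scores of the sampled negatives."""
    v = W_in[center]
    ids = np.array([context] + list(negatives))
    labels = np.array([1.0] + [0.0] * len(negatives))
    scores = 1 / (1 + np.exp(-W_out[ids] @ v))       # sigmoid(u · v_center)
    grad = (scores - labels)[:, None]                # gradient of the loss w.r.t. u · v
    W_in[center] -= lr * (grad * W_out[ids]).sum(axis=0)
    W_out[ids] -= lr * grad * v
    # loss being minimized: -log sigma(u_ctx · v) - sum_k log sigma(-u_neg_k · v)
    return -np.log(scores[0]) - np.log(1 - scores[1:]).sum()

loss = sgns_step(center=5, context=17, negatives=rng.integers(0, vocab_size, 5))
```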

RNN & CNN (7)
  1. How many training parameters are there in a simple 1-layer RNN? (See the worked count after this list.)

  2. How does RNN training occur?

  3. What problems exist in RNN?

  4. What types of RNN networks do you know? Explain the difference between GRU and LSTM?

  5. What parameters can we tune in such networks? (Stacking, number of layers)

  6. What are vanishing gradients for RNN? How do you solve this problem?

  7. Why use a Convolutional Neural Network in NLP, and how can you use it? How can you compare CNN within the attention paradigm?
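
For question 1, a worked parameter count checked against PyTorch's nn.RNN (which uses two bias vectors; a formulation with a single bias has hidden_size fewer parameters). The sizes are illustrative.

```python
import torch.nn as nn

input_size, hidden_size = 300, 128
rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size, num_layers=1)

# A simple 1-layer RNN has W_ih (hidden x input), W_hh (hidden x hidden),
# plus two bias vectors of size hidden in the PyTorch convention.
expected = hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size
actual = sum(p.numel() for p in rnn.parameters())
print(expected, actual)  # 55040 55040
```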

NLP and TRANSFORMERS

ATTENTION AND TRANSFORMER ARCHITECTURE (15)
  1. How do you compute attention? (additional: for what task was it proposed, and why?)

  2. What is the complexity of attention? Compare it with the complexity of an RNN.

  3. Compare RNN and attention. In what cases would you use attention, and when RNN?

  4. Write attention from scratch (see the sketch after this list).

  5. Explain masking in attention.

  6. What is the dimensionality of the self-attention matrix?

  7. What is the difference between BERT and GPT in terms of attention calculation?

  8. What is the dimensionality of the embedding layer in the transformer?

  9. Why are embeddings called contextual? How does it work?

  10. What is used in transformers, layer norm or batch norm, and why?

  11. Why do transformers have PreNorm and PostNorm?

  12. Explain the difference between soft and hard (local/global) attention?

  13. Explain multihead attention.

  14. What other types of attention mechanisms do you know? What are the purposes of these modifications?

  15. How does self-attention become more complex with an increase in the number of heads?
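
For questions 1, 4, and 5, a minimal single-head scaled dot-product attention with an optional causal mask, in numpy; the toy sequence length and model dimension are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, causal=False):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, with an optional causal mask."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (len_q, len_k)
    if causal:
        # each position may attend only to itself and earlier positions
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ V                               # (len_q, d_v)

seq_len, d_model = 4, 8
x = np.random.default_rng(0).normal(size=(seq_len, d_model))
out = scaled_dot_product_attention(x, x, x, causal=True)   # masked self-attention
```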

TRANSFORMER MODEL TYPES (7)

  1. Why does BERT largely lag behind RoBERTa, and what can you take from RoBERTa?

  2. What are T5 and BART models? How do they differ?

  3. What are task-agnostic models? Provide examples.

  4. Explain transformer models by comparing BERT, GPT, and T5.

  5. What major problem exists in BERT, GPT, etc., regarding model knowledge? How can this be addressed?

  6. How does a decoder-like GPT work during training and inference? What is the difference?

  7. Explain the difference between heads and layers in transformer models.

POSITIONAL ENCODING (6)

  1. Why is information about positions lost in embeddings of transformer models with attention?

  2. Explain approaches to positional embeddings and their pros and cons. (A sinusoidal sketch follows this list.)

  3. Why can’t we simply add an embedding with the token index?

  4. Why don’t we train positional embeddings?

  5. What is relative and absolute positional encoding?

  6. Explain in detail the working principle of rotary positional embeddings.
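
For questions 2 and 5, a sketch of the classic sinusoidal (absolute) positional encoding from the original Transformer paper; max_len and d_model are illustrative, and d_model is assumed to be even.

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d))."""
    positions = np.arange(max_len)[:, None]                 # (max_len, 1)
    div = 10000 ** (np.arange(0, d_model, 2) / d_model)     # (d_model / 2,)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(positions / div)
    pe[:, 1::2] = np.cos(positions / div)
    return pe   # added to (not concatenated with) the token embeddings

pe = sinusoidal_positional_encoding(max_len=128, d_model=64)
```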

PRETRAINING (4)
  1. How does causal language modeling work? (See the sketch after this list.)

  2. When do we use a pretrained model?

  3. How to train a transformer from scratch? Explain your pipeline, and in what cases would you do this?

  4. What models, besides BERT and GPT, do you know for various pretraining tasks?
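
For question 1, a toy sketch of the causal language modeling objective: position t predicts token t+1, i.e. cross-entropy between the logits and the inputs shifted by one. The embedding-plus-projection "model" is a placeholder standing in for a full stack of masked self-attention blocks; all sizes are illustrative.

```python
import torch
import torch.nn as nn

vocab_size, d_model, seq_len, batch = 100, 32, 16, 4

embed = nn.Embedding(vocab_size, d_model)    # placeholder for a real decoder
lm_head = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (batch, seq_len))
hidden = embed(tokens)                       # real models: masked self-attention here
logits = lm_head(hidden)                     # (batch, seq_len, vocab_size)

# causal LM objective: position t predicts token t + 1
inputs = logits[:, :-1, :].reshape(-1, vocab_size)
targets = tokens[:, 1:].reshape(-1)
loss = nn.functional.cross_entropy(inputs, targets)
loss.backward()
```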

TOKENIZERS (9)
  1. What types of tokenizers do you know? Compare them.

  2. Can you extend a tokenizer? If yes, in what case would you do this? When would you retrain a tokenizer? What needs to be done when adding new tokens?

  3. How do regular tokens differ from special tokens?

  4. Why is lemmatization not used in transformers? And why do we need tokens?

  5. How is a tokenizer trained? Explain with examples of WordPiece and BPE. (A toy BPE sketch follows this list.)

  6. What position does the CLS vector occupy? Why?

  7. What tokenizer is used in BERT, and which one in GPT?

  8. Explain how modern tokenizers handle out-of-vocabulary words?

  9. What does the tokenizer vocab size affect? How would you choose it when training a new tokenizer?
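
For question 5, a toy BPE training loop that repeatedly merges the most frequent adjacent symbol pair; WordPiece differs mainly in scoring candidate merges by a likelihood-based criterion rather than raw frequency, and real tokenizers add pre-tokenization, byte-level fallback, and special tokens on top of this.

```python
from collections import Counter

def train_bpe(words, num_merges):
    """Toy BPE: start from characters, repeatedly merge the most frequent pair."""
    vocab = Counter(tuple(w) for w in words)     # word -> frequency, as symbol tuples
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()                    # apply the merge everywhere it occurs
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

print(train_bpe(["low", "lower", "lowest", "newer", "wider"], num_merges=5))
```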

TRAINING (14)
  1. What is class imbalance? How can it be identified? Name all approaches to solving this problem.

  2. Can dropout be used during inference, and why?

  3. What is the difference between the Adam optimizer and AdamW?

  4. How do consumed resources change with gradient accumulation? (See the training-loop sketch after this list.)

  5. How to optimize resource consumption during training?

  6. What ways of distributed training do you know?

  7. What is textual augmentation? Name all methods you know.

  8. Why is padding less frequently used? What is done instead?

  9. Explain how warm-up works.

  10. Explain the concept of gradient clipping.

  11. How does teacher forcing work? Provide examples.

  12. Why and how should skip connections be used?

  13. What are adapters? Where and how can we use them?

  14. Explain the concepts of metric learning. What approaches do you know?
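
For questions 4, 9, and 10, a compact PyTorch loop combining gradient accumulation, linear warm-up, and gradient clipping; the model, data, and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 2)                                  # placeholder model
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
sched = torch.optim.lr_scheduler.LambdaLR(                 # linear warm-up over 100 steps
    opt, lambda step: min(1.0, (step + 1) / 100))
accum_steps = 8                                            # effective batch = 8 micro-batches

data = [(torch.randn(4, 128), torch.randint(0, 2, (4,))) for _ in range(32)]

opt.zero_grad()
for i, (x, y) in enumerate(data):
    loss = nn.functional.cross_entropy(model(x), y) / accum_steps   # scale so the summed
    loss.backward()                                                 # grads match a big batch
    if (i + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        opt.step()        # one optimizer step per accumulation window: activation memory
        sched.step()      # stays at micro-batch size, only gradients persist in between
        opt.zero_grad()
```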

INFERENCE (4)
  1. What does the temperature in softmax control? What value would you set?

  2. Explain the types of sampling used in generation: top-k and top-p (nucleus) sampling. (See the sketch after this list.)

  3. What is the complexity of beam search, and how does it work?

  4. What is sentence embedding? What are the ways you can obtain it?
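
For questions 1 and 2, a sketch of temperature scaling with top-k and top-p (nucleus) filtering over a single logits vector; the logits, temperature, and thresholds are illustrative.

```python
import numpy as np

def sample(logits, temperature=1.0, top_k=None, top_p=None, rng=np.random.default_rng()):
    """Temperature-scaled sampling with optional top-k and top-p (nucleus) filtering."""
    logits = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if top_k is not None:
        # keep only the k most probable tokens, then renormalize
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
        probs /= probs.sum()
    if top_p is not None:
        # keep the smallest set of tokens whose cumulative probability reaches top_p
        order = np.argsort(probs)[::-1]
        cum = np.cumsum(probs[order])
        keep = order[: np.searchsorted(cum, top_p) + 1]
        mask = np.zeros_like(probs)
        mask[keep] = probs[keep]
        probs = mask
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

token = sample([2.0, 1.0, 0.5, -1.0], temperature=0.8, top_k=3, top_p=0.9)
```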

LLM (10)
  1. How does LoRA work? How would you choose its parameters? Imagine that we want to fine-tune a large language model and apply LoRA with a small rank r, but the model still doesn’t fit in memory. What else can be done? (A minimal LoRA layer sketch follows this list.)

  2. What is the difference between prefix tuning, p-tuning, and prompt tuning?

  3. Explain the scaling law.

  4. Explain all stages of LLM training. Which stages can we skip, and in what cases?

  5. How does RAG work? How does it differ from few-shot KNN?

  6. What quantization methods do you know? Can we fine-tune quantized models?

  7. How can you prevent catastrophic forgetting in LLM?

  8. Explain the working principle of the KV cache, Grouped-Query Attention, and Multi-Query Attention.

  9. Explain the technology behind Mixtral. What are its pros and cons?

  10. How are you? How are things going?
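
For question 1, a minimal LoRA-style linear layer (a sketch, not the peft library's implementation): the pretrained weight stays frozen and only a low-rank update B·A, scaled by alpha/r, is trained. If the model still does not fit in memory with a small r, this is typically combined with quantizing the frozen base weights (as in QLoRA), gradient checkpointing, or smaller micro-batches with gradient accumulation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base layer plus a trainable low-rank update: y = base(x) + (alpha/r) * B(A(x))."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # the pretrained weights stay frozen
            p.requires_grad = False
        self.lora_a = nn.Linear(base.in_features, r, bias=False)    # down-projection A
        self.lora_b = nn.Linear(r, base.out_features, bias=False)   # up-projection B
        nn.init.zeros_(self.lora_b.weight)    # so the adapter starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
out = layer(torch.randn(2, 768))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)   # 2 * 768 * 8
```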
