In the world of data science and technology, one cannot ignore the allure of Large Language Models (LLMs). Their capabilities are undeniably captivating for enthusiasts in the field. However, despite the excitement, caution should be exercised. Let’s talk about when it’s not advisable to use LLMs in your data science projects.



Targeted use case and limited data


As we all know, Large Language Models are trained on massive amounts of data so that they can perform a variety of tasks, saving users a significant amount of time. They provide higher-quality outputs in tasks like translation, text generation, and question answering than, for example, rule-based systems, where developers manually create rules and patterns for language understanding. However, if your data science project involves highly technical or specialized content, using a pre-trained LLM alone may result in inaccurate or incomplete results. In such cases, incorporating domain-specific models or knowledge bases may be necessary. Adapting an LLM in this way is data-hungry: because these models possess billions of parameters, effective fine-tuning requires a substantial quantity of high-quality data.


Consequently, if you know in advance that the available data is limited, or that there are constraints on the data, it is advisable to first consider a classical Natural Language Processing (NLP) approach. In such cases, an NLP model or a less complex LLM, also known as a Small Language Model (SLM), can still yield satisfactory results on the available dataset. Review our article about the advantages of using SLMs over LLMs: When bigger isn’t always better – Bring your attention to Small Language Models.
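To make this concrete, here is a minimal sketch of the classical route: a bag-of-words Naive Bayes classifier written from scratch and trained on a handful of labelled examples. The texts and labels below are invented for illustration – on a dataset this small, fine-tuning a billion-parameter model is off the table, but a simple statistical model can still pick up the signal.

```python
from collections import Counter
import math

def train_nb(samples):
    """Train a tiny bag-of-words Naive Bayes model on (text, label) pairs."""
    word_counts = {}            # label -> Counter of word occurrences
    label_counts = Counter()    # label -> number of training examples
    vocab = set()
    for text, label in samples:
        words = text.lower().split()
        word_counts.setdefault(label, Counter()).update(words)
        label_counts[label] += 1
        vocab.update(words)
    return word_counts, label_counts, vocab

def predict_nb(model, text):
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best, best_score = None, float("-inf")
    for label in label_counts:
        # log prior + log likelihood with add-one (Laplace) smoothing
        score = math.log(label_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

# A handful of invented examples -- far too little data to fine-tune an LLM.
train = [
    ("great product works well", "pos"),
    ("excellent quality very happy", "pos"),
    ("terrible broken waste of money", "neg"),
    ("awful quality very disappointed", "neg"),
]
model = train_nb(train)
print(predict_nb(model, "happy with the quality"))  # -> pos
```

For anything beyond a toy, a library implementation (e.g. scikit-learn's `MultinomialNB` with TF-IDF features) would be the practical choice, but the principle is the same: small data favors small models.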





Hallucinations


When discussing the drawbacks of Large Language Models, it is essential to mention one of the most common issues, namely the tendency of models to hallucinate. Anyone who has used ChatGPT (GPT-3.5) has undoubtedly experienced this phenomenon – simply put, it is the moment when the model’s responses are completely incorrect, containing untrue information, despite appearing coherent and logical at first glance. This is primarily influenced by the dataset on which the model was trained: it is vast and drawn from many sources, which often contain subjective, biased, or distorted information.


The cause of hallucinations also lies in using models for tasks they were not adapted for. The creativity that is an advantage in tasks such as composing songs or writing poems becomes a disadvantage when we expect the model to provide only factual information. LLMs perform very well on general natural language processing tasks, but applying them unmodified to specialized data science tasks may produce outcomes that deviate from the truth. In such situations, it is necessary to tailor these models to the specific problem, armed with an adequate amount of high-quality data. As noted in the previous paragraph, acquiring such data is a challenging and laborious process. And even if we manage to create such a dataset, the issue of fine-tuning the model still remains, posing an additional challenge if computational power and budget are limited.
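One common way to tie a general-purpose LLM to domain-specific facts without full fine-tuning is to ground its answers in retrieved documents (retrieval-augmented generation). The sketch below only constructs the grounded prompt; the retrieval step and the actual model call depend on your stack and are omitted, and the document text is invented for illustration.

```python
def build_grounded_prompt(question, documents):
    """Instruct the model to answer ONLY from the supplied context.
    This reduces (but does not eliminate) hallucinated facts."""
    context = "\n\n".join(f"[doc {i}] {d}" for i, d in enumerate(documents, 1))
    return (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, reply 'I don't know.'\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )

# Invented domain document; in practice these come from a retrieval index.
docs = ["The X-200 pump must be serviced every 500 operating hours."]
prompt = build_grounded_prompt("How often is the X-200 serviced?", docs)
print(prompt)
```

The explicit "I don't know" escape hatch matters: without it, a model asked about something outside the context will often invent an answer rather than decline.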



Streaming applications such as multi-round dialogue


LLMs also encounter challenges in processing streaming data. They are trained on texts of finite length (a few thousand tokens), so their performance degrades on sequences longer than those seen during training. In addition, during inference the attention mechanism caches the key-value states of all previous tokens, consuming memory that grows with sequence length. As a result of these limitations, large language models face difficulties in systems that require extended conversations, such as chatbots or other interactive systems.
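The memory cost of that key-value cache is easy to estimate. The sketch below uses assumed, Llama-2-7B-like dimensions (32 layers, 32 attention heads, head dimension 128, fp16 values); the exact numbers vary by model, but the linear growth with sequence length does not.

```python
# Back-of-envelope KV-cache size for a Llama-2-7B-like model (assumed shape).
def kv_cache_bytes(seq_len, layers=32, heads=32, head_dim=128, dtype_bytes=2):
    # factor of 2: both the key AND value tensors are cached per token, per layer
    return 2 * layers * heads * head_dim * dtype_bytes * seq_len

# 512 KiB per token -> the cache alone fills GPU memory fast in long chats.
gib = kv_cache_bytes(4096) / 2**30
print(f"{gib:.1f} GiB for a 4096-token context")  # -> 2.0 GiB
```

And this is per conversation: a chatbot serving many concurrent multi-round dialogues multiplies that figure by the number of active sessions.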


It is worth noting that the StreamingLLM framework comes to the rescue in this context: its authors observe that LLMs allocate a disproportionate share of attention scores to the initial tokens ("attention sinks") and therefore cache those initial tokens alongside the most recent ones. Nevertheless, keep in mind that this framework does not extend the LLM’s context window – it retains only the latest tokens and the attention sinks while discarding everything in between.
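The eviction policy can be illustrated in a few lines. This is a simplified sketch of the idea, not the actual StreamingLLM implementation: the cache keeps the first `n_sink` entries plus a sliding window of the most recent ones.

```python
# Simplified sketch of StreamingLLM-style cache eviction: keep the first
# n_sink tokens ("attention sinks") plus the most recent `window` tokens.
# Middle tokens are lost, so the effective context window is NOT extended.
def evict_kv_cache(cache, n_sink=4, window=8):
    if len(cache) <= n_sink + window:
        return cache
    return cache[:n_sink] + cache[-window:]

tokens = list(range(20))            # stand-in for cached key/value entries
kept = evict_kv_cache(tokens)
print(kept)                         # -> [0, 1, 2, 3, 12, 13, ..., 19]
```

Anything evicted (tokens 4–11 above) is simply gone: a chatbot using this scheme stays fast and memory-bounded, but cannot recall details from the discarded middle of a long conversation.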



Security concerns


Deploying LLMs in data science projects may raise legal and ethical challenges, especially when dealing with sensitive or regulated domains. LLMs can be vulnerable to adversarial attacks such as prompt injection, where malicious actors deliberately craft inputs to deceive the model. It is also crucial to remember that the model’s responses may contain inappropriate or sensitive information.


The absence of proper data filtering or management can lead to the leakage of private data, exposing us to privacy and security breaches. A case in point is the recent inadvertent disclosure of confidential information by Samsung employees, who leaked sensitive internal data while seeking assistance from ChatGPT for work-related tasks.


The incident serves as a stark reminder that information shared with these models may be retained and used for further training, raising privacy and data-security issues. It not only demonstrates the unintentional vulnerabilities of using LLMs in corporate settings but also underscores the need for organizations to establish strict protocols to safeguard sensitive data, balancing the productivity gains of advanced language models against robust security measures that prevent inadvertent data leaks.
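One concrete protocol an organization can adopt is to scrub prompts before they ever leave internal infrastructure. The sketch below is illustrative only: a real deployment needs a proper data-loss-prevention tool, and the two regexes here (email addresses and long digit runs) are far from exhaustive.

```python
import re

# Hypothetical redaction pass applied to every outgoing prompt.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),   # email addresses
    (re.compile(r"\b\d{6,}\b"), "[NUMBER]"),               # badge/ID numbers
]

def redact(text):
    """Replace obvious PII with placeholders before calling an external LLM."""
    for pattern, placeholder in PATTERNS:
        text = pattern.sub(placeholder, text)
    return text

print(redact("Contact jan.kowalski@example.com, badge 123456789"))
# -> Contact [EMAIL], badge [NUMBER]
```

Even with such filtering in place, the safest policy for genuinely confidential material (source code, trade secrets) is simply not to send it to an externally hosted model at all.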



Interpretability and explainability


Another important aspect is that LLMs generate responses that are difficult to interpret or explain. Large Language Models are often referred to as black boxes: it is usually impossible for users, or even the creators of the model, to determine exactly which factors influenced a particular response. Additionally, the same question may yield different answers across runs, which is unacceptable for certain use cases.
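The non-determinism is easy to illustrate. The toy example below samples a "next token" from a softmax over made-up logits: with temperature 0 (greedy decoding) the answer is always the same, while any positive temperature turns the output into a random draw.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token from {token: logit}; temperature 0 means greedy decoding."""
    if temperature == 0:
        return max(logits, key=logits.get)          # deterministic argmax
    # softmax with temperature, then a weighted random draw
    weights = {t: math.exp(l / temperature) for t, l in logits.items()}
    r = rng.random() * sum(weights.values())
    for token, w in weights.items():
        r -= w
        if r <= 0:
            return token
    return token

logits = {"Paris": 2.0, "Lyon": 1.5, "Rome": 1.0}   # invented logits
rng = random.Random(0)
greedy = {sample_token(logits, 0.0, rng) for _ in range(20)}
sampled = {sample_token(logits, 1.0, rng) for _ in range(20)}
print(greedy)    # always {'Paris'}
print(sampled)   # typically several distinct tokens
```

Real APIs expose the same knob (commonly a `temperature` parameter), and even at temperature 0 some serving stacks are not bit-for-bit reproducible – so pipelines that require identical outputs for identical inputs should not lean on an LLM.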


Therefore, if project requirements include a transparent and logical decision-making process, relying on responses from a language model is not advisable. However, it is still worth considering eXplainable Artificial Intelligence (XAI) techniques for Natural Language Processing (NLP) in such problems. Explore the role of XAI in addressing the interpretability challenges posed by machine learning models in another of our insightful articles: Unveiling the Black Box: An overview of Explainable AI.



Real-time processing


In situations where project requirements involve real-time responses, large language models are rarely a suitable choice. Their enormous number of parameters translates into a significant demand for computational power, and the resulting load can be prohibitive. Due to this complexity, large language models often exhibit extended inference times, introducing delays that are unacceptable in real-time contexts. Moreover, applications processing vast amounts of data in real time, where the context of the text shifts frequently, would require continuous fine-tuning to keep up – which in turn incurs substantial costs for maintaining model quality.





In summary, while large language models exhibit impressive language understanding, their practical implementation comes with challenges related to computational efficiency, latency, resource usage, scalability, unpredictability, interpretability, adaptability to dynamic environments, and the risk of biases. These factors should be carefully considered when deciding whether to use large language models in data science projects.