ChatGPT’s gamechanger

ChatGPT’s gamechanger: multi-modality, and what it means

If you went on LinkedIn over the last week or two, you were probably inundated by people losing their minds over GPT integrating multi-modality into its capabilities. Normally, I would take some time to tell you that this is another example of the hype machine working overtime to sell you another fundamentally useless idea.

Well, this time is different. Multi-modality is a genuinely powerful development, one that warrants the attention it is receiving. In this article, I will give you a quick introduction to multi-modality, why it’s a big deal for AI models, and some of the problems it can come with (remember, nothing is a silver bullet).

Multimodality 101

What is multi-modal AI- Simply put, multi-modal AI refers to AI that integrates multiple types of data (multiple modalities of information). Traditionally, we develop language models for language, acoustic models for sound, statistical models for tabular data, and so on. Multi-modal models are trained on a mixture of these inputs in the same training process. This is typically done by running the inputs through embedding models that create vector representations of your data in a common n-dimensional space.
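
To make the “common n-dimensional space” idea concrete, here is a minimal sketch of how two modalities could be projected into one shared embedding space. This is an illustration under assumptions, not ChatGPT’s actual architecture: the encoder classes, the 512-dimension choice, and the pooling are hypothetical placeholders.

```python
# Minimal sketch: two toy encoders mapping different modalities into one shared space.
# Everything here (dimensions, architectures) is a hypothetical placeholder.
import torch
import torch.nn as nn

EMBED_DIM = 512  # dimensionality of the common space (illustrative choice)

class TextEncoder(nn.Module):
    """Toy text encoder: mean-pools token embeddings, then projects to the shared space."""
    def __init__(self, vocab_size=10_000, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.project = nn.Linear(hidden, EMBED_DIM)

    def forward(self, token_ids):                   # token_ids: (batch, seq_len)
        pooled = self.embed(token_ids).mean(dim=1)  # (batch, hidden)
        return self.project(pooled)                 # (batch, EMBED_DIM)

class ImageEncoder(nn.Module):
    """Toy image encoder: flattens pixels, then projects into the same shared space."""
    def __init__(self, pixels=3 * 32 * 32):
        super().__init__()
        self.project = nn.Linear(pixels, EMBED_DIM)

    def forward(self, images):                      # images: (batch, 3, 32, 32)
        return self.project(images.flatten(start_dim=1))

text_enc, image_enc = TextEncoder(), ImageEncoder()
text_vec = text_enc(torch.randint(0, 10_000, (4, 16)))  # 4 dummy token sequences
image_vec = image_enc(torch.rand(4, 3, 32, 32))          # 4 dummy images

# Both modalities now live in the same 512-dimensional space, so a single model
# can compare them or attend over them jointly during training.
print(torch.cosine_similarity(text_vec, image_vec).shape)  # torch.Size([4])
```

In practice, systems like CLIP train encoders of roughly this shape with a contrastive objective so that matching text-image pairs land close together in the shared space, and multi-modal language models then consume those vectors alongside ordinary token embeddings.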

Why multimodality is a big deal- Instead of getting all mathy, I want you to go outside right now. Take a walk. Now look at the sky and imagine that you had a jet pack. Think of how many more paths you could take, even if you stayed in the same geographic area. Multi-modality adds another dimension to your data, allowing your model to sample from a search space that is orders of magnitude greater. In our walking example, we go from x² reachable points to x³: with 100 positions along each axis, that’s 10,000 points on the ground versus 1,000,000 with the jet pack. When introducing their multi-modal AI infrastructure, Pathways, Google wrote the following:
