At the Forefront of AI Research: Multimodality, Agents, Open-Source LLMs, and Beyond

4 Jul 2024

I recently came across an article arguing that while 2023 was a frenzied year for AI, a hot topic in corporate boardrooms and the media that even drove public stock market performance, 2024 will be a year of exploration and discovery. The author likens the current state of AI to a "primordial soup": brimming with potential yet still amorphous. He claims we rushed from AI's exploration phase to active exploitation too quickly, chasing fast and easy results, and that it is now "time to press the reset button" and explore AI further in pursuit of meaningful value creation.

This article struck a chord with me and sparked my curiosity about the minds shaping the present and future of AI research. To gain deeper insights, I interviewed Mohammad (Hamudi) Naanaa, CTO and Co-Founder at Portal.ai, and a former AI Research Scientist at Amazon and R&D Lab Manager at Apple. Our conversation delves into the current state of AI exploration and its next frontier, the challenges and opportunities of responsible and ethical AI development, the potential impact of shadow AI, what it takes to build robust AI expertise, and much more.

Enjoy the read!

Hamudi, what drew you to the field of AI research, and what specific area(s) are you currently exploring?

My journey into AI began during my university years with the groundbreaking AlexNet paper. The idea of training a model to classify images was awe-inspiring, something that seemed unattainable with conventional software. Inspired by this, I delved into AI research to better understand neural networks. I had a strong intuition that if we could solve image classification, it was only a matter of time before we could tackle even more complex data and problems, ultimately building intelligence. I wanted to be part of that journey.

Initially, I dived into computer vision, fascinated by the creative possibilities of generative AI, specifically GANs and diffusion models. Later, the explosion of language models that followed the Transformer paper caught my attention, bringing the dream of true artificial intelligence closer. Today, I'm at the exciting intersection of generative AI for both text and images.

What do you see as the next frontier for AI exploration?

Even now, years later, we're just scratching the surface of AI's potential. It's a very hot topic; you see a lot of trends coming and going, and the frontier is being shaped every day.

One prominent direction at this frontier is multimodality. The world is more than just text, and I see a bright future in natively multimodal AI that integrates text, images, audio, and beyond. Many major AI companies are already embracing this, and we're seeing foundation models that accept a variety of input types.

Another domain generating a lot of anticipation and excitement is agents. These systems have a complete feedback loop with observations, reasoning, state, actions, and reflection. They go beyond the "input-output" paradigm that characterizes most LLM-based AI today.
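To make that feedback loop concrete, here is a deliberately tiny sketch in Python. It is not a description of any production agent: a toy agent on a number line keeps state, observes its distance to a target, reasons about which way to move, acts, and reflects by halving its step size whenever it overshoots. All names (`AgentState`, `observe`, `reason`, `act`, `reflect`, `run_agent`) are hypothetical; in an LLM-based agent, the `reason` step would typically be a model call rather than a one-line rule.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    """State the (hypothetical) agent carries across loop iterations."""
    position: float = 0.0
    step: float = 4.0
    log: list = field(default_factory=list)


def observe(target: float, state: AgentState) -> float:
    """Observation: signed distance between the agent and the target."""
    return target - state.position


def reason(distance: float) -> float:
    """Reasoning: choose a direction based on the current observation."""
    if distance == 0:
        return 0.0
    return 1.0 if distance > 0 else -1.0


def act(state: AgentState, direction: float) -> None:
    """Action: move in the chosen direction by the current step size."""
    state.position += direction * state.step


def reflect(state: AgentState, before: float, after: float) -> None:
    """Reflection: record the outcome; if the action overshot, be more cautious."""
    state.log.append((before, after, state.step))
    if before * after < 0:  # the sign of the distance flipped, so we overshot
        state.step /= 2


def run_agent(target: float, max_iters: int = 20) -> AgentState:
    """Run the observe -> reason -> act -> reflect loop until the goal is reached."""
    state = AgentState()
    for _ in range(max_iters):
        before = observe(target, state)
        if before == 0:
            break  # goal reached
        act(state, reason(before))
        reflect(state, before, observe(target, state))
    return state


if __name__ == "__main__":
    final = run_agent(7)
    print(final.position, final.step, final.log)
```

Starting at 0 with a step of 4, the agent moves to 4, overshoots to 8, corrects to 6, and lands on 7. The point is the structure rather than the task: unlike a single input-output call, the loop carries state forward and changes its own behavior based on what happened.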

There's an ongoing debate about whether transformer-based architectures, which are essentially input-output token machines, are sufficient for "true" intelligence.

Exploring fundamentally new architectures is a promising but challenging direction. We might see a renaissance of stateful, memory-native architectures, such as Neural Turing Machines (NTMs) or Differentiable Neural Computers (DNCs), which could address some of the limitations of transformers.

All these advancements will revolutionize robotics, bringing intelligent assistants into our daily lives sooner than expected. I believe we'll see the first robots walking among us within a few years, maybe even less.

However, developing technology is one thing, and building useful products on top of it is another.

Take a multimodal, audio-native AI: its value lies in letting users generate ultra-personalized songs that carry their own emotions. The core technology is the same, but packaging it into the right product is what empowers people. This is where I expect a lot of exploration soon, as models become more reliable, controllable, and robust.

How do you see multimodal AI systems changing the interaction between humans and technology? Are there specific industries or applications in which multimodal AI will have the most significant impact?

Multimodal AI is already disrupting how we interact with technology. Take chatbots: once simple text-based tools that people ignored on websites, they're now evolving into sophisticated multimodal interfaces at the center of new designs.

Multimodality is enabling new interaction patterns. Take educational apps like Duolingo or Khan Academy: practicing a language by writing to an AI partner, improving your pronunciation in a voice conversation, or sharing a photo of your math equations is a completely new and more natural way to interact with technology, one that increases productivity and engagement.

I envision a future with super apps or even new operating systems where users can give instructions and receive a result without navigating through different apps.

For example, instead of clicking through icons and text to order food, you might speak, gesture, or even look at certain elements to interact more naturally. Early devices like the Humane AI Pin and Rabbit R1 show promise but also highlight unpredictability and room for improvement. As developers and AI researchers, we need to address these issues, and I'm optimistic that we will.

Multimodal AI systems are set to revolutionize how we interact with technology by breaking down the barriers between different forms of communication. We are still at the beginning of exploring this new way of building interfaces, but one pattern is already noticeable: existing systems with predefined interaction patterns are going to be reinvented.

As AI research advances rapidly, what are some of the biggest challenges we face in ensuring responsible development of AI and mitigating its potential negative impact?

Navigating the ethical landscape of AI is complex yet crucial, as the technology evolves rapidly and its implications are still being understood. We must anticipate and mitigate biases and unintended consequences.

Some challenges stem from ethical implications related to human flaws. For instance, projects aimed at building AI companions can help combat loneliness. Still, they might also exacerbate it by encouraging people to find comfort in AI rather than in real-life interactions. This raises questions for creators about the implications of their apps and how they should address them. It is just one example of the fundamental questions that seemingly simple apps raise, and there are many more apps we have yet to imagine, let alone the side effects of their existence.

Recent incidents in big tech, such as skewed historical representations of people in generated images, highlight the significant challenges, including ethical concerns and unintended consequences, that come with the rapid advancement of AI technology.

There isn't a simple answer, but I believe ensuring transparency through open-source LLM development (exposing both models and the data they were trained on) and fostering a multidisciplinary approach involving people with diverse backgrounds, not just engineers and scientists, are critical steps in addressing these challenges.

Asking these questions is the only right approach. We are responsible for shaping the future of the most powerful technologies to be built. As creators of AI, we must consider inherent and potential biases and how to mitigate them.

Since your time at Amazon, what projects or research endeavors have you been involved in? What are you working on now?

The magic of AI lies in understanding the laser-focused use cases where it can be most helpful. After leaving Amazon, I had discussions with my friend Vlad Panchenko, envisioning the future and various ways AI could benefit humanity. Having built agentic systems for some time, and combining that knowledge with Vlad's experience as a successful serial entrepreneur, we began thinking about how AI agents could be applied to businesses. Most businesses lack access to top-tier CMOs, COOs, and other experts needed to succeed. AI can democratize access to intelligence on an unprecedented scale. Together, we explored decomposing complex business processes into small, identifiable tasks, viewing agents as individual bricks that can be joined and communicate with each other. I was excited by the potential, and this led to the birth of Portal AI, driven by the belief in bringing world-class AI intelligence to support businesses in their daily operations, from marketing to logistics, allowing them to focus on what truly matters.
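The "agents as bricks" idea can be illustrated with a minimal Python sketch. This is not Portal AI's implementation; it only shows the general pattern of decomposing a process into small tasks, each handled by an agent that receives a message, adds its contribution, and passes the result on. The agents below are hypothetical, rule-based stand-ins for what would in practice likely be LLM calls, and the message fields (`product`, `audience`, `tagline`, `post_on`) are invented for illustration.

```python
from typing import Callable, Dict, List

# A message is a plain dict of fields; an agent maps one message to the next.
Message = Dict[str, str]
Agent = Callable[[Message], Message]


def market_research_agent(msg: Message) -> Message:
    """Hypothetical agent: derives a target audience from the product."""
    out = dict(msg)
    out["audience"] = f"people interested in {msg['product']}"
    return out


def copywriting_agent(msg: Message) -> Message:
    """Hypothetical agent: drafts a tagline using the previous agent's output."""
    out = dict(msg)
    out["tagline"] = f"{msg['product'].title()}: made for {msg['audience']}"
    return out


def scheduling_agent(msg: Message) -> Message:
    """Hypothetical agent: picks a posting day, defaulting to Monday."""
    out = dict(msg)
    out["post_on"] = msg.get("launch_day", "Monday")
    return out


def run_pipeline(agents: List[Agent], task: Message) -> Message:
    """Join agents like bricks: each receives the message the previous one produced."""
    msg = task
    for agent in agents:
        msg = agent(msg)
    return msg


if __name__ == "__main__":
    result = run_pipeline(
        [market_research_agent, copywriting_agent, scheduling_agent],
        {"product": "handmade candles"},
    )
    print(result)
```

Because every agent shares the same message-in, message-out shape, bricks can be reordered, swapped, or reused in other pipelines without changing the others. A real system would also need error handling and richer communication than a linear chain.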

How do you envision AI transforming business management practices?

AI is poised to revolutionize business management by automating repetitive tasks and enhancing decision-making.

Imagine having an AI partner that handles your marketing, logistics, and HR, allowing you to focus on creative and strategic work. This transformation will democratize access to expert knowledge, enabling every business to operate at a higher level.

AI's ability to streamline operations will not only boost efficiency but also foster innovation and growth.

As AI becomes more sophisticated, what are your thoughts on the potential impact of 'shadow AI' on areas like workplace integrity and cybersecurity? How can we mitigate these potential risks?

'Shadow AI'—the unintended and often hidden use of AI—poses significant risks. For example, people using AI to game social media algorithms highlight how AI can be misused. As AI content floods the internet, maintaining integrity and security becomes challenging. Ethical AI research must keep pace with these developments, promoting transparency and robust safeguards. Addressing these risks requires continuous vigilance and adaptive strategies to protect against misuse.

We find ourselves in a new era with many questions that we must keep in mind and continue to debate.

Given the field's rapid evolution, how do you stay updated on the latest advancements and maintain your expertise in AI? What advice would you give someone aspiring to build expertise in this fast-paced domain?

Everything is moving and changing so fast, and that's great. But it also means that within three months, much of what you know is likely to be out of date, obsolete, or just old-fashioned. There's no way to simply read a book and stay up to date with iteration cycles this quick.

Following the field's major leaders and reputable sources helps me stay updated. To dive deeper into research, I subscribe to relevant newsletters and communities on platforms like Reddit and Twitter/X, and, of course, I use AI to summarize my threads on Reddit.

For someone aspiring to build expertise in AI, there are multiple paths. If you want to become a researcher, build a strong foundation—AI is deeply rooted in math, and while trends change, the underlying math remains the same.

Overall, I'm a huge advocate of hackathons. I've attended many, organized several, and seen a lot of projects. They're a great way to learn something new and put it to use. If I were to recommend one thing to anyone, whether an engineer, product manager, or CEO, it would be: go out there, meet people who want to build something, get your hands dirty, and get it rolling. This is the best way to actually understand things, because you develop your intuition and have fun. Just stay curious!

Looking 20 years into the future, how do you envision the role of AI in our daily lives? What are you most excited about, and what aspects of this future do you find most challenging to predict?

I really want to read this interview in 20 years! AI is changing so rapidly that predicting what will happen in 20 months, let alone 20 years, is difficult. We are in a unique moment, at the early stages of being able to consolidate all human intelligence into one system and give everyone access to that knowledge. Resources like education are currently not evenly distributed, and I believe AI will have a big impact here, acting as a universal equalizer in many ways.

And touching on robots again, I think they will become a real thing. We'll have personal robots that live with us as our assistants and take over all domestic tasks.

We'll have hyper-personalized products—our own tutors, coaches, and friends. We don't even have a name for these entities yet, but it's already happening.

Another thing that excites me is the acceleration of research. I'm thrilled by the prospect of the first AI-co-developed medicine or cure—what a beautiful world that would be. I'm a strong believer in a better future and am excited to do everything I can to shape that future.

That's right! I was born in Lebanon, moved to Ukraine as a child, and grew up there. Ukraine shaped me profoundly. At 17, I moved to Germany for university, where my family later joined me, and my career began. Living in diverse and equally beautiful societies, I learned about their unique challenges and opportunities.

AI research currently has an English-centric bias, with most data and systems built by and for English speakers. If AI is to be a universal equalizer, we need to accommodate and support every language. I speak five languages and identify with all of these cultures: I'm Lebanese, Ukrainian, and German. I'm Human. These experiences have given me invaluable insights into what connects us while making us unique, and I carry this knowledge with me in all my endeavors.