The latest AI study “Elephant in the Room”

Let me drill down into why a recently published AI study, “The Elephant in the Room”, is creating so much interest and noise.

The academic world of Artificial Intelligence and applied AI in industry are two different universes that co-exist. The only link between the two is the real-world applications that find their way into the popular tech domain from time to time. Most of the discussion, research and debate happens in the academic world and usually has little impact on applied AI. But something different happened a few months back.

On 9-Aug-2018, a study named “The Elephant in the Room” was published by a team of three researchers: Amir Rosenfeld (Postdoctoral Fellow, York University), Richard Zemel (Professor, Computer Science Department, University of Toronto) and John K. Tsotsos (Distinguished Research Professor, York University).

Its short description states nothing very ambitious, and it does not seem ominous at first. Yet it has created such turbulence in the academic world that even in Singapore I heard it quoted in multiple sessions across different AI discussions. I first came across the study in the Twitter feed of Gary Marcus (a neuroscientist at New York University) in early Sep-2018, when I went through it and studied its premise and findings.

Funnily enough, I didn’t find anything too significant in it at the time, and I replied to Gary about its limited practical importance for applied AI. Since then, though, I have seen more and more noise around it, which made me go back and dig into why this study is gaining so much importance.

Difference of opinions

Two birds cross with each other

The first thing I noticed is that the entire community, and I include both academia and industry here, is divided into two camps: optimists and pessimists. The optimists celebrate baby steps in a grand way and are hopeful that big gains are just around the corner. They are the ones who motivate companies, researchers and people in general to keep pushing. They include the CEOs of various AI-oriented firms, including Google, Facebook and Apple.

The other group see themselves as the realists. They try to bring in the voice of reason and keep the optimists’ over-celebration in check, but in many cases, rather than just being realists, they end up dampening the mood. They are usually researchers of the human brain (neuroscientists), along with some tech researchers, who argue that people heaping accolades on AI need to see how far we still have to go.

This study pitted the two sides of the community directly against each other, and the name-calling and bickering started.

What did the study achieve?

The study is very forthright in stating its objective in the Abstract:

We showcase a family of common failures of state-of-the-art object detectors. These are obtained by replacing image sub-regions with another sub-image that contains a trained object. We call this “object transplanting”. Modifying an image in this manner is shown to have a non-local impact on object detection. Slight changes in object position can affect its identity according to an object detector as well as that of other objects in the image. We provide some analysis and suggest possible reasons for the reported phenomena.

Direct excerpt from the study’s Abstract section
Credit – Scenario snapshot taken from actual study

In the first part of the study, the trained object they used was an image of an elephant. The choice was a bit playful, as it fits the metaphor of the “elephant in the room”: the most obvious fact that no one in the room wants to address.
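To make “object transplanting” concrete, here is a minimal sketch of the pasting step only. It is my own illustration, not the researchers’ code: the function name `transplant`, the toy arrays and the paste position are all assumptions, and a real experiment would then run an object detector on the modified image.

```python
import numpy as np

def transplant(scene, patch, top, left):
    """Return a copy of `scene` with `patch` pasted at row `top`, column `left`."""
    h, w = patch.shape[:2]
    if top < 0 or left < 0 or top + h > scene.shape[0] or left + w > scene.shape[1]:
        raise ValueError("patch does not fit inside the scene at that position")
    out = scene.copy()                      # leave the original scene untouched
    out[top:top + h, left:left + w] = patch
    return out

# Toy data (hypothetical): a black 6x8 grayscale "scene" and a white 2x3 "object".
scene = np.zeros((6, 8), dtype=np.uint8)
patch = np.full((2, 3), 255, dtype=np.uint8)
modified = transplant(scene, patch, top=1, left=4)
```

Sliding `top` and `left` by a few pixels and re-running a detector each time would mimic the “slight changes in object position” that the Abstract mentions.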

Yes, the results of the object detectors went awry, and they started identifying things either wrongly or with a lower confidence level. Note: the confidence level is the score a detector attaches to each detection, and the system uses it to decide which detections to trust when deriving the meaning of the scene. A lower confidence level may tip the system into an entirely different interpretation of the scene.
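A small sketch shows why a drop in confidence matters. It assumes the system only keeps detections whose score reaches a fixed threshold, which is a common setup but not something the study prescribes. The labels, scores and the 0.5 threshold are made-up values for illustration.

```python
def accepted_labels(detections, threshold=0.5):
    """Keep only the labels whose confidence reaches the threshold."""
    return [label for label, score in detections if score >= threshold]

# Hypothetical detector output for the same scene, before and after tampering.
before = [("person", 0.92), ("chair", 0.81), ("cup", 0.55)]
after = [("person", 0.90), ("chair", 0.42), ("cup", 0.55)]

print(accepted_labels(before))  # ['person', 'chair', 'cup']
print(accepted_labels(after))   # ['person', 'cup']
```

The chair never left the room, but once its score slips under the threshold the system behaves as if it had.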

The study clearly established the benchmarks it used and tried to follow industry-accepted standards. In this case, the object detectors were trained on the Microsoft COCO object detection benchmark, which is one of the many benchmarks used by companies building autonomous vehicles.

In the second part of the study, the researchers duplicated objects that were already present in the picture to see whether this would trip the object detector into reading the entire image differently. To everyone’s surprise, it did, which showed how fragile the object detectors are. I’ll cover the relevance of this scenario shortly.

In the last section of the study, the researchers tested their findings on more samples of anomalous objects in different images to see whether the object detectors misread them all and what other behaviors turn up.
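One way to picture the “non-local impact” from the Abstract is to compare detector output before and after a transplant and list only the objects that sit away from the pasted region yet still changed. The sketch below is my own simplification, not the study’s method: it assumes each object keeps exactly the same box in both runs (a real comparison would need to match boxes by overlap), and every name and number in it is hypothetical.

```python
def boxes_overlap(a, b):
    """True if two (x1, y1, x2, y2) boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def non_local_changes(before, after, transplant_box, tol=0.1):
    """List objects outside the transplant area whose result changed.

    `before` and `after` map a box to a (label, confidence) pair.
    """
    changes = []
    for box, (label, score) in before.items():
        if boxes_overlap(box, transplant_box):
            continue  # changes underneath the pasted object are expected
        new = after.get(box)
        if new is None:
            changes.append((box, label, "missed"))
        elif new[0] != label:
            changes.append((box, label, "relabeled as " + new[0]))
        elif abs(new[1] - score) > tol:
            changes.append((box, label, "confidence shifted"))
    return changes

# Hypothetical detections; the pasted object sits far away at (300, 300, 380, 380).
before = {
    (10, 10, 50, 50): ("cup", 0.90),
    (100, 20, 160, 90): ("chair", 0.80),
    (200, 200, 260, 260): ("book", 0.70),
}
after = {
    (10, 10, 50, 50): ("cup", 0.88),
    (100, 20, 160, 90): ("couch", 0.60),
}
changes = non_local_changes(before, after, transplant_box=(300, 300, 380, 380))
```

With this toy data, `changes` reports the chair as relabeled and the book as missed, while the cup’s small score drift stays within the tolerance.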

The practical impact of this study

Any study and its stated outcomes should offer a takeaway for both the academic and the applied world, and this study is no different. As I mentioned earlier, I’ll try to derive practical, implementable conclusions from it: the real-world scenarios where the study applies, and how its findings can help ensure the solutions we build account for the scenarios it describes.

This study focuses on the scene interpretation realm of AI, so every applied field that relies on scene interpretation is in scope as a use case:

  1. Medical Robotics
  2. Navigation & Exploration
  3. Autonomous Vehicles

In all three scenarios above, the machine’s scene interpretation must be accurate, because it determines the next decision made, route taken or action taken in the respective use case. Let us be very clear: machines don’t have the problem of ghosts. They don’t see images as humans do; they derive them from the sensory data they receive.

Dumb as these machines are, an anomalous object inserted into the scene is a case of either image tampering or an external object placed by an external entity. This steps into the realm of system security and cyber attacks. So any deductions we make from this scenario become irrelevant if we make sure no unauthorized party can access the system on the go.

The first part of the study therefore only provides insight into how the system behaves if a hacker gains access and inserts something malicious into the system’s data. Autonomous cars may start behaving strangely: under a cyber attack they may run off course, brake suddenly or ram into someone if they misread the scene they are trying to process and react to.

Place 2 cameras to counter balance each other

In applied AI, we can counter this by adding a secondary, independent system that keeps taking its own snapshot of the scene, so that the scene interpreted by the primary system is validated by the secondary one before it is acted upon. The major disadvantage is that the overall system may respond more slowly, but that is a limitation we may have to accept for the sake of overall security.
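The cross-check between the two systems could look roughly like the sketch below. It assumes both systems report detections as (label, box) pairs in the same image coordinates and treats two detections as the same object when their labels match and their boxes overlap enough (intersection over union). The function names, the 0.5 overlap threshold and the sample detections are my own assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def confirmed(primary, secondary, min_iou=0.5):
    """Return the primary detections that the secondary system also reports."""
    return [
        (label, box)
        for label, box in primary
        if any(label == l2 and iou(box, b2) >= min_iou for l2, b2 in secondary)
    ]

def scenes_agree(primary, secondary, min_iou=0.5):
    """True only if every detection on each side is matched on the other side."""
    return (len(confirmed(primary, secondary, min_iou)) == len(primary)
            and len(confirmed(secondary, primary, min_iou)) == len(secondary))

# Hypothetical output: the primary feed contains an object the secondary never saw.
primary = [("car", (0, 0, 10, 10)), ("elephant", (20, 20, 30, 30))]
secondary = [("car", (1, 0, 11, 10))]

print(confirmed(primary, secondary))     # [('car', (0, 0, 10, 10))]
print(scenes_agree(primary, secondary))  # False
```

When `scenes_agree` returns False, the system can hold back or fall back to a safe action instead of acting on a scene only one camera believes in. That extra comparison is exactly where the slower response comes from.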

The second part of the study covers the scenario where an object that is already part of the picture gets duplicated and fools the system into a different output state than it should have reached. This scenario can occur without the system even being compromised by an external actor or a cyber attack.

Remember the days of old camera film, where overexposure of one picture could affect the next picture in the sequence, so that part of the first image showed up in the second? Even digital systems are sometimes plagued by this phenomenon. When we generate a video feed, it is quite possible that the primary camera skips frames due to a maintenance issue or a camera fault.

This can result in two or more frames overlapping each other. If the object’s position in the previous frame differs from its position in the current one, the overlap creates a duplicate object in the frame. This could tip the sensors into interpreting the image entirely differently from what it should be. Now add a cyber attack to this case, and the same effect can be produced by an external party as well.
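A toy simulation shows how overlapping frames produce a duplicate. It simply averages two frames in which the same bright object sits at different positions. This is only an illustration of the overlap described above, not a model of any real camera fault, and the frame sizes, positions and the brightness level are invented.

```python
import numpy as np

def blend_frames(previous, current, weight=0.5):
    """Mix two equally sized frames; `weight` is the share of the previous frame."""
    mixed = weight * previous.astype(np.float64) + (1 - weight) * current.astype(np.float64)
    return mixed.round().astype(np.uint8)

def bright_columns(frame, level=100):
    """Columns that contain at least one pixel at or above `level`."""
    return [int(c) for c in np.flatnonzero((frame >= level).any(axis=0))]

# Hypothetical 4x10 grayscale frames with one bright 2x2 object that moves right.
previous = np.zeros((4, 10), dtype=np.uint8)
current = np.zeros((4, 10), dtype=np.uint8)
previous[1:3, 1:3] = 255  # object position in the earlier frame
current[1:3, 6:8] = 255   # same object, moved, in the current frame

ghosted = blend_frames(previous, current)
print(bright_columns(current))  # [6, 7]
print(bright_columns(ghosted))  # [1, 2, 6, 7]
```

The clean frame shows one object, while the blended frame shows two fainter copies, which is the kind of duplicated input the second part of the study found detectors to be sensitive to.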

What we learn from this case is that a second, independent validation system becomes even more important for autonomous vehicles. Whatever performance drag it introduces, the overall system must bear the load and treat it as part of the solution.

Why so much fuss about the study?

Well, any technological advancement evokes an interesting response from society. It has its extremes on both sides of the spectrum, and it has its moderates. This study touches one of the most talked-about and debated areas of industrial automation, one that would impact the future of work for a lot of people: autonomous vehicles.

Scene detection relevancy for Autonomous vehicles

With a huge number of people afraid of losing their jobs to autonomous vehicles, this problem has taken on a political color in many countries. So people look for arguments in any study that will further their agenda and position them to gain the most.

Automation companies like Google and Tesla are pushing for autonomous vehicles. Their case is that the majority of accidents are caused by human error: people behind the wheel are tired, careless or even drunk. These problems do not exist with autonomous vehicles, so it makes logical sense to progress towards vehicles that are free from such issues.

On the other hand, people who earn their livelihood from driving are scared that they may become redundant and lose their jobs, so they are happy to join the camp that warns against over-reliance on these automated systems.


These debates will keep happening in the future, as we have seen them in the past too. For now, it seems the study will keep triggering the extreme emotions and counterarguments that are always present, albeit on simmer.

People tend to resist any change and any automation of processes. Fears of Skynet and the Terminator rising will keep humanity on tenterhooks about this subject. We need a voice of rationality and logic to keep these fears in proportion to their practical impact, neither turning them into public hysteria nor ignoring them as unnecessary noise.

In the end, I would like to thank Gary Marcus for directing me towards the tailing issue in AI systems, which helped me understand the more residual problems in such scenarios. You should follow him on Twitter for his views too.

Thanks Gary!!!

Shailendra Malik

Shailendra started off as an entrepreneur in his family business and has been working for 25 years. He formally became an IT systems consultant 15 years ago, delivering strategic solutions from paper to real life and from PoC initiatives to live production systems. Known as a problem solver and out-of-the-box thinker, he currently works with a consulting firm in Singapore. He has lived and worked in the India, Middle East and Singapore markets.
