From 3M Health Information Systems
AI Talk: AI and beauty, coffee consumption and labeling errors
In the AI of the Beholder
I recently listened to a podcast by Jennifer Strong of MIT Technology Review that was eye-opening, to say the least. Ever wonder how the TikTok recommendation system works? Or how Facebook and other social media companies promote specific content? Apparently, it factors in the attractiveness of the participants. The podcast opens with interviews of two young teenage girls who started using face filters, initially for comic effect and later to enhance how they look in their online posts. It now appears there is a whole population of young girls using these filters.
Next, Strong interviewed Shafee Hassan, founder of Qoves Studio. The company's offering? Upload a picture of yourself and they will tell you how to become a better you: every facial flaw you have and how to correct it with surgery. Yes, actual surgery! If this company succeeds, the plastic surgery industry will reach new highs. Professor Lauren Rhue of the University of Maryland's business school provides a cautionary backdrop to this topic: judging outward beauty, particularly by Eurocentric standards, can have a hugely detrimental impact on people's psyche. More research is clearly called for, but the beauty industry, with all its face creams and fashion accessories, seems to have gone all digital, and that is scary.
We have all seen confusing pronouncements from nutritional studies. One day eggs are good for you, then they aren't, and then they are again. And so it goes for countless foods. This story is about coffee.
Coffee drinking is, of course, a practice that goes back centuries! Here is a fun link about the history of coffee. Billions consume this beverage, but in 1991 the World Health Organization classified coffee as a possible carcinogen. That classification was later discredited: once smoking was factored out as a confounding variable, the link to cancer disappeared. Herein lies the problem with these studies. Unlike drug trials, where efficacy can be judged using randomized clinical trials, nutrition studies involve teasing out correlations in aggregate data, and correlation does not imply causation, as the WHO study (above) demonstrates.
A recent study reported in the American Heart Association's journals attempts to tease out the relationship between coffee drinking and heart health (5). The New York Times did an extensive report on this study. The researchers did not start with a hypothesis about coffee; instead, they mined publicly available datasets to see what correlations would emerge. One such dataset is the famous Framingham Heart Study (FHS). Started in 1948 in Framingham, Massachusetts, the FHS follows a cohort of participants: what they eat, what they do and what conditions they develop. This incredible study continues to this day and has spawned more than 4,000 research papers.
The authors of the current research paper found various correlations with coffee use and from them attempted to answer a few basic questions: Is coffee good for you? How many cups? What are the benefits? The costs? Well, it turns out the jury is still out on most of these questions. All the study can say is that coffee "may be good for you" and, with high probability, "it's not bad for you." As a moderate coffee consumer myself (2-3 cups a day), I am happy to know that my coffee consumption is not harming my heart.
Machine learning accuracy is strongly tied to the quality of the training data. Data that is not representative of real-world situations is one problem, but what if the labels themselves are wrong? An MIT study found that roughly 3.4 percent of the labels in popular benchmark datasets are erroneous. How did they find these labeling errors? The authors proposed a new paradigm, "confident learning," which identifies mislabeling by estimating the joint distribution of noisy (given) and true (latent) labels. The basic notion is that if you can prune training examples whose labels fall below a class-specific confidence threshold, you can improve the training process and the resulting model. They showed that their method surfaced mislabeled images in popular datasets such as ImageNet. They have even created a website that showcases these instances of mislabeling. An interesting approach.
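The thresholding idea at the heart of confident learning can be sketched in a few lines of NumPy. This is a simplified illustration of the per-class confidence threshold, not the authors' full algorithm (their open-source cleanlab library implements the complete method); the function name and toy data here are my own.

```python
import numpy as np

def find_label_issues(labels, pred_probs):
    """Flag likely mislabeled examples (confident-learning style sketch).

    For each class j, compute a threshold t_j: the mean predicted
    probability of class j over examples *given* label j (the model's
    average self-confidence on that class). An example is flagged when
    some class clears its threshold but the given label does not,
    i.e. the model is confidently pointing at a different class.
    """
    n, m = pred_probs.shape
    # Per-class thresholds: average self-confidence within each labeled class.
    thresholds = np.array([pred_probs[labels == j, j].mean() for j in range(m)])
    issues = []
    for i in range(n):
        # Classes whose predicted probability clears their own threshold.
        confident = np.where(pred_probs[i] >= thresholds)[0]
        if len(confident) > 0 and labels[i] not in confident:
            issues.append(i)
    return np.array(issues)

# Toy example: six images, two classes; example 2 is labeled 0 but the
# model is confident it belongs to class 1, so it gets flagged.
labels = np.array([0, 0, 0, 1, 1, 1])
pred_probs = np.array([
    [0.90, 0.10],
    [0.85, 0.15],
    [0.10, 0.90],  # suspicious: labeled 0, looks like class 1
    [0.20, 0.80],
    [0.15, 0.85],
    [0.25, 0.75],
])
print(find_label_issues(labels, pred_probs))  # -> [2]
```

Pruning the flagged examples (or reweighting them) before retraining is the step that, per the paper, yields the improved model.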
My long-time friend, Chandy, sent me The New York Times article.
My colleague, Dan Walker, sent me the article on labeling errors.
I am always looking for feedback and if you would like me to cover a story, please let me know! Leave me a comment below or ask a question on my blogger profile page.
V. “Juggy” Jagannathan, PhD, is Director of Research for 3M M*Modal and is an AI Evangelist with four decades of experience in AI and Computer Science research.