Developer Relations Manager
SurrealDB
DevRelCon London 2023
In this talk from DevRelCon London 2023, Naiyarah Hussain discusses the importance of staying ethical in a generative AI world.
She explores the current state of generative AI, provides resources for tech builders and users, and addresses common objections. Naiyarah emphasizes the need for transparency, understanding of rights, and responsible decision-making in the development and use of AI technologies. She also highlights the concept of “ethics washing” and the importance of considering the broader societal impact of AI.
Kevin Lewis:
I have the pleasure of introducing the last talk. I’m extremely excited for this talk. I think not enough has been said in this arena. So this is a talk from Naiyarah on staying ethical in a generative AI world. Let’s do a very dramatic walk to the stage. You ready? Yeah. Stand. Stand standing. We got it. Yeah. Fantastic. Good energy.
Naiyarah Hussain:
All right. Hi everyone. Thank you all for coming to my talk today, I really appreciate it. My name is Naiyarah Hussain and I’ll be speaking to you about how to stay ethical in a generative AI world. Before we get started: I am a developer advocate, so everything I do in developer advocacy is driven by feedback. So I always have a QR code on my slides. If there’s anything you liked, anything you think can be improved, or anything you’d like to see next related to the topic, let me know in the form behind that QR code, and hopefully I’ll be able to do that for my next talk. I would also like to take this time to speak a little bit about my background in this topic. I’m a developer advocate and I’ve worked in developer advocacy for about seven years, first with IBM, then Lightning AI, and now most recently with SurrealDB.
What I do in ethical AI and in this space is not related to my work in any way. It’s just a personal interest and a personal hobby, so everything I’ll be sharing with you is based on what I’ve explored personally. Here are some of the sources I’ve used to do that. The first one, probably the most important one, is the example on the laptop in the middle. That’s a project I worked on with the Academy of Technology while at IBM. It’s a group that includes researchers, fellows, and distinguished engineers, and they come together to solve the problems we foresee. One of them is: how do people understand that the systems they’re using use machine learning? Are they using AI, and how are they using the data? For this example, we’ve taken the Titanic use case.
Has anyone here heard of the Titanic dataset? Quick show of hands. I have two people, three, four, five people. Okay. To explain the Titanic use case: the dataset is a dataset of all the passengers who were actually on the Titanic, and you also see all their information, so their class, when they embarked, and which location they joined the ship from. You also see whether they got a life raft, and whether they survived the incident or not. It’s a very good machine learning use case, so you can try it out even with your own data, or once you build a model, you can understand, given any new data, whether you would survive this scenario or not. So we’ve built a UI here showing that, hypothetically, if the passengers of the Titanic were given this sort of UI, they would understand, number one, that there’s a seal of approval on top.
It’s using some machine learning in the background, and you understand a probability score, the probability of receiving a life raft. What’s the top contributing factor, the data point that contributes to this decision the most? You understand your placement, where you are in the queue, and you understand an aggregate score: all the different data factors that contributed to this. You even have an option to appeal the decision. So this is just an ideal use case and scenario, and this is what we hope to see implemented in the real world at some point. My experience on this project, which was a one-year project, heavily informs what I’m going to talk to you about today. Some of the other resources that I think are pretty useful: the first is this book, Invisible Women, about data bias. Has anyone heard of or read this book before?
We have a couple of people in the room. Great. Another one is the Netflix documentary Coded Bias. Anyone seen this one? We have a few hands, about five hands. Very nice. It’s about the bias we see in tech, and it came out around 2020. It’s based on a project called Gender Shades, which back then showcased how the major cloud providers had bias embedded in their facial recognition models. If you want to dig into that deeper, you should definitely check out the documentary. And the last one is a course called Practical Data Ethics by fast.ai. Has anyone here heard of fast.ai? Okay, nice, three or five people. That also I found very useful to help inform the material we’ll be going through today. So that’s my experience in this topic. By profession, I focus on developer advocacy, and I’m a web, mobile, and more of a machine learning engineer when it comes to my technical skillset.
So this is what we’ll be covering today. What’s the current state of the generative AI world? I won’t go into this in too much detail, because we’re experiencing it firsthand; we’ve been through a sort of exponential growth of AI at the moment, and everyone’s starting to see it embedded in the apps and services they’re using these days. The next section is for tech builders. Does anyone here consider yourself a builder, actively contributing to how tech is created? It might be in design, it might be in data, it might be in development. We’ve got about a third of the room. The rest of the talk is for tech users; if you have a smartphone, you’re probably a tech user. So if you’re a tech user, what can you do in this generative AI world? And the last section is resources and what’s next.
So what’s next once you walk away from this talk? So, we are in a generative AI world. This is Ali. She’s a real person, and this is a generated image of her from a prompt through Stable Diffusion. And this is what her day looks like. She uses Spotify for her morning playlist, ChatGPT for her daily plan, Google to recommend a cafe for work, HyperWrite to edit and reply to emails, Maps to drive her around, and she ends the day with Netflix and Amazon. She easily interacts with AI a hundred-plus times a day. And I think there are many of us who can say our day looks very similar to this; you can just swap out the tools, technologies, and apps for any number of apps you might have on your phone. So whether you know it or not, and whether you like it or not, you are interacting with AI on a daily basis.
So now it’s sort of our responsibility to understand how these tools and systems work in the background and what you can do to mitigate some of the harms. This is a quote by Cassie Kozyrkov; she’s chief decision scientist at Google, so related to the data science stream, and she wrote a couple of blogs on generative AI. One of the things I liked in her blog was this quote: for better or worse, many decisions related to the responsible and ethical uses of AI are shifting from the builders to the users themselves. So if you’ve been in tech a while, you might have heard the term move fast and break things. Has anyone heard of this, or even buys into this philosophy? That’s something popularized with startup culture, and you can see it embedded in the AI development culture today as well, right?
In the effort to make profits, and in the effort to get value out of the AI services that are available, people are moving extremely fast without thinking of the consequences. That means big tech has kind of washed its hands of the ethical responsibility, and it’s now in the hands of the user. So now that all these big questions are in the hands of the user, I wanted to ask: how many people here in the room can say they can make an objective decision? Quick show of hands. A couple of shakes of the head, maybe. Okay, alright, pretty solid response. So there was a study by Yale researchers showing that if you perceive yourself as being objective, that is actually correlated with even more bias. So y’all are doing pretty well, right? I’ll tell you a bit about this study.
It was looking into the discrimination that happens in hiring. What they did was give the people in the experiment a bunch of stereotypically male and female roles to hire for. Examples are a police or defense job, and another example is nursing, and they’re given a whole stack of CVs. Now, in these CVs there are lots of male CVs and lots of female CVs, and some of the CVs have impressive academic credentials while some have impressive practical experience, but the people who are making the decision don’t know this. What they found was that people often made a decision assigning male profiles to stereotypically male roles and female profiles to stereotypically female roles, and then came up with an ad hoc justification. So they changed the criteria of success afterwards. They said, of course this person should work in defense because of their impressive practical experience.
Or, of course this person should be a nurse because of their impressive academic credentials. So you can see it’s like a sliding window: the goalpost for success is constantly shifting. Then we can come to how tech is actually influencing your decisions, how you make decisions, and your user behavior. In tech we often look at metrics like engagement or other short-term metrics, and we might be ignoring some long-term metrics. This has led to some tech platforms’ brands being associated with things like flat-earth theories, supremacist movements, privacy scandals, political manipulation, even facilitating genocide, and that’s years of a platform being associated with those things. In tech you also have the problem of incompatible incentives: at the foundation you’ve got maybe recommendation algorithms, and then the AI built on top of that, which might be recommending very divisive content.
For instance, highly positive content like cats and puppies and babies, and then highly negative content that might trigger an emotional reaction. And then they also have these guidelines, which are more of a public relations effort to say: hey, of course our tech cannot recommend content that goes against our guidelines; we have them up on our platform, and we’re doing some sort of content moderation to ensure that no content goes on our platform that goes against these guidelines. But these tech companies have a financial incentive to keep the gap wide between the algorithms and the guidelines. So these are some fundamental issues within the tech world, and you can see this is also affecting our user behavior at the end of the day. There are some deceptive patterns that are slowly altering our behavior over time. There are also quite a few good books about this sort of behavioral psychology. For instance, there’s a nice one I like called Nudge. Has anyone here heard about choice architecture? A couple of nods in the room. It’s about having a default choice available, and how to make a user or someone else take the choice that you want versus the one they want. The default, or the easiest option available, is probably the one you want them to take, and if you want to make something harder for them, they have to take an extra step.
And this is also a quote I like quite a bit. This is by Maciej Cegłowski; he’s a Polish-American web developer, he speaks a lot at conferences, and he’s very well known on Twitter. He says: machine learning is like money laundering for bias. It’s a clean mathematical apparatus that gives the status quo the aura of logical inevitability. The numbers don’t lie. I don’t know if you’ve come across this in your day-to-day lives, but there might be some systems or algorithms that just give you an answer, and it’s hard for you to question why they gave you that answer. I’ll give you an example. Back when I was a developer advocate at IBM, a common issue we’d run into is that when people sign up on the platform, some emails are flagged for fraud. If I go down the rabbit hole and dig into why it’s flagged for fraud,
at some point we find we’re using a third-party service that’s using Visa, which is using some algorithm that flags it for fraud. But even as a developer advocate, I cannot investigate further; I cannot question the system further. I’m just told that an algorithm returned this result, and I have to tell the user: hey, sorry, you can’t use the service with that email. Another thing to look at: recently there was a report put out saying 52% of ChatGPT answers are incorrect when it comes to answering Stack Overflow questions. Some people have heard about this, right? And above 70% of the answers are verbose, so it’s hard to identify the errors, especially if you have to put the code into an IDE and then go through it. So we are running into a lot of issues now with generative AI. What can we do about it?
If you’re someone in tech, there are some good resources you can use. If you’re working in design, there are ethical design resources for different industries. Say you’re working in the medical or the legal industry: these are frameworks that have already been in use at some point, so you can reference them and use them for your own work. Another one is Humane by Design, about design principles. If you’re building for inclusivity, for example, you can find more examples of that on this site. The next one is for anyone working in development or data. Has anyone here heard of model cards?
Okay, we have two hands. Model cards was a project that started at Google, and back in 2020, I don’t know if anyone remembers, Google had this scandal about its AI ethics team leads being let go. One of the people who was let go had created model cards. The good thing is she has now moved to Hugging Face, and now on all the Hugging Face models you can see model cards implemented, and it’s even easier for developers to create model cards. So what are model cards? A model card tells you, for your model, what’s the expected input, the expected output, how the model itself works, and what its performance metrics are: some basics you need to know, like ABCs, before using that model. Model cards have now even been extended to cover data, so you have datasheets and data statements for datasets, you have the model cards and value and consumer labels for models, and you even have system-focused ones.
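As a rough illustration of the Hugging Face model cards mentioned here, below is a minimal sketch of pulling a card programmatically with the huggingface_hub library; the model ID is just an example, and the exact fields a card exposes vary from model to model.

```python
# pip install huggingface_hub
from huggingface_hub import ModelCard

# Load the model card (the repo's README plus its YAML metadata) from the Hub.
# "bert-base-uncased" is only an example model ID.
card = ModelCard.load("bert-base-uncased")

# Structured metadata: license, tags, datasets, evaluation results, etc.
print(card.data)

# The free-text sections: intended use, limitations, bias and risk notes.
print(card.text[:500])
```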
So once that machine learning model is embedded in a system, how do you understand that system end to end? How does it work? These work for machine learning engineers, model developers, students, policymakers, ethicists, data scientists, and impacted individuals. So if you’re a developer, try to find the model card for a model before you use it, and go through it to understand where the data is coming from, whether there’s any bias in it, and whether it’s robust to attacks. That’s one way. The second thing is AI FactSheets. Like I shared earlier, if you want to understand a system as a whole, you can look at AI FactSheets, which are a lot more comprehensive. This is more of a marketing image to explain what it does: it’s like a nutrition label for your machine learning model, but much more comprehensive when you see the data in there.
It shows you the intended purpose, the domain, the data used, the model used, performance, bias, and robustness against attacks. I can share the slides later if you need them. I also tried to bring the different resources a developer would have together into a toolkit. The first one: with the new regulation, especially the EU AI Act, if anyone has heard of that, one of the most important things is that for limited-risk AI models you start to need to explain your model’s decisions, right? So now we can start looking at explainability frameworks. I don’t think I have time to show you these examples one by one; I’ve got a couple of notebooks, and if anyone’s interested I can show you later. One framework which is really cool and very easy to use is called DALEX, D-A-L-E-X, a bit like the Daleks in Doctor Who.
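As a hedged sketch of the kind of explanation DALEX can give, here is roughly how you could wrap a simple classifier trained on the Titanic data (the same use case as the UI earlier) and ask for a survival probability plus the per-feature contributions for one passenger. This assumes dalex’s bundled Titanic sample and is not the exact notebook from the talk.

```python
# pip install dalex scikit-learn pandas
import dalex as dx
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# dalex ships a copy of the Titanic dataset used throughout its docs.
titanic = dx.datasets.load_titanic()
X = titanic.drop(columns="survived")
y = titanic.survived

# One-hot encode the categorical columns, then fit a simple classifier.
categorical = ["gender", "class", "embarked", "country"]
model = make_pipeline(
    make_column_transformer(
        (OneHotEncoder(handle_unknown="ignore"), categorical),
        remainder="passthrough",
    ),
    RandomForestClassifier(n_estimators=200, random_state=0),
)
model.fit(X, y)

# Wrap the fitted model in a DALEX explainer.
explainer = dx.Explainer(model, X, y, label="titanic-rf")

# Survival probability plus the top contributing factors for one passenger,
# i.e. the kind of information the UI described earlier would surface.
passenger = X.iloc[[0]]
print(explainer.predict(passenger))                 # probability of survival
print(explainer.predict_parts(passenger).result)    # per-feature contributions
```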
The next one is safety filters. Think about embedding safety filters in your application: if you might accidentally be generating some content that’s not suitable for some audiences or not safe for work, think about having a safety filter. An example is Stable Diffusion: when it first came out, they had the Rick Astley Rickroll, and any time you generated content that might not be safe for work, that’s what you got back instead. This next one is a new web standard called content authenticity. Now that we’re in a world where content can be remixed, generated, and mixed in with original content, that’s quite tricky, and content authenticity aims to fix that by showing you who the original creator is and where the content has been remixed along the way, before you consume it. This one is probably the most important: identify feedback loops. Any machine learning system is a combination of the data, plus the machine learning model, plus the UI, plus the overall user experience, plus the feedback from the end user back to that system, and that’s what makes the system work. If any of these parts is broken, your whole system is broken, right? So make sure, number one, you have channels to collect feedback, and secondly, specifically for ethics, make sure you’re able to collect feedback on ethics and iterate on that. The next one is more of a workshop you can run while you’re thinking about the design or building out the product: be able to play developer’s advocate, sorry, play devil’s advocate.
There’s your Freudian slip of the day. Be able to play devil’s advocate: think about the terrible people who might be using your product, who will want to abuse, steal, misinterpret, weaponize, hack, or destroy what you’ve built. Who will use it with alarming irrationality or ignorance? What rewards or incentives have you inadvertently built into this product? What can you do to remove those rewards and incentives? So that’s a nice workshop to play with your team. And the last one: I thought these resources are really useful. On GitHub there’s this EthicalML organization, the Institute for Ethical AI and Machine Learning; I think it’s based in the UK, and they have some really nice repositories. One is Awesome Artificial Intelligence Guidelines, which collects the general guidelines that different countries and different organizations have put out, something you can use as practice while you’re building. And the second one is Awesome Production Machine Learning: where have people used ethical ML in production, in the real world? I will leave that for you to explore in detail. It’s a nice rabbit hole to go down.
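Going back to the safety-filter point a moment ago: as one concrete, hedged example, the open-source Stable Diffusion pipelines in the diffusers library ship with a built-in safety checker that flags generations it considers not safe for work. A rough sketch follows; the checkpoint name and prompt are just examples, and it assumes a GPU is available.

```python
# pip install diffusers transformers accelerate torch
import torch
from diffusers import StableDiffusionPipeline

# This checkpoint bundles a safety_checker component, enabled by default.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")  # drop .to("cuda") and the dtype to run (slowly) on CPU

result = pipe("a photo of a cat wearing a raincoat")

# One flag per generated image; when True, the pipeline has already replaced
# the output with a blank image rather than returning the flagged content.
if result.nsfw_content_detected and result.nsfw_content_detected[0]:
    print("Generation was flagged by the safety checker")
else:
    result.images[0].save("cat.png")
```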
What I also wanted to talk to you about are some common objections. Some people in the room might already have some of these objections in their head, but these are the ones I’ve heard the most. The first one is: don’t people know what they’re getting into when they use these services? Everyone accepts some terms and conditions when they decide to use a service, but in general we know that it takes time to read company policies. I think I read a paper or an article that said it would take around 40 hours on average if we had to read the terms and conditions for every service we use. Also, these are not really designed to be readable: most of them are written to be read at a college graduate level, while in America, for example, I’ve understood that most people read at an eighth-grade level.
So that can be quite complicated. And also, if you choose not to use the service, you would need to opt out of much of modern life. There’s no granular control to say, okay, I’m alright with you reading my messages, but I’m not okay with you collecting my location, for instance. So that’s quite complicated. The second one is: I’ve done nothing wrong, I’ve got nothing to hide. Well, with enough surveillance, you can get anyone on anything. Some demographic groups are under higher surveillance and higher scrutiny in some countries; some examples are Muslims in some countries, or immigrants, or anyone who identifies, for instance, as LGBTQ in the Middle East. And the last one is: what about differential privacy? This means anonymizing the dataset, so you’re not sharing any personally identifiable information like the name, the phone number, or the email, but you might have other information in there, like the zip code or where the person works, and if you collect enough of it you can get a rough idea of who this person is. This also assumes that the database owner is the good guy. It treats the harms as individual, not as harms to the community as a whole. It doesn’t consider the alternative of collecting less data, or only the data actually required for the use case. And it gives corporations a means of whitewashing the risks: the corporation just says, hey, we use differential privacy, so your personal data is not being shared.
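To make the re-identification point concrete, here is a small invented example: even with names, emails, and phone numbers stripped out, a few remaining quasi-identifiers can be enough to single someone out. All of the data below is made up for illustration.

```python
import pandas as pd

# A hypothetical "anonymized" dataset: no names, phone numbers, or emails,
# but quasi-identifiers (zip code, age, job) are still present.
df = pd.DataFrame({
    "zip_code":  ["EC1A", "EC1A", "SW1A", "EC1A"],
    "age":       [34, 29, 41, 34],
    "job":       ["nurse", "teacher", "nurse", "developer"],
    "diagnosis": ["asthma", "diabetes", "asthma", "migraine"],
})

# Someone who knows only that their target is a 34-year-old developer
# living in EC1A can recover the "anonymous" record, and the diagnosis.
match = df[(df.zip_code == "EC1A") & (df.age == 34) & (df.job == "developer")]
print(match)  # a single row: the person has been re-identified
```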
So this slide is just to think about the common objections and how to handle them if you come across them in the real world. The next section is for tech users. The first thing I want you to do is know your rights. Many tech companies are going to take advantage of you if you don’t know your rights. This is a quick chart showing the privacy, AI, and data governance laws in the macro environment that might govern your rights related to any tech tool or service you’re using. Some of them relate to data protection, some to privacy shields and how data flows, some to whistleblower protections or digital sovereignty. About two thirds of countries have some sort of privacy law and data governance regulation. This is an excellent site to start to understand some of that; it really simplifies it for you, no long terms and conditions there. You can just click on the interactive map and understand, for your part of the world or the countries you deal with, what the relevant laws and regulations are.
Another thing I wanted to talk to you about, which might seem like extra hoops to jump through, and I just said you shouldn’t need to jump through extra hoops, but I do see it pop up quite a bit in courses like the Practical Data Ethics course, is using the SIFT methodology. When you’re trying to find any information, try to stop, investigate the source, and trace the content you’re reading back to its original quotes to check whether it’s accurately represented. Is the media source you’re consuming reputable? Who are its sponsors? What are its mission and purpose? It’s like taking a second to re-evaluate the content you’re consuming. This, I think, is a very important consideration: are you opting in or opting out of these large models? Remember we spoke about choice architecture and defaults. The default at the moment is that you have to opt out.
If it’s online, is it going to be consumed into a model, even when you never consented or opted in to that? You can try using the site haveibeentrained.com and put in some content you’ve created, or your name, to see whether your data has been included. And this is just one dataset, the LAION dataset, the largest image dataset. The API they use is something called the Spawning API, which you can now see in some Hugging Face models. So it’s a start, it’s a step in the right direction, but it’s still not widely used everywhere; you can’t yet see people consenting to their data being used everywhere. And the last thing I wanted to show you is about ethics washing. I’m going to skip to the fancy bit, to the last slide. Companies can often game AI ethics.
They can use fake ethics and tokenism: having a chart of principles to communicate the importance of ethics in that corporation. They might use obfuscation: talking about ethical principles and social good while delaying action on societal change, impact, and regulation. And the last one is the spin cycle: having maybe an AI ethics influencer talk publicly about AI ethics as the sole action defining the company’s ethics. So while the company might not be doing anything or having much impact, they have someone publicly speaking about AI ethics in general. It’s quite similar to greenwashing, if you’re familiar with that from the fashion industry, where a brand will associate itself with the color green or a sustainable or recycled logo; similar to that, you have ethics washing. There is also a paper called The Ethics of AI Ethics, which I like. It’s from 2021, and it’s really just a frequency analysis examining the principles from different companies and different governments to see which ones are mentioned more and which are mentioned less. Out of the ones mentioned less, here at the bottom, you can see there are hidden costs, for example labeling, content moderation, energy consumption and resources; protection of whistleblowers; diversity in the field of AI; the dual-use problem, where you’ve developed a machine learning system for one use case and it’s being used for something else, like the military or the AI arms race; and the future of employment.
And I think I have about a minute or so, maybe two minutes. Alright, the last question I want to leave you with is: what was the most revolutionary technology of the early 20th century? Where we are in the generative AI space is really still quite nascent, but we are moving at an exponential pace, and I want you to think about what you think is the revolutionary tech of the 20th century that affected everyone. The internet? No. Anything else? Electricity? Electricity is a good one. The iPhone? There are a lot of people who still don’t have access to phones or the internet.
The steam engine? Electricity is a good one, and the steam engine is close, you’re getting warmer. Cars? Sorry, cars? Cars. You’re right, it’s the automobile. This is the Ford Model T, which came out in 1908. The special thing is that it was the first mass-produced and mass-consumed car. With it, there was also a big increase in death rates. So the US government looked at this and said, how do we reduce the death rates? They took a sort of holistic approach over the years: crash avoidance with mirrors, driver assistance with anti-lock brakes, crashworthiness with airbags and crash-test dummies, pedestrian safety with crosswalks, car-to-car safety with traffic lights, and laws like driving under the influence and safety training. They did not say, have ethics training for drivers. They said yes to enforced public policy. So that’s it. That’s the end of my talk today.
I wanted to share some resources if you want to dive into this topic deeper. There’s a good community called All Tech Is Human, and they have a job board, if you’re looking for that, called the Responsible Tech Job Board. This is the course I did, Practical Data Ethics, and all the references for my talk are in the short link. Again, please, please share your feedback. I’m often asked to give this talk, for instance to government officials, so I would really like your feedback to see what you think is important and what I should include next. Thank you.
Kevin Lewis:
Thank you so much for a really, really strong end to our day in this core skills track at DevRelCon. We have five-ish minutes for questions, should there be any.
Naiyarah Hussain:
Yes,
Kevin Lewis:
I will come over there with this mic. Hang on.
Naiyarah Hussain:
Yeah, I should keep disconnected then.
Audience member 1:
Sorry, so this is potentially slightly off topic, but it feels like it’s in the right area. I don’t directly work on a lot of the ML model stuff that my company does, but I’m a pretty heavy consumer of ML for work, in the sense that I use GitHub Copilot and ChatGPT writes most of my thoughts. I’m curious what your thoughts are in terms of disclosure. For DevRel, there’s kind of a promised land where some magic is going to build on my code samples and write tutorials. Do you think there is an onus on folks who create that kind of content to declare that it was mostly written by a model, or do you think, if it’s all just going to be written that way, the default assumption should be that a computer probably wrote it?
Naiyarah Hussain:
A very good question, and more of an ethical question, probably, that would let us dive deeper into ethics, which I didn’t have time for in the talk, but it is there in my notes, so if you want to look into that deeper, you can. I think currently, at the moment, it’s pretty much governed by whether your company has an ethics board and what ethics and principles they follow, and you’re probably held accountable to that. Secondly, it’s your personal ethics: what kind of personal ethics do you have, and do you feel the responsibility and need to disclose it?
Audience member 2:
And what is your take on: if we do not go all in on AI development, no matter the cost, some bad actor will? Because this is an argument that I hear when we say, okay, we shouldn’t develop the next iteration of AI because it could be dangerous, and then somebody comes in and says, no, no, we should, because if we are not in the front row, then someone else with bad intentions from some bad country will develop the next AI generation and we will be left in the dust. What’s your take on this?
Naiyarah Hussain:
What’s my take on this? I’ve already seen a few TED talks about this, for instance people speaking about how they are doing their best in the AI arms race for their country and how they’re defending their country. In one way it’s quite interesting, and it’s quite scary to see this happen, because as an average citizen or an average tech user, you don’t want to be caught in the crossfire, pretty much, right? So I don’t know what to say here. I’m seeing it already happening, and I don’t have an opinion on it. All I can do as a developer advocate is educate developers and educate end users. Thank you.
Kevin Lewis:
Time for one last question. Should there be one? Wonderful, here you go.
Audience member 3:
Thanks for the talk. Yeah, it’s interesting you mentioned the Have I Been Trained site, so you upload an image and it tells you…
Naiyarah Hussain:
Have I been trained? Yeah.
Audience member 3:
Have I been trained? What I’ve observed is that with a lot of content creators, people who create images, completely original works, when you ask them, do you want to opt out, they always say yes: they don’t want their work to be used for training. How do you see that going forward, if everyone is opting out for generative AI?
Naiyarah Hussain:
Right. So I think everyone isn’t opting out, because most people go with the default option, and the way the internet is architected at the moment, especially with the AI stuff and the large models, the default is to opt in. Now people are trying to change that, with the Spawning API and Have I Been Trained, for example. I also saw an article the other day about someone who read the OpenAI documentation where they set out how their bots crawl different sites, like the GPT bots, and how they check the robots.txt file on your site. And what they’ve done is they’ve just blocked the IP addresses, since those were there in the docs, because they don’t want their content to be used to train some large model. Right, round of applause.
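For reference on the crawler point in this last answer: OpenAI documents a crawler user agent called GPTBot that site owners can disallow in robots.txt (it also publishes the bot’s IP ranges, which is what the article’s author blocked). Below is a minimal sketch of what such a robots.txt rule looks like and how a crawler is supposed to interpret it, using only Python’s standard library; the URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks OpenAI's documented GPTBot crawler while leaving
# the site open to everything else.
robots_txt = [
    "User-agent: GPTBot",
    "Disallow: /",
    "",
    "User-agent: *",
    "Allow: /",
]

parser = RobotFileParser()
parser.parse(robots_txt)

print(parser.can_fetch("GPTBot", "https://example.com/blog/post"))        # False
print(parser.can_fetch("SomeOtherBot", "https://example.com/blog/post"))  # True
```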