So anyway, I'm going to just briefly say: my name is David Uminsky, I'm the director of the analytics program and director of the Data Institute. I'll say a few more words about the Data Institute upstairs; I hope everyone will come upstairs and join us for food and drink afterwards. But right now it's my great pleasure to introduce Jeremy Howard, who is many things: a serial entrepreneur whose most recent venture is Enlitic, which is bringing deep learning to medicine, and before that he was the president and chief data scientist at Kaggle. I think I'm going to leave it there, though I could keep going. Anyway, let's give a warm welcome to Jeremy.

Thanks very much, David. Everybody can hear okay? Yes? Great. So, you know, my passion at the moment, and for the last few years, has been this area of deep learning. Who here has come across deep learning at some point, heard of it, or knows about it? Maybe a healthy two-thirds, okay, great. It's one of these things which can feel like a great fad, or a great marketing thing, kind of like big data, or the Internet of Things, or all these various things that we have. But it actually reminds me of another fad which I was really excited about in the early 90s, when I was telling everybody it was going to be huge, and that fad was called the Internet. So some fads are just fads, but some fads are fads for a reason. I think deep learning is going to be more important and more transformational than the internet, and hence the title of this talk: it changes everything. Every one of you will be deeply impacted by deep learning, and many of us are already starting to be impacted by deep learning.

Before I talk about that, I want to talk about how people have viewed computers for many years. People have really made computers the butt of jokes for many years, for all the things they can't do. You may remember, from 2009, this was Google's Autopilot, which was their April Fool's joke, and the point of the
joke was, basically: of course computers can't send email. So that was the April Fool's joke of 2009. Going back further, a source of comedy for Douglas Adams was the Babel Fish, which was basically the idea that technology could never be so advanced as to do something as clever as translate language. So he came up with the idea of this fish called the Babel Fish that would translate languages, and so improbably useful was this thing that it was used as a proof of the existence of God in The Hitchhiker's Guide to the Galaxy.

But things have changed in the last year. Suddenly the joke doesn't work anymore, because your computer really can reply to your email. These are actual examples of replies that have been automatically generated by Google Inbox, which is a mobile app for Android and iOS that you can also access on the web. And this is not some carefully curated set of responses for this particular email; in fact, 15% of emails sent in Inbox by Google are now automatically created by the system. So it's actually being very widely used already. Here's another example of what the system does. And indeed, the Babel Fish now exists as well: you can, for free, use the Skype Translator system to translate voice to voice between any of six languages, and they're adding more and more.

Computers are even artists now. This is actually not a genuine Van Gogh, but a fairly impressive imitation. I'm going to give you a little test, which is to figure out which of these are real paintings and drawings, and which ones were done by a computer; they're all pretty sophisticated. Now that you've made your decisions, I'll tell you: the first sketch on the left is done by a computer; the second drawing is done by a computer; and the third one, well, we all know about extrapolating from past events, but tough luck, it was also done by a computer. And you can see the level of nuance the computer has achieved here, realizing that this piece uses a lot of lines
and arcs, and deciding to actually connect this lady's eyebrow to her nose to her shoulder as an arc, and also to have these areas of bursts of color, realizing that her hair bun would be a good place to have a color like that. It's quite a sophisticated rendition of both the style and the content.

So, as you might have guessed, the reason that fiction has become reality, and computers have gone past what was previously a joke, and indeed are now generating art which is very hard to tell apart from real human art, is because of this thing called deep learning. I don't have time today to go into detail about all of the interesting applications, but I do have a talk on ted.com that you can watch if you have 18 minutes and want more information.

But before we talk more about deep learning, let's talk about machine learning. They are not one and the same thing; deep learning is a way of doing machine learning. So machine learning was invented by this guy, Arthur Samuel, pictured in 1956 playing checkers against an IBM mainframe. Rather than programming the IBM mainframe to play checkers, he instead got the computer to play against itself thousands of times and figure out how to play effectively, and after doing that, the computer beat the creator of the program. So that was a big step in 1956; machine learning has been around for a long time. The thing is, though, that until very recently you needed an Arthur Samuel to write your machine learning algorithm for you. To actually get to the point that the machine could learn to tackle your task took a lot of programming effort and engineering effort, and a lot of domain expertise, mainly to do what's called feature engineering. But something very interesting has happened more recently, which is that we now have the three pieces that, at least in theory, ought to make machine learning universal. So imagine if you could get a computer to learn a function, and that function could capture any type of relationship. Now, when you see
the word function, you might think of a mathematical function like a line or a quadratic or something, but I mean function in the widest possible sense: like the function that translates Russian into Japanese, or the function that allows you to recognize the face of George Clooney in digital photos. That's what I mean by an infinitely flexible function. So imagine if you had that, and you had some way to fit the parameters of that function such that it could do anything, it could model anything you could come up with, such as the two examples I just gave. That would be all very well; you'd just need one more piece, which is the ability to do that quickly and at scale. If you have those three things, you now have a totally general learning system, which is what we now have. That's what deep learning is: deep learning is a particular algorithm for doing machine learning which has these three vital characteristics. The infinitely flexible function is the neural network, which has been around for a long time. The all-purpose parameter fitting is backpropagation, which has been around really since 1974, though it was not widely noticed until 1986. Until very recently, though, we didn't have the third piece: fast and scalable has only recently come along, for various reasons, including the advances in GPUs, which were used mainly to play computer games but also turned out to be perfect for deep learning, the wider availability of data, and some vital improvements to the algorithms themselves.

So it's interesting how this is working out. Jeff Dean from Google presented this last week, showing how often deep learning is now being used in Google products and services, and you can see this classic hockey-stick shape showing exponential growth. Google are amongst the first, or maybe the first, at really picking up on using this technology effectively. Basically, what Google did was they set aside a group of people and they
said: go to different parts of Google, tell them about deep learning, and see if they can use it. And from my understanding, from people I know there, everywhere they went the answer was yes; and that's why we now have this shape, and of course the people that original team talked to are now talking to other people, and that also creates growth. So when I say deep learning changes everything, I would certainly expect that in your organizations you would find the same thing: every aspect of your organization can probably be touched effectively by this.

An example: when Google wanted to map the locations of every residence and business in France, they did it in less than one hour. They basically grabbed the entire Street View database (these are examples of pictures from the Street View database), and they built a deep learning system that could identify house numbers and could then read those house numbers, and an hour later they had mapped the entirety of the country of France. This is obviously something that previously would have taken hundreds of people many years, and this is one of the reasons that, particularly for the startups here in the Bay Area, this is important. Deep learning really does change everything, because suddenly a startup can do things that previously required huge amounts of resources.

We've kind of seen a little bit of this before, what happens when an algorithm comes along that makes a big difference, and Yahoo discovered what happens. You may remember when 80% of home pages were Yahoo, back in the day, and Yahoo was manually curated by expert web surfers. Then this company came along and replaced the expert web surfers with a machine learning algorithm called PageRank, and we all know what happened there. Now, this was an algorithm that, compared to deep learning, is incredibly limited and simple in terms of what it can do, but think about the impact that algorithm had on Yahoo. Or think about the
impact the collaborative filtering algorithm had on Amazon versus Barnes & Noble, now that we have really successful recommendation systems. You can see how even relatively simple versions of machine learning have had huge commercial impacts already.

So what can deep learning do? I'll just give you a few examples. A paper last year showed that deep learning is able to recognize the content of photos. There's this thing called the ImageNet data set, which is one and a half million photos, and a very patient human had actually spent the time trying to classify thousands of these photos, and tested themselves, and found that they had a 5 percent error rate. Last year it was announced by Microsoft Research that they had a system which was better than humans at recognizing images; in fact, this number is now down to about 3 percent, and it keeps on dropping quickly. So, point one: computers can now see, and they can see in a range of interesting ways. Anybody here from China will probably recognize Baidu Shitu. Baidu is a hugely popular search engine, one of Google's competitors, and Shitu is their image search system. On Baidu Shitu you can upload a picture, which is what I did here: I uploaded the picture in the top left, and it has come up with all of these similar images. I didn't upload any text, so it figured out the breed of the dog, the composition, the type of background, the fact that it has its tongue hanging out, and so forth. So you can see that image analysis is a lot more than just saying it's a dog (which is what the Chinese at the top says: it's a golden retriever), but really understanding what's going on there, and I'll give you some examples of the extraordinary things that allows us to do shortly. Speaking of Baidu, they have now announced that they can recognize speech more accurately than humans, in Chinese and English at least. So we now have computers at a point where, as of last year, they can recognize
pictures better than us, and now they can recognize speech better than us. Microsoft has this amazing system using deep learning where you can take a picture where large bits have been cut off (in this case it was a panorama that was stitched quite badly; that's the top picture), and the bottom shows how it has automatically filled in its guess as to what the rest might look like. So this is taking image recognition to the next level, which is to say: can I construct an image which would be believable to an image recognizer? That's part of something called generative models, which is a huge area right now. Again, this is freely available software that you can download off the internet. [Audience comment.] Okay, there you go; if I had a deep learning system here, I probably could have looked it up.

So generative models are kind of interesting; this area is in some ways moving more quickly than anything else, and I think it's fascinating. In these pictures here, the four corners are actual photos; the ones in the middle are generated by a deep learning algorithm trying to interpolate between those photos. But you can do more than that: you can then say to the deep learning algorithm, what would this photo look like if the person was feeling differently? And then we can animate that. And if that is not creepy, nothing is. I mean, the interesting thing here is you can see it's doing a lot more than just plastering a smile on their faces: their eyes are smiling, their whole faces are moving. We can even take some famous paintings and slightly change where they're looking, or we can do the same to the Queen, or we can do the same to the Mona Lisa, and you can see her moving her eyes up and down; again, the whole face is moving as well. One of the interesting things about this Mona Lisa example (she's looking pretty shifty now, isn't she?) is that this system was originally trained without having any paintings in the training set; it was only trained with actual photos. And one of
the interesting things about deep learning is how well it can generalize to classes of data it hasn't seen before; in this case, it turns out that it knows how to generate facial movements for paintings.

A lot of people think that deep learning is just about big data. It's not. Ilya Sutskever from OpenAI presented a new model last week in which he showed that on a very famous data set called MNIST (which we'll learn more about shortly, but it's basically digit recognition, a very old, classic machine learning problem), with just 50 labeled images of digits he could train a 99% accurate digit classifier. So we're not talking millions or billions; we're talking fifty. These recent advances that allow us to use small amounts of data are something that's really changing what's possible with deep learning.

It's also turning really anybody into an artist. There's a thing called Neural Doodle, which allows you to whip out your stylus and jot down some sophisticated imagery like this, and then say how you would like it rendered, in what style. In this case it was rendered as Impressionism, and you can see it's done a pretty good job of generating an image which hopefully fits what the original artist had in their head with their original doodle. And it's not just about images; it's about text as well, or even combining the two, in a field called multimodal learning. These sentences are totally novel sentences, constructed from scratch by a deep learning algorithm after looking at the picture. You can see that, in order to construct this, the deep learning algorithm must have understood a lot about not just what the main objects in the picture are, but how they relate to each other and what they're doing.

So I got so excited about this that three years ago I left my job at Kaggle and spent a year researching: what are the biggest opportunities for deep learning in the world? And I came to the conclusion that the number one
biggest opportunity at that time was medicine. I started a new company called Enlitic, and there were four of us, all computer scientists and mathematicians, no medical people on the team, and within two months we had a system for radiology which could predict the malignancy of lung cancer more accurately than a panel of four of the world's best radiologists. This was very exciting to me, because it was everything that I had hoped was possible. It's also always somehow surprising when you actually run a model and it's classifying cancer and you genuinely have no idea how it did it, because of course all you do is set up the kind of situation in which it can learn, and then it does that learning. So this turned out to be very successful; Enlitic today has raised fifteen million dollars, and it's a pretty successful company. And one thing I mentioned earlier, the Baidu Shitu example of taking a picture and finding similar pictures, is doing big things in radiology. It basically allows radiologists to search databases of millions of CT scans and MRIs to find previous patients whose medical imagery looks just like the patient they're interested in, and then they can find out exactly what happened to those patients: how they responded to different drugs, and so forth. So this kind of semantic search of imagery is a really exciting area.

One interesting thing about my particular CV, when it comes to creating a deep learning medical diagnostics company, is not so much what I have done as what I haven't done. This is the entirety of my actual biology, life sciences, and medicine experience. And one of the exciting things for those of you who are entrepreneurs, or are interested in being entrepreneurs, is that there is no limit to what you can hope to do. You recognize a problem that you want to solve, that you care about, and that hopefully maybe hasn't been solved that well before, and you have a go. Really, you
can do a lot. In my case, once I showed that we could do some useful stuff in oncology, we got covered by CNN on one of their TV shows, and then suddenly the medical establishment came to us, at which point we got a lot of help from the medical establishment as well. So you can get this nice feedback loop going. And most importantly, deep learning can also do choreography.

So if you're excited about this and think it all sounds interesting, you might be wondering: where can you learn more? And the answer, you won't be surprised to hear, is the Data Institute. We haven't previously announced this, but I'm going to announce it now: the first, to our knowledge the first ever, university-accredited deep learning certificate will be here at the Data Institute. The second lesson will start in late October, so I invite you all to join. You might be wondering when the first lesson is, and the answer is: it's right now. So let's get started. You came to a university; come on, you could have expected there to be some studying here, no slacking off. The actual course will be seven weeks of two and a half hours each; we don't have two and a half hours right now, so this will by necessity be heavily compressed, and it just won't make as much sense as you might like it to. Don't worry, the MSAN students will certainly follow along fine, but I'll try to make it as clear as possible for everyone.

One of the things that I strongly believe is that deep learning is easy. It is made hard by people who put way more math into it than is necessary, and also by what I think is a desire for exclusivity among certain deep learning specialists: they make up new jargon about things that are really very simple. So I want to show you how simple it can be. Specifically, we're going to look at MNIST, the data set I told you about, which is about recognizing handwritten digits, and I'm
going to use a system called Jupyter Notebook. For those of you that don't code, I hope the fact that this is done in code doesn't put you off; you certainly don't need to use code for everything, but I find it a very good way to show what's going on. So I'm going to make sure that we actually have this up and running... okay, try that... okay.

So I'm going to load the data in. The MNIST data has 55,000 28-by-28 images in it, so we're going to take a look at one. Here is the first of those images; as you can see, it is a 28-by-28 picture. As well as the images, we also have labels, which is just a list of numbers, and you can see that this has in common with pretty much every machine learning data set that you have two things: information that you're given, and information that you have to derive. In this case, the goal with this data set is to take a picture of a number and return what the number is. Let's have a look at a few more: here are the first five pictures and the first five numbers that go with each one. This was originally generated by NIST; they basically had thousands of people write lots of numbers, and then somebody went through and coded into a computer what each one was.

So I'm going to show you some interesting things we can do with pictures. The first thing I'm going to do is create a little matrix here that I've called "top", and as you can see, it's got minus ones at the top of it, and then ones, and then zeros. What I want you to think about is this: what would happen if I took that matrix and shifted it over this first image? I'm going to take this three-by-three and put it right at the top left; then I'm going to move it right a bit, and right a bit more, and go all the way to the end; then I'll start back here on the next row, and go all the way to the
end. At each point, it's overlapping a 3×3 area of pixels, and I want to take the value of each pixel, multiply it by the equivalent value in this matrix, and add them all together. Just to give you a sense of what that looks like: here on the right is a low-res photo, and here on the left is how that photo is represented as numbers. You can see that where it's black there are low numbers, in the twenties, and where it's white there are high numbers, in the two hundreds. So that's how a photo is stored in your computer. And then you can see here we've got an example of a particular matrix, and basically we can multiply every one of these three-by-three sets of pixels by the values in that matrix, and you get something that comes out on the right. That's basically what we're doing: in this case, we're going to take this picture and multiply it by this matrix.

To make life a little easier for ourselves, let's zoom in on a bit of it. Here's our original picture, the first picture, and let's zoom into the top left-hand corner; there it is, okay, that looks pretty good. Alright, so let's think about what would happen if we took that three-by-three matrix and it was over here, or it was over here: what would happen? I want you to try to have a guess at what you think is going to happen for each one of these pixels. What I've done here is I've printed out the actual value of each one of those pixels, and you can see at the top it's all black, it's all zeros, and in the bit where there's a little bit of the seven poking through, there are some numbers that go up to one. So let's try it. It's called correlating, by the way. Let's try correlating my "top" filter with the picture and see what it looks like. Here's the result: you can see at the top it's all zeros, and up here we've got some high numbers, and down here we've got some low numbers. What does that
look like? That's what it looks like. So test yourself: how did you go? Did you figure out what that was going to look like? You can see that basically what it's done, if we look at the whole picture, is it has highlighted the top edges. So that's pretty interesting, right? We've taken something incredibly simple, this three-by-three matrix, we've multiplied it by every 3×3 area in our picture, each time adding it all up, and we've ended up with something that finds top edges. Before deep learning, this was part of what we would call feature engineering: this is basically where people would say, how do you figure out what kind of number this is? Well, maybe one of the things we should do is find out where its edges are.

So we're going to keep doing this a little bit more. One of the things we could do is look at other kinds of edges, and it's quite nice that in Python you can basically take a matrix and say: rotate it by 90 degrees n times. So if I rotate it by 90 degrees once, I now have something which looks like this, and if I do that for every possible rotation, you can see that basically gives me four different edge filters. A phrase that you're going to hear a lot is convolutional neural networks, because convolutional neural networks are basically what all image recognition today uses, and the word convolution is one of these overly complex words, in my opinion. It actually means the same thing as correlation; the only difference is that convolution means you take the original filter and rotate it by 180 degrees. I'm going to prove it to you here: I've convolved my image with my top filter rotated by 180 degrees and plotted it, and you can see it looks exactly the same. So when you hear people talk about convolutions, this is actually all they mean: they're basically multiplying the filter by each area and adding it up. We can do the same thing for diagonal edges; here are our four different diagonal filters, and then I can try taking our first image and correlating it with every one of those.
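The steps walked through so far can be sketched in a few lines of NumPy and SciPy. This is my own minimal reconstruction, not the notebook's actual code: rather than loading the real 55,000-image MNIST set, I fabricate a random 28×28 array, since the filter, the sliding correlation, the `np.rot90` rotations, and the correlation-versus-convolution check are the points being illustrated.

```python
import numpy as np
from scipy.ndimage import correlate, convolve

# Stand-in for one MNIST digit: a random 28x28 array. (Loading the real
# data set is beside the point here; the filtering is what matters.)
rng = np.random.default_rng(0)
image = rng.random((28, 28))

# The "top edge" filter described above: minus ones, then ones, then zeros.
top = np.array([[-1., -1., -1.],
                [ 1.,  1.,  1.],
                [ 0.,  0.,  0.]])

# Correlation: slide the 3x3 filter over the image, multiply the overlapping
# pixels elementwise, and add them up at each position.
corr = correlate(image, top)

# np.rot90 gives the other three straight-edge filters (left, bottom, right).
edge_filters = [np.rot90(top, k) for k in range(4)]

# "Convolution" is just correlation with the filter rotated 180 degrees,
# so convolving with the rotated filter reproduces the correlation exactly.
conv = convolve(image, np.rot90(top, 2))
assert np.allclose(corr, conv)
```

Note that `scipy.ndimage` handles the image borders for you (by reflecting the edge pixels, by default), which is why `corr` comes back the same 28×28 shape as the input.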
And here you can see the output of running it with the top, left, bottom, and right edge filters, and each of the diagonals. So why have we done that? Well, basically, this is a kind of feature engineering: we have found eight different ways of thinking about this particular rendition of the number seven. What we do with that in machine learning is we want to basically create a fingerprint of what a seven tends to look like on average, and in deep learning, to do that, we tend to use something called max pooling. Max pooling is another of these complex-sounding things that is actually ridiculously easy; as you can see, in Python it's actually a single line of code. What we're going to do is take each seven-by-seven area (because these are 28 by 28, that gives us a four-by-four grid of seven-by-seven areas) and find the value of the brightest pixel, the max, in each. This is the result of doing max pooling: you can see that for this top edge there are some really big numbers up here, and for the bottom-left edge there's very little, which is right. So this is like a fingerprint of this particular image.

I'm going to use this now to create something really simple: it's going to figure out the difference between an 8 and a 1, because that just seems like the easiest thing we can do; they're very different numbers. So I'm going to grab all of the 8s out of our MNIST data set, and all of the 1s, and I'm going to show you a few examples of each of them. Hopefully one of the things you're seeing here, if you're not somebody who codes, or maybe you used to and you don't much anymore, is that it's very quick and easy to code: these things are generally one short line. It doesn't take lots of mucking around like it used to back in the days of writing C code. So what I'm going to do now is create this max pooling fingerprint for every single one of my 8s.
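That one-line max pooling can be sketched like this; the helper name is mine, not the notebook's, and the input is a random stand-in array rather than a real edge-filtered digit.

```python
import numpy as np

# 7x7 max pooling on a 28x28 image: view it as a 4x4 grid of 7x7 blocks,
# then keep the brightest pixel (the max) in each block.
def max_pool_7x7(im):
    return im.reshape(4, 7, 4, 7).max(axis=(1, 3))

# Stand-in for one edge-filtered digit image.
rng = np.random.default_rng(1)
edges = rng.random((28, 28))

fingerprint = max_pool_7x7(edges)
print(fingerprint.shape)  # (4, 4)
```

The `reshape(4, 7, 4, 7)` trick works because element `[a, b, c, d]` of the reshaped array is pixel `[7a + b, 7c + d]` of the original, so taking the max over axes 1 and 3 is exactly a per-block max.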
Then I'll show you the first five of them: here are the first five 8s that are in our data set, and what their little fingerprints look like for their top edge, and likewise for each of the other edges. What I can now do is basically say: tell me what the average of those fingerprints looks like, across all of the 8s. So that's what I'm going to do here: I'm going to take the mean across all of the 8s that have been pooled, and here is what that looks like. These eight pictures here are the averages of the top edge, left edge, bottom edge, right edge, and so forth, across all of the 8s in the data set. So this is like our ideal 8. And then we can apply the exact same process to the 1s, and hopefully we'll be able to see that there'll be some differences. And you can see, right, that the 1s basically have no diagonal edges to speak of (it's all very blurry), but they have very strong vertical edges. So what we're hoping is that we can use this insight to recognize 8 versus 1, to build our own little digit recognizer.

The way we're going to do that is: for every image in our data set, we're going to correlate it with each of these parts of the fingerprint. That's basically what this single line of code here does; that's defining that function. Here's an example of taking the very first one of our 8s and seeing how well it correlates with each of our fingerprints; that's just an example. So we're basically at the point where we can put all this together. What I'm going to do is basically say: all right, I'm going to decide whether something is an 8 or not. There's this function called "is an eight", and it's going to rely on the sum of squared errors, which I won't bother explaining here, but a lot of you probably already know what that is. And so basically, if
it's closer to the filters for being a 1 than it is to the filters for being an 8, it's going to decide that it's a 1, and otherwise an 8. So I just test: if the error is higher for the 1s, then it must be an 8, and vice versa. That's basically my little function. Here's an example, right here: it's the very first one of the 8s, and I've tested to see my error for the 8 filters and my error for the 1 filters; you can see my error for the 8 filters is lower than my error for the 1 filters. That looks very hopeful, so let's do it. We can now calculate, for our entire data set of 8s and 1s, "is it an 8?", and also one minus "is it an 8?", in other words, "is it not an 8?". As you can see, this is taking a little while to calculate, because it's basically running on all of them. Okay, so here's the first set: 5,200 times it said yes when it was an 8, and 387 times it said yes when it was actually a 1. So, great: it has successfully learned something that can recognize the difference. And what about "is it not an 8?" Again, it's done a good job: it said yes 8,900 times when it was a 1, and 166 times when it was an 8. These four numbers here are called a classification matrix, and when data scientists build these machine learning models, this is basically the thing that we tend to look at to decide whether they're any good or not.

So that's it; that's the entirety of building a simple machine learning approach to image recognition. How do we make it better? I'm sure you guys can think of lots of ways to make it better, and one obvious way would be to not use the crappy first attempt I had: literally the first eight features I came up with. I'm sure there are a lot of much better features we could be using.
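The whole 8-versus-1 pipeline fits in a short sketch. This is a hedged end-to-end reconstruction: all the names are mine, the eight filters are simple guesses at edge detectors, and the images are random stand-ins (real MNIST 8s and 1s would slot straight in), so the classifications here are meaningless; the structure is what matters: fingerprint by correlate-and-pool, average per class, then classify by sum of squared errors.

```python
import numpy as np
from scipy.ndimage import correlate

# Eight hand-picked 3x3 filters: four straight edges and four diagonals.
top = np.array([[-1., -1., -1.], [1., 1., 1.], [0., 0., 0.]])
diag = np.array([[0., 1., 1.], [-1., 0., 1.], [-1., -1., 0.]])
filters = [np.rot90(f, k) for f in (top, diag) for k in range(4)]

def max_pool_7x7(im):                 # 7x7 max pooling, as before
    return im.reshape(4, 7, 4, 7).max(axis=(1, 3))

def fingerprint(im):                  # correlate with all 8 filters, pool each
    return np.array([max_pool_7x7(correlate(im, f)) for f in filters])

# Stand-ins for the real 8s and 1s from MNIST.
rng = np.random.default_rng(2)
eights = rng.random((20, 28, 28))
ones = rng.random((20, 28, 28))

# The "ideal" 8 and "ideal" 1: the mean fingerprint over each class.
mean8 = np.mean([fingerprint(im) for im in eights], axis=0)
mean1 = np.mean([fingerprint(im) for im in ones], axis=0)

def sse(a, b):                        # sum of squared errors
    return ((a - b) ** 2).sum()

def is_an_eight(im):                  # closer to the 8 fingerprint than the 1?
    f = fingerprint(im)
    return sse(f, mean8) < sse(f, mean1)
```

With real data you would then run `is_an_eight` over every held-out 8 and 1 and tally the four counts of the classification matrix.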
Specifically, there are a lot of much better 3×3 matrices (we call these filters) that we could be using. So that would be one step to make this better. Another would be that it doesn't really make sense to be treating all of the filters as equally important, which is what averaging out how close they are does; maybe some are more important than others. More importantly, though, wouldn't we like to be able to say other stuff? I don't just want to look for a straight edge or a horizontal edge; I want to look for something more complex, like a corner. I'd love to be able to find corners. This is where deep learning comes in: deep learning is a thing that takes this and does all of those things, and the way it does it is by using something called optimization. Basically, what we do is, rather than starting out with eight carefully planned filters like these, we actually start out with random filters, maybe a hundred random filters, and we set up something that tries to make those filters better and better and better. So I'm going to show you how that works.

Rather than optimizing filters, we are going to optimize a simple line. A lot of you have probably looked at linear regression at some time in your life, and we're going to do linear regression, but the deep learning way. Again, this is going to be super simple. The definition of a line is something that takes a slope a, a constant coefficient b, and an x value, and gives you a·x + b. Probably everybody has done that amount of math at the very least. So I have now defined a line. (Again, I'm going to have to restart this... okay.) After we define a line, let's actually set up some data. Let's say the actual a is 3 and the actual b is 8; okay, so that's done. Now we're going to create some random data: 30 random points, where each x is going to be a random number, and each y will be the correct value of y based on this line. So here are my x values, and here are my y values. Now we've generated some data, and the machine learning goal, if
This is the equivalent of figuring out what the optimal set of filters is for my image recognition. It's basically the same thing; it's just that there my filters, well, we have quite a few of them, and here we have just two parameters to make the reasoning simpler, but it's going to work exactly the same way, totally identical. So once you know how to do this, you'll know how to do that deep learning thing I've been describing of actually optimizing those filters. To do it, we do something very similar to what we had before: we basically have to define how we know whether our prediction is good or not, and we'll say our prediction is good if the squared error (again, we're using the squared-error thing) is low rather than high. So that's my squared error. Our loss function (every deep learning algorithm has a loss function) will be the error between the y values that we actually have and the result of applying our linear function. And then I have to start somewhere, with some random numbers, so let's start by guessing that a is minus 1 and guessing that b is positive 1. So if that were the case, what would my average loss be? And it's telling me: on average, Jeremy, you would have been out by 8.6. Okay, so I want to improve that, and the way I improve that is very simple. I basically figure out: if I make a a little bit higher or a little bit lower, and likewise for my b guess, does my loss function go up or down? I've actually got a nice little Excel spreadsheet that does this. I won't go through it in detail now, but basically I've done exactly the same thing: I've got my random x's and y's, I've got my predictions (that's the linear function), I've got my sum of squared errors, and I've literally taken: what's the value of the error if I add 0.01 to a, and what's the value of the error if I add 0.01 to b?
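That spreadsheet-style derivative estimate can be sketched in Python. I'm assuming a mean-squared-error loss here, and all the names are illustrative rather than the speaker's actual code.

```python
import numpy as np

def lin(a, b, x):
    return a * x + b

rng = np.random.default_rng(42)
x = rng.random(30)
y = lin(3.0, 8.0, x)  # data from the true line a=3, b=8

def loss(a, b):
    """Mean squared error: how bad a guess (a, b) is."""
    return ((y - lin(a, b, x)) ** 2).mean()

a_guess, b_guess = -1.0, 1.0
print(loss(a_guess, b_guess))  # the starting guess is way off

# The spreadsheet trick: nudge each parameter by 0.01 and divide the
# change in the loss by 0.01 to estimate the derivative.
eps = 0.01
dloss_da = (loss(a_guess + eps, b_guess) - loss(a_guess, b_guess)) / eps
dloss_db = (loss(a_guess, b_guess + eps) - loss(a_guess, b_guess)) / eps
```

Both estimated derivatives come out negative, which tells us the loss goes down if we increase a or b a bit.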
So what's the change in the error if I add 0.01 to a, or to b? If I divide that change by my 0.01, that gives me what's known as the derivative, and anybody who's done a calculus course will recognize this. So all I need to do now is say: okay, if I increase a by a bit, my loss function goes down; if I increase b by a bit, my loss function goes down; therefore I should increase a and b by a bit. But how do we decide what "a bit" is? We just make it up. It's called a learning rate, and I'm picking a learning rate of 0.01. So here is the entirety of how to do an optimization from scratch; it's just this code here. It's basically saying: calculate my predicted y (that's just my linear function with my a guess, my b guess, and my x), then calculate my two derivatives. You'll see that in this case I'm not doing it that slow way of adding 0.01; everybody who's done calculus will know there's a shortcut to doing this quickly. And in case you're thinking that if you want to do deep learning you're going to have to remember all of your rules of calculus: you don't. In real life nobody does that. If you need a derivative, you go to wolframalpha.com, you type in the thing that you want the derivative of, you press Enter, you wait three seconds, you go to "plain text", you double-click, you copy it, and you paste it. That's what I've done here, and then I pasted that into my code. So that's how we do derivatives. I bet you're glad you spent lots of time learning those stupid rules. Okay, now that I've done that, don't worry too much about this code, but basically what I'm going to do now is animate what happens as we call this update function 40 times, starting with a guess for a of minus 1 and a guess for b of 1.
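Putting it together, the update function with the calculus shortcut (analytic derivatives rather than 0.01 nudges) and a learning rate of 0.01 might look like this sketch. Again this assumes a mean-squared-error loss and is not the speaker's exact code.

```python
import numpy as np

def lin(a, b, x):
    return a * x + b

rng = np.random.default_rng(42)
x = rng.random(30)
y = lin(3.0, 8.0, x)          # data generated from the true line

a_guess, b_guess = -1.0, 1.0  # starting guesses
lr = 0.01                     # the learning rate: how big "a bit" is

def mse(a, b):
    return ((y - lin(a, b, x)) ** 2).mean()

def update():
    global a_guess, b_guess
    y_pred = lin(a_guess, b_guess, x)
    # The calculus shortcut: analytic derivatives of the mean squared
    # error with respect to a and b.
    dloss_da = (2 * (y_pred - y) * x).mean()
    dloss_db = (2 * (y_pred - y)).mean()
    # Step each parameter a bit in the downhill direction.
    a_guess -= lr * dloss_da
    b_guess -= lr * dloss_db

loss_before = mse(a_guess, b_guess)
for _ in range(40):
    update()
loss_after = mse(a_guess, b_guess)
```

After 40 steps the loss has dropped but the fit isn't perfect yet, which is exactly what the animation shows: run it longer and the line keeps closing in on the data.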
I'm going to plot the original data and my line, and let's see what happens. There it is. If I ran it for a bit longer it would, of course, exactly hit, but you can see that the line is getting closer and closer to the data. So just imagine now taking this idea and doing it for each of those filters. And now imagine something further: imagine if these filters themselves became the inputs to a second set of filters, which would allow you to create a corner detector, because I could say "a bit of top edge and a bit of right edge", assuming the original layer did actually decide that edges are interesting. So we can be excited that we just successfully learned the core of deep learning, because it turns out that somebody did this. A couple of years ago they showed the results. They created lots and lots of layers, optimized in exactly this way. This is not some super dumbed-down version; this is it. They did this, and they discovered that on layer one (one difference: they have color images rather than black and white) these are nine out of the 84 or so examples they had on the first layer, and you can see it's decided it wanted to look for edges as well as gradients. What it shows on the right-hand side is examples from real photos (they had one and a half million photos): real examples of nine patches of photo that matched each particular filter. That's layer one. So then what they did (this guy's name is Matt Zeiler) was to say: okay, what would happen if we created a new layer which took these as inputs and combined them in exactly the same way as we did with pixels? He called it layer 2, and layer 2 is a little bit harder to draw, so instead he draws nine examples of how each filter gets activated by various images.
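The "filters feeding into filters" idea can be sketched by hand: take a top-edge filter and a right-edge filter, and let a second layer combine their outputs into a crude corner detector. This is my own toy illustration of the principle (hand-built filters on a synthetic image), not Zeiler's actual learned network.

```python
import numpy as np

def conv(img, filt):
    """Valid-mode 2D cross-correlation with a 3x3 filter."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = (img[i:i + 3, j:j + 3] * filt).sum()
    return out

def relu(m):
    return np.maximum(m, 0)

top_edge = np.array([[-1, -1, -1],    # dark above...
                     [ 0,  0,  0],
                     [ 1,  1,  1.]])  # ...bright below
right_edge = np.array([[1, 0, -1],    # bright left...
                       [1, 0, -1],
                       [1, 0, -1.]])  # ...dark right

# A 10x10 image containing one bright square.
img = np.zeros((10, 10))
img[2:8, 2:8] = 1.0

# Layer 1: two edge feature maps.
t = relu(conv(img, top_edge))
r = relu(conv(img, right_edge))

# Layer 2: a unit that fires only where BOTH edges are present,
# which happens near the square's top-right corner.
corner = np.minimum(t, r)
i, j = np.unravel_index(corner.argmax(), corner.shape)
```

The strongest corner response lands at the top-right corner of the square, even though neither first-layer filter alone knows what a corner is. Stacking more such layers is what gives each level its exponentially richer features.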
And you can see in layer two it's learned to find lots of horizontal lines and vertical lines, and it's learned to find circles; indeed, if you look on the right, it's already basically got filters that respond to quite specific things. So layer two is finding circular things, stripy things, edges, and, as we hoped, corners. So what is layer 3 going to do? Layer 3 is going to do exactly the same thing, but it's going to start with these (this is just a handful out of the 60 or so filters here), and so each layer gets exponentially more sophisticated in what it can do. By layer 3 we already have a filter which can find text, and a filter that can find repeating patterns. By layer 4 we have a filter which can find dog faces. By layer 5 we have a filter that can find the eyeballs of lizards and birds. The most recent deep learning networks have over 1,000 layers, so you can imagine each of these exponentially improving levels of semantic richness. And that is why these incredibly simple things I showed you, convolutions plus optimization applied to multiple layers, can let you understand speech better than a human and recognize images better than a human. So that's basically the summary of why deep learning changes everything, and if you want the rest of lesson 1, and an overview, and then lesson 2, come along in late October. Thanks. [Applause]