WEBVTT

1
00:00:00.020 --> 00:00:02.130
Wendy Nilsen: Where we are, live.

2
00:00:10.820 --> 00:00:11.869
David Berkowitz: All right.

3
00:00:12.830 --> 00:00:14.840
David Berkowitz: Well, welcome, everyone.

4
00:00:14.970 --> 00:00:26.370
David Berkowitz: My name is Dave Berkowitz, and I'm the AD, or assistant director, for the U.S. National Science Foundation's Directorate for Mathematical and Physical Sciences, or MPS.

5
00:00:26.630 --> 00:00:35.729
David Berkowitz: And I'm delighted to welcome you to today's Smart Health Frontier Symposium, which has the title Driving Innovation with Fundamental Science:

6
00:00:35.910 --> 00:00:40.260
David Berkowitz: Precision Medicine through Mathematics and Statistics.

7
00:00:40.710 --> 00:00:44.809
David Berkowitz: We have 3 distinguished panelists who'll be speaking today, all

8
00:00:45.050 --> 00:00:52.099
David Berkowitz: at the interface of mathematical and statistical science and biomedical technology.

9
00:00:52.350 --> 00:00:55.020
David Berkowitz: Before we get into their talks,

10
00:00:55.160 --> 00:01:02.389
David Berkowitz: I want to highlight some of the features of the Smart Health program that undergirds their work.

11
00:01:03.020 --> 00:01:08.719
David Berkowitz: We are hosting this symposium to celebrate the 10-plus anniversary,

12
00:01:08.850 --> 00:01:16.460
David Berkowitz: the 10-year-plus anniversary, of the Smart Health program, a multidisciplinary collaboration between the National Science Foundation

13
00:01:16.680 --> 00:01:18.669
David Berkowitz: and the National Institutes of Health.

14
00:01:18.850 --> 00:01:25.290
David Berkowitz: The Smart Health program is designed to accelerate innovative research in this area, bridging the gap

15
00:01:25.400 --> 00:01:29.610
David Berkowitz: between fundamental biomedical research,

16
00:01:29.690 --> 00:01:53.600
David Berkowitz: the mathematical sciences, and transformative biomedical science and public health in the United States.
In so doing, a key component of this has been to create sustainable partnerships among mathematicians, statisticians, scientists, engineers, and biomedical researchers, so that we can ensure the most innovative and transformative

17
00:01:53.670 --> 00:02:00.110
David Berkowitz: research going forward, targeting the most relevant biomedical questions.

18
00:02:01.010 --> 00:02:12.630
David Berkowitz: We would like biomedical researchers to take full advantage of the power of mathematics and statistics as they build out their research directions.

19
00:02:12.740 --> 00:02:17.599
David Berkowitz: Advances in mathematics, AI, and other technological innovations

20
00:02:17.720 --> 00:02:26.419
David Berkowitz: are poised today to make widespread changes in the delivery of medicine and, ultimately, in biomedical research.

21
00:02:26.860 --> 00:02:38.729
David Berkowitz: But the integration of these advancements in fundamental science, MPS science, with health-related science has been somewhat slow, and it is our intent to catalyze this.

22
00:02:39.330 --> 00:02:52.869
David Berkowitz: Today's symposium is going to showcase how advances in math can develop novel approaches to precision medicine that have the potential to be paradigm-shifting in healthcare and improve the health of

23
00:02:52.930 --> 00:03:18.289
David Berkowitz: millions of Americans. Wider adoption of such emerging approaches will not be possible without a principled assessment of these new techniques to ensure their reliability, safety, and fairness. So this seminar will showcase how math, statistics, and innovations in these fundamental sciences help to develop such foundational approaches in precision medicine,

24
00:03:18.410 --> 00:03:39.850
David Berkowitz: attacking such diverse problems as individualized risk assessment for glaucoma, AI-assisted stroke rehabilitation, and predictive models for maternal health for women of color. Together, mathematical and statistical researchers and the biomedical community can bring this research to fruition to generate a healthier

25
00:03:39.960 --> 00:03:53.470
David Berkowitz: United States of America. And with that, I would like to pass the baton to my esteemed colleague, Wendy Nilsen, from the CISE directorate, who will introduce our speakers. Wendy?

26
00:03:53.780 --> 00:04:13.039
Wendy Nilsen: Great. Thank you, David. We're so grateful to be partnering with the MPS directorate as well as with our colleagues at NIH, so it's an exciting effort. We're going to have 3 wonderful speakers today: Carlos Fernandez. Sorry, Carlos, I messed that one up,

27
00:04:13.432 --> 00:04:25.199
Wendy Nilsen: and Annie Qu, and then Christopher Wikle will be presenting. I'm going to introduce each one specifically as they go, but I'm going to take these slides down now.

28
00:04:25.756 --> 00:04:38.199
Wendy Nilsen: And I'm going to let Annie put her slides up while I give you a brief on Annie. Annie Qu, PhD, is Chancellor's Professor in the Department of Statistics at the University of California, Irvine.

29
00:04:38.450 --> 00:05:03.280
Wendy Nilsen: Her research focuses on solving fundamental issues regarding structured and unstructured large-scale data and developing cutting-edge statistical methods and theory in machine learning and algorithms for personalized medicine, text mining, recommender systems, medical imaging data, and mobile health data for complex heterogeneous data. Before joining UC Irvine, Dr. Qu was the Data

30
00:05:03.280 --> 00:05:08.920
Wendy Nilsen: Science Founder Professor of Statistics and the Director of the Illinois Statistics Office

31
00:05:09.000 --> 00:05:28.990
Wendy Nilsen: at the University of Illinois Urbana-Champaign. She also serves as co-editor of the Journal of the American Statistical Association, Theory and Methods, from 2023 to 2025, and as IMS Program Secretary from 2021 to 2027. With that, I'm going to hand it over to Dr. Qu.

32
00:05:30.270 --> 00:05:44.320
Annie Qu, PhD: Thank you, Wendy, for the very kind introduction. I'm also very grateful for this opportunity, which allows me to share my research. Before I start, I'd like to thank Yuri gear for organizing this event.

33
00:05:44.950 --> 00:05:46.130
Annie Qu, PhD: So

34
00:05:47.838 --> 00:06:03.970
Annie Qu, PhD: okay, so first, let me give you some introduction to wearable health technology. Wearable devices are actually transforming our healthcare. More and more people are wearing wearable devices, such as a smartwatch,

35
00:06:03.970 --> 00:06:28.920
Annie Qu, PhD: a smart ring, or a Fitbit. I can just show you: this is a smart ring I'm currently wearing every day. There are a lot of applications for wearable devices, and today I'm going to focus on wearable devices that can track our stress and sleep and also physical activity. So you can see this screenshot; this is actually from

36
00:06:28.920 --> 00:06:44.319
Annie Qu, PhD: my own data for one night. You can see my sleep parameters, which include deep sleep, rapid eye movement time, light sleep, and total duration, and also the minimum heart rate.

37
00:06:44.320 --> 00:07:08.210
Annie Qu, PhD: I don't have all the screenshots here, but for this data you also have heart rate variability, which is a physiological measurement used to track our physiological stress.
So you see, if you wear the smart ring for just one day, you get a lot of information, and you also have step tracking,

38
00:07:08.210 --> 00:07:34.189
Annie Qu, PhD: movement, and exercise intensity. On the right screen, I also show you the mobile app we developed. We call this ecological momentary assessment (EMA) tracking, which is a kind of subjective measurement of our emotion and well-being: how happy you are, how content you are, and so forth.

39
00:07:34.190 --> 00:07:49.640
Annie Qu, PhD: For this type of data, we encounter some unique challenges. I'm going to talk about some of the challenges, probably not all of them, and these challenges also motivate us to develop innovative statistical methodology.

40
00:07:49.890 --> 00:08:03.239
Annie Qu, PhD: First, we realize we have this irregularity due to what we call multi-resolution observation. This occurs when, suppose, you have non-uniform

41
00:08:03.240 --> 00:08:25.799
Annie Qu, PhD: time intervals within each time series. This is not surprising, because always keep in mind that we are doing an observational study. You have no control over when people wear the device and when they don't. People just take off their ring or their watch when they take a shower or charge the battery.

42
00:08:25.800 --> 00:08:49.870
Annie Qu, PhD: We also have varying time intervals across multiple time series, and I'm going to give a little more detail later. In addition, on top of that, we also have a lot of irregularity among subjects, in particular high heterogeneity among the subjects. We also encounter a large amount of missing data,

43
00:08:49.870 --> 00:09:13.500
Annie Qu, PhD: as already mentioned, due to the observational study. And in some projects we also have a very small sample size. For example, for the pregnant women, we were only able to recruit about 20-something subjects, and that's all we have.
So getting a little bit more details about multi resolution time series data. So we have mentioned many 44 00:09:13.500 --> 00:09:37.750 Annie Qu, PhD: measurements, including your movement, your heart rate, your heart rate variability to measure your stress, and also your well-being so very commonly that it's impossible you're going to measure them at the same resolution. There's no point you ask someone, are you happy every 5 min. Right? So it's a totally pointless 45 00:09:37.750 --> 00:10:02.380 Annie Qu, PhD: and also some longitudinal data could have a low resolution due to the technical technical limitation, because some data was calculated based on the other measurements, and therefore you need to require certain cumulated information to do the calculation. And here I give you example about the irregularity and heterogeneity among 46 00:10:02.380 --> 00:10:26.760 Annie Qu, PhD: subjects. So some people are morning Person. Some people are evening person. So you clearly see the stress level really vary across different subjects, and we also notice. On the right figure we notice the observation, time, and the frequency of measurements also fluctuate among the different individuals. So the 1st figure is about the heart rate 47 00:10:26.760 --> 00:10:37.580 Annie Qu, PhD: and the low figures about the stress. You see the different resolution and different time points and different gap, the points, the measurements are missing. 48 00:10:39.860 --> 00:10:48.990 Annie Qu, PhD: So how can we handle these challenges? So I'm talking about data integration in a sense that 49 00:10:49.487 --> 00:11:13.350 Annie Qu, PhD: we have this observed time series with all the irregularity. So the idea is that, can we sort of do some transformation in a sense that we can project this multi-resolutional time series into this dynamic latent space. And therefore this irregularity with a lot of missing data. 

50
00:11:13.350 --> 00:11:37.680
Annie Qu, PhD: can be transformed into more continuous, more informative latent information, and this represents the dynamic latent representation. The reason we can do this is that there are intrinsic correlations among all these time series. Therefore, we can borrow information through these correlations to do the projection.

51
00:11:38.030 --> 00:12:02.839
Annie Qu, PhD: In addition, we can also borrow information across subjects, because clearly certain physiological phenomena are shared among all subjects. For example, if the room temperature is very high, everyone's heart rate is going to increase. So this applies to everyone. And I also learned over time that if you want a good night's sleep, try to

52
00:12:02.840 --> 00:12:13.109
Annie Qu, PhD: keep your room temperature as low as you can tolerate. This really helps your sleep, because it makes your heart rate slow down.

53
00:12:13.450 --> 00:12:14.180
Annie Qu, PhD: Okay.

54
00:12:16.100 --> 00:12:44.769
Annie Qu, PhD: Okay, so I'll give you a little more detail. How can we handle both individualized information and population-wise common information? Here we model this with a decomposition, in the sense that we have a dynamic latent factor, Theta, which uses individual information, and this can capture the individual trend.

55
00:12:44.850 --> 00:13:11.349
Annie Qu, PhD: On the other hand, we can also borrow information from the population of subjects, and this is captured by the latent factor F. Through this kind of multiplication, which we also call the inner product, we are able to capture the interaction effect between the individual effect and the common population-wise information, so we can better explain the data.

56
00:13:11.890 --> 00:13:30.940
Annie Qu, PhD: Okay, so let me give you another example.
When we do this kind of modeling, we are able to achieve something other methods are not able to achieve. Here, the top figure is the heart rate, and the blue dots are the observed training data. So suppose you know all the information,

57
00:13:30.940 --> 00:13:42.570
Annie Qu, PhD: and the lower figure is about the stress. You notice that in one time period the data points are missing; we don't have observations during this time period.

58
00:13:42.570 --> 00:13:49.750
Annie Qu, PhD: Using our method, we are able to capture that during this time period

59
00:13:49.750 --> 00:14:01.539
Annie Qu, PhD: the stress level was actually very high for this subject. So this is very useful. In contrast, other methodologies, such as a deep

60
00:14:01.540 --> 00:14:29.659
Annie Qu, PhD: recurrent neural network (the green line), give a prediction that is pretty much flat. They are not able to incorporate the information from the heart rate to infer that the stress level is also very high during this time period. Now notice that during this time period there's no exercise, so there's no reason for the person to suddenly have a high heart rate, except that the stress level could be very high.

61
00:14:29.660 --> 00:14:54.509
Annie Qu, PhD: Okay, so if you're interested in more details, you can also check our paper. Another thing I want to talk about is the Oura Ring data for pregnant women. We find both homogeneous findings and heterogeneous findings. For example, it's actually not surprising that having more deep sleep is associated with lower stress. And actually, if you have

62
00:14:54.510 --> 00:15:18.699
Annie Qu, PhD: more REM sleep, which is when you have vivid dreams: during rapid eye movement sleep your body is frozen, but meanwhile you could have a vivid dream.
If you have more REM sleep, it links to higher stress. Also, a low resting heart rate links to low stress, older pregnant women tend to have higher stress, and high pre-pregnancy

63
00:15:19.010 --> 00:15:30.779
Annie Qu, PhD: BMI is also linked to high stress. On the other hand, we find some heterogeneity among subgroups. For the subgroup with

64
00:15:31.260 --> 00:16:00.510
Annie Qu, PhD: a very high emotional distress level, measured by the EMA data, we show that walking more, taking more steps, helps to relieve their stress level. But for the moderate and low emotional distress groups, walking more doesn't have a significant impact on lowering their stress level. Here, for the imussd, high is better and low is worse for stress.

65
00:16:00.770 --> 00:16:26.439
Annie Qu, PhD: So, coming to the conclusion and some discussion: today I really focused on stress management. We are aware that chronic stress could be linked to mental health issues and also to chronic disease, even cancer. Therefore, if our methodology can help us interpolate

66
00:16:26.440 --> 00:16:43.879
Annie Qu, PhD: the unobserved time points where the data were missing, so that we're able to make precise stress predictions, this will be useful for better stress management.

67
00:16:43.910 --> 00:17:08.569
Annie Qu, PhD: We also talked about some statistical methodology, for example, how to address heterogeneity, multi-resolution time series data, low sample size, and informative missingness. I want to point out that data integration is key. On the one hand, we can use data integration to alleviate heterogeneity. For example,

68
00:17:08.569 --> 00:17:33.490
Annie Qu, PhD: we can extract shared information from heterogeneous sources, for example, multiple modalities and heterogeneous subjects, for better pattern discovery and interpretation. On the other hand, we can also use data integration to harness the heterogeneity. That sounds like a contradiction, but it is not. We can borrow information across heterogeneous sources

69
00:17:33.490 --> 00:17:49.730
Annie Qu, PhD: to improve individual prediction and enhance precision modeling. We can also do some statistical inference, such as conformal prediction, to quantify the uncertainty of the predictions.

70
00:17:49.780 --> 00:18:16.180
Annie Qu, PhD: So this is the last slide, about the current work. We are also recruiting UCI graduate students and postdocs from STEM fields and health science to gain better insight into their stress variation, using a ring like this and also the EMA mobile app we have developed. So thank you for your attention, and this is the last slide, the acknowledgements.

71
00:18:18.730 --> 00:18:26.170
Wendy Nilsen: Thank you so much. We're going to hold questions till the end. And I think

72
00:18:28.360 --> 00:18:38.600
Wendy Nilsen: we'll hold some questions till the end. Unless you have a specific question about one of the presentations, I think it'll be best if we save them till the end. So

73
00:18:39.147 --> 00:18:41.790
Wendy Nilsen: thank you, Dr. Qu. That was fabulous.

74
00:18:41.960 --> 00:18:45.130
Wendy Nilsen: And I'm going to introduce our next speaker, and

75
00:18:45.230 --> 00:19:05.729
Wendy Nilsen: Dr. Wikle can start putting his slides up. There we go. Christopher Wikle, PhD, is a Curators' Distinguished Professor of Statistics at the University of Missouri. He obtained his PhD from Iowa State University and has been on the faculty at the University of Missouri for 27 years.

76
00:19:05.760 --> 00:19:15.769
Wendy Nilsen: His research specialty is spatiotemporal statistics, with applications to geophysical processes, complex biological processes, and the environment.

77
00:19:15.910 --> 00:19:41.620
Wendy Nilsen: He focuses on developing computationally efficient deep hierarchical Bayesian dynamic spatiotemporal models motivated by scientific principles, with more recent work at the interface of deep neural modeling and statistics. He's a fellow of the ASA, IMS, ISI, and AAAS, and has published 2 award-winning books in spatiotemporal statistics. With that, Dr. Wikle, I'll let you go.

78
00:19:42.830 --> 00:20:03.569
Christopher K. Wikle, PhD: Great. Thank you so much. It's a pleasure to be here to celebrate the 10-year anniversary of this awesome collaborative program between NSF and NIH, and for me to represent my colleagues on this interdisciplinary team. With that, let me just say this is my group, and I'm putting it up at the beginning because

79
00:20:03.570 --> 00:20:15.209
Christopher K. Wikle, PhD: I want you to recognize that it's multiple institutions and multiple individuals across all sorts of disciplines: statistics, engineering, computer science, applied math, math,

80
00:20:15.290 --> 00:20:19.219
Christopher K. Wikle, PhD: and physicians. And

81
00:20:19.270 --> 00:20:37.169
Christopher K. Wikle, PhD: it's a real collaboration in the sense that everybody is actually contributing to everything here, so it's really wonderful. So let me just get started. What's glaucoma? Well, it's a degeneration of your optic nerve and the loss of cells in your retina. And this is sort of what it looks like

82
00:20:37.170 --> 00:20:50.840
Christopher K. Wikle, PhD: if you have it, and as it progresses. You can see your vision starts decreasing from the outside in, and it is the leading cause of irreversible blindness in the U.S. and in the world,

83
00:20:50.840 --> 00:20:59.480
Christopher K. Wikle, PhD: and its prevalence changes, of course, depending on race, ethnicity, location, and age. But it's a major problem.

84
00:20:59.480 --> 00:21:12.250
Christopher K. Wikle, PhD: So how's it diagnosed?
Well, obviously, if you start seeing vision problems like those mentioned here, that would be a clue. But you can also test for that using visual field testing,

85
00:21:12.250 --> 00:21:36.119
Christopher K. Wikle, PhD: and that's a fairly non-invasive procedure; many of you might have had it done before. To really solidify the diagnosis, though, you would have to go in and look at the structural damage, using cameras, for example, to look at the optic nerve, and other things that are very specialized, and a lot of clinics would not have access to that information. So what causes glaucoma?

86
00:21:36.480 --> 00:21:59.670
Christopher K. Wikle, PhD: Well, we don't really know. We do know that it's often associated with increased ocular pressure. Intraocular pressure, or IOP, is just the pressure inside your eyeball, and when that starts pushing on the optic nerve, it can actually damage it. Often IOP is used as a surrogate for glaucoma, and that's because it's really the only treatable factor.

87
00:21:59.750 --> 00:22:02.490
Christopher K. Wikle, PhD: And so when you go to

88
00:22:02.510 --> 00:22:26.099
Christopher K. Wikle, PhD: an optometrist, they'll check you for glaucoma, and what they're really checking is your IOP. But unfortunately, it's not the only risk factor. In fact, it's not a very good one, in the sense that people with high IOP often do not develop glaucoma, and many of the people who have glaucoma don't have elevated IOP. And in fact, if you treat high IOP,

89
00:22:26.100 --> 00:22:35.179
Christopher K. Wikle, PhD: about a third to a fourth of the people who have glaucoma still progress to blindness. So it has poor sensitivity and poor specificity.

90
00:22:35.180 --> 00:22:58.409
Christopher K. Wikle, PhD: So one of the things we're interested in is what other easy-to-measure risk factors might play a role here that would help us, on an individualized basis, to diagnose and project the progression of glaucoma.
It's a multifactorial disease, and many things could be a factor. I'm going to focus on blood pressure because it's so easy to measure.

91
00:22:58.820 --> 00:23:13.489
Christopher K. Wikle, PhD: We know that blood pressure is a risk factor, supported by many studies, but unfortunately these results are not very consistent. In some cases low blood pressure has been shown to be a factor; in other cases high blood pressure has been shown to be a factor.

92
00:23:13.620 --> 00:23:27.289
Christopher K. Wikle, PhD: So one of the main goals of what we're doing here is to understand the balance between IOP and blood pressure, to help us with individual diagnosis and treatment

93
00:23:27.400 --> 00:23:28.250
Christopher K. Wikle, PhD: plans.

94
00:23:28.430 --> 00:23:36.010
Christopher K. Wikle, PhD: So let's look at blood pressure and IOP a little more. Can we identify subgroups of glaucoma

95
00:23:36.280 --> 00:23:46.479
Christopher K. Wikle, PhD: levels, or different stages of glaucoma, through just these 2 measurements? There are studies that have looked at this,

96
00:23:46.480 --> 00:24:03.010
Christopher K. Wikle, PhD: and I'm focusing here on the Indianapolis Glaucoma Progression Study, which was kind of unique because it was one of the few longitudinal studies. We have 7 years' worth of data, every 6 months, on 115 individuals. And so

97
00:24:03.010 --> 00:24:17.390
Christopher K. Wikle, PhD: if you look at those individuals and try to cluster them with regard to IOP and a measure of blood pressure (the mean arterial pressure is just a linear combination of systolic and diastolic pressure),

98
00:24:17.390 --> 00:24:21.479
Christopher K. Wikle, PhD: unfortunately, there are no discernible clusters that

99
00:24:22.130 --> 00:24:27.559
Christopher K. Wikle, PhD: come out of that.
So the answer to this question seems to be no, but

100
00:24:27.790 --> 00:24:39.650
Christopher K. Wikle, PhD: those measurements by themselves don't really tell us what's happening inside the vascular structure of your eyeball. So the purpose of our

101
00:24:39.760 --> 00:25:03.369
Christopher K. Wikle, PhD: project is to look at physiology-informed machine learning to see if it can help with this. The basic idea is that we obtain a suite of physiologically relevant features, and we get them through a mathematical model of the ocular hemodynamics of the eye. People might call this a digital twin these days: we have a

102
00:25:03.370 --> 00:25:14.040
Christopher K. Wikle, PhD: numerical model of the vascular structure of the eye. We use information from the real world to inform it, and then it tells us something about

103
00:25:14.060 --> 00:25:33.620
Christopher K. Wikle, PhD: the model, and we kind of go back and forth on that. Our team has developed that model, and we use the features that come out of it in a machine learning or inferential framework. In particular, we can take inputs that you could easily get

104
00:25:33.660 --> 00:26:02.820
Christopher K. Wikle, PhD: at your clinic, IOP and blood pressure, run them through this mathematical digital twin, get these physiology-enhanced data sets, do some machine learning or your favorite type of clustering, and then show that the result is significant in some sense, leading toward structural and functional progression of glaucoma. And then, finally, we want to find out whether clinicians will actually use this.

105
00:26:03.200 --> 00:26:08.987
Christopher K. Wikle, PhD: So what did we find? Well, in that Indiana

106
00:26:09.700 --> 00:26:25.250
Christopher K. Wikle, PhD: study, in those data:
We found that the 12-dimensional, physiology-enhanced data set does, in fact, lead to discernible clusters. This is projecting that 12-dimensional

107
00:26:25.778 --> 00:26:36.959
Christopher K. Wikle, PhD: cluster space into 3 dimensions so you can visualize it, and you can see there are 3 distinct clusters there. The important thing about those clusters is that we looked into them:

108
00:26:37.100 --> 00:26:44.930
Christopher K. Wikle, PhD: in the study, we actually had other measurements on the subjects, so we could identify the stage of glaucoma they were at.

109
00:26:45.010 --> 00:27:09.050
Christopher K. Wikle, PhD: What we find is that those 3 clusters actually do correspond to significantly different vascular behavior and glaucoma behavior in these individuals. The nice thing is that this suggests, if it holds up, a very simple way that clinicians could actually evaluate this on the spot.

110
00:27:09.050 --> 00:27:31.140
Christopher K. Wikle, PhD: For example, if I take this 3-dimensional cluster cloud and project it into the blood pressure/IOP plane again, you can see the clusters now. They're not perfectly separated in only 2 dimensions, but it's pretty good. In fact, if we use something like a support vector machine to identify

111
00:27:31.260 --> 00:27:53.479
Christopher K. Wikle, PhD: regions in that space, we find that a clinician could actually use this quite simply: they could measure your IOP, measure your blood pressure, convert it to MAP real quickly, and see where you fall in here. Based on that, we know from our results what the progression is likely to be for

112
00:27:53.490 --> 00:28:01.989
Christopher K. Wikle, PhD: where you fall in this group. For example, the green group there is kind of the best group, in the sense that

113
00:28:02.030 --> 00:28:06.660
Christopher K. Wikle, PhD: we find that they typically do not progress

114
00:28:06.890 --> 00:28:24.540
Christopher K. Wikle, PhD: in their glaucoma, whereas the other ones are much worse. What we're doing now is using transfer learning to other studies to see if this holds up, and we're also trying to understand the progression of the disease a little better based on our

115
00:28:25.250 --> 00:28:26.270
Christopher K. Wikle, PhD: physical model.

116
00:28:26.430 --> 00:28:51.070
Christopher K. Wikle, PhD: Another component of this work is uncertainty quantification, which is one of the parts that's dear to me and what I work on a lot. The reason for this is that we have a mathematical model that's deterministic, yet we know there are all sorts of uncertainties in various places, including the inputs. So, for example, if I'm interested in thinking about how somebody's

117
00:28:51.240 --> 00:28:51.930
Christopher K. Wikle, PhD: oh,

118
00:28:52.370 --> 00:29:07.690
Christopher K. Wikle, PhD: hemodynamics in their eye are going to change throughout the day as a function of their blood pressure, heart rate, and IOP, which change throughout the day (we know there's a diurnal cycle; in fact, Annie's talk shows that to some extent, too),

119
00:29:07.690 --> 00:29:30.700
Christopher K. Wikle, PhD: there's uncertainty here. So we could say, "Well, let's just do a Monte Carlo analysis by running our mathematical model many times with these inputs over this diurnal cycle throughout the day," and see what happens. But the problem is, the mathematical model is expensive to run.
So what we do is we build a surrogate statistical model to emulate the mathematical model which is very fast to 120 00:29:31.070 --> 00:29:56.900 Christopher K. Wikle, PhD: to simulate. And so, in particular, it's a what I would call a hybrid, statistical, extreme learning machine sort of hybrid, neural, statistical model, which is the things that my group works on. And then what it gives us is this is just one of the variables from the mathematical model. But now not only do we get uncertainties, but we can start looking at different scenarios very quickly, like what happens under the case where you have high blood pressure 121 00:29:56.900 --> 00:30:04.270 Christopher K. Wikle, PhD: and normal Iop, or high blood pressure and extreme Iop. 122 00:30:04.270 --> 00:30:30.049 Christopher K. Wikle, PhD: and vice versa. You can look at all these different things with uncertainty. And so then we can start actually making some imprint about that, and how things are likely to change. So I just want to emphasize. It would be impossible to measure these things with current technology. Something like the mean blood pressure in the central retinal artery over this many subjects over this amount of time that would not be possible. 123 00:30:30.670 --> 00:30:48.379 Christopher K. Wikle, PhD: So the last thing that I wanted to say that we're doing that I'm excited about is that we also want to know how physicians would react to this, what's their take? And so we did this pilot study. So to understand whether 124 00:30:49.630 --> 00:31:04.969 Christopher K. Wikle, PhD: clinicians and ophthalmologists would use AI if it came about. And you can see some of the questions here and again. This is a small study with only 18 participants. But it's just an idea to get an idea what people would say. 125 00:31:04.970 --> 00:31:24.599 Christopher K. 
Wikle, PhD: You can see from some of those quotes that, if you summarize it, they do believe AI and machine learning are vital to ophthalmology and will inform their practice. But they still think there needs to be a balance between what the computer, the AI, or the statistics tell them 126 00:31:24.600 --> 00:31:36.380 Christopher K. Wikle, PhD: and what they see themselves in their clinical practice. They recognize there are still some challenges in integrating this into their actual practice, even though we've shown it can be quite simple, 127 00:31:36.380 --> 00:31:50.609 Christopher K. Wikle, PhD: and also in making sure that everyone would have access to it. So we have an ongoing study that is trying to understand how ophthalmologists currently use blood pressure to inform their assessment of patients. 128 00:31:51.106 --> 00:32:00.480 Christopher K. Wikle, PhD: We're wrapping up recruitment for that study; it'll be done very soon, and surveys are going out. 129 00:32:00.660 --> 00:32:26.649 Christopher K. Wikle, PhD: So, just to conclude: this sort of physiology-enhanced digital twin, machine learning, statistical approach (we don't have a good title for the whole thing) is very promising, and it's really exciting, because there's all this multidisciplinary collaboration going on for each one of these components. I personally knew nothing about the hemodynamics of the eye, even though I'm 130 00:32:26.650 --> 00:32:37.639 Christopher K. Wikle, PhD: kind of a dynamicist by training. I found it fascinating, so I'm as interested in the mathematical model as I am in the machine learning and the statistics of this project. 131 00:32:37.680 --> 00:32:59.280 Christopher K.
Wikle, PhD: Some of the things we have left to do: transfer learning to other studies; making better use of the uncertainty that comes from our fuzzy clustering mechanism; building a more complicated PDE to simulate the space-time structure of the eye mathematically, and building a statistical emulator of that; 132 00:32:59.360 --> 00:33:05.610 Christopher K. Wikle, PhD: and finishing this second study on ophthalmologists' use of blood pressure. 133 00:33:05.700 --> 00:33:11.432 Christopher K. Wikle, PhD: So that's where we are. Like you said, I just want to emphasize how 134 00:33:11.920 --> 00:33:18.609 Christopher K. Wikle, PhD: fun this project is. If you have any questions, feel free to email me, and I'm happy to send you references on request. 135 00:33:20.830 --> 00:33:41.949 Wendy Nilsen: Dr. Wikle, thank you so much. That was fabulous. And last, but definitely not least, is Carlos Fernandez-Granda, associate professor of mathematics and data science at New York University. During his PhD, he developed a mathematical theory of super-resolution methods based on convex optimization. 136 00:33:42.130 --> 00:33:55.960 Wendy Nilsen: Since he joined NYU, his group has focused on the design and analysis of data science methodology, with particular emphasis on machine learning, motivated by applications in medicine, climate science, and scientific imaging. 137 00:33:56.510 --> 00:33:59.000 Wendy Nilsen: I'm going to turn it over to you now. Thank you. 138 00:33:59.270 --> 00:34:02.260 Carlos Fernandez-Granda, PhD: Thank you very much for the kind introduction. Can you see my slides? 139 00:34:05.440 --> 00:34:06.120 Wendy Nilsen: Yes. 140 00:34:06.250 --> 00:34:15.609 Carlos Fernandez-Granda, PhD: All right, thank you. Today I'm going to talk about a new method for anomaly detection based on model confidence, which we apply to a medical application.
141 00:34:16.040 --> 00:34:21.240 Carlos Fernandez-Granda, PhD: Let me begin with the motivating application. We're interested in stroke. 142 00:34:21.510 --> 00:34:27.490 Carlos Fernandez-Granda, PhD: Stroke, as many of you probably know, corresponds to a lack of blood flow or bleeding in the brain. 143 00:34:27.790 --> 00:34:39.619 Carlos Fernandez-Granda, PhD: Unfortunately, it's a very serious medical problem in the United States and worldwide. In the US, there are currently around 800,000 strokes a year. 144 00:34:40.697 --> 00:34:44.199 Carlos Fernandez-Granda, PhD: Many patients who suffer a stroke 145 00:34:44.310 --> 00:34:57.300 Carlos Fernandez-Granda, PhD: afterwards have to endure serious long-term disability, which is a terrible problem for them and their families. Stroke reduces mobility in more than half of stroke survivors aged 65 and older. 146 00:34:58.440 --> 00:35:09.959 Carlos Fernandez-Granda, PhD: A key challenge that we looked at in this study is how to quantify and/or monitor impairment in stroke patients in a practical way. 147 00:35:10.540 --> 00:35:21.380 Carlos Fernandez-Granda, PhD: So first, I'm going to tell you how impairment is quantified in stroke patients right now. The way it works is that, for these patients, 148 00:35:21.750 --> 00:35:24.210 Carlos Fernandez-Granda, PhD: there basically needs to be a technician, 149 00:35:24.590 --> 00:35:32.469 Carlos Fernandez-Granda, PhD: or rather an expert, who interviews the patients and sees how they move different limbs and 150 00:35:32.820 --> 00:35:40.940 Carlos Fernandez-Granda, PhD: different parts of the body, like their shoulder and their elbow, and essentially writes down the mobility of each of these joints. 151 00:35:41.470 --> 00:35:52.029 Carlos Fernandez-Granda, PhD: This can take up to 15 minutes and, again, requires a trained expert, so it's very costly in terms of both human resources and time.
152 00:35:54.790 --> 00:36:09.279 Carlos Fernandez-Granda, PhD: Current assessment, as I said, is time-consuming and requires an expert. Our goal is to perform this quantification of impairment directly from video or wearable sensor data. 153 00:36:09.720 --> 00:36:38.999 Carlos Fernandez-Granda, PhD: This could enable monitoring patients at a higher time resolution, that is, more often, so they don't have to come into the clinic to have an expert do the assessment. It would perhaps be more objective, as it wouldn't depend on which expert is making the assessment, and it would be affordable, because it wouldn't require an expert to be involved or the patient to go to the clinic. 154 00:36:39.210 --> 00:37:00.490 Carlos Fernandez-Granda, PhD: This is a visualization of wearable sensor data here on the left, and also a video of a patient performing a rehabilitation task. You can see the sensors on the patient's back and arms, and the data on the left correspond to accelerations and rotations of the sensors. 155 00:37:02.110 --> 00:37:08.400 Carlos Fernandez-Granda, PhD: The idea is to automatically quantify the patient's degree of impairment from such data. 156 00:37:10.840 --> 00:37:26.459 Carlos Fernandez-Granda, PhD: Let's see if I can... yeah, okay. We run into a big problem when we try to apply standard machine learning methodology to this challenge, which is that the largest publicly available dataset 157 00:37:26.740 --> 00:37:31.415 Carlos Fernandez-Granda, PhD: consists of data from 51 patients. 158 00:37:32.210 --> 00:37:39.800 Carlos Fernandez-Granda, PhD: To train and test a machine learning model, this is far too little. 159 00:37:40.160 --> 00:37:53.039 Carlos Fernandez-Granda, PhD: And this is a pervasive problem: there is very little data available with the corresponding impairment level of the patients.
160 00:37:53.680 --> 00:37:56.049 Carlos Fernandez-Granda, PhD: Therefore, we had to get a bit creative, 161 00:37:56.580 --> 00:37:58.870 Carlos Fernandez-Granda, PhD: and we developed a framework 162 00:37:59.080 --> 00:38:08.110 Carlos Fernandez-Granda, PhD: that uses AI models that are not trained on stroke patients but rather on a healthy population. 163 00:38:08.720 --> 00:38:20.779 Carlos Fernandez-Granda, PhD: We then use those AI models to quantify the deviation of a patient's movement from normal motion, and that allows us to quantify their degree of impairment. 164 00:38:21.710 --> 00:38:24.279 Carlos Fernandez-Granda, PhD: Let me explain in more detail. 165 00:38:24.500 --> 00:38:37.809 Carlos Fernandez-Granda, PhD: This is an anomaly detection problem because, again, we're measuring deviation from normality: we want to quantify to what extent data are different from our reference population. 166 00:38:38.030 --> 00:38:45.350 Carlos Fernandez-Granda, PhD: We call our method confidence-based characterization of anomalies. You will see in a moment where the confidence part 167 00:38:45.480 --> 00:38:50.370 Carlos Fernandez-Granda, PhD: comes in, and that's why there's a cobra dressed as a doctor here. 168 00:38:50.490 --> 00:39:04.689 Carlos Fernandez-Granda, PhD: The idea is actually relatively simple. We train a model to perform a task that is clinically related to what we're interested in, which in this case is impairment caused by stroke. 169 00:39:05.470 --> 00:39:13.930 Carlos Fernandez-Granda, PhD: We train this model on a healthy population to perform this related task, which is identifying what motions people are doing.
170 00:39:14.620 --> 00:39:22.359 Carlos Fernandez-Granda, PhD: Then we use the model's confidence, when it is applied to a new patient, to determine to what extent 171 00:39:22.560 --> 00:39:31.369 Carlos Fernandez-Granda, PhD: the movements of the new patient are anomalous, that is, to what extent they deviate from normal movement, and that allows us to quantify the degree of impairment 172 00:39:31.530 --> 00:39:32.620 Carlos Fernandez-Granda, PhD: in the patient. 173 00:39:34.060 --> 00:39:44.870 Carlos Fernandez-Granda, PhD: The basic task that our AI model performs is identifying what movements the patients 174 00:39:45.190 --> 00:40:11.930 Carlos Fernandez-Granda, PhD: are performing during rehabilitation. Here I'm showing you a hierarchy of rehabilitation activities. People mimic daily activities, such as dressing, bathing, and meal preparation. These involve functional movements, such as cutting vegetables, tasting sauce, stirring the pot, etc. We are interested in more basic movements: reaching to grab an object, repositioning an object, transporting an object, stabilizing an object, and being idle. 175 00:40:13.450 --> 00:40:19.620 Carlos Fernandez-Granda, PhD: These are some examples of these movements. This is a reach: this patient is going to reach for 176 00:40:20.160 --> 00:40:28.490 Carlos Fernandez-Granda, PhD: an object, and we're going to train an AI model to automatically identify when the patient is doing that, when they're reaching to grab an object. 177 00:40:29.360 --> 00:40:41.380 Carlos Fernandez-Granda, PhD: This is a transport, where the patient is moving an object. In this case, the arm that we see on the right is the arm that was affected by stroke, and it's the one we should look at; they just moved an object. This is called a transport.
178 00:40:41.520 --> 00:40:49.739 Carlos Fernandez-Granda, PhD: Stabilizing an object is holding an object in place without moving it while the other hand is manipulating it. Again, we have to look at 179 00:40:49.930 --> 00:40:51.340 Carlos Fernandez-Granda, PhD: the arm on the right. 180 00:40:52.120 --> 00:40:59.669 Carlos Fernandez-Granda, PhD: And this is idle: basically, the patient is doing nothing with their paretic arm. 181 00:41:01.600 --> 00:41:13.100 Carlos Fernandez-Granda, PhD: So we trained a neural network to automatically identify which of these simple motions were being carried out by the individuals. 182 00:41:13.340 --> 00:41:15.910 Carlos Fernandez-Granda, PhD: This is an example of how our model works. 183 00:41:16.820 --> 00:41:20.950 Carlos Fernandez-Granda, PhD: You have to look at the arm that is enclosed by this red oval, 184 00:41:21.310 --> 00:41:26.610 Carlos Fernandez-Granda, PhD: and the model essentially tries to predict which of these actions is happening at each time. 185 00:41:30.370 --> 00:41:32.210 Carlos Fernandez-Granda, PhD: It works reasonably well. 186 00:41:33.570 --> 00:41:54.329 Carlos Fernandez-Granda, PhD: Now I'm going to get to the confidence, which is crucial for the anomaly detection process. Typically, neural networks, or in general machine learning algorithms or statistical models that classify between different classes, assign probabilities 187 00:41:54.640 --> 00:42:00.169 Carlos Fernandez-Granda, PhD: to each class, conditioned on the data that have been observed. So in this case, 188 00:42:00.330 --> 00:42:19.560 Carlos Fernandez-Granda, PhD: when the video of this patient is observed over a short amount of time, the model might say that it thinks this is a reach with probability 0.1, transport with probability 0.05, reposition 0.1, stabilize 0.07, and idle 0.68. So in this case the probability of idle is highest.
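The arithmetic in this example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speakers' implementation: the class list, the function names `predict_with_confidence` and `subject_score`, and the input format (one list of five class probabilities per time window) are hypothetical. It assumes, as the talk goes on to describe, that the model's confidence is its largest class probability and that a subject's score is the average confidence over a session. Because the probabilities sum to one, with five classes the largest one can never fall below 1/5.

```python
from statistics import mean

# Hypothetical label order, following the five movement classes named in the talk.
CLASSES = ["reach", "transport", "reposition", "stabilize", "idle"]


def predict_with_confidence(probs):
    """Return (predicted class, confidence) for one time window.

    Confidence is taken to be the largest class probability
    (an assumption matching the talk's description, not a published API).
    """
    if len(probs) != len(CLASSES):
        raise ValueError("expected one probability per class")
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CLASSES[best], probs[best]


def subject_score(windows):
    """Average confidence over all time windows of one subject's session.

    Lower values suggest movement further from the healthy training population.
    """
    return mean(predict_with_confidence(p)[1] for p in windows)


# The example from the talk: reach 0.1, transport 0.05, reposition 0.1,
# stabilize 0.07, idle 0.68.
example = [0.10, 0.05, 0.10, 0.07, 0.68]
label, confidence = predict_with_confidence(example)
print(label, confidence)  # idle 0.68
```

In the pipeline described in the talk, the model producing these probabilities is trained only on healthy subjects, and the per-subject average confidence is what gets compared against the expert-scored Fugl-Meyer assessment.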
189 00:42:19.820 --> 00:42:40.209 Carlos Fernandez-Granda, PhD: And this is the class that would be assigned at this time. We can interpret this highest probability as the confidence of the model: if that probability is close to one, the model is very confident; if it's close to 0, the model is not confident at all. 190 00:42:40.720 --> 00:42:59.780 Carlos Fernandez-Granda, PhD: What we realized is that when we trained a model on healthy subjects and looked at its confidences over a rehabilitation session for a held-out healthy subject not previously seen by the model, and compared those confidences to the ones 191 00:42:59.840 --> 00:43:11.670 Carlos Fernandez-Granda, PhD: produced by the model when the data come from an impaired patient, there was a lowering in confidence, because the patients are impaired and 192 00:43:11.960 --> 00:43:25.050 Carlos Fernandez-Granda, PhD: their movements are different from those of the healthy subjects. Here you can see a histogram of the confidences for the stroke patient in red and a histogram of the confidences for the healthy individual in blue. 193 00:43:25.900 --> 00:43:40.019 Carlos Fernandez-Granda, PhD: And that's basically our method: we train a model on healthy subjects, then we apply the model to different patients, and how much the confidence decreases gives us a measure of impairment. 194 00:43:40.880 --> 00:44:08.780 Carlos Fernandez-Granda, PhD: I'm going to finish by showing you the results. Sorry, this went back. These are results on an independent test cohort. On the x-axis you can see our automatic score, which uses this AI model trained exclusively on healthy subjects and is based on this confidence; it's just the average confidence for each subject. On the y-axis you can see the Fugl-Meyer score, which is the score I showed you at the beginning, computed based on a 15-minute interview
It's just the average confidence on each of these subjects and the Y-axis, you can see the Fugel Meyer score, which is this score that I showed you at the beginning, that is computed, based on a 15 min interview 195 00:44:08.780 --> 00:44:13.559 Carlos Fernandez-Granda, PhD: with a trained expert. And you can see that the correlation is extremely high. 196 00:44:14.070 --> 00:44:25.459 Carlos Fernandez-Granda, PhD: This is the same for videos. In the case of videos, the correlation is a bit lower, because there are certain confounding factors such as when the patients are manipulating an object. Some objects are a little bit more difficult to see. 197 00:44:25.760 --> 00:44:44.239 Carlos Fernandez-Granda, PhD: So with that I will finish. We have developed an anomaly quantification method that is based on AI models that are trained exclusively on healthy patients, and we observe high correlation with expert-based metrics. The lessons learned are that model confidence can be very informative about deviations from normality 198 00:44:44.380 --> 00:44:59.350 Carlos Fernandez-Granda, PhD: on average, and the taxiary labels can be very useful, even if they are only available for healthy subjects. In this case these are the labels indicating what motions these healthy subjects were doing. And that's it. These are the papers related to this project. 199 00:44:59.800 --> 00:45:01.400 Carlos Fernandez-Granda, PhD: I want to thank 200 00:45:01.540 --> 00:45:17.869 Carlos Fernandez-Granda, PhD: my co-authors, especially Heidi Shamra, at the New York University School of Medicine, who who led the clinical side of this project, and I really want to acknowledge the support of Nih and Nsf. Without whom this this research would have been impossible. Thank you very much. 201 00:45:19.130 --> 00:45:32.790 Wendy Nilsen: Thank you so much. Dr. Fernandez. Granda. All right. So we have a few minutes left for some questions, so I'd ask all of my speakers to come back on on their cameras. 
And we've 202 00:45:33.210 --> 00:45:44.669 Wendy Nilsen: got a bunch of questions, but I think there are some that really cut across all of these. So I'm going to start with the first question: who are your biomedical collaborators? Are you doing this all on your own, or 203 00:45:45.420 --> 00:45:46.420 Wendy Nilsen: what are you all...? 204 00:45:46.660 --> 00:45:48.489 Carlos Fernandez-Granda, PhD: Carlos, do you want to start? 205 00:45:48.490 --> 00:45:50.340 Carlos Fernandez-Granda, PhD: I just mentioned mine. Heidi 206 00:45:50.690 --> 00:46:03.099 Carlos Fernandez-Granda, PhD: is at the NYU School of Medicine, and she's absolutely crucial for this, because she gathered the data and had the pioneering idea of applying machine learning to stroke rehabilitation. It's really been a wonderful collaboration. 207 00:46:04.470 --> 00:46:05.160 Wendy Nilsen: Thanks. 208 00:46:05.340 --> 00:46:07.520 Wendy Nilsen: Dr. Qu? 209 00:46:07.770 --> 00:46:20.200 Annie Qu, PhD: Yeah, I collaborate with the School of Nursing. We keep track of caregivers and also pregnant women as subjects for the nursing studies. 210 00:46:22.540 --> 00:46:23.749 Wendy Nilsen: And Dr. Wikle? 211 00:46:23.750 --> 00:46:26.261 Christopher K. Wikle, PhD: Yeah. So our collaborators are 212 00:46:28.170 --> 00:46:37.860 Christopher K. Wikle, PhD: at the Mount Sinai School of Medicine: Dr. Alon Harris's group in ophthalmology there. 213 00:46:38.960 --> 00:46:39.650 Wendy Nilsen: Great. 214 00:46:39.800 --> 00:46:40.450 Wendy Nilsen: Thank you. 215 00:46:40.690 --> 00:46:49.659 Wendy Nilsen: I guess the point here is that you can't do this alone. You've got wonderful collaborators to do what you're doing, and that brings out the best in everyone. 216 00:46:49.910 --> 00:46:50.580 Wendy Nilsen: So 217 00:46:51.520 --> 00:46:58.079 Wendy Nilsen: one of the questions that came up here: somebody said it looks like NIH research. And I'm just...
You've all had 218 00:46:58.360 --> 00:47:06.370 Wendy Nilsen: collaborations across... What do you think makes this NSF research? That's a fair question. 219 00:47:07.010 --> 00:47:12.559 Carlos Fernandez-Granda, PhD: Yeah. In my case, an important part of this project was developing anomaly quantification 220 00:47:13.200 --> 00:47:20.154 Carlos Fernandez-Granda, PhD: methodology that is able to identify data that deviate 221 00:47:21.500 --> 00:47:40.049 Carlos Fernandez-Granda, PhD: from normal populations. This is a very fundamental statistical question that also connects to machine learning, because we want to do this with very high-dimensional data. In my opinion, it is fundamental research, as opposed to applied clinical research. 222 00:47:42.290 --> 00:47:49.330 Christopher K. Wikle, PhD: Yeah, in my case, both the development of the mathematical model 223 00:47:50.490 --> 00:47:55.760 Christopher K. Wikle, PhD: and the development of the emulator 224 00:47:56.280 --> 00:48:02.459 Christopher K. Wikle, PhD: of the mathematical model are novel and require novel 225 00:48:02.730 --> 00:48:12.639 Christopher K. Wikle, PhD: mathematics and statistical methods. So I think in that sense it's very much NSF-oriented. It's just that the goal is 226 00:48:13.280 --> 00:48:14.820 Christopher K. Wikle, PhD: much broader than that. 227 00:48:16.190 --> 00:48:41.089 Annie Qu, PhD: For our research, we originally submitted to NSF, but NSF thought it was a great project and also recommended that NCI fund us. So currently our project is funded by the National Institutes of Health, because stress, you know, chronic stress, is related to triggering cancer. So I would say mathematical 228 00:48:41.090 --> 00:49:02.760 Annie Qu, PhD: modeling and machine learning are related to basic science for NSF, and I think NIH cares more about scientific and medical discovery and conclusions. And how?
What's the impact? But without sound mathematical modeling and statistical foundations, we cannot achieve this goal. 229 00:49:04.360 --> 00:49:05.935 Wendy Nilsen: Great. Thank you. 230 00:49:06.500 --> 00:49:35.189 Wendy Nilsen: And just for our audience, I will say that even when the projects that come in through this mechanism are funded by NIH, they're picked for the same reasons NSF picks them. They're looking for the same fundamental science we are, because this is the way they can bring in new scientific ideas and change things. So it's not like there's a separate NIH idea and an NSF idea. It's fundamental science questions driving all of it. 231 00:49:35.946 --> 00:49:49.540 Wendy Nilsen: There's a question here, actually many questions, about missing data. What does your analysis assume? Does it assume data are missing at random, or, as somebody asks, is the missingness informative? How do you all deal with that? 232 00:49:50.370 --> 00:49:50.750 Annie Qu, PhD: Yes. 233 00:49:50.750 --> 00:49:54.299 Wendy Nilsen: So, Dr. Qu, I think it started with your presentation. 234 00:49:54.820 --> 00:50:17.559 Annie Qu, PhD: Yeah. So first, we have to be realistic. We talk about missing data, but let's say we have a chunk of data that's all missing. Let's say today I know what has been going on in the stock market. But if you want me to predict the stock market a month later or a year later, there isn't any information during that period of time, so 235 00:50:17.560 --> 00:50:41.670 Annie Qu, PhD: we cannot guarantee the accuracy. We have to be realistic. Here, the missing data are more like this: we have some information from some measurements, but at certain resolutions they're missing, and you can borrow information from nearby time points, so we can extrapolate or interpolate.
Let's say I can predict pretty well 236 00:50:41.670 --> 00:50:54.259 Annie Qu, PhD: what will happen maybe in the next hour or the next day. So in that sense, it really has to be time-dependent and also resolution- or frequency-dependent. So, 237 00:50:54.330 --> 00:51:20.129 Annie Qu, PhD: on the one hand, we can do it, but we also have to be aware of the limitations. It's not just informative missingness, because when we talk about informative missingness, it's more about whether the observed data are associated with the future, missing ones. In reality, it's sometimes very difficult to verify the missing mechanism. 238 00:51:24.290 --> 00:51:27.229 Wendy Nilsen: Do any of the others have comments on that one? 239 00:51:31.060 --> 00:51:34.987 Christopher K. Wikle, PhD: I mean, I don't really, for this particular study, because 240 00:51:35.810 --> 00:51:43.250 Christopher K. Wikle, PhD: what we've done has been more exploratory at this point, and it's a well- 241 00:51:43.380 --> 00:51:52.439 Christopher K. Wikle, PhD: used dataset, so all those things have sort of been worked out by now. But it can definitely be a problem; it's just not a problem for what we're doing right now. 242 00:51:54.600 --> 00:52:02.549 Carlos Fernandez-Granda, PhD: In our case, it's also not a problem, although the scarcity of labeled data, in terms of 243 00:52:02.780 --> 00:52:13.609 Carlos Fernandez-Granda, PhD: data from stroke patients for whom we know the impairment, was actually a main motivation for applying anomaly detection and trying to use models that are trained on healthy individuals. 244 00:52:15.260 --> 00:52:15.969 Wendy Nilsen: Thank you. 245 00:52:16.340 --> 00:52:26.179 Wendy Nilsen: So there's a comment, again, about collaboration: how do you navigate the balance between computational and health contributions in your work?
246 00:52:26.370 --> 00:52:36.370 Wendy Nilsen: They were asking: do you start with a computational approach and identify an appropriate health application, or do you start with the health side and then build your computation out? 247 00:52:36.760 --> 00:52:41.390 Wendy Nilsen: I'm pretty sure I'm going to get many different answers here. 248 00:52:41.880 --> 00:52:43.159 Wendy Nilsen: Who wants to start? 249 00:52:43.360 --> 00:53:09.190 Christopher K. Wikle, PhD: I'll start with that one. I'm actually late to this collaborative team, in the sense that Dr. Harris and Dr. Guidoboni had been collaborating, I think, for years before this, and it was very much driven by wanting to understand what's actually happening with hemodynamics in the eye with respect to glaucoma. 250 00:53:09.220 --> 00:53:23.050 Christopher K. Wikle, PhD: So it was very much driven by the scientific question and the medical question. Then over time, as data became available, it became a data question as well, and that 251 00:53:23.480 --> 00:53:28.160 Christopher K. Wikle, PhD: opened it up for machine learning and statistics to come in and play a role. 252 00:53:29.970 --> 00:53:54.099 Annie Qu, PhD: Yeah, I can follow Chris's point. I think we were first approached by the domain scientists. Then, during this research, we discovered that you can do some abstract thinking and turn it into an abstract statistics problem. It's real data that's very messy, and you see there are some interesting statistical problems. 253 00:53:54.210 --> 00:54:18.090 Annie Qu, PhD: Later, I also decided to collect the data ourselves, because, compared to data in other domain sciences, which may be more difficult to collect, mobile health data is relatively easy to collect, and then we can do the smart design. We know what kind of data we want 254 00:54:18.090 --> 00:54:38.100 Annie Qu, PhD: to collect and what subjects we want to approach.
So we have better control, even at the beginning. So I got motivated by the domain science, then I got motivated to develop statistical methodology, and later I also realized that we can actually collect the data ourselves. So it's a really fun 255 00:54:38.100 --> 00:54:39.260 Annie Qu, PhD: project. 256 00:54:40.450 --> 00:54:41.780 Wendy Nilsen: Great. Thank you. 257 00:54:42.501 --> 00:55:02.649 Wendy Nilsen: All right. This one started out addressed to Dr. Fernandez-Granda, but I think you'll all get to it. Dr. Fernandez-Granda, it would be great to see how this translates to Parkinson's patients. Have you explored other applications? And I'd love to hear how you all see your work evolving over time. 258 00:55:03.050 --> 00:55:31.160 Carlos Fernandez-Granda, PhD: We have not looked at Parkinson's. I think this is a great suggestion, and we should definitely look into it. We did look at an application to disease severity assessment, because this framework, where you say, well, I have a related task that is relevant, and then I'm going to look at the confidence of a model trained on healthy subjects, can be applied quite broadly. In the application we explored, we had people who suffered from knee osteoarthritis, and we had their knee MRIs, 259 00:55:31.240 --> 00:55:34.929 Carlos Fernandez-Granda, PhD: and we trained a model to segment healthy knees, 260 00:55:35.070 --> 00:55:50.880 Carlos Fernandez-Granda, PhD: and then we applied that model to these patients, and we again observed a correlation between the lowering of the model's confidence and the severity of knee osteoarthritis. But we haven't looked into Parkinson's. I think it's a great suggestion. 261 00:55:54.100 --> 00:56:02.741 Christopher K. Wikle, PhD: Yeah, in our case, definitely: this methodology of using a physiology-enhanced dataset, 262 00:56:03.350 --> 00:56:26.030 Christopher K.
Wikle, PhD: where that enhancement comes from a mathematical model, a digital twin, is, I think, sort of untapped at this point. We are actually doing this in other areas, or starting to, which requires developing new mathematical models for these things as a component of the biology. But yeah, it's super exciting. I think, 263 00:56:26.080 --> 00:56:37.069 Christopher K. Wikle, PhD: in a way, it's just a way to expand the dimension of your input data in a scientifically meaningful way, and I think there are all sorts of ways to do that we haven't even thought of yet. 264 00:56:39.600 --> 00:56:40.260 Wendy Nilsen: Great. 265 00:56:40.430 --> 00:56:42.669 Wendy Nilsen: Do you want to weigh in on this one? 266 00:56:46.620 --> 00:56:47.660 Wendy Nilsen: Dr. Qu? 267 00:56:48.070 --> 00:56:49.296 Annie Qu, PhD: Oh, sorry! 268 00:56:50.180 --> 00:56:51.820 Wendy Nilsen: Do you want to weigh in on this at all? 269 00:56:52.815 --> 00:56:54.809 Annie Qu, PhD: No, I'm okay. I can skip that. 270 00:56:54.810 --> 00:56:55.490 Wendy Nilsen: Okay. 271 00:56:56.113 --> 00:57:15.859 Wendy Nilsen: Somebody's asking about AI, and having worked in the division at NSF that's its home, I have to ask the AI question. It says AI has changed everything: are we expected to conduct AI research, or is a different approach the way to get further ahead? I think you all have interesting ideas about that, because, 272 00:57:17.110 --> 00:57:24.309 Wendy Nilsen: as much as we all love AI, it's not the only thing in the world. So who wants to start? 273 00:57:24.690 --> 00:57:25.350 Annie Qu, PhD: I can. 274 00:57:25.350 --> 00:57:25.980 Wendy Nilsen: That's true. 275 00:57:26.360 --> 00:57:51.260 Annie Qu, PhD: Yeah. I think AI really plays a role in the future, and it already plays a role in our health, for example, the smart ring. What is AI? We can think about it in a statistical way, in terms of the algorithm: an automatic algorithm that observes the data and gives us a suggestion, which I hope
If we can do automatic algorithm based on observe the data and give us some suggestion which I hope 276 00:57:51.260 --> 00:58:16.229 Annie Qu, PhD: it's a sound suggestion, and I think a lot of AI cannot take into about the precision part to the heterogeneity part. It's doing well for the homogeneous information. They can accumulate all a lot of information, a lot of the data and tell what's a general population, but for the individual this is a very, very challenging for AI to personalize AI. 277 00:58:16.230 --> 00:58:32.979 Annie Qu, PhD: It's extremely challenging. And as a statistician. And we can think about, how can we do even personalize a large language model and a personalized machine learning? So this I think it's a future red hot topic. 278 00:58:36.630 --> 00:58:46.949 Carlos Fernandez-Granda, PhD: In in our in our project. Ai is absolutely fundamental to deal with the high, dimensional, wearable sensor and video data with. 279 00:58:47.380 --> 00:59:08.989 Carlos Fernandez-Granda, PhD: with the methods that we used to have before the advent of deep learning. It would be almost impossible to identify these movements from this high dimensional time series, or from video automatically with high accuracy. But at the same time we run into a challenge that comes with using standard AI methods, which is, we cannot just 280 00:59:08.990 --> 00:59:11.600 Carlos Fernandez-Granda, PhD: apply a machine learning methods that automatically. 281 00:59:11.610 --> 00:59:16.529 Carlos Fernandez-Granda, PhD: it predicts impairment because we have 50 labels of impairment in our 282 00:59:16.550 --> 00:59:21.510 Carlos Fernandez-Granda, PhD: data set these 50 stroke patients. Instead, we have to get creative 283 00:59:21.620 --> 00:59:29.209 Carlos Fernandez-Granda, PhD: and combine AI with statistical ideas. 
In this case, these are anomaly detection and confidence-based 284 00:59:29.350 --> 00:59:32.799 Carlos Fernandez-Granda, PhD: methods, in order to use it effectively. 285 00:59:35.940 --> 00:59:47.959 Christopher K. Wikle, PhD: Yeah. And just to echo that, I see two things here. One: in our project, we use whatever tool is best for each component of our project. So we have 286 00:59:48.080 --> 01:00:02.429 Christopher K. Wikle, PhD: deterministic mathematical modeling, we have more traditional machine learning, we have AI components, we have everything: whatever we need to solve that component, and then we integrate them together. And I think 287 01:00:02.760 --> 01:00:16.889 Christopher K. Wikle, PhD: that's what real data analysis and data science is. My other view of it, as a statistician, is that I believe AI has a lot to teach us about modeling, and we have a lot to teach 288 01:00:18.240 --> 01:00:25.430 Christopher K. Wikle, PhD: AI practitioners about modeling as well. And I like this notion of being a hybrid thinker about how those things interact. 289 01:00:27.480 --> 01:00:28.250 Wendy Nilsen: Great. 290 01:00:28.250 --> 01:00:29.240 Wendy Nilsen: Thank you all. 291 01:00:29.726 --> 01:00:33.090 Wendy Nilsen: I think it's 4 o'clock, so I'm going to have to 292 01:00:33.350 --> 01:00:47.200 Wendy Nilsen: shut off our questions. We'll be posting the video, so look for that; you can watch it again and learn even more. I know I really enjoyed this, and I just want to give a rousing... 293 01:00:47.270 --> 01:01:07.359 Wendy Nilsen: It's always hard to give applause, because Zoom is going to kill my clap. But at least my hands are clapping, and I know everybody else that's on is thrilled with this. Thank you. Thank you. Thank you for being here with us, and we look forward to learning more about all the work that you're doing. So thanks, everyone. Thanks for joining us.
294 01:01:07.890 --> 01:01:08.250 Carlos Fernandez-Granda, PhD: Very much. 295 01:01:08.250 --> 01:01:09.479 Wendy Nilsen: Thank you. 296 01:01:09.480 --> 01:01:10.180 Annie Qu, PhD: Bye. 297 01:01:10.180 --> 01:01:10.870 Wendy Nilsen: Bye.