Smart city lifestyle sensing, big data, geo-analytics and intelligence for smarter public health decision-making in overweight, obesity and type 2 diabetes prevention: the research we should be doing

The public health burden caused by overweight, obesity (OO) and type-2 diabetes (T2D) is very significant and continues to rise worldwide. The causation of OO and T2D is complex and highly multifactorial rather than a mere energy intake (food) and expenditure (exercise) imbalance. But previous research into food and physical activity (PA) neighbourhood environments has mainly focused on associating body mass index (BMI) with proximity to stores selling fresh fruits and vegetables or fast food restaurants and takeaways, or with neighbourhood walkability factors and access to green spaces or public gym facilities, making largely naive, crude and inconsistent assumptions and conclusions that are far from the spirit of 'precision and accuracy public health'. Different people and population groups respond differently to the same food and PA environments, due to a myriad of unique individual and population group factors (genetic/epigenetic, metabolic, dietary and lifestyle habits, health literacy profiles, screen viewing times, stress levels, sleep patterns, environmental air and noise pollution levels, etc.) and their complex interplays with each other and with local food and PA settings. Furthermore, the same food store or fast food outlet can often sell or serve both healthy and non-healthy options/portions, so a simple binary classification into 'good' or 'bad' store/outlet should be avoided. Moreover, appropriate physical exercise, whilst essential for good health and disease prevention, is not very effective for weight maintenance or loss (especially when solely relied upon), and cannot offset the effects of a bad diet. 
The research we should be doing in the third decade of the twenty-first century should use a systems thinking approach, helped by recent advances in sensors, big data and related technologies, to investigate and consider all these factors in our quest to design better targeted and more effective public health interventions for OO and T2D control and prevention.

Supplementary Information: The online version contains supplementary material available at 10.1186/s12942-021-00266-0.

• Overweight:
-- China has the largest overweight population in the world, bumping the United States to second place
• Obesity:
-- China has the highest numbers of obese children in the world
-- The United States followed by China have the highest numbers of obese adults in the world
• Overweight and obesity are a documented leading risk factor of major non-communicable diseases, including type 2 diabetes (and its many complications), heart disease, joint disease, and certain types of cancer and dementia
• Obesity drives the high prevalence of type 2 diabetes in China (and elsewhere in the world); the two conditions are syndemic
• China now has the world's largest and fastest growing diabetes epidemic: recent prevalence figures in China are ~11% for diabetes and ~36% for prediabetes
• Up to 10% of people with prediabetes will progress to diabetes each year
• Asians are at increased diabetes risk at lower body weights (i.e., at lower levels of overweight and obesity, about 7 kg lower) than people from other parts of the world, as they tend to accumulate more visceral fat within the same BMI range compared with Westerners
• The resulting burden of disease, death and disability and the costs of managing it are huge, with significant negative economic and productivity impacts
• Our representative case study city could be Guangzhou (population >13M), with the option to include an additional city (in China or elsewhere), if possible and if the corresponding data/access are available to us. We can then produce some interesting comparisons between the two cities once our demonstrator is up and running*
• Determine, and regularly update details of, the most problematic neighbourhoods, population groups in most need, and where and how to best target them with appropriate public health interventions/campaigns
• Geo-tagged data are essential to realise this vision!

* Expected Technology Readiness Level at the end of demonstrator development and evaluation is TRL6 or TRL7

Demonstrator development and evaluation
Task clusters
* Task cluster 1: Agree a comprehensive list of key population lifestyle data that have a high correlation to the prevalence of OO and T2D
-- Later in this slide set, I will provide some examples of relevant population lifestyle data that can be collected (just to help kick-start this task; this is not meant to be an exhaustive list of all items one should/can consider)
* Task cluster 2: Identify the most efficient and appropriate/reliable sources, modes and frequency for collecting different lifestyle data
-- e.g., in real-time vs. every two or three months, depending on the nature/source of the data and their mode of collection. Later in this slide set, I will introduce some of the challenges faced when using wearables
* Task cluster 3: Big data issues
-- Big data governance and management, data and metadata standards (esp. OGC http://www.opengeospatial.org/standards), data quality, ontologies and data warehousing, secure cloud storage, compliance with data protection regulations, individual data privacy issues if non-aggregate data are collected, data sharing agreements with providers, etc.
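One possible way to approach the individual data privacy issue named under Task cluster 3 is to release only area-level aggregates and to suppress areas that contain too few individuals. The following is a minimal Python sketch under stated assumptions: the record layout (area code, BMI), the function name `aggregate_by_area` and the minimum cell size of 10 are hypothetical choices made for illustration and do not come from the proposed demonstrator.

```python
from collections import defaultdict
from statistics import mean

# Assumed, illustrative threshold: areas with fewer individuals are suppressed.
MIN_CELL_SIZE = 10


def aggregate_by_area(records, min_cell_size=MIN_CELL_SIZE):
    """Aggregate individual (area_code, bmi) records to area level.

    Returns {area_code: {"n": count, "mean_bmi": value}}. Areas with fewer
    than `min_cell_size` individuals are reported with None values so that
    small groups cannot be singled out.
    """
    groups = defaultdict(list)
    for area_code, bmi in records:
        groups[area_code].append(bmi)

    aggregates = {}
    for area_code, values in groups.items():
        if len(values) < min_cell_size:
            aggregates[area_code] = {"n": None, "mean_bmi": None}  # suppressed
        else:
            aggregates[area_code] = {
                "n": len(values),
                "mean_bmi": round(mean(values), 1),
            }
    return aggregates
```

Only the output of such a step, not the individual records, would then need to be shared with dashboard users; the appropriate threshold would depend on the applicable data protection regulations.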
User requirements analysis (should form part of Tasks 1, 2, 4 and 5)
• Requirements analysis for public health professionals (the users of the planned dashboards): what are their information/intelligence needs in relation to OO and T2D prevention?
• We need to investigate and learn about their current (conventional) workflows: their current public health interventions in OO and T2D, and the sort of information that feeds into the design of those interventions and is used in (manually) monitoring their implementation/success; what kind of information will help them do a better job and improve their workflows, ultimately resulting in better public health outcomes with reduced costs; etc.
• Ensure continuous user involvement throughout the whole demonstrator development effort from start to end, and not just during the user requirements analysis and evaluation stages
• Develop iteratively, and regularly seek and incorporate users' feedback as the work progresses
• Fully understanding our users' needs will be key to the successful prioritisation of demonstrator choices and developments/phases in a climate of finite (limited) resources and budgets

Environmental factors, such as the various types of environmental pollution, as well as socioeconomic and occupational factors, including household income, influence and are influenced by population lifestyles through a wide range of different mechanisms and pathways (Kamel Boulos et al., 2021).

Activity spaces (present and past): Humans are not confined to a single address point/postcode all the time/all their life, and diseases are often the product of both present and past exposures. "If time spent walking outdoors and biking is relevant for the exposure to environmental factors, then relying on (just) the home (or work/school) address as a proxy for exposure location may introduce misclassification" (Klous et al., 2017).

Some clipart depictions on the right are not scientifically accurate. For example, recent peer-reviewed studies have shown that a large glass of 100% pure no-added-sugar fruit juice a day increases the risk of premature death, and that excessive/extreme exercise can lead to irreparable cardiovascular damage and premature death (even in young people)! 'In moderation' is the key phrase here!
Lifestyle data and data sources (Tasks 1 and 2)
• Some of the factors/items on the previous slide are easy to collect from large populations or infer from existing big data that are routinely collected for other purposes (e.g., semi-annual clinical data aggregates from citywide hospitals/clinics, census data [every 10 years], etc.), with minimal or no individual subject inconvenience; some are more difficult/costly to collect or require surveys of representative samples of the population or other types of population sampling
• Mobile crowdsensing: mobile data and telemetry collected without any, or with minimal, user intervention, e.g., obtained from general smartphone apps already in wide use or from purpose-built apps (with or without sensors such as a fitness/sleep tracking band) introduced in a suitable public health campaign,* with appropriate incentives, such as free mobile minutes or data, free tracking band, etc., to encourage large numbers of people to download them (those same purpose-built apps can also serve as a vehicle for serving tailored public health interventions to the public)
• Data sampling rates: in real-time, or near-real-time, or non-real-time (e.g., updated monthly or 2-4 times every year or yearly). Some data are not available in real-time, while other data, though available in real-time, will not be needed, for our purposes, at that frequency and can be reasoned with in monthly or less frequent aggregates

* e.g., noise pollution monitoring by citizens (app) http://noisetube.net/ and air quality monitoring by citizens (bike sensors + app) http://cambikesensor.net/ and the Air Quality Egg (IoT device for crowdsourced citizen monitoring of airborne pollutants) https://airqualityegg.com/
* Example of a public diabetes portal introduced in 2019 by England's NHS that will not only benefit individual patients, but also accumulate big health data that can inform future population interventions https://www.dailymail.co.uk/health/article-7076217/Diabetes-patients-encouragedcare-health-using-personalised-website.html

• Neighbourhood food POI segmentation, such as outlets serving ultra-processed food, high GI (glycaemic index) food and food high in high-fructose corn syrup (HFCS) vs. those selling unprocessed or minimally processed foods and low GI food; data on alcohol and tobacco outlets and on red meat, vegetables/fruits and soda/sugary drinks/fruit juice outlets are also all important; etc.
• Data sources: Population surveys (food types and quantities, tobacco, smoking, etc.); city segmented consumption levels (incl. avg. per capita); city segmented food and drink sales (e.g., major supermarket chain(s), food factories, fast food chains, etc.); alcohol and tobacco sales (e.g., annual sales of pure alcohol in litres per person aged 15 years and older) and related aggregate data derived from hospital records; meat and processed meat consumption/sales/outlets, esp. red meat (high consumption of red meat/animal saturated fats = increased risk of T2D); vegetables and fruit/plant-based food consumption levels/sales/outlets; drinks containing HFCS, soda (even 'diet' varieties with artificial sweeteners), other sugary drinks and fruit juice consumption levels/sales/outlets; etc.
• An important research gap in the current research literature is the segmentation of food consumption in large populations (per capita) by food type, and the associations of this segmentation/segmentation patterns with different population characteristics and of course overweight and obesity
• It is the type (and relative quantities) of diet that matters the most in overweight and obesity, and not just the total number of calories consumed, the amount of physical activity undertaken, or the proximity to supermarkets or KFCs. The same food store, e.g., supermarket, KFC or other fast food outlet/takeaway, can often sell/serve good and bad calories, healthy and non-healthy options/portions. 100 calories from a 250ml sugary beverage and 100 calories from a handful of nuts are not the same. The segmentation of calories is essential, as not all calories have equal effect with respect to obesity, or as they say 'a calorie is not a calorie'
• The type of food outlet/food type sales proportions per outlet, e.g., outlets selling (and proportion of sales of) unprocessed vs. highly processed food; the specific types of food bought by different households as determined by their socioeconomic levels (supermarket loyalty cards/apps are one possible data source here); etc., are the factors that really matter, and the sort of detail resolution we should be investigating and incorporating in our dashboards
• N.B. Individual (and population groups') dietary/consumption behaviours (amounts consumed per individual snack/meal/day or per family) are also important, as even the healthiest options can prove unhealthy when overconsumed

Food type segmentation - Not all calories are equal
• Data about socio-economic/demographic profiles, e.g., household income and neighbourhood affluence ('social class' in Western societies) should be factored in, as they often dictate poor diet composition and poor nutrition/food consumption behaviours, both of which are core issues in overweight and obesity
• Both low cost food and high cost food types/quantities that are typically associated with different socioeconomic levels can result in overweight and obesity and can act as a risk factor for T2D = both poorer and richer neighbourhoods are at risk, but in different ways
• Geo-tagged aggregates of social media streams can be mined and analysed for content, including images, that promotes unhealthy lifestyles/eating or smoking vs. healthy lifestyles, as well as for the prevailing mood of populations in various parts of a country or region.
The knowledge gained can then be used to inform the design of appropriate/targeted public health interventions, including campaigns delivered on social media, which can reach out 'virally' to large numbers of people at minimal costs

• Physical activity POIs (points of interest) and neighbourhood walkability indicators
-- Different factors (= additional relevant data sources to consider) affect neighbourhood walkability and the physical activity levels of its residents, including pollution (air, light and sound [noise] pollution), crowdedness, crime rates, traffic/road network safety, availability and positioning of green spaces, covered sports spaces and gyms (esp. important during temperature extremes and heavy rain/snow seasons), age and gender differences, etc.
-- N.B. Road traffic, noise, and fine airborne particulate matter (fine dust) are also risk factors in T2D independent of neighbourhood walkability/physical activity levels
-- Also data aggregates obtained by special agreements with mobile telecoms (population daily trajectories/mobility data) and app developers, e.g., WeChat WeRun applet (https://blog.wechat.com/tag/werun/), Mi Fit and similar cloud-connected fitness apps that use smartphone GPS and accelerometer data, and which very many people are already using (geo-tagged big data aggregates about population physical activity levels and patterns)
• Population steps data (averages per person/per day) should be considered adequate (cut-off point) at ~7,000 steps/day or 150 minutes of moderate exercise/week (UK Guidance) rather than the popular figure of 10,000 steps/day, which has limited scientific basis*
* e.g., Lee et al.
• Note: Physical activity is undoubtedly critical for good health and disease prevention, including T2D prevention, but is not that good for calorie burning/offsetting excessive calories in diet (our bodies are very energy-efficient machines)! Time to lay to rest the over-simplistic and inherently flawed 'calories in/out imbalance' model
• Unlearning bad eating behaviours/habits and avoiding HFCS and ultra-processed foods is a healthier, more effective and sustainable long-term strategy than just attempting to burn a couple hundred more calories through exercise

"There's no health guidance that exists to back the figure of 10,000 steps/day" -- Mike Brannan, national lead for physical activity at Public Health England

Notes
• When it comes to physical activity (PA) tracking apps on smartphones or dedicated wearables/bands (steps/distance, etc.), there is a growing body of research evidence and other reports in the grey literature about the accuracy/inaccuracies of their measurements, which can vary by as much as 50% or more between different models within the same brand/manufacturer and across different brands/manufacturers, e.g., the distance recorded by a Huawei Watch 2 Sport is typically half that reported by a Samsung Gear S2 for the same individual and activity session! Also, automatic exercise type detection/segmentation (
• These inaccuracies would be more relevant and pose potential problems when using the measurements for individual-level interventions, but are less important in population-level interventions (as in this demonstrator), where all we need are general population PA levels and population trends/changes over time from crude aggregate data from large population samples (devices overestimating PA and those underestimating it are likely to offset one another to some degree)
• Aim for not less than 7,000 steps/day (a reasonable cut-off figure for our population analyses)

Precision and accuracy public health: community-level physical activity (PA) indicators obtained from population app aggregates

• Informed public health decision-making (precision and accuracy public health):
In the future, the ideal PA planning and tracking app(s) would expose an API (application programming interface) with suitable privacy provisions/guarantees, policies and data sharing arrangements to enable public health authorities to obtain population aggregates of PA levels and trends over time amongst their target populations (indicators of community physical activity level/status), e.g., average steps (or, if Google Fit is used, 'Move Minutes' and 'Heart Points') per person/per day in different age groups and country regions/city neighbourhoods; cumulative steps/Move Minutes and Heart Points per region/neighbourhood for a given period (can be monitored and compared every few months; can also be normalised by population number to compare different regions/neighbourhoods). This feature should help public health authorities devise superior interventions, better target them and monitor their effect over time, making adjustments to the interventions as necessary.
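The community-level indicators described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (neighbourhood, user, date, steps), the function name `neighbourhood_step_indicators` and the neighbourhood labels are hypothetical, and no real app API is being called. The 7,000 steps/day cut-off is the one suggested in this slide set.

```python
from collections import defaultdict

STEP_CUTOFF = 7000  # steps/day cut-off suggested in the text


def neighbourhood_step_indicators(daily_records, populations):
    """Compute crude community PA indicators from app step aggregates.

    daily_records: iterable of (neighbourhood, user_id, date, steps),
                   one row per user per day (hypothetical layout).
    populations:   {neighbourhood: resident population}, used to normalise
                   cumulative steps so that neighbourhoods can be compared.
    """
    total_steps = defaultdict(int)
    person_days = defaultdict(int)
    for neighbourhood, _user_id, _date, steps in daily_records:
        total_steps[neighbourhood] += steps
        person_days[neighbourhood] += 1

    indicators = {}
    for neighbourhood, steps_sum in total_steps.items():
        avg = steps_sum / person_days[neighbourhood]
        indicators[neighbourhood] = {
            # average steps per person per day among app users
            "avg_steps_per_person_day": avg,
            # cumulative steps for the period, normalised by population
            "steps_per_resident": steps_sum / populations[neighbourhood],
            # flag neighbourhoods whose average falls below the cut-off
            "below_cutoff": avg < STEP_CUTOFF,
        }
    return indicators
```

Running the same computation on successive periods (for example every few months) would give the trend over time that public health authorities could monitor before and after an intervention.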
• (In population aggregates, we are looking for crude trends over time (e.g., overall population PA levels, and whether they are increasing or decreasing or remaining unchanged over time/in different seasons of the year, etc.), so the inaccuracies and differences between different sensors and device models shouldn't pose a big issue, and over- and underestimates will offset each other in large samples.)
• Other relevant population-level indicators related to heart rate variability (see https://www.health.harvard.edu/blog/heart-rate-variability-new-way-track-well-2017112212789), sedentary behaviour and sleep quality/patterns in the community can also be extracted from fitness and PA app and gadget big data aggregates of user populations to provide additional 'lifestyle intelligence' for designing better public health interventions.
• We also need to cover the very latest (and highly relevant) published evidence in this area regarding 'smartphone stress'; the corresponding lifestyle data would be smartphone screen time and usage/app patterns and their relation to harmful spikes and fluctuations in cortisol levels throughout the day, which can precipitate or exacerbate diabetes in the long run
• Data sources: see, for example, TalkingData (a Chinese company) https://www.talkingdata.com/. They offer regularly updated big datasets of mobile user data and behaviour covering >2.5 billion smart devices in China and more than 200,000 Chinese app developers and their apps; cf. American users' screen time tracking data published by the Moment app https://inthemoment.io/.
The SMART Study (Canada) also developed a methodology to derive objective screen-state from population mobile devices (smartphones and others), and found (as reported in one of their unpublished papers, still under peer review as of May 2019) that when objective measures were compared with subjective (self) reporting, the results indicated that participants consistently under-reported screen time

Bluetooth beacons (geolocation) + apps for foot traffic and food environment monitoring, but not without privacy issues
• https://www.nytimes.com/interactive/2019/06/14/opinion/bluetooth-wireless-tracking-privacy.html
• http://www.navi-guide.com/

• Proxy mental health data for city population stress/anxiety/depression levels (avg. by age group, gender, neighbourhood, etc.): important factor in overweight/obesity (OO) and binge eating
• Sleep duration and quality data, avg. by age group, neighbourhood, etc. (aggregate city population data from fitness/sleep trackers and apps, if available by special agreements with providers or via a purpose-built campaign/consumer health portal or surveys): poor sleep is a major factor in OO and T2D
• Health literacy levels data, avg. by age group, gender, neighbourhood, etc. (obtained from population surveys or representative samples); media consumption data by media type (different online and social media, different TV stations/programmes, different newspapers and other print material, etc.), by age group, etc.: important to consider when designing mass health education interventions/public health campaigns
• Population clinical data aggregates (anonymised, geo-tagged/aggregated at postcode level or similar, by age group, etc.), updated, say, every 6 months, from clinical screening and other lab tests that are routinely done in hospitals and clinics in our chosen demonstrator location(s) (e.g., the city of Guangzhou), such as BMI, HbA1c, fasting blood glucose, blood pressure, total cholesterol, etc.
Standards-compliant data linkages/interfaces to electronic hospital/health records can be established to automate the process of collecting these data aggregates (cf. syndromic surveillance systems)
• No two persons are the same in how their bodies are affected by, and respond to, exactly the same food environment and diet. We have unique individual profiles (metabolic, genetic, gut bacteria, etc.) that determine our different responses
• In 2015/16, Kamel Boulos proposed a highly personalised diet and lifestyle recommendation system that takes into consideration many of these unique individual profiles using inputs from specialised sensors/wearables, such as mobile personal indirect calorimetry, etc. Besides using users' data aggregates (aggregates of the same user and of all system users over time) for its own dynamic self-learning and tweaking, the system would also offer a privacy-preserving version of its user population data aggregates for research and public health purposes

A calorie is not a calorie:
-- Different people with different genetic and gut bacteria profiles absorb and burn exactly the same type and amount of food at different rates. Individual variations also exist in learned lifestyle habits, gut hormone profiles (which control appetite), etc.
-- Calories from the same food type and amount can vary according to whether the food was left to cool down (fewer calories, even when reheated) or was consumed fresh and hot (more calories), e.g., toast, potatoes, rice and pasta, due to the formation of resistant starch
-- etc.

People's appetite and response to so-called 'obesogenic food environments' vary according to their genetic and epigenetic profiles
• In the future, we will know more about our target populations, thanks to population genome data banks, where profiles of large local population samples can be mined and analysed for the presence of specific/relevant genes/gene variants/mutations and dysfunctions, e.g., those controlling appetite, or implicated in overweight and obesity predisposition, or determining our base exercise behaviour: FTO, LEP, MRAP2, MC4R, SDC3, etc. This intelligence can then be used to inform the design of optimised individual-specific as well as population-wide interventions that can make our genes work best for us and/or offset their negative effects (predispositions)
• Similarly, we can have population gut microbiome data banks (with regularly updated profiles, say every 6 months) of large local population samples. A growing number of consumer-oriented labs are offering 'gut bacteria profiling' services today at affordable prices, e.g., DayTwo, Viome, AmericanGut, Aperiomics, AtlasBiomed and others.
Gut bacteria types and diversity/ratios (some are 'bad' in relation to obesity, e.g., Firmicutes, while others are 'good', e.g., Christensenella and Bacteroidetes; see, for example, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5949328/ and https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5933040/ ) can be modified or modulated/influenced, as necessary, through both individual-specific and population-wide interventions (dietary modifications/mass health education about diet) to make our gut microbiomes work best for us (e.g., encourage the consumption of sauerkraut (cabbage), yoghurt, kimchi and miso soup, all of which promote good gut bacteria)
• Future challenges associated with the above data banks: protection of genomic and other sensitive population data from being used against individuals in employment and health insurance, and informed consent on storing and using genetic and non-genetic information for research and development (see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4915347/ )

Lifestyle data and data sources (Tasks 1 and 2, if we are now in 2025)
Future directions (beyond this demonstrator): factoring in population genetics/epigenetics and gut microbiome profile aggregates from (future) genome data banks and gut microbiome data banks
• Population-level accuracy and precision public health interventions: Only if we have evidence of a large proportion of the population with, say, the wrong gut bacteria profiles (bacteria that promote obesity) can we roll out some targeted health education and mass diet interventions to rectify this. Such population evidence will be drawn from 'gut microbiome banks' that store and regularly update the gut bacteria profiles of very large numbers of individuals (numbers that are sufficient to be used as a crude proxy for the whole population). Such gut microbiome data banks do not yet exist, so can be ruled out as a data source in the proposed demonstrator. The same can be said about genetic profile banks until we have them one day
• This does not rule out general (non-targeted) whole-population interventions with respect to these factors, as long as their cost/benefit ratio is reasonable (not always the case), e.g., general health education and mass dietary interventions to promote good gut bacteria or gamified exercise interventions or exergames (not just for those individuals that 'hate' exercise because of their genetic makeup). Such interventions would still benefit the rest of the population in varying degrees; for example, people who already enjoy exercising and have adequate daily exercise levels might still like and benefit from exergames, etc.
• But such accuracy and precision public health interventions, for our purposes in this demonstrator, cannot be seen as targeted (as a response to specific population data/intelligence we have) and their immediate effects cannot be easily monitored (we don't have, for example, a gut microbiome bank that we can use to show and document a significant change in population gut microbiome profiles following an intervention of this type)
• On an individual (person-specific) level, similar (accuracy and) precision medicine interventions are already being prescribed today based on findings from the person's specific genetic makeup and gut bacteria profile

A new era of Accuracy and Precision Public Health and Medicine
Conclusion: The research we should be doing Towards precision and accuracy health/public health • The segmentation of food consumption in large populations (per capita) by food type, and the associations of this segmentation/segmentation patterns with different population characteristics and of course overweight and obesity, remain an important research gap in the current literature. It is the type (and relative quantities) of diet that matters the most in overweight and obesity, and not just the total number of calories consumed, the amount of physical activity undertaken, or the proximity to supermarkets or KFCs. The same food store, e.g., a supermarket or KFC, can often sell/serve good and bad calories. One hundred (100) calories from a 250ml sugary beverage and 100 calories from a handful of nuts are not the same. The segmentation of calories is essential, as not all calories have equal effect with respect to obesity, or as they say 'a calorie is not a calorie'. The type of food outlet/food type sales proportions per outlet, e.g., outlets selling (and proportion of sales of) unprocessed vs. highly processed food; the specific types of food bought by different households as determined by their socioeconomic levels (supermarket loyalty cards are one possible data source here); and even screen time (average duration per capita)/screen time patterns, especially in children, etc., are the factors that really matter in childhood and adult obesity, and the sort of detail resolution we should be investigating in future studies • Similarly, when it comes to physical activity (PA) tracking in populations and PA facilities and opportunities in the neighbourhood environment, they should also be segmented by type of activity, taking into consideration the different genetic profiles of population subgroups. 
A recent study (https://doi.org/10.1371/journal.pgen.1008277) found that regular jogging, followed by mountain climbing, walking, power walking, certain types of dancing, and long yoga practices reduced BMI in individuals who are genetically predisposed to obesity. But cycling, stretching exercises, swimming and Dance Dance Revolution (an exergame) did not counteract the genetic effects on obesity! If we want to deliver effective and truly individualised PA advice and coaching, we have to include this sort of detail resolution about the opportunities and uptake of different PA types by different population subgroups in the neighbourhood
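The food type sales proportions per outlet named in the conclusion could be derived from transaction-level or aggregated sales data along the following lines. This is a minimal Python sketch under stated assumptions: the record layout (outlet, food category, sales amount), the category labels and the function name `sales_share_by_category` are hypothetical and are not taken from any existing data source.

```python
from collections import defaultdict


def sales_share_by_category(sales):
    """Return each outlet's share of sales by food category.

    sales: iterable of (outlet_id, food_category, amount) rows, where
           food_category is a label such as "unprocessed" or
           "ultra_processed" (hypothetical labels).
    Returns {outlet_id: {food_category: share between 0 and 1}}.
    Outlets with zero total sales are skipped to avoid division by zero.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for outlet_id, food_category, amount in sales:
        totals[outlet_id][food_category] += amount

    shares = {}
    for outlet_id, by_category in totals.items():
        outlet_total = sum(by_category.values())
        if outlet_total == 0:
            continue
        shares[outlet_id] = {
            category: amount / outlet_total
            for category, amount in by_category.items()
        }
    return shares
```

A share-based indicator of this kind avoids the binary 'good' or 'bad' outlet classification criticised earlier, since the same supermarket or fast food outlet can score anywhere between 0 and 1 for each food category.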