What are the Research Challenges for the use of Big Data in Policy Making?
Last update 02 Sep 2019
Data-driven policy making
* Data-driven policy making aims to make use of new data sources and new techniques for processing these data in order to realize new policies, involving citizens and other relevant stakeholders as data providers.
* Policy making would typically like to rely on open (and free) data. In some cases the needed data is simply not collected at the level of granularity needed or targeted. In other cases the data is actually collected but, because it is an asset for those who collect and hold it, it is not shared, or is shared only at a (high) cost.
* Data-driven policy making is clearly related to the notion of evidence-based policy making, which considers the inclusion of systematic research, programme management experience and political judgement in the policy making process to be relevant (Head 2018). The concept of evidence-based policy making implies that the logic of intervention, impact and accountability is accepted and considered a key part of the policy process.
* However, data-driven policy making stresses the importance of big data and open data sources for policy making, as well as the co-creation of policy by involving citizens to increase legitimacy (Bijlsma et al. 2011) and decrease citizens' distrust in government (Davies 2017). In this respect, data availability is of great importance, but data relevance even more so. One can have rich (big) open data sources (e.g. Copernicus, weather data, sensor data), yet policy making may ask for totally different assets.
* Policy making is conceptualized as a policy cycle, consisting of several phases such as agenda setting, policy design and decision making, policy implementation, and monitoring and evaluation.
* It also has to be taken into account that the evaluation phase can be considered a continuous and horizontal activity which applies to all other policy cycle stages. We can therefore talk about the E-Policy-Cycle (Höchtl et al. 2015).
What follows is a presentation of the use of big data in the different phases of the policy cycle, as well as the challenges of the policy making activities in which big data can be exploited.
Phase 1 – Agenda setting
* The challenge addressed is to detect (or even predict) problems before they become too costly to face. Clearly the definition of what counts as a "problem" to be solved has a political element, not just an analytical one (Vydra & Klievink, 2019).
* One traditional problem of policy making is that data, and therefore statistics, become available only long after the problems have emerged, which increases the cost of solving them.
* Alternative metrics and datasets can be used to identify early warning signs at an earlier stage and at lower cost, helping to better understand causal links. In this respect, there should be a clear effort to use the data held by the real Big Data owners and merchants: Google, Facebook, Amazon and Apple. The big corporations should pay their taxes in data, not just in cash. Regarding alternative metrics, the research departments at Microsoft and Google have made significant advances in this area (see Stephens-Davidowitz, available at http://sethsd.com).
* Moreover, according to Höchtl et al. (2016, p. 159), governments can identify emergent topics early and create relevant agenda points by collecting data from social networks with high degrees of participation and identifying citizens' policy preferences. Clearly, using data from social networks requires a large amount of data cleaning and quality checking. In that regard, dedicated discussion spaces (e.g. Opinion Space in the past) ensure better quality.
* Overall, a mediation is necessary between traditional objectives and boundaries on the one hand and information from big data and participative democracy on the other. Relevant for this problem are optimization methods and techniques such as linear and non-linear programming, as well as approaches to combining big data and economic planning.
Phase 2 – Policy design and decision making
* Big data and data analytics solutions can be used to provide evidence for the ex ante impact assessment of policy options, by helping to predict the possible outcomes of the different options.
* Big data can also help in analysing the current situation using back-casting techniques. If the data are available, researchers can analyse existing historical data and figure out where a failed policy went wrong or to what a successful one owes its success.
* Furthermore, if policies could in principle be compared, counterfactual models could do the trick. However, when talking about political decisions it is not easy to find systems expected to have exactly the same behaviour.
* In this regard, Giest (2017) argues that the increased use of big data is shaping policy instruments, as "The vast amount of administrative data collected at various governmental levels and in different domains, such as tax systems, social programs, health records and the like, can— with their digitization— be used for decision-making in areas of education, economics, health and social policy".
* Clearly there is strong potential to use public data for policy making, but that does not come for free. There is an obvious wealth of public sector data, but it needs to be structured and "opened" for other uses, also considering data protection, not least in light of new regulatory frameworks such as the GDPR.
Phase 3 – Policy Implementation
* Big data and data analytics can help identify the key stakeholders to involve in a policy or to be targeted by policies.
* One way in which big data can influence the implementation stage of the policy process is the real-time production of data.
* Clearly this implies that data is available and usable. Frameworks and platforms are needed for this, and there is the challenge of the local rooting of public sector bodies and their operations: this is not homogeneous across the EU or the world. Privacy and security issues must also be taken into account.
* The execution of new policies immediately produces new data, which can be used to evaluate the effectiveness of policies and improve future implementation.
* Testing a new policy in real time can provide insights into whether it has the desired effect or requires modification (e.g. Yom-Tov et al. 2018). However, one has to account for an adjustment period: the effects observed immediately after the policy was put into effect might not be representative of its long-term consequences.
* Furthermore, big data can be used for behavioural insights.
Phase 4 – Policy Evaluation
* Big data and data analytics approaches can help detect the impact of policies at an early stage (Höchtl et al., 2016, p. 149), before formal evaluation exercises are carried out, or detect problems related to implementation, such as corruption in public spending.
* In that regard, formal/structured evaluation mechanisms are complementary to (big) data analytics approaches.
* Most importantly, big data can be used for continuous evaluation of policies, to inform the policy analysis process, while also empowering and engaging citizens and stakeholders in the process (Schintler and Kulkarni 2014, p. 343).
Do you agree with these definitions? Do you want to add anything (please add comments in the definitions above)? How can citizens participate in each of the phases above? How can co-creation for data-driven policy making be realized (please add comments in the definitions above)? Which (technical, organizational, legal) requirements need to be met to enable the use of big data in each phase (please add comments in the definitions above)? What are the obstacles and bottlenecks for the use of Big Data in each phase (please add comments in the definitions above)?
Key challenges of data-based policy making in which big data can be useful:
* Anticipate the detection of problems before they become intractable;
* Generate fruitful involvement of citizens in the policy making activity;
* Make sense of thousands of opinions from citizens;
* Uncover causal relationships behind policy problems;
* Identify cheaper and real-time proxies for official statistics;
* Identify key stakeholders to be involved in or targeted by specific policies;
* Anticipate or monitor in real time the impact of policies.
Which big data methodologies can be used to cope with any of the above challenges (please add comments in the lines above)?
Gaps and Research Needs
* Development of new evaluation frameworks and tools for the assessment of the impact of policies. Such evaluation frameworks should build on a set of evaluation criteria and indicators adapted to the specific domains.
* Development of new procedures and tools for the establishment of a management system integrating both financial and non-financial performance information linked with quality data, impact measurement and other performance indicators.
* Development of new tools, methodologies and regulatory frameworks to boost the participation of citizens in policy making by means of crowdsourcing and co-creation of policies, with a view to defining stances and being able to differentiate complaints from critiques.
* Development of new regulations, tools and technical frameworks that ensure absence of bias and transparency in the policy making process and cybersecurity of IT systems in the public administration.
* Development and deployment of frameworks and tools that allow the secure sharing of information and data within the public administration, as well as the interoperability of systems and databases. These frameworks include the standardization of organizational processes.
* Development of specific interoperable cloud infrastructures and (re-usable and integrating) models for the management and analysis of huge volumes of data.
* Development of new regulations, tools and technical frameworks that ensure respect of citizens' privacy and data ownership/security, especially in cases where personal information needs to be migrated across public administration agencies.
* Development and establishment of a unique, reliable, secure and economically sustainable technical and IT infrastructure which would work as a backbone for all the public services developed and implemented in the public sector.
* Development of information management systems and procedures for the collection, storage, sharing, standardization and classification of information pertaining to the public sector.
* Development of analytical tools to understand the combined contribution of technological convergence: for instance, how technologies such as AI, blockchain and IoT may be combined to offer super-additive solutions for evidence-based policy making.
* Development of new analytical tools to support problem setting: the ability to fully understand the policy issue you are trying to tackle, in its entirety and in its key fundamental processes.
Do you agree with such gaps/research needs? Do you have any other gap/research need to add? Can you propose any solution to such gap/research need?
Research Clusters and Related Research Challenges
We define six main research clusters related to the use of Big Data in policy making. Four of them are purely technological and build on the Big Data Cycle, while two are of a more legal and organizational nature. The research clusters are the following:
Cluster 1 - Privacy, Transparency and Trust
* Even more than with traditional IT architectures, Big Data requires systems for determining and maintaining data ownership, data definitions, and data flows. In fact, Big Data offers unprecedented opportunities to monitor processes that were previously invisible.
* In addition, the detail and volume of the data stored raise the stakes on issues such as data privacy and data sovereignty. The output of this research cluster includes a legal framework to ensure ownership, security and privacy of the data generated by users while using public administration systems.
* A second facet of this research cluster is transparency in the policy making process and the availability of information and data from the public administration, which is also related to the ability to collect sufficient data; this is not a given, especially when dealing with local public administrations. Concerning transparency in the policy making process, computer algorithms are widely employed throughout our economy and society to make decisions that have far-reaching impacts, including applications in education, access to credit, healthcare, and employment, and therefore their transparency is of utmost importance.
* On the other side, the ubiquity of algorithms in everyday life is an important reason to focus on addressing challenges associated with the design and technical aspects of algorithms and on preventing bias from the onset.
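To make the idea of an ex-post check on algorithmic decisions concrete, here is a minimal Python sketch of a selection-rate comparison across demographic groups; the dataframe, the column names and the 0.8 threshold are illustrative assumptions, not a prescribed audit procedure.

```python
# Minimal sketch of an ex-post bias check on algorithmic decisions.
# The records, column names and threshold below are hypothetical.
import pandas as pd

decisions = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B", "B", "A"],
    "approved": [1,   1,   0,   0,   1,   0,   0,   1],
})

# Selection (approval) rate per demographic group.
rates = decisions.groupby("group")["approved"].mean()

# Disparate-impact ratio: lowest selection rate over highest.
# The 0.8 ("four-fifths") threshold is a common rule of thumb, not a legal test.
ratio = rates.min() / rates.max()
print(rates)
print(f"disparate impact ratio: {ratio:.2f} -> {'review needed' if ratio < 0.8 else 'ok'}")
```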
* A crucial element, which has gained more and more importance over the last decade, is the practice of co-creating public services and public policies with citizens and companies, which would make public services more tailored to the needs of citizens and would open the black box of the inner workings of public administration.
* In the context of big data, co-creation activities take the form of citizen-science-like activities, such as data creation on the side of citizens, and of the co-creation of services in which disruptive technologies such as big data are adopted.
* In that regard, following Wood-Bodley (2018), harnessing the rich and valuable insights and experience of people in non-policy roles is essential to building fit-for-purpose solutions.
* An interesting research avenue that is gaining importance is the co-creation of the algorithms that are used in policy making, especially through serious games and simulations. Finally, the openness and availability of government data for re-use provide the possibility to check and scrutinize the policy making activity (e.g. the UK-oriented initiative My2050).
Cluster 2 – Public Governance Framework for Data Driven Policy Making Structures
* The governance concept has been on a roll for the last couple of years. But what is the governance concept actually about, and how can it be applied for the present purpose? Generally, the notion of governance stands for shaping and designing areas of life such that rules are set and managed in order to guide policy-making and policy implementation (Lucke and Reinermann, 2002).
* Core dimensions of governance are efficiency, transparency, participation and accountability (United Nations, 2007). Corresponding to the definition of electronic governance, evidence-based and data-informed policy-making in the information age applies technology in order to efficiently transform governments, their interactions with citizens and their relationships with citizens, businesses and other stakeholders, creating impact on society (Estevez and Janowski, 2013).
* More concretely, digital technologies are applied to the processing of information and to decision-making; the so-called smart governance approach is applicable here (Pereira et al., 2018). In this frame, governance has to focus on how to leverage data for more effective, efficient, rational, participative and transparent policymaking. Although the governance discussion is not new, it remains a complex challenge in the era of digital transformation.
Cluster 3 - Data acquisition, cleaning and representativeness
* Data to be used for policy making stem from a variety of sources: government administrative data, official statistics, user-generated web content (blogs, wikis, discussion forums, posts, chats, tweets, podcasts, pins, digital images, video, audio files, advertisements, etc.), search engine data, data gathered by connected people and devices (e.g. wearable technology, mobile devices, the Internet of Things), tracking data (including GPS/geolocation data, traffic and other transport sensor data), and data collected through participation in citizen science activities.
* This leads to a huge amount of usable data of increased size and resolution, spanning time series, and which in most cases are not collected by means of direct elicitation from people. However, concerning data quality, a common issue is the balance between random and systematic errors. Random errors in measurements are caused by unknown and unpredictable changes in the measurement. In that regard, the unification of data so that it is editable and available for policy making is extremely important: cancelling noise, for instance, is challenging.
* These changes may occur in the measuring instruments or in the environmental conditions. Random errors normally tend to be distributed according to a normal (Gaussian) distribution. One consequence of this is that increasing the size of your data helps to reduce random errors. However, this is not the case for systematic errors, which are not random and therefore affect measurements in one specific way. In this case, errors derive from the way the data are created, and very large datasets might therefore blind researchers to this kind of error.
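A short simulation, under invented values, illustrates the point just made: averaging more measurements shrinks the random (Gaussian) error, but a constant systematic bias remains whatever the sample size.

```python
# Minimal simulation: more data shrinks random error, not systematic bias.
# All values are illustrative.
import numpy as np

rng = np.random.default_rng(0)
true_value = 100.0
systematic_bias = 2.0       # e.g. a miscalibrated sensor always reads 2 units high
noise_sd = 5.0              # random, roughly Gaussian measurement noise

for n in (100, 10_000, 1_000_000):
    measurements = true_value + systematic_bias + rng.normal(0.0, noise_sd, size=n)
    error = measurements.mean() - true_value
    print(f"n={n:>9,}  mean error = {error:+.3f}")
# The mean error converges to the systematic bias (+2.0), not to zero.
```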
* Besides the potential presence of systematic errors, there are two more methodological aspects of big data that require careful evaluation: the issue of representativeness and the construct validity problem.
* For this reason, any known limitations of the data accuracy, sources, and bias should be readily available, along with recommendations about the kinds of decision-making the data can and cannot support. The ideal would be a cleansing mechanism that reduces the inaccuracy of the data as much as possible, especially in cases where it can be predicted beforehand.
Cluster 4 - Data storage, clustering, integration and fusion
* This research cluster deals with information extraction from unstructured, multimodal, heterogeneous, complex, or dynamic data. Heterogeneous and incomplete data must be structured in a homogeneous way prior to analysis, as most computer systems work better if multiple items are stored with an identical size and structure. But efficient representation, access and analysis of semi-structured data is also necessary, because a less structured design is more useful for certain analyses and purposes.
* Specifically, the large majority of big data, from the most common such as social media and search engine data to transactions at self-checkouts in hotels or supermarkets, are generated for different, specific purposes. They are not designed by a researcher who elicits their collection with a theoretical framework of reference and an analytical strategy already in mind. Data from social media in particular can be really challenging to clean and demand a lot of effort; what is more, the data elicited from social media may be biased.
* In this regard, repurposing data requires a good understanding of the context in which the repurposed data were generated in the first place, finding a balance between identifying the weaknesses of the repurposed data and at the same time recognizing their strengths.
* In synthesis, combining big data stemming from different sources and extracting meaning from them for a new goal requires teams that combine two types of expertise: data scientists, who can combine different datasets and apply novel statistical techniques, and domain experts, who know the history of how the data were collected and can help with the interpretation.
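As a minimal sketch of the kind of repurposing and fusion described above, the following Python example joins a hypothetical administrative dataset with sensor readings collected for another purpose, after aggregating the latter to a common granularity; all data, column names and the monthly aggregation are invented.

```python
# Minimal sketch: fusing an administrative dataset with repurposed sensor data.
# All data, column names and the aggregation level are hypothetical.
import pandas as pd

# Administrative records, one row per municipality and month.
admin = pd.DataFrame({
    "municipality": ["A", "A", "B", "B"],
    "month": ["2019-01", "2019-02", "2019-01", "2019-02"],
    "benefit_claims": [120, 130, 80, 75],
})

# Raw sensor readings with timestamps, collected for a different purpose.
sensors = pd.DataFrame({
    "municipality": ["A", "A", "B", "B", "B"],
    "timestamp": pd.to_datetime(
        ["2019-01-03", "2019-01-20", "2019-01-10", "2019-02-05", "2019-02-18"]),
    "air_quality_index": [42, 55, 61, 58, 49],
})

# Repurpose the sensor data: aggregate to the same municipality-month granularity.
sensors["month"] = sensors["timestamp"].dt.strftime("%Y-%m")
monthly_aqi = (sensors.groupby(["municipality", "month"], as_index=False)
                      ["air_quality_index"].mean())

# Fuse the two sources; a left join keeps every administrative record
# and exposes months without sensor coverage as missing values.
fused = admin.merge(monthly_aqi, on=["municipality", "month"], how="left")
print(fused)
```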
* Clearly, a pre-requisite for clustering, integration and fusion is the presence of tools and methodologies to successfully store and process big data.
Cluster 5 - Modelling and analysis with big data
* Despite the recent dramatic boost of inference methods, they still rely crucially on the exploitation of prior knowledge, and the problem of how such systems could handle unanticipated knowledge remains a great challenge.
* In addition, even with the presently available architectures (feed-forward and recurrent networks, topological maps, etc.) it is difficult to go much further than a black-box approach, and the understanding of the extraordinary effectiveness of these tools is far from elucidated. Given this context, it is important to take steps towards deeper insight into the emergence of the new and its regularities.
* This implies conceiving better modelling schemes, possibly data-driven, to better grasp the complexity of the challenges in front of us, aiming at gathering better data rather than simply big data, and wisely blending modelling schemes. But we should also go one step further in developing tools that allow policy makers to have meaningful representations of present situations, along with accurate simulation engines to generate and evaluate future scenarios.
* Hence the need for tools allowing a realistic forecast of how a change in current conditions will affect and modify the future scenario: in short, scenario simulators and decision support tools. In this framework it is highly important to launch new research directions aimed at developing effective infrastructures merging the science of data with the development of highly predictive models, to come up with engaging and meaningful visualizations and friendly scenario simulation engines.
* Regarding the development of new models, there are basically two main approaches: data modelling and simulation modelling. Data modelling is a method in which a model represents correlation relationships between one set of data and another set of data. Simulation modelling, on the other hand, is a more classical, but more powerful, method in which a model represents causal relationships between a set of controlled inputs and the corresponding outputs.
Cluster 6 - Data visualization
* Making sense of and extracting meaning from data can be achieved by placing them in a visual context: patterns, trends and correlations that might go undetected in text-based data can be exposed and recognized more easily with data visualization software.
* This is clearly important in a policy making context, in particular when considering the problem setting phase of the policy cycle and the visualization of the results of big data modelling and analysis. Specifically, new techniques allow the automatic visualization of data in real time. Furthermore, visual analytics allows human perception and computing power to be combined in order to make visualization more interactive.
* How can big data visualization and visual analytics help policy makers? First, by generating high involvement of citizens in policy-making. One of the main applications of visualization is in making sense of large datasets and identifying key variables and causal relationships in a non-technical way. Similarly, it enables non-technical users to make sense of data and interact with them.
* Further, good visualization is also important in "selling" the data-driven policy making approach. Policy makers need to be convinced that data-driven policy making is sound and that its conclusions can be effectively communicated to the other stakeholders of the policy process. External stakeholders also need to be convinced to trust, or at least consider, data-driven policy-making.
* There should be a clear and explicit distinction of the audiences for policy visualisations: e.g. experts, decision makers, the general public. Experts analyse data, are very familiar with the problem domain and will generate draft policies or conclusions leading to policies. Decision makers may not be technical users and may not have the time to delve deep into a problem; they will listen to experts and must be able to understand the issues, make informed decisions and explain why. The public needs to understand the basics of the issue and the resulting policy in a clear manner.
* A second element is that visualization helps to understand the impact of policies: visualization is instrumental in making the evaluation of policy impact more effective. Finally, it helps to identify problems at an early stage, detect the "unknown unknowns" and anticipate crises: visual analytics are widely used in the business intelligence community because they help exploit the human capacity to detect unexpected patterns and connections in data.
Do you agree with this set of research clusters? Do you want to add any other? Do you want to merge any clusters? Do you think that they cover the entire big data chain and/or policy cycle?
For each research cluster we defined an initial set of research challenges.
Cluster 1 – Privacy, Transparency, Ethics and Trust
* Big Data nudging
* Algorithmic bias and transparency
* Open Government Data
* Manipulation of statements and misinformation
Cluster 2 – Public Governance Framework for Data Driven Policy Making Structures
* Forming of societal and political will
* Stakeholder/Data-producer-oriented Governance approaches
* Governance administrative levels and jurisdictional silos
* Education and personnel development in data sciences
Cluster 3 – Data acquisition, cleaning and representativeness
* Real time big data collection and production
* Quality assessment, data cleaning and formatting
* Representativeness of data collected
Cluster 4 – Data storage, clustering, integration and fusion
* Big Data storage and processing
* Identification of patterns, trends and relevant observables
* Extraction of relevant information and feature extraction
Cluster 5 – Modelling and analysis with big data
* Identification of suitable modelling schemes inferred from existing data
* Collaborative model simulations and scenario generation
* Integration and re-use of modelling schemes
Cluster 6 – Data visualization
* Automated visualization of dynamic data in real time
* Interactive data visualization
What follows is a brief explanation of the research challenges.
Cluster 1 – Privacy, Transparency, Ethics and Trust
Research Challenge 1.1 - Big Data nudging
* Following Misuraca (2018), nudging has long been recognized as a powerful tool to achieve policy goals by inducing changes in citizens' behaviour, while at the same time presenting risks in terms of respect for individual freedom. Nudging can help governments, for instance, reduce carbon emissions by changing how citizens commute, using data from public and private sources. But it is not clear to what extent governments can use these methods without infringing citizens' freedom of choice. It is also possible to imagine a wide array of malevolent applications by governments with a more pliable definition of human rights.
* The recent case of Cambridge Analytica acts as a powerful reminder of the threats deriving from the combination of big data with behavioural science. Both the benefits and the risks are multiplied by the combination of nudging with big data analytics, which becomes a mode of design-based regulation based on algorithmic decision-guidance techniques. When nudging can exploit thousands of data points on any individual, based on data held by governments but also from private sources, the effectiveness of such measures – for good and for bad – is exponentially higher.
* Unlike static nudges, Big Data analytic nudges (also called hypernudging) are extremely powerful due to their continuously updated, dynamic and pervasive nature, working through algorithmic analysis of data streams from multiple sources that offer predictive insights into the habits, preferences and interests of targeted individuals.
* In this respect, as pointed out by Yeung (2016), by "highlighting correlations between data items that would not otherwise be observable, these techniques are being used to shape the informational choice context in which individual decision-making occurs, with the aim of channelling attention and decision-making in directions preferred by the 'choice architect'". These techniques therefore constitute a 'soft' form of design-based control, and the definition of the scope, limitations and safeguards – both technological and not – needed to ensure the simultaneous achievement of fundamental policy goals and respect for basic human rights remains uncharted territory.
Relevance and applications in policy making
* Behavioural change is today a fundamental policy tool across all policy priorities. The great challenges of our time, from climate change to increased inequality to healthy living, can only be addressed by the concerted effort of all stakeholders.
* But in the present context of declining trust in public institutions and recent awareness of the risks of big data for individual freedoms, any intervention towards greater usage of personal data should be treated with enormous care and appropriate safeguards should be developed.
* Notwithstanding the big role of the GDPR, the trust factor is not yet well understood. While there are a number of studies on trust, and several trust models exist that explain trust relations and enable empirical research on the level of trust, this research does not yet cover trust in big data applications and the impact these may have on human behaviour.
* In this regard, there is a need to assess the power and legitimacy of hypernudging to feed real-time policy modelling and inform changes in institutional settings and governance mechanisms, to understand how to address key societal challenges by exploiting the potential of digital technologies and their impact on institutions and on individual and collective behaviours, as well as to anticipate emerging risks and new threats deriving from digital transformation and changes in governance and society.
Technologies, tools and methodologies
* This research challenge stems from the combination of machine learning algorithms and behavioural science. Machine learning algorithms can be modelled to find patterns in very large datasets. These algorithms consolidate information and adapt to become increasingly sophisticated and accurate, allowing them to learn automatically without being explicitly programmed.
* At the same time, potential safeguards involve transparency tools to ensure adequate consent by the citizens involved in such initiatives, as well as algorithm evaluation mechanisms for potential downsides.
Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)?
Research Challenge 1.2 - Algorithmic bias and transparency
* Many decisions are today automated and performed by algorithms. Predictive algorithms have been used for many years in public services, whether for predicting risks of hospital admission or recidivism in criminal justice. Newer ones could predict exam results or job outcomes, or help regulators predict patterns of infraction. It is useful to be able to make violence risk assessments when a call comes into the police, or to make risk assessments of buildings. Health is already being transformed by much better detection of illness, for example in blood or eye tests.
* Algorithms are designed by humans and increasingly learn by observing human behaviour through data; therefore they tend to adopt the biases of their developers and of society as a whole. As such, algorithmic decision making can reinforce the prejudice and bias of the data it is fed with, ultimately compromising basic human rights such as fair process. Bias is typically not written into the code, but develops through machine learning based on data.
* For this reason bias is particularly difficult to detect, and this can be done only through ex-post auditing and simulation rather than ex-ante analysis of the code. There is a need for common practice and tools for controlling data quality, bias and transparency in algorithms. Furthermore, as required by the GDPR, there is a need for ways to explain machine decisions in a human-readable format.
* The risk of manipulation of data should be considered as well, since it may lead to ethical misconduct.
Relevance and applications in policy making
* Algorithms are increasingly used to take policy decisions that are potentially life changing, and therefore they must be transparent and accountable. The GDPR sets out a clear framework for consent and transparency. Transparency is required for both data and algorithms, but bias is difficult to detect in the algorithm itself, and ultimately it is only through the assessment of real-life cases that discrimination is detectable.
Technologies, tools and methodologies
* The main relevant methodologies are algorithm co-creation, regulatory technologies, auditability of algorithms, online experiments, data management processing algorithms and data quality governance approaches.
* Regarding governance, the ACM U.S. Public Policy Council (USACM) released a statement and a list of seven principles aimed at addressing potential harmful bias of algorithmic solutions: awareness, access and redress, accountability, explanation, data provenance, auditability, and validation and testing.
* Further, Geoff Mulgan from NESTA has developed a set of guidelines according to which governments can better keep up with fast-changing industries. Similarly, Eddie Copeland from NESTA has developed a "Code of Standards for Public Sector Algorithmic Decision Making."
Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)?
Research Challenge 1.3 - Open Government Data
* Open Data are defined as data which are accessible at minimal or no cost, without limitations as to user identity or intent. This means that the data should be available online in a digital, machine-readable format. Specifically, the notion of Open Government Data concerns all the information that governmental bodies produce, collect or pay for. This could include geographical data, statistics, meteorological data, data from publicly funded research projects, and traffic and health data.
* In this respect the definition of Open Public Data applies when the data can be readily and easily consulted and re-used by anyone with access to a computer. In the European Commission's view, 'readily accessible' means much more than the mere absence of a restriction of access to the public.
* Data openness has resulted in some applications in the commercial field, but by far the most relevant applications are created in the context of government data repositories.
* With regard to linked data in particular, most research is being undertaken in other application domains such as medicine. Governments are starting to play a leading role towards a web of data. However, current research in the field of open and linked data for government is limited. This is all the more true if we take into account Big Data fed by automatically collected databases.
* An important aspect is the risk of personal data being included in open government data, or of personal data being retrieved from the combination of open data sets.
Relevance and applications in policy making
* Clearly, opening government data can help display the full economic and social impact of information and create services based on all the information available. Other core elements in the policy making process include the promotion of transparency concerning the destination and use of public expenditure; improvement in the quality of policy making, which becomes more evidence based; increased collaboration across government bodies, as well as between government and citizens; increased awareness of citizens on specific issues, as well as better information about government policies; and the promotion of accountability of public officials.
* Nevertheless, transparency does not directly imply accountability. "A government can be an open government, in the sense of being transparent, even if it does not embrace new technology. And a government can provide open data on politically neutral topics even as it remains deeply opaque and unaccountable." (Robinson & Yu, 2012).
Technologies, tools and methodologies
* An interesting topic of research is the integration of open government data, participatory sensing and sentiment analysis, as well as the visualization of real-time, high-quality, reusable open government data. Other avenues of research include the provision of quality, cost-effective, reliable preservation of and access to the data, as well as the protection of property rights, privacy and security of sensitive data.
* Inspiring cases include the Open Government Initiative carried out by the Obama Administration to promote government transparency on a global scale, and Data.gov, a platform which increases the ability of the public to easily find, download, and use datasets that are generated and held by the Federal Government.
Within the scope of Data.gov, the US and India have developed an open source version called the Open Government Platform (OGPL), which can be downloaded and evaluated by any national government or state or local entity as a path toward making their data open and transparent.
Research Challenge 1.4 – Manipulation of statements and misinformation
* Clearly, the transparency of policy making and overall trust can be negatively affected by fake news, disinformation and misinformation in general. In a more general sense, disinformation can be defined as false information that is purposely spread to deceive people, while misinformation deals with false or misleading information (Lazer et al., 2018), but it also includes the bias that is inherent in news produced by humans with human biases. Lazer et al. (2018) define this most recent phenomenon as 'fabricated information that mimics news media content in form but not in organizational process or intent.'
* This is hardly a modern issue: what changes in the era of big data is the velocity with which fake news and false information spread through social media.
* Another example related to big data technologies, and one that will become even more crucial in the future, is that of deepfakes (a portmanteau of "deep learning" and "fake"), an artificial-intelligence-based human image synthesis technique used to combine and superimpose existing images and videos onto source images or videos.
Relevance and applications in policy making
* Fake news and misinformation lead to the erosion of trust in public institutions and traditional media sources, and in turn favour the electoral success of populist or anti-establishment parties. In fact, as discussed in Allcott and Gentzkow (2017) and Guess et al. (2018), Trump voters were more likely to be exposed to and believe misinformation. In the Italian context, Il Sole 24 Ore found that the consumption of fake news appears to be linked with populism, but the content of the overwhelming majority of pieces of misinformation also displays an obvious anti-establishment bias, as found in Giglietto et al. (2018).
* In the 2016 US presidential election, news articles were created and spread that favoured or attacked one of the two main candidates, Hillary Clinton and Donald Trump, in order to steer public opinion towards one candidate or the other.
* Furthermore, the success of the Brexit referendum is another example of how fake news steered public opinion towards beliefs that are hardly founded on evidence, e.g. the claim that the UK was sending £350m a week to the EU and that this money could be used to fund the NHS instead.
Technologies, tools and methodologies
* In the short term, raising awareness regarding fake news can be an important first step. For instance, the capability to judge whether a source is reliable, or the capability to triangulate different data sources, is crucial in this regard. Furthermore, educating people about the capabilities of AI algorithms will be a good measure to prevent bad uses of applications like FakeApp from having widespread impact.
* Regarding technologies to counter fake news, NLP can help to classify text into fake and legitimate instances. In fact, NLP can be used for deception detection in text, and fake news articles can be considered as deceptive text (Chen et al., 2015; Feng et al., 2012; Pérez-Rosas and Mihalcea, 2015). More recently, deep learning has taken over where large-scale training data is available. For what concerns text classification, feature-based models, recurrent neural network (RNN) models, convolutional neural network (CNN) models and attention models have been competing (Le and Mikolov, 2014; Zhang et al., 2015; Yang et al., 2016; Conneau et al., 2017; Medvedeva et al., 2017).
* Clearly, all leading machine learning techniques for text classification, including feature-based and neural network models, are heavily data-driven, and therefore require quality training data based on a sufficiently diverse and carefully labelled set of legitimate and fake news articles.
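As a hedged illustration of the feature-based approach mentioned above, the sketch below trains a toy TF-IDF plus logistic regression classifier on a handful of invented headlines; it only shows the shape of such a pipeline, not a working fake news detector, which would need a large labelled corpus and proper evaluation.

```python
# Minimal sketch of a feature-based fake/legitimate text classifier.
# The five labelled headlines are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Official statistics show unemployment fell by 0.2% last quarter",
    "Miracle cure hidden by doctors: this fruit reverses ageing overnight",
    "Parliament approves the annual budget after a lengthy debate",
    "Secret memo proves the moon landing was staged, insiders say",
    "Central bank keeps interest rates unchanged, citing stable inflation",
]
labels = ["legitimate", "fake", "legitimate", "fake", "legitimate"]

# TF-IDF word features + logistic regression: a classic feature-based baseline.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["Scientists shocked: one weird trick cures all known diseases"]))
```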
* Regarding deepfakes, another possibility is to make use of blockchain technologies, in which every record is replicated on multiple computers and tied to a pair of public and private encryption keys. In this way, the person or institution holding the private key is the true owner of the data, not the computers storing it. Furthermore, blockchains are rarely affected by the security threats that can attack centralized data stores. As an example, individuals could make use of the blockchain to digitally sign and confirm the authenticity of a video or audio file. The more digital signatures a document carries, the higher the likelihood that it is authentic.
Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)? Do you want to add any other research challenge?
Cluster 2 – Public Governance Framework for Data Driven Policy Making Structures
Research Challenge 2.1 - Forming and monitoring of societal and political will
* Many efforts have been undertaken by European governments to establish data platforms, and of course the present development of the open data movement contributes to data-driven decisions in the public sector. But is the status quo sufficient, and what is needed to leverage data for advanced data-based decision support in the public sector? The legislative and political objectives are often neither clear nor discussed in advance. This leads to the fact that a huge amount of data is certainly available, but not the right data sets to assess specific political problems. In that sense, governance structures and frameworks such as outcome- and target-oriented approaches are needed in order to make the right data available and, furthermore, to interpret these data bearing in mind societal and legislative goals (Schmeling et al.).
Relevance and applications in policy making
* Objectives in the public sector can be multifarious, since they are aimed at the common good and not primarily at profit maximisation. Therefore, shared targets have the potential to translate common policies and legislative intentions into public organisations at both a horizontal and a vertical level (James and Nakamura, 2015).
Technologies, tools and methodologies
* Research is needed to investigate how political and societal will can be operationalized in order to design monitoring systems and performance measurement systems based not simply on financial information but rather on outcome- and performance-oriented indicators.
* An interesting case is given by the TNO Policy Lab for the co-creation of data-driven policy making. The Policy Lab is a methodology for conducting controlled experiments with new data sources and new technologies for creating data-driven policies. Policy makers experiment with new policies in a safe environment and then scale up.
The Policy Lab approach has three pillars: (1) the use of new data sources, such as sensor data, and technological developments for policy development; (2) a multidisciplinary approach, including data science, legal expertise, domain knowledge, etc.; and (3) involving citizens and other stakeholders ('co-creation') and carefully weighing different values.
Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)?
Research Challenge 2.2 - Stakeholder/Data-producer-oriented Governance approaches
* To enhance evidence-based decisions in policy making, data must be gathered from different sources and stakeholders, including company data, citizens' data, third sector data and public administrations' data. Every stakeholder group requires different approaches to providing and exchanging data. These approaches must consider political, administrative, legal, societal, management and ICT-related conditions.
* As a plurality of independent stakeholder groups is involved in the fragmented process of data collection, the governance mode cannot be based on a hierarchical structure. Thus the network governance approach applies, relying on negotiation-based interactions that are well placed to aggregate the information, knowledge and assessments that can help qualify political decisions (Sørensen and Torfing, 2007).
* The public administration has always been an important advisor to the political system and is not to be underestimated in this context, since the administration owns meaningful data which should be considered thoroughly in political decision making. In addition, the roles and responsibilities of public administrations as data providers must be discussed and clarified.
* If specific company data, such as traffic data from navigation device providers or social media data from social network providers, is necessary to assess political questions, guidance and governance models for purchasing or exchanging this data are needed.
* For all the aforementioned cases, IT standards and IT architecture frameworks for processing data stored in different infrastructures, constituting so-called data spaces, are required (Cuno et al., 2019).
* In this regard, an important role is played by massive interconnection, i.e. a massive number of objects/things/sensors/devices connected through the information and communications infrastructure to provide value-added services, in particular in the context of smart city initiatives. The unprecedented availability of data raises obvious concerns for data protection, but also stretches the applicability of traditional safeguards such as informed consent and anonymization (see Kokkinakos et al. 2016).
* Data gathered through sensors and other IoT devices is typically collected transparently to the user and therefore limits the possibility of informed consent, such as the all too familiar "accept" button on websites. Secondly, the sheer amount of data makes anonymization and pseudonymisation more difficult, as most personal data can be easily de-anonymized. Advanced techniques such as multiparty computation and homomorphic encryption remain too resource intensive for large-scale deployment.
* We need robust, modular, scalable anonymization algorithms that guarantee anonymity by adapting to the input (additional datasets) and to the output (purpose of use), adopting a risk-based approach. Additionally, it is important to ensure adequate forms of consent management across organizations and symmetric transparency, allowing citizens to see how their data are being used, by whom and for what purpose.
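As a toy illustration of the anonymization point above, the following Python sketch generalizes quasi-identifiers with a fixed scheme and checks whether every group reaches a chosen k; a real risk-based approach would adapt the generalization to the input datasets and the purpose of use. The records, the generalization scheme and k are all hypothetical.

```python
# Minimal k-anonymity sketch: generalize quasi-identifiers, then check group sizes.
import pandas as pd

records = pd.DataFrame({
    "age":       [23, 27, 31, 36, 44, 47, 52, 58],
    "zip_code":  ["10115", "10117", "10119", "10115", "20095", "20097", "20095", "20099"],
    "diagnosis": ["A", "B", "A", "C", "B", "A", "C", "B"],   # sensitive attribute
})

K = 2  # required minimum group size

# Generalize quasi-identifiers: 10-year age bands, 3-digit ZIP prefixes.
anon = records.copy()
anon["age"] = (anon["age"] // 10 * 10).astype(str) + "-" + (anon["age"] // 10 * 10 + 9).astype(str)
anon["zip_code"] = anon["zip_code"].str[:3] + "**"

# A release is k-anonymous if every quasi-identifier combination occurs at least k times.
group_sizes = anon.groupby(["age", "zip_code"]).size()
print(anon)
print(f"k-anonymous for k={K}: {bool((group_sizes >= K).all())}")
```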
* Clearly, the options are sometimes limited, as in the case of geo-positioning, which is needed to be able to use the services provided. Basically, in this case the user pays with their data to use the services.
Relevance and applications in policy making
* Big data offers the potential for public administrations to obtain valuable insights from a large amount of data collected through various sources, and the IoT allows the integration of sensors, radio frequency identification, and Bluetooth in the real-world environment using highly networked services.
* The trend towards personalized services only increases the strategic importance of personal data, but simultaneously highlights the urgency of identifying workable solutions.
* On the other hand, when talking about the once-only principle, bureaucracy and intra-organisational interoperability are far more critical.
Technologies, tools and methodologies
* Several tools are being developed in this area today. Blockchain can provide authentication for machine-to-machine transactions: the blockchain of things. More specifically, the inadequate data security and trust of the current IoT are seriously limiting its adoption. Blockchain, a distributed and tamper-resistant ledger, maintains consistent records of data at different locations and has the potential to address the data security concern in IoT networks (Reyna et al. 2018).
* Anonymization algorithms and secure multiparty mining algorithms over distributed datasets make it possible to guarantee anonymity even when additional datasets are analysed, and to partition data mining over different parties (Selva Rathna and Karthikeyan 2015).
Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)?
Research Challenge 2.3 - Governance administrative levels and jurisdictional silos
* Decisions in the political environment often face trans-boundary problems at different administrative levels and in different jurisdictions. Thus, collecting the data to understand these problems and to investigate possible solutions encounters manifold barriers and constraints, which have to be overcome through modern governance approaches and models.
* Like the aforementioned stakeholder network of data providers, a data network has to be coordinated at the meta-level, and the respective rules and access rights have to be established and ICT-enabled through data connectors or controlled harvesting methods.
* This is becoming increasingly urgent, as government holds massive and fast-growing amounts of data that are dramatically underexploited. The achievement of the once-only principle, as well as the opportunities of big data, only add to the urgency.
* Interoperability of government data, as well as the issues of data centralization versus federation and data protection, remain challenges to be dealt with. New solutions are needed that balance the need for data integration with the safeguards on data protection, the demand for data centralisation with the need to respect each administration's autonomy, and the requirement for ex ante homogenization with more pragmatic, on-demand approaches based on the "data lake" paradigm. All this needs to take place at the European level, to ensure the achievement of the goals of the Tallinn declaration.
* Appropriate, modular data access and interoperability are further complicated by the need to include private data sources as providers and users of government data, at the appropriate level of granularity. Last but not least, this needs to work with full transparency and full consent by citizens, ideally enabling citizens to track in real time who is accessing their personal data and for what purposes.
Relevance and applications in policy making
* Data integration has long been a priority for public administration, but with the new European Interoperability Framework and the objective of the once-only principle it has become an unavoidable priority. Data integration and integrity are the basic building blocks for ensuring sufficient data quality for decision-makers, both when dealing with strategic policy decisions and when dealing with day-to-day decisions in case management.
Technologies, tools and methodologies
* New interfaces within which individual administrations can communicate and share data and APIs in a free and open way, allowing for the creation of new and previously unthinkable services and data applications realised on the basis of the needs of the citizen.
* As an example, the Data & Analytics Framework (DAF) by the Italian Digital Team aims to develop and simplify the interoperability of public data between PAs, standardize and promote the dissemination of open data, optimize data analysis processes and generate knowledge.
* Another interesting example is given by X-Road, an infrastructure which allows Estonia's various public and private sector e-service information systems to link up. Currently the infrastructure is also implemented in Finland, Kyrgyzstan, Namibia, the Faroe Islands, Iceland, and Ukraine.
Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)?
Research Challenge 2.4 - Education and personnel development in data sciences
* Governance also plays an important role in all questions of education and personnel development, in order to ensure that the right capabilities are available in terms of data literacy, data management and interpretation. The need to develop these skills has to be managed and governed as a basis for designing HR strategies, training and employee development.
Relevance and applications in policy making
* Governance in personnel development promotes the effective and efficient fulfilment of public duties such as evidence-based policymaking.
* This is all the more true when taking into account the use of Big Data in policy making, as the skills and competence of civil servants are clearly very important for the implementation of reforms and the take-up of data strategies and solutions.
Technologies, tools and methodologies
* This research challenge includes focusing on standards to make the assessment criteria of education policies transparent, incentives to motivate specific types of behaviour, information in the form of clear definitions of outputs and outcomes, and accountability to verify that the given outcomes and outputs can be delivered (Lewis and Pettersson, 2009).
Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)? Do you want to add any other research challenge?
Cluster 3 – Data acquisition, cleaning and representativeness
Research Challenge 3.1 - Real time big data collection and production
* The rapid development of the Internet and web technologies allows ordinary users to generate vast amounts of data about their daily lives. On the Internet of Things (IoT), the number of connected devices has grown exponentially; each of these produces real-time or near-real-time streaming data about our physical world. In the IoT paradigm, an enormous number of networked sensors are embedded into various devices and machines in the real world. Such sensors, deployed in different fields, may collect various kinds of data, such as environmental data, geographical data, astronomical data, and logistics data. Mobile equipment, transportation facilities, public facilities, and home appliances could all be data acquisition equipment in the IoT.
* Furthermore, social media analytics deals with collecting data from social media websites such as Facebook, Twitter, YouTube and WhatsApp, and from blogs. Social media analytics can be categorized under big data because the data generated by social websites are huge in volume, so efficient tools and algorithms are required to analyse them. Data collected include user-generated content (tweets, posts, photos, videos), digital footprints (IP addresses, preferences, cookies), mobility data (GPS data), biometric information (fingerprints, fitness tracker data), and consumption behaviour (credit cards, supermarket fidelity cards).
Relevance and applications in policy making
* The collection of such amounts of data in real time can help in the updated evaluation of policies, in monitoring the effects of policy implementations, in collecting data that can be used for agenda setting (for instance traffic data), as well as in the analysis of the sentiment and behaviour of citizens and in monitoring and evaluating government social media communication and engagement.
Technologies, tools and methodologies
* For collecting data from devices, an obvious choice is given by Internet of Things technologies. Regarding social media, many collection and analytics tools are readily available for collecting and analysing content. These tools help in collecting the data from social websites, and their service does not stop with data collection but also helps in analysing how the data are used. Examples of tools and technologies are online sentiment analysis and data mining, APIs, data crawling, and data scraping.
* What is interesting here is the development of automated technological tools that can collect, clean, store and analyse large volumes of data at high velocity. Indeed, in some instances, social media has the potential to generate population-level data in near real time.
* Methodologies used to produce analysis from social media data include regression modelling, GIS, correlation and ANOVA, network analysis, semantic analysis, pseudo-experiments, and ethnographic observations. A possible application is also the project at https://info.openlaws.com/eu-project/, dealing with Big Open Legal Data.
Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)?
Research Challenge 3.2 - Quality assessment, data cleaning and formatting
* Big Data quality assessment is an important phase integrated within data pre-processing.
Research Challenge 3.2 - Quality assessment, data cleaning and formatting * Big data quality assessment is an important phase integrated within data pre-processing. It is the phase where the data are prepared following the user or application requirements. When the data are well defined with a schema, or in a tabular format, their quality evaluation becomes easier, as the data description helps map the attributes to quality dimensions and set the quality requirements as a baseline against which to assess the quality metrics. * After the assessment of data quality, it is time for data cleaning. This is the process of correcting (or removing) corrupt or inaccurate records from a record set, table, or database. Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting this dirty or coarse data. * This research challenge also deals with formatting: once one has downloaded sets of data, it is not at all obvious that their format will be suitable for further analysis and integration into existing platforms. Another important factor is metadata, which are essential for the transparency and completeness of information. Relevance and applications in policy making * Apart from systematic errors in data collection, it is important to assess the extent to which the data are of sufficient quality, and to amend them where necessary, because policy decisions have to be founded on quality data in order to be reliable. * More data do not necessarily mean good or better data, and many of the data available lack the quality required for their safe use in many applications, especially when dealing with data coming from social networks and the Internet of Things. Technologies, tools and methodologies * Regarding data quality, it is necessary to use existing frameworks and to develop new ones covering big data quality dimensions, quality characteristics, and quality indexes. As for data cleaning, the need to overcome this hurdle is driving the development of technologies that can automate data cleansing processes and help accelerate analytics. (A minimal cleaning sketch is given below.) * Considering frameworks for quality assessment, the UNECE Big Data Quality Task Team released in 2014 a Framework for the Quality of Big Data within the scope of the UNECE/HLG project “The Role of Big Data in the Modernisation of Statistical Production” (UNECE 2014). The framework provides a structured view of quality at three phases of the business process: input (acquisition and analysis of the data); throughput (transformation, manipulation and analysis of the data); and output (the reporting of quality with statistical outputs derived from big data sources). Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)?
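To illustrate Research Challenge 3.2, a minimal quality check and cleaning pass with pandas might look like the sketch below; the input file, column names and the plausibility threshold are illustrative assumptions.

```python
# Minimal sketch of a quality report and cleaning pass with pandas.
import pandas as pd

def quality_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column completeness and uniqueness indicators."""
    return pd.DataFrame({
        "completeness": 1 - df.isna().mean(),
        "distinct_values": df.nunique(),
    })

def clean(df: pd.DataFrame) -> pd.DataFrame:
    df = df.drop_duplicates()
    # Coerce malformed numbers/dates to NaN instead of failing silently.
    df["value"] = pd.to_numeric(df["value"], errors="coerce")
    df["timestamp"] = pd.to_datetime(df["timestamp"], errors="coerce")
    # Discard records that lack the fields needed for downstream analysis.
    df = df.dropna(subset=["value", "timestamp"])
    # Remove physically implausible measurements (domain-specific rule).
    return df[df["value"].between(0, 1000)]

raw = pd.read_csv("sensor_dump.csv")          # hypothetical input file
print(quality_report(raw))
cleaned = clean(raw)
print(f"kept {len(cleaned)} of {len(raw)} records")
```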
Research Challenge 3.3 - Representativeness of data collected * A key concern with many Big Data sources is the selectivity (or, conversely, the representativeness) of the dataset. A dataset that is highly unrepresentative may nonetheless be usable for some purposes but inadequate for others. Related to this issue is whether there exists the ability to calibrate the dataset or perform external validity checks using reference datasets. Selectivity indicators developed for survey data can usually be used to measure how the information available in the Big Data source differs from the information for the in-scope population. * For example, we can compare how in-scope units included in the Big Data differ from in-scope units missing from the Big Data. To assess the difference, it is useful to consider the use of covariates, i.e. variables that contain information allowing the “profile” of the units to be determined (for example, geographic location, size, age, etc.) in order to create domains of interest. It is within these domains that comparisons should be made for the “outcome” or study variables of interest (for example, energy consumption, hours worked, etc.). Note that the covariates chosen to create the domains should be related to the study variables being compared. * Regarding social media, research identifies a set of challenges that have implications for the validity and reliability of the data collected. First, users of social media are not representative of populations (Ruths & Pfeffer, 2014). As such, biases will exist and it may be difficult to generalise findings to the broader population. Furthermore, social media data are seldom created for research purposes, and, finally, it is difficult to infer how reflective a user’s online behaviour is of their offline behaviour without information on them from other sources (Social Media Research Group 2016). Relevance and applications in policy making * Clearly big data representativeness is crucial to policy making, especially when studying certain characteristics of the population and analysing its sentiment. It is, of course, also important when targeting specific subgroups. * In this regard, large datasets may not represent the underlying population of interest, and the sheer size of a dataset clearly does not imply that population parameters can be estimated without bias. Technologies, tools and methodologies * Appropriate sampling design has to be applied in order to ensure the representativeness of data and limit the original bias where present. Probability sampling methodologies include: simple random sampling, stratified sampling, cluster sampling, multistage sampling, and systematic sampling. An interesting research area is survey data integration, which aims to combine information from two independent surveys of the same target population. * Kim et al. (2016) propose a new method of survey data integration using fractional imputation, and Park et al. (2017) use a measurement error model to combine information from two independent surveys. Further, Kim and Wang (2018) propose two methods for reducing the selection bias associated with the big data sample. Finally, Tufekci (2014) provides a set of practical steps aimed at mitigating the issue of representativeness, including: targeting non-social dependent variables, establishing baseline panels to study people’s behaviour, and using multidisciplinary teams and multimethod/multiplatform analysis. * Big Data can also be combined with 'traditional' datasets to improve representativeness (Vaitla 2014). (A minimal re-weighting sketch is given below.) Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)? Do you want to add any other research challenge?
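Before moving to the next cluster, the covariate comparison and re-weighting described under Research Challenge 3.3 can be sketched minimally as follows. The population shares, file name and column names are illustrative assumptions; real applications would rely on proper survey weighting or the integration methods cited above.

```python
# Minimal sketch: compare the covariate profile of a big data sample with a
# reference (e.g. census) distribution and derive post-stratification weights.
import pandas as pd

# Known population shares by age group, e.g. from official statistics (assumed).
population_shares = pd.Series({"18-34": 0.30, "35-54": 0.35, "55+": 0.35})

sample = pd.read_csv("social_media_users.csv")      # hypothetical big data sample
sample_shares = sample["age_group"].value_counts(normalize=True)

# How far the sample departs from the population in each domain.
comparison = pd.DataFrame({"population": population_shares, "sample": sample_shares})
comparison["difference"] = comparison["sample"] - comparison["population"]
print(comparison)

# Post-stratification weight: up-weight under-represented groups.
weights = population_shares / sample_shares
sample["weight"] = sample["age_group"].map(weights)

# Weighted estimate of a study variable, e.g. support for a policy option.
weighted_mean = (sample["supports_policy"] * sample["weight"]).sum() / sample["weight"].sum()
print(f"weighted estimate: {weighted_mean:.3f}")
```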
Cluster 4 – Data storage, clustering, integration and fusion Research Challenge 4.1 – Big Data Storage * Obviously a prerequisite for the clustering, integration and fusion of big data is the presence of efficient mechanisms for data storage and processing. Clearly, big data storage technologies are a key enabler for advanced analytics that have the potential to transform society and the way key decisions are made, also in terms of policy. One of the first things organizations have to manage when dealing with big data is where and how these data will be stored once they are acquired. The traditional methods of structured data storage and retrieval include relational databases and data warehouses. Relevance and applications in policy making * Clearly the data acquired by the public administration, to be subsequently used for analytics, modelling and visualization, need to be stored efficiently and safely. In this regard, it is important to understand the encryption and migration needs and the privacy requirements, as well as the procedures for backup and disaster recovery. * Furthermore, big data storage and processing technologies are able to produce information that can enhance different public services. Technologies, tools and methodologies * This research topic has developed rapidly in recent years, delivering new types of massive data storage and processing products, e.g. NoSQL databases. Building on the advances of cloud computing, the technology market is very mature in this area (for an overview, see Sharma, 2016). Crowdsourcing also plays an important role, and in light of climate change and environmental issues, energy-efficient data storage methods are also a crucial research priority (Strohbach et al. 2016). Furthermore, to automate complex tasks and make them scalable, hybrid human-algorithmic data curation approaches have to be developed further (Freitas and Curry 2016). * More specifically, the most important technologies are: distributed file systems such as the Hadoop Distributed File System (HDFS), NoSQL and NewSQL databases, and big data querying platforms. * Interesting tools, on the other hand, include Cassandra, HBase (George, 2011), MongoDB, CouchDB, Voldemort, DynamoDB, and Redis. Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)? Research Challenge 4.2 - Identification of patterns, trends and relevant observables * This research challenge deals with technologies and methodologies allowing businesses and policy makers to identify patterns and trends, in both structured and unstructured data, that may not have been previously visible. Relevance and applications in policy making * Clearly the possibility of extracting patterns and trends from data can give policy makers a first view of emerging issues that are then used to develop the policy agenda. An interesting application is anomaly detection, which is most commonly used in fraud detection. For example, anomaly detection can identify suspicious activity in a database and trigger a response. There is usually some level of machine learning involved in this case. (A minimal sketch is given at the end of this challenge.) Technologies, tools and methodologies * One of the most widely used big data methodologies for the identification of patterns and trends is data mining: a combination of database management, statistics and machine learning methods useful for extracting patterns from large datasets. * Some examples include mining human resources data in order to assess employee characteristics, or consumer bundle analysis to model the behaviour of customers. * It also has to be taken into account that much of Big Data is unstructured and contains a huge quantity of text. In this regard, text mining is another technique that can be adopted to identify trends and patterns. Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)?
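The anomaly-detection application mentioned under Research Challenge 4.2 can be sketched minimally as follows, using scikit-learn's IsolationForest. The dataset, feature names and contamination rate are illustrative assumptions, and IsolationForest is only one of several possible methods.

```python
# Minimal sketch of unsupervised anomaly detection on transaction-like records,
# in the spirit of the fraud-detection example above.
import pandas as pd
from sklearn.ensemble import IsolationForest

transactions = pd.read_csv("transactions.csv")        # hypothetical dataset
features = transactions[["amount", "items", "hour_of_day"]]

model = IsolationForest(contamination=0.01, random_state=0)
transactions["anomaly"] = model.fit_predict(features)  # -1 flags outliers

suspicious = transactions[transactions["anomaly"] == -1]
print(f"{len(suspicious)} transactions flagged for manual review")
```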
Research Challenge 4.3 - Extraction of relevant information and feature extraction * This challenge concerns summarizing data and extracting meaning in order to provide near real-time analysis. Some analyses require that the data be structured in a homogeneous way before they are performed, as algorithms, unlike humans, are not able to grasp nuance. Furthermore, most computer systems work better if multiple items are stored with an identical size and structure. * However, efficient representation, access and analysis of semi-structured data are also necessary, because a less structured design is more useful for certain analyses and purposes. Even after cleaning and error correction in the database, some errors and incompleteness will remain, challenging the precision of the analysis. Relevance and applications in policy making * While information and feature extraction may appear far removed from the policy process, they are a fundamental requirement for ensuring the veracity of the information obtained and for reducing the effort required in the following phases, enabling the widest reuse of the data for purposes different from the one for which they were originally gathered. The data have to be adapted according to the use and analysis they are destined for, and they are moreover needed as preparation for visualization. Technologies, tools and methodologies * Bayesian techniques for meaning extraction; extraction and integration of knowledge from massive, complex, multi-modal, or dynamic data; data mining; scalable machine learning; principal component analysis. Tools include NoSQL, Hadoop, deep learning, RapidMiner, keymine, R, Python, and sensor data processing (fog and edge computing). Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)? Do you want to add any other research challenge? Cluster 5 – Modelling and analysis with big data Research Challenge 5.1 - Identification of suitable modelling schemes inferred from existing data * The traditional way of modelling starts with a hypothesis about how a system behaves; data are then collected to represent the stimulus. Traditionally, the amount of data collected was small, since it rarely existed already and had to be generated with surveys or perhaps imputed through analogies. Finally, statistical methods established enough causality to arrive at a sufficiently truthful representation of the system. * Deductive models are thus forward-running, and they end up representing a system not observed before. On the other hand, with the current huge availability of data, it is possible to identify and create new, suitable modelling schemes that build on existing data. * These are inductive models that start by observing a system already in place, one that is putting out data as a by-product of its operation. In this respect, the real challenge is to be able to identify and validate, from existing data, models that are valid and suitable for coping with complexity and unanticipated knowledge. * Model validation is composed of two main phases. The first is conceptual model validation, i.e. determining that the theories and assumptions underlying the conceptual model are correct. The second is computerised model verification, which ensures that the computer programming and implementation of the conceptual model are correct.
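As a minimal illustration of the second phase, the sketch below fits a candidate model on historical data and checks its predictions against a held-out period before the model is trusted for policy analysis. The data file, the columns and the simple linear model are illustrative assumptions, not part of the roadmap.

```python
# Minimal sketch of one ingredient of computerised model validation:
# out-of-sample checking of a candidate model against held-out history.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

data = pd.read_csv("indicator_history.csv").sort_values("year")
train, test = data.iloc[:-5], data.iloc[-5:]          # hold out the last 5 years

model = LinearRegression()
model.fit(train[["gdp_growth", "unemployment"]], train["tax_revenue"])

predictions = model.predict(test[["gdp_growth", "unemployment"]])
error = mean_absolute_error(test["tax_revenue"], predictions)
print(f"out-of-sample mean absolute error: {error:.2f}")
```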
Relevance and applications in policy making * There are several aspects related to the identification and validation of modelling schemes that are important in policy making. A first aspect deals with the reliability of models: policy makers use simulation results to develop effective policies that have an important impact on citizens, public administration and other stakeholders. Identification and validation are fundamental to guarantee that the output of analysis for policy makers is reliable. * Another aspect is the acceleration of the policy modelling process: policy models must be developed in a timely manner and at minimum cost in order to support policy makers efficiently and effectively. Model identification and validation are both costly and time-consuming, and if automated and accelerated they can lead to a general acceleration of the policy modelling process. Technologies, tools and methodologies * In current practice, the most frequently used approach is a decision by the development team based on the results of the various tests and evaluations conducted as part of the model development process. Another approach is to engage users in the choice and validation process. At any rate, conducting model validation concurrently with the development of the simulation model enables the development team to receive input earlier at each stage of model development. * Therefore, ICT tools for speeding up, automating and integrating the model validation process into the policy model development process are necessary to guarantee the validity of models with an effective use of resources. It has finally to be noted that model validation is not a discrete step in the simulation process: it needs to be applied continuously, from the formulation of the problem to the implementation of the study findings, as a completely validated and verified model does not exist. Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)? Research Challenge 5.2 - Collaborative model simulations and scenarios generation * This methodology encompasses the participation of all stakeholders in the policy-making process through online, easy-to-use tools for all skill levels. Decision-making processes have to be supported with meaningful representations of the present situation along with accurate simulation engines to generate and evaluate future scenarios. * Instrumental to all this is the possibility of gathering and analysing huge amounts of relevant data and visualizing them in a way that is meaningful also for an audience without technical or scientific expertise. Citizens should also be enabled to probe and collect data in real time to feed the simulation engines, and/or to contribute by means of some sort of online platform. * Understanding the present through data is often not enough, and the impact of specific decisions and solutions can be correctly assessed only when projected into the future. Hence the need for tools allowing a realistic forecast of how a change in current conditions will affect and modify the future scenario: in short, scenario simulators and decision support tools. * In this framework it is highly important to launch new research directions aimed at developing effective infrastructures that merge the science of data with the development of highly predictive models, so as to come up with engaging and meaningful visualizations and friendly scenario simulation engines.
* The weakest form of involvement is feedback to the session facilitator, similar to the conventional way of modelling. Stronger forms are proposals for changes or (partial) model proposals. In this particular approach the modelling process should be supported by a combination of narrative scenarios, modelling rules, and e-Participation tools (all integrated via an ICT e-Governance platform), so that the policy model for a given domain can be created iteratively through the cooperation of several stakeholder groups (decision makers, analysts, companies, civil society, and the general public). Relevance and applications in policy making * Clearly the collaboration of several individuals in simulation and scenario generation allows policies, and their impact, to be better understood by non-specialists and even by citizens, ensuring higher acceptance and take-up. Furthermore, as citizens have the possibility to intervene in the elaboration of policies, user centricity is achieved. * On the other hand, modelling co-creation also has other advantages: no single person typically understands all requirements, as understanding tends to be distributed across a number of individuals; a group is better capable of pointing out shortcomings than an individual; and individuals who participate during analysis and design are more likely to cooperate during implementation. Technologies, tools and methodologies * CityChrone++ is one of the instantiations of a larger platform dubbed the what-if machine (whatif.caslparis.com), aimed at providing users with tools to assess the status of our urban and inter-urban spaces and to conceive new solutions and new scenarios. The platform integrates flexible data analysis tools with a simple scenario simulation platform in the area of urban accessibility, with a focus on human mobility. In this framework, it will be important to pair the platform with effective modelling schemes, key for the generation and assessment of new scenarios. * United Nations Global Policy Model (GPM): this is a tool for the investigation of policy scenarios for the world economy. The model is intended to trace historical developments and the potential future impacts of trends, shocks, policy initiatives and responses over short, medium and long-term timescales, with a view to providing new insights into problems of policy design and coordination. Recently, the model has been applied to the assessment of possible policy scenarios and their implications for the world economy in a post-Brexit setting. * The European Central Bank New Area-Wide Model (NAWM): a dynamic stochastic general equilibrium model reproducing the dynamic effects of changes in monetary policy interest rates observed in identified Vector Autoregression models (VARs). The building blocks are: agents (e.g. households and firms), real and nominal frictions (e.g. habit formation, adjustment costs), financial frictions (domestic and external risk premia), and a rest-of-world block (SVAR). It is estimated on time series for 18 key macro variables employing Bayesian inference methods. The model is regularly used for counterfactual policy analysis. * TELL ME Model: this is a prototype agent-based model, developed within the scope of the EU-funded TELL ME project, intended to be used by health communicators to understand the potential effects of different communication plans under various influenza epidemic scenarios.
The model is built on two main building blocks: a behaviour model that simulates the way in which people respond to communication and make decisions about whether to vaccinate or adopt other protective behaviour, and an epidemic model that simulates the spread of influenza. * Households and Practices in Energy use Scenarios (HOPES): an agent-based model aimed at exploring the dynamics of energy use in households. The model has two types of agents: households and practices. Elements (meanings, materials and skills) are entities in the model. The model concept is that households choose different elements to perform practices depending on the socio-technical settings unique to each household. The model is used to test different policy and innovation scenarios to explore the impacts of the performance of practices on energy use. * Global epidemic and mobility model (GLEAM): a big data and high-performance computing model combining real-world data on populations and human mobility with elaborate stochastic models of disease transmission to model the spread of an influenza-like disease around the globe, in order to be able to test intervention strategies that could minimize the impact of potentially devastating epidemics. An interesting application case is the quantification of the risk of local Zika virus transmission in the continental US during the 2015-2016 ZIKV epidemic. (A minimal agent-based sketch in this spirit is given below.) Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)?
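In the spirit of the agent-based models listed above (TELL ME, HOPES, GLEAM), the following minimal Python sketch compares an influenza-like outbreak with and without a communication campaign that increases protective behaviour. Every parameter value is an illustrative assumption, not a calibrated estimate.

```python
# Minimal agent-based sketch: a toy outbreak under two policy scenarios.
import random

def simulate(population=2000, days=120, contacts=8,
             p_transmit=0.04, p_recover=0.1, protective_share=0.0):
    random.seed(0)
    # 0 = susceptible, 1 = infected, 2 = recovered
    state = [0] * population
    for i in range(5):                     # seed a handful of initial cases
        state[i] = 1
    protected = [random.random() < protective_share for _ in range(population)]
    peak = 0
    for _ in range(days):
        infected = [i for i, s in enumerate(state) if s == 1]
        peak = max(peak, len(infected))
        for i in infected:
            for _ in range(contacts):      # random mixing between agents
                j = random.randrange(population)
                risk = p_transmit * (0.5 if protected[j] else 1.0)
                if state[j] == 0 and random.random() < risk:
                    state[j] = 1
            if random.random() < p_recover:
                state[i] = 2
    return peak

print("peak infections, no campaign:  ", simulate(protective_share=0.0))
print("peak infections, with campaign:", simulate(protective_share=0.6))
```

Even a toy model of this kind shows how scenario generation works: the same simulation is re-run under alternative policy assumptions and the outcomes are compared.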
Research Challenge 5.3 - Integration and re-use of modelling schemes * This research challenge seeks ways to model a system by using already existing models, or by composing more comprehensive models from smaller building blocks, either by reusing existing objects/models or by generating/building them from scratch. Therefore, the most important issue is the definition/identification of proper (or most apt) modelling standards, procedures and methodologies, by using existing ones or by defining new ones. * Further to that, this sub-challenge calls for establishing the formal mechanisms by which models might be integrated in order to build bigger models or simply to exchange data and valuable information between models. Finally, the issue of model interoperability, as well as the availability of interoperable modelling environments, should be tackled, together with the need for feedback-rich models that are transparent and easy for the public and decision makers to understand. Relevance and applications in policy making * In systems analysis, it is common to deal with the complexity of an entire system by considering it to consist of interrelated sub-systems. This leads naturally to considering models as consisting of sub-models. Such a (conceptual) model can be implemented as a computer model that consists of a number of connected component models (or modules). Component-oriented designs represent a natural choice for building scalable, robust, large-scale applications and for maximizing the ease of maintenance in a variety of domains. * An implementation based on component models has at least two major advantages. First, new models can be constructed by coupling existing component models of known and guaranteed quality with new component models. This has the potential to increase the speed of development. Secondly, the forecasting capabilities of several different component models can be compared, as opposed to comparing whole simulation systems as the only option. Further, common and frequently used functionalities, such as numerical integration services, visualization and statistical ex-post analysis tools, can be implemented as generic tools, developed once and for all and easily shared by model developers. (A minimal coupling sketch is given at the end of this challenge.) Technologies, tools and methodologies * The CEF Big Data Test Infrastructure (BDTI) building block provides virtual environments built on a mix of mature open-source and off-the-shelf tools and technologies. The building block can be used to experiment with big data sources and models, test concepts and develop pilot projects on big data in a virtual environment. Each of these environments is based on a template that supports one or more use cases. These templates can be deployed, launched and managed as separate software environments. * Specifically, the Big Data Test Infrastructure will provide a set of data and analytics services (infrastructure, tools and stakeholder onboarding services) allowing European public organisations to experiment with Big Data technologies and move towards data-driven decision making. Applications of the BDTI include descriptive analysis, social media analysis, time-series analysis, predictive analysis, network analysis, and text analysis. * In practice, BDTI allows public organizations to experiment with big data sources, methods and tools; launch pilot projects on big data and data analytics through a selection of software tools; acquire support and access best practices and methodologies on big data; and share data sources across policy domains and organisations. Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)? Do you want to add any other research challenge?
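The component-coupling idea of Research Challenge 5.3 can be sketched minimally as follows: two toy component models expose a common step interface, and a generic coupler feeds the output of one into the other so that alternative policy scenarios can be compared. The models, parameters and the coupling itself are illustrative assumptions.

```python
# Minimal sketch of coupling two toy component models behind a common interface.

class TrafficModel:
    """Toy component: daily traffic volume responds to a congestion charge."""
    def __init__(self, base_volume=100_000, charge=0.0):
        self.volume = base_volume * (1 - 0.03 * charge)

    def step(self):
        return {"vehicle_km": self.volume}

class AirQualityModel:
    """Toy component: converts vehicle kilometres into NO2 concentration."""
    def step(self, vehicle_km):
        return {"no2_ug_m3": 10 + vehicle_km / 5_000}

def run_coupled(charge, days=30):
    """Generic coupler: output of one component is input to the next."""
    traffic, air = TrafficModel(charge=charge), AirQualityModel()
    readings = []
    for _ in range(days):
        flows = traffic.step()
        readings.append(air.step(flows["vehicle_km"])["no2_ug_m3"])
    return sum(readings) / len(readings)

for charge in (0, 5, 10):                     # compare policy scenarios
    print(f"charge {charge} EUR -> average NO2 {run_coupled(charge):.1f} µg/m3")
```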
Cluster 6 – Data visualization Research Challenge 6.1 - Automated visualization of dynamic data in real time * Due to continuing advances in sensor technology and the increasing availability of digital infrastructure for the acquisition, transfer, and storage of big data sets, large amounts of data become available even in real time. * Since most analysis and visualization methods focus on static data sets, adding a dynamic component to the data source results in major challenges for both automated and visual analysis methods. Besides typical technical challenges such as unpredictable data volumes, unexpected data features and unforeseen extreme values, a major challenge is the capability of analysis methods to work incrementally. * Furthermore, the scalability of visualization in the face of big data availability is a permanent challenge, since visualization requires additional performance with respect to traditional analytics in order to allow real-time interaction and reduce latency. * Finally, visualization is largely a demand- and design-driven research area. In this sense one of the main challenges is to ensure the multidisciplinary collaboration of engineering, statistics, computer science and graphic design. Relevance and applications in policy making * Visualization of dynamic data in real time allows policy makers to react in a timely manner to the issues they face. An example is given by movement data (e.g., road, naval, or air traffic) enabling analysis in several application fields (e.g., landscape planning and design, urban development, and infrastructure planning). * In this regard, it helps to identify problems at an early stage, detect the “unknown unknowns” and anticipate crises: visual analytics of real-time data are, for instance, widely used in the intelligence community because they help exploit the human capacity to detect unexpected patterns and connections in data. Technologies, tools and methodologies * Methodologies for bringing out meaningful patterns include data mining, machine learning, and statistical methods. Tools for the management and automated analysis of data streams include: CViz Cluster visualisation, IBM ILOG visualisation, Survey Visualizer, Infoscope, Sentinel Visualizer, Grapheur2.0, InstantAtlas, Miner3D, VisuMap, Drillet, Eaagle, GraphInsight, Gsharp, Tableau, Sisense, and SAS Visual Analytics. Apart from acquiring and storing the data, great emphasis must be given to the analytics and DSS algorithms that will be used. Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)? Research Challenge 6.2 - Interactive data visualization * With the advent of Big Data, simulations and models grow in size and complexity, and therefore the process of analysing and visualising the resulting large amounts of data becomes an increasingly difficult task. Traditionally, visualisations were performed as post-processing steps after an analysis or simulation had been completed. As simulations increased in size, this task became increasingly difficult, often requiring significant computation, high-performance machines, high-capacity storage, and high-bandwidth networks. * In this regard, there is a need for emerging technologies that address this problem by “closing the loop” and providing a mechanism for integrating modelling, simulation, data analysis and visualisation. This integration allows a researcher to perform data analysis interactively while avoiding many of the pitfalls associated with the traditional batch/post-processing cycle. It also plays a crucial role in making the analysis process more extensive and, at the same time, comprehensible. Relevance and applications in policy making * Policy makers should be able to visualize the results of analysis independently. In this respect, one of the main benefits of interactive data visualization is to generate high involvement of citizens in policy-making. * One of the main applications of visualization is making sense of large datasets and identifying key variables and causal relationships in a non-technical way; similarly, it enables non-technical users to make sense of data and interact with them. Secondly, it helps in understanding the impact of policies: interactive visualization is instrumental in making the evaluation of policy impact more effective. Technologies, tools and methodologies * Visualisation tools are still largely designed for analysts and are not accessible to non-experts. Intuitive interfaces and devices are needed to interact with data and results through clear visualisations and meaningful representations. User acceptability is a challenge in this sense, as are clear comparisons with previous systems to assess their adequacy. * Furthermore, a good visual analytics system has to combine the advantages of automatic analysis with interactive techniques for exploring data. Behind this desired technical feature lies the deeper aim of integrating the analytic capability of the computer with the abilities of the human analyst.
In this regard, an interesting case is given by the BigDataOcean project. * An interesting approach would be to look into two, or even three, tiers of visualisation tools for different types of users: experts and analysts; decision makers (who are usually not technical experts but must understand the results, make informed decisions and communicate their rationale); and the general public. Visualisation for the general public will support buy-in for the resulting policies as well as for the practice of data-driven policy making in general. * Tools available on the market include the imMens system, the BigVis package for R, Nanocubes, MapD, D3.js, AnyChart, and the ScalaR project, which all use various database techniques to provide fast queries for interactive exploration. (A minimal interactive-exploration sketch is given at the end of this cluster.) Do you agree with the research challenge (please comment above in line)? Can you suggest any application case, tool, methodology (please comment above in line)? Do you want to add any other research challenge?
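To close the visualization cluster, the following minimal sketch shows the kind of interactive, non-technical exploration view discussed under Research Challenge 6.2, using Plotly Express. The data file and column names are illustrative assumptions; any tidy table of policy indicators would serve.

```python
# Minimal sketch of an interactive exploration view for non-technical users.
import pandas as pd
import plotly.express as px

indicators = pd.read_csv("regional_indicators.csv")   # hypothetical indicator table

fig = px.scatter(
    indicators,
    x="unemployment_rate",
    y="air_quality_index",
    size="population",
    color="region",
    animation_frame="year",        # slider lets users step through time
    hover_name="municipality",     # tooltips support non-expert sense-making
)
fig.show()                         # opens an interactive, zoomable view
```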
Comments words cloud
* Yes, and clearly there is already a lot of work (also quoted in the roadmap) about the activity by ISA2.
* There are research gaps; obviously they resemble research challenges.
* Agreed.
* An important aspect to consider is the dynamic development of the policy process and the need to allow for feedback loops. In other words, it is important not to consider policy making as a linear process, but rather as a complex, non-linear, adaptive cycle. Another important issue to consider is the opportunity that, by using advanced digital technologies, data analytics and visualisation tools, the policy making cycle can integrate real-time data, provide immediate responses and simulate alternative policy impacts. These are actually mutually reinforced by the availability of data that can be processed in real time (or ex post) to produce ex ante or ex post evaluation and to monitor possible impacts using predictive modelling and simulation tools, so as to intervene in the process with corrective measures and alternative policy interventions. Such an evaluation framework should integrate as much as possible social impact indicators and non-tangible impacts, including through the use of "dynamic perception surveys" and sentiment analysis, for example.
* Citizen engagement could potentially contribute to policy making in many, various ways. Citizens could assist in pointing out societal issues to policy makers, help them formulate policy measures to tackle them, assist in the implementation of these measures and even evaluate whether the policies are actually addressing the problem and not making it worse. Unintentional effects of public policy could also become more visible thanks to greater citizen engagement, so it is very valuable for ensuring high-quality public services and policy. Especially with the increasing role of data, it is still a must that 'r
* This research area requires further exploration of the use of computer-based simulation models rooted in complex systems approaches, such as System Dynamics and Agent-Based Modelling.
* At the same time, as AI and IoT technologies are General Purpose Technologies, it is likely that we can find these technologies embedded in so many different contexts and cases that the possibility of assisting in tackling societal problems is almost limitless. IoT could potentially even make us aware of societal problems we did not even know about before, thanks to the data collection it is able to do. Especially with opportunities for data sharing among private and public organizations, we could become aware of different societal problems made discoverable by data coming from the IoT.
* Evidence is showing that it is at the local level that there will be the most intense and comprehensive way of interacting with citizens, as the 'distance' between the institution and citizens is the lowest. Citizens are more likely to be interested in participating and expressing their concerns if a policy influences their own living area.
* While there is general excitement about what technology can do to help us move forward, and AI optimists in particular believe that, with the right policies and deployed with care, it can bring about better outcomes for everyone, there are also many challenges that must be addressed at both sectoral and horizontal policy level.
* Other frequently used controversial examples of the risks of technology refer to the current and emerging use of AI techniques to make predictions which are assumed to be far more comprehensive and accurate than human-based predictions because they are based on huge data volumes, such as in the case of predictive policing, where law enforcement agencies use AI technologies to predict areas where crimes are more likely to occur, or to detect anomalies within big datasets to help organizations focus on specific cases which, according to the algorithm, stand out from the rest, which could also
* Public services should be seen as dynamic services, which are likely to change over time and have to fit citizens with totally different expectations and backgrounds. Public administrations will have to understand that there often is no one-size-fits-all solution and that they will have to adapt and change their services when the need arises.
* Very crucial, indeed!
* This, indeed, offers a variety of options among meaningful and beautiful visualisations that can facilitate the representation and understanding of real-time data. There are several tools that do this at the moment, and improving this type of technology has extensive scientific and practical value in the era of the IoT.
* New technologies from the public sector point of view, or in general?
* I agree with Ana: quality is the most important aspect.
* Evidence-based policy making can be achieved when data-driven policy making is applied. The roadmap under consideration is of actual value!
* I guess there are many approaches out there. This one seems legit though.
* Cost is not the top priority here, I suppose. Dealing with important societal problems should be the focus.
* It's a practical issue. You cannot have actual data prior to realisation. However, relevant historical data can usually be exploited.
* Everyone should contribute with (relevant and non-sensitive) data when it comes to coping with societal problems.
* And not only with data! Tools, technologies and know-how should also be shared for the common good!
* Almost all concepts discussed here are heavy scientific procedures. I'm not certain that co-creation (if not carefully targeted) can be effective.
* Interesting finding!
* I like your approach. However, many consider formats like PDF as open data, although they do not match the description here.
* In addition, a minimal cost is still a cost! Thus, I would think that "cheap" data are not open data.
* "Readily accessible" is indeed much more than "open" and is a very meaningful approach.
* True, it's not easy to find a plethora of successful commercial apps using only open data.
* Much as I believe in linked data, up to today the actual impact is not even close to the presumed one.
* Personal data are not that much of a barrier in my opinion. I think that the GDPR, for example, easily allows processing. Sensitive data is the key here.
* I agree that these two are not the same. However, they are both of the utmost importance!
* Policy makers and society at large face a number of fundamental challenges. The different components of the urban system are strongly interwoven, giving rise to complex dynamics and making it difficult to anticipate the impact and unintended consequences of public action. Urban development policies are subject to highly distributed, multi-level decision processes and have a profound impact on a wide variety of stakeholders, often with conflicting or contradictory objectives. The increasing growth and availability of municipal data sets represents a significant opportunity for municipalities and their citizens to develop tools for coping with these problems, especially in newly urbanising countries where new sources of digital data can both help to make sense of changing needs and demographics and enable interactive urban planning and governance. These new sources of data are already the subject of interest and experiment. In terms of data access, while some big data are generated and channeled by city authorities - for example via traffic sensors or e-government applications - and are easier to access (provided that privacy restrictions on personal data are met), many big data are channeled by the private sector - for example real-time details of people's movements via mobile calling records, or details of economic transactions performed over mobile networks. Similarly, some other forms of data, such as public feedback projects, are channeled by civil society organizations, often in combination with academia. With respect to analysis capacity, big databases require big resources. Although not every big data problem requires a supercomputer, significant software and hardware resources are necessary to manage and analyse big data due to their richness and real-time aspects. Though municipalities can perform analysis themselves on their own data sets, the opportunity to derive insights from large data sets is greatly improved if academic and industry researchers can be invited to participate in collaborative analytics across the data set. In the past few years we have seen the emergence of concepts such as the smart city, urban informatics [Unsworth'14], urban analytics [Marcus'11] and citizen science [Hand'10], which are seen to hold great promise for improving the functioning of cities. However, arguably most of this potential still remains to be realized. The opportunities are vast, but so are the challenges (Herranz'15 - Herranz, Ricardo (2015), "Big Data, Complexity Theory and Urban Development", New Approaches to Economic Challenges, OECD Report).
* Foresight methodologies like Scenario Planning [Ramirez'13] are already applied in policy-making (e.g. the "Gesetzesfolgenabschätzung" in German legislation or the "Integrated Impact Assessments" in European legislation) and are regarded as adequate means to prepare political decisions. However, the available methodologies are geared to high national levels and draw mainly on the results of in-depth research and expert judgements [Heuer'11]. They are hardly adapted to usage in urban areas, nor are they readily adjusted to incorporate big data. Furthermore, scenario planning methodologies have only recently started to become more participatory and to build on the knowledge and experience of citizens and the population as such.
* The sentence should be divided in two in order to better explain the role of citizens and other stakeholders as providers.
* and results are constantly monitored and evaluated.
* The idea of paying in data itself should be supported by some more details.
* Counterfactual evaluation using big data should be planned well in advance; it cannot be done ex post.
* This is the real added value, and this point should be emphasised and strengthened.
* This is a super challenge; a European or a worldwide one?
* Understanding the problem is crucial. It is not the visualisation that generates the involvement of citizens! Data visualisation may help and make it easier to access data, but it is not sufficient.
* I would add a cluster on the role of citizens and other stakeholders in providing big data and their role as users.
* The section should include more focus on how relevant governmental open data are.
* Data visualisation and scenario visualisation should be tested in person; maybe the "citizen juries" method can be used in order to make improvements and validate results. http://designresearchtechniques.com/casestudies/citizen-juries-an-action-research-method/
* Are citizens the only providers? I guess useful data can come from everywhere where policy making is concerned.
* I found it hard to fully understand the paragraph. Anyway, I see many more problems, e.g. unstable regulation might result in confusion on how to take advantage of data. Another case is that data are out there but policy making organizations lack the know-how to collect and/or process them.
* I really wonder if there are actually policy making organizations that rely on these principles.
* Valid point! Of course, convincing someone to share data (especially individuals) is not easy.
* In general, I consider big data, extreme data etc. as buzzwords. The value lies in finding and exploiting relevant data, even if they are not "big".
* Although citizens might not come to recognize the value of the policy (as we tend to recognize the importance that something has only if we lack solutions).
* True, an important barrier!
* This might require more effort than it seems. Citizens indeed discuss issues in fora and on social media, but only in dedicated places do these issues refer directly to public services, policies etc. "Meta-analysis" might be needed in order to turn typical social media comments into valuable input for policy making.
* I'm curious: isn't this already happening? It's too obvious not to do.
* In a sense, it has to be questioned whether these are "typically" big data, although the importance is in their value.
* Regulatory fragmentation is a barrier that is practically impossible to overcome. If the focus is on Europe, common legislation at all levels is needed.
* I trust that this can also be done with experts' input. Data-intensive techniques are not necessary here.
* I don't get it; do big data produce new real-time data?
* Interoperability with existing systems is also important.
* Indirectly, citizens can significantly contribute by sharing their data. Of course, user-friendly evaluations (e.g. through mobile devices) are also of great added value.
* From the technical point of view, there is a plethora of commercial solutions out there. I don't know if there are open-source ones as well, which would be ideal for the public sector.
* As far as legal aspects are concerned, this is very challenging. As I also said before, legal homogenization in Europe, for example, is a "green light" for such endeavours.
* The cost of proven and effective solutions.
* The cost of training employees, who might also be non-IT literate.
* Maybe even the absence of proper data.
* This does not necessarily call for big data, I think; more for user-friendly interfaces and appropriate dissemination.
* This is very important, well written!
* And come up with intuitive ways to involve them.
* It's a very valid proposition.
* The most important detail is that the evaluation criteria should be dynamic.
* Regulatory frameworks are the key here. Regulations such as the GDPR sometimes result in confusion rather than trust.
* Interoperability is extremely important. In practice it saves time, effort and monetary resources.
* Excellently written by Angela. The scale is the challenge here.
* The question is whether these solutions are actually understood, or seen as black boxes. The latter could cause trust issues in the future.
* This is a general comment. Systems and resources in general are one thing; willingness to get out of "business as usual" is a totally different challenge.
* The thing is that regulatory frameworks also have to be stable, and not change every now and then as they do at the moment.
* I agree that we should let the "machines" decide in many respects, but end users should be convinced that it is for the best, as distrust can undo even the most effective and efficient system.
* This needs a lot of discussion. Typical co-creation, which directly involves stakeholders, might not be the best fit for policy making; indirect involvement, yes.
* Co-creation of algorithms? Maybe co-training of algorithms?
* There are cases where people confuse governance with government, so this is a valid entry for more than one reason.
* There is a classic issue here: it's not easy to design and develop tools that are both sophisticated and friendly to the average user.
* These are not mutually exclusive.
* True, visualization is powerful. It's just that in many cases it's impossible to go into the same detail that algorithms and "core" mathematics can.
* Why focus on the problem-setting phase? I think that this approach fits the policy cycle as a whole.
* Causal relationships are usually hidden, and visualizations cannot bring them forward.
* To-the-point statement!
* There can also be a gradual approach: you let the user go one step deeper once he/she claims to have understood the previous one.
* IMHO, data-driven policy making is the key. The BIG data notion is a good-to-have, but first you need to be certain that the data-driven approach has been achieved.
* Very, very important, as I have already stated. Algorithmic bias should be avoided both in practice and as a matter of theoretical criticism.
* Since building/restoring trust is the target, this could be catastrophic.
* These are very challenging notions. For example, micro-politics can never be off the agenda; however, they cannot be communicated and modelled as easily.
* Bureaucracy is a "neighbouring" notion as well.
* This is a very good discussion topic. Building in-house expertise is one way to go; outsourcing is a different one. A technical and financial analysis should probably be the criterion.
* I think that one can never be 100% certain about this. However, the more the sources and the larger the datasets, the better this can be overcome.
* This is by far the most important of the three.
* I can also note that this is more "horizontal"; it does not necessarily belong to this cluster only.
* When called to analyze this, defining who should participate in the collaboration is an important aspect.
* Re-use should be careful: only where and when appropriate.
* It's definitely a research field that policy makers need to invest in. However, I guess it's not "market-ready" yet.
* Being certain (and not just reassured) that your data are exploited only for the purposes they are supposed to be is of the utmost importance.
* Machine learning and deep learning require huge datasets for training. Are you certain there are such datasets in the public sector?
* I don't think I fully follow the meaning here.
* I can see the extreme value in health. However, I think that many barriers will come up in justice.
* I think I agree. In any case, it's important to invest in investigating algorithmic bias.
* This is a big issue, already faced but never solved. Much effort is oriented towards providing some guidelines for data provisioning. So far, there is no standardization for content management, web services, technological platforms, data modeling, etc.
* See the previous comment about frameworks and tools.
* Development of analytical tools and information systems able to understand the effectiveness of the actions taken to face policy issues.
* Noise, random errors and systematic errors must be transparent to policymakers.
* Moreover, a procedural stack to ensure the application of the 5-star principles for Open Data, as much as possible.
* Centralization vs. federation and administration autonomy reveal the consistency issue in the case of data interrelation, to some extent. This occurs, for example, when an organization modifies some content referenced by others; so, any modification must be propagated in some way.
* Please have a look at my article: https://www.agendadigitale.eu/personaggi/cristian-lai/
* To ensure that big data are used appropriately, though, the points mentioned below on data interoperability, and on the GDPR fostering the data economy rather than being a barrier, are very important. We also need to research further the wish of several stakeholders to move to an open data landscape, with open data being necessary for offering new services or generating research. Also, a recommendation deriving from TT to assist the development of new tools and techniques is to focus on data quality, at least for important datasets. Policy recommendation: data interoperability to foster collaboration.
* An important addition is that, in order to use big data appropriately, we also need to resolve issues that relate to the data themselves: their protection, ownership and interoperability. For example, in the TT project we registered, while interviewing the 13 pilot leaders, fragmented policies regarding the GDPR across Europe; many stakeholders were hindered from sharing data, making big data analysis and use difficult and sometimes impossible. Pilots did follow specific methodologies to facilitate this, which delayed their business. Extra training or assistive tools were also suggested by pilots, with natural language explanations offered for everyday users. Pilots have suggested that even national or regional authorities have difficulty interpreting complicated issues such as who owns the data and whether these data are personal.
* A data provider can be everyone, or everything; think also of the IoT.
* Quality data are the most important. Why not pay for them if necessary?
* Anyway, all data produced by the public sector are (or should be) free.
* Accountability is very important, as long as it's objective. Which is, of course, difficult.
* E-Policy: why? Is the "horizontal" evaluation only electronic?
* Prediction is the best case.
* Statistics are only one kind of analysis.
* Patterns and "black swans" are the obviously important things, but I don't know if these are enough for policy making.
* Social networks are full of bots, promoted posts etc. I don't know if they are the best source.
* Ex ante impact assessment can also take place in more traditional ways.
* But there needs to be true political will.
* There's a typo here: "data, but TT needs..."
* I kind of agree with the first comment here...
* Are thousands of opinions enough to be treated as big data?
* This is of the utmost importance; investments should be made.
* Objective evaluation of policies' impact is very important!
* Whether we want it or not, financial sustainability has to be ensured.
* Particularly in times of financial/economic crisis.
* Interoperability, re-usability: these could call for a roadmap of their own!
* Systems in general? Or a common system, only with fine-tuning per case?
* Doesn't this constitute a merging/combination of already mentioned entries?
* Data management is the correct term, I think; it involves data determination, maintenance etc.
* It's strange that everyone focuses on the same qualities of policy making and yet they are not achieved...
* It would be a good idea if the Better Regulation guidelines of the EC were taken into account for the development of new evaluation frameworks.
* On the representativeness of collected data: is there an active process through which data sources are weighted in order to gather the most relevant and accurate ones?
* Here the time when data expire, and must be deleted or no longer taken into consideration, is important. Another challenge is how we can be certain that people are honest in their answers. Legal issues (e.g. GDPR) should also be addressed, perhaps via anonymisation.
* Furthermore, it should be considered for how long the data are valid, and when their processing should be stopped. Also, when data are no longer valid, should they be deleted, or maintained so as to check patterns in the future?
* Nowadays, security and privacy tend to gain significant priority over the rest of the factors examined.
* As a self-improvement mechanism.
* New approaches always raise dispute, so that is why they need to be well documented and tested.
* Good visualisations do make a difference, but they may become confusing at times.
* Very advanced science... Nice!
* Crucial step!
* Real-time insights are a really interesting case to consider.
* Big data will be a powerful tool in this process.
* YES
* Good Read!
* "Streaming data" might be misleading (it suggests media streaming); "stream of data" also includes this idea/picture of the permanent generation of data. "Agricultural data" might be added.
* I see a link to FIWARE, as a platform and even a marketplace for tools and methods: https://www.fiware.org/
* Very important; also "traceability": who issued/published the data? How is it transformed?
* What about anonymization (another way of cleaning)? The "social media" source (see above) needs anonymization; maybe there is also some fraction of personal data inside other data sources, e.g. mobility data.
* What about the concept of "virtual sensors/sensing"? Topics include "Modelling" and "Data Visualization".
* This proposition requires further refinement: it does not make entirely clear either the precise definition of the concept or how it is distinct from other forms of policymaking.
* This is a convoluted formulation. It is unclear in both instances what the consequences are or will be.
* While this is an important observation, the proposition needs to be formulated better.
* It is important to consider, at this stage, advances in digital technologies and their role in the policymaking process.
* It is important to take into account here developments in HPC tools and techniques used to process big data.
* This is an important facet to be considered, as it implies the presence of political variables that need to be factored into the agenda-setting process.
* Equally important is that data and/or statistics are not always associated with a problem until it is detected.
* Most of the data obtained from these sources involves Personally Identifying Information (PII), carries associated privacy risks, and must hence be handled responsibly.
* This proposition consists of two separate issues: 1) government use of social media networks to enhance participation, and 2) the legal and ethical implications of using this data. The connection between the two ideas needs to be more clearly articulated.
* Responsible data handling and other data management practices, together with sufficient data protection measures, are two further important considerations in this context.
* The use of big data and related analytics tools and services has necessarily to be accompanied by appropriate regulatory measures. This is not new, but perhaps needs to be articulated nonetheless.
* This should be obvious. It is more important, however, to understand how big data can be leveraged to evaluate the impact of a policy or predict an outcome.
* Two issues requiring greater understanding present themselves: the first is the nature of data as a dual-purpose commodity with both economic and social value; the second is the nature of the incentives available to encourage data providers to share their data for public benefit. See: Virkar, S., Viale Pereira, G., Vignoli, M. (2019) Investigating the Social, Political, Economic and Cultural Implications of Data Trading. In: Lindgren, I. et al. (eds) Electronic Government. EGOV 2019. Lecture Notes in Computer Science, vol 11685. Springer, Cham.
* The identification of key stakeholders and their role in the policymaking process should ideally be done in parallel with the analysis of big data, and not necessarily afterwards. In actual practice, this might be neither feasible nor desirable.
* That new policies immediately produce new data is not necessarily always the case.
* This is an important observation/consideration.
* That big data may be used to infer behavioural insights which then inform public policy is a proposition that should not be ignored. The government of the United Kingdom has set up a policy unit - the Behavioural Insights Team (BIT) - that does exactly this. See: https://www.bi.team/
* As are appropriate data management practices and citizen outreach programmes.
* Appropriate data management policies and privacy safeguards act as incentives - if not necessary prerequisites - for citizens to surrender their data.
* The implementation of technical standards and legislation that safeguard privacy is required in this context.
* From a technical perspective, fragmented/inaccessible/incomplete data and data sources constitute a significant bottleneck to the use of Big Data in policymaking.
* The salience of data used in this context is key.
* This is not something that the use of Big Data can necessarily resolve.
* This requires not just the use of Big Data, but also the deployment of appropriate data analytics tools and services.
* This is an important challenge requiring resolution.
* This should be done in parallel with, or directly before, the actual formulation of policy.
* This is possible provided that big data is available regularly and in real time.
* This is a valid proposition, and any framework should factor in the use of new digital technologies and tools.
* A way needs to be found to integrate impact assessment and sentiment analysis tools and techniques to gauge citizen opinion expressed via social media channels. (A minimal sentiment-scoring sketch follows this list.)
* These should already be in place, and this is therefore an important research gap. Achieving near-complete interoperability across public systems and databases, together with the streamlining of organisational processes, are important prerequisites of technology acceptance.
* This is an important research gap in an era of big data-driven policymaking.
* This is becoming an important research domain as more and more governments integrate disruptive technologies into existing public systems.
* This should be a fundamental consideration in policymaking, and ought to be researched further.
* This is an exceedingly important starting point for research into the privacy, transparency and trust implications of big data-driven policymaking.
* This consideration is important for determining the incentives available for the opening and sharing of data.
* Resolving this research challenge is important for bringing an element of transparency into the policymaking process.
* This is an important observation.
* This is an emerging area of research, and should not be undervalued.
* These are two separate research challenges, and need to be considered as such.
* Agreed.
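The call above for sentiment analysis of citizen opinion expressed on social media can be pictured with a deliberately simple, lexicon-based scorer. It is a sketch only: the word lists and the `sentiment_score` function are assumptions, not an existing tool, and production use would rely on trained multilingual models and would have to handle negation, sarcasm and the representativeness caveats raised elsewhere in these comments.

```python
# A toy lexicon-based sentiment scorer; the lexicons and function name are illustrative only.
POSITIVE = {"good", "great", "support", "agree", "useful", "transparent"}
NEGATIVE = {"bad", "oppose", "unfair", "useless", "waste", "distrust"}

def sentiment_score(post: str) -> float:
    """Score a social-media post in [-1, 1]: +1 if only positive words, -1 if only negative."""
    words = [w.strip(".,!?;:").lower() for w in post.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

if __name__ == "__main__":
    posts = ["The new recycling policy is great and very useful",
             "This tax proposal is unfair and a waste of money"]
    for p in posts:
        print(round(sentiment_score(p), 2), "-", p)
```

Aggregating such scores over time and by topic is one plausible way of feeding an impact-assessment view, provided the sample of social-media users is not mistaken for the whole population.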
* The two concepts - governance and government - should be recognised as being separate and not be confused with each other.
* This is a continuously evolving research domain, whose development should not be ignored.
* Fragmented, hidden, or inaccessible data sources need to be aggregated and their use thereby optimised.
* This is an important research challenge/problem space for consideration by practitioners.
* It must be remembered that this is equally the responsibility of the data provider, and that research efforts should focus to some degree on establishing standards for data handling best practice.
* Data collected from these sources contains personally identifying information, and hence researchers must be equipped to handle this data responsibly.
* This is a significant prerequisite, together with the implementation of sufficient data management regulations and the creation of awareness surrounding best practice.
* Further research into the challenges associated with the training of algorithms is required in this context.
* In practice, this is a key facet of policymaking.
* Greater research into decision support tools as applied to policymaking is required, therefore. Developments in both techniques must not be overlooked.
* Visualisations constitute important representations of big datasets that facilitate sense-making by non-expert users. See H. Vornhagen's work on governance dashboards. (A minimal dashboard sketch follows this list.)
* This is important within the context of communicating policy objectives and outcomes to the wider public.
* See, for example: Vornhagen, H., Davis, B., & Zarrouk, M. (2018). Sensemaking of complex sociotechnical systems: the case of governance dashboards. In Proceedings of the 19th Annual International Conference on Digital Government Research: Governance in the Data Age. ACM Press.
* This should not be a primary consideration, in my opinion.
* This is correct.
* The clustering presented here is valid, and reasonably comprehensive.
* Issues related to stakeholder identification, their roles and responsibilities, and the associated impact of policy could form another specific research cluster.
* Apart from a missing cluster focusing on stakeholder analysis, the current list appears to be fairly comprehensive.
* The Behavioural Insights Team has conducted extensive applicative work in the domain.
* This is not necessarily a threat arising from the application of nudge theory or nudging - instead, this could be a result of the manner in which data sharing is incentivised.
* This is an important ethical consideration that warrants further research.
* This is an important observation, and herein the incorporation of behavioural insights during the policymaking process is key.
* See, for example: Collmann, J., Matei, S.A. (2016) Ethical Reasoning in Big Data: An Exploratory Analysis. Springer, Cham.
* These are important applicative ethical and socio-technical aspects of policymaking that warrant further investigation.
* Data availability and veracity in this context are key.
* See examples of work done by the Behavioural Insights Team: https://www.bi.team/our-work/publications/
* As mentioned earlier, this is an emerging research area with significant policy implications.
* This is an important ethical consideration that warrants further exploration.
* This is an important consideration, with significant implications in practice.
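To make the dashboard point above concrete, the following sketch draws a two-panel "governance dashboard" with matplotlib. The indicators, their values and the output file name are entirely illustrative assumptions; real dashboards of the kind discussed by Vornhagen and colleagues would be interactive and fed by live data rather than hard-coded lists.

```python
# A minimal, illustrative dashboard panel with made-up indicator values (not real data).
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
complaints = [120, 95, 110, 80, 70, 65]   # hypothetical citizen complaints per month
air_quality = [42, 45, 40, 38, 35, 33]    # hypothetical pollutant index

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3))
ax1.bar(months, complaints, color="steelblue")
ax1.set_title("Citizen complaints (count)")
ax2.plot(months, air_quality, marker="o", color="seagreen")
ax2.set_title("Air-quality index (lower is better)")
fig.suptitle("Toy governance dashboard")
fig.tight_layout()
fig.savefig("dashboard.png")   # in practice this would feed an interactive dashboard tool
```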
* The feasibility of these methodologies, as applied in practice, needs to be researched further.
* Yes, this is an important research challenge.
* See: Virkar, S., Viale Pereira, G. (2018) Exploring Open Data State-of-the-Art: A Review of the Social, Economic and Political Impacts. In: Parycek, P. et al. (eds) Electronic Government. EGOV 2018. Lecture Notes in Computer Science, vol. 11020. Springer, Cham.
* The development of data trading platforms, and accompanying targeted national policy, is equally important in this respect. This remains an important domain of critical enquiry.
* See, for example: Charalabidis, Y., Zuiderwijk, A., Alexopoulos, C., Janssen, M., Lampoltshammer, T.J., Ferro, E. (2018) The World of Open Data: Concepts, Methods, Tools and Experiences. Springer, Cham.
* The application of HPC tools and techniques to the extraction and aggregation of meaningful information from data is particularly relevant in this context.
* For a popular take on the subject, see: Vaidhyanathan, S. (2018). Antisocial Media: How Facebook Disconnects Us and Undermines Democracy. Oxford University Press.
* This is an important research question, especially if raised while considering the application of disruptive technologies to public service provision and policymaking.
* Agreed.
* Research on data trading initiatives is important in this respect.
* This is an important research consideration. For a discussion of the role of government in data trading, see: Virkar, S., Viale Pereira, G., Vignoli, M. (2019) Investigating the Social, Political, Economic and Cultural Implications of Data Trading. In: Lindgren, I. et al. (eds) Electronic Government. EGOV 2019. Lecture Notes in Computer Science, vol. 11685. Springer, Cham.
* See, for example: Rinnerbauer, B., Thurnay, L., Lampoltshammer, T. J. (2018). Limitations of Legal Warranty in Trade of Data. In: Virkar, S., Parycek, P., Edelmann, N., Glassey, O., Janssen, M., Scholl, H. J., Tambouris, E. (eds) Proceedings of the International Conference EGOV-CeDEM-ePart 2018, 3-5 September 2018, Danube University Krems, Austria: 143-151. Edition Donau-Universität Krems.
* This is an important research consideration.
* These are important domains with significant legal and ethical implications that thereby warrant further research.
* These issues related to IoT carry important legal implications that are not yet sufficiently defined or explained in existing literature. Consent management is a hitherto neglected issue pertaining to the use of Big Data in this context. (A small consent-registry sketch follows this list.)
* This is a highly relevant research issue, especially in the context of disruptive technology deployment.
* Well-articulated legal and ethical guidelines need to be developed in this context.
* This is not necessarily the case.
* The efficacy of Blockchain technology needs to be looked at further in this context.
* This is true, particularly in the context of the European Union.
* An appropriate understanding of the given data network has to be developed first.
* Agreed.
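The consent-management gap flagged above can be pictured with a small sketch of a consent registry that records grants and withdrawals per processing purpose and is checked before any analysis runs. The class, purposes and identifiers are hypothetical illustrations, not a standard or an existing system; a real implementation would need durable storage, auditability and alignment with GDPR lawful bases.

```python
# Illustrative only: the ConsentRegistry class, purposes and identifiers are hypothetical.
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Set

@dataclass
class ConsentRegistry:
    """Tracks, per data subject, which processing purposes they have consented to."""
    consents: Dict[str, Set[str]] = field(default_factory=dict)
    log: list = field(default_factory=list)   # append-only record of consent events

    def grant(self, subject_id: str, purpose: str) -> None:
        self.consents.setdefault(subject_id, set()).add(purpose)
        self.log.append((datetime.utcnow().isoformat(), subject_id, "grant", purpose))

    def withdraw(self, subject_id: str, purpose: str) -> None:
        self.consents.get(subject_id, set()).discard(purpose)
        self.log.append((datetime.utcnow().isoformat(), subject_id, "withdraw", purpose))

    def allowed(self, subject_id: str, purpose: str) -> bool:
        return purpose in self.consents.get(subject_id, set())

if __name__ == "__main__":
    registry = ConsentRegistry()
    registry.grant("subject-42", "mobility-analysis")
    print(registry.allowed("subject-42", "mobility-analysis"))   # True
    registry.withdraw("subject-42", "mobility-analysis")
    print(registry.allowed("subject-42", "mobility-analysis"))   # False
```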
* The interoperability of government data, and the design of governmental data networks, warrant further investigation.
* Informed consent, together with accompanying legal and ethical considerations, is an important facet of this discussion that needs to be investigated further.
* National interoperability frameworks, and their compatibility with the EIF, need to be developed and understood through further research.
* New research projects, for example ManyLaws: EU-Wide Legal Text Mining using Big Data Infrastructures, seek to leverage these functionalities to develop e-government systems.
* These principles are already enshrined in European Union policy, and their implications - both legal and ethical - need to be researched further.
* The legal implications associated with data acquisition via IoT technologies, and the rights and responsibilities of the various stakeholders involved, are both important research areas.
* This is an important emerging research domain, as most currently available analytics software is proprietary.
* The sourcing and analysis of data obtained from social media platforms is often contentious, and requires careful attention on the part of the researcher.
* Tools and technologies that facilitate online sentiment analysis and data mining are often proprietary, and the research conclusions drawn are not always representative.
* The veracity of opinion-based data obtained from social media sources is often inconsistent. Research conclusions drawn, therefore, require careful consideration.
* Content analysis of social media posts is also important in this respect.
* Big Data quality assessment is an important step that should preferably occur before, rather than during or after, data pre-processing. Hence, it constitutes an important research challenge.
* This is an important observation.
* This is not a research challenge per se; instead it becomes a challenge during its implementation in practice.
* Significant ethical concerns are implied, therefore, that warrant further investigation.
* There is more substance to stakeholder analysis and policy impact analysis than is mentioned here.
* More research is required to refine the aforementioned sampling methodologies.
* This is an emerging research domain that warrants further examination.
* Legal conditions and ethical considerations need to be critically examined in this context.
* Agreed.
* This is a significant emerging research domain in the context of the use of disruptive technologies in the public sector.
* Data mining and text mining are two important methodologies that warrant further research and practical application in this context.
* Pattern extraction can be extended to all policy domains, and all potential avenues for the application of these techniques must be thoroughly explored.
* A thorough knowledge of basic model validation techniques is herein considered important for policymakers.
* Agent-based modelling is an important technique to be explored in this context. (A toy agent-based model follows this list.) See, for example: Batty, M., Crooks, A. T., See, L. M., & Heppenstall, A. J. (2012). Perspectives on agent-based models and geographical systems. In Agent-Based Models of Geographical Systems (pp. 1-15). Springer, Dordrecht.
* This proposition requires further refinement.
* Goal modelling, although a fundamental concept, is an important step in this process.
* Policymakers need to be equipped with the soft skills necessary to leverage the potential of these tools and technologies.
* Gamification is an important concept that may be applied in this context. See, for example: Devisch, O., Poplin, A., & Sofronie, S. (2016) The Gamification of Civic Participation: Two Experiments in Improving the Skills of Citizens to Reflect Collectively on Spatial Issues. Journal of Urban Technology, 23(2), 81-102.
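To give a flavour of the agent-based modelling mentioned above, here is a toy opinion-diffusion model in which agents adopt the majority view of a few randomly sampled peers, with occasional random flips. All parameter values are illustrative assumptions, and unlike the geographically explicit models discussed by Batty et al. (2012) this sketch has no spatial structure.

```python
# A toy agent-based model of policy-opinion diffusion; all parameters are illustrative.
import random

random.seed(1)

N_AGENTS = 200   # hypothetical population size
STEPS = 30       # simulation steps
SAMPLE = 5       # peers each agent consults per step
NOISE = 0.02     # chance an agent flips opinion at random

# 1 = supports the policy, 0 = does not; start with roughly 30% support.
opinions = [1 if random.random() < 0.30 else 0 for _ in range(N_AGENTS)]

for step in range(STEPS):
    new_opinions = opinions[:]
    for i in range(N_AGENTS):
        peers = random.sample(range(N_AGENTS), SAMPLE)
        majority = sum(opinions[j] for j in peers) > SAMPLE / 2
        new_opinions[i] = 1 if majority else 0
        if random.random() < NOISE:          # occasional random flip
            new_opinions[i] = 1 - new_opinions[i]
    opinions = new_opinions
    if step % 5 == 0:
        print(f"step {step:2d}: support = {sum(opinions) / N_AGENTS:.2f}")
```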
* Agreed. In the context of policymaking, this is a fundamental output of data science-based initiatives.
* Further research in the domain of decision support tools and systems is required, therefore.
* This is one approach that can be applied successfully to effective policy modelling, and thus warrants further close examination.
* Collaborative modelling is one way of generating targeted policies, and ensuring their acceptance. However, there are other factors involved in this process, and these must not be forgotten.
* The risks associated with modelling co-creation also need to be critically examined.
* This proposition is valid.
* These are important areas that require further research.
* Complex systems theory is an interesting theoretical approach in this respect. See, for example: Cilliers, P. (2002). Complexity and Postmodernism: Understanding Complex Systems. Routledge.
* This is an interesting research area that needs to be further explored.
* The legal implications of data visualisation techniques, especially in the context of IoT, need to be critically explored.
* Social science input must not be foreclosed.
* Policymakers need to be equipped to deploy these tools within various policymaking contexts.
* Agreed, this is an evolving research domain.
* A critical look at the role of visualisations in policymaking is required in this context.
* These are two different (although related) issues that need to be addressed in separate propositions.
* See, for example: Vornhagen, H. (2018). Effective Visualisation to Enable Sensemaking of Complex Systems. The Case of Governance Dashboard. In: Virkar, S., Parycek, P., Edelmann, N., Glassey, O., Janssen, M., Scholl, H. J., Tambouris, E. (eds) Proceedings of the International Conference EGOV-CeDEM-ePart 2018, 3-5 September 2018, Danube University Krems, Austria (pp. 313-321). Edition Donau-Universität Krems.
* This issue needs to be better understood and addressed through further research.
* For example, see: Vornhagen, H., Young, K. and Zarrouk, M. (2019) Understanding My City through Dashboards. How Hard Can It Be? In: Virkar, S., Glassey, O., Janssen, M., Parycek, P., Polini, A., Re, B., Reichstädter, P., Scholl, H.J., Tambouris, E. (eds) EGOV-CeDEM-ePart 2019: Proceedings of Ongoing Research, Practitioners, Posters, Workshops, and Projects of the International Conference EGOV-CeDEM-ePart 2019, 2-4 September 2019, San Benedetto del Tronto, Italy (pp. 21-30).
* See H. Vornhagen's work on government dashboards, cited previously.
* Maybe you could specify more stakeholders rather than split them into "citizens" and "relevant stakeholders"? Is the focus of "new data sources" and "new techniques" a digital one only, or are you also considering non-digital ones?
* I agree with Shefali; this point is somewhat complicated. Some ideas: replace "would like to" with "should"; replace "case"/"other case" with "on the one hand"/"on the other hand"; in the second case, simplify the statement by saying something like "data is collected but not shared".
* Problems have increasingly been considered "wicked problems", pointing out their complexity and that the solutions required must be more than political or analytical.
* Social media/social network monitoring is a delicate issue because, on the one hand, there are benefits, but on the other it may have an impact on citizen trust.
* Not only available and usable, but accessible and understandable.
* Providing too much can also lead to problems.
* When thinking about the way citizens can participate, it is important to consider their needs, abilities and competences, not just their digital skills but also their role in society. And it is important to ensure that their participation is accepted, as well as the results that may come out of that participation.
* Why are only citizens mentioned in this list? At the top of the document "other stakeholders" are mentioned - and how they contribute may be quite different.
* Will you be getting thousands of opinions?
* It may be useful to think about a framework of competences for different types of participation by different users (stakeholders).
* Co-creation is understood in many ways (see work by Osborne, Löffler, Bovaird, etc.), especially in different contexts (e.g. private sector, public sector). Often the term is used interchangeably with related concepts (see the overview by Voorberg et al.). In addition, digital co-creation has different implications than analogue co-creation.
* It is extremely important to differentiate between co-creation in the public sector and in the private sector, or between citizens and businesses. It is true that the service becomes more tailored to citizens' needs, but the needs of the PA employees must also be considered (Åkesson & Edvardsson, 2008, or Ryan, 2012). Regarding the difference between the private and public sectors, see Grönroos (2011) and Osborne, Radnor, and Nasi (2012).
* With science-like activities you are making a lot of demands of citizens - in the sense of knowledge, time, skills, access etc.
* What would their non-policy roles be?
* Who is able to determine what is fake news? See work by Oystein Sæbø on the difficulties faced by experts (and if they find it difficult, what skills are missing, so that we can learn how to identify fake news?).
* Clear need to provide the necessary competences; see work by Rasto Kuzel!