Data Science, its relation to Big Data, and an analytics case study
Receiver operating characteristic (ROC) curves are used in data mining and machine learning. From the area under the ROC curve (AUC) you can calculate the Gini coefficient. I have made an Excel template example to show how it is calculated.
If AUC is the area under the ROC curve, then Gini = 2 × AUC − 1.
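As a minimal sketch of that calculation, assuming binary labels (1 = positive, 0 = negative), a numeric score per case, and at least one case of each class, the Python below builds the ROC points, integrates them with the trapezoidal rule to get AUC, and converts AUC to Gini using the standard relation Gini = 2 × AUC − 1. The function names and the toy data are illustrative only and are not taken from the Excel template.

```python
def roc_points(labels, scores):
    """Return ROC points (FPR, TPR), sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Cases with the same score are handled together as one threshold step.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


def gini_from_auc(auc):
    """Gini coefficient of a classifier from its AUC."""
    return 2.0 * auc - 1.0


# Toy data (made up for illustration): 3 positives, 3 negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

auc = auc_trapezoid(roc_points(labels, scores))
print(round(auc, 4))                 # 0.8889
print(round(gini_from_auc(auc), 4))  # 0.7778
```

A random classifier has AUC = 0.5 and therefore Gini = 0, while a perfect ranking has AUC = 1 and Gini = 1.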
The Gini coefficient is the most watched coefficient in economics these days.
I wrote an article comparing different countries of the world using the available data.
The Gini coefficient derived from AUC has some component of noise, which has called into question whether better measures exist. Measures used in machine learning include DeltaP' (informedness) and the Matthews correlation coefficient; each one is suited to its own field. Informedness = 1 shows perfect performance, while -1 represents perverse or negative performance, even though the predictions are fully informed. In economics, a Gini coefficient of zero shows perfect equality.
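Informedness and the Matthews correlation coefficient can both be computed from a 2x2 confusion matrix. The sketch below assumes the usual definitions (informedness = sensitivity + specificity − 1) and uses made-up counts; the function names are illustrative, and returning 0 when the Matthews denominator is zero is a common convention rather than something stated in this post.

```python
import math


def informedness(tp, fp, tn, fn):
    """Informedness = sensitivity + specificity - 1, ranging from -1 to 1."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity + specificity - 1.0


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient, ranging from -1 to 1."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Made-up confusion-matrix counts for three situations.
print(informedness(50, 0, 50, 0))        # perfect classifier: 1.0
print(informedness(0, 50, 0, 50))        # always wrong (perverse): -1.0
print(round(informedness(40, 10, 45, 5), 3))
print(round(matthews_corrcoef(40, 10, 45, 5), 3))
```

A classifier that is always wrong scores -1 on both measures: it carries full information about the classes, but with the labels inverted.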
So the parameters keep improving. There is no end result, and there cannot be one: as our understanding increases we arrive at better measures, and change is constant. What is truth today was mystery or magic for earlier generations and will be a kind of half-truth for the future. The subjects are interconnected, though. The branching of knowledge areas has been going on for the last 250 years; earlier there was no engineering, and in the time of Socrates everything came under philosophy. Socrates rightly said that you cannot say anything with absolute certainty. But you can make informed decisions, and that is what informedness quantifies: how informed your decisions are.
See a case from Biometrics:
Integrated collaboration-tools offering (coarse-grained integration using integration tools like TIBCO and Oracle BPEL). Components to be integrated:
1. Content management system (CMS) such as SharePoint, Joomla, or Drupal.
2. Document management system such as Liferay, Documentum, or IBM FileNet. These can be integrated using flexible integration tools.
3. Communication platform such as Windows Communication Foundation or IBM Lotus Notes, integrated with the mail client and with social networks such as Facebook, using the Facebook API, LinkedIn API, Twitter API, and Skype API for direct plug-in, as well as data analysis of the unstructured social-networking data captured from the collaboration around project discussions.
A softphone using Skype offers a conversation-recording facility for later use.
Oracle WebCenter:
4. Project-specific wiki, SharePoint, or other CMS pages integrated with PMO site artefacts and Enterprise Architecture artefacts.
5. Seamless integration with enterprise search using Endeca or Microsoft FAST, for discovery of documents, information, and answers from an indexed, tagged repository of data.
6. Structured and unstructured data: unstructured data is hosted on Hadoop clusters, using MapReduce to analyse it, Hadoop Hive and HBase to consolidate it, and the data mining library in Mahout to discover hidden information.
Structured data is kept in RDBMS clusters such as Oracle RAC (Real Application Clusters).
7. The communication, collaboration, discovery, and search layer integrated with domain-specific enterprise resource planning (ERP) packages.
8. Everything integrated with a mashup architecture providing real-time information maps of where resources are located and information on the nearest help.
9. Messaging and communication layer integrated with all online company software.
10. Process orchestration and integration using a business process management (BPM) tool such as PEGA BPM, JBoss BPM, or Windows Workflow Foundation, depending on the landscape used.
11. Private cloud integration using Oracle Cloud, Microsoft Azure, Eucalyptus, or OpenNebula, integrated with web APIs and the rest of the web platform landscape.
12. Integrated BI system with real-time information access through tools like TIBCO Spotfire, which can analyse real-time data flowing between the integrated systems.
The data centre API and virtualisation platform can also feed data to the Hadoop cluster for analysis.
External links for reference:
SAP: http://www.sap.com/index.epx
SAP XI: http://help.sap.com/saphelp_nw04/helpdata/en/9b/821140d72dc442e10000000a1550b0/content.htm
MapReduce: http://hadoop.apache.org/docs/stable/mapred_tutorial.html
Facebook API: https://developers.facebook.com/docs/reference/apis/
LinkedIn API: http://developer.linkedin.com/apis
Twitter API: https://dev.twitter.com/
Hadoop is mostly used in a massively parallel processing (MPP) architecture.
As a new MPP platform that can scale out to petabyte-sized databases, Hadoop, which is open source (a vendor-agnostic MPP framework built by the community around Apache), can help with faster processing of heavy loads. MapReduce can be used for further customisation.
Hadoop can help the following roles:
CTO: log analysis of huge data volumes, for example an application logging millions of transactions.
CMO: targeted offerings from social data, such as targeted advertisements and customer offerings.
CFO: using predictive analytics to find the toxicity of a loan or mortgage from the social data of prospects.
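The CTO log-analysis case above can be sketched as a map-reduce job. The Python below only imitates the map, shuffle, and reduce phases in a single process; it does not use Hadoop, and the log format ("timestamp LEVEL message") and all function names are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit (log_level, 1) for each well-formed log line."""
    parts = line.split(maxsplit=2)
    if len(parts) >= 2:
        yield parts[1], 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run the three phases over an iterable of log lines."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


# Hypothetical application log lines.
log_lines = [
    "2013-01-01T10:00:00 INFO payment accepted",
    "2013-01-01T10:00:01 ERROR payment failed",
    "2013-01-01T10:00:02 INFO login ok",
]
print(run_job(log_lines))  # {'INFO': 2, 'ERROR': 1}
```

On a real cluster the map and reduce functions would run on many nodes over blocks of the log files, and the framework would take care of the shuffle step between them.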
In a technology-driven company, the data warehousing and BI people report only to the CTO. But BI is becoming pervasive, so the user load on the BI system increases, which calls for efficient processing of social data through a system like Hadoop.
Hadoop can help in near-real-time analysis of customers, such as real-time analysis of the customer clickstream (changing customer interests can be checked in real time over the portal).
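One simple way to track changing customer interest from a clickstream is a sliding time window of page counts. The sketch below is plain in-memory Python, not a Hadoop or HBase job; the class name, the window length, and the page names are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Count page views over the most recent `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()    # (timestamp, page), oldest first
        self.counts = Counter()  # page -> views inside the window

    def add(self, timestamp, page):
        """Record one click and drop clicks that fell out of the window."""
        self.events.append((timestamp, page))
        self.counts[page] += 1
        self._expire(timestamp)

    def _expire(self, now):
        # Remove events that are at least `window` seconds older than `now`.
        while self.events and self.events[0][0] <= now - self.window:
            _, old_page = self.events.popleft()
            self.counts[old_page] -= 1
            if self.counts[old_page] == 0:
                del self.counts[old_page]

    def top(self, n=3):
        """Return the n most viewed pages in the current window."""
        return self.counts.most_common(n)


# Hypothetical clicks: (seconds since start, page category).
window = SlidingWindowCounter(window_seconds=60)
for ts, page in [(0, "shoes"), (10, "shoes"), (30, "bags")]:
    window.add(ts, page)
print(window.top(1))  # [('shoes', 2)]

for ts, page in [(65, "bags"), (75, "bags")]:
    window.add(ts, page)
print(window.top(1))  # [('bags', 3)]
```

The interest shown on the portal shifts from "shoes" to "bags" once the older clicks leave the 60-second window.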
It can bring a paradigm shift in the next-generation enterprise: EDW, SOA (Hadoop), and MapReduce in data virtualization. In the cloud we have platform, infrastructure, and software.
Mahout: a framework for machine learning, for analysing huge data and running predictive analytics on it. It is an open-source framework with support for MapReduce. Real-time analytics helps in spotting trends very early from the customer's perspective, so the adoption level should be high in customer relationship management modules, as the growth of Salesforce.com shows.
HDFS: suited for batch processing.
HBase: for near-real-time access.
Cassandra: optimized for real-time distributed environments.
HR analytics: there is a high degree of silos. The cycle runs through lots of survey data -> prepare report -> generalized problem -> find solutions for the generalized data. Data from the perspective of the application, application from the perspective of data.
BI helps us get a single version of the truth about structured data, but unstructured data is where Hadoop helps. Hadoop can process structured, unstructured, timeline, and other data across the enterprise.
From service-oriented architecture (SOA) we need to move towards service-oriented business applications (SOBA). SOBAs are applications composed of services in a declarative manner. The SOA Programming Model specifications include the Service Component Architecture (SCA), to simplify the development of business services, and Service Data Objects (SDO), for accessing data residing in multiple locations and formats.
We are moving towards data-driven application architectures: rather than data arranged around applications, we need it the other way round, with applications arranged around data.
Architect's viewpoint: people and process as an overlay on technology. Expose data through service-oriented data access. Hadoop adds processing power for MDM, data quality, and integrating data from outside the enterprise.
Utility industry: it is the first industry to adopt cloud services, with smart metering, which can give the user smart input about the load in the network. Rather than calling the service provider, the user is self-aware. It is like the concept of self-service applications that Oracle brought in.
I am going to refine this matter further and add some more examples and illustrations if time permits.
Read more details at another blog:
The landscape is complicated as enterprises move more data and business processes to public and private clouds.
•Big Interaction Data: This emerging force consists of social media data from Facebook, Twitter, LinkedIn, and other sources. It includes call detail records (CDRs), device and sensor information, GPS and geolocational mapping data, large image files through Managed File Transfer, Web text and clickstream data, scientific information, emails, and more.
As Big Data comes into focus, it’s capturing the attention of CIOs, VPs of information management (IM), enterprise architects, line-of-business owners, and business executives who recognize the vital role that data plays in performance.
according to a 2011 Gartner survey of CEOs and senior executives.
Big Data is relevant to virtually every industry:
•Consumer industries: From retail to travel and hospitality, organizations can capture Facebook posts, Twitter tweets, YouTube videos, blog commentary, and other social media content to better understand, sell to, and service customers, manage brand reputation, and leverage word-of-mouth marketing.
•Financial services: Banks, insurers, brokerages, and diversified financial services companies are looking to Big Data integration and analytics to better attract and retain customers and enable targeted cross-sell, as well as strengthen fraud detection, risk management, and compliance by applying analytics to Big Data.
•Public sector: The federal Networking and Information Technology Research and Development (NITRD) working group announced the Designing a Digital Future report. The report declared that “every federal agency needs a Big Data strategy,” supporting science, medicine, commerce, national security, and other areas; state and local agencies are coping with similar increases in data volumes in such diverse areas as environmental reviews, counterterrorism, and constituent relations.
•Manufacturing and supply chain: Managing large real-time flows of radio frequency identification (RFID) data can help companies optimize logistics, inventory, and production while swiftly pinpointing manufacturing defects; GPS and mapping data can streamline supply chain efficiency.
•E-commerce: Harnessing enormous quantities of B2B and B2C clickstream, text, and image data and integrating them with transactional data (such as customer profiles) can improve e-commerce efficiency and precision while enabling a seamless customer experience across multiple channels.
•Healthcare: The industry’s transition to electronic medical records and sharing of medical research data among entities is generating vast data volumes and posing acute data management challenges; biotech and pharmaceutical firms are focusing on Big Data in such areas as genomic research and drug discovery.
•Telecommunications: Ceaseless streams of CDRs, text messages, and mobile Web access both jeopardize telco profitability and offer opportunities for network optimization. Firms are looking to Big Data for insights to tune product and service delivery to fast-changing customer demands using social network analysis and influence maps.
According to Gartner, “CEO Advisory: ‘Big Data’ Equals Big Opportunity,” March 31, 2011.
Article: Big Data Unleashed: Turning Big Data into Big Opportunities with the Informatica Platform
Overcoming the Obstacles of Existing Data Infrastructures
Traditional approaches to managing data are insufficient to deliver the value of business insight from Big Data sources. The growth of Big Data stands to exacerbate pain points that many enterprises suffer in their information management practices:
•Lack of business/IT agility: The IM organization is perceived as too slow and too expensive in delivering solutions that the business needs for data-driven initiatives and decision making.
•Compromised business performance: IM constantly deals with complaints from business users about the timeliness, reliability, and accuracy of data while lacking standards to ensure enterprise-wide data quality.
•Over-reliance on IM: The business has limited abilities to directly access the information it needs, requiring time-consuming involvement of IM and introducing delays into critical business processes.
•High costs and complexity: The enterprise suffers escalating costs due to data growth and application sprawl, as well as degradation of systems performance, leaving it poorly positioned for the Big Data onslaught.
•Delays and IT re-engineering: Costly architectural rework is necessary when requirements change even slightly, with little reuse of data integration logic across projects and groups.
•Lost customer opportunities: Sales and service lack a complete view of the customer, undercutting revenue generation and missing opportunities to leverage behavioral and social media data.
Read Full Article at: http://sandyclassic.wordpress.com/2011/10/26/big-data-and-data-integration/