Tag Archives: hadoop

Collaboration Management System relation to Analytics and data Science

Collaboration tools integrated offering (course grain integration using ) integration tools like TIBCO, Oracle BPEL, : Components to be integrated:
1. Content management system CMS  (SharePoint, Joomla, drupal) and
2. Document Management system like (liferay, Document-um, IBM file-net) can be integrated using flexible integration tools.

3. Communication platform like Windows Communication Foundation ,IBM lotus notes integrated with mail client and Social network like Facebook using Facebook API, LinkedIn API, twitter API ,skype API to direct plugin as well as data Analysis of Social networking platform unstructured data captured of the collaboration for the project discussion.
soft-phone using Skype offering recording conversation facility for later use.

http://sandyclassic.wordpress.com/2013/06/19/how-to-do-social-media-analysis/

Oracle Web centre:
http://sandyclassic.wordpress.com/2011/11/04/new-social-computing-war-oracle-web-centre/
4. Integrated Project specific Wikki/Sharepoint/other CMS pages integrated with PMO site Artefacts, Enterprise Architecture Artefacts.
5. seamless integration to Enterprise Search using Endeca or Microsoft FAST for discovery of document, information, answers from indexed,tagged repository of data.
6. Structured and Unstructured data : hosted on Hadoop clusters using Map-reduce algorithm to Analyse data, consolidate data using Hadoop Hive, HBase and mining to discover hidden information using data mining library in Mahout for unstructured data.
Structured data kept in RDBMS clusters like RAC rapid application clusters.
http://sandyclassic.wordpress.com/2011/10/19/hadoop-its-relation-to-new-architecture-enterprise-datawarehouse/


http://sandyclassic.wordpress.com/2013/07/02/data-warehousing-business-intelligence-and-cloud-computing/
7. Integrated with Domain specific Enterprise resource planning ERP packages the communication, collaboration,Discovery, Search layer.
8. All integrated with mesh up architecture providing real-time information maps of resource located and information of nearest help.
9. messaging and communication layer integrated with all on-line company software.
10.Process Orchestration and integration Using Business Process Management tool BPM tool, PEGA BPM, Jboss BPM , windows workflow foundation depending landscape used.
11. Private cloud integration using Oracle cloud , Microsoft Azure, Eucalyptus, open Nebula integrated with web API other web platform landscape.
http://sandyclassic.wordpress.com/2011/10/20/infrastructure-as-service-iaas-offerings-and-tools-in-market-trends/
12. Integrated BI system with real time information access by tools like TIBCO spotfire which can analyse real time data flowing between integrated systems.
Data centre API and virtualisation plaform can also throw in data for analysis to hadoop cluster.
External links for reference: http://www.sap.com/index.epx
http://www.oracle.com,http://www.tibco.com/,http://spotfire.tibco.com/,
http://scn.sap.com/thread/1228659
S
AP XI: http://help.sap.com/saphelp_nw04/helpdata/en/9b/821140d72dc442e10000000a1550b0/content.htm

Oracle Web centre: http://www.oracle.com/technetwork/middleware/webcenter/suite/overview/index.html

CMS: http://www.joomla.org/,http://www.liferay.com/http://www-03.ibm.com/software/products/us/en/filecontmana/
Hadoop: http://hadoop.apache.org/

Map reduce: http://hadoop.apache.org/docs/stable/mapred_tutorial.html
f
acebook API: https://developers.facebook.com/docs/reference/apis/
L
inkedin API: http://developer.linkedin.com/apis
T
witter API: https://dev.twitter.com/

Hadoop and Data Science

Hadoop is more used for Massive Parallel processing MPP architecture.

new MPP platform which can scaleout to petabyte database hadoop which is open source community(around apache, vendor agnostic framework in MPP), can help in faster precessing of heavy loads. Mapreduce can be used for further customisation.

hadoop can help roles CTO  : log analysis of huge data of suppose application logging millions of transaction data .

CMO: targetted offering from social data, target advertisements and customer offerings.

CFO : on using predictive analytics to find toxicity of Loan or mortage from social data of prespects.

As datawarehousing and BI in Technology driven Company people report to CTO only.But it getting pervasive..so user load in BI System increase leading to efficient processing through system like hadoop of social data.

hadoop can help in near realtime analysis of customer like customer click stream real-time analysis,(realtime changing customer interest  can be checked over portal ).

Can bring paradigm shift in Next generation enterprise EDW,SOA(hadoop).  Mapreduce in data virtualitzation.In  cloud we have  (platform,Infrastructure,software).

mahout : Framework for machine learning for analyzing huge data and predictive analytic on it. Open source framework support for Mapreduce.Real time analytic helps in figuring trend very early from customer perspective hence adoption level should be high in customer Relationship management modules so it growth of Salesforce.com depicts.

HDFS: is suited for batch processing.

HBase: for but near realtime

casendra : optimized real tim e distributed environment.

Hr Analytics: There are  high degree of silos: cycle through lots survey data :–> prepare report –> generalized problem  –> find solutions for generalized data . Data from perspective of application, application as perspective of data.

BI help us in getting single version of truth about structure data but unstructured data is where Hadoop helps. Hadoop can process: (structureed,un-structured, timeline etc..across enteripse) data.from service oriented Architeture we need to move from SOA towards  SOBA Service oriented business Architecture.SOBAs are applications composed of services in a declarative manner .The SOA Programming Model specifications include the Service Component Architecture (SCA) to simplify the development of creating business services and Service Data Objects (SDO) for accessing data residing in multiple locations and formats.Moving towards data driven application architectures.Rather than application arranged around data have to otherwise application arranged around data. 

Architect view point: 1. people and process as overlay of technology. Expose data trough service oriented data access.  Hadoop helps in processing power in MDM, quality, integrating data outside enterprise.

utility Industry:Is the first industry to adopt Cloud services with smart metering. Which can give smart input to user about load in network rather then calling services provider user is self aware..Its like Oracle brought this concept of Self service applications.

I am going to refine matter further put some more example and ilustrations if time permits..
Read More details at another blog:
http://sandyclassic.wordpress.com/2013/09/22/approach-to-best-collaboration-management-system/

Bigdata,cloud , business Intelligence and Analytics

There huge amount of data being generated by BigData Chractersized by 3V (Variety,Volume,Velocity) of different variety (audio, video, text, ) huge volumes (large video feeds, audio feeds etc), and velocity ( rapid change in data , and rapid changes in new delta data being large than existing data each day…) Like facebook keep special software which keep latest data feeds posts on first layer storage server Memcached (memory caching) server bandwidth so that its not clogged and fetched quickly and posted in real time speed the old archive data stored not in front storage servers but second layer of the servers.
Bigdata 3V characteristic data likewise stored in huge (Storage Area Network) SAN of cloud storage can be controlled by IAAS (infrastucture as service) component software like Eucalyptus to create public or private cloud. PAAS (platform as service) provide platform API to control package and integrate to other components using code. while SAAS provide seamless Integration.
Now Bigdata stored in cloud can analyzed using hardtop clusters using business Intelligence and Analytic Software.
Datawahouse DW: in RDBMS database to in Hadoop Hive. Using ETL tools (like Informatica, datastage , SSIS) data can be fetched operational systems into data ware house either Hive  for unstructured data or RDBMS for more structured data.

BI over cloud DW: BI can create very user friendly intuitive reports by giving user access to layer of SQL generating software layer called semantic layer which can generate SQL queries on fly depending on what user drag and drop. This like noSQL and HIVE help in analyzing unstructured data faster like data of social media long text, sentences, video feeds.At same time due to parallelism in Hadoop clusters and use of map reduce algorithm the calculations and processing can be lot quicker..which is fulling the Entry of Hadoop and cloud there.
Analytics and data mining is expension to BI. The social media data mostly being unstructured and hence cannot be analysed without categorization and hence quantification then running other algorithm for analysis..hence Analytics is the only way to get meaning from terabyte of data being populated in social media sites each day.

Even simple assumptions like test of hypothesis cannot be done with analytics on the vast unstructured data without using Analytics. Analytics differentiate itself from datawarehouse as it require much lower granularity data..or like base/raw data..which is were traditional warehouses differ. some provide a workaround by having a staging datawarehouse but still  data storage here has limits and its only possible for structured data. So traditional datawarehouse solution is not fit in new 3V data analysis. here new Hadoop take position with Hive and HBase and noSQL and mining with mahout.
Look At the other Article: For practical tool based Case Study on Social Media Analytics
https://thedatascience.wordpress.com/2014/05/29/step-by-step-data-science/